C for Yourself - part 15

Steve Mumford takes a look at linked lists and the principle of their creation.

Last month, I finished off by introducing methods of using pointers to structures. I'll start this month's column by explaining how you can manipulate them, and I'll go on to describr some of the ways in which you can link structures together to create all manner of weird and wonderful data formations.

Manipulating pointers to structures

After you have declared a pointer to a structure and set aside the appropriate amount of memory for it using the malloc() function, it's possible to access the areas of memory to ehivh it points by using the normal pointer notation:

typedef struct
{
  int num1;
  int num2;
} SIMPLE;

SIMPLE pair[5];
SIMPLE *pointer;

pointer = &pair[0];

The fragment above defines a structure named SIMPLE and declares an array called pair which contains five of these structures. A pointer to this structure type is then declared, and it's initialised so that it holds the address of the first structure in the pair[] array. Having done this, it's now possible to access the five array structures through the pointer, as shown below:

(*pointer).num1 = 10;
(*pointer).num2 = 15;

Remember that at this tage, the pointer contains the address of pair[0], and when prefixed by the asterisk it instructs the computer access the data stored at that address - in this case, the members of the array held there are being assigned.

Although the compiler is perfectly happy accessing structures in this way, the ampersands and asterisks can become somewhat confusing. To this end, C possesses another operator to make the programmer's life that bit simpler - it's called the membership operator and it's written as a minus sign followed by a greater than symbol. An example is shown below:

pointer = &pair[0];
pointer->num1 = 5;

This method of accessing structures through pointers is equivalent to using all those asterisks, but it looks a little neater.

Linking structures together

In last month's column I referred to an apparent limitation when defining a structure - you're not allowed to include a reference to the structure itself in its own definition, for reasons of recursion. However, I did mention that there was a way round it, and here it is. If, instead of trying to include the structure itself in the definition, you incorporate a pointer to that structure type, the compiler remains happy. This is because the pointer doesn't actually contain the structure; it's most of a signpost capable of sending the computer to an appropriate piece of memory.

struct list
{
  int num1;
  int num2;
  struct list *next;
}

struct list *root;
root = malloc(sizeof(struct list));

The code above defines a structure that includes a pointer to a structure of its own type. The last two lines declare a pointer named root and allocate it enough memory to hold the list structure. As things stand, we've only got one structure in memory, and it's not complete - the member called next doesn't actually point to anything yet. However, that will soon change:

struct list *current;
root->next = malloc(sizeof(struct list));
current = root->next;

What's happened here? Well, another chunk of memory has been allocated, and its address has been stored in that empty pointer - in effect, we've tacked another structure onto the end of the first. We can keep track of which structure we're looking at with another pointer, named current in the above example. As things stand, there are two structures in memory, one pointed to by root and another whose address is stored in the next pointer inside the root structure itself. A copy of that address has been stored in current, and this is so that another area of memory can be allocated by the same procedure.

current->next = malloc(sizeof(list));
current = current->next;

We've now got three structures in memory - the first contains the address of the second, and the second holds the address of the third. This is what's known as a linked list, since each member points to the next one in the sequence. The list can be extended as far as memory allows, and it's possible to step down the chain by making use of the stored pointers. However, we need some way of telling the computer where the end of the list is; if we don't, the program would merrily run down the line of structures and fall off the end - we'd probably witness a spectacular crash.

How do we go about creating a stop sign? The simplest method is to store NULL in the pointer at the end of the list. In this way, we can check that the structure pointed to is 'real' and not just a rogue address. The safest way of doing this is to assign the NULL value just after a new structure has been added to the list - this way, you can't forget about it later.

current->next = malloc(sizeof(list));
current = current->next;
current->next = NULL;

Some more data constructions

Now that we've designed a linked list, we can move on to create more complex formations. Although linked lists are easy to sort and search, they do have limitations in the fact that you can only step through them in one direction - there's no way of going backwards. If you've just gone past a piece of information, you'd have to start at the beginning of the list and work through it all again. This can be avoided by including another pointer in each structure to refer to the previous item in the list - it requires a little more bookkeeping work, but it can save time in the long run. It's also possible to link from the end structure to the first one in the sequence, to form a cyclic list. This can be useful in some circumstances as a buffer.

Linked lists versus arrays of structures

You'll have seen by now that the process of creating a linked list is somewhat complicated, and unless you thoroughly understand where all the pointers are pointing to, it's easy to get hopelessly confused. Because an array of structures is very similar to a linked list in terms of its capabilities, there would be no point in using them unless there were some definite advantages - so, what are they?

Firstly, if you're using an array you must fix the number of elements it contains before you compile it, and once the program is running this value cannot be altered. In order for the capacity of the array to be increased, the code would have to be recompiled. However, with a linked list that problem does not exist, since each element is declared when it is needed and added on to the end of the list using a pointer. Structures can be added at any point during the execution of the program, and the only limit imposed is that of memory.

The other great advantage that linked lists possess is that the way in which they connect to each other can be altered quickly and easily, whereas arrays are fixed. This makes sorting data much easier, as well as the insertion and removal of records. If, for instance, a listing contained three structures named A, B and C and the user wanted to swap B and C, all it would involve would be the alteration of the pointers within the structures. C would be changed so that it pointed to B, A changed to point to C, and B's pointer would be assigned a NULL value. If an array of structures were being used, the actual data in those arrays would have to be copied across, involving a good deal more work. Similarly, if a record is to be deleted it can be removed from the list by manipulating the pointers and then the memory that it occupied released by calling free() - something which is impossible using arrays.

Well, that concludes my brief introduction to some of the more complex things you can do with structures - don't worry if it all seems a little confusing at present. The best method of understanding these data formations is to make use of them, so take a look at the disc and create a few data trees for yourself. If there are any topics that you'd like me to clarify, please drop me a line and I'll do my best to help. See you next month, when I'll introduce the ideas behind recursion.


Source: Acorn User - 159 - September 1995
Publication: Acorn User
Contributor: Steve Mumford