C for Yourself - part 7

Continuing the look at pointers and arrays in C, David Matthewman studies how to access blocks of memory with pointers.

Last month, I looked at how to initialise pointers to point at blocks of memory using the malloc() function. For instance:

int *ptr;
ptr = malloc(100*sizeof(int));

sets the variable ptr - with type pointer to int - to point at a block of memory big enough to hold 100 integers; 400 bytes, on the Archimedes. Because of the correspondence between arrays and pointers, once you have done this you can get at any particular element in the block by using ptr as an array:

ptr[0] = 35;
total += ptr[15];
last_entry = ptr[99];

The first of these statements could also be written in pointer notation as:

*ptr = 35;

literally, 'Let the integer pointed to by ptr have the value 35.' Since ptr by definition points to the array element ptr[0], this does what we want.
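
As a minimal sketch of this equivalence, the function below stores a value using array notation and reads it back using pointer notation. The name first_element_roundtrip is invented for illustration, and the check on malloc()'s result and the call to free() anticipate points covered later in this article.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates 100 ints, writes to the first element
   with array notation and reads it back with pointer notation.
   Returns -1 if the allocation fails. */
int first_element_roundtrip(int value)
{
    int *ptr = malloc(100 * sizeof(int));
    int result;

    if (ptr == NULL)
        return -1;      /* no memory available */

    ptr[0] = value;     /* array notation */
    result = *ptr;      /* pointer notation: the same element */

    free(ptr);          /* hand the block back when finished */
    return result;
}
```

Calling first_element_roundtrip(35) should hand back 35, since ptr[0] and *ptr name the same integer.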

Indexing pointers

We now need a mechanism for reaching other elements in the array. In fact, this is very easy: ptr + n points to the element ptr[n], so to access that element we use *(ptr + n). The brackets are necessary because the dereferencing operator * has a higher precedence than the addition operator +, so the expression *ptr + 1 takes the number pointed to by ptr and adds one to it. We can therefore write the last two array statements as:

total += *(ptr + 15);
last_entry = *(ptr + 99);

should we so wish.
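
The effect of the brackets can be seen in a short sketch. The two function names are invented for illustration; each takes a pointer to the start of a block holding at least two integers.

```c
/* Hypothetical helper: the second element of the block, i.e. ptr[1]. */
int second_element(const int *ptr)
{
    return *(ptr + 1);  /* move on one element, then dereference */
}

/* Hypothetical helper: the first element of the block with one added. */
int first_plus_one(const int *ptr)
{
    return *ptr + 1;    /* dereference first, then add one to the value */
}
```

For a block beginning 10, 20, 30 the first function gives 20 and the second gives 11.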

Why should we so wish, anyway? Well, there are a number of reasons, many relating to subjects that we have not covered yet. In some cases it may be more intuitive to write the expression in this way, or it might lead to more compact source code, or it might cause the compiler to generate better machine code.

The pointer types

Programmers new to C, especially those who have come to it from BBC BASIC, tend to assume that a pointer is a pointer is a pointer, and that it is identical with the int type. After all, this is the case in BASIC, where the address of a variable or the pointer to a block of memory is an integer variable. This is not the case with C, however.

This should in fact be apparent from the previous example. The variable ptr points to the start of a block of memory containing four-byte integers. The expression ptr + 1 points to the second integer in the block; in other words, its value is an address four bytes higher than ptr itself. The value of ptr + 2 is an address eight bytes on from ptr, and so on, up to ptr + 99, which points to the last integer in the block at an address 396 bytes higher than ptr itself.

However, what if we had originally defined ptr to point to a block of characters, with:

char *ptr;
ptr = malloc(100*sizeof(char));

In this case, ptr still points to the first character in the block, and ptr + 1 points to the second character in the block. However, because sizeof(char) is one byte, ptr + 1's value is an address one byte higher than ptr, not four. Likewise, ptr + 99 points to an address 99 bytes higher than ptr, the highest address in the block.
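
The difference in step size can be measured directly. This is a rough sketch with invented function names; it relies on a cast to char *, which is explained in the next section, to count in bytes rather than in elements.

```c
#include <stddef.h>

/* Hypothetical helper: bytes between ptr and ptr + n for a pointer to int.
   Both addresses are viewed as char pointers so the gap is in bytes. */
size_t int_step_bytes(int *ptr, size_t n)
{
    return (size_t)((char *)(ptr + n) - (char *)ptr);
}

/* Hypothetical helper: the same measurement for a pointer to char. */
size_t char_step_bytes(char *ptr, size_t n)
{
    return (size_t)((ptr + n) - ptr);
}
```

On a machine with four-byte integers, int_step_bytes(ptr, 99) gives 396 while char_step_bytes(ptr, 99) gives 99. In general the first result is n * sizeof(int).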

All this implies that the C compiler must be able to tell what type of object a pointer is pointing to. Bear in mind that - although we have so far only come across a few modest-sized objects - C allows for all manner of types of totally arbitrary length which we will look at in a few months. All of these may be referenced using pointers. This is why there is not just one 'pointer' type in C, but any number of 'pointer to something' types.

Casting and arithmetic

Since pointers all have in common the fact that they point to a position in memory - they are all in some sense addresses - it may sometimes be useful to assign the value of one pointer to another pointer of a different type. This should be done with extreme caution, but it can be done with casting. Prefixing an expression by (type *) converts its value to the type 'pointer to type'; the variable being cast is itself unchanged. For instance:

char *pchar;
int *pint;
...
pchar = (char *) pint;

However, pchar still has type 'pointer to char'. pchar + 1 will point to one byte higher than pchar, not to the next integer in the list, which would be pointed to by pint + 1.
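
To make the point concrete, here is a small sketch (the function name is invented) which checks that pchar has to be moved on by sizeof(int) single-byte steps to reach the place that pint + 1 reaches in one step.

```c
/* Hypothetical helper: returns 1 if stepping a char pointer forward by
   sizeof(int) bytes arrives at the same address as pint + 1.
   pint must point at a block holding at least one int. */
int char_steps_match_int_step(int *pint)
{
    char *pchar = (char *) pint;    /* same address, different type */

    return (int *)(pchar + sizeof(int)) == pint + 1;
}
```

The function should return 1 on any machine, whatever the size of an int happens to be.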

Because of this problem, casting between pointers should be avoided wherever possible. Another common practice is to cast pointers to int. It is true that on the Archimedes all the pointer types and int are physically identical, but this may not be so on other machines. Normally, such casts should be reserved for circumstances where we wish to output the value of a pointer itself, rather than the object to which it is pointing. Remember that just because a pointer can be represented by an integer, it does not mean that adding one to the pointer will add one to its integer representation. If it is a pointer to int then, on the Archimedes, it will add four, and so on.
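
Where the aim is simply to output a pointer's value, the standard library offers a route that does not depend on pointers and int being the same size: the %p conversion, which expects a void * argument. The sketch below assumes a compiler that provides snprintf(), which arrived with the later C99 standard; the function name is invented, and the exact text produced varies from one implementation to another.

```c
#include <stdio.h>

/* Hypothetical helper: writes a textual form of the address held in ptr
   into buf. Returns the number of characters needed, or a negative
   value on error. The format of the text is implementation-defined. */
int format_address(char *buf, size_t size, int *ptr)
{
    return snprintf(buf, size, "%p", (void *) ptr);
}
```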

One conversion between integers and pointers which is always permitted - and is done automatically by the compiler - is setting a pointer to zero. Zero or NULL is a special value which indicates that the pointer may not be used because it has no memory to point at. It is returned by the malloc() function if it cannot allocate memory for some reason. Dereferencing a NULL pointer using * has undefined results, which are invariably nasty.
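
It is therefore wise to test the result of malloc() before using it. A minimal sketch, with an invented function name, might look like this:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a block of count ints and sets each
   one to zero. Returns NULL if malloc() could not supply the memory. */
int *make_int_block(size_t count)
{
    int *ptr = malloc(count * sizeof(int));
    size_t i;

    if (ptr == NULL)
        return NULL;        /* never dereference a NULL pointer */

    for (i = 0; i < count; i++)
        ptr[i] = 0;

    return ptr;
}
```

The caller should make the same test on the value returned before touching the block.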

We have already seen that it is both legal and useful to add integers to pointers. Two pointers may not be added, but pointers of the same type may be subtracted to give the difference between them in units of whatever is being pointed at. Of course, this only gives a sensible answer if both pointers point into the same block of memory.

Pointers of the same type may also be compared; a pointer is greater than another pointer if it points to a higher area of memory.
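
Both operations can be sketched briefly. The function names are invented, and each assumes that its two arguments point into the same block. The result of a subtraction has the standard type ptrdiff_t, declared in stddef.h.

```c
#include <stddef.h>

/* Hypothetical helper: the number of ints (not bytes) from low to high. */
ptrdiff_t elements_between(const int *low, const int *high)
{
    return high - low;
}

/* Hypothetical helper: 1 if a points higher in the block than b, else 0. */
int points_higher(const int *a, const int *b)
{
    return a > b;
}
```

With ptr pointing at a block of 100 integers, elements_between(ptr + 15, ptr + 99) gives 84, however many bytes an int occupies.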

Deallocating pointers

Once we have allocated some memory using malloc(), it stays allocated until we end the program, even if the program has finished using it. If we were to continue to allocate memory using malloc() without ever giving any back, sooner or later we would run out of memory. It is therefore good practice to deallocate any block which is no longer needed, by passing a pointer to it to the free() function:

free(ptr);

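This frees up the block of memory that ptr points at. It does not change ptr itself, which still holds the old address; that address must not be used again, so it is a sensible habit to set ptr to zero, or NULL, yourself straight after the call. It is still possible to run out of memory due to memory fragmentation, of course.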
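
In standard C, free() receives only a copy of the pointer's value and so cannot alter the pointer variable it was given. A common idiom, sketched here with an invented function name, is to assign NULL by hand once the block has been freed. The helper takes the address of the pointer, a 'pointer to pointer to int', so that it can change the caller's variable.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the block that *pptr points at, then sets
   *pptr to NULL so the stale address cannot be reused by accident. */
void release_block(int **pptr)
{
    free(*pptr);    /* passing NULL to free() is harmless */
    *pptr = NULL;
}
```

After release_block(&ptr), a later test of ptr against NULL shows that it no longer has any memory to point at.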

That's it for pointers for the moment. Next month, I shall look at functions in C.

Sidebox: The name, value, type and address of a variable

In Through the Looking Glass by Lewis Carroll, Alice encounters Humpty Dumpty, a character who makes a lot of fuss about the difference between the name of a word and what it means.

In a similar way, when programming in C you need to be very clear about the distinction between the name of a variable, the variable's contents or value, its type and its address. The variable's name is a label, used by C to identify the variable. Its value is the contents of the variable, which could be a character, integer, floating point number and so on, depending on the type of the variable. It could even be an address, if the variable is a pointer. The type tells the compiler how to interpret the contents of the variable to get the value, plus other information like how much memory the variable takes up. The address of the variable is the actual place in memory where the variable's contents are stored.

With a simple declaration:

int i = 10;

this is fairly obvious. The name of the variable is i, its value is ten, its type is int and its address can be found by the expression &i. However, things get more complicated with arrays. If we write:
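
The four properties of i can be picked out in a few lines. In this sketch the function name and the pointer variable where are invented for illustration.

```c
/* Hypothetical helper: reads the value of i back through its address. */
int read_through_address(void)
{
    int i = 10;         /* name: i, type: int, value: 10 */
    int *where = &i;    /* &i is the address of i */

    return *where;      /* the value stored at that address */
}
```

The function returns 10, the same as the value of i itself.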

int ar[3];
ar[0] = 49; ar[1] = 64; ar[2] = 81;

then the type of the variable ar depends upon whether we are referring to the array ar or the pointer ar. The former is an array, with three elements whose values are 49, 64 and 81, and type integer array. The latter is a pointer, whose value is the address of the first array element and whose type is pointer to int. For this reason, when referring to the array, I will write it as ar[], to ensure that the names of the array and of the pointer are different.

With this done, I can write either:

the name of the variable is ar[], it is an array with three values 49, 64 and 81, its type is array of int and its address is given by the expression &ar[], or more simply by ar, though of course it occupies a block of memory starting from this value.
the name of the variable is ar, its value is the base address of a block of memory, its type is pointer to int and its address is given by &ar.

Of the two, the second is usually preferable. Pointers and arrays are equivalent, and it would be confusing to have two different types for what is effectively the same data structure. Because of this, when I am referring to an array/pointer, I will use the pointer variable unless I am specifically talking about an array. Remember though that the name of the array variable is ar[]; the variable ar without the square brackets is a pointer with a value the same as the value of the expression &ar[0], rather than the value of the array ar[].
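
The last point can be checked with a short sketch (the function name is invented): used as a pointer, ar has the same value as &ar[0], and pointer notation reaches the same elements as array notation.

```c
/* Hypothetical helper: returns 1 if ar, used as a pointer, equals
   &ar[0] and pointer notation agrees with array notation. */
int array_name_points_at_first_element(void)
{
    int ar[3];

    ar[0] = 49; ar[1] = 64; ar[2] = 81;

    return ar == &ar[0] && *ar == 49 && *(ar + 2) == ar[2];
}
```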

Source:	Acorn User - 151 - January 1995
Publication:	Acorn User
Contributor:	David Matthewman