C for Yourself - part 6

David Matthewman continues to look at pointers and arrays in C, showing how they can be initialised.

First, a quick recap: last month in this column, I introduced two new operators in the C language, * and &. The & operator is the address operator; it gives the address of, or a pointer to, whatever it operates upon.

The * operator should be familiar to anyone who has used indirection operators such as ? and ! in BASIC; it is the C indirection or dereferencing operator and means 'the object pointed to by'. Unlike BASIC, where the indirection operators take integer operands, the * operator in C only operates on pointers. Constructions such as:

int i,j;
...
i = *j;

are not legal, because j has been defined as an integer, not a pointer. Pointers are declared by using the * operator in the declaration, so the following construction is OK:

int i;
int *j;
...
i = *j;

The second line declares j to have type 'pointer to int'. All pointers in C point to a specific type of variable. Having declared j in the program above, I would not be able to use it to point to a floating point number or a character, since I have specifically declared it to point to an integer. The exception to this is the type 'pointer to void', which we will encounter later.

The malloc() function

Note that the above program fragment as it stands is in error. I have declared j but I have not initialised it, that is to say given it anything to point to, so the result of reading *j is unpredictable. I can initialise it by explicitly assigning the address of a variable to it, so:

int i;
int *j;
j = &i;
i = *j;

would be a legal if pointless example.
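A slightly less pointless version shows the two operators working together. This is only a minimal sketch: the function name set_through_pointer and its argument are invented for illustration and are not part of the original column.

```c
#include <stdio.h>

/* Hypothetical helper: stores a value in i by way of the pointer j,
   then returns whatever i holds afterwards. */
int set_through_pointer(int new_value)
{
    int i = 0;       /* an ordinary integer */
    int *j = &i;     /* j is initialised to point at i */

    *j = new_value;  /* writes to i through the pointer */
    printf("i = %d, *j = %d\n", i, *j);
    return i;
}
```

Because j holds the address of i, assigning to *j changes i itself, so the function returns the value it was given.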

However, as I said last issue, pointers and arrays are very closely related. When it is used in an expression, the name of an array is treated as a pointer to the first element in the array, and has type 'pointer to whatever type the array had'. By symmetry, if we declare a variable to have type 'pointer to int' we should be able to arrange for it to point to a block of memory and to access that block as an array.

The function that C uses to allocate blocks of memory is called malloc(). It is declared in the C header file stdlib.h, so any program which uses it must have the line:

#include <stdlib.h>

near the start. The call to malloc() that allocates a block of memory no_of_bytes in size, and sets pointer to point at the first element in the block, is:

pointer = malloc(no_of_bytes);

The advantage of this is that no_of_bytes can be calculated at run time, whereas the size of a declared array must be known when the program is compiled.

It is important to realise that the argument for malloc() is the number of bytes to be reserved. This number may not be immediately obvious; how many bytes must be reserved for an array of integers, for instance? On the Archimedes the answer is four per integer, but it may not be on other machines. As we encounter new and more complicated variable types later in the series, we will meet variables whose size is not so easy to calculate. Fortunately, C has an operator, sizeof, to do this automatically; sizeof(type) gives the number of bytes needed to store the given type.
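As a rough illustration of sizeof at work, the sketch below prints the sizes of a few basic types and works out how many bytes a given number of integers needs. The names bytes_for_ints and print_sizes are hypothetical, and the figures printed will differ from machine to machine.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the number of bytes needed to hold
   count integers on whatever machine this is compiled for. */
size_t bytes_for_ints(size_t count)
{
    return count * sizeof(int);
}

/* Prints the sizes of a few basic types. The results depend
   on the machine and compiler, so none are assumed here. */
void print_sizes(void)
{
    printf("char:   %lu\n", (unsigned long) sizeof(char));
    printf("int:    %lu\n", (unsigned long) sizeof(int));
    printf("double: %lu\n", (unsigned long) sizeof(double));
    printf("int *:  %lu\n", (unsigned long) sizeof(int *));
}
```

The value returned by bytes_for_ints() is the sort of figure that would be passed to malloc().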

int array[10];

and

int *array;
array = malloc(10*sizeof(int));

both give ten integers that can be accessed as array[0] to array[9], although the first is a true array and the second is a pointer to a block of memory. Both methods have their advantages, and should be used appropriately. Declaring an array with a size in brackets means that you need to know what size the array will be when the program is compiled. On the other hand, the declaration reserves memory for the array, and it can be used immediately. Declaring array as a pointer means that the size of the array can be calculated at run time, but that we must remember to initialise the pointer with a malloc() before using it.
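The run-time approach might look something like the sketch below. The function name make_squares is invented for illustration. The check for NULL and the need to hand the block back with free() go beyond what this article covers; both are standard C, since malloc() returns NULL when it cannot find the memory.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates count integers at run time and
   fills element i with i*i. Returns NULL if the memory could not
   be allocated. The caller should pass the result to free()
   when it has finished with it. */
int *make_squares(size_t count)
{
    int *array = malloc(count * sizeof(int));
    size_t i;

    if (array == NULL)
        return NULL;

    /* From here on the pointer is indexed exactly like an array. */
    for (i = 0; i < count; i++)
        array[i] = (int) (i * i);

    return array;
}
```

Here count need not be known until the program is running, which a declaration with a size in brackets would not allow.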

Character arrays again

Looking again at character arrays - the C equivalent of strings - they are a prime candidate for being sized at run time. The discussion about pointers was started last issue by the observation that it was wasteful to make all character arrays large enough to hold the largest strings that they could be expected to hold, as most will in fact hold much shorter strings.

Photographer Kate is writing a program to hold various items of information about the photographs in a roll of film. One piece of information that she wishes to hold is the name of the place where the photograph was taken. Originally she did this by declaring an array:

char place[40][100];

recognising that this was not ideal, as the majority of the 4000 bytes that this would take up would be wasted on descriptions like 'Graffiti in Berlin'.

Instead of this, Kate uses an array of pointers, declared:

char *place[40];

This is an array of 40 pointers, and differs subtly from the array declared before. The first occupies 4000 bytes of contiguous memory, while the second occupies a much smaller block of memory: enough to hold 40 pointers, which on the Archimedes will each be four bytes long. Each of these pointers will eventually point to an area of memory containing a string, but these areas will not necessarily be contiguous, or even in order.

To the programmer, two-dimensional arrays and arrays of pointers normally appear identical, but they are held differently in memory.
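The difference in storage can be checked with sizeof. In this sketch the two function names are hypothetical; the first figure is always 4000 because a char is one byte by definition, while the second depends on the size of a pointer on the machine in question.

```c
#include <stddef.h>

/* Hypothetical helper: bytes taken by the two-dimensional array. */
size_t size_of_grid(void)
{
    char place[40][100];    /* 40 rows of 100 characters each */
    return sizeof(place);   /* 40 * 100 = 4000 bytes */
}

/* Hypothetical helper: bytes taken by the array of pointers alone,
   not counting any strings they may later point to. */
size_t size_of_pointer_table(void)
{
    char *place[40];        /* 40 pointers to char */
    return sizeof(place);   /* 40 * sizeof(char *) */
}
```

With four-byte pointers, as on the Archimedes, the second function would give 160.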

One other advantage of this approach is that if Kate only uses a 24-exposure film, she only needs to allocate space for 24 strings, whereas all 40 were allocated before, whether or not they were used.

Of course, before each pointer in the array can be used, it must be initialised with a malloc(), but now this need only be done when the pointer is actually needed. Furthermore, the appropriate size of memory can be allocated for the string being stored. Both of these save on memory. Each pointer is allocated separately by a series of statements like:

place[i] = malloc(string_size);
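One way to arrive at string_size is sketched below, assuming the text to be stored is already available as a C string. The function store_place is invented for illustration, and strlen() and strcpy() come from the standard header string.h, which this series may not yet have covered. The extra byte is for the zero character that ends every C string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates just enough memory for text and
   copies it into place[i]. Returns 0 on success, or -1 if the
   memory could not be allocated. */
int store_place(char *place[], int i, const char *text)
{
    size_t string_size = strlen(text) + 1;  /* +1 for the terminator */

    place[i] = malloc(string_size);
    if (place[i] == NULL)
        return -1;

    strcpy(place[i], text);
    return 0;
}
```

A description like 'Graffiti in Berlin' then takes 19 bytes rather than 100.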

The story so far

Arrays in C are declared by putting a number in square brackets after a variable when declaring it; the number gives the number of elements in the array, which are numbered from zero. The type of each element in the array is the type used when declaring the array. Arrays can be multi-dimensional, with the higher dimensions being accessed by further numbers in square brackets after the variable.

The array's name also acts as a pointer to the first element in the array. Pointers also have a particular type, and can be declared by prefixing the variable being declared with a *. When declared in this way they must be initialised before use, either by equating them to another pointer - or a variable prefixed by an & which gives the address of the variable - or by using malloc(). Strings in C are represented by using character arrays. Coming next issue: indexing pointers like arrays.

Source:	Acorn User - 150 - Christmas 1994
Publication:	Acorn User
Contributor:	David Matthewman