C for Yourself - part 13

Steve Mumford concludes his look at file handling.

We've covered the fundamentals of file handling in the past few months, but there's one area left to mention. If you're only using small files, you'd probably be quite happy to load your data in at the start of the program, make whatever changes were necessary during its execution, and save the entire file again once the program had finished.

This would work quite nicely while the file stayed small, but what would happen as the volume of data expanded? Even though it might still fit in your machine's memory, you'd notice a gradual decrease in speed. If the program multitasked, you would find yourself with less and less memory to run other applications. Ultimately, you would reach a stage where the memory of the machine was full, and you'd no longer be able to add any records to your file.

In order to overcome these problems, the program has to be able to leap blindly into a file and grab chunks of data at random, as well as saving them out again. There are functions in C to allow for this, but you have to be prepared to make some alterations to your file format in order to get the best results.

File structure

Imagine you wanted to save a list of names, all of different lengths - if you'd used fprintf() to do the job, there'd be no easy way of jumping to any one name within the file, since you wouldn't know how long each name was until you'd scanned the file. In order to find a particular record, you would have to search through from the beginning, which would be horribly time-consuming for large files. The way round this is to pad out the data and save it in fixed-size blocks, so the absolute position of a record within a file can be determined by multiplying the size of a record by that record's number, counting from zero.
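As a rough sketch of that arithmetic, assume every record occupies 32 bytes. Both RECORD_SIZE and record_offset() are names invented for this illustration, not anything from the standard library.

```c
#define RECORD_SIZE 32L   /* hypothetical: every record occupies 32 bytes */

/* Byte offset, from the start of the file, of record number `index`.
   Records are counted from zero, so record 0 starts at offset 0. */
long record_offset(long index)
{
    return index * RECORD_SIZE;
}
```

With these figures, the fourth record (index 3) begins 96 bytes into the file, however long the name stored inside it happens to be.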

There are several extra functions provided to help you format your files: fgetc(), fputc(), fgets() and fputs(). The first two load and store single characters while the others deal with whole strings of them. Consider the following code:

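int letter;                   /* int rather than char, so EOF can be told apart from real data */
int x;
letter = fgetc(pointer);      /* read the next character from the file */
letter = 'Q';
x = fputc(letter, pointer);   /* write 'Q' to the file */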

fgetc() returns the next character in the file pointed to by the variable pointer, or returns a number equal to the macro EOF if it reached the end of the file or there was a problem. fputc() should return the character it's saved to the file if all's well, but again it'll return EOF if something went awry. fgets() and fputs() work along the same lines, but they load and save strings of characters in one operation.
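To show the EOF test in action, here's a minimal sketch that counts the characters left in an open file. The function name count_chars() is made up for the example, and it assumes the file has already been opened for reading.

```c
#include <stdio.h>

/* Count the characters between the current position and the end of
   the file. count_chars() is a hypothetical helper, not a library call. */
long count_chars(FILE *pointer)
{
    long count = 0L;
    int c;   /* must be int: EOF is outside the range of a character */

    while ((c = fgetc(pointer)) != EOF)
        count++;

    return count;
}
```

The result of fgetc() is held in an int because a plain char can't reliably tell EOF apart from a genuine character read from the file.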

int x;
static char string[] = "Hello, world";
x = fputs(string, pointer);

The above fragment will copy the string "Hello, world" to the file given by pointer, x being given a non-negative value if the function worked, or EOF if an error occurred. The point to watch with this function is that it doesn't tack a newline or a null character on to terminate the string; it's up to you to do so if you require one.

char string[50];
int length = 50;
fgets(string, length, pointer);

The fgets() function will attempt to read a string from the file specified by pointer, stopping if it reaches a newline character or the end of a file, or if it's read length-1 characters. If it stops at a newline, that newline is kept as the last character of the string. After one of these conditions has been satisfied, the string in memory is terminated with a null character. The function returns NULL if it came across the end of the file before reading any characters, or if an error occurred.
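Since fgets() leaves any newline in place, a small wrapper is often handy. The following is only a sketch: read_line() is a hypothetical name, and stripping the newline is a choice made for this example rather than something the library does for you.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into `buffer` (which holds `size` characters) and remove
   the trailing newline, if there is one.
   Returns 1 if a line was read, 0 at end of file or on error. */
int read_line(char *buffer, int size, FILE *pointer)
{
    char *newline;

    if (fgets(buffer, size, pointer) == NULL)
        return 0;

    newline = strchr(buffer, '\n');   /* fgets() keeps the newline */
    if (newline != NULL)
        *newline = '\0';

    return 1;
}
```

Calling it in a loop, as in while (read_line(string, 50, pointer)), would step through a text file one line at a time.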

Using these functions in conjunction with the others you've learned will allow you to order your data into neat segments, ready for the techniques of random data access described below.
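One possible way of producing those neat segments is to pad every name out to a fixed length with null characters. This is a sketch under assumptions: the 32-byte record length and the name write_record() are invented for the example, and over-long names are simply cut short.

```c
#include <stdio.h>

#define RECORD_SIZE 32   /* hypothetical fixed record length in bytes */

/* Write `name` as one fixed-size record, padded with null characters.
   Names longer than RECORD_SIZE - 1 characters are truncated.
   Returns 0 on success, or EOF if a write failed. */
int write_record(FILE *pointer, const char *name)
{
    int i = 0;

    /* Copy the name, always leaving room for at least one null. */
    while (i < RECORD_SIZE - 1 && name[i] != '\0') {
        if (fputc(name[i], pointer) == EOF)
            return EOF;
        i++;
    }

    /* Pad the rest of the record so every record is the same size. */
    while (i < RECORD_SIZE) {
        if (fputc('\0', pointer) == EOF)
            return EOF;
        i++;
    }

    return 0;
}
```

On systems that distinguish text files from binary ones, a file of records like this is best opened in a binary mode such as "wb", so that the padding bytes are stored exactly as written.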

Changing your position

The first of the functions concerning random access is rewind() and, as its name suggests, it takes the pointer you supply it with and resets the position of that file so any future read operations take their data from the beginning. Its syntax is simply:

rewind(pointer);

The next two commands are related, and they are ftell() and fseek(). The former returns a long integer corresponding to the current position within a file, and the latter allows you to jump to another point. Here's the structure of the ftell() function:

long position = 0L;
FILE *pointer;
pointer = fopen("file", "r");
position = ftell(pointer);

It's important to remember that ftell() will return a long integer, so set up your variables carefully. Once you've stored a position, you'll probably want to return to it at some time in the future, so here's how the fseek() function is put together. The integer it returns will be zero if all is well; if you get a non-zero value, it's not been able to complete your request.

int x;
x = fseek(pointer, position, SEEK_SET);

The position in the file is specified by a combination of the last two parameters - the position variable is considered to be an offset, and you choose where you measure that offset from by using the third parameter. SEEK_SET takes the offset to be relative to the start of the file, SEEK_CUR measures it from the current position, and SEEK_END specifies an offset from the file's end. All three macros are defined in stdio.h, in case you're wondering where they've appeared from. To move your position back by twenty bytes, you would use:

x = fseek(pointer, -20, SEEK_CUR);

To go to a position ten bytes before the end of the file, you could use:

x = fseek(pointer, -10, SEEK_END);
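As an illustration of ftell() and fseek() working together, the sketch below finds the length of an open file and then returns to wherever it started. The name file_size() is hypothetical, and the approach assumes a file opened in a binary mode on a system where seeking to the end is supported, which is usual but not guaranteed by the language.

```c
#include <stdio.h>

/* Return the length of an open file in bytes, leaving the file
   position where it was. Returns -1L if anything goes wrong. */
long file_size(FILE *pointer)
{
    long saved, size;

    saved = ftell(pointer);                 /* remember where we are */
    if (saved == -1L)
        return -1L;

    if (fseek(pointer, 0L, SEEK_END) != 0)  /* jump to the end */
        return -1L;
    size = ftell(pointer);                  /* the position there is the size */

    if (fseek(pointer, saved, SEEK_SET) != 0)   /* go back again */
        return -1L;

    return size;
}
```

Dividing the result by the size of one record would tell you how many records the file holds.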

These functions can be used in any combination to bounce the position in the file back and forth, and with careful planning of your file structure this will allow you to load sections of data as and when you want them. Used in conjunction with the update file mode discussed last month, you'll be able to load and edit files limited only by the disc space of your machine - apologies to those who don't have a hard disc.
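Pulling the pieces together, here's a rough sketch of loading and replacing a single record in a file of fixed-size records. The names read_record() and write_record_at() and the 32-byte record length are assumptions made for the example, and the file is assumed to have been opened in an update mode such as "r+b".

```c
#include <stdio.h>

#define RECORD_SIZE 32   /* hypothetical fixed record length in bytes */

/* Load record `index` (counting from zero) into `buffer`, which must
   have room for RECORD_SIZE bytes. Returns 0 on success, -1 on failure. */
int read_record(FILE *pointer, long index, char *buffer)
{
    int i, c;

    if (fseek(pointer, index * RECORD_SIZE, SEEK_SET) != 0)
        return -1;

    for (i = 0; i < RECORD_SIZE; i++) {
        c = fgetc(pointer);
        if (c == EOF)
            return -1;           /* ran off the end of the file */
        buffer[i] = (char)c;
    }
    return 0;
}

/* Overwrite record `index` with the RECORD_SIZE bytes in `buffer`.
   Returns 0 on success, -1 on failure. */
int write_record_at(FILE *pointer, long index, const char *buffer)
{
    int i;

    if (fseek(pointer, index * RECORD_SIZE, SEEK_SET) != 0)
        return -1;

    for (i = 0; i < RECORD_SIZE; i++) {
        if (fputc(buffer[i], pointer) == EOF)
            return -1;
    }

    /* Flush so the change reaches the file before any later read. */
    return (fflush(pointer) == 0) ? 0 : -1;
}
```

Both functions begin with fseek(), which also satisfies the rule that an update-mode file needs a repositioning call whenever you switch between reading and writing. Only one record is ever held in memory, so the size of the file no longer matters.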

Well, I think that's quite enough on file handling for now - the best way of learning is, as always, to have a go for yourselves. Next month, I'll move on to more complicated data structures, and how they might be used to make the programmer's lot a little bit easier.


Source: Acorn User - 157 - July 1995
Publication: Acorn User
Contributor: Steve Mumford