C for Yourself - part 36

Steve Mumford looks at alternative methods of saving data

In last month's column, we covered the actual mechanics of saving data from an application to a file — because the main function of the program in question was to store text, it made sense to store the data in a simple textual format. However, this isn't the only technique available and sometimes it's best to use an alternative method.

Storing data as plain text within a file has some big advantages: for instance, it's easy to edit from outside the application, and file conversion is made much simpler. It's also easy to transfer a data file between hardware platforms, as very little (if any) editing of the source code is required to load one of these files into the same program that's been compiled for use with a different computer.

However, as we saw last month, a certain amount of processing has to be done when reading the data to transform the plain text stored within the file to a format suitable for storage in the program's data structures. In our application, this conversion didn't cause a problem as the data files are small and loading is performed infrequently. However, it's easy to envisage a situation where the majority of the data is numeric and disc access plays a greater part — for instance, manipulating a database file that's too large to be stored in memory.

Switching to a binary format file allows the use of two extra functions, fread() and fwrite(). Unlike the functions we've seen before that essentially 'print' to an open file, these two access the file more directly and transfer whole blocks of memory in one call.

To see how these techniques differ, imagine you've set up a variable that can hold a four-byte number. Within the program, you've set the value of this variable to be equal to '1'. Whereas using the fprintf() or fputs() functions would result in a file containing the actual character '1' (with an ASCII value of 49), using fwrite() would dump out the four bytes that make up the variable. The file would contain a single 'one' byte and three 'zero' bytes, in whatever order the processor keeps them in memory (on the little-endian ARM, the 'one' comes first). Since these values lie outside the printable ASCII range, they wouldn't be visible as normal text and would appear in Edit as control characters.
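The difference is easy to check for yourself. The following is a minimal sketch rather than part of last month's program: the helper names write_as_text(), write_as_binary() and file_length() are invented for illustration. Each writer stores a value in a file, and file_length() reports how many bytes ended up on disc.

```c
#include <stdio.h>

/* Hypothetical helper: store 'value' as readable text, e.g. the character '1'. */
int write_as_text(const char *name, int value)
{
  FILE *fp = fopen(name, "w");
  if (fp == NULL) return 0;
  fprintf(fp, "%d", value);
  fclose(fp);
  return 1;
}

/* Hypothetical helper: dump the bytes that make up 'value' straight to disc. */
int write_as_binary(const char *name, int value)
{
  FILE *fp = fopen(name, "wb");
  size_t written;
  if (fp == NULL) return 0;
  written = fwrite(&value, sizeof(value), 1, fp);
  fclose(fp);
  return written == 1;
}

/* Hypothetical helper: count the bytes in a file, or return -1 on failure. */
long file_length(const char *name)
{
  FILE *fp = fopen(name, "rb");
  long count = 0;
  if (fp == NULL) return -1;
  while (fgetc(fp) != EOF) count++;
  fclose(fp);
  return count;
}
```

Writing the value 1 with write_as_text() should leave a one-byte file holding the character '1', while write_as_binary() should leave a file of sizeof(int) bytes, which is four with a 32-bit compiler.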

Storing data in this way makes loading and saving large, numerically dense files more efficient, which can be desirable if you have an application that performs a lot of disc access. However, there is a catch: because you're saving a snapshot of a section of memory, the data files produced tend to be machine-specific, and guaranteeing compatibility between different platforms can take a lot more work. This stems from the fact that different compilers use different amounts of memory for holding the various types of variables, and different processors may store the bytes of a number in a different order. If you try to load a four-byte integer into a space that's only two bytes long (as can be the case with certain short integers), you'll end up with corrupt data.
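One common way around the problem, offered here as a sketch rather than as the method this series uses, is to write each number a byte at a time in an order you choose yourself, so the file no longer depends on how a particular machine lays the value out in memory. The function names put_u32() and get_u32() are hypothetical, and the example assumes values that fit in 32 bits, stored least significant byte first.

```c
#include <stdio.h>

/* Hypothetical helper: write a 32-bit value as four bytes, lowest byte first. */
int put_u32(FILE *fp, unsigned long value)
{
  int i;
  for (i = 0; i < 4; i++)
  {
    if (fputc((int)((value >> (8 * i)) & 0xFF), fp) == EOF) return 0;
  }
  return 1;
}

/* Hypothetical helper: read four bytes back and rebuild the value. */
int get_u32(FILE *fp, unsigned long *value)
{
  unsigned long result = 0;
  int i, byte;
  for (i = 0; i < 4; i++)
  {
    byte = fgetc(fp);
    if (byte == EOF) return 0;   /* file ended before four bytes arrived */
    result |= (unsigned long)byte << (8 * i);
  }
  *value = result;
  return 1;
}
```

The price is a little extra work per number compared with a single fwrite(), but a file written this way has the same byte layout whichever machine produced it.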

fread() and fwrite() both take four parameters. The first is a pointer to the block of data you wish to load or save, the second is the size in bytes of a single item within that block, the third is the number of items of that size to transfer, and the fourth is a standard file pointer obtained from the fopen() function. It's important that this file pointer refers to a binary-type file, so you should add a 'b' to fopen()'s file access flags. For instance, to save an array of 10 jump_data structures, you might do it in the following way:

FILE *file_ptr;
struct jump_data jmp_data_root[10];

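/* fill the array of structures with information */

file_ptr = fopen("filename", "wb");   /* 'b' selects binary mode */
if (file_ptr != NULL)
{
  /* write 10 items, each the size of one jump_data structure */
  fwrite(jmp_data_root, sizeof(struct jump_data), 10, file_ptr);
  fclose(file_ptr);
}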

The sizeof operator lets the compiler work out the exact size of an individual jump_data structure, so you don't have to add up the sizes of all its elements. The single fwrite() call saves the array in its entirety, avoiding a lot of tedious translation. Loading the data back is just as simple: the file is opened with binary read access, a suitable area of memory is prepared, and fread() is called. Finally, saving data in this raw format gives us the potential for random access, more of which later.
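As a sketch of the saving and loading steps just described, the two functions below write the array out and read it back in. The members of jump_data shown here are placeholders invented so that the example compiles (the real structure would be the one from your own program), and save_jumps() and load_jumps() are hypothetical names.

```c
#include <stdio.h>

#define JUMP_COUNT 10

/* Placeholder structure: these members are assumptions for illustration. */
struct jump_data
{
  int  height;
  char name[16];
};

/* Save the whole array with one fwrite(); returns the number of items written. */
size_t save_jumps(const char *name, const struct jump_data *jumps)
{
  FILE *fp = fopen(name, "wb");
  size_t done;
  if (fp == NULL) return 0;
  done = fwrite(jumps, sizeof(struct jump_data), JUMP_COUNT, fp);
  fclose(fp);
  return done;
}

/* Load the array back with one fread(); returns the number of items read. */
size_t load_jumps(const char *name, struct jump_data *jumps)
{
  FILE *fp = fopen(name, "rb");
  size_t done;
  if (fp == NULL) return 0;
  done = fread(jumps, sizeof(struct jump_data), JUMP_COUNT, fp);
  fclose(fp);
  return done;
}
```

Both fread() and fwrite() return the number of complete items transferred, so comparing the result with 10 is a simple way to detect a short or damaged file.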


Source: Acorn User - 180 - April 1997
Publication: Acorn User
Contributor: Steve Mumford