C for Yourself - part 1

David Matthewman explains the differences between compiled and interpreted languages, and introduces the concept of libraries in the first in a new series on the C language.

Welcome to the first in a series on the programming language known as C. C might well be termed the Archimedes's second language; it is certainly the next most familiar after Basic.

However, it makes its presence felt almost as strongly as Basic. Of the applications provided in Rom with Risc OS 3, half of them - Edit, Draw, Configure and Paint - are written in C.

C is a compiled language; Basic is an interpreted one. When a Basic program is run, the interpreter examines a line of code at a time, translating and acting upon it before examining the next line. A C program does not normally run in this way, although we did feature a C interpreter on the February 1994 cover disc.

Compilation

A C program must first be compiled as a block into machine code. The machine code itself is then run. This makes for a faster program, not only because the computer is not having to interpret the code at the same time as running the program, but also because the compiler can perform certain optimisations on the code which would not be possible if it were only seeing the code a line at a time.

There is a down side to all this, which is that the compilation itself can be time consuming. When you are trying to debug a C program, spending minutes waiting for the compiler to compile code to which you have made one very simple change can be frustrating.

Fortunately, it is possible to split large programs into smaller sections, and compile them section by section. Then, only the section of the program which has been changed will need to be re-compiled; the rest will not.

Once all the sections have been compiled, they are joined together by a program called a linker, which creates the final, executable program.

Strictly speaking, the compiler does not produce executable machine code, but object code. This is machine code with added 'hooks' to tell the linker how to fit it together.

The process is obviously much more complicated than 'one line at a time' interpreted Basic. First, C program fragments are compiled into object code files; then these are linked into executable machine code.

Even if the C program appears to be whole and not split into fragments, it will still need to be linked with one or more of the standard C libraries if it is to do anything useful. All this will be examined in more detail later in the series, but it is important to have a general sense of what is happening.

C keywords

The language itself is relatively bare. Archimedes Basic has over 100 keywords; ANSI C has only 32. A full 19 of these are used to declare the type and scope of variables - whether a variable is an integer, whether it is local to a given program segment and so on.

Of those remaining, there are keywords associated with loops:

do, for, while, break, continue

keywords to perform conditional statements:

if, else, switch, case, default

as well as a keyword to give the size, in memory, of a variable, one to return from a procedure and lastly, that pariah among keywords:

goto

And that is it. There are no keywords to print results, none to handle files, no graphics or sound commands, not even so much as a beep. All these are handled by library functions - object files containing code to perform all these functions.

These files are linked with the code from your program in exactly the same way as any other object code.

Header files

Each object file - the libraries are no exception - comes with a header file which, loosely speaking, describes what each function in the object file does.

If a program segment uses a function from one object file, it includes the relevant header file before its instructions. It does so with a line of the form:

#include <library_name.h>

the exact use of which we will tackle later in the series.

This month we will look at one program, which is in the C directory on the cover disc. It is a very small program and will be familiar to almost anyone who has done computing in other languages. It prints the words 'Hello World!'. The first line:

#include <stdio.h>

includes the header file for the library containing functions for input and output, including writing to the screen.

This program needs this to be able to print anything. The line:

int main(void)

marks the start of the main section of the program. 'main' is a function like any other, but it is always the function called first when the program is run, and must therefore always exist.

The word 'int' states that the function has integer 'type', and 'void' states that the function has no parameters. Strictly speaking, ANSI C treats both words as optional here: a function with no stated type is assumed to be 'int', and empty brackets in a function definition mean that it takes no parameters.

The braces { and } mark the start and end of the 'main' function. In this case the function is only one line long, so they may seem a bit superfluous, but normally functions are much longer than this. The printing itself is done by the line:

printf("Hello World!\n");

The '\n' inserts a new line control code when the string is printed. The syntax of the printf function - used to print to the screen - is actually a good deal more complicated than this example suggests, and we will cover it in more detail later.

In order to run HelloWorld you will need a C compiler. See other articles on this site for information on how to get yourself set up with this.

The first thing that the compiler will do with HelloWorld is run the program through a preprocessor. This takes all lines beginning with a '#' character and acts on them.

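In HelloWorld the only such line is the #include line at the start of the program. The preprocessor will replace this line with the contents of the header file: a set of function declarations for the stdio library, which tell the compiler what the functions look like but do not contain their code.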

After this, the text will be passed to the compiler, which will turn the instructions into machine code.

This machine code, the object file, will contain a reference to a function 'printf', but will not contain the function itself.

The final stage is performed by the linker. This will take the object file, find any references to external procedures - printf in this case - and extract the code for them from the relevant library, which here is stdio.

The linker will then generate a fully self-contained machine code HelloWorld program.

You should now understand the difference between a compiled language like C and an interpreted one like Basic.

Hopefully you will also have some idea about how C programs are compiled into files which are then linked to make executable machine code.

Next month, I will explain the syntax of C in more detail.

Sidebox: "The C language"

The programming language C was developed in the early 1970s. Its ancestor was BCPL, which begat B, a language written in 1970 for the first UNIX system. B in turn begat C.

The UNIX operating system and C are inextricably linked. UNIX is written in C, as are most UNIX applications. The UNIX C compiler is also written in C, at first sight a bizarre arrangement.

Dennis Ritchie of AT&T Bell Labs in New Jersey wrote the C language specifically to implement UNIX on the DEC PDP-11, so C and UNIX really developed out of each other.

C has evolved a lot since then. Today, any computing system that wants to be taken seriously has a C compiler. As it appeared on more and more platforms, extensions were made to the original standard laid down by Brian Kernighan and Dennis Ritchie in 1978 - the standard known as Kernighan and Ritchie (K&R) C.

In 1983 the American National Standards Institute (ANSI) set up a committee to draw all the changes together. The new standard it defined - ANSI C - was ratified in 1989 and has stood ever since.

This standard not only cleared up a set of ambiguities from the first definition, but it defined a set of standard functions which a C compiler must provide in the form of a library.

The standard opened up the way for machine-independent C to be written and ported easily between compilers on different platforms, secure in the knowledge that when a program asked for a file to be opened, the compiler would sort out how to do the actual opening.


Source: Acorn User - 145 - August 1994
Publication: Acorn User
Contributor: David Matthewman