C for Yourself - part 16

Steve Mumford delves into the intricacies of recursion.

You might be glad to know that we're reaching the end of our trek through the basics of C, and it's almost time to start applying our knowledge to more complex applications include those that run in the desktop. However, before we begin that particular quest I'd like to take a moment to investigate an interesting aspect of the language that has a reputation for causing some confusion. To be honest, recursion isn't something that's vital to know about, but if you understand how to apply the methods involved, you can save yourself time and produce some particularly neat and satisfying code.

Self-referencing and recursion

We bumped into the ideas behind recursion in last month's article - the reason we couldn't include a reference to a structure made inside its own definition was that the compiler would get trapped in a never-ending loop and fall over in a somewhat graceless manner. That's recursion as the compiler sees it; however, it's possible to do the same within the execution of the program:

void bottomless(void)
{
  printf("Now we call bottomless() again!\n");
  
  bottomless();
}

The above fragment defines a function that includes a reference to itself in the same way as the structures that we looked at last month.

In this case, the functions prints out a line of text, then calls itself and performs the routine again. In effect it creates an infinite loop, but there's a subtle difference. When a function is called, information concerning what the function was doing was written to something called the stack. It's an area of memory which is used as a form of scratch pad. This is where the computer stores any local variables being used at the time, and the address that it must return to after it's finished executing the current procedure. The stack can cope with more than one entry - if you've called a function which in turn calls another, both return addresses are stored in the correct order. In technical terms, this is known as a last in, first out stack because the newest address stored is always the first one to be returned to.

In the running of a normal program the stack remains quite small due to the fact that there's a finite number of function levels. However, if you include a function that calls itself, it's possible for the stack to grow rapidly as each call to the procedure adds another data entry to the list. If this goes on for too long, the area of memory set aside for the stack can overflow, and - yes, you've guessed it - we witness a crash. You might be wondering how a function that calls itself can ever avoid causing program failure, and the answer lies in providing an escape route which can be used once a certain condition is reached. A popular example of recursion is shown below, and it provides a factorial function:

int fact(int x)
{
  if (x==1)
  {
    return(1);
  } else
  {
    return(x * fact(x-1));
  }
}

The operation of the function is a little hard to describe, and it may help if you've got a bit of paper to doodle on, but here's what happens:

Performing the factorial of a number requires it to be multiplied by all the numbers below it, from one upwards. In order to carry this out, the procedure breaks the task down into individual steps. In common with a lot of recursive alogrithms, there are two distinct phases to the calculation - one is performed as the level of recursion is deepening, and the other comes back into operation while the recursion level works back upwards. In this case, the first task is to work out the extent of the sum - a process which is a little like expanding brackets in algebra. The second half of the routine actually performs the multiplication and hands the result back to the previous level, thus working its way out of the recursion pit.

Whenever the function is called, it checks the value of the number it's been passed. If the value is anything other than one, fact() calls itself to calculate the product of x and x-1, and the execution of the parent function is frozen just before it evaluates the multiplication.

If the number that's been passed to the function is equal to one, this signifies that the end of the calculation has been reached - instead of progressing any further, the function returns with the value 1. This is the turning-point of the whole process and from this point on, the computer will work its way back up the stack, performing all the outstanding multiplications and finally producing a result.

In order to illustrate this, let's assume the function fact() has been called with the value 3. The program makes the comparison mentioned above and then executes the line shown below, with the appropriate values filled in:

return(3 * fact(3 - 1));

At this point, fact() is called once more, and now we're a level deeper into the recursion. Yet again, the comparison is made and contol passes to the line shown:

return(2 * fact(2 - 1));

If we remember that we were halfway through an earlier multiplication when we dropped a level, we can write these two lines down together to form:

return(3 * (2 * fact(1)));

fact() is called one last time, and since the if condition is now successful, no further calls are made - the function returns with a value of one. We arrive in the middle of the previous function with a multiplication that can now be performed. Its result is passed back to the parent and, at long last, the answer to the original question can be produced. The stack has shrunk to its initial size, and the program can continue as normal.

Data structures such as binary trees and linked lists can be more easily handled if recursion is used - for example, if you were using a linked list of 20 elements and you wanted to delete the last 15, you would find yourself having to read and store the pointer to the following element before you deleted the current one - otherwise, the address of the next element is lost and there's no way of recovering them memory. It's perfectly possible this way, but it's a little ungainly. The recursive method steps down the list, checking the next element pointer and calling itself with that value if it's not NULL - in this way, it finds the end of the list. Once the routine discovers a NULL pointer, the recursion switches phase and works its way back, deleting each element as it goes.

Why use recursion?

It should be clear from the above example that the process of recursion gets pretty complex, and it's all too easy to slide into a quagmire of never-ending function calls. Writing bug-free recursive code requires a substantial amount of thought, and at first sight it might seem that recursion isn't worth the trouble. In most cases, it's easier to compose an interative version of an algorithm using the standard for-next and do-while loops, but there are situations where the opposite is true - for instance, the Quicksort algorithm is often programmed using a recursive approach. The speed of operation is another factor to be considered, since recursion involves making large numbers of function calls, and each one of those takes time. The simple answer is to avoid recursion unless you're completely happy with it, or you fancy a bit of mental athletics.

And now for something completely different

I think it's time we broadened our horizons - in the forthcoming issues, I'll start describing the methods of writing applications that run in the Desktop. There are several added complications, including the fact that the functions for accessing the WIMP depend on which C compiler you own. Next month, I'll take a look at the different Desktop libraries available and we'll choose one with which to continue.

In the meantime, you might like to look over the last few issues and write a few more progeams for yourself. There's no substitute for thrashing out your own software when it comes to learning a language, and it's always amusing to find a new way of crashing your computer - I'd be interested to hear of any particularly spectacular ones. See you next month.


Source: Acorn User - 160 - October 1995
Publication: Acorn User
Contributor: Steve Mumford