About C for Yourself

C for Yourself - part 34

Steve Mumford looks at some more sorting techniques

Conceptually, the simplest sorting method is probably the bubblesort, where the computer examines pairs of numbers and gradually rearranges the data so that with the highest value floats to the 'surface'. As discussed last time, its simplicity makes it easy to implement, certainly for fixed arrays of known size. However, any attempt to store data in a less rigid fashion (within linked lists, for instance) brings with it a greatly increased degree of complexity.

How do you improve the efficiency of a sorting technique? The bubblesort is essentially the lowest of the low in that regard and it's not universally loved. Another technique that's easy to understand is called straight insertion and instead of considering pairs of numbers throughout the sorting operation, it takes a different tack. It examines the first two elements in the list and places them in order - from there, it has a look at the third element and compares it with the first two, inserting it into the correct position.

Now there are three elements in order, so the algorithm can grab the fourth, insert that and so on. By the time the end of the list is reached, the data should be fully ordered. It might help to think of the way you order a hand of playing cards as an extra one is dealt to you in turn. Because this method builds from the ground up rather than tackling the whole block of data at once, the maximum time that the sort could possibly take is reduced. However, it's still notably inefficient.

A modification of this algorithm that can make it a little bit quicker is known as Shell's Method or simply shellsort. This technique works by doing a few quick insertion sorts over the data to begin with, shifting the elements into approximately the right locations. These successive sorts widen in scope until there's one sort that covers the entire array; it's that one that brings the elements into their final resting places. The first pass looks at numbers that are far apart in the starting sequence - a large number of sorts is performed with just a couple of numbers in each group - but with each successive iteration the spacing between the elements chosen for each group is reduced, leading to a few sorts with many elements in each set.

Other types of sort

If dealing with a large amount of information, two other sorting techniques are worth considering - the quicksort, invented by C.A.R. Hoare, and the heapsort, designed by J.W.J. Williams. Typically, the quicksort is faster and normally the fastest algorithm for sorting a large number of elements, but it's a somewhat complex process and requires extra memory to work. The heapsort, on the other hand, doesn't require any extra memory and although it's normally slower than the quicksort, can be easier to program.

The quicksort works by taking a set of data and gradually partitioning it into smaller and smaller portions - once they reach a certain size, it's possible to use a simpler sort such as the straight insertion technique. Once all these miniature data sets have been ordered, they can then be strung together once more to give the final result.

The partitioning is done by taking a data element, scanning through from both ends of the data set for numbers greater than this on one side and less on the other, and swapping them over - a point is reached where one side of rhe set is less than the partitioning element and the other side is greater. These two new partitions can then be sorted in the same manner and the process repeats until the units are small enough to be tackled by a more standard algorithm.

The heapsort works by building up relations between data elements in a way that resembles a family tree - each element has two children and one parent. As the tree is built up, the lowest numbers becomes the youngest children right at the farthest branches. Building the heap is the first stage - to sort it, the top of the heap is removed and the eldest child below it is promoted to take its place. In turn, its eldest child is also promoted and so on.

For more information about these methods, plenty of books are available that explain the algorithms concerned along with bits of source code. Next month we'll get back on the track of our embryonic WIMP application, filling in the cracks and making it fully operational. One more thing - if you've been writing your own applications in C and come across any problems, drop me a line and I'll try to help; if we can make a collection of the most common, I'll put them together for a future column.

Source:	Acorn User - 178 - February 1997
Publication:	Acorn User
Contributor:	Steve Mumford