
ARM Code for the truly petrified - Part 1

The start of a new series by Matthew Bloch intended to squeeze every drop of speed from your programming.

What do BASIC and ARM code programming have in common? When you turn your machine on, the ARM chip starts on the instruction at memory address zero, and the whole of RISC OS is initialised from there. After each instruction, the chip moves on four bytes, or 32 bits (the space needed to store a single ARM code instruction) and executes the next instruction, just like lines of BASIC.

The ARM chip has sixteen registers called R0-R15, and each register can store a 32-bit number. The sixteenth register is called the program counter, and it stores the address at which the ARM chip is currently executing. It is referred to for this reason as PC. BASIC has a limitless number of variables, but in ARM code we have to make careful use of registers, and store those we are not using in the memory for recall later.

So, here's an instruction: MOV R0,R1. This instruction moves the contents of register one (R1) into register zero (R0). The 'destination' register, the one that will be changed, always comes first in ARM code instructions. What the ARM chip sees is the code &E1A00001 stored in memory; the 'instruction' as we humans see it is only a longhand (mnemonic) so we don't have to work out these codes for ourselves. What an assembler (or compiler) does is to take these lists of longhand instructions and translate (compile) them into unintelligible machine code.
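To make the link between the mnemonic and the code &E1A00001 concrete, here is a small sketch in Python (used purely for illustration; it is not part of the CubeRoot program). The function name encode_mov_reg is made up for this example, and it only handles the plain, unconditional register-to-register form of MOV.

```python
def encode_mov_reg(rd, rm):
    """Encode an unconditional MOV Rd,Rm as a 32-bit instruction word."""
    if not (0 <= rd <= 15 and 0 <= rm <= 15):
        raise ValueError("register numbers must be in the range 0..15")
    cond_always = 0xE << 28   # condition field: always execute
    opcode_mov = 0xD << 21    # data-processing opcode for MOV
    # Destination register sits in bits 12-15, source register in bits 0-3.
    return cond_always | opcode_mov | (rd << 12) | rm


print("&%08X" % encode_mov_reg(0, 1))    # MOV R0,R1  -> &E1A00001
print("&%08X" % encode_mov_reg(15, 14))  # MOV PC,R14 -> &E1A0F00E
```

The first line printed matches the code quoted above, which is all an assembler is really doing for this instruction: slotting register numbers into fixed positions in a 32-bit word.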

BASIC, conveniently, has an assembler built into it. Once you've assembled your code, you can then set the PC to start running it, which means you can make the ARM chip do your bidding! It's as simple as reserving some memory in which to store your instructions, assembling them there, and then running them. Note that not all four-byte codes are legitimate instructions; trying to execute an illegitimate one will give an 'undefined instruction' error. Other instructions include ADD, SUB(tract), MUL(tiply) and ORR (logical OR, with an extra R to make it three letters). All ARM instructions can be executed conditionally, that is to say, you can set an instruction to execute depending on the result of a previous one. You'll see examples of this below.

So, manipulating registers is all very well, but this seems a world away from writing 'real' programs to make graphics and sound. The answer is in the memory; certain areas of it are 'mapped' onto external devices such as the video controller. So if we write a byte into the screen memory, it will be displayed by the video controller. There is also a sound buffer where any bytes stored will be converted to an analogue signal and output through the speakers. Then on top of these crude methods, RISC OS provides us with a huge library of routines (introduced next month) to perform common tasks such as printing text, drawing graphics and playing music without having to resort to writing directly to the hardware; in fact using system calls is the norm, since they cause fewer compatibility problems than direct hardware access.

Fast cube root

Try this simple task: there's a square root function in BASIC, but there isn't a cube root, so write a function in BASIC to find the cube root of any positive whole number. It only has to find it to the nearest integer, don't worry about the fraction part. Chances are, those who aren't mathematicians (and I'm not) will have come up with something similar to this:

Fig. 1 - BASIC cube root function

DEF FNcube_root(n%):check%=0:result%=0
WHILE(result%<n%):check%+=1
result%=check%*check%*check%
ENDWHILE
=check%

This function will always give an overestimate if the input value isn't an exact cube. Now let's see how fast this function is; try finding the cube root of every number from 1 to 10,000. My RISC PC gives a time of 4.62 seconds, though you may be waiting up to ten times this with an ARM2. Either way it's still faster than doing it by hand. Look at the example 'CubeRoot' program to see how to do it in 0.2 seconds.
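For readers without a BASIC prompt to hand, the same search can be sketched in Python. This is only a transcription for illustration: the name cube_root_ceiling is invented here, and the counter starts at zero, as the ARM routine's does. It shows the overestimate mentioned above: exact cubes come back exactly, anything else is rounded up.

```python
def cube_root_ceiling(n):
    """Count upwards until the cube of the counter reaches or passes n."""
    check = 0
    result = 0
    while result < n:
        check += 1
        result = check * check * check
    return check


for n in (8, 9, 27, 28):
    print(n, cube_root_ceiling(n))
# prints:
# 8 2
# 9 3
# 27 3
# 28 4
```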


To see how this is done, load the program into your favorite editor and take a look at how it's written: the PROCassemble procedure may look alien, but it demonstrates how to write a small ARM code routine with minimal effort. The form of the DIM statement may be confusing without brackets: what it does is reserve 256 bytes of memory and store the start address of that memory in the 'code' variable. P% is a variable treated specially in BASIC; it tells the ARM code compiler where in memory to start compiling instructions. When you want to start assembling, the [ character turns BASIC into a compiler; rather than look for lines of BASIC, it will take each subsequent line as ARM code and assemble it at P% until the next ]. Everything in between should be ARM code. The OPT switch should follow each [ character to set the various options for the compiler; OPT 2 is a straight compile, with errors reported and nothing else. Its uses will become clear later; for now just use it!

The ARM code is going to mimic the flow of the BASIC function closely, so let's use R0 as our n% (the number to be cube-rooted), R1 as the number we're currently checking (check%) and R2 to hold the result of cubing R1 (result%). Throughout this discussion, look back and see how each line of ARM code compares with the BASIC function; note that we only actually need space for three integer variables.

We declare the start of the routine with a label, .cube_root. This stores the address of the start of the routine in the BASIC variable cube_root, so we can 'CALL cube_root' later. CALL is a BASIC statement which executes an ARM code routine. It passes the integer variables A% to H% in registers R0 to R7, so we can pass parameters easily. result% = USR(cube_root) will do the same thing, but USR is a function and will pass back R0 into a variable when the routine finishes.

When we call an ARM code routine from BASIC, we need to be able to pass control back to BASIC when the routine is finished. What the interpreter does is pass the return address in R14, which is sometimes referred to as the link register. Then when we need to return to BASIC, we just copy the link register into R15 with MOV PC,R14; this is equivalent to GOTO in BASIC. You can see this instruction is the last one in the routine; if you don't include it, the processor will carry on romping through the memory executing instructions, data, and anything else it finds, probably resulting in a crash of some sort.
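The calling convention can be pictured with a rough Python model. This is a sketch and not how the interpreter is actually written: the names usr and double_it are invented, registers are just a list of sixteen integers, and returning from the Python function stands in for MOV PC,R14. The eight variables A% to H% fill the eight registers R0 to R7.

```python
def usr(routine, **integer_vars):
    """Rough model of USR: load A%..H% into R0..R7, run the routine, hand back R0."""
    regs = [0] * 16
    for index, name in enumerate("ABCDEFGH"):
        regs[index] = integer_vars.get(name, 0)
    routine(regs)   # the Python return plays the part of MOV PC,R14
    return regs[0]  # USR passes R0 back to BASIC


def double_it(regs):
    """Stand-in for a machine code routine: R0 = R0 + R0."""
    regs[0] = regs[0] + regs[0]


print(usr(double_it, A=21))  # prints 42
```

A model of CALL would be the same minus the final return value, which is why the cube root routine is run with USR.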

We must set R1 (check%) to zero before going into the main cube searching loop. If you want to load a register with a constant value, you can, subject to certain restrictions: MOV R1,#0 sets R1 to zero, just as MOV R1,#1 would set R1=1, though not ALL numbers can be loaded this way; more on this later. The top of this loop is marked out with .cube_search so we can jump back to this point easily. For each cycle, we:

Increase our counter R1 by 1.
Cube it, leaving the result in R2 (i.e. R2 = R1*R1*R1).
Check whether the cube in R2 is greater than or equal to the number to be cube-rooted (R0).
If R2 is smaller than R0, we loop around again until it isn't.
Otherwise, return with our value of R1 as the nearest cube root.
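The steps above can be sketched in Python with one variable per register, which may make the later ARM listing easier to follow. The function name cube_search_model is invented for this sketch, and Python integers never overflow, so it quietly ignores the 32-bit limit of real registers.

```python
def cube_search_model(n):
    """Follow the loop step by step, using r0, r1 and r2 as the registers."""
    r0 = n   # number to be cube-rooted
    r1 = 0   # counter, set to zero before the loop
    while True:
        r1 = r1 + 1      # increase the counter by 1
        r2 = r1 * r1     # first multiply: the square
        r2 = r1 * r2     # second multiply: the cube
        if r2 < r0:      # still too small, so go round again
            continue
        return r1        # otherwise r1 is the nearest cube root


print(cube_search_model(27))  # prints 3
print(cube_search_model(28))  # prints 4
```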

The routine does this process with a few straightforward instructions, listed and explained below. So, there's a simple ARM code routine in a few lines.

.cube_search

this sets the BASIC variable cube_search to equal the address in memory of the next instruction

ADD R1,R1,#1

means R1 = R1 + 1 (i.e. increase counter)

MUL R2,R1,R1

means MULtiply, R2 = R1 * R1

MUL R2,R1,R2

means R2 = R1 * R2 (since cubing has to be done with two MUL instructions)

CMP R2,R0

stands for 'compare R2 with R0'. The result of this comparison is stored in the processor's status flags (explained later). These status flags indicate (for the purposes of this routine) whether the last two numbers compared were greater than, equal to, or less than each other. In this case, we only want to carry on the loop if our result register is less than the number we're checking (i.e. if R2 < R0), otherwise we know we've found the nearest cube root.

BLT cube_search

This is the branch instruction (B) with a condition code attached. A branch instruction makes the processor jump to another point in memory, usually marked out by an assembler label. You can attach a condition code to any instruction, so it will only be executed under certain conditions. Here, we see the LT condition code which stands for 'less than'. So this will only (B)ranch back to the top of the loop if R2 was (L)ess (T)han R0, as decided in the previous instruction. Some other condition codes include GT (greater than), GE (greater than or equal), and EQ (equal to).

MOV R0,R1

R1 contains the cube root found, but R0 is the only register we can pass back to BASIC. The result of the USR(...) function in BASIC is the contents of R0 when the routine exits, so we need to move the result into R0 before exiting.
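As a preview of how CMP and the condition codes fit together, here is a Python sketch of the comparison. It assumes the standard ARM arrangement, in which CMP does a 32-bit subtraction, throws the answer away and keeps four flags (N, Z, C and V), and in which LT is the signed test 'N differs from V'. The function names cmp_flags and condition_passes are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Model CMP a,b: subtract b from a in 32 bits and keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "N": sign_r == 1,                             # result looks negative
        "Z": result == 0,                             # operands were equal
        "C": a >= b,                                  # no borrow was needed
        "V": sign_a != sign_b and sign_r != sign_a,   # signed overflow
    }


def condition_passes(code, flags):
    """Decide whether an instruction carrying this condition code would run."""
    if code == "EQ":
        return flags["Z"]
    if code == "LT":
        return flags["N"] != flags["V"]
    if code == "GE":
        return flags["N"] == flags["V"]
    if code == "GT":
        return (not flags["Z"]) and flags["N"] == flags["V"]
    raise ValueError("condition code not covered by this sketch: " + code)


# CMP R2,R0 with R2 = 8 and R0 = 27: the cube is still too small, so BLT branches.
print(condition_passes("LT", cmp_flags(8, 27)))   # prints True
# With R2 = 27 the cube has been reached, so BLT falls through to MOV R0,R1.
print(condition_passes("LT", cmp_flags(27, 27)))  # prints False
```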

We can then find a cube root by setting:

A%=number

... and then ...

cube_root%=USR(cube_root)

as the BASIC does. Next month: a file processor and an introduction to SWIs.

Source:	Archimedes World 13.13
Publication:	Archimedes World
Contributor:	Matthew Bloch