Bank Switching and Screen Access

I received only two replies to my request in Archive about ideas for articles based on ARM or Basic programming. Thanks to both of those people. One of the letters was very hardware-based and I'm sorry but I have no interest or experience in programming for hardware- or control-related problems.

The other letter was focused very much on games programming. This is a huge and complicated area in which I have not really done much work. However, that person did mention a few things that I can cover in this one article: fast screen clearing, lookup tables, bank switching and producing a starfield simulation. So, I shall illustrate the code used to produce a starfield simulation utilising bank switching and fast screen clearing. A short section on lookup tables is also included at the end. As there have been many articles in various magazines, including Archive, in the past, I shall assume that readers already have a reasonable grasp of the main principles of ARM assembly language.

Bank switching

The simplest of the above concepts is that of bank switching. Basically, some screen modes take up (or can be made to take up) less than half the available screen memory, so the rest can be used for other screen banks. On the Master and the old A310 - in fact, I think on any pre-RISC OS 3.5 machine - I remember that a screen mode with what were then known as shadow banks was selected by setting bit seven of the screen mode number, so

MODE 128+13

would change to mode 13 with as many extra screen banks as possible. There was, of course, all the obligatory messing about to try and claim enough screen memory. However, nowadays it seems that on my RiscPC600,

MODE 128+28

selects a screen mode with only one screen bank and simply

MODE 28

is enough to get at least two. With two screen banks, a sort of buffering effect can be produced, so a program can be writing to one bank while the operating system is displaying the other. Once writing to one bank is finished, the banks can be swapped and the other updated. This process usually happens continually throughout a program's execution. The only problem is that, due to the different banks being in effect different screens, what was drawn on one bank, if it is to remain on screen, must also be drawn on the other.

Under RISC OS, bank switching is achieved very simply through the use of three OS_Byte calls. The code required looks something like that below. Note that I have used r9 to hold the current bank number but, in reality, it may need to be loaded from memory.

.Loop
mov r0,#112      ; select screen bank
mov r1,r9
swi "OS_Byte"

... update things and draw to screen bank ...

mov r0,#19       ; wait for vsync
swi "OS_Byte"
mov r0,#113      ; display screen bank
mov r1,r9
swi "OS_Byte"

rsb r9,r9,#2     ; swap screen banks

b Loop

There are only two parts of this code which really require explanation. The first is the rsb instruction which simply toggles the bank number between zero and two every iteration. The actual banks are numbered one, two, three and so on, but zero is assumed to mean one, allowing the use of only this one instruction. The other is the call to OS_Byte 19. Most Basic programmers will probably be familiar with the WAIT command which is often called just before graphic plotting to help prevent flicker. Well, OS_Byte 19 is identical and is used above to prevent flicker when the screen banks are swapped.
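
The effect of the rsb instruction can be checked away from the machine. The short Python sketch below is only a model: next_bank is a made-up name standing in for rsb r9,r9,#2, and the OS_Byte calls themselves are not simulated.

```python
# Model of "rsb r9,r9,#2": the new bank number is 2 minus the old one.
def next_bank(bank):
    return 2 - bank

# Starting from zero (treated as bank one, as described above), record
# the bank number used on each of six passes round the loop.
bank = 0
sequence = []
for _ in range(6):
    sequence.append(bank)
    bank = next_bank(bank)

print(sequence)
```

The sequence alternates between zero and two, so each pass writes to the bank that is not currently being displayed.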

A starfield program

Our starfield program is to be written in assembly language and, for convenience, we will use the assembler within Basic. So, up until the first bit of assembly language, the start of the program is:

ON ERROR MODE MODE: REPORT: PRINT ERL: END
modeN%=28
SYS "OS_ReadModeVariable",modeN%,3 TO,,modeC%
IF modeC%<>63 ERROR EXT &FF,"Not a 256 colour mode"

MODE modeN%
SYS "OS_RemoveCursors"

directionx%=RND(3)-2
directiony%=RND(3)-2

SYS "OS_ReadModeVariable",modeN%,6 TO,,modeX%
SYS "OS_ReadModeVariable",modeN%,7 TO,,modeS%
modeY%=modeS%/modeX%

number%=(modeS%/768)-1

DIM code% 4096+number%*12
FOR pass%=0 TO 2 STEP 2
P%=code%
[OPT pass%

After setting up a Basic error handler, the first few lines check that the chosen screen mode (held in modeN%) has 256 colours and change to that mode.

Following that, two variables, directionx% and directiony%, are set up. Together these hold the direction in which the starfield will move. Each can hold one of three values: minus one for left or upward movement, plus one for right or downward (y is counted in screen lines from the start of screen memory, which is the top of the screen), or zero for no movement.

Next, some information on the selected screen mode is read using SWI OS_ReadModeVariable. The value in r1 signifies which variable to read. Firstly, the length in bytes of one screen line is read, then the total size of the screen, also in bytes, and finally the number of lines in the screen is calculated. The variable number% is then set up holding the number of stars to create. I find that the formula used, one star for every three-quarters of a kilobyte of screen memory, works well in most modes.
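
As a worked example of the arithmetic, the Python sketch below assumes a mode with 640 bytes per line and 307200 bytes of screen memory (the figures for a 640 by 480 mode at one byte per pixel). On a real machine the values come from OS_ReadModeVariable, and integer division here stands in for Basic's assignment to an integer variable.

```python
# Assumed figures for a 640x480 mode at one byte per pixel; the real
# program reads them with OS_ReadModeVariable (variables 6 and 7).
mode_x = 640        # bytes in one screen line  (modeX%)
mode_s = 307200     # bytes in the whole screen (modeS%)

mode_y = mode_s // mode_x          # number of screen lines (modeY%)
number = mode_s // 768 - 1         # value given to number%

print(mode_y, number)
```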

Now, rather than leaping straight into the assembly language code, it is probably more useful to look at the end of the program where a number of blocks of data, used by the code, are set up. After all, it is no use looking at the instructions for accessing data if you have no idea what form the data takes.

.Colours_Block
equb 45
equb 47
equb 210
equb 255

The above few lines define the colours of the different speeds of star, from darkest and slowest to lightest and quickest. The defaults above simply run from dark grey to white. They are default 256 colour mode palette numbers, identical to those that the Paint application shows in its palette when editing a 256 colour sprite.

.Variables_Template
equd 148
equd -1

The above lines set up a block to read the screen address when passed to OS_ReadVduVariables. VDU variable 148 holds the start of the screen used by the VDU drivers. This is basically the address in memory of the start of the current screen bank memory and thus changes every time the screen bank is changed.

.Variables_Block
equd 0
equd modeX%
equd modeY%
equd modeS%
equd 0

The above variables block is set up to hold a few variables used by the code. The first word will hold the start address of the current screen bank found with OS_ReadVduVariables. The second, third and fourth words are set up with the screen dimensions and memory size which were read at the start of the program. The final word will be used to store the current screen bank number.
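
The offsets used later in the code follow directly from this layout. A rough Python model, with field names invented purely for illustration and assumed values for a 640 by 480 mode, is:

```python
import struct

# Five 32-bit little-endian words, in the order they are assembled.
FIELDS = ["screen_start", "line_length", "line_count", "screen_size", "bank"]
OFFSETS = {name: 4 * index for index, name in enumerate(FIELDS)}

# Pack a block using assumed values for a 640x480, 256 colour mode.
block = struct.pack("<5I", 0, 640, 480, 307200, 0)

# Reading the word at offset 12 mirrors "ldr r10, [r12, #12]".
(screen_size,) = struct.unpack_from("<I", block, OFFSETS["screen_size"])
print(OFFSETS, screen_size)
```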

.Stars_Size
equd number%*12

The above two lines place the total size of the stars table in memory.

.Stars_Block
]
FOR position%=0 TO number%
[OPT pass%
equd 1+RND(modeX%-2)
equd 1+RND(modeY%-2)
equd RND(4)
]
NEXT
NEXT

The above lines define the stars table. This table holds three pieces of information on each star: x coordinate, y coordinate, and speed. The speed value is a number from one to four and is also used to determine the star's colour - slower stars have a darker colour.
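
For readers more at home in a high-level language, the same table can be modelled in Python. This is only a sketch: make_stars is a hypothetical helper, random.randint(1, n) plays the part of Basic's RND(n), and the screen dimensions and star count are assumed values.

```python
import random

def make_stars(count, mode_x, mode_y, rng):
    """Return a list of (x, y, speed) entries, one per star."""
    stars = []
    for _ in range(count):
        x = 1 + rng.randint(1, mode_x - 2)   # 1+RND(modeX%-2)
        y = 1 + rng.randint(1, mode_y - 2)   # 1+RND(modeY%-2)
        speed = rng.randint(1, 4)            # RND(4)
        stars.append((x, y, speed))
    return stars

stars = make_stars(400, 640, 480, random.Random(1))
table_bytes = len(stars) * 12    # three 4-byte words per entry
print(len(stars), table_bytes)
```

Every star starts at least two pixels in from the top and left edges and one pixel in from the bottom and right, so the whole of its cross lies on the screen when it is first plotted.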

CALL code%
END

Finally, at the end of the program, the code is called.

Right, now that all the Basic stuff is out of the way, we can get down to the actual ARM code which begins with the short procedure listed and described below. All the code that follows appears between the opening assembly bracket, [, and the initialisation of the colours block.

.Display_Stars
stmfd r13!, {r14}
adr r12, Variables_Block
adr r11, Stars_Block
ldr r10, Stars_Size
add r10, r10, r11

Firstly, some main registers are initialised. The address of the program variables is stored in r12, the address of the start of the stars table in r11, and the address of the table end is calculated and placed in r10.

.Display_Stars__Loop
mov r0, #112
ldr r1, [r12, #16]
swi "OS_Byte"

Now a loop is begun - this is the main program loop and continues until the user presses <escape>. At the start of the loop, the current screen bank is loaded from the variables (at offset 16) and selected for writing using OS_Byte 112.

adr r0, Variables_Template
mov r1, r12
swi "OS_ReadVduVariables"

Then the address of the start of the screen bank is found using OS_ReadVduVariables. This address is stored in the first word of the variables block described above.

bl Clear_Screen

Next, a subroutine - described later - is called to clear the recently selected screen bank.

bl Update_Stars

Then another subroutine is called to plot and update all the stars in the table.

mov r0,#19
swi "OS_Byte"
mov r0, #113
ldr r1, [r12, #16]
rsb r2, r1, #2
str r2, [r12, #16]
swi "OS_Byte"

Next the screen banks are swapped. Note that, because OS_Byte corrupts r1, it is best to update and store the number before calling the SWI. This prevents the number having to be loaded from memory twice.

swi "OS_ReadEscapeState"
bcc Display_Stars__Loop
mov r0,#126
swi "OS_Byte"
ldmfd r13!,{pc}^

Finally, the loop continues until OS_ReadEscapeState exits with the carry flag set, signifying that the user has pressed <escape>. OS_Byte 126 is then called to clear the escape condition, preventing Basic from reporting it as an error.

The above code calls two other ARM subroutines. The first of these is a fast routine to clear the current screen bank:

.Clear_Screen
stmfd r13!,{r10-r11,r14}
ldr r11, [r12, #00]
ldr r10, [r12, #12]
add r10, r10, r11

After the beginning of the subroutine, the start of the screen bank and its length are loaded from the variables block. The start is stored in r11 and the length is used to calculate the end of the screen bank. This end address is stored in r10.

mov r9, #0
mov r8, #0
mov r7, #0
mov r6, #0
mov r5, #0
mov r4, #0
mov r3, #0
mov r2, #0
mov r1, #0
mov r0, #0

Next, all the registers r0-r9 are reset to zero. Another number between zero and 255 may be used to clear the screen to a colour other than black. Note that this number must be repeated across the four bytes of each register, so green (number &63) would require &63636363 in each of r0 to r9.
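
Repeating the colour number across a word is a simple piece of arithmetic, sketched here in Python (fill_word is an illustrative name, not part of the program):

```python
def fill_word(colour):
    """Repeat an 8-bit colour number in all four bytes of a 32-bit word."""
    if not 0 <= colour <= 255:
        raise ValueError("colour must fit in one byte")
    return colour * 0x01010101

print(hex(fill_word(0x63)))
```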

.Clear_Screen__Loop
]
FOR loop%=0 TO 24
[OPT pass%
stmdb r10!, {r0-r9}
]
NEXT
[OPT pass%
stmdb r10!, {r0-r5}
cmp r10, r11
bne Clear_Screen__Loop

ldmfd r13!, {r10-r11, pc}^

Next, a loop is used to store the registers r0-r9 25 times, and r0-r5 once, on each iteration. As each register is one word, this means that a total of 1024 bytes is stored on every loop. All screen sizes are a round number of kilobytes, so the loop can simply repeat until the screen is full. This subroutine is very slightly faster than CLS on my machine, but I can't see any obvious way to make it quicker while maintaining mode independence.
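
The byte count can be confirmed, and the downward-moving store modelled, with a few lines of Python. The bytearray here stands in for screen memory and its size is an assumed value; nothing is claimed about speed.

```python
WORD = 4
# 25 stores of ten registers plus one store of six registers.
bytes_per_pass = (25 * 10 + 6) * WORD

def clear_screen(screen, colour=0):
    """Fill 'screen' from the end down to the start, 1024 bytes a pass."""
    if len(screen) % bytes_per_pass:
        raise ValueError("screen size must be a whole number of kilobytes")
    end = len(screen)                  # plays the part of r10
    passes = 0
    while end != 0:                    # cmp r10, r11 / bne
        end -= bytes_per_pass          # stmdb moves the pointer down
        screen[end:end + bytes_per_pass] = bytes([colour]) * bytes_per_pass
        passes += 1
    return passes

screen = bytearray(b"\xff" * 307200)   # assumed 640x480, 256 colour screen
passes = clear_screen(screen)
print(bytes_per_pass, passes)
```

The size check in the sketch makes explicit the assumption the ARM loop relies on: were the screen not a whole number of kilobytes, the bne test would never see the two addresses become equal.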

The final piece of code, which plots and updates all the stars, is very important but also quite simple.

.Update_Stars
stmfd r13!, {r11, r14}

The register r11 is stored before entry to the main subroutine code. This register initially holds the address of the start of the stars table but is updated through the code to point to each star in turn.

.Update_Stars__Loop
ldr r9, [r11, #0]; x
ldr r8, [r11, #4]; y
ldr r7, [r11, #8]; s

Firstly, the position and speed of a star are read from the table. Its x position is loaded into r9, its y position into r8, and its speed into r7.

ldr r0, [r12, #04]
mla r6, r8, r0, r9

Next, the length of a screen line is read from the variables block pointed to by r12, and the star's address is calculated. The address is found by multiplying the y position by the line length and adding the x position.

adr r1, Colours_Block-1
ldrb r2, [r1, r7]

Now the star's colour is read from the colours block using its speed as an index. As the speed values start at one rather than zero, the address one byte before the first colour is used as the base.
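
Subtracting one from the table address has the same effect as subtracting one from the index, as this small Python sketch shows (star_colour is an illustrative name):

```python
COLOURS = [45, 47, 210, 255]    # the four entries of Colours_Block

def star_colour(speed):
    """Return the palette number for a speed in the range 1 to 4."""
    if not 1 <= speed <= 4:
        raise ValueError("speed must be between 1 and 4")
    return COLOURS[speed - 1]    # same byte as [Colours_Block-1 + speed]

print([star_colour(s) for s in (1, 2, 3, 4)])
```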

ldr r5, [r12, #00]
strb r2, [r5, r6]
add r1, r6, #1
strb r2, [r5, r1]
sub r1, r6, #1
strb r2, [r5, r1]
add r1, r6, r0
strb r2, [r5, r1]
sub r1, r6, r0
strb r2, [r5, r1]

The above section of code actually plots the star. Each star is drawn as a five pixel cross. The first two instructions store a point at the star's screen position. The next two adjust the address and plot the point just to the right. Then the point to the left is stored. Finally, the points below and above the centre are plotted.
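
The plotting can be modelled in Python with a bytearray in place of the screen bank. The names and the tiny 8 by 8 "screen" are assumptions made for the example; like the ARM code, the sketch does no bounds checking of its own.

```python
def plot_star(screen, line_length, x, y, colour):
    """Plot a five pixel cross centred on (x, y)."""
    centre = y * line_length + x          # the mla instruction
    for offset in (0, 1, -1, line_length, -line_length):
        screen[centre + offset] = colour  # one strb per point

line_length = 8
screen = bytearray(line_length * 8)       # 8 lines of 8 one-byte pixels
plot_star(screen, line_length, 3, 2, 255)

for row in range(8):
    line = screen[row * line_length:(row + 1) * line_length]
    print("".join("*" if pixel else "." for pixel in line))
```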

]
IF directionx%=+1 THEN
[OPT pass%
ldr r0, [r12, #04]
add r9, r9, r7
cmp r9, r0
movge r9, #1
str r9, [r11, #00]
]
ENDIF

Next, if the starfield is moving right, the star's x coordinate is increased by its speed. If that coordinate is then greater than or equal to the screen width held in the variables block, it is reset to one, so stars moving off the right of the screen reappear on the left. Note that conditional assembly is used: the above code is only assembled if directionx% is plus one, so if the starfield is not moving right the code is omitted.

IF directionx%=-1 THEN
[OPT pass%
ldr r0, [r12, #04]
sub r9, r9, r7
cmp r9, #0
suble r9, r0, #1
str r9, [r11, #00]
]
ENDIF

The above code is similar to the rightward movement code. A check is made in case the star has moved off the lefthand edge of the screen. If so, it is moved back to the righthand side. This code is only assembled if directionx% is minus one, i.e. if the starfield is moving left.

IF directiony%=+1 THEN
[OPT pass%
ldr r0, [r12, #08]
add r8, r8, r7
cmp r8, r0
movge r8, #1
str r8, [r11, #04]
]
ENDIF
IF directiony%=-1 THEN
[OPT pass%
ldr r0, [r12, #08]
sub r8, r8, r7
cmp r8, #0
suble r8, r0, #1
str r8, [r11, #04]
]
ENDIF

The next two pieces of code, above, update the star's y position depending on the value of directiony%. In a similar manner to the x movement, only one of the two pieces of code is actually assembled.
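
The four variants share one pattern, which may be easier to see in a high-level form. In this Python sketch (move is an invented name) limit is the line length for x or the number of lines for y, and the tests mirror the movge and suble instructions.

```python
def move(position, speed, direction, limit):
    """Return the new coordinate after one frame, wrapping at the edges."""
    if direction == +1:
        position += speed
        if position >= limit:      # cmp / movge
            position = 1
    elif direction == -1:
        position -= speed
        if position <= 0:          # cmp / suble
            position = limit - 1
    return position                # direction 0: no code is assembled

print(move(638, 4, +1, 640), move(3, 4, -1, 640), move(100, 2, 0, 640))
```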

[OPT pass%
add r11, r11, #12
cmp r11, r10
blt Update_Stars__Loop
ldmfd r13!, {r11, pc}^

Finally, r11 is moved on to the next entry and the loop repeats until the end of the stars table is reached.

The above program is a fairly simple illustration of how a starfield may be written. It is possible to have increasingly complicated variations such as having the starfield change direction or colour as the program runs, making the starfield move at angles other than the obvious ones (multiples of 45 degrees), and even having the stars run along a curve.

Lookup tables

As the above program didn't really illustrate the use of a lookup table, code for one is shown below. Note that the example below has no direct bearing on the previous program.

.Position_Table
]
FOR angle%=0 TO 359
[OPT pass%
equd 64*COS(RAD(angle%))
equd 64*SIN(RAD(angle%))
]
NEXT

The code above creates the actual table which, in this case, holds position offsets for 360 points, one per degree, on a circle of radius 64. Always remember (and I have been caught out in the past by this) that a machine word or register holds a 32-bit integer; thus, to be of any use, sine and cosine values must be scaled up for storage. A table such as that above is simply a list of entries in order, similar to a one-dimensional array in Basic. In this case, each entry is two words but any size is possible, powers of two being preferable as they are much easier to code.
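
For comparison, an equivalent table can be built in Python. The sketch assumes that the assembler truncates each real value towards zero when storing it as a word, as Basic does when assigning to an integer variable; int() is used to imitate this.

```python
import math

RADIUS = 64
# One (cosine, sine) pair per degree, scaled up so it fits in an integer.
position_table = [
    (int(RADIUS * math.cos(math.radians(angle))),
     int(RADIUS * math.sin(math.radians(angle))))
    for angle in range(360)
]

print(position_table[0], position_table[90], len(position_table))
```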

The instructions below may be used to read an entry from the table:

; r0 is the angle required (0 to 359)
adr r3, Position_Table
ldr r1, [r3, r0, lsl#3]!  ; cosine value (first word of entry)
ldr r2, [r3, #4]          ; sine value (second word of entry)

Firstly, the adr instruction finds the start address of the table. Next, the first word is read. Writeback (signified by the final exclamation mark) is used to update r3 to the address of the value just read. This means that the second word in the entry may be read without an additional increment instruction. If the table holds more than two words for each entry, some adjustments must be made. The lsl#3 must be changed to produce the size of each entry: lsl#4 for four words, lsl#5 for eight, and so on. Additional ldr instructions can then be added to load the data. An entry in a four word table might be read as follows:

adr r5, Table_Address
ldr r1, [r5, r0, lsl#4]!
ldr r2, [r5, #4]!
ldr r3, [r5, #4]!
ldr r4, [r5, #4]
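
Returning to the two word table, the addressing can also be followed in Python, where struct does the work of the ldr instructions. This is a model only: the table is rebuilt here so that the example is self-contained, read_entry is an invented name, and truncation towards zero is assumed when the values are stored.

```python
import math
import struct

# Build the table as raw bytes: two signed 32-bit words per angle.
table = b"".join(
    struct.pack("<2i",
                int(64 * math.cos(math.radians(angle))),
                int(64 * math.sin(math.radians(angle))))
    for angle in range(360)
)

def read_entry(table, angle):
    """Return the two words of the entry for 'angle' (0 to 359)."""
    offset = angle << 3                      # lsl#3: eight bytes per entry
    return struct.unpack_from("<2i", table, offset)

print(read_entry(table, 0), read_entry(table, 180))
```

The shift by three is what makes an entry size of two words convenient: the angle is turned into a byte offset without a separate multiply.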

That's it

Hopefully, this article may be of some use to someone. There is obviously room for further discussion of these topics, but I think I've probably exhausted everybody's interest, at least for one month. Anyway, I'd better get back to revision for my exams - in one week, arrgh! Feel free to borrow any amount of code from the program if you can find a use for it.

For more information on ARM assembly language, the best bet is to try and find a copy of Mike Ginns' excellent book, Archimedes Assembly Language, which I think is unfortunately now out of print.


Source: Archive Magazine 12.10
Publication: Archive Magazine
Contributor: Nicholas Marriott