Assembly Language - Array Scanning

In my last article, I described how details of variables listed after the CALL command are made available in a 'descriptor block', pointed to by R9, at the start of your machine code program. R10 holds the total number of variables passed - 0 if there were none - and this allows your program to detect whether it has been given the correct parameters or not.

The block at R9 contains a two-word entry for each parameter, in reverse order to that in which they were listed after CALL. For each entry the first word is an address, either pointing to the contents of the variable itself or to a further descriptor block in the case of strings and arrays.

The second word is a type code indicating what type of variable we are dealing with and, in last month's article, I gave a table of these type codes. The most important are: &004 (integer variable), &080 (string variable), &081 (CR-terminated string), &104 (integer array) and &180 (string array).

The easiest way to demonstrate how to access Basic variables from machine code is to give a working example. I shall describe a routine which makes use of the values pointed to by R9 to search a whole array for a given value, either string or integer depending on the type of the array. Unlike Basic procedures, where you have to define the type of parameter to be expected, machine code routines can have 'dynamic parameters'; you can pass a string when you call your routine from one part of your program and a floating-point value in another context (though it's hard to imagine what kind of operation such a routine could usefully perform!). You can even pass varying numbers of parameters in different circumstances, checking the value in R10 to find out how many to expect.

For my example, I shall use a routine with two parameters: a whole-array reference, randomstringarray$() or int%(), and a string or integer indirection operator, $search or !search2, indicating the value to search for.

To call it, we would use, for example:

DIM search 20,search2 20
$search="cart":!search2=-12
CALL ourcode%,randomstringarray$() ,$search
CALL ourcode%,int%(),!search2

where randomstringarray$() and int%() are arrays that have already been dimensioned and filled.

Before entering the assembler, I shall name three of the ARM registers, R5, R3 and R8, as entries%, ptr% and type%:

entries%=5:ptr%=3:type%=8
REM register aliases

For two parameters, the descriptor block will be four words long.

Offset Value Represents
block!0 address pointer to last value in parameter list
block!4 &004/&081 type of last value in parameter list
block!8 address pointer to first value in parameter list
block!12 &104/&180 type of first value in parameter list

Reading the parameters

LDR  type%,[R9,#12]  ;get value at
                     ;block!12
                     ;(parameter type)
LDR  R0,[R9,#8]      ;get value at
                     ;block!8 (address 
                     ;of this parameter)
LDR  ptr%,[R0]       ;parameter is
                     ;pointer to start
                     ;of array block

Register ptr% now holds the address of the start of the array's descriptor block, and type% holds either the value &180 or &104, depending on what type of array we used.

Array descriptor blocks

For a whole-array type parameter, the pointer you get points not to the actual start of the array data but to an array descriptor block whose size depends on the number of dimensions in your array. It is worth remembering that Basic array subscripts start from element 0; in other words, DIM a$(7) actually creates eight elements, a$(0) to a$(7), and DIM a$(7,2) creates twenty-four - not only a$(0,1) to a$(7,1) and a$(0,2) to a$(7,2), but also a whole extra row consisting of a$(0,0) to a$(7,0)!

The start of an array descriptor block consists of a list of words giving the number of elements in each dimension of the array, followed by a word set to zero to mark the end of the list, followed by another word giving the total number of elements in the whole array. For the two-dimensional example given above, the descriptor block looks like this (remembering that 8×3 =&18 in hex!):

For the purposes of our example program, we are going to assume that our input array is one-dimensional, since all array elements are, in any case, stored consecutively in memory after the end of the descriptor block, and simply scan through the whole lot.

Finding the start of the array data

We are going to use register entries% as a loop counter to tell us when we have reached the end of the array - and, if we find a match, we are going to use the difference between the value of entries% at that point and its initial value to calculate the subscript of the array element which matched (assuming a one-dimensional array, of course).

.repeat
LDR  R0,[ptr%],#4       ;skip over all
                        ;subscript limits
                        ;(first words in
                        ;array block)
CMP  R0,#0              ;until we get to zero word
BNE  repeat

LDR  entries%,[ptr%],#4 ;this next
                        ;word holds total
                        ;number of entries
MOV  R11,entries%       ;save initial value in R11 for
                        ;later cosmetic use

This is the 'clever' bit. We need to branch to a different comparison routine depending on whether we are expecting to deal with numbers or strings.

CMP  type%,#&180        ;check that type
       ;is &180 (whole string array)
BEQ  printstring
CMP  type%,#&104        ;or &104 (whole
                        ;integer array)
BEQ  printnum

We also supply a trap to deal with the case where we were passed the wrong type of parameter altogether - perhaps int%(5) instead of int%().

FNmessage("Parameter error - must pass array_name()")
MOV  R15, R14          ;exit routine

Scanning an integer array

Once we have found the start of the actual array data (it's the next word after the 'total number of entries' word, and register ptr% is therefore already pointing to it), scanning an integer array is relatively easy. We simply look at each word in turn and compare it against the integer value !search2 which was the second parameter passed to the routine and hence at the start of the block pointed to by R9.

.printnum              ;for integers (simple)
LDR  R7,[R9,#0]        ;load R7 from
                       ;block!0 (pointer
                       ;to !search2)
LDR  R8,[R7]           ;load R8 from R7

Properly speaking, I suppose we ought to check the type code value at [R9,#4] first to make sure that this really is a single integer value.

We simply load the next word from ptr% (using post-indexed addressing so that ptr% is automatically incremented), compare the two values and branch to an appropriate point if a match is found.

LDR  R0,[ptr%],#4      ;load next word
                       ;(which is value
                       ;of array entry!)
FNprintnumber          ;print number in
                       ;R0 as string
CMP  R8,R0
BEQ  abortwhenfound    ;check for match
                       ;with !search

If the counter hasn't reached zero, we branch back to the start of the loop (skipping the initialisation carried out by the first two instructions) and repeat for subsequent words.

SUBS entries%,entries%,#1
                       ;decrement counter
BNE  printnum+8        ;back to start of
                       ;loop (we don't need
                       ;to reset R7 and I'm
                       ;too lazy to create a
                       ;new label!)

If the counter does reach zero (and we still haven't branched to .abortwhenfound) then we know that the search failed. We complain and exit.

FNmessage("Nothing found")
MOV  R15, R14          ;exit routine

Successful result

.abortwhenfound        ;print results
                       ;then exit
FNmessage("match found in array element ")
SUB  R0,R11,entries%

We stored the original value of entries% in R11, remember?

FNprintnumber            ;print number in R0
                         ;as string
MOV  R15, R14            ;exit routine
                         ;==================================
                         ;buffer for use of FNprintnumber
.buffer
EQUS STRING$(8," ")

Printing integer values to the screen

FNprintnumber is a handy routine for displaying the contents of R0. The various OS_Write routines can only print strings, so we have to call OS_ConvertInteger first.

DEF FNprintnumber
REM print number in R0
REM Corrupts R1,R2
[OPT pass%
ADR  R1,buffer
MOV  R2,#8               ;set up for
                         ;OS_ConvertInteger
                         ;(to string)
SWI  "OS_ConvertInteger3";R0 holds
                         ;value, R1 buffer,
                         ;R2 buffer size
                         ;R0 now points to buffer
SWI  "OS_Write0"         ;print converted
                         ;entry as string
SWI  "OS_NewLine"
]
=0

Scanning a string array

We now need to jump back mentally to the point where we checked the value of type% to see what kind of array had been passed to our routine. Suppose it had been a string array? In this case, life becomes very much more complicated. An integer array consists of a simple set of word-aligned values; a string array consists of a set of 5­byte blocks, each of which contains a (non-word-aligned) 32-bit value pointing to the actual memory location of the characters in the string, followed by a single byte giving the length of the (non-terminated) string. Before we can even begin to compare the value of an array element with the search string, we need somehow to get its 32-bit address into a register.

Loading from a non-word-aligned address

.printstring             ;for strings
                         ;(complicated)
LDRB   R0,[ptr%],#1      ;Load R0 with
                         ;byte at ptr%!0
LDRB   R2,[ptr%],#1      ;Load R2 with
                         ;byte at ptr%!1
ORR    R0,R0,R2,LSL#8    ;combine the
                         ;two registers (R0
                         ;low byte, R2 high)
LDRB   R2,[ptr%],#1      ;Load R2 with
                         ;byte at ptr%!2
ORR    R0,R0,R2,LSL#16   ;shift it up
                         ;to 3rd hex byte
                         ;and combine
LDRB   R2,[ptr%],#1      ;Load R2 with
                         ;byte at ptr%!3
ORR    R0,R0,R2,LSL#24   ;combine as
                         ;top byte, i.e. load a
                         ;word from non-word-
                         ;aligned address.
                         ;Load R0 with pointer to
                         ;string (first 4 bytes in
                         ;5-byte block) and ptr%
                         ;now points to last byte

What we have done is to load it one byte at a time, shift each byte up by the relevant number of binary digits, and combine it with the value accumulated so far using a logical OR. (This works because LDRB is guaranteed to set the upper, unused 24 bits of a register to zero.) If anyone has worked out a shorter/more elegant way of doing this, please let me know!

String variables aren't zero-terminated, so we can't use OS_Write0 to display memory contents. Instead, we make use of the last byte in the 5-byte block, which holds the length of the string, and use OS_WriteN, which writes a fixed number of characters.

LDRB  R1,[ptr%],#1     ;Load R3 with
                       ;length of string and advance
                       ;ptr% 1 byte
SWI   "OS_WriteN"      ;R0 points to
                       ;string, R1 holds no
                       ;of chars to write
SWI   "OS_NewLine"

Comparing two strings

We now point R7 to the start of the value we wish to find, just as we did for .printnum. The only difference is that, in this case (a CR-terminated string), the address in R7 won't necessarily be word-aligned - but as we mean to do a byte-by-byte comparison anyway, it doesn't matter.

LDR   R7,[R9,#0]       ;point R7 to
                       ;start of $search
.compare
LDRB  R2,[R0],#1
LDRB  R8,[R7],#1
CMP   R2,R8
BEQ   compare

CMP   R8,#&0D          ;check for end of
                       ;search string
                       ;i.e. success
BEQ   abortwhenfound

The comparison itself isn't particularly sophisticated. Using R0 (which still points to the start of the string after OS_WriteN) and R7 as pointers, we load and compare each pair of characters one by one. If they differ, we check the character from $search to see if it is the string terminator and, if it is, we assume a successful match. Note, in particular, that we don't check for the end of the array element, with two consequences:

a) $search will match any word of which it is a substring ('car' will match 'carcase')

b) if the array element is a substring of $search, a 'false positive' may be obtained if there is random data in memory after the end of the array element which happens to match. If we are searching for 'cart' and the array contains a three-letter string 'car' which happens to be embedded in memory thus:

8tYd1cart%joV

this comparison algorithm will report success.

For those who may wish to pursue this problem further, a more complex 'real-life' string array scanning routine can be found in my Infozip application, in the file !Infozip.Source.codesource. (Infozip was on Archive monthly disc 13.12, and hence on the Archive CD, and I understand it should also be downloadable from here. For demonstration purposes, however, the simple algorithm will suffice.

SUBS entries%,entries%,#1 ;decrement
                          ;counter back to
BNE   printstring         ;start of loop
FNmessage("Nothing found")
MOV   R15, R14            ;exit routine at
                          ;end of array

The remainder of the loop is almost identical to that used for scanning the integer array. Note that, in this case, we have to reload R7 from [R9,#0] in order to move the pointer back to the start of $search, and we have to go through the whole rigmarole of loading a non-word-aligned address for each new array element. As you would expect, scanning a string array is somewhat slower than scanning an integer array. However, it is still much quicker than doing either with a loop from Basic!


Source: Archive Magazine 14.5
Publication: Archive Magazine
Contributor: Harriet Bazley