Assembly Language - Wildcards (part 2)

My last assembler article looked at a simple routine to match one wildcarded string against another. This month, I shall give a brief description of how such a routine can be incorporated into a practical example that could be called from within a Basic program.

Returning an array of pointers

This theoretical Basic program maintains a list of strings in a block of memory. When the user enters a wildcarded target string, the program is expected to print out a sorted list of all the strings in memory that match. The machine code must therefore be written to return an array of pointers for the use of OS_HeapSort.

We will simulate this parent program by reading in strings one line at a time from a test file and poking them into a DIMmed block. The file HARRIET that I've provided (in fact, it's a randomised dump from an Impression user dictionary) contains 240 words, none of which is longer than 16 characters; allowing space for the terminating CHR$(13) on each string, the size of the block must be at least 17*240 bytes. We also need to dimension another block with room for up to 240 pointers, each 4 bytes long, and since the target string also needs to be CR-terminated, we need to dimension another block in which to store that. It's probably best to make it a bit longer than necessary to be on the safe side.

Parameters

The machine code routine requires three main parameters: the address of the target string against which to check each entry, which will be passed on to the comparison routine; the address of the input block holding the main list of string; and the address of the output block in which to place the pointers. We will also pass a fourth parameter, the address of the end of the input block, making it easier to detect when we have finished the scan.

Stacking registers

Our wildcard comparison routine already uses R0 to R7 - all of the eight ARM registers which can be assigned directly from Basic - as well as R10. Any parent machine code program which wishes to make use of it will also need access to these registers for its own purposes; this means that, before we call it, we have to store these registers on a stack so that they can be restored to their former values afterwards. Since we will be calling the subroutine using the BL (Branch with Link) instruction to ensure that we return to the same point in the loop, we also need to stack R14 at the same time. ARM code doesn't automatically maintain a stack of return addresses; as soon the BL instruction is executed, the current value of the program counter is stored in R14 and the old value of R14 - namely the vital address which will enable us to return to Basic - is simply discarded.

OS_HeapSort

This is a very useful SWI which saves us the trouble of writing a sort routine. However, in order to use it, certain values have to be held in specific registers: R0 has to hold a count of the number of items to sort, R1 has to point to an array of objects or pointers, and R2 has to hold a code value indicating what kind of values are pointed to by R1. For our purposes, this means that R1 has to point at the start of the output block and cannot therefore simultaneously be used to hold target% as before. To solve this problem, I have chosen to switch target% and temp%, since there is no need to preserve the value of the latter.

This is, of course, one of the advantages of 'naming' your registers with Basic variables; provided it really doesn't matter which register is being used, it makes it very easy to reassign them in order to 'free up' a register for another purpose without a massive search & replace operation.

Setting up the registers

target%=3:REM was R1
input%=6
output%=2
end%=7
REM these four are initialised
REM from Basic
count%=0
liststart%=1

field%=6
char%=0:temp%=1  :REM was R3
match%=2
t_ptr%=5:f_ptr%=4
store%=7:old%=10

Note that input% and field% both refer to the same register. This is deliberate. Every time the comparison subroutine is called, input% will be pointing to an offset within the input block indicating the start of the new search field to use. The value count%, as we know, needs to be in R0, and liststart% will be used to preserve the initial value of output% in R1 for the use of OS_HeapSort. As for output% and end%, it is irrelevant which registers are used for these, since the comparison subroutine doesn't need to reference these values and we will be restoring them from the stack in any case.

Scanning the block

This is the main loop we will call from Basic.

MOV    count%,#0
          ;zero the counter
MOV    liststart%,output%
          ;store start of block

.start
STMFD  R13!,{R0-R7,R10,R14}
BL     compare
     ;sets zero flag if match failed
LDMFD  R13!,{R0-R7,R10,R14}

Here we use the Load Multiple (LDM) and Store Multiple (STM) instructions to store the ten listed registers at the address pointed to by R13, which is set to point at the current end of the Basic stack by the CALL statement. Basic uses a Full Descending stack, so we will do the same.

STRNE  input%,[output%],#4
ADDNE  count%,count%,#1

R6 has just been restored, so input% now points once more to the start of the string we just checked. We will set the zero flag when returning from the comparison routine to signal a failed match. If the zero flag is clear, this search field matched; we can then store its address in the output buffer and increment the counter. Note that here we're not dealing with single characters - an address is four bytes long!

ADD    input%,input%,#17;point to next data item
CMP    input%,end%
BLO    start

MOV    input%,#0
STR    input%,[output%]
            ;mark end with blank
MOV    R2,#4
SWI    "OS_HeapSort"
   ;R0 holds number of items
   ;R1 holds start of pointer array
   ;R2 holds sort type
MOV    R15, R14      ;exit routine

We loop round at intervals of 17 bytes (16 characters plus a terminator) until we detect the end of the input data block. We then store a zero word to mark the end of the output list of pointers - this also provides an easy way to detect after returning to Basic whether anything was found or not - and call OS_HeapSort. The value 4 in R2 signifies that R1 points to a block of values of the type pointer-to-string, which should be sorted according to the alphabetical order of the strings to which they point. On exit, the output block will hold a list of sorted pointers.

Amending the wildcard routine

We can basically use the wildcard routine from last month as it stands, but two small changes need to be made. First of all, if the routine is to be called repeatedly, we must ensure that match% is reinitialised to zero for every call. Secondly, if the routine is being called from machine code rather than from a USR statement, it is far more useful to return a TRUE/FALSE value in a processor flag than to do so in R0.

One extra line at the start will deal with the first problem. All that is necessary is to ensure that it is outside the main scan loop.

.compare
MOV   match%,#0     ;initialise flag to FALSE
.scantarget
LDRB  char%,[target%],#1

The line at .return needs to be altered from

MOV   R0,match%
         ;routine returns (match%)

to

CMP   match%,#0

This will set the zero flag as required if match%=FALSE.

A demonstration Basic program

DIM data% 241*17,pointers% 241*4,find% 40
i%=OPENIN "HARRIET" :REM assuming this file is in the CSD
FOR n%=0 TO 239
  input$=GET$#i%
  $(data%+(n%*17))=input$
NEXT n%
CLOSE#i%
PROCassemble

We read in the strings and store them at data%+0, data%+17, data%+34 etc. In PROCassemble, we name our registers and assemble the machine code as discussed above.

REPEAT
   INPUT"Search for (wildcarded) string",input$
   IF input$="" THEN input$="*"

A zero-length string will match nothing. We change this to the more useful default value of a single asterisk, which will match anything.

   $(find%)=FNUPPER(input$)

Our comparison isn't case-independent, so we force all data to be in upper case.

   PROCgeneratelist

In this procedure, we initialise and call the machine code. On exit, there will be a list of consecutive words in the block pointers%, each-holding the address of the start of a string, terminated by a zero word.

ptr%=pointers%
IF !ptr% THEN
  REPEAT
    PRINT $(!ptr%)
    ptr%+=4
  UNTIL !ptr%=0
ELSE
  PRINT "Nothing found"
ENDIF

This loop prints out the list of strings located by the machine code routine.

UNTIL FALSE
END
DEF PROCgeneratelist
  D%=find%
  G%=data%
  H%=data%+240*17
  C%=pointers%

These lines initialise R3, R6, R7 and R2 respectively. R7 is pointing to the end of the block at data%.

CALL ourcode%

This assumes that we assembled the machine code at address ourcode%.

ENDPROC

DEF FNUPPER(string$)
LOCAL X%,n%
FOR n%=1 TO LEN(string$)
  X%=ASC(MID$(string$,n%))
  IF (X%>&60 AND X%<&B7) THEN MID$(string$,n%)=CHR$(X% AND %1011111)
NEXT n%
=string$

This is a simple and inefficient function to convert a string to upper case by forcing bit 5 in every lower case letter to be unset. Why can't BBC Basic provide this as a built-in function?

A copy of this program and of the data file is provided on the Archive monthly disc. See if you can list all the words ending in 'SH' - or all the words containing the string 'SS' - or all the words ending in 'D blank N'!


Source: Archive Magazine 14.8
Publication: Archive Magazine
Contributor: Harriet Bazley