File format of BASIC code

For various unrelated reasons, I have recently had to delve into the way Basic program lines are constructed, and it occurs to me that you would be hard-pushed to find an explanation in any book nowadays. It might therefore be worth an item here.

When loaded in RAM, ready to run, a BBC Basic program has quite a simple construction. It starts at the RAM address held by the Basic keyword/pseudo-variable PAGE, with no preamble: the first program line sits at that address in a fixed format. Each further program line then follows in ascending line-number order, in the same format.

The line format is as follows:

1st byte in each line contains the value &0D (13)

2nd byte contains the most significant byte (MSB) of the 2-byte line number

3rd byte contains the least significant byte (LSB) of the 2-byte line number

4th byte contains the total number of bytes in the whole line (including the first four bytes)

5th byte (and onwards) contains the ASCII code of program line characters (except keywords and certain numbers, which are coded differently - see below)

After the final line, the end of the program is signified by two successive bytes holding the values &0D and &FF (13 and 255).
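The layout above lends itself to a very simple walk from line to line. Here is a minimal sketch of that walk, written in Python rather than Basic so that it can be tried on any machine. The function name iter_lines and the sample bytes are made up for illustration; nothing here is part of Basic itself.

```python
def iter_lines(program: bytes):
    """Yield (line_number, body_bytes) for each line of a tokenised program."""
    pos = 0
    while True:
        if program[pos] != 0x0D:
            raise ValueError(f"expected &0D at offset {pos}")
        if program[pos + 1] == 0xFF:
            return                          # &0D &FF marks the end of the program
        number = (program[pos + 1] << 8) | program[pos + 2]   # MSB comes first
        length = program[pos + 3]           # whole line, header bytes included
        if length < 4:
            raise ValueError(f"bad length byte at offset {pos + 3}")
        yield number, program[pos + 4 : pos + length]
        pos += length                       # the length byte leads to the next &0D


# Hypothetical one-line program: 10 REM Demo, followed by the end marker.
sample = bytes.fromhex("0D 00 0A 0A F4 20 44 65 6D 6F 0D FF")
for number, body in iter_lines(sample):
    print(number, body)
```

Note that the sketch never searches for the next &0D; it relies on the length byte alone to step from one line to the next.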

The maximum possible line number is 65279 (&FEFF) - which is (255×256)-1. This is because a value of &FF isn't allowable for the MSB as it would present the same &0D/&FF sequence as the end of program.

As the total number of bytes in a line is carried in only one byte, its maximum size is 255 (&FF) and, as this includes the first four bytes, the theoretical maximum number of bytes-per-line available for the actual program statements is 251 bytes.
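Both limits drop out naturally if you try to build a line by hand. The following Python sketch assembles the four header bytes in front of an already-tokenised body and refuses anything that would break the two rules just described. The names encode_line, MAX_LINE_NUMBER and MAX_BODY_BYTES are invented for this illustration.

```python
MAX_LINE_NUMBER = 0xFEFF   # 65279: an MSB of &FF would mimic the &0D/&FF end marker
MAX_BODY_BYTES = 255 - 4   # 251: the one-byte length also counts the 4 header bytes


def encode_line(number: int, body: bytes) -> bytes:
    """Return one program line: &0D, MSB, LSB, total length, then the body."""
    if not 0 <= number <= MAX_LINE_NUMBER:
        raise ValueError(f"line number {number} is outside 0..{MAX_LINE_NUMBER}")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes; maximum is {MAX_BODY_BYTES}")
    return bytes([0x0D, number >> 8, number & 0xFF, len(body) + 4]) + body


# Assumed body: token &F4 (REM) followed by the text " Demo".
print(encode_line(10, b"\xf4 Demo").hex(" ").upper())
```

The sketch only checks the two limits from the text; it makes no attempt to tokenise keywords or to police what the body contains.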

Before moving on, it is worth noting that, when loaded into RAM ready to run, this format holds true whether or not you use line numbers in your text/Basic editor, i.e. you don't save program space by not using line numbers!

When it comes to the program statements themselves, Basic uses normal ASCII code values for everything except keywords and certain numbers. Keywords are 'tokenised', i.e. each is represented by a unique number - mostly a single byte number, but always with its first byte at &7F (127) or higher to avoid clashes with normal ASCII character values. A list of these tokens is in Appendix B of the BBC Basic Reference Manual.

To give a trivial, concrete example, let's look at the program:

10    REM Demo
100   Word$="Test"
1000  PRINT 66/3,Word$
10000 END

If you enter Basic in a task window and type:

X%=PAGE
FOR N%=0 TO 48:PRINT ~X%?N%:NEXT

(or *Dump the file, if you prefer) you will get the following byte values (in hex) to which I have added the translation:

0D   Start of 1st line (and start of program)
00   MSB of line number
0A   LSB of line number (i.e. line number = &000A = 10 decimal)
0A   Length of 1st line
     (Note: no space characters between 4th byte of line and first 'real' character.)
F4   Token for REM
20   Space
44   D
65   e
6D   m
6F   o
0D   Start of 2nd line
00   MSB
64   LSB (i.e. line number = &0064 = 100 decimal)
10   Length
     (Note: No space characters between 4th byte of line and first 'real' character.)
57   W
6F   o
72   r
64   d
24   $
3D   =
22   "
54   T
65   e
73   s
74   t
22   "
0D   Start of 3rd line
03   MSB
E8   LSB (i.e. line number = &03E8 = 1000 decimal)
10   Length
     (Note: No space characters between 4th byte of line and first 'real' character.)
F1   Token for PRINT
20   Space
36   6
36   6
2F   /
33   3
2C   ,
57   W
6F   o
72   r
64   d
24   $
0D   Start of 4th line
27   MSB
10   LSB (i.e. line number = &2710 = 10000 decimal)
05   Length
     (Note: No space characters between 4th byte of line and first 'real' character.)
E0   Token for END
0D   End of program sequence
FF   End of program sequence

This example should be enough to give you the gist of a Basic line when loaded into RAM.
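As a check on the translation, the 49 bytes of the dump can be turned back into a listing. This Python sketch does so using only the three tokens that appear in the dump (REM, PRINT and END); a real detokeniser would need the full token table, and the names DUMP, TOKENS, detokenise_body and list_program are invented for the illustration.

```python
# Only the tokens seen in the dump above; the full table is much longer.
TOKENS = {0xF4: "REM", 0xF1: "PRINT", 0xE0: "END"}

DUMP = bytes.fromhex(
    "0D 00 0A 0A F4 20 44 65 6D 6F "
    "0D 00 64 10 57 6F 72 64 24 3D 22 54 65 73 74 22 "
    "0D 03 E8 10 F1 20 36 36 2F 33 2C 57 6F 72 64 24 "
    "0D 27 10 05 E0 "
    "0D FF"
)


def detokenise_body(body: bytes) -> str:
    """Expand known tokens; bytes inside double quotes are left as characters."""
    out = []
    in_string = False
    for byte in body:
        if byte == 0x22:                    # a double quote toggles string mode
            in_string = not in_string
        if byte in TOKENS and not in_string:
            out.append(TOKENS[byte])
        else:
            out.append(chr(byte))
    return "".join(out)


def list_program(program: bytes) -> list:
    """Return the program as a list of 'number text' strings."""
    lines = []
    pos = 0
    while program[pos + 1] != 0xFF:         # stop at the &0D &FF end marker
        number = (program[pos + 1] << 8) | program[pos + 2]
        length = program[pos + 3]
        lines.append(f"{number} {detokenise_body(program[pos + 4 : pos + length])}")
        pos += length
    return lines


for line in list_program(DUMP):
    print(line)
```

The listing comes back with a single space after each line number, which is simply how the sketch chooses to print it: the padding typed in the original listing is not stored anywhere in the program.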

Note that any spaces you may have introduced immediately after the line number (for indenting perhaps - and which you might have expected to be seen after the 4th byte in a line) are deleted automatically when a program is loaded for use.

As you can see, normal number values are converted (digit-by-digit) into their ASCII values exactly like text - but if a program statement makes a line number reference (e.g. RESTORE 1200), the line number is coded in a special way using four bytes. The first byte is always &8D (141) and the following three bytes represent the line number in a modified binary way. The reason for this odd coding is to speed up some operations (including renumbering).
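The "modified binary" scheme is not spelled out above, so what follows is an assumption: it is the layout generally described for Acorn's BBC Basic, and it is worth checking against a real tokenised program before relying on it. In that scheme the low six bits of each line-number byte are stored with &40 added, and the top two bits of both bytes are packed into the first of the three bytes and exclusive-ORed with &54. The function names below are invented for the sketch.

```python
def encode_line_ref(number: int) -> bytes:
    """Encode a line-number reference as &8D plus three bytes (assumed scheme)."""
    if not 0 <= number <= 0xFEFF:
        raise ValueError("line number out of range")
    lo, hi = number & 0xFF, number >> 8
    top_bits = ((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)   # top two bits of each byte
    return bytes([0x8D, top_bits ^ 0x54, (lo & 0x3F) | 0x40, (hi & 0x3F) | 0x40])


def decode_line_ref(ref: bytes) -> int:
    """Recover the line number from the four bytes made by encode_line_ref."""
    if len(ref) != 4 or ref[0] != 0x8D:
        raise ValueError("not a line-number reference")
    lo = ref[2] ^ ((ref[1] << 2) & 0xC0)
    hi = ref[3] ^ ((ref[1] << 4) & 0xC0)
    return (hi << 8) | lo


# The reference in RESTORE 1200, under the assumption described above.
print(encode_line_ref(1200).hex(" ").upper())
```

One consequence of this layout is that the three bytes after &8D always fall in the range &40 to &7F, so a line-number reference can never contain a stray &0D that might be mistaken for the start of a line.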

The address of the byte after the end of the program, i.e. after the final &FF, is held in the Basic keyword/pseudo-variable TOP - and, when run, program variables start at TOP (or the first word-aligned address afterwards if TOP isn't word-aligned). So, the static program length can be found from (TOP-PAGE).
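The same arithmetic can be tried away from the machine. This Python sketch finds the (TOP-PAGE) figure for a program held in a byte buffer, and rounds a TOP value up to the next word boundary. It assumes a word of four bytes, as on ARM machines; the function names, the sample program and the address &8F00 used for PAGE are purely illustrative.

```python
def program_length(memory: bytes, page: int = 0) -> int:
    """Return the equivalent of TOP-PAGE for a program starting at offset page."""
    pos = page
    while memory[pos + 1] != 0xFF:      # not yet at the &0D &FF end marker
        pos += memory[pos + 3]          # hop over one line via its length byte
    top = pos + 2                       # TOP is the byte after the final &FF
    return top - page


def first_variable_address(top: int, word_size: int = 4) -> int:
    """Round top up to a word boundary (assumes a power-of-two word size)."""
    return (top + word_size - 1) & ~(word_size - 1)


# Hypothetical two-line program: 10 REM Demo and 10000 END.
PROGRAM = bytes.fromhex("0D 00 0A 0A F4 20 44 65 6D 6F 0D 27 10 05 E0 0D FF")
length = program_length(PROGRAM)
print(length, hex(first_variable_address(0x8F00 + length)))
```
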

When running under the Wimp, the keyword END can be used as a function to return the top address of the RAM used by the program for its variables. Thus, (END-TOP) gives the RAM space used by the variables.

Finally, if you have a text editor which gives you the option to view a file as byte values (e.g. Zap), you may find that, although the program statements will conform with the above, the line numbering and program end may be different. This will only apply to the editor's display; the actual loaded program will conform.


Source: Archive Magazine 12.11 - "Learners' Column"
Publication: Archive Magazine
Contributor: Ray Favre