Fast Printing in Assembler

This article is for those of us who are studying assembler, and it is about driving printers at their maximum speed. If you were to write a graphics printer driver entirely in Basic, it would be very slow indeed, though less difficult to write than one in assembler. However, most of the computing time is spent on only a part of the program: scanning the screen and organizing the data ready for transmission to the printer. So, if just these parts are rewritten in assembler, the program can run fast enough to allow the printer to go at full speed.

The idea of rewriting the 'time-critical' sections in assembler, embedded in Basic, has many applications in engineering when computers are interfaced to external devices, not just printers. Also, the time spent in coding is minimised if most of the program is still in Basic.

In 1985, G. Hill described printer dumps, for the BBC B, programmed entirely in Basic, which took about 9 min to print a whole page. By translating the time-consuming parts of that Basic code into BBC assembler, the printing time could be reduced to about half a minute. Since then, with the help of the recent articles in Archive about assembler, I have been able to reduce the time taken on a RISC OS computer. The limit on the speed is then set by the printer because the computer is able to send the picture data to the printer so fast that the printer never has to wait for it. The effect isn't so marked as it was for the BBC B because Basic on the ARM processor runs much faster, but it is nevertheless significant.

Note too, that the computer becomes free to do something else before printing is finished if the printer has an input buffer.

Before introducing the assembler code, we need to look at the relevant printer codes, and to write a dump in Basic.

Printers and codes

Current bubble-jet printers are programmed in a similar manner to the older dot-matrix printers which had a column of pins (parallel to the long edge of the paper) which impacted on a typewriter-style ribbon. By firing the pins or bubble jets appropriately, as they are carried from left to right across the paper, a narrow band of the picture is printed. A line-feed and a carriage-return are then needed, and the whole process is repeated until the page is filled.

Many printers use the "ESC/P2 codes" for control. 'ESC/P' stands for "Epson Standard Code for Printers" (within the codes themselves, ESC denotes the ASCII escape character, 27) and, over the last twenty years, many makers of new printers adopted it and thereby claimed "Epson compatibility" in their advertising. (There are other codes, but I haven't used them.) As a result, my programs have worked on a Canon 9-pin, a Panasonic 9-pin, a 24-pin Citizen printer, and now on my Canon BJC-4200 Bubblejet. Minor changes in coding or programming have occurred as a result of technical advances (e.g. going from 9 to 24 pins, and reducing the vertical spacing of dots from 1/60 to 1/72 inch) but the graphics print code ESC "*" remained in use.

The Canon BJC-4200 has 64 jets vertically. If 48 of them are divided into eight groups of six and fired as the head traverses the page from left to right (i.e. printing a column of eight blocks of 6x6 droplets) the effect is the same as printing with eight pins on a 9-pin dot-matrix printer. For monochrome printing this requires eight bits of data, contained in one byte (the bits having the value 0 for no ink, 1 for black) and the most significant bit corresponds with the uppermost dot (a dot being made of 36 droplets as already mentioned). The Canon Programming Manual calls this an "8-bit" mode. The 48 jets can also be used two at a time to emulate the 24-pin dot-matrix printer, and this requires three bytes of data. Finally, the jets can be used independently, when six bytes of data are required.

The printer code ESC "*" requires three parameters. The first, denoted 'm', determines the horizontal spacing of the dots and whether the 8-, 24- or 48-bit mode is to be used. Presumably, the horizontal spacing is controlled by timing the firing of the jets as the head moves across the paper. The next two parameters, 'N1' and 'N2', give the number 'N' of dot columns to be printed in the traverse across the page in accordance with the formulae:

N1= N MOD 256  and  N2= N DIV 256

This indicates that, initially at least, dot-matrix printers contained an 8-bit microprocessor and couldn't handle numbers greater than 255 directly, so they were given in multiples of 256 (N2) plus a remainder (N1).

Thus, the full code is:

ESC "*" m N1 N2

Table 1 - Printer modes for BJC-4200

m   mode                         dots/inch   max. dots allowed
0   single density               60          480
1   double density               120         960
2   high speed, double density   120         960
3   quad density                 240         1920
4   CRT graphics I               80          640
5   -
6   CRT graphics II              90          720

(Note: m=5 was used with dot-matrix printers. Values of m>6 give 24- and 48-bit modes.)

This code is sent to the printer and must be followed immediately by the correct number of data bytes. If too few bytes are sent, the printer just sits waiting for more. If too many are sent for the width of the paper, the printer just discards them, but time is wasted producing and sending them. If more bytes are sent than the maximum allowed per call (see Table 1), the surplus bytes will be interpreted as ASCII characters, with highly unpredictable effects. If N is bigger than the maximum allowed, there will be a crash. All my printers have been unhelpful, never giving error messages!
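As a rough illustration of how the five-byte header is put together, here is a sketch in Python (not part of the Basic programs in this article). The names graphics_header and graphics_strip are made up for the example, and the limits are simply copied from Table 1; it assumes the 8-bit modes only.

```python
ESC = 27  # ASCII escape character

# Maximum dot columns per call for each 8-bit mode m (copied from Table 1).
MAX_COLUMNS = {0: 480, 1: 960, 2: 960, 3: 1920, 4: 640, 6: 720}


def graphics_header(m, n):
    """Return the five bytes ESC "*" m N1 N2 announcing n dot columns."""
    if m not in MAX_COLUMNS:
        raise ValueError("m is not one of the 8-bit modes in Table 1")
    if not 0 < n <= MAX_COLUMNS[m]:
        raise ValueError("n is outside the range allowed for this mode")
    n1 = n % 256   # remainder, sent first
    n2 = n // 256  # number of whole multiples of 256, sent second
    return bytes([ESC, ord("*"), m, n1, n2])


def graphics_strip(m, columns):
    """Header followed by exactly one data byte per dot column."""
    return graphics_header(m, len(columns)) + bytes(columns)
```

For m=4 and N=640, graphics_header gives the bytes 27, 42, 4, 128, 2, since 640 = 2 x 256 + 128. Building the header from the length of the data, as graphics_strip does, is one way to make sure the byte count always matches what the printer has been told to expect.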

I don't have the Epson Programmer's Manual, having already paid 25 UKP for the Canon one, but I don't expect there to be any difference in using ESC "*" on an Epson printer.

The bands printed by successive traverses of the head must not have gaps between them, so the line-feed size is smaller than for text, namely 8/60 inch, for which the code is:

ESC "A" 8

Table 1 shows the 8-bit printer modes produced by various values of the parameter m. (If you want the values for 24- and 48-bit graphics, please write to me.) Table 2 gives the characteristics of the various monochromatic screen modes, modes 18 and 37 being my favourites; I am using an A3000 with RISC OS 3.11.

Positions on a graphics screen are given with respect to axes X and Y whose origin is at the bottom left. In mode 18, the x-coordinate ranges from zero to 1279, and y from zero to 1023, the units being referred to as g.u. (graphic units). Also, in mode 18, the pixels measure 2x2 g.u. - so there are 640x512 pixels and each one can be addressed by any of its four corners, e.g. (0,0), (0,1), (1,0) or (1,1) in the case of the one at the origin. The point (1280,1024) is just off the screen. I shall address pixels by their top right corner so that all x- and y-values will be odd. Incidentally, in mode 18, text symbols occupy 16x16 g.u.
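The relation between pixels and graphic units in mode 18 can be written down in a couple of lines. This is only a Python sketch under the assumptions stated in the text (2x2 g.u. per pixel, origin at bottom left, pixels addressed by their top right corner); the function names are invented for the example.

```python
PIXEL_GU = 2  # mode 18: each pixel measures 2x2 graphic units


def pixel_to_gu(col, row):
    """Top right corner, in g.u., of the pixel at (col, row) from bottom left."""
    return (col * PIXEL_GU + PIXEL_GU - 1, row * PIXEL_GU + PIXEL_GU - 1)


def gu_to_pixel(x, y):
    """Pixel (col, row) that contains the point (x, y) given in g.u."""
    return (x // PIXEL_GU, y // PIXEL_GU)
```

The pixel at the origin gets the address (1,1), and the last pixel (639,511) gets (1279,1023), so every address produced this way is odd in both x and y.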

When making a print of a mode 18 black-and-white screen, there are no more data points than there are pixels, so on A4 paper 8-bit graphics will generally be adequate. So far, I haven't tried 24-bit graphics.

Basic program

My own programs originated from G. Hill's article in Acorn User, June 1983. The central part of a printer dump consists of three nested FOR-loops which, as usual, are best understood by first examining the innermost, and proceeding outwards.

The innermost loop uses the function POINT(x,y) to read the pixel value (black=0, white=1) at the point (x,y), and inserts the value found into a data byte B. In this loop, eight bits are collected in B, which is enough for firing the print head once. The most significant bit of B corresponds with the top of the print head, and the least significant with the bottom. (Remember that the jets are being fired in eight groups of six each, giving eight dots, the parameter m, in Table 1, being not greater than 6.)

The head has to scan from left to right across the paper, so the computer has to scan the corresponding strip across the screen, and send all the data bytes B to the printer. This scan is controlled by the second FOR-loop which envelops the first.

Finally, the process has to be repeated for each strip of the screen until the whole has been scanned and printed. That is organised by the third FOR-loop which envelops the other two.
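The three loops just described can be sketched as follows. This is a Python illustration, not the Basic dump itself: dump_strips is a hypothetical name, and point stands in for Basic's POINT(x,y), returning 0 or 1. It assumes pixels 2 g.u. apart, as in mode 18.

```python
def dump_strips(point, xmin, ymin, xmax, ymax, step=2):
    """Yield one list of data bytes per printed strip, top strip first."""
    # Outer loop: one strip of eight pixel rows per traverse of the head.
    for y_top in range(ymax, ymin - 1, -8 * step):
        strip = []
        # Middle loop: left to right across the strip, one byte per column.
        for x in range(xmin, xmax + 1, step):
            b = 0
            # Inner loop: eight pixels, top one first so it ends up in bit 7.
            for dy in range(0, 8 * step, step):
                b = 2 * b + point(x, y_top - dy)
            strip.append(b)
        yield strip
```

On a toy screen 4 pixels wide and 16 pixels high (XMIN=1, YMIN=1, XMAX=7, YMAX=31) with only the top row lit, the first strip comes out as four bytes of 128 and the second as four zeros: the top pixel of each column lands in the most significant bit.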

I will suppose the print is to be in 'portrait' mode (U=upright) rather than 'landscape' (S=sideways), since this affects how the scan is done. Three things have to be chosen before the dump can be written:

  1. Screen mode - (I chose mode 18).
  2. Printer mode - (I chose m=4)
  3. The x- and y-ranges to be scanned - (I chose to scan the whole screen so that XMIN=1, YMIN=1, XMAX=1279, YMAX=1023. From these, the number of bytes (N) in a strip and the number of strips are determined.)
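The arithmetic behind the last item can be checked with a few lines of Python. This is a sketch only: strip_geometry is an invented name, and it assumes pixels 2 g.u. apart and eight pixel rows per strip, as in the mode 18 dump.

```python
def strip_geometry(xmin, ymin, xmax, ymax):
    """Return (bytes per strip, number of strips) for a mode 18 scan."""
    n = (xmax - xmin + 2) // 2                  # one byte per pixel column
    strips = len(range(ymax, ymin - 1, -16))    # 8 pixels x 2 g.u. per strip
    return n, strips
```

For the whole screen (1, 1, 1279, 1023) this gives 640 bytes per strip and 64 strips, and 640 is exactly the maximum allowed for m=4 in Table 1.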

Table 2 - Monochromatic Screen Modes, RISC OS 3

mode   pixels    pixel size (g.u.)   max. x   max. y   text (columns x rows)
0      640x256   2x4                 1279     1023     80x32
4      320x256   4x4                 1279     1023     40x32
18     640x512   2x2                 1279     1023     80x64
25     640x480   2x2                 1279     959      80x60
29     800x600   2x2                 1599     1199     100x75
37     896x352   2x4                 1791     1407     112x44
41     640x352   2x4                 1279     1407     80x44

The program should now be studied.

MODE 18
m%=4: max%=640
REM m%=0, max%=480 OR m%=4, max%=640 may be used
PROCpicture: REM computes and
             REM generates the required picture
TIME=0
PROCPRdumU18(0,1,1,1279,1023)
PRINT"TIME="TIME/100" sec": REM print the time taken for printing
END

DEF PROCPRdumU18(mar%,XMIN%,YMIN%,XMAX%,YMAX%)
  N%=(XMAX%-XMIN%+2)/2
  IF N%>max% PRINT"N%="N%;" is too many bytes for width of printer":STOP
  REM if mar% is not zero (but N%<=max%
  REM is satisfied) then any surplus 
  REM bytes discarded without a crash.

  N1%=N% MOD 256: N2%=N% DIV 256
  VDU2,1,27,1,64
  REM set printer to power-on state

  VDU1,27,1,65,1,8, 1,27,1,108,1,mar%
  REM set line-feed and margin

  FOR Y%=YMAX% TO YMIN% STEP -16
    VDU1,27,1,42,1,m%,1,N1%,1,N2%
    FOR X%=XMIN% TO XMAX% STEP 2:B%=0:
      FOR y%=0 TO 14 STEP 2:
        B%=2*B%:C%=POINT(X%,Y%-y%):
        B%=B%+C%
      NEXT
      VDU1,B%
    NEXT X%
    VDU1,10: REM line-feed
  NEXT Y%
  VDU1,27,1,64, 1,12,3
  REM finished with printer
ENDPROC

Notes

  1. For speed, integer variables (%) are used throughout. The two inner loops are in a single Basic statement.
  2. Shifting the bits one place along the byte B% is done by multiplying by 2.
  3. The printer codes are in ASCII. ESC is 27, "*" is 42, and ESC "@" becomes 27, 64. Some codes are followed by numeric data, e.g. m%, N1%, N2%. The extra 1's are inserted to prevent the next ASCII character from going to the screen as well as to the printer. For example, VDU1,12 throws the printer paper but VDU12 would clear the screen as well.
  4. When printing only a portion of the screen, it might be useful to move it a bit to the right on the paper. So a left margin measured in character spaces may be introduced by the code 27,108,mar%.

It is possible to print the picture in landscape mode (i.e. sideways) if the screen is scanned in vertical strips. Programming for this is given on the Archive monthly disc.

Notice that, with m=4, the picture is squeezed a bit to fit into the width of A4 paper; with graphs which have their own scales, this usually doesn't matter. However, if m=0 is used, circles and squares will be undistorted, but a narrow strip falls off one edge and so isn't printed. I have another dump which prints a mode 37 screen at half-scale so that nothing need be lost.

The final stage in this project is to replace the two inner FOR-loops of the Basic code with assembler code so that the computer can allow the printer to run at full speed. In fact there is then some speed in hand, should a faster printer than mine be available.

REM Program with embedded assembler
MODE 18
m%=4: max%=640
REM m%=0,max%=480 OR m%=4,max%=640
PROCpicture: REM computes and generates the required picture
TIME=0
PROCPRdumU18a(0,1,1,1279,1023)
PRINT"TIME="TIME/100" sec"
END

DEF PROCPRdumU18a(mar%,XMIN%,YMIN%,XMAX%,YMAX%)
  Y%=YMIN%
  N%=(XMAX%-XMIN%+2)/2
  IF N%>max% THEN N%=max%
  N1%=N% MOD 256
  N2%=N% DIV 256
  PROCassU18
  VDU2,1,27,1,64:REM set printer to power-on state
  VDU1,27,1,65,1,8,1,27,1,108,1,mar%
  FOR Y%=YMAX% TO YMIN% STEP -16
    !Y=Y%
    VDU1,27,1,42,1,m%,1,N1%,1,N2%
    CALL code%
  NEXT Y%
  VDU1,27,1,64,1,12,3: REM finished with printer
  VDU7
  VDU7
ENDPROC

DEFPROCassU18
  DIM code% 100
  FOR pass%=0 TO 2 STEP 2
    P%=code%

[OPT pass%
        LDR R7,BC         \set byte counter
        LDR R6,X0         \collect XMIN
.xloop  MOV R0,R6         \X-coordinate of pixel
        LDR R1,Y          \Y-coord.
        MOV R5,#0         \clear R5 ready to build up byte B
        MOV R8,#8         \set counter for reading 8 pixels
.pixels SUBS R8,R8,#1     \decrement counter and set flags
        MOV R5,R5,LSL#1   \left-shift B (=R5)
        SWI "OS_ReadPoint"
        ADD R5,R5,R2      \add pixel (0 or 1) to B
        SUB R1,R1,#2      \form Y for next pixel down
        BNE pixels        \loop back if not finished

        MOV R0,R5
        SWI "OS_PrintChar" \send B
        ADD R6,R6,#2      \increment X
        SUBS R7,R7,#1     \decrement counter and set flags
        BNE xloop

        MOV R0,#10
        SWI "OS_PrintChar" \send linefeed
        MOV PC,R14

.X0     EQUD XMIN%
.Y      EQUD Y%
.BC     EQUD N%
]
  NEXT
ENDPROC

Notes

The parts of the program still in Basic will be recognised. There is a new call, PROCassU18, which produces the machine code and stores it in the space reserved by the statement DIM code%.

In the main program, the two inner FOR-loops have been replaced by the single line:

CALL code%

In the assembler code, the two inner FOR-loops are easily recognised. The innermost starts at the line before the label ".pixels" where the counter R8 is set to 8. Pixels are read by "OS_ReadPoint", which requires inputs x and y to be put into R0 and R1 respectively, and the pixel value is returned in R2. The counter R8 is decremented before each pixel is read, so the loop ends with the simple test for zero count, using BNE pixels. This works only because none of the instructions between SUBS and BNE changes the flags; in particular, the code relies on the SWI call returning with them intact.

The loop outside that is also easily traceable: R7 is the counter which receives N%, the number of bytes to be read, and BNE xloop is the test at the end. "OS_PrintChar" is used to send the bytes and the line-feed data to the printer; it sends one byte at a time from R0.
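To follow the registers through one call of the machine code, it may help to see the same steps written out in Python. This is only a model for reading alongside the listing, not a translation that could replace it: run_strip is an invented name, and read_point and print_char stand in for the two SWIs.

```python
def run_strip(read_point, print_char, x0, y, bc):
    """Model of one CALL code%: scan one strip and send its bytes."""
    r7 = bc                          # LDR R7,BC   byte counter
    r6 = x0                          # LDR R6,X0   starting x
    while True:                      # .xloop
        r0, r1 = r6, y               # MOV R0,R6 / LDR R1,Y
        r5 = 0                       # MOV R5,#0   clear B
        r8 = 8                       # MOV R8,#8   eight pixels to read
        while True:                  # .pixels
            r8 -= 1                  # SUBS R8,R8,#1
            zero = (r8 == 0)         # the Z flag, remembered until the BNE
            r5 <<= 1                 # MOV R5,R5,LSL#1
            r2 = read_point(r0, r1)  # SWI "OS_ReadPoint"
            r5 += r2                 # ADD R5,R5,R2
            r1 -= 2                  # SUB R1,R1,#2
            if zero:                 # BNE pixels (falls through when Z set)
                break
        print_char(r5)               # SWI "OS_PrintChar" with R0=B
        r6 += 2                      # ADD R6,R6,#2
        r7 -= 1                      # SUBS R7,R7,#1
        if r7 == 0:                  # BNE xloop
            break
    print_char(10)                   # line-feed
```

Because the counter is decremented at the top of the inner loop and tested at the bottom, the pixel is still read on the pass where R8 reaches zero, which is why eight pixels, not seven, go into each byte.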

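The EQUD statements in the assembler, and the "!Y=Y%" in the Basic, provide the memory links required: each EQUD reserves a word of memory holding the value of its operand at the time of assembly, and "!Y=Y%" stores the current value of Y% in the word labelled Y before each call. It seems that PROCassU18, which assembles the code, must be called after the variables named in the EQUDs (XMIN%, Y% and N%) have been given values.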

Results

On my A3010 with a BJC4200, a full page is printed by the all-Basic program in 98 sec, very much faster than the BBC B which took about 9 min.

For the program with Basic and assembler, a print of the full mode 18 screen is produced in 49 sec. The computer finishes in 31 sec, but the printer buffer still has to empty, which takes the remaining 18 sec. If the computer sends the output to a 'printer sink' (*FX5,0), it finishes in only 19 sec.
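These timings can be turned into rough data rates. The Python below is just a back-of-envelope check using the figures quoted in this section; it counts only the picture bytes (640 per strip, 64 strips) and ignores the few control codes sent with each strip, which is my simplification.

```python
BYTES_PER_STRIP = 640
STRIPS = 64
DATA_BYTES = BYTES_PER_STRIP * STRIPS   # picture bytes in a full mode 18 screen


def bytes_per_second(seconds):
    """Average rate at which the picture bytes are dealt with."""
    return DATA_BYTES / seconds


speed_up = 98 / 49                       # all-Basic time / Basic+assembler time
computer_rate = bytes_per_second(19)     # computer alone, output to printer sink
printer_rate = bytes_per_second(49)      # printer working flat out
```

A full screen is 40960 bytes. The computer on its own can produce them at a little over 2100 bytes per second, whereas the printer absorbs them at roughly 840 bytes per second, which puts a number on the speed in hand.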

Alternatively, saving the same picture to disc with *SCREENSAVE, loading that into Draw with *SCREENLOAD, and then printing it using !Printers 1.53 set to 360 dots/inch, took 83 sec; with 180 dots/inch, it took 40 sec, but the lines were fainter.

Archive disc

The monthly disc contains a program to make a suitable picture and to print it with the mode 18 'upright' dump. In addition, there is the code for a 'sideways' picture. There is also a nice upright half-scale dump for mode 37 which prints the picture provided in 26 sec. Instructions for running these programs are given in a text file.

Loose ends

It remains to correlate this printing method with what is happening inside !Printers. If the printer driver for your printer is loaded into !PrintEdit and the PRINT facility selected from the menu, then a list of the codes used will be printed. In the case of the Canon BJC4200, the listing contains graphics codes which aren't in my expensive Programmer's Manual, but obviously my printer does respond to them. Perhaps they are just synonyms for ESC "*" when m>6 ? (The manual does give alternative codes for ESC "*" for m=0, 1, 2 and 3. They are, respectively, ESC "K", "L", "Y" and "Z", which are followed by N1 and N2, and then the data bytes.)

By serendipity, I found the article by Francis Crossley in Archive 9.12 describing 'raster graphics'. This contains the very same codes as I found in the Acorn printer driver and, on the monthly disc, he gave an assembly language program, in a text file. I need to study this further, for which it would be helpful to have Canon's (or Epson's) full description of the codes. It looks to me as if he uses the Acorn Assembly Language package, rather than the simpler assembler embedded in Basic V which I have used in this article. No one seems to have written about this package in Archive, and even the PRM just refers readers to the Acorn Desktop Assembler package. It would be useful if someone wrote up the package for us - for my part, I just want to be able to recognise its special features so that I can adapt such code for my own purposes (inside Basic). There must be lots of useful code about!


Source: Archive Magazine - 13.3
Publication: Archive Magazine
Contributor: John Barker