This article is for those of us who are studying assembler, and it is about driving printers at their maximum speed. If you were to write a graphics printer driver entirely in Basic, it would be very slow indeed, though less difficult to write than one in assembler. However, most of the computing time is spent on only a part of the program: scanning the screen and organizing the data ready for transmission to the printer. So, if just these parts are rewritten in assembler, the program can run fast enough to allow the printer to go at full speed.
The idea of rewriting the 'time-critical' sections in assembler, embedded in Basic, has many applications in engineering when computers are interfaced to external devices, not just printers. Also, the time spent in coding is minimised if most of the program is still in Basic.
In 1985, G. Hill described printer dumps, for the BBC B, programmed entirely in Basic, which took about 9 min to print a whole page. By translating the time-consuming parts of that Basic code into BBC assembler, the printing time could be reduced to about half a minute. Since then, with the help of the recent articles in Archive about assembler, I have been able to reduce the time taken on a RISC OS computer. The limit on the speed is then set by the printer because the computer is able to send the picture data to the printer so fast that the printer never has to wait for it. The effect isn't so marked as it was for the BBC B because Basic on the ARM processor runs much faster, but it is nevertheless significant.
Note too that, if the printer has an input buffer, the computer becomes free to do something else before printing is finished.
Before introducing the assembler code, we need to look at the relevant printer codes, and to write a dump in Basic.
Current bubble-jet printers are programmed in a similar manner to the older dot-matrix printers which had a column of pins (parallel to the long edge of the paper) which impacted on a typewriter-style ribbon. By firing the pins or bubble jets appropriately, as they are carried from left to right across the paper, a narrow band of the picture is printed. A line-feed and a carriage-return are then needed, and the whole process is repeated until the page is filled.
Many printers use the "ESC/P2 codes" for control. 'ESC/P' stands for "Epson Standard Code for Printers" (and each command begins with the ASCII escape character, ESC, code 27). Over the last twenty years, many makers of new printers adopted it and thereby claimed "Epson compatibility" in their advertising. (There are other codes, but I haven't used them.) As a result, my programs have worked on a Canon 9-pin, a Panasonic 9-pin, a 24-pin Citizen printer, and now on my Canon BJC-4200 Bubblejet. Minor changes in coding or programming have occurred as a result of technical advances (e.g. going from 9 to 24 pins, and reducing the vertical spacing of dots from 1/60 to 1/72 inch) but the graphics print code ESC "*" remained in use.
The Canon BJC-4200 has 64 jets vertically. If 48 of them are divided into eight groups of six and fired as the head traverses the page from left to right (i.e. printing a column of eight blocks of 6x6 droplets) the effect is the same as printing with eight pins on a 9-pin dot-matrix printer. For monochrome printing this requires eight bits of data, contained in one byte (the bits having the value 0 for no ink, 1 for black) and the most significant bit corresponds with the uppermost dot (a dot being made of 36 droplets as already mentioned). The Canon Programming Manual calls this an "8-bit" mode. The 48 jets can also be used two at a time to emulate the 24-pin dot-matrix printer, and this requires three bytes of data. Finally, the jets can be used independently, when six bytes of data are required.
The printer code ESC "*" requires three parameters. The first, denoted 'm', determines the horizontal spacing of the dots and whether the 8-, 24- or 48-bit mode is to be used. Presumably, the horizontal spacing is controlled by timing the firing of the jets as the head moves across the paper. The next two parameters, 'N1' and 'N2', give the number 'N' of dot columns to be printed in the traverse across the page in accordance with the formulae:
N1= N MOD 256 and N2= N DIV 256
This indicates that, initially at least, dot-matrix printers contained an 8-bit microprocessor and couldn't handle numbers greater than 255 directly, so they were given in multiples of 256 (N2) plus a remainder (N1).
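As a quick check of these formulae, the following few lines of Basic split N=640 (the number of columns in a full-width mode 18 dump, as used later in this article) into its two bytes and recombine them. This is only an illustrative sketch; the variable names are not significant.

```bbcbasic
REM Split the column count N% into the two parameter bytes
N%=640
N1%=N% MOD 256 : REM remainder, here 128
N2%=N% DIV 256 : REM number of whole 256s, here 2
PRINT "N1=";N1%;"  N2=";N2%
REM Recombining the two bytes should give back N%
IF N1%+256*N2%=N% THEN PRINT "OK" ELSE PRINT "Mismatch"
```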
Thus, the full code is:
ESC "*" m N1 N2
which is sent to the printer and must be followed immediately by the correct number of data bytes. If too few bytes are sent, the printer just sits waiting for more. If too many are sent for the width of the paper, the printer just discards them, but time is wasted producing and sending them. If more bytes are sent than the maximum allowed per call (see Table 1), the surplus bytes will be sent and interpreted as ASCII characters, with highly unpredictable effects. If N is bigger than the maximum allowed, there will be a crash. All my printers have been unhelpful, never giving error messages!
Table 1: 8-bit graphics modes selected by the parameter m
m | mode | dots/inch | max. dots allowed |
0 | single density | 60 | 480 |
1 | double density | 120 | 960 |
2 | high speed, double density | 120 | 960 |
3 | quad density | 240 | 1920 |
4 | CRT graphics I | 80 | 640 |
5 | - | - | - |
6 | CRT graphics II | 90 | 720 |
(Note: m=5 was used with dot-matrix printers. Values of m>6 give 24- and 48-bit modes.)
I don't have the Epson Programmer's Manual, having already paid £25 for the Canon one, but I don't expect there to be any difference in using ESC "*" on an Epson printer.
The bands printed by successive traverses of the head must not have gaps between them so the line-feed size is less than for text, namely 8/60 inch, for which the code is
ESC "A" 8
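In Basic, a code such as this can be sent with VDU 1, which passes the byte that follows it to the printer only. The lines below are a minimal sketch, assuming a printer is connected and selected; they are the same bytes that appear in the full program later.

```bbcbasic
REM Set the line-feed to 8/60 inch with ESC "A" 8
VDU 2                  : REM enable the printer stream
VDU 1,27,1,ASC"A",1,8  : REM VDU 1,n sends byte n to the printer only
VDU 3                  : REM disable the printer stream
```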
Table 1 shows the 8-bit printer modes produced by various values of the parameter m. (If you want the values for 24- and 48-bit graphics, please write to me.) Table 2 gives the characteristics of the various monochromatic screen modes, modes 18 and 37 being my favourites; I am using an A3000 with RISC OS 3.11.
Positions on a graphics screen are given with respect to axes X and Y whose origin is at the bottom left. In mode 18, the x-coordinate ranges from zero to 1279, and y from zero to 1023, the units being referred to as g.u. (graphic units). Also, in mode 18, the pixels measure 2x2 g.u. - so there are 640x512 pixels and each one can be addressed by any of its four corners, e.g. (0,0), (0,1), (1,0) or (1,1) in the case of the one at the origin. The point (1280,1024) is just off the screen. I shall address pixels by their top right corner so that all x- and y-values will be odd. Incidentally, in mode 18, text symbols occupy 16x16 g.u.
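The odd-valued addressing just described can be sketched in a line of Basic. FNgu is a hypothetical helper of my own naming, not part of the dump programs; it converts a pixel number (counted from zero) into the g.u. coordinate of its top or right edge in mode 18.

```bbcbasic
REM FNgu is a hypothetical helper, not part of the dump program
PRINT FNgu(0)   : REM first pixel: 1
PRINT FNgu(639) : REM last pixel across the screen: 1279
PRINT FNgu(511) : REM last pixel up the screen: 1023
END
REM Mode 18 pixels are 2 g.u. wide and high, so pixel p% ends at 2*p%+1
DEF FNgu(p%)=2*p%+1
```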
When making a print of a mode 18 black-and-white screen, the only data are the pixels themselves, at one bit each, so that, on A4 paper, 8-bit graphics will generally be adequate. So far, I haven't tried 24-bit graphics.
My own programs originated from G. Hill's article in Acorn User, June 1983. The central part of a printer dump consists of three nested FOR-loops which, as usual, are best understood by first examining the innermost, and proceeding outwards.
The innermost loop uses the function POINT(x,y) to read the pixel value (black=0, white=1) at the point (x,y), and shifts the value found into a data byte B. Since the printer takes a 1 to mean ink, white pixels on the screen are printed as black dots on the paper. In this loop, eight bits are collected in B, which is sufficient for firing the print head once; the most significant bit of B corresponds with the top dot of the print head, and the least significant with the bottom one. (Remember that the jets are being fired in eight groups of six each, giving eight dots, the parameter m, in Table 1, being not greater than 6.)
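The way the byte is built up can be tried without a screen or a printer. In this sketch, the eight pixel values are made-up sample data read from a DATA statement (top pixel first) instead of coming from POINT; the doubling and adding is the same as in the dump.

```bbcbasic
REM Pack eight sample pixel values into one byte, top pixel first
B%=0
FOR i%=1 TO 8
  READ C%        : REM C% stands in for POINT(X%,Y%-y%)
  B%=2*B%+C%     : REM shift B% left one bit and add the new pixel
NEXT
PRINT "B=&";~B%  : REM the pattern 10100001 is &A1 (161 decimal)
END
DATA 1,0,1,0,0,0,0,1
```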
The head has to scan from left to right across the paper, so the computer has to scan the corresponding strip across the screen, and send all the data bytes B to the printer. This scan is controlled by the second FOR-loop which envelops the first.
Finally, the process has to be repeated for each strip of the screen until the whole has been scanned and printed. That is organised by the third FOR-loop which envelops the other two.
I will suppose the print is to be in 'portrait' mode (U=upright) rather than 'landscape' (S=sideways), since this affects how the scan is done.
Table 2: Characteristics of the monochrome screen modes (pixel size and maximum coordinates in g.u.)
mode | pixels | pixel size | max. x | max. y | text (columns x rows) |
0 | 640x256 | 2x4 | 1279 | 1023 | 80x32 |
4 | 320x256 | 4x4 | 1279 | 1023 | 40x32 |
18 | 640x512 | 2x2 | 1279 | 1023 | 80x64 |
25 | 640x480 | 2x2 | 1279 | 959 | 80x60 |
29 | 800x600 | 2x2 | 1599 | 1199 | 100x75 |
37 | 896x352 | 2x4 | 1791 | 1407 | 112x44 |
41 | 640x352 | 2x4 | 1279 | 1407 | 80x44 |
The program should now be studied.
MODE 18
m%=4: max%=640
REM m%=0, max%=480 OR m%=4, max%=640 may be used
PROCpicture: REM computes and generates the required picture
TIME=0
PROCPRdumU18(0,1,1,1279,1023)
PRINT"TIME="TIME/100" sec": REM print the time taken for printing
END
DEF PROCPRdumU18(mar%,XMIN%,YMIN%,XMAX%,YMAX%)
N%=(XMAX%-XMIN%+2)/2
IF N%>max% PRINT"N%="N%;" is too many bytes for width of printer":STOP
REM if mar% is not zero (but N%<=max% is satisfied) then any
REM surplus bytes are discarded without a crash.
N1%=N% MOD 256: N2%=N% DIV 256
VDU2,1,27,1,64: REM set printer to power-on state
VDU1,27,1,65,1,8,1,27,1,108,1,mar%: REM set line-feed and margin
FOR Y%=YMAX% TO YMIN% STEP -16
  VDU1,27,1,42,1,m%,1,N1%,1,N2%
  FOR X%=XMIN% TO XMAX% STEP 2
    B%=0
    FOR y%=0 TO 14 STEP 2
      B%=2*B%: C%=POINT(X%,Y%-y%)
      B%=B%+C%
    NEXT
    VDU1,B%
  NEXT X%
  VDU1,10: REM line-feed
NEXT Y%
VDU1,27,1,64,1,12,3: REM finished with printer
ENDPROC
It is possible to print the picture in landscape mode (i.e. sideways) if the screen is scanned in vertical strips. Programming for this is given on the Archive monthly disc.
Notice that, with m=4, the picture is squeezed a bit to fit into the width of A4 paper; with graphs which have their own scales, this usually doesn't matter. However, if m=0 is used, circles and squares will be undistorted but a narrow strip falls off one edge, so isn't printed. I have another dump which prints a mode 37 screen at half-scale so nothing need be lost.
The final stage in this project is to replace the two inner FOR-loops of the Basic code with assembler code so that the computer can allow the printer to run at full speed. In fact there is then some speed in hand, should a faster printer than mine be available.
REM Program with embedded assembler
MODE 18
m%=4: max%=640
REM m%=0,max%=480 OR m%=4,max%=640
PROCpicture: REM computes and generates the required picture
TIME=0
PROCPRdumU18a(0,1,1,1279,1023)
PRINT"TIME="TIME/100" sec"
END
DEFPROCPRdumU18a(mar%,XMIN%,YMIN%,XMAX%,YMAX%)
Y%=YMIN%
N%=(XMAX%-XMIN%+2)/2
IF N%>max% THEN N%=max%
N1%=N% MOD 256
N2%=N% DIV 256
PROCassU18
VDU2,1,27,1,64: REM set printer to power-on state
VDU1,27,1,65,1,8,1,27,1,108,1,mar%
FOR Y%=YMAX% TO YMIN% STEP -16
  !Y=Y%
  VDU1,27,1,42,1,m%,1,N1%,1,N2%
  CALL code%
NEXT Y%
VDU1,27,1,64,1,12,3: REM finished with printer
VDU7
VDU7
ENDPROC
DEFPROCassU18
DIM code% 100
FOR pass%=0 TO 2 STEP 2
P%=code%
[OPT pass%
  LDR R7,BC            \set byte counter
  LDR R6,X0            \collect XMIN
.xloop
  MOV R0,R6            \X-coordinate of pixel
  LDR R1,Y             \Y-coord.
  MOV R5,#0            \clear R5 ready to build up byte B
  MOV R8,#8            \set counter for reading 8 pixels
.pixels
  SUBS R8,R8,#1        \decrement counter and set flags
  MOV R5,R5,LSL#1      \left-shift B (=R5)
  SWI "OS_ReadPoint"
  ADD R5,R5,R2         \add pixel (0 or 1) to B
  SUB R1,R1,#2         \form Y for next pixel down
  BNE pixels           \loop back if not finished
  MOV R0,R5: SWI "OS_PrintChar" \send B
  ADD R6,R6,#2         \increment X
  SUBS R7,R7,#1        \decrement counter and set flags
  BNE xloop
  MOV R0,#10: SWI "OS_PrintChar" \send linefeed
  MOV PC,R14
.X0 EQUD XMIN%
.Y EQUD Y%
.BC EQUD N%
]
NEXT
ENDPROC
The parts of the program still in Basic will be recognised. There is a new call, PROCassU18, which produces the machine code and stores it in the space reserved by the statement DIM code%.
In the main program, the two inner FOR-loops have been replaced by the single line:
CALL code%
In the assembler code, the two inner FOR-loops are easily recognised. The innermost starts at the line before the label ".pixels" where the counter R8 is set to 8. Pixels are read by "OS_ReadPoint", which requires inputs x and y to be put into R0 and R1 respectively, and the pixel value is returned in R2. The counter R8 is decremented before each pixel is read, so the loop ends with the simple test for zero count, using BNE pixels.
The loop outside that is also easily traceable: R7 is the counter which receives N%, the number of bytes to be read, and BNE xloop is the test at the end. "OS_PrintChar" is used to send the bytes and the line-feed data to the printer; it sends one byte at a time from R0.
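The EQUD statements in the assembler, and the "!Y=Y%" in the Basic, provide the memory links required: each EQUD reserves a word of memory holding the current value of a Basic variable, and "!Y=Y%" writes the new value of Y% into the word labelled Y before each call. Because the EQUD operands are evaluated when the code is assembled, the instruction PROCassU18 must be positioned after XMIN%, Y% and N% have been given their values.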
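The link between Basic and assembler can be seen in isolation in the following minimal sketch, which is not part of the dump. The names val and V%, and the little add-one routine, are purely illustrative; the two-pass assembly and the use of EQUD and "!" follow the same pattern as PROCassU18.

```bbcbasic
REM Minimal sketch of the EQUD / ! link between Basic and assembler
DIM code% 32
V%=5 : REM must have a value before the code is assembled
FOR pass%=0 TO 2 STEP 2
P%=code%
[OPT pass%
  LDR R0,val      \ fetch the word stored at label val
  ADD R0,R0,#1    \ add one to it
  STR R0,val      \ store the result back
  MOV PC,R14      \ return to Basic
.val EQUD V%
]
NEXT
!val=41 : REM Basic overwrites the word at val, as !Y=Y% does in the dump
CALL code%
PRINT !val : REM the word should now hold 42
```

The initial value 5 placed by EQUD is never used; it is there only to show that the variable has to exist when the assembler meets it.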
On my A3010 with a BJC-4200, a full page is printed by the all-Basic program in 98 sec, very much faster than the BBC B which took about 9 min.
For the program with Basic and assembler, a print of the full mode 18 screen is produced in 49 sec. The computer finishes in 31 sec, but the printer buffer still has to empty, which takes the remaining 18 sec. If the computer sends the output to a 'printer sink' (*FX5,0), it finishes in only 19 sec.
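To put these times in perspective, the amount of data can be estimated from the program itself. This is a rough sketch only: it counts the bytes sent per strip by the listing above (five for ESC "*" m N1 N2, 640 data bytes and one line-feed) and ignores the few set-up and reset codes.

```bbcbasic
REM Rough data volume for a full mode 18 dump
strips%=1024 DIV 16    : REM 64 strips, each 8 pixel rows (16 g.u.) high
perstrip%=5+640+1      : REM header, data bytes, line-feed
total%=strips%*perstrip%
PRINT "Bytes sent: ";total%
PRINT "Bytes/sec while printing (31 sec): ";total%/31
PRINT "Bytes/sec to the sink (19 sec): ";total%/19
```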
Alternatively, saving the same picture to disc with *SCREENSAVE, loading that into Draw with *SCREENLOAD, and then printing it using !Printers 1.53 set to 360 dots/inch, took 83 sec; with 180 dots/inch, it took 40 sec, but the lines were fainter.
The monthly disc contains a program to make a suitable picture and to print it with the mode 18 'upright' dump. In addition, there is the code for a 'sideways' picture. There is also a nice upright half-scale dump for mode 37 which prints the picture provided in 26 sec. Instructions for running these programs are given in a text file.
It remains to correlate this printing method with what is happening inside !Printers. If the printer driver for your printer is loaded into !PrintEdit and the PRINT facility selected from the menu, then a list of the codes used will be printed. In the case of the Canon BJC-4200, the listing contains graphics codes which aren't in my expensive Programmer's Manual, but obviously my printer does respond to them. Perhaps they are just synonyms for ESC "*" when m>6? (The manual does give alternative codes for ESC "*" for m=0, 1, 2 and 3. They are, respectively, ESC "K", "L", "Y" and "Z", which are followed by N1 and N2, and then the data bytes.)
By serendipity, I found the article by Francis Crossley in Archive 9.12 describing 'raster graphics'. This contains the very same codes as I found in the Acorn printer driver and, on the monthly disc, he gave an assembly language program, in a text file. I need to study this further, for which it would be helpful to have Canon's (or Epson's) full description of the codes. It looks to me as if he uses the Acorn Assembly Language package, rather than the simpler assembler embedded in Basic V which I have used in this article. No one seems to have written about this package in Archive, and even the PRM just refers readers to the Acorn Desktop Assembler package. It would be useful if someone wrote up the package for us - for my part, I just want to be able to recognise its special features so that I can adapt such code for my own purposes (inside Basic). There must be lots of useful code about!
Source: Archive Magazine - 13.3
Publication: Archive Magazine
Contributor: John Barker