COMPILER

The last program in this book is the complete compiler which will take a program written in SPL and convert it into machine code for the 6502.

The compiler program, and the SPL program to be compiled, are first entered into different parts of memory. On running the compiler, the machine code is generated and put into memory, where it can be executed. The two versions of the compiler given below use the following memory areas:

Use:              BBC Computer:   Atom:
Compiler program  &E00-&27FF      #8200-#9800
SPL program       &2800-&3700     #2900-#39FF
Machine code      &3800-&3BFF     #3A00-#3BFF

The procedure for using the compiler is as follows:
On the BBC Computer first type:

PAGE=&2800
NEW

and enter the SPL program as you would a BASIC program. Since the symbols are in lower case there is no danger of them being converted into tokens by BASIC. Having done this, type:

PAGE=&E00
NEW

and either type in, or load from tape, the Compiler program. Then type:

RUN

The compiler performs two identical passes so that the assembler will resolve forward references. The first pass is performed with the screen turned off; then the message:

PRINT?

is given to allow CTRL-B to be typed to turn on the printer for a listing. Typing RETURN will then give the assembler listing statement by statement. After the second pass a symbol table will be printed, showing the addresses corresponding to all the symbols used by the program. Finally, to execute the machine code generated by the compiler type:

CALL &3800

where &3800 is the start of the machine code. Some of the programs should be entered not at the start of the machine code, but at the address corresponding to the label 'enter'; this address can be found from the symbol table.
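
The need for two identical passes can be seen from a minimal sketch, written here in Python rather than BASIC. The tiny instruction format and the names `one_pass` and `assemble` are hypothetical and serve only to illustrate the principle: on the first pass a forward label has no address yet, so a placeholder (here the start of the machine code) is assembled; on the second pass the symbol table is complete and the same code is generated with the correct address.

```python
# Sketch of two-pass assembly. The program format is invented for illustration;
# only the 6502 opcodes &4C (JMP absolute) and &60 (RTS) are real.
def one_pass(program, symbols, origin):
    """Assemble the program once, recording label addresses in symbols."""
    pc = origin
    code = []
    for item in program:
        kind = item[0]
        if kind == "label":
            symbols[item[1]] = pc              # label takes the current address
        elif kind == "jmp":
            # An unknown (forward) label assembles as the origin for now.
            target = symbols.get(item[1], origin)
            code += [0x4C, target & 0xFF, target >> 8]
            pc += 3
        elif kind == "rts":
            code.append(0x60)
            pc += 1
    return code


def assemble(program, origin=0x3800):
    """Run two identical passes; only the second sees every label."""
    symbols = {}
    first = one_pass(program, symbols, origin)
    second = one_pass(program, symbols, origin)
    return first, second, symbols


program = [("jmp", "ok"), ("rts",), ("label", "ok"), ("rts",)]
first, second, symbols = assemble(program)
```

In this example the first pass assembles the jump to the placeholder address, while the second pass assembles it to the address recorded for 'ok'. The code has the same length on both passes, which is what makes the label addresses found on the first pass valid on the second.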

On the Atom the corresponding sequence is as follows. First type:

?18=#29
NEW

and enter the SPL program. Then type:

?18=#82
NEW

and load the compiler. RUN as above, and then to execute the machine code type:

LINK #3A00

or the address corresponding to the label 'ENTER', if present.

Sample Run

The following run shows the assembler listing produced by the Atom version of the Compiler for the Bubble Sort program given above:

{BUBBLE SORT OF SCREEN}
PROC BUBBLE();
BEGIN I=0;
7210 3A00 A9 00    LDA @-L
7200 3A02 85 51    STA H
LOOP:J=I;
7040 3A04 A5 51    LDA L
7200 3A06 85 52    STA H
LOOPA:IF SCREEN[J]>SCREEN[J+1] THEN
7040 3A08 A5 52    LDA L
3685 3A0A AA       TAX
3685 3A0B BD 00 80 LDA V,X
7200 3A0E 85 81    STA H
7040 3A10 A5 52    LDA L
3070 3A12 18       CLC
3070 3A13 69 01    ADC @-U
3685 3A15 AA       TAX
3685 3A16 BD 00 80 LDA V,X
7200 3A19 85 82    STA H
7040 3A1B A5 81    LDA L
7320 3A1D C5 82    CMP M
4201 3A1F F0 02    BEQ P+4
4201 3A21 B0 03    BCS P+5
4201 3A23 4C 60 3A JMP LLG
BEGIN
TEMP=SCREEN[J];
7040 3A26 A5 52    LDA L
3685 3A28 AA       TAX
3685 3A29 BD 00 80 LDA V,X 
7200 3A2C 85 53    STA H
SCREEN[J]=SCREEN[J+1]; 
7040 3A2E A5 52    LDA L
3070 3A30 18       CLC
3070 3A31 69 01    ADC @-U
3685 3A33 AA       TAX
3685 3A34 BD 00 80 LDA V,X 
1330 3A37 A6 52    LDX L
1330 3A39 9D 00 80 STA V,X
SCREEN[J+1]=TEMP;
7040 3A3C A5 52    LDA L
3070 3A3E 18       CLC
3070 3A3F 69 01    ADC @-U
7200 3A41 85 81    STA H
7040 3A43 A5 53    LDA L
1330 3A45 A6 81    LDX L
1330 3A47 9D 00 80 STA V,X
IF J=0 THEN GOTO OK;
7040 3A4A A5 52    LDA L
4110 3A4C C9 00    CMP @-M 
4202 3A4E F0 03    BEQ P+5 
4202 3A50 4C 56 3A JMP LLG
2270 3A53 4C 60 3A JMP U
1445 3A56          :LLV
J=J-1;GOTO LOOPA
7040 3A56 A5 52    LDA L
3075 3A58 38       SEC
3075 3A59 E9 01    SBC @-U
7200 3A5B 85 52    STA H
2270 3A5D 4C 08 3A JMP U
END;
1445 3A60          :LLV
OK:I=I+1;IF I<255 THEN GOTO LOOP
7040 3A60 A5 51    LDA L
3070 3A62 18       CLC
3070 3A63 69 01    ADC @-U
7200 3A65 85 51    STA H
7040 3A67 A5 51    LDA L
4110 3A69 C9 FF    CMP @-M
4204 3A6B 90 03    BCC P+5
4204 3A6D 4C 73 3A JMP LLG
2270 3A70 4C 04 3A JMP U
END
1445 3A73          :LLV
6040 3A73 60       RTS
SYMBOLS:
FFE6  RDCH
FFF4  WRCH
F802  WRHEX
8000  SCREEN
B000  PORT
3A00  BUBBLE
  51  I
3A04  LOOP
  52  J
3A08  LOOPA
  53  TEMP 
3A60  OK

BBC Computer Version

5 REM ... Compiler
10 HIMEM=&2800
20 DIM SS(20),LAB(20),ID$(30),JJ(30) 
30 DIM TT(20),X% 7:A$=CHR$(6)

Important addresses:

MC – Start address for machine code.
VARS – Start address for variables and arrays.
TEMPS – 20 temporary locations.
PRARG – Location for use by procedures for argument.
SADD – Source program address.

35 MC=&3800: VARS=&50: TEMPS=&80: PRARG=&94: SADD=&2800

Pre-defined symbols:

rdch()    Function reads a character.
wrch(X)   Procedure writes character X in ASCII.
screen[0] ... screen[255]  Array to access screen.
40 ID$(0)="rdch":JJ(0)=&FFE0
50 ID$(1)="wrch":JJ(1)=&FFEE
60 ID$(2)="screen":JJ(2)=&7C00
70 FOR N=0TO20:TT(N)=0:NEXT

Now do compilation; first pass with screen disabled, and second pass with screen enabled. Finally print symbol table.

200 PRINT CHR$(21):PROCOMPILE
220 PRINTA$'"PRINT";:INPUTB$:PROCOMPILE
225 PRINT '"SYMBOLS:"
230 FORN=0TOI-1:PRINT JJ(N)," ",ID$(N):NEXT
240 END

PROCOMPILE – One pass of compilation. Initialise pointers, with I=3 since there are 3 pre-defined symbols. Then compile statement, and make sure accumulator is stored finally.

900 DEF PROCOMPILE:G=0:A=SADD:I=3:P%=MC
910 S=0:R=0:H=0:T=0
920 PROCSTMT:PROCSTA:ENDPROC

PROCSTMT – Statement. Skip blanks, read symbol, then check for keywords. Ignore 'end' if found.

1000 DEF PROCSTMT
1010 PROCSP:PROCSYM
1020 IF$X%="if"GOTO1400
1030 IF$X%="begin"GOTO1200
1040 IF$X%="goto"GOTO1500
1045 IF$X%="end"A=A-3:ENDPROC
1050 IF$X%="proc"GOTO1700
1060 IF$X%="array"GOTO1800
1070 IF$X%="return"GOTO1900

If the symbol is not a keyword then it must be a label, an assignment statement, or a procedure call.

1100 REM IDENT STATEMENT
1110 PROCSP
1120 IF?A=ASC":"A=A+1:PROCVAR:JJ(N)=P%:PROCSTMT:ENDPROC
1130 IF?A=ASC"("PROCVAR:PROCPSH(U):PROCBODY:ENDPROC
1135 PROCV:IF?A=ASC("[") GOTO 1300
1160 PROCRHS:H=V:PROCSTA:ENDPROC

PROCRHS – Right hand side of assignment statement.

1180 DEF PROCRHS
1185 IF ?A<>ASC"=" PRINTA$"NO =":PROCERR
1190 A=A+1:PROCEXP:L=FNPUL
1195 PROCLDA:V=FNPUL:ENDPROC

'begin' - Deal with 'begin' ... 'end' block.

1200 REM BEGIN...END
1210 A=A-1: REPEAT A=A+1
1220 PROCSTMT:PROCSP
1230 UNTIL ?A<>ASC";"
1240 PROCSYM:IF$X%="end"ENDPROC
1250 PRINT"NO END":PROCERR

Array element on the left-hand-side of an assignment statement.

1300 REM ARRAY=
1310 A=A+1:PROCEXP:IF?A<>ASC"]"PRINTA$"NO ]":PROCERR
1320 A=A+1:PROCRHS:L=V:V=FNPUL
1325 IFL<=0[LDX #-L:STA V,X:]:H=0:ENDPROC
1330 [LDX L:STA V,X:]:H=0:ENDPROC

'if' – Assemble code to evaluate condition, and following 'then' assemble code to execute a statement.
Pull label from stack and assemble label.
Deal with 'else' clause.

1400 REM IF...THEN...ELSE
1410 PROCLOGICAL:PROCSP
1420 PROCSYM:IF$X%="then"GOTO1430
1425 PRINTA$"NO then":PROCERR
1430 PROCSTMT:V=FNPUL:PROCSP
1440 PROCSYM:IF$X%="else"GOTO1460
1445 A=A-N:[.LAB(V):]:ENDPROC
1460 G=FNLAB:U=G:PROCPSH(U):[JMP LAB(G):]
1470 [.LAB(V):]:PROCSTMT
1490 V=FNPUL:[.LAB(V):]:ENDPROC

'goto' - Get label and assemble jump to it.

1500 REM GOTO
1510 PROCLABEL:[JMP U:]
1520 ENDPROC

'proc' - Get name and set its value to entry address P. Then get dummy parameter.

1700 REM PROC
1710 PROCLABEL:JJ(N)=P%:IF?A<>ASC"("PRINTA$"MISSING BRACKET":PROCERR
1720 A=A+1:JJ(I)=1:PROCIDENT:IFN=0GOTO1780
1730 U=N:PROCPSH(U)
1740 T=PRARG:H=T:JJ(U)=T:PROCSTA
1745 IF?A<>ASC")"PRINTA$"NO BRACKET":PROCERR
1750 A=A+1:PROCSP:IF?A<>ASC";"PRINTA$"NO ;":PROCERR
1760 A=A+1:PROCSTMT
1770 V=FNPUL:N=V:V=FNPUL:JJ(N)=V:[RTS:]:ENDPROC

Come here if procedure has no parameter.

1780 IF?A<>ASC")"PRINTA$"NO BRACKET":PROCERR
1782 A=A+1:PROCSP:IF?A<>ASC";"PRINTA$"NO ;":PROCERR
1785 A=A+1:PROCSTMT:[RTS:]:ENDPROC

'array' - Look up array name; assign space from VARS onwards. Allow multiple declarations, separated by commas.

1800 REM ARRAY
1810 A=A-1: REPEAT A=A+1
1820 PROCSP:PROCSYM:PROCLOOK
1830 IFN<>I PRINTA$"ARRAY DECLARED":PROCERR
1840 IF?A<>ASC"["PRINTA$"BRACKET MISSING":PROCERR
1850 A=A+1:PROCONST:IFN=0PRINTA$"CONSTANT MISSING":PROCERR
1860 V=FNPUL:JJ(I)=VARS+R:I=I+1:R=R-V+1
1870 IF?A<>ASC"]"PRINTA$"BRACKET MISSING":PROCERR
1880 A=A+1:PROCSP:UNTIL?A<>ASC",":ENDPROC

'return' - Assemble code to load accumulator with expression.

1900 REM RETURN
1910 PROCEXP:V=FNPUL:L=V:PROCLDA:H=0:ENDPROC

PROCIDENT - Read an identifier.

2000 DEF PROCIDENT
2010 PROCSYM:PROCV:ENDPROC

PROCV - Look up identifier X in symbol table. If symbol does not already exist (N=I) allocate address for it. Push address to stack.

2020 DEF PROCV
2030 IF N=0 ENDPROC
2040 PROCLOOK
2050 IFN=I:I=I+1:R=R+1:JJ(N)=R+VARS
2070 IFI>30PRINT"TOO MANY VARIABLES":PROCERR
2080 U=JJ(N):PROCPSH(U):ENDPROC

PROCONST - Read a decimal constant. If not found, N=0. If found, push minus its value.

2100 DEF PROCONST
2105 PROCSP
2110 N=-1:C=0:REPEAT D=C:N=N+1:C=C*10
2120 C=C+A?N-ASC"0"
2130 UNTILA?N<ASC"0"ORA?N>ASC"9"
2140 IFN=0 ENDPROC
2150 A=A+N
2160 U=-D:PROCPSH(U):N=1:ENDPROC
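
The constant reader above can be sketched in Python as follows. This is an illustrative reconstruction rather than a translation: the function name `read_constant` and its return convention are assumptions, and the initial skipping of blanks is left out. The point it shows is that a constant is pushed as minus its value, so that later routines can treat any value of zero or less as an immediate operand and any positive value as an address.

```python
def read_constant(text, pos):
    """Read a decimal constant from text starting at pos.

    Returns (found, pushed_value, new_pos). The pushed value is MINUS the
    constant, mirroring U=-D in line 2160.
    """
    value = 0
    n = 0
    while pos + n < len(text) and "0" <= text[pos + n] <= "9":
        value = value * 10 + (ord(text[pos + n]) - ord("0"))
        n += 1
    if n == 0:
        return False, None, pos        # no digits: corresponds to N=0
    return True, -value, pos + n       # corresponds to N=1, A advanced


result = read_constant("255 then", 0)
```

Here `result` is `(True, -255, 3)`: the constant was found, minus its value would be pushed, and the pointer has advanced past the three digits.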

PROCLABEL - Read label.

2200 DEF PROCLABEL
2210 PROCSP:PROCSYM
2220 IF N=0 PRINT"LABEL MISSING":PROCERR
2225 PROCVAR:ENDPROC

PROCVAR - Look up label in symbol table. If not found (N=I) put it in. Return its address in U.

2230 DEF PROCVAR:PROCLOOK
2250 IFN=I:I=I+1
2260 IFI>30PRINTA$"TOO MANY VARIABLES":PROCERR
2270 U=JJ(N):ENDPROC

PROCLOOK - Look up $X% in symbol table, ID$(0), ID$(1) ... If not found, N=I.

2400 DEF PROCLOOK
2410 ID$(I)=$X%:N=-1
2420 REPEAT N=N+1:UNTILID$(N)=ID$(I):ENDPROC
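
PROCLOOK uses a sentinel search: the symbol being sought is first stored in the next free slot of the table, so the search loop needs no end-of-table test and is guaranteed to stop. A minimal Python sketch of the same idea follows; the name `look_up` and the use of a Python list are assumptions made for illustration.

```python
def look_up(names, count, symbol):
    """Return the index of symbol in names[0..count-1], or count if absent.

    names must have a spare slot at index count for the sentinel.
    """
    names[count] = symbol          # plant the sentinel in the free slot
    n = 0
    while names[n] != symbol:      # always terminates, at the sentinel if nowhere else
        n += 1
    return n                       # n == count means 'not found'


names = ["rdch", "wrch", "screen", None]
found = look_up(names, 3, "wrch")
missing = look_up(names, 3, "temp")
```

A result equal to the number of symbols corresponds to N=I in the listing. The new name is already in place in the table, so the caller only has to increment I to keep it.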

PROCEXP - Assemble code to calculate an expression, of the form:

<factor> <operator> <factor>
where <operator> is one of:
+  : add            - : subtract
|  : OR             & : AND
<< : left shift     >> : right shift

Then push the address of the result on the stack.

3000 DEF PROCEXP:PROCSP:PROCFACTOR
3010 PROCSP
3020 IF?A=ASC"+"OR?A=ASC"-"OR?A=ASC"&"OR?A=ASC"|"O=?A:A=A+1:GOTO 3035
3025 IF NOT((?A=ASC">"AND A?1=ASC">" ) OR (?A=ASC "<"AND A?1=ASC"<"))ENDPROC
3030 O=?A:A=A+2
3035 PROCPSH(O)
3040 PROCFACTOR:U=FNPUL:O=FNPUL:L=FNPUL:PROCLDA
3045 IF U<=0 GOTO 3070
3050 IFO=ASC"+"[CLC:ADC U:]
3055 IFO=ASC"-"[SEC:SBC U:]
3060 IFO=ASC"|"[ORA U:]
3065 IFO=ASC"&"[AND U:]
3068 GOTO 3190
3070 IFO=ASC"+"[CLC:ADC #-U:]
3075 IFO=ASC"-"[SEC:SBC #-U:]
3080 IFO=ASC"|"[ORA #-U:]
3085 IFO=ASC "&" [AND #-U: ]
3160 IFO=ASC"<"FOR N=1TO-U:[ASL A:]:NEXT
3180 IFO=ASC">"FOR N=1TO-U:[LSR A:]:NEXT
3190 L=U:PROCRELEASE(L):PROCTEMP
3195 GOTO 3010
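
The way lines 3045 to 3180 choose between an address operand and an immediate operand can be summarised by the following Python sketch. It produces assembler text rather than machine code, and the names `MNEMONICS` and `emit_operation` are hypothetical. It relies on the convention that a second operand U greater than zero is an address, while U of zero or less stands for the constant -U.

```python
# Operator -> instructions, the last entry taking the operand.
MNEMONICS = {
    "+": ["CLC", "ADC"],
    "-": ["SEC", "SBC"],
    "|": ["ORA"],
    "&": ["AND"],
}


def emit_operation(op, u):
    """Return the assembler lines for 'accumulator op u'."""
    if op in ("<", ">"):
        # Shifts are repeated -u times; as in the listing, nothing is
        # generated when the shift count is not a constant (u > 0).
        shift = "ASL A" if op == "<" else "LSR A"
        return [shift] * (-u)
    *prefix, mnemonic = MNEMONICS[op]
    operand = f"#{-u}" if u <= 0 else f"&{u:02X}"
    return prefix + [f"{mnemonic} {operand}"]


add_one = emit_operation("+", -1)
sub_var = emit_operation("-", 0x52)
```

For example, adding the constant 1 gives CLC followed by ADC #1, and subtracting the variable at &52 gives SEC followed by SBC &52.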

PROCBODY - Procedure body. Check for ')'.
If there is a parameter first assemble code to calculate parameter, load it into the accumulator, and then JSR.
Assume subroutine alters accumulator, so set H=0.

3200 DEF PROCBODY
3210 IFA?1=ASC")"A=A+2:GOTO3230
3220 PROCFACTOR:V=FNPUL:L=V:PROCLDA
3230 V=FNPUL:[JSR V:]:H=0:ENDPROC

PROCFACTOR - Factor. Check for symbol, constant, or bracketed expression.
If the symbol is followed by '(' or '[' then it is a function or an array element respectively.

3600 DEF PROCFACTOR
3610 PROCSYM:IFN=0GOTO3630
3615 IF?A=ASC"("GOTO3690
3620 PROCV:IF?A=ASC"["GOTO3670
3625 ENDPROC
3630 PROCONST:IF N ENDPROC
3635 IF?A<>ASC"(" PRINTA$"BRACKET MISSING":PROCERR
3640 A=A+1:PROCEXP:PROCSP
3650 IF?A<>ASC")" PRINTA$"BRACKET MISSING":PROCERR
3660 A=A+1:ENDPROC

Evaluate array element. Assemble code to evaluate array index and load it into the accumulator; then TAX and load indexed by the base address.

3670 REM ARRAYS
3675 A=A+1:PROCEXP:IF?A<>ASC"]"PRINTA$"NO BRACKET":PROCERR
3680 A=A+1:L=FNPUL:PROCLDA:V=FNPUL
3685 [TAX:LDA V,X:]:PROCTEMP:ENDPROC

Call function here.

3690 PROCVAR:PROCPSH(U):PROCBODY:PROCTEMP:ENDPROC

PROCLOGICAL - Logical expression. Look for: <expression> <comparison> <expression>

4000 DEF PROCLOGICAL
4010 PROCSP:PROCEXP:PROCSP

Expect a comparison here; look for '<', '>', and '=' and set value of U depending on sequence:

>  : 1    =  : 2
>= : 3    <  : 4
<> : 5    <= : 6

Then use a computed GOTO to assemble code for each case.

4020 U=0
4030 IF?A=ASC"<"A=A+1:U=4
4040 IF?A=ASC">"A=A+1:U=U+1
4050 IF?A=ASC"="A=A+1:U=U+2
4060 IFU=0 OR U>6 PRINT"ILLEGAL TEST":PROCERR
4070 PROCPSH(U):PROCEXP
4080 M=FNPUL:U=FNPUL
4090 L=FNPUL:PROCLDA
4100 IFM>0[CMP M:]
4110 IFM<=0 [CMP #-M:]

First generate a label LAB(G). Then assemble code for the comparison.
Note that if the condition is true we branch around a jump to LAB(G).
Push value of LAB(G) for use by the IF...THEN statement.

4120 PROCRELEASE(M):G=FNLAB:GOTO(4200+U)
4201 [BEQ P%+4:BCS P%+5:]:GOTO4210
4202 [BEQ P%+5:]:GOTO4210
4203 [BCS P%+5:]:GOTO4210
4204 [BCC P%+5:]:GOTO4210
4205 [BNE P%+5:]:GOTO4210
4206 [BCC P%+7:BEQ P%+5:]:GOTO4210
4210 [JMP LAB(G):]
4220 U=G:PROCPSH(U):H=0:ENDPROC
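
The encoding of the comparison into a number from 1 to 6, and the branch sequence selected by the computed GOTO, can be sketched in Python as below. The function name `comparison_code` and the `BRANCHES` table are illustrative assumptions; the branch strings simply repeat lines 4201 to 4206.

```python
def comparison_code(text, pos):
    """Return (code, new_pos) for the comparison operator at text[pos]."""
    u = 0
    if text[pos:pos + 1] == "<":
        pos += 1
        u = 4
    if text[pos:pos + 1] == ">":
        pos += 1
        u += 1
    if text[pos:pos + 1] == "=":
        pos += 1
        u += 2
    if u == 0 or u > 6:
        raise ValueError("ILLEGAL TEST")
    return u, pos


# Branches taken when the condition is TRUE; each one skips the
# three-byte JMP to the 'condition false' label that follows them.
BRANCHES = {
    1: ["BEQ P%+4", "BCS P%+5"],   # >  : equal falls into the JMP, else carry set skips it
    2: ["BEQ P%+5"],               # =
    3: ["BCS P%+5"],               # >=
    4: ["BCC P%+5"],               # <
    5: ["BNE P%+5"],               # <>
    6: ["BCC P%+7", "BEQ P%+5"],   # <= : either branch skips the JMP
}

code, end = comparison_code("<=", 0)
```

Because each test only adds to U, the two-character operators are obtained without any special cases: '<' followed by '=' gives 4+2=6, and '<' followed by '>' gives 4+1=5.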

PROCPSH - Push argument onto stack.

5020 DEF PROCPSH(U):SS(S)=U:S=S+1:IFS<21 ENDPROC
5021 PRINTA$"STACK FULL":PROCERR

FNPUL - Pull from stack.

5030 DEF FNPUL:S=S-1:IFS>=0 =SS(S)
5031 PRINTA$"STACK ERROR":PROCERR

PROCSP - Skip blanks, line numbers, and comments between '{' and '}'.

5040 DEF PROCSP
5042 IF?A=32 REPEAT A=A+1:UNTIL?A<>32
5046 IF?A=13A=A+4:PRINT $A:GOTO5042
5048 IF?A=ASC"{"REPEATA=A+1:UNTIL?A=ASC"}":A=A+1:GOTO5042
5049 ENDPROC

FNLAB - Return a new label number. Label is LAB(G).

5070 DEF FNLAB:G=G+1:IF G<20 =G
5071 PRINTA$"TOO MANY LABELS":PROCERR

PROCTEMP - Generate a temporary location TT(N); return its address in T, set H to the address, and push the address.

5100 DEF PROCTEMP
5110 N=-1:REPEAT N=N+1: IF N>20PRINTA$"NOT ENOUGH TEMP":PROCERR
5120 UNTILTT(N)=0
5130 T=N+TEMPS:TT(N)=T:U=T:H=T:PROCPSH(U):ENDPROC

PROCSYM - Read a symbol into X% from A. Returns N=0 if no symbol found.

6000 DEF PROCSYM
6010 PROCSP:N=-1:REPEAT N=N+1: N?X%=A?N
6020 UNTILA?N>ASC"z"ORA?N<ASC"a"ORN=7
6030 IF N=0 ENDPROC
6040 IF N<7 N?X%=&D:A=A+N:ENDPROC
6050 PRINTA$"SYMBOL TOO LONG":PROCERR

PROCLDA - Assemble code to load the accumulator with L. If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents and load new contents.

7000 DEF PROCLDA
7010 IFL=H AND L>0 PROCRELEASE(L):ENDPROC
7020 PROCSTA
7030 IFL<=0 [LDA #-L:]:ENDPROC
7040 [LDA L:]:PROCRELEASE(L)
7050 ENDPROC

PROCSTA - Assemble code to store accumulator's contents to location H.

7100 DEF PROCSTA
7200 IFH>0[STA H:]:H=0
7210 ENDPROC
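
PROCLDA and PROCSTA together keep track of what the accumulator holds, so that a store is only assembled when the value is about to be lost and a load is omitted when the value is already there. The following Python sketch shows the idea; the class name `Accumulator` is an assumption, assembler text is produced instead of machine code, and the release of temporary locations is left out.

```python
class Accumulator:
    """Track the address H whose value is currently in the accumulator."""

    def __init__(self):
        self.h = 0          # 0 means: nothing that needs to be stored
        self.code = []      # assembler lines generated so far

    def store(self):
        """Counterpart of PROCSTA: save the accumulator if it is pending."""
        if self.h > 0:
            self.code.append(f"STA &{self.h:02X}")
        self.h = 0

    def load(self, l):
        """Counterpart of PROCLDA: l > 0 is an address, l <= 0 a constant -l."""
        if l == self.h and l > 0:
            return                      # already in the accumulator
        self.store()
        if l <= 0:
            self.code.append(f"LDA #{-l}")
        else:
            self.code.append(f"LDA &{l:02X}")


acc = Accumulator()
acc.load(0)        # i=0 : load the constant 0 ...
acc.h = 0x51       # ... which now belongs in i
acc.store()
acc.load(0x51)     # j=i : load i ...
acc.h = 0x52       # ... which now belongs in j
acc.store()
```

The four lines generated, a load of 0, a store to &51, a load from &51 and a store to &52, have the same shape as the code for the first two assignments in the sample run above.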

PROCRELEASE - Release specified temporary variable for re-use.

7300 DEF PROCRELEASE(L)
7310 IF L>=TEMPS AND L<TEMPS+20:TT(L-TEMPS)=0
7320 ENDPROC
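
The allocation and release of temporary locations by PROCTEMP and PROCRELEASE amounts to a small table of flags. A Python sketch under the same conventions follows; the class name `Temporaries` is hypothetical, and the base address is taken from TEMPS in line 35.

```python
TEMPS = 0x80   # start of the temporary locations, as set in line 35


class Temporaries:
    def __init__(self, count=20):
        self.flags = [0] * count        # 0 = free, otherwise the address in use

    def allocate(self):
        """Return the address of the first free temporary and mark it used."""
        for n, flag in enumerate(self.flags):
            if flag == 0:
                self.flags[n] = TEMPS + n
                return TEMPS + n
        raise RuntimeError("NOT ENOUGH TEMP")

    def release(self, address):
        """Free a temporary; addresses outside the temporary area are ignored."""
        if TEMPS <= address < TEMPS + len(self.flags):
            self.flags[address - TEMPS] = 0


temps = Temporaries()
a = temps.allocate()
b = temps.allocate()
temps.release(a)
c = temps.allocate()
```

Ignoring addresses outside the temporary area is what allows the compiler to call the release routine on any operand, whether it is a variable, a constant, or a temporary.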

PROCERR - Output error. Print line containing error and '^' pointing to approximate position.

9000 DEF PROCERR
9010 N=A:X=0:REPEAT N=N-1:X=X+1:UNTIL?N=13:@%=5
9020 PRINT'N?1*256+N?2,$(N+4)
9030 PRINT TAB(X+2);"^":END

Variables:

A - Pointer to current position in expression being compiled
C - Used to evaluate constant
G - Number of next free label LAB(G)
H - Address whose contents are currently in accumulator. H=0 means ignore previous contents
I - Number of next free symbol
ID$(0)..ID$(30) - Symbol names
JJ(0)..JJ(30) - Addresses of symbols
L - Value or address to be loaded into accumulator; used by PROCLDA
LAB(0)..LAB(20) - Labels for use in assembly
MC - Assemble machine code to here
N - Temporary variable
O - Operator read by PROCEXP
P% - Program location counter, used by assembler
PRARG - Location for use by procedures for argument
R - Number of variable locations used up
S - Next free location on SS stack
SADD - Source program address
SS(0)..SS(20) - Stack used by compiler
T - Temporary location assigned by PROCTEMP
TEMPS - 20 temporary locations start here
TT(0)..TT(20) - Flags for temporary locations; value=0 if location is free for use
U - Value pushed by PROCPSH
V - Used by FNPUL
VARS - Allocate variables and arrays starting here
X% - String into which symbols and keywords are read by PROCSYM

Atom Version

To save program space in this version of the compiler abbreviated forms of many of the BASIC statements and commands have been used, and for convenience these are listed below:

Abbreviation:  Keyword:
A.             AND
E.             END
F.             FOR
G.             GOTO
GOS.           GOSUB
N.             NEXT
P.             PRINT
R.             RETURN
U.             UNTIL

10 REM ... COMPILER ...
20 DIM SS(20),LL(20),II(30),JJ(30) 
30 DIM X(7),TT(20),RR(4)

Important addresses:

RR0 - Start of machine code.
RR1 - Start of variables and arrays.
RR2 - 20 temporary locations.
RR3 - Location for use by procedures for argument.
RR4 - Source program address.

35 RR0=#3A00;RR1=#50;RR2=#80;RR3=#94;RR4=#2900
40 F.N=0TO30;DIMI(6);IIN=I;JJN=RR0;N.

Pre-defined symbols:

RDCH()    Function reads a character.
WRCH(X)   Procedure writes character X.
WRHEX(X)  Procedure writes X as two hex digits.
SCREEN[0] ... SCREEN[255]  Array to access screen.
PORT[0]   ... PORT[3]      Array to access I/O ports.

50 $II0="RDCH";JJ0=#FFE3;$II1="WRCH";JJ1=#FFF4
60 $II2="WRHEX";JJ2=#F802;$II3="SCREEN";JJ3=#8000
70 $II4="PORT";JJ4=#B000
115 F.N=0TO20;TTN=0;LLN=RR0;N.

Now do compilation; first pass with screen disabled, and second pass with screen enabled. Finally print symbol table.

210 P.$21;GOS.a
220 P.$6'"PRINT";IN.$100;GOS.a
228 P.'"SYMBOLS:"'
230 F.N=0TOI-1;P.&JJN,"  ",$IIN';N.
240 E.

a - One pass of compilation. Initialise pointers, with I=5 since there are 5 pre-defined symbols.
Then compile statement, and make sure accumulator is stored finally.

900aG=0;A=RR4;I=5;P=RR0
910 S=0;R=0;H=0;T=0
920 GOS.s;GOS.m;R.

s - Statement. Skip blanks, read symbol, then check for keywords. Ignore END if found.

1000sREM STATEMENT
1010 GOS.b;GOS.x
1020 IF$X="IF"G.1400
1030 IF$X="BEGIN"G.1200
1040 IF$X="GOTO"G.1500
1045 IF$X="END"A=A-3;R.
1050 IF$X="PROC"G.1700
1060 IF$X="ARRAY"G.1800
1070 IF$X="RETURN"G.1900

If the symbol is not a keyword then it must be a label, an assignment statement, or a procedure call.

1100 REM IDENT STATEMENT
1110 GOS.b
1120 IF?A=CH":"A=A+1;GOS.h;JJN=P;G.s
1130 IF?A=CH"("GOS.h;GOS.u;G.p
1135 GOS.j;IF?A=CH"["G.1300
1160 GOS.d;H=V;GOS.m;R.

d - Right-hand side of assignment statement.

1180dIF?A<>CH"=" P.$6"NO =";G.o
1190 A=A+1;GOS.e;GOS.v;L=V;GOS.l;GOS.v;R.

BEGIN - Deal with BEGIN ... END block.

1200 REM BEGIN...END
1210 A=A-1;DO A=A+1
1220 GOS.s;GOS.b
1230 U. ?A<>CH";"
1240 GOS.x;IF$X="END"R.
1250 P.$6"NO END";G.o

Array element on the left-hand side of an assignment statement.

1300 REM ARRAY=
1310 A=A+1;GOS.e;IF?A<>CH"]"P.$6"NO ]";G.o
1320 A=A+1;GOS.d;L=V;GOS.v
1325 IFL<=0[LDX @-L;STA V,X;];H=0;R.
1330 [LDX L;STA V,X;];H=0;R.

IF - Assemble code to evaluate condition, and following THEN assemble code to execute a statement.
Pull label from stack and assemble label. Deal with ELSE clause.

1400 REM IF...THEN...ELSE
1410 GOS.q;GOS.b
1420 GOS.x;IF$X="THEN"G.1430
1425 P.$6"NO THEN";G.o
1430 GOS.s;GOS.v;GOS.b
1440 GOS.x;IF$X="ELSE"G.1460
1445 A=A-N;[:LLV;];R.
1460 GOS.g;U=G;GOS.u;[JMP LLG;]
1470 [:LLV;];GOS.s
1490 GOS.v;[:LLV;];R.

GOTO - Get label and assemble jump to it.

1500 REM GOTO
1510 GOS.k;[JMP U;];R.

PROC - Get name and set its value to entry address P. Then get dummy parameter.

1700 REM PROC
1710 GOS.k;JJN=P;IF?A<>CH"("P.$6"MISSING BRACKET";G.o
1720 A=A+1;JJI=1;GOS.i;IFN=0G.1780
1730 U=N;GOS.u
1740 T=RR3;H=T;JJU=T;GOS.m
1745 IF?A<>CH")"P.$6"NO BRACKET";G.o
1750 A=A+1;GOS.b;IF?A<>CH";"P.$6"NO ;";G.o
1760 A=A+1;GOS.s
1770 GOS.v;N=V;GOS.v;JJN=V;[RTS;];R.

Come here if procedure has no parameter.

1780 IF?A<>CH")"P.$6"NO BRACKET";G.o
1782 A=A+1;GOS.b;IF?A<>CH";"P.$6"NO ;";G.o
1785 A=A+1;GOS.s;[RTS;];R.

ARRAY - Look up array name; assign space from RR1 onwards. Allow multiple declarations, separated by commas.

1800 REM ARRAY
1810 A=A-1;DO A=A+1
1820 GOS.b;GOS.x;GOS.y
1830 IFN<>I P.$6"ARRAY DECLARED";G.o
1840 IF?A<>CH"["P.$6"BRACKET MISSING";G.o
1850 A=A+1;GOS.c;IFN=0P.$6"CONSTANT MISSING";G.o
1860 GOS.v;JJI=RR1+R;I=I+1;R=R-V+1
1870 IF?A<>CH"]"P.$6"BRACKET MISSING";G.o
1880 A=A+1;GOS.b;UNTIL?A<>CH",";R.

RETURN - Assemble code to load accumulator with expression.

1900 REM RETURN
1910 GOS.e;GOS.v;L=V;GOS.l;H=0;R.

i - Read an identifier.

2000iREM IDENTIFIER
2010 GOS.x

j - Look up identifier X in symbol table.
If symbol does not already exist (N=I) allocate address for it.
Push address to stack.

2030jIF N=0 R.
2040 GOS.y
2050 IFN=I;I=I+1;R=R+1;JJN=R+RR1
2070 IFI>30P.$6"TOO MANY VARIABLES";G.o
2080 U=JJN;GOS.u;R.

c - Read a decimal constant. If not found, N=0. If found push minus its value.

2100cREM CONSTANT
2105 GOS.b
2110 N=-1;C=0;DO D=C;N=N+1;C=C*10
2120 C=C+A?N-CH"0"
2130 U.A?N<CH"0"ORA?N>CH"9"
2140 IFN=0 R.
2150 A=A+N
2160 U=-D;GOS.u;N=1;R.

k - Read label.

2200kREM LABELS
2210 GOS.b;GOS.x
2220 IF N=0 P.$6"LABEL MISSING";G.o

h - Look up label in symbol table. If not found (N=I) put it in. Return its address in U.

2230hGOS.y
2250 IFN=I;I=I+1
2260 IFI>30P.$6"TOO MANY VARIABLES";G.o
2270 U=JJN;R.

y - Look up $X in symbol table, $II(0), $II(1) ... If not found, N=I.

2400yREM LOOKUP
2410 $III=$X;N=-1
2420 DO N=N+1;U.$IIN=$III;R.

e - Assemble code to calculate an expression, of the form:
<factor> <operator> <factor>
where <operator> is one of:

+  : add             - : subtract
|  : OR              & : AND
<< : left shift     >> : right shift

Then push the address of the result on the stack.

3000eREM EXPRESSION
3010 GOS.b;GOS.f
3015 GOS.b
3020 IF?A=CH"+"OR?A=CH"-"OR?A=CH"&"OR?A=CH"|"O=?A;A=A+1;G.3035
3025 IF((?A=CH">"A.A?1=CH">")OR(?A=CH"<"A.A?1=CH"<"))<1 R.
3030 O=?A;A=A+2
3035 U=O;GOS.u
3040 GOS.f;GOS.v;U=V;GOS.v;O=V;GOS.v;L=V;GOS.l
3045 IF U<=0 G.3070
3050 IFO=CH"+"[CLC;ADC U;]
3055 IFO=CH"-"[SEC;SBC U;]
3060 IFO=CH"|"[ORA U;]
3065 IFO=CH"&"[AND U;]
3068 G.3190
3070 IFO=CH"+"[CLC;ADC @-U;]
3075 IFO=CH"-"[SEC;SBC @-U;]
3080 IFO=CH"|"[ORA @-U;]
3085 IFO=CH"&"[AND @-U;]
3160 IFO=CH"<"F.N=1TO-U;[ASL A;];N.
3180 IFO=CH">"F.N=1TO-U;[LSR A;];N.
3190 L=U;GOS.r;GOS.t
3195 G.3015

p - Procedure body. Check for ')'. If there is a parameter first assemble code to calculate parameter, load it into the accumulator, and then JSR. Assume subroutine alters accumulator, so set H=0.

3200pREM PROC BODY
3210 IFA?1=CH")"A=A+2;G.3230
3220 GOS.f;GOS.v;L=V;GOS.l
3230 GOS.v;[JSR V;];H=0;R.

f - Factor. Check for symbol, constant, or bracketed expression.
If the symbol is followed by '(' or '[' then it is a function or an array element respectively.

3600fREM FACTOR
3610 GOS.x;IFN=0G.3630
3615 IF?A=CH"("G.3690
3620 GOS.j;IF?A=CH"["G.3670
3625 R.
3630 GOS.c;IF N R.
3635 IF?A<>CH"(" P.$6"BRACKET MISSING";G.o
3640 A=A+1;GOS.e;GOS.b
3650 IF?A<>CH")" P.$6"BRACKET MISSING";G.o
3660 A=A+1;R.

Evaluate array element. Assemble code to evaluate array index and load it into the accumulator; then TAX and load indexed by the base address.

3670 REM ARRAYS
3675 A=A+1;GOS.e;IF?A<>CH"]"P.$6"NO BRACKET";G.o
3680 A=A+1;GOS.v;L=V;GOS.l;GOS.v
3685 [TAX;LDA V,X;];G.t

Call function here.

3690 GOS.h;GOS.u;GOS.p;G.t

q - Logical expression. Look for:
<expression> <comparison> <expression>

4000qREM LOGICAL
4010 GOS.b;GOS.e;GOS.b

Expect a comparison here; look for '<', '>', and '=' and set value of U depending on sequence:

>  : 1    =  : 2
>= : 3    <  : 4
<> : 5    <= : 6

Then use a computed GOTO to assemble code for each case.

4020 U=0
4030 IF?A=CH"<"A=A+1;U=4
4040 IF?A=CH">"A=A+1;U=U+1
4050 IF?A=CH"="A=A+1;U=U+2
4060 IFU=0 OR U>6 P.$6"ILLEGAL TEST";G.o
4070 GOS.u;GOS.e
4080 GOS.v;M=V;GOS.v;U=V
4090 GOS.v;L=V;GOS.l
4100 IFM>0[CMP M;]
4110 IFM<=0[CMP @-M;]

First generate a label LLG. Then assemble code for the comparison. Note that if the condition is true we branch around a jump to LLG. Push value of LLG for use by the IF...THEN statement.

4120 L=M;GOS.r;GOS.g;G.(4200+U)
4201 [BEQ P+4;BCS P+5;];G.z
4202 [BEQ P+5;];G.z
4203 [BCS P+5;];G.z
4204 [BCC P+5;];G.z
4205 [BNE P+5;];G.z
4206 [BCC P+7;BEQ P+5;];G.z
4210z[JMP LLG;]
4220 U=G;GOS.u;H=0;R.

u - Push U onto stack.

5020uSSS=U;S=S+1;IFS<21 R.
5021 P.$6"STACK FULL";G.o

v - Pull V from stack.

5030vS=S-1;IFS>=0V=SSS;R.
5031 P.$6"STACK ERROR";G.o

b - Skip blanks, line numbers, and comments between '{' and '}'.

5040bIF?A=32 DO A=A+1;U.?A<>32
5041 IF?A=13A=A+3;P.$A';G.b
5042 IF?A=CH"{"DOA=A+1;U.?A=CH"}";A=A+1;G.b
5043 R.

g - Generate a new label number in G. Label is LLG.

5070gG=G+1;IF G<20 R.
5072 P.$6"TOO MANY LABELS";G.o

t - Generate a temporary location TTN; return its address in T, set H to the address, and push the address.

5100tREM TEMP. LOC.
5110 N=-1;DO N=N+1;IF N>20P.$6"NOT ENOUGH TEMP";G.o
5120 U.TTN=0
5130 T=N+RR2;TTN=T;U=T;H=T;G.u

x - Read a symbol into $X from A. Returns N=0 if no symbol found.

6000xREM READ SYMBOL
6010 GOS.b;N=-1;DO N=N+1; N?X=A?N
6020 U.A?N>CH"Z"ORA?N<CH"A"ORN=7
6030 IF N=0 R.
6040 IF N<7 N?X=#D;A=A+N;R.
6050 P.$6"SYMBOL TOO LONG";G.o

l - Assemble code to load the accumulator with L.
If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents (GOS.m) and load new contents.

7000lREM LOAD ACCUMULATOR
7010 IFL=H AND L>0 G.r
7020 GOS.m
7030 IFL<=0 [LDA @-L;];R.
7040 [LDA L;];G.r

m - Assemble code to store accumulator's contents to location H.

7100mREM STORE ACCUMULATOR
7200 IFH>0[STA H;];H=0
7210 R.

r - Release temporary variable with address L for re-use.

7300rREM RELEASE VARIABLE
7310 IF L>=RR2 AND L<RR3;TT(L-RR2)=0
7320 R.

o - Output error. Print line containing error and '^' pointing to approximate position.

9000oREM ERROR
9010 N=A;X=0;DO N=N-1;X=X+1;U.?N=13;@=5
9020 P.'N?1*256+N?2,$N+3'
9030 F.N=0TOX+1;P." ";N.;P."^"';E.

Variables:

A - Pointer to current position in expression being compiled
C - Used to evaluate constant
G - Number of next free label LLG
H - Address whose contents are currently in accumulator. H=0 means ignore previous contents
I - Number of next free symbol
II(0)..II(30) - Pointers to symbol names
JJ(0)..JJ(30) - Addresses of symbols
L - Value or address to be loaded into accumulator; used by subroutine l
LL(0)..LL(20) - Labels for use in assembly
N - Temporary variable
O - Operator read by subroutine e
P - Program location counter, used by assembler
RR(0)..RR(4) - Constant addresses
R - Number of variable locations used up
S - Next free location on SS stack
SS(0)..SS(20) - Stack used by compiler
T - Temporary location assigned by subroutine t
TT(0)..TT(20) - Flags for temporary locations; value=0 if location is free for use
U - Value to be pushed by subroutine u
V - Value pulled by subroutine v
X - String into which symbols and keywords are read by subroutine x

Further Suggestions

The compiler could usefully be extended in two directions. Firstly, the definition of SPL could be enlarged to include some or all of the REPEAT...UNTIL, WHILE...DO, FOR...DO, and CASE statements of Pascal, AND and OR connectives in the IF statement, and multi-parameter procedures. Secondly, the compiler could be enlarged to deal with other data types, such as character strings and two-byte integers. Multi-byte operations, including multiply and divide, could then be implemented by compiling calls to routines which would be included in the machine code generated by the compiler.

Alternatively, the compiler could be extended into a special-purpose language, for applications such as machine control, by adding extra statements for reading and setting bits on the computer's input and output ports, and for setting up interrupt-service routines.

The compiler can also be modified to generate machine code for other processors, such as the 6809. To do this, each assembler statement in the compiler should be replaced by an equivalent BASIC statement that will store the corresponding machine code into memory. For example, line 3050 in the BBC Computer version of the Compiler program:

3050 IFO=ASC"+"[CLC:ADC U:]

would be altered to:

3050 IFO=ASC"+": ?P%=&B9: P%?1=U/256: P%?2=U AND &FF: P%=P%+3

where &B9 is the code for the 'ADC A' instruction on the 6809. The Compiler program could thus be used for developing software on other processors without the need for an assembler.
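
The byte layout stored by the modified line 3050 can be checked with a short Python sketch. The function name `adca_extended` is hypothetical; the only facts relied on are those in the line above, namely the opcode &B9 followed by the high byte and then the low byte of the address.

```python
def adca_extended(address):
    """Return the three bytes stored by the modified line 3050:
    opcode &B9, then the high byte, then the low byte of the address."""
    return bytes([0xB9, (address >> 8) & 0xFF, address & 0xFF])


code = adca_extended(0x0052)
```

Note the byte order: the 6809 expects the high byte of an address first, whereas the 6502 code in the sample run stores the low byte first, so each replacement statement must follow the conventions of the target processor.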