ASSIGNMENT

As a preliminary step in designing the compiler, a program is presented that will take a series of assignment statements such as A=A+2 and assemble them into 6502 machine code. This program, Assignment, will then form the basis for the complete compiler.

This program compiles in a single pass, reading the input line and generating assembler statements as it goes. The following operations, which work on 8-bit numbers, are handled by the program

+  : add              -  : subtract
&  : logical AND      |  : logical OR
>> : right shift      << : left shift

All these operators have the same priority, and brackets can be used to alter the order of evaluation. Note that the shift operators must have a constant as their right-hand operand, as in:

A=B>> 2

Variable names can have up to six upper-case letters, and are automatically assigned to zero-page memory locations by the program.

The program uses a stack to hold intermediate results during compilation. Addresses are represented by numbers in the range 1 to FFFF (hexadecimal), and constants as the negative of their value; i.e. by numbers in the range 0 to -255. When the compiler reads an operator it performs the following sequence:

  1. pull the location of the previous results from the stack,
  2. assemble code to calculate the new result,
  3. push onto the stack the location of the new result.

The intermediate results during the compilation of a complicated expression, such as A=(B+2) & (C+3), are associated with temporary locations, TT(0), TT(1) etc.

Using this procedure the program would compile the statement:

MAX=31&(VAR+15)

as:

LDA VAR
CLC
ADC @15
STA TT(0)
LDA @31
AND TT(0)
STA TT(1)
LDA TT(1)
STA MAX

The superfluous STA TT(1) and LDA TT(1) are eliminated by keeping track of the address whose contents are in the accumulator at any time. Instead of generating code to load the accumulator, a call is made to a subroutine that first checks to make sure that the accumulator does not already contain the required value. Furthermore, code is only generated to store the accumulator's contents when absolutely necessary; that is, when a new value is to be loaded into the accumulator.


Sample Run

The following section shows a sample run of the Assignment program, on the Atom, for various statements which are typed in after the '?' prompt. In the listing the first column, which is only present in the Atom version, shows the line in the compiler program that generated the assembler code. The next column gives the address of the machine code, followed by the one, two, or three bytes of the instruction, and the assembler statement. The symbols L, U, and H in the assembler statements are used by the program, and take different values at different points in the program.

>RUN
?VAR=1
7210 3A00 A9 01 LDA @-L
7200 3A02 85 51 STA H

The program has allocated location #51 for the variable VAR.

?VAR=VAR+1
7040 3A04 A5 51    LDA L
3070 3A06 18       CLC
3070 3A07 69 01    ADC @-U
7200 3A09 85 51    STA H
?MAX=31&(VAR+15)
7040 3A0B A5 51    LDA L
3070 3A0D 18       CLC
3070 3A0E 69 0F    ADC @-U
7200 3A10 85 80    STA H 
7210 3A12 A9 1F    LDA @-L
3065 3A14 25 80    AND U
7200 3A16 85 52    STA H

The program has allocated location #52 for MAX. The bracketed expression is evaluated first, and stored in a temporary location #80.

?MIN=(1|VAR)-(MAX>>4)
7210 3A18 A9 01    LDA @-L
3060 3A1A 05 51    ORA U
7200 3A1C 85 80    STA H
7040 3A1E A5 52    LDA L
3180 3A20 4A       LSR A
3180 3A21 4A       LSR A
3180 3A22 4A       LSR A
3180 3A23 4A       LSR A
7200 3A24 85 81    STA H
7040 3A26 A5 80    LDA L
3055 3A28 38       SEC
3055 3A29 E5 81    SBC U
7200 3A2B 85 53    STA H

Here MIN is allocated location #53, and two temporary locations are used. Note that the right-hand operand of '>>' or '<<' must be a constant.

?X=1+(VAR-)
BRACKET MISSING
X=l+(VAR-)
        ^

Finally, a statement which generated an error.


BBC Computer Version

5 REM ... Assignment ...
10 HIMEM=&2800
20 DIM SS(20),ID$(30),JJ(30)
30 DIM TT(20),X%7,Z%256:A$=CHR$(6)

Important addresses:
MC - Start address for machine code.
VARS - Start address for variables and arrays.
TEMPS - 20 temporary locations.

 35 MC=&3800: VARS=&50:TEMPS=&80:PRARG=&94:SADD=&2800
115 FOR N=0T020:TT(N)=0:NEXT
140 I=0:P=MC:S=0:R=0:H=0:T=0

One pass of compilation. Initialise pointers, and make sure accumulator is stored finally.

200 REPEAT INPUT$Z%: A=Z%
210 PROCSTMT:PROCSTA
220 UNTIL FALSE

PROCSTMT - Statement. Skip blanks, then read symbol.

1000 DEF PROCSTMT
1010 PROCSP:PROCSYM
1110 PROCSP
1135 PROCV
1160 PROCRHS:H=V:PROCSTA:ENDPROC

PROCRHS - Right-hand side of assignment statement.

1180 DEF PROCRHS
1185 IF ?A<>ASC"=" PRINTA$"NO =-":PROCERR
1190 A=A+1:PROCEXP:L=FNPUL
1195 PROCLDA:V=FNPUL:ENDPROC

PROCIDENT - Read an identifier.

2000 DEF PROCIDENT
2010 PROCSYM:PROCV:ENDPROC

PROCV - Look up identifier Z$ an symbol table. If symbol does not already exist (N=I) allocate address for it. Push address to stack.

2020 DEF PROCV
2030 IF N=0 ENDPROC
2040 PROCLOOK
2050 IFN=I:I=I+1:R=R+1:JJ(N)=R+VARS
2070 IFI>30PRINT"TOO MANY VARIABLES":PROCERR
2080 U=JJ(N):PROCPSH(U):ENDPROC

PROCONST - Read a decimal constant. If not found, N=0. If found, push minus its value.

2100 DEF PROCONST
2105 PROCSP
2110 N=-1:C=0:REPEAT D=C:N=N+1:C=C*10
2120 C=C+A?N-ASC"0"
2130 UNTILA?N<ASC"0"ORA?N>ASC"9"
2140 IFN=0 ENDPROC
2150 A=A+N
2160 U=-D:PROCPSH(U):N=l:ENDPROC

PROCLOOK – Look up X% in symbol table, ID$(0), ID$(1) ... If not found, N=I.

2400 DEF PROCLOOK
2410 ID$(I)=$X%:N=-1
2420 REPEAT N=N+1:UNTILID$(N)=ID$(I):ENDPROC

PROCEXP - Assemble code to calculate an expression, of the form:
<factor> <operator> <factor>
where <operator> is one of:

+  : add            -  : subtract
|  : OR             &  : AND
<< : left shift     >> : right shift

Then push the address of the result on the stack.

3000 DEF PROCEXP PROCSP:PROCFACTOR
3010 PROCSP
3020 IF?A=ASC"+"OR?A=ASC"-"OR?A=ASC"&"
OR?A=ASC"|"O=?A:A=A+l:GOTO 3035
3025 IF NOT( (?A=ASC "> "AND A?1=ASC ">")OR (?A=ASC "< "AND A?1=ASC”<"))ENDPROC
3030 O=?A:A=A+2
3035 PROCPSH(O)
3040 PROCFACTOR:U=FNPUL:O=FNPUL:L=FNPUL:PROCLDA
3045 IF U<=0 GOTO 3070
3050 IFO=ASC"+"[CLC:ADC U:]
3055 IFO=ASC"-"[SEC:SBC U:]
3060 IFO=ASC"|"[ORA U:]
3065 IFO=ASC"&("[AND U:]
3068 GOTO 3190
3070 IFO=ASC"+"[CLC:ADC #-U:]
3075 IFO=ASC"-"[SEC:SBC #-U:]
3080 IFO=ASC"|"[ORA #-U:]
3085 IFO=ASC"&"[AND #-U:l
3160 IFO=ASC"<"FOR N=1TO-U:[ASL A:]:NEXT
3180 IFO=ASC">"FOR N=1TO-U:[LSR A:]:NEXT
3190 L=U:PROCRELEASE(L):PROCTEMP
3195 GOTO 3010

PROCFACTOR - Factor. Check for symbol, constant, or bracketed expression. If the symbol is followed by '(' or '[’ than it is a function or an array respectively.

3600 DEF PROCFACTOR
3610 PROCSYM:IFN=0GOT03630
3620 PROCV
3625 ENDPROC
3630 PROCONST: IF N ENDPROC
3635 IF?A<>ASC" ( " PRINTA$"BRACKET MISSING ": PROCERR
3640 A=A+1:PROCEXP:PROCSP
3650 IF?A<>ASC")" PRINTA$"BRACKET MISSING":PROCERR
3660 A=A+1: ENDPROC

PROCPSH - Push argument onto stack.

5020 DEF PROCPSH (U ) SS ( S ) =U: S=S+1: IFS <21 ENDPROC
5021 PRINTA$ "STACK FULL":PROCERR

FNPUL - Pull from stack.

5030 DEF FNPUL:S=S-1:IFS>=0 =SS(S)
5031 PRINTA$"STACK ERROR": PROCERR

PROCSP - Skip blanks.

5040 DEF PROCSP
5042 IF?A=32 REPEAT A=A+1:UNTIL?A<>32
5049 ENDPROC

PROCTEMP - Generate a temporary location TT(N ; return its address in T, set H to the address, and push the address.

5100 DEF PROCTEMP
5110 N=-1:REPEAT N=N+1: IF N>20PRINTA$"NOT ENOUGH TEMP":PROCERR
5120 UNTILTT(N)=0
5130 T=N+TEMPS:TT(N)=T:U=T:H=T:PROCPSH(U):ENDPROC

PROCSYM - Read a symbol into $X% from $A. Returns N=0 if no symbol found.

6000 DEF PROCSYM
6010 PROCSP:N=-1:REPEAT N=N+1: N?X%=A?N
6020 UNTILA?N>ASC"Z"ORA?N<ASC"A"ORN=7
6030 IF N=0 ENDPROC
6040 IF N<7 N?X%=&D:A=A+N:ENDPROC
6050 PRINTA$"SYMBOL TOO LONG":PROCERR

PROCLDA - Assemble code to load the accumulator with L. If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents (PROCSTA) and load new contents.

7000 DEF PROCLDA
7010 IFL=H AND L>0 PROCRELEASE(L):ENDPROC
7020 PROCSTA
7030 IFL<=0 [LDA #-L:]:ENDPROC
7040 [LDA L:]:PROCRELEASE(L)
7050 ENDPROC

PROCSTA - Assemble code to store accumulator's contents to location H.

7100 DEF PROCSTA
7200 IFH>0[STA H:]:H=0 7210 ENDPROC

PROCRELEASE - Release temporary variable for re-use.

7300 DEF PROCRELEASE(L)
7310 IF L>=TEMPS AND L<TEMPS+20:TT(L-TEMPS)=0
7320 ENDPROC

PROCERR - Output error.

9000 DEF PROCERR
9010 PRINT '$Z%
9030 PRINT TAB(A-Z%-1);""":END

Variables:

A  - Pointer  to  current position in expression being compiled
C  - Used to evaluate constant
H  - Address whose contents are currently in accumulator. H=0 means ignore previous contents
I  - Number of next free symbol
ID$(0)..ID$(30) - Pointers to symbol names
JJ(0)..JJ(30)   - Addresses of symbols
L  - Value or address to be loaded into accumulator; used by PROCLDA
N  - Temporary variable
O  - Operator read by PROCEXP
P  - Program location counter, used by assembler
RR(0)..RR(2)   - Constant addresses
R  - Number of variable locations used up
S  - Next free location on SS stack
SS(0)..SS(20)  - Stack used by compiler
T  - Temporary location assigned by PROCTEMP
TT(0) ..TT(20) - Flags for temporary locations; value=0 if location is free for use
U  - Value to be pushed by PROCPSH
V  - Used by FNPUL
X% - String into which symbols and keywords are read by PROCSYM
Z% - Input buffer

Atom Version

10 ... ASSIGNMENT ...
20 DIM SS(20),LL(20),II (30),JJ(30)
30 DIM X(7),TT(20),RR(2)

Important addresses:
RR0 - Start address of machine code.
RR1 - Start address of variables and arrays.
RR2 - 20 temporary locations.

35 RR0=#3A00; RRl=#50; RR2=#80
40 F.N=0TO 30; DIMI( 6 ); IIN=I; JJN=RR0; N.
115 F.N=0TO20; TTN=0; LLN=RR0; N.
140 Z=#100; G=0; I=0; P=RR0; S=0; R=0; H=0; T=0

One pass of compilation. Initialise pointers, and make sure accumulator is stored finally.

200 DO INPUT$Z;A=Z
210 GOS.s;GOS.m
220 UNTIL 0

s - Statement. Skip blanks, then read symbol.

1000sREM STATEMENT
1010 GOS.b;GOS.x
1110 GOS.b
1135 GOS.j
1160 GOS.d;H=V;GOS.m;R.

d - Right-hand side of assignment statement.

1180dIF?A<>CH"=" P.$6 "NO =";G.o
1190 A=A+1;GOS.e;GOS.v;L=V;GOS.1;GOS.v;R.

i - Read an identifier.

2000iREM IDENTIFIER
2010 GOS.x

j - Look up identifier X in symbol table. If symbol does not already exist (N=I) allocate address for it. Push address to stack.

2030jIF N=0 R.
2040 GOS.y
2050 IFN=I;I=I+1;R=R+1;JJN=R+RR1
2070 IFI>30P.$6"TOO MANY VARIABLES";G.o
2080 U=JJN;GOS.u;R.

c - Read a decimal constant. If not found, N=0. If found, push minus its value.

2100cREM CONSTANT
2105 GOS.b
2110 N=-1;C=0;DO D=C;N=N+1;C=C*10
2120 C=C+A?N-CH"0"
2130 U.A?N<CH"0"ORA?N>CH"9"
2140 IFN=0 R.
2150 A=A+N
2160 U=-D;GOS.u;N=l;R.

y - Look up X in symbol table, $II(0), $II(l) ...
If not found, N=I.

2400yREM LOOKUP
2410 $III=$X;N=-1
2420 DO N=N+1;U.$IIN=$III;R.

e - Assemble code to calculate an expression, of the form:
<factor> <operator> <factor>
where <operator> is one of:

+  : add         -  : subtract
|  : OR          &  : AND
<< : left shift  >> : right shift

Then push that address of the result on the stack.

3000eREM EXPRESSION
3010 GOS.b;GOS.f
3015 Gos.b
3020 IF?A=CH"+"OR?A=CH"-"OR?A=CH"&"OR ?A=CH"|"O=?A;A=A+1;G.3035
3025 IF((?A=CH">"A.A?1=CH">")OR
(?A=CH "< "A.A? 1=CH "< " ) ):1 R.
3030 O=?A;A=A+2
3035 U=O;GOS.u
3040 GOS.f;GOS.v;U=V;GOS.v;O=V;GOS.v;L=V;GOS.l
3045 IF U<=0 G.3070
3050 IFO=CH"+"[CLC;ADC U;]
3055 IFO=CH"-"[SEC;SBC U;]
3060 IFO=CH"|"[ORA U;]
3065 IFO=CH"&"[AND U;]
3068 G.3190
3070 IFO=CH"+" [CLC;ADC @-U; ]
3075 IFO=CH"-"[SEC;SBC @-U;]
3080 IFO=CH"|"[ORA @-U;)
3085 IFO=CH"&"[AND @-U;]
3160 IFO=CH"<"F.N=1TO-U;[ASL A;];N.
3180 IFO=CH">"F.N=1TO-U;[LSR A;];N.
3190 L=U;GOS.r;GOS.t
3195 G.3015

f - Factor. Check for symbol, constant, or bracketed expressipn. If the symbol is followed by '(' or '[' then it is a function or an array respectively.

3600fREM FACTOR
3610 GOS.x;IFN=0G.3630
3620 GOS.j
3625 R.
3630 GOS.c;IF N R.
3635 IF?A<>CH"(" P.$6"BRACKET MISSING";G.o
3640 A=A+l;GOS.e;GOS.b
3650 IF?A<>CH")" P.$6"BRACKET MISSING";G.o
3660 A=A+1;R.

u - Push U onto stack.

5020uSSS=U;S=S+1;IFS<21 R.
5021 P.$6"STACK FULL";G.o

v - Pull V from stack.

5030vS=S-1;IFS>=0V=SSS; R.
5031 P.$6"STACK ERROR";G.o

b - Skip blanks.

5040bIF?A=32 DO A=A+1;U.?A<>32
5043 R.

t - Generate a temporary location TTN; return its address in T, set H to the address, and push the address.

5100tREM TEMP. LOC.
5110 N=-1;DO N=Ntl; IF N>20P.$6"NOT ENOUGH TEMP";G.o
5120 U.TTN=0
5130 T=N+RR2; TTN=T; U=T; H=T; G.u

x - Read a symbol into $X from $A. Returns N=0 if no symbol found.

6000xREM READ SYMBOL
6010 GOS.b;N=-1;DO N=N+1; N?X=A?N
6020 U.A? N>CH "Z "ORA?N<CH "A" ORN=7
6030 IF N=0 R.
6040 IF N<7 N?X=#D;A=A+N;R.
6050 P.$6"SYMBOL TOO LONG";G.o

l - Assemble code to load the accumulator with L. If accumulator already contains L (L=H) then do nothing; otherwise store its previous contents (GOS.m) and load new contents.

7000lREM LOAD ACCUMULATOR
7010 IFL=H AND L>0 G.r
7020 GOS.m
7030 IFL<=0 [LDA @-L;];R.
7040 [LDA L;];G.r

m - Assemble code to store accumulator's contents to location H.

7100mREM STORE ACCUMULATOR
7200 IFH>0[STA H;];H=0
7210 R.

r - Release temporary variable with address L for re-use.

7300rREM RELEASE VARIABLE
7310 IF L>=RR2 AND L<RR3;TT(L-RR2)=0 7320 R.

o - Output error.

9000oREM ERROR 9020 P.'$Z'
9030 F.N=Z TOA-2;P." ";N.;P."^"';E.

Variables:

A - Pointer to current position in expression being compiled
C - Used to evaluate constant
H - Address whose contents are currently in accumulator. H=0 means ignore previous contents
I - Number of next free symbol
II(0)..II(30) - Pointers to symbol names
JJ(0)..JJ(30) - Addresses of symbols
L - Value or address to be loaded into accumulator; used by subroutine l
N - Temporary variable
O - Operator read by subroutine e
P - Program location counter, used by assembler
RR(0)..RR(2) - Constant addresses
R - Number of variable locations used up
S - Next free location on SS stack
SS(0)..SS(20) - Stack used by compiler
T - Temporary location assigned by subroutine t
TT(0)..TT(20) - Flags for temporary locations; value=0 if location is free for use
U - Value to be pushed by subroutine u
V - Value to be pulled by subroutine v
X - String into which symbols and keywords are read by subroutine x
Z - Input buffer