Part One How Atari |
Chapter OneAtari BASIC: A High-Level Language Translator
The programming language which has become the de facto standard for the Atari Home Computer is the Atari 8K BASIC Cartridge, known simply as Atari BASIC. It was designed to serve the programming needs of hoth the computer novice and the experienced programmer who is interested in developing sophisticated applications programs. In order to meet such a wide range of programming needs, Atari BASIC was designed with some unique features. In this chapter we will introduce the concepts of high level language translators and examine the design features of Atari BASIC that allow it to satisfy such a wide variety of needs. Language Translators Atari BASIC is what is known as a high level language translator. A language, as we ordinarily think of it, is a system for communication. Most languages are constructed around a set of symbols and a set of rules for combining those symbols. The English language is a good example. The symbols are the words you see on this page. The rules that dictate how to combine these words are the patterns of English grammar. Without these patterns, communication would be very difficult, if not impossible: Out sentence this believe, of make don't this trying if sense you to! If we don't use the proper symbols, the results are also disastrous: @twu2 yeggopt gjsiem, keorw? In order to use a computer, we must somehow communicate with it. The only language that our machine really understands is that strange but logical sequence of ones and zeros known as machine language. In the case of the Atari, this is known as 6502 machine language. When the 6502 central processing unit (CPU) "sees" the sequence 01001000 in just the right place according to its rules of syntax, it knows that it should push the current contents of 1
Chapter One the accumulator onto the CPU stack. (If you don't know what an "accumulator" or a "CPU stack" is' don't worry about it. For the discussion which follows, it is sufficient that you be aware of their existence.) Language translators are created to make it simpler for humans to communicate with computers. There are very few 6502 programmers, even among the most expert of them, who would recognize 01001000 as the push-the-accumulator instruction. There are more 6502 programmers, but still not very many, who would recognize the hexadecimal form of 01001000, $48, as the push-the-accumulator instruction. However, most, if not all, 6502 programmers will recognize the symbol PHA as the instruction which will cause the 6502 to push the accumulator. PHA, $48, and even 01001000, to some extent, are translations from the machine's language into a language that humans can understand more easily. We would like to be able to communicate to the computer in symbols like PHA; but if the machine is to understand us, we need a language translator to translate these symbols into machine language. The Debug Mode of Atari's Editor/Assembler cartridge, for example, can be used to translate the symbols $48 and PHA to the ones and zeros that the machine understands. The debugger can also translate the machine's ones and zeros to $48 and PHA. The assembler part of the Editor/Assembler cartridge can be used to translate entire groups of symbols like PHA to machine code. Assemblers An assembler - for example, the one contained in the Assembler/Editor cartridge - is a program which is used to translate symbols that a human can easily understand into the ones and zeros that the machine can understand. In order for the assembler to know what we want it to do, we must communicate with it by using a set of symbols arranged according to a set of rules. The assembler is a translator, and the language it understands is 6502 assembly language. The purpose of 6502 assembly language is to aid program authors in writing machine language code. The designers of the 6502 assembly language created a set of symbols and rules that matches 6502 machine language as closely as possible. This means that the assembler retains some of the 2
Chapter One disadvantages of machine language. For instance, the process of adding two large numbers takes dozens of instructions in 6502 machine language. If human programmers had to code those dozens of instructions in the ones and zeros of machine language, there would be very few human programmers. But the process of adding two large numbers in 6502 assembly language also takes dozens of instructions. The assembly language instructions are easier for a programmer to read and remember, but they still have a One-to-one cor respondence with the dozens of machine language instructions. The programming is easier, but the process remains the same. High Level Languages High level languages, like Atari BASIC, Atari PILOT, and Atari Pascal, are simpler for people to use because they more closely approximate human speech and thought patterns. However, the computer still understands only machine language. So the high level languages, while seeming simple to their users, are really much more complex in their internal operations than assembly language. Each high level language is designed to meet the specific need of some group of people. Atari Pascal is designed to implement the concept of structured programming. Atari PILOT is designed as a teaching tool. Atari BASIC is designed to serve both the needs of the novice who is just learning to program a computer and the needs of the expert programmer who is writing a sophisticated application program, but wants the program to be accessible to a large number of users. Each of these languages uses a different set of symbols and symbol-combining rules. But all these language translators were themselves written in assembly language. Language Translation Methods There are two different methods of performing language translation - compilation and interpretation. Languages which translate via interpretation are called interpreters. Languages which translate via compilation are called compilers. Interpreters examine the program source text and simulate the operations desired. Compilers translate the program source text into machine language for direct machine execution. 3
Chapter One The compilation method tends to produce faster, more efficient programs than does the interpretation method. However, the interpretation method can make programming easier. Problems with the Compiler Method The compiler user first creates a program source file on a disk, using a text editing program. Then the compiler carefully examines the source program text and generates the machine language as required. Finally, the machine language code is loaded and executed. While this three-step process sounds fairly simple, it has several serious 'gotchas." Language translators are very particular about their symbols and symbol-combining rules. If a symbol is misspelled, if the wrong symbol is used, or if the symbol is not in exactly the right place, the language translator will reject it. Since a compiler examines the enure program in one gulp, one misplaced symbol can prevent the compiler from understanding any of the rest of the program - even though the rest of the program does not violate any rules! The result is that the user often has to make several trips between the text editor and the compiler before the compiler successfully generates a machine language program. But this does not guarantee that the program will work. If the programmer is very good or very lucky, the program will execute perfectly the very first time. Usually, however, the user must debug the program. This nearly always involves changing the source program, usually many times. Each change in the source program sends the user back to step one: after the text editor changes the program, the compiler still has to agree that the changes are valid, and then the machine code version must be tested again. This process can be repeated dozens of times if the program is very complex. Faster Programming or Faster Programs? The interpretation method of language translation avoids many of these problems. Instead of translating the source code into machine language during a separate compiling step, the interpreter does all the translation while the program is running. This means that whenever you want to test the program you're writing, you merely have to tell the interpreter to run it. If things don't work right; stop the program, make a few changes, and run the program again at once. 4
Chapter One You must pay a few penalties for the convenience of using the interpreter's interactive process, but you can generally develop a complex program much more quickly than the compiler user can. However, an interpreter is similar to a compiler in that the source code fed to the interpreter must conform to the rules of the language. The difference between a compiler and an interpreter is that a compiler has to verify the symbols and symbol-combining rules only once - when the program is compiled. No evaluation goes on when the program is running. The interpreter, however, must verify the symbols and symbol-combining rules every time it attempts to run the program. If two identical programs are written, one for a compiler and one for an interpreter, the compiled program will generally execute at least ten to twenty times faster than the interpreted program. Pre-compiling Interpreter Atari BASIC has been incorrectly called an interpreter. It does have many of the advantages and features of an interpretive language translator, but it also has some of the useful features of a compiler. A more accurate term for Atari's BASIC Language Translator is pre-compiIing interpreter. Atari BASIC, like an interpreter, has a text editor built into it. When the user enters a source line, though, the line is not stored in text form, but is translated into an intermediate code, a set of symbols called tokens. The program is stored by the editor in token form as each program line is enterred. Syntax and symbol errors are weeded out at that time. Then, when you run the program, these tokens are examined and their functions simulated; but hecause much of the evaluation has already been done, the execution of an Atari BASIC program is faster than-that of a pure interpreter. Yet Atari BASIC's program-building process is much simpler than that of a compiler. Atari BASIC has advantages over compilers and interpreters alike. With Atari BASIC, every time you enter a line it is verified for language correctness. You don't have to wait until compilation; you don't even have to wait until a test run. When you type RUN you already know there are no syntax errors in your program. 5
Chapter OneInternal Design Overview
Atari BASIC is divided into two major functional areas: the Program Constructor and the Program Executor. The Program Constructor is used when you enter and edit a BASIC program. The source line pre-compiler, also part of the Program Constructor, translates your BASIC program source text lines into tokenized lines. The Program Executor is used to execute the tokenized program - when you type RUN, the Program Executor takes over. Both the Program Constructor and the Program Executor are designed to use data tables. Some of these tables are already contained in BASIC's ROM (read-only memory). Others are constructed by BASIC in the user RAM (random- access memory). Understanding these various tables is an important key to understanding the design of Atari BASIC. Tokens In Atari BASIC, tokens are the intermediate code into which the source text is translated. They represent source-language symbols that come in various lengths - some as long as 100 characters (a long variable name) and others as short as one character ("+" or "-"). Every token, however, is exactly one eight-bit byte in length. Since most BASIC Language Symbols are more than one character long, the representation of a multi-character BASIC Language Symbol with a single-byte token can mean a considerable saving of program storage space. A single-byte token symbol is also easier for the Program Executor to recognize than a multi-character symbol, since it can be evaluated by machine language routines much more quickly. The SEARCH routine - 76 bytes long - located at $A462 isa good example of how much assembly language it takes to recognize a multi-character symbol. On the other hand, the two instructions located at $AB42 are enough to 7
Chapter Two determine if a one-byte token is a variable. Because routines to recognize Atari BASIC'so one-byte tokens take so much less machine language, they execute relatively quickly. The 256 possible tokens are divided into logical numerical groups that also make them simpler to deal with in assembly language. For example, any token whose value is 128 ($80) or greater represents a variable name. The logical grouping of the token values also means faster execution speeds, since, in effect, the computer only has to check bit 7 to recognize a variable. The numerical grouping of the tokens is shown below: Token Value (Hex) Description 00-0D Unused 0E Floating Point Numeric Constant. The next six bytes will hold its value. 0F String Constant. The next byte is the string length. A string of that length follows. 10-3C Operators. See table starting at $A7E3 for specific operators and values. 39-54 Functions. See table starting at $A820 for specific functions and values. 55-7F Unused. 80-FF Variables. In addition to the tokens listed above, there is another set of single-byte tokens, the Statement Name Tokens. Every statement in BASIC starts with a unique statement name, such as LET, PRINT, and POKE. (An assignment statement such as "A = B + C," without the word LET, is considered to begin with an implied LET.) Each of these unique statement names is represented by a unique Statement Name Token. The Program Executor does not confuse Statement Name Tokens with the other tokens because the Statement Name Tokens are always located in the same place in every statement - at the beginning. The Statement Name Token value is derived from its entry number, starting with zero, in the Statement Name Table at $A4AF. 8
Chapter Two Tables A table is a systematic arrangement of data or inforrnation. Tables in Atari BASIC fall into two distinct types: tables that are part of the Atari BASIC ROM and tables that Atari BASIC builds in the user RAM area. ROM Tables The following is a brief description of the various tables in the Atari BASIC ROM. The detailed use of these tables will be explained in subsequent chapters. Statement Name Table ($A4AF). The first two bytes in each entry point to the information in the Statement Syntax Table for this statement. The rest of the entry is the name of the statement name in ATASCII. Since name lengths vary, the last character of the statement name has the most signiflcant bit turned on to indicate the end of the entry. The value of the Statement Name Token is derived from the relative (from zero) entry number of the statement name in this table. Statement Execution Table ($AA00). Each entry in this table is the two-byte address of the 6502 machine language code which will simulate the execution of the statement. This table is organized with the statements in the same order as the statements in the Statement Name Table. Therefore, the Statement Name Token can be used as an index to this table. Operator Name Table ($A7E3). Each entry comprises the ATASCII text of an Operator Symbol. The last character of each entry has the most significant bit turned on to indicate the end of the entry. The relative (from zero) entry number, plus 16 ($10), is the value of the token for that entry. Each of the entries is also given a label whose value is the value of the token for that symbol. For example, the ";" symbol at $A7E8 is the fifth (from zero) entry in the table. The label for the ";" token is CSC, and the value of CSC is $15, or 21 decimal (1l6+5). Operator Execution Table ($AA70). Each two-byte entry points to the address, minus one, of the routine which simulates the execution of an operator. The token value, minus 16, is used to access the entries in this table during execution time: The entries in this table are in the same order as in the Operator Name Table. Operator Precedence Table ($AC3F). Each entry represents the relative execution precedence of an individual operator. The table entries are accessed by the operator tokens, 9
Chapter Two minus 16. Entries correspond with the entries in the Operator Name Table. (See Chapter 7.) Statement Syntax Table ($A60D). Entries in this table are used in the process of translating the source program to tokens. The address pointer in the first part of each entry in the Statement Name Table is used to access the specific syntax information for that statement in this table. (See Chapter 5.) RAM Tables The tables that BASIC builds in the user RAM area will be explained in detail in Chapter 3. The following is a brief description of these tables: Variable Name Table. Each entry contains the source ATASCII text for the corresponding user variable symbol in the program. The relative (from zero) entry number of each entry in this table, plus 128, becomes the value of the token representing the variable. Variable Value Table. Each entry either contains or points to the current value of a variable. The entries are accessed by the token value, minus 128. Statement Table. Each entry is one tokenized BASIC program line. The tokenized lines are kept in this table in ascending numerical order by line number. Array/String Table. This table contains the current values for all strings and numerical arrays. The location of the specific values for each string and/or array variable is accessed from information in the Variable Value Table. Runtime Stack. This is the LIFO Runtime Stack, used to control the execution of GOSUB/RETURN and similar statements. Pre-compiler Atari BASIC translates the BASIC source lines from text to tokens as soon as they are entered. To do this, Atari BASIC must recognize the symbols of the BASIC Language. BASIC also requires that its symbols be combined in certain specific patterns. If the symbols don't follow the required patterns, then Atari BASIC cannot translate the line. The process of checking a source line for the required symbol patterns is called syntax checking. BASIC performs syntax checking as part of the tokenizing process. When the Program Editor receives a completed line of 10
Chapter Two input, the editor hands the line to the syntax routine, which examines the first word of the line for a statement name. If a valid statement name is not found, then the line is assumed to be an implied LET statement. The grammatical rules for each statement are contained in the Statement Syntax Table. A special section of code examines the symbols in the source line, under the dfrection of the grammatical rules set forth in the Statement Syntax Table. If the source line does not conform to the rules, then it is reported back as an error. Otherwise, the line is translated to tokens. The result of this process is returned to the Program Editor for further processing. Program Editor When Atari BASIC is not executing statements, it is in the edit mode. When the user enters a source line and hits return, the editor accepts the line into a line buffer, where it is examined by the pre-compiler. The pre-compiler returns either tokens or an error text line. If the line started with a line number, the editor inserts the tokenized line into the Statement Table. If the Statement Table already contains a line with the same line number, then the old line is removed from the Statement Table. The new line is then inserted just after the statement with the next lower line number and just before the statement with the next higher line number. If the line has no line number, the editor inserts the line at the end of the Statement Table. It then passes control to the Program Executor, which will carry out the statement(s) in the line at the end of the Statement Table. Program Executor The Program Executor has a pointer to the statement that it is to execute. When control is passed to the executor, the pointer points to the direct (command) line at the end of the statement table. If that statement causes some other line to be executed (RUN, GOTO, GOSUB, etc.), the pointer is changed to the new line. Lines continue to be executed as long as nothing stops that execution (END, STOP, error, etc.). When the program execution is stopped, the Program Executor returns control to the editor. 11
Chapter Two When a statement is to be executed, the Statement Name Token (the first code in the statement) directs the interpreter to the specific code that executes that statement. For instance, if that token represents the PRINT statement, the PRINT execution code is called. The execution code for each statement then examines the other tokens and simulates their operations. Execute Expression Arithmetic and logical expressions (A+B, C/D+E, F<G, etc.) are simulated with the Execute Expression code. Expression operators (+,-,*,etc.) have execution precedence - some operators must be executed before some others. The expression 1 + 3*4 has a value of 13 rather than 16 because * had a higher precedence than + . To properly simulate expressions, BASIC rearranges the expression with higher precedence first. BASIC uses two temporary storage areas to hold parts of the rearranged expression. One temporary storage area, the Argument Stack, holds arguments - values consisting of constants, variables, and temporary values resulting from previous operator simulations. The other temporary storage area, the Operator Stack, holds operators. Both temporary storage areas are managed as Last-In/First-Out (LIFO) stacks. LIFO Stacks A LIFO (Last In/First Out) stack operates on the principle that the last object placed in the stack storage area will be the first object removed from it. If the letters A, B, C, and D, in that order, were placed in a LIFO stack, then D would be the first letter removed, followed by C, B, and A. The operations required to rearrange the expression using these stacks will be explained in Chapter 7 BASIC also uses another LIFO stack, the Runtime Stack, in the simulation of statements such as GOSUB and FOR. GOSUB requires that BASIC remember where in the statement table the GOSUB was located so it will return to the right spot when RETURN is executed. If more than one GOSUB is executed before a RETURN, BASIC returns to the statement after the most recent GOSUB. 12
Chapter ThreeMemory Usage
Many of BASIC's functions are controlled by a set of tables built in RAM not already occupied by BASIC or the Operating System (OS). Figure 3.1 is a diagram of memory use by both programs. Every time a BASIC programmer enters a statement, memory requirements for the RAM tables change. Memory use by the OS also varies. Different graphics modes, for example, require different amounts of memory. These changing memory requirements are monitored, and this series of pointers keeps BASIC and the OS from overlaying each other in memory:When a graphics mode requires larger screen space, the OS checks the application high memory address (APHM) that has been set by BASIC. If there is enough room for the new screen, the OS uses the upper portion of space and sets the pointer HMADR to the bottom of the screen to tell the application how much space the OS is now using. BASIC builds its table toward high memory from low memory. The pointer to the lowest memory available to an application, called LMADR in the BASIC listing, is set by the OS to tell BASIC the lowest memory address that BASIC can use. When BASIC needs more room for one of its tables, BASIC checks HMADR. If there is enough room, BASIC uses the space and puts the highest address it has used into APHM for OS. BASIC's operation consists primarily of building, reading, and modifying tables. Pointers to the RAM tables are kept in consecutive locations in zero page starting at $80. These tables are, in order,
- High memory address (HMADR) at location $02E5
- Application high memory (APHM) at location $000E
- Low memory address (LMADR) at location $02E7
13
- Multipurpose Buffer
- Variable Name Table
- Variable Value Table
- String/Array Table
Chapter ThreeBASIC reserves space for a buffer at LMADR. It then builds the tables contiguously (without gaps), starting at the top of the buffer and extending as far as necessary towards APHM. When a new entry needs to be added to a table, all data in the tables above is moved upward the exact amount needed to fit the new entry into the right place. Figure 3-1. Memory usage ---------------------- FFFF | Operating System | | ROM | E000 ---------------------- | Floating Point | | ROM | D800 ---------------------- | Hardware Registers | D000 ---------------------- | Unused | BFFF ---------------------- | BASIC ROM | A000 ---------------------- | Screen | ----------------------<----HMADR Free RAM ----------------------<---- APHM | BASIC | | RAM | | Tables | ----------------------<---- LMADR | Operating System | | RAM | 0000 ---------------------- 14
- Statement Table
- Runtime Stack
Chapter Three Variable Name Table The Variable Name Table (VNT) is built during the pre-compile process. It is read, but not modified, during execution but only by the LIST statement. The VNT contains the names of the variables used in the program in the order in which they were entered. The length of entries in the Variable Name Table depends on the length of the variable name. The high order bit of the last character of the name is on. For example, the ATASCII code for the variable name ABC is 414243 (expressed in hexadecimal). In the Variable Name Table it looks like this: 41 42 C3 The $ character of a string name and the ( character of an array element name are stored as part of the variable name. The table entries for variables C, AA$, and X(3) would look like this: C C3 AA$ 41 41 A4 X(3) 58 A8 It takes only two bytes to store X(3) because this table stores only X(. A variable is represented in BASIC by a token. The value of this token is the position (relative to zero) of the variable name in the Variable Name Table, plus $80. BASIC references an entry in the table by using the token, minus $80, as an index. The Variable Name Table is not changed during execution time. The zero page pointer to the Variable Name Table is called VNTP in the BASIC listing. Variable Value Table The Variable Value Table (VVT) is also built during the pre compile process. It is both read and modified during execution. There is a one-to-one correspondence in the order of entries between the Variable Name Table and the Variable Value Table. If XXX is the fifth variable in the Variable Name Table, then XXX's value is the fifth entry in the Variable Value Table. BASIC references a table entry by using the variable token, minus $80, as an index Each entry in the Variable Value Table consists of eight bytes. The first two bytes have the following meaning: 15
Chapter Three 1 2 --------------- | | | --------------- type vnum type = one byte, which indicates the type of variable $00 for floating point variable $40 for array variable $80 for string variable vnum = one byte, which indicates the relative position of the variable in the tables The meaning of the next six bytes varies, depending on the type of variable (floating point, string, or array). In all three cases, these bytes are initialized to zero during syntaxing and during the execution of the RUN or CLR. When the variable is a floating point number, the six bytes represent its value. When the variable is an array, the remaining six bytes have the following format 1 2 3 4 5 6 7 8 ------------------------- | | | | | | | | | ------------------------- |disp | dim1| dim2 disp = the two-byte displacement into string/array space of this array variable dim1 = two bytes indicating the first dimension value dim2 = two bytes indicating the second dimension value All three of these values are set appropriately when the array is DIMensioned during execution. When the variable is a string, the remaining six bytes have the following meaning: 1 2 3 4 5 6 7 8 ------------------------- | | | | | | | | | ------------------------- |disp | curl| maxl 16
Chapter Three disp = the two-byte displacement into string/array space of this string variable. This value is set when the string is DIMensioned during execution. curl = the two-byte current length of the string. This value changes as the length of the string changes during execution. maxl = the two-byte maximum possible length of this string. This value is set to the DIM value during execution. When either a string or an array is DIMensioned during execution, the low-order bit in the type byte is turned on, so that the array type is set to $41 and the string type to $81. The zero page pointer to the Variable Value Table is called VVTP in the BASIC listing. Statement Table The Statement Table, built as each statement is entered during editing, contains tokenized forms of the statements that were entered. This table determines what happens during execution. The format of a Statement Table entry is shown in Figure 3-2. There can be several tokens per statement and several statements per line. Figure 3-2. Format of a Statement Table Entry ---+----------------------- ----------------------- --------------- | | | | | .... | | | | .... | | | ---+----------------------- ----------------------- --------------- |lnum | llen| slen| snt | toks | eos |slen | snt | toks | eos | eol | lnum = the two-byte line number (low-order, high-order) llen = the one-byte line length (the displacement to the next line in the table) slen = the one-byte statement length (the displacement to the next statement in the line) snt = the one-byte Statement Name Token toks = the other tokens that make up the statement (this is variable in length) eos = the one-byte end of statement token eol = the one-byte end of line token The zero page pointer to the Statement Table is called STMTAB in the BASIC listing. 17
Chapter Three String/Array Table The String/Array Table (also called String/Array Space) is created and modified during execution. Strings and arrays can be intermixed in the table, but they have different formats. Each array or string is pointed to by an entry in the Variable Value Table. The entry in the String/Array Table is created when the string or array is DIMensioned during execution. The data in the entry changes during execution as the value of the string or an element of the array changes. An entry in the String/Array Table is not initialized to any particular value when it is created. The elements of arrays and the characters in a string cannot be counted upon to have any particular value. They can be zero, but they can also be garbage - data previously stored at those locations. Array Entry For an array, the String/Array Table contains one six-byte entry for each array element. Each element is a floating point number, stored in raveled order. For example, the entry in the String/Array Table for an array that was dimensioned as A(1,2) contains six elements, in this order: A(0,0) A(0,1) A(0,2) A(1,0) A(1,1) A(1,2) String Entry A string entry in the String/Array Table is created during execution, when the string is DiMensioned. The size of the entry is determined by the DIM value. The "value" of the string to BASIC at any time is determined by the data in the String/Array Table and the current length of the string as set in the Variable Value Table. The zero page pointer to the String/Array Table is called STARP in the BASIC listing. The Runtime Stack is created during execution. BASIC uses this LIFO stack to control processing of FOR/NEXT loops and GOSUBs. When either a FOR or a GOSUB statement is encountered during execution, an entry is put on the Runtime Stack. When a NEXT, RETURN, or a POP statement is encountered, entries are pulled off the stack. Both the FOR entry and the GOSUB entry have a four-byte header: 18
Chapter Three ----------------- | | + | | ----------------- type lnum disp type = one byte indicating the type of element GOSUB type = 0 FOR type = non-zero lnum = the two-byte number of the line which contains the statement (low-order, high-order) disp = one byte indicating the displacement into the line in the Statement Table of the token which caused this stack entry. The FOR-type byte is actually the token representing the loop control variable from the FOR statement. (In the statement FOR I = I to 10, I is the loop control variable.) So the FOR-type byte will have a value of $80 through $FF - the possible values of a variable token. The FOR entry contains 12 additional bytes, formatted like this: 1 2 3 4 5 6 7 8 9 10 11 12 -----+----+----+----+----+---------+----+----+----+----+----- | | | -----+----+----+----+----+---------+----+----+----+----+----- sval step sval = the six-byte (floating point) limit value at which to stop the loop step = the six-byte (floating point) STEP value to increment by The GOSUB entry consists entirely of the four-byte header. The LIST and READ statements also put a GOSUB type entry on the Runtime Stack, so that the line containing the LIST or READ can be found again when the statement has finished executing. The zero page pointer to the Runtime Stack is called RUNSTK in the BASIC listing. 19
Chapter Three Zero Page Table Pointers The starting addresses of the tables change dynamically during both program construction and program execution. BASIC keeps the current start addresses of the tables and other pointers required to manage memory space in contiguous zero- page cells. Each pointer is a two-byte address, low byte first. Since these zero page cell addresses remain constant, BASIC is always able to find the tables. Here are the zero page pointers used in memory management, their names in the BASIC listing, and their addresses: Multipurpose Buffer $80,$81 Variable Name Table VNTP $82,$83 VNT dummy end VNTD $84,$85 Variable Value Table VVTP $86,$87 Statement Table STMTAB $88,$89 Current Statement Pointer STMCUR $8A,$8B StringlArray Table STARP $8C,$8D Runtime Stack RUNSTK $8C,$8E Top of used memory MEMTOP $90,$91 Memory Management Routines Memory Management routines allocate space to the BASIC tables as needed. There are two routines: expand, to add space, and contract, to delete space. Each routine has one entry point for cases in which the number of bytes to be added or deleted is less than 256, and another when it is greater than or equal to 256. The EXPAND and CONTRACT routines often move many thousands of bytes each time they are called. The 6502 microprocessor is designed to move fewer than 256 bytes of data very quickly. When larger blocks of data are moved, the additional 6502 instructions required can make the process very slow. The EXPAND and CONTRACT routines circumvent this by using the less-than-256-byte fast-move capabilities in the movement of thousands of bytes. The end result is a set of very fast and very complex data movement routines. All of this complexity does have a drawback. The infamous Atari BASIC lock-up problem lives in these two routines. If an EXPAND or CONTRACT requires that an exact multiple of 256 bytes be moved, then the routines move things from the wrong 20
Chapter Three place in memory to the wrong place in memory, whereupon the computer locks up and won't respond. The only way to avoid losing hours of work this way is to SAVE to disk or cassette frequently. EXPAND ($A881) Parameters at entry: register X = the zero page address containing the pointer to the location after which space is to be added Y = the low-order part of the number of bytes to expand A = the high-order part of the number of bytes to expand The routine creates a hole in the table memory, starting at a requested location and continuing the requested number of bytes. The routine first checks to see that there is enough free memory space to satisfy the request. It adds the requested expand size to each of the zero-page table pointers between the one pointed to by the X register and MEMTOP. Then each pointer will point to the correct address when EXPAND is done. EXPAND then creates space at the address indicated by the X register. The number of bytes required is contained in the Y and A registers. (Y contains the least significant byte, while A contains the most significant.) All data from the requested address to the address pointed to by MEMTOP is moved toward high memory by the requested number of bytes. This creates a hole of the proper size. The routine then sets Application High Memory (APHM) to the value in MEMTOP. This tells the OS the highest memory address that BASIC is currently using. EXPLOW ($A87F) Parameters at entry: register X = zero page address containing the pointer to the location after which space is to be added Y = number of bytes to expand (low-order byte only) 21
Chapter Three This is an additional entry point for the EXPAND routine. It is used when the number of bytes to be added to the table is less than 256. This routine first loads the 6502 accumulator with zero to indicate the most significant byte of the expand length. It then functions exactly like EXPAND. CONTRACT ($A8FD) Parameters at entry: register X = zero page address containing the pointer to the starting location where space is to be removed Y = the low-order part of the number of bytes to contract A = the high-order part of the number of bytes to contract This routine removes a requested number of bytes at a requested location by moving all the data from higher in the tables downward the exact amount needed to replace the unwanted bytes. It subtracts the requested contract size from each of the zero page table pointers between the one pointed to by the X register and MEMTOP. Then each pointer will point to the correct address when CONTRACT is done. The routine sets application high memory (APHM) to the value in MEMTOP to indicate to the OS the highest memory address that BASIC is currently using. The block of data to be moved downward is defined by starting at the address pointed to by the zero-page address pointed to in X, plus the offset number stored in Y and A, and then continuing to the address specified at MEMTOP. Each byte of data in that block is moved downward in memory by the number of bytes specified in Y and A, effectively erasing all the data between the specified address and that address plus the requested offset. CONTLOW ($A8FB) Parameters at entry: register X = the zero page address containing the pointer to the location at which space is to be removed 22
Chapter Three Y = the number of bytes to contract (low-order byte only) This routine is used to remove fewer than 256 bytes from the tables at a requested location by moving all the data from higher in the tables downward the exact amount needed to replace the unwanted bytes. This routine first loads the 6502 accumulator with zero to serve as the most significant byte of the contract length. It then functions exactly like CONThACT. Miscellaneous Memory Allocations Besides the tables, which change dynamically, BASIC also uses buffers and stacks at fixed locations. The Argument/Operator Stack is allocated at BASIC's low memory address and occupies 256 bytes. During pre-compiling it is used as the output buffer for the tokens. During execution, it is used while evaluating an expression. This buffer/stack is referenced by a pointer at location $80. This pointer has several names in the BASIC listing: LOMEM, ARGOPS, ARGSTK, and OUTBUFF. The Syntax Stack is used during the process of syntaxing a statement. It is referenced directly that is, not through a pointer. It is located at $480 and is 256 bytes long. The Line Buffer is the storage area where the statement is placed when it is EN'TERed. It is the input buffer for the edit and pre-compile processes. It is 128 bytes long and is referenced directly as LBUFF. Often the address of LBUFF is also put into INBUFF so that the buffer can be referenced through a pointer, though INBUFF can point to other locations during various phases of BASIC's execution. 23
Chapter FourProgram Editor
The Atari keyboard is the master Control panel for Atari BASIC. Everything BASIC does has its origins at this control panel. The Program Editor's job is to service the control panel and respond to the commands that come from it. The editor gets a line from the user at the keyboard; does some preliminary processing on the line; passes the line to the pre-compiler for further processing; inserts, deletes, or replaces the line in the Statement Table; calls the Program Executor when necessary; and then waits to receive the user's next line input. Line Processing The Program Editor, which starts at $A060, begins its process by resetting the 6502 CPU stack. Resetting the CPU stack is a drastic operation that can only occur at the beginning of a logical process. Each time Atari BASIC prepares to get a new line from the user, it restarts its entire logical process. Getting a Line The Program Editor gets a user's line by calling CIO. The origin of the line is transparent to the Program Editor. The line may have been typed in at the keyboard or entered from some external device like the disk (if the ENTER command was given). The Program Editor simply calls CIO and asks it to put a line of not more than 255 bytes into the buffer pointed to by INBUFF (sF3). INBUFF points to the 128-byte area defined at LBUFF ($580). The OS'S screen editor, which is involved in getting a line from the keyboard, will not pass BASIC a line that is longer than 120 bytes. Normally, then, the 128-byte buffer at LBUFF is big enough to contain the user's line. Sometimes, however, if a line was originally entered from the keyboard with few blanks and many abbreviations, then LISTed to and re-ENTERed from the disk, an input line may be longer than 128 bytes. When this happens, data in the $600 page is overlaid. A LINE TOO LONG error will not necessarily 25
Chapter Four occur at this point. A LINE TOO LONG error occurs only if the Pre-compiler exceeds its stack while processing the line or if the tokenized line OUTBUFF exceeds 256 bytes. These overflows depend on the complexity of the line rather than on its actual length. When CIO has put a line into the line buffer (LBUFF) and the Program Editor has regained control, it checks to see if the user has changed his mind and hit the break key. If the user did indeed hit break, the Program Editor starts over and asks CIO for another line. Flags and Indices In order to help control its processing, the Program Editor uses flags and indices. These must be given initial values. CIX and COX. The index CIX (sF2) is used to access the user's input line in the line buffer (LBUFF), while COX ($94) is used to access the tokenized statement in the output buffer (OUTBUFF). These buffers and their indices are also used by the pre-compiler. The indices are initialized to zero to indicate the beginning of the buffers. DIRFLG. This flag byte ($A6) is used by the editor to remember whether a line did or did not have a line number, and also to remember if the pre-compiler found an error in that line. DIRFLG is initialized to zero to indicate that the line has a line number and that the pre-compiler has not found an error. MAXCIX. This byte ($9F) is maintained in case the line contains a syntax error. It indicates the displacement into LBUFF of the error. The character at this location will then be displayed in inverse video. The Program Editor gives this byte the same initial value as CIX, which is zero. SVVNTP. The pointer to the current top of the Variable Name Table (VNTD) is saved as SVVNTP (SAD) so that if there is a syntax error in this line, any variables that were added can be removed. If a user entered an erroneous line, such as 100 A=XAND B, the variable XAND would already have been added to the variable tables before the syntax error was discovered. The user probably meant to enter 100 A=X AND B, and, since there can only be 128 variables in BASIC, he probably does not want the variable XAND using up a place in the variable tables. The Program Editor uses SVVNTP to find the entry in the Variable Name Table soit can be removed. 26
Chapter Four SVVYTE. The process used to indicate which variable entries to remove from the Variable Value Table in case of error is different. The number of new variables in the line (SVVVTE1$B1) is initialized to zero. The Program Pre-compiler increments the value every time it adds a variable to the Variable Value Table. If a syntax error is detected, this number is multiplied by eight (the number of bytes in each entry on the Variable Value Table) to get the number of bytes to remove, counting backward from the most recent value entered. Handling Blanks In many places in the BASIC language, blanks are not significant. For example, 100 IFX=6THENGOTO500 has the same meaning as 100 IF X = 6 THEN GOTO 500. The Program Editor, using the SKIPBLANK routine ($DBA1), skips over unnecessary blanks. Processing the Line Number Once the editor has skipped over any leading blanks, it begins to examine the input line, starting with the line number. The floating point package is called to determine if a line number is present, and, if so, to convert the ATASCII line number to a floating point number. The floating point number is converted to an integer, saved in TSLNUM for later use, and stored in the tokenized line in the output buffer (OUTBUFF). The routine used to store data into OUTBUFF is called :SETCODE ($A2C8). When :SETCODE stores a byte into OUTBUFF, it also increments COX, that buffer's index. BASIC could convert the ATASCII line number directly to an integer, but the routine to do this would not be used any other time. Routines to convert ATASCII to floating point and floating point to integer already exist in BASIC for other purposes. Using these existing routines conserves ROM space. An interesting result of this sequence is that it is valid to enter a floating point number as a line number. For example, 100.1, 10.9, or 2.05E2 are valid line numbers. They would be converted to 100, 11, and 205 respectively If the input line does not start with a line number, the line is considered to be a direct statement. DIRFLG is set to $80 so 27
Chapter Four that the editor can remember this fact. The line number is set to 32768 ($8000). This is one larger than the largest line number a user is allowed to enter. BASIC later makes use of this fact in processing the direct statement. Line length. The byte after the line number in the tokenized line in OUTBUFF is reserved so that the line length (actually the displacement to the next line) can be inserted later. (See Chapter 2.) The routine :SETCODE is called to reserve the byte by incrementing (COX) to indicate the next byte. Saving erroneous lines. In the byte labeled STMSTART, the Program Editor saves the index intQ the line buffer (LBUFF) of the first non-blank character after the line number. This index is used only if there is a syntax error, so that all the characters in the erroneous line can be moved into the tokenized line buffer and from there into the Statement Table. There are advantages to saving an erroneous line in the Statement Table, because you can LIST the error line later. The advantage is greatest, not when entering a program at the keyboard, but when entering a program originally written in a different BASIC on another machine (via a modem, perhaps). Then, when a line that is not correct in Atari BASIC is entered, the line is flagged and stored - not discarded. The user can later list the program, find the error lines, and re-enter them with the correct syntax for Atari BASIC. Deleting lines. If the input line consists solely of a line number, the Program Editor deletes the line in the Statement Table which has that line number. The deletion is done by pointing to the line in the Statement Table, getting its length, and calling CONTRACT. (See Chapter 3.) Statement Processing The user's input line may consist of one or more statements. The Program Editor repeats a specific set of functions for each statement in the line. Initializing The current index (COX) into the output buffer (OUTBUFF) is saved in a byte called STMLBD. A byte is reserved in OUTBUFF by the routine :SETCODE. Later, the value in 28
Chapter Four STMLBD will be used to access this byte, and the statement length (the displacement to the next statement) will be stored here. Recognizing the Statement Name After the editor calls SKBLANK to skip blanks, it processes the statement name, now pointed to by the input index (CIX). The editor calls the routine SEARCH ($A462) to look for this statement name in the Statement Name Table. SEARCH saves the table entry number of this statement name into location STENUM. The entry number is also the Statement Name Token value, and it is stored into the tokenized output buffer (OUTBUFF) as such by :SETCODE. The SEARCH routine also saves the address of the entry in SRCADR for use by the pre-compiler. If the first word in the statement was not found in the Statement Name Table, the editor assumes that the statement is an implied LET, and the appropriate token is stored. It is left to the pre-compiler to determine if the statement has the correct syntax for LET The editor now gives control to the pre-compiler, which places the appropriate tokens in OUTBUFF, increments the indices CIX and COX to show current locations, and indicates whether a syntax error was detected by setting the 6502 carry flag on if there was an error and clearing the carry flag if there was not. (See Chapter 5.) If a Syntax Error Is Detected If the 6502 carry flag is set when the editor regains control, the editor does error processing. In MAXCIX, the pre-compiler stored the displacement into LBUFF at which it detected the error. The Program Editor changes the character at this location to inverse video. The character in inverse video may not be the point of error from your point of view, but it is where the pre-compiler detected an error. For example, assume you entered X YAND Z. You probably meant to enter X Y AND Z, and therefore would consider the error to be between Y and AND. However, since YAND is a valid variable name, X=YAND is a valid BASIC statement. The pre-compiler doesn't know there is an error until it encounters B. The value of highlighting the error with inverse 29
Chapter Four video is that it gives the user an approximation of where the error is. This can be a big advantage, espedally if the input line contained multiple statements or complex expressions. The next thing the editor does when a syntax error has been detected is set a value in DIRFLG to indicate this fact for future reference. Since the DIRE LG byte also indicates whether this is a direct statement, the error indicator of $40 is ORed with the value already in DIRELG. The editor takes the value that it saved in STMSTRT and puts it into CIX so that CIX now points to the start of the first statement in the input line in LBUFF. STMLBD is set to indicate the location of the first statement length byte in OUTBUFF. (A length will be stored into OUTBUFF at this displacement at a later time.) The editor sets the index into OUTBUFF (COX) to indicate the Statement Name Token of the first statement in OUTBUFF, and stores a token at that location to indicate that this line has a syntax error. The entire line (after the line number) is moved into OUTBUFF. At this point COX indicates the end of the line in OUTBUFF. (Later, the contents of OUTBUFF will be moved to the Statement Table.) This is the end of the special processing for an erroneous line. The process that follows is done for both correct and erroneous lines. Final Statement processing During initial line processing, the Program Editor saved in STMLBD a value that represents the location in OUTBUFF at which the statement length (displacement to the next statement) should be stored. The Program Editor now retrieves that value from STMLBD. Using this value as an index, the editor stores the value from COX in OUTBUFF as the displacement to the next statement The Program Editor checks the next character in LBUFF. If this character is not a carriage return (indicating end of the line), then the statement processing is repeated. When the carriage return is found, COX will be the displacement to the next line. The Program Editor stores COX as the line length at a displacement of two into OUTBUFF. 30
Chapter Four Statement Table Processing The final tokenized form of the line exists in OUTBUFF at this point. The Program Editor's next task is to insert or replace the line in the Statement Table. The Program Editor first needs to create the correct size hole in the Statement Table. The editor calls the GETSTMT routine ($A9A2) to find the address where this line should go in the Statement Table. If a line with the same line number already exists, the routine returns with the address in STMCUR and with the 6502 carry flag off. Otherwise, the routine puts the address where the new line should be inserted in the Statement Table into STMCUR and turns on the 6502 carry flag. (See Chapter 6.) If the line does not exist in the Statement Table, the editor loads zero into the 6502 accumulator. If the line does exist, the editor calls the GETLL routine ($A9DD) to put the line length into the accumulator. The editor then compares the length of the line already in the Statement Table (old line) with the length of the line in OUTBUFF (new line). If more room is needed in the Statement Table, the editor calls the EXPLOW ($A87F; see Chapter 3). If less space is needed for the new line, it calls a routine to point to the next line (GNXTL, at location $A9D0; see Chapter 6), and then calls the CONTLOW ($A8FB; see Chapter 3). Now that we have the right size hole, the tokenized line is moved from OUTBUFF into the Statement Table at the location indicated by STMCUR. Line Wrap-up After the line has been added to the Statement Table, the editor checks DIRFLG for the syntax error indicator. If the second most significant bit ($40) is on, then there is an error. Error Wrap-up If there is an error, the editor removes any variables that were added by this line by getting the number of bytes that were added to the Variable Name Table and the Variable Value Table from SVVNTP and SVVVTE. It then calls CONTRACT ($A8FD) to remove the bytes from each table. Next, the editor lists the line. The Statement Name Token, which was set to indicate an error, causes the word "ERROR" 31
Chapter Four to be printed. An inverse video character indicates where the error was detected. The editor now waits for you to enter another line. Handling Correct Lines If the line was syntactically correct, the editor again examines DIRFLG. In earlier processing, the most significant bit ($80) of this byte was set on if the line was a direct statement. If it is not a direct statement, then the editor is finished with the line, and it waits for another input line. If the line is a direct statement, earlier processing already assigned it a line number of 32768 ($8000), one larger than the largest line number a user can enter. Since lines are arranged in the Statement Table in ascending numerical order, this line will have been inserted at the end of the table. The current statement pointer (STMCUR-$8A $8B) points to this line. The Program Editor transfers control to a Program Executor routine, Execution Control (EXECNL at location $A95F), which will handle the execution of the direct statement. (See Chapter 6.) 32
Chapter FiveThe Pre-compiler
The symbols and symbol-combining rules of Atari BASIC are coded into Syntax Tables, which direct the Program Pre compiler in examining source code and producing tokens. The information in the Syntax Tables is a transcription of a meta language definition of Atari BASIC. The Atari BASIC Meta-language A meta-language is a language which describes or defines another language. Since a meta-language is itself a language, it also has symbols and symbol-combining rules - which define with precision the symbols and symbol-combining rules of the subject language. Atari BASIC is precisely defined with a specially developed meta-language called the Atari BASIC Meta-language, or ABMI,. (ABML was derived from a commonly used compiler- technology meta-language called BNF.) The symbols and symbol-combining rules of ABML were intentionally kept very simple. Making Up a Language To show you how ABML works, we'll create an extremely simple language called SAP, for Simple Arithmetic Process. SAP symbols consist of variables, constants, and operators.The grammar of the SAP language is precisely defined by the ABML definition in Figure 5-1. 33
- Variables: The letters A, B, and C only.
- Constants: The numbers 1,2,3,4,5,6,7,8, and 9 only.
- Operators: The characters +, -, *, /, and ! only. Of course, you already know the functions of all the operators except "!" The character! is a pseudo-operator of the SAP language used to denote the end of the expression, like the period that ends this sentence.
Chapter Five Figure 5-1 The SAP Language Expressed in ABML SAP := <expression>! <expression> := <value> <operation> | <operation> := <operator> <expression> <value> := <constant> | <variable> <constant> := 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 <variable> := A | B | C <operator> := + | - | * | / The ABML symbols used to define the SAP language in Figure 5-1 are: := is defined as Whatever is on the left of : = is defined as consisting of whatever is on the right of :=, and in that order. | or The symbol | allows choices for what something is defined as. For instance, in the sixth line <variable> can be A or B or C. If does not appear between two symbols, then there is no choice. For example, in the second line <expression> must have both <value> and <operation>, in that order, to be valid. <> label Whatever comes between < and> is an ABML label. All labels, as non-terminal symbols, must be defined at some point, though the definitions can be circular - notice that <operation> is part of the definition of <expression> in the second line, while in the third line <expression> is part of the definition of <operation>. terminal Symbols used in definitions, which are not symbols enclosed by < and > and are also not one of the ABML symbols, are terminal symbols in the language being defined by ABML. In SAP, some terminal symbols are A, !, B, *, and 1. They cannot be defined as consisting of other symbols - they are themselves the symbols that the SAP language manipu 34
Chapter Five lates, and must appear exactly as they are shown to be valid in SAP. In effect, they are the vocabulary of the SAP language. Statement Generation The ABML description of SAP can be used to generate grammatically correct statements in the SAP language. To do this, we merely start with the first line of the definition and replace the non-terminal symbols with the definitions of those symbols. The replacement continues until only terminal symbols remain. These remaining terminal symbols constitute a grammatically correct SAP statement. Since the or statement requires that one and only one of the choices be used, we will have to arbitrarily replace the non- terminal with the one valid choice. Figure 5-2 illustrates the ABML statement generation process. Figure 5-2. The Generation of One Possible SAP Statement (1) SAP := <expression>! (2) SAP := <value> <operation>! (3) SAP := <variable> <operation>! (4) SAP := B< operation>! (5) SAP := B<operator> <expression>! (6) SAP := B*<expression>! (7) SAP := B*<value> <operation>! (8) SAP := B*<constant> <operation>! (9) SAP := B*4<operation>! (10) SAP := B*4< operator> <expression>! (11) SAP := B*4+<expression>! (12) SAP := 8*4+<value> <operation>! (13) SAP := B*4+<variable> <operation>! (14) SAP := B*4+C<operation>! (15) SAP := B*4+C! In (2), <value> <operation> replaces <expression> because the ABML definition of SAP (FigureS-I) defines <expression> as <value> <operation>. In (3), the non-terminal <value> is replaced with 35
Chapter Five <variable>. The definition of <value> gives two choices for the substitution of <value>. We happened to choose <variable>. In (4), we reach a terminal symbol, and the process of defining <value> ends. We happened to choose B to replace <variable>. In (5), we go back and start defining <operation>. There are two choices for the replacement of <operation>, either <operator> <expression> or nothing at all (since there is nothing to the right of in the second line of Figures 5-1). If nothing had been chosen, then (5) would have been: SAP :=B! The statement B! has no further non-terminals; the process would have been finished, and a valid statement would have been produced. Instead we happened to choose <operator> <expression>. The SAP definition for <expression> is <value> <operation>. If we replace <operation> with its definition we get: <expression> : = <value> <operator> <expression> The definition of <expression> includes <expression> as part of its definition. If the <operator> <expression> choice were always made for <operation>, then the process of replacement would never stop. A SAP statement can be infinitely long by definition. The only thing which prevents us from always having an infinitely long SAP statement is that there is a second choice for the replacement of <operation>: nothing. The replacements in (5) and (10) reflect the repetitive choices of defining <expression> in terms of itself. The choice in (15) reflects the nothing choice and thus finishes the replacement process. Computerized Statement Generation If we slightly modify our procedure for generating statements, we will have a process that could be easily programmed into a computer. Instead of arbitrarily replacing the definition of non- terminals, we can think of the non-terminal as a GOSUB. When we see <X> := <Y> <Z>, we can think of <Y> as being a subroutine-type procedure: (a) Go to the line that has <Y> on the left side. (b) Process the definition (right side) of <Y>. 36
Chapter Five (c) If while processing the definition of <Y>, other non- terminals are found, GOSUB to them. (d) If while processing the definition of <Y> we encounter a terminal, output the terminal symbol as the next symbol of the generated statement. (e) When the definition of <Y> is finished, return to the place that <Y> was called from and continue. Since ABML is structured so that it can be programmed, a fascinating exercise is to design a simple English sentence grammar with ABML, then write a BASIC program to generate valid English sentences at random. The randomness of the sentences would be derived by using the RND function to select from the definitions or choices. An example of such a grammar is shown in Figure 5-3. (The programming exercise is left to you.) Figure 5-3. A Simple English Sentence Grammar in ABML SENTENCE := <subject> <adverb> <verb> <object>. <subject> := The <adjective> <noun> <verb> := eats | sleeps | drinks | talks | hugs <adverb> := quickly | silently | slowly | viciously | lovingly | sadly | <object> := at home | in the car | at the table | at school | <subject> <noun> := boy | girl | dog | programmer | computer | teacher <adjective> := happy | sad | blue | light | round | smart | cool | nice | Syntactical Analysis The process of examining a language statement for grammatical correctness is called syntactical analysis, or syntaxing. Statement verification is similar to statement generation. Instead of arbitrarily choosing which or definition to use, however, the choices are already made, and we must check to see whether the statement symbols are used in valid patterns. To do this, we must process through each or definition until we find a matching valid terminal symbol. The result of statement generation is a valid, grammatically correct statement, but the result of statement verification is a 37
Chapter Five statement validity indication, which is a simple yes or no. Either the statement is grammatically correct or it is not.Failure occurs when some statement symbol cannot be matched with a valid terminal symbol under the rules of the grammar. The Reporting System To use the pass/fail result of statement verification, we must build a reporting system into the non4erminal checking process. Whenever we, in effect, GOSUB to a non-terminal definition, that non-terminal definition must report its pass/fail status. A fail status is generated and returned by a non-terminal definition when it finds no matching terminal for the current statement symbol. If the current statement symbol is B and the <constant> definition in the SAP language is called, then <constant> would report a fail status to the routine that called it. A pass status is returned when a terminal symbol is found which matches the current statement symbol. If our current statement symbol had been 7 instead of B, then <constant> would have reported pass. Whenever such a match does occur, we return to the statement, and the next symbol to the right becomes the new current symbol for examination and verification. Cycling Through the Definitions In SAP, the <constant> definition is called from the <value> definition. If <constant> reports fail, then we examine the next or choice, which is <variable>. The current symbol is B, so <variable> reports pass. Since at least one of the or choices of <value> has reported pass, <value> will report pass to its caller. If both <constant> and <variable> had reported fail, then <value> would report fail to its caller. The caller of <value> is <expression>. If <value> reports pass, <operation> is called. If <operation> reports pass, then <expression> can report pass to its caller. If either <value> or <operation> reports fail, then <expression> must report fail, since there are no other or choices for <expression> The definition of <operation> contains a special pass/fail property. If either <operator> or <expression> reports fail, 38
Chapter Five then the or choice must be examined. In this case the or choice is nothing. The or nothing means something special: report pass, but do not advance to the next symbol. The final pass/fail report is generated from the first line of the definition. If <expression> reports pass and the next symbol is!, then SAP reports pass. If either one of these conditions has a fail status, then SAP must report fail to whatever called SAP from outside the language. Backing Up Sometimes it is necessary to back up over symbols which have already been processed. Let's assume that there was a definition of the type <X> : <Y>I<z>. It is possible that while <Y> is attempting to complete its definition, it will find a number of valid matching terminal symbols before it discovers a symbol that it cannot match. In this case, <Y> would have consumed a number of symbols before it decided to report fail. All of the symbols that <Y> consumed must be unconsumed before <Z> can be called, since <Z> will need to check those same symbols. The process of unconsuming symbols is called backup. Backup is usually performed by the caller of <Y>, which remembers which source symbol was current when it called <Y>. If <Y> reports fail, then the caller of <Y> restores the current symbol pointer before calling <Z>. Locating Syntax Error When a final report of fail is given for a statement, it is often possible to guess where the error occurred. In a left-to-right system, the symbol causing the failure is usually the symbol which follows the rightmost symbol found to be valid. If we keep track of the rightmost valid symbol during the various backups, we can report a best guess as to where the failure- causing error is located. This is exactly what Atari BASIC does with the inverse video character in the ERROR line. For simplicity, our example was coded for SAP, but the syntactical analysis we have just described is essentially the process that the Atari BASIC pre-compiler uses to verify the grammar of a source statement. The Syntax Tables are an ABML description of Atari BASIC. The pre-compiler, also known as the syntaxer, contains the routines which verify BASIC statements. 39
Chapter Five Statement Syntax Tables There is one entry in the Syntax Tables for each BASIC statement. Each statement entry in the Syntax Table is a transcription of an ABML definition of the grammar for that particular statement. The starting address of the table entry for a particular statement is pointed to by that statement's entry in the Statement Name Table. The data in the Syntax Tables is very much like a computer machine language. The pseudo-computer which executes this pseudo-machine language is the pre-compiler code. Like any machine language, the pseudo-machine language of the Syntax Tables has instructions and instruction operands. For example, an ABML non4erminal symbol is transcribed to a code which the pre-compiler executes as a type of "GOSUB and report pass/fail" command. Here are the pseudo-instruction codes in the Syntax Tables; each is one byte in length. Absolute Non-Terminal Vector Name: ANTV Code: $00 This is one of the forms of the non-terminal GOSUB. It is followed by the address, minus 1, of the nonAerminal's definition within the Syntax Table. The address is two bytes long, with the least significant byte first. External Subroutine Call Name: ESRT Code: $01 This instruction is a special type of terminal symbol checker. It is followed by the address, minus 1, of a 6502 machine language routine. The address is two bytes long, with the least significant byte first. The ESRT instruction is a deus ex machina - the "god from the machine" who solved everybody's problems at the end of classical Greek plays. There are some terminals whose definition in ABML would be very complex and require a great many instructions to describe. In these cases, we go outside the pseudo-machine language of the Syntax Tables and get help from 6502 machine language routines - the deus ex machina that quickly gives the desired 40
Chapter Five result. A numeric constant is one example of where this outside help is required. ABML or Name: OR Value: $02 This is the familiar ABML or symbol ( | ). It provides for an alternative definition of a non-terminal. Return Name: RTN Value: $03 This code signals the end of an ABML definition line. When we write an ABML statement on paper, the end of a definition line is obvious - there is no further writing on the line. When ABML is transcribed to machine codes, the definitions are all pushed up against each other. Since the function that is performed at the end of a definition is a return, the end of definition is called return (RTN). Unused (Codes $04 through $0D are unused.) Expression Non-Terminal Vector Name: VFXP Value: $0E The ABML definition for an Atari BASIC expression is located at $A60D. Nearly every BASIC statement definition contains the possibility of having <expression> as part of it. VEXP is a single-byte call to <expression>, to avoid wasting the two extra bytes that ANTV would take. The pseudo- machine understands that this instruction is the same as an ANTV call to <expression> at $A60D. Change Last Token Name: CHNG Value: $0F This instruction is followed by a one-byte change to token value. The operator token instructions cause a token to be placed into the output buffer. Sometimes it is necessary to change the token that was just produced. For example, there are several = operators. One = operator is for the assignment 41
Chapter Five statement LET X = 4. Another = operator is for comparison operations like IF Y = 5. The pseudo-machine will generate the assignment = token when it matches =. The context of the grammar at that point may have required a comparison = token. The CHNG instruction rectifies this problem. Operator Token Name: (many) Value: $10 through $7F These instructions are terminal codes for the Atari BASIC Operators. The code values are the values of each operator token. The values, value names, and operator symbols are defined in the Operator Name Table (see Chapter 2). When the pseudo-machine sees these terminal symbol representations, it compares the symbol it represents to the current symbol in the source statement. If the symbols do not match, then fail status is generated. If the symbols match, then pass status is generated, the token (instruction value) is placed in the token output buffer, and the next statement source symbol becomes the current symbol for verification. Relative Non-Terminal Vectors Name: (none) Value: $80- $BF (Plus) $C0 - $FF (Minus) This instruction is similar to ANTV, except that it is a single byte. The upper bit is enough to signal that this one-byte code is a non-terminal GOSUB. The destination address of the GOSUB is given as a position relative to the current table location. The values $80 through $BF correspond to an address which is at the current table address plus $00 through $31'. The values $C0 through $FF correspond to an address which is at the current table address minus $01 through $31'. Pre-compiler Main Code Description The pre-compiler, which starts at SYNENT ($A1C3), uses the pseudo-instructions in the Syntax Tables to verify the correctness of the source line and to generate the tokenized statements. 42
Chapter Five Syntax Stack The pre-compiler uses a LIFO stack in its processing. Each time a non-terminal vector ("GOSUB") is executed, the pre compiler must remember where the call was made from. It must also remember the current locations in the input buffer (source statement) and the output buffer (tokenized statement) in case the called routine reports fail and backup is required. This LIFO stack is called the Syntax Stack. The Syntax Stack starts at 5480 at the label SIX. The stack is 256 bytes in size. Each entry in the stack is four bytes long. The stack can hold 64 levels of non4erminal calls. If a sixty-fifth stack entry is attempted, the LINE TOO LONG error is reported. (This error should be called LINE TOO COMPLEX, but the line is most likely too long also.) The first byte of each stack entry is the current input index (CIX). The second byte is the current output index (COX). The final two bytes are the current address within the syntax tables. The current stack level is managed by the STKLVL ($A9) cell. STKLVL maintains a value from $00 to $FC, which is the displacement to the current top of the stack entry. Initialization The editor has saved an address in SRCADR ($96). This address is the address, minus 1, of the current statement's ABML instructions in the Syntax Tables. The current input index (CIX) and the current output index (COX) are also preset by the editor. The initialization code resets the syntax stack manager (STKLVL) to zero and loads the first stack entry with the values in CIX, COX, and CPC - the current program counter, which holds the address of the next pseudo-instruction in the Syntax Tables. PUSH Values are placed on the stack by the PUSH routine ($A228). PUSH is entered with the new current pseudo-program counter value on the CPU stack. PUSH saves the current CIX, COX, and CPC on the syntax stack and increments STKLVL. Next, it sets a new CPC value from the data on the CPU stack. Finally, PUSH goes to NEXT. 43
Chapter Five POP Values are removed from the stack with the POP routine ($A252). POP is entered with the 6502 carry flag indicating pass/fail. If the carry is clear, then pass is indicated. If the carry is set, then fail is indicated. POP first checks STKLVL. If the current value is zero, then the pre-compiler is done. In this case, POP returns to the editor via RTS. The carry bit status informs the editor of the pass/fail status. If STKLVL is not zero, POP decrements STKLVL. At this point, POP examines the carry bit status. If the carry is clear (pass), POP goes to NEXT. If the carry is set (fail), POP goes to FAIL. NEXT and the Processes It Calls After initialization is finished and after each Syntax Table instruction is processed, NEXT is entered to process the next syntax instruction. NEXT starts by calling NXSC to increment CPC and get the next syntax instruction into the A register. The instruction value is then tested to determine which syntax instruction code it is and where to go to process it. If the Syntax Instruction is OR ($02) or RTN ($03), then exit is via POP. When POP is called due to these two instructions, the carry bit is always clear, indicating pass. ERNTV. If the instruction is RNTV ("GOSUB" $80-$FF), then ERNTV ($A201) is entered. This code calculates the new CPC value, then exits via PUSH. GETADR. If the instruction is ANTV ($00) or the deus ex machina ESRT ($01) instruction, then GETADR is called. GETADR obtains the following two-byte address from the Syntax Table. If the instruction was ANTV, then GETADR exits via PUSH. If the instruction was ESRT, then GETADR calls the external routine indicated. The external routine will report pass/fail via the carry bit. The pass/fail condition is examined at $A1FO. If pass is indicated, then NEXT is entered. If fail is indicated, then FAIL is entered. TERMTST. If the instruction is VEXP ($0E), then the code at $A1F9 will go to TERMTST ($A2A9), which will cause the code 44
Chapter Five at $A2AF to be executed for VEXP. This code obtains the address, minus 1, of the ABML for the <expression> in the Syntax Table and exits via PUSH. ECHNG. If the instruction was CHNG ($0F), then ECHNG ($A2BA) is entered via tests at $A1F9 and $A2AB. ECHNG will increment CPC and obtain the change-to token which will then replace the last previously generated token in OUTBUFF. ECHNG exits via RTS, which will take control back to NEXT. SRCONT. The Operator Token Instructions ($10-$7F) are handled by the SRCONT routine. SRCONT is called via tests at $A1F9 and $A2AD. SRCONT will examine the current source symbol to see if it matches the symbol represented by the operator token. When SRCONT has made its determination, it will return to the code at $A1FC. This code will examine the pass/fail (carry cleariset) indicator returned by SRCONT and take the appropriate action. (The SRCONT routine is detailed on the next page.) FAIL If any routine returns a fail indicator, the FAIL code at $A26C will be entered. FAIL will sequentially examine the instructions, starting at the Syntax Table address pointed to by CPC, looking for an OR instruction. If an OR instruction is found, the code at $A27D will be entered. This code first determines if the current statement symbol is the rightmost source symbol to be examined thus far. If it is, it will update MAXCIX. The editor will use MAXCIX to set the inverse video flag if the statement is erroneous. Second, the code restores CIX and COX to their before-failure values and goes to NEXT to try this new OR choice. If, while searching for an OR instruction, FAIL finds a RTN instruction, it will call POP with the carry set. Since the carry is set, POP will re-enter FAIL once it has restored things to the previous calling level. All instruction codes other than OR and RTN are skipped over by FAIL. 45
Chapter Five Pre-compiler Subroutine Descriptions SRCONT ($A2E6) The SRCONT code will be entered when an operator token instruction is found in the Syntax Tables by the main pre compiler code. The purpose of the routine is to determine if the current source symbol in the user's line matches the terminal symbol represented by the operator token. If the symbols match, the token is placed into the output buffer and pass is returned. If the symbols do not match, fail is returned. SRCONT uses the value of the operator token to access the terminal symbol name in the Operator Name Table. The characters in the source symbol are compared to the characters in the terminal symbol. If all the characters match, pass is indicated. TNVAR, TSVAR ($A32A) These deus ex machina routines are called by the ESRT instruction. The purpose of the routines is to determine if the current source symbol is a valid numeric (TNVAR) or string (TSVAR) variable. If the source symbol is not a valid variable, fail is returned. When pass is indicated, the routine will put a variable token into the output buffer. The variable token ($80-$FF) is an index into the Variable Name Table and the Variable Value Table, plus $80. The Variable Name Table is searched. If the variable is already in the table, the token value for the existing variable is used. If the variable is not in the table, it will be inserted into both tables and a new token value will be used. A source symbol is considered a valid variable if it starts with an alphabetic character and it is not a symbol in the Operator Name Table, which includes all the reserved words. The variable is considered to be a string if it ends with $; otherwise it is a numeric variable. If it is a string variable, $ is stored with the variable name characters. The routine also determines if the variable is an array by looking for (. If the variable is an array, (is stored with the variable name characters in the Variable Name Table. As a result, ABC, ABC$, and ABC(n) are all recognized as different variables. 46
Chapter Five TNCON ($A400) TNCON is called by the FSRT instruction. Its purpose is to examine the current source symbol for a numeric constant, using the floating point package. If the symbol is not a numeric constant, the routine returns fail. If the symbol is a numeric constant, the floating point package has converted it to a floating point number. The resulting six-byte constant is placed in the output buffer preceded by the $OE numeric constant token. The routine then exits with pass indicated. TSCON ($A428) TSCON is called by the ESRT instruction. Its purpose is to examine the current symbol for a string constant. If the symbol is not a string constant, the routine returns fail. If the first character of the symbol is '', the symbol is a string constant. The routine will place the string constant token ($0F) into the output buffer, followed by a string length byte, followed by the string characters. The string constant consists of all the characters that follow the starting double quote up to the ending double quote. If the EOL character ($9B) is found before the ending double quote, an ending double quote is assumed. The EOL is not part of the string. The starting and ending double quotes are not saved with the string. All 256 character codes except $9B (EOL) and $22 (") are allowed in the string. SEARCH ($A462) This is a general purpose table search routine used to find a source symbol character string in a table. The table to be searched is assumed to have entries which consist of a fixed length part (0 to 255 bytes) followed by a variable length ATASCII part. The last character of the ATASCII part is assumed to have the most significant bit ($80) on. The last table entry is assumed to have the first ATASCII character as $00. Upon entry, the X register contains the length of the fixed part of the table (0 to 255). The A, Y register pair points to the start of the table to be searched. The source string for comparison is pointed to by INBUFF plus the value in CIX. Upon exit, the 6502 carry flag is clear if a match was found, and set if no match was found. The X register points to the end 47
Chapter Five of the symbol, plus 1, in the buffer. The SRCADR ($95) two- byte cell points to the matched table entry. STENUM ($AF) contains the number, relative to zero, of the matched table entry. SETCODE (A2C8) The SETCODE routine is used to place a token in the next available position in the output (token) buffer. The value in COX determines the current displacement into the token buffer. Mter the token is placed in the buffer, COX is incremented by one. If COX exceeds 255, the LINE TOO LONG error message is generated. 48
Chapter SixExecution Overview
During the editing and pre-compiling phase, the user's statements were checked for correct syntax, tokenized, and put into the Statement Table. Then direct statements were passed to the Program Executor for immediate processing, while program statements awaited later processing by the Program Executor. We now enter the execution phase of Atari BASIC. The Program Executor consists of three parts: routines which simulate the function of individual statement types; an expression execution routine which processes expressions (for example, A+B+3, A$(1,3), "HELP", A(3)+7.26E-13); and the Execution Control routine, which manages the whole process. Execution Control Execution Control is invoked in two situations. If the user has entered a direct statement, Execution Control does some initial processing and then calls the appropriate statement execution routine to simulate the requested operation. If the user has entered RUN as a direct statement, the statement execution routine for RUN instructs Execution Control to start processing statements from the beginning of the statement table. When the editor has finished processing a direct statement, it initiates the Execution Control routine EXECNL ($A95F). Execution Control's job is to manage the process of statement simulation. The editor has saved the address of the statement it processed in STMCUR and has put the statement in the Statement Table. Since this is a direct statement, the line number is $8000, and the statement is saved as the last line in the Statement Table. The fact that a direct statement is always the last statement in the Statement Table gives a test for the end of a user's program. The high- order byte of the direct statement line number ($8000) has its most significant bit on. Loading this byte ($80) 49
Chapter Six into the 6502 accumulator will set the minus flag on. The line number of any program statement isiess than; or equal to $7FFF. Loading the high order byte ($7F or less) of a program line number into the accumulator will set the 6502 minus flag off. This gives a simple test for a direct statement. Initialization Execution Control uses several parameters to help it manage the task of statement execution. STMCUR holds the address in the Statement Table of the line currently being processed. LLNGTH holds the length of the current line. NXTSTD holds the displacement in the current line of the next statement to process. STMCUR already contains the correct value when Execution Control begins processing. SETLNI ($B81B) is called to store the correct values into LLNGTH and NXTSTD. Statement Execution Since the user may have changed his or her mind about execution, the routine checks to see if the user hit the break key. If the user did hit BREAK, Execution Control carries out XSTOP ($B793), the same routine that is executed when the STOP statement is encountered. At the end of its execution, the XSTOP routine gives control to the beginning of the editor. If the user did not hit BREAK, Execution Control checks to see whether we are at the end of the tokenized line. Since this is the first statement in the line, we can't be at the end of the line. So why do the test? Because this part of the routine is executed once for each statement in the line in order to tell us when we do reach the end of the line. (The end-of-line procedure will be discussed later in this chapter.) The statement length byte (the displacement to the next statement in the line) is the first byte in a statement. (See Chapter 3.) The displacement to this byte was saved in NXTSTD. Execution Control now loads this new statement's displacement using the value in NXTSTD. The byte after the statement length in the line is the statement name token. Execution Control loads the statement name token into the A register. It saves the displacement to the next byte, the first of the statement's tokens, in STINDEX for the use of the statement simulation routines. 50
Chapter Six The statement name token is used as an index to find this statement's entry in the Statement Execution Table. Each table entry consists of the address, minus 1, of the routine that will simulate that statement. This simulation routine is called by pushing the address from the table onto the 6502 CPU stack and doing an RTS. Later, when a simulation routine is finished, it can do an RTS and return to Execution Control. (The name of most of the statement simulation routines in the BASIC listing is the statement name preceded by an X: XFOR, XRUN, XLIST.) Most of the statement simulation routines return to Execution Control after processing. Execution Control again tests for BREAK and checks for the end of the line. As long as we are not at end-of-line, it continues to execute statements. When we reach end-of-line, it does some end-of-line processing. End-of-line Handling in a Direct Statement When we come to the end of the line in a direct statement, Execution Control has done its job and jumps to SNX3. The READY message is printed and control goes back to the Program Editor. End-of-line Handling during Program Execution Program execution is initiated when the user types RUN. Execution Control handles RUN like any other direct statement. The statement simulation routine for RUN initial izes STMCUR, NXTSTD, and LLNGTH to indicate the first statement of the first line in the Statement Table, then returns to Execution Control. Execution Control treats this first program statement as the next statement to be executed, picking up the statement name tokens and calling the simulation routines. Usually, Execution Control is unaware of whether it is processing a direct statement or a program statement. End-of- line is the only time the routine needs to make a distinction. At the end of every program line, Execution Control gets the length of the current line and calls GNXTL to update the address in STMCUR to make the next line in the Statement Table the new current line. Then it calls TENDST ($A9E2) to test the new line number to see if it is another program line or a direct statement. If it is a direct statement, we are at the end of the user's program. 51
Chapter Six Since the direct statement includes the RUN command that started program execution, Execution Control does not execute the line. Instead, Execution Control calls the same routine that would have been calle'lif the program had contained an END statement (XEND, at$B78D). XEND does some end-of- program processing, causes READY to be printed, and returns to the beginning of the editor. If we are not at the end of the user's program, processing continues with the new current line. Execution Control Subroutines TENDST ($A9E2) Exit parameters: The minus flag is set on if we are at the end of program. This routine checks for the end of the user's program in the Statement Table. The very last entry in the Statement Table is always a direct statement. Whenever the statement indicated by STMCUR is the direct statement, we have finished processing the user's program. The line number of a direct statement is $8000. The line number of any other statement is $7FFF or less. TENDST determines if the current statement is the direct statement by loading the high-order byte of the line number into the A register. This byte is at a displacement of one from the address in STMCUR. If this byte is $80 (a direct statement), loading it turns the 6502 minus flag on. Otherwise, the minus flag is turned off. GETSTMT ($A9A2) Entry parameters: TSLNUM contains the line number of the statement whose address is required. Exit parameters: If the line number is found, the STMCUR contains the address of the statement and the carry flag is set off (clear). If the line number does not exist, STMCUR contains the address where a statement with that line number should be, and the carry flag is set on (set). The purpose of this routine is to find the address of the statement whose line number is contained in TSLNUM. The routine saves the address currently in STMCUR into SAVCUR and then sets STMCUR to indicate the top of the 52
Chapter Six Statement Table. The line whose address is in STMCUR is called the current line or statement. GETSTMT then searches the Statement Table for the statement whose line number is in TSLNUM. The line number in TSLNUM is compared to the line number of the current line. If they are equal, then the required statement has been found. Its address is in STMCUR, so GETSTMT clears the 6502 carry flag and is finished. If TSLNUM is smaller than the current statement line number, GETSTMT gets the length of the current statement by executing GETLL ($A9DD). GNXTL ($A9D0) is executed to make the next line in the statement table the current statement by putting its address into STMCUR. GETSTMT then repeats the comparison of TSLNUM and the line number of the current line in the same manner. If TSLNUM is greater than the current line number, then a line with this line number does not exist. STMCUR already points to where the line should be, the 6502 carry flag is already set, and the routine is done. GETLL ($A9DD) Entry parameters: STMCUR indicates the line whose length is desired. Exit parameters: Register A contains the length of the current line. GETLL gets the length of the current line (that is, the line whose address is in STMCUR). The line length is at a displacement of two into the line. GETLL-loads the length into the A register and is done. GNXTL ($A9D0) Entry parameters: STMCUR contains the address of the current line, and register A contains the length of the current line. Exit parameters: STMCUR contains the address of the next line. This routine gets the next line in the statement table and makes it the current line. GNXTL adds the length of the current line (contained in the A register) to the address of the current line in STMCUR. This process yields the address of the next line in the statement table, which replaces the value in STMCUR. 53
Chapter Six SETLN1 ($B81B) Entry parameters: STMCUR contains the address of the current line. Exit parameters: LLNGTH contains the length of the current line. NXTSTD contains the displacement in the line to the next statement to be executed (in this case, the first statement in the line). This routine initializes several line parameters so that Execution Control can process the line. The routine gets the length of the line, which is at a displacement of two from the start of the line. SETLN1 loads a value of three into the Y register to indicate the displacement into the line of the first statement and stores the value into NXTSTD as the displacement to the next statement for execution. SETLINE ($B818) Entry parameters: TSLNUM contains the line number of a statement. Exit parameters: STMCUR contains the address of the statement whose line number is in TSLNUM. LLNGTH contains the length of the line. NXTSTD contains the displacement in the line to the next statement to be executed (in this case, the first statement in the line). Carry is set if the line number does not exist. This routine initializes several line parameters so that execution control can process the line. SETLINE first calls GETSTMT ($A9A2) to find the address of the line whose number is in TSLNUM and put that address into STMCUR. It then continues exactly like SETLN1. 54
Chapter SevenExecute Expression
The Execute Expression routine is entered when the Program Executor needs to evaluate a BASIC expression within a statement. It is also the executor for the LET and implied LET statements. Expression operators have an order of precedence; some must be simulated before others. To properly evaluate an expression, Execute Expression rearranges it during the evaluation. Expression Rearrangement Concepts Operator precedence rules in algebraic expressions are so simple and so unconscious that most people aren't aware of following them. When you evaluate a simple expression like Y=AX2+BX+C, you don't think: "Exponentiation has a higher precedence than multiplication, which has a higher precedence than addition; therefore, I will first square the X, then perform the multiplication." You just do it. Computers don't develop habits or common sense - they have to be specifically commanded. It would be nice if we could just type Y = AX2+BX+C into our machine and have the computer understand, but instead we must separate all our variables with operators. We also have to learn a few new operators, such as * for multiply and ^ for exponentiation. Given that we are willing to adjust our thinking this much, we enter Y=A*X^2+B*X+C. The new form of expression does not quite have the same feel as Y AX2+BX+C; we have translated normal human patterns halfway into a form the computer can use. Even the operation X^2 causes another problem for the computer. It would really prefer that we give it the two values first, then tell it what to do with them. Since the computer still needs separators between items, we should write X^2 as X,2,^. Now we have something the computer can work with. It can obtain the two values X,2, apply the operator ^, and get a result without having to look ahead. 55
Chapter Seven If we were to transcribe X^2*A in the same manner, we would have X,2,^,A,*. The value returned by X,2,^ is the first value to multiply, so the value pair for multiplication is (X,2,^) and A. Again we have two values followed by an operator, and the computer can understand. If we continue to transcribe the expression by pairing values and operators, we find that we don't want to add the value X^2*A to B; we want to add the value X^2*A to B*X. Therefore, we need to tell the computer X,2,^,A,*,B,X,*, +. The value pair for the operator + is (X,2,^,A,*) and (B,X,*). The value pair for the final operation, =, is (X,2,^,A,*,B,X, *,+,C,+) and Y. So the complete translation of Y=AX2+BX+ C is X,2,^,A,*,B,X,*,+,C,+,Y,=. Very few people other than Forth programmers put up with this form of expression transcription. Therefore, Atari BASIC was designed to perform this translation for us, provided we use the correct symbols, like * and ^. The Expression Rearrangement Algorithm The algorithm for expression rearrangement requires two LIFO stacks for temporary storage of the rearranged terms. The Operator Stack is used for temporarily saving operators; the Argument Stack is used for saving arguments. Arguments are values consisting of variables, constants, and the constant-like values resulting from previous expression operations. Operator Precedence Table The Atari BASIC User's Manual lists the operators by precedence. The highest-precedence operators, like <, >, and = <, are at the top of the list; the lowest-precedence operator, OR, is at the bottom. The operators at the top of the list get executed before the operators at the bottom of the list. The operators in the precedence table are arranged in the same order as the Operator Name Table. Thus the token values can be used as direct indices to obtain an operator precedence value. The entry for each operator in the Operator Precedence Table contains two precedence values, the go-onto-stack precedence and the come-off-stack precedence. When a new operator has been plucked from an expression, its go-onto- stack precedence is tested in relation to the top-of-stack operator's come-off-stack precedence. 56
Chapter Seven Expression Rearrangement Procedure The symbols of the expression (the arguments and the operators) are accessed sequentially from left to right, then rearranged into their correct order of precedence by the following procedure: 1. Initialize the Operator Stack with the Start Of Expression (SOE) operator. 2. Get the next symbol from the expression. 3. If the symbol is an argument (variable or constant), place the argument on the top of the Argument Stack. Go to step 2. 4. If the symbol is an operator, save the operator in the temporary save cell, SAVEOP. 5. Compare the go-onto-stack precedence of the operator in SAVEOP to the come-off stack precedence of the operator on the top of the Operator Stack. 6. If the top-of-stack operator's precedence is less than the precedence of the SAVEOP operator, then the SAVEOP operator is pushed onto the Operator Stack. When the push is done, go back to step 2. 7. If the top-of-stack operator's precedence is equal to or greater than the precedence of the SAVEOP operator, then pop the top-of-stack operator and execute it. When the execution is done, go back to step 5 and continue. The Expression Rearrangement Procedure has one apparent problem. It seems that there is no way to stop it. There are no exits for the "evaluation done" condition. This problem is handled by enclosing the expression with two special operators: the Start Of Expression (S0E) operator, and the End Of Expression (EOE) operator. Remember that SOE was the first operator placed on the Operator Stack, in step 1. Execution code for the SOE operator will cause the procedure to be exited in step 7, when SOE is popped and executed. The EOE operator is never executed. EOF's function is to force the execution of SOF. The precedence values of SOE and EOE are set to insure that SOE is executed only when the expression evaluation is finished The SOE come-off-stack precedence is set so that its value is always less than all the other operators' go-onto-stack precedence values. The EOE go-onto-stack precedence is set so that its value is always equal to or less than all the other 57
Chapter Seven operators' (including SOE's) corae-off-stack precedence values. Because SOE and EOE precedence are set this way, no operator other than EOE can cause SOE to be popped and executed. Second, EOE will cause all stacked operators, including SOE, to be popped and executed. Since SOE is always at the start of the expression and EOE is always at the end of the expression, SOE will not be executed until the expression is fully evaluated. In actual practice, the SOE operator is not physically part of the expression in the Statement Table. The Expression Rearrangement Procedure initializes the Operator Stack with the SOE operator before it begins to examine the expression. There is no single operator defined as the End Of Expression (EOE) operator. Every BASIC expression is followed by a symbol like :, THEN, or the EOL character. All of these symbols function as operators with precedence equivalent to the precedence of our phantom EOE operator. The THEN token, for example, serves a dual purpose. It not only indicates the THEN action, but also acts as the EOE operator when it follows an expression. Expression Rearrangement Example To illustrate how the expression evaluation procedure works, including expression rearrangement, we will evaluate our Y = A*X^2 + B*X + C example and see how the expression is rearranged to X,2,^,A,*,B,X,*,+,C,+,Y,= with a correct result. To work our example, we need to establish a precedence table for the operators. The values in Figure 7-1 are similar to the actual values of these operators in Atari BASIC. The lowest precedence value is zero; the highest precedence value is $0F. Figure 7-1. Example precedence Table operator go-on-stack come-off-stack symbol precedence precedence SOE NA $00 + $09 $09 * $0A $0A ^ s0C $0C = $0F $01 !(EOE) $00 NA 58
Chapter Seven Symbol values and notations. In the example steps, the term PSn refers to step n in the Expression Rearrangement Procedure (page 57). Step 5, for instance, will be called P55. In the actual expression, the current symbol will be underlined. If B is the current symbol, then the actual expression will appear as Y=A*X 2+B*X+C . In the rearranged expression, the symbols which have been evaluated up to that point will also be underlined. The values of the variables are: A=2 C=6 B=4 X=3 The variable values are assumed to be accessed when the variable arguments are popped for operator execution. The end-of-expression operator is represented by!. Example step 1. Actual Expression: Y=A*X^2 + B*X + C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Operator Stack: SOE SAVEOP: PS1 has been executed. The Operator Stack has been initialized with the SOE operator. We are ready to start processing the expression symbols. Example step 2. Actual Expression: Y= A*X^2+B*X+C! RearrangedExpression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y Operator Stack: SOE SAVEOP: The first symbol, Y, has been obtained and stacked in the Argument Stack according to PS2 and P53. Example step 3. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y Operator Stack: SOE,= SAVEOP: = 59
Chapter Seven Operator = has been obtained via PS2. The relative precedences of SOE ($00) and = ($0F) dictate that the = be placed on the Operator Stack via PS6. Example step 4. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,A Operator Stack: SOE, = SAVEOP: The next symbol is A. This symbol is pushed onto the Argument Stack via PS3. Example step 5. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,A Operator Stack: SOE, = , * SAVEOP: * The next symbol is the operator *. The relative precedence of * and = dictates that * be pushed onto the Operator Stack. Example step 6. Actual Expression: Y = A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,A,X Operator Stack: SOE, = , * SAVEOP: The next symbol is the variable X. This symbol is stacked on the Argument Stack according to PS3. Example step 7. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+C,+,Y,=,! Argument Stack: Y,A,X Operator Stack: SOE, = ,~, A U;' SAVEOP: A The next symbol is ^ The relative precedence of the and the * dictate that ^ be stacked via PS6. 60
Chapter Seven Example step 8. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,A,X,2 Operator Stack: SOE,=,*,^ SAVEOP: The next symbol is 2. This symbol is stacked on the Argument Stack via PS3. Example step 9. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression X,2,^,A,*,B,X,*,+,C,+,Y,=! Argument Stack: Y,A,9 Operator Stack: SOE,=,*, SAVEOP:+ The next symbol is the operator +. The precedence of the operator that was at the top of the stack, ^ , is greater than the precedence of +. PS7 dictates that the top-of-stack operator be popped and executed. The ^ operator is popped. Its execution causes arguments X and 2 to be popped from the Argument Stack, replacing the variable with the value that it represents and operating on the two values yielded: X^2=3^2=9. The resulting value, 9, is pushed onto the Argument Stack. The + operator remains in SAVEOP. We continue at PS5. Note that in the rearranged expression the first symbols, X,2,^, have been evaluated according to plan. Example step 10. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,18 Operator Stack: SOE,= SAVEOP: + This step originates at PS5. The SAVEOP operator, +, has a precedence that is less than the operator which was at the top of the stack, *. Therefore, according to P57, the is popped and executed. The execution of * results in A*9=2*9=18. The resulting value is pushed onto the Argument Stack. 61
Chapter Seven Example step 11. Actual Expression: Y = A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,18 Operator Stack: SOE, =, + SAVEOP: When step 10 finished, we went to P55. The operator in SAVEOP was +. Since + has a higher precedence than the top- of-stack operator, =, the + operator was pushed onto the Operator Stack via PS6. Example step 12. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,18,B Operator Stack: SOE, =, + SAVEOP: The next symbol is the variable B, which is pushed onto the Argument Stack via P53. Example step 13. Actual Expression: Y = A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,18,B Operator Stack: SOE, =,+,* SAVEOP: * The next symbol is the operator *. Since * has a higher precedence than the top-of-stack +, * is pushed onto the stack via PS6. Example step 14. Actual Expression: Y = A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,18,B,X Operator Stack: SOE, =, + * SAVEOP: The variable X is pushed onto the Argument Stack via PS3. Example step 15. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,+,C,+,Y, =,! 62
Chapter Seven Argument Stack: Y,18,12 Operator Stack: SOE, =,+ SAVEOP: + The operator + is retneved from the expression. Since + has a lower precedence than which is at the top of the stack, * is popped and executed. The execution of * causes B*X=4*3=12. The resulting value of 12 is pushed onto the Argument Stack. We will continue at PS5 via the P57 exit rule. Example step 16. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,30 Operator Stack: SOE,= SAVEOP: + This step starts at PS5. The SAVEOP operator, +, has precedence that is equal to the precedence of the top-of-stack operator, also +. Therefore, + is popped from the operator stack and executed. The results of the execution cause 18+12, or 30, to be pushed onto the Argument Stack. PS5 is called. Example step 17. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,30 Operator Stack: SOE, =,+ SAVEOP: This step starts at PS5. The SAVEOP is +. The top-of-stack operator, =, has a lower precedence than +; therefore, + is pushed onto the stack via PS6. Example step 18. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,30,C Operator Stack: SOE, =,+ SAVEOP: The variable C is pushed onto the Argument Stack via PS3. 63
Chapter Seven Example step 19. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Y,36 Operator Stack: SOE, SAVFOP: The EOE operator is plucked from the expression. The EOE has a lower precedence than the top-of-stack + operator. Therefore, + is popped and executed. The resulting value of 30+6, 36, is pushed onto the Argument Stack. PS5 will execute next. Example step 20. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C,+,Y,=,! Argument Stack: Operator Stack: SOE SAVEOP: ! This step starts at P55. The ! operator has a lower precedence than the top-of-stack = operator, which is popped and executed. The execution of = causes the value 36 to be assigned to Y. This leaves the Argument Stack empty. PS5 will be executed next. Example step 21. Actual Expression: Y=A*X^2+B*X+C! Rearranged Expression: X,2,^,A,*,B,X,*,+,C+,Y,=,! Argument Stack: Operator Stack: SAVEOP: ! The operator in SAVEOP causes the SOE operator to be popped and executed. The execution of SOF terminates the expression evaluation. Note that the rearranged expression was executed exactly as predicted. Mainline Code The Execute Expression code implements the Expression Rearrangement Procedure. The mainline code starts at the EXEXPR label at $AAE0. The input to EXEXPR starts at the current token in the current statement. STMCUR points to the 64
Chapter Seven current statement. STINDEX contains the displacement to the current token in the STMCUR statement. The output of EXEXPR is whatever values remain on the top of the argument stack when the expression evaluation is finished. In the following discussion, PSn refers to the procedure step n in the Expression Rearrangement Procedure. PS1, initialization, occurs when EXEXPR is entered. EXPINT is called to initialize the operator and argument stacks. EXPINT places the SOF operator on the operator stack. PS2, which obtains the next token, directly follows initialization at EXNXT ($AAE3). The code calls EGTOKEN to get the next expression symbol and classify it. If the token is an argument, the carry will be set. If the token is an operator, the carry will be clear. If the token is an argument, P53 is implemented via a call to ARGPUSH. After the argument is pushed onto the argument stack, EXNXT (PS2) will receive control. If the token was an operator, then the code at EXOT ($AAEE) will be executed This code implements P54 by saving the token in EXSVOP. PS5, which compares the precedents of the EXSVOP token and the top-of-stack token, follows EXOT at EXPTST ($AAFA). This code also executes the SOE operator If SOE is popped, then Execute Expression finishes via RTS. If the top-of-stack operator precedence is less than the EXSVOP operator precedence1 PS6 is implemented at EOPUSH ($AB15). EOPUSH pushes EXSVOP onto the operator stack and then goes to EXNXT (PS2). If the top-of-stack operator precedence is greater than or equal to the EXSVOP operator precedence, then PS7 is implemented at EXOPOP ($AB0B). EXOPOP will pop the top- of-stack operator and execute it by calling EXOP. When EXOP is done, control passes to EXPTST (PS5). Expression Evaluation Stacks The two expression evaluation stacks, the Argument Stack and the Operator Stack, share a single 256-byte memory area. The Argument Stack grows upward from the lower end of the 256- byte area. The Operator Stack grows downward from the upper end of the 256-byte area. The 256-byte stack area is the multipurpose buffer at the start of the RAM tables. The buffer is pointed to by the 65
Chapter Seven ARGSTK (also ARGOPS) zero-page pointer at $80. The current index into the Argument Stack is maintained by ARSLVL ($AA). When the Argument Stack is empty, ARSLVL is zero. The OPSTKX cell maintains the current index into the Operator Stack. When the Operator Stack is initialized with the SOE operator, OPSTKX is inihalized to $FF. As operators are added to the Operator Stack, OPSTKX is decremented. As arguments are added to the Argument Stack, ARSLVL is incremented. Since the two stacks share a single 256-byte memory area, there is a possibility that the stacks will run into each other. The code at $ABC1 is used to detect a stack collision. It does this by comparing the values in ARSLVL and OPSTKX. If ARSLVL is greater than or equal to OPSTKX, then a stack collision occurs, sending the STACK OVERFLOW error to the user. Operator Stack Each entry on the Operator Stack is a single-byte operator-type token. Operators are pushed onto the stack at EXOPUSH (SAB15) and are popped from the stack at EXOPOP ($ABOB). Argument Stack Each entry on the Argument Stack is eight bytes long. The format of these entries is described in Figures 7-2, 7-3, and 7-4, and are the same as the formats for entries in the Variable Value Table. Unlike the Variable Value Table, the Argument Stack must deal with both variables and constants. In Figure 7-2, we see that VNUM is used to distinguish variable entries from constant entries. The SADR and AADR fields in the entries for strings and arrays are of special interest. (See Figures 7-3 and 7-4.) When a string or array variable is dimensioned, space for the variable is created in the string/array space. The displacement to the start of the variable's area within the string/array space is placed in the SADR/AADR fields at that time. A displacement is used rather than an absolute address because the absolute address can change if any program changes are made after the DIM statement is executed. Execute Expression needs these values to be absolute address values within the 6502 address space. When a string/array variable is retrieved from the Variable Value Table, 66
Chapter Seven the displacement is transformed to an absolute address. When (and if) the variable is put back into the Variable Value Table, the absolute address is converted back to a displacement. The entries for string constants also deserve some special attention. String constants are the quoted strings within the user program. These strings become part of the tokenized statements in the Statement Table. When Execute Expression gets a string token, it will create a string constant Argument Stack entry. This entry's SADR is an absolute address pointer to the string in the Statement Table. SLEN and SDIM are set to the actual length of the quoted string. Argument Work Area An argument which is currently being examined by Execute Expression is kept in a special zero-page Argument Work Area (AWA). The AWA starts at the label VTYPE at $D2. Figure 7-2. Argument Stack Entry 0 1 2 8 +-------+------+----------+ | VTYPE | VNUM | DATA | +-------+------+----------+ | | | | | | | | Data Field. Format depends on VTYPE. | | | +---- If VNUM =0, the entry is a constant. | If VNUM >0, the entry is a variable. In this case, | the VNUM value is the entry number in the Variable | Value Table. The token representing this variable is | VNUM+$80. | +---- $00=Data is a six-byte floating point constant. $80=Data represents an undimensioned string. $81=Data represents a dimensioned string with a relative address pointer. $83=Data represents a dimensioned string with an absolute address pointer. $40=Data represents an undimensioned array. $41=Data represents a dimensioned array with a relative address pointer. $43=Data represents a dimensioned array with an absolute address pointer. 67
Chapter Seven Figure 7-3. Argument Stack String Entry 0 1 2 4 6 8 +-------+------+------+------+------+ | VTYPE | VNUM | SADR | SLEN | SDIM | +-------+------+------+------+------+ | | | +---------------------+ | | | +--------------------------+ | | | +-------------------------------+ | | | Dimensioned length of the string. Valid only if | | +- VTYPE=$81 or $83. | | | +--- Current length of the string. Valid only if VTYPE | =$81 or $83. | +----| String address. Valid only if VTYPE=$81 or $83. | If VTYPE=$81, then SADR is the displacement | of the start of the string from the start of the | string/array space. | | If VTYPE=$83, then SADR is the absolute address | of the start of the string. Figure 7-4. Argument Stack Array Entry 0 1 2 4 6 8 +-------+------+------+------+------+ | VTYPE | VNUM | SADR | DIM1 | DIM2 | +-------+------+------+------+------+ | | | +---------------------+ | | | +--------------------------+ | | | +-------------------------------+ | | | When an array has been dimensioned as A(DI,D2), | | +- this field contains the D2 value. If an array | | was dimensioned as A(DI), then this field is | | zero. The field is valid only if VTYPE=$41 or | | $43. | | | | When an array has been dimensioned, as A(D1,D2) | |--- or as A(D1), this field contains the D1 value. | The field is valid only if VTYPE=$41 or $43. | | | Array Address. Valid only if VTYPE=$41 or $43. +----| | If VTYPE=$41, the AADR is the displacement to | the start of the array in the string/array space. | | If VTYPE=$43, the AADR is the absolute address | of the start of the string. 68
Chapter Seven Operator Executions An operaror is executed when it is popped from the Operator Stack. Execute Expression calls EXOP at $AB20 to start this execution. The EXOP routine uses the operator token value as an index into the Operator Execution Table ($AA70). The operator execution address from this table, minus 1, is placed on the 6502 CPU stack. An RTS is then executed to begin executing the operator's code. The names of the operator execution routines all begin with the characters XP. All the Atari BASIC functions, such as PEEK, RND, and ABS, are executed as operators. Most routines for the execution of the operators are very simple and straightforward. For example, the * operator routine, XPMUL ($AC96), pops two arguments, multiplies them via the floating point package, pushes the result onto the argument stack, and returns. String, Array, DIM, and Function Operations Any array reference in an expression may be found in one of two forms: A(x) or A(x,y). The indices x andy may be any valid expression. The intent of the indices is to reference a specific array element. Before the specific element reference can take place, the x and/or y index expressions must be fully evaluated. To do this, the characters '(' ',' and ')' are made operators. The precedence of these operators forces things to happen in the correct sequence. Figure 7-5 shows the relative precedence of these operators for an array. Figure 7-5. Array Operator Precedence operator go-on-stack come-off-stack symbol precedence precedence ( $0F $02 ,(comma) $04 $03 ) $04 $0E As a result of these precedence values, ( has a high enough precedence to go onto the stack, no matter what other operator is on the top of the stack. 69
Chapter Seven The comma's go-on-stack precedence will force all operators except ( to be popped and executed. As a result, the x index sub-expression in the expression A(x,y), will be fully evaluated and the final x index value will be pushed onto the Argument Stack. The comma will then be placed onto the Operator Stack. Its come-off-stack precedence is such that no other operator, except ), will pop it off. The ) operator precedence will force any y index expression to be fully evaluated and the y index result value to be placed onto the Argument Stack. It will then force the comma operator to be popped and executed. This action results in a comma counter being incremented. The ) will then force the ( to be popped and executed. The execution of ( results in the proper array element being referenced. The ( operator will pop the indices from the Argument Stack. The number of indices (either zero or one) to be popped is governed by the comma counter, which was incremented by one for each comma that was popped and executed. Atari BASIC has numerous ( tokens, and each causes a different ( routine to be executed. These ( operators are array (CALPRN), string (CSLPRN) array DIM (CDLPRN) string DIM (CDSLPR), function (CFLPRN), and the expression grouping CLPRN operator. The Syntax Table pseudo-instruction CHNG is used to change the CLPRN token to the other ( tokens in accordance with the context of the grammar. The expression operations for each of these various ( operators in relation to commas and ( is exactly the same. When ( is executed, the comma count will show how many arguments the operator's code must pop from the argument stack. Each of these arguments will have been evaluated down to a single value in the form of a constant. 70
Chapter EightExecution Boundary Conditions
BASIC Language statements can be divided into groups with related functions. The execution boundary statements, RUN, STOP, CONT and END, cause a BASIC program to start or stop executing. The routines which simulate these statements are XRUN, XSTOP, XCONT, and XEND. Program Termination Routines Any BASIC statement can be used as either a direct statement or a program statement, but some only make sense in one mode. The STOP statement has no real meaning when entered as a direct statement. When the statement simulation routine for STOP is asked to execute in direct mode, it does as little processing as possible and exits. Useful processing occurs only when STOP is a program statement. STOP ($B7A7). The XSTOP and XEND routines are similar and perform some of the same tasks. The tasks common to both are handled by the STOP routine. If this statement is not a direct statement, the STOP routine saves the line number of the current line in STOPLN. This line number is used later for printing the STOPed message. It is also used by the CONT simulation routine (XCONT) to determine where to restart program execution. (Since XEND also uses this routine, it is possible to CONTinue after an END statement in the middle of a program.) The STOP routine also resets the LIST and ENTER devices to the screen and the keyboard. XSTOP ($B793). XSTOP does the common STOP processing and then calls :LPRTOKEN($B535) to print the STOPed message. It then calls one of the error printing routines, :ERRM2 ($B974), to output the AT LINE nnn portion. The :ERRM2 routine will not print anything if this was a direct statement. When :ERRM2 is finished, it jumps back to the start of the editor. 71
Chapter Eight XEND ($B7SD). XEND calls the STOP routine to save the current line number. It then transfers to the start of the editor via the SNX1 entry point. This turns off the sound, closes any open IOCBs, and prints the READY message. XEND also leaves values on the 6502 CPU stack. These values are thrown away when the editor resets the stack. END OF PROGRAM. A user may have neglected to include an END statement in his program. In this case, when Execution Control comes to the end of the Statement Table it calls XEND, and the program is terminated exactly as if the last statement in the program were an END. Program Initiation Routines The statements that cause a user's program to begin execution are RUN and CONT. These statements are simulated by XRUN and XCONT XCONT ($B7BE). The CONT statement has no meaning when encountered as a program statement, so its execution has no effect. When the user enters CONT as a direct statement, XCONT uses the line number that was saved in STOPLN to set Execution Control's line parameters (STMCUR, NXTSTD, and LLNGTH). This results in the current line being the line following the one whose line number is in STOPLN. This means that any statement following STOP or END on a line will not be executed; therefore, STOP and END should always be the last statement in the line. If we are at the end of the Statement Table, XCONT terminates as if an END statement had been encountered in the program. If there are more lines to process, XCONT returns to Execution Control, which resumes processing at the line whose address was just put into STMCUR. XRUN ($B74D). The RUN statement comes in two formats, RUN and RUN <filespec>. In the case of RUN <filespec>, XRUN executes XLOAD to load a saved program, which replaces the current one in memory. The process then proceeds like RUN. XRUN sets up Execution Control's line pointers to indicate the first line in the Statement Table. It clears some flags used to control various other BASIC statements; for example, it resets STOPLN to O. It closes all IOCBs and executes XCLR to reset all 72
Chapter Eight the variables to zero and get rid of any entries in the String/Array Table or the Runtime Stack. If there is no program, so the only thing in the Statement Table is the direct statement, then XRUN does some clean-up, prints READY, and returns to the start of the editor, which resets the 6502 CPU stack. If there is a program, XRUN returns to Execution Control, which starts processing the first statement in the table as the current statement. When RUN <filespec> is used as a program statement, it performs the useful function of chaining to a new program, but if RUN alone is used as a program statement, an infinite loop will probably result. Error Handling Routine There are other conditions besides the execution boundary statements that terminate a program's execution. The most familiar are errors. There are two kinds of errors that can occur during execution: Input/Output errors and BASIC language errors. Any BASIC routine that does I/O calls the IOTEST routine ($BCB3) to check the outcome of the operation. If an error that needs to be reported to the user is indicated, IOTEST gets the error number that was returned by the Operating System and joins the Error Handling Routine, ERROR ($B940), which finishes processing the error. When a BASIC language error occurs, the error number is generated by the Error Handling Routine. This routine calculates the error by having an entry point for every BASIC language error. At each entry point, there is a 6502 instruction that increments the error number. By the time the main routine, ERROR, is reached, the error number has been generated. The Error Handling Routine calls STOP ($B7A7) to save the line number of the line causing the error in STOPLN. It tests TRAPLN to see if errors are being TRAPed. The TRAP option is on if TRAPLN contains a valid line number. In this case, the Error Handler does some clean-up and joins XGOTO, which transfers processing to the desired line. If the high-order byte of the line number is $80 (not a valid line number), then we are not TRAPing errors. In this case, the Error Handler prints the four-part error message, which 73
Chapter Eight consists of ERROR, the error number, AT LINE, and finally the line number. If the line in error was a direct statement, the AT LINE part is not printed. The error handler resets ERRNUM to zero and is finished. The Error Handling Routine does not do an orderly return, but jumps back to the start of the editor at the SYNTAX entry point where the 6502 stack is reset, clearing it of the now- unwanted return addresses. 74
Chapter NineProgram Flow Control Statements
Execution Control always processes the statement in the Statement Table that follows the one it thinks it has just finished. This means that statements in a BASIC program are usually processed in sequential order. Several statements, however, can change that order: GOTO, IF, TRAP, FOR, NEXT, GOSUB, RETURN, POP, and ON. They trick Execution Control by changing the parameters that it maintains. Simple Flow Control Statements XGOTO ($BGA3) The simplest form of flow control transfer is the GOTO statement, simulated by the XGOTO routine. Following the GOTO token in the tokenized line is an expression representing the line number of the statement that the user wishes to execute next. The first thing the XGOTO routine does is ask Execute Expression to evaluate the expression and convert it to a positive integer. XGOTO then calls the GETSTMT routine to find this line number in the Statement Table and change Execution Control's line parameters to indicate this line. If the line number does not exist, XGOTO restores the line parameters to indicate the line containing the original GOTO, and transfers to the Error Handling Routine via the ERNOLN entry point. The Error Handling Routine processes the error and jumps to the start of the editor. If the line number was found, XGOTO jumps to the beginning of Execution Control (EXECNL) rather than returning to the point in the routine from which it was called. This leaves garbage on the 6502 CPU stack, so XGOTO first pulls the return address off the stack. 75
Chapter Nine XIF($8778) The IF statement changes the statement flow based on a condition. The simulatiQn routine, xw, begins by calling a subroutine of Execute Expressionto evaluate the condition. Since this is a logical (rather than an arithmetic) operation, we are only interested in whether the value is zero or non-zero. If the expression was false (non-zero), XIF modifies Execution Control's line parameters to indicate the end of this line and then returns. Execution Control moves to the next line, skipping any remaining statements on the original IF statement line. If the expression is true (zero), things get a little more complicated. Back during syntaxing, when a statement of the form IF <expression> THEN <statement> was encountered, the pre-compiler generated an end-of-statement token after THEN. XIF now tests for this token. If we are at the end of the statement, XIF returns to Execution Control, which processes what it thinks is the next statement in the current line, but which is actually the THEN <statement> part of the IF statement. If XIF does not find the end-of-statement token, then the statement must have had the form IF <expression> THEN <line number>. XIF jumps to XGOTO, which finishes processing by changing Execution Control's line parameters to indicate the new line. XTRAP ($B7E1) The TRAP statement does not actually change the program flow when it is executed. Instead, the XTRAP simulation routine calls a subroutine of Execute Expression to evaluate the line number and then saves the result in TRAPLN ($BC). The program flow is changed only if there is an error. The Error Handling Routine checks TRAPLN. If it contains a valid line number, the error routine does some initial set-up and joins the XGOTO routine to transfer to the new line. Runtime Stack Routines The rest of the Program Flow Control Statements use the Runtime Stack. They put items on the stack, inspect them, and/or remove them from the stack. Every item on the Runtime Stack contains a four-byte header. This header consists of a one-byte type indication, a 76
Chapter Nine two-byte line number, and a one-byte displacement to the Statement Name Token. (See pages 18-19.) The type byte is the last byte placed on the stack for each entry. This means that the pointer to the top of the Runtime Stack (RUNSTK) points to the type byte of the most recent entry on the stack. A zero type byte indicates a GOSUB-type entry. Any non-zero type byte represents a FOR-type entry A GOSUB entry consists solely of the four-byte header. A FOR entry contains twelve additional bytes: a six-byte limit value and a six-byte step value. Several routines are used by more than one of the statement simulation routines. PSHRSTK ($B683) This routine expands the Runtime Stack by calling EXPLOW and then storing the type byte, line number, and displacement of the Statement Name Token on the stack. POPRSTK ($B841) This routine makes sure there really is an entry on the Runtime Stack. POPRSTK saves the displacement to the statement name token in SVI)ISP, saves the line number in TSLNUM, and puts the type/variable number in the 6502 accumulator. It then removes the entry by calling the CONTLOW routine. :GETTOK ($B737) This routine first sets up Execution Control's line parameters to point to the line whose number is in the entry just pulled from the Runtime Stack. If the line was found, : GETTOK updates the line parameters to indicate that the statement causing this entry is now the current statement. Finally, it loads the 6502 accumulator with the statement name token from the statement that created this entry and returns to its caller. If the line number does not exist, : GETTOK restores the current statement address and exits via the ERGFDEL entry point in the Error Handling Routine. Now let's look at the simulation routines for the statements that utilize the Runtime Stack. XFOR ($B64B) XFOR is the name of the simulation routine which executes a FOR statement. In the statement FOR 1=1 TO 10 STEP 2: I is the loop control variable 77
Chapter Nine 1 is its initial value 10 is the limit value 2 is the step value XFOR calls Execute Expression, which evaluates the initial value and puts it in the loop control variable's entry in the Variable Value Table. Then it calls a routine to remove any currently unwanted stack entries for example, a previous FOR statement that used the same loop control variable as this one. XFOR calls a subroutine of Execute Expression to evaluate the limit and step values. If no step value was given, a value of 1 is assigned. It expands the Runtime Stack using EXPLOW and puts the values on the stack. XFOR uses PSHRSTK to put the header entry on the stack. It uses the variable number ofthe loop control variable (machine-language ORed with $80) as the type byte. XFOR now returns to Execution Control, which processes the statement following the FOR statement. The FOR statement does not change program flow. It just sets up an entry on the Runtime Stack so that the NEXT statement can change the flow. XNEXT ($B6CF) The XNEXT routine decides whether to alter the program flow, depending on the top Runtime Stack entry. XNEXT calls the POPRSTK routine repeatedly to remove four-byte header entries from the top of the stack until an entry is found whose variable number (type) matches the NEXT statement's variable token. If the top-of-stack or GOSUB-type entry is encountered, XNEXT transfers control to an Error Handling Routine via the ERNOFOR entry point. To compute the new value of the loop variable, XNEXT calls a subroutine of Execute Expression to retrieve the loop control variable's current value from the Variable Value Table, then gets the step value from the Runtime Stack, and finally adds the step value to the variable value. XNEXT again calls an Execute Expression subroutine to update the variable's value in the Variable Value Table. XNEXT gets the limit value from the stack to determine if the variable's value is at or past the limit. If so, XNEXT returns to Execution Control without changing the program flow, and the next sequential statement is processed. 78
Chapter Nine If the variable's value has not reached the limit, XNEXT returns the entry to the Runtime Stack and changes the program flow. POPRSTK already saved the line number of the FOR statement in TSLNUM and the displacement to the statement name token in SVDISP. XNEXT calls the GETTOK routine to indicate the FOR statement as the current statement. If the token at the saved displacement is not a FOR statement name token, then the Error Handling Routine is given control at the ERGFDEL entry point. Otherwise, XNEXT returns to Execution Control, which starts processing with the statement following the FOR statement. XGOSUB ($BGA0) The GOSUB statement causes an entry to be made on the Runtime Stack and also changes program flow. The XGOSUB routine puts the GOSUB type indicator (zero) into the 6502 accumulator and calls PSHRSTK to put a four-byte header entry on the Runtime Stack for later use by the simulation routine for RETURN. XGOSUB then processes exactly like XGOTO. XRTN ($B719) The RETURN statement causes an entry to be removed from the Runtime Stack. The XRTN routine uses the information in this entry to determine what statement should be processed next. The XRTN first calls POPRSTK to remove a GOSUB-type entry from the Runtime Stack. If there are no GOSUB entries on the stack, then the Error Handling Routine is called at ERBRTN. Otherwise, XRTN calls GETTOK to indicate that the statement which created the Runtime Stack entry is now the current statement. If the statement name token at the saved displacement is not the correct type, then XRTN exits via the Error Handling Routine's ERGFDEL entry point. Otherwise, control is returned to the caller. When Execution Control was the caller, then GOSUB must have created the stack entry, and processing will start at the statement following the GOSUB. Several other statements put a GOSUB-type entry on the stack when they need to mark their place in the program. They do not affect program flow and will be discussed in later chapters. 79
Chapter Nine XPOP ($B841) The XPOP routine uses POPRSTK to remove an entry from the Runtime Stack. A user might want to do this il he decided not to RETURN from a GOSUB. XON ($B7ED) The ON statement comes in two versions: ON-GOTO and ON GOSUB. Only ON-GOSUB uses the Runtime Stack. The XON routine evaluates the variable and converts it to an integer (MOD 256). If the value is zero, XON returns to Execution Control without changing the program flow. If the value is non-zero and this is an ON-GOSUB statement, XON puts a GOSUB-type entry on the Runtime Stack for RETURN to use later. From this point, ON-GOSUB and ON-GOTO perform in exactly the same manner. XON uses the integer value calculated earlier to index into the tokenized statement line to the correct GOTO or GOSUB line number. If there is no line number corresponding to the index, XON returns to Execution Control without changing program flow. Otherwise, XON joins XGOTO to finish processing. 80
Chapter TenTokenized Program Save and Load
The tokenized program can be saved to and reloaded from a peripheral device, such as a disk or a cassette. The primary statement for saving the tokenized program is SAVE. The saved program is reloaded into RAM with the LOAD statement. The CSAVE and the CLOAD statements are special versions of SAVE and LOAD for use with a cassette. Saved File Format The tokenized program is completely contained within the Variable Name Table, the Variable Value Table, and the Statement Table. However, since these tables vary in size, we must also save some information about the size of the tables. The SAVE file format is shown in Figure 10-1. The first part consists of seven fields, each of them two bytes long, which tell where each table starts or ends. Part two contains the saved program's Variable Name Table (VNT), Variable Value Table (VVT), and Statement Table (ST). The displacement value in all the part-one fields is actually the displacement p?us 256. We must subtract 256 from each displacement value to obtain the true displacement. The VNT starts at relative byte zero in the file's second part. The second field in part one holds that value plus 256. The DVVT field in part one contains the displacement, minus 256, of the VVT from the start of part two. The DST value, minus 256, gives the displacement of the Statement Table from the start of part two. The DEND value, minus 256, gives the end-of-file displacement from the start of part two. 81
Chapter Ten Figure 10-1. SAVE File Format PART 1 0 +----------+ | 0 | 2 +----------+ | 256 | <--> The displacement of the VNT from 4 +----------+ the beginning of part two, plus 256. | Not Used | 6 +----------+ | DVVT | <--> The displacement of VNT from the 8 +----------+ beginning of part two, plus 256. | DST | <--> The displacement of ST from the 10+----------+ beginning of part two, plus 256. | Not Used | 12+----------+ | DEND | <--> The displacement of the end of the ======14+==========+ file from the beginning of part two. PART 2 0| VNT | <--> Variable Name Table DVVT-256+----------+ | VVT | <--> Variable Value Table DSNT-256+----------+ | ST | <--> Statement Table DEND-256+----------+ XSAVE($BB5D) The code that implements the SAVE statement starts at the XSAVE ($BB5D) label. Its first task is to open the specified output file, which it does by calling ELADVC. The next operation is to move the first seven RAM table pointers from $80 to a temporary area at $500. While these pointers are being moved, the value contained in the first pointer is subtracted from the value in each of the seven pointers, including the first. Since the first pointer held the absolute address of the first RAM table, this results in a list of displacements from the first RAM table to each of the other tables. These seven two-byte displacements are then written from the temporary area to the file via 103. These are the first fourteen bytes of the SAVE file. (See Figure 10-1.) The first RAM table is the 256-byte buffer, which will not be SAVEd. This is why the seven two-byte fields at the beginning of the SAVEd file hold values exactly 256 more than the true 82
Chapter Ten displacement of the tables they point to. (The LOAD procedure will resolve the 256-byte discrepancy.) The next operation is to write the three needed RAM tables. The total length of these tables is determined from the value in the seventh entry in the displacement list, minus 256. To write the three entries, we point to the start of the Variable Name Table and call 104, with the length of the three tables. This saves the second part of the file format. The file is then closed and XSAVE returns to Execution Control. XLOAD ($BAFB) The LOAD statement is implemented at the XLOAD label located at $BAFB. XLOAD first opens the specified load file for input by calling ELADVC. BASIC reads the first fourteen bytes from the file into a temporary area starting at $500. These fourteen bytes are the seven RAM table displacements created by SAVE. The first two bytes will always be zero, according to the SAVE file format. (See Figure 10-1.) BASIC tests these two bytes for zero values. If these bytes are not zero, BASIC assumes the file is not a valid SAVE file and exits via the ERRNSF, which generates error code 21 (Load File Error). If this is a valid SAVE file, the value in the pointer at $80 (Low Memory Address) is added to each of the seven displace ments in the temporary area. These values will be the memory addresses of the three RAM tables, if and when they are read into memory. The seventh pointer in the temporary area contains the address where the end of the Statement Table will be. If this address exceeds the current system high memory value, the routine exits via ERRPTL, which generates error code 19 (Load Program Too Big). If the program will fit, the seven addresses are moved from the temporary area to the RAM table pointers at $80. The second part of the file is then loaded into the area now pointed to by the Variable Name Table pointer $82. The file is closed, CLR is executed, and a test for RUN is made. If RUN called XLOAD, then a value of $FF was pushed onto the CPU stack. If RUN did not call XLOAD, then $00 was pushed onto the CPU stack. If RUN was the caller, then an RTS is done. 83
Chapter Ten If XLOAD was entered as a result of a LOAD or CLOAD statement, then XLOAD exits directly to the Program Editor, not to Execution Control. CSAVE and CLOAD The CSAVE and CLOAD statements are special forms of SAVE and LOAD. These two statements assume that the SAVE/LOAD device is the cassette device; CSAVE is not quite the same as SAVE "C:". Using SAVE with the "C:" device name will cause the program to be saved using long cassette inter-record gaps. This is a time waster, and CSAVE uses short inter-record gaps. CSAVE starts at XCSAVE ($BBAC). CLOAD starts at XCLOAD ($BBA4). 84
Chapter ElevenThe LIST and ENTER Statements
LIST can be used to store a program on an external device and ENTER can retrieve it. The difference between LOAD-SAVE and LIST-ENTER is that LOAD-SAVE deals with the tokenized program, while LIST-ENTER deals with the program in its source (ATASCII) form. The ENTER Statement BASIC is in ENTER mode whenever a program is not RUNning. By default the Program Editor looks for lines to be ENTERed from the keyboard, but the editor handles all ENTERed lines alike, whether they come from the keyboard or not. The Enter Device To accomplish transparency of all input data (not just ENTERed lines), BASIC maintains an enter device indicator, ENTDTD ($B4). When a BASIC routine (for example, the INPUT simulation routine) needs data, an I/O operation is done to the IOCB specified in ENTDTD. When the value in ENTDTD is zero, indicating IOCB 0, input will come from the keyboard. When data is to come from some other device, ENTDTD contains a number indicating the corresponding IOCB. During coldstart initialization, the enter device is set to IOCB 0. It is also reset to 0 at various other times. XENTER ($BACB) The XENTER routine is called by Execution Control to simulate the ENTER statement. XENTER opens IOCB 7 for input using the specified <filespec>, stores a 7 in the enter device ENTDTD, and then jumps to the start of the editor. Entering from a Device When the Program Editor asks GLGO, the get line routine ($BA92), for the next line, GLGO tells CIO to get a line from the 85
Chapter Eleven device spedified in ENTDTD - in this case, fr9m IOCB 7. The editor continues to process lines from IOCB 7 until an end-of- file error occurs. The IOTEST routine detects the EOF condition, sees thatwe are using IOCB 7 for ENTER, closes device 7, and jumps to SNX2 to reset the enter device (ENTDTD) to 0 and print the READY message before restarting at the beginning of the editor. The LIST Statement The routine which simulates the LIST statement, XLIST, is actually another example of a language translator, complete with symbols and symbol-combining rules. XLIST translates the tokens generated by Atari BASIC back into the semi- English BASIC statements in ATASCII. This translation is a much simpler task than the one done by the pre-compiler, since XLIST can assume that the statement to be translated is syntactically correct. All that is required is to translate the tokens and insert blanks in the appropriate places. The List Device BASIC maintains a list device indicator, LISTDTD ($B5), similar to the enter device indicator discussed earlier. When a BASIC routine wants to output some data (an error message, for example), the I/O operation is done to the device (IOCB) specified in LISTDTD. During coldstart initialization and at various other times, LISTDTD is set to zero, representing IOCB 0, the editor, which will place the output on the screen. Routines such as XPRINT or XLIST can change the LIST device to indicate some other IOCB. Thus the majority of the BASIC routines need not be concerned about the output's destination. Remember that IOCB 0 is always open to the editor, which gets input from the keyboard and outputs to the screen. IOCB 6 is the S: device, the direct access to graphics screen, which is used in GRAPHICS statements. Atari BASIC uses IOCB 7 for I/O commands that allow different devices, like SAVE, LOAD, ENTER, and LIST. XLIST ($B483) The XLIST routine considers the output's destination in its initialization process and then forgets about it. It looks at the first expression in the tokenized line. If it is the <filespec> 86
Chapter Eleven string, XLIST calls a routine to open the specified device using IOCB 7 and to store a 7 in LISTDTD. All of XLIST's other processing is exactly the same, regardless of the LISTed data's final destination. XLIST marks its place in the Statement Table by calling a subroutine of XGOSUB to put a GOSUB type entry on the Runtime Stack. Then XLIST steps through the Statement Table in the same way that Execution Control does, using Execution Control's line parameters and subroutines. When XLIST is finished, Execution Control takes the entry off the Runtime Stack and continues. The XLIST routine, assuming it is to LIST all program statements, sets default starting and ending line numbers of 0 (in TSLNUM) and $7FFF (in LELNUM). XLIST then determines whether line numbers were specified in the tokenized line that contained the LIST statement. XLIST compares the current index into the line (STINDEX) to the displacement to the next statement (NXTSTD). If STINDEX is not pointing to the next statement, at least one line number is specified. In this case, XLIST calls a subroutine of Execute Expression to evaluate the line number and convert it to a positive integer, which XLIST stores in TSLNUM as the starting line number. If a second line number is specified, XLIST calls Execute Expression again and stores the value in LELNUM as the final line to LIST. If there is no second line number, then XLIST makes the ending line number equal to the starting line number, and only one line will be LISTed. If no line numbers were present, then TSLNUM and LELNUM still contain their default values, and all the program lines will be LISTed. XLIST gets the first line to be LISTed by calling the Execution Control subroutine GETSTMT to initialize the line parameters to correspond to the line number in TSLNUM. If we are not at the end of the Statement Table, and if the current line's number is less than or equal to the final line number to be LISTed, XLIST calls a subroutine :LLINE to list the line. After LISTing the line, XLIST calls Execution Control's subroutines to point to the next line. LISTing continues in this manner until the end of the Statement Table is reached or until the final line specified has been printed. When XLIST is finished, it exits via XRTN at $B719, which makes the LIST statement the current statement again and then returns to Execution Control. 87
Chapter Eleven LIST Subroutines :LLINE ($BS5C) The LLINF routine LISTs the current line (the line whose address is in STMCUR). :LLINE gets the line number from the beginning of the tokenized line. The floating point package is called to convert the integer to floating point and then to printable ATASCII. The result is stored in the buffer indicated by INBUFF. :LLINE calls a subroutine to print the line number and then a blank. For every statement in the line, :LLINE sets STINDEX to point to the statement name token and calls the :LSTMT routine ($B590) to LIST the statement. When all statements have been LISTed, :LLINE returns to its caller, XLIST. :LSTMT ($B590) The :LSTMT routine LISTs the statement which starts at the current displacement (in INDEX) into the current line. This routine does the actual language translation from tokens to BASIC statements. :LSTMT uses two subroutines, :LGCT and :LGNT, to get the current and next token, respectively. If the end of the statement has been reached, these routines both pull the return address of their caller off the 6502 CPU stack and return to :LSTMT's caller, :LLINE. Otherwise, they return the requested token from the tokenized statement line. The first token in a statement is the statement name token. :LSTMT calls a routine which prints the corresponding statement name by calling :LSCAN to find the entry and :LPRTOKEN to print it. In the discussion of the Program Editor we saw that an erroneous statement was given a statement name of ERROR and saved in the Statement Table. If the current statement is this ERROR statement or is REM or DATA, :LSTMT picks up each remaining character in the statement and calls PRCHAR ($BA9F) to print the character. Each type of token is handled differently. :LSTMT determines the type (variable, numeric constant, string constant, or operator) and goes to the proper code to translate it. Variable Token. A variable token has a value greater than or equal to $80. When :LSTMT encounters a variable token, it 88
Chapter Eleven turns off the most significant bit to get an index into the Variable Name Table. :LSTMT asks the :LSCAN routine to get the address of this entry. :LSTMT then calls :LPRTOKEN ($B535) to print the variable name. If the last character of the name is (, the next token is an array left parenthesis operator, and :LSTMT skips it. Numeric Constant Token. A numeric constant is indicated by a token of $0E. The next six bytes are a floating point number. :LSTMT moves the numeric constant from the tokenized line to FRO ($D4) and asks the floating point package to convert it to ATASCII. The result is in a buffer pointed to by INBUFF. :LSTMT moves the address of the ATASCII number to SRCADR and tells :LPRTOKEN to print it. String Constant Token. A string constant is indicated by a token of $0F. The next byte is the length of the string followed by the actual string data. Since the double quotes are not stored with a string constant, :LSTMT calls PRCHAR ($BA9F) to print the leading double quote. The string length tells :LSTMT how many following characters to print without translation. :LSTMT repeatedly gets a character and calls PRCHAR to print it until the whole string constant has been processed. It then asks PRCHAR to print the ending double quote. Operator Token. An operator token is any token greater than or equal to $10 and less than $80. By subtracting $10 from the token value, :LSTMT creates an index into the Operator Name Table. :LSTMT calls :LSCAN to find the address of this entry. If the operator is a function (token value greater than or equal to $3D), :LPROTOKEN is called to print it. If this operator is not a function but its name is alphabetic (such as AND), the name is printed with a preceding and following blank. Otherwise, :LPRTOKEN is called to print just the operator name. :LSCAN ($B50C) This routine scans a table until it finds the translation of a token into an ATASCII name. A token's value is based on its table entry number; therefore, the entry number can be derived by modifying the token. For example, a variable token is created by machine-language ORing the table entry number of the variable name with $80. The entry number can be produced by ANDing out the high-order bit of the token. It is this entry number, stored in SCANT, that the :LSCAN routine uses. 89
Chapter Eleven The tables scanned by :LSCAN have a definite structure. Each entry consists of a fixed length portion followed by a variable length ATASCII portion. The last character in the ATASCII portion has the high-order bit on. Using these facts, :LSCAN in 5 he entry corresponding to the entry number in SCANT and puts the address of the ATASCII portion in SCRADR. :LPRTOKEN ($B535) This routine's task is to print the string of ATASCII characters whose address is in SCRADR. :LPRTOKEN makes sure the most significant bit is off (except for a carriage return) and prints the characters one at a time until it has printed the last character in the string (the one with its most significant bit on). 90
Chapter TwelveAtari Hardware Control Statements
The Atari Hardware Control Statements allow easy access to some of the computer's graphics and audio capabilities. The statements in this group are COLOR, GRAPHICS, PLOT, POSITION, DRAWTO, SETCOLOR, LOCATE, and SOUND. XGR ($BA50) The GRAPHICS statement determines the current graphics mode. The XGR simulation routine executes the GRAPHICS statement. The XGR routine first closes IOCB 6. It then calls an Execute Expression subroutine to evaluate the graphics mode value and convert it to an integer. XGR sets up to open the screen by putting the address of a string "S." into INBUFF. It creates an AUX1 and AUX2 byte from the graphics mode integer. XGR calls a BASIC I/O routine which sets up IOCB 6 and calls CIO to open the screen for the specified graphics mode. Like all BASIC routines that do I/O, XGR jumps to the IOTEST routine, which determines what to do next based on the outcome of the I/O. XCOLOR ($BA29) The COLOR statement is simulated by the XCOLOR routine. XCOLOR calls a subroutine of Execute Expression to evaluate the color value and convert it to an integer. XCOLOR saves this value (MOD 256) in BASIC memory location COLOR ($C8). This value is later retrieved by XPLOT and XDRAWTO. XSETCOLOR ($89B7) The routine that simulates the SETCOLOR statement, XSETCOLOR, calls a subroutine of Execute Expression to evaluate the color register specified in the tokenized line. The Execute Expression routine produces a one-byte integer. If the value is not less than 5 (the number of color registers), XSETCOLOR exits via the Error Handling Routine at entry point ERVAL. Otherwise, it calls Execute Expression to get two more integers from the tokenized line. 91
Chapter Twelve To calculate the color value, XSETCOLOR multiplies the tirst integer (MOD 256) byl6 and adds the second (MOD 256). Since the operating system's five color registers are in consecutive locations starting at $2C4 XSETCOLOR uses the register value specified as an index to the proper register location and stores the color value there. XPOS ($BA16) The POSITION statement, which specifies the X and Y coordinates of the graphics cursor, is simulated by the XPOS routine. XPOS uses a subroutine of Execute Expression to evaluate the X coordinate of the graphics window cursor and convert it to an integer value. The two-byte result is stored in the operating system's X screen coordinate location (SCRX at $55). This is the column number or horizontal position of the cursor. XPOS then calls another Execute Expression subroutine to evaluate the Y coordinate and convert it to a one-byte integer. The result is stored in the Y screen coordinate location (SCRY at $54). This is the row number, or vertical position. XLOCATE ($BC95) XLOCATE, which simulates the LOCATE statement, first calls XPOS to set up the X and Y screen coordinates. Next it initializes IOCB 6 and joins a subroutine of XGET to do the actual I/O required to get the screen data into the variable specified. XPLOT ($5A76) XPLOT, which simulates the PLOT statement, first calls XPOS to set the X and Y coordinates of the graphics cursor. XPLOT gets the value that was saved in COLOR ($C8) and joins a PUT subroutine (PRCX at $BAA1) to do the I/O to IOCB 6 (the screen). XDRAWTO ($BA31) The XDRAWTO routine draws a line from the current X,Y screen coordinates to the X,Y coordinates specified in the statement. The routine calls XPOS to set the new X,Y coordinates. It places the value from BASIC's memory location COLOR into OS location SVCOLOR ($2FB). XDRAWTO does some initialization of IOCB 6 specifying the draw command ($11). It then calls a BASIC I/O routine which finishes the 92
Chapter Twelve initialization of IOCB 6 and calls CIO to draw the line. Finally, XDRAWTO jumps to the IOTEST routine, which will determine what to do next based on the outcome of the I/O. XSOUND ($K9DD) The Atari computer hardware uses a set of memory locations to control sound capabilities. The SOUND statement gives the user access to some of these capabilities. The XSOUND routine, which simulates the SOUND statement, places fixed values in some of the sound locations and user specified values in others. The XSOUND routine uses Execute Expression to get four integer values from the tokenized statement line. If the first integer (voice) is greater than or equal to 4, the Error Handling Routine is invoked at ERVAL. The OS audio control bits are all turned off by storing a 0 into $D208. Any bits left on from previous serial port usage are cleared by storing 3 in $D20F. The Atari has four sound registers (one for each voice) starting at $D200. The first byte of each two-byte register determines the pitch (frequency). In the second byte, the four most significant bits are the distortion, and the four least significant bits are the volume. The voice value mentioned earlier is multiplied by 2 and used as an index into the sound registers. The second value from the tokenized line is stored as the pitch in the first byte of one of the registers ($D200, $D202, $D204, or $D206), depending on the voice index. The third value from the tokenized line is multiplied by 16 and the fourth value is added to it to create the value to be stored as distortion/volume. The voice, times 2, is again used as an index to store this value in the second byte of a sound register ($D201, $D203, $D205, or $D207). The XSOUND routine then returns to Execution Control. 93
Chapter ThirteenExternal Data I/O Statements
The external data I/O statements allow data which is not part of the BASIC source program to flow into and out of BASIC. External data can come from the keyboard, a disk, or a cassette. BASIC can also create external information by sending data to external devices such as the screen, a printer, or a disk. The INPUT and GET statements are the primary statements used for obtaining information from external devices. The PRINT and PUT statements are the primary statements for sending data to external devices. XIO, LPRINT, OPEN, CLOSE, NOTE, POINT and STATUS are specialized I/O statements. LPRINT is used to print a single line to the "P:" device. The other statements assist in the I/O process. XINPUT ($B316) The execution of the INPUT statement starts at )(INPUT ($B316). Getting the Input Line. The first action of XINPUT is to read a line of data from the indicated device. A line is any combination of up to 255 characters terminated by the EOL character ($9B). This line will be read into the buffer located at $580. If the INPUT statement contained was followed by #<expression>, the data will be read from the IOCB whose number was specified by <expression>. If there was no #<expression>, IOCB 0 will be used. IOCB 0 is the screen editor and keyboard device (E:). If IOCB 0 is indicated, the prompt character (?) will be displayed before the input line request is made; otherwise, no prompt is displayed. Line Processing. Once the line has been read into the buffer, processing of the data in that line starts at XINA ($B335). The input line data is processed according to the tokens in the INPUT (or READ) statements. These tokens are numeric or string variables separated by commas. 95
Chapter Thirteen Processing a Numeric Variable. If the new token is a numeric variable, the CVAFP routine is called to convert the next characters in the input line to a floating point number. If this conversion does not report an error1 and if the next input line character is a comma or an EOL, the floatixig point value is processed. The processing of a valid numeric input value consists of calling RTNVAR to return the variable and its new value to the Variable Value Table. If there is an error, INPUT processing is aborted via the ERRINP routine. If there is no error, but the user has hit BREAK, the process is aborted via XSTOP. If there is no abort, XINX ($B389) is called to continue with INPUT's next task. Processing a String Variable. If the next statement token is a string variable, it is processed at XISTR ($B35E). This routine is also used by the READ statement. If the calling statement is INPUT, then all input line characters from the current character up to but not including the EOL character are considered to be part of the input string data. If the routine was called by READ, all characters up to but not including the next comma or EOL are considered to be part of the input string. The process of assigning the data to the string variable is handled by calling RISASN ($B386). If RISASN does not abort the process because of an error like DIMENSION TOO SMALL, XINX is called to continue with INPUT's next task. XINX. The XINX ($B389) routine is entered after each variable token in an INPUT or a READ statement is processed. If the next token in the statement is an EOL, the INPUT/READ statement processing terminates at XIRTS ($B3A1). XIRTS restores the line buffer pointer ($80) to the RAM table buffer. It then restores the enter device to IOCB 0 (in case it had been changed to some other input device). Finally, XIRTS executes an RTS instruction. If the next INPUT/READ statement token is a comma, more input data is needed. If the next input line character is an EOL, another input line is obtained. If the statement was INPUT, the new line is obtained by entering XINO ($B326). If the statement was READ, the new lineis obtained by entering XRD3 ($B2D0). The processing of the next INPUT/READ statement variable token continues at XINA. 96
Chapter Thirteen XGET ($BC7F) The GET statement obtains one character from some specified device and assigns that character to a scalar (non-array) numeric variable. The execution of GET starts at XGET ($BC7F) with a call to GIODVC. GIODVC will set the I/O device to whatever number is specified in the #< expression> or to IOCB zero if no #<expression> was specified. (If the device is IOCB 0 (E:), the user must type RETURN to force E: to terminate the input.) The single character is obtained by calling 103. The character is assigned to the numeric variable by calling ISVAR1 ($BD2D). ISVAR1 also terminates the GET statement processing. PRINT The PRINT statement is used to transmit text data to an external device. The arguments in the PRINT statement are a list of numeric and/or string expressions separated by commas or semicolons. If the argument is numeric, the floating point value is converted to text form. If the argument is a string, the string value is transmitted as is, If an argument separator is a comma, the arguments are output in tabular fashion: each new argument starts at the next tab stop in the output line, with blanks separating the arguments. If the argument separator is a semicolon, the transmitted arguments are appended to each other without separation. The transmitted line is terminated with an EOL, unless a semicolon or comma directly precedes the statement's EOL or statement separator (:). XPRINT ($B3B6). The PRINT routine begins at XPRINT ($B3B6). The tab value is maintained in the PTABW ($C9) cell. The cell is initialized with a value of ten during BASIC's cold start, so that commas in the PRINT line cause each argument to be displaced ten positions after the beginning of the last argument. The user may POKE PTABW to set a different tab value. XPRINT copies PTABW to SCANT ($AF). SCANT will be used to contain the next multiple-of-PTABW output line displacement - the column number of the next tab stop. COX is initialized to zero and is used to maintain the current output column or displacement. 97
Chapter Thirteen XPRO. XPRINT examines the next statement token at XPRO ($B3BE), classifies it, and executes the proper routine. # Token. If the next token is #, XPRIOD ($B437) is entered. This routine modifles the list device to the device specified in the # <expression>. XPRO is then entered to process the next token. , Token. The XPTAB ($B419) routine is called to process the , token. Its job is to tab to the next tab column. If COX (the current column) is greater than SCANT, we must skip to the next available tab position. This is done by continuously adding PTABW to SCANT until COX is less than or equal to SCANT. When COX is less than SCANT, blanks ($20) are transmitted to the output device until COX is equal to SCANT. The next token is then examined at XPRO. EOL and: Tokens. The XPEOS ($B446) routine is entered for EOL and tokens. If the previous token was a; or, token, PRINT exits at XPRTN ($B458). If the previous token was not a; or, token, an EOL character is transmitted before exiting via XPRTN. ; Token. No special action is taken for the; token except to go to XPRO to examine the next token. Numbers and Strings. If the next token is not one of the above tokens, Execute Expression is called to evaluate the expression. The resultant value is popped from the argument stack and its type is tested for a number or a string. If the argument popped was numeric, it will be converted to text form by calling CVFASC. The resulting text is transmitted to the output device from the buffer pointed to by INBUFF ($F3). XPRO is then entered to process the next token. If the argument popped was a string, it will be transmitted to the output device by the code starting at :XPSTh ($B3F8). This code examines the argument parameters to determine the current length of the string. When the string has been transmitted, XPRO is entered to process the next token. XLPRINT ($8464) LPRINT, a special form of the PRINT statement, is used to print a line to the printer device (P:). 98
Chapter Thirteen The XLPRINT routine starts at $B464 by opening IOCB 7 for output to the P: device. XPRINT is then called to do the printing. When the XPRINT is done, IOCB 7 is closed via CLSYS1 and LPRINT is terminated. XPUT ($BC72) The PUT statement sends a single byte from the expression in the PUT statement to a specified external device. Processing starts at XPUT ($BC72) with a call to GIODVC. GIODVC sets the I/O device to the IOCB specified in #<expression>. If a #< expression> does not exist, the device will be set to IOCB zero (E:). The routine then calls GETINT to execute PUT's expression and convert the resulting value to a two-byte integer. The least significant byte of this integer is then sent to the PUT device via PRCX. PRCX also terminates the PUT processing. XXIO ($BBE5) The XIO statement, a general purpose I/O statement, is intended to be used when no other BASIC I/O statement will serve the requirements. The XIO parameters are an IOCB I/O command, an IOCB specifying expression, an AUXi value, an AUX2 value, and finally a string expression to be used as a filespec parameter. XIO starts at XXIO ($BBE5) with a call to GIOCMD. CIOCMD gets the IOCB command parameter. XIO then continues at XOP1 in the OPEN statement code. XOPEN ($BBEB) The OPEN statement is used to open an external device for input and/or output. OPEN has a # <expression>, the open type parameter (AUX1), an AUX2 parameter, and a string expression to be used as a filespec. OPEN starts at XOPFN at $BBEB. It loads the open command code into the A register and continues at XOP1. XOP1. XOP1 continues the OPEN and XIO statement processing. It starts at $BBED by storing the A register into the IOCMD cell. Next it obtains the AUX1(open type) and AUX2 values from the statement. The next parameter is the filespec string. In order to insure that the filespec has a proper terminator, SETSEOL is called to place a temporary EOL at the end of the string. 99
Chapter Thirteen The XIO or OPEN command is then executed via a call to IO1. When IO1 returns, the temporary EOL at the end of the string is replaced with its previous value by calling RSTSEOL. OPEN and XIO terminate by calling TOTEST to insure that the command was executed without error. XCLOSE ($BC1B) The CLOSE statement, which closes the specified device, starts at XCLOSE ($BCIB). It loads the IOCB close command code into the A register and continues at GDVCIO. GDVCEO. GDVCIO ($BC1D) is used for general purpose device I/O. It stores the A register into the IOCMD cell, calls GIODVC to get the device from # <expression>, then calls IO7 to execute the IO0. When IO7 returns, IOTEST is called to test the results of the I/O and terminate the routine. XSTATUS ($BC28) The STATUS statement executes the IOCB status command. Processing starts at XSTATUS ($BC28) by calling GIODVC to get the device number from # <expression>. It then calls IO8 with the status command in the A register. When IO8 returns, the status returned in the IOCB status cell is assigned to the variable specified in the STATUS statement by calling ISVARI. ISVAR1 also terminates the STATUS statement processing. XNOTE ($BC3G) The NOTE statement is used specifically for disk random access. NOTE executes the Disk Device Dependent Note Command, $26, which returns two values representing the current position within the file for which the IOCB is open. NOTE begins at XNOTE at $BC36. The code loads the command value, $26, into the A register and calls GDVCIO to do the I/O operation. When GDVCIO returns, the values are moved from AUX3 and AUX4 to the first variable in the NOTE statement. The next variable is assigned the value from AUX5. XPOINT ($8C4D) The POINT statement is used to position a disk file to a previously NOTEd location. Processing starts at XPOINT ($BC4D). This routine conyerts the first POINT parameter to an integer and stores the value in AUX3 and AUX4. The second parameter is then converted to an integer and its value stored 100
Chapter Thirteen in AUX5. The POINT command, $25, is executed by calling GDIO1, which is part of GDVCIO. Miscellaneous I/O Subroutines IOTEST. IOTEST($BCB3) is a general purpose routine that examines the results of an I/O operarion. If the I/O processing has returned an error, IOTEST processes that error. IOTEST starts by calling LDIOSTA to get the status byte from the IOCB that performed the last I/O operation. If the byte value is positive (less than 128), IOTEST returns to the caller. If the status byte is negative, the I/O operation was abnormal and processing continues at SICKIO. If the I/O aborted due to a BREAK key depression, BRKBYT ($11) is set to zero to indicate BREAK. If a LOAD was in progress when BREAK was hit, exit is via COLDSTART; otherwise IOTEST returns to its caller. If the error was not from IOCB 7 (the device BASIC uses), the error status value is stored in ERRNUM and ERROR is called to print the error message and abort program execution. If the error was from IOCB 7, then IOCB 7 is closed and ERROR is called with the error status value in ERRNUM - unless ENTER was being executed, and the error was an end- of-file error. In this case, IOCB 7 is closed, the enter device is reset to IOCB 0, and SNX2 is called to return control to the Program Editor. I/O Call Routine. All I/O is initiated from the routine starting at IO1 ($BD0A). This routine has eight entry points, IO1 through IO8, each of which stores predetermined values in an IOCB. All IOn entry points assume that the X register contains the IOCB value, times 16. IO1 sets the buffer length to 255. IO2 sets the buffer length to zero. IO3 sets the buffer length to the value in the Y register plus a most-significant length byte of zero. IO4 sets the buffer length from the values in the Y,A register pair, with the A register being the most-significant value. IO5 sets the buffer address from the value in the INBUFF cell (sF3). IO6 sets the buffer address from the Y,A register pair. The A register contains the most significant byte. 101
Chapter Thirteen IO7 sets the 1/0 command value from the value in the IOCMD cell. IO8 sets the I/O command from the value in the A register. All of this is followed by a call to the operating system CIO entry point. This call executes the I/O. When CIO returns, the general I/O routine returns to its caller. 102
Chapter FourteenInternal I/O Statements
The READ, DATA, and RESTORE statements work together to allow the BASIC user to pass predetermined information to his or her program. This is, in a sense, internal I/O. XDATA ($A9E7) The information to be passed to the BASIC program is stored in one or more DATA statements. A DATA statement can occur any place in the program, but execution of a DATA statement has no effect. When Execution Control encounters a DATA statement, it expects to process this statement just like any other. Therefore an XDATA routine is called, but XDATA simply returns to Execution Control. XREAD ($B283) The XREAD routine must search the Statement Table to find DATA. It uses Execution Control's subroutines and line parameters to do this. When XREAD is done, it must restore the line parameters to point to the READ statement. In order to mark its place in the Statement Table, XREAD calls a subroutine of XGOSUB to put a GOSUB-type entry on the Runtime Stack. The BASIC program may need to READ some DATA, do some other processing, and then READ more DATA. Therefore, XREAD needs to keep track of just where it is in which DATA statement. There are two parameters that provide for this. DATALN ($B7) contains the line number at which to start the search for the next DATA statement. DATAD ($B6) contains the displacement of the next DATA element in the DATALN line. Both values are set to zero as part of RUN and CLR statement processing. XREAD calls Execution Control's subroutine GETSTMT to get the line whose number is stored in DATALN. If this is the first READ in the program and a RESTORE has not set a 103
Chapter Fourteen different line number, DATALN contains zero, and GETSTMT will get the first line in the program. On subsequent READs, GETSTMT gets the last DATA statement that was processed by the previous READ. After getting its first line, XREAD calls the XRTN routine to restore Execufion Control's line parameters. The current line number is stored in DATALN. XREAD steps through the line, statement by statement, looking for a DATA statement. If the line contains no DATA statement, then subsequent lines and statements are examined until a DATA statement is found. When a DATA statement has been found, XREAD inspects the elements of the DATA statement until it finds the element whose displacement is in DATAD. If no DATA is found, XREAD exits via the ERROOD entry point in the Error Handling Routine. Otherwise, a flag is set to indicate that a READ is being done, and XREAD joins XINPUT at :XINA. XINPUT handles the assignment of the DATA values to the variables. (See Chapter 13.) XREST ($B268) The RESTORE statement allows the BASIC user to re-READ a DATA statement or change the order in which the DATA statements are processed. The XREST routine simulates RESTORE. XREST sets DATALN to the line number given, or to zero if no line number is specified. It sets DATAD to zero, so that the next READ after a RESTORE will start at the first element in the DATA line specified in DATALN. 104
Chapter FifteenMiscellaneous Statements
XDEG ($B261) and XRAD ($B266) The transcendental functions such as SIN or COS will work with either degrees or radians depending on the setting of RADFLG ($FB). The DEG and'RAD statements cause RADFLG to be set. These statements are simulated by the XDEG and XRAD routines, respectiveW. The XDEG routine stores a six in RADI'LG. XRAD sets it to zero. These particular values were chosen because they aid the transcendental functions in their calculations. RADFLG is set to zero during BASIC's initialization process and also during simulation of the RUN statement. XPOKE ($B24C) The POKE statement is simulated by the XPOKE routine. XPOKE calls a subroutine of Execute Expression to get the address and data integers from the tokenized line. XPOKE then stores the data at the specified address. XBYE ($A9E8) The XBYE routine simulates the BYE statement. XBYE closes all IOCBs (devices and files) and then jumps to location $E471 in the Operating System. This ends BASIC and causes the memo pad to be displayed. XDOS ($A9EE) The DOS statement is simulated by the XDOS routine. The XDOS routine closes all IOCBs and jumps to whatever address is stored in location $0A. This will be the address of DOS if DOS has been loaded. If DOS has not been loaded, $0A will point to the memo pad. XLET ($AAE0) The LET and implied LET statements assign values to variables. They both invoke the XLET routine, which consists of the Execute Expression routines. (See Chapter 7.) 105
Chapter Fifteen XREM ($A9E7) The REM statement is for documentation purposes only and has no effect on the running program. The routine which simulates REM, XREM, simply executes an RTS instruction to return to Execution Control. XERR ($B91E) When a line containing a syntax error is entered, it is given a special statement name token to indicate the error. The entire line is flagged as erroneous no matter how many previously good statements are in the line. The line is then stored in the Statement Table. The error statement is processed just like any other. Execution Control calls a routine, XERR, which is one of the entry points to the Error Handling Routine. It causes error 17 (EXECUTION OF GARBAGE). XDIM ($B1D9) The DIMension statement, simulated by the XDIM routine, reserves space in the String/Array Table for the DIMensioned variable. The XDIM routine calls Execute Expression to get the variable to be DIMensioned from the Variable Value Table. The variable entry is put into a work area. In the process, Execute Expression gets the first and second DIMension values and sets a default of zero if only one value is specified. XDIM checks to see if the variable has already been DIMensioned. If the variable was already DIMensioned, XDIM exits via the ERRDIM entry point in the Error Handling Routine. If not, a bit is set in the variable type byte in the work area entry to mark this variable as DIMensioned. Next, XDIM calculates the amount of space required. This calculation is handled differently for strings and arrays. DIMensioning an Array. XDIM first increments both dimension values by one and then multiplies them together to get the number of elements in the array. XDIM multiplies the result by 6 (the length of a floating point number) to get the number of bytes required. EXPAND is called to expand the String/Array Table by that amount. XDIM must finish building the variable entry in the work area. It stores the first and second dimension values in the entry. It also stores the array's displacement into the 106
Chapter Fifteen String/Array Table. It then calls an Execute Expression subroutine to return the variable to the Variable Value Table. (See Chapter 3.) DIMensioning a String. Reserving space for a string in the String/Array Table is much simpler. XDIM merely calls the EXPAND routine to expand by the user-specified size. XDIM must also build the Variable Value Table entry in the work area. It sets the current length to 0 and the maximum length to the DIMensioned value. The displacement of the string into the String/Array Table is also stored in the variable. XDIM then calls a subroutine of Execute Expression to return the variable entry to the Variable Value Table. (See Chapter 3.) 107
Chapter SixteenInitialization
When the Atari computer is powered up with the BASIC cartridge in place, the operating system does some processing and then jumps to a BASIC routine. Between the time that BASIC first gets control and the time it prints the READY message, initialization takes place. This initialization is called a cold start. No data or tables are preserved during a cold start. Initialization is repeated if things go terribly awry. For example, if there is an I/O error while executing a LOAD statement, BASIC is totally confused. It gives up and begins all over again with the COLDSTART routine. Sometimes a less drastic partial initialization is necessary. This process is handled by the WARMSTART routine, in which some tables are preserved. Entering the NEW statement, simulated by the XNEW routine, has almost the same effect as a cold start. COLDSTART ($A000) Two flags, LOADFLG and WARMFLG, are used to determine if a cold or warm start is required. The load flag, LOADFLG ($CA), is zero except during the execution of a LOAD statement. The XLOAD routine sets the flag to non-zero when it starts processing and resets it to zero when it finishes. If an I/O error occurs during that interval, IOTEST notes that LOADFLG is non-zero and jumps to COLDSTART. The warm-start flag, WARMFLG ($08), is never set by BASIC. It is set by some other routine, such as the operating system or DOS. If WARMFLG is zero, a cold start is done. If it is non-zero, a warm start is done. During its power-up processing, before BASIC is given control, OS sets WARMFLG to zero to request a cold start. During System Reset processing, OS sets the flag to non-zero, indicating a warm start is desired. If DOS has loaded any data into BASIC's program area during its processing, it will request a cold start. The COLDSTART routine checks both WARMFLG and LOADFLG to determine whether to do a cold or warm start. If a cold start is required, COLDSTART initializes the 6502 CPU 109
Chapter Sixteen stack and clears the decimal flag. The rest of its processing is exactly the same as if the NEW statement had been entered. XNEW ($A00C) The NEW statement is simulated by the XNEW routine. XNEW resets the load flag, LOADFLG, to zero. It initializes the zero- page pointers to BASIC's RAM tables. It reserves 256 bytes at the loW memory address for the multipurpose buffer and stores its address in the zero-page pointer located at $80. Since none of the RAM tables are to retain any data, their zero- page pointers ($82 through $90) are all set to lOW memory plus 256. The Variable Name Table is expanded by one byte, which is set to zero. This creates a dummy end-of-table entry. The Statement Table is expanded by three bytes. The line number of the direct statement ($8000) is stored there along with the length (three). This marks the end of the Statement Table. A default tab value of 10 is set for the PRINT statement. WARMSTART ($A04D) A warm start is the least drastic of the three types of initialization. Everything the WARMSTART routine does is also done by COLDSTART and XNEW. The stop line number (STOPLN), the error number (ERRNUM), and the DATA parameters (DATALN and DATAD) are all set to zero. The RADFLG flag is set to zero, indicating that transcendental functions are working in radians. The break byte (BRKBYT) is set off and $FF is stored in TRAPLN to indicate that errors are not being trapped. All IOCBs (devices and files) are closed. The enter and list devices (ENTDTD and LISTDTD) are set to zero to indicate the keyboard and the screen, respectively. Finally, the READY message is printed and control passes to the Program Editor. 110