ATARI BASIC

WHAT IS ATARI BASIC?

ATARI BASIC is an interpreted language. This means programs can be run when they are entered without intermediate stages of compilation and linking. The ATARI BASIC interpreter resides in an 8K ROM cartridge in the left slot of the computer. It encompasses addresses A000 through BFFF. At least 8K of RAM is required to run BASIC.

To use ATARI BASIC effectively, you must know its strengths and weaknesses. With this information, programs can be written that make good use of the assets and features of ATARI BASIC.

Strengths of ATARI BASIC

Weaknesses of ATARI BASIC

HOW ATARI BASIC WORKS

The workings of the BASIC interpreter are summarized as follows: The details of these operations are discussed in the following four sections.

THE TOKENIZING PROCESS

In simple terms, the tokenization of a line of code in BASIC looks like this:
    1. BASIC gets a line of input
    2. It then checks for legal syntax
    3. During syntax checking it is tokenized
    4. The tokenized line is moved into the token program
    5. If the line is in immediate mode it is executed

To better understand the tokenizing process, some terms must first be defined:

BASIC begins the tokenizing process by getting a line of input. This input is obtained from one of the handlers of the operating system. Normally it comes from the screen editor; however, with the ENTER command, any device can be specified. The call BASIC issues is a GET RECORD command, and the data returned is ATASCII information terminated by an EOL. CIO stores this data in the BASIC Input Line Buffer from 580 to 5FF hex.

After the record is returned, the syntax checking and tokenizing processes begin. First BASIC looks for a line number. If one is found, it is converted into a 2-byte integer. If no line number is present, the line is assumed to be in immediate mode and the line number 8000 hex is assigned to it. These two bytes will be the first two tokens of the tokenized line. The line is built in the token output buffer, which is 256 bytes long and resides at the end of the reserved operating system RAM.

The next token is a dummy byte reserved for the byte count (or offset) from the start of this line to the start of the next line. Following that is another dummy byte for the count from the start of this line to the start of the next statement. These values will be set when tokenization is complete for the line and the statement respectively. The use of these values is discussed in the program execution process section.

BASIC now looks for the command of the first statement of the input line. A check is made to determine if this is a valid command by scanning a list of legal commands in ROM. If a match is found, then the next byte in the token line becomes the number of the entry in the ROM list that matched. If no match is found, a syntax error token is assigned to that byte and BASIC stops tokenizing, copies the rest of the input buffer in ATASCII format to the token output buffer, and prints the error line.

Assuming a good line, one of seven items can follow the command: a variable, a constant, an operator, a function, a double quote, another statement, or an EOL. BASIC tests if the next input character is numeric. If not then it compares that character and those following against the entries of the variable name table. If this is the first line of code entered in the program then no match is found. The characters are then compared against the function and operator tables. If no match is found there then BASIC assumes that this is a new variable name. Since this is the first variable it is assigned the first entry in the variable name table. The characters are copied out of the input buffer and stored into the name table with the most significant bit (MSB) set on the last byte of the name. Eight bytes are then reserved in the variable value table for this entry. (See the variable value table discussion in the section, "Token File Structure".)

The token that ends up in the tokenized line is the variable number minus one, with the MSB set. Thus the token of the first variable entered would be 80 hex, the second would be 81, and so on up to FF for a total of 128 unique variable numbers.
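
The name table and variable token rules above can be sketched in Python. This is only a model of the behavior described in the text, not BASIC's ROM code; the function names are made up for illustration, and it assumes the characters used in names have the same codes in ATASCII as in ASCII.

```python
def variable_token(name: str, name_table: list[str]) -> int:
    """Return the one-byte token for `name`, adding the name if it is new."""
    if name not in name_table:
        if len(name_table) >= 128:
            raise ValueError("only 128 unique variables are available")
        name_table.append(name)
    # Token = zero-based position in the name table with the MSB set.
    return 0x80 | name_table.index(name)


def encode_name_table(name_table: list[str]) -> bytes:
    """Store each name as character codes, MSB set on its last byte."""
    out = bytearray()
    for name in name_table:
        raw = bytearray(name.encode("ascii"))
        raw[-1] |= 0x80
        out += raw
    return bytes(out)


names: list[str] = []
tokens = [variable_token(n, names) for n in ("X", "COUNT", "X", "A$")]
table = encode_name_table(names)
```

Here the second reference to X reuses token 80 hex rather than creating a new entry, which is why a name costs table space only once.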

If a function is found, then its entry number in the operator function table is assigned to the token. Functions require certain sequences of parameters; these are contained in syntax tables, and if they are not matched, a syntax error will result.

If an operator is found, then a token is given its table entry number. Operators can follow each other in a rather complex fashion (such as multiple parentheses), so the syntax checking of them is a bit complicated.

In the case of the double quotes, BASIC assumes that a character string is following and assigns a 0F hex to the output token and reserves a dummy byte for the string length. The characters are moved from the input buffer into the output buffer until the second set of quotes is found. The length byte is then set to the character count.
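
As a small illustration of the string constant layout just described (0F hex, a length byte, then the characters), here is a Python sketch. The function name is hypothetical, and the text is assumed to contain only characters whose ATASCII and ASCII codes agree.

```python
def tokenize_string_literal(text: str) -> bytes:
    """Return the token bytes for the characters found between the quotes."""
    if len(text) > 255:
        raise ValueError("the length must fit in the single length byte")
    # 0F hex marks a string constant; the next byte is the character count.
    return bytes([0x0F, len(text)]) + text.encode("ascii")


hello = tokenize_string_literal("HI")  # bytes 0F 02 48 49
```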

If the next characters in the input buffer are numeric, BASIC converts them into a 6-byte BCD constant. A 0E hex token will be put in the output buffer, followed by the six byte constant.
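
The text does not spell out the layout of the 6-byte BCD constant. The Python sketch below assumes the floating point layout commonly documented for the ATARI operating system: one exponent byte in excess-64 notation counting powers of 100, followed by five bytes holding two BCD digits each. Treat that layout, and the function name, as assumptions. The sketch handles only non-negative integers of up to ten digits.

```python
def numeric_constant_token(value: int) -> bytes:
    """Return 0E hex followed by an assumed 6-byte BCD form of `value`."""
    if not 0 <= value < 10**10:
        raise ValueError("sketch handles 0 .. 9,999,999,999 only")
    if value == 0:
        packed = bytes(6)
    else:
        digits = str(value)
        if len(digits) % 2:
            digits = "0" + digits          # whole base-100 digit pairs
        pairs = len(digits) // 2
        exponent = 0x40 + pairs - 1        # excess-64, powers of 100 (assumed)
        digits = digits.ljust(10, "0")     # five mantissa bytes
        mantissa = bytes(
            (int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, 10, 2)
        )
        packed = bytes([exponent]) + mantissa
    return bytes([0x0E]) + packed


one_token = numeric_constant_token(1)
```

Whatever the exact digit layout, the total is seven bytes (the 0E token plus six data bytes), which is the per-constant cost quoted in the space-saving guidelines.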

When a colon is encountered, a 14 hex token is inserted in the output buffer and the offset from the start of the line is stored in the dummy byte that was reserved for the count to the start of the next statement. At this point another dummy byte is reserved and the process goes back to get a command.

When the EOL is found, a 16 hex token is stored and the offset from the start of the line is put in the dummy byte for the line offset. At this point, tokenization is complete and BASIC moves the token line into the token program. First it searches the program for that line number. If the line number is found, BASIC replaces the old line with the new one. If it is not found, BASIC inserts the new line in the correct numerical sequence. In both cases, the data following the line is moved either up or down in memory to allow for an expanding or contracting program size.

BASIC now checks if the tokenized line is an immediate mode line. If so, that line is executed according to the methods described in the interpretive process; if not, BASIC goes back to get another line of input.

If at any time during the tokenizing process the length of the token line exceeds 256 bytes, an ERROR 14 message (line too long) is sent to the screen and BASIC goes back to get the next line of input.
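
The way the offset bytes and terminator tokens fit together can be modeled in Python. This is a sketch of the layout described in this section, not the interpreter's own routine. It assumes the line number is stored low byte first, that the last statement's offset byte is also filled in when the EOL is reached, and it uses 20 hex as a placeholder command number that is not taken from the text. Because each offset is a single byte, the sketch rejects lines longer than 255 bytes.

```python
END_OF_STATEMENT = 0x14  # token stored for a colon
END_OF_LINE = 0x16       # token stored for the EOL


def build_token_line(line_number: int, statements: list[tuple[int, bytes]]) -> bytes:
    """Assemble one tokenized line from (command_token, operand_bytes) pairs."""
    # Two-byte line number (low byte first, an assumption), then the
    # dummy byte that will receive the offset to the next line.
    line = bytearray([line_number & 0xFF, line_number >> 8, 0])
    for index, (command, operands) in enumerate(statements):
        offset_slot = len(line)
        line.append(0)                 # dummy byte: offset to next statement
        line.append(command)
        line += operands
        is_last = index == len(statements) - 1
        line.append(END_OF_LINE if is_last else END_OF_STATEMENT)
        if len(line) > 0xFF:
            raise ValueError("line too long (ERROR 14)")
        line[offset_slot] = len(line)  # offset measured from start of line
    line[2] = len(line)                # offset to the start of the next line
    return bytes(line)


COMMAND = 0x20  # placeholder command number, hypothetical
one = build_token_line(10, [(COMMAND, bytes([0x0F, 2]) + b"HI")])
two = build_token_line(0x8000, [(COMMAND, b""), (COMMAND, b"")])
```

The second call uses line number 8000 hex, the value assigned to an immediate mode line, and shows two statement offsets within one line.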

An example line of input and its token form looks like this (all token values are hexadecimal):

THE TOKEN FILE STRUCTURE

The token file contains two major segments: (1) a group of zero page pointers that point into the token file, and (2) the actual token file itself. The zero page pointers are 2-byte values that point to various sections of the token file. There are nine 2-byte pointers and they are in locations 80 to 91 hex. Following is a list of the pointers and the sections of the token file they reference.

Pointer (hex) Token File Section (Contiguous Blocks)
LOMEM 80,81 Token output buffer - This is the buffer BASIC uses to tokenize one line of code. It is 256 bytes long. This buffer resides at the end of the operating system's allocated RAM.
VNTP 82,83 Variable name table - A list of all the variable names that have been entered in the program. They are stored as ATASCII characters, each new name stored in the order it was entered. Three types of name entries exist:
  1. Scalar variables - MSB set on last character in name.
  2. String variables - last character is a "$" with the MSB set.
  3. Array variables - last character is a "(" with the MSB set.
VNTD 84,85 Variable name table dummy end - BASIC uses this pointer to indicate the end of the name table. This normally points to a dummy zero byte when there are fewer than 128 variables. When 128 variables are present, this points to the last byte of the last variable name.
VVTP 86,87 Variable value table - This table contains current information on each variable. For each variable in the name table, eight bytes are reserved in the value table. The information for each variable type is:

Byte Number         1     2      3   4           5   6       7   8
Scalar              00    Var#   6-byte BCD constant
Array  (DIMed)      41    Var#   Offset from     first       second
       (unDIMed)    40           STARP (8C,8D)   DIM + 1     DIM + 1
String (DIMed)      81    Var#   Offset from     Length      DIM
       (unDIMed)    80           STARP

A scalar variable contains a numeric value. An example is X=1. The scalar is X and its value is 1, stored in 6-byte BCD format. An array is composed of numeric elements stored in the string/array area and has one entry in the value table. A string, composed of character elements in the string/array area, also has one entry in the table.

The first byte of each value entry indicates the type of variable: 00 for a scalar, 40 for an array, and 80 for a string. If the array or string has been dimensioned, then the LSB is set on the first byte.

The second byte contains the variable number. The first variable entry is number zero, and if 128 variables were present, the last would be 7F.

In the case of the scalar variable the third through eighth byte contain the 6-byte BCD number that has currently been assigned to it.

For arrays and strings, the third and fourth bytes contain an offset from the start of the string/array area (described below) to the beginning of the data.

The fifth and sixth bytes of an array contain its first dimension. The quantity is a 16-bit integer and its value is 1 greater than the value the user entered. The seventh and eighth bytes are the second dimension, also 1 greater than the value entered.

The fifth and sixth bytes of a string are a 16-bit integer that contains its current length. The seventh and eighth bytes are its dimension (up to 32767 bytes in size).
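
The eight-byte entry format can be decoded with a short Python sketch. The names below are made up for illustration, and 16-bit quantities are assumed to be stored low byte first.

```python
from dataclasses import dataclass, field


@dataclass
class ValueEntry:
    kind: str                 # "scalar", "array" or "string"
    dimensioned: bool         # LSB of the type byte
    number: int               # variable number, 00 to 7F hex
    fields: dict = field(default_factory=dict)


def word(low: int, high: int) -> int:
    """Combine two bytes into a 16-bit integer (low byte first, assumed)."""
    return low | (high << 8)


def decode_value_entry(entry: bytes) -> ValueEntry:
    if len(entry) != 8:
        raise ValueError("a value table entry is exactly eight bytes")
    type_byte, number = entry[0], entry[1]
    kind = {0x00: "scalar", 0x40: "array", 0x80: "string"}[type_byte & 0xC0]
    dimensioned = bool(type_byte & 0x01)
    if kind == "scalar":
        details = {"bcd": entry[2:8]}
    elif kind == "array":
        details = {
            "offset": word(entry[2], entry[3]),
            # Stored dimensions are 1 greater than the values entered.
            "dim1": word(entry[4], entry[5]) - 1,
            "dim2": word(entry[6], entry[7]) - 1,
        }
    else:
        details = {
            "offset": word(entry[2], entry[3]),
            "length": word(entry[4], entry[5]),
            "dimension": word(entry[6], entry[7]),
        }
    return ValueEntry(kind, dimensioned, number, details)


array_entry = decode_value_entry(bytes([0x41, 0x02, 0, 0, 0x0B, 0, 0x01, 0]))
string_entry = decode_value_entry(bytes([0x81, 0x00, 0x42, 0, 0x05, 0, 0x14, 0]))
```

The array example corresponds to a dimensioned array whose first dimension was entered as 10, and the string example to a string dimensioned to 20 characters that currently holds 5. The dimension arithmetic is meaningful only for dimensioned entries.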

STMTAB 88,89 Statement Table - This block of data includes all the lines of code that have been entered by the user and tokenized by BASIC, and it also includes the immediate mode line. The format of these lines is described in the tokenized line example of the section on the tokenizing process.
STMCUR 8A,8B Current Statement - This pointer is used by BASIC to reference particular tokens within a line of the statement table. When BASIC is waiting for input, this pointer is set to the beginning of the immediate mode line.
STARP 8C,8D String/Array area - This block contains all the string and array data. String characters are stored as one byte ATASCII entries, so a string of 20 characters will require 20 bytes. Arrays are stored with 6-byte BCD numbers for each element. A 10-element array would require 60 bytes. This area is allocated and subsequently enlarged by each dimension statement encountered, the amount being equal to the size of a string dimension or six times the size of an array dimension.
RUNSTK 8E,8F Run time stack - This software stack contains GOSUB and FOR/NEXT entries. The GOSUB entry consists of four bytes. The first is a 0 byte indicating GOSUB, followed by the 2-byte integer line number on which the call occurred. This is followed by the offset into that line so the RETURN can come back and execute the next statement. The FOR/NEXT entry contains 16 bytes. The first six bytes are the limit the counter variable can reach. The next six bytes are the step or counter increment. Each of these quantities is in 6-byte BCD format. The thirteenth byte is the counter variable number with the MSB set. The fourteenth and fifteenth bytes are the line number, and the sixteenth is the line offset to the FOR statement.
MEMTOP 90,91 Top of application RAM - This is the end of the user program. Program expansion can occur from this point to the end of free RAM, which is defined by the start of the display list. The FRE function returns the amount of free RAM by subtracting MEMTOP from HIMEM (2E5,2E6). Note that the BASIC MEMTOP is not the same as the OS variable called MEMTOP.

THE PROGRAM EXECUTION PROCESS

Executing a line of code is a process that involves reading the tokens that were created during the tokenization process. Each token has a particular meaning that causes BASIC to execute a specific series of operations. The method of doing this requires that BASIC get one token at a time from the token program and then process it. The token is an index into a jump table of routines, so a PRINT token will point indirectly to a PRINT processing routine. When that processing is complete, BASIC returns to get the next token. The pointer that is used to fetch each token is called STMCUR and is at 8A and 8B.

The first line of code that is executed in a program is the immediate mode line. This is usually a RUN or GOTO. In the case of the RUN, BASIC gets the first line of tokens from the statement table (tokenized program) and processes it. If all the code is in-line, then BASIC merely executes consecutive lines.

If a GOTO is encountered, then the line to go to must be found. The statement table contains a linked list of tokenized BASIC lines. These lines are stored in ascending numerical order. To find a line somewhere in the middle of the table, BASIC starts by finding the first line of the program.

The address of the first line is contained in the STMTAB pointer at 88 and 89. This address is stored in a temporary pointer. The first 2 bytes of the first line are its line number, which is compared against the requested line number. If the first number is less, then BASIC gets the next line by adding the third byte of the first line to the temporary pointer. The temporary pointer will now be pointing to the second line. Again the first 2 bytes of this new line are compared to the requested line, and if they are less, the third byte is added to the pointer. If a line number does match, the contents of the temporary pointer are moved into STMCUR and BASIC fetches the next token from the new line. Should the requested line number not be found, an ERROR 12 is generated.
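
The search just described can be modeled in Python as a walk over the line offsets. This is a sketch under stated assumptions: the statement table is passed in as a byte string whose index 0 stands for the address in STMTAB, line numbers are stored low byte first, and the command number 20 hex in the sample data is a placeholder.

```python
class LineNotFound(Exception):
    """Stands in for BASIC's ERROR 12."""


def find_line(statement_table: bytes, target: int) -> int:
    """Return the position (relative to STMTAB) of the requested line."""
    pointer = 0  # the temporary pointer
    while pointer + 3 <= len(statement_table):
        number = statement_table[pointer] | (statement_table[pointer + 1] << 8)
        if number == target:
            return pointer
        if number > target:
            break  # lines are in ascending order, so the line is absent
        step = statement_table[pointer + 2]  # third byte: offset to next line
        if step == 0:
            break  # guard against malformed data in this sketch
        pointer += step
    raise LineNotFound(f"line {target} not found (ERROR 12)")


# Lines 10 and 20, then the immediate mode line (8000 hex).
table = (
    bytes([10, 0, 6, 6, 0x20, 0x16])
    + bytes([20, 0, 6, 6, 0x20, 0x16])
    + bytes([0x00, 0x80, 6, 6, 0x20, 0x16])
)
position = find_line(table, 20)
```

Because every search starts at the first line, the number of steps grows with the target line's position in the program, which is the reason behind the advice to put frequently called subroutines near the start.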

The GOSUB involves more processing than the GOTO. The line finding routine is the same, but before BASIC goes to that line it sets up an entry in the Run Time Stack. It allocates four bytes at the end of the stack and stores a 0 in the first byte to indicate a GOSUB stack entry. It then stores the line number it was on when the call was made into the next two bytes of the stack. The final byte contains the offset in bytes from the start of that line to where the GOSUB token was found. BASIC then executes the line it looked up. When the RETURN is found, the entry on the stack is pulled off, and BASIC returns to the calling line.

The FOR command causes BASIC to allocate 16 bytes on the Run Time Stack. The first six bytes are the limit the variable can reach in 6-byte BCD format. The second six bytes are the step, in the same format. Following these, BASIC stores the variable number (MSB set) of the counting variable. It then stores the present line number (two bytes) and the offset into the line. The rest of the line is then executed.

When BASIC finds the NEXT command, it looks at the last entry on the stack. It makes sure the variable referenced by the NEXT is the same as the one on the stack and checks if the counter has reached or exceeded the limit. If not then BASIC returns to the line with the FOR statement and continues execution. If the limit was reached, then the FOR entry is pulled off the stack and execution continues from that point.
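
The two kinds of run time stack entry can be sketched in Python as follows. The function names are hypothetical and line numbers are assumed to be stored low byte first. The sketch also relies on an observation that follows from the layouts given above but is not stated in the text: the fourth byte from the end of the stack is 0 for a GOSUB entry and has the MSB set for a FOR entry, so it can serve as a tag.

```python
def push_gosub(stack: bytearray, line_number: int, offset: int) -> None:
    """Append a 4-byte GOSUB entry: 0, line number, offset into the line."""
    stack.extend([0x00, line_number & 0xFF, line_number >> 8, offset])


def push_for(stack: bytearray, limit: bytes, step: bytes,
             variable_number: int, line_number: int, offset: int) -> None:
    """Append a 16-byte FOR entry: limit, step, variable, line, offset."""
    if len(limit) != 6 or len(step) != 6:
        raise ValueError("limit and step are 6-byte BCD values")
    stack.extend(limit)
    stack.extend(step)
    stack.extend([0x80 | variable_number,
                  line_number & 0xFF, line_number >> 8, offset])


def pop_entry(stack: bytearray) -> dict:
    """Remove and decode the last entry on the stack."""
    tag = stack[-4]
    size = 4 if tag == 0 else 16
    entry = bytes(stack[-size:])
    del stack[-size:]
    tail = entry[-4:]
    result = {"kind": "GOSUB" if tag == 0 else "FOR",
              "line": tail[1] | (tail[2] << 8),
              "offset": tail[3]}
    if tag:
        result.update(limit=entry[0:6], step=entry[6:12], variable=tag & 0x7F)
    return result


stack = bytearray()
push_gosub(stack, 100, 7)
push_for(stack, bytes([0x40, 0x10, 0, 0, 0, 0]), bytes([0x40, 0x01, 0, 0, 0, 0]),
         variable_number=3, line_number=120, offset=4)
```

The 6-byte limit and step values in the example are placeholders; any six bytes work for the purposes of the sketch.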

When an expression is evaluated, the operators are put onto an operator stack and are pulled off one at a time and evaluated. The order in which the operators are put onto the stack can either be implied, in which case BASIC looks up the operator's precedence from a ROM table, or the order can be explicitly stated by the placement of parentheses.
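
The general idea of an operator stack driven by a precedence table can be illustrated with a small Python evaluator. This is a generic two-stack technique, not a reconstruction of BASIC's ROM routine, and the precedence values are invented for the example. It handles only unsigned numbers, the four arithmetic operators, and parentheses.

```python
import operator
import re

# Hypothetical precedence table; BASIC's real table lives in ROM.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def evaluate(expression: str) -> float:
    values: list[float] = []
    operators: list[str] = []

    def reduce_top() -> None:
        # Pull one operator off the stack and apply it to two values.
        op = operators.pop()
        right = values.pop()
        left = values.pop()
        values.append(APPLY[op](left, right))

    for token in re.findall(r"\d+\.?\d*|[-+*/()]", expression):
        if token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()  # discard the matching "("
        elif token in PRECEDENCE:
            # Implied order: finish stacked operators of equal or higher
            # precedence before stacking the new one.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                reduce_top()
            operators.append(token)
        else:
            values.append(float(token))
    while operators:
        reduce_top()
    return values[0]


implied = evaluate("2+3*4")     # multiplication first
explicit = evaluate("(2+3)*4")  # parentheses override the table
```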

Pressing the BREAK key at any time causes the operating system to set a flag to indicate this occurrence. BASIC checks this flag after each token is processed. If it finds it has been set, it stores the line number at which this occurred, prints out a "STOPPED AT LINE XXXX" message, clears the BREAK flag and waits for user input. At this point the user could type CONT and program execution would continue at the next line.

SYSTEM INTERACTION

BASIC communicates with the Operating System primarily through the use of I/O calls to the Central I/O Utility (CIO). Following is a list of user BASIC calls and the corresponding operating system IOCB (Input/Output Control Block) setups.

SAVE/LOAD: When a BASIC token program is saved to a device, two blocks of information are written. The first block consists of seven of the nine zero page pointers that BASIC uses to maintain the token file. These are LOMEM (80,81) through STARP (8C,8D). There is one change made to these pointers when they are written out: the value of LOMEM is subtracted from each of the 2-byte pointers, and these new values are written to the device. Thus the first two bytes written will be 0,0.

The second block of information written consists of the following token file sections: (1) The variable name table, (2) the variable value table, (3) the token program, and (4) the immediate mode line.

When this program is loaded into memory, BASIC looks at the OS variable MEMLO (2E7,2E8) and adds its value to each of the 2-byte zero page pointers as they are read from the device. These pointers are placed back on page zero and then the values of RUNSTK(8E,8F) and MEMTOP (90,91) are set to the value in STARP.

Next, 256 bytes are reserved in memory above the value of MEMLO to allocate space for the token output buffer. Then the token file information, consisting of the variable name table through the immediate mode line, is read in. This data is placed in memory immediately following the token output buffer.
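
The pointer arithmetic for SAVE and LOAD can be sketched in Python. The function names and all addresses in the example are made up; only the subtract-on-save and add-on-load rules come from the text.

```python
SAVED_POINTERS = ["LOMEM", "VNTP", "VNTD", "VVTP", "STMTAB", "STMCUR", "STARP"]


def pointers_for_save(pointers: dict[str, int]) -> list[int]:
    """Make the seven pointers relative by subtracting LOMEM from each."""
    base = pointers["LOMEM"]
    return [pointers[name] - base for name in SAVED_POINTERS]


def pointers_after_load(saved: list[int], memlo: int) -> dict[str, int]:
    """Rebase the saved values on MEMLO and derive RUNSTK and MEMTOP."""
    pointers = {name: value + memlo
                for name, value in zip(SAVED_POINTERS, saved)}
    # RUNSTK and MEMTOP are not saved; both start out equal to STARP.
    pointers["RUNSTK"] = pointers["MEMTOP"] = pointers["STARP"]
    return pointers


# Hypothetical pointer values at SAVE time.
in_memory = {"LOMEM": 0x0700, "VNTP": 0x0800, "VNTD": 0x0810, "VVTP": 0x0811,
             "STMTAB": 0x0831, "STMCUR": 0x0850, "STARP": 0x0860}
saved = pointers_for_save(in_memory)

# Hypothetical MEMLO on the machine that later loads the program.
reloaded = pointers_after_load(saved, memlo=0x1CFC)
```

Because only relative values are stored, the same saved file loads correctly whether MEMLO is low (no disk operating system present) or higher. The 256-byte gap between LOMEM and VNTP in the example corresponds to the token output buffer.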

IMPROVING PROGRAM PERFORMANCE

Program performance can be improved in two ways. First, the execution time can be decreased (the program will run faster), and second, the amount of space required can be decreased, allowing the program to use less RAM. To attain these two goals, the following lists can be used as guidelines. The methods of improvement in each list are arranged primarily in order of decreasing effectiveness. Therefore the method at the top of a list will have more impact than one at the bottom.

Speeding Up a BASIC Program

  1. Recode - Because BASIC is not a structured language, the code written in it tends to be inefficient. After many revisions it becomes even worse. Thus, the time spent to restructure the code is worthwhile.
  2. Check algorithm logic - Make sure that the code to execute a process is as efficient as possible.
  3. Put frequently called subroutines and FOR/NEXT loops at the start of the program - BASIC starts at the beginning of a program to look for a line number, so any line references near the end will take longer to reach.
  4. For frequently called operations within a loop use in-line code rather than subroutines - The program speed can be improved here since BASIC spends time adding and removing entries from the run time stack.
  5. Make the most frequently changing loop of a nested set the deepest - In this way, the run time stack will be altered the fewest number of times.
  6. Simplify floating point calculations within the loop - if a result is obtained by multiplying a constant by a counter, time could be saved by changing the operation to an add of a constant.
  7. Set up loops as multiple statements on one line - In this way the BASIC interpreter will not have to get the next line to continue the loop.
  8. Disable the screen display - If visual information is not important for a period of time, up to a 30 percent time savings can be made with a POKE 559,0.
  9. Use a faster graphics mode or a short display list - If a full screen display is not necessary then up to 25 percent time savings can be made.
  10. Use assembly code - Time savings can be made by encoding loops in assembler and using the USR function.

Saving Space In A BASIC Program

  1. Recode - As mentioned previously, restructuring the program will make it more efficient. It will also save space.
  2. Remove remarks - Remarks are stored as ATASCII data and merely take up space in the running program.
  3. Replace a constant used three times or more with a variable - BASIC allocates seven bytes for a constant but only one for a variable reference, so six bytes can be saved each time a constant is replaced with a variable assigned to that constant's value.
  4. Initialize variables with a read statement - A data statement is stored in ATASCII code, one byte per character, whereas an assignment statement requires seven bytes for one constant.
  5. Try to convert numbers used once and twice to operations of predefined variables - An example is to define Z1 to equal 1, Z2 to equal 2, and if the number 3 is required, replace it with the expression Z1 + Z2.
  6. Set frequently used line numbers (in GOSUB and GOTO) to predefined variables - If the line 100 is referenced 50 times, approximately 300 bytes can be saved by equating Z100 to 100 and referencing Z100.
  7. Keep the number of variables to a minimum - Each new variable entry requires 8 more bytes in the variable value table plus a few bytes for its name.
  8. Clean up the value and name tables - Variable entries are not deleted from the value and name tables even after all references to them are removed from the program. To delete the entries LIST the program to disk or cassette, type NEW, then ENTER the program.
  9. Keep variable names as short as possible - Each variable name is stored in the name table as ATASCII information. The shorter the names, the shorter the table.
  10. Replace text used repeatedly with strings - On screens with a lot of text, space can be saved by assigning a string to a commonly used set of characters.
  11. Initialize strings with assignment statements - An assignment of a string with data in quotes requires less space than a READ statement and a CHR$ function.
  12. Concatenate lines into multiple statements - Three bytes can be saved each time two lines are converted into two statements on one line.
  13. Replace once used subroutines with in-line code - The GOSUB and RETURN statements waste bytes if used only once.
  14. Replace numeric arrays with strings if the data values do not exceed 255 - Numeric array entries require six bytes each, whereas string elements only need one.
  15. Replace SETCOLOR statements with POKE commands - This will save 8 bytes.
  16. Use cursor control characters rather than POSITION statements - The POSITION statement requires 15 bytes for the X,Y parameters whereas the cursor editing characters are one byte each.
  17. Delete lines of code via program control - See the advanced programming techniques section.
  18. Modify the string/array pointer to load predefined data - By changing the value in STARP, string and array information can be saved.
  19. Small assembly routines can be stored in USR calls - For example X=USR(ADR("hhh*LVd"),16).
  20. Chain programs - An example would be an initialization routine that is run first and then loads and executes the main program.
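
The arithmetic behind guidelines 3, 6, and 7 can be checked with a short Python sketch. The byte costs are the ones quoted in the list; the function name is made up, and the cost of the statement that first assigns the variable is left as a parameter because the text does not give a figure for it.

```python
CONSTANT_BYTES = 7            # a numeric constant in a tokenized line
VARIABLE_REFERENCE_BYTES = 1  # a variable token
VALUE_TABLE_ENTRY_BYTES = 8   # one entry in the variable value table


def bytes_saved(uses: int, name_length: int, setup_bytes: int = 0) -> int:
    """Estimate the net saving from replacing a constant with a variable.

    `setup_bytes` is the size of the statement that initializes the
    variable; it defaults to 0 only because the text gives no figure.
    """
    per_use = CONSTANT_BYTES - VARIABLE_REFERENCE_BYTES
    overhead = VALUE_TABLE_ENTRY_BYTES + name_length + setup_bytes
    return uses * per_use - overhead


z100 = bytes_saved(uses=50, name_length=4)  # the Z100 example
once = bytes_saved(uses=1, name_length=1)   # negative: not worth it
```

For Z100 referenced 50 times the estimate is 288 bytes, in line with the "approximately 300" quoted above, while a constant used only once costs more as a variable than it saves.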

ADVANCED PROGRAMMING TECHNIQUES

An understanding of fundamentals of ATARI BASIC makes it possible to write some interesting applications. These can be strictly BASIC operations, or they can also involve features of the operating system.

Example 1 - String Initialization - This program will set all the bytes of a string of any length to the same value. BASIC copies the first byte of the source string into the first byte of the destination string, then the second, third, and so on. By making the destination string the second byte of the source, the same character can be stored throughout the entire string.
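
The effect of the overlapping copy can be simulated in Python. This models only the byte-by-byte, first-to-last copy order described above; it is not BASIC code, and the function name is hypothetical.

```python
def basic_string_assign(memory: bytearray, dest: int, src: int, count: int) -> None:
    """Copy `count` bytes one at a time, lowest address first."""
    for i in range(count):
        # Each byte read here may already have been written by an
        # earlier step, which is what makes the fill work.
        memory[dest + i] = memory[src + i]


length = 10
string_area = bytearray(length)  # the string's storage, initially zeros
string_area[0] = ord("*")        # set only the first byte
# Destination starts at the second byte of the source.
basic_string_assign(string_area, dest=1, src=0, count=length - 1)
filled = bytes(string_area)
```

After the copy every byte holds the first character, because each step reads the byte written by the step before it.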

Example 2 - Delete Lines Of Code - By using a feature of the operating system, a program can delete or modify lines of code within itself. The screen editor can be set to accept data from the screen without user input. Thus by first setting up the screen, positioning the cursor to the top, and then stopping the program, BASIC will be getting the commands that have been printed on the screen.

Example 3 - Player/Missile (P/M) Graphics With Strings - A fast way to move player/missile graphics data is shown in this example. A dimensioned string has its string/array area offset value changed to point to the P/M graphics area. Writing to this string with an assignment statement will now write data into the P/M area at assembly language rates.
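
The offset change amounts to simple pointer arithmetic on the string's value table entry, sketched here in Python. The function name and all addresses are hypothetical, the offset is assumed to be stored low byte first, and the sketch assumes the target area lies above the address in STARP.

```python
def retarget_string(entry: bytes, starp: int, target_address: int) -> bytes:
    """Return a copy of an 8-byte string entry pointing at `target_address`."""
    if len(entry) != 8 or entry[0] & 0xC0 != 0x80:
        raise ValueError("expected an 8-byte string entry")
    offset = target_address - starp  # offsets are relative to STARP
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("target must lie within 64K above STARP")
    # Bytes 3 and 4 of the entry hold the offset; the rest is unchanged.
    return entry[:2] + bytes([offset & 0xFF, offset >> 8]) + entry[4:]


# Hypothetical values: a dimensioned string of 128 bytes, STARP at 2000 hex,
# and player data at 3400 hex.
old_entry = bytes([0x81, 0x00, 0x10, 0x00, 0x80, 0x00, 0x80, 0x00])
new_entry = retarget_string(old_entry, starp=0x2000, target_address=0x3400)
```

In a BASIC program the two offset bytes would be changed in the variable value table itself, whose location is given by the VVTP pointer.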