Atari BASIC: A High-Level Language Translator The programming language which has become the de facto standard for the Atari Home Computer is the Atari 8K BASIC Cartridge, known simply as Atari BASIC. It was designed to serve the programming needs of hoth the computer novice and the experienced programmer who is interested in developing sophisticated applications programs. In order to meet such a wide range of programming needs, Atari BASIC was designed with some unique features. In this chapter we will introduce the concepts of high level language translators and examine the design features of Atari BASIC that allow it to satisfy such a wide variety of needs. Language Translators Atari BASIC is what is known as a high level language translator. A language, as we ordinarily think of it, is a system for communication. Most languages are constructed around a set of symbols and a set of rules for combining those symbols. The English language is a good example. The symbols are the words you see on this page. The rules that dictate how to combine these words are the patterns of English grammar. Without these patterns, communication would be very difficult, if not impossible: Out sentence this believe, of make don't this trying if sense you to! If we don't use the proper symbols, the results are also disastrous: @twu2 yeggopt gjsiem, keorw? In order to use a computer, we must somehow communicate with it. The only language that our machine really understands is that strange but logical sequence of ones and zeros known as machine language. In the case of the Atari, this is known as 6502 machine language. When the 6502 central processing unit (CPU) "sees" the sequence 01001000 in just the right place according to its rules of syntax, it knows that it should push the current contents of Chapter One the accumulator onto the CPU stack. (If you don't know what an "accumulator" or a "CPU stack" is' don't worry about it. For the discussion which follows, it is sufficient that you be aware of their existence.) Language translators are created to make it simpler for humans to communicate with computers. There are very few 6502 programmers, even among the most expert of them, who would recognize 01001000 as the push-the-accumulator instruction. There are more 6502 programmers, but still not very many, who would recognize the hexadecimal form of 01001000, $48, as the push-the-accumulator instruction. However, most, if not all, 6502 programmers will recognize the symbol PHA as the instruction which will cause the 6502 to push the accumulator. PHA, $48, and even 01001000, to some extent, are translations from the machine's language into a language that humans can understand more easily. We would like to be able to communicate to the computer in symbols like PHA; but if the machine is to understand us, we need a language translator to translate these symbols into machine language. The Debug Mode of Atari's Editor/Assembler cartridge, for example, can be used to translate the symbols $48 and PHA to the ones and zeros that the machine understands. The debugger can also translate the machine's ones and zeros to $48 and PHA. The assembler part of the Editor/Assembler cartridge can be used to translate entire groups of symbols like PHA to machine code. Assemblers An assembler - for example, the one contained in the Assembler/Editor cartridge - is a program which is used to translate symbols that a human can easily understand into the ones and zeros that the machine can understand. In order for the assembler to know what we want it to do, we must communicate with it by using a set of symbols arranged according to a set of rules. The assembler is a translator, and the language it understands is 6502 assembly language. The purpose of 6502 assembly language is to aid program authors in writing machine language code. The designers of the 6502 assembly language created a set of symbols and rules that matches 6502 machine language as closely as possible. This means that the assembler retains some of the Chapter One disadvantages of machine language. For instance, the process of adding two large numbers takes dozens of instructions in 6502 machine language. If human programmers had to code those dozens of instructions in the ones and zeros of machine language, there would be very few human programmers. But the process of adding two large numbers in 6502 assembly language also takes dozens of instructions. The assembly language instructions are easier for a programmer to read and remember, but they still have a One-to-one cor– respondence with the dozens of machine language instructions. The programming is easier, but the process remains the same. High Level Languages High level languages, like Atari BASIC, Atari PILOT, and Atari Pascal, are simpler for people to use because they more closely approximate human speech and thought patterns. However, the computer still understands only machine language. So the high level languages, while seeming simple to their users, are really much more complex in their internal operations than assembly language. Each high level language is designed to meet the specific need of some group of people. Atari Pascal is designed to implement the concept of structured programming. Atari PILOT is designed as a teaching tool. Atari BASIC is designed to serve both the needs of the novice who is just learning to program a computer and the needs of the expert programmer who is writing a sophisticated application program, but wants the program to be accessible to a large number of users. Each of these languages uses a different set of symbols and symbol-combining rules. But all these language translators were themselves written in assembly language. Language Translation Methods There are two different methods of performing language translation - compilation and interpretation. Languages which translate via interpretation are called interpreters. Languages which translate via compilation are called compilers. Interpreters examine the program source text and simulate the operations desired. Compilers translate the program source text into machine language for direct machine execution. Chapter One The compilation method tends to produce faster, more efficient programs than does the interpretation method. However, the interpretation method can make programming easier. Problems with the Compiler Method The compiler user first creates a program source file on a disk, using a text editing program. Then the compiler carefully examines the source program text and generates the machine language as required. Finally, the machine language code is loaded and executed. While this three-step process sounds fairly simple, it has several serious 'gotchas." Language translators are very particular about their symbols and symbol-combining rules. If a symbol is misspelled, if the wrong symbol is used, or if the symbol is not in exactly the right place, the language translator will reject it. Since a compiler examines the enure program in one gulp, one misplaced symbol can prevent the compiler from understanding any of the rest of the program - even though the rest of the program does not violate any rules! The result is that the user often has to make several trips between the text editor and the compiler before the compiler successfully generates a machine language program. But this does not guarantee that the program will work. If the programmer is very good or very lucky, the program will execute perfectly the very first time. Usually, however, the user must debug the program. This nearly always involves changing the source program, usually many times. Each change in the source program sends the user back to step one: after the text editor changes the program, the compiler still has to agree that the changes are valid, and then the machine code version must be tested again. This process can be repeated dozens of times if the program is very complex. Faster Programming or Faster Programs? The interpretation method of language translation avoids many of these problems. Instead of translating the source code into machine language during a separate compiling step, the interpreter does all the translation while the program is running. This means that whenever you want to test the program you're writing, you merely have to tell the interpreter to run it. If things don't work right; stop the program, make a few changes, and run the program again at once. Chapter One You must pay a few penalties for the convenience of using the interpreter's interactive process, but you can generally develop a complex program much more quickly than the compiler user can. However, an interpreter is similar to a compiler in that the source code fed to the interpreter must conform to the rules of the language. The difference between a compiler and an interpreter is that a compiler has to verify the symbols and symbol-combining rules only once - when the program is compiled. No evaluation goes on when the program is running. The interpreter, however, must verify the symbols and symbol-combining rules every time it attempts to run the program. If two identical programs are written, one for a compiler and one for an interpreter, the compiled program will generally execute at least ten to twenty times faster than the interpreted program. Pre-compiling Interpreter Atari BASIC has been incorrectly called an interpreter. It does have many of the advantages and features of an interpretive language translator, but it also has some of the useful features of a compiler. A more accurate term for Atari's BASIC Language Translator is pre-compiIing interpreter. Atari BASIC, like an interpreter, has a text editor built into it. When the user enters a source line, though, the line is not stored in text form, but is translated into an intermediate code, a set of symbols called tokens. The program is stored by the editor in token form as each program line is enterred. Syntax and symbol errors are weeded out at that time. Then, when you run the program, these tokens are examined and their functions simulated; but hecause much of the evaluation has already been done, the execution of an Atari BASIC program is faster than-that of a pure interpreter. Yet Atari BASIC's program-building process is much simpler than that of a compiler. Atari BASIC has advantages over compilers and interpreters alike. With Atari BASIC, every time you enter a line it is verified for language correctness. You don't have to wait until compilation; you don't even have to wait until a test run. When you type RUN you already know there are no syntax errors in your program. Chapter One Internal Design Overview Atari BASIC is divided into two major functional areas: the Program Constructor and the Program Executor. The Program Constructor is used when you enter and edit a BASIC program. The source line pre-compiler, also part of the Program Constructor, translates your BASIC program source text lines into tokenized lines. The Program Executor is used to execute the tokenized program - when you type RUN, the Program Executor takes over. Both the Program Constructor and the Program Executor are designed to use data tables. Some of these tables are already contained in BASIC's ROM (read-only memory). Others are constructed by BASIC in the user RAM (random- access memory). Understanding these various tables is an important key to understanding the design of Atari BASIC. Tokens In Atari BASIC, tokens are the intermediate code into which the source text is translated. They represent source-language symbols that come in various lengths - some as long as 100 characters (a long variable name) and others as short as one character ("+" or "-"). Every token, however, is exactly one eight-bit byte in length. Since most BASIC Language Symbols are more than one character long, the representation of a multi-character BASIC Language Symbol with a single-byte token can mean a considerable saving of program storage space. A single-byte token symbol is also easier for the Program Executor to recognize than a multi-character symbol, since it can be evaluated by machine language routines much more quickly. The SEARCH routine - 76 bytes long - located at $A462 isa good example of how much assembly language it takes to recognize a multi-character symbol. On the other hand, the two instructions located at $AB42 are enough to
<-Start Chapter 02->