.so /usr/lib/tmac/tmac.e
.TH ASM_OPS 5 "1 Nov 1986"
.SH NAME
asm_ops \- generic assembler op-code tables
.SH DESCRIPTION
The generic assembler
.b asm
can be made to assemble code for a number of different microprocessors.
At the time of this writing, codes have been developed for the Intel 8085,
the Motorola 6803, and the 6502 (Commodore 64).
This manual page describes the format of the ops.h file,
which contains the processor-specific parts of the assembler.
The structures described below are defined in the file asm.h.
The ops.h file consists of a series of structure initializations
that generate tables in the assembler when it is compiled.
All lines are of the form:
.nf
	table-type name = {value, ....., value};
.fi
where the names are arbitrary (within the C compiler's naming restrictions)
and the values may be integers, Flags (i.e., boolean-valued integers),
strings (actually pointers to strings),
pointers to other structures within the file,
or pointers to functions in the assembler itself.
The type Word refers to an unsigned byte,
and Memad refers to a type that can hold any possible memory address
on the target machine.
The first structure is opdclass, which defines the kinds of operands
that may appear in an instruction and their encoding
in the corresponding instruction word:
.nf
typedef struct opdclassitem {	/* This defines an instruction field */
	int length;		/* length in bits of field */
	Flag signed;		/* else unsigned */
	Flag byteswapped;	/* data has bytes backwards */
	Flag relative;		/* field is relative to $ */
	int offset;		/* fixed value added to field */
} opdclass;
.fi
An operand's
.b length
is the number of bits allocated for it in the instruction.
If that number is eight and the next flag,
.b signed ,
is set, then only numbers from -128 through +127
(two's complement assumed) can fit in the field;
if
.b signed
is clear, the range becomes 0 through 255.
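The signed and unsigned ranges above follow directly from the field length.
The C sketch below computes them; field_range is a hypothetical helper,
not code from asm.c, and it assumes two's complement arithmetic
and fields of at most 30 bits held in a long.
.nf
```c
#include <assert.h>

/* Hypothetical helper (not taken from asm.c): store in *lo and *hi
   the inclusive range of values that fit in a field of length bits. */
static void field_range(int length, int is_signed, long *lo, long *hi)
{
	/* keep 1L << length within a 32-bit long */
	assert(length >= 1 && length <= 30);
	if (is_signed) {
		*lo = -(1L << (length - 1));	/* two's complement minimum */
		*hi = (1L << (length - 1)) - 1;
	} else {
		*lo = 0;
		*hi = (1L << length) - 1;
	}
}
```
.fi
For example, field_range(8, 1, &lo, &hi) yields -128 and 127,
while field_range(8, 0, &lo, &hi) yields 0 and 255.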
The
.b byteswapped
flag is set if the bytes of a multibyte field are to be loaded
within a 2-byte word in right-to-left order,
rather than the more conventional left-to-right.
The
.b relative
flag is set if the value of the location counter must be subtracted
from the value before it is placed in the field.
Finally,
.b offset
is an integer value added to the value of the field before it is inserted.
As an example, an entry for the 6805 reads:
.nf
	opdclass o_rmem = { 8, YES, NO , YES, -2};
.fi
This defines a field that is used in relative-mode instructions.
The field is eight bits long, is signed, and is relative to the current lc.
In addition, its value is decremented by two (the offset is -2).
Given all this, the legal range of a value to be placed in this field
is (lc)-126 through (lc)+129 inclusive,
where (lc) is the current value of the location counter
(which points to the first byte of the current instruction).
The second set of structures, insclass, defines an instruction type.
Every generated instruction must fall within one of these types.
They define the instruction structure (as a collection of fields)
and the written form of its invocation:
.nf
typedef struct insclassitem {	/* This defines an instruction type */
	int length;		/* instruction length in bytes */
	int mopds;		/* number of operands expected */
	opdclass *type[MAXOPDS];	/* each operand's field type */
	int offset[MAXOPDS];	/* each operand's bit offset,
				   from right end of first byte */
} insclass;
.fi
The
.b length
of an instruction type is the number of bytes in the instruction,
including all the fields.
The number of operands expected,
.b mopds ,
may be 0, 1, or 2
(making this larger would involve changes to asm.h and asm.c);
MAXOPDS enforces the current limit of two operands.
The members of the array
.b type
are pointers to the appropriate opdclass entries defined above.
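The o_rmem arithmetic can be checked with a small sketch of how a value
might be fitted into an opdclass field:
subtract the location counter if the field is relative,
add the offset, then test the range.
This is an illustration under assumptions, not the assembler's own routine:
fit_field is hypothetical, Flag is assumed to be an int with YES and NO
equal to 1 and 0, the signed member is renamed is_signed because signed
is a reserved word in ANSI C, and byte swapping is ignored.
.nf
```c
#include <assert.h>

typedef int Flag;		/* assumption: Flag is a plain int */
#define YES 1
#define NO  0

typedef struct opdclassitem {
	int length;		/* length in bits of field */
	Flag is_signed;		/* renamed: signed is reserved in ANSI C */
	Flag byteswapped;	/* data has bytes backwards (ignored here) */
	Flag relative;		/* field is relative to $ */
	int offset;		/* fixed value added to field */
} opdclass;

static opdclass o_rmem = { 8, YES, NO, YES, -2 };

/* Hypothetical: compute the field contents for value at location
   counter lc.  Return 1 and store the raw (possibly negative) field
   value in *out if it fits, else return 0. */
static int fit_field(const opdclass *oc, long value, long lc, long *out)
{
	long v = value, lo, hi;

	assert(oc->length >= 1 && oc->length <= 30);
	if (oc->relative)
		v -= lc;		/* make it relative to $ */
	v += oc->offset;		/* then apply the fixed offset */
	lo = oc->is_signed ? -(1L << (oc->length - 1)) : 0;
	hi = oc->is_signed ? (1L << (oc->length - 1)) - 1
	                   : (1L << oc->length) - 1;
	if (v < lo || v > hi)
		return 0;
	*out = v;
	return 1;
}
```
.fi
With lc at 0x1000, a target of 0x1000-126 gives a field value of -128
and 0x1000+129 gives 127; one step beyond either end is rejected,
matching the (lc)-126 through (lc)+129 range stated above.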
When the instruction is scanned, the first operand must fit the field
described by the structure pointed to by xxx.type[0],
and the second by xxx.type[1].
The array
.b offset
defines the amount of shifting needed to align each field properly
in the instruction.
An offset of zero states that the field's rightmost bit is placed
in the rightmost bit of the instruction's first byte;
a negative offset shifts the value left that many bits,
and a positive offset shifts it right (toward the later bytes of the instruction).
An example, again from the 6805, shows the format of a relative instruction:
.nf
	insclass i_rel = {2, 1, &o_rmem, &o_none, 8, 0};
.fi
Such an instruction is two bytes long and contains one operand.
This operand is a relative memory operand (from the example above),
and it is shifted right 8 bits,
which puts it exactly in the second byte of the instruction.
The second operand must have an address even though it is not used;
o_none fulfills this requirement.
All this leads, of course, to the definition of individual instructions.
These are defined in the opdef structures:
.nf
typedef struct opdefitem {	/* Defines an instruction */
	char *name;		/* instruction mnemonic */
	insclass *class;	/* instruction type */
	Word mask;		/* mask for first byte of instruction */
	int (*action) ();	/* action routine for assembly */
} opdef;
.fi
Each instruction has a
.b name
that is recognized during source scanning.
It also has a pointer to the insclass
.b class
that defines both what the scanner should expect
and how the finished instruction looks.
The
.b mask
is a value to be or'ed with the assembled operand fields
to complete the instruction;
it normally contains the instruction-unique bits known as the opcode.
Finally,
.b action
names the routine that assembles the instruction.
For all normal instructions, the routine is
.u geninstr ,
which generates all normal instructions from the table data.
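The offset convention is easiest to see in code.
The sketch below packs a mask and operand fields into the bytes of an instruction;
pack_instr is a hypothetical stand-in for the packing part of a routine
like geninstr, not its actual implementation,
and it takes plain arrays of field values, widths (from each opdclass length)
and offsets (from the insclass) instead of the real structures.
.nf
```c
#include <assert.h>

typedef unsigned char Word;

/* Hypothetical sketch of the packing step.  The instruction is built
   in an unsigned long with the first byte most significant, so an
   offset of 0 puts a field's rightmost bit at the rightmost bit of
   the first byte, a negative offset moves it left, and a positive
   offset moves it right, toward later bytes. */
static void pack_instr(int length, Word mask, int nopds,
                       const long field[], const int width[],
                       const int offset[], Word out[])
{
	unsigned long word;
	int i, shift;

	assert(length >= 1 && length <= 4);	/* fits a 32-bit long */
	word = (unsigned long)mask << (8 * (length - 1));
	for (i = 0; i < nopds; i++) {
		shift = 8 * (length - 1) - offset[i];
		assert(shift >= 0 && width[i] >= 1 && width[i] < 32);
		/* keep only width[i] bits, so negative values wrap */
		word |= ((unsigned long)field[i] & ((1UL << width[i]) - 1))
			<< shift;
	}
	for (i = 0; i < length; i++)	/* emit first byte first */
		out[i] = (Word)((word >> (8 * (length - 1 - i))) & 0xff);
}
```
.fi
For the i_rel branch above, a bra to its own address has a field value
of -2 (lc - lc - 2), so pack_instr(2, 0x20, 1, ...) with width 8 and offset 8
produces the bytes 20 fe.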
This field exists primarily so that pseudo-ops can also use the structure.
The opdef table is defined in a slightly different way than the other tables.
The entries in the other tables are all referenced by pointers,
so their order is of no consequence.
The opdef table (named optab), on the other hand,
must be searched by the assembler.
Therefore, each entry is a member of the array optab, not a separate statement.
In addition, the entries must be arranged in alphabetical order by
.b name .
An example of a defined instruction for the 6803 is
.nf
opdef optab [] = {
	....
	"bra" , &i_rel , 0x20, geninstr,
	....
};
.fi
The unconditional branch instruction has its format defined
by the i_rel class of instruction formats
(which, as shown above, defines a two-byte instruction with one operand).
The mask for the first byte of the instruction (the opcode for a branch) is hex 20,
and the instruction is generated by the geninstr routine.
Following the definition of optab is the statement
.nf
	#define oplen (sizeof(optab)/sizeof(opdef))
.fi
which tells asm.c the number of entries in optab.
What of the microprocessors that have two different instructions
that can perform the same operation,
such as the 6803, which can load a register from a memory location
with either a two-byte direct instruction or a three-byte extended instruction?
The native assembler can generate the shortest instruction
that achieves the effect;
so can the generic assembler, under some circumstances.
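Because optab is kept in alphabetical order,
a mnemonic can be found with a binary search.
The sketch below uses the standard C library bsearch together with oplen;
the three entries and the oplookup and opcmp names are hypothetical,
and the class pointers are left null since no insclass entries are defined here.
The assembler's actual search code may differ.
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Word;
typedef struct insclassitem insclass;	/* layout not needed here */

typedef struct opdefitem {	/* Defines an instruction */
	char *name;		/* instruction mnemonic */
	insclass *class;	/* instruction type */
	Word mask;		/* mask for first byte of instruction */
	int (*action)();	/* action routine for assembly */
} opdef;

/* Stand-in for the assembler's routine; does nothing here. */
static int geninstr() { return 0; }

/* Hypothetical 6803 entries, kept in alphabetical order by name. */
static opdef optab[] = {
	{ "bcc", 0, 0x24, geninstr },
	{ "bra", 0, 0x20, geninstr },
	{ "nop", 0, 0x01, geninstr },
};

#define oplen (sizeof(optab)/sizeof(opdef))

/* Compare a mnemonic (the key) with an optab entry for bsearch. */
static int opcmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, ((const opdef *)elem)->name);
}

/* Binary search; valid only because optab is sorted by name. */
static opdef *oplookup(const char *name)
{
	assert(name != NULL);
	return bsearch(name, optab, oplen, sizeof(opdef), opcmp);
}
```
.fi
oplookup("bra") returns the entry whose mask is 0x20,
and an unknown mnemonic returns a null pointer.
An unsorted table would make bsearch unreliable,
which is why the alphabetical ordering matters.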
The third set of structures, called choicedef, is used in this case:
.nf
typedef struct chcitem {	/* Defines the alternatives for instr choices */
	char *rname;		/* restrictive mnemonic */
	char *nname;		/* non-restrictive mnemonic */
	int field;		/* operand that is restricted */
	int lorange, hirange;	/* range of restriction inclusive */
	Flag relative;		/* to current lc, else absolute value */
} choicedef;
.fi
Any choicedef that exists
(there may be none, if the microprocessor has no such overlapping instructions)
describes the tradeoff to be made.
The
.b rname
is the mnemonic that may be used in the restrictive case of the choice
(i.e., the more desirable one, usually leading to a smaller instruction).
The
.b nname
is the mnemonic to be used otherwise.
Presumably the nname form covers every possible value,
so it may be substituted in all cases to achieve the desired result.
The field
.b field
is either one or two, describing which operand the choice hinges on.
The
.b lorange
and
.b hirange
values are the inclusive limits within which that operand's value must fall
in order to qualify for the restrictive choice.
Finally, the
.b relative
flag states whether or not the ranges are relative to the current location counter.
(At present, the relative flag is not implemented.)
The infamous example:
.nf
	"add" , (insclass *)&c_add, 0x00, choiceinstr,
.fi
This entry in the optab table defines a pseudo-instruction called add.
It may be expanded as the instruction adds if conditions are right,
or as addz in any case.
Instead of pointing to an instruction class,
the second entry in the structure is the address of a choice structure,
cast to an insclass pointer to keep the C compiler and lint happy.
The mask is defaulted (its value is not used),
and the generating routine is changed to choiceinstr, which handles the choice.
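The decision choiceinstr has to make can be sketched as a range test
on the deciding operand, using the values of the c_add entry listed below.
This is an illustration under assumptions, not the real routine:
choose is a hypothetical name, Flag is assumed to be an int,
the known argument stands for whether the operand was fully backward-defined
and in the same segment on the first pass,
and the relative flag is ignored, since the assembler does not implement it either.
.nf
```c
#include <assert.h>

typedef int Flag;		/* assumption: Flag is a plain int */
#define YES 1
#define NO  0

typedef struct chcitem {	/* Defines the alternatives for instr choices */
	char *rname;		/* restrictive mnemonic */
	char *nname;		/* non-restrictive mnemonic */
	int field;		/* operand that is restricted */
	int lorange, hirange;	/* range of restriction inclusive */
	Flag relative;		/* not implemented; ignored below */
} choicedef;

static choicedef c_add = { "adds", "addz", 2, 0, 0xff, NO };

/* Hypothetical: pick the mnemonic for a choice instruction.  value is
   the deciding operand; known is nonzero only if it could be fully
   evaluated on the first pass. */
static const char *choose(const choicedef *c, long value, int known)
{
	assert(c->field == 1 || c->field == 2);
	if (known && value >= c->lorange && value <= c->hirange)
		return c->rname;	/* the shorter, restrictive form */
	return c->nname;		/* always safe */
}
```
.fi
An operand of 0x40 or 0xff selects adds;
0x100, a negative value, or an operand that could not be evaluated selects addz.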
The choicedef entry that the add entry points to is:
.nf
	choicedef c_add = {"adds" , "addz" , 2, 0, 0xff, NO};
.fi
This defines a choice of either the adds instruction or the addz instruction
to add a memory location to an accumulator.
The second field, the memory reference, is the key:
if the reference is greater than or equal to zero
and less than or equal to hex ff (decimal 255),
then adds can be used; otherwise only addz is allowed.
As I said above, the choice mechanism is restricted in the decisions
it can make when attempting to use the shortest instruction.
Since the choices are all made during the first pass,
the expression in the deciding field must be completely backward-defined.
That means that every symbol in the expression must be a constant,
be predefined (see below),
or have been defined in the code physically before the line where the choice resides.
In addition, all symbols in the expression must be in the same segment
as the code being generated.
This is not to say that using a choice instruction containing a forward reference
is an error; rather, the assembler, lacking the data required to make the decision,
will opt for the "otherwise" form of the instruction,
even when the other form could have been used.
As Captain Kirk says, "C'est la vie."
The last set of entries in the ops.h file is a group of predefined symbols
that the assembler "knows" about for all assemblies on this microprocessor.
An example of these definitions is:
.nf
symbol predef[] = {
	{"ra" , 0x0, &o_reg , (struct seg *)0 },
	{"rb" , 0x1, &o_reg , (struct seg *)0 },
	{"eq" , 0x7, &o_cond , (struct seg *)0 },
	{"nc" , 0x4, &o_cond , (struct seg *)0 },
	{"" , 0x0, &o_none , (struct seg *)0 },
};
.fi
These predefine the symbols ra, rb, eq, and nc for use in instructions
that reference register a, register b,
and the branch conditions eq and nc (no carry).
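A scan of predef that stops at the empty-name entry
might look like the sketch below.
The symbol layout (name, value, operand type, segment) is inferred
from the initializers above and is hypothetical, as is find_predef;
the operand type pointers are left null because o_reg and o_cond
are not defined here.
.nf
```c
#include <assert.h>
#include <string.h>

struct seg;				/* segment details not needed */
typedef struct opdclassitem opdclass;	/* nor operand class details */

typedef struct symbolitem {		/* hypothetical layout */
	char *name;
	long value;
	opdclass *type;			/* operand type (unused) */
	struct seg *segment;		/* null: the default segment */
} symbol;

static symbol predef[] = {
	{ "ra", 0x0, 0, 0 },
	{ "rb", 0x1, 0, 0 },
	{ "eq", 0x7, 0, 0 },
	{ "nc", 0x4, 0, 0 },
	{ "",   0x0, 0, 0 },		/* empty name stops the scan */
};

/* Hypothetical lookup: linear scan up to the empty-name entry. */
static symbol *find_predef(const char *name)
{
	symbol *s;

	assert(name != NULL);
	for (s = predef; s->name[0] != 0; s++)
		if (strcmp(s->name, name) == 0)
			return s;
	return 0;
}
```
.fi
find_predef("eq") returns the entry with value 7;
a name not in the table returns a null pointer.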
Each is given a value, points to an operand type (which is not currently used),
and is defined in the null segment
(that is, the only predefined, default segment).
Note the null entry at the end of the table.
Given the above description, it should be possible to define an ops.h file
that generates an assembler for nearly any machine.
This scheme should work for almost any 8- or 16-bit microprocessor,
although it has only been used with 8-bit ones.
Limits are imposed by the use of a long variable for expression evaluation
and for instruction compilation,
and probably by other things that will become apparent
when the assembler is ported to some weird machine.
Choices of instruction format are arbitrary, of course.
I use a branch-conditional instruction with the condition code as an operand
to reduce the size of the optab (simple laziness);
it would be easy to have individual branch instructions instead.
So it goes.