.so /usr/lib/tmac/tmac.e
.TH ASM_OPS 5 "1 Nov 1986"
.SH NAME
asm_ops \- generic assembler op-code tables
.SH DESCRIPTION
The generic assembler
.b asm
can be made to assemble code for a number of different microprocessors. At
the time of this writing, codes have been developed for the Intel 8085,
the Motorola 6803, and the 6502 (Commodore 64). This manual page will
describe the format of the ops.h file, which contains the processor-specific
parts of the assembler. The structures described below are defined in the
file asm.h.

The ops.h file consists of a series of structure initializations that
generate tables in the assembler when it is compiled. All lines are of the
form:

	table-type name = {value, ....., value};

where the names are arbitrary (within the C compiler's naming restrictions)
and the values may be integers, Flags (i.e., boolean-valued integers),
strings (actually pointers to strings), pointers to other structures
within the file, and pointers to functions in the assembler itself. The
type Word refers to an unsigned byte and Memad refers to a type holding any
possible memory address for the target machine.

The first structure is opdclass, which defines the kinds of operands that
may appear in an instruction, and their encoding in the corresponding
instruction word:

.nf
typedef struct opdclassitem {   /* This defines an instruction field */
	int length;                     /* length in bits of field */
	Flag signed;                    /* else unsigned */
	Flag byteswapped;               /* data has bytes backwards */
	Flag relative;                  /* field is relative to $ */
	int offset;                     /* fixed value added to field */
} opdclass;
.fi

An operand's
.b length
refers to the number of bits that are allocated for it
in the instruction field. If that number is eight, then only numbers from
0 through 255 can fit in the field. If the
next flag,
.b signed ,
is set, then the range becomes \-128 through +127 (two's complement
assumed), in this example. The
.b byteswapped
flag is set if the bytes (in a multibyte field)
are to be loaded within a 2 byte word in right-to-left order, rather than
the more conventional left-to-right. The
.b relative
flag is set if the value
to be placed in the field must first be decremented by the value of the
location counter before insertion. Finally,
.b offset
is an integer value
to be added to the value of the field before it is inserted. As an
example, an entry for the 6805 reads:

opdclass o_rmem  = { 8, YES, NO , YES, -2};

This defines a field that is used in relative-mode instructions. The field
is eight bits long, is signed, and is relative to the current lc. In
addition, it is expected to be decremented by two. Given all this, the
legal range of a value to be placed in this field must be from (lc)-126
through (lc)+129 inclusive, where (lc) is the current value of the
location counter (which points to the first byte of the current
instruction).
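
The range arithmetic above can be sketched in C. This is only an
illustration under stated assumptions, not code from asm.c: the helper
fits_field is hypothetical, the definitions of Flag, YES and NO are
guesses at what asm.h provides, and the member signed is renamed
is_signed because signed is a reserved word in ANSI C.

.nf
```c
#include <assert.h>

typedef int Flag;                   /* assumed: boolean-valued integer */
#define YES 1
#define NO  0

/* Mirrors opdclass; "signed" is renamed is_signed for ANSI C. */
typedef struct opdclassitem {
	int length;                     /* length in bits of field */
	Flag is_signed;                 /* else unsigned */
	Flag byteswapped;               /* data has bytes backwards */
	Flag relative;                  /* field is relative to $ */
	int offset;                     /* fixed value added to field */
} opdclass;

/* Hypothetical helper: apply the relative and offset adjustments to
   value, then report whether the result fits the field.  On success
   the adjusted value is stored through out. */
int fits_field(const opdclass *c, long value, long lc, long *out)
{
	long lo, hi;

	assert(c->length > 0 && c->length < 31);
	if (c->relative)
		value -= lc;                /* decrement by location counter */
	value += c->offset;             /* then add the fixed offset */
	if (c->is_signed) {
		lo = -(1L << (c->length - 1));
		hi = (1L << (c->length - 1)) - 1;
	} else {
		lo = 0;
		hi = (1L << c->length) - 1;
	}
	if (value < lo || value > hi)
		return 0;
	*out = value;
	return 1;
}
```
.fi

With o_rmem initialized as { 8, YES, NO, YES, -2 }, this sketch accepts
targets from (lc)-126 through (lc)+129 and rejects anything outside
that range.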

The second "set" of structures, insclass, defines an instruction type.
Every generated instruction must fall within one of these types.  They
define the instruction structure (as a collection of fields) and the
written form of its invocation:

.nf
typedef struct insclassitem {   /* This defines an instruction type */
	int length;                     /* instruction length in bytes */
	int mopds;                      /* number of operands expected */
	opdclass *type[MAXOPDS];        /* each operand's field type */
	int offset[MAXOPDS];            /* each operand's bit offset,
					   from right end of first byte */
} insclass;
.fi

The
.b length
of an instruction type is the number of bytes in the
instruction, including all the fields.  The number of operands expected
.b mopds
may be 0, 1, or 2 (making this larger would involve changes to asm.h and
asm.c). MAXOPDS sets the current limit of two operands.  The
members of the array
.b type
are pointers to the appropriate opdclass defined
above.  When the instruction is scanned, the first operand must fit the
field described in the structure pointed to by xxx.type[0], the second by
xxx.type[1]. The array
.b offset
defines the amount of shifting to be done to
properly align the field in the instruction.  An offset of zero states
that the field's rightmost bit should be placed in the rightmost bit of
the instruction's first byte; a negative offset requires the value to be
shifted left that many bits, and a positive value must be shifted right.
An example, again from the 6805, shows the format of a relative
instruction:

insclass i_rel   = {2, 1, &o_rmem, &o_none,  8, 0};

Such an instruction is two bytes long, and contains one operand. This
operand is a relative memory operand (from the example above), and it must
be shifted to the right 8 bits (which puts it in the second byte of the
instruction exactly). The second operand must have an address even though
it is not used; o_none fills this requirement.
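
The shifting rule can be sketched as follows. The helper place_field is
hypothetical and is not taken from asm.c; it assumes the instruction is
held as an array of Word with the first byte at index zero, and it
ignores the byteswapped flag.

.nf
```c
#include <assert.h>

typedef unsigned char Word;         /* assumed: an unsigned byte */

/* Hypothetical helper: OR an nbits-wide value into an instruction of
   len bytes.  The offset is counted from the rightmost bit of the
   first byte; positive shifts right, negative shifts left. */
void place_field(Word *ins, int len, unsigned long value,
		 int nbits, int offset)
{
	/* Number the bits of the whole instruction from the right end
	   of the last byte; low is where the field's rightmost bit
	   lands. */
	int low = 8 * (len - 1) - offset;
	int i;

	assert(low >= 0 && low + nbits <= 8 * len);
	for (i = 0; i < nbits; i++) {
		int bit = low + i;

		if ((value >> i) & 1UL)
			ins[len - 1 - bit / 8] |= (Word)(1u << (bit % 8));
	}
}
```
.fi

For i_rel, a two byte buffer whose first byte already holds the opcode
receives the eight bit operand entirely in its second byte when the
offset is 8.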

All this is leading, of course, to the definition of individual
instructions. These are defined in the opdef structures:

.nf
typedef struct opdefitem {      /* Defines an instruction */
	char *name;                     /* instruction mnemonic */
	insclass *class;                /* instruction type */
	Word mask;                      /* mask for first byte of instruction */
	int (*action) ();               /* action routine for assembly */
} opdef;
.fi

Each instruction has a
.b name
that is recognized during the source code
scanning. It also has a pointer to the insclass
.b class
that defines both what the
scanner should expect and how the finished instruction looks. The
.b mask
is a value to be or'ed in with the assembled operand fields to complete the
instruction. It normally contains the instruction-unique bits known as the
opcode. Finally, the routine
.b action
defined to assemble the instruction
must be given. For all normal instructions, the routine is
.u geninstr ,
which generates all normal instructions from the table data. This field is
defined primarily so that pseudo-ops can also use the structure.

Now, the opdef table is defined in a slightly different way than the other
tables. The entries in the other tables are all referenced by pointers,
so their order is of no consequence.  The opdef table (named optab), on
the other hand, must be searched by the assembler. Therefore, each entry
is a member of the array optab, and not a separate statement. In addition,
the entries are all arranged in alphabetical order by
.b name .

An example of a defined instruction for the 6803 is

.nf
opdef optab [] = {
	....
	"bra"    , &i_rel  , 0x20, geninstr,
	....
};
.fi

The unconditional branch instruction has its format defined by the i_rel
class of instruction formats (which, as shown above, defines a two byte
instruction with one operand, etc.).  The mask for the first byte of the
instruction (the opcode for a branch) is hex 20. It is generated by the
geninstr routine.
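
Because optab is kept in alphabetical order, one plausible way to
search it is a binary search. The sketch below uses the standard
library routine bsearch; whether asm.c searches this way is an
assumption. The function findop, the stand-in geninstr, and the
mnemonics aaa and zzz are made up for illustration; only the bra entry
comes from the example above.

.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char Word;
typedef struct insclassitem insclass;   /* details not needed here */

typedef struct opdefitem {
	char *name;                     /* instruction mnemonic */
	insclass *class;                /* instruction type */
	Word mask;                      /* mask for first byte */
	int (*action)(void);            /* action routine */
} opdef;

static int geninstr(void) { return 0; }  /* stand-in only */

/* Entries must stay in alphabetical order by name. */
static opdef optab[] = {
	{"aaa", 0, 0x01, geninstr},     /* made-up entry */
	{"bra", 0, 0x20, geninstr},
	{"zzz", 0, 0x02, geninstr},     /* made-up entry */
};
#define oplen (sizeof(optab) / sizeof(opdef))

/* Compare a mnemonic string against one table entry. */
static int cmpname(const void *key, const void *elem)
{
	return strcmp((const char *)key, ((const opdef *)elem)->name);
}

/* Hypothetical lookup: returns the entry, or a null pointer. */
opdef *findop(const char *name)
{
	assert(name != NULL);
	return bsearch(name, optab, oplen, sizeof(opdef), cmpname);
}
```
.fi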

Following the definition of optab is the statement

#define oplen sizeof(optab)/sizeof(opdef)

which reveals the size of the optab to asm.c.

What of the microprocessors that have two different instructions that may
be used to perform an operation, such as the 6803, which can load a register
from a memory location with either a two byte direct instruction or a
three byte extended instruction? The native assembler can generate the
shortest instruction that will achieve the effect; so can the generic
assembler, under some circumstances. A further set of structures, called
choicedef, is used in this case:

.nf
typedef struct chcitem {        /* Defines the alternatives for instr choices */
	char *rname;                    /* restrictive mnemonic */
	char *nname;                    /* non-restrictive mnemonic */
	int field;                      /* operand that is restricted */
	int lorange, hirange;           /* range of restriction inclusive */
	Flag relative;                  /* to current lc, else absolute value */
} choicedef;
.fi

Any choicedef that exists (there may be none, if the microprocessor has no
such overlapping instructions) describes the tradeoff to be made. The
.b rname
is the mnemonic that may be used in the restrictive case of the
choice (i.e., the one that is more desirable, usually leading to a
smaller instruction).
The
.b nname
mnemonic is used otherwise.
Presumably, the non-restrictive form covers every case that the
restrictive form does,
so that the nname mnemonic may be substituted in all cases to achieve the
desired result. The field
.b field
is either one or two, describing which
field the choice hinges on. The
.b lorange
and
.b hirange
values are the
inclusive limits of the values that the field must fall within in order to
qualify for the restrictive choice. Finally, the
.b relative
flag states whether
the ranges are relative to the current location counter. (At
this point, the relative flag is not implemented.)

The infamous example:

	"add"    , (insclass *)&c_add, 0x00, choiceinstr,

This entry in the optab table defines a pseudo-instruction called add. It
may be expanded as the instruction adds, if conditions are right, or as
addz in any case. Instead of pointing to an instruction class, the second
entry in the structure is the address of a choice structure, which is cast
as an insclass pointer to keep C and lint happy. The mask is defaulted
(its value is not used), and the generating routine is changed to
choiceinstr, which handles the cases. The choicedef entry pointed to is:

choicedef c_add = {"adds" , "addz" , 2, 0, 0xff, NO};

This defines a choice of either the adds instruction or the addz
instruction to add a memory location to an accumulator. The second field,
the memory reference, is the key: if the reference is greater than or equal
to zero, and less than or equal to hex ff (decimal 255), then adds can be
used; otherwise only addz is allowed.

As I said above, the choice mechanism is restricted in the decisions that
it can make in attempting to use the shortest instruction.  Since the
choices are all made during the first pass, the expression in the deciding
field must be completely backward-defined.  That means that all symbols in the
expression must either be constants, be predefined (see below), or have
been defined in the code physically before the line where the choice
resides. In addition, all symbols in the expression must be in the same
segment as the code being generated. This is not to say that using a
choice instruction containing a forward reference is an error; rather, the
assembler, lacking the data required to make the decision, will opt
for the "otherwise" form of the instruction, even when it would have been
possible to use the other. As Captain Kirk says, "C'est la vie."
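
The decision rule can be sketched in a few lines of C. The helper
pick_mnemonic and its known argument are hypothetical; known stands for
"the deciding expression was completely backward-defined on the first
pass". The unimplemented relative flag is ignored here as well.

.nf
```c
#include <assert.h>

typedef int Flag;                   /* assumed: boolean-valued integer */
#define YES 1
#define NO  0

typedef struct chcitem {
	char *rname;                    /* restrictive mnemonic */
	char *nname;                    /* non-restrictive mnemonic */
	int field;                      /* operand that is restricted */
	int lorange, hirange;           /* range of restriction inclusive */
	Flag relative;                  /* not implemented */
} choicedef;

/* Hypothetical helper: choose the mnemonic to assemble.  value is the
   deciding operand; known is zero when the expression contains a
   forward reference or a symbol from another segment. */
const char *pick_mnemonic(const choicedef *c, long value, int known)
{
	assert(c->field == 1 || c->field == 2);
	if (known && value >= c->lorange && value <= c->hirange)
		return c->rname;            /* short form fits */
	return c->nname;                /* fall back to the general form */
}
```
.fi

With c_add as defined above, a known operand of hex 80 selects adds,
while hex 100, or any operand that is not yet known, selects addz.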

The last set of entries in the ops.h file is a group of predefined symbols
that the assembler "knows" about for all assemblies on this
microprocessor. An example of these definitions is:

.nf
symbol predef[] = {
	{"ra"        ,    0x0, &o_reg  , (struct seg *)0 },
	{"rb"        ,    0x1, &o_reg  , (struct seg *)0 },
	{"eq"        ,    0x7, &o_cond , (struct seg *)0 },
	{"nc"        ,    0x4, &o_cond , (struct seg *)0 },
	{""          ,    0x0, &o_none , (struct seg *)0 },
};
.fi

These predefine the symbols ra, rb, eq, and nc for use in instructions
that reference register a, register b, and the branch conditions eq and nc
(no carry). Each is given a value, points to an operand type (which is not
currently used), and is defined in the null segment (that is, the only
predefined, default segment). Note the null entry at the end of the table.
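
The null entry lets a table walk stop without knowing the table's size.
A minimal sketch follows; the layout of symbol is inferred from the
initializers above and may differ from the real definition in asm.h,
the helper find_predef is hypothetical, and the operand type pointers
are left null because they are not used.

.nf
```c
#include <assert.h>
#include <string.h>

struct seg;                             /* opaque here */
typedef struct opdclassitem opdclass;   /* opaque here */

/* Assumed layout, inferred from the initializers. */
typedef struct symbolitem {
	char *name;
	long value;
	opdclass *type;
	struct seg *segment;
} symbol;

static symbol predef[] = {
	{"ra", 0x0, 0, (struct seg *)0},
	{"rb", 0x1, 0, (struct seg *)0},
	{"eq", 0x7, 0, (struct seg *)0},
	{"nc", 0x4, 0, (struct seg *)0},
	{"",   0x0, 0, (struct seg *)0},    /* null entry ends the table */
};

/* Hypothetical lookup: walk the table until the empty name. */
symbol *find_predef(const char *name)
{
	symbol *s;

	assert(name != NULL);
	for (s = predef; s->name[0] != 0; s++)
		if (strcmp(s->name, name) == 0)
			return s;
	return NULL;
}
```
.fi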

Now, given the above description it should be possible to define an ops.h
file to generate an assembler for most machines. This scheme should
work for almost any 8 or 16 bit microprocessor, although it has only been
used with 8 bit ones. Restrictions are imposed by the use of a long
variable for expression evaluation and for instruction compilation, and
probably by other things that will become apparent when the assembler is
ported to some weird machine.

Choices for instruction format are arbitrary, of course. I use a branch
conditional instruction with the conditional code as an operand to reduce
the size of the optab (simply laziness); it would be easy to have
individual branch instructions instead.

So it goes.