Contents copyright Tom Jennings 1999-2002. All rights reserved.
Design of the
Universal Machine
15 December 2001
Tom Jennings
World Power Systems
7 October 2000
CONTENTS
Background 3
The Universal Machine 6
Paths not followed 6
Current architecture 6
Major design features 7
Memory addressing 7
Subroutine linkage 7
Software 9
Hardware 11
Datapath design 11
Memory logical design 11
Registers and circulators 12
Internal model 13
Drum architecture 14
Conditional execution 15
Logic family 16
Basic machine timing 17
Order function matrix 18
Logic diagrams 20
Symbols and conventions 21
Block diagram 21
Control machine 22
Front panel function logic 22
Front panel switch and lamp registers 22
Order decoder 24
Drum address search logic and track select 24
Track select tree 24
ALU datapaths and condition codes 25
Adder 25
A circulator 25
C circulator 25
K circulator 25
Drum read/write 26
Timing track circuitry 27
Timing logic and buffers 27
Teletype I/O 27
BACKGROUND
The goal of this project is to produce a fully functional stored-program digital computer, an idealized yet arguably authentic 1950-style vacuum tube computer capable of running real programs, with the express purpose of revealing the incredibly beauty and elegance inherent in these earliest machines, though mostly hidden under the primitive technical necessities of the time. The Machine will be the central focusing component in an art environment, either as an audience-participatory event or a passively-observed installation, depending on the software running in the machine.
This machine is of course a blatant forgery. Other than the lack of hardware multiply/divide the features of this machine is close to the Royal McBee LGP-30, from which much of this machine's design and aesthetic were stolen. (For a brief description and history of this important machine, please refer to URL XXXX.)
Genuine early-1950's machines ran rings around this Universal Machine performance-wise; it takes 403 milliseconds to grind through the Gill test, a popular benchmark of the time, not exactly a high performer, almost entirely due to the lack of hardware multiply/divide. (If the software multiply routine, which consumes approximately 370 of those precious milliseconds, is replaced with a 64 mS multiply instruction, performance is within 10% of the LGP-30, which had a Gill rating of about 32 mS.)
Worse, the software "environment" is far too modern, with a symbolic assembly language and such frivolities as "free addressing" (symbolic memory addressing) assumed from the start. Luckily for me there are few people around to criticize these things.
The Universal Machine (which name is pure artistic license; a 19th century archaism, and a nod to Alan Turing's work, who I feel is grossly under-appreciated in any number of ways, which the performances of this Machine will be embedded will explicitly address) is a serial, binary, drum memory, stored-program automatic digital computer. It is reasonably small and portable, built entirely with vacuum tubes and "crystal diodes" (sic) and uses a teletype and paper tape for input/output.
As far as I can tell, drum computers existed right from the start [BOOTH] and though quite obsolete by 1960, had a large effect on programmer culture, being quite literally the mini/personal computers of their time; the Royal McBee LGP-30 was built into a desk, had a modest and approachable order code ("instruction set"), and was immortalized in the infamous ditty about Mel, the Real Programmer.
Nonetheless, albeit conveniently, I feel I really had no choice buta binary, serial, drum architecture, from a strictly practical perspective. Serial, binary, drum machines are, hands down, the simplest, most elegant, lowest component-count stored-program digital computer architecture known to me. (The LGP-30 had 112 tubes; contemporary "full scale" computers had a minimum of 2000; the ENIAC of 1946 had 18,000, but that was truly inexcusable). From a purely construction-oriented point of view, the problems I faced in designing a vacuum tube computer today that I would actually have a realistic chance of completing parallels those of the original builders, though for different reasons; for them, utterly untried techniques and concepts, having to create software environments from scratch and with zero experience (usually grossly underestimating the task); for me, no budget, no staff (though no production schedule to meet), nearly zero experience with tubes (but a half-century of moldering documentation and lore), and the necessary techniques and components now of museum age (the few vacuum tube logic design books that existed were mostly thrown out long ago).
Essentially, all computer designs revolve around the main store, now and then. In 1950 the practical choices (with the convenience of historical re-vision) for main memory were mainly Williams tubes (fast, nearly random-access, using "ordinary" cathode-ray tubes, with many caveats) and acoustic delay line technology borrowed from WWII radar. Non-rotating magnetic techniques were known, but weren't practically exploitable until the mid 1950's; rotating memories were reliable but technology limitations meant it was too slow. Today's idealized silicon rectangular N-dimensional arrays let you think that memory organization is somehow automatic; though this isn't true at all even today, it sure seems idyllic in their historic context.
Main store is always a matter of economics. You choose one of the available technologies and replicate as many as you can afford. A million words of microsecond memory was possible in 1950, unfortunately no one could afford it.
In a serial drum machine, the deal with the devil allows a clean, simple, elegant solution to component count and rock-solid synchrony of all system timing, in exchange for any hope of decent performance, and likely the most-pessimum memory bandwidth tradeoff possible. Though of course it's impossible to ignore the performance problems, the elegance of the architecture deserves notice even today.
A sample fragment from an inner loop indexing a linear array of words will illustrate the dilemma:
v= *++ptr; # v= ARRAY[i]
might be rendered as
0100 LD PTR # load address field of order
0101 ADD 1 # increment it
0102 STA PTR # "store A in address field" order
0103 PTR: LD ARRAY # (initialized elsewhere as ARRAY[0])
0104 STO V # v= ARRAY[I]
0105 ...
In this example, four instructions ("orders") (all but ADD 1) reference (drum) memory twice, once for the fetch, and again to locate the operand. For each the access time is the angular distance between the last word read and the next word to read; in sequences like this usually (N - 1) orders in a loop require a full revolution. Note further than a pre-increment is shown; post-increment would require an additional store and load of "v".
There are obviously no index registers. Indexing is done by instruction-modification, enough to make compiIer writers faint and recursion and reentrancy not for the cautious. The performance loss due to all transfers passing through the single accumulator is significant, multiplied by huge and variable memory access times. Why put up with such a poorly-performing design? An order of magnitude reduction of hardware complexity, that's why; 100 kilograms of hardware vs. 1000 kilos of metal and glass.
In fact the actual story is more complex than described; optimum coding techniques, of which I have personally explored only the margins of, can recapture a lot of performance, but require great skill and planning on the part of the programmer (hence Mel's legendary status); my first pass at coding the Gill Test for this machine consumed over four seconds of run time; optimum coding brought it under 400 milliseconds. Further, sector-numbering is non-sequential ("skew factor") such that logically-sequential storage locations are placed where they are occasionally optimal.
The history of low-cost computing is synonymous with that of computing itself, with three major historical performances (drum machines in the 1950's, minicomputers in the 1960's, and microprocessors in the 1970's).
THE UNIVERSAL MACHINE
The system documented here is a very minimal but functional machine; a blatantly revisionist architecture: vacuum tube and diode logic, 1948; accumulator architecture, 1949; serial binary drum, 1950; "assembly language" machine abstraction, 1958; "computer" electron tube technology, 1962; two-pass assembler, 1968; simulator and cross-assembler, 1975.
I shamelessly picked over five decades of collective computer evolution with an inverse goal; to chose was what I personally think is the best of accepted practice, collective programmer culture, the darwinian survivors; practices that might make a sort of back-in-time-travel/forward-science-fiction sense to 1948 computer designers/users: the pinnacle of electron tube state of the art at the dawn of the semiconductor age; a basic machine design Wilkes would be familiar with (though likely cringe at); a serial/drum hardware scheme that produced some of the earliest programmer-friendly ancestors of the "mini" computer and the personal computer (the Royal McBee LGP-30 a shining example); an optimum machine- and human-parsable idealized representation of the underlying machine (the assembler); and the blatantly modern and practical simulator.
PATHS NOT FOLLOWED
T.B.D. D as in described, I have a lot of notes to process.
CURRENT ARCHITECTURE
The accumulator design is generally agreed [HENNESSY] to be the simplest hardware design. It has many simplifications over a 'minimal load/store' design, mainly, trivial address calculations (register C during fetch phase, register K during execute phase; that's all folks); all datapaths go through the ALU except write-to drum, for which there is one hard-wired facility that stores the accumuator to drum (for timing reasons peculiar to the serial/drum architecture). There are no busses per se, what few datapaths there are are driven directly from an opcode-decoder matrix switch. Needless to say there is no parallelism, in this design taken to an extreme: the sole I/O facility is one bit of input and one bit of output, with special opcodes to handle serial character input/output.Those knowledgable of historical architectural trends will appreciate the complete and utterabandonment of a data-driven model in this machine; the I/O model here reduces I/O data to pure procedure.
(It's interesting to note that these sorts of architectures haven't disappeared at all, but are thriving in the PIC (Programmable Interface Controller) world. The Microchip Inc. family of CPUs is very reminiscent of a Harvard-architectured PDP-8. Of course this one has 16 pins, runs at 20MHz and costs $7.00.)
Please note that this is the design document for the computer itself, it is not a description of the installation in which the computer will be contextualized, nor how it will be presented to participants/viewers. Most specifically all of this arch insider irony and implementation trivia will probably go unmentioned and unappreciated, and the particular software for the performance will be the foreground.
MAJOR DESIGN FEATURES
Single accumulator, serial binary unsigned arithmetic, single address, drum memory computer.
Instruction set designed for symbolic text character processing.
Major features swiped from the LGP-30 (single-address, "soft sectoring").
Electron tube and "crystal diode" logic. "Modern" tubes (miniature, 5xxx and 6xxx computing and Red series) and silicon diodes will be used.
Solid-state smart power supply (let's not be pedantic).
Magnetic drum as main and register store. Adequate capacity to write real programs; 4000 words planned, 2000 words minimum.
18-bit word length, consistent with the goals of the machine: short, but a multiple of character length (6 bits).
Optimizing assembler written in Perl.
Teletype and paper tape IO systems.
Built-in CRT for register contents display (deprecated in favor of integral neons).
MEMORY ADDRESSING
There is only one addressing mode, immediate address contained in the k field of most instructions. During the execute phases K is gated to the drum address comparator for those instructions that require drum access; during the fetch phases C is gated to the drum address comparator.
The STA instruction provides the sole method of memory indexing by modifying the k field (12 bits) of what is presumed to be an instruction stored on the drum (not including the obvious STOorder which modifies all 18 bits).
SUBROUTINE LINKAGE
There is in fact no real subroutine linkage except as provided by the STAorder. Instead, a programming convention will have to do.
When a sub-routine is called, the convention is to place the return address into A before JUMPing to the subroutine. It is up to the sub-routine to use this provided return address to create the return linkage. This is a major effect of the STA order.
The assembler meta-symbol C ("current contents of C circulator") is provided for this purpose. Please refer to the Universal Machine Operators Manual for details.
SOFTWARE
At this time there is very little software available, and most of that are library routines and performance and hardware testing.
As might be expected, the Universal Machine isn't quite, and can be hard to write good code for, but it's not unbearable. The instruction set more or less makes sense; the worst shortcomings seem to be, in order: the lack of hardware subroutine call and return eats up the accumulator, making argument passing slow and inefficient; indexing memory is awkward, especially with only one high-speed register, though immediate-operands help immensely.
Performance is quite terrible, and optimum coding is hard work. Outputting fixed text strings is depressingly slow, about 2 characters a second (though the methodology and code has not fully been worked out).
Complete listings of the routines that do exist are in the Operator's Manual, but a quick look at the double-precision multiply routine will have to do. This multi-variable routine illustrates the memory-bandwidth issue quite nicely.
errno: VAR # subr status
Mper: VAR # multiplier
Mcand: VAR # multiplicand or LSW
Mcandh: VAR # multiplicand MSW
Mtest: VAR # mult. routine
MQ: VAR # product or LSW
MQh: VAR # product MSW
Determining the optimal location for these variables must be done on a case by case basis, but the library routines will be carefully crafted.
# Unsigned double-precision multiply Mcand by Mper
# leaving the product in MQ (ls word) and MQh
# (ms word). Destroys Mcand and Mper. Sets errno
# if overflow out of MS digit.
multUDx: sta mdlr
and 0
sto MQ
sto MQh # initialize
ld 1
Nothing surprising here, just set the product to 0 and init the multiplier bit mask. More surprising is that unoptimized, the above four orders consume about 40 milliseconds, optimized, about 10.
mdl1: sto Mtest # m'per digit tester
andm Mper # test m'per digit
snz
jump mdl2 # add in Mcand if 1
The top of the inner loop saves the "current" multiplier bit mask, tests and skips if there's a digit to multiply by. Another 20 mS consumed if the storage locations are badly placed. Even one more circulator would be nice right now...
ldm Mcand # Mcand lo
addm MQ # + MQ lo
sto MQ # = MQ lo
ldm Mcandh # Mcand hi
adcm Mqh # + MQ hi + carry
sto MQh # = MQ hi
.There is probably a better algorithm for a serial machine.
ld 0
adcm errno # remember overflow error
sto errno
Error checking like this is probably an expensive luxury in this machine.
mdl2: ldm Mcand
or 0 # clear carry
rlc # shift multiplicand
sto Mcand # Mcand * 2
ldm Mcandh # double precision
adcm Mcandh #
ldm Mtest
or 0 # clear carry
rlc
snz # if no more bits
mdlr: jump RETURN # done.
jump mdl1
The multiply ends when there are no more bits to multiply by (nicely word-length-insensitive, obviously I have not been coding low-performance machines for a long time).
HARDWARE
The remainder of this document covers hardware operation.
DATAPATH DESIGN
The dataflow/datapath is an attempt at an absolute minimum hardware design, with the goal being extreme reliability through simplicity of design and broad margins. All ALU functions output to the single accumulator. The ALU has only six sources total on its two inputs. All datapaths between circulator source/destination are at most two levels of (diode) gate, with each source-to-destination node having it's own (diode) gate driven directly from the order decoder matrix. There is no internal microprogramming abstraction.
MEMORY LOGICAL DESIGN
The entire design centers around drum memory features (and limitations). Bit density vs. reliability is the limiting factor. Most of the drum comprises the main store, with additional tracks dedicated to circulators, sector addressing and timing. Note that the number of storage locations per main-store drum track may not necessarily be a power of two; therefore memory may not be treated as a single, simple array, but must be treated as N tracks of M locations per track. Welcome to 1948, in philosophy if not fact.
[June 2001: Almost sadly, it looks like 64 sectors/track is optimum for both performance (rotational latency) and low required track bit density. 64 is of course a power of two. I almost wish it were not.]
Two drum tracks are dedicated to machine timing. One provides basic bit-level clock timing pulses, the other contains six- to eight-bit sector addresses used for locating sector storage locations within a given track, and allows for sector skewing to allow optimum coding in the chosen one-address scheme (very clever, Royal McBee).
REGISTERS & CIRCULATORS
Nearly all registers in the machine are drum circulators, pairs of read/write heads with dedicated circuitry to provide a shift register of the required width. These should all fit within one or few physical tracks. There is also flip-flop based storage.
Register symbol width circulator/shift register
---------------------------------------------
Accumulator A 18 C (circulator)
Command C 18 C
Instruction I 5 SIPO (serial in, parallel out)
Drum D 20 see text
Select S 6 SIPO
Operand K 18 C
Z Z 1 flip-flop
Cy Cy 1 flip-flop
The accumulator (A), command (C), and constant (K) registers are 18-bit circulators contained on their own fixed drum tracks.
Z and Cy are condition-code flip-flops. Z is set when all bits in A are clear after certain opcodes; Cy is set when there is an overflow out of the most significant of A after certain opcodes. Program execution can be modified by using the "skip" instructions which reference the condition codes.
During the FETCH-LOAD phase, K receives the first 12 bits of the instruction word (zero-filled on the left out to 18 bits) and the I flip-flops receive the static opcode bits of the instruction, which remain valid until the next FETCH-LOAD phase. The addition of a circulator for K obviously requires more hardware, but it makes drum searches almost trivial and enables all of the immediate-operand orders.
The D register is the drum read/write head; while not a circulator or register, it is treated as such after the search cycle positions the correct drum word under the read/write head. The data at D is valid for only the current minor cycle.
The S register is the drum head-select static register, a mercury-wetted reed-relay decoder tree, with delta and timing logic attached.
INTERNAL MODEL
The logical machine consists of the circulators, datapaths and ALU, the connections between which are controlled by read-only diode matrix. The main features are:
A main store of 18-bit words, arranged as N tracks of M sectors per track.
One accumulator, A, 18 bits wide.
An command counter, C, 12 bits wide.
A constant register, K, 12 bits wide.
A three-function ALU.
An X by Y by Z instruction decoder matrix.
All machine timing is in synchrony with drum rotation. As the drum rotates the M sectors on the track pass under the read/write head; this is called the major cycle. Each sector consist consists of the 18 accessible data bits and the two dead bits; each sector comprises a minor cycle.
The machine uses a simple four-phase execution cycle, where each cycle takes exactly one minor cycle. All machine state changes (with an exception involving the STA order) takes place during the two dead-bit times.
FETCH:SEARCH The order indicated by the C register is located, by matching the LS 6 bits of C against the SECTOR ADDRESS track, serially. (Additional logic is involved to select the drum track.) A flip-flop is set upon sucessful match, indicating that the next-available drum word in fact contains the desired order;
FETCH:LOAD The first 12 bits off the drum read head are gated into the K circulator, the next 5 bits are strobed into the 5 I register flip-flops; during the last 6 bits the input to K is closed, filling the MS 6 bits of K with 0's. (Note that the MS bit of an order is unused.) Since the I flip-flops drive the order decoder matrix directly, at the end of this phase the microcode word is settled and valid.
EXECUTE:SEARCH If the MREF flag of the microcode word is true, then a search is made for the order's operand stored on drum. This search is identical to FETCH:SEARCH except that the comparator source is the K register instead of the C register. If the MREF flag is false then this phase is skipped by advancing to EXECUTE:RUN during the 2nd dead bit.
EXECUTE:RUN Performs the function implicit in the order applied to the I flip-flops, the drum operand, if required, is immediately available.
The outputs of the order decoder matrix are defined in [spreadsheet]. The schematics and the datapath tables illustrate the paths connected by the decoder matrix.
The exception to the state-change-during-dead-bit-times is the STAorder, which modifies only the LS 12 bits of a drum word, and requires that the drum write gate be turned off during a word time. For this reason alone, a return-to-zero (RZ) encoding method was chosen for the drum to avoid glitching and ruining data integrity. (Write current goes to zero at the end of every bit cell in RZ encoding.)
All of the circulators have separate read and write heads, and are fully and simply in sync with the rest of the machine, in that read and write operations are timing-wise symmetrical (eg. A gates to K or K gates to A, bit by bit).
DRUM ARCHITECTURE
The drum main store has only one head per track, and the realities of data encoding/decoding means that drum data is in fact delayed by one bit time as it appears in the drum-read flip-flop. The convention was chosen that drum read data timing is simply declared to be "in sync" with the circulators, eg. the bit contained in the drum read flip-flop at bit time 0 is the 0th bit, with the ramification that writing to the drum must be done one bit time early, and with A as the sole source. To accomplish this, the A circulator is actually one bit shorter as stored on the drum, with the missing bit supplied by a flip-flop, thereby providing an "early" tap-off for drum writing. The order code reflects this choice, in that there are many orders to read the drum (LDM, ADDM, etc) and only two that write the drum: STO and STA. This early tap on the A circulator was needed anyways to accomplish right-shifts (SR order). While this makes for a slightly degraded order bandwidth (eg. A + K --> drum is impossible) overall this degradation is utterly swamped by the general drum bandwidth issue.
There is one set of read/write electronics for all of the main store tracks; each track has its own fixed read/write head, with 1-of-N track/head selection done via a mercury-wetted reed-relay decoder tree (fast, as those sort of things go). Drum store word-aligned reading is done in synchrony with the machine; drum writes however must be started one bit time early due to magnetic flux writing technique delays; therefore, the only source for writing to the drum main store is via the acumulator, which has an early-data pickoff for this purpose (and for shift-right).
Memory addressing was copied from the LGP-30. Register S directly drives the track (read/write head) selection relay tree. Sector-within-a-track is done by a serial address comparator that compares the least (6) bits of the (C or K) circulator against a dedicated sector-address track provided for the purpose. This track contains pre-recorded sector address data; further, sector addressess are skewed to simplify optimum-coding techniques; the addresses stored on the sector-address track are numbered 0 16 32 48 1 17 33 49 2 18 34 50 3 19 35 51 ..., a skew factor of 4, which ensures that the next-sequential logical sector is "next" when a non-memory-reference order completes. Simple gating is used to ensure that only the least (6) bits of the C or K address circulator and sector-address track are compared.
Each circulator has its own physical track, each with a read and a write head and associated electronics. [I am assuming mechanical considerations will dictate mounting all the heads along the length of the drum, rather than mounting more than one circulator head per physical track; lineal density will lose out to mechanical simplicity.]
Circulator K, which holds the "immediate" field from the current order, has only read/write and switching circuitry (recirculate vs. write). The most significant six bits are always zero, ensured with a simple gate.
Circulator C, the command counter, contains circuitry similar to K, and additionally contains an incrementer (half-adder with carry storage) to manipulate program flow.
Circulator A, the accumulator, contains circuitry similar to K, as well as inverted-data and early-data pickoffs for use as ALU sources. The early data pickoff, Ae, is used as a source for writing to the main drum store. All eighteen bits of A are significant. The input to A is always and only the ALU output.
A timing track or tracks contains sector address information and sector ("word") timing information. This track is read-only, its contents generated by external means (TBD, but probably C code on a linux machine). All machine timing is derived from this information, with the addition of a [mechanical, optical, magnetic] timing mark indicating the zeroth word on the drum. A lot of machine timing finesse will be offloaded into intricacies in this data.
CONDITIONAL EXECUTION
There are two flip-flops whose persistent states are determined only by the arithmetic/logical orders (and machine hardware reset). The behavior of the six "skip" orders depend on the state of one of these flip-flops at the time the order is executed. No other orders depend on, or modify, these flip-flops. The two flip-flops, each with two states, are named as follows:
Z last ALU operation resulted in zero
NZ last ALU operation resulted in non-zero
C ALU carry bit set
NC ALU carry bit clear
The various "skip" orders specify one of these flip-flop states as a condition for execution. In fact, these orders simply gate the appropriate flip-flop output (Z, NZ, C, NC) to the CARRY-IN input on the half-adder attached to the C circulator; during the EXECUTE:RUN phase this will increment C, or not. Simple.
LOGIC FAMILY
The design criteria I decided upon is quite severe, and had to satisfy a number of things at once, in decreasing importance:
near-zero experience designing with tubes,
emphasizing the need for simple circuitry,
in turn implying not-very-fast circuitry,
lowest possible tube count,
fixed gate depth,
historical accuracy.
To this end the following criteria was chosen:
All gates and all switching are positive logic using silicon diodes (largely historically consistent). No more than two levels of gates are used to minimize speed and fan-out problems.
The only active device is the flip-flop, with rare few exceptions. These provide both normal and inverted outputs, consistent with the use of diode gates for switching and gating. The exceptions are Schmitt Triggers (aka voltage discriminators) on various inputs and cathode followers in high-fanout circumstances.
With rare exceptions inverters are not used, rather, the complement outputs of flip-flops are used. This minimizes timing skew issues of related signals (eg. +FOO and -FOO will be more or less simultaneous) though it increases the number of diode gates.
Standardized input logic level of approximately +20V, -30V for inter-module communications. Output level will generally be B+ and B-. Input diode clipping/bounding will be used where appropriate. (Within a module there is no standardization.)
All clocking is via gating; inputs will be held stable during the entire clock period (per-module determined). The clock system will provide a high-fanout narrow clock pulse well within each data window (in some cells more than one clock, delayed within the cell), generated by high-speed dekatrons, synchronized to the drum timing track.
This is a D.C. system in that it uses gates and not edges, though there will be minimum rise-time limitations.
Shift registers will be simulated with paralleled inputs and per-bit clock/gates (eg. a 1-of-18 demux), therefore allowing the use of simple RS flip-flops with diode clock gating.
Since all machine I/O is via teletype, all documentation (excepting these design documents) will be restricted to the ITA2 character set.
Logical signals are given alphanumeric names all in upper case. Positive logic is assumed, and a positive voltage is eqivalent to an asserted signal. For instances of negative logic (eg. The complement of a positive-logic signal) a leading minus sign indicates negative logic. By obvious extention a leading plus sign indicates (redundantly) positive logic. Therefore, the two complementary outputs of an arbitrary flip-flop might be named FOO or +FOO ("Q") and -FOO ("NOT Q").
Power supplies have a trailing plus or minus sign to indicate polarity; B+, B-, etc. Hopefully potentially confusing signal names will be avoided (one can imagine a single resistor drawn on a schematic, one end labelled "B+" and the other "+B").
BASIC MACHINE TIMING
The machine is designed around a bit-serial drum memory. A major cycle is defined as the time it takes for the drum to make one full revolution (approximately 10mS); a minor cycle is the time for one machine word of 18 data bits plus two "dead bits".
The Machine is little-endian, which makes serial addition most easy; bits are added from LS to MS, propagating a single carry bit per addition. The convention is that the first bit of a word off the drum is the 0th bit, eg. Bits are numbered 0..17, with 17 the MS bit, with an arithmetic weight of 2**17.
Following the 18th data bit are two dead bits, named D1 and D2. With one exception, all major machine states change only during D1/D2. The exception is the very important STA order, which requires switching drum write current off after bit 11 and before bit 12, to implement address modification. However, this is done with a simple timing signal, and the choice of the return-to-zero (RZ) drum encoding method, since it requires that head write current go to zero between bits, anyways.
Therefore, a complete minor cycle consists of 18 timing cells, named B0 (bit 0) through B17 (the 18th bit, bit 17), followed by D1 and D2. The timing section, described in later sections, produces 20 clock signals, B0, B1, B2, ..., B17, D1, D2, each a full bit "cell" in width, that allows shift-register-like functions with simple flip-flops.
ORDER FUNCTION MATRIX
A lot of information is packed into the order function table (spreadsheet). Each row describes one order (instruction), the columns indicate functions invoked for that order. This is a simplified abstraction of the order decoder matrix, which is really just a big ROM made from discrete diodes; 1-of-32 input, N-wide output. This spreadsheet will become much wider to include every control signal necessary.
The table is rather sparse, so it's apparent size 32 x 30? x 2) is not as problematical as it seems; though the logic-family requirement of no inverters means that the decoder must code for signal FOO as well as -FOO, but all in all I don't expect more than a thousand diodes in the decoder.
The order decoder output functions are as follows:
OP-ADD Selects ALU function ADD.
OP-OR Selects ALU function OR.
OP-AND Selects ALU function AND.
OP-CC ALU output modifies CY.
OP-CCY Clear CY before ALU op start.
OP-ALUA-A Gate A into ALU input A.
OP-ALUA-AE Gate AE into ALU input A.
OP-ALUA-AI Gate AI into ALU input A.
OP-ALUB-A Gate A into ALU input B.
OP-ALUB-D Gate D into ALU input B.
OP-ALUB-K Gate K into ALU input B.
OP-ALU-SW Gate switches into ALU input B.
OP-SZ Gate Z flip-flop into C carry-in.
OP-SNZ Gate -Z into C carry in.
OP-SC Gate Cy into C carry in.
OP-SNC Gate -Cy into C carry in.
OP-STZ Gate TTY into C carry in.
OP-STNZ Gate -TTY into C carry in.
OP-NOMREF Skip EXECUTE:SEARCH phase.
OP-ACIRC A write input is A output (recirculate).
OP-CCIRC C write input is C output.
OP-KCIRC K write input is K output.
OP-AWRITE A write input is ALU output.
OP-CWRITE C write input is D.
OP-KWRITE K write input is D.
OP-HALTSW Enable HALT switch to STOP.
STOA-GATE Drum write gate closes at B12.
STO-GATE Drum write gate closes at B17.
OP-TTYW Gate A[0] to TTY flip-flop.
Opcode matrix here
LOGIC DIAGRAMS
This section is meant to accompany the B-sized logic diagrams, and describes most of the inner workings of the major functions. The logical design here occasionally drifts into the electronic/engineering realm, which is usually the result of bad design. However, because I was so paranoid about component (especially tube) count as an overriding concern, I let tube-engineering issues drift into the logical design; the intent was to design purely logically, but to peek over the fence at engineering issues to streamline the electronics. The most blatant case of this is the serial drum sector address comparator; in "traditional" (builing-block semiconductor) methods it would be essentially an exclusive-OR function, requiring a half-dozen AND and OR gates; here it will be done with a modified analog comparator, using two grids.
(The rule of thumb for counting electron tube active components is to count control grids, more or less the equivalent to the gate, or base, of a transistor. The tube technology chosen here ("computor tubes") put one or two active devices/control grids per glass envelope (the state-of-the-art in tube packaging reached three active devices per envelope in late 1960's television receiver-specific tubes).
I tried to keep the designs extraordinarily non-tricky with radically simple DC clocking to ease design and testing. Tubes are terrible things, my working assumption is that every tube element has a 10 pF capacitor to ground (and G to P for triodes). Saying they are "slow" is being nice, hence the 100,000 pulse/second clock rate.
LOGIC DIAGRAM SYMBOLS AND CONVENTIONS
I may have gone overboard in choosing to use a subset of old-style McCulloch-Pitts-influenced logical component symbols, but luckily they're pretty straightforward (and actually practical enough given the simplicity of the logic).
McCulloch and Pitts were a pair of neuro-something researchers in the 1940's who came up with neuronal models of the brain, and the obvious parallel with computer (boolean) logic and then-fasionable cybernetics was not lost on early computer pioneers needing a convenient nomenclature for this new logical electronics, and McCulloch-Pitts fit the bill. As it was used by Turing and von Neumann it was a little too generalized, all gates being drawn exactly the same, many inputs and one or few outputs, and the number of "firing" inputs required to produce output indicated by a number inside the circle; an OR gate would have "1" inside, and AND would have "3" if there were three inputs (I never saw and N-input gate with less than N drawn in it, though such things likely existed in neurological models). Interestingly, an input or output that we would today call "inverted" was called "inhibiting" (again that neuronal model) and is drawn today as it was then - the input or output line meeting the gate with a small round dot.
I chose a vastly simplified version that is nonetheless nostalgic. The logic family restrictions mean that you won't see a single inverted ("inhibited") input or output dot. Inside the gate is "&" for AND and "+" for OR. I also chose left-to-right or right-to-left signal flow, and try to put inputs on the left and outputs on the right, and in some cases inter-module ins and outs along the top (there are many exceptions, I hope they remain clear).
BLOCK DIAGRAM
The block diagram shows all of the functional blocks in the Machine and the table on the right lists which drawing includes the labelled function. Much of the logic is extremely simple, most of the logical complexity is contained in the Control, Drum access, Timing, and circulators. Most of the rest is simple combinatorial logic and a few latches (flip-flops). The big scary Datapaths block near the top doesn't exist per se, but is really just the interconnections provided by simple gates on the outputs and inputs to sources and destinations.
CONTROL MACHINE
FRONT PANEL FUNCTION LOGIC
FRONT PANEL SWITCH AND LAMP REGISTERS
These three drawings cover the heart of the machine. The front-panel logic provides interlocks on human-interface switches (eg. STOP occurs only after order execution complete, etc) and generates signals that perform the functions described under CONTROL PANEL FUNCTIONS in the Operator's Manual. The functional blocks with the mysterious SPST switch drawn in them are simply modified flip-flops that change state when the switch is changed, in sync with the ENABLE or R/S inputs; the switch is debounced simultaneously. [I foolishly packed the notebook with implementation details of the timed debouncers and won't have access to them until Aug 2001! They are not rocket science, and in any case I lifted the design from [ELECTRONICS].]
The true heart of the machine is the state machine defined by two flip-flops, a 2-to-4-line diode decoder, one feedback flip-flop, and a half-dozen diode gates. There are four states with six possible transitions. This sub-machine handles all order decode and execute logic as well as manual examine/modify front panel functions and indicators.
Briefly, the two flip-flops generate the four sequential states:
FETCH:SEARCH
FETCH:LOAD
EXECUTE:SEARCH
EXECUTE:RUN
Assuming (unrealistically) that memory access (drum search) takes zero time, the state machine simply changes each D2 period; the instruction is found (FETCH:SEARCH), loaded into the decoder (FETCH:LOAD), any operand is found (EXEC:SEARCH), the order executed (EXEC:RUN). Slightly more realistically stated, the change from *:SEARCH to the next state (0-->1 or 2-->3) is delayed to the D2 that follows drum address search match (via the FOUND input to gate A.). Therefore, allowing for drum searches, the state machine runs through its four states every D2 (or D2 following drum search).
(The decision to change state actually occurs during D1, as determined by the FOUND and SEARCH inputs to gates A and C, timed with D1, and stored in the left-most flip-flop, the feedback flip-flop. During D2 (the very next clock pulse) this decision is gated to the T input of th SEARCH flip-flop, and during B0, following D2, the feedback flip-flop is reset for the next cycle. Without the delay/storage from the feedback flip-flop, a race condition would result from the SEARCH signal changing as the state changed.)
Additional demands of this state machine slightly complicate the logic, namely front panel examine/modify functions and the major performance enhancement of the immediate address mode.
Front panel functions
The front panel logic STOP signal doesn't actually halt the machine, as is traditional, but simply prevents it from advancing past the fetch phases, such that the machine fetches the same order repeatedly, once per major cycle, causing the neons of the lamp register to refresh continuously, making dynamic drum/register contents visible without an oscilloscope (the traditional solution for drum machines such as the LGP-30); therefore examine drum or circulator is a simple rotary switch that selects the appropriate serial input. Modifying circulator contents is also easy; the STORE-A, STORE-C signals from the front panel logic are already timed to correctly gate the switch register output to the appropriate circulator input [NOTE: this logic is missing from the diagrams].
Modify drum is trickier. The drum is delayed relative to the circulators by one bit time (a side effect of having only one read/write head; the circulators have both a read and a write head), and requires that data be delivered to it one bit time early. The solution is that the front-panel-generated STORE-D signal (1) jams a SWA order onto the I latch, (2) gates the early-data tap of circulator A (AE) to drum write for one minor cycle. An undesired side effect is that this modifies circulator A, so the programmer is warned to manually save A first, if necessary. [NOTE: This logic did not make it onto the Order Decoder drawing, but it consists only of diode gates to jam SWA onto register I.]
Immediate addressing mode performance optimization and D2
A major architectural performance boost is provided by the immediate mode of addressing that allows skipping the EXEC:SEARCH state for some orders. The
order fetch starts loading opcode bits into the I latch during FETCH:LOAD, bit 12, and is complete by bit 16 (order bit 17 is unused). During the I latch load time the order decoder matrix outputs are indeterminate, and certainly flap, but are allowed to settle starting with bit 16, and are certainly settled at the next D2 when the machine begins to enter EXEC:SEARCH, via gate C. When D2 occurs it begins to propagate the state change stored in the feedback flip-flop, but iff signal OP-NOMREF ("opcode, no memory reference") is true, gate D simultaneously jams both state flip-flops true, to state EXEC:RUN. Since the T input is edge triggered, and therefore of shorter duration than D2, the jam via gate D wins and the state machine skips EXEC:SEARCH.
ORDER DECODER
The order decoder consists of the five-bit I latch, loaded with the opcode bits, B12 through B16, during the FETCH:LOAD phase. The flip-flop outputs directly drive the large order-decoder diode matrix, a ROM that decomposes the opcode to the various signals driving datapaths, ALU functions and the like.
Note that order decoding is simple; I latch contents are always valid, except during B12 through B16 of the FETCH:LOAD phase. Datapaths, circulator and drum writes are enabled only in phase with EXEC:RUN phase and are therefore immune to flailing I latch updates.
There are also inputs to jam an SWA order, to support front panel function "STORE TO DRUM" and a reset (of dubious value). The matrix will be physically fairly large, and probably housed in its own chassis.
DRUM ADDRESS SEARCH LOGIC AND TRACK SELECT
TRACK SELECT TREE
This section while not overly complex will probably be a pain to make work, since it is totally dynamic in operation. It consists of two subsections with one point of interconnection.
Drum address consists of six bits of track select (one of 64 or 32 physical tracks) and six bits of sector (one of 64 sectors along each track). Track select is via simple static latch, but sector selection is dynamic, using a discrete sector-address track and a serial comparator.
The track-select portion is a straightforward latch driving the track-select relay tree, loaded with bits 7 through 11 of circulator C (during EXEC:RUN) or the fetched order (during EXEC:LOAD); the selected R/W head is therefore applied to the read/write electronics. A simple full-wave rectifier and RC integrator detects "track change", and therefore whether or not a delay must be invoked to allow the relay tree to settle; a monostable timer of sufficient duration (TBD) delays address-match (next paragraph).
Sector address dynamically compares the source address, either circulator C during FETCH:SEARCH, or circulator K during EXEC:SEARCH phase, with the contents of the special sector-address track, whose logical output is named SECTOR.
SECTOR data is simply a pre-recorded data track, containing 64 sectors with the same format as drum read/write data, however each sector contains the logical 6-bit address of the sector (it's own address) extended to 18 bits with 0's (not that it matters since they are unused). Sector numbering is not sequential, but skewed by a factor of four, such that every 4th physical sector is the next logical sector, to accommodate the four-phase nature of the machine (eg. After execution of an immediate-mode order from sector N, sector N+1 will be the next available sector) (see annoying subtlety, below).
Sector address comparison therefore consists of comparing the LS six bits of the source address, C or K, with the LS six bits of the SECTOR data; the comparison is made by setting the FOUND flip-flop every D2, and resetting it if any of the first 6 bits do not match. [Missing from the diagram is a flip-flop and gate that allows the comparator to reset FOUND only during B0 - B5.]
There is one annoying subtlety: the skew factor is four, even though immediate-mode orders execute in three minor cycles, because sector address comparison is dynamic; the 4th minor cycle is needed to make the address comparison.
ALU DATAPATHS AND CONDITION CODES
ADDER
A bunch of diodes and about seven grids, and the sole inverter in the machine comprise the ALU, the arithmetical and logical datapaths, the adder (there is no subtracter), and the Z and Cy flip-flops.
The arithmetical/logical datapaths and the ALU operation (add, or, and) are selected statically by the order decoder, and are fairly straightforward.
Z (accumulator .EQ. 0) is set at D2 time, and reset if any bit of A is non-zero.
Cy is updated (set or reset) by the ADD, ADDM, ADC, ADCM orders (via signal OP-CC). It is initially cleared (via OP-CCY) for the ADDx, Orx, ANDx orders, but the original contents preserved for ADCx.
The adder is a simple combinatorial adder built from AND gates. The inverter at the input to Cy on the ALU DATAPATHS sheet could be eliminated if a second adder were built to provide the inverse sum of the two operands.
A CIRCULATOR
C CIRCULATOR
K CIRCULATOR
The circulators are all essentially the same, with varying amounts of extra features. The basic circulator works as follows: the read head drives RZ decoder circuitry, whose output is sampled in sync with the bit clock; the write head is driven by the bit clock. In between the two heads is approximately 19.5 bits of rotating drum surface; the missing ½ bit ensures that integrated data is delivered to the read circuitry early enough to be decoded in time to be re-written. Between the read and write circuitry is a gate that allows inserting data into the write head instead of circulating old, the gate synchronized to the correct minor cycle during the correct phase.
The K circulator is simply the basic circulator with no added features. The gate is open during FETCH:LOAD in order to catch the least 12 bits of the order, eg. The operand address or immediate value. The K output is gated with bits B0 through B11, effectively fixing K at 12 bits (eg. Values 0 - 4095).
The C circulator is similar to K, with the addition of a half adder in the recirculate path. The INCR-C input causes C to be incremented upon recirculation, eg. C= C + 1. INCR-C is asserted during every FETCH:LOAD phase, and conditionally by the SKIP orders, which essentially gate the condition code flip-flop outputs (Z, NZ, Cy, NCy) to INCR-C.
The A circulator is different from the basic circulator in that the rotating magnetic path is one bit shorter, the missing bit made up in an additional flip-flop. This additional flip-flop provides an early data pickoff, called AE, one full bit time earlier than all other data sources (eg. When the Nth bit appears at A, C, K or D, the N+1th bit is available at AE).
This early pickoff solves two problems: it allows shifting A right (A= AE) and is the sole source for drum data write, which requires an early data source. Other than the shortened magnetic path and early pickoff A functions like the basic circulator.
DRUM READ/WRITE
The drum read/write circuitry is a straightforward RZ encoder/decoder with gates for timing and to support the STA address modification.
Drum write is essentially an integrating process, and the integration process as it takes place over a rotating surface means that the data cell placed on the drum, as it will be later found during drum read, is significantly delayed. The optimal solution (for both ease of design and construction and for performance) is to arrange the magnetic circuit such that drum-read is in sync with the rest of the machine and to make drum-write the anomalous function. This solution requires that data to be written to the drum be provided to the write head early, and that all drum writes go through the accumulator, which provides an early data pickoff, AE.
There are two timing gates used for drum writes. Both begin at D2 (the bit cell before B0). The gate for the STO order (ordinary full word write) closes at B17 (bit cell before D1), writing all 18 bits of the word, and the gate for the STA (address modification) closes at B11, writing the LS 12 bits, preserving the 6 MS bits of the drum word.
TIMING TRACK CIRCUITRY
TIMING LOGIC AND BUFFERS
All machine timing is derived from a read-only, pre-recorded drum track and an optical or magnetic (TBD) index that determines a single absolute drum position, called SECTOR0. A lot of detail here will be worked out during the engineering phase, and is TBD here.
From the timing track is derived the basic bit cell clock, CLOCK, which will be provided in a number of finely-separated phases. CLOCK also drives two cascaded high-speed Burroughs Dekatrons, decimal switched-beam tubes that generate bit cell clocks B0 - B18 and D1 and D2. The dekatrons are redundantly reset to B0 by SECTOR0 every drum revolution [checking for B0 true at SECTOR0 is a good runtime health check].
While most subassemblies will have their own clock (and other) buffers, the timing section will provide heavily buffered and high-power outputs.
TELETYPE I/O
The teletype interface is exceedingly simple: an output flip-flop driven by the TTO order, and a Schmitt Trigger for the STZ/STNZ skip orders, which gate the teletype input signal to INCR-C on the C circulator, thereby implementing conditional execution. A big fat cathode follower drives the 60 mA loop current.
Universal Machine Preliminary Design -- Copyright Tom Jennings 1999-2002 Page 1