Coding for Reuse Course - Module 2 Lesson 2
ALU Preliminary

In this lesson we will partition and develop a preliminary version of our arithmetic/logic unit for our microprocessor.

ALU Partitioning  

Before we begin partitioning the ALU, let's examine what it does and where its various pieces may fit. This exercise will require you to apply your knowledge of Boolean math, the design evolution covered in our earlier lessons, and general FPGA and CPLD architecture.

  1. We require an incrementer to increase the value of the selected source operand by one. THE INCREMENTER IS USED DURING THE EXECUTION OF EVERY INSTRUCTION. The reason is simple: the program counter is used to fetch instruction opcodes and progresses upward through memory. The stack pointer register must also progress downward for every byte "pushed" and upward for every byte "popped".

    The incrementer is used to convert one's complement values from the source mux into two's complement values. This conversion is only used during decrement, add, and subtract operations in the 8085 series processors.

    The incrementer is also used selectively during the 'add with carry' and 'subtract with borrow' instructions. In an 'add with carry' instruction, the source is incremented by one if the carry flag is set, relying on the associativity of addition: A + B + CY = A + (B + CY). During a 'subtract with borrow' operation, the incrementer is used if the carry flag is reset. This works because, in fixed-width binary arithmetic, adding the one's complement of the source subtracts the source plus one (A + ~B = A - B - 1), while adding its two's complement subtracts exactly the source (A + ~B + 1 = A - B).

    Naturally, the incrementer is also used to calculate the results of the 'increment' and 'decrement' instructions.

  2. The ALU must also contain the following functional blocks in order to calculate a result:

    1. A 16-bit adder. This adder is used to calculate the results of all instructions requiring add, subtract, compare and, as will be seen in later lessons, multiply and divide.

    2. A logic unit. This unit is used during 8-bit logic operations to calculate the 'AND', 'OR', and 'XOR' function logical results.

    3. A left/right, variable-width shift unit. This unit shifts or rotates 8-, 9-, 16-, or 17-bit operands left or right by one bit position. It must also select between 8- and 16-bit source widths and decide whether to include the carry bit in the source, the destination, or both.

    4. A multi-cycle multiply block.

    5. A multi-cycle divide block.

  3. All of these paths must then be recombined into a single result from the ALU.

  4. Lastly, the ALU must calculate a new set of flags from the operation. These new flags reflect the current state of the flags, the inputs to certain ALU operations, the outputs of many ALU instructions, the instruction itself, or a combination of these.
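The complement identities in item 1 can be checked with a small bit-level reference model. The sketch below is written in Python rather than HDL so it can be run directly; it assumes 8-bit operands, models result values only, and its function names are illustrative rather than part of the course code.

```python
MASK = 0xFF  # 8-bit operands for this sketch


def ones_complement(x):
    """Bitwise inversion, as produced by the source complement path."""
    return ~x & MASK


def adc_result(a, b, cy):
    """'Add with carry': increment the source only when carry is set, then add."""
    src = (b + cy) & MASK                      # A + B + CY = A + (B + CY)
    return (a + src) & MASK


def sbb_result(a, b, cy):
    """'Subtract with borrow': increment the complemented source only when carry is reset."""
    src = (ones_complement(b) + (1 - cy)) & MASK
    return (a + src) & MASK


print(hex(sbb_result(0x10, 0x01, 1)))  # 0xe, i.e. 0x10 - 0x01 - 1
```

Because the 8-bit mask silently discards any overflow out of the incrementer, this model ignores carries on purpose; the borrow problem that overflow creates is taken up under Flag Logic.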

Now let's create a preliminary block diagram for this ALU and see what we've got:

Preliminary ALU Architecture

We can see immediately from this diagram that the output mux should be moved and combined into the destination mux. Separate outputs will then be taken from the incrementer, adder, multiply unit, divide unit, shifter, and logic unit and routed into an enhanced destination mux. The destination mux itself must be considerably modified to accept and select among these multiple outputs. For best practice, it should also feed the initially selected destination results back to the ALU before any crossover:

Modified ALU Architecture

Modified destination mux

Preliminary ALU  

We don't have to design all the blocks of the ALU at once. For example, we would have enough infrastructure in place for more than 50% of the instructions if we coded just the incrementer and the increment logic. When we incorporate the adder and most of the flag logic, this figure rises to about 78%. When we add the logic unit code, the infrastructure will support more than 98% of the instructions. Of course, 'black box' modules will still be needed for the shifter, multiply unit, and divide unit to provide sources for their output signals and sinks for their input signals.

Let us conduct a detailed analysis of how best to implement these blocks before we begin coding.

1. Increment Logic

Increment logic determines whether or not to add one to the source value before propagating it to the adder or destination bus. For the increment, decrement, add, subtract, and compare instructions this is straightforward; the instruction sequencer simply enables the incrementer. The 'add with carry' and 'subtract with borrow' instructions complicate this logic slightly, because the current carry state must be accounted for. During an 'add with carry' instruction, the source is incremented if the current carry flag is set. During a 'subtract with borrow' instruction, the source is incremented if the current carry flag is reset. The ALU detects subtract operations when the enables for source complement (comse), increment (incse), and add (adde) are all active. 'Add with carry' and 'subtract with borrow' instructions are detected with a separate enable.
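The increment selection just described can be expressed as a small truth function. This Python sketch uses the enable names from the text (comse, incse, adde); `cye` is a hypothetical name for the separate 'add with carry'/'subtract with borrow' enable, and the sketch assumes the sequencer asserts comse, incse, and adde for 'subtract with borrow' just as it does for other subtracts.

```python
def increment_enable(incse, comse, adde, cye, cy):
    """Return 1 when the incrementer should add one to the selected source.

    incse, comse, adde: sequencer enables named in the lesson.
    cye: HYPOTHETICAL name for the 'add with carry' / 'subtract with borrow' enable.
    cy:  current carry flag.
    """
    subtract = comse & incse & adde    # subtract detected from the three enables
    if cye:
        # ADC: increment when carry is set; SBB: increment when carry is reset
        return cy ^ subtract
    return incse                       # all other instructions: sequencer decides


print(increment_enable(1, 1, 1, 1, 1))  # SBB entered with carry set -> 0
```

XORing the carry with the subtract term lets a single expression cover both carry-dependent instructions.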

2. The Incrementer

The incrementer is, in its purest form, just an adder that generates the result of 'source + 1'. However, adders are expensive in terms of logic resources, routing, timing, or all three. We can reduce this cost by taking advantage of the fact that one of the operands is fixed at one. If the incrementer is split into 4-bit segments, each sum bit depends on no more than four inputs and can therefore be evaluated by a single 4-input logic cell. Because of the constraints in Rule 3.13, it is better to code these outputs as logic equations, even though UDPs could be used. The mux is then split into 4-bit segments, with each segment controlled by the carry out of the previous segment and the increment enable signal. Each segment requires one additional logic cell to generate the carry into the next one:

Incrementer and Mux

We set the carry into the first stage to "1'b1" to allow "ince" to control the increment function. The 'AND' gate will be optimized out by most compilers.
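A bit-level Python model of the segmented incrementer may help show why each sum bit fits in a 4-input cell: within a segment, bit i depends only on bits 0 through i. This is a sketch, not the course's HDL; the function names are illustrative, and the carry into the first segment is modelled as 1'b1 ANDed with ince, as described above.

```python
def nibble_plus_one(n):
    """n + 1 within a 4-bit segment; each sum bit uses at most 4 input bits."""
    b0, b1, b2, b3 = n & 1, (n >> 1) & 1, (n >> 2) & 1, (n >> 3) & 1
    s0 = b0 ^ 1
    s1 = b1 ^ b0
    s2 = b2 ^ (b1 & b0)
    s3 = b3 ^ (b2 & b1 & b0)
    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3)


def incrementer16(src, ince):
    """16-bit incrementer from four 4-bit segments, each with its own mux.

    Returns the result and the carry out of each segment (bits 3, 7, 11, 15).
    """
    carry = 1 & ince              # carry into the first stage: 1'b1 AND ince
    result = 0
    carries = []
    for seg in range(4):
        n = (src >> (4 * seg)) & 0xF
        # segment mux: take n + 1 only when a carry arrives from the previous stage
        out = nibble_plus_one(n) if carry else n
        result |= out << (4 * seg)
        # carry-out cell: incoming carry AND all four segment bits set
        carry = carry & int(n == 0xF)
        carries.append(carry)
    return result, carries
```

For example, `incrementer16(0x00FF, 1)` returns `(0x0100, [1, 1, 0, 0])`, and with `ince` low the source passes through unchanged. The per-segment carries are the kind of incrementer flags the flag logic relies on later.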

3. The Adder

The adder in our microprocessor generates the sum of two 16-bit integers with no carry in, since that function has already been performed by our incrementer. There are several types of adder optimization, and the best choice is generally device-architecture dependent. Since adder logic is common to so many designs, the vendor has probably already selected the optimum adder for its particular device. Therefore, it is best to let the compiler choose the implementation from a standard 'a + b' statement.

However, we do need to split our adder into 4-bit chunks for a simple reason: vendor libraries do not generally make internal carry-out terms available, and our microprocessor requires the carry terms from the 4-, 8-, and 16-bit additions for flag generation.
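The chunked adder can be modelled the same way. In this Python sketch, each 4-bit chunk stands in for one compiler-inferred addition with a 5-bit result, so the intermediate carries stay visible; the names are illustrative, and the carry index convention (0 for the 4-bit carry, 1 for the 8-bit carry, 3 for the 16-bit carry) belongs to this sketch.

```python
def adder16(a, b):
    """Sum of two 16-bit values built from four 4-bit chunks, with no carry in.

    Returns the 16-bit sum and the carry out of each chunk.
    """
    total = 0
    carry = 0                     # no carry in: the incrementer already handled it
    carries = []
    for seg in range(4):
        # one chunk: 4-bit 'a + b + cin' with a 5-bit result
        s = ((a >> (4 * seg)) & 0xF) + ((b >> (4 * seg)) & 0xF) + carry
        total |= (s & 0xF) << (4 * seg)
        carry = s >> 4
        carries.append(carry)
    return total, carries
```

`adder16(0x0F0F, 0x0101)` returns `(0x1010, [1, 0, 1, 0])`: the carries out of bits 3 and 11 are exposed even though no carry leaves bit 15.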

4. The Logic Unit

In our microprocessor, the logic unit is only 8 bits wide. Moreover, each output bit requires only two operand values and performs one of four functions: no-operation, OR, AND, or EXCLUSIVE OR. Two operand bits plus two bits to select among four functions make four inputs, so each bit requires only one 4-input logic cell. Because of the constraints in Rule 3.13, it is better to code these outputs as logic equations, even though UDPs could be used.
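The one-cell-per-bit argument can be made concrete by writing the logic unit as the 16-entry truth table a 4-input cell would hold. The 2-bit function encoding below is hypothetical (the lesson does not give one), as is the choice to pass operand `a` through on no-operation.

```python
# HYPOTHETICAL 2-bit function-select encoding; the lesson does not specify one.
NOP, OR, AND, XOR = 0, 1, 2, 3


def logic_cell(a, b, fsel):
    """One output bit: two operand bits plus a 2-bit select, four inputs in all."""
    if fsel == OR:
        return a | b
    if fsel == AND:
        return a & b
    if fsel == XOR:
        return a ^ b
    return a                       # NOP: pass operand a through (assumption)


# The same function as the 16-entry table a 4-input cell would hold;
# table index = a | (b << 1) | (fsel << 2).
LUT = [logic_cell(i & 1, (i >> 1) & 1, i >> 2) for i in range(16)]


def logic_unit8(a, b, fsel):
    """8-bit logic unit: one table lookup (one logic cell) per output bit."""
    result = 0
    for bit in range(8):
        index = ((a >> bit) & 1) | (((b >> bit) & 1) << 1) | (fsel << 2)
        result |= LUT[index] << bit
    return result
```

`logic_unit8(0xF0, 0x3C, AND)` yields 0x30, OR yields 0xFC, and XOR yields 0xCC; eight lookups into the same table correspond to the eight logic cells.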

5. Flag Logic

Each flag is a 1-bit value that indicates a particular attribute of an operation: its inputs, its result, or a combination of its inputs and result. A given instruction may or may not use a particular flag in calculating its output value, and may or may not alter it. There are 8 distinct flags in our microprocessor.

Calculating borrows from subtract operations requires particular attention to detail. Subtraction is accomplished by one's complement addition in a 'subtract with borrow' operation entered with the carry (borrow) flag set, and by two's complement addition otherwise. By the time it reaches the adder, the subtrahend (source) has already been converted into its one's or two's complement form by source selection, the increment logic, and the incrementer. It would seem, then, that the borrow during a subtract operation should simply be the complement of the adder's carry out. This holds until you consider a source operand of zero on the two's complement path (a plain subtract, or a 'subtract with borrow' entered with no borrow). In that case, the adder will not generate a carry, and complementing it would indicate a borrow where none occurred.

Rather than adding logic to detect zero subtrahend values, we can take advantage of the flags the incrementer already generates: a zero operand produces a carry (overflow) out of the incrementer during two's complement conversion, which identifies exactly the case above. 'Subtract with borrow' operations entered with a borrow do not use the incrementer, so no incrementer carry is produced, and the carry out of the adder is simply complemented to indicate a borrow. These operations can be detected using the same signals that control increment selection.
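The borrow correction can be checked exhaustively with an 8-bit Python model. This is one sketch of the idea, assuming the borrow is taken as the complement of (adder carry OR incrementer carry); the function name and return layout are illustrative, and the course's flag logic may be organized differently.

```python
def subtract8(a, b, sbb=0, cy=0):
    """8-bit a - b (or a - b - cy for 'subtract with borrow') by complement addition.

    Returns (difference, borrow, naive_borrow); naive_borrow only complements
    the adder carry and is wrong for a zero subtrahend on the two's complement path.
    """
    ince = (1 - cy) if sbb else 1          # increment logic for SUB / SBB
    t = (~b & 0xFF) + ince                 # source complement, then incrementer
    inc_carry = t >> 8                     # set only when b == 0 and ince == 1
    s = a + (t & 0xFF)                     # adder, no carry in
    add_carry = s >> 8
    naive_borrow = 1 - add_carry
    borrow = 1 - (add_carry | inc_carry)   # incrementer carry cancels the false borrow
    return s & 0xFF, borrow, naive_borrow
```

`subtract8(5, 0)` returns `(5, 0, 1)`: complementing the adder carry alone reports a borrow, while folding in the incrementer carry gives the correct answer.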

The same borrow detection logic used to complement the carry is also used by the auxiliary carry and overflow flags, albeit with different incrementer flags.

6. The Remaining ALU Blocks

If we want to be able to compile and simulate our code, we must provide sources for the outputs of the various un-coded blocks and sinks for their inputs. Therefore, each remaining block should declare its outputs as registers, use an 'initial' statement to set a default value for these outputs, and use the block's dedicated inputs to modify those default values.

Preliminary ALU Coding  

Now let's modify the destination mux and code our preliminary ALU:

Skeleton with Modified Destination Mux and Preliminary ALU Added

To prevent errors or excessive warnings when compiling different models, conditional compilation directives are used extensively in the flag logic declarations, instantiations, and code.

Review Questions

  1. Why do we split the adder into 4-bit chunks?

  2. How did we modify the destination mux and why?

  3. What does the increment logic do?

  4. Why isn't the incrementer implemented as an adder?

  5. Why don't we use UDPs for the incrementer and logic unit?

  6. What functions are performed by the logic unit?

  7. How do we prepare our un-coded units for compile and simulation?

  8. Why do we partition the ALU?

  9. When is the incrementer used?

  10. How many 4-input logic cells does our logic unit use?