Before we begin partitioning the ALU, let's examine what it does and
where various pieces may fit. This exercise will require you to apply
your knowledge of Boolean math, our earlier lessons and evolution, and
general FPGA and CPLD knowledge.

We require an incrementer to increase the value of the selected
source operand by one. THE INCREMENTER IS USED DURING THE
EXECUTION OF EVERY INSTRUCTION. The reason for this statement is
simple. The program counter is used to fetch instruction opcodes
and progresses upward. The stack pointer register must also
progress downward for every byte "pushed" and upward for
every byte "popped".

The incrementer is also used to convert one's complement values from
the source mux into two's complement values. This conversion is only
used during decrement, add, and subtract operations in the 8085 series
processors.

The incrementer is also used selectively during the 'add with
carry' and 'subtract with borrow' instructions. In an 'add with
carry' instruction, the source is incremented by one if the carry
flag is set, using the distribution theorem. During a 'subtract with
borrow' operation, the incrementer is used if the carry flag is
reset. In Boolean math, adding a one's complement value effectively
subtracts the source plus one (A + ~B = A - B - 1), while adding a
two's complement value subtracts exactly the source
(A + (~B + 1) = A - B), which is compatible with the distribution
theorem.

Needless to say, the incrementer is also used to calculate the
results of 'increment' and 'decrement' instructions.
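The complement arithmetic described above can be checked with a small behavioral model. This is only a sketch in Python, not HDL and not the lesson's own code; the 8-bit width and all function names are assumptions chosen for illustration.

```python
MASK8 = 0xFF  # 8-bit operand width, assumed for this illustration


def ones_complement(value):
    """Invert every bit of an 8-bit value."""
    return ~value & MASK8


def increment(value):
    """Model the incrementer: add one, wrapping at 8 bits."""
    return (value + 1) & MASK8


def add8(a, b):
    """Model an 8-bit adder with no carry in; the carry out is discarded."""
    return (a + b) & MASK8


def subtract(a, b):
    """a - b: add the two's complement (one's complement, then increment)."""
    return add8(a, increment(ones_complement(b)))


def subtract_borrow_set(a, b):
    """a - b - 1: add the one's complement only (incrementer not used)."""
    return add8(a, ones_complement(b))


def add_with_carry(a, b, carry):
    """a + b + carry: the source is incremented only when carry is set."""
    source = increment(b) if carry else b
    return add8(a, source)
```

For example, `subtract(0x50, 0x20)` yields `0x30`, while `subtract_borrow_set(0x50, 0x20)` yields `0x2F`, one less, because the increment step was skipped.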

The ALU must also contain the following functional
blocks in order to calculate a result:

A 16-bit adder. This adder is used to calculate the results of
all instructions requiring add, subtract, compare and, as will
be seen in later lessons, multiply and divide.

A logic unit. This unit is used during 8-bit logic operations
to calculate the 'AND', 'OR', and 'XOR' function logical
results.

A left/right, variable width shift unit. This unit is able to
shift or rotate 8-, 9-, 16-, or 17-bit operands left or right by
one bit position. It must also select between 8- and 16-bit
source widths and whether or not to include the carry bit in
the source, destination, or both.

A multicycle multiply block.

A multicycle divide block.

All of these various paths must then be recombined into a unified
result from the ALU.

Lastly the ALU must also calculate a new set of flags from the
operation. These new flags reflect the current state of the flags,
the inputs to certain ALU operations, the outputs from many ALU
instructions, the instruction itself, or a combination of these.

Now let's create a preliminary block diagram for this
ALU and see what we've got:
[Figure: Preliminary ALU Architecture]
We can see immediately from this diagram that the output mux should be
moved and combined into the destination mux. Then separate outputs
will be taken from the incrementer, adder, multiply unit, divide unit,
shifter, and the logic unit and routed into an enhanced destination mux.
The destination mux, itself, must be considerably modified to
incorporate these changes. It must be capable of accepting and
selecting these multiple outputs. For best practice, it should also
feed back the initially selected destination results to the ALU before
any crossover:
[Figure: Modified ALU Architecture]
[Figure: Modified destination mux]

We don't have to design all the blocks of the ALU at once. For
example, we could have enough infrastructure in place for more than
50% of the instructions if we just code the increment logic and the
incrementer. When we incorporate the adder and most of the flag
logic, this figure becomes more like 78%. When we introduce the logic
unit code, this infrastructure will support more than 98% of the
instructions. Of course a 'black box' module will be needed for the
shifter, multiply unit, and divide unit to provide a source for output
signals and a sink for input signals.

Let us conduct a detailed analysis of how best to implement these four
blocks before we begin coding.
1. Increment Logic


Increment logic determines whether or not to add one to the
source value before propagating it to the adder or destination
bus. For the increment, decrement, add, subtract, and compare
instructions this is straightforward; the instruction sequencer
simply enables the incrementer. The 'add with carry' and
'subtract with borrow' instructions change this logic a little.
The current carry state must be accounted for. During an
'add with carry' instruction, the source is incremented if the
current carry flag is set. During a 'subtract with borrow'
instruction, the source is incremented when the current carry
flag is reset. Subtract operations are detected by the ALU when
the enables for source complement (comse), increment (incse),
and add (adde) are active. 'Add with carry' and 'subtract with
borrow' instructions are detected with a separate enable.
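The selection rule described in this section can be written out as a small truth function. The following Python sketch is a behavioral model only: `incse` and `comse` come from the text, while `carry_op` is a hypothetical name for the separate 'add with carry' / 'subtract with borrow' enable, and using `comse` alone to tell the two carry operations apart is a simplifying assumption.

```python
def increment_select(incse, comse, carry_op, carry_flag):
    """Return True when the incrementer should add one to the source.

    incse      -- increment enable from the instruction sequencer
    comse      -- source complement enable (active for subtract-type operations)
    carry_op   -- hypothetical enable for 'add with carry' and
                  'subtract with borrow' instructions
    carry_flag -- current state of the carry flag
    """
    if carry_op:
        if comse:
            # Subtract with borrow: increment only when carry is reset.
            return not carry_flag
        # Add with carry: increment only when carry is set.
        return bool(carry_flag)
    # All other instructions: the sequencer decides directly.
    return bool(incse)
```

With this model, a plain subtract (`incse` and `comse` active, `carry_op` inactive) always increments, while a 'subtract with borrow' increments only when the carry flag is clear.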

2. The Incrementer


The incrementer is, in its purest form, just an adder that
generates the result of 'source + 1'. However, adders are
expensive in terms of logic or resource usage, routing, time, or
all of these. But we can take advantage of the fact that
one of the operand values is fixed at one and reduce the cost.
If the incrementer is split into 4-bit elements, each individual
bit can then be evaluated by a single 4-input logic cell.
Because of the constraints in Rule 3.13, although UDPs could be
used for these outputs, it is better to use the logic equations.
The mux is then split into 4-bit segments, with each segment
controlled by the carry out from the previous cell and the
increment enable signal. An additional logic cell is required by
each chunk to generate the carry to the next one:
[Figure: Incrementer and Mux]
We set the carry into the first stage to "1'b1" to
allow "ince" to control the increment function. The
'AND' gate will be optimized out by most compilers.
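To make the segmented structure concrete, here is a behavioral sketch in Python (not HDL). It assumes a 16-bit source split into four 4-bit segments, treats "ince" as a single-bit enable, and uses hypothetical names throughout; it is one reading of the description above, not the lesson's actual design.

```python
def incrementer16(source, ince):
    """Model a 16-bit incrementer built from four 4-bit segments.

    Each segment's mux selects 'nibble + 1' when (carry_in AND ince) is
    true, otherwise it passes the nibble through unchanged. A separate
    term per segment generates the carry into the next segment.
    Returns (result, overflow).
    """
    carry = 1  # carry into the first stage is tied to 1
    result = 0
    for stage in range(4):
        shift = 4 * stage
        nibble = (source >> shift) & 0xF
        plus_one = (nibble + 1) & 0xF
        select = carry & ince  # in the first stage this reduces to ince
        result |= (plus_one if select else nibble) << shift
        # Carry ripples on only when this nibble was all ones.
        carry = carry & (1 if nibble == 0xF else 0)
    overflow = carry & ince  # set only when 0xFFFF is incremented
    return result, overflow
```

For instance, `incrementer16(0x00FF, 1)` returns `(0x0100, 0)`, and `incrementer16(0x1234, 0)` passes the source through unchanged.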

3. The Adder


The adder in our microprocessor generates the sum of two
sixteen-bit integers, with no carry in, since that function was
already accomplished by our incrementer. There are several
different types of adder optimization that are generally device
architecture dependent. Since adder logic is prevalent
across a series of designs, the vendor probably has already
selected the optimum adder for his particular device. Therefore,
it is best to use the compiler to choose the method with a
standard 'a + b' type statement.
However, we do need to split our adder into 4-bit chunks for a
simple reason. Vendor libraries do not generally make internal
carry out terms available. Our microprocessor requires the carry
terms from the 4-, 8-, and 16-bit adders for use in flag
generation.
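The reason for chunking can be illustrated with a short behavioral model. This Python sketch assumes four 4-bit stages with a ripple carry between them, which is only one possible arrangement; the function name and the return layout are hypothetical.

```python
def add16_chunked(a, b):
    """Add two 16-bit values in 4-bit chunks, exposing intermediate carries.

    Returns (sum, carry4, carry8, carry16), where carryN is the carry
    out of bit N-1. There is no carry in to the first chunk.
    """
    carry = 0
    result = 0
    carries = []
    for stage in range(4):
        shift = 4 * stage
        # Each chunk is a plain 'a + b' style addition on 4 bits.
        total = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        result |= (total & 0xF) << shift
        carry = total >> 4
        carries.append(carry)
    return result, carries[0], carries[1], carries[3]
```

For example, `add16_chunked(0x00FF, 0x0001)` returns `(0x0100, 1, 1, 0)`: the 4-bit and 8-bit carries are set, but the 16-bit carry is not. A single monolithic 16-bit addition would only expose the last of these.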

4. The Logic Unit


In our microprocessor, the logic unit is only 8 bits wide.
Moreover, each output bit requires two input values and can
perform one of four functions: no-operation, OR, AND, or
EXCLUSIVE OR. Two operand bits plus two function-select bits make
four inputs, so each output bit requires only one 4-input logic
cell. Because of the constraints in Rule 3.13, although UDPs could
be used for these outputs, it is better to
use the logic equations.
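The "one logic cell per bit" claim can be illustrated by building the 16-entry truth table that a single 4-input cell would hold. This Python sketch is a behavioral model only. The function encoding (0 to 3) is hypothetical, and so is the choice that no-operation passes the first operand through; the text does not specify either.

```python
# Hypothetical function-select encoding; the real encoding is not given.
NOP, OR, AND, XOR = range(4)


def logic_bit(a, b, func):
    """Logic equation for one output bit."""
    if func == OR:
        return a | b
    if func == AND:
        return a & b
    if func == XOR:
        return a ^ b
    return a  # assumption: no-operation passes the first operand through


# Truth table of one 4-input cell: index = {func[1:0], a, b}.
LUT = [logic_bit((i >> 1) & 1, i & 1, i >> 2) for i in range(16)]


def logic_unit8(a, b, func):
    """Apply the same 4-input cell to each of the 8 bit positions."""
    result = 0
    for bit in range(8):
        index = (func << 2) | (((a >> bit) & 1) << 1) | ((b >> bit) & 1)
        result |= LUT[index] << bit
    return result
```

With these assumptions, `logic_unit8(0xF0, 0x3C, AND)` gives `0x30` and `logic_unit8(0xF0, 0x3C, XOR)` gives `0xCC`.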

5. Flag Logic


Each flag is a 1-bit value that indicates a particular attribute
of an operation, inputs to an operation, result of that
operation, or a combination of inputs to that operation and its
result. Each distinct flag may or may not be used in calculating
the output value, or be altered by any particular instruction.
There are 8 distinct flags in our microprocessor.

Calculating borrows from subtract operations requires particular
attention to detail. Subtracts are accomplished by either one's
complement addition in a subtract with borrow operation or two's
complement addition otherwise. The subtrahend (source) has
already been converted into a one's complement value by source
selection, increment logic, and the incrementer. It would
seem then that any carry out of the adder should be the
complement of a borrow during a subtract operation. This is
indeed true until you consider the case of the source operand or
the source operand with borrow equaling zero. In these
situations, the adder will not generate a carry, and the
complement would indicate a borrow where none occurred.

Rather than adding logic to detect zero subtrahend values, we
can take advantage of the flags generated by the incrementer,
which would be used in both exceptions. Operands with zero value
will generate an overflow out of the incrementer upon two's
complement conversion. Subtract with borrow operations where a
borrow occurs do not use the incrementer, but would also require
the carry out of the adder to undergo a one's complement in
order to indicate a borrow. These operations can be detected by
using the same signals that control increment selection.

The same borrow detection logic used to complement the carry is
also used by the auxiliary carry and overflow flags, albeit with
different incrementer flags.
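The zero-subtrahend exception is easiest to see in a behavioral model. The Python sketch below assumes 8-bit operands and hypothetical names, and it expresses the borrow as "no carry out of the adder and no overflow out of the incrementer". That is one way to read the description above, not necessarily the exact gating used in the design.

```python
def subtract_flags(a, b, borrow_in=0, with_borrow=False):
    """Model an 8-bit subtract and derive the borrow flag.

    Returns (result, borrow). The incrementer is skipped only for a
    'subtract with borrow' whose incoming borrow (carry flag) is set.
    """
    complemented = ~b & 0xFF  # one's complement from source selection
    if with_borrow and borrow_in:
        # One's complement addition: a + ~b = a - b - 1.
        source = complemented
        inc_overflow = 0
    else:
        # Two's complement conversion through the incrementer.
        incremented = complemented + 1
        inc_overflow = incremented >> 8  # set only when b == 0
        source = incremented & 0xFF
    total = a + source
    carry_out = total >> 8
    # A borrow occurred when neither the adder nor the incrementer
    # produced a carry.
    borrow = 0 if (carry_out or inc_overflow) else 1
    return total & 0xFF, borrow
```

For example, `subtract_flags(0x05, 0x00)` returns `(0x05, 0)`: the adder produces no carry, but the incrementer overflow shows that no borrow occurred. Without that term, the complemented carry would wrongly report a borrow.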

6. The Remaining ALU Blocks


If we want to be able to compile and simulate our code, we must
provide sources for our outputs from the various uncoded blocks
and sinks for their inputs. Therefore, the remaining blocks
should declare their respective outputs as registers, use an
'initial' statement to set a default value for these outputs,
and use the dedicated inputs to this block to modify that
default value.

