The source and/or object code (subsequently referred to as code)
presented in this and subsequent lessons is the property of
saxelec.com and owner who retains copyright protection. This code in
whole or in part is licensed only for single copy per user,
non-commercial use; except in the case of educational institutions,
where a license of single copy per student, only as part of course
curricula, is permitted.
Donald Gerard Polak (owner)
Division, at best, is a very time consuming instruction. It involves
add, multiply, and at the very least, subtract operations. Each of
these operations are not only time, but also logic intensive in
nature. Let us look back to elementary school
The process proceeds as follows:
The partial remainder is initially set to zero.
The next symbol from the dividend is shifted into the
The divisor is compared to the partial remainder. The number
of times it will "go" is shifted into the quotient and
also multiplied by the divisor. The product is subtracted
from the partial remainder to form a new partial remainder.
Steps two and three are repeated until all symbols in the
dividend are used.
The remainder is the partial remainder after all symbols from
the divisor are used.
So how does a computer divide? Believe it or not, many of
the algorithms use long division. Different methods only vary according
to the number of symbols shifted in each step, the use of pre-scaling,
and how the quotient symbols and partial remainder are calculated.
There are numerous articles available on the various hardware
division techniques, so I will not re-iterate them here. I will let
you do personal research if you are interested.
Our divider is a modified restoring divider circuit. Although this is
not the fastest way to divide, it uses less than 50 logic elements.
latency of the add back is avoided by using a mux to select between
an unmodified remainder, and the difference between the remainder
and the divisor. An eight-bit adder is incorporated into the
divider circuit to facilitate subtraction.
For our divider circuit to function properly, the divisor must be
a non-zero unsigned value. A zero check is incorporated into the
divider initialization. This means, the divide overflow flag is
adjusted during initialization, whereas, the rest of the flags are
adjusted at the end of the operation; provided that the divisor
The way our divider works is as follows:
During initialization, the two's complement of the accumulator
is placed in an 8-bit divisor register. The carry out of the
incrementer, used to calculate the two's complement value, is
written to the overflow flag. If the overflow is zero, a
four-bit down counter is loaded with an all ones' value.
Simultaneously, dvoe is set, and will remain set until the
clock cycle after the down counter decrements to zero. The
remainder is also set to zero.
While dvoe is active, the rightmost six bits of the remainder and
bit seven of the b register are concatenated and simultaneously
presented as an addend to the adder and as one input of the
restoring mux. The other addend of the adder is the divisor and
the other input of the restoring mux is the sum out of the
adder. The carry out of the adder is used to select the sum
input to the restoring mux and appended to the upper
six bits of the b register and the seven bits of the c register.
This value is written to the bc register pair, while the restoring
mux output is written to the remainder.
The signal, rmoe, is set for one clock cycle on the same clock
edge that dvoe is cleared. When rmoe, the remainder is written
to the accumulator, and the flags adjusted according to the
Divider Block Diagram
In order to avoid complicating our source mux, the quotient and
remainder are multiplexed in the divider. The quotient is used as
the source when dvoe is active. The remainder is presented when
rmoe is active.
We also discovered an error in the naming of the vf flag in several
modules, where it was named of. This error was corrected when we
simulated our code, using the techniques explained in
module 3, lesson 1.
The divide instruction requires 14 sysclk cycles. To accommodate this
requirement, a new bus cycle, which delays the bus ten sysclk cycles,
had to be added. This new cycle is similar to the delay cycle
used with the DADx instructions, except in the number of sysclk
cycles. The new bus request type had to be added to our parameter
Parameter File with Ten sysclk Cycle Delay Bus Request Type
Coding and Simulation
Coding our divider required some signal renames, additions, and
deletions. We obviously had to modify the microcode state machine
and the microcode state expander. One change in the microcode
state expander that might be obvious at first, is the generation
of the flag update signal. Since our flags aren't updated until
the end of a divide, fupe must be enabled by rmoe during divide
operations only. We also modified flags_r to enable the overflow
flag during divide initialization.
The flag logic, itself, had to be modified to generate the
correct flag values during divide. Finally, we touched up some
of our `ifdefs so that additional logic is not generated for
the 8085A, 8085B, and 8085C versions.
I also encountered an error that may require additional explanation.
When you mix blocking and non-blocking assignments, or even sometimes
when you are only using non-blocking assignments, a signal may
change at the clock edge during functional verification and the
changed value, rather than the original value used by the
simulator to evaluate another expression that is supposed to
change on the same clock edge. This is very simulator dependent and
can be difficult to track down. To correct this issue, you must
delay the culprit signal during functional simulation only. This
situation does not occur during timing simulation, nor does it
affect the actual operation of the hardware. (look at rmoe).
I came up with the following code, which is a complete
(albeit, not completely verified) S8085D microprocessor:
Project with DIV instruction implemented.
We need to modify our test bench to exercise the divide instruction :
Updated test bench for exercising the DIV instruction
When we run our simulation we get the following :
GTKWAVE Display of the DIV Instruction
What is the remainder after a long division?
How do different methods of long division on a computer vary?
What does our divisor have to be for our divider to function?
What kind of divider do we use?
How do we avoid add back latency?