Coding for Reuse Course - Module 3 Lesson 6
Hardware Division

In this lesson we will study how to perform hardware division, and different hardware division algorithms

License Terms  

The source and/or object code (subsequently referred to as code) presented in this and subsequent lessons is the property of and owner who retains copyright protection. This code in whole or in part is licensed only for single copy per user, non-commercial use; except in the case of educational institutions, where a license of single copy per student, only as part of course curricula, is permitted.

Donald Gerard Polak (owner)


Division, at best, is a very time consuming instruction. It involves add, multiply, and at the very least, subtract operations. Each of these operations are not only time, but also logic intensive in nature. Let us look back to elementary school

Long Division

The process proceeds as follows:

  1. The partial remainder is initially set to zero.

  2. The next symbol from the dividend is shifted into the partial remainder.

  3. The divisor is compared to the partial remainder. The number of times it will "go" is shifted into the quotient and also multiplied by the divisor. The product is subtracted from the partial remainder to form a new partial remainder.

  4. Steps two and three are repeated until all symbols in the dividend are used.

  5. The remainder is the partial remainder after all symbols from the divisor are used.

Computer Divide  

So how does a computer divide? Believe it or not, many of the algorithms use long division. Different methods only vary according to the number of symbols shifted in each step, the use of pre-scaling, and how the quotient symbols and partial remainder are calculated.

There are numerous articles available on the various hardware division techniques, so I will not re-iterate them here. I will let you do personal research if you are interested.

Our Divider  

Our divider is a modified restoring divider circuit. Although this is not the fastest way to divide, it uses less than 50 logic elements. latency of the add back is avoided by using a mux to select between an unmodified remainder, and the difference between the remainder and the divisor. An eight-bit adder is incorporated into the divider circuit to facilitate subtraction.

For our divider circuit to function properly, the divisor must be a non-zero unsigned value. A zero check is incorporated into the divider initialization. This means, the divide overflow flag is adjusted during initialization, whereas, the rest of the flags are adjusted at the end of the operation; provided that the divisor was non-zero.

The way our divider works is as follows:

  1. During initialization, the two's complement of the accumulator is placed in an 8-bit divisor register. The carry out of the incrementer, used to calculate the two's complement value, is written to the overflow flag. If the overflow is zero, a four-bit down counter is loaded with an all ones' value. Simultaneously, dvoe is set, and will remain set until the clock cycle after the down counter decrements to zero. The remainder is also set to zero.

  2. While dvoe is active, the rightmost six bits of the remainder and bit seven of the b register are concatenated and simultaneously presented as an addend to the adder and as one input of the restoring mux. The other addend of the adder is the divisor and the other input of the restoring mux is the sum out of the adder. The carry out of the adder is used to select the sum input to the restoring mux and appended to the upper six bits of the b register and the seven bits of the c register. This value is written to the bc register pair, while the restoring mux output is written to the remainder.

  3. The signal, rmoe, is set for one clock cycle on the same clock edge that dvoe is cleared. When rmoe, the remainder is written to the accumulator, and the flags adjusted according to the remainder value.

Divider Block Diagram

In order to avoid complicating our source mux, the quotient and remainder are multiplexed in the divider. The quotient is used as the source when dvoe is active. The remainder is presented when rmoe is active.

We also discovered an error in the naming of the vf flag in several modules, where it was named of. This error was corrected when we simulated our code, using the techniques explained in module 3, lesson 1.

The divide instruction requires 14 sysclk cycles. To accommodate this requirement, a new bus cycle, which delays the bus ten sysclk cycles, had to be added. This new cycle is similar to the delay cycle used with the DADx instructions, except in the number of sysclk cycles. The new bus request type had to be added to our parameter file:

Parameter File with Ten sysclk Cycle Delay Bus Request Type

Coding and Simulation  

Coding our divider required some signal renames, additions, and deletions. We obviously had to modify the microcode state machine and the microcode state expander. One change in the microcode state expander that might be obvious at first, is the generation of the flag update signal. Since our flags aren't updated until the end of a divide, fupe must be enabled by rmoe during divide operations only. We also modified flags_r to enable the overflow flag during divide initialization.

The flag logic, itself, had to be modified to generate the correct flag values during divide. Finally, we touched up some of our `ifdefs so that additional logic is not generated for the 8085A, 8085B, and 8085C versions.

I also encountered an error that may require additional explanation. When you mix blocking and non-blocking assignments, or even sometimes when you are only using non-blocking assignments, a signal may change at the clock edge during functional verification and the changed value, rather than the original value used by the simulator to evaluate another expression that is supposed to change on the same clock edge. This is very simulator dependent and can be difficult to track down. To correct this issue, you must delay the culprit signal during functional simulation only. This situation does not occur during timing simulation, nor does it affect the actual operation of the hardware. (look at rmoe).

I came up with the following code, which is a complete (albeit, not completely verified) S8085D microprocessor:

Project with DIV instruction implemented.

We need to modify our test bench to exercise the divide instruction :

Updated test bench for exercising the DIV instruction

When we run our simulation we get the following :

GTKWAVE Display of the DIV Instruction

  1. What is the remainder after a long division?

  2. How do different methods of long division on a computer vary?

  3. What does our divisor have to be for our divider to function?

  4. What kind of divider do we use?

  5. How do we avoid add back latency?