Coding for Reuse Course - Module 2 Lesson 4
Iterative Compilation and the Instruction Decode and Sequencing Block, Part 1

In this lesson we will study iterative compilation and the instruction decode and sequencing block. At the conclusion of the second part of this lesson you will have a working microprocessor.

Iterative Compilation  

Iterative compilation is a debugging method used during the earliest phases of project development. It lets you remove basic errors while the project is still being coded, rather than trying to address them all at once after coding is complete. Iterative compilation is especially important in complex designs that require many semi-dependent blocks.

Iterative compilation means that you compile projects, individual modules, or partial modules before the project is complete, so that you can identify and remove syntax errors early. Before iterative compilation can be attempted, the target code should be in a stand-alone state. This means that every input and output of the lower-level modules must either be connected to a driver or assigned a fixed value.

Implementing Iterative Compilation  

Iterative compilation frequently begins at the module level. Larger modules, theoretical proofs, and library modules are isolated and compiled as independent projects, and any coding errors are removed. Library modules and theoretical proofs are often simulated as well.

When you are creating larger projects, you may not wish to design all of the applicable modules or functions in the first implementation. However, you must still supply all of the signals that your architectural development and design partitioning called for. For signals that originate from un-coded modules, you can do this by substituting fixed values for those signals where the modules are instantiated. Functions, or in our case microprocessor instructions, that have not been coded yet are handled by the 'default' case in state machines or by fixed values returned from functions. These techniques allow you to compile and simulate partial code.
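A minimal sketch of the 'default' technique, assuming a hypothetical decoder in which only two opcodes have been coded so far. Every other opcode falls through to the `default` branch and is treated as a no-operation, and a stub function returns a fixed value for a function (here DAA, opcode 27h) that has not been written yet. The module, port, parameter, and function names (`decode_stub`, `next_state`, `S_FETCH`, `daa_adjust`, and so on) are illustrative, not part of the course project.

```verilog
// Hypothetical partial decoder: only a few opcodes are coded so far.
module decode_stub (
    input  wire [7:0] opcode,      // latched instruction
    input  wire [7:0] acc,         // accumulator value
    output reg  [3:0] next_state,  // coded next state
    output wire [7:0] daa_result   // result of the (not yet coded) DAA function
);
    // Hypothetical state codes
    parameter S_FETCH = 4'd0;
    parameter S_MOV   = 4'd1;
    parameter S_HALT  = 4'd2;

    // Stub function: returns a fixed value until DAA is actually implemented.
    // The unused input will produce a warning, which is expected at this stage.
    function [7:0] daa_adjust;
        input [7:0] value;
        begin
            daa_adjust = 8'h00;    // fixed placeholder value
        end
    endfunction

    assign daa_result = daa_adjust(acc);

    always @(*) begin
        case (opcode)
            8'h41:   next_state = S_MOV;    // MOV B,C
            8'h76:   next_state = S_HALT;   // HLT
            default: next_state = S_FETCH;  // uncoded instructions act as NOPs for now
        endcase
    end
endmodule
```

As each instruction is coded, its opcode is added as an explicit case item and removed from the catch-all behavior.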

Please note that although these techniques let you remove the errors that would prevent you from compiling and simulating your code, warnings about trimmed logic will abound. You should still investigate these warnings to ensure that they originate from the use of parameters in the instantiations and not from some other condition, such as a misspelled signal name.

For signals that originate from modules that you have not coded yet, this is a relatively easy exercise. One way to do this is to declare a parameter in the interconnect block whose name matches the signal name and use it in the instantiation. If that signal name has already been declared as a wire in the interconnect block or as an output of another block, those declarations must be removed or, preferably, commented out (place '//' at the beginning of the declaration line).

Example:
 

// wire [15:0]  daaval;
    . . . 
parameter daaval = 16'b0000000000000000;

Another way to accomplish this is to declare all outputs of the un-coded blocks as registers and use an initial block to set their values. This is the method I prefer, because it lets you set up the interconnect topology correctly.
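A sketch of the register-and-initial-block approach, assuming a hypothetical un-coded interrupt block with illustrative port names (`int_stub`, `int_pending`, `int_vector`, `sid_bit`). The ports already match the planned interconnect, so the block can be wired into the interconnect block now and replaced later. Many FPGA synthesis tools honor `initial` values as power-up states, but not every target does; that is acceptable here because the stub is temporary.

```verilog
// Hypothetical placeholder for a block that has not been coded yet.
// Outputs are declared as registers and given fixed values in an
// initial block; the unused inputs will generate warnings.
module int_stub (
    input  wire        clk,
    input  wire        rst,
    output reg         int_pending,  // no interrupt ever pending
    output reg  [15:0] int_vector,   // fixed vector address
    output reg         sid_bit       // serial input data, held low
);
    initial begin
        int_pending = 1'b0;
        int_vector  = 16'h0000;
        sid_bit     = 1'b0;
    end
endmodule
```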

Housekeeping  

There are some functions that we choose not to address in this early phase of development. Notable examples include the "DAA" instruction and interrupt processing. Because of its simplicity, the serial port may be addressed, although some of its inputs may not yet be resolved.

The serial port relies on the interrupt mask instructions for operation. If bit 6 of the accumulator is set during a "SET INTERRUPT MASK" (SIM) instruction, the state of bit 7 of the accumulator is latched as the "sod_pad" output. The state of the "sid" input is latched into bit 7 of the accumulator during a "READ INTERRUPT MASK" (RIM) instruction. Therefore, "sid" should be an input to the interrupt block rather than to the serial port block, or the serial port block should be moved into the interrupt block hierarchy.
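A minimal sketch of the SIM/RIM serial behavior described above, assuming hypothetical one-cycle strobes `sim_exec` and `rim_exec` from the sequencer, an accumulator bus `acc`, and an output `rim_data` feeding the accumulator; only `sod_pad` and `sid` are signal names from the text. The interrupt mask and pending bits that RIM also returns are not modeled. Because both signals are handled by the interrupt mask instructions, the logic sits naturally in the interrupt block.

```verilog
module serial_sim_rim (
    input  wire       clk,
    input  wire       rst,
    input  wire       sim_exec,  // hypothetical strobe: SIM executing
    input  wire       rim_exec,  // hypothetical strobe: RIM executing
    input  wire [7:0] acc,       // accumulator contents
    input  wire       sid,       // serial input pin
    output reg        sod_pad,   // serial output pin
    output wire [7:0] rim_data   // value to be loaded into the accumulator by RIM
);
    // SIM: if accumulator bit 6 is set, latch accumulator bit 7 to sod_pad.
    always @(posedge clk) begin
        if (rst)
            sod_pad <= 1'b0;     // cleared on reset in this sketch
        else if (sim_exec && acc[6])
            sod_pad <= acc[7];
    end

    // RIM: sid appears in bit 7; bits 6:0 (mask/pending) read as zero here.
    assign rim_data = rim_exec ? {sid, 7'b0000000} : 8'h00;
endmodule
```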

The bus request type parameters also need attention. They are used in more than one module and, according to rule 6.6, should be moved from bus_r to a separate parameter file. They can be moved to S8085d_p.v as follows:

Modified Parameter File

There is another vital function, common to all processors, that we have not yet addressed. It is particularly important in the 8085 architecture and contributed to the processor's use in projects such as the 'MARS ROVER'. In the 8085 architecture, the 'HALT' state essentially powers down the microprocessor until a 'RESET' or the next interrupt occurs. Since most of the power consumed by FPGAs and CPLDs originates in the I/O pads, this condition may be emulated by forcing all non-essential outputs to high impedance.
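A sketch of this pad-level emulation, assuming a hypothetical `halt` signal and illustrative internal and pad signal names. Which outputs count as non-essential is a design decision; here the address bus and the read and write strobes are simply released to high impedance while halted and driven normally otherwise.

```verilog
module halt_pads (
    input  wire        halt,      // 1 while in the HALT state (hypothetical)
    input  wire [15:0] addr_int,  // internal address bus
    input  wire        rd_n_int,  // internal read strobe
    input  wire        wr_n_int,  // internal write strobe
    output wire [15:0] addr_pad,
    output wire        rd_n_pad,
    output wire        wr_n_pad
);
    // Non-essential outputs float while halted, so their pad drivers stop
    // switching; otherwise they pass the internal signals through.
    assign addr_pad = halt ? 16'bz : addr_int;
    assign rd_n_pad = halt ? 1'bz  : rd_n_int;
    assign wr_n_pad = halt ? 1'bz  : wr_n_int;
endmodule
```

Floating strobes typically need external pull-up resistors so that attached devices still see them as inactive during HALT.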

The 'HALT' state is initiated by a processor instruction but is functionally similar to the 'RESET' state, except that it does not alter any internal registers or the processor state (i.e., when the 'HALT' state is exited, the processor resumes where it left off). The 'HALT' output is used only by the top-level module and by the instruction decode and sequencing block, so one could be tempted to locate it in the instruction decode and sequencing block. However, by applying rule 3.11, we can see that the 'HALT' function fits best either in the 'RESET' block or in an independent module.
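A minimal sketch of a HALT register placed in the reset block, as discussed above, assuming hypothetical inputs `hlt_exec` (a strobe from the sequencer when the HLT instruction executes) and `int_pending` (from the interrupt block). Like reset, it only gates activity; it touches no processor registers, so execution resumes where it stopped. The priority given to a pending interrupt over a simultaneous HLT strobe is an assumption of this sketch.

```verilog
// Hypothetical HALT flag living in the reset block.
module reset_halt (
    input  wire clk,
    input  wire rst,          // synchronized reset
    input  wire hlt_exec,     // HLT instruction strobe from the sequencer
    input  wire int_pending,  // an enabled interrupt is pending
    output reg  halt          // HALT state to the top level and the sequencer
);
    always @(posedge clk) begin
        if (rst)
            halt <= 1'b0;     // RESET exits HALT
        else if (int_pending)
            halt <= 1'b0;     // a pending interrupt exits HALT
        else if (hlt_exec)
            halt <= 1'b1;     // the HLT instruction enters HALT
    end
endmodule
```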

Partitioning the Instruction Decode and Sequencing Block  

The Instruction Decode and Sequencing Block performs a variety of functions and therefore, in accordance with rule 3.11, must be partitioned into smaller blocks.

Blocks are partitioned using a method similar to the one used to partition the entire project. Partitioned blocks represent a distinct hierarchical branch within the project and are semi-autonomous.

When we examine the Instruction Decode and Sequencing Block from a functional perspective, we can see that:

  1. It must monitor and react to 'HALT' states and pending interrupts.

  2. It must request and monitor all bus transactions.

  3. It must latch fetched instructions.

  4. It must generate a flag mask specific to each instruction.

  5. It must control the execution of a specific set of steps necessary to execute each instruction or function.

  6. It must generate all source and destination selects and function enables.

To minimize the number of blocks generated and the resulting logic, you must understand how processors traditionally accomplish some of these functions. Processors generally use some form of microcode to direct their functions. Microcode is simply a small internal program that directs the steps the processor must execute to perform an instruction or function. In FPGAs and CPLDs, this microcode and its associated execution block are replaced by a state machine. By using the microcode concept, we can consolidate several of these functions into a single block:

  1. Monitor and react to 'HALT' states and pending interrupts.

  2. Request and monitor all bus transactions.

  3. Control the execution of a specific set of steps necessary to execute each instruction or function.

Because our microcode state machine has to handle so many disparate values, it is best to use coded outputs, in accordance with the principles we learned in Module 2 Lesson 3, to avoid excessive logic delays and usage. Instead of decoding every signal inside the state machine, we rely on an expander block to decode the microcode state machine outputs into specific source and destination selects and function enables. This leads us to the following architecture:
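A minimal sketch of the expander half of this arrangement, assuming a hypothetical 8-bit coded state split into a 4-bit source field and a 4-bit destination field. The state register carries only the coded value; the combinational expander turns each field into individual one-hot selects. The field layout and parameter names are illustrative; codes 0 to 7 follow the 8085 register encoding (B, C, D, E, H, L, M, A), and codes 8 to 15 select no register.

```verilog
// Hypothetical expander for an 8-bit coded microcode state.
module mc_expander (
    input  wire [7:0] state,     // coded state from the microcode state machine
    output reg  [7:0] src_sel,   // one-hot source register selects
    output reg  [7:0] dst_sel    // one-hot destination register selects
);
    // Field codes that would be shared with the state machine through a
    // common parameter file in a full design.
    parameter R_B = 4'd0, R_C = 4'd1, R_D = 4'd2, R_E = 4'd3,
              R_H = 4'd4, R_L = 4'd5, R_M = 4'd6, R_A = 4'd7,
              R_NONE = 4'd15;

    wire [3:0] src = state[7:4];   // source field
    wire [3:0] dst = state[3:0];   // destination field

    always @(*) begin
        src_sel = 8'b0;
        dst_sel = 8'b0;
        if (!src[3]) src_sel[src[2:0]] = 1'b1;  // codes 0-7 select a register
        if (!dst[3]) dst_sel[dst[2:0]] = 1'b1;  // codes 8-15 select nothing
    end
endmodule
```

Each one-hot output depends on only four state bits, which keeps the expander's logic levels shallow.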

Instruction Decode and Sequencing Block

  1. Instruction Latch

    Because all instructions share some common functions, the state machine frequently branches and converges, so specific instructions are frequently re-referenced. At first glance this may appear to involve state machines within a state machine, but on closer analysis you will see that only one state machine is actually used; the extra case statements are just sophisticated comparisons. However, the re-references do require that an instruction be latched for the duration of its execution. Since instruction fetches are extended by one clock cycle specifically to allow time to execute 'single cycle instructions' (referring to bus transactions), this instruction capture should use a transparent latch to allow full use of the final clock cycle. A transparent latch also reduces the number of signals the sequencer block must compare against.

    Most CPLDs, and PGAs in particular, handle transparent latches poorly, because latches involve combinatorial feedback that in many architectures can only take place at the I/O pads. Although most FPGA architectures can support some form of combinatorial feedback, their compilers may not. Simulators also frequently have problems with combinatorial feedback.

    To overcome the combinatorial feedback problem, we can emulate the transparent latch with a conventional 'D' flip-flop with clock enable, followed by a multiplexer that uses the inverse of the clock enable signal for input selection. This solution may, of course, cause a propagation glitch at switch-over, because combinatorial logic is typically 50% faster than register logic. At slower clock speeds, you can mitigate this potential glitch by delaying the multiplexer select by one half clock cycle through a 'D' type register. For optimal reuse quality, this delay should be selectable through conditional compilation.

    Instruction Latch

  2. Flag Mask Decoder

    The flag mask decoder is simply a 256-word by 8-bit ROM that translates the fetched instruction into an 8-bit flag mask. Normally the flag mask selects which flags, if any, the instruction may alter. During conditional jumps, however, it selects which flag to test, and the negative flag (NF) bit selects the complement of the condition.

    Although some types of FPGA block RAM/ROM can be read asynchronously, most compilers require a synchronous read in order to recognize block RAM/ROM.

    CPLDs and FPGAs that do not have block RAM/ROM will implement this code as a multiplexer tree or sum of products, which could influence your component selection for higher clock speeds. Once the data has become stable on the address and data bus and has passed through the I/O pad delay and the decode logic delay, the register that latches the flag mask must still have sufficient setup time remaining to latch the flags on the rising edge of the read pulse. The instruction ROM/RAM must be fast enough, or the decode logic delay short enough, to satisfy this constraint.

  3. Microcode State Machine and Expander

    The microcode state machine is a coded state machine that directs overall processor operation by executing a sequence of steps for each function, based on the current state and processor condition. There are many more states than processor instructions, and many of these states are shared among functions, making it necessary to periodically re-interpret the current instruction.

    The expander is a combinatorial logic block that shares the state parameter definitions with the microcode state machine. It decodes the current state value into individual outputs used to control the other blocks.

    A coded state machine should be broken down into individual fields, even if this requires more bits. This allows easier maintenance, modification, and additions. Mature, stable products may violate this recommendation; however, the resulting code may not be reusable or easily maintained.

    Each coded field within the microcode itself should be kept as narrow as possible (4 bits is ideal) in order to minimize the logic levels in the expander. Some ways to accomplish this are:

    • Use of differentiator bits as a multiplexer select to create an alternate interpretation of the field:

      Alternate Interpretation Using Differentiator Bits

      This is usually the safest approach and the least costly in terms of logic, provided that enough differentiator bits exist to accomplish it.

    • Use of combined fields to select special states. This is particularly dangerous, because you must ensure that the individual field interpretations do not cause any unintended side effects. Combinations that would otherwise be "No Operation" states are normally used; in our processor, states such as "OR SP,SP" could serve this purpose.

    • Interpretation of the entire state to select special functions. This is the last resort; it is similar to the previous method, except that it uses more logic. This approach is generally used to extend the previous method to more special signals, or to select single-occurrence signals such as a bus request, and is built directly into the state machine:

          INTEN : begin
               intre = 1;
               .
               .
               .
          end

    If none of these methods can accommodate the special function, you must expand the width of your state machine. This is very expensive in logic terms, because every added bit becomes part of every state comparison. It is generally done early in the development process, while you are still trying to minimize the total state machine width.
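The transparent-latch emulation described under 'Instruction Latch' above can be sketched as follows. It assumes a hypothetical load strobe `ir_ld` that is high during the final fetch cycle: a clock-enabled register captures the instruction on the clock edge, and a multiplexer passes the live data bus while the strobe is high and the registered copy otherwise, so no combinatorial feedback loop is created. The optional half-cycle delay of the multiplexer select sits under a conditional-compile macro, `IR_SEL_DELAY`, whose name is illustrative.

```verilog
module instr_latch (
    input  wire       clk,
    input  wire       rst,
    input  wire       ir_ld,    // load strobe: high during the final fetch cycle
    input  wire [7:0] din,      // instruction byte from the data bus
    output wire [7:0] ir        // instruction seen by the sequencer
);
    reg [7:0] ir_r;             // D flip-flops with clock enable

    always @(posedge clk) begin
        if (rst)
            ir_r <= 8'h00;
        else if (ir_ld)
            ir_r <= din;
    end

`ifdef IR_SEL_DELAY
    // Delay the multiplexer select by half a clock cycle so the registered
    // copy has settled before the multiplexer switches over to it.
    reg sel;
    always @(negedge clk) begin
        if (rst) sel <= 1'b0;
        else     sel <= ir_ld;
    end
`else
    wire sel = ir_ld;
`endif

    // Pass the live bus while loading; otherwise present the stored value.
    assign ir = sel ? din : ir_r;
endmodule
```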
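The flag mask decoder described above can be sketched as a synchronously read ROM, the form most synthesis tools need in order to infer block ROM. This sketch assumes a hypothetical mask layout that mirrors the 8085 flag register bit positions (S = 7, Z = 6, AC = 4, P = 2, CY = 0) and omits the NF bit and the conditional-jump entries. Only three opcodes are filled in as illustrations; a full design would list all 256 entries, for example by loading them with `$readmemh`.

```verilog
module flag_mask_rom (
    input  wire       clk,
    input  wire [7:0] opcode,     // fetched instruction
    output reg  [7:0] flag_mask   // flags the instruction may alter
);
    // Registered (synchronous) read: the output updates on the clock edge.
    always @(posedge clk) begin
        case (opcode)
            8'h80:   flag_mask <= 8'hD5;  // ADD B:   S, Z, AC, P, CY
            8'h04:   flag_mask <= 8'hD4;  // INR B:   S, Z, AC, P (CY unaffected)
            8'h41:   flag_mask <= 8'h00;  // MOV B,C: no flags
            default: flag_mask <= 8'h00;  // remaining entries not filled in here
        endcase
    end
endmodule
```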
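The differentiator-bit technique from the first bullet above can be sketched as follows: the same 4-bit field is read either as a register select or as an ALU operation code, depending on a single differentiator bit, so each interpretation still decodes only four bits plus a multiplexer select. The 9-bit state layout, the differentiator position, and the "no ALU operation" code are assumptions for illustration only.

```verilog
module field_interp (
    input  wire [8:0] state,     // hypothetical coded state: {diff, field[3:0], other[3:0]}
    output reg  [7:0] reg_sel,   // one-hot register select (diff = 0)
    output reg  [3:0] alu_op     // ALU operation code (diff = 1)
);
    wire       diff  = state[8];     // differentiator bit
    wire [3:0] field = state[7:4];   // shared field

    always @(*) begin
        reg_sel = 8'b0;
        alu_op  = 4'hF;              // hypothetical "no ALU operation" code
        if (diff)
            alu_op = field;          // alternate interpretation: ALU operation
        else if (!field[3])
            reg_sel[field[2:0]] = 1'b1;  // primary interpretation: register select
    end
endmodule
```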

Code Realization  

Now let's modify our project skeleton to reflect the principles that we have studied in this lesson:

Modified Project Skeleton

Exercise  
  1. What is a "transparent latch"?

  2. What do you do with un-driven signals during iterative compilation?

  3. Why should we use a clock for embedded ROM reads?

  4. Describe the various methods to incorporate special functions into a coded state machine and its expander.

  5. Why did we place the halt register in the reset block?

  6. Why is state machine expansion so expensive?

  7. Where should "total state interpretation" be placed and why?

  8. What are the hazards of "combinatorial feedback"?

  9. What are the goals when establishing bus state parameters?

  10. What happens to logic usage when you expand the width of a state machine and why?