Iterative compilation is a debug method used during the earliest
phases of project development. It allows you to remove basic errors
during the actual coding of a project rather than trying to
address them all at once later. Iterative compilation is especially important
in complex designs that require many semi-dependent blocks.
Iterative compilation means that you compile projects, individual
modules, or partial modules before project completion to allow you to
identify and remove syntactical errors. Before iterative compilation
can be attempted, the target code should be in a stand-alone state.
This means that all inputs and outputs in lower level modules must
either be connected to a driver or assigned a fixed value.
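As a minimal sketch of this stand-alone state, a lower-level module can be wrapped so that every input is driven by a fixed value and every output is used. The module and signal names here (`alu`, `op_a`, and so on) are illustrative, not taken from the project:

```verilog
// Hypothetical wrapper that puts a lower-level module into a
// stand-alone state: every input is driven, every output is connected.
module alu_standalone (
    input  wire       clk,
    output wire [7:0] result
);
    // Tie the otherwise-unconnected inputs to fixed values.
    wire [7:0] op_a   = 8'h5A;   // fixed test operand
    wire [7:0] op_b   = 8'h01;   // fixed test operand
    wire [2:0] op_sel = 3'b000;  // fixed function select

    alu u_alu (
        .clk    (clk),
        .a      (op_a),
        .b      (op_b),
        .sel    (op_sel),
        .result (result)
    );
endmodule
```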
Implementing Iterative Compilation
Iterative compilation frequently begins at the module level. For
larger modules, theoretical proofs, and library modules, the module
is isolated and compiled as an independent project, and any coding
errors are removed. For library modules and theoretical proofs, the
code is typically already in a stand-alone state.
Frequently when you are creating larger projects, you may not wish to
design all of the applicable modules or functions during the first
implementation. However, you must still supply all of the signals that
your architectural development and design
partitioning called for. You can accomplish this during
instantiation of those
signals that originate from the un-coded modules by using fixed values
for those signals during the instantiation. Functions, or in our case,
microprocessor instructions are handled by the 'default' case in
state machines, or by fixed values returned from functions. These
techniques will allow you to compile and simulate partial code.
Please note that although these techniques allow you to remove the
errors that will prevent you from compiling and simulating your code,
warnings from the pared-down logic will abound. You should still be
prepared to investigate these warnings to ensure that they originate
from the use of parameters in the instantiations and not from some
other condition, such as a misspelled signal name.
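The techniques above can be sketched as follows. A `default` case stands in for un-coded instructions, and a function whose body is deferred returns a fixed value; all names (`opcode`, `state`, `daa_adjust`, and the state labels) are illustrative:

```verilog
// Un-coded instructions fall through to the 'default' case,
// which treats them as a NOP for now.
always @(posedge clk) begin
    case (opcode)
        8'h00:   state <= FETCH;   // NOP - implemented
        8'h76:   state <= HALT;    // HLT - implemented
        default: state <= FETCH;   // un-coded instructions act as NOP
    endcase
end

// A function whose real body has not been written yet simply
// returns a fixed value so the project still compiles.
function [7:0] daa_adjust;
    input [7:0] acc;
    daa_adjust = 8'h00;            // placeholder until DAA is coded
endfunction
```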
For signals that originate from modules that you have not coded
yet, this is a relatively easy exercise. One way to do this is to set a
parameter in the interconnect block whose name corresponds to that signal
name and use it in the instantiation. If that signal name has already
been defined as a wire in the interconnect block or an output from
another block, those definitions must be removed or (preferably)
commented out (use '//' at the beginning of that declaration line):
// wire [15:0] daaval;
. . .
parameter daaval = 16'b0000000000000000;
Another way to accomplish this task is to declare all outputs from
the uncoded blocks as registers and use an initial block to set the
values. This is the method I prefer, as it lets you set up the
interconnect topology correctly.
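A minimal sketch of this second approach, using the text's `daaval` signal (the `daa_rdy` signal is an illustrative addition):

```verilog
// Outputs of the un-coded block are declared as registers in the
// interconnect block and given fixed values in an initial block.
reg [15:0] daaval;
reg        daa_rdy;

initial begin
    daaval  = 16'h0000;   // placeholder value for the un-coded DAA block
    daa_rdy = 1'b0;       // illustrative handshake signal, held inactive
end
```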
There are some functions that we do not choose to address in this
early phase of our development. Standouts include the "DAA"
instruction and interrupt processing. Because of its simplicity, the
serial port may be addressed, although some of its inputs may
not be resolved.
The serial port relies on the interrupt mask instructions for
operation. If bit 6 in the accumulator is set during a "SET
INTERRUPT MASK" (SIM) instruction, the state of bit 7 in the
accumulator is latched as the "sod_pad" output. The state of
the "sid" input is latched into bit 7 of the accumulator during
a "READ INTERRUPT MASK" (RIM) instruction. Therefore "sid" should
be an input to the interrupt block, and not to the serial port
block, or the serial port block should be moved to the interrupt
block.
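The SIM/RIM behavior described above can be sketched as follows. The `sid` and `sod_pad` names come from the text; `sim_exec`, `acc`, `rim_value`, `int_pending`, `ie_flag`, and `int_mask` are illustrative names for this sketch:

```verilog
// SIM: if accumulator bit 6 is set, latch accumulator bit 7
// as the serial output data pad.
always @(posedge clk) begin
    if (sim_exec && acc[6])     // bit 6 enables the serial output update
        sod_pad <= acc[7];      // latch accumulator bit 7 as serial out
end

// RIM: bit 7 of the value returned to the accumulator reflects
// the 'sid' input pin.
// Layout: {sid [7], pending interrupts [6:4], IE flag [3], mask [2:0]}
assign rim_value = {sid, int_pending, ie_flag, int_mask};
```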
The bus request type parameters also need addressing. They are used
in more than one module and, according to rule 6.6, should be moved
from bus_r to a separate parameter file. They can be moved to
S8085d_p.v.
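A shared parameter file can then be pulled into each module that needs the bus request encodings with an `include directive inside the module body. The encoding values below are illustrative, not taken from the original listing:

```verilog
// S8085d_p.v - shared parameter file.
// Each module that uses these includes the file inside its body:
//     `include "S8085d_p.v"
parameter BUS_IDLE  = 3'b000;   // no bus transaction pending
parameter BUS_FETCH = 3'b001;   // instruction fetch
parameter BUS_READ  = 3'b010;   // data read
parameter BUS_WRITE = 3'b011;   // data write
```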
There is another vital function, common across all processors, that
we have not yet addressed. In the 8085 architecture
it is particularly important and contributed to its use in such
projects as the 'MARS ROVER'. In 8085 architecture the 'HALT'
state essentially powers down the microprocessor until a 'RESET'
or the next interrupt occurs. Since the primary power consumption
for FPGAs and CPLDs originates from the I/O pads, this condition
may be emulated by forcing all non-essential outputs to high
impedance.
The 'HALT' state is initiated by a processor instruction but bears
functional similarity to the 'RESET' state except that it does not
alter any internal registers or the processor state (i.e. when
the 'HALT' state is exited, the processor resumes where it left off).
The 'HALT' output is only used by the top level module and by the
instruction decode and sequencing block, and one could be tempted to
locate it in the instruction decode and sequencing block. However,
by applying rule 3.11 we can see
that the best fit for the 'HALT' function must reside in either the
'RESET' block or an independent module.
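The pad power-down emulation described above can be sketched as a set of tri-state drivers gated by the halt signal. All pad and internal signal names here are illustrative:

```verilog
// While 'halt' is asserted, non-essential output pads are driven to
// high impedance to emulate the HALT power-down at the I/O pads.
assign ad_pad   = halt ? 8'bz : ad_out;   // multiplexed address/data bus
assign ale_pad  = halt ? 1'bz : ale;      // address latch enable
assign rd_n_pad = halt ? 1'bz : rd_n;     // read strobe
assign wr_n_pad = halt ? 1'bz : wr_n;     // write strobe
```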
Partitioning the Instruction Decode and Sequencing Block
The Instruction Decode and Sequencing Block performs a variety of
functions and therefore, in accordance with
rule 3.11 must be partitioned
into smaller blocks.
Blocks are partitioned using a similar
method to that employed when partitioning the entire project.
Partitioned blocks represent a distinct hierarchical branch within
the project and are semi-autonomous.
When we examine the Instruction Decode and Sequencer Block from
a functional perspective, we can see that
It must monitor and react to 'HALT' states and pending interrupts.
It must request and monitor all bus transactions.
It must latch fetched instructions.
It must generate a flag mask specific to each instruction.
It must control the execution of a specific set of steps
necessary to execute each instruction or function.
It must generate all source and destination selects and function
enables.
In order to minimize the blocks generated and the resulting logic,
you must understand how processors traditionally accomplish some
of these functions. Processors generally use some form of
microcode to direct their
functions. Microcode is simply a small internal program to direct the
steps the processor must execute in order to perform an instruction or
function. In FPGAs and CPLDs this microcode and its associated
execution block is replaced by a state machine. By using the
microcode concept, we may consolidate several blocks:
Monitor and react to 'HALT' states and pending interrupts.
Request and monitor all bus transactions.
Control the execution of a specific set of steps necessary to
execute each instruction or function.
Because our microcode state machine has to handle so many disparate
values, it is best to use coded outputs, in accordance with the
principles we learned in module
2, lesson 3, to avoid excessive logic delays and usage.
Instead, we rely upon an expander
block to decode the microcode state machine outputs into specific
signals for source and destination selects and function enables.
This, then, leads us to the following architecture:
Instruction Decode and Sequencing Block
Because all instructions execute some common functions, the
state machine frequently branches and converges. Therefore,
specific instructions are frequently re-referenced. This may
appear at first glance to involve state machines within a state
machine, but if you analyze further, you will see that only one
state machine is actually used. The extra case statements are
just sophisticated comparisons. However, the re-references do
require that an instruction be latched for the duration of its
execution. Since instruction fetches are extended by one
clock cycle specifically to allow time to execute 'single cycle
instructions' (referring to bus transactions) this instruction
capture should utilize a
transparent latch to
allow full use of the final clock cycle. The use
of a transparent latch will reduce the signal entities that
the sequencer block must use for comparison.
Most CPLDs, and FPGAs in particular, do not like transparent
latches because they involve combinatorial feedback
that in many architectures can only take place at the I/O pads.
Although most FPGA architectures can support some form of
combinatorial feedback, their compilers may not. Simulators
also frequently experience problems when dealing with combinatorial
feedback.
To overcome the combinatorial feedback problem, we can use a
conventional 'D' flip flop with clock enable followed by a
multiplexer that uses the inverse of the clock enable signal
for input selection to emulate the function of the transparent
latch. This solution, of course, may cause a propagation
glitch during switch
over because combinatorial logic is typically 50% faster than
register logic. For slower clock speeds you can mitigate this
potential glitch by delaying the multiplexer enable by one half
clock cycle via a 'D' type register. For optimal reuse
quality, this delay should be included on the basis of a conditional
compilation directive.
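The emulation described above can be sketched as follows: a 'D' flip-flop with clock enable captures the instruction, and a multiplexer driven by the inverse of the enable selects between the registered and un-registered paths. The half-cycle delayed select register is the glitch mitigation for slower clocks. All signal names are illustrative, and the outputs are assumed to be declared as shown:

```verilog
reg  [7:0] instr_reg;   // 'D' flip-flop with clock enable
reg        sel_dly;     // mux select, delayed one half clock cycle
wire [7:0] instr;       // behaves like a transparent latch output

always @(posedge clk)
    if (instr_en)
        instr_reg <= data_bus;   // capture the fetched instruction

always @(negedge clk)
    sel_dly <= ~instr_en;        // inverse of the clock enable,
                                 // delayed to mask the switchover glitch

// While the enable is active the un-registered bus passes through
// (transparency); otherwise the registered value is held.
assign instr = sel_dly ? instr_reg : data_bus;
```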
Flag Mask Decoder
The flag mask decoder is simply a 256 word by 8-bit ROM that
translates the fetched instruction into an 8-bit flag mask.
Normally the flag mask is used to enable which flags, if any,
the instruction may alter. However, during conditional jumps,
it is used to select which flag to test. The negative flag
(NF) is used to select the complement of the condition.
Although all types of FPGA block RAM/ROM may be asynchronously
read, most compilers require a synchronous read in order
to recognize block RAM/ROM.
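A minimal sketch of the flag mask decoder with a synchronous read, so that compilers can map it to block RAM/ROM. The ROM contents file and the `instr` and `flag_mask` names are illustrative:

```verilog
reg [7:0] flag_mask;            // registered (synchronous) read output
reg [7:0] mask_rom [0:255];     // 256 word x 8-bit flag mask ROM

// Hypothetical initialization file holding one mask per opcode.
initial $readmemb("flagmask.txt", mask_rom);

always @(posedge clk)
    flag_mask <= mask_rom[instr];   // synchronous read -> block ROM
```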
CPLDs and FPGAs that do not have block RAM/ROM available will
implement this code as a multiplexer tree or sum of products.
This could influence your component selection for higher clock
speeds. The register that latches the flag mask must have
sufficient setup time remaining from the time the data becomes
stable on the address and data bus, through the I/O pad delay
and the decode logic delay, to latch the flags on the rising
edge of the read pulse. Instruction ROM/RAM should be fast
enough, or the decode logic delay short enough, to satisfy this
requirement.
Microcode State Machine and Expander
The microcode state machine is a coded state machine that
directs the overall processor operation by executing a
sequence of steps for each function, based upon the current
state and processor condition. There are many more states
than there are processor instructions, and many of these
states are shared among functions, making it necessary to
periodically re-interpret the current instruction.
The expander is a
block that shares the state parameter definitions
with the microcode state machine. It decodes the current state
value into individual outputs used to control the other blocks.
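The expander can be sketched as a combinatorial block that includes the same parameter file as the state machine and decodes the coded state into individual enables. The state labels and output names are illustrative, and the outputs are assumed to be declared as registers:

```verilog
`include "S8085d_p.v"   // shared state parameter definitions

// Combinatorial expander: decode the coded state into
// individual control outputs for the other blocks.
always @(state) begin
    acc_we  = 1'b0;     // default every enable to inactive
    pc_inc  = 1'b0;
    bus_req = 1'b0;
    case (state)
        FETCH1:  bus_req = 1'b1;   // request an instruction fetch
        FETCH2:  pc_inc  = 1'b1;   // advance the program counter
        EXEC_WR: acc_we  = 1'b1;   // write the accumulator
        default: ;                 // no enables asserted
    endcase
end
```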
A coded state machine should be broken down into individual
fields, even if this requires more bits. This will allow
for easier maintenance, modification, and additions. Mature,
stable products may violate this recommendation; however, the
resultant code may not be reusable or easily maintained.
Each coded field within the microcode itself should be
kept as narrow as possible (4 bits is ideal) in order to
minimize the logic levels in the expander. Some ways to
accomplish this are:
Use of differentiator bits as a multiplexer select to
create an alternate interpretation of the field:
Alternate Interpretation Using Differentiator Bits
This is usually the safest approach and least costly in
terms of logic, provided that enough differentiator bits
exist to accomplish this.
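A sketch of the differentiator-bit technique, with an illustrative field layout (the bit positions and output names are assumptions for this example):

```verilog
// state[7]   - differentiator bit
// state[3:0] - field with two interpretations
wire [3:0] field = state[3:0];

// When the differentiator is clear, the field selects a register
// source; when it is set, the same four bits select an ALU function.
assign reg_sel = (state[7] == 1'b0) ? field : 4'b0000;
assign alu_fn  = (state[7] == 1'b1) ? field : 4'b0000;
```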
Use of combined fields to select special states. This is
a particularly dangerous thing to do in that you must
ensure that the individual field interpretations do not
cause any unintended side effects. "No Operation" states
are normally utilized. In our processor such states as
"OR SP,SP" etc. could be used for this purpose.
Interpretation of the entire state to select special
functions. This is the last resort and is similar to the
previous method, except that it uses more logic. This
approach is generally used to expand the previous method
to more special signals or to select single-occurrence-type
signals, such as bus request, built directly
into the state machine:
INTEN : begin
    intre = 1;
end
If all of these methods fail to accommodate the special
function, you must expand the width of your state machine. This
is very expensive in logic terms as it will be multiplied
by all state comparisons. It is generally used early on in
the development process while you are trying to minimize the
total state machine width.
Now let's modify our project skeleton to reflect the principles that
we have studied in this lesson:
Modified Project Skeleton
What is a "transparent latch"?
What do you do with un-driven signals during iterative compilation?
Why should we use a clock for embedded ROM reads?
Describe the various methods to incorporate special functions into
a coded state machine and its expander.
Why did we place the halt register in the reset block?
Why is state machine expansion so expensive?
Where should "total state interpretation" be placed and why?
What are the hazards of "combinatorial feedback?"
What are the goals when establishing bus state parameters?
What happens to logic usage when you expand the width of
a state machine and why?