Coding for Reuse Course - Module 4 Lesson 4
Making it Smaller

We now have a complete 8085(A/B/C/D) intellectual property that will run at 20 MHz in Altera and Xilinx FPGAs. It will also synthesize for a Lattice MACH XO2 device using Lattice Diamond Software, but there are some questions concerning state machine encoding. The remaining issue is our project constraint, which requires that our intellectual property fit into next-generation CPLDs.

Even though the Lattice MACH XO2 family and the Xilinx XC3S200AN device can technically meet this requirement, they do not completely address the intent of the constraint. Altera has two families of devices, the Max II and the Max V, that more closely resemble traditional CPLD architectures. The largest members of these families feature 2210 logic elements. Our current Altera logic element count is 2537, which is 327 over that limit. We need to make it smaller.

In this lesson we will study how to reduce the logic element count of our design so that it fits into these devices.

License Terms  

The source and/or object code (subsequently referred to as code) presented in this and subsequent lessons is the property of saxelec.com and owner who retains copyright protection. This code in whole or in part is licensed only for single copy per user, non-commercial use; except in the case of educational institutions, where a license of single copy per student, only as part of course curricula, is permitted.

Donald Gerard Polak (owner)

Timestamps  

I have been remiss about keeping the history portion of the module headers properly updated. The timestamps are hopelessly out of date. In Rule 6.19 I mentioned using a script to update the "timestamp" regular expression. This will work under LINUX™, where the "sed" and "awk" utilities are available.

For you WINDOWS™ users, and those of you who are not daring enough to create a "BASH" or "BOURNE" script (BTW: put this on your TODO list, even hardware designers require this skill), here is a simple "C" language program that you can compile using tinycc or GCC, which are free downloads under the "GNU" public license.

The "timestamp" utility is fairly easy to use. It must be called from the command prompt with the command line:

timestamp [<switches>] <filename> [[<switches>] <filename> ...]

The switches are "sticky"; that is, a particular switch state persists for all following files until it is reassigned by another switch. The switches are as follows:

  • -a Replace all occurrences of regular expression (default).

  • -A Same as -a.

  • -c Use current date/time for replacement.

  • -C Same as -c.

  • -d Delete backup after timestamp (default).

  • -D Same as -d.

  • -k Keep backup.

  • -K Same as -k.

  • -m Use file modified date/time for replacement (default).

  • -M Same as -m.

  • -s Selective replacement of regular expression.

  • -S Same as -s.
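
The "sticky" behavior can be illustrated with a minimal sketch in "C". This is not the source of the "timestamp" utility itself; it only models the command line rules listed above. The names options_t, job_t, apply_switch and parse_command_line are hypothetical, and rejecting an unknown switch with -1 is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option state; the field names are illustrative only. */
typedef struct {
    bool replace_all;   /* true: -a (default), false: -s */
    bool use_modified;  /* true: -m (default), false: -c */
    bool keep_backup;   /* true: -k, false: -d (default) */
} options_t;

/* One file to process, with a snapshot of the switches in force. */
typedef struct {
    const char *filename;
    options_t   opt;
} job_t;

/* Update the option state from one switch such as "-k" or "-C".
   Returns false when the argument is not a documented switch. */
bool apply_switch(const char *arg, options_t *opt)
{
    assert(arg != NULL && opt != NULL);
    if (arg[0] != '-' || arg[1] == '\0' || arg[2] != '\0')
        return false;
    switch (tolower((unsigned char)arg[1])) {  /* upper case == lower case */
    case 'a': opt->replace_all  = true;  break;
    case 's': opt->replace_all  = false; break;
    case 'm': opt->use_modified = true;  break;
    case 'c': opt->use_modified = false; break;
    case 'k': opt->keep_backup  = true;  break;
    case 'd': opt->keep_backup  = false; break;
    default:  return false;
    }
    return true;
}

/* Walk the arguments left to right. Each filename receives a copy of
   the option state as it stands at that point, so a switch persists
   until a later switch reassigns it. Returns the number of jobs
   stored, or -1 on an unknown switch or too many files (assumption). */
int parse_command_line(int argc, char **argv, job_t *jobs, int max_jobs)
{
    options_t opt = { true, true, false };  /* defaults: -a -m -d */
    int count = 0;

    assert(argv != NULL && jobs != NULL);
    for (int i = 1; i < argc; i++) {
        if (argv[i][0] == '-') {
            if (!apply_switch(argv[i], &opt))
                return -1;
        } else {
            if (count >= max_jobs)
                return -1;
            jobs[count].filename = argv[i];
            jobs[count].opt = opt;  /* struct copy freezes current state */
            count++;
        }
    }
    return count;
}
```

Under these rules, a command line such as "timestamp -k a.v -c b.v -D c.v" (the file names are made up) would keep the backup for a.v and b.v, use the current date/time for b.v and c.v, and delete the backup only for c.v.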

Size Reductions  

Obviously, the first step in our size reductions should be to set our synthesis options to optimize for area and prevent register duplication. This reduces our logic element count to 2375. That is still 165 logic elements over the 2210 limit.

We can see from our flow summary that a block ROM is not being inferred from our flag mask ROM. We see in flags_r that our potential ROM block has a reset that must be removed before a ROM block can be inferred. This correction would have no impact on CPLDs, which do not have memory blocks, but we will fix it anyway.

The bulk of our logic is contained in the microcode state machine. The state counter is 20 bits wide. In addition, there are several states where the instruction latch has to be re-interpreted. This requires a 28-bit decode, with a minimum of three logic levels per state.

We begin by removing unnecessary default conditions, i.e. the "Should never get here" states. We can replace the defaults with the most frequent next state conditions. Where either branch of an "if - else" construct simply returns to the current state, we also replace the construct with a single "if" statement, as some compilers cannot correctly interpret this type of construct as a clock enable.

We then make the delay states return to MIDLE rather than using the poll function, unless they are executing another state, such as MFETCH1P. That leaves us with the following source code:

Size Optimized Project Source Code

External Microcode State Machine Version  

We can create an alternate version of the microcode state machine using an external state machine. Because the states no longer control the output signals directly, the output signals become registers whose values change only in certain states. The external state machine version is much more difficult to maintain, and yields larger object code. But it does execute faster, allowing us to operate at 24 MHz in Altera CYCLONE IV E devices.

The external state machine version uses a one-hot instruction decoder in the instruction latch. A one-hot decoder drives an array of signals, with one individual signal for each possible value of the input signal. Exactly one of these signals is true at any one time, selected by the current value of the input signal.
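
As a rough behavioral model only, and not the project's hardware source, the one-hot idea can be sketched in "C". The function names one_hot_decode and count_hot are hypothetical, and the 8-bit input width (256 output signals) is an assumption based on the 8-bit opcodes of the 8085.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DECODE_WIDTH 256  /* one signal per possible 8-bit input value */

/* Drive exactly one element of hot[] true: the one indexed by value.
   Every other element is driven false. */
void one_hot_decode(uint8_t value, bool hot[DECODE_WIDTH])
{
    assert(hot != NULL);
    for (size_t i = 0; i < DECODE_WIDTH; i++)
        hot[i] = (i == value);
}

/* Count the asserted signals; a valid one-hot vector has exactly one. */
size_t count_hot(const bool hot[DECODE_WIDTH])
{
    size_t n = 0;

    assert(hot != NULL);
    for (size_t i = 0; i < DECODE_WIDTH; i++)
        if (hot[i])
            n++;
    return n;
}
```

Once the input has been decoded this way, asking "is the current value X?" becomes a test of the single signal hot[X] rather than a comparison across all of the input bits.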

We need to alter our parameter file so that the instruction mnemonics represent an integer displacement in the decoder array. The new parameter file appears as follows:

Parameter File for External Microcode State Machine Version

We now alter ilatch_r to include the one-hot decoder, and flags_r and microcode_r to take advantage of the change. This leaves us with the following source code:

Source for External Microcode State Machine Version

Exercise  
  1. What did we do to make the object code smaller?

  2. Explain a one-hot decoder.

  3. What advantage does the external state machine version have? What disadvantages?