Coding for Reuse Course - Module 2 Lesson 6
Multi-Cycle Instructions

In this lesson we will study how the sequencer block handles multi-cycle instructions. We will also demonstrate how to make rudimentary architectural changes in order to implement two-operand instructions and some unconventional instructions.

License Terms  

The source and/or object code (subsequently referred to as code) presented in this and subsequent lessons is the property of Saxelec.com and owner who retains copyright protection. This code in whole or in part is licensed only for single copy per user, non-commercial use; except in the case of educational institutions, where a license of single copy per student, only as part of course curricula, is permitted.

Donald Gerard Polak (owner)

Introduction  

If the 8085 were limited to using only the 8-bit instruction fetch to perform its work, we would have some serious limitations:

  • Where would the data necessary to execute the instruction come from? How would it get there?

  • Where would the results of the instruction be put? How would they get there?

  • What happens if the instruction needs more time?

  • How does the instruction make a decision?

  • When will this pile of junk do something?

The answer to these questions lies in multi-cycle instructions. When we use the term "multi-cycle instructions" we refer to the external bus perspective. A single-cycle instruction consists of an instruction fetch only. Multi-cycle instructions follow the instruction fetch cycle with one or more read, write, or delay cycles. External events may also trigger a series of bus cycles not associated with instruction fetch, or alter the timing of the instruction fetch/external event and subsequent cycles.

Before we begin the multi-cycle instructions, we must correct certain minor architectural flaws and update certain parameters so that the existing code can execute properly.

Preliminary Fixes  

Let's first fix some missing definitions and some architectural shortcomings, both in the bus state machine and in some of the single-cycle instructions that we had temporarily bypassed. Close followers may have already observed them. The adder requires two operands. In our current architecture addend1 is connected directly to the accumulator. This confines adds and subtracts to 8-bit operations involving the accumulator only. However, the DADx and DSUB instructions require addend1 to come from the HL register, the LDSI and LDHI instructions use DE for addend1, and the STAR and LDAR instructions perform an add with TEMP as one of the operands. Since we already have an 'adest' mux, used for the XCHG instruction, we can update the mux architecture to implement the operand selection and connect it to the adder. Note that we will still require a separate output for the DE register data.


New Auxiliary Destination Bus Architecture
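
The operand selection described above can be sketched as a small behavioural model. This is a Python illustration, not the course's Verilog. The select names (`ACC`, `HL`, `DE`, `TEMP`) and the `regs` dictionary are hypothetical, and treating TEMP as an addend1 source is an assumption, since the text only says TEMP is one of the operands.

```python
# Behavioural sketch of an operand-select mux feeding addend1 of the adder.
# All names here are hypothetical; the course code defines its own signals.
from enum import Enum


class Addend1Sel(Enum):
    ACC = 0    # 8-bit accumulator operations
    HL = 1     # DADx, DSUB
    DE = 2     # LDSI, LDHI
    TEMP = 3   # STAR, LDAR (assumed to use this mux input)


def select_addend1(sel, regs):
    """Return the 16-bit value the mux would present to the adder."""
    if sel is Addend1Sel.ACC:
        return regs["A"] & 0xFF          # zero-extended accumulator
    if sel is Addend1Sel.HL:
        return regs["HL"] & 0xFFFF
    if sel is Addend1Sel.DE:
        return regs["DE"] & 0xFFFF
    return regs["TEMP"] & 0xFFFF


def adder(addend1, addend2):
    """16-bit add; returns (sum, carry_out)."""
    total = (addend1 & 0xFFFF) + (addend2 & 0xFFFF)
    return total & 0xFFFF, total >> 16


regs = {"A": 0x12, "HL": 0x1234, "DE": 0x8000, "TEMP": 0x00FF}
# A DAD-style add: addend1 comes from HL instead of the accumulator.
print(adder(select_addend1(Addend1Sel.HL, regs), 0x0101))
```

With the mux in place, the same adder serves both the 8-bit accumulator operations and the 16-bit register-pair operations; only the select value changes.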

The parameter file was also missing a definition for the BUS_DELAY request. During research for this lesson we also discovered that the instructions and flags added for the 8085B and later architectures were also present, but not documented, in earlier architectures. This discovery allowed us to greatly simplify definitions and eliminate many alternate code paths. As a result of this research, we also added a BUS_DLY3 bus request type to support the ARHL instruction.

In the last lesson, we also filled in the microcode for the eight-bit shift and rotate instructions without implementing the shifter. We implement the shifter here. In this implementation, we are also able to eliminate the ase and sce signals, thereby reducing logic. (I filled in the answers to the previous lesson's "left for you to code" states.)
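
As a reference for checking the shifter in simulation, here is a minimal Python sketch of what RLC and RRC do to the accumulator and the carry flag, following the documented 8085 behaviour. It models the result only; it says nothing about how the course's Verilog shifter is structured, and the function names are my own.

```python
def rlc(a):
    """Rotate accumulator left: bit 7 moves to bit 0 and is copied to carry."""
    a &= 0xFF
    carry = (a >> 7) & 1
    return ((a << 1) & 0xFF) | carry, carry


def rrc(a):
    """Rotate accumulator right: bit 0 moves to bit 7 and is copied to carry."""
    a &= 0xFF
    carry = a & 1
    return (a >> 1) | (carry << 7), carry


for value in (0x81, 0x40):
    print(hex(value), [hex(x) for x in rlc(value)], [hex(x) for x in rrc(value)])
```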

Updated Parameter File
Project Skeleton with XCHG instruction implementation.

I ran a simulation to demonstrate our shifter with the RRC and RLC instructions and the action of the XCHG and PCHL instructions. To do this, I updated our test bench with new instructions (and filled in the rest of the ROM addresses in the process):

Updated Test Bench


Simulation of the RRC and RLC Instructions


Simulation of the XCHG and PCHL Instructions

2 Cycle Instructions  

Our architecture for the 8085 makes this implementation unique. The bus state machine and the sequencer state machine are separate, independent blocks. However, the bus state machine is a subordinate block, i.e. it only processes commands dictated by the sequencer. Therefore, to preserve bus cycle compatibility from the external bus perspective, multi-cycle instructions must fall into two distinct classes: those instructions that require one or more bus cycles to happen before execution and those that require one or more bus cycles to happen after execution.

  1. Instructions that require a bus cycle after execution

    • In the original 8085 the internal busses, for technological reasons, were limited to eight bits. Consequently, most sixteen-bit instructions required additional time to complete. This additional time was reflected as additional clock cycles on the external bus.

    • Using our architecture, most of these same instructions will complete within a single clock cycle of instruction fetch; therefore, these additional clock cycles are unnecessary from a core hardware perspective. Some of the new instructions do require additional time to complete, but do not require any bus activity to occur during that interval.

    • In the legacy architecture, the bus and instruction execution were closely coupled, i.e. they originated from the same state machine and therefore could not operate independently.

    • In our architecture, the bus and instruction execution are loosely coupled, i.e. they originate from separate state machines and therefore can operate independently.

    • Legacy hardware and software frequently took advantage of these additional bus clocks and other bus activities for such functions as timing and direct memory access (DMA). Therefore, it is critical to maintain bus cycle compatibility if this IP is to be used with any legacy hardware or software.

    • In our architecture, the sequencer must request, and the bus cycle state machine supply, these "back porch" cycles.

    • Legacy instructions that require "back porch" cycles are INX(x), DCX(x), DAD(x), DSUB, ARHL, and RDEL. New instructions that require these cycles are MPY and DIV.

  2. Instructions that require one or more bus cycles before execution

    • Some instructions require additional data, such as a constant, address, or contents of a memory location, to execute. Some other instructions must write data to a particular memory location or device to execute. Some instructions must do both. All of these instructions require additional bus cycles to complete.

    • In our sequencer state machine, most of these instructions will converge one or more times on a subset of states (which I refer to as landing states), based upon the required bus operation and address source, and then diverge again to continue operation. In Verilog terms this requires "nested" case statements. That is, you are going to end up with case statements within case statements. Verilog compilers all support this, but it is important to use good coding style to keep track of the nesting.
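
The converge-and-diverge pattern of landing states, and the nested case structure it implies, can be sketched in a few lines. This is a Python behavioural illustration rather than Verilog, and every state name below (`RD_PC`, `EXEC_MVI`, `BACK_PORCH`, and so on) is hypothetical; the course's state encoding is different and much larger.

```python
# Hypothetical state names; the outer test plays the role of the outer
# case statement (current state), the inner one the nested case (instruction).
FETCH = "FETCH"
RD_PC = "RD_PC"              # shared landing state: read at the program counter
EXEC_MVI = "EXEC_MVI"
EXEC_ADI = "EXEC_ADI"
BACK_PORCH = "BACK_PORCH"    # delay cycles requested after execution


def next_state(state, instr):
    if state == FETCH:
        if instr in ("MVI", "ADI"):
            return RD_PC          # converge on the shared landing state
        if instr in ("INX", "DCX", "DAD"):
            return BACK_PORCH     # executes at once, then requests delay cycles
        return FETCH              # single-cycle instruction
    if state == RD_PC:
        # Diverge again once the shared read has been handled.
        return {"MVI": EXEC_MVI, "ADI": EXEC_ADI}[instr]
    return FETCH                  # execution and back-porch states return to fetch


def trace(instr):
    """List the states one instruction passes through, starting at FETCH."""
    states, state = [FETCH], next_state(FETCH, instr)
    while state != FETCH:
        states.append(state)
        state = next_state(state, instr)
    return states


for name in ("NOP", "DAD", "MVI", "ADI"):
    print(name, trace(name))
```

MVI and ADI share `RD_PC` and differ only in the state they diverge to, which is the property that keeps the overall state count down.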

Immediate Operands  

Immediate operand instructions are a prime example of what must happen during a multi-cycle read instruction. When the instruction is first decoded in the sequencer state machine, the sequencer state machine must post a read request with the bus state machine, using the current program counter as the data address. Once the bus state machine acknowledges the request (with the rack signal), the sequencer state machine must do three things in order: modify the address source if necessary (in this case, the program counter), wait for the bus state machine to signal that the data transfer is ready (with the dack signal), and then set up the appropriate register enables to complete the instruction.

These instructions, along with other instructions (including those that require more than two bus cycles, but need to do their first read based upon the program counter), will converge upon a single state that initiates the bus request. This state is the first in a set of states that will hold the request until the bus state machine is ready to accept, increment the program counter, and wait for the bus cycle to complete. On the last state, the instructions will then diverge again to either complete the instruction or request another bus cycle.
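
The request, rack, and dack sequence for a read at the program counter can be walked through with a small Python model. The state names and the two delay parameters are assumptions made for illustration; the real bus state machine determines when rack and dack occur.

```python
def read_at_pc(memory, pc, rack_delay=1, dack_delay=2):
    """Simulate one PC-addressed read; returns (data, new_pc, log).

    rack_delay and dack_delay are hypothetical clock counts, not 8085 timing.
    """
    log = []
    state = "REQUEST"
    clock = 0
    address = None
    data = None
    while state != "DONE":
        rack = state == "REQUEST" and clock >= rack_delay
        dack = state == "WAIT_DACK" and clock >= rack_delay + dack_delay
        log.append((clock, state))
        if state == "REQUEST" and rack:
            address = pc                # the bus has accepted the address
            pc = (pc + 1) & 0xFFFF      # the address source may now be modified
            state = "WAIT_DACK"
        elif state == "WAIT_DACK" and dack:
            data = memory[address]      # data is valid only when dack is asserted
            state = "DONE"
        clock += 1
    return data, pc, log


data, pc, log = read_at_pc({0x0101: 0x42}, 0x0101)
print(hex(data), hex(pc), log)
```

The program counter is incremented only after the request has been acknowledged, so the address seen by the bus is the one that was current when the request was posted.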

Multiple Execution Pitfall  

When the state machine diverges from its landing state directly to an execution state, multiple clock cycles can occur while waiting for dack. On each clock cycle, the instruction setup will execute. On a simple memory fetch, such as move immediate operand to register, these extra executions will not make a difference, since the final execution will load the appropriate data. But on operations such as add immediate, where each execution modifies the accumulator again, the results are catastrophic (unless you want a truly random number generator). Therefore, we need some means of preventing these extra executions.
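
A small Python model makes the pitfall concrete. It assumes, purely for illustration, that the data bus shows a stale value (0xFF here) on the clocks before dack; the function name and the bus values are hypothetical.

```python
def execute_while_waiting(op, acc, bus_values):
    """Apply `op` on every clock spent in the execution state.

    bus_values lists what the data bus shows on each clock; the last entry
    is the clock on which dack is asserted and the data is valid.
    """
    for value in bus_values:
        if op == "MVI":
            acc = value & 0xFF             # plain load: the last write wins
        elif op == "ADI":
            acc = (acc + value) & 0xFF     # accumulates on every clock
        else:
            raise ValueError(op)
    return acc


bus = [0xFF, 0xFF, 0x05]   # two clocks of stale data, then the real operand
print(hex(execute_while_waiting("MVI", 0x10, bus)))   # correct despite the repeats
print(hex(execute_while_waiting("ADI", 0x10, bus)))   # not 0x10 + 0x05
```

The load ends with the right value because only the final write matters, while the add has already been corrupted by the two earlier executions.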

There are two different methods to prevent multiple executions:

  1. The delayed enable method requires the microcode state machine to send a signal telling the microcode expander to wait for dack (the sequencer level must also route the additional signals to the microcode expander). The microcode expander must then incorporate this signal and dack into all the equations for the destination and flag enable signals, and the microcode state machine must hold each individual execution state until dack is asserted. In our implementation we use a pseudo request state (encoded in rtype) to tell the microcode expander to delay enables until dack is asserted. This avoids inserting extra bits into the microcode state machine, which would affect all states and could add logic levels to all of them, limiting the maximum clock speed. Despite this precaution, at least one or two logic levels are added to each destination and flag enable, which impacts maximum speed.

    In other architectures, such as in RISC machines that do not use a microcode state machine for instruction execution, or microprocessors that use a faster bus cycle, the delayed enable method would have a much less severe impact and could be the better fit.

  2. The early dack method involves adding a bit to the bus state machine that is asserted one clock cycle before dack. In this method, the instructions do not enter an execution state until the early dack (edck) is asserted. The "wait for early dack" state becomes another common landing state for groups of instructions, which then diverge to execution states once the early dack signal is received. These execution states are not required to wait on dack.

    In our architecture, the effect of the additional bit in the bus state machine is minimal, since the extra bit does not add logic levels. In other bus architectures, this may not be the case.
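
Both methods can be compared with a short Python model of an add immediate. It is a behavioural sketch under assumed timing: the bus shows stale data until the final clock, dack is asserted on that final clock, and edck one clock earlier. The function and state names are hypothetical.

```python
def adi_delayed_enable(acc, bus_values):
    """Execution state is held until dack; the enable is gated with dack."""
    last = len(bus_values) - 1
    for clock, value in enumerate(bus_values):
        dack = clock == last
        enable = dack                      # this gating is the added logic
        if enable:
            acc = (acc + value) & 0xFF
    return acc


def adi_early_dack(acc, bus_values):
    """Wait in a landing state until edck, then spend one clock executing.

    Needs at least two clocks, since edck precedes dack by one clock.
    """
    last = len(bus_values) - 1
    state = "WAIT_EDCK"
    for clock, value in enumerate(bus_values):
        edck = clock == last - 1           # asserted one clock before dack
        if state == "WAIT_EDCK":
            if edck:
                state = "EXECUTE"
        elif state == "EXECUTE":
            acc = (acc + value) & 0xFF     # runs once, on the dack clock
            state = "DONE"
    return acc


bus = [0xFF, 0xFF, 0x05]
print(hex(adi_delayed_enable(0x10, bus)), hex(adi_early_dack(0x10, bus)))
```

Both give the same single execution. The difference is where the cost lands: the first method puts dack into every enable equation, while the second needs no gating in the execution state because that state is entered only when the data is about to be valid.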

When we compile the following code for an XC2S150 using ISE version 10.1, you can see the difference. With the early dack method, the code requires 1363 LUTs, or 39%. With the delayed enable method, it requires 1429 LUTs, or 41%, which is 66 more. These extra logic cells can introduce undesired delays and limit the maximum clock speed.

(Note: Make sure that you set the FSM encoding style to NONE before you compile, or you will get almost 700 warnings since you crippled the microcode expander.)

New parameter File
Project Skeleton with 2 cycle instructions.

Instructions with 3 or more cycles  

I compiled and checked this code, but I am saving another simulation for the next module. If you would like to simulate the immediate operand instructions, however, alter the test bench and follow the instructions from previous simulations. I suggest trying with and without the early dack method, and with the S8085B defined.

Subsequent versions of this code will use the early dack method exclusively, since we demonstrated that this method will give us a smaller intellectual property. It will also remove a lot of clutter from our code.

A number of instructions require 16-bit operands, and thus require more bus cycles in order to execute. These instructions are not much different from the two-cycle instructions, except that the first data fetch or write must request the subsequent fetch or write, and so on. Examine the LXIB instruction. The first data fetch places the memory data in the C register and then requests another data fetch, which places the memory data in the B register. Each of these data fetches increments the PC once the request is made. Note that the final state, which places memory data in the B register, is shared with the MVIB and MOVMB instructions. Terminal states can be shared in most cases, reducing the overall number of states required.
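
The sharing of a terminal state can be illustrated with a minimal Python sketch of LXIB and MVIB. The helper names are hypothetical, and the bus handshake is collapsed into a single function call so that only the data movement and the PC increments are visible.

```python
def fetch_operand(memory, pc):
    """One data fetch at the program counter; returns (byte, incremented pc)."""
    return memory[pc] & 0xFF, (pc + 1) & 0xFFFF


def terminal_load_b(regs, memory, pc):
    """Shared terminal state: memory data is placed in the B register."""
    regs["B"], pc = fetch_operand(memory, pc)
    return pc


def mvi_b(regs, memory, pc):
    """MVIB: a single operand fetch that goes straight to the shared state."""
    return terminal_load_b(regs, memory, pc)


def lxi_b(regs, memory, pc):
    """LXIB: first fetch loads C, then the shared state loads B."""
    regs["C"], pc = fetch_operand(memory, pc)
    return terminal_load_b(regs, memory, pc)


memory = {0x2001: 0x34, 0x2002: 0x12}   # low byte first, then high byte
regs = {}
pc = lxi_b(regs, memory, 0x2001)        # pc starts just past the opcode
print(hex(regs["B"]), hex(regs["C"]), hex(pc))
```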

Another concept introduced here is the use of the temporary register which, in our architecture, is sixteen bits wide. The temporary register is used to hold transitory data, as in the case of the read-modify-write instructions, INRM and DCRM, or the data being moved from one memory location to another in the MVIM instruction. It is also used to hold the I/O address in the IN and OUT instructions until the next bus request is made. (Note that I/O addresses are only 8 bits wide; the bus state machine duplicates the lower 8 bits onto the upper 8 bits.) During the JMP instruction, the temporary register must hold the lower 8 bits of the JMP address until the second data fetch request is made. And during the LDAR and STAR instructions, the temporary register is used to process and hold the relative address.
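
Two of these uses reduce to simple bit manipulation, sketched here in Python. The function names are hypothetical, and the model shows only the resulting 16-bit values, not the timing of the bus requests.

```python
def io_address(port):
    """Duplicate the 8-bit I/O port number onto both halves of the address bus."""
    port &= 0xFF
    return (port << 8) | port


def jmp_target(low_byte, high_byte):
    """Combine the low byte held in TEMP with the high byte from the second fetch."""
    temp = low_byte & 0xFF                 # held while the second fetch is requested
    return ((high_byte & 0xFF) << 8) | temp


print(hex(io_address(0x42)), hex(jmp_target(0x34, 0x12)))
```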

The stack is also introduced here. The stack is a memory area for temporary storage. As items are PUSHed onto the stack, the stack pointer is first decremented and then used as the address. As items are POPped from the stack, the stack pointer contents are used as the address, and then the stack pointer is incremented. The stack pointer therefore always addresses the most recently pushed byte.
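
For reference when simulating, here is a minimal Python sketch of a register-pair PUSH and POP following the documented 8085 convention: PUSH decrements the stack pointer before each write (high byte first), and POP reads before each increment (low byte first). The function names and the dictionary used as memory are my own.

```python
def push(memory, sp, high, low):
    """Push a register pair; returns the new stack pointer."""
    sp = (sp - 1) & 0xFFFF
    memory[sp] = high & 0xFF       # high byte goes to SP - 1
    sp = (sp - 1) & 0xFFFF
    memory[sp] = low & 0xFF        # low byte goes to SP - 2
    return sp


def pop(memory, sp):
    """Pop a register pair; returns (new stack pointer, high, low)."""
    low = memory[sp]               # low byte comes from SP
    sp = (sp + 1) & 0xFFFF
    high = memory[sp]              # high byte comes from SP + 1
    sp = (sp + 1) & 0xFFFF
    return sp, high, low


memory = {}
sp = push(memory, 0x2000, 0x12, 0x34)
print(hex(sp), {hex(k): hex(v) for k, v in memory.items()})
print([hex(x) for x in pop(memory, sp)])
```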

With the implementation of these instructions, our microprocessor is more than 80% complete. We have also removed the differentiator bits from the state definitions since we can use the request type and operation fields as virtual differentiators. To this end we have defined a NULREQ virtual request.

Parameter File
Project Skeleton with multi-cycle instructions.

Exercise  
  1. Encode the HLPC instruction. Remember that this instruction is only present in the S8085D implementation.

  2. The XTHL instruction performs the following sequence:

    • L is transferred to TMP

    • The memory location pointed to by SP is read into L

    • TMP is written to the memory location pointed to by SP.

    • H is transferred to TMP

    • The memory location pointed to by SP+1 is read into H.

    • TMP is written to the memory location pointed to by SP+1.

    Encode the instruction. Try to share states if possible. You can also move both H and L to TMP at once if it helps.

  3. Draw the state transition diagram for the SHLD instruction.
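
When you simulate your answer to exercise 2, a behavioural reference can help you check the final register and memory contents. The following Python sketch follows the six steps listed for XTHL; it is not an encoding of the instruction and says nothing about states or bus requests. The function name and the dictionaries are hypothetical.

```python
def xthl(regs, memory, sp):
    """Exchange HL with the two bytes at the top of the stack."""
    tmp = regs["L"]                    # L is transferred to TMP
    regs["L"] = memory[sp]             # (SP) is read into L
    memory[sp] = tmp                   # TMP is written to (SP)
    upper = (sp + 1) & 0xFFFF
    tmp = regs["H"]                    # H is transferred to TMP
    regs["H"] = memory[upper]          # (SP+1) is read into H
    memory[upper] = tmp                # TMP is written to (SP+1)


regs = {"H": 0x12, "L": 0x34}
memory = {0x1FFE: 0xCD, 0x1FFF: 0xAB}
xthl(regs, memory, 0x1FFE)
print({k: hex(v) for k, v in regs.items()}, {hex(k): hex(v) for k, v in memory.items()})
```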