The source and/or object code (subsequently referred to as code)
presented in this and subsequent lessons is the property of
Saxelec.com and owner who retains copyright protection. This code in
whole or in part is licensed only for single copy per user,
non-commercial use; except in the case of educational institutions,
where a license of single copy per student, only as part of course
curricula, is permitted.
Donald Gerard Polak (owner)
If the 8085 were limited to using only the 8-bit instruction fetch to
perform its work, we would have some serious limitations:
Where would the data necessary to execute the instruction
come from? How would it get there?
Where would the results of the instruction be put? How would they get there?
What happens if the instruction needs more time?
How does the instruction make a decision?
When will this pile of junk do something?
The answer to these questions lies in multi-cycle instructions. When
we use the term "multi-cycle instructions," we are speaking from the
external bus perspective. A single-cycle instruction consists of an
instruction fetch only. Multi-cycle instructions follow the
instruction fetch cycle with one or more read, write, or delay cycles.
External events may also trigger a series of bus cycles not associated
with instruction fetch, or alter the timing of the instruction
fetch/external event and subsequent cycles.
Before we begin discussing multi-cycle instructions, we must correct
some minor architectural flaws and fill in some missing definitions
and parameters so that the existing code can execute properly. Closer
followers may already have noticed some of these shortcomings in the
single-cycle instructions that we had temporarily bypassed, and in
the bus state machine.
The adder requires two operands. In our current architecture, addend1
is connected directly to the accumulator. This confines adds and
subtracts to 8-bit operations involving the accumulator only.
However, the DADx and DSUB instructions require addend1 to come from
the HL register, the LDHI and LDSI instructions add an immediate byte
to HL or SP, respectively, and place the sum in DE, and the STAR and
LDAR instructions perform an add with TEMP as one of the operands.
Since we already have an 'adest' mux, used for the XCHG instruction,
we can update the mux architecture to implement the operand selection
and connect it to the adder. Note that we will still require a
separate output for the DE register data.
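A minimal behavioral sketch of the idea, written in Python rather than the lesson's Verilog: a single addend1 mux feeding a shared adder. The instruction-class labels (`ALU8`, `DAD`, `REL`) and the function names are hypothetical, chosen for illustration only; they are not signals from the lesson's code.

```python
# Hypothetical model of an addend1 operand mux in front of a shared adder.
# The instruction classes below are illustrative labels, not real signal names.

def select_addend1(op_class, regs):
    """Return the addend1 operand chosen by the mux for an instruction class."""
    sources = {
        "ALU8": regs["A"],    # 8-bit ADD/SUB: accumulator (the old hard wiring)
        "DAD": regs["HL"],    # DAD rp: HL + rp
        "REL": regs["TEMP"],  # relative-address add with TEMP (STAR/LDAR style)
    }
    return sources[op_class]


def adder(addend1, addend2, width):
    """Unsigned add of the given bit width; returns (sum, carry_out)."""
    total = addend1 + addend2
    return total & ((1 << width) - 1), total >> width


regs = {"A": 0x12, "HL": 0xFFFF, "TEMP": 0x0100, "BC": 0x0001}
print(adder(select_addend1("DAD", regs), regs["BC"], 16))   # (0, 1)
print(adder(select_addend1("ALU8", regs), 0xF0, 8))         # (2, 1)
```

The point of the design is that only the mux select changes per instruction; the adder itself is shared by the 8-bit and 16-bit paths.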
New Auxiliary Destination Bus Architecture
The parameter file was also missing a definition for the BUS_DELAY
request. During research for this lesson, we also discovered that the
instructions and flags added for the 8085B and later architectures
were also present, but undocumented, in earlier architectures. This
discovery allowed us to greatly simplify the definitions and eliminate
many alternate code paths. As a result of this research, we also added
a BUS_DLY3 bus request type to support the ARHL instruction.
In the last lesson, we also filled in the microcode for the eight-bit
shift and rotate instructions without implementing the shifter. We
implement the shifter here. This implementation of the shifter also
lets us eliminate the ase and sce signals, reducing logic. (I have
also filled in the answers to the previous lesson's "left for you to
code" states.)
I ran a simulation to demonstrate our shifter with the RRC and RLC
instructions, and the action of the XCHG and PCHL instructions. To do
this, I updated our test bench with new instructions (filling in the
rest of the ROM addresses in the process).
Simulation of the RRC and RLC Instructions
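When reading the RRC and RLC waveforms, it helps to have the expected results at hand. The Python sketch below models the four 8085 accumulator rotates as one shifter selected by mnemonic. The rotate semantics follow the 8085 instruction set; the function and table names are hypothetical.

```python
# Behavioral model of the 8085 accumulator rotates: each returns (new_a, new_cy).

def rlc(a, cy):
    """RLC: bit 7 goes to both CY and bit 0 (the old CY is discarded)."""
    out = (a >> 7) & 1
    return ((a << 1) | out) & 0xFF, out


def rrc(a, cy):
    """RRC: bit 0 goes to both CY and bit 7."""
    out = a & 1
    return (a >> 1) | (out << 7), out


def ral(a, cy):
    """RAL: rotate left through carry (old CY enters bit 0)."""
    return ((a << 1) | cy) & 0xFF, (a >> 7) & 1


def rar(a, cy):
    """RAR: rotate right through carry (old CY enters bit 7)."""
    return (a >> 1) | (cy << 7), a & 1


SHIFTER = {"RLC": rlc, "RRC": rrc, "RAL": ral, "RAR": rar}


def shift(op, a, cy):
    """Single shifter entry point, selected by mnemonic."""
    return SHIFTER[op](a, cy)


print(shift("RLC", 0x81, 0))  # (3, 1)
print(shift("RRC", 0x01, 0))  # (128, 1)
```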
Simulation of the XCHG and PCHL Instructions
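Similarly, a small Python reference model for XCHG and PCHL gives the register values to expect in the simulation. The dictionary-of-registers representation is an assumption made for illustration.

```python
# Reference model of XCHG and PCHL on a dict of 8-bit registers.

def xchg(regs):
    """XCHG: swap D with H and E with L."""
    regs["D"], regs["H"] = regs["H"], regs["D"]
    regs["E"], regs["L"] = regs["L"], regs["E"]


def pchl(regs):
    """PCHL: load the program counter from HL."""
    regs["PC"] = (regs["H"] << 8) | regs["L"]


regs = {"D": 0x12, "E": 0x34, "H": 0x56, "L": 0x78, "PC": 0x0000}
xchg(regs)
pchl(regs)
print(hex(regs["PC"]))  # 0x1234
```

Running XCHG before PCHL, as here, jumps to the address that was originally in DE.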
2 Cycle Instructions
Our 8085 architecture makes the implementation of multi-cycle
instructions unique. The bus state machine and the sequencer state
machine are separate, independent blocks. However, the bus state
machine is a subordinate block; it only processes commands issued by
the sequencer.
Therefore, to preserve bus cycle compatibility from the external bus
perspective, multi-cycle instructions must fall into two distinct
classes: those instructions that require one or more bus cycles to
happen before execution and those that require one or more bus
cycles to happen after execution.
Instructions that require a bus cycle after execution
In the original 8085 the internal busses, for technological
reasons, were limited to eight bits. Consequently, most
sixteen-bit instructions required additional time to complete.
This additional time was reflected as additional clock cycles
on the external bus.
Using our architecture, most of these same instructions will
complete within a single clock cycle of instruction fetch;
therefore, these additional clock cycles are unnecessary from
a core hardware perspective. Some of the new instructions do
require additional time to complete, but do not require any
bus activity to occur during that interval.
In the legacy architecture, the bus and instruction execution
were closely coupled, i.e. they originated from the same state
machine and therefore could not operate independently.
In our architecture, the bus and instruction execution are
loosely coupled, i.e. they originate from separate state
machines and therefore can operate independently.
Legacy hardware and software frequently took advantage of
these additional bus clocks and other bus activities for such
functions as timing and direct memory access (DMA). Therefore,
it is critical to maintain bus cycle compatibility if this
IP is to be used with any legacy hardware or software.
In our architecture, the sequencer must request, and the bus
cycle state machine must supply, these "back porch" cycles.
Legacy instructions that require "back porch" cycles
are INX(x), DCX(x), DAD(x), DSUB, ARHL, and RDEL. New
instructions that require these cycles are MPY and DIV.
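As an illustration of why the back porch matters, the Python sketch below counts the clocks an external observer sees. It only tabulates INX, DCX, and DAD, whose legacy timings are well known (INX and DCX stretch the 4-state opcode fetch to 6 T-states; DAD adds two 3-state bus-idle machine cycles for 10 in total); "MOV" stands in for a single-cycle MOV r,r. The other back-porch instructions would be added the same way. The trace labels and names are hypothetical.

```python
# Legacy 8085 clock counts: INX/DCX stretch the opcode fetch from 4 to
# 6 T-states; DAD adds two 3-state bus-idle machine cycles (10 in total).
OPCODE_FETCH_T_STATES = 4
BACK_PORCH_T_STATES = {"INX": 2, "DCX": 2, "DAD": 6}


def external_bus_trace(mnemonic):
    """Per-clock labels seen on the external bus for one instruction."""
    trace = ["FETCH"] * OPCODE_FETCH_T_STATES
    # Execution has already finished inside the fetch in our core; the bus
    # state machine still supplies the back-porch clocks the sequencer requests.
    trace += ["BACK_PORCH"] * BACK_PORCH_T_STATES.get(mnemonic, 0)
    return trace


# Prints "MOV 4", "INX 6" and "DAD 10" on separate lines.
for m in ("MOV", "INX", "DAD"):
    print(m, len(external_bus_trace(m)))
```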
Instructions that require one or more bus cycles before execution
Some instructions require additional data, such as a constant,
address, or contents of a memory location, to execute. Some
other instructions must write data to a particular memory
location or device to execute. Some instructions must do both.
All of these instructions require additional bus cycles before they can execute.
In our sequencer state machine, most of these instructions
will converge one or more times on a subset of states (which
I refer to as landing states), based upon the required bus
operation and address source, and then diverge again to
continue operation. In Verilog terms, this requires "nested" case
statements; that is, you are going to end up with case statements
within case statements. All Verilog compilers support this, but it is
important to use good coding style to keep track of them.
Immediate operand instructions are a prime example of what must happen
during a multi-cycle read instruction. When the instruction is first
decoded in the sequencer state machine, the sequencer state machine
must post a read request with the bus state machine, using the current
program counter as the data address. Once the bus state machine
acknowledges the request (with the rack signal), the sequencer state
machine must first modify the address source (if necessary), in this
case the program counter, wait for the bus state machine to signal the
data transfer is ready (with the dack signal), and then set up
the appropriate register enables to complete the instruction.
These instructions, along with other instructions (including those
that require more than two bus cycles, but need to do their first read
based upon the program counter), will converge upon a single state
that initiates the bus request. This state is the first in a set of
states that will hold the request until the bus state machine is ready
to accept, increment the program counter, and wait for the bus cycle
to complete. On the last state, the instructions will then diverge
again to either complete the instruction or request another bus cycle.
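The landing sequence just described can be modeled clock by clock. The Python sketch below is an assumption-laden illustration: the state names (`POST_REQ`, `WAIT_DACK`, `DIVERGE`) are hypothetical and do not correspond to the lesson's state encodings, and the rack/dack inputs are supplied as per-clock lists.

```python
# Clock-by-clock model of a hypothetical "read at PC" landing sequence.
# States are illustrative names, not the lesson's state encodings.

def landing_read(pc, rack_seq, dack_seq):
    """Step the landing states with per-clock rack/dack inputs.

    Returns (states visited, final PC).
    """
    state = "POST_REQ"
    visited = []
    for rack, dack in zip(rack_seq, dack_seq):
        visited.append(state)
        if state == "POST_REQ":          # hold the read request at PC
            if rack:                     # bus state machine accepted it
                pc = (pc + 1) & 0xFFFF   # advance the address source once
                state = "WAIT_DACK"
        elif state == "WAIT_DACK":       # wait for the transfer to finish
            if dack:
                state = "DIVERGE"
        else:                            # DIVERGE: hand off to the instruction
            break
    return visited, pc


states, pc = landing_read(0x1000, rack_seq=[0, 1, 0, 0, 0], dack_seq=[0, 0, 0, 1, 0])
print(states)    # ['POST_REQ', 'POST_REQ', 'WAIT_DACK', 'WAIT_DACK', 'DIVERGE']
print(hex(pc))   # 0x1001
```

Because the PC is incremented only on the clock where rack is seen, it advances exactly once no matter how long the request is held.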
Multiple Execution Pitfall
When the state machine diverges from its landing state directly to an
execution state, multiple clock cycles can occur while waiting for
dack. On each clock cycle, the instruction setup will execute. On a
simple memory fetch, such as move immediate operand to register, these
extra executions will not make a difference, since the final execution
will load the appropriate data. But on operations such as add
immediate, etc., the results are catastrophic (unless you want a truly
random number generator). Therefore, we need some means of
preventing these extra executions.
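The pitfall is easy to reproduce in a Python model of a naive execution state whose register write fires on every clock spent waiting for dack. The sketch optimistically assumes the same byte is sampled on each clock; on real hardware the early samples may be garbage, which only makes the add case worse. Names are hypothetical.

```python
# Naive execution state: the register write fires on every clock spent
# waiting for dack, not just once.

def naive_exec(op, a, data_bus, dack_seq):
    """Apply op once per clock until (and including) the dack clock."""
    for data, dack in zip(data_bus, dack_seq):
        if op == "MVI":
            a = data                  # overwrite: repeats are harmless
        elif op == "ADI":
            a = (a + data) & 0xFF     # accumulate: every repeat adds again
        if dack:
            break
    return a


bus = [0x05, 0x05, 0x05]   # assume the same byte is sampled on each clock
acks = [0, 0, 1]           # dack arrives on the third clock
print(hex(naive_exec("MVI", 0x10, bus, acks)))  # 0x5  (correct)
print(hex(naive_exec("ADI", 0x10, bus, acks)))  # 0x1f (should be 0x15)
```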
There are two different methods to prevent multiple executions:
The delayed enable method requires the microcode state machine
to send a signal telling the microcode expander to wait for dack
(this also requires the sequencer level to connect the additional
signals to the microcode expander). The microcode expander must then
incorporate this additional signal and dack into all the equations
for the destination and flag enable signals. The microcode state
machine must hold each individual execution state until dack is
asserted.
In our implementation, we use a pseudo request state (encoded in
rtype) to tell the microcode expander to delay enables until dack is
asserted. This avoids inserting extra bits into the microcode state
machine, which would affect all states and could add logic levels to
all of them, limiting the maximum clock speed. Despite this
precaution, at least one or two logic levels will be added to each
destination and flag enable, which impacts the maximum clock speed.
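The delayed enable method amounts to adding a dack term to every destination and flag enable. The Python sketch below models that extra term; `dest_enable` and `delayed_exec_adi` are hypothetical names, and `delay_req` stands in for the pseudo request carried in rtype. The `(not delay_req or dack)` factor is the extra logic the text says each enable pays for.

```python
# Delayed enable: the expander ANDs the destination enable with dack whenever
# the pseudo request (carried in rtype) says "wait for dack".

def dest_enable(exec_state, delay_req, dack):
    """Hypothetical destination-enable equation with the extra dack term."""
    return exec_state and (not delay_req or dack)


def delayed_exec_adi(a, data_bus, dack_seq):
    """ADI held in one execution state; the write only fires on the dack clock."""
    for data, dack in zip(data_bus, dack_seq):
        if dest_enable(True, True, dack):
            a = (a + data) & 0xFF
        if dack:
            break
    return a


print(hex(delayed_exec_adi(0x10, [0x05, 0x05, 0x05], [0, 0, 1])))  # 0x15
```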
In other architectures, such as in RISC machines that do not use a
microcode state machine for instruction execution, or
microprocessors that use a faster bus cycle, the delayed enable
method would have a much less severe impact and could be the preferred method.
The early dack method involves adding a bit to the bus state
machine that is asserted one clock cycle before dack.
In this method, the instructions do not enter into an execution
state until the early dack (edck) is asserted. The "wait for
early dack" becomes another common landing state for groups
of instructions, which will then diverge to execution states
once the early dack signal is received. These execution states are
not required to wait on dack.
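A Python sketch of the early dack method, assuming the transfer takes at least one wait clock so that edck has a clock to appear on. The instruction waits in a shared "wait for edck" landing state and enters its execution state exactly on the dack clock, so no dack term is needed in the enables. Function and state names are hypothetical; the garbage bytes on the bus before dack show that only the dack-clock sample is used.

```python
# Early dack: edck leads dack by exactly one clock, so an instruction can
# wait in a shared landing state and enter its execution state just in time.

def bus_acks(wait_clocks):
    """Per-clock (edck, dack) pairs for a transfer ending after wait_clocks."""
    dack = [0] * wait_clocks + [1]
    edck = dack[1:] + [0]          # same pulse, one clock earlier
    return list(zip(edck, dack))


def early_dack_adi(a, data_bus, acks):
    """ADI using a 'wait for edck' landing state; no dack term in the enables."""
    state = "WAIT_EDCK"
    for data, (edck, dack) in zip(data_bus, acks):
        if state == "WAIT_EDCK":
            if edck:
                state = "EXEC"         # the next clock is the dack clock
        else:                          # EXEC: runs exactly once
            a = (a + data) & 0xFF
            break
    return a


acks = bus_acks(3)                     # [(0, 0), (0, 0), (1, 0), (0, 1)]
print(hex(early_dack_adi(0x10, [0x99, 0x99, 0x99, 0x05], acks)))  # 0x15
```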
In our architecture, the effect of the additional bit in the bus state
machine is minimal, since the number of bits with or without early
dack will not add additional logic levels. In other bus
architectures, this may not be the case.
When we compile this lesson's code for an XC2S150 using ISE version
10.1, you can see the difference. With early dack, the code requires
1363 LUTs, or 39%. Compiled for the delayed enable method, it
requires 1429 LUTs, or 41%. These extra logic cells can introduce
undesired delays and limit the maximum clock speed.
(Note: Make sure that you set FSM encoding style to NONE before you
compile, or you will get almost 700 warnings since you crippled the …)
Instructions with 3 or more cycles
I compiled and checked this code, but I am saving another simulation
for the next module. If you would like to simulate the immediate
operand instructions, however, alter the test bench and follow the
instructions from previous simulations. I suggest trying with and
without the early dack method, and with the S8085B defined.
Subsequent versions of this code will use the early dack method
exclusively, since we have shown that this method gives us a
smaller IP core. It will also remove a lot of clutter from our code.
A number of instructions require 16-bit operands and thus require
more bus cycles to execute. These instructions are not much different
from the two-cycle instructions, except that each data fetch or write
(other than the last) must request the subsequent fetch or write.
Examine the LXIB instruction. The first data fetch places the memory
data in the C register and then requests another data fetch which
places the memory data in the B register. Each of these data fetches
increments the PC once the request is made. Note that the final
state, which places memory data in the B register, is shared with the
MVIB and MOVMB instructions. In most cases, terminal states can be
shared, reducing the overall number of states required.
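The LXIB sequence can be summarized with a short Python model: two chained immediate reads, low byte to C and then high byte to B, with the "memory data to B" step factored out as the shared terminal state. The dictionary memory and the function names are assumptions for illustration; `pc` is taken to point at the first operand byte, just past the opcode.

```python
# Two chained immediate reads for LXI B, with a terminal state shared by
# other "load B from memory data" instructions.

def load_b(regs, data):
    """Shared terminal state: memory data -> B (LXI B high byte, MVI B, MOV B,M)."""
    regs["B"] = data


def lxi_b(memory, pc, regs):
    """pc points at the first operand byte (just past the opcode)."""
    regs["C"] = memory[pc]           # first fetch: low byte -> C, requests another
    pc = (pc + 1) & 0xFFFF           # PC advances once the request is accepted
    load_b(regs, memory[pc])         # second fetch: high byte -> B
    return (pc + 1) & 0xFFFF


memory = {0x0101: 0x34, 0x0102: 0x12}   # LXI B,1234h assembled at 0100h
regs = {}
pc = lxi_b(memory, 0x0101, regs)
print(hex(regs["B"]), hex(regs["C"]), hex(pc))  # 0x12 0x34 0x103
```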
Another concept introduced here is the use of the temporary register
which, in our architecture, is sixteen bits. The temporary register is
used to hold transitory data, as in the case of the read-modify-write
instructions, INRM and DCRM, or the data being moved from one memory
location to another in the MVIM instruction. It is also used to hold
the I/O address in the IN and OUT instructions until the next bus
request is made. (Note that I/O addresses are only 8 bits; the bus
state machine duplicates the lower 8 bits onto the upper 8 bits.)
During the JMP instruction, the temporary register must hold
the lower 8 bits of the JMP address until the second data fetch request
is made. And during the LDAR and STAR instructions, the temporary
register is used to process and hold the relative address.
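Two of these temporary-register uses are easy to check numerically. The Python sketch below shows the 8-bit I/O port appearing on both halves of the 16-bit address, and a JMP target assembled from the low byte held in TEMP plus the high byte from the second fetch. The function names are hypothetical.

```python
def io_address(port):
    """IN/OUT: the 8-bit port number appears on both halves of the address bus."""
    port &= 0xFF
    return (port << 8) | port


def jmp_target(first_byte, second_byte):
    """JMP: TEMP holds the low byte from the first fetch until the high byte arrives."""
    temp = first_byte & 0xFF
    return ((second_byte & 0xFF) << 8) | temp


print(hex(io_address(0x3F)))        # 0x3f3f
print(hex(jmp_target(0x00, 0x20)))  # 0x2000
```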
The stack is also introduced here. The stack is a memory area for
temporary storage. As items are PUSHed onto the stack, the stack
pointer is first decremented and then used as the address. As items
are POPped from the stack, the stack pointer contents are used as the
address, and then the stack pointer is incremented.
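The Python sketch below models a 16-bit PUSH and POP using the ordering given in the 8085 documentation: PUSH decrements SP before each write (high byte first), and POP reads at SP before each increment (low byte first). The dictionary memory and function names are assumptions for illustration.

```python
def push(memory, sp, value):
    """PUSH: decrement SP, store the high byte; decrement again, store the low byte."""
    sp = (sp - 1) & 0xFFFF
    memory[sp] = (value >> 8) & 0xFF
    sp = (sp - 1) & 0xFFFF
    memory[sp] = value & 0xFF
    return sp


def pop(memory, sp):
    """POP: read the low byte at SP, increment; read the high byte, increment."""
    low = memory[sp]
    sp = (sp + 1) & 0xFFFF
    high = memory[sp]
    sp = (sp + 1) & 0xFFFF
    return (high << 8) | low, sp


memory = {}
sp = push(memory, 0x2000, 0x1234)
print(hex(sp), hex(memory[0x1FFF]), hex(memory[0x1FFE]))  # 0x1ffe 0x12 0x34
value, sp = pop(memory, sp)
print(hex(value), hex(sp))                               # 0x1234 0x2000
```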
With the implementation of these instructions, our microprocessor is
more than 80% complete. We have also removed the differentiator bits
from the state definitions since we can use the request type and
operation fields as virtual differentiators. To this end we have
defined a NULREQ virtual request.
Encode the HLPC instruction. Remember that this instruction is
only present in the S8085D implementation.
The XTHL instruction performs the following sequence:
L is transferred to TMP.
The memory location pointed to by SP is read into L.
TMP is written to the memory location pointed to by SP.
H is transferred to TMP.
The memory location pointed to by SP+1 is read into H.
TMP is written to the memory location pointed to by SP+1.
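Before encoding XTHL, it may help to have a behavioral reference to check the states against. The Python sketch below follows the six listed steps literally through a TMP variable; it models only the data movement, not the state encoding asked for in the exercise, and the names are hypothetical.

```python
def xthl(memory, sp, regs):
    """XTHL: exchange L with (SP) and H with (SP+1) through TMP; SP is unchanged."""
    tmp = regs["L"]                  # L -> TMP
    regs["L"] = memory[sp]           # (SP) -> L
    memory[sp] = tmp                 # TMP -> (SP)
    hi = (sp + 1) & 0xFFFF
    tmp = regs["H"]                  # H -> TMP
    regs["H"] = memory[hi]           # (SP+1) -> H
    memory[hi] = tmp                 # TMP -> (SP+1)


memory = {0x2000: 0xCD, 0x2001: 0xAB}
regs = {"H": 0x12, "L": 0x34}
xthl(memory, 0x2000, regs)
print(hex(regs["H"]), hex(regs["L"]))            # 0xab 0xcd
print(hex(memory[0x2001]), hex(memory[0x2000]))  # 0x12 0x34
```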
Encode the instruction. Try to share states if possible. You
can also move both H and L to TMP at once if it helps.
Draw the state transition diagram for the SHLD instruction.