1. Establishing and Partitioning a
One of the most frequent inefficiencies encountered when
developing an architecture, or even circuitry, for a project is
duplication from a previous version of that project or a similar
one. Such duplication acts as an artificial constraint, stifles
innovation, and may even propagate previously undiscovered, but
still serious, errors.
I have seen a circuit, duplicated in many similar designs, that
mistook an output, meant to drive an optional Pierce oscillator,
for an input. Since the output was ancillary, and therefore
caused no immediate, measurable failures, the circuit was
considered standard. The error was causing increased power
consumption and noise susceptibility, decreased performance due
to die heating, and eventual device failures.
Your target device may also vary significantly from the original
in terms of type and availability of resources. Older devices
used 2.5 micron or larger lithography and had limited,
point-to-point metal layers for routing. As a consequence, most
featured higher fan-in to individual cells and the largest
contributors to pin-to-pad delays were those cells. Modern
devices in general production feature lithographies as small as
20 nanometers and additional metal layers with programmable
access. These devices typically have lower fan-in to individual
cells, and the largest contribution to pin-to-pad delays
comes from routing.
This is not to say that previous versions or similar devices
should not be considered when developing a project. I only
suggest that these factors should be taken as guidance, rather
than as a blueprint for your project.
When developing an architecture, best practice also dictates
taking into account the characteristics of the target range of
devices even though, at an architectural level, this is a very
The architecture that you develop should not only satisfy the
customer requirements at project inception, but anticipate what
those needs will be at project completion. It should also
incorporate your ideas for further utilization, expansion, and
enhancements to your project. In other words, do not think only
of the immediate requirements, but also of future project
evolution. The
best expression I have seen of this approach was a poster (I
don't recall the source) that said:
"Do not undertake vast projects with half vast
Providing the support infrastructure necessary for project
evolution during this phase of development will minimize the
impact of "scope growth." Scope growth normally
originates from changes in customer requirements during the
project development cycle and, if unanticipated, can be a
project killer. Unmitigated scope growth is best summarized
by the expression (origin unknown; paraphrased):
"The purpose of any good employee should be to
anticipate any potential problems, develop solutions for
those problems, and be prepared to execute those solutions
when called upon.... However, when you're up to your a** in
alligators it is difficult to remember that your original
objective was to drain the swamp."
1.2 Isolating Preliminary Functional Blocks
The best approach to architectural development is to consider
the project from a functional perspective, using guidance from
similar devices, then looking for major functional divisions. Do
not yet begin deciding how a block will perform its function.
Build a preliminary, high-level, block diagram based upon these
divisions. A paper copy could be used for this block diagram;
however, I strongly suggest a soft version, since the block
diagram is often used in other documents such as design
specifications and functional descriptions. Graphical design
software, or a word processor with built-in graphical design
capabilities and the ability to group drawing elements and
text objects, is well suited to creating the electronic version,
particularly since blocks may be frequently rearranged.
Connect the external (from a project perspective) signals that
must originate from these preliminary blocks (including
bidirectional signals), but do not interconnect the blocks or add
input signals just yet. These originating or bidirectional
signals may be either generic (when creating a new device) or
specific (when replacing or enhancing an existing device).
Next, refine the block diagram by examining the signals that must
originate from these major blocks and their relation to one
Further divide your block diagram by examining the prospective
functional blocks at a high-level for compliance with
A good way of doing this is by writing a description of each
block. If more than two or three 'ands' or 'alsos' appear in
this description, the block is too big; and requires further
Socialize your work and get input from your peers.
By examining our project intellectual property, its specific
output signals, and other processors in general, we can see the
following major functional blocks:
A reset block that controls initialization and generates
RESET OUT. In our project, this is a virtual rather than a
physical block, since RESET OUT can be generated by the I/O
and the power reset function is normally controlled by
A timing block which controls synchronization and originates
SYSCLK and X2.
An interrupt block which controls the recognition and
prioritization of interrupts and originates -INTA.
A bus interface block which controls the flow of data into
or out of the processor. It originates A[15:8], AD[7:0],
ALE, S0, S1, IO/-M, -RD, -WR, and HLDA.
A serial port that terminates SID and originates SOD.
A group of registers (referred to as a register file) that does
not originate any output signals.
An arithmetic/logic unit that modifies data going to the
register file or the external bus. It does not originate any
An instruction decoder/sequencer that controls instruction
execution. It does not originate any output signals.
Preliminary Block Diagram
1.3 Develop the Interconnection Topology
Identify those signals that can originate both externally
and internally, or from multiple sources. Because of
we cannot add internal tristate drivers for these signals
and, instead, must add an additional block to select the
appropriate signal source. Connect the originating signals
to this block and any additional signal or signal group(s)
from this block to the termination point(s). This step
becomes iterative, since adding additional blocks may
introduce requirements for more selection blocks.
In our microprocessor project, we can see that data going
into the arithmetic/logical unit may originate from either
the register file block or the AD input. Therefore, we must
add a source selection block. For most operations, the
arithmetic/logical unit requires two separate sources. Data
going to the register file or AD bus can originate from this
source selection block or from the arithmetic/logical unit;
therefore, we must add a destination data selection block.
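The two selection blocks described above can be sketched in Verilog as plain multiplexers. This is only an illustration of the idea, not the project's actual code: the module names, port names, select signals, and the WIDTH parameter are all assumptions made for the example.

```verilog
// Hypothetical source selection block: each of the two ALU operands is
// taken either from the register file or from the external AD input.
// All names and the default WIDTH are illustrative assumptions.
module src_select #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] reg_data_a,  // operand A candidate from the register file
    input  wire [WIDTH-1:0] reg_data_b,  // operand B candidate from the register file
    input  wire [WIDTH-1:0] ad_in,       // data arriving from the AD pins
    input  wire             sel_a_ad,    // 1 = take operand A from ad_in
    input  wire             sel_b_ad,    // 1 = take operand B from ad_in
    output wire [WIDTH-1:0] src_a,
    output wire [WIDTH-1:0] src_b
);
    // Multiplexers stand in for the internal tristate drivers we cannot use.
    assign src_a = sel_a_ad ? ad_in : reg_data_a;
    assign src_b = sel_b_ad ? ad_in : reg_data_b;
endmodule

// Hypothetical destination selection block: data bound for the register
// file or the AD bus comes from the ALU result or directly from a source bus.
module dst_select #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] alu_result,
    input  wire [WIDTH-1:0] src_a,
    input  wire             sel_alu,     // 1 = use the ALU result
    output wire [WIDTH-1:0] dst_data
);
    assign dst_data = sel_alu ? alu_result : src_a;
endmodule
```

Keeping each selector in its own module mirrors the block diagram, so a selector can be widened or given more sources without touching the blocks it feeds.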
Working from your initial block diagram, identify and add
those external (from project perspective) input signals that
each block requires to perform its suggested function. These
signals may or may not fan into multiple blocks.
Identify and add those derivative, single-point-of-origin,
single-point-of-destination signals or groups of signals
that (from a functional viewpoint) must logically originate
in one particular block and terminate in another.
In our microprocessor project, we can see that current flags
must originate in the register file and terminate in the
arithmetic/logical unit for modification or comparison.
Identify and add those input signals or groups of signals
that (from a functional perspective) must logically emanate
from one particular block and fan into multiple blocks.
This becomes a grey area. In our project, for instance,
modified flags in all but one instruction must originate in
the arithmetic/logical unit and terminate in either the
instruction decoder/sequencer or the register file. The
sole exception is the 'POP PSW' instruction, where modified
flags can originate from either the arithmetic/logical unit
or the AD bus. In these types of cases, you can either add an
additional selection block, or take advantage of existing
resources. Since the source selection block already makes
the AD bus available to the arithmetic/logical unit, we will
take advantage of that resource to form a single bus,
although we may revisit this decision at a later time.
Optimize known bus sizes for your target range of devices.
Certain bussed signals, for instance address and data, have
known widths or steps of widths. Older devices, with limited
routing resources, sometimes employed multiplexed busses to
minimize routing. In modern and programmable devices,
multiplexed busses can consume resources and even introduce
additional routing constraints. In general, I prefer to size
busses according to the largest data width. But there is a
trade-off: extremely wide busses with high fanout can
consume valuable, impedance-controlled 'long lines'.
Therefore, a mix of wider bus widths and bus multiplexing
should be considered in these cases. Conditional compiles
with both versions could then be used to evaluate the best
For our project, we can see that address and data internally
use the same busses. Furthermore, we can see that all
addresses and the results of some operations are 16 bits
wide. Even though the data width for the majority of
operations is only 8 bits, we must design our source and
destination busses to accommodate the larger, 16-bit width.
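One possible way to set up the conditional compile mentioned above is a single macro that switches a bus between a full-width and a byte-multiplexed version. The sketch below is an assumption-laden illustration: the macro name WIDE_BUS, the module name, and the hi_phase control are hypothetical and not taken from the project.

```verilog
// Define WIDE_BUS (here, or through your tool's macro-definition option)
// to build the full 16-bit version; leave it undefined for the
// byte-multiplexed version. WIDE_BUS is a hypothetical macro name.
// `define WIDE_BUS

module dst_bus (
    input  wire [15:0] result,    // 16-bit address or operation result
    input  wire        hi_phase,  // multiplexed build only: 1 = send the upper byte
`ifdef WIDE_BUS
    output wire [15:0] dst        // wide build: all 16 bits in one transfer
`else
    output wire [7:0]  dst        // multiplexed build: one byte per transfer
`endif
);
`ifdef WIDE_BUS
    // hi_phase is unused in the wide build.
    assign dst = result;
`else
    assign dst = hi_phase ? result[15:8] : result[7:0];
`endif
endmodule
```

Synthesizing both builds for each candidate device gives a direct comparison of routing and resource cost, which is the evaluation the conditional compile is meant to support.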
Re-examine your completed block diagram to ensure that your
proposed architecture will support your customer's immediate
and foreseen needs. Be sure to add descriptions of any added
blocks; then, whenever possible, have your customer and
peers also examine the proposal. Do not try to 'hard
sell' your proposal. Let it stand or fall on its own merits
and make the necessary adjustments. Be prepared enough,
however, to negotiate any scope growth at this point. If you
followed the principles outlined in
Section 1.1, you can delay or
revector some of the most onerous "blue sky"
By applying the principles of section 1.1, section 1.2, and
section 1.3, we can arrive at the following preliminary
architecture for our project (note that some signals are not
named in this diagram simply to improve readability on
low-resolution displays):
Completed Block Diagram
2. Adapting Your Project Architecture to an
2.1 Additional Blocks
Because of rule 3.4, we see that I/Os must be isolated into a
single, top-level block. We can also see from rules 3.5 and 3.6
that the top levels of different hierarchical functions must be
connected to each other. In our project this connection is
accomplished within the I/O block; however, you may instead keep
the I/O block separate and add an interconnect block. This allows the
I/O block to be easily swapped out. Changeable I/O is desirable,
since the I/O architecture of an embedded intellectual property
can vary significantly from the I/O block needed when that same
IP is used for a stand-alone application. This also allows the
synthesis tool to readily remove unused functions by tying the
inputs used only by that function to a static value (0 or 1).
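As a rough illustration of tying off an unused function, the sketch below shows an interconnect fragment that feeds the serial port either its real input or a constant. The module name, port names, and the USE_SERIAL parameter are assumptions made for this example only.

```verilog
// Hypothetical interconnect fragment. When USE_SERIAL is 0, the serial
// port's only data input is held at a static 0, which gives the synthesis
// tool the opportunity to trim logic that depends solely on that input.
module interconnect #(
    parameter USE_SERIAL = 1        // 0 = serial port not used in this build
) (
    input  wire sid_from_io,        // SID as delivered by the I/O block
    output wire sid_to_serial       // SID as seen by the serial port
);
    assign sid_to_serial = USE_SERIAL ? sid_from_io : 1'b0;
endmodule
```

Because the selection is made by a parameter, an embedded build and a stand-alone build can share the same source and differ only in how the interconnect is instantiated.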
Many device vendor synthesis tools can infer global clocks. In
some cases, they must be tied to a buffer in the I/O module. I
generally apply those buffers without any associated compiler
directives, since many ASICs will require them. However, applying
a buffer to an input signal may affect the synthesis of certain
vendor topologies or devices so you may end up adding compiler
directives or conditional statements later to support these
vendor devices or topologies.
Also note that bidirectional signals are broken apart where they
leave and enter the I/O module. For instance, in our microprocessor
IP, the AD bus has an ad_in signal set going out of the I/O
module to the connected modules, and an ad_out signal set
entering the I/O module from the connected modules. Because of
this separation, we can remove ad_in from the bus interface,
since the bus interface will never directly use it.
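The separation of a bidirectional bus at the I/O boundary might look like the following sketch. Only ad_in and ad_out come from the text; the module name, the pad name ad_pad, and the enable ad_oe are hypothetical.

```verilog
// Hypothetical I/O fragment for the AD bus. The bidirectional pins exist
// only here; inside the design the bus is two one-way signal sets.
module io_ad (
    inout  wire [7:0] ad_pad,   // bidirectional device pins (assumed name)
    input  wire [7:0] ad_out,   // data from the connected modules toward the pins
    input  wire       ad_oe,    // assumed enable: 1 = drive the pins
    output wire [7:0] ad_in     // data from the pins toward the connected modules
);
    // Drive the pins only when enabled; otherwise release them.
    assign ad_pad = ad_oe ? ad_out : 8'bz;

    // The inbound copy always follows whatever is on the pins.
    assign ad_in = ad_pad;
endmodule
```

With this arrangement no tristate logic is needed anywhere outside the I/O module, which is what lets the I/O block be swapped without disturbing the rest of the design.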
Whenever possible, the control and signal sense of all I/O
buffers should be the same. In other words, all I/O tristate
buffers should be either bufif0 or bufif1, but not a mix of the
two. They should be non-inverting in function. When this is not
possible, as in the case of the buffer for rst_out_pad in our
project, it is better to invert the state of the control or
input signal than to change the buffer (in our example we
will invert rst_in_n to generate rst_out_pad rather than
inserting an inverting buffer).
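A minimal sketch of this convention follows, using bufif1 for every pad and performing the inversion in the signal path. The names rst_in_n and rst_out_pad come from the text; the module name, the sod ports, and the shared pad_oe enable are assumptions for illustration.

```verilog
// Hypothetical I/O fragment: every output pad uses the same non-inverting,
// active-high-enable primitive (bufif1).
module io_ctrl_pads (
    input  wire rst_in_n,      // active-low reset input
    input  wire sod,           // serial output data from the core (assumed port)
    input  wire pad_oe,        // assumed common output enable: 1 = drive
    output wire rst_out_pad,
    output wire sod_pad
);
    // The inversion is done on the signal, not by choosing a different buffer.
    wire rst_out = ~rst_in_n;

    // bufif1 port order: (output, data input, enable).
    bufif1 u_rst_out (rst_out_pad, rst_out, pad_oe);
    bufif1 u_sod     (sod_pad,     sod,     pad_oe);
endmodule
```

Keeping one buffer type throughout means a change of target device only requires remapping a single primitive.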
2.2 Developing a Project Skeleton
The project skeleton is the first version of your Verilog code.
It consists of a completed I/O module and black box modules for
each of your physical architectural blocks, which will serve
either as complete functions or as the top level modules in
function hierarchies. Connect the known signals to these blocks.
Additional signals can easily be inserted and unused signals
removed as your project evolves.
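To give a feel for the shape of such a skeleton, here is a heavily reduced sketch with three black-box modules and a top level that connects the known signals. It is not the project's skeleton: the module names, the clk_in input, and the choice of which signals to show are assumptions, and the completed I/O module is omitted for brevity.

```verilog
// Black-box placeholders: port lists only, bodies to be filled in later.
module timing_blk (
    input  wire clk_in,    // assumed clock source from the I/O block
    output wire sysclk,
    output wire x2
);
endmodule

module interrupt_blk (
    input  wire sysclk,
    input  wire rst,
    output wire inta_n     // -INTA
);
endmodule

module serial_blk (
    input  wire sysclk,
    input  wire sid,
    output wire sod
);
endmodule

// Top level: instantiates the black boxes and wires up the known signals.
module skeleton_top (
    input  wire clk_in,
    input  wire rst_in_n,
    input  wire sid,
    output wire sysclk,
    output wire x2,
    output wire inta_n,
    output wire sod
);
    wire rst = ~rst_in_n;  // internal active-high reset (assumed convention)

    timing_blk    u_timing    (.clk_in(clk_in), .sysclk(sysclk), .x2(x2));
    interrupt_blk u_interrupt (.sysclk(sysclk), .rst(rst), .inta_n(inta_n));
    serial_blk    u_serial    (.sysclk(sysclk), .sid(sid), .sod(sod));
endmodule
```

Named port connections make it straightforward to insert or remove signals as the blocks are filled in, since each connection is independent of port order.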
The first project evolution can begin while you are creating the
project skeleton. While you are writing the Verilog code,
project the block functional requirements and add any additional
signal resource dependencies or unused signals that you have not
identified during your block diagram development. Add or remove
these signals to/from your functional block, project hierarchy,
and block diagram. The block diagram that I presented earlier
incorporated this type of evolution.
My project skeleton for our microprocessor appears as: