head	1.6;
access;
symbols
	bg2_23:1.5
	bg2_22:1.5
	bg2_21:1.5
	bg2_20:1.5
	bg2_16:1.5
	bg2_15:1.5
	bg2_12:1.5
	bg2_07:1.5
	isorc2008_submission:1.5
	handbook_alpha_edition:1.4;
locks; strict;
comment	@% @;


1.6
date	2008.08.19.16.20.08;	author martin;	state Exp;
branches;
next	1.5;
commitid	599148aaf29f4567;

1.5
date	2007.11.13.19.05.00;	author martin;	state Exp;
branches;
next	1.4;
commitid	50624739f55a4567;

1.4
date	2007.11.05.12.43.26;	author martin;	state Exp;
branches;
next	1.3;
commitid	50ee472f0fbc4567;

1.3
date	2007.10.03.02.02.37;	author martin;	state Exp;
branches;
next	1.2;
commitid	3f124702f8374567;

1.2
date	2007.09.14.03.39.58;	author martin;	state Exp;
branches;
next	1.1;
commitid	3acf46ea02864567;

1.1
date	2007.09.14.00.41.20;	author martin;	state Exp;
branches;
next	;
commitid	341646e9d8ae4567;


desc
@@


1.6
log
@Corrections as suggested by Trevor
@
text
@
\section{Introduction}

The intention of the following SoC interconnect standard is to be
simple and efficient with respect to implementation resources and
transaction latency.

SimpCon is a fully synchronous standard for on-chip
interconnections. It is a point-to-point connection between a master
and a slave. The master starts either a read or write transaction.
Master commands are single cycle to free the master to continue on
internal operations during an outstanding transaction. The slave has
to register the address when needed for more than one cycle. The
slave also registers the data on a read and provides it to the
master for more than a single cycle. This property allows the master
to delay the actual read if it is busy with internal operations.

The slave signals the end of the transaction through a novel
\emph{ready counter} to provide an early notification. This early
notification simplifies the integration of peripherals into
pipelined masters.

Slaves can also provide several levels of pipelining. This feature
is announced by two static output ports (one for the read and one for
the write pipeline level).

Off-chip connections (e.g.\ main memory) are device specific and
need a slave to perform the translation. Peripheral interrupts are
not covered by this specification.

\subsection{Features}

\begin{itemize}
    \item Master/slave point-to-point connection
    \item Synchronous operation
    \item Read and write transactions
    \item Early pipeline release for the master
    \item Pipelined transactions
    \item Open-source specification
    \item Low implementation overheads
\end{itemize}

\subsection{Basic Read Transaction}

Figure~\ref{fig:sc:basic:rd} shows a basic read transaction for a
slave with one cycle latency. The acknowledge signals are omitted
from the figure. In the first cycle, the address phase, the \sign{rd}
signals the slave to start the read transaction. The address is
registered by the slave. During the following cycle, the read
phase\footnote{It has to be noted that the read phase can be longer
for devices with a high latency. For simple on-chip IO devices the
read phase can be omitted completely (0 cycles). In that case
\sign{rdy\_cnt} will be zero in the cycle following the address
phase.}, the slave performs the read and registers the data. Due to
the register in the slave, the data is available in the third cycle,
the result phase. To simplify the master, \sign{rd\_data} stays valid
until the next read request response. It is therefore possible for a
master to issue a pre-fetch command early. When the pre-fetched data
arrives too early it is still valid when the master actually wants to
read it.

\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_basic_rd}
    \caption{Basic read transaction}
    \label{fig:sc:basic:rd}
\end{figure}

\subsection{Basic Write Transaction}

A write transaction consists of a single cycle address/command phase
started by assertion of \sign{wr} where the address and the write
data are valid. \sign{address} and \sign{wr\_data} are usually
registered by the slave. The end of the write cycle is signalled to
the master by the slave with \sign{rdy\_cnt}. See
Section~\ref{sec:ack} and an example in Figure~\ref{fig:sc:wr:ws}.

\section{SimpCon Signals}

This section defines the signals used by the SimpCon connection.
Some of the signals are optional and may not be present on a
peripheral device.

All signals are a single direction point-to-point connection between
a master and a slave. The signal details are described by the device
that drives the signal. Table~\ref{tab:sc:signals} lists the signals
that define the SimpCon interface. The column Direction indicates
whether the signal is driven by the master or the slave.

\begin{table}
    \centering

    \begin{tabular}{lrlll}
        \toprule
        Signal & Width & Direction & Required & Description \\
        \midrule
        \sign{address} & 1--32 & Master & No & Address lines from the
        master\\
        & & & & to the slave port\\
        \sign{wr\_data} & 32 & Master & No & Data lines from the
        master\\
        & & & & to the slave port\\
        \sign{rd} & 1 & Master & No & Start of a read transaction \\
        \sign{wr} & 1 & Master & No & Start of a write transaction \\
        \sign{rd\_data} & 32 & Slave & No & Data lines from the
        slave\\
        & & & & to the master port\\
        \sign{rdy\_cnt} & 2 & Slave & Yes & Transaction end signalling \\
        \sign{rd\_pipeline\_level} & 2 & Slave & No & Maximum pipeline
        level\\
        & & & & for read transactions \\
        \sign{wr\_pipeline\_level} & 2 & Slave & No & Maximum pipeline
        level\\
        & & & & for write transactions \\
        \bottomrule

    \end{tabular}
    \caption{SimpCon port signals}
    \label{tab:sc:signals}

\end{table}


\subsection{Master Signal Details}

This section describes the signals that are driven by the master to
initiate a transaction.

\subsubsection{address}

Master addresses represent word addresses as offsets in the slave's
address range. \sign{address} is valid for a single cycle, either with
\sign{rd} for a read transaction or with \sign{wr} and
\sign{wr\_data} for a write transaction.

The number of bits for \sign{address} depends on the slave's address
range. For a single port slave, \sign{address} can be omitted.

\subsubsection{wr\_data}

The \sign{wr\_data} signals carry the data for a write transaction.
It is valid for a single cycle together with \sign{address} and
\sign{wr}. The signal is typically 32 bits wide. Slaves can ignore
upper bits when the slave port is less than 32 bits.

\subsubsection{rd}

The \sign{rd} signal is asserted for a single clock cycle to start a
read transaction. \sign{address} has to be valid in the same cycle.

\subsubsection{wr}

The \sign{wr} signal is asserted for a single clock cycle to start a
write transaction. \sign{address} and \sign{wr\_data} have to be
valid in the same cycle.

\subsubsection{sel\_byte}

The \sign{sel\_byte} signal is reserved for future versions of the
SimpCon specification to add individual byte enables.

\subsection{Slave Signal Details}

This section describes the signals that are driven by the slave as a
response to transactions initiated by the master.

\subsubsection{rd\_data}

The \sign{rd\_data} signals carry the result of a read transaction.
The data is valid when \sign{rdy\_cnt} reaches 0 and stays valid
until a new read result is available. The signal is typically 32 bits
wide. Slaves that provide less than 32 bits should pad the upper bits
with 0.

\subsubsection{rdy\_cnt}

The \sign{rdy\_cnt} signal provides the number of cycles until the
pending transaction will finish. A 0 means that either read data is
available or a write transaction has been finished. Values of 1 and 2
mean the transaction will finish in at least 1 or 2 cycles. The
maximum value is 3 and means the transaction will finish in 3 or
\emph{more} cycles. Note that not all values have to be used in a
transaction. Any monotonically non-increasing sequence of
\sign{rdy\_cnt} values is legal.

\subsubsection{rd\_pipeline\_level}

The static \sign{rd\_pipeline\_level} provides the master with the
read pipeline level of the slave. The signal has to be constant to
enable the synthesizer to optimize the pipeline level dependent
state machine in the master.


\subsubsection{wr\_pipeline\_level}

The static \sign{wr\_pipeline\_level} provides the master with the
write pipeline level of the slave. The signal has to be constant to
enable the synthesizer to optimize the pipeline level dependent
state machine in the master.

\section{Slave Acknowledge}
\label{sec:ack}

Flow control between the slave and the master is usually done by a
single signal in the form of \emph{wait} or \emph{acknowledge}. The
\sign{ack} signal, e.g.\ in the Wishbone specification, is set when
the data is available or the write operation has finished. However,
for a pipelined master it is useful to know \emph{earlier} when a
transaction will finish.


For many slaves, e.g.\ an SRAM interface with fixed wait states, this
information is available inside the slave. In the SimpCon interface,
this information is communicated to the master through the two bit
ready counter (\sign{rdy\_cnt}). \sign{rdy\_cnt} signals the number
of cycles until the read data will be available or the write
transaction will be finished. Value 0 is equivalent to an \emph{ack}
signal and 1, 2, and 3 are equivalent to a wait request with the
distinction that the master knows how long the wait request will
last.

To avoid too many signals at the interconnect, \sign{rdy\_cnt} has a
width of two bits. Therefore, the maximum value of 3 has the special
meaning that the transaction will finish in 3 or \emph{more} cycles.
As a result the master can only use the values 0, 1, and 2 to release
actions in its pipeline. If necessary, an extension for a longer
pipeline is straightforward with a larger
\sign{rdy\_cnt}\footnote{The maximum value of the ready counter is
relevant for the early restart of a waiting master. A longer latency
from the slave e.g., for DDR SDRAM, will map to the maximum value of
the counter for the first cycles.}.

Idle slaves will keep the former value of 0 for \sign{rdy\_cnt}.
Slaves that do not know in advance how many wait states are needed
for the transaction can produce sequences that omit any of the
numbers 3, 2, and 1. A simple slave can hold \sign{rdy\_cnt} at 3
until the data is available and then set it directly to 0. The master
has to handle those situations. Practically, this reduces the
possibilities of pipelining and therefore the performance of the
interconnect. The master will read the data later, which is not an
issue as the data stays valid.

Figure~\ref{fig:sc:rd:ws} shows an example of a slave that needs
three cycles for the read to be processed. In cycle 1, the read
command and the address are set by the master. The slave registers
the address and sets \sign{rdy\_cnt} to 3 in cycle 2. The read takes
three cycles (2--4) during which \sign{rdy\_cnt} gets decremented. In
cycle 4 the data is available inside the slave and gets registered.
It is available in cycle 5 for the master and \sign{rdy\_cnt} is
finally 0. Both \sign{rd\_data} and \sign{rdy\_cnt} keep
their values until a new transaction is requested.

\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_rd_ws}
    \caption{Read transaction with wait states}
    \label{fig:sc:rd:ws}
\end{figure}
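The read of Figure~\ref{fig:sc:rd:ws} can also be written as a
cycle-by-cycle trace. The following table is only an illustrative
sketch: it assumes that the slave is idle before cycle 1 (so
\sign{rdy\_cnt} starts at 0), and the row labels \emph{counting slave}
and \emph{simple slave} are names chosen here, not terms of the
specification. The first \sign{rdy\_cnt} row follows the figure; the
second shows a slave that holds \sign{rdy\_cnt} at 3 and then sets it
directly to 0.

\begin{center}
    \begin{tabular}{lccccc}
        \toprule
        Cycle & 1 & 2 & 3 & 4 & 5 \\
        \midrule
        \sign{rd} & 1 & 0 & 0 & 0 & 0 \\
        \sign{rdy\_cnt} (counting slave) & 0 & 3 & 2 & 1 & 0 \\
        \sign{rdy\_cnt} (simple slave) & 0 & 3 & 3 & 3 & 0 \\
        \bottomrule
    \end{tabular}
\end{center}

With the counting slave, the master sees the value 1 in cycle 4 and
therefore knows one cycle ahead that the data will be valid in cycle
5. With the simple slave, the master learns this only in cycle 5, when
\sign{rdy\_cnt} is already 0.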


Figure~\ref{fig:sc:wr:ws} shows an example of a slave that needs
three cycles for the write to be processed. The address, the data to
be written, and the write command are valid during cycle 1. The slave
registers the address and write data during cycle 1 and performs the
write operation during cycles 2--4. The \sign{rdy\_cnt} is
decremented and a non-pipelined slave can accept a new command after
cycle 4.

\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_wr_ws}
    \caption{Write transaction with wait states}
    \label{fig:sc:wr:ws}
\end{figure}


\section{Pipelining}

Figure~\ref{fig:sc:pipe:level} shows a read transaction for a slave
with four clock cycles latency. Without any pipelining, the next read
transaction will start in cycle 7 after the data from the former read
transaction is read by the master. The three bottom lines show when
new read transactions (only the \sign{rd} signal is shown, address
lines are omitted from the figure) can be started for different
pipeline levels. With pipeline level 1, a new transaction can start
in the same cycle when the former read data is available (in this
example in cycle 6). At pipeline level 2, a new transaction (either
read or write) can start when \sign{rdy\_cnt} is 1, for pipeline
level 3 the next transaction can start at a \sign{rdy\_cnt} of 2.

\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_pipe_level}
    \caption{Different pipeline levels for a read transaction}
    \label{fig:sc:pipe:level}
\end{figure}
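The start conditions described for Figure~\ref{fig:sc:pipe:level} can
be summarized as a rule of thumb. As a sketch, with $p$ denoting the
pipeline level announced by the slave (the symbol $p$ is introduced
here for illustration only), a master may issue the next command once
\[
    \mathit{rdy\_cnt} \le p - 1, \qquad p \in \{1, 2, 3\}.
\]
The following table lists the resulting earliest start for the slave
with four cycles latency. The cycles for levels 2 and 3 are not stated
in the text; they are inferred here under the assumption that
\sign{rdy\_cnt} counts down 2, 1, 0 in cycles 4, 5, and 6.

\begin{center}
    \begin{tabular}{lll}
        \toprule
        Pipeline level & \sign{rdy\_cnt} at start & Earliest cycle \\
        \midrule
        none & 0, data already read & 7 \\
        1 & 0 & 6 \\
        2 & 1 & 5 (inferred) \\
        3 & 2 & 4 (inferred) \\
        \bottomrule
    \end{tabular}
\end{center}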

The implementation of level 1 in the slave is trivial (just two more
transitions in the state machine). It is recommended to provide at
least level 1 for read transactions. Level 2 is a little bit more
complex but usually no additional address or data registers are
necessary.

To implement level 3 pipelining in the slave, at least one additional
address register is needed. However, to use level 3 the master has to
issue the request in the same cycle as \sign{rdy\_cnt} goes to 2.
That means this transition is combinatorial. We see in
Figure~\ref{fig:sc:pipe:level} that \sign{rdy\_cnt} value of 3 means
three or more cycles until the data is available and can therefore
not be used to trigger a new transaction. Extension to an even deeper
pipeline needs a wider \sign{rdy\_cnt}.


\subsection{Interconnect}

Although the definition of SimpCon is from a single master/slave
point-to-point viewpoint, all variations of multiple slave and
multiple master devices are possible.

\subsubsection{Slave Multiplexing}

To add several slaves to a single master, \sign{rd\_data} and
\sign{rdy\_cnt} have to be multiplexed. Due to the fact that all
\sign{rd\_data} signals are already registered by the slaves, a
single pipeline stage will be enough for a large multiplexer. The
multiplexer select is known at the start of the transaction but is
needed one cycle later at the earliest. It can therefore be
registered to further speed up the multiplexer.

%TODO: add a schematic for the master \sign{rd\_data} multiplexer.
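The multiplexing described above can be sketched as two equations.
The symbols are assumptions made for this illustration only: $s$ is a
register holding the index of the selected slave, $\mathit{decode}$
maps an address to a slave index, and a subscript denotes the output
of the slave with that index. The select register is loaded in the
cycle in which the command is issued,
\[
    s \leftarrow \mathit{decode}(\mathit{address})
    \quad \mbox{when } \mathit{rd} = 1 \mbox{ or } \mathit{wr} = 1,
\]
and from the following cycle on the master sees
\[
    \mathit{rd\_data} = \mathit{rd\_data}_{s}, \qquad
    \mathit{rdy\_cnt} = \mathit{rdy\_cnt}_{s}.
\]
Because $s$ comes from a register, the address decoder is not on the
path from the slave outputs to the master.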

\subsubsection{Master Multiplexing}

SimpCon defines no signals for the communication between a master and
an arbiter. However, it is possible to build a multi-master system
with SimpCon. The SimpCon interface can be used as an interconnect
between the masters and the arbiter and the arbiter and the slaves.
In this case the arbiter acts as a slave for the master and as a
master for the peripheral devices. An example of an arbiter for
SimpCon, where JOP and a VGA controller are two masters for a shared
main memory, can be found in \cite{jop:dma}. The same arbiter is also
used to build a chip-multiprocessor version of JOP.

The missing arbitration protocol in SimpCon results in the need to
queue $n-1$ requests in an arbiter for $n$ masters. However, this
additional hardware results in a zero cycle bus grant. The master,
which gets the bus granted, starts the slave transaction in the same
cycle as the original read/write request.

%TODO: add a timing diagram to explain this concept.


\section{Examples}

This section provides some examples for the application of the
SimpCon definition.

\subsection{IO Port}

TODO: Show how simple an IO port can be with SimpCon. We need no
addresses and can tie \sign{rdy\_cnt} to 0. We only need the
\sign{rd} or \sign{wr} signal to enable the port.

\subsection{SRAM interface}

The following example is taken from an implementation of SimpCon for
a Java processor. The processor is clocked at 100~MHz and the main
memory consists of 15~ns static RAMs. Therefore the minimum access
time for the RAM is two cycles. The slack time of 5~ns forces us to
use output registers for the RAM address and write data and input
registers for the read data in the IO cells of the FPGA. These
registers fit nicely with the intention of SimpCon to use registers
inside the slave.

Figure~\ref{fig:sc:sram} shows the memory interface for a
non-pipelined read access followed by a write access. Four signals
are driven by the master and two signals by the slave. The lower
half of the figure shows the signals at the FPGA pins where the RAM
is connected.


\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_sram}
    \caption{Static RAM interface without pipelining}
    \label{fig:sc:sram}
\end{figure}
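The two-cycle access and the 5~ns slack quoted at the beginning of
this subsection follow directly from the clock period. As a short
worked calculation, with $T_{\mathit{clk}}$ and $t_{\mathit{acc}}$
introduced here as shorthand for the clock period and the RAM access
time:
\[
    T_{\mathit{clk}} = \frac{1}{100~\mathrm{MHz}} = 10~\mathrm{ns},
    \qquad
    \left\lceil \frac{t_{\mathit{acc}}}{T_{\mathit{clk}}} \right\rceil
    = \left\lceil \frac{15~\mathrm{ns}}{10~\mathrm{ns}} \right\rceil = 2,
\]
\[
    2\,T_{\mathit{clk}} - t_{\mathit{acc}}
    = 20~\mathrm{ns} - 15~\mathrm{ns} = 5~\mathrm{ns}.
\]
The remaining 5~ns have to cover the delay from the on-chip register
to the RAM inputs and from the RAM outputs back to the on-chip
register.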

In cycle~1 the read transaction is started by the master and the
slave registers the address. The slave also sets the registered
control signals \sign{ncs} and \sign{noe} during cycle~1. Due to the
placement of the registers in the IO cells, the address and control
signals are valid at the FPGA pins very early in cycle~2. At the end
of cycle~3 (15~ns after \sign{address}, \sign{ncs} and \sign{noe} are
stable) the data from the RAM is available and can be sampled with
the rising edge for cycle~4. The setup time for the read register is
short, as the register can be placed in the IO cell. The master reads
the data in cycle~4 and starts a write transaction in cycle~5.
Address and data are again registered by the slave and are available
for the RAM at the beginning of cycle~6. To perform a write in two
cycles the \sign{nwr} signal is registered by a negative-edge-triggered
flip-flop.

In Figure~\ref{fig:sc:sram:prd} we see a pipelined read from the RAM
with pipeline level 2. With this pipeline level and the two cycles
read access time of the RAM we achieve the maximum possible
bandwidth.

\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/sc_sram_prd}
    \caption{Pipelined read from a static RAM}
    \label{fig:sc:sram:prd}
\end{figure}

We can see the start of the second read transaction in cycle 3
during the read of the first data from the RAM. The new address is
registered in the same cycle and available for the RAM in the
following cycle 4. Although we have a pipeline level of 2 we need no
additional address or data register. The read data is available for
two cycles (\sign{rdy\_cnt} 2 or 1 for the next read) and the master
is free to select one of the two cycles to read the data.

It has to be noted that pipelining with one read per cycle is
possible with SimpCon. We just showed a 2 cycle slave in this
example. For an SDRAM memory interface the ready counter will stay
either at 2 or 1 during the single cycle reads (depending on the
slave pipeline level). It will go down to 0 only for the last data
word to read.

\subsection{Master Multiplexing}

To add several slaves to a single master the \sign{rd\_data} and
\sign{rdy\_cnt} have to be multiplexed. Due to the fact that all
\sign{rd\_data} signals are registered by the slaves a single
pipeline stage will be enough for a large multiplexer. The
multiplexer select is known at the start of the transaction but is
needed one cycle later at the earliest. It can therefore be
registered to further speed up the multiplexer.
\ \\
\ \\
TODO: add a schematic for the master \sign{rd\_data} multiplexer.

\section{Available VHDL Files}

Besides the SimpCon documentation, some example VHDL files for slave
devices and bridges are available from
\url{http://www.opencores.org/projects.cgi/web/simpcon/overview}. All
components are also part of the standard JOP distribution.


\subsection{Components}

\begin{itemize}
    \item \code{sc\_pack.vhd} defines VHDL records and some
    constants.
    \item \code{sc\_test\_slave} is a very simple SimpCon device. A
    counter to be read out and a register that can be written and
    read. There is no connection to the outer world. This example
    can be used as basis for a new SimpCon device.
    \item \code{sc\_sram16.vhd} is a memory controller for 16-bit
    SRAM.
    \item \code{sc\_sram32.vhd} is a memory controller for 32-bit
    SRAM.
    \item \code{sc\_sram32\_flash.vhd} is a memory controller for 32-bit
    SRAM, a NOR Flash, and a NAND Flash as used in the Cycore FPGA board for JOP.
    \item \code{sc\_uart.vhd} is a simple UART with configurable
    baud rate and FIFO width.
    \item \code{sc\_usb.vhd} is an interface to the parallel port of
    the FTDI 2232 USB chip. The register definition is identical to
    the UART and the USB connection can be used as a drop in
    replacement for a UART.
    \item \code{sc\_isa.vhd} interfaces the old ISA bus. It can be used
    for the popular CS8900 Ethernet chip.
    \item \code{sc\_sigdel.vhd} is a configurable sigma-delta converter
    for an FPGA that needs at minimum just two external components:
    a capacitor and a resistor.
    \item \code{sc\_fpu.vhd} provides an interface to the 32-bit FPU available
    at \url{www.opencores.org}.
    \item \code{sc\_arbiter.vhd} is a zero cycle latency,
        priority-based SimpCon arbiter written by Christof Pitter
        \cite{jop:cmp}.
\end{itemize}

\subsection{Bridges}

\begin{itemize}
    \item \code{sc2wb.vhd} is a SimpCon/Wishbone \cite{soc:wishbone}
    bridge.
    \item \code{sc2avalon.vhd} is a SimpCon/Avalon \cite{soc:avalon}
    bridge to integrate a SimpCon based design with Altera's SOPC
    Builder \cite{quartus}.
    \item \code{sc2ahbsl.vhd} provides an interface to AHB slaves as
    defined in Gaisler's GRLIB \cite{grlib}. Many of the available
    GPL AHB modules from the GRLIB can be used in a SimpCon based
    design.
\end{itemize}


\section{Why a New Interconnection Standard?}

There are many interconnection standards available for SoC designs.
The natural question is: Why propose a new one? The answer is given
in the following section. In summary, the available standards are
still in the tradition of backplane busses and do not fit very well
for pipelined on-chip interconnections.

\subsection{Common SoC Interconnections}


Several point-to-point and bus standards have been proposed. The
following section gives a brief overview of common SoC
interconnection standards.

The Advanced Microcontroller Bus Architecture (AMBA) \cite{soc:amba}
is the interconnection definition from ARM. The specification defines
three different busses: Advanced High-performance Bus (AHB), Advanced
System Bus (ASB), and Advanced Peripheral Bus (APB). The AHB is used
to connect on-chip memory, cache, and external memory to the
processor. Peripheral devices are connected to the APB. A bridge
connects the AHB to the lower bandwidth APB. An AHB bus transfer can
be one cycle with burst operation. With the APB a bus transfer
requires two cycles and no burst mode is available. Peripheral bus
cycles with wait states were added in version 3 of the APB
specification. ASB is the predecessor of AHB and is not recommended
for new designs (ASB uses both clock phases for the bus signals --
very uncommon for today's synchronous designs). The AMBA 3 AXI
(Advanced eXtensible Interface) \cite{soc:amba3} is the latest
extension to AMBA. AXI introduces out-of-order transaction completion
with the help of a 4 bit transaction ID tag. A ready signal
acknowledges the transaction start. The master has to hold the
transaction information (e.g.\ address) until the interconnect
signals ready. This enhancement ruins the elegant single cycle
address phase from the original AHB specification.

Wishbone \cite{soc:wishbone} is a public domain standard used by
several open-source IP cores. The Wishbone interface specification is
still in the tradition of microcomputer or backplane busses. However,
for a SoC interconnect, which is usually
point-to-point\footnote{Multiplexers are used instead of busses to
connect several slaves and masters.}, this is not the best approach.
The master is requested to hold the address and data valid through
the whole read or write cycle. This complicates the connection to a
master that has the data valid only for one cycle. In this case the
address and data have to be registered \emph{before} the Wishbone
connect or an expensive (time and resources) multiplexer has to be
used. A register results in one additional cycle latency. A better
approach would be to register the address and data in the slave. In
that case the address decoding in the slave can be performed in the
same cycle as the address is registered. A similar issue, with
respect to the master, exists for the output data from the slave: As
it is only valid for a single cycle, the data has to be registered by
the master when the master is not reading it immediately. Therefore,
the slave should keep the last valid data at its output even when the
Wishbone strobe signal (\emph{wb.stb}) is no longer asserted.
Holding the data in the slave usually comes \emph{for free} in terms
of hardware complexity -- it is \emph{just} a specification issue. In
the Wishbone specification there is no way to perform pipelined reads
or writes. However, for block memory transfers (e.g.\ a cache load)
pipelining is the usual way to achieve good performance.


The Avalon \cite{soc:avalon} interface specification is provided by
Altera for a system-on-a-programmable-chip (SOPC) interconnection.
Avalon defines a great range of interconnection devices ranging from
a simple asynchronous interface intended for direct static RAM
connection up to sophisticated pipeline transfers with variable
latencies. This great flexibility provides an easy path to connect a
peripheral device to Avalon. How is this flexibility possible? The
\emph{Avalon Switch Fabric} translates between all those different
interconnection types. The switch fabric is generated by Altera's
SOPC Builder tool. However, it seems that this switch fabric is
Altera proprietary, thus tying this specification to Altera FPGAs.

The On-Chip Peripheral Bus (OPB) \cite{soc:opb} is an open standard
provided by IBM and used by Xilinx. The OPB specifies a bus for
multiple masters and slaves. The implementation of the bus is not
directly defined in the specification. A distributed ring, a
centralized multiplexer, or a centralized AND/OR network are
suggested. Xilinx uses the AND/OR approach and all masters and
slaves must drive the data busses to zero when inactive.


Sonics Inc.\ defined the Open Core Protocol (OCP) \cite{soc:ocp} as
an open, freely available standard. The standard is now handled by
the OCP International Partnership\footnote{\url{www.ocpip.org}}.


\subsection{What's Wrong with the Classic Standards?}

All SoC interconnection standards, which are widely in use, are still
in the tradition of a backplane bus. They force the master to hold
the address and control signals until the slave provides the data or
acknowledges the write request. However, this is not necessary in a
clocked, synchronous system. Why should we force the master to hold
the signals? Let the master move on after submitting the request in a
single cycle. Forcing the address and control valid for the complete
request disables any form of pipelined requests.


\begin{figure}
    \centering
    \includegraphics[scale=\scgrsc]{\scgrp/wb_basic_rd}
    \caption{Classic basic read transaction}
    \label{fig:wb:basic:rd}
\end{figure}

Figure~\ref{fig:wb:basic:rd} shows a read transaction with wait
states as defined in Wishbone \cite{soc:wishbone}, Avalon
\cite{soc:avalon}, OPB \cite{soc:opb}, and OCP
\cite{soc:ocp}.\footnote{The signal names are different, but the
principle is the same for all mentioned busses.} The master issues
the read request and the address in cycle 1. The slave has to reset
the \sign{ack} in the same cycle. When the slave data is available,
the acknowledge signal is set (\sign{ack} in cycle 3). The master has
to read the data and register it within the same clock cycle. The
master has to hold the address, write data, and control signal active
until the acknowledgement from the slave arrives. For pipelined
reads, the \sign{ack} signal can be split into two signals (available
in Avalon and OCP): one to accept the request and a second one to
signal the available data.

The master is blind about the status of the outstanding transaction
until it is finished. It could be possible that the slave informs the
master in how many cycles the result will be available. This
information can help in building deeply pipelined masters.

Only the AMBA AHB \cite{soc:amba} defines a different protocol. A
single cycle address phase followed by a variable length data phase.
The slave acknowledgement (HREADY) is only necessary in the data
phase avoiding the combinatorial path from address/command to the
acknowledgement. Overlapping address and data phase is allowed and
recommended for high performance. Compared to SimpCon, AMBA AHB
allows for single stage pipelining, whereas SimpCon makes multi-stage
pipelining possible using the ready counter (\sign{rdy\_cnt}). The
\sign{rdy\_cnt} signal defines the delay between the address and the
data on a read, signalled by the slave. Therefore, the pipeline depth
of the bus and the slaves is only limited by the bit width of
\sign{rdy\_cnt}.


Another issue with all interconnection standards is the single cycle
availability of read data at the slaves. Why not keep the read data
valid as long as there is no new read data available? This feature
would allow the master to be more flexible when to read the data. It
would allow issuing a read command and then continuing with other
instructions -- a feature known as data pre-fetching to hide long
latencies.

The last argument sounds contradictory to the first argument: provide
the transaction data at the master just for a single cycle, but
request the slave to hold the data for several cycles. However, it is
motivated by the need to free up the master, keep it \emph{moving},
and move the data hold (register) burden into the slave. As data
processing bottlenecks are usually found in the master devices, it
sounds natural to move as much work as possible to the slave devices
to free up the master.

Avalon, Wishbone, and OPB provide a single cycle latency access to
slaves due to the possibility of acknowledging a request in the same
cycle. However, this feature is a scaling issue for larger systems.
There is a combinatorial path from master address/command to address
decoding, slave decision on \sign{ack}, slave \sign{ack} multiplexing
back to the master and the master decision to hold address/command or
read the data and continue. Also, the slave output data multiplexer
is on a combinatorial path from the master address.

AMBA AHB and SimpCon avoid this scaling issue by requesting the
acknowledge in the cycle following the command. In SimpCon and AMBA,
the select for the read data multiplexer can be registered as the
read address is known at least one cycle before the data is
available. The later acknowledgement results in a minor drawback on
SimpCon and AMBA (nothing is for free): It is not possible to perform
a single cycle read or write without pipelining. A single
non-pipelined transaction takes two cycles without a wait state. However,
a single cycle read transaction is only possible for very simple
slaves. Most non-trivial slaves (e.g.\ memory interfaces) will not
allow a single cycle access anyway.

\subsection{Evaluation}

We compare the SimpCon interface with the AMBA and the Avalon
interface as two examples of common interconnection standards. As an
evaluation example, we interface an external asynchronous SRAM with a
tight timing. The system is clocked at 100~MHz and the access time
for the SRAM is 15~ns. Therefore, there are 5~ns available for
on-chip register to SRAM input and SRAM output to on-chip register
delays. As an SoC, we use a current low-cost FPGA (Cyclone EP1C6
\cite{AltCyc} and a Cyclone II).

The master is a Java processor (JOP \cite{jop:thesis,
jop:jnl:jsa2007}). The processor is configured with a 4~KB
instruction cache and a 512 byte on-chip stack cache. We run a
complete application benchmark on the different systems. The embedded
benchmark (\emph{Kfl} as described in \cite{jop:austrochip05}) is an
industrial control application already in production.


%For a different benchmark (a UDP/IP application with IP processing -
%lot of buffer access) the difference is larger.

\begin{table}
    \centering

    \begin{tabular}{rll}
        \toprule
        Performance (iterations/s) & Memory & Interconnect \\
        \midrule
        16,633 &  32 bit SRAM & SimpCon \\

        14,259 &  32 bit SRAM & AMBA \\

        14,015 &  32 bit SRAM & Avalon/PTF \\
        13,920 &  32 bit SRAM & Avalon/VHDL \\
        15,762 & 32 bit on-chip & Avalon\\

        14,760 &  16 bit SRAM & SimpCon \\

        11,322 &  16 bit SRAM & Avalon \\


         7,288  & 16 bit SDRAM & Avalon \\


%        Kfl & UDP/IP   &  Memory     & Interconnect \\
%        \midrule
%        16,633 & 6,537 & 32 bit SRAM & SimpCon \\
%
%        14,015 &       & 32 bit SRAM & Avalon/PTF \\
%        13,920 &       & 32 bit SRAM & Avalon/VHDL \\
%
%        14,760 & 5,716 & 16 bit SRAM & SimpCon \\
%
%        11,322 & 4,302 & 16 bit SRAM & Avalon \\
%        15,762 & --    & 32 bit on-chip & Avalon\\
%
%
%         7,288 & 2,677 & 16 bit SDRAM & Avalon \\


        \bottomrule

    \end{tabular}
    \caption{JOP performance with different interconnection types}
    \label{tab:perf:diff}

%         5,630 & 2,140 &  50 MHz & 16 bit SRAM (relaxed timing) & Avalon \\
%         6,243 & 2,389 &  50 MHz & 16 bit SRAM & Avalon \\

\end{table}

Table~\ref{tab:perf:diff} shows the performance numbers of this
JOP/SRAM interface on the embedded benchmark. It measures iterations
per second and therefore higher numbers are better. One iteration is
the execution of the main control loop of the \emph{Kfl} application.
For a 32 bit SRAM interface, we compare SimpCon against AMBA and
Avalon. SimpCon outperforms AMBA by 17\% and Avalon by
19\%\footnote{The performance figure reflects the execution time of
the whole application, not only the time spent in bus
transactions.} on a 32 bit SRAM.

The AMBA experiment uses the SRAM controller provided as part of
GRLIB \cite{grlib} by Gaisler Research. We used this existing slave
instead of writing our own to verify that the AMBA implementation on
JOP is correct. To provide a fair comparison with the single master
solutions for SimpCon and Avalon, the AMBA bus was configured without
an arbiter; JOP is connected directly to the AMBA memory slave. The
difference between the SimpCon and the AMBA performance can be
explained by two facts: (1) as with the Avalon interconnect, the master
learns that the request has completed only in the cycle in which the
data is available (no early notification as with \sign{rdy\_cnt}); (2)
the SRAM controller is conservative as it asserts \sign{HREADY} one
cycle later than the data is available in the read register
(\sign{HRDATA}). The second issue can be overcome by a better
implementation of the SRAM AMBA slave.


The Avalon experiment considers two versions: an SOPC Builder
generated interface (PTF) to the memory and a memory interface
written in VHDL. The SOPC Builder interface performs slightly better
than the VHDL version that generates the Avalon \sign{waitrequest}
signal. We assume that the SOPC Builder version uses fixed wait
states within the switch fabric.

We also implemented an Avalon interface to the single-cycle on-chip
memory. SimpCon with the 32 bit off-chip SRAM is even faster than
Avalon with the on-chip memory. Furthermore, we also
performed experiments with a 16 bit memory interface to the same
SRAM. With this smaller data width the pressure on the
interconnection and memory interface is higher. As a result the
difference between SimpCon and Avalon gets larger (30\%) on the 16
bit SRAM interface. To complete the picture we also measured the
performance with an SDRAM memory connected to the Avalon bus. We see
that the large latency of an SDRAM is a big performance issue for
the Java processor.

\section{Summary}

This document describes a simple (with respect to the definition and
implementation) and efficient SoC interconnect. The novel signal
\sign{rdy\_cnt} allows early signalling to the master of when read
data will be valid. This feature allows the master to restart a
stalled pipeline earlier to react to arriving data. Furthermore,
this feature also enables pipelined bus transactions with a minimal
effort on the master and the slave side.

We have compared SimpCon quantitatively with AMBA and Avalon, two
common interconnection definitions. The application benchmark shows
a performance advantage of SimpCon by 17\% over AMBA and 19\% over
Avalon interfaces to an SRAM.

SimpCon is used as the main interconnect for the Java processor JOP
in a single master, multiple slaves configuration. SimpCon is also
used to implement a shared memory chip-multiprocessor version of
JOP. Furthermore, in a research project on time-triggered
network-on-chip \cite{jop:ttnoc} SimpCon is used as the
\emph{socket} to this NoC.

The author thanks Kevin Jennings and Tommy Thorn for the interesting
discussions about SimpCon, Avalon, and on-chip interconnection in
general at the Usenet newsgroup \texttt{comp.arch.fpga}.
@


1.5
log
@clearification of simple read timing
@
text
@d45 16
a60 17
Figure~\ref{fig:sc:basic:rd} shows a basic read transaction for
a slave with one cycle latency. The acknowledge signals are
omitted from the figure. In the first cycle, the address phase,
the \sign{rd} signals the slave to start the read transaction.
The address is registered by the slave. During the following
cycle, the read phase\footnote{It has to be noted that the read
phase can be longer for devices with a high latency. For simple
on-chip IO devices the read phase can be omitted completely (0
cycles). In that case \sign{rdy\_cnt} will be zero in the cycle
following the address phase.}, the slave performs the read and
registers the data. Due to the register in the slave the data
is available in the third cycle, the result phase. To simplify
the master, \sign{rd\_data} stays valid till the next read
request response. It is therefore possible for a master to
issue a pre-fetch command early. When the pre-fetched data
arrives to early it is still valid when the master actually
wants to read it.
d88 1
a88 1
wether the signal is driven by the master or the slave.
d132 1
a132 1
Master addresses represent word addresses as offsets in the slaves
d137 2
a138 2
The number of bits for \sign{address} depend on the slaves address
range. For a single port slave \sign{address} can be omitted.
d149 1
a149 1
The \sign{rd} signal is asserted a single clock cycle to start a
d154 1
a154 1
The \sign{wr} signal is asserted a single clock cycle to start a
d166 1
a166 1
response to transaction initiated by the master.
d170 1
a170 1
The \sign{wr\_data} signals carry the result for a read transaction.
d172 3
a174 3
till a new read result is available. The signal is typically 32 bits
wide. Slaves that provide less than 32 bits should pad the upper
bits with 0.
d178 1
a178 1
The \sign{rdy\_cnt} signal provides the number of cycles till the
d180 5
a184 5
available or a write transaction has been finished. Values of 1 and
2 mean the the transaction will finish in at least 1 or 2 cycles.
The maximum value is 3 and means the the transaction will finish in
3 or \emph{more} cycles. Note that not all values have to be used in
a transaction. Each monotonic sequence of \sign{rdy\_cnt} values is
d213 5
a217 5
For many slaves, e.g.\ an SRAM interface with fixed wait states,
this information is available inside the slave. In the SimpCon
interface this information is communicated to the master through the
two bit ready counter (\sign{rdy\_cnt}). \sign{rdy\_cnt} signals the
number of cycles till the read data will be available or the write
d223 1
a223 1
To avoid too many signals at the interconnect \sign{rdy\_cnt} has a
d226 3
a228 3
As a result the master can only use the values 0, 1, and 2 to
release actions in its pipeline. If necessary an extension for a
longer pipeline is straightforward with a larger
d235 1
a235 1
Slaves, that don't know in advance how many wait states are needed
d238 2
a239 2
until the data is available and set it than directly to 0. The
master has to handle those situations. Practically this reduces the
d245 1
a245 1
three cycles for the read to be processed. In cycle 1 the read
d248 5
a252 6
three cycles (2--4) during which \sign{rdy\_cnt} gets decremented.
In cycle 4 the data is available inside the slave and gets
registered. It is available in cycle 5 for the master and
\sign{rdy\_cnt} is finally 0. Both, the \sign{rd\_data} and
\sign{rdy\_cnt} will keep their value till a new transaction is
requested.
d264 1
a264 1
be written and the write command are valid during cycle 1. The slave
d282 10
a291 11
with four clock cycles latency. Without any pipelining the next read
transaction will start in cycle 7 after the data from the former
read transaction is read by the master. The three bottom lines show
when new read transactions (only the \sign{rd} signal is shown,
address lines are omitted from the figure) can be started for
different pipeline levels. With pipeline level 1 a new transaction
can start in the same cycle when the former read data is available
(in this example in cycle 6). At pipeline level 2 a new transaction
(either read or write) can start when \sign{rdy\_cnt} is 1, for
pipeline level 2 the next transaction can start at a \sign{rdy\_cnt}
of 2.
d306 3
a308 3
To implement level 3 pipelining in the slave at least an additional
address register is needed. However, to use level 3 the master has
to issue the request in the same cycle as \sign{rdy\_cnt} goes to 2.
d311 3
a313 3
three or more cycles till the data is available and can therefore
not be used to trigger a new transaction. Extension to an even
deeper pipeline needs a wider \sign{rdy\_cnt}.
d324 1
a324 1
To add several slaves to a single master \sign{rd\_data} and
d326 1
a326 1
\sign{rd\_data} signals are already registered by the slaves a
d329 1
a329 1
but needed at most in the next cycle. Therefore it can be registered
d336 9
a344 9
SimpCon defines no signals for the communication between a master
and an arbiter. However, it is possible to build a multi master
system with SimpCon. The SimpCon interface can be used as
interconnect between the masters and the arbiter and the arbiter and
the slaves. In this case the arbiter acts as slave for the master
and as master for the peripheral devices. An example of an arbiter
for SimpCon, where JOP and a VGA controller are two masters for a
shared main memory, can be found in \cite{jop:dma}. The same arbiter
is also used to build a chip-multiprocessor version of JOP.
d369 2
a370 2
a Java processor. The processor is clocked with 100MHz and the main
memory consists of 15ns static RAMs. Therefore the minimum access
d374 1
a374 1
registers fit nice with the intention of SimpCon to use registers
d396 9
a404 9
of cycle~3 (15~ns after \sign{address}, \sign{ncs} and \sign{noe}
are stable) the data from the RAM is available and can be sampled
with the rising edge for cycle~4. The setup time for the read
register is short as the register can be placed in the IO cell. The
master reads the data in cycle~4 and starts a write transaction in
cycle~5. Address and data are again registered by the slave and are
available for the RAM at the beginning of cycle~6. To perform a
write in two cycles the \sign{nwr} signal is registered by a
negative triggered flip-flop.
d448 1
a448 1
Besides the SimpCon documentation some example VHDL files for slave
d450 2
a451 2
\url{http://www.opencores.org/projects.cgi/web/simpcon/overview}.
All components are also part of the standard JOP distribution.
d482 3
a484 2
    \item \code{sc\_arbiter.vhd} is a zero cycle latency, priority
    based SimpCon arbiter written by Christof Pitter \cite{jop:cmp}.
d506 1
a506 1
in the following section. For short: The available standards are
d513 3
a515 3
Several point-to-point and bus standards have been proposed over the
last years. The following section gives a brief overview of common
SoC interconnection standards.
d518 5
a522 5
is the interconnection definition from ARM. The specification
defines three different busses: Advanced High-performance Bus (AHB),
Advanced System Bus (ASB), and Advanced Peripheral Bus (APB). The
AHB is used to connect on-chip memory, cache, and external memory to
the processor. Peripheral devices are connected to the APB. A bridge
d531 4
a534 4
extension to AMBA. AXI introduces out-of-order transaction
completion with the help of a 4 bit transaction id tag. A ready
signal acknowledges the transaction start. The master has to hold
the transaction information (e.g.\ address) till the interconnect
d539 3
a541 3
several open-source IP cores. The Wishbone interface specification
is still in the tradition of microcomputer or backplane busses.
However, for a SoC interconnect, which is usually
d554 1
a554 1
it is only valid for a single cycle the data has to be registered by
d556 2
a557 2
the slave should keep the last valid data at its output even when
the Wishbone strobe signal (\emph{wb.stb}) is not assigned anymore.
d575 1
a575 1
Altera proprietary thus tying this specification to Altera FPGAs.
d595 1
a595 1
All SoC interconnection standards, that are widely in use, are still
d597 1
a597 1
the address and control signals till the slave provides the data or
d600 3
a602 3
the signals? Let the master move on after submitting the request in
a single cycle. Forcing the address and control valid for the
complete request disables any form of pipelined requests.
d615 2
a616 2
\cite{soc:ocp}\footnote{The signal names are different, but the
principle is the same for all mentioned busses.}. The master issues
d618 7
a624 7
the \sign{ack} in the same cycle. When the slave data is available
the acknowledge signal is set (\sign{ack} in cycle 3). The master
has to read the data and register them within the same clock cycle.
The master has to hold the address, write data and control signal
active till the acknowledgement from the slave. For pipelined read
the \sign{ack} signal can be split into two signals (available in
Avalon and OCP): one to accept the request and a second one to
d628 3
a630 3
until it is finished. It could be possible that the slave informs
the master in how many cycles the result will be available. This
information can help in building deeper pipelined masters.
d638 6
a643 6
allows for single stage pipelining, whereas SimpCon makes
multi-stage pipelining possible using the ready counter
(\sign{rdy\_cnt}). The \sign{rdy\_cnt} signal defines the delay
between the address and the data on a read, signalled by the slave.
Therefor, the pipeline depth of the bus and the slaves is only
limited by the bit width of \sign{rdy\_cnt}.
d650 1
a650 1
would allow to issue a read command and then continue with other
d654 8
a661 8
The last argument sounds contradictory to the first argument on
providing the transaction data at the master just for a single
cycle, but requesting the slave to hold the data for several cycles.
However, it is motivated to free up the master, keep it
\emph{moving}, and move the data hold (register) burden into the
slave. As data processing bottlenecks are usually found in the
master devices it sounds natural to move as much work as possible to
the slave devices to free up the master.
d663 2
a664 2
Avalon, Wishbone and OPB provide a single cycle latency access to
slaves due to the possibility to acknowledge a request in the same
d667 4
a670 4
decoding, slave decision on \sign{ack}, slave \sign{ack}
multiplexing back to the master and the master decision to hold
address/command or read the data and continue. Also the slave output
data multiplexer is on a combinatorial path from the master address.
d672 2
a673 2
AMBA AHB and SimpCon avoid this scaling issue by requesting the
acknowledge in the cycle following the command. In SimpCon and AMBA
d676 7
a682 7
available. The later acknowledge results in a minor drawback on
SimpCon and AMBA (nothing is for free): It is not possible to
perform a single cycle read or write without pipelining. A single,
non pipelined transaction takes two cycles without a wait state.
However, a single cycle read transaction is only possible for very
simple slaves. Most non-trivial slaves (e.g.\ memory interfaces)
will not allow a single cycle access anyway.
d687 4
a690 4
interface as two examples of common interconnection standards. As
evaluation example we interface an external asynchronous SRAM with a
tight timing. The system is clocked with 100 MHz and the access time
for the SRAM is 15 ns. Therefore, there are 5 ns available for
d692 1
a692 1
delays. As SoC we use an actual low-cost FPGA (Cyclone EP1C6
d696 3
a698 3
jop:jnl:jsa2007}). The processor is configured with 4~KB instruction
cache and 512 Byte on-chip stack cache. We run a complete
application benchmark on the different systems. The embedded
d757 5
a761 5
JOP/SRAM interface on the embedded benchmark. It measures
iterations/s and therefore higher numbers are better. One iteration
is the execution of the main control loop of the \emph{Kfl}
application. For a 32 bit SRAM interface we compare SimpCon against
AMBA and Avalon. SimpCon outperforms AMBA by 17\% and Avalon by
d769 11
a779 11
To provide a fair comparison between the single master solutions
with SimpCon and Avalon the AMBA bus was configured without an
arbiter. JOP is connected directly to the AMBA memory slave. The
difference between the SimpCon and the AMBA performance can be
explained by two facts: (1) as with the Avalon interconnect, the
master has the information when the slave request is ready at the
same cycle when the data is available (compared to the
\sign{rdy\_cnt} feature); (2) the SRAM controller is conservative as
it asserts \sign{HREADY} one cycle later than the data is available
in the read register (\sign{HRDATA}). The second issue can be
overcome by a better implementation of the SRAM AMBA slave.
d824 2
a825 2
discussions about SimpCon, Avalon and on-chip interconnection in
general at the usenet newsgroup \texttt{comp.arch.fpga}.
@


1.4
log
@Handbook alpha edition
@
text
@d45 17
a61 12
Figure~\ref{fig:sc:basic:rd} shows a basic read transaction for a
slave with one cycle latency. The acknowledge signals are omitted
from the figure. In the first cycle, the address phase, the
\sign{rd} signals the slave to start the read transaction. The
address is registered by the slave. During the following cycle, the
read phase, the slave performs the read and registers the data. Due
to the register in the slave the data is available in the third
cycle, the result phase. To simplify the master, \sign{rd\_data}
stays valid till the next read request response. It is therefore
possible for a master to issue a pre-fetch command early. When the
pre-fetched data arrives to early it is still valid when the master
actually wants to read it.
@


1.3
log
@some update
@
text
@d585 1
a585 1
the OCP International Partnership (www.ocpip.org).
@


1.2
log
@update SimpCon chapter with Austrochip paper content and VHDL file descriptions
@
text
@d440 2
a441 2
\\
\\
@


1.1
log
@SimpCon description
@
text
@d31 1
a31 1
\subsection{Feature}
d52 5
a56 2
cycle, the result phase. To simplify the master, the read data stays
valid till the next read request response.
d71 2
a72 2
the master by the slave with \sign{rdy\_cnt}. See section
\ref{sec:ack} and an example in Figure~\ref{fig:sc:wr:ws}.
d208 2
a209 1
For a lot of slaves, e.g.\ a SRAM interface with fixed wait states,
d212 2
a213 2
two bit signal \sign{rdy\_cnt}. \sign{rdy\_cnt} signals the number
of cycles till the read data will be available or the write
d223 6
a228 1
release actions in it's pipeline.
d231 8
a238 3
Slaves, that don't know in advance how many wait states are need for
the transaction can produce sequences that omit any of the numbers
3, 2, and 1. The master has to handle this situations.
d244 6
a249 5
three cycles (2--4) during which \sign{rdy\_cnt} is decremented. In
cycle 4 the data is available inside the slave and gets registered.
It is available in cycle 5 for the master and \sign{rdy\_cnt} is
finally 0. Both, the \sign{rd\_data} and \sign{rdy\_cnt} will keep
their value till a new transaction is requested.
d279 1
a279 1
with four cycles latency. Without any pipelining the next read
d282 8
a289 5
when new read transactions will be started for different pipeline
levels. With pipeline level 1 a new transaction can start in the
same cycle when the former read data is available (in this example
in cycle 6). Higher levels mean that the next read will start
earlier as shown for level 2 and 3.
d298 3
a300 3
Implementation of level 1 in the slave is trivial (just two more
transitions in the state machine). It is recommended to provide
level 1 at least for read transactions. Level 2 is a little bit more
d302 1
a302 1
needed.
d310 9
a318 1
not be used to trigger a new transaction.
d320 13
a332 1
\section{Multiple Master}
d339 4
a342 1
and as master for the peripheral devices.
d345 6
a350 7
queue $n-1$ requests in an arbiter for $n$ masters. However, for
this additional HW we get zero overheads for the bus request. The
master, which gets the bus will will start the slave transaction in
the same cycle.
\\
\\
TODO: add a timing diagram to explain this concept.
d375 6
a380 4
Figure~\ref{fig:sc:sram} shows the interface for a non-pipelined
read access followed by a write access. Four signals are driven by
the master and two signal by the slave. The lower half of the figure
shows the signals at the FPGA pins where the RAM is connected.
d389 1
a389 1
In cycle 1 the read transaction is started by the master and the
d391 12
a402 6
control signals \sign{ncs} and \sign{noe} during cycle1. Due to the
IO cell registers, the address and control signals are valid at the
FPGA pins very early in cycle 2. At the end of cycle 3 (15ns after
\sign{address}, \sign{ncs} and \sign{noe} are stable) the data from
the RAM is available and can be sampled with the rising edge for
cycle 4.
d404 1
a404 7
The master reads the data in cycle 4 and starts a write transaction
in cycle 5. Address and data are again registered from the slave and
are available for the RAM at the beginning of cycle 6. To perform a
write in two cycles the nwr signal is registered by a negative
triggered flip-flop.

In figure~\ref{fig:sc:sram:prd} we see a pipelined read from the RAM
d406 2
a407 1
read access time of the RAM we get the maximum bandwidth possible.
d424 7
d444 41
a484 1
\section{Status}
d487 9
a495 6
    \item First timing diagrams drawn
    \item SimpCon SRAM interface for JOP on Cyclone and Spartan-3 is
    available
    \item Project at opencores.org accepted
    \item Simple UART as SimpCon example
    \item IO in JOP changed to SimpCon (uart, cnt, usb)
d497 232
d730 7
a736 1
Next steps:
a737 4
\begin{itemize}
    \item Continue this document
    \item Provide Wishbone bridges
\end{itemize}
d739 84
a822 5
to clarify:
\begin{itemize}
    \item Use transaction or transfer in this document?
    \item Use address phase or better command cycle?
\end{itemize}
@