A crossbar network has been added and link widths in the network and network interface can be set. Detailed message timing data can be obtained on a per-message basis and a formula is provided giving message latency in terms of network characteristics.
In addition to the k-ary n-cubes, a crossbar network (switch) can be simulated. A message arriving at an input to a crossbar is forwarded directly to its destination. Starting with the Proteus 3.6 implementation, each message incurs one wire delay and one switch delay. (See section Time Taken by Messages.) There is no blocking, but there is a queue associated with each crossbar output. The queue is multiported and so can receive messages from multiple inputs simultaneously. The timing of message removal from the queue is the same as for the queues used in the other "exact" networks.
Proteus simulates a crossbar when it is built using the options shown below:
    #define N_DIRECT
    #undef N_INDIRECT
    #define Nk 1
    #define Ndim <number of processors>
    #define NET_EXACT
The options are in `net.param'. These options can also be set by selecting an exact radix-1 k-ary n-cube network using the config program.
The width of datapath links can be set using run-time initialized variables `FlitsPerByte' and `NWItransPerByte'.
The simulated network transfers 1.0/FlitsPerByte bytes per clock cycle over network links. Put another way, a message of length l would require ceil( l * FlitsPerByte ) cycles to move past a point in the network (assuming no delays and assuming the switch could operate at full speed), where ceil(x) is the smallest integer greater than or equal to x.
The system transfers data from the network interface to queues in a network node at a rate of 1.0/NWItransPerByte bytes per clock cycle. (See section Time Taken by Messages for details.)
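The link-width arithmetic above can be sketched in C as follows. This is an illustration only, not code from Proteus: the function name `link_cycles' is hypothetical, and treating `FlitsPerByte' as a double is an assumption suggested by the "1.0/FlitsPerByte" notation.

```c
#include <assert.h>

/* Hypothetical helper, not part of Proteus.
 * Returns ceil(length_bytes * flits_per_byte): the cycles a message
 * needs to move past a point in the network, assuming no delays and
 * a switch running at full speed. */
long link_cycles(long length_bytes, double flits_per_byte)
{
    assert(length_bytes >= 0 && flits_per_byte > 0.0);

    double exact = (double)length_bytes * flits_per_byte;
    long cycles = (long)exact;      /* truncates toward zero */

    if ((double)cycles < exact)     /* round up any fractional cycle */
        cycles++;
    return cycles;
}
```

For example, with FlitsPerByte set to 0.25 (four bytes per flit), a 10-byte message occupies a link for 3 cycles, while with FlitsPerByte set to 2.0 a 3-byte message occupies it for 6 cycles.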
Starting with version L3.3, the length parameter in procedures `send_ipi' and `send_ipiV' refers to the size of a message in bytes. In previous versions the length was in flits.
The amount of time a message takes to arrive at its destination depends upon the message size, the distance to travel, the state of the interconnection network, and the type of network used. The formula described below can be used to estimate the amount of time that a message will take. This formula is used to validate the network configurations of Proteus L; timing is considered incorrect if the actual timing deviates from the timing computed using the formula by more than twenty cycles.
Not all Proteus configurations are validated; in particular the Alewife networks, the bus interconnection, and the send-receive message passing configurations are not presently validated. (See the files in `.../valid' for the code used to perform the validation tests.)
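The acceptance criterion described above can be sketched as a simple comparison. This is not the code found in `.../valid'; the names `VALIDATION_TOLERANCE_CYCLES' and `timing_within_tolerance' are hypothetical and are used here only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical constant: the twenty-cycle tolerance stated in the text. */
#define VALIDATION_TOLERANCE_CYCLES 20

/* Hypothetical helper, not part of Proteus.
 * Returns 1 if the measured latency is within `tolerance` cycles of the
 * latency predicted by the formula, and 0 otherwise. */
int timing_within_tolerance(long actual, long predicted, long tolerance)
{
    assert(tolerance >= 0);
    return labs(actual - predicted) <= tolerance;
}
```

A deviation of exactly twenty cycles passes under this reading, since the text counts timing as incorrect only when it deviates by more than twenty cycles.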
Let tLat be the time from when `send_ipi' or `send_ipiV' is called to the time that the inter-processor interrupt is enqueued at the destination processor. Then

    tLat = wIn + wOut + wNWI + distance * ( WIRE_DELAY + SWITCH_DELAY ) + tX

where wIn is the total waiting time in input queues, wOut is the total waiting time in output queues, wNWI is the total waiting time in the network interface queue, distance is the number of links on the path taken, WIRE_DELAY and SWITCH_DELAY are the like-named Proteus parameters, and tX is the time needed for the message to be copied from the network interface to its final destination.
Let tNW be the number of cycles needed to move the message past a point in an otherwise empty network. Then

    tNW = length * FlitsPerByte + SWITCH_DELAY

where length is the length of the message in bytes and FlitsPerByte is the like-named Proteus variable, indicating the number of flits per byte.
Similarly, let tNWI be the minimum number of cycles needed to move a message from the network interface to its final destination. Then

    tNWI = length * NWItransPerByte

where NWItransPerByte is the like-named Proteus variable, indicating the number of cycles needed to move one byte in the network interface. (Usually, this is one over the width of the datapath between the network interface and a node's memory.)
The time for a message, not necessarily one which has completely arrived at a node, to move from the network interface to a processor's memory is given by

    tX = tNWI           if wNWI >  tNW - tNWI
    tX = tNWI - wNWI    if wNWI <= tNW - tNWI
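The formulas for tLat, tNW, tNWI, and tX can be combined into one small C routine, shown here as a sketch. The struct `LatencyInputs' and the function `estimate_latency' are hypothetical names, not part of Proteus, and using doubles for the waiting times and per-byte rates is an assumption.

```c
#include <assert.h>

/* Hypothetical container for the quantities used by the formula. */
typedef struct {
    double w_in;               /* wIn: total wait in input queues */
    double w_out;              /* wOut: total wait in output queues */
    double w_nwi;              /* wNWI: wait in network interface queue */
    int    distance;           /* number of links on the path taken */
    int    wire_delay;         /* WIRE_DELAY */
    int    switch_delay;       /* SWITCH_DELAY */
    double length;             /* message length in bytes */
    double flits_per_byte;     /* FlitsPerByte */
    double nwi_trans_per_byte; /* NWItransPerByte */
} LatencyInputs;

/* Hypothetical helper: evaluates tLat exactly as the formulas are stated. */
double estimate_latency(const LatencyInputs *in)
{
    assert(in != 0 && in->distance >= 0 && in->length >= 0.0);

    /* tNW = length * FlitsPerByte + SWITCH_DELAY */
    double t_nw = in->length * in->flits_per_byte + in->switch_delay;

    /* tNWI = length * NWItransPerByte */
    double t_nwi = in->length * in->nwi_trans_per_byte;

    /* tX = tNWI if wNWI > tNW - tNWI, else tNWI - wNWI */
    double t_x = (in->w_nwi > t_nw - t_nwi) ? t_nwi : t_nwi - in->w_nwi;

    /* tLat = wIn + wOut + wNWI
     *        + distance * (WIRE_DELAY + SWITCH_DELAY) + tX */
    return in->w_in + in->w_out + in->w_nwi
         + in->distance * (double)(in->wire_delay + in->switch_delay)
         + t_x;
}
```

As a worked case, take a 16-byte message with FlitsPerByte = 1, NWItransPerByte = 0.5, WIRE_DELAY = SWITCH_DELAY = 1, a path of 3 links, and no queueing. Then tNW = 17, tNWI = 8, tX = 8, and tLat = 3 * 2 + 8 = 14 cycles.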
The parameters have the following interpretation.
WIRE_DELAY is the propagation delay along a link. The link is pipelineable, so that multiple flits can be using a link simultaneously. Regardless of the value of WIRE_DELAY, a link's bandwidth is one flit per cycle.
Parameter SWITCH_DELAY indicates the bandwidth and latency of a switch. A switch can transfer 1/SWITCH_DELAY flits per cycle. This interpretation may change in a future release. Changes will not affect the timing of simulations in which SWITCH_DELAY=1; other settings of this parameter are deprecated. No anticipated changes will break code using the other timing parameters.
Unlike Proteus 3, in Proteus L both the exact and modeled networks use wormhole routing. Queue sizes are infinite. Switches in a network are non-blocking. Messages within the network are transferred at a rate of 1/( SWITCH_DELAY * FlitsPerByte ) bytes per cycle. Messages are transferred between a processor's memory and the network interface at a rate of 1/NWItransPerByte bytes per cycle.
All messages, including protocol messages, arriving at a processor are enqueued in a single network-interface queue. Messages sent from a processor to itself do not use the network-interface queue. Instead, they are dispatched as soon as they are copied, at a rate of 1/NWItransPerByte bytes per cycle.
Messages sent using `send_ipi' and `send_ipiV' are copied to the appropriate output buffer, without delay, at a rate of 1/NWItransPerByte bytes per cycle. After copying, control returns to the caller. See files `net.exact.c', `net.model.c', and `ipi.c' for details.
Messages sent by the cache and memory controllers are also copied at a rate of 1/NWItransPerByte bytes per cycle. The cache and memory controllers share a connection to the network; they also do not block while a message is being dispatched. Therefore the copying of messages sent by the cache and memory controllers may be delayed until other messages have been copied. See file `cache.net.c' for details.
Timing and number of hops used by individual inter-processor interrupt messages can be obtained using the journey statistics feature.
Message timing is enabled by defining macro `JOURNEY_STATS' in file `lsu.param'. Message statistics are collected for messages initiated by a call to `send_ipiV' in which the argc parameter is negative. The interrupt handler that receives such a message will find a MsgData structure appended to the sent data (argc will be increased by sizeof(MsgData)>>2). The items included in this structure are: time in the network input queues, time in the network output queues, time in the network interface, number of hops taken, time the message started, time the message arrived at its destination, and time that the interrupt was issued. Note that for the k-ary n-cube networks used by Proteus, time in the input queue will be zero.
`JOURNEY_STATS': Switch for turning on collection of message timing data. This must be placed in `lsu.param' and can also be set using the `config' program.
`MsgData': Structure used for statistics collected from a single message. One member records the time at which the call to `send_ipiV' for this message started. Member `destoTime' is the time that the message arrived at the network interface queue at the destination. Member `issueTime' is the time at which the interrupt request was enqueued at the destination.
See file `genTest.ca' for an example of the function's use.