A crossbar network has been added and link widths in the network and network interface can be set. Detailed message timing data can be obtained on a per-message basis and a formula is provided giving message latency in terms of network characteristics.
In addition to the k-ary n-cubes, a crossbar network (switch) can be simulated. A message arriving at an input to a crossbar is forwarded directly to its destination. Starting with the Proteus 3.6 implementation, each message suffers one wire delay and one switch delay. (See section Time Taken by Messages.) There is no blocking, but there is a queue associated with each crossbar output. The queue is multiported and so can receive messages from multiple inputs simultaneously. Messages are removed from the queue with the same timing as in the queues of the other "exact" networks.
Proteus simulates a crossbar when it is built using the options shown below:
    #define N_DIRECT
    #undef N_INDIRECT
    #define Nk 1
    #define Ndim <number of processors>
    #define NET_EXACT
These options are set in file `net.param'. They can also be set by selecting an exact radix-1 k-ary n-cube network with the `config' program.
The widths of the datapath links can be set with the run-time-initialized variables `FlitsPerByte' and `NWItransPerByte'.
The simulated network transfers 1.0/FlitsPerByte bytes per clock cycle over network links. Put another way, a message of length l bytes would require ceil( l * FlitsPerByte ) cycles to move past a point in the network (assuming no delays and assuming the switch could operate at full speed), where ceil(x) is the smallest integer greater than or equal to x. The system transfers data from the network interface to queues in a network node at a rate of 1.0/NWItransPerByte bytes per clock cycle. (See section Time Taken by Messages for details.)
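The link-width arithmetic above can be sketched in C as follows. This is a minimal illustration, not Proteus code: the helper names are hypothetical, and FlitsPerByte and NWItransPerByte are assumed here to be floating-point values passed in as parameters.

```c
/* Smallest integer >= x, for x >= 0 (avoids linking libm). */
static long ceil_nonneg(double x)
{
    long i = (long)x;                   /* truncates toward zero */
    return ((double)i < x) ? i + 1 : i;
}

/* Hypothetical helper: cycles for a message of len_bytes bytes to move
 * past one point in an idle network, ceil(len_bytes * FlitsPerByte). */
long link_cycles(long len_bytes, double flits_per_byte)
{
    return ceil_nonneg((double)len_bytes * flits_per_byte);
}

/* Hypothetical helper: bytes per cycle over a network link. */
double link_bytes_per_cycle(double flits_per_byte)
{
    return 1.0 / flits_per_byte;
}

/* Hypothetical helper: bytes per cycle from the network interface
 * into a node's queues. */
double nwi_bytes_per_cycle(double nwi_trans_per_byte)
{
    return 1.0 / nwi_trans_per_byte;
}
```

For example, with FlitsPerByte = 0.25 a 10-byte message needs ceil(2.5) = 3 cycles to pass a point.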
Starting with version L3.3, the length parameter in procedures `send_ipi' and `send_ipiV' refers to the size of a message in bytes. In previous versions the length was in flits.
The amount of time a message takes to arrive at its destination depends on the message size, the distance to travel, the state of the interconnection network, and the type of network used. The formula described below can be used to estimate how long a message will take. This formula is used to validate the network configurations of Proteus L: timing is considered incorrect if the actual timing deviates from the timing computed using the formula by more than twenty cycles.
Not all Proteus configurations are validated; in particular the Alewife networks, the bus interconnection, and the send-receive message passing configurations are not presently validated. (See the files in `.../valid' for the code used to perform the validation tests.)
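The twenty-cycle acceptance rule can be expressed as a one-line check. This is a hypothetical helper for illustration; the actual validation code lives in the `.../valid' files.

```c
/* Maximum allowed deviation, in cycles, between simulated and predicted
 * message latency. */
#define TIMING_TOLERANCE 20L

/* Hypothetical helper: returns 1 if the simulated latency is acceptable,
 * i.e. it deviates from the formula's prediction by at most 20 cycles. */
int timing_within_tolerance(long actual, long predicted)
{
    long diff = actual - predicted;
    if (diff < 0)
        diff = -diff;
    return diff <= TIMING_TOLERANCE;
}
```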
Let tLat be the time from when `send_ipi' or `send_ipiV' is called to the time that the inter-processor interrupt is enqueued at the destination processor. Then

    tLat = wIn + wOut + wNWI + distance * ( WIRE_DELAY + SWITCH_DELAY ) + tX

where wIn is the total waiting time in input queues, wOut is the total waiting time in output queues, wNWI is the total waiting time in the network interface queue, distance is the number of links on the path taken, WIRE_DELAY and SWITCH_DELAY are the like-named Proteus parameters, and tX is the time needed for the message to be copied from the network interface to its final destination.
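The latency formula is a plain sum, sketched below in C. The struct and function names are hypothetical, all quantities are cycle counts, and wire_delay and switch_delay stand in for WIRE_DELAY and SWITCH_DELAY.

```c
/* Hypothetical container for the per-message terms of the formula. */
typedef struct {
    long w_in;      /* wIn: total wait in input queues */
    long w_out;     /* wOut: total wait in output queues */
    long w_nwi;     /* wNWI: total wait in the network interface queue */
    long distance;  /* number of links on the path taken */
    long t_x;       /* tX: copy time from network interface to destination */
} LatencyTerms;

/* tLat = wIn + wOut + wNWI + distance * (WIRE_DELAY + SWITCH_DELAY) + tX */
long t_lat(const LatencyTerms *m, long wire_delay, long switch_delay)
{
    return m->w_in + m->w_out + m->w_nwi
         + m->distance * (wire_delay + switch_delay)
         + m->t_x;
}
```

For the crossbar, where a message suffers one wire delay and one switch delay, distance is 1.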
Let tNW be the number of cycles needed to move the message past a point in an otherwise empty network. Then

    tNW = length * FlitsPerByte + SWITCH_DELAY,

where length is the length of the message in bytes and FlitsPerByte is the like-named Proteus variable, indicating the number of flits per byte.
Let tNWI be the minimum number of cycles needed to move a message from the network interface to its final destination. Then

    tNWI = length * NWItransPerByte,

where NWItransPerByte is the like-named Proteus variable, indicating the number of cycles needed to move one byte in the network interface. (Usually, this is one over the width of the datapath between the network interface and a node's memory.)
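The two transfer times can be sketched as C helpers. The function names are hypothetical, and the Proteus parameters are assumed here to be floating-point values.

```c
/* tNW = length * FlitsPerByte + SWITCH_DELAY:
 * cycles to move a message past a point in an otherwise empty network. */
double t_nw(long length, double flits_per_byte, double switch_delay)
{
    return (double)length * flits_per_byte + switch_delay;
}

/* tNWI = length * NWItransPerByte:
 * minimum cycles to copy a message out of the network interface. */
double t_nwi(long length, double nwi_trans_per_byte)
{
    return (double)length * nwi_trans_per_byte;
}
```

For a 32-byte message with FlitsPerByte = 0.5, SWITCH_DELAY = 2 and NWItransPerByte = 0.25, these give tNW = 18 and tNWI = 8.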
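The time for a message, not necessarily one which has completely arrived at a node, to move from the network interface to a processor's memory is given by

    tX = tNWI          if wNWI >  tNW - tNWI
    tX = tNW - wNWI    if wNWI <= tNW - tNWI

That is, tX = max( tNWI, tNW - wNWI ). If a copy that starts after waiting wNWI cycles would end after the last byte arrives (wNWI + tNWI > tNW), it takes the full tNWI cycles; otherwise the copy cannot finish until the last byte has arrived, tNW - wNWI cycles after it starts.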
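A sketch of the copy-time rule in C, using a hypothetical helper name and floating-point cycle counts. It takes the second branch as tNW - wNWI, because the copy cannot finish before the last byte arrives; with that form the two branches agree at wNWI = tNW - tNWI, and the rule is equivalent to max(tNWI, tNW - wNWI).

```c
/* Hypothetical helper: copy time tX from the network interface to memory,
 * for a copy that may begin before the message has fully arrived. */
double t_x(double t_nw, double t_nwi, double w_nwi)
{
    if (w_nwi > t_nw - t_nwi)
        return t_nwi;          /* copy rate is the limit: full tNWI cycles */
    return t_nw - w_nwi;       /* copy waits for the last byte to arrive */
}
```

With tNW = 18 and tNWI = 8, a message that does not wait has tX = 18, while one that waits 10 or more cycles has tX = 8.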
The parameters have the following interpretation.
WIRE_DELAY is the propagation delay along a link. The link is pipelined, so multiple flits can occupy a link simultaneously. Regardless of WIRE_DELAY's value, a link's bandwidth is one flit per cycle.
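The effect of pipelining can be illustrated with a simple model, assuming (for illustration only, not as Proteus's exact accounting) that flit i, counting from 0, enters the link at cycle i and leaves WIRE_DELAY cycles later. The function names are hypothetical.

```c
/* Illustrative model: cycle at which the last of `flits` flits (flits >= 1)
 * leaves a pipelined link. One flit enters per cycle, whatever wire_delay is. */
long pipelined_link_done(long flits, long wire_delay)
{
    return (flits - 1) + wire_delay;
}

/* For contrast: a link that carries only one flit at a time end to end. */
long unpipelined_link_done(long flits, long wire_delay)
{
    return flits * wire_delay;
}
```

Under this model an 8-flit message over a link with WIRE_DELAY = 5 finishes at cycle 12 rather than cycle 40: a longer wire adds latency once, not once per flit.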
SWITCH_DELAY indicates the bandwidth and latency of a switch. A switch can transfer 1/SWITCH_DELAY flits per cycle. This interpretation may change in a future release. Changes will not affect the timing of simulations in which settings of this parameter are deprecated. No anticipated changes will break code using the other timing parameters.
Unlike Proteus 3, in Proteus L both the exact and the modeled networks use wormhole routing. Queue sizes are infinite, and switches in a network are non-blocking. Messages within the network are transferred at a rate of 1/( SWITCH_DELAY * FlitsPerByte ) bytes per cycle.
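Since a switch passes 1/SWITCH_DELAY flits per cycle and each byte is FlitsPerByte flits, the in-network byte rate follows directly. A sketch with hypothetical helper names, treating the parameters as floating-point values:

```c
/* In-network byte rate: 1 / (SWITCH_DELAY * FlitsPerByte) bytes per cycle. */
double network_bytes_per_cycle(double switch_delay, double flits_per_byte)
{
    return 1.0 / (switch_delay * flits_per_byte);
}

/* Cycles for a message of len_bytes bytes to pass through a switch. */
double switch_transfer_cycles(long len_bytes, double switch_delay,
                              double flits_per_byte)
{
    double flits = (double)len_bytes * flits_per_byte;  /* size in flits */
    return flits * switch_delay;                        /* SWITCH_DELAY cycles per flit */
}
```

With SWITCH_DELAY = 2 and FlitsPerByte = 0.5 the network moves 1 byte per cycle, so a 32-byte message takes 32 cycles to pass a switch.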
Messages are transferred between a processor's memory and the network interface at a rate of 1/NWItransPerByte bytes per cycle.
All messages, including protocol messages, arriving at a processor are enqueued in a single network-interface queue. Messages sent from a processor to itself do not use the network-interface queue. Instead, they are dispatched as soon as they are copied, at a rate of 1/NWItransPerByte bytes per cycle.
Messages sent using `send_ipi' and `send_ipiV' are copied to the appropriate output buffer, without delay, at a rate of 1/NWItransPerByte bytes per cycle. After copying, control returns to the caller. See files `net.exact.c', `net.model.c', and `ipi.c' for details.
Messages sent by the cache and memory controllers are also copied at a rate of 1/NWItransPerByte bytes per cycle. The cache and memory controllers share a connection to the network, and they also do not block while a message is being dispatched. Therefore, the copying of a message sent by the cache or memory controller may be delayed until other messages have been copied. See file `cache.net.c' for details.
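The delay caused by the shared connection can be illustrated with a simple first-come, first-served model. This is a hypothetical sketch, not the logic in `cache.net.c': it assumes messages are copied one at a time, in submission order, at 1/NWItransPerByte bytes per cycle, so a message cannot start copying until the previous copy has finished.

```c
#include <stddef.h>

/* Hypothetical FIFO model of the connection shared by the cache and memory
 * controllers. ready[i] is when message i is ready to be copied, len[i] its
 * length in bytes. Writes each message's copy-completion time to done[i]. */
void shared_copy_schedule(size_t n, const double ready[], const long len[],
                          double nwi_trans_per_byte, double done[])
{
    double busy_until = 0.0;  /* time at which the shared connection frees up */
    for (size_t i = 0; i < n; i++) {
        /* Start at the later of "message ready" and "previous copy done". */
        double start = ready[i] > busy_until ? ready[i] : busy_until;
        busy_until = start + (double)len[i] * nwi_trans_per_byte;
        done[i] = busy_until;
    }
}
```

In this model, with NWItransPerByte = 0.5, a 16-byte message ready at cycle 0 occupies the connection until cycle 8, so an 8-byte message ready at cycle 2 starts at cycle 8 and finishes at cycle 12.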
Timing and number of hops used by individual inter-processor interrupt messages can be obtained using the journey statistics feature.
Message timing is enabled by defining the macro `JOURNEY_STATS' in file `lsu.param'. Message statistics are collected for messages initiated by a call to `send_ipiV' in which the argc parameter is negative. The interrupt handler that receives such a message will find a MsgData structure appended to the sent data (argc will be increased by sizeof(MsgData)>>2). The items included in this structure are: time in the network input queues, time in the network output queues, time in the network interface, number of hops taken, time the message started, time the message arrived at its destination, and time the interrupt was issued. Note that for the k-ary n-cube networks used by Proteus, time in the input queue will be
Switch for turning on collection of message timing data. This must be placed in `lsu.param' and can also be set using the `config' program.
Used for statistics collected from a single message.
`send_ipiV' for this message started. Member `destoTime' is the time that the message arrived at the network interface queue at the destination. Member `issueTime' is the time at which the interrupt request was enqueued at the destination.
See file `genTest.ca' for an example of the function's use.
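The argc adjustment sizeof(MsgData)>>2 implies that argc counts 4-byte words, with the MsgData words placed after the sender's data. The sketch below shows how a receiver could locate the appended structure under that reading. It is hypothetical throughout: `ExampleMsgData' is a stand-in that shows only the two members named above (the real MsgData layout is defined by Proteus), the argument words are assumed to be 4-byte `unsigned int' values, and the helper names are invented for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for MsgData; only the two members named in this
 * section are shown. The real structure has more members. */
typedef struct {
    long destoTime;  /* arrival at the destination's network-interface queue */
    long issueTime;  /* interrupt request enqueued at the destination */
} ExampleMsgData;

/* Number of words the sender supplied: argc minus the appended words
 * (assumes 4-byte words, matching the >>2 adjustment). */
int example_user_argc(int argc)
{
    return argc - (int)(sizeof(ExampleMsgData) >> 2);
}

/* Copy the appended structure out of the argument words; memcpy avoids
 * alignment problems when reinterpreting the word array. */
ExampleMsgData example_extract_msgdata(int argc, const unsigned int argv[])
{
    ExampleMsgData d;
    memcpy(&d, &argv[example_user_argc(argc)], sizeof d);
    return d;
}
```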