
Networks, Messages, and Interrupt Timing

A crossbar network has been added and link widths in the network and network interface can be set. Detailed message timing data can be obtained on a per-message basis and a formula is provided giving message latency in terms of network characteristics.



Crossbar Network

In addition to the k-ary n-cubes, a crossbar network (switch) can be simulated. A message arriving at a crossbar input is forwarded directly to its destination. Starting with the Proteus 3.6 implementation, each message incurs one wire delay and one switch delay. (See section Time Taken by Messages.) There is no blocking, but there is a queue associated with each crossbar output. The queue is multiported, so it can receive messages from multiple inputs simultaneously. The timing of message removal from the queue is the same as for the queues used in the other "exact" networks.

Proteus simulates a crossbar when it is built using the options shown below:

#define N_DIRECT
#undef N_INDIRECT
#define Nk 1
#define Ndim <number of processors>
#define NET_EXACT

The options are in `net.param'. These options can also be set by selecting an exact radix-1 k-ary n-cube network using the config program.



Setting Network Datapath Width

The width of datapath links can be set using the run-time-initialized variables `FlitsPerByte' and `NWItransPerByte'.

The simulated network transfers 1.0/FlitsPerByte bytes per clock cycle over network links. Put another way, a message of length l would require ceil( l * FlitsPerByte ) cycles to move past a point in the network (assuming no delays and assuming the switch could operate at full speed), where ceil(x) is the smallest integer greater than or equal to x.
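The ceil( l * FlitsPerByte ) expression above can be sketched in C as follows. This is only an illustration of the arithmetic, not code taken from Proteus: the function name link_cycles and its parameter names are assumptions, and the ceiling is computed by hand so that the fragment does not depend on the math library.

```c
#include <assert.h>

/* Cycles needed for a message of length_bytes to move past a point on a
   link, i.e. ceil(length_bytes * flits_per_byte).
   link_cycles is a hypothetical helper; flits_per_byte stands in for the
   Proteus variable FlitsPerByte. */
long link_cycles(long length_bytes, double flits_per_byte)
{
    double x;
    long n;

    assert(length_bytes >= 0);
    assert(flits_per_byte > 0.0);

    x = (double)length_bytes * flits_per_byte;
    n = (long)x;              /* truncates toward zero; x is non-negative */
    if ((double)n < x)
        n++;                  /* round up when there is a fractional part */
    return n;
}
```

For instance, with FlitsPerByte set to 0.25 (a link that moves four bytes per cycle), link_cycles(10, 0.25) evaluates to 3, since 2.5 is rounded up.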

The system transfers data from the network interface to queues in a network node at a rate of 1.0/NWItransPerByte bytes per clock cycle. (See section Time Taken by Messages for details.)

New Meaning of a send_ipi and send_ipiV Parameter

Starting with version L3.3, the length parameter in procedures `send_ipi' and `send_ipiV' refers to the size of a message in bytes. In previous versions the length was in flits.



Time Taken by Messages

The amount of time a message takes to arrive at its destination depends upon the message size, the distance to travel, the state of the interconnection network, and the type of network used. The formula described below can be used to estimate the amount of time that a message will take. This formula is used to validate the network configurations of Proteus L; timing is considered incorrect if the actual timing deviates from the timing computed using the formula by more than twenty cycles.

Not all Proteus configurations are validated; in particular the Alewife networks, the bus interconnection, and the send-receive message passing configurations are not presently validated. (See the files in `.../valid' for the code used to perform the validation tests.)

Let tLat be the time from when a `send_ipi' or `send_ipiV' is called to the time that the inter-processor interrupt is enqueued at the destination processor. Then

tLat = wIn + wOut + wNWI + distance * ( WIRE_DELAY + SWITCH_DELAY ) + tX

where wIn is the total waiting time in input queues, wOut is the total waiting time in output queues, wNWI is the total waiting time in the network interface queue, distance is the number of links on the path taken, WIRE_DELAY and SWITCH_DELAY are the like-named Proteus parameters, and tX is the time needed for the message to be copied from the network interface to its final destination.

Let tNW be the number of cycles needed to move the message past a point in an otherwise empty network. Then

tNW = length * FlitsPerByte + SWITCH_DELAY,

where length is the length of the message in bytes and FlitsPerByte is the like-named Proteus variable, indicating the number of flits per byte.

Similarly, let tNWI be the minimum number of cycles needed to move a message from the network interface to its final destination. Then

tNWI = length * NWItransPerByte ,

where NWItransPerByte is the like-named Proteus variable, indicating the number of cycles needed to move one byte in the network interface. (This is usually the reciprocal of the width, in bytes, of the datapath between the network interface and a node's memory.)

The time for a message, not necessarily one which has completely arrived at a node, to move from the network interface to a processor's memory is given by

tX = tNWI if wNWI > tNW - tNWI

tX = tNW - wNWI if wNWI <= tNW - tNWI .

Equivalently, tX = max( tNWI, tNW - wNWI ), so tX never falls below the minimum copy time tNWI.
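The latency estimate can be sketched in C as shown below. This is a minimal illustration under stated assumptions, not the validation code in `.../valid': the structure lat_inputs and the functions est_tnw, est_tnwi, est_tx, and est_tlat are hypothetical names, and only the quantities they hold come from the formulas in this section. The second case of tX is taken as tNW - wNWI, which keeps tX from falling below the minimum tNWI.

```c
#include <assert.h>

/* Inputs to the estimate. The structure and field names are hypothetical;
   the fields mirror the quantities named in the text. */
struct lat_inputs {
    long w_in;                 /* wIn: total wait in input queues, cycles   */
    long w_out;                /* wOut: total wait in output queues, cycles */
    long w_nwi;                /* wNWI: wait in network interface queue     */
    long distance;             /* number of links on the path taken         */
    long wire_delay;           /* WIRE_DELAY                                */
    long switch_delay;         /* SWITCH_DELAY                              */
    long length;               /* message length in bytes                   */
    double flits_per_byte;     /* FlitsPerByte                              */
    double nwi_trans_per_byte; /* NWItransPerByte                           */
};

/* tNW = length * FlitsPerByte + SWITCH_DELAY */
double est_tnw(const struct lat_inputs *p)
{
    return (double)p->length * p->flits_per_byte + (double)p->switch_delay;
}

/* tNWI = length * NWItransPerByte */
double est_tnwi(const struct lat_inputs *p)
{
    return (double)p->length * p->nwi_trans_per_byte;
}

/* tX: tNWI when the queue wait exceeds tNW - tNWI, else tNW - wNWI
   (assumed form of the second case, see the lead-in). */
double est_tx(const struct lat_inputs *p)
{
    double tnw = est_tnw(p);
    double tnwi = est_tnwi(p);

    if ((double)p->w_nwi > tnw - tnwi)
        return tnwi;
    return tnw - (double)p->w_nwi;
}

/* tLat = wIn + wOut + wNWI + distance * (WIRE_DELAY + SWITCH_DELAY) + tX */
double est_tlat(const struct lat_inputs *p)
{
    assert(p->distance >= 0 && p->length >= 0);
    return (double)(p->w_in + p->w_out + p->w_nwi)
         + (double)(p->distance * (p->wire_delay + p->switch_delay))
         + est_tx(p);
}
```

As an illustration with made-up values: for a 16-byte message with FlitsPerByte = 1.0, NWItransPerByte = 0.5, WIRE_DELAY = SWITCH_DELAY = 1, three links, and no queueing, the sketch gives tNW = 17, tNWI = 8, tX = 17, and tLat = 23 cycles. With wNWI = 20 instead, tX drops to 8 and tLat becomes 34 cycles.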

The parameters have the following interpretation. WIRE_DELAY is the propagation delay along a link. The link is pipelineable, so that multiple flits can be using a link simultaneously. Regardless of WIRE_DELAY's value, a link's bandwidth is one flit per cycle.

Parameter SWITCH_DELAY indicates the bandwidth and latency of a switch. A switch can transfer 1/SWITCH_DELAY flits per cycle. This interpretation may change in a future release. Such changes will not affect the timing of simulations in which SWITCH_DELAY=1; other settings of this parameter are deprecated. No anticipated changes will break code using the other timing parameters.

Some Network Details

Unlike Proteus 3, in Proteus L both the exact and modeled networks use wormhole routing. Queue sizes are infinite. Switches in a network are non-blocking. Messages within the network are transferred at a rate of 1/( SWITCH_DELAY * FlitsPerByte ) bytes per cycle. Messages are transferred between a processor's memory and the network interface at a rate of 1/NWItransPerByte bytes per cycle.

All messages, including protocol messages, arriving at a processor are enqueued in a single network-interface queue. Messages sent from a processor to itself do not use the network-interface queue. Instead, they are dispatched as soon as they are copied, at a rate of 1/NWItransPerByte bytes per cycle.

Messages sent using `send_ipi' and `send_ipiV' are copied to the appropriate output buffer, without delay, at a rate of 1/NWItransPerByte bytes per cycle. After copying, control returns to the caller. See files `net.exact.c', `net.model.c', and `ipi.c' for details.

Messages sent by the cache and memory controllers are also copied at a rate of 1/NWItransPerByte bytes per cycle. The cache and memory controllers share a connection to the network, and they do not block while a message is being dispatched. Therefore the copying of a message sent by the cache or memory controller may be delayed until other messages have been copied. See file `cache.net.c' for details.



Timing Messages

The timing of individual inter-processor interrupt messages, and the number of hops they take, can be obtained using the journey statistics feature.

Use

Message timing is enabled by defining macro `JOURNEY_STATS' in file `lsu.param'. Message statistics are collected for messages initiated by a call to `send_ipiV' in which the argc parameter is negative. The interrupt handler that receives such a message will find a MsgData structure appended to the sent data (argc will be increased by sizeof(MsgData)>>2). The items included in this structure are: time in the network input queues, time in the network output queues, time in network interface, number of hops taken, time message started, the time the message arrived at its destination, and time that the interrupt was issued. Note that for the k-ary n-cube networks used by Proteus, time in the input queue will be zero.
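The argc adjustment described above can be illustrated with the sketch below. The real layout of MsgData is not given in this section, so struct msg_data_sketch is a purely hypothetical stand-in: it assumes one 32-bit field per listed item, in the order listed, and every field and function name is invented for illustration. Consult the Proteus L sources and `genTest.ca' for the actual definition.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for MsgData: one 32-bit field per item listed in
   the text. The real structure may differ in names, types, and order. */
struct msg_data_sketch {
    int32_t in_queue_time;   /* time in the network input queues   */
    int32_t out_queue_time;  /* time in the network output queues  */
    int32_t nwi_time;        /* time in the network interface      */
    int32_t hops;            /* number of hops taken               */
    int32_t start_time;      /* time the message started           */
    int32_t arrive_time;     /* time it arrived at the destination */
    int32_t interrupt_time;  /* time the interrupt was issued      */
};

/* Number of 4-byte words appended to the sent data: sizeof(...) >> 2. */
int journey_extra_words(void)
{
    return (int)(sizeof(struct msg_data_sketch) >> 2);
}

/* argc seen by the handler, given the number of data words that would be
   delivered without journey statistics (hypothetical helper). */
int handler_argc(int data_words)
{
    assert(data_words >= 0);
    return data_words + journey_extra_words();
}

/* Sum of the three queueing components, corresponding to
   wIn + wOut + wNWI in the latency formula. */
int32_t queue_wait_total(const struct msg_data_sketch *m)
{
    return m->in_queue_time + m->out_queue_time + m->nwi_time;
}
```

Under these assumptions the structure occupies 28 bytes, so seven words are appended and a handler that would otherwise see an argc of 2 sees 9.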

Definitions

Macro: JOURNEY_STATS

Switch for turning on collection of message timing data. This must be placed in `lsu.param' and can also be set using the `config' program.

Structure: MsgData

Used for statistics collected from a single message.

Example

See file `genTest.ca' for an example of the use of journey statistics.