Node:Collected Statistics, Next: Building, Prev: Statistics Collection, Up: Top, Contents, Top

Collected Statistics

Proteus automatically collects some additional statistics. It also generates strings describing its configuration; these are not statistics, strictly speaking, but user code can use them to label output. Histometrics are collected for network and memory performance parameters. (See section Histogram Metrics.) The collection of some of these can be switched on and off. Several new graphs showing network and memory-system performance are displayed by stats.

Configuration Strings: Configuration description and comment strings.
Pre-Defined Histometrics: Automatically collected histogram data.
Utilization Statistics: Pausing certain statistics collection.
New Stats: New graphs displayed by stats.

Node:Configuration Strings, Next: Pre-Defined Histometrics, Prev: Collected Statistics, Up: Collected Statistics, Contents, Top

Automatically Generated Configuration Strings and User Comment Strings

Proteus automatically generates several strings describing the system configuration, including the type of network and memory system. If a simulation title is not provided, the strings are used to construct one. Some of the strings are also printed at the top of a run. User code can also use the strings, for example to label output.

Each string comes in two sizes: a normal size, usually no more than 80 characters; and a short size, usually less than 20 characters.

Proteus can automatically display user comment strings at the beginning of a run. The strings are specified by RTI variables `sim_comment_1', `sim_comment_2', and `sim_comment_3'.

variable: char* mem_config_str
variable: char* mem_config_str_short: Describes the amount of memory and size of the cache directories. Short string format is (2^XB,DY), where X is the number of bits of address space per module and Y is the number of entries in the cache directory, if any.

variable: char* cache_config_str
variable: char* cache_config_str_short: Describes the type of cache. Short string format is (Xw,2^Y,2^ZL,W,LV), where X is the cache associativity (number of lines per set), 2^Y is the number of sets, 2^Z is the number of bytes per line, W is the type of replacement policy used: `pr' for part-random, `fr' for full-random, or `DR' for deterministic replacement, and V is the latency of cache operations in cycles.
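
As an illustration of the short cache format described above, the following sketch decodes such a string and derives the cache capacity from it. It assumes that the literal characters of the pattern appear in the string exactly as written; the struct, the function names, and the sample string used in the usage note are hypothetical and are not part of Proteus.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of a short cache configuration string (hypothetical struct). */
struct cache_short_cfg {
    int  ways;       /* X: associativity, lines per set          */
    int  log2_sets;  /* Y: the cache has 2^Y sets                */
    int  log2_line;  /* Z: each line holds 2^Z bytes             */
    char policy[3];  /* W: "pr", "fr", or "DR"                   */
    int  latency;    /* V: latency of cache operations in cycles */
};

/* Decode a string of the form (Xw,2^Y,2^ZL,W,LV).
   ASSUMPTION: the literal characters appear exactly as in the pattern.
   Returns 1 if all five fields were read and are plausible, else 0. */
int parse_cache_short(const char *s, struct cache_short_cfg *out)
{
    int fields;

    assert(s != NULL && out != NULL);
    fields = sscanf(s, "(%dw,2^%d,2^%dL,%2[A-Za-z],L%d)",
                    &out->ways, &out->log2_sets, &out->log2_line,
                    out->policy, &out->latency);
    if (fields != 5)
        return 0;
    return out->ways > 0 && out->log2_sets >= 0 && out->log2_line >= 0;
}

/* Data capacity in bytes: ways * sets * bytes per line. */
long cache_bytes(const struct cache_short_cfg *c)
{
    assert(c != NULL);
    return (long)c->ways << (c->log2_sets + c->log2_line);
}
```

Under these assumptions, a made-up string such as (4w,2^8,2^5L,pr,L2) would decode to a 4-way cache with 256 sets of 32-byte lines, for a capacity of 4 * 256 * 32 = 32768 bytes.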

variable: char* nw_config_str
variable: char* nw_config_str_short: Describes the type of network. Short string format for KNC and multistage networks is (Dd,Rr,XY,(NL,II)W,Tcy), where D is the number of dimensions, R is the radix, X is `B' for bidirectional networks and `D' for unidirectional networks, Y is `D' for direct networks and `I' for indirect networks, N is the link width in bytes, I is the network interface width in bytes, and T is the memory latency in cycles. If D is preceded by `~' the network is modeled; if it is preceded by `=' then it is simulated exactly.

RTI Variable: char* sim_comment_1
RTI Variable: char* sim_comment_2
RTI Variable: char* sim_comment_3: Pointers to comment strings. If set, the strings are printed at the beginning of a run.
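
The sketch below shows one way user code might combine the short configuration strings and the comment strings into a single label for its own output. So that it compiles on its own, it defines stand-ins for the variables; in a real simulation these would be supplied by Proteus and only declared `extern' in user code. The sample values and the function `format_run_label' are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the simulator-provided variables (sample values invented).
   In user code these would be extern declarations instead. */
char *nw_config_str_short    = "(=2d,8r,BD,(2L,2I)W,1cy)";
char *cache_config_str_short = "(4w,2^8,2^5L,pr,L2)";
char *sim_comment_1 = "baseline run";
char *sim_comment_2 = NULL;
char *sim_comment_3 = NULL;

/* Hypothetical helper: write "net <nw> cache <cache>" followed by every
   comment string that is set and non-empty.  Returns the label length,
   or -1 if the buffer is too small. */
int format_run_label(char *buf, size_t size)
{
    char *comments[3];
    int n, i;

    assert(buf != NULL && size > 0);
    comments[0] = sim_comment_1;
    comments[1] = sim_comment_2;
    comments[2] = sim_comment_3;

    n = snprintf(buf, size, "net %s cache %s",
                 nw_config_str_short, cache_config_str_short);
    if (n < 0 || (size_t)n >= size)
        return -1;

    for (i = 0; i < 3; i++) {
        int m;
        if (comments[i] == NULL || strlen(comments[i]) == 0)
            continue;                       /* comment not set */
        m = snprintf(buf + n, size - (size_t)n, " # %s", comments[i]);
        if (m < 0 || (size_t)m >= size - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

The short strings are used here because they fit comfortably on one line; the normal-size strings could be substituted where more detail is wanted.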

Node:Pre-Defined Histometrics, Next: Utilization Statistics, Prev: Configuration Strings, Up: Collected Statistics, Contents, Top

Pre-Defined Histometrics

Histometrics are automatically collected for the simulated-system quantities listed below. (See section Histogram Metrics.) They can be viewed using the Mstats Mathematica package. (See section Mstats: A Mathematica Package for Viewing Traces.)

Time, as used below, refers to simulated time measured in cycles.

The first four quantities below are collectively called the utilization statistics. If the categories were perfectly defined, the four quantities for each processor would sum to the run time. (See section Utilization Graphs.)

`User Time' Time spent in user mode. This includes time spent running user threads and running cycle-counted library functions. It does not include time spent running interrupt handlers or time for shared memory access.
`Shared Memory Access Latency' Time spent accessing shared memory.
`Interrupt Handler Run Time' Time spent running interrupt handlers.
`Idle Time' Time in which processor was not executing code.
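
The relationship between the four utilization statistics and the run time can be sketched as follows. The struct and function names are hypothetical, and the totals are assumed to have been extracted from the collected data for a single processor by some other means.

```c
#include <assert.h>
#include <stddef.h>

/* Per-processor totals, in cycles, for the four utilization statistics
   (hypothetical struct; not a Proteus data type). */
struct util_cycles {
    unsigned long user;        /* User Time                    */
    unsigned long shared_mem;  /* Shared Memory Access Latency */
    unsigned long intr;        /* Interrupt Handler Run Time   */
    unsigned long idle;        /* Idle Time                    */
};

/* Sum of the four categories. */
unsigned long util_total(const struct util_cycles *u)
{
    assert(u != NULL);
    return u->user + u->shared_mem + u->intr + u->idle;
}

/* Fraction of the accounted-for time spent in user mode. */
double util_user_fraction(const struct util_cycles *u)
{
    unsigned long total = util_total(u);
    return total > 0 ? (double)u->user / (double)total : 0.0;
}

/* Cycles of run time not covered by the four categories.  Zero in the
   ideal case; negative if the categories overlap. */
long util_residual(const struct util_cycles *u, unsigned long run_time)
{
    return (long)run_time - (long)util_total(u);
}
```

A nonzero residual indicates how far the collected categories are from the ideal, perfectly defined case.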

The quantities below describe the delays a message encounters from source to destination. These statistics are collected only if `JOURNEY_STATS' is defined (see section Timing Messages) and only while utilization statistics are being collected (see section Utilization Statistics).

`Message Latency' The time taken for a message to reach, and be processed at, its destination. This statistic is not collected for messages which are sent to the sending processor (i.e., those on a zero-length path). Statistics are collected for each processor on the messages it sends. The message is considered processed when it is completely removed from the network interface queue. If the message is an inter-processor interrupt, this is the time that the interrupt is enqueued.
`Message Delay' The time that a message spends waiting in network queues, not including the network-interface queue. Wire and switch latency are not counted. If a message passes through an empty network the waiting time will be zero. Statistics are collected for the processor sending the message. Statistics are collected for cache protocol messages with a path length greater than zero if RTI-variable ProtoMsgStats!=0. Statistics are not collected for protocol messages if ProtoMsgStats==0. (See section Run-Time Variable Initialization.) Statistics are collected only for those inter-processor interrupt messages for which the argv argument is negative. (See section Timing Messages.)
`Message Desto Delay' Time spent in the network-interface queue. Statistics are collected for the same messages for which Message Delay statistics are collected. Statistics are collected for the processor receiving the message.

Node:Utilization Statistics, Next: New Stats, Prev: Pre-Defined Histometrics, Up: Collected Statistics, Contents, Top

Utilization Statistics

Utilization statistics describe the amount of time a processor spends running user code, performing shared-memory access, running interrupt handlers, and sitting idle. If perfectly defined, these would add up to the total execution time per processor. (See section Pre-Defined Histometrics.)

They also include shared memory access latency and several components of message latency.

Collection of utilization statistics can be turned on and off using functions `UtilSetOn()' and `UtilSetOff()'. These functions can also call a user function; the user function might switch user statistics collection routines on and off. Utilization statistics collection can automatically be turned on using RTI variable `utilSwitch'.

Function: void UtilSetOn( void );: Calls function `*user_statistics_on_hook_' if non-null, turns on collection of utilization statistics, and writes an appropriate state event. (The state event is associated with the highest-numbered processor.) (See section Pre-Defined Histometrics.)

Function: void UtilSetOff( void );: Calls function `*user_statistics_off_hook_' if non-null, turns off collection of utilization statistics, and writes an appropriate state event. (The state event is associated with the highest-numbered processor.) (See section Pre-Defined Histometrics.)

Variable: void (*user_statistics_on_hook_)();: Points to a function to be called when utilization statistics are turned on, or `NULL' if there is no user function. It is the user's responsibility to chain hook functions.

Variable: void (*user_statistics_off_hook_)();: Points to a function to be called when utilization statistics are turned off, or `NULL' if there is no user function. It is the user's responsibility to chain hook functions.

RTI Variable: int utilSwitch: Utilization statistics switch. When 0, utilization statistics collection must be activated by the user (by calling `UtilSetOn'). When 1, collection starts just before `usermain' is called. When 2, collection starts at the beginning of the simulation.
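
Because chaining hook functions is left to the user, the following sketch shows one way it might be done. To keep the example self-contained it includes simplified stand-ins for `UtilSetOn', `UtilSetOff', and the two hook variables; these model only the documented behaviour (the hook is called if non-null, then collection is switched) and omit the state event. The names `util_collecting', `earlier_on_hook', `my_on_hook', `my_off_hook', and `install_my_hooks' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* --- Stand-ins for the simulator side; not the Proteus source. --- */
void (*user_statistics_on_hook_)(void)  = NULL;
void (*user_statistics_off_hook_)(void) = NULL;
int util_collecting = 0;               /* hypothetical internal flag */

void UtilSetOn(void)
{
    if (user_statistics_on_hook_ != NULL)
        (*user_statistics_on_hook_)();
    util_collecting = 1;               /* state event omitted here */
}

void UtilSetOff(void)
{
    if (user_statistics_off_hook_ != NULL)
        (*user_statistics_off_hook_)();
    util_collecting = 0;               /* state event omitted here */
}

/* --- A hook that some other code installed earlier (for illustration). --- */
int earlier_on_calls = 0;
void earlier_on_hook(void) { earlier_on_calls++; }

/* --- User code: chain new hooks in front of whatever is installed. --- */
int my_stats_enabled = 0;              /* user statistics switch */
static void (*prev_on_hook)(void)  = NULL;
static void (*prev_off_hook)(void) = NULL;

static void my_on_hook(void)
{
    my_stats_enabled = 1;
    if (prev_on_hook != NULL)
        prev_on_hook();                /* keep earlier hooks working */
}

static void my_off_hook(void)
{
    my_stats_enabled = 0;
    if (prev_off_hook != NULL)
        prev_off_hook();
}

void install_my_hooks(void)
{
    /* Installing twice would make the hooks call themselves forever. */
    assert(user_statistics_on_hook_ != my_on_hook);
    prev_on_hook  = user_statistics_on_hook_;
    prev_off_hook = user_statistics_off_hook_;
    user_statistics_on_hook_  = my_on_hook;
    user_statistics_off_hook_ = my_off_hook;
}
```

With `utilSwitch' set to 0, user code would call `install_my_hooks()' once and then bracket the region of interest with `UtilSetOn()' and `UtilSetOff()'; the user statistics then follow the utilization statistics on and off.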

Node:New Stats, Prev: Utilization Statistics, Up: Collected Statistics, Contents, Top

New Stats Graphs

The following graphs are provided by Proteus L in addition to those provided by Proteus 3. The graphs are all available in cached, exact network configurations; some graphs are unavailable in other configurations. The collection of data for these graphs is controlled by macros defined in `conf.param' and can be set using the config program. Data collection might be switched off to save disk space (by default the data is written to `events.sim') and improve speed. The stats program will display an empty graph if the needed data was not collected.

`Threads in Barrier' The number of processors which are in the basic barrier. (Statistics are not collected for other barriers.) (See section Basic Barrier.)
`Network Traffic' The arrival rate of traffic to the network. The (admittedly awkward) units are flits per 128 cycles. Note that this is an aggregate over all processors, not a per-processor rate. Data collection is controlled by `WATCH_NET'.
`Cache Contention' The amount of time (in cycles) that cache requests wait at a processor's cache before being serviced. This does not include the time spent in the network interface. Data collection is controlled by `WATCH_CACHE'. (See section Timing of Cache and Memory Operations.)
`Memory Contention' The amount of time (in cycles) that memory requests wait at a memory module before being serviced. This does not include the time spent in the network interface. Data collection is controlled by `WATCH_NET'. (See section Timing of Cache and Memory Operations.)
`Access Latency' The amount of time (in cycles) from the start of a memory access instruction (load or store) to the start of the following instruction. If the store is nonblocking, this can be as short as one integer instruction time. Data collection is controlled by `WATCH_MEMORY_ACCESS'.
`TLB Hit Ratio' The number of TLB hits per 1000 TLB accesses.
`Output Contention' The total waiting time (in cycles) of protocol messages entering the network at a processor and its memory module. These include messages generated by the cache controller and memory module (which share a network port) but not inter-processor interrupts (generated by the processor). Data collection is controlled by `WATCH_NET'.
`NWI Contention' The total waiting time (in cycles) of messages in the processor's network interface. This includes all messages bound for the processor and memory module. Data collection is controlled by `WATCH_NET'.
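
The units used by the Network Traffic and TLB Hit Ratio graphs can be converted to more conventional ones with a little arithmetic, sketched below. The function names are hypothetical, and dividing the aggregate traffic evenly among processors is an interpretation that gives only an average per-processor rate.

```c
#include <assert.h>

/* Network Traffic value (flits per 128 cycles, summed over all
   processors) -> average flits per cycle per processor. */
double traffic_per_proc_per_cycle(double flits_per_128_cycles, int num_procs)
{
    assert(num_procs > 0);
    return flits_per_128_cycles / 128.0 / (double)num_procs;
}

/* TLB Hit Ratio value (hits per 1000 accesses) -> fraction of accesses
   that hit, between 0 and 1. */
double tlb_hit_fraction(double hits_per_1000)
{
    assert(hits_per_1000 >= 0.0 && hits_per_1000 <= 1000.0);
    return hits_per_1000 / 1000.0;
}
```

For instance, a plotted traffic value of 512 on a 16-processor system corresponds to 512 / 128 / 16 = 0.25 flits per cycle per processor on average.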