Collected Statistics
Some additional statistics are automatically collected by Proteus.  Although
they are not exactly statistics, Proteus also generates strings describing its
configuration; these can be used by user code to label output.
Histometrics are collected for network and memory performance
parameters. (See section Histogram Metrics.) The collection of some of these can
be switched on and off.  Several new graphs showing network and
memory-system performance are displayed by stats.
Automatically Generated Configuration Strings and User Comment Strings
Proteus automatically generates several strings describing the system
configuration, including the type of network and memory system.  If a
simulation title is not provided, the strings are used to construct one.
Some of the strings are also printed out at the top of a run.
The strings can be used in user code, to label output, for example.
Each string comes in two sizes: a normal size, usually no more than
80 characters; and a short size, usually less than 20 characters.
Proteus can automatically display user comment strings at the beginning
of a run. The strings are specified by RTI variables
`sim_comment_1', `sim_comment_2', and `sim_comment_3'.
- variable: char* mem_config_str
- 
- variable: char* mem_config_str_short
- 
Describes the amount of memory and size of the cache directories. 
Short string format is
(2^XB,DY), where
X is the number of bits of address space per module and
Y is the number of entries in the cache directory, if any.
- variable: char* cache_config_str
- 
- variable: char* cache_config_str_short
- 
Describes the type of cache. Short string format is
(Xw,2^Y,2^ZL,W,LV),
where
X is the cache associativity (number of lines per set),
2^Y is the number of sets, 2^Z is the number of bytes
per line, W is the type of replacement policy used:
`pr' for part-random, `fr' for full-random, or
`DR' for deterministic replacement, and V
is the latency of cache operations in cycles.
- variable: char* nw_config_str
- 
- variable: char* nw_config_str_short
- 
Describes the type of network. Short string format for KNC and
multistage networks is
(Dd,Rr,XY,(NL,II)W,Tcy),
where D is the number of dimensions, R is the radix,
X is `B' for bidirectional networks
and `D' for unidirectional networks, Y is `D' for
direct networks and `I' for indirect networks, N is the
link width in bytes, I is the network interface width
in bytes, and T is the memory latency in cycles.
If D is preceded by `~' the network is modeled;
if it is preceded by `=' then it is simulated exactly.
- RTI Variable: char* sim_comment_1
- 
- RTI Variable: char* sim_comment_2
- 
- RTI Variable: char* sim_comment_3
- 
Pointers to comment strings. If set, the strings are printed at the
beginning of a run.
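As a minimal sketch of labeling output with these strings, the function
below joins the three short configuration strings into one label.  In a
Proteus program the arguments would be the variables documented above;
here they are passed as parameters so that the fragment stands alone.
The function name `format_run_label' and the label layout are
assumptions made for illustration and are not part of Proteus.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of Proteus): joins the short memory,
   cache, and network configuration strings into a single label.
   Returns the value of snprintf(): the number of characters the full
   label needs, excluding the terminating NUL.  A result >= size means
   the label was truncated to fit the buffer. */
int format_run_label(char *buf, size_t size,
                     const char *mem, const char *cache, const char *nw)
{
    return snprintf(buf, size, "mem%s cache%s nw%s", mem, cache, nw);
}
```

A call might look like
`format_run_label(label, sizeof label, mem_config_str_short,
cache_config_str_short, nw_config_str_short)', after which `label' can
be printed at the top of user output.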
Pre-Defined Histometrics
Histometrics are automatically collected for the simulated-system
quantities listed below. (See section Histogram Metrics.)  They can be viewed
using the Mstats Mathematica package.  (See section Mstats: A Mathematica Package for Viewing Traces.)
Time, as used below, refers to simulated time measured in cycles.
The first four quantities below are collectively called the
utilization statistics. If perfectly defined, the sum
of these four quantities for each processor would equal the run time.
(See section Utilization Graphs.)
- 
`User Time' 
Time spent in user mode.  This includes time spent running user threads
and running cycle-counted library functions.  It does not include time
spent running interrupt handlers or time for shared memory access.
- 
`Shared Memory Access Latency' 
Time spent accessing shared memory. 
- 
`Interrupt Handler Run Time' 
Time spent running interrupt handlers.
- 
`Idle Time' 
Time in which the processor was not executing code.
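The relationship among the four utilization statistics can be sketched
as follows.  The structure and function names are hypothetical and do
not correspond to Proteus definitions; the fragment only illustrates
the intended invariant, namely that the four per-processor totals
should ideally add up to the run time.

```c
/* Hypothetical record of the four utilization statistics for one
   processor, all in simulated cycles.  Names are illustrative only. */
struct util_totals {
    unsigned long user_time;
    unsigned long shared_mem_latency;
    unsigned long interrupt_time;
    unsigned long idle_time;
};

/* Sum of the four components; ideally equal to the run time. */
unsigned long util_sum(const struct util_totals *u)
{
    return u->user_time + u->shared_mem_latency
         + u->interrupt_time + u->idle_time;
}

/* Cycles of run time not covered by the four components
   (negative if the components overcount). */
long util_unaccounted(const struct util_totals *u, unsigned long run_time)
{
    return (long)run_time - (long)util_sum(u);
}
```

A nonzero result from `util_unaccounted' would indicate how far the
collected quantities are from being "perfectly defined" for that
processor.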
The quantities below describe the delays a message encounters from
source to destination. These statistics are only collected if
`JOURNEY_STATS' is defined (see section Timing Messages) and only when utilization
statistics are being collected (see section Utilization Statistics).
- 
`Message Latency' 
The time taken for a message to reach, and be processed at, its
destination. This statistic is not collected for messages which are sent
to the sending processor (i.e., those on a zero-length path).
Statistics are collected for each processor on the messages it sends.
The message is considered processed when it is completely removed
from the network interface queue.  If the message is an inter-processor
interrupt, this is the time that the interrupt is enqueued.
- 
`Message Delay' 
The time that a message spends waiting in network queues, not including
the network-interface queue.  Wire and switch latency are not
counted. If a message passes through an empty network the waiting time
will be zero.  Statistics are collected for the processor sending the
message.
Statistics are collected for cache protocol messages with a path length
greater than zero if RTI variable `ProtoMsgStats' is nonzero; they are not
collected for protocol messages if `ProtoMsgStats' is zero.
(See section Run-Time Variable Initialization.)
Statistics are collected only for those inter-processor interrupt messages
for which the argv argument is negative. (See section Timing Messages.)
- 
`Message Desto Delay' 
Time spent in the network-interface queue. Statistics are collected
for the same messages for which Message Delay statistics are collected.
Statistics are collected for the processor receiving the message.
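The conditions under which Message Delay statistics are collected can
be restated as a predicate.  This is only a sketch of the rules as
described above, not Proteus code: the enumeration, the structure, its
field names, and the function `delay_stats_collected' are all
hypothetical, and only the two message kinds mentioned in the text are
modeled.

```c
#include <stdbool.h>

/* Hypothetical message description; not a Proteus definition. */
enum msg_kind { MSG_PROTOCOL, MSG_IPI };

struct msg_info {
    enum msg_kind kind;
    int path_length;  /* 0 when source and destination coincide */
    int argv_arg;     /* the argv argument referred to above (IPIs only) */
};

/* journey_stats:   nonzero if JOURNEY_STATS was defined at build time.
   util_on:         nonzero while utilization statistics are collected.
   proto_msg_stats: value of RTI variable ProtoMsgStats. */
bool delay_stats_collected(const struct msg_info *m, int journey_stats,
                           int util_on, int proto_msg_stats)
{
    if (!journey_stats || !util_on)
        return false;
    switch (m->kind) {
    case MSG_PROTOCOL:
        /* Protocol messages: nonzero path and ProtoMsgStats enabled. */
        return m->path_length > 0 && proto_msg_stats != 0;
    case MSG_IPI:
        /* Inter-processor interrupts: only when argv is negative. */
        return m->argv_arg < 0;
    }
    return false;
}
```

Because Message Desto Delay is collected for the same messages, the
same predicate would apply to it under these assumptions.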
Utilization Statistics
Utilization statistics describe the amount of time
a processor spends running user code, performing shared-memory access,
running interrupt handlers, and idling. If perfectly defined,
these would add up to the total execution time per processor.
(See section Pre-Defined Histometrics.)
They also include shared memory access latency and several components
of message latency.
Collection of utilization statistics can be turned on and off
using functions `UtilSetOn()' and `UtilSetOff()'.
These functions can also call a user function; the
user function might switch user statistics collection
routines on and off.  Utilization statistics collection
can automatically be turned on using RTI variable `utilSwitch'.
- Function: void UtilSetOn( void );
- 
Calls function `*user_statistics_on_hook_' if non-null,
turns on collection of utilization statistics, and writes
an appropriate state event. (The state event is associated with
the highest-numbered processor.)
(See section Pre-Defined Histometrics.) 
- Function: void UtilSetOff( void );
- 
Calls function `*user_statistics_off_hook_' if non-null,
turns off collection of utilization statistics, and writes
an appropriate state event. (The state event is associated with
the highest-numbered processor.)
(See section Pre-Defined Histometrics.) 
- Variable: void (*user_statistics_on_hook_)();
- 
Points to a function to be called when utilization statistics
are turned on, or `NULL' if there is no user function.
It is the user's responsibility to chain hook functions.
- Variable: void (*user_statistics_off_hook_)();
- 
Points to a function to be called when utilization statistics
are turned off, or `NULL' if there is no user function.
It is the user's responsibility to chain hook functions.
- RTI Variable: int utilSwitch
- 
Utilization statistics switch. When 0, utilization statistics collection
must be activated by the user (by calling `UtilSetOn').  When 1,
collection starts just before `usermain' is called. When 2,
collection starts at the beginning of the simulation.
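Since chaining hook functions is left to the user, one possible
approach is sketched below: remember whatever hook is already
installed, install a new one, and have the new hook call the old one.
The hook variable is defined here only as a stand-in so that the
fragment compiles on its own; in a Proteus program that definition
would come from the simulator.  All other names (`my_stats_on',
`other_stats_on', `install_my_stats_on_hook', and the two counters) are
hypothetical.

```c
#include <stddef.h>

/* Stand-in for the Proteus hook variable (assumption: omit this
   definition when building against Proteus itself). */
void (*user_statistics_on_hook_)(void) = NULL;

/* Hypothetical hook installed earlier by some other module. */
int other_on_calls = 0;
void other_stats_on(void) { other_on_calls++; }

static void (*prev_on_hook)(void) = NULL;  /* hook in place before ours */
int my_stats_enabled = 0;                  /* hypothetical user flag */

/* Our hook: enable user statistics, then call the previous hook. */
static void my_stats_on(void)
{
    my_stats_enabled = 1;
    if (prev_on_hook != NULL)
        prev_on_hook();
}

/* Install our hook, remembering the one already in place.  Installing
   twice is a no-op, which avoids the hook chaining to itself. */
void install_my_stats_on_hook(void)
{
    if (user_statistics_on_hook_ == my_stats_on)
        return;
    prev_on_hook = user_statistics_on_hook_;
    user_statistics_on_hook_ = my_stats_on;
}
```

The same pattern would apply to `user_statistics_off_hook_'.  With the
hook installed, a call to `UtilSetOn()' would be expected to reach
`my_stats_on' first and the earlier hook second.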
New Stats Graphs
The following graphs are provided by Proteus L in addition to 
those provided by Proteus 3.  The graphs are all available in
cached, exact network configurations; some graphs are unavailable
in other configurations.  The collection of data for these
graphs is controlled by macros defined in `conf.param' and
can be set using the config program.  Data collection might
be switched off to save disk space (by default the data is written
to `events.sim') and improve speed. The stats program
will display an empty graph if the needed data was not collected.
- 
Threads in Barrier
The number of processors which are in the basic barrier. (Statistics
are not collected for other barriers.)
(See section Basic Barrier.)
- 
Network Traffic
The arrival rate of traffic to the network. The (admittedly awkward)
units are flits per 128 cycles. Note that this includes all
processors; it is not a per-processor rate.  Data
collection controlled by `WATCH_NET'.
- 
Cache Contention
The amount of time (in cycles) that cache requests wait at a processor's
cache before being serviced.  This does not include the time spent in
the network interface.  Data collection controlled by
`WATCH_CACHE'.  (See section Timing of Cache and Memory Operations.)
- 
Memory Contention
The amount of time (in cycles) that memory requests wait at a memory
module before being serviced.  This does not include the time spent in
the network interface. Data collection controlled by
`WATCH_NET'. (See section Timing of Cache and Memory Operations.)
- 
Access Latency
The amount of time (in cycles) from the start of a memory access
instruction (load or store) to the start of the following
instruction. If the store is nonblocking, this can be as short as one
integer instruction time.  Data collection controlled by
`WATCH_MEMORY_ACCESS'.
- 
TLB Hit Ratio
The number of TLB hits per 1000 TLB accesses.
- 
Output Contention
The total waiting time (in cycles) of protocol messages entering the
network at a processor and its memory module. These include messages
generated by the cache controller and memory module (which share a
network port) but not inter-processor interrupts (generated by the
processor).  Data collection controlled by `WATCH_NET'.
- 
NWI Contention
The total waiting time (in cycles) of messages in the processor's network
interface. This includes all messages bound for the processor and memory
module.  Data collection controlled by `WATCH_NET'.
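Two of the graphs above use units that may need converting before they
are compared with other results.  The helpers below are a small sketch
of those conversions; the function names are hypothetical, and the
number of processors is assumed to be known to the caller.

```c
/* Network Traffic is reported in flits per 128 cycles, summed over all
   processors.  Convert to flits per cycle per processor. */
double traffic_per_proc_per_cycle(double flits_per_128_cycles, int num_procs)
{
    return flits_per_128_cycles / 128.0 / (double)num_procs;
}

/* TLB Hit Ratio is reported as hits per 1000 accesses.
   Convert to a fraction between 0 and 1. */
double tlb_hit_fraction(double hits_per_1000)
{
    return hits_per_1000 / 1000.0;
}
```

For example, a Network Traffic reading of 256 on a 16-processor system
corresponds to 0.125 flits per cycle per processor.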