Two data-collecting mechanisms have been added: state events and histometrics. State events are used to generate a graph, displayed by stats, showing what processors are doing during execution. Histometrics are dynamically generated histograms of data sampled during execution. Histometrics can be displayed using Mathematica.
Stats can scale the events that it displays. Though events found in the event file are integers, they sometimes describe quantities usually expressed as floating-point numbers. For example, access latency is the total latency for the last 100 accesses, not the average. Stats can now scale events displayed in ArrayGraph legends, for example dividing the access latency "per 100 requests" found in the event file by 100. A scale is specified by using a "scale" descriptor in the graph specification and a number format is specified by a "format" descriptor.
In the graph specification below, a 0.01 scale is used for access latency which, as described above, uses events which are the sum of the last 100 accesses. With the scale applied, an average is displayed; the format specifies one digit past the decimal point.
ArrayGraph accesslatency (x, 0, NO_OF_PROCESSORS -1) {
    menu <- "Access Latency",
    name <- "Memory Access Latency",
    scale <- 0.01,
    format <- "%.1f",
    y_axis <- "Processor",
    x_axis <- "Time / Cycles",
    action { EV_MEMACC_LATENCY: VALUE(x) }
}
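The effect of the "scale" and "format" descriptors can be pictured with a small C sketch. This is not code from stats; the helper name `format_scaled_event' is hypothetical, and the sketch assumes only that the integer event value is multiplied by the scale and then printed with the printf-style format string.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of stats): multiply an integer event
 * value by a "scale" descriptor and render the result with a
 * printf-style "format" descriptor such as "%.1f".
 * Returns the value returned by snprintf. */
static int format_scaled_event(char *buf, size_t len, long event_value,
                               double scale, const char *format)
{
    double scaled = (double)event_value * scale;
    return snprintf(buf, len, format, scaled);
}
```

With a scale of 0.01 and format "%.1f", an event value of 1234 (the summed latency of 100 accesses) would be shown as 12.3.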
Stats will look for graph specification data in the event file, in addition to the usual graph file. If a graph is defined both in the graph file and the event file, descriptors for that graph found in the graph file take precedence. When stats is run, the graph information found in the event file is written to file GraphfileTemp.
Graph specification data is placed in the event file using the function GraphSpec:
This feature is in an early state of implementation and subject to change. (That's why there's no way yet to automatically put state event names in the state graph legend.)
Stats can now read incomplete event files; this is useful for simulations that end abnormally or are in progress. (Gee, I wonder if it's in an infinite loop?)
The current view number and the total number of views are displayed in the upper left-hand corner of the graph.
The "x" values of views are preserved when switching to compatible graphs. For example, if you zoom in on cycles 1000000 to 1200000 in the states graph and then switch to the access latency graph, that same range will be visible.
Double clicking on a row (typically displaying data from a particular processor) in an event graph will zoom the graph to display only that row.
The keyboard can be used to pan, zoom, and switch between views. The key bindings are:
[left] Pan (user's view) left 20% of current view.
[S-left] Pan left 4% of current view.
[right] Pan right 20% of current view.
[S-right] Pan right 4% of current view.
[up] Pan up 20% of current view.
[down] Pan down 20% of current view.
[pageup] Pan to maximum y value.
[pagedown] Pan to minimum y value.
[home] Pan to minimum x value.
[end] Pan to maximum x value.
[C-up] Zoom out.
[C-down] Zoom in.
[M-left] Previous view.
[M-right] Next view.
[M-up] First view (entire graph).
where [S-left] indicates shift-left-arrow, [C-up] indicates control-up-arrow, [M-left] indicates meta-left-arrow, etc. (The meta key may be labeled with a black diamond, the word ALT, or with something else.)
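The percentage-based panning in the key bindings can be sketched in C as follows. The `ViewRange' type and `pan_x' function are hypothetical names used only for illustration; the sketch assumes that a pan shifts both edges of the view by the stated fraction of the current view width, and it does not model any clamping at the edges of the graph.

```c
/* Hypothetical representation of the x range shown by a view. */
typedef struct {
    double x_min;
    double x_max;
} ViewRange;

/* Pan by a signed fraction of the current view width:
 * +0.20 for [right], -0.20 for [left], +0.04 and -0.04 with shift. */
static ViewRange pan_x(ViewRange v, double fraction)
{
    double shift = (v.x_max - v.x_min) * fraction;
    ViewRange out = { v.x_min + shift, v.x_max + shift };
    return out;
}
```

For a view covering cycles 1000000 to 1200000, a [right] pan under these assumptions moves the view by 40000 cycles, to 1040000 through 1240000.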
Sometimes these pan and zoom commands change the current view, sometimes they increment the current view and apply there. (The View menu or the [M-left] and [M-right] keys can be used to switch between views.) The rules are:
PostScript output closely matches the displayed graph. New standard sizes are available (identified by size rather than their intended use) as well as PostScript matching the current window size (WYSIWYG). The PostScript generation code is not perfect: text can overlap. If it does, either shorten the text by editing the PostScript file or select a larger size.
The title, subtitle, and the plot background color can easily be changed by editing the PostScript; comments have been included to guide those making more extensive changes.
(For those familiar with PostScript, the title and subtitle can even be changed by defining keys before the graph file itself is read within some other environment. One could, say, specify the title in a TeX file processed by dvips using `\special{!/stats-title (Experiment 1 Results) def}', though this would set the title of all included stats graphs.)
A useful feature has been removed. It is no longer possible to generate PostScript using command-line arguments.
The key (legend) of event graphs can now indicate the number or percentage of events in the current view within each range. For example, the states graph now indicates the amount of time idle, waiting in a barrier, etc. This information can be switched off from the options menu.
The average value of displayed events is given for Array Graphs without state maps. (In a graph with a state map, legend entries are labeled with a text string rather than a bin range.)
System state refers to the most important activity that each processor is performing at a given time. The state might indicate which procedure is running, or perhaps if an important loop is executing. Proteus collects and writes such system state information to the event file; after execution, stats can display a graph showing these states. The graph can provide an overview of the simulation, and might be useful for performance tuning, algorithm or hardware development, debugging, and office decoration.
States are constructed from sub-states, which typically indicate whether a particular activity is in progress. Each sub-state can be either `on' (active or true) or `off' (inactive or false).
Proteus collects a number of sub-states by default; others can be added by the user. Default sub-states are: idle, busy, getting semaphore lock, and barrier. A processor is idle when no threads are running on that processor; a processor is busy when it's not idle. It's getting a lock during a sem_P call and it's in the barrier sub-state when executing a barrier.
Each sub-state is associated with one or two colors and a layer, all real numbers. The sub-state's two colors are called primary and secondary. A sub-state's colors can be either translucent or opaque. A sub-state with translucent colors is also associated with a lower-level sub-state, called the co-state.
The state of a processor is determined from the colors of the active sub-state with the highest layer as follows. If no sub-states are active the color is zero. If the principal (highest-layer active) sub-state has the opaque property the color is the sub-state's primary color. If the principal sub-state has the translucent property the processor state color depends upon the value of the co-state. If the co-state is `on', the secondary color is used, otherwise the primary color is used. Note that if a sub-state is opaque, the secondary color and co-state are superfluous. Proteus writes state events to the event file indicating the processor and the color, which is cast to an integer. The mapping between the color values referred to above and a color displayed using stats is specified in stats' `Graphfile'.
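The color rule above can be sketched in C. This is an illustrative model, not the Proteus implementation: the `SubState' structure, its field names, and the function `processor_color' are hypothetical, and the sketch assumes that when two active sub-states share the highest layer the first one found wins (the text does not say how ties are resolved).

```c
#include <stddef.h>

/* Hypothetical model of a sub-state; field names are illustrative. */
typedef struct {
    int    active;       /* non-zero when the sub-state is `on' */
    double layer;        /* higher layers obscure lower ones */
    double primary;      /* primary color */
    double secondary;    /* secondary color, used only if translucent */
    int    translucent;  /* zero means opaque */
    size_t co_state;     /* index of the co-state, used only if translucent */
} SubState;

/* Return the color a state event would carry for one processor. */
static int processor_color(const SubState *s, size_t n)
{
    const SubState *principal = NULL;

    /* The principal sub-state is the active one with the highest layer. */
    for (size_t i = 0; i < n; i++) {
        if (s[i].active &&
            (principal == NULL || s[i].layer > principal->layer))
            principal = &s[i];
    }
    if (principal == NULL)
        return 0;                        /* no active sub-states */
    if (principal->translucent && s[principal->co_state].active)
        return (int)principal->secondary;
    return (int)principal->primary;      /* opaque, or co-state is off */
}
```

The final casts reflect the statement that the color is cast to an integer before it is written to the event file.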
By default, sub-state idle is in layer -1 and has color black. (It's called `preFirst' internally, and is always active but obscured by all other sub-states.) Sub-state busy is in layer 1 and has color gray. Getting lock is in layer 100 and is red. The barrier, a translucent sub-state, is in layer 100 and has colors navy blue and brown. Navy blue is used when the busy sub-state is not active (it's labeled "In barrier, idle" in the graph legend) and brown otherwise. Note that displayed colors will vary with display device.
An additional sub-state is defined for the N-Queens program, indicating that procedure `local_add_queen' is executing. This sub-state is in layer 55 and has color blue.
The default stats graph labels four user-defined slots and maps them to colors blue, green, yellow, and white. These can be changed by editing `Graphfile'.
By convention layers around 50 are used for procedures doing useful work and layers around 100 are for activities that delay work, such as contention delays.
States that change frequently can bloat the event file. Function `stateSetResolution' can be used to filter out short-duration states of a particular type; variable `sev_default_resolution' can set a default minimum state duration.
A new sub-state is defined by the following function:
Colors are rounded to integers so fractional values are ignored. The mapping between these colors and those displayed by stats is determined by the contents of `Graphfile'.
The function returns the identifier assigned to the sub-state.
A sub-state change is posted by calling the following function:
The new state is determined and an appropriate event is written to the event file, as described above. The value of the event is the color of the sub-state. (The actual color displayed by stats is based on the range of default color values or defined color values.)
The value of the various sub-states is determined by the order in which `stateEvent' is called. It is not determined by the time parameter. The time parameter is only used when writing events to the event file.
A duration threshold can be set for state events; if the state is active for strictly less than the duration, it is not written to the event file. Thresholds can be set for individual states and there is a default threshold.
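The threshold test can be illustrated with a short C sketch. The names `USE_DEFAULT', `effective_threshold', and `state_is_recorded' are hypothetical, as is the use of a sentinel value to mean "no per-state threshold set"; the only behavior taken from the text is that a state active for strictly less than the threshold is dropped.

```c
/* Hypothetical sentinel: the state has no threshold of its own. */
#define USE_DEFAULT (-1L)

/* Pick the per-state threshold if one was set, else the default. */
static long effective_threshold(long state_threshold, long default_threshold)
{
    return state_threshold == USE_DEFAULT ? default_threshold
                                          : state_threshold;
}

/* Non-zero when a state that lasted `duration' cycles would be written
 * to the event file; durations strictly below the threshold are dropped. */
static int state_is_recorded(long duration, long state_threshold,
                             long default_threshold)
{
    return duration >= effective_threshold(state_threshold,
                                           default_threshold);
}
```

Under these assumptions a state lasting exactly as long as the threshold is still recorded.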
For examples of how these functions can be used, see engine files `basic_barrier.ca' and `queens.ca' and, in the validation directory, `.../valid/genTest.ca'.
A histogram metric, or histometric for short, is used by Proteus engine or user programmers to obtain a histogram, and other statistics, of values in a simulation run. The number and range of histogram bins can be specified by the user or can be chosen automatically. A single histometric can record statistics on several sets of data. (For example, suppose a histometric is to be used to record statistics on x, a variable local to each of N processors. Then N sets would be used; separate data would be collected for each x.)
To use a histometric it must first be defined by a call to the function newHistoMetric, which returns a number for the histometric. Parameters for this function specify the number of sets, number and range of bins, and a name for the histometric. (Parameters are described below.) Data is sampled by calls to histoMetricSample. Parameters specify the histometric number, a set number, and the value to sample. Sampling can be turned on and off using histoMetricOff, histoMetricOn, and histoMetricOnAt. (When sampling for a histometric is turned off a call to histoMetricSample for that histometric has no effect.)
If the simulation ends normally, the histometric data is written to the trace file (by default called `events.sim') along with other trace data. Averages of histometrics can be obtained after all data is collected using the function histoMetricAverage. After the simulation, histometric data can be viewed using the Mstats package for Mathematica. See section Mstats: A Mathematica Package for Viewing Traces for Mstats documentation.
A histometric must be defined before it can be used. A histometric is defined by calling function newHistoMetric, usually near the beginning of the user program.
Declarations for the functions below are in `event.h'.
histoMetricSample
.
The string pointed to by name is used as the name for the histometric. The name is to be used in Mstats and could be used in the script file for stats.
Calls to histoMetricSample for a histometric will be ignored when the simulated processor time is less than the value given for startAt in the histometric's definition (unless the histometric is explicitly turned on).
Parameter numSets specifies the number of distinct variables to sample (called sets). Statistics are recorded separately for each variable. Sets are numbered from 0 to numSets-1. Set numbers are not presently checked in calls to histoMetricSample; invalid set numbers can result in segmentation faults.
Parameter autoSamples specifies the number of samples to examine before determining the bin sizes. If autoSamples is 0, then the bin range specified in the next two parameters (along with the number of bins) is used to determine the bin sizes. If autoSamples is positive then autoSamples samples will be buffered. When the last of these samples is buffered they will be analyzed to determine the bin sizes (as described below), and the bins will be initialized using the samples.
Parameters minBin and maxBin specify the minimum sample that will be placed in the smallest bin and the minimum sample too large for the largest bin, respectively. Their interpretation depends upon autoSamples. If autoSamples is 0, then minBin and maxBin specify actual values. If autoSamples is positive, then minBin and maxBin are percentiles (they must be between 0 and 1, inclusive). The minimum sample that will be placed in the smallest bin is the minBin'th percentile buffered sample. The minimum sample too large for the largest bin is the maxBin'th percentile buffered sample.
Parameter bins specifies the number of normal bins. Each of these bins is the same width. Two additional bins, called end bins, are provided for samples which fall outside the normal bins.
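The relationship between minBin, maxBin, bins, and the two end bins can be sketched in C. The function `bin_slot' and its slot numbering (0 for the low end bin, 1 through bins for the normal bins, bins+1 for the high end bin) are assumptions made for this illustration; the manual does not say how Proteus numbers bins internally.

```c
/* Hypothetical slot numbering, for illustration only:
 *   0          low end bin  (sample below min_bin)
 *   1..bins    normal bins, all the same width
 *   bins + 1   high end bin (sample at or above max_bin) */
static int bin_slot(double sample, double min_bin, double max_bin, int bins)
{
    if (sample < min_bin)
        return 0;
    if (sample >= max_bin)      /* max_bin is already too large */
        return bins + 1;

    double width = (max_bin - min_bin) / bins;
    int slot = (int)((sample - min_bin) / width) + 1;

    /* Guard against floating-point rounding just below max_bin. */
    return slot > bins ? bins : slot;
}
```

With minBin 0, maxBin 500, and 20 bins, each normal bin is 25 wide: a sample of 0 lands in the first normal bin, 499.9 in the last, and 500 in the high end bin.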
For example, consider the following code fragment:
#include "event.h"
...
x=newHistoMetric("Pressure", 0, NO_OF_PROCESSORS, 1000, 0, 0.95, 100);
y=newHistoMetric("Mass", MAXTIME, 5, 0, 0, 500, 20);
The first call defines a histometric named "Pressure"; data collection starts immediately. There is a quantity called pressure (presumably) on each processor and separate statistics are being kept for each of them. It is not possible to determine the range of values that will be encountered, so bin ranges will be automatically determined based on the first 1000 samples. The smallest bin will be based on the smallest sample buffered; the largest bin will be based on the 50th largest sample. (The largest 5% might be much larger than the rest, and would result in unacceptably large bins if used to determine bin size.) There will be 100 normal bins, plus the two end bins.
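How a percentile might be picked from the buffered samples can be sketched in C. The function name `buffered_percentile' is hypothetical, and the index rule (position p*n in the sorted buffer, clamped to the last element) is an assumption chosen so that 0.95 of 1000 samples selects the 50th largest; Proteus may use a different rule.

```c
#include <stdlib.h>

/* Ascending comparison for qsort. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the buffered samples in place and return the one at percentile
 * p, where 0 <= p <= 1 and n > 0.  The index rule is an assumption. */
static double buffered_percentile(double *samples, size_t n, double p)
{
    qsort(samples, n, sizeof samples[0], compare_doubles);

    /* The small epsilon absorbs rounding in p * n. */
    size_t idx = (size_t)(p * (double)n + 1e-9);
    if (idx >= n)
        idx = n - 1;
    return samples[idx];
}
```

Under this rule a percentile of 0 yields the smallest buffered sample and a percentile of 1 yields the largest.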
The second call defines a histometric named "Mass"; data collection will not start until explicitly turned on. (`MAXTIME' is the largest possible time the simulator can handle.) Five sets of data will be maintained, perhaps because there are five items called mass. The range of the 20 bins is manually specified; the smallest bin starts at zero; the largest bin ends at 500.
The statistics are based on samples of data; samples are collected by calling histoMetricSample. The following data are collected:
These data can be accessed using the Mstats package.
The function and macros below are declared in `event.h'.
For example, consider the code fragment below which is used to produce a histogram of values of variable ir.
#include "event.h"
...
/* Initialization */
int met_Irr;
met_Irr=newHistoMetric("Irradiance", MAXTIME, 1, 0, 0, 500, 20);
...
/* Turn on data collection. */
if(warm)histoMetricOn(met_Irr);
...
/* Compute and sample data. */
ir= (a + x)/2.0;
histoMetricSample(met_Irr,0,ir);
...
The code under the initialization comment defines the histometric. The histometric is initially turned off. Only one set of data is being collected since there is only one quantity which is being sampled. Data collection is turned on under the data-collection comment if variable warm is non-zero. Data is actually sampled in the last line of code shown, which presumably will be executed many times.
Averages of histogram metrics can be obtained using the function histoMetricAverage, which should be called only at the end of a run.
Mstats is a Mathematica package for viewing Proteus trace data. Two special graph types are provided for viewing traces: one shows processor utilization using a stacked bar graph; the other displays a set of histograms corresponding to a histometric. Proteus metric and parameter data are available; event data is not. Mstats is meant to be used alongside stats, so it does not reproduce all of stats' functionality.
To run Mstats, Mathematica and a modified version of the stats program must be installed. Of course, there must also be a trace file generated by Proteus.
The session transcript below shows Mstats being used to display a graph showing processor utilization and a graph showing histograms of message latency.
Mathematica 2.2 for SPARC
Copyright 1988-93 Wolfram Research, Inc.
 -- Motif graphics initialized --
In[1]:= <</home2/koppel/protLocal/bin/Mstats.m
In[2]:= TraceLoad[]
Loading `events.sim'
Allocating space for 100000 events.
Reading parameters from event file "events.sim".
Loaded 17248 events.
Loaded trace named `Sim0'.
Proteus run 26 May 1995 13:15:28 (UTC-5).
Simulated-system execution time 579,376. cycles
Host execution time 30.77 s
In[3]:= UtilGraph[]
Out[3]= -Graphics-
In[4]:= HistoGraph["Message Latency"]
Out[4]= -Graphics-
In[5]:= Quit
Process Mathematica finished
The first input line, In[1]..., loads Mstats. The second input line loads a Proteus trace file (named `events.sim', the default). Mstats displays the name it gives to the loaded trace, `Sim0', and some basic information about the simulation described by the trace.
A command to display a graph showing processor utilization is entered in line 3. The graph may appear in a separate window, depending upon the type of terminal and Mathematica front end used.
A graph of histometric `Message Latency' is displayed in line 4. A command to exit Mathematica is entered at line 5.
Trace files (which describe a simulation) are loaded using the `TraceLoad' command; each trace loaded is assigned a name. This way the results in several traces can quickly be compared. `TraceLoad' makes the newly loaded trace current. The command `SetCurrentTrace' can be used to make another trace current. The command `Summary' gives details about the current trace; the command `Summaries' gives details about all loaded traces. There is no way, other than exiting Mathematica, to remove a trace.
`TraceLoad' has options `TraceName' and `TraceFile'.
Example: TraceLoad[TraceFile->"linearNW.sim"].
Example: TraceLoad[TraceFile->"thirdTry.sim",TraceName->"pleaseWork"].
The utilization graphs show how each processor's time is divided between basic activities: computing, running interrupt handlers, accessing shared memory, and (optionally) idle. A segmented horizontal bar is shown for each processor. There is a segment for each activity; the length of the segment is proportional to the time spent performing that activity (within the data collection period).
By default, utilization for each processor, and an average, are shown. Optionally, utilization for only a subset of processors can be plotted.
The data presented in the utilization graphs is not perfect, therefore the bars may not all be the same length. The discrepancy in bar length can be large if the collection of utilization statistics is turned off and on frequently. (Since, for example, a memory access in progress when utilization statistics collection is turned on is not counted.)
Displays a bar graph showing processor utilization for the current trace, as described above. `UtilGraph' has options `Normalize', `PlotSet', and `BarWidth' and can also use many of the options for the Mathematica `Graphics' object.
Example: UtilGraph[PlotLabel->"Well-Tuned Code"] will plot a graph with the label "Well-Tuned Code" instead of "Processor Utilization".
Option for `UtilGraph'; if set to `True' the utilization graph will not include idle time and all bars will be scaled to the same length. If set to `False', the default, the bar's length is determined by the length of each segment.
Specifies which processors to include in the graph. Setting `All', the default, indicates that all processors are to be included. When set to a list of integers, the corresponding processor numbers are included in the graph. Processors are numbered from 0 to N-1, where N is the number of processors. The list can be in any order and can contain repeated elements; bars will be plotted in the order specified by the list.
Example: UtilGraph[PlotSet->{1,3,20}] plots a graph for processors 1, 3, and 20.
The width of the bars used in the graph. When set to 1 bars will touch; when set to 0.5 the space between the bars will be the same as the bar width.
A histometric records statistical information about the values (samples) of a simulation quantity, for example message latency. See section Histogram Metrics for more information on histometrics. The information includes a histogram, the number of samples, the sum, and the sum of squares. A single histometric can collect data about several sets of variables, for example the message latency of messages arriving at each processor.
Histometrics are used by the Proteus L engine to collect data, for example the data used in the utilization graph. They can also be used by user code.
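The number of samples, the sum, and the sum of squares are enough to recover a mean and a variance for each set without storing the samples. The C sketch below illustrates this; the `SetStats' type and its functions are hypothetical and are not the Proteus or Mstats interfaces, and the variance shown is the population variance.

```c
/* Hypothetical running totals for one set of a histometric. */
typedef struct {
    unsigned long count;   /* number of samples */
    double        sum;     /* sum of samples */
    double        sum_sq;  /* sum of squared samples */
} SetStats;

/* Fold one sample into the running totals. */
static void stats_add(SetStats *s, double value)
{
    s->count += 1;
    s->sum += value;
    s->sum_sq += value * value;
}

/* Mean of the samples seen so far (0 if there are none). */
static double stats_mean(const SetStats *s)
{
    return s->count ? s->sum / (double)s->count : 0.0;
}

/* Population variance: E[x^2] - (E[x])^2 (0 if there are no samples). */
static double stats_variance(const SetStats *s)
{
    if (s->count == 0)
        return 0.0;
    double mean = stats_mean(s);
    return s->sum_sq / (double)s->count - mean * mean;
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the totals are count 8, sum 40, and sum of squares 232, giving a mean of 5 and a variance of 4.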
The command HistoSummary[] lists summary information about each histometric found in the trace.
The command HistoGraph[name] plots data for histometric name. There are functions which return the data within a histometric. For example HistoSetALN[name] returns the number of samples which are too large for the maximum bin for each set. See section Sampling Data for a description of the data within a histometric.
HistoGraph has options BarWidth, ColorFunction, EdgeStyle, FirstIsBar, AvgBarRange, DataAverages, AverageMarker, SegmentTicSize, BarPartStyle, BarPartSize, SetLabels, PlotSet.
Example: HistoGraph["User Time"] displays histograms showing the amount of time each processor is in user mode.
Example: HistoGraph["Message Latency",ColorFunction->(Hue[0.1,#,0.3+#/2]&)]
Example: HistoGraph["Idle Time",AvgBarRange->{0.1,0.2}] plots information about processor idle time with the means shown with bars from 10 to 20% of the plot width.
Example: HistoGraph["Idle Time",BarPartSize->0.75] plots a histograph in which the composite histogram is 3/4 the size of the individual histograms.
For example, suppose the range of values anticipated was from 0 to 30, and that 3 bins were requested. Further, suppose 1000 samples were encountered, 999 of these fell within these three bins and one sample had a value of 731. Under the Uniform setting four bins would be drawn, all the same size. Under the Proportional setting the first three bins would be the same size, but the fourth one would be much larger, perhaps making the first three bins unacceptably small.
Element [[i,j]] of the returned value is the number of samples in bin j of set i.
When the TraceLoad command is issued Mstats loads metric, array metric, and parameter data; event data is not loaded. Metric data can be viewed using the function pd["Metric Name"]. If "Metric Name" is the name of a loaded metric then the function will return the value of the metric. For array metrics it will return a list. Similarly, the value of parameters (those defined in files with extension `.param') is returned by function pdp["Parameter Name"].
A list of metric names can be obtained using the function pd[MetricList]. Similarly a list of array metric names and parameter names can be obtained using the functions pd[ArrayMetricList] and pdp[ParameterList] respectively.
If the argument is MetricList, returns a list of all metric names; if the argument is ArrayMetricList, returns a list of all array metric names. Metric names are defined in Proteus code using functions such as new_metric and newHistoMetric.
ParameterList returns a list of all parameter names. Parameter names of real parameters are found in Proteus files with extension .param. Other parameter names are based on the names and version numbers (or modification times) of files used to build the simulator.