Instruction states running TeX on an 8-way superscalar machine.

The PSE CPU Execution Visualization Tool

Description
Output
Background

These pages contain output samples from a CPU visualization tool, PSE (processor simulation elucidator), being developed by David Koppelman to support his research at the LSU department of electrical and computer engineering. Using a dataset file collected from a CPU simulation PSE allows the user to visualize the execution of instructions by the simulated processor. These visualizations can provide insight in to the dynamics of an executing program that may lead to better coding, compiling, or processor designs.

Reorder buffer contents running TeX on an 8-way superscalar machine.

The program will be released in an early form some time this summer (2003). It currently works with a modified version of the RSIM simulator but will be written to allow easy interfacing of other simulators.

Output samples decorate this page, examples showing interesting features of processor operation are linked to this page.

Instruction states running TeX on an 8-way superscalar machine.

Description

PSE reads dataset files written by specially modified CPU simulators. The dataset files contain the state transitions made by dynamic instructions as they execute and also contains other simulation variables. As currently used, state transition data is collected for every instruction executing during 1000-cycle segments, the segments might be spaced 10,000 cycles apart. Segment duration and spacing is a simulation parameter and is chosen to balance the dataset file size with the need for detailed data.

Overview Plot

PSE currently produces three kinds of graphs, an overview plot, a reorder buffer (ROB) plot, and a pipeline execution diagram (PED) plot. The overview plot (not shown on these pages) displays the execution rate, in instructions per cycle (IPC), during each segment. The segments can be ordered chronologically or in execution rate order. PSE does not currently output these plots.

Reorder Buffer Plot

The reorder buffer plot shows the contents of the reorder buffer, with time on the horizontal axis and reorder buffer position on the vertical axis. Color is used to indicate instruction status or state. A grayed color indicates an instruction that will later be squashed (due to an exception or misprediction recovery). Instructions entering the reorder buffer are shown separately along the top of the plot (in purple).

Pipeline Execution Diagram Plot

The awkwardly named pipeline execution diagram plot is a re-formated version of the ROB plot. A dynamic instruction occupies a horizontal strip, that is, its vertical position is fixed. (In the ROB plot the vertical position of an instruction is based on its location in the reorder buffer.) The data in the PED plot is also upside down compared to the ROB plot, that is new instructions appear below older ones. Some of the PED plots wrap from bottom to top, for example, the long ones used as page separators. In some PED plots white lines divide individual instructions and cycles.

Output Samples

Instruction states running TeX on an 8-way superscalar machine.

Explanatory

Simple Loop, Graph Descriptions

Handled Well by Dynamic Scheduling

Short Dependency Chain

Miss Catchup

Not Helped by Dynamic Scheduling

Worst Case Pointer Chasing

Background -- Under Construction

Though instruction sets for processors are defined for the most part using a one-instruction-at-a-time execution model processors, to achieve high performance, actually overlap the execution of instructions.

Statically Scheduled Processors

Instructions in statically scheduled pipelined processors proceed though something like an assembly line, called a pipeline. The pipeline is divided in to stages, with each stage performing a particular step in the instructions execution, such as fetch or decode.

The maximum number of instructions in flight is the product of the number of stages and the width of the pipeline.

Statically scheduled processors are orderly in that most instructions pass through the pipeline in program order. That orderliness can limit performance because the pipeline must stall for instructions that must wait at a stage for something to arrive. The stall stops the advance of instructions from the stage holding the waiting instruction back to the pipeline entrance. Because instructions must remain in-order the held-up instructions cannot go around the waiting instruction.

The instructions in a program are usually arranged, or scheduled by a conscientious programmer or a compiler to minimize such stalls. Processors are called statically scheduled because they expect to run code prepared this way. (Statically in this sense means performed before the program is run.)

Dynamically Scheduled Processors (Currently Incomplete)

Instructions enter and leave a dynamically scheduled processor in program order, but other parts of their execution occur out of order. One could think of such a processor as having three pipelines, one for instructions to enter (fetch), one to execute, and one to exit (commit). Instructions enter the fetch pipeline in program order (in reality, a predicted program order).

An instruction can enter the execution pipeline only when everything will be ready for it, and so the execution pipeline never need stall. A ready-to-execute instruction need never be held by another instruction waiting for something. Of course, there will be times when there are no instructions ready to enter the execution and so execution will not proceed at the maximum possible speed.