LSU EE 7700-2 -- Fall 2003 -- Multiprocessors, Etc.
Final Exam Review
For Each Technique
What kind of program behavior is exploited?
Write a partisan example.
How does it improve execution time?
Show execution with and without technique.
Describe tables needed by technique.
How are the tables indexed?
Describe each field in a table entry.
When is the table read?
When is the table written?
How do techniques X and Y compare?
Easy question:
Assuming perfect stride prediction, which is better, sequential or stride prefetch?
Ans: Stride, since it includes sequential.
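The answer above can be checked with a minimal sketch. The 64-byte line size and both function names are assumptions made for illustration; "perfect" stride prediction is modeled by passing in the true stride.

```python
LINE_SIZE = 64  # bytes per cache line (assumed value)

def sequential_prefetch(miss_addr, degree=1):
    """Next-line prefetch: always fetch the line(s) following the miss."""
    line = miss_addr // LINE_SIZE
    return [(line + i) * LINE_SIZE for i in range(1, degree + 1)]

def stride_prefetch(addr, stride, degree=1):
    """Stride prefetch with the stride assumed to be known exactly."""
    return [addr + i * stride for i in range(1, degree + 1)]

# With a stride of one line, stride prefetch issues the same addresses
# as sequential prefetch.
same = stride_prefetch(0x1000, LINE_SIZE, 2) == sequential_prefetch(0x1000, 2)

# With a larger stride, sequential prefetch fetches a line that the
# access sequence never touches, while stride prefetch stays on target.
far = stride_prefetch(0x1000, 256)
near = sequential_prefetch(0x1000)
```

Here `same` is true, while `far` and `near` differ: sequential prefetch is the special case of stride prefetch in which the stride is one line.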
Harder question:
Suppose all cache misses were from instructions accessing stride sequences.
Under what circumstances would pre-execution be better than stride prefetch,
and vice versa?
Ans: Pre-ex better: long gap between loads that miss, so a stride-prefetched
line may be evicted before it is used. Stride better: many paths to the load,
so building a p-thread for each is costly.
Execution Limits
Dataflow Graph
Draw one for a given program execution.
Explain how to determine execution bound and ILP.
Explain various reasons why the bound is not realizable.
Explain various ways of executing faster than the bound.
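A minimal sketch of how the execution bound and ILP can be read off a dataflow graph. It assumes unit latency for every instruction and counts only true (read-after-write) register dependences; memory and control dependences are ignored. The trace format and the function name are hypothetical.

```python
def dataflow_bound(trace, latency=1):
    """trace: list of (dest_reg, [src_regs]) in program order.

    Returns (bound, ilp): bound is the length of the longest dependence
    chain in cycles, and ilp is instruction count divided by bound.
    """
    ready = {}   # register -> cycle at which its latest value is available
    bound = 0
    for dest, srcs in trace:
        # An instruction can start once all of its sources are ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + latency
        if dest is not None:
            ready[dest] = finish
        bound = max(bound, finish)
    ilp = len(trace) / bound if bound else 0.0
    return bound, ilp

trace = [
    ("r1", []),            # r1 = load
    ("r2", []),            # r2 = load
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r3", "r1"]),  # r4 = r3 * r1
    ("r5", ["r2", "r2"]),  # r5 = r2 + r2
    ("r6", ["r4", "r5"]),  # r6 = r4 + r5
]
bound, ilp = dataflow_bound(trace)   # bound == 4, ilp == 1.5
```

The longest chain is r1, r3, r4, r6, so six instructions cannot finish in fewer than four cycles under these assumptions.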
Simulation
Instrumentation
Event registers.
Explain how event registers can be used to determine branch
prediction accuracy.
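A minimal sketch of the calculation, assuming the processor provides one event register counting executed branches and another counting mispredicted branches. The counter names and the snapshot dictionaries are hypothetical; real event names and the way they are read vary by processor.

```python
def branch_prediction_accuracy(before, after):
    """Compute accuracy over a region from two counter snapshots.

    before, after: dicts holding the (hypothetical) event register values
    "branches" and "mispredicts", sampled before and after the region.
    """
    branches = after["branches"] - before["branches"]
    mispredicts = after["mispredicts"] - before["mispredicts"]
    if branches == 0:
        return None   # no branches executed, accuracy undefined
    return 1.0 - mispredicts / branches

before = {"branches": 1000, "mispredicts": 50}
after = {"branches": 11000, "mispredicts": 550}
acc = branch_prediction_accuracy(before, after)   # about 0.95
```

Taking the difference of two snapshots isolates the region of interest from whatever ran earlier.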
Simulation Types: Behavioral and Timing
Know what kind of information they provide.
Simulators
Shade
SimpleScalar
SimOS
CTI Prediction
Hybrid Branch Predictors
Key Idea: Select best for a particular branch.
Show an example in which a hybrid local / gshare outperforms either.
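One way to build such an example is the sketch below. Everything in it is an assumption chosen for illustration: a 4-bit GHR, 4-bit local histories, untagged dictionary-based tables, a per-branch 2-bit chooser, and a three-branch loop body. Branch C is random, branch B always goes the same way as C (good for gshare, bad for local), and branch A repeats T T N (good for local, but its pattern is only partly visible in the short GHR).

```python
import random

def says_taken(table, key):
    """Read a 2-bit counter (default weakly not taken)."""
    return table.get(key, 1) >= 2

def bump(table, key, up):
    """Saturating increment or decrement of a 2-bit counter."""
    c = table.get(key, 1)
    table[key] = min(3, c + 1) if up else max(0, c - 1)

class Local:
    def __init__(self, hist_bits=4):
        self.mask = (1 << hist_bits) - 1
        self.hist = {}   # PC -> local history
        self.pht = {}    # (PC, local history) -> 2-bit counter
    def predict(self, pc):
        return says_taken(self.pht, (pc, self.hist.get(pc, 0)))
    def update(self, pc, taken):
        h = self.hist.get(pc, 0)
        bump(self.pht, (pc, h), taken)
        self.hist[pc] = ((h << 1) | int(taken)) & self.mask

class Gshare:
    def __init__(self, ghr_bits=4, index_bits=10):
        self.ghr_mask = (1 << ghr_bits) - 1
        self.idx_mask = (1 << index_bits) - 1
        self.ghr = 0
        self.pht = {}    # (PC xor GHR) -> 2-bit counter
    def _index(self, pc):
        return ((pc >> 2) ^ self.ghr) & self.idx_mask
    def predict(self, pc):
        return says_taken(self.pht, self._index(pc))
    def update(self, pc, taken):
        bump(self.pht, self._index(pc), taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.ghr_mask

class Hybrid:
    def __init__(self):
        self.local, self.gshare = Local(), Gshare()
        self.choice = {}   # PC -> 2-bit counter, >= 2 means use gshare
    def predict(self, pc):
        if says_taken(self.choice, pc):
            return self.gshare.predict(pc)
        return self.local.predict(pc)
    def update(self, pc, taken):
        pl, pg = self.local.predict(pc), self.gshare.predict(pc)
        if pl != pg:   # train the chooser only when the components disagree
            bump(self.choice, pc, pg == taken)
        self.local.update(pc, taken)
        self.gshare.update(pc, taken)

def make_trace(iterations=3000, seed=1):
    rng = random.Random(seed)
    pattern = [True, True, False]
    trace = []
    for i in range(iterations):
        c = rng.random() < 0.5
        trace.append((0x300, c))               # C: effectively random
        trace.append((0x200, c))               # B: same direction as C
        trace.append((0x100, pattern[i % 3]))  # A: repeats T T N
    return trace

def accuracy(predictor, trace):
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)
```

Running `accuracy` on the same trace with `Local()`, `Gshare()`, and `Hybrid()` should show the hybrid ahead of both components, because its chooser settles on local for A and on gshare for B. The result depends on the short GHR: a longer one would let gshare capture A's pattern too.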
Bi-Mode and YAGS Predictor
Note: Bi-Mode and bimodal are completely different.
YAGS Key Idea: Use pattern history for exceptions to bimodal prediction.
Multiple GHR Length Predictors
Key Idea: Find best GHR length.
Trace Caching, MBP
Problems in predicting multiple branches per cycle.
MBP
Predicting multiple branches.
Multiported instruction cache.
Trace Cache
Predicting next trace.
Trace construction and use.
Prefetch
Basic steps in prefetch.
Sequential and Stride Techniques
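A minimal sketch of a stride prefetch table, useful for practicing the table questions (indexing, fields, when read, when written). The entry count, the confidence threshold of 2, and all names are assumptions, not a particular published design.

```python
class StrideTable:
    def __init__(self, entries=64):
        self.entries = entries
        # Each entry: tag (load PC), last_addr, stride, conf (0..3).
        self.table = [None] * entries

    def access(self, pc, addr):
        """Called when a load executes: the table is both read and written.

        Returns an address to prefetch, or None if not yet confident.
        """
        idx = (pc >> 2) % self.entries   # indexed by load PC
        e = self.table[idx]
        if e is None or e["tag"] != pc:
            # First sighting of this load (or a conflict): allocate an entry.
            self.table[idx] = {"tag": pc, "last_addr": addr,
                               "stride": 0, "conf": 0}
            return None
        stride = addr - e["last_addr"]
        if stride == e["stride"] and stride != 0:
            e["conf"] = min(3, e["conf"] + 1)   # same stride seen again
        else:
            e["conf"] = 0                       # stride changed, retrain
            e["stride"] = stride
        e["last_addr"] = addr
        return addr + e["stride"] if e["conf"] >= 2 else None

rpt = StrideTable()
issued = [rpt.access(0x400, a) for a in (1000, 1016, 1032, 1048, 1064)]
# issued == [None, None, None, 1064, 1080]
```

The first access allocates the entry, the second records the stride, and the next two confirm it, so prefetches begin on the fourth access in this sketch.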
Pre-Execution
Construction of p-thread.
Execution of p-thread.
Control Independence Exploitation
Skipper
IMT / Multiscalar
Critical Path Compression
Data Prediction
Value Re-Use
Dynamic Optimization (rePlay, Optimization Cache)
SMTs
Execution of multi-threaded code.
Parallel Computation
Communication Models
Message passing v. shared memory.
Machine Organizations
Multiprocessor v. cluster.
Cache Coherence