LSU EE 7700-2 -- Fall 2003 -- Multiprocessors, Etc.

Final Exam Review


For Each Technique
 What kind of program behavior is exploited?
  Write a partisan example (one constructed to favor the technique).
 How does it improve execution time?
  Show execution with and without technique.
 Describe tables needed by technique.
  How are the tables indexed?
  Describe each field in a table entry.
  When is the table read?
  When is the table written?
 How do techniques X and Y compare?
  Easy question: 
   Assuming perfect stride prediction, which is better, sequential or stride prefetch?
    Ans: Stride, since sequential is just the stride-1 (next-line) special case.
  Harder question:
   Suppose all cache misses were caused by instructions accessing stride sequences.
   Under what circumstances would pre-execution be better than stride prefetch,
    and vice versa?
   Ans: Pre-ex better when there is a long gap between the loads that miss, so a
    stride-prefetched line may be evicted before it is used, while a p-thread can be
    spawned closer to the load.  Stride better when many control paths lead to the
    load, since a p-thread would have to be built for each path.
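
    A pair of partisan loops illustrating the two cases (an illustrative sketch;
    STRIDE, slow_function, and the loop bodies are made up):

     #define STRIDE 16                        /* Elements per cache line, say.  */

     extern double slow_function( double v ); /* Lots of work per call.         */

     /* Favors pre-execution: a long gap between the loads that miss, so a line
        prefetched by the stride unit at one miss may be evicted before the next
        strided access; a p-thread can be spawned a controlled distance ahead.  */
     double
     sum_long_gap( double *a, int n )
     {
       double sum = 0;
       for ( int i = 0;  i < n;  i += STRIDE ) sum += slow_function( a[ i ] );
       return sum;
     }

     /* Favors stride prefetch: many control paths lead to the single load of
        a[i], so a p-thread would be needed for each path, while the stride
        predictor needs only the load PC and the constant stride.               */
     double
     sum_many_paths( double *a, int *sel, int n )
     {
       double sum = 0;
       for ( int i = 0;  i < n;  i += STRIDE )
         {
           double scale = sel[ i ] == 0 ? 1.0 : sel[ i ] == 1 ? -1.0 : 0.5;
           sum += scale * a[ i ];             /* One load PC, fixed stride.     */
         }
       return sum;
     }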
   

Execution Limits

 Dataflow Graph
  Draw one for a given program execution.
  Explain how to determine execution bound and ILP.
  Explain various reasons why the bound is not realizable.
  Explain various ways of executing faster than the bound.
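
  For example (a made-up fragment):

     /* Made-up fragment; assume one-cycle operations and cache hits.           */
     void
     dfg_example( int *a, int *b, int i )
     {
       int r1 = a[ i ];       /* I1: no inputs from this fragment.              */
       int r2 = r1 + 10;      /* I2: depends on I1.                             */
       int r3 = r1 * 3;       /* I3: depends on I1, not on I2.                  */
       int r4 = r2 + r3;      /* I4: depends on I2 and I3.                      */
       b[ i ] = r4;           /* I5: depends on I4.                             */
     }
     /* Dataflow graph:  I1 -> I2 -> I4 -> I5   and   I1 -> I3 -> I4.
        The longest (critical) path holds 4 instructions, so with unit latencies
        the execution bound is 4 cycles; ILP = 5 instructions / 4 cycles = 1.25.
        The bound may be unreachable (finite fetch width, cache misses, branch
        mispredictions); value prediction or re-use can beat it by breaking the
        r1 -> r2 -> r4 chain.                                                   */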

Simulation

 Instrumentation
  Event registers.
   Explain how event registers can be used to determine branch
   prediction accuracy.
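
   A minimal sketch, assuming the hardware provides event registers counting
   retired and mispredicted branches (read_event_register and the event numbers
   are hypothetical stand-ins):

     #include <stdio.h>
     #include <stdint.h>

     extern uint64_t read_event_register( int event );  /* Hypothetical.        */
     #define EV_BRANCHES 0    /* Conditional branches retired (assumed event).  */
     #define EV_MISPRED  1    /* Conditional branches mispredicted (assumed).   */

     void
     report_bp_accuracy( void (*workload)( void ) )
     {
       uint64_t br0 = read_event_register( EV_BRANCHES );
       uint64_t mp0 = read_event_register( EV_MISPRED );

       workload();                           /* Run the code being measured.    */

       uint64_t br = read_event_register( EV_BRANCHES ) - br0;
       uint64_t mp = read_event_register( EV_MISPRED  ) - mp0;

       if ( br ) printf( "Branch prediction accuracy: %.4f\n",
                         1.0 - (double) mp / (double) br );
     }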

 Simulation Types: Behavioral and Timing
  Know what kind of information they provide.

 Simulators
  Shade
  SimpleScalar
  SimOS
 
CTI Prediction

 Hybrid Branch Predictors
  Key Idea: Select the best component predictor for each branch.
  Show an example in which a hybrid local / gshare outperforms either.
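
  One such partisan example, made up for review (work_a/b/c and noisy are stand-ins):

     extern void work_a( void ), work_b( void ), work_c( void );
     extern int noisy( int i );              /* Data-dependent, roughly random. */

     void
     hybrid_example( int *x, int n )
     {
       int flip = 0;
       for ( int i = 0;  i < n;  i++ )
         {
           if ( x[ i ] > 0 ) work_a();  /* B1: outcome as random as x[i].       */
           if ( noisy( i ) ) work_b();  /* B2: ~random, pollutes global history.*/

           /* B3 repeats B1's condition.  Gshare predicts it well because B1's
              outcome is still in the GHR; B3's own (local) history is random.  */
           if ( x[ i ] > 0 ) work_c();

           /* B4 alternates T,N,T,N,...  Its local history predicts it
              perfectly; by the time B4 executes the GHR is dominated by the
              random outcomes of B1-B3, so a small gshare does poorly on it.    */
           flip = !flip;
           if ( flip ) work_a();
         }
       /* A hybrid's chooser learns gshare for B3 and local for B4, so the
          hybrid outperforms either component used alone.                       */
     }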

 Bi-Mode and YAGS Predictor
  Note: The bi-mode and bimodal predictors are completely different things.
  YAGS Key Idea: Use pattern history only for branches that are exceptions to the
   bimodal (bias) prediction.
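
  A minimal sketch of the YAGS prediction path (table sizes, tag width, and names
  are assumptions; the update path, which allocates an exception entry when the
  bias mispredicts, is omitted):

     #include <stdint.h>
     #include <stdbool.h>

     #define LOG_SIZE 12                        /* Table size, assumed.         */
     #define MASK     ( ( 1u << LOG_SIZE ) - 1 )
     #define TAG_MASK 0x3f                      /* 6-bit tags, assumed.         */

     typedef struct { uint8_t tag; uint8_t ctr; } Exc_Entry;  /* 2-bit counter. */

     static uint8_t   choice  [ 1 << LOG_SIZE ];  /* Bimodal bias counters.     */
     static Exc_Entry t_cache [ 1 << LOG_SIZE ];  /* Taken exceptions to an     */
     static Exc_Entry nt_cache[ 1 << LOG_SIZE ];  /* NT bias, and vice versa.   */

     bool
     yags_predict( uint32_t pc, uint32_t ghr )
     {
       bool     bias = choice[ pc & MASK ] >= 2;   /* Read on every prediction. */
       uint32_t idx  = ( pc ^ ghr ) & MASK;        /* Exception caches indexed  */
       uint8_t  tag  = pc & TAG_MASK;              /* by PC xor GHR, PC-tagged. */

       Exc_Entry *e = bias ? &nt_cache[ idx ] : &t_cache[ idx ];
       if ( e->tag == tag ) return e->ctr >= 2;    /* Exception entry hit.      */
       return bias;                                /* Otherwise follow bias.    */
     }
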
 Multiple GHR Length Predictors
  Key Idea: Find the GHR length that predicts best.
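
  A made-up example of why no single GHR length suits every branch, which is what
  multiple-GHR-length predictors try to discover:

     extern void work( void );

     void
     ghr_length_example( int *x, int n )
     {
       for ( int i = 0;  i < n;  i++ )
         {
           /* Exit branch of this loop repeats the pattern T,T,N; two or three
              bits of history are enough, and a longer GHR only dilutes it.     */
           for ( int j = 0;  j < 3;  j++ ) work();

           /* The last branch below repeats the first one's condition, but many
              unrelated branches execute in between, so the correlation is only
              visible to a predictor whose GHR is long enough to still hold it. */
           if ( x[ i ] > 0 ) work();
           for ( int j = 0;  j < 16;  j++ )
             if ( ( i ^ j ) & 1 ) work();
           if ( x[ i ] > 0 ) work();
         }
     }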

Trace Caching, MBP

 Problems in predicting multiple branches per cycle.
 MBP
  Predicting multiple branches.
  Multiported instruction cache.
 Trace Cache
  Predicting next trace.
  Trace construction and use.

Prefetch
 Basic steps in prefetch.
 Sequential and Stride Techniques
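
 A minimal sketch of a stride prefetcher's reference prediction table (sizes,
 field widths, and the issue_prefetch hook are assumptions), showing how the
 table is indexed, what each entry holds, and when it is read and written:

     #include <stdint.h>

     #define RPT_BITS 10
     #define RPT_SIZE ( 1 << RPT_BITS )

     typedef struct {
       uint64_t tag;          /* Load PC.                                       */
       uint64_t last_addr;    /* Last data address referenced by the load.      */
       int64_t  stride;       /* Predicted stride.                              */
       uint8_t  state;        /* Confidence: 0 = init ... 2+ = steady.          */
     } RPT_Entry;

     static RPT_Entry rpt[ RPT_SIZE ];

     extern void issue_prefetch( uint64_t addr );     /* Hypothetical hook.     */

     /* Called each time a load commits; the entry, indexed by the load PC, is
        read and then written here.                                             */
     void
     rpt_access( uint64_t pc, uint64_t addr )
     {
       RPT_Entry *e = &rpt[ ( pc >> 2 ) & ( RPT_SIZE - 1 ) ];

       if ( e->tag != pc )
         { /* New load: allocate, no prediction yet.                            */
           e->tag = pc;  e->last_addr = addr;  e->stride = 0;  e->state = 0;
           return;
         }

       int64_t stride = (int64_t)( addr - e->last_addr );
       if ( stride && stride == e->stride )
         { if ( e->state < 3 ) e->state++; }         /* Stride confirmed.       */
       else
         { e->stride = stride;  e->state = 0; }      /* Re-train.               */

       e->last_addr = addr;

       if ( e->state >= 2 )                          /* Steady: prefetch ahead. */
         issue_prefetch( addr + e->stride );
     }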

Pre-Execution
 Construction of p-thread.
 Execution of p-thread.
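
 An illustrative sketch (made up): a main-thread loop with a delinquent load, and
 the p-thread holding just the load's backward slice:

     typedef struct node { struct node *next; int val; } Node;

     extern int expensive( int v );      /* Long gap between the misses.        */

     /* Main thread: the loads of p->next / p->val miss on most nodes.          */
     int
     main_thread( Node *head )
     {
       int sum = 0;
       for ( Node *p = head;  p;  p = p->next )
         sum += expensive( p->val );
       return sum;
     }

     /* P-thread: only the backward slice of the delinquent load, i.e. the
        pointer chase itself.  Spawned (say) when main_thread enters the loop,
        it runs a bounded distance ahead so each node's cache line is resident
        by the time the main thread reaches it.                                 */
     void
     p_thread( Node *head, int max_ahead )
     {
       volatile int sink = 0;            /* Keep the loads from being elided.   */
       Node *p = head;
       for ( int i = 0;  p && i < max_ahead;  i++, p = p->next )
         sink += p->val;
     }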

Control Independence Exploitation
 Skipper
 IMT / Multiscalar

Critical Path Compression
 Data Prediction
 Value Re-Use
 Dynamic Optimization (rePlay, Optimization Cache)
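
 For value re-use, a minimal sketch of a re-use table for a one-operand operation
 (sizes and interface are assumptions); on a hit with a matching operand the old
 result is used without executing, shortening the dependence chain through the
 instruction:

     #include <stdint.h>
     #include <stdbool.h>

     #define RU_BITS 10
     #define RU_SIZE ( 1 << RU_BITS )

     typedef struct {
       bool     valid;
       uint64_t pc;          /* Instruction whose result is cached.             */
       uint64_t operand;     /* Input value seen last time.                     */
       uint64_t result;      /* Result produced for that input.                 */
     } Reuse_Entry;

     static Reuse_Entry ru[ RU_SIZE ];

     /* Read at issue: same instruction, same operand -> reuse the old result.  */
     bool
     reuse_lookup( uint64_t pc, uint64_t operand, uint64_t *result )
     {
       Reuse_Entry *e = &ru[ ( pc >> 2 ) & ( RU_SIZE - 1 ) ];
       if ( e->valid && e->pc == pc && e->operand == operand )
         { *result = e->result;  return true; }
       return false;
     }

     /* Written when the instruction completes normally.                        */
     void
     reuse_update( uint64_t pc, uint64_t operand, uint64_t result )
     {
       Reuse_Entry *e = &ru[ ( pc >> 2 ) & ( RU_SIZE - 1 ) ];
       e->valid = true;  e->pc = pc;  e->operand = operand;  e->result = result;
     }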

SMTs
 Execution of multi-threaded code.

Parallel Computation
 Communication Models
  Message passing v. shared memory.
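
  A side-by-side sketch of the two models (illustrative; the shared-memory half
  assumes two threads and C11 atomics, the message-passing half uses standard MPI
  send/receive between two ranks):

     #include <stdatomic.h>
     #include <mpi.h>

     /* Shared memory: the producer writes a location both threads can address;
        the consumer synchronizes through a flag and reads it.  Communication is
        implicit in ordinary loads and stores (and the coherence traffic they
        generate).                                                              */
     int        shared_value;
     atomic_int shared_ready;

     void producer_shm( int v )
     { shared_value = v;
       atomic_store_explicit( &shared_ready, 1, memory_order_release ); }

     int consumer_shm( void )
     { while ( !atomic_load_explicit( &shared_ready, memory_order_acquire ) );
       return shared_value; }

     /* Message passing: no shared locations; the value moves only through
        explicit send / receive calls, which also provide the synchronization.  */
     void producer_msg( int v )
     { MPI_Send( &v, 1, MPI_INT, /*dest*/ 1, /*tag*/ 0, MPI_COMM_WORLD ); }

     int consumer_msg( void )
     { int v;
       MPI_Recv( &v, 1, MPI_INT, /*src*/ 0, /*tag*/ 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE );
       return v; }
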
 Machine Organizations
  Multiprocessor v. cluster.
 Cache Coherence