LSU EE 7700-2 -- Fall 2003 -- Multiprocessors, Etc.

Final Exam Review

For Each Technique

  What kind of program behavior is exploited?
    Write a partisan example.

  How does it improve execution time?
    Show execution with and without technique.

  Describe tables needed by technique.
    How are the tables indexed?
    Describe each field in a table entry.
    When is the table read?
    When is the table written?

  How do technique X and Y compare?

    Easy question:
      Assuming perfect stride prediction, which is better, sequential
      or stride prefetch?
      Ans: Stride, since it includes sequential.

    Harder question:
      Suppose all cache misses were from instructions accessing stride
      sequences. Under what circumstances would pre-execution be
      better than stride prefetch, and vice versa?
      Ans:
        Pre-ex better: long gap between loads that miss, so prefetched
        line may be evicted.
        Stride better: many paths to load, costly to build p-thread
        for each.

Execution Limits

  Dataflow Graph
    Draw one for a given program execution.
    Explain how to determine execution bound and ILP.
    Explain various reasons why the bound is not realizable.
    Explain various ways of executing faster than the bound.

Simulation

  Instrumentation
    Event registers.
    Explain how event registers can be used to determine branch
    prediction accuracy.

  Simulation Types: Behavioral and Timing
    Know what kind of information they provide.

  Simulators
    Shade
    SimpleScalar
    SimOS

CTI Prediction

  Hybrid Branch Predictors
    Key Idea: Select best for a particular branch.
    Show an example in which a hybrid local / gshare outperforms either.

  Bi-Mode and YAGS Predictor
    Note: Bi-Mode and bimodal are completely different.
    YAGS Key Idea: Use pattern history for exceptions to bimodal
    prediction.

  Multiple GHR Length Predictors
    Key Idea: Find best GHR length.

Trace Caching, MBP

  Problems in predicting multiple branches per cycle.

  MBP
    Predicting multiple branches.
    Multiported instruction cache.

  Trace Cache
    Predicting next trace.
    Trace construction and use.

Prefetch

  Basic steps in prefetch.
  Sequential and Stride Techniques

Pre-Execution

  Construction of p-thread.
  Execution of p-thread.

Control Independence Exploitation

  Skipper
  IMT / Multiscalar

Critical Path Compression

Data Prediction

Value Re-Use

Dynamic Optimization (rePlay, Optimization Cache)

SMTs

  Execution of multi-threaded code.

Parallel Computation

  Communication Models
    Message passing v. shared memory.

  Machine Organizations
    Multiprocessor v. cluster.

  Cache Coherence
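
Worked sketch for the easy question under "For Each Technique" (stride prefetch includes sequential). Everything here is an assumption made for illustration, not course material: a 64-byte line, a cache that never evicts, prefetches that arrive instantly, and the hypothetical helpers count_misses, sequential, and perfect_stride. "Perfect stride prediction" is modeled by simply handing the prefetcher the true stride.

```python
LINE_SIZE = 64  # bytes per cache line (assumed value)


def count_misses(addresses, prefetch):
    """Count demand misses in an idealized cache.

    Assumptions: unlimited capacity (no evictions) and prefetched
    lines arrive before the next access.
    """
    cached = set()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in cached:
            misses += 1
            cached.add(line)
        # Let the prefetcher bring in whatever it predicts.
        cached.update(prefetch(addr))
    return misses


def sequential(addr):
    """Sequential prefetch: always fetch the next line."""
    return {addr // LINE_SIZE + 1}


def perfect_stride(stride):
    """Stride prefetch with the true stride known in advance."""
    def prefetch(addr):
        return {(addr + stride) // LINE_SIZE}
    return prefetch


def strided_accesses(stride, count=100):
    """Addresses 0, stride, 2*stride, ... (count of them)."""
    return [i * stride for i in range(count)]


unit = strided_accesses(LINE_SIZE)      # stride of one line
wide = strided_accesses(3 * LINE_SIZE)  # stride of three lines

seq_unit = count_misses(unit, sequential)
str_unit = count_misses(unit, perfect_stride(LINE_SIZE))
seq_wide = count_misses(wide, sequential)
str_wide = count_misses(wide, perfect_stride(3 * LINE_SIZE))

print("one-line stride:   sequential", seq_unit, "stride", str_unit)
print("three-line stride: sequential", seq_wide, "stride", str_wide)
```

With a stride of one line, both schemes leave only the first cold miss. With a stride of three lines, sequential prefetch fetches lines that are never used, while the stride prefetcher still covers every access after the first. The no-eviction assumption hides the case raised in the harder question, where a prefetched line is evicted before it is used.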