LSU EE 7700-1 -- CA Research Methods -- Spring 2006

Processor Idea Evaluation

Evaluate Three Things

  System Performance: performance of the whole system.
  Idea Performance: performance of the idea, ignoring the rest of the system.
  Concept Performance: is it really working? Is it reaching its full potential?

System Performance

  Goal: Estimate improvement in overall system performance.

  Performance Measures
    Execution time. Most common. Reported as speedup over a base system.
    Power
    Energy

  System Variations
    Base: A typical system.
    Idea: System with the new idea.
    Other: Systems with similar or competing research ideas.
    Denote the execution times of these systems t_base, t_idea, and t_other.
    Speedup of idea: t_base / t_idea.

  Base System Choice
    Goal: A typical system that will be in use when your idea can be
    manufactured.
    Practice (what's done): As much of the ideal as possible, but reasonably
    close to what others in the literature simulate.

  System Variation Experiments
    Show how much better the idea is than Base and Others.
    But don't show:
      Whether it works on different configurations.
      Why and how well it is working.

  System Configuration Experiments
    Show whether the idea is sensitive to configuration (performs better on
    some).
    Vary a configuration parameter that the idea might be sensitive to (see
    examples).
    Plot or list speedup vs. configuration parameter (e.g., cache size).

  System Configuration Experiments Example - Branch Prediction
    Vary Pipeline Depth
      Deeper pipelines increase the importance of good branch prediction.
      Deeper pipelines interfere with predictor table update, hurting some
      predictors.
    Vary Cache Size
      Smaller caches mean more misses, and so branches are less important.
      That is, smaller caches, less speedup.
    Vary Predictor Table Size
      Impact on predictor performance is obvious.
      Might show that Idea works better at smaller sizes, larger sizes, or all
      sizes.

  System Configurations Typically Tested
    Pipeline Depth
      The time from when an instruction arrives to when its results are
      available.
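The speedup reported above, t_base / t_idea, is computed per benchmark and then typically summarized with a geometric mean. A minimal sketch; the benchmark names and execution times below are made up for illustration, not measured results:

```python
# Hypothetical execution times (seconds) per benchmark; real values
# would come from simulator runs of the Base and Idea systems.
t_base = {"gcc": 120.0, "crafty": 95.0, "mcf": 210.0}
t_idea = {"gcc": 100.0, "crafty": 80.0, "mcf": 200.0}

def speedup(base, idea):
    """Per-benchmark speedup: t_base / t_idea (> 1 means Idea is faster)."""
    return {b: base[b] / idea[b] for b in base}

def geomean(xs):
    """Geometric mean, the usual summary for a set of speedups."""
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1.0 / len(xs))

s = speedup(t_base, t_idea)
print({b: round(v, 3) for b, v in s.items()})
print("geomean speedup:", round(geomean(list(s.values())), 3))
```

The same script would be rerun for each Other system (giving t_other) to fill out a comparison table.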
    Memory Latency
      How long it takes to retrieve something from a memory device.
    Cache Size
    ROB (Window) Size
      For dynamically scheduled (out-of-order) processors.
      The number of instructions that a processor can handle at once.
    Fetch Width, Decode Width, Issue Width
      The number of instructions a processor can fetch, decode, or start
      executing at once. A 4-way superscalar processor has a decode width of 4.
    Number of Arithmetic and Other Functional Units
    Idea Size
      How much chip area is given to the idea. Ideally, true area, but often
      the number of bits of storage needed for the idea.

  Benchmark Sensitivity

    Benchmark Choices
      Programs the "narrow" target users are expected to run. These are
      programs the Idea works well on. Example: crafty (chess), go (go).
      Programs the "broad" target users are expected to run. These are a large
      class of programs the Idea works well on. Example: SPEC CPU integer
      programs. (Includes crafty.)
      Programs other researchers use. (For comparison.)

    Reporting Results
      For at least one configuration, show benchmarks individually.
      Might separately show results for the narrow and broad benchmark sets.

Idea Performance

  How well does the idea, not the whole system, perform?
  Can show how well the idea solves a problem, ignoring how important the
  problem is.

  Example - Branch Prediction
    Problem: branch mispredictions. Show branch prediction accuracy.

  Example - Cache Designs
    Problem: cache misses. Show cache hit ratio.

Concept Performance

  This section is for Ideas that strongly depend on program and system
  behavior.

  Does apply: (Concept performance as described here does apply.)
    Branch predictor, cache replacement policy.
  Does not apply: (Concept performance as described here does not apply.)
    Faster division hardware. The improvement in division speed is not
    affected by the numbers being divided. Overall speedup is affected by the
    number of divides in a benchmark, but that's system performance, not
    concept performance.

  Concept
    The concept is the particular behavior or situation the idea targets, and
    how it is exploited.
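The idea-performance metrics named above (branch prediction accuracy, cache hit ratio) are simple ratios of simulator event counts, separate from any system-level speedup. A minimal sketch; all counts below are made-up numbers standing in for simulator statistics:

```python
# Hypothetical raw event counts from one simulated benchmark run.
branches      = 1_000_000
mispredicts   =    42_000
cache_accesses =  500_000
cache_misses   =   25_000

# Idea-performance metrics: how well the idea attacks its problem,
# regardless of how much that problem matters to total run time.
prediction_accuracy = 1.0 - mispredicts / branches
hit_ratio           = 1.0 - cache_misses / cache_accesses

print(f"branch prediction accuracy: {prediction_accuracy:.2%}")
print(f"cache hit ratio:            {hit_ratio:.2%}")
```

A high accuracy or hit ratio here can coexist with a small system speedup, which is exactly why the two are reported separately.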
  Example - Local Predictor
    Behavior: Repeating outcome patterns that many branches might have.
    Exploitation: Use the BHT to remember the pattern and the PHT to remember
    the outcome.

  Concept Performance
    How often does the behavior occur? For the local predictor, how many
    branches have repeating patterns?
    How well is it handled? Are they all being predicted?

  Measuring Concept Performance
    Method 1: (easier) Use an idealized version of the idea. For example, a
    local predictor with an unlimited-size BHT and a large local history.
    Method 2: (may be harder) Modify the simulator to detect the behavior.
    Compare the number of times the Idea finds the behavior to the number of
    times the simulator does.
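Method 1 above can be sketched with an idealized local predictor in which Python dictionaries stand in for an unlimited-size BHT and PHT. The history length, branch PC, and outcome pattern below are arbitrary choices for illustration; the accuracy of a real, finite predictor on the same trace would then show how much of the concept's potential the real Idea captures:

```python
from collections import defaultdict

HIST_LEN = 16  # "large" local history, per Method 1

class IdealizedLocalPredictor:
    """Local predictor with unbounded tables: the BHT (keyed by branch PC)
    holds each branch's local outcome history; the PHT (keyed by PC and
    history) holds a 2-bit saturating counter."""
    def __init__(self):
        self.bht = defaultdict(tuple)       # pc -> tuple of recent outcomes
        self.pht = defaultdict(lambda: 2)   # (pc, history) -> counter, weakly taken

    def predict(self, pc):
        hist = self.bht[pc][-HIST_LEN:]
        return self.pht[(pc, hist)] >= 2    # True = predict taken

    def update(self, pc, taken):
        key = (pc, self.bht[pc][-HIST_LEN:])
        self.pht[key] = min(3, self.pht[key] + 1) if taken else max(0, self.pht[key] - 1)
        self.bht[pc] = (self.bht[pc] + (taken,))[-HIST_LEN:]

pred = IdealizedLocalPredictor()
pattern = [True, True, False] * 1000   # one branch with a repeating T,T,N pattern
correct = 0
for taken in pattern:
    correct += pred.predict(0x400) == taken
    pred.update(0x400, taken)
print(f"accuracy on repeating pattern: {correct / len(pattern):.3f}")
```

With unbounded tables the only mispredictions come from warm-up, so accuracy on a truly repeating pattern approaches 1.0; the shortfall of a size-limited predictor against this ceiling is the concept-performance gap.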