LSU EE 7700-1 -- CA Research Methods -- Spring 2006


Processor Idea Evaluation

 Evaluate Three Things

  System Performance: performance of whole system.
  Idea Performance: performance of idea, ignoring rest of system.
  Concept Performance: is it really working? Is it reaching its full potential?

System Performance

 Goal: Estimate improvement in overall system performance.

 Performance Measures
  Execution time. 
   Most common.
   Reported as speedup over a base system.
  Power
  Energy

 System Variations
   Base: A typical system.  
   Idea: System with new idea.
   Other: Systems with similar or competing research ideas.
   Denote the execution times of these systems t_base, t_idea, and t_other.

   Speedup of idea:  t_base / t_idea.
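   For example (made-up numbers): if t_base = 100 s and t_idea = 80 s, then
   speedup = 100 / 80 = 1.25, reported as a 1.25x speedup over Base.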

 Base System Choice
   Goal: 
     A typical system that will be in use when your idea can be manufactured.
   Practice (what's done):
     As much of the ideal as possible but...
     reasonably close to what others in the literature simulate.

 System Variation Experiments
   Show how much better Idea is than Base and Others.
   But they don't show:
     Whether it works on different configurations.
     Why and how well it is working.

 System Configuration Experiments
   Show if idea is sensitive to configuration (performs better on some).
   Vary configuration parameter that idea might be sensitive to (see examples).
   Plot or list speedup v. configuration parameter (e.g., cache size).
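
   A minimal sketch of such a sweep in Python; run_sim is a hypothetical
   stand-in for whatever simulator invocation is actually used.

     # Configuration-sweep experiment: speedup vs. cache size.
     # run_sim is hypothetical; replace it with the real simulator run.

     CACHE_SIZES_KIB = [8, 16, 32, 64, 128]

     def run_sim(system, cache_kib):
         # Hypothetical: simulate the given system variation (Base or
         # Idea) at the given cache size; return execution time in seconds.
         raise NotImplementedError("replace with real simulator invocation")

     for size in CACHE_SIZES_KIB:
         t_base = run_sim("base", cache_kib=size)
         t_idea = run_sim("idea", cache_kib=size)
         print(f"{size:4d} KiB cache: speedup {t_base / t_idea:.3f}")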

 System Configuration Experiments Example - Branch Prediction

   Vary Pipeline Depth 
     Deeper pipelines increase importance of good branch prediction.
     Deeper pipelines interfere with predictor table update...
       hurting some predictors.

   Vary Cache Size
      Smaller caches mean more misses, so more time is spent waiting on
       memory and good branch prediction is less important.
        That is, smaller caches, less speedup from Idea.

   Predictor Table Size
     Impact on predictor performance is obvious.
     Might show that Idea works better at smaller sizes, larger sizes, or all sizes.

 System Configurations Typically Tested

   Pipeline Depth
      The number of stages from when an instruction arrives to when its
       results are available.

   Memory Latency
     How long it takes to retrieve something from a memory device.

   Cache Size

   ROB (window) Size
     For dynamically scheduled (out-of-order) processors.
      The maximum number of in-flight (not yet committed) instructions the
       processor can handle at once.

   Fetch Width, Decode Width, Issue Width
     The number of instructions a processor can fetch, decode, or start execution
      at once.
     A 4-way superscalar processor has a decode width of 4.

   Number of Arithmetic and other Functional Units

   Idea Size
      How much chip area is given to the idea. Ideally, true area, but often
      the number of bits of storage needed for the idea.

 Benchmark Sensitivity
   
   Benchmark Choices
      Programs that "narrow" target users are expected to run.
        These are programs Idea works well on.
        Example: crafty (chess), go (go).
      Programs that "broad" target users are expected to run.
        These are a large class of programs Idea works well on.
        Example: SPEC CPU integer programs. (Includes crafty.)
     Programs other researchers use. (For comparison.)

   Reporting Results
     For at least one configuration, show benchmarks individually.
      Might separately show results for the narrow and broad benchmark sets.

     
Idea Performance

 How well does the idea, not the whole system, perform?

 Can show how well idea solves problem, ignoring how important problem is.

 Example - Branch Prediction

  Problem: branch mispredictions.
  Show branch prediction accuracy.

 Example - Cache Designs

  Problem: cache misses.
  Show cache hit ratio.
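
 Both examples above reduce to a ratio of event counts collected during
 simulation; a minimal Python sketch (the counts shown are made up):

   # Idea-performance metrics as ratios of simulator event counts.

   def prediction_accuracy(correct, total):
       # Fraction of dynamic branches predicted correctly.
       return correct / total

   def hit_ratio(hits, misses):
       # Fraction of cache accesses that hit.
       return hits / (hits + misses)

   print(prediction_accuracy(9_420_000, 10_000_000))  # 0.942
   print(hit_ratio(880_000, 120_000))                 # 0.88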
 

Concept Performance

 This section is for Ideas that strongly depend on program and system behavior.
  Does: (Concept performance as described here does apply.)
    Branch predictor, cache replacement policy.
  Does not: (Concept performance as described here does not apply.)
    Faster division hardware. 
      Improvement in division speed is not affected by the numbers being divided.
      Overall speedup is affected by the number of divides in a benchmark, but
       that's system performance, not concept performance.

 Concept
   Concept is the particular behavior or situation the idea targets, and how
    the idea exploits it.

   Example - Local Predictor
    Behavior: Repeating outcome patterns that many branches might have.
    Exploitation: Use a BHT (branch history table) to remember each branch's
     recent outcome pattern and a PHT (pattern history table) to remember the
     outcome that follows each pattern.
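
    A minimal Python sketch of this exploitation, assuming a PC-indexed BHT
    and a PHT of 2-bit saturating counters; table sizes and indexing are
    illustrative, not from any particular design.

      # Local branch predictor sketch. The BHT holds each branch's recent
      # outcome pattern; the PHT remembers, for each pattern, the outcome
      # that usually follows it.

      BHT_ENTRIES = 1024                  # one local history per entry
      HIST_BITS   = 10                    # pattern (local history) length
      HIST_MASK   = (1 << HIST_BITS) - 1

      bht = [0] * BHT_ENTRIES             # local outcome histories
      pht = [2] * (1 << HIST_BITS)        # 2-bit counters, weakly taken

      def predict(pc):
          # Look up this branch's pattern, then the counter for it.
          return pht[bht[pc % BHT_ENTRIES]] >= 2   # True = predict taken

      def update(pc, taken):
          i = pc % BHT_ENTRIES
          hist = bht[i]
          # Train the counter for this pattern toward the actual outcome.
          pht[hist] = min(3, pht[hist] + 1) if taken else max(0, pht[hist] - 1)
          # Shift the actual outcome into the branch's local history.
          bht[i] = ((hist << 1) | int(taken)) & HIST_MASK

    Ignoring aliasing between branches, a branch whose outcomes repeat with
    period at most HIST_BITS eventually gives each distinct pattern its own
    counter, so after warm-up it is predicted perfectly.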
    
 Concept Performance
   How often does behavior occur?
     For local predictor, how many branches have repeating patterns?
   How well is it handled?
     Are all of them being predicted correctly?

 Measuring Concept Performance
   Method 1: (easier)
     Use an idealized version of the idea.
     For example, a local predictor with an unlimited-size BHT and a long
      local history.
   Method 2: (may be harder)
     Modify the simulator to detect the behavior directly.
     Compare the number of times Idea finds the behavior to the number of
      times the simulator does.
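
   A minimal Python sketch of Method 2 for the local-predictor concept; the
   (pc, taken) trace format and the period bound are assumptions, not part
   of any particular simulator.

     # Detect the targeted behavior directly: for each static branch,
     # check whether its outcome stream repeats with a short period.

     from collections import defaultdict

     def smallest_period(outcomes, max_period=10):
         # Smallest p with outcomes[i] == outcomes[i-p] for all i >= p,
         # i.e. the stream repeats with period p; None if no such p.
         for p in range(1, max_period + 1):
             if all(outcomes[i] == outcomes[i - p]
                    for i in range(p, len(outcomes))):
                 return p
         return None

     def repeating_branches(trace, max_period=10):
         # trace: iterable of (pc, taken) pairs from the simulator.
         streams = defaultdict(list)
         for pc, taken in trace:
             streams[pc].append(taken)
         return {pc for pc, outs in streams.items()
                 if len(outs) > max_period
                 and smallest_period(outs, max_period) is not None}

   The dynamic outcomes of branches in repeating_branches(trace) bound what
   the concept can capture; compare Idea's accuracy on those branches to
   that bound.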