!! LSU EE 4720 -- Spring 2017 -- Computer Architecture
!
!! Compiler Optimization Lecture Notes

!! Contents
!
!  Optimization Introduction
!  Steps in Building a Program
!  Compiler Terminology
!  High-Level Optimizations
!  Low-Level Optimizations
!  Compiler Optimization Options
!  Use of Compiler Switches

!! References
!
! :HP3: Hennessy & Patterson, "Computer Architecture, a Quantitative Approach"

!! Lecture Goals
!
!  Understand the Program Building and Compilation Process
!    Describe steps in program building and optimization, including
!    intermediate files (assembler, object, ...) and tool names
!    (preprocessor, compiler, etc.).
!
!  Understand Specific Optimizations and Assumption Switches
!    Describe, work example, explain benefit.
!
!  Understand Profiling
!    Steps, how performance is improved.
!
!  Understand ISA and Implementation Options
!    How the programmer chooses them, how the compiler uses them.
!
!  Understand how Programmers Use Compilation Options
!    Normal program development, high-performance programs, SPEC disclosure.


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Optimization Introduction
!
! :HP3: Section 2.11
!
! :Def: Optimization
!   The optional steps in compiling a program that reduce the program's
!   execution time, the size of the program, the energy consumed, etc.
!
!   Typically, the only time optimization is NOT done is when a program
!   is being debugged.
!
!   In most cases, the programmer sets the overall optimization effort
!   (low, medium, high).
!
!   When performance is very important the programmer can specify which
!   specific optimizations to use.
!
! :Example:
!
! Data on a program that computes pi with and without optimization.
!
!  System: SunOS sol 5.6 Generic_105181-31 sun4u sparc SUNW,Ultra-Enterprise
!  Compiler: Sun WorkShop 6 2000/04/07 C 5.1
!  Clock Frequency: 300 MHz
!
!  Without Optimization
!    Size              : 6408 bytes
!    Run Time          : 3.00 s
!    Instruction Count : 325,338,749
!    CPI               : (/ (* 3.00 300e6) 325338749.0 ) = 2.7663
!
!  With Optimization
!    Size              : 6340 bytes    Smaller! (OK, only a tiny bit smaller.)
!    Run Time          : 1.65 s        Faster!
!    Instruction Count : 100,338,751
!    CPI               : (/ (* 1.65 300e6) 100338751.0) = 4.9333
!
!  Comparison of Un-optimized and Optimized Runs
!
!    Un-optimized run takes 1.82 times longer.
!    Un-optimized run executes 3.24 times as many instructions.
!
!    --> Execution time is not proportional to instruction count.
!        We've already seen how this can be true due to stalls.
!
!    --> In the slower version instructions seem to be executed more
!        efficiently.
!
!  Quick Explanation  (See the insn scheduling discussion below for code.)
!
!    The un-optimized version contained more easy instructions, such
!    as load and store instructions (that would hit the cache).
!
!    Both versions had the same number of floating-point divide
!    instructions, which take a long time to execute.


!! Reasons *not* to Optimize
!
!  The percentage shows roughly how often the reason applies.
!
!  95%    It makes debugging difficult, so don't optimize while debugging.
!         This is true for almost everyone that uses a debugger.
!
!  10%    It slows down compilation.
!         Only important when there is a very large amount of code to
!         recompile.  (Back in the 20th century when computers were slow
!         this was important.)
!
!  .001%  Optimization introduces bugs.
!         It does, but very rarely.


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Steps in Building a Program

! Typical Steps in Building a Program
!
!   Pre-Process, Compile, Assemble, Link, Load
!
!   pi.c -> pi.i -> pi.s -> pi.o -> pi
!
! These steps can all be performed automatically by the compiler
! driver.  (cc, gcc, MS Visual Studio, etc.)
!
! :Sample: cc pi.c -o pi
!
! They can also be specified individually.
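!
!   For example, with GCC the individual steps can be requested with
!   separate commands (a sketch, not from the original notes; flag
!   names vary by compiler):
!
!     gcc -E pi.c -o pi.i     # Pre-process only, producing pi.i.
!     gcc -S pi.i             # Compile to assembler, producing pi.s.
!     gcc -c pi.s             # Assemble, producing pi.o.
!     gcc pi.o -o pi          # Link, producing the executable pi.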
!
! More details appear below; the compile step is what we're interested in.

!! Pre-Process
!   Load include files, expand macros.
!   Typical pre-processor name: cpp
!   Input File:  pi.c
!   Output File: pi.i  (High-Level Language)

!! Compile
!   Convert high-level language into assembler.
!   Typical compiler name: cc1  (not cc, or gcc, or Visual Studio)
!   Input File:  pi.i
!   Output File: pi.s  (Assembler)

!! Assemble
!   Convert assembly language into machine language.
!   Typical assembler name: as
!   Input File:  pi.s
!   Output File: pi.o  (Object file containing machine code.)

!! Link
!   Combine object files, libraries, and other code into an executable.
!   Typical linker name: ld
!   Input File:  pi.o
!   Output File: pi

!! Load
!   Copy the executable into memory, link with shared libraries, and start it.
!   Loader name: exec system call.


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Compiler Terminology

! A program is compiled using a series of steps, or /passes/.
!
!   The first pass reads the pre-processed high-level source code.
!
!   The last pass emits assembler code.
!
!   Between the first and last passes the program is in some
!   intermediate representation.
!
!   The way passes are defined and organized varies greatly from
!   compiler to compiler.
!
! :Def: Pass
!   A step in compiling a program.  A pass usually looks at an entire function.
!
! :Def: Intermediate Representation
!   The form of a program (sort of a special-purpose language) used internally
!   by a compiler.  A compiler might use several intermediate representations.

!! Typical Passes
!
!  Parse
!    Convert the source code to a high-level intermediate representation (H-IR).
!
!  High-Level Optimization  (Optional, may be done in several passes.)
!    Also called front-end optimization.
!    Modify H-IR to improve performance or reduce code size.
!    Reads and writes H-IR.
!
!  Low-Level Intermediate Representation (L-IR) Generation
!    Convert H-IR to a low-level intermediate representation (L-IR).
!
!  Low-Level Optimization  (Optional, may be done in several passes.)
!    Also called back-end optimization.
!    Modify L-IR to improve performance or reduce code size.
!
!  Register Assignment  (Part of low-level optimization.)
!    Choose machine registers.
!
!  Code Generation
!    Convert L-IR to assembly code.
!
!  Peephole Optimizations  (Optional, may be done in several passes.)
!    These are also called low-level optimizations.
!    Modify the generated code to improve performance or reduce code size.
!    Some of these can be done at link time.
!
! :Def: Compiler Front End
!   The parts of the compiler that do the parsing and high-level
!   optimization passes.  Computer architects are less interested
!   in this part.
!
! :Def: Compiler Back End
!   The parts of the compiler that do the low-level optimization,
!   register assignment, and code generation passes.  Computer
!   architects are very interested in this part.
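!
! As an illustration (not from the original notes), one common style of
! low-level intermediate representation is three-address code, in which
! each operation has at most one operator and temporaries are made
! explicit.  The pi-program statement
!
!    sum = sum + 4.0 / i;
!
! might pass through representations roughly like:
!
!    H-IR:   assign( sum, add( sum, div( 4.0, i ) ) )
!
!    L-IR:   t1  = 4.0 / i
!            t2  = sum + t1
!            sum = t2
!
! The exact form varies greatly from compiler to compiler.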

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! High-Level Optimizations

! Easy high-level optimizations presented here.

!! Some Easy-To-Explain Front-End Optimizations
!
!   Dead-Code Elimination (DCE)
!   Common Subexpression Elimination (CSE)
!   Constant Propagation, Folding
!
! :Def: Dead-Code Elimination (DCE)
!   Removal of code which isn't used.
!   Yes, it happens.
!   This can also be a low-level optimization.
!
! :Example:
!
! Code benefiting from DCE.
!
! High-level code is shown for clarity.  Most compilers would transform
! an intermediate representation.
!
! Before:

int main(int argv, char **argc)
{
  double i;
  double sum = 0;

  for(i=1; i<50000000;)
    {
      sum = sum + 4.0 / i;  i += 2;
      sum = sum - 4.0 / i;  i += 2;
    }

  printf("Performed %d iterations.  Thank you for running me.\n",i);
}

! After:

int main(int argv, char **argc)
{
  double i;

  for(i=1; i<50000000;)
    {
      i += 2;
      i += 2;
    }

  printf("Performed %d iterations.  Thank you for running me.\n",i);
}

! Note: Other optimizations would leave only the printf.
!
! :Def: Common Subexpression Elimination (CSE)
!   Remove duplicated code.
!
! Before:

  r = ( a + b ) / ( x + y );
  s = ( a + b ) / ( x - y );

! After:

  temp = a + b;
  r = ( temp ) / ( x + y );
  s = ( temp ) / ( x - y );

! :Def: Constant Propagation, Folding
!   The compiler performs whatever arithmetic it can at compile time
!   rather than emitting code to perform the arithmetic at run time.
!
! Before:

  int sec_per_day = 60 * 60 * 24;
  int sec_per_year = sec_per_day * 365;
  some_routine(sec_per_day * x, sec_per_year * y);

! After:

  int sec_per_day = 86400;
  int sec_per_year = 31536000;
  some_routine(86400 * x, 31536000 * y);


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Low-Level Optimizations

! Some Low-Level Optimizations
!
!  This is not a complete list.
!
!   Register Assignment
!   Instruction Selection
!   Scheduling
!
! :Def: Register Assignment
!   Selection of which values will be held in registers.  Values
!   not held in registers are stored in memory.
!
!   Without Register Assignment Optimizations
!     All values corresponding to variables (in the high-level program) are
!     written to memory (and not held in registers).  Intermediate results
!     are held in registers.
!
!   With Register Assignment Optimization
!     Registers are assigned to as many variables as possible, with priority
!     given to frequently used variables.
!
!   Advantage of Register Assignment Optimization
!     Fewer memory writes and reads.
!
! :Def: Scheduling
!   Re-arranging instructions to minimize the amount of time one instruction
!   has to wait for another.
!
!   For example, if an instruction takes a long time it will be started early
!   so that other instructions will not have to wait for its result.
!
!   Scheduling will be covered in more detail later in the semester.
!
! :Example:
!
! The pi program without and with optimization.
!
! Optimizations include register assignment and scheduling.
!
! Without Optimization
!
!  10       {
!  11         sum = sum + 4.0 / i;  i += 2;
        ldd     [%fp-24],%f6    ! f6 = sum
        ldd     [%l0+0],%f4     ! f4 = 4.0
        ldd     [%fp-16],%f2    ! f2 = i
        fdivd   %f4,%f2,%f2     ! f2 = 4.0 / i
        faddd   %f6,%f2,%f2     ! f2 = sum + (4.0/i)
        std     %f2,[%fp-24]    ! sum = f2
        ldd     [%fp-16],%f4    ! f4 = i
        ldd     [%l0+8],%f2     ! f2 = 2.0
        faddd   %f4,%f2,%f2     ! f2 = i + 2.0
        std     %f2,[%fp-16]    ! i = f2
!  12         sum = sum - 4.0 / i;  i += 2;
        ldd     [%fp-24],%f6    ! f6 = sum
        ldd     [%l0+0],%f4     ! f4 = 4.0
        ldd     [%fp-16],%f2    ! f2 = i
        fdivd   %f4,%f2,%f2     ! f2 = 4.0 / i
        fsubd   %f6,%f2,%f2     ! f2 = sum - (4.0/i)
        std     %f2,[%fp-24]    ! sum = f2
        ldd     [%fp-16],%f4    ! f4 = i
        ldd     [%l0+8],%f2     ! f2 = 2.0
        faddd   %f4,%f2,%f2     ! f2 = i + 2.0
        std     %f2,[%fp-16]    ! i = f2
        ldd     [%fp-16],%f4    ! f4 = i
        ldd     [%l0-8],%f2     ! f2 = 50000000
        fcmped  %f4,%f2         ! compare i, 50000000
        nop
        fbl     .L92            ! Branch if FP comparison less than.
        nop
!
! With Optimization
!
!  10    !      {
!  11    !        sum = sum + 4.0 / i;  i += 2;
.L77000016:
/* 0x0020  11 */  fdivd   %f4,%f30,%f6   ! temp1 = 4.0 / i_2
/* 0x0024     */  faddd   %f30,%f2,%f8   ! i_1 = i_2 + 2.0
!  12    !        sum = sum - 4.0 / i;  i += 2;
/* 0x0028  12 */  faddd   %f8,%f2,%f30   ! i_2 = i_1 + 2.0
/* 0x002c     */  fcmped  %f30,%f0       ! i_2 < 50000000
/* 0x0030     */  fdivd   %f4,%f8,%f8    ! temp2 = 4.0 / i_1
/* 0x0034  11 */  faddd   %f10,%f6,%f6   ! sum = sum + temp1
/* 0x0038  12 */  fbl     .L77000016
/* 0x003c     */  fsubd   %f6,%f8,%f10   ! sum = sum - temp2
!  13    !      }
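!
! Counting instructions in the two listings above (a check, not part of
! the original notes): the un-optimized loop body has 26 instructions
! per iteration, 16 of them loads (ldd) or stores (std); the optimized
! loop body has 8 instructions and no memory accesses at all.
!
!   (/ 26.0 8) = 3.25
!
! which is consistent with the roughly 3.24x dynamic instruction-count
! ratio measured earlier.  The instructions that remain are mostly the
! slow floating-point ones (especially the divides), which is why the
! optimized version has a higher CPI even though its run time is lower.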

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Compiler Optimization Options

! Compiler options tell the compiler:
!
!   How much EFFORT to put into optimization.      E.g., -O2
!   Which PARTICULAR optimizations to perform.     E.g., -fno-strength-reduce
!   The TARGET system the code will be running on. E.g., -xultra2
!   Whether to make certain ASSUMPTIONS about the code.
!   Whether to use PROFILING data from a training run.

!! Target Options
!
!  Specify the exact type of machine the code will be run on:
!  ISA, implementation, cache.
!
!  The choice is based on the type of machines customers have.
!
!! Specifying the: ISA
!
!  The exact instruction set used.
!  Specifies not just the ISA family, but a particular variation.
!  A poor choice will limit the number of machines the code can run on.
!
!! Specifying the: Implementation (Processor Core)
!
!  Specify the implementation the code will run on.
!  A poor choice will result in slower code.
!
!! Specifying the: Cache  (Can be considered part of the ISA implementation.)
!
!  Specify the configuration of the cache.
!  Caches covered later in the semester.
!  A poor choice will result in slower code.

!! Background
!
!  ARM AArch64 Architecture
!
!    Developed by ARM.
!
!  Some Implementations
!
!    cortex-a53
!    cortex-a72
!    exynos-m1
!
! :Example:
!
! Switches for GCC
!
!  -march   Specifies the ISA.
!  -mtune   Specifies the implementation.
!  -mcpu    Specifies both ISA and implementation.
!
!     Specify the name of the target processor for which GCC should tune
!     the performance of the code.  Permissible values for this option
!     are: 'generic', 'cortex-a53', 'cortex-a57', 'cortex-a72',
!     'exynos-m1', 'thunderx', 'xgene1'.

!! Optimization EFFORT
!
! :Def: Optimization Effort
!   The amount of optimization to be done.  A small effort means performing
!   only easy optimizations, a large effort means performing more
!   time-consuming optimizations.
!
! Most compilers have optimization levels.
!
!   The higher the number, the more optimizations done.
!
! :Example:
!
! Optimization Levels for the Sun Workshop 6 Compiler
!
!  -xO[1|2|3|4|5]
!        Optimizes the object code.  Note the upper-case letter O.
!        The levels (1, 2, 3, 4, or 5) you can use differ
!        according to the platform you are using.
!
!        (SPARC)
!        -xO1  Does basic local optimization (peephole).
!
!        -xO2  Does basic local and global optimization.
!              This is induction variable elimination, local
!              and global common subexpression elimination,
!              algebraic simplification, copy propagation,
!              constant propagation, loop-invariant optimization,
!              register allocation, basic block merging, tail
!              recursion elimination, dead code elimination,
!              tail call elimination and complex expression
!              expansion.
!
!              The -xO2 level does not assign global, external,
!              or indirect references or definitions to
!              registers.  It treats these references and
!              definitions as if they were declared "volatile."
!              In general, the -xO2 level results in minimum
!              code size.
!
!        -xO3  Performs like -xO2 but also optimizes references
!              or definitions for external variables.  Loop
!              unrolling and software pipelining are also
!              performed.  The -xO3 level does not trace the
!              effects of pointer assignments.  When compiling
!              either device drivers, or programs that modify
!              external variables from within signal handlers,
!              you may need to use the volatile type qualifier
!              to protect the object from optimization.  In
!              general, the -xO3 level results in increased
!              code size.
!
!        -xO4  Performs like -xO3 but also does automatic
!              inlining of functions contained in the same
!              file; this usually improves execution speed.
!              The -xO4 level does trace the effects of
!              pointer assignments.  In general, the -xO4
!              level results in increased code size.
!              If you want to control which functions are
!              inlined, see -xinline.
!
!        -xO5  Generates the highest level of optimization.
!              Uses optimization algorithms that take more
!              compilation time or that do not have as high
!              a certainty of improving execution time.
!              Optimization at this level is more likely to
!              improve performance if it is done with profile
!              feedback.

!! PARTICULAR Optimizations
!
! The levels specify sets of optimizations (like ordering the "Sport Package"
! for a new car).
!
! In contrast to optimization levels (-O3), the compiler can be told which
! particular optimizations to make.
!
! These are typically used by skilled programmers trying to get the
! fastest code.
!
! Some examples below.
!
! gcc
!
!  `-frerun-loop-opt'
!       Run the loop optimizer twice.
!
! Sun Workshop 6
!
!  Prefetch instructions read memory in advance, eliminating some cache
!  misses (covered later in the semester).  They can also increase
!  cache misses or increase the time to access memory, hurting
!  performance.  The compiler does not know if they will help or hurt.
!
!  -xprefetch[=val]
!       (SPARC) Enable prefetch instructions on those architectures
!       that support prefetch, such as UltraSPARC II.
!       (-xarch=v8plus, v9plusa, v9, or v9a)
!       Explicit prefetching should only be used under special
!       circumstances that are supported by measurements.

!! ASSUMPTIONS (Assertions) About the Program

! Compilers must generate correct code.
!
!   That is, the code must execute in the way specified by the
!   high-level language definition.
!
! Correct code can be slow.
!
!   The compiled code might need to check for things that can happen ...
!   ... but don't in a particular program.
!
! Some options tell the compiler to make assumptions about the program.
!
!   These assumptions would not hold for every program.
!
!   The compiled program runs faster ...
!   ... and correctly if the assumptions are valid.
!
! Some switches specifying assumptions:
!
! gcc:  Assume the program does not require strict IEEE 754 FP features.
!
!  `-ffast-math'
!       This option allows GCC to violate some ANSI or IEEE rules and/or
!       specifications in the interest of optimizing code for speed.  For
!       example, it allows the compiler to assume arguments to the `sqrt'
!       function are non-negative numbers and that no floating-point
!       values are NaNs.
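!
! As a sketch of the kind of code this helps (example not in the
! original notes): with -ffast-math the compiler may replace a divide
! by a loop-invariant value with a multiply by its reciprocal, even
! though the rounded result can differ in the last bit from an
! IEEE-exact divide.

/* A minimal sketch.  With -ffast-math GCC may compute r = 1.0/scale
   once before the loop and use v[i] * r inside it, replacing a slow
   divide with a multiply.  Without the switch it must keep the
   divide, so that the exact IEEE 754 result is produced. */
void scale_all(double *v, int n, double scale)
{
  int i;
  for(i=0; i<n; i++) v[i] = v[i] / scale;
}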
!
! cc (Sun Workshop 6 Compiler):  Assume certain pointers do not overlap.
!
!  -xrestrict=f
!       (SPARC) Treats pointer-valued function parameters as
!       restricted pointers.  f is a comma-separated list that
!       consists of one or more function parameters, %all, or
!       %none.  This command-line option can be used on its
!       own, but is best used with optimization of -xO3 or
!       greater.
!
!       The default is %none.  Specifying -xrestrict is
!       equivalent to specifying -xrestrict=%all.
!
! :Example:
!
! In the loop below the compiler would ordinarily load the value at x
! from memory (dereference) each iteration (five times) because the
! address of x may be the same as the address of one of the "a"
! elements.  With the -xrestrict switch, x is not loaded each iteration,
! saving time and space.  The switch is needed because the compiler
! has no way of knowing whether x and the a's overlap and must otherwise
! make a conservative assumption (that they overlap).

void array_add(int *a, int *b, int *x)
{
  int i;
  for(i=0; i<5; i++) a[i] = b[i] + x[0];
}

! Compiled Code Without -xrestrict
!
!  24    !      int i;
!  25    !      for(i=0; i<5; i++)
!  26    !        a[i] = b[i] + *x;
!
!   %g4  i
!   %o2  x
!   %g3  &a[i]
!   %g2  &b[i]
!
/* 0x0004  26 */  ld      [%o1],%o5     ! o5 = Mem[o1]
/* 0x0008  23 */  or      %g0,%o1,%g2   ! g2 = 0 + o1
/* 0x000c  25 */  or      %g0,0,%g4     ! g4 = 0 + 0
.L900000108:                            ! The loop body starts here.
/* 0x0010  26 */  ld      [%o2],%o4     ! <-- Load x; notice that
/* 0x0014     */  add     %g4,1,%g4     !     o2 does not change.
/* 0x0018     */  add     %g2,4,%g2     ! g2 = g2 + 4
/* 0x001c     */  cmp     %g4,5
/* 0x0020     */  add     %o5,%o4,%o5   ! o5 = b[i] + x[0]
/* 0x0024     */  st      %o5,[%g3]     ! a[i] = o5
/* 0x0028     */  add     %g3,4,%g3
/* 0x002c     */  bl,a    .L900000108
/* 0x0030     */  ld      [%g2],%o5     ! o5 = b[i+1]
.L77000022:
/* 0x0034     */  retl                  ! Result =
/* 0x0038     */  nop

! Compiled Code With -xrestrict

/* 000000  23 */  or      %g0,%o0,%g4
!  24    !      int i;
!  25    !      for(i=0; i<5; i++)
!  26    !        a[i] = b[i] + *x;
!
!   %o2  x
!   %g2  *x
!   %g4  &a[i]
!   %g3  &b[i]
!
/* 0x0004  26 */  ld      [%o2],%g2     ! <-- Load x before entering
/* 0x0008  23 */  or      %g0,%o1,%g3   !     the loop.
/* 0x000c  26 */  ld      [%o1],%o5
/* 0x0010  25 */  or      %g0,0,%g1
.L900000108:                            ! The loop body starts here.
/* 0x0014  26 */  add     %o5,%g2,%o5
/* 0x0018     */  st      %o5,[%g4]
/* 0x001c     */  add     %g1,1,%g1
/* 0x0020     */  add     %g3,4,%g3
/* 0x0024     */  add     %g4,4,%g4
/* 0x0028     */  cmp     %g1,5
/* 0x002c     */  bl,a    .L900000108
/* 0x0030     */  ld      [%g3],%o5
.L77000022:
/* 0x0034     */  retl                  ! Result =
/* 0x0038     */  nop

! If array_add is compiled using the -xrestrict switch and it is
! called by the code below, it will return the wrong answer (since x
! changes value).

int main(int argv, char **argc)
{
  int a[5], b[5];
  int *x = &a[1];

  ... (More code here.)

  array_add(a,b,x);
}

! (End of example.)
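!
! (Not in the original notes.)  C99 provides a source-level way to make
! the same non-overlap assertion for individual pointers, the restrict
! qualifier, so the assumption need not be applied to a whole file:

/* A sketch.  The restrict qualifiers promise that a, b, and x do not
   overlap, so the compiler may load x[0] once before the loop, just
   as with -xrestrict.  Keeping the promise is the programmer's
   responsibility; the call from main() above would violate it. */
void array_add(int *restrict a, int *restrict b, int *restrict x)
{
  int i;
  for(i=0; i<5; i++) a[i] = b[i] + x[0];
}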

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! PROFILING
!
! :Def: Profiling  (a.k.a. Feedback-Directed Optimization (FDO))
!   A compilation technique in which data taken from a sample run is used
!   to guide compiler decisions.

!! Typical Profiling Procedure
!
!  (1) Compile the program with a profiling option.
!
!  (2) Run the program using typical input data.
!      During the run profile information is collected for the compiler.
!
!  (3) Run the compiler again, specifying where to find the profile
!      information.
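!
! For example, with GCC the three steps might look like this (a sketch
! using the switches documented below; the training-input file name is
! hypothetical):
!
!   gcc -O2 -fprofile-generate pi.c -o pi     # (1) Instrumented build.
!   ./pi < training-input                     # (2) Training run.
!   gcc -O2 -fprofile-use pi.c -o pi          # (3) Recompile using profile.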

!! Unoptimized

        bneq r1, r2 IFPART
        nop
ELSEPART:
        j    CONTINUE
        sub  r9, r10, r11
IFPART:
        add  r3, r4, r5
CONTINUE:
        xor  r6, r7, r8

!! Optimized after profiling

        bneq r1, r2 IFPART
        nop
ELSEPART:
        sub  r9, r10, r11
CONTINUE:
        xor  r6, r7, r8

!! Lots of additional code.

!! Somewhere far away.
IFPART:
        add  r3, r4, r5
        j    CONTINUE
        nop

! Branches occur frequently in code.  There is a performance penalty
! in taking a branch, and so it's best if the compiler organizes the code
! (rearranges the intermediate representation) so that branches are
! taken as rarely as possible.  To do that the compiler needs to
! know how often an "if" or other condition (for which the branch is
! emitted) is true.  Only in a few cases can the compiler figure that
! out on its own because, for example, "if" conditions depend on
! input data.  To obtain this useful information a two-step
! compilation process called profiling is used.  In the first step the
! code is compiled so that it collects branch information (more
! precisely, basic block information, covered later).  The program is run
! with typical input data, called the training input, and it writes
! the branch information to a file.  In the second step the compiler
! reads the information and uses it to better organize the code.

!! Sun Profiling Compiler Switches
!
!  -xprofile=p
!       Collects data for a profile or uses a profile to optimize.
!
!       p must be collect[:name], use[:name], or tcov.
!
!       This option causes execution frequency data to be collected
!       and saved during execution, then the data can be used in
!       subsequent runs to improve performance.  This option is only
!       valid when a level of optimization is specified.

!! GCC Profiling Compiler Switches
!
! `-fprofile-generate'
! `-fprofile-generate=PATH'
!      Enable options usually used for instrumenting the application to
!      produce a profile useful for later recompilation with profile
!      feedback based optimization.  You must use `-fprofile-generate'
!      both when compiling and when linking your program.
!
!      The following options are enabled: `-fprofile-arcs',
!      `-fprofile-values', `-fvpt'.
!
!      If PATH is specified, GCC will look at the PATH to find the
!      profile feedback data files.  See `-fprofile-dir'.
!
! `-fprofile-use'
! `-fprofile-use=PATH'
!      Enable profile feedback directed optimizations, and optimizations
!      generally profitable only with profile feedback available.
!
!      The following options are enabled: `-fbranch-probabilities',
!      `-fvpt', `-funroll-loops', `-fpeel-loops', `-ftracer'.
!
!      By default, GCC emits an error message if the feedback profiles do
!      not match the source code.  This error can be turned into a
!      warning by using `-Wcoverage-mismatch'.  Note this may result in
!      poorly optimized code.
!
!      If PATH is specified, GCC will look at the PATH to find the
!      profile feedback data files.  See `-fprofile-dir'.
!
! `-fbranch-probabilities'
!      After running a program compiled with `-fprofile-arcs' (*note
!      Options for Debugging Your Program or `gcc': Debugging Options.),
!      you can compile it a second time using `-fbranch-probabilities',
!      to improve optimizations based on the number of times each branch
!      was taken.  When the program compiled with `-fprofile-arcs' exits
!      it saves arc execution counts to a file called `SOURCENAME.gcda'
!      for each source file.  The information in this data file is very
!      dependent on the structure of the generated code, so you must use
!      the same source code and the same optimization options for both
!      compilations.
!
!      With `-fbranch-probabilities', GCC puts a `REG_BR_PROB' note on
!      each `JUMP_INSN' and `CALL_INSN'.  These can be used to improve
!      optimization.  Currently, they are only used in one place: in
!      `reorg.c', instead of guessing which path a branch is mostly to
!      take, the `REG_BR_PROB' values are used to exactly determine which
!      path is taken more often.
!
! `-fprofile-values'
!      If combined with `-fprofile-arcs', it adds code so that some data
!      about values of expressions in the program is gathered.
!
!      With `-fbranch-probabilities', it reads back the data gathered
!      from profiling values of expressions and adds `REG_VALUE_PROFILE'
!      notes to instructions for their later usage in optimizations.
!
!      Enabled with `-fprofile-generate' and `-fprofile-use'.
!
! `-fvpt'
!      If combined with `-fprofile-arcs', it instructs the compiler to add
!      code to gather information about values of expressions.
!
!      With `-fbranch-probabilities', it reads back the data gathered and
!      actually performs the optimizations based on them.  Currently the
!      optimizations include specialization of the division operation using
!      the knowledge about the value of the denominator.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Use of Compiler Switches

!  ISA
!    Type of system all users have.  (IA-32 for PCs, SPARC for Sun users, etc.)
!    Users can't normally run code compiled for a different ISA.
!
!  Implementation
!    Type of system most users have.
!    Other users can run the code, but it won't run as fast.
!
!  Optimization
!    Select a medium or high optimization level.
!    If the market is very sensitive to performance, use specific
!    optimizations.
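!
! For example (a sketch, not from the original notes), for the AArch64
! systems described earlier a developer might choose between:
!
!   gcc -O2 -march=armv8-a -mtune=cortex-a72 pi.c -o pi
!     Runs on any ARMv8-A implementation, tuned to favor the Cortex-A72.
!
!   gcc -O2 -mcpu=cortex-a72 pi.c -o pi
!     May use every instruction the Cortex-A72 supports; fastest on that
!     core, but might not run on other implementations.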