Lab : Branch Prediction |
The purpose of this lab is to observe branch predictors "in the wild". In other words, your goal is to collect evidence that the branch predictor on your processor does, in fact, affect performance.
The first step is to design a pair of programs that will differ only in their interaction with the branch predictor. This means that the only difference between the two programs should be the sequence of "taken/not-taken" for a specific branch. Designing such a program can be tricky.
Download and examine observe_bp.c. This program first loads a
boolean array with a specific "taken/not-taken" sequence. It then measures how many
cycles elapse during the running of a loop containing a branch that depends on the array.
Compile three versions of this program:
gcc -O0 -DTAKE_OR_NOT=true observe_bp.c -o nevergcc -O0 -DTAKE_OR_NOT=false observe_bp.c -o alwaysgcc -O0 -DTAKE_OR_NOT=which observe_bp.c -o randomThe "-D" flag allows us to set the value of the TAKE_OR_NOT macro from
the command line (and saves us the trouble of repeatedly editing the source file). The program never
never takes the branch in the loop, whereas the program random randomly takes the branch. Notice that
boolean array instead of repeatedly calling random inside the code
we are timing.
random in the "Array Initialization Loop" for all programs --- even those programs that
don't use the random value when initializing the array.
random function in the always and never
programs, even though they don't actually use the random values?
Run the never program.
To reduce the "noise" in the system, run each program a few thousand times and take the average.
This Ruby script will automate the process. Run
ruby average_runs.rb command where command is the same command you would type
on the command line to
execute the program (e.g., ./always.)
always, never, and random on an Intel i7 processor and report your
observations about the time.
always, never, and random on an AMD processor and report your
findings. (I will provide instructions on how to access the
AMD during lab.)
As discussed in class, modern CPUs use quite complicated branch predictors that can detect complex patterns. Create a
new program based on observe_bp.c that will cycle through a random sequence of taken/not-taken of a given
length. A suboptimal way to do this would be to replace the line
if (values[i]) {
with
if (values[i % pattern_length]) {
However, you don't want the mod inside the code you are timing. Your program should generate the
desired pattern outside the timed loop.
SIZE is several times larger than your maximum pattern length.atoi(argv[1])).for ((i=25; i < 1500; i+= 25)) do echo -n "$i " # the "-n" suppresses the newline ruby average_runs.rb ./pattern $i --samples 2500 done
gnuplot to generate your graphs. Simply create a text file with two numbers on each
line: the pattern length and the observed time. At the gnuplot prompt type plot
"filename" with linespoints to display the graph.
average_runs.rb to sufficiently reduce
noise.