Lab : Branch Prediction |
The purpose of this lab is to observe branch predictors "in the wild". In other words, your goal is to collect evidence that the branch predictor on your processor does, in fact, affect performance.
The first step is to design a pair of programs that will differ only in their interaction with the branch predictor. This means that the only difference between the two programs should be the sequence of "taken/not-taken" for a specific branch. Designing such a program can be tricky.
Download and examine observe_bp.c
. This program first loads a
boolean
array with a specific "taken/not-taken" sequence. It then measures how many
cycles elapse during the running of a loop containing a branch that depends on the array.
Compile three versions of this program:
gcc -O0 -DTAKE_OR_NOT=true observe_bp.c -o never
gcc -O0 -DTAKE_OR_NOT=false observe_bp.c -o always
gcc -O0 -DTAKE_OR_NOT=which observe_bp.c -o random
The "-D"
flag allows us to set the value of the TAKE_OR_NOT
macro from
the command line (and saves us the trouble of repeatedly editing the source file). The program never
never takes the branch in the loop, whereas the program random
randomly takes the branch. Notice that
boolean
array instead of repeatedly calling random
inside the code
we are timing.
random
in the "Array Initialization Loop" for all programs --- even those programs that
don't use the random value when initializing the array.
random
function in the always
and never
programs, even though they don't actually use the random values?
Run the never
program.
To reduce the "noise" in the system, run each program a few thousand times and take the average.
This Ruby script will automate the process. Run
ruby average_runs.rb command
where command
is the same command you would type
on the command line to
execute the program (e.g., ./always
.)
always
, never
, and random
on an Intel i7 processor and report your
observations about the time.
always
, never
, and random
on an AMD processor and report your
findings. (I will provide instructions on how to access the
AMD during lab.)
As discussed in class, modern CPUs use quite complicated branch predictors that can detect complex patterns. Create a
new program based on observe_bp.c
that will cycle through a random sequence of taken/not-taken of a given
length. A suboptimal way to do this would be to replace the line
if (values[i]) {
with
if (values[i % pattern_length]) {
However, you don't want the mod
inside the code you are timing. Your program should generate the
desired pattern outside the timed loop.
SIZE
is several times larger than your maximum pattern length.atoi(argv[1])
).for ((i=25; i < 1500; i+= 25)) do echo -n "$i " # the "-n" suppresses the newline ruby average_runs.rb ./pattern $i --samples 2500 done
gnuplot
to generate your graphs. Simply create a text file with two numbers on each
line: the pattern length and the observed time. At the gnuplot
prompt type plot
"filename" with linespoints
to display the graph.
average_runs.rb
to sufficiently reduce
noise.