Lab : Branch Prediction

Pre-lab

Please complete problems 1 and 2.

Overview

The purpose of this lab is to observe branch predictors "in the wild". In other words, your goal is to collect evidence that the branch predictor on your processor does, in fact, affect performance.

Design the trap

The first step is to design a pair of programs that will differ only in their interaction with the branch predictor. This means that the only difference between the two programs should be the sequence of "taken/not-taken" for a specific branch. Designing such a program can be tricky.

Download and examine observe_bp.c. This program first loads a boolean array with a specific "taken/not-taken" sequence. It then measures how many cycles elapse during the running of a loop containing a branch that depends on the array.

Compile three versions of this program:

gcc -O0 -DTAKE_OR_NOT=true observe_bp.c -o never
gcc -O0 -DTAKE_OR_NOT=false observe_bp.c -o always
gcc -O0 -DTAKE_OR_NOT=which observe_bp.c -o random

The "-D" flag allows us to set the value of the TAKE_OR_NOT macro from the command line (and saves us the trouble of repeatedly editing the source file). The program never never takes the branch in the loop, whereas the program random randomly takes the branch. Notice that

We initialize a boolean array instead of repeatedly calling random inside the code we are timing.
We call random in the "Array Initialization Loop" for all programs --- even those programs that don't use the random value when initializing the array.

Why is it important to call the random function in the always and never programs, even though they don't actually use the random values?

Run the never program.

Why does this program return a different time each time it is run?

To reduce the "noise" in the system, run each program a few thousand times and take the average. This Ruby script will automate the process. Run ruby average_runs.rb command where command is the same command you would type on the command line to execute the program (e.g., ./always.)

Run always, never, and random on an Intel i7 processor and report your observations about the time.
Explain how your observations indicate the presence of a branch predictor.
Run always, never, and random on an AMD processor and report your findings. (I will provide instructions on how to access the AMD during lab.)
Explain how your results indicate that the AMD uses a different branch predictor.

More differences

As discussed in class, modern CPUs use quite complicated branch predictors that can detect complex patterns. Create a new program based on observe_bp.c that will cycle through a random sequence of taken/not-taken of a given length. A suboptimal way to do this would be to replace the line
if (values[i]) {
with
if (values[i % pattern_length]) {
However, you don't want the mod inside the code you are timing. Your program should generate the desired pattern outside the timed loop.

For both an Intel and AMD cpu, plot the change in time as the pattern length increases. Increase the pattern length until the time levels off. There are several ways to do this. Here are my suggestions:
- Make sure SIZE is several times larger than your maximum pattern length.
- Make the pattern length a command-line parameter (i.e., pass it on the command line and get the value from atoi(argv[1])).
- Write a script to automate running, and data collection. Here's a bash script that I've used:
```
for ((i=25; i < 1500; i+= 25))
do
  echo -n "$i "  # the "-n" suppresses the newline
  ruby average_runs.rb ./pattern $i --samples 2500
done
	      
```
- Use gnuplot to generate your graphs. Simply create a text file with two numbers on each line: the pattern length and the observed time. At the gnuplot prompt type plot "filename" with linespoints to display the graph.
- You may need to increase the number of samples in average_runs.rb to sufficiently reduce noise.
Submit your code with your writeup.