Lab : Branch Prediction


Pre-lab

Please complete problems 1 and 2.

Overview

The purpose of this lab is to observe branch predictors "in the wild". In other words, your goal is to collect evidence that the branch predictor on your processor does, in fact, affect performance.

Design the trap

The first step is to design a pair of programs that will differ only in their interaction with the branch predictor. This means that the only difference between the two programs should be the sequence of "taken/not-taken" for a specific branch. Designing such a program can be tricky.

Download and examine observe_bp.c. This program first loads a boolean array with a specific "taken/not-taken" sequence. It then measures how many cycles elapse during the running of a loop containing a branch that depends on the array.

Compile three versions of this program:

The "-D" flag allows us to set the value of the TAKE_OR_NOT macro from the command line (and saves us the trouble of repeatedly editing the source file). The program never never takes the branch in the loop, whereas the program random randomly takes the branch. Notice that

  1. Why is it important to call the random function in the always and never programs, even though they don't actually use the random values?

Run the never program.

  1. Why does this program return a different time each time it is run?

To reduce the "noise" in the system, run each program a few thousand times and take the average. This Ruby script will automate the process. Run ruby average_runs.rb command where command is the same command you would type on the command line to execute the program (e.g., ./always.)

  1. Run always, never, and random on an Intel i7 processor and report your observations about the time.
  2. Explain how your observations indicate the presence of a branch predictor.
  3. Run always, never, and random on an AMD processor and report your findings. (I will provide instructions on how to access the AMD during lab.)
  4. Explain how your results indicate that the AMD uses a different branch predictor.

More differences

As discussed in class, modern CPUs use quite complicated branch predictors that can detect complex patterns. Create a new program based on observe_bp.c that will cycle through a random sequence of taken/not-taken of a given length. A suboptimal way to do this would be to replace the line
if (values[i]) {
with
if (values[i % pattern_length]) {
However, you don't want the mod inside the code you are timing. Your program should generate the desired pattern outside the timed loop.

  1. For both an Intel and AMD cpu, plot the change in time as the pattern length increases. Increase the pattern length until the time levels off. There are several ways to do this. Here are my suggestions:
  2. Submit your code with your writeup.