CIS 351

Cache

Fall 2020

Overview

This lab uses the Simplescalar CPU simulator to investigate the effects of different cache configurations on performance.

This lab will be due in two pieces. Part 1 is problems 1-5; it will be due next week. Part 2 is problems 6 - 11; it will be due the Monday after Thanksgiving. Officially, there is not lab Thanksgiving week; but, I will be online to answer questions with Part 2. Anyone is welcome to addend (regardless of your scheduled lab time).

Simplescalar

Simplescalar is a suite of programs that simulate the execution of programs compiled using a MIPS-like instruction set called PISA. You can simulate the execution of any program using Simplescalar by simply re-compiling it using a version of gcc that knows how to generate PISA instructions as well as x86 instructions.

For this lab, you will be using the tool sim-cache. This tool takes as input a description of a machine's memory hierarchy (i.e., cache levels) and reports on the number of hits and misses in each cache. Section 4.2 of The Simplescalar Tech Report explains how to describe the cache setup you want to simulate. When you read through this section, take note of three things:

For example, to configure a machine with an 8KB, direct-mapped L1 data cache with 32 byte blocks, use this command: -cache:dl1 dl1:256:32:1:l. Notice that 256 blocks times 32 bytes per block equals 8192 bytes.

When running sim-cache, I recommend sending the output directly to a file using the command line parameters -redir:sim file1 and -redir:prog file2. file1 will contain the results of the simulation (i.e., cache hit and miss rates). file2 will contain the output produced by the program simulated. This data is generally not interesting.

Using a VM for Simplescalar

Simplescalar is designed to run on 32-bit machines. So, instead of using Arch/EOS directly, we will be using a Virtual Machine. Our sysadmin Tom has set up five virtual machines for us to use. They have these IP addresses:

To access these machines, you must first log into an EOS or Arch machine. Then, use ssh to log into one of the VMs: ssh simple@192.168.216.X (where X is the number of the particular machine you chose. Just pick one at random.) The username is simple and the password is SimpleSim2020.

You will all be sharing these machines; so,

  1. the first time please create a directory to use as your workspace (so you don't accidentally mess up others' work).
  2. Be nice.

Note: The VMs do not have a GUI.

Effects of block size

Your first task is to examine the effects of block size on a "toy" program. Look at blockSize1.c. This small program iterates through each byte in a large array. First, you will need to copy this file to your space on a VM: scp blockSize1.c simple@192.168.216.X:yourDirectory Next, compile this C program for Simplescalar using the following command: ss_gcc blockSize1.c (Make sure you are in your directory so you don't end up overwriting someone else's file.) Running this command will produce a file named a.out. (As with the normal version of gcc, you can specify the name of the executable generated using the -o flag.) This file will not run by itself. It will run only as input to one of the Simplescalar programs. If it does, you generated it using the wrong version of gcc.

  1. Use sim-cache to determine the miss-rates of an 8KB, direct-mapped cache with the following block sizes: 8 bytes, 16 bytes, 32 bytes, and 64 bytes. To do so, use commands that look like this:

    sim-cache -cache:dl1 dl1:line:block:1:l -redir:prog /dev/null -redir:sim output_block a.out

    Where block ranges from 8 to 64, and line is set such that product of block times line is 8192.

    Hints for running sim-cache:

    After you have run sim-cache for each block size, grep each output file (output_8, output_16, etc.) for the line "dl1.miss_rate". List the miss rate for each block size tested.

  2. Based on your observations, determine a formula for the miss rate in terms of block_size.
  3. Explain your results (i.e., what is happening in the cache during each memory access to produce the results you observed).
  4. Now, write a C program for which the miss rate is considerably higher for a 16 byte block than for an 8 byte block. (The easiest way to do this is to find two array locations that conflict with a 16-byte block, but not with an 8-byte block. If you do this, you will see the cache with the 8-byte blocks have a nearly 0% miss rate while the cache with the 16-byte blocks has nearly a 100% miss rate.) List your source code, all cache parameters used, and the resulting hit rates. Hint: You need not loop through the entire array. Instead, find two addresses that collide in the cache. Remove the inner loop and make NUM_LOOPS 1000000.
Hints for writing programs with a specified cache behavior:

Optimal block size

Your next task is to determine the optimal block size for qsort.
  1. Determine the optimal block size for qsort given a 1KB, 4KB, and 16KB cache. Present your results using a graph with block size on the x-axis and the miss rate on the y-axis. Please generate one graph with three lines: One each for 1KB, 4KB, and 16KB. Valid block sizes are 8, 16, 32, and 64. Your graph should have a form similar to Figure 8.18 in Harris and Harris (2nd edition).

Effects of associativity

  1. Write a simple C program for which a 4-way associative cache has a significantly lower miss rate than a equally sized 2-way set associative cache. You may choose any cache size and block size you wish, but they must remain the same for the entire problem. Submit the source code, all cache parameters, and resulting hit rates. Hint: Write the simplest program you can that will produce a 100% miss rate for the 2-way cache.
  2. Write a simple C program for which an associativity of 2 has a higher miss rate than a direct-mapped cache. You may choose any cache size and block size you wish, but they must remain the same for the entire problem. Submit the source code, all cache parameters, and resulting hit rates. As with the previous problem, start by writing a program that has a 100% miss rate for a 2-way cache.
  3. Explain why your code above produces the miss rates observed (i.e., why your code has a higer miss rate on the two-way cache).
  4. Choose qsort (or another interesting program of your choice) and a cache size. Produce a graph showing miss rates as associativity ranges over 1, 2, 4, 8, 16, and fully associative. Your graph should have associativity on the x-axis, and miss-rate on the y-axis. It should also contain four lines: one for each block size. Be sure to clearly label your graph with the cache size. Your graph should have a form similar to Figure 5.30 in Patterson and Hennessey (4th edition, revised).

Effects of replacement policy

  1. Write a simple C program for which a random replacement policy results in a significantly lower miss rate than LRU. (Use an 8-way set associative cache with a size and block size of your choosing.) Submit the source code, all cache parameters, and resulting hit rates.
  2. Choose a cache size and block size. Plot the effects of associativity and replacement policy on qsort. Your graph should have associativity on the x-axis and miss rate on the y-axis. There should be three lines: one for each replacement policy.

Updated Monday, 16 November 2020, 2:05 PM

W3c Validation