CIS 351 | Lab 9 and 10: Cache | Fall 2018
Lab 9 is problems 1-5. Lab 10 is problems 6-11.
Simplescalar is a suite of programs that simulate the execution of programs compiled using a MIPS-like instruction set called PISA. You can simulate the execution of any program using Simplescalar by re-compiling the program with a version of gcc that generates PISA instructions instead of x86 instructions.
For this lab, you will be using the tool sim-cache. This tool takes as input a description of a machine's memory hierarchy (i.e., its cache levels) and reports the number of hits and misses in each cache. Section 4.2 of the Simplescalar Tech Report explains how to describe the cache setup you want to simulate. When you read through this section, take note of three things:
First, when using sim-cache, you don't specify the size of the cache directly. Instead you specify (1) the number of sets, (2) the block size, and (3) the associativity of the cache. The size of the cache is the product of these three numbers. Thus, a 4-way set-associative cache with 1024 sets of 16-byte blocks is 4*1024*16 = 65536 bytes (64 kilobytes). (In a direct-mapped cache each set holds exactly one line, so the number of sets equals the number of lines.)
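Second, the cache name (e.g., "dl1") appears twice when configuring the L1 data cache: once in the option name and once at the start of the configuration string.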
Third, sim-cache frequently uses both "1" (the numeral "one") and "l" (a lower-case letter "L"). Watch carefully, because the two can be hard to tell apart in print.
For example, to configure a machine with an 8KB, direct-mapped L1 data cache with 32-byte blocks, use this option:
-cache:dl1 dl1:256:32:1:l
Notice that 256 blocks times 32 bytes per block equals 8192 bytes.
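As a quick check of this arithmetic, here is a small shell sketch. The helper names cache_size and dl1_config are hypothetical (they are not part of Simplescalar), and dl1_config assumes LRU replacement ("l") and the sets:block:associativity field order used in the example above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Simplescalar) for sim-cache arithmetic.

# cache_size SETS BLOCK ASSOC: total capacity in bytes.
cache_size() {
    echo $(( $1 * $2 * $3 ))
}

# dl1_config SIZE BLOCK ASSOC: print an L1 data-cache configuration
# string with LRU replacement ("l"), choosing the number of sets so
# that sets * block * assoc equals SIZE.
dl1_config() {
    sets=$(( $1 / ($2 * $3) ))
    echo "dl1:$sets:$2:$3:l"
}

cache_size 1024 16 4     # 4-way, 1024 sets, 16-byte blocks: prints 65536
dl1_config 8192 32 1     # 8KB direct-mapped, 32-byte blocks: prints dl1:256:32:1:l
```

For example, -cache:dl1 $(dl1_config 8192 32 1) expands to the option shown above.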
When running sim-cache, I recommend sending the output directly to files using the command-line parameters -redir:sim file1 and -redir:prog file2. file1 will contain the results of the simulation (i.e., cache hit and miss rates). file2 will contain the output produced by the simulated program; this data is generally not interesting.
Your first task is to examine the effects of block size on a "toy" program. Look at blockSize1.c. This small program iterates through each byte in a large array. Begin by compiling this C program for Simplescalar using the following command:
~kurmasz/public/Simplescalar/bin/ss_gcc blockSize1.c
(If ~kurmasz/public/CS451/bin is in your path, you can use ss_gcc instead of ~kurmasz/public/Simplescalar/bin/ss_gcc.) Make sure you are running the Simplescalar version of gcc. If you can run ./a.out from the command line, you used the wrong version.
Running this command produces a file named a.out. (As with the normal version of gcc, you can specify the name of the generated executable using the -o flag.) This file will not run by itself; it runs only as input to one of the Simplescalar programs. If it does run by itself, you generated it using the wrong version of gcc.
Use sim-cache to determine the miss rates of an 8KB, direct-mapped cache with the following block sizes: 8 bytes, 16 bytes, 32 bytes, and 64 bytes. To do so, use commands that look like this:
~kurmasz/public/Simplescalar/bin/sim-cache -cache:dl1 dl1:line:block:1:l -redir:prog /dev/null -redir:sim output_block a.out
where block ranges from 8 to 64, and line is set so that the product of block and line is 8192.
Hints for running sim-cache:
The last two fields of the configuration string are the numeral 1 followed by the letter l (as in "lru").
sim-cache is also in ~kurmasz/public/CS451/bin.
You may find it helpful to write a bash script that runs sim-cache with varying block sizes and presents the results. The line below shows how to perform arithmetic in a bash script:
let num_lines=8192/$i
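The same loop can be sketched in plain sh. This version uses POSIX $(( )) arithmetic in place of let and only prints the commands (a dry run), so you can inspect them before running them. The function name block_cmd and the SIM_CACHE variable are illustrative assumptions; the path and flags are the ones given above.

```shell
#!/bin/sh
# Sketch: print one sim-cache command per block size for an 8KB,
# direct-mapped L1 data cache. Nothing is simulated here; pipe the
# output to sh to actually run the commands.
SIM_CACHE=~kurmasz/public/Simplescalar/bin/sim-cache

# block_cmd BLOCK: print the command for one block size.
block_cmd() {
    block=$1
    lines=$(( 8192 / block ))   # lines * block = 8192 bytes
    echo "$SIM_CACHE -cache:dl1 dl1:$lines:$block:1:l" \
         "-redir:prog /dev/null -redir:sim output_$block a.out"
}

for i in 8 16 32 64; do
    block_cmd "$i"
done
```

Assuming you save this as a hypothetical run_blocks.sh, sh run_blocks.sh | sh runs all four simulations. Printing first makes it easy to spot a mistyped "1" or "l" before spending time on simulations.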
After you have run sim-cache for each block size, grep each output file (output_8, output_16, etc.) for the line "dl1.miss_rate". List the miss rate for each block size tested.
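Instead of reading each grep result by hand, a short awk filter can print a block-size/miss-rate table. This sketch assumes each statistic in the sim-cache output appears on one line with the name in the first column and the value in the second (for example, dl1.miss_rate 0.0123 # miss rate); the helper name miss_rate is hypothetical.

```shell
#!/bin/sh
# Sketch: print "block miss_rate" for each sim-cache output file.
# Assumes stat lines of the form:  dl1.miss_rate   0.0123 # ...

# miss_rate FILE: print the value in the second column of the
# line whose first column is exactly "dl1.miss_rate".
miss_rate() {
    awk '$1 == "dl1.miss_rate" { print $2 }' "$1"
}

for b in 8 16 32 64; do
    if [ -f "output_$b" ]; then
        echo "$b $(miss_rate "output_$b")"
    fi
done
```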
NUM_LOOPS
1000000.
In blockSize1.c, array is an array of characters; therefore, each array element occupies exactly 1 byte of the cache. As a result, it is easy to identify data items that will or will not conflict in the cache. For example, in an 8KB direct-mapped cache, array bytes 0 and 8192 will conflict. Your job is to find sets of array elements that conflict with a 16-byte block, but not with an 8-byte block.
gcc is a C compiler. Your code must be straight C: no iostreams, no "//"-style comments, and all variables must be declared at the beginning of each function.
array[0] tends to be mapped to cache slot 0, but sometimes it gets mapped somewhere else. In Fall 2009, for example, array[0] mapped to the middle of a 16-byte block. To account for this, you can add the following line of code to blockSize1.c:
register char* arraym = array + 8;
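The conflict rule for a direct-mapped cache can be checked with shell arithmetic: two bytes conflict when they fall in different blocks that map to the same cache line. This sketch assumes the array starts at the beginning of a cache block (if it does not, add the misalignment to each offset first); set_index and conflicts are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: do two byte offsets into a char array conflict in a
# direct-mapped cache? Assumes offset 0 is block-aligned.

# set_index OFFSET BLOCK SIZE: cache line that the byte maps to.
set_index() {
    echo $(( ($1 / $2) % ($3 / $2) ))
}

# conflicts A B BLOCK SIZE: succeeds if A and B are in different
# blocks that map to the same line (so each evicts the other).
conflicts() {
    [ $(( $1 / $3 )) -ne $(( $2 / $3 )) ] &&
    [ "$(set_index "$1" "$3" "$4")" -eq "$(set_index "$2" "$3" "$4")" ]
}

if conflicts 0 8192 32 8192; then
    echo "bytes 0 and 8192 conflict in an 8KB cache with 32-byte blocks"
fi
```

Checking candidate offsets with block sizes 8 and 16 before writing the C program is a cheap way to confirm they behave differently under the two block sizes.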
qsort
.
Measure how block size affects the miss rate of qsort given a 1KB, 4KB, and 16KB cache. Present your results using a graph with block size on the x-axis and miss rate on the y-axis. Please generate one graph with three lines: one each for 1KB, 4KB, and 16KB. Valid block sizes are 8, 16, 32, and 64. Your graph should have a form similar to Figure 8.18 in Harris and Harris (2nd edition).
Use input_1e4 for input. (It contains 50,000 randomly generated integers.)
The qsort executable and sample inputs are in ~kurmasz/public/Simplescalar/Tests/qsort.
Copy ~kurmasz/public/Simplescalar/Tests/qsort/input_1e4 into your current directory, then run a command like this:
~kurmasz/public/Simplescalar/bin/sim-cache -cache:dl1 dl1:64:16:1:l -redir:prog opt -redir:sim output_dl1:64:16:1:l ~kurmasz/public/Simplescalar/Tests/qsort/ss_qsort input_1e4
If the simulation fails, the contents of opt may give you a clue. If not, ask the instructor for help.
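To cover all twelve combinations of cache size and block size, you could generate the commands with a sketch like the one below. It assumes direct-mapped caches (associativity 1, as in the sample command) and reuses that command's paths and output-file naming; qsort_cmd is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print sim-cache commands for qsort over three cache sizes
# and four block sizes (direct-mapped assumed). Pipe to sh to run.
SIM_CACHE=~kurmasz/public/Simplescalar/bin/sim-cache
QSORT=~kurmasz/public/Simplescalar/Tests/qsort/ss_qsort

# qsort_cmd SIZE BLOCK: sets = SIZE / BLOCK for a direct-mapped cache.
qsort_cmd() {
    cfg="dl1:$(( $1 / $2 )):$2:1:l"
    echo "$SIM_CACHE -cache:dl1 $cfg -redir:prog opt" \
         "-redir:sim output_$cfg $QSORT input_1e4"
}

for size in 1024 4096 16384; do
    for block in 8 16 32 64; do
        qsort_cmd "$size" "$block"
    done
done
```

For a 1KB cache with 16-byte blocks this reproduces the sample command above.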
In gnuplot, entering the line set style data linespoints will plot all data files using the "linespoints" style. This shortcut means that you won't have to type with linespoints after every file.
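To get the data into gnuplot, one option is a data file per cache size with block size in the first column and miss rate in the second. This sketch assumes the output-file naming from the sample qsort command (direct-mapped caches) and that each file contains a dl1.miss_rate line with the value in the second column; collect and the qsort_SIZE.dat file names are hypothetical.

```shell
#!/bin/sh
# Sketch: build one gnuplot data file per cache size from sim-cache
# output files named output_dl1:SETS:BLOCK:1:l.

# collect SIZE: print "block miss_rate" rows for one cache size,
# skipping block sizes whose output file does not exist.
collect() {
    for block in 8 16 32 64; do
        f="output_dl1:$(( $1 / block )):$block:1:l"
        [ -f "$f" ] || continue
        echo "$block $(awk '$1 == "dl1.miss_rate" { print $2 }' "$f")"
    done
}

for size in 1024 4096 16384; do
    collect "$size" > "qsort_$size.dat"
done
```

Then, in gnuplot: set style data linespoints, followed by plot "qsort_1024.dat" title "1KB", "qsort_4096.dat" title "4KB", "qsort_16384.dat" title "16KB".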
Choose qsort (or another interesting program of your choice) and a cache size. Produce a graph showing miss rates as associativity ranges over 1, 2, 4, 8, 16, and fully associative. Your graph should have associativity on the x-axis and miss rate on the y-axis. It should also contain four lines: one for each block size. Be sure to clearly label your graph with the cache size. Your graph should have a form similar to Figure 5.30 in Patterson and Hennessy (4th edition, revised).
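When sweeping associativity at a fixed cache size, the number of sets must shrink as associativity grows, and a fully associative cache is a single set that holds every block. This sketch prints the configuration strings; the 4KB size and 32-byte block are arbitrary example values, and assoc_config is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: sim-cache configuration strings for a fixed cache size and
# block size as associativity varies. "full" means fully associative:
# one set whose associativity equals the number of blocks.

# assoc_config SIZE BLOCK ASSOC
assoc_config() {
    size=$1; block=$2; assoc=$3
    if [ "$assoc" = full ]; then
        assoc=$(( size / block ))
    fi
    echo "dl1:$(( size / (block * assoc) )):$block:$assoc:l"
}

for a in 1 2 4 8 16 full; do
    assoc_config 4096 32 "$a"
done
```

Check that block size times associativity does not exceed the cache size; otherwise the computed number of sets is 0.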
Updated Tuesday, 6 November 2018, 2:20 PM