CIS 351 Cache Homework

Due: Wednesday, 25 November, 2020. (Only problems 1-6. We may not get to 7 and 8 this semester.)

All Patterson and Hennessey problems are from the 3rd edition.
  1. (Harris and Harris, problem 8.1) In less than one page, describe four everyday activities (other than using a library) that exhibit temporal or spatial locality. List two activities for each type of locality, and be specific.
  2. For each cache configuration below, fill in the corresponding table column to show whether each reference is a hit or a miss.
    1. 1MB, direct-mapped cache with 4 byte blocks.
    2. 1MB, direct-mapped cache with 8 byte blocks.

    References (a) (b)
    0x00000004       1             Miss             0             Miss                            
    0x00800018 6 Miss 3 Miss
    0x00D00010
    0x00000004
    0x0080001C
    0x00D00014
    0x00A00008
    0x00A00004
    0x00000008
    0x00200030
    0x00200024
    0x00200034
    0x00D00034
    0x00200030
  3. There should be four rows where columns (a) and (b) differ. Explain precisely how the change in cache configuration produces the observed behavior differences.
  4. For each cache configuration below, fill in the corresponding table column to show whether each reference is a hit or a miss.
    1. 1MB, direct-mapped cache with 4 byte blocks.
    2. 1MB, direct-mapped cache with 8 byte blocks.
    3. 1MB, two-way set associative cache with 8 byte blocks.
    4. 1.5MB three-way set associative cache with 8 byte blocks

    References (a) (b) (c) (d)
    0x00000018       6             Miss             3             Miss             3             Miss             3             Miss      
    0x00D00018 6 Miss 3 Miss 3 Miss 3 Miss
    0x00A0003C
    0x00D00018
    0x00000018
    0x00500010
    0x00400004
    0x00A00038
    0x00400000
    0x00600024
    0x00500014
    0x00D0001C
    0x00E0001C
    0x00D0001C
    0x0000001C
  5. Compute the total number of bits required to implement the caches described below given a 32-bit address space. Fill in the requested information in the table provided. List (1) the total number of bits per line, (2) the total number of bytes of SRAM needed, and (3) the "overhead" as a percentage of data. (For example, a 1MB cache that requires 1.25MB of data total would have a 25% overhead.) Don't forget to include the valid bit and any LRU bits if necessary. Assume the number of LRU bits is n - 1 for an n-way set associative cache.
    1. 1MB direct-mapped cache with 1 byte blocks (Patterson and Hennessy problem 7.9)
    2. 1MB direct-mapped cache with 4 byte blocks (Patterson and Hennessy problem 7.10)
    3. 1MB 2-way set associative cache with 1 byte blocks and LRU replacement (Patterson and Hennessy problem 7.25 -- almost)
    4. 1MB 2-way set associative cache with 4 byte blocks and LRU replacement
    5. 1MB fully associative cache with 1 byte blocks and LRU replacement (Patterson and Hennessy problem 7.26)
    6. 1MB fully associative cache with 4 byte blocks and LRU replacement (Patterson and Hennessy problem 7.27)

    tag Line size Total size (with valid) %overhead
    1 21 bits/line 2,752,512 bytes (2.625MB) 162%
    2 ___ bits/line
    3 ___ bits/line
    4 ___ bits/line
    5 ___ bits/"way"
    6 ___ bits/"way"
  6. (Patterson and Hennessy 7.29) Suppose a byte-addressable computer's address size is k bits, the cache size is S bytes, the block size is B bytes, and the cache is A-way set associative. Give the formulas for the following in terms of S, B, A, and k:

  7. Consider two CPUs: For simplicity, assume that instruction fetches don't cause cache misses. If 36% of all instructions are memory accesses (e.g., lw and sw), the memory access time is 70ns, and the CPUs clock period is set to the cache hit time, then
    1. What is the CPI for each machine?
    2. Which machine is faster? (Remember, they have different clock periods.)
    3. Suppose a 4KB L1 cache would have a .7ns hit time. What miss rate would be necessary for this CPU to out-perform the other two?
    4. What speedup would result from doubling the speed of P1's L1 cache and clock frequency while leaving the main memory speed at 70ns? (In other words, what happens if the clock period is now .31ns, a cache access is still 1 cycle, and a cache miss is still 70ns?)
    5. What speedup would result from doubling P1's main memory speed (to 35ns) and leaving the clock at .62ns?
  8. Consider a CPU with a split L1 cache and a unified L2 cache. Compute the following:
    1. What is the average memory access time for instruction accesses?
    2. What is the average memory access time for data accesses?
    3. What is the overall CPI?
    4. We could double the size of each L1 cache if we made the clock 20% slower. What would the new CPI need to be in order for the new CPI to be faster?
    5. By what percent would the L1 miss rates need to decrease in order for the slower CPU to have better performance? (Assume that the L2 miss rate does not change, but that the L2 and memory access cycle times reflect the slower clock period.)

For more practice, try Harris and Harris problems 8.7, 8.8, 8.9, 8.10*, 8.11*, 8.12*