Homework 2 - Chapter 5

Due: Friday January 25, 2013 by midnight
Last Late Day: The assignment will only be accepted late through midnight on Sunday January 27th, so that the solution can be posted Monday January 28th for people to review before Midterm 1 on Tuesday January 29th.
  1. (8 pts) This is a continuation of Question 8 from Homework 1. You are now going to evaluate two more cache designs: a direct-mapped cache with 4 instructions per block (block size = 4 words) that stores up to 16 instructions and a 2-way set associative cache with 4 instructions per block that stores up to 32 instructions. Both caches will have 4 rows with this setup.

    The main difference between the 2-way set associative cache with 4 instructions per block and the direct-mapped cache with 4 instructions per block is that each row of the 2-way set associative cache contains 2 blocks instead of 1 block. The blocks are unrelated except for the fact that they map to the same cache row address.

    The cache row address and tag for both caches will be calculated as follows:

       4 instructions per block, 2 bit word offset = address[3:2]
       row number is 2 bits = address[5:4]
       tag is 26 bits = address[31:6]
    
         31 ... 6|5 4|3 2|1 0
         --------------------
        |  Tag   |Row|   |0 0|  address
         --------------------
                      /|\ /|\
                       |   |___ byte offset
                       |
                   word offset
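
The field layout above can be sketched in a few lines of Python. This is only an illustration of the bit slicing; the function name `split_address` is made up for this example and is not part of the assignment.

```python
def split_address(address):
    """Split a 32-bit byte address into the fields shown in the diagram.

    Field widths follow the layout above: 2-bit byte offset, 2-bit word
    offset, 2-bit row number, and the remaining 26 bits as the tag.
    """
    byte_offset = address & 0b11          # address[1:0]
    word_offset = (address >> 2) & 0b11   # address[3:2]
    row = (address >> 4) & 0b11           # address[5:4]
    tag = address >> 6                    # address[31:6]
    return tag, row, word_offset, byte_offset


# Example: decode the address of the first instruction in the loop.
tag, row, word_offset, byte_offset = split_address(4000)
print(f"tag={tag} row={row:02b} word={word_offset:02b} byte={byte_offset:02b}")
```

Because every instruction is word aligned, the byte offset is always 00, which matches the fixed "0 0" in the diagram.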
    
    The instructions being executed are the same as Homework 1 (note 4000d means 4000 in decimal notation):
        Address    Instruction
        =======    ============================
        4000d      Loop:   beq $s0, $zero, Exit   # immediate = 6, offset to Exit
        4004d              add $t0, $s0, $s2      # compute read address
        4008d              add $t1, $s0, $s3      # compute write address
        4012d              lw $t2, 0($t0)         # read data
        4016d              sw $t2, 0($t1)         # write data
        4020d              sub $s0, $s0, $s1      # subtract offset
        4024d              j Loop                 # immediate = 1000 which is 4000/4
        4028d      Exit:
    
    Fill in the following cache tables and state how many cache misses each design has. Assume that the code starts executing at the Loop: label, that it executes for EXACTLY two iterations, and that the cache is empty at the start.
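
One way to check a hand-filled table is to replay the instruction fetches through a small model of the direct-mapped cache. The sketch below is an assumption-laden illustration, not the official solution: the helper `direct_mapped_misses` is hypothetical, and the trace assumes that "two iterations" means fetching the seven instructions from 4000d through 4024d twice, ignoring any final fetch that leaves the loop.

```python
def direct_mapped_misses(trace, block_bytes=16, num_rows=4):
    """Count misses for a direct-mapped cache that starts out empty.

    Each row holds one tag (None means the row is invalid). The defaults
    match the design in this question: 4 words per block, 4 rows.
    """
    rows = [None] * num_rows
    misses = 0
    for address in trace:
        block = address // block_bytes   # drop word and byte offsets
        row = block % num_rows           # low bits of the block number
        tag = block // num_rows          # remaining high bits
        if rows[row] != tag:
            misses += 1
            rows[row] = tag              # fill the row on a miss
    return misses, rows


# Assumed trace: instructions at 4000d..4024d, executed twice.
trace = list(range(4000, 4028, 4)) * 2
misses, rows = direct_mapped_misses(trace)
print("misses:", misses)
print("tags by row:", rows)
```

The same loop extends to the 2-way design by keeping two tags per row plus a record of which one was used least recently.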

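    Direct Mapped Cache - 4 instructions per block

    Row (2 bits) | Valid | Tag (26 bits) | Data (4 instructions, 2 bit word offset)
                 |       |               | Word 00 | Word 01 | Word 10 | Word 11
    -------------+-------+---------------+---------+---------+---------+---------
    00 (0)       |       |               |         |         |         |
    01 (1)       |       |               |         |         |         |
    10 (2)       |       |               |         |         |         |
    11 (3)       |       |               |         |         |         |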

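    2-way Associative Cache - 4 instructions per block

    Row (2 bits) | Valid | Tag (26 bits) | Data (4 instructions, 2 bit word offset)
                 |       |               | Word 00 | Word 01 | Word 10 | Word 11
    -------------+-------+---------------+---------+---------+---------+---------
    00 (0)       |       |               |         |         |         |
                 |       |               |         |         |         |
    01 (1)       |       |               |         |         |         |
                 |       |               |         |         |         |
    10 (2)       |       |               |         |         |         |
                 |       |               |         |         |         |
    11 (3)       |       |               |         |         |         |
                 |       |               |         |         |         |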

  2. (2 pts) Cache misses are classified as one of the following: compulsory, capacity or conflict. Which of these types of cache misses can be minimized by rewriting your code to use less memory?
  3. (2 pts) Assume you have a 2D matrix written in C or C++ which contains 8 rows with 64 integers per row. Your cache block size is 16 bytes and you have a 2-way set associative cache containing 8 rows. What would be the miss rate if you accessed all of the elements in the matrix column-by-column?
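
A traversal like the one in Question 3 can be explored with a short simulation. The sketch below rests on several assumptions that the question does not state: integers are 4 bytes, the matrix is stored in row-major order starting at a block-aligned address 0, and the cache uses LRU replacement. The function name `matrix_miss_rate` is made up for this example.

```python
from collections import OrderedDict


def matrix_miss_rate(by_column, rows=8, cols=64, int_bytes=4,
                     block_bytes=16, num_sets=8, ways=2):
    """Simulate a set-associative LRU cache over one pass of the matrix."""
    # One OrderedDict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    if by_column:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    else:
        order = [(r, c) for r in range(rows) for c in range(cols)]

    misses = 0
    for r, c in order:
        address = (r * cols + c) * int_bytes   # row-major layout (assumed)
        block = address // block_bytes
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            lines.move_to_end(tag)             # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)      # evict least recently used
            lines[tag] = True
    return misses / len(order)


print("column-by-column:", matrix_miss_rate(by_column=True))
print("row-by-row:      ", matrix_miss_rate(by_column=False))
```

Comparing the two traversal orders shows how strongly the result depends on whether consecutive accesses fall in the same block or in the same set.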
  4. (4 pts) Assume you have a cache hierarchy where the L1 hit time is 2ns, the L2 hit time is 6ns, the L3 hit time is 25ns, and main memory access time is 100ns. Assume 100 memory accesses occur. What is the average transfer time for each of the following scenarios:
    1. 8% miss rate on L1, 30% miss rate on L2, 50% miss rate on L3
    2. 3% miss rate on L1, 40% miss rate on L2, 20% miss rate on L3
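
The calculation in Question 4 can be sketched as a short recursion from main memory back up to L1. This assumes the miss rates are local (each one is the fraction of accesses reaching that level that miss) and that a miss pays the hit time of the level it missed in before moving on. If the course defines the terms differently, the formula changes accordingly. The function name `average_access_time` is hypothetical.

```python
def average_access_time(hit_times_ns, local_miss_rates, memory_time_ns):
    """Average access time for a multi-level cache hierarchy.

    hit_times_ns and local_miss_rates are listed from L1 outward.
    Computes t1 + m1 * (t2 + m2 * (t3 + m3 * t_mem)) for three levels.
    """
    time_ns = memory_time_ns
    # Fold from the last cache level back toward L1.
    for hit, miss in zip(reversed(hit_times_ns), reversed(local_miss_rates)):
        time_ns = hit + miss * time_ns
    return time_ns


hit_times = [2, 6, 25]   # L1, L2, L3 hit times in ns
for rates in ([0.08, 0.30, 0.50], [0.03, 0.40, 0.20]):
    print(rates, "->", round(average_access_time(hit_times, rates, 100), 2), "ns")
```

Under these assumptions the number of accesses (100 here) does not affect the average, only the total time.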
  5. (4 pts) Page tables and translation lookaside buffers are used for managing virtual memory addresses. Assume you have a system with 4KB pages, a 4-entry fully associative TLB, and true LRU replacement. Assume the TLB and page table are initially as follows:

    TLB Initial State
    Valid  Tag  Page Number
    -----  ---  -----------
    1      11   12
    1      7    4
    1      3    6
    0      4    9

    Page Table Initial State
    Valid  Physical Page (or on Disk)
    -----  --------------------------
    1      5
    0      Disk
    0      Disk
    1      6
    1      9
    1      11
    0      Disk
    1      4
    0      Disk
    0      Disk
    1      2
    1      12

    You access the following stream of virtual addresses:

    4669, 2227, 13916, 34587, 48870, 12608, 49225
    
    Give the final state of the system (updated TLB and page table).
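
The first step for each access in Question 5 is to split the virtual address into a virtual page number and a page offset. With 4KB pages the offset is the low 12 bits. The sketch below covers only that step. The helper name `split_virtual_address` is made up, and it does not model the TLB lookup, LRU ordering, or page fault handling, since those depend on policies the question leaves to the reader.

```python
PAGE_OFFSET_BITS = 12   # 4KB pages: 2**12 bytes per page


def split_virtual_address(address):
    """Return (virtual page number, page offset) for a 4KB page size."""
    page_number = address >> PAGE_OFFSET_BITS
    offset = address & ((1 << PAGE_OFFSET_BITS) - 1)
    return page_number, offset


stream = [4669, 2227, 13916, 34587, 48870, 12608, 49225]
for address in stream:
    page_number, offset = split_virtual_address(address)
    print(f"{address:>6} -> virtual page {page_number:>2}, offset {offset}")
```

The virtual page number is what gets compared against the TLB tags. Assuming the page table rows are listed in order starting from virtual page 0, it is also the row to consult on a TLB miss.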