Homework 7 - Extra Credit (Chapters 7-9)

Due: Thursday, March 19, 2009 at 5:00pm (The night before the final)
Absolutely NO late assignments will be accepted.
  1. (2pts) Consider a matrix (2D array) implemented in C or C++. One may traverse the matrix either row by row or column by column. Which of these two traversal methods is better suited to a cache that takes advantage of spatial locality? Justify your answer by describing how the 2D array would be arranged in main memory.
  2. (2pts) Cache misses are classified as one of the following: compulsory, capacity or conflict. Which of these classifications can be minimized by rewriting your code to use less memory?
  3. (2pts) What is the motivation behind using multiple levels of caches with different sizes and organizations (e.g. a level 1 cache with 16 direct mapped blocks and a level 2 cache with 64 2-way set associative blocks)?
  4. (2pts) What is one motivation for doing I/O interrupts instead of I/O polling?
  5. (2pts) Describe a programming situation which would be well-suited to be adapted to multithreading for use on a multiple issue or multi-core processor. The program should do more than one task concurrently with mutual exclusion between the tasks.
  6. (10pts) You are evaluating three cache designs for the instruction cache. The three designs are: direct mapped with one instruction per block, direct mapped with 4 instructions per block and 2-way set associative with 4 instructions per block. The instructions being executed are:
        Address    Instruction
        =======    ============================
        4000       Loop:   beq $s0, $zero, Exit   # imm = 6, offset to Exit
        4004               add $t0, $s0, $s2      # compute read address
        4008               add $t1, $s0, $s3      # compute write address
        4012               lw $t2, 0($t0)         # read data
        4016               sw $t2, 0($t1)         # write data
        4020               sub $s0, $s0, $s1      # subtract offset
        4024               j Loop                 # imm = 1000 which is 4000/4
        4028       Exit:
    
    The layout for each cache will follow in the individual questions for each cache. To determine where to store an instruction in the cache, convert its address into a 32-bit binary number. Ignore the lower 2 bits of the memory address (each instruction is 4 bytes long and every instruction address is a multiple of 4, so the lower 2 bits are always 00). Use the remaining 30 bits to calculate the cache row address and tag for each cache design.
    1. Direct mapped cache with 1 instruction per block
      The cache row address and tag for this cache design will be calculated as follows:
                 31 ... 6|5 ... 2|1 0 
                 --------------------
                |  Tag   |  Row  |0 0|  instruction address
                 --------------------
                                  /|\
                                   |
                                 ignore these bits
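
      The field layout above can be sketched in a few lines of Python. This is only an illustration: the function name split_one_word and the sample address 8196 are assumptions chosen for the example and do not come from the instruction listing.

```python
def split_one_word(address):
    """Split a byte address into (tag, row) for a 16-row cache
    with one instruction per block."""
    word = address >> 2      # drop the two low bits, which are always 00
    row = word & 0b1111      # address[5:2] selects one of the 16 rows
    tag = address >> 6       # address[31:6] is the 26-bit tag
    return tag, row

# 8196 is an arbitrary illustrative address, not one from the listing.
tag, row = split_one_word(8196)
print(f"{8196:032b} -> tag={tag} row={row:04b} ({row})")
```

      The same shift-and-mask steps can be done by hand on the binary form of each instruction address.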
      
      Fill in the cache and state how many cache misses this design has, assuming the code starts executing at the Loop: label and that it executes for two iterations.

      Direct Mapped Cache - 1 instruction per block
      Row (4 bits)   Valid   Tag (26 bits)   Data (1 instruction)
      ============   =====   =============   ====================
      0000 (0)      
      0001 (1)      
      0010 (2)      
      0011 (3)      
      0100 (4)      
      0101 (5)      
      0110 (6)      
      0111 (7)      
      1000 (8)      
      1001 (9)      
      1010 (10)      
      1011 (11)      
      1100 (12)      
      1101 (13)      
      1110 (14)      
      1111 (15)      


    2. Direct mapped cache with 4 instructions per block
      When the cache has more than one instruction per block, the cache row address is now divided into a row number and word offset as follows:
       4 instructions per row, 2 bit word offset = address[3:2]
       row number is 2 bits = address[5:4]
       tag is 26 bits = address[31:6]
      
         31 ... 6|5 4|3 2|1 0
         --------------------
        |  Tag   |Row|   |0 0|  address
         --------------------
                      /|\
                       |
                   word offset
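
      As a rough sketch of the three-field split shown above, the following Python extracts the tag, row number and word offset, and lists the addresses that share the same upper 28 bits (and so occupy the same block). The names split_four_word and block_addresses and the sample address 8196 are assumptions made for illustration only.

```python
def split_four_word(address):
    """Split a byte address into (tag, row, word) for a 4-row cache
    with 4 instructions per block."""
    word = (address >> 2) & 0b11   # address[3:2], word offset in the block
    row = (address >> 4) & 0b11    # address[5:4], row number
    tag = address >> 6             # address[31:6], 26-bit tag
    return tag, row, word

def block_addresses(address):
    """Return the 4 instruction addresses sharing the upper 28 bits."""
    base = address & ~0b1111       # clear address[3:0]
    return [base + 4 * i for i in range(4)]

# 8196 is an arbitrary illustrative address, not one from the listing.
print(split_four_word(8196))
print(block_addresses(8196))
```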
      
      If there is a cache miss on one instruction in a row, all 4 instructions for the row are pulled into the cache (i.e. all instructions that have the same upper 28 bits as the instruction that caused the cache miss). Fill in the cache and state how many cache misses this design has, assuming the code starts executing at the Loop: label and that it executes for two iterations.

      Direct Mapped Cache - 4 instructions per block
      Row (2 bits)   Valid   Tag (26 bits)   Data (4 instructions, 2-bit word offset)
                                             Word 00   Word 01   Word 10   Word 11
      ============   =====   =============   =======   =======   =======   =======
      00 (0)            
      01 (1)            
      10 (2)            
      11 (3)            


    3. 2-way set associative cache with 4 instructions per block
      The main difference between the 2-way set associative cache with 4 instructions per block and the direct mapped cache with 4 instructions per block is that each row of the 2-way set associative cache contains 2 blocks instead of 1 block. The blocks are unrelated except for the fact that they map to the same cache row address. If the cache is to hold the same number of instructions as the direct mapped cache, then the number of rows must be divided by 2 (since it is 2-way associative). The other alternative is to double the number of instructions the cache can hold. We'll use this alternative for this assignment and let this cache hold 32 instructions instead of just 16.
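
      To illustrate what holding 2 blocks per row means, here is a minimal sketch that counts misses for a 4-row cache with a configurable number of blocks per row. Several things here are assumptions and not part of the assignment: the function names, the sample addresses 0 and 64 (chosen because they map to the same row with different tags), and the least-recently-used replacement policy, which the assignment does not specify.

```python
def tag_and_row(address):
    """Tag is address[31:6]; row is address[5:4] (4 rows)."""
    return address >> 6, (address >> 4) & 0b11

def count_misses(addresses, ways):
    """Count misses for a 4-row cache holding `ways` blocks per row.
    Replacement is least recently used, which is an assumption."""
    rows = [[] for _ in range(4)]   # each row lists its tags, oldest first
    misses = 0
    for address in addresses:
        tag, row = tag_and_row(address)
        blocks = rows[row]
        if tag in blocks:
            blocks.remove(tag)      # hit: re-appended below as most recent
        else:
            misses += 1
            if len(blocks) == ways:
                blocks.pop(0)       # row is full: evict the oldest block
        blocks.append(tag)
    return misses

# Hypothetical addresses 0 and 64 both map to row 0 with different tags.
trace = [0, 64, 0, 64]
print(count_misses(trace, ways=1), count_misses(trace, ways=2))
```

      With one block per row the two addresses keep evicting each other, while with two blocks per row both stay resident after their first miss.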

      Since the cache can now hold 32 instructions, the cache row address, word offset and tag calculations are exactly the same as for the direct mapped cache with 4 instructions per block. As with that design, if there is a cache miss on one instruction in a row, all 4 instructions for the block are pulled into the cache. Fill in the cache and state how many cache misses this design has, assuming the code starts executing at the Loop: label and that it executes for two iterations.

      2-way Set Associative Cache - 4 instructions per block (2 blocks per row)
      Row (2 bits)   Valid   Tag (26 bits)   Data (4 instructions, 2-bit word offset)
                                             Word 00   Word 01   Word 10   Word 11
      ============   =====   =============   =======   =======   =======   =======
      00 (0)            
                 
      01 (1)            
                 
      10 (2)            
                 
      11 (3)