程序代写代做代考 cache 1 of 2
1 of 2
0907432 Computer Design (Spring 2010)
Midterm Exam Solution
:رقم الشعبة :رقم التسجيل :السما
==================================================================================
Instructions: Time 50 min. Closed books & notes. No calculators or mobile phones. No questions are allowed. Show
your work clearly. Every problem is for 6 marks.
==================================================================================
Q1. Assume that the instructions executed by a processor are broken down as follows:
Type ALU Control Memory
Frequency 50% 20% 30%
If you can improve the performance of only one type of these three instruction types by a factor of 2, answer the
following two questions.
(a) What type would you improve? ______ALU____________
(b) Using Amdahl’s law, what is the resulting overall speedup? ___4/3____
Speedup = 1 / ( 1-0.5 + 0.5/2 ) = 1/.75 = 4/3
Q2. Describe how the branch history table (BHT) and the branch target buffer (BTB) are used in branch prediction.
____Using the branch instruction address,
______1) The BHT is consulted to predict the branch direction
______2) The BTB is consulted to predict the branch target address
__________________________________________________________________________________________
Q3. Assume that you have a typical 5-stage pipelined processor that uses forwarding and stalls to solve data
hazards, resolves branches in the decode stage, and has one branch delay slot. After you do pipeline
scheduling (rearrange instructions) for the following code sequence, the minimum number of cycles
needed to execute this code sequence is: ____9______ (You must show your rearranged code)
Original Code Sequence Rearranged Code Sequence
L1: lw R5, 0(R1)
lw R6, 0(R5)
add R7, R7, R6
addi R1, R1, #-4
bne R1, R2, L1
nop
L1: lw R5, 0(R1)
addi R1, R1, #-4
lw R6, 0(R5)
bne R1, R2, L1
add R7, R7, R6
2 of 2
Q4. Assume that the following code sequence is executed by a speculative superscalar processor of degree
2. This processor uses reservation stations and reorder buffer. The integer latency is 1 cycle and the load
latency is 2 cycles. The processor has one address calculation unit, one memory access unit, one integer
ALU unit, and one branch unit. Assume that the processor predicts that the branch instruction is not
taken and the branch is actually not taken. Using pipeline diagram in the space below, the number of
cycles needed to fetch and commit these instructions is: ___10_____
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
lw R2, 0(R1) F I A M W C
lw R3, 4(R1) F I A M W C
add R4, R2, R3 F I E W C
beq R2, R3, Skip F I E W C
sub R5, R2, R3 F I E W C
Skip:
Q5. For a direct-mapped cache design with 32-bit address, the following bits of the address are used to access the
cache.
Tag Index Offset
31-12 11-5 4-0
(a) What is the cache line size (in words)? _____2
5
/4 = 32/4 = 8 words_________
(b) How many lines does the cache have? __2
11-5+1
= 2
7
= 128___________
(c) What is the ratio between total bits required for such a cache implementation over the data storage bits?
Ratio = [ 128 * (32*8 + 20 + 1) ] / [ 128 * (32*8) ] = 277/256
Starting from power on, the following byte addressed cache references are recorded.
Address Block Address Cache Index Hit or Miss
0 0 0 Miss
4 0 0 Hit
16 0 0 Hit
128 4 4 Miss
224 7 7 Miss
160 5 5 Miss
4100 128 0 Miss
30 0 0 Miss
140 4 4 Hit
3100 96 96 Miss
(d) How many blocks are replaced? __two blocks___
(e) What is the hit ratio? ___3/10_______