CS计算机代考程序代写 compiler computer architecture scheme Pipelining 2
Pipelining 2
CS 154: Computer Architecture Lecture #15
Winter 2020
Ziad Matni, Ph.D.
Dept. of Computer Science, UCSB
Administrative
• Talk is on MONDAY, MARCH 9th in our usual class • Will take attendance…
• Final Exam info:
• Tuesday, March 17th at 12:00 (not 12:30!!!) in this classroom
• • •
• Arrive 10 mins early – randomized seating…
Cumulative Exam
Will allow some notes – exact details to follow
Study guide/example Qs will be issued by this weekend
3/4/2020
Matni, CS154, Wi20 2
Re: Labs
•Lab 7 still due on Sunday •Lab 8 will be issued soon •There IS a lab THIS Friday •Re: lab NEXT Friday…
3/4/2020 Matni, CS154, Wi20 3
Lecture Outline
• Data Hazards in Pipelining: Forwarding vs Stalling
• Control Hazards: Branch Prediction
3/4/2020 Matni, CS154, Wi20 4
Control Lines for the Last 3 Pipeline Stages
Same control signals that we learned earlier, but this time they are “ferried” across the pipelines
See tables in Fig. 4.48, 4.49 in textbook
These are derived from the instruction
3/4/2020 Matni, CS154, Wi20 5
Pipelined Datapath Showing Control Signals
3/4/2020 Matni, CS154, Wi20 6
Another Look at Data Hazards
• Consider this sequence: sub $2, $1, $3
and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2)
• All of the instructions after sub are dependent on the result in register $2 of the first instruction.
3/4/2020 Matni, CS154, Wi20 7
Let’s say that reg. $2 has value 10 at first, but that the sub instruction changes that to -20.
This diagram shows that the instructions that would get the correct value of −20 are add and sw, while the AND and OR instructions would get the incorrect value 10!
3/4/2020 Matni, CS154, Wi20 8
We could resolve these hazards using forwarding
But how do we detect when to forward?
Forwarding vs. Stalling
• We could stall using bubbles… (inefficient)
• …or we could forward the data as soon as it is available to any units that need it before it is available to read in the WB stage
• Let’s only consider forwarding to an operation in the EX stage
• That is, either an ALU operation or an address calculation
3/4/2020
Matni, CS154, Wi20
9
A Word on Some Notation to Use…
Example: ID/EX.RegisterRs
• Refers to the number of a register whose value is found in the pipeline register ID/EX
• The 1st part is the name of the pipeline register (ID/EX) • the 2nd part is the name of the field in that register (Rs)
• ALU operand register numbers in EX stage are given by ID/EX.RegisterRs and ID/EX.RegisterRt
3/4/2020 Matni, CS154, Wi20 10
Data Hazards Occur When…
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
1st hazard here is on register $2, between the result of sub and AND instructions.
This hazard can be detected when the AND instruction is in the EX stage and the sub instruction is in the MEM stage, so this is hazard 1a:
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
3/4/2020 11
Matni, CS154, Wi20
Detecting the Need to Forward
• These comparisons are not enough, though! • Some instructions don’t write to registers
• Solution: see if the RegWrite control signal will be active • Additionally, check that:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs ≠ $zero 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt ≠ $zero
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs ≠ $zero
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt ≠ $zero
3/4/2020 Matni, CS154, Wi20 12
Forwarding Paths (simplified)
See Fig. 4.55 in textbook for full explanation of the mux selects ForwardA and ForwardB
3/4/2020 Matni, CS154, Wi20 13
Forwarding Conditions
3/4/2020 Matni, CS154, Wi20 14
Double Data Hazards!
• Consider the sequence:
add $1, $1, $2 add $1, $1, $3 add $1, $1, $4
• Both hazards conditions occur at once!
• We want to use the most recent value in $1
• We have to revise the MEM hazard condition • Only forward if EX hazard condition isn’t true
3/4/2020 Matni, CS154, Wi20 15
Revised Forwarding Condition
3/4/2020 Matni, CS154, Wi20 16
Pipelined Datapath Modified
To Resolve Hazards Via Forwarding
3/4/2020 Matni, CS154, Wi20 17
Load-Use Data Hazard
A case where forwarding will not work is when an instruction tries to read a register following a load instruction that writes the same register.
3/4/2020 Matni, CS154, Wi20 18
Hazard Detection
• We also need a “Hazard Detection Unit”!
• It operates during the ID stage so that it can insert the stall
between the load and its use.
• ALU operand register numbers in ID stage are given by
IF/ID.RegisterRs, IF/ID.RegisterRt • Has one thing to check:
3/4/2020 Matni, CS154, Wi20 19
How to Stall the Pipeline
• Force control values in ID/EX register to 0 • EX, MEM and WB do a nop (no-operation)
• Prevent update of PC and IF/ID register • Current instruction is decoded again
• Following instruction is fetched again
• 1-cycle stall allows MEM to read data for lw • Can subsequently forward to EX stage
3/4/2020 Matni, CS154, Wi20 20
How Stalls Are Actually Done…
3/4/2020 Matni, CS154, Wi20 21
Compiler vs Hardware Stalls
• Compilers usually rely on the hardware to resolve hazards
• Sometimes compilers take measures to avoid some types of hazards
• BUT a compiler must understand the pipeline to achieve the best performance.
• Otherwise, unexpected stalls will reduce the performance of the compiled code.
3/4/2020
Matni, CS154, Wi20 22
Pipelined Datapath Modified
To Resolve Hazards Via Forwarding OR Stalling
3/4/2020 Matni, CS154, Wi20 23
Control Hazards
• Pipeline hazards involving branches
• An instruction must be fetched at every clock cycle to sustain the pipeline
• Buy the decision about whether to branch doesn’t occur until the MEM pipeline stage
• Stalling until the branch is complete is too slow
• Control hazards occur less frequently than data hazards
• We end up using simpler schemes.
3/4/2020 Matni, CS154, Wi20 24
Prediction Scheme 1: Assume branch not taken
• Continue execution down the sequential instruction stream.
• If the branch is taken, the instructions that are being fetched and decoded must now be discarded (flushed)
• Execution continues at the branch target.
• If the branch is untaken half the time, and if it costs little to discard the instructions, this optimization halves the cost of control hazards
3/4/2020 Matni, CS154, Wi20 25
Prediction Scheme 1.5: Reduce the Delays
• That is, reduce the cost of the taken branch
• Main idea: if we move the branch execution earlier in the pipeline (from MEM), then fewer instructions need be discarded.
• Many branches rely only on simple tests (equality or sign) • Such tests do not require a full ALU operation
• Can be done with at most a few gates.
• For more complex branch decisions use a separate instruction that uses an ALU to perform a comparison
3/4/2020 Matni, CS154, Wi20 26
What Needs to be Done?
2 actions have to occur:
• Computing the branch target address earlier
• Easy fix: we move the branch adder from the EX stage to the ID stage
• Evaluating the branch decision earlier
• Harder to do…
• For branch equal, we would compare the two registers read during the ID stage to see if they are equal.
• Can be done with 1 XOR and 1 OR gate (32b gates).
• Implies additional forwarding and hazard detection hardware…
3/4/2020 Matni, CS154, Wi20 27
Example
• Consider this code:
• Assume that, in this example, the branch will be taken
3/4/2020 Matni, CS154, Wi20 28
3/4/2020 Matni, CS154, Wi20 29
Using Simple Forwarding in Branches
• If a comparison register is a destination of a 2nd or 3rd preceding ALU instruction
• Can resolve using forwarding
3/4/2020 Matni, CS154, Wi20 30
Dynamic Branch Prediction
• In deeper and superscalar pipelines, branch penalty is more significant, so dynamic prediction is used, needing:
• Branch prediction buffer (aka branch history table)
• Indexing by recent branch instruction addresses
• Store outcome (taken/not taken)
• To execute a branch, then:
• Check table, expect the same outcome
• Start fetching from fall-through or target • If wrong, flush pipeline and flip prediction
3/4/2020 Matni, CS154, Wi20 31
1-bit Predictor
• A branch prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction.
• The memory contains a bit that says whether the branch was recently taken or not.
• Very simple
• Remember: a prediction is just an “educated” guess – a hint we hope is
correct
• If the hint turns out to be wrong:
• the incorrectly predicted instructions are deleted • the prediction bit is inverted and stored back
• the proper sequence is fetched and executed.
3/4/2020 Matni, CS154, Wi20 32
Downsides to a 1-bit Predictor
• You could get a sequence of wrong predictions!
• Example:
• Mis-predict taken on last iteration of the inner loop
• Then mis-predict (as in “do not take”) on first iteration of
inner loop next time around
3/4/2020 Matni, CS154, Wi20 33
A Better Way? Use a 2-bit Predictor
• Only change prediction on two successive mis-predictions
3/4/2020 Matni, CS154, Wi20 34
Final Pipelined Datapath Showing Hazard and Forwarding Detection
3/4/2020 Matni, CS154, Wi20 35
YOUR TO-DOs for the Week
• Finish Lab 7 by Sunday
• New Lab 8 (last one!) will be issued Thursday
• Due next week, which is last week of classes…
3/4/2020 Matni, CS154, Wi20 36
3/4/2020 Matni, CS154, Wi20 37