# CMCS 611-101 Advanced Computer Architecture

#### Lecture 7

### **Pipeline Hazards**

#### September 28, 2009

www.csee.umbc.edu/~younis/CMSC611/CMSC611.htm



#### **Lecture's Overview**

#### Previous Lecture:

#### ➔ Pipelined hazards

- Pipelining concept is natural
- Start handling of next instruction while current one is in progress

#### ➔ Pipeline performance

- Performance improvement by increasing instruction throughput
- Ideal and upper bound for speedup is number of stages in pipeline

#### **This Lecture:**

- Structural, data and control hazards
- Data Hazard resolution techniques
- Pipelined control





□ The load instruction is the longest

□ All instructions follows at most the following five steps:

- → Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- → Reg/Dec: Registers Fetch and Instruction Decode
- → Exec: Calculate the memory address
- → Mem: Read the data from the Data Memory
- $\rightarrow$  WB: Write the data back to the register file



# **Instruction Pipelining**

Start handling of next instruction while the current instruction is in progress

Pipelining is feasible when different devices are used at different stages of instruction execution
Time





### **Pipeline Datapath**



Every stage must be completed in one clock cycle to avoid stalls

- Values must be latched to ensure correct execution of instructions
- The PC multiplexer has moved to the IF stage to prevent two instructions from updating the PC simultaneously (in case of branch instruction)



### **Pipeline Hazards**

Pipeline hazards are cases that affect instruction execution semantics and thus need to be detected and corrected

#### Hazards types

Structural hazard: attempt to use a resource two different ways at same time

- ➔ E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
- ➔ Single memory for instruction and data

Data hazard: attempt to use item before it is ready

- ➔ E.g., one sock of pair in dryer and one in washer; can't fold until get sock from washer through dryer
- $\rightarrow$  instruction depends on result of prior instruction still in the pipeline

<u>Control hazard</u>: attempt to make a decision before condition is evaluated

- ➔ E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
- ➔ branch instructions
- Hazards can always be resolved by waiting

#### **Single Memory is a Structural Hazard**



#### Can be easily detected

#### □ Resolved by inserting idle cycles



# **Stalls & Pipeline Performance**

Average instruction time unpipelined Average instruction time pipelined Speedup from pipelining =  $= \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$ 

□ Ideally the CPI of the pipeline execution is 1 (after fill-up), thus

→ CPI pipelined = Ideal CPI + Pipeline stall clock per instruction

= 1 + Pipeline stall clock per instruction

Speedup =  $\frac{\text{CPI unpipelined}}{1 + \text{Pipelinestall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$ 

Assuming all pipeline stages are balanced, then



#### **Control Hazard**

#### □ Stall: wait until decision is clear

➔ It is possible to move up decision to 2<sup>nd</sup> stage by adding hardware to check registers as being read



 $\Box$  Impact: 2 clock cycles per branch instruction  $\Rightarrow$  slow



# **Control Hazard Solution**



## **Control Hazard Solution**

Redefine branch behavior (takes place after next instruction) "delayed branch"



Impact: 0 clock cycles per branch instruction if can find instruction to put in "slot" (50% of time)

#### **Data Hazard**



#### Dependencies backwards in time are hazards



#### **Data Hazard Solution**



"Forward" result from one stage to another



## **Implementing Data Forwarding**



The ALU result from EX/MEM register is fed back and kept in next stages

If data hazard is detected the forward values will be used





➤The ALU result from EX/MEM register is forwarded to MEM/WB



#### **Forwarding Datapath**



Three additional inputs are added to the ALU multiplexers each corresponding to a bypass (forwarded data)



# **Data Hazards Classification**

- Data hazard can happen because dependence among a pair of instructions writing and reading to the same register or memory location
- By stalling the pipeline on cache misses data hazards caused by memory access are avoided
- $\Box$  Data hazards types (instruction *I* proceeds *J*)

RAW (read after write): J attempts to read an operand before I writes it

➔ Most common type of hazard and typically handled by forwarding

WAW (write after write): J attempts to write an operand before I writes it

- $\rightarrow$  Can happen when writing is done in more than one pipeline stage
- ➔ For MIPS pipeline, WAW hazard can happen if WB is performed during MEM stage and the memory is slow so that MEM stage take two cycles

| LW R1, 0(R2)   | IF | ID | EX | MEM1 | MEM2 | WB |
|----------------|----|----|----|------|------|----|
| ADD R1, R2, R3 |    | IF | ID | EX   | WB   |    |

WAR (write after read): J attempts to write an operand before I reads it

Happen when there are instructions that write early in the pipeline while others reading in a late stage



#### **Data Hazards for Load Instructions**



#### Dependencies backwards in time but cannot be solved with forwarding

Must delay/stall instruction dependent on loads

## **Solving Hazard by Pipeline Interlock**



# **Compiler Scheduling for Data Hazards**

□ The compiler usually performs *instruction scheduling* to avoid causing data hazard, such as:

45%

40%

41%

avoid generating LW followed by an immediate instruction  $\succ$ that uses the destination register

Change order of instructions in the basic block 



### **Data Hazards Detection**

Detecting hazards early in the pipeline reduces hardware complexity since the machine state will not get erroneously changed

□ For the MIPS integer pipeline, all data hazards can be checked in ID stage

| Example:                       | Situation                               | Example code<br>sequence                                                         | Action                                                                                                                                                             |  |
|--------------------------------|-----------------------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Load<br>interlock<br>detection | No<br>dependence                        | LW <b>R1</b> , 45 (R2)<br>ADD R5,R6,R7<br>SUB R8,R6,R7<br>OR R9, R6, R7          | No hazard possible because no dependence<br>exists on R1 in the immediately following<br>three instructions                                                        |  |
|                                | Dependence<br>requiring stall           | LW <b>R1</b> , 45 (R2)<br>ADD R5, <b>R1</b> ,R7<br>SUB R8,R6,R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in the AD<br>and stall the ADD (and SUB and OR) befo<br>the ADD begins EX                                                         |  |
|                                | Dependence<br>overcome by<br>forwarding | LW <b>R1</b> , 45 (R2)<br>ADD R5,R6,R7<br>SUB R8, <b>R1</b> ,R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in the SUB<br>and forward result of load to ALU in time for<br>SUB to begin EX                                                    |  |
|                                | Dependence<br>with accesses<br>in order | LW <b>R1</b> , 45 (R2)<br>ADD R5,R6,R7<br>SUB R8,R6,R7<br>OR R9, <b>R1</b> , R7  | No action required because the read of R1 by<br>OR occurs in the second half of the ID phase,<br>while the write of the loaded data occurred in<br>the first half. |  |



# **Load Interlock Detection**

Pipeline stall is needed when a load instruction is followed by the an instruction that read the yet-to-be-loaded register

□ The load interlock conditions for RAW hazards are:

| Opcode field of ID/EX<br>(ID/EX.IR <sub>05</sub> ) | Opcode field of IF/ID<br>(IF/ID. IR <sub>05</sub> ) | Matching operand fields          |  |
|----------------------------------------------------|-----------------------------------------------------|----------------------------------|--|
| Load                                               | Register-register ALU                               | ID/EX. IR 1115 == IF/ID.IR 610   |  |
| Load                                               | Register-register ALU                               | ID/EX. IR 1115 == IF/ID. IR 1115 |  |
| Load                                               | Load, store, ALU imm., or branch                    | ID/EX. IR 1115 == IF/ID.IR 610   |  |

- □ Control logic is simple combinational circuit with input from ID/EX and IF/ID
- Once the hazard is detected the control unit must insert the pipeline stall and prevent the instructions in the IF and ID stages from advancing
- □ Since all control logic is derived from the data stationary, stalling the pipeline is simply by setting the ID/EX portion to zero (matching the NOP instruction)
- In case of a stall, the contents of the IF/ID registers will be re-circulated to hold the stalled instruction



### **Data Forwarding Logic**

| Pipeline<br>register<br>containing<br>source<br>instruction | Opcode of<br>source<br>instruction | Pipeline<br>register<br>containing<br>destination<br>instruction | Opcode of<br>destination<br>instruction                         | Destination<br>of the<br>forwarded<br>result | Comparison (if<br>equal then<br>forward)                 |
|-------------------------------------------------------------|------------------------------------|------------------------------------------------------------------|-----------------------------------------------------------------|----------------------------------------------|----------------------------------------------------------|
| EX/MEM                                                      | Register-<br>register ALU          | ID/EX                                                            | Register-register ALU,<br>ALU immediate, load,<br>store, branch | Top ALU<br>input                             | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>610</sub>  |
| EX/MEM                                                      | Register-<br>register ALU          | ID/EX                                                            | Register-register ALU                                           | Bottom ALU<br>input                          | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>1115</sub> |
| MEM/WB                                                      | Register-<br>register ALU          | ID/EX                                                            | Register-register ALU,<br>ALU immediate, load,<br>store, branch | Top ALU<br>input                             | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>610</sub>  |
| MEM/WB                                                      | Register-<br>register ALU          | ID/EX                                                            | Register-register ALU                                           | Bottom ALU<br>input                          | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>1115</sub> |
| EX/MEM                                                      | ALU<br>immediate                   | ID/EX                                                            | Register-register ALU,<br>ALU immediate load,<br>store, branch  | Top ALU<br>input                             | EX/MEM.IR 1620 ==<br>ID/EX.IR 610                        |
| EX/MEM                                                      | ALU<br>immediate                   | ID/EX                                                            | Register-register ALU                                           | Bottom ALU<br>input                          | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>1115</sub> |
| MEM/WB                                                      | ALU<br>immediate                   | ID/EX                                                            | Register-register ALU,<br>ALU immediate load,<br>store, branch  | Top ALU<br>input                             | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>610</sub>  |
| MEM/WB                                                      | ALU<br>immediate                   | ID/EX                                                            | Register-register ALU                                           | Bottom ALU<br>input                          | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>1115</sub> |
| MEM/WB                                                      | Load                               | ID/EX                                                            | Register-register ALU,<br>ALU immediate load,<br>store, branch  | Top ALU<br>input                             | EX/MEM.IR 1620 ==<br>ID/EX.IR 610                        |
| MEM/WB                                                      | load                               | ID/EX                                                            | Register-register ALU                                           | Bottom ALU<br>input                          | EX/MEM.IR <sub>1620</sub> ==<br>ID/EX.IR <sub>1115</sub> |



## Conclusion

#### □ <u>Summary</u>

- ➔ Pipeline Hazards
  - Structural, data and control hazards
- ➔ Data Hazards
  - Forwarding techniques for simple data hazards resolution
  - Data hazards classifications and detection logic
  - Load-caused pipeline stalls and how to limit their scope
  - Compiler-based instruction scheduling to avoid pipeline stalls
  - Implementation of data hazard detection and forwarding logic

#### ❑ <u>Next Lecture</u>

- ➔ Pipeline control hazards
- ➔ Pipelining and exception handling

Reading assignment includes Appendix A.2 & A.3 in the textbook

