# CMCS 611-101 Advanced Computer Architecture

### *Lecture 8*

### **Control Hazards and Exception Handling**

September 30, 2009

www.csee.umbc.edu/~younis/CMSC611/CMSC611.htm



### **Lecture's Overview**

#### $\Box$ *Previous Lecture:*

### $\rightarrow$  Data Hazards

- Forwarding techniques for simple data hazards resolution
- Data hazards classifications and detection logic
- Load-caused pipeline stalls and how to limit their scope
- Compiler-based instruction scheduling to avoid pipeline stalls
- Implementation of data hazard detection and forwarding logic

### *This Lecture*

- $\rightarrow$  Control hazards
- $\rightarrow$  Pipelining and exception handling



### **Pipeline Hazards**

 $\Box$  Pipeline hazards are cases that affect instruction execution semantics and thus need to be detected and corrected

□ Hazards types

*Structural hazard:* attempt to use <sup>a</sup> resource two different ways at same time time

- $\rightarrow$  E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)
- $\rightarrow$  Single memory for instruction and data

*Data hazard:* attempt to use item before it is ready

- $\rightarrow$  E.g., one sock of pair in dryer and one in washer; can't fold until get sock from washer through dryer
- $\rightarrow$  instruction depends on result of prior instruction still in the pipeline

*Control hazard:* attempt to make a decision before condition is evaluated

 $\checkmark$   $\;\;\blacktriangleright\; E.g.$ , washing football uniforms and need to get proper detergent level; need to see after dryer before next load in

 $\rightarrow$  branch instructions

**□ Hazards can always be resolved by waiting** 

### **Control Hazard**

- $\Box$  Control hazards are caused by the uncertainty of the execution path, branch taken or untaken
- $\Box$  If the branch is to be taken, the PC is not normally changed until the end of the MEM stage
- $\Box$  The simplest way to deal with control hazards is to stall the pipeline



- $□$  Impact: 2 clock cycles wasted per branch instruction  $\Rightarrow$  slow
- $\square$  Performance penalty can be limited by:
	- $\rightarrow$  move up decision to 2<sup>nd</sup> stage by adding hardware to check registers as being read (*MIPS allows only comparison with zero in the condition*)
	- $\rightarrow$  Compute the address of the branch target earlier in the pipeline



### **Enhanced Pipeline Datapath**



## **Impact of Modifications**

 $\square$  Condition evaluated and taken address calculated at ID stage



 $\square$  2 clock cycles per branch instr. (1 cycle wasted)  $\Rightarrow$  still slow

\* Figure is courtesy of Dave Patterson



## **Control Hazard Solution (I)**

- $\square$  Predict: guess one direction then back off if wrong
- $\Box$  Predict untaken: machine state must be updated to ensure correct semantics
- $\square$  Impact: 1 clock cycles per branch instruction if right, 2 if wrong





### **Branch Behavior in Programs**



 $\geq 60\%$  of forward and 15% of backward branches are taken (mainly loops)

¾67% of the branches (forward and backward) are taken on average

# **Control Hazard Solution (II)**

 Redefine branch behavior (takes place after next instr.) "delayed branch" **□ Compiler optimization plays an essential role in filling delay slots** 



 $\Box$  Impact: 0 clock cycles per branch instruction if can find instructions to put in the delay "slot"

\* Slide is courtesy of Dave Patterson



CMCS 611, Advanced Computer Architecture 10

## **Branch-Delay Scheduling Requirements**



 $\sqcup$  $\Box$  The limitation on delayed-branch scheduling arise from:

- 1. Restrictions on the instructions that are scheduled into the delay slots
- 2.. Ability to predict at compile-time whether a branch is likely to be taken
- $\Box$  To improve the ability of the compiler to fill branch delay slots with little undo penalty, a capability for cancellation (turning to no-op) is added
- $\Box$ Additional PC is needed to allow safe operation in case of interrupts



### **Performance of Branch-Delay**



 $\triangleright$  On average 30% of the branch delay slots are wasted On average 30% of the branch delay slots are wasted<br>(integer programs are worse than floating point)



### **Static Branch Prediction**

#### *Examination of program behavior*

- ¾ Assume branch is usually taken based on statistics but misprediction rate still range from 9%-59%
- $\triangleright$  Predict on branch direction forward/backward based on statistics and code generation convention

#### *Profile information from earlier program runs*



### **Profile-based Predictor Performance**



- ¾ Misprediction rate varies widely but is generally better for FP programs
- ¾ Actual performance depends on both prediction accuracy & branch frequency

### **Performance of MIPS Integer Pipeline**



Mohamed Younis

- $\triangleright$  Performance is based on a perfect memory system with all cache hits
- $\triangleright$  Basic delayed branch with cancellation support and thus costing one delay cycle
- $\triangleright$  Integer programs exhibit an average of 0.06 branch stalls per instruction and 0.05 load stalls per instruction
- **≻ Average CPI from MIPS** memory is 1.11
- ¾ SPECint92 performance can be enhanced by a factor of  $5/1.11 = 4.5$

# **Exception Types and Requirements**

### **Famous Types of Exceptions**

- ¾ I/O device request
- **≻ Breakpoint**
- $\triangleright$  Integer arithmetic overflow
- $\triangleright$  FP arithmetic anomaly
- ¾ Page fault

### **Requirements Characterization**

*Synchronous vs. asynchronous*

- ¾ Misaligned memory accesses
- ¾ Memory-protection violation
- r arithmetic overflow  $\longrightarrow$  Undefined instruction
	- ¾ Privilege violation
	- $\triangleright$  Hardware and power failure

- ¾ Async. Excep. are caused by I/O devices and allows completion of current instr.
- *User requested vs. coerced*

 $\triangleright$  requested exceptions are predictable and easier to handle

- User *Maskable vs. unmaskable*
- *Within vs. between instructions*
	- $\triangleright$  Exceptions occurring within instructions are synchronous
	- $\triangleright$  It is harder to deal with exceptions that occur within instructions
- *Resume vs. terminate*
	- $\triangleright$  it is easier to implement exceptions that terminate program execution



## **Exception Categorization (Example)**



- $\triangleright$  Exceptions occurring within an instruction are the most difficult to handle since they require background saving of the program state
- $\triangleright$  Pipelines is called "restartable", if it allows an instruction to be restarted after exception handling



## **Stopping & Restarting Execution**

- ¾ $\triangleright$  Some exception, e.g. page fault, takes place while the instruction is in the MEM stage, requiring the instruction to be restarted
- ¾ When an exception occurs, the pipeline control can do the following:
	- 1.. Force a trap instruction into the pipeline on the next IF
	- 2. Until the trap is taken, turn off all writes for the faulting instruction and all the instructions that follow
	- 3. After the exception-handling routine in the operating system receives control, it saves the PC of the faulting instruction
- **E** When using delayed branching, it is impossible to recreate the state using a single PC since the instructions may not be sequentially related
- $\blacktriangleright$  A pipeline that allows instructions before the faulting one to complete and those after it to restart, is said to have *precise exception*
- $\blacktriangleright$  Ideally faulting instructions will not change the state of the machine before the exception occur, however if it does, exception handling gets complex
- ¾ $\triangleright$  Supporting precise exceptions simplifies the operating system interface and is required in machines that implements demand paging



### **Exceptions in MIPS**



- ¾ Multiple exceptions might occur since multiple instructions are executing (LW followed by DIV might cause page fault and an arith. exceptions in same cycle)
- ¾ Exceptions can even occur out of order (e.g. a page fault of instr. memory can occur earlier than a page fault of data memory caused by the proceeding instruction in the pipeline)

Pipeline exceptions has to be handled in order of execution of faulting instructions not according to the time they occur



## **Precise Exception Handling**

### **The MIPS Approach:**

- ¾ Hardware posts all exceptions caused by a given instruction in a status vector associated with the instruction
- $\triangleright$  The exception status vector is carried along as the instruction goes down the pipeline
- ¾ Once an exception indication is set in the exception status vector, any control signal that may cause a data value to be written is turned off
- ¾ Upon entering the WB stage the exception status vector is checked and the exceptions, if any, will be handled according to the time they occurred
- $\triangleright$  Allowing an instruction to continue execution till the WB stage is not a problem since all write operations for that instruction will be disallowed

### **Notes:**

- ¾ The MIPS machine design does not allow exception to occur at the WB stage
- ¾ All write operations in the MIPS pipeline are in late stages
- ¾ Machines that allow writing in early pipeline stages are difficult to handle since exceptions can occur after the machine state has been already changed



## **Instruction Set Complications**

#### **Early-Write Instructions**

- $\triangleright$  An instruction that has a single write at the last stage of the pipeline, e.g. MIPS, can easily handle exceptions
- ¾ Machines that allow for multiple writes, e.g. VAX auto-increment, during instr. execution, usually require a capability to rollback the effect of an instruction
- ¾ Instructions that update the memory state during execution, e.g. string copy, temporary registers are saved and restored to allow instruction to continue

### **Branching mechanisms**

¾ Code based branching set by non-branch instructions requires special care if an exception happen prior to executing the branch instruction

### **Multi-cycle operations**

 $\triangleright$  In machines with widely different instruction cycles, an instruction can make multiple writes and cause complex data hazard, and thus exception become very hard to handle

### If architects realize the relationship between instruction set design and pipelining, they can enable efficient pipelining



## **Conclusion**

#### $\Box$ *Summary*

### $\rightarrow$  Control hazards

- •Limiting the effect of control hazards via branch prediction and delay
- $\bullet$ Static branch prediction techniques and performance
- •Branch delay issues and performance

### $\rightarrow$  Exceptions handling

- •Categorizing of exception based on types and handling requirements
- $\bullet$ Issues of stopping and restarting instructions in a pipeline
- $\bullet$ Precise exception handling and the conditions that enable it
- $\bullet$ Instructions set effects on complicating pipeline design

### *Next Lecture*

- $\rightarrow$  Pipelining floating point operations
- → An example pipeline: MIPS R4000

### Reading assignment includes Appendix A.4 & A.5 in the textbook

