1 Learning Outcomes
TODO
TODO
🎥 Lecture Video
From earlier:
Data hazard: Instructions have data dependencies, and some instructions must wait for previous instructions to complete—otherwise outdated values would be used in computation.
Data hazards occur because instructions read from and write to the same registers and memory. From P&H 4.6:
Suppose you found a sock at the folding station for which no match existed. One possible strategy is to run down to your room and search through your clothes bureau to see if you can find the match. Obviously, while you are doing the search, loads that have completed drying are ready to fold and those that have finished are ready to dry.
In this section, we discuss how the five-stage pipelined processor can be modified to mitigate performance hits due to data hazards.
Consider the following waterfall diagram in Table 1. The add and sub instructions have a data hazard because the former writes to and the latter reads from register s0.
Table 1:Example 1. Data hazard.
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add | IF | ID | EX | MEM | WB | | | | |
| sub | | IF | ID | EX | MEM | WB | | | |
| | | | IF | ID | EX | MEM | WB | | |
The sub instruction must read the updated value of s0 after the add instruction completes. In cycle 5, the add instruction writes to register s0. However, in cycle 3, sub reads from register s0, which gets the stale value of s0, before add has updated it. Then sub performs the incorrect subtraction of this stale value before writing the incorrect result.
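The timing argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming an unstalled five-stage pipeline in which an instruction fetched in cycle c occupies stage number k in cycle c + k; the names `stage_cycle` and `reads_stale_value` are hypothetical helpers, not part of any course tool. It ignores the case where the read and the write fall in the same cycle, which the RegFile discussion later in this section covers.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_cycle(fetch_cycle, stage):
    """Cycle in which an instruction fetched in `fetch_cycle` occupies `stage` (no stalls)."""
    return fetch_cycle + STAGES.index(stage)

def reads_stale_value(producer_fetch, consumer_fetch):
    """True if the consumer's ID-stage read comes before the producer's WB-stage write."""
    return stage_cycle(consumer_fetch, "ID") < stage_cycle(producer_fetch, "WB")

# add is fetched in cycle 1, sub in cycle 2 (as in Table 1).
add_fetch, sub_fetch = 1, 2
print(stage_cycle(add_fetch, "WB"))             # cycle in which add writes s0
print(stage_cycle(sub_fetch, "ID"))             # cycle in which sub reads s0
print(reads_stale_value(add_fetch, sub_fetch))  # whether sub sees the old s0
```

Because the read cycle (3) is earlier than the write cycle (5), the helper reports a stale read for this pair.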
2 Stalling
To resolve the data hazard in Table 1, we can stall the pipeline until resources are “ready,” i.e., add has written the correct value to register s0. Pipeline stalls, or bubbles, are effectively “no-ops” where the affected pipeline stages do nothing.
Table 2 illustrates a three-stall solution: sub now reads register s0 in its ID stage in cycle 6, after add has written the updated value in cycle 5.
Table 2:Example 1: Resolving data hazards with stalls. A dash (–) indicates that the pipeline is flushed and affected instructions do “nothing.”
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add | IF | ID | EX | MEM | WB | | | | |
| sub | | IF | – | – | – | – | | | |
| | | | – | – | – | – | – | | |
| | | | | – | – | – | – | – | |
| sub | | | | | IF | ID | EX | MEM | WB |
Because performance suffers with stalling, we will discuss ways to avoid stalling where possible (though it is always a good last resort).
2.1 Implementing Stalls
The details in this subsection are out of scope. For more information, read P&H 4.8.
Implementing stalls in hardware requires control and extra pipeline state to prevent unintended state changes in stalled stages, e.g. writes to the program counter, register, or memory.
One approach described in P&H 4.8 is a hazard detection unit. For data hazards, this detection unit can be implemented in the ID stage to determine if the source registers of this instruction depend on the destination register of instruction(s) still in the pipeline.[1] To stall an instruction, we could deassert all control signals (by setting them to 0) so that when the instruction passes through later stages, the stages effectively do nothing.[2]
We illustrate this in Table 2, where in cycle 2, the hazard detection unit detects that the instruction in the ID stage, sub, has a source register that depends on the add instruction. The hazard detection unit then bubbles nops through the pipeline and preserves the sub instruction until it can be safely completed.[3]
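The check performed by a hazard detection unit can be sketched in software. The following is a minimal sketch under stated assumptions, not the P&H circuit: `InFlight`, `must_stall`, and `control_for_id` are hypothetical names, registers are identified by their RISC-V numbers (s0 is x8, t0 is x5), and the second operand of sub is made up for illustration because the notes do not give it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InFlight:
    """Hypothetical summary of an older instruction held in a later pipeline register."""
    rd: Optional[int]   # destination register number, or None if it writes no register
    reg_write: bool     # True if the instruction will write rd in its WB stage

def must_stall(rs1, rs2, in_flight):
    """True if the instruction in ID reads a register that an older,
    still-in-flight instruction has yet to write."""
    # x0 is hard-wired to zero in RISC-V, so it never causes a data hazard.
    sources = {r for r in (rs1, rs2) if r is not None and r != 0}
    return any(older.reg_write and older.rd in sources for older in in_flight)

def control_for_id(stall):
    """On a stall: hold the PC and IF/ID, and send a bubble (all-zero control) to EX."""
    if stall:
        return {"pc_write": False, "if_id_write": False, "bubble_to_ex": True}
    return {"pc_write": True, "if_id_write": True, "bubble_to_ex": False}

S0, T0 = 8, 5
add_in_ex = InFlight(rd=S0, reg_write=True)        # add writes s0
stall = must_stall(rs1=S0, rs2=T0, in_flight=[add_in_ex])  # sub reads s0 (rs2 is illustrative)
print(stall, control_for_id(stall))
```

In this sketch the stall decision and the resulting control signals are kept separate, which mirrors the split between detecting the dependency and freezing the front of the pipeline.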
3 RegFile: Write-Then-Read
Consider the waterfall diagram in Table 3. Does the dependency between add and sw incur a data hazard?
Table 3:Example 2. Data hazard...?
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add | IF | ID | EX | MEM | WB | | | | |
| | | IF | ID | EX | MEM | WB | | | |
| | | | IF | ID | EX | MEM | WB | | |
| sw | | | | IF | ID | EX | MEM | WB | |
| | | | | | IF | ID | EX | MEM | WB |
What is happening in cycle 5? If we are assuming our original RegFile design, then the add instruction in the WB stage only sets up the MUX, so that the write to t0 occurs at the next rising clock edge, i.e., the start of cycle 6. This would mean that in cycle 5, the sw instruction in the ID stage would indeed read a stale value, causing a data hazard.[4]
The RISC-V five-stage pipeline therefore “ups” the hardware requirement on the register file. We leverage the high speed of the register file (100 ps for each of read/write) to assume that the hardware unit supports write-then-read:
- WB stage: the instruction updates the value in the first half of the cycle, e.g., on the falling edge.
- ID stage: the instruction reads the new value.
If we assume our RegFile supports write-then-read, then in cycle 5, the read of the sw instruction in the ID stage delivers what is written by the add instruction in the WB stage, so there is no data hazard.
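The difference between the two register file behaviors can be illustrated with a toy model. This is a sketch under stated assumptions: the class `RegFile` and its `cycle` method are hypothetical, one call stands for one clock cycle, and the value 42 is made up for illustration. Register t0 is x5 in RISC-V.

```python
class RegFile:
    """Toy 32-entry register file; one call to `cycle` models one clock cycle."""

    def __init__(self, write_then_read):
        self.regs = [0] * 32
        self.write_then_read = write_then_read

    def cycle(self, read_reg, write_reg=None, write_val=0):
        """Perform this cycle's WB write and ID read; return the value ID sees."""
        if self.write_then_read:
            # The write lands in the first half of the cycle, the read in the second.
            self._write(write_reg, write_val)
            return self.regs[read_reg]
        # Original design: the read sees the old contents; the write lands afterwards.
        value = self.regs[read_reg]
        self._write(write_reg, write_val)
        return value

    def _write(self, reg, val):
        if reg is not None and reg != 0:   # x0 stays hard-wired to zero
            self.regs[reg] = val

T0 = 5
original = RegFile(write_then_read=False)
upgraded = RegFile(write_then_read=True)
# Cycle 5: add (in WB) writes t0 while sw (in ID) reads t0.
print(original.cycle(read_reg=T0, write_reg=T0, write_val=42))  # stale value
print(upgraded.cycle(read_reg=T0, write_reg=T0, write_val=42))  # freshly written value
```

Both models end the cycle with the same register contents; they differ only in what the ID-stage read observes during that cycle.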
Let’s revisit our earlier simple example. If we assume the RegFile supports write-then-read, then we can just stall two cycles, as shown in Table 4. In the first half of cycle 5, the add instruction writes to register s0; in the second half, the sub instruction reads s0.
Table 4:Example 1: Resolving data hazards with stalls and an assumption that the register file supports write-then-read in the same cycle. A dash (–) indicates that the pipeline is flushed and affected instructions do “nothing.”
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add | IF | ID | EX | MEM | WB | | | | |
| sub | | IF | – | – | – | – | | | |
| | | | – | – | – | – | – | | |
| sub | | | | IF | ID | EX | MEM | WB | |
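The stall counts in Table 2 and Table 4 follow from a short calculation. Below is a minimal sketch, assuming no forwarding and a consumer that reads its operand in ID; the function name `stalls_needed` and the notion of `distance` (1 means the consumer immediately follows the producer) are introduced here for illustration only.

```python
def stalls_needed(distance, write_then_read):
    """Stall cycles for a consumer issued `distance` instructions after its
    producer, in a five-stage pipeline without forwarding."""
    # A producer fetched in cycle c writes in WB in cycle c + 4.
    # A consumer fetched in cycle c + distance would read in ID in cycle c + distance + 1.
    # With write-then-read the read may share the WB cycle; otherwise it must come one cycle later.
    earliest_read_offset = 4 if write_then_read else 5
    unstalled_read_offset = distance + 1
    return max(0, earliest_read_offset - unstalled_read_offset)

# add followed immediately by sub (Example 1).
print(stalls_needed(1, write_then_read=False))  # stalls with the original RegFile
print(stalls_needed(1, write_then_read=True))   # stalls with write-then-read
# add followed three instructions later by sw (Example 2).
print(stalls_needed(3, write_then_read=False))
print(stalls_needed(3, write_then_read=True))
```

For the back-to-back pair this gives three stalls with the original RegFile and two with write-then-read, matching the two tables; for the add/sw pair it gives one and zero, matching the discussion of cycle 5.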
4 Forwarding
So far, we have discussed some solutions to some hazards by (1) specifying appropriate hardware requirements, and, if all else fails, (2) stalling the pipeline until there are no hazards.
However, we observe that with data hazards, we don’t need to wait for the instruction to complete before trying to resolve the data hazard. In other words, the data in question is ready much earlier than the WB stage of the earlier instruction.
Consider the example in Table 5, which has two data hazards because the sub and or instructions depend on the result of the add instruction writing to register s0.
Table 5:Example 3.
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add | IF | ID | EX | MEM | WB | | | | |
| sub | | IF | ID | EX | MEM | WB | | | |
| or | | | IF | ID | EX | MEM | WB | | |
The result of adding t0 and t1 is ready at the beginning of cycle 4, once the add instruction completes the EX stage in cycle 3.
In other words, as soon as the ALU creates the sum for the add instruction, we could add extra hardware to supply it as the input for the sub instruction and the or instruction.
Wiring more connections in the datapath to use results as soon as they are computed is a process known as forwarding or bypassing. Instead of waiting for the value to be written into the RegFile, we can grab the operand directly from the pipeline register of a later stage.
We use Figure 2 to describe at a high-level what data is forwarded.

Figure 2:Forwarding adds extra connections between pipeline registers and other components in the datapath.
Notes:
- At the beginning of cycle 4, the ALU result from the add instruction is forwarded from its EX/MEM pipeline register directly to the ALU (for the sub instruction’s EX stage).
- At the beginning of cycle 5, the ALU result from the add instruction is forwarded from its MEM/WB pipeline register directly to the ALU (for the or instruction’s EX stage).
- The value of register s0 is still updated in cycle 5, from the stale value 5 to the new value 9. The ID stages of the sub and or instructions still read the stale value of register s0 in cycles 3 and 4, respectively. What matters is that the correct operands are fed into the ALU during the EX stage for both of these instructions.
- Note that with hardware forwarding, we do not need to update the waterfall diagram in Table 5 because no stalls occur.
4.1 Implementing Forwarding
Forwarding is implemented by adding bypass wires between pipeline registers and other components, inserting muxes, and including additional control logic.
Figure 3 shows an implementation of the EX/MEM forwarding to resolve the add and sub data hazard in Table 5. The forwarding path (i.e., the bypass) connects the ALU output stored in the EX/MEM pipeline register to the ALU input muxes. These two muxes are now wider to account for the additional bypass option. The control signals ASel and BSel must now also use the instruction bits to determine if the bypass should be used for either input to the ALU.

Note that in this course, we discuss two bypasses: from the EX/MEM pipeline registers (e.g., in Table 5, to resolve the add/sub data hazard) and from the MEM/WB pipeline registers (to resolve the add/or data hazard). Figure 4 shows how the B input to the ALU must select among the data from the ID/EX pipeline registers, the EX/MEM pipeline registers, and the MEM/WB pipeline registers.

Figure 4: Forwarding bypasses for the ALU’s B input signal. For simplicity, we do not draw the bypasses for the A input signal, though they are certainly needed.
We have only shown one in Figure 3; we omit the full MEM/WB bypass diagram, leaving this for you to work out.
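The selection made by the wider ALU input muxes can be sketched as a small function. This is a sketch under stated assumptions rather than the course's control logic: `forward_select` and its string labels are hypothetical, it follows the common convention that the newer EX/MEM value takes priority over MEM/WB and that x0 is never forwarded, and it ignores loads, whose data is not yet available in EX/MEM. The destination register used for sub (x7) is made up for illustration.

```python
def forward_select(rs, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Pick the source for one ALU operand of the instruction currently in EX."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"   # newest value: produced by the instruction one ahead
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"   # value produced by the instruction two ahead
    return "ID/EX"        # no hazard: use the value read from the RegFile in ID

S0 = 8   # s0 is register x8
# Cycle 4: sub is in EX, add (rd = s0) sits in EX/MEM.
print(forward_select(S0, ex_mem_rd=S0, ex_mem_reg_write=True,
                     mem_wb_rd=0, mem_wb_reg_write=False))
# Cycle 5: or is in EX, sub (illustrative rd = x7) is in EX/MEM, add is in MEM/WB.
print(forward_select(S0, ex_mem_rd=7, ex_mem_reg_write=True,
                     mem_wb_rd=S0, mem_wb_reg_write=True))
```

The same function would be evaluated once per ALU input, which corresponds to the separate A and B input muxes described above.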
5 Load Data Hazards
Watch lecture/video for now! Thanks.
How do we check destination registers? The hazard detection unit checks the pipeline registers. For example, if register rd specified in the ID/EX pipeline registers is one of the source registers for the instruction in the ID stage, then stall the instruction in the ID stage.
If the instruction in the ID stage is stalled, then the instruction in the IF stage must also be stalled, etc. We can accomplish this by (1) preventing the PC register from incrementing, and (2) preventing the IF/ID pipeline register from changing. From P&H 4.8: “It’s as if you restart the washer with the same clothes, and let the dryer continue tumbling empty. Of course, like the dryer, the back half of the pipeline starting with the EX stage must be doing something; what it is doing is executing instructions that have no effect: nops.”
We note this hazard is not a structural hazard. After all, the RegFile design does not prevent add and sw from reading/writing to the same register in the same cycle, because there are sufficient input ports. However, what is concerning is that the value sw reads must be the correct value that add writes.