1 Learning Outcomes

From earlier:

Data hazard: Instructions have data dependencies, and some instructions must wait for previous instructions to complete—otherwise outdated values would be used in computation.

Data hazards occur because instructions read from and write to the same registers and memory. From P&H 4.6:

Suppose you found a sock at the folding station for which no match existed. One possible strategy is to run down to your room and search through your clothes bureau to see if you can find the match. Obviously, while you are doing the search, loads that have completed drying are ready to fold and those that have finished are ready to dry.

In this section, we discuss how the five-stage pipelined processor can be modified to mitigate performance hits due to data hazards.

Consider the following waterfall diagram in Table 1. The add and sub instructions have a data hazard because the former writes to and the latter reads from register s0.

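Table 1: Example 1. Data hazard.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add s0 t0 t1 | IF | ID | EX | MEM | WB | | | | |
| sub t2 s0 t0 | | IF | ID | EX | MEM | WB | | | |
| or t3 t4 t5 | | | IF | ID | EX | MEM | WB | | |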

The sub instruction must read the updated value of s0 after the add instruction completes. In cycle 5, the add instruction writes to register s0. However, in cycle 3, sub reads from register s0, which gets the stale value of s0, before add has updated it. Then sub performs the incorrect subtraction of this stale value before writing the incorrect result.
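The timing argument above can be checked mechanically. The following Python sketch is illustrative only. It assumes an ideal five-stage pipeline that issues one instruction per cycle with no stalls or forwarding, and it assumes a register read in ID sees a write only if the writer's WB stage happened in an earlier cycle. The names `Instr`, `id_cycle`, `wb_cycle`, and `find_stale_reads` are hypothetical helpers, not part of any real tool.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str              # assembly text, used only for reporting
    rd: Optional[str]      # destination register (None if nothing is written)
    srcs: Tuple[str, ...]  # source registers read in the ID stage


def id_cycle(index: int) -> int:
    # Instruction `index` (0-based) is fetched in cycle index + 1
    # and decoded one cycle later.
    return index + 2


def wb_cycle(index: int) -> int:
    # WB is three stages after ID (ID, EX, MEM, WB).
    return index + 5


def find_stale_reads(program):
    """Return (writer, reader, register) triples where the reader decodes too early."""
    stale = []
    for j, reader in enumerate(program):
        for reg in reader.srcs:
            # Walk backwards to the most recent earlier writer of `reg`.
            for i in range(j - 1, -1, -1):
                if program[i].rd == reg:
                    # Assumption: the read is stale unless WB was in an earlier cycle.
                    if id_cycle(j) <= wb_cycle(i):
                        stale.append((program[i].text, reader.text, reg))
                    break
    return stale


table1 = [
    Instr("add s0, t0, t1", "s0", ("t0", "t1")),
    Instr("sub t2, s0, t0", "t2", ("s0", "t0")),
    Instr("or t3, t4, t5", "t3", ("t4", "t5")),
]

for writer, reader, reg in find_stale_reads(table1):
    print(f"{reader!r} reads a stale {reg} written by {writer!r}")
```

For the program in Table 1, the only pair reported is add and sub on register s0: sub decodes in cycle 3, while add does not reach WB until cycle 5.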

2 Stalling

To resolve the data hazard in Table 1, we can stall the pipeline until resources are “ready,” i.e., add has written the correct value to register s0. Pipeline stalls, or bubbles, are effectively “no-ops” where affected pipeline stages do nothing.

Table 2 illustrates a three-stall solution, in which sub is guaranteed to read the correctly updated value of register s0 during its ID stage in cycle 6.

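Table 2: Example 1: Resolving data hazards with stalls. A dash (–) indicates that the pipeline is flushed and affected instructions do “nothing.”

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add s0, t0, t1 | IF | ID | EX | MEM | WB | | | | |
| sub → nop | | IF | nop | nop | | | | | |
| sub t2, s0, t0 | | | | | IF | ID | EX | MEM | WB |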

Because performance suffers with stalling, we will discuss ways to avoid stalling where possible (though it is always a good last resort).
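To get a feel for the cost, the sketch below computes the cycle in which each instruction can decode when stalls are the only remedy. It is a rough model under stated assumptions, not a description of the hardware: instructions issue in order, there is no forwarding, and (as in Table 2) a dependent instruction may decode only in the cycle after its producer's WB stage. The function name `id_cycles_with_stalls` and the `(rd, srcs)` tuple encoding are hypothetical.

```python
def id_cycles_with_stalls(program):
    """program: list of (rd, srcs) tuples in program order.

    Returns the cycle in which each instruction performs its ID stage.
    """
    id_cycles = []
    wb_of_last_writer = {}  # register name -> WB cycle of its most recent writer
    for rd, srcs in program:
        # In-order issue: decode at least one cycle after the previous instruction.
        earliest = id_cycles[-1] + 1 if id_cycles else 2
        for reg in srcs:
            if reg in wb_of_last_writer:
                # Assumption: the value is readable the cycle after WB.
                earliest = max(earliest, wb_of_last_writer[reg] + 1)
        id_cycles.append(earliest)
        if rd is not None:
            wb_of_last_writer[rd] = earliest + 3  # ID -> EX -> MEM -> WB
    return id_cycles


example1 = [
    ("s0", ("t0", "t1")),  # add s0, t0, t1
    ("t2", ("s0", "t0")),  # sub t2, s0, t0
    ("t3", ("t4", "t5")),  # or  t3, t4, t5
]

ids = id_cycles_with_stalls(example1)
no_stall_ids = [i + 2 for i in range(len(example1))]
stalls = [actual - ideal for actual, ideal in zip(ids, no_stall_ids)]
print("ID cycles:", ids)
print("delay per instruction:", stalls)
print("last WB cycle:", ids[-1] + 3, "vs", no_stall_ids[-1] + 3, "without stalls")
```

Under these assumptions sub decodes in cycle 6 instead of cycle 3, and every later instruction inherits that three-cycle delay.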

2.1 Implementing Stalls

The details in this subsection are out of scope. For more information, read P&H 4.8.

Implementing stalls in hardware requires extra control logic and pipeline state to prevent unintended state changes in stalled stages, e.g., writes to the program counter, registers, or memory.

One approach described in P&H 4.8 is a hazard detection unit. For data hazards, this detection unit can be implemented in the ID stage to determine if the source registers of this instruction depend on the destination registers of instruction(s) still in the pipeline.[1] To stall an instruction, we could deassert all control signals (by setting them to 0) so that when the instruction passes through later stages, the stages effectively do nothing.[2]

We illustrate this in Table 2, where in cycle 2, the hazard detection unit detects that the instruction in the ID stage, sub, has a source register that depends on the add instruction. The hazard detection unit then bubbles nops through the pipeline and preserves the sub instruction until it can be safely completed[3].
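The comparison performed by a hazard detection unit can be sketched in a few lines of Python. This is a software analogy under assumptions, not the P&H circuit: `hazard_detect`, `pc_write`, `if_id_write`, and `insert_bubble` are hypothetical names, registers are given by name, and `in_flight_rds` lists the destination registers held in the later pipeline registers (with `None` for a bubble or an instruction that writes no register).

```python
ZERO_NAMES = ("x0", "zero")  # x0 is hardwired to zero, so it never carries a dependency


def hazard_detect(id_srcs, in_flight_rds):
    """Decide whether the instruction in ID must stall.

    id_srcs:       source registers of the instruction in the ID stage
    in_flight_rds: rd fields of the pipeline registers being checked
    """
    stall = any(
        rd is not None and rd not in ZERO_NAMES and rd in id_srcs
        for rd in in_flight_rds
    )
    return {
        "pc_write": not stall,      # hold the PC so the next fetch repeats
        "if_id_write": not stall,   # hold IF/ID so the stalled instruction stays in ID
        "insert_bubble": stall,     # send zeroed control signals (a nop) into EX
    }


# sub t2, s0, t0 is in ID while add s0, t0, t1 is still in flight.
print(hazard_detect(("s0", "t0"), ["s0", None, None]))
# or t3, t4, t5 has no dependency on the instructions ahead of it.
print(hazard_detect(("t4", "t5"), ["t2", "s0", None]))
```

Holding the PC and the IF/ID register while injecting a bubble mirrors the behavior described in the footnotes: the front of the pipeline repeats its work and the back half executes a nop.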

3 RegFile: Write-Then-Read

Consider the waterfall diagram in Table 3. Does the dependency between add and sw incur a data hazard?

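Table 3: Example 2. Data hazard...?

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add t0 t1 t2 | IF | ID | EX | MEM | WB | | | | |
| lw t0 8(t3) | | IF | ID | EX | MEM | WB | | | |
| or t3 t4 t5 | | | IF | ID | EX | MEM | WB | | |
| sw t0 4(t6) | | | | IF | ID | EX | MEM | WB | |
| sll t6 t0 t3 | | | | | IF | ID | EX | MEM | WB |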

What is happening in cycle 5? If we are assuming our original RegFile design, then the add instruction in the WB stage only sets up the MUX, so that the write to t0 occurs at the next rising clock edge, i.e., the start of cycle 6. This would mean that in the same cycle 5, the sw instruction in the ID stage would indeed read a stale value, causing a data hazard.[4]

The RISC-V five-stage pipeline therefore “ups” the hardware requirement on the register file. We leverage the high speed of the register file (100 ps for each of read/write) to assume that the hardware unit supports write-then-read:

If we assume our RegFile supports write-then-read, then in cycle 5, the read of the sw instruction in the ID stage delivers what is written by the add instruction in the WB stage, so there is no data hazard.
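The difference between the two RegFile behaviors can be modeled in software. The sketch below is a simplified model under assumptions: the class `RegFile` and its `cycle` method are hypothetical, each call represents one clock cycle with at most one write, and unwritten registers read as 0 purely for convenience.

```python
class RegFile:
    def __init__(self, write_then_read):
        self.write_then_read = write_then_read
        self.regs = {}

    def cycle(self, write=None, reads=()):
        """Model one clock cycle.

        write: optional (register, value) pair from the WB stage
        reads: registers read by the ID stage in the same cycle
        """
        if write is not None and self.write_then_read:
            # Write in the first half of the cycle, so reads below see it.
            self.regs[write[0]] = write[1]
        values = [self.regs.get(reg, 0) for reg in reads]
        if write is not None and not self.write_then_read:
            # Original design: the write lands at the next clock edge,
            # so it is visible only from the following cycle onward.
            self.regs[write[0]] = write[1]
        return values


original = RegFile(write_then_read=False)
original.regs["t0"] = 1
improved = RegFile(write_then_read=True)
improved.regs["t0"] = 1

# Same cycle: WB writes t0 = 42 while ID reads t0.
stale = original.cycle(write=("t0", 42), reads=["t0"])
fresh = improved.cycle(write=("t0", 42), reads=["t0"])
print("original RegFile reads:", stale)
print("write-then-read RegFile reads:", fresh)
```

In this model the original design returns the old value in the shared cycle and the new value only one cycle later, while the write-then-read design returns the new value immediately.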

Let’s revisit our earlier simple example. If we assume the RegFile supports write-then-read, then we can just stall two cycles, as shown in Table 4. In the first half of cycle 5, the add instruction writes to register s0; in the second half, the sub instruction reads s0.

Table 4: Example 1: Resolving data hazards with stalls and an assumption that the register file supports write-then-read in the same cycle. A dash (–) indicates that the pipeline is flushed and affected instructions do “nothing.”

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add s0, t0, t1 | IF | ID | EX | MEM | WB | | | | |
| sub → nop | | IF | nop | | | | | | |
| sub t2, s0, t0 | | | | IF | ID | EX | MEM | WB | |

4 Forwarding

So far, we have discussed some solutions to some hazards by (1) specifying appropriate hardware requirements, and, if all else fails, (2) stalling the pipeline until there are no hazards.

However, we observe that with data hazards, we don’t need to wait for the instruction to complete before trying to resolve the data hazard. In other words, the data in question is ready much earlier than the WB stage of the earlier instruction.

Consider the example in Table 5, which has two data hazards because the sub and or instructions depend on the result of the add instruction writing to register s0.

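Table 5: Example 3.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| add s0 t0 t1 | IF | ID | EX | MEM | WB | | | | |
| sub t2 s0 t0 | | IF | ID | EX | MEM | WB | | | |
| or t6 s0 t3 | | | IF | ID | EX | MEM | WB | | |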

The result of adding t0 and t1 is ready at the beginning of cycle 4, once the add instruction completes the EX stage in cycle 3. In other words, as soon as the ALU creates the sum for the add instruction, we could add extra hardware to supply it as the input for the sub instruction and the or instruction.

Wiring more connections in the datapath to use results as soon as they are computed is a process known as forwarding or bypassing. Instead of waiting for the value to be written into the RegFile, we grab the operand directly from the pipeline register of a later stage.

We use Figure 2 to describe at a high-level what data is forwarded.

Figure 2:Forwarding adds extra connections between pipeline registers and other components in the datapath.

4.1 Implementing Forwarding

Forwarding is implemented by adding bypass wires between pipeline registers and other components, inserting muxes, and including additional control logic.

Figure 3 shows an implementation of the EX/MEM forwarding to resolve the add and sub data hazard in Table 5. The forwarding path (i.e., the bypass) connects the ALU output stored in the EX/MEM pipeline register to the ALU input muxes. These two muxes are now wider to account for the additional bypass option. The control signals ASel and BSel must now also use the instruction bits to determine if the bypass should be used for either input to the ALU.

"TODO"

Note that in this course, we discuss two bypasses: from the EX/MEM pipeline registers (e.g., in Table 5, to resolve the add/sub data hazard) and from the MEM/WB pipeline registers (to resolve the add/or data hazard). Figure 4 shows how the B input to the ALU must select among the data from the ID/EX pipeline registers, the EX/MEM pipeline registers, and the MEM/WB pipeline registers.

"TODO"

Figure 4: Forwarding bypasses for the ALU’s B input signal. For simplicity, we do not draw the bypasses for the A input signal, though they are certainly needed.

We have only shown the EX/MEM bypass in Figure 3; we omit the full MEM/WB bypass diagram, leaving this for you to work out.
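The selection made by the widened ALU input muxes can be sketched as a small Python function. This is an illustration under assumptions rather than the actual control logic: `forward_select` is a hypothetical name, registers are given by name, an rd of `None` means the instruction in that pipeline register writes no register, and the in-flight instructions are assumed to be ALU instructions whose results are already available (loads are not modeled here).

```python
ZERO_NAMES = ("x0", "zero")  # x0 always reads as zero, so it is never forwarded


def forward_select(rs, ex_mem_rd, mem_wb_rd):
    """Choose the source for one ALU input of the instruction in EX.

    rs:        source register the instruction in EX needs
    ex_mem_rd: destination register held in the EX/MEM pipeline register
    mem_wb_rd: destination register held in the MEM/WB pipeline register
    """
    if rs in ZERO_NAMES:
        return "ID/EX"
    if ex_mem_rd == rs:
        # The younger producer wins: its value is the most recent one.
        return "EX/MEM"
    if mem_wb_rd == rs:
        return "MEM/WB"
    return "ID/EX"  # no hazard: use the value read from the RegFile in ID


# Table 5, cycle 4: sub is in EX, add is in MEM.
print(forward_select("s0", ex_mem_rd="s0", mem_wb_rd=None))
# Table 5, cycle 5: or is in EX, sub is in MEM, add is in WB.
print(forward_select("s0", ex_mem_rd="t2", mem_wb_rd="s0"))
# sub's second operand t0 has no producer in flight.
print(forward_select("t0", ex_mem_rd="s0", mem_wb_rd=None))
```

Checking EX/MEM before MEM/WB matters when both in-flight instructions write the same register: the EX/MEM entry belongs to the later instruction in program order, so its result is the one the consumer should see.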

5 Load Data Hazards

Watch lecture/video for now! Thanks.

Footnotes
  1. How do we check destination registers? The hazard detection unit checks the pipeline registers. For example, if register rd specified in the ID/EX pipeline registers is one of the source registers for the instruction in the ID stage, then stall the instruction in the ID stage.

  2. If the instruction in the ID stage is stalled, then the instruction in the IF stage must also be stalled, etc. We can accomplish this by (1) preventing the PC register from incrementing, and (2) preventing the IF/ID pipeline register from changing. From P&H 4.8: “It’s as if you restart the washer with the same clothes, and let the dryer continue tumbling empty. Of course, like the dryer, the back half of the pipeline starting with the EX stage must be doing something; what it is doing is executing instructions that have no effect: nops.”

  3. We note that in Table 2, the sub instruction is really fetched in cycle 2, but its ID stage is delayed until clock cycle 6.

  4. We note this hazard is not a structural hazard. After all, the RegFile design does not prevent add and sw from reading/writing to the same register in the same cycle, because there are sufficient input ports. However, what is concerning is that the value sw reads must be the correct value that add writes.