1 And in Conclusion...

Pipelining: To pipeline the datapath, we separate it into 5 discrete stages, each performing a different function and accessing different resources on the way to executing an entire instruction. Recall the five stages. In the IF stage, we use the Program Counter to fetch the instruction from IMEM. In the ID stage, we split the instruction bits into their fields, generate the immediate, read the register values from the RegFile, and generate the control signals. Using these values and signals, we perform the necessary ALU operation in the EX stage. Any access to DMEM (not to be confused with the RegFile or IMEM) happens in the MEM stage. Finally, in the WB stage, we write the computed value back to the destination register in the RegFile.
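To make the division of labor concrete, here is a minimal Python sketch that walks one instruction at a time through the five stages. It is a toy model, not a real datapath: the pre-decoded tuple encoding `(op, rd, rs1, imm)`, the two supported operations, and every function name are assumptions made for illustration.

```python
# Toy model of one instruction moving through IF, ID, EX, MEM, WB.
# The tuple encoding and all names below are assumptions for illustration.

def stage_if(pc, imem):
    """IF: use the Program Counter to fetch the instruction from IMEM."""
    return imem[pc]

def stage_id(inst, regfile):
    """ID: split the instruction into fields and read the RegFile."""
    op, rd, rs1, imm = inst
    return {"op": op, "rd": rd, "rs1_val": regfile[rs1], "imm": imm}

def stage_ex(decoded):
    """EX: the ALU adds the register value and the immediate."""
    return decoded["rs1_val"] + decoded["imm"]

def stage_mem(decoded, alu_out, dmem):
    """MEM: only a load touches DMEM; other instructions pass the ALU result on."""
    if decoded["op"] == "lw":
        return dmem.get(alu_out, 0)
    return alu_out

def stage_wb(decoded, value, regfile):
    """WB: write the result to the destination register (register 0 stays zero)."""
    if decoded["rd"] != 0:
        regfile[decoded["rd"]] = value

def run_instruction(pc, imem, regfile, dmem):
    """Run the five stages back to back for the instruction at pc."""
    inst = stage_if(pc, imem)
    decoded = stage_id(inst, regfile)
    alu_out = stage_ex(decoded)
    value = stage_mem(decoded, alu_out, dmem)
    stage_wb(decoded, value, regfile)

# Hypothetical two-instruction program: addi x1, x0, 100 then lw x5, 8(x1)
imem = {0: ("addi", 1, 0, 100), 4: ("lw", 5, 1, 8)}
regfile = [0] * 32
dmem = {108: 42}

run_instruction(0, imem, regfile, dmem)
run_instruction(4, imem, regfile, dmem)
print(regfile[1], regfile[5])  # prints "100 42"
```

Each function touches only the resource its stage owns (IMEM in IF, the RegFile in ID and WB, DMEM in MEM), which is what lets a pipelined design keep several instructions in flight without them competing for the same hardware.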

These 5 stages, separated by pipeline registers, allow different stages of the datapath to operate in the same clock period, so different instructions can occupy different stages at the same time. In each clock cycle, the inputs to a particular stage are sampled at the rising clock edge (and become available after the clk-to-q delay). After the stage operates on its inputs, the corresponding outputs are fed into the pipeline registers for the next stage. Note that pipeline registers may also need to carry information that the immediately following stage does not use but some later stage does. For example, the destination register index is decoded in ID but is not needed until WB.
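The overlap described above can be sketched as a small occupancy table. The following Python snippet is an illustrative model that assumes an ideal pipeline with no stalls or hazards; the function names `pipeline_schedule` and `format_schedule` are hypothetical.

```python
# Ideal 5-stage pipeline occupancy, assuming no stalls or hazards.
# Function names are hypothetical and chosen for illustration only.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions):
    """Return rows[cycle][stage] = index of the instruction in that stage, or None."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for cycle in range(total_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            # Instruction i enters IF in cycle i and advances one stage per cycle.
            instr = cycle - stage_index
            row.append(instr if 0 <= instr < num_instructions else None)
        rows.append(row)
    return rows

def format_schedule(rows):
    """Render the schedule as text, one line per clock cycle."""
    lines = ["cycle " + " ".join(f"{name:>4}" for name in STAGES)]
    for cycle, row in enumerate(rows):
        cells = " ".join(f"{('I' + str(i)) if i is not None else '-':>4}" for i in row)
        lines.append(f"{cycle:>5} {cells}")
    return "\n".join(lines)

schedule = pipeline_schedule(3)
print(format_schedule(schedule))
```

Reading a row left to right shows which instruction each stage is working on in that cycle. Under these assumptions, n instructions finish in n + 4 cycles, and each instruction still passes through all five stages.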

[Figure: 5-stage pipelined datapath diagram]

2 Textbook Readings

P&H 4.6, 4.7, 4.8

3 Additional References

4 Exercises

Check your knowledge!

4.1 Short Exercises

  1. True/False: By pipelining the CPU datapath, each single instruction will execute faster because pipelining reduces the latency per instruction (resulting in a speed-up in performance).

  2. True/False: A pipelined CPU datapath results in instructions being executed with higher throughput compared to the single-cycle CPU.