
1 Learning Outcomes

2 Building a Processor with DMEM access

Recall that load instructions are I-Type because they read a register, have an immediate, and write to a register a 32-bit value read from memory.

To support lw, we use a similar datapath to addi but instead compute an address with which to access DMEM.

Loads (and stores) participate in the MEM phase of the five step process. We therefore introduce additional logic connecting DMEM to the ALU and the RegFile, as shown in Figure 1:

Figure 1: DMEM: Connect and use a mux before WB (Write Back).

DMEM: To read the memory at an address, we use the ALU to compute the address as alu = R[rs1] + imm. This readily reuses the circuitry for arithmetic and logical I-Type instructions.

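Mux: We now include a mux after the ALU and DMEM that uses the control signal WBSel to select between two values for wdata, the data to write to R[rd]: the DMEM output mem, which loads select with WBSel = 0, and the ALU output alu, which arithmetic instructions select instead.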

Figure 2: The lw datapath. Use the menu bar to trace through the animation or download a copy of the PDF/PPTX file.

  1. Instruction Fetch: Increment PC to next instruction (see R-Type datapath).

  2. Instruction Decode: Fetch R[rs1] from RegFile, build the immediate imm for I-Type instructions (see I-Type datapath). Also configure control logic:

    • Configure ImmSel to I-type immediates.

    • Set RegWEn to 1.

    • Set BSel to 1.

    • Set ALUSel to Add.

    • Set MemRW to Read.

    • Set WBSel to 0.

    After some delay, the immediate generator block updates its output signal imm to the appropriate sign-extended 32-bit immediate value, register value R[rs1] is read, and control signals are set.

  3. Execute: Because the control line BSel=1 selects the generated immediate imm for ALU input B, our ALU computes R[rs1] + imm.

  4. Memory: Read memory at address alu = R[rs1] + imm. After some delay, the output signal mem has the value Mem[R[rs1] + imm].

  5. Write Back: Write DMEM output to the destination register by connecting the output of the WBSel mux to RegFile’s wdata input.

    Around the next rising clock edge, wdata, RegWEn, and rd should be held stable through setup and hold time of RegFile.
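The five steps above can be traced in software. The following is a minimal sketch, not the course's reference implementation: it assumes the standard RV32I I-type field layout, models DMEM as a Python dict keyed by byte address, and uses the hypothetical helper names `sign_extend`, `encode_lw`, and `execute_lw`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def encode_lw(rd, rs1, imm):
    """Hypothetical helper: build the 32-bit encoding of lw rd, imm(rs1)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b010 << 12) | (rd << 7) | 0b0000011


def execute_lw(inst, regs, dmem):
    """Trace one lw; dmem is an assumed dict from byte address to 32-bit word."""
    # Instruction Decode: read register fields and build the I-type immediate.
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    imm = sign_extend(inst >> 20, 12)       # ImmSel = I-type

    # Execute: BSel = 1 picks imm for ALU input B, ALUSel = Add.
    alu = (regs[rs1] + imm) & 0xFFFFFFFF

    # Memory: MemRW = Read, so mem = Mem[R[rs1] + imm].
    mem = dmem.get(alu, 0)

    # Write Back: WBSel = 0 selects mem over alu; RegWEn = 1 writes R[rd].
    wb_sel = 0
    wdata = mem if wb_sel == 0 else alu
    if rd != 0:                             # x0 is hardwired to zero in RISC-V
        regs[rd] = wdata
    return alu, wdata


regs = [0] * 32
regs[6] = 0x100
dmem = {0x108: 0xDEADBEEF, 0xFC: 0x1234}
execute_lw(encode_lw(5, 6, 8), regs, dmem)    # lw x5, 8(x6)
execute_lw(encode_lw(7, 6, -4), regs, dmem)   # lw x7, -4(x6)
```

The negative offset in the second call exercises the sign extension of the 12-bit immediate.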

3 Tracing the Store Datapath

By contrast, store instructions are S-Type because they read two registers, have an immediate, and write to memory. Stores do not write data to registers.

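We do not need to add additional blocks for stores, but we do need to connect R[rs2] to DMEM's wdata input so that the register value can be written to memory, and to set the control signals for a memory write.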

Figure 3: The sw datapath. Use the menu bar to trace through the animation or download a copy of the PDF/PPTX file.

  1. Instruction Fetch: Increment PC to next instruction (see R-Type datapath).

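  2. Instruction Decode: Fetch R[rs1] and R[rs2] from RegFile, and build the immediate imm for S-Type instructions. Also configure control logic: set ImmSel to S-type immediates, BSel to 1, ALUSel to Add, and MemRW to Write, and set RegWEn to 0 because stores do not write to a register.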

  3. Execute: Because the control line BSel=1 selects the generated immediate imm for ALU input B, our ALU computes R[rs1] + imm.

  4. Memory: Write memory at address alu = R[rs1] + imm by holding addr and wdata (DMEM input).

    Around the next rising clock edge, wdata (DMEM input), MemRW, and addr should be held stable through setup and hold time of DMEM.

  5. Write Back: (We don’t write to RegFile, so skip this.)
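The store steps above can be sketched the same way. This is an illustrative model only: it assumes the standard RV32I S-type layout, in which the immediate is split across instruction bits 31:25 and 11:7, models DMEM as a dict keyed by byte address, and uses the hypothetical names `s_type_imm`, `encode_sw`, and `execute_sw`.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def s_type_imm(inst):
    """Reassemble the S-type immediate: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]."""
    upper = (inst >> 25) & 0x7F
    lower = (inst >> 7) & 0x1F
    return sign_extend((upper << 5) | lower, 12)


def encode_sw(rs2, rs1, imm):
    """Hypothetical helper: build the 32-bit encoding of sw rs2, imm(rs1)."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (0b010 << 12) \
        | ((imm & 0x1F) << 7) | 0b0100011


def execute_sw(inst, regs, dmem):
    """Trace one sw; dmem is an assumed dict from byte address to 32-bit word."""
    # Instruction Decode: read both source registers, build the S-type immediate.
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    imm = s_type_imm(inst)                  # ImmSel = S-type

    # Execute: BSel = 1 picks imm for ALU input B, ALUSel = Add.
    alu = (regs[rs1] + imm) & 0xFFFFFFFF

    # Memory: MemRW = Write, addr = alu, wdata (DMEM input) = R[rs2].
    dmem[alu] = regs[rs2] & 0xFFFFFFFF

    # Write Back: skipped, because RegWEn = 0 for stores.
    return alu


regs = [0] * 32
regs[6], regs[7] = 0x100, 0x12345678
dmem = {}
execute_sw(encode_sw(7, 6, 12), regs, dmem)   # sw x7, 12(x6)
execute_sw(encode_sw(7, 6, -8), regs, dmem)   # sw x7, -8(x6)
```

Note that the register file is never modified, which mirrors the skipped Write Back phase.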

4 Partial Loads and Stores: Course project details

This section discusses how to implement partial loads and stores. The details in this section are specific to the course project. We start by introducing additional functions of the course project’s DMEM block.

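Recall from an earlier section that, according to the RISC-V standard, loads and stores of words should specify word-aligned addresses. DMEM memory accesses in this course (and in the course project) are therefore always word-aligned: DMEM reads or writes a 32-bit word starting at an address that is a multiple of 4, and partial loads and stores are handled by separate logic around DMEM.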

Consider Figure 4, which shows each byte in memory, referred to by its 32-bit address.

Figure 4: Diagram of aligned memory (Red boxes around every 4 bytes).
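To make the alignment concrete, a byte address can be split into the word-aligned address that DMEM actually accesses and a 2-bit byte offset within that word. The sketch below is only an illustration; the function name `split_address` is hypothetical.

```python
def split_address(addr):
    """Split a 32-bit byte address into (word-aligned address, byte offset)."""
    word_addr = addr & 0xFFFFFFFC   # clear the bottom two bits: a multiple of 4
    byte_offset = addr & 0b11       # position of the byte inside its word
    return word_addr, byte_offset


print(split_address(6))   # (4, 2): byte 6 is byte 2 of the word at address 4
print(split_address(9))   # (8, 1): byte 9 is byte 1 of the word at address 8
```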

4.1 Word-aligned DMEM

4.2 Partial Load

The partial_load.circ circuit in the course project is designed to take the 32 bits of data read from DMEM, extract the relevant data, then process the data to put into a 32-bit register. The signals for this subcircuit are in Table 1.

Table 1: Signals for the course project partial load subcircuit.

| Name | Direction | Bit Width | Description |
|---|---|---|---|
| Instruction | Input | 32 | The load instruction being executed. |
| MemAddress | Input | 32 | The memory address to read from.[2] |
| DataFromMem | Input | 32 | The data read from DMEM. |
| DataToReg | Output | 32 | The data to put in the register. |

Behavior: For loads, DMEM will always read 32 bits of memory starting at an address that is a multiple of 4 bytes. The partial load subcircuit then uses the instruction itself to determine what bytes to extract.

Example 1. Suppose we had a lb instruction on address 6 = 0b000110.

Example 2. Suppose we had a lh instruction on address 9 = 0b001001.

Table 2: All scenarios the partial load subcircuit should handle in the course project. SignExt means sign-extend.

| Instruction | MemAddress[1:0] [2] | Value to put in DataToReg |
|---|---|---|
| lb | 00 | SignExt(DataFromMem[7:0]) |
| lb | 01 | SignExt(DataFromMem[15:8]) |
| lb | 10 | SignExt(DataFromMem[23:16]) |
| lb | 11 | SignExt(DataFromMem[31:24]) |
| lh | 00 | SignExt(DataFromMem[15:0]) |
| lh | 01 | SignExt(DataFromMem[23:8]) |
| lh | 10 | SignExt(DataFromMem[31:16]) |
| lw | 00 | DataFromMem |
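The scenarios in Table 2 can be modeled in a few lines. This is a behavioral sketch rather than the Logisim circuit itself: it assumes the instruction type is identified by the standard RV32I funct3 field (lb = 000, lh = 001, lw = 010), and it raises an error for any case not listed in the table, which is a choice made for this sketch only. The names `partial_load` and `load_inst` are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to an unsigned 32-bit result."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & 0xFFFFFFFF


# Byte offsets listed in Table 2, keyed by funct3: (width in bits, allowed offsets).
LOAD_CASES = {0b000: (8, {0, 1, 2, 3}),   # lb
              0b001: (16, {0, 1, 2}),     # lh
              0b010: (32, {0})}           # lw


def partial_load(instruction, mem_address, data_from_mem):
    """Return DataToReg for the lb/lh/lw scenarios in Table 2."""
    funct3 = (instruction >> 12) & 0b111
    offset = mem_address & 0b11            # bottom two bits are NOT zeroed
    if funct3 not in LOAD_CASES or offset not in LOAD_CASES[funct3][1]:
        raise ValueError("scenario not listed in Table 2")
    width = LOAD_CASES[funct3][0]
    selected = (data_from_mem >> (8 * offset)) & ((1 << width) - 1)
    return sign_extend(selected, width)


def load_inst(funct3):
    """Hypothetical helper: a load encoding with only funct3 and opcode set."""
    return (funct3 << 12) | 0b0000011


word = 0x11F02280                           # assumed contents of the aligned word
print(hex(partial_load(load_inst(0b000), 6, word)))   # lb at address 6
print(hex(partial_load(load_inst(0b001), 9, word)))   # lh at address 9
```

For the lb at address 6, the offset is 2, so the byte DataFromMem[23:16] = 0xF0 is selected and sign-extended to 0xFFFFFFF0. For the lh at address 9, the offset is 1, so DataFromMem[23:8] = 0xF022 becomes 0xFFFFF022.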

4.3 Partial Store

The partial_store.circ circuit in the course project is designed to take data from a register, process it, and store the relevant bytes to memory. The signals for this subcircuit are in Table 3.

Table 3: Signals for the course project partial store subcircuit.

| Signal Name | Direction | Bit Width | Description |
|---|---|---|---|
| Instruction | Input | 32 | The store instruction being executed. |
| MemAddress | Input | 32 | The memory address to store to.[2] |
| DataFromReg | Input | 32 | The data from the register. |
| MemWEn | Input | 1 | The control signal indicating whether writing to memory is enabled for this instruction. |
| DataToMem | Output | 32 | The data to store to memory. |
| MemWriteMask | Output | 4 | The write mask indicating whether each byte of DataToMem will be written to memory. |

Behavior: For stores, DMEM will write bitmasked data to memory at an address that is a multiple of 4 bytes. Each bit in the 4-bit write mask MemWriteMask corresponds to one of the 4 bytes of the word; if a bit in MemWriteMask is 0, DMEM will not store the corresponding byte of DataToMem to memory.
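The masking behavior can be illustrated with a small model of what happens to one word inside DMEM. This is a sketch under an assumption: mask bit i is taken to control byte i of the word, that is, bits 8i+7 down to 8i, which matches the pairing of DataToMem[7:0] with mask 0001 in Table 4. The function name `masked_write` is hypothetical.

```python
def masked_write(old_word, data_to_mem, write_mask):
    """Return the new 32-bit word after a byte-masked write."""
    result = old_word
    for i in range(4):
        if (write_mask >> i) & 1:           # bit i enables byte i
            byte_mask = 0xFF << (8 * i)
            result = (result & ~byte_mask) | (data_to_mem & byte_mask)
    return result & 0xFFFFFFFF


old = 0xAABBCCDD
new = 0x11223344
print(hex(masked_write(old, new, 0b0001)))  # only the lowest byte changes
print(hex(masked_write(old, new, 0b1100)))  # only the upper two bytes change
```

Bytes whose mask bit is 0 keep their old contents, so the corresponding bytes of DataToMem can hold any value.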

Example 1: Suppose we had a sb instruction on address 3 = 0b000011.

Example 2: Suppose we had a sh instruction on address 2 = 0b000010.

Note that sh and sb instructions specify data as the lower bits of a 32-bit value, i.e., bottom 16 bits and bottom 8 bits, respectively. See Table 4.

Table 4: All scenarios the partial store subcircuit should handle in the course project. (Recall that stores do not sign-extend.)

| Instruction | MemAddress[1:0] [2] | DataToMem[31:24] | DataToMem[23:16] | DataToMem[15:8] | DataToMem[7:0] | MemWriteMask |
|---|---|---|---|---|---|---|
| sb | 00 | 0 | 0 | 0 | DataFromReg[7:0] | 0001 |
| sb | 01 | 0 | 0 | DataFromReg[7:0] | 0 | 0010 |
| sb | 10 | 0 | DataFromReg[7:0] | 0 | 0 | 0100 |
| sb | 11 | DataFromReg[7:0] | 0 | 0 | 0 | 1000 |
| sh | 00 | 0 | 0 | DataFromReg[15:8] | DataFromReg[7:0] | 0011 |
| sh | 10 | DataFromReg[15:8] | DataFromReg[7:0] | 0 | 0 | 1100 |
| sw | 00 | DataFromReg[31:24] | DataFromReg[23:16] | DataFromReg[15:8] | DataFromReg[7:0] | 1111 |
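The rows of Table 4 follow one pattern: keep the low bytes of the register value, shift them to the byte offset, and set one mask bit per byte written. The sketch below models that pattern under several assumptions: the store type is read from the standard RV32I funct3 field (sb = 000, sh = 001, sw = 010), the mask is forced to 0000 when MemWEn is 0, and unlisted cases raise an error. None of these choices is confirmed by the project specification, and the names `partial_store` and `store_inst` are hypothetical.

```python
# Cases listed in Table 4, keyed by funct3: (width in bytes, allowed offsets).
STORE_CASES = {0b000: (1, {0, 1, 2, 3}),   # sb
               0b001: (2, {0, 2}),         # sh
               0b010: (4, {0})}            # sw


def partial_store(instruction, mem_address, data_from_reg, mem_wen):
    """Return (DataToMem, MemWriteMask) for the sb/sh/sw scenarios in Table 4."""
    funct3 = (instruction >> 12) & 0b111
    offset = mem_address & 0b11             # bottom two bits are NOT zeroed
    if funct3 not in STORE_CASES or offset not in STORE_CASES[funct3][1]:
        raise ValueError("scenario not listed in Table 4")
    width = STORE_CASES[funct3][0]

    # Take the low bytes of the register and move them to the target byte lanes.
    low_bytes = data_from_reg & ((1 << (8 * width)) - 1)
    data_to_mem = (low_bytes << (8 * offset)) & 0xFFFFFFFF

    # One mask bit per byte written; assumed to be all zeros when MemWEn is 0.
    write_mask = (((1 << width) - 1) << offset) & 0xF
    if not mem_wen:
        write_mask = 0
    return data_to_mem, write_mask


def store_inst(funct3):
    """Hypothetical helper: a store encoding with only funct3 and opcode set."""
    return (funct3 << 12) | 0b0100011


reg = 0x12345678
data, mask = partial_store(store_inst(0b000), 3, reg, 1)   # sb at address 3
print(hex(data), format(mask, "04b"))
data, mask = partial_store(store_inst(0b001), 2, reg, 1)   # sh at address 2
print(hex(data), format(mask, "04b"))
```

For the sb at address 3, DataFromReg[7:0] = 0x78 lands in DataToMem[31:24] with mask 1000. For the sh at address 2, DataFromReg[15:0] = 0x5678 lands in DataToMem[31:16] with mask 1100.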

Tips/reminders for the course project:

Footnotes
  1. We leave it to you to determine why these cases would/would not cross 32-bit word boundaries.

  2. Important: The bottom two bits of the MemAddress address (e.g., computed from the instruction) are NOT zeroed. However, DMEM will convert the address to a multiple of 4 before accessing memory.

  3. The lower bits 0-23 can be all zeros, though in practice these bits don’t matter because of MemWriteMask.