1 Learning Outcomes

Let’s describe program translation (Compile, Assemble, and Link) with a hello_world example.

2 Compiler: hello.c → hello.s

hello.c:

#include <stdio.h>
int main() {
    printf("Hello, %s!\n", "world");
    return 0;
}

hello.s:

.text
    .align 2
    .global main
main:
    addi sp sp -16
    sw   ra 12(sp)
    la   a0 str1
    la   a1 str2
    call printf
    lw   ra 12(sp)
    addi sp sp 16
    li   a0 0
    ret
.section .rodata
    .balign 4
str1:
    .string "Hello, %s!\n"
str2:
    .string "world"

Note the pseudoinstructions: la, call, li, and ret.

3 Assembler: hello.s → hello.o

The assembler output is a binary object file. Part of its contents, shown as 32-bit hexadecimal words:

     ... ff010113 00112623 00000537
00050513 000005b7 00058593 000080e7
00c12083 01010113 00000513 00008067 ...

3.1 Text Segment

The raw words are not very readable, so we describe some components in more detail. Below is the disassembled text segment of hello.o, which holds the code for main. Refer to hello.s as needed.

00000000 <main>:
0:  ff010113 addi sp sp -16
4:  00112623 sw   ra 12(sp)
8:  00000537 lui  a0 0x0
c:  00050513 addi a0 a0 0
10: 000005b7 lui  a1 0x0
14: 00058593 addi a1 a1 0
18: 000080e7 jalr ra ra 0
1c: 00c12083 lw   ra 12(sp)
20: 01010113 addi sp sp 16
24: 00000513 addi a0 x0 0
28: 00008067 jalr x0 ra 0

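The assembler replaces pseudoinstructions with real instructions where possible: for example, each la becomes a lui/addi pair, and ret becomes a jalr.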

The machine code includes address placeholders of zero (e.g., lui a1 0x0 is machine code 000005b7), denoting unresolved references for the linker to resolve.
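
To see where the zero placeholder sits inside a word such as 000005b7, it can help to pull the instruction fields apart. The following is a minimal C sketch, assuming the standard RV32I I-type and U-type layouts. The names `fields_t` and `decode` are made up for this illustration and are not part of any toolchain.

```c
#include <stdint.h>

/* Fields of a 32-bit RV32I instruction word.
 * Hypothetical helper type, for illustration only. */
typedef struct {
    uint32_t opcode; /* bits 6..0                            */
    uint32_t rd;     /* bits 11..7                           */
    uint32_t rs1;    /* bits 19..15 (meaningful for I-type)  */
    int32_t  imm_i;  /* bits 31..20, sign-extended (I-type)  */
    uint32_t imm_u;  /* bits 31..12 (U-type, e.g. lui)       */
} fields_t;

fields_t decode(uint32_t insn) {
    fields_t f;
    f.opcode = insn & 0x7fu;
    f.rd     = (insn >> 7) & 0x1fu;
    f.rs1    = (insn >> 15) & 0x1fu;
    /* Sign-extend the 12-bit immediate without relying on
     * implementation-defined right shifts of negative values. */
    f.imm_i  = (int32_t)((insn >> 20) ^ 0x800u) - 0x800;
    f.imm_u  = insn >> 12;
    return f;
}
```

For example, `decode(0x000005b7)` gives opcode 0x37 (lui), rd = 11 (a1), and imm_u = 0, which is the placeholder the linker will later fill in. `decode(0xff010113)` gives opcode 0x13 (addi), rd = rs1 = 2 (sp), and imm_i = -16.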

3.2 Symbol Table

In the object file this information is stored in binary, but we illustrate it as a table:

Table 1: Symbol Table for hello.o

Label  Address (in segment of module)  Type
main   0x00000000                      global text
str1   0x00000000                      local data
str2   0x0000000c                      local data
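
The address of str2 follows from the size of str1. As a rough sketch, assume the assembler places the .string data back to back starting at offset 0 of the data segment, each string followed by its NUL terminator. The helper `layout_strings` below is hypothetical and only mimics that bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* Compute the byte offset of each string when the strings are laid
 * out consecutively, each followed by a NUL terminator.
 * Writes the offsets into offsets[0..n-1] and returns the total size.
 * Illustrative only: a real assembler also handles alignment directives. */
size_t layout_strings(const char *const strings[], size_t n, size_t offsets[]) {
    size_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = cursor;
        cursor += strlen(strings[i]) + 1; /* +1 for the NUL byte */
    }
    return cursor;
}
```

With the two strings from hello.s, "Hello, %s!\n" occupies 11 characters plus the terminator, so str1 lands at offset 0x0 and str2 at offset 0xc, matching Table 1.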

3.3 Relocation Table

In the object file this information is stored in binary, but we illustrate it as a table:

Table 2: Relocation Table for hello.o

Address     Type  Dependency
0x00000008  lui   %hi(str1)
0x0000000c  addi  %lo(str1)
0x00000010  lui   %hi(str2)
...         ...   ...
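
The %hi and %lo (low) parts split a 32-bit address between the lui and the addi. Because addi sign-extends its 12-bit immediate, the upper part has to be rounded up whenever the lower part comes out negative. A minimal C sketch of that arithmetic follows; the function names `hi20` and `lo12` are chosen for this illustration only.

```c
#include <stdint.h>

/* Low 12 bits of addr, interpreted as a signed value in [-2048, 2047],
 * which is how addi will treat its immediate. */
int32_t lo12(uint32_t addr) {
    return (int32_t)((addr & 0xfffu) ^ 0x800u) - 0x800;
}

/* Upper 20 bits of addr, adjusted so that (hi20 << 12) + lo12 == addr.
 * Adding 0x800 first rounds up when lo12 will be negative. */
uint32_t hi20(uint32_t addr) {
    return ((addr + 0x800u) >> 12) & 0xfffffu;
}
```

Taking the final address of str1 in the linked a.out shown in this document, 0x20a10: `hi20` gives 0x21 and `lo12` gives -1520, and 0x21000 - 1520 = 0x20a10.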

4 Linker: hello.o → a.out

A portion of the a.out executable is shown:

000101b0 <main>:
  101b0: ff010113 addi sp sp -16
  101b4: 00112623 sw   ra 12(sp)
  101b8: 00021537 lui  a0 0x21
  101bc: a1050513 addi a0 a0 -1520 # 20a10 <str1>
  101c0: 000215b7 lui  a1 0x21
  101c4: a1c58593 addi a1 a1 -1508 # 20a1c <str2>
  101c8: 288000ef jal  ra 10450    # <printf>
  101cc: 00c12083 lw   ra 12(sp)
  101d0: 01010113 addi sp sp 16
  101d4: 00000513 addi a0 x0 0
  101d8: 00008067 jalr x0 ra 0
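
Comparing hello.o with a.out, the linker's job on these words can be sketched as bit patching. The C below is a simplified illustration under the assumption of standard RV32I encodings, not how any particular linker is implemented. The function names are hypothetical.

```c
#include <stdint.h>

/* Write the upper 20 bits of addr into a lui instruction (U-type).
 * 0x800 is added first because the paired addi sign-extends its
 * 12-bit immediate and may therefore subtract from the lui value. */
uint32_t patch_lui(uint32_t insn, uint32_t addr) {
    uint32_t hi = (addr + 0x800u) >> 12;
    return (insn & 0x00000fffu) | (hi << 12);
}

/* Write the low 12 bits of addr into an addi instruction (I-type). */
uint32_t patch_addi(uint32_t insn, uint32_t addr) {
    return (insn & 0x000fffffu) | ((addr & 0xfffu) << 20);
}

/* Encode "jal rd, target" for an instruction located at pc (J-type).
 * The byte offset target - pc is scattered across the immediate bits. */
uint32_t encode_jal(uint32_t rd, uint32_t pc, uint32_t target) {
    uint32_t off = target - pc;
    return (((off >> 20) & 0x1u)   << 31)  /* imm[20]    */
         | (((off >> 1)  & 0x3ffu) << 21)  /* imm[10:1]  */
         | (((off >> 11) & 0x1u)   << 20)  /* imm[11]    */
         | (((off >> 12) & 0xffu)  << 12)  /* imm[19:12] */
         | ((rd & 0x1fu) << 7)
         | 0x6fu;                          /* jal opcode */
}
```

Patching the placeholder words 00000537 and 00050513 with the address 0x20a10 of str1 yields 00021537 and a1050513, the words at 101b8 and 101bc. Encoding a jal with rd = 1 (ra) from 0x101c8 to printf at 0x10450 yields 288000ef, the word at 101c8.
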
Footnotes
  1. The CS61C refcard says that the la pseudoinstruction resolves to an auipc and addi, implying that la is always a PC-relative address. However, this pseudo->real instruction resolution depends on the RISC-V compiler. In gcc, you can use the -fpic or -fno-pic flag to specify relative vs. absolute addressing (source1, source2). I suspect that for this example, we used absolute addressing in order to fully demonstrate the address placeholders. I’m leaving further exploration of this to future CS61C instructors.