1 Learning Outcomes
Identify which step of Compile, Assemble, and Link translates a specific assembly instruction to machine code.
See an end-to-end example of translation of a hello.c program into an a.out executable.
Identify assembly directives.
Let’s describe program translation (Compile, Assemble, and Link) with a hello_world example.
2 Compiler: hello.c → hello.s
hello.c:
#include <stdio.h>
int main() {
    printf("Hello, %s\n", "world");
    return 0;
}
hello.s:
 1  .text
 2  .align 2
 3  .global main
 4  main:
 5  addi sp sp -4
 6  sw ra 0(sp)
 7  la a0 str1
 8  la a1 str2
 9  call printf
10  lw ra 0(sp)
11  addi sp sp 4
12  li a0 0
13  ret
14  .section .rodata
15  .balign 4
16  str1:
17  .string "Hello, %s!\n"
18  str2:
19  .string "world"
Note the many pseudoinstructions: la, call, li, ret, etc.
Directives in hello.s:
Line 1, .text: Enter text section
Line 2, .align: Align code to 2^2 = 4 bytes
Line 3, .global: Declare global symbol main
Line 14, .section: Enter read-only data section
Line 15, .balign: Align data section to 4 bytes
Line 17, .string: null-terminated string
Line 19, .string: null-terminated string
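To make the difference between .align and .balign concrete, here is a minimal C sketch of the rounding involved. The helper names balign_up and align_up are hypothetical, chosen only for illustration; this is not taken from any particular assembler's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of `bytes`, which must be a power of
   two. This mirrors what `.balign bytes` requests. */
uint32_t balign_up(uint32_t addr, uint32_t bytes) {
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    return (addr + bytes - 1) & ~(bytes - 1);
}

/* `.align n` takes an exponent instead: align to 2^n bytes. */
uint32_t align_up(uint32_t addr, unsigned n) {
    return balign_up(addr, (uint32_t)1 << n);
}
```

Under these definitions, `.align 2` and `.balign 4` request the same 4-byte boundary: align_up(5, 2) and balign_up(5, 4) both return 8.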
More directives:
| Directive | Description |
|---|---|
| .text | Put subsequent items in the user Text segment (machine code) |
| .data | Put subsequent items in the user Data segment (source file data in binary) |
| .globl sym | Declare sym global so that it can be referenced from other files |
| .string str | Store the string str in memory and null-terminate it |
| .word w1, ..., wn | Store the n 32-bit quantities w1, ..., wn in successive memory words |
Labels in hello.s:
Line 4, main
Line 16, str1
Line 18, str2
3 Assembler: hello.s → hello.o
The assembler output is an object file in binary:
... ff010113 00112623 00000537
00050513 000005b7 00058593 000080e7
00c12083 01010113 00000513 00008067 ...
3.1 Text segment
This is not very readable, so we describe some components in more detail. Below is the text segment of hello.o, which holds the machine code for main. Refer to hello.s as needed.
00000000 <main>:
0: ff010113 addi sp sp -16
4: 00112623 sw ra 12(sp)
8: 00000537 lui a0 0x0
c: 00050513 addi a0 a0 0
10: 000005b7 lui a1 0x0
14: 00058593 addi a1 a1 0
18: 000080e7 jalr ra 0
1c: 00c12083 lw ra 12(sp)
20: 01010113 addi sp sp 16
24: 00000513 addi a0 x0 0
28: 00008067 jalr ra

How to read each line:
Left of colon, e.g., 10: the relative address of the instruction in the module
8-digit hexadecimal, e.g., 000005b7: the 32-bit-wide machine code, perhaps with placeholders
Assembly instruction, e.g., lui a1 0x0: the assembly instruction, perhaps with placeholders
Pseudoinstructions are replaced where possible.
la[1] is replaced with lui and addi
call is replaced with jalr
The machine code includes address placeholders of zero (e.g., lui a1 0x0 is machine code 000005b7), denoting unresolved references for the linker to resolve.
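The zero placeholders become visible once the fields are pulled out of the machine code. The sketch below uses hypothetical helper names (get_opcode, get_rd, get_rs1, u_imm, i_imm) and covers only the U-type and I-type fields needed for lui and addi; it is an illustration, not a full decoder.

```c
#include <stdint.h>

/* Fields that sit at the same bit positions in these formats. */
uint32_t get_opcode(uint32_t insn) { return insn & 0x7f; }         /* bits 6:0   */
uint32_t get_rd(uint32_t insn)     { return (insn >> 7) & 0x1f; }  /* bits 11:7  */
uint32_t get_rs1(uint32_t insn)    { return (insn >> 15) & 0x1f; } /* bits 19:15 */

/* U-type (lui): bits 31:12 hold the upper 20-bit immediate. */
uint32_t u_imm(uint32_t insn) { return insn >> 12; }

/* I-type (addi): bits 31:20 hold a 12-bit immediate, sign-extended here. */
int32_t i_imm(uint32_t insn) {
    int32_t v = (int32_t)((insn >> 20) & 0xfff);
    return (v & 0x800) ? v - 0x1000 : v;
}
```

For 000005b7, get_opcode gives 0x37 (lui), get_rd gives 11 (register x11, i.e., a1), and u_imm gives 0, the placeholder. For 00058593, i_imm likewise gives 0.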
3.2 Symbol Table
The symbol table is stored in binary in the object file, but we illustrate it as a table:
Table 1: Symbol Table for hello.o
| Label | Address (in segment of module) | Type |
|---|---|---|
| main | 0x00000000 | global text |
| str1 | 0x00000000 | local data |
| str2 | 0x0000000c | local data |
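The address 0x0000000c for str2 follows from the size of str1, including the null byte that .string appends. A small C check of that arithmetic, assuming str2 is placed directly after str1 in the data segment (the function name str2_offset is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* The two .string literals from hello.s as C arrays.
   sizeof counts the terminating null byte, just as .string stores it. */
static const char str1[] = "Hello, %s!\n";
static const char str2[] = "world";

/* 11 characters plus the null terminator: str2 should start at 0x0c. */
static_assert(sizeof str1 == 0x0c, "str2 should start at offset 0x0c");

/* Offset of str2 within the data segment, assuming no padding after str1. */
size_t str2_offset(void) { return sizeof str1; }

/* Total bytes occupied by both strings under the same assumption. */
size_t rodata_size(void) { return sizeof str1 + sizeof str2; }
```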
3.3 Relocation Table
The relocation table is also stored in binary, but we illustrate it as a table:
Table 2: Relocation Table for hello.o
| Address | Type | Dependency |
|---|---|---|
| 0x00000008 | lui | %hi(str1) |
| 0x0000000c | addi | %lo(str1) |
| 0x00000010 | lui | %hi(str2) |
| ... | ... | ... |
4 Linker: hello.o → a.out
A portion of the a.out executable is shown:
000101b0 <main>:
101b0: ff010113 addi sp sp -16
101b4: 00112623 sw ra 12(sp)
101b8: 00021537 lui a0 0x21
101bc: a1050513 addi a0 a0 -1520 # 20a10 <str1>
101c0: 000215b7 lui a1 0x21
101c4: a1c58593 addi a1 a1 -1508 # 20a1c <str2>
101c8: 288000ef jal ra 10450 # <printf>
101cc: 00c12083 lw ra 12(sp)
101d0: 01010113 addi sp sp 16
101d4: 00000513 addi a0 x0 0
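101d8: 00008067 jalr ra

Squash all .o files into a single executable.
Update the addresses in the symbol table.
For each entry in the relocation table:
  Replace placeholders with the actual address.
  Update the machine code. Note that the lui upper immediate is incremented by 1 (0x21 rather than 0x20 for str1 at 0x20a10), because the addi immediate 0xa10 is sign-extended to a negative value; refer to an earlier section about U-Type instruction formats and the li pseudoinstruction.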
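The relocation loop for the lui/addi pairs can be sketched in C as follows. This is a simplified illustration under stated assumptions: the struct reloc layout, the RELOC_HI/RELOC_LO names, and the helper functions are hypothetical and do not reflect the real object-file relocation format. It only handles absolute addresses split across a lui and an addi.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for lui. Adding 0x800 bumps the result by 1 when bit 11 of
   addr is set, compensating for addi sign-extending its 12-bit immediate. */
uint32_t hi20(uint32_t addr) { return (addr + 0x800) >> 12; }

/* Low 12 bits for addi, as a raw bit pattern. */
uint32_t lo12(uint32_t addr) { return addr & 0xfff; }

/* Fill a lui whose immediate (bits 31:12) is still the zero placeholder. */
uint32_t patch_lui(uint32_t insn, uint32_t addr) {
    assert((insn >> 12) == 0);
    return insn | (hi20(addr) << 12);
}

/* Fill an addi whose immediate (bits 31:20) is still the zero placeholder. */
uint32_t patch_addi(uint32_t insn, uint32_t addr) {
    assert((insn >> 20) == 0);
    return insn | (lo12(addr) << 20);
}

/* Hypothetical relocation entry: byte offset into the text segment,
   which half of the address is needed, and the resolved target address. */
enum reloc_kind { RELOC_HI, RELOC_LO };
struct reloc { uint32_t offset; enum reloc_kind kind; uint32_t target; };

/* For each entry, replace the placeholder in the affected instruction. */
void apply_relocs(uint32_t *text, const struct reloc *r, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t *insn = &text[r[i].offset / 4];
        *insn = (r[i].kind == RELOC_HI) ? patch_lui(*insn, r[i].target)
                                        : patch_addi(*insn, r[i].target);
    }
}
```

With str1 resolved to 0x20a10, patch_lui(0x00000537, 0x20a10) gives 0x00021537 and patch_addi(0x00050513, 0x20a10) gives 0xa1050513, matching the a.out listing.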
[1] The CS61C refcard says that the la pseudoinstruction resolves to an auipc and addi, implying that la always produces a PC-relative address. However, this pseudo-to-real instruction resolution depends on the RISC-V compiler. In gcc, you can use the -fpic or -fno-pic flag to specify relative vs. absolute addressing (source1, source2). I suspect that for this example, we used absolute addressing in order to fully demonstrate the address placeholders. I’m leaving further exploration of this to future CS61C instructors.