Assembler: Object File - CS 61C Course Notes

1 Learning Outcomes

Identify basic components of the object file.
Explain why the assembler can resolve PC-relative addresses but not absolute addresses.
Explain why PC-relative address resolution requires a two-pass process.

The assembler translates assembly code into machine modules: it replaces pseudoinstructions with real instructions and produces an object file. Assembler directives tell the assembler how to build the object file, which contains portions of an executable’s text segment, data segment, and more.
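
As a rough illustration of the pseudoinstruction step, the sketch below expands a handful of standard RISC-V pseudoinstructions into their real equivalents. The `PSEUDO` table and the `expand` helper are hypothetical names introduced for this example, and the table covers only four pseudoinstructions; a real assembler handles many more.

```python
# Hypothetical expansion table (illustration only): each entry maps a
# pseudoinstruction mnemonic to a function producing real instructions.
PSEUDO = {
    "nop": lambda ops: ["addi x0, x0, 0"],
    "mv":  lambda ops: [f"addi {ops[0]}, {ops[1]}, 0"],
    "j":   lambda ops: [f"jal x0, {ops[0]}"],
    "ret": lambda ops: ["jalr x0, 0(x1)"],
}

def expand(line):
    """Return the list of real instructions for one line of assembly."""
    mnemonic, _, rest = line.strip().partition(" ")
    ops = [op.strip() for op in rest.split(",")] if rest.strip() else []
    if mnemonic in PSEUDO:
        return PSEUDO[mnemonic](ops)
    return [line.strip()]  # already a real instruction: leave it alone

print(expand("mv a0, t1"))       # ['addi a0, t1, 0']
print(expand("j loop"))          # ['jal x0, loop']
print(expand("add a0, a1, a2"))  # ['add a0, a1, a2']
```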

2 Object File

The final .o object file is a binary machine module made up of the following pieces:

Object File Header: size and position of other pieces of the object file. This is like the “table of contents”.
Text Segment: machine code.
Data Segment: binary representation of static data in the source file.
Symbol Table: list of the file’s labels and static data that can be referenced by other files.
Relocation Table: lines of code to fix later (by the linker).
Debugging Information.

The Text Segment and Data Segment (recall the program memory layout) are translated into machine code where possible. The last three items (Symbol Table, Relocation Table, Debugging Information) are used by the downstream linker to resolve everything that remains and create a single executable.
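
To make the pieces concrete, here is a minimal sketch of an object file as a Python data structure. The `ObjectFile` class and its field names are assumptions made for this illustration; real formats such as ELF store far more detail, including the position of each piece, which this sketch leaves out.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Toy model of a .o file (hypothetical layout, not a real format)."""
    text: bytes = b""                                 # machine code
    data: bytes = b""                                 # static data
    symbols: dict = field(default_factory=dict)       # label -> (section, offset)
    relocations: list = field(default_factory=list)   # entries for the linker to fix
    debug_info: dict = field(default_factory=dict)

    def header(self):
        """The "table of contents": here, just the size of each piece."""
        return {
            "text_size": len(self.text),
            "data_size": len(self.data),
            "num_symbols": len(self.symbols),
            "num_relocations": len(self.relocations),
        }

obj = ObjectFile(text=bytes(8), data=b"hi\x00")
obj.symbols["main"] = (".text", 0)
print(obj.header())
```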

We highly recommend reading this section and then seeing the example at the end of this chapter.

3 Text Segment

An example text segment is discussed in the example at the end of this chapter.

Arithmetic, Logical Instructions: For simple cases like add or sub, the 32-bit instruction contains all the information needed to build the machine code. This encompasses R-Format and some I-Format instructions.

PC-Relative Branches and Jumps: For instructions like beq, bne, and jal, once pseudoinstructions are replaced with real ones, all known PC-relative addressing within the object file can be computed. Determine the offset to encode by counting the number of half-words (2-byte units) between the current instruction and the target instruction.
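
The offset arithmetic can be sketched as follows, assuming every instruction is a standard 32-bit (4-byte) instruction, so each instruction counts as two half-words. The function names are hypothetical, and the sketch stops at the half-word count rather than packing bits into an instruction format.

```python
INSTRUCTION_BYTES = 4  # assumption: no compressed (16-bit) instructions

def pc_relative_offset(current_index, target_index):
    """Byte offset from the current instruction to the target instruction."""
    return (target_index - current_index) * INSTRUCTION_BYTES

def offset_in_halfwords(byte_offset):
    """Number of 2-byte units covered by a byte offset."""
    if byte_offset % 2 != 0:
        raise ValueError("branch/jump offsets must be a multiple of 2 bytes")
    return byte_offset // 2

# A branch at instruction 1 targeting instruction 4 (three instructions ahead):
print(pc_relative_offset(1, 4))                        # 12 bytes
print(offset_in_halfwords(pc_relative_offset(1, 4)))   # 6 half-words

# A backward jump from instruction 5 to instruction 2:
print(pc_relative_offset(5, 2))                        # -12 bytes
```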

After replacing pseudoinstructions, the assembler performs two passes over the program to compute all offsets.

First, record all labels: the assembler stores the position of each label in a symbol table.
Then, resolve references: the assembler uses the label positions to generate machine code and hardcode branch/jump offsets where possible.

If the assembler only made one pass (from earlier instructions to later), “forward references” (labels to locations later in the program) would have unknown PC-relative offsets.
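
The two passes can be sketched as below. This is a toy model under several assumptions: labels sit on their own line, every instruction is 4 bytes, a label is always the last operand, and "machine code" is simplified to the instruction text with the label replaced by its byte offset. `first_pass` and `second_pass` are hypothetical names.

```python
def first_pass(lines):
    """Pass 1: record the byte address of every label."""
    symbols, address = {}, 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address   # label marks the next instruction
        elif line:
            address += 4
    return symbols

def second_pass(lines, symbols):
    """Pass 2: replace label operands with PC-relative byte offsets."""
    resolved, address = [], 0
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        *head, last = line.replace(",", " ").split()
        if last in symbols:
            last = str(symbols[last] - address)  # target minus current PC
        resolved.append((address, " ".join(head + [last])))
        address += 4
    return resolved

program = [
    "loop:",
    "beq a0, x0, done",   # forward reference: 'done' is not yet seen
    "addi a0, a0, -1",
    "jal x0, loop",       # backward reference
    "done:",
    "add a1, a1, a0",
]

symbols = first_pass(program)
print(symbols)  # {'loop': 0, 'done': 12}
for address, text in second_pass(program, symbols):
    print(address, text)
```

When the second pass reaches the beq at address 0, the label done (address 12) is already in the symbol table, so the forward reference resolves to an offset of 12 bytes.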

4 Symbol Table

The symbol table is a list of labels (procedures) and data (like global arrays) in your file that could be used by other files. The Symbol Table is also used by the debugger gdb.

Instruction labels: Used to compute machine code for PC-relative addressing in branches, function calling, etc.
- If you want to call a function like printf from a library, the linker will eventually need this symbol (additionally, see the relocation table below).
- Use the .globl directive to allow a label to be referenced by other files.
Data segment: anything in the section marked by the .data directive. Recall that the data segment holds global variables, which may be accessed or used by other files.
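
A minimal sketch of how such a table might be collected is shown below. It assumes labels sit on their own line, every instruction is 4 bytes, and the only data directive is .word (4 bytes per value). The `build_symbol_table` function and the entry layout are hypothetical.

```python
def build_symbol_table(lines):
    """Collect labels from .text and .data, flagging those exported via .globl."""
    table, exported = {}, set()
    section = ".text"
    offsets = {".text": 0, ".data": 0}   # running offset within each section
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line in (".text", ".data"):
            section = line
        elif line.startswith(".globl"):
            exported.add(line.split()[1])
        elif line.endswith(":"):
            table[line[:-1]] = {"section": section, "offset": offsets[section]}
        elif line.startswith(".word"):
            offsets[section] += 4 * len(line[len(".word"):].split(","))
        else:
            offsets[section] += 4        # one 32-bit instruction
    for name, entry in table.items():
        entry["global"] = name in exported
    return table

source = [
    ".data",
    "counts:",
    ".word 1, 2, 3",
    "total:",
    ".word 0",
    ".text",
    ".globl main",
    "main:",
    "addi a0, x0, 0",
    "helper:",
    "add a0, a0, a1",
]

table = build_symbol_table(source)
for name, entry in table.items():
    print(name, entry)
```

Here main is marked global because of the .globl directive, while helper stays local to this file.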

An example symbol table is discussed in the example at the end of this chapter.

5 Relocation Table

The relocation table is a “to-do list” of things to fix later (by the downstream linker). It contains placeholders for:

Absolute addresses for any external label used by a jump, e.g., jal ext_label, where ext_label is defined in a library file.
Absolute addresses for any data located in the data segment, e.g., static variables referenced by the la (load address) pseudoinstruction. The assembler doesn’t know where the static section will end up (its “final resting place”), so it delegates that resolution to the linker.
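
The "to-do list" idea can be sketched as follows. This toy version flags every la, plus any jal whose target is not a label defined in this file. The `build_relocation_table` function and its entry fields are hypothetical, and la is kept as a single source line for simplicity even though a real assembler expands it into real instructions first.

```python
def build_relocation_table(instructions, local_text_labels):
    """List the source lines whose addresses the linker must fill in later."""
    relocations = []
    for index, line in enumerate(instructions):
        tokens = line.replace(",", " ").split()
        mnemonic, target = tokens[0], tokens[-1]
        needs_fix = (
            mnemonic == "la"                                         # data address
            or (mnemonic == "jal" and target not in local_text_labels)  # external label
        )
        if needs_fix:
            relocations.append(
                {"line": index, "instruction": mnemonic, "symbol": target}
            )
    return relocations

instructions = [
    "la a0, msg",       # data address unknown until link time
    "jal ra, printf",   # external label from a library
    "jal x0, loop",     # local label: PC-relative, already resolvable
]

relocs = build_relocation_table(instructions, {"loop"})
for entry in relocs:
    print(entry)
```

The jump to loop does not appear in the output because its PC-relative offset can be computed by the assembler itself.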

An example relocation table is discussed in the example at the end of this chapter.