
1 Learning Outcomes

In this lecture, we will learn how a program is run. We will add more detail to our colloquial definition of compiling C code into a binary executable.

Figure 1: Colloquially, “compiling C code” translates a program foo.c into an executable a.out. But how and where does assembly get involved?

Let’s return to the Great Idea of Abstraction. Today, we will discuss everything above the orange ISA line: how to go from a high-level language like C to a lower-level assembly language, and then finally to the lowest-level machine code.

2 Translate and Run a Program

We will spend today discussing Figure 2.

The process of translating a C program foo.c into a machine executable a.out actually consists of three steps (Compile, Assemble, Link), plus a fourth step for running the program (Load). Notably, the first three steps spell CAL. Go Bears!!!

Figure 2: Flow chart of the steps for compiling and running a C program.

The high-level details are shown in Table 1:

Table 1: Input/output of each step of CALL.

| Step | Input | Output | Notes |
|---|---|---|---|
| Compiler | High-level language code (e.g., foo.c) | Assembly language code (e.g., foo.s) | Output may contain pseudoinstructions (mv, li, j, etc.). |
| Assembler | Assembly language code (e.g., foo.s) | Machine language module: an object file (e.g., foo.o) | Replaces pseudoinstructions with real instructions and produces an object file (module). |
| Linker | Object files (e.g., foo.o, lib.o) | Executable machine code (e.g., a.out) | Enables separate compilation of files: changes to one file do not require recompiling the entire program. |
| Loader | Executable code (e.g., a.out) | (program is run) | When an executable is run, the loader first loads the executable file from disk into memory and then runs it. |

2.1 Compiler

The compiler translates C code into assembly code. In practice, we use a program like gcc to compile programs for us; in this class, we’ve also practiced hand-writing RISC-V assembly and running it in the Venus simulator.
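
To make this concrete, here is a minimal, hypothetical foo.c containing a single function, add_one (a name chosen only for illustration). The comment sketches one plausible RV32I translation a compiler might produce; real gcc output varies with the target, flags, and optimization level, so treat it as an illustration rather than actual compiler output.

```c
/* foo.c: a hypothetical one-function input to the compiler. */
int add_one(int x) {
    /* One plausible RV32I translation (illustrative only; actual gcc
       output depends on the target, flags, and optimization level):

           add_one:
               addi a0, a0, 1    # x arrives in a0; the result returns in a0
               ret               # pseudoinstruction for jalr x0, 0(ra)
    */
    return x + 1;
}
```

Note that even this tiny translation ends in ret, which is a pseudoinstruction rather than a real RISC-V instruction.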

Importantly, the compiler’s assembly output can include pseudoinstructions. As mentioned in an earlier section, pseudoinstructions (e.g., mv t1, t2) make it much easier to write and reason about assembly programs. Downstream, the assembler translates these pseudoinstructions into real instructions in the ISA (e.g., addi t1, t2, 0).
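
The expansion step can be sketched in C. The helper expand_pseudo below is hypothetical and handles only a few forms: mv, j, ret, and li with an immediate that fits in 12 bits. The expansions themselves follow the standard RISC-V ones (for example, mv rd, rs becomes addi rd, rs, 0), but a real assembler parses far more syntax and, for instance, expands li with a large immediate into a lui/addi pair.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite one RISC-V pseudoinstruction as the real
   instruction an assembler would emit. Writes the result into out and
   returns 1, or returns 0 if the input is not a form handled here. */
int expand_pseudo(const char *in, char *out, size_t out_size) {
    char op[8], rd[16], rs[16];
    int imm, n = 0;

    /* Read the mnemonic; n records where its operands begin. */
    if (sscanf(in, "%7s%n", op, &n) != 1)
        return 0;
    const char *args = in + n;

    if (strcmp(op, "mv") == 0 && sscanf(args, " %15[^,], %15s", rd, rs) == 2) {
        snprintf(out, out_size, "addi %s, %s, 0", rd, rs);   /* mv rd, rs */
        return 1;
    }
    if (strcmp(op, "li") == 0 && sscanf(args, " %15[^,], %d", rd, &imm) == 2
            && imm >= -2048 && imm <= 2047) {
        /* Only immediates that fit in addi's 12-bit field are handled. */
        snprintf(out, out_size, "addi %s, x0, %d", rd, imm);
        return 1;
    }
    if (strcmp(op, "j") == 0 && sscanf(args, " %15s", rs) == 1) {
        snprintf(out, out_size, "jal x0, %s", rs);           /* j label */
        return 1;
    }
    if (strcmp(op, "ret") == 0) {
        snprintf(out, out_size, "jalr x0, 0(ra)");           /* return via ra */
        return 1;
    }
    return 0; /* a real instruction, or a pseudoinstruction not handled here */
}
```

For example, expand_pseudo("mv t1, t2", buf, sizeof buf) writes addi t1, t2, 0 into buf. Checking the whole mnemonic with strcmp (rather than matching a prefix) keeps a real instruction such as jal from being mistaken for the j pseudoinstruction.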

2.2 Assembler

The assembler translates assembly code into machine language modules (object files). It replaces pseudoinstructions with real instructions and encodes each instruction as machine code. It also follows assembler directives (e.g., .text, .data, .globl) to build the object file, which contains portions of the executable’s text segment, data segment, and more.
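
Once every instruction is real, the assembler packs its fields into a 32-bit word. As a sketch, the hypothetical helper encode_itype below packs an I-type instruction such as addi using the field layout from the RISC-V base ISA. It assumes the caller has already translated register names into register numbers (here t1 = x6 and t2 = x7).

```c
#include <stdint.h>

#define OPCODE_OP_IMM 0x13u  /* opcode shared by addi, slti, xori, ... */
#define FUNCT3_ADDI   0x0u

/* Hypothetical helper: encode an I-type instruction such as addi rd, rs1, imm.
   Field layout (RISC-V base ISA), from bit 31 down to bit 0:
     imm[11:0] (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)         */
uint32_t encode_itype(int32_t imm, uint32_t rs1, uint32_t funct3,
                      uint32_t rd, uint32_t opcode) {
    return ((uint32_t)(imm & 0xFFF) << 20) |  /* low 12 bits of the immediate */
           ((rs1 & 0x1Fu) << 15) |
           ((funct3 & 0x7u) << 12) |
           ((rd & 0x1Fu) << 7) |
           (opcode & 0x7Fu);
}
```

With these inputs, encode_itype(0, 7, FUNCT3_ADDI, 6, OPCODE_OP_IMM) yields 0x00038313, the machine code for addi t1, t2, 0 (and therefore for the pseudoinstruction mv t1, t2).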

2.3 Linker

The linker patches together multiple object modules to produce an executable. It resolves all the assembler’s “TODO items,” including relocating everything for the final executable:

  1. Put together text segments from each .o file.

  2. Put together data segments from each .o file, then append the result to the end of the combined text segment from Step 1.

  3. Resolve references, i.e., addresses that the assembler wasn’t able to resolve.
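
The three steps can be sketched with a deliberately simplified model. Everything here is hypothetical: ObjModule tracks only segment sizes, the start address is supplied by the caller, and data is placed immediately after the text exactly as in Step 2 (real linkers typically also align segments and read symbol and relocation tables from each object file). Step 3 is shown for one common case, a PC-relative jal whose target lives in another module.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object module: only segment sizes, plus the
   final addresses that link_layout fills in. */
typedef struct {
    uint32_t text_size;   /* bytes of machine code */
    uint32_t data_size;   /* bytes of static data */
    uint32_t text_base;   /* final address of this module's text */
    uint32_t data_base;   /* final address of this module's data */
} ObjModule;

/* Steps 1 and 2: concatenate all text segments starting at text_start,
   then place all data segments right after the combined text. */
void link_layout(ObjModule *mods, size_t n, uint32_t text_start) {
    uint32_t addr = text_start;
    for (size_t i = 0; i < n; i++) {      /* Step 1: text segments */
        mods[i].text_base = addr;
        addr += mods[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {      /* Step 2: data segments */
        mods[i].data_base = addr;
        addr += mods[i].data_size;
    }
}

/* Step 3 (one case): a jal at byte offset from_off in module `from`
   targets a label at byte offset to_off in module `to`. Because jal is
   PC-relative, the linker patches in the distance between the two final
   addresses. (A real linker would also check that the result fits in
   jal's signed range of about +/-1 MiB.) */
int32_t resolve_jal_offset(const ObjModule *from, uint32_t from_off,
                           const ObjModule *to, uint32_t to_off) {
    int64_t pc = (int64_t)from->text_base + from_off;
    int64_t target = (int64_t)to->text_base + to_off;
    return (int32_t)(target - pc);
}
```

For example, if foo.o has 0x20 bytes of text, lib.o has 0x10, and text starts at a hypothetical address 0x1000, then lib.o's text begins at 0x1020, and a jal at offset 0x8 in foo.o that targets the start of lib.o needs an offset of 0x18 (24 bytes). The assembler could not know this value, because it depends on where lib.o ends up.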

The linker enables separate compilation of different parts of a program. Importantly, this means large libraries do not need to be recompiled every time your program changes. For example, the C standard library (e.g., stdio) is compiled once and linked into many programs, and real codebases can be enormous: the Linux source alone is over 20 million lines of code. Because of the linker, recompiling a simple foo.c does not require recompiling stdio :-)

2.4 Loader

The loader does several things to run a program. To learn more, we strongly recommend taking an upper-division class like CS 162: Operating Systems!

1. Load the program into a newly created address space in memory.[1]

2. Initialize machine registers: clear the registers, and set the stack pointer sp to the address of the first free stack location.

3. Jump to a start-up routine, which copies the program’s arguments from the stack into registers and sets the program counter pc.

4. Run the program. If the main routine returns, the start-up routine terminates the program with the exit system call.
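
As a rough sketch, steps 1 through 3 can be simulated on a toy machine. The Machine struct, its 4 KiB memory, and load_program are all hypothetical; a real loader works with virtual memory, reads segment sizes from the executable’s header, and copies the argument strings onto the stack, none of which is modeled here. The register numbers for sp, a0, and a1 follow the RISC-V calling convention, which passes main’s first two arguments (argc and argv) in a0 and a1.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 4096   /* hypothetical tiny address space, in bytes */

/* Hypothetical simulated machine: flat memory, the 32 RISC-V integer
   registers (x0..x31), and the program counter. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t x[32];
    uint32_t pc;
} Machine;

enum { REG_SP = 2, REG_A0 = 10, REG_A1 = 11 };  /* RISC-V register numbers */

/* Sketch of loader steps 1-3. Returns 0 on success, or -1 if the text and
   data do not fit in the toy memory. argv_addr is assumed to already point
   at arguments somewhere in memory (copying them is not modeled). */
int load_program(Machine *m, const uint8_t *text, uint32_t text_size,
                 const uint8_t *data, uint32_t data_size,
                 uint32_t text_start, uint32_t argc, uint32_t argv_addr) {
    if ((uint64_t)text_start + text_size + data_size > MEM_SIZE)
        return -1;

    /* Step 1: start from a fresh address space, copy text, then data. */
    memset(m->mem, 0, sizeof m->mem);
    memcpy(m->mem + text_start, text, text_size);
    memcpy(m->mem + text_start + text_size, data, data_size);

    /* Step 2: clear the registers; sp starts at the top of the toy memory
       because the RISC-V stack grows downward. */
    memset(m->x, 0, sizeof m->x);
    m->x[REG_SP] = MEM_SIZE;

    /* Step 3 (stand-in for the start-up routine): place argc and argv in
       a0 and a1, then set pc to the entry point, assumed here to be the
       start of the text segment. */
    m->x[REG_A0] = argc;
    m->x[REG_A1] = argv_addr;
    m->pc = text_start;
    return 0;
}
```

Step 4 would then be a fetch-decode-execute loop starting at m->pc, which is omitted from this sketch.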

Footnotes
  1. We discuss virtual memory in a later chapter.