1 Learning Outcomes

One of the primary metrics for processor performance is the time it takes to execute a program, also known as program execution time. However, many parameters affect this metric.

To disentangle these parameters, we break them down into what we will call the “iron law” of processor performance. Equation (1) shows this classic CPU performance equation:

$$\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \cdot \frac{\text{cycles}}{\text{instructions}} \cdot \frac{\text{time}}{\text{cycles}} \tag{1}$$

This equation uses fractions to expand the program execution time ($\text{time}/\text{program}$, measured in s, ms, ns, etc.) into the product of three components: instruction count, cycles per instruction, and clock period. From P&H 1.6:

These formulas are particularly useful because they separate [three] key factors that affect performance. We can use these formulas to compare two different implementations or to evaluate a design alternative if we know its impact on these three parameters.
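As a minimal sketch of Equation (1), the function below multiplies the three components to obtain execution time. The function name `execution_time` and the input values (2 million instructions, a CPI of 1.5, a 1 ns clock period) are hypothetical and chosen only for illustration.

```python
def execution_time(instructions: float, cpi: float, clock_period_s: float) -> float:
    """Program execution time in seconds: instructions * CPI * clock period."""
    return instructions * cpi * clock_period_s


# Hypothetical workload: 2 million instructions, average CPI of 1.5,
# and a 1 ns clock period (i.e., a 1 GHz clock).
t = execution_time(2_000_000, 1.5, 1e-9)
print(f"{t * 1e3:.3f} ms")  # 3.000 ms
```

Because the three components are multiplied, halving any one of them halves the execution time, provided the other two stay fixed.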

1.1 Instructions per program

$$\frac{\text{instructions}}{\text{program}}$$

The first component of Equation (1) is the count of instructions in the program benchmark. The program benchmark is determined by the following:

1.2 Cycles per Instruction (CPI)

$$\frac{\text{cycles}}{\text{instructions}}$$

The second component in Equation (1) is cycles per instruction, or CPI. From P&H 1.6:

Since different instructions may take different amounts of time depending on what they do, CPI is an average of all the instructions executed in the program.

Examples:

To measure CPI, run a processor on a program benchmark. On the same program benchmark, processors will have different CPI because of differences in the ISA and the processor implementation. Nevertheless, CPI provides one way of comparing two different implementations of the same ISA, since the program benchmark (and number of instructions in the program) will be the same.
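To illustrate CPI as an average over all executed instructions, the sketch below weights each instruction class by how often it executes. The instruction classes, counts, and per-class cycle costs are assumed values for illustration, not measurements of any real processor.

```python
def average_cpi(mix):
    """Average CPI for a list of (instruction_count, cycles_per_instruction) pairs."""
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


# Hypothetical instruction mix (all numbers are assumptions).
hypothetical_mix = [
    (500, 1),  # e.g., arithmetic instructions at 1 cycle each
    (300, 2),  # e.g., loads/stores at 2 cycles each
    (200, 3),  # e.g., branches at 3 cycles each
]
print(average_cpi(hypothetical_mix))  # 1.7
```

The result is total cycles divided by total instructions (1700 / 1000), so frequently executed instruction classes dominate the average.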

1.3 Clock period

$$\frac{\text{time}}{\text{cycles}}$$

The clock period is the time it takes for a single clock cycle, measured in seconds (or ms, ns, etc.). It is the inverse of the clock frequency. The clock period is determined by the following:[2]
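The inverse relationship between clock period and clock frequency can be sketched as follows. The helper name `clock_period_s` and the 2 GHz input are assumptions chosen for illustration.

```python
def clock_period_s(frequency_hz: float) -> float:
    """Clock period in seconds is the inverse of the clock frequency in hertz."""
    return 1.0 / frequency_hz


period = clock_period_s(2e9)  # an assumed 2 GHz clock
print(f"{period * 1e9:.2f} ns")  # 0.50 ns
```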

2 Using the Iron Law

From P&H 1.6:

Always bear in mind that the only complete and reliable measure of computer performance is time. For example, changing the instruction set to lower the instruction count may lead to an organization with a slower clock cycle time or higher CPI that offsets the improvement in instruction count. Similarly, because CPI depends on the type of instructions executed, the code that executes the fewest number of instructions may not be the fastest.

For example, imagine trying to reduce the variance in the instructions per program component. This is harder than it seems and will depend on what you want to compare:

2.1 Instruction Throughput

Embedded in Equation (1) is a metric of instruction throughput. Instruction throughput is measured in instructions completed per unit of time and is the product of the inverses of the last two components. Equivalently, it is the inverse of CPI multiplied by the clock frequency $f$.

$$\begin{aligned} \frac{\text{instructions}}{\text{time}} &= \frac{\text{instructions}}{\text{cycles}} \cdot \frac{\text{cycles}}{\text{time}} \\ &= \frac{f}{\text{CPI}} \end{aligned}$$
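A short sketch of the throughput relation, clock frequency divided by CPI, is shown here. The function name `instruction_throughput` and the inputs (a 3 GHz clock and a CPI of 1.5) are hypothetical.

```python
def instruction_throughput(clock_frequency_hz: float, cpi: float) -> float:
    """Instructions completed per second: clock frequency divided by CPI."""
    return clock_frequency_hz / cpi


# Hypothetical processor: 3 GHz clock, average CPI of 1.5.
ips = instruction_throughput(3e9, 1.5)
print(f"{ips / 1e9:.1f} instructions/ns")  # 2.0 instructions/ns
```

Throughput does not depend on the instruction count, so it describes the processor running a given benchmark rather than how long the whole program takes.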
Solution to Exercise 1

Despite executing more instructions and having a slower clock rate, Processor B is faster for this task! Processor B’s significantly better CPI helps it overcome other disadvantages.

Table 2: Processor Performance Comparison Example.

| Measure | Processor A | Processor B |
| --- | --- | --- |
| # Instructions | 1 million | 1.5 million |
| Average CPI | 2.5 | 1 |
| Clock rate, f | 2.5 GHz | 2 GHz |
| Program execution time | 1 ms | 0.75 ms |
| Instruction throughput (per ns) | 1 inst/ns | 2 inst/ns |

Program execution time for Processor A:

$$10^6 \times 2.5 \times 1/(2.5 \times 10^9)\ \text{s} = 10^{-3}\ \text{s} = 1\ \text{ms}$$

Program execution time for Processor B:

$$(1.5 \times 10^6) \times 1 \times 1/(2 \times 10^9)\ \text{s} = 1.5/2 \times 10^{-3}\ \text{s} = 0.75\ \text{ms}$$
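The arithmetic for both processors can be checked with a short script. This is a sketch that uses the instruction counts, CPI values, and clock rates from Table 2. The dictionary layout and variable names are assumptions for illustration.

```python
# Inputs taken from Table 2; the data layout itself is illustrative.
processors = {
    "A": {"instructions": 1.0e6, "cpi": 2.5, "clock_hz": 2.5e9},
    "B": {"instructions": 1.5e6, "cpi": 1.0, "clock_hz": 2.0e9},
}

results = {}
for name, p in processors.items():
    # Iron law: time = instructions * CPI * clock period (1 / clock rate).
    time_s = p["instructions"] * p["cpi"] / p["clock_hz"]
    # Throughput = f / CPI, converted from inst/s to inst/ns.
    throughput_per_ns = p["clock_hz"] / p["cpi"] / 1e9
    results[name] = (time_s, throughput_per_ns)
    print(f"Processor {name}: {time_s * 1e3:.2f} ms, {throughput_per_ns:.0f} inst/ns")
```

Processor B's lower CPI more than offsets its larger instruction count and slower clock, which matches the last two rows of the table.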
Solution to Exercise 2
  1. True

  2. True

  3. False. If multiple instructions are executing, this definition of CPI would “double-count” time. This definition describes instruction latency.

Footnotes
  1. For Big-O notation, see CS 61B or an equivalent Data Structures and Algorithms course.

  2. For more information, take upper-division courses like EE 105.