
1 Learning Outcomes

2 Why parallelism?

To run our applications faster, we have spent a good portion of our time understanding the principles behind building a better computer. We have seen pipelining as a way to increase performance by working on multiple instructions at the same time in the pipeline.

Pipelining was the primary way people tried to increase processor performance from the 1990s to the early 2000s: we kept building deeper and deeper pipelines and simultaneously used them to crank up clock frequencies. However, with every processor generation, power requirements kept shooting up to unsustainable levels (Figure 1). Eventually, Moore’s Law also tapered off.

"TODO"

Figure 1:Power-Density prediction plot. Source: S. Borkar, Intel, circa 2000.

"TODO"

Figure 5:Visual of Moore’s Law over time.

To stay under reasonable power limits, people explored using parallelism in software and in hardware. Nowadays, almost all high-performance programs exploit parallelism in some way.

3 Amdahl’s Law

A common pitfall when programmers first start pursuing performance (P&H 1.11):

Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.

It reminds us that the opportunity for improvement is limited by the fraction of execution time the improved event consumes.

Amdahl’s Law is stated straightforwardly as Equation (1):

$$
\text{Exec. time with improvement} = \frac{\text{Exec. time affected by improvement}}{\text{Amount of improvement}} + \text{Exec. time not affected}
$$

We rewrite Amdahl’s Law as Equation (2) to capture the speedup of a given program improvement, enhancement, or optimization:

$$
\text{Speedup with enhancement} = \frac{\text{Exec. time before enhancement}}{\text{Exec. time after enhancement}}
$$

Finally, we rewrite in terms of the proportion of the program that is optimized/affected by the improvement, $\text{frac}_{\text{optimized}}$, and the speedup factor applied to this component of the program, $\text{factor}_{\text{improvement}}$, to produce Equation (3):

$$
\text{Speedup with enhancement} = \dfrac{1}{\left(1 - \text{frac}_{\text{optimized}}\right) + \dfrac{\text{frac}_{\text{optimized}}}{\text{factor}_{\text{improvement}}}}
$$

Amdahl’s Law tells us that the overall improvement to program execution time does not scale linearly with the size of the enhancement.
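To make Equation (3) concrete, here is a minimal Python sketch; the function name `amdahl_speedup` and the sample numbers are ours, chosen only for illustration:

```python
def amdahl_speedup(frac_optimized: float, factor_improvement: float) -> float:
    """Overall speedup per Equation (3): a fraction frac_optimized of the
    program is accelerated by factor_improvement; the rest is unchanged."""
    return 1.0 / ((1.0 - frac_optimized) + frac_optimized / factor_improvement)

# Speeding up 90% of a program by 10x yields far less than a 10x overall gain:
print(amdahl_speedup(0.9, 10))  # ~5.26, not 10
```

Note how even a large local improvement (10×) to most of the program (90%) falls well short of a 10× overall speedup.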

4 Amdahl’s Law: Examples

Suppose the execution time of a third of a program can be accelerated by a factor of 2, due to some enhancement (e.g., multithreading on a multicore processor).

"TODO"

Figure 2:The execution time of a third of a program can be accelerated by a factor of 2 due to some enhancement.

The overall program speedup due to this enhancement can be computed using Amdahl’s Law (Equation (3)). Here, $\text{frac}_{\text{optimized}} = 1/3$ and $\text{factor}_{\text{improvement}} = 2$.

$$
\text{speedup} = \dfrac{1}{2/3 + \dfrac{1/3}{2}} = 1.2
$$
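As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (the variable names are ours):

```python
frac_optimized = 1 / 3    # a third of the program is affected
factor_improvement = 2    # that third runs twice as fast

# Equation (3): unaffected fraction plus the shrunken affected fraction
speedup = 1 / ((1 - frac_optimized) + frac_optimized / factor_improvement)
print(round(speedup, 2))  # 1.2
```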

Because parallelism is another type of program optimization, performance gains due to parallelism are limited by the fraction of software processing that must execute sequentially. We illustrate this limitation below.

5 Parallelism: Theoretical limitations

Applications can almost never be completely parallelized; some code will always remain that must be executed sequentially.

As an analogy, consider Figure 4, where we collaborate with an infinitely large set of classmates to write a project report. In parallel, each classmate can do research and update a shared document, thanks to cloud technologies. However, even as the set of classmates approaches infinity, some fraction of the project will inevitably need to run sequentially before and after the large bulk of parallelization, e.g., to coordinate the parallel work itself, or to check the report for consistency and print it. These sequential parts may also grow and eventually outweigh any performance gains from parallelization.

Figure 4: Reasonable assumption: even with parallelization, some fraction of a program will need to run sequentially, e.g., to coordinate the parallel work itself.

In the simplest case, assume the parallelization speedup due to $P$ processors is $\text{factor}_{\text{improvement}} = P$. If we let $s$ be the sequential fraction of our program, then the eventual speedup is limited by $s$:

$$
\begin{align*}
\text{Speedup}(P) &= \frac{\text{Original program time}}{\text{Parallelized program time}} \\
&= \frac{1}{s + \dfrac{1 - s}{P}} \\
&\rightarrow \dfrac{1}{s} \text{ as } P \rightarrow \infty
\end{align*}
$$

In a perfect world, as the number of processors $P$ goes to infinity (a million, even a trillion processors), the speedup can still be no bigger than $1/s$, determined by the sequential portion.[1] This theoretical limit is illustrated in Figure 5.
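This saturation can be sketched numerically. Assuming a sequential fraction of $s = 0.10$ (a number we picked purely for illustration), the speedup creeps toward, but never reaches, $1/s = 10$:

```python
s = 0.10  # assumed sequential fraction (illustrative)

def speedup(p: int) -> float:
    """Speedup(P) = 1 / (s + (1 - s) / P) with P processors."""
    return 1 / (s + (1 - s) / p)

for p in (1, 10, 100, 1_000_000):
    print(f"P = {p:>9}: speedup = {speedup(p):.4f}")
# Even with a million processors, the speedup stays below 1/s = 10.
```

Doubling the processor count past a certain point buys almost nothing; shrinking $s$ is what moves the ceiling.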

"TODO"

Figure 5:Plot of Amdahl’s Law: Speedup vs. Number of Processors.

At the end of the day, parallelism in hardware does not solve everything. We will need to choose, design, and write software tasks that actually leverage the parallelism available in our architecture. Onto the next section!

Footnotes
  1. Energy would also be a HUGE limitation here...!