1 Learning Outcomes
Define a data race.
Specify critical sections in OpenMP to synchronize threads.
🎥 Lecture Video
As discussed earlier, threads operate under a shared memory model, where different threads can read from and write to the same locations in memory.
Consider the program below. What are the possible values of x after running it with four OpenMP threads?
```c
#include <stdio.h>
#include <omp.h>

int main() {
    int x = 0; /* shared variable */
    #pragma omp parallel
    {
        x += 1;
    }
}
```

2 Data Race
Two memory accesses form a data race if:
They are from different threads to the same location;
At least one is a write; and
They occur one after another.
In our multi-thread execution model, instructions from different threads have their execution interleaved in time, which makes data races possible.[1]
Let us translate the parallel section of our example code into the three RISC-V instructions below. With four OpenMP threads, there are four sets of these three instructions to execute. Each thread accesses the same variable x in (shared) memory but has its own copy of register t0 used in arithmetic, resulting in a data race.
```asm
lw   t0, 0(sp)   # load x (stored at sp) into t0
addi t0, t0, 1   # increment the thread's private copy
sw   t0, 0(sp)   # store t0 back to x
```

Because there are many possible interleavings of these twelve instructions, the final value of x is not always the same.[2] Consider the cases below.
Case 1: All threads run sequentially.
| Thread | Instruction | x memory access |
|---|---|---|
| 1 | load | read x: 0 |
| 1 | store | write x: 1 |
| 2 | load | read x: 1 |
| 2 | store | write x: 2 |
| 3 | load | read x: 2 |
| 3 | store | write x: 3 |
| 4 | load | read x: 3 |
| 4 | store | write x: 4 |
Final value of x: 4
Case 2: Threads “perfectly” interleaved.
| Thread | Instruction | x memory access |
|---|---|---|
| 1 | load | read x: 0 |
| 2 | load | read x: 0 |
| 3 | load | read x: 0 |
| 4 | load | read x: 0 |
| 1 | store | write x: 1 |
| 2 | store | write x: 1 |
| 3 | store | write x: 1 |
| 4 | store | write x: 1 |
Final value of x: 1
Case 3: Thread 1’s store is last.
| Thread | Instruction | x memory access |
|---|---|---|
| 1 | load | read x: 0 |
| 2 | load | read x: 0 |
| 2 | store | write x: 1 |
| 3 | load | read x: 1 |
| 3 | store | write x: 2 |
| 4 | load | read x: 2 |
| 4 | store | write x: 3 |
| 1 | store | write x: 1 |
Final value of x: 1
3 Critical Sections in OpenMP
To enforce multithreaded program correctness, we often need to synchronize threads, i.e., coordinate their execution. Most commonly, we must identify when one thread is finished writing so that it is safe for another to read.
Synchronization can be specified in user-level routines, i.e., in higher-level languages. A critical section is a segment of code that must be executed by a single thread at a time, thereby enforcing synchronization. Once a thread enters a critical section, it can safely execute all code in that critical section, knowing that it is the only thread that can execute that section at that time.
We discuss two OpenMP synchronization constructs:
#pragma omp critical: Creates a critical section within a parallel code segment. OpenMP docs
#pragma omp barrier: Forces all threads to wait until all threads have hit the barrier. OpenMP docs
Returning to our example code, we can specify a critical section and prevent any data races:
```c
#include <stdio.h>
#include <omp.h>

int main() {
    int x = 0; /* shared variable */
    #pragma omp parallel
    {
        #pragma omp critical
        {
            x += 1;
        }
    }
}
```

4 Data Race Examples
4.1 OpenMP Hello World: Add
```c
 1  #include <stdio.h>
 2  #include <omp.h>
 3  int main() {
 4      int x = 0; /* shared variable */
 5      #pragma omp parallel
 6      {
 7          int tid = omp_get_thread_num(); /* private variable */
 8          #pragma omp critical
 9          {
10              x++;
11          }
12          printf("Hello World from thread = %d, x = %d\n", tid, x);
13
14          #pragma omp barrier
15          if (tid == 0) {
16              printf("Number of threads = %d\n", omp_get_num_threads());
17          }
18      }
19  }
```

Will this program always produce the same output?
Show Answer
No; it varies. One output run:
```
Hello World from thread = 11, x = 10
Hello World from thread = 1, x = 1
Hello World from thread = 6, x = 11
Hello World from thread = 7, x = 7
Hello World from thread = 9, x = 8
Hello World from thread = 3, x = 9
Hello World from thread = 4, x = 5
Hello World from thread = 10, x = 3
Hello World from thread = 8, x = 2
Hello World from thread = 0, x = 12
Number of threads = 12
Hello World from thread = 2, x = 6
Hello World from thread = 5, x = 4
```
Final value of x: 12

Because Line 10 is in a critical section, x’s value is updated correctly. However, Line 12’s printf is not part of the critical section, and a thread with a larger value of x can print before a thread with a smaller value.
Show Answer
First, review the definition of a data race.
Consider all of the memory accesses in the innermost loop (i.e., for all values of j and k), assuming that two different threads run the i = 0 and i = 1 iterations, respectively. Below, we explain that none of these memory accesses will cause races:
The variable sum is declared within the parallel section; it is not shared across threads.
Read element A[i][k]: No data race, because neither thread’s access is a write.
Read element B[k][j]: No data race, because neither thread’s access is a write.
Write element C[i][j]: No data race. While both threads’ accesses are writes, the elements C[0][j] and C[1][j] are different, so the writes are to different locations.[3]
We present the above argument without loss of generality, meaning that it applies to all possible work-sharing configurations in the DGEMM OpenMP code.
A data race is not a data hazard. While data hazards result from instruction-level parallelism on a pipelined processor architecture, data races can occur even on single-cycle processors. Data races result from the undetermined order of instruction execution between threads, i.e., which thread accesses memory first.
Formally, a multithreaded program is only considered correct if ANY interleaving of threads yields the same result. Our example code is an incorrect program. For those curious, there are 2520 different possible orders of the load and store instruction pairs (or 105 orders if we consider that all threads are identical).
If caches are in the memory hierarchy, it is likely that the two threads will each have their own copy of the same block of C in their respective caches. So while there is no data race, these copies will be “out of sync.” Read more about cache coherency in a later section.