Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

1Learning Outcomes

The core challenge to synchronization is contesting over shared resources, e.g., a specific location in memory, an I/O device, or (in general) some state. Critical sections specify sections of codes that must be exclusively accessed by at most one single thread at any given time. In practice, this exclusive access to a section of assembly instructions thereby enables exclusive access to shared state. But what might these “exclusive access” assembly instructions look like?

In this section, we first introduce the concept of locks[1], which is a conceptual mechanism to implement exclusive access to shared state. Then, we describe some atomic assembly instructions in RISC-V.

2Analogy: Buying Milk

You and your roommate want to agree on how to buy milk. Both of you will use the same procedure. Assume a shared fridge.

Approach 1: The below approach will not work. For example, what if you get home while your roommate is out buying milk? Once you buy milk, you will have two milk cartons in the fridge!

if milk not in fridge:
  buy milk at store
  put milk in fridge

Approach 2: A reasonable next step is to leave a sticky note on the fridge when we step out to buy milk. Even with this shared note message system, we can still run into two milk cartons in the fridge. Click the tabs below to see how.

Code
Access Pattern
if note not on fridge:
  if milk not in fridge:
    put note on fridge
    buy milk at store
    put milk in fridge
    take note off fridge

We could continue down this path—adding more messages and more notes—but these solutions do not address the key problem: that your roommate can insert themselves in-between you checking the fridge and you buying milk.

3Locks

One solution is to use a lock to grant a thread (here, you or your roommate) exclusive access to a shared resource. Each thread (well, you or your roommate) can try to “acquire” the lock, but only one thread can have the lock at a given time.

Formally, there are two operations with a lock:[^spinlock]

You have seen the notion of a lock before in public bathroom stalls. One person enters the stall (the shared resource) and locks the door (“acquires” the lock). Others wait until that person unlocks the door (“releases” the lock) to indicate the stall is available.

Attempt 3: With locks, we can successfully implement a milk-buying procedure where there is at most one milk carton in the fridge at any given time. Click the tabs below to see how.

Code
Access Pattern
acquire fridgelock
  if milk not in fridge:
    buy milk at store
    put milk in fridge
release fridgelock

3.1Deadlock

Introducing locks introduces a new problem: deadlock. Deadlock is a system state in which no progress is possible because everything is locked waiting for something else.

"Photograph or stylized aerial diagram of vehicles gridlocked at an intersection, each lane waiting on another in a cycle so no car can proceed—mirroring circular wait in locking. Caption ties the visual metaphor to threads holding locks while waiting on peers."

Figure 1:A classic deadlock is a traffic jam, where no car can move because every car is blocked by another. Image source: Reddit

A classic CS example, the dining philosophers problem, illustrates how deadlock can occur.[2] Suppose there is a special spaghetti must be eaten with two forks (one left fork, one right fork). Each philosopher can only think OR eat. Consider a proposal in which each philosopher is instructed to behave as follows:

  • think unless the left fork is available; when it is, pick it up;

  • think unless the right fork is available; when it is, pick it up;

  • when both forks are held, eat for a fixed amount of time;

  • put the left fork down;

  • put the right fork down;

  • repeat from the beginning.

An illustration of the dining philosophers problem.

Figure 2:In the problem, each philosopher has a bowl of spaghetti and can reach the two forks on either side of them.. Wikipedia

This proposed algorithm can lead to deadlock. Suppose that all philosophers grab their left fork “at the same time” (that is, immediately after one-another). All are stuck waiting for the right fork and will be stuck thinking forever.

To avoid deadlock in multithreaded programs, we must ensure that multiple threads cannot interleave multiple shared memory accesses in a way that “locks out” a way forward, because one thread has acquired one lock and is waiting to acquire a second lock, but the second lock has already been acquired by a second thread that is waiting to acquire the first lock.

3.2Naive implementation: Lock

With our current set of instructions, any lock implementation can still cause a data race when threads are interleaved.

Consider a “lock” in shared memory @ 0x55550000. Let a value 0 at this address mean the lock is unlocked (open), and 1 mean the lock is locked (closed). Suppose that we use register t0 to check the lock state, and we use t1 to temporarily store the immediate to write to the lock in memory.

Suppose we write the acquire and release procedures using RISC-V assembly instructions. Click the tabs below to see how both threads may think they have acquired the singular lock:

Code
Access Pattern
# acquire
      li   t1 1
try:  lw   t0 0(s0)
      bnez t0 try
lock: sw   t1 0(s0)

...
# release
      sw   x0 0(s0)

4Atomic Instructions

To implement locks and other synchronization primitives, we need lower-level support at the assembly level. This hardware support must prevent an interloper (e.g., another thread) from changing values between the necessary memory accesses in a lock.[3] RISC-V and other ISAs support atomic instructions for memory read and memory write.

Atomic instructions are instructions where all other instructions must seem to happen strictly before or after the atomic instruction’s operation.

Atomic instructions on the surface are composites of existing instructions, e.g., a read and write to a single memory location. However, to implement this read-write atomically, the hardware must be designed to not allow any other access to that memory location between that read and write.

These are two common approaches to implementing locks,[1] both of which are supported with RISC-V extensions:

The implementation of these extension instructions is out of scope.

4.1(Zaamo) Lock Implementation with Atomic Swap

The Zaamo extension implements atomic memory operations (AMOs), which atomically perform an operation on an operand in memory and set the destination register to the original memory value.

RISC-V AMOs are R-Type instructions with the format amoinst rd rs2 (rs1) that do the following:

RISC-V supports these atomic insturctions with: swap, add, and/or/xor, min/max, min/max unsigned.

An AMO lock implementation works as follows. On acquire, if the lock state was previously also 1, then another thread has the lock, so we “spin” and try again until lock state was previously 0.

# acquire
        li             t0 1
try:    amoswap.w.aq   t0 t0 (s0)
        bnez           t0 try
locked: # already locked now

# release
unlock: amoswap.w.rl  x0 x0 (s0)

The code above looks very similar to our naive implementation, but now we use AMOs in place of lw and sw.

A single atomic memory swap operation is hard to implement because it requires both memory read and write in a single instruction. We know from building the datapath that this is not only difficult but also inefficient, potentially leading to a slower clocked system.

4.2(Zalrsc) Lock Implementation with Load-Reserved/Store-Conditional

An alternative approach to atomic instruction uses a pair of instructions (one read, one write) that are effectively atomic. To do so, we assume a different definition of success. If the pair executes successfully, then nothing else has changed the value between the instruction pairs.

We discuss one common atomic insturction pair:

A load-reserved/store-conditional lock implementation works as follows:

# acquire
        li   t2  1
try:    lr   t1 s1
        bne  t1 x0 try
        sc   t0 s1 t2
        bnez t0 try
locked: # do nothing
...

# release
unlock: sw x0,0(s1)
Footnotes
  1. Specifically, we will describe a spinlock. See an operating systems course for other synchronization primitives, like semaphores.

  2. The Dining Philosophers Problem was first posed by Edsger Djikstra in 1965. Wikipedia

  3. Note that such assembly-level support regardless if a system is uniprocessor or multicore, because OS interrupts will trigger thread context switches.