Shared Memory Synchronization 2

1. Why the lecture still studies shared-memory synchronization

1.1. Actor models and software transactional memory are useful, but they are higher-level abstractions

  • The lecture begins by addressing a question: why not use actor models or software transactional memory instead of low-level synchronization?
  • The answer is that operating systems live directly on top of hardware.
  • Hardware fundamentally exposes shared state / shared memory.
  • Therefore, kernel and OS developers must understand and handle synchronization at the low level first.
  • Higher-level abstractions can be built on top of the OS, but the OS itself cannot assume them.

2. Recap: synchronization and the “too much milk” example

2.1. The central problem

  • Two processes (or actors) need to coordinate so that exactly one of them performs some action.
  • Example: check whether there is milk; if not, go buy milk.
  • This sequence is a critical section.
  • The critical section is non-atomic: checking and acting are separated in time.

2.2. Desired properties

2.2.1. Safety

  • Safety means mutual exclusion.
  • At most one process should execute the critical section.

2.2.2. Liveness

  • Liveness means progress.
  • Eventually, some process should make progress and perform the required action.

2.2.3. Combined goal

  • Together, “at most one” (safety) and “at least one, eventually” (liveness) give the desired “exactly one” behavior.

2.3. Recap of failed solution attempts

2.3.1. Naive solution

  • Fails because the two processes may interleave step-by-step.
  • Both can observe the same state and both may decide to act.
  • Thus mutual exclusion is violated.
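The failing interleaving can be replayed deterministically. This minimal Python sketch is an illustration, not the lecture's code. Each process is a generator, and each `yield` marks a point where a hypothetical scheduler may switch processes. The names `naive` and `run` are made up for the example.

```python
def naive(name, fridge, buyers):
    """Naive protocol: "if there is no milk, buy milk"."""
    if fridge["milk"] == 0:   # check
        yield                 # the scheduler may switch processes here
        fridge["milk"] += 1   # act: buy milk
        buyers.append(name)


def run(schedule):
    """Run processes "A" and "B" step by step in the given order."""
    fridge, buyers = {"milk": 0}, []
    procs = {name: naive(name, fridge, buyers) for name in "AB"}
    for name in schedule:
        next(procs[name], None)  # advance `name` to its next yield (or its end)
    return fridge["milk"], buyers


print(run(["A", "A", "B", "B"]))  # A finishes before B starts -> (1, ['A'])
print(run(["A", "B", "A", "B"]))  # both check before either buys -> (2, ['A', 'B'])
```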

2.3.2. Symmetry-breaking attempt

  • Improves safety in some executions.
  • But it is not robust: if one process stops participating, the other may fail to make progress.
  • So liveness is violated.

2.3.3. Another near-correct attempt

  • Preserves “at most one” better.
  • Also more robust against one process being absent.
  • But there is still a corner case where both conclude that the other process is doing the work.
  • Then both wait forever: no progress.

2.3.4. Busy-waiting style solution

  • A final solution works by forcing one side to wait while the other completes the whole critical section.
  • This can enforce correctness for the example.
  • However, it is inefficient because it wastes CPU time.
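The notes do not reproduce the lecture's exact code. The following Python sketch assumes the classic two-note variant of the busy-waiting solution. Process A leaves a note and spins while B's note is present. B acts only if A has left no note. The sketch relies on CPython executing these plain attribute reads and writes in program order. Real hardware would additionally need memory barriers.

```python
import threading


class Kitchen:
    def __init__(self):
        self.milk = 0
        self.note_a = False  # A's note: "I am dealing with the milk"
        self.note_b = False  # B's note


def process_a(k):
    k.note_a = True
    while k.note_b:          # busy-wait: spin until B has finished deciding
        pass                 # burns CPU time without doing useful work
    if k.milk == 0:
        k.milk += 1          # buy milk
    k.note_a = False


def process_b(k):
    k.note_b = True
    if not k.note_a:         # B only acts if A has not left a note
        if k.milk == 0:
            k.milk += 1      # buy milk
    k.note_b = False


def run_once():
    k = Kitchen()
    threads = [threading.Thread(target=f, args=(k,)) for f in (process_a, process_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return k.milk
```

Repeated calls to `run_once()` should always leave exactly one carton. However, while B's note is up, A spends CPU time in the `while` loop. That spinning is the inefficiency noted above.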

3. Semaphores as the OS abstraction for synchronization

3.1. Motivation

  • Busy waiting is usually not considered a good general solution.
  • The operating system provides a better abstraction: the semaphore.

3.2. Semaphore semantics

3.2.1. P operation

  • Wait until the semaphore value is positive.
  • The check and the decrement happen atomically.
  • After a successful P, the counter is decremented.

3.2.2. V operation

  • Increment (“signal”) the semaphore atomically.
  • This announces that some resource or event is available; if processes are blocked in P, one of them can now proceed.
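The following minimal counting semaphore, built in Python on top of `threading.Condition`, makes the P/V semantics concrete. It is an illustrative assumption only. A kernel implements semaphores with its own scheduler and wait queues. Python already ships `threading.Semaphore`, whose `acquire`/`release` correspond to P/V.

```python
import threading


class Semaphore:
    """Counting semaphore with Dijkstra's P/V operations (illustrative sketch)."""

    def __init__(self, value=0):
        if value < 0:
            raise ValueError("semaphore value must be non-negative")
        self._value = value
        self._cond = threading.Condition()  # a lock plus a wait queue

    def P(self):
        """Wait until the value is positive, then decrement it atomically."""
        with self._cond:                    # check and decrement under one lock
            while self._value == 0:         # re-check after every wake-up
                self._cond.wait()           # releases the lock while blocked
            self._value -= 1

    def V(self):
        """Increment the value and let one blocked P (if any) proceed."""
        with self._cond:
            self._value += 1
            self._cond.notify()
```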

3.3. Two kinds of semaphore usage

3.3.1. Binary semaphore

  • Used primarily for mutual exclusion.
  • Behaves like a mutex in many examples.

3.3.2. Counting semaphore

  • Used for event counting or resource counting.
  • Represents how many times something has happened, or how many instances of a resource are available.
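Python's `threading.Semaphore` can sketch both kinds (`acquire` is P, `release` is V). In this hypothetical example, a binary semaphore `mutex` protects shared counters. A counting semaphore `slots` models a pool of three identical resource instances, so at most three workers hold one at a time.

```python
import threading
import time

mutex = threading.Semaphore(1)  # binary semaphore: guards the shared variables below
slots = threading.Semaphore(3)  # counting semaphore: 3 identical resource instances
counter = 0                     # how many workers have used a resource
active = 0                      # how many workers hold a resource right now
max_active = 0


def worker():
    global counter, active, max_active
    slots.acquire()             # P(slots): take one resource instance (may wait)
    mutex.acquire()             # P(mutex): enter the critical section
    active += 1
    max_active = max(max_active, active)
    counter += 1
    mutex.release()             # V(mutex): matching release on the same code path
    time.sleep(0.01)            # use the resource outside the critical section
    mutex.acquire()
    active -= 1
    mutex.release()
    slots.release()             # V(slots): return the instance


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter, max_active)
```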

3.4. Important distinction

  • Binary semaphores and counting semaphores serve different purposes.
  • Correct synchronization solutions must keep these roles conceptually separate.

4. Producer-consumer synchronization

4.1. Problem setting

  • There is a bounded pool of buffers.
  • Producers generate data and place it into buffers.
  • Consumers take data out of buffers and return the empty buffers to the pool.

4.2. Shared state and synchronization needs

  • The queue / pool state is shared.
  • Access to shared queue state must be mutually exclusive.
  • Producers must wait until an empty buffer exists.
  • Consumers must wait until a full buffer exists.

4.3. Required semaphores

4.3.1. One binary semaphore

  • A mutex protects the shared queue / buffer-pool state.

4.3.2. Two counting semaphores

  • One counts empty buffers available.
  • One counts full buffers available.

4.4. Producer logic

  • Wait until an empty buffer exists.
  • Enter the mutex-protected critical section.
  • Remove an empty buffer / update queue state.
  • Leave the critical section.
  • Fill the buffer with data, then re-enter the critical section to publish it on the shared queue of full buffers.
  • Signal that a full buffer is now available.

4.5. Consumer logic

  • Wait until a full buffer exists.
  • Enter the mutex-protected critical section.
  • Remove a full buffer / update queue state.
  • Leave the critical section.
  • Process the data.
  • Re-enter the critical section, since the buffer pool is shared state.
  • Return the empty buffer to the pool and leave the critical section.
  • Signal that an empty buffer is now available.
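The producer and consumer steps above can be sketched in Python with `threading.Semaphore` (acquire = P, release = V). The sketch makes three assumptions for illustration. Buffers are one-element lists. The class and method names are hypothetical. The producer re-acquires the mutex to append its filled buffer to the shared queue of full buffers.

```python
import threading
from collections import deque


class BufferPool:
    """Bounded pool of buffers shared by producers and consumers (sketch)."""

    def __init__(self, n_buffers):
        self.mutex = threading.Semaphore(1)          # binary: protects both queues
        self.empty = threading.Semaphore(n_buffers)  # counting: empty buffers available
        self.full = threading.Semaphore(0)           # counting: full buffers available
        self.free_pool = deque([None] for _ in range(n_buffers))  # one-slot buffers
        self.full_queue = deque()

    def produce(self, item):
        self.empty.acquire()              # P(empty): wait for an empty buffer
        with self.mutex:                  # P(mutex) ... V(mutex)
            buf = self.free_pool.popleft()
        buf[0] = item                     # fill the buffer outside the critical section
        with self.mutex:
            self.full_queue.append(buf)   # publish it
        self.full.release()               # V(full): one more full buffer

    def consume(self):
        self.full.acquire()               # P(full): wait for a full buffer
        with self.mutex:
            buf = self.full_queue.popleft()
        item = buf[0]                     # "process" the data outside the critical section
        with self.mutex:
            self.free_pool.append(buf)    # return the empty buffer to the pool
        self.empty.release()              # V(empty): one more empty buffer
        return item


# Usage: one producer (the main thread) and two consumers running identical code.
pool = BufferPool(4)
results = [[], []]


def consumer(out, n):
    for _ in range(n):
        out.append(pool.consume())


consumers = [threading.Thread(target=consumer, args=(results[i], 50)) for i in range(2)]
for t in consumers:
    t.start()
for i in range(100):
    pool.produce(i)
for t in consumers:
    t.join()
```

The P and V calls on `full` and `empty` sit in different methods (the asymmetric pattern). The mutex P/V calls pair up within each `with` block.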

4.6. Key structural observation

4.6.1. Mutex synchronization is symmetric

  • With a mutex-like semaphore, P and V naturally occur in matching pairs.
  • Lock, do work, unlock.

4.6.2. Condition synchronization is asymmetric

  • For event-counting semaphores, P and V often occur in different places in the program.
  • Example:
    • producer does V(full)
    • consumer does P(full)
  • This is normal, because one side causes the event and another side waits for it.

4.7. Multiple consumers

  • Adding a second consumer requires no algorithmic change.
  • Another consumer can simply run the same code.
  • The semaphore-based design already supports multiple consumers correctly.

4.8. Splitting the mutex

  • The lecture discusses whether separate mutexes could protect different parts of the state.
  • Answer: yes, that is possible.
  • In real implementations, doing so may improve performance and concurrency.
  • The lecture uses one mutex mainly for conceptual simplicity.
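The following Python sketch shows one hypothetical split-lock variant of the bounded-buffer design. One binary semaphore guards the pool of empty buffers, and another guards the queue of full buffers. No code path holds both at once, so the split cannot introduce a lock-order cycle. A producer publishing a buffer also no longer contends with a consumer returning one.

```python
import threading
from collections import deque


class SplitLockPool:
    """Bounded buffer pool with one lock per queue (hypothetical variant)."""

    def __init__(self, n_buffers):
        self.free_lock = threading.Semaphore(1)  # protects free_pool only
        self.full_lock = threading.Semaphore(1)  # protects full_queue only
        self.empty = threading.Semaphore(n_buffers)
        self.full = threading.Semaphore(0)
        self.free_pool = deque([None] for _ in range(n_buffers))
        self.full_queue = deque()

    def produce(self, item):
        self.empty.acquire()
        with self.free_lock:            # touch only the empty-buffer pool
            buf = self.free_pool.popleft()
        buf[0] = item
        with self.full_lock:            # touch only the full-buffer queue
            self.full_queue.append(buf)
        self.full.release()

    def consume(self):
        self.full.acquire()
        with self.full_lock:
            buf = self.full_queue.popleft()
        item = buf[0]
        with self.free_lock:
            self.free_pool.append(buf)
        self.empty.release()
        return item
```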

4.9. Reordering V operations

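  • Reordering V operations is possible when they touch different state and each is already atomic.
  • Because V never blocks, such reordering cannot cause deadlock, but it usually gains nothing either.
  • What matters is keeping critical sections as short as possible, e.g. releasing the mutex as soon as the shared state has been updated.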

4.10. Reordering P operations can be dangerous

  • If a consumer acquires the mutex before waiting for data, disaster may occur:
    • the consumer holds the mutex,
    • but waits for a producer to make data available,
    • while the producer cannot acquire the mutex to publish the data.
  • This creates a deadlock.
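A small Python sketch can reproduce the wrong ordering. A real deadlock would hang forever, so this hypothetical demo uses `acquire(timeout=...)`: each blocked wait gives up and reports failure instead. A `threading.Event` forces the bad interleaving, in which the consumer grabs the mutex first.

```python
import threading

mutex = threading.Semaphore(1)  # protects the queue
full = threading.Semaphore(0)   # counts full buffers; nothing produced yet
consumer_holds_mutex = threading.Event()
outcome = {}


def bad_consumer():
    mutex.acquire()                      # WRONG: P(mutex) before P(full)
    consumer_holds_mutex.set()
    # Waits for data while still holding the mutex (timeout stands in for "forever").
    outcome["consumer_got_data"] = full.acquire(timeout=1.0)
    mutex.release()


def producer():
    consumer_holds_mutex.wait()
    # The producer needs the mutex to publish its data, but the consumer holds it.
    got_mutex = mutex.acquire(timeout=0.2)
    outcome["producer_got_mutex"] = got_mutex
    if got_mutex:
        # (publishing the data would happen here)
        mutex.release()
        full.release()                   # V(full)


threads = [threading.Thread(target=bad_consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outcome)
```

With the correct order (P(full) first, then P(mutex)), the consumer would block on `full` without holding the mutex, and the producer could proceed.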

5. Deadlock

5.1. Intuition

  • Deadlock occurs when processes are stuck waiting on one another and no progress is possible.
  • Dijkstra called this a deadly embrace.

5.2. Wait-for graph view

  • Model processes as nodes.
  • Add an edge from P to Q when process P waits for a resource held by process Q.
  • A cycle in this graph indicates deadlock: every process on the cycle waits for the next one, so none of them can proceed.
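Checking a wait-for graph for a cycle is a standard depth-first search. The following minimal Python sketch assumes the graph is a dictionary mapping each process to the processes it waits for. The process names are hypothetical.

```python
def find_cycle(wait_for):
    """Return the processes on one cycle of the wait-for graph, or None.

    `wait_for` maps each process to the processes it is waiting for."""
    on_path, done, path = set(), set(), []

    def visit(p):
        on_path.add(p)
        path.append(p)
        for q in wait_for.get(p, ()):
            if q in on_path:                 # back edge: q is on the current path
                return path[path.index(q):]
            if q not in done:
                cycle = visit(q)
                if cycle:
                    return cycle
        on_path.discard(p)                   # p is fully explored and not on a cycle
        path.pop()
        done.add(p)
        return None

    for p in wait_for:
        if p not in done:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


# The reordered-P scenario: the consumer waits for the producer's data,
# the producer waits for the mutex held by the consumer.
print(find_cycle({"consumer": ["producer"], "producer": ["consumer"]}))
# -> ['consumer', 'producer']
```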

5.3. Stability of deadlock

  • Once processes are in a deadlocked cycle, nothing inside the cycle can change the situation.
  • Therefore, deadlock is stable unless some outside intervention occurs.

5.4. Necessary conditions for deadlock

The lecture develops the standard necessary conditions.

5.4.1. Mutual exclusion

  • Some resource must require exclusive access.

5.4.2. No resource preemption

  • Once a process holds a resource, the system cannot simply take it away safely.

5.4.3. Hold and wait

  • A process holds one resource while waiting for another.

5.4.4. Circular wait

  • A cycle exists in the wait-for relation.

5.5. Why simple resource preemption is hard

  • A lock usually protects shared state.
  • If the OS simply takes the lock away, the process may already have partially modified the shared data.
  • Then the protected data structure may become inconsistent or corrupted.

5.6. Relation to transactions

  • In database systems, deadlock recovery is more practical because transactions keep enough information to roll back changes.
  • Similar ideas inspired software transactional memory.
  • But for arbitrary low-level memory manipulation in an OS kernel, rollback is difficult and expensive.

6. Approaches to deadlock

6.1. Detection and recovery

  • Detect cycles in the wait-for graph.
  • Then break the deadlock, for example by killing one process.

6.1.1. Why this is difficult in kernels

  • The wait-for graph changes dynamically.
  • Detecting cycles reliably has overhead.
  • Deciding when to run detection is tricky.
  • Recovery is dangerous if the killed process has already left shared state inconsistent.

6.1.2. Suitable at higher abstraction levels

  • Database transaction systems can often do this.
  • OS kernels generally avoid relying on this.

6.2. Prevention

  • Prevent deadlock from arising in the first place.
  • Since the first three necessary conditions are hard to remove in practice, the lecture focuses on preventing circular wait.

7. Deadlock prevention via resource ordering

7.1. Core idea

  • Impose a partial order on resources / locks.
  • Processes may acquire locks only in an order consistent with that partial order.
  • Then cycles become impossible.

7.2. Why a partial order works

  • If every lock acquisition moves “forward” in the order, a cycle would imply
    • \(R_1 < R_2 < \dots < R_k < R_1\),
  • and hence \(R_1 < R_1\) by transitivity, which is impossible in a strict order.

7.3. Why only a partial order, not a total order

  • Not all locks in a system need to be related.
  • If two locks never interact, their relative order does not matter.
  • Therefore a partial order is sufficient.

7.4. Practical interpretation

  • In real systems, the order may be implicit rather than globally written down.
  • Every code path must respect it.
  • If one path acquires \(A\) then \(B\), another path must not acquire \(B\) then \(A\).
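One way to make the order explicit in code is to give each lock a rank and always acquire in increasing rank. The Python sketch below is hypothetical: `RankedLock`, `acquire_in_order`, and the bank-account example are illustrations, not from the lecture. It assumes each call names distinct locks with distinct ranks.

```python
import threading


class RankedLock:
    """A lock tagged with its (hypothetical) position in the global lock order."""

    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()


def acquire_in_order(*locks):
    # Always move "forward" in the order, whatever order the caller names them in.
    for lk in sorted(locks, key=lambda x: x.rank):
        lk.lock.acquire()


def release_all(*locks):
    # Release order is irrelevant for deadlock: releasing never blocks.
    for lk in locks:
        lk.lock.release()


class Account:
    def __init__(self, rank, balance):
        self.guard = RankedLock(rank)
        self.balance = balance


def transfer(src, dst, amount):
    # Locking src then dst directly could deadlock against a concurrent dst -> src transfer.
    acquire_in_order(src.guard, dst.guard)
    try:
        src.balance -= amount
        dst.balance += amount
    finally:
        release_all(src.guard, dst.guard)


# Usage: two threads transfer in opposite directions without deadlocking.
a, b = Account(rank=1, balance=100), Account(rank=2, balance=100)


def repeat(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


threads = [threading.Thread(target=repeat, args=(a, b)),
           threading.Thread(target=repeat, args=(b, a))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```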

7.5. Linux example: lock dependency checking

  • The lecture mentions Linux’s lockdep debugging feature.
  • The kernel is too large for humans to write one global order explicitly.
  • Instead, runtime checking can detect counterexamples:
    • one code path acquires locks in one order,
    • another path acquires them in the reverse order.
  • Such a pair cannot both belong to the same valid partial order.
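A toy checker in Python can illustrate the counterexample idea. This hypothetical sketch is far simpler than Linux's lockdep. Per thread, it records which locks are held when another lock is acquired. It reports only a direct two-lock inversion; longer cycles would need a full cycle search. It flags the inverted path even though that particular run did not deadlock.

```python
import threading
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order validator (hypothetical; far simpler than lockdep).

    Records an edge a -> b whenever lock b is acquired while a is held, and
    raises if an acquisition would add the reverse of an edge already seen."""

    def __init__(self):
        self.after = defaultdict(set)     # a -> {locks acquired while a was held}
        self.graph_lock = threading.Lock()
        self.local = threading.local()    # per-thread stack of held lock names

    def _held(self):
        if not hasattr(self.local, "stack"):
            self.local.stack = []
        return self.local.stack

    def acquire(self, name):
        with self.graph_lock:
            for held in self._held():
                if held in self.after[name]:  # seen name -> held before, now held -> name
                    raise RuntimeError(
                        f"lock order inversion: {held} -> {name} vs. {name} -> {held}")
                self.after[held].add(name)
        self._held().append(name)

    def release(self, name):
        self._held().remove(name)


# Usage: path 1 takes A then B; path 2 later takes B then A.
checker = LockOrderChecker()
checker.acquire("A")
checker.acquire("B")
checker.release("B")
checker.release("A")
checker.acquire("B")
try:
    checker.acquire("A")      # flagged, even though this run did not deadlock
except RuntimeError as err:
    print(err)                # lock order inversion: B -> A vs. A -> B
checker.release("B")
```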

7.6. Important detail about V operations

  • Deadlock fundamentally arises from waiting.
  • P operations may block; V operations do not block.
  • Therefore ordering constraints matter primarily for blocking acquisition operations, not for releases.

8. “All-or-nothing” resource acquisition

8.1. Idea

  • Instead of acquiring locks incrementally, require a process to request all needed resources at once.
  • The request succeeds only if all are available; otherwise it gets none.
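Here is a minimal Python sketch of all-or-nothing acquisition. It assumes a single hypothetical manager object that guards all resources with one condition variable. A request waits, holding nothing, until every requested resource is free, then takes them all at once.

```python
import threading


class AllOrNothing:
    """Hypothetical manager: a request gets every resource it names, or none."""

    def __init__(self, names):
        self.free = set(names)
        self.cond = threading.Condition()

    def acquire_all(self, wanted):
        wanted = set(wanted)
        with self.cond:
            # Holding nothing, wait until *all* requested resources are free.
            while not wanted <= self.free:
                self.cond.wait()
            self.free -= wanted          # take them all in one atomic step

    def release_all(self, held):
        with self.cond:
            self.free |= set(held)
            self.cond.notify_all()       # any waiting request may now be satisfiable


# Usage: the two workers name the resources in opposite orders; no deadlock.
manager = AllOrNothing({"A", "B"})
uses = [0]


def worker(request):
    for _ in range(200):
        manager.acquire_all(request)
        uses[0] += 1                     # both resources held: exclusive here
        manager.release_all(request)


threads = [threading.Thread(target=worker, args=(["A", "B"],)),
           threading.Thread(target=worker, args=(["B", "A"],))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No process ever holds some resources while waiting for others, so hold-and-wait cannot arise, and the order in which a request names its resources does not matter.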

8.2. Advantage

  • This eliminates the hold-and-wait condition.

8.3. Limitation

  • In many OS scenarios, a process discovers which resource it needs next only after inspecting already locked state.
  • Therefore it often cannot know all required resources in advance.

8.4. Conservative workaround

  • Request every resource that might be needed.
  • This is safe but pessimistic.
  • It harms concurrency.

8.5. Retry-loop workaround

  • Lock something, inspect it, unlock, then request a larger set, check again, maybe retry.
  • This may work in practice, but progress is no longer guaranteed: after the lock is released, the observed state may change before the larger set is acquired, forcing the process to restart. Under contention, such restarts can repeat indefinitely.
  • So it is not a fully satisfactory solution.

9. Reader-writer synchronization

9.1. Problem setting

  • Shared data structure, e.g. a database.
  • Readers only inspect the structure.
  • Writers modify it.

9.2. Desired access policy

9.2.1. Readers

  • Multiple readers may read concurrently.

9.2.2. Writers

  • A writer must exclude:
    • other writers,
    • all readers.

9.3. Reader protocol requirements

A reader should:

  1. Wait until reading is allowed.
  2. Record its presence as an active reader.
  3. Perform the read.
  4. Remove its presence when done.
  5. Potentially wake a waiting writer when the last reader leaves.

9.4. Writer protocol requirements

A writer should:

  1. Wait until writing is allowed.
  2. Record its presence as the active writer.
  3. Perform the write.
  4. Remove its presence when done.
  5. Wake either another writer or the waiting readers.

9.5. State variables

The lecture introduces four counters:

  • AR = active readers
  • AW = active writers
  • WR = waiting readers
  • WW = waiting writers

9.6. Semaphores needed

  • One mutex to protect these counters.
  • One counting semaphore okToRead.
  • One counting semaphore okToWrite.

9.7. Reader-side logic

9.7.1. Entry section

  • Acquire the mutex.
  • If there are no active or waiting writers, the arriving reader may proceed immediately:
    • increment AR,
    • signal okToRead for itself.
  • Otherwise:
    • increment WR,
    • do not signal okToRead,
    • so the reader will really block later.
  • Release the mutex.
  • Perform P(okToRead).
  • Then enter the actual read section.

9.7.2. Exit section

  • Acquire the mutex.
  • Decrement AR.
  • If this reader was the last active reader and writers are waiting:
    • signal one writer,
    • update bookkeeping so that one writer becomes active.
  • Release the mutex.

9.8. Writer-side logic

9.8.1. Entry section

  • Acquire the mutex.
  • If there are no active readers and no active writers:
    • the writer may proceed,
    • increment AW,
    • signal okToWrite for itself.
  • Otherwise:
    • increment WW,
    • the writer will later block.
  • Release the mutex.
  • Perform P(okToWrite).
  • Then enter the actual write section.

9.8.2. Exit section

  • Acquire the mutex.
  • Decrement AW.
  • If there are waiting writers:
    • wake one writer next (decrement WW, increment AW).
  • Otherwise, if there are waiting readers:
    • wake all waiting readers, not just one (move each from WR to AR).
  • Release the mutex.

9.9. Why waking all readers requires a loop

  • Readers are allowed to proceed concurrently.
  • Therefore the writer exit code must signal okToRead once for each waiting reader.

9.10. Important subtlety

  • The code sometimes “signals itself”:
    • when a process discovers that it may proceed immediately,
    • it records that fact by incrementing the corresponding semaphore,
    • then later consumes that permission with its own P operation.
  • This keeps the structure uniform: both immediate progress and blocking cases use the same synchronization point.
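Combining sections 9.5 through 9.10, the reader and writer protocols can be sketched in Python with `threading.Semaphore` (acquire = P, release = V). The method names `start_read`, `end_read`, `start_write`, and `end_write` are hypothetical. The counters and semaphores follow the lecture's AR, AW, WR, WW, okToRead, and okToWrite. The sketch implements writer preference: an arriving reader waits if any writer is active or waiting.

```python
import threading


class ReadersWriters:
    """Reader-writer synchronization with writer preference (sketch)."""

    def __init__(self):
        self.mutex = threading.Semaphore(1)        # protects the four counters
        self.ok_to_read = threading.Semaphore(0)   # counting: admitted readers
        self.ok_to_write = threading.Semaphore(0)  # admits one writer at a time
        self.AR = self.AW = self.WR = self.WW = 0

    def start_read(self):
        with self.mutex:
            if self.AW + self.WW == 0:       # no active or waiting writer
                self.AR += 1
                self.ok_to_read.release()    # "signal itself"
            else:
                self.WR += 1                 # will really block below
        self.ok_to_read.acquire()            # P(okToRead), outside the mutex

    def end_read(self):
        with self.mutex:
            self.AR -= 1
            if self.AR == 0 and self.WW > 0:  # last reader hands over to one writer
                self.WW -= 1
                self.AW += 1
                self.ok_to_write.release()

    def start_write(self):
        with self.mutex:
            if self.AR + self.AW == 0:       # no active readers or writers
                self.AW += 1
                self.ok_to_write.release()   # "signal itself"
            else:
                self.WW += 1
        self.ok_to_write.acquire()           # P(okToWrite)

    def end_write(self):
        with self.mutex:
            self.AW -= 1
            if self.WW > 0:                  # writer preference: next writer first
                self.WW -= 1
                self.AW += 1
                self.ok_to_write.release()
            else:
                while self.WR > 0:           # wake *all* waiting readers
                    self.WR -= 1
                    self.AR += 1
                    self.ok_to_read.release()
```

A reader brackets its read with `start_read()`/`end_read()`, and a writer brackets its write with `start_write()`/`end_write()`. P(okToRead) and P(okToWrite) happen after the mutex is released, so a blocked process never holds the mutex.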

10. Writer preference

10.1. Policy in this lecture

  • Once a writer is waiting, newly arriving readers are no longer allowed to enter immediately.
  • They must wait.

10.2. Motivation

  • Reader-writer locks are typically used when readers are frequent and writers are less frequent but important.
  • If readers were always preferred, a continuous stream of readers could starve writers forever.
  • Writer preference reduces update latency.

10.3. Alternative policies

10.3.1. Reader preference

  • Favors readers: they wait only while a writer is active, which maximizes read concurrency.
  • But may starve writers.

10.3.2. Fair reader-writer locks

  • Alternate or otherwise enforce fairness between readers and writers.

11. Additional clarifications from discussion

11.1. Waiting vs scheduler preemption

  • “Waiting” in the deadlock discussion means blocked / non-runnable waiting for an event.
  • This is different from being merely preempted by the scheduler.
  • A preempted process is still conceptually runnable.

11.2. Why writer entry does not explicitly check waiting writers

  • The lecture notes that checking for active readers and active writers is sufficient for correctness in the given logic.
  • A question showed that the precise implication needs care:
    • a writer only ever waits because, on arrival, there were active readers or an active writer,
    • and the exit code hands over to a waiting writer before these counts can both drop to zero.
  • Hence WW > 0 implies AR > 0 or AW > 0, so AR = 0 and AW = 0 already imply WW = 0; the active-state checks are the essential part.

11.3. Domain of some variables

  • AW is effectively binary in the presented design: either 0 or 1.
  • okToRead may count multiple readers.
  • okToWrite only enables one writer at a time.

12. Practical takeaway of the lecture

12.1. Semaphores enable layered synchronization design

  • Once the OS provides semaphores, many higher-level coordination patterns can be built systematically.

12.2. Distinguish two major uses

  • Mutual exclusion for protecting shared state.
  • Condition/event synchronization for waiting on state changes.

12.3. Deadlock is a design issue, not just a bug in one statement

  • Incorrect acquisition order can make perfectly reasonable code deadlock.
  • Preventing circular wait by lock ordering is the key practical discipline.

12.4. Reader-writer synchronization is more subtle than mutex-only synchronization

  • Readers may share access, but writers need exclusive access.
  • This requires richer state and more involved bookkeeping.

13. Administrative note at the end of the lecture

  • The first project assignment / milestone is posted.
  • Submission deadline: May 13.
  • Students are encouraged to start early because:
    • the project is work-intensive,
    • the next assignment appears already on May 6,
    • so assignments overlap.
  • Submission is through a specially named Git branch and an integrated CMS/repository workflow.
  • Automatic grading exists, but human inspection also matters.
  • Good code quality and style are expected; passing auto-tests alone is not enough.

Author: Lowtroo

Created on: 2026-04-22 Wed 16:00
