Shared Memory Synchronization 1
1. From Communication to Shared-Memory Synchronization
1.1. Two fundamental ways to realize coordination between processes
- The lecture begins by revisiting the two basic approaches to communication and coordination between processes:
- message passing
- shared memory
1.2. Message passing vs. shared memory
1.2.1. Message passing
- Typically easier to use correctly.
- Provides an OS-level abstraction for communication.
- The receiver does not have to trust arbitrary shared memory contents, as it must with shared memory.
- Often comes with useful API properties such as:
- buffering,
- possibly multiple senders and receivers,
- decoupling between participants.
1.2.2. Shared memory
- Processes intentionally share part of their memory.
- This effectively means “poking a hole” into process isolation.
- It can be efficient, but correctness becomes the programmer’s responsibility.
- In particular, synchronization must be handled explicitly to avoid overwriting each other’s data or observing inconsistent state.
1.3. Important clarification
- The lecture distinguishes:
- true message passing as an OS primitive, versus
- manually building something that behaves like message passing on top of shared memory.
- One can emulate a mailbox using shared memory, but then one no longer automatically gets the semantic guarantees and convenient abstraction of the OS primitive.
2. Shared memory
The lecture then introduces the alternative: shared memory.
Here, processes deliberately allow part of their memory to be shared, effectively “poking a hole” through the isolation boundary. One process can then write data that another process can read directly.
2.1. Why shared memory is useful
The main reason to use shared memory is performance. Since communication becomes ordinary loads and stores, it can support much higher communication rates and larger data transfers than explicit message passing.
The lecture gives an example such as a video-processing pipeline: moving large frames at high frequency between stages may be impractical with message passing, but feasible with shared memory.
2.2. The cost: synchronization complexity
The drawback is that shared memory reintroduces all the difficulties of concurrent access to shared state:
- ordering becomes crucial,
- correctness depends on the interleaving of operations,
- bugs can be timing-dependent and difficult to reproduce.
The lecture notes that such systems are often called “non-deterministic,” though more precisely the execution is deterministic given a fixed thread schedule; the problem is that the schedule is typically unknown or hard to reproduce. This leads to classic heisenbugs: errors that appear in one run and disappear in the next.
2.3. Interleavings and what outputs are possible
To make the issue concrete, the lecturer considers two processes writing:
- one writes ABC,
- the other writes CBA.
Because execution can interleave arbitrarily, many mixed outputs are possible. For example, something like CBC can occur under a suitable interleaving.
However, not every string is possible. Each individual process still executes its own instructions in order, so no output may violate the per-process ordering constraints. The lecture ends by using this example to motivate synchronization primitives, which will be continued in the next lecture.
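The reachable outputs can be enumerated mechanically. The sketch below assumes, purely for illustration, that both processes write their three characters into the same three-cell buffer (cell 0, then 1, then 2); this is one reading under which a three-character output such as CBC makes sense, and the lecture may have meant a different setup. The helper names `interleavings` and `final_buffers` are hypothetical.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(total)]

def final_buffers(text_1, text_2):
    """Return every final buffer content reachable when two writers share one buffer."""
    # Each write is (cell index, character); a writer fills the cells left to right.
    writes_1 = list(enumerate(text_1))
    writes_2 = list(enumerate(text_2))
    results = set()
    for schedule in interleavings(writes_1, writes_2):
        buffer = [None] * len(text_1)
        for cell, char in schedule:
            buffer[cell] = char  # the later write to a cell wins
        results.add("".join(buffer))
    return results

outcomes = final_buffers("ABC", "CBA")
print(sorted(outcomes))  # ['ABA', 'ABC', 'CBA', 'CBC']
```

Under this assumption CBC is reachable, while a string such as BBB is not, because no process ever writes B into cell 0. The enumerator only generates schedules that keep each process's own writes in program order.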
3. Why Shared Memory is Difficult
3.1. Interleaving matters
- Once state is shared, the order of accesses becomes significant.
- The behavior of the system may then depend on:
- the interleaving chosen by the scheduler,
- the relative speed of the processes or threads,
- the exact hardware guarantees.
3.2. Some interleavings are harmless
- Not all operations are sensitive to order.
Example:
A = 1 || B = 2
- These operations affect different variables, so the final state is independent of order.
3.3. Some interleavings are not harmless
Example:
A = B + 1 || B = 2 * B
- Here, the result depends on which operation happens first.
- If the assignment to A happens first, A is computed from the old value of B, so A gets (old B) + 1.
- If the doubling of B happens first, then A may see the doubled value of B and compute a different result.
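A small sketch makes the order dependence concrete. It runs the two statements sequentially in both orders; the initial value B = 3 and the helper name `run` are assumptions chosen for illustration.

```python
def run(order, b_initial=3):
    """Execute the two statements one after the other in the given order; return (A, B)."""
    state = {"A": 0, "B": b_initial}
    statements = {
        "assign_A": lambda s: s.update(A=s["B"] + 1),  # A = B + 1
        "double_B": lambda s: s.update(B=2 * s["B"]),  # B = 2 * B
    }
    for name in order:
        statements[name](state)
    return state["A"], state["B"]

a_first = run(["assign_A", "double_B"])
b_first = run(["double_B", "assign_A"])
print(a_first, b_first)  # (4, 6) (7, 6)
```

B ends up as 6 either way, but A is 4 in one order and 7 in the other.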
3.4. Concurrent writes to the same location
Example:
A = 1 || A = 2
- Without further assumptions, the outcome is hardware-dependent.
- One might see:
- 1,
- 2,
- or potentially even a corrupted/interleaved result, if the hardware does not guarantee atomicity for that write.
3.5. Moral
- The outcome of concurrent shared-memory execution cannot be reasoned about purely at source-code level.
- It depends crucially on what the hardware guarantees.
4. Atomic Operations
4.1. Basic idea
- An atomic operation is an operation that cannot be interrupted “in the middle”.
- It either happens entirely or not at all, from the perspective of concurrent observers.
4.2. Example: aligned atomic writes
- Suppose the hardware guarantees that writes to aligned 16-bit words are atomic.
Then, if A is an aligned 16-bit variable:
A = 1 || A = 2
cannot produce some mixed intermediate bit pattern such as 3.
- In that case, the final outcome must be either 1 or 2.
4.3. Alignment matters
- The lecture emphasizes the distinction between:
- aligned accesses: address is a multiple of the word size,
- non-aligned accesses: address is not a multiple of the word size.
- Typical hardware often guarantees atomicity only for certain native aligned word accesses.
- Non-aligned accesses are often not atomic.
4.4. Larger values spanning multiple words
- Even if 16-bit aligned writes are atomic, a 32-bit value spanning two 16-bit words is not automatically atomic.
Example:
A = 0x1 || A = 0x10000
- If the two writes update different halves of the 32-bit value independently, one can observe a mixed result.
The lecture gives the example that such an execution can yield a hybrid value (the low half from one write, the high half from the other) such as:
0x10001
- The key point is:
- word-level atomicity does not imply atomicity for larger multi-word objects.
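The torn write can be reproduced in a small simulation. The sketch below assumes that the 32-bit value is stored as two 16-bit halves, that each half-store is atomic by itself, and that every writer stores the low half before the high half; the helper names are hypothetical.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(total)]

def split_halves(value):
    """Express one 32-bit store as two separate 16-bit stores (low half first)."""
    return [("lo", value & 0xFFFF), ("hi", (value >> 16) & 0xFFFF)]

def reachable_values(value_1, value_2):
    """Return every final 32-bit value reachable when two writers race."""
    results = set()
    for schedule in interleavings(split_halves(value_1), split_halves(value_2)):
        word = {"lo": 0, "hi": 0}
        for half, bits in schedule:
            word[half] = bits  # each 16-bit store is atomic on its own
        results.add((word["hi"] << 16) | word["lo"])
    return results

values = reachable_values(0x1, 0x10000)
print([hex(v) for v in sorted(values)])  # ['0x0', '0x1', '0x10000', '0x10001']
```

Besides the two intended values, the simulation reaches 0x10001 (both halves set) and 0x0 (both halves cleared), neither of which any writer ever stored.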
4.5. Hardware dependence
- Atomicity properties are not universal.
- One must understand the guarantees of the actual hardware/platform.
- Common hardware-level atomic primitives include:
- word-aligned load/store,
- fetch-and-increment,
- test-and-set,
- compare-and-exchange / compare-and-swap (CAS).
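The intended semantics of these primitives can be sketched in a toy model. This is not how hardware implements them: a Python lock stands in for the hardware's atomicity guarantee, and the class `AtomicWord` and its method names are assumptions made for illustration.

```python
import threading

class AtomicWord:
    """Toy model of one memory word; a lock stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def fetch_and_increment(self):
        """Add one and return the value seen before the increment."""
        with self._guard:
            old = self._value
            self._value = old + 1
            return old

    def test_and_set(self):
        """Set the word to 1 and return the previous value."""
        with self._guard:
            old = self._value
            self._value = 1
            return old

    def compare_and_swap(self, expected, new):
        """Store new only if the word still equals expected; report success."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

word = AtomicWord(0)
first = word.test_and_set()             # 0: the flag was clear and is now set
second = word.test_and_set()            # 1: the flag was already set
swapped = word.compare_and_swap(1, 5)   # True: the word was 1 and is now 5
stale = word.compare_and_swap(1, 9)     # False: the word is 5, not 1
before = word.fetch_and_increment()     # 5: the word now holds 6
```

The common theme is that the read and the write happen as one indivisible step, so a caller can tell from the return value whether it was the one that changed the word.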
4.6. Practical takeaway
- Never assume an access is atomic unless the hardware/platform specification says so.
- Concurrency reasoning must be grounded in the exact machine model.
5. Race Conditions
5.1. Definition
- A race condition occurs when processes or threads are “racing” to perform conflicting operations.
- The outcome then depends on:
- interleaving,
- relative timing,
- process speed.
5.2. Why races are bad
- Usually, race conditions are considered bugs.
- They make behavior:
- nondeterministic,
- hard to reproduce,
- hard to debug.
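A classic instance is the lost update. The sketch below is a deterministic simulation rather than real threading: it assumes that an increment is executed as three separate steps (load, add, store) and uses Python generators so that the schedule can be chosen by hand. The names `increment` and `run` are hypothetical.

```python
def increment(shared):
    """One 'counter += 1', split into the separate steps a CPU might perform."""
    register = shared["counter"]   # load
    yield
    register += 1                  # add in a private register
    yield
    shared["counter"] = register   # store
    yield

def run(schedule):
    """Step two incrementing threads in the order given by schedule."""
    shared = {"counter": 0}
    threads = [increment(shared), increment(shared)]
    for thread_id in schedule:
        next(threads[thread_id])
    return shared["counter"]

serial = run([0, 0, 0, 1, 1, 1])   # thread 0 finishes before thread 1 starts
racy = run([0, 1, 0, 1, 0, 1])     # both load the old value before either stores
print(serial, racy)  # 2 1
```

The same program yields 2 or 1 depending only on the schedule, which is what makes such bugs hard to reproduce.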
5.3. Benign races
- The lecture briefly notes that some races can be benign in practice.
- However, this is an exception.
- In general, races should be treated as incorrect unless there is a very strong reason not to.
5.4. Goal
- The goal of synchronization is to eliminate unwanted timing dependence and make the system behavior deterministic with respect to the intended specification.
6. The “Milk Buying” Problem as a Synchronization Example
6.1. Scenario
- The lecture introduces a deliberately simple but instructive example:
- two roommates / robots,
- one fridge,
- milk should be bought if there is none,
- but there should not be too much milk,
- ideally exactly one bottle is bought when needed.
6.2. What can go wrong
- If both agents independently observe “no milk” and both decide to go shopping, then both may buy milk.
- The outcome depends on timing.
- This is precisely a race condition.
6.3. Why the example matters
- The example is simple, but it captures the core of synchronization:
- conflicting operations on shared state,
- need for mutual exclusion,
- need for progress,
- need to separate specification from implementation.
7. First Specify Correctness Before Proposing a Fix
7.1. Important methodology
- Before trying to “fix” concurrency bugs, one must first specify what correct behavior actually means.
- The lecture explicitly warns against prematurely diving into low-level implementation details.
7.2. High-level specification of the milk problem
- A correct solution should satisfy at least the following:
7.2.1. Safety
- Do not bring home a bottle of milk if there is already milk.
- In other words, do not buy too much milk.
- At most one agent should be in the relevant critical section at a time.
7.2.2. Progress / liveness
- If there is no milk and no one is already taking care of the task, then eventually someone should go buy milk.
- “Eventually” means:
- after a finite number of steps,
- not “maybe never”.
7.2.3. Exclusion
- If one agent is already performing the milk-buying operation, the other should be excluded from doing the same conflicting operation at the same time.
7.3. Critical section
- The relevant instructions that manipulate the shared decision state form a critical section.
- A key synchronization goal is:
- critical sections that conflict must not execute simultaneously.
8. Naïve and Broken Milk-Buying Solutions
8.1. Idea: leave a note
- A first idea is to use a note on the fridge:
- if there is no milk, leave a note saying “I’m taking care of it”,
- then go buy milk.
8.2. Why this fails
The problem is that:
- checking the fridge,
- and leaving the note,
are not atomic together.
- Both agents may:
- observe that there is no milk,
- observe that there is no note,
- both leave a note,
- both go buy milk.
- This only makes the bug less likely, not impossible.
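The failure can be replayed step by step. The following sketch simulates the note protocol with a hand-picked schedule instead of real threads; the split into four steps and the names `roommate` and `simulate` are assumptions made for illustration.

```python
def roommate(fridge, log, name):
    """Broken protocol: checking and leaving the note are separate steps."""
    no_milk = fridge["milk"] == 0      # step 1: look into the fridge
    yield
    no_note = not fridge["note"]       # step 2: look for a note
    yield
    if no_milk and no_note:
        fridge["note"] = True          # step 3: leave a note
        yield
        fridge["milk"] += 1            # step 4: buy milk, then remove the note
        log.append(name)
        fridge["note"] = False
    yield

def simulate(schedule):
    """Step the agents in the given order; return (bottles of milk, who bought)."""
    fridge = {"milk": 0, "note": False}
    log = []
    agents = {"A": roommate(fridge, log, "A"), "B": roommate(fridge, log, "B")}
    for name in schedule:
        next(agents[name], None)       # a finished agent is simply skipped
    return fridge["milk"], log

lucky = simulate("AAAABBBB")     # A completes before B looks at the fridge
unlucky = simulate("ABABABAB")   # both check before either leaves a note
print(lucky, unlucky)  # (1, ['A']) (2, ['A', 'B'])
```

Most schedules behave like the lucky one, which is why the bug is rare but still present.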
8.3. Important engineering lesson
- Making a race harder to trigger does not make the program correct.
- The lecture explicitly points out this common anti-pattern:
- “it doesn’t fail anymore” does not imply correctness.
8.4. Another workaround: asymmetric protocols
- The lecture then considers a more complicated protocol where the two processes execute different code.
- This can be made to avoid double-buying in some cases.
8.5. Why asymmetric protocols are unattractive
- They are awkward from a software engineering perspective.
- Different threads/processes now need different code.
- Extending the scheme to a third participant becomes unpleasant.
- Such solutions may also introduce unfairness or bias.
9. Busy Waiting (Spinning)
9.1. Definition
- Busy waiting (or spinning) means repeatedly checking a condition in a loop while doing no useful work.
9.2. Why it is bad
- On a uniprocessor, busy waiting is especially wasteful:
- the waiting thread consumes CPU time,
- while the thread that could make progress may not get scheduled.
- Thus, CPU time is wasted without useful progress.
9.3. Unbounded waiting is the core problem
- The issue is not merely that there is a loop.
- The issue is that the waiting time is not bounded in the program logic.
- A pathological schedule can make the spinner waste arbitrary amounts of CPU time.
9.4. Scheduler interaction
- The lecture gives an example:
- one process is spinning,
- the other process, which would resolve the condition, is not scheduled,
- perhaps because the spinner has higher priority.
- Then the system may effectively stall.
9.5. Important exam-style lesson
- Replacing the infinite loop with a fixed finite delay does NOT solve the correctness problem.
- For any fixed number of wait steps, one can construct a schedule where that delay is insufficient.
- Therefore, “pause a bit and hope” is not a correct synchronization strategy.
9.6. General rule
- For the synchronization problems in this course, busy waiting is not considered a proper solution.
10. Locks as the Conceptual Interpretation
10.1. The note analogy as locking
- The lecture interprets the note protocol as a crude locking scheme:
- leaving a note = locking the resource,
- removing the note = unlocking the resource.
10.2. Meaning of the lock
- The protected resource is not literally “the fridge note”.
- Rather, the lock protects:
- the consistency of the shared decision state,
- the correctness condition “do not buy more than one bottle”.
10.3. But crude locking is still not enough
- Without proper primitives, the lock itself may be manipulated incorrectly.
- Therefore, one needs a more robust abstraction.
11. Hardware Support for Atomic Synchronization
11.1. Desire for stronger atomic operations
- Several proposed fixes effectively ask for something like:
- “check and set together”,
- “look and claim together”,
- “take a note and inspect state in one indivisible step”.
11.2. Real hardware primitives
- Hardware often offers such low-level atomic instructions, for example:
- load/store variants,
- fetch-and-increment,
- test-and-set,
- compare-and-exchange / CAS.
11.3. But these are not portable abstractions
- Different architectures provide different instructions and semantics.
- x86, ARM, and other architectures differ substantially.
- Correct use therefore requires detailed hardware knowledge.
12. Disabling Interrupts
12.1. Uniprocessor idea
- On a uniprocessor, concurrency from the OS scheduler arises through interleaving.
- Interleaving is caused by preemption, and preemption is triggered by interrupts.
- Therefore, on a uniprocessor:
- execution between two interrupts is non-interleaved.
12.2. Consequence
- If the kernel temporarily disables interrupts, then the currently running code will not be preempted.
- This can make a short sequence of instructions effectively atomic on a uniprocessor.
12.3. Why this is dangerous
- Interrupts must not be disabled for too long.
- Otherwise the system becomes unresponsive to the environment.
- Examples of negative consequences mentioned in the lecture:
- delayed reaction to device events,
- increased latency,
- poor interactivity,
- bad behavior in safety-critical systems,
- networking delays,
- generally reduced responsiveness.
12.4. Multiprocessor limitation
- Disabling interrupts on one core does not stop another core from accessing shared memory.
- Therefore, this is not a general solution on multiprocessors.
12.5. Proper lesson
- Disabling interrupts can be used very carefully for extremely short kernel-critical sequences.
- But it is not the general synchronization solution.
- Proper synchronization abstractions are still needed.
13. Why Raw Atomic Instructions Are Not Enough
13.1. Too hardware-specific
- Available atomic instructions vary from machine to machine.
13.2. Hard to use correctly
- Even similarly named instructions may have different semantics across architectures.
13.3. Often leads to spinning
- If synchronization is built directly on raw atomic primitives, one often ends up with busy waiting.
- The scheduler is unaware of what the process is waiting for.
13.4. Therefore
- The OS should provide a higher-level, portable abstraction with predictable semantics.
14. The OS-Level Abstraction: Semaphores
14.1. Historical note
- Semaphores were introduced by Dijkstra in 1962.
14.2. Core abstraction
- A semaphore is essentially an integer counter together with two atomic operations.
14.3. Operations
14.3.1. P operation
- Wait until the counter is positive.
- Then decrement it by one atomically.
- Intuition:
- pass only if permission/resource is available.
14.3.2. V operation
- Increment the counter by one atomically.
- Intuition:
- signal that one unit of permission/resource/event has become available.
14.4. Alternative names
- The lecture mentions common alternative terminology:
- P is also called:
- wait,
- down,
- acquire.
- V is also called:
- signal,
- up,
- release.
14.5. Why semaphores are better
- They provide:
- portable semantics,
- a simple abstraction,
- efficient implementation inside the OS,
- no need for user-level busy waiting.
14.6. Efficient implementation idea
- If a thread cannot proceed in P, the OS can:
- remove it from the ready queue,
- put it onto a wait queue,
- wake it later when the semaphore is signaled.
- This avoids wasting CPU cycles by spinning.
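The blocking behavior can be sketched in user space. The example below is not the kernel implementation described above: it assumes that Python's `threading.Condition` plays the role of the wait queue, and the class name `CountingSemaphore` is hypothetical.

```python
import threading

class CountingSemaphore:
    """Sketch of P/V on a condition variable; waiters sleep instead of spinning."""

    def __init__(self, initial):
        self._count = initial
        self._changed = threading.Condition()

    def P(self):
        with self._changed:
            while self._count == 0:
                self._changed.wait()   # sleep until some V() notifies
            self._count -= 1

    def V(self):
        with self._changed:
            self._count += 1
            self._changed.notify()     # wake one waiting thread, if any

done = CountingSemaphore(0)
events = []

def waiter():
    done.P()                   # blocks until the main thread signals
    events.append("woken")

t = threading.Thread(target=waiter)
t.start()
events.append("signal")
done.V()
t.join()
print(events)  # ['signal', 'woken']
```

The waiting thread consumes no CPU time while it is blocked in `wait()`; it becomes runnable again only when `V` notifies it.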
15. Solving the Milk-Buying Problem with a Semaphore
15.1. Binary semaphore for mutual exclusion
- The milk-buying problem becomes simple with a semaphore.
- Use a semaphore representing the right to enter the critical section.
15.2. Initial value
The correct initial value is:
1
- Reason:
- the first arriving process sees a positive value and proceeds,
- the next one sees 0 and must wait.
15.3. Conceptual solution
The code structure is essentially:
    P(s)
    check whether milk is present
    if not, buy milk
    V(s)
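A runnable version of this structure might look as follows. It is a minimal sketch that assumes Python's `threading.Semaphore` as the semaphore implementation, with `acquire` for P and `release` for V; the `fridge` dictionary and the `roommate` function are illustrative names.

```python
import threading

s = threading.Semaphore(1)        # initial value 1: one agent may enter
fridge = {"milk": 0}
purchases = []

def roommate(name):
    s.acquire()                   # P(s)
    try:
        if fridge["milk"] == 0:   # check whether milk is present
            fridge["milk"] += 1   # if not, buy milk
            purchases.append(name)
    finally:
        s.release()               # V(s), even if the body raises

threads = [threading.Thread(target=roommate, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(fridge["milk"], len(purchases))  # 1 1
```

Both agents run the same code, and whichever enters second finds the milk already there.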
15.4. Why this is good
- The solution is:
- symmetric,
- short,
- easy to reason about,
- portable,
- efficient when implemented by the OS.
16. Binary vs. Counting Semaphores
16.1. Binary semaphore
- Values effectively restricted to 0 and 1.
- Main use:
- mutual exclusion
- It behaves conceptually like an atomic boolean lock.
16.2. Counting semaphore
- Value is not restricted to 0 or 1.
- It can count how many times some event has occurred.
- Main use:
- condition synchronization
- counting available resources or produced events.
16.3. Key distinction
- Binary semaphore:
- protects critical sections.
- Counting semaphore:
- records availability or occurrence of events.
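The counting behavior can be observed directly. The sketch below assumes Python's `threading.Semaphore`; the non-blocking `acquire(blocking=False)` is used only so that the example can report "would have to wait" instead of actually blocking.

```python
import threading

events = threading.Semaphore(0)    # counting semaphore: nothing has happened yet

for _ in range(3):
    events.release()               # V: record one occurrence

# P succeeds once per recorded event; a fourth attempt would have to wait.
consumed = 0
while events.acquire(blocking=False):
    consumed += 1
print(consumed)  # 3
```

The semaphore remembers the three signals even though nobody was waiting when they arrived, which a simple boolean flag could not do.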
17. Producer-Consumer Synchronization
17.1. Problem setting
- The lecture then moves to the producer-consumer pattern.
- Producer:
- creates data.
- Consumer:
- consumes/processes data.
- A canonical real-world example is the Unix pipe.
17.2. Finite buffering
- Produced data must be stored in a finite set of buffers.
- Therefore synchronization is needed because:
- a producer needs an empty buffer before producing,
- a consumer needs a full buffer before consuming.
17.3. Basic specification
17.3.1. Producer side
- Wait until an empty buffer is available.
- Obtain one empty buffer.
- Fill it with data.
- Publish it as a full buffer.
17.3.2. Consumer side
- Wait until a full buffer is available.
- Obtain one full buffer.
- Process it.
- Return it to the pool of empty buffers.
17.4. Another crucial issue: shared state
- The queues/lists of buffers are themselves shared state.
- Therefore, in addition to condition synchronization, we also need mutual exclusion so that the shared buffer lists are not corrupted.
18. Semaphore Design for Producer-Consumer
18.1. Required semaphores
- The lecture derives that three semaphores are needed:
18.1.1. A counting semaphore for empty buffers
- Counts how many empty buffers are available.
18.1.2. A counting semaphore for full buffers
- Counts how many full buffers are available.
18.1.3. A binary semaphore (mutex) for the shared buffer pool data structure
- Protects the queue/list manipulations.
18.2. Suggested names
- The lecture recommends clear names such as:
- buffer_empty
- buffer_filled
- buffer_pool_mutex
- Especially, semaphores used for mutual exclusion should be named mutex to make code easier to understand.
18.3. Initialization
- Let N be the total number of buffers.
18.3.1. Mutex
Initial value:
1
18.3.2. Filled-buffer counter
- Initially, no data has been produced yet.
So initial value:
0
18.3.3. Empty-buffer counter
- Initially, all buffers are empty.
So initial value:
N
19. Producer Process: Synchronization Pattern
19.1. High-level unsynchronized producer
- The producer conceptually does:
- get an empty buffer from the pool,
- store produced data into it,
- place the filled buffer into the pool of full buffers.
19.2. Synchronized version: first step
Before taking an empty buffer, the producer must wait until one exists:
P(buffer_empty)
19.3. Accessing the shared pool safely
To remove an empty buffer from the shared pool, the producer must enter the critical section:
P(buffer_pool_mutex)
19.4. Remove one empty buffer
- The producer now obtains a buffer from the shared empty-buffer pool.
19.5. Release the mutex immediately after the shared-structure access
Once the producer has taken one private buffer out of the pool, it should release the mutex:
V(buffer_pool_mutex)
19.6. Why release the mutex here?
- Because the actual data production may take a long time.
- Holding the mutex during production would unnecessarily serialize unrelated work by other processes.
- The critical section should be kept as short as possible.
19.7. Produce/store data into the private buffer
- At this point the buffer is private to the producer.
- No other process can access it while it is outside the shared pool.
- Therefore this step does not require the mutex.
19.8. Re-enter the critical section to publish the full buffer
To insert the filled buffer into the shared full-buffer pool:
P(buffer_pool_mutex)
19.9. Insert the full buffer into the shared pool
- Now the consumer can later obtain it.
19.10. Release the mutex again
V(buffer_pool_mutex)
19.11. Signal that one more full buffer is now available
V(buffer_filled)
19.12. Final synchronized producer pattern
    P(buffer_empty)
    P(buffer_pool_mutex)
    get empty buffer b from pool
    V(buffer_pool_mutex)
    produce data into b
    P(buffer_pool_mutex)
    put b into full-buffer pool
    V(buffer_pool_mutex)
    V(buffer_filled)
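The pattern can be tried out in a small program. This is a sketch under several assumptions: Python's `threading.Semaphore` serves as the semaphore (`acquire` for P, `release` for V), the pools are plain deques, and N = 4 with 16-byte buffers is an arbitrary choice. Since the consumer is not covered here, the example produces exactly N items; one more call would block in P(buffer_empty).

```python
import threading
from collections import deque

N = 4                                        # total number of buffers (assumed)
buffer_empty = threading.Semaphore(N)        # all buffers start out empty
buffer_filled = threading.Semaphore(0)       # nothing has been produced yet
buffer_pool_mutex = threading.Semaphore(1)   # protects both pools

empty_pool = deque(bytearray(16) for _ in range(N))
full_pool = deque()

def produce_one(data):
    buffer_empty.acquire()           # P(buffer_empty): wait for an empty buffer
    buffer_pool_mutex.acquire()      # P(buffer_pool_mutex)
    b = empty_pool.popleft()         # get empty buffer b from pool
    buffer_pool_mutex.release()      # V(buffer_pool_mutex)
    b[:len(data)] = data             # produce data into b; b is private here
    buffer_pool_mutex.acquire()      # P(buffer_pool_mutex)
    full_pool.append(b)              # put b into full-buffer pool
    buffer_pool_mutex.release()      # V(buffer_pool_mutex)
    buffer_filled.release()          # V(buffer_filled): one more full buffer

for i in range(N):
    produce_one(bytes([i]))
print(len(empty_pool), len(full_pool))  # 0 4
```

Note that the mutex is held only around the two pool operations, while the counting semaphores are acquired in one place and released in another.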
19.13. Important conceptual distinction emphasized in the lecture
- For a mutex-style semaphore, P and V often appear as a natural acquire/release pair around one critical section.
- For a counting semaphore, the P and V are often in different parts of the program:
- one side waits for the event,
- the other side signals that the event happened.
- This distinction is fundamental and frequently causes mistakes.
20. Questions and Clarifications Raised in the Lecture
20.1. Is the producer’s private buffer still shared?
- No.
- Once the producer removes one buffer from the shared pool under mutex protection, that buffer is private to the producer until it is placed back into the shared structure.
20.2. Why not keep the mutex during the whole production step?
- Because the actual work may be slow.
- Keeping the mutex would block other processes from accessing the shared pool even though they do not need the producer’s private buffer.
- This would unnecessarily reduce concurrency.
20.3. How does buffer_empty get replenished?
- This is done by the consumer when it finishes consuming a full buffer and returns it to the empty-buffer pool.
- The lecture stops before completing the consumer code, but clearly indicates that this is the consumer’s responsibility.
20.4. Exam warning
- The lecturer explicitly notes that this producer-consumer semaphore pattern is a standard pattern and a frequent source of mistakes in exams.
- It should be studied until the role of each semaphore is completely clear.
21. Main Takeaways of the Lecture
21.1. Shared memory is powerful but dangerous
- Once state is shared, correctness depends on interleaving and hardware guarantees.
21.2. Race conditions arise from timing dependence
- If correctness depends on “who gets there first”, the system is likely broken.
21.3. Atomicity is central
- Correct synchronization relies on operations that cannot be interrupted midway.
- But raw atomic hardware operations are too low-level and too machine-dependent to use directly as the main abstraction.
21.4. Busy waiting is not an acceptable general solution
- It wastes CPU and interacts badly with scheduling.
21.5. OS abstractions matter
- The operating system should hide hardware complexity behind predictable synchronization abstractions.
21.6. Semaphores unify two needs
- binary semaphores solve mutual exclusion,
- counting semaphores solve condition synchronization.
21.7. Producer-consumer is the canonical next application
- It requires both:
- condition synchronization (empty/full buffers),
- mutual exclusion (protecting the shared buffer pool).
21.8. General design lesson
- First specify correctness:
- safety,
- liveness/progress,
- mutual exclusion.
- Only then design the synchronization mechanism.
22. End of Lecture Scope
- The lecture fully develops:
- shared-memory hazards,
- race conditions,
- atomicity,
- disabling interrupts,
- semaphore basics,
- milk-buying via semaphore,
- the producer side of the producer-consumer pattern.
- The consumer side is explicitly deferred to the next lecture.