Shared Memory Synchronization 1

1. From Communication to Shared-Memory Synchronization

1.1. Two fundamental ways to realize coordination between processes

The lecture begins by revisiting the two basic approaches to communication and coordination between processes:
- message passing
- shared memory

1.2. Message passing vs. shared memory

1.2.1. Message passing

Typically easier to use correctly.
Provides an OS-level abstraction for communication.
Does not require the receiver to trust arbitrary memory contents written by the sender, as shared memory does.
Often comes with useful API properties such as:
- buffering,
- possibly multiple senders and receivers,
- decoupling between participants.

1.2.2. Shared memory

Processes intentionally share part of their memory.
This effectively means “poking a hole” into process isolation.
It can be efficient, but correctness becomes the programmer’s responsibility.
In particular, synchronization must be handled explicitly to avoid overwriting each other’s data or observing inconsistent state.

1.3. Important clarification

The lecture distinguishes:
- true message passing as an OS primitive, versus
- manually building something that behaves like message passing on top of shared memory.
One can emulate a mailbox using shared memory, but then one no longer automatically gets the semantic guarantees and convenient abstraction of the OS primitive.

2. Shared memory

The lecture then introduces the alternative: shared memory.

Here, processes deliberately allow part of their memory to be shared, effectively “poking a hole” through the isolation boundary. One process can then write data that another process can read directly.

2.1. Why shared memory is useful

The main reason to use shared memory is performance. Since communication becomes ordinary loads and stores, it can support much higher communication rates and larger data transfers than explicit message passing.

The lecture gives an example such as a video-processing pipeline: moving large frames at high frequency between stages may be impractical with message passing, but feasible with shared memory.

2.2. The cost: synchronization complexity

The drawback is that shared memory reintroduces all the difficulties of concurrent access to shared state:

- ordering becomes crucial,
- correctness depends on the interleaving of operations,
- bugs can be timing-dependent and difficult to reproduce.

The lecture notes that such systems are often called “non-deterministic,” though more precisely the execution is deterministic given a fixed thread schedule; the problem is that the schedule is typically unknown or hard to reproduce. This leads to classic heisenbugs: errors that appear in one run and disappear in the next.

2.3. Interleavings and what outputs are possible

To make the issue concrete, the lecturer considers two processes writing:

- one writes ABC,
- the other writes CBA.

Because execution can interleave arbitrarily, many mixed outputs are possible. For example, something like CBC can occur under a suitable interleaving.

However, not every string is possible. Each individual process still executes its own instructions in order, so no output may violate the per-process ordering constraints. The lecture ends by using this example to motivate synchronization primitives, which will be continued in the next lecture.
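
The per-process ordering constraint can be made concrete with a small enumeration. The sketch below assumes, as a simplification not stated in the lecture, that both processes append one character at a time to a single shared output stream; the helper name `interleavings` is hypothetical.

```python
from itertools import combinations

def interleavings(first: str, second: str) -> set:
    """All merges of the two strings that keep each string's own order."""
    total = len(first) + len(second)
    results = set()
    # Choose which output positions are filled by the first process;
    # the remaining positions are filled by the second process, in order.
    for positions in combinations(range(total), len(first)):
        chosen = set(positions)
        a, b = iter(first), iter(second)
        results.add("".join(next(a) if i in chosen else next(b)
                            for i in range(total)))
    return results

outputs = interleavings("ABC", "CBA")
print(len(outputs), "distinct outputs, for example:", sorted(outputs)[:5])
```

Under this model every output starts with A or C, because those are the only first characters either process can write; a string such as BACCBA can never appear.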

3. Why Shared Memory is Difficult

3.1. Interleaving matters

Once state is shared, the order of accesses becomes significant.
The behavior of the system may then depend on:
- the interleaving chosen by the scheduler,
- the relative speed of the processes or threads,
- the exact hardware guarantees.

3.2. Some interleavings are harmless

Not all operations are sensitive to order.
Example:
```
A = 1   ||   B = 2
```
These operations affect different variables, so the final state is independent of order.

3.3. Some interleavings are not harmless

Example:
```
A = B + 1   ||   B = 2 * B
```
Here, the result depends on which operation happens first.
If the assignment to A happens first, A is computed from the old value of B, so A = B_old + 1.
If the doubling of B happens first, A is computed from the doubled value, so A = 2 * B_old + 1.
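
The two orders can be replayed side by side. This is a minimal sketch that assumes each statement executes as one indivisible step and that B starts at 3; the function name `run` and the statement labels are illustrative only.

```python
def run(order, b_initial=3):
    """Execute the two statements, each atomically, in the given order."""
    state = {"A": None, "B": b_initial}
    statements = {
        "assign_A": lambda s: s.update(A=s["B"] + 1),   # A = B + 1
        "double_B": lambda s: s.update(B=2 * s["B"]),   # B = 2 * B
    }
    for name in order:
        statements[name](state)
    return state

a_first = run(["assign_A", "double_B"])
b_first = run(["double_B", "assign_A"])
print(a_first, b_first)
```

With B starting at 3, both orders end with B = 6, but A ends as 4 in the first order and as 7 in the second.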

3.4. Concurrent writes to the same location

Example:
```
A = 1   ||   A = 2
```
Without further assumptions, the outcome is hardware-dependent.
One might see:
- 1,
- 2,
- or potentially even a corrupted/interleaved result, if the hardware does not guarantee atomicity for that write.

3.5. Moral

The outcome of concurrent shared-memory execution cannot be reasoned about purely at source-code level.
It depends crucially on what the hardware guarantees.

4. Atomic Operations

4.1. Basic idea

An atomic operation is an operation that cannot be interrupted “in the middle”.
It either happens entirely or not at all, from the perspective of concurrent observers.

4.2. Example: aligned atomic writes

Suppose the hardware guarantees that writes to aligned 16-bit words are atomic.
Then, if A is a 16-bit aligned variable:
```
A = 1   ||   A = 2
```
cannot produce a mixed bit pattern such as 3 (binary 11, combining bits of 01 and 10).
In that case, the final outcome must be either 1 or 2.

4.3. Alignment matters

The lecture emphasizes the distinction between:
- aligned accesses: address is a multiple of the word size,
- non-aligned accesses: address is not a multiple of the word size.
Typical hardware often guarantees atomicity only for certain native aligned word accesses.
Non-aligned accesses are often not atomic.

4.4. Larger values spanning multiple words

Even if 16-bit aligned writes are atomic, a 32-bit value spanning two 16-bit words is not automatically atomic.
Example:
```
A = 0x1   ||   A = 0x10000
```
If each 32-bit write is carried out as two separate 16-bit stores, the stores of the two writers can interleave, and one can observe a mixed result.
The lecture gives the example that such an execution can yield a hybrid value such as:
```
0x10001
```
The key point is:
- word-level atomicity does not imply atomicity for larger multi-word objects.
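
The torn write can be reproduced by enumerating schedules. The sketch below is a model, not real hardware: it assumes each 32-bit write is split into an atomic 16-bit store to the low half followed by one to the high half, and the name `final_values` is hypothetical.

```python
from itertools import combinations

def final_values(value_1: int, value_2: int) -> set:
    """Final 32-bit values when two writers each store low half, then high half."""
    def halves(writer, value):
        # Each entry is one atomic 16-bit store: (writer, which half, data).
        return [(writer, "low", value & 0xFFFF), (writer, "high", value >> 16)]

    w1, w2 = halves(1, value_1), halves(2, value_2)
    results = set()
    for positions in combinations(range(4), 2):   # schedule slots used by writer 1
        a, b = iter(w1), iter(w2)
        memory = {"low": 0, "high": 0}
        for slot in range(4):
            _, half, data = next(a) if slot in positions else next(b)
            memory[half] = data                   # later stores overwrite earlier ones
        results.add((memory["high"] << 16) | memory["low"])
    return results

print(sorted(hex(v) for v in final_values(0x1, 0x10000)))
```

Under these assumptions four final values are reachable: the two intended ones, 0x1 and 0x10000, and the hybrids 0x10001 and 0x0, each of which combines one half from each writer.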

4.5. Hardware dependence

Atomicity properties are not universal.
One must understand the guarantees of the actual hardware/platform.
Common hardware-level atomic primitives include:
- word-aligned load/store,
- fetch-and-increment,
- test-and-set,
- compare-and-exchange / compare-and-swap (CAS).

4.6. Practical takeaway

Never assume an access is atomic unless the hardware/platform specification says so.
Concurrency reasoning must be grounded in the exact machine model.

5. Race Conditions

5.1. Definition

A race condition occurs when processes or threads are “racing” to perform conflicting operations.
The outcome then depends on:
- interleaving,
- relative timing,
- process speed.

5.2. Why races are bad

Usually, race conditions are considered bugs.
They make behavior:
- nondeterministic,
- hard to reproduce,
- hard to debug.

5.3. Benign races

The lecture briefly notes that some races can be benign in practice.
However, this is an exception.
In general, races should be treated as incorrect unless there is a very strong reason not to.

5.4. Goal

The goal of synchronization is to eliminate unwanted timing dependence and make the system behavior deterministic with respect to the intended specification.

6. The “Milk Buying” Problem as a Synchronization Example

6.1. Scenario

The lecture introduces a deliberately simple but instructive example:
- two roommates / robots,
- one fridge,
- milk should be bought if there is none,
- but there should not be too much milk,
- ideally exactly one bottle is bought when needed.

6.2. What can go wrong

If both agents independently observe “no milk” and both decide to go shopping, then both may buy milk.
The outcome depends on timing.
This is precisely a race condition.

6.3. Why the example matters

The example is simple, but it captures the core of synchronization:
- conflicting operations on shared state,
- need for mutual exclusion,
- need for progress,
- need to separate specification from implementation.

7. First Specify Correctness Before Proposing a Fix

7.1. Important methodology

Before trying to “fix” concurrency bugs, one must first specify what correct behavior actually means.
The lecture explicitly warns against prematurely diving into low-level implementation details.

7.2. High-level specification of the milk problem

A correct solution should satisfy at least the following:

7.2.1. Safety

Do not bring home a bottle of milk if there is already milk.
In other words, do not buy too much milk.
At most one agent should be in the relevant critical section at a time.

7.2.2. Progress / liveness

If there is no milk and no one is already taking care of the task, then eventually someone should go buy milk.
“Eventually” means:
- after a finite number of steps,
- not “maybe never”.

7.2.3. Exclusion

If one agent is already performing the milk-buying operation, the other should be excluded from doing the same conflicting operation at the same time.

7.3. Critical section

The relevant instructions that manipulate the shared decision state form a critical section.
A key synchronization goal is:
- critical sections that conflict must not execute simultaneously.

8. Naïve and Broken Milk-Buying Solutions

8.1. Idea: leave a note

A first idea is to use a note on the fridge:
- if there is no milk, leave a note saying “I’m taking care of it”,
- then go buy milk.

8.2. Why this fails

The problem is that:
- checking the fridge,
- and leaving the note,
are not atomic together.
Both agents may:
1. observe that there is no milk,
2. observe that there is no note,
3. both leave a note,
4. both go buy milk.
This only makes the bug less likely, not impossible.
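
The failure can be checked exhaustively on a small model. The sketch below assumes each agent performs four indivisible steps (look for milk, look for a note, leave a note, buy milk and remove the note) and tries every interleaving of the two agents; the function names and the dictionary layout are illustrative choices, not taken from the lecture.

```python
from itertools import combinations

def bottles_bought(schedule):
    """Run the note protocol for two agents under one schedule of 8 steps."""
    fridge = {"milk": 0, "note": False}
    local = [{"pc": 0, "buy": False, "saw_milk": False} for _ in range(2)]
    for agent in schedule:
        me = local[agent]
        if me["pc"] == 0:                       # 1. look for milk
            me["saw_milk"] = fridge["milk"] > 0
        elif me["pc"] == 1:                     # 2. look for a note
            me["buy"] = not me["saw_milk"] and not fridge["note"]
        elif me["pc"] == 2 and me["buy"]:       # 3. leave a note
            fridge["note"] = True
        elif me["pc"] == 3 and me["buy"]:       # 4. buy milk, remove the note
            fridge["milk"] += 1
            fridge["note"] = False
        me["pc"] += 1
    return fridge["milk"]

def all_schedules():
    """Every interleaving of 4 steps of agent 0 with 4 steps of agent 1."""
    for slots in combinations(range(8), 4):
        yield [0 if i in slots else 1 for i in range(8)]

outcomes = {bottles_bought(s) for s in all_schedules()}
print("bottles bought across all schedules:", sorted(outcomes))
```

In this model some schedules end with one bottle and others with two, so the note narrows the window for the race but does not close it.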

8.3. Important engineering lesson

Making a race harder to trigger does not make the program correct.
The lecture explicitly points out this common anti-pattern:
- “it doesn’t fail anymore” does not imply correctness.

8.4. Another workaround: asymmetric protocols

The lecture then considers a more complicated protocol where the two processes execute different code.
This can be made to avoid double-buying in some cases.

8.5. Why asymmetric protocols are unattractive

They are awkward from a software engineering perspective.
Different threads/processes now need different code.
Extending the scheme to a third participant becomes unpleasant.
Such solutions may also introduce unfairness or bias.

9. Busy Waiting (Spinning)

9.1. Definition

Busy waiting (or spinning) means repeatedly checking a condition in a loop while doing no useful work.

9.2. Why it is bad

On a uniprocessor, busy waiting is especially wasteful:
- the waiting thread consumes CPU time,
- while the thread that could make progress may not get scheduled.
Thus, CPU time is wasted without useful progress.

9.3. Unbounded waiting is the core problem

The issue is not merely that there is a loop.
The issue is that the waiting time is not bounded in the program logic.
A pathological schedule can make the spinner waste arbitrary amounts of CPU time.

9.4. Scheduler interaction

The lecture gives an example:
- one process is spinning,
- the other process, which would resolve the condition, is not scheduled,
- perhaps because the spinner has higher priority.
Then the system may effectively stall.

9.5. Important exam-style lesson

Replacing the infinite loop with a fixed finite delay does NOT solve the correctness problem.
For any fixed number of wait steps, one can construct a schedule where that delay is insufficient.
Therefore, “pause a bit and hope” is not a correct synchronization strategy.

9.6. General rule

For the synchronization problems in this course, busy waiting is not considered a proper solution.

10. Locks as the Conceptual Interpretation

10.1. The note analogy as locking

The lecture interprets the note protocol as a crude locking scheme:
- leaving a note = locking the resource,
- removing the note = unlocking the resource.

10.2. Meaning of the lock

The protected resource is not literally “the fridge note”.
Rather, the lock protects:
- the consistency of the shared decision state,
- the correctness condition “do not buy more than one bottle”.

10.3. But crude locking is still not enough

Without proper primitives, the lock itself may be manipulated incorrectly.
Therefore, one needs a more robust abstraction.

11. Hardware Support for Atomic Synchronization

11.1. Desire for stronger atomic operations

Several proposed fixes effectively ask for something like:
- “check and set together”,
- “look and claim together”,
- “take a note and inspect state in one indivisible step”.

11.2. Real hardware primitives

Hardware often offers such low-level atomic instructions, for example:
- load/store variants,
- fetch-and-increment,
- test-and-set,
- compare-and-exchange / CAS.
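
The "check and set together" idea can be sketched in software. The model below is not a real hardware instruction: it emulates the indivisibility of test-and-set with a Python lock, and the names `AtomicFlag` and `test_and_set` are hypothetical.

```python
import threading

class AtomicFlag:
    """Software model of a test-and-set flag (hardware does this in one instruction)."""

    def __init__(self):
        self._flag = False
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self) -> bool:
        """Set the flag and return its previous value as one indivisible step."""
        with self._guard:
            old = self._flag
            self._flag = True
            return old

note = AtomicFlag()
winners = []

def agent(name):
    # Whoever sees the old value False has claimed the note; the other has not.
    if not note.test_and_set():
        winners.append(name)

threads = [threading.Thread(target=agent, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("claimed by:", winners)
```

Because looking at the flag and setting it cannot be separated, exactly one agent claims the note, whichever of the two runs first.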

11.3. But these are not portable abstractions

Different architectures provide different instructions and semantics.
x86, ARM, and other architectures differ substantially.
Correct use therefore requires detailed hardware knowledge.

12. Disabling Interrupts

12.1. Uniprocessor idea

On a uniprocessor, concurrency from the OS scheduler arises through interleaving.
Interleaving is caused by preemption, and preemption is triggered by interrupts.
Therefore, on a uniprocessor:
- execution between two interrupts is non-interleaved.

12.2. Consequence

If the kernel temporarily disables interrupts, then the currently running code will not be preempted.
This can make a short sequence of instructions effectively atomic on a uniprocessor.

12.3. Why this is dangerous

Interrupts must not be disabled for too long.
Otherwise the system becomes unresponsive to the environment.
Examples of negative consequences mentioned in the lecture:
- delayed reaction to device events,
- increased latency,
- poor interactivity,
- bad behavior in safety-critical systems,
- networking delays,
- generally reduced responsiveness.

12.4. Multiprocessor limitation

Disabling interrupts on one core does not stop another core from accessing shared memory.
Therefore, this is not a general solution on multiprocessors.

12.5. Proper lesson

Disabling interrupts can be used very carefully for extremely short kernel-critical sequences.
But it is not the general synchronization solution.
Proper synchronization abstractions are still needed.

13. Why Raw Atomic Instructions Are Not Enough

13.1. Too hardware-specific

Available atomic instructions vary from machine to machine.

13.2. Hard to use correctly

Even similarly named instructions may have different semantics across architectures.

13.3. Often leads to spinning

If synchronization is built directly on raw atomic primitives, one often ends up with busy waiting.
The scheduler is unaware of what the process is waiting for.

13.4. Therefore

The OS should provide a higher-level, portable abstraction with predictable semantics.

14. The OS-Level Abstraction: Semaphores

14.1. Historical note

Semaphores were introduced by Dijkstra in 1962.

14.2. Core abstraction

A semaphore is essentially an integer counter together with two atomic operations.

14.3. Operations

14.3.1. P operation

Wait until the counter is positive.
Then decrement it by one atomically.
Intuition:
- pass only if permission/resource is available.

14.3.2. V operation

Increment the counter by one atomically.
Intuition:
- signal that one unit of permission/resource/event has become available.

14.4. Alternative names

The lecture mentions common alternative terminology:
- P is also called:
  - wait,
  - down,
  - acquire.
- V is also called:
  - signal,
  - up,
  - release.

14.5. Why semaphores are better

They provide:
- portable semantics,
- a simple abstraction,
- efficient implementation inside the OS,
- no need for user-level busy waiting.

14.6. Efficient implementation idea

If a thread cannot proceed in P, the OS can:
- remove it from the ready queue,
- put it onto a wait queue,
- wake it later when the semaphore is signaled.
This avoids wasting CPU cycles by spinning.
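
A blocking semaphore can be sketched in user space to illustrate the idea. The class below is an illustration only: it uses Python's `threading.Condition` as a stand-in for the kernel's wait queue, and the name `SketchSemaphore` is hypothetical. A real OS would implement the queues inside the scheduler.

```python
import threading

class SketchSemaphore:
    """Integer counter plus P and V; waiters sleep instead of spinning."""

    def __init__(self, initial: int):
        self._value = initial
        self._cond = threading.Condition()

    def P(self):
        with self._cond:
            # wait() releases the internal lock and suspends the caller
            # until a V() notifies it; the loop then re-checks the counter.
            while self._value == 0:
                self._cond.wait()
            self._value -= 1

    def V(self):
        with self._cond:
            self._value += 1
            self._cond.notify()          # wake one waiting thread, if any

    def value(self) -> int:
        with self._cond:
            return self._value

s = SketchSemaphore(0)
log = []

def waiter():
    s.P()                                # blocks until some thread calls V()
    log.append("woken")

t = threading.Thread(target=waiter)
t.start()
s.V()
t.join()
print(log, s.value())
```

The waiting thread consumes no CPU time while it is suspended inside `wait()`, which is the property the wait queue is meant to provide.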

15. Solving the Milk-Buying Problem with a Semaphore

15.1. Binary semaphore for mutual exclusion

The milk-buying problem becomes simple with a semaphore.
Use a semaphore representing the right to enter the critical section.

15.2. Initial value

The correct initial value is:
```
1
```
Reason:
- the first arriving process sees a positive value and proceeds,
- the next one sees 0 and must wait.

15.3. Conceptual solution

The code structure is essentially:

P(s)
check whether milk is present
if not, buy milk
V(s)
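
The structure above can be tried out with Python's standard `threading.Semaphore`, whose `acquire` and `release` play the roles of P and V. This is a minimal sketch; the `fridge` dictionary and the `roommate` function are illustrative names.

```python
import threading

s = threading.Semaphore(1)          # initial value 1: one agent may enter
fridge = {"milk": 0}

def roommate():
    s.acquire()                     # P(s)
    try:
        if fridge["milk"] == 0:     # check whether milk is present
            fridge["milk"] += 1     # if not, buy milk
    finally:
        s.release()                 # V(s)

threads = [threading.Thread(target=roommate) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("bottles in fridge:", fridge["milk"])
```

Both roommates run the same code, and whichever enters second finds milk already present, so exactly one bottle is bought.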

15.4. Why this is good

The solution is:
- symmetric,
- short,
- easy to reason about,
- portable,
- efficient when implemented by the OS.

16. Binary vs. Counting Semaphores

16.1. Binary semaphore

Values effectively restricted to:
- 0
- 1
Main use:
- mutual exclusion
It behaves conceptually like an atomic boolean lock.

16.2. Counting semaphore

Value is not restricted to 0 or 1.
It can count how many times some event has occurred.
Main use:
- condition synchronization
- counting available resources or produced events.

16.3. Key distinction

Binary semaphore:
- protects critical sections.
Counting semaphore:
- records availability or occurrence of events.
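
The counting behavior can be observed directly. The sketch below uses Python's `threading.Semaphore` with three units; the non-blocking `acquire(blocking=False)` is used only so that the example never hangs, it is not the blocking P operation itself, and the name `free_slots` is illustrative.

```python
import threading

free_slots = threading.Semaphore(3)   # counting semaphore: 3 units available

# Non-blocking attempt: True if a unit was taken, False if none is left.
attempts = [free_slots.acquire(blocking=False) for _ in range(4)]

free_slots.release()                  # V: one unit becomes available again
retry = free_slots.acquire(blocking=False)
print(attempts, retry)
```

The first three attempts succeed and the fourth fails; after one V, the retry succeeds again.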

17. Producer-Consumer Synchronization

17.1. Problem setting

The lecture then moves to the producer-consumer pattern.
Producer:
- creates data.
Consumer:
- consumes/processes data.
A canonical real-world example is the Unix pipe.

17.2. Finite buffering

Produced data must be stored in a finite set of buffers.
Therefore synchronization is needed because:
- a producer needs an empty buffer before producing,
- a consumer needs a full buffer before consuming.

17.3. Basic specification

17.3.1. Producer side

Wait until an empty buffer is available.
Obtain one empty buffer.
Fill it with data.
Publish it as a full buffer.

17.3.2. Consumer side

Wait until a full buffer is available.
Obtain one full buffer.
Process it.
Return it to the pool of empty buffers.

17.4. Another crucial issue: shared state

The queues/lists of buffers are themselves shared state.
Therefore, in addition to condition synchronization, we also need mutual exclusion so that the shared buffer lists are not corrupted.

18. Semaphore Design for Producer-Consumer

18.1. Required semaphores

The lecture derives that three semaphores are needed:

18.1.1. A counting semaphore for empty buffers

Counts how many empty buffers are available.

18.1.2. A counting semaphore for full buffers

Counts how many full buffers are available.

18.1.3. A binary semaphore (mutex) for the shared buffer pool data structure

Protects the queue/list manipulations.

18.2. Suggested names

The lecture recommends clear names such as:
- buffer_empty
- buffer_filled
- buffer_pool_mutex
In particular, semaphores used for mutual exclusion should have "mutex" in their name, which makes the code easier to understand.

18.3. Initialization

Let N be the total number of buffers.

18.3.1. Mutex

Initial value:
```
1
```

18.3.2. Filled-buffer counter

Initially, no data has been produced yet.
So initial value:
```
0
```

18.3.3. Empty-buffer counter

Initially, all buffers are empty.
So initial value:
```
N
```

19. Producer Process: Synchronization Pattern

19.1. High-level unsynchronized producer

The producer conceptually does:
1. get an empty buffer from the pool,
2. store produced data into it,
3. place the filled buffer into the pool of full buffers.

19.2. Synchronized version: first step

Before taking an empty buffer, the producer must wait until one exists:
```
P(buffer_empty)
```

19.3. Accessing the shared pool safely

To remove an empty buffer from the shared pool, the producer must enter the critical section:
```
P(buffer_pool_mutex)
```

19.4. Remove one empty buffer

The producer now obtains a buffer from the shared empty-buffer pool.

19.5. Release the mutex immediately after the shared-structure access

Once the producer has taken one private buffer out of the pool, it should release the mutex:
```
V(buffer_pool_mutex)
```

19.6. Why release the mutex here?

Because the actual data production may take a long time.
Holding the mutex during production would unnecessarily serialize unrelated work by other processes.
The critical section should be kept as short as possible.

19.7. Produce/store data into the private buffer

At this point the buffer is private to the producer.
No other process can access it while it is outside the shared pool.
Therefore this step does not require the mutex.

19.8. Re-enter the critical section to publish the full buffer

To insert the filled buffer into the shared full-buffer pool:
```
P(buffer_pool_mutex)
```

19.9. Insert the full buffer into the shared pool

Now the consumer can later obtain it.

19.10. Release the mutex again

V(buffer_pool_mutex)

19.11. Signal that one more full buffer is now available

V(buffer_filled)

19.12. Final synchronized producer pattern

P(buffer_empty)
P(buffer_pool_mutex)
get empty buffer b from pool
V(buffer_pool_mutex)

produce data into b

P(buffer_pool_mutex)
put b into full-buffer pool
V(buffer_pool_mutex)
V(buffer_filled)
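
The pattern translates almost line by line into runnable code. The sketch below assumes Python's `threading.Semaphore` for all three semaphores, models each buffer as a list, and keeps the two pools in deques; `N = 3`, `empty_pool`, `full_pool`, and `produce_one` are illustrative choices, not names from the lecture.

```python
import threading
from collections import deque

N = 3                                          # total number of buffers
buffer_empty = threading.Semaphore(N)          # counts empty buffers
buffer_filled = threading.Semaphore(0)         # counts full buffers
buffer_pool_mutex = threading.Semaphore(1)     # protects both pools
empty_pool = deque([] for _ in range(N))       # each buffer is a list
full_pool = deque()

def produce_one(item):
    buffer_empty.acquire()                     # P(buffer_empty)
    buffer_pool_mutex.acquire()                # P(buffer_pool_mutex)
    b = empty_pool.popleft()                   # get empty buffer b from pool
    buffer_pool_mutex.release()                # V(buffer_pool_mutex)

    b.append(item)                             # produce data into b (private)

    buffer_pool_mutex.acquire()                # P(buffer_pool_mutex)
    full_pool.append(b)                        # put b into full-buffer pool
    buffer_pool_mutex.release()                # V(buffer_pool_mutex)
    buffer_filled.release()                    # V(buffer_filled)

for i in range(N):
    produce_one(i)
print("full buffers:", list(full_pool))
```

After N items all buffers are full. Without a consumer, a further call to `produce_one` would block in P(buffer_empty) until a buffer is returned to the empty pool.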

19.13. Important conceptual distinction emphasized in the lecture

For a mutex-style semaphore, P and V usually appear as an acquire/release pair around one critical section.
For a counting semaphore, P and V often appear in different parts of the program:
- one side waits for the event,
- the other side signals that the event happened.
This distinction is fundamental and frequently causes mistakes.

20. Questions and Clarifications Raised in the Lecture

20.1. Is the producer’s private buffer still shared?

No.
Once the producer removes one buffer from the shared pool under mutex protection, that buffer is private to the producer until it is placed back into the shared structure.

20.2. Why not keep the mutex during the whole production step?

Because the actual work may be slow.
Keeping the mutex would block other processes from accessing the shared pool even though they do not need the producer’s private buffer.
This would unnecessarily reduce concurrency.

20.3. How does `buffer_empty` get replenished?

This is done by the consumer when it finishes consuming a full buffer and returns it to the empty-buffer pool.
The lecture stops before completing the consumer code, but clearly indicates that this is the consumer’s responsibility.
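
Since the lecture defers the consumer, the following is only an assumed mirror image of the producer, derived from the consumer specification in 17.3.2 and not taken from the lecture. It reuses the semaphore names from above and models buffers as Python lists; `N = 2`, the pools, and the `consumed` list are illustrative.

```python
import threading
from collections import deque

N = 2                                          # total number of buffers
buffer_empty = threading.Semaphore(N)
buffer_filled = threading.Semaphore(0)
buffer_pool_mutex = threading.Semaphore(1)
empty_pool = deque([] for _ in range(N))
full_pool = deque()
consumed = []

def producer(items):
    for item in items:
        buffer_empty.acquire()                 # P(buffer_empty)
        with buffer_pool_mutex:                # P/V(buffer_pool_mutex)
            b = empty_pool.popleft()
        b.append(item)                         # fill the private buffer
        with buffer_pool_mutex:
            full_pool.append(b)
        buffer_filled.release()                # V(buffer_filled)

def consumer(count):
    for _ in range(count):
        buffer_filled.acquire()                # P(buffer_filled)
        with buffer_pool_mutex:
            b = full_pool.popleft()
        consumed.append(b.pop())               # process the private buffer
        with buffer_pool_mutex:
            empty_pool.append(b)               # return it to the empty pool
        buffer_empty.release()                 # V(buffer_empty): replenish

items = list(range(10))
threads = [threading.Thread(target=producer, args=(items,)),
           threading.Thread(target=consumer, args=(len(items),))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(consumed)
```

With one producer and one consumer the items arrive in production order even though only two buffers exist, because each V(buffer_empty) by the consumer lets the blocked producer continue.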

20.4. Exam warning

The lecturer explicitly notes that this producer-consumer semaphore pattern is a standard pattern and a frequent source of mistakes in exams.
It should be studied until the role of each semaphore is completely clear.

21. Main Takeaways of the Lecture

21.1. Shared memory is powerful but dangerous

Once state is shared, correctness depends on interleaving and hardware guarantees.

21.2. Race conditions arise from timing dependence

If correctness depends on “who gets there first”, the system is likely broken.

21.3. Atomicity is central

Correct synchronization relies on operations that cannot be interrupted midway.
But raw atomic hardware operations are too low-level and too machine-dependent to use directly as the main abstraction.

21.4. Busy waiting is not an acceptable general solution

It wastes CPU and interacts badly with scheduling.

21.5. OS abstractions matter

The operating system should hide hardware complexity behind predictable synchronization abstractions.

21.6. Semaphores unify two needs

- binary semaphores solve mutual exclusion,
- counting semaphores solve condition synchronization.

21.7. Producer-consumer is the canonical next application

It requires both:
- condition synchronization (empty/full buffers),
- mutual exclusion (protecting the shared buffer pool).

21.8. General design lesson

First specify correctness:
- safety,
- liveness/progress,
- mutual exclusion.
Only then design the synchronization mechanism.

22. End of Lecture Scope

The lecture fully develops:
- shared-memory hazards,
- race conditions,
- atomicity,
- disabling interrupts,
- semaphore basics,
- milk-buying via semaphore,
- the producer side of the producer-consumer pattern.
The consumer side is explicitly deferred to the next lecture.