Processes & Threads

1. Course introduction

The challenge of operating systems comes from three things:

Scale and ubiquity: operating systems affect almost every part of modern computing.
Complexity: many components interact with each other, so everything is connected.
Efficiency pressure: the operating system is a kind of “tax” on computation, so it must use as few resources as possible while still managing the machine well.

The goal of the course is therefore not to cover every detail, which would take a lifetime, but to expose students to the core ideas and encourage them to explore real systems further, especially through open-source operating systems.

2. Initial topic: processes and threads

The main topic of the lecture is the conceptual foundation of processes and threads.

Historically, a process was defined as a program in execution. In modern terms, the lecturer reframes this more precisely: a process is better understood as a sphere of isolation or a protection domain. It is the boundary that contains all the resources relevant to a computation and defines what that computation can affect and what can affect it.

A thread, by contrast, is the thread of execution: the minimal sequential execution context required to make a processor execute instructions meaningfully. Historically, the thread was not always separated cleanly from the process, but modern systems distinguish them clearly.

3. What is included in a “program in execution”?

To make the notion of a process concrete, the lecturer asks what must be present for a program to count as “in execution.” Students answer correctly that it is not just code, but also the state required for execution. The lecture identifies the main components:

Program text: the instructions generated by the compiler or written manually.
Static/global memory: global variables and constants. In Unix terminology, initialized globals live in the data segment, while zero-initialized ones live in what is historically called the BSS.
Heap memory: dynamically allocated memory obtained at runtime.
Function call stack: where return addresses, arguments, and local activation records are stored.
Registers: including the instruction pointer and other architectural registers.
I/O resources: such as open files, ports, sockets, or hardware devices accessible to the computation.

The lecturer also explains the difference between static memory and heap memory. Static memory is prepared at load time and simply exists as part of the address space, whereas heap memory is managed dynamically by a runtime allocator and therefore incurs runtime overhead and may suffer from fragmentation.

4. Historical evolution: from single-program execution to multiprocessing

A major portion of the lecture explains how the concept of a process emerged historically from the practical limitations of early computers. In the early days, machines were large, slow, expensive, and could execute only one program at a time (uniprogramming). Programs were loaded manually, often with significant effort.

The lecturer asks what is wrong with this model. The key issue is not merely that only one user can use the machine, but that the CPU is idle much of the time because programs wait on slow input/output operations such as tape or disk access. So even though the machine is extremely expensive, it may be utilized only a fraction of the time.

This leads to multiprogramming: if one program is blocked waiting for I/O, another program can use the CPU. The system saves the state of the current computation, loads another one, and later resumes the first. In early systems this switching was coarse-grained and expensive, since whole programs might have to be unloaded and reloaded.

As hardware improved and memory became larger, systems evolved toward multiprocessing in the historical sense used in the lecture: multiple program executions could coexist in memory, and the system switched among them more frequently to maximize throughput. This is the classic batch-processing model, where the goal is to keep the machine busy and finish as much work as possible by the next day.

5. Why batch processing was not enough

Although batch-oriented multiprogramming improved hardware utilization, it still had major problems. The lecturer points out two especially important ones.

First, response time could be poor and unpredictable. A badly ordered workload could make some jobs wait a long time. Second, and more importantly, interactive use was difficult or impossible. In a batch world, a programmer might submit a deck of punch cards, wait many hours, and then discover only that there was a syntax error early in the program. This made software development painfully slow.

A further issue emerged once multiple programs were resident in memory at the same time: lack of isolation became dangerous. If one program could overwrite another’s memory, then a programmer could not trust incorrect results—was the bug in their own code, or did some other user corrupt their state? This is where the modern idea of the process as an isolated execution environment becomes essential.

6. Time sharing

To solve the productivity problem, systems moved from multiprogramming to time sharing. Time sharing can be seen as a more aggressive, fine-grained form of multiprogramming: the CPU is switched among jobs in very small time slices, so that to human users the machine appears to respond instantly.

The lecturer explains that what counts as “instantaneous” has changed over time. Earlier systems could satisfy users with responses on the order of a second, whereas modern expectations are much stricter. Nonetheless, the underlying principle is the same: divide CPU time into slices small enough that many users or many programs can interact with the same machine as though it were dedicated to them.

Historically, this meant many human users connected via terminals to one expensive central computer. In modern systems, the same concept still applies, though often with one human user and many processes, or with background daemons and services acting as additional “users” of the system.

7. Why threads were introduced

The lecture then turns to why the thread abstraction became necessary. As process-based systems became popular, workloads became more demanding. Networking emerged, programs had to react to multiple inputs, and hardware started to provide multiple processors or cores. A single sequential process was not sufficient to express all this concurrency.

A natural solution was to create more processes, but this was expensive. Processes are heavyweight because they carry all the machinery of isolation and resource tracking. Creating and switching processes takes time and memory, especially in older systems with limited resources. Yet many concurrent activities within one application do not actually need isolation from each other. They are cooperating parts of the same logical computation.

Therefore, operating systems introduced lightweight processes, now called threads. A thread keeps only the minimum execution state necessary to run on a CPU, while sharing the surrounding process environment with other threads. In this way, threads provide a cheaper abstraction for concurrency and parallelism.

8. Modern interpretation of process vs. thread

The lecturer gives a clear modern distinction:

A process is a protection domain. It is the central abstraction for resource management and security.
A thread is the minimal sequential unit of execution. It should contain as little state as possible beyond what is required to continue computation.

The process provides the environment in which threads execute. Multiple threads can exist inside one process and share all the process resources. Because they are not isolated from each other, they can “trample all over each other” if programmed incorrectly, but they are isolated from threads in other processes because the process boundary still enforces protection.

The usual modern structure is:

each process has one or more threads,
each thread belongs to exactly one process,
a process terminates when its last thread terminates.

However, the lecturer also points out that this is not the only possible design. He discusses exceptions and alternative systems, such as kernel threads and more experimental ideas like migrating-thread IPC, where a thread might move with a message into another protection domain. He also mentions research systems such as Composite OS, which challenge the traditional assumptions about what processes and threads must be.

9. Why processes and threads are ubiquitous

The lecture argues that nearly every operating system ends up with abstractions very similar to processes and threads, even if they use different terminology such as tasks, apps, or event handlers. This happens because all operating systems must manage execution, share hardware, and often provide some form of isolation.

These abstractions support three core operating-system responsibilities:

Resource virtualization: giving the illusion that many programs can run even on limited hardware.
Isolation: separating different computations so they cannot interfere with each other improperly.
Abstraction: giving programmers and compiler writers a simple model — a sequential processor dedicated to their computation — while the operating system handles the complex reality underneath.

They are also essential for system design because they help decompose complex concurrent systems into manageable sequential components that can then be composed.

10. Systems without processes

The lecturer asks whether an operating system can have threads but no true processes. The answer is yes. Examples include MS-DOS and classic Mac OS, where there was no real protection domain or isolation. Such systems still had units of execution or applications, but not properly isolated processes in the modern sense.

He also mentions that many embedded systems still run with multiple execution contexts but weak or no memory isolation, which is one reason embedded security is often difficult.

11. Implementing processes and threads: control blocks

The operating system kernel must keep track of all processes and threads, so it uses data structures known as:

PCB: Process Control Block
TCB: Thread Control Block

A “control block” is nothing mystical; it is simply a data structure. The name is historical: it was coined before “data structure” became the common term in programming language design.

The PCB typically contains:

process identifier (PID),
references to all threads in the process,
information about accessible memory,
information about non-memory resources such as files, sockets, and devices,
security/account data such as the user account or capabilities,
accounting information such as CPU time, memory usage, quotas, and billing data.

The TCB typically contains:

a pointer to the owning PCB,
a thread identifier (TID),
saved processor state such as registers, instruction pointer, and stack pointer,
(a pointer to) the kernel stack used by this thread,
the current state of the thread,
scheduling metadata such as priority, weight, or deadline information,
CPU accounting data.
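
As an illustration that is not taken from the lecture, the two control blocks can be sketched as plain Python data classes. All field names here (owner_uid, page_table_root, kernel_stack_top, and so on) are hypothetical stand-ins for the items listed above; a real kernel would define these as C structs with many more fields.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List


class ThreadState(Enum):
    READY = auto()
    RUNNING = auto()
    WAITING = auto()
    EXITED = auto()


@dataclass(eq=False)  # identity comparison: PCB and TCB reference each other
class PCB:
    """Process control block: process-wide resources and protection info."""
    pid: int
    owner_uid: int                                             # security/account data
    threads: List["TCB"] = field(default_factory=list)         # all threads in the process
    page_table_root: int = 0                                   # stands in for "accessible memory"
    open_files: Dict[int, str] = field(default_factory=dict)   # descriptor -> path
    cpu_time_ns: int = 0                                       # accounting


@dataclass(eq=False)
class TCB:
    """Thread control block: per-thread execution state."""
    tid: int
    process: PCB                                               # pointer to the owning PCB
    registers: Dict[str, int] = field(default_factory=dict)    # saved processor state
    kernel_stack_top: int = 0                                  # where the kernel stack starts
    state: ThreadState = ThreadState.READY
    priority: int = 0                                          # scheduling metadata
    cpu_time_ns: int = 0                                       # per-thread CPU accounting


def create_thread(process: PCB, tid: int) -> TCB:
    """Create a TCB and link it to its owning PCB in both directions."""
    tcb = TCB(tid=tid, process=process)
    process.threads.append(tcb)
    return tcb
```

Note how the links run both ways: the PCB lists its threads, and each TCB points back to exactly one PCB, matching the "each thread belongs to exactly one process" rule.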

The lecturer remarks that saving processor state during context switches is indeed overhead, and that context-switch paths are among the most performance-critical parts of an operating system.

12. Kernel stacks and a classic optimization

Each thread executing in kernel mode needs a kernel stack. The lecture asks where such stacks should be allocated. A naïve design might reserve a fixed pool or allocate them separately, but both approaches have downsides.

The lecturer presents a classic systems trick: allocate one contiguous block of memory — often a 4 KiB page — and place the TCB at one end and the kernel stack at the other, with the stack growing downward. This gives several advantages:

only one allocation is needed,
no extra pointer is required to locate the stack,
layout is compact and efficient,
allocation size aligns nicely with page-based memory management.

The downside is that if the kernel stack overflows, it may overwrite the TCB. To detect this, systems often place a magic value or canary in the control block. If the canary is corrupted, the kernel knows a stack overflow likely occurred. Because kernel stack overflow can imply arbitrary memory corruption, the correct response is often to panic or halt the system rather than continue unsafely.
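
The layout and the canary check can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the TCB size, the canary value, and the position of the canary as the last TCB field are all made up for illustration, and a bytearray stands in for physical memory.

```python
import struct

PAGE_SIZE = 4096
TCB_SIZE = 64                    # hypothetical size of the control block
CANARY = 0xC0FFEE42              # arbitrary magic value chosen for this sketch
CANARY_OFFSET = TCB_SIZE - 4     # last TCB field: the first thing an overflow hits

memory = bytearray(4 * PAGE_SIZE)    # simulated physical memory


def init_thread_page(base: int) -> int:
    """Place the TCB at the low end of the page at `base`; return the initial SP."""
    assert base % PAGE_SIZE == 0
    struct.pack_into("<I", memory, base + CANARY_OFFSET, CANARY)
    return base + PAGE_SIZE          # empty stack: SP sits just past the top of the page


def push(sp: int, value: int) -> int:
    """Push one 32-bit word; the stack grows toward lower addresses."""
    sp -= 4
    struct.pack_into("<I", memory, sp, value)
    return sp


def tcb_base_of(sp: int) -> int:
    """Locate the TCB from the stack pointer alone by rounding down to the page.

    Valid once at least one word has been pushed, so that SP lies inside the page.
    """
    return sp & ~(PAGE_SIZE - 1)


def canary_intact(base: int) -> bool:
    """Return False if the stack has grown down into the control block."""
    (value,) = struct.unpack_from("<I", memory, base + CANARY_OFFSET)
    return value == CANARY
```

The rounding trick in tcb_base_of is what makes the extra pointer unnecessary: any code running on the kernel stack can find its own control block from the stack pointer alone.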

The lecturer contrasts this with modern 64-bit systems, which often place guard pages around stacks instead. That approach is easier in a large address space, while 32-bit systems are more constrained and historically relied more on compact layouts like TCB-plus-stack-in-one-page.

13. Dispatching vs. scheduling

The lecture distinguishes two related but different concepts:

Dispatching: the architecture-specific low-level mechanics required to switch execution from one thread to another.
Scheduling: the higher-level policy that decides which thread should run next.

Dispatching is mostly a matter of hardware details and once implemented is largely mechanical, while scheduling is a rich design space involving goals such as fairness, throughput, average response time, deadline satisfaction, or quality of service. This separation lets operating systems combine one low-level switching mechanism with many possible scheduling policies.
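
The separation can be illustrated with a small Python sketch in which one dispatch routine is combined with interchangeable policies. The thread representation (a dictionary with tid, priority, and deadline keys) and all function names are assumptions made for this example, and dispatch only records the switch instead of touching any hardware state.

```python
from typing import Callable, Dict, List, Tuple

Thread = Dict[str, int]                      # e.g. {"tid": 1, "priority": 5, "deadline": 30}
Policy = Callable[[List[Thread]], Thread]    # a scheduling policy picks one ready thread


def fifo(ready: List[Thread]) -> Thread:
    return ready[0]


def highest_priority(ready: List[Thread]) -> Thread:
    return max(ready, key=lambda t: t["priority"])


def earliest_deadline(ready: List[Thread]) -> Thread:
    return min(ready, key=lambda t: t["deadline"])


def dispatch(current: Thread, nxt: Thread, log: List[Tuple[int, int]]) -> Thread:
    """Mechanism: stand-in for the architecture-specific switch routine."""
    log.append((current["tid"], nxt["tid"]))
    return nxt


def reschedule(current: Thread, ready: List[Thread], policy: Policy,
               log: List[Tuple[int, int]]) -> Thread:
    """Policy decides who runs next; the same dispatch code performs the switch."""
    nxt = policy(ready)
    ready.remove(nxt)
    ready.append(current)        # the preempted thread becomes ready again
    return dispatch(current, nxt, log)
```

Swapping fifo for earliest_deadline changes which thread is chosen but leaves dispatch untouched, which is the point of keeping mechanism and policy apart.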

14. Basic thread states

Finally, the lecturer introduces the standard basic lifecycle states of a thread:

Ready: the thread is prepared to run and would make progress if given a CPU.
Running: the thread is currently executing on a CPU.
Waiting / Blocked: the thread cannot make progress because it is waiting for some event, such as disk I/O, a lock, or time to pass.
Exited: the thread has completed execution.

The lecture explains several transitions:

Ready → Running: the scheduler dispatches the thread.
Running → Ready: the thread is preempted, often due to a timer interrupt.
Running → Waiting: the thread blocks because it cannot continue until some external condition is satisfied.
Waiting → Ready: the awaited event occurs.
Running → Exited: the thread terminates.
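
The states and transitions above form a small state machine. A minimal sketch in Python, with names chosen for this example, encodes the textbook model as a table of allowed transitions and rejects everything else:

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    EXITED = "exited"


# Textbook transitions only: (from, to) -> reason.
ALLOWED = {
    (State.READY, State.RUNNING): "dispatched by the scheduler",
    (State.RUNNING, State.READY): "preempted, e.g. by a timer interrupt",
    (State.RUNNING, State.WAITING): "blocked on an event",
    (State.WAITING, State.READY): "awaited event occurred",
    (State.RUNNING, State.EXITED): "terminated",
}


def transition(current: State, new: State) -> State:
    """Return the new state, or raise if the textbook model forbids the move."""
    if (current, new) not in ALLOWED:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

This table deliberately contains only the idealized model; an implementation that allows shortcuts would need additional entries.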

A particularly interesting corner case is that in real systems these transitions do not happen in zero time. Therefore, a thread may transition from waiting back to ready so quickly that it effectively becomes runnable again before the scheduler finishes handling the original block request. This produces the less tidy “waiting to running” shortcut often seen in actual implementations.

The lecture also notes that in Linux-like systems, a stuck thread in kernel mode may become effectively unkillable if it can no longer run the code needed to terminate itself. This illustrates how real systems can deviate from the simple textbook state model.

15. Kernel bookkeeping structures

To manage threads, the kernel usually maintains:

a ready queue for runnable threads,
a reference to the currently executing thread on each core,
wait queues for threads blocked on particular events,
a list of all processes and/or threads, together with lookup structures mapping process IDs or thread IDs to their control blocks.

This bookkeeping is necessary both for resource management and for scheduling decisions.

16. Review of the core roles of an operating system

This review revisits the three core jobs of an operating system:

Isolation: the OS separates protection domains so that different processes cannot freely interfere with one another.
Resource virtualization: the OS multiplexes limited physical resources, such as CPUs, so that many processes can run as if each had its own machine.
Abstraction: the OS provides convenient higher-level abstractions over raw hardware details.

The lecture also reviews the “modern perspective”:

A process is a protection domain.
A thread is the basic unit of sequential execution.

Another distinction that is reinforced is:

Scheduling is the higher-level policy that decides what should run next, according to goals such as latency, throughput, or fairness.
Dispatching is the lower-level mechanism that actually performs the switch.

17. PCB vs. TCB and what the OS tracks

The lecture uses a quiz to sharpen the distinction between process state and thread state.

Some information is naturally process-related and may be tracked in the PCB, such as:

resource usage like network I/O,
open files,
whether a debugger is attached.

By contrast, certain information is thread-specific and therefore belongs conceptually to the TCB rather than the PCB, for example:

whether the next instruction should execute in single-step mode for debugging,
stack-related properties such as nested function calls.

The broader point is that process state and thread state must be kept conceptually separate: the PCB concerns process-wide resources and protection-domain information, while the TCB captures per-thread execution state.

18. What it really means for a thread to block

A key discussion addresses an important abstraction: what does it actually mean when a thread “blocks”?

The answer is that there is no magic. A thread is not some mystical entity that decides to sleep. In reality, the OS manipulates ordinary data structures:

flags,
lists,
wait queues,
thread states.

The lecturer illustrates this first with Linux source code for pipes. When a thread performs a blocking read on a pipe and no data is available, the kernel:

places the thread into an appropriate waiting state (such as an interruptible sleep state),
inserts the thread into the pipe’s wait queue,
eventually calls the scheduler,
later wakes the thread when data becomes available.

The same idea is then shown more cleanly in the teaching OS Pintos. There, blocking a thread is conceptually simple:

mark the current thread as blocked,
invoke the scheduler,
and later, when it should resume, put it back on the ready list and mark it ready again.

So the operational meaning of blocking is simply:

change thread state,
move it between lists,
let the scheduler choose something else.
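
The three steps can be written down almost literally. The following Python sketch is loosely inspired by the Pintos-style description above but is not Pintos code; the class and method names are assumptions, and the scheduler is a trivial FIFO.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class Thread:
    def __init__(self, name: str):
        self.name = name
        self.state = "ready"


class MiniKernel:
    def __init__(self, threads: Iterable[Thread]):
        self.ready_list: Deque[Thread] = deque(threads)
        self.current: Optional[Thread] = None
        self.schedule()

    def schedule(self) -> None:
        """Pick the next ready thread (FIFO here) and mark it running."""
        self.current = self.ready_list.popleft() if self.ready_list else None
        if self.current is not None:
            self.current.state = "running"

    def block_current(self, wait_queue: Deque[Thread]) -> None:
        """Blocking = change the state, move to a wait queue, run the scheduler."""
        thread = self.current
        thread.state = "blocked"
        wait_queue.append(thread)
        self.schedule()

    def unblock(self, wait_queue: Deque[Thread]) -> None:
        """Waking = move one waiter back to the ready list and mark it ready."""
        if not wait_queue:
            return
        thread = wait_queue.popleft()
        thread.state = "ready"
        self.ready_list.append(thread)
        if self.current is None:      # the CPU was idle, so run it right away
            self.schedule()
```

Nothing here "sleeps" in any deeper sense: a blocked thread is simply one that sits on a wait queue instead of the ready list, so the scheduler never picks it.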

19. Process creation

The lecture next explains how processes are created.

19.1. The first process

The first process in the system is special and is created during boot. The kernel or boot code must:

allocate memory,
load the program image (e.g. from an ELF executable),
set up the initial execution state such as the instruction pointer,
then transfer control to it.

In Unix-like systems, this first user-space process is traditionally init (or its equivalent). Its job is to bring up the rest of user space, launching further processes such as login managers, desktop sessions, and service daemons.

19.2. Two styles of process creation

The lecture contrasts two major styles of process creation.

19.2.1. Constructor-style creation

One style is what the lecturer informally calls a constructor style:

specify the binary to run,
call something like create_process,
obtain a new running process.

This is the model associated with interfaces such as Windows CreateProcess and CreateThread. Conceptually, it is straightforward.

19.2.2. Fork/exec in Unix

Unix instead uses forking:

fork duplicates the current process,
producing a parent and child,
and then the child often calls exec to replace its address space with a new program.

This means that a constructor-style create_process can be implemented conceptually as:

fork + exec
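
A minimal sketch of this combination, using the fork, execvp, and waitpid wrappers from Python's os module on a Unix-like system. The function names create_process and wait_for_exit are chosen for this example, and the exit code 127 for a failed exec is a common shell convention adopted here as an assumption.

```python
import os
import sys
from typing import List


def create_process(argv: List[str]) -> int:
    """Start argv[0] as a new process via fork + exec; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the parent with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)        # reached only if exec failed
    return pid                   # parent: fork returned the child's PID


def wait_for_exit(pid: int) -> int:
    """Wait for the child and return its exit code, or -1 if it did not exit normally."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    child = create_process([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with", wait_for_exit(child))
```

Between fork and exec the child is still running the parent's code, which is where the extra flexibility comes from: it can, for example, redirect file descriptors before the new program starts.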

The advantage of fork is flexibility. Since the child initially receives a copy of the parent’s state, it can do more than merely launch a new program. For example, forking can be used for:

preserving a snapshot of state,
checkpointing,
special service patterns such as those used in server software.

19.3. Copy-on-write

Conceptually, a fork creates a full copy of the parent’s address space. But literally copying all memory would be too expensive. Real systems therefore use copy-on-write:

parent and child initially share the same physical pages,
both views are write-protected,
actual copying happens only when one side writes.

Thus, fork provides the illusion of a full copy without eagerly duplicating gigabytes of memory.
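
The bookkeeping behind copy-on-write can be simulated in Python. This is a sketch under simplifying assumptions: a reference count on each frame stands in for the hardware write protection and the resulting page fault, and the class names Frame and AddressSpace are invented for the example.

```python
from typing import List


class Frame:
    """A simulated physical page frame with a reference count."""
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self, frames: List[Frame]):
        self.frames = frames

    def fork(self) -> "AddressSpace":
        """Share every frame with the child instead of copying it."""
        for frame in self.frames:
            frame.refs += 1
        return AddressSpace(list(self.frames))

    def read(self, page: int, offset: int) -> int:
        return self.frames[page].data[offset]

    def write(self, page: int, offset: int, value: int) -> None:
        frame = self.frames[page]
        if frame.refs > 1:
            # "Write fault" on a shared frame: make a private copy first.
            frame.refs -= 1
            frame = Frame(bytes(frame.data))
            self.frames[page] = frame
        frame.data[offset] = value
```

Only the page that is actually written gets duplicated; untouched pages stay shared for as long as both address spaces exist.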

19.4. Threads and clone

For threads, the lecture mentions that Linux uses clone underneath interfaces such as pthread_create. In fact, Linux conceptually relies on clone as the generalized primitive for creating both processes and threads, depending on flags.

The main idea is that thread creation resembles process creation but shares more state. A new stack is required, but other execution context and resources may be shared depending on the requested semantics.
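
Python does not expose clone directly, so the following sketch uses the standard threading module instead, which creates native threads. It only illustrates the sharing semantics: module-level data is visible to every thread in the process, while local variables live on each thread's own stack. The names shared_log, worker, and run_workers are made up for this example.

```python
import threading
from typing import List, Tuple

shared_log: List[Tuple[str, int]] = []   # process-wide state, visible to every thread
log_lock = threading.Lock()


def worker(name: str, count: int) -> None:
    local_total = 0                      # lives on this thread's own stack
    for i in range(count):
        local_total += i
    with log_lock:                       # shared data needs synchronization
        shared_log.append((name, local_total))


def run_workers() -> List[Tuple[str, int]]:
    """Run three threads in one process and return what they wrote to shared state."""
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(f"t{n}", 5)) for n in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)
```

No copying or message passing is needed for the results to reach the caller, because all three threads write into the same address space.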

19.5. Forking a multithreaded process

An especially important warning concerns fork in a multithreaded process.

If one thread calls fork, the child process receives only the calling thread, not all the other threads. This is effectively the only practical design choice, since duplicating all threads would often be wasteful, especially when the usual pattern is fork followed immediately by exec.

However, this creates danger:

the other threads may have been in the middle of modifying shared state,
they may have held locks,
the child may inherit an inconsistent snapshot of memory and runtime-library state.

Therefore, the practical rule is:

do not fork a multithreaded process unless the child intends to call exec very soon afterward.

The lecture stresses that after a multithreaded fork, only a very restricted set of operations is safe before exec. Even ordinary library calls such as printf may be unsafe because of internal shared state.

20. How context switching works

The lecture then turns to one of the central mechanisms of an OS: context switching.

20.1. Function call as the conceptual model

The lecturer first reviews how an ordinary function call works at the assembly level. In the simplified architecture used in class, there are:

registers,
a stack pointer,
an instruction pointer,
and a page-table pointer.

A function call proceeds by:

saving caller-saved registers onto the stack,
pushing the return address,
jumping to the callee,
later returning by popping the saved state and resuming execution.

This review is important because a context switch is presented as almost the same thing.

20.2. The key trick in a context switch

A context switch is essentially a function call plus one trick:

save the current stack pointer,
load a different stack pointer,
then continue execution by restoring state from a different stack.

That is the heart of the switch. The “context” being switched is primarily:

the stack,
and, if the next thread belongs to a different process, the memory-isolation context (represented here by the page-table pointer).

20.3. High-level scheduling vs. low-level switch

The lecture separates:

the scheduler, which decides which thread should run next,
from the context switch routine, which performs the actual machine-state transition.

The scheduler chooses a next thread according to policy. Then the low-level switch routine performs the machine-level transition between the current TCB and the next TCB.

20.4. The three essential steps

The lecture describes context switching in three core steps:

Save the current context: the current thread’s stack pointer is stored in its TCB. The saved value was stale while the thread was running; after this step it is valid again.
Load the next thread’s stack pointer: the CPU stack pointer is set to the saved value from the next thread’s TCB, which moves execution onto the other thread’s stack.
If necessary, switch process context: if the next thread belongs to a different process, the OS updates the page-table pointer or equivalent memory-isolation state using process-level information from the PCB.

After that, registers and control state are restored and execution resumes where the new thread was previously interrupted.
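
The whole sequence can be simulated on a toy machine in Python. Everything here is an assumption made for illustration: the register set (r0, r1, ip), the flat 256-word memory, and the helper prepare_thread, which fakes the stack frame that a previously switched-out thread would have left behind. On real hardware the instruction pointer is restored by a return instruction, not by an explicit pop into a register.

```python
from dataclasses import dataclass

REGS = ("r0", "r1", "ip")


@dataclass
class PCB:
    pid: int
    page_table: int


@dataclass
class TCB:
    tid: int
    process: PCB
    saved_sp: int = 0            # meaningful only while the thread is not running


class CPU:
    def __init__(self) -> None:
        self.regs = {name: 0 for name in REGS}
        self.sp = 0
        self.page_table = 0
        self.memory = [0] * 256  # flat memory holding every thread's stack

    def push(self, value: int) -> None:
        self.sp -= 1             # stacks grow toward lower addresses
        self.memory[self.sp] = value

    def pop(self) -> int:
        value = self.memory[self.sp]
        self.sp += 1
        return value


def prepare_thread(cpu: CPU, tcb: TCB, stack_top: int, entry_ip: int) -> None:
    """Fake the saved frame of a thread that has never run (r0, r1, ip)."""
    sp = stack_top
    for value in (0, 0, entry_ip):
        sp -= 1
        cpu.memory[sp] = value
    tcb.saved_sp = sp


def context_switch(cpu: CPU, cur: TCB, nxt: TCB) -> None:
    for name in REGS:                        # like a call: spill registers to the stack
        cpu.push(cpu.regs[name])
    cur.saved_sp = cpu.sp                    # step 1: save the current stack pointer
    cpu.sp = nxt.saved_sp                    # step 2: load the next thread's stack pointer
    if nxt.process is not cur.process:       # step 3: switch the isolation context
        cpu.page_table = nxt.process.page_table
    for name in reversed(REGS):              # like a return: restore from the new stack
        cpu.regs[name] = cpu.pop()
```

The push loop and the pop loop run on different stacks, which is the only thing that distinguishes this routine from an ordinary function call and return.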

20.5. Why the stack matters so much

The lecture emphasizes that the stack is the core execution context. Each thread needs its own stack, because the stack contains:

saved registers,
return addresses,
local call frames,
and the current continuation of execution.

Switching stacks is therefore what makes it possible to suspend one thread and resume another.

20.6. Relation to kernel mode

The lecture distinguishes two ideas that are sometimes conflated:

entering kernel mode via interrupt, exception, or system call,
switching between threads/processes.

A process switch generally requires kernel support because privileged state, such as page-table configuration, must be changed. But conceptually, the act of entering the kernel is distinct from the thread-to-thread context switch itself.

20.7. Why preemption is possible

A natural question is how the scheduler regains control if a user thread never voluntarily yields. The answer is the timer interrupt:

the kernel programs a hardware timer,
after a chosen time slice expires, the CPU raises an interrupt,
control returns to the kernel,
and the scheduler can decide whether to continue the current thread or switch to another.

So preemption depends on hardware interrupts that periodically force execution back into the kernel.
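
The effect of the timer can be shown with a tick-by-tick simulation in Python. This sketch assumes a simple round-robin policy and models each thread as a number of ticks of work; the function name run_with_timer and its parameters are invented for the example.

```python
from collections import deque
from typing import Dict, List


def run_with_timer(work: Dict[str, int], time_slice: int) -> List[str]:
    """Simulate preemptive round-robin scheduling.

    `work` maps a thread name to the number of ticks it needs.
    Returns one entry per tick naming the thread that ran in that tick.
    """
    ready = deque(work)                  # ready queue in insertion order
    remaining = dict(work)
    trace: List[str] = []
    while ready:
        current = ready.popleft()
        timer = time_slice               # the kernel programs the timer
        while timer > 0 and remaining[current] > 0:
            trace.append(current)        # the thread runs for one tick
            remaining[current] -= 1
            timer -= 1
        if remaining[current] > 0:
            ready.append(current)        # timer fired: preempted, back to ready
    return trace
```

For example, run_with_timer({"A": 3, "B": 1, "C": 2}, 2) yields the trace A, A, B, C, C, A: thread A is forced off the CPU after two ticks even though it never yields, and it finishes its last tick only after B and C have had their turns.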

20.8. Main conclusion about context switching

The lecture’s main conceptual takeaway is:

context switching is not mysterious,
it is basically like a function call,
except that the OS saves one stack and continues from another,
and, if needed, changes the process isolation context too.