Transport Layers 1
1. Transport Layer: UDP and TCP
These two lectures move from the application layer to the transport layer.
The transport layer provides logical communication between application processes running on different hosts.
The main transport-layer protocols discussed are:
- UDP: User Datagram Protocol
- TCP: Transmission Control Protocol
The lectures first introduce general transport-layer services, ports, segments, multiplexing/demultiplexing, and UDP. Then they move into TCP: sequence numbers, acknowledgements, reliable data transfer, flow control, and connection management.
2. Lecture 7: Transport Layer, UDP, and TCP Part 1
2.1. From application layer to transport layer
We have already seen many application-layer protocols and socket programming. Now the course moves one layer down.
The transport layer sits between:
- the application layer above it;
- the network layer below it.
Its job is not to route packets through the Internet. That is the network layer’s job. Instead, the transport layer provides communication between processes.
A useful mental model is:
Application process
|
socket
|
Transport layer
|
Network layer
The application writes data into a socket. The transport layer then prepares this data for transmission over the network.
2.2. Transport-layer actions
At the sender:
- the application passes an application-layer message to the transport layer;
- the transport layer determines suitable header fields;
- it creates transport-layer segments;
- it passes these segments down to IP.
At the receiver:
- the transport layer receives segments from IP;
- it checks header fields;
- it extracts or reassembles the application-layer data;
- it demultiplexes the data to the correct socket/application.
In other words, the application may think it is writing a large object, such as a webpage or a file, into a socket. The transport layer can split this into smaller pieces and later reassemble them.
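The split-and-reassemble idea can be sketched in a few lines of Python. This is only an illustration: the function names `segment` and `reassemble` and the use of byte offsets as a stand-in for sequence numbers are assumptions, not part of the lecture.

```python
def segment(message: bytes, mss: int) -> list[tuple[int, bytes]]:
    """Split a message into (offset, chunk) pairs of at most mss bytes."""
    return [(off, message[off:off + mss]) for off in range(0, len(message), mss)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message by ordering the chunks by their offset."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"x" * 3000 + b"end"
pieces = segment(message, mss=1460)

# Hand the pieces over in reverse order to mimic reordering in the network.
rebuilt = reassemble(list(reversed(pieces)))
```

The offsets play the role that sequence numbers play in TCP: they let the receiver restore the original order regardless of arrival order.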
2.3. TCP vs UDP: high-level comparison
2.3.1. TCP
TCP stands for Transmission Control Protocol.
It provides:
- reliable delivery;
- in-order delivery;
- connection setup;
- flow control;
- congestion control;
- full-duplex communication;
- a byte-stream abstraction.
TCP hides many network-level problems from the application.
For example, IP may deliver packets out of order, lose packets, or deliver packets along different paths. TCP tries to make the application see a clean, ordered byte stream.
2.3.2. UDP
UDP stands for User Datagram Protocol.
It provides:
- connectionless communication;
- unordered delivery;
- unreliable delivery;
- no retransmission mechanism;
- no flow control;
- no congestion control;
- a small header;
- low overhead.
UDP is close to “best effort IP plus ports”.
If data is lost, UDP itself does not repair it. If data arrives out of order, UDP does not reorder it. If the receiver is overwhelmed, UDP does not slow down the sender.
2.3.3. Services not provided by either TCP or UDP
Neither TCP nor UDP gives strict guarantees about:
- delay;
- bandwidth.
TCP can adapt to congestion and retransmit lost data, but it still does not promise that a segment will arrive within a fixed time. UDP is even simpler and also gives no such guarantees.
2.4. Congestion control vs flow control
The lecture emphasized that TCP has both congestion control and flow control, but they solve different problems.
2.4.1. Congestion control
Congestion control protects the network.
If too many senders transmit too much data at the same time, routers may need to buffer packets. Buffers are finite. Once they fill up, packets are dropped. If many connections keep sending aggressively, the network can become congested and performance can collapse.
TCP congestion control tries to make senders reduce their sending rate when the network appears overloaded.
The important idea is:
Congestion control limits how much data the sender injects into the network, based on what the network can handle.
This is a network-side limitation.
2.4.2. Flow control
Flow control protects the receiver.
The receiver has finite buffer space. If the sender sends faster than the receiver can process or read data, the receiver’s buffer may overflow.
The important idea is:
Flow control limits how much data the receiver is willing or able to accept.
This is a receiver-side limitation.
2.4.3. Effective sending limit
The sender should respect both limits:
\[ \text{allowed in-flight data} = \min(\text{congestion window}, \text{receiver window}) \]
The congestion window is related to the network. The receiver window is related to the receiver’s buffer.
2.5. Multiplexing and demultiplexing
2.5.1. Motivation
On one host, many applications may use the network at the same time.
For example:
- a browser;
- an email client;
- a DNS resolver;
- a video call;
- an SSH session.
All of them use the same network stack. The transport layer therefore needs a way to combine data from multiple applications and later separate incoming data back to the right application.
2.5.2. Multiplexing
Multiplexing happens at the source host.
It means:
Gather data from multiple application sockets, add transport headers, and send the resulting segments through the network.
So the source host takes multiple application data streams and sends them through the transport layer.
2.5.3. Demultiplexing
Demultiplexing happens at the destination host.
It means:
Use header fields to deliver each received segment to the correct socket or application process.
The destination host receives segments from the network and decides which application should get each segment.
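A minimal sketch of demultiplexing, assuming a UDP-style lookup keyed only by destination port. The table `socket_table` and the function `demultiplex` are hypothetical names; a TCP implementation would typically look up the full (source IP, source port, destination IP, destination port) tuple instead.

```python
from collections import defaultdict

# Hypothetical socket table: destination port -> queue of received payloads.
socket_table: dict[int, list[bytes]] = defaultdict(list)


def demultiplex(dst_port: int, payload: bytes) -> None:
    """Deliver a received payload to the socket bound to dst_port."""
    socket_table[dst_port].append(payload)


# Three segments arrive interleaved from the network layer.
demultiplex(53, b"dns-response")
demultiplex(443, b"tls-record-1")
demultiplex(443, b"tls-record-2")
```

After these calls, each port's queue holds only the payloads addressed to it, in arrival order.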
2.5.4. Multiplexing in other layers
The term multiplexing is not unique to the transport layer.
In general, multiplexing means combining multiple logical streams into one lower-level channel. Demultiplexing means separating them again.
For example, HTTP/2 can multiplex multiple application-level streams over one TCP connection.
2.6. Ports
Ports are used to identify application endpoints.
Both TCP and UDP headers contain:
- source port;
- destination port.
Some common well-known ports:
| Port | Protocol / Use |
|---|---|
| 80 | HTTP |
| 443 | HTTPS |
| 25 | SMTP |
| 53 | DNS |
Important details:
- The destination port often identifies the server-side application.
- The source port is often chosen dynamically by the client.
- Ports 0–1023 are traditionally well-known or privileged ports.
- Client-side ephemeral ports are usually chosen from a high-numbered range.
- The same port number can technically have different meanings for TCP and UDP, though common services often use the same number where applicable.
- DNS commonly uses UDP port 53, but DNS can also use TCP port 53.
2.7. Segments and packets
A segment is the protocol data unit of the transport layer.
A packet is the protocol data unit of the network layer.
So:
| Layer | PDU name |
|---|---|
| Transport layer | Segment |
| Network layer | Packet |
The lecture joked that “packages” are for DHL, not the Internet. In networking, the usual terms are segments and packets.
2.8. UDP: User Datagram Protocol
2.8.1. Basic properties
UDP is a minimal transport-layer protocol.
It is:
- connectionless;
- unreliable;
- unordered;
- best-effort;
- simple;
- fast;
- low-overhead.
UDP was originally specified in RFC 768.
There is no handshake before sending UDP data. Each UDP datagram is handled independently.
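The absence of a handshake is visible in the socket API. The following sketch sends one datagram over the loopback interface with Python's standard `socket` module; the payload and the loopback setup are illustrative assumptions.

```python
import socket

# Receiver: bind to port 0 so the OS picks a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)          # do not block forever if the datagram is lost
receiver_addr = receiver.getsockname()

# Sender: no connect(), no handshake, just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"query", receiver_addr)

data, peer = receiver.recvfrom(2048)

sender.close()
receiver.close()
```

The sender's source port, visible in `peer`, is chosen by the operating system at send time.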
2.8.2. Why use UDP?
UDP is useful because it avoids TCP’s connection setup and reliability machinery.
Advantages:
- no connection establishment delay;
- no connection state;
- small header;
- simple implementation;
- sender can transmit at its chosen rate;
- useful for simple request-response protocols;
- useful for loss-tolerant, delay-sensitive applications.
Examples:
- DNS;
- streaming multimedia;
- voice-over-IP;
- SNMP;
- HTTP/3 over QUIC.
For DNS, UDP makes sense because a typical DNS lookup is one query and one response. A full TCP handshake before every small query would add unnecessary round-trip delay.
2.8.3. UDP and congestion
UDP itself has no congestion control.
This can be useful for some applications, but it is also dangerous. A UDP application can send too aggressively and contribute to congestion.
If reliability or congestion control is needed on top of UDP, the application or a higher-level protocol must implement it.
This is relevant for QUIC:
- QUIC runs on top of UDP;
- QUIC implements reliability, congestion control, and flow control itself;
- therefore QUIC uses UDP as a substrate, but adds many TCP-like features above it.
2.8.4. UDP buffers
UDP has a receive buffer at the receiver side.
However, UDP does not keep a sender-side retransmission buffer. Once UDP passes a datagram down to IP, UDP forgets about it.
TCP is different: TCP keeps sent data in a send buffer so that it can retransmit if necessary.
2.8.5. UDP segment structure
The UDP header is small:
| Field | Size |
|---|---|
| Source port | 16 bits |
| Destination port | 16 bits |
| Length | 16 bits |
| Checksum | 16 bits |
So the UDP header is 8 bytes.
The length field includes:
- UDP header;
- UDP payload.
The checksum is used for error detection, not security.
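As a sketch of the layout above, the four 16-bit fields can be packed in network byte order with Python's `struct` module. The helper name `build_udp_header` and the example port numbers are assumptions; the checksum is left at zero here.

```python
import struct


def build_udp_header(src_port: int, dst_port: int, payload: bytes,
                     checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)     # the length field covers header plus payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


header = build_udp_header(50000, 53, b"hello")
fields = struct.unpack("!HHHH", header)
```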
2.8.6. UDP checksum
The checksum helps detect accidental corruption, such as bit flips.
It does not provide cryptographic integrity. An attacker who can modify both the data and the checksum can still forge a valid checksum.
The checksum is computed using one’s-complement arithmetic over 16-bit words. A pseudo-header is also involved, including information such as IP addresses, protocol identifier, and length.
If the datagram length is odd, a zero byte is appended for the checksum computation only; this padding is not transmitted.
A checksum value of zero can indicate that no checksum is used (this is permitted for UDP over IPv4), but the lecture emphasized that the checksum should normally be enabled.
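The one's-complement computation can be sketched as follows. The function names are illustrative, and the pseudo-header layout shown is the IPv4 one (source address, destination address, a zero byte, protocol number 17, UDP length); treat it as a study aid rather than a complete implementation.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum over 16-bit words, then bitwise complement."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the UDP header and payload."""
    pseudo_header = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                     + struct.pack("!BBH", 0, 17, len(segment)))
    return internet_checksum(pseudo_header + segment)


# Build a segment with a zero checksum field, then fill the checksum in.
segment = struct.pack("!HHHH", 50000, 53, 8 + 5, 0) + b"hello"
value = udp_checksum("10.0.0.1", "10.0.0.2", segment)
segment_with_checksum = segment[:6] + struct.pack("!H", value) + segment[8:]
```

A receiver can verify the datagram by running the same computation over the received segment including its checksum field: an undamaged segment yields 0.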
2.8.7. UDP summary
UDP is best understood as:
Send and hope for the best.
UDP segments may be:
- lost;
- duplicated;
- delivered out of order;
- delivered correctly.
UDP does not fix these problems. If the application needs more guarantees, it must build them itself.
2.9. TFTP and the Sorcerer’s Apprentice problem
The lecture used TFTP, the Trivial File Transfer Protocol, as a transition from UDP to TCP.
TFTP is a simple file-transfer protocol built on UDP. It tried to add reliability by using acknowledgements.
A simplified TFTP idea:
- Sender sends block \(X\).
- Receiver acknowledges block \(X\).
- Sender sends block \(X+1\).
- Receiver acknowledges block \(X+1\).
This looks simple, but an early design problem caused duplication.
2.9.1. The problem scenario
Suppose the sender sends block \(X+1\).
If the ACK is delayed, the sender may time out and retransmit \(X+1\).
Now the receiver may receive two copies of \(X+1\). If it acknowledges each copy, then each ACK may trigger the sender to send \(X+2\).
Then the receiver may get two copies of \(X+2\), acknowledge both, and the sender may send two copies of \(X+3\), and so on.
The number of duplicate packets can grow.
This was called the Sorcerer’s Apprentice syndrome.
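The effect can be reproduced with a small simulation. This is a simplified model under stated assumptions: the channel never loses anything, exactly one spurious retransmission happens at the start, and the names `simulate_transfer` and `ignore_duplicate_acks` are hypothetical. Ignoring duplicate ACKs is shown as one possible repair, not necessarily the exact fix discussed in the lecture.

```python
from collections import Counter, deque


def simulate_transfer(last_block: int, ignore_duplicate_acks: bool) -> Counter:
    """Count how often each block is transmitted in a lock-step transfer."""
    transmissions = Counter()
    channel = deque()        # blocks in transit; every arriving copy is ACKed
    highest_acked = 0

    def send(block: int) -> None:
        transmissions[block] += 1
        channel.append(block)

    send(1)
    send(1)  # spurious retransmission: the ACK for block 1 was only delayed

    while channel:
        ack = channel.popleft()          # receiver ACKs each copy it receives
        if ignore_duplicate_acks and ack <= highest_acked:
            continue                     # possible fix: drop duplicate ACKs
        highest_acked = max(highest_acked, ack)
        if ack < last_block:
            send(ack + 1)                # naive rule: every ACK releases a block
    return transmissions


naive = simulate_transfer(5, ignore_duplicate_acks=False)
fixed = simulate_transfer(5, ignore_duplicate_acks=True)
```

In the naive run every block from 1 to 5 is transmitted twice, so a single premature timeout doubles the traffic for the rest of the transfer. With duplicate ACKs ignored, only block 1 is sent twice.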
2.9.2. Lesson
The lesson is that reliability is subtle.
It is not enough to simply say:
Every packet must be acknowledged.
The protocol must also handle:
- delayed packets;
- duplicate packets;
- duplicate ACKs;
- retransmissions;
- state management;
- loss recovery.
This motivates using a dedicated transport protocol like TCP instead of requiring every application to implement these mechanisms itself.
2.10. TCP: Transmission Control Protocol, Part 1
2.10.1. TCP’s abstraction
TCP provides a reliable, in-order byte stream.
This means:
- TCP views data as a stream of bytes;
- TCP does not preserve application message boundaries;
- bytes should be delivered to the receiving application in the same order in which they were sent.
For example, if an application writes:
write("hello")
write("world")
the receiver may simply read a byte stream:
helloworld
TCP itself does not necessarily remember that the application made two writes.
2.10.2. Point-to-point communication
TCP is point-to-point:
- one sender;
- one receiver.
It is not one-to-many multicast.
2.10.3. Full duplex
TCP is full duplex.
One TCP connection supports data flow in both directions at the same time.
So a TCP connection between A and B contains:
- A-to-B byte stream;
- B-to-A byte stream.
Each direction has its own sequence number space.
2.10.4. Connection-oriented communication
TCP is connection-oriented.
Before normal data transfer, the endpoints perform a handshake to initialize connection state.
The state includes, for example:
- sequence numbers;
- acknowledgement numbers;
- buffer/window information;
- negotiated options;
- connection state.
2.10.5. MSS: Maximum Segment Size
The MSS is the maximum amount of TCP payload data that can be sent in one TCP segment.
It should be smaller than the MTU, because the IP and TCP headers also need space.
For Ethernet with MTU 1500 bytes, a common TCP payload size is:
\[ 1500 - 20 - 20 = 1460 \]
assuming:
- 20 bytes IPv4 header;
- 20 bytes TCP header without options.
The MSS is about TCP payload size, not total frame size.
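The arithmetic can be written as a small helper. The function names and the 3000-byte example are assumptions for illustration; the default header sizes are the ones assumed above.

```python
import math


def tcp_payload_per_packet(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Bytes of TCP payload that fit into one packet of the given MTU."""
    return mtu - ip_header - tcp_header


def segments_needed(data_bytes: int, payload_per_segment: int) -> int:
    """Number of segments required to carry data_bytes of application data."""
    return math.ceil(data_bytes / payload_per_segment)


ethernet_payload = tcp_payload_per_packet(1500)        # 1500 - 20 - 20
count = segments_needed(3000, ethernet_payload)        # 3000 bytes need 3 segments
```

If TCP options are in use, the TCP header is longer than 20 bytes and less room remains for payload in each packet.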
2.10.6. TCP segment header
TCP has a much more complex header than UDP.
Important fields:
| Field | Meaning |
|---|---|
| Source port | Sending-side port |
| Destination port | Receiving-side port |
| Sequence number | Byte number of first data byte in this segment |
| Acknowledgement number | Next byte expected from the other side |
| Header length / offset | Where payload starts |
| Checksum | Error detection |
| Receive window | Receiver-advertised free buffer space |
| Urgent pointer | Used with urgent data |
| Options | Variable-length TCP options |
| Flags | Control bits such as ACK, SYN, FIN, RST |
Important flags:
| Flag | Meaning |
|---|---|
| ACK | Acknowledgement field is valid |
| SYN | Synchronize sequence numbers; used in connection setup |
| FIN | Sender is finished sending in this direction |
| RST | Reset connection |
| PSH | Push data to application promptly |
| URG | Urgent data |
| ECN-related flags | Used for Explicit Congestion Notification |
2.10.7. Sequence numbers
TCP sequence numbers count bytes, not segments.
The sequence number of a TCP segment is the byte-stream number of the first data byte in that segment.
Example:
If a sender starts with sequence number 500 and sends 200 bytes, then the next sequence number is:
\[ 500 + 200 = 700 \]
So the next segment should start with sequence number 700.
If the sender later sends only 32 bytes, the next sequence number increases by 32.
2.10.8. Sequence number space and sender window
The sender’s sequence number space can be divided into regions:
| Region | Meaning |
|---|---|
| Sent and ACKed | Data already confirmed by receiver |
| Sent but not ACKed | Data in flight |
| Usable but not sent | Data that may still be sent within the window |
| Not usable | Data outside the current sending window |
The usable window is limited by both:
- flow control;
- congestion control.
The sender may have multiple unacknowledged segments in flight. This is pipelining.
2.10.9. Acknowledgement numbers
The acknowledgement number means:
I have received all bytes before this number, and I expect this byte next.
Example:
If Host A sends 200 bytes starting at sequence number 500, then Host B sends:
\[ ACK = 700 \]
This means:
I received bytes 500 through 699. I now expect byte 700.
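The sequence and acknowledgement arithmetic is simple enough to express directly. In this sketch the helper names are assumptions; the modulo reflects that the TCP sequence number field is 32 bits wide and wraps around.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit values and wrap around


def next_seq(seq: int, payload_len: int) -> int:
    """Sequence number of the byte that follows this segment's payload."""
    return (seq + payload_len) % SEQ_SPACE


def ack_for(seq: int, payload_len: int) -> int:
    """ACK number a receiver sends after getting this segment in order."""
    return next_seq(seq, payload_len)


ack = ack_for(500, 200)              # bytes 500..699 received, expect 700
after_small = next_seq(700, 32)      # a 32-byte segment advances by 32
wrapped = next_seq(2 ** 32 - 10, 20)  # wrap-around near the top of the space
```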
2.10.10. Cumulative ACKs
TCP ACKs are cumulative.
An ACK for byte \(N\) means:
All bytes before \(N\) have been received in order.
So if a receiver sends:
\[ ACK = 3000 \]
it means all bytes up to 2999 have been received in order.
If there is a gap, the receiver cannot cumulatively acknowledge beyond the gap.
2.10.11. Example: gap in the byte stream
Suppose TCP segments have payload length 500.
A sends:
| Segment | Seq | Length |
|---|---|---|
| 1 | 2500 | 500 |
| 2 | 3000 | 500 |
| 3 | 3500 | 500 |
If segment 2 is lost but segment 3 arrives, the receiver has:
- bytes 2500–2999;
- bytes 3500–3999;
- but not bytes 3000–3499.
The receiver cannot send \(ACK=4000\), because that would falsely claim that all bytes before 4000 were received.
Instead, the receiver repeats:
\[ ACK = 3000 \]
This duplicate ACK tells the sender:
I am still waiting for byte 3000.
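A receiver that produces cumulative ACKs can be sketched as below. The class name `CumulativeReceiver` is hypothetical, and the model assumes that out-of-order segments are buffered and that segments never partially overlap.

```python
class CumulativeReceiver:
    """Tracks the next expected byte and buffers out-of-order segments."""

    def __init__(self, expected: int) -> None:
        self.expected = expected               # next in-order byte number
        self.buffered: dict[int, int] = {}     # seq -> length, beyond a gap

    def receive(self, seq: int, length: int) -> int:
        """Process one segment and return the ACK number to send."""
        if seq == self.expected:
            self.expected += length
            # Drain buffered segments that have become contiguous.
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            self.buffered[seq] = length        # gap: keep for later
        return self.expected                   # cumulative ACK


rx = CumulativeReceiver(expected=2500)
acks = [
    rx.receive(2500, 500),   # in order
    rx.receive(3500, 500),   # segment 2 missing, so this is a duplicate ACK
    rx.receive(3000, 500),   # retransmission fills the gap
]
```

The three calls return 3000, 3000 and 4000: the ACK number stays at the gap until the missing segment arrives, then jumps past the buffered data.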
2.10.12. Duplicate ACKs
A duplicate ACK is an ACK with the same acknowledgement number as before.
Duplicate ACKs usually mean:
- later data arrived;
- but some earlier data is missing;
- therefore the receiver keeps asking for the same next byte.
TCP does not immediately retransmit after one duplicate ACK because packets can be reordered in the network.
The common rule is:
After three duplicate ACKs, TCP assumes loss and performs fast retransmit.
More precisely, the sender receives the original ACK plus three additional ACKs for the same acknowledgement number.
2.10.13. Sequence number examples from the lecture
Example 1:
A sends:
\[ Seq = 1500,\quad Length = 500 \]
B should ACK:
\[ ACK = 2000 \]
A then sends:
\[ Seq = 2000,\quad Length = 500 \]
B should ACK:
\[ ACK = 2500 \]
Example 2 with loss:
A sends:
\[ Seq = 2500,\quad Length = 500 \]
B ACKs:
\[ ACK = 3000 \]
A sends:
\[ Seq = 3000,\quad Length = 500 \]
but this is lost.
A sends:
\[ Seq = 3500,\quad Length = 500 \]
B receives this but sees a gap. It still sends:
\[ ACK = 3000 \]
because byte 3000 is still the next expected byte.
2.10.14. ACKs and pure ACK segments
Each TCP segment contains both:
- a sequence number;
- an acknowledgement number.
Even a pure ACK segment, with no application payload, still has a sequence number field.
However, a pure ACK with no data normally does not consume sequence number space. SYN and FIN are special: they each consume one sequence number.
2.10.15. Initial sequence numbers
TCP sequence numbers do not simply start at zero.
Historically, predictable initial sequence numbers enabled attacks, because an attacker could guess where important data would appear in the byte stream.
Modern TCP uses random initial sequence numbers.
Each direction has its own initial sequence number.
For a connection between A and B:
- A chooses an initial sequence number for A-to-B data;
- B chooses an initial sequence number for B-to-A data.
3. Lecture 8: TCP Part 2
Lecture 8 goes deeper into TCP.
Main topics:
- TCP segment refresher;
- sequence numbers and acknowledgements;
- reliable data transfer;
- RTT estimation and timeout;
- retransmissions;
- delayed ACKs;
- fast retransmit;
- flow control;
- connection management.
3.1. TCP refresher
TCP is:
- point-to-point;
- reliable;
- in-order;
- byte-stream oriented;
- full duplex;
- connection-oriented;
- flow controlled;
- congestion controlled.
TCP provides the illusion of a reliable ordered stream on top of IP, even though IP itself is unreliable and packet-oriented.
3.2. TCP byte stream and message boundaries
TCP does not preserve message boundaries.
The application writes bytes. TCP sends bytes. The receiving application reads bytes.
If the application needs message boundaries, it must implement them itself, for example by:
- fixed-size messages;
- delimiters;
- length prefixes;
- application-level framing.
This is why protocols built on TCP often define their own framing format.
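Length-prefix framing, one of the options listed above, can be sketched as follows. The 4-byte big-endian length field and the names `frame` and `deframe` are assumptions chosen for the example, not a format defined in the lecture.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix the message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


def deframe(stream: bytes) -> tuple[list[bytes], bytes]:
    """Extract all complete messages; return them plus the unconsumed rest."""
    messages = []
    offset = 0
    while len(stream) - offset >= 4:
        (length,) = struct.unpack_from("!I", stream, offset)
        if len(stream) - offset - 4 < length:
            break                               # message not complete yet
        start = offset + 4
        messages.append(stream[start:start + length])
        offset = start + length
    return messages, stream[offset:]


stream = frame(b"hello") + frame(b"world")

# The receiver may get the bytes in arbitrary chunks, e.g. 12 bytes first.
first, rest = deframe(stream[:12])
second, leftover = deframe(rest + stream[12:])
```

The first read yields only `hello` because the second frame is incomplete; the remaining bytes are kept and combined with the next chunk.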
3.3. Pipelining
If TCP sent only one segment and then waited for its ACK before sending the next one, throughput would be very poor, especially on long-delay links.
So TCP uses pipelining:
The sender may have multiple segments in flight before receiving ACKs.
The number of bytes allowed in flight depends on:
\[ \min(cwnd, rwnd) \]
where:
- \(cwnd\) is the congestion window;
- \(rwnd\) is the receive window.
3.4. Sequence numbers and ACKs: detailed examples
3.4.1. Telnet example
The lecture uses a simple Telnet-like example where the user types one character, \(C\).
Host A sends \(C\) to Host B:
\[ Seq = 42,\quad ACK = 79,\quad data = C \]
Since \(C\) is one byte, Host B acknowledges the next expected byte:
\[ ACK = 43 \]
Host B also echoes \(C\) back using its own sequence number space:
\[ Seq = 79,\quad ACK = 43,\quad data = C \]
Then Host A acknowledges the echoed byte:
\[ Seq = 43,\quad ACK = 80 \]
Important lesson:
- the ACK number is the next expected byte from the other side;
- each direction has its own sequence number space;
- a one-byte payload increases the next sequence number by 1.
3.4.2. Larger segment example
Suppose A’s initial sequence number is 10, B’s initial sequence number is 200, and A sends 10-byte segments.
A sends:
\[ Seq = 10,\quad Len = 10 \]
B should ACK:
\[ ACK = 20 \]
A sends another 10 bytes:
\[ Seq = 20,\quad Len = 10 \]
B can ACK cumulatively:
\[ ACK = 30 \]
This says:
I have received everything before byte 30.
3.4.3. Loss example
A sends:
| Segment | Seq | Length |
|---|---|---|
| 1 | 10 | 10 |
| 2 | 20 | 10 |
| 3 | 30 | 10 |
| 4 | 40 | 10 |
Suppose segment 2 is lost, but segments 3 and 4 arrive.
The receiver cannot ACK 50, because bytes 20–29 are missing.
It repeatedly sends:
\[ ACK = 20 \]
These are duplicate ACKs.
If the missing segment is later retransmitted and the receiver has buffered the out-of-order segments, then it can ACK:
\[ ACK = 50 \]
If the receiver did not buffer the out-of-order data, then it may only ACK after receiving more retransmitted data. The TCP specification does not fully mandate how out-of-order data must be buffered; that is implementation-specific.
3.5. RTT and timeout
3.5.1. Why timeout is needed
TCP needs to decide when a segment is probably lost.
If an ACK does not arrive, the sender eventually retransmits.
The timeout must be chosen carefully.
If the timeout is too short:
- TCP retransmits unnecessarily;
- this wastes bandwidth;
- it may worsen congestion.
If the timeout is too long:
- TCP waits too long before repairing loss;
- performance suffers.
3.5.2. Round-trip time
The round-trip time, RTT, is the time from sending a segment to receiving the corresponding acknowledgement.
\[ RTT = \text{time from segment transmission to ACK receipt} \]
RTT varies because of:
- queueing delay;
- route changes;
- changing network load;
- processing delay;
- propagation delay.
The most variable part is usually queueing delay.
3.5.3. SampleRTT
TCP can measure a sample RTT:
\[ SampleRTT = \text{ACK receipt time} - \text{segment send time} \]
However, a single sample is noisy.
So TCP uses a smoothed estimate.
3.5.4. EstimatedRTT
TCP estimates RTT using an exponentially weighted moving average:
\[ EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT \]
Typical value:
\[ \alpha = 0.125 = \frac{1}{8} \]
This makes recent samples matter more than old samples.
The influence of an old sample decreases exponentially over time.
3.5.5. Why use powers of two?
The lecture mentions that values such as \(1/8\) and \(1/4\) are convenient because they can be implemented efficiently using bit shifts.
This matters because TCP operations must be fast inside the operating system.
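To illustrate the bit-shift idea: with \(\alpha = 1/8\), the update can be rewritten as EstimatedRTT + (SampleRTT - EstimatedRTT)/8, and a division by 8 is a right shift by 3. The sketch below uses integer microseconds; the function name is an assumption, and real kernels use more elaborate scaled representations.

```python
def ewma_shift(estimated_us: int, sample_us: int, shift: int = 3) -> int:
    """EWMA update with weight 1/2**shift, using only add, subtract, shift."""
    return estimated_us + ((sample_us - estimated_us) >> shift)


# Estimate is 100 ms, the new sample is 108 ms.
updated = ewma_shift(100_000, 108_000)

# Same update with floating-point arithmetic for comparison.
reference = (1 - 0.125) * 100_000 + 0.125 * 108_000
```

Both variants give 101000 microseconds here. In general the shift version rounds down, so results can differ from the floating-point formula by less than one unit.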
3.5.6. DevRTT
TCP also estimates how much RTT varies.
\[ DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT| \]
Typical value:
\[ \beta = 0.25 = \frac{1}{4} \]
If RTT varies a lot, TCP needs a larger safety margin.
3.5.7. Timeout interval
The timeout interval is:
\[ TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT \]
The factor 4 is a safety margin.
So the timeout adapts to both:
- the average RTT;
- the variability of RTT.
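The three formulas can be combined into one small estimator. Two details in this sketch are assumptions not stated in the lecture: the initial values (first sample for EstimatedRTT, half of it for DevRTT) and the update order (DevRTT is updated using the old EstimatedRTT); both follow the convention of RFC 6298.

```python
class RttEstimator:
    """Keeps EstimatedRTT and DevRTT and derives the timeout interval."""

    def __init__(self, first_sample: float,
                 alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial value

    def update(self, sample_rtt: float) -> None:
        # DevRTT first, so that it uses the previous EstimatedRTT.
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)     # milliseconds
initial_timeout = est.timeout_interval     # 100 + 4 * 50
est.update(120.0)
```

After the 120 ms sample, EstimatedRTT is 102.5 ms and DevRTT is 42.5 ms, so the timeout interval becomes 272.5 ms.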
3.6. Retransmission ambiguity and Karn’s algorithm
3.6.1. The ambiguity problem
Suppose TCP sends a segment and then retransmits it because of a timeout.
Later an ACK arrives.
Question:
Does the ACK correspond to the original segment or the retransmitted segment?
The sender often cannot tell.
Therefore, the RTT sample would be ambiguous.
3.6.2. Karn’s RTT estimator
Karn’s rule:
- do not take RTT samples for retransmitted segments;
- keep the backed-off timeout;
- resume RTT sampling only once an ACK arrives for a segment that was not retransmitted.
This avoids corrupting the RTT estimator with ambiguous samples.
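A minimal sketch of the rule. The class name `KarnSampler` is hypothetical, and doubling the timeout on each expiry is an assumption about how the back-off works; recomputing the timeout from valid samples is left out.

```python
class KarnSampler:
    """Collects RTT samples, skipping ACKs for retransmitted segments."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.samples: list[float] = []

    def on_timeout(self) -> None:
        self.timeout *= 2                  # assumed exponential back-off

    def on_ack(self, send_time: float, ack_time: float,
               was_retransmitted: bool):
        """Return the RTT sample, or None if the ACK is ambiguous."""
        if was_retransmitted:
            return None                    # cannot tell which copy was ACKed
        sample = ack_time - send_time
        self.samples.append(sample)
        return sample


sampler = KarnSampler(timeout=1.0)
sampler.on_timeout()                                   # segment retransmitted
ignored = sampler.on_ack(0.0, 2.5, was_retransmitted=True)
used = sampler.on_ack(3.0, 3.25, was_retransmitted=False)
```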
3.6.3. TCP timestamp option
The lecture also mentions that modern TCP can use timestamp options.
With timestamps, the sender includes a timestamp in a segment, and the receiver echoes it back. This can help obtain more accurate RTT samples and reduce ambiguity.
3.7. TCP reliable data transfer
TCP builds reliable data transfer on top of unreliable IP.
It uses:
- sequence numbers;
- cumulative ACKs;
- retransmission timers;
- duplicate ACKs;
- retransmission;
- buffering;
- sliding windows.
3.8. Simplified TCP sender
A simplified TCP sender maintains:
- \(SendBase\): first unacknowledged byte;
- \(NextSeqNum\): next byte number to use when sending new data;
- a retransmission timer.
Initial state:
\[ SendBase = InitialSeqNum \]
\[ NextSeqNum = InitialSeqNum \]
3.8.1. Event: data received from application
When data arrives from the application:
- create a TCP segment;
- set the sequence number to \(NextSeqNum\);
- pass the segment to IP;
- update:
\[ NextSeqNum = NextSeqNum + length(data) \]
- start the timer if it is not already running.
The timer is conceptually associated with the oldest unacknowledged segment.
3.8.2. Event: ACK received
Suppose an ACK with value \(y\) arrives.
If:
\[ y > SendBase \]
then this ACK acknowledges new data.
Update:
\[ SendBase = y \]
If there are still unacknowledged segments, restart the timer. Otherwise, stop the timer.
If the ACK does not acknowledge new data, it may be a duplicate ACK.
3.8.3. Event: timeout
On timeout:
- retransmit the segment with the smallest unacknowledged sequence number;
- restart the timer.
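The three events can be collected into one class. This is a sketch under simplifying assumptions: the timer is modelled as a boolean instead of a real clock, the list `wire` stands in for passing segments to IP, sequence number wrap-around is ignored, and all names are illustrative.

```python
class SimpleTcpSender:
    """Event-driven sketch of the simplified TCP sender."""

    def __init__(self, initial_seq: int) -> None:
        self.send_base = initial_seq        # first unacknowledged byte
        self.next_seq_num = initial_seq     # next byte number for new data
        self.unacked: dict[int, bytes] = {}  # seq -> payload still in flight
        self.timer_running = False
        self.wire: list[tuple[int, bytes]] = []  # segments handed "to IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True       # start timer for oldest segment

    def on_ack(self, y: int) -> None:
        if y > self.send_base:              # ACK covers new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart if data is still outstanding, otherwise stop.
            self.timer_running = bool(self.unacked)
        # Otherwise: a duplicate ACK, handled by fast retransmit elsewhere.

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)         # smallest unacknowledged seq
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True       # restart the timer


sender = SimpleTcpSender(initial_seq=92)
sender.on_app_data(b"A" * 8)      # bytes 92..99
sender.on_app_data(b"B" * 20)     # bytes 100..119
sender.on_timeout()               # retransmits the segment starting at 92
sender.on_ack(120)                # cumulative ACK covers both segments
```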
3.9. Retransmission scenarios
3.9.1. Lost ACK
A sends:
\[ Seq = 92,\quad Len = 8 \]
B receives it and sends:
\[ ACK = 100 \]
If the ACK is lost, A eventually times out and retransmits the same data.
B receives the duplicate data, discards it, and sends \(ACK=100\) again.
The data is not delivered twice to the application as new data, because TCP can recognize the duplicate sequence numbers.
3.9.2. Premature timeout
A segment may not be lost, but the sender’s timeout may fire too early.
Then the sender retransmits unnecessarily.
The receiver may get duplicate data and ACK accordingly.
This is why choosing a timeout that is too short is harmful.
3.9.3. Cumulative ACK advantage
Suppose the ACK for an earlier segment is lost, but a later cumulative ACK arrives.
The later ACK may still acknowledge all earlier data.
For example:
- A sends bytes 92–99;
- B sends \(ACK=100\), but it is lost;
- A sends bytes 100–119;
- B sends \(ACK=120\).
The ACK 120 implies that bytes up to 119 were received, so ACK 100 is no longer needed.
This is the advantage of cumulative ACKs.
3.10. TCP ACK generation
The receiver’s ACK behavior is not always “ACK every segment immediately”.
The lecture discusses common rules.
3.10.1. In-order segment, no pending ACK
If an in-order segment arrives with the expected sequence number, and all previous data is already ACKed:
- use delayed ACK;
- wait briefly for another segment;
- if no next segment arrives, send an ACK.
The slides mention waiting up to 500 ms; the lecture notes that modern systems often use smaller values such as around 200 ms.
3.10.2. In-order segment, one ACK already pending
If another in-order segment arrives while an ACK is pending:
- immediately send one cumulative ACK;
- this ACK covers both in-order segments.
This reduces ACK overhead.
3.10.3. Out-of-order segment
If a segment arrives with a higher-than-expected sequence number, there is a gap.
The receiver immediately sends a duplicate ACK indicating the next expected byte.
Example:
Expected:
\[ Seq = 200 \]
Received:
\[ Seq = 300 \]
The receiver sends:
\[ ACK = 200 \]
3.10.4. Segment fills a gap
If a segment arrives and partially or completely fills a gap, the receiver immediately sends an ACK, provided the segment starts at the lower end of the gap.
This helps the sender quickly learn that recovery succeeded.
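The four rules can be summarised as a decision function. This is a sketch: the function name, the string labels, and the flag `has_buffered_gap` (true when out-of-order data is waiting behind a gap) are assumptions, and the final branch for already-received data is not one of the four cases above.

```python
def ack_action(seq: int, expected: int, ack_pending: bool,
               has_buffered_gap: bool) -> str:
    """Decide how the receiver reacts to an arriving segment."""
    if seq > expected:
        return "duplicate ACK now"      # out of order: a gap was detected
    if seq == expected and has_buffered_gap:
        return "ACK now"                # fills the lower end of the gap
    if seq == expected and ack_pending:
        return "cumulative ACK now"     # second in-order segment
    if seq == expected:
        return "delay ACK"              # wait briefly for another segment
    return "ACK now"                    # old data: repeat the current ACK


decisions = [
    ack_action(200, 200, ack_pending=False, has_buffered_gap=False),
    ack_action(200, 200, ack_pending=True, has_buffered_gap=False),
    ack_action(300, 200, ack_pending=False, has_buffered_gap=False),
    ack_action(200, 200, ack_pending=False, has_buffered_gap=True),
]
```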
3.11. Delayed ACKs
Delayed ACKs reduce overhead.
Instead of ACKing every segment, TCP can wait and ACK multiple received segments with one cumulative ACK.
However, delayed ACKs must not wait forever. If no second segment arrives within the delayed-ACK timeout, the receiver sends an ACK anyway.
Reason:
- otherwise the sender may wait unnecessarily;
- or the sender may time out and retransmit even though the data arrived.
3.12. Fast retransmit
Timeouts can be long because TCP is designed to work over many different network environments, including very long-delay paths.
Waiting for a timeout can be slow.
TCP therefore also uses duplicate ACKs to infer loss.
3.12.1. Triple duplicate ACK rule
If the sender receives three duplicate ACKs for the same acknowledgement number, it assumes that the missing segment was lost.
Then it retransmits the oldest unacknowledged segment without waiting for the timeout.
This is called fast retransmit.
Important exam detail:
It is not enough to receive one duplicate ACK. TCP fast retransmit uses three duplicate ACKs.
That means four ACKs with the same ACK number in total:
- the original ACK;
- first duplicate ACK;
- second duplicate ACK;
- third duplicate ACK.
After the third duplicate ACK, the sender retransmits.
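The counting rule can be sketched with a small state machine. The class name `DupAckCounter` is an assumption, and the sketch only signals when a fast retransmit should fire without performing the retransmission itself.

```python
class DupAckCounter:
    """Signals fast retransmit on the third duplicate ACK."""

    def __init__(self, send_base: int) -> None:
        self.last_ack = send_base
        self.dup_count = 0

    def on_ack(self, ack: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if ack > self.last_ack:          # new data acknowledged
            self.last_ack = ack
            self.dup_count = 0
            return False
        if ack == self.last_ack:         # duplicate of the latest ACK
            self.dup_count += 1
            return self.dup_count == 3
        return False                     # stale ACK: ignore


counter = DupAckCounter(send_base=10)
# Original ACK 20 followed by three duplicates.
fired = [counter.on_ack(20) for _ in range(4)]
```

Only the fourth ACK carrying the number 20, which is the third duplicate, triggers the retransmission.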
3.12.2. Why not retransmit after one duplicate ACK?
Because packets can be reordered.
A single out-of-order packet may just mean that the network delivered packets in a different order, not that a segment was lost.
Three duplicate ACKs are stronger evidence.
3.13. Flow control
Flow control prevents the sender from overwhelming the receiver.
3.13.1. Receiver buffer
At the receiver, incoming TCP data is placed into a receive buffer.
The application process reads data from this buffer.
If the application reads slowly, the buffer can fill up.
If the sender keeps sending data after the buffer is full, the receiver would need to drop data.
TCP avoids this through flow control.
3.13.2. Receive window
The receiver advertises available buffer space using the receive window field, often written:
\[ rwnd \]
The receive window tells the sender how many more bytes the receiver is currently willing to accept.
If the receive buffer has:
- total size \(RcvBuffer\);
- currently buffered data \(BufferedData\);
then approximately:
\[ rwnd = RcvBuffer - BufferedData \]
The sender limits unacknowledged in-flight data to at most \(rwnd\), unless congestion control is even more restrictive.
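The window arithmetic can be sketched as two helpers. The function names and the example numbers are assumptions; `cwnd` is the congestion window that may restrict the sender further.

```python
def receive_window(rcv_buffer: int, buffered_data: int) -> int:
    """Free space the receiver advertises as rwnd."""
    return rcv_buffer - buffered_data


def sendable_bytes(next_seq_num: int, send_base: int,
                   rwnd: int, cwnd: int) -> int:
    """How many new bytes the sender may put in flight right now."""
    in_flight = next_seq_num - send_base
    return max(0, min(rwnd, cwnd) - in_flight)


rwnd = receive_window(rcv_buffer=4096, buffered_data=1096)   # 3000 bytes free
allowed = sendable_bytes(next_seq_num=1500, send_base=1000,
                         rwnd=rwnd, cwnd=2000)
blocked = sendable_bytes(next_seq_num=1500, send_base=1000,
                         rwnd=0, cwnd=2000)
```

In the first case the congestion window is the tighter limit: 2000 bytes may be in flight, 500 already are, so 1500 more may be sent. With a zero receive window nothing may be sent.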
3.13.3. Dynamic receiver window
The receive window changes over time.
It increases when the application reads data from the buffer.
It decreases when new data arrives and fills buffer space.
Many operating systems auto-adjust receive buffer sizes. The default may start small, for example around 4096 bytes, and grow if necessary.
3.13.4. Why not allocate huge buffers?
Although modern machines have lots of memory, kernel buffer memory is still a limited resource.
If the system allocates too much buffer memory for many connections, it can run out of kernel memory.
Therefore, buffer sizes are managed carefully.
3.13.5. Zero receive window
If the receiver advertises:
\[ rwnd = 0 \]
then the sender cannot send normal data.
But if the receiver later frees buffer space, how does the sender learn this?
The receiver may not have a new data segment to send.
Therefore, the sender periodically sends small probes, often one byte, to test whether the receiver window has opened again.
If the receiver still has no space, it acknowledges the old next expected byte. The sender backs off and probes again later.
3.14. Sliding window protocol
TCP is a sliding-window protocol.
For window size \(n\), the sender can have up to \(n\) bytes outstanding without receiving an acknowledgement.
When data is acknowledged, the window slides forward.
Sender-side window regions:
| Region | Meaning |
|---|---|
| Sent and ACKed | Done |
| Sent but not ACKed | In flight |
| Not yet sent but allowed | Can be sent |
| Not usable | Must wait |
Receiver-side regions:
| Region | Meaning |
|---|---|
| ACKed but not delivered to user | In receive buffer |
| Not yet ACKed | May arrive next |
| Receive window | Free buffer space |
3.15. Silly window syndrome
Silly window syndrome happens when TCP sends many tiny segments because only a small amount of window space opens repeatedly.
This is inefficient because each small segment carries TCP/IP header overhead.
A common mitigation is to limit the number of segments smaller than MSS that are in flight.
The lecture states the rule informally as:
Do not allow too many packets smaller than MSS in flight; limit them strongly, for example to one per RTT.
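To see why tiny segments are wasteful, the following sketch compares the share of useful payload in a 1-byte segment and in a full-sized one. It assumes 40 bytes of headers (20-byte IPv4 header plus 20-byte TCP header, no options) and an MSS of 1460 bytes, which is typical for Ethernet but is an assumption here.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of the transmitted bytes that is application payload."""
    return payload_bytes / (payload_bytes + HEADER_BYTES)


tiny_segment = payload_efficiency(1)      # 1 / 41, about 2.4 % useful data
full_segment = payload_efficiency(1460)   # 1460 / 1500, about 97.3 % useful data
```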
3.16. Ideal window size and bandwidth-delay product
The ideal amount of data in flight is related to the bandwidth-delay product:
\[ BDP = RTT \cdot \text{bottleneck bandwidth} \]
Interpretation:
The bandwidth-delay product is roughly the amount of data needed to fill the network path.
If:
\[ window < BDP \]
then the sender may waste available bandwidth.
If:
\[ window > BDP \]
then the sender may put too much data into the network, causing queues, increased RTT, and eventually loss.
In practice, one often wants to operate below the full BDP because other users also share the network.
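A small worked example, assuming a path with 50 ms RTT and a 100 Mbit/s bottleneck (both numbers are made up for illustration). The helper names are hypothetical. The second function shows how a window below the BDP caps throughput at roughly one window per RTT.

```python
def bdp_bytes(rtt_s: float, bandwidth_bps: float) -> float:
    """Bandwidth-delay product in bytes (bandwidth is given in bits per second)."""
    return rtt_s * bandwidth_bps / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float, bandwidth_bps: float) -> float:
    """At most one window per RTT, and never more than the bottleneck bandwidth."""
    return min(bandwidth_bps, window_bytes * 8 / rtt_s)


path_bdp = bdp_bytes(rtt_s=0.050, bandwidth_bps=100e6)
# A 65535-byte window is far below the BDP, so most of the link stays idle.
small_window_rate = window_limited_throughput_bps(65535, 0.050, 100e6)
```

Under these assumptions the BDP is about 625000 bytes, while a 65535-byte window yields only about 10.5 Mbit/s of the available 100 Mbit/s.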
3.17. Connection management
TCP needs connection management because both sides must agree on state before data transfer.
The state includes:
- connection state;
- initial sequence numbers;
- receive buffer sizes;
- options;
- window scaling;
- other negotiated parameters.
3.18. Why a two-way handshake is not enough
A two-way handshake might look like:
Client -> Server: Let's talk
Server -> Client: OK
But this is not reliable enough in a real network because:
- messages can be delayed;
- messages can be lost;
- messages can be retransmitted;
- messages can be reordered;
- one side cannot directly see the other side’s current state.
A delayed old connection request may arrive after the client has already gone away. The server may then create half-open state for a client that no longer exists.
The lecture’s human analogy:
- If I say “hello” and you say “hello” back, you still do not know whether I heard your response unless I acknowledge it again.
So TCP uses a three-way handshake.
3.19. TCP three-way handshake
The TCP handshake uses SYN and ACK flags.
Let the client choose initial sequence number \(x\).
Let the server choose initial sequence number \(y\).
3.19.1. Step 1: Client sends SYN
\[ Client \rightarrow Server: SYN=1,\quad Seq=x \]
The ACK flag is not set because the client does not yet know the server’s sequence number.
The client enters the SYN-SENT state.
3.19.2. Step 2: Server sends SYN-ACK
\[ Server \rightarrow Client: SYN=1,\quad ACK=1,\quad Seq=y,\quad ACKnum=x+1 \]
The server acknowledges the client’s SYN.
Important:
A SYN consumes one sequence number.
So the ACK for \(Seq=x\) is \(x+1\), even if the SYN carries no application data.
The server also sends its own SYN with sequence number \(y\).
3.19.3. Step 3: Client sends ACK
\[ Client \rightarrow Server: ACK=1,\quad Seq=x+1,\quad ACKnum=y+1 \]
This acknowledges the server’s SYN.
After this, both sides know that the other side is alive and has received the necessary initial state.
3.19.4. Data after handshake
After the handshake:
- client data starts at sequence number \(x+1\);
- server data starts at sequence number \(y+1\).
Example:
If:
\[ x = 1000 \]
\[ y = 200 \]
Then:
Client sends:
\[ SYN,\ Seq=1000 \]
Server sends:
\[ SYN+ACK,\ Seq=200,\ ACKnum=1001 \]
Client sends:
\[ ACK,\ Seq=1001,\ ACKnum=201 \]
If the client then sends 1000 bytes, it uses:
\[ Seq=1001,\quad Len=1000 \]
The next client sequence number becomes:
\[ 2001 \]
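The example above can be reproduced with a short sketch that builds the three handshake segments from the two initial sequence numbers. The dictionary layout and the function name `three_way_handshake` are illustrative choices, not a real packet format or API.

```python
def three_way_handshake(x: int, y: int) -> list:
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"flags": "SYN", "seq": x}
    # The SYN consumes one sequence number, so the server acknowledges x + 1.
    syn_ack = {"flags": "SYN+ACK", "seq": y, "acknum": syn["seq"] + 1}
    # The server's SYN also consumes one sequence number.
    ack = {"flags": "ACK", "seq": x + 1, "acknum": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


segments = three_way_handshake(x=1000, y=200)

# First client data segment: 1000 bytes starting right after the SYN.
first_data_seq = segments[2]["seq"]
next_client_seq = first_data_seq + 1000
```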
3.20. Closing a TCP connection
TCP is full duplex, so each direction is closed separately.
A FIN means:
I am done sending data in this direction.
But the other side may still send data.
3.20.1. FIN consumes sequence number space
Like SYN, FIN consumes one sequence number.
If a side sends:
\[ FIN,\ Seq=x \]
the acknowledgement number is:
\[ ACKnum = x+1 \]
3.20.2. Typical close sequence
A typical close:
- Client sends FIN.
- Server ACKs the FIN.
- Server may continue sending remaining data.
- Server later sends FIN.
- Client ACKs the server’s FIN.
- Client waits in TIME-WAIT for \(2 \times MSL\).
Here \(MSL\) means maximum segment lifetime.
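The close sequence can be sketched as a small state machine for the side that closes first (the client in the list above). The state names follow the standard TCP state diagram; the event labels and the function `run_close` are hypothetical, and less common paths such as a simultaneous close are left out.

```python
# Transitions for the endpoint that sends the first FIN (the "active closer").
ACTIVE_CLOSER_TRANSITIONS = {
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ack_of_fin"): "FIN_WAIT_2",
    # In FIN_WAIT_2 the peer may still send data; we only track its FIN here.
    ("FIN_WAIT_2", "recv_fin_send_ack"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run_close(events: list, state: str = "ESTABLISHED") -> list:
    """Apply the events in order and return the sequence of states visited."""
    trace = [state]
    for event in events:
        state = ACTIVE_CLOSER_TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


trace = run_close(["send_fin", "recv_ack_of_fin", "recv_fin_send_ack", "timeout_2msl"])
```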
3.20.3. Why TIME-WAIT?
The endpoint that sends the final ACK waits so that old packets from the connection can disappear from the network, and so that it can re-send the final ACK if the other side retransmits its FIN.
The wait is often:
\[ 2 \times MSL \]
This helps prevent old duplicate packets from being confused with a later connection using the same socket tuple.
3.20.4. FIN vs RST
FIN is graceful:
- one side says it has no more data to send;
- the other direction may still continue;
- state is cleaned up carefully.
RST is abrupt:
- it resets the connection;
- it is used for errors or unexpected packets;
- it tells the other side to discard connection state.
The lecture notes that historically, servers sometimes used RST instead of graceful FIN in some HTTP scenarios to avoid the cost of keeping many connections in TIME-WAIT or half-open states.
3.21. Important takeaways
3.21.1. UDP
UDP is simple, fast, and connectionless.
It gives no reliability, ordering, flow control, or congestion control.
It is suitable when:
- the application can tolerate loss;
- low delay matters;
- the protocol is simple request-response;
- reliability is implemented elsewhere.
3.21.2. TCP
TCP provides reliable, in-order byte-stream communication.
It uses:
- sequence numbers;
- cumulative acknowledgements;
- retransmissions;
- timers;
- duplicate ACKs;
- flow control;
- congestion control;
- connection setup and teardown.
3.21.3. Sequence numbers and ACKs
The most important rule:
\[ ACK = Seq + Length \]
for normal data, because the ACK number is the next byte expected.
But remember:
- sequence numbers count bytes, not segments;
- SYN and FIN each consume one sequence number;
- pure ACKs normally do not consume sequence number space;
- each direction has its own sequence number space.
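These rules fit into one small helper. The function name `expected_acknum` is hypothetical. The modulo reflects that TCP sequence numbers are 32-bit values and wrap around.

```python
def expected_acknum(seq: int, payload_len: int = 0, syn: bool = False, fin: bool = False) -> int:
    """Acknowledgement number the peer should return for a given segment."""
    # Payload bytes count one each; SYN and FIN each consume one sequence number.
    consumed = payload_len + (1 if syn else 0) + (1 if fin else 0)
    return (seq + consumed) % 2**32


data_ack = expected_acknum(seq=1001, payload_len=1000)   # 2001
syn_ack = expected_acknum(seq=1000, syn=True)            # 1001
fin_ack = expected_acknum(seq=5000, fin=True)            # 5001
pure_ack = expected_acknum(seq=1001)                     # 1001: nothing consumed
```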
3.21.4. Reliable transfer
TCP repairs loss using:
- timeout-based retransmission;
- fast retransmit after three duplicate ACKs.
Cumulative ACKs make ACK loss less serious because later ACKs can cover earlier data.
3.21.5. Flow control
Flow control is receiver protection.
The receiver advertises:
\[ rwnd \]
The sender ensures that in-flight unacknowledged data does not exceed the receiver’s available buffer space.
3.21.6. Congestion control
Congestion control is network protection.
It was introduced but not fully covered in these lectures. The next lecture will focus on it.
The sender’s actual allowed in-flight data is limited by both:
\[ \min(cwnd, rwnd) \]
where \(cwnd\) protects the network and \(rwnd\) protects the receiver.
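As a minimal sketch of how the two limits combine, the function below computes how many more bytes the sender may put on the wire. The name `sendable_bytes` and the example values are illustrative assumptions.

```python
def sendable_bytes(cwnd: int, rwnd: int, in_flight: int) -> int:
    """Bytes the sender may still transmit given both windows and in-flight data."""
    effective_window = min(cwnd, rwnd)
    return max(0, effective_window - in_flight)


receiver_limited = sendable_bytes(cwnd=10000, rwnd=4000, in_flight=1500)  # 2500
network_limited = sendable_bytes(cwnd=2000, rwnd=4000, in_flight=2000)    # 0
```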
3.21.7. Connection management
A two-way handshake is not enough because of delayed, reordered, or retransmitted messages.
TCP uses a three-way handshake:
\[ SYN \]
\[ SYN + ACK \]
\[ ACK \]
Connection closing is separate in the two directions and uses FIN/ACK, with RST for abrupt error handling.