Transport Layer 1

1. Transport Layer: UDP and TCP

These two lectures move from the application layer to the transport layer.

The transport layer provides logical communication between application processes running on different hosts.

The main transport-layer protocols discussed are:

UDP: User Datagram Protocol
TCP: Transmission Control Protocol

The lectures first introduce general transport-layer services, ports, segments, multiplexing/demultiplexing, and UDP. Then they move into TCP: sequence numbers, acknowledgements, reliable data transfer, flow control, and connection management.

2. Lecture 7: Transport Layer, UDP, and TCP Part 1

2.1. From application layer to transport layer

We have already seen many application-layer protocols and socket programming. Now the course moves one layer down.

The transport layer sits between:

the application layer above it;
the network layer below it.

Its job is not to route packets through the Internet. That is the network layer’s job. Instead, the transport layer provides communication between processes.

A useful mental model is:

Application process
        |
     socket
        |
Transport layer
        |
Network layer

The application writes data into a socket. The transport layer then prepares this data for transmission over the network.

2.2. Transport-layer actions

At the sender:

the application passes an application-layer message to the transport layer;
the transport layer determines suitable header fields;
it creates transport-layer segments;
it passes these segments down to IP.

At the receiver:

the transport layer receives segments from IP;
it checks header fields;
it extracts or reassembles the application-layer data;
it demultiplexes the data to the correct socket/application.

In other words, the application may think it is writing a large object, such as a webpage or a file, into a socket. The transport layer can split this into smaller pieces and later reassemble them.

2.3. TCP vs UDP: high-level comparison

2.3.1. TCP

TCP stands for Transmission Control Protocol.

It provides:

reliable delivery;
in-order delivery;
connection setup;
flow control;
congestion control;
full-duplex communication;
a byte-stream abstraction.

TCP hides many network-level problems from the application.

For example, IP may deliver packets out of order, lose packets, or deliver packets along different paths. TCP tries to make the application see a clean, ordered byte stream.

2.3.2. UDP

UDP stands for User Datagram Protocol.

It provides:

connectionless communication;
unordered delivery;
unreliable delivery;
no retransmission mechanism;
no flow control;
no congestion control;
a small header;
low overhead.

UDP is close to “best effort IP plus ports”.

If data is lost, UDP itself does not repair it. If data arrives out of order, UDP does not reorder it. If the receiver is overwhelmed, UDP does not slow down the sender.

2.3.3. Services not provided by either TCP or UDP

Neither TCP nor UDP gives strict guarantees about:

delay;
bandwidth.

TCP can adapt to congestion and retransmit lost data, but it still does not promise that a segment will arrive within a fixed time. UDP is even simpler and also gives no such guarantees.

2.4. Congestion control vs flow control

The lecture emphasized that TCP has both congestion control and flow control, but they solve different problems.

2.4.1. Congestion control

Congestion control protects the network.

If too many senders transmit too much data at the same time, routers may need to buffer packets. Buffers are finite. Once they fill up, packets are dropped. If many connections keep sending aggressively, the network can become congested and performance can collapse.

TCP congestion control tries to make senders reduce their sending rate when the network appears overloaded.

The important idea is:

Congestion control limits the sender to the amount of data the network can handle.

This is a network-side limitation.

2.4.2. Flow control

Flow control protects the receiver.

The receiver has finite buffer space. If the sender sends faster than the receiver can process or read data, the receiver’s buffer may overflow.

The important idea is:

Flow control limits how much data the receiver is willing or able to accept.

This is a receiver-side limitation.

2.4.3. Effective sending limit

The sender should respect both limits:

\[ \text{allowed in-flight data} = \min(\text{congestion window}, \text{receiver window}) \]

The congestion window is related to the network. The receiver window is related to the receiver’s buffer.
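As a rough illustration of the formula above, the following Python sketch computes how many more bytes a sender may transmit at a given moment. The function name `usable_window` and the parameter `in_flight` are assumptions made for this example; real TCP stacks track these values inside the kernel.

```python
def usable_window(cwnd: int, rwnd: int, in_flight: int) -> int:
    """Bytes the sender may still transmit right now.

    cwnd, rwnd and in_flight are byte counts. The names are
    illustrative and not taken from any particular TCP stack.
    """
    limit = min(cwnd, rwnd)            # respect both the network and the receiver
    return max(0, limit - in_flight)   # never report a negative allowance


# Here the receiver is the bottleneck, because rwnd < cwnd.
print(usable_window(cwnd=10_000, rwnd=4_000, in_flight=1_500))  # 2500
```

Whichever window is smaller determines the limit, so a large congestion window does not help if the receiver advertises little free buffer space.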

2.5. Multiplexing and demultiplexing

2.5.1. Motivation

On one host, many applications may use the network at the same time.

For example:

a browser;
an email client;
a DNS resolver;
a video call;
an SSH session.

All of them use the same network stack. The transport layer therefore needs a way to combine data from multiple applications and later separate incoming data back to the right application.

2.5.2. Multiplexing

Multiplexing happens at the source host.

It means:

Gather data from multiple application sockets, add transport headers, and send the resulting segments through the network.

So the source host takes multiple application data streams and sends them through the transport layer.

2.5.3. Demultiplexing

Demultiplexing happens at the destination host.

It means:

Use header fields to deliver each received segment to the correct socket or application process.

The destination host receives segments from the network and decides which application should get each segment.
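A minimal sketch of demultiplexing, assuming that the destination port alone selects the socket. The class `Demultiplexer` and its methods are hypothetical names for this example. Real TCP demultiplexing also considers the source address and source port, which this toy model ignores.

```python
class Demultiplexer:
    """Toy demultiplexer that looks only at the destination port."""

    def __init__(self):
        self.sockets = {}            # destination port -> list of payloads

    def bind(self, port):
        self.sockets[port] = []      # an application opens a socket on this port

    def deliver(self, dst_port, payload):
        queue = self.sockets.get(dst_port)
        if queue is None:
            return False             # no socket bound to this port: drop
        queue.append(payload)
        return True


demux = Demultiplexer()
demux.bind(53)     # e.g. a DNS resolver
demux.bind(443)    # e.g. an HTTPS server
demux.deliver(53, b"dns query")
demux.deliver(443, b"tls record")
print(demux.sockets[53])   # [b'dns query']
```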

2.5.4. Multiplexing in other layers

The term multiplexing is not unique to the transport layer.

In general, multiplexing means combining multiple logical streams into one lower-level channel. Demultiplexing means separating them again.

For example, HTTP/2 can multiplex multiple application-level streams over one TCP connection.

2.6. Ports

Ports are used to identify application endpoints.

Both TCP and UDP headers contain:

source port;
destination port.

Some common well-known ports:

Port	Protocol / Use
80	HTTP
443	HTTPS
25	SMTP
53	DNS

Important details:

The destination port often identifies the server-side application.
The source port is often chosen dynamically by the client.
Ports 0–1023 are traditionally well-known or privileged ports.
Client-side ephemeral ports are usually chosen from a high-numbered range.
The same port number can technically have different meanings for TCP and UDP, though common services often use the same number where applicable.
DNS commonly uses UDP port 53, but DNS can also use TCP port 53.

2.7. Segments and packets

A segment is the protocol data unit of the transport layer.

A packet is the protocol data unit of the network layer.

So:

Layer	PDU name
Transport layer	Segment
Network layer	Packet

The lecture joked that “packages” are for DHL, not the Internet. In networking, the usual terms are segments and packets.

2.8. UDP: User Datagram Protocol

2.8.1. Basic properties

UDP is a minimal transport-layer protocol.

It is:

connectionless;
unreliable;
unordered;
best-effort;
simple;
fast;
low-overhead.

UDP was originally specified in RFC 768.

There is no handshake before sending UDP data. Each UDP datagram is handled independently.
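The absence of a handshake can be seen in a small loopback experiment. This is only a sketch: the function name `udp_send_once` is made up for this example, and it assumes that the machine has a working loopback interface at 127.0.0.1.

```python
import socket


def udp_send_once(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # do not block forever if the datagram is lost
        addr = receiver.getsockname()
        sender.sendto(message, addr)      # no connect(), no handshake
        data, _peer = receiver.recvfrom(2048)
        return data
```

The sender never calls `connect()` or `accept()`. It simply addresses each datagram individually with `sendto()`.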

2.8.2. Why use UDP?

UDP is useful because it avoids TCP’s connection setup and reliability machinery.

Advantages:

no connection establishment delay;
no connection state;
small header;
simple implementation;
sender can transmit at its chosen rate;
useful for simple request-response protocols;
useful for loss-tolerant, delay-sensitive applications.

Examples:

DNS;
streaming multimedia;
voice-over-IP;
SNMP;
HTTP/3 over QUIC.

For DNS, UDP makes sense because a typical DNS lookup is one query and one response. A full TCP handshake before every small query would add unnecessary round-trip delay.

2.8.3. UDP and congestion

UDP itself has no congestion control.

This can be useful for some applications, but it is also dangerous. A UDP application can send too aggressively and contribute to congestion.

If reliability or congestion control is needed on top of UDP, the application or a higher-level protocol must implement it.

This is relevant for QUIC:

QUIC runs on top of UDP;
QUIC implements reliability, congestion control, and flow control itself;
therefore QUIC uses UDP as a substrate, but adds many TCP-like features above it.

2.8.4. UDP buffers

UDP has a receive buffer at the receiver side.

However, UDP does not keep a sender-side retransmission buffer. Once UDP passes a datagram down to IP, UDP forgets about it.

TCP is different: TCP keeps sent data in a send buffer so that it can retransmit if necessary.

2.8.5. UDP segment structure

The UDP header is small:

Field	Size
Source port	16 bits
Destination port	16 bits
Length	16 bits
Checksum	16 bits

So the UDP header is 8 bytes.

The length field includes:

UDP header;
UDP payload.

The checksum is used for error detection, not security.
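The header layout above maps directly onto four 16-bit fields in network byte order. The following sketch packs and unpacks such a header with Python's `struct` module. The helper names are assumptions for this example, and the checksum field is left at zero as a placeholder rather than computed.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # four 16-bit fields, network byte order


def build_udp_segment(src_port, dst_port, payload, checksum=0):
    length = UDP_HEADER.size + len(payload)   # length covers header + payload
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    payload = segment[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload


segment = build_udp_segment(5353, 53, b"query")
print(UDP_HEADER.size, len(segment))   # 8 13
```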

2.8.6. UDP checksum

The checksum helps detect accidental corruption, such as bit flips.

It does not provide cryptographic integrity. An attacker who can modify both the data and the checksum can still forge a valid checksum.

The checksum is computed using one’s-complement arithmetic over 16-bit words. A pseudo-header is also involved, including information such as IP addresses, protocol identifier, and length.

If the packet length is odd, padding may be added for checksum computation.

A checksum field of zero indicates that the sender did not compute a checksum. This option exists for UDP over IPv4; UDP over IPv6 normally requires the checksum. The lecture emphasized that the checksum should normally be enabled.

2.8.7. UDP Checksum and the Pseudo-Header

The UDP checksum is used to detect accidental corruption during transmission.

It is not a cryptographic security mechanism. It can detect ordinary bit errors, but it cannot stop an attacker who can modify both the packet contents and the checksum.

Basic idea

UDP computes its checksum over 16-bit words.

That means the data is divided into chunks of 16 bits, i.e. 2 bytes each.

The checksum is computed using one’s-complement arithmetic.

The rough procedure is:

1. Divide the checked data into 16-bit words.
2. Add all 16-bit words using one’s-complement addition.
3. If there is a carry beyond 16 bits, wrap it around and add it back.
4. Take the one’s complement of the final sum.
5. Store the result in the UDP checksum field.

In simple terms:

Add everything in 16-bit chunks, wrap around carries, then flip all bits.

One’s-complement addition

One’s-complement addition is almost normal binary addition, except that overflow is wrapped around.

For example, if the sum is larger than 16 bits, the carry is added back to the lower 16 bits.

Conceptually:

\[ \text{sum} = \text{low 16 bits} + \text{carry} \]

Then the checksum is:

\[ \text{checksum} = \sim \text{sum} \]

where \(\sim\) means bitwise complement.

Padding for odd length

The checksum is computed over 16-bit words.

If the UDP data has an odd number of bytes, one padding byte is temporarily added for checksum computation.

This padding byte is used only for the checksum calculation. It is not transmitted and is not part of the application data.

What is the pseudo-header?

The UDP checksum is not computed only over the UDP header and UDP payload.

It also includes a pseudo-header.

The pseudo-header is not a real UDP header field. It is not sent as part of the UDP segment.

Instead, it is temporarily constructed during checksum calculation.

It contains selected information from the IP layer, such as:

Field	Size
Source IP address	4 bytes
Destination IP address	4 bytes
Zero / reserved field	1 byte
Protocol identifier	1 byte
UDP length	2 bytes

So the checksum is computed over:

\begin{equation*} \text{pseudo-header} + \text{UDP header} + \text{UDP data} \end{equation*}

Why include a pseudo-header?

This is important because UDP itself only has ports, length, checksum, and payload. It does not contain source and destination IP addresses.

The IP addresses are stored in the IP header, one layer below UDP.

However, UDP still wants to detect some errors involving the IP-layer delivery information.

For example, suppose a UDP datagram is accidentally delivered to the wrong host because the destination IP address was corrupted.

If the checksum only covered the UDP header and UDP payload, the receiver might not notice that the IP-level destination information was wrong.

By including the pseudo-header in the checksum calculation, the receiver checks that the UDP datagram matches the expected IP-layer information too.

Therefore, the pseudo-header helps detect errors such as:

corrupted source IP address;
corrupted destination IP address;
wrong protocol number;
wrong UDP length;
delivery to the wrong endpoint.

Why is it called “pseudo”?

It is called a pseudo-header because it behaves like a header for checksum calculation, but it is not actually inserted into the UDP segment.

The sender and receiver both construct it locally from IP-layer information.

The sender computes:

\[ \text{checksum} = f(\text{pseudo-header}, \text{UDP header}, \text{UDP data}) \]

The receiver reconstructs the same pseudo-header and recomputes the checksum.

If the result does not match, the UDP datagram is considered corrupted and can be discarded.
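The whole procedure (16-bit words, wrapped carries, padding, pseudo-header, and verification at the receiver) can be sketched in Python as follows. This assumes IPv4 and uses protocol number 17 for UDP. The function names are made up for this example and do not come from any library. The rule of sending an all-ones checksum when the computed value is zero comes from the UDP specification, not from the lecture.

```python
import socket
import struct


def checksum16(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                            # pad for the calculation only
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # wrap the carry around
    return ~total & 0xFFFF                         # flip all bits


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # source IP, destination IP, zero byte, protocol 17 (UDP), UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)
    # The checksum field is set to zero while the checksum is computed.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    value = checksum16(pseudo_header(src_ip, dst_ip, length) + header + payload)
    return value or 0xFFFF   # zero means "no checksum", so send all ones instead


def verify(src_ip, dst_ip, segment: bytes) -> bool:
    # Receiver side: summing everything, including the received checksum,
    # gives 0xFFFF for an intact segment, so its complement is 0.
    length = struct.unpack_from("!H", segment, 4)[0]
    return checksum16(pseudo_header(src_ip, dst_ip, length) + segment) == 0
```

If the receiver rebuilds the pseudo-header from a different destination address than the sender used, `verify` returns `False` even though the segment bytes themselves are unchanged.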

2.8.8. UDP summary

UDP is best understood as:

Send and hope for the best.

UDP segments may be:

lost;
duplicated;
delivered out of order;
delivered correctly.

UDP does not fix these problems. If the application needs more guarantees, it must build them itself.

2.9. TFTP and the Sorcerer’s Apprentice problem

The lecture used TFTP, the Trivial File Transfer Protocol, as a transition from UDP to TCP.

TFTP is a simple file-transfer protocol built on UDP. It tried to add reliability by using acknowledgements.

A simplified TFTP idea:

Sender sends block \(X\).
Receiver acknowledges block \(X\).
Sender sends block \(X+1\).
Receiver acknowledges block \(X+1\).

This looks simple, but an early design problem caused duplication.

2.9.1. The problem scenario

Suppose the sender sends block \(X+1\).

If the ACK is delayed, the sender may time out and retransmit \(X+1\).

Now the receiver may receive two copies of \(X+1\). If it acknowledges each copy, then each ACK may trigger the sender to send \(X+2\).

Then the receiver may get two copies of \(X+2\), acknowledge both, and the sender may send two copies of \(X+3\), and so on.

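In this simplified model, every later block and ACK is then sent twice for the rest of the transfer, and each further delayed ACK can add another duplicate.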

This was called the Sorcerer’s Apprentice syndrome.

2.9.2. Lesson

The lesson is that reliability is subtle.

It is not enough to simply say:

Every packet must be acknowledged.

The protocol must also handle:

delayed packets;
duplicate packets;
duplicate ACKs;
retransmissions;
state management;
loss recovery.

This motivates using a dedicated transport protocol like TCP instead of requiring every application to implement these mechanisms itself.

2.10. TCP: Transmission Control Protocol, Part 1

2.10.1. TCP’s abstraction

TCP provides a reliable, in-order byte stream.

This means:

TCP views data as a stream of bytes;
TCP does not preserve application message boundaries;
bytes should be delivered to the receiving application in the same order in which they were sent.

For example, if an application writes:

write("hello")
write("world")

the receiver may simply read a byte stream:

helloworld

TCP itself does not necessarily remember that the application made two writes.

2.10.2. Point-to-point communication

TCP is point-to-point:

one sender;
one receiver.

It is not one-to-many multicast.

2.10.3. Full duplex

TCP is full duplex.

One TCP connection supports data flow in both directions at the same time.

So a TCP connection between A and B contains:

A-to-B byte stream;
B-to-A byte stream.

Each direction has its own sequence number space.

2.10.4. Connection-oriented communication

TCP is connection-oriented.

Before normal data transfer, the endpoints perform a handshake to initialize connection state.

The state includes, for example:

sequence numbers;
acknowledgement numbers;
buffer/window information;
negotiated options;
connection state.

2.10.5. MSS: Maximum Segment Size

The MSS is the maximum amount of TCP payload data that can be sent in one TCP segment.

It should be smaller than the MTU, because the IP and TCP headers also need space.

For Ethernet with MTU 1500 bytes, a common TCP payload size is:

\[ 1500 - 20 - 20 = 1460 \]

assuming:

20 bytes IPv4 header;
20 bytes TCP header without options.

The MSS is about TCP payload size, not total frame size.
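The arithmetic above, and the way a large write is cut into MSS-sized payloads, can be sketched as follows. The function names are assumptions for this example, and the header sizes default to the option-free IPv4 and TCP headers used in the calculation above.

```python
def mss(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - ip_header - tcp_header


def split_into_segments(data: bytes, max_payload: int) -> list:
    """Cut a byte string into payloads of at most max_payload bytes."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


payloads = split_into_segments(bytes(3000), mss(1500))
print([len(p) for p in payloads])   # [1460, 1460, 80]
```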

2.10.6. TCP segment header

TCP has a much more complex header than UDP.

Important fields:

Field	Meaning
Source port	Sending-side port
Destination port	Receiving-side port
Sequence number	Byte number of first data byte in this segment
Acknowledgement number	Next byte expected from the other side
Header length / offset	Where payload starts
Checksum	Error detection
Receive window	Receiver-advertised free buffer space
Urgent pointer	Used with urgent data
Options	Variable-length TCP options
Flags	Control bits such as ACK, SYN, FIN, RST

Important flags:

Flag	Meaning
ACK	Acknowledgement field is valid
SYN	Synchronize sequence numbers; used in connection setup
FIN	Sender is finished sending in this direction
RST	Reset connection
PSH	Push data to application promptly
URG	Urgent data
ECN-related flags	Used for Explicit Congestion Notification
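To make the field layout concrete, here is a sketch that unpacks the fixed 20-byte part of a TCP header with `struct`. The function name and dictionary keys are assumptions for this example, options are ignored, and the flag bit positions follow the standard TCP header layout.

```python
import struct

TCP_HEADER = struct.Struct("!HHIIBBHHH")   # 20 bytes without options
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08,
         "ACK": 0x10, "URG": 0x20, "ECE": 0x40, "CWR": 0x80}


def parse_tcp_header(raw: bytes) -> dict:
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = TCP_HEADER.unpack_from(raw)
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": (offset_byte >> 4) * 4,   # data offset counts 32-bit words
        "flags": {name for name, bit in FLAGS.items() if flag_byte & bit},
        "window": window, "checksum": checksum, "urgent": urgent,
    }


# A SYN+ACK segment with a 20-byte header (data offset 5) and made-up values.
raw = TCP_HEADER.pack(443, 50000, 1000, 2000, 5 << 4, 0x12, 65535, 0, 0)
print(sorted(parse_tcp_header(raw)["flags"]))   # ['ACK', 'SYN']
```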

2.10.7. Sequence numbers

TCP sequence numbers count bytes, not segments.

The sequence number of a TCP segment is the byte-stream number of the first data byte in that segment.

Example:

If a sender starts with sequence number 500 and sends 200 bytes, then the next sequence number is:

\[ 500 + 200 = 700 \]

So the next segment should start with sequence number 700.

If the sender later sends only 32 bytes, the next sequence number increases by 32.

2.10.8. Sequence number space and sender window

The sender’s sequence number space can be divided into regions:

Region	Meaning
Sent and ACKed	Data already confirmed by receiver
Sent but not ACKed	Data in flight
Usable but not sent	Data that may still be sent within the window
Not usable	Data outside the current sending window

The usable window is limited by both:

flow control;
congestion control.

The sender may have multiple unacknowledged segments in flight. This is pipelining.

2.10.9. Acknowledgement numbers

The acknowledgement number means:

I have received all bytes before this number, and I expect this byte next.

Example:

If Host A sends 200 bytes starting at sequence number 500, then Host B sends:

\[ ACK = 700 \]

This means:

I received bytes 500 through 699. I now expect byte 700.

2.10.10. Cumulative ACKs

TCP ACKs are cumulative.

An ACK for byte \(N\) means:

All bytes before \(N\) have been received in order.

So if a receiver sends:

\[ ACK = 3000 \]

it means all bytes up to 2999 have been received in order.

If there is a gap, the receiver cannot cumulatively acknowledge beyond the gap.

2.10.11. Example: gap in the byte stream

Suppose TCP segments have payload length 500.

A sends:

Segment	Seq	Length
1	2500	500
2	3000	500
3	3500	500

If segment 2 is lost but segment 3 arrives, the receiver has:

bytes 2500–2999;
bytes 3500–3999;
but not bytes 3000–3499.

The receiver cannot send \(ACK=4000\), because that would falsely claim that all bytes before 4000 were received.

Instead, the receiver repeats:

\[ ACK = 3000 \]

This duplicate ACK tells the sender:

I am still waiting for byte 3000.
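The gap example can be checked with a few lines of Python. The helper `cumulative_ack` is a hypothetical name for this example. It takes the next expected byte and the `(seq, length)` pairs that have arrived, and returns the ACK number a cumulative-ACK receiver could send.

```python
def cumulative_ack(expected: int, segments) -> int:
    """ACK number after receiving the given (seq, length) pairs."""
    for seq, length in sorted(segments):
        if seq > expected:
            break                                # gap: cannot acknowledge past it
        expected = max(expected, seq + length)   # extend the in-order prefix
    return expected


# Segment 2 (seq 3000) is missing, so the ACK stays at 3000.
print(cumulative_ack(2500, [(2500, 500), (3500, 500)]))                # 3000
# Once segment 2 arrives, everything up to byte 3999 is covered.
print(cumulative_ack(2500, [(2500, 500), (3500, 500), (3000, 500)]))   # 4000
```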

2.10.12. Duplicate ACKs

A duplicate ACK is an ACK with the same acknowledgement number as before.

Duplicate ACKs usually mean:

later data arrived;
but some earlier data is missing;
therefore the receiver keeps asking for the same next byte.

TCP does not immediately retransmit after one duplicate ACK because packets can be reordered in the network.

The common rule is:

After three duplicate ACKs, TCP assumes loss and performs fast retransmit.

More precisely, the sender receives the original ACK plus three additional ACKs for the same acknowledgement number.

2.10.13. Sequence number examples from the lecture

Example 1:

A sends:

\[ Seq = 1500,\quad Length = 500 \]

B should ACK:

\[ ACK = 2000 \]

A then sends:

\[ Seq = 2000,\quad Length = 500 \]

B should ACK:

\[ ACK = 2500 \]

Example 2 with loss:

A sends:

\[ Seq = 2500,\quad Length = 500 \]

B ACKs:

\[ ACK = 3000 \]

A sends:

\[ Seq = 3000,\quad Length = 500 \]

but this is lost.

A sends:

\[ Seq = 3500,\quad Length = 500 \]

B receives this but sees a gap. It still sends:

\[ ACK = 3000 \]

because byte 3000 is still the next expected byte.

2.10.14. ACKs and pure ACK segments

Each TCP segment contains both:

a sequence number;
an acknowledgement number.

Even a pure ACK segment, with no application payload, still has a sequence number field.

However, a pure ACK with no data normally does not consume sequence number space. SYN and FIN are special: they each consume one sequence number.

2.10.15. Initial sequence numbers

TCP sequence numbers do not simply start at zero.

Historically, predictable initial sequence numbers enabled attacks: an attacker who could guess the sequence numbers in use could inject forged segments that the receiver would accept as part of the byte stream.

Modern TCP uses random initial sequence numbers.

Each direction has its own initial sequence number.

For a connection between A and B:

A chooses an initial sequence number for A-to-B data;
B chooses an initial sequence number for B-to-A data.

3. Lecture 8: TCP Part 2

Lecture 8 goes deeper into TCP.

Main topics:

TCP segment refresher;
sequence numbers and acknowledgements;
reliable data transfer;
RTT estimation and timeout;
retransmissions;
delayed ACKs;
fast retransmit;
flow control;
connection management.

3.1. TCP refresher

TCP is:

point-to-point;
reliable;
in-order;
byte-stream oriented;
full duplex;
connection-oriented;
flow controlled;
congestion controlled.

TCP provides the illusion of a reliable ordered stream on top of IP, even though IP itself is unreliable and packet-oriented.

3.2. TCP byte stream and message boundaries

TCP does not preserve message boundaries.

The application writes bytes. TCP sends bytes. The receiving application reads bytes.

If the application needs message boundaries, it must implement them itself, for example by:

fixed-size messages;
delimiters;
length prefixes;
application-level framing.

This is why protocols built on TCP often define their own framing format.
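One common choice is a length prefix. The sketch below frames each message with a 4-byte big-endian length and recovers the messages from a byte stream. The function names and the 4-byte prefix are assumptions for this example. An in-memory `io.BytesIO` stands in for the TCP connection.

```python
import io
import struct


def encode_frame(message: bytes) -> bytes:
    return struct.pack("!I", len(message)) + message   # 4-byte length, then data


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes; a stream may return fewer per read() call."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        data += chunk
    return data


def decode_frames(stream):
    while True:
        first = stream.read(1)
        if not first:
            return                                     # clean end of stream
        (length,) = struct.unpack("!I", first + read_exact(stream, 3))
        yield read_exact(stream, length)


# Two writes arrive as one undifferentiated byte stream ...
stream = io.BytesIO(encode_frame(b"hello") + encode_frame(b"world"))
# ... and the length prefixes restore the message boundaries.
print(list(decode_frames(stream)))   # [b'hello', b'world']
```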

3.3. Pipelining

If TCP sent only one segment and then waited for its ACK before sending the next one, throughput would be very poor, especially on long-delay links.

So TCP uses pipelining:

The sender may have multiple segments in flight before receiving ACKs.

The number of bytes allowed in flight depends on:

\[ \min(cwnd, rwnd) \]

where:

\(cwnd\) is the congestion window;
\(rwnd\) is the receive window.

3.4. Sequence numbers and ACKs: detailed examples

3.4.1. Telnet example

The lecture uses a simple Telnet-like example where the user types one character, \(C\).

Host A sends \(C\) to Host B:

\[ Seq = 42,\quad ACK = 79,\quad data = C \]

Since \(C\) is one byte, Host B acknowledges the next expected byte:

\[ ACK = 43 \]

Host B also echoes \(C\) back using its own sequence number space:

\[ Seq = 79,\quad ACK = 43,\quad data = C \]

Then Host A acknowledges the echoed byte:

\[ Seq = 43,\quad ACK = 80 \]

Important lesson:

the ACK number is the next expected byte from the other side;
each direction has its own sequence number space;
a one-byte payload increases the next sequence number by 1.

3.4.2. Larger segment example

Suppose A’s initial sequence number is 10, B’s initial sequence number is 200, and A sends 10-byte segments.

A sends:

\[ Seq = 10,\quad Len = 10 \]

B should ACK:

\[ ACK = 20 \]

A sends another 10 bytes:

\[ Seq = 20,\quad Len = 10 \]

B can ACK cumulatively:

\[ ACK = 30 \]

This says:

I have received everything before byte 30.

3.4.3. Loss example

A sends:

Segment	Seq	Length
1	10	10
2	20	10
3	30	10
4	40	10

Suppose segment 2 is lost, but segments 3 and 4 arrive.

The receiver cannot ACK 50, because bytes 20–29 are missing.

It repeatedly sends:

\[ ACK = 20 \]

These are duplicate ACKs.

If the missing segment is later retransmitted and the receiver has buffered the out-of-order segments, then it can ACK:

\[ ACK = 50 \]

If the receiver did not buffer the out-of-order data, it can only send \(ACK=30\) after segment 2 is retransmitted, and segments 3 and 4 must be retransmitted as well before they can be acknowledged. The TCP specification does not fully mandate how out-of-order data must be buffered; that is implementation-specific.
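The difference between a buffering and a non-buffering receiver can be illustrated with a small model. The class `Receiver` and its flag `buffer_out_of_order` are assumptions for this example. The model tracks only sequence numbers and lengths, not actual data.

```python
class Receiver:
    """Toy TCP receiver that returns the cumulative ACK for each segment."""

    def __init__(self, expected: int, buffer_out_of_order: bool = True):
        self.expected = expected
        self.buffering = buffer_out_of_order
        self.held = {}                     # seq -> length of out-of-order segments

    def receive(self, seq: int, length: int) -> int:
        if seq == self.expected:
            self.expected += length
            # Drain buffered segments that are now in order.
            while self.expected in self.held:
                self.expected += self.held.pop(self.expected)
        elif seq > self.expected and self.buffering:
            self.held[seq] = length        # keep it until the gap is filled
        return self.expected               # cumulative ACK number


arrivals = [(10, 10), (30, 10), (40, 10), (20, 10)]   # segment 2 arrives last

r = Receiver(10)
print([r.receive(s, n) for s, n in arrivals])    # [20, 20, 20, 50]

r = Receiver(10, buffer_out_of_order=False)
print([r.receive(s, n) for s, n in arrivals])    # [20, 20, 20, 30]
```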

3.5. RTT and timeout

3.5.1. Why timeout is needed

TCP needs to decide when a segment is probably lost.

If an ACK does not arrive, the sender eventually retransmits.

The timeout must be chosen carefully.

If the timeout is too short:

TCP retransmits unnecessarily;
this wastes bandwidth;
it may worsen congestion.

If the timeout is too long:

TCP waits too long before repairing loss;
performance suffers.

3.5.2. Round-trip time

The round-trip time, RTT, is the time from sending a segment to receiving the corresponding acknowledgement.

\[ RTT = \text{time from segment transmission to ACK receipt} \]

RTT varies because of:

queueing delay;
route changes;
changing network load;
processing delay;
propagation delay.

The most variable part is usually queueing delay.

3.5.3. SampleRTT

TCP can measure a sample RTT:

\[ SampleRTT = \text{ACK receipt time} - \text{segment send time} \]

However, a single sample is noisy.

So TCP uses a smoothed estimate.

3.5.4. EstimatedRTT

TCP estimates RTT using an exponentially weighted moving average:

\begin{equation*} EstimatedRTT = (1-\alpha) \cdot EstimatedRTT + \alpha \cdot SampleRTT \end{equation*}

Typical value:

\[ \alpha = 0.125 = \frac{1}{8} \]

This makes recent samples matter more than old samples.

The influence of an old sample decreases exponentially over time.

3.5.5. Why use powers of two?

The lecture mentions that values such as \(1/8\) and \(1/4\) are convenient because they can be implemented efficiently using bit shifts.

This matters because TCP operations must be fast inside the operating system.
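As a sketch of the bit-shift idea, the update with \(\alpha = 1/8\) can be rewritten as "old estimate plus one eighth of the difference", where the division by 8 becomes a right shift by 3. The function name and the choice of integer microseconds are assumptions for this example.

```python
def ewma_shift(estimated_us: int, sample_us: int) -> int:
    """EstimatedRTT update with alpha = 1/8, using a shift instead of a division."""
    # (1 - 1/8) * estimated + 1/8 * sample == estimated + (sample - estimated) / 8
    return estimated_us + ((sample_us - estimated_us) >> 3)


print(ewma_shift(800, 1600))   # 900
print(ewma_shift(1600, 800))   # 1500
```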

3.5.6. DevRTT

TCP also estimates how much RTT varies.

\begin{equation*} DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT| \end{equation*}

Typical value:

\[ \beta = 0.25 = \frac{1}{4} \]

If RTT varies a lot, TCP needs a larger safety margin.

3.5.7. Timeout interval

The timeout interval is:

\[ TimeoutInterval = EstimatedRTT + 4 \cdot DevRTT \]

The factor 4 is a safety margin.

So the timeout adapts to both:

the average RTT;
the variability of RTT.
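The three formulas can be combined into a small estimator. This is a sketch under stated assumptions: the class name is made up, the deviation is updated before the estimate, and the initial deviation is set to half the first sample. Both conventions are borrowed from RFC 6298 and are not prescribed by the lecture.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha, self.beta = alpha, beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumption, following RFC 6298

    def update(self, sample_rtt: float) -> float:
        # Deviation first, using the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(100.0)      # milliseconds
print(est.timeout_interval())  # 300.0
print(est.update(120.0))       # 272.5
```

With a first sample of 100 ms, a second sample of 120 ms moves the estimate only slightly to 102.5 ms, while the deviation term shrinks from 50 ms to 42.5 ms.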

3.6. Retransmission ambiguity and Karn’s algorithm

3.6.1. The ambiguity problem

Suppose TCP sends a segment and then retransmits it because of timeout.

Later an ACK arrives.

Question:

Does the ACK correspond to the original segment or the retransmitted segment?

The sender often cannot tell.

Therefore, the RTT sample would be ambiguous.

3.6.2. Karn’s RTT estimator

Karn’s rule:

do not take RTT samples for retransmitted segments;
keep the backed-off (lengthened) timeout that was set when the segment was retransmitted;
resume RTT sampling only once a segment that was not retransmitted is acknowledged.

This avoids corrupting the RTT estimator with ambiguous samples.
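A minimal sketch of this rule follows. The class `KarnTimer` and the callback `recompute` are hypothetical names, and doubling the timeout on each expiry is an assumption for this example, since the notes only speak of a "backed-off" timeout.

```python
class KarnTimer:
    def __init__(self, timeout: float, recompute):
        self.timeout = timeout
        self.recompute = recompute     # hypothetical callback: RTT sample -> new timeout

    def on_timeout(self) -> float:
        self.timeout *= 2              # back off (doubling is assumed here)
        return self.timeout

    def on_ack(self, measured_rtt: float, was_retransmitted: bool) -> float:
        if not was_retransmitted:
            # Unambiguous sample: feed it to the estimator.
            self.timeout = self.recompute(measured_rtt)
        # Otherwise discard the sample and keep the backed-off timeout.
        return self.timeout


timer = KarnTimer(1.0, recompute=lambda rtt: 2 * rtt)   # toy estimator
timer.on_timeout()                                      # timeout is now 2.0
print(timer.on_ack(0.5, was_retransmitted=True))        # 2.0
print(timer.on_ack(0.25, was_retransmitted=False))      # 0.5
```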

3.6.3. TCP timestamp option

The lecture also mentions that modern TCP can use timestamp options.

With timestamps, the sender includes a timestamp in a segment, and the receiver echoes it back. This can help obtain more accurate RTT samples and reduce ambiguity.

3.7. TCP reliable data transfer

TCP builds reliable data transfer on top of unreliable IP.

It uses:

sequence numbers;
cumulative ACKs;
retransmission timers;
duplicate ACKs;
retransmission;
buffering;
sliding windows.

3.8. Simplified TCP sender

A simplified TCP sender maintains:

\(SendBase\): first unacknowledged byte;
\(NextSeqNum\): next byte number to use when sending new data;
a retransmission timer.

Initial state:

\[ SendBase = InitialSeqNum \]

\[ NextSeqNum = InitialSeqNum \]

3.8.1. Event: data received from application

When data arrives from the application:

create a TCP segment;
set the sequence number to \(NextSeqNum\);
pass the segment to IP;
update: \[ NextSeqNum = NextSeqNum + length(data) \]
start the timer if it is not already running.

The timer is conceptually associated with the oldest unacknowledged segment.

3.8.2. Event: ACK received

Suppose an ACK with value \(y\) arrives.

If:

\[ y > SendBase \]

then this ACK acknowledges new data.

Update:

\[ SendBase = y \]

If there are still unacknowledged segments, restart the timer. Otherwise, stop the timer.

If the ACK does not acknowledge new data, it may be a duplicate ACK.

3.8.3. Event: timeout

On timeout:

retransmit the segment with the smallest unacknowledged sequence number;
restart the timer.
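The three events can be sketched as methods of a small class. This is an illustration under simplifying assumptions, not a real TCP sender: the timer is only a boolean flag, duplicate ACKs are ignored, and the list `wire` is a hypothetical stand-in for handing segments to IP.

```python
class SimpleTcpSender:
    def __init__(self, initial_seq: int):
        self.send_base = initial_seq    # first unacknowledged byte
        self.next_seq = initial_seq     # next byte number for new data
        self.unacked = {}               # seq -> payload, kept for retransmission
        self.timer_running = False
        self.wire = []                  # (seq, payload) pairs "passed to IP"

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq] = data
        self.wire.append((self.next_seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True   # start the timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:          # the ACK acknowledges new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Restart the timer if data is still outstanding, else stop it.
            self.timer_running = bool(self.unacked)
        # else: possibly a duplicate ACK; ignored in this simplified sender

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)     # smallest unacknowledged sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True   # restart the timer


sender = SimpleTcpSender(92)
sender.on_app_data(b"A" * 8)     # bytes 92-99
sender.on_app_data(b"B" * 20)    # bytes 100-119
sender.on_timeout()              # retransmits the segment starting at 92
sender.on_ack(120)               # one cumulative ACK covers both segments
print(sender.send_base, sender.timer_running)   # 120 False
```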

3.9. Retransmission scenarios

3.9.1. Lost ACK

A sends:

\[ Seq = 92,\quad Len = 8 \]

B receives it and sends:

\[ ACK = 100 \]

If the ACK is lost, A eventually times out and retransmits the same data.

B receives duplicate data, but can send \(ACK=100\) again.

The data is not delivered twice to the application as new data, because TCP can recognize the duplicate sequence numbers.

3.9.2. Premature timeout

A segment may not be lost, but the sender’s timeout may fire too early.

Then the sender retransmits unnecessarily.

The receiver may get duplicate data and ACK accordingly.

This is why choosing a timeout that is too short is harmful.

3.9.3. Cumulative ACK advantage

Suppose the ACK for an earlier segment is lost, but a later cumulative ACK arrives.

The later ACK may still acknowledge all earlier data.

For example:

A sends bytes 92–99;
B sends \(ACK=100\), but it is lost;
A sends bytes 100–119;
B sends \(ACK=120\).

The ACK 120 implies that bytes up to 119 were received, so ACK 100 is no longer needed.

This is the advantage of cumulative ACKs.

3.10. TCP ACK generation

The receiver’s ACK behavior is not always “ACK every segment immediately”.

The lecture discusses common rules.

3.10.1. In-order segment, no pending ACK

If an in-order segment arrives with the expected sequence number, and all previous data is already ACKed:

use delayed ACK;
wait briefly for another segment;
if no next segment arrives, send an ACK.

The slides mention waiting up to 500 ms; the lecture notes that modern systems often use smaller values such as around 200 ms.

3.10.2. In-order segment, one ACK already pending

If another in-order segment arrives while an ACK is pending:

immediately send one cumulative ACK;
this ACK covers both in-order segments.

This reduces ACK overhead.

3.10.3. Out-of-order segment

If a segment arrives with a higher-than-expected sequence number, there is a gap.

The receiver immediately sends a duplicate ACK indicating the next expected byte.

Example:

Expected:

\[ Seq = 200 \]

Received:

\[ Seq = 300 \]

The receiver sends:

\[ ACK = 200 \]

3.10.4. Segment fills a gap

If a segment arrives and partially or completely fills a gap, the receiver immediately sends an ACK, provided the segment starts at the lower end of the gap.

This helps the sender quickly learn that recovery succeeded.
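The four rules can be summarized as a decision function. This is a simplified sketch: the function name, the parameters, and the returned strings are assumptions for this example. The last branch, for old duplicate data, is based on the lost-ACK scenario above rather than on these four rules.

```python
def ack_action(seq: int, expected: int, ack_pending: bool, gap_exists: bool) -> str:
    """Decide how the receiver reacts to an arriving segment.

    gap_exists means the receiver already holds out-of-order data
    beyond the next expected byte.
    """
    if seq > expected:
        return "send duplicate ACK immediately"    # out-of-order segment
    if seq == expected and gap_exists:
        return "send ACK immediately"              # fills the lower end of a gap
    if seq == expected and ack_pending:
        return "send cumulative ACK immediately"   # covers two in-order segments
    if seq == expected:
        return "delay ACK"                         # wait briefly for another segment
    return "send ACK immediately"                  # old duplicate data: re-ACK


print(ack_action(seq=300, expected=200, ack_pending=False, gap_exists=False))
```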

3.11. Delayed ACKs

Delayed ACKs reduce overhead.

Instead of ACKing every segment, TCP can wait and ACK multiple received segments with one cumulative ACK.

However, delayed ACKs must not wait forever. If no second segment arrives within the delayed-ACK timeout, the receiver sends an ACK anyway.

Reason:

otherwise the sender may wait unnecessarily;
or the sender may time out and retransmit even though the data arrived.

3.12. Fast retransmit

Timeouts can be long because TCP is designed to work over many different network environments, including very long-delay paths.

Waiting for timeout can be slow.

TCP therefore also uses duplicate ACKs to infer loss.

3.12.1. Triple duplicate ACK rule

If the sender receives three duplicate ACKs for the same acknowledgement number, it assumes that the missing segment was lost.

Then it retransmits the oldest unacknowledged segment without waiting for the timeout.

This is called fast retransmit.

Important exam detail:

It is not enough to receive one duplicate ACK. TCP fast retransmit uses three duplicate ACKs.

That means four ACKs with the same ACK number in total:

the original ACK;
first duplicate ACK;
second duplicate ACK;
third duplicate ACK.

After the third duplicate ACK, the sender retransmits.
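The counting rule can be sketched in Python as follows. The function name `fast_retransmit_points` and the ACK trace are made up for illustration. The sketch is simplified: it treats any repeated ACK number as a duplicate, whereas real TCP stacks apply extra conditions before counting an ACK as a duplicate.

```python
def fast_retransmit_points(acks):
    """Return the indices in `acks` at which fast retransmit would fire."""
    last_ack = None
    dup_count = 0
    fired = []
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1          # another duplicate of the same ACK number
            if dup_count == 3:      # third duplicate = fourth identical ACK
                fired.append(i)
        else:
            last_ack = ack          # new ACK number: this is the "original" ACK
            dup_count = 0
    return fired


# Original ACK 200 at index 1, duplicates at indices 2, 3 and 4.
acks = [100, 200, 200, 200, 200, 600]
print(fast_retransmit_points(acks))  # [4]
```

The retransmission fires at index 4, the fourth ACK carrying the number 200, which matches the "four ACKs in total" count above.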

3.12.2. Why not retransmit after one duplicate ACK?

Because packets can be reordered.

A single out-of-order packet may just mean that the network delivered packets in a different order, not that a segment was lost.

Three duplicate ACKs are stronger evidence.

3.13. Flow control

Flow control prevents the sender from overwhelming the receiver.

3.13.1. Receiver buffer

At the receiver, incoming TCP data is placed into a receive buffer.

The application process reads data from this buffer.

If the application reads slowly, the buffer can fill up.

If the sender keeps sending data after the buffer is full, the receiver would need to drop data.

TCP avoids this through flow control.

3.13.2. Receive window

The receiver advertises available buffer space using the receive window field, often written:

\[ rwnd \]

The receive window tells the sender how many more bytes the receiver is currently willing to accept.

If the receive buffer has:

total size \(RcvBuffer\);
currently buffered data \(BufferedData\);

then approximately:

\[ rwnd = RcvBuffer - BufferedData \]

The sender limits unacknowledged in-flight data to at most \(rwnd\), unless congestion control is even more restrictive.

3.13.3. Dynamic receiver window

The receive window changes over time.

It increases when the application reads data from the buffer.

It decreases when new data arrives and fills buffer space.

Many operating systems auto-adjust receive buffer sizes. The default may start small, for example around 4096 bytes, and grow if necessary.
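To see how \(rwnd = RcvBuffer - BufferedData\) moves over time, here is a small Python sketch. The class `ReceiveBuffer` and its method names are hypothetical, and the buffer size is fixed (no auto-tuning), which is a simplification.

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer with a fixed total size."""

    def __init__(self, rcv_buffer):
        self.rcv_buffer = rcv_buffer   # total buffer size in bytes
        self.buffered_data = 0         # bytes received but not yet read

    @property
    def rwnd(self):
        # Advertised receive window: the free space left in the buffer.
        return self.rcv_buffer - self.buffered_data

    def on_data(self, nbytes):
        # Arriving data fills the buffer; anything beyond rwnd does not fit.
        accepted = min(nbytes, self.rwnd)
        self.buffered_data += accepted
        return accepted

    def on_app_read(self, nbytes):
        # The application reading data frees buffer space again.
        read = min(nbytes, self.buffered_data)
        self.buffered_data -= read
        return read


buf = ReceiveBuffer(4096)
buf.on_data(3000)
print(buf.rwnd)  # 1096
buf.on_app_read(2000)
print(buf.rwnd)  # 3096
```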

3.13.4. Why not allocate huge buffers?

Although modern machines have lots of memory, kernel buffer memory is still a limited resource.

If the system allocates too much buffer memory for many connections, it can run out of kernel memory.

Therefore, buffer sizes are managed carefully.

3.13.5. Zero receive window

If the receiver advertises:

\[ rwnd = 0 \]

then the sender cannot send normal data.

But if the receiver later frees buffer space, how does the sender learn this?

The receiver may not have a new data segment to send.

Therefore, the sender periodically sends small zero-window probes, often carrying one byte, to test whether the receive window has opened again.

If the receiver still has no space, it acknowledges the old next expected byte. The sender backs off and probes again later.
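The "back off and probe again" behaviour can be illustrated with a probe schedule. This Python sketch assumes exponential backoff with a cap. The function name `probe_times` and the concrete values (1 second first interval, 8 second cap) are illustrative assumptions, not values from the lecture.

```python
def probe_times(first_interval, max_interval, count):
    """Return the times (in seconds) at which zero-window probes are sent."""
    times = []
    t = 0.0
    interval = first_interval
    for _ in range(count):
        t += interval
        times.append(t)
        # Assumed policy: double the wait after each unanswered probe,
        # but never wait longer than max_interval.
        interval = min(interval * 2, max_interval)
    return times


print(probe_times(1.0, 8.0, 5))  # [1.0, 3.0, 7.0, 15.0, 23.0]
```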

3.14. Sliding window protocol

TCP is a sliding-window protocol.

For window size \(n\), the sender can have up to \(n\) bytes outstanding without receiving an acknowledgement.

When data is acknowledged, the window slides forward.

Sender-side window regions:

Region	Meaning
Sent and ACKed	Done
Sent but not ACKed	In flight
Not yet sent but allowed	Can be sent
Not usable	Must wait

Receiver-side regions:

Region	Meaning
ACKed but not delivered to user	In receive buffer
Not yet ACKed	May arrive next
Receive window	Free buffer space
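The sender-side regions above can be expressed as a classification of byte numbers. In this Python sketch the names `snd_una` (oldest unacknowledged byte) and `snd_nxt` (next byte to send) are borrowed labels and the function `sender_region` is hypothetical. The sketch ignores sequence number wrap-around.

```python
def sender_region(byte_no, snd_una, snd_nxt, window):
    """Classify one byte of the sender's stream into a window region."""
    if byte_no < snd_una:
        return "sent and ACKed"
    if byte_no < snd_nxt:
        return "sent but not ACKed"
    if byte_no < snd_una + window:
        return "not yet sent but allowed"
    return "not usable"


# Window of 1000 bytes starting at byte 1000, with 500 bytes in flight.
for b in (999, 1200, 1800, 2000):
    print(b, sender_region(b, snd_una=1000, snd_nxt=1500, window=1000))
```

When an ACK advances `snd_una`, the upper edge `snd_una + window` moves with it, which is the "sliding" part of the protocol.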

3.15. Silly window syndrome

Silly window syndrome happens when TCP sends many tiny segments because only a small amount of window space opens repeatedly.

This is inefficient because each small segment carries TCP/IP header overhead.

A common mitigation is to limit the number of segments smaller than MSS that are in flight.

The lecture states the rule informally as:

Do not allow too many packets smaller than MSS in flight; limit them strongly, for example to one per RTT.
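One way to read this rule is as a simple send check: a full-sized segment may always go out, while a small one must wait if another small segment is still unacknowledged. The Python sketch below is only an interpretation of the informal rule, similar in spirit to Nagle's algorithm. The function `may_send` and its parameters are hypothetical.

```python
def may_send(segment_len, mss, small_segment_in_flight):
    """Decide whether a segment may be sent now under the small-segment limit."""
    if segment_len >= mss:
        return True                      # full-sized segments are not restricted
    # Assumed policy: allow at most one sub-MSS segment in flight at a time.
    return not small_segment_in_flight


print(may_send(1460, 1460, small_segment_in_flight=True))   # True
print(may_send(10, 1460, small_segment_in_flight=True))     # False
print(may_send(10, 1460, small_segment_in_flight=False))    # True
```

Since an ACK takes about one RTT to return, this check allows roughly one small segment per RTT.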

3.16. Ideal window size and bandwidth-delay product

The ideal amount of data in flight is related to the bandwidth-delay product:

\[ BDP = RTT \cdot \text{bottleneck bandwidth} \]

Interpretation:

The bandwidth-delay product is roughly the amount of data needed to fill the network path.

If:

\[ window < BDP \]

then the sender may waste available bandwidth.

If:

\[ window > BDP \]

then the sender may put too much data into the network, causing queues, increased RTT, and eventually loss.

In practice, one often wants to operate below the full BDP because other users also share the network.
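A short worked example makes the sizes concrete. The numbers (50 ms RTT, 100 Mbit/s bottleneck, 65,535-byte window) are illustrative assumptions, and the helper names are hypothetical. The second function uses the fact that a sender limited to one window per RTT cannot exceed window/RTT.

```python
def bdp_bytes(rtt_ms, bandwidth_bps):
    """Bandwidth-delay product in bytes (RTT in ms, bandwidth in bit/s)."""
    return rtt_ms * bandwidth_bps // (8 * 1000)


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


print(bdp_bytes(50, 100_000_000))       # 625000
print(max_throughput_bps(65_535, 50))   # 10485600
```

With these numbers the path holds about 625 kB, so a 65,535-byte window reaches only about 10.5 Mbit/s of the 100 Mbit/s bottleneck. This is the \(window < BDP\) case.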

3.17. Connection management

TCP needs connection management because both sides must agree on state before data transfer.

The state includes:

connection state;
initial sequence numbers;
receive buffer sizes;
options;
window scaling;
other negotiated parameters.

3.18. Why a two-way handshake is not enough

A two-way handshake might look like:

Client -> Server: Let's talk
Server -> Client: OK

But this is not reliable enough in a real network because:

messages can be delayed;
messages can be lost;
messages can be retransmitted;
messages can be reordered;
one side cannot directly see the other side’s current state.

A delayed old connection request may arrive after the client has already gone away. The server may then create half-open state for a client that no longer exists.

The lecture’s human analogy:

If I say “hello” and you say “hello” back, you still do not know whether I heard your response unless I acknowledge it again.

So TCP uses a three-way handshake.

3.19. TCP three-way handshake

The TCP handshake uses SYN and ACK flags.

Let the client choose initial sequence number \(x\).

Let the server choose initial sequence number \(y\).

3.19.1. Step 1: Client sends SYN

\[ Client \rightarrow Server: SYN=1,\quad Seq=x \]

The ACK flag is not set because the client does not yet know the server’s sequence number.

The client enters the SYN-SENT state.

3.19.2. Step 2: Server sends SYN-ACK

\[ Server \rightarrow Client: SYN=1,\quad ACK=1,\quad Seq=y,\quad ACKnum=x+1 \]

The server acknowledges the client’s SYN.

Important:

A SYN consumes one sequence number.

So the ACK for \(Seq=x\) is \(x+1\), even if the SYN carries no application data.

The server also sends its own SYN with sequence number \(y\).

3.19.3. Step 3: Client sends ACK

\[ Client \rightarrow Server: ACK=1,\quad Seq=x+1,\quad ACKnum=y+1 \]

This acknowledges the server’s SYN.

After this, both sides know that the other side is alive and has received the necessary initial state.

3.19.4. Data after handshake

After the handshake:

client data starts at sequence number \(x+1\);
server data starts at sequence number \(y+1\).

Example:

If:

\[ x = 1000 \]

\[ y = 200 \]

Then:

Client sends:

\[ SYN,\ Seq=1000 \]

Server sends:

\[ SYN+ACK,\ Seq=200,\ ACKnum=1001 \]

Client sends:

\[ ACK,\ Seq=1001,\ ACKnum=201 \]

If the client then sends 1000 bytes, it uses:

\[ Seq=1001,\quad Len=1000 \]

The next client sequence number becomes:

\[ 2001 \]
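The same arithmetic can be checked in a few lines of Python. This is a sketch of the numbering only: the function `three_way_handshake` and the dictionary layout are hypothetical, and no real packets are sent.

```python
def three_way_handshake(x, y):
    """Return the three handshake segments for initial sequence numbers x and y."""
    syn = {"flags": "SYN", "seq": x}
    # The SYN consumes one sequence number, so the server ACKs x + 1.
    syn_ack = {"flags": "SYN+ACK", "seq": y, "ack": x + 1}
    # The server's SYN also consumes one, so the client ACKs y + 1.
    ack = {"flags": "ACK", "seq": x + 1, "ack": y + 1}
    return [syn, syn_ack, ack]


segments = three_way_handshake(1000, 200)
for seg in segments:
    print(seg)

first_data_seq = segments[2]["seq"]      # client data starts at x + 1
next_seq = first_data_seq + 1000         # after sending 1000 bytes
print(first_data_seq, next_seq)          # 1001 2001
```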

3.20. Closing a TCP connection

TCP is full duplex, so each direction is closed separately.

A FIN means:

I am done sending data in this direction.

But the other side may still send data.

3.20.1. FIN consumes sequence number space

Like SYN, FIN consumes one sequence number.

If a side sends a FIN segment that carries no data:

\[ FIN,\ Seq=x \]

the ACK is:

\[ ACK = x+1 \]

3.20.2. Typical close sequence

A typical close:

Client sends FIN.
Server ACKs the FIN.
Server may continue sending remaining data.
Server later sends FIN.
Client ACKs the server’s FIN.
Client waits in TIME-WAIT for \(2 \times MSL\).

Here \(MSL\) means maximum segment lifetime.
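The close sequence can be traced from the client's side as a small state machine. The state names come from the standard TCP state diagram (only TIME-WAIT is named above). The table `CLIENT_TRANSITIONS`, the event strings, and the function `run` are hypothetical, and the sketch covers only this one typical path.

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",       # server may still send data
    ("FIN-WAIT-2", "recv FIN"): "TIME-WAIT",        # client ACKs the server's FIN
    ("TIME-WAIT", "2*MSL timer expires"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply the events in order and return the list of visited states."""
    trace = [state]
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


print(run(["send FIN", "recv ACK", "recv FIN", "2*MSL timer expires"]))
```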

3.20.3. Why TIME-WAIT?

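The endpoint that sends the final ACK waits for two reasons: if that ACK is lost, the peer retransmits its FIN and the endpoint can still answer it; and old packets from the connection get time to disappear from the network.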

The wait is often:

\[ 2 \times MSL \]

This helps prevent old duplicate packets from being confused with a later connection using the same socket tuple.

3.20.4. FIN vs RST

FIN is graceful:

one side says it has no more data to send;
the other direction may still continue;
state is cleaned up carefully.

RST is abrupt:

it resets the connection;
it is used for errors or unexpected packets;
it tells the other side to discard connection state.

The lecture notes that historically, servers sometimes used RST instead of graceful FIN in some HTTP scenarios to avoid the cost of keeping many connections in TIME-WAIT or half-open states.

3.21. Important takeaways

3.21.1. UDP

UDP is simple, fast, and connectionless.

It gives no reliability, ordering, flow control, or congestion control.

It is suitable when:

the application can tolerate loss;
low delay matters;
the protocol is simple request-response;
reliability is implemented elsewhere.

3.21.2. TCP

TCP provides reliable, in-order byte-stream communication.

It uses:

sequence numbers;
cumulative acknowledgements;
retransmissions;
timers;
duplicate ACKs;
flow control;
congestion control;
connection setup and teardown.

3.21.3. Sequence numbers and ACKs

The most important rule:

\[ ACK = Seq + Length \]

for normal data, because the ACK number is the next byte expected.

But remember:

sequence numbers count bytes, not segments;
SYN and FIN each consume one sequence number;
pure ACKs normally do not consume sequence number space;
each direction has its own sequence number space.
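These rules fit into one small formula: the expected ACK number is the sequence number plus the payload length, plus one for each SYN or FIN flag. The Python helper below is a sketch with a hypothetical name, `next_expected_ack`, and it ignores sequence number wrap-around.

```python
def next_expected_ack(seq, length, syn=False, fin=False):
    """ACK number that acknowledges a segment completely."""
    # Payload bytes count one each; SYN and FIN count one each as well.
    return seq + length + int(syn) + int(fin)


print(next_expected_ack(1000, 0, syn=True))   # 1001 (SYN, no data)
print(next_expected_ack(1001, 1000))          # 2001 (1000 data bytes)
print(next_expected_ack(1001, 0))             # 1001 (pure ACK consumes nothing)
print(next_expected_ack(5000, 0, fin=True))   # 5001 (FIN, no data)
```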

3.21.4. Reliable transfer

TCP repairs loss using:

timeout-based retransmission;
fast retransmit after three duplicate ACKs.

Cumulative ACKs make ACK loss less serious because later ACKs can cover earlier data.

3.21.5. Flow control

Flow control is receiver protection.

The receiver advertises:

\[ rwnd \]

The sender ensures that in-flight unacknowledged data does not exceed the receiver’s available buffer space.

3.21.6. Congestion control

Congestion control is network protection.

It was introduced but not fully covered in these lectures. The next lecture will focus on it.

The amount of data the sender may actually have in flight is limited by the smaller of the two windows:

\[ \min(cwnd, rwnd) \]

where \(cwnd\) protects the network and \(rwnd\) protects the receiver.
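A minimal Python sketch shows how the two windows combine into the amount the sender may still transmit. The function name `usable_window` and the example values are hypothetical.

```python
def usable_window(cwnd, rwnd, in_flight):
    """Bytes the sender may still send given both windows and data in flight."""
    effective = min(cwnd, rwnd)          # the stricter of the two limits applies
    return max(0, effective - in_flight)


print(usable_window(cwnd=8000, rwnd=3000, in_flight=1000))    # 2000 (receiver-limited)
print(usable_window(cwnd=2000, rwnd=10000, in_flight=2500))   # 0 (network-limited)
```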

3.21.7. Connection management

A two-way handshake is not enough because of delayed, reordered, or retransmitted messages.

TCP uses a three-way handshake:

\[ SYN \]

\[ SYN + ACK \]

\[ ACK \]

Connection closing is separate in the two directions and uses FIN/ACK, with RST for abrupt error handling.