Application Layer 3: P2P, Multimedia, Streaming, VoIP, Sockets

1. Peer-to-peer architecture

1.1. Client-server model as the baseline

Until now, most application-layer examples were based on a client-server model.

In the client-server model:

A client initiates communication.
A server provides the requested service.
The server might serve web content, accept email, deliver files, etc.
The server is usually the central provider of the resource.

This works well, but the server can become a bottleneck.

1.2. Basic idea of P2P

Peer-to-peer changes the model:

Every participant can act both as a client and as a server.
A peer downloads data from others.
Once it has some data, it can upload that data to other peers.
Thus, bandwidth can be shared across participants.

In other words, peers:

request service from other peers,
provide service to other peers,
contribute upload capacity while also creating download demand.

This gives P2P a form of self-scalability:

When new peers join, demand increases.
But the available upload capacity also increases.

1.3. Why P2P is attractive

P2P can reduce load on a central server.

In a traditional client-server file distribution system:

The server must upload the file to every client.
If many clients request the file, the server’s upload capacity can become the bottleneck.

In P2P:

In the best case, the server needs to upload only one complete copy.
Peers can then redistribute chunks of the file.
The aggregate upload capacity of all peers can help distribute the file faster.

1.4. Practical problems of P2P

The theory is attractive, but real P2P systems face several difficulties.

1.4.1. Reachability

P2P assumes that peers can directly communicate with each other.

In practice, this is difficult because:

IPv4 has only \(2^{32}\) possible addresses, about 4 billion.
There are more people than IPv4 addresses.
Many people have multiple Internet-connected devices.
NAT and firewalls break the original end-to-end reachability assumption of the Internet.

IPv6 provides many more addresses, but it is not universally deployed.

1.4.2. Firewalls and NAT

Even if addressing works, users may not want their devices to be reachable from the public Internet.

Firewalls often block incoming traffic.

For example, older P2P games sometimes required users to:

open a specific firewall port,
configure their router,
or use VPN-like tools such as Hamachi.

This shows that P2P is conceptually simple but operationally complex.

1.4.3. Churn

Peers are not always online.

A peer may:

shut down the laptop,
suspend the device,
disconnect from the network,
change IP address,
leave after finishing the download.

This changing set of available peers is called churn.

P2P systems must constantly adapt to churn.

2. File distribution: client-server vs P2P

2.1. Setup

Assume:

A file has size \(F\).
There is one server.
There are \(N\) peers that want the file.
The server upload capacity is \(u_s\).
Peer \(i\)’s upload capacity is \(u_i\).
Peer \(i\)’s download capacity is \(d_i\).
The slowest peer download rate is: \[ d_{\min} = \min_i d_i \]

2.2. Client-server distribution time

In the client-server model, the server must upload \(N\) copies of the file.

Time for the server to upload one copy: \[ \frac{F}{u_s} \]

Time for the server to upload \(N\) copies: \[ \frac{NF}{u_s} \]

Each client must also download one copy.

The slowest client needs at least: \[ \frac{F}{d_{\min}} \]

Therefore, the distribution time is bounded by: \[ D_{c-s} \geq \max\left\{\frac{NF}{u_s}, \frac{F}{d_{\min}}\right\} \]

Important consequence:

The term \(\frac{NF}{u_s}\) grows linearly with \(N\).
If many clients want the file, the server upload capacity becomes a bottleneck.

2.3. P2P distribution time

In P2P, the server must still upload at least one copy of the file:

\[ \frac{F}{u_s} \]

Each client must still download the file:

\[ \frac{F}{d_{\min}} \]

But now peers can also upload to each other.

The total upload capacity is: \[ u_s + \sum_i u_i \]

The whole system must collectively deliver \(NF\) bits.

Thus: \[ D_{P2P} \geq \max\left\{ \frac{F}{u_s}, \frac{F}{d_{\min}}, \frac{NF}{u_s + \sum_i u_i} \right\} \]

Important consequence:

\(NF\) still grows with \(N\).
But the denominator also grows because new peers bring new upload capacity.
Therefore, P2P can scale better than client-server distribution.

2.4. Intuition from the example

The slide example assumes:

each client upload rate is \(u\),
\(F/u = 1\) hour,
server upload rate \(u_s = 10u\),
\(d_{\min} \geq u_s\).

The graph shows:

client-server distribution time increases linearly with the number of clients (\(N/10\) hours for \(N\) clients under these assumptions),
P2P grows much more slowly and stays below one hour, because every peer contributes upload capacity.
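The two lower bounds can be evaluated directly. The following is a minimal sketch under the slide's assumptions; the function names are made up for illustration, and \(d_{\min}\) is set equal to \(u_s\), the smallest value the example allows.

```python
def client_server_time(F, N, u_s, d_min):
    """Lower bound on the client-server distribution time."""
    return max(N * F / u_s, F / d_min)


def p2p_time(F, u_s, d_min, peer_uploads):
    """Lower bound on the P2P distribution time for len(peer_uploads) peers."""
    N = len(peer_uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))


# Slide assumptions: F/u = 1 hour, u_s = 10u, d_min >= u_s (here: d_min = u_s).
F, u = 1.0, 1.0          # units chosen so that F/u is exactly 1 hour
u_s = d_min = 10 * u

for N in (5, 10, 30):
    print(N, client_server_time(F, N, u_s, d_min), p2p_time(F, u_s, d_min, [u] * N))
```

For \(N = 30\) this gives 3.0 hours for client-server and 0.75 hours for P2P.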

3. BitTorrent

3.1. Basic concepts

BitTorrent is a concrete example of P2P file distribution.

Important terms:

file chunks: The file is divided into small chunks. In the slide, chunks are 256 Kb; chunk sizes are usually quoted in bytes, so this most likely means 256 KBytes.
torrent: The group of peers exchanging chunks of a file.
tracker: A centralized component that tracks which peers participate in the torrent.

The tracker does not send the file itself. It only helps peers find each other.

3.2. Joining a torrent

When a peer joins a torrent:

It initially has no chunks.
It contacts the tracker.
The tracker provides a list of peers.
The peer connects to a subset of those peers, called neighbors.
The peer starts downloading chunks.
Once it has chunks, it can upload them to others.

A peer can change which other peers it exchanges chunks with over time.

3.3. Churn in BitTorrent

Peers may come and go.

After downloading the entire file, a peer may:

selfishly leave immediately,
or altruistically stay and continue uploading.

This affects availability and performance.

3.4. Requesting chunks

At any given time, different peers have different subsets of chunks.

A peer periodically asks other peers:

Which chunks do you have?

Then it requests missing chunks.

A common strategy is rarest first:

request chunks that are least widely available first.

This helps avoid a situation where rare chunks disappear from the torrent.
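Rarest first can be sketched as a small selection function. This is an illustration only: the data layout (sets of chunk indices per neighbor) and the tie-breaking rule are assumptions, not part of the lecture.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Return the missing chunk index held by the fewest neighbors, or None.

    my_chunks: set of chunk indices this peer already has.
    neighbor_chunks: dict mapping a neighbor name to the set of chunks it holds.
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)  # only count chunks we still need
    if not counts:
        return None
    # Fewest copies first; break ties by the lower chunk index (assumed rule).
    return min(counts, key=lambda c: (counts[c], c))


neighbors = {
    "bob": {0, 1, 2},
    "carol": {1, 2, 3},
    "dave": {1, 2},
}
print(rarest_first({1}, neighbors))
```

Chunk 2 is held by all three neighbors, so it is requested last; chunks 0 and 3 each have a single copy and are requested first.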

3.5. Sending chunks: tit-for-tat

BitTorrent uses a tit-for-tat incentive mechanism.

Alice uploads chunks to the four peers that are currently sending her chunks at the highest rate.

Other peers are choked:

Alice does not upload chunks to them.

The top four peers are re-evaluated periodically, about every 10 seconds.

Every 30 seconds, Alice randomly selects another peer and starts sending chunks to it. This is called optimistic unchoking.

3.6. Optimistic unchoking

Optimistic unchoking helps discover better trading partners.

Example:

Alice optimistically unchokes Bob.
Bob may start receiving chunks from Alice.
Bob may reciprocate by uploading chunks to Alice.
If Bob becomes a good provider, he can enter Alice’s top-four list.

The idea:

Higher upload rate helps a peer find better trading partners.
Better trading partners help the peer get the file faster.
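One selection round of tit-for-tat with optimistic unchoking might look like the following sketch. The function name and the peer names are hypothetical, and the timers (10 s and 30 s) are left out; only the choice itself is modeled.

```python
import random


def choose_unchoked(download_rates, top_n=4, rng=random):
    """Pick the peers Alice uploads to in one round.

    download_rates: dict mapping peer name -> rate at which that peer
    currently sends chunks to Alice.
    Returns (top_peers, optimistic_peer); optimistic_peer is None when
    every neighbor is already in the top list.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    top_peers = ranked[:top_n]
    choked = ranked[top_n:]
    optimistic_peer = rng.choice(choked) if choked else None
    return top_peers, optimistic_peer


rates = {"bob": 10, "carol": 80, "dave": 55, "erin": 70, "frank": 5, "grace": 60}
top, optimistic = choose_unchoked(rates, rng=random.Random(0))
print(top, optimistic)
```

If the optimistically unchoked peer starts uploading at a high rate, it shows up with a larger value in `download_rates` in a later round and can displace one of the top four.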

3.7. Trust and integrity in BitTorrent

The lecture mentions that BitTorrent historically had trust issues.

In theory:

file hashes can verify that the data matches what was advertised.

But:

if the advertised hash itself is untrusted, the user still has to trust the source of the torrent metadata.

This is not unique to BitTorrent:

downloading from a centralized website or CDN also requires some trust.
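The hash check itself is simple, as the sketch below suggests. SHA-256 and the per-chunk layout are illustrative assumptions here, not a statement about what a particular BitTorrent version uses. The check only helps if the advertised digests come from a trusted source.

```python
import hashlib


def chunk_digests(data, chunk_size):
    """Split data into chunks and return one hex digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def verify_chunk(chunk, expected_digest):
    """Return True if the received chunk matches the advertised digest."""
    return hashlib.sha256(chunk).hexdigest() == expected_digest


original = b"abcdefghij"
advertised = chunk_digests(original, chunk_size=4)   # assumed to be trusted metadata

print(verify_chunk(b"abcd", advertised[0]))   # genuine chunk
print(verify_chunk(b"abcX", advertised[0]))   # tampered chunk
```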

4. P2P beyond file sharing

P2P is not only useful for file distribution.

It can also be useful for real-time communication.

Example:

For one-to-one communication, directly sending traffic between two peers may be faster than routing everything through a central server.
Some systems, such as Skype historically and Zoom in some cases, can switch to direct P2P communication.

However, central servers are still useful because:

they simplify connection establishment,
they help traverse NATs and firewalls,
they support group meetings,
they provide a robust fallback.

5. Video streaming

5.1. Why video streaming matters

Video streaming is a major consumer of Internet bandwidth.

The lecture mentions estimates from 2020:

Netflix, YouTube, Amazon Prime, and similar services account for a very large fraction of residential ISP traffic by volume, around 80%.

Main challenges:

Scale: How can a service reach a huge number of users?
Heterogeneity: Users have very different network conditions and devices. Examples:
- wired vs mobile,
- high bandwidth vs poor bandwidth,
- stable vs unstable network conditions.

Common solution:

use distributed application-level infrastructure,
such as CDNs and application-level adaptation.

5.2. Multimedia audio

Audio begins as an analog signal.

To transmit it digitally, it must be:

sampled,
quantized,
encoded as bits.

5.2.1. Sampling

Sampling means measuring the analog signal at regular time intervals.

Examples:

Telephone audio: about 8000 samples per second.
CD audio: 44100 samples per second.

5.2.2. Quantization

Each sample is rounded to one of a finite number of values.

For example: \[ 2^8 = 256 \]

So with 8 bits per sample, there are 256 possible quantized values.

Quantization introduces error because the original analog value is rounded.

5.2.3. Example bitrate

Telephone-quality audio:

\[ 8000 \text{ samples/s} \times 8 \text{ bits/sample} = 64000 \text{ bits/s} \]

So this requires:

\[ 64 \text{ Kbps} \]

Other example bitrates:

CD audio: about \(1.411\) Mbps.
MP3: 96, 128, or 160 Kbps.
Internet telephony: about 5.3 Kbps and upward, depending on the codec.
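The uncompressed bitrates follow from sampling rate times bits per sample times number of channels. A short sketch, assuming that CD audio uses 16-bit samples and two channels (these two values are not stated in the notes above):

```python
def pcm_bitrate(samples_per_second, bits_per_sample, channels=1):
    """Bitrate in bits per second of uncompressed PCM audio."""
    return samples_per_second * bits_per_sample * channels


telephone = pcm_bitrate(8000, 8)
# Assumption: CD audio uses 16-bit samples and two (stereo) channels.
cd = pcm_bitrate(44100, 16, channels=2)

print(telephone, "bits/s")
print(cd / 1e6, "Mbps")
```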

5.3. Multimedia video

Video is a sequence of images displayed at a constant rate.

Example:

24 images per second,
30 frames per second,
60 frames per second.

A digital image is an array of pixels. Each pixel is represented by bits.

Raw video would require a huge amount of data, so video must be compressed.

5.4. Video compression

Video compression exploits redundancy.

There are two main types:

5.4.1. Spatial redundancy

Spatial redundancy exists within a single image.

Example:

If many neighboring pixels have the same color, we do not need to send the same color value repeatedly.
Instead of sending \(N\) identical purple pixels, send:
- the color value,
- the number of repetitions.
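The "color value plus repetition count" idea is run-length encoding. The sketch below only illustrates the principle on one row of pixels; real video codecs use more elaborate techniques.

```python
def rle_encode(pixels):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the pixel row."""
    return [value for value, count in runs for _ in range(count)]


row = ["purple"] * 5 + ["white"] * 2 + ["purple"]
encoded = rle_encode(row)
print(encoded)
```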

5.4.2. Temporal redundancy

Temporal redundancy exists between consecutive frames.

Often, most of the image stays the same from one frame to the next.

Example:

Instead of sending the entire next frame, send only the difference from the previous frame.

This is especially useful when only a small part of the image changes.
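Sending only the difference can be sketched with frames modeled as flat lists of pixel values. This is a simplification for illustration; the function names are made up.

```python
def frame_delta(previous, current):
    """List of (pixel_index, new_value) for pixels that changed."""
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [0, 0, 0, 7, 7, 0, 0, 0]
frame2 = [0, 0, 0, 0, 7, 7, 0, 0]   # the "object" moved one pixel to the right
delta = frame_delta(frame1, frame2)
print(delta)
```

Only two of the eight pixels changed, so the delta has two entries instead of a full frame.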

5.5. CBR and VBR

5.5.1. Constant Bit Rate

CBR means constant bit rate.

The video encoding rate is fixed over time.

This is simple, but not always efficient.

5.5.2. Variable Bit Rate

VBR means variable bit rate.

The encoding rate changes depending on the amount of spatial and temporal information.

For example:

a static scene may require fewer bits,
a complex scene with movement may require more bits.

5.6. Video coding standards

Examples mentioned in the lecture:

MPEG-1:
- used for CD-ROM,
- about 1.5 Mbps.
MPEG-2:
- used for DVD,
- about 3 to 6 Mbps.
MPEG-4:
- often used on the Internet,
- about 64 Kbps to 12 Mbps.

The wide bitrate range shows that video can be adapted to different network and device conditions.

6. Application types for multimedia

6.1. Streaming stored audio/video

This is the easiest case conceptually.

Examples:

YouTube videos,
Netflix videos,
stored video on a server.

Definition of streaming:

the client can begin playback before downloading the entire file.

The file is stored at the server. The server can send data faster than the actual playback rate. The client buffers incoming data and plays it out continuously.

Usually, the user does not permanently keep the streamed file.

6.2. Conversational voice/video over IP

Examples:

Zoom,
Skype,
other video call systems.

This is more difficult because:

the input is generated live,
the system does not know in advance what will be said,
delay tolerance is much smaller,
buffering for too long destroys interactivity.

In a video call, if speech is buffered and played minutes later, the conversation becomes unusable.

6.3. Streaming live audio/video

Examples:

live sports,
livestreams.

This is between stored streaming and real-time conversation.

The input is live, but:

it is usually one-way,
delay tolerance is higher than in conversation,
but users still notice delay in some cases.

Example from the lecture:

If watching a football match livestream, a neighbor with a lower-delay stream may react to a goal earlier.

7. Streaming stored video

7.1. Basic model

Consider a stored video recorded at a fixed frame rate, for example:

\[ 30 \text{ frames/s} \]

The server sends the video. The network adds delay. The client receives and plays the video at the same original frame rate.

In the ideal case:

server transmission is smooth,
network delay is fixed,
client playback is continuous.

Streaming means:

while the client is playing the early part,
the server may still be sending later parts.

7.2. Continuous playout constraint

A key constraint is continuous playout.

During playback:

timing must match the original video timing,
frames should be displayed at the intended rate,
playback should not suddenly freeze, speed up, or slow down.

But the network introduces:

variable delay,
jitter,
packet loss.

Therefore, the client needs buffering.

7.3. Client-side buffering

The client waits before starting playback.

This initial waiting time fills a buffer.

The buffer absorbs delay variation:

if packets arrive slightly late, playback can continue from buffered data.

Let:

\(x(t)\) be the variable fill rate,
\(r\) be the constant playout rate,
\(Q(t)\) be the buffer fill level: the amount of already received but not yet played media data in the client-side buffer,
\(B\) be the buffer size.

If the average fill rate \(\bar{x}\) is less than the playout rate:

\[ \bar{x} < r \]

then the buffer eventually empties. This causes freezing or rebuffering.

If:

\[ \bar{x} > r \]

then the buffer does not empty, assuming the initial playout delay is large enough to absorb the variation in \(x(t)\).

However, if too much data arrives too quickly, the buffer can overflow.
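The effect of the initial playout delay on buffer starvation can be explored with a small discrete-time simulation. This is a rough sketch: time advances in one-second steps, the arrival pattern is made up, and the buffer size \(B\) is not modeled (the buffer is treated as unbounded).

```python
def simulate_playout(arrivals, r, initial_delay):
    """Simulate a client buffer in one-second steps.

    arrivals: list of x(t), the amount of data arriving in each second.
    r: constant playout rate (data consumed per second once playing).
    initial_delay: number of seconds the client waits before playback.
    Returns (stalls, levels): number of seconds in which playback froze
    and the buffer level Q(t) at the end of each second.
    """
    q = 0
    stalls = 0
    levels = []
    for t, x in enumerate(arrivals):
        q += x
        if t >= initial_delay:
            if q >= r:
                q -= r          # one second of media is played
            else:
                stalls += 1     # buffer starvation: playback freezes
        levels.append(q)
    return stalls, levels


arrivals = [3, 0, 1, 4, 0, 2, 3, 1, 2, 4]   # bursty, hypothetical arrival pattern
print(simulate_playout(arrivals, r=2, initial_delay=0))
print(simulate_playout(arrivals, r=2, initial_delay=2))
```

With this pattern, starting playback immediately leads to one frozen second, while waiting two seconds avoids the stall.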

7.4. Initial playout delay tradeoff

A larger initial playout delay:

reduces the probability of buffer starvation,
improves continuity.

But it also:

makes the user wait longer before the video starts.

So the system must balance:

startup delay,
smooth playback,
memory/buffer size,
user experience.

7.5. Client interactivity

Streaming systems must support user actions:

pause,
fast-forward,
rewind,
jump to another point in the video.

This complicates streaming because the client may suddenly request a different part of the file.

A good streaming system should react quickly to such jumps.

8. Streaming over UDP

8.1. UDP as unreliable transport

UDP is unreliable and connectionless.

The sender sends datagrams and does not get transport-level reliability.

From the UDP perspective:

no retransmission,
no connection setup,
no guarantee of delivery,
no guarantee of ordering.

The lecture describes UDP as essentially:

send the data,
hope for the best,
if it is lost, UDP itself does not care.

8.2. Why UDP can be useful for streaming

UDP can be useful because the application has more control.

For streaming:

the server can send at the encoding rate,
the application can decide how to handle loss,
retransmission can be avoided when late data would be useless.

A typical playout delay might be around 2 to 5 seconds to absorb jitter.

8.3. Application-level recovery

If error recovery is needed with UDP, it must be implemented above UDP.

For example:

application-level retransmission,
forward error correction,
selective recovery,
RTP-like mechanisms.

The lecture mentions RTP as a protocol related to multimedia payload types.

8.4. Firewall problem

UDP may be blocked by firewalls.

Some networks allow DNS over UDP but treat other UDP traffic suspiciously.

This makes UDP less reliable from a deployment perspective.

9. Streaming over HTTP/TCP

9.1. TCP as reliable transport

TCP provides:

reliable delivery,
in-order delivery,
congestion control,
retransmission,
a byte-stream abstraction.

With HTTP/TCP streaming:

the multimedia file is retrieved using HTTP GET,
the server sends data as fast as TCP allows,
TCP adapts to congestion,
TCP retransmits lost packets.

9.2. Advantages of HTTP/TCP streaming

Advantages:

TCP passes through firewalls more easily.
The client can use simple HTTP requests.
The client can request specific parts of the video.
TCP hides many transport-level details from the application.
Reliability and ordering are handled by the transport layer.

This is useful for stored streaming, where receiving the correct data is often important.

9.3. Disadvantages of TCP for streaming

TCP’s goal is to reliably deliver every byte.

But for multimedia, this can be problematic.

If a video frame arrives too late, it may be useless even if it is eventually delivered.

TCP may:

retransmit data that the application no longer needs,
reduce its sending rate after loss,
delay later data because of in-order delivery,
create larger playout delays.

This is especially problematic for interactive applications like VoIP.

9.4. Is TCP ill-suited for video?

The lecture does not say TCP is always wrong.

Instead, the question is about tradeoffs:

TCP is convenient and deployable.
But TCP reliability may be too strict for some media data.
Some media frames are more important than others.

This motivates selective or partial reliability.

10. Frame types and ideal transport for streaming

10.1. Video frames

A video consists of frames, i.e. still images shown in sequence.

Not all frames are equally important.

10.2. I-frames

I-frames are independent frames.

They:

contain enough information to decode themselves,
do not require previous or future frames,
can act as recovery points after loss.

If an I-frame is lost, video recovery is difficult. Therefore, I-frames should be delivered reliably.

10.3. P-frames

P-frames use data from previous frames.

They:

encode differences or predictions based on earlier frames,
reduce bitrate,
require previous frames for decoding.

Loss of a P-frame can reduce quality, but may not always completely break playback.

10.4. B-frames

B-frames use both previous and future frames.

They:

exploit even more temporal redundancy,
can be efficient,
depend on surrounding frames.

10.5. Ideal solution: partial reliability

An ideal streaming transport would treat frame types differently.

Possible design:

transmit I-frames reliably,
transmit P-frames and B-frames unreliably or with weaker reliability.

This is called selective or partial reliability.

Loss handling could use:

forward error correction,
selective retransmission,
application-specific recovery.

This would combine:

reliability where it matters,
low delay where retransmission would be useless.

11. Voice-over-IP

11.1. Main requirement

VoIP must preserve the conversational aspect of human communication.

Delay is critical.

Rules of thumb:

below 150 ms: good,
above 400 ms: bad.

The total delay includes:

application-level delay,
packetization delay,
playout delay,
network delay,
queueing delay,
processing delay.

11.2. Additional VoIP issues

VoIP systems also need session initialization.

A caller and callee must somehow agree on:

IP address,
port number,
encoding algorithm / codec,
media parameters.

VoIP systems may also provide value-added services:

call forwarding,
call screening,
recording.

Emergency services are also important:

traditional telephony had strong emergency call assumptions,
moving to VoIP introduces regulatory and reliability concerns.

11.3. Talk spurts and silence

Human speech alternates between:

talk spurts,
silent periods.

During a talk spurt:

audio is generated continuously.

During silence:

little or no audio data needs to be sent.

This helps the application adapt:

silence periods can be compressed or elongated,
playout delay can be recalculated between talk spurts.

11.4. VoIP packet generation

The lecture gives a typical example:

64 Kbps during a talk spurt.
Packets generated only during talk spurts.
Audio is divided into 20 ms chunks.

Since:

\[ 64 \text{ Kbps} = 8 \text{ KBytes/s} \]

A 20 ms chunk contains:

\[ 8 \text{ KBytes/s} \times 0.02 \text{ s} = 0.16 \text{ KBytes} = 160 \text{ bytes} \]

So the payload is about 160 bytes per packet.

Then:

an application-layer header is added,
the chunk plus header is encapsulated into UDP or TCP,
more lower-layer headers are added.

Thus, the actual packet on the network is larger than 160 bytes.
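The per-packet arithmetic can be checked with a few lines. The header sizes below are assumptions that are not given in the notes: a 12-byte RTP header as the application-layer header, an 8-byte UDP header, and a 20-byte IPv4 header without options.

```python
def payload_bytes(bitrate_bps, chunk_ms):
    """Audio payload per packet, in bytes."""
    return bitrate_bps * chunk_ms // 1000 // 8


# Assumed header sizes (not given in the lecture notes):
# RTP fixed header 12 bytes, UDP header 8 bytes, IPv4 header without options 20 bytes.
HEADERS = {"RTP": 12, "UDP": 8, "IPv4": 20}

payload = payload_bytes(64_000, 20)
packet = payload + sum(HEADERS.values())
packets_per_second = 1000 // 20

print(payload, packet, packet * 8 * packets_per_second)
```

Under these assumptions, a 160-byte payload becomes a 200-byte IP packet, and the 64 Kbps audio stream occupies 80 Kbps at the IP layer during a talk spurt.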

12. VoIP packet loss and delay

12.1. Network loss

Network loss occurs when an IP datagram is lost in the network.

Common cause:

congestion, which leads to router buffer overflow.

12.2. Delay loss

Delay loss occurs when a packet arrives too late for playback.

Even if the packet eventually arrives, it is useless if its playout time has already passed.

For VoIP:

a packet arriving after the playout deadline is treated as lost.

12.3. Loss tolerance

VoIP can tolerate some packet loss depending on:

codec,
loss concealment,
application design.

Typical tolerable loss: \[ 1\% \text{ to } 10\% \]

10% loss is usually very annoying, but applications can sometimes conceal it.

13. Delay jitter

13.1. Meaning of jitter

Delay jitter means variable packet delay.

Packets may be generated every 20 ms, but they may not arrive every 20 ms.

For two consecutive packets:

one may experience low delay,
the next may experience higher delay.

Therefore, inter-arrival time is not constant.

13.2. Need for playout delay

The receiver uses a playout buffer.

The idea:

receive packets,
wait a short time,
play them out at regular intervals.

This hides network jitter.

14. Fixed playout delay

14.1. Definition

Suppose a chunk is generated at time \(t\).

The receiver attempts to play it at:

\[ t + q \]

where \(q\) is the fixed playout delay.

If the packet arrives after \(t+q\), it is too late and is treated as lost.

14.2. Tradeoff

Choosing \(q\) is a tradeoff.

Large \(q\):

less late loss,
more buffering,
worse interactivity.

Small \(q\):

better interactivity,
more late loss.

For VoIP, this tradeoff is crucial.
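The tradeoff can be made concrete by counting late packets for different values of \(q\). The generation and arrival times below are hypothetical.

```python
def late_packets(generated, received, q):
    """Indices of packets that miss their playout time t + q."""
    return [i for i, (t, r) in enumerate(zip(generated, received)) if r > t + q]


# Times in milliseconds; packets are generated every 20 ms.
generated = [0, 20, 40, 60, 80]
# Hypothetical network delays of 30, 45, 90, 35, and 60 ms.
received = [30, 65, 130, 95, 140]

for q in (40, 60, 100):
    print(q, late_packets(generated, received, q))
```

With \(q = 40\) ms three of the five packets are late, with \(q = 60\) ms one is, and with \(q = 100\) ms none are, at the cost of 100 ms of added delay.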

15. Adaptive playout delay

15.1. Motivation

A fixed playout delay is not always optimal because network delay changes over time.

VoIP can adapt more naturally than stored video because speech has silent periods.

At the beginning of each talk spurt, the receiver can adjust the playout delay.

During the talk spurt:

chunks are still played every 20 ms,
the delay is not constantly changed inside the spurt.

Between talk spurts:

the system can recalculate delay.

15.2. Estimated network delay

Let:

\(t_i\) be the timestamp when packet \(i\) was generated,
\(r_i\) be the time packet \(i\) was received,
\(r_i - t_i\) be the measured network delay,
\(d_i\) be the estimated average delay.

The estimate uses an exponentially weighted moving average:

\[ d_i = (1-\alpha)d_{i-1} + \alpha(r_i - t_i) \]

The idea:

recent measurements matter more,
older measurements still matter but with decreasing weight.

15.3. Estimated delay deviation

It is also useful to estimate the deviation:

\[ v_i = (1-\beta)v_{i-1} + \beta |r_i - t_i - d_i| \]

This measures how much the actual delay differs from the estimated average delay.

15.4. Playout time

For the first packet in a talk spurt, choose:

\[ \text{playout-time}_i = t_i + d_i + K v_i \]

where:

\(d_i\) estimates average delay,
\(v_i\) estimates delay variability,
\(K\) controls how conservative the playout delay is.

Then remaining packets in the talk spurt are played periodically.
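The three formulas translate almost directly into code. In the sketch below, the values of \(\alpha\), \(\beta\), and \(K\), the starting estimates, and the packet times are all assumptions chosen for illustration.

```python
def update_estimates(d_prev, v_prev, t_i, r_i, alpha=0.1, beta=0.1):
    """One EWMA step for the delay estimate d_i and deviation estimate v_i."""
    delay = r_i - t_i
    d_i = (1 - alpha) * d_prev + alpha * delay
    v_i = (1 - beta) * v_prev + beta * abs(delay - d_i)
    return d_i, v_i


def playout_time(t_i, d_i, v_i, K=4):
    """Playout time for the first packet of a talk spurt."""
    return t_i + d_i + K * v_i


# Times in ms. Start from a hypothetical estimate of 50 ms delay, 5 ms deviation.
d, v = 50.0, 5.0
for t_i, r_i in [(0, 60), (20, 75), (40, 130)]:
    d, v = update_estimates(d, v, t_i, r_i)

print(round(d, 2), round(v, 2), round(playout_time(1000, d, v), 2))
```

The third packet has a much larger delay than the first two, so both estimates rise, and the next talk spurt is scheduled with a larger safety margin.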

15.5. Detecting the start of a talk spurt

If there is no loss:

look at successive timestamps,
if the timestamp difference is greater than 20 ms, a new talk spurt begins.

Under loss:

timestamps alone are not enough, because a lost packet also produces a timestamp gap larger than 20 ms,
the receiver must also inspect sequence numbers.

A new talk spurt can be detected when:

timestamp difference is greater than 20 ms,
and sequence numbers show no packet gap indicating loss.
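The detection rule can be written as a small predicate. This sketch assumes that sequence numbers increase by one per packet sent and that timestamps are in milliseconds of the sender's clock; the tuple layout is made up.

```python
def starts_talk_spurt(prev, cur, chunk_ms=20):
    """Decide whether packet `cur` begins a new talk spurt.

    prev and cur are (sequence_number, timestamp_ms) pairs of two
    successively received packets.
    """
    prev_seq, prev_ts = prev
    cur_seq, cur_ts = cur
    gap_in_time = cur_ts - prev_ts > chunk_ms
    no_loss = cur_seq == prev_seq + 1
    return gap_in_time and no_loss


print(starts_talk_spurt((7, 140), (8, 160)))   # same spurt: 20 ms apart
print(starts_talk_spurt((8, 160), (9, 400)))   # silence, no missing packet
print(starts_talk_spurt((9, 400), (11, 440)))  # packet 10 was lost
```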

16. VoIP loss recovery

16.1. Why retransmission is difficult

Retransmission requires at least one RTT.

For VoIP, the tolerable delay is small.

By the time a retransmitted packet arrives, it may already be too late.

Therefore, VoIP often uses forward error correction rather than relying only on retransmission.

16.2. Forward Error Correction

Forward Error Correction means sending extra information so that the receiver can recover from loss without requesting retransmission.

The tradeoff:

more bandwidth,
more sender/receiver complexity,
better loss recovery.

16.3. Simple XOR-based FEC

For every group of \(n\) chunks:

create one redundant chunk by XOR-ing the \(n\) original chunks,
send all \(n+1\) chunks.

Bandwidth overhead: \[ \frac{1}{n} \]

If at most one chunk is lost among the \(n+1\) chunks, the receiver can reconstruct the missing chunk.

This adds recovery capability but increases bandwidth and playout delay, because the receiver must wait for the rest of the group before it can reconstruct a missing chunk.
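XOR-based recovery works because XOR-ing the parity chunk with all surviving chunks leaves exactly the missing one. A minimal sketch, assuming equal-length chunks and at most one loss per group (lost chunks are marked with `None`):

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks):
    """Redundant chunk: XOR of all n original chunks."""
    return reduce(xor_bytes, chunks)


def recover(received, parity):
    """Rebuild the single missing chunk (marked None) in a group."""
    present = [c for c in received if c is not None]
    missing = reduce(xor_bytes, present, parity)
    return [missing if c is None else c for c in received]


group = [b"AAAA", b"BBBB", b"CCCC"]       # n = 3 equal-length chunks
parity = make_parity(group)
damaged = [b"AAAA", None, b"CCCC"]        # chunk 1 lost in the network
print(recover(damaged, parity))
```

With \(n = 3\), one extra chunk is sent per three originals, which matches the \(1/n\) overhead.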

16.4. Piggyback lower-quality stream

Another FEC approach:

send a normal-quality stream,
piggyback a lower-quality version of previous audio chunks.

Example:

nominal stream: PCM at 64 Kbps,
redundant stream: GSM at 13 Kbps.

If one packet is lost, the receiver may reconstruct or conceal the loss using the lower-quality copy sent in another packet.

This works better for non-consecutive losses.

Generalization:

packet \(n\) can also carry low-bitrate copies of chunks \(n-1\) and \(n-2\), which additionally protects against two consecutive losses.

16.5. Interleaving

Interleaving hides loss without adding redundancy.

Idea:

divide audio chunks into smaller units, for example split a 20 ms chunk into four 5 ms units,
each packet contains small units from different original chunks.

If one packet is lost:

the receiver loses only small pieces of several chunks,
rather than one complete chunk.

Advantage:

no redundancy overhead.

Disadvantage:

increases playout delay, since the receiver must wait longer to reassemble the chunks.
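Interleaving can be sketched as a transpose: packet \(j\) carries unit \(j\) of every chunk. The unit labels below are placeholders for 5 ms pieces of audio, and the function names are made up.

```python
def interleave(chunks):
    """Packet j carries unit j of every chunk (chunks[i][j] -> packets[j][i])."""
    return [list(units) for units in zip(*chunks)]


def deinterleave(packets, units_per_chunk):
    """Rebuild chunks; units from a lost packet (None) stay None."""
    n_chunks = len(next(p for p in packets if p is not None))
    return [
        [packets[j][i] if packets[j] is not None else None
         for j in range(units_per_chunk)]
        for i in range(n_chunks)
    ]


# Four 20 ms chunks, each split into four 5 ms units (labels are placeholders).
chunks = [[f"c{i}u{j}" for j in range(4)] for i in range(4)]
packets = interleave(chunks)
packets[2] = None                      # one packet is lost
for chunk in deinterleave(packets, units_per_chunk=4):
    print(chunk)
```

After losing one packet, every chunk is missing a single 5 ms unit instead of one chunk being missing entirely. The receiver cannot play chunk 0 until all four packets have had a chance to arrive, which is where the extra playout delay comes from.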

17. Example: Zoom

17.1. Why Zoom is discussed

Zoom combines:

video streaming,
VoIP,
screen sharing,
client-server architecture,
sometimes P2P communication.

It is therefore a useful real-world example of the lecture topics.

17.2. Protocol properties

Zoom uses a proprietary application-layer protocol.

According to reverse-engineering work mentioned in the lecture:

it uses a custom form of RTP,
parts of the traffic are encrypted,
passive measurement can still infer useful metadata.

17.3. Server-client mode

By default:

clients connect to Zoom servers,
servers organize meetings,
servers distribute media streams.

Servers may be operated by Zoom or customers.

In server-client mode, media is divided into three flows:

video,
audio,
screen share.

17.4. P2P mode

For meetings with exactly two participants, Zoom can switch to P2P if enabled and possible.

If a third participant joins:

Zoom switches back to server-client mode.

In P2P mode:

all media is combined into one stream.

Reason:

establishing P2P is difficult,
NAT traversal and firewall traversal are already hard,
using one stream is simpler than maintaining three separate P2P streams.

The slide mentions STUN as part of establishing P2P.

17.5. Screen sharing compared with video and audio

Screen sharing is special.

It is not like normal video:

often the screen stays static,
slides may remain unchanged for minutes,
only occasional changes need to be transmitted.

It is somewhat similar to audio talk spurts:

there are active periods and quiet/static periods.

Therefore, screen share bitrate can often be reduced.

17.6. Passive measurement of Zoom

The lecture references a paper:

Oliver Michel et al., 2022, “Enabling Passive Measurement of Zoom Performance in Production Networks”, IMC.

The measurement showed:

most Zoom traffic volume is video,
audio and screen sharing use much less bandwidth,
peaks appear around meeting start times,
meetings often start on regular time boundaries.

18. Socket programming

18.1. Motivation

After discussing application-layer protocols such as HTTP, SMTP, IMAP, streaming, and VoIP, the lecture asks:

How does an application actually send arbitrary data over the network?

The answer is: sockets.

Sockets are the interface between:

the application,
the operating system,
the transport layer.

18.2. What is a socket?

A socket is an abstract representation of a network communication endpoint at the application level.

The operating system provides a socket API.

The application:

creates a socket,
writes data to it,
reads data from it.

The operating system:

handles actual data transmission,
interacts with TCP or UDP,
uses the network stack.

A useful analogy:

sending data through a socket is like writing to a file,
receiving data from a socket is like reading from a file.

This fits the Unix idea that many things can be treated like files.
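The file analogy can be tried out with Python's standard `socket` module, which can wrap a socket in a file object via `makefile`. The sketch below uses `socket.socketpair()` to get two already connected sockets in one process; the helper name is made up.

```python
import socket


def file_like_roundtrip(text):
    """Send one line through a connected socket pair using file objects."""
    a, b = socket.socketpair()
    try:
        with a.makefile("w") as writer, b.makefile("r") as reader:
            writer.write(text + "\n")
            writer.flush()                  # push buffered data into the socket
            return reader.readline().rstrip("\n")
    finally:
        a.close()
        b.close()


print(file_like_roundtrip("hello socket"))
```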

18.3. Sockets and applications

Many applications use sockets internally.

Examples:

Browsers and web servers use sockets to speak HTTP or HTTPS.
Mail clients and servers use sockets to speak SMTP, IMAP, or POP3.
P2P applications use sockets to speak protocols such as BitTorrent or Skype-like protocols.
Multimedia applications use sockets for audio/video data.

18.4. Socket as a door

A socket is a door between:

an application process,
an end-to-end transport protocol.

The application developer controls the application side. The operating system controls the lower layers.

19. UDP sockets

19.1. UDP application viewpoint

UDP provides unreliable transfer of groups of bytes called datagrams.

Properties:

no connection,
no handshake,
each datagram is independent,
sender explicitly attaches destination IP address and port,
receiver extracts source IP address and port from the received datagram,
data may be lost,
data may arrive out of order.

19.2. UDP client-server interaction

Server side:

Create a UDP socket.
Bind it to a local port.
Wait for datagrams.
When a datagram arrives, read both:
- the message,
- the client address.
Send a reply to the client address.

Client side:

Create a UDP socket.
Create a message.
Send it to the server’s IP address and port.
Wait for a reply.
Close the socket.

19.3. UDP Python example: client

from socket import *

serverName = 'hostname'
serverPort = 12000

clientSocket = socket(AF_INET, SOCK_DGRAM)

message = input('Input lowercase sentence:')

clientSocket.sendto(message.encode(), (serverName, serverPort))

modifiedMessage, serverAddress = clientSocket.recvfrom(2048)

print(modifiedMessage.decode())

clientSocket.close()

Important points:

SOCK_DGRAM means UDP.
sendto includes the destination address and port.
recvfrom returns both the message and the sender address.

19.4. UDP Python example: server

from socket import *

serverPort = 12000

serverSocket = socket(AF_INET, SOCK_DGRAM)
serverSocket.bind(('', serverPort))

print('The server is ready to receive')

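while True:
    # recvfrom blocks until a datagram arrives and also returns the sender's address.
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.decode().upper()
    # Reply to the address the datagram came from; there is no connection to reuse.
    serverSocket.sendto(modifiedMessage.encode(), clientAddress)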

Important points:

The server binds to a port.
It loops forever.
It receives a message and the client’s address.
It sends the uppercase version back to that client.

19.5. UDP demo from the lecture

The lecturer demonstrated UDP using packet capture.

Observation:

before sending data, there is no connection setup traffic.
once the client sends a message, a UDP packet appears.
the server receives it, converts it to uppercase, and sends it back.
this illustrates that UDP has no handshake.
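To reproduce the behavior without two terminals, the following sketch runs a one-shot UDP server and a client in the same process on the loopback interface. The helper name, the use of a thread, the OS-chosen port, and the 2-second timeouts are choices made for this illustration.

```python
import socket
import threading


def udp_uppercase_roundtrip(text):
    """Run a one-shot UDP uppercase server and client on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
    server.settimeout(2.0)
    server_address = server.getsockname()

    def serve_once():
        try:
            message, client_address = server.recvfrom(2048)
        except socket.timeout:
            return                          # nothing arrived; give up
        server.sendto(message.decode().upper().encode(), client_address)

    worker = threading.Thread(target=serve_once)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)                  # UDP gives no delivery guarantee
    try:
        client.sendto(text.encode(), server_address)   # no handshake first
        reply, _ = client.recvfrom(2048)
        return reply.decode()
    finally:
        client.close()
        worker.join()
        server.close()


print(udp_uppercase_roundtrip("hello udp"))
```

The first packet exchanged is already the application data; a capture on the loopback interface should show only the request and the reply datagram.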

20. TCP sockets

20.1. TCP application viewpoint

TCP provides reliable, in-order byte-stream transfer.

TCP is connection-oriented.

Properties:

client must contact server,
server must have a socket waiting for connections,
connection setup happens before application data transfer,
TCP performs a handshake,
data is delivered reliably and in order.

20.2. TCP server sockets

A TCP server uses two kinds of sockets:

Welcoming socket / listening socket:
- waits for incoming connection requests.
Connection socket:
- created when a client connects,
- used to communicate with that particular client.

This allows one server to handle multiple clients.

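Each connection is identified by its 4-tuple (source IP address, source port, destination IP address, destination port), so all connection sockets can share the server’s port and the connections can still be told apart.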
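
A small loopback experiment can make the two kinds of sockets visible. This is a sketch, with variable names chosen for illustration: two clients connect to one welcoming socket, and each accept call returns a separate connection socket. Both connection sockets have the same local port and differ only in the remote (client) port.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))     # port 0: the OS picks a free port
listener.listen(5)
server_address = listener.getsockname()

# The kernel completes both handshakes and queues the connections until accept is called.
clients = [socket.create_connection(server_address) for _ in range(2)]
connections = [listener.accept()[0] for _ in range(2)]

for conn in connections:
    print("local", conn.getsockname(), "remote", conn.getpeername())

local_ports = {conn.getsockname()[1] for conn in connections}
remote_ports = {conn.getpeername()[1] for conn in connections}
print(len(local_ports), len(remote_ports))  # 1 2

for sock in clients + connections + [listener]:
    sock.close()
```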

20.3. TCP client-server interaction

Server side:

Create a socket.
Bind it to a port.
Listen for incoming connections.
Accept a connection.
A new connection socket is created.
Read request from the connection socket.
Write reply to the connection socket.
Close the connection socket.

Client side:

Create a TCP socket.
Connect to server IP address and port.
TCP handshake happens.
Send data.
Receive reply.
Close socket.

20.4. TCP Python example: client

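from socket import *

serverName = 'servername'  # placeholder: the server's hostname or IP address
serverPort = 12000

# SOCK_STREAM selects TCP
clientSocket = socket(AF_INET, SOCK_STREAM)
# connect triggers the TCP handshake before any application data is sent
clientSocket.connect((serverName, serverPort))

sentence = input('Input lowercase sentence:')

# No address argument: the socket is already tied to the server
clientSocket.send(sentence.encode())

# Reads at most 1024 bytes from the byte stream
modifiedSentence = clientSocket.recv(1024)

print('From Server:', modifiedSentence.decode())

clientSocket.close()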

Important points:

SOCK_STREAM means TCP.
connect establishes the TCP connection.
After connection setup, send does not need to include the destination address.
The destination is already associated with the socket.
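
The client above relies on one send and one recv, which works for short sentences. In general, send may hand only part of the data to TCP and recv may return only part of the reply. A slightly more defensive variant, given here as a sketch, uses sendall and reads until the server closes the connection. The function name tcp_request and the timeout parameter are illustrative assumptions; the approach fits the server in these notes because that server closes the connection socket after replying.

```python
import socket

def tcp_request(sentence, server_address, timeout=5.0):
    """Send one sentence over TCP and return everything the server sends back."""
    with socket.create_connection(server_address, timeout=timeout) as client_socket:
        # sendall keeps calling send until every byte has been handed to TCP.
        client_socket.sendall(sentence.encode())
        # Half-close: tell the server no more data follows, but keep reading.
        client_socket.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = client_socket.recv(1024)
            if not chunk:          # empty result: the server closed its side
                break
            chunks.append(chunk)
    return b"".join(chunks).decode()
```

A call could look like tcp_request('hello', ('servername', 12000)).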

20.5. TCP Python example: server

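from socket import *

serverPort = 12000

serverSocket = socket(AF_INET, SOCK_STREAM)
serverSocket.bind(('', serverPort))
# listen turns this into the welcoming socket; 1 is the backlog of pending connections
serverSocket.listen(1)

print('The server is ready to receive')

# Clients are served one at a time
while True:
    # accept blocks until a client connects, then returns a new socket for that client
    connectionSocket, addr = serverSocket.accept()
    sentence = connectionSocket.recv(1024).decode()
    capitalizedSentence = sentence.upper()
    connectionSocket.send(capitalizedSentence.encode())
    # Only the connection socket is closed; the welcoming socket stays open
    connectionSocket.close()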

Important points:

listen makes the socket a welcoming socket.
accept waits for an incoming connection.
accept returns a new socket for that specific client.
The welcoming socket remains available for future clients.
Unlike UDP, the server reads bytes from the connection socket, not address-tagged datagrams.
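
The difference between a byte stream and datagrams can be observed directly on the loopback interface. In this sketch (all variable names are illustrative), the same two writes are made once over TCP and once over UDP. TCP delivers the bytes as one stream without message boundaries, while UDP delivers each sendto as its own datagram.

```python
import socket

# TCP: two writes arrive as one undivided byte stream.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
sender = socket.create_connection(listener.getsockname())
connection_socket, _ = listener.accept()
sender.sendall(b"ab")
sender.sendall(b"cd")
sender.close()                      # closing lets the reader see end-of-stream

stream = b""
while True:
    chunk = connection_socket.recv(1024)
    if not chunk:
        break
    stream += chunk
connection_socket.close()
listener.close()

# UDP: each sendto is delivered as a separate datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sender.sendto(b"ab", receiver.getsockname())
udp_sender.sendto(b"cd", receiver.getsockname())
datagrams = [receiver.recvfrom(1024)[0] for _ in range(2)]
udp_sender.close()
receiver.close()

print(stream)     # all four bytes; how many recv calls this took is not guaranteed
print(datagrams)  # two separate messages
```

This is why applications on top of TCP need their own way to mark where a message ends, for example a delimiter or a length prefix.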

20.6. TCP demo from the lecture

The lecturer showed a TCP packet capture.

Observation:

after starting the server and client, before sending the application message, packets already appear.
these packets correspond to TCP connection setup.
this is the TCP handshake.
after the application sends data, more packets appear for data transfer and response.

This contrasts with UDP:

UDP sends no packets until the application actually sends a datagram.
TCP sends handshake packets during connection establishment.

21. Raw sockets

Raw sockets allow applications to access lower layers more directly.

They can be used to:

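implement protocols that run directly over IP or below, bypassing TCP and UDP,
work with ICMP,
work with ARP (a link-layer protocol, so this needs link-layer access rather than an IP-level raw socket),
work with OSPF,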
monitor traffic,
sniff packets,
send forged packets.

Normal applications usually do not need raw sockets.

With raw sockets:

the application may need to construct headers manually,
it has more control,
but also more responsibility.
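
As an illustration of "constructing headers manually", the sketch below builds the bytes of an ICMP echo request (the message that ping sends), including the Internet checksum. The function names and the example identifier, sequence number, and payload are assumptions made for this example. The code only builds the packet; actually sending it would require a raw socket, which normally needs elevated privileges.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, inverted (the checksum used by ICMP)."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    # The checksum is computed over the message with the checksum field set to 0.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload

packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())
print(internet_checksum(packet))  # 0: a receiver recomputing the checksum sees a valid packet
```

With a normal TCP or UDP socket, the operating system fills in all of these header fields; with a raw socket, a mistake in any of them is the application's problem.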

22. WebSockets

WebSocket is not the same thing as a socket.

Important distinction:

\[ \text{WebSocket} \neq \text{Socket} \]

A WebSocket is a protocol.

It is built on top of normal TCP sockets.

The WebSocket protocol:

enables two-way communication between a client and a remote host,
starts with an opening handshake,
then uses message framing,
is layered over TCP.

The relevant RFC is RFC 6455.

So:

TCP sockets are the lower-level OS/network abstraction.
WebSocket is an application-layer protocol using TCP.
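
One concrete piece of the opening handshake shows that WebSocket is a protocol with its own rules rather than a plain socket. The client sends a Sec-WebSocket-Key header, and the server must answer with a Sec-WebSocket-Accept value derived from it as specified in RFC 6455: append a fixed GUID, hash with SHA-1, and Base64-encode the result. The function name below is an illustrative assumption; the key and expected value are the example from the RFC.

```python
import base64
import hashlib

# Fixed string defined in RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Everything in this exchange travels as ordinary bytes over a TCP socket; only after the handshake succeeds do the two sides switch to WebSocket message framing.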

23. Optional DNS demo at the end

After the main lecture content, the lecturer briefly showed DNS queries using the Linux tool dig.

23.1. dig

dig can be used to query DNS records.

Example idea:

dig google.com

With no record type given, dig asks the configured resolver for the A records of google.com.

23.2. A record

An A record maps a domain name to an IPv4 address. A name can have several A records.

Example:

querying google.com for A records returns multiple IPv4 addresses.
a client may choose one of them to connect.

23.3. AAAA record

An AAAA record maps a domain name to an IPv6 address. A name can have several AAAA records.

Example:

dig google.com AAAA

23.4. NS record

An NS record maps a domain to its authoritative name servers.

Important point:

NS records contain names, not IP addresses.

Therefore, DNS often includes additional information:

the response may include the IP addresses of those name servers in an additional section.

This avoids circular resolution problems.

For example:

if the name server for example.com is ns1.example.com, a resolver would have to ask that very server for its own address.
the parent zone (here .com) breaks this loop by providing glue records, i.e. the name server’s IP address, alongside the NS records.

23.5. MX record

An MX record identifies mail servers for a domain.

Example:

querying MX records for google.com returns mail server information.

23.6. TXT records

TXT records store arbitrary text data.

They are often used for email-related security mechanisms.

Example:

SPF records can be stored as TXT records.
SPF helps specify which servers are allowed to send mail for a domain.

The lecture connected this back to the previous email lecture.
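
To make the SPF idea concrete, here is a simplified sketch that splits the text of an SPF record into its terms. The record string is a made-up example for a hypothetical domain, and the parser is an assumption for illustration: it covers only the version tag and the qualifier prefixes (+ pass, - fail, ~ softfail, ? neutral) and treats every term the same way, whereas real SPF evaluation involves considerably more.

```python
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def parse_spf(txt_record: str):
    """Split an SPF TXT record into (result, term) pairs. Simplified sketch."""
    terms = txt_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    parsed = []
    for term in terms[1:]:
        qualifier = "+"                      # a term without a prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        parsed.append((QUALIFIERS[qualifier], term))
    return parsed

# Hypothetical record: mail may come from 192.0.2.0/24 or from servers
# listed by _spf.example.com; everything else should be rejected.
example = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"
print(parse_spf(example))
# [('pass', 'ip4:192.0.2.0/24'), ('pass', 'include:_spf.example.com'), ('fail', 'all')]
```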

23.7. DNS classes

The lecturer briefly mentioned DNS classes:

IN is the normal Internet class.
CH (Chaosnet, also written CHAOS) exists for historical reasons, but is rarely relevant in normal use.
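
On the wire, the record types and classes from this section are just numeric codes in the question part of a DNS message. The sketch below builds the bytes of a query similar to what dig sends. The function names and the fixed query ID are illustrative assumptions; the numeric codes are the standard values for the record types named above and for the classes IN and CH (CHAOS). The code only constructs the message and does not contact any resolver.

```python
import struct

RECORD_TYPES = {"A": 1, "NS": 2, "MX": 15, "TXT": 16, "AAAA": 28}
CLASSES = {"IN": 1, "CH": 3}

def encode_name(name: str) -> bytes:
    """Encode 'google.com' as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        encoded += bytes([len(label)]) + label.encode("ascii")
    return encoded + b"\x00"

def build_query(name, record_type="A", record_class="IN", query_id=0x1234):
    """Build a DNS query message with a single question."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack(
        "!HH", RECORD_TYPES[record_type], CLASSES[record_class]
    )
    return header + question

query = build_query("google.com", "AAAA")
print(len(query))   # 28
print(query.hex())
```

Such a message could be sent with a UDP socket to a resolver on port 53, which ties the DNS demo back to the UDP socket examples earlier in these notes.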