Packet Switching: How the Network Carries Anything

#TL;DR

Before 1965, networks carried data the way the phone system carried voice: reserve an end-to-end circuit, hold it open for the duration, tear it down when you’re done. Paul Baran and Donald Davies independently proposed the opposite — chop every message into small, self-describing chunks, send each one independently, and reassemble them at the destination. The network becomes stateless and replaceable. The endpoints do all the smart work. Fifty years later, every byte you’ve ever sent still moves this way.

#Circuit Switching: The Phone Model

In a circuit-switched network, placing a call allocates a physical path. Every switch between you and the other party reserves a slice of its capacity for your conversation, and that capacity stays reserved even when neither of you is talking.

For analog voice, it works: two people in conversation keep the line busy most of the time. For computers, it’s a disaster:

A terminal sending a command uses the line for a few milliseconds, then sits idle for seconds or minutes waiting for a response.
Setup takes real time. Dialing and negotiating a cross-country circuit in the 1960s could take tens of seconds before the first byte flowed.
A failed switch drops the call. Your session is gone. Start over.
Capacity has to be provisioned for the number of simultaneous connections, not the amount of data actually moving. The network is sized and billed for peak demand, then sits idle most of the time.

By the early 1960s, computer traffic was clearly bursty. Circuit switching was optimizing for the wrong thing.
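To put rough numbers on the idle-circuit problem, here is a back-of-the-envelope sketch. The figures (5 ms of transmission per command, one command every 10 seconds) are illustrative assumptions picked to match the "milliseconds of use, seconds of idle" pattern above. They are not measurements.

```python
# Illustrative assumptions, not historical measurements.
BUSY_SECONDS_PER_COMMAND = 0.005   # time the line actually carries data
SECONDS_BETWEEN_COMMANDS = 10.0    # user thinks, remote computer responds


def circuit_utilization(busy: float, period: float) -> float:
    """Fraction of a reserved circuit's time that is spent carrying data."""
    return busy / period


utilization = circuit_utilization(BUSY_SECONDS_PER_COMMAND, SECONDS_BETWEEN_COMMANDS)
print(f"Reserved circuit carrying data {utilization:.2%} of the time")
print(f"Reserved circuit idle {1 - utilization:.2%} of the time")
```

Under these assumptions the circuit is reserved full-time and used for a twentieth of a percent of it.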

#Baran’s Insight: Survivability

Paul Baran worked at the RAND Corporation in the early 1960s on a problem the US Air Force cared about urgently: how do you keep a command network running after a nuclear strike takes out most of its nodes?

Baran’s answer, published as a series of RAND memoranda between 1960 and 1964, had three parts:

A redundant mesh — enough alternate paths that losing any single node, or even many, doesn’t partition the network.
Distributed routing — no central controller. Each node decides locally where to forward each message, based on its current view of which neighbors are alive.
Message blocks — chop every message into small, uniform pieces. Each piece carries its own header with source, destination, and sequence number. Pieces can take different routes. The destination reassembles them.
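The survivability claim can be checked on a toy mesh. The sketch below assumes a 4x4 grid topology and a hand-picked set of destroyed nodes, both invented for illustration. It uses a plain breadth-first search as a stand-in for routing, so it tests whether a path still exists rather than modeling Baran's node-by-node local decisions.

```python
from collections import deque


def grid_mesh(width, height):
    """Build a grid where each node links to its up/down/left/right neighbors."""
    links = {}
    for x in range(width):
        for y in range(height):
            candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            links[(x, y)] = [
                (nx, ny) for nx, ny in candidates
                if 0 <= nx < width and 0 <= ny < height
            ]
    return links


def find_route(links, source, destination, dead=frozenset()):
    """Breadth-first search that skips dead nodes; returns a path or None."""
    if source in dead or destination in dead:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links[node]:
            if neighbor not in dead and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


mesh = grid_mesh(4, 4)
destroyed = {(1, 1), (1, 2), (2, 1), (2, 2)}  # hypothetical strike on the interior
intact = find_route(mesh, (0, 0), (3, 3))
damaged = find_route(mesh, (0, 0), (3, 3), dead=destroyed)
print(len(intact) - 1, "hops intact;", len(damaged) - 1, "hops after losing 4 nodes")
```

Losing a quarter of the nodes leaves the two corners connected, because the mesh has alternate paths around the damage. Cut both neighbors of a corner and `find_route` returns `None`, which is the partition that redundancy is meant to make unlikely.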

He called it distributed adaptive message block switching. AT&T, which ran the US telephone network, told him it wouldn’t work and declined to pursue it. They were wrong.

#Davies’s Insight: Efficiency

Donald Davies, at the UK’s National Physical Laboratory, arrived at the same technique in 1965 from a completely different starting point. He wasn’t thinking about nuclear survivability. He was thinking about time-sharing computers and how wasteful it was to hold a circuit open across an ocean for an interactive session.

Davies’s version emphasized statistical multiplexing: if you slice up every message into small pieces and interleave them on shared links, you can pack far more conversations into the same bandwidth. Nobody holds the wire exclusively. Everyone gets a share proportional to what they’re actually sending.
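A small calculation shows how large the multiplexing gain can be. This sketch assumes 100 independent users who each transmit 5% of the time and need one "slot" of capacity while transmitting. Those numbers, and the 0.1% overflow tolerance, are made up for illustration. With independent users the count of simultaneously active ones is binomial, so the required capacity can be computed exactly.

```python
from math import comb

# Illustrative assumptions, not figures from Davies's work.
USERS = 100
P_ACTIVE = 0.05           # fraction of time each user is transmitting
OVERFLOW_TARGET = 0.001   # tolerate demand exceeding capacity 0.1% of the time


def slots_needed(users, p_active, overflow_target):
    """Smallest capacity c such that P(more than c users active) <= overflow_target."""
    tail = 1.0  # running value of P(active users > capacity)
    for capacity in range(users + 1):
        p_exactly = (
            comb(users, capacity)
            * p_active ** capacity
            * (1 - p_active) ** (users - capacity)
        )
        tail -= p_exactly
        if tail <= overflow_target:
            return capacity
    return users


circuit_slots = USERS  # circuit switching reserves one slot per user, busy or not
packet_slots = slots_needed(USERS, P_ACTIVE, OVERFLOW_TARGET)
print(f"circuit switching: {circuit_slots} slots; shared link: {packet_slots} slots")
```

Under these assumptions the shared link needs only a small fraction of the capacity that per-user reservation would. A real link also needs queues to absorb the rare moments when demand exceeds capacity, which this sketch ignores.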

Davies also gave it the name that stuck. He rejected Baran’s “message block” as clunky and picked a shorter word: packet.

During the ARPANET design work that began in 1967, Larry Roberts drew on both Baran’s and Davies’s work and built the architecture around their shared idea.

#Anatomy of a Packet

The trick that makes everything else work is that each packet is self-describing. You can hand one to a router that has never seen anything else from your conversation, and it will know exactly what to do with it.

Roughly, every packet has two parts:

┌───────────────────────────────────────────┐
│  Header                                   │
│  ──────                                   │
│  source address                           │
│  destination address                      │
│  sequence number                          │
│  length                                   │
│  checksum                                 │
│  (flags, version, TTL, ...)               │
├───────────────────────────────────────────┤
│  Payload                                  │
│  ───────                                  │
│  up to ~1500 bytes of whatever you like   │
└───────────────────────────────────────────┘

The header is what the network reads. The payload is what the network ignores. That separation is the entire trick. A router looking at a packet doesn’t care whether the payload is email, a video frame, a database query, or a thing that won’t be invented for another thirty years. It looks at the destination address and forwards.
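The header/payload split is easy to show with Python's `struct` module. The layout below is a hypothetical one invented for this post (it is not IPv4 or any real wire format), and the routing table is a plain dictionary with made-up addresses and link names. Real routers match address prefixes rather than exact addresses.

```python
import struct
import zlib

# Hypothetical header layout, network byte order:
# source (4 bytes), destination (4), sequence number (4), payload length (2), checksum (4).
HEADER = struct.Struct("!IIIHI")


def build_packet(source: int, destination: int, seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header describing it."""
    checksum = zlib.crc32(payload)
    return HEADER.pack(source, destination, seq, len(payload), checksum) + payload


def forward(packet: bytes, routing_table: dict) -> str:
    """Choose an outgoing link by reading only the header's destination field."""
    _source, destination, _seq, _length, _checksum = HEADER.unpack_from(packet)
    return routing_table[destination]


# Hypothetical routing table: destination address -> outgoing link name.
table = {0x0A000002: "link-east", 0x0A000003: "link-west"}

email = build_packet(0x0A000001, 0x0A000002, 0, b"Subject: hello")
video = build_packet(0x0A000001, 0x0A000002, 1, bytes([0, 255, 17, 42]))

print(forward(email, table), forward(video, table))
```

`forward` never touches the bytes after the header, so text and binary payloads bound for the same destination are handled identically.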

#Packets Out of Order, Packets Lost

Because each packet travels independently, several things can happen that never happen in a circuit-switched call:

Out-of-order arrival. Packet 3 takes a shorter route and beats packet 2 to the destination.
Duplicates. A sender or link retransmits a packet it thinks was lost; the original shows up anyway.
Loss. A router’s queue overflows; a packet is dropped with no apology.

The network does not try to fix any of this. The endpoints do.

import random

def packetize(message, size=8):
    # Ceiling division: number of fixed-size chunks needed to cover the message.
    total = -(-len(message) // size)
    return [
        {"seq": i, "total": total, "data": message[i*size:(i+1)*size]}
        for i in range(total)
    ]

def simulate_network(packets, drop_rate=0.1):
    # Each packet is dropped independently with probability drop_rate.
    delivered = [p for p in packets if random.random() > drop_rate]
    random.shuffle(delivered)  # different routes, different latencies
    return delivered

def reassemble(packets):
    if not packets:
        return None  # nothing arrived at all; ask for retransmission
    seen = {p["seq"]: p for p in packets}  # dedupe by sequence number
    total = packets[0]["total"]
    if len(seen) < total:
        return None  # something was lost; ask for retransmission
    return "".join(seen[i]["data"] for i in range(total))

msg = "Packet switching makes the network dumb and the endpoints smart."
received = simulate_network(packetize(msg))
print(reassemble(received))  # the message, or None if any packet was dropped

This is the end-to-end principle in miniature: reliability isn’t a property of the network, it’s a property of what the endpoints do with what the network gives them. TCP is the production version of this pattern: sequence numbers, acknowledgments, and retransmission, all implemented at the endpoints.
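The "ask for retransmission" step can be closed into a loop. The sketch below is a deliberate simplification: the function name `send_reliably`, the round-based structure, and the 30% drop rate are all assumptions made for illustration. Real TCP uses acknowledgments, timers, and windows rather than whole rounds.

```python
import random


def send_reliably(chunks, drop_rate=0.3, max_rounds=50, rng=None):
    """Resend whatever the receiver is still missing until everything arrives.

    The index of each chunk is its sequence number.
    Returns (reassembled_data, rounds_used).
    """
    rng = rng or random.Random()
    received = {}                       # receiver state: seq -> data
    missing = set(range(len(chunks)))   # what the receiver still needs
    for round_number in range(1, max_rounds + 1):
        # The sender transmits only the missing pieces; the network drops some.
        arrived = [seq for seq in missing if rng.random() > drop_rate]
        rng.shuffle(arrived)            # arrival order is not guaranteed
        for seq in arrived:
            received.setdefault(seq, chunks[seq])  # a duplicate would be ignored
        missing.difference_update(received)
        if not missing:
            data = "".join(received[i] for i in range(len(chunks)))
            return data, round_number
    raise TimeoutError(f"still missing {sorted(missing)} after {max_rounds} rounds")


message = "Packet switching makes the network dumb and the endpoints smart."
chunks = [message[i:i + 8] for i in range(0, len(message), 8)]
data, rounds = send_reliably(chunks, rng=random.Random(1))
print(data == message, "after", rounds, "round(s)")
```

The simulated network stays exactly as unreliable as before. Only the endpoints changed, and that was enough to make delivery dependable.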

#Why This Abstraction Refuses to Die

Every major networking evolution since 1969 has replaced something — the link layer, the transport protocol, the routing algorithm, the addressing scheme — without replacing packet switching. The reason is that packet switching is a contract, not a technology:

The network doesn’t understand the payload. It never has to be upgraded when applications change. YouTube streaming and SSH and BitTorrent all run over the same IP routers.
The endpoints don’t know the topology. Fiber replaces copper, 5G replaces LTE, satellite links enter the mix — the packet doesn’t care. The application doesn’t care.
State lives at the edges. A router holds no conversation state. Lose a router, another route absorbs the traffic, no session gets dropped. This is Baran’s survivability property, and it’s why the internet has never had a central switch that can be turned off.

Circuit switching still exists for specific workloads — synchronous optical networks in telecom backbones, some forms of MPLS, the dedicated “lanes” a data center might set up between GPUs for training runs. But as the default, packet switching has won everywhere it’s been tried.

The ARPANET post opens with the four-node diagram. That diagram is only interesting if you understand what the IMPs between the nodes were doing. They were moving packets (headers they read, payloads they ignored), and that unremarkable choice is the reason a network designed for twelve computers now serves ten billion.