Workshop · 1,153 words · 5 min read

Docker: Ship the Whole Machine

How containerization solved 'works on my machine' and transformed software delivery forever.

#TL;DR

“It works on my machine” was the eternal excuse. Different operating systems, different library versions, different configurations: the gap between development and production was a constant source of bugs, outages, and finger-pointing. Virtual machines closed that gap, but at the cost of running an entire operating system per application. In 2013, Solomon Hykes and his team at dotCloud launched Docker: a tool that packaged applications and all their dependencies into lightweight, portable containers that shared the host’s kernel but were otherwise isolated. A container typically started in a fraction of a second (not minutes), used megabytes (not gigabytes), and ran the same way on a laptop, a CI server, or a production cluster. Docker didn’t invent containers (Linux had the underlying technology for years), but it made them usable by wrapping cgroups, namespaces, and union filesystems in a simple CLI and a standard image format. Docker transformed how software is built, tested, and deployed, and made the microservices architecture practical.

#The Gap Between Dev and Prod

In 2012, deploying software was a minefield of environmental differences:

  • Your laptop runs macOS. Production runs Ubuntu 12.04.
  • You have Python 2.7. The server has Python 2.6.
  • You installed libssl from Homebrew. The server has a different version from apt.
  • Your app expects a config file at /etc/myapp/config.yaml. The server has it at /opt/config/myapp.yaml.

Each difference was a potential failure mode. The gap between “works on my machine” and “works in production” consumed enormous engineering effort: configuration management tools (Chef, Puppet, Ansible), provisioning scripts, environment-specific build steps, and exhaustive testing matrices.

Virtual machines solved the consistency problem. Package the entire operating system — kernel, libraries, runtime, application — into a VM image, and it runs identically everywhere. But VMs were heavy: each one booted a full OS, consumed gigabytes of RAM, and took minutes to start. Running 50 microservices meant 50 VMs, each with its own Linux kernel, its own memory overhead, and its own boot time.

#Containers: Lightweight Isolation

Docker’s insight was that you didn’t need to virtualize the entire machine. You just needed to isolate the application’s view of the operating system — its filesystem, its processes, its network — while sharing the host’s kernel.

Linux already had the building blocks:

  • namespaces (2002) — isolate what a process can see (its own PID tree, its own network interfaces, its own filesystem mount points)
  • cgroups (2007) — limit what a process can use (CPU, memory, disk I/O)
  • union filesystems (OverlayFS, AUFS) — layer filesystem changes on top of a read-only base image

Docker combined these into a coherent abstraction: the container.
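
To make the namespace idea a little more concrete, here is a minimal Python sketch that lists the namespaces the current process belongs to by reading the symlinks under /proc/self/ns. It assumes a Linux host with /proc mounted; the helper name current_namespaces is made up for this illustration, and on other systems it simply returns an empty dict.

```python
import os


def current_namespaces(proc_ns_dir="/proc/self/ns"):
    """Return {namespace_type: identifier} for the calling process.

    On Linux each entry under /proc/self/ns is a symlink whose target
    looks like 'pid:[4026531836]'. Two processes that show the same
    identifier for a type share that namespace. On systems without
    this directory the function returns an empty dict.
    """
    namespaces = {}
    try:
        entries = sorted(os.listdir(proc_ns_dir))
    except OSError:
        return namespaces  # not Linux, or /proc is not available
    for name in entries:
        try:
            namespaces[name] = os.readlink(os.path.join(proc_ns_dir, name))
        except OSError:
            continue  # entry not readable in this environment
    return namespaces


if __name__ == "__main__":
    for ns_type, ident in current_namespaces().items():
        print(f"{ns_type:10s} {ident}")
```

Run once on the host and once inside a container, the identifiers for types such as pid, net, and mnt would normally differ. That difference is the isolation the list above describes.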

Virtual Machine:                    Container:
┌──────────────┐                   ┌──────────────┐
│   Your App   │                   │   Your App   │
├──────────────┤                   ├──────────────┤
│  Libraries   │                   │  Libraries   │
├──────────────┤                   ├──────────────┤
│  Guest OS    │                   │  (shared     │
│  (full Linux)│                   │   kernel)    │
├──────────────┤                   └──────┬───────┘
│  Hypervisor  │                          │
├──────────────┤                   ┌──────┴───────┐
│   Host OS    │                   │   Host OS    │
└──────────────┘                   └──────────────┘

Size: ~GBs                         Size: ~MBs
Boot: minutes                      Boot: milliseconds

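A container has its own filesystem, its own process tree, and its own network interface, but it shares the host’s Linux kernel. There is no guest OS to boot and no hypervisor overhead, so containers typically start in well under a second and consume only the memory the application actually uses. (On macOS and Windows, Docker runs containers inside a lightweight Linux VM, so the kernel they share is that VM’s.)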

#The Dockerfile

Docker’s most important user-facing innovation was the Dockerfile — a declarative recipe for building a container image:

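# Small Alpine-based image with Node.js 22 preinstalled
FROM node:22-alpine
WORKDIR /app
# Copy only the dependency manifests first so the install layer stays cached
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
# Copy the rest of the source and build it
COPY . .
RUN pnpm build
# Document the listening port; publishing it still requires -p at run time
EXPOSE 3000
CMD ["node", "dist/server.js"]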

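Each instruction is a build step, and the steps that change the filesystem (RUN, COPY, ADD) each produce a layer. Docker caches these layers in order: if package.json and pnpm-lock.yaml haven’t changed, the pnpm install layer is reused from cache. The first step whose inputs changed is rebuilt, along with every step after it; everything before it comes from cache. This keeps rebuilds fast, and because unchanged layers are shared between images, it also saves storage and transfer.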
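
The caching rule can be modeled as a hash chain. The Python sketch below is a simplified model, not Docker's actual algorithm: the function names and the (instruction, input_bytes) representation are assumptions made for illustration. Each step's key depends on the previous key, so a change at one step invalidates that step and everything after it.

```python
import hashlib


def layer_cache_keys(steps):
    """Compute one cache key per build step.

    `steps` is a list of (instruction, input_bytes) pairs, where
    input_bytes stands in for the contents of any files the step copies
    (empty for steps that copy nothing). Each key hashes the previous
    key together with the step, so a change at step i alters the key of
    step i and of every step after it.
    """
    keys = []
    parent = b""
    for instruction, input_bytes in steps:
        digest = hashlib.sha256()
        digest.update(parent)
        digest.update(instruction.encode("utf-8"))
        digest.update(hashlib.sha256(input_bytes).digest())
        parent = digest.digest()
        keys.append(digest.hexdigest())
    return keys


def first_rebuilt_step(old_steps, new_steps):
    """Index of the first step whose cache key differs, or None if all hit."""
    old_keys = layer_cache_keys(old_steps)
    new_keys = layer_cache_keys(new_steps)
    for index, new_key in enumerate(new_keys):
        if index >= len(old_keys) or new_key != old_keys[index]:
            return index
    return None


if __name__ == "__main__":
    before = [
        ("FROM node:22-alpine", b""),
        ("COPY package.json pnpm-lock.yaml ./", b"lockfile v1"),
        ("RUN corepack enable && pnpm install --frozen-lockfile", b""),
        ("COPY . .", b"source v1"),
        ("RUN pnpm build", b""),
    ]
    after = list(before)
    after[3] = ("COPY . .", b"source v2")  # only application code changed
    print(first_rebuilt_step(before, after))  # prints 3
```

In this model a source-only change leaves the dependency install step cached, which is why the Dockerfile copies the manifests before the rest of the code.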

# Build an image
docker build -t myapp:latest .

# Run a container from it
docker run -p 3000:3000 myapp:latest

# Same image, same behavior, anywhere Docker runs

The Dockerfile is both a build script and documentation. Reading it tells you exactly what the application needs: which base OS, which runtime, which dependencies, which ports, which startup command. No more tribal knowledge about how to set up the environment.

#Images and Registries

A Docker image is a read-only template — a stack of filesystem layers that contains everything an application needs. A container is a running instance of an image, with a writable layer on top for runtime changes.

Image layers (read-only):           Container (running):
┌─────────────────────┐            ┌─────────────────────┐
│ Layer 4: app code   │            │ Writable layer      │ ← runtime changes
├─────────────────────┤            ├─────────────────────┤
│ Layer 3: npm install│            │ Layer 4: app code   │
├─────────────────────┤            ├─────────────────────┤
│ Layer 2: Node.js    │            │ Layer 3: npm install│
├─────────────────────┤            ├─────────────────────┤
│ Layer 1: Alpine     │            │ Layer 2: Node.js    │
│         Linux       │            ├─────────────────────┤
└─────────────────────┘            │ Layer 1: Alpine     │
                                   └─────────────────────┘
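
The relationship between read-only image layers and a container's writable layer can be sketched with plain dictionaries. This is a toy model of union-filesystem lookup, not how OverlayFS is implemented; the class name UnionView and the deletion marker are invented for the example.

```python
_DELETED = object()  # marker playing the role of a whiteout entry


class UnionView:
    """A read-only stack of layers plus one writable layer on top."""

    def __init__(self, image_layers):
        # image_layers: list of dicts mapping path -> content, bottom first.
        # The dicts themselves are shared, never copied or modified.
        self._image_layers = list(image_layers)
        self._writable = {}

    def read(self, path):
        # Search the writable layer first, then image layers top to bottom.
        for layer in [self._writable, *reversed(self._image_layers)]:
            if path in layer:
                content = layer[path]
                if content is _DELETED:
                    raise FileNotFoundError(path)
                return content
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Writes never touch the image; they land in the writable layer.
        self._writable[path] = content

    def delete(self, path):
        # Hide the path without modifying the lower layers.
        self._writable[path] = _DELETED


if __name__ == "__main__":
    image = [
        {"/etc/os-release": "alpine"},
        {"/usr/bin/node": "node binary"},
        {"/app/server.js": "v1"},
    ]
    first, second = UnionView(image), UnionView(image)
    first.write("/app/server.js", "patched at runtime")
    print(first.read("/app/server.js"))   # patched at runtime
    print(second.read("/app/server.js"))  # v1
```

Two containers started from the same image share every read-only layer, and a runtime change in one is invisible to the other.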

Docker Hub (and later GitHub Container Registry, Amazon ECR, Google Artifact Registry) became the package manager for infrastructure. Push an image with docker push, pull it anywhere with docker pull. The same image that passed CI runs in staging and production. The artifact is the deployment.
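
One way to check that "the same image" really is the same is to compare content digests rather than tags, since a tag like latest can be moved to a different image. The Python sketch below is a simplification: registries compute a SHA-256 digest over the image manifest, while here arbitrary bytes stand in for it, and both function names are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return a digest string in the 'sha256:<hex>' form used for image digests."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def is_same_artifact(expected_digest: str, data: bytes) -> bool:
    """True if `data` hashes to the digest recorded earlier (for example by CI)."""
    return content_digest(data) == expected_digest


if __name__ == "__main__":
    manifest_from_ci = b'{"layers": ["base", "deps", "app"]}'  # stand-in bytes
    recorded = content_digest(manifest_from_ci)
    print(is_same_artifact(recorded, manifest_from_ci))            # True
    print(is_same_artifact(recorded, manifest_from_ci + b" "))     # False
```

In practice the same idea shows up as referencing an image by name@sha256:<hex> instead of by tag when pulling or deploying.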

#Docker Compose: Multi-Container Applications

Real applications aren’t a single process. They have a web server, a database, a cache, a message queue. Docker Compose let you define multi-container applications in a single YAML file:

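services:
  web:
    build: .
    ports:
      - "3000:3000"
    environment:
      # Placeholder credentials for local development only
      DATABASE_URL: postgres://postgres:example@db:5432/myapp
    depends_on:
      - db
      - redis

  db:
    image: postgres:16-alpine
    environment:
      # The official postgres image requires a superuser password on first start
      POSTGRES_PASSWORD: example
      POSTGRES_DB: myapp
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

volumes:
  pgdata: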

docker compose up starts everything. Every developer gets an identical environment — the same database version, the same Redis version, the same network topology. The “it works on my machine” problem wasn’t just solved for the application. It was solved for the entire infrastructure stack.
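
The depends_on entries describe a dependency graph, and a start order can be derived from it with a topological sort. The Python sketch below illustrates that idea using the standard library; it is not a description of how Compose is implemented, and the function name start_order is made up here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return service names ordered so that dependencies come first.

    `depends_on` maps each service to the services it depends on,
    mirroring the depends_on keys in the Compose file above.
    """
    sorter = TopologicalSorter(depends_on)
    return list(sorter.static_order())


if __name__ == "__main__":
    services = {
        "web": ["db", "redis"],
        "db": [],
        "redis": [],
    }
    # db and redis come before web; their relative order is not significant.
    print(start_order(services))
```

Note that the short list form of depends_on only controls the order in which containers are started. It does not wait for the database to be ready to accept connections, so the web service should still retry its first connection or rely on a health check.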

#The Microservices Enabler

Docker didn’t invent microservices, but it made them practical. Before containers, deploying 50 independent services meant managing 50 sets of dependencies, 50 deployment scripts, and 50 potential environment mismatches. With Docker, each service was a self-contained image:

Monolith (pre-Docker):              Microservices (with Docker):
┌─────────────────────┐            ┌─────┐ ┌─────┐ ┌─────┐
│                     │            │Auth │ │Cart │ │Pay  │
│    One big app      │            │ API │ │ API │ │ API │
│    One big deploy   │            └──┬──┘ └──┬──┘ └──┬──┘
│    One big problem  │               │       │       │
│                     │            ┌──┴──┐ ┌──┴──┐ ┌──┴──┐
└─────────────────────┘            │ DB  │ │Redis│ │ DB  │
                                   └─────┘ └─────┘ └─────┘

Each service could use different languages, different frameworks, different versions — as long as it exposed an API and fit in a container. Teams could deploy independently, scale independently, and fail independently. Docker was the packaging format that made this decomposition manageable.

The microservices explosion created its own problems — service discovery, distributed tracing, cascading failures — which led to orchestration platforms like Kubernetes (2014). But Kubernetes orchestrates containers. Docker provided the container abstraction that Kubernetes builds on.

#What Docker Got Right

Docker didn’t invent any of its underlying technologies. cgroups, namespaces, and union filesystems existed for years. Docker’s contribution was packaging and usability:

  • The image format — a standard, portable, layered format for packaging applications. Before Docker, every deployment tool had its own format (AMIs, RPMs, tarballs, Vagrant boxes). Docker images became the universal packaging format: build once, run anywhere Docker runs. This is the same achievement as the shipping container in physical logistics — the standard box that made global trade efficient.
  • The developer experience: docker build, docker run, docker push. Three commands to go from source code to deployable artifact to shared registry. The Dockerfile made environment setup reproducible and version-controlled. Docker Compose made multi-service development local. The CLI was simple enough that developers who knew nothing about Linux internals could use containers effectively.
  • Immutable infrastructure — Docker shifted the deployment model from “update the server in place” to “build a new image and replace the old one.” Servers became disposable. Rollbacks meant running the previous image. Debugging meant inspecting the exact image that was running. This immutability eliminated an entire class of “configuration drift” bugs.
  • The bridge to cloud-native — Docker containers became the standard unit of deployment for cloud platforms. AWS ECS, Google Cloud Run, Azure Container Instances, Fly.io — they all run Docker images. The container abstraction decoupled applications from infrastructure providers, making workloads portable across clouds in a way that VM images never achieved.

Solomon Hykes described Docker’s goal as making the internet’s infrastructure “programmable.” Before Docker, infrastructure was configured — painstakingly, imperatively, differently on every machine. After Docker, infrastructure was defined — declaratively, reproducibly, in version-controlled files that could be reviewed, tested, and shared like code. The container didn’t just change how software is deployed. It changed how software is thought about: as a self-contained unit, portable by default, identical everywhere it runs.