Background Knowledge
CGI: The First Dynamic Web
A simple standard let web servers run programs instead of just serving files — turning static pages into interactive applications and opening the door to everything that followed.
TL;DR
The early web was a library — you could read what was there, but you couldn’t interact with it. Every page was a file sitting on a disk, and all the server did was hand it to you. The Common Gateway Interface, formalized at NCSA in 1993, changed that with a breathtakingly simple idea: when a request comes in, run a program instead of serving a file, and send the program’s output back as the response. Any language that could read environment variables and print to stdout could generate a web page. Perl scripts, C programs, shell scripts — they all worked. CGI gave us hit counters, guest books, search engines, and shopping carts. It was slow, insecure, and eventually replaced by everything from PHP to Node.js. But every server-side framework alive today is a descendant of CGI’s core insight: the web page doesn’t have to exist until someone asks for it.
A Library You Can’t Talk To
By 1993, the web had around 500 servers. Every one of them did the same thing: receive an HTTP request, find the matching file on disk, send it back. The server was a file clerk.
Browser: GET /about.html
Server: (opens /var/www/about.html, sends contents)
This worked for CERN’s physics papers. It did not work for anything that required a response tailored to the person asking. You couldn’t search. You couldn’t log in. You couldn’t submit a form and get an answer. The web was read-only.
HTML had <form> tags — Berners-Lee had included them from the start — but there was no standard way for a server to do something with the data a form submitted. The protocol could carry user input. The server had no idea what to do with it.
“Just Run a Program”
The fix came from the National Center for Supercomputing Applications (NCSA) at the University of Illinois — the same lab that built NCSA Mosaic, the browser that brought the web to the mainstream.
The NCSA httpd team, including Rob McCool, needed a way for their web server to run external programs. The approach they formalized was almost comically simple:
- The server receives a request for a special URL (typically under /cgi-bin/)
- Instead of looking for a file, the server runs a program
- The server passes request data to the program via environment variables and stdin
- The program writes its response to stdout
- The server sends that output back to the browser
That’s it. That’s CGI.
No special API. No library. No SDK. If your program could read an environment variable and print a string, it could power a web page.
#!/bin/bash
# The simplest CGI script: a dynamic page in five echo statements
echo "Content-Type: text/html"
echo ""
echo "<html><body>"
echo "<h1>Hello! The time is $(date)</h1>"
echo "</body></html>"
Every time a browser hit this script, it got the current time. The page didn’t exist as a file. It was generated on demand. The web was no longer a library. It was a machine.
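The server's half of that exchange can be sketched in a few lines too. This is a minimal illustration in Python, not how NCSA httpd was written: the function name run_cgi and the inline stand-in script are hypothetical, and a real server sets many more variables. It shows the three moving parts: put request data in the environment, feed the body to stdin, and read the headers and body back from stdout.

```python
import os
import re
import subprocess
import sys

def run_cgi(argv, method="GET", query_string="", body=b""):
    """Run a CGI program roughly the way a server would; return (headers, body)."""
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "CONTENT_LENGTH": str(len(body)) if body else "",
    })
    # Request data goes in through the environment and stdin;
    # the response comes back on stdout.
    proc = subprocess.run(argv, env=env, input=body,
                          stdout=subprocess.PIPE, check=True)
    # A CGI response is header lines, one blank line, then the body.
    parts = re.split(rb"\r?\n\r?\n", proc.stdout, maxsplit=1)
    payload = parts[1] if len(parts) > 1 else b""
    headers = {}
    for line in parts[0].decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, payload

if __name__ == "__main__":
    # Hypothetical stand-in for a script under /cgi-bin/.
    script = ('import os; print("Content-Type: text/plain"); print(); '
              'print("q=" + os.environ["QUERY_STRING"])')
    headers, payload = run_cgi([sys.executable, "-c", script],
                               query_string="hello")
    print(headers["content-type"], payload.strip())
```

Any executable could take the place of the inline script, which is the point: the server only needs to know how to start a process and read its output.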
The Environment Variable API
CGI’s interface was the Unix environment. Before launching the program, the server set a collection of variables describing everything about the incoming request:
#!/usr/bin/env perl
# A CGI script that echoes back what the server knows about you
print "Content-Type: text/html\n\n";
print "<html><body><pre>\n";
# The server sets these before your script runs
print "REQUEST_METHOD: $ENV{REQUEST_METHOD}\n"; # GET, POST, etc.
print "QUERY_STRING: $ENV{QUERY_STRING}\n"; # everything after ?
print "REMOTE_ADDR: $ENV{REMOTE_ADDR}\n"; # client's IP
print "HTTP_USER_AGENT: $ENV{HTTP_USER_AGENT}\n"; # browser identity
print "CONTENT_LENGTH: $ENV{CONTENT_LENGTH}\n"; # size of POST body
print "PATH_INFO: $ENV{PATH_INFO}\n"; # extra path after script
print "</pre></body></html>\n";
For GET requests, user input came through QUERY_STRING — the part after the ? in the URL. For POST requests, the body arrived on stdin, and the script read CONTENT_LENGTH bytes from it.
GET /cgi-bin/search?q=hello+world HTTP/1.0
→ QUERY_STRING = "q=hello+world"
→ REQUEST_METHOD = "GET"
POST /cgi-bin/login HTTP/1.0
Content-Length: 30
username=alice&password=secret
→ CONTENT_LENGTH = "30"
→ REQUEST_METHOD = "POST"
→ stdin contains "username=alice&password=secret"
This was the entire API. No objects, no frameworks, no request parsers. Parse the environment variables yourself. Build the HTML string yourself. Print it to stdout. Done.
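Here is a rough sketch of that parsing step in Python, covering both cases described above. The helper name read_form is hypothetical, and the sketch assumes the body is ordinary URL-encoded form data (not a file upload). The environment and input stream are passed in as parameters only so the function can be tried without a web server.

```python
import io
import os
import sys
from urllib.parse import parse_qs

def read_form(environ=os.environ, stdin=None):
    """Return submitted fields as a dict of lists, for GET or POST."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        # Read exactly CONTENT_LENGTH bytes: the server is not
        # guaranteed to close stdin after the body.
        raw = stream.read(length).decode("utf-8", errors="replace")
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs turns '+' into spaces and decodes %XX escapes.
    return parse_qs(raw, keep_blank_values=True)

if __name__ == "__main__":
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "q=hello+world"}
    print(read_form(get_env))

    body = b"username=alice&password=secret"
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    print(read_form(post_env, io.BytesIO(body)))
```

Values come back as lists because a form may legitimately repeat a field name, a detail that hand-rolled parsers of the era often got wrong.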
Perl: The Language of Early CGI
You could write CGI in any language: C, Python, Tcl, even shell scripts. But Perl became the lingua franca of CGI programming. It had powerful text processing and regular expressions baked into the syntax, and it was already installed on nearly every Unix server.
#!/usr/bin/env perl
use strict;
use warnings;
# Parse form data from QUERY_STRING
my %params;
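for my $pair (split /&/, $ENV{QUERY_STRING} || "") {
    my ($key, $val) = split /=/, $pair, 2;
    next unless defined $key && length $key;
    $val = "" unless defined $val;    # "?name" with no "=" leaves $val undefined
    for ($key, $val) {
        tr/+/ /;                                 # "+" encodes a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;     # URL decode
    }
    $params{$key} = $val;
}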
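my $name = $params{name} || "stranger";
# Escape HTML metacharacters so the name cannot inject markup into the page
$name =~ s/&/&amp;/g;
$name =~ s/</&lt;/g;
$name =~ s/>/&gt;/g;
$name =~ s/"/&quot;/g;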
print "Content-Type: text/html\n\n";
print "<html><body>\n";
print "<h1>Welcome, $name!</h1>\n";
print "<form method='get'>\n";
print " <input name='name' placeholder='Your name'>\n";
print " <button type='submit'>Greet me</button>\n";
print "</form>\n";
print "</body></html>\n";
This tiny script is a complete web application: it renders a form, reads user input, and responds with personalized content. The cgi-lib.pl and later CGI.pm libraries saved developers from writing their own URL decoders, but at its core, every Perl CGI script was the same: parse input, generate HTML, print to stdout.
By 1996, the joke was that the web ran on Perl and duct tape. It wasn’t entirely a joke.
What CGI Built
The first generation of interactive web experiences all ran on CGI:
Hit counters — the “you are visitor #4,523” badges on every personal homepage. A script that read a number from a file, incremented it, wrote it back, and returned an image.
Guest books — visitors left messages that were appended to a file and displayed on a page. The first user-generated content on the web.
Search engines — when you typed a query into AltaVista or early Yahoo, a CGI script parsed your input, searched an index, and built a results page on the fly. The URL even showed it: search.cgi?q=your+query.
Form processing — contact forms, survey tools, email gateways. Matt Wright’s Matt’s Script Archive provided free Perl CGI scripts that powered thousands of sites. His FormMail.pl alone was on millions of servers (and became one of the most exploited scripts in history — more on that shortly).
Shopping carts — the first e-commerce sites tracked items in hidden form fields or flat files, all processed by CGI scripts.
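The hit counter is the easiest of these to reconstruct. The sketch below, in Python, follows the read, increment, write-back cycle described above, with one addition the simplest scripts lacked: a file lock. Because every request ran as its own process, two visitors arriving together could both read the same number and lose a count. The function name bump_counter and the file name are hypothetical, the response is plain text rather than an image, and fcntl makes this Unix-only.

```python
import fcntl
import os
import tempfile

def bump_counter(path):
    """Increment the integer stored in `path` and return the new value."""
    # O_CREAT lets the first visitor create the file; no truncation on open.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Serialize the read-modify-write across concurrent CGI processes.
        fcntl.flock(f, fcntl.LOCK_EX)
        text = f.read().strip()
        count = int(text) + 1 if text else 1
        f.seek(0)
        f.truncate()
        f.write(str(count))
        f.flush()
        # The lock is released when the file is closed.
    return count

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "hits.txt")  # hypothetical location
    visitor = bump_counter(path)
    print("Content-Type: text/plain")
    print()
    print(f"You are visitor #{visitor}")
```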
The Fork-Per-Request Problem
CGI had a fatal performance flaw: every request spawned a new process.
When a browser hit /cgi-bin/search.pl, the server started a fresh Perl interpreter, loaded the script, parsed the modules, ran the code, sent the output, and killed the process. For the next request — even from the same user, even for the same page — the entire cycle repeated.
Request 1 → fork → start perl → load modules → run → respond → exit
Request 2 → fork → start perl → load modules → run → respond → exit
Request 3 → fork → start perl → load modules → run → respond → exit
On a Unix system in 1994, forking a process and starting a Perl interpreter took maybe 50–100 milliseconds. Fine for a site with 10 visitors. Catastrophic for a site with 10,000. Each concurrent request consumed a separate process with its own memory space. A popular CGI page could bring a server to its knees.
FastCGI (1996) fixed the worst of this: keep the program running as a persistent process, send requests to it over a socket, and reuse it across requests. mod_perl went further by embedding a Perl interpreter inside the Apache web server itself, eliminating the fork entirely. These were bridges to the next generation — but the core lesson was clear: one process per request doesn’t scale.
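The difference is easy to feel on a modern machine. The following Python sketch compares the two models: one handler starts a fresh interpreter per request, the other calls a function in a process that is already running. All names here are hypothetical and the absolute numbers depend entirely on your hardware, so treat it as an illustration of the shape of the problem rather than a benchmark.

```python
import subprocess
import sys
import time

def render(name):
    """The 'application': build a response body."""
    return f"Hello, {name}!"

def handle_by_spawning(name):
    """CGI style: start a fresh interpreter for every request."""
    code = "import sys; print('Hello, ' + sys.argv[1] + '!')"
    out = subprocess.run([sys.executable, "-c", code, name],
                         stdout=subprocess.PIPE, text=True, check=True)
    return out.stdout.strip()

def handle_in_process(name):
    """Persistent style: the interpreter is already loaded and waiting."""
    return render(name)

def time_requests(handler, n):
    """Average seconds per request over n sequential requests."""
    start = time.perf_counter()
    for i in range(n):
        handler(f"visitor{i}")
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    spawn = time_requests(handle_by_spawning, 20)
    persistent = time_requests(handle_in_process, 20)
    print(f"process per request: {spawn * 1000:.3f} ms")
    print(f"persistent process:  {persistent * 1000:.3f} ms")
```

Both handlers return the same bytes. The only thing that changes is how much work happens before the first line of application code runs.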
The Security Disaster
CGI scripts were among the first programs written by non-experts and exposed directly to the internet. The results were predictably terrible.
The fundamental problem: user input came in as raw strings, and developers passed those strings straight into shell commands, file paths, and HTML output.
# DANGEROUS: classic CGI vulnerability
my $filename = $ENV{QUERY_STRING};
open(my $fh, "<", "/data/$filename"); # directory traversal!
# GET /cgi-bin/read.pl?../../etc/passwd → reads the system password file
# DANGEROUS: command injection
my $domain = $params{domain};
my $result = `nslookup $domain`; # shell injection!
# domain=; rm -rf / → executes arbitrary commands
# DANGEROUS: no output escaping
print "<p>You searched for: $query</p>"; # XSS!
# query=<script>...</script> → attacker's script runs in the victim's browser and can read document.cookie
Many of the vulnerability classes that OWASP still tracks today (injection, XSS, path traversal, broken access control) were being exploited in CGI scripts in the 1990s. The web’s security lessons were written in Perl.
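For contrast, here is one way the same three operations can be written defensively, sketched in Python. The helper names, the hostname pattern, and the /data root (borrowed from the example above) are assumptions for illustration, and each fix is a common mitigation rather than a complete defense.

```python
import html
import os
import re
import subprocess
import tempfile

DATA_DIR = "/data"  # hypothetical root, matching the vulnerable example

def safe_path(filename, base=DATA_DIR):
    """Resolve filename under base, refusing anything that escapes it."""
    base = os.path.realpath(base)
    full = os.path.realpath(os.path.join(base, filename))
    if os.path.commonpath([base, full]) != base:
        raise ValueError("path escapes data directory")
    return full

# Conservative allow-list: letters, digits, dots, hyphens; no leading hyphen.
HOSTNAME = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?")

def lookup_command(domain):
    """Build an argument list for nslookup from validated input."""
    if not HOSTNAME.fullmatch(domain):
        raise ValueError("not a hostname")
    return ["nslookup", domain]

def lookup(domain):
    """Run nslookup with an argument list, so no shell parses the input."""
    return subprocess.run(lookup_command(domain),
                          capture_output=True, text=True).stdout

def search_echo(query):
    """Escape user input before placing it in HTML."""
    return f"<p>You searched for: {html.escape(query)}</p>"

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    print(safe_path("notes.txt", base))
    print(lookup_command("example.com"))
    print(search_echo("<script>document.cookie</script>"))
```

The common thread is that input is never trusted to mean what the developer hoped: paths are resolved and checked, commands are passed as argument lists, and output is escaped at the point where it enters HTML.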
What CGI Got Right
CGI was replaced, but its ideas never were:
- Language independence — CGI didn’t care what language you wrote in. This principle lives on: Docker containers, serverless functions, and microservices all share CGI’s core abstraction of “receive a request, run some code, return a response.” AWS Lambda is, in a very real sense, CGI with better infrastructure.
- Stdin/stdout as interface — by using Unix’s most basic I/O primitives, CGI required zero special tooling. No SDK, no framework, no library. This made it accessible to anyone who could write a script, which is why the web exploded with amateur-built interactive sites in the mid-1990s.
- The request/response cycle — every server-side web framework still works the same way: parse the incoming request, do some processing, return a response. Rails controllers, Express handlers, Django views — they’re all structured around the lifecycle CGI established.
- The URL as function call — CGI mapped URLs to executable code.
/cgi-bin/search?q=hello is a function invocation: call search with argument q=hello. This pattern became REST, became GraphQL, became every API you’ve ever used.
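The URL-as-function-call idea can be made literal in a few lines of Python. In this sketch the route table, the two handlers, and the status strings are all hypothetical. The path selects a function and the query string becomes its keyword arguments.

```python
from urllib.parse import parse_qsl, urlsplit

def search(q=""):
    return f"results for {q!r}"

def greet(name="stranger"):
    return f"Welcome, {name}!"

# Hypothetical route table: URL path -> function to call.
ROUTES = {
    "/cgi-bin/search": search,
    "/cgi-bin/greet": greet,
}

def call_url(url):
    """Treat a URL as a function call: the path picks the function,
    the query string supplies its keyword arguments."""
    parts = urlsplit(url)
    handler = ROUTES.get(parts.path)
    if handler is None:
        return "404 Not Found"
    kwargs = dict(parse_qsl(parts.query))
    try:
        return handler(**kwargs)
    except TypeError:
        # Unknown parameter names do not match the function's signature.
        return "400 Bad Request"

if __name__ == "__main__":
    print(call_url("/cgi-bin/search?q=hello"))
    print(call_url("/cgi-bin/greet"))
    print(call_url("/cgi-bin/missing"))
```

Modern routers add path parameters, HTTP methods, and type conversion, but the lookup-then-call shape is the same.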
CGI’s successor wasn’t a single technology — it was an entire ecosystem. PHP embedded the script inside the HTML. Java Servlets kept a long-running process. mod_perl eliminated the fork. Each one fixed a specific CGI limitation while keeping its core model. The web page that’s built on the fly, the server that runs code in response to a click — that’s CGI’s permanent contribution. Every dynamic page you’ve ever loaded is a descendant.