What Happens When You Steal Public Records and Make Them Public


In August, I wrote about a bunch of geeks who built a tool to borrow/steal court records from a paywalled government database. They're pulling the records gradually, in a diffuse way. But when the courts briefly opened that same database, PACER, to law libraries without charge earlier this year, 22-year-old programmer Aaron Swartz decided to try to snag the records all at once:

He visited one of the libraries — the 7th U.S. Circuit Court of Appeals library in Chicago — and installed a small PERL script he'd written. The code cycled sequentially through case numbers, requesting a new document from PACER every three seconds. In this manner, Swartz got nearly 20 million pages of court documents, which his script uploaded to Amazon's EC2 cloud computing service.

The script ran for a couple of weeks — from September 4 to 22, until the court system's IT department realized something was wrong. Someone was downloading everything.

And so, for making public records public, Swartz became an object of interest to the FBI:

The FBI ran Swartz through a full range of government databases starting in February, and drove by his home…The feds also checked Swartz's Facebook page, ran his name against the Department of Labor to figure out his work history, looked for outstanding warrants and prior convictions, checked to see if his mobile phone number had ever come up in a federal wiretap or pen register, and checked him against the records in a private data broker's database.

They decided not to stake out his house because "any surveillance, the agent concluded, would be conspicuous, since so few cars were parked on Swartz's dead-end street in Highland Park, Illinois."