
Uppity Young Geek Steals Legitimately Downloads Millions of Documents. Again.


Aaron Swartz's smirky* visage has appeared on Hit&Run before, when he took it upon himself to download millions of pages of public court records from the PACER database—which generally charges 8 cents per page—using a free law library account and then make them public in 2009. The FBI looked into the incident, decided they couldn't make anything stick and let the former Reddit co-owner and digital activist go free. 

Now the transparency vigilante has done it again. Only this time he wound up in handcuffs, facing a $1 million fine and/or 35 years in prison, and the hows and whys of this case are much murkier than those of his last caper:

The grand jury indictment accuses Swartz of evading MIT's attempts to kick his laptop off the network while downloading more than four million documents from JSTOR, a not-for-profit company that provides searchable, digitized copies of academic journals. The scraping, which took place from September 2010 to January 2011 via MIT's network, was invasive enough to bring down JSTOR's servers on several occasions.

Swartz had to sneak in and out of server closets to do the JSTOR scraping, which makes the whole thing seem seedier and more illegal. But the differences between the PACER grab and the JSTOR grab are less about how he got the data than about what kind of data it was and what he did with it. In both cases, he seems to have done little more than violate the sites' Terms of Service to get the docs, a move that has been greeted with varying degrees of condemnation by the courts. But as Ars Technica explains:

There's an important difference between PACER and JSTOR. As works of the federal government, PACER documents are in the public domain. In contrast, many JSTOR documents are protected by copyright. The PACER documents Swartz downloaded are now available for download. Distributing the JSTOR documents, in contrast, would be a clear case of copyright infringement.

Whether he intended to release all those JSTOR documents, many of which are copyrighted, as he released the public-domain PACER data is far from clear, although a peek at Swartz's résumé suggests that all parties would be right to be suspicious. Still:

His history includes a study co-authored with Shireen Barday, which looked through thousands of law review articles looking for law professors who had been paid by industry patrons to write papers. That study was published in 2008 in the Stanford Law Review.

JSTOR says it got the docs back from Swartz and isn't behind the prosecution. Meanwhile, Swartz is out on $100,000 bail.

More on PACER and transparency chic here.

*As a lifelong smirker myself, I do not mean to use the term derogatorily here.