Data Scraping Is Not a Crime
South Carolina's NAACP and ACLU are challenging the state's ban on automated data collection.

South Carolina has the highest eviction rate in the country, and the state chapter of the NAACP wanted to find out why. Given the difficulty of tracking down every case by hand, the organization hoped to use a software program called a "scraper" to collect data from South Carolina's online repository of legal filings.
Researchers, academics, and investigative journalists frequently use scrapers to automate this kind of laborious, large-scale project. But the South Carolina Court Administration categorically bans such automated data collection.
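For readers who have never seen one, a scraper can be quite simple. Here is a minimal sketch in Python; the URL and the page structure it assumes are hypothetical, not the actual layout of South Carolina's repository:

```python
# Minimal scraper sketch. The URL and page structure are hypothetical;
# requires the third-party "requests" and "beautifulsoup4" packages.
import requests
from bs4 import BeautifulSoup

INDEX_URL = "https://courts.example.gov/filings"  # hypothetical endpoint

def fetch_case_links(page_number):
    """Download one index page and return the filing links it lists."""
    response = requests.get(INDEX_URL, params={"page": page_number}, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Assumes each filing appears as an <a class="case-link"> element.
    return [link["href"] for link in soup.select("a.case-link")]

if __name__ == "__main__":
    for page in range(1, 4):  # first few index pages, as a demonstration
        for href in fetch_case_links(page):
            print(href)
```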
Now the American Civil Liberties Union (ACLU) of South Carolina and the South Carolina NAACP are challenging the state's scraping ban in federal court. In a lawsuit they filed in the U.S. District Court for the District of South Carolina in March, the groups argue that the policy unreasonably restricts their First Amendment rights. "This case is about ensuring core First Amendment principles, like the right to access public court filings, are applied in a way that meets our rapidly expanding digital reality," Allen Chaney, the ACLU of South Carolina's legal director, said in a press release.
The NAACP says collecting eviction filings would allow it to research the issue and contact affected tenants to ensure they have meaningful access to the courts. But scraping has numerous other legitimate uses.
In 2018, for example, I wanted to find out how often Texas police used a loophole in the state's public record law to hide information on deaths in custody. So I wrote code to scrape more than 300,000 pages of public-record rulings that the Texas Attorney General's Office had posted on its website. Then I filtered the results for those that cited the specific provision I was investigating.
That would have been impossible without a bot to do the heavy lifting. By scraping data, I identified more than 80 cases in which Texas police withheld information about deaths in custody from families, lawyers, and journalists.
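My actual code isn't reproduced here, but the filtering step might look something like this sketch; the directory layout and the citation string are placeholders:

```python
# Sketch of the filtering step: scan saved ruling pages for a specific
# statutory citation. The directory layout and citation string are
# placeholders, not the code actually used for the Texas project.
from pathlib import Path

CITATION = "552.108"  # placeholder provision number

def find_citing_rulings(ruling_dir):
    """Return saved ruling files that mention the target citation."""
    matches = []
    for path in Path(ruling_dir).glob("*.html"):
        if CITATION in path.read_text(errors="ignore"):
            matches.append(path)
    return matches

if __name__ == "__main__":
    hits = find_citing_rulings("rulings")
    print(f"{len(hits)} rulings cite the provision")
```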
The South Carolina lawsuit is the latest challenge to state anti-hacking laws and the federal Computer Fraud and Abuse Act (CFAA). The U.S. Court of Appeals for the 9th Circuit issued a landmark ruling in April holding that scraping publicly available data from websites does not constitute "unauthorized access" under the CFAA. While it's true that scrapers can bog down websites, ethical coders add courtesy delays to their programs to avoid that problem and include identifying information in their HTTP requests so that website administrators can contact them.
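Those courtesies amount to a few lines of code. A sketch, with hypothetical contact details:

```python
# Sketch of the courtesies described above: a pause between requests and
# an identifying User-Agent so administrators know who is scraping and
# how to reach them. The contact details are hypothetical.
import time
import requests

HEADERS = {
    "User-Agent": "eviction-research-scraper/1.0 (contact: researcher@example.org)"
}
COURTESY_DELAY_SECONDS = 2.0  # pause between requests to avoid load spikes

def polite_get(url):
    """Fetch a URL, then wait so the server isn't hammered."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    time.sleep(COURTESY_DELAY_SECONDS)
    return response
```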
Banning scrapers is not about preventing unauthorized hacking. It just makes it harder for the public to know what the government is doing.
It’s ok when we do it.
"scraping publicly available data" is not hacking. But that does not mean all the data is free to use. Copyright does exist for most information online.
What do you mean by "free to use"? That seems to defy logic. Why publicly post data online that cannot be used by the public?
Clicks, dude, clicks.
Don't you know how the internet works?
Right. I briefly forgot the internet was for clicks, porn and government surveillance.
And non-fungibles. Get with the times!
Oh, I may have missed that while I was looking at wedding photos that show too much.
As a general principle, that's true. In this specific instance, that's utterly irrelevant. Court documents are public records by definition. They are not and cannot ever be copyrighted.
Fair use is a thing. Copyright doesn't mean complete control over information by the holder. In general you can cite copyrighted material in research.
We are talking about government data here. US copyright law specifically prohibits government copyrights in government work product.
That includes software, right?
Well, well.
Common sense laws relating to the first amendment are different from laws relating to the second amendment.
What if you had to get a permit from the sheriff to do data scraping? We can call it 'high capacity reading'. Just pay for a background check and fingerprinting, and then file an application with the appropriate fees. After the requisite delays, you might get the permit. If not, no appeal. Of course, you can't use certain kinds of programs to do the scraping, no matter how efficient, only the ones approved by the state.
Doesn't that sound better than allowing just anyone to execute programs that run amok through the public data?
I'm really not sure how data scraping falls under the 1st Amendment anyway. Is this one of those "penumbra" thingies?
I don't either, but that is the position of the ACLU and their co-conspirators, so that is how I snarked.
Free press? If you can't collect information, you can't do journalism. Whether the information is collected by reading a piece of paper or making a piece of software that reads lots of documents shouldn't matter.
What caliber is the bot, and how many data points does it scrape per minute?
Common sense scraping control. Nobody needs that much data.
"In 2018, for example, I wanted to find out how often Texas police used a loophole in the state's public record law to hide information on deaths in custody. So I wrote code to scrape more than 300,000 pages of public-record rulings that the Texas Attorney General's Office had posted on its website. Then I filtered the results for those that cited the specific provision I was investigating."
Is this data (and your code) posted somewhere, like GitHub? That assertion (you wrote code and definitively showed 80 instances) of yours is not verifiable, CJ Ciaramella. Can you prove the data exists?
Is this Reason journalism....making completely unsubstantiated statements like that?
How is that any different from doing research by less automated means? Journalists don't, in general, post complete source materials and references.
So the group that wrote a defamatory article for AH is now "helping" the NAACP with this. One question, what does this have to do with 1A? It's not assembly, it's not religion, it's not restricting their speech and they can still petition the government for redress, unlike Republicans and those are restrictions Reason enthusiastically endorses.
It's connected to the First Amendment in the same way that the right to record on-duty police is. Court documents are public records. The government has no authority to impede their collection or use.*
* The government can impede some use through redaction orders or sealing orders but a) those are already well-established with compensating controls and b) they apply at the point of document creation and are not uniquely applied based on whether the document is in hardcopy or electronic form.
So scraping bots have a negative impact on the systems they search, which gives a plausibly legitimate rationale for limiting their use, since it sounds like they may be the sort of thing that could be used in a denial-of-service attack. On the other hand, the government may be using that practical concern as a smokescreen to limit the ability to sift through compromising data efficiently.
The author noted that code can be written to minimize interference with the website's stated purpose. It seems that government could simply set standards for the code used and for the times when it could be used, eliminating data scraping during periods of high website traffic.
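There is already a rough convention for this sort of standard: a site's robots.txt file declares which automated agents may fetch which pages, and well-behaved scrapers check it before running. A sketch using Python's standard library, with a hypothetical URL:

```python
# Sketch: checking a site's robots.txt before scraping, using only the
# standard library. The URL is hypothetical.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://courts.example.gov/robots.txt")
parser.read()

AGENT = "eviction-research-scraper"
if parser.can_fetch(AGENT, "https://courts.example.gov/filings"):
    # Some sites also publish a Crawl-delay directive; honor it if present.
    delay = parser.crawl_delay(AGENT)
    print(f"Scraping permitted; requested delay: {delay or 'none'}")
else:
    print("Scraping disallowed; contact the site administrator instead")
```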
Scraping is the wrong way to access the data. The data should simply be available for direct download (as an archive) and/or available upon request on a hard drive.
I was listening to the radio yesterday about a state representative here in Maine who was trying to pass a law prohibiting the police from doing the same thing. Apparently they've got bots that scour Facebook and such, and even grab your credit scores. He thought that was creepy, but the DA and State Police shut him down.
Exactly. And I believe Reason has printed articles in the past whinging about law enforcement gathering data from the public Facebook profiles of people who were too stupid not to broadcast their felonies. It seems data 'scraping' is one of those concepts you are either for or against depending on who is doing the scraping and who or what is being scraped.
Well, a law limiting what police can do is in the proper purview of government. Limiting what people can do with public records, not so much.
Close but not quite. Limiting what police can do implies that they have unlimited power that is only restrained by laws saying what they cannot do. That's backwards. They have limited powers which means they cannot do anything unless authorized. I just wish the courts felt the same way.
The difference is who the scraping is done TO. It's perfectly legitimate for private individuals to examine government records for whatever reason they want with whatever tools they want. The government examining private records is a different matter.
I'm going to have to read about this more carefully, because I'm getting a whiff of Fusion GPS. Like the ACLU supported this before they were against it.
" While it's true that scrapers can bog down websites, ethical coders add courtesy delays to their programs that avoid that problem and include identifying information in their HTTP requests to government website administrators."
Have you ever noticed that most coders are not ethical coders? Ever been online? (yes I know, rhetorical)
So, in other words, there is no "scraping ban" and scraping per se is not a crime in South Carolina.
It sounds like this particular site, the "SC repository of legal filings" has a ban on scraping in its terms of service. And such a ban makes sense because scraping can place a big load on interactive services. So, talk of a "scraping ban" is a red herring.
The real issue here is that legal filings should be publicly available without scraping, via direct downloads.
Yeah, that seems like the real answer here. If the system can't handle the inquiries people want to make of it, then they should make a better system that works with the way people are actually using the records.
"A better system" consists of copying the data onto a 20 TB drive and sending it to the ACLU, at the ACLU's expense (a few hundred dollars).
Why is the ACLU interested in evictions?