Scraping A Public Website Doesn't Violate the CFAA, Ninth Circuit (Mostly) Holds

A major decision.

The Volokh Conspiracy

The Ninth Circuit Court of Appeals has handed down a groundbreaking decision today on the federal computer hacking law,  the Computer Fraud and Abuse Act (CFAA).  In HiQ Labs v. LinkedIn, the court held that scraping a public website is likely not a CFAA violation.

Under the new decision, violating the CFAA requires "circumvent[ing] a computer's generally applicable rules regarding access permissions, such as username and password requirements," that thus "demarcate[]" the information "as private using such an authorization system."  If the data is available to the general public, the court says, it's not an unauthorized access to view it—even when the computer owner has sent a cease-and-desist letter to the visitor telling them not to visit the website.

This is a major case that will be of interest to a lot of people and a lot of companies.  But it's also pretty complicated and easy to misunderstand.   This post will go through it carefully, trying to explain what it says and what it doesn't say.

I.  The Context

To really understand the new decision, I think it helps to start with some context.  The CFAA is a computer trespass statute that prohibits accessing a computer "without authorization."  It is primarily a criminal statute, but it also has civil remedies that permit private parties to bring CFAA lawsuits for damages or injunctive relief.

Importantly, the meaning of the CFAA is the same in both civil and criminal settings.  This means that whatever courts say about the CFAA when a computer owner sues a user is equally applicable when the federal government arrests and prosecutes the user with substantial jail time in play.

The big question under the CFAA has long been what counts as "authorization."  Does authorization depend on how the computer architecture is designed, with users authorized to use a computer if it's available to the public and not authorized if the access is technically blocked?  Or does it depend on what the computer owner says it wants, either through terms of use posted on the computer or through letters directed to potential visitors?

Courts have been all over the map, and the Ninth Circuit's decisions have zigzagged a bit on this.  There are four big Ninth Circuit precedents to consider:

(1) In LVRC Holdings LLC v. Brekka, 581 F.3d 1127, 1133 (9th Cir. 2009), the Ninth Circuit held that an employee who gathers information on a company computer for his own purposes does not violate the CFAA merely because that personal use was adverse to the interests of the employer.  The parties agreed that access to the company computer would be unauthorized after the employee left the company.  But when the employee was  working at the company, accessing the company's files was not made a crime simply because the employee was doing so for a secret purpose to help himself and hurt the company.  (Another circuit had disagreed, but I'll stick to Ninth Circuit caselaw in this post.)

(2) Three years later, in United States v. Nosal, 676 F.3d 854 (9th Cir. 2012) (en banc) ("Nosal I"), the Ninth Circuit held that it doesn't violate the CFAA to use a website in violation of written restrictions like employment agreements or Terms of Service posted on websites.  The CFAA was designed to "punish hacking — the circumvention of technological access barriers," the Court noted.  Given that narrow focus, it was wrong to construe the statute to also encompass the very common and innocuous act of  using a website or company computer in a way contrary to terms of use and employment policies.  (There is a circuit split on this issue, too.  But again, I'm focused on the Ninth Circuit here.)

(3) Four years later, in a follow up case, United States v. Nosal, 844 F.3d 1024 (9th Cir. 2016) ("Nosal II"), the Ninth Circuit held that it does violate the CFAA for a former employee to get a current employee's username and password and to use their account with their permission. That's different from violating an employment agreement or terms of use, the court held, as the former employee has no right to access the computer under Brekka.  Leaving the company ended the access rights, and a former employee can't work around that to restore those rights just because a current employee was willing to hand him her username and password.

(4) Shortly after Nosal II, the Ninth Circuit handed down Facebook v. Power Ventures, 844 F.3d 1058 (9th Cir. 2016). Power Ventures was a service that accessed users' Facebook profiles with the users' permission and moved the data to a different website run by Power Ventures. The court held that visiting a computer is an unauthorized access once the visitor has received a cease-and-desist letter from the computer owner prohibiting the visits on the ground that they violate the terms of service.  That's like Brekka, the Ninth Circuit reasoned, because the cease-and-desist letter withdraws permission to use the computer.  And it's not like Nosal I, the court argued, because cease-and-desist letters (unlike posted terms of service) put the visitor on clear notice that the visit to the computer is prohibited by the computer owner.

II.  The Facts and Procedural History of HiQ Labs

That brings us finally to the new case, HiQ Labs v. LinkedIn.  HiQ Labs is a data analytics company.  It scrapes information on LinkedIn profiles that LinkedIn users have set to be viewable by the general public without a LinkedIn account.  HiQ Labs combines that information with other information and sells it to companies.

LinkedIn wants to monetize that data itself, so it sent a cease-and-desist letter to HiQ telling it to stop accessing and copying the data publicly posted on LinkedIn. LinkedIn threatened to sue HiQ on various grounds if HiQ refused to stop.  HiQ instead filed suit in federal court seeking an injunction based on state law and a declaratory judgment that its conduct was legal.

The district court granted a preliminary injunction, setting up this appeal.

That brings me to a warning: The new Ninth Circuit decision is a little bit tricky to analyze because of its procedural posture. That's true for two reasons that are helpful to flag now.

First, at this stage of the case, HiQ is only seeking a preliminary injunction—basically, a ruling from the judge preserving the status quo so LinkedIn can't stop HiQ  in the initial period when the lawsuit is pending.  The standard for a preliminary injunction considers the merits of the legal claims, but it does not make a definitive ruling about them.  For that reason, the opinion's conclusions about the CFAA are written tentatively. The court talks about what is "likely" the correct interpretation of the CFAA, what raises "serious questions," et cetera.

Second, the CFAA issues enter the case somewhat indirectly.  HiQ is seeking a preliminary injunction based on a state law claim: that LinkedIn is tortiously interfering with its business contracts by trying to block it and stop its conduct.  LinkedIn is then raising the CFAA as a defense. You can't sue us for tortiously interfering with your business contracts, LinkedIn is saying, because the entire HiQ business is illegal under federal law.

All of this means that the CFAA ruling is a bit indirect.  Technically, the issue being decided is whether there's a serious question that HiQ's scraping complies with the CFAA, which is needed to say that LinkedIn trying to stop HiQ may be tortious interference with HiQ's legitimate business, which is needed to know if it was an abuse of discretion for the trial court to temporarily stop LinkedIn from trying to interfere with HiQ's business.

Got it?  I know, I know.  Lawyers always have to make everything so complicated. (Sorry.)

III.  The CFAA Ruling

That brings us to the CFAA ruling.  It's hugely important.  The Ninth Circuit views the CFAA as a hacking statute (like Nosal I did), and it presumes a right to open access under the CFAA unless there is some technological measure placed on access.  Because HiQ did not circumvent a technological access measure to get to the data publicly posted on LinkedIn's website, the CFAA was not violated.  (Or rather, it "likely" was not violated; see above for the reason for the tentative language.)

Here's the key language, with the particularly important language in bold and a few paragraph breaks added by me for web readability.  The opinion was by Judge Berzon, joined by Judge Wallace and District Judge Berg sitting by designation.

We . . . look to whether the conduct at issue is analogous to "breaking and entering." H.R. Rep. No. 98-894, at 20. Significantly, the version of the CFAA initially enacted in 1984 was limited to a narrow range of computers—namely, those containing national security information or financial data and those operated by or on behalf of the government. See Counterfeit Access Device and Computer Fraud and Abuse Act of 1984, Pub. L. No. 98-473, § 2102, 98 Stat. 2190, 2190–91. None of the computers to which the CFAA initially applied were accessible to the general public; affirmative authorization of some kind was presumptively required.

When section 1030(a)(2)(C) was added in 1996 to extend the prohibition on unauthorized access to any "protected computer," the Senate Judiciary Committee explained that the amendment was designed "to increase protection for the privacy and confidentiality of computer information." S. Rep. No. 104-357, at 7 (emphasis added).

The legislative history of section 1030 thus makes clear that the prohibition on unauthorized access is properly understood to apply only to private information—information delineated as private through use of a permission requirement of some sort. As one prominent commentator has put it, "an authentication requirement, such as a password gate, is needed to create the necessary barrier that divides open spaces from closed spaces on the Web." Orin S. Kerr, Norms of Computer Trespass, 116 Colum. L. Rev. 1143, 1161 (2016). Moreover, elsewhere in the statute, password fraud is cited as a means by which a computer may be accessed without authorization, see 18 U.S.C. § 1030(a)(6), bolstering the idea that authorization is only required for password-protected sites or sites that otherwise prevent the general public from viewing the information.

We therefore conclude that hiQ has raised a serious question as to whether the reference to access "without authorization" limits the scope of the statutory coverage to computer information for which authorization or access permission, such as password authentication, is generally required.

Put differently, the CFAA contemplates the existence of three kinds of computer information: (1) information for which access is open to the general public and permission is not required, (2) information for which authorization is required and has been given, and (3) information for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed).

Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category. With regard to such information, the "breaking and entering" analogue invoked so frequently during congressional consideration has no application, and the concept of "without authorization" is inapt.

Neither of the cases LinkedIn principally relies upon is to the contrary. LinkedIn first cites Nosal II, 844 F.3d 1024 (9th Cir. 2016). As we have already stated, Nosal II held that a former employee who used current employees' login credentials to access company computers and collect confidential information had acted "'without authorization' in violation of the CFAA." Nosal II, 844 F.3d at 1038. The computer information the defendant accessed in Nosal II was thus plainly one which no one could access without authorization.

So too with regard to the system at issue in Power Ventures, 844 F.3d 1058 (9th Cir. 2016), the other precedent upon which LinkedIn relies. In that case, Facebook sued Power Ventures, a social networking website that aggregated social networking information from multiple platforms, for accessing Facebook users' data and using that data to send mass messages as part of a promotional campaign. Id. at 1062–63. After Facebook sent a cease-and-desist letter, Power Ventures continued to circumvent IP barriers and gain access to password-protected Facebook member profiles. Id. at 1063.

We held that after receiving an individualized cease-and-desist letter, Power Ventures had accessed Facebook computers "without authorization" and was therefore liable under the CFAA. Id. at 1067–68. But we specifically recognized that "Facebook has tried to limit and control access to its website" as to the purposes for which Power Ventures sought to use it. Id. at 1063. Indeed, Facebook requires its users to register with a unique username and password, and Power Ventures required that Facebook users provide their Facebook username and password to access their Facebook data on Power Ventures' platform. Facebook, Inc. v. Power Ventures, Inc., 844 F. Supp. 2d 1025, 1028 (N.D. Cal. 2012). While Power Ventures was gathering user data that was protected by Facebook's username and password authentication system, the data hiQ was scraping was available to anyone with a web browser.

In sum, Nosal II and Power Ventures control situations in which authorization generally is required and has either never been given or has been revoked. As Power Ventures indicated, the two cases do not control the situation present here, in which information is "presumptively open to all comers." Power Ventures, 844 F.3d at 1067 n.2.

. . . Both the legislative history of section 1030 of the CFAA and the legislative history of section 2701 of the SCA, with its similar "without authorization" provision, then, support the district court's distinction between "private" computer networks and websites, protected by a password authentication system and "not visible to the public," and websites that are accessible to the general public.

Finally, the rule of lenity favors our narrow interpretation of the "without authorization" provision in the CFAA. The statutory prohibition on unauthorized access applies both to civil actions and to criminal prosecutions—indeed, "§ 1030 is primarily a criminal statute." LVRC Holdings LLC v. Brekka, 581 F.3d 1127, 1134 (9th Cir. 2009). "Because we must interpret the statute consistently, whether we encounter its application in a criminal or noncriminal context, the rule of lenity applies." Leocal v. Ashcroft, 543 U.S. 1, 11 n.8 (2004). As we explained in Nosal I, we therefore favor a narrow interpretation of the CFAA's "without authorization" provision so as not to turn a criminal hacking statute into a "sweeping Internet-policing mandate." Nosal I, 676 F.3d at 858; see also id. at 863.

For all these reasons, it appears that the CFAA's prohibition on accessing a computer "without authorization" is violated when a person circumvents a computer's generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user's accessing that publicly available data will not constitute access without authorization under the CFAA. The data hiQ seeks to access is not owned by LinkedIn and has not been demarcated by LinkedIn as private using such an authorization system. HiQ has therefore raised serious questions about whether LinkedIn may invoke the CFAA to preempt hiQ's possibly meritorious tortious interference claim.

The court goes on to note that website owners have other legal options and causes of action outside the CFAA.  First, the court suggests that website scraping might violate the common law tort of trespass to chattels, "at least when it causes demonstrable harm."  Second, depending on the case, there may also be civil causes of action for "copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract, or breach of privacy."  But not the combined civil/criminal provisions of the CFAA.

IV.  A Few Reactions

What do I make of the new decision?

On the substance of the reasoning, I'm delighted.  Of course, that's easy for me to say.  Given my prior writing on this topic, including writing that the court very graciously cited, the decision seems quite brilliant to me.

More seriously, this is a really important decision that embraces the open presumption of the Internet far more clearly and directly than prior cases.  The Ninth Circuit's approach to the CFAA has zigzagged a bit over time.  Some cases have embraced a more open Internet, and others have been quick to say that computer owners can close it easily.  This is a big step in the direction of openness.

I also think this decision renders Power Ventures an outlier.  I may be biased, as I thought Power Ventures was wrong.  As regular readers may remember, I represented Power Ventures on the petition for rehearing to try to get the panel decision overturned.  But Power Ventures seemed to give cease-and-desist letters magical powers given their clarity and notice.  It was possible to read Power Ventures broadly as saying that as long as the computer owner sends the cease-and-desist letter, the computer owner's written directive controls the CFAA question—the recipient is sent into Brekka-land where their access rights were withdrawn.

HiQ Labs now places a critical limit on Power Ventures. Under HiQ Labs, the cease-and-desist letter only controls access rights to non-public data.  That seems to reduce Power Ventures to a limited application of Nosal II.  Under both Nosal II and Power Ventures-as-construed-in-HiQ, once a computer owner tells you to go away, you can't then rely on a current legitimate user's permission to let you back in.

Putting the cases together, the Ninth Circuit law right now seems to go like this.  You can scrape a public website, and you can violate terms of service, without violating the CFAA.  However, you can only access non-public areas of a computer if you haven't had your access rights canceled before, either through a cease-and-desist letter or through the relationship ending that had granted you access rights.

It's worth stressing that all of this is only the law in the Ninth Circuit.  There are clear circuit splits on how the CFAA has been interpreted that only the U.S. Supreme Court can resolve.  I suspect some of that resolution will happen pretty soon.  When it happens, the Supreme Court's guidance will of course mean much more than the view of one court of appeals.  But the Ninth Circuit has handed down significantly more CFAA caselaw than any other circuit court.  In the interim, before the Supreme Court takes a look at these issues, HiQ Labs is a really big deal.

One last point: There's more in the new decision on issues beyond the CFAA that is worth checking out.  The whole opinion is worth a read.

  1. HiQ is seeking an injunction to require LinkedIn to continue to allow it to access data on LinkedIn servers? Why is the court even considering this as a choice? Why can’t LinkedIn take any of these steps as a response to HiQ’s use of its servers and services?

    1) block, limit, or meter access from HiQ’s network
    2) poison the data transmitted to HiQ
    3) change the access allowed to the public in ways that keep HiQ from making effective use of the data.
    4) impose terms of service that hamper HiQ’s use of the data.
    5) charge HiQ for the access to their servers and services

    1. 1-3. Because those are hard to do without blocking, poisoning or otherwise limiting the data available to all the rest of LinkedIn’s customers.
      4. Because Nosal I says that ignoring the terms of service doesn’t create a CFAA violation.
      5. For the same reason you can’t try to charge me for reading a billboard you put up on the side of the highway.
      All. Because they likely didn’t think they’d need to based on their interpretation of the Power Ventures decision.

      1. “Because those are hard to do without blocking, poisoning or otherwise limiting the data available to all the rest of LinkedIn’s customers.”

        No, they aren’t. To the extent that HiQ is drawing data away from LinkedIn servers in violation of the Terms of Service, they are conducting a low-grade Denial-of-Service attack, and network administrators have been building defenses against those into network designs for about 35 years now.

        1. @JamesPollock – In the real world, it’s hard to deny source networks reliably for this kind of thing. I mean, it’s easy to block an IP range, but it’s easier for a visitor to change their apparent IP address, either through VPN hopping or other techniques.

          Blocking source networks, as Rossami notes, very quickly starts blocking legitimate traffic. For instance, I have an IP address in Texas on the AT&T/Cingular network. It changes frequently. You want to block all of AT&T/Cingular subnets in use in Austin? Probably not. Especially when I can just route my requests through Vancouver or any other city, randomly and systematically.

          So then you may start getting into examining signatures of the requests – based on frequency, request headers, other things that might uniquely identify the culprit. But this is another thing which is easier for the requester to alter (even randomly) than it is for the responder to analyze and adapt to. Even if it’s a robot doing the analysis. Avoiding systems like this is just a matter of altering the timing of your requests, the style of your requests and the origins of your requests enough to avoid a robot finding a pattern.

          The problem with your DOS attack analogy is that an actual DOS attack depends on a flooding consumption of resources, where data scraping can just simulate legit traffic coming in at a legit rate. There’s no burden on the requester’s part to mob a system. Time favors the data scraper, whereas in a DOS attack, the attacker is fighting against time (by cramming as many requests per second in as possible). This makes it easier for robots to distinguish a DOS attack in action. Data scraping done wrong can look like a DOS attack and is frankly very rude in the Internet ethic, but data scraping done right is a more patient endeavor.
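To make the rate-based point above concrete, here is a minimal Python sketch, not any real site's defense: a hypothetical per-client sliding-window limiter. The class name, the 10-requests-per-60-seconds threshold, and both traffic patterns are illustrative assumptions. It trips on a flood but never on the same number of requests spread out patiently.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: refuse a client that exceeds max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history: dict[str, deque] = {}  # client id -> timestamps of allowed requests

    def allow(self, client: str, now: float) -> bool:
        q = self.history.setdefault(client, deque())
        # Forget requests that have slid out of the time window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit: refuse and do not record it
        q.append(now)
        return True


def blocked_count(limiter: SlidingWindowLimiter, client: str, timestamps: list[float]) -> int:
    """Count how many requests in a traffic pattern the limiter refuses."""
    return sum(not limiter.allow(client, t) for t in timestamps)


# Assumed traffic: 100 requests inside one second vs. 100 requests 30 seconds apart.
burst = [i * 0.01 for i in range(100)]
paced = [i * 30.0 for i in range(100)]

print(blocked_count(SlidingWindowLimiter(10, 60.0), "flooder", burst))  # 90
print(blocked_count(SlidingWindowLimiter(10, 60.0), "scraper", paced))  # 0
```

With these assumed numbers, the burst has 90 of its 100 requests refused while the paced client is never refused: a rate limit catches flooding, not patience.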

          1. “In the real world, it’s hard to deny source networks reliably for this kind of thing.”

            They don’t have to deny it reliably. Intermittently should do the trick. Especially if they also intermittently poison the data going out.

            “The problem with your DOS attack analogy”

            What analogy? The resources used to service HiQ’s requests for data are denied to legit users. That’s a low-grade DOS.

            1. Reliably means effectively. What you may be missing here is that the scripts used to scrape data aren’t like a google indexer that announces to the host who it is and why it’s there. The better a data scraping script/operation is the closer it mimics legitimate traffic. If you want legitimate traffic (as LinkedIn does), it’s pretty tough to block legitimate-looking traffic. A badly written script to harvest site data will look like a DOS attack, yes, but people ‘in the biz’ don’t write crappy scripts because it’s a lose-lose scenario. Not only would it make them easy to block but nobody wants to harm the target site.

              1. “What you may be missing here”

                What you may be missing here is that I am an IT professional by trade. Do you know what a CISSP is?

                1. A certification held by people who are not of the quality of Wendy Nather?

                2. If I’m causing you personal offense, that wasn’t my purpose, I’m sorry. If you do network edge security for a living my advice would be to also start playing with scripting so you can see the other side of it. Write a quick script to download content off a website at your employer or in your own lab (I suggest using Powershell because it’s quick, but Python too or even a shell script if that’s quickest). If you work with an IDS platform or active security appliances, watch how they respond to your script. Your premise all along has been that you can systemically prevent data harvesting by poisoning/blocking content or with the tools you use to prevent DOS attacks, and that’s just not true for reasons that have been explained and that can be demonstrated quite simply by trying it yourself. You just can’t ignore the need to facilitate anonymous traffic to a web site reliably (blocking address ranges for whole markets is disruptive to this end) and you need the perspective of the harvester (how easy it is to write non-destructive adaptable harvesting scripts). If you are actively employed in network security circles or are trying to be, gaining this perspective from other vantage points will seriously give you a leg up on the competition. I’m simply trying to tell you that as a matter of fact, site scraping is easy to do and hard to prevent. Do you really think the boys at LinkedIn haven’t considered technical solutions to this? There’s a reason they’ve resorted to using the courts now.

  2. “Given my prior writing on this topic, including writing that the court very graciously cited…”

    I’m guessing that positive feeling probably never grows old!

  3. Just wanted to boost the comment count on this post; it’s kind of a shame that a post about citizenship gets 200 posts because kooks drive the thread, while actual legal discussion gets almost none.

    Found this discussion informative.

    1. This is in part because whereas lots of people (lawyers, even) understand the idea of citizenship, not a lot of people (lawyers, even) understand the ideas of information security.

    2. I always appreciate the thoroughness of Prof. Kerr’s analysis, it’s rare that a question or side issue occurs that he doesn’t track down somewhere in his post.

    3. Agree with you on that, DN.

  4. It’s always seemed to me that the bare, anonymously-accessible content on the Internet at any given point in time is a matter of fact, much like the quantity of cars that passed by my office building this morning (and their license plate numbers), the weather at any given GPS coordinates or the content of signs on the sides of buildings. Copyright certainly comes into play in terms of content reuse, but it doesn’t seem like a CDA section 230 platform can copyright my current job title.

  5. I still see an inconsistency between HiQ Labs and Power Ventures. If the statute focuses on access to the computer, then Power Ventures and HiQ Labs are inconsistent, because both companies said, “It’s my computer, stay out.”

    If the statute focuses on the privacy of the data on the computer, then Power Ventures cannot be correct. The data in Power Ventures (regardless of any terms of service that Facebook may have) is that of the Facebook subscribers. They granted access to their data to Power Ventures. That means that with respect to Power Ventures, this was not private data. From Power Ventures’ point of view, it is no different than the public data on LinkedIn in HiQ Labs.

  6. I am not an intellectual property lawyer, or any kind of lawyer. So maybe someone with legal expertise can explain why something like this in the terms of service wouldn’t solve the problem:

    In posting material to the LinkedIn site you agree that you are the copyright holder for all material you post. You agree further that LinkedIn is your sole authorized licensee for use of this material, and grant power to LinkedIn to defend your copyright against infringement by any third party. This agreement is revocable by you at any time.

    1. I could see that for the summary one writes about themselves, but name, employer and job title? I don’t think you can get copyright protection for your name. Presumably HiQ wants the facts about the pros on LinkedIn, not the prose on LinkedIn.
