Where Everybody Knows Your Name

What do AOL customers, Netflix subscribers, and abortion seekers in Oklahoma have in common? Hacking their identities is a cinch.

We're all part of a huge, ongoing statistics project. Mostly, we become a part of various data sets anonymously, without even knowing it—as sales figures for Guitar Hero, traffic patterns on I-95, or levels of cocaine in an urban sewer system.

But there's another kind of data that gets released into the wild with increasing frequency: researcher bait. Netflix made its user-generated rating database publicly available as part of a prize competition designed to improve the site's movie recommendations. Three years ago, America Online released several months of search query information, just as a nice gesture to researchers. In both cases, the names and other obvious identifying information were removed before the data was set free.

Last month, Oklahoma set out to contribute a new mass of data to the world. New reporting requirements on abortion would have dumped a massive amount of information into a public database, available on the state government's website. The new laws require doctors to collect and report information about every abortion in the state, including the mother's age, marital status, race, number of children, education level, the mother’s relationship to the father, the reason for the abortion, the cost, and method of payment. The form contains 37 questions, most with several subsections. The names and addresses of the women would have been omitted, though their zip codes were part of the information to be disclosed.

But as it turns out, taking your name off of something doesn't mean your fingerprints aren't all over it. Even when obvious identifying information is stripped from a large data set, personal identities can often be cracked by a geek with time on his hands.

Geeks like Arvind Narayanan and Vitaly Shmatikov, to be specific, who broke the anonymity of the Netflix set by comparing the dates of specific ratings with similar ratings on the popular Internet Movie Database, where users reveal personal information in public profiles. The vulnerability of the AOL database so horrified researchers that they have mostly left the set alone, tempting though that juicy data is. For a taste of the kind of revelations from that "anonymized" set, check out what this guy was up to:

  • 17556639 how to kill your wife
  • 17556639 how to kill your wife
  • 17556639 wife killer
  • 17556639 how to kill a wife
  • 17556639 poop
  • 17556639 dead people
  • 17556639 pictures of dead people
  • 17556639 killed people
  • 17556639 dead pictures
  • 17556639 dead pictures
  • 17556639 dead pictures
  • 17556639 murder photo
  • 17556639 steak and cheese
  • 17556639 photo of death
  • 17556639 photo of death
  • 17556639 death
  • 17556639 dead people photos
  • 17556639 photo of dead people
  • 17556639 www.murderdpeople.com
  • 17556639 decapatated photos
  • 17556639 decapatated photos
  • 17556639 car crashes3
  • 17556639 car crashes3
  • 17556639 car crash photo

Combine that astonishingly evocative list of murder-related searches with searches for just a couple of addresses or phone numbers, and user 17556639 is in the bag. In 2000, then-graduate student Latanya Sweeney sliced and diced U.S. Census data and found that 87 percent of the population can be identified using only their date of birth, zip code, and gender.
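For the curious, here is a minimal sketch of how that kind of linking works. Everything in it is made up for illustration: the records, the names, the field names, and the helper functions `unique_fraction` and `reidentify` are assumptions, not Sweeney's data or method. It only shows the general idea that a record which is unique on date of birth, zip code, and gender can be matched to a named list that shares those fields.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"dob": "1971-03-02", "zip": "73102", "gender": "F", "detail": "record A"},
    {"dob": "1971-03-02", "zip": "73102", "gender": "M", "detail": "record B"},
    {"dob": "1985-11-19", "zip": "74103", "gender": "F", "detail": "record C"},
    {"dob": "1985-11-19", "zip": "74103", "gender": "F", "detail": "record D"},
]

# Hypothetical public list (say, a voter roll) that does include names.
public = [
    {"name": "Alice Example", "dob": "1971-03-02", "zip": "73102", "gender": "F"},
    {"name": "Bob Example", "dob": "1971-03-02", "zip": "73102", "gender": "M"},
    {"name": "Carol Example", "dob": "1985-11-19", "zip": "74103", "gender": "F"},
]


def key(row):
    """The three fields used as a fingerprint."""
    return (row["dob"], row["zip"], row["gender"])


def unique_fraction(rows):
    """Share of rows whose (dob, zip, gender) combination appears only once."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] == 1) / len(rows)


def reidentify(released_rows, public_rows):
    """Match released rows to names when the fingerprint is unique in both lists."""
    released_counts = Counter(key(r) for r in released_rows)
    public_counts = Counter(key(p) for p in public_rows)
    names = {key(p): p["name"] for p in public_rows}
    return {
        r["detail"]: names[key(r)]
        for r in released_rows
        if released_counts[key(r)] == 1 and public_counts.get(key(r)) == 1
    }


print(unique_fraction(released))
print(reidentify(released, public))
```

In this toy data, two of the four released records are unique on the three fields, and both of those can be tied to a name. The two records that share a fingerprint stay ambiguous, which is the only protection the "anonymized" release actually offers.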

This fall, Paul Ohm of the University of Colorado Law School published a study on the "surprising failure of anonymization." He writes that we have "labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention."

As Ohm notes, the tech community has become very aware of the privacy issues surrounding large data sets over the last several years (Google has fought off broad government subpoenas demanding search queries, even though the feds weren't asking for personal information about users), but Oklahoma state legislators don't seem to have gotten the memo. And it's safe to assume that federal legislators will suffer from the same problem. For now, the Oklahoma rules are on hold while a court considers a challenge to the law. The hearing was postponed this week, after a second judge recused herself from the case. But this won't be the last time courts have to consider the viability of laws like Oklahoma's. And as the federal government gets more involved with health care, the feds will be looking for ways to get more bang for their regulatory buck. One of the likely results: More disclosure mandates, so that we can all be part of the great, ongoing statistics project whether we like it or not.

There's an old(ish) adage that the Internet treats censorship as a malfunction, and routes around it. There's a corollary for online data, voiced by Sweeney, now of Harvard’s Center for Research on Computation and Society, who has said that “data tend to flow around and get linked to other data.” Stripping out information about names and addresses isn't enough to keep data secure. Digital data sets don't stay isolated. And as Ohm notes, that's the problem: "Data can either be useful or perfectly anonymous but never both."

Katherine Mangu-Ward is a senior editor at Reason magazine.

  • Pro-LIfe & Pro-Gay Atheist.||

    It was not that long ago that the fraud of climate change was exposed as a result of hacking. Some say the world is becoming a panopticon. That may well be. But it begs the question - who are the prisoners and who are the guards? So long as governments snoop into our lives I will defend the private hackers who snoop into the lives of the governments, the rent seeking scumbags and the politicians.

  • Untermensch||

    Wow, that article just left me hanging…

  • Untermensch||

    Thanks for supplying the rest of the article. It initially stopped just before the list of warped search queries on how to kill wives.

  • Old Mexican||

    Well, now I know that if I want to have an abortion, I should not subscribe to Netflix... Although, I may not need one in the foreseeable future, since I am not a woman.... but still...

  • ||

    WTH happened to comments on Brickbats?

    We now return you to your regularly scheduled comments.

  • Chad||

    There isn't any privacy anymore. Get over it and quit whining.

    The closest thing you can get is anominity.

  • Old Mexican||

    Re: Chad,

    The closest thing you can get is anominity.[sic]


    I am not sure I even want that - I have no idea what it is.

    Or did you mean anonymity?

  • ||

    I thought it was something in a saltwater fish tank.

  • ||

    Autonomy probably doesn't exist as it did either.

  • Vines & Cattle||

    Goddammit, Oklahoma is not in "the South". Neither are we the Midwest, or the Southwest. Sadly we're than a Dallas suburb with fewer strip clubs and 3.2 beer. :(

  • Vines & Cattle||

    ^^^ little more than a Dallas suburb

  • ||

    No one knows where the hell Oklahoma and Texas should be categorized. If you're an orthopedic surgeon from those states, you could join the Western, Southern, or MidWest regional associations.

  • ||

    MidSouth?

  • ||

    Texas doesn't need to be part of a "region", since it's bigger on its own than many of the multi-state regions it's compared to.

    Oklahoma is another matter. Nobody knows how to classify it, exactly. But there's no need to because, in a sense, the state barely exists in the first place. Once you get outside of OKC it's nothing but oil fields. If it wasn't for the wells, refineries, and air strips, driving through Oklahoma at night would feel a lot like driving through the Sahara, the moon and your car's headlights being the only source of illumination. I can't help but wonder if life there would change at all if Oklahoma ceased to be a state and just became a "territory."

  • ||

    Tulsa actually has a larger population than OKC. It also has a ballet, opera, several large universities, and a reasonable number of large businesses for a city its size.

  • Ike||

    For what it is worth, the Center for Democracy and Technology has just launched a campaign called "Take Back Your Privacy."

    http://www.cdt.org/takebackyourprivacy

  • Busy deleting cookies||

    Ike,

    The best part:
    Sign up now!
    Email address (required):

  • Pappy Smear ||

    Did Amnitybot OD? Will there be a memorial service?

  • ||

    # 17556639 murder photo
    # 17556639 steak and cheese
    # 17556639 photo of death

    Apparently even incipient murderers enjoy a nice sammich or whatnot from time to time.

  • ||

    It might not be steak and cheese as we think of it.

    "And as the federal government gets more involved with health care, the feds will be looking for ways to get more bang for their regulatory buck."

    Been doing that for years.

    http://bphc.hrsa.gov/uds/

  • ||

    Mangu-Ward, what exactly is your point?

    If someone is out to get you, they don't have to screw around with cracking vast databases on their basement Cray.

    They do just what detectives do -- they go to your neighborhood, they talk to your neighbors, your boss, your ex-wife, and so on. They look through your trash, tail you to the porn store, et cetera.

    Nothing's really changed about that. The fact that it's possible to find out all about you through some pain in the ass computationally-intensive brainy method using the Cray in your basement is underwhelming, in a world in which there are far more ordinary and traditional ways of finding out everything about you.

  • Marian K.||

    The problem is with history. If someone starts going after you in 2020, it might make a difference if they can possibly get your 20-year history of Internet activities, or calls, or anything and filter it with some specialized data mining software, or if they cannot.

  • Annon Imous||

    It looks like #17556639 is planning on decapitating his wife using a philly cheesesteak as a sword, and then putting her in a car and rolling it off the road to make it look like an accident.

    Anyway, the idea in this story seems pretty obvious. I'm worried in particular about Google. People search for directions from their house, they Google their friends and family, and so on. The amount of information Google has about everyone is staggering.

    The only way to protect privacy (not completely, but pretty well) would be to only store a disconnected collection of individual search queries, without any identifier, even an anonymized one. Changing "John Smith" to "user 35375017" doesn't help. Just store li

  • Mad Max||

    I hate to derail this abortion thread with a discussion of - well, of abortion, but Oklahoma isn't the only place where abortion is being discussed today. There's also some debate on the issue over on Capitol Hill.

    Here's an article in Politico:

    Abortion battle could derail health bill

    ‘In the past week, abortion has flared up as a major impediment to passage of a health care reform bill in the Senate, taking a similar path as it did during the House debate — from obscurity to obstacle in a matter of days.

    ‘After months of trying to craft a 60-vote coalition based on the finer points of health care policy, Senate Democrats are growing increasingly worried that abortion will upend what had become a clear path to approving the overhaul bill.

    ‘Sen. Ben Nelson (D-Neb.) sparked a fresh round of concern this week when he repeatedly and definitively vowed to filibuster the health care legislation unless it included abortion restrictions as tough as the so-called Stupak amendment in the House bill.’

    Meanwhile, Planned Parenthood had a rally in Washington to support putting abortion back in the health-care bill. Showcasing the pro-aborts’ support for separation of Church and State, one of the speakers at the rally was Rev. Carlton Veazy, head of the Religious Coalition for Reproductive Choice.

    Veazy told the assembled protesters that abortion is a “God-given right” - thereby blatantly injecting religious considerations into the discussion of a purely secular subject.

    These folks at the Religious Coalition for Reproductive Choice have been violating the Church/State wall for decades. As discussed on their Web site, a predecessor organization known as the Clergy Consultation Service referred women for abortions in the pre-Roe v. Wade era, evading state restrictions on abortions.

    I'm sure that valiant defenders of the Wall of Separation will be voicing their protests any time now. Yup, really soon . . . any day now . . .

  • VM||

    just eat your face and set the stub on fire, will you, please?

  • ||

    theres cocaine in the sewers?

  • Mary Stack||

    Just a thought, sewers are dirty. Cocaine is also found on bills. Presumably the higher the denomination the better the grade. Wait, sewers are probably H1N1 free.

  • ||

    Everybody chill, take your meds and dont do meth. And yes Oklahoma is the worst place to live.

  • Ratko||

    There's cocaine in an urban sewer system? Sounds like some shitty blow.
