Stanford Study: It's Ridiculously Easy To Match Metadata to People


Stanford University researchers Jonathan Mayer and Patrick Mutchler were skeptical when President Obama told the nation that the NSA is just collecting metadata and thus not violating Americans' privacy. So they investigated how easy it would be for someone to match metadata, which includes information about a caller's location and the length and number of calls, with the caller's identity.


They found it is "trivially" simple to do, even for those with limited funds and software.

Mayer and Mutchler, computer scientists who study technology policy, decided to run an experiment testing how easily one can connect metadata to names. For the experiment, volunteers agreed to use an Android app, MetaPhone, that gives the researchers access to their metadata. Mayer and Mutchler say it was hardly any trouble to figure out who the phone numbers belonged to, and they did it in a few hours.

From their blog:

So, just how easy is it to identify a phone number?

Trivial, we found. We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.

What about if an organization were willing to put in some manpower? To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.

How about if money were no object? We don't have the budget or credentials to access a premium data aggregator, so we ran our 100 numbers with Intelius, a cheap consumer-oriented service. 74 matched. Between Intelius, Google search, and our three initial sources, we associated a name with 91 of the 100 numbers.
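The matching procedure described in the blog excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the phone numbers and directories below are made-up toy data, `match_rates` is a hypothetical helper, and real lookups against Yelp, Google Places, or Facebook are replaced by plain in-memory dictionaries. It is not the researchers' actual code.

```python
import random


def match_rates(sample, directories):
    """Count hits per directory and collect the numbers matched by any of them.

    sample: iterable of phone-number strings.
    directories: dict mapping a source name to a dict of {number: listed name}.
    (Both are hypothetical stand-ins for real directory lookups.)
    """
    hits = {}
    matched_any = set()
    for source, listing in directories.items():
        found = {number for number in sample if number in listing}
        hits[source] = len(found)
        matched_any |= found  # union, so overlapping hits are counted once
    return hits, matched_any


# Toy dataset: 1,000 fake numbers, with a fixed seed for repeatability.
rng = random.Random(0)
dataset = [f"555-{i:04d}" for i in range(1000)]
sample = rng.sample(dataset, 100)

# Toy directories that deliberately overlap with one another.
directories = {
    "directory_a": {n: "Business A" for n in dataset[::10]},
    "directory_b": {n: "Business B" for n in dataset[::7]},
    "directory_c": {n: "Person C" for n in dataset[::5]},
}

hits, matched_any = match_rates(sample, directories)
for source, count in hits.items():
    print(f"{source}: {count} hits ({100 * count / len(sample):.1f}%)")
print(f"matched by at least one source: {len(matched_any)} of {len(sample)}")
```

Taking the union matters because a number can appear in more than one directory. In the figures quoted above, the per-source hits add up to 1,680 (378 + 684 + 618), while only 1,356 distinct numbers were matched, so some numbers turned up in several sources.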

The researchers conclude: "If a few academic researchers can get this far this quickly, it's difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers."

The study confirms what numerous critics of the NSA have been saying. A professor testifying against the program on behalf of the ACLU, for instance, said at an August court hearing:

Although officials have insisted that the orders issued under the telephony metadata program do not compel the production of customers' names, it would be trivial for the government to correlate many telephone numbers with subscriber names using publicly available sources. The government also has available to it a number of legal tools to compel service providers to produce their customer's information, including their names.

Some government officials also aren't buying the story that bulk collection of "just metadata" is harmless. In granting a preliminary injunction against the program last week, Judge Richard Leon said:

The Government maintains that the metadata the NSA collects does not contain personal identifying information associated with each phone number…[but] there is also nothing stopping the Government… using public databases or any of its other vast resources to match phone numbers with subscribers.



  1. Which is why Google and Yahoo and Facebook should grow a pair and publish the metadata of the NSA’s requests to them, even if it is technically illegal. If the NSA has nothing to hide, why should they be concerned if everyone can see their metadata?

    1. Yes. And after that the metadata of everyone who voted for the FISA. Then all who voted for the Patriot Act. Then the rest of Congress, Cabinet, and their staff.

  2. Also, the NSA isn’t just collecting metadata.

  3. Poor NSA – they use the latest technology to hoover up all the lolcats but are still stuck with a pre-internet PR paradigm.

    They should know by now that *any* claim you make is quickly going to be fact-checked by a ton of borderline idiot-savants, all working independent of each other, who have nothing better to do *all day* than to check the veracity of your statements.

    Every implication and possibility of every half-truth and bald-faced lie is *going* to be outed. Hiding what you’ve done only drags out the proceedings, keeping it fresh in people’s minds.

    You can hide for a while, but once you’re outed you may as well face up to what you’ve done and be honest about it – there are other well-tested (by politician, priest, and CEO) methods of deflecting public outrage long enough for it to be forgotten.

  4. I repeat.

    The 4A is not about privacy. It is about property rights. When the government has the power to search and confiscate YOUR shit in furtherance of a criminal investigation.

    That metadata belongs to the service provider (as per contractual agreements). If that service provider does not want to provide THEIR property to the government, the government needs a warrant to take it. PERIOD.

    And to get said warrant, the government needs to have probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

    Take your third party rule and shove it COMPLETELY up your statist ass!

    1. “Secure in your person and effects” sorta means privacy to me, along with all the rest of what you said.

      1. No, I agree, privacy and property rights are very much related, but framing 4A in the light of privacy alone allows the statists leeway to “interpret” WHEN you have a right to privacy. If it’s framed as a property rights argument, I ALWAYS have the right to my shit and no one else has claim to it without my say so, unless you comply with the 4A provisions for taking possession of it.

        E.g. Do I have a right to privacy in my car? No, because others can look into your car, and according to the statists if you have no right to privacy, they don’t need to comply with the warrant provisions.

        Now ask if I have the right to control my property while I’m in my car? It has a different connotation. I ALWAYS control my property, regardless of where it is. Had the argument been framed in this, its proper light, this third party bullshit would never have been invented.

        1. I have a very simple 4A test. If a reasonable person would object to a stranger doing what the government agent proposes, the government agent needs a warrant.

          Looking in a car window? Hard to really object if it’s parked somewhere publicly accessible.

          Attaching something hidden to the car (that is, not a note or something)? Pretty clearly invasive.

          Would I object to my telecom provider handing information about me to a stranger? Yes I would.

          1. HIPAA for e-data of all types.

    2. I like that property rights formulation, Francisco. If I loan my car to a buddy, I’m not giving up ownership in the car. He can use it (for a limited time and purpose) but he can’t keep it, sell it, or give it to someone else. That’s a good way to look at metadata too, I think.

  5. I’ve been saying all along: if it can’t be used for anything, why are they collecting it?

    1. Because it’s easier to do than actually discriminating a target. They want to scoop up loads of data and then sift through it, like panning for gold. It’s okay because terrorism and stuff. Keeping kids safe. 9/11. Emotional buzzwords and gotchas of your choice. Freedom. Jihad. Sharia law. Jesus.

  6. Sometimes, man, you just have to roll with it.

