Data Collection

Debate: Corporate Data Collection Poses a Threat to Personal Freedom

There are lots of reasons to be concerned about government snooping, but how should we feel when private companies do it?


Big Corporations Want Your Data. Don't Give It to Them.

J.D. Tuccille

Joanna Andreasson

If I forget where I've been shopping online, I can just head over to Facebook. Ads on the social networking site will quickly remind me what I've been browsing, and perhaps even offer a coupon code to help close the deal. I appreciate the discounts, but I'm creeped out by the thought of the profile that can be stitched together from the sites that I visit.

Libertarians rightly fret about government databases that assemble sensitive information about finances, movements, and beliefs. That information can be weaponized against individuals for official purposes (hello, J. Edgar Hoover!) or for personal gain and amusement. It's also a treasure trove for hackers, as we've seen with breaches to the IRS and the federal Office of Personnel Management. We have no choice but to supply the state with the data it demands and hope for the best.

If you're concerned about privacy, however, it's apparent that we don't have a lot more choice when it comes to private sector data collection. And while the threat there is different from the one posed by intrusive government programs, it's still worth worrying about, and worth taking steps to protect yourself against.

Loan applications, credit card transactions, and surfed websites contribute pieces to the jigsaw puzzle of our lives. The Facebook/Cambridge Analytica hookup illustrated how sought-after those puzzle pieces are for the targeted marketing of products—and politicians. The 2017 hack of credit monitoring company Equifax compromised the personal data, including tax identification and Social Security numbers, of nearly 150 million people. Criminal pilfering of credit agency databases demonstrates that data collection doesn't have to be mandated by law to be perilous.

Credit agencies, banks, social media companies, marketers: there's a long list of private entities that don't need legal bludgeons to extract our data from us. Keeping our private information out of their hands might be possible, but only by living a cash-only, near-Luddite existence.

Some digital privacy hawks have argued that personal data should be treated like property. There's something appealing about the idea that we should have control over the use of information that might be sensitive, or dangerous, or just embarrassing. But data doesn't exist in a discrete, physical form. It's knowledge, and knowledge can be effortlessly replicated and distributed—including in people's minds. How would we control that?

One possibility, suggested by Mark Skilton, an information systems professor at Warwick Business School in the U.K., is to separate the right to possess your personal information from the right to grant permission to others for its use. "New personal data services will evolve to track and better enable people to manage their data while maintaining security and privacy," he predicts.

That could work for, say, Social Security numbers or other keys to our identities. But much of the personal information over which people worry involves interactions with other people. Records of what we bought are also records of what other people sold. Accounts of where we travel are also accounts of who transported us or rented us a room. If we can control data about our purchases, can vendors also restrict what we say about their role in the deal? If so, then good luck, Yelp.

We get a glimpse of the pitfalls in this approach from the implementation of the European Union's General Data Protection Regulation (GDPR), intended to give individuals control over personally identifiable information, which can be "anything from a name, a photo, an email address, bank details, your posts on social networking websites, your medical information, or your computer's IP address," according to the European Commission. Compliance has proven challenging, especially for small businesses, which have struggled to navigate the law's bureaucratic complexities far more than tech behemoths like Facebook and Google.


So creating an obstacle course of red tape may not be the most effective way to go. Not unless we're trying to entrench big firms and create a full-employment act for lawyers, that is.

But such difficulties in implementation don't erase legitimate concerns about the collection and use of our data. Nobody wants his identity stolen because a company can't be bothered to safeguard a massive store of hacker-bait. And political profiles assembled from our online activities pose the distinct danger of putting targets on our backs in an era when "partisans fixate on the goal of defeating and even humiliating the opposition at all costs," as Stanford University's Shanto Iyengar and Masha Krupenkin wrote recently in Advances in Political Psychology.

What to do?

Enhanced liability for companies that compile sensitive data but fail to adequately secure it seems appropriate. Anybody who sees enough value in gathering such information that they build a business model around it should be expected to take measures to keep it safe. If they don't, they should be prepared to pay the price.

Still, we all need to be smart about distributing our information and managing our own brands. Using social media is, of course, far from a necessity. But if you do, there are ways to limit the ability of tech industry voyeurs to look into your life.

Those Facebook ads become a lot more generic if you install the Facebook Container Firefox add-on to isolate the social media giant from the rest of your online activity. Anti-tracking extensions like Ghostery and any of numerous ad blockers help you use the internet without leaving trails of breadcrumbs wherever you go. So does favoring a privacy-respecting search engine like DuckDuckGo over Google.

It's also possible to smudge the outlines of your profile with simple actions such as sharing your supermarket loyalty card—if you choose to use one—with the friendly folks behind you in line. The gas credits are nice, and let the marketers grapple with your apparent ownership of ten cats and inexplicable thirst for white zinfandel. After all, profiles of our lives are only as accurate as the data fed into them.

Those of us who care about our privacy will always be torn over what we should share with the world. There are no perfect solutions, but we can certainly make efforts to manage our brands, and to make marketers doubt the reliability of their profiles by actively tainting them with false information.

Corporate Collection of Big Data Makes Your Life Better

Declan McCullagh

At this very moment in Silicon Valley, Seattle, New York, Zurich, Tel Aviv, or Tokyo, a software engineer is puzzling out better ways to use large amounts of personal data about you and billions of other people. Our engineer's goal is to make your life a little more convenient: Your phone will do a better job of searching your photos, avoiding traffic, or suggesting books to read.

Private data collection done with the user's consent isn't spying. It's a way of figuring out what individual customers want and need to serve them better.

These abilities are powered by a type of analysis called machine learning. Its statistical techniques are capable of identifying patterns in data that previously required human intelligence to discover. In general, the more data—sound clips, photographs, Uber rides, and so on—used for training the system, the better its internal models will become and the more useful the results will be. When executed well, it's nearly magical. When done poorly, well, we're all very sorry you were asked if you wanted to buy Hillary Clinton's It Takes a Village, Tenth Anniversary Edition.
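One statistical reason more data tends to help can be sketched in a few lines of Python. This is only an illustration, not a machine learning system and not anything the companies named here are known to run: it assumes a hypothetical preference rate (say, how often a user taps a suggested book) estimated from independent observations, and uses the textbook standard error of a proportion to show how the uncertainty shrinks as observations pile up. The function name and the 50 percent rate are invented for the example.

```python
import math


def standard_error(rate: float, n_observations: int) -> float:
    """Standard error of a proportion estimated from n independent observations.

    Hypothetical helper for illustration; `rate` is the assumed true
    preference rate, between 0 and 1.
    """
    if n_observations <= 0:
        raise ValueError("need at least one observation")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return math.sqrt(rate * (1.0 - rate) / n_observations)


# Uncertainty about an assumed 50 percent preference rate at three data volumes.
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} observations: +/- {standard_error(0.5, n):.4f}")
```

Each hundredfold increase in observations cuts the uncertainty by a factor of ten, which is one reason engineers keep asking for more data even when they already have plenty.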

Machine learning techniques are still rudimentary, but they're improving all the time. Google's Eric Schmidt predicted in 2016 that they will underpin "every successful, huge" initial public offering during the next few years, and he might be right. The limits of the different techniques—which go by names like neural nets, ensemble learning, and support vector machines—have yet to be reached. But in general, additional data mean better results. The more a system knows about you, the more it helps.

In a few years, a virtual assistant may place and receive phone calls on your behalf, send emails, drive your car, and diagnose medical conditions. This is not general-purpose artificial intelligence like we see in science fiction, but it can still be useful. And your future assistant will do a better job if you let it access personal information about you.

Every major technology company I'm familiar with uses machine learning techniques. Amazon uses them for product recommendations, personalized ads, Alexa, and its Amazon Go physical store. Twitter uses them to create personalized timelines. Netflix uses them to improve streaming quality and to personalize not only movies but also the on-screen artwork displayed for each movie. Facebook uses them to recognize your friends' faces and generate street addresses from satellite imagery. Yelp uses them to improve image classification (so restaurant photographs uploaded by users are categorized properly). Thanks to off-the-shelf services like Google Cloud and Amazon Web Services, machine learning employing large data sets already has become the foundation of personalized computing.


Some libertarians may like the trade-off involved in sharing personal data in exchange for recommendations or virtual assistants. Other libertarians may not. But the choice should be yours: Your idiosyncratic dislike of someone else's personal preferences does not qualify as a compelling reason to demand government intervention.

None of this should be taken as a defense of companies that lie about what they're doing, conceal important details, or fail to adequately protect their users' information. Laws prohibiting fraud remain libertarian-compatible, and the tort bar will be happy to pounce on misfeasance. An example from 2005: Sony BMG failed to disclose that its CD copy protection contained a so-called rootkit, which introduced vulnerabilities and leaked user data. It was a brain-dead corporate decision, made worse by management's initial response, and the episode ended with Sony writing settlement checks for up to $50 million.

By now, astute readers will have realized that there is a potential privacy problem separate from corporate blundering: If large quantities of your data are remotely stored on servers, law enforcement and intelligence agencies will surely demand the ability to gain access. Worse, the privacy threat increases with the volume and sensitivity of the data. The very information that allows a virtual assistant to operate efficiently—your spending habits, political and religious views, and minute-by-minute location—is a target for a legal, or perhaps even extralegal, fishing expedition.

One response would be to enact a law curbing what information third parties can collect. But that makes as much sense as preventing companies from manufacturing binoculars simply because police can use them for spying.

The more sensible approach is to curb government surveillance. That means taking steps such as updating the Privacy Act of 1974 to limit government access to outsourced databases; increasing the authority of inspectors general at federal agencies to monitor abuses; boosting criminal penalties for lawbreaking officials; and perhaps most important, rethinking the drug laws that continue to invite snooping into our personal lives. (In 2017, over half of the 3,813 federal and state wiretaps reported to the Administrative Office of the U.S. Courts were for drug-related offenses. Investigations into violent crimes, including homicide, robbery, assault, and kidnapping, amounted to less than 10 percent of the total, though these figures do not include wiretaps done under the Foreign Intelligence Surveillance Act.)

Another crucial fix is to provide a broader legal shield protecting personal data held by third parties. This is happening, albeit slowly. In June 2018, in Carpenter v. United States, the Supreme Court ruled that police needed a warrant supported by probable cause before they could obtain cell-site records (which revealed location information) from a man's wireless carrier. And since the 6th Circuit Court of Appeals' 2010 ruling in U.S. v. Warshak, technology companies generally require prosecutors to obtain warrants signed by judges, not mere subpoenas, before divulging the content of email messages.

In the long run, as we share more with our virtual assistants, perhaps certain deeply personal data should simply become off-limits to the government. This would follow other legal privileges, including the attorney-client privilege, the marital privilege, the clergy privilege (protecting both formal confessions as in the Catholic Church and confidential communications to other clerics), the accountant-client privilege, and the physician-patient privilege. These are not absolute; exceptions exist for future crime or fraud, and a privilege can be waived in other ways.

The real danger in exabyte-scale data collection coupled with advances in machine learning isn't from private companies attempting to learn your preferences. It's when governments, probably led by China, start to really figure it out.

The enabling technology is increasingly available to anyone who wants it. A generation ago, automatic license plate recognition was the stuff of Hollywood thrillers. Now anyone can download the code for free. I'm thinking of installing it on an extra computer to recognize cars pulling into my driveway. If I might do this in my spare time, what will the government come up with when it actually puts some thought into it?