Medical databases and privacy

Privates Made Public

Katherine Mangu-Ward | From the March 2010 issue

New reporting requirements in Oklahoma could force women who receive abortions to have their private information entered in a public database. The rules are on hold pending a court hearing, which at press time had not yet been scheduled.

The new regulations would require doctors to collect and report information about every abortion in the state, including the mother's age, marital status, race, number of children, education level, relationship to the father, and reason for the abortion, as well as the cost and method of payment. The form contains 37 questions in all, most with several subsections. One goal of the law, which also includes a ban on sex-selective abortions, is to make the data available to researchers and the general public on the state government's website.

To keep such personal information private, the database would strip out women's names and other obvious identifying information, theoretically "anonymizing" the data. But as Latanya Sweeney of Harvard's Center for Research on Computation and Society told BioEdge, "data tend to flow around and get linked to other data." Even when obvious identifying information is removed from a large data set, personal identities often can be cracked by a geek with time on his hands. Arvind Narayanan and Vitaly Shmatikov, for instance, broke the anonymity of a large set of Netflix movie preference data by comparing the ratings users gave specific films, and the dates of those ratings, with similar ratings on the popular Internet Movie Database, where users reveal personal information in public profiles. Something similar happened when AOL released "anonymized" search queries that nonetheless made identifying some users quite simple, with potentially embarrassing results.
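For readers curious how such a linkage works in practice, here is a minimal Python sketch of the general idea: count how many entries in an "anonymized" record line up, by title, score, and approximate date, with entries on a public profile. It is a deliberate simplification and not the researchers' actual method. The function names, the three-day window, the two-match threshold, and every record in the sample data are hypothetical, invented for illustration.

```python
from datetime import date

def overlap_score(anon_ratings, public_ratings, max_days=3):
    """Count anonymized ratings that have a public counterpart: same title,
    same score, dated within max_days (an assumed tolerance)."""
    score = 0
    for title, stars, when in anon_ratings:
        for p_title, p_stars, p_when in public_ratings:
            same_item = title == p_title and stars == p_stars
            if same_item and abs((when - p_when).days) <= max_days:
                score += 1
                break  # count each anonymized rating at most once
    return score

def link_record(anon_ratings, public_profiles, min_matches=2):
    """Return the public name whose ratings overlap most with the anonymized
    record, or None when the best match is too weak or is tied."""
    scores = sorted(
        ((overlap_score(anon_ratings, ratings), name)
         for name, ratings in public_profiles.items()),
        reverse=True,
    )
    if not scores:
        return None
    best_score, best_name = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    if best_score >= min_matches and best_score > runner_up:
        return best_name
    return None

# Hypothetical data: one "anonymized" record and two public profiles.
anonymized = {
    "user_4821": [
        ("Film A", 5, date(2005, 3, 1)),
        ("Film B", 2, date(2005, 3, 9)),
        ("Film C", 4, date(2005, 4, 2)),
    ],
}
public_profiles = {
    "Jane Example": [
        ("Film A", 5, date(2005, 3, 2)),
        ("Film C", 4, date(2005, 4, 2)),
    ],
    "John Sample": [
        ("Film B", 3, date(2005, 3, 9)),
        ("Film D", 5, date(2005, 5, 1)),
    ],
}

match = link_record(anonymized["user_4821"], public_profiles)
print(match)
```

In this toy data, two of the three "anonymous" ratings coincide with one public profile and none with the other, so the record is linked to a name even though no name was ever stored with it. A handful of ordinary details, each harmless alone, is enough once a second data set is available for comparison.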

Paul Ohm, a law professor at the University of Colorado, summed up the problem in an interview with the tech website Ars Technica: "Data can either be useful or perfectly anonymous but never both."