The police in Sacramento, California, famously used genetic matching data from the direct-to-consumer (DTC) genealogy website GEDmatch to identify former police officer Joseph James DeAngelo, age 72, as the "Golden State Killer" who committed a series of rapes and murders in California in the 1970s and '80s. Genetic information that some of his distant relatives had provided to the website led the police to suspect DeAngelo. The police then connected him directly to the murders and rapes by matching old crime scene DNA to his DNA, obtained from a car door handle and a piece of discarded tissue paper.
Similar genetic genealogy matching has been used to identify criminal perpetrators in about 13 cold cases in the U.S. Police investigators can be expected to rely on such long-range familial searches more often as Americans voluntarily supply more and more genetic information to open websites such as GEDmatch. Is this a problem? Does this violate the privacy of Americans?
A recent article in Science on identity inference from genomic data using long-range familial searches, by a team led by Columbia University bioinformatics researcher Yaniv Erlich, reports that genetic genealogy matching combined with easily obtainable demographic information will very soon be able to identify almost anyone, at least among Americans of European ancestry. Genetic matching of even distant relatives enables investigators to refine family trees until they identify specific individuals. Such matching is becoming much easier as genetic databases grow. The researchers predict that "with a database size of ~3 million US individuals of European descent (2% of the adults of this population), over 99% of the people of this ethnicity would have at least a single 3rd cousin match and over 65% are expected to have at least one 2nd cousin match. With the exponential growth of consumer genomics, we posit that such database scale is foreseeable for some 3rd party websites in the near future."
As it happens, 23andMe tells me that I have 1,082 genetic relatives ranging from first to sixth cousins identified in their database.
Most legal analysts agree that police use of long-range familial searches of nonforensic genetic databases violates neither constitutional protections nor current laws. Nevertheless, some object to suspicionless genetic surveillance by the police. One proposal, by University of Baltimore law professor Natalie Ram and her colleagues, is the adoption of a Stored Genetics Act similar to the Stored Communications Act, under which a court may order disclosure of electronic records if the government "offers specific and articulable facts showing that there are reasonable grounds to believe" that the records sought "are relevant and material to an ongoing criminal investigation."
Ram and her colleagues observe that such an act "would likely render law enforcement searches of nonforensic genetic databases unlawful for crime-detection purposes, as there can be no 'specific and articulable' connection between particular database records and a particular crime when investigators seek to use such a search to generate leads, not investigate them. Thus, although such an approach would preserve freedom from perpetual genetic surveillance by the government, it may well result in fewer solved cases."
Setting aside concerns about how the police might use DTC genetic information, three University of Washington computer scientists outline, in an article posted at arXiv, various ways that criminals could misuse unprotected DTC genetic information. Specifically, they suggest that forged genetic profiles could be used by criminals to misdirect investigations, by con artists to defraud victims, or by political operatives to blackmail opponents. For example, a criminal could manipulate genetic data to create a database profile for a synthetic second cousin that would point police using long-range familial searches away from the actual perpetrator.
Both Erlich and the University of Washington researchers suggest that one good way to forestall the manipulation of digitized genetic data for nefarious purposes is for DTC genetic testing companies to digitally sign the raw data files they generate, using a cryptographic key that each company controls. Profiles submitted without a valid DTC digital signature, such as those of participants in genetic disease research programs, would be rejected by third-party databases like GEDmatch. A digital signature system would also complicate police use of long-range familial genetic searches, since investigators would have to submit crime scene samples to DTC companies to obtain digital signatures before the resulting profiles could be compared with those already in third-party databases.
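To make the idea concrete, here is a minimal sketch of how a third-party database might reject unsigned or altered uploads. It is an illustration under stated assumptions, not the scheme either research group specifies: the key, the function names, and the sample data line are all hypothetical, and Python's standard-library HMAC (a shared-secret message authentication code) stands in for a true public-key digital signature. In a real deployment the company would sign with a private key and the database would verify with only the matching public key.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret held by the testing company. HMAC is a symmetric
# stand-in here; a real digital signature would use a private/public key pair.
COMPANY_KEY = b"example-key-held-by-the-dtc-company"


def sign_profile(raw_data: bytes, key: bytes = COMPANY_KEY) -> bytes:
    """Return an authentication tag computed over the raw genotype file."""
    return hmac.new(key, raw_data, hashlib.sha256).digest()


def accept_upload(raw_data: bytes, tag: Optional[bytes],
                  key: bytes = COMPANY_KEY) -> bool:
    """Accept an upload only if it carries a tag that matches its contents."""
    if tag is None:
        return False  # unsigned profiles are rejected outright
    expected = sign_profile(raw_data, key)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, tag)


# Made-up one-line genotype file, for illustration only.
original = b"rs0000001\t1\t12345\tAA\n"
tag = sign_profile(original)
forged = original.replace(b"AA", b"GG")  # profile altered after signing

print(accept_upload(original, tag))  # genuine, signed file
print(accept_upload(forged, tag))    # altered file with the old tag
print(accept_upload(original, None)) # file with no tag at all
```

In this sketch the genuine file is accepted, while the altered file and the unsigned file are both turned away, which is the property that would block a forged "synthetic cousin" profile from entering the database.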
Despite his concerns, how worried is Erlich about his own genetic privacy? Not very. He, like me, has posted his genomic information online. "If you ask me, do you want to share with me your genealogy or your cellphone records or search-engine records, I will share my genealogy," he explained in an interview at The Atlantic earlier this year. "If you ask me do you want your search-engine data or data your ISP sees or your bank account versus your genome, your genome is actually quite—I don't think it's very interesting."
But as Erlich demonstrates, the rapid expansion of DTC genetic testing databases will make it increasingly hard for any of your criminal relatives to hide from the police.
In any case, it's a good idea for DTC genetic testing companies to adopt digital signatures as a way to allay concerns about the identification of research participants and to forestall the misuse of genetic data by criminals and the police.