A growing number of criminal suspects are being arrested after investigators searched genealogical websites, raising ethical concerns about the use of genetic databases. Scientists from Israel checked whether a person can actually be tracked down through relatives' DNA, and they also offer a solution.

In April 2018, Joseph James DeAngelo was arrested and charged with the murder of 12 people and the rape of dozens of women in the 1970s and 80s. The so-called Golden State Killer was the first to be caught by comparing his DNA with genetic profiles uploaded by genealogy website users, but he is by no means the last. A recent study, led by scientists from the Israeli company MyHeritage, highlights the potential dangers of such uses and even offers a solution.

Companies that perform private genetic tests and provide their customers with information about their ancestry and family tree have become very popular in recent years. Over 16 million people have already taken these tests with the leading companies, including 23andMe and MyHeritage, which also operates in the US. The genetic profiles can be compared to others kept by the same company in an attempt to locate family members, or uploaded to genealogical websites such as GEDmatch, which let users compare their own genetic profile with profiles from other companies. MyHeritage also allows users to upload genetic profiles from other companies to its database.

When law enforcement officials began uploading profiles they had created from DNA found at crime scenes, they struck gold. The commercial profiles contain much more information about each person than the FBI's genetic database does. To make a high-confidence identification between a DNA sample and a specific person, police investigators compare the DNA sequence at a few dozen defined regions. In contrast, the commercial tests are designed to trace family roots and are therefore much more extensive, covering some 600,000 DNA regions. Consequently, they can be used to locate not only first-degree relatives but also much more distant ones, such as third cousins. Unlike government databases, genealogical websites are subject to no oversight, and by identifying relatives and cross-referencing some public information, investigators can frequently reach the suspect him/herself.

From April to August of 2018 alone, at least 13 suspects were arrested on the basis of information acquired by investigators from such websites. While most cases were crimes committed long ago, like those of the Golden State Killer, one was a crime committed in April of this year – indicating that at least some police officers use the genetic databases as another professional tool, relying on them for their ongoing work.

But this type of database use raises legal and ethical concerns. To date, all investigations involving genealogical websites have been conducted in the US and have dealt with serious crimes – rape and murder. But no one can guarantee that this will remain the case in the future. "I think we will see an increasing use of these means by the police, possibly extending into property crimes, political offenses, or any other kind of transgression," said Dr. Yaniv Erlich, Chief Science Officer at MyHeritage and Professor of Computer Science at Columbia University in the US, to Davidson Online.

Everyone can order a test and add their DNA to databases containing information on millions of people. A genetic test kit | Photograph: Shutterstock

Narrowing down the list

What are the odds that searching a genealogical database will indeed lead to finding a suspect’s relatives? And once found, what are the odds of their leading the police to the suspect? Erlich and his colleagues asked these questions in their recent study.

They used MyHeritage's database, with its 1.28 million profiles, running each one as if it were the "target," namely, the DNA for which relatives are being sought. For about 60% of the profiles they found a third cousin, and for 15%, a second cousin or an even closer relative. In addition, the researchers conducted 30 searches on GEDmatch, where profiles obtained through different commercial companies can be uploaded, and made similar findings.

With the help of Dr. Shai Carmi from the School of Public Health at The Hebrew University of Jerusalem, the researchers constructed a model simulating the genetic variation in the population. According to the model, a database needs to contain only 2% of a given population in order to include at least one third cousin of almost any person from that population. This means that a database of about 3 million Americans of European descent, who make up most of the users of the companies offering genetic testing, would contain relatives of over 90% of that population.

Can identifying a third cousin really lead the police to make an arrest? Based on familial relationship alone, the list of suspects would include about 850 names on average – not exactly a number that would lead us straight to the suspect we are looking for. But if we add further details, such as the suspect's sex (which we can learn from the DNA) and an estimate of their age according to eyewitnesses, and also limit our search to people living within a 100-mile radius of the crime scene, the list narrows significantly. According to the researchers' calculations, we would be left with about 16-17 suspects on average, a group small enough for commonly used law enforcement measures to be applied.
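The narrowing step described above can be pictured as a chain of simple filters. The following Python sketch is only an illustration under stated assumptions: the `Candidate` record, the `narrow` function, and the five made-up relatives are hypothetical and are not taken from the researchers' paper, which estimated the average list sizes statistically.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical; they stand in for details found in public records.
    name: str
    sex: str                 # "F" or "M", compared with the sex read from the DNA sample
    age: int                 # compared with the eyewitness age estimate
    miles_from_scene: float  # distance between home address and crime scene


def narrow(candidates, sex, age_range, max_miles=100.0):
    """Keep only candidates matching sex, age range, and distance from the scene."""
    youngest, oldest = age_range
    return [
        c for c in candidates
        if c.sex == sex
        and youngest <= c.age <= oldest
        and c.miles_from_scene <= max_miles
    ]


# Made-up relatives of a hypothetical third-cousin match.
relatives = [
    Candidate("A", "M", 45, 30.0),
    Candidate("B", "F", 47, 12.0),    # wrong sex
    Candidate("C", "M", 72, 8.0),     # outside the age estimate
    Candidate("D", "M", 41, 650.0),   # lives too far away
    Candidate("E", "M", 50, 95.0),
]

shortlist = narrow(relatives, sex="M", age_range=(40, 55))
print([c.name for c in shortlist])
```

Each filter is weak on its own; it is the combination that shrinks a list of hundreds of names to a number investigators can check one by one.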

So it seems that the new method may indeed be very efficient, as long as investigators have access to a sufficiently large database. Such databases are found on genealogical websites, but also in research projects such as Pseifas (Hebrew for mosaic), the Israeli Ministry of Health's project for creating a medical database containing 100,000 profiles.

Erlich emphasizes that he supports the project. "I would recommend that people sign up for the Pseifas database," he said. "We have to look at its advantages: this type of database can help us find a treatment for the disease of someone we care about." Nevertheless, 100,000 profiles is close to 2% of the Israeli population, which means that the database could also serve as an efficient tool for law enforcement officials searching for suspects. "If it were stolen by a foreign country, it could serve to identify DNA left in that country," added Erlich. "Even if that person never submitted their details to the database."

The use of these databases is only going to increase, and this has many advantages. Yaniv Erlich | Photograph: MyHeritage

Filtering profiles

Concerns about inappropriate database use have led some of the companies to tighten their terms of service: GEDmatch permits the authorities to submit samples only for investigations of violent crimes, such as murder and rape, and MyHeritage prohibits uploading such samples altogether. Currently, however, the companies have no actual means of preventing this and, at most, can protest after the fact.

Erlich and his colleagues offer a solution that addresses this issue: a cryptographic signature that would accompany each profile created by the large companies. "When a user uploads a profile to the website, I can check whether that signature is valid," explains Erlich. "In the event of a missing signature, the sample is not automatically dismissed, but undergoes examination. I take you to another page, check who you are, start a conversation with you. The idea is to build a higher wall, to ensure that whoever is using the website is legitimate."

The research paper contains example code that creates such a signature, but Erlich emphasizes that it is only a demonstration. "We need to give this some extra thought, how to create the security, and that will take some time. The paper aims to present the problem, and offer policy-makers a way to think about it."
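To make the idea concrete, here is a minimal Python sketch of the upload check. It is not the code from the paper. As a simplifying assumption it uses a keyed hash (HMAC) from the standard library as a stand-in for a real digital signature; an actual deployment would more likely use a public-key scheme, so that a website could verify a profile without being able to sign one. The key, the function names, and the sample profile text are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by a testing company (assumption for this sketch only).
COMPANY_KEY = b"demo-key-not-for-real-use"


def sign_profile(profile: bytes, key: bytes = COMPANY_KEY) -> str:
    """Return a hex tag that binds the profile bytes to the issuing company's key."""
    return hmac.new(key, profile, hashlib.sha256).hexdigest()


def check_upload(profile: bytes, tag, key: bytes = COMPANY_KEY) -> str:
    """Accept validly tagged uploads; send everything else to manual review."""
    if tag is not None and hmac.compare_digest(sign_profile(profile, key), tag):
        return "accepted"
    # A missing or invalid tag does not reject the sample outright;
    # it triggers a closer look at who is uploading it.
    return "manual review"


# Made-up profile contents, for illustration only.
profile = b"rs123 AG\nrs456 CC\n"
tag = sign_profile(profile)

print(check_upload(profile, tag))                 # profile exactly as issued
print(check_upload(profile + b"rs789 TT\n", tag)) # profile altered after signing
print(check_upload(profile, None))                # no tag at all
```

The design follows Erlich's description: an unsigned or altered profile is not thrown out, it simply loses the fast path and has to pass a human check.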

Alongside the concerns about potentially problematic uses of the data in these databases, even in pursuit of a worthy goal, we must remember that nearly all uses of genetic databases are for a good cause, from locating relatives to medical research. "We have wonderful success stories, adopted children finding their birth families, Holocaust survivors finding relatives after decades," says Erlich. "We are telling the people that manage the databases – think about sharing your data. We can build a beautiful thing, we just have to do it right."


Translated by Elee Shimshoni