How Do DNA Ancestry Tests Really Work?
Not long ago, genetic tests that are widely available today were the domain of dystopian science fiction. Now, they're a nice gift to buy your genealogy-minded aunt for her birthday.
Companies such as 23andMe, Ancestry.com and National Geographic market these at-home DNA testing kits, offering to unlock your genetic secrets for the price of a group dinner at a nice restaurant and about half a teaspoon of spit.
And although these tests were once marketed primarily as health services (ways to test for diseases and better understand your body), that aspect of their branding has receded, thanks in part to action from U.S. regulators. Nowadays, most of the big genetic testing companies pitch themselves primarily as "ancestry" services, promising both to connect long-lost relatives and to tell users what parts of the world their ancestors came from.
"The ancestry service is a collection of features that give you a comprehensive look into your history, from the very ancient past, 60,000 years ago with Neanderthals, up to the recent past," said Robin Smith, who heads 23andMe's ancestry program.
Customers send spit samples to these companies. Then, usually about two months later, they log in to their accounts to find personalized web pages with information like their percentage of South Asian ancestry, or Neanderthal ancestry, or details about their maternal and paternal lines.
But how do these services actually work to determine someone's ancestry?
Smith told Live Science that 23andMe uses a number of algorithms to arrive at these results.
Once the DNA in a spit sample has been digitized, it looks like a long string of C's, G's, T's and A's. Those are the labels given to the four nucleobases of DNA, the letters with which genes are written.
This string of letters would be incomprehensible to you and, on its own, just as incomprehensible to the biologists and engineers who study it. There's no string of letters that means "Swiss" or "Nigerian," for example. But the algorithms can pull meaning out of the strings of letters, Smith said.
These companies keep the details of their algorithms somewhat secret. But it's not that their computers speak some secret language. Instead, according to geneticist Mark Stoneking, group leader of the Max Planck Institute for Evolutionary Anthropology in Germany, they're really good at spotting patterns.
"These are techniques that scientists have known about for a very long time," Stoneking told Live Science.
He used a version of these methods in his pioneering work tracing the most recent common maternal-line ancestor of all living humans, a woman referred to as "Mitochondrial Eve" who lived about 200,000 years ago. And researchers still use these methods to track the movements and intermixing of human populations from the deep past to recent history.
If a genetic anthropologist has a DNA sample and a very large library of other samples to compare it against, that anthropologist can quickly figure out which groups in the library that DNA is most closely related to, Stoneking said.
"It's a robust method," he added.
The nitty-gritty
Researchers can track paternal ancestry by looking at the Y chromosome, which fathers pass to their male children. Maternal ancestry, similarly, can be found in mitochondrial DNA, which mothers pass to all of their children. The richest and most detailed ancestry information, however, comes from comparing everything else — the 22 non-sex chromosomes — against the massive libraries.
"The way that the algorithm works, it takes an entire genome and chunks it up," Smith said. "It takes little pieces, and for each piece, it compares it against the reference data set. It compares it against British; it compares it against West African; it goes through the entire list, and it spits out a probability for [where that piece of DNA came from]."
So, if your 23andMe test says you're 29 percent British, it's because 29 percent of the pieces of your DNA were most likely to have come from a group that 23andMe's reference library has labeled "British."
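The chunk-and-compare idea Smith describes can be illustrated with a toy sketch. This is not 23andMe's algorithm, whose details are not public. The reference sequences, the chunk size, the similarity score and every function name here are assumptions made up for illustration, and real reference panels contain many samples per population.

```python
from collections import Counter

CHUNK_SIZE = 4  # toy value (assumption); real segments span far more markers

# Hypothetical reference library: population label -> example sequences.
REFERENCE = {
    "British": ["ACGTACGTACGT"],
    "West African": ["TTGACCGATTGA"],
}

def similarity(a, b):
    """Fraction of positions at which two sequences carry the same letter."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def assign_chunk(piece, start, reference):
    """Score one chunk against every population and return (best label, probabilities)."""
    scores = {
        label: max(similarity(piece, s[start:start + len(piece)]) for s in seqs)
        for label, seqs in reference.items()
    }
    total = sum(scores.values())
    # Normalize the scores so they sum to 1 (a crude stand-in for a probability).
    probs = {k: (v / total if total else 1 / len(scores)) for k, v in scores.items()}
    return max(probs, key=probs.get), probs

def ancestry_percentages(sample, reference, size=CHUNK_SIZE):
    """Percentage of chunks whose most likely source is each population."""
    counts = Counter()
    for start in range(0, len(sample), size):
        label, _ = assign_chunk(sample[start:start + size], start, reference)
        counts[label] += 1
    n = sum(counts.values())
    return {label: 100 * c / n for label, c in counts.items()}

sample = "ACGTACGTTTGA"
result = ancestry_percentages(sample, REFERENCE)
print(result)
```

In this toy run, two of the three chunks best match the "British" reference and one best matches "West African," so the sample is reported as roughly 67 percent and 33 percent, respectively.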
The names for those ancestry groups, Stoneking said, come from a mix of self-reports (many people can describe their immediate background pretty well) and independent research. So, if an algorithm finds that 8,000 people are from a close-knit ancestry group, and the researchers know that all of those people trace their heritage to Thailand, they might label that group "Thai."
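As a rough sketch of that labeling step, assume a group has already been identified and each member has a self-reported origin. The `label_group` function and its `min_agreement` threshold are hypothetical and do not come from any company's published method.

```python
from collections import Counter

def label_group(self_reports, min_agreement=0.9):
    """Return the dominant self-reported origin, or None if agreement is too low.

    min_agreement is an assumed cutoff chosen for illustration.
    """
    if not self_reports:
        return None
    origin, count = Counter(self_reports).most_common(1)[0]
    return origin if count / len(self_reports) >= min_agreement else None

# 8,000 people who all trace their heritage to Thailand, as in the example above.
close_knit = ["Thailand"] * 8000
# A group with no dominant origin gets no label.
mixed = ["Thailand"] * 5 + ["Vietnam"] * 5

thai_label = label_group(close_knit)
mixed_label = label_group(mixed)
print(thai_label, mixed_label)
```

The first group is labeled with its shared origin, while the evenly split group is left unlabeled because no single origin clears the threshold.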
The problem, as Stoneking described and Smith acknowledged, is that these methods are only as good as the libraries researchers have to compare DNA samples to.
The details of 23andMe's libraries — like those of all of its major competitors — aren't public, but Smith said the company can provide much more detailed information on European populations (among their most-sampled groups) than, for example, Native American populations (among the least-sampled groups). That's why an ancestry page can parse Irish from Anglo-Saxon, or Ashkenazi Jewish from Polish, but might combine Inuit and Navajo into a single category.
Therefore, while the underlying tools are valid, there are limits to the quality of broad ancestry data, Stoneking said.
However, the more individualized sorts of information these companies offer, such as finding long-lost relatives, are more certain, Stoneking said. A company doesn't need a big library to know whether DNA samples come from family members; it just needs algorithms that have been refined for decades.
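To see why relative matching needs no reference library, consider a toy comparison of two samples directly against each other. The article does not describe how the companies do this. The sketch assumes that close relatives share long identical stretches of DNA, and the sequences and function name are invented for illustration.

```python
def longest_shared_run(a, b):
    """Length of the longest stretch of consecutive matching positions."""
    best = current = 0
    for x, y in zip(a, b):
        current = current + 1 if x == y else 0
        best = max(best, current)
    return best

# Made-up sequences: the relative shares one long stretch, the stranger only scraps.
person = "ACGTACGTTGCA"
relative = "ACGTACGTTGAC"
stranger = "TCGAACTTAGCC"

run_relative = longest_shared_run(person, relative)
run_stranger = longest_shared_run(person, stranger)
print(run_relative, run_stranger)
```

Only the two samples being compared are needed here, which is the point Stoneking makes: no library of labeled populations enters the calculation.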
Originally published on Live Science.