Baby Versus Bathwater: Distant Matches

If you've done an autosomal DNA test for genealogy (e.g. AncestryDNA, 23andMe, FamilyTreeDNA's Family Finder, or MyHeritage), you may have gotten a bewildering collection of DNA matches that the testing company suggests are some sort of cousin. For example, when I downloaded my full set of AncestryDNA results last week, I had 14,435 matches. That's a lot of relatives!

In January 2017, Blaine Bettinger (who blogs as The Genetic Genealogist) published "The Danger of Distant Matches". In this post, he compared his matches to those of both of his parents and found that 32% of his matches were not shared with either of his parents. The vast majority of these problematic matches shared a small amount of DNA (10 centimorgans (cM)* or less). Since he got all of his DNA from his parents, this means that these matches have to be false positives (or false negatives for his parents).

Last month, I had the privilege of studying with Bettinger and other experts in the week-long course on Practical Genetic Genealogy at the Genealogical Research Institute of Pittsburgh (GRIP). We discussed the distant matches findings and Bettinger commented that he'd like to have more people do this analysis so we can have a bigger data set to consider.** Since both of my parents have also tested at AncestryDNA (and given their permission for me to blog about their results), I replicated Bettinger's process with my own family's data.

Using the DNAGedcom Client, I downloaded match lists for me and both of my parents on July 26 and July 27, 2017. I used the Match-o-matic tool, available in Version of the DNAGedcom Client, to analyze the match lists to see which matches are (and are not) shared.

Overall, 17% of my matches were unshared with either parent (2,411 out of 14,435), which is more encouraging than the 32% rate that Bettinger found for his results.
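
For readers who like to see the mechanics, the comparison boils down to set arithmetic. Here is a minimal sketch in Python. It is not how Match-o-matic is actually implemented; it assumes each match list has already been reduced to a set of match identifiers, and the function name and sample IDs are made up for illustration.

```python
def unshared_matches(child, mother, father):
    """Return the child's matches that appear in neither parent's list."""
    return child - (mother | father)


# Made-up identifiers standing in for real match IDs.
child = {"m01", "m02", "m03", "m04", "m05", "m06"}
mother = {"m01", "m02", "m90"}
father = {"m03", "m04", "m91"}

unshared = unshared_matches(child, mother, father)
rate = len(unshared) / len(child)
print(sorted(unshared), f"{rate:.0%}")
```

With my real numbers, the same arithmetic is 2,411 divided by 14,435, or about 17%.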

Here's the breakdown of my unshared matches for the categories that Bettinger used:

  • Largest match: 18.1 cM
  • 10 share 15 cM or more
  • 130 share 10 cM or more (5% of unshared matches)
  • 894 share 7-10 cM (37% of unshared matches)
  • 1,387 share 6-7 cM (58% of unshared matches)

I tend to like visual representations of data, so I wanted to see the shape of the curve for the proportion of matches that appear to be valid. I put the data into buckets based on the total amount of shared DNA in centimorgans and here's what I found:

From this set of data, it appears that the probability of one of my matches being shared with a parent is very good (98% or better) for matches down to about 12 cM. When matches share less DNA with me than that, the likelihood that they also show up as a match for one of my parents starts to drop off. By the time we get down to about 9 cM, the probability of a match being shared with a parent drops to about 90%. Below that, it drops off more dramatically, with only two-thirds (67%) of matches between 6-7 cM being shared with one of my parents.
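
If you'd like to build the same kind of curve from your own match lists, the bucketing step can be sketched as follows. This is a rough illustration rather than the exact process I used: it assumes each match has been reduced to a pair of (total shared cM, whether the match also appears for a parent), the 1 cM bucket width is my choice for the example, and the sample data is invented.

```python
import math
from collections import defaultdict


def shared_rate_by_bucket(matches):
    """Group matches into 1 cM buckets and compute the fraction seen in a parent.

    `matches` is an iterable of (shared_cm, shared_with_parent) pairs.
    Returns {bucket_floor: fraction_shared}; bucket 6 covers 6.0 up to 7.0 cM.
    """
    totals = defaultdict(int)
    shared = defaultdict(int)
    for cm, with_parent in matches:
        bucket = math.floor(cm)
        totals[bucket] += 1
        if with_parent:
            shared[bucket] += 1
    return {b: shared[b] / totals[b] for b in sorted(totals)}


# Invented sample data, not my actual matches.
sample = [(6.2, True), (6.5, False), (6.9, True), (9.4, True), (12.8, True)]
for bucket, rate in shared_rate_by_bucket(sample).items():
    print(f"{bucket}-{bucket + 1} cM: {rate:.0%}")
```

Plotting the resulting fractions against the bucket values gives the shape of the curve described above.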

So what does this mean? Should you ignore all matches below a certain threshold because they might be, in Bettinger's words, poison? Is there some threshold above which you can say with total certainty that the match is real?

As with most things, I think a set of context-sensitive guidelines, rather than hard-and-fast rules, is the way to go. Here are some of the guidelines that I use when analyzing autosomal data:

  • Start with closer matches, rather than more distant ones. Matches that share more DNA are not only more likely to be valid matches, but they're also likely to be more closely related, so it may be easier to figure out who the shared ancestors are.
  • Use caution when considering distant matches. Perhaps because my results were a bit more encouraging than Bettinger's (or perhaps because I haven't been at this as long as he has which may lead to some naiveté on my part), I might be more favorably inclined to consider matches below the 15 cM guideline that Bettinger describes as the "safe zone". However, it is clear that the smaller the amount of DNA the matches share, the higher the risk that the match might not be valid.
  • Correlate multiple sources of evidence! The amount of DNA shared with a match is only one piece of the puzzle. The best way I know of to determine whether a match is valid or not is to correlate the amount of shared DNA with other available evidence. How long are the matching segment(s)? Do you and this match share other matches in common? If so, who are they and what do you know about them? Does this match have a tree available? If so, how complete is it? How complete is your own tree? This is just a subset of the questions I would ask when analyzing any match.

In practice, I tend to incorporate small matches into my analysis when they come to my attention for some other reason. A distant match may fit into a triangulation group (a group of three or more people, not closely related to one another, who share a particular DNA segment in common), may have an enticing set of in-common-with matches related to a question I'm researching, or may capture my attention due to a tree with surname(s) or location(s) that give me a clue as to where this match might fit in with the questions I'm trying to answer.

For those of you who are digging into autosomal DNA, what guidelines do you use to decide when to pay attention to a distant match?

*A centimorgan (cM) is a measure of the probability of a recombination event happening between two locations on DNA. Because recombination is more likely in some locations on the genome than others, it doesn't correlate perfectly with number of base pairs, or rungs in the ladder of DNA. However, for the purposes of this discussion you can think of it more-or-less as a measure of distance. On average, a centimorgan corresponds to about 1 million base pairs.

**I know of at least one other person who has also published a similar analysis: I was glad to have two sets of data other than my own to look at!

For Mother’s Day: Deep Matrilineal Ancestry

Both in my personal research and in work for clients, I spend most of my time looking at the fairly recent past: the last few hundred years and the most-recent eight to ten generations. With Mother's Day approaching, I decided to dig into the deeper history of my maternal line by looking closely at my mitochondrial DNA. I've done a mitochondrial DNA full-sequence test at Family Tree DNA that tells me that I'm in Haplogroup H11a. What does that really mean?

Before I head down this rabbit hole, I'd like to go on the record saying that I don't think there's anything about my DNA that makes it more interesting or important than anyone else's, except perhaps to me (and those who share DNA with me) because I'm curious about my own stories. I'm using myself as a case study because it's the easiest DNA for me to access, and because I've already done some research on my own ancestry, so I've accumulated a fair amount of data. Over time, I hope to use this blog to describe some of the complexity of human history. Part of that will come in illustrating the complexity of my own background (again, as a case study and not because there's anything particularly remarkable about me). In terms of genetics, this requires other people to test their DNA and to give me permission to write about the results here. I also hope that other people who are not closely related to me and whose backgrounds may be very different than mine might be willing to let me share some of their stories here.

By National Human Genome Research Institute [Public domain], via Wikimedia Commons

Mitochondria are organelles in our cells that generate energy. While most of our DNA is located in the nucleus of our cells, mitochondria have their own DNA. It is likely that mitochondria were once free-living bacteria that have long since been incorporated into nearly all eukaryotic cells. Human mitochondrial DNA (mtDNA) is a small circle made up of 16,569 base pairs (the ladder rungs in the DNA molecule). Mitochondrial DNA is passed from biological mother to offspring; biological males have their mother's mtDNA, but do not pass it on to future generations. mtDNA doesn't undergo recombination (mixing), so except for rare mutations it is passed unchanged from mother to child. This allows us to use mtDNA to trace the deep history along the matrilineal line: my mtDNA is the same as my mother's, which is the same as her mother's, which is the same as her mother's, and so on.

Scientists have sequenced the mtDNA in living humans and in archaeological remains and used this information to create a mitochondrial family tree for all humans. This tree continues to be revised based on additional data; the most-recent version (Build 17) was published on 18 February 2016. You can see it here: . This tree shows how the patterns in chance mutations in mtDNA can be mapped back through all of our matrilineal lines to a woman known as Mitochondrial Eve who is estimated to have lived in Africa somewhere between 100,000 and 230,000 years ago. Mitochondrial Eve's mtDNA sequence has been published as the Reconstructed Sapiens Reference Sequence (RSRS).1

By comparing an individual's mtDNA to the RSRS, you can see where the sequence is the same and where it differs. Those differences allow you to trace your way down the tree to one of its branches. For example, the fact that I have a G (guanine) instead of A (adenine) at positions 769 and 1018 on my mitochondrial DNA places me within the large group L3 early in the branching of the human mitochondrial tree.2 L3 later branched off into other groups that represent most maternal lineages outside of Africa. For me, that pathway goes through the following haplogroups: L3-->N-->R-->R0-->HV-->H-->H11-->H11a.
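
To make the comparison step concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how any testing company or haplogroup tool actually works: sequences are represented as small dictionaries of position to base, the marker table contains only the two L3 positions mentioned above, the function names are my own, and the entry at position 2000 is a made-up filler rather than a real RSRS base call. The third L3 marker at 16311 is left out because, as footnote 2 explains, it later changed back in my line.

```python
# Toy marker table built only from the positions mentioned in this post:
# position -> (RSRS base, derived base). A real classifier would use the
# full published tree, which is not reproduced here.
L3_MARKERS = {769: ("A", "G"), 1018: ("A", "G")}


def differences(reference, sample):
    """List (position, reference_base, sample_base) where the bases differ."""
    return [
        (pos, ref_base, sample[pos])
        for pos, ref_base in sorted(reference.items())
        if pos in sample and sample[pos] != ref_base
    ]


def carries_markers(sample, markers):
    """True if the sample has the derived base at every marker position."""
    return all(sample.get(pos) == derived for pos, (_, derived) in markers.items())


# Hypothetical fragments: base calls at a handful of positions only.
# The value at 2000 is an invented filler, not a real RSRS call.
rsrs_fragment = {769: "A", 1018: "A", 2000: "T"}
my_fragment = {769: "G", 1018: "G", 2000: "T"}

print(differences(rsrs_fragment, my_fragment))
print(carries_markers(my_fragment, L3_MARKERS))
```

Walking further down the tree repeats the same check with the marker set for each successive branch.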

My matrilineal ancestors likely migrated from the Horn of Africa about 60,000-70,000 years ago via the Arabian peninsula and followed a southern coastal route into Asia.3 My haplogroup is a sub-group of H, which indicates that my foremothers were likely among the early farmers who migrated to Europe from Western Asia around 9,000 years ago as part of what is known as the Neolithic revolution.4 One recent study examining mitochondrial DNA from ancient human remains from what is now Germany includes a sample from an individual that shares my H11a haplogroup. This person appears to have been part of the Unetice culture in the Early Bronze Age (2200-1575 BC).5

We live in an amazing era. I can rub a bit of plastic on the inside of my cheek and for a couple hundred dollars I can know the full sequence of my mitochondrial DNA. This information can situate me in the context of research being done by scientists all over the world to examine modern and ancient mtDNA, and allows me to trace the broad outlines of my foremothers' journey from Africa to Asia to Europe. There are still huge gaps to be filled in between Bronze Age Europe and early 19th century Tennessee, which is as far as I've gotten on the paper trail for my matrilineal line. This is also just one branch of my family tree. I'm looking forward to doing similar analysis of mtDNA from other relatives, which will reveal the deep history of the matrilineal lines of other ancestors who didn't pass me their mitochondria, but were equally essential to the sequence of events that led to me.

What do you know about your matrilineal history?

1Behar, Doron M. et al., "A 'Copernican' Reassessment of the Human Mitochondrial DNA Tree from its Root," American Journal of Human Genetics 90, no. 4 (2012):675-684, doi:10.1016/j.ajhg.2012.03.002.

2There is also a third defining mutation for L3 at position 16311 with a change from C to T; in my case that position later changed back to C, which is part of what indicates that I'm part of the group H11.

3Soares, Pedro et al., "The Archaeogenetics of Europe," Current Biology 20, no. 4 (2010):R174-R183, doi:10.1016/j.cub.2009.11.054.

4Fu, Qiaomei et al., "Complete Mitochondrial Genomes Reveal Neolithic Expansion into Europe," PLOS ONE 7, no. 3 (2012):e32473, doi:10.1371/journal.pone.0032473.

5Brotherton, Paul et al., "Neolithic mitochondrial haplogroup H genomes and the genetic origins of Europeans," Nature Communications 4 (2013):1764, doi:10.1038/ncomms2656.

Why Genealogy? Why Now?

Launching a practice as a professional genealogist specializing in genetic genealogy might seem to be a big departure from other work that I’ve done. People have asked me why I’m launching Borgerson Research. Here are three reasons:

The first reason feels a bit selfish: I am insatiably curious. Making my living by helping people find their stories, always getting to learn more about history and science along the way, is just about the best life that I can imagine.

The second reason I’m doing this is because I think I can really help people solve their mysteries. Though my earlier work has been in other fields (including science, software, and higher education), there have been some common threads through all of my roles: I’m good at making sense of huge amounts of information, including seeing the connections between pieces that seem widely separated. I’ve also got a knack for clearly communicating complex things, especially to folks who are non-specialists.

The third and most important reason I’m doing this is because I think this is an opportunity for me to contribute to something the world needs right now. So many of the systems that we live in seem to be teetering on the edge of massive disruption. I hope that good changes may lie ahead of us, and I fear that some of the changes may be really hard. I don’t think any of the answers are simple. My intention is to help people understand the complexity of how we got here, and the ways in which both our pasts and our futures are intertwined. Every decision that each of us makes impacts other people, and I think that the more clearly we understand those impacts, the better choices we can make.

I’m on a journey to understand more about how I’ve ended up where I am today. I’d like to help other people on their own versions of that journey. I invite you to join me.

Hello world!

And so it begins. Watch this space for more on genealogy, (epi)genetics, social justice, trains, and whatever else I decide to write about.