TMRCAs < was: Block Tree Question

Hi folks,

The exact details of FTDNA's TMRCA calculations are not public. There's supposed to be a White Paper on it, but they haven't got around to writing it yet. Some parts of it are based on the article I wrote following my meeting with them in 2020. If you want all the gory details of that, they're here:
https://www.mdpi.com/2073-4425/12/6/862
I'll try to give a beginner's guide here.

SNPs are actually the only form of variant that FTDNA tracks, so their non-matching variants list is really a list of the SNPs that differ between two people.

A SNP is simply an error in copying - the accidental replacement of one molecule with another in the genetic sequence. We don't know if this is truly random at the quantum mechanical level, but it is close enough to random that we can treat the formation of SNPs as a random process.

Many people struggle with the concept of random processes. There are many similar processes in nature. A good example for our purposes is the number of weeds growing in a square inch of garden: if you till a garden bare and wait, weeds will grow and increase in number over time. Similarly, we can take an ancestor's Y chromosome and see how it accumulates SNPs as it is passed down to different lineages.

If you have an idea how fast weeds grow in your part of the world, you can tell by looking at the garden and the number of weeds in it approximately how long it's been since the soil was tilled. Let's say it normally takes two days for the first weeds to germinate, and you might have an average of seven weeds per square inch after a fortnight. Similarly, you can tell how long ago two people's most recent common ancestor lived (the time to most recent common ancestor, or TMRCA) by looking at the number of SNPs that differ between their Y chromosomes - provided you know how often SNPs form. We know that a SNP happens in any given molecule (base pair) of DNA about once every 1.25 billion years, on average.

However, the number of weeds you actually get depends on the size of your garden. The more area your garden has, the more weeds will grow. Similarly, the number of SNPs you can expect depends on the size of your Y chromosome test. This varies slightly from test to test, and with how low a quality of SNP you are prepared to accept. However, a good rule of thumb is that there are about 10 million base pairs in an old BigY-500 test and about 15 million base pairs in a modern BigY-700 test. This means a BigY-500 test gets a new SNP every 125 years or so, and a BigY-700 test gets a new SNP every 83 years or so. The comparison between every pair of tests is slightly different, as no two tests cover exactly the same parts of the Y chromosome.
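
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule of thumb above. The coverage figures are the round numbers quoted in this post, not exact test specifications, and the names `COVERAGE_BP` and `years_per_snp` are made up for illustration.

```python
# Rule-of-thumb figure from the post: a SNP forms in any given base pair
# about once every 1.25 billion years, on average.
YEARS_PER_SNP_PER_BASE_PAIR = 1.25e9

# Approximate coverage in base pairs (round numbers, assumed for illustration).
COVERAGE_BP = {
    "BigY-500": 10_000_000,
    "BigY-700": 15_000_000,
}


def years_per_snp(coverage_bp):
    """Average number of years between new SNPs for a test of this size."""
    return YEARS_PER_SNP_PER_BASE_PAIR / coverage_bp


for test_name, coverage in COVERAGE_BP.items():
    print(f"{test_name}: one new SNP roughly every {years_per_snp(coverage):.0f} years")
```

Doubling the coverage halves the waiting time between SNPs, which is the "bigger garden" effect.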

Also, if you have a flowerpot that's only a square inch, you might not get seven weeds after a fortnight. You might have only three or you might have twelve, it just depends on how many seeds land in your tiny flowerpot. Similarly, if you are looking at two BigY tests related 290 years ago, you might expect them to have seven non-matching variants between them (290 / 83 * 2 = 7, the factor of two being there because each of the two lineages has been picking up its own SNPs for those 290 years), but they might actually only have three or twelve. It really depends on luck. This means that there's no one-to-one correspondence between the number of non-matching variants between two people and how long ago they are related. All we can do is compare them to the statistical average we expect, and establish some range of time within which the real date is likely to fall (the confidence interval). (The mathematics of how to establish confidence intervals is governed by Poisson statistics - a branch of mathematics that was famously applied to another random process: the number of Prussian soldiers kicked to death by their own horses.)
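
As a rough illustration of that spread, the sketch below uses the Poisson formula to ask how likely each count of non-matching variants is when the average is seven. It only runs the calculation forwards (from a known TMRCA to a range of counts), it assumes the round figure of 83 years per SNP, and it is not FTDNA's method. The function names are hypothetical.

```python
import math


def expected_variants(tmrca_years, years_per_snp=83.0):
    """Average number of non-matching variants between two tests.

    The factor of two is there because both lineages accumulate SNPs.
    """
    return 2 * tmrca_years / years_per_snp


def poisson_pmf(k, mean):
    """Probability of seeing exactly k events when the average is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def likely_range(mean, coverage=0.95):
    """Range of counts that leaves at most (1 - coverage) / 2 in each tail."""
    tail = (1 - coverage) / 2
    cumulative = 0.0
    low = None
    k = 0
    while True:
        cumulative += poisson_pmf(k, mean)
        if low is None and cumulative > tail:
            low = k
        if cumulative >= 1 - tail:
            return low, k
        k += 1


mean = expected_variants(290)
low, high = likely_range(mean)
print(f"Expected variants: {mean:.1f}")
print(f"About 95% of pairs should show between {low} and {high} variants")
for k in (3, 7, 12):
    print(f"Chance of exactly {k} variants: {poisson_pmf(k, mean):.1%}")
```

Turning this around, to go from an observed count to a range of dates, takes a bit more work, but the width of the spread is the important point.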

This gives us an inexact mapping between number of non-matching variants and actual TMRCA, which is why exact TMRCAs are impossible to calculate. How can we improve the accuracy of our TMRCAs? We somehow have to increase the size of our garden. If you take a different square-inch flowerpot, it will have a different number of weeds. If you take many square-inch flowerpots that were tilled a fortnight ago, you would eventually get an average that settles down to our expected seven weeds in each.

We have two options when it comes to genetic tests. We can take bigger and better tests - this may be possible in the future, but we are close to running out of Y chromosome to test! The other option is to look at more testers. If you take a man who has two sons, each son's family will accrue an independent set of new SNPs, each of which can provide an estimate of the TMRCA. If you can add in a third son's family, then you have three independent sets of SNPs, and their average number (multiplied by 83 years per SNP) should give you a smaller confidence interval that is closer to the real TMRCA. Adding a fourth son would improve things further, etc., but there is a law of diminishing returns - the difference between 99 and 100 sons is very small!
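
Here is a small sketch of the averaging idea, again assuming 83 years per SNP and fully independent lineages. The SNP counts in the example are invented, and the uncertainty formula is just the standard Poisson result (the relative scatter shrinks as one over the square root of the total number of SNPs expected), not anything specific to FTDNA.

```python
import math

YEARS_PER_SNP = 83.0  # rule-of-thumb figure for a BigY-700 test


def tmrca_estimate(snp_counts):
    """Average the SNP counts of independent lineages and convert to years."""
    return sum(snp_counts) / len(snp_counts) * YEARS_PER_SNP


def relative_uncertainty(tmrca_years, n_lineages):
    """Approximate fractional scatter (one standard deviation) of the estimate."""
    expected_total_snps = n_lineages * tmrca_years / YEARS_PER_SNP
    return 1 / math.sqrt(expected_total_snps)


# Hypothetical counts of new SNPs in three sons' lines.
print(f"Estimated TMRCA: {tmrca_estimate([3, 4, 5]):.0f} years")

for n in (2, 3, 4, 99, 100):
    print(f"{n:3d} lineages: scatter of about {relative_uncertainty(290, n):.0%}")
```

Going from two lineages to four makes a visible difference; going from 99 to 100 barely changes the answer.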

Not even the most prolific father has 100 sons, all of whom have descendants today. However, if you run out of sons, then two grandsons from the same son have *almost* independent sets of SNPs, so they are *almost* as good at improving the TMRCA. Testing your brother, on the other hand, isn't effective, since your set of SNPs and your brother's aren't independent of each other - you will share most or all of them. This is the reason I would not suggest encouraging brothers or close cousins to take a BigY test if you have already done so, unless there is a very specific reason for doing so. Normally, there's no benefit from testing anyone closer than about fourth cousins and, if you do want to test additional family to improve your TMRCA, then test the most distant cousin you can find, as their SNPs will be the most independent of yours.

Hopefully this gives you some insight into TMRCAs and how they can be improved. I should point out that FTDNA's TMRCAs use Y-STRs as well as Y-SNPs, though these obviously aren't reported in your non-matching variants lists. It's also possible to include other, larger mutations in DNA (e.g. MNPs, insertions, deletions, etc.), but these are unreliably reported by BigY in many cases. My paper sets out the basic principles behind how each of these can be used.

Best wishes,

Iain.