Re: BigY - is it worth it?


 


Thank you for the interesting clarification, which explains the thinking of the STR years.

Is there any theoretical and/or empirical reason to suppose that the location and combination of these elements may not be entirely random, but may behave in accordance with the observations and deductions of Thomas Hunt Morgan (the man whose name is best known from the term centimorgan) that recombinations do have some pattern to them?

Richard


From: [email protected] <[email protected]> On Behalf Of Iain via groups.io
Sent: Thursday, April 25, 2024 3:15 AM
To: [email protected]
Subject: Re: [R1b-U106] BigY - is it worth it?


Hi folks,


Mike's comments are spot on, but I thought I'd try to be a bit more pedantic *ahem* precise in the reasoning.


The reason people appear and disappear from match lists at different levels comes down mostly to random chance. I mentioned the Time Estimate table on the Y-STR results: you can see from this that the nominal cutoff for individual matches is a little over 1000 years in each case.

Now we know that the number of mutations a family builds up over 1000 years on each Y-STR is random. Some families will get a mutation, others won't. But if you're only looking at a small number of STRs and a small number of mutations, you'll find some families have no mutations and others have loads. It's only by taking a large number of Y-STRs that this effect averages out, and we get better consistency in the number of mutations a family has accrued. If you have a match close to this boundary, you might find that they have few mutations in the first 37 markers, so they are a match at Y-37, but lots of mutations in the 38-67 marker set, so they don't match at Y-67, but fewer mutations in the 68-111 marker set, so they are a match again at Y-111. This is the main reason that matches can appear at higher levels that weren't at lower levels.
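To put rough numbers on that averaging-out, here is a minimal sketch in Python. It assumes mutations arrive as a Poisson process, and the mutation rate and generation length are hypothetical values picked purely for illustration (they are not FTDNA's figures). For a Poisson count the scatter relative to the mean is 1/sqrt(mean), so it shrinks as more markers are tested.

```python
import math

# Hypothetical values, for illustration only (not FTDNA's figures).
MUTATION_RATE = 0.003  # assumed average mutations per marker per generation
GENERATIONS = 33       # about 1000 years at an assumed 30 years per generation


def expected_mutations(n_markers, rate=MUTATION_RATE, generations=GENERATIONS):
    """Mean number of mutations one lineage accrues across n_markers."""
    return n_markers * rate * generations


def relative_scatter(n_markers, rate=MUTATION_RATE, generations=GENERATIONS):
    """Standard deviation divided by mean for a Poisson count: 1/sqrt(mean)."""
    return 1.0 / math.sqrt(expected_mutations(n_markers, rate, generations))


def prob_no_mutations(n_markers, rate=MUTATION_RATE, generations=GENERATIONS):
    """Poisson probability that a lineage shows no mutations at all."""
    return math.exp(-expected_mutations(n_markers, rate, generations))


if __name__ == "__main__":
    for n in (12, 37, 67, 111):
        print(
            f"Y-{n}: mean {expected_mutations(n):.2f} mutations, "
            f"relative scatter {relative_scatter(n):.2f}, "
            f"P(no mutations) {prob_no_mutations(n):.3f}"
        )
```

Under these assumed numbers the relative scatter falls steadily from Y-12 to Y-111, which is the sense in which a larger marker set gives more consistent mutation counts.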


However, it is also true that the first 37 markers are a bit special. They are not especially "jumpy" or "erratic" in the technical sense, but their unusual combination of mutation rates does cause them to behave differently. Markers 38-111 are a fairly typical set of Y-STR mutations. However, when FTDNA was putting together the first set of 37 Y-STRs, which now seems to have become something of an industry standard, they picked specific STRs for specific reasons.


Those choices were based on how effectively they could be used to group people into genetic families. The first 12 markers are mostly very slow-mutating markers (e.g. DYS426, DYS388, DYS392), which lets people be reliably grouped into major haplogroups (e.g. R-M269). Both the Y-25 and Y-37 upgrades brought in more slow markers, but also added a number of faster ones: e.g., DYS458 and DYS464 in Y-25 and CDY, DYS576 and DYS570 in Y-37. These fast-mutating markers give a much better chance of finding mutations within a closely related family, but are pretty useless at grouping people on older timescales because there are too many convergent and back mutations on them. This wide range of mutation rates means it's both very easy to build up a large genetic distance from someone quickly, and relatively easy for spurious matches with convergent mutations to enter your Y-12, Y-25 and Y-37 match lists.


To give you a sense of scale, you're about 100 times more likely to get a mutation on one of the CDY markers than you are on DYS426. When dealing with a wide range of mutation rates and fast-mutating markers, the relationship between TMRCA and genetic distance quickly breaks down due to hidden and back mutations: the probability of getting a convergent or back mutation in two families over 1000 years is about 66%, and it is most likely to occur on CDY, with each of its two markers mutating roughly every 40 generations.

However, FTDNA generally just takes an average of the mutation rates when dealing with Y-STR markers. If you do this, then you'd predict that a convergent or back mutation only happens about 49% of the time. So, even if FTDNA's Time Estimates accounted for convergent and back mutations, and even if they accounted for the growth in the number of cousins with time, this simplification to an average mutation rate would cause them to underestimate the full range of possible TMRCAs.
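The direction of that effect can be checked with a small Python sketch. This is not the calculation behind the 66% and 49% figures above, and it will not reproduce them. It uses a cruder proxy: the chance that two independent lineages both pick up at least one mutation on the same marker, which is a precondition for a convergent mutation. The two CDY rates (one mutation per 40 generations) and the slowest rate (100 times lower) follow the figures quoted above; every other rate, the marker counts per rate, and the generation length are hypothetical.

```python
import math

GENERATIONS = 33  # assumed: about 1000 years at 30 years per generation

# Per-generation mutation rates for an illustrative 37-marker panel.
# 0.025 (two CDY markers) and 0.00025 (100x slower) follow the text;
# the remaining rates and the counts per rate are hypothetical.
RATES = [0.00025] * 10 + [0.002] * 22 + [0.008] * 3 + [0.025] * 2


def prob_shared_marker_hit(rates, generations=GENERATIONS):
    """P(at least one marker has mutated in BOTH of two independent lineages)."""
    prob_none = 1.0
    for rate in rates:
        # Poisson chance that one lineage has mutated on this marker.
        p_hit = 1.0 - math.exp(-rate * generations)
        # Both lineages must be hit for the marker to count.
        prob_none *= 1.0 - p_hit ** 2
    return 1.0 - prob_none


mean_rate = sum(RATES) / len(RATES)
mixed = prob_shared_marker_hit(RATES)
averaged = prob_shared_marker_hit([mean_rate] * len(RATES))

if __name__ == "__main__":
    print(f"mixed rates:   {mixed:.3f}")
    print(f"averaged rate: {averaged:.3f}")
```

With any uneven set of rates, the mixed-rate probability comes out higher than the averaged-rate one, because the fast markers contribute far more than their share. That matches the direction of the argument above, even though the absolute numbers depend entirely on the assumed rates.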


Cheers,


Iain.
