#8679

A new paper presenting "UYSD: a novel data repository accessible via public website for worldwide population frequencies of Y-SNP haplogroups":

Debbie Kennett

#8688

I cannot see that anyone else has commented on this paper below.

The author list does include Mark Jobling and Maarten Larmuseau among its many co-authors, but the paper seems to be fronted principally by researchers in Estonia.

There are Supplements to this paper which need to be downloaded as well. But my first impression is that this work will fall far short of the work and results now available at FTDNA.

Whether these types of academic groups would have thought about approaching FTDNA to access their Y-DNA data pool is an interesting ethical question in 2025. Perhaps we should invite Prof. Mark Jobling along to the Guild of One-Name Studies 1-Day Seminar on DNA in October 2025 at Oadby in Leicestershire to help in the discussions of these types of issues for the future.

Brian

From: Debbie via groups.io
Sent: 08 May 2025 20:56
Subject: [R1b-U106] The new universal Y-SNP database (UYSD)

#8689

I read through all of the restrictions on getting results added to this system. If a dataset is not peer reviewed and published, with all relevant data-privacy sign-offs in place, it won't be added. Overall, this system has very little use or value to the current community, as the data we generate cannot be merged into it.

-Wayne

On Wednesday, May 14, 2025 at 06:50:36 AM EDT, Brian Swann via groups.io <brian_swann@...> wrote:

#8690

In terms of creating an improved phylogenetic tree, this database does not greatly help us. What it does do, however, is provide a much finer phylogenetic resolution for population-scale studies. This data should be considerably less biased than our American-oriented sample is at present. While its geographical scale appears limited to country-level statistics, it should allow us to compare more easily with country-level data on population frequencies.

For example, if we want to determine the fraction of R-U106 in a country, the best place I have found to do so is currently the Wikipedia page for R-M269, which mostly references Myres et al. (2007). It would be very helpful to have a more up-to-date compendium of relative frequency from published data.

Another example: if we want to determine the fraction of a smaller haplogroup in a country, the best place to do so is currently Family Tree DNA's haplotree, by dividing the number in our target sub-clade ("R-XXX") by the total number of testers in that country (i.e. the number in A-PR2921). This is problematic for a lot of reasons, mostly to do with testing depth. While the UYSD will not have the numbers that FTDNA has, so won't be able to go down to the fine-scale haplogroups, we can hope to get a better idea of, e.g., the fraction of R-Z156, R-Z9, R-L47 or R-Z18 in individual countries than we have now.
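A rough sketch of the division described above, in Python. The counts are invented purely for illustration and are not real FTDNA figures; the function name is likewise hypothetical. It only shows the arithmetic and does nothing to correct for the testing-depth problem.

```python
def clade_fraction(clade_count: int, country_total: int) -> float:
    """Share of a country's testers that fall in one sub-clade."""
    if country_total <= 0:
        raise ValueError("country_total must be positive")
    if not 0 <= clade_count <= country_total:
        raise ValueError("clade_count must lie between 0 and country_total")
    return clade_count / country_total


# Hypothetical counts for a single country (NOT real data).
country_total = 20_000  # all testers in that country, i.e. the A-PR2921 count
clade_counts = {"R-Z156": 300, "R-Z18": 120}

fractions = {
    name: clade_fraction(count, country_total)
    for name, count in clade_counts.items()
}

for name, share in fractions.items():
    print(f"{name}: {share:.2%}")
```

With these made-up numbers the two clades come out at 1.5% and 0.6% of the country's testers.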

That population frequency data is vital when it comes to examining the origins and spread of haplogroups, as we can unpick origins from modern distributions with our historical knowledge of migrations.

Cheers,

Iain.

(P.S. - speaking of origins, I'm still working on my phylogeography document. I'm about halfway through R-DF98, but I still have R-S18823 to go.)

#8691

Quoted excerpts from

"Since the samples in this study were analyzed using different DNA technologies, each examining varying numbers of Y-SNPs and classifying Y-haplogroups at different levels of resolution, we inferred a reduced phylogenetic tree to enable comparative analysis between the population samples. To qualify for inclusion in this tree, each Y-SNP had to be typed in at least 90% of the samples. Additionally, each included Y-SNP was required to have a frequency of at least 5% in one of the analyzed populations. A total of 188 Y-SNPs met these criteria and were, therefore, included in comparative analyses between the 27 study populations."
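To make the two inclusion criteria in that excerpt concrete, here is a minimal Python sketch. It is not the paper's code: the data layout, the function name `select_snps` and the SNP and population labels are all assumptions made for illustration. It keeps a SNP only if it was typed in at least 90% of samples and reaches at least 5% frequency in at least one population.

```python
from typing import Dict, List


def select_snps(
    typed_fraction: Dict[str, float],
    population_freq: Dict[str, Dict[str, float]],
    min_typed: float = 0.90,
    min_freq: float = 0.05,
) -> List[str]:
    """Return SNPs passing both thresholds, sorted by name."""
    selected = []
    for snp, typed in typed_fraction.items():
        freqs = population_freq.get(snp, {})
        # True if any single population reaches the frequency threshold.
        reaches_min_freq = any(f >= min_freq for f in freqs.values())
        if typed >= min_typed and reaches_min_freq:
            selected.append(snp)
    return sorted(selected)


# Hypothetical inputs (NOT taken from the paper).
typed_fraction = {"SNP_A": 0.97, "SNP_B": 0.95, "SNP_C": 0.60}
population_freq = {
    "SNP_A": {"pop1": 0.12, "pop2": 0.01},  # well typed, common in pop1
    "SNP_B": {"pop1": 0.02, "pop2": 0.03},  # well typed, but rare everywhere
    "SNP_C": {"pop1": 0.30, "pop2": 0.25},  # common, but typed in too few samples
}

kept = select_snps(typed_fraction, population_freq)
print(kept)
```

In this toy input only SNP_A survives: SNP_B is the "minor clade" case that never reaches 5% anywhere, and SNP_C fails on typing coverage.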

For those (such as I) belonging to "minor clades", which are often less than 5% of the population, I don't see any benefit whatsoever in using this database. Better results would be achieved by submitting data to YFull (especially for novel Y-SNP discovery & tree building), which is at least being referenced as the primary base resource by this database.

https://ysnp.erasmusmc.nl/haplogroup_tree

"There are several phylogenetic trees in use. In addition to the tree developed by YFull (utilized here), a commonly referenced tree is maintained by the International Society of Genetic Genealogy (ISOGG). However, ISOGG’s Y-DNA Haplogroup Tree has not been updated since 2020 and relied heavily on manual updating, which can introduce errors. Another option is the Y-DNA Haplotree developed by FamilyTreeDNA. While extensive, it lacks traceability—there are no version numbers, and changes are not publicly documented. In contrast, YFull’s YTree offers regular updates and comprehensive traceability, making it the preferred foundation for UYSD, at present. A disadvantage of the approach is that when data is stored in UYSD only variations which are incorporated in the underlying phylogenetic tree are stored. Any variants considered by alternative phylogenetic trees, private mutations, or variations that are not yet incorporated in the phylogenetic tree will not be stored. *Consequently, **if** the phylogenetic tree used by UYSD is updated, newly added genetic variants (compared to the previous version) cannot be analyzed for samples that were included under the prior version.*" [asterisks mine]

I think the last sentence underscores my primary objection: this database will always lag behind the "cutting edge" of Y-Tree building by several years, I'd reckon 5 to 10, perhaps more. It's an issue I have with the "Minimal Y-Tree" at https://www.phylotree.org, and this one looks to be only a minor improvement at best.

--

Best regards,

Vince T.

#8692

Hi Vince

I would also have thought that their comment on FamilyTreeDNA’s database is a bit unfair:

“While extensive, it lacks traceability—there are no version numbers, and changes are not publicly documented.”

I’m not sure what they would accept in terms of traceability, unless it means access to the raw data files. It is just about impossible to produce a version number as the Y-DNA Haplotree of Mankind must be updated about weekly, I would guess.

It sort of reminds me of an extended Facebook comment Blaine Bettinger made at the start of the 23andMe saga, when he said academics in Britain refuse to acknowledge anything that the citizen science community has pulled together which might be more useful to them than their own endeavours.

I suspect mtDNA full sequencing is just about to go down this same rabbit hole.

Brian

From: vince@...
Sent: 15 May 2025 02:53
Subject: Re: [R1b-U106] The new universal Y-SNP database (UYSD)

#8693

Brian,

For traceability, think about it in terms of version control. With a documented version there is a set of changes applied to the previous release which individuals can review. It is difficult to reference a "live" construct in any work. With a fixed version number and date of release it is obvious to other individuals what one is referencing and working from.

The lack of discrete versioning indicates a potentially uncontrolled process for editing and publishing results. FTDNA should consider going to regular timed and versioned releases of their tree. Stop pandering to the "I want to see my result on the tree now" focus which is present in the community.
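As an illustration of what a reviewable "set of changes" between two releases could look like, here is a small Python sketch. The representation of a tree as a mapping from haplogroup to parent is an assumption made for the example, as are the `R-X1` to `R-X3` placeholder names; it says nothing about how FTDNA or YFull actually store their trees.

```python
from typing import Dict, List, Optional

# Assumed toy representation: haplogroup name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def diff_trees(old: Tree, new: Tree) -> Dict[str, List[str]]:
    """List haplogroups added, removed, or re-parented between two releases."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    moved = sorted(h for h in set(old) & set(new) if old[h] != new[h])
    return {"added": added, "removed": removed, "moved": moved}


# Two hypothetical releases; R-X1, R-X2 and R-X3 are placeholder names.
release_1: Tree = {"R-U106": None, "R-X1": "R-U106", "R-X2": "R-U106"}
release_2: Tree = {
    "R-U106": None,
    "R-X1": "R-U106",
    "R-X2": "R-X1",   # re-parented in the newer release
    "R-X3": "R-X2",   # newly added branch
}

changes = diff_trees(release_1, release_2)
print(changes)
```

With fixed, dated releases, a change list like this is what someone citing the older release could consult to see whether the newer one affects their work.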

- Wayne

On Thursday, May 15, 2025 at 03:26:47 AM EDT, Brian Swann via groups.io <brian_swann@...> wrote:

#8694

Wayne,

Admittedly, I am not an academic or even a member of the citizen science community. I am simply a customer of FTDNA who, like many others, took a test to see where my place is in the greater world of YDNA. As such, I respect your concern about presenting a product that is usable well beyond the customer base. However, I also think satisfying the customers' needs/desires is not "pandering" and ought to remain a priority while also trying to work with the larger academic/scientific community.

Ed

#8695

If version control is an issue (and I understand that it would be helpful), then there doesn't seem to be any obstacle to publishing a "version of record" on a roughly annual basis. This could be as easy as posting a copy of the existing JSON file on Zenodo and giving it a DOI. Ideally there would be an interface to it too, but it shouldn't be a major problem.
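A minimal sketch, in Python, of the kind of record that could accompany such a "version of record": a checksum and a few descriptive fields, so that anyone citing the snapshot can verify they have the same bytes. The field names and the function `describe_snapshot` are assumptions for illustration and do not follow Zenodo's actual metadata schema; the payload is a tiny stand-in, not the real tree.

```python
import hashlib
import json


def describe_snapshot(payload: bytes, version: str, snapshot_date: str) -> dict:
    """Build a small metadata record for one frozen copy of the tree file."""
    return {
        "title": "Y-DNA haplotree snapshot",   # illustrative title
        "version": version,
        "snapshot_date": snapshot_date,
        "size_bytes": len(payload),
        # The checksum lets readers confirm their copy matches the cited one.
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


# Stand-in payload; in practice this would be the bytes of the JSON file.
payload = json.dumps({"example": "not the real tree"}).encode("utf-8")

record = describe_snapshot(payload, version="2025", snapshot_date="2025-05-15")
print(json.dumps(record, indent=2))
```

The record would be uploaded alongside the file, and the repository's DOI would then point at both.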

- Iain.

#8696

Ed - with all due respect it would be very hard to make the argument that FTDNA is 'customer-focused' in any comparison or review of their historical performance. We get what we get is more like it, and as a customer/citizen-scientist I continue to expect more than what I've seen in both categories.

Leake

#8697

Hi Leake,

I'd counter that argument - Family Tree DNA have the biggest database and the best tools for Y-DNA research in the business. Those are the primary reasons why our community works so well with them, and why we recommend people to buy their tests. And they are the market leaders specifically because they are customer focussed, providing the tools and support that we need to do our work, and getting problems fixed.

There are aspects where I think they could do better. They could be more responsive to small change requests - currently almost every change they make is part of a major version overhaul, there are no minor updates to services. They could be more pro-active about anticipating their customers' needs, but there our community is their strength and we can tell them what we need - Alex Williamson's Block Tree and my TMRCA estimates have both been modified by them and are now presented to every BigY tester in a way that we couldn't reach ourselves.

There are significant bottlenecks to providing other services. Some of this is due to data volume. BigY tests include a whole lot of data, and processing hundreds of thousands of BigY tests, not to mention the Y-STR, autosomal and mt-DNA tests, is a big challenge if you want additional BigY services. There are also issues with personal information security and legal requirements for privacy that FTDNA have to navigate and which affect us all. There is also the sometimes-difficult task of keeping information simple for new testers while providing the level of technical and scientific detail that we power-users want.

If there are specific changes or improvements that you would like FTDNA to make, then perhaps giving some examples would be a good start. If we can demonstrate that a lot of people want changes to happen, then FTDNA often listens to big groups like us and makes changes in future upgrades if those changes are cost-effective.

Best wishes,

Iain.

#8698

Hi Wayne

This is a fair point – but I suggest it might mean FTDNA have to run two versions of their global Haplotree: one that is updated continuously, and another that is versioned and released at, say, 6-monthly intervals, so that it can be referenced, by academia in particular.

There might be an opportunity to do something about this. The Guild of One-Name Studies is holding a One-Day Symposium on DNA on the 18th October 2025 at Oadby, close to Leicester in the UK.

I have been asked to speak in the afternoon session of this programme, which is not yet completely finalised. But the last session will be a Q&A on the future of genetic genealogy.

I have suggested to the organisers that it might be useful to have David Vance sit in on at least that part of the programme. Realistically, there is no restriction on any DNA testing company, its representatives or indeed anyone on this email list attending this one-day event virtually, apart from paying the Guild's non-member fee to attend.

With the new publication of the 2nd Edition of the ‘DNA: A Guide for Family Historians’ book by the Strathclyde group, it will be a good time to have this sort of discussion.

The last chapter of this book (Chapter 10, What Does the Future Hold), authored by Iain Macdonald, Michelle Leonard, John Cleary and Graham S. Holton, also asks similar sorts of questions and could be a useful item to read before this meeting.

Brian

From: Wayne via groups.io
Sent: 15 May 2025 11:50
Subject: Re: [R1b-U106] The new universal Y-SNP database (UYSD)

#8699

Thx Iain - I think you state some of the opportunities plainly enough, I don't carry around a wish list any longer... I appreciate your time in the saddle but I'm satisfied it is what it is at this point. Version control, working with citizen scientists, etc. all have more to do with their implied service model vs the actual services we receive as admins, customers or end-users of their site. I'm not a professional reviewer, auditor or rabble-rouser; can only best express my experience in working with them as having been more hopeful than fully satisfied - by any measure.

#8707
Edited

The Haplotree version is now included in Discover.

Under Phylogenetics from the About page.

"The current version of the FamilyTreeDNA Y-DNA Haplotree is 2025.05.18"

The version should be updated weekly. The previous version was 2025.05.11

#8708

Hi, I have paid for three Y700 tests within the family: one for my brother, one for my cousin and one for my mother's youngest cousin, so I could get a great-grandfather result. My brother is on a U106 DF96 line, and whilst we have found out a lot about that line, the archaic matches are very widespread: Hungary, Romania, France, Italy, Bavaria, Denmark and several locations in the U.K. One of the interesting things is that it is found in both La Tene burials and Viking burials, as well as in York with Driffield 3, a supposed gladiator whose autosomal DNA is closest to the Welsh. This is very intriguing and was unexpected for the researchers, who imagined the three Germanic male lines they tested to be Germanic prisoners of war. They were all essentially Celtic Britons, a reminder that Y-DNA isn't necessarily the largest contributing factor in working out our overall ancestry.

If we had lots of matches the line would be easier to track. The fact is we don't, but we MAY have more in the future and that's the best we can hope for. In many ways it's a 'team effort' and I really wish some of the Y67 matches etc. would upgrade. Currently our only living match is connected to us at FGC8372 level in the Iron Age and is a Swede, but they also put that marker in England as well. We could guess forever. FTDNA have put my brother's SNP in England in 150 BC. There was huge flooding in the lowlands and some Frisians potentially crossed at that time. MyTrueAncestry appear to call some of my DNA Lombard or Langobard, and they were originally in Sweden. According to Eupedia, Lombard DNA is virtually indistinguishable from Frisian. None of these DNA results come with guarantees (actual Langobard as a tribal name in history is recorded much later); they are just jigsaw pieces and I am not entirely sure of the picture yet. It seems people were regularly uprooting due to flooding and plague and also tribal warfare; when this happened they didn't travel en masse but had different ideas about which direction to go in.

The British Beaker line results were very easy in comparison, not because it kept still but because a lot of people with Irish ancestry appear to test.

Regards

Linda

#8709

Thanks for the update Martin.

I think, however, that for FTDNA's public Y-Tree to become a cite-able resource, there should be a mechanism for periodical static "snapshot" publishing like the DOI system mentioned by Iain, which records authorship, version, date of publishing, publisher, and so forth.

(ISOGG did this by freezing their Y-Tree at each year-end with a permanent URL.)

Currently the raw JSON data for FTDNA's public Y-Tree can be captured fairly simply by a command script using the "curl" utility; it just takes a while to capture the 105+ MB data-stream.

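e.g. in Windows 10/11 (the %date:/=-% form swaps any slashes in the locale's date for hyphens, since slashes are not allowed in Windows file names, and the quotes cope with any spaces):

curl -o "FTDNA-YTREE_%date:/=-%.json" https://www.familytreedna.com/public/y-dna-haplotree/get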

All that remains is to assemble the metadata for the resource and upload it to the static repository. Ideally the snapshot metadata (author, provider, publisher, version, date, etc.) should be included in the JSON file itself as well. I presume a DOI repository service should be able to do that during upload processing, but it would be better if it was included at the source by FTDNA.

EDIT: I noticed that the JSON file I just retrieved included the key-value pair "publishedDate":"2025-05-23T16:29:12.367" at the end of the file, so we're part-way there!

Aside from the raw JSON file not being particularly easy to view without a rendering program to convert it to a human-navigable tree, the only other caveat with FTDNA's current JSON structure that I can see is that it records surnames associated with haplogroups, which could trigger privacy concerns.
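Since the file carries its own "publishedDate", a snapshot could be named after that value rather than after the download date. A minimal Python sketch follows, assuming "publishedDate" is a top-level key of the JSON object (I have only noted that it appears at the end of the file, so treat that as an assumption). The function name and the tiny sample document are hypothetical.

```python
import json
from datetime import datetime


def snapshot_filename(tree_json: dict, prefix: str = "FTDNA-YTREE") -> str:
    """Derive a dated file name from the tree's own publishedDate field."""
    # Assumes an ISO-style timestamp such as "2025-05-23T16:29:12.367".
    published = datetime.fromisoformat(tree_json["publishedDate"])
    return f"{prefix}_{published:%Y-%m-%d}.json"


# Tiny stand-in for the real 105+ MB file: only the field of interest.
sample = json.loads('{"publishedDate": "2025-05-23T16:29:12.367"}')

name = snapshot_filename(sample)
print(name)
```

For the real download, the same function would be applied to the result of `json.load` on the saved file, and the file renamed accordingly.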

--

Best regards,

Vince T.