on-going work update via an email to a UK researcher


 


Hello Christopher,


Thank you for your response. I appreciate your interest in breaking through brick walls in genealogical research and your understanding of the effort required to do so. Your email has inspired me to add a new section to my Yates one-name-study, where I will clarify the study's aim and declare the known understandings and limitations of autosomal DNA testing. By focusing on my own study and its applicability, I hope to provide meaningful insights that could also benefit others.

I wholeheartedly acknowledge and embrace all the known and unknown limitations of autosomal testing. I would add that some published lines in genealogical research are fictional, while others are posted with ill intent, such as phishing scams targeting amateur family genealogists.

The primary goal of the Yates Cousin Study is to apply data analytics techniques to a specific dataset. I have been exploring the possibility of using an analytical approach on a vast set of DNA matches to uncover any valuable insights. I must clarify that I am not an exceptional research genealogist or a data expert. My background lies in being a retired hospital administrator with a master's in public health. However, I draw inspiration from a consulting project I previously conducted with the Fair Isaac Corporation, renowned for their utilization of "Big Data" and mathematical algorithms to predict consumer behavior, which has transformed entire industries.

The Fair Isaac staff once shared an intriguing finding from a deep dive into "Best Buy" sales data. Through an inexplicable algorithm, they accurately predicted customers' future purchases and provided recommendations that led to a 20% increase in sales the following year. This experience from two decades ago sparked my curiosity—could a similar approach be applied to overcome the brick walls in my own research?

I was uncertain but decided to give it a try. What you have seen so far are the preliminary results of my first-generation analysis. Admittedly, the data is complex and messy. My objective was not to make predictions, but rather to discover connections and correlations within the existing data.

All participants in the testing phase who granted access to their DNA matches have USA-based lineage. Interestingly, 74% of the 375 ancestral lines in the dataset connect to the Yate(s) family of Berkshire, UK. An important question that needs to be addressed by a data analyst is how much data frequency (number of ancestral lines) a dataset must contain to filter out false data interference, given the known limitations of autosomal DNA testing. The answer remains elusive at this stage, but it is certainly more than the 375 lines included in the current index.

The question of frequency defines the aim of the ongoing second-generation work. To address the limitations of consumer autosomal testing, we are working on a Python scripting solution that will allow the automatic generation of unique ancestral lines from any GEDCOM file.
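As a rough illustration of that idea, and not the study's actual script, generating ancestral lines from a GEDCOM file might look something like the sketch below. It assumes a simplified GEDCOM in which each FAM record lists HUSB, WIFE and CHIL pointers. The sample data and the function names (parse_gedcom, ancestral_lines, format_line) are made up for the example.

```python
from collections import defaultdict

# Tiny made-up GEDCOM fragment, for illustration only.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Ann /Yates/
0 @I2@ INDI
1 NAME John /Yates/
0 @I3@ INDI
1 NAME Mary /Smith/
0 @I4@ INDI
1 NAME William /Yates/
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I3@
1 CHIL @I1@
0 @F2@ FAM
1 HUSB @I4@
1 CHIL @I2@
0 TRLR
"""


def parse_gedcom(text):
    """Return (names, parents) built from INDI and FAM records."""
    names = {}
    families = defaultdict(lambda: {"parents": [], "children": []})
    current_id, current_type = None, None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0":
            # Record headers look like "0 @I1@ INDI": the id sits in the tag slot.
            if tag.startswith("@"):
                current_id, current_type = tag, value
            else:
                current_id, current_type = None, None
        elif level == "1" and current_id:
            if current_type == "INDI" and tag == "NAME":
                names[current_id] = value.replace("/", "").strip()
            elif current_type == "FAM" and tag in ("HUSB", "WIFE"):
                families[current_id]["parents"].append(value)
            elif current_type == "FAM" and tag == "CHIL":
                families[current_id]["children"].append(value)
    # Map each child to the parents named in the families it belongs to.
    parents = defaultdict(list)
    for fam in families.values():
        for child in fam["children"]:
            parents[child].extend(fam["parents"])
    return names, parents


def ancestral_lines(person, parents, seen=()):
    """Yield one tuple of ids per line, ending at an ancestor with no recorded parents."""
    if person in seen:
        # A loop in the data (someone listed as their own ancestor): drop this branch.
        return
    path = seen + (person,)
    recorded = parents.get(person, [])
    if not recorded:
        yield path
        return
    for parent in recorded:
        yield from ancestral_lines(parent, parents, path)


def format_line(path, names):
    """Render a line of ids as readable names, falling back to the raw id."""
    return " > ".join(names.get(p, p) for p in path)


names, parents = parse_gedcom(SAMPLE)
lines = [format_line(p, names) for p in ancestral_lines("@I1@", parents)]
for line in lines:
    print(line)
```

Each printed line runs from the starting person up to the earliest ancestor recorded in the file, so a tree with many branches yields one line per branch. Real GEDCOM files have more record types and edge cases than this sketch handles.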

We plan to add one element to the GEDCOM: a single source item, editable by the user, in each applicable record of the genealogy software. This addition will likely be a numerical value representing different DNA testing services, such as 1 for an agnostic DNA match, 2 for an Ancestry DNA match, 3 for a 23andMe DNA match, 4 for an FTDNA DNA match, and so on.
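To show how such a numbering scheme could be used once it is in place, here is a small sketch that groups ancestral lines by their source code. The four codes come from the list above. The class name DnaSource, the function group_by_source and the sample records are assumptions for illustration, and how the code would actually be stored inside the GEDCOM is left open.

```python
from collections import defaultdict
from enum import IntEnum


class DnaSource(IntEnum):
    """Numeric codes for DNA testing services, following the scheme described above."""
    AGNOSTIC = 1
    ANCESTRY = 2
    TWENTY_THREE_AND_ME = 3
    FTDNA = 4


# Hypothetical records: (ancestral line label, source code).
records = [
    ("Line A", 2),
    ("Line B", 4),
    ("Line C", 2),
    ("Line D", 1),
]


def group_by_source(records):
    """Group line labels by testing service, ordered by numeric code."""
    grouped = defaultdict(list)
    for label, code in records:
        # DnaSource(code) raises ValueError for a code outside the scheme.
        grouped[DnaSource(code)].append(label)
    return dict(sorted(grouped.items()))


for source, labels in group_by_source(records).items():
    print(source.value, source.name, labels)
```

Because the codes are plain integers, the same records can also be sorted or filtered in a spreadsheet without any special software.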

Upon successful implementation of this scripting solution, several benefits will be realized. Firstly, processing a complete GEDCOM will take only about 25 seconds, a significant improvement compared to the six months of manual labor previously required. Secondly, a GEDCOM file with the aforementioned numbering scheme will allow all of its contents to be coded and sorted, so the user can cross-reference multiple family lines in whatever ways they can imagine. Furthermore, this approach removes the restriction of relying solely on Ancestry.com matches, as GEDCOMs from various sources can be processed. However, the quality of the research and sourcing within the GEDCOM remains the responsibility of the user: garbage in, garbage out.

Looking ahead, the third generation of our study will involve exploring the concept of Most Distant Ancestors. With numerous descendants, determining the number of authentic lines leading to Most Distant Ancestors becomes a significant question. Additionally, we are considering the creation of an index comprising impeccably researched ancestral family lines to serve as a guidepost for future research.

During my six months of line development, I frequently encountered DNA matches with incomplete family trees, requiring me to reconstruct them to the best of my abilities. As I progressed, it became apparent that I could identify ancillary lines and that the line I was constructing often led to a Most Distant Ancestor already developed by another researcher. If done accurately and based on meticulously researched lines, this approach could assist future DNA matches in considering the correct ancestral line—a spin-off concept similar to Ancestry.com's Hint system.

I hope this provides you with a review of my work and the direction I am taking. If you have any further questions or would like additional information, please do not hesitate to ask.

Best regards,

Ron Yates

Charlotte, North Carolina


--
Ron Yates #9019
Yates name registrant

London, England