Best-guess origins of the major R-U106 haplogroups: methods and results
#origins
PART I. INTRODUCTION

Hi folks,

I've been turning my attention recently to the problem of origins, as you may have noticed in some of my recent posts. I wanted to share my thoughts with the forum here, partly because I don't see many people looking at this problem rigorously elsewhere (hence I hope it's a useful record), and partly to get any input that others might have. Consequently, I've taken the decision to record my musings in this very long post.

This is not some kind of dedicated scientific work. It's just playing around with ideas and numbers, testing out techniques, and seeing what the result is. This shouldn't be considered the be-all and end-all of everything. I'll work on that later!

I hope the following is instructive. This post is incredibly long and starts off very dry. More of you will be interested in its results than its preamble!

To go with this document there's a map:
http://www.jb.man.ac.uk/~mcdonald/genetics/origins-2022.pdf
and the statistics can be found in these spreadsheets:
http://www.jb.man.ac.uk/~mcdonald/genetics/haplogroup_frequency.ods

Cheers,
Iain.

PART II. DATASET BIASES

In this part, I wanted to examine the biases we have in the dataset, what they are, why they're important, and how we can correct them to look at origins. Much of this I've said in different posts before, but I'm bringing it together here in what I hope is a coherent narrative.

1. BIAS IN DATASETS

All datasets inherit bias. This is why survey results are never accurate and the pollsters always get it wrong. A well-constructed census will give a minimally biased dataset, but anything affected by discriminatory factors like user interest, ability to pay, legal boundaries, advertising, etc., can generate a very biased dataset. Deep Y-DNA testing is the equivalent of going into a Boston wine bar - you'll get a very different set of opinions to a truck stop in Texas... and a very different set again to a street market in Oaxaca.

2. BIAS CAUSED BY STRONG USA TESTING

Y-DNA testing is done mostly by Americans. Back in the day when FTDNA admins had access to your home addresses, the vast majority of those I used to see were in the USA, and I've anecdotally heard fractions of 90% Americans in the current database (cf. the USA has ~4% of the world's population). This biases the dataset towards Americans, but more so the specific sectors and families of American society who have the ability and desire to pay for Y-DNA testing. This is an even bigger concern for haplogroups worldwide, and within individual surname projects, but I'll discuss here the specific impact it has on European haplogroups like R-U106.

The origins of the affluent portion of American males are dominated by the British Isles. Of the US population in 1790, about 2/3 were of British descent (including Irish). Testing is also relatively common in Canada, South Africa, Australia and New Zealand, all of which have populations of majority British descent (also Saudi Arabia, but that doesn't feature much in our current story).

In terms of recorded origins, we are also biased towards the date of those arrivals. Scots arriving in the Americas during the Highland Clearances, or Irish arriving after the Potato Famine, are much more likely to be able to trace their histories back to their countries of origin than someone from England whose ancestor arrived in the late 1600s.

These factors together bias us not only strongly towards populations of British descent, but towards the parts of the British Isles that provided recent or notable emigrants.

3. TARGETED TESTING AND FOUNDER EFFECTS

In addition, not everyone tests randomly, but often on the instigation of others. There are two ways this affects our data.

The first is that some families are vastly over-represented because one member of the family has led efforts to mass-test the descendants of historic individuals.
This can make the difference between a family of 1000 men having one person tested and having 100 men tested. This causes huge bias in small clades, and needs to be looked out for. The McMillan family in R-S5520 is an example we've discussed recently; Treece in R-DF98 is another. Between them, they account for ~15% of all the 1309 R-Z156 testers, or ~2.5% of all R-U106. Yet, 1000 years ago, their haplogroups had only two out of the ~10 million R-U106 men at the time.

Targeted testing can also be a less-conscious decision influenced by our initial results. You are more likely to upgrade if a person closely matches you, especially if they have upgraded already. If you have no-one closely matching you, you gain less from upgrading to a sequencing test. Hence, people in the US (or from the mostly British families with branches in the US) are more likely to upgrade than someone from an under-tested branch in eastern Europe.

These forms of targeted testing are artificial founder effects. To recover the origins of haplogroups, and to understand our past ancestors, we don't really care about the modern distribution of a haplogroup, but what distribution that haplogroup had in the past. For example, to recover the origin of your surname, you don't care how 100 testers with it are distributed in the USA, but where their single ancestor came from in Europe. Equally, to recover the origin of R-L151, we don't care about how its R-U106 descendants are distributed in Europe, but where R-U106 itself came from. In other words, we need to calibrate out the bias caused by these artificial and real founder effects when deriving the origins of haplogroups.

4. THE EFFECT OF BIAS ON TESTER DISTRIBUTION

What we want to do is correct for the bias in the dataset. To do that, we need to make some model of what an unbiased dataset should look like, and make a correction appropriately.
The obvious model for an unbiased dataset is one where the population is equally tested: one test is conducted at random for every X men in that country. Consequently, we can get an estimate of the bias on a per-country basis by taking half the total population of a country and dividing it by the number of men from that country who have tested, to give X (the number of men per test). Comparing the relative values of X gives us a rough estimate of the relative bias in our dataset between countries, and we can multiply the number of testers by X to estimate the total number of men in that haplogroup in that country. All we need is a count of the number of testers and the population of that country.

A count of the number of testers per country is easy. We can take FTDNA's public haplotree as a representation - partly because FTDNA provide the majority of deep Y-DNA tests, and partly because it gives us precisely this country-by-country breakdown for each haplogroup. Unfortunately, it also contains a lot of other samples from SNP-pack testers and external projects that don't go very deeply, meaning people get stuck at different levels: R-U106 comprises R-Z2265 (15950 testers) and the minor, basal clades R-S19589 (20), R-BY95662 (4) and R-Y138795 (5). Yet the total number of R-U106 is 17343, because about 1350 men have been tested, but only down to the level of R-U106. We can use differences like the R-Z2265/R-U106 ratio per country to explore the depth of testing in different locations.

A count of the population of that country is more difficult. FTDNA's haplotree is assigned by modern country, yet most people's origins are stated at some point in the past. These points in time are different, countries at those times were different, and the populations of those countries are often not reliably available at those points in history. The next section discusses historical population statistics and the biases we get there - it turns out things could be even worse.
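
For anyone who wants to play with the numbers themselves, the men-per-test estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the (population, testers) figures are made up for the example and are not taken from the haplotree or from my spreadsheets.

```python
def men_per_test(population: float, testers: int) -> float:
    """Estimate X, the number of men per test: half the population / testers."""
    if testers <= 0:
        raise ValueError("need at least one tester")
    return (population / 2) / testers


def relative_bias(counts: dict, reference: str) -> dict:
    """Normalise each region's men-per-test to that of a reference region."""
    x = {name: men_per_test(pop, n) for name, (pop, n) in counts.items()}
    return {name: value / x[reference] for name, value in x.items()}


def estimated_men(haplogroup_testers: int, x: float) -> float:
    """Scale a tester count in a haplogroup up to an estimated number of men."""
    return haplogroup_testers * x


# Hypothetical (total population, number of testers) pairs, NOT real data.
example = {
    "Region A": (10_000_000, 5_000),
    "Region B": (40_000_000, 2_000),
}

bias = relative_bias(example, reference="Region A")
for region, value in bias.items():
    print(f"{region}: {value:.3f}")
```

Larger numbers mean a region is more poorly tested relative to the reference region, which is the same convention used in the lists of biases in this post.
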
What we can do for now is estimate the relative bias using modern population statistics instead.

5. HOW BIASED IS OUR TESTING POPULATION?

If we take modern country populations as gospel (bearing in mind the above inadequacies of this approach), we can see some notable biases between countries. Normalising the number of men per test to 1 for the British Isles as a whole, the current haplotree shows the following biases (countries with fewer than 75 testers are removed; a population-weighted average for each region is given at its end; larger numbers are bad):

England          2.225
Scotland         0.390
Wales            1.552
N.I.             0.814
Ireland          0.252
Brit. Isles      1.000

France          11.434
Germany          5.312
Switzerland      2.984
Belgium         14.788
Netherlands      8.061
NW Europe        6.867

Poland           6.577
Czech            7.234
Slovakia         7.382
Austria          8.681
NC Europe        6.982

Denmark          4.680
Norway           1.566
Sweden           1.478
Finland          0.815
Iceland          1.453
Scandinavia      1.467

Russia          15.797
Estonia          5.869
Latvia           5.406
Lithuania        1.812
Belarus          7.077
Ukraine         15.048
Moldova         19.324
E Europe        13.200

Albania         10.669
Bosnia          10.136
Bulgaria         9.341
Croatia         11.070
Hungary          5.631
Macedonia       12.202
Montenegro       3.617
Romania         22.361
Serbia          18.724
Slovenia         7.841
Turkey          49.740
SE Europe       20.149

Greece           8.211
Italy           10.506
Portugal         6.413
Spain           10.943
Mediterranean    9.866

Europe           5.101

World           19.589

6. DISCUSSION

As we have seen, these numbers are only an approximation of the true bias. However, they show some stark inequalities in testing. For example, the apparent over-testing of Scotland and Ireland compared to England is considerable, and demonstrates both the rapid population rise in England since 1750 and the aforementioned comparative ease with which Scots and Irish families in America can trace their origins.

It also shows that Scandinavia is just about the only place that approaches parity with the British Isles in terms of testing depth. Comparisons between Scandinavia and the British Isles as a whole have only a slight bias, except for Denmark, although it's worth noting that local-level biases are still sufficiently strong to be concerning (e.g. if tracing Viking populations in Scotland and Norway).

In north-western continental Europe, where we need the most resolution for all R-L151 clades, we do fairly woefully. France's legal situation means we lack testers there, but Germany and the Low Countries are also comparatively poorly tested by a factor of a few. In north-central Europe, where many R-U106 clades likely spread from, we do similarly badly, and the problems get worse as we go further south and east.

Looking at this another way, we may have about 4 million R-U106 men in the British Isles out of the 35 million in Europe, or 11%. But 4351 of the 7477 European testers (58%) are in the British Isles.

If we envisage a haplogroup evenly spread across northern Europe, from Ireland and Iceland to Russia, as far south as the Alps: 12.8% of your haplogroup will be British, but 60% of the testers in that haplogroup will be British. A typical haplogroup of 32 testers would comprise four English, two Scottish, two Irish, two other ambiguous British results, two Germans, a Dutchman, a Swede, and 18 who don't know or state where they're from.
This would completely mask, for example, that a seventh of the haplogroup was in France and over a third was in the former Soviet states, because no-one there has tested.

It takes roughly 100 testers to make useful observations about the regional distribution of a haplogroup in Europe, and roughly 400 to make useful observations on a country-level basis. Any observations about haplogroups smaller than that are essentially guesswork based on what we do or don't see (of course, that number can be much less if you happen to be in a haplogroup that isn't strongly present in the British Isles).

This problem gets worse the further down the haplotree you dig. We can compare the ratio of people who are R-M269 to those who are part of its sub-clades, R-L23 and R-PF7562. In a world where everyone on the haplotree takes a BigY test (instead of, e.g., a SNP pack or individual SNPs), these numbers should be the same. Instead, we find in the British Isles and Scandinavia that 90.5% and 90.3% of people on the haplotree have tested below R-M269. This percentage decreases to 87.2% in north-western Europe, 85.3% in central Europe, and reaches a nadir of 72.2% in Russia.

Equally, we can compare the ratio R-Z2265/R-U106. In a fully-BigY-tested world, we should find that 99.8% of R-U106 are in R-Z2265. Instead, we find 92.0% in the British Isles and 92.9% in Scandinavia, decreasing to 88-89% across northern Europe, 85% in Russia, 81% in south-eastern Europe, and 80% in the Mediterranean countries.

Consequently, there's an additional multiplier of ~1.1 to ~1.3 in the biases that we have to account for in terms of testing depth. And this still doesn't account for the differences due to people who have taken the full range of SNP testing packs - only those that have stopped on the highest levels.

So there is a huge range of biases in the dataset. Some of these we can account for, some of them we can't.
These frustrate efforts to establish the actual modern distribution of testers in a haplogroup, from which we can try to derive its origins. Instead, we have to look at the relative distribution of testers in a haplogroup, which is what I want to devote later sections to.

One final note: these country-level biases seem to be getting worse. Over the period from November 2019 to today (2.15 years), the British R-U106 testing population grew by 26%, while that of north-western Europe grew by only 15% (Belgium by only 1.5%). Elsewhere in Europe, increases are roughly between 16% and 25%. The anomaly has been the Czech Republic, which doubled during that time. Overall, Europe grew by 22%, while globally the haplotree grew by 28%. That means that the British testing bias is getting worse and a smaller fraction of the testers are listing their most-distant known ancestor's country of origin. This is fairly bleak news for fixing the biases in our data.

Alternatively, if you want to look at things in a more positive light, the doubling time for the haplotree is roughly six years. That means the average person will need to wait six years before randomly happening across a new match that is more closely related to them than their current nearest random match.

PART III: IMPROVING BIASES USING HISTORICAL POPULATION ESTIMATES

1. METHODOLOGY

Remember: if we consider all the testers we have, we don't sample the populations we have equally, nor do we necessarily record the countries we live in now, but the countries our ancestors came from. We can therefore get a measure of how much we over- or under-sample a population by looking at what fraction of the people we test, out of those living at the same time as our most-distant known ancestors, in the same countries.

I roughly extracted dates from the R-U106 project members' stated paternal ancestors (any word comprising only four numbers between 1000 and 2000).
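
A rough extraction of this kind can be sketched as below. This is a minimal illustration rather than the script I actually used: the function name, the regular expression and the example ancestor strings are all assumptions made for the sake of the example.

```python
import re
from statistics import median

# A standalone run of exactly four digits (word boundaries on both sides).
FOUR_DIGITS = re.compile(r"\b\d{4}\b")


def extract_years(ancestor_text: str) -> list:
    """Return four-digit numbers between 1000 and 2000 found in the text."""
    years = [int(token) for token in FOUR_DIGITS.findall(ancestor_text)]
    return [year for year in years if 1000 <= year <= 2000]


# Hypothetical most-distant-known-ancestor entries, NOT real project data.
entries = [
    "Johann Schmidt b. 1742 d. 1801, Hessen",
    "William Smith, c1690, Yorkshire",   # "c1690" is not a standalone number
    "Unknown",
]

all_years = [year for entry in entries for year in extract_years(entry)]
print(all_years, median(all_years))
```

Note that this takes every qualifying number in an entry, so a birth and a death date both count; a more careful treatment would keep only the earliest date per tester.
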
Of the 2650 results, most were within a century or so of 1750. There will be some country-to-country variance that I have not explored, and would only complicate the situation further.

What we would like to do is look at the fraction of men alive in 1750 whose progeny we sample via Y-DNA testing today. However, many borders have changed over the last 200-300 years, and reliable population estimates earlier than 1850 are often hard to find. (I also ignore the fact that many people's MDKAs' places of origin may now be in different countries to those they were in at the time, and the fact that multiple people have sometimes tested who are descended from the same person alive in 1750. We can't easily control for this without detailed analysis of placenames and information provided.)

We can then perform the same calculations as in the last post, computing the number of men per test, and calculate the relative biases in these numbers. We can normalise to the average number of men per test in the British Isles, as reliable population data exist for the British Isles for virtually every decade between 1700 and 1850. The most consistent set of data across Europe is for the year 1800, so I've used this as a default basis. However, I have averaged the biases across any population statistics I can find for other countries between 1700 and 1850 as well.

2. OVERVIEW

An interesting result is that we sample roughly one in 40 of the men living in Ireland in 1700, and roughly one in 70 of those living in Scotland (again, not accounting for multiple testers per man, so the 40 and 70 are realistically lower limits). This compares with one in 190 for England, and one per 111 for the British Isles as a whole. This 111 is what we will use as our benchmark.

In western Europe, testing reaches a nadir at one in 3400 for France. Consequently, the relative bias from France:Britain in 1700 is 30.5:1. This decreases to 16.2:1 by 1850, and averages 23.3:1 over the full period.
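
As a rough illustration of what a ratio like 23.3:1 implies in practice, the sketch below splits a set of testers between two regions that hold equal shares of a haplogroup but are tested at different rates. The function name and the choice of 100 testers are assumptions for the example only.

```python
def expected_split(total_testers: int, bias_ratio: float) -> tuple:
    """Split testers between an over-tested and an under-tested region.

    Assumes the haplogroup is equally common in both regions, and that the
    over-tested region yields `bias_ratio` testers for every one tester
    from the under-tested region.
    """
    over_tested = total_testers * bias_ratio / (1 + bias_ratio)
    under_tested = total_testers - over_tested
    return over_tested, under_tested


# 23.3:1 is the average France:British Isles bias quoted above.
british, french = expected_split(100, 23.3)
print(f"British Isles: {british:.1f}, France: {french:.1f}")
```
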
Consequently, for every historical family in France, we test about 23 in the British Isles. Put another way, a haplogroup equally distributed between France and the UK could expect to have 96 British and Irish testers, and only four Frenchmen. Hence, to get a good estimate of how well-distributed a haplogroup is around Europe, while avoiding too many problems caused by small-number statistics, approximately 100 testers or more are needed.

It's worth bearing in mind that these are for the whole tree, including those people who have not tested deeply. If the biases are corrected for the Z2265+:U106+ ratio in the tree (which should be almost one) then the biases on average get slightly worse. This ratio is included in the list below.

3. RESULTS

The full list of biases for this period, based on the population statistics I can find, is below. These provide the rough correction factors we need to bear in mind when converting the number of people we see on the FTDNA haplotree to the distribution of a haplogroup within Europe. Note that these only apply to the FTDNA haplotree: other labs, ancient DNA and other DNA frequency studies have their own biases, which need accounting for individually.

England          1.529
Scotland         0.560
Wales            1.380
N.I.             2.086
Ireland          0.849
Brit. Isles      1.000

France          24.715
Germany          6.235
Switzerland      3.978
Belgium         17.842
Netherlands      4.907
NW Europe       10.617

Poland           6.225
Czech           12.686
Slovakia        10.717
Austria         12.549
NC Europe        9.732

Denmark          3.752
Norway           1.150
Sweden           1.349
Finland          0.480
Iceland          0.930
Scandinavia      1.140

Russia          17.207
Estonia          8.343
Latvia           6.743
Lithuania        2.118
Belarus          6.804
Ukraine         12.332
Moldova         11.739
E Europe        14.424

Albania          6.816
Bosnia           8.844
Bulgaria        12.304
Croatia         12.164
Hungary          9.279
Kosovo          12.237
Macedonia        8.393
Montenegro       4.064
Romania         35.421
Serbia           8.689
Slovenia         3.736
Turkey          20.414
SE Europe       15.147

Cyprus           6.514
Greece           9.550
Italy           15.281
Malta            3.350
Portugal         9.590
Spain           10.572
Mediterranean   13.498

Europe           5.834

World           10.233

PART IV: DETERMINING HAPLOGROUP ORIGINS FROM MODERN DATA

1. ORIGIN DEDUCED FROM AN AVERAGE POSITION

The geographical location associated with each person's most-distant known ancestor (MDKA) tells us something about the origins of the haplogroups they belong to. However, those origins are hidden from us by the centuries of migration since each haplogroup's foundation.

One tool we can use is that there is a generalised rate at which people have moved around in history. Common rules of thumb are somewhere between 1 km/year and 1 mile/year. Generally, the latter is more useful over short times (mapping the movements of individuals) and the former is more useful over long times (because successive generations double back on themselves). These numbers fall down in the recent past, where mechanised transport has become commonplace, but they work pretty well in the times before people's MDKAs are recorded (typically before 1800 or so).

Let's take this basic rule of thumb and apply it to a 5000-year-old haplogroup like R-U106. Take any single test with a known European ancestor, and you might expect that the most-recent common ancestor (MRCA) for R-U106 lived within 5000 km (3000 miles) of that person's MDKA. This range essentially extends over all of Europe, so any one person can't tell us a lot about where ancient haplogroups like R-U106 formed. Individuals are only useful for historical-era groups.

We can average together multiple people in the same haplogroup to try to find an origin. This is what Rob Spencer's SNP tracker does (http://scaledinnovation.com/gg/snpTracker.html). This should indeed bring us closer to the true origin of ancient haplogroups, but what it actually gives us is the average position of modern testers' known ancestors. In other words, it gives us a picture of where R-U106 was about the 18th century (the epoch of most testers' MDKAs), biased by the testing prevalence in each location.
In other words, if people didn't list their MDKAs but their modern locations, this method would show R-U106 started in the USA.

2. BIASED AVERAGES DON'T EQUAL ORIGIN

First, the dataset needs to be corrected for bias. I outline the bias correction factors (the weighting appropriate to each test) in my last post. Rob makes an attempt to do this in his SNP tracker, but there are some faults in the mathematical details that mean the bias is only partly corrected and new biases are introduced instead: we have discussed this, but currently many haplogroups end up in the wrong place - R-U106 and its sub-clades, for instance, are generally dragged too far west by the mathematics he uses.

Second, we have to recognise that the distribution of R-U106 in the 18th century does not necessarily surround its point of formation. If you enter R-Z2265 into Rob's SNP tracker, then you will see that it ends up somewhere off the Norfolk coast, rather than near the "pinned" location of R-U106 in the Czech Republic, where R-Z2265 probably began.

The reason for this is partly the testing bias: the correct mathematics should put the average position of all European R-U106 men somewhere close to Frisia, where R-U106 reaches a maximum percentage of the population. But we don't think R-U106 began in Frisia either.

If you've been on this forum for a while, you'll remember conversations about R-U106 having origins in Frisia or Doggerland or somewhere like that, because this is where most R-U106 is found. Similarly, R-P312 is found in the highest densities in the British Isles. However, that doesn't mean they started out there: high frequency doesn't necessarily indicate origin.

To understand why we have such high percentages of R-U106 (and especially R-L151) in western Europe, we can consider the fact that successive waves of migration have occurred in Europe from east to west.
The first Holocene migration was haplogroup I, which resettled the British Isles, Scandinavia and inland Europe after the last Ice Age. The next was G2, out of Anatolia with the first farmers. This wave of migration didn't really make it as far as the British Isles or Scandinavia, setting up a population gradient with haplogroup I in the north and west and I+G2 in the south and east: haplogroup I reached its highest frequency in the north-west, but that's not where it came from.

The third wave was our massive R-L151, which effectively took over most of Europe, leaving an effective population high-tide mark around the Atlantic Coast similar to haplogroup I. Subsequent populations have migrated into southern and eastern Europe from elsewhere over the centuries, but have never had the power of the R-L151 wave. Thus, we have a new population gradient, with R-L151 in the west and L151+others in the east. The distribution of R-L151 now tells us where R-L151 was eroded away by 5000 years' worth of incomers, not where it started. The same is true of more recent haplogroups.

3. SEPARATING CONCURRENT FROM LATER MIGRATION

We can get around this problem by looking at the haplotree on a clade-by-clade basis, working up from the present towards older haplogroups, and joining together sub-clades as we go. This is the approach that Hunter Provyn takes in his Phylogeographer site. Yet if we put S1894 (=CTS10465) into this site (https://phylogeographer.com/mygrations/?hg=R1b&clade=R-CTS10465), we see things going wrong again: R-U106 ends up near Frisia and R-S1894 ends up in southern Norway. However, we know from ancient DNA (see next post) that these are both central European in origin.

The reasons Hunter's tool ends up wrong are three-fold. First, he doesn't correct for the country-to-country biases (see above). Secondly, he becomes quickly undersampled with the dataset he uses. Thirdly, even this method doesn't accurately reproduce origin.

To understand why this method fails, we need to understand what I will term convergent migration. Remember that R-U106 occurred as part of an east-west migration, and it was the clades that went west that thrived best. Consequently, we can expect most of the sub-clades of R-U106 (and R-P312 and R-S1194) that still exist today to have been western offshoots of R-U106. In other words, if we take an average of the positions given by R-U106's sub-clades, we will end up with a location west of where R-U106 actually spread from, because most surviving and tested sub-clades of R-U106 migrated west.

This will also be true of later migrations. When the Germanic tribes invaded Britain, many lineages did very well and quickly outstripped their German counterparts in numbers. A larger British haplogroup and a smaller German haplogroup may now exist, even though the haplogroup began in Germany. If several branches of the same family migrated, and if we are missing many of the German testers, we may mis-time this migration as happening earlier than it actually did.

In defence of both Rob's and Hunter's tools, they correctly predict the origins of obvious cases where exceptionally strong concentrations exist, like R-Z326 in Germany and R-A689 in eastern Europe. However, they fail to reproduce the non-trivial cases, like R-U106/R-Z2265 or R-S1894, or indeed many of the steps leading up to R-Z326 and R-A689.

Unfortunately, there isn't an easy fix for this, which is why no-one has done it. Currently, we have to rely on a very manual process that involves a lot of qualitative understanding and not much quantitative calculation. As a result, the origin of a haplogroup is something that can rarely be stated with accuracy, and most people's efforts to do so are either wishful thinking or crystal-ball gazing. There are some modelling efforts that can be done but, while Rob and Hunter's tools are the state-of-the-art, they fall woefully short of what's needed.
We require a lot of effort spent to improve modelling before we have something that's fit for purpose. The addition of ancient DNA and accurate TMRCAs will be key here, as I will show next.

PART V: ANCIENT DATA

1. OVERVIEW

Haplogroups recovered from ancient burials offer us pinpricks of light in a dark history. They are generally less useful than modern DNA results, because degradation of the DNA sample and the difficulty of enrichment mean that they are rarely called for the deepest haplogroup they could be: frequently, we have to make do with "haplogroup R1b" or similar. There are also generally very few of them: while we might have thousands of testers in R-U106, we've still only got barely 100 ancient R-U106 results.

You might therefore wonder what use ancient DNA can be to us, given the number of modern testers. Fundamentally, what these results tell us is that someone with that haplogroup was buried in the area at the time... and they were unlikely to be alone!

However, with thousands of ancient DNA results overall at our disposal, we can also start to pick out where haplogroups *weren't*: for example, the distribution of haplogroups in England today is something like 50% R-P312, 20% R-U106 and 30% others. In 100 ancient English burials from the early Bronze Age, we might therefore expect to find 20 that are R-U106, but we find none. This indicates that R-U106 wasn't in England in the early Bronze Age, or at least not in great numbers. Nor does it appear in the Rhine delta during this period. Any phylogeography site claiming your 4000-year-old haplogroup (e.g. R-Z159) comes from either England or the Rhine delta is therefore not quite right (see previous post)!

2. WHICH ANCIENT Y-DNA IS IMPORTANT?

So, what emphasis should we give ancient DNA results, and how can we use them to solve the migratory problems discussed in the last e-mail?

A key word here is contemporaneity. Remember the population-diffusion rate of 1 km/year I mentioned earlier?
We can also apply this to ancient DNA. Most ancient DNA for R-U106 dates to around 1500 years ago. This narrows the geographical window for the origin of R-U106 down to within about 3500 km (2200 miles) of where these people came from. OK, that doesn't help much for R-U106. If they aren't called for any haplogroups below R-U106, that's not going to help us much more than a modern tester - most ancient Y-DNA is pretty useless.

What are important are the few stand-out cases. If we take only the early Bronze Age results for R-U106, where do they lie? We have a sample (I13025) from the Barbed Wire Beaker culture from south Holland, dated 1000 years after the formation of R-U106, so R-U106 probably formed within 1000 km (600 miles) of that location. Another sample of the same age is I7196 from Prague, so R-U106 probably formed within 1000 km (600 miles) of Prague. Then we have RISE98 in southernmost Sweden, so R-U106 probably formed within 900 km (550 miles) of there. These three circles define a region encompassing southernmost Scandinavia, the Low Countries, north-eastern France, Germany, Austria, the Czech and Slovak Republics and Poland. All pretty likely origins for R-U106 up until recently.

Then along came PNL1, an individual who probably died within a couple of centuries of R-U106's formation. Indeed, there's no reason he couldn't be the founder of U106 itself (he's very unlikely to be, but it's possible). And he was buried in Bohemia. So, using our rule of diffusion, it's likely that R-U106 formed within a couple of hundred km (or miles) of there. This revised circle essentially covers the Czech Republic and the nearest parts of its neighbouring countries.

Hence, you can see the utility of having ancient DNA close to a haplogroup's origin: an ancient DNA sample half of the age of the haplogroup reduces the search radius by a factor of two, and the search area by a factor of four.

3. THE CONTEXT OF ANCIENT DNA

Burials like PNL1 also allow us to contextualise the haplogroup in history. We know the burial is from the Corded Ware Culture, which may well have taken root in Bohemia within PNL1's lifetime. Hence, we can surmise that he and his family are from further east, and that the migration during this portion of the haplotree was biased east-to-west. We can also guess that the migration rate for this family was probably more than the typical 1 km/year (based on Furholt 2014, the Corded Ware Culture advanced west at the rate of about 5 km/year or 3 miles/year). Hence, we may have to expand our search area for the R-U106 origin east by a couple of hundred miles, encompassing modern Slovakia, Hungary, Poland, Romania, Moldova and the western Ukraine.

While this kind of cultural contextualisation can be very subjective, the precise timing and location that results like PNL1 can give can be quite extraordinary. PNL1 is probably at least 25 times closer in time to the origin of R-U106 than we are, so is not only numerically worth 25*25=625 modern testers in terms of reducing the effects of population diffusion, but it suffers from 25 times less bias caused by subsequent migrations. Results like these are invaluable.

And results like this come any time a high-coverage sample is returned. We discussed how I7196 wasn't very useful for measuring the origin of R-U106, but it was also called for R-U106>Z156>Z304>DF98>S1894. R-S1894 probably split within a couple of centuries of I7196's lifetime too, so it provides as much constraint on R-S1894 (essentially limiting its origin to in or near the Czech Republic) as PNL1 does for R-U106. This may not be much use to you unless you are R-S1894 yourself (I'm very lucky!) but it provides at least some constraint on the successive haplogroups further up the tree too. There are a number of other well-recovered ancient DNA samples where similar, contemporary comparisons can be made.
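
The diffusion rule of thumb used in this part can be written down as a short sketch. It is only the back-of-the-envelope arithmetic from the text (radius = rate x time, worth = ratio of search areas); the function names are invented for the example, and the 5000-year age and 200-year gap are the rough figures quoted above, not precise dates.

```python
def search_radius_km(years_since_formation: float,
                     rate_km_per_year: float = 1.0) -> float:
    """Rule-of-thumb search radius around a sample for a haplogroup's origin."""
    return years_since_formation * rate_km_per_year


def equivalent_modern_testers(haplogroup_age_years: float,
                              sample_gap_years: float) -> float:
    """Ratio of search areas: one modern tester versus one ancient sample.

    The search area scales with the square of the radius, so a sample N
    times closer in time to the origin shrinks the area by a factor N*N.
    """
    return (haplogroup_age_years / sample_gap_years) ** 2


AGE_R_U106 = 5000  # rough age in years, as used in the text

print(search_radius_km(AGE_R_U106))          # a modern tester
print(search_radius_km(AGE_R_U106 - 1500))   # typical ancient sample, 1500 years old
print(search_radius_km(1000))                # early Bronze Age samples
print(search_radius_km(200))                 # a sample ~200 years after formation
print(search_radius_km(200, 5.0))            # same gap at a faster 5 km/year
print(equivalent_modern_testers(AGE_R_U106, 200))
```

The faster 5 km/year rate only makes sense applied in the direction of the migration, which is why the search area gets stretched to the east rather than widened in all directions.
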
Hence, ancient DNA can give us some means to overcome the biases that I mentioned in the last e-mail, caused by millennia of migrations shifting the population centres of haplogroups. It's really only a combination of rigorous treatment of modern DNA and the careful incorporation of ancient DNA that will let us identify the cultures our ancestors ultimately came from and how they moved around.

PART VI: RESULTS

1. OVERVIEW

In this penultimate part, I want to use the above thinking to update the best-estimate origins of the major haplogroups within R-U106. Every group with more than about 100 testers should be at least mentioned. I've likened this process above to crystal-ball gazing, and many of these results remain that. It is not rigorous science, it is not quantitative, and some of it is liable not to be right. There will be family-level over-sampling that I haven't caught, as well as other factors that will affect the numbers and my conclusions. For small haplogroups, the addition of single individuals from some countries could make a measurable difference in the results, and many of these individuals are out there with simpler STR tests that don't go onto the haplotree. The dates are also not computed with rigour and are due for a major refresh - they will probably disagree at some level with those I've posted before, including recent estimates.

It's also important that I emphasise that these are *comparative* not absolute statistics. Over-representation in one area therefore naturally leads to under-representation in every other, regardless of whether or not that under-representation is meaningful. Also, under-representation doesn't mean a haplogroup is not enriched in that particular place, only that it isn't as enriched as other haplogroups that may have contributed a larger percentage of the comparison (e.g. overall R-U106) population.
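To make the "comparative" point concrete, here is a minimal sketch of one plausible way to compute an over-representation factor: the clade's share of its parent haplogroup inside a region, divided by its share outside that region. The function name and the clade_A/clade_B counts are hypothetical, and this is not necessarily the exact calculation behind the spreadsheets; the 29% and 7% figures are the R-Z18 shares quoted later in the results.

```python
def representation_factor(clade_in_region, parent_in_region,
                          clade_elsewhere, parent_elsewhere):
    """Clade's share of the parent haplogroup inside a region divided by
    its share outside the region. Above 1 means over-represented."""
    if min(parent_in_region, parent_elsewhere, clade_elsewhere) <= 0:
        raise ValueError("reference counts must be positive")
    # Cross-multiplied form of (clade_in/parent_in) / (clade_out/parent_out)
    return (clade_in_region * parent_elsewhere) / (clade_elsewhere * parent_in_region)

# R-Z18: 29% of Scandinavian R-U106 versus 7% of R-U106 elsewhere.
z18_factor = representation_factor(29, 100, 7, 100)
print(round(z18_factor, 2))  # a factor of "more than four"

# Hypothetical counts for two clades that together make up the parent.
region = {"clade_A": 30, "clade_B": 70}
elsewhere = {"clade_A": 10, "clade_B": 90}
factors = {
    clade: representation_factor(region[clade], sum(region.values()),
                                 elsewhere[clade], sum(elsewhere.values()))
    for clade in region
}
print(factors)
```

In the hypothetical example clade_B has more testers in the region than clade_A in absolute terms, yet its factor comes out below 1 simply because clade_A is enriched there. That is the sense in which under-representation here is relative rather than absolute.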
Disentangling natural absence from relative absence can be done by paying attention to sub-clades and other haplogroups in the area, but isn't an easy task. This process will also inherit new biases from relative factors we don't yet fully understand.

Despite this, I'm putting these guesses out there because people want an "expert" opinion, and because I think I can provide that better than anyone else at present. That's not to say I can look in as much detail or make as precise an observation as someone whose focus is solely within one of these haplogroups. These estimates can't replace careful manual examination of a haplogroup's descendant branches by a knowledgeable individual, but I can give a comparative overview of all of these haplogroups, which is needed to uncover the steps leading up to their formation. So, caveat emptor, and here we go!

2. RESULTS

R-L151
Likely MRCA date range: 3300-3000 BC.
Likely origin: Ukraine or other western former Soviet states if close to 3300 BC, or former Eastern Bloc countries (Poland/Czech R./Slovakia/Hungary/Romania/Bulgaria) if closer to 3000 BC.
Culture: earliest Corded Ware Culture (CWC)
Narrative evidence: The ancient DNA result PNL1 in Bohemia (before 2879 BC) sets the latest time that R-L151 can form, and likely sets the westernmost possible point of origin (since the CWC was an east-west migration and had only just reached Bohemia by 2900 BC). R-L151 represents the nexus of the R1b CWC expansion that took over Europe. The speed of this expansion can be traced via the number of sub-clades per haplogroup in the haplotree: initially (at R-L151) this was moderately fast, before becoming extremely fast. Hence, R-L151 is most likely to have formed and started splitting right at the start of the CWC expansion, give or take a few generations. The exact origins of this expansion are nebulous, but likely to be somewhere around the Ukraine. Different authors prefer routes north or south of the Carpathian Mountains.
Based on current evidence, I would suggest a more northerly route.

R-U106
Likely MRCA date range: 3200-2900 BC.
Likely origin: Former Eastern Bloc countries: Czech R., Slovakia, Hungary, Poland (perhaps also Austria) if closer to 2950 BC; Romania, Moldova, western Ukraine if closer to 3200 BC.
Culture: early Corded Ware Culture (CWC)
Narrative evidence: As R-L151. R-U106 starts to get more rapidly into the population expansion. R-P312, R-S1194 and other basal R-L151 clades probably have a similar origin, though with less-constraining ancient DNA the date range can be slightly closer to the present and regions slightly further west allowed as the CWC migrated in that direction.

R-U106>Z18
Likely MRCA date range: 2800-2100 BC.
Likely origin: Germany (if closer to 2800 BC) but more likely southern Scandinavia
Culture: (early) Corded Ware Culture (CWC) or (more likely) Battle Axe or Pitted Ware Cultures
Narrative evidence: R-Z18 is more common in Scandinavia than other R-U106 clades by a factor of more than four, skewing the entire R-U106 distribution towards this region. It makes up 29% of all R-U106 in Scandinavia, but only 7% of R-U106 elsewhere. It also makes up a substantial fraction (~15%) of east German R-U106 (including Austria, Slovakia and the Baltic States). In the British Isles, it's more common than average in Scotland (partly due to over-testing of the Dunbar-Cockburn group), and less common in Ireland. The Scandinavian trend is particularly strong not only for the dominant group, but the R-CTS12023 basal clade as well, indicating either the bulk of R-Z18 moved to Scandinavia (early chronology) or R-Z18 formed in Scandinavia (late chronology). The first R-Z18 ancient DNA (VK418, 4th Century AD) comes from northern Norway too.

R-U106>Z18>Z372
Likely MRCA date range: 2000-1100 BC.
Likely origin: southern Scandinavia
Culture: Nordic Bronze Age?
Narrative evidence: The strength of the Scandinavian over-representation grows with R-Z372 to a factor of over six (7.3 in Sweden, 7.9 in Norway), clearly indicating a very likely origin. Oversampling still occurs in parts of eastern Europe (Czech, Slovak, Austrian populations, mild enhancement across the Baltic States), and artificially in Scotland (largely due to the Dunbar/Cockburn family). YFull find more recent MRCAs for R-Z372 and clades below it, apparently due to lack of good causality treatment. R-Z372 seems to represent a major splitting point: descendant sub-clades go different ways, indicating that R-Z372 was split up at some point, shortly after its foundation. This split probably occurred before the Bronze Age Collapse (except under a very late chronology), and may have been closer to the introduction of Bronze in the area around 1750 BC.

R-U106>Z18>Z372>Y38140
Likely MRCA date range: 2000-1000 BC.
Likely origin: Scandinavia
Culture: Nordic Bronze Age
Narrative evidence: Part of the R-Z372 expansion, more mildly oversampled in Scandinavia (factor of 3.15), much more strongly over-sampled in central Europe (Czech R., Slovakia and Austria), perhaps corresponding to the route of the Amber Road. This eastern expansion is led by the R-ZP91 clade (R-S4037 being oversampled in the UK thanks to the Nesbit family and recent relatives). Further investigation might narrow down the window of this migration, though a Lombard burial in Hungary shows it must have been before about 550 AD.

R-U106>Z18>Z372>L257
Likely MRCA date range: 1800-800 BC.
Likely origin: Scandinavia
Culture: Nordic Bronze Age
Narrative evidence: Part of the R-Z372 expansion, still biased towards Scotland thanks to the Dunbar/Cockburn testing, with R-ZP2 being the main focus point of their expansion, 1000-1500 years ago. The majority of its minor clades show evidence of eastern Germanic migrations, with some in central Europe too.
However, it is undersampled in western Europe, particularly Germany, suggesting there was not much expansion into the west Germanic populations, hence we probably have to look to northern Germanic (e.g. Viking) migrations to explain the presence of most R-L257 in Scotland.

R-U106>Z18>Z372>S3207
Likely MRCA date range: 1700-600 BC.
Likely origin: Scandinavia
Culture: Nordic Bronze Age
Narrative evidence: Very little R-S3207 is found outside Scandinavia. Both its major sub-clades, R-S5673 and R-CTS5533, have a very strong presence there. This evidence strongly points to an origin in Scandinavia. The aforementioned ancient DNA result from northern Norway (4th C AD) is part of this group.

R-U106>S12025
Likely MRCA date range: 2700-1600 BC.
Likely origin: central Europe (if closer to 2700 BC) or continental coasts of the North Sea (if closer to 1600 BC)
Culture: Corded Ware Culture (if closer to 2700 BC), Single Grave Culture (central estimate) or Nordwestblock cultures (if closer to 1600 BC).
Narrative evidence: Likely most common in the Netherlands and Denmark. More than average in Scotland. Indications of a North-Sea-based culture. Too small and unclear to make many good predictions from.

R-U106>S18632
Likely MRCA date range: 2600-1900 BC.
Likely origin: central Europe (if closer to 2600 BC) or continental coasts of the North Sea (if closer to 1900 BC)
Culture: Corded Ware Culture (if closer to 2600 BC) or its western or central European descendants.
Narrative evidence: Shows an important presence in central and eastern Europe. In the British Isles, concentrated in England. Too small and unclear to make many good predictions from.

R-U106>FGC3861
Likely MRCA date range: 2400-1600 BC.
Likely origin: central or western Europe?
Culture: Corded Ware Culture, possibly specifically Single Grave Culture (if closer to 2400 BC) or its western or central European descendants.
Narrative evidence: R-FGC3861 is fairly evenly spread across Europe, tracing approximately the same distribution as the R-U106 average. This probably means it shares a common culture with the bulk of R-U106 during its early years. Its sub-clades show a more diverse pattern. R-A1243 is strong in eastern Europe. R-FGC14877 is focussed towards Germany. R-Z8053 is strong in Scandinavia, Britain, France and probably the western Mediterranean. There is not enough data to easily tell when the major geographical splits were.

R-U106>Z2265xZ381xZ18 (R-U106 basal clades)
Narrative evidence: taken together, the R-U106 minor basal clades show an overall picture of R-U106 that can be separated out from later migrations of the bulk groups R-Z381 and R-U106. Overall, these basal clades follow the same distribution as the rest of R-U106, with possibly slightly higher frequencies in the Netherlands, Denmark and Czech Republic, with possibly lower prevalence in Ireland, Switzerland and Norway. However, these are at the very edge of statistical significance. This indicates that our R-U106 family retained its homogeneity during the early phases of its expansion, only starting to branch apart at some point after R-Z381's formation. Whether this means R-U106 was stationary during this time, or all migrated in the same direction (westward) isn't clear.

R-U106>Z381xL48xS1688xZ156 (R-Z381 basal clades)
Narrative evidence: taken together, the R-Z381 basal clades are relatively uncommon in central Europe, slightly less common in the British Isles, and more common in Scandinavia (especially Norway). R-FGC8512 has a notable Estonian component. They also appear mildly more common in the Mediterranean, particularly Portugal and Spain. However, overall, they are a generally good match to the other clades in R-U106.
This suggests perhaps an overall migration after R-Z381 in the direction from central Europe towards Jutland, Scandinavia and the Atlantic (consistent with movement towards the Single Grave Culture and its descendants). However, this is very weak evidence and a speculative hypothesis.

R-U106>Z381>Z156
Likely MRCA date range: 3000-2700 BC.
Likely origin: central Europe (Bohemia?)
Culture: Corded Ware Culture (eastern Bell Beaker influence if later)
Narrative evidence: (R-Z156 has strong family-level biases from over-sampled testing of the McMullen (R-S5520>FGC11674) and Pittmann (R-Z306>DF98>FGC48870) groups. These groups have been taken out of the following analysis for all of R-Z156 and its subclades.) Overall, R-Z156 doesn't show a very strong British bias, and is comparatively absent from the Low Countries, despite the fact that the earliest R-U106 ancient DNA from these regions is almost entirely R-Z156. Hence, while R-Z156 may have been the earliest western wave of R-U106, it wasn't as influential in these regions as later migrations. R-Z156 is moderately common in the rest of north-western Europe, particularly France, where it provides 35% of the R-U106 test results (cf. 18% over Europe as a whole). It is also relatively common in the Mediterranean, meaning overall that it is common among the southern reaches of R-U106's realm. Within the British Isles, it is notably strong in Ireland. We know from ancient DNA that R-Z156>Z304>DF98>S1894 was found in the early Unetice culture in Prague. Hence, it is likely that R-Z156 arose in the precursor to this culture, the Corded Ware Culture, and we can tentatively plot a very short line across Bohemia from our most-ancient R-U106 burial to this one. The prevalence to the south, France and Ireland would therefore come from later migrations, notably including the Celtic migrations (especially the western La Tene groups).
R-Z156 shows a modest population in eastern Europe, but does not comprise a significant fraction of the R-U106 here: the eastern European populations are over-represented in the R-Z156 basal clades (e.g. R-Y30585), which we can use to further evidence this central European origin, then bulk migration skewed towards the west.

R-U106>Z381>Z156>S5520
Likely MRCA date range: 2900-2500 BC.
Likely origin: central Europe
Culture: Late Corded Ware Culture, probably under the influence of the eastern Bell Beakers, and/or transition to the Unetice culture
Narrative evidence: Generally following the R-Z156 population (indicating a likely common origin with other R-Z156 clades), R-S5520 is less common in the British Isles than other R-Z156 clades. Instead, its prevalence appears to follow a rough arc, trailing from Switzerland, Germany and Belgium, up to Denmark and Sweden, with some contribution from Poland. Unusually for R-Z156, it does not show a strong component in France. It is found in ancient DNA (R-FT221936) in the Hallstatt C/D Celtic burial I23978 in Slovenia. Consequently, R-S5520 appears to retain a more central European distribution.

R-U106>Z381>Z156>Z306
Likely MRCA date range: 2900-2500 BC.
Likely origin: central Europe (Bohemia?)
Culture: Late Corded Ware Culture, probably under the influence of the eastern Bell Beakers, and/or transition to the Unetice culture
Narrative evidence: R-Z306 is again caught in the descendancy between our early R-U106 man in Bohemia, PNL1, and our R-S1894 burial in Prague. Consequently, a Bohemian origin for R-Z306 seems fitting as well - at least we can expect an origin somewhere near. It is also found in Nordwestblock (Hilversum?) culture remains from ~14th Century BC south Holland, Romano-British remains from York, and La Tene Celtic burials. Since R-Z306 dominates R-Z156, it dominates the modern population distribution as well, so it is found strongly in Ireland but otherwise is slightly uncommon in the British Isles.
It is common in France, is a typical fraction of R-Z381 in Germany, but avoids the Low Countries. It is rare in many central European countries, apart from the Czech Republic, and fairly rare across Scandinavia. It makes up a reasonable fraction of Finnish and Russian R-U106, but is fairly rare in the smaller former Soviet states. Its distribution in eastern and southern Europe is typical. R-Z306 splits effectively into R-DF98 and R-DF96, which represent the starting points for their own respective expansions. The other, minor clades of R-Z306 are predominantly found in Germany, Poland and the Czech Republic, as expected for its likely origin, but also possibly in Norway, south-eastern Europe and the Mediterranean, probably (if real) indicating later migrations.

R-U106>Z381>Z156>Z306>DF98
Likely MRCA date range: 2500-2100 BC.
Likely origin: central Europe (Bohemia?)
Culture: Late Corded Ware + Bell Beaker Culture (if closer to 2500 BC), Unetice culture (if closer to 2100 BC)
Narrative evidence: The origin of R-DF98 is again well evidenced by PNL1 (early U106+, eastern Bohemia) and I7196 (early S1894+, Prague). R-DF98 broadly shares the distribution of R-Z156, indicating a likely common point of origin. It shows a strong population in Ireland and Scotland, particularly Northern Ireland thanks to some Ulster Scots. It is still very present but less common in England and Wales. It is very common in France (10% of all French R-U106, cf. 4% of European R-U106), the Czech Republic (11%) and possibly Switzerland, Austria and the western Mediterranean. It avoids the Low Countries, especially the Netherlands (0.85%). Unlike other R-Z156 clades, it is relatively common in Norway and Sweden. However, it avoids the former Soviet states. There is substantial variation within R-DF98, but only R-S1911 and R-S18823 are large enough to consider. Common themes include a high frequency in France and Ireland.

R-U106>Z381>Z156>Z306>DF98>S1911
Likely MRCA date range: 2400-2050 BC.
Likely origin: central Europe (probably Bohemia)
Culture: Early (or proto-) Unetice culture
Narrative evidence: R-S1911 shows a greater fraction in the British Isles, at least some of which derives from Norman sources, and ultimately likely French populations. It is common in Scandinavia compared to R-Z156 and R-DF98 as whole groups, and appears common in southern Europe. Its origins must be closely linked to I7196 in Prague. The features of R-S1911 are reflected in R-S1894, which accounts for about half of the group, and includes both I7196 and a Romano-British gladiatorial burial in York. An R-S1911 burial is also found in Visigothic Spain. The Scandinavian expression of R-S10621 is also strong.

R-U106>Z381>Z156>Z306>DF98>S18823
Likely MRCA date range: 2400-2000 BC.
Likely origin: central Europe (Bohemia?)
Culture: Early (or proto-) Unetice culture
Narrative evidence: R-S18823 appears common in the upper Rhine valley, especially around Frankfurt, but also (to a lesser extent) in the upper parts of the Danube. It is comparatively under-sampled in the Mediterranean and Scandinavia, even compared to the rest of R-Z156. It is also fairly undersampled in the British Isles, with two families (Egan and Ferguson) dominating the testing. The French component is largely driven by R-M6509 and R-S18821. The German component is strongest in R-M6509. R-M6509 and R-S22116 are both strong in central Europe.

R-U106>Z381>Z156>Z306>DF96
Likely MRCA date range: 2500-2000 BC.
Likely origin: central Europe?
Culture: Late Corded Ware + Bell Beaker Culture (if closer to 2500 BC), Unetice culture (if closer to 2100 BC)
Narrative evidence: R-DF96 generally follows the same frequencies as R-DF98 and many other R-Z156 groups, suggesting a shared origin. However, it is found more in England than Scotland, but still is strong in Northern Ireland. It remains strong in France, but less so elsewhere in western Europe.
It is essentially absent from central Europe, including the Czech Republic where we suspect (simply by virtue of association with I7196) it arose, but does appear in Russia, the Ukraine and Hungary. It contains a roughly normal fraction of the R-U106 Mediterranean population. These differences suggest that, although it may have a similar origin to R-DF98, it quickly participated in very different migrations.

R-U106>Z381>Z156>Z306>DF96>FGC13326
Likely MRCA date range: 2200-1600 BC
Likely origin: central Europe?
Culture: Unetice culture?
Narrative evidence: R-FGC13326 is an unusual beast, likely the product of many migrations that happened after its formation. It is relatively absent from Great Britain, but present in Ireland (esp. R-S22047). It's strongly found in the Netherlands (major subclades) and France (minor subclades), but not so much in Germany. It's present in Denmark and Finland (R-S25234), but not so much in Norway and Sweden. It's very strong in Iberia compared to other R-U106 groups (minor clades), but nowhere eastwards in the Mediterranean. It appears in Russia and Poland (R-S25234), but nowhere further south. Hence, it's largely been pushed to an arc around the periphery of Europe (without having made it to the Scandinavian peninsula), but no longer really appears at its centre. It's possible that its spread can be partly attributed to the Tumulus culture, with R-S25234 going north-west and north-east, R-S22047 heading down the Rhine and over to the British Isles, and minor clades heading south-west. Various Celtic migrations that followed likely contribute to the migration pattern. However, the haplogroup as a whole lacks a clear narrative. R-S25234 has been found in an ancient Orcadian sample, probably dating from the Bronze or Iron Age, but further details are not available at the time of writing.

R-U106>Z381>Z156>Z306>DF96>S11515
Likely MRCA date range: 2100-1300 BC
Likely origin: central Europe?
Culture: late Unetice or early Tumulus cultures?
Narrative evidence: R-S11515 appears very strongly in the British Isles. This strong presence is mostly due to the sub-clade FGC8410>BY17999 (discussed below), and doesn't characterise the group as a whole. The rest of these results are corrected for that group's over-representation. The corrected north-western European fraction is strongly concentrated in Germany, with abnormally few results in France and (to a lesser extent) the Low Countries. It is notably absent in central Europe, but fairly strong in Scandinavia across to Russia, thanks to some specific clades (i.e., later migrations) notably including R-FGC8410>FGC8372. It also has a presence in Hungary. Like R-FGC13326, it is difficult to pull a clear narrative from these observations, except that a continental European origin fits the multiple directions of migration.

R-U106>Z381>Z156>Z306>DF96>S11515>BY17999
Likely MRCA date range: 400 BC - 700 AD
Likely origin: western Europe, migrating to British Isles
Culture: Pre-Roman Celtic or post-Roman West Germanic cultures
Narrative evidence: This haplogroup represents an important geographical disjoint as, unlike the haplogroups above it, it appears almost uniquely English. While the locations in England aren't often stated, the exact timing of this haplogroup is important. There is very little evidence of R-U106 as a whole in the British Isles before the post-Roman "Anglo-Saxon" invasions. While this period can't be ruled out for this group, it is disfavoured, and an earlier period of history (e.g. Roman- or pre-Roman-era migrations, e.g. Belgae) can't be ruled out.

R-U106>Z381>Z156>Z306>DF96>S11515>L1
Likely MRCA date range: 1600-800 BC
Likely origin: central Europe?
Culture: Tumulus culture? (possibly Urnfield culture if later?)
Narrative evidence: R-L1 has a typical strength within the British Isles compared to other R-Z381 clades. In Europe, however, it is notably absent in France and the Low Countries, and common in Germany.
Outside these regions, it has a fairly typical distribution across Europe, stretching as far as Russia and the Ukraine, which is fairly typical of many of the other central European haplogroups we see. The timing of R-L1 places it most likely in the Tumulus culture, but subsequent cultures (e.g. pre-Celtic cultures) are possible too. Romano-British remains from York show that R-L1 was in the British Isles from the 3rd century AD at the latest. The large sub-group R-BY41554 shares many of R-L1's characteristics, but is slightly more common in Great Britain (though not Ireland), and less common in Scandinavia.

R-U106>Z381>S1688
Likely MRCA date range: 2400-1500 BC.
Likely origin: (west of) central Europe?
Culture: possibly (western) Unetice culture?
Narrative evidence: R-S1688, parent of R-U198, shares more similarities with R-Z156 than it does with R-L48, suggesting a similar origin. Yet it is still sufficiently different from R-Z156 to suggest a slightly different evolution. R-S1688 is a smaller haplogroup than either of the others, meaning it is harder to achieve meaningful statistics within it. Nevertheless, it appears more common in England compared to the rest of the British Isles. It is relatively under-represented in western Europe, being noticeably absent in Belgium and Switzerland, but reaches a local maximum in France. It is reasonably common in Scandinavia, but appears not to have expanded into Finland. It also appears rarely in eastern, south-eastern and southern Europe, indicating it is more confined to the north and west than either R-Z156 or R-L48. The similarity with R-Z156 (particularly R-S1911) suggests a similar origin and, with R-Z156 being noticeably present in the Unetice culture and the timing of R-S1688 generally matching the period in which the Unetice culture was active, R-S1688 may share a similar origin and migration history. However, its relative confines to the north-west of Europe may indicate a more western outlook.
The Unetice culture was also present in many parts of central and southern Germany and, while it is highly speculative to assign such a precise origin, this may give us a starting point to understand its true roots.

R-U106>Z381>S1688>DF93
Likely MRCA date range: 2200-1100 BC.
Likely origin: west or central Europe?
Culture: Unetice or Tumulus cultures?
Narrative evidence: R-DF93 is about as small a haplogroup as we can investigate with meaningful results. It is split roughly in half by the much younger R-S4056, so we can perform a jack-knife test to compare the results of R-DF93's basal clades (R-DF93xS4056) with R-S4056. R-DF93 is more common than most haplogroups in England and Wales, but not so much the rest of the British Isles. This usually indicates recent migration (e.g. post-Roman Germanic or Norman), as older populations tend to be pushed to poorly Romanized regions, either by the Romans or these later invasions (however, see R-FGC12307 below). In western Europe, R-DF93 appears in France and the Netherlands, but is significantly absent elsewhere, especially in its basal clades. R-S4056 is rare in central Europe, but (unlike the basal clades) common in Scandinavia. R-DF93 is present in eastern Europe, but the haplogroup is too small to make meaningful comparisons. Hence, the most likely hypothesis seems to be that R-DF93 was born in central Europe, but that at least some of its descendant R-S4056 migrated up to the Atlantic coasts, possibly Scandinavia. However, these conclusions are very speculative, given the size of the haplogroup. Further investigation of the clades responsible for the Scandinavian and Atlantic continental contributions may be helpful in elucidating its origins and migrations.

R-U106>Z381>S1688>S15627
Likely MRCA date range: 2300-1200 BC.
Likely origin: central Europe?
Culture: Unetice or Tumulus cultures?
Narrative evidence: R-S15627 is found in Langobard DNA in post-Roman Italy.
Hence, we can be clear that at least some of it participated in the west Germanic cultures. However, these may have been adopted cultures from a Celtic or pre-Celtic background, as was common in modern Germany at the time. Like R-DF93, R-S15627 fits neatly into two periods of expansion: the basal R-S15627 and the downstream R-FGC12307, which represents around half the group's testers with European ancestry. In the British Isles, the basal clades are common in Scotland (thanks largely to the young group R-FGC12774) and Ireland (thanks partly to R-FGC12791, upstream of R-FGC12774), while R-FGC12307 is found mostly in England, like the rest of R-S1688. This suggests two very different migration pathways for R-S15627, which may differ in both time and place. In fact, R-FGC12307 is found essentially nowhere other than England (100 of 104 European results are from the British Isles). With an estimated origin (YFull) of 350 BC - 300 AD, R-FGC12307 is therefore another candidate (alongside R-BY17999) for being a pre-Roman arrival into the UK. The R-S15627 basal clades are very rare in western Europe, but probably common in a band that may run approximately across central Europe from Austria to Sweden, and may indicate their origin.

R-U106>Z381>L48
Likely MRCA date range: 3000-2500 BC.
Likely origin: western or central Europe
Culture: Corded Ware Culture
Narrative evidence: R-L48 formed as part of the same, unbroken population expansion as R-U106. It displays few major differences from the other major sub-clades of R-Z381, so can't be considered a major geographical or cultural change in itself. It therefore probably arose out of the same central European Corded Ware Culture expansion as its counterparts. There are a few interesting differences about its modern distribution, however, that can be explained by its later migrations.
Specifically, it is noticeably rarer in Ireland and France than the rest of R-Z381, suggesting it stuck closer to the Germanic populations than Celtic in its later migrations. It is also rare in the Mediterranean. However, it is relatively common across all three branches (west, north, east) of the Germanic cultural sphere. It is really through R-L48 that R-U106 inherits the dubiously-correct badge of being the clade of the Germanic peoples.

R-U106>Z381>L48>Y37962
Likely MRCA date range: 2900-2300 BC
Likely origin: north-west Europe?
Culture: possibly Corded Ware Culture (early), Single Grave (sub-)culture (mid-estimate) or Nordic Bronze Age (late).
Narrative evidence: R-Y37962 is dominated by its slightly younger sub-clade R-S23189, which in turn splits evenly into two slightly younger sub-clades, R-A6706 and R-FT6679. These haplogroups show a fairly consistent picture. This suggests that R-Y37962 probably arose towards or after the end of the main Corded Ware Culture migration, and remained a homogenised population for several centuries while it grew and branched. In the British Isles, it is comparatively stronger in Scotland and Ireland than in England. It is fairly weak in western Europe, being found mostly in Germany and Switzerland, avoiding the Low Countries and especially France. It is moderately strong in central Europe, thanks to R-A6706. R-A6706 also appears strong in Scandinavia, while R-FT6679 is found in Finland, giving an overall relatively strong Scandinavian return. All of R-Y37962 is noticeably weak or absent across all of eastern Europe, south-eastern Europe and the Mediterranean. Groups in these regions can often be attributed to either Celtic or post-Roman Germanic migrations, and these null returns and general absence in England and France suggest that R-Y37962 did not participate strongly in these - this suggests west Germanic, east Germanic and probably Celtic groups are unlikely to have been greatly populated by R-Y37962 before 450 AD.
Instead, a north Germanic focus is suggested, as is typical for a Scotland-Ireland-Scandinavia combination. This could be Viking or earlier. However, a notable ancient R-S23189 burial is an early Angle burial in Cambridgeshire.

R-U106>Z381>L48>CTS3104
Likely MRCA date range: 2800-1700 BC
Likely origin: north-west Europe, Jutland or southern Scandinavia?
Culture: unclear, possibly Single Grave Culture (early estimates), Nordic Bronze Age or Nordwestblock (e.g. Elp) culture?
Narrative evidence: R-CTS3104 is a very small haplogroup compared to the others. We can't say much about it. It is relatively absent in Ireland and more present in Great Britain. It is very common in Scandinavia, and (in decreasing over-abundance) in Belgium, Switzerland and Germany. It is present in France, but has not been evidenced in central, eastern or southern Europe. This evidence combines factors we expect from both north Germanic and west Germanic groups, without really fulfilling all the criteria for either, and may ultimately be more closely associable with the Nordwestblock cultures, though we would expect to see more evidence in the Netherlands and more dissemination through the Tumulus culture into regions further east. Ultimately, we need more testers to flesh out the distribution of this group.

R-U106>Z381>L48>L47
Likely MRCA date range: 2900-2200 BC
Likely origin: central Europe
Culture: Eastern Bell Beaker?
Narrative evidence: R-L47 splits unevenly into R-L44 and R-Z159, which have very different distributions. Overall, R-L47 is not common in the British Isles, though this varies considerably among its sub-clades. Universally, however, it is more common in England, Wales and the Republic of Ireland, and less common in Scotland and Northern Ireland. This north-south dichotomy, especially within Ireland, is quite unusual. R-L47 is comparatively rare in western Europe - exceptionally so in some of the more recent clades.
However, both R-L44 and R-Z159 have a strong presence in Poland and the Czech Republic, and there is some evidence that both are present down eastern Europe, at least as far as Hungary. For R-L44, the more basal clades are found in Italy and Austria (with the Czech Republic and Poland being found a little further down the tree); for R-Z159, they are found in Switzerland and Sweden (with Austria, France and Norway found further down). This pattern is most easily reconciled by an ancestor in eastern Europe, perhaps in the Danube valley or the Carpathian mountains, in a culture such as the eastern Bell Beaker Groups (e.g. Moravian), which are thought to have had migrations out to these basal-clade countries from this location at this time. The strong presence across R-L47 clades may suggest an early migration into the Trzciniec culture or similar.

R-U106>Z381>L48>L47>L44
Likely MRCA data range: 2600-1700 BC
Likely origin: central Europe
Culture: Eastern Bell Beaker or Unetice Cultures?
Narrative evidence: R-L44 is very similar to R-Z159 in the British Isles, and western and central Europe, but shows noticeable differences in the rest of Europe. It is essentially absent in Scandinavia and (while numbers are too small to be certain) probably in eastern and south-eastern Europe beyond Poland and Hungary. It appears occasionally in Italy. In the wider context of R-L47, this may indicate a group that started off in central Europe (around the same place as R-L47), but later migrated westwards. The central-European characteristics are retained by the majority R-L163 (2400-1400 BC) and R-L46 (2100-900 BC) basal clades, but show a marked difference in R-L45 (1100 BC - 200 AD). R-L45 is majority British (66 out of 70 European testers) and not dominated by a single family, so it meets the criteria for being a British-origin haplogroup (or at least one which rapidly became British after its arrival), and thus is among the few potentially pre-Roman British haplogroups.
Note, however, a Danish Viking burial is R-L45>L493>FGC10248>FGC10249.

R-U106>Z381>L48>L47>Z159
Likely MRCA data range: 2400-1300 BC
Likely origin: central or eastern Europe?
Culture: Eastern Bell Beaker, Unetice or Trzciniec cultures?
Narrative evidence: R-Z159 is dominated by two large sub-clades: the dominant R-S6924>S3251 and the smaller R-S9257. Results of the other basal clades show a pan-European distribution, but with a concentration towards central Europe (see R-L47), indicating the origin of R-Z159 may lie here too. Common features to both R-S3251 and R-S9257 include a weak presence in the British Isles (mostly England and the R.o.Ireland), a strong presence in central Europe (especially Poland, see R-L47 for interpretation), and an over-representation in Scandinavia (especially Sweden). This may indicate that R-Z159 lines joined the R-L44>L45 lines in their appearance in the British Isles (though this would need to be shown in terms of timings).

R-U106>Z381>L48>L47>Z159>S9257
Likely MRCA data range: 2300-1000 BC
Likely origin: central Europe?
Culture: uncertain, maybe Unetice or Tumulus cultures? Migrating towards Nordic Bronze Age?
Narrative evidence: Compared to R-S3251, R-S9257 is much more western-focussed. It is moderately common in England and Ireland, though overall it shows less representation in the British Isles than most R-L48 clades. It is relatively common in western Europe, especially France and the Low Countries, though only typically prevalent in Germany. It is very over-represented in Poland and the Czech Republic. It is only typically present across the rest of Europe, though the statistics here are few. This likely indicates an origin still in central Europe, but with an early migration westwards to cultures more typical of R-U106 (e.g. western Unetice, Tumulus, Nordic Bronze Age, depending on how far the migration took it).

R-U106>Z381>L48>L47>Z159>S3251
Likely MRCA data range: 1900-800 BC
Likely origin: south-eastern Baltic? (Poland or [maybe] Sweden?)
Culture: Trzciniec culture? early Lusatian culture? (possibly late Nordic Bronze Age?)
Narrative evidence: R-S3251 breaks up into three moderate-sized groups: R-FGC17298, R-FGC8563 and R-M10145, of which R-FGC8563 is the largest, and R-M10145 is too small to get meaningful statistics beyond the fact its distribution is not abnormal. However, R-M10145>S6915 does contain the only ancient DNA remains for the group, in a 4th Century Hun burial in Slovakia and an early Icelandic burial. Overall, R-S3251 is moderately common in England and Ireland, though again less so than other R-L48 clades. It is distinctly uncommon in western Europe (with the sole exception of Germany for R-FGC17298). It is common in central Europe (especially Poland), Scandinavia (esp. Norway and Sweden), and exceptionally common in the former Soviet states, where it makes up about 1/4 of R-U106 (cf. 2.3% elsewhere). This eastern enhancement is particularly strong in R-FGC8563, which extends into the Baltic States, but is also found in R-FGC17298, which is found in Russia, Belarus and Ukraine; one Russian is also found in R-M10145. The enhancement continues all the way down eastern Europe and into the Mediterranean states, including Italy, Spain and Portugal. This expansion is hard to date, but fits very neatly with the Gothic migrations that began in the late centuries BC and culminated in the fall of Rome. The origin of R-S3251 would then be a culture that fed into the start of this migration, in modern-day Poland or nearby.

R-U106>Z381>L48>Z9
Likely MRCA data range: 2800-2100 BC
Likely origin: west-central Europe?
Culture: western Unetice or early Danubian cultures, Single Grave Culture (if early)
Narrative evidence: R-Z9 makes up about a third of the R-U106 that we sample. It forms the start of a new population spread that is likely (by virtue of its three SNPs) distinct from that of the original Corded Ware Culture expansion.
We therefore likely look towards a later culture for its origin. It splits unevenly into R-Z331, which is dominated by R-Z326, and R-Z30, which is dominated by R-Z2. These two branches present very different geographies, so it is likely that the start of this population expansion coincided with the beginning of a migratory period for one or both branches. R-Z9 overall is typically common across Great Britain, though appears less in Ireland compared to other R-Z381 clades. It also appears roughly even in western Europe, though is skewed more towards the north, being more common in the Netherlands and less common in France. That trend continues into Scandinavia, where it reaches its highest over-sampling. It is considerably less common in central and eastern Europe, including being less common in south-eastern Europe and (marginally so) in the Mediterranean. This relative frequency in the north-west of Europe suggests an ancestor in that direction compared to earlier central European (~Bohemian) clades. Perhaps not yet in the Scandinavian peninsula (the over-sampling is not anything like that of R-Z18), but heading in a direction more towards modern Germany. Both downstream haplogroups (Z331 and Z30) have a prevalence in countries to the south of Germany, suggesting that the northern component of R-Z9 is a later migration northward, and that R-Z9's origin might be more towards the south of Germany.

R-U106>Z381>L48>Z9>Z331
Likely MRCA data range: 2700-1900 BC
Likely origin: Germany
Culture: Danubian cultures, Rhineland Beaker cultures, Single Grave Culture (early) or Nordwestblock (late)
Narrative evidence: R-Z331 is vastly dominated by its much younger branch, R-Z326 (2000-1000 BC). However, while the basal clades of R-Z331 are small, they share the same distribution as R-Z326, indicating that they came from the same geographical origin. That origin is very firmly rooted in central Europe, and centred on modern Germany.
R-Z331 is the only large R-U106 clade that has less than 50% of its European testers in the British Isles: it is under-represented in Great Britain by a factor of two, and in Ireland by a factor of three. Hence, it participated only to a very minor degree in the major Celtic and Germanic migrations from western and central Europe to the British Isles. By contrast, it is over-abundant by a factor of two in western Europe, especially Germany and Switzerland. This over-abundance is still present, but drops (in rough order) towards Belgium, Poland, the Netherlands, Austria, France and Denmark. It is rarely found in eastern Europe or the Scandinavian peninsula. However, it is probably moderately common along the Mediterranean, from Iberia to the Balkans, and appears strong in Italy. The locus of this expansion seems to be somewhere in modern Germany, speculatively towards the south. The R-Z326 subclades, R-FGC10367 and R-FGC18842, make up the majority of this haplogroup, and their own respective sub-clades R-CTS2509 and R-S23955 make up most of them. While these sub-clades are big enough to analyse in their own right, they share the same geographical representations, suggesting they both originated in the same place. R-CTS2509 retains more of the central European (Germany and surrounds) concentration, while R-S23955 is more geographically split, having pockets in the Low Countries, Denmark and Norway (with a possible slight Viking influence in Scotland), and the Czech Republic and Austria, suggesting at least three migrations out from this central point. While there is some correspondence between these locations and the R-S23955 sub-clades, the evidence to separate them isn't compelling, suggesting the migrations away from Germany took place some generations or centuries after the initial R-S23955 split.

R-U106>Z381>L48>Z9>Z30
Likely MRCA data range: 2600-1800 BC
Likely origin: southern Germany?
Culture: Danubian cultures, Rhineland Beaker cultures, Single Grave Culture (early) or Nordwestblock (late)
Narrative evidence: R-Z30 is very much dominated by R-Z2. The R-Z30 basal clades are few in number, but they show a very strong enhancement in Switzerland, the Czech Republic and Austria, and a strong enhancement across north-western Europe from France to Poland, plus in the inner Baltic (Sweden, Finland, Russia). In the context of the surrounding clades, the latter would appear most likely due to a later migration, while the former western and central European groups (focussed on the upper Danube) are more consistent with R-Z331, so likely represent the origin of the group and the position from which it initially spread. Most sub-clades stayed in this region, while R-Z2 moved north.

R-U106>Z381>L48>Z9>Z2
Likely MRCA data range: 2400-1600 BC
Likely origin: northern Germany or Jutland (maybe western Baltic or the Rhine delta?)
Culture: Barbed Wire or Rhineland Beaker groups? Nordic Bronze Age? Nordwestblock (e.g. Elp) cultures?
Narrative evidence: All three sub-clades of R-Z2 show a strong enhancement in the Scandinavian population compared to the R-Z30 basal clades. This suggests R-Z2 is the launching point for a migration headed in that direction. Of these three groups, R-S22165 is too small to say much more about. R-FGC31495 is more sizeable: in addition to a strong Scandinavian return, it shows typical distributions across the British Isles, a strong French prevalence in its sub-clade R-FGC31514, a strong German presence overall, and sporadic but overall strong returns from the former Soviet states, but probably a general absence elsewhere in eastern Europe, south-eastern Europe and the Mediterranean. Its largest group, R-Z7 (discussed below), is strongest in the British Isles and Scandinavia, and much weaker elsewhere. While its origins pre-date any concept of a proto-Germanic population, it appears more clearly associated with north-western Europe.
Depending on how far the R-Z2 migration went, this could be anywhere from southern Scandinavia down to the middle Rhine valley, or across to northernmost France, but is most likely somewhere in the middle of that triangle, with the general lack of basal results in the Low Countries suggesting somewhere further east. This level of detail is exceptionally speculative, however, and its culture of origin remains very unclear as a result.

R-U106>Z381>L48>Z9>Z2>Z7
Likely MRCA data range: 2300-1400 BC
Likely origin: north-central Europe, Jutland or (less likely) southern Scandinavia?
Culture: unclear, perhaps Nordwestblock (e.g. Elp) cultures? Nordic Bronze Age? northern Tumulus?
Narrative evidence: The smaller basal clades of R-Z7 include returns from Finland (R-BY35996) and Italy, Slovenia and Austria (R-BY160730) - the latter haplogroup is a few centuries younger, however, so this is likely to indicate a later migration, rather than a point of origin. R-Z8 (below) makes up the majority of the clade, with the minority remainder made up of R-FGC902. Consistent results between these groups are a common presence in Great Britain but slight absence in Ireland, a fairly common presence in Scandinavia (R-FGC902 being localised in Sweden), a typical abundance in the Mediterranean, a relative absence in north-west Europe, and notable absence in eastern and south-eastern Europe. These factors best support a west-Germanic history, but this haplogroup predates that group by 1000-2000 years. Hence, we are probably looking at some north-western continental group, similar to its parent, R-Z2.

R-U106>Z381>L48>Z9>Z2>Z7>FGC902
Likely MRCA data range: 2200-1300 BC
Likely origin: north-central Europe, Jutland or (less likely) southern Scandinavia?
Culture: unclear, perhaps Nordwestblock (e.g. Elp) cultures? Nordic Bronze Age? northern Tumulus?
Narrative evidence: R-FGC902 is dominated by R-CTS10893 (1500-600 BC), but the other R-FGC902 basal clades are too small to be geographically useful. R-CTS10893 itself is dominated by R-CTS4099 (1400-500 BC). Using this as the separator point, there is little difference between the basal R-FGC902 clades and R-CTS4099. The most marked difference is that only R-CTS4099 is found in Scandinavia, indicating that a migration from continental Europe to Sweden probably happened after R-CTS4099 formed, allowing R-CTS4099 to come into the west Germanic sphere.

R-U106>Z381>L48>Z9>Z2>Z7>Z8
Likely MRCA data range: 1800-900 BC
Likely origin: continental North Sea coast, probably between the Rhine Delta, southern Sweden and southern Norway
Culture: Nordic Bronze Age if north, Nordwestblock if west, northern Tumulus or Urnfield if south
Narrative evidence: R-Z8 is defined by a presence in the British Isles (specifically England), Scandinavia (particularly Norway and Finland but also Denmark) and the Netherlands. A smattering in Latvia may hint at a larger population, but it is found much less anywhere else. This North Sea focus is similar to that seen in some R-Z18 clades, but R-Z8 is not so extremely Scandinavian. It is later seen in ancient DNA in Viking burials from Norway and Denmark. Consequently, the geography appears consistent with the other R-Z2 haplogroups.

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z338
Likely MRCA data range: 1700-800 BC
Likely origin: continental North Sea coast, probably between the Rhine Delta and Jutland
Culture: Nordwestblock, e.g. Elp, if early; possibly northern Tumulus or Urnfield if later. Nordic Bronze Age if later.
Narrative evidence: R-Z338 shows more enhancement in the Netherlands, Denmark, Finland and Latvia (R-FGC1954) compared to R-Z1. Like R-Z1, it is much more frequent than normal in England. Enhancement also exists in Norway and Sweden.
The basal clades are strongest in the Netherlands and Denmark, with a reasonable presence in Germany and France, but not in England. The major sub-clade, R-Z341, is the haplogroup showing the English prominence (by a factor of 1.76 compared to R-Z381 as a whole). The notable anomaly in R-Z338 is the high frequency in Finland. This is difficult to reconcile with the western European frequency, and lack of presence in countries further east. It is also present in the "nephew" group R-Z346, but not R-Z346's brother group, R-Z344, indicating that the split point between the two dictates when the relevant migration occurred. Hence, a solution to the Finnish problem is a migration of a largely homogeneous population sometime shortly before the R-Z344/R-Z346 split (i.e. before 1500-500 BC).

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z338>Z341
Likely MRCA data range: 1200-200 BC
Likely origin: continental North Sea coast, probably between the Rhine Delta and Jutland
Culture: pre-Germanic? (late Nordic Bronze Age or Urnfield if earlier; possibly Jastorf if later?)
Narrative evidence: R-Z341 splits into three groups: R-Z12, R-S1774 and the much smaller R-S24577. R-S24577 is too small to obtain good statistics, but shows presence in Ireland, the Netherlands, Sweden and Finland. R-S1774 is much younger, and appears to be almost entirely concentrated in the south of England. The age of this group (500 BC - 500 AD) spans the Roman period, so this could be a Roman addition, but this also overlaps with the Saxon+Jute regions, so there is a chance this could be a post-Roman Germanic migration. R-Z12 (1000-1 BC) splits into R-Z8175 (900-1 BC) and R-A5616 (500 BC - 600 AD): R-Z8175 is common in Norway, Sweden and England, hence is likely to be a Scandinavian haplogroup with a Norse component in England; R-A5616 is found almost exclusively in the British Isles, with a strong concentration in the south of Scotland. The uncertain timing of these migrations leaves their nature unclear.
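For anyone wanting to play with factors like the 1.76 quoted for R-Z341 in England, here is a minimal sketch of one plausible way such a ratio could be computed: the clade's share of testers in a region, divided by the parent clade's share in that same region. The function name and all the tester counts are invented purely for illustration (the real figures are in the spreadsheets linked at the top of this post), and the sketch ignores the dataset bias corrections discussed in Part II.

```python
def relative_frequency(clade_counts, parent_counts, region):
    """Ratio of a clade's share of testers in `region` to the parent's share.

    Values above 1 mean the clade is over-represented in that region
    relative to the parent; values below 1 mean it is under-represented.
    """
    clade_share = clade_counts[region] / sum(clade_counts.values())
    parent_share = parent_counts[region] / sum(parent_counts.values())
    if parent_share == 0:
        raise ValueError(f"parent clade has no testers in {region}")
    return clade_share / parent_share


# Hypothetical tester counts: NOT taken from the real dataset.
parent = {"England": 400, "Netherlands": 100, "Germany": 300, "Sweden": 200}
clade = {"England": 60, "Netherlands": 10, "Germany": 20, "Sweden": 10}

england_ratio = relative_frequency(clade, parent, "England")
germany_ratio = relative_frequency(clade, parent, "Germany")
print(f"England: {england_ratio:.2f}, Germany: {germany_ratio:.2f}")
```

With these made-up numbers the clade comes out over-represented in England and under-represented in Germany. With small clades a handful of testers can swing the ratio a long way, which is why so many of the entries here are hedged.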
R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1
Likely MRCA data range: 1600-700 BC
Likely origin: Scandinavia
Culture: pre-Germanic late Nordic Bronze Age
Narrative evidence: R-Z1 splits into the larger R-Z346 and smaller R-Z344 with no other surviving basal clades. Both show strong returns in England and Sweden, with few results elsewhere. A scattering is present in both north-western and north-central Europe (65% and 25% of R-Z381 rates), with progressively lower rates of return when progressing down the eastern regions (~36% in former Soviet states, ~22% in south-eastern Europe, ~13% in the Mediterranean). This suggests a pre-Germanic, Scandinavian origin, but one that did not meaningfully participate in eastern Germanic settlement.

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z344
Likely MRCA data range: 1400-400 BC
Likely origin: western(?) Scandinavia
Culture: pre- or proto-Germanic cultures
Narrative evidence: R-Z344 differs from its brother R-Z346 by having a stronger focus towards western Scandinavia, with strong returns from Denmark, Norway and Sweden. These may be split unequally between its subgroups, hence could be evidence of a later separation, but we lack the data to say that with any certainty, so we can treat it as western Scandinavian for now. It remains strong in England, the R.o.Ireland and Wales, but not so in Scotland or Northern Ireland. The sub-clade R-Z6 shows strong Belgian and perhaps Polish returns, possibly indicating later migrations related to Germanic expansion.

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z346
Likely MRCA data range: 1500-600 BC
Likely origin: eastern(?) Scandinavia
Culture: pre- or proto-Germanic cultures
Narrative evidence: R-Z346 differs from R-Z344 by having a more eastern Scandinavian focus, with good returns from Sweden and Finland (R-Z343 only), but relatively poor returns from Denmark and Norway. Elsewhere, it is common in the Netherlands and perhaps Germany, plus England and (to a slightly lesser degree) Scotland in R-DF101. It has stronger returns over the remainder of Europe than R-Z344 as a whole, indicating a wider migration path that would be typical of an origin closer to the Baltic than the Atlantic.

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z346>DF101
Likely MRCA data range: 1300-300 BC
Likely origin: southern Scandinavia, perhaps Jutland, possibly Frisia
Culture: pre- or proto-Germanic cultures (Jastorf culture if later)
Narrative evidence: R-DF101 can be split into its basal clades and the dominant R-S5245 (400 BC - 500 AD). The basal clades contain a mixture of returns from Sweden, Germany and the Netherlands, indicating a southern Scandinavian or northern German origin. The overall migration is southward at this time, however, and R-S5245 loses the Scandinavian component, so is more clearly continental. This shows a significant Scottish component, led by the Sinclair family from the far north. This region was little troubled by the known Germanic migrations, except for the Vikings, suggesting perhaps an older or more complicated migration. There are some eastern Germanic returns for R-S5245 too, so perhaps its origin is still in Scandinavia.
R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z346>Z343
Likely MRCA data range: 1200-200 BC
Likely origin: southern Scandinavia, perhaps Jutland, possibly Frisia
Culture: pre- or proto-Germanic
Narrative evidence: By contrast to R-DF101, R-Z343 is more strongly English (still with some Scots component, but little Irish). It remains common in the Netherlands, Sweden and Finland, with sporadic results from the rest of the continent. This suggests a continued Scandinavian presence, with later migrations to Finland, the Netherlands and England. Depending on the exact timing of its origin, the migration towards west Germanic countries could be part of the R-Z343 origin story, or (more likely) later on in its diversification. Its basal clades show a variety of destinations. Its larger two clades, R-FGC11784 and R-CTS5601, show geographical differences, but these appear to relate to later migrations.

R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z346>Z343>FGC11784
Likely MRCA data range: 600 BC - 500 AD
Likely origin: Jutland or surrounding countries
Culture: west Germanic
Narrative evidence: R-FGC11784 is strongly English (with a minor Welsh component), but appears largely absent in Scotland and Ireland. This likely indicates a historically recent invasion, which appears to have been shortly after the foundation of the majority clade R-S6881 (200 BC - 1000 AD). This spectrum covers the full range of west Germanic migrations to the British Isles, though the Anglo-Saxon and Danelaw migrations would appear most likely. The other basal clades are also found in the Netherlands, France and Denmark, plus isolated returns in Italy and the Czech Republic that hint at a wider west-Germanic population driven by post-Roman migration.
R-U106>Z381>L48>Z9>Z2>Z7>Z8>Z1>Z346>Z343>CTS5601
Likely MRCA data range: 1100-100 BC
Likely origin: Scandinavia, likely Sweden
Culture: proto- or early Germanic (if early), north Germanic if later
Narrative evidence: R-CTS5601 shows much stronger Scottish (and, to a lesser degree, Irish) themes than R-FGC11784, or much of the rest of R-Z8 as a whole. It remains strong in England, though only typically so compared to R-Z381 as a whole. In continental Europe, it is strong in the Netherlands, and moderately present in Germany and France. In Scandinavia, it is very strong in Sweden and Finland, and present in Norway. Sporadic returns are seen in eastern Germanic branches, but this is very much focussed towards north Germanic areas, with some western Germanic migrations happening later. Consequently, it appears that this predates the major split of the Germanic peoples into their three traditional branches, but has latterly become most associated with the northern Germanic groups.

PART VII: SUMMARY

1. OVERALL FINDINGS
So, in summary, what can we learn from this process? Firstly, we can learn that predicting origins is difficult and imprecise. We don't have robust methods for determining where haplogroups come from, but we can get some ideas by looking carefully at each haplogroup in turn. It's not enough simply to look at the testers in our dataset, because the people who test are not representative of the European population as a whole. It's also not enough simply to look at the relative numbers of people in each haplogroup, as they become biased by founder effects - both real effects caused by successful families, and artificial effects caused by intentional over-testing by individuals and family-level projects. Instead, what we need to do is look for consistent messages across the different branches of a haplogroup. Patterns specific to a haplogroup and not the haplogroups above or beside it in the haplotree indicate something unique to that haplogroup.
Patterns appearing across all branches of a haplogroup can indicate either the origin of that haplogroup (if in only one region), or that the haplogroup maintained its internal homogeneity until it managed to migrate to a new place (if in more than one region). However, separating origin from migration in this way isn't a clear-cut business, and needs to be done in the context of upstream and downstream migrations and the known contextual history of the underlying population. Ancient DNA is not only a way of helping pinpoint the origins of a haplogroup in space and time, but it also can help us solve these origin versus migration questions, and can reduce migration biases in the estimates. Finally, key to understanding migrations is knowing the dates at which they occurred, so that we can tie haplogroups and migrations to the cultures that existed at the time. Improving on the TMRCA estimates for haplogroups alone will play a significant role in better understanding the cultural origins of many major branches of the haplotree.

2. LESSONS FOR R-U106
But we can also think about the lessons for R-U106 in particular. It should be fairly obvious, but it is only so to me in hindsight: there is a major contrast between the L48- branches of the tree and the L48+ branches of the tree. By the time we are really discussing the major branches of L48, most of the fun has already happened for Z156 and Z18. Even if we can't often chart their historical cultures precisely or individually, the basal clades of R-U106, including R-L48, R-Z18, R-Z156 and R-S1688 and many of the minor clades, really seem to chart this initial spread from the Corded Ware Culture beginnings into cultures like the Single Grave Culture, the Nordic Bronze Age, and the Unetice Culture. That also means that, particularly for the basal clades, R-Z156 and R-S1688, there is a lot more missing history than for the typical R-L48 tester, because the branches on the haplotree thin out much more quickly.
We can see R-Z156 becoming established in central Europe, and R-Z18 becoming entrained in Scandinavia, but there is still 4000 years' worth of poorly uncovered history that follows and - despite having a few good ancient DNA results in several cases - there aren't enough data to form firm conclusions about what happens during that period. By contrast, the R-L48 half of our haplogroup shows a much richer prehistory. We can see diverse expansions of R-L47 out of central Europe to perhaps unexpectedly far-flung locations. We can start to see how R-Z9 fragments, with R-Z331 becoming entrenched in Germany, while R-Z2 migrates north and gives us much of our Germanic bias, and start to unpick how some of the major R-U106 early historical migrations might have happened. An important part of this is the discovery of some potentially early British-born R-U106 haplogroups: R-FGC12307, R-BY17999, R-L45, R-S1774 and R-A5616. Whether these truly represent British haplogroups, or just haplogroups that had a dominant migration to Great Britain early in their history, we may never know in many cases. The uncertainty around the dates involved also means some of these may fall into the rather more humdrum post-Roman "Anglo-Saxon" migrations. Nevertheless, particularly for fairly old clades like R-L45, it raises the possibility of finding some of the earlier migrations of R-U106 to the British Isles that we struggle to find, precisely because we don't test regions outside of the British Isles well enough to distinguish them.

I'll close with a reminder that much of what I've said over this post is very speculative. What you see is the active process of gaining a better understanding of how these methods can be developed, and the hints that are starting to appear out of the data in hand. They are not the fully robust, quantitatively checked and carefully controlled results of scientific modelling. They're crystal-ball gazing.
The dates, cultures, countries and conclusions will all change - not only as new data comes in, but as better methods of determining these migrations can be found and implemented. We are still very much at the start of this process but, for the first time, we are starting to get the data in hand to piece together the last 5000 years of human migration in Europe.
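As a postscript, the "look for consistent messages across the different branches" idea from the summary can be sketched in a few lines. This is only an illustration under stated assumptions: each branch is taken to have a relative-frequency ratio per region (above 1 meaning over-represented), the function name, the branch names and every number are hypothetical, and the threshold of 1.0 is an arbitrary choice rather than anything used in the analysis above.

```python
def consistent_regions(branch_ratios, threshold=1.0):
    """Return regions where EVERY branch exceeds `threshold`.

    `branch_ratios` maps branch name -> {region: relative-frequency ratio}.
    Only regions that appear in all branches are considered.
    """
    per_branch = list(branch_ratios.values())
    shared = set.intersection(*(set(ratios) for ratios in per_branch))
    return sorted(
        region
        for region in shared
        if all(ratios[region] > threshold for ratios in per_branch)
    )


# Hypothetical ratios for two made-up sibling branches.
branches = {
    "branch_A": {"England": 1.8, "Sweden": 1.4, "Poland": 0.3},
    "branch_B": {"England": 1.2, "Sweden": 1.6, "Poland": 1.9},
}

shared_signal = consistent_regions(branches)
print(shared_signal)
```

Here England and Sweden are flagged because both branches are enhanced there, while Poland is not, since only one branch shows it. As the summary says, a shared signal could still mean either an origin or a later migration of a still-homogeneous population, so a list like this is a starting point for interpretation and not an answer.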