Tuesday, September 24, 2013

Humans Have Been Evolving Like Crazy Over the Past Few Thousand Years (Smithsonian 2012)

Evolution? Most Americans, in the neighborhood of 85%, do not believe in pure evolution (in the absence of a creator).
Forty-six percent of Americans believe in the creationist view that God created humans in their present form at one time within the last 10,000 years. The prevalence of this creationist view of the origin of humans is essentially unchanged from 30 years ago, when Gallup first asked the question. About a third of Americans believe that humans evolved, but with God’s guidance; 15% say humans evolved, but that God had no part in the process.” (Gallop Poll, June 2012)

[Emphasis added to this quote.]

Despite this troubling ignorance, human beings have continued to evolve - even within the time-frame of Biblical history.

Humans Have Been Evolving Like Crazy Over the Past Few Thousand Years

November 29, 2012
A DNA molecule Photo: ynse / Wikimedia Commons
It’s a common argument of the know-it-all teen, fresh from an introductory biology course: “Life is so cushy now,” he might say, “People aren’t even evolving anymore.” As the argument goes, most people live a decently long life and have a chance to pass on their genes, since we aren’t so often being gobbled up by lions or succumbing to now-curable diseases. With this comes a dampening on the forces of natural selection, and a stagnation, or even weakening, of the human species.
But the truth, it seems, couldn’t be more different. Over the past 5 to 10 thousand years, says Nature, reporting on a new study, the genetic diversity in the human population has exploded, a bloom that serves as stage one in the process of evolution.
The human genome has been busy over the past 5,000 years. Human populations have grown exponentially, and new genetic mutations arise with each generation. Humans now have a vast abundance of rare genetic variants in the protein-encoding sections of the genome.
Brandon Keim, writing in Wired, says, “As a species, we are freshly bursting with the raw material of evolution.”
Most of the mutations that we found arose in the last 200 generations or so. There hasn’t been much time for random change or deterministic change through natural selection,” said geneticist Joshua Akey of the University of Washington, co-author of the Nov. 28 Nature study. “We have a repository of all this new variation for humanity to use as a substrate. In a way, we’re more evolvable now than at any time in our history.
Most of the new genetic shifts are extremely rare, appearing in only a small slice of the human population. The researchers look at their newly unveiled realization of the breadth of human diversity in terms of what it could mean for trying to understand the genetic basis of a number of diseases, or in what it tells us about humanity’s evolution history. But what it also means is that—come the emergence of a new disease or the turned tide of the zombie apocalypse—BAM, rapid evolution. Bring it, selection pressures. We got this.
* * * * *

Here is the full text of the original article - it's mostly incomprehensible to a non-geneticist, or at least to me.

Analysis of 6,515 exomes reveals a recent origin of most human protein-coding variants

Wenqing Fu [1], Timothy D. O'Connor [1], Goo Jun [2], Hyun Min Kang [2], Goncalo Abecasis [2], Suzanne M. Leal [3], Stacey Gabriel [4], David Altshuler [4], Jay Shendure [1], Deborah A. Nickerson [1], Michael J. Bamshad [1,5], Population Genetics Working Group, Broad GO, Seattle GO, NHLBI Exome Sequencing Project, and Joshua M. Akey [1]

1. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
2. Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
3. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
4. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
5. Department of Pediatrics, University of Washington, Seattle, Washington, USA.

Copyright notice and Disclaimer
Users may view, print, copy, download and text and data- mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms


Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history1,2 and will help facilitate the development of new approaches for disease gene discovery3. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth4-6, notable for an excess of rare genetic variants, qualitatively suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European (n=4,298) and African (n=2,217) American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that ~73% of all protein-coding SNVs and ~86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs compared to other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, illustrate the profound effect recent human history has had on the burden of deleterious SNVs segregating in contemporary populations, and provides important practical information that can be used to prioritize variants in disease gene discovery.

As part of the NHLBI sponsored Exome Sequencing Project (ESP), we sequenced the exomes of 6,515 individuals (Supplementary Table 1) including 4,298 European-Americans (EAs) and 2,217 African-Americans (AAs). Exome data were subjected to standard quality control filters as previously described6 (Supplementary Information), resulting in a data set of 1,146,401 autosomal protein-coding SNVs with a known ancestral state (709,816 and 643,128 in EAs and AAs, respectively) distributed across 15,336 protein-coding genes. To quantitatively estimate the age of each SNV (i.e., allele age), we developed a simulation approach to generate a series of coalescent trees for a specified demographic model, and estimated allele age based upon the derivation of Griffiths and Tavaré7 (Supplementary Information). We verified the accuracy and robustness of this approach to factors including recombination rate heterogeneity, population growth, migration, and purifying selection. Extensive coalescent simulations demonstrated that we could accurately estimate the expected allele age in the simulated data, although the variance associated with any individual SNV can be large (Supplementary Fig. 6 and 7).

We estimated the age of all 1,146,401 SNVs using six different previously inferred demographic models5,6,8-11, three of which considered recent explosive population growth5,6,8 (Supplementary Table 2). Estimates of allele age were generally robust across different demographic models, with the largest discrepancies resulting in a two-fold difference in average age across all SNVs (Supplementary Table 3 and Supplementary Fig. 8a). However, because most SNVs arose recently (see below), differences among demographic models were highly concordant (Supplementary Information). Accordingly, we report results based on a modified Out-of-African model9 in which accelerated population growth began 5,115 years ago with a per generation growth rate of 1.95% and 1.66% for EAs and AAs, respectively6.

The site frequency spectrum (SFS) of protein-coding SNVs revealed an enormous excess of rare variants (Fig. 1a). Indeed, we observed a SNV approximately once every 52 bp and 57 bp in EAs and AAs, respectively, whereas in a population without recent explosive growth we would expect the SNVs to occur once every 257 bp and 152 bp in EAs and AAs, respectively (Supplementary Information). Thus, the EA and AA samples contain a ~5 and ~3-fold increase in SNVs, respectively, attributable to explosive population growth, resulting in a large burden of rare SNVs predicted to have arisen very recently (Fig. 1b). For example, the expected age of derived singletons, which comprise 55.1% of all SNVs, is 1,244 and 2,107 years for the EA and AA samples, respectively. Overall, 73.2% of SNVs (81.4% and 58.7% in EAs and AAs, respectively) are predicted to have arisen in the past 5,000 years. SNVs that arose >50 thousand years (kyr) ago were observed more frequently in the AA samples (Fig. 1b), which likely reflects stronger genetic drift in EAs associated with the out of Africa dispersal.

The average age across all SNVs was 34.2±0.9 (s.d.) kyr in EAs and 47.6±1.5 kyr in AAs, and these estimates were robust to sequencing errors (Supplementary Information; Supplementary Fig. 9). As expected, SNVs shared between EAs and AAs were significantly older (104.4 kyr and 115.8 kyr for EAs and AAs, respectively) than population-specific variants (5.4 kyr and 15.3 kyr in EAs and AAs, respectively; Fig. 1c) (t-test; p<10-5 by permutation). Furthermore, there were large and significant differences among the average allele age of SNVs stratified by functional type (t-test; p<10-5 by permutation). For instance, splice site, nonsense, and non-synonymous SNVs were two to eight times younger compared to synonymous and noncoding variants (Fig. 1d). Moreover, we classified amino acids into four groups (non-polar and neutral, polar and neutral, acidic and polar, and basic and polar), and nonsynonymous SNVs resulting in changes between groups were significantly younger than those within groups (t-test; p<10-5 by permutation; Supplementary Fig. 10a). These differences in average allele age are likely due to varying intensities of selective constraint among different classes of SNVs12. Consistent with this prediction, we observed significantly higher values of the neutrality index, a measure of the direction and degree of departure from neutral evolution, in genomic regions enriched for younger variants (Spearman's correlation; p=0.004 and 0.001 for EAs and AAs, respectively; Supplementary Fig. 11), indicating a higher burden of deleterious SNVs.

To more directly identify putatively deleterious SNVs, we used four functional prediction methods (SIFT13, PolyPhen214, a likelihood ratio test15, MutationTaster16) applicable to nonsynonymous SNVs and two conservation-based methods (GERP++17 and PhyloP18) applicable to all SNVs (Supplementary Information). We found a strong inverse relationship between average SNV age and the number of methods that predicted a variant to be deleterious (Fig. 2a and 2b). Thus, SNVs predicted to be deleterious by multiple methods likely experience (on average) more intense purifying selection and may be of particular interest in disease mapping studies, or to weight differently in rare variant association tests. The age of nonsynonymous SNVs predicted to be deleterious by all six methods was 3.0 and 6.2 kyr in EAs and AAs, respectively, and 88.7% were <5 kyr (92.9% and 80.6% in EAs and AAs, respectively).

The strengths and weaknesses of functional prediction methods vary substantially and as a result the accuracy of any single method is modest15. Accordingly, we used a majority rule approach to identify a more conservative set of SNVs predicted to be deleterious6. Specifically, nonsynonymous SNVs predicted to be functionally significant by at least four methods and all other SNVs (synonymous, splice, and noncoding variants) predicted by two conservation-based methods were designated as deleterious. In total, 14.4% (164,688) of SNVs, including 152,633 nonsynonymous variants, met these criteria. We found that allele age was strongly related to the probability that a variant was predicted to be deleterious (Supplementary Fig. 12), with the fraction of SNVs predicted to be deleterious diminishing as allele age increased (Fig. 2c and Supplementary Fig. 13). The average age of conservatively defined deleterious variants was 5.2±0.3 kyr for EAs and 10.1±0.6 kyr for AAs. Moreover, 86.4% of these SNVs were predicted to have arisen in the past 5 kyr (91.2% and 77.0% for EAs and AAs, respectively), corresponding to the onset of accelerated population growth (Fig. 3a). In other demographic models, a similarly high proportion of deleterious SNVs were predicted to have arisen since the onset of accelerated growth rates, with the exact timing varying somewhat among models, but always in the timeframe of 5-10 kyr (Supplementary Table 3; Supplementary Fig. 8b and 8c).

Moreover, 7,197 (57.4%) of the 12,533 genes in EAs and 4,534 (37.5%) of the 11,607 genes in AAs that harbor one or more deleterious variants only possess deleterious SNVs with an estimated age of < 5 kyr (Fig. 3b). Thus, recent accelerated population growth has had a large influence on the number of genes harboring deleterious variants in contemporary populations. Notably, after correcting for exon length of each gene, three and eighteen genes in EAs and in AAs, respectively, have a significant excess of deleterious variants that arose after the onset of recent accelerated growth (p≤3×10-6; Supplementary Table 4), including 12 genes that have been associated with human diseases19 such as LAMC1 (premature ovarian failure20), LRP1 (Alzheimer Disease21), CPE (coronary artery atherosclerosis22), and KIAA0196 (hereditary spastic paraplegia23).

Next, we investigated the distribution of ages for conservatively defined deleterious SNVs in 849 genes that cause Mendelian disorders24, 2,663 genes associated with complex diseases19, 1,226 genes considered “essential” (i.e., a mouse knockout associated with lethality or sterility)25, and 11,711 genes classified as “other” (Supplementary Information). The proportion of deleterious SNVs in genes for Mendelian disorders (15.9%), essential genes (15.2%), and genes associated with complex diseases (15.1%) were each significantly higher (Fisher's exact test, p<10-16) compared to other genes (14.0%). In the EA samples, the proportion of deleterious SNVs did not decline monotonically as a function of age for Mendelian and essential genes. Rather, the proportion of deleterious variants with an estimated age of 50-100 kyr in Mendelian disease genes and 100-150 kyr in essential genes were elevated (Fig. 4a). This pattern was not observed in the AAs (Fig. 4a). To explore this observation, we performed simulations to estimate the probability that a deleterious SNV survives to the present day as a function of when the variant arose, the magnitude of selection, and presence or absence of an out of Africa bottleneck (Supplementary Information). Simulations of deleterious alleles in the presence of a bottleneck recapitulated the patterns observed in EAs (Supplementary Fig. 14). Specifically, in the presence of a bottleneck, weakly deleterious alleles (selection coefficient, s≤0.001) have an increased probability of survival precisely in the intervals 50-100 kyr and 100-150 kyr. Thus, our simulations suggest that genes underlying disease and essential genes are more functionally constrained relative to other genes, and the bottleneck associated with the out of Africa dispersal led to less efficient purging of weakly deleterious alleles26.

Finally, we found that the average age of deleterious variants (and the proportion of deleterious variants; Supplementary Fig. 15) was significantly different across 235 KEGG pathways (Kruskal-Wallis Rank Sum Test; p=2.5×10-3 and 1.08×10-6 for EAs and AAs, respectively; Fig. 4b; Supplementary Information). The average age across pathways did not vary significantly when all SNVs were considered (Kruskal-Wallis Rank Sum Test; p=0.259 and 0.075 for EAs and AAs, respectively), indicating the differences observed for deleterious variants likely represent heterogeneity of functional constraint across pathways. In general, the average age of deleterious variants in metabolic pathways was older than that in other pathways (Mann-Whitney test, p=1.11×10-4 and 6.27×10-9 for EAs and AAs, respectively), suggesting they are subject to less functional constraint. Conversely, deleterious variants in human disease pathways (Mann-Whitney test, p=0.03 for AAs) and in pathways involved in organismal systems were significantly younger (Mann-Whitney test, p=0.04 and 0.002 for EAs and AAs, respectively).

In summary, the spectrum of protein-coding variation is considerably different today compared to what existed even as recently as 200 – 400 generations ago. 86.4% of putatively deleterious protein-coding SNVs arose in the last 5-10 kyr, which are enriched for mutations of large effect (Supplementary Fig. 14), as selection has not had sufficient time to purge them from the population. It thus seems likely that rare variants play a significant role inheritable phenotypic variation, disease susceptibility, and adverse drug responses. In principle, our results provide a framework for developing new methods to prioritize potential disease causing variants in gene mapping studies. More generally, the recent dramatic increase in human population size, resulting in a deluge of rare functionally important variation, has important implications for understanding and predicting current and future patterns of human disease and evolution. For instance, the increased mutational capacity of recent human populations has led to a larger burden of Mendelian disorders, increased the allelic and genetic heterogeneity of traits, and may have created a new repository of recently arisen advantageous alleles that adaptive evolution will act upon in subsequent generations27.

Methods Summary

Exome sequences were obtained for 6,823 individuals, who were sequenced to high-coverage (median depth > 100x) on an Illumina GAII or HiSeq2000. Library construction, exome capture, sequencing, mapping, calling and filtering were performed as previously described, with minor modifications6 (and see Supplementary Information). After quality control and removal of related individuals, 6,515 individuals were retained. Ancestry of each individual was inferred by PCA performed on the sequence data. We developed a simulation approach based on coalescent theory to estimate allele age, which was applied to 1,146,401 autosomal SNVs with known ancestral states. A complete description of the materials and methods is provided in Supplementary Information.


We acknowledge the support of the National Heart, Lung, and Blood Institute (NHLBI) and the contributions of the research institutions, study investigators, field staff and study participants in creating this resource for biomedical research; and the Population Genetics Project Team. We thank Jim Wilson and Ron Do for critical feedback on the manuscript. Funding for GO ESP was provided by NHLBI grants RC2 HL-103010 (HeartGO), RC2 HL-102923 (LungGO), and RC2 HL-102924 (WHISP). The exome sequencing was performed through NHLBI grants RC2 HL-102925 (BroadGO) and RC2 HL-102926 (SeattleGO).


Author Contributions

WF and JMA conceived the analyses. DAN, SG, and DA oversaw data generation and QC. GJ, HMK, and GA developed algorithms and called SNVs. WF performed the majority of analyses with contributions from TDO. WF, MJB, JS, and JMA analyzed the data and wrote the manuscript with contributions from all authors.

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

References are available at the NIH site.

Post a Comment