A key goal of population genetics is to understand the relative impact of mutation, drift, and natural selection. While many studies have analyzed protein-coding regions, much less is known about regulatory regions, in part because the locations of regulatory elements and the effects of variants within them are much more difficult to establish. We are developing statistical methods for measuring the impact of natural selection on noncoding sequences and using them to understand where positive selection has operated throughout great ape genomes (Haygood et al. 2007). Current projects adapt the tools of meta-analysis to combine the results of multiple genome-scale scans for selection, both to explore trends in signals of selection across analyses (Haygood et al. submitted) and to understand how positive selection is distributed among genic compartments across the genome: protein-coding regions, 5' and 3' flanking regions, 5' and 3' UTRs, first and non-first introns, and intergenic regions. We have also used support vector machines (SVMs) to predict which kinds of genes are more likely to harbor functional regulatory variation genome-wide (Tung et al. 2009b).
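To make the idea of combining multiple genome-scale scans concrete, the sketch below pools per-gene p-values from several scans using Fisher's combined probability test. This is only an illustration of one standard meta-analytic technique, not necessarily the approach used in the work cited above. The function name, gene labels, and p-values are hypothetical, and Fisher's method assumes the scans are independent, which is unlikely to hold exactly when scans share underlying data.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-gene p-values from three separate scans (illustrative only).
scan_p_values = {
    "GENE_A": [0.04, 0.10, 0.03],
    "GENE_B": [0.60, 0.45, 0.80],
}

combined = {gene: fisher_combined_p(ps) for gene, ps in scan_p_values.items()}
for gene, p in combined.items():
    print(f"{gene}: combined p = {p:.4g}")
```

A gene with modest but consistent signals across scans (the hypothetical GENE_A) receives a much smaller combined p-value than any single scan would give it, which is the kind of trend a meta-analysis is meant to surface.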

DISTRIBUTION OF POSITIVE AND NEGATIVE SELECTION THROUGHOUT THE GENOME

Distribution of positive selection during human origins. Using the human, chimpanzee, and macaque genome assemblies, we tested for branch-specific positive selection within protein-coding regions as well as their associated 5' and 3' untranslated regions (UTRs) and flanking regions. Many more genes showed evidence of positive selection in noncoding regions (red) than in coding regions (blue). As might be expected, different functional categories of genes are enriched for positive selection within the coding and noncoding partitions. Numbers indicate human chromosomes; only results for selection on the human branch are shown.
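Tests for branch-specific positive selection are often framed as a likelihood ratio test between a null model without positive selection on the branch of interest and an alternative model that allows it. The sketch below shows only that final comparison step, assuming the two maximized log-likelihoods have already been obtained from some model-fitting tool. The function name and the log-likelihood values are hypothetical, and the one-degree-of-freedom chi-square reference distribution is a simplifying assumption rather than a description of the analysis behind the figure.

```python
import math


def lrt_p_value_1df(lnl_null, lnl_alt):
    """Likelihood ratio test against a chi-square with 1 degree of freedom.

    Returns (statistic, p_value). For 1 df, the chi-square survival
    function is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        # The alternative fits no better than the null: no evidence.
        return 0.0, 1.0
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods for one region on the human branch.
stat, p = lrt_p_value_1df(lnl_null=-2451.7, lnl_alt=-2447.9)
print(f"LRT statistic = {stat:.2f}, p = {p:.4g}")
```

In a genome-wide setting this comparison would be repeated for every region, so the resulting p-values would need correction for multiple testing before counting genes as showing evidence of selection.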

THE WRAY LAB

PROJECTS : GENOME-WIDE SELECTION ANALYSES