Generally, our work falls into the broad categories of Population Genomics, Comparative Genomics, and Computational Biology. Lab members work both on independent projects and with larger-scale, often multi-institutional, collaborative teams. Some examples of recent and ongoing projects follow.

Population Genomics of Adaptation

Adaptive changes at the phenotypic level abound in the natural world, yet we have very little understanding of their genetic determinants. We are interested in understanding the ways in which natural selection, in general, shapes patterns of genetic variation in genomes. A complete understanding of the population genomics of adaptation requires addressing at least three related issues: the identification of individual targets of natural selection, an understanding of the adaptive history of those targets, and lastly an understanding of the ways in which linked genetic loci are influenced by those targets. My research program centers on these themes, alternating between developing methods and applying them to genomic data sets. To this end we have been developing machine learning approaches, sometimes known as artificial intelligence algorithms, for inferring the location and mode of selective sweeps in genomes. An example of this approach is our method S/HIC, which uses spatial patterns of polymorphism along a recombining chromosome in concert with a supervised machine learning method called an Extra-Trees classifier to identify selective sweeps with great sensitivity and specificity (Schrider and Kern, 2016).
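
To make "spatial patterns of polymorphism" a little more concrete, here is a minimal Python sketch of how variation along a chromosome might be summarized as a feature vector for a classifier. This is an illustration only, not the S/HIC implementation: the choice of statistic (nucleotide diversity), the number of subwindows, and the function names are all assumptions made for the example.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes, start, end):
    """Mean number of pairwise differences over sites in [start, end)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    total = 0
    for a, b in combinations(haplotypes, 2):
        total += sum(1 for i in range(start, end) if a[i] != b[i])
    n_pairs = n * (n - 1) // 2
    return total / n_pairs


def windowed_feature_vector(haplotypes, n_windows):
    """Diversity in each subwindow, expressed as a fraction of the total.

    Dividing by the sum across subwindows keeps only the *relative* shape
    of diversity along the region (for example, a dip in the middle).
    """
    n_sites = len(haplotypes[0])
    bounds = [round(k * n_sites / n_windows) for k in range(n_windows + 1)]
    raw = [
        nucleotide_diversity(haplotypes, bounds[k], bounds[k + 1])
        for k in range(n_windows)
    ]
    total = sum(raw)
    if total == 0:
        # No variation anywhere: fall back to a flat profile.
        return [1.0 / n_windows] * n_windows
    return [value / total for value in raw]


# Toy data (made up): 4 haplotypes, 9 sites, no variation in the middle third.
example_haplotypes = [
    "101000010",
    "010000101",
    "100000011",
    "011000100",
]
example_features = windowed_feature_vector(example_haplotypes, 3)
```

In this toy example the middle subwindow carries no diversity while the flanking subwindows do, the kind of spatial signal a supervised classifier could learn to associate with a sweep when trained on labeled simulations.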

Machine Learning Methods for Population Genomic Prediction of Function

With the rapid explosion in genome sequence data over the past decade, geneticists find themselves in a place where data analysis, rather than data collection, is the rate-limiting step towards gleaning new biological insights. In particular, as genome sequences from thousands of individuals are now available, population geneticists are faced with the daunting task of making sense of millions of genetic variants. It is tempting now to leverage these resources to aid identification of functional elements in the genome using genetic variation as a guide. Here again, we have been using a fusion of traditional population genetics and machine learning to build powerful methods of inference. An example of this research, which utilized so-called unsupervised learning techniques, was our development of population genetic hidden Markov models for use in scanning genomes for the footprint of natural selection (Kern and Haussler, 2010). Recently in the lab we have turned our attention to supervised machine learning methods for annotating functional elements in genomes on the basis of population genomic data (Schrider and Kern, 2015).
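
As a rough illustration of how a hidden Markov model can scan a genome, the Python sketch below decodes the most likely sequence of hidden states along a stretch of sites using the Viterbi algorithm. It is not the model of Kern and Haussler (2010): the two states, the binary observations (0 = monomorphic site, 1 = polymorphic site), and every probability are made-up assumptions, and the parameters are set by hand rather than learned from data.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path (computed in log space)."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def _log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical parameters: "selected" regions emit few polymorphic sites.
STATES = ["neutral", "selected"]
LOG_START = {"neutral": math.log(0.5), "selected": math.log(0.5)}
LOG_TRANS = _log_table({
    "neutral": {"neutral": 0.9, "selected": 0.1},
    "selected": {"neutral": 0.1, "selected": 0.9},
})
LOG_EMIT = _log_table({
    "neutral": {0: 0.5, 1: 0.5},
    "selected": {0: 0.95, 1: 0.05},
})

# Made-up observations: variable flanks around a run of monomorphic sites.
sites = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
state_path = viterbi(sites, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

In an unsupervised setting the transition and emission probabilities would themselves be estimated from the data (for example with the Baum-Welch algorithm) instead of being fixed as they are here.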

Phylogenetic Prediction of Function in Genomes

While within-population genomic variation should give a species-specific view of functional regions of the genome, comparative genomic approaches are uniquely powerful for identifying functional elements that have been conserved over longer evolutionary timescales. In our lab we have used comparative genomic approaches for the phylogeny-based discovery of important genomic regions in both Drosophila and plants, with a particular emphasis on uncovering the most conserved regions of genomes, sometimes known as ultraconserved regions. Most recently our lab has published on patterns of conservation in whole-genome plant alignments that we have created (Hupalo and Kern, 2013). This work has revealed what appears to be an incredibly dynamic evolution of the non-coding portion of plant genomes, in strong contrast to what we have seen previously in mammalian alignments. To facilitate this work we have also created a web-based comparative genomics browser, based on the UCSC Genome Browser, that houses and serves our plant multiple alignment. Please go check it out here.

Novel Methods for Parameter Estimation in the IM Model

In collaboration with Dr. Jody Hey we have been working on multiple new methods for parameter estimation in the context of the Isolation with Migration (IM) model. In particular we have focused our attention on the allele frequency spectrum (AFS, sometimes also known as the site frequency spectrum) as a way to estimate parameters of the IM model efficiently. Currently we are exploring two separate paths along these lines: 1) use of the two-locus AFS to more accurately estimate migration rates when migration is a strong force (2Nm >> 1), using massively parallel simulation software that we have written, and 2) a novel continuous-time Markov chain approach that we developed.
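
For readers unfamiliar with the allele frequency spectrum, the Python sketch below tabulates a joint AFS for two populations from toy haplotype data. It shows only the basic single-locus joint spectrum, not the two-locus extension or the simulation software mentioned above. The function name and the data are made up for illustration, and the derived allele is assumed to be coded as 1.

```python
def joint_afs(pop1, pop2):
    """Tabulate the joint allele frequency spectrum of two populations.

    pop1 and pop2 are lists of haplotypes, each a sequence of 0/1 values
    with 1 marking the derived allele. Entry afs[i][j] counts the sites
    that carry i derived copies in pop1 and j derived copies in pop2.
    """
    n_sites = len(pop1[0])
    if any(len(h) != n_sites for h in pop1 + pop2):
        raise ValueError("all haplotypes must cover the same sites")
    n1, n2 = len(pop1), len(pop2)
    afs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for site in range(n_sites):
        i = sum(h[site] for h in pop1)
        j = sum(h[site] for h in pop2)
        afs[i][j] += 1
    return afs


# Toy data (made up): 2 haplotypes from population 1, 3 from population 2.
pop1 = [[1, 0, 0, 1],
        [1, 0, 1, 0]]
pop2 = [[1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0]]
spectrum = joint_afs(pop1, pop2)
```

Under an IM model, gene flow tends to move weight towards entries where an allele is present in both populations, which is one intuition for why the spectrum carries information about migration rates.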