Genome Plasticity and Computational Genetics
Despite the high stability of DNA molecules, genomes change over time. Changes can be introduced by controlled pathways such as meiotic recombination, or they can arise spontaneously or through exposure to mutagens. We are interested in how and to what degree genomes change, and in how these changes influence the phenotype. As this requires the reconstruction of genome sequences, we explore the possibilities of state-of-the-art genomics and try to push these technologies beyond their current limits.
Genome Assembly
To analyse the extent of genomic differences, genome sequences need to be reconstructed (assembled) and compared. One active line of research in our group is the development and implementation of new methods for genome assembly, and we are currently exploring the possibilities of long-read sequencing data. We have used PacBio sequencing data in combination with short-read sequencing to generate the first chromosome-level assembly of an Arabidopsis plant since the release of the reference sequence in 2000 (Zapata et al, 2016, PNAS). We are now developing new methods that combine long-read sequencing data with optical mapping and Hi-C-based genome scaffolding data to reconstruct entire chromosomes from high-throughput data alone.
Understanding the source of evolution: Accumulation of mutations
Mutations introduce genomic variation, which is the raw material for selection and consequently for adaptation. Understanding the rate, spectrum and source of mutations is essential for understanding evolutionary processes. We were involved in an early analysis of mutations across the entire Arabidopsis genome (Ossowski et al, 2010, Science). Whereas this first study focused only on spontaneous mutations under “normal” growth conditions, we are now extending these experiments to decipher the impact of abiotic stress and of polyploid genomes.
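As a rough illustration of what "rate" and "spectrum" mean in practice, the following minimal Python sketch classifies base substitutions into transitions and transversions and computes a naive per-site, per-generation rate. The function names, the input format and the simple denominator (lines × generations × callable sites, with no ploidy or detection-power correction) are assumptions made for this example and are not taken from the cited study.

```python
from collections import Counter
from typing import Iterable, Tuple

# Purine<->purine (A/G) and pyrimidine<->pyrimidine (C/T) changes are transitions;
# every other base substitution is a transversion.
_TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mutation_spectrum(substitutions: Iterable[Tuple[str, str]]) -> Counter:
    """Count transitions and transversions in (reference, mutant) base pairs."""
    spectrum = Counter()
    for ref, alt in substitutions:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution, skip
        kind = "transition" if (ref, alt) in _TRANSITIONS else "transversion"
        spectrum[kind] += 1
    return spectrum


def per_site_rate(n_mutations: int, n_lines: int,
                  n_generations: int, callable_sites: int) -> float:
    """Naive rate estimate: mutations per callable site per generation.

    Simplifying assumption: every line was propagated for the same number
    of generations and has the same number of callable sites.
    """
    denominator = n_lines * n_generations * callable_sites
    if denominator <= 0:
        raise ValueError("lines, generations and callable sites must be positive")
    return n_mutations / denominator
```

With hypothetical inputs, `per_site_rate(10, 5, 20, 1_000_000)` divides 10 mutations by 100 million site-generations. A real analysis would also need to account for false negatives and for sites that cannot be called reliably.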
Computational Genetics: Bridging the gap between genotype and phenotype
Combining next-generation sequencing (NGS) and genetics is a powerful way to bridge the gap between genotypes and phenotypes. In the early days of NGS, we took part in developing the first method to simultaneously map and identify causal mutations from forward genetic screens in a single sequencing experiment, which later became known as SHOREmapping or mapping-by-sequencing (Schneeberger et al, 2009, Nat Methods).
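The core idea of mapping-by-sequencing can be sketched in a few lines of Python: in a pool of mutant recombinants, the allele inherited from the mutant parent approaches fixation around the causal locus, so scanning allele frequencies along a chromosome points to a candidate interval. This is a simplified illustration and not the SHOREmap implementation. The marker tuple format, the function names and the fixed-size windows are assumptions made for this example.

```python
from typing import List, Tuple

# (position, reads supporting the mutant-parent allele, total reads at the marker)
Marker = Tuple[int, int, int]


def window_allele_frequencies(markers: List[Marker], window: int,
                              step: int) -> List[Tuple[int, float]]:
    """Return (window start, pooled allele frequency) for every covered window."""
    if not markers:
        return []
    last_position = max(pos for pos, _, _ in markers)
    frequencies = []
    start = 0
    while start <= last_position:
        end = start + window
        in_window = [(alt, total) for pos, alt, total in markers
                     if start <= pos < end]
        alt_reads = sum(alt for alt, _ in in_window)
        total_reads = sum(total for _, total in in_window)
        if total_reads > 0:  # skip windows without any marker coverage
            frequencies.append((start, alt_reads / total_reads))
        start += step
    return frequencies


def candidate_interval(markers: List[Marker], window: int,
                       step: int) -> Tuple[int, int, float]:
    """Return (start, end, frequency) of the window closest to fixation."""
    frequencies = window_allele_frequencies(markers, window, step)
    if not frequencies:
        raise ValueError("no covered windows")
    best_start, best_freq = max(frequencies, key=lambda item: item[1])
    return best_start, best_start + window, best_freq
```

Pooling read counts per window, rather than averaging per-marker frequencies, gives markers with higher coverage more weight. Real data would additionally call for filtering of unreliable markers and for some form of smoothing.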
Mapping-by-sequencing paved the way for a series of sequence analysis algorithms for mutation identification. These include the identification of sequence differences without a reference sequence, using only the presence/absence patterns of k-mers in the DNA sequence reads of different samples, and the reconstruction of allele frequency shifts in pools of genomes (Galvao et al, 2012, Plant J; Nordström et al, 2013, Nat Biotech; Sun et al, 2015, Methods Mol Biol).
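The reference-free comparison can be illustrated with a minimal Python sketch: count k-mers in the reads of two samples and keep those that are well supported in one sample but absent from the other. This is only a toy version of the idea and not the published method. The function names, the `min_count` threshold (a crude guard against sequencing errors) and the use of canonical k-mers are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers in reads, skipping k-mers with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def sample_specific_kmers(reads_a: Iterable[str], reads_b: Iterable[str],
                          k: int, min_count: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return k-mers seen at least min_count times in one sample and never in the other."""
    counts_a = count_kmers(reads_a, k)
    counts_b = count_kmers(reads_b, k)
    only_a = {kmer for kmer, n in counts_a.items()
              if n >= min_count and kmer not in counts_b}
    only_b = {kmer for kmer, n in counts_b.items()
              if n >= min_count and kmer not in counts_a}
    return only_a, only_b
```

Sample-specific k-mers cluster around true sequence differences, so the reads containing them are natural starting points for a local assembly of the variant. At realistic genome sizes, in-memory counters like these would be replaced by a dedicated k-mer counting tool.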
Mapping-by-sequencing is most powerful when applied to phenotypes with a single underlying causal locus (e.g. mutants from forward genetic screens). However, as most natural phenotypes are encoded by more than a single locus, we continued our method development for GWAS and created a method that does not require any population structure correction (Klasen et al, in revision). Population structure correction severely penalizes association results to avoid an inflation of false associations, but it can also hamper the identification of real associations. We have applied this method to diverse sets of segregating populations, and our first applications have already allowed us to clone a previously undescribed gene for plant root development.
The future of association studies
Without doubt, the greatest variation in phenotypes is found between species rather than within species. These differences offer a great opportunity to understand the genetic basis of partially conserved traits as well as the evolution of these traits. However, most species are probably too distantly related for this, and only sets of closely related species will allow for such associations. We recently started developing genotype-to-phenotype association methods that associate morphological differences between species with differences in their reference genomes (Willing et al, 2015, Nat Plants). We apply these new methods to a broad panel of diverse Brassicaceae species that we cultivate in our lab.