Next generation disease mapping

Final Report Summary - NEXTGENE (Next generation disease mapping)

NEXTGENE proposed to develop and apply new methods that find more variants contributing to disease by explicitly modeling their interactions and by combining the statistical signal contributed by several rare variants. To this end, a partnership was formed between the world's largest private company sequencing individuals for association mapping (deCODE Genetics) and the Bioinformatics Research Center (BiRC; Thomas Mailund and Mikkel Heide Schierup), which hosts a large group developing new algorithms and models. Researchers were exchanged between the two institutions, typically for periods of 6 to 24 months. These included both junior researchers (advanced PhD students) and senior researchers. In addition, the PIs travelled regularly between the institutions. At deCODE, researchers were given full rights to work with the in-house data and to publish the results.
The original project description (2009) anticipated that whole-genome sequencing would grow in volume and importance relative to genotyping chips, and this has proved correct. Consequently, the methods targeted sequencing data from the beginning. At BiRC we derived expectations from evolutionary theory for the frequencies of disease-associated variants (Besenbacher, Mailund and Schierup 2012). We also investigated how disease-associated genes lie closer to one another in protein interaction space and used this to devise a new method for identifying disease genes by a guilt-by-association principle (Qian et al. 2014). Because most sequencing studies rely on mapping reads to the human reference genome, insertion and deletion polymorphisms (indels) have been undercalled, particularly those longer than 20 base pairs. We devised a new method, the first to use de novo assembly to call indels, and implemented it in the now popular software SOAPindel (see Li et al. 2012). This software has also been applied to deCODE data, where it identified indels with greater power than competing approaches. We also developed a method for efficient assembly of complete mitochondrial genomes in whole-genome sequencing studies and applied it to a large Danish data set (Li et al. 2014).
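To make the guilt-by-association idea concrete, the following is a minimal sketch and not the method of Qian et al. 2014. It assumes a simple scoring rule (the fraction of a candidate gene's direct interaction partners that are already known disease genes), and all gene names and edges are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected protein interaction graph as gene -> set of partners."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def guilt_scores(graph, known_disease_genes, candidates):
    """Score each candidate by the fraction of its partners that are known disease genes.

    This scoring rule is an illustrative assumption; genes without any
    recorded interaction get a score of 0.0.
    """
    scores = {}
    for gene in candidates:
        partners = graph.get(gene, set())
        if partners:
            scores[gene] = len(partners & known_disease_genes) / len(partners)
        else:
            scores[gene] = 0.0
    return scores


# Hypothetical toy network: D1 and D2 are known disease genes.
edges = [("G1", "D1"), ("G1", "D2"), ("G1", "X"), ("G2", "X"), ("G2", "D1")]
graph = build_graph(edges)
scores = guilt_scores(graph, {"D1", "D2"}, ["G1", "G2", "G3"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy network the candidate with the most disease-gene partners ranks first. A realistic analysis would also need to correct for the fact that well-studied genes have more recorded interactions.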
At deCODE, data have accumulated rapidly and a large number of variants have been associated with different diseases; NEXTGENE contributed directly to the following studies: Thorgeirsson et al. 2013; Qian et al. 2014; Gudbjartsson et al. 2015. There has also been a focus on estimating the rate of new mutations per generation by comparing parent and offspring genomes, since such de novo mutations are the raw material for evolution and are surprisingly often responsible for new cases of disease. The first such study was published by deCODE in 2013, with contributions from NEXTGENE personnel at both deCODE and BiRC (Kong et al. 2013). It has been followed by a suite of recent studies on similar questions, now using more than 2500 genomes sequenced at very high quality (Gudbjartsson et al. 2015). A similar initiative in Denmark, chaired by NEXTGENE personnel, has used parent-offspring trios sequenced at even higher quality to also survey new insertion and deletion mutations and to distinguish between somatic and germline mutations (Besenbacher et al. 2015).
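The comparison of parent and offspring genomes can be sketched as follows. This is a simplified illustration, not the pipeline used in the cited studies. It assumes error-free diploid genotype calls at autosomal sites and flags a site as a de novo candidate when the child's genotype cannot be formed from one maternal and one paternal allele. The site names, genotypes and counts are hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # two alleles at a diploid site


def is_de_novo_candidate(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the child's genotype violates Mendelian inheritance."""
    a, b = child
    mendelian = (a in mother and b in father) or (a in father and b in mother)
    return not mendelian


def de_novo_sites(trio_calls: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """List sites where (child, mother, father) genotypes suggest a new mutation."""
    return [
        site
        for site, (child, mother, father) in trio_calls.items()
        if is_de_novo_candidate(child, mother, father)
    ]


def mutation_rate_per_bp(n_de_novo: int, callable_bp: int) -> float:
    """Per-base-pair, per-generation rate.

    The factor 2 reflects that a child carries two transmitted haploid genomes.
    """
    return n_de_novo / (2 * callable_bp)


# Hypothetical calls for one trio: site -> (child, mother, father).
trio_calls = {
    "chr1:100": (("A", "G"), ("A", "A"), ("A", "A")),  # G is absent from both parents
    "chr1:200": (("A", "C"), ("A", "A"), ("C", "C")),  # consistent with inheritance
    "chr2:50": (("T", "T"), ("T", "C"), ("T", "T")),   # consistent with inheritance
}
candidates = de_novo_sites(trio_calls)

# Illustrative numbers only: 60 de novo mutations over 2.5 billion callable base pairs.
rate = mutation_rate_per_bp(60, 2_500_000_000)
```

In practice most apparent Mendelian violations are genotyping errors, so real analyses depend on high sequencing quality and stringent filtering before a candidate is accepted as a new mutation.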
We consider that all the major goals of the original project proposal have been reached, with more insight gained into new mutations than originally anticipated. The impact of the work is considerable, as indicated by publication in very high impact scientific journals. The field of human disease genetics is of clinical relevance, and the results are already being pursued commercially through deCODE's partners.