We first optimized a protocol for extracting high molecular weight (HMW) DNA from poplar that is suitable for long-read genome sequencing with Oxford Nanopore Technologies (ONT). Subsequently, woody cuttings from 750 poplar genotypes were grown in triplicate in a greenhouse and pure HMW DNA was prepared. We then produced a de novo assembly of a reference Populus nigra genome (accession ‘BDG’) from ONT long reads at 350x coverage, Illumina short reads and Hi-C data. The assembly totals 388 Mb in 125 fragments (19 chromosomes and 106 scaffolds) with an N50 length of 20.3 Mb. It is highly contiguous, with only 20 gaps remaining in the 19 chromosomes, and is estimated to be 96.2% complete according to a BUSCO analysis. Second, we generated long-read sequences of the 749 individuals composing the wild P. nigra population at an average depth of 23x. We identified over 9 million biallelic SNPs by retaining only SNPs identified by all three of the independent software packages used. Our variant calling analysis also revealed ~128,000 structural variants per genotype, including insertions, deletions, inversions and translocations. In addition, we identified large hemizygous regions (up to 1.2 Mb long) that could not be detected by any of the three software packages.
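The SNP filtering step described above can be illustrated with a minimal Python sketch. It assumes that "retaining SNPs identified by three independent software packages" means keeping the intersection of the three call sets, and that a call is matched on chromosome, position, reference allele and alternate allele. The `Snp` record, the function names and the toy calls are hypothetical and are not taken from the project's pipeline.

```python
from typing import Iterable, NamedTuple, Set

BASES = {"A", "C", "G", "T"}


class Snp(NamedTuple):
    """Hypothetical minimal variant record (not a full VCF row)."""
    chrom: str
    pos: int  # 1-based position on the reference
    ref: str
    alt: str


def is_biallelic_snp(v: Snp) -> bool:
    # A biallelic SNP has one single-base REF and one different single-base ALT.
    return v.ref in BASES and v.alt in BASES and v.ref != v.alt


def consensus_snps(*call_sets: Iterable[Snp]) -> Set[Snp]:
    """Keep only biallelic SNPs reported by every call set (assumed rule)."""
    if not call_sets:
        return set()
    filtered = [{v for v in calls if is_biallelic_snp(v)} for calls in call_sets]
    return set.intersection(*filtered)


# Toy call sets standing in for the three variant callers.
caller_a = [Snp("chr01", 100, "A", "G"), Snp("chr01", 200, "C", "T"),
            Snp("chr02", 50, "G", "GA")]
caller_b = [Snp("chr01", 100, "A", "G"), Snp("chr01", 200, "C", "T")]
caller_c = [Snp("chr01", 100, "A", "G"), Snp("chr02", 50, "G", "GA")]

consensus = consensus_snps(caller_a, caller_b, caller_c)
```

In this toy case only the SNP at chr01:100 survives: the call at chr01:200 is missing from one caller, and the chr02:50 call is an insertion rather than a SNP. A real pipeline would also need to normalize variant representation before comparing call sets.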
The second objective was to establish the optimal harvesting stage, tissue and extraction method for metabolite profiling. To this end, metabolite profiles of leaves at three developmental stages from 10 genotypes were generated by LC-MS. The first fully mature leaf, at leaf plastochron index 5, generated the most informative metabolite spectrum. Furthermore, metabolites extracted from leaf material of one poplar genotype are being used for purification and structural identification by the VIB Metabolomics Core facility. A high-throughput metabolite profiling method was established and applied to the leaf samples. UPLC-MS profiling of the 749 genotypes in triplicate yielded ~28,000 metabolite features.
We conducted four types of genome-wide association studies (GWAS) on the ~28,000 metabolite features. Across all analyses, we detected 691,708 significant trait-variant associations at the genome-wide threshold (Bonferroni-adjusted P < 0.001), encompassing 15,645 metabolic features and 11,292 genes, with between 1 and 6,140 variants per feature. Of these genes, 3,423 (30.3%) are predicted to encode enzymes. The four GWAS approaches revealed both shared and unique gene associations, highlighting complementary aspects of the genetic architecture underlying the traits. Several genomic regions were repeatedly associated across multiple metabolite phenotypes and variant types, suggesting the presence of pleiotropic loci.
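To make the genome-wide threshold concrete, the following Python sketch shows how a Bonferroni-adjusted cutoff of 0.001 translates into a per-variant P-value cutoff. The number of tests used here (9,000,000, loosely based on the SNP count reported above) is an illustrative assumption, since the actual number of variants tested in each GWAS is not stated. The function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Raw P-value cutoff that corresponds to a Bonferroni-adjusted alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    # Equivalent to comparing the raw P-value with alpha / n_tests.
    return bonferroni_adjust(p_value, n_tests) < alpha


ALPHA = 0.001          # adjusted threshold quoted in the text
N_TESTS = 9_000_000    # assumption: illustrative number of tested variants

raw_cutoff = bonferroni_threshold(ALPHA, N_TESTS)
neg_log10_cutoff = -math.log10(raw_cutoff)
```

Under these assumed numbers, a variant would need a raw P-value below roughly 1.1e-10 (about 9.95 on the -log10 scale) to count as significant. Testing more variants lowers this cutoff proportionally.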
As a proof of concept, our analysis focused on 89 structurally characterized poplar compounds. Using the EMMAX algorithm, a total of 3,259 significant trait-variant associations at the genome-wide threshold (Bonferroni-adjusted P < 0.001) were identified, spanning 171 genes. Among these associations, we selected 53 enzyme-encoding genes, 14 of which have been expressed in E. coli or yeast. For three genes, we already have proof of function based on enzyme assays. For two candidate genes, we have made vectors to knock out the corresponding genes by CRISPR/Cas9 in Populus tremula × P. alba.