Periodic Reporting for period 2 - ClimAHealth (Assessing the role of climate adaptation on human evolution and its implications for health.)
Reporting period: 2023-10-15 to 2024-10-14
Conclusions: We developed and applied novel machine learning models to detect adaptation signals and found that these signals consistently accumulate in genes related to thermogenesis in skeletal muscle and brown adipose tissue (SMT and BAT). The signals appeared across populations from four continents, suggesting widespread climate adaptation over the past 30,000 years. Moreover, thermogenic genes showed stronger-than-expected associations with body mass in an independent genome-wide association study (GWAS), suggesting that past climate adaptations may influence present-day cardiometabolic health.
We developed and applied a modeling framework that predicts the probability of adaptation (measured with the integrated haplotype score, iHS) from multiple predictors. In other words, we were able to assess the influence of multiple genomic factors on adaptation simultaneously. The framework considered a wide range of algorithms, from simple linear models to deep neural networks. It applied a rigorous training/evaluation scheme that first selected the best combination of parameters within each algorithm category and then made a final selection among the best candidates across all categories. The models were evaluated and tested on previously unseen data to obtain a rigorous estimate of model performance. Given the high computational burden of this model exploration, we took advantage of initial ClimAHealth results showing that adaptation signals were more visible in the African Yoruba population than in non-African populations. Yoruba was therefore a suitable population in which to stress-test our framework and find the best algorithm and architecture for detecting true iHS selection signals. The selected model was a deep neural network with 5 layers and 1000 nodes (among other parameters), which was then used to model iHS in completely different populations (25 populations spread across 4 continents). This approach achieved high predictive power, explaining ~80% of the variability of iHS on unseen data. Notably, we identified multiple genomic factors associated with adaptation across all populations. This suggests that signals of recent adaptation were not randomly distributed across the genome, which reduces the likelihood that they are false positives. In other words, according to these results, adaptation appears to have been relatively frequent in human populations over the past 30,000 years.
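The two-stage selection scheme described above can be illustrated with a minimal sketch. It is not the project's framework: the two toy algorithm categories (a one-predictor ridge regression and a k-nearest-neighbour regressor), the synthetic data, and all function names are assumptions chosen so that the example runs with the Python standard library alone. It shows the general pattern only: pick the best setting within each category on validation data, pick the best category, then report performance on data never used for selection.

```python
import random

def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def fit_ridge_1d(x, y, penalty):
    """Closed-form ridge regression with one predictor and an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + penalty)
    intercept = my - slope * mx
    return lambda queries: [intercept + slope * q for q in queries]

def fit_knn(x, y, k):
    """k-nearest-neighbour regressor: average y over the k closest x values."""
    pairs = list(zip(x, y))
    def predict(queries):
        out = []
        for q in queries:
            nearest = sorted(pairs, key=lambda p: abs(p[0] - q))[:k]
            out.append(sum(p[1] for p in nearest) / k)
        return out
    return predict

# Hypothetical algorithm categories, each with a small hyperparameter grid.
CATEGORIES = {
    "ridge": (fit_ridge_1d, [0.0, 1.0, 10.0]),
    "knn": (fit_knn, [1, 5, 15]),
}

def select_model(train, valid):
    """Stage 1: best setting per category. Stage 2: best category overall."""
    best_per_category = {}
    for name, (fit, grid) in CATEGORIES.items():
        scored = [(r_squared(valid[1], fit(train[0], train[1], h)(valid[0])), h)
                  for h in grid]
        best_per_category[name] = max(scored)  # (validation R^2, setting)
    winner = max(best_per_category, key=lambda n: best_per_category[n][0])
    return winner, best_per_category[winner][1]

def make_data(n, rng):
    """Synthetic nonlinear data standing in for predictors and a target score."""
    x = [rng.uniform(-3, 3) for _ in range(n)]
    y = [v ** 2 + rng.gauss(0, 0.5) for v in x]
    return x, y

rng = random.Random(0)
train, valid, test = make_data(300, rng), make_data(100, rng), make_data(100, rng)

winner, setting = select_model(train, valid)
final_model = CATEGORIES[winner][0](train[0], train[1], setting)
# The test set plays no part in selection, so this estimate is not optimistic.
test_r2 = r_squared(test[1], final_model(test[0]))
print(winner, setting, round(test_r2, 3))
```

Keeping the test split out of both selection stages is what makes the final score comparable to the "unseen data" figure quoted in the text; reusing the validation data for the final estimate would tend to overstate performance.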
Importantly, the predictors in the models included the distance to genes related to thermoregulation in general, and to thermogenesis in skeletal muscle and brown adipose tissue in particular. We found a higher probability of adaptation around thermogenic genes than in the rest of the genome for most of the studied populations. Given the link between thermogenesis and environmental temperature, these adaptation signals provide evidence for climate adaptation in humans. This also suggests that genetic variants in these thermogenic genes are functionally relevant, as signals of adaptation should occur in genomic regions that affect human phenotypes.
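A distance-to-gene predictor of the kind mentioned above could be computed along the following lines. This is a hedged sketch, not the project's code: the function name, the coordinates, and the assumptions that gene intervals lie on a single chromosome, are sorted by start, do not overlap, and that the list is non-empty are all introduced here for illustration.

```python
from bisect import bisect_right

def distance_to_nearest_gene(position, genes):
    """Distance in base pairs from `position` to the closest gene interval.

    `genes` is a non-empty list of (start, end) tuples on one chromosome,
    sorted by start and non-overlapping (assumptions of this sketch).
    Returns 0 when the position falls inside a gene.
    """
    starts = [start for start, _ in genes]
    i = bisect_right(starts, position)  # genes[:i] start at or before position
    candidates = []
    if i > 0:
        _, end = genes[i - 1]
        candidates.append(max(0, position - end))  # gene to the left, or containing
    if i < len(genes):
        candidates.append(genes[i][0] - position)  # gene to the right
    return min(candidates)

# Hypothetical gene coordinates and genomic window centres.
thermogenic_genes = [(1_000, 2_000), (10_000, 12_000), (50_000, 55_000)]
windows = [500, 1_500, 6_500, 30_000, 60_000]

distances = [distance_to_nearest_gene(w, thermogenic_genes) for w in windows]
print(distances)  # [500, 0, 3500, 18000, 5000]
```

Each distance could then be supplied to a model as one predictor alongside other genomic factors. The binary search keeps every lookup logarithmic in the number of genes, which matters when many genomic windows are scored.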
In the last step of this project, we performed a genome-wide association study (GWAS) using genetic variants from individuals who underwent a training regime, with cardio-respiratory performance and body mass measured before and after. We applied multiple quality control procedures to reduce confounding effects (e.g. population structure or sample relatedness), along with an imputation step on the TOPMed imputation server, resulting in 3M variants for 1K samples. The processed data were then used to calculate the association between all genetic variants and the traits under study (e.g. weight change after the training regime). From these results, we calculated the average level of association for all variants within thermogenic genes and compared it with that of randomly selected variants across the genome, which served as the random expectation. We found that thermogenic genes had a higher association with weight change than expected by chance. We have therefore defined a set of genes related to thermoregulation that exhibited an accumulation of adaptation signals and, at the same time, correlated significantly with body mass. This suggests that past events of climate-driven adaptation shaped the physiology of human populations and that this, in turn, could influence the variability of health-related traits such as body mass.
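The comparison against randomly selected variants can be sketched as a simple resampling test. The report does not state which association statistic was averaged or how the random variants were drawn, so everything below is an assumption made for illustration: the per-variant scores are synthetic, the variant identifiers are hypothetical, and random sets are drawn uniformly with the same size as the gene set.

```python
import random

def enrichment_test(scores, gene_set, n_draws=2000, seed=0):
    """Compare the mean score of `gene_set` with random variant sets of equal size.

    `scores` maps variant id -> association score (higher = stronger association).
    Returns the observed mean and an empirical one-sided p-value.
    """
    rng = random.Random(seed)
    variant_ids = list(scores)
    set_size = len(gene_set)
    observed = sum(scores[v] for v in gene_set) / set_size
    at_least_as_high = 0
    for _ in range(n_draws):
        draw = rng.sample(variant_ids, set_size)  # random expectation
        if sum(scores[v] for v in draw) / set_size >= observed:
            at_least_as_high += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_high + 1) / (n_draws + 1)
    return observed, p_value

# Synthetic genome-wide scores; the first 50 variants play the role of
# variants inside the gene set and are given an artificial shift.
rng = random.Random(1)
scores = {f"v{i}": rng.expovariate(1.0) for i in range(5000)}
gene_set = [f"v{i}" for i in range(50)]
for v in gene_set:
    scores[v] += 1.0

observed, p_value = enrichment_test(scores, gene_set)
print(round(observed, 2), p_value)
```

In practice, random sets are often matched to the gene set on properties such as allele frequency or local linkage disequilibrium, so that the null distribution reflects comparable variants. Uniform sampling is used here only to keep the sketch short.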
The human genome’s complexity has made detecting recent adaptation controversial. However, our innovative models provide strong evidence for frequent adaptation in recent evolutionary history. ClimAHealth thus contributes to key questions in human evolutionary biology and opens new research avenues by applying advanced modeling techniques to study positive selection.
Last but not least, we defined a list of genes related to brown adipose tissue. An initial validation confirmed the cohesion of the list and its relationship with BAT. Many of the genes in this list were already known to be directly implicated in BAT; the remainder are potential novel candidates. The accumulation of adaptive signals within this group further supports the functional relevance of these genes, making the list a promising source of novel candidates for improving our knowledge of BAT. This has broader implications given the influence of BAT on glucose and plasma lipids, as these candidates are potentially relevant to the genetic architecture of health-related traits.