The recent EMBO|EMBL Symposium: Multiomics to Mechanisms – Challenges in Data Integration (11-13 Sep 2019) addressed ways of integrating large-scale biological data across the different omics fields.
258 researchers from various fields gathered in Heidelberg last week to listen to 36 talks and engage with 146 poster presenters. Here we present the posters of 5 scientists who received best poster awards at the conference by popular vote.
Benchmarking of multi-omics joint dimensionality reduction (DR) approaches for cancer study
Authors: Laura Cantini (1), Pooya Zakeri (2), Aurelien Naldi (1), Denis Thieffry (1), Elisabeth Remy (3), Anaïs Baudot (2)
Dimensionality Reduction (DR), which decomposes data into low-dimensional spaces while preserving most of their information content, is among the most prevalent machine learning techniques in data mining. With the advent of high-throughput technologies, high-dimensional data have become standard in biology, increasing the need for DR. This is particularly pronounced in cancer biology, where consortia have profiled thousands of patients with multiple molecular assays (“multi-omics”), including at the emerging single-cell scale. DR approaches have mainly been applied to single-omics data, leading to cancer subtyping, tumor sub-clone quantification and immune infiltration quantification. Recently, DR approaches designed to jointly analyze multiple omics have been proposed. Integrative DR methods rest on various mathematical assumptions, ranging from extensions of canonical correlation analysis (CCA) and tensor-based approaches to more general data fusion approaches, which makes it difficult to choose which method to apply.
In this context, we benchmark multi-omics DR approaches in depth using: i) artificial multi-omics cancer data; ii) multi-omics bulk data from 10 different cancer types downloaded from TCGA; iii) multi-omics single-cell data from cancer cell lines. In (i), the capability of the various methods to predict the clustering ground truth was found to be strongly sensitive to the size of the clusters, with intNMF, RGCCA, MCIA and JIVE being the most robust methods. For (ii), MCIA, RGCCA, MOFA and JIVE more consistently identified factors associated with survival, clinical annotations and biological annotations. Finally, in (iii), despite never having been applied to single-cell data before, tICA and MSFA outperformed other methods in their ability to cluster single cells based on their cell line of origin. Overall, our results show that RGCCA, MCIA and JIVE perform consistently better across the three scenarios. This suggests that a mathematical formulation based on the search for omic-specific factors whose inter-dependence is maximized better approximates the nature of multi-omics data.
(1) Institut de Biologie de l’Ecole Normale Superieure IBENS, France, (2) Aix Marseille University, INSERM, MMG, CNRS, France, (3) Aix Marseille University, CNRS, France
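To make the closing observation of this abstract more concrete: methods in the CCA family look for one factor per omics layer such that the factors are as strongly correlated as possible. The sketch below is a minimal, unregularized canonical correlation analysis in NumPy on simulated data. It is not RGCCA, MCIA or JIVE, and it is not the authors' benchmark code; the function name `cca_first_pair` and the toy matrices are illustrative assumptions. It only makes sense when each layer has more samples than features.

```python
import numpy as np

def cca_first_pair(X, Y):
    """Return the first pair of canonical variates and their correlation.

    X, Y: (samples x features) matrices measured on the same samples.
    Assumes more samples than features and full column rank (no regularization).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of each centred layer.
    Ux, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Yc, full_matrices=False)
    # The singular values of Ux^T Uy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)
    factor_x = Ux @ U[:, 0]   # omic-specific factor for layer X
    factor_y = Uy @ Vt[0]     # omic-specific factor for layer Y
    return factor_x, factor_y, s[0]

# Simulated example: one hidden signal shared by two hypothetical omics layers.
rng = np.random.default_rng(0)
n_samples = 200
shared = rng.normal(size=n_samples)
load_x = np.array([1.0, -0.8, 0.6, 0.5, 0.3])
load_y = np.array([0.9, 0.7, -0.5, 0.4])
omics_x = np.outer(shared, load_x) + 0.3 * rng.normal(size=(n_samples, 5))
omics_y = np.outer(shared, load_y) + 0.3 * rng.normal(size=(n_samples, 4))

fx, fy, rho = cca_first_pair(omics_x, omics_y)
print(f"first canonical correlation: {rho:.3f}")
```

Real multi-omics data usually have far more features than samples, which is why regularized or sparse variants are used in practice rather than this plain form.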
Single-cell transcriptome and chromatin accessibility data integration reveals cell specific signatures
Authors: Andres Quintero (1), Anne-Claire Kröger (2), Carl Herrmann (2)
The ability to integrate multiple layers of omics data will play an essential role in understanding the complex interplay of different molecular mechanisms that give rise to cellular diversity. In particular, single-cell multi-omics studies provide an enormously valuable source of information, allowing the characterization of different cell states under different biological contexts. However, the integration of distinct cellular modalities to disentangle the regulatory networks and pathways that explain cell identity is still a challenge. Here we introduce Integrative Iterative Non-negative Matrix Factorization (i2NMF), a computational method to dissect cell type associated signatures from multi-omics data sets. i2NMF takes full advantage of data sets with multiple modalities for the same sample or cell, defining cell type-specific features and discerning the shared and specific contribution of each omics type to the identification of different cell types. We applied i2NMF to an early human embryo single-cell multi-omics data set for which scRNA-seq and scATAC-seq profiles were available for every single cell, identifying master transcription factors at the morula and blastocyst stages. Finally, i2NMF is also able to integrate different modalities across multiple experiments. We used this functionality to extract cell-type specific molecular signatures from two complementary datasets of the mouse visual cortex, comprising scATAC-seq and scRNA-seq data. i2NMF was implemented on TensorFlow, presenting a scalable framework and allowing its efficient execution under multiple systems. Our results demonstrate that i2NMF is a useful tool to identify cell-type specific signatures and dissect their underlying molecular features.
(1) German Cancer Research Center (DKFZ), Germany, (2) University Hospital Heidelberg, Germany
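The abstract does not spell out the i2NMF algorithm, so the following is only a generic illustration of the underlying idea: two non-negative matrices measured on the same cells are factorized with a shared cell-loading matrix H and one feature matrix W per modality, using standard multiplicative updates. The function names, the toy data and the choice of update rule are assumptions made for this sketch, and it uses NumPy rather than TensorFlow.

```python
import numpy as np

def joint_nmf(views, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize each view V_i (features x cells) as W_i @ H with a shared H.

    Minimizes the sum of squared reconstruction errors over all views
    with multiplicative updates, which keep every entry non-negative.
    """
    rng = np.random.default_rng(seed)
    n_cells = views[0].shape[1]
    H = rng.random((k, n_cells))
    Ws = [rng.random((V.shape[0], k)) for V in views]
    for _ in range(n_iter):
        # Update each modality-specific feature matrix with H fixed.
        for i, V in enumerate(views):
            Ws[i] = Ws[i] * (V @ H.T) / (Ws[i] @ H @ H.T + eps)
        # Update the shared cell loadings using all modalities at once.
        num = sum(W.T @ V for W, V in zip(Ws, views))
        den = sum(W.T @ W @ H for W in Ws) + eps
        H = H * num / den
    return Ws, H

def reconstruction_error(views, Ws, H):
    """Sum of squared Frobenius errors across all views."""
    return sum(np.linalg.norm(V - W @ H) ** 2 for V, W in zip(views, Ws))

# Toy data: 8 cells from two hypothetical cell types, profiled in two modalities.
rng = np.random.default_rng(1)
cell_type = np.array([0, 0, 0, 0, 1, 1, 1, 1])
H_true = np.vstack([cell_type == 0, cell_type == 1]).astype(float)
rna = rng.random((6, 2)) @ H_true + 0.01 * rng.random((6, 8))
atac = rng.random((5, 2)) @ H_true + 0.01 * rng.random((5, 8))

Ws, H = joint_nmf([rna, atac], k=2)
print("reconstruction error:", reconstruction_error([rna, atac], Ws, H))
print("dominant signature per cell:", H.argmax(axis=0))
```

Sharing H is what ties the modalities together: each column of H describes one cell, while each W holds the features of one modality that characterize a signature.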
Linking signalling and metabolomic footprints with causal networks
Authors: Aurélien Dugourd (1), Christoph Kuppe (1), Rafael Kramann (1), Julio Saez-Rodriguez (2)
Renal clear cell carcinomas (RCCC) are the result of a system-wide dysregulation of signaling and metabolic functions originating from multiple factors. Characterizing cellular molecular machineries across multiple omic layers is a very powerful strategy to understand the cellular effects of such dysregulations. In this study, we performed metabolomics and phosphoproteomics on RCCC tissue and compared it with non-cancerous kidney tissue in a cohort of 20 patients. In order to extract mechanistic information from these observations and to integrate both datasets, we developed a novel analysis pipeline. Phosphoproteomic abundance changes are used to estimate kinase activity changes across patients. Kinase activity estimations are then correlated with metabolite abundance changes. This points to possible interactions between signaling pathways and metabolism. We subsequently build a generic network integrating signaling pathways and metabolic reaction networks based on literature knowledge and databases. We use this signaling/metabolic network to identify paths across kinases and metabolic enzymes to link the correlated kinase activities and metabolites.
This provides potential mechanisms to explain the effect of deregulation of signaling on metabolism. Our approach was able to recover canonical signaling pathway topologies and highlight specific connections between kinases and metabolite abundance deregulated in kidney tumor tissues. This pipeline makes it possible to extract and compare mechanistic information from metabolomic, phosphoproteomic (and potentially transcriptomic) data across many kidney cancer patients. This information can be used to select potential therapeutic targets to disrupt cancer specific cellular mechanisms, such as the SP1 kinase. Furthermore, the pipeline offers the advantage of being easily transferable to many different biological contexts.
(1) RWTH Uniklinikum Aachen, Germany, (2) Heidelberg University, Germany
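The two middle steps of the pipeline described in this abstract (correlating kinase activities with metabolite changes, then searching a prior-knowledge network for a path between them) can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the kinase activities are assumed to be already estimated, all node names and numbers are made up, and Pearson correlation with a breadth-first path search is just one plausible choice.

```python
from collections import deque
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def shortest_path(graph, source, target):
    """Breadth-first search on a directed adjacency dict; node list or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def link_kinases_to_metabolites(kinases, metabolites, graph, min_abs_r=0.8):
    """Keep strongly correlated kinase/metabolite pairs that the network connects."""
    links = []
    for k_name, k_vals in kinases.items():
        for m_name, m_vals in metabolites.items():
            r = pearson(k_vals, m_vals)
            if abs(r) >= min_abs_r:
                path = shortest_path(graph, k_name, m_name)
                if path is not None:
                    links.append((k_name, m_name, round(r, 3), path))
    return links

# Hypothetical tumor-vs-healthy changes across five patients.
kinase_activity = {
    "kinase_A": [1.2, 0.8, 1.5, 0.3, 1.1],
    "kinase_B": [0.1, -0.2, 0.0, 0.3, -0.1],
}
metabolite_change = {"metabolite_X": [2.1, 1.7, 2.9, 0.4, 2.0]}

# Hypothetical signaling/metabolic network (directed edges).
network = {
    "kinase_A": ["enzyme_1"],
    "enzyme_1": ["metabolite_X"],
    "kinase_B": ["enzyme_2"],
}

links = link_kinases_to_metabolites(kinase_activity, metabolite_change, network)
for kinase, metabolite, r, path in links:
    print(kinase, metabolite, r, " -> ".join(path))
```

The correlation only flags candidate pairs; the path through the network is what turns a statistical association into a testable mechanistic hypothesis.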
A network-based approach for the identification of multi-omics modules associated with complex human diseases
Authors: Maria Anna Wörheide (1), Jan Krumsiek (2), Gabi Kastenmüller (1), Matthias Arnold (1)
The application of advanced high-throughput omics technologies has provided us with vast amounts of quantitative, highly valuable data. For complex, heterogeneous, and untreatable diseases such as Alzheimer’s disease (AD), the integration of different omics levels and their interconnections is urgently needed to understand the underlying molecular pathomechanisms and identify potential therapeutic targets. However, integrated, multivariable analyses of cross-omics data are not straightforward and, even when successfully applied, often lack a human-comprehensible representation. Graph databases provide an intuitive and mathematically well-defined framework to store and interconnect diverse biological domains in accessible network structures. Here, we propose a network-based, multi-omics framework developed with the graph database Neo4j that allows the large-scale integration and analysis of data on biological entities across omics, as well as results from association analyses with specific (endo)phenotypes. The backbone of this framework comes from known biological relationships and functional/pathway annotations available in public databases. It is augmented with experimental, quantitative data for single omics (e.g. tissue-specific gene expression) and across omics (e.g. eQTLs or mQTLs) derived in population-based studies. To identify modules within this network that are potentially relevant to a disease such as AD, we extend the framework using large-scale association data for AD (e.g. from case-control GWASs). The resulting network comprises over 50 million nodes (entities), representing more than 30 different data types, and more than 80 million edges (relationships). We mined this comprehensive catalogue of biological information using established graph algorithms to identify potentially disease-related modules of tightly interlinked entities, and were able to obtain several subnetworks significantly enriched for AD associations.
(1) Helmholtz Zentrum München, Germany, (2) Weill Cornell Medicine, United States of America
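The abstract does not say which statistical test was used to call a subnetwork "significantly enriched". One common choice, assumed here purely for illustration, is a hypergeometric test: given a module of n nodes drawn from a network of N nodes, of which K carry a disease association, how likely is an overlap at least as large as the one observed? The node names and set sizes below are made up, and finding the modules themselves is not shown.

```python
from math import comb

def enrichment_p_value(module, associated, all_nodes):
    """Hypergeometric P(overlap >= observed) for a module within a network."""
    N = len(all_nodes)                 # nodes in the whole network
    K = len(associated & all_nodes)    # nodes with a disease association
    n = len(module)                    # nodes in the candidate module
    k = len(module & associated)       # associated nodes inside the module
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical network of 20 nodes, 4 of them disease-associated.
all_nodes = {f"node_{i}" for i in range(20)}
associated = {"node_0", "node_1", "node_2", "node_3"}

# Candidate module of 5 nodes containing 3 associated nodes.
module = {"node_0", "node_1", "node_2", "node_10", "node_11"}

p = enrichment_p_value(module, associated, all_nodes)
print(f"enrichment p-value: {p:.4f}")
```

With many candidate modules, the resulting p-values would also need a multiple-testing correction before any module is called significant.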
Mechanistic insights into transcription factor cooperativity and its impact on protein-phenotype interactions
Authors: Ignacio Ibarra, Nele Hollmann, Bernd Klaus, Sandra Augsten, Britta Velten, Janosch Hennig, Judith Zaugg (EMBL Heidelberg)
Recent high-throughput transcription factor (TF) binding assays revealed that TF cooperativity is a widespread phenomenon. However, we still lack a global mechanistic and functional understanding of TF cooperativity. To close this gap, we introduce a statistical learning framework that provides structural insight into TF cooperativity and its functional consequences based on next-generation sequencing data. We identify DNA shape as a driver of cooperativity, with a particularly strong effect for Forkhead-Ets pairs. Follow-up experiments revealed a local shape preference at the Ets-DNA-Forkhead interface and decreased cooperativity once the interaction is lost. Additionally, we discovered many novel functional associations for cooperatively bound TFs. Examining the novel link between FOXO1:ETV6 and lymphomas revealed that their joint expression levels improve patient survival stratification.
Altogether, our results demonstrate that inter-family cooperative TF binding is driven by position-specific DNA readout mechanisms, which provides an additional regulatory layer for downstream biological functions.
Working on your own conference poster? Then check out 10 tips to create a scientific poster people want to stop by.