Event Report by Apoorva Baluapuri, University of Würzburg, Germany
RNA and DNA were first described by the Swiss biologist Friedrich Miescher in 1868. About 150 years later, we stand at the crossroads of the two disciplines that have arisen from dedicated research on both molecules. The first EMBL symposium on the connections between transcription and DNA replication/repair research was a major step forward in combining progress from wide-ranging topics, thus generating a consensus on how gene expression and DNA transactions cooperate.
The symposium, which was the second one from EMBL this year, was scheduled just a day after International Women’s Day, and that aligned very well with the equally represented line-up of speakers and organisers!
The titular opening session was dedicated to transcription-associated genomic instability, covering all aspects of R-loops and ribonucleotide excision repair in transcription-coupled DNA double-strand break repair. For example, Gaëlle Legube (CNRS – University of Toulouse, France) expanded in great detail on the influence of DSB-induced chromatin conformation and the strong potential of 3C-based technologies, while Elodie Hatchi (Dana Farber Cancer Institute, Boston, USA) discussed her recent publication in Nature on the impact of BRCA1, RAD52 and PALB2 on small RNA-driven DNA repair.
Eventually, we switched over to a more translational theme with Rushad Pavri (IMP, Vienna, Austria) who spoke about the relation between DNA replication timing and frequency of oncogenic translocations.
This time around, the poster sessions were equally dynamic, with topics ranging from the role of RBMX in RNA processing (Sara Luzzi, University of Newcastle) to the role of MYCN in reconciling elevated transcription levels with DNA replication (Dimitrios Papadopoulos, University of Würzburg, Germany).
To end the first day, Philippe Pasero (CNRS, France) took on the old chicken-or-egg question for toxic R-loops: are they the cause or the consequence of DNA replication stress? Andrew Deans (St. Vincent’s Institute of Medical Research, Australia) then presented fork remodellers as a general mechanism of R-loop removal.
The second day started out on a high note with a talk on the consequences of DNA damage and heat shock on Pol II from Jesper Svejstrup. Prof Svejstrup recently moved his lab from the Francis Crick Institute in London to the University of Copenhagen (Faculty of Health and Medical Sciences).
To make things even more exciting at the symposium, Martin Eilers (University of Würzburg, Germany) spoke about conflict resolution by MYCN between “friends and foes”, i.e. Pol II and the replication fork. This was followed by a talk by Marco Foiani (University of Milan, Italy), who showed the role of ATR in nuclear integrity.
In between the breaks, the participants eagerly shared the setups from which they were joining the virtual conference.
Along with this fun, the second day’s poster session continued with equally interesting topics as the previous day. The virtual conference platform provided by Engagez came across as a handy tool in coming as close as possible to the in-person poster presentations.
Frédéric Chédin (University of California, Davis, USA) closed the day by talking about the interplay between splicing and R-loops.
In the next two days, a wide variety of topics and methods were covered. For example, Nick Proudfoot (University of Oxford, UK) dazzled with the correlation between R-loops and antisense transcription, while Petra Beli (IMB, Mainz, Germany) moved the focus from genomics to proteomics with élan. She spoke about a method called “RDProx”, which maps the R-loop-proximal proteome in a native chromatin environment.
Junior group leaders also featured: Marco Saponaro (University of Birmingham, UK) addressed what happens to replication when it encounters transcription, and Madzia Crossley (Stanford University, USA) showed CytoDRIP-blots, which probe RNA-DNA hybrids on gels. These showed that SETX and BRCA1 loss, along with splicing inhibition, results in the accumulation of RNA-DNA hybrids in the cytoplasm!
Questions that went unanswered during the talks (most of which, by the way, came from younger researchers) were posted and answered in the “Forum” section, which became a valuable summary of quite a few topics.
The networking options were also in abundance, be it the Virtual Bar mixer, or Meet the Editors session on the online platform. Given that editors from elite journals like EMBO, PLoS Biology etc. were present, it gave a nice opportunity for the researchers to gauge where their next big story could find a good home.
In summary, the symposium gave the feeling of being cozy without being too small and specific in terms of the topics covered, and benefited both the experienced and young researchers in an equal way. It was a common understanding and expectation among the participants that this symposium would perhaps be held in person next time if possible.
A total of 189 posters were presented, of which two were singled out as the winners by popular vote.
Characterization of the genomic and splicing features of long non-coding RNAs using bioinformatics approaches
Authors: Monah Abou Alezz, Ludovica Celli, Giulia Belotti, Silvia Bione, Institute of Molecular Genetics L. L. Cavalli-Sforza – National Research Council, Italy
Recent developments in deep sequencing approaches have stimulated the continuous discovery of a large number of novel long non-coding RNA (lncRNA) gene loci in genomes. Long non-coding RNAs are recognized as a new class of regulatory molecules, although very little is known about their functions in cellular processes. Due to their overall low expression level and tissue specificity, the identification and annotation of lncRNA genes remains challenging. Characterizing the features of lncRNAs is crucial for gaining functional insight into their mechanisms of action. We exploited recent annotations by the GENCODE compendium to characterize the genomic and splicing features of long non-coding genes, in comparison to protein-coding ones, in the human and mouse genomes using bioinformatics approaches. Our analysis highlighted differences between the two classes of genes in terms of gene architecture, namely exon and intron length, GC content, and the combinatorial patterns of chromatin marks and states. Moreover, significant differences in splice site usage were observed between long non-coding and protein-coding genes. While non-canonical GC-AG splice junctions represent about 0.8% of total splice sites in protein-coding genes, we identified a remarkable enrichment of GC-AG splice sites in long non-coding genes, both in human (3.0%) and mouse (1.9%). In addition, we identified peculiar characteristics of the GC-AG introns in terms of donor and acceptor splice site strength, poly-pyrimidine tract, intron length, and a positional bias of GC-AG junctions, which are enriched in the first intron. Genes containing at least one GC-AG intron were found to be conserved in many species across large evolutionary distances and more prone to alternative splicing, and a functional analysis pointed toward their enrichment in specific biological processes such as
Authors: Bastian Fromm (1), Diana Domanska (2), Eirik Hoye (3), Vladimir Ovchinnikov (4), Wenjing Kang (5), Ernesto Aparicio-Puerta (6), Morten Johansen (7), Kjersti Flatmark (3), Anthony Mathelier (8), Hovig Eivind (3), Michael Hackenberg (6), Marc Friedländer (5), Kevin Peterson (9)
Non-coding RNAs (ncRNA) have gained substantial attention due to their roles in human disorders and animal development. microRNAs (miRNAs) are unique within this class as they are the only ncRNAs with individual gene sequences conserved across the animal kingdom. Bona fide miRNAs can be clearly distinguished from the myriad small RNAs generated in cells by a set of unique criteria. Unfortunately, recognition and utilization of these clear and mechanistically well understood features is not a common practice. We addressed this by extensively expanding our curated miRNA gene database MirGeneDB to 45 organisms that represent the breadth of Metazoa. By consistently annotating and naming more than 11,000 miRNA genes in these organisms, we show that previous miRNA annotations contained not only many false positives, but surprisingly many false negatives as well. Indeed, curated miRNA complements of closely related organisms are very similar and can be used to reconstruct evolution of miRNA genes, families and biogenesis across more than 1 billion years of evolution. MirGeneDB represents a robust platform for providing deeper and more significant insights into the biology of miRNAs, possible sources of mis-regulation, and evolutionary mechanisms. MirGeneDB is publicly and freely available under http://mirgenedb.org/.
(1) Science for Life Laboratory, Sweden (2) Department of Informatics, University of Oslo, Oslo, Norway (3) Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway (4) School of Life Sciences, Faculty of Health and Life Sciences, University of Nottingham, United Kingdom (5) Stockholm University, SciLifeLab, Sweden (6) Department of Genetics, Faculty of Sciences, University of Granada, Granada, Spain (7) Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway (8) Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, Oslo, Norway (9) Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
258 researchers from various fields gathered in Heidelberg last week to listen to 36 talks and engage with 146 poster presenters. Here we present the posters of 5 scientists who received best poster awards at the conference by popular vote.
Benchmarking of multi-omics joint dimensionality reduction (DR) approaches for cancer study
Authors: Laura Cantini (1), Pooya Zakeri (2), Aurelien Naldi (1), Denis Thieffry (1), Elisabeth Remy (3), Anaïs Baudot (2)
Dimensionality Reduction (DR), decomposing data into low-dimensional spaces while preserving most of their information content, is among the most prevalent machine learning techniques in data mining. With the advent of high-throughput technologies, high-dimensional data have become a standard in biology, emphasizing the use of DR. This phenomenon is particularly pronounced in cancer biology, where consortia have profiled thousands of patients for multiple molecular assays (“multi-omics”), including at the emerging single-cell scale. DR approaches have been mainly applied to single omics data, leading to cancer subtyping, tumor sub-clone quantification and immune infiltration quantification. Recently, DR approaches designed to jointly analyze multiple omics have been proposed. Integrative DR methods are based on various mathematical assumptions, ranging from extensions of CCA to tensors or more general data fusion approaches, which makes it difficult to choose which method to apply.
In this context, we benchmark multi-omics DR approaches in depth using: i) artificial multi-omics cancer data; ii) multi-omics bulk data from 10 different cancer types downloaded from TCGA; iii) multi-omics single-cell data from cancer cell lines. In (i), the capability of the various methods to predict the clustering ground truth was found to be strongly sensitive to the size of the clusters, with intNMF, RGCCA, MCIA and JIVE being the most robust methods. For (ii), MCIA, RGCCA, MOFA and JIVE more consistently identified factors associated with survival, clinical annotations and biological annotations. Finally, in (iii), despite never having been applied to single-cell data, tICA and MSFA outperformed other methods in their ability to cluster single cells based on their cell line of origin. Overall, our results show that RGCCA, MCIA and JIVE perform consistently better across the three scenarios. This suggests that a mathematical formulation based on the search for omic-specific factors whose inter-dependence is maximized better approximates the nature of multi-omics data.
(1) Institut de Biologie de l’Ecole Normale Superieure IBENS, France, (2) Aix Marseille University, INSERM, MMG, CNRS, France, (3) Aix Marseille University, CNRS, France
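As a rough illustration of scenario (i), the sketch below scores how well a predicted clustering recovers a known ground truth. The abstract does not say which agreement measure was used, so the pairwise Jaccard index here is an assumption, and the function name and toy labels are purely illustrative.

```python
from itertools import combinations


def pairwise_jaccard(truth, predicted):
    """Pair-counting Jaccard index between two clusterings.

    Counts sample pairs placed in the same cluster by both labelings,
    divided by pairs placed together by at least one of them.
    """
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    both = either = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_truth and same_pred:
            both += 1
        if same_truth or same_pred:
            either += 1
    # Two clusterings made only of singletons agree trivially.
    return both / either if either else 1.0


# Toy labels (hypothetical): six samples in two ground-truth clusters.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
noisy = [0, 0, 1, 1, 1, 1]     # one sample assigned to the wrong cluster

print(pairwise_jaccard(truth, perfect))
print(pairwise_jaccard(truth, noisy))
```

Because the score compares pairs of samples rather than label values, it is unaffected by how a method happens to number its clusters.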
Single-cell transcriptome and chromatin accessibility data integration reveals cell specific signatures
Authors: Andres Quintero (1), Anne-Claire Kröger (2), Carl Herrmann (2)
The ability to integrate multiple layers of omics data will play an essential role in understanding the complex interplay of different molecular mechanisms that give rise to cellular diversity. In particular, single-cell multi-omics studies provide an enormously valuable source of information, allowing the characterization of different cell states under different biological contexts. However, the integration of distinct cellular modalities to disentangle the regulatory networks and pathways that explain cell identity is still a challenge. Here we introduce Integrative Iterative Non-negative Matrix Factorization (i2NMF), a computational method to dissect cell type associated signatures from multi-omics data sets. i2NMF takes full advantage of data sets with multiple modalities for the same sample or cell, defining cell type-specific features and discerning the shared and specific contribution of each omics type to the identification of different cell types. We applied i2NMF to an early human embryo single-cell multi-omics data set for which scRNA-seq and scATAC-seq profiles were available for every single cell, identifying master transcription factors at the morula and blastocyst stages. Finally, i2NMF is also able to integrate different modalities across multiple experiments. We used this functionality to extract cell-type specific molecular signatures from two complementary datasets of the mouse visual cortex, comprising scATAC-seq and scRNA-seq data. i2NMF was implemented on TensorFlow, presenting a scalable framework and allowing its efficient execution under multiple systems. Our results demonstrate that i2NMF is a useful tool to identify cell-type specific signatures and dissect their underlying molecular features.
(1) German Cancer Research Center (DKFZ), Germany, (2) University Hospital Heidelberg, Germany
Linking signalling and metabolomic footprints with causal networks
Authors: Aurélien Dugourd (1), Christoph Kuppe (1), Rafael Kramann (1), Julio Saez-Rodriguez (2)
Renal clear cell carcinomas (RCCC) are the result of a system-wide dysregulation of signaling and metabolic functions originating from multiple factors. Characterizing cellular molecular machineries across multiple omic layers is a very powerful strategy to understand the cellular effects of such dysregulations. In this study, we performed metabolomics and phosphoproteomics from RCCC tissue in comparison to the non-cancerous kidney tissue in a cohort of 20 patients. In order to extract mechanistic information from these observations and to integrate both datasets, we developed a novel analysis pipeline. Phosphoproteomic abundance changes are used to estimate kinase activity changes across patients. Kinase activity estimations are then correlated with metabolite abundance changes. This points at possible interactions between signaling pathways and metabolism. We subsequently build a generic network integrating signaling pathways and metabolic reaction networks based on literature knowledge and databases. We use this signaling/metabolic network to identify paths across kinases and metabolic enzymes to link the correlated kinase activities and metabolites.
This provides potential mechanisms to explain the effect of deregulation of signaling on metabolism. Our approach was able to recover the structure of canonical signaling pathway topologies and highlight specific connections between kinases and metabolite abundance deregulated in kidney tumor tissues. This pipeline allows the extraction and comparison of mechanistic information from metabolomic, phosphoproteomic (and potentially transcriptomic) data across many kidney cancer patients. This information can be used to select potential therapeutic targets to disrupt cancer-specific cellular mechanisms, such as the SP1 kinase. Furthermore, the pipeline offers the advantage of being easily transferable to many different biological contexts.
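The two core steps of the pipeline described above (correlating kinase activity with metabolite abundance changes, then linking correlated pairs through a prior-knowledge network) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the kinase, enzyme and metabolite names, the toy values, the 0.8 cut-off and the use of Pearson correlation with a breadth-first path search are all assumptions.

```python
from collections import deque
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical per-patient changes (tumor vs. healthy tissue), four patients.
kinase_activity = {
    "kinase_A": [0.2, 1.1, 1.9, 3.2],
    "kinase_B": [1.0, -0.5, 0.8, -0.2],
}
metabolite_change = {"metabolite_X": [0.1, 0.9, 2.1, 2.8]}

# Hypothetical directed network: kinase -> metabolic enzyme -> metabolite.
network = {
    "kinase_A": ["enzyme_1"],
    "enzyme_1": ["metabolite_X"],
    "kinase_B": ["enzyme_2"],
}

THRESHOLD = 0.8  # assumed cut-off on the absolute correlation
links = {}
for kinase, activity in kinase_activity.items():
    for metabolite, change in metabolite_change.items():
        if abs(pearson(activity, change)) >= THRESHOLD:
            path = shortest_path(network, kinase, metabolite)
            if path is not None:
                links[(kinase, metabolite)] = path

print(links)
```

Only pairs that are both correlated and connected in the network are kept, so each reported link comes with a candidate mechanistic route.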
A network-based approach for the identification of multi-omics modules associated with complex human diseases
Authors: Maria Anna Wörheide (1), Jan Krumsiek (2), Gabi Kastenmüller (1), Matthias Arnold (1)
Application of advanced high-throughput omics technologies has provided us with vast amounts of quantitative, highly valuable data. For complex, heterogeneous, and untreatable diseases such as Alzheimer’s disease (AD), the integration of different omics levels and their interconnections is desperately needed to understand the underlying molecular pathomechanisms and identify potential therapeutic targets. However, integrated, multivariable analyses of cross-omics data are not straightforward and, even if successfully applied, often lack a human-comprehensible representation. Graph databases provide an intuitive and mathematically well-defined framework to store and interconnect diverse biological domains in accessible network structures. Here, we propose a network-based, multi-omics framework developed with the graph database Neo4j that allows the large-scale integration and analysis of data on biological entities across omics, as well as results from association analysis with specific (endo)phenotypes. The backbone of this framework comes from known biological relationships and functional/pathway annotations available in public databases. It is augmented with experimental, quantitative data for single omics (e.g. tissue-specific gene expression) and across omics (e.g. eQTLs or mQTLs) derived in population-based studies. To identify modules within this network that are potentially relevant to a disease such as AD, we extend the framework using large-scale association data for AD (e.g. from case-control GWASs). The resulting network comprises over 50 million nodes (entities), representing more than 30 different data types, and more than 80 million edges (relationships). We mined this comprehensive catalogue of biological information using established graph algorithms to identify potentially disease-related modules of tightly interlinked entities, and were able to obtain several subnetworks significantly enriched for AD associations.
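To make the idea of mining a network for disease-related modules concrete, here is a minimal sketch in plain Python. The abstract does not name the graph algorithms used, and the real framework runs on Neo4j; connected components stand in for "modules" here, the fraction of disease-associated nodes stands in for a proper enrichment test, and all entity names are hypothetical.

```python
def connected_components(edges):
    """Group nodes of an undirected graph into connected components."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal from each node not yet visited.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def associated_fraction(module, associated):
    """Share of a module's nodes that carry a disease association."""
    return len(module & associated) / len(module)


# Hypothetical cross-omics edges (e.g. gene-gene and gene-metabolite links).
edges = [
    ("gene_1", "metabolite_1"),
    ("gene_1", "gene_2"),
    ("gene_2", "metabolite_1"),
    ("gene_3", "metabolite_2"),
]
# Hypothetical entities with a disease association (e.g. from a GWAS).
associated = {"gene_1", "gene_2"}

modules = connected_components(edges)
ranked = sorted(modules, key=lambda m: associated_fraction(m, associated), reverse=True)
print(ranked[0], associated_fraction(ranked[0], associated))
```

On a network with tens of millions of nodes, a statistical enrichment test and a community detection method that looks for densely linked groups would replace these simple stand-ins.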
Recent high-throughput transcription factor (TF) binding assays revealed that TF cooperativity is a widespread phenomenon. However, we still lack a global mechanistic and functional understanding of TF cooperativity. To close this gap, we introduce a statistical learning framework that provides structural insight into TF cooperativity and its functional consequences based on next-generation sequencing data. We identify DNA shape as a driver of cooperativity, with a particularly strong effect for Forkhead-Ets pairs. Follow-up experiments revealed a local shape preference at the Ets-DNA-Forkhead interface and decreased cooperativity once the interaction is lost. Additionally, we discovered many novel functional associations for cooperatively bound TFs. Examining the novel link between FOXO1:ETV6 and lymphomas revealed that their joint expression levels improve patient survival stratification.
Altogether, our results demonstrate that inter-family cooperative TF binding is driven by position-specific DNA readout mechanisms, which provides an additional regulatory layer for downstream biological functions.
It’s a well-known fact that EMBL conferences present top-class science from around the world, not only from established researchers but also from up-and-coming scientists. In this brand new series we will feature some of the award-winning posters from recent EMBL conferences and symposia.
De novo selection of peptides that confer antibiotic resistance
Authors: Michael Knopp (1), Jonina Gudmundsdottir (1), Tobias Nilsson (2), Finja König (2), Omar Warsi (1), Fredrika Rajer (1), Pia Ädelroth (2), Dan Andersson (1)
The origin of novel genes and proteins is a fundamental question in evolutionary biology. New genes can originate from different mechanisms including horizontal gene transfer, duplication-divergence and de novo from non-coding DNA sequences. Comparative genomics has generated strong evidence for de novo emergence of genes in various organisms but experimental demonstration of this process has been limited to localized randomization in pre-existing structural scaffolds. This is bypassing the basic requirement of de novo gene emergence, i.e. lack of an ancestral gene. We constructed highly diverse plasmid libraries encoding randomly generated open reading frames and expressed them in Escherichia coli to identify peptides that could confer a beneficial and selectable phenotype in vivo. Selections on antibiotic-containing agar plates resulted in the identification of three inserts that increased aminoglycoside resistance up to 48-fold. Combining genetic and functional analyses, we show that the peptides are highly hydrophobic and that they insert into the membrane, reduce membrane potential, decrease aminoglycoside uptake and thereby confer high-level resistance. This study demonstrates that randomized DNA sequences can encode peptides that confer selective benefits, and illustrates how expression of random sequences could spark the origination of new genes.
(1) Uppsala University, Sweden; (2) Stockholm University, Sweden
Combinatorial genetics approach to prevent and disrupt biofilm-associated infection
Authors: Irina Afonina (1), Kimberly Kline (2), Timothy Lu (3)
Enterococci are opportunistic bacterial pathogens that cause a variety of infections including life-threatening endocarditis, chronic wounds, medical device and urinary tract infections. All of these infections are biofilm-associated, and biofilms are intrinsically more tolerant to antimicrobial clearance, which poses a major challenge in treating them. Biofilm formation is multifactorial, and the factors required can vary depending on the environment or niche where the bacteria reside. Therefore, to understand the complexity of interactions and factors that contribute to enterococcal biofilms, we are combining CRISPRi technology with rapid DNA assembly to identify gene pairs involved in biofilm formation in different infection niches. We established a dual-vector inducible CRISPRi system for Enterococcus faecalis that targets planktonic and biofilm cells with efficiency resembling that of a gene knockout. We have shown that CRISPRi targeting of a constitutively expressed gfp gene on the bacterial chromosome fully quenches the GFP signal within planktonic, early and late biofilm cells. Additionally, we have shown that silencing of the croR gene, required for bacitracin resistance, mimics a croR in-frame deletion phenotype: both the CRISPRi croR and the croR deletion strains show a reduced minimal inhibitory concentration of bacitracin compared to uninduced or wild type strains. We are creating combinatorial libraries to identify pairs and quartets of genes of all of the two-component signal transduction systems encoded in E. faecalis, to address the hypothesis that different signals will drive unique biofilm programs in different environmental conditions. This research serves as a platform to rapidly identify combinations of genes involved in enterococcal pathogenesis, including antimicrobial resistance, virulence, and immune invasion.
(1) SMART, Singapore; (2) Nanyang Technological University, Singapore; (3) Massachusetts Institute of Technology, USA
A conserved RNA seed-pairing domain directs small RNA-mediated stress resistance in enterobacteria
Authors: Nikolai Peschek (1), Mona Hoyos (1), Roman Herzog (1), Konrad U. Förstner (2), Kai Papenfort (1)
Small regulatory RNAs (sRNAs) are crucial components of many stress response systems. The envelope stress response (ESR) of Gram-negative bacteria is a paradigm for sRNA-mediated stress management and involves, among other factors, the alternative sigma factor E (σE) and one or more sRNAs. In this study, we identified the MicV sRNA as a new member of the σE regulon in Vibrio cholerae. We show that MicV acts redundantly with another sRNA, VrrA, and that both sRNAs share a conserved seed-pairing domain to regulate multiple target mRNAs. V. cholerae lacking σE displayed increased sensitivity towards antimicrobial substances and overexpression of either of the sRNAs suppressed this phenotype. Laboratory selection experiments using a library of synthetic sRNA regulators revealed that the seed-pairing domain of σE-dependent sRNAs is strongly enriched under membrane-damaging conditions and that repression of OmpA is key for sRNA-mediated stress relief. Together, our work shows that MicV and VrrA act as global regulators in the ESR of V. cholerae and provides evidence that bacterial sRNAs can be functionally annotated by their seed-pairing sequences.
The interaction between replication factor DiaA and primary metabolite sedoheptulose-7-phosphate directly regulates DNA replication in Escherichia coli
Authors: Joanna Morcinek-Orlowska (1), Aleksandra Bebel (1), Justyna Galinska (1), Torsten Waldminghaus (2), Anna Zawilak-Pawlik (3), Monika Glinkowska (1)
To proliferate, bacterial cells duplicate their genomes and this process is coordinated with cell growth and division. During the last few decades, various biochemical mechanisms controlling initiation of DNA replication in the model bacterium Escherichia coli have been characterized in detail. However, it remains elusive what constitutes a signal for the growing cell to initiate the next round of chromosomal DNA replication. Here we present evidence that a primary metabolite sedoheptulose 7-phosphate (S7P) binds to a replication factor DiaA and regulates its activity in promoting oligomerization of the DnaA initiator protein. Furthermore, our results show that the cellular level of S7P and the ability of DiaA to interact with the metabolite both influence DNA replication in vivo. S7P is an intermediate in the pentose phosphate pathway, providing building blocks for synthesis of nucleotides and a starting point for production of the outer membrane components. Consequently, we propose a mechanism which links DNA replication with cell growth through primary metabolism.