Computational Characterization of Protein-RNA Interactions and Implications for Phase Separation

Dr. Nicolas Lux Fawzi is a member of the Scientific Advisory Board of Dewpoint Therapeutics LLC. All other Topic Editors declare no competing interests with regards to the Research Topic.
Abstract: This dissertation is about the computational analysis and prediction of RNA-protein interactions. Ribonucleic acids (RNAs) and proteins are both essential for the control of gene expression in our cells. Gene expression is the process by which a functional gene product, namely a protein or an RNA, is produced from a gene, starting from the gene region on the DNA with the transcription of an RNA. Once regarded primarily as a messenger to transmit the protein information, RNA has in recent years moved further into the biomedical spotlight, thanks to its increasingly uncovered roles in regulating gene expression. In addition, RNA has showcased its therapeutic potential, as famously demonstrated by the groundbreaking success of RNA vaccines in the COVID-19 pandemic. However, RNAs rarely function on their own: in humans, more than 1,500 different RNA-binding proteins (RBPs) are involved in controlling the various stages of an RNA's life cycle, creating a highly complex regulatory interplay between RNAs and proteins. It is therefore of fundamental importance to study these RNA-protein interactions, in order to deepen our understanding of gene expression. Over the last decade, CLIP-seq has become the dominant experimental method to identify the set of cellular RNA binding sites for an RBP of interest. However, analysing the resulting CLIP-seq data can be challenging, as there are many analysis steps and CLIP-seq protocol variants available, each requiring specific adaptations to the analysis workflow. Consequently, there is a need for analysis guidelines and easy access to tools, as well as for the constant improvement of tools and workflows to increase the accuracy of the analysis results. The first set of works included in this thesis (publications P1, P4, and P5) deals with these topics, by providing a review article on CLIP-seq data analysis, as well as two articles on how to further improve CLIP-seq data analysis.
Publication P1 supplies readers with an overview of tools and protocols, as well as guidelines to conduct a successful analysis, drawing largely from our own experience with analysing CLIP-seq data. Publication P4 demonstrates the issues current binding site identification tools have with CLIP-seq data from RBPs that bind to processed RNAs, and that the integration of RNA processing information improves the resulting binding site quality. On top of this, publication P5 presents Peakhood, the first tool that utilizes RNA processing information in order to increase the quality of RBP binding sites identified from CLIP-seq data. A natural drawback of experimental methods is that a target RNA needs to be sufficiently expressed in the observed cells for an RNA-protein interaction to be detected. Hence, since gene expression is a dynamic process that differs between cell types, time points, and conditions, a CLIP-seq experiment cannot recover the complete set of cellular RBP binding sites. This creates a demand for computational methods which can learn the binding properties of an RBP from existing CLIP-seq data, in order to predict RBP binding sites on any given target RNA. Besides interacting with proteins, RNAs can also interact with other RNAs, further increasing the amount of possible regulatory interactions between RNAs and proteins. In this regard, long non-coding RNAs (lncRNAs), a large class of non-protein-coding RNAs whose functions are still vastly unexplored, have become especially important, as it has been shown that they can engage in RNA-RNA interactions, whose regulatory mechanisms also include RNA-protein interactions. As such mechanistic studies are typically slow and expensive, computational tools that combine RNA-protein and RNA-RNA interaction predictions to infer potential mechanisms could be of great help, e.g., by screening a set of target RNAs and proteins and suggesting plausible mechanisms for experimental validation. 
The second set of works included in this thesis (publications P2 and P3) thus deals with the computational prediction of RNA-protein interactions, RNA-RNA interactions and the functional mechanisms that can be inferred from these interactions. Publication P2 introduces MechRNA, the first tool to infer functional mechanisms of lncRNAs based on their predicted interactions with RBPs and other RNAs, as well as gene expression data. We demonstrated MechRNA's capability to identify formerly described lncRNA mechanisms and experimentally validated one prediction, underlining its value for functional lncRNA studies. Finally, publication P3 presents RNAProt, a flexible and performant RBP binding site prediction tool based on recurrent neural networks. Compared to other popular deep learning methods, RNAProt achieves state-of-the-art predictive performance, as well as superior runtime efficiency. In addition, it is more feature-rich than any other available method, including the support of user-defined predictive features. We further showed that its visualizations agree with known RBP binding preferences, and demonstrated that its additional predictive features can increase the specificity of predictions.
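As a rough illustration of how an RNA sequence might be presented to a recurrent neural network for binding site prediction, the sketch below one-hot encodes a sequence, one vector per nucleotide. This is a generic encoding and an assumption on our part: the abstract does not describe RNAProt's actual input format, and the names `ALPHABET` and `one_hot_encode` are hypothetical.

```python
from typing import List

# Nucleotide order used for the encoding (an arbitrary, assumed convention).
ALPHABET = "ACGU"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one 4-element 0/1 vector per nucleotide of an RNA sequence.

    DNA-style input is accepted by mapping T to U. Symbols outside the
    alphabet (for example N) are encoded as all zeros.
    """
    normalized = sequence.upper().replace("T", "U")
    encoded = []
    for nucleotide in normalized:
        encoded.append([1 if nucleotide == base else 0 for base in ALPHABET])
    return encoded


# Hypothetical example sequence, chosen only for illustration.
for row in one_hot_encode("AUGN"):
    print(row)
```

Additional per-position features, such as the user-defined predictive features mentioned above, could in principle be appended to each vector as extra columns; how a given tool does this is not specified here.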
The work reported in this book represents an excellent example of how creative experimentation and technology development, complemented by computational data analysis, can yield important insights that further our understanding of biological entities from a systems perspective. The book describes how the study of a single RNA-binding protein and its interaction sites led to the development of the novel ‘protein occupancy profiling’ technology that for the first time captured the mRNA sequence space contacted by the ensemble of expressed RNA binders. Application of protein occupancy profiling to eukaryotic cells revealed that extensive sequence stretches in 3’ UTRs can be contacted by RBPs and that evolutionary conservation as well as negative selection act on protein-RNA contact sites, suggesting functional importance. Comparative analysis of the RBP-bound sequence space has the potential to unravel putative cis-acting RNA elements without a priori knowledge of the bound regulators. Here, Dr. Munschauer provides a comprehensive introduction to the field of post-transcriptional gene regulation, examines state-of-the-art technologies, and combines the conclusions from several journal articles into a coherent and logical story from the frontiers of systems-biology inspired life science. This thesis, submitted to the Department of Biology, Chemistry and Pharmacy at Freie Universität Berlin, was selected as outstanding work by the Berlin Institute for Medical Systems Biology at the Max-Delbrueck Center for Molecular Medicine, Germany.
A multi-discipline, hands-on guide to microarray analysis of biological processes. Analyzing Microarray Gene Expression Data provides a comprehensive review of available methodologies for the analysis of data derived from the latest DNA microarray technologies. Designed for biostatisticians entering the field of microarray analysis as well as biologists seeking to more effectively analyze their own experimental data, the text features a unique interdisciplinary approach and a combined academic and practical perspective that offers readers the most complete and applied coverage of the subject matter to date. Following a basic overview of the biological and technical principles behind microarray experimentation, the text provides a look at some of the most effective tools and procedures for achieving optimum reliability and reproducibility of research results, including: an in-depth account of the detection of genes that are differentially expressed across a number of classes of tissues; extensive coverage of both cluster analysis and discriminant analysis of microarray data and the growing applications of both methodologies; a model-based approach to cluster analysis, with emphasis on the use of the EMMIX-GENE procedure for the clustering of tissue samples; the latest data cleaning and normalization procedures; and the uses of microarray expression data for providing important prognostic information on the outcome of disease.
RNA is ubiquitous in the cellular environment, and it can function in innumerable ways with a variety of interaction partners. An RNA molecule's structure, in particular the set of base pairing interactions between the nucleotides of the molecule known as secondary structure, can help determine its function. Since most proteins can only bind to either single-stranded or double-stranded RNA, RNA secondary structure can also help determine where and how RNA-protein binding interactions occur. In this work I investigate computational models for RNA-protein interactions in a variety of different contexts. In Chapter 2 I probe the effect of single nucleotide variations on RNA-protein binding as mediated by RNA secondary structure. Single nucleotide variations are single nucleotide changes in an organism's genome that can often cause disease, and may do so through a number of different mechanisms. In this work we propose that sequence changes can affect accessibility to protein binding sites through changes in secondary structure, even when these sequence changes occur tens of nucleotides outside of protein binding sites. We find that single nucleotide variations can have a many-fold effect on the binding affinity of proteins for RNA, and characterize the genome-wide effect of single nucleotide variations on HuR binding. HuR is a single-stranded RNA-binding protein that binds to AU-rich sequences, and has links to diseases such as cancer. We also find an asymmetry in this effect for HuR, indicating that this effect may be under selection. Following the previous work, which utilizes a model incorporating single-stranded RNA-binding proteins into RNA secondary structure folding, I introduce a model for incorporating double-stranded RNA-binding proteins (dsRBPs) into RNA secondary structure partition function calculations in Chapter 3. The dsRBPs are an important but understudied class of proteins that have uses in a wide range of processes.
We implement our model in the ViennaRNA package, and validate it by calculating a number of experimental observables for transactivation response element RNA-binding protein. We find that RNA secondary structure can have a many-fold effect on the effective binding affinity of dsRBPs, and show that calculated affinities for pre-miRNA-like constructs correlate with experimentally measured processing rates. Our model provides a novel method for interrogating the interplay between dsRBPs and RNA secondary structure. In Chapter 4 I study RNA-protein interactions in a different context, and investigate the role of Shine-Dalgarno (SD) sequences in translation in the Bacteroidetes. The Bacteroidetes are a phylum of bacteria known to rarely use SD sequences, but after performing a survey of SD usage in the phylum we find that certain ribosomal protein genes utilize them, particularly rpsU. A cryo-electron microscopy structure of the ribosome from Flavobacterium johnsoniae, a member of the Bacteroidetes, also shows that S21, which is encoded by the ribosomal open reading frame rpsU, sequesters the anti-Shine-Dalgarno (ASD) sequence. In our survey of SD sequences we also find covariation between the SD sequence of rpsU and the ASD sequence. These observations suggest an autoregulatory model for S21 in the Bacteroidetes.
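The many-fold effect of secondary structure on effective binding affinity described in this abstract can be illustrated with a simple two-state picture. The sketch below assumes (this is not stated in the abstract) that a single-stranded RNA-binding protein can bind only when its site is unpaired, so that the effective dissociation constant equals the intrinsic one divided by the probability that the site is accessible. The function names and the example probabilities are hypothetical; in practice such accessibilities would come from a partition function calculation, for instance with the ViennaRNA package.

```python
def effective_kd(intrinsic_kd: float, p_accessible: float) -> float:
    """Effective Kd under the assumed two-state model.

    intrinsic_kd: Kd for a fully accessible (unpaired) binding site.
    p_accessible: equilibrium probability that the site is unpaired
                  in the absence of protein, in the interval (0, 1].
    """
    if not 0.0 < p_accessible <= 1.0:
        raise ValueError("p_accessible must be in (0, 1]")
    return intrinsic_kd / p_accessible


def affinity_fold_change(p_wild_type: float, p_variant: float) -> float:
    """Ratio Kd_eff(variant) / Kd_eff(wild type).

    Values above 1 mean the variant binds more weakly than the wild type.
    The intrinsic Kd cancels, so only the two accessibilities matter.
    """
    return effective_kd(1.0, p_variant) / effective_kd(1.0, p_wild_type)


# Hypothetical accessibilities, chosen only for illustration.
print(affinity_fold_change(p_wild_type=0.5, p_variant=0.05))
```

Under this assumption, a variant that lowers site accessibility tenfold weakens effective binding tenfold, even if the binding site sequence itself is unchanged.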
Interactions between proteins and nucleic acid molecules are central to cellular regulation and homeostasis. To study them, I employ a wide range of computational analysis methods to integrate genomic data from many types of experiment. This thesis has three parts. In the first part, I explore the patterns of indels created by CRISPR-Cas9 genome editing. By thorough characterisation of the precision of editing at thousands of genomic target sites, we identify simple sequence rules that can help predict these outcomes. Furthermore, we examine the role of the structural chromatin context in fine-tuning Cas9-DNA interactions. In the second part, I explore methods to study protein-RNA interactions. I use comparative computational analyses to assess both the data quality of, and data analysis methods for, different crosslinking and immunoprecipitation (CLIP) technologies. I then develop new methods to analyse data generated by hybrid individual-nucleotide resolution CLIP (hiCLIP). By tailoring computational solutions to an understanding of experimental conditions, I improve the overall sensitivity of hiCLIP, and ultimately feed back to drive ongoing experimental development. In the third part, I focus on the Staufen family of double-stranded RNA-binding proteins and use hiCLIP data to define transcriptome-wide atlases of RNA duplexes bound by these proteins, both in a cell line and in rat brain tissue. Through integration with other data sets, both publicly available and newly generated, I derive insights into their function in RNA metabolism, and into how these interactions change during the course of mammalian brain development, with putative roles in ribonucleoprotein complex formation. In summary, I present a range of tailored computational methods and analyses developed to understand interactions between proteins and nucleic acids, aiming to link these interactions to functional outcomes.
Protein-nucleic acid interactions are paramount for maintaining cellular homeostasis. Characterization of protein-nucleic acid complexes by high-resolution structural biology methods remains a challenge due to intrinsic structural and chemical heterogeneity. Native mass spectrometry (nMS) is a powerful bioanalytical tool for the investigation of proteins and protein complexes; however, it has only sparingly been implemented in the analysis of protein-nucleic acid complexes. This dissertation describes the application of native mass spectrometry to the analysis of RNA and RNA-protein complexes. Chapters 2 and 3 describe the characterization of the stoichiometry of the HIV-1 viral assembly nucleation complex. Using nMS, it was revealed that Gag specifically dimerizes in the presence of RNA containing the HIV-1 packaging signal (Psi), while other RNAs are bound primarily to monomeric Gag. Further investigations focused on the effect of transcription start site heterogeneity on the dimerization of the HIV-1 genomic RNA 5′ untranslated region (5′UTR) and on stoichiometry. It was observed that 5′UTRs that begin with a single guanosine preferentially dimerize and are bound by Gag. Chapter 4 focuses on the characterization of a gas-phase separation method (ion mobility) as a structural biology tool for RNA. The effects of magnesium during RNA folding, solution temperature, ionization polarity, and collisional activation on the collision cross section (CCS) of tRNAPhe were probed. It was observed that magnesium is essential for the folding and stability of tRNAPhe, consistent with previous reports. The CCS values of tRNAPhe were compared in both positive and negative ionization polarities. The CCS of tRNAPhe refolded under folding conditions was lower in negative mode relative to positive mode.
It was observed that the CCS of wild-type (WT) tRNAPhe was not affected by the solution temperatures tested; however, the CCS of a mutant (MT) tRNAPhe that has a perturbed tertiary interaction network increased as a function of solution temperature. Furthermore, we also probed the stability of these RNAs using collision-induced unfolding, and it was observed that the wild-type RNA underwent collision-induced collapse while the mutant tRNA collapsed to a lesser extent. Lastly, a small dimeric RNA-RNA complex (HJ3) was used to determine whether RNA quaternary structure is preserved upon transfer to the gas phase. The intact dimer was observed in the gas phase, and surface-induced dissociation was identified as an effective method for probing the stoichiometry of RNA-RNA complexes. Chapter 5 describes the characterization of Hfq-RNA complex stability and subunit connectivity by surface-induced dissociation and ion mobility. It was observed that the intrinsically disordered C-terminal domains greatly stabilize Hfq. RNA binding to Hfq destabilizes the wild-type protein yet stabilizes a protein that lacks C-terminal domains. The dissociation products observed for RNA-bound wild-type and the mutant Hfq were remarkably similar, suggesting that the C-terminal domains do not alter the RNA binding interfaces.