The accurate prediction of an RNA secondary structure from its sequence will enhance experimental design and interpretation for the increasing number of scientists who study RNA. While the computer programs that make these predictions have improved, additional improvements are necessary, in particular for larger RNAs. The first major section of this dissertation is concerned with improving the prediction accuracy of RNA secondary structures by generating new energetic parameters and evaluating a new RNA folding model. Statistical potentials for hairpin and internal loops produce significantly higher prediction accuracy when compared with nine other folding programs. While more improvements can be made to the energetic parameters used by secondary structure folding programs, I believe that a new approach is also necessary. I describe an RNA folding model that is predicated on a large body of computational and experimental work. This model includes energetics, contact distance, competition and a folding pathway. Each component of this folding model is evaluated and substantiated for its validity. The statistical potentials were created with comparative analysis. Comparative analysis requires the creation of highly accurate multiple RNA sequence alignments. The second major section of this dissertation is focused on my template-based sequence aligner, CRWAlign. Multiple sequence aligners generally run into problems when the pairwise sequence identity drops too low. By utilizing multiple dimensions of data to establish a profile for each position in a template alignment, CRWAlign is able to align new sequences with high accuracy even for pairs of sequences with low identity.
RNA molecules form complex higher-order structures which are essential to their biological activities. The accurate prediction of an RNA secondary structure and other higher-order structural constraints will significantly enhance the understanding of RNA molecules and help interpret their functions. Covariation analysis is the predominant computational method to accurately predict the base pairs in the secondary structure of RNAs. I developed a novel and powerful covariation method, the Phylogenetic Events Count (PEC) method, to determine positional covariation. The application of the PEC method to a bacterial 16S rRNA sequence alignment shows that it is more sensitive and accurate than other mutual-information-based methods in the identification of base pairs and other structural constraints of the RNA structure. The analysis also discovers a new type of structural constraint, the neighbor effect, between sets of nucleotides that are in proximity in the three-dimensional RNA structure and show weaker but significant covariation with one another. Utilizing these covariation methods, a proposed secondary structure model of an entire HIV-1 genome RNA is evaluated. The results reveal that the vast majority of the predicted base pairs in the proposed HIV-1 secondary structure model do not have covariation and thus lack support from comparative analysis. Generating the most accurate multiple sequence alignment is essential to performing high-quality comparative analysis. The rapid determination of nucleic acid sequences dramatically increases the number of available sequences. Thus, developing accurate and rapid alignment programs for these RNA sequences has become a vital and challenging task for deciphering the maximum amount of information from the data.
A template-based RNA sequence alignment system, CRWAlign-2, is developed to accurately align new sequences to an existing reference sequence alignment based on primary and secondary structural similarity. A comparison of CRWAlign-2 with eight alternative widely-used alignment programs reveals that CRWAlign-2 outperforms other programs in aligning new sequences with higher accuracy. In addition to aligning sequences accurately, CRWAlign-2 also creates secondary structure models for each sequence to be aligned, which provides very useful information for the comparative analysis of RNA sequences and structures. The CRWAlign-2 program also provides opportunities for multiple areas including the identification of chimeric 16S rRNA sequences generated in microbiome sequencing projects.
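To make the idea of positional covariation in the abstract above concrete, here is a minimal sketch of the standard mutual information score between two alignment columns, which is the kind of baseline the abstract compares against. It is not the PEC method, whose details are not given here; the function name, the gap handling, and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j.

    Rows with a gap in either column are skipped (an assumption of this sketch).
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)   # single-column counts, column i
    freq_j = Counter(b for _, b in pairs)   # single-column counts, column j
    freq_ij = Counter(pairs)                # joint counts for the column pair
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 4 always form a complementary
# pair (G-C, C-G, A-U, U-A), while columns 1 and 2 never vary.
toy_alignment = ["GACUC", "CACUG", "AACUU", "UACUA"]
covarying = mutual_information(toy_alignment, 0, 4)
invariant = mutual_information(toy_alignment, 1, 2)
```

In this toy case the covarying column pair scores high and the invariant pair scores zero, which also shows a known limitation of plain mutual information: fully conserved base pairs carry no covariation signal.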
With the dramatic increase in RNA 3D structure determination in recent years, we now know that RNA molecules are highly structured. Moreover, knowledge of RNA 3D structures has proven crucial for understanding in atomic detail how they carry out their biological functions. Because of the huge number of potentially important RNA molecules in biology, many more than can be studied experimentally, we need theoretical approaches for predicting 3D structures on the basis of sequences alone. This volume provides a comprehensive overview of current progress in the field by leading practitioners employing a variety of methods to model RNA 3D structures by homology, by fragment assembly, and by de novo energy and knowledge-based approaches.
This book explores recent progress in RNA secondary, tertiary structure prediction, and its application from an expansive point of view. Because of advancements in experimental protocols and devices, the integration of new types of data as well as new analysis techniques is necessary, and this volume discusses additional topics that are closely related to RNA structure prediction, such as the detection of structure-disrupting mutations, high-throughput structure analysis, and 3D structure design. Written for the highly successful Methods in Molecular Biology series, chapters feature the kind of detailed implementation advice that leads to quality research results. Authoritative and practical, RNA Structure Prediction serves as a valuable guide for both experimental and computational RNA researchers.
This dissertation, "Efficient Methods for Improving the Sensitivity and Accuracy of RNA Alignments and Structure Prediction" by Yaoman, Li, 李耀满, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author. Abstract: RNA plays an important role in molecular biology. RNA sequence comparison is an important method for analyzing gene expression. Because aligning RNA reads must handle gaps, mutations, poly-A tails, and so on, it is much more difficult than aligning other sequences. In this thesis, we study RNA-Seq alignment tools, the existing gene information database, and how to improve the accuracy of alignment and predict RNA secondary structure. The known gene information database contains a large amount of reliable gene information that has already been discovered. We also note that most DNA alignment tools are well developed. They run much faster than existing RNA-Seq alignment tools and have higher sensitivity and accuracy. Combining them with the known gene information database, we present a method to align RNA-Seq data by using DNA alignment tools. That is, we use the DNA alignment tools to perform the alignment and use the gene information to convert the alignment into a genome-based one. Although the gene information database is updated daily, many genes and alternative splicings have not yet been discovered. If our RNA alignment tool relied only on the known gene database, many reads that come from unknown genes or alternative splicings could not be aligned. Thus, we show a combinational method that can cover potential alternative splicing junction sites.
Combined with the original gene database, the new alignment tools can cover most alignments that are reported by other RNA-Seq alignment tools. Recently, many RNA-Seq alignment tools have been developed. They are more powerful and faster than the older generation of tools. However, RNA read alignment is much more complicated than other sequence alignment, and the alignments reported by some RNA-Seq alignment tools have low accuracy. We present a simple and efficient filter method based on the quality scores of the reads. It can filter out most low-accuracy alignments. Finally, we present an RNA secondary structure prediction method that can predict pseudoknots (a type of RNA secondary structure) with high sensitivity and specificity. DOI: 10.5353/th_b5153733 Subjects: Nucleotide sequence - Data processing
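The abstract above mentions a filter based on read quality scores but does not say how it is computed. As one plausible illustration only, the sketch below drops alignments whose reads have a low mean Phred quality. The record layout, the function names, the Phred+33 encoding, and the threshold of 20 are all assumptions, not the dissertation's method.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_alignments(alignments, min_mean_quality=20.0):
    """Keep alignments whose read has mean quality at or above the threshold.

    Each alignment is a dict with a 'quality' key (hypothetical layout).
    """
    return [a for a in alignments if mean_quality(a["quality"]) >= min_mean_quality]


# Hypothetical records: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    {"read_id": "r1", "quality": "IIIIIIII"},
    {"read_id": "r2", "quality": "########"},
]
kept = filter_alignments(reads)
```

A mean-score cutoff is the simplest choice; a real filter might instead weight the qualities at mismatched positions, which the abstract leaves unspecified.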
"Functional RNA sequences typically have structural elements that are highly conserved during evolution. Here we present an algorithmic method for multiple alignment of RNAs, taking into consideration both structural similarity and sequence identity. Furthermore, we performed a comparative analysis on pairing probability matrices of a set of aligned orthologous sequences and predicted the conserved secondary structure. Our alignment method outperforms the most widely used multiple alignment tool - Clustal W, and the structure prediction approach we proposed can generate a more accurate secondary structure for 5S rRNA compared to the existing approaches such as Alifold. In addition, our algorithms are efficient in terms of CPU time and memory usage compared to most existing methods for secondary structure prediction." --
With an increasing number of non-coding RNA families being identified, there is strong interest in developing computational methods to estimate sequence alignment and secondary structure. I developed TurboFold II, an algorithm that takes multiple, unaligned homologous RNA sequences, and outputs the predicted secondary structures and the structural alignment of the sequences. Secondary structure conservation information is incorporated in the alignment using a match score, calculated from estimated base pairing probabilities, to represent the secondary structural similarity between nucleotide positions in the two sequences. TurboFold II computes a multiple sequence alignment, based on a probabilistic consistency transformation and a hierarchically computed guide tree. TurboFold II has alignment accuracy comparable to that of MAFFT and higher accuracy than other tools. TurboFold II also has structure prediction accuracy comparable to that of the original TurboFold algorithm, which is one of the most accurate methods. I adapted the TurboFold II algorithm for prediction of RNA secondary structures to utilize base pairing probabilities guided by SHAPE experimental data. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy for other homologous sequences beyond the accuracy obtained by sequence comparison alone. I also developed TurboHomology, a method for secondary structure modeling and alignment for a newly discovered sequence of an RNA family with a known secondary structure and an existing multiple sequence alignment. TurboHomology achieves greater accuracy than TurboFold II by taking advantage of the known structure and alignment.
The existence of genes for RNA molecules not coding for proteins (ncRNAs) has been recognized since the 1950's, but until recently, aside from the critically important ribosomal and transfer RNA genes, most focus has been on protein coding genes. However, a long series of striking discoveries, from RNA's ability to carry out catalytic function, to discovery of riboswitches, microRNAs and other ribo-regulators performing critical tasks in essentially all living organisms, has created a burgeoning interest in this primordial component of the biosphere. However, the structural characteristics and evolutionary constraints on RNA molecules are very different from those of proteins, necessitating development of a completely new suite of informatic tools to address these challenges. In RNA Sequence, Structure, Function: Computational and Bioinformatic Methods, expert researchers in the field describe a substantial and relevant fraction of these methodologies from both practical and computational/algorithmic perspectives. Focusing on both of these directions addresses both the biologist interested in knowing more about RNA bioinformatics as well as the bioinformaticist interested in more detailed aspects of the algorithms. Written in the highly successful Methods in Molecular Biology series format, the chapters include the kind of detailed description and implementation advice that is crucial for getting optimal results. Thorough and intuitive, RNA Sequence, Structure, Function: Computational and Bioinformatic Methods aids scientists in continuing to study key methods and principles of RNA bioinformatics.
RNA exists at the heart of many important questions in biology today. Its diverse functionality is rooted in the wide range of structures RNA is able to form. The nucleotides in an RNA sequence possess the ability to form bonds with each other. Such bonding allows a strand of RNA to fold onto itself. In contrast to the iconic double helix structure of DNA, this results in intricate 3D conformations that vary with RNA sequence and in part allow the RNA to perform its cellular functions. The study of RNA's 2D folding pattern between bases in the sequence serves as an intermediate step to deciphering its complex final 3D formation. Determining this folding pattern, also called the secondary structure, remains a challenging task. In recent years, the advancement in DNA sequencing technology has popularized a number of chemical and enzymatic experiments that probe RNA molecules in a massively parallel fashion. These structure probing experiments can be performed both in vitro and in vivo and provide a wealth of information on RNA structure. The data coming from these experiments are typically quantified into a measure of reactivity per nucleotide. This reactivity is correlated with structure and thus this data is used to infer RNA structure. Combined with sequence information, these experimental datasets are typically incorporated into computational secondary structure prediction algorithms. Another class of psoralen-facilitated cross-linking experiments make use of psoralen's ability to form cross-links at interacting regions of RNA to directly probe base-pairing interactions in an RNA structure. These experiments provide direct structural information on an RNA and the resulting data have been particularly useful in uncovering alternative folding patterns for long RNA sequences. Despite the richness in experimental data, current data-driven secondary structure prediction methods suffer from major inaccuracies. 
In fact, while experimental protocols have been refined over the years, less progress has been made towards statistical characterization of structure probing data. This is even more true for the relatively new psoralen-facilitated cross-linking experiments. Further, most computational methods for structure prediction aim to predict a single optimal structure, whereas it is well-established that the same RNA sequence can exist in multiple conformations in nature. Thus, studying the entire Boltzmann ensemble of possible secondary structures for a given RNA can help uncover important underlying structures that would otherwise remain unknown. Additionally, prediction accuracy improves when abstract representations of RNA structures are used. The work done in this dissertation focuses on the development of computational tools to better utilize data coming from both types of experiments in the context of secondary structure prediction. First, we explored methods for improved signal extraction of structure probing data using signal processing techniques. We then developed a probabilistic model for characterization of structure probing data by analyzing statistical properties of such data. This model was incorporated into thermodynamics based secondary structure prediction algorithms for improved structure prediction. Finally, we studied the use of psoralen-facilitated cross-linking data to recover the structural landscape for a given RNA. We introduced a probabilistic model for these data and provide an extension of the previously developed structural landscape explorer, SLEQ. As these experiments are aimed at probing long RNAs, this extension makes use of abstract structural elements to help cluster similar structures and aggregate similar structural motifs.
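The point above about the Boltzmann ensemble can be illustrated with a small sketch: each candidate structure s with free energy E(s) has probability proportional to exp(-E(s)/RT). The dot-bracket structures and their energies below are invented for illustration and are not taken from the dissertation or from any folding program.

```python
import math

# Gas constant in kcal/(mol*K) times temperature in kelvin (37 °C).
RT = 0.0019872 * 310.15


def boltzmann_probabilities(energies, rt=RT):
    """Map each structure to its Boltzmann probability.

    `energies` maps a structure label to its free energy in kcal/mol.
    Energies are shifted by the minimum for numerical stability, which
    leaves the resulting probabilities unchanged.
    """
    e_min = min(energies.values())
    weights = {s: math.exp(-(e - e_min) / rt) for s, e in energies.items()}
    z = sum(weights.values())  # partition function over the listed structures
    return {s: w / z for s, w in weights.items()}


# Hypothetical structures (dot-bracket notation) with made-up energies.
toy_energies = {
    "((((....))))": -5.0,   # minimum free energy structure
    "((......))..": -4.5,   # close competitor
    "............": 0.0,    # unfolded
}
probs = boltzmann_probabilities(toy_energies)
```

Even in this toy ensemble, the structure that is only 0.5 kcal/mol above the minimum keeps a substantial share of the probability, which is why a single optimal structure can hide relevant alternative conformations.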