RNA secondary structure has become the most exploitable feature for ab initio detection of non-coding RNA (ncRNA) genes in genome sequences. Previous work developed Minimum Free Energy (MFE) based methods that identify ncRNAs by measuring the stability and certainty of a sequence's fold. However, these methods have yielded variable performance across different ncRNA species. Designing novel, reliable structural measures will help in developing effective ncRNA gene-finding tools. This paper introduces a new RNA structural measure based on a novel RNA secondary structure ensemble constrained by characteristics of native RNA tertiary structures. The new method makes it possible to achieve a marked performance improvement over previous structure-based methods. Test results on standard ncRNA benchmark datasets demonstrate that this method can effectively separate most ncRNA families from genomic background.
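To make the earlier stability-based idea concrete, the sketch below scores how much more structured a sequence is than shuffled versions of itself. It is only an illustration of the general approach, not the measure introduced in the paper: a Nussinov-style count of nested base pairs stands in for a real MFE calculation, and the function names, the minimum loop length, and the mononucleotide shuffle are all assumptions made for this example.

```python
import random
import statistics

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs.

    A crude stand-in for MFE (assumption): more pairs ~ more stable fold.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inside = dp[i + 1][k - 1]
                    outside = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + outside)
            dp[i][j] = best
    return dp[0][n - 1]


def stability_z_score(seq, n_shuffles=100, seed=0):
    """Z-score of the native pair count against shuffled sequences.

    Positive values mean the native order pairs better than random
    orderings with the same composition. (With a true MFE, where lower
    is more stable, the sign convention would be reversed.)
    """
    rng = random.Random(seed)
    native = max_base_pairs(seq)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)  # mononucleotide shuffle (assumption)
        shuffled_scores.append(max_base_pairs("".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0  # no variation among shuffles: score is uninformative
    return (native - mean) / sd


if __name__ == "__main__":
    hairpin = "GGGGGAAAACCCCC"  # hypothetical toy sequence
    print(max_base_pairs(hairpin), stability_z_score(hairpin))
```

A real pipeline would replace `max_base_pairs` with an energy-model-based folding routine and would typically use a shuffle that preserves dinucleotide composition.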
Dynamic regulation and packaging of genetic information are achieved by the organization of DNA into chromatin. Nucleosomal core histones, which form the basic repeating unit of chromatin, are subject to various post-translational modifications such as acetylation, methylation, phosphorylation, and ubiquitination. These modifications affect chromatin structure and, along with DNA methylation, regulate gene transcription. The goal of this study was to determine whether patterns of modification are related to different categories of genomic features and, if so, whether those patterns have predictive value. We used publicly available ChIP-chip data for different types of histone modifications (methylation and acetylation) and for DNA methylation in Arabidopsis thaliana, and applied a machine learning approach (a support vector machine) to demonstrate that the patterns of these modifications differ markedly among categories of genomic features (protein-coding, RNA, pseudogene, and transposable element). These patterns can be used to distinguish the types of genomic features. DNA methylation and H3K4me3 emerged as the features with the most discriminative power. From our analysis of Arabidopsis, we were able to predict 33 novel genomic features, whose existence was also supported by analysis of RNA-seq experiments. In summary, we present a novel approach that can be used to discriminate and detect different categories of genomic features based on their patterns of chromatin modification and DNA methylation.
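The classification step described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the study's pipeline: it trains a linear soft-margin SVM by batch subgradient descent on the regularized hinge loss, uses only two of the signals (DNA methylation and H3K4me3), reduces the problem to two classes, and runs on synthetic values. The function names, the toy signal levels, and the hyperparameters are all hypothetical; the abstract does not state the kernel, feature encoding, or software actually used.

```python
import random


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic (DNA methylation, H3K4me3) signals per genomic feature.

    The class centers are invented for illustration only.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        # +1: "gene-like" toy class (low DNA methylation, high H3K4me3)
        X.append([rng.gauss(0.2, 0.05), rng.gauss(0.8, 0.05)])
        y.append(1)
        # -1: "transposon-like" toy class (high DNA methylation, low H3K4me3)
        X.append([rng.gauss(0.8, 0.05), rng.gauss(0.2, 0.05)])
        y.append(-1)
    return X, y


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = train_linear_svm(X, y)
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    print("weights:", w, "bias:", b, "training accuracy:", correct / len(X))
```

In a linear model like this one, the magnitude of each learned weight gives a rough indication of how much the corresponding modification contributes to the separation, which is one simple way to think about the "discriminative power" of a feature. A realistic analysis would use all modification tracks, all four categories, held-out evaluation, and an established SVM library.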