Abstract
The dynamic regulation and packaging of genetic information are achieved through the organization of DNA into chromatin. Nucleosomal core histones, which form the basic repeating unit of chromatin, are subject to various post-translational modifications, including acetylation, methylation, phosphorylation, and ubiquitination. These modifications affect chromatin structure and, together with DNA methylation, regulate gene transcription. The goal of this study was to determine whether patterns of these modifications are related to different categories of genomic features and, if so, whether the patterns have predictive value. We used publicly available data (ChIP-chip) for several types of histone modification (methylation and acetylation) and for DNA methylation in Arabidopsis thaliana, and then applied a machine-learning approach (a support vector machine) to show that the patterns of these modifications differ substantially among genomic feature categories (protein-coding genes, RNA genes, pseudogenes, and transposable elements) and that these patterns can be used to distinguish the feature types. DNA methylation and H3K4me3 emerged as the most discriminative features. From our analysis of Arabidopsis, we predicted 33 novel genomic features, whose existence was also supported by RNA-seq data. In summary, we present a novel approach that can discriminate and detect different categories of genomic features based on their patterns of chromatin modification and DNA methylation.
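The abstract names a support vector machine but not its kernel, input features, or solver. As a rough illustration of the classification setup only, the sketch below trains a one-vs-rest linear SVM with the Pegasos stochastic sub-gradient method (our choice of solver, not necessarily the authors'). Each genomic feature is reduced to a hypothetical two-value vector (standardized H3K4me3 signal and standardized DNA methylation signal); the `TOY_DATA` values are invented so that the four categories are separable and are not taken from the study. The helpers `augment`, `dot`, `train_binary_svm`, `train_one_vs_rest`, and `predict`, and the parameters `lam`, `epochs`, and `seed`, are hypothetical names introduced for this sketch.

```python
import random

# Hypothetical toy inputs: (standardized H3K4me3 signal, standardized DNA
# methylation signal) per genomic feature. Values are invented for
# illustration only and do NOT come from the study's data.
TOY_DATA = [
    ((1.0, -1.0), "protein"), ((0.9, -1.1), "protein"), ((1.1, -0.9), "protein"),
    ((1.0, 1.0), "RNA"), ((0.9, 1.1), "RNA"), ((1.1, 0.9), "RNA"),
    ((-1.0, -1.0), "pseudogene"), ((-0.9, -1.1), "pseudogene"), ((-1.1, -0.9), "pseudogene"),
    ((-1.0, 1.0), "transposon"), ((-0.9, 1.1), "transposon"), ((-1.1, 0.9), "transposon"),
]


def augment(x):
    """Append a constant 1.0 so the last weight acts as a (regularized) bias."""
    return list(x) + [1.0]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(xs, ys, lam=0.01, epochs=500, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), with labels y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            x = augment(xs[i])
            violated = ys[i] * dot(w, x) < 1.0  # margin under the current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrink
            if violated:  # hinge-loss sub-gradient step
                w = [wi + eta * ys[i] * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(data, **kwargs):
    """One binary SVM per category: that category (+1) versus all others (-1)."""
    xs = [x for x, _ in data]
    return {
        label: train_binary_svm(xs, [1 if y == label else -1 for _, y in data], **kwargs)
        for label in sorted({y for _, y in data})
    }


def predict(models, x):
    """Return the category whose classifier gives the largest decision value."""
    xa = augment(x)
    return max(models, key=lambda label: dot(models[label], xa))


models = train_one_vs_rest(TOY_DATA)
print(predict(models, (0.95, -0.95)))
```

Pegasos was picked here only because it fits in a few lines of standard-library Python; a library implementation such as scikit-learn's `SVC`, which also supports non-linear kernels, would be the more usual choice for real ChIP-chip profiles.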