Open Access

Descriptors for DNA Sequences Based on Joint Diagonalization of Their Feature Matrices from Dinucleotide Physicochemical Properties

Department of Mathematics, School of Science, Anhui Science and Technology University, Fengyang 233100, China
Machine Learning and Systems Biology Laboratory, Tongji University, Shanghai 201804, China

Abstract

Numerical characterization of DNA sequences can facilitate the analysis of similar sequences. To visualize and compare different DNA sequences compactly, a novel descriptor extraction approach was proposed for the numerical characterization and similarity analysis of sequences. First, a transformation method was introduced to represent each DNA sequence as a dinucleotide physicochemical property matrix. Then, based on approximate joint diagonalization theory, an eigenvalue vector was extracted from each DNA sequence to serve as the descriptor of that sequence. Similarity analyses were then performed by calculating the pairwise distances among the resulting eigenvalue vectors. The results show that the proposed approach can capture more sequence information and can analyze the information contained in all of the sequences jointly rather than separately. Its effectiveness was demonstrated intuitively by constructing a dendrogram for 15 beta-globin gene sequences.
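The pipeline summarized in the abstract (property matrix, shared diagonalizing basis, eigenvalue-like descriptor vector, pairwise distances) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: the property table holds made-up placeholder numbers, the per-sequence matrix is a simple second-moment matrix, and the joint diagonalization step is replaced by a crude stand-in (the eigenvectors of the mean matrix), which coincides with a true joint diagonalizer only when the matrices commute. The function names are hypothetical.

```python
import itertools
import numpy as np

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in itertools.product(BASES, repeat=2)]

# Placeholder table: 16 dinucleotides x 3 made-up "properties".
# These are NOT real physicochemical values; substitute a published table.
_rng = np.random.default_rng(0)
PROPERTY_TABLE = dict(zip(DINUCLEOTIDES, _rng.normal(size=(16, 3))))


def feature_matrix(seq):
    """Map a sequence (A/C/G/T only) to a symmetric properties-by-properties matrix."""
    seq = seq.upper()
    # One property vector per overlapping dinucleotide.
    cols = [PROPERTY_TABLE[seq[i:i + 2]] for i in range(len(seq) - 1)]
    F = np.array(cols).T             # shape: (n_properties, len(seq) - 1)
    return F @ F.T / F.shape[1]      # second-moment matrix (an assumed choice)


def shared_basis(matrices):
    """Stand-in for approximate joint diagonalization: eigenvectors of the mean matrix."""
    mean = np.mean(matrices, axis=0)
    _, V = np.linalg.eigh(mean)      # orthonormal columns, since mean is symmetric
    return V


def descriptors(seqs):
    """Return one descriptor vector per sequence: diag(V^T M V) in the shared basis."""
    mats = [feature_matrix(s) for s in seqs]
    V = shared_basis(mats)
    return np.array([np.diag(V.T @ M @ V) for M in mats])


def pairwise_distances(D):
    """Euclidean distances between all rows of the descriptor array D."""
    diff = D[:, None, :] - D[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy input strings, invented for illustration only.
toy_seqs = ["ACGTACGTAC", "ACGTTCGTAC", "TTGACCATGA"]
toy_dist = pairwise_distances(descriptors(toy_seqs))
```

Because the basis is computed from all sequences at once, each descriptor depends on the whole set, which mirrors the joint (rather than separate) analysis the abstract describes. The resulting distance matrix could then be handed to a hierarchical clustering routine to draw a dendrogram; a proper approximate joint diagonalization algorithm would replace `shared_basis` in any faithful implementation.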

Tsinghua Science and Technology
Pages 446-453
Cite this article:
Yu H, Huang D. Descriptors for DNA Sequences Based on Joint Diagonalization of Their Feature Matrices from Dinucleotide Physicochemical Properties. Tsinghua Science and Technology, 2013, 18(5): 446-453. https://doi.org/10.1109/TST.2013.6616518


Received: 15 June 2013
Revised: 04 September 2013
Accepted: 05 September 2013
Published: 03 October 2013
© The author(s) 2013