Open Access

Descriptors for DNA Sequences Based on Joint Diagonalization of Their Feature Matrices from Dinucleotide Physicochemical Properties

Hongjie Yu, Deshuang Huang

Department of Mathematics, School of Science, Anhui Science and Technology University, Fengyang 233100, China

Machine Learning and Systems Biology Laboratory, Tongji University, Shanghai 201804, China

Abstract

Numerical characterization of DNA sequences can facilitate the analysis of similar sequences. To visualize and compare different DNA sequences compactly, a novel descriptor extraction approach was proposed for the numerical characterization and similarity analysis of sequences. First, a transformation method was introduced to represent each DNA sequence as a matrix of dinucleotide physicochemical properties. Then, based on approximate joint diagonalization theory, an eigenvalue vector was extracted from each DNA sequence and used as the descriptor of that sequence. Similarity analysis was then performed by calculating the pairwise distances among the resulting eigenvalue vectors. The results show that the proposed approach can capture more sequence information and can analyze the information contained in all of the sequences jointly rather than separately. Its effectiveness was demonstrated intuitively by constructing a dendrogram for 15 beta-globin gene sequences.
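The pipeline described in the abstract (property matrix, shared diagonalizing basis, eigenvalue-like descriptor, pairwise distances) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the property table holds placeholder random numbers rather than real dinucleotide physicochemical values, the choice of three properties is arbitrary, and the eigenvectors of the mean feature matrix are used as a crude stand-in for a true approximate joint diagonalizer. All function names are hypothetical.

```python
import itertools
import numpy as np

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in itertools.product(BASES, repeat=2)]

# ASSUMPTION: placeholder table, 16 dinucleotides x 3 made-up "properties".
# These are NOT real physicochemical values; they only fix the shapes.
_rng = np.random.default_rng(0)
PROPERTY_TABLE = dict(zip(DINUCLEOTIDES, _rng.standard_normal((16, 3))))


def feature_matrix(seq):
    """Summarise one sequence as a symmetric P x P second-moment matrix."""
    seq = seq.upper()
    # One property vector per overlapping dinucleotide along the sequence.
    cols = [PROPERTY_TABLE[seq[i:i + 2]] for i in range(len(seq) - 1)]
    x = np.array(cols).T              # shape: P x (L - 1)
    return x @ x.T / x.shape[1]       # shape: P x P


def shared_basis(matrices):
    """ASSUMPTION: eigenvectors of the mean matrix stand in for a real
    approximate joint diagonalizer; they diagonalize every matrix exactly
    only if the matrices happen to commute."""
    mean = np.mean(matrices, axis=0)
    _, vecs = np.linalg.eigh(mean)
    return vecs


def descriptors(seqs):
    """One descriptor per sequence: diagonal of V^T M_k V in the shared basis."""
    mats = [feature_matrix(s) for s in seqs]
    v = shared_basis(mats)
    return np.array([np.diag(v.T @ m @ v) for m in mats])


def pairwise_distances(desc):
    """Euclidean distance between every pair of descriptor vectors."""
    diff = desc[:, None, :] - desc[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "TTTTGGGGCC"]  # toy inputs, not beta-globin data
    print(pairwise_distances(descriptors(toy)))
```

The resulting distance matrix is the kind of input a hierarchical clustering routine would take to draw a dendrogram. Because the basis is computed from all sequences together, adding or removing a sequence changes every descriptor, which is one way to read the abstract's point about joint rather than separate analysis.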

Keywords

descriptors; approximate joint diagonalization; dendrogram; physicochemical property; similarity analysis

Tsinghua Science and Technology

Volume 18, Issue 5, October 2013

Pages 446-453

DOI: 10.1109/TST.2013.6616518

Cite this article:

Yu H, Huang D. Descriptors for DNA Sequences Based on Joint Diagonalization of Their Feature Matrices from Dinucleotide Physicochemical Properties. Tsinghua Science and Technology, 2013, 18(5): 446-453. https://doi.org/10.1109/TST.2013.6616518
