A New Hidden Markov Model for Protein Quality Assessment Using Compatibility Between Protein Sequence and Structure

Zhiquan He; Wenji Ma; Jingfen Zhang; Dong Xu

doi:10.1109/TST.2014.6961026


Figures (7): Fig. 1 to Fig. 7

Tables (5): Table 1 to Table 5

Open Access


Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, MO 65211, USA.

Christopher S. Bond Life Sciences Center, University of Missouri, MO 65211, USA.

Department of Computer Science, City University of Hong Kong, Hong Kong, China.


Abstract

Protein structure Quality Assessment (QA) is an essential component of protein structure prediction and analysis. The relationship between protein sequence and structure often serves as a basis for protein structure QA. In this work, we developed a new Hidden Markov Model (HMM) that captures this complex relationship by assessing the compatibility between protein sequence and structure. More specifically, the emissions of the HMM consist of protein local structures in angular space, secondary structures, and sequence profiles. The model has two capabilities: (1) it encodes the local structure at each position by jointly considering sequence and structure information, and (2) it assigns a global score to estimate the overall quality of a predicted structure, as well as local scores to assess the quality of specific regions of the structure, which provide useful guidance for targeted structure refinement. We compared the HMM with the state-of-the-art single-structure quality assessment methods OPUSCA, DFIRE, GOAP, and RW in protein structure selection. Computational results showed that our new score, HMM.Z, achieves better overall selection performance on the benchmark datasets.
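The abstract describes an HMM whose emissions combine local structure in angular space, secondary structure, and sequence profiles, and which yields both a global score and per-position local scores. The sketch below illustrates that general idea with a standard log-space forward-backward pass; it is not the authors' model, and every specific choice is an assumption: angles are reduced to a discrete bin index, the three emission components are treated as conditionally independent given the hidden state, the profile term is the profile-weighted log-probability under the state's amino-acid distribution, the global score is the log-likelihood per residue, and the local score is the posterior-weighted log emission at each position. The names `log_emission`, `forward_backward`, and `hmm_scores` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One residue's observation (hypothetical encoding, not the paper's exact one):
#   angle_bin: index of a discretized local-structure angle bin
#   ss:        secondary-structure class index (e.g. 0 = helix, 1 = strand, 2 = coil)
#   profile:   amino-acid frequencies at this position from a sequence profile
Obs = Tuple[int, int, Sequence[float]]
# Per-state emission parameters: (angle-bin probs, SS probs, amino-acid probs)
StateParams = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_emission(obs: Obs, params: StateParams) -> float:
    """Joint log emission score of one residue in one hidden state.

    Assumes angle, SS and profile are conditionally independent given the
    state; the profile term is sum_a f(a) * log q_state(a).
    """
    angle_bin, ss, profile = obs
    angle_p, ss_p, aa_p = params
    profile_term = sum(f * math.log(q) for f, q in zip(profile, aa_p) if f > 0)
    return math.log(angle_p[angle_bin]) + math.log(ss_p[ss]) + profile_term


def forward_backward(observations, start, trans, states):
    """Log-space forward-backward.

    Returns (log-likelihood, per-position state posteriors, log emission table).
    """
    n, k = len(observations), len(start)
    e = [[log_emission(o, states[s]) for s in range(k)] for o in observations]
    lt = [[math.log(p) for p in row] for row in trans]
    # Forward pass: fwd[i][s] = log P(obs[0..i], state_i = s)
    fwd = [[math.log(start[s]) + e[0][s] for s in range(k)]]
    for i in range(1, n):
        fwd.append([e[i][s] + logsumexp([fwd[i - 1][r] + lt[r][s] for r in range(k)])
                    for s in range(k)])
    # Backward pass: bwd[i][r] = log P(obs[i+1..n-1] | state_i = r); log(1) at the end
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [logsumexp([lt[r][s] + e[i + 1][s] + bwd[i + 1][s] for s in range(k)])
                  for r in range(k)]
    loglik = logsumexp(fwd[-1])
    post = [[math.exp(fwd[i][s] + bwd[i][s] - loglik) for s in range(k)] for i in range(n)]
    return loglik, post, e


def hmm_scores(observations, start, trans, states):
    """Global score = log-likelihood per residue; local score = posterior-weighted
    log emission at each position (lower values flag less compatible regions)."""
    loglik, post, e = forward_backward(observations, start, trans, states)
    local = [sum(p * x for p, x in zip(post[i], e[i])) for i in range(len(observations))]
    return loglik / len(observations), local
```

With trained parameters, positions or short windows whose local score falls well below the rest of the chain would be the natural candidates for targeted refinement. How the paper actually normalizes its scores (including what the Z in HMM.Z denotes) is not stated in this excerpt, so no normalization is attempted here.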

Keywords

protein structure prediction; structure quality assessment; Hidden Markov Model (HMM)


	Top1	Top5	Mean5	Pearson	Spearman
GDT	0.705	0.705	0.693	1.000	1.000
OPUSCA	0.614	0.646	0.613	0.322	0.237
DFIRE	0.609	0.641	0.608	0.312	0.231
RW	0.610	0.636	0.609	0.278	0.196
GOAP	0.603	0.643	0.610	0.285	0.230
Fitness	0.607	0.641	0.606	0.176	0.119
SSMatch	0.617	0.651	0.616	0.216	0.166
HMM.Z	0.615	0.651	0.616	0.265	0.192

	Native	OPUSCA	DFIRE	RW	HMM.Z
Native	1.000	0.662	0.525	0.525	0.442
OPUSCA		1.000	0.521	0.521	0.463
DFIRE			1.000	1.000	0.762
RW				1.000	0.762
HMM.Z					1.000

	Top1	Top5	Mean5	Pearson	Spearman
GDT	0.688	0.688	0.675	1.000	1.000
OPUSCA	0.579	0.627	0.584	0.192	0.175
DFIRE	0.587	0.623	0.585	0.175	0.157
RW	0.569	0.613	0.574	0.104	0.093
GOAP	0.588	0.628	0.589	0.192	0.179
Fitness	0.558	0.621	0.567	0.018	0.020
SSMatch	0.578	0.624	0.580	0.075	0.067
HMM.Z	0.594	0.631	0.593	0.227	0.205

	Top1	Top5	Mean5	Pearson	Spearman
GDT	0.623	0.623	0.598	1.000	1.000
OPUSCA	0.464	0.517	0.450	0.274	0.274
DFIRE	0.469	0.525	0.468	0.288	0.282
RW	0.463	0.524	0.465	0.268	0.268
Fitness	0.470	0.542	0.465	0.190	0.186
SSMatch	0.467	0.519	0.451	0.172	0.166
HMM.Z	0.485	0.525	0.464	0.236	0.218

	Top1	Top5	Mean5	Pearson	Spearman
GDT	0.860	0.860	0.836	1.000	1.000
OPUSCA	0.840	0.858	0.823	0.779	0.739
DFIRE	0.844	0.856	0.824	0.806	0.756
RW	0.847	0.858	0.824	0.812	0.759
GOAP	0.844	0.859	0.824	0.842	0.754
Fitness	0.826	0.854	0.803	0.740	0.592
SSMatch	0.789	0.845	0.795	0.680	0.625
HMM.Z	0.839	0.857	0.813	0.780	0.721
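The selection tables report Top1, Top5, Mean5, Pearson, and Spearman for each score, but the column definitions are not given in this excerpt. The GDT row (Top1 equal to Top5, both correlations 1.000) is consistent with the following reading, which the sketch below assumes: for each target, rank the candidate models by a method's score; Top1 is the GDT of the first-ranked model, Top5 is the best GDT among the top five, Mean5 is the mean GDT of the top five, and Pearson/Spearman are the correlations between score and GDT; each metric is then averaged over targets. It further assumes that higher scores mean better models, so energy-based potentials such as DFIRE, where lower is better, would need to be negated first. Averaging per-target correlations (rather than pooling all models) is also an assumption, and `selection_metrics` and its input format are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def selection_metrics(targets: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Dict[str, float]:
    """targets: one (scores, gdts) pair per target; higher score = predicted better."""
    per_target: Dict[str, List[float]] = {
        k: [] for k in ("Top1", "Top5", "Mean5", "Pearson", "Spearman")
    }
    for scores, gdts in targets:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top = [gdts[i] for i in order[:5]]  # GDT of the five best-scored models
        per_target["Top1"].append(top[0])
        per_target["Top5"].append(max(top))
        per_target["Mean5"].append(mean(top))
        per_target["Pearson"].append(pearson(scores, gdts))
        per_target["Spearman"].append(spearman(scores, gdts))
    # Average each metric over targets
    return {k: mean(v) for k, v in per_target.items()}
```

Passing GDT itself as the score reproduces the pattern of the GDT rows above (Top1 equal to Top5, correlations of 1), which is a quick sanity check on this reading of the headers.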