Research paper | Open Access

Knowledge discovery in sociological databases: An application on general society survey dataset

Zhiwen Pan1, Jiangtian Li2, Yiqiang Chen1, Jesus Pacheco3, Lianjun Dai4, Jun Zhang4
1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
2 High School Affiliated to Renmin University of China, Beijing, China
3 Universidad de Sonora, Hermosillo, Mexico
4 Information Centre of China Disabled Persons' Federation, Beijing, China



The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life, and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources for government and organization practitioners making data-driven policies. Previous analytic approaches for GSS data sets were designed by combining expert knowledge with simple statistics. By utilizing emerging data mining algorithms, we propose a comprehensive data management and data mining approach for GSS data sets.


The approach is designed to operate in two phases: a data management phase, which improves the quality of GSS data by performing attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set by performing prediction analysis, classification analysis, association analysis and clustering analysis.
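As a rough illustration of the filter-based attribute selection step in the data management phase, the sketch below ranks categorical attributes by information gain with respect to a target attribute and keeps the top k. Information gain is assumed here purely for illustration; the abstract does not state which filter criteria the approach uses. The function names and the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def select_attributes(rows, target, k):
    """Rank every non-target attribute by information gain and keep the top k."""
    candidates = [a for a in rows[0] if a != target]
    ranked = sorted(
        candidates,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy records; attribute names and values are illustrative only.
survey = [
    {"education": "college", "region": "east", "income": "high"},
    {"education": "college", "region": "west", "income": "high"},
    {"education": "primary", "region": "east", "income": "low"},
    {"education": "primary", "region": "west", "income": "low"},
]

selected = select_attributes(survey, target="income", k=1)
print(selected)
```

A filter of this kind scores each attribute independently of any downstream model, so the same reduced attribute set can be passed to both the classification and the clustering analyses in the data mining phase.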


According to the experimental evaluation results, the paper has the following findings: performing attribute selection on the GSS data set can increase the performance of both classification analysis and clustering analysis; all the data mining analyses can effectively extract hidden knowledge from the GSS data set; and the knowledge generated by the different data mining analyses can, to some extent, cross-validate each other.


By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human intervention. Experiments on the Chinese General Social Survey data set are conducted to evaluate the performance of the approach.


International Journal of Crowd Science
Pages 315-332
Cite this article:
Pan Z, Li J, Chen Y, et al. Knowledge discovery in sociological databases: An application on general society survey dataset. International Journal of Crowd Science, 2019, 3(3): 315-332.










Received: 12 September 2019
Revised: 11 October 2019
Accepted: 12 October 2019
Published: 09 December 2019
© The author(s)

Zhiwen Pan, Jiangtian Li, Yiqiang Chen, Jesus Pacheco, Lianjun Dai and Jun Zhang. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at
