Research paper | Open Access

Knowledge discovery in sociological databases: An application on general society survey dataset

Zhiwen Pan¹, Jiangtian Li², Yiqiang Chen¹, Jesus Pacheco³, Lianjun Dai⁴, Jun Zhang⁴

¹ Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

² High School Affiliated to Renmin University of China, Beijing, China

³ Universidad de Sonora, Hermosillo, Mexico

⁴ Information Centre of China Disabled Persons' Federation, Beijing, China

Abstract

Purpose

The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources from which government and organization practitioners make data-driven policies. Previous analytic approaches for GSS data sets were designed by combining expert knowledge with simple statistics. By utilizing emerging data mining algorithms, we propose a comprehensive data management and data mining approach for GSS data sets.

Design/methodology/approach

The approach is designed to operate in two phases: a data management phase, which improves the quality of GSS data by performing attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set by performing prediction analysis, classification analysis, association analysis and clustering analysis.
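As a rough illustration of what a filter-based attribute selection step can look like, the sketch below ranks categorical attributes by information gain against a target attribute and keeps the top k. This is an assumption made for illustration only: the abstract does not state which filter criterion the authors use, and the function names (`entropy`, `information_gain`, `select_attributes`), the attribute names and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def select_attributes(rows, target, k):
    """Filter step: rank attributes by information gain and keep the top k."""
    candidates = [name for name in rows[0] if name != target]
    ranked = sorted(
        candidates,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy records; attribute names and values are invented for illustration.
survey = [
    {"education": "high", "region": "east", "gender": "f", "income": "high"},
    {"education": "high", "region": "west", "gender": "m", "income": "high"},
    {"education": "low", "region": "east", "gender": "m", "income": "low"},
    {"education": "low", "region": "west", "gender": "f", "income": "low"},
]

selected = select_attributes(survey, target="income", k=1)
print(selected)
```

Because a filter scores each attribute independently of any downstream model, the same reduced attribute set could then be passed to both a classifier and a clustering algorithm in the data mining phase.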

Findings

According to the experimental evaluation results, the paper has the following findings: performing attribute selection on the GSS data set can increase the performance of both classification analysis and clustering analysis; all the data mining analyses can effectively extract hidden knowledge from the GSS data set; and the knowledge generated by different data mining analyses can, to some extent, cross-validate each other.

Originality/value

By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human interference. Experiments on the Chinese General Social Survey data set are conducted to evaluate the performance of the approach.

Keywords

Data mining; Data management; Crowdsourced big data and analytics; Knowledge discovery

International Journal of Crowd Science

Volume 3, Issue 3, December 2019

Pages 315-332

DOI: 10.1108/IJCS-09-2019-0023

Cite this article:

Pan Z, Li J, Chen Y, et al. Knowledge discovery in sociological databases: An application on general society survey dataset. International Journal of Crowd Science, 2019, 3(3): 315-332. https://doi.org/10.1108/IJCS-09-2019-0023

Received: 12 September 2019

Revised: 11 October 2019

Accepted: 12 October 2019

Published: 09 December 2019

Zhiwen Pan, Jiangtian Li, Yiqiang Chen, Jesus Pacheco, Lianjun Dai and Jun Zhang. Published in International Journal of Crowd Science. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode