Estimating Intelligence Quotient Using Stylometry and Machine Learning Techniques: A Review

Glory O. Adebayo; Roman V. Yampolskiy

doi:10.26599/BDMA.2022.9020002

Open Access

Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40208, USA

Abstract

Psychologists have sought to quantify a person’s intelligence for over a century. Estimating intelligence quotient (IQ) using stylometry is a developing area of research. The literature has demonstrated the effectiveness of machine learning in stylometric analysis for IQ estimation, and its conclusions suggest that a larger dataset could improve the quality of estimation. However, the unavailability of large datasets has led to very few publications on IQ estimation from written text. In this paper, we review studies on IQ estimation and on author profiling using stylometry. Based on the success of both, we conclude that a study on IQ estimation from written text using stylometry will yield good results if the right dataset is used.

Keywords

stylometry; IQ estimation; authorship attribution; intelligence; IQ; author profiling; machine learning

References

[1] Hallinan, Book review: Psychological testing (5th edn), Aust. Educ. Dev. Psychol., vol. 2, no. 2, p. 18, 1985.

Number of 3D brain images for each class	Number of MRI slices from individual images	Total number of slices for training	Total number of slices for testing	CNN architecture used	Transversal image accuracy (%)	Sagittal image accuracy (%)	Coronal image accuracy (%)
50	25	4000	1000	SVGG	51.625	61.0	58.7
50	25	4000	1000	VGG16	54.500	73.0	68.8
50	25	4000	1000	ResNet-50	66.800	85.9	76.4

Algorithm	Prediction result (all samples)	Prediction result (male samples)	Prediction result (female samples)
ReliefF+LASSO	0.5122	0.4682	0.7212
ReliefF+ridge	0.4787	0.3010	0.4918
ReliefF+elastic net	0.4313	0.2787	0.6481
ReliefF+RVR	0.2189	0.1353	0.2359
ReliefF+OLS	0.3157	0.1468	0.3161
LASSO	0.3668	0.1678	0.6802
Ridge	0.4345	0.1815	0.4295
Elastic net	0.3449	0.1830	0.6655
RVR	0.2413	0.0859	0.2609
OLS	-0.0213	-0.1294	0.1076

Sample word length	Sample collegiate word count	Sample CWR	Expected IQ	Measured IQ	Error (%)
752	94	0.1250	153	123.88	19.03
412	51	0.1238	130	123.31	5.15
136	22	0.1618	141	141.36	0.26
3279	433	0.1321	129	127.24	1.36
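
The columns of the table above appear to be related by two simple formulas that are not stated here explicitly: the CWR looks like the collegiate word count divided by the sample word length, and the error looks like the absolute difference between expected and measured IQ, relative to the expected IQ. The Python sketch below recomputes both columns under those assumptions; the function names are illustrative and are not taken from the reviewed study.

```python
# Rows copied from the table above:
# (sample word length, collegiate word count, expected IQ, measured IQ)
ROWS = [
    (752, 94, 153, 123.88),
    (412, 51, 130, 123.31),
    (136, 22, 141, 141.36),
    (3279, 433, 129, 127.24),
]


def collegiate_word_ratio(word_count: int, collegiate_count: int) -> float:
    """Share of a sample's words that are collegiate words (assumed definition)."""
    return collegiate_count / word_count


def percent_error(expected: float, measured: float) -> float:
    """Absolute error relative to the expected IQ, in percent (assumed definition)."""
    return abs(expected - measured) / expected * 100


for words, collegiate, expected, measured in ROWS:
    cwr = collegiate_word_ratio(words, collegiate)
    err = percent_error(expected, measured)
    print(f"words={words:5d}  CWR={cwr:.4f}  error={err:.2f}%")
```

Under these assumptions, the recomputed values agree with the CWR and Error (%) columns of the table to the precision shown there.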

GRE score	IQ range
1	70-79
2	80-89
3	90-110
4	111-120
5	121-130
6	131-160
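
The GRE-to-IQ mapping above is effectively a lookup table. As a minimal sketch, it can be written as a dictionary with a small accessor. The helper `distance_to_range` is hypothetical: it shows one possible way to measure how far an estimate lies from an expected range and is not the error measure used in the reviewed study.

```python
from typing import Tuple

# GRE score -> (lowest IQ, highest IQ), copied from the table above.
GRE_TO_IQ_RANGE = {
    1: (70, 79),
    2: (80, 89),
    3: (90, 110),
    4: (111, 120),
    5: (121, 130),
    6: (131, 160),
}


def expected_iq_range(gre_score: int) -> Tuple[int, int]:
    """Return the IQ range that the table assigns to a GRE score."""
    if gre_score not in GRE_TO_IQ_RANGE:
        raise ValueError(f"GRE score must be between 1 and 6, got {gre_score}")
    return GRE_TO_IQ_RANGE[gre_score]


def distance_to_range(estimate: float, iq_range: Tuple[int, int]) -> float:
    """Hypothetical helper: 0.0 inside the range, else distance to the nearest bound."""
    low, high = iq_range
    if estimate < low:
        return low - estimate
    if estimate > high:
        return estimate - high
    return 0.0


print(expected_iq_range(4))                         # IQ range for GRE score 4
print(distance_to_range(125.0, expected_iq_range(4)))  # distance above the range
```

Keeping the mapping as data rather than as a chain of conditionals makes it easy to swap in a different score-to-IQ table if another mapping is preferred.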

Expected IQ range	Sample name	SYNNP	LDMTLD	WRDMEAc	LAR
70-79	Sample 1	72.20	75.55	67.69	84.98
70-79	Sample 2	104.98	76.11	123.90	98.12
80-89	Sample 3	131.28	76.96	72.54	95.93
80-89	Sample 4	113.32	72.42	81.55	91.60
90-110	Sample 5	121.32	86.88	83.68	108.70
90-110	Sample 6	114.83	84.59	93.51	125.56
111-120	Sample 7	108.01	88.98	104.92	121.93
111-120	Sample 8	109.74	92.79	124.72	101.74
121-130	Sample 9	103.36	76.14	111.63	94.82
121-130	Sample 10	119.37	117.02	104.47	135.67
131-160	Sample 11	118.08	95.65	121.67	89.20
131-160	Sample 12	124.14	87.02	75.10	102.62

Domain	FW	POS	FWPOS
All	73.7 $\pm$ 0.86	70.5 $\pm$ 0.90	77.3 $\pm$ 0.79
Fiction	78.8 $\pm$ 1.1	77.1 $\pm$ 0.85	79.5 $\pm$ 1.1
Nonfiction	68.5 $\pm$ 1.3	67.2 $\pm$ 1.2	82.6 $\pm$ 0.99

Corpus	Prediction result (gender)	Prediction result (variety)	Prediction result (both)
Training - Arabic	0.5942	0.6079	0.3788
Training - English	0.6578	0.3017	0.2094
Training - Portuguese	0.6392	0.8975	0.5750
Training - Spanish	0.6307	0.3519	0.2193
Test - Arabic	0.5863	0.5844	0.3650
Test - English	0.6692	0.2779	0.1900
Test - Portuguese	0.6100	0.9063	0.5488
Test - Spanish	0.6354	0.3496	0.2189

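Language	Instance-based prediction (gender)	Instance-based prediction (language variety)	Instance-based prediction (joint)	Prototype-based prediction (gender)	Prototype-based prediction (language variety)	Prototype-based prediction (joint)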
Spanish	0.60	0.20	0.12	0.63	0.3	0.19
English	0.56	0.23	0.14	0.65	0.3	0.20

Language	Preprocessing	Feature	Classifier	F macro	F micro
Arabic	Removal of stop words	TF-IDF (1/2-grams)	NBC	0.707	0.708
English	Removal of stop words	TF-IDF (1/2-grams)	NBC	0.669	0.669
Spanish	—	TF-IDF (1/2-grams)	NBC	0.659	0.661
Portuguese	—	TF-IDF (1/2-grams)	NBC	0.659	0.663
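
The table above reports "F macro" and "F micro" without defining them. Assuming the standard definitions, macro-averaged F1 is the unweighted mean of the per-class F1 scores, while micro-averaged F1 pools true positives, false positives, and false negatives over all classes before computing F1. The sketch below implements both in plain Python; the toy labels are invented for illustration and are not data from the reviewed study.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        denominator = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denominator if denominator else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy example (made-up labels): three of four predictions are correct.
macro, micro = f1_scores(["f", "f", "f", "m"], ["f", "f", "m", "m"])
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In single-label classification, micro-averaged F1 coincides with accuracy, so a gap between the two averages mainly reflects uneven performance across classes.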

Language	Score obtained (gender)	Score obtained (variety)	Score obtained (joint)
Arabic	0.6856	0.7544	0.5475
English	0.7546	0.7588	0.5704
Spanish	0.6968	0.9168	0.6400
Portuguese	0.6638	0.9750	0.6475

Task	Model	Prediction result (Arabic)	Prediction result (English)	Prediction result (Portuguese)	Prediction result (Spanish)	Prediction result (average)
Language variety	Random	25.0	16.7	50.0	14.3	26.5
	BOW	71.2	59.4	88.7	75.1	73.6
	Skip-gram emb.	73.0	62.4	98.6	80.6	78.7
	Subword emb.	70.7	68.3	98.5	79.6	79.3
	DAN	80.6	76.5	98.9	91.0	86.8
Gender	Random	50.0	50.0	50.0	50.0	50.0
	BOW	66.4	66.7	71.0	63.4	66.9
	Skip-gram emb.	71.2	78.4	76.5	73.3	74.8
	Subword emb.	73.7	78.8	72.6	74.5	74.9
	DAN	74.5	80.8	78.8	75.5	77.4

Language	Gender prediction result, bi-GRU+Attention (%)	Gender prediction result, CNN (%)
English	79.03	73.24
Spanish	72.57	72.93
Portuguese	79.50	79.83
Arabic	71.58	70.88
Average	75.67	74.22

Language	Language variety prediction result, bi-GRU+Attention (%)	Language variety prediction result, CNN (%)
English	79.03	70.90
Spanish	92.05	89.67
Portuguese	98.76	98.75
Arabic	78.71	78.38
Average	87.11	84.22

Corpus	Prediction result (TF-IDF)	Prediction result (CNN)	Prediction result (final)	Prediction result (random)
English variety	0.8333	0.6563	—	0.1666
English gender	0.6805	0.7803	—	0.5000
English both	0.4724	0.5228	0.6502	0.0833
Spanish variety	0.9323	0.7804	—	0.1428
Spanish gender	0.6491	0.7238	—	0.5000
Spanish both	0.6051	0.5648	0.6747	0.0714
Portuguese variety	0.9925	0.9833	—	0.5000
Portuguese gender	0.7317	0.8500	—	0.5000
Portuguese both	0.5313	0.8358	0.8436	0.2500
Arabic variety	0.8609	0.6750	—	0.2500
Arabic gender	0.6888	0.7500	—	0.5000
Arabic both	0.5929	0.5028	0.6456	0.1250
Average	—	—	0.7035	—
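
The Random column in the table above is consistent with uniform guessing: one over the number of language varieties, one over the number of genders, and their product for the joint task. The sketch below reproduces those baselines and the average of the Final column. The variety counts are inferred from the Random column (for example, 0.1666 is close to 1/6) and are an assumption, as is the choice of two gender classes.

```python
# Number of language varieties per language, inferred from the Random column
# of the table above (an assumption, not stated in the table itself).
N_VARIETIES = {"English": 6, "Spanish": 7, "Portuguese": 2, "Arabic": 4}
N_GENDERS = 2  # assumed binary gender labels, matching the 0.5000 baseline

# "Final" scores for the joint (both) task, copied from the table above.
FINAL_BOTH = {
    "English": 0.6502,
    "Spanish": 0.6747,
    "Portuguese": 0.8436,
    "Arabic": 0.6456,
}


def random_baselines(n_varieties: int, n_genders: int = N_GENDERS):
    """Expected accuracy of uniform random guessing: (variety, gender, both)."""
    variety = 1 / n_varieties
    gender = 1 / n_genders
    return variety, gender, variety * gender


for language, n in N_VARIETIES.items():
    variety, gender, both = random_baselines(n)
    print(f"{language:10s} variety={variety:.4f} gender={gender:.4f} both={both:.4f}")

average_final = sum(FINAL_BOTH.values()) / len(FINAL_BOTH)
print(f"average final = {average_final:.4f}")
```

The printed baselines may differ from the table in the last digit, since the table appears to truncate rather than round (0.1666 rather than 0.1667).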

No.	Publication	Machine learning method	Dataset	Source	Size	Evaluation metrics	Result
1	MRI-based IQ estimation with sparse learning^[12]	SVR: multi-kernel SVR, single-kernel SVR	MRI samples of developing children between 6 and 15 years scanned at 5 different sites (NYU, KKI, SU, OHSU, and UCLA)	Autism Brain Imaging Data Exchange	164 samples (male/female: 130/34)	Correlation coefficient (CC), root mean square error (RMSE)	· Multi-kernel SVR yielded a CC of 0.718 and an RMSE of 8.695. · Single-kernel SVR yielded a CC of 0.684 and an RMSE of 9.166.
2	IQ estimation by means of EEG-fNIRS recordings during a logical-mathematical intelligence test^[23]	Linear regression, support vector regression	fNIRS and EEG signal readings	fNIRS and EEG signal readings obtained from graduate students while they solved the RPM intelligence test	11 samples (male/female: 6/5)	Relative error between the real IQ (Cattell test) and estimated IQ	· A combination of fNIRS and EEG features selected using PCA yielded the best results.
3	Automated IQ estimation from writing samples^[34]	Stylometry	Common Crawl corpus and SAT vocabulary	https://aws.amazon.com/public-datasets/common-crawl/	Samples from Common Crawl with more than 100 words	Percentage error between real IQ (from social media contacts) and estimated IQ	· The results show an accuracy of about 75%.
4	Automatic IQ estimation using stylometric methods^[35]	Stylometry	Written text samples of American English published since 1990	Open American National Corpus (OANC) and SAT vocabulary	6516 written samples and 5000 words from the SAT vocabulary	Error between expected IQ range (obtained from sample GRE scores mapped to a range of IQs) and calculated IQ	· There was a high correlation between estimated IQ and calculated IQ, with the WRDMEAc feature providing the best estimation at 75% accuracy.
5	Automatically profiling the author of an anonymous text^[36]	Bayesian Multinomial Regression (BMR)	Three separate corpora: one to detect age and gender, one to detect native language, and one to detect personality type.	Age and gender: full sets of postings from blog authors written in English; Native language: International Corpus of Learner English; Personality: essays written by psychology undergraduates at the University of Texas, Austin, as part of their course requirements	Age and gender: 19 320 authors with a mean length of 7250 words/author; Native language: 1290 authors, between 279 and 846 words/author; Personality: 198 authors, between 251 and 1951 words/author	Classification accuracy	· Content-based and style-based features together yielded the best results for age (76.1%) and gender (77.7%). · Content-based features alone yielded the best results for language (82.3%). · Style-based features yielded the best results for neuroticism (65.7%).
6	Author profiling, instance-based similarity classification^[47]	Instance-based similarity classification	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document	Classification accuracy	· Performed well in gender classification but poorly in language variety classification.
7	Arabic tweeps gender and dialect prediction^[38]	Support vector machines (SVMs), sequential minimal optimization (SMO)	XML-based tweets from Twitter in Arabic	www.twitter.com	240 000 tweets written in Arabic by 2400 authors	Classification accuracy	· SMO yielded the best results: language variety = 75.5%, gender = 72.25%.
8	Subword-based deep averaging networks for author profiling in social media^[49]	Deep averaging networks (DANs)	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document	Classification accuracy between an instance-based and prototype-based classification	· DAN with subword embeddings yielded the best results. · DAN performs well in author profiling by magnifying the most discriminative values contained in an embedding average. · It is a competitive alternative.
9	Author profile prediction using trend and word frequency based analysis in text^[44]	Distance-based method	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document	Classification accuracy	· The system has a flaw: prediction of variety degrades as the number of language varieties increases. · The method yields poor results.
10	INSA LYON and UNI PASSAU’s participation at PAN@CLEF’17: Author profiling task: Notebook for PAN at CLEF 2017^[48]	SVMs, multinomial Naïve Bayes classifier (MNBC), random forest	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	Arabic: 235 781 tweets; English: 358 445 tweets; Spanish: 418 090 tweets; Portuguese: 118 105 tweets	Classification accuracy	· Combining TF-IDF features on unigrams and bigrams using a Naïve Bayes classifier yielded the best results. · Predicting Portuguese (97.5%) and Spanish (91.98%) yielded the best results.
11	Author profiling with bidirectional RNNs using attention with GRUs^[39]	Recurrent neural networks (RNNs)	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	500 authors, 100 tweets per author	Classification accuracy between the RNN and a CNN-based model as the baseline	· RNN yielded better results than CNN, with an average classification accuracy of 75.67% for gender and 87.11% for language variety.
12	TF-IDF and deep learning for author profiling^[51]	TF-IDF based method, convolutional neural networks (CNNs)	XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish)	www.twitter.com	English: 360 000 tweets; Spanish: 420 000 tweets; Portuguese: 120 000 tweets; Arabic: 240 000 tweets	Classification accuracy between TF-IDF and CNN	· TF-IDF performed better for predicting language variety. · CNN performed better when used to classify gender.
13	Automatically categorizing written texts by author gender^[43]	Winnow-like algorithm, Naïve Bayes, decision trees	Documents in British English that are labeled both for author gender and for genre: fiction and several non-fiction genres and sub-genres	http://www.ir.iit.edu/~argamon/gender.html	Between 554 and 61 199 words with an average of about 34 320 words each (female = 34 795; male = 33 845).	Classification accuracy	· Function words combined with parts-of-speech yielded the best results across all genres with about 80% accuracy.
14	Determining an author’s native language by mining a text for errors^[37]	Multi-class linear SVM	Written text from non-native English-speaking students	International Corpus of Learner English	258 authors from each of the Russian, Czech, Bulgarian, Spanish, and French sub-corpora	Confusion matrix	· Classification accuracy of 80.2% when all features are used in tandem with one another.
15	Intelligence quotient classification from human MRI brain images using convolutional neural networks^[21]	CNN-based IQ classification	ABIDE (Autism Brain Imaging Data Exchange) provided by NITRC (Neuroimaging Informatics Tools and Resources Clearinghouse)	Autism Brain Imaging Data Exchange	5000 bi-dimensional slices from each of the three brain views (15 000)	Classification accuracy	· ResNet-50 yielded a maximum accuracy of 85.9%. · Using the images from the sagittal view yielded the best results.
16	Predicting individualized intelligence quotient scores using brainnetome-atlas based functional connectivity^[22]	Regression algorithms	MRI brain scans	MRI brain scans obtained using a Tesla magnetic resonance scanner	360 subjects between the ages of 17 and 24. 174 females and 186 males	Comparison of regression coefficients	· ReliefF + LASSO produced the best results with a regression coefficient of 0.5122 for all subjects and 0.7212 for all female subjects.

No.	Publication	Year published	Journal/Conference name	Authors	Link
1	MRI-Based intelligent quotient (IQ) estimation with sparse learning^[12]	2015	Plos One (Journal)	Liye Wang, Chong-Yaw Wee, Heung-Il Suk, Xiaoying Tang, and Dinggang Shen	https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0117295
2	IQ estimation by means of EEG-fNIRS recordings during a logical-mathematical intelligence test^[23]	2019	Elsevier (Journal)	Shabnam Firooz and Seyed Kamaledin Setarehdan	https://www.sciencedirect.com/science/article/abs/pii/S0010482519301738
3	Automated IQ estimation from writing samples^[34]	2017	MAICS 2017 (Conference)	Austin Hendrix and Roman Yampolskiy	http://ceur-ws.org/Vol-1964/CS1.pdf
4	Automatic IQ estimation using stylometric methods^[35]	2019	ThinkIR (Electronic Thesis & Dissertation)	Polina Shafran Abramov and Roman Yampolskiy	https://ir.library.louisville.edu/cgi/viewcontent.cgi?article=4132&context=etd
5	Automatically profiling the author of an anonymous text^[36]	2009	Communications of the ACM (Journal)	Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler	https://www.researchgate.net/publication/220427266_Automatically_Profiling_the_Author_of_an_Anonymous_Text
6	Author profiling, instance-based similarity classification^[47]	2017	PAN at CLEF 2017 (Conference)	Yaritza Adame-Arcia, Daniel Castro-Castro, Reynier Ortega Bueno, and Rafael Muñoz	https://pan.webis.de/downloads/publications/papers/adamearcia_2017.pdf
7	Arabic tweeps gender and dialect prediction^[38]	2017	PAN at CLEF 2017 (Conference)	Khaled Alrifai, Ghaida Rebdawi, and Nada Ghneim	https://www.semanticscholar.org/paper/Arabic-Tweeps-Gender-and-Dialect-Prediction-Alrifai-Rebdawi/21b3341024ec0df3a73f7d30cf067686f0103464?p2df
8	Subword-based deep averaging networks for author profiling in social media^[49]	2017	PAN at CLEF 2017 (Conference)	Marc Franco-Salvador, Nataliia Plotnikova, Neha Pawar, and Yassine Benajiba	https://www.semanticscholar.org/paper/Subword-based-Deep-Averaging-Networks-for-Author-in-Franco-Salvador-Plotnikova/a9d7350eb6c3381b43454ede11ad07789dccbf20
9	Author profile prediction using trend and word frequency based analysis in text^[44]	2017	PAN at CLEF 2017 (Conference)	Jamal Ahmad Khan	https://www.semanticscholar.org/paper/Author-Profile-Prediction-Using-Trend-and-Word-in-Khan/4da5e57d2d07cf5336b2d6623bae090ea1c38e58
10	INSA LYON and UNI PASSAU’s participation at PAN@CLEF’17: Author profiling task: Notebook for PAN at CLEF 2017^[48]	2017	PAN at CLEF 2017 (Conference)	Guillaume Kheng, Léa Laporte, and Michael Granitzer	https://www.semanticscholar.org/paper/INSA-LYON-and-UNI-PASSAU’s-Participation-at-Author-Kheng-Laporte/c61506eab622da887cbe6c8202435197735b3b67
11	Author profiling with bidirectional RNNs using attention with GRUs^[39]	2017	PAN at CLEF 2017 (Conference)	Don Kodiyan, Florin Hardegger, Stephan Neuhaus, and Mark Cieliebak	https://www.semanticscholar.org/paper/Author-Profiling-with-Bidirectional-RNNs-using-with-Kodiyan-Hardegger/915654a4a29fd86621caeb7022fa484092f5e33b
12	TF-IDF and deep-learning for author profiling^[51]	2017	PAN at CLEF 2017 (Conference)	Nils Schaetti	https://www.researchgate.net/publication/320287536_UniNE_at_CLEF_2017_TF-IDF_and_Deep-Learning_for_Author_Profiling_Notebook_for_PAN_at_CLEF_2017
13	Automatically categorizing written texts by author gender^[43]	2002	Literary and Linguistic Computing (Journal)	Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni	https://academic.oup.com/dsh/article-abstract/17/4/401/1019830
14	Determining an author’s native language by mining a text for errors^[37]	2005	KDD’05: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining	Moshe Koppel, Jonathan Schler, and Kfir Zigdon	https://dl.acm.org/doi/10.1145/1081870.1081947
15	Intelligence quotient classification from human MRI brain images using convolutional neural networks^[21]	2020	12th International Conference on Computational Intelligence and Communication Networks	A Arya and Manju Manuel	https://ieeexplore.ieee.org/document/9242552
16	Predicting individualized intelligence quotient scores using brainnetome-atlas based functional connectivity^[22]	2017	2017 IEEE International Workshop on Machine Learning for Signal Processing	Rongtao Jiang, Shile Qi, Yuhui Du, Weizheng Yan, Vince D. Calhoun, Tianzi Jiang, and Jing Sui	https://ieeexplore.ieee.org/document/8168150