1 | MRI-based IQ estimation with sparse learning[12] | SVR: multi-kernel SVR, single-kernel SVR | MRI samples of developing children between 6 and 15 years scanned at 5 different sites (NYU, KKI, SU, OHSU, and UCLA) | Autism Brain Imaging Data Exchange | 164 samples (male/female: 130/34) | Correlation coefficient (CC), root mean square error (RMSE) | · Multi-kernel SVR yielded a CC of 0.718 and an RMSE of 8.695. · Single-kernel SVR yielded a CC of 0.684 and an RMSE of 9.166. |
2 | IQ estimation by means of EEG-fNIRS recordings during a logical-mathematical intelligence test[23] | Linear regression, support vector regression | fNIRS and EEG signal readings | fNIRS and EEG signal readings obtained from graduate students while they solved the RPM intelligence test | 11 samples (male/female: 6/5) | Relative error between the real IQ (Cattell test) and estimated IQ | · A combination of fNIRS and EEG features selected using PCA yielded the best results. |
3 | Automated IQ estimation from writing samples[34] | Stylometry | Common Crawl corpus and SAT vocabulary | https://aws.amazon.com/public-datasets/common-crawl/ | Samples from Common Crawl with more than 100 words | Percentage error between real IQ (from social media contacts) and estimated IQ | · The results show an accuracy of about 75%. |
4 | Automatic IQ estimation using stylometric methods[35] | Stylometry | Written text samples of American English published since 1990 | Open American National Corpus (OANC) and SAT vocabulary | 6516 written samples and 5000 words from the SAT vocabulary | Error between expected IQ range (obtained from sample GRE scores mapped to a range of IQs) and calculated IQ | · There was a high correlation between estimated IQ and calculated IQ, with the WRDMEAc feature providing the best estimation at 75% accuracy. |
5 | Automatically profiling the author of an anonymous text[36] | Bayesian Multinomial Regression (BMR) | Three separate corpora: one to detect age and gender, one to detect native language, and one to detect personality type | Age and gender: full sets of postings from blog authors written in English; Native language: International Corpus of Learner English; Personality: essays written by psychology undergraduates at the University of Texas, Austin, as part of their course requirements | Age and gender: 19 320 authors with a mean length of 7250 words/author; Native language: 1290 authors, between 279 and 846 words/author; Personality: 198 authors, between 251 and 1951 words/author | Classification accuracy | · Content-based and style-based features combined yielded the best results for age (76.1%) and gender (77.7%). · Content-based features alone yielded the best results for native language (82.3%). · Style-based features yielded the best results for neuroticism (65.7%). |
6 | Author profiling, instance-based similarity classification[47] | Instance-based similarity classification | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document | Classification accuracy | · Performed well in gender classification but poorly in language classification. |
7 | Arabic tweeps gender and dialect prediction[38] | Support vector machines (SVMs), sequential minimal optimization (SMO) | XML-based tweets from Twitter in Arabic | www.twitter.com | 240 000 tweets written in Arabic by 2400 authors | Classification accuracy | SMO yielded the best results: · Language variety = 75.5%. · Gender = 72.25%. |
8 | Subword-based deep averaging networks for author profiling in social media[49] | Deep averaging networks (DANs) | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document | Classification accuracy compared between instance-based and prototype-based classification | · DAN with subword embeddings yielded the best results. · DAN performs well in author profiling because it magnifies the most discriminative values contained in an embedding average. · It is a competitive alternative. |
9 | Author profile prediction using trend and word frequency based analysis in text[44] | Distance-based method | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | Arabic: 2400 documents; English: 3600 documents; Portuguese: 1200 documents; Spanish: 4200 documents; 100 tweets/document | Classification accuracy | · Variety prediction accuracy decreases as the number of language varieties increases. · The method yields poor results. |
10 | INSA LYON and UNI PASSAU’s participation at PAN@CLEF’17: Author profiling task: Notebook for PAN at CLEF 2017[48] | SVMs, multinomial Naïve Bayes classifier (MNBC), random forest | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | Arabic: 235 781 tweets; English: 358 445 tweets; Spanish: 418 090 tweets; Portuguese: 118 105 tweets | Classification accuracy | · Combining TF-IDF features on unigrams and bigrams with a Naïve Bayes classifier yielded the best results. · Predicting Portuguese (97.5%) and Spanish (91.98%) yielded the best results. |
11 | Author profiling with bidirectional RNNs using attention with GRUs[39] | Recurrent neural networks (RNNs) | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | 500 authors, 100 tweets per author | Classification accuracy of the RNN compared with a CNN-based baseline | RNN yielded better results than CNN, with average classification accuracies of: · 75.67% for gender. · 87.11% for language variety. |
12 | TF-IDF and deep learning for author profiling[51] | TF-IDF based method, convolutional neural networks (CNNs) | XML-based tweets from Twitter in four different languages (Arabic, English, Portuguese, and Spanish) | www.twitter.com | English: 360 000 tweets; Spanish: 420 000 tweets; Portuguese: 120 000 tweets; Arabic: 240 000 tweets | Classification accuracy compared between TF-IDF and CNN | · TF-IDF performed better for predicting language variety. · CNN performed better when used to classify gender. |
13 | Automatically categorizing written texts by author gender[43] | Winnow-like algorithm, Naïve Bayes, decision trees | Documents in British English that are labeled both for author gender and for genre: fiction and several non-fiction genres and sub-genres | http://www.ir.iit.edu/~argamon/gender.html | Documents of between 554 and 61 199 words, with an average of about 34 320 words each (female = 34 795; male = 33 845) | Classification accuracy | · Function words combined with parts-of-speech yielded the best results across all genres, with about 80% accuracy. |
14 | Determining an author’s native language by mining a text for errors[37] | Multi-class linear SVM | Written text from non-native English-speaking students | International Corpus of Learner English | 258 authors each from the Russian, Czech, Bulgarian, Spanish, and French sub-corpora | Confusion matrix | · Classification accuracy of 80.2% when all features are used in tandem with one another. |
15 | Intelligence quotient classification from human MRI brain images using convolutional neural networks[21] | CNN-based IQ classification | ABIDE (Autism Brain Imaging Data Exchange) provided by NITRC (Neuroimaging Informatics Tools and Resources Clearinghouse) | Autism Brain Imaging Data Exchange | 5000 bi-dimensional slices from each of the three brain views (15 000 in total) | Classification accuracy | · ResNet-50 yielded a maximum accuracy of 85.9%. · Images from the sagittal view yielded the best results. |
16 | Predicting individualized intelligence quotient scores using brainnetome-atlas based functional connectivity[22] | Regression algorithms | MRI brain scans | MRI brain scans obtained using a Tesla magnetic resonance scanner | 360 subjects between the ages of 17 and 24 (174 females and 186 males) | Comparison of regression coefficients | · ReliefF + LASSO produced the best results, with a regression coefficient of 0.5122 for all subjects and 0.7212 for all female subjects. |