Inferring people’s Socioeconomic Attributes (SEAs), including income, occupation, and education level, is an important problem for both the social sciences and many networked applications, such as targeted advertising and personalized recommendation. Previous work has mainly focused on estimating SEAs from people’s cyberspace behaviors and relationships, such as the content of their tweets or the social networks among online users. Beyond cyberspace data, alternative data sources about users’ physical behavior, such as their home location, may offer new insights. In this paper, we study how to predict a person’s income level, family income level, occupation type, and education level from their home location. As a case study, we collect people’s home locations and socioeconomic attributes through a survey covering 9 provinces and 85 cities in China. We further enrich the home location data with knowledge from real estate websites, government statistics websites, online map services, and other sources. To learn both a shared representation of the input features and attribute-specific representations for the different SEAs, we propose H2SEA, a factorization machine-based multi-task learning method with an attention mechanism. Extensive experimental results show that: (1) home location clearly improves estimation accuracy for all SEA prediction tasks (e.g., an 80.2% improvement in F1-score for estimating personal income level); (2) the proposed H2SEA model outperforms alternative models for SEA inference on various evaluation metrics, such as Area Under the Curve (AUC), F-measure, and specificity; (3) the performance of specific SEA prediction tasks (e.g., personal income) can be further improved if H2SEA focuses only on cities or only on villages, owing to the urban-rural gap in China; and (4) compared with housing price data crawled online, area-level average income and Points of Interest (POIs) are more important features for SEA inference in China.
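A factorization machine (FM) models every pair of input features through the inner product of two learned latent vectors. This section is a minimal pure-Python sketch of the standard second-order FM interaction term, not the authors' H2SEA implementation. It checks that the well-known O(kn) reformulation matches the naive pairwise sum, and it adds one shared interaction term with task-specific linear heads as a hypothetical multi-task layout. The function names, the `heads` dictionary, and the toy values are assumptions for illustration. The attention mechanism and the training procedure are not modeled.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fm_pairwise_naive(x: Vector, V: Sequence[Vector]) -> float:
    """Second-order FM term: sum over i < j of <v_i, v_j> * x_i * x_j (O(k * n^2))."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x: Vector, V: Sequence[Vector]) -> float:
    """Same term in O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(V[0])
    total = 0.0
    for f in range(k):
        weighted = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(weighted)
        total += s * s - sum(w * w for w in weighted)
    return 0.5 * total


def multitask_scores(
    x: Vector,
    V_shared: Sequence[Vector],
    heads: Dict[str, Tuple[float, List[float]]],
) -> Dict[str, float]:
    """Hypothetical multi-task layout: one shared interaction term, per-task bias and weights."""
    pair = fm_pairwise_fast(x, V_shared)  # computed once and reused by every task
    return {
        task: w0 + sum(wi * xi for wi, xi in zip(w, x)) + pair
        for task, (w0, w) in heads.items()
    }


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0]                        # toy feature vector
    V = [[0.1, 0.2], [0.3, 0.4], [0.5, -0.1]]  # toy latent factors, k = 2
    print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
    print(multitask_scores(x, V, {"income": (0.5, [1.0, 1.0, 1.0])}))
```

Sharing the latent factors `V` across tasks is one simple way to obtain a shared representation. A real model would learn all parameters from data and would add task-specific layers, and in H2SEA's case attention, on top of this core.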
This paper poses a question: how many types of social relations can be categorized in the Chinese context? In social networks, measuring tie strength captures the degree of intimacy between nodes better than merely indicating whether a link exists. Following Granovetter, previous research has measured tie strength to distinguish strong ties from weak ties, and Dunbar circle theory may offer a plausible approach to identifying five types of relations according to interaction frequency via unsupervised learning (e.g., clustering interaction data between users on Facebook and Twitter). In this paper, we differentiate the layers of an ego-centered network by measuring different dimensions of users' online interaction data based on Dunbar circle theory. To label the types of Chinese guanxi, we conduct a survey to collect ground truth from the real world and link these survey data to big data collected from a widely used social network platform in China. After replicating the Dunbar experiments, we modify our computational methods and the indicators computed from big data so that the model best fits the ground truth. At the same time, we select a comprehensive set of effective predictors to engage in dialogue with existing theories of tie strength. Finally, by combining Guanxi theory with Dunbar circle studies, we find four types of guanxi that form a four-layer model of the Chinese ego-centered network.
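The clustering step can be made concrete with a minimal sketch. This section groups an ego's contacts into Dunbar-style layers by clustering a single dimension, interaction frequency, with 1-D k-means. It is not the authors' modified method, which combines several interaction dimensions and is calibrated against survey ground truth. The function names, the `log1p` transform, the quantile-based initialization, and the toy counts are assumptions for illustration.

```python
import math
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> List[float]:
    """Plain 1-D k-means (Lloyd's algorithm) with quantile-based initialization.

    Returns the cluster centers sorted from highest to lowest.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: one center in the middle of each of k equal-size slices.
    centers = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep the old center if a cluster becomes empty.
        new_centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, reverse=True)


def assign_layers(freq: Dict[str, int], k: int) -> Dict[str, int]:
    """Map each contact (alter) to layer 1..k; layer 1 has the highest interaction frequency."""
    # log1p compresses heavy-tailed counts and keeps zero-interaction contacts defined.
    logs = {alter: math.log1p(count) for alter, count in freq.items()}
    centers = kmeans_1d(list(logs.values()), k)
    return {
        alter: 1 + min(range(k), key=lambda c: abs(v - centers[c]))
        for alter, v in logs.items()
    }


if __name__ == "__main__":
    # Hypothetical message counts between one ego and eight contacts.
    counts = {"a": 300, "b": 250, "c": 40, "d": 35, "e": 5, "f": 3, "g": 0, "h": 1}
    print(assign_layers(counts, k=4))
```

On these toy counts, the two most frequent contacts land in layer 1 and the two least frequent land in layer 4. Setting `k=5` mirrors the five-layer Dunbar setup, and `k=4` matches the number of guanxi layers reported above. Clustering on a log scale is a modeling choice for heavy-tailed counts, not something the abstract specifies.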