SNCA: Semi-Supervised Node Classification for Evolving Large Attributed Graphs

Faima Abbasi; Muhammad Muzammal; Qiang Qu; Farhan Riaz; Jawad Ashraf

doi:10.26599/BDMA.2024.9020033


Open Access

Faima Abbasi¹, Muhammad Muzammal², Qiang Qu³, Farhan Riaz⁴, Jawad Ashraf⁵

¹ Luxembourg Institute of Science and Technology and University of Luxembourg, Esch-Sur-Alzette L-4362, Luxembourg

² Department of Computer and Information Sciences, Northumbria University, Newcastle NE1 8ST, UK

³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

⁴ School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK

⁵ Faculty of Computing, Engineering and Media, De Montfort University, Leicester LE1 9BH, UK

Abstract

Attributed graphs have an additional sign vector for each node. Typically, edge signs represent a like or dislike relationship between node pairs, which has applications in domains such as recommender systems and personalised search. However, the limited availability of edge sign information in attributed networks makes it necessary to infer the underlying graph embeddings to fill in the knowledge gap. This inference is performed through node classification, which aims to deduce node characteristics from the topological structure of the graph and the signed interactions between nodes. The study of attributed networks is challenging due to noise, sparsity, and class imbalance. In this work, we consider node centrality in conjunction with edge signs to address the node classification problem in attributed networks. We propose Semi-supervised Node Classification in Attributed graphs (SNCA). SNCA is robust to network noise and has built-in class imbalance handling. We perform an extensive experimental study on real-world datasets to demonstrate the efficiency, scalability, robustness, and pertinence of the solution. The results show that the solution is suitable for large attributed graphs in real-world settings.

Keywords

attributed networks; node classification; recommender systems

References

[1] Z. Zhu, H. Xing, and Y. Xu, Balanced neighbor exploration for semi-supervised node classification on imbalanced graph data, Inf. Sci., vol. 631, pp. 31–44, 2023.

[2] P. Chunaev, Community detection in node-attributed social networks: A survey, Comput. Sci. Rev., vol. 37, p. 100286, 2020.

[3] H. Sun, F. He, J. Huang, Y. Sun, Y. Li, C. Wang, L. He, Z. Sun, and X. Jia, Network embedding for community detection in attributed networks, ACM Trans. Knowl. Discov. Data, vol. 14, no. 3, p. 36, 2020.

[4] C. Park, D. Kim, J. Han, and H. Yu, Unsupervised attributed multiplex network embedding, in Proc. 34th AAAI Conf. Artificial Intelligence, New York, NY, USA, 2020, pp. 5371–5378.

[5] X. Zhou, L. Su, X. Li, Z. Zhao, and C. Li, Community detection based on unsupervised attributed network embedding, Expert Syst. Appl., vol. 213, p. 118937, 2023.

[6] L. Shi, Q. Liu, J. Shao, Y. Cheng, and X. W. Zheng, A cooperation-competition evolutionary dynamic model over signed networks, IEEE Trans. Automat. Control, vol. 68, no. 12, pp. 7927–7934, 2023.

[7] M. R. Kondamudi, S. R. Sahoo, L. Chouhan, and N. Yadav, A comprehensive survey of fake news in social networks: Attributes, features, and detection approaches, J. King Saud Univ.-Comput. Inf. Sci., vol. 35, no. 6, p. 101571, 2023.

[8] L. Dinh, R. Rezapour, L. Jiang, and J. Diesner, Enhancing structural balance theory and measurement to analyze signed digraphs of real-world social networks, Front. Hum. Dyn., vol. 4, p. 1028393, 2023.

[9] W. Yuan, C. Li, G. Han, D. Guan, L. Zhou, and K. He, Negative sign prediction for signed social networks, Future Gener. Comput. Syst., vol. 93, pp. 962–970, 2019.

[10] A. Arya and K. P. Pandey, Structural reconstruction of signed social networks, IEEE Trans. Comput. Soc. Syst., vol. 10, no. 5, pp. 2599–2612, 2023.

[11] T. Derr, C. Aggarwal, and J. Tang, Signed network modeling based on structural balance theory, in Proc. 27th ACM Int. Conf. Information and Knowledge Management, Torino, Italy, 2018, pp. 557–566.

[12] P. Doreian and A. Mrvar, Structural balance and signed international relations, J. Soc. Struct., vol. 16, no. 1, pp. 1–49, 2015.

[13] X. Zheng, D. Zeng, and F. Y. Wang, Social balance in signed networks, Inf. Syst. Front., vol. 17, no. 5, pp. 1077–1095, 2015.

[14] W. C. Liu, L. C. Huang, C. W. J. Liu, and F. Jordán, A simple approach for quantifying node centrality in signed and directed social networks, Appl. Netw. Sci., vol. 5, no. 1, p. 46, 2020.

[15] W. Ju, L. Chen, B. Li, W. Liu, J. Sheng, and Y. Wang, A new algorithm for positive influence maximization in signed networks, Inf. Sci., vol. 512, pp. 1571–1591, 2020.

[16] M. Muzammal, F. Abbasi, Q. Qu, R. Talat, and J. Fan, A decentralised approach for link inference in large signed graphs, Future Gener. Comput. Syst., vol. 102, pp. 827–837, 2020.

[17] C. Xia, Y. Luo, L. Wang, and H. J. Li, A fast community detection algorithm based on reconstructing signed networks, IEEE Syst. J., vol. 16, no. 1, pp. 614–625, 2022.

[18] J. Tang, C. Aggarwal, and H. Liu, Recommendations in signed social networks, in Proc. 25th Int. Conf. World Wide Web, Montréal, Canada, 2016, pp. 31–40.

[19] Y. Chen, Q. Qu, Y. Ying, H. Li, and J. Shen, Semantics-aware influence maximization in social networks, Inf. Sci., vol. 513, pp. 442–464, 2020.

[20] S. Y. Liu, J. Xiao, and X. K. Xu, Link prediction in signed social networks: From status theory to motif families, IEEE Trans. Netw. Sci. Eng., vol. 7, no. 3, pp. 1724–1735, 2020.

[21] Y. He, G. Reinert, S. Wang, and M. Cucuringu, SSSNET: Semi-supervised signed network clustering, in Proc. 2022 SIAM Int. Conf. Data Mining, Alexandria, VA, USA, 2022, pp. 244–252.

[22] S. Kumar, A. Gupta, and I. Khatri, CSR: A community based spreaders ranking algorithm for influence maximization in social networks, World Wide Web, vol. 25, no. 6, pp. 2303–2322, 2022.

[23] Y. Xing, X. Wang, C. Qiu, Y. Li, and W. He, Research on opinion polarization by big data analytics capabilities in online social networks, Technol. Soc., vol. 68, p. 101902, 2022.

[24] S. Dhelim, N. Aung, M. T. Kechadi, H. Ning, L. Chen, and A. Lakas, Trust2Vec: Large-scale IoT trust management system based on signed network embeddings, IEEE Int. Things J., vol. 10, no. 1, pp. 553–562, 2023.

[25] X. Su, S. Xue, F. Liu, J. Wu, J. Yang, C. Zhou, W. Hu, C. Paris, S. Nepal, D. Jin, et al., A comprehensive survey on community detection with deep learning, IEEE Trans. Neural Netw. Learn. Syst., vol. 35, no. 4, pp. 4682–4702, 2024.

[26] B. Xu, J. Huang, L. Hou, H. Shen, J. Gao, and X. Cheng, Label-consistency based graph neural networks for semi-supervised node classification, in Proc. 43rd Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Virtual Event, 2020, pp. 1897–1900.

[27] B. Wang, A. Li, M. Pang, H. Li, and Y. Chen, GraphFL: A federated learning framework for semi-supervised node classification on graphs, in Proc. 2022 IEEE Int. Conf. Data Mining, Orlando, FL, USA, 2022, pp. 498–507.

[28] S. Xiao, S. Wang, Y. Dai, and W. Guo, Graph neural networks in node classification: Survey and evaluation, Mach. Vision Appl., vol. 33, no. 1, p. 4, 2022.

[29] S. Bhagat, G. Cormode, and S. Muthukrishnan, Node classification in social networks, in Social Network Data Analytics, C. C. Aggarwal, ed. Boston, MA, USA: Springer, 2011, pp. 115–148.

[30] S. Abu-El-Haija, A. Kapoor, B. Perozzi, and J. Lee, N-GCN: Multi-scale graph convolution for semi-supervised node classification, in Proc. 35th Conf. Uncertainty in Artificial Intelligence, Tel Aviv, Israel, 2019, pp. 841–851.

[31] H. Kang and H. Park, Providing node-level local explanation for node2vec through reinforcement learning, in Proc. 15th ACM Int. Conf. Web Search and Data Mining, Virtual Event, 2022.

[32] J. Oh, K. Cho, and J. Bruna, Advancing GraphSAGE with a data-driven node sampling, arXiv preprint arXiv: 1904.12935, 2019.

[33] I. Chami, Z. Ying, C. Re, and J. Leskovec, Hyperbolic graph convolutional neural networks, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, p. 438.

[34] X. Sun, H. Yin, B. Liu, H. Chen, J. Cao, Y. Shao, and N. Q. V. Hung, Heterogeneous hypergraph embedding for graph classification, in Proc. 14th ACM Int. Conf. Web Search and Data Mining, Virtual Event, 2021, pp. 725–733.

[35] W. Gu, F. Gao, X. Lou, and J. Zhang, Discovering latent node information by graph attention network, Sci. Rep., vol. 11, no. 1, p. 6967, 2021.

[36] X. Sun, H. Cheng, J. Li, B. Liu, and J. Guan, All in one: Multi-task prompting for graph neural networks, in Proc. 29th ACM SIGKDD Conf. Knowledge Discovery and Data Mining, Long Beach, CA, USA, 2023, pp. 2120–2131.

[37] T. Xie, B. Wang, and C. C. J. Kuo, GraphHop: An enhanced label propagation method for node classification, IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 11, pp. 9287–9301, 2023.

[38] Q. Huang, H. He, A. Singh, S. N. Lim, and A. R. Benson, Combining label propagation and simple models out-performs graph neural networks, in Proc. 9th Int. Conf. Learning Representations, Virtual Event, arXiv preprint arXiv: 2010.13993, 2021.

[39] X. Sun, H. Cheng, B. Liu, J. Li, H. Chen, G. Xu, and H. Yin, Self-supervised hypergraph representation learning for sociological analysis, IEEE Trans. Knowl. Data Eng., vol. 35, no. 11, pp. 11860–11871, 2023.

[40] S. Zhang, H. Chen, X. Sun, Y. Li, and G. Xu, Unsupervised graph poisoning attack via contrastive loss back-propagation, in Proc. ACM Web Conf. 2022, Lyon, France, 2022, pp. 1322–1330.

[41] J. Tang, C. C. Aggarwal, and H. Liu, Node classification in signed social networks, in Proc. 2016 SIAM Int. Conf. Data Mining, Miami, FL, USA, 2016, pp. 54–62.

[42] P. Mercado, J. Bosch, and M. Stoll, Node classification for signed social networks using diffuse interface methods, in Proc. Joint European Conf. Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 2019, pp. 524–540.

[43] M. A. Abbasi, J. Tang, and H. Liu, Scalable learning of users’ preferences using networked data, in Proc. 25th ACM Conf. Hypertext and Social Media, Santiago, Chile, 2014, pp. 4–12.

[44] J. Tang, X. Hu, and H. Liu, Is distrust the negation of trust? The value of distrust in social media, in Proc. 25th ACM Conf. Hypertext and Social Media, Santiago, Chile, 2014, pp. 148–157.

[45] R. Salakhutdinov and A. Mnih, Probabilistic matrix factorization, in Proc. 20th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2007, pp. 1257–1264.

[46] R. Salakhutdinov and A. Mnih, Bayesian probabilistic matrix factorization using Markov chain Monte Carlo, in Proc. 25th Int. Conf. Machine Learning, Helsinki, Finland, 2008, pp. 880–887.

[47] S. Yang and B. Yang, Enhanced network embedding with text information, in Proc. 24th Int. Conf. Pattern Recognition (ICPR), Beijing, China, 2018, pp. 326–331.

[48] A. Grover and J. Leskovec, node2vec: Scalable feature learning for networks, in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 855–864.

[49] N. K. Ahmed, R. Rossi, J. B. Lee, T. L. Willke, R. Zhou, X. Kong, and H. Eldardiry, Learning role-based graph embeddings, arXiv preprint arXiv: 1802.02896, 2018.

[50] J. Jung, H. M. Park, and U. Kang, BalanSiNG: Fast and scalable generation of realistic signed networks, in Proc. 23rd Int. Conf. Extending Database Technology, Copenhagen, Denmark, 2020, pp. 193–204.

[51] N. Papernot, P. McDaniel, and I. Goodfellow, Transferability in machine learning: From phenomena to black-box attacks using adversarial samples, arXiv preprint arXiv: 1605.07277, 2016.

Big Data Mining and Analytics

Volume 7 Issue 3,
September 2024

Pages 794-808

DOI: 10.26599/BDMA.2024.9020033

Cite this article:

Abbasi F, Muzammal M, Qu Q, et al. SNCA: Semi-Supervised Node Classification for Evolving Large Attributed Graphs. Big Data Mining and Analytics, 2024, 7(3): 794-808. https://doi.org/10.26599/BDMA.2024.9020033

Table 1 List of useful notation.

Notation	Description
$G^{A} (N^{L}, N^{U}, W^{+}, W^{-})$	Attributed network
$N^{L}$ , $N^{U}$ , $N^{E}$	Labelled, unlabelled, and embedding node sets
$W^{+}$ , $W^{-}$	Signed links (positive and negative)
$E_{M}$	Learning model
$S^{+}$ , $S^{-}$	Signed user preference matrices
$d^{+}$ , $d^{-}$	Positive and negative degree of a node
$S^{+}_{ij} \in \{0, +1\}$	Existence of a positive edge between two users
$S^{-}_{ij} \in \{0, -1\}$	Existence of a negative edge between two users
$R$	Low-dimensional representation of $G^{A}$
$I_{δ}$	Uncertain data
$Z$	Proximity matrix
$D$	Document matrix
$U$ , $M$	Base matrices
$Q$ , $V$	Coefficient matrices
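To make the notation for $S^{+}$, $S^{-}$, $d^{+}$, and $d^{-}$ concrete, here is a minimal Python sketch. It assumes an undirected graph whose links arrive as hypothetical `(u, v, sign)` triples with `sign` in {+1, -1}, builds $S^{+}$ (entries 0 or +1) and $S^{-}$ (entries 0 or -1) as dense lists of lists, and reads the degrees off their rows. The function names are illustrative and do not come from the paper.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def signed_preference_matrices(n: int, edges: List[Tuple[int, int, int]]) -> Tuple[Matrix, Matrix]:
    """Build S+ (entries 0/+1) and S- (entries 0/-1) for an undirected signed graph."""
    s_pos = [[0] * n for _ in range(n)]
    s_neg = [[0] * n for _ in range(n)]
    for u, v, sign in edges:
        if sign > 0:
            s_pos[u][v] = s_pos[v][u] = 1    # positive link, stored symmetrically
        else:
            s_neg[u][v] = s_neg[v][u] = -1   # negative link, stored symmetrically
    return s_pos, s_neg

def signed_degrees(s_pos: Matrix, s_neg: Matrix) -> Tuple[List[int], List[int]]:
    """d+ counts a node's positive links; d- counts its negative links (as a positive number)."""
    d_pos = [sum(row) for row in s_pos]
    d_neg = [-sum(row) for row in s_neg]
    return d_pos, d_neg

# Toy graph with four users; the (u, v, sign) input format is an assumption.
edges = [(0, 1, +1), (0, 2, -1), (1, 2, +1), (2, 3, -1)]
s_pos, s_neg = signed_preference_matrices(4, edges)
d_pos, d_neg = signed_degrees(s_pos, s_neg)
print(d_pos, d_neg)  # [1, 2, 1, 0] [1, 0, 2, 1]
```

Dense matrices are only reasonable for toy inputs like this one; graphs the size of the Wikipedia networks would call for a sparse representation such as adjacency lists.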

Table 2

Algorithm 1　SNCA with PMF
Input: Attributed network $G^{A} (N^{L}, N^{U}, W^{+}, W^{-})$
Output: Predicted labels of $N^{U}$
　$G^{A}$ is split into training graph $G_{train}$ and test graph $G_{test}$;
　Compute nodeCentrality($G_{train}$);
　Generate user preference matrix $S$;
　for $i = 0$ to length of nodes($G_{train}$) do
　　Initialize user latent vectors ($η_{i}$, $η_{j}$) randomly using Gaussian distribution given in Eq. (1);
　　Update $η_{i}$ and $η_{j}$ using gradient descent given in Eqs. (3) and (4);
　end for
　$N^{E} \leftarrow η_{i}$ along nodeCentrality($η_{i}$);
　$DT \leftarrow$ constructTree($N^{E}$, size_Of($N^{E}$), $N^{L}$);
　Make prediction on $G_{test}$ using $DT$;
　return Predicted labels of $N^{U}$
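Algorithm 1 relies on Eqs. (1), (3), and (4), which are not reproduced in this section. The Python sketch below therefore assumes the textbook PMF setup of Salakhutdinov and Mnih [45]: zero-mean Gaussian initialisation and stochastic gradient descent on a squared-error loss over observed entries with L2 regularisation. It treats the signed preference matrix as observed entries $s_{ij} \in \{-1, +1\}$, learns one latent vector $η_i$ per user, and appends a centrality score to each vector for the "along nodeCentrality" step. The centrality used here, the hypothetical `signed_degree_centrality` computing $(d^{+} - d^{-})/(n-1)$, and all hyperparameters are illustrative stand-ins, not the paper's choices.

```python
import random
from typing import Dict, List, Tuple

Observed = Dict[Tuple[int, int], float]

def pmf_embeddings(n: int, observed: Observed, dim: int = 4, lr: float = 0.05,
                   reg: float = 0.01, epochs: int = 200, sigma: float = 0.1,
                   seed: int = 0) -> List[List[float]]:
    """SGD on the sum over observed (i, j) of
    0.5*(s_ij - eta_i . eta_j)^2 + 0.5*reg*(|eta_i|^2 + |eta_j|^2).
    Assumption: one shared latent vector per user, since users rate users."""
    rng = random.Random(seed)
    # Zero-mean Gaussian initialisation, mirroring the Gaussian prior used in PMF.
    eta = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (i, j), s_ij in entries:
            err = s_ij - sum(a * b for a, b in zip(eta[i], eta[j]))
            for k in range(dim):
                # Both gradients are computed before either coordinate is updated.
                gi = err * eta[j][k] - reg * eta[i][k]
                gj = err * eta[i][k] - reg * eta[j][k]
                eta[i][k] += lr * gi
                eta[j][k] += lr * gj
    return eta

def pmf_loss(eta: List[List[float]], observed: Observed) -> float:
    """Unregularised squared reconstruction error on the observed entries."""
    return sum((s - sum(a * b for a, b in zip(eta[i], eta[j]))) ** 2
               for (i, j), s in observed.items())

def signed_degree_centrality(n: int, observed: Observed) -> List[float]:
    """Hypothetical stand-in for nodeCentrality: (d+ - d-) / (n - 1)."""
    score = [0.0] * n
    for (i, j), s in observed.items():
        score[i] += s
        score[j] += s
    return [x / (n - 1) for x in score]

def node_embeddings(n: int, observed: Observed) -> List[List[float]]:
    """N^E: each latent vector eta_i with its centrality score appended."""
    eta = pmf_embeddings(n, observed)
    cent = signed_degree_centrality(n, observed)
    return [vec + [c] for vec, c in zip(eta, cent)]

# Toy signed graph: each undirected link stored once as (i, j) -> sign.
observed = {(0, 1): 1.0, (0, 2): -1.0, (1, 2): 1.0, (2, 3): -1.0}
features = node_embeddings(4, observed)
```

The resulting feature rows, together with the labels of $N^{L}$, would then go to a decision-tree learner (for example scikit-learn's `DecisionTreeClassifier`) for the constructTree step, which is left out here to keep the sketch dependency-free.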

Table 3

Algorithm 2　SNCA with TENE
Input: Attributed network $G^{A} (N^{L}, N^{U}, W^{+}, W^{-})$
Output: Predicted labels of $N^{U}$
　$G^{A}$ is split into training graph $G_{train}$ and test graph $G_{test}$;
　Compute nodeCentrality($G_{train}$);
　$N^{E}$ = TENE($G_{train}$);
　$DT \leftarrow$ constructTree($N^{E}$, size_Of($N^{E}$), $N^{L}$);
　Make prediction on $G_{test}$ using $DT$;
　return Predicted labels of $N^{U}$
　procedure TENE($G_{train}$)
　　Construct the user preference matrix $Z$ from $G_{train}$;
　　Construct the node feature matrix $D$ from $G_{train}$;
　　Optimize $Z$ and $D$ using Eqs. (8) and (9);
　　for each node $η \in N$ do
　　　$U, C$ = FACTORIZE($Z, D$) using Eq. (10);
　　　Update $U$ and $C$ using ALS-MF;
　　end for
　　$N^{E} \leftarrow η_{i}$ along nodeCentrality($η_{i}$), $η_{i} \in U$;
　　return Node embedding $N^{E}$
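The TENE procedure factorizes the user preference matrix $Z$ and the node feature matrix $D$ and refines the factors with alternating least squares (ALS). Because Eqs. (8), (9), and (10) are not reproduced here, the Python sketch below assumes one common joint form: a shared node factor $U$ and two coefficient factors, so that $Z \approx U C_Z^{\top}$ and $D \approx U C_D^{\top}$, with an L2 penalty on all factors. Each ALS step solves a small ridge system exactly, so the regularised objective never increases. The names `C_Z`, `C_D`, `alpha`, and `lam` are hypothetical, and the sketch is not claimed to match the paper's objective or the exact roles of $U$, $M$, $Q$, and $V$ in Table 1.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def gram(f: Matrix, k: int) -> Matrix:
    """k x k matrix F^T F for a factor F with k columns."""
    return [[sum(row[p] * row[q] for row in f) for q in range(k)] for p in range(k)]

def update_coeffs(u: Matrix, x: Matrix, w: float, lam: float, k: int) -> Matrix:
    """Exact minimiser over C of w*||X - U C^T||^2 + lam*||C||^2, with U fixed."""
    g = gram(u, k)
    a = [[w * g[p][q] + (lam if p == q else 0.0) for q in range(k)] for p in range(k)]
    return [solve(a, [w * sum(u[i][p] * x[i][j] for i in range(len(u))) for p in range(k)])
            for j in range(len(x[0]))]

def update_nodes(z: Matrix, d: Matrix, cz: Matrix, cd: Matrix,
                 alpha: float, lam: float, k: int) -> Matrix:
    """Exact minimiser over the shared node factor U, with C_Z and C_D fixed."""
    gz, gd = gram(cz, k), gram(cd, k)
    a = [[gz[p][q] + alpha * gd[p][q] + (lam if p == q else 0.0) for q in range(k)]
         for p in range(k)]
    return [solve(a, [sum(cz[j][p] * z[i][j] for j in range(len(cz)))
                      + alpha * sum(cd[j][p] * d[i][j] for j in range(len(cd)))
                      for p in range(k)])
            for i in range(len(z))]

def objective(z: Matrix, d: Matrix, u: Matrix, cz: Matrix, cd: Matrix,
              alpha: float, lam: float) -> float:
    """||Z - U C_Z^T||^2 + alpha*||D - U C_D^T||^2 + lam*(sum of squared factor entries)."""
    def err(x: Matrix, c: Matrix) -> float:
        return sum((x[i][j] - sum(a * b for a, b in zip(u[i], c[j]))) ** 2
                   for i in range(len(x)) for j in range(len(c)))
    norm = sum(v * v for f in (u, cz, cd) for row in f for v in row)
    return err(z, cz) + alpha * err(d, cd) + lam * norm

def joint_als(z: Matrix, d: Matrix, k: int = 2, alpha: float = 1.0, lam: float = 0.1,
              iters: int = 20, seed: int = 0) -> Tuple[Matrix, Matrix, Matrix]:
    """Alternate exact updates of C_Z, C_D, and U; the objective never increases."""
    rng = random.Random(seed)
    u = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in z]
    for _ in range(max(1, iters)):
        cz = update_coeffs(u, z, 1.0, lam, k)    # Z-side coefficients given U
        cd = update_coeffs(u, d, alpha, lam, k)  # D-side coefficients given U
        u = update_nodes(z, d, cz, cd, alpha, lam, k)
    return u, cz, cd

# Toy inputs: Z is a signed proximity matrix, D a binary node-attribute matrix.
Z = [[0, 1, -1, 0], [1, 0, 1, 0], [-1, 1, 0, -1], [0, 0, -1, 0]]
D = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
U, CZ, CD = joint_als(Z, D)
```

Rows of `U` play the role of the node embeddings; appending a centrality score to each row, as the last step of the TENE procedure does, would give the $N^{E}$ passed to the decision tree.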

Table 4

Algorithm 3　SNCA with ROLE2VEC
Input: Attributed network $G^{A} (N^{L}, N^{U}, W^{+}, W^{-})$
Output: Predicted labels of $N^{U}$
　$G^{A}$ is split into training graph $G_{train}$ and test graph $G_{test}$;
　Compute nodeCentrality($G_{train}$);
　$N^{E}$ = ROLE2VEC($G_{train}, Y, D, W, L, ω$);
　$DT \leftarrow$ constructTree($N^{E}$, size_Of($N^{E}$), $N^{L}$);
　Make prediction on $G_{test}$ using $DT$;
　return Predicted labels of $N^{U}$
　procedure ROLE2VEC($G_{train}, Y, D, W, L, ω$)
　　Set the attributed walk set $W \leftarrow \emptyset$; extract and append motifs to $Y$;
　　Map users to types with the type function $ϕ : y \to w$;
　　Compute the transition probabilities $π$ and form $G_{train} = (N, W, π)$;
　　Set $Π$ to be a random permutation of $N$;
　　for each node $η \in Π$ do
　　　$W_{n} \leftarrow$ ATTRIBUTEDWALK($G_{train}, Y, η, ϕ, L$);
　　　Append $W_{n}$ to $W$;
　　end for
　　$N^{E}$ = (GRADIENTDESCENT($W, D, ω$), nodeCentrality($η_{i}$))
　　return Node embedding $N^{E}$
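The walk-generation part of ROLE2VEC can be sketched in Python as follows. The sketch assumes uniform transition probabilities $π$ over neighbours (ignoring edge sign), uses a hypothetical type function `phi` that maps each node to its $(d^{+}, d^{-})$ pair in place of the attribute and motif features $Y$, and records each walk as a sequence of types rather than node IDs. It stops at the skip-gram (target, context) pairs within a window `window`, which plays the role of $ω$; the GRADIENTDESCENT embedding step that Algorithm 3 applies to these walks is not shown.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[int, List[int]]

def attributed_walk(g: Graph, start: int, phi: Callable[[int], Hashable],
                    length: int, rng: random.Random) -> List[Hashable]:
    """Walk of up to `length` nodes, recorded as types phi(node), not node IDs.
    Assumption: uniform transition probabilities over neighbours."""
    node, walk = start, [phi(start)]
    for _ in range(length - 1):
        if not g[node]:
            break  # dead end: stop the walk early
        node = rng.choice(g[node])
        walk.append(phi(node))
    return walk

def attributed_walks(g: Graph, phi: Callable[[int], Hashable], length: int,
                     walks_per_node: int = 1, seed: int = 0) -> List[List[Hashable]]:
    rng = random.Random(seed)
    walks: List[List[Hashable]] = []  # W starts empty
    for _ in range(walks_per_node):
        order = list(g)               # Pi: a random permutation of the nodes
        rng.shuffle(order)
        for node in order:
            walks.append(attributed_walk(g, node, phi, length, rng))
    return walks

def context_pairs(walks: List[List[Hashable]], window: int) -> List[Tuple[Hashable, Hashable]]:
    """(target, context) type pairs within `window` positions, as used by skip-gram."""
    pairs = []
    for w in walks:
        for i, target in enumerate(w):
            for j in range(max(0, i - window), min(len(w), i + window + 1)):
                if j != i:
                    pairs.append((target, w[j]))
    return pairs

# Toy signed graph; walks ignore the sign, but phi uses signed degrees.
edges = [(0, 1, +1), (0, 2, -1), (1, 2, +1), (2, 3, -1)]
g: Graph = {v: [] for v in range(4)}
d_pos, d_neg = [0] * 4, [0] * 4
for u, v, s in edges:
    g[u].append(v)
    g[v].append(u)
    for x in (u, v):
        if s > 0:
            d_pos[x] += 1
        else:
            d_neg[x] += 1

def phi(node: int) -> Tuple[int, int]:
    """Hypothetical type function: a node's type is its (d+, d-) pair."""
    return (d_pos[node], d_neg[node])

walks = attributed_walks(g, phi, length=5, walks_per_node=2)
pairs = context_pairs(walks, window=2)
```

With four nodes, two walks per node, and walks of length 5, the toy run yields 8 walks and 112 context pairs for a window of 2. Because nodes of the same type share one token, the subsequent embedding step learns one vector per type rather than per node ID.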

Table 2 Wikipedia network statistics.

Parameter	Wikipedia RFA positive	Wikipedia RFA negative	Wikipedia ELEC positive	Wikipedia ELEC negative
$N^{L}$	1684 (48%)	1813 (52%)	1211 (51%)	1163 (49%)
$W^{L}$	138247 (79%)	39080 (21%)	81572 (78%)	21932 (22%)
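The percentages in Table 2 are class shares within each row of a network. The short Python check below recomputes them from the printed counts, rounded to whole percent. The labelled-node shares match the table; for the link rows it returns 78%/22% for RFA and 79%/21% for ELEC, one point away from the printed 79%/21% and 78%/22%, so either the link counts or their rounded shares appear to contain a small inconsistency.

```python
# Counts copied from Table 2: (positive, negative) per network and row.
counts = {
    ("RFA", "N^L"): (1684, 1813),
    ("RFA", "W^L"): (138247, 39080),
    ("ELEC", "N^L"): (1211, 1163),
    ("ELEC", "W^L"): (81572, 21932),
}

def shares(pos: int, neg: int) -> tuple:
    """Positive and negative shares of the row total, in whole percent."""
    total = pos + neg
    return round(100 * pos / total), round(100 * neg / total)

for (network, row), (pos, neg) in counts.items():
    print(network, row, shares(pos, neg))
```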

Table 3 Performance comparison of accuracy with different proportions of labelled nodes. The best results from the current work are in bold and underlined; the best results from the literature are underlined.

Method	Wikipedia-RFA 5%	Wikipedia-RFA 10%	Wikipedia-RFA 15%	Wikipedia-ELEC 5%	Wikipedia-ELEC 10%	Wikipedia-ELEC 15%
NCSSN [41]	0.756	0.745	0.734	0.726	0.735	0.776
DGB [42]	0.681	0.681	0.650	0.602	0.644	0.609
GL [42]	0.847	0.848	0.849	0.885	0.887	0.887
SNCA (ROLE2VEC)	0.401	0.791	0.866	0.306	0.800	0.833
SNCA (TENE)	0.388	0.764	0.832	0.288	0.833	0.866
SNCA (PMF)	0.428	0.815	0.858	0.368	0.844	0.877
SNCA (BPMF)	0.499	0.833	0.877	0.400	0.866	0.899
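The bold and underlined highlighting described in the Table 3 caption did not survive extraction. The Python snippet below recovers it from the printed accuracies by taking the column-wise maximum over the SNCA variants and over the literature baselines, and lists the columns where the best SNCA variant beats the best baseline.

```python
columns = ["RFA 5%", "RFA 10%", "RFA 15%", "ELEC 5%", "ELEC 10%", "ELEC 15%"]

# Accuracies copied from Table 3.
literature = {
    "NCSSN": [0.756, 0.745, 0.734, 0.726, 0.735, 0.776],
    "DGB": [0.681, 0.681, 0.650, 0.602, 0.644, 0.609],
    "GL": [0.847, 0.848, 0.849, 0.885, 0.887, 0.887],
}
current = {
    "SNCA (ROLE2VEC)": [0.401, 0.791, 0.866, 0.306, 0.800, 0.833],
    "SNCA (TENE)": [0.388, 0.764, 0.832, 0.288, 0.833, 0.866],
    "SNCA (PMF)": [0.428, 0.815, 0.858, 0.368, 0.844, 0.877],
    "SNCA (BPMF)": [0.499, 0.833, 0.877, 0.400, 0.866, 0.899],
}

def best_per_column(methods: dict) -> list:
    """Name of the highest-accuracy method in each column."""
    return [max(methods, key=lambda m: methods[m][c]) for c in range(len(columns))]

def columns_where_current_wins() -> list:
    """Columns where the best SNCA variant beats the best literature baseline."""
    return [col for c, col in enumerate(columns)
            if max(v[c] for v in current.values()) > max(v[c] for v in literature.values())]

print(best_per_column(literature))
print(best_per_column(current))
print(columns_where_current_wins())
```

On these values it selects SNCA (BPMF) as the best current-work result and GL [42] as the best literature result in every column; SNCA (BPMF) exceeds GL only at 15% labelled nodes, on both Wikipedia-RFA (0.877 vs. 0.849) and Wikipedia-ELEC (0.899 vs. 0.887).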