Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
LSWare Inc., Seoul 08504, Republic of Korea
College of Intelligence Information Engineering, Sangmyung University, Seoul 03016, Republic of Korea
Abstract
Open-source licenses can promote the development of machine learning by allowing others to access, modify, and redistribute training datasets. However, not all open-source licenses are appropriate for data sharing, as some do not provide adequate protection for sensitive or personal information such as social network data. Additionally, some data may be subject to legal or regulatory restrictions that limit their sharing regardless of the licensing model used. Hence, obtaining large amounts of labeled data can be difficult, time-consuming, or expensive in many real-world scenarios. Few-shot graph classification, an application of meta-learning to supervised graph learning, aims to classify unseen graph types using only a small amount of labeled data. However, current graph neural network methods do not make full use of graph structure on molecular graph and social network datasets. Although structural features are known to correlate with molecular properties in chemistry, structural information tends to be ignored when sufficient property information is provided. Moreover, the common binary classification task for chemical compounds is unsuitable in the few-shot setting, which requires novel labels. Hence, this paper focuses on graph classification tasks for social networks, whose complex topology has an uncertain relationship with node attributes. We construct two multi-class graph datasets with large node-attribute dimensions to facilitate this research, and propose a novel learning framework that integrates both meta-learning and contrastive learning to enhance the utilization of graph topological information. Extensive experiments demonstrate the competitive performance of our framework relative to other state-of-the-art methods.
References
[1] X. Zheng and Z. Cai, Privacy-preserved data sharing towards multiple parties in industrial IoTs, IEEE J. Sel. Areas Commun., vol. 38, no. 5, pp. 968–979, 2020.
[2] Z. Cai and X. Zheng, A private and efficient mechanism for data uploading in smart cyber-physical systems, IEEE Trans. Netw. Sci. Eng., vol. 7, no. 2, pp. 766–775, 2020.
[3] Z. Cai, X. Zheng, J. Wang, and Z. He, Private data trading towards range counting queries in Internet of Things, IEEE Trans. Mob. Comput., vol. 22, no. 8, pp. 4881–4897, 2023.
[4] Z. Cai, Z. He, X. Guan, and Y. Li, Collective data-sanitization for preventing sensitive information inference attacks in social networks, IEEE Trans. Dependable Secure Comput., vol. 15, no. 4, pp. 577–590, 2018.
[5] Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, Deep learning based inference of private information using embedded sensors in smart devices, IEEE Netw., vol. 32, no. 4, pp. 8–14, 2018.
[6] Y. Huang, Y. J. Li, and Z. Cai, Security and privacy in metaverse: A comprehensive survey, Big Data Mining and Analytics, vol. 6, no. 2, pp. 234–247, 2023.
[7] I. K. Nti, J. A. Quarcoo, J. Aning, and G. K. Fosu, A mini-review of machine learning in big data analytics: Applications, challenges, and prospects, Big Data Mining and Analytics, vol. 5, no. 2, pp. 81–97, 2022.
[8] B. M. Oloulade, J. Gao, J. Chen, T. Lyu, and R. Al-Sabri, Graph neural architecture search: A survey, Tsinghua Science and Technology, vol. 27, no. 4, pp. 692–708, 2022.
[9] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, An end-to-end deep learning architecture for graph classification, Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, pp. 4438–4445, 2018.
[10] F. Errica, M. Podda, D. Bacciu, and A. Micheli, A fair comparison of graph neural networks for graph classification, arXiv preprint arXiv:1912.09893, 2019.
[11] H. Zhao, H. Chen, L. Li, and H. Wan, Understanding social relationships with person-pair relations, Big Data Mining and Analytics, vol. 5, no. 2, pp. 120–129, 2022.
[12] C. Eksombatchai, P. Jindal, J. Z. Liu, Y. Liu, R. Sharma, C. Sugnet, M. Ulrich, and J. Leskovec, Pixie: A system for recommending 3+ billion items to 200+ million users in real-time, in Proc. 2018 World Wide Web Conference, 2018, pp. 1775–1784.
[13] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, Generalizing from a few examples: A survey on few-shot learning, ACM Comput. Surv., vol. 53, no. 3, p. 63, 2020.
[14] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, Matching networks for one shot learning, in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3637–3645.
[15] J. Snell, K. Swersky, and R. Zemel, Prototypical networks for few-shot learning, in Proc. 31st Int. Conf. Neural Information Processing Systems, 2017, pp. 4080–4090.
[16] C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in Proc. 34th Int. Conf. Machine Learning - Volume 70, Sydney, Australia, 2017, pp. 1126–1135.
[17] J. Chauhan, D. Nathani, and M. Kaul, Few-shot learning on graphs via super-classes based on graph spectral measures, arXiv preprint arXiv:2002.12815, 2020.
[18] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A simple framework for contrastive learning of visual representations, in Proc. 37th Int. Conf. Machine Learning, Vienna, Austria, 2020, pp. 1597–1607.
[19] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, Graph contrastive learning with augmentations, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 5812–5823.
[20] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, Graph contrastive learning with adaptive augmentation, in Proc. Web Conf. 2021, Ljubljana, Slovenia, 2021, pp. 2069–2080.
[21] J. Yu, H. Yin, X. Xia, T. Chen, L. Cui, and Q. V. H. Nguyen, Are graph augmentations necessary? Simple graph contrastive learning for recommendation, in Proc. 45th Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Madrid, Spain, 2022, pp. 1294–1303.
[22] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum contrast for unsupervised visual representation learning, in Proc. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 9726–9735.
[23] J. McAuley, R. Pandey, and J. Leskovec, Inferring networks of substitutable and complementary products, in Proc. 21st ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Sydney, Australia, 2015, pp. 785–794.
[24] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, ArnetMiner: Extraction and mining of academic social networks, in Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 2008, pp. 990–998.
[25] N. Wang, M. Luo, K. Ding, L. Zhang, J. Li, and Q. Zheng, Graph few-shot learning with attribute matching, in Proc. 29th ACM Int. Conf. Information & Knowledge Management, Virtual Event, 2020, pp. 1545–1554.
[26] N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res., vol. 12, pp. 2539–2561, 2011.
K. Zhang, D. Shin, D. Seo, et al., Few-shot graph classification with structural-enhanced contrastive learning for graph data copyright protection, Tsinghua Science and Technology, vol. 29, no. 2, pp. 605–616, 2024. https://doi.org/10.26599/TST.2023.9010071
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Figure 1: A simple example showing that node attributes may hurt classification accuracy when the graph structure is not adequately considered. Strengthening graph structure learning can generate more distinguishable graph embeddings while retaining high-similarity node information.
Proposed framework
Figure 2 illustrates the framework of our proposed method. Two complementary classification tasks are performed simultaneously to learn the main encoder $f_\theta$, a GNN that projects a graph into an embedding. The first learning module is metric-based meta-learning, which utilizes explicit label information to generate graph embeddings and compute the similarity between the support set and the query set. The second learning module is contrastive learning, a self-supervised instance-level classification task that improves the representation result. For self-supervised learning, we design a strategy that automatically generates a pair of positive and negative augmentation views of the input graph, which contributes to data copyright protection by mitigating the risk of unauthorized reproduction or misuse of the original data. Both modules operate on episodically sampled tasks; a sampling sketch is given below, after the caption of Fig. 2.
Figure 2: Overview of SE-GCL. The framework consists of two main processes: graph meta-learning and contrastive learning. Given a support set of input graphs, we use a graph encoder to extract robust feature representations and derive reliable prototypes for each class. The Wasserstein metric measures the similarity between the query graph and each prototype. Further, we impose the contrastive loss on the query set to improve the model's generalizability. The complete workflow of all modules is an end-to-end solution. More details are given in the remainder of this section.
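For concreteness, the following is a minimal sketch of how an N-way K-shot episode (a support set plus a query set) might be assembled for graph meta-learning. The helper name `sample_episode` and the in-memory list representation of graphs are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import defaultdict

def sample_episode(graphs, labels, n_way=5, k_shot=5, m_query=15):
    """Sample one N-way K-shot episode: K labeled support graphs per class
    plus query graphs to classify against the support prototypes."""
    by_class = defaultdict(list)
    for g, y in zip(graphs, labels):
        by_class[y].append(g)

    # Pick N classes, then split each class's graphs into support and query.
    classes = random.sample(sorted(by_class), n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        picked = random.sample(by_class[c], k_shot + m_query)
        support += [(g, episode_label) for g in picked[:k_shot]]
        query += [(g, episode_label) for g in picked[k_shot:]]
    return support, query
```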
During meta-learning, the main encoder $f_\theta$ maps each graph $G$ into a latent representation as its graph embedding $h_G$. Specifically, GNNs compute the graph embedding via a message-passing framework:

$$h_u^{(l)} = \mathrm{COM}\Big(h_u^{(l-1)},\ \mathrm{AGG}\big(\{h_v^{(l-1)} : v \in N(u)\}\big)\Big), \qquad h_G = \mathrm{READOUT}\big(\{h_u^{(L)} : u \in U\}\big)$$

where $h_u^{(l)}$ denotes the embedding of node $u$ at the $l$-th GNN layer; $N(u)$ is the neighbor set of node $u$; AGG is the neighbor aggregation function; COM is the combination function; and READOUT is the graph-level pooling function. Then all support graph embeddings in the same class $k$ are aggregated into one prototype representation $c_k$ by computing the average, which is formulated as

$$c_k = \frac{1}{|\mathcal{S}_k|} \sum_{G_i \in \mathcal{S}_k} h_{G_i}$$

where $\mathcal{S}_k$ is the set of support graphs with label $k$.
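A minimal PyTorch sketch of this recursion follows, assuming dense adjacency matrices, mean aggregation for AGG, a linear layer over concatenated features for COM, and mean pooling for READOUT; the paper's actual choices of AGG, COM, and READOUT may differ.

```python
import torch
import torch.nn as nn

class GNNEncoder(nn.Module):
    """Message-passing encoder: AGG = mean over neighbors, COM = linear layer
    over [self, aggregated] features, READOUT = mean pooling over nodes."""
    def __init__(self, in_dim, hid_dim, num_layers=2):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList(
            nn.Linear(2 * dims[l], dims[l + 1]) for l in range(num_layers))

    def forward(self, x, adj):
        # x: (n, in_dim) node attributes; adj: (n, n) dense adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = x
        for layer in self.layers:
            agg = (adj @ h) / deg                              # AGG
            h = torch.relu(layer(torch.cat([h, agg], dim=1)))  # COM
        return h.mean(dim=0)                                   # READOUT -> h_G

def class_prototypes(support_embs, support_labels, n_way):
    """Average the support embeddings of each class into a prototype c_k."""
    return torch.stack([support_embs[support_labels == k].mean(dim=0)
                        for k in range(n_way)])
```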
To predict the label of the query graph, the similarity between the query graph embedding and each prototype representation is measured by the $p$-th Wasserstein distance following the work in Ref. [36], which is the optimal cost of moving mass between two graph embeddings. The classification loss is defined as the average cross entropy between true labels and predictions based on the similarity, which can be formulated as

$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{|\mathcal{Q}|} \sum_{(G,\, y) \in \mathcal{Q}} \log \frac{\exp\big(\mathrm{sim}(h_G, c_y)\big)}{\sum_{k} \exp\big(\mathrm{sim}(h_G, c_k)\big)}$$

where sim denotes the Wasserstein similarity metric and $\mathcal{Q}$ is the query set.
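As an illustration, the sketch below instantiates sim with the closed-form 1-D $p$-Wasserstein distance between sorted embedding coordinates. This is only one simple way to realize the metric; Ref. [36] may compute the distance differently (e.g., over sets of node embeddings).

```python
import torch
import torch.nn.functional as F

def wasserstein_sim(query_emb, protos, p=2):
    """Negative p-Wasserstein distance as a similarity score. Viewing each
    d-dimensional embedding as a 1-D empirical distribution, the distance
    has a closed form: the p-norm between the sorted coordinate values."""
    q = query_emb.sort(dim=-1).values             # (d,)
    c = protos.sort(dim=-1).values                # (n_way, d)
    dist = (q.unsqueeze(0) - c).abs().pow(p).mean(dim=-1).pow(1.0 / p)
    return -dist                                  # higher means more similar

def classification_loss(query_embs, query_labels, protos):
    """Average cross entropy over the query set, with logits given by the
    Wasserstein similarity to each class prototype."""
    logits = torch.stack([wasserstein_sim(q, protos) for q in query_embs])
    return F.cross_entropy(logits, query_labels)
```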
Because contrastive learning can maximize the agreement between the input data and its positive view while minimizing the agreement with the negative view, two automatic augmentations are employed to generate a pair of differentiable views for the respective goals, which reduces the need for unauthorized operations on the original data. Specifically, the positive augmentation operation preserves the original topology of the sample graph $G$ and masks all the node features to form a positive view $G^+$, which counteracts the tendency of node features to overwhelm graph structure information in representation learning. On the other hand, the negative augmentation operation generates a negative view $G^-$ by random node-dropping and edge-perturbation. Both operations follow an i.i.d. uniform distribution with perturbation ratio $\rho$ for node-dropping and edge-perturbation. Edge-perturbation randomly drops existing edges, then adds the same number of random edges back into $G^-$. To form $G^-$ as a small subgraph of $G$ with a few noisy edges, $\rho$ is set to 0.8 by default. Moreover, it is stated in Ref. [23] that the structural information of graph data consists of both local and global dimensions, meaning that some attributes of a graph depend on its substructure while others depend more on the global structure. As generalization is the main challenge for meta-learning on novel test domains, randomly treating a small subgraph as the negative example helps predictive models generalize beyond the limited training data. It should be noted that the negative view of one sample graph is also treated as a negative view for the remaining samples (i.e., for a query set containing $M$ samples, there are $M$ negative views for each sample graph).
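The two views might be generated as follows. This sketch assumes dense adjacency matrices and uses a single ratio `rho` for both node-dropping and edge-perturbation, which is our reading of the text rather than the authors' released code.

```python
import torch

def positive_view(x, adj):
    """Positive view G+: keep the topology intact, mask all node features."""
    return torch.zeros_like(x), adj

def negative_view(x, adj, rho=0.8):
    """Negative view G-: i.i.d. node-dropping, then edge-perturbation that
    removes a rho-fraction of remaining edges and adds the same number of
    random edges back."""
    keep = torch.rand(x.size(0)) > rho            # node-dropping
    if not keep.any():
        keep[0] = True                            # keep at least one node
    x2, a2 = x[keep].clone(), adj[keep][:, keep].clone()

    edges = a2.triu(diagonal=1).nonzero()         # undirected edge list
    drop = edges[torch.rand(len(edges)) < rho]    # edges to remove
    a2[drop[:, 0], drop[:, 1]] = 0
    a2[drop[:, 1], drop[:, 0]] = 0

    n = a2.size(0)
    for _ in range(len(drop)):                    # add the same number back
        i, j = torch.randint(n, (2,)).tolist()
        if i != j:
            a2[i, j] = a2[j, i] = 1
    return x2, a2
```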
As introduced in Ref. [37], we apply a momentum encoder $f_{\theta'}$ for projecting the contrastive views, which behaves similarly to the main encoder since its parameters $\theta'$ are a moving average of $\theta$. Given a query graph $G_i$, the contrastive loss aims to maximize its agreement with $G_i^+$ while minimizing its agreement with all the negative views $\{G_j^-\}_{j=1}^{M}$, which can be formulated as

$$\mathcal{L}_{\mathrm{con}} = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{\exp\big(\mathrm{sim}(h_{G_i}, h_{G_i^+})\big)}{\exp\big(\mathrm{sim}(h_{G_i}, h_{G_i^+})\big) + \sum_{j=1}^{M} \exp\big(\mathrm{sim}(h_{G_i}, h_{G_j^-})\big)}$$

where $M$ denotes the size of the query set, the view embeddings $h_{G^+}$ and $h_{G^-}$ are produced by the momentum encoder $f_{\theta'}$, and the negative views are generated with perturbation ratio $\rho$. By minimizing $\mathcal{L}_{\mathrm{con}}$ w.r.t. $\theta$, we force the main encoder to maintain the complete structural information in the embedding and produce more generalized prototypical networks. Thus, the overall loss is the combination of the classification loss and the contrastive loss:

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda \mathcal{L}_{\mathrm{con}}$$
where $\lambda$ is a hyper-parameter that balances the two terms. The detailed learning process is described in Algorithm 1, and all notations used in this paper are listed in Table 1.
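The sketch below illustrates the momentum update and the contrastive term, with cosine similarity standing in for the agreement function and `lam` for $\lambda$; these concrete choices are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(f_main, f_mom, m=0.999):
    """theta' <- m * theta' + (1 - m) * theta: the momentum encoder tracks
    a moving average of the main encoder's parameters."""
    for p, p_mom in zip(f_main.parameters(), f_mom.parameters()):
        p_mom.mul_(m).add_(p, alpha=1 - m)

def contrastive_loss(h_query, h_pos, h_neg):
    """InfoNCE-style loss: each query embedding should agree with its own
    positive view (index 0 in the logits) and disagree with all M negative
    views from the query set."""
    pos = F.cosine_similarity(h_query, h_pos, dim=-1).unsqueeze(1)   # (M, 1)
    neg = F.cosine_similarity(h_query.unsqueeze(1),
                              h_neg.unsqueeze(0), dim=-1)            # (M, M)
    logits = torch.cat([pos, neg], dim=1)
    target = torch.zeros(len(h_query), dtype=torch.long)             # positives
    return F.cross_entropy(logits, target)

def overall_loss(l_cls, l_con, lam=0.5):
    """L = L_cls + lambda * L_con."""
    return l_cls + lam * l_con
```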
Table 1: List of notations used in this paper.

Symbol | Description
$G=(U,E,A)$ | Undirected unweighted graph
$U$ | Set of nodes
$N(u)$ | Set of node $u$'s neighbors
$E$ | Set of edges
$A$ | Set of node attributes
$y$ | Graph label
$\mathcal{S}$ | Support dataset
$\mathcal{Q}$ | Query dataset
$f_\theta$ | Main graph encoder
$f_{\theta'}$ | Momentum graph encoder
$h_G$ | Graph embedding
$h_u^{(l)}$ | Node embedding at the $l$-th GNN layer
$c_k$ | Graph prototype representation
$G^+$ | Graph positive augmentation view
$G^-$ | Graph negative augmentation view
$\rho$ | Perturbation ratio of $G^-$
$\lambda$ | Regularization hyper-parameter
Figure 3: t-SNE visualization comparison for the DBLP dataset. Each class is represented in a different color.
Figure 4: Influence of the perturbation ratio $\rho$. The range of $\rho$ is set from 0.1 to 0.9.
Table 2: Statistics of datasets. For each dataset, we report the number of graphs, the average numbers of nodes and edges, the dimension of node attributes, and the number of classes for training/testing.

Dataset | # Graphs | Avg. # nodes | Avg. # edges | Attribute dim. | Classes (train/test)
Amazon-Clothing | 2000 | 32.15 | 192.50 | 9034 | 10/10
DBLP | 2000 | 47.25 | 318.45 | 7202 | 10/10
Letter-High | 2250 | 4.67 | 4.50 | 2 | 11/4
TRIANGLES | 2000 | 20.85 | 35.50 | 1 | 7/3
Table 3: Accuracy (%) with standard deviation for baselines and our method. We tested 100 N-way K-shot tasks on both the Amazon-Clothing and DBLP datasets. The best results in every setting are achieved by SE-GCL (GAT), shown in the last row.

Method | Amazon-Clothing 5-way 5-shot | Amazon-Clothing 5-way 10-shot | Amazon-Clothing 8-way 5-shot | DBLP 5-way 5-shot | DBLP 5-way 10-shot | DBLP 8-way 5-shot
WL kernel | 56.40±2.23 | 65.24±1.37 | 49.47±2.64 | 57.12±2.44 | 65.52±1.71 | 50.35±2.39
GIN | 63.25±1.63 | 71.24±1.57 | 55.47±3.34 | 66.10±2.41 | 72.38±1.44 | 57.13±2.88
MAML (GCN) | 70.72±3.88 | 76.62±2.35 | 60.70±4.53 | 73.12±4.65 | 77.69±2.89 | 63.19±5.12
MAML (GAT) | 70.66±3.53 | 76.68±2.51 | 60.27±4.49 | 74.10±4.19 | 78.03±3.44 | 62.80±3.99
PN (GCN) | 70.18±1.19 | 77.43±1.87 | 63.17±2.14 | 74.32±2.49 | 79.79±2.19 | 64.49±3.19
PN (GAT) | 71.22±2.43 | 77.06±2.15 | 63.89±2.94 | 74.91±3.29 | 80.29±2.34 | 64.52±3.52
SE-GCL (GCN) | 74.98±2.01 | 80.22±1.55 | 66.37±1.99 | 77.31±2.17 | 83.40±1.14 | 67.59±2.86
SE-GCL (GAT) | 75.02±2.90 | 81.76±2.36 | 66.92±2.43 | 78.16±3.09 | 84.75±1.82 | 68.25±3.20
Table 4: Accuracy of GSM and our method. We tested 100 N-way K-shot tasks on both the Letter-High (4-way) and TRIANGLES (3-way) datasets.