Scholar - SciOpen

Open Access Issue

Restage: Relation Structure-Aware Hierarchical Heterogeneous Graph Embedding

Huanjing Zhao, Pinde Rui, Jie Chen, Shu Zhao, Yanping Zhang

Tsinghua Science and Technology 2025, 30(1): 198-214

Published: 11 September 2024

Abstract

PDF (6.1 MB) Collect Collected

Downloads：19

Heterogeneous graphs contain multiple types of entities and relations, which are capable of modeling complex interactions. Embedding on heterogeneous graphs has become an essential tool for analyzing and understanding such graphs. Although these meticulously designed methods make progress, they are limited by model design and computational resources, making it difficult to scale to large-scale heterogeneous graph data and hindering the application and promotion of these methods. In this paper, we propose Restage, a relation structure-aware hierarchical heterogeneous graph embedding framework. Under this framework, embedding only a smaller-scale graph with existing graph representation learning methods is sufficient to obtain node representations on the original heterogeneous graph. We consider two types of relation structures in heterogeneous graphs: interaction relations and affiliation relations. Firstly, we design a relation structure-aware coarsening method to successively coarsen the original graph to the top-level layer, resulting in a smaller-scale graph. Secondly, we allow any unsupervised representation learning methods to obtain node embeddings on the top-level graph. Finally, we design a relation structure-aware refinement method to successively refine the node embeddings from the top-level graph back to the original graph, obtaining node embeddings on the original graph. Experimental results on three public heterogeneous graph datasets demonstrate the enhanced scalability of representation learning methods by the proposed Restage. On another large-scale graph, the speed of existing representation learning methods is increased by up to eighteen times at most.

Open Access Issue

Hierarchical Covering Algorithm

Jie Chen, Shu Zhao, Yanping Zhang

Tsinghua Science and Technology 2014, 19(1): 76-81

Published: 07 February 2014

Abstract

PDF (370.7 KB) Collect Collected

Downloads：31

The concept of deep learning has been applied to many domains, but the definition of a suitable problem depth has not been sufficiently explored. In this study, we propose a new Hierarchical Covering Algorithm (HCA) method to determine the levels of a hierarchical structure based on the Covering Algorithm (CA). The CA constructs neural networks based on samples’ own characteristics, and can effectively handle multi-category classification and large-scale data. Further, we abstract characters based on the CA to automatically embody the feature of a deep structure. We apply CA to construct hidden nodes at the lower level, and define a fuzzy equivalence relation $\bar{R}$ on upper spaces to form a hierarchical architecture based on fuzzy quotient space theory. The covering tree naturally becomes from $\bar{R}$ . HCA experiments performed on MNIST dataset show that the covering tree embodies the deep architecture of the problem, and the effects of a deep structure are shown to be better than having a single level.

Open Access Issue

Community-Based User Domain Model Collaborative Recommendation Algorithm

Fulan Qian, Yanping Zhang, Yuan Zhang, Zhen Duan

Tsinghua Science and Technology 2013, 18(4): 353-359

Published: 05 August 2013

Abstract

PDF (384.4 KB) Collect Collected

Downloads：25

Collaborative Filtering (CF) is a commonly used technique in recommendation systems. It can promote items of interest to a target user from a large selection of available items. It is divided into two broad classes: memory-based algorithms and model-based algorithms. The latter requires some time to build a model but recommends online items quickly, while the former is time-consuming but does not require pre-building time. Considering the shortcomings of the two types of algorithms, we propose a novel Community-based User domain Collaborative Recommendation Algorithm (CUCRA). The idea comes from the fact that recommendations are usually made by users with similar preferences. The first step is to build a user-user social network based on users’ preference data. The second step is to find communities with similar user preferences using a community detective algorithm. Finally, items are recommended to users by applying collaborative filtering on communities. Because we recommend items to users in communities instead of to an entire social network, the method has perfect online performance. Applying this method to a collaborative tagging system, experimental results show that the recommendation accuracy of CUCRA is relatively good, and the online time-complexity reduces to $O (n)$ .

Open Access Issue

MICkNN: Multi-Instance Covering kNN Algorithm

Shu Zhao, Chen Rui, Yanping Zhang

Tsinghua Science and Technology 2013, 18(4): 360-368

Published: 05 August 2013

Abstract

PDF (364.5 KB) Collect Collected

Downloads：12

Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks for mining from ambiguous data known as multi-instance problem. In multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in the positive bag are not labeled, each positive bag is an ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find true positive instances in positive bags and convert the multi-instance problem to the supervised problem, and get the labels of test bags according to predict the labels of unknown instances. In this paper, we aim at mining the multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, constructive covering algorithm is utilized to restructure the structure of the original multi-instance data at first. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and sphere neighbors obtained from last two steps. Experimental results demonstrate the proposed algorithm is competitive with most of the state-of-the-art multi-instance methods both in classification accuracy and running time.