Heterogeneous graphs contain multiple types of entities and relations, which are capable of modeling complex interactions. Embedding on heterogeneous graphs has become an essential tool for analyzing and understanding such graphs. Although these meticulously designed methods make progress, they are limited by model design and computational resources, making it difficult to scale to large-scale heterogeneous graph data and hindering the application and promotion of these methods. In this paper, we propose Restage, a relation structure-aware hierarchical heterogeneous graph embedding framework. Under this framework, embedding only a smaller-scale graph with existing graph representation learning methods is sufficient to obtain node representations on the original heterogeneous graph. We consider two types of relation structures in heterogeneous graphs: interaction relations and affiliation relations. Firstly, we design a relation structure-aware coarsening method to successively coarsen the original graph to the top-level layer, resulting in a smaller-scale graph. Secondly, we allow any unsupervised representation learning methods to obtain node embeddings on the top-level graph. Finally, we design a relation structure-aware refinement method to successively refine the node embeddings from the top-level graph back to the original graph, obtaining node embeddings on the original graph. Experimental results on three public heterogeneous graph datasets demonstrate the enhanced scalability of representation learning methods by the proposed Restage. On another large-scale graph, the speed of existing representation learning methods is increased by up to eighteen times at most.
- Article type
- Year
- Co-author
Community structure is one of the most important features in real networks and reveals the internal organization of the vertices. Uncovering accurate community structure is effective for understanding and exploiting networks. Tolerance Granulation based Community Detection Algorithm (TGCDA) is proposed in this paper, which uses tolerance relation (namely tolerance granulation) to granulate a network hierarchically. Firstly, TGCDA relies on the tolerance relation among vertices to form an initial granule set. Then granules in this set which satisfied granulation coefficient are hierarchically merged by tolerance granulation operation. The process is finished till the granule set includes one granule. Finally, select a granule set with maximum granulation criterion to handle overlapping vertices among some granules. The overlapping vertices are merged into corresponding granules based on their degrees of affiliation to realize the community partition of complex networks. The final granules are regarded as communities so that the granulation for a network is actually the community partition of the network. Experiments on several datasets show our algorithm is effective and it can identify the community structure more accurately. On real world networks, TGCDA achieves Normalized Mutual Information (NMI) accuracy 17.55% higher than NFA averagely and on synthetic random networks, the NMI accuracy is also improved. For some networks which have a clear community structure, TGCDA is more effective and can detect more accurate community structure than other algorithms.