Scholar - SciOpen

Two-stage submodular maximization problem under cardinality constraint has been widely studied in machine learning and combinatorial optimization. In this paper, we consider knapsack constraint. In this problem, we give $n$ articles and $m$ categories, and the goal is to select a subset of articles that can maximize the function $F (S)$ . Function $F (S)$ consists of $m$ monotone submodular functions $f_{j}$ , $j = 1, 2, \dots, m$ , and each $f_{j}$ measures the similarity of each article in category $j$ . We present a constant-approximation algorithm for this problem.

Open Access Issue

Maximizing Submodular+Supermodular Functions Subject to a Fairness Constraint

Zhenning Zhang, Kaiqiao Meng, Donglei Du, Yang Zhou

Tsinghua Science and Technology 2024, 29(1): 46-55

Published: 21 August 2023

Abstract

PDF (1.4 MB) Collect Collected

Downloads：53

We investigate the problem of maximizing the sum of submodular and supermodular functions under a fairness constraint. This sum function is non-submodular in general. For an offline model, we introduce two approximation algorithms: A greedy algorithm and a threshold greedy algorithm. For a streaming model, we propose a one-pass streaming algorithm. We also analyze the approximation ratios of these algorithms, which all depend on the total curvature of the supermodular function. The total curvature is computable in polynomial time and widely utilized in the literature.

Open Access Issue

Fair k-Center Problem with Outliers on Massive Data

Fan Yuan, Luhong Diao, Donglei Du, Lei Liu

Tsinghua Science and Technology 2023, 28(6): 1072-1084

Published: 28 July 2023

Abstract

PDF (3.3 MB) Collect Collected

Downloads：54

The clustering problem of big data in the era of artificial intelligence has been widely studied. Because of the huge amount of data, distributed algorithms are often used to deal with big data problems. The distributed computing model has an attractive feature: it can handle massive datasets that cannot be put into the main memory. On the other hand, since many decisions are made automatically by machines in today’s society, algorithm fairness is also an important research area of machine learning. In this paper, we study two fair clustering problems: the centralized fair $k$ -center problem with outliers and the distributed fair $k$ -center problem with outliers. For these two problems, we have designed corresponding constant approximation ratio algorithms. The theoretical proof and analysis of the approximation ratio, and the running space of the algorithm are given.

Open Access Issue

Approximation Algorithm for the Balanced 2-Correlation Clustering Problem

Sai Ji, Dachuan Xu, Donglei Du, Ling Gai, Zhongrui Zhao

Tsinghua Science and Technology 2022, 27(5): 777-784

Published: 17 March 2022

Abstract

PDF (4 MB) Collect Collected

Downloads：140

The Correlation Clustering Problem (CorCP) is a significant clustering problem based on the similarity of data. It has significant applications in different fields, such as machine learning, biology, and data mining, and many different problems in other areas. In this paper, the Balanced $2$ -CorCP (B $2$ -CorCP) is introduced and examined, and a new interesting variant of the CorCP is described. The goal of this clustering problem is to partition the vertex set into two clusters with equal size, such that the number of disagreements is minimized. We first present a polynomial time algorithm for the B $2$ -CorCP on $M$ -positive edge dominant graphs $(M ⩾ 3)$ . Then, we provide a series of numerical experiments, and the results show the effectiveness of our algorithm.

Total 4