Institute of Mathematics, Hebei University of Technology, Tianjin 300401, China
School of Mathematics and Statistics, Shandong Normal University, Jinan 250358, China
College of Statistics and Data Science, Beijing University of Technology, Beijing 100124, China
Department of Operations Research and Information Engineering, Beijing University of Technology, Beijing 100124, China
Show Author Information
Hide Author Information
Abstract
Min-max disagreements are an important generalization of the correlation clustering problem (CorCP). It can be defined as follows. Given a marked complete graph , each edge in the graph is marked by a positive label "" or a negative label "" based on the similarity of the connected vertices. The goal is to find a clustering of vertices , so as to minimize the number of disagreements at the vertex with the most disagreements. Here, the disagreements are the positive cut edges and the negative non-cut edges produced by clustering . This paper considers two robust min-max disagreements: min-max disagreements with outliers and min-max disagreements with penalties. Given parameter , we first provide a threshold-based iterative clustering algorithm based on LP-rounding technique, which is a -bi-criteria approximation algorithm for both the min-max disagreements with outliers and the min-max disagreements with outliers on one-sided complete bipartite graphs. Next, we verify that the above algorithm can achieve an approximation ratio of for min-max disagreements with penalties when we set parameter .
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
J. P.Hou, A.Emad, G. J.Puleo, J.Ma, and O.Milenkovic, A new correlation clustering method for cancer mutation analysis, Bioinformatics, vol. 32, no. 24, pp. 3717–3728, 2016.
J.Jafarov, S.Kalhan, K.Makarychev, and Y.Makarychev, Correlation clustering with asymmetric classification errors, in Proc. 37th Int. Conf. Machine Learning, Virtual, 2020, pp. 4641–4650.
A.Ukkonen, Crowdsourced correlation clustering with relative distance comparisons, in Proc. 2017 IEEE Int. Conf. Data Mining (ICDM), New Orleans, LA, USA, 2017, pp. 1117–1122.
N.Veldt, D. F.Gleich, and A.Wirth, A correlation clustering framework for community detection, in Proc. 2018 World Wide Web Conf., Lyon, France, 2018, pp. 439–448.
N.Ailon, N.Avigdor-Elgrabli, E.Liberty, and A. V.Zuylen, Improved approximation algorithms for bipartite correlation clustering, SIAM J. Comput., vol. 41, no. 5, pp. 1110–1121, 2012.
M.Bressan, N.Cesa-Bianchi, A.Paudice, and F.Vitale, Correlation clustering with adaptive similarity queries, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 12531–12540.
J. H.Lange, A.Karrenbauer, and B.Andres, Partial optimality and fast lower bounds for weighted correlation clustering, in Proc. 35th Int. Conf. Machine Learning (ICML), Stockholm, Sweden, 2018, pp. 2892–2901.
E.Thiel, M. H.Chehreghani, and D.Dubhashi, A non-convex optimization approach to correlation clustering, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 5159–5166, 2019.
S.Chawla, K.Makarychev, T.Schramm, and G.Yaroslavtsev, Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs, in Proc. Forty-Seventh Annual ACM Symp. on Theory of Computing, Portland, OR, USA, 2015, pp. 219–228.
G. J.Puleo and O.Milenkovic, Correlation clustering and biclustering with locally bounded errors, IEEE Trans. Inf. Theory, vol. 64, no. 6, pp. 4105–4119, 2018.
M.Charikar, N.Gupta, and R.Schwartz, Local guarantees in graph cuts and clustering, in Proc. 19th Int. Conf. Integer Programming and Combinatorial Optimization, Waterloo, Canada, 2017, pp. 136–147.
, R.Krishnaswamy, and N.Rajaraman, Robust correlation clustering, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019), D.Achlioptas and L. A.Végh, eds. Dagstuhl, Germany: Schloss Dagstuhl, 2019, pp. 33:1–33:18.
[18]
M. H.Chehreghani, Hierarchical correlation clustering and tree preserving embedding, arXiv preprint arXiv: 2002.07756, 2020.
K.Makarychev, Y.Makarychev, and A.Vijayaraghavan, Correlation clustering with noisy partial information, in Proc. 28th Conf. Learn. Theory (COLT), Paris, France, 2015, pp. 1321–1342.
C.Mathieu and W.Schudy, Correlation clustering with noisy input, in Proc. 2010 Annu. ACM-SIAM Symp. on Discrete Algorithms (SODA), Austin, TX, USA, 2010, pp. 712–728.
I.Giotis and V.Guruswami, Correlation clustering with a fixed number of clusters, in Proc. Seventeenth Annu. ACM-SIAM Symp. on Discrete Algorithm (SODA), Miami, FL, USA, 2006, pp. 1167–1176.
A.Aboud and Y.Rabani, Correlation clustering with penalties and approximating the reordering buffer management problem, Doctor dissertation, Computer Sicence Department, Technion, Haifa, Israel, 2008.
Ji S, Li M, Liang M, et al. Robust Correlation Clustering Problem with Locally Bounded Disagreements. Tsinghua Science and Technology, 2024, 29(1): 66-75. https://doi.org/10.26599/TST.2023.9010027
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Theoretical analysis of the bi-criteria ratio
Let be the set of outliers and be the center set output by Algorithm 1. Moreover, let be the clustering of . From Algorithm 1 and the constraints of Formula (
2
), we have Property 1.
Property 1 The solutions returned by Algorithm 1 satisfies:
(1) For each , we have ;
(2) For each , we have ;
(3) For each , we have ;
(4) For each , we have .
Lemma 1 The size of set is no more than .
Proof From the forth constraint of Programming (2), we have
Moreover, from Property 1-(1), we can obtain that
The lemma is concluded.
■
Next, we analyze the value of for each . As shown in
Fig. 1
, for each vertex , and , denote
10.26599/TST.2023.9010027.F001
Relationship between vertex q and several special sets.
Similar, for each , denote
Next, we analyze the number of disagreements at each vertex based on the above sets.
Positive cut edges
In this subsection, we analyze the number of disagreements generated by positive cut edges at each vertex .
Lemma 2 For each vertex , the number of positive cut edge , is no more than
Proof From the construction of sets , and , for each cut edge , we have
Moreover, from Property 1-(3), we can obtain
which indicates that the disagreement generated by is no more than
We sum all the , and we have that the number of positive cut edge is no more than
The lemma is concluded.
■
Lemma 3 For each vertex , the number of positive cut edge is no more than
Proof Given a vertex and any , for each positive edge , from Property 1-(3) we receive that
Moreover, for each negative edge , from Property 1-(4) we achieve that
where is any vertex in set .
From Line of Algorithm 1, we have that
Then, the number of positive cut edge equals and it is no more than
Above all, we get
Therefore, we have
The lemma is concluded.
■
Lemma 4 For each vertex , the number of positive cut edges , is no more than
Proof From Line of Algorithm 1, we receive that
Therefore, we have
Similar to Lemma 3, we only need to analyze the lower bound of for each edge .
For each positive edge with , from Property 1-(3) and the first constraint of Formula (
2
), we can get
where is any vertex in set .
Moreover, from Property 1-(4) and the first constraint of Formula (
2
), for each with , we achieve
Above all, we obtain that
The lemma is concluded.
■
Negative non-cut edges
In this subsection, we analyze the number of negative non-cut edges for each vertex .
Lemma 5 For each vertex , the number of negative edges , is no more than
Proof From Property 1-(4) and the first constraint of Formula (
2
), for each , we have
Combining Formulas (
3
) and (
4
), we have
The lemma is concluded.
■
From Lemmas 2–5, we can obtain the upper bound of disagreements generated by each vertex.
Lemma 6 For each vertex , we have
Proof For each vertex , from Lemmas 2–4, we have
Combining Lemma 5 and Formula (
5
), we can get
The lemma is concluded.
■
From Lemmas 1 and 6, we can obtain Theorem 1.
Theorem 1 For each instance of the min-max disagreements with outliers, Algorithm 1 can return a set with as well as a clustering of , such that
where and are the set of outliers and the clustering of non-outliers returned by the optimal algorithm, respectively.
10.26599/TST.2023.9010027.F001
Relationship between vertex q and several special sets.