AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (3.3 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Fair k-Center Problem with Outliers on Massive Data

Beijing Institute for Scientific and Engineering Computing, Faculty of Science, Beijing University of Technology, Beijing 100124, China
Faculty of Management, University of New Brunswick, Fredericton, E3B 5A3, Canada
Show Author Information

Abstract

The clustering problem of big data in the era of artificial intelligence has been widely studied. Because of the huge amount of data, distributed algorithms are often used to deal with big data problems. The distributed computing model has an attractive feature: it can handle massive datasets that cannot be put into the main memory. On the other hand, since many decisions are made automatically by machines in today’s society, algorithm fairness is also an important research area of machine learning. In this paper, we study two fair clustering problems: the centralized fair k-center problem with outliers and the distributed fair k-center problem with outliers. For these two problems, we have designed corresponding constant approximation ratio algorithms. The theoretical proof and analysis of the approximation ratio, and the running space of the algorithm are given.

References

[1]
M. Kay, C. Matuszek, and S. A. Munson, Unequal representation and gender stereotypes in image search results for occupations, in Proc. 33rd Annu. ACM Conf. Human Factors in Computing Systems, Seoul, Republic of Korea, 2015, pp. 38193828.
[2]
M. Kleindessner, P. Awasthi, and J. Morgenstern, Fair k-center clustering for data summarization, in Proc. 36th Int. Conf. Machine Learning, Long Beach, CA, USA, 2019, pp. 34483457.
[3]
A. Chiplunkar, S. Kale, and S. N. Ramamoorthy, How to solve fair k-center in massive data models, in Proc. 37th Int. Conf. Machine Learning, Virtual Event, 2020, pp. 18771886.
[4]
T. F. Gonzalez, Clustering to minimize the maximum intercluster distance, Theor. Comput. Sci., vol. 38, pp. 293306, 1985.
[5]
D. S. Hochbaum and D. B. Shmoys, A best possible heuristic for the k-center problem, Math. Operat. Res., vol. 10, no. 2, pp. 180184, 1985.
[6]
W. L. Hsu and G. L. Nemhauser, Easy and hard bottleneck location problems, Discrete Appl. Math., vol. 1, no. 3, pp. 209215, 1979.
[7]
D. Z. Chen, J. Li, H. Liang, and H. Wang, Matroid and knapsack center problems, Algorithmica, vol. 75, no. 1, pp. 2752, 2016.
[8]
P. K. Agarwal and J. M. Phillips, An efficient algorithm for 2D Euclidean 2-center with outliers, in Proc. 16th Annu. European Symp. Algorithms, Karlsruhe, Germany, 2008, pp. 6475.
[9]
S. Guha, R. Rastogi, and K. Shim, Techniques for clustering massive data sets, in Clustering and Information Retrieval, W. Wu, H. Xiong, and S. Shekhar, eds. New York, NY, USA: Springer, 2004, pp. 3582.
[10]
M. Hassani, E. Müller, and T. Seidl, EDISKCO: Energy efficient distributed in-sensor-network k-center clustering with outliers, in Proc. 3rd Int. Workshop on Knowledge Discovery from Sensor Data, Paris, France, 2009, pp. 3948.
[11]
M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan, Algorithms for facility location problems with outliers, in Proc. 12th Annu. ACM-SIAM Symp. Discrete Algorithms, Washington, DC, USA, 2001, pp. 642651.
[12]
S. Guha, Y. Li, and Q. Zhang, Distributed partial clustering, ACM Trans. Parallel Comput., vol. 6, no. 3, p. 11, 2019.
[13]
X. Guo and S. Li, Distributed k-clustering for data with heavy noise, in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 78497857.
[14]
G. Malkomes, M. J. Kusner, W. Chen, K. Q. Weinberger, and B. Moseley, Fast distributed k-center clustering with outliers on massive data, in Proc. 28th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2015, pp. 10631071.
[15]
N. Ailon, R. Jaiswal, and C. Monteleoni, Streaming k-means approximation, in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1018.
[16]
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan, Clustering data streams: Theory and practice, IEEE Trans. Knowl. Data Eng., vol. 15, no. 3, pp. 515528, 2003.
[17]
R. M. McCutchen, and S. Khuller, Streaming algorithms for k-center clustering with outliers and with anonymity, in Proc. 11th Int. Workshop, APPROX 2008, and 12th Int. Workshop, RANDOM 2008 on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques, Boston, MA, USA, 2008, pp. 165178.
[18]
M. Shindler, A. Wong, and A. Meyerson, Fast and accurate k-means for large datasets, in Proc. 24th Int. Conf. Neural Information Processing Systems, Granada, Spain, 2011, pp. 23752383.
[19]
B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii, Scalable k-means++, arXiv preprint arXiv: 1203.6402, 2012.
[20]
M. F. Balcan, S. Ehrlich, and Y. Liang, Distributed k-means and k-median clustering on general topologies, in Proc. 26th Int. Conf. Neural Information Processing Systems, Lake, Tahoe, NV, USA, 2013, pp. 19952003.
[21]
A. Ene, S. Im, and B. Moseley, Fast clustering using MapReduce, in Proc. 17th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Diego, CA, USA, 2011, pp. 681689.
[22]
B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, Distributed submodular maximization: Identifying representative elements in massive data, in Proc. 26th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2013, pp. 20492057.
[23]
V. Cohen-Addad, C. Schwiegelshohn, and C. Sohler, Diameter and k-center in sliding windows, in Proc. 43rd Int. Colloquium on Automata, Languages, and Programming, Rome, Italy, 2016, pp. 19:19:12.
[24]
P. Pellizzoni, A. Pietracaprina, and G. Pucci, Dimensionality-adaptive k-center in sliding windows, in Proc. 2020 IEEE 7th Int. Conf. Data Science and Advanced Analytics (DSAA), Sydney, Australia, 2020, pp. 197206.
[25]
P. Pellizzoni, A. Pietracaprina, and G. Pucci, k-center clustering with outliers in sliding windows, Algorithms, vol. 15, no. 2, p. 52, 2022.
[26]
H. Ding, H. Yu, and Z. Wang, Greedy strategy works for k-center clustering with outliers and coreset construction, in Proc. 27th Annu. European Symp. Algorithms (ESA 2019), Munich/Garching, Germany, 2019, pp. 40:140:16.
[27]
H. Zarrabi-Zadeh, Core-preserving algorithms, in Proc. 20th Annu. Canadian Conference on Computational Geometry, Montreal, Canada, 2008, pp. 159162.
[28]
F. Yuan, L. Diao, D. Du, and L. Liu, Distributed fair k-Center clustering problems with outliers, in Proc. 22nd Int. Conf. Parallel and Distributed Computing, Applications and Technologies, Guangzhou, China, 2022, pp. 430440.
[29]
S. Kale, Small space stream summary for matroid center, arXiv preprint arXiv: 1810.06267, 2020.
[30]
L. E. Celis, V. Keswani, D. Straszak, A. Deshpande, T. Kathuria, and N. K. Vishnoi, Fair and diverse DPP-based data summarization, in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 716725.
[31]
S. Ahmadian, A. Epasto, R. Kumar, and M. Mahdian, Clustering without over-representation, in Proc. 25th ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 2019, pp. 267275.
[32]
S. K. Bera, D. Chakrabarty, N. J. Flores, and M. Negahbani, Fair algorithms for clustering, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 49544965.
[33]
S. Bandyapadhyay, T. Inamdar, S. Pai, and K. Varadarajan, A constant approximation for colorful k-center, arXiv preprint arXiv: 1907.08906, 2019.
[34]
F. Chierichetti, R. Kumar, S. Lattanzi, and S. Vassilvitskii, Fair clustering through fairlets, in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 50295037.
[35]
X. Jia, K. Sheth, and O. Svensson, Fair colorful k-center clustering, Math. Program., vol. 192, nos. 1&2, pp. 339360, 2022.
[36]
M. Schmidt, C. Schwiegelshohn, and C. Sohler, Fair coresets and streaming algorithms for fair k-means, in Proc. 17th Int. Workshop on Approximation and Online Algorithms, Munich, Germany, 2019, pp. 232251.
[37]
I. O. Bercea, M. Groβ, S. Khuller, A. Kumar, C. Rösner, D. R. Schmidt, and M. Schmidt, On the cost of essentially fair clusterings, in Proc. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Cambridge, MA, USA, 2019, pp. 18:118:22.
[38]
M. Böhm, A. Fazzone, S. Leonardi, C. Menghini, and C. Schwiegelshohn, Algorithms for fair k-clustering with multiple protected attributes, Oper. Res. Lett., vol. 49, no. 5, pp. 787789, 2021.
[39]
M. Hajiaghayi, R. Khandekar, and G. Kortsarz, Budgeted red-blue median and its generalizations, in Proc. 18th Annu. European Symp. Algorithms, Liverpool, UK, 2010, pp. 314325.
[40]
R. Krishnaswamy, A. Kumar, V. Nagarajan, Y. Sabharwal, and B. Saha, The matroid median problem, in Proc. 2011 Annu. ACM-SIAM Symp. Discrete Algorithms (SODA), San Francisco, CA, USA, 2011, pp. 11171130.
[41]
M. E. Dyer and A. M. Frieze, A simple heuristic for the p-centre problem, Oper. Res. Lett., vol. 3, no. 6, pp. 285288, 1985.
Tsinghua Science and Technology
Pages 1072-1084
Cite this article:
Yuan F, Diao L, Du D, et al. Fair k-Center Problem with Outliers on Massive Data. Tsinghua Science and Technology, 2023, 28(6): 1072-1084. https://doi.org/10.26599/TST.2023.9010013

655

Views

54

Downloads

0

Crossref

0

Web of Science

0

Scopus

0

CSCD

Altmetrics

Received: 04 October 2022
Revised: 02 December 2022
Accepted: 13 March 2023
Published: 28 July 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return