School of Mathematics and Information Science, Weifang University, Weifang 261061, China
Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
School of Mathematical Sciences, Qufu Normal University, Qufu 273165, China
Abstract
Many practical problems require deciding not only whether an element is selected but also to what extent it is selected, which poses a challenge for submodular optimization. In this study, we consider maximizing a monotone nondecreasing, non-submodular function on the integer lattice under a cardinality constraint. We first design a two-pass streaming algorithm by refining the estimation interval of the optimal value. For each element, the algorithm decides not only whether to save the element but also how many copies of it to reserve. We then introduce binary search as a subroutine to reduce the time complexity. Next, we obtain a one-pass streaming algorithm by dynamically updating the estimation interval of the optimal value. Finally, we improve the memory complexity of this algorithm.
Two-pass streaming algorithm
In the streaming model, we cannot decide immediately whether the value of an arriving element exceeds the maximum marginal value in a given iteration. A natural idea is to compare the marginal gain produced by the arriving element with OPT in a suitable way: a specified threshold determines whether, and to what extent, the arriving item is selected. We combine the exponential-growth method for estimating OPT with binary search to obtain an approximation algorithm for NMCC. The detailed description is given in Algorithms 1 and 2 below; a concrete sketch of the pattern follows.
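To make the control flow concrete, the following is a minimal Python sketch of this two-pass pattern under our own assumptions; it is an illustration, not the paper's pseudocode. Here `f` is a value oracle on integer vectors (dicts from elements to copy counts), `k` is the cardinality bound, `eps` is the accuracy parameter, `binary_search_copies` stands in for the BinarySearch subroutine, and the threshold `v / (2 * k)` is an illustrative choice.

```python
from collections import defaultdict

def binary_search_copies(f, x, e, cap, threshold):
    """Largest l in [1, cap] whose average marginal gain per copy,
    (f(x + l * chi_e) - f(x)) / l, is at least `threshold`; 0 if none.
    Assumes the average gain is non-increasing in l (DR-style behavior),
    so binary search needs only O(log cap) oracle queries."""
    lo, hi, best = 1, cap, 0
    base = f(dict(x))
    while lo <= hi:
        mid = (lo + hi) // 2
        y = dict(x)
        y[e] = y.get(e, 0) + mid
        if f(y) - base >= threshold * mid:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

def two_pass(stream, f, k, eps):
    # Pass 1: the largest singleton value m gives a crude estimate of OPT,
    # so geometric guesses (1 + eps)^i are enumerated over [m, k * m]
    # (the exact window in the paper depends on the DR ratio).
    elems = list(stream)
    m = max(f({e: 1}) for e in elems)
    if m <= 0:
        return defaultdict(int)  # degenerate stream
    guesses, v = [], m
    while v <= k * m:
        guesses.append(v)
        v *= 1 + eps
    # Pass 2: one candidate vector per guess; for each arriving element,
    # binary search decides how many copies to reserve.
    cands = {v: defaultdict(int) for v in guesses}
    for e in elems:
        for v, x in cands.items():
            room = k - sum(x.values())
            if room > 0:
                l = binary_search_copies(f, x, e, room, v / (2 * k))
                if l > 0:
                    x[e] += l
    return max(cands.values(), key=lambda x: f(dict(x)))
```

Because every candidate grows independently, the second pass never needs to revisit a decision, which is what makes the threshold rule compatible with a single pass over the stream per phase.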
Lemma 1 Suppose $x^j$ is the output of the $j$-th iteration in Algorithm 2; then Formula (2) holds.
Proof From Algorithm 2, the initial vector is the zero vector, so Formula (2) holds trivially. Assume that Formula (2) holds for the $j$-th iteration; we only need to prove that it holds for the $(j+1)$-th iteration. Letting $x'$ be the output returned from Algorithm 1 and combining its guarantee with the induction hypothesis, we conclude that Formula (2) also holds for the $(j+1)$-th iteration, which completes the proof.
■
In the following Lemma 2, we consider the marginal increment of the optimal solution with respect to the output vector.
Lemma 2 Suppose $x$ is the final output of Algorithm 2 and $x^*$ is an optimal solution; then the marginal increment between $x^*$ and $x$ is bounded as stated.
Proof Assume that Algorithm 2 holds a current vector right before element $e$'s arrival, and let $l$ be the output of the BinarySearch subroutine. Then $l$ satisfies two inequalities: the marginal gain of the $l$ accepted copies is at least the threshold times $l$, while the marginal gain of $l+1$ copies falls below the threshold times $l+1$. Summing the resulting bounds over the arriving elements yields the claim. This completes the proof.
■
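Since the displayed inequalities are not reproduced above, the two BinarySearch guarantees can be written, in assumed notation that mirrors the sketch given earlier, as follows; this is a reconstruction of the standard form of such guarantees, not a quotation of the paper's formulas.

```latex
f(x + l\,\chi_e) - f(x) \;\ge\; \tau\, l ,
\qquad
f\bigl(x + (l+1)\,\chi_e\bigr) - f(x) \;<\; \tau\,(l+1)
```

Here $x$ is the vector held just before element $e$ arrives, $\chi_e$ is the unit vector of $e$, $\tau$ is the current threshold, and $l$ is the accepted number of copies.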
Based on Lemmas 1 and 2, we can analyze the performance of Algorithm 2. In the analysis, Lemma 1 is used for the case in which the output exhausts the cardinality budget, and Lemma 2 is used for the case in which it does not.
Theorem 1 For any $\varepsilon > 0$ and any given function $f$, Algorithm 2 is a two-pass algorithm with the stated approximation ratio, memory complexity, and number of queries per element.
Proof Suppose $x^*$ is an optimal solution. From the definition of the DR ratio, we obtain Formula (5); moreover, Formula (6) holds. Combining Formulas (5) and (6) relates OPT to the largest singleton value. Since the guesses grow geometrically, there exists a guess satisfying Formula (7), and from Formulas (6) and (7) we obtain Formula (8).
Next, we denote the output of Algorithm 2 for this guess by $x$ and consider the following two cases:
(1) The output exhausts the cardinality budget. According to Lemma 1, we obtain Formula (9).
(2) The output does not exhaust the budget. From Lemma 2 and the weak DR ratio, and since every unselected copy failed the threshold test, we again obtain Formula (9).
Combining Formulas (8) and (9), we obtain the desired bound for this guess. From Algorithm 2, the returned vector is the best one over all guesses; thus, we can conclude the claimed approximation ratio. The proof is completed.
■
One-pass streaming algorithm
For a streaming algorithm, the number of passes over the data is an important index of performance. Naturally, we want to obtain a one-pass streaming algorithm by dynamically updating the maximum value over the standard unit vectors as elements arrive and, at the same time, updating the estimation interval of OPT. We design such a one-pass streaming algorithm for NMCC, stated as Algorithm 3. Here, OPT is unknown at the beginning; however, the largest singleton value observed so far bounds OPT from below, and monotonicity bounds it from above in terms of $k$ and this value, so the dynamically maintained set of guesses always contains one that estimates OPT well.
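As a companion to the two-pass sketch, the following Python fragment (reusing the `binary_search_copies` helper defined there) illustrates the one-pass pattern: guesses are materialized lazily as the largest singleton value `m` grows, and guesses that leave the current window are discarded together with their candidate vectors. The window `[m, 2*k*m]` is an illustrative choice rather than the paper's exact interval.

```python
import math
from collections import defaultdict

def one_pass(stream, f, k, eps):
    # Guesses enter and leave the window as m grows; binary_search_copies
    # is the helper from the two-pass sketch above.
    m, cands = 0.0, {}
    for e in stream:
        m = max(m, f({e: 1}))
        if m > 0:
            lo = math.ceil(math.log(m) / math.log(1 + eps))
            hi = math.floor(math.log(2 * k * m) / math.log(1 + eps))
            active = {(1 + eps) ** i for i in range(lo, hi + 1)}
        else:
            active = set()
        for v in list(cands):
            if v not in active:
                del cands[v]          # guess left the estimation window
        for v in active:
            cands.setdefault(v, defaultdict(int))
        for v, x in cands.items():
            room = k - sum(x.values())
            if room > 0:
                l = binary_search_copies(f, x, e, room, v / (2 * k))
                if l > 0:
                    x[e] += l
    return max(cands.values(), key=lambda x: f(dict(x)), default={})
```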
Theorem 2 For any given $\varepsilon > 0$ and function $f$, Algorithm 3 is a one-pass algorithm with the stated approximation ratio, memory complexity, and number of queries per element.
Proof Similar to the proof of Theorem 1, we can find a guess that estimates OPT sufficiently well; let $x$ be the output solution with respect to this guess. Because the vector returned by Algorithm 3 is at least as good as $x$, we can conclude the claimed approximation ratio.
■
Improved one-pass streaming algorithm
Aside from the number of passes over the data, memory complexity is another important index of the performance of a streaming algorithm. In Algorithm 3, the memory complexity depends on the number of maintained guesses and on the cardinality constraint $k$. We therefore reduce the number of guesses that must be kept in order to reduce the memory complexity. The improved one-pass streaming algorithm is presented below, preceded by a sketch of the pruning idea.
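Before the formal statement, the following Python fragment sketches one plausible form of the pruning step; the retained window and the use of the best current function value as an extra lower bound are our own illustrative choices, not the paper's constants.

```python
def prune_guesses(cands, f, m, k):
    """Drop guesses that can no longer correspond to OPT.

    cands: dict mapping guess value -> candidate vector (element -> copies).
    m:     largest singleton value observed so far.
    Both the window [m, 2 * k * m] and the use of the best current
    function value as an extra lower bound are illustrative.
    """
    best = max((f(dict(x)) for x in cands.values()), default=0.0)
    lower = max(m, best)   # OPT is at least the best value found so far
    upper = 2 * k * m
    return {v: x for v, x in cands.items() if lower <= v <= upper}
```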
Theorem 3 For any given $\varepsilon > 0$ and function $f$, Algorithm 4 is a one-pass approximation algorithm for NMCC, with the same approximation guarantee as Algorithm 3, improved memory complexity, and the stated query times per item.
Proof The approximation ratio can be obtained by the same method as in Theorem 2, so we omit that argument and turn to the analysis of the memory complexity of Algorithm 4. On the one hand, after the $j$-th iteration, the algorithm maintains both a lower and an upper estimate of OPT, which together bound the range of viable guesses. On the other hand, after the $j$-th iteration, any guess that falls below the updated lower estimate is dominated; thus, there is no need to save the candidate vectors associated with such guesses, and we need only consider the vectors corresponding to the guesses remaining in the estimation interval, whose number is bounded by the length of that interval on the geometric grid.
From Lemma 1, we obtain a bound on the size of each stored candidate vector. Then, for each iteration, the memory complexity is the number of retained guesses multiplied by the size of the candidate vector kept for each guess. For any arriving item, the number of retained guesses is bounded, and for each guess the BinarySearch subroutine issues only a logarithmic number of queries; thus, the per-item update time is bounded accordingly.
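For intuition about the guess count that drives the memory bound, a standard calculation over an illustrative window $[m, 2km]$ (our choice of interval, not necessarily the paper's) gives

```latex
\#\{\, i \in \mathbb{Z} : m \le (1+\varepsilon)^i \le 2km \,\}
\;\le\; \log_{1+\varepsilon}(2k) + 1
\;=\; \frac{\ln(2k)}{\ln(1+\varepsilon)} + 1
\;=\; O\!\left(\frac{\log k}{\varepsilon}\right)
```

using $\ln(1+\varepsilon) \ge \varepsilon/2$ for $\varepsilon \in (0, 1]$; since each retained guess stores a vector with at most $k$ units, this suggests a memory footprint on the order of $k \log k / \varepsilon$ under these assumptions.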