AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (1.5 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Article | Open Access

Decision Making in Team-Adversary Games with Combinatorial Action Space

Shuxin Li1Youzhi Zhang2( )Xinrun Wang1Wanqi Xue1Bo An1
School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong 999077, China
Show Author Information

Abstract

The team-adversary game simulates many real-world scenarios in which a team of agents competes cooperatively against an adversary. However, decision-making in this type of game is a big challenge since the joint action space of the team is combinatorial and exponentially related to the number of team members. It also hampers the existing equilibrium finding algorithms from solving team-adversary games efficiently. To solve this issue caused by the combinatorial action space, we propose a novel framework based on Counterfactual Regret Minimization (CFR) framework: CFR-MIX. Firstly, we propose a new strategy representation to replace the traditional joint action strategy by using the individual action strategies of all the team members, which can significantly reduce the strategy space. To maintain the cooperation between team members, a strategy consistency relationship is proposed. Then, we transform the consistency relationship of the strategy to the regret consistency for computing the equilibrium strategy with the new strategy representation under the CFR framework. To guarantee the regret consistency relationship, a product-form decomposition method over cumulative regret values is proposed. To implement this decomposition method, our CFR-MIX framework employs a mixing layer under the CFR framework to get the final decision strategy for the team, i.e., the Nash equilibrium strategy. Finally, we conduct experiments on games in different domains. Extensive results show that CFR-MIX significantly outperforms state-of-the-art algorithms. We hope it can help the team make decisions in large-scale team-adversary games.

References

[1]
Y. Shoham and K. Leyton-Brown, Multiagent Systems. Cambridge, UK: Cambridge University Press, 2008.
[2]
H. B. McMahan, G. J. Gordon, and A. Blum, Planning in the presence of cost functions controlled by an adversary, in Proc. 20th Int. Conf. Machine Learning, Washington, DC, USA, 2003, pp. 536–543.
[3]
M. Jain, D. Korzhyk, O. Vaněk, V. Conitzer, M. Pěchouček, and M. Tambe, A double oracle algorithm for zero-sum security games on graphs, in Proc. 10th Int. Conf. Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, China, 2011, pp. 327–334.
[4]

M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione, Regret minimization in games with incomplete information, Adv. Neural Inf. Process. Syst., vol. 20, pp. 905–912, 2008.

[5]

M. Bowling, N. Burch, M. Johanson, and O. Tammelin, Heads-up limit hold’em poker is solved, Science, vol. 347, no. 6218, pp. 145–149, 2015.

[6]
N. Brown and T. Sandholm, Libratus: the superhuman AI for No-limit poker, in Proc. 26th Int. Joint Conf. Artificial Intelligence, Melbourne, Australia, 2017, 5226–5228.
[7]

M. Moravčík, M. Schmid, N. Burch, V. Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, and M. Bowling, DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science, vol. 356, no. 6337, pp. 508–513, 2017.

[8]
Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret minimization in extensive games, in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1078–1086.
[9]

R. Gibson, M. Lanctot, N. Burch, D. Szafron, and M. Bowling, Generalized sampling and variance in counterfactual regret minimization, Proc. AAAI Conf. Artif. Intell., vol. 26, no. 1, pp. 1355–1361, 2021.

[10]
N. Brown, A. Lerer, S. Gross, and T. Sandholm, Deep counterfactual regret minimization, in Proc. 36th Int. Conf. Machine Learning, Long Beach, CA, USA, 2019, pp. 793–802.
[11]
E. Steinberger, Single deep counterfactual regret minimization, arXiv preprint arXiv: 1901.07621, 2019.
[12]
H. Li, K. Hu, S. Zhang, Y. Qi, and L. Song, Double neural counterfactual regret minimization, in Proc. 8th Int. Conf. Learning Representation, virtual, 2019.
[13]
N. Basilico, A. Celli, G. De Nittis, and N. Gatti, Coordinating multiple defensive resources in patrolling games with alarm systems, in Proc. 16th Int. Conf. Autonomous Agents and Multiagent Systems (AAMAS 2017), São Paulo, Brazil, 2017, pp. 678–686.
[14]
N. Abou Risk and D. Szafron, Using counterfactual regret minimization to create competitive multiplayer poker agents, in Proc. 9th Int. Conf. Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, Toronto, Canada, 2010, pp. 159–166.
[15]
T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. Foerster, and S. Whiteson, Monotonic value function factorisation for deep multi-agent reinforcement learning, in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2020.
[16]

J. F. Nash Jr, Equilibrium points in n-person games, Proc. Natl. Acad. Sci. U. S. A., vol. 36, no. 1, pp. 48–49, 1950.

[17]

A. Celli and N. Gatti, Computational results for extensive-form adversarial team games, Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, pp. 965–972, 2018.

[18]
G. Farina, A. Celli, N. Gatti, and T. Sandholm, Ex ante coordination and collusion in zero-sum multi-player extensive-form games, in Proc. 32nd Conf. Neural Information Processing Systems (NIPS 2018), Montreal, Canada, 2018.
[19]
A. Celli, A. Marchesi, T. Bianchi, and N. Gatti, Learning to correlate in multi-player general-sum sequential games, in Proc. 33rd Conf. Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019.
[20]

Y. Zhang, B. An, and J. Černý, Computing ex ante coordinated team-maxmin equilibria in zero-sum multiplayer extensive-form games, Proc. AAAI Conf. Artif. Intell., vol. 35, no. 6, pp. 5813–5821, 2021.

[21]
K. Waugh, D. Schnizlein, M. Bowling, and D. Szafron, Abstraction pathologies in extensive games, in Proc. 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, 2009, pp. 781–788.
[22]

M. Schmid, N. Burch, M. Lanctot, M. Moravcik, R. Kadlec, and M. Bowling, Variance reduction in Monte Carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 2157–2164, 2019.

[23]
E. Steinberger, A. Lerer, and N. Brown, DREAM: deep regret minimization with advantage baselines and model-free learning, arXiv preprint arXiv: 2006.10410, 2020.
[24]
O. Tammelin, N. Burch, M. Johanson, and M. Bowling, Solving heads-up limit texas hold’em, in Proc. 24th Int. Conf. Artificial Intelligence, Buenos Aires, Argentina, 2015, pp. 645–652.
[25]

S. M. Ross, Goofspiel—The game of pure strategy, J. Appl. Probab., vol. 8, no. 3, pp. 621–625, 1971.

[26]
V. Lisý, M. Lanctot, and M. Bowling, Online Monte Carlo counterfactual regret minimization for search in imperfect information games, in Proc. 2015 Conf. Autonomous Agents and Multi Agent Systems, Istanbul, Turkey, 2015, pp. 27–36.
[27]

N. Brown and T. Sandholm, Solving imperfect-information games via discounted regret minimization, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 1829–1836, 2019.

[28]
Y. Zhang, B. An, L. Tran-Thanh, Z. Wang, J. Gan, and N. R. Jennings, Optimal escape interdiction on transportation networks, in Proc. 26th Int. Joint Conf. Artificial Intelligence, Melbourne, Australia, 2017, pp. 3936–3964.
[29]

Y. Zhang, Q. Guo, B. An, L. Tran-Thanh, and N. R. Jennings, Optimal interdiction of urban criminals with the aid of real-time information, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 1262–1269, 2019.

[30]
K. Horák and B. Bošanský, Dynamic programming for one-sided partially observable pursuit-evasion games, arXiv preprint arXiv: 1606.06271, 2016.
CAAI Artificial Intelligence Research
Article number: 9150023
Cite this article:
Li S, Zhang Y, Wang X, et al. Decision Making in Team-Adversary Games with Combinatorial Action Space. CAAI Artificial Intelligence Research, 2023, 2: 9150023. https://doi.org/10.26599/AIR.2023.9150023
Part of a topical collection:

826

Views

96

Downloads

0

Crossref

Altmetrics

Received: 10 July 2023
Revised: 14 September 2023
Accepted: 03 November 2023
Published: 17 January 2024
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return