Article | Open Access

Decision Making in Team-Adversary Games with Combinatorial Action Space

Shuxin Li1, Youzhi Zhang2 (corresponding author), Xinrun Wang1, Wanqi Xue1, Bo An1
1. School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
2. Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences, Hong Kong 999077, China

Abstract

The team-adversary game models many real-world scenarios in which a team of agents cooperates to compete against an adversary. Decision-making in this type of game is challenging because the team's joint action space is combinatorial, growing exponentially with the number of team members, which prevents existing equilibrium-finding algorithms from solving team-adversary games efficiently. To address this issue, we propose CFR-MIX, a novel framework based on Counterfactual Regret Minimization (CFR). First, we propose a new strategy representation that replaces the traditional joint action strategy with the individual action strategies of all team members, which significantly reduces the strategy space. To maintain cooperation among team members, we introduce a strategy consistency relationship. We then transform this strategy consistency into regret consistency so that the equilibrium strategy can be computed with the new representation under the CFR framework. To guarantee the regret consistency relationship, we propose a product-form decomposition over cumulative regret values, implemented in CFR-MIX by a mixing layer that produces the team's final decision strategy, i.e., the Nash equilibrium strategy. Finally, experiments on games from different domains show that CFR-MIX significantly outperforms state-of-the-art algorithms, and we hope it can help teams make decisions in large-scale team-adversary games.
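The core idea, individual strategies combined under a strategy consistency relationship, can be illustrated with a minimal sketch. This is not the paper's CFR-MIX implementation (which uses a learned mixing layer); it is a hedged tabular illustration in which each team member applies standard regret matching to its own cumulative regrets, and the joint strategy over the combinatorial action space is taken as the product of the individual strategies. All function names are assumptions for illustration only.

```python
import numpy as np

def regret_matching(cum_regret):
    """Standard CFR update: play each action with probability proportional
    to its positive cumulative regret; fall back to uniform if none."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full_like(cum_regret, 1.0 / len(cum_regret))

def joint_strategy_product(individual_regrets):
    """Hypothetical product-form combination: each member's strategy comes
    from its own cumulative regrets, and the team's joint strategy over the
    Cartesian product of individual actions is the product of the individual
    strategies (the strategy consistency relationship)."""
    strategies = [regret_matching(r) for r in individual_regrets]
    joint = strategies[0]
    for s in strategies[1:]:
        # Outer product enumerates every joint action combination.
        joint = np.outer(joint, s).ravel()
    return strategies, joint
```

Note how the joint strategy is represented implicitly by the per-member strategies: storage grows linearly in the number of members, even though the joint action space (and the `joint` vector materialized here only for illustration) grows exponentially.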

CAAI Artificial Intelligence Research
Article number: 9150023
Cite this article:
Li S, Zhang Y, Wang X, et al. Decision Making in Team-Adversary Games with Combinatorial Action Space. CAAI Artificial Intelligence Research, 2023, 2: 9150023. https://doi.org/10.26599/AIR.2023.9150023

Received: 10 July 2023
Revised: 14 September 2023
Accepted: 03 November 2023
Published: 17 January 2024
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
