Article | Open Access

Game Interactive Learning: A New Paradigm towards Intelligent Decision-Making

Junliang Xing¹, Zhe Wu², Zhaoke Yu², Renye Yan², Zhipeng Ji², Pin Tao¹, Yuanchun Shi¹ (✉)
¹ Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
² Qiyuan Laboratory, Beijing 100094, China

Abstract

Decision-making plays an essential role in many real-world systems, such as automated driving, traffic dispatching, information system management, and emergency command and control. Recent breakthroughs in computer game scenarios, achieved with deep reinforcement learning, have established decision-making intelligence as a burgeoning research direction. In complex practical systems, however, factors such as coupled distracting features, long-term interaction links, and adversarial environments and opponents make decision-making difficult to model, compute, and explain. This work proposes game interactive learning, a novel paradigm for intelligent decision-making in complex and adversarial environments. The paradigm highlights the function and role of the human in intelligent decision-making for complex systems and formalizes a new learning process for exchanging information and knowledge between humans and the machine system. It first inherits methods from game theory to model the agents and their preferences in the complex decision-making process. It then optimizes the learning objectives derived from equilibrium analysis using reformed machine learning algorithms to compute and pursue promising decision results in practice. Human interactions are involved when the learning process needs guidance from additional knowledge and instructions, or when the human wants to better understand the learning machine. We perform a preliminary experimental verification of the proposed paradigm on two challenging decision-making tasks in tactical-level wargame scenarios. The experimental results demonstrate the effectiveness of the proposed learning paradigm.
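The abstract describes three stages: game-theoretic modeling of the agents and their preferences, equilibrium-oriented learning, and human interaction. As a rough illustration only (not the authors' implementation), the following minimal Python sketch wires these stages together for a toy zero-sum matrix game: fictitious play stands in for the paradigm's reformed learning algorithms, and the human-guidance hook is a no-op placeholder. All names and functions here are hypothetical.

```python
# Minimal sketch of the three stages named in the abstract:
# (1) game-theoretic modeling, (2) equilibrium-oriented learning,
# (3) a human-interaction hook. Fictitious play on matching pennies is a
# stand-in for the paradigm's learning algorithms; names are hypothetical.

from collections import Counter

# (1) Game model: matching pennies, payoff for the row player.
ACTIONS = ["heads", "tails"]

def payoff(row, col):
    return 1 if row == col else -1

def best_response(opponent_counts, my_payoff):
    """Best response against the empirical distribution of opponent actions."""
    total = sum(opponent_counts.values()) or 1
    def expected(a):
        return sum(my_payoff(a, b) * n / total for b, n in opponent_counts.items())
    return max(ACTIONS, key=expected)

def human_guidance(empirical_policy):
    """(3) Hook where a human could inspect or reweight the learned policy.
    Left as a no-op placeholder in this sketch."""
    return empirical_policy

row_payoff = payoff                                # row player wants to match
col_payoff = lambda col, row: -payoff(row, col)    # column player wants to mismatch

# (2) Equilibrium-oriented learning: fictitious play for both players.
row_counts, col_counts = Counter(), Counter()
for _ in range(10000):
    row_counts[best_response(col_counts, row_payoff)] += 1
    col_counts[best_response(row_counts, col_payoff)] += 1

total = sum(row_counts.values())
empirical_policy = {a: row_counts[a] / total for a in ACTIONS}
empirical_policy = human_guidance(empirical_policy)
print("row player's empirical policy:", empirical_policy)  # ~0.5/0.5 at equilibrium
```

In the paper's setting, the learning stage would instead be a deep reinforcement learning procedure over wargame states, and the guidance hook would carry the human's additional knowledge and instructions back into training.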

CAAI Artificial Intelligence Research
Article number: 9150027
Cite this article:
Xing J, Wu Z, Yu Z, et al. Game Interactive Learning: A New Paradigm towards Intelligent Decision-Making. CAAI Artificial Intelligence Research, 2023, 2: 9150027. https://doi.org/10.26599/AIR.2023.9150027

Received: 02 August 2023
Revised: 11 October 2023
Accepted: 10 December 2023
Published: 12 March 2024
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
