[1]
D. S. Gonzalez, M. Garzon, J. S. Dibangoye, and C. Laugier, Human-like decision-making for automated driving in highways, in Proc. IEEE Intelligent Transportation Systems Conf. (ITSC), Auckland, New Zealand, 2019, pp. 2087–2094.
[2]
A. Tazoniero, R. Gonçalves, and F. Gomide, Decision making strategies for real-time train dispatch and control, in Analysis and Design of Intelligent Systems Using Soft Computing Techniques, P. Melin, O. Castillo, E. Gomez Ramírez, J. Kacprzyk, and W. Pedrycz, Eds. Heidelberg, Germany: Springer, 2007, pp. 195–204.
[4]
W. L. Waugh Jr., Mechanisms for collaboration in emergency management: ICS, NIMS, and the problem with command and control, in The Collaborative Public Manager: New Ideas for the Twenty-First Century, R. O’Leary and L. B. Bingham, Eds. Washington, DC, USA: Georgetown University Press, 2009, pp. 157–175.
[12]
D. Zha, J. Xie, W. Ma, S. Zhang, X. Lian, X. Hu, and J. Liu, DouZero: Mastering DouDizhu with self-play deep reinforcement learning, in Proc. 38th Int. Conf. Machine Learning, virtual, 2021, pp. 12333–12344.
[14]
C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, et al., Dota 2 with large scale deep reinforcement learning, arXiv preprint arXiv: 1912.06680, 2019.
[15]
D. Ye, G. Chen, W. Zhang, S. Chen, B. Yuan, B. Liu, J. Chen, Z. Liu, F. Qiu, H. Yu, et al., Towards playing full MOBA games with deep reinforcement learning, in Proc. 34th Int. Conf. Neural Information Processing Systems, virtual, 2020, pp. 621–632.
[19]
M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione, Regret minimization in games with incomplete information, in Proc. 21st Annu. Conf. Neural Information Processing Systems, Vancouver, Canada, 2007, pp. 1729–1736.
[20]
Z. Wang, C. Mu, S. Hu, C. Chu, and X. Li, Modelling the dynamics of regret minimization in large agent populations: A master equation approach, in Proc. 31st Int. Joint Conf. Artificial Intelligence, Vienna, Austria, 2022, pp. 23–29.
[21]
J. Heinrich, M. Lanctot, and D. Silver, Fictitious self-play in extensive-form games, in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 805–813.
[23]
D. J. Strouse, K. R. McKee, M. M. Botvinick, E. Hughes, and R. Everett, Collaborating with humans without human data, in Proc. 35th Annu. Conf. Neural Information Processing Systems, virtual, 2021, pp. 14502–14515.
[26]
M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, et al., Population based training of neural networks, arXiv preprint arXiv: 1711.09846, 2017.
[27]
Z. Wu, K. Li, H. Xu, Y. Zang, B. An, and J. Xing, L2E: Learning to exploit your opponent, in Proc. Int. Joint Conf. Neural Networks (IJCNN), Padua, Italy, 2022, pp. 1–8.
[28]
J. N. Foerster, Y. M. Assael, N. de Freitas, and S. Whiteson, Learning to communicate with deep multi-agent reinforcement learning, in Proc. 30th Conf. Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 2016, pp. 2145–2153.
[29]
N. Rabinowitz, F. Perbet, F. Song, C. Zhang, S. M. Ali Eslami, and M. Botvinick, Machine theory of mind, in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 4218–4227.
[32]
A. Hussein, M. M. Gaber, E. Elyan, and C. Jayne, Imitation learning: A survey of learning methods, ACM Comput. Surv., vol. 50, no. 2, p. 21, 2017.
[33]
D. Wang, E. Churchill, P. Maes, X. Fan, B. Shneiderman, Y. Shi, and Q. Wang, From human-human collaboration to human-AI collaboration: Designing AI systems that can work together with people, in Proc. Extended Abstracts of the 2020 CHI Conf. Human Factors in Computing Systems, Honolulu, HI, USA, 2020, pp. 1–6.
[35]
A. Dengel, L. Devillers, and L. M. Schaal, Augmented human and human-machine co-evolution: Efficiency and ethics, in Reflections on Artificial Intelligence for Humanity, Cham, Switzerland: Springer, 2021, pp. 203–227.
[36]
J. Perolat, B. Scherrer, B. Piot, and O. Pietquin, Approximate dynamic programming for two-player zero-sum Markov games, in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1321–1329.
[38]
F. Doshi-Velez and B. Kim, Towards a rigorous science of interpretable machine learning, arXiv preprint arXiv: 1702.08608, 2017.
[39]
M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should I trust you?”: Explaining the predictions of any classifier, in Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 1135–1144.
[41]
C. Yu, Y. Gu, Z. Yang, X. Yi, H. Luo, and Y. Shi, Tap, dwell or gesture?: Exploring head-based text entry techniques for HMDs, in Proc. 2017 CHI Conf. Human Factors in Computing Systems, Denver, CO, USA, 2017, pp. 4479–4488.
[42]
K. Lee, L. M. Smith, and P. Abbeel, PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training, in Proc. 38th Int. Conf. Machine Learning, virtual, 2021, pp. 6152–6163.