The pervasive uncertainty and dynamic nature of real-world environments pose significant challenges for the widespread deployment of machine-driven Intelligent Decision-Making (IDM) systems. IDM must therefore be able to continuously acquire new skills and generalize effectively across a broad range of applications. Advancing Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for improving IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundation model for a wide variety of tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence-decoding tasks with the Transformer architecture, offering a promising path toward expanding IDM applications to complex real-world situations. In this paper, we discuss the efficiency and generalization gains that a foundation decision model offers IDM, and we explore its potential applications in multi-agent game AI, production scheduling, and robotics. Finally, we present a case study of our FDM implementation, DigitalBrain (DB1), a 1.3-billion-parameter model that achieves human-level performance on 870 tasks spanning text generation, image captioning, video game playing, robotic control, and the traveling salesman problem. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
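To make the sequence-decoding formulation concrete, the sketch below shows one common way to cast a decision-making task as autoregressive sequence decoding: states and actions are embedded as interleaved tokens, and a causal Transformer predicts the next action from the history. This is a minimal illustration of the general idea, not the authors' DB1 implementation; the class name, dimensions, and toy behavior-cloning loss are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' DB1 code): a decision-making task
# framed as sequence decoding with a causal Transformer.
import torch
import torch.nn as nn

class TinyFoundationDecisionModel(nn.Module):
    """Autoregressive Transformer over interleaved (state, action) tokens."""

    def __init__(self, state_dim, n_actions, d_model=64, n_heads=4,
                 n_layers=2, max_len=128):
        super().__init__()
        self.state_emb = nn.Linear(state_dim, d_model)      # continuous states -> tokens
        self.action_emb = nn.Embedding(n_actions, d_model)  # discrete actions -> tokens
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, n_actions)    # decode the next action

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T) integer action ids
        B, T, _ = states.shape
        # Interleave tokens as s_1, a_1, s_2, a_2, ...
        tokens = torch.stack((self.state_emb(states), self.action_emb(actions)), dim=2)
        tokens = tokens.reshape(B, 2 * T, -1)
        pos = torch.arange(2 * T, device=states.device)
        tokens = tokens + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier tokens.
        mask = torch.triu(
            torch.full((2 * T, 2 * T), float("-inf"), device=states.device),
            diagonal=1,
        )
        h = self.backbone(tokens, mask=mask)
        # Predict action a_t from the hidden state at token s_t (even positions).
        return self.action_head(h[:, 0::2])

# Toy usage: behavior cloning on random trajectories from a 4-action task.
model = TinyFoundationDecisionModel(state_dim=8, n_actions=4)
states = torch.randn(2, 10, 8)
actions = torch.randint(0, 4, (2, 10))
logits = model(states, actions)  # (2, 10, 4) next-action logits
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), actions.reshape(-1))
loss.backward()
```

Because every modality is reduced to a token sequence, the same decoding loop could in principle serve text, control, and combinatorial tasks alike, which is what allows a single model to span the task families listed in the abstract.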