[1]
J. McCarthy, What is AI?, http://www-formal.stanford.edu/jmc/whatisai.html, 2004.
[4]
M. I. Posner, Foundations of Cognitive Science. Cambridge, MA, USA: MIT Press, 1989.
[11]
T. Gerstenberg and J. B. Tenenbaum, Intuitive theories, in Oxford Handbook of Causal Reasoning, M. R. Waldmann, Ed. Oxford, UK: Oxford University Press, 2017, pp. 515–547.
[12]
E. S. Spelke, What Babies Know: Core Knowledge and Composition, Volume 1. New York, NY, USA: Oxford University Press, 2022.
[18]
C. Bates, P. W. Battaglia, I. Yildirim, and J. B. Tenenbaum, Humans predict liquid dynamics using probabilistic simulation, in Proc. 37th Annu. Meeting of the Cognitive Science Society, Pasadena, CA, USA, 2015, pp. 172–176.
[19]
W. Liang, Y. Zhao, Y. Zhu, and S. C. Zhu, Evaluating human cognition of containing relations with physical simulation, in Proc. 37th Annu. Meeting of the Cognitive Science Society, Pasadena, CA, USA, 2015, pp. 1356–1361.
[20]
J. Kubricht, C. Jiang, Y. Zhu, S. C. Zhu, D. Terzopoulos, and H. Lu, Probabilistic simulation predicts human performance on viscous fluid-pouring problem, in Proc. 38th Annu. Meeting of the Cognitive Science Society, Philadelphia, PA, USA, 2016, pp. 1805–1810.
[21]
J. Kubricht, Y. Zhu, C. Jiang, D. Terzopoulos, S. C. Zhu, and H. Lu, Consistent probabilistic simulation underlying human judgment in substance dynamics, in Proc. 39th Annu. Meeting of the Cognitive Science Society, London, UK, 2017, pp. 700–705.
[24]
K. Smith, L. Mei, S. Yao, J. Wu, E. S. Spelke, J. Tenenbaum, and T. D. Ullman, The fine structure of surprise in intuitive physics: When, why, and how much?, in Proc. 42nd Annu. Meeting of the Cognitive Science Society, virtual, 2020, pp. 3048–3054.
[26]
S. Li, K. Wu, C. Zhang, and Y. Zhu, On the learning mechanisms in physical reasoning, presented at the 36th Conf. Neural Information Processing Systems, New Orleans, LA, USA, 2022.
[27]
J. Wu, I. Yildirim, J. J. Lim, W. T. Freeman, and J. B. Tenenbaum, Galileo: Perceiving physical object properties by integrating a physics engine with deep learning, in Proc. 28th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2015, pp. 127–135.
[28]
Y. Zhu, C. Jiang, Y. Zhao, D. Terzopoulos, and S. C. Zhu, Inferring forces and learning human utilities from videos, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 3823–3833.
[29]
J. Wu, E. Lu, P. Kohli, W. T. Freeman, and J. B. Tenenbaum, Learning to see physics via visual de-animation, in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 152–163.
[30]
S. Qi, Y. Zhu, S. Huang, C. Jiang, and S. C. Zhu, Human-centric indoor scene synthesis using stochastic grammar, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 5899–5908.
[32]
W. Liang, Y. Zhu, and S. C. Zhu, Tracking occluded objects and recovering incomplete trajectories by reasoning about containment relations and human actions, in Proc. 32nd AAAI Conf. Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conf. and 8th AAAI Symp. Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2018, pp. 7106–7113.
[33]
S. Huang, S. Qi, Y. Zhu, Y. Xiao, Y. Xu, and S. C. Zhu, Holistic 3D scene parsing and reconstruction from a single RGB image, in Proc. 15th European Conf. Computer Vision (ECCV), Munich, Germany, 2018, pp. 194–211.
[34]
S. Huang, S. Qi, Y. Xiao, Y. Zhu, Y. N. Wu, and S. C. Zhu, Cooperative holistic scene understanding: Unifying 3D object, layout, and camera pose estimation, in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 206–217.
[35]
Y. Chen, S. Huang, T. Yuan, Y. Zhu, S. Qi, and S. C. Zhu, Holistic++ scene understanding: Single-view 3D holistic scene parsing and human pose estimation with human-object interaction and physical commonsense, in Proc. 2019 IEEE/CVF Int. Conf. Computer Vision (ICCV), Seoul, Republic of Korea, 2019, pp. 8647–8656.
[36]
B. Zheng, Y. Zhao, J. C. Yu, K. Ikeuchi, and S. C. Zhu, Beyond point clouds: Scene understanding by reasoning geometry and physics, in Proc. 2013 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 2013, pp. 3127–3134.
[38]
Y. Zhu, Y. Zhao, and S. C. Zhu, Understanding tools: Task-oriented object modeling, learning and recognition, in Proc. 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 2855–2864.
[39]
M. Han, Z. Zhang, Z. Jiao, X. Xie, Y. Zhu, S. C. Zhu, and H. Liu, Reconstructing interactive 3D scenes by panoptic mapping and CAD model alignments, in Proc. 2021 IEEE Int. Conf. Robotics and Automation (ICRA), Xi'an, China, 2021, pp. 12199–12206.
[43]
M. Edmonds, J. Kubricht, C. Summers, Y. Zhu, B. Rothrock, S. C. Zhu, and H. Lu, Human causal transfer: Challenges for deep reinforcement learning, in Proc. 40th Annu. Meeting of the Cognitive Science Society, Madison, WI, USA, 2018, pp. 324–329.
[44]
M. Edmonds, S. Qi, Y. Zhu, J. Kubricht, S. C. Zhu, and H. Lu, Decomposing human causal learning: Bottom-up associative learning and top-down schema reasoning, in Proc. 41st Annu. Meeting of the Cognitive Science Society, Montreal, Canada, 2019, pp. 1696–1702.
[46]
M. Edmonds, X. Ma, S. Qi, Y. Zhu, H. Lu, and S. C. Zhu, Theory-based causal transfer: Integrating instance-level induction and abstract-level structure learning, in Proc. AAAI Conf. Artificial Intelligence, New York, NY, USA, 2020, pp. 1283–1291.
[47]
C. Zhang, B. Jia, M. Edmonds, S. C. Zhu, and Y. Zhu, ACRE: Abstract causal reasoning beyond covariation, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 10638–10648.
[48]
M. Xu, G. Jiang, C. Zhang, S. C. Zhu, and Y. Zhu, EST: Evaluating scientific thinking in artificial agents, arXiv preprint arXiv:2206.09203, 2022.
[49]
C. Zhang, S. Xie, B. Jia, Y. N. Wu, S. C. Zhu, and Y. Zhu, Learning algebraic representation for systematic generalization in abstract reasoning, in Proc. 17th European Conf. Computer Vision, Tel Aviv, Israel, 2022, pp. 692–709.
[52]
C. Zhang, F. Gao, B. Jia, Y. Zhu, and S. C. Zhu, RAVEN: A dataset for relational and analogical visual reasoning, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 5312–5322.
[53]
C. Zhang, B. Jia, F. Gao, Y. Zhu, H. Lu, and S. C. Zhu, Learning perceptual inference by contrasting, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 1075–1087.
[54]
W. Zhang, C. Zhang, Y. Zhu, and S. C. Zhu, Machine number sense: A dataset of visual arithmetic problems for abstract and relational reasoning, in Proc. AAAI Conf. Artificial Intelligence, New York, NY, USA, 2020, pp. 1332–1340.
[55]
C. Zhang, B. Jia, S. C. Zhu, and Y. Zhu, Abstract spatial-temporal reasoning via probabilistic abduction and execution, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 9731–9741.
[57]
M. Tomasello, Do apes ape?, in Social Learning in Animals: The Roots of Culture, C. M. Heyes and B. G. Galef, Jr., Eds. San Diego, CA, USA: Academic Press, 1996, pp. 319–346.
[58]
M. Tomasello, Origins of Human Communication. Cambridge, MA, USA: MIT Press, 2010.
[59]
S. Kita, Pointing: Where Language, Culture, and Cognition Meet. New York, NY, USA: Psychology Press, 2003.
[60]
R. M. Scott, E. Roby, and R. Baillargeon, How sophisticated is infants’ theory of mind?, in The Cambridge Handbook of Cognitive Development, O. Houdé and G. Borst, Eds. Cambridge, UK: Cambridge University Press, 2022, pp. 242–268.
[65]
J. Launchbury, A DARPA perspective on artificial intelligence, https://www.darpa.mil/about-us/darpa-perspective-on-ai, 2017.
[66]
K. Jiang, S. Stacy, A. Chan, C. Wei, F. Rossano, Y. Zhu, and T. Gao, Individual vs. joint perception: A pragmatic model of pointing as Smithian helping, in Proc. 43rd Annu. Meeting of the Cognitive Science Society, Vienna, Austria, 2021, pp. 1781–1787.
[67]
K. Jiang, A. Dahmani, S. Stacy, B. Jiang, F. Rossano, Y. Zhu, and T. Gao, What is the point? A theory of mind model of relevance, in Proc. 44th Annu. Meeting of the Cognitive Science Society, Toronto, Ontario, Canada, 2022, pp. 3369–3375.
[68]
L. Wittgenstein, The Big Typescript: TS 213. Hoboken, NJ, USA: Wiley-Blackwell, 2012.
[69]
J. Aru, A. Labash, O. Corcoll, and R. Vicente, Mind the gap: Challenges of deep learning approaches to theory of mind, Artif. Intell. Rev., doi: 10.1007/s10462-023-10401-x.
[72]
A. Michotte, The emotions regarded as functional connections, in Michotte's Experimental Phenomenology of Perception, G. Thinés, A. Costall, and G. Butterworth, Eds. Abingdon, UK: Routledge, 1991, pp. 103–116.
[76]
A. Michotte, The Perception of Causality. London, UK: Routledge, 2017.
[87]
N. Chevalier and A. Blaye, False-belief representation and attribution in preschoolers: Testing a graded-representation hypothesis, Curr. Psychol. Lett., vol. 18, no. 1, 2006.
[92]
H. M. Wellman, Developing a theory of mind, in The Wiley-Blackwell Handbook of Childhood Cognitive Development, U. Goswami, Ed. Chichester, West Sussex, UK: John Wiley & Sons, 2011, pp. 258–284.
[99]
A. N. Meltzoff and R. Brooks, “Like me” as a building block for understanding other minds: Bodily acts, attention, and intention, in Intentions and Intentionality: Foundations of Social Cognition, B. F. Malle, L. J. Moses, and D. A. Baldwin, Eds. Cambridge, MA, USA: The MIT Press, 2001, pp. 171–191.
[105]
Z. X. Tan, J. L. Mann, T. Silver, J. B. Tenenbaum, and V. K. Mansinghka, Online Bayesian goal inference for boundedly-rational planning agents, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 19238–19250.
[110]
R. M. Gordon, ‘Radical’ simulationism, in Theories of Theories of Mind, P. Carruthers and P. K. Smith, Eds. Cambridge, UK: Cambridge University Press, 1996, pp. 11–21.
[113]
T. J. Wiltshire, E. J. Lobato, J. Velez, F. Jentsch, and S. M. Fiore, An interdisciplinary taxonomy of social cues and signals in the service of engineering robotic social intelligence, in Proc. SPIE 9084, Unmanned Systems Technology XVI, Baltimore, MD, USA, 2014, p. 90840F.
[117]
C. Moore, P. J. Dunham, and P. Dunham, Joint Attention: Its Origins and Role in Development. London, UK: Psychology Press, 2014.
[119]
Y. Nagai, Understanding the development of joint attention from a viewpoint of cognitive developmental robotics, Ph.D. dissertation, Osaka University, Osaka, Japan, 2004.
[120]
L. Fan, Y. Chen, P. Wei, W. Wang, and S. C. Zhu, Inferring shared attention in social scene videos, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 6460–6468.
[123]
S. C. Levinson, On the human “interaction engine”, in Roots of Human Sociality, S. C. Levinson and N. J. Enfield, Eds. London, UK: Routledge, 2020, pp. 39–69.
[127]
M. Tomasello, Why We Cooperate. Cambridge, MA, USA: MIT Press, 2009.
[128]
N. Tang, S. Stacy, M. Zhao, G. Marquez, and T. Gao, Bootstrapping an imagined we for cooperation, in Proc. 42nd Annu. Meeting of the Cognitive Science Society, virtual, 2020, pp. 2453–2456.
[129]
S. Stacy, Q. Zhao, M. Zhao, M. Kleiman-Weiner, and T. Gao, Intuitive signaling through an “imagined we”, in Proc. 42nd Annu. Meeting of the Cognitive Science Society, virtual, 2020, p. 1880.
[130]
S. E. T. Stacy, The imagined we: Shared Bayesian theory of mind for modeling communication, Ph.D. dissertation, University of California, Los Angeles, CA, USA, 2022.
[131]
N. Tang, S. Gong, Z. Liao, H. Xu, J. Zhou, M. Shen, and T. Gao, Jointly perceiving physics and mind: Motion, force and intention, in Proc. 43rd Annu. Meeting of the Cognitive Science Society, Vienna, Austria, 2021, pp. 735–741.
[132]
T. Shu, M. Kryven, T. D. Ullman, and J. Tenenbaum, Adventures in flatland: Perceiving social interactions under physical dynamics, in Proc. 42nd Annu. Meeting of the Cognitive Science Society, virtual, 2020, pp. 2901–2907.
[135]
P. Wei, Y. Liu, T. Shu, N. Zheng, and S. C. Zhu, Where and why are they looking? Jointly inferring human attention and intentions in complex tasks, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 6801–6809.
[137]
S. Holtzen, Y. Zhao, T. Gao, J. B. Tenenbaum, and S. C. Zhu, Inferring human intent from video by sampling hierarchical plans, in Proc. 2016 IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 2016, pp. 1489–1496.
[138]
B. González and L. J. Chang, Computational models of mentalizing, in The Neural Basis of Mentalizing, M. Gilead and K. N. Ochsner, Eds. Cham, Switzerland: Springer, 2021, pp. 299–315.
[142]
C. L. Baker, R. Saxe, and J. B. Tenenbaum, Bayesian theory of mind: Modeling joint belief-desire attribution, in Proc. 33rd Annu. Meeting of the Cognitive Science Society, Boston, MA, USA, 2011, pp. 2469–2474.
[143]
T. Yuan, H. Liu, L. Fan, Z. Zheng, T. Gao, Y. Zhu, and S. C. Zhu, Joint inference of states, robot knowledge, and human (false-) beliefs, in Proc. 2020 IEEE Int. Conf. Robotics and Automation (ICRA), Paris, France, 2020, pp. 5972–5978.
[144]
L. Fan, S. Qiu, Z. Zheng, T. Gao, S. C. Zhu, and Y. Zhu, Learning triadic belief dynamics in nonverbal communication from videos, in Proc. 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 7308–7317.
[146]
I. Oguntola, D. Hughes, and K. Sycara, Deep interpretable models of theory of mind, in Proc. 2021 30th IEEE Int. Conf. Robot and Human Interactive Communication (RO-MAN), Vancouver, Canada, 2021, pp. 657–664.
[149]
Y. Wen, Y. Yang, R. Luo, J. Wang, and W. Pan, Probabilistic recursive reasoning for multi-agent reinforcement learning, arXiv preprint arXiv:1901.09207, 2019.
[150]
P. Moreno, E. Hughes, K. R. McKee, B. A. Pires, and T. Weber, Neural recursive belief states in multi-agent reinforcement learning, arXiv preprint arXiv:2102.02274, 2021.
[151]
A. Hakimzadeh, Y. Xue, and P. Setoodeh, Interpretable reinforcement learning inspired by Piaget’s theory of cognitive development, arXiv preprint arXiv:2102.00572, 2021.
[157]
R. Tejwani, Y. L. Kuo, T. Shu, B. Stankovits, D. Gutfreund, J. B. Tenenbaum, B. Katz, and A. Barbu, Incorporating rich social interactions into MDPs, in Proc. 2022 Int. Conf. Robotics and Automation (ICRA), Philadelphia, PA, USA, 2022, pp. 7395–7401.
[158]
R. Tejwani, Y. L. Kuo, T. Shu, B. Katz, and A. Barbu, Social interactions as recursive MDPs, in Proc. 5th Conf. Robot Learning, London, UK, 2022, pp. 949–958.
[161]
S. Stacy, C. Li, M. Zhao, Y. Yun, Q. Zhao, M. Kleiman-Weiner, and T. Gao, Modeling communication to coordinate perspectives in cooperation, in Proc. 43rd Annu. Meeting of the Cognitive Science Society, Vienna, Austria, 2021, pp. 1851–1857.
[162]
X. Gao, R. Gong, Y. Zhao, S. Wang, T. Shu, and S. C. Zhu, Joint mind modeling for explanation generation in complex human-robot collaborative tasks, in Proc. 2020 29th IEEE Int. Conf. Robot and Human Interactive Communication (RO-MAN), Naples, Italy, 2020, pp. 1119–1126.
[163]
M. C. Buehler, J. Adamy, and T. H. Weisswange, Theory of mind based assistive communication in complex human robot cooperation, arXiv preprint arXiv:2109.01355, 2021.
[164]
Y. Wang, F. Zhong, J. Xu, and Y. Wang, ToM2C: Target-oriented multi-agent communication and cooperation with theory of mind, arXiv preprint arXiv:2111.09189, 2022.
[166]
V. Chidambaram, Y. H. Chiang, and B. Mutlu, Designing persuasive robots: How robots might persuade people using vocal and nonverbal cues, in Proc. 2012 7th ACM/IEEE Int. Conf. Human-Robot Interaction (HRI), Boston, MA, USA, 2012, pp. 293–300.
[169]
J. E. Laird, Introduction to Soar, arXiv preprint arXiv:2205.03854, 2022.
[171]
M. Vircikova, G. Magyar, and P. Sincak, The affective loop: A tool for autonomous and adaptive emotional human-robot interaction, in Robot Intelligence Technology and Applications 3, J. H. Kim, W. Yang, J. Jo, P. Sincak, and H. Myung, Eds. Cham, Switzerland: Springer, 2015, pp. 247–254.
[172]
J. Snaider, R. McCall, and S. Franklin, The LIDA framework as a general tool for AGI, in Proc. 4th Int. Conf. Artificial General Intelligence, Mountain View, CA, USA, 2011, pp. 133–142.
[173]
J. E. Laird, The Soar Cognitive Architecture. Cambridge, MA, USA: MIT Press, 2019.
[180]
A. Netanyahu, T. Shu, B. Katz, A. Barbu, and J. B. Tenenbaum, PHASE: Physically-grounded abstract social events for machine social perception, arXiv preprint arXiv:2103.01933, 2021.
[181]
T. Shu, A. Bhandwaldar, C. Gan, K. A. Smith, S. Liu, D. Gutfreund, E. Spelke, J. B. Tenenbaum, and T. D. Ullman, AGENT: A benchmark for core psychological reasoning, arXiv preprint arXiv:2102.12321, 2021.
[182]
X. Puig, T. Shu, S. Li, Z. Wang, Y. H. Liao, J. B. Tenenbaum, S. Fidler, and A. Torralba, Watch-and-help: A challenge for social perception and human-AI collaboration, arXiv preprint arXiv:2010.09890, 2021.
[183]
M. Sap, H. Rashkin, D. Chen, R. Le Bras, and Y. Choi, SocialIQA: Commonsense reasoning about social interactions, arXiv preprint arXiv:1904.09728, 2019.
[186]
F. Lievens and D. Chan, Practical intelligence, emotional intelligence, and social intelligence, in Handbook of Employee Selection, J. L. Farr and N. T. Tippins, Eds. New York, NY, USA: Routledge, 2017, pp. 342–364.
[188]
M. Schurz, J. Radua, M. Aichhorn, F. Richlan, and J. Perner, Fractionating theory of mind: A meta-analysis of functional brain imaging studies, Neurosci. Biobehav. Rev., vol. 42, pp. 9–34, 2014.
[189]
L. Smith and M. Gasser, The development of embodied cognition: Six lessons from babies, Artif. Life, vol. 11, nos. 1–2, pp. 13–29, 2005.
[190]
G. M. van de Ven and A. S. Tolias, Three scenarios for continual learning, arXiv preprint arXiv:1904.07734, 2019.
[191]
R. Caruana, Multitask learning, Mach. Learn., vol. 28, no. 1, pp. 41–75, 1997.
[192]
Y. Zhang and Q. Yang, A survey on multi-task learning, IEEE Trans. Knowl. Data Eng., vol. 34, no. 12, pp. 5586–5609, 2021.
[193]
J. Vanschoren, Meta-learning: A survey, arXiv preprint arXiv:1810.03548, 2018.