Embodied Intelligence, which integrates physical interaction capabilities with cognitive computation in real-world scenarios, provides a promising path toward Artificial General Intelligence (AGI). Recently, the landscape of embodied intelligence has expanded considerably, empowering applications such as robotics, autonomous driving, and intelligent manufacturing. This paper presents a comprehensive survey of the evolution of embodied intelligence, tracing its journey from philosophical roots to contemporary advancements. We emphasize significant progress in the integration of perceptual, cognitive, and behavioral components, rather than focusing on these elements in isolation. Despite these advancements, several challenges remain, including hardware limitations, model generalization, physical-world understanding, multimodal integration, and ethical considerations, all of which are critical for the development of robust and reliable embodied intelligence systems. To address these challenges, we outline future research directions, emphasizing Large Perception-Cognition-Behavior (PCB) models, physical intelligence, and morphological intelligence. Central to these perspectives is the general agent framework termed Bcent, which integrates perception, cognition, and behavior dynamics. Bcent aims to enhance the adaptability, robustness, and intelligence of embodied systems, aligning with ongoing progress in robotics, autonomous systems, healthcare, and beyond.
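As a rough illustration of the perception-cognition-behavior loop underlying such a framework (not the survey's actual Bcent implementation), a minimal agent skeleton might look like the following sketch; the class name, method names, and data layout are hypothetical.

```python
from typing import Any, Dict

class PCBAgent:
    """Hypothetical skeleton of a perception-cognition-behavior (PCB) loop.

    Illustrative only: the Bcent framework described in the survey integrates
    much richer dynamics across the three components than this stub shows.
    """

    def perceive(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder: extract task-relevant features from raw sensor data.
        return {"features": observation}

    def cognize(self, percept: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder: reason over the percept and produce a plan.
        return {"plan": "approach_goal", "context": percept}

    def behave(self, decision: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder: translate the plan into a low-level action command.
        return {"action": decision["plan"]}

    def step(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        # One closed perception -> cognition -> behavior cycle.
        return self.behave(self.cognize(self.perceive(observation)))

# Example: a single closed-loop step with a dummy observation.
print(PCBAgent().step({"camera": "frame_0", "joints": [0.0, 0.1, -0.2]}))
```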
Learning accurate dynamics of robotic systems directly from trajectory data is currently a prominent research focus. Recent physics-enforced networks, exemplified by Hamiltonian neural networks and Lagrangian neural networks, demonstrate proficiency in modeling ideal physical systems but face limitations when applied to systems with uncertain non-conservative dynamics, owing to the inherent constraints of their conservation-law foundations. In this paper, we present a novel augmented deep Lagrangian network, which seamlessly integrates a deep Lagrangian network with a standard deep network. This fusion aims to effectively model uncertainties that lie beyond the scope of conventional Lagrangian mechanics. The proposed network is applied to learn the inverse dynamics models of two multi-degree-of-freedom manipulators, a 6-DoF UR-5 robot and a 7-DoF SARCOS manipulator, under uncertainties. The experimental results clearly demonstrate that our approach exhibits superior modeling precision and enhanced physical credibility.
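To make the structure concrete, here is a minimal PyTorch-style sketch of the general idea of augmenting a Lagrangian-structured inverse-dynamics model with an unstructured residual network; the class name, layer sizes, and training snippet are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AugmentedDeLaN(nn.Module):
    """Sketch: Lagrangian-structured inverse dynamics plus an unstructured residual.

    tau = H(q) qdd + Hdot(q, qd) qd - dT/dq + dV/dq + residual(q, qd, qdd),
    where the structured terms follow the Euler-Lagrange equations and the
    residual network absorbs non-conservative / uncertain effects.
    """
    def __init__(self, n_dof, hidden=64):
        super().__init__()
        self.n = n_dof
        # Lower-triangular Cholesky factor of the mass matrix H(q).
        self.l_net = nn.Sequential(nn.Linear(n_dof, hidden), nn.Tanh(),
                                   nn.Linear(hidden, n_dof * (n_dof + 1) // 2))
        # Potential energy V(q); its gradient gives the gravity torque.
        self.v_net = nn.Sequential(nn.Linear(n_dof, hidden), nn.Tanh(),
                                   nn.Linear(hidden, 1))
        # Unstructured residual for friction, contacts, and other uncertainties.
        self.res_net = nn.Sequential(nn.Linear(3 * n_dof, hidden), nn.Tanh(),
                                     nn.Linear(hidden, n_dof))

    def mass_matrix(self, q):
        idx = torch.tril_indices(self.n, self.n)
        L = torch.zeros(q.shape[0], self.n, self.n, dtype=q.dtype, device=q.device)
        L[:, idx[0], idx[1]] = self.l_net(q)
        diag = torch.arange(self.n)
        L[:, diag, diag] = nn.functional.softplus(L[:, diag, diag]) + 1e-3  # keep H > 0
        return L @ L.transpose(1, 2)

    def forward(self, q, qd, qdd):
        if not q.requires_grad:
            q = q.clone().requires_grad_(True)
        H = self.mass_matrix(q)
        T = 0.5 * torch.einsum('bi,bij,bj->b', qd, H, qd)   # kinetic energy
        V = self.v_net(q).squeeze(-1)                        # potential energy
        Hqd = torch.einsum('bij,bj->bi', H, qd)
        # d(H qd)/dq contracted with qd yields Hdot qd (qd held constant).
        dHqd_dq = torch.stack([torch.autograd.grad(Hqd[:, i].sum(), q,
                                                   create_graph=True)[0]
                               for i in range(self.n)], dim=1)
        dT_dq = torch.autograd.grad(T.sum(), q, create_graph=True)[0]
        dV_dq = torch.autograd.grad(V.sum(), q, create_graph=True)[0]
        tau_structured = (torch.einsum('bij,bj->bi', H, qdd)
                          + torch.einsum('bij,bj->bi', dHqd_dq, qd)
                          - dT_dq + dV_dq)
        tau_residual = self.res_net(torch.cat([q, qd, qdd], dim=-1))
        return tau_structured + tau_residual

# Example: fit torques of a hypothetical 6-DoF arm from a random mini-batch.
model = AugmentedDeLaN(n_dof=6)
q, qd, qdd, tau_meas = (torch.randn(8, 6) for _ in range(4))
loss = torch.mean((model(q, qd, qdd) - tau_meas) ** 2)
loss.backward()
```

The design choice illustrated here is simply that the structured branch keeps torques physically consistent with a positive-definite mass matrix, while the residual branch is free to fit whatever the conservation-based model cannot explain.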
With the accelerated aging of the global population and escalating labor costs, more service robots are needed to help people perform complex tasks, making human-robot interaction a particularly important research topic. To effectively transfer human behavioral skills to a robot, in this study we realized skill-learning functions via our proposed wearable device. The robotic teleoperation system enables interactive demonstration through the wearable device, which directly controls the speeds of the robot's motors. We present a rotation-invariant dynamical movement primitive (DMP) method for learning interaction skills. We also conducted robotic teleoperation demonstrations and designed imitation learning experiments. The experimental human-robot interaction results confirm the effectiveness of the proposed method.
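For reference, the standard discrete DMP formulation on which such a method builds can be sketched as follows; the gains, basis-function settings, and demonstration signal are illustrative assumptions, and the paper's rotation-invariant modification of the forcing term is not shown.

```python
import numpy as np

class DMP1D:
    """Minimal one-dimensional discrete DMP (standard formulation only)."""

    def __init__(self, n_basis=20, alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, beta_z, alpha_x
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))  # basis centers
        self.h = 1.0 / np.gradient(self.c) ** 2                     # basis widths
        self.w = np.zeros(n_basis)                                  # forcing weights

    def _psi(self, x):
        # Gaussian basis functions evaluated at canonical phase x.
        return np.exp(-self.h * (np.atleast_1d(x)[:, None] - self.c) ** 2)

    def fit(self, y_demo, dt):
        """Learn forcing-term weights from one demonstrated trajectory."""
        self.y0, self.g, self.tau = y_demo[0], y_demo[-1], (len(y_demo) - 1) * dt
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        t = np.arange(len(y_demo)) * dt
        x = np.exp(-self.alpha_x * t / self.tau)                    # canonical phase
        f_target = (self.tau ** 2 * ydd
                    - self.alpha_z * (self.beta_z * (self.g - y_demo) - self.tau * yd))
        psi, s = self._psi(x), x * (self.g - self.y0)
        # Locally weighted regression, one weight per basis function.
        self.w = ((psi * (s * f_target)[:, None]).sum(0)
                  / ((psi * (s ** 2)[:, None]).sum(0) + 1e-10))

    def rollout(self, dt, y0=None, g=None):
        """Reproduce the learned motion, optionally from a new start or goal."""
        y = self.y0 if y0 is None else y0
        g = self.g if g is None else g
        y_start, z, x, traj = y, 0.0, 1.0, []
        for _ in range(int(self.tau / dt)):
            psi = self._psi(x)[0]
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (g - y_start)
            zd = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            z, y = z + zd * dt, y + z / self.tau * dt
            x += -self.alpha_x * x / self.tau * dt
            traj.append(y)
        return np.array(traj)

# Learn from a synthetic reaching demonstration and replay towards a new goal.
dt = 0.01
demo = np.sin(np.pi * np.linspace(0.0, 1.0, 101) / 2)   # simple 0 -> 1 motion
dmp = DMP1D()
dmp.fit(demo, dt)
reproduction = dmp.rollout(dt, g=1.5)
```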
The control of a high Degree of Freedom (DoF) robot to grasp a target in three-dimensional space using a Brain-Computer Interface (BCI) remains a very difficult problem. The design of a synchronous BCI requires the user to perform the brain-activity task continuously according to a predefined paradigm; such a process is tedious and fatiguing. Furthermore, the strategy of switching between robotic auto-control and BCI control is not very reliable, because the accuracy of Motor Imagery (MI) pattern recognition rarely reaches 100%.
Modern computational models have leveraged biological insights from human brain research. This study addresses the problem of multimodal learning with the help of brain-inspired models. Specifically, a unified multimodal learning architecture is proposed based on deep neural networks inspired by the biology of the visual cortex of the human brain. This unified framework is validated on two practical multimodal learning tasks: image captioning, involving visual and natural-language signals, and visual-haptic fusion, involving haptic and visual signals. Extensive experiments are conducted under the framework, and competitive results are achieved.
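As a rough illustration of one way a unified multimodal architecture can be organized, with modality-specific encoders feeding a shared fusion module, here is a hedged PyTorch sketch for the visual-haptic case; the layer sizes, names, and fusion-by-concatenation choice are assumptions, not the paper's brain-inspired design, and the image-captioning task is not shown.

```python
import torch
import torch.nn as nn

class VisualHapticFusionNet(nn.Module):
    """Sketch: modality-specific encoders with a shared fusion prediction head."""

    def __init__(self, haptic_dim=32, embed_dim=128, n_classes=10):
        super().__init__()
        # Small convolutional encoder for visual input (e.g., 3x64x64 images).
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # MLP encoder for haptic signals (e.g., tactile/force feature vectors).
        self.haptic_encoder = nn.Sequential(
            nn.Linear(haptic_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim),
        )
        # Fusion by concatenation followed by a shared prediction head.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, n_classes),
        )

    def forward(self, image, haptic):
        fused = torch.cat([self.visual_encoder(image),
                           self.haptic_encoder(haptic)], dim=-1)
        return self.head(fused)

# Example forward pass with random visual and haptic inputs.
net = VisualHapticFusionNet()
logits = net(torch.randn(4, 3, 64, 64), torch.randn(4, 32))
```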
In this paper, the attitude control problem of a rigid body is addressed, considering inertia uncertainty, bounded time-varying disturbances, the absence of angular-velocity measurements, and unknown non-symmetric input saturation. Through a mathematical transformation, the effects of the bounded time-varying disturbances, uncertain inertia, and saturated input are lumped together as a total disturbance. A novel finite-time observer is designed to estimate both the unknown angular velocity and the total disturbance. For attitude control, an observer-based sliding-mode control protocol is proposed to force the system state to converge to the desired sliding-mode surface; finite-time stability is guaranteed via Lyapunov analysis. Finally, a numerical simulation is presented to illustrate the effective performance of the proposed sliding-mode control protocol.
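To illustrate the flavor of sliding-mode attitude control (without the paper's finite-time observer or saturation handling, and assuming full state measurement), here is a minimal simulation sketch; the inertia values, gains, disturbance profile, and quaternion-based sliding surface are illustrative assumptions.

```python
import numpy as np

# Illustrative inertia and gains (assumed values, not from the paper).
J = np.diag([1.0, 0.8, 0.5])            # rigid-body inertia matrix
lam, K, c, eps = 2.0, 0.4, 1.0, 0.05    # surface slope, switching gain, reaching gain, boundary layer
dt, T = 0.001, 10.0

def quat_kinematics(qv, qw, w):
    """Attitude-error quaternion kinematics (vector part qv, scalar part qw)."""
    return 0.5 * (qw * w + np.cross(qv, w)), -0.5 * qv @ w

# Initial attitude error (quaternion parts) and angular velocity.
qv, qw, w = np.array([0.3, -0.2, 0.4]), np.sqrt(1.0 - 0.29), np.zeros(3)
for k in range(int(T / dt)):
    t = k * dt
    d = 0.02 * np.array([np.sin(t), np.cos(2 * t), np.sin(3 * t)])   # bounded disturbance
    s = w + lam * qv                                                  # sliding surface
    qv_dot, qw_dot = quat_kinematics(qv, qw, w)
    # Cancel the known gyroscopic term, then drive s to zero; the saturated
    # switching term (boundary layer eps) limits chattering.
    u = (np.cross(w, J @ w) - J @ (lam * qv_dot)
         - J @ (K * np.clip(s / eps, -1.0, 1.0) + c * s))
    w_dot = np.linalg.solve(J, -np.cross(w, J @ w) + u + d)
    w, qv, qw = w + w_dot * dt, qv + qv_dot * dt, qw + qw_dot * dt
    n = np.linalg.norm(np.r_[qv, qw])
    qv, qw = qv / n, qw / n                                           # renormalize quaternion

print("final |qv| =", np.linalg.norm(qv), "  final |w| =", np.linalg.norm(w))
```

With the switching gain chosen larger than the disturbance bound, the surface s is reached despite the time-varying disturbance, after which the attitude error decays along s = 0; the observer-based design in the paper additionally removes the need for angular-velocity measurement.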