Despite significant advances in bipedal locomotion, enabling humanoid robots to perform general whole-body tasks through meaningful interaction with their environments remains a challenging open problem. While deep reinforcement learning (RL) has recently demonstrated impressive results in dynamic walking — even on complex and unpredictable terrain — real-world utility demands that humanoids go beyond locomotion to execute task-oriented behaviors.
In this work, we propose a framework for teaching humanoid robots useful tasks by training policies that track human motion references. Our approach leverages high-quality in-house motion capture (MoCap) data, from which we perform kinematic retargeting to project human trajectories onto a humanoid platform. Crucially, we adopt a hybrid learning paradigm: the policy is trained to track upper-body and root motions from the MoCap data, while receiving additional supervision from a pre-trained omnidirectional walking expert. This expert guidance, implemented via a Behavior Cloning (BC) objective, ensures that the leg motion respects the dynamic and kinematic constraints of the humanoid. We train policies entirely in simulation and successfully transfer them to a real humanoid robot. We validate our method on a box loco-manipulation task, demonstrating effective sim-to-real transfer and marking a step toward more capable, task-driven humanoid behavior.
Our objective is to develop a methodology that enables humanoid robots to learn loco-manipulation behaviors directly from human motion demonstrations. To this end, we propose a framework that combines motion capture, inverse kinematics (IK), and reinforcement learning (RL) with an auxiliary Behavior Cloning (BC) loss term to achieve motion imitation while maintaining dynamic feasibility on humanoid platforms.
The pipeline begins with the collection of reference motion data in-house by having a human subject perform the loco-manipulation task while wearing a full-body motion capture suit. The task consists of walking toward a box placed on a table, picking it up, carrying it to another table, and setting it down. We record both the 3D trajectories of body markers and the corresponding skeletal motion. We also record contact forces through sensors placed on the hands and on the bottom surface of the box, enabling accurate identification of contact events during the interaction.
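Contact events can be extracted from the recorded force signals by simple thresholding; a hysteresis band avoids spurious on/off flicker near the threshold. The following is an illustrative sketch only (the threshold values and function name are assumptions, not the paper's actual processing):

```python
import numpy as np

def detect_contact_events(forces, t_on=5.0, t_off=2.0):
    """Label each force sample as contact (True) or no-contact (False)
    using hysteresis thresholding: contact begins when the measured force
    rises above t_on [N] and ends when it drops below t_off [N].
    Thresholds are hypothetical and would be tuned per sensor."""
    in_contact = False
    labels = []
    for f in forces:
        if not in_contact and f > t_on:
            in_contact = True
        elif in_contact and f < t_off:
            in_contact = False
        labels.append(in_contact)
    return np.array(labels)
```

The two-threshold design means a brief dip in force during a grasp does not register as a release, which matters when segmenting pick and place phases from noisy sensor data.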
The kinematic retargeting of the collected MoCap trajectory from a human skeleton to a humanoid robot is formulated as optimization-based inverse kinematics, solved as a constrained quadratic program. The objective function minimizes the weighted sum of squared errors between the 3D positions and orientations of key end-effectors (hands and feet), as well as the poses of the torso and head. This formulation allows for smooth tracking of human motion while preserving the structural characteristics of the original trajectory.
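The core of such a retargeting step can be sketched as a damped weighted least-squares solve, which is the unconstrained form of the QP above (joint limits and other constraints would enter as inequality constraints in a full QP solver). Task Jacobians, weights, and the damping value below are illustrative assumptions:

```python
import numpy as np

def ik_step(jacobians, errors, weights, damping=1e-3):
    """One retargeting step: minimize sum_i w_i * ||J_i dq - e_i||^2
    plus a damping term on dq (Tikhonov regularization).
    jacobians: list of (m_i, n) task Jacobians (hands, feet, torso, head)
    errors:    list of (m_i,) task-space position/orientation errors
    weights:   list of scalar task weights
    Returns the joint-velocity update dq of size n."""
    n = jacobians[0].shape[1]
    H = damping * np.eye(n)   # regularized Hessian of the quadratic cost
    g = np.zeros(n)           # gradient term
    for J, e, w in zip(jacobians, errors, weights):
        H += w * J.T @ J
        g += w * J.T @ e
    return np.linalg.solve(H, g)
```

Integrating `dq` over the trajectory yields retargeted joint motion; the per-task weights encode the relative importance of tracking hands versus feet versus torso, mirroring the weighted objective described above.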
Directly mimicking full-body trajectories, including walking movements, is infeasible due to the substantial dynamic and morphological differences between humans and humanoid robots. In particular, the discrepancy in limb proportions and joint constraints often leads to violations of dynamic stability when directly replicating human walking patterns. Our early sim-to-real experiments showed that the robot struggled to make stable foot contact with the floor while taking unrealistically long strides. Furthermore, tuning IK parameters to enforce both kinematic accuracy and plausible contact dynamics, such as maintaining foot orientation, is labor-intensive and does not scale. We therefore train an expert bipedal walking policy by adapting the approach presented in [1], [2] to the H1 humanoid.
[1] R. P. Singh, Z. Xie, P. Gergondet, and F. Kanehiro, “Learning bipedal walking for humanoids with current feedback,” IEEE Access, vol. 11, pp. 82013–82023, 2023.
[2] R. P. Singh, M. Morisawa, M. Benallegue, Z. Xie, and F. Kanehiro, “Robust humanoid walking on compliant and uneven terrain with deep reinforcement learning,” in 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024, pp. 497–504.
We implement teacher-student supervision through a behavior cloning objective, guiding the student policy to match the expert's leg actions. In parallel, we use RL to train the upper body to imitate the human motion reference. This hybrid approach allows the student policy to benefit from high-level human demonstrations while inheriting the robustness and stability of the expert locomotion policy, resulting in a coherent whole-body behavior that successfully integrates walking and manipulation.
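The hybrid objective can be sketched as the RL loss augmented with a BC term that penalizes deviation of the student's leg actions from the frozen expert's. The function name, action layout, and weighting below are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def hybrid_actor_loss(rl_loss, student_actions, expert_leg_actions,
                      leg_idx, bc_weight=0.5):
    """Combine the RL objective with a BC term on the leg joints.
    rl_loss:            scalar RL (e.g. PPO surrogate) loss for this batch
    student_actions:    (batch, n_joints) actions from the student policy
    expert_leg_actions: (batch, n_leg_joints) actions from the frozen expert
    leg_idx:            indices of the leg joints within the action vector
    bc_weight:          relative weight of the cloning term (assumed value)."""
    bc_loss = np.mean((student_actions[:, leg_idx] - expert_leg_actions) ** 2)
    return rl_loss + bc_weight * bc_loss
```

In this scheme the RL gradient shapes the upper-body tracking behavior, while the BC gradient anchors the legs to the dynamically feasible gait of the walking expert.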
The robot achieves a full loco-manipulation cycle by walking toward the box, picking it up, dropping it off on another table, and returning to its starting position. This is done in one go by automatically switching between policies when the phase clock signal reaches its final value. The cycle can be repeated on-the-fly by placing the box back on the first table while the robot is returning to its starting position. This successful deployment indicates that the domain randomization we used was effective for bridging the sim-to-real gap.
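The phase-clock-driven handover can be pictured as a simple dispatcher over an ordered list of phase segments; this is a minimal sketch under assumed conventions (a clock normalized to [0, 1], and segment boundaries chosen for illustration):

```python
def select_policy(phase, segments):
    """Return the policy active at the given phase-clock value.
    segments: ordered list of (end_phase, policy) pairs; control hands
    over to the next entry once the clock passes each segment's final
    value. Segment boundaries here are hypothetical."""
    for end_phase, policy in segments:
        if phase <= end_phase:
            return policy
    return segments[-1][1]  # clamp to the last segment
```

Evaluating this dispatcher once per control step yields the automatic switching described above without any manual intervention during the cycle.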
Although the policy was trained on a single human motion recording demonstrating a short-distance pick-and-place, we generalize loco-manipulation to longer distances. To do so, we leverage a 2D Dubins planning algorithm to generate paths along which intermediate target positions are set sequentially, leading the robot to the real pick-up and drop-off locations. That way, the target positions given to the policy remain in-distribution.
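The intermediate-target scheme amounts to resampling the planned path at a spacing no larger than the distances seen in training. A minimal sketch, assuming a piecewise-linear discretization of the Dubins path and a hypothetical maximum spacing `d_max`:

```python
import numpy as np

def intermediate_targets(path_xy, d_max):
    """Resample a planned 2D path (e.g. the output of a Dubins planner,
    discretized into points) into a sequence of intermediate targets
    spaced at most d_max apart, so each target handed to the policy
    stays within the distance range seen during training."""
    targets = [np.asarray(path_xy[0], dtype=float)]
    for p in path_xy[1:]:
        p = np.asarray(p, dtype=float)
        # insert evenly spaced waypoints until p is within d_max
        while np.linalg.norm(p - targets[-1]) > d_max:
            step = (p - targets[-1]) / np.linalg.norm(p - targets[-1])
            targets.append(targets[-1] + d_max * step)
        targets.append(p)
    return targets
```

Feeding these waypoints one at a time, advancing to the next once the current one is reached, lets a policy trained on short pick-and-place displacements cover arbitrarily long routes.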