OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi
THE PROBLEM
Before OmniRetarget, motion retargeting (converting human movements into robot commands) was plagued by the embodiment gap. Humans and humanoid robots have fundamentally different body proportions, joint ranges, and physical capabilities. When you naively retarget human motion to robots, you get physical artifacts that make the motion implausible: feet sliding across the floor (foot-skating), hands penetrating objects, and broken contacts. Existing methods such as General Motion Retargeting (GMR) and the retargeting pipeline from Perpetual Humanoid Control (PHC) tried to fix kinematic infeasibility, but they ignored the semantic content: the actual interactions between the human, objects, and the environment. A human demonstration of 'carry a box up stairs' contains rich relational information about hand-object contact and foot-ground contact that previous methods simply discarded. As a result, data was used wastefully: one human demonstration could only train one skill on one robot with one object configuration. Developers had to manually create massive motion datasets or hand-craft reward functions, which made scaling humanoid learning expensive and brittle.
HOW IT WORKS
Interaction Mesh Construction
Laplacian Deformation with Kinematic Constraints
Systematic Data Augmentation from Interaction Semantics
Proprioceptive RL Training with Minimal Rewards
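The page lists these steps by name only, so the following is a minimal sketch of the idea behind the first two, not the authors' implementation. It assumes that keypoints from the human body, the object, and the environment are linked into a neighbor graph (a k-nearest-neighbor graph stands in here for the interaction mesh), and that retargeting tries to keep each keypoint's uniform-weight Laplacian coordinate, meaning its offset from the average of its neighbors, close to the source value. The function names, the keypoints, and the choice of k are illustrative assumptions. The kinematic constraints named in the second step (such as joint limits and non-penetration) are not modeled.

```python
import math

def knn_graph(points, k):
    """Assumed stand-in for the interaction mesh: link each keypoint to its k nearest neighbors."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph

def laplacian_coords(points, graph):
    """Uniform-weight Laplacian coordinate: a point minus the mean of its neighbors."""
    coords = []
    for p, nbrs in zip(points, graph):
        mean = [sum(points[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
        coords.append([p[d] - mean[d] for d in range(3)])
    return coords

def deformation_energy(source, candidate, graph):
    """Sum of squared differences between source and candidate Laplacian coordinates."""
    src = laplacian_coords(source, graph)
    cand = laplacian_coords(candidate, graph)
    return sum((a - b) ** 2 for s, c in zip(src, cand) for a, b in zip(s, c))

# Hypothetical keypoints (meters): two hands, pelvis, two feet, two box corners.
human_scene = [
    (0.30, 0.20, 1.00), (0.30, -0.20, 1.00), (0.00, 0.00, 0.95),
    (0.00, 0.10, 0.00), (0.00, -0.10, 0.00),
    (0.35, 0.20, 1.00), (0.35, -0.20, 1.00),
]
graph = knn_graph(human_scene, k=3)

# Translating the whole scene keeps every relative offset, so the energy stays near zero.
shifted = [(x + 0.5, y, z + 0.1) for x, y, z in human_scene]

# Pulling one hand away from the box changes its offset to the nearby box corner.
broken = list(human_scene)
broken[0] = (0.10, 0.20, 1.20)

energy_shifted = deformation_energy(human_scene, shifted, graph)
energy_broken = deformation_energy(human_scene, broken, graph)
```

In this toy setup the translated scene scores an energy near zero while the scene with the displaced hand scores noticeably higher, which is the property that lets such an energy penalize lost hand-object relationships rather than absolute position.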
KEY RESULTS
30 seconds of continuous execution, vs. typical humanoid skills at 5-10 seconds
The robot successfully executes multi-phase tasks (carry chair → climb platform → parkour roll) lasting 30 seconds continuously. This demonstrates coherent long-horizon execution: the robot maintains balance, object manipulation, and dynamic movement across multiple phases without falling or losing track of the task.
One shared reward structure, vs. typical humanoid papers requiring 15-20+ rewards and multi-stage curricula
The entire training pipeline needs minimal hyperparameter tuning: one shared reward structure and simple domain randomization work across all tasks (parkour, loco-manipulation, climbing, crawling). This suggests the retargeted motion data is high-quality enough that reinforcement learning (RL) doesn't need extensive task-specific engineering. In practice, this means you can scale to new tasks without rebuilding reward functions from scratch.
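To make "one shared reward structure" concrete, here is a minimal sketch of a reference-tracking reward whose terms and weights stay fixed across tasks, so that only the reference trajectory changes from task to task. The term names, weights, and scales are illustrative assumptions, not the reward terms used in the paper.

```python
import math

# Hypothetical shared configuration: (weight, scale) per tracked quantity.
SHARED_TERMS = {
    "joint_pos": (0.5, 0.5),
    "body_pos": (0.3, 0.2),
    "object_pos": (0.2, 0.2),
}

def squared_error(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tracking_reward(state, reference, terms=SHARED_TERMS):
    """Weighted sum of exp(-error / scale^2) terms, one per tracked quantity."""
    total = 0.0
    for name, (weight, scale) in terms.items():
        err = squared_error(state[name], reference[name])
        total += weight * math.exp(-err / scale ** 2)
    return total

# Hypothetical reference frame and a robot state that lags behind it.
reference = {"joint_pos": [0.1, -0.4, 0.8], "body_pos": [0.0, 0.0, 0.9], "object_pos": [0.4, 0.0, 1.0]}
lagging = {"joint_pos": [0.0, -0.2, 0.6], "body_pos": [0.0, 0.1, 0.8], "object_pos": [0.3, 0.0, 0.9]}

perfect_reward = tracking_reward(reference, reference)
lagging_reward = tracking_reward(lagging, reference)
```

Because the same function and configuration would be reused for every task, adding a new skill in this sketch amounts to supplying a new reference trajectory, not new reward code.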
Over 9 hours of retargeted motion from multiple human mocap datasets (OMOMO, LAFAN1, proprietary)
OmniRetarget processed and retargeted over 9 hours of human motion capture data across three different datasets, producing physically feasible robot trajectories. This demonstrates robustness across different human movement styles and datasets, not just curated in-house motion capture.
Fewer physical artifacts, vs. GMR and PHC baselines showing visible foot-skating and penetration
Visual comparisons on the project page show that GMR produces obvious foot-sliding artifacts and object penetration, while OmniRetarget trajectories obey non-penetration constraints and maintain contact integrity. This is the core technical contribution: making retargeting interaction-aware removes the physical artifacts that break reinforcement learning (RL) training.
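The two artifacts named here can be quantified with simple checks on a retargeted trajectory. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: the ground is a flat plane at height zero, a foot counts as in contact when it is below a hypothetical 2 cm threshold, and the function names are invented for the example.

```python
import math

def foot_skate(foot_traj, contact_height=0.02):
    """Total horizontal distance a foot travels while in contact in two consecutive frames."""
    slide = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(foot_traj, foot_traj[1:]):
        if z0 < contact_height and z1 < contact_height:
            slide += math.hypot(x1 - x0, y1 - y0)
    return slide

def max_ground_penetration(points, ground_z=0.0):
    """Deepest distance any point sinks below the ground plane (0.0 if none do)."""
    return max([0.0] + [ground_z - z for _, _, z in points])

# Hypothetical foot trajectories (meters), one (x, y, z) position per frame.
planted = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
skating = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), (0.06, 0.04, -0.01)]

planted_slide = foot_skate(planted)
skating_slide = foot_skate(skating)
skating_depth = max_ground_penetration(skating)
```

A planted foot scores zero on both checks, while the second trajectory accumulates horizontal slide during contact and dips below the ground plane, the kind of motion a tracking policy cannot reproduce physically.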
WHY DEVELOPERS SHOULD CARE
If you're building humanoid robotics applications, OmniRetarget matters in two ways. First, it addresses the data bottleneck. Creating robot training data has been expensive: you either hire motion capture studios, craft trajectories by hand, or run massive reinforcement learning (RL) simulations. OmniRetarget lets you harvest human motion from public datasets (LAFAN1, for example, covers a wide range of human movements) and automatically convert it into training data for multiple robot embodiments and configurations. One human demonstration becomes dozens of robot training scenarios. Second, it shows that motion retargeting, done correctly, is a foundational building block for humanoid learning. Prior work treated retargeting as a preprocessing step that was 'good enough', but this paper demonstrates that preserving interaction semantics during retargeting is critical. The RL policy can then focus on the control problem (tracking references under physics) instead of learning from scratch.
The broader lesson is that human demonstrations contain rich structure about what skillful movement should look like, and respecting that structure (especially interaction structure) makes learning much more efficient. For developers, this means: (1) leverage human mocap data systematically instead of collecting robot-only data, (2) treat interaction constraints as first-class citizens in motion processing, and (3) don't over-engineer reward functions if your reference data is high-quality.
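One way to picture how a single demonstration can yield many scenarios: if contacts are stored relative to the object instead of in world coordinates, the same contact can be re-instantiated on a moved or resized object. The sketch below assumes an axis-aligned box and a hypothetical normalized contact representation. It illustrates the idea only and leaves out the step that would re-solve the robot motion for each new target.

```python
def to_object_frame(contact, box_center, box_half_extents):
    """Express a world-frame contact point in box-normalized coordinates."""
    return tuple((c - o) / h for c, o, h in zip(contact, box_center, box_half_extents))

def to_world_frame(local, box_center, box_half_extents):
    """Map box-normalized coordinates back to a world-frame point on a given box."""
    return tuple(o + l * h for l, o, h in zip(local, box_center, box_half_extents))

def augment(contact, src_box, new_boxes):
    """Re-place one demonstrated contact on each new box configuration."""
    local = to_object_frame(contact, *src_box)
    return [to_world_frame(local, *box) for box in new_boxes]

# Hypothetical source scene: a 0.5 m cube with the left hand on its +y face.
src_box = ((0.5, 0.0, 1.0), (0.25, 0.25, 0.25))  # (center, half-extents) in meters
left_hand = (0.5, 0.25, 1.0)

# Two hypothetical variations: a box twice as wide, and a smaller box placed elsewhere.
new_boxes = [
    ((0.5, 0.0, 1.0), (0.25, 0.5, 0.25)),
    ((1.0, 0.5, 0.5), (0.125, 0.125, 0.125)),
]
targets = augment(left_hand, src_box, new_boxes)
```

Each entry in `targets` is a new hand target that keeps the hand on the same face of the box, so a downstream solver would preserve the demonstrated contact while the scene changes.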
LIMITATIONS
OmniRetarget relies on accurate motion capture input and requires that interactions can be modeled geometrically (hand-object contact, foot-ground contact). It doesn't handle situations where the scene geometry is unknown or complex interaction logic is needed (e.g., 'grasp the handle, not the blade'). The method also assumes that human motion is retargetable to the target robot at all: some human movements (like extreme flexibility) simply aren't feasible for robots, and the paper doesn't discuss how gracefully it handles such cases. Additionally, all experiments are on Unitree humanoids in relatively controlled environments; generalization to other robot morphologies or unstructured real-world scenes is untested. The proprioceptive-only policy also means the robot has no visual feedback, limiting robustness to unexpected scene variations or dynamic obstacles that the demonstration didn't cover.
WHAT COMES NEXT
The next frontier is likely bridging the sim-to-real (sim2real) gap more reliably and adding perception. Currently, OmniRetarget generates data in simulation, and there's always slippage when deploying to real robots. Combining interaction-preserving retargeting with vision-based policies (so the robot can adapt when objects or terrain don't exactly match the training distribution) would move this approach closer to production use. Another direction is learning from in-the-wild human video (YouTube, TikTok) without mocap: estimating 3D human pose from video, preserving interactions, and retargeting for robots. Finally, extending interaction meshes to more complex scenarios (multi-object manipulation, human-robot collaboration, contact-rich manipulation like piano playing) could unlock even richer skill learning from human demonstrations.