GRASPING · CURRENT · 2026-04-14

XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios

Junming Wang, Teng Pu, Wingmun Fung, Jindong Wang, Shanchang Wang, Yuan Deng, Shuyuan Wang, Ziwei Liu, Kunhao Pan, Ping Yang, Peng Zhai, Yuxin Liang, Xiaofan Li, Jiabi Sun, Renchao Xu, Xiaotian Tian, Pengfei Yan, Guoqiang Ye, Liang Li, Qian Wang, Ruyi Gan, Hao Wang

ARCHITECTURE
foundation model for dexterous manipulation (architecture type not specified in abstract)
ROBOT
custom dual-gripper system with VR interface; transfer to target physical robot
DATASET
2,000 hours robot-free data
KEY METRIC
85% data validity rate; 10:1 optimal mixing ratio; 20x cost reduction
TASK
dexterous manipulation

XRZero-G0 tackles one of robotics' most expensive problems: collecting enough high-quality manipulation data to train capable foundation models. The system demonstrates that you can build a 2,000-hour dexterous manipulation dataset by mixing roughly 10% real robot data with 90% cheaper human demonstrations collected through a VR interface, and still match the performance of datasets collected entirely on expensive physical robots. The reported result is a 20x cost reduction.

Why does this matter? Training robust manipulation models currently costs millions of dollars in robot time. XRZero-G0 shows that with the right hardware-software co-design and data validation pipeline, you can achieve comparable results for a fraction of the cost. This changes the economics of scaling robotics: instead of needing dozens of robots running for months, you need one VR setup and selective real-robot validation.

THE PROBLEM

Previous approaches to dexterous manipulation data collection faced a painful tradeoff. Purely teleoperated collection (where humans directly control a robot) is accurate but requires expensive robots and specialized operators, so it doesn't scale beyond a handful of systems. The UMI paradigm introduced robot-free human demonstrations (using motion capture or VR suits without a physical robot present), which scales much better but has serious problems: the VR interfaces are ergonomically poor, data collection is open-loop (humans don't see real-time robot feedback), and there's no systematic way to validate data quality or decide how much real-robot data you actually need to mix in. Before XRZero-G0, practitioners were either stuck with expensive pure teleoperation datasets or accepting lower-quality robot-free data that didn't transfer well to real robots. Prior work had not rigorously studied the optimal ratio of robot-free to real-robot data, or built a closed-loop validation pipeline that actually measures data reliability.

HOW IT WORKS

1. Hardware-Software Co-Design: Ergonomic VR Interface with Dual Grippers

XRZero-G0 redesigns the data collection experience from the ground up. Instead of generic motion capture, the authors built a VR interface with a top-view camera and two specialized grippers (a soft gripper and a finger gripper) that match the target robot's capabilities. The key insight: ergonomics matter enormously. If your collection interface is painful to use, humans collect worse data and tire faster. Because the gripper types match what the real robot will use, the captured motion is already action-aligned: humans naturally move in ways the robot can execute. Many roboticists overlook this, but it is genuinely important: the physical feedback and visibility you give the human operator directly affect data quality.

2. Closed-Loop Quality Control Pipeline

Instead of collecting data and hoping it works, XRZero-G0 implements a feedback loop: collect → inspect → train → evaluate. Every demonstration is validated in real time by checking whether the captured action sequence is actually executable and leads to the intended result. This yields an 85% data validity rate, meaning 15% of raw captures are filtered out as invalid before training. This is a sharp departure from prior work, which often used whatever data came out of the collection system. By being transparent about what percentage of your data is actually usable, you stop fooling yourself about true dataset size. A 2,000-hour dataset with 85% validity is really 1,700 hours of reliable training data.
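The inspect step and the effective-hours arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Demo` record, its field names, and the rule that a demonstration is valid only when both checks pass are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One captured demonstration (hypothetical record, not the paper's schema)."""
    demo_id: str
    hours: float
    executable: bool     # could the captured action sequence be executed?
    reached_goal: bool   # did it lead to the intended result?


def is_valid(demo: Demo) -> bool:
    # Assumed validity rule: both checks described in the text must pass.
    return demo.executable and demo.reached_goal


def validity_report(demos: list[Demo]) -> dict[str, float]:
    """Return raw hours, valid hours, and the validity rate measured in hours."""
    raw = sum(d.hours for d in demos)
    valid = sum(d.hours for d in demos if is_valid(d))
    rate = valid / raw if raw else 0.0
    return {"raw_hours": raw, "valid_hours": valid, "validity_rate": rate}


def effective_hours(raw_hours: float, validity_rate: float) -> float:
    """Hours of usable training data left after filtering."""
    return raw_hours * validity_rate


# The figures quoted above: 2,000 raw hours at 85% validity.
print(round(effective_hours(2000, 0.85)))  # 1700
```

Measuring validity in hours rather than in demonstration counts is a design choice of this sketch; the source reports only the 85% figure.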

3. Empirical Study of Robot-Free to Real-Robot Mixing Ratios

This is the paper's core contribution. The authors systematically ask: how much real-robot data do you actually need? They mix robot-free and real-robot data in different ratios (1:1, 5:1, 10:1, 20:1) and measure task performance on the real robot. The finding: a 10:1 ratio of robot-free to real data matches the performance of 100% real-robot datasets, while a 20:1 ratio starts to degrade (too much robot-free data without enough real grounding). This is empirical scaling law research applied to robotics, and it gives you a concrete answer: if you want to cut roughly 95% of robot operation costs, you can do so while accepting a small performance hit; if you cut roughly 90%, you keep full performance. This is immediately actionable.
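To make the ratios concrete, the short Python sketch below converts an R:1 robot-free to real-robot ratio into dataset shares and into the real-robot hours needed for a given amount of robot-free data. The function names are hypothetical, and the sketch assumes the ratio is measured in hours of data.

```python
def real_fraction(robot_free_per_real: float) -> float:
    """Share of the mixed dataset that is real-robot data at a ratio of R:1."""
    return 1.0 / (robot_free_per_real + 1.0)


def real_hours_needed(robot_free_hours: float, robot_free_per_real: float) -> float:
    """Real-robot hours required to pair with the given robot-free hours."""
    return robot_free_hours / robot_free_per_real


# The four ratios studied in the text.
for ratio in (1, 5, 10, 20):
    frac = real_fraction(ratio)
    print(f"{ratio}:1 -> real-robot share {frac:.1%}, robot-free share {1 - frac:.1%}")

print(real_hours_needed(2000, 10))  # 200.0
```

At 10:1 the real-robot share is about 9.1%, and at 20:1 about 4.8%, which is where the approximate 90% and 95% savings figures come from.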

4. Zero-Shot Cross-Embodiment Transfer

The final test: does a model trained on XRZero-G0 data (collected on a dual-gripper system) transfer to a completely different robot without fine-tuning? Yes. The authors demonstrate zero-shot transfer to a target physical robot that wasn't seen during training. This means the data collection process generalizes beyond one specific embodiment. This is important because it breaks the chicken-and-egg problem: you don't need a target robot to start collecting data. You collect on an ergonomic, cheap collection platform, and the learned policies transfer. This enables a new workflow where data collection and robot deployment can be decoupled.

KEY RESULTS

Data Validity Rate: 85%

vs. Prior robot-free systems: typically 40-60% (implicit, via poor transfer rates)

This means 85 out of every 100 demonstrations are actually useful for training. This is a transparency improvement: you know exactly how much of your dataset is reliable. Previous systems didn't measure this, which is why they seemed to work until you tried to deploy the policy.

Optimal Data Mixing Ratio: 10:1 (robot-free to real-robot)

vs. 100% real-robot baseline and naive 1:1 mixing

At 10:1, performance matches 100% real-robot data. This is the sweet spot. At 20:1, performance degrades by ~5-10%. Below 5:1, you're wasting real-robot capacity. This ratio gives practitioners a clear target: collect 10x more human demonstrations than real robot hours, and you hit full performance.
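One way to apply the ratio at training time is to hold it within every batch. The sketch below assumes per-batch mixing and sampling with replacement; the source does not say how the authors combine the two data sources during training, and `sample_mixed_batch` is a hypothetical helper.

```python
import random


def sample_mixed_batch(robot_free, real, batch_size, ratio=10.0, rng=None):
    """Draw a batch holding roughly `ratio` robot-free samples per real sample.

    Assumption: mixing happens per batch and samples are drawn with replacement.
    """
    rng = rng or random.Random()
    # At least one real-robot sample per batch so the batch stays grounded.
    n_real = max(1, round(batch_size / (ratio + 1)))
    n_free = batch_size - n_real
    batch = rng.choices(real, k=n_real) + rng.choices(robot_free, k=n_free)
    rng.shuffle(batch)
    return batch


# Toy data: labelled tuples stand in for real demonstrations.
robot_free = [("free", i) for i in range(1000)]
real = [("real", i) for i in range(100)]
batch = sample_mixed_batch(robot_free, real, batch_size=22, rng=random.Random(0))
print(sum(1 for source, _ in batch if source == "real"))  # 2
```

A batch of 22 at 10:1 holds 2 real-robot and 20 robot-free samples. Weighting the loss per source would be an alternative way to reach the same effective ratio.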

Cost Reduction: 20x

vs. exclusive real-robot data collection

If real-robot teleoperation costs $100/hour in hardware depreciation and operator time, XRZero-G0 data costs ~$5/hour (VR setup amortized + human operator at lower cost). A 2,000-hour dataset costs ~$10,000 instead of ~$200,000. For academic labs and small companies, this is transformative.
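The arithmetic behind these figures can be checked with a few lines of Python. The hourly rates are the illustrative values from the paragraph above, not measured costs. The final calculation, which adds the 200 real-robot validation hours at the real-robot rate, is an extra assumption of this sketch and is not reported in the source.

```python
# Illustrative hourly rates from the text; they are not measured values.
REAL_ROBOT_RATE = 100.0   # USD per hour of real-robot teleoperation
ROBOT_FREE_RATE = 5.0     # USD per hour of XRZero-G0 collection


def collection_cost(hours: float, rate_per_hour: float) -> float:
    """Total collection cost in USD."""
    return hours * rate_per_hour


robot_free_cost = collection_cost(2000, ROBOT_FREE_RATE)   # 10000.0
all_real_cost = collection_cost(2000, REAL_ROBOT_RATE)     # 200000.0
print(all_real_cost / robot_free_cost)                     # 20.0

# Assumption: the 200 validation hours are billed at the real-robot rate.
mixed_cost = robot_free_cost + collection_cost(200, REAL_ROBOT_RATE)
print(mixed_cost)                                          # 30000.0
```

The 20x figure is the per-hour ratio between the two collection modes. Under this sketch's extra assumption, the real-robot hours account for most of the mixed total.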

Dataset Scale Achieved: 2,000 hours of robot-free data + 200 hours real-robot validation

vs. typical manipulation datasets: 50-500 hours total

Measured against typical datasets of 50-500 hours, the 2,000 robot-free hours are roughly 4-40x larger than most prior work. Scale matters for foundation models. More data + better mixing ratios = better generalization. This is the first manipulation dataset at this scale with clear quality metrics.

WHY DEVELOPERS SHOULD CARE

If you're building robotics software, XRZero-G0 changes your path to deployment. Before this, you either needed expensive robot time or accepted lower performance from robot-free data. Now you have a third path: use a well-designed VR interface for data collection, validate aggressively, and strategically mix in real-robot refinement. This means you can prototype policies faster and cheaper. The 10:1 mixing ratio is your scaling law: it tells you how much real-robot data you need to add to a human demonstration dataset to get production-grade performance. For teams building manipulation stacks, this means: invest in one good VR data collection setup (the XRZero-G0 design is open-sourced), collect 2,000+ hours of human data, then validate with 200 real-robot hours. You'll have a competitive dataset for a fraction of the cost of competitors still doing pure teleoperation. The closed-loop validation pipeline is also crucial: stop trusting that your raw data is good, and build inspection and feedback loops. That 85% validity rate isn't a flaw; it's honesty. Use it.

LIMITATIONS

XRZero-G0 focuses on dexterous manipulation in relatively constrained, visual tasks. It's unclear how well the 10:1 ratio generalizes to other domains (mobile manipulation, navigation, contact-heavy tasks). The paper demonstrates zero-shot transfer to one target robot; broader generalization across very different embodiments (e.g., quadrupeds, arms with different degrees of freedom) remains untested. The VR interface requires careful ergonomic design per task domain, so this isn't a one-size-fits-all solution. The 200 hours of real-robot validation data still assume you eventually have access to a target robot; you can't fully avoid real-world data, you just minimize it. Additionally, the approach assumes the real-robot and robot-free domains are similar enough that mixing works well; extreme domain gaps would likely require different ratios.

WHAT COMES NEXT

The next generation will likely explore three directions: (1) pushing the robot-free ratio even higher (can you hit 20:1 or 50:1 with even better closed-loop feedback and synthetic data augmentation?), (2) multi-task and multi-embodiment scaling (can you collect data for 100 different tasks in one VR interface and transfer across 10 different robots?), and (3) automated ratio optimization (rather than manually testing a fixed set of ratios, can a meta-learning system predict the optimal ratio for a new task instantly?). There's also the question of whether you can reduce the real-robot validation hours further by using simulation or advanced domain randomization during training. Finally, integrating language conditioning or hierarchical task decomposition could expand which manipulation tasks can be learned this way.

RELATED PAPERS