Day 25
Domain randomization for sim-to-real
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (10 min)
- Domain randomization (DR): At each episode reset, sample physical parameters from distributions: mass, friction, motor strength, sensor noise. This forces the policy to be robust.
- Friction range: Typical values are floor friction ∈ [0.4, 1.5] and lateral friction ∈ [0.1, 1.0].
- Mass scale: ±20% mass per link, ±5 kg payload on torso.
- Motor PD randomization: Kp ∈ [80, 120], Kd ∈ [1, 3].
- Push perturbations: Random external forces every N steps to simulate kicks/wind.
- Action delay: Insert a 1–4 frame delay between policy and actuators (motor latency).
- Observation noise: Gaussian noise on IMU and joint sensors at deployment-realistic levels.
- Reality gap: Measured quantitatively as success rate in sim − success rate in real.
- Robust PPO: PPO + DR is sufficient for many tasks; the "next level" (Day 27) is teacher-student.
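The ranges in the primer can be pulled together into a per-episode sampler. The following is a minimal sketch in plain Python, not MuJoCo Playground's implementation: `DRParams`, `sample_dr_params`, `ActionDelayBuffer`, and `reality_gap` are hypothetical names, uniform sampling is an assumption (the primer gives ranges, not distributions), and a single mass scale is drawn rather than one per link. Applying the sampled values to a simulator is left out.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DRParams:
    """One episode's sampled physics parameters (hypothetical container)."""
    floor_friction: float
    mass_scale: float         # multiplier on each link's nominal mass
    torso_payload_kg: float   # offset added to the torso mass
    kp: float
    kd: float
    action_delay_frames: int


def sample_dr_params(rng: random.Random) -> DRParams:
    """Draw one parameter set; intended to be called at every episode reset."""
    return DRParams(
        floor_friction=rng.uniform(0.4, 1.5),
        mass_scale=rng.uniform(0.8, 1.2),          # +/-20% mass
        torso_payload_kg=rng.uniform(-5.0, 5.0),   # +/-5 kg payload
        kp=rng.uniform(80.0, 120.0),
        kd=rng.uniform(1.0, 3.0),
        action_delay_frames=rng.randint(1, 4),     # inclusive on both ends
    )


class ActionDelayBuffer:
    """FIFO that releases each action `delay` control frames after it was pushed."""

    def __init__(self, delay: int, default_action: float = 0.0):
        # Pre-fill so the first `delay` steps apply the default action.
        self._queue = deque([default_action] * delay)

    def step(self, action: float) -> float:
        self._queue.append(action)
        return self._queue.popleft()


def reality_gap(sim_success_rate: float, real_success_rate: float) -> float:
    """Success rate in sim minus success rate in real."""
    return sim_success_rate - real_success_rate


rng = random.Random(0)
params = sample_dr_params(rng)
delay_buffer = ActionDelayBuffer(params.action_delay_frames)
print(params)
```

Resampling at every reset, rather than once per training run, is what exposes the policy to the whole range of dynamics instead of one unlucky draw.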
Real-world analogy
A pianist who only practiced on one specific piano will struggle on a different one. A pianist who's practiced on 100 different pianos (each with slight differences in tuning, action, and weight) plays well on a new piano they've never seen. Domain randomization is the practice routine.
Hour 1 — Reading
- Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (Peng et al.), abstract + Section 3 (~20 min): https://arxiv.org/abs/1710.06537
- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (Tobin et al.), original DR paper (~15 min): https://arxiv.org/abs/1703.06907
- Sim-to-Real Survey 2024, skim Section 3 (~25 min): https://arxiv.org/abs/2503.00917
Hour 2 — Inspect Playground's DR config
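import inspect

from mujoco_playground.locomotion.go1 import joystick

# Look at randomize_fn and the DR config it uses.
# inspect.getsource accepts modules, classes, and functions, not instances;
# if GO1_DR_CONFIG is a dataclass instance, pass type(joystick.GO1_DR_CONFIG).
print(inspect.getsource(joystick.GO1_DR_CONFIG))
Read the DR config dataclass. Note which parameters are randomized and over what ranges.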
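If the config is a dataclass, its fields can also be tabulated directly instead of read from source. A minimal sketch, using a stand-in `ExampleDRConfig` because the real config's field names are not shown on this page; the stand-in's fields and values are borrowed from the glossary primer, not from Playground, and `list_ranges` is a hypothetical helper.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ExampleDRConfig:
    """Hypothetical stand-in; the real config's fields may differ."""
    floor_friction: tuple = (0.4, 1.5)
    kp: tuple = (80.0, 120.0)
    kd: tuple = (1.0, 3.0)


def list_ranges(config) -> dict:
    """Map each dataclass field name to its current value."""
    return {f.name: getattr(config, f.name) for f in dataclasses.fields(config)}


for name, value in list_ranges(ExampleDRConfig()).items():
    print(f"{name}: {value}")
```

Pointing `list_ranges` at the real config object (or an instance of it) gives a quick table to compare against the ranges in the primer.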
LAB
Hour 3 — Lab: train Go1 with vs without DR, evaluate on perturbed sim (75 min)
What you're building. Two Go1 policies:
- A: trained without DR (Day 24's policy).
- B: trained with full DR enabled.
Then evaluate both on a test sim with 10× heavier payload, 0.3 friction (slippery), and 50 ms action delay. The DR-trained policy should survive; the naive one should fall.
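To see how far outside the training distribution this test sits, the test conditions can be compared against the training ranges from the glossary primer. A minimal sketch: the 50 Hz control rate used to convert 50 ms into frames is an assumption (substitute the environment's actual control timestep), and `delay_ms_to_frames` and `in_range` are hypothetical helpers.

```python
import math

CONTROL_DT_S = 0.02  # assumed 50 Hz control loop; check your environment's value

# Training ranges from the glossary primer.
TRAIN_FLOOR_FRICTION = (0.4, 1.5)
TRAIN_DELAY_FRAMES = (1, 4)

# Perturbed test conditions from the lab description.
TEST_FLOOR_FRICTION = 0.3
TEST_DELAY_MS = 50.0


def delay_ms_to_frames(delay_ms: float, control_dt_s: float) -> int:
    """Round a latency up to a whole number of control frames."""
    return math.ceil(delay_ms / 1000.0 / control_dt_s)


def in_range(value: float, bounds: tuple) -> bool:
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


test_delay_frames = delay_ms_to_frames(TEST_DELAY_MS, CONTROL_DT_S)
print("delay frames:", test_delay_frames)
print("friction inside training range:",
      in_range(TEST_FLOOR_FRICTION, TRAIN_FLOOR_FRICTION))
print("delay inside training range:",
      in_range(test_delay_frames, TRAIN_DELAY_FRAMES))
```

Under the 50 Hz assumption, 50 ms rounds up to 3 frames, which is inside the 1–4 frame training range, while 0.3 friction falls below the 0.4 lower bound. The slippery floor is therefore an out-of-distribution test even for the DR-trained policy.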
What success looks like at the end. You have:
1. Two trained checkpoints policy_no_dr.pkl and policy_dr.pkl.
2. Eval results: no-DR policy fails (return < 5) on perturbed sim; DR policy survives (return ≥ 15).
3. Video videos/day25_dr_comparison.mp4 showing both policies side-by-side on perturbed sim.
4. Bar plot figures/day25_dr_robustness.png with 4 bars: (no-DR, default) vs (no-DR, perturbed) vs (DR, default) vs (DR, perturbed).
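The return thresholds and the four-bar plot can be sketched as follows. This is an illustrative outline, not the lab's reference solution: `meets_success_criteria` and `plot_robustness` are hypothetical names, the `returns` dictionary is expected to hold mean returns you measured yourself, and plotting assumes matplotlib is installed.

```python
from pathlib import Path

# Bar order from the lab description: (policy, sim condition).
BAR_ORDER = [
    ("no-DR", "default"),
    ("no-DR", "perturbed"),
    ("DR", "default"),
    ("DR", "perturbed"),
]


def meets_success_criteria(returns: dict) -> bool:
    """True if no-DR fails (< 5) and DR survives (>= 15) on the perturbed sim."""
    return (returns[("no-DR", "perturbed")] < 5
            and returns[("DR", "perturbed")] >= 15)


def plot_robustness(returns: dict,
                    out_path: str = "figures/day25_dr_robustness.png") -> None:
    """Save a 4-bar plot of mean return per (policy, condition) pair."""
    import matplotlib  # imported lazily so the check above works without it
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = [f"{policy}\n{condition}" for policy, condition in BAR_ORDER]
    heights = [returns[key] for key in BAR_ORDER]

    fig, ax = plt.subplots()
    ax.bar(labels, heights)
    ax.set_ylabel("Mean episode return")
    ax.set_title("Day 25: robustness with vs without DR")
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path)
    plt.close(fig)
```

Both functions take a dictionary keyed by the `(policy, condition)` tuples in `BAR_ORDER`, so the same evaluation output feeds the pass/fail check and the figure.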
Step 1 — Train Go1 with DR (60 min wall-clock)
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.