Day 25

Domain randomization for sim-to-real

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

  • Domain randomization (DR): At each episode reset, sample physical parameters (mass, friction, motor strength, sensor noise) from distributions. This forces the policy to be robust.
  • Friction range: Typical values are floor friction ∈ [0.4, 1.5] and lateral friction ∈ [0.1, 1.0].
  • Mass scale: ±20% mass per link, ±5 kg payload on torso.
  • Motor PD randomization: Kp ∈ [80, 120], Kd ∈ [1, 3].
  • Push perturbations: Random external forces every N steps to simulate kicks or wind.
  • Action delay: Insert a 1–4 frame delay between policy and actuators (motor latency).
  • Observation noise: Gaussian noise on IMU and joint sensors at deployment-realistic levels.
  • Reality gap: Measured quantitatively as success rate in sim − success rate in real.
  • Robust PPO: PPO + DR is sufficient for many tasks; the "next level" (Day 27) is teacher-student.
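
The glossary entries above can be sketched in a few lines of plain Python. This is a minimal illustration, not Playground's implementation: the names `EpisodeParams`, `sample_episode_params`, `ActionDelay`, and `reality_gap` are hypothetical, and uniform sampling is an assumption, since the glossary gives ranges but not the shape of the distributions.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    """Hypothetical container for one episode's randomized physics."""
    floor_friction: float
    lateral_friction: float
    link_mass_scale: float     # multiplier on a link's nominal mass
    torso_payload_kg: float
    kp: float
    kd: float
    action_delay_frames: int


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one parameter set; intended to be called at every episode reset.

    Uniform sampling is an assumption. Only one mass scale is drawn here;
    a real setup would draw one per link.
    """
    return EpisodeParams(
        floor_friction=rng.uniform(0.4, 1.5),
        lateral_friction=rng.uniform(0.1, 1.0),
        link_mass_scale=rng.uniform(0.8, 1.2),     # ±20% mass
        torso_payload_kg=rng.uniform(-5.0, 5.0),   # ±5 kg payload
        kp=rng.uniform(80.0, 120.0),
        kd=rng.uniform(1.0, 3.0),
        action_delay_frames=rng.randint(1, 4),     # inclusive on both ends
    )


class ActionDelay:
    """Returns the action issued `delay` frames earlier (motor latency)."""

    def __init__(self, delay: int, initial_action):
        # Pre-fill so the first `delay` steps replay the initial action.
        self._queue = deque([initial_action] * delay)

    def step(self, action):
        self._queue.append(action)
        return self._queue.popleft()


def reality_gap(sim_successes: int, sim_trials: int,
                real_successes: int, real_trials: int) -> float:
    """Success rate in sim minus success rate in real."""
    return sim_successes / sim_trials - real_successes / real_trials
```

In a training loop, `sample_episode_params` would run once per reset and its output would be written into the simulator before the first step, while `ActionDelay.step` would sit between the policy output and the actuator command.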

Real-world analogy

A pianist who only practiced on one specific piano will struggle on a different one. A pianist who's practiced on 100 different pianos (each with slight differences in tuning, action, and key weight) plays well on a new piano they've never seen. Domain randomization is the practice routine.

Hour 1 — Reading

  • Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (Peng et al.), abstract + Section 3 (~20 min): https://arxiv.org/abs/1710.06537
  • Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (Tobin et al.), the original DR paper (~15 min): https://arxiv.org/abs/1703.06907
  • Sim-to-real Survey 2024, skim Section 3 (~25 min): https://arxiv.org/abs/2503.00917

Hour 2 — Inspect Playground's DR config

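import inspect

from mujoco_playground import registry

# Look at randomize_fn for Go1. Assumption: this Playground version exposes
# registry.get_domain_randomizer and this env name; adjust if yours differs.
randomize_fn = registry.get_domain_randomizer("Go1JoystickFlatTerrain")
print(inspect.getsource(randomize_fn))

Read the printed DR code. Note which parameters are randomized and over what ranges.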

LAB

Hour 3 — Lab: train Go1 with vs without DR, evaluate on perturbed sim (75 min)

What you're building. Two Go1 policies:

  • A: trained without DR (Day 24's policy).
  • B: trained with full DR enabled.

Then evaluate both on a test sim with a 10× heavier payload, 0.3 friction (slippery), and a 50 ms action delay. The DR-trained policy should survive; the naive one should fall.
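
The perturbed test settings can be written down as a small config, which also makes it easy to check how they compare with the training ranges from the glossary. The sketch below is an illustration under stated assumptions: `PerturbedEval`, `delay_in_frames`, and `in_range` are hypothetical names, and the 50 Hz control rate (`control_dt_s=0.02`) is an assumption, not something this page specifies.

```python
import math
from dataclasses import dataclass

# "Typical" training ranges quoted in the glossary primer.
TRAIN_FLOOR_FRICTION = (0.4, 1.5)
TRAIN_DELAY_FRAMES = (1, 4)


@dataclass(frozen=True)
class PerturbedEval:
    """Hypothetical container for the perturbed test-sim settings."""
    payload_scale: float = 10.0     # payload is 10x heavier than nominal
    friction: float = 0.3           # slippery floor
    action_delay_ms: float = 50.0


def delay_in_frames(delay_ms: float, control_dt_s: float) -> int:
    """Convert a delay in ms to whole control frames, rounding up."""
    frames = delay_ms / (control_dt_s * 1000.0)
    # round() first so float error cannot push an exact value past ceil().
    return math.ceil(round(frames, 6))


def in_range(value: float, bounds: tuple) -> bool:
    """True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


cfg = PerturbedEval()
frames = delay_in_frames(cfg.action_delay_ms, control_dt_s=0.02)  # assumed 50 Hz
print("delay frames:", frames, "in training range:", in_range(frames, TRAIN_DELAY_FRAMES))
print("friction in training range:", in_range(cfg.friction, TRAIN_FLOOR_FRICTION))
```

Under the assumed 50 Hz control rate, 50 ms rounds up to 3 frames, which sits inside the 1–4 frame training range, while a friction of 0.3 falls below the 0.4 lower bound of the typical floor-friction range. If those ranges are the ones actually used in training, the friction setting tests extrapolation rather than interpolation.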

What success looks like at the end. You have:

  1. Two trained checkpoints, policy_no_dr.pkl and policy_dr.pkl.
  2. Eval results: the no-DR policy fails (return < 5) on the perturbed sim; the DR policy survives (return ≥ 15).
  3. Video videos/day25_dr_comparison.mp4 showing both policies side-by-side on the perturbed sim.
  4. Bar plot figures/day25_dr_robustness.png with 4 bars: (no-DR, default) vs (no-DR, perturbed) vs (DR, default) vs (DR, perturbed).
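
One way to organize the eval results for items 2 and 4 is sketched below. It is a hedged illustration, not the lab's committed code: `summarize`, `meets_success_criteria`, and `plot_bars` are hypothetical helpers, and applying the thresholds to the mean return over eval episodes is an assumption, since the page does not say how returns are aggregated. The plotting helper needs matplotlib installed; the other two functions use only the standard library.

```python
import os
from statistics import mean

# Bar order from the success criteria: (policy, condition).
BAR_ORDER = [
    ("no-DR", "default"),
    ("no-DR", "perturbed"),
    ("DR", "default"),
    ("DR", "perturbed"),
]


def summarize(returns: dict) -> dict:
    """Map each (policy, condition) key to its mean episode return."""
    return {key: mean(returns[key]) for key in BAR_ORDER}


def meets_success_criteria(summary: dict) -> bool:
    """No-DR fails (< 5) and DR survives (>= 15) on the perturbed sim."""
    return (summary[("no-DR", "perturbed")] < 5
            and summary[("DR", "perturbed")] >= 15)


def plot_bars(summary: dict, path: str = "figures/day25_dr_robustness.png") -> None:
    """Save the four-bar plot. Imports matplotlib lazily."""
    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt

    labels = [f"{policy}\n{condition}" for policy, condition in BAR_ORDER]
    fig, ax = plt.subplots()
    ax.bar(labels, [summary[key] for key in BAR_ORDER])
    ax.set_ylabel("Mean episode return")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fig.savefig(path)
    plt.close(fig)
```

Here `returns` is a dict that maps each (policy, condition) pair to the list of episode returns you collected for it; pass the output of `summarize` to `meets_success_criteria` and `plot_bars`.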

Step 1 — Train Go1 with DR (60 min wall-clock)

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.

Completion controls unlock when this day graduates from placeholder to full lab.