Domain randomization for sim-to-real

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

Domain randomization (DR) — At each reset, sample physical parameters from distributions: mass, friction, motor strength, latency. Forces the policy to be robust.
Friction range — Typical: floor ∈ [0.4, 1.5], lateral ∈ [0.1, 1.0].
Mass scale — ±20% mass per , ±5kg payload on torso.
Motor PD — Kp ∈ [80, 120], Kd ∈ [1, 3].
Push perturbations — Random external forces every N steps to simulate kicks/wind.
Action delay — Insert a 1–4 frame delay between the policy and the actuators (motor latency).
Sensor noise — Gaussian noise on IMU and sensors at deployment-realistic levels.
Sim-to-real gap — Quantitative measure: performance in sim − performance in real.
Robust PPO — PPO + DR is sufficient for many tasks; the "next level" (Day 27) is teacher-student.
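
The ranges in the glossary above can be collected into a single per-reset sampler. The following is a minimal sketch using only the Python standard library. The names DRParams and sample_dr_params are hypothetical, uniform sampling is an assumption (the glossary gives ranges, not distributions), and treating the floor and lateral ranges as friction coefficients and ±5 kg as a uniform payload offset are assumptions too. It is not Playground's randomization function.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DRParams:
    """One draw of randomized physics parameters (hypothetical container)."""
    floor_friction: float
    lateral_friction: float
    mass_scale: float         # multiplier on the nominal mass
    torso_payload_kg: float   # mass added to (or removed from) the torso
    kp: float
    kd: float
    action_delay_frames: int


def sample_dr_params(rng: random.Random) -> DRParams:
    """Sample one parameter set; intended to be called once per reset."""
    return DRParams(
        floor_friction=rng.uniform(0.4, 1.5),
        lateral_friction=rng.uniform(0.1, 1.0),
        mass_scale=rng.uniform(0.8, 1.2),          # +/-20% mass
        torso_payload_kg=rng.uniform(-5.0, 5.0),   # +/-5 kg payload
        kp=rng.uniform(80.0, 120.0),
        kd=rng.uniform(1.0, 3.0),
        action_delay_frames=rng.randint(1, 4),     # inclusive on both ends
    )


rng = random.Random(0)  # seeded so runs are reproducible
for episode in range(3):
    print(episode, sample_dr_params(rng))
```

Resampling at every reset, rather than once per training run, is what exposes the policy to many different "robots" over the course of training.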

Real-world analogy

A pianist who has only practiced on one specific piano will struggle on a different one. A pianist who has practiced on 100 different pianos (each with slight tuning, action, and weight differences) plays well on a new piano they have never seen. Domain randomization is the practice routine.

Hour 1 — Reading

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization (Peng et al.) — abstract + Section 3 (~20 min): https://arxiv.org/abs/1710.06537
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (Tobin et al.) — original DR paper (~15 min): https://arxiv.org/abs/1703.06907
Survey 2024 — skim Section 3 (~25 min): https://arxiv.org/abs/2503.00917

Hour 2 — Inspect Playground's DR config

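import inspect
from mujoco_playground.locomotion.go1 import joystick

# Look at the DR config that randomize_fn consumes.
# inspect.getsource() takes a module, class, or function, not an instance,
# so fall back to the config's class when GO1_DR_CONFIG is an instance.
dr_config = joystick.GO1_DR_CONFIG
print(inspect.getsource(dr_config if inspect.isclass(dr_config) else type(dr_config)))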

Read the DR config dataclass. Note which parameters are randomized and over what ranges.
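
When reading the config, it can help to print every randomized field next to its range. The sketch below does this for a stand-in dataclass. ExampleDRConfig and its field names are hypothetical and only mirror the glossary ranges, so swap in the real config object once you have located it. It also assumes each randomized field holds a (low, high) tuple, which may not match Playground's actual layout.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExampleDRConfig:
    """Hypothetical stand-in for illustration, not Playground's config."""
    floor_friction: tuple = (0.4, 1.5)
    kp: tuple = (80.0, 120.0)
    kd: tuple = (1.0, 3.0)
    push_enabled: bool = True  # not a range, so the summary skips it


def summarize_ranges(config) -> dict:
    """Return {field_name: (low, high)} for every 2-tuple field of a dataclass."""
    ranges = {}
    for f in fields(config):
        value = getattr(config, f.name)
        if isinstance(value, tuple) and len(value) == 2:
            ranges[f.name] = value
    return ranges


for name, (low, high) in summarize_ranges(ExampleDRConfig()).items():
    print(f"{name}: [{low}, {high}]")
```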

LAB

Hour 3 — Lab: train Go1 with vs without DR, evaluate on perturbed sim (75 min)

What you're building. Two Go1 policies:
A: trained without DR (Day 24's policy).
B: trained with full DR enabled.

Then evaluate both on a test sim with a 10× heavier payload, friction 0.3 (slippery), and a 50 ms delay. The DR-trained policy should survive; the naive one should fall.
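
One way to realize the 50 ms delay in the test sim is a fixed-length action queue. The sketch below is a minimal illustration: ActionDelayBuffer and delay_ms_to_frames are hypothetical names, and the 20 ms control step (50 Hz) is an assumed value used only to convert milliseconds into frames, so check the environment's actual control rate. Under that assumption, 50 ms rounds up to 3 frames, which sits inside the 1–4 frame range from the glossary.

```python
import math
from collections import deque


def delay_ms_to_frames(delay_ms: float, control_dt_ms: float) -> int:
    """Convert a delay in milliseconds to whole control frames, rounding up."""
    return math.ceil(delay_ms / control_dt_ms)


class ActionDelayBuffer:
    """FIFO queue that returns the action issued `delay_frames` steps earlier."""

    def __init__(self, delay_frames: int, default_action):
        # Pre-fill so the first `delay_frames` steps apply the default action.
        self._queue = deque([default_action] * delay_frames)

    def step(self, action):
        """Queue the newest action and return the one the actuators see now."""
        self._queue.append(action)
        return self._queue.popleft()


frames = delay_ms_to_frames(50, control_dt_ms=20)  # 20 ms step is an assumption
buffer = ActionDelayBuffer(frames, default_action=0.0)
applied = [buffer.step(a) for a in [1.0, 2.0, 3.0, 4.0, 5.0]]
print(frames, applied)  # 3 [0.0, 0.0, 0.0, 1.0, 2.0]
```

With a delay of 0 frames the buffer passes each action straight through, so the same wrapper can serve both the default and the perturbed evaluation.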

What success looks like at the end. You have:
1. Two trained checkpoints: policy_no_dr.pkl and policy_dr.pkl.
2. Eval results: no-DR fails (< 5) on the perturbed sim; DR survives (≥ 15).
3. Video videos/day25_dr_comparison.mp4 showing both policies side by side on the perturbed sim.
4. Bar plot figures/day25_dr_robustness.png with 4 bars: (no-DR, default), (no-DR, perturbed), (DR, default), (DR, perturbed).
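
The thresholds in item 2 can be checked mechanically once the four evaluation runs exist. The sketch below assumes each run yields a list of per-episode scores in whatever metric the lab uses (this page does not name it). The function names and the example numbers are placeholders for illustration, not measured results.

```python
from statistics import mean

FAIL_BELOW = 5.0         # no-DR should score below this on the perturbed sim
SURVIVE_AT_LEAST = 15.0  # DR should reach at least this on the perturbed sim


def summarize(scores: dict) -> dict:
    """Average per-episode scores per (policy, sim) pair: the four bar heights."""
    return {key: mean(values) for key, values in scores.items()}


def check_success(bars: dict) -> bool:
    """True when no-DR fails and DR survives on the perturbed sim."""
    return (bars[("no-DR", "perturbed")] < FAIL_BELOW
            and bars[("DR", "perturbed")] >= SURVIVE_AT_LEAST)


# Placeholder numbers for illustration only; replace with your eval output.
example_scores = {
    ("no-DR", "default"): [20.0, 22.0],
    ("no-DR", "perturbed"): [2.0, 4.0],
    ("DR", "default"): [19.0, 21.0],
    ("DR", "perturbed"): [16.0, 18.0],
}
bars = summarize(example_scores)
print(bars, check_success(bars))
```

The keys of bars map directly onto the four bars of the robustness plot, so the same dictionary can feed whichever plotting library you use.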

Step 1 — Train Go1 with DR (60 min wall-clock)

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
