RMA: Rapid Motor Adaptation (teacher-student sim-to-real)

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

RMA (Rapid Motor Adaptation): Berkeley 2021. Two-stage training: (1) train a teacher with privileged info (true mass, friction, etc.); (2) train a student with only proprioceptive observations, supervised to match the teacher.
Privileged information: variables available in sim but not in the real world (true forces, mass, ground angles). The teacher uses these.
Adaptation module: the student's small temporal network (a 1D CNN over the history in the RMA paper) that estimates a latent vector summarizing the privileged info from history.
Asymmetric actor-critic: the critic uses privileged info during training; the actor doesn't. A variant of teacher-student.
History buffer: the student observes the last N proprioceptive states (~50 frames), which feed the adaptation module.
Latent regression loss: the student is trained with an L2 (mean squared error) loss between its adaptation module's output and the teacher's privileged latent.
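A minimal sketch of the two student-side pieces in the glossary, the history buffer and the latent regression loss, in plain Python. The class and function names are hypothetical, zero-padding the buffer at episode start is an assumption, and a real adaptation module would work on arrays (e.g. in JAX) rather than lists. The teacher's latent is treated as a fixed target: gradients flow only into the student.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryBuffer:
    """Hypothetical fixed-length buffer of the last N proprioceptive frames."""

    def __init__(self, length: int, frame_dim: int) -> None:
        self.frame_dim = frame_dim
        # Assumption: pre-fill with zeros so the student gets a full-size input from step 0.
        self.frames: Deque[List[float]] = deque(
            [[0.0] * frame_dim for _ in range(length)], maxlen=length
        )

    def push(self, frame: Sequence[float]) -> None:
        if len(frame) != self.frame_dim:
            raise ValueError("frame has the wrong dimension")
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames.append(list(frame))

    def flat(self) -> List[float]:
        # Oldest-to-newest frames concatenated into one input vector.
        return [x for frame in self.frames for x in frame]


def latent_regression_loss(z_student: Sequence[float], z_teacher: Sequence[float]) -> float:
    """Mean squared (L2) error between the student's latent estimate and the teacher's latent."""
    if len(z_student) != len(z_teacher):
        raise ValueError("latent sizes differ")
    return sum((s - t) ** 2 for s, t in zip(z_student, z_teacher)) / len(z_teacher)


buf = HistoryBuffer(length=3, frame_dim=2)
buf.push([1.0, 2.0])
print(buf.flat())  # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
print(latent_regression_loss([1.0, 2.0], [0.0, 0.0]))  # 2.5
```

With N = 50 and the real proprioceptive frame size, `flat()` is what the adaptation module would consume at every control step.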

Real-world analogy

Teacher = expert with X-ray vision (knows ground friction, motor wear). Student = expert without X-ray vision who learns by watching the teacher and inferring "by the way the foot slipped, the floor must be slippery." At deploy time, only the student exists, but it has learned to infer what the teacher knew directly.

Hour 1 — Reading

RMA paper, full sections 1–4 (~30 min): https://ashish-kmr.github.io/rma-legged-robots/
Walk These Ways extends RMA — abstract: https://gmargo11.github.io/walk-these-ways/
Extreme Parkour (CMU 2023), a further variant: https://extreme-parkour.github.io/

Hour 2 — Inspect MuJoCo Playground's RMA-style training

Playground's quadruped envs ship with optional asymmetric actor-critic via the privileged_obs field. Read mujoco_playground/locomotion/go1/joystick.py — find _get_privileged_obs and trace where it's used.
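A minimal sketch of what the asymmetric split amounts to, assuming the environment returns a dict of observation vectors keyed "state" (proprioception) and "privileged_state" (proprioception plus sim-only parameters). The "state" key name, the `make_obs` helper, and the friction/payload fields are illustrative assumptions; check the actual keys and contents in joystick.py.

```python
from typing import Dict, List, Sequence

Obs = Dict[str, List[float]]


def make_obs(proprio: Sequence[float], friction: float, payload_mass: float) -> Obs:
    """Hypothetical observation dict: privileged_state = state + sim-only parameters."""
    state = list(proprio)
    return {"state": state, "privileged_state": state + [friction, payload_mass]}


def route_observations(
    obs: Obs,
    policy_obs_key: str = "state",
    value_obs_key: str = "privileged_state",
) -> Dict[str, List[float]]:
    """Select the vector each network sees.

    With the defaults this is asymmetric actor-critic: the actor gets only
    proprioception, while the critic also gets the privileged parameters.
    """
    return {"actor": obs[policy_obs_key], "critic": obs[value_obs_key]}


obs = make_obs([0.1, -0.2, 0.3], friction=0.4, payload_mass=1.5)
asymmetric = route_observations(obs)
teacher = route_observations(obs, policy_obs_key="privileged_state")
print(len(asymmetric["actor"]), len(asymmetric["critic"]))  # 3 5
print(len(teacher["actor"]), len(teacher["critic"]))  # 5 5
```

Switching the actor's key to the privileged vector is what turns an asymmetric actor-critic into an RMA-style teacher; the student later has to recover the extra entries from history alone.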

LAB

Hour 3 — Lab: implement RMA's two-stage training on Go1 (90 min)

What you're building. Train a Go1 teacher with privileged observations, then distill it into a student that sees only proprioceptive observations. Compare the student's perturbed-sim performance against Day 25's plain-DR (domain randomization) policy.
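The distillation half of the lab can be previewed with a toy, pure-Python stand-in. Everything here is hypothetical: friction is the only privileged parameter, the frozen teacher encoder is the linear map 2·friction − 1, the "proprioception" is a history of (commanded foot speed, measured slip) pairs from a made-up slip model, and the adaptation module is a linear fit on the average slip ratio rather than the neural network RMA uses. What it does mirror is the stage-2 recipe: the student sees only history and is trained by regressing onto the frozen teacher's latent.

```python
import random
from typing import List, Tuple


def teacher_latent(friction: float) -> float:
    """Hypothetical frozen teacher encoder: privileged friction -> 1-D latent."""
    return 2.0 * friction - 1.0


def rollout_history(friction: float, rng: random.Random, n: int = 50) -> List[Tuple[float, float]]:
    """Toy proprioceptive history of (commanded foot speed, measured slip) pairs."""
    history = []
    for _ in range(n):
        cmd = rng.uniform(0.5, 1.0)
        slip = (1.0 - friction) * cmd + rng.gauss(0.0, 0.02)  # lower friction -> more slip
        history.append((cmd, slip))
    return history


def feature(history: List[Tuple[float, float]]) -> float:
    """Average slip ratio over the history window."""
    return sum(slip / cmd for cmd, slip in history) / len(history)


def distill(steps: int = 2000, lr: float = 0.1, seed: int = 0) -> Tuple[float, float, float]:
    """Stage 2: fit a linear adaptation module z_hat = w * x + b to the teacher's latent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        mu = rng.uniform(0.2, 1.0)  # sampled sim friction, hidden from the student
        x = feature(rollout_history(mu, rng))
        z = teacher_latent(mu)  # regression target; the teacher is never updated
        err = (w * x + b) - z
        # SGD step on 0.5 * err**2 (the latent regression loss for one sample).
        w -= lr * err * x
        b -= lr * err
    # Mean squared latent error on fresh, held-out episodes.
    eval_rng = random.Random(seed + 1)
    total = 0.0
    for _ in range(200):
        mu = eval_rng.uniform(0.2, 1.0)
        x = feature(rollout_history(mu, eval_rng))
        total += ((w * x + b) - teacher_latent(mu)) ** 2
    return w, b, total / 200


w, b, mse = distill()
print(f"w={w:.2f} b={b:.2f} held-out latent MSE={mse:.5f}")
```

In the toy model the ideal mapping is z = 1 − 2x, so w should land near −2 and b near 1. In the real lab the histories come from rolling out the policy on the student's own latent estimate (as the RMA paper describes), not from a hand-written slip model.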

What success looks like at the end. You have:
1. policy_teacher.pkl (uses privileged obs, reward ≈ 35).
2. policy_student.pkl (proprioception only, reward ≈ 28).
3. The student outperforms Day 25's plain-DR policy on perturbed sim by ≥ 30%.
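A small helper for checking criterion 3, assuming "by ≥ 30%" means a relative improvement in mean episode return, with both policies evaluated on the same perturbed-sim episodes. The function names are hypothetical, and the returns in the example are placeholders, not results.

```python
from statistics import mean
from typing import Sequence


def relative_improvement(student_returns: Sequence[float], baseline_returns: Sequence[float]) -> float:
    """(mean student return - mean baseline return) / mean baseline return."""
    student, baseline = mean(student_returns), mean(baseline_returns)
    if baseline <= 0:
        raise ValueError("a relative comparison needs a positive baseline mean")
    return (student - baseline) / baseline


def meets_target(
    student_returns: Sequence[float],
    baseline_returns: Sequence[float],
    threshold: float = 0.30,
) -> bool:
    return relative_improvement(student_returns, baseline_returns) >= threshold


# Placeholder numbers only: a 28 vs 20 mean return is a 40% improvement.
print(meets_target([27.0, 29.0], [19.0, 21.0]))  # True
```

Averaging over several seeds and perturbation draws before comparing keeps one lucky episode from deciding the result.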

Step 1 — Train teacher (45 min)

Modify train_go1.py so the policy reads the privileged observations: set policy_obs_key="privileged_state". Train as usual. Expected reward ≈ 35–45 (higher than a non-privileged policy because the teacher has more info).
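One way Step 1 might look with Brax PPO, assuming train_go1.py builds its networks through Brax's `make_ppo_networks` and that your installed Brax version accepts `policy_obs_key` and `value_obs_key` there (check the signature in your environment). Treat this as a config fragment to compare against the script, not a drop-in patch.

```python
import functools

from brax.training.agents.ppo import networks as ppo_networks

# Teacher network factory (sketch): the actor reads the privileged vector,
# and the critic reads it too. Keep any other arguments (hidden sizes, etc.)
# that train_go1.py already passes.
teacher_network_factory = functools.partial(
    ppo_networks.make_ppo_networks,
    policy_obs_key="privileged_state",
    value_obs_key="privileged_state",
)

# Then hand it to the trainer, e.g. ppo.train(..., network_factory=teacher_network_factory).
```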

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
