Day 27

RMA: Rapid Motor Adaptation (teacher-student sim-to-real)

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

  • RMA (Rapid Motor Adaptation) — Berkeley 2021. Two-stage training: (1) train teacher policy with privileged info (true mass, friction, etc.); (2) train student policy with only proprioception, supervised to match teacher.
  • Privileged information — Variables available in simulation but not on the real robot (true contact forces, mass, ground angles). The teacher uses these.
  • Adaptation module — Student's small temporal network (a 1-D convolutional network in the original RMA paper) that estimates, from recent history, a latent vector summarizing the privileged info.
  • Asymmetric actor-critic — The critic uses privileged info during training; the actor doesn't. A single-stage relative of teacher-student training.
  • History buffer — Student observes the last N proprioceptive states and actions (N ≈ 50 frames). Used by the adaptation module.
  • Latent regression loss — Student trained via L2 between its adaptation module's output and teacher's privileged latent.
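
To make the last three glossary entries concrete, here is a minimal NumPy sketch of a history buffer feeding an adaptation module, scored with a latent regression loss. Everything in it is an assumption for illustration: the toy dimensions, the HistoryBuffer class, and the single linear layer standing in for the real adaptation network. The teacher latent is random placeholder data, not output from a trained teacher.

```python
from collections import deque

import numpy as np

HISTORY_LEN = 50  # N, roughly 50 frames as in the glossary
PROPRIO_DIM = 4   # toy size (assumption)
LATENT_DIM = 8    # toy size (assumption)


class HistoryBuffer:
    """Fixed-length FIFO of the most recent proprioceptive observations."""

    def __init__(self, length, obs_dim):
        # Start zero-padded so the buffer always holds `length` frames.
        self.frames = deque([np.zeros(obs_dim)] * length, maxlen=length)

    def push(self, obs):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(np.asarray(obs, dtype=float))

    def flat(self):
        # Oldest frame first, concatenated into one vector.
        return np.concatenate(list(self.frames))


def adaptation_module(history_flat, weights):
    """Stand-in for the student's adaptation network: one linear layer + tanh."""
    return np.tanh(weights @ history_flat)


def latent_regression_loss(z_hat, z_teacher):
    """Mean squared (L2) error between student estimate and teacher latent."""
    return float(np.mean((z_hat - z_teacher) ** 2))


rng = np.random.default_rng(0)
buffer = HistoryBuffer(HISTORY_LEN, PROPRIO_DIM)
for _ in range(60):  # more pushes than HISTORY_LEN, so old frames fall off
    buffer.push(rng.normal(size=PROPRIO_DIM))

weights = rng.normal(scale=0.1, size=(LATENT_DIM, HISTORY_LEN * PROPRIO_DIM))
z_hat = adaptation_module(buffer.flat(), weights)
z_teacher = rng.normal(size=LATENT_DIM)  # placeholder for the teacher's latent
loss = latent_regression_loss(z_hat, z_teacher)
print(f"latent regression loss: {loss:.3f}")
```

In a real run, z_teacher would come from the frozen teacher's encoder on the same timestep, and the loss would be backpropagated through the adaptation module only.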

Real-world analogy

Teacher = expert with X-ray vision (knows ground friction, motor wear). Student = expert without X-ray vision who learns from watching the teacher and inferring "by the way the foot slipped, the floor must be slippery." At deploy time, only the student exists, but it has learned to infer what the teacher knew directly.

Hour 1 — Reading

Hour 2 — Inspect MuJoCo Playground's RMA-style training

Playground's quadruped envs ship with optional asymmetric actor-critic via the privileged_obs field. Read mujoco_playground/locomotion/go1/joystick.py — find _get_privileged_obs and trace where it's used.
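
Before reading the real file, it may help to see the asymmetric pattern in isolation. The sketch below is not Playground's code: get_obs, actor, critic, the dict key names, and the toy dimensions are all assumptions chosen for illustration. It shows one observation dict carrying two views of the same timestep, with the actor reading only the deployable view.

```python
import numpy as np

PROPRIO_DIM, PRIV_DIM, ACTION_DIM = 6, 2, 3  # toy sizes (assumptions)


def get_obs(proprio, privileged):
    """Hypothetical env helper: one dict, two views of the same timestep."""
    return {
        # What the real robot can sense.
        "state": proprio,
        # Sim-only extras appended to the proprioceptive vector.
        "privileged_state": np.concatenate([proprio, privileged]),
    }


def actor(obs, w_actor):
    # The actor only reads the deployable key.
    return np.tanh(w_actor @ obs["state"])


def critic(obs, w_critic):
    # The critic may read everything; it is discarded at deploy time.
    return float(w_critic @ obs["privileged_state"])


rng = np.random.default_rng(0)
w_actor = rng.normal(size=(ACTION_DIM, PROPRIO_DIM))
w_critic = rng.normal(size=PROPRIO_DIM + PRIV_DIM)
proprio = rng.normal(size=PROPRIO_DIM)

# Same proprioception, different hidden friction (first privileged entry).
obs_slippery = get_obs(proprio, np.array([0.3, 1.2]))
obs_grippy = get_obs(proprio, np.array([0.9, 1.2]))

same_action = np.allclose(actor(obs_slippery, w_actor), actor(obs_grippy, w_actor))
value_gap = critic(obs_slippery, w_critic) - critic(obs_grippy, w_critic)
print("actor ignores privileged info:", same_action)
print("critic value gap:", value_gap)
```

When you trace the real environment, the thing to look for is the same split: which observation entry the policy network consumes and which one the value network consumes.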

LAB

Hour 3 — Lab: implement RMA's two-stage training on Go1 (90 min)

What you're building. Train a Go1 teacher policy with privileged observations, then distill it into a student that sees proprioception only. Compare the student's perturbed-sim performance against Day 25's plain-DR policy.
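
The distillation stage can be sketched on synthetic data before touching the simulator. This is a toy under stated assumptions, not the lab solution: the "teacher" is a frozen random linear encoder, the "history" is a noisy linear function of the privileged factors, and the student is a single linear map trained by plain gradient descent on the L2 loss. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, PRIV_DIM, HIST_DIM, LATENT_DIM = 512, 3, 20, 4  # toy sizes (assumptions)

# Synthetic stand-ins: privileged factors e, and a history h that depends on e.
e = rng.normal(size=(N, PRIV_DIM))
mixing = rng.normal(size=(PRIV_DIM, HIST_DIM))
h = e @ mixing + 0.05 * rng.normal(size=(N, HIST_DIM))

# Frozen "teacher" encoder: maps privileged factors to a latent.
w_teacher = rng.normal(size=(PRIV_DIM, LATENT_DIM))
z_teacher = e @ w_teacher

# Student: linear map from history to latent, trained by gradient descent.
w_student = np.zeros((HIST_DIM, LATENT_DIM))


def mse(w):
    """Mean squared error between the student's latent and the teacher's."""
    return float(np.mean((h @ w - z_teacher) ** 2))


initial_loss = mse(w_student)
learning_rate = 0.01
for _ in range(500):
    residual = h @ w_student - z_teacher          # shape (N, LATENT_DIM)
    grad = 2.0 * h.T @ residual / residual.size   # gradient of mse w.r.t. w_student
    w_student -= learning_rate * grad
final_loss = mse(w_student)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The student never sees e directly. It recovers the teacher's latent only because the history carries information about e, which is the same bet RMA makes about real proprioceptive history.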

What success looks like at the end. You have:

  1. policy_teacher.pkl (uses privileged obs, return ≈ 35).
  2. policy_student.pkl (proprioception only, return ≈ 28).
  3. Student outperforms Day 25's plain-DR policy on perturbed sim by ≥ 30%.
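
One way to read the 30% criterion is as a relative improvement in mean episode return over the baseline. The helper below is a sketch of that reading; the function names and the per-episode returns are hypothetical and are not measured results. It assumes a positive baseline return, since a relative percentage is ambiguous otherwise.

```python
from statistics import mean


def relative_improvement(student_returns, baseline_returns):
    """Fractional gain of the student's mean return over the baseline's."""
    baseline = mean(baseline_returns)
    if baseline <= 0:
        raise ValueError("relative improvement needs a positive baseline return")
    return (mean(student_returns) - baseline) / baseline


def meets_target(student_returns, baseline_returns, threshold=0.30):
    """True when the student beats the baseline by at least `threshold`."""
    return relative_improvement(student_returns, baseline_returns) >= threshold


# Hypothetical per-episode returns on perturbed sim, for illustration only.
student = [28.0, 26.0, 27.0]
plain_dr = [20.0, 19.0, 21.0]

gain = relative_improvement(student, plain_dr)
print(f"relative improvement: {gain:.0%}, target met: {meets_target(student, plain_dr)}")
```

Average over enough perturbed episodes that the comparison is not decided by one lucky rollout.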

Step 1 — Train teacher (45 min)

Modify train_go1.py so the policy reads privileged observations by setting policy_obs_key="privileged_state". Train as usual. Expected return ≈ 35–45 (higher than the non-privileged policy because the teacher has more information).

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
