Week 4: Reinforcement Learning
PPO, MuJoCo Playground, quadruped locomotion, domain randomization, Isaac Lab, and RMA.
PPO foundations + Abbeel primer + cart-pole from scratch
Glossary primer (15 min) Reinforcement learning (RL) — Learning a policy π(a|s) that maximizes expected return E[Σγᵗ rₜ] from interaction wi...
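As a small illustration of the return in the primer above, the discounted sum Σγᵗ rₜ for a single recorded episode could be computed as in the sketch below. The function name `discounted_return` and the reward values are assumptions made up for this example; they are not taken from the course code.

```python
def discounted_return(rewards, gamma=0.99):
    """Return the sum over t of gamma**t * rewards[t] for one episode."""
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: a reward of 1.0 on each of three steps.
episode_rewards = [1.0, 1.0, 1.0]
g = discounted_return(episode_rewards, gamma=0.5)
print(g)  # 1 + 0.5 + 0.25 = 1.75
```

The expected return in the definition is the average of this quantity over many episodes sampled by following the policy.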
MuJoCo Playground: massively-parallel PPO on a quadruped
Glossary primer (12 min) MuJoCo Playground — Google DeepMind's 2025 GPU-accelerated MuJoCo wrapper. Runs 4096 parallel envs on a single H100...
Quadruped locomotion in depth: Go1 sim-to-real basics
Glossary primer (12 min) Go1 / Go2 — Unitree's $3K to $5K quadrupeds. Main consumer research platform. Action space (joints) — 12-dim: 4 legs ×...
Domain randomization for sim-to-real
Glossary primer (10 min) Domain randomization (DR) — At each episode reset, sample physical parameters from distributions: mass, friction, m...
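The per-reset sampling described in the primer above might look roughly like the following. This is a minimal sketch using only the Python standard library: the parameter names, the ranges, and the `reset_episode` stand-in are all hypothetical and chosen for illustration, not drawn from the course material or any simulator API.

```python
import random

# Hypothetical ranges, chosen only for illustration (not from the course).
PARAM_RANGES = {
    "mass_scale": (0.8, 1.2),  # multiplier on the nominal body mass
    "friction": (0.5, 1.25),   # ground friction coefficient
}


def sample_physics(rng):
    """Draw one set of physical parameters, uniformly within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def reset_episode(rng):
    """Stand-in for an environment reset: resample the physics every episode."""
    params = sample_physics(rng)
    # A real simulator would apply `params` to its model at this point.
    return params


rng = random.Random(0)  # fixed seed so the draws are reproducible
episodes = [reset_episode(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

Because each episode sees different physics, the policy cannot overfit to one exact simulator configuration, which is the point of randomizing at reset time.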
Isaac Lab: a peek at the production-grade alternative
Glossary primer (10 min) Isaac Lab — NVIDIA's GPU-accelerated robot simulation framework, built on Isaac Sim (Omniverse). Successor to Isaac...
RMA: Rapid Motor Adaptation (teacher-student sim-to-real)
Glossary primer (12 min) RMA (Rapid Motor Adaptation) — Berkeley 2021. Two-stage training: (1) train teacher policy with privileged info (tr...
Week 4 integration + fresh-clone
Glossary primer (5 min) No new terms. Reflection + reproducibility day. Hour 1 — Capstone Track C pre-design (40 min) Write docs/day28 track...
What you will know by the end of Week 4
- Read the week's source papers without drowning in undefined terms.
- Run the week's core software stack from a fresh clone.
- Explain the week's systems in terms of data, control, and learning loops.