Week 3: Imitation Learning, Day 17
Diffusion Policy on PushT
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (10 min)
- Diffusion policy — Columbia/Stanford 2023. Uses a denoising-diffusion model to generate actions conditioned on observations.
- DDPM / DDIM — Diffusion training (DDPM) and fast sampling (DDIM, 10–50 steps instead of 1000).
- Score function — The gradient of log-density, what the diffusion model approximates.
- PushT — A 2D pushing benchmark: the agent must push a T-shape into a goal region. Standard for diffusion-policy comparisons.
- EMA (Exponential Moving Average) — Running average of model weights. Diffusion policies critically need this; use the EMA weights at inference, not the training-step weights.
- Horizon T_p — How many timesteps the policy predicts. The Diffusion Policy paper predicts 16 and executes only the first 8.
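The EMA entry above can be sketched in a few lines. This is a minimal illustration, not the training code of any particular library: the weights are a plain dictionary with one scalar, and `ema_update`, the decay value, and the jittering "training" loop are all assumptions made up for the example.

```python
def ema_update(ema_weights, train_weights, decay=0.999):
    """Blend the current training weights into the running average, in place."""
    for name, value in train_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Toy "model" with one scalar weight whose training iterate jitters around 1.0.
train = {"w": 0.0}
ema = dict(train)  # the EMA copy starts from the initial weights

for step in range(2000):
    # Hypothetical noisy optimizer trajectory: alternates between 1.5 and 0.5.
    train["w"] = 1.0 + (0.5 if step % 2 == 0 else -0.5)
    ema_update(ema, train, decay=0.99)

# The last training iterate is still 0.5 away from 1.0;
# the EMA copy sits within about 0.01 of it.
print("train:", train["w"], "ema:", ema["w"])
```

The point of the toy is that the copy you would evaluate (`ema`) is much smoother than whichever iterate training happened to stop on (`train`).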
Real-world analogy
If ACT is "decide the entire next 100 actions in one shot, deterministically," Diffusion Policy is "imagine 50 plausible next-100-action sequences, then refine them iteratively until they're consistent with what experts would do." It naturally handles multi-modal demonstrations (different ways to do the same task) without averaging them.
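The "refine iteratively" half of the analogy can be sketched as a deterministic DDIM loop (eta = 0) that visits only a strided subset of the training timesteps. This is a toy under stated assumptions: the action is a single scalar, the linear beta schedule values are common defaults rather than anything taken from the course, and `toy_eps_model` is a hypothetical closed-form stand-in for the trained network (exact only because the "dataset" is one action, 0.3).

```python
import math
import random


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_eps_model(x_t, alpha_bar_t, target=0.3):
    """Hypothetical noise predictor for data concentrated on one action."""
    return (x_t - math.sqrt(alpha_bar_t) * target) / math.sqrt(1.0 - alpha_bar_t)


def ddim_sample(eps_model, alpha_bars, num_inference_steps=10, seed=0):
    """Deterministic DDIM sampling over a strided subset of timesteps."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    stride = len(alpha_bars) // num_inference_steps
    timesteps = list(range(len(alpha_bars) - 1, -1, -stride))
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last visited timestep we land on clean data (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, a_t)
        # Estimate the clean sample, then re-noise it to the previous level.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        x = math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
    return x


action = ddim_sample(toy_eps_model, make_alpha_bars(), num_inference_steps=10)
print(action)
```

With a real network the prediction is only approximate, so the number of inference steps becomes a speed versus quality trade-off; here the predictor is exact, so 10 steps already recover the target action.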
Hour 1 — Reading
- Diffusion Policy paper, sections 1–4 (~40 min): https://diffusion-policy.cs.columbia.edu/
- Diffusion Models by Sander Dieleman (excellent primer, ~25 min read): https://sander.ai/2022/01/31/diffusion.html
Hour 2 — Read the LeRobot DP implementation
Open ~/robo47-il/.venv/lib/python3.12/site-packages/lerobot/policies/diffusion/modeling_diffusion.py and read for ~30 min. Find:
- The U-Net (1D over time, conditioned on obs).
- The DDIM sampling loop.
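- How a predicted chunk of actions is executed (look for an action queue). Temporal ensembling is associated with ACT, so you may not find it in this file.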
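While reading, it helps to keep the glossary's horizon bookkeeping in mind: predict 16 actions, execute the first 8, then replan. The sketch below shows that loop in isolation. It is not LeRobot's code; `run_receding_horizon` and `dummy_predict_chunk` are hypothetical names, and a real implementation may offset the start of the executed slice to account for observation history.

```python
from collections import deque


def run_receding_horizon(predict_chunk, num_env_steps, horizon=16, n_action_steps=8):
    """Execute the first `n_action_steps` of each predicted chunk, then replan."""
    queue = deque()
    executed, num_predictions = [], 0
    for step in range(num_env_steps):
        if not queue:
            # Expensive call: in a diffusion policy this is one full denoising run.
            chunk = predict_chunk(step, horizon)
            num_predictions += 1
            # Keep only the head of the chunk; the tail is discarded.
            queue.extend(chunk[:n_action_steps])
        executed.append(queue.popleft())
    return executed, num_predictions


def dummy_predict_chunk(step, horizon):
    """Hypothetical policy: tags each action with (planning step, offset)."""
    return [(step, offset) for offset in range(horizon)]


executed, num_predictions = run_receding_horizon(dummy_predict_chunk, num_env_steps=20)
print(num_predictions, executed[7], executed[8])
```

Over 20 environment steps the policy is queried at steps 0, 8, and 16 only, which is why the executed chunk length directly controls how often the costly sampling loop runs.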
LAB
Hour 3 — Lab: train Diffusion Policy on PushT (75 min)
What you're building. A Diffusion Policy trained on the LeRobot pusht dataset, compared against ACT on the same task. This is a more apples-to-apples comparison than ALOHA, since PushT is inherently multi-modal: there are many ways to push a T-shape.
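Why multi-modality matters can be seen in a toy with no robot at all. The sketch assumes a made-up set of demonstrations for one fixed observation, where experts go either left (-1.0) or right (+1.0), and compares a squared-error regressor (which converges to the mean) with a policy that samples from the demonstrated distribution. It is a simplification for intuition, not a model of ACT or of the PushT dataset.

```python
import random

rng = random.Random(0)

# Hypothetical demonstrations for one fixed observation:
# experts pass an obstacle on the left (-1.0) or on the right (+1.0).
demos = [rng.choice([-1.0, 1.0]) for _ in range(1000)]

# A deterministic regressor trained with squared error converges to the mean.
mean_action = sum(demos) / len(demos)

# A generative policy instead draws an action from the demonstrated distribution.
sampled_action = rng.choice(demos)


def distance_to_nearest_demo(action):
    """How far an action is from anything an expert actually did."""
    return min(abs(action - demo) for demo in demos)


print("mean action:", mean_action, "gap:", distance_to_nearest_demo(mean_action))
print("sampled action:", sampled_action, "gap:", distance_to_nearest_demo(sampled_action))
```

The averaged action lands near zero, between the two modes, which corresponds to driving straight at the obstacle; the sampled action always coincides with something an expert did.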
What success looks like at the end. You have:
1. A DP checkpoint with success rate 0.85+ on PushT.
2. An ACT checkpoint on PushT with success rate ~0.70 (worse on this multi-modal task).
3. A comparison plot showing DP > ACT on PushT, while ACT ≈ DP on ALOHA.
Step 1 — Train DP on PushT (45 min on H100)
cd ~/robo47-il
source .venv/bin/activate
lerobot-train \
--policy.type=diffusion \
--dataset.repo_id=lerobot/pusht \
--env.type=pusht \
--env.task=PushT-v0 \
--batch_size=64 \
--steps=200000 \
--eval_freq=20000 \
--save_freq=20000 \
--output_dir=runs/dp_pusht \
--wandb.enable=true \
--seed=1
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.