Diffusion Policy on PushT

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

Diffusion Policy — Columbia/Stanford 2023. Uses a denoising-diffusion model to generate actions conditioned on observations.
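DDPM / DDIM — DDPM is the original denoising-diffusion formulation, typically trained with about 1000 noise steps. DDIM is a faster sampler that reuses the same trained model with 10–50 steps instead of 1000.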
Score function — The gradient of log-density, what the diffusion model approximates.
PushT — A 2D pushing task: the agent must push a T-shape into a target region. Standard benchmark for diffusion-policy comparisons.
EMA (Exponential Moving Average) — Running average of model weights. Diffusion policies depend heavily on this; use the EMA weights at evaluation time, not the raw training-step weights.
Horizon T_p — How many timesteps the policy predicts. The paper uses 16 and executes only the first 8 before replanning.
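Two of the glossary entries, EMA and the prediction horizon, are mechanical enough to sketch in a few lines. The following is a minimal illustration in plain Python, not the LeRobot implementation: `ema_update`, `receding_horizon_rollout`, and `fake_predict_chunk` are hypothetical names, weights are a flat list of floats, and the "policy" just returns step indices so the replanning pattern is visible.

```python
from typing import Callable, List, Sequence


def ema_update(ema: Sequence[float], weights: Sequence[float], decay: float) -> List[float]:
    """One EMA step per weight: ema <- decay * ema + (1 - decay) * weight."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def receding_horizon_rollout(
    predict_chunk: Callable[[int, int], List[int]],
    total_steps: int,
    horizon: int = 16,   # T_p: actions predicted per planning call
    n_execute: int = 8,  # actions executed before the policy replans
) -> List[int]:
    """Predict `horizon` actions, execute the first `n_execute`, then replan."""
    executed: List[int] = []
    while len(executed) < total_steps:
        chunk = predict_chunk(len(executed), horizon)
        remaining = total_steps - len(executed)
        executed.extend(chunk[: min(n_execute, remaining)])
    return executed


# EMA demo: the average moves slowly toward the new weight value.
ema = [0.0]
for _ in range(2):
    ema = ema_update(ema, [1.0], decay=0.9)

# Rollout demo with a stand-in policy that records when it was asked to plan.
planning_calls: List[int] = []


def fake_predict_chunk(start_step: int, horizon: int) -> List[int]:
    planning_calls.append(start_step)
    return list(range(start_step, start_step + horizon))


executed_actions = receding_horizon_rollout(fake_predict_chunk, total_steps=24)
print(ema, planning_calls, len(executed_actions))
```

With a horizon of 16 and 8 executed actions, half of every predicted chunk is discarded; the policy trades extra planning calls for the ability to react to new observations. The sketch ignores the observation history that the real policy also conditions on.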

Real-world analogy

If ACT is "decide the entire next 100 actions in one shot, deterministically," Diffusion Policy is "imagine 50 plausible next-100-action sequences, then refine them iteratively until they're consistent with what experts would do." It naturally handles multi-modal demonstrations (different ways to do the same task) without averaging them.
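The "without averaging" point can be made concrete with a one-dimensional toy. Suppose experts push either left (action -1) or right (action +1) with equal probability. A single deterministic prediction trained with mean-squared error converges to the mean, roughly 0, which no expert ever did. The sketch below compares that with a few-step DDIM-style sampler. It is an illustration under stated assumptions, not the paper's setup: the trained network is replaced by the closed-form ideal denoiser for this two-point distribution, the linear beta schedule is an arbitrary choice, and all function names are hypothetical.

```python
import numpy as np


def make_alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def ideal_x0_prediction(x_t, alpha_bar):
    """Posterior mean E[x0 | x_t] when x0 is -1 or +1 with equal probability."""
    return np.tanh(np.sqrt(alpha_bar) * x_t / (1.0 - alpha_bar))


def ddim_sample(num_samples, num_inference_steps=20, seed=0):
    """Deterministic DDIM-style sampling using the ideal denoiser above."""
    rng = np.random.default_rng(seed)
    alphas_cumprod = make_alphas_cumprod()
    # Visit only a small, evenly spaced subset of the 1000 training timesteps.
    timesteps = np.linspace(len(alphas_cumprod) - 1, 0, num_inference_steps).round().astype(int)
    x = rng.standard_normal(num_samples)  # start from pure noise
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x0_hat = ideal_x0_prediction(x, a_t)
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        # Re-noise the clean estimate to the next (lower) noise level.
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x


# Toy "demonstrations": half the experts go left, half go right.
demos = np.random.default_rng(1).choice([-1.0, 1.0], size=2000)
mse_optimal_action = demos.mean()  # what a single MSE-trained prediction converges to
samples = ddim_sample(2000)

print("MSE-optimal single action:", round(float(mse_optimal_action), 3))
print("fraction of samples near +1:", float(np.mean(samples > 0)))
```

Each sampled action commits to one of the two modes, and across samples both modes appear. In a real policy the denoiser is a learned network conditioned on observations, and the sample is a whole action sequence rather than a scalar.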

Hour 1 — Reading

Diffusion Policy paper, sections 1–4 (~40 min): https://diffusion-policy.cs.columbia.edu/
Diffusion Models by Sander Dieleman (excellent primer, ~25 min read): https://sander.ai/2022/01/31/diffusion.html

Hour 2 — Read the LeRobot DP implementation

~/robo47-il/.venv/lib/python3.12/site-packages/lerobot/policies/diffusion/modeling_diffusion.py — read ~30 min. Find:
The U-Net (1D over time, conditioned on obs).
The DDIM sampling loop.
The temporal ensembling.
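To locate those pieces in a long module, it can help to print an outline of its classes and functions first. The helper below is a small sketch using only the standard-library `ast` module. `outline` is a hypothetical name, and the demo source string is made up for illustration; it does not reflect the actual class names in modeling_diffusion.py.

```python
import ast
from typing import List, Tuple


def outline(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, kind, name) for every class and function in `source`."""
    entries: List[Tuple[int, str, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            entries.append((node.lineno, "class", node.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append((node.lineno, "def", node.name))
    return sorted(entries)


# Made-up stand-in for a real module, used only to show the output shape.
demo_source = (
    "class ToyUnet:\n"
    "    def forward(self, x):\n"
    "        return x\n"
    "\n"
    "def sample_loop(model):\n"
    "    return model\n"
)

demo_outline = outline(demo_source)
for lineno, kind, name in demo_outline:
    print(f"{lineno:>5}  {kind:<5}  {name}")
```

To run it on the real file, read the file's text (for example with `pathlib.Path(...).read_text()`) and pass it to `outline`, then search the result for names that mention the U-Net, the sampler, or the action queue.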

LAB

Hour 3 — Lab: train Diffusion Policy on PushT (75 min)

What you're building. A Diffusion Policy trained on the LeRobot pusht dataset. Compare against ACT on the same dataset (a more apples-to-apples comparison than ALOHA, since PushT is inherently multi-modal: there are many ways to push a T-shape).

What success looks like at the end. You have:
1. A DP checkpoint scoring 0.85+ on PushT.
2. An ACT checkpoint on PushT scoring ~0.70 (worse on this multi-modal task).
3. A comparison plot showing DP > ACT on PushT, while ACT ≈ DP on ALOHA.
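Before reading much into a DP versus ACT gap, it is worth checking how noisy the estimate is. The sketch below assumes the scores above are per-episode success rates and computes a binomial standard error for each. The outcome lists are invented to match the target numbers with 20 evaluation episodes each; they are not measured results, and `success_rate_with_stderr` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def success_rate_with_stderr(outcomes: Sequence[bool]) -> Tuple[float, float]:
    """Success rate and its binomial standard error, sqrt(p * (1 - p) / n)."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one evaluation episode")
    rate = sum(outcomes) / n
    return rate, math.sqrt(rate * (1.0 - rate) / n)


# Invented outcomes (True = episode succeeded), 20 episodes per policy.
dp_outcomes = [True] * 17 + [False] * 3    # 0.85
act_outcomes = [True] * 14 + [False] * 6   # 0.70

dp_rate, dp_err = success_rate_with_stderr(dp_outcomes)
act_rate, act_err = success_rate_with_stderr(act_outcomes)

gap = dp_rate - act_rate
gap_err = math.sqrt(dp_err**2 + act_err**2)  # errors of independent estimates add in quadrature

print(f"DP  {dp_rate:.2f} +/- {dp_err:.2f}")
print(f"ACT {act_rate:.2f} +/- {act_err:.2f}")
print(f"gap {gap:.2f} +/- {gap_err:.2f}")
```

With only 20 episodes per policy, a 0.15 gap is about one standard error wide, so a convincing comparison plot likely needs more evaluation episodes or several seeds.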

Step 1 — Train DP on PushT (45 min on H100)

cd ~/robo47-il
source .venv/bin/activate

lerobot-train \
  --policy.type=diffusion \
  --dataset.repo_id=lerobot/pusht \
  --env.type=pusht \
  --env.task=PushT-v0 \
  --batch_size=64 \
  --steps=200000 \
  --eval_freq=20000 \
  --save_freq=20000 \
  --output_dir=runs/dp_pusht \
  --wandb.enable=true \
  --seed=1
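The flags above imply a few numbers that are useful for sanity-checking a run in progress. The arithmetic below uses only values from the command and the 45-minute estimate in the step title. Whether an evaluation or checkpoint also fires on the very last step depends on the LeRobot version, so treat the schedule as an assumption.

```python
# Values copied from the lerobot-train command and the step title.
steps = 200_000
batch_size = 64
eval_freq = 20_000
save_freq = 20_000
estimated_minutes = 45

# Steps at which a checkpoint is expected (assumes one every `save_freq` steps).
checkpoint_steps = list(range(save_freq, steps + 1, save_freq))
eval_steps = list(range(eval_freq, steps + 1, eval_freq))

samples_seen = steps * batch_size
required_steps_per_sec = steps / (estimated_minutes * 60)

print("checkpoints:", len(checkpoint_steps), "first/last:", checkpoint_steps[0], checkpoint_steps[-1])
print("evaluations:", len(eval_steps))
print("training samples drawn:", samples_seen)
print("steps/sec needed for the 45 min estimate:", round(required_steps_per_sec, 1))
```

If the training log shows a throughput far below the required steps per second, the run will take proportionally longer than the estimate.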

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.


Papers you will re-read after this

Diffusion Policy — visuomotor diffusion
Behavior generation with latent actions