Week 3 - Imitation Learning

LeRobot, behavior cloning, ACT, diffusion policy, SmolVLA, LoRA, and OpenVLA.

Policies

LeRobot v0.5 setup, dataset exploration, BC baseline

Glossary primer (12 min) LeRobot — Hugging Face's flagship robot learning library. v0.5.0 released Q1 2026. Trains and evaluates ACT, Diffus...

Glossary primer (10 min) ACT (Action Chunking Transformer) — Stanford 2023. CVAE-based transformer that predicts the next K actions per call...
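A minimal sketch of the "next K actions per call" idea, assuming a toy stand-in policy: the policy is queried once, returns a chunk of K actions, and the controller executes the whole chunk before querying again. The names `dummy_chunk_policy` and `rollout_with_chunks` and the one-dimensional toy dynamics are hypothetical and chosen for illustration only; they are not part of ACT or LeRobot, and a real ACT policy is a trained transformer rather than a hand-written function.

```python
from typing import Callable, List, Tuple

Action = List[float]


def dummy_chunk_policy(observation: float, k: int) -> List[Action]:
    """Hypothetical stand-in for a trained policy: returns k future actions."""
    return [[observation + i] for i in range(k)]


def rollout_with_chunks(
    policy: Callable[[float, int], List[Action]],
    num_steps: int,
    k: int,
) -> Tuple[List[Action], int]:
    """Run num_steps control steps, querying the policy only when the chunk is used up."""
    executed: List[Action] = []
    pending: List[Action] = []  # actions predicted but not yet executed
    policy_calls = 0
    observation = 0.0

    for _ in range(num_steps):
        if not pending:
            # One policy call yields the next k actions.
            pending = policy(observation, k)
            policy_calls += 1
        action = pending.pop(0)
        executed.append(action)
        # Toy dynamics (an assumption): the next observation is the action plus one.
        observation = action[0] + 1.0

    return executed, policy_calls


if __name__ == "__main__":
    actions, calls = rollout_with_chunks(dummy_chunk_policy, num_steps=10, k=4)
    print(f"executed {len(actions)} actions with {calls} policy calls")
```

With 10 control steps and K = 4, the policy is called 3 times instead of 10. A larger K means fewer calls, at the cost of reacting more slowly to new observations.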

Glossary primer (10 min) Diffusion Policy — Columbia/Stanford 2023. Uses a denoising diffusion model to generate actions conditioned on obse...

Glossary primer (12 min) SmolVLA — Hugging Face's compact (450M-parameter) Vision-Language-Action model, released 2025. Designed for fine-tu...

Glossary primer (10 min) VQ-BeT (Vector-Quantized Behavior Transformer) — NYU 2024. Tokenizes continuous actions via a VQ-VAE, then...
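A minimal sketch of what tokenizing a continuous action means, assuming a small fixed codebook: each action vector is replaced by the index of its nearest codebook entry, and that index can be mapped back to an approximate action. The codebook values and the helper names `tokenize` and `detokenize` are hypothetical. VQ-BeT learns its codebook with a VQ-VAE rather than using hand-written entries, so this shows only the nearest-neighbor lookup step.

```python
from typing import List, Sequence

# Hypothetical hand-written codebook of 2-D actions (an assumption for illustration).
CODEBOOK: List[List[float]] = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(action: Sequence[float], codebook: List[List[float]] = CODEBOOK) -> int:
    """Return the index of the codebook entry nearest to the action."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(action, codebook[i]),
    )


def detokenize(token: int, codebook: List[List[float]] = CODEBOOK) -> List[float]:
    """Map a discrete token back to its (approximate) continuous action."""
    return list(codebook[token])


if __name__ == "__main__":
    token = tokenize([0.9, 0.1])
    print(token, detokenize(token))
```

The reconstruction is lossy: `[0.9, 0.1]` comes back as `[1.0, 0.0]`. That gap is the quantization error a learned codebook is trained to keep small.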

Glossary primer (12 min) OpenVLA — Stanford / TRI 2024. 7B-parameter VLA built on Llama 2 7B + DINOv2 + SigLIP. Open weights. OpenVLA-OFT —...

Glossary primer (5 min) No new terms. Today is reflection + reproducibility. Hour 1 — Capstone Track D pre-design (40 min) Track D of the We...