Week 3: Imitation Learning, Day 15
LeRobot v0.5 setup, dataset exploration, BC baseline
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (12 min)
- LeRobot: Hugging Face's flagship robot-learning library. v0.5.0 released Q1 2026. Trains and evaluates ACT, diffusion policy, SmolVLA, OpenVLA, and more from a unified CLI.
- LeRobotDataset: The standard dataset format, stored as parquet shards plus per-episode metadata. ~150 datasets on HF Hub.
- Behavior Cloning (BC): Supervised learning of π(action | observation) from (o, a) pairs. Dumbest method, hard baseline.
- Action chunking: Predict multiple future actions per inference call, not just one. Smoother trajectories, fewer compounding errors.
- Temporal ensembling: Average overlapping action chunks (e.g. predict actions for t..t+15, average with previous predictions for the same timesteps).
- Episode: One demonstration trajectory, i.e. a sequence of (image, state, action) tuples.
- ALOHA: Bimanual teleoperation (teleop) platform from Stanford (2023); de facto benchmark for fine-manipulation imitation.
- Hugging Face Hub `repo_id`: `lerobot/aloha_sim_insertion_human`, etc. The standard way to reference a dataset.
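The action chunking and temporal ensembling entries above can be sketched in a few lines. This is a minimal illustration, not LeRobot's implementation: the function name `temporal_ensemble`, the scalar actions, and the chunk length of 4 are all assumptions chosen to keep the arithmetic visible. It uses a plain uniform mean; published variants may weight older and newer predictions differently.

```python
from collections import defaultdict


def temporal_ensemble(chunks):
    """Average all predictions that target the same timestep.

    chunks: list of (start_step, actions), where actions[i] is the
    action predicted for timestep start_step + i.
    Returns a dict {timestep: mean predicted action}.
    """
    buckets = defaultdict(list)
    for start, actions in chunks:
        for offset, action in enumerate(actions):
            buckets[start + offset].append(action)
    return {t: sum(preds) / len(preds) for t, preds in sorted(buckets.items())}


# Hypothetical scalar actions: one chunk of length 4 predicted at each of steps 0, 1, 2.
chunks = [
    (0, [1.0, 1.0, 1.0, 1.0]),
    (1, [2.0, 2.0, 2.0, 2.0]),
    (2, [3.0, 3.0, 3.0, 3.0]),
]
ensembled = temporal_ensemble(chunks)
for t, action in ensembled.items():
    print(t, action)
```

Timestep 0 is covered by one chunk only, while timestep 2 is covered by all three, so its executed action is the mean of three predictions. That overlap is what smooths the trajectory.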
Real-world analogy
Behavior cloning is "watch the master 1000 times, copy them." It works great when the master is consistent and the situation is similar. It fails catastrophically the first time something unexpected happens, because you've only ever seen the master succeed and never seen them recover.
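The analogy can be made concrete with a toy problem. The sketch below is an illustration under stated assumptions, not the lab's training code: the 1-D `expert` function, the linear policy `w * o + b`, and the observation range [-1, 1] are all hypothetical. It fits the policy by plain supervised regression on (o, a) pairs, then compares its error on a state the expert visited with its error on a state the expert never visited.

```python
import math


def expert(o):
    # Hypothetical expert policy; the learner only sees its (o, a) pairs.
    return math.sin(o)


# Demonstrations cover observations in [-1, 1] only.
observations = [i / 100 - 1 for i in range(201)]
demos = [(o, expert(o)) for o in observations]

# Behavior cloning as supervised regression: minimise mean squared error
# between the policy's action and the expert's action by gradient descent.
w, b = 0.0, 0.0
lr = 0.1
n = len(demos)
for _ in range(500):
    grad_w = sum(2 * ((w * o + b) - a) * o for o, a in demos) / n
    grad_b = sum(2 * ((w * o + b) - a) for o, a in demos) / n
    w -= lr * grad_w
    b -= lr * grad_b


def policy(o):
    return w * o + b


in_dist_error = abs(policy(0.5) - expert(0.5))   # state seen in the demos
out_dist_error = abs(policy(3.0) - expert(3.0))  # state never seen in the demos
print(f"error at o=0.5: {in_dist_error:.3f}")
print(f"error at o=3.0: {out_dist_error:.3f}")
```

The cloned policy tracks the expert closely inside the demonstrated range and diverges badly outside it, which is the "never seen them recover" failure in miniature.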
Hour 1 — Reading + watch
- LeRobot v0.5 release notes (~15 min): https://huggingface.co/blog/lerobot or https://github.com/huggingface/lerobot/releases
- ALOHA paper, abstract + figures (~10 min): https://arxiv.org/abs/2304.13705
- "A Survey of Imitation Learning Methods, Environments and Metrics" (skim Sec. 2, ~25 min): https://arxiv.org/abs/2503.16322
Hour 2 — Install and inspect the ALOHA dataset
On Nebius:
ssh -i ~/.ssh/nebius_key ubuntu@<your-instance-ip>
cd ~
mkdir -p robo47-il && cd robo47-il
uv venv --python 3.12 .venv && source .venv/bin/activate
uv pip install "lerobot[all]==0.5.0"
uv pip install wandb
wandb login # paste API key from https://wandb.ai/authorize
Inspect a dataset:
python -c "
from lerobot.datasets.lerobot_dataset import LeRobotDataset
ds = LeRobotDataset('lerobot/aloha_sim_insertion_human')
print(f'Episodes: {ds.num_episodes}, frames: {ds.num_frames}')
print(f'Sample keys: {list(ds[0].keys())[:10]}')
print(f'Action shape: {ds[0][\"action\"].shape}')
print(f'Image shape: {ds[0][\"observation.images.top\"].shape}')
"
Expected:
Episodes: 50, frames: 25000
Sample keys: ['observation.images.top', 'observation.state', 'action', 'episode_index', 'frame_index', 'timestamp', 'next.done', 'index', 'task_index']
Action shape: torch.Size([14])
Image shape: torch.Size([3, 480, 640])
14-d action = 7 joints × 2 arms (or (6 joints + 1 gripper) × 2 arms, depending on the dataset).
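To make the 14-d breakdown concrete, here is a small sketch that splits one action vector into per-arm parts. The ordering is an assumption (left arm first, six joint targets followed by one gripper value, then the same for the right arm), and `split_bimanual_action` is a hypothetical helper, not a LeRobot function. Check the dataset's metadata before relying on this layout.

```python
def split_bimanual_action(action):
    """Split a 14-d action into per-arm joints and gripper values.

    Assumed layout (verify against the dataset):
    [left 6 joints, left gripper, right 6 joints, right gripper].
    """
    if len(action) != 14:
        raise ValueError(f"expected 14 values, got {len(action)}")
    left, right = action[:7], action[7:]
    return {
        "left_joints": left[:6],
        "left_gripper": left[6],
        "right_joints": right[:6],
        "right_gripper": right[6],
    }


# Placeholder values 0..13 so each slot is easy to trace.
parts = split_bimanual_action(list(range(14)))
print(parts)
```

Slicing works the same way on a 1-D tensor, so `ds[0]["action"]` can be passed in place of the placeholder list.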
LAB
Hour 3 — Lab: train a BC baseline (90 min)
What you're building. A behavior-cloning policy on ALOHA insertion using LeRobot v0.5's CLI. This is the worst-case baseline that every other policy this week will beat. You'll log the failure modes carefully.
What success looks like at the end. You have:
1. A trained BC checkpoint at runs/bc_aloha/checkpoints/last/pretrained_model/.
2. Eval success rate ≈ 0.05–0.20 (low, and that's the point: it should be bad).
3. runs/bc_aloha/eval/episode_0.mp4 showing the policy try and fail to insert the peg.
4. A training loss curve that declines from ~3.0 to <1.0 by step 20k.
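A success rate measured over a few dozen rollouts is noisy, so it helps to read the 0.05–0.20 target as a band, not a point. The sketch below computes a Wilson score interval for a binomial proportion. The counts (5 successes in 50 episodes) are hypothetical and not taken from an actual run, and `wilson_interval` is an illustrative helper, not part of LeRobot.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a success rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical eval: 5 successful insertions out of 50 rollouts.
low, high = wilson_interval(successes=5, n=50)
print(f"success rate 0.10, interval ({low:.3f}, {high:.3f})")
```

With counts this small the interval is wide, so two baselines whose measured rates differ by a few points may not be meaningfully different.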
Step 1 — Train BC (45 min wall-clock on 1× H100)
cd ~/robo47-il
source .venv/bin/activate
lerobot-train \
--policy.type=bc \
--dataset.repo_id=lerobot/aloha_sim_insertion_human \
--env.type=aloha \
--env.task=AlohaInsertion-v0 \
--batch_size=16 \
--steps=20000 \
--eval_freq=5000 \
--save_freq=5000 \
--output_dir=runs/bc_aloha \
--wandb.enable=true \
--wandb.project=robo47 \
--seed=1
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.