Day 15

LeRobot v0.5 setup, dataset exploration, BC baseline

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

  • LeRobot: Hugging Face's flagship robot-learning library. v0.5.0 released Q1 2026. Trains and evaluates ACT, Diffusion Policy, SmolVLA, OpenVLA, and more from a unified CLI.
  • LeRobotDataset: Standard dataset format: parquet shards + per-episode metadata. ~150 datasets on HF Hub.
  • Behavior Cloning (BC): Supervised learning of π(action | observation) from (o, a) pairs. Dumbest method, hard baseline.
  • Action chunking: Predict multiple future actions per inference call, not just one. Smoother trajectories, fewer compounding errors.
  • Temporal ensembling: Average overlapping action chunks (e.g. predict actions for t..t+15, average with previous predictions for the same timesteps).
  • Episode: One demonstration trajectory, i.e. a sequence of (image, state, action) tuples.
  • ALOHA: Bimanual teleoperation (teleop) platform from Stanford (2023); de facto benchmark for fine-manipulation imitation.
  • Hugging Face Hub `repo_id`: lerobot/aloha_sim_insertion_human etc. Standard dataset reference.
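
To make the action chunking and temporal ensembling entries concrete, here is a minimal NumPy sketch. It assumes one chunk is predicted at every timestep and that overlapping predictions are combined with exponential weights in which the oldest prediction gets the largest weight (the scheme described in the ACT paper). The function name `temporal_ensemble` and the `decay` parameter are illustrative, not LeRobot API.

```python
import numpy as np

def temporal_ensemble(chunks, chunk_size, decay=0.01):
    """Combine overlapping action chunks into one action per timestep.

    chunks[s] has shape (chunk_size, action_dim) and holds the actions
    predicted at timestep s for timesteps s .. s + chunk_size - 1.
    """
    horizon = len(chunks)
    action_dim = chunks[0].shape[1]
    executed = np.zeros((horizon, action_dim))
    for t in range(horizon):
        first = max(0, t - chunk_size + 1)
        # Every prediction made for timestep t, ordered oldest chunk first.
        preds = np.stack([chunks[s][t - s] for s in range(first, t + 1)])
        # Exponential weights: index 0 (oldest) gets weight 1 before normalizing.
        weights = np.exp(-decay * np.arange(len(preds)))
        weights /= weights.sum()
        executed[t] = weights @ preds
    return executed

# Toy data: the chunk predicted at step s is filled with the value s,
# so the ensemble output shows which chunks contributed to each timestep.
chunk_size, action_dim, horizon = 4, 2, 6
chunks = [np.full((chunk_size, action_dim), float(s)) for s in range(horizon)]

# decay=0.0 reduces the scheme to a plain average of overlapping predictions.
plain_mean = temporal_ensemble(chunks, chunk_size, decay=0.0)
print(plain_mean[:, 0])
```

With `decay=0.0` every overlapping prediction counts equally; a larger `decay` shifts weight toward older predictions, which smooths the trajectory at the cost of reacting more slowly to new observations.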

Real-world analogy

Behavior Cloning (BC) is "watch the master 1000 times, copy them." Works great when the master is consistent and the situation is similar. Fails catastrophically the first time something unexpected happens, because you've only ever seen the master succeed and never seen them recover.
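
In code, "copy the master" is plain supervised regression on (o, a) pairs. The sketch below is a toy illustration, not the LeRobot BC policy: the expert is a made-up linear map, the dimensions are arbitrary, and the clone is fit with ordinary least squares. It shows only the copying half of the analogy; it does not model what happens when the robot drifts into states the expert never visited.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a fixed linear map from a 4-d observation to a 2-d action.
expert_W = rng.normal(size=(4, 2))

# "Watch the master 1000 times": collect (observation, action) pairs
# with a little noise on the demonstrated actions.
observations = rng.normal(size=(1000, 4))
actions = observations @ expert_W + 0.01 * rng.normal(size=(1000, 2))

# Behavior cloning as supervised learning:
# minimize ||observations @ W - actions||^2 over W.
W_bc, *_ = np.linalg.lstsq(observations, actions, rcond=None)

def policy(obs):
    """Cloned policy: predict the expert's action for an observation."""
    return obs @ W_bc

# Mean absolute error on the demonstration data itself.
in_dist_error = np.abs(policy(observations) - actions).mean()
print(f"in-distribution error: {in_dist_error:.4f}")
```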

Hour 1 — Reading + watch

Hour 2 — Install and inspect the ALOHA dataset

On Nebius:

ssh -i ~/.ssh/nebius_key ubuntu@<your-instance-ip>
cd ~
mkdir -p robo47-il && cd robo47-il
uv venv --python 3.12 .venv && source .venv/bin/activate
uv pip install "lerobot[all]==0.5.0"
uv pip install wandb
wandb login   # paste API key from https://wandb.ai/authorize

Inspect a dataset:

python -c "
from lerobot.datasets.lerobot_dataset import LeRobotDataset
ds = LeRobotDataset('lerobot/aloha_sim_insertion_human')
print(f'Episodes: {ds.num_episodes}, frames: {ds.num_frames}')
print(f'Sample keys: {list(ds[0].keys())[:10]}')
print(f'Action shape: {ds[0][\"action\"].shape}')
print(f'Image shape: {ds[0][\"observation.images.top\"].shape}')
"

Expected:

Episodes: 50, frames: 25000
Sample keys: ['observation.images.top', 'observation.state', 'action', 'episode_index', 'frame_index', 'timestamp', 'next.done', 'index', 'task_index']
Action shape: torch.Size([14])
Image shape: torch.Size([3, 480, 640])

14-d action = 7 joints × 2 arms (or (6 joints + 1 gripper) × 2, depending on dataset).
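
A small helper makes the 14-d layout explicit. This sketch assumes the "(6 joints + 1 gripper) × 2" reading with the left arm first; that ordering is an assumption to check against the dataset's metadata, and `split_bimanual_action` is a hypothetical helper, not part of LeRobot.

```python
import numpy as np

# ASSUMPTION: each arm contributes 6 joint targets followed by 1 gripper
# value, and the left arm comes first. Verify against the dataset metadata.
JOINTS_PER_ARM = 6
PER_ARM = JOINTS_PER_ARM + 1

def split_bimanual_action(action):
    """Split a (..., 14) action array into per-arm joints and grippers."""
    action = np.asarray(action)
    if action.shape[-1] != 2 * PER_ARM:
        raise ValueError(f"expected last dim {2 * PER_ARM}, got {action.shape[-1]}")
    left, right = action[..., :PER_ARM], action[..., PER_ARM:]
    return {
        "left_joints": left[..., :JOINTS_PER_ARM],
        "left_gripper": left[..., JOINTS_PER_ARM],
        "right_joints": right[..., :JOINTS_PER_ARM],
        "right_gripper": right[..., JOINTS_PER_ARM],
    }

# Indices 0..13 stand in for a real action so the slicing is easy to read.
parts = split_bimanual_action(np.arange(14.0))
for name, value in parts.items():
    print(name, value)
```

To apply it to a frame from the dataset, convert the tensor first, e.g. `split_bimanual_action(ds[0]["action"].numpy())`.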

LAB

Hour 3 — Lab: train a BC baseline (90 min)

What you're building. A behavior-cloning policy on ALOHA insertion using LeRobot v0.5's CLI. This is the worst-case baseline every other policy this week will beat. You'll log the failure modes carefully.

What success looks like at the end. You have:

  1. A trained Behavior Cloning (BC) checkpoint at runs/bc_aloha/checkpoints/last/pretrained_model/.
  2. Eval success rate ≈ 0.05–0.20 (low; that's the point: it should be bad).
  3. runs/bc_aloha/eval/episode_0.mp4 showing the policy try and fail to insert the peg.
  4. A training loss curve that declines from ~3.0 to <1.0 by step 20k.
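
A success rate in the 0.05–0.20 range rests on a handful of successful episodes, so it helps to report it with an uncertainty interval. The sketch below computes a Wilson score interval for a binomial proportion. The episode counts are made-up illustration values, not results from this lab, and `success_rate_with_interval` is a hypothetical helper.

```python
import math

def success_rate_with_interval(successes, episodes, z=1.96):
    """Return (rate, low, high) using the Wilson score interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    p = successes / episodes
    denom = 1 + z**2 / episodes
    center = (p + z**2 / (2 * episodes)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / episodes + z**2 / (4 * episodes**2)) / denom
    )
    return p, center - half_width, center + half_width

# Illustrative numbers only: 5 successes out of 50 evaluation episodes.
rate, low, high = success_rate_with_interval(successes=5, episodes=50)
print(f"success rate {rate:.2f}, 95% interval [{low:.3f}, {high:.3f}]")
```

For 5 successes in 50 episodes the interval runs from roughly 0.04 to 0.21, so two runs that land at opposite ends of the expected range are not necessarily different.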

Step 1 — Train BC (45 min wall-clock on 1× H100)

cd ~/robo47-il
source .venv/bin/activate

lerobot-train \
  --policy.type=bc \
  --dataset.repo_id=lerobot/aloha_sim_insertion_human \
  --env.type=aloha \
  --env.task=AlohaInsertion-v0 \
  --batch_size=16 \
  --steps=20000 \
  --eval_freq=5000 \
  --save_freq=5000 \
  --output_dir=runs/bc_aloha \
  --wandb.enable=true \
  --wandb.project=robo47 \
  --seed=1
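
As a sanity check on the flags above, the training budget can be worked out from the numbers on this page: 20,000 steps at batch size 16 over a 25,000-frame, 50-episode dataset, with a 45-minute wall-clock estimate. The sketch is arithmetic only. It assumes one optimizer step consumes one batch and that evaluation and checkpointing fire at multiples of `eval_freq` and `save_freq`; neither assumption has been verified against LeRobot's training loop.

```python
# Values copied from the command and the expected dataset output.
steps, batch_size = 20_000, 16
eval_freq = save_freq = 5_000
num_frames, num_episodes = 25_000, 50
wall_clock_minutes = 45

samples_seen = steps * batch_size                 # frames drawn during training
epochs = samples_seen / num_frames                # passes over the dataset
frames_per_episode = num_frames // num_episodes   # average episode length
steps_per_second = steps / (wall_clock_minutes * 60)

# ASSUMPTION: eval and save happen at every multiple of the frequency.
eval_steps = list(range(eval_freq, steps + 1, eval_freq))

print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.1f}")
print(f"frames per episode: {frames_per_episode}")
print(f"required throughput: {steps_per_second:.1f} steps/s")
print(f"eval/save steps: {eval_steps}")
```

Under these assumptions the run makes about 12.8 passes over the data and has to sustain about 7.4 steps per second to finish in 45 minutes.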

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
