Week 6: Frontier Embodiment, Day 41
EgoScale and the data-collection paradigm shift
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (10 min)
- EgoScale — Feb 2026 paper / dataset (Meta + collaborators). Massive egocentric video dataset (10,000+ hours) with synchronized hand pose, gaze, language. Targeted at training generalist VLAs.
- Egocentric video — First-person video, e.g. from a head-mounted GoPro or smart glasses. Captures task execution from the actor's POV.
- Project Aria — Meta's research smart-glasses platform. EgoScale uses Aria-style devices.
- Action labels from video — Use VLMs (Day 33) to auto-label actions from egocentric video. Cheap; less accurate than teleoperation (teleop).
- Hand-tracking model — Reconstructs 3D hand pose from RGB video. Standard approach: HaMeR, which predicts the parameters of the MANO parametric hand model.
- Why this matters — Until 2025, robot data was bottlenecked by teleoperation (teleop) hours (~$50/hour, slow). Egocentric video can be collected at scale ($0.50/hour from existing footage). 100× cost reduction.
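To make the cost claim concrete, here is a minimal arithmetic sketch. The two hourly rates and the 10,000-hour figure are the ones quoted in the glossary above; the function and variable names are illustrative only, and the calculation ignores labeling, storage, and quality-control costs.

```python
# Back-of-the-envelope cost comparison using the figures from the glossary.
# Names below are illustrative; the rates are the approximate ones quoted above.
TELEOP_USD_PER_HOUR = 50.0   # ~$50/hour for teleoperated demonstrations
EGO_USD_PER_HOUR = 0.50      # ~$0.50/hour for existing egocentric footage
HOURS = 10_000               # EgoScale-sized collection (10,000+ hours)


def collection_cost(hours: float, usd_per_hour: float) -> float:
    """Total collection cost in USD for a given number of hours."""
    return hours * usd_per_hour


teleop_cost = collection_cost(HOURS, TELEOP_USD_PER_HOUR)
ego_cost = collection_cost(HOURS, EGO_USD_PER_HOUR)
ratio = teleop_cost / ego_cost

print(f"teleop: ${teleop_cost:,.0f}  egocentric: ${ego_cost:,.0f}  ratio: {ratio:.0f}x")
# prints: teleop: $500,000  egocentric: $5,000  ratio: 100x
```

At this scale the gap is roughly $500,000 versus $5,000, which is where the 100× figure comes from.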
Real-world analogy
Pre-EgoScale: train a chef by having them re-cook each recipe 100× while a lab tech holds their hands and records every motion. Post-EgoScale: just film professional chefs doing their job in their kitchens; auto-extract what their hands did. Same kind of data, 100× cheaper, though the auto-extracted labels are noisier than hand-recorded ones.
Hour 1 — Reading
- EgoScale paper, abstract + Section 3 (~30 min): https://arxiv.org/abs/2602.xxxxx (search "EgoScale 2026")
- Ego4D (predecessor): abstract + figures (~15 min): https://ego4d-data.org/
- Project Aria: https://www.projectaria.com/research/
Hour 2 — Inspect EgoScale samples
If the dataset is downloadable:
huggingface-cli download facebook/egoscale-v1 --local-dir data/egoscale --include "samples/*"
ls data/egoscale/samples/
For each sample, look at:
- The egocentric MP4 (~30 s)
- The hand pose JSON (per-frame 3D joint positions)
- The auto-generated action labels
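A quick sanity check on a hand pose file might look like the sketch below. The JSON layout here (a top-level "frames" list whose entries carry a timestamp "t" and a "joints" list of [x, y, z] triples) is a hypothetical schema invented for illustration, not the documented EgoScale format; adjust the keys to whatever the real files contain. The script writes a small synthetic file so it runs without the dataset.

```python
import json
import tempfile
from pathlib import Path

# HYPOTHETICAL schema, for illustration only (not the documented EgoScale format):
# {"fps": 30.0, "frames": [{"t": 0.0, "joints": [[x, y, z], ...]}, ...]}


def make_fake_sample(path: Path, num_frames: int = 3, fps: float = 30.0,
                     num_joints: int = 21) -> None:
    """Write a tiny synthetic hand pose file so the sketch runs offline."""
    frames = [
        {"t": i / fps,
         "joints": [[0.01 * j, 0.0, 0.5 + 0.001 * i] for j in range(num_joints)]}
        for i in range(num_frames)
    ]
    path.write_text(json.dumps({"fps": fps, "frames": frames}))


def summarize_hand_pose(path: Path) -> dict:
    """Report frame count, distinct joint counts per frame, and clip duration."""
    data = json.loads(path.read_text())
    frames = data["frames"]
    joint_counts = sorted({len(frame["joints"]) for frame in frames})
    duration = frames[-1]["t"] - frames[0]["t"] if frames else 0.0
    return {"num_frames": len(frames),
            "joint_counts": joint_counts,
            "duration_s": duration}


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "hand_pose.json"
    make_fake_sample(sample_path)
    summary = summarize_hand_pose(sample_path)

print(summary)
```

If the joint count varies between frames, or the duration does not match the MP4, that is worth noting before trusting the auto-generated labels.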
LAB
Hour 3 — Lab: extract hand pose from a 30-second clip (60 min)
What you're building. Take an egocentric video clip you record yourself (point your phone at your hands while making a sandwich, ~30 s). Run HaMeR (open-source 3D hand pose reconstruction) on it. Output a 30-second timeline of 3D hand keypoints.
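One way to think about the lab's output is as a list of timestamped frames, each holding either a set of 3D keypoints or nothing when no hand was detected. The sketch below assumes 21 keypoints per hand (the common MANO-style convention) and a 30 fps clip; `HandFrame`, `build_timeline`, and `detection_rate` are hypothetical helper names, and the detections are synthetic stand-ins rather than HaMeR output.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
NUM_KEYPOINTS = 21  # assumption: 21 keypoints per hand (MANO-style convention)


@dataclass
class HandFrame:
    t: float                             # timestamp in seconds
    keypoints: Optional[List[Point3D]]   # None when no hand was detected


def build_timeline(per_frame: Sequence[Optional[Sequence[Point3D]]],
                   fps: float) -> List[HandFrame]:
    """Attach timestamps to per-frame keypoints, keeping gaps as None."""
    timeline = []
    for i, kps in enumerate(per_frame):
        if kps is not None and len(kps) != NUM_KEYPOINTS:
            raise ValueError(f"frame {i}: expected {NUM_KEYPOINTS} keypoints, got {len(kps)}")
        timeline.append(HandFrame(t=i / fps,
                                  keypoints=list(kps) if kps is not None else None))
    return timeline


def detection_rate(timeline: Sequence[HandFrame]) -> float:
    """Fraction of frames in which a hand was detected."""
    if not timeline:
        return 0.0
    return sum(frame.keypoints is not None for frame in timeline) / len(timeline)


# Synthetic stand-in for a 30 s clip at 30 fps, with the hand missed every 10th frame.
FPS = 30.0
fake_hand = [(0.0, 0.0, 0.5)] * NUM_KEYPOINTS
detections = [None if i % 10 == 0 else fake_hand for i in range(900)]

timeline = build_timeline(detections, FPS)
print(len(timeline), round(timeline[-1].t, 3), detection_rate(timeline))
```

Keeping missed detections as explicit gaps, rather than dropping those frames, preserves the alignment between the timeline and the original video.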
Step 1 — Install HaMeR (15 min)
# --recursive also fetches the repository's third-party submodules
git clone --recursive https://github.com/geopavlakos/hamer
cd hamer
uv pip install -e .
huggingface-cli download geopavlakos/hamer --local-dir checkpoints/hamer
Step 2 — Record + run (30 min)
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.