Day 14

FoundationPose + Week 2 integration + fresh-clone

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

  • 6-DoF pose estimation — Find an object's full SE(3) pose (position + orientation) from sensors. Standard input to grasping and manipulation policies.
  • FoundationPose — NVIDIA's 2024 universal 6-DoF pose estimator. Two modes: model-based (needs a CAD model) and model-free (needs a few reference images). Published at CVPR 2024 (Highlight).
  • Refinement vs. tracking — Estimate: compute a pose from scratch given an image and depth. Track: refine the pose starting from the previous frame's estimate. Tracking is faster (10× speedup) but requires a good initialization.
  • OnePose / OnePose++ — Earlier model-free pose estimators. Used for comparison.
  • Mesh model — A .obj or .glb file describing the object's 3D geometry. CAD-based pose estimators need this.
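To make "SE(3) pose (position + orientation)" concrete, here is a minimal sketch of one common representation: a 4x4 homogeneous matrix built from a position and a unit quaternion. The helper names (quat_to_rot, make_pose, invert_pose) and the example numbers are illustrative assumptions, not part of FoundationPose.

```python
import numpy as np

def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    q = np.array([qx, qy, qz, qw], dtype=float)
    q /= np.linalg.norm(q)  # normalize so the result is a proper rotation
    x, y, z, w = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ])

def make_pose(position, quat_xyzw):
    """Pack position + orientation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rot(*quat_xyzw)
    T[:3, 3] = position
    return T

def invert_pose(T):
    """Invert a rigid transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Hypothetical example: object 0.5 m ahead, rotated 90 degrees about z.
half = np.pi / 4
T = make_pose([0.5, 0.0, 0.1], [0.0, 0.0, np.sin(half), np.cos(half)])
```

The six degrees of freedom are the three translation components plus the three rotational ones; the 4x4 matrix is just a convenient container that composes by matrix multiplication.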

Real-world analogy

FoundationPose is "you've never seen this exact teapot, but you've seen 10 teapots before; given one CAD file or 5 photos, where exactly is it sitting in the scene?"

Hour 1 — Reading

Hour 2 — Setup + run reference example (45 min)

NVIDIA's repo includes a reference example with the YCB-Video dataset. Follow:

git clone https://github.com/NVlabs/FoundationPose
cd FoundationPose
docker pull shingarey/foundationpose:latest  # or build per README
docker run --gpus all -v "$(pwd)":/foundationpose -it shingarey/foundationpose:latest bash  # quotes keep paths with spaces intact

Inside container:

cd /foundationpose
python run_demo.py  # demo entry point in the NVlabs repo; download the weights and demo data per the README first

This runs the bundled demo, which estimates the pose of a mustard bottle in 5 reference images. Expected output: a sequence of overlay images in debug/ showing the predicted pose drawn as a 3D bounding box around the bottle. Latency: ~200 ms per refine step on an H100.
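The estimate-versus-track distinction from the glossary can be sketched as a per-frame loop that estimates once and then tracks, timing each step so you can compare against the latency quoted above. The callables estimate_fn and track_fn are hypothetical stand-ins for whatever estimator you wire in; they are not FoundationPose's API.

```python
import time

def run_sequence(frames, estimate_fn, track_fn):
    """Estimate on the first usable frame, then track on later frames.

    estimate_fn(frame) -> pose or None        (hypothetical: from-scratch estimate)
    track_fn(frame, prev_pose) -> pose        (hypothetical: refine from prior pose)
    Returns a list of (pose, seconds) tuples, one per frame.
    """
    results = []
    pose = None
    for frame in frames:
        start = time.perf_counter()
        if pose is None:
            # No prior pose yet (or estimation failed): do the slow full estimate.
            pose = estimate_fn(frame)
        else:
            # A prior pose exists: the faster tracking step can start from it.
            pose = track_fn(frame, pose)
        results.append((pose, time.perf_counter() - start))
    return results

# Stand-in callables so the loop runs without a GPU or model weights.
demo = run_sequence(
    frames=[0, 1, 2],
    estimate_fn=lambda frame: 0.0,
    track_fn=lambda frame, prev: prev + 1.0,
)
demo_poses = [pose for pose, _ in demo]
```

Because tracking depends on a good initialization, a real loop would likely also reset pose to None when tracking diverges, which forces a fresh estimate on the next frame.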

LAB

Hour 3 — Lab: integrate Week 2 + fresh-clone test (75 min)

What you're building. A combined integration script that ties Week 2 together. It loads a TurtleBot4 sim, captures one image from the onboard camera, runs the Day 13 perception stack, runs FoundationPose on a target object (a YCB sugar box dropped into the warehouse), and reports the object's 6-DoF pose in the robot frame. A fresh-clone test then verifies that Day 8 (chatter), Day 9 (URDF), and Day 13 (perception) all reproduce.
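A pose estimator typically reports the object in the camera frame, so reporting it "in the robot frame" means composing with the camera's pose on the robot. The sketch below shows that composition with plain NumPy. The camera mounting offset and object position are made-up numbers, and the frame names are assumptions; in the real lab the camera-to-base transform would come from the robot's URDF or TF tree.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Rotation taking camera optical axes (x right, y down, z forward)
# into robot base axes (x forward, y left, z up). Columns are the
# camera axes expressed in the robot frame.
R_robot_cam = np.array([
    [0.0,  0.0, 1.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])

# Assumed mounting: camera 0.1 m ahead of and 0.2 m above the base origin.
T_robot_cam = make_transform(R_robot_cam, [0.1, 0.0, 0.2])

# Assumed estimator output: object 1 m along the optical axis, 5 cm below it.
T_cam_obj = make_transform(np.eye(3), [0.0, 0.05, 1.0])

# Chain the transforms: robot <- camera <- object.
T_robot_obj = T_robot_cam @ T_cam_obj
position_in_robot = T_robot_obj[:3, 3]
```

Reading the subscripts right to left helps catch ordering mistakes: the inner frame names (cam, cam) must match where two transforms meet.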

What success looks like at the end. You have:

  1. src/day14_w2_integration.py runs and reports a 6-DoF pose for the target.
  2. figures/day14_pose_overlay.png shows the predicted pose as a 3D bounding box on the camera image.
  3. RETRO_w2.md documents fresh-clone reproduction of three earlier days.
  4. Repo w2-systems/ pushed to GitHub with all artifacts.
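Drawing the pose overlay comes down to projecting the eight corners of the object's bounding box into the image with a pinhole camera model. The following is a minimal sketch of that projection only (no drawing). The intrinsics K, the box size, and the function names are illustrative assumptions, not values from the TurtleBot4 camera or the FoundationPose code.

```python
import numpy as np

def box_corners(extents):
    """Return the 8 corners of a box centred on the object origin, shape (8, 3)."""
    half = np.asarray(extents, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1)
                      for sy in (-1, 1)
                      for sz in (-1, 1)], dtype=float)
    return signs * half

def project_box(T_cam_obj, extents, K):
    """Project box corners into pixel coordinates, shape (8, 2)."""
    corners_obj = box_corners(extents)
    # Move corners from the object frame into the camera frame.
    corners_cam = corners_obj @ T_cam_obj[:3, :3].T + T_cam_obj[:3, 3]
    # Pinhole projection: apply intrinsics, then divide by depth.
    uvw = corners_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Assumed intrinsics for a 640x480 image (focal length 600 px).
K = np.array([
    [600.0, 0.0, 320.0],
    [0.0, 600.0, 240.0],
    [0.0, 0.0, 1.0],
])

# Assumed pose: a 20 cm cube, unrotated, 1 m in front of the camera.
T_cam_obj = np.eye(4)
T_cam_obj[:3, 3] = [0.0, 0.0, 1.0]

pixels = project_box(T_cam_obj, [0.2, 0.2, 0.2], K)
```

Connecting the projected corners with line segments (for example with an image library of your choice) yields the wireframe box for figures/day14_pose_overlay.png. This sketch assumes every corner has positive depth; corners behind the camera would need clipping.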

Step 1 — Spawn an object in TB4 sim (15 min)

In the running TB4 sim, drop a YCB-style box at a known pose:

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
