Gemini Robotics + Robot Academy IBVS primer

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (10 min)

Gemini Robotics — Google DeepMind, 2025. Robotics adaptation of Gemini 2.0/2.5. Native multimodal (image + video + audio + text → actions).
Gemini Robotics-ER — "Embodied Reasoning" variant. Spatial reasoning, scene understanding, plan generation.
Gemini Robotics-ER 1.6 — Apr 14, 2026 release. Latest ER. Stronger spatial grounding.
Native multimodal — Trained from scratch on mixed image/text/audio/video tokens, not "text LLM + vision adapter."
VLM-as-policy — Use a VLM directly as a policy, emitting end-effector (EE) deltas in language form (e.g. "move +0.05 m in x"). Gemini Robotics-ER does this.
IBVS (Image-Based Visual Servoing) — Classical control: drive image features (e.g. the pixel position of an object) to target values using the image Jacobian (interaction matrix), which maps camera velocity to feature velocity. Predates VLAs by roughly 30 years; conceptually similar to "a VLA outputs EE deltas given vision".
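For reference, the standard textbook form of the IBVS law for point features is sketched below. Here (x, y) are the normalized image coordinates of a point at depth Z, s stacks the features of all tracked points, L_s stacks one 2×6 block L_x per point, v_c = (v, ω) is the camera's linear and angular velocity, and λ > 0 is a gain. The pseudo-inverse drives the error toward zero roughly exponentially.

```latex
\dot{\mathbf{s}} = \mathbf{L}_s \mathbf{v}_c, \qquad
\mathbf{e} = \mathbf{s} - \mathbf{s}^*, \qquad
\mathbf{v}_c = -\lambda \, \mathbf{L}_s^{+} \mathbf{e}
\quad\Rightarrow\quad
\dot{\mathbf{e}} = -\lambda \, \mathbf{L}_s \mathbf{L}_s^{+} \mathbf{e}

\mathbf{L}_x =
\begin{bmatrix}
-1/Z & 0 & x/Z & xy & -(1 + x^2) & y \\
0 & -1/Z & y/Z & 1 + y^2 & -xy & -x
\end{bmatrix}
```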

Real-world analogy

Gemini Robotics is "Tesla Autopilot": vertically integrated, proprietary, fed by enormous private data. ER 1.6 is the latest "FSD beta" with sharper spatial reasoning.

Hour 1 — Robot Academy IBVS primer (visual intuition first)

Watch the Visual Servoing masterclass, focusing on the Image-Based VS lessons (~35 min): https://robotacademy.net.au/masterclass/vision-and-motion/

Why now? IBVS predates VLAs by decades, but the conceptual loop ("vision → EE delta") is identical. Modern policies can be read as IBVS with a learned visual-feature Jacobian. Watching Corke's animated IBVS demos makes "what does Gemini Robotics-ER actually do?" click.
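To make the "vision → EE delta" loop concrete, here is a minimal NumPy simulation of classical point-feature IBVS. It assumes a calibrated pinhole camera, normalized image coordinates, and known point depths (real systems have to estimate Z). The function names, gain, time step, and starting pose are illustrative choices, not taken from the masterclass.

```python
import numpy as np


def interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2x6 image Jacobian of one normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])


def project(P: np.ndarray) -> np.ndarray:
    """Pinhole projection of Nx3 camera-frame points to normalized (x, y)."""
    return P[:, :2] / P[:, 2:3]


def ibvs_twist(P: np.ndarray, s_star: np.ndarray, gain: float = 0.5):
    """Classical IBVS law v = -gain * pinv(L) @ (s - s*), using true depths."""
    s = project(P)
    L = np.vstack([interaction_matrix(x, y, Z) for (x, y), Z in zip(s, P[:, 2])])
    e = (s - s_star).reshape(-1)
    return -gain * np.linalg.pinv(L) @ e, float(np.linalg.norm(e))


def move_camera(P: np.ndarray, twist: np.ndarray, dt: float) -> np.ndarray:
    """Euler step: static points seen from a moving camera obey dP/dt = -v - w x P."""
    v, w = twist[:3], twist[3:]
    return P + dt * (-v - np.cross(w, P))


# Target: a 0.2 m square seen fronto-parallel at Z = 1 m, so s* equals its (X, Y).
square = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
s_star = square.copy()
# Start displaced: the square is shifted sideways and 0.2 m further away.
P = np.column_stack([square + [0.05, -0.03], np.full(4, 1.2)])

errors = []
for _ in range(200):
    twist, err = ibvs_twist(P, s_star)
    errors.append(err)
    P = move_camera(P, twist, dt=0.05)

print(f"feature error: {errors[0]:.4f} -> {errors[-1]:.6f}")
```

Replacing `interaction_matrix` with a learned mapping from pixels to motion is, loosely, the analogy the paragraph above draws.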

Hour 2 — Reading

Gemini Robotics announcement (Mar 2025) (~20 min): https://deepmind.google/discover/blog/introducing-gemini-robotics/
Gemini Robotics-ER 1.6 announcement (Apr 14, 2026) (~25 min): https://deepmind.google/discover/blog/gemini-robotics-er-16/

LAB

Hour 3 — Lab: Gemini Robotics-ER inference via API (75 min)

What you're building. Use Google's Gemini API (which exposes Gemini Robotics-ER 1.6 publicly as of Apr 2026) to run spatial-reasoning queries on images, then use the responses to drive a simulated Panda toward a designated object.
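One way to turn a response into motion is a minimal sketch, assuming the model is prompted to answer in the "move +0.05 m in x" style from the VLM-as-policy glossary entry. The regex, `parse_delta`, and the 5 cm per-axis clamp are hypothetical choices for illustration, not a documented Gemini output format.

```python
import re

import numpy as np

AXES = {"x": 0, "y": 1, "z": 2}
# Hypothetical reply phrasing, e.g. "move +0.05 m in x" (not a documented format).
DELTA_RE = re.compile(r"move\s+([+-]?\d*\.?\d+)\s*m\s+in\s+([xyz])", re.IGNORECASE)


def parse_delta(text: str, max_step: float = 0.05) -> np.ndarray:
    """Sum every 'move <d> m in <axis>' phrase into one EE translation, then clip."""
    delta = np.zeros(3)
    for amount, axis in DELTA_RE.findall(text):
        delta[AXES[axis.lower()]] += float(amount)
    # Safety clamp: never command more than max_step metres per axis per tick.
    return np.clip(delta, -max_step, max_step)


print(parse_delta("Move -0.02 m in z, then move 0.01 m in y"))
```

Each tick, add the clipped delta to the Panda's current EE position and pass the result to your simulator's IK or Cartesian controller. The clamp keeps one bad parse or an overconfident answer from producing a large jump.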

Step 1 — Setup API key (10 min)

uv pip install google-generativeai
export GOOGLE_API_KEY=<your-key>  # from https://ai.google.dev

Step 2 — Spatial reasoning query (30 min)

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
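The committed source for this step is not reproduced here. What a pointing query could look like is shown as a minimal sketch, assuming the legacy `google-generativeai` SDK installed in Step 1 and the pointing convention documented for earlier Gemini Robotics-ER releases (JSON points as [y, x], normalized to 0-1000); confirm both against the 1.6 docs. The prompt wording and helper names are hypothetical, and `model_id` is left for you to fill in from the Gemini API model list.

```python
import json
import os
import re

# Assumed convention from earlier ER releases: points are [y, x] in 0-1000.
POINT_PROMPT = (
    "Point to the {target} in the image. "
    'Answer as JSON: [{{"point": [y, x], "label": <label>}}]. '
    "The points are in [y, x] format normalized to 0-1000."
)


def build_prompt(target: str) -> str:
    return POINT_PROMPT.format(target=target)


def parse_points(text: str) -> list[dict]:
    """Extract the JSON list from a model reply, tolerating surrounding prose or fences."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON list in model reply")
    return json.loads(match.group(0))


def to_pixels(point, width: int, height: int) -> tuple[float, float]:
    """Convert a normalized [y, x] point (0-1000) to (u, v) pixel coordinates."""
    y, x = point
    return x / 1000.0 * width, y / 1000.0 * height


def query_er(image_path: str, target: str, model_id: str) -> list[dict]:
    # Needs `google-generativeai` and Pillow; imported lazily so the helpers run without them.
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_id)
    response = model.generate_content([Image.open(image_path), build_prompt(target)])
    return parse_points(response.text)
```

The pixel target from `to_pixels` is the image-space goal that the rest of the lab drives the Panda toward.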
