Day 42

Week 6 retro + Track B design + R²-Dreamer notes

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Hour 1 — Capstone Track B pre-design (40 min)

Track B is "World model": train a small JEPA-style action-conditioned world model on a custom dataset, then evaluate prediction quality and downstream reinforcement learning (RL) performance.

docs/day42_track_b_design.md:

# Track B: World model on a custom dataset

## Hypothesis
"A 100M-param JEPA-style world model trained on 50 hours of MuJoCo Playground
Spot Joystick rollouts will predict next-step embeddings with cosine ≥ 0.85 at
horizon 1 and ≥ 0.6 at horizon 10."

## Variables
- Independent variable (IV): model size (10M / 100M / 300M params)
- Dependent variable (DV): cosine similarity at horizons 1, 5, 10, 30

## Pipeline
1. Generate 50h of Spot rollouts with the policy trained on Day 23 → save (obs, action, next_obs).
2. Train a V-JEPA-style predictor on this data (LeWorldModel codebase).
3. Evaluate prediction quality vs horizon (Day 37 methodology).
4. Stretch: use the world model in a Dreamer-style RL loop on a *new* task.
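
Step 3 of the pipeline can be sketched as an open-loop rollout scored by cosine similarity at each horizon. This is a minimal illustration, not the LeWorldModel or Day 37 code: the `predict(z, a)` interface, the toy additive dynamics, and the biased predictor are all assumptions made up for the example.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def open_loop_cosine(predict, embeddings, actions, horizons):
    """Roll the predictor forward from embeddings[0] without re-anchoring.

    predict(z, a) -> next embedding (hypothetical predictor interface).
    embeddings[t] is the target at step t; actions[t] moves step t to t + 1.
    Returns {horizon: cosine(predicted, target)}.
    """
    wanted = set(horizons)
    z = embeddings[0]
    scores = {}
    for t in range(max(wanted)):
        z = predict(z, actions[t])  # feed the prediction back in (open loop)
        if t + 1 in wanted:
            scores[t + 1] = cosine(z, embeddings[t + 1])
    return scores

# Toy data (assumption): true dynamics are z' = z + a.
rng = random.Random(0)
dim, steps = 8, 30
actions = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(steps)]
embeddings = [[rng.gauss(0, 1) for _ in range(dim)]]
for a in actions:
    embeddings.append([z + da for z, da in zip(embeddings[-1], a)])

def biased_predict(z, a):
    """Stand-in predictor with a small constant error per step."""
    return [zi + ai + 0.02 for zi, ai in zip(z, a)]

scores = open_loop_cosine(biased_predict, embeddings, actions, [1, 5, 10, 30])
for h in sorted(scores):
    print(f"horizon {h:2d}: cosine {scores[h]:.3f}")
```

Swapping `biased_predict` for the trained predictor and `embeddings` for encoder outputs on held-out rollouts would give the horizon table the hypothesis refers to.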

## Compute
- 1× H100, ~6 hours per model size. 18 hours total.

## Risk
- 50h of data may be too little for 300M params. Kill criterion: if 100M val cos < 0.7 at horizon 1, simplify task.
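
The thresholds in the design doc can be written down as a small gate so the go/no-go decision is mechanical. This is a sketch under stated assumptions: the function name, the return strings, and the `val_cos` dictionary layout are hypothetical; only the numbers come from the doc above.

```python
# Numbers copied from the design doc.
HYPOTHESIS = {1: 0.85, 10: 0.60}   # horizon -> minimum cosine
KILL_THRESHOLD_H1 = 0.70           # kill criterion for the 100M model
MODEL_SIZES = ["10M", "100M", "300M"]
HOURS_PER_SIZE = 6                 # approximate, 1x H100

total_gpu_hours = len(MODEL_SIZES) * HOURS_PER_SIZE  # 3 * 6 = 18

def review_run(val_cos):
    """Classify a 100M run from its validation cosine per horizon.

    val_cos: {horizon: cosine}, must contain horizons 1 and 10.
    """
    if val_cos[1] < KILL_THRESHOLD_H1:
        return "kill: simplify task"
    if all(val_cos[h] >= thr for h, thr in HYPOTHESIS.items()):
        return "hypothesis supported"
    return "continue: hypothesis not met"

print(total_gpu_hours)
print(review_run({1: 0.80, 10: 0.50}))
```

Note the gap between the kill threshold (0.70) and the hypothesis (0.85) at horizon 1: a run in that band neither kills the project nor confirms the hypothesis.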

Hour 2 — R²-Dreamer notes + comparison (40 min)

Read the R²-Dreamer paper if you didn't on Day 38. Add to your Week 6 docs:

# R²-Dreamer additions over DreamerV3

| Component | DreamerV3 | R²-Dreamer | Why |
|---|---|---|---|
| Action delay | None | Explicit 1-4 frame delay in world model | Real motors have 20-80 ms of delay |
| Observation lag | None | Asymmetric obs/action timing | Cameras lag motors |
| Sim-to-real | Implicit | Online residual on world-model prediction | Closes specific kinds of gap |
| Sample efficiency | High | Slightly lower (more constraints) | Trade-off |
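
To make the "Action delay" row concrete, here is a generic FIFO delay buffer plus the milliseconds-to-frames conversion. It is not taken from the R²-Dreamer code; the 50 Hz control rate is an assumption, chosen because it maps 20-80 ms onto the 1-4 frames in the table, and the class and function names are hypothetical.

```python
import math
from collections import deque

def delay_frames(delay_ms, control_hz):
    """Number of whole control frames needed to cover delay_ms."""
    return math.ceil(delay_ms * control_hz / 1000)

class ActionDelay:
    """Returns each action `frames` steps after it was issued."""

    def __init__(self, frames, idle_action):
        # Pre-fill with an idle action so the first `frames` steps are defined.
        self._queue = deque([idle_action] * frames)

    def step(self, action):
        self._queue.append(action)
        return self._queue.popleft()

# Assumed 50 Hz control loop: one frame lasts 20 ms.
print(delay_frames(20, 50), delay_frames(80, 50))

delay = ActionDelay(frames=2, idle_action=0.0)
applied = [delay.step(a) for a in [1.0, 2.0, 3.0, 4.0]]
print(applied)
```

In a world-model training loop, the delayed action (not the commanded one) is what the simulated motors would receive at each step.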

LAB

Hour 3 — Week 6 retro (45 min)

RETRO_w6.md:

# Week 6 retro

## Numbers
| Day | Method | Result |
|---|---|---|
| 36 | V-JEPA 2-AC zero-shot | mean cos 0.87 on 32-frame ALOHA clip |
| 37 | V-JEPA 2-AC drift | step-1 0.89 → step-30 0.40 (open-loop) |
| 38 | DreamerV3 vs PPO | DreamerV3 798 vs PPO 310 on dmc_walker_walk @ 1M steps |
| 39 | Humanoid WBC | 5-method comparison table; toy VAE + classifier guidance |
| 40 | Allegro toy | cube stays in hand (no full reorient) |
| 41 | EgoScale + HaMeR | 30-s clip → 3D hand pose timeline |

## What I learned
1. JEPA's "predict embeddings, not pixels" is principled and practical. The cosine
   numbers feel right: 0.87 1-step is what you'd expect for a frozen pretrained model.
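2. Drift compounds. Open-loop prediction is close to unusable by step 30 (cosine 0.40 on Day 37); closed-loop prediction, which re-anchors on real observations, is almost always usable.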
3. DreamerV3's 5× sample efficiency over PPO is real, and it is probably a low
   estimate for harder tasks.
4. Humanoid WBC is its own field. Five major papers in 12 months.
5. Dexterous still mostly works through reward engineering (Eureka) and data
   augmentation (DexMimicGen), not raw scale.
6. EgoScale-style data is the next paradigm. Hand tracking from egocentric video
   is the cheapest robot data ever.
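
One caveat on item 3: the Day 38 row reports returns at a fixed 1M steps (798 vs 310, a return ratio of about 2.6×), which is a different quantity from sample efficiency, the ratio of environment steps needed to reach the same return. The sketch below computes both. The intermediate learning-curve points are placeholders invented for illustration, not measurements; only the 1M-step endpoints come from the table.

```python
def return_ratio(return_a, return_b):
    """Ratio of final returns at the same step budget."""
    return return_a / return_b

def steps_to_reach(curve, target):
    """First env-step count at which the return reaches target, else None.

    curve: list of (env_steps, return) pairs sorted by env_steps.
    """
    for steps, ret in curve:
        if ret >= target:
            return steps
    return None

def sample_efficiency_gain(curve_fast, curve_slow, target):
    """How many times fewer steps curve_fast needs to reach target."""
    fast = steps_to_reach(curve_fast, target)
    slow = steps_to_reach(curve_slow, target)
    if fast is None or slow is None:
        return None
    return slow / fast

# Endpoints at 1M steps are from the retro table; earlier points are
# PLACEHOLDERS, replace them with the logged Day 38 curves.
dreamer_curve = [(200_000, 320.0), (500_000, 600.0), (1_000_000, 798.0)]
ppo_curve = [(200_000, 90.0), (500_000, 180.0), (1_000_000, 310.0)]

print(round(return_ratio(798.0, 310.0), 2))
print(sample_efficiency_gain(dreamer_curve, ppo_curve, target=310.0))
```

With real curves, the gain depends on the target return chosen, so it is worth reporting the target alongside the multiplier.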

## What still confuses me
- Why does symlog matter so much for DreamerV3 stability? Is there a clean
  theoretical explanation or is it purely empirical?
- BeyondMimic's classifier guidance: does it work because the latent space is
  "compositional" or because diffusion is naturally good at sampling unusual
  combinations? My toy VAE wasn't expressive enough to test.
- How do hand-tracking errors propagate when training a policy on extracted
  egocentric data? A 5° wrist error seems likely to compound into a real failure mode.
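
For the symlog question, it helps to have the function itself in front of you. DreamerV3 defines symlog(x) = sign(x) · ln(|x| + 1) with inverse symexp(x) = sign(x) · (exp(|x|) − 1). The sketch below only shows how the transform compresses large magnitudes while staying close to the identity near zero; it does not settle the theoretical question.

```python
import math

def symlog(x):
    """sign(x) * ln(|x| + 1): compresses large values, ~identity near 0."""
    return math.copysign(math.log1p(abs(x)), x)

def symexp(y):
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)

for value in [-1000.0, -1.0, 0.0, 0.01, 1.0, 1000.0]:
    squashed = symlog(value)
    print(f"{value:>8}: symlog = {squashed:+.4f}, round trip = {symexp(squashed):+.4f}")
```

A reward of 1000 and a reward of 1 end up within an order of magnitude of each other after the transform, so regression targets across tasks with very different reward scales land in a similar range.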

Step 4 — Fresh-clone test + commit (15 min)

cd /tmp && rm -rf w6-test
git clone <your-w6-frontier-url> w6-test
cd w6-test
uv venv --python 3.12 .venv && source .venv/bin/activate
uv pip install -r requirements.txt
python src/day36_vjepa2_ac.py 2>&1 | tee fresh_run.log
grep "mean cosine" fresh_run.log
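
The final `grep` can be turned into a pass/fail check against the retro table. This is a sketch only: the log line format (`mean cosine` followed by a separator and a number) is an assumption, the 0.02 tolerance is arbitrary, and the function names are hypothetical. If the real log line has another number before the score, the pattern needs adjusting.

```python
import re

# Assumes a line like "mean cosine: 0.8712" or "mean cosine = 0.87".
_PATTERN = re.compile(r"mean cosine[^0-9\-]*(-?\d+(?:\.\d+)?)")

def parse_mean_cosine(log_text):
    """Return the first 'mean cosine' value in the log, or None."""
    match = _PATTERN.search(log_text)
    return float(match.group(1)) if match else None

def matches_retro(log_text, expected=0.87, tol=0.02):
    """True if the logged value is within tol of the Day 36 retro number."""
    value = parse_mean_cosine(log_text)
    return value is not None and abs(value - expected) <= tol

sample_log = "loading clip...\nmean cosine: 0.8712\ndone\n"
print(parse_mean_cosine(sample_log), matches_retro(sample_log))
```

In the fresh clone, the input would be `pathlib.Path("fresh_run.log").read_text()` instead of `sample_log`.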

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
