Day 34

RDT-1B + CogACT + comparison reflection

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (8 min)

  • RDT-1B (Robotics Diffusion Transformer-1B) — Tsinghua / Shanghai AI Lab 2024. Bimanual specialist, 1B params, DiT-style architecture.
  • DiT — Diffusion Transformer. Originally from the image generation world; used here to generate action chunks.
  • CogACT — Tsinghua 2024. Lightweight (0.6B) Vision-Language-Action model (VLA); SigLIP encoder, ACT-style action head.
  • OFT (vs DiT) — OpenVLA-OFT uses a regression head; RDT uses diffusion. Trade-off: regression is faster at inference, while diffusion better represents multimodal action distributions (several distinct, equally valid ways to perform the same step).
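The regression-versus-diffusion trade-off in the last glossary entry can be illustrated with a toy example. This is a minimal sketch, not either model's actual head: it assumes a one-dimensional action, a squared-error regressor (which converges to the mean of its targets), and uses resampling of the demonstrations as a stand-in for a trained diffusion sampler. All names are hypothetical.

```python
import random
import statistics

random.seed(0)

# Toy demonstrations: to pass an obstacle, experts steer left (-1.0) or
# right (+1.0) with a little noise. Both choices are valid.
demos = [random.choice([-1.0, 1.0]) + random.gauss(0.0, 0.05) for _ in range(1000)]

# Assumption: a regression head trained with squared error outputs the mean
# of its targets, which here lies between the two modes (straight at the obstacle).
regression_action = statistics.fmean(demos)

# Assumption: a generative head samples from the action distribution.
# Resampling the demos stands in for a trained diffusion sampler.
sampled_actions = [random.choice(demos) for _ in range(5)]

print(f"regression output: {regression_action:+.2f}")
print("sampled outputs:", [f"{a:+.2f}" for a in sampled_actions])
```

The regression output lands near zero, an action no expert demonstrated, while each sampled action commits to one of the two modes. The cost is that a real diffusion head needs several denoising steps per action chunk, which is where the latency difference comes from.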

Hour 1 — Reading

Hour 2 — Run RDT-1B inference

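# Clone the RDT repository and install it in editable mode
git clone https://github.com/thu-ml/RoboticsDiffusionTransformer
cd RoboticsDiffusionTransformer
uv pip install -e .
# Download the RDT-1B checkpoint from the Hugging Face Hub
huggingface-cli download robotics-diffusion-transformer/rdt-1b --local-dir checkpoints/rdt
# Run inference on the ALOHA insertion evaluation task
python scripts/inference.py --checkpoint checkpoints/rdt --task aloha_insertion_eval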
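To fill in the inference latency column of the comparison table in the lab, one option is to time the policy call directly. The following is a minimal sketch: `measure_latency_ms` and `dummy_policy_step` are hypothetical names, and the dummy simply sleeps in place of a real forward pass. On a GPU, kernels run asynchronously, so a real measurement would also need a synchronization call (for example `torch.cuda.synchronize()`) inside the timed function.

```python
import statistics
import time


def measure_latency_ms(step_fn, warmup=3, iters=20):
    """Return (median, p95) wall-clock latency of step_fn in milliseconds."""
    for _ in range(warmup):
        step_fn()  # warm-up calls are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style p95; with few samples this is close to the maximum.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


def dummy_policy_step():
    # Hypothetical stand-in for one policy forward pass.
    time.sleep(0.005)


median_ms, p95_ms = measure_latency_ms(dummy_policy_step)
print(f"median {median_ms:.1f} ms, p95 {p95_ms:.1f} ms")
```

Reporting the median rather than the mean keeps one slow outlier (such as a first-call compilation) from skewing the number you write down.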

LAB

Hour 3 — VLA-arsenal comparison reflection (75 min)

What you're building. A summary document comparing every VLA you have run hands-on so far: SmolVLA (Day 18), OpenVLA-OFT (Day 20), π0 (Day 30), GR00T N1 (Day 31), and now RDT-1B and CogACT.

Create docs/day34_vla_comparison.md:

# VLA arsenal comparison (your hands-on results)

| Model | Task | Zero-shot success rate | Success rate after LoRA (5k steps) | Inference latency | Memory |
|---|---|---|---|---|---|
| SmolVLA (Day 18) | LIBERO-Spatial | 0.36 | 0.79 | 80 ms | 12 GB |
| OpenVLA-OFT (Day 20) | LIBERO 10ep | 0.30 | 0.70 | 50 ms | 18 GB |
| π0 (Day 30) | LIBERO-Spatial | 0.40 | 0.78 | 90 ms | 15 GB |
| GR00T N1 (Day 31) | Humanoid lift | 0.30 | 0.55 | 110 ms | 14 GB |
| RDT-1B (Day 34) | ALOHA insertion | 0.35 | – | 130 ms | 8 GB |
| CogACT (Day 34) | ALOHA insertion | 0.42 | – | 30 ms | 6 GB |

## Decision tree
- Bimanual fine manipulation: RDT-1B
- Humanoid: GR00T N1.6
- Real-time, lightweight: CogACT
- Open-weights, proven: π0
- Consumer GPU: SmolVLA or OpenVLA-OFT
- API-only, but with reasoning: Gemini Robotics-ER

## Key takeaway
As a rule of thumb, for roughly 80% of imitation-learning tasks, any of these models is likely to work after LoRA fine-tuning. The real question is which one fits
your *deployment* constraints (compute, memory, latency, embodiment).
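The key takeaway can be turned into a small filter over the comparison table. This is a sketch under stated assumptions: the latency and memory figures are copied from the template table above and should be replaced with your own measurements, and `VlaResult` and `fits` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VlaResult:
    name: str
    task: str
    latency_ms: float
    memory_gb: float


# Latency and memory copied from the comparison table; swap in your own numbers.
RESULTS = [
    VlaResult("SmolVLA", "LIBERO-Spatial", 80, 12),
    VlaResult("OpenVLA-OFT", "LIBERO 10ep", 50, 18),
    VlaResult("π0", "LIBERO-Spatial", 90, 15),
    VlaResult("GR00T N1", "Humanoid lift", 110, 14),
    VlaResult("RDT-1B", "ALOHA insertion", 130, 8),
    VlaResult("CogACT", "ALOHA insertion", 30, 6),
]


def fits(max_latency_ms, max_memory_gb, results=RESULTS):
    """Return names of models within both budgets, fastest first."""
    ok = [
        r for r in results
        if r.latency_ms <= max_latency_ms and r.memory_gb <= max_memory_gb
    ]
    return [r.name for r in sorted(ok, key=lambda r: r.latency_ms)]


# Example: a 100 ms control budget on a 12 GB GPU.
print(fits(max_latency_ms=100, max_memory_gb=12))
```

The filter only covers latency and memory. Embodiment (bimanual, humanoid, single arm) still has to be matched by hand using the decision tree, because the rows were measured on different tasks.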

Deliverable checklist

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.
