Quadruped locomotion in depth: Go1 sim-to-real basics

This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.

LECTURE & READING

Glossary primer (12 min)

Go1 / Go2 — Unitree's $3K-5K quadrupeds. Main consumer-research platform.
Action space (joints) — 12-dim: 4 legs × (hip abduction, hip pitch, knee). PD targets in radians.
Observation space — Typically 48-d: joint pos/vel + IMU (gravity vector + ang vel) + last action + command (linear x, linear y, ang z targets).
Joystick command conditioning — The policy takes a 3-d command (v_x, v_y, ω_z) as input. Commands are randomized during training and user-set at deploy.
Reward terms — Tracking (match command), stability (don't fall), smoothness (low action variation), foot-slip penalty (avoid foot slipping). Weighted and summed to a scalar.
Curriculum — Start with easy commands (low speed), then gradually increase difficulty as the policy improves.
Sim-to-real gap — Differences between the simulator and the real robot, such as motor delay and mass distribution.
Action filtering — Smooth high-frequency action changes before sending them to the motors. Critical for hardware.
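Two of the glossary entries are easier to see in code. The sketch below tallies one common 48-d observation layout and shows a simple command curriculum. The layout follows the legged_gym convention, which is an assumption here: the items listed in the glossary add up to 45, and this convention adds a 3-d base linear velocity to reach 48. The CommandCurriculum class, its threshold, and its step size are hypothetical and not taken from any library.

```python
import random

# Assumed legged_gym-style layout; the real sizes depend on the environment.
OBS_LAYOUT = {
    "base_lin_vel": 3,
    "base_ang_vel": 3,
    "projected_gravity": 3,
    "command": 3,
    "joint_pos": 12,
    "joint_vel": 12,
    "last_action": 12,
}


def obs_dim(layout):
    """Total observation length for a name -> size layout."""
    return sum(layout.values())


class CommandCurriculum:
    """Hypothetical rule: widen the command range once tracking is good enough."""

    def __init__(self, start_max=0.3, final_max=1.0, step=0.1, threshold=0.8):
        self.max_speed = start_max
        self.final_max = final_max
        self.step = step
        self.threshold = threshold

    def update(self, tracking_score):
        """Raise the cap when the tracking score (0..1) reaches the threshold."""
        if tracking_score >= self.threshold:
            self.max_speed = min(self.final_max, self.max_speed + self.step)
        return self.max_speed

    def sample(self, rng):
        """Draw a random (v_x, v_y, w_z) command inside the current cap.

        The same cap is reused for all three components to keep the sketch short.
        """
        m = self.max_speed
        return (rng.uniform(-m, m), rng.uniform(-m, m), rng.uniform(-m, m))


curriculum = CommandCurriculum()
print("observation size:", obs_dim(OBS_LAYOUT))
print("sample command:", curriculum.sample(random.Random(0)))
```

In a real training loop the tracking score would come from the tracking reward averaged over recent episodes, and the cap would usually be tracked per command dimension.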

Real-world analogy

A policy is a reflex network: given "go forward at 1 m/s, turn left", it outputs joint targets at 50 Hz that produce the desired motion despite slips, bumps, and pushes. Training it is teaching the reflex; deploying it is hoping the real robot's reflexes can handle the same inputs the simulator's did.
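The reflex loop described above can be sketched as one control step: the policy emits 12 actions, a low-pass filter smooths them, the result is turned into joint position targets, and a PD law converts the targets into torques. This is a minimal sketch under stated assumptions: stub_policy stands in for a trained network, and the gains, action scale, and filter coefficient are illustrative values, not Go1 settings.

```python
CONTROL_HZ = 50   # policy rate mentioned in the text
NUM_JOINTS = 12   # 4 legs x 3 joints


def stub_policy(command):
    """Placeholder for a trained network: 12 actions clipped to [-1, 1]."""
    v_x, _v_y, _w_z = command
    a = max(-1.0, min(1.0, 0.5 * v_x))
    return [a] * NUM_JOINTS


def low_pass(prev, new, alpha):
    """Exponential moving average; alpha=1.0 passes the new action unchanged."""
    return [alpha * n + (1.0 - alpha) * p for p, n in zip(prev, new)]


def pd_torque(q_target, q, qd, kp, kd):
    """PD law toward a position target with zero target velocity."""
    return [kp * (qt - qi) - kd * qdi for qt, qi, qdi in zip(q_target, q, qd)]


def control_step(command, q, qd, prev_action, default_pose,
                 action_scale=0.25, alpha=0.8, kp=20.0, kd=0.5):
    """One policy step: action -> filtered action -> joint targets -> torques.

    All numeric defaults are hypothetical and would be tuned per robot.
    """
    action = stub_policy(command)
    filtered = low_pass(prev_action, action, alpha)
    q_target = [d + action_scale * a for d, a in zip(default_pose, filtered)]
    tau = pd_torque(q_target, q, qd, kp, kd)
    return filtered, q_target, tau


zeros = [0.0] * NUM_JOINTS
filtered, q_target, tau = control_step((1.0, 0.0, 0.0), zeros, zeros, zeros, zeros)
print("first joint target (rad):", q_target[0], "torque:", tau[0])
```

On hardware the PD law typically runs on the motor controllers at a much higher rate than the 50 Hz policy, so the policy only has to supply the targets.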

Hour 1 — Reading

Learning to walk in minutes using massively parallel deep reinforcement learning (Rudin et al. 2022, ETH) — abstract + Section 3 (~25 min): https://arxiv.org/abs/2109.11978
Walk These Ways (MIT) — abstract + visualization (~15 min): https://gmargo11.github.io/walk-these-ways/
A 2024–2025 quadruped demo paper of your choice, e.g. Extreme Parkour (CMU) or DribbleBot (MIT). Skim the figures.

Hour 2 — Inspect the Go1/Go2 env

from mujoco_playground import registry

# Load the flat-terrain joystick task for Go1.
env = registry.load("Go1JoystickFlatTerrain")
# Sizes may be reported per observation key rather than as a single integer.
print("obs:", env.observation_size, "act:", env.action_size)
# Reward terms are defined in the env source rather than printed here.
# Open env source: ~/.../mujoco_playground/locomotion/go1/joystick.py
# Read for 30 min and find reward_tracking_lin_vel, reward_action_rate, reward_orientation, ...

Read the env's _compute_reward method end-to-end. Nearly every term encodes a finding from the locomotion literature, and the shaping largely determines the gait the policy learns.
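To have a mental model before opening the source, here is a minimal sketch of how shaped terms are usually combined into one scalar. It is not the mujoco_playground implementation: the term names, the scales, and the exponential tracking kernel with its sigma are assumptions chosen to mirror the common pattern of positive tracking rewards plus negatively weighted penalties.

```python
import math

# Hypothetical scales: positive for rewards, negative for penalties.
SCALES = {"tracking_lin_vel": 1.0, "orientation": -5.0, "action_rate": -0.01}


def tracking_lin_vel(cmd_xy, vel_xy, sigma=0.25):
    """Exponential kernel: 1.0 at perfect tracking, decaying with squared error."""
    err = sum((c - v) ** 2 for c, v in zip(cmd_xy, vel_xy))
    return math.exp(-err / sigma)


def orientation(gravity_xy):
    """Squared x/y components of body-frame gravity; zero when the base is level."""
    return sum(g * g for g in gravity_xy)


def action_rate(action, last_action):
    """Squared change between consecutive actions; penalizes jerky commands."""
    return sum((a - b) ** 2 for a, b in zip(action, last_action))


def reward_terms(cmd_xy, vel_xy, gravity_xy, action, last_action):
    """Compute each unweighted term by name."""
    return {
        "tracking_lin_vel": tracking_lin_vel(cmd_xy, vel_xy),
        "orientation": orientation(gravity_xy),
        "action_rate": action_rate(action, last_action),
    }


def total_reward(terms, scales):
    """Weighted sum of the named terms."""
    return sum(scales[name] * value for name, value in terms.items())


terms = reward_terms((0.6, 0.0), (0.5, 0.0), (0.05, 0.0), [0.2] * 12, [0.1] * 12)
print({k: round(v, 4) for k, v in terms.items()})
print("total:", round(total_reward(terms, SCALES), 4))
```

Printing the per-term values alongside the total, as above, is a useful habit when reading a real reward function: it shows which terms dominate at the current scales.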

LAB

Hour 3 — Lab: train Go1 + benchmark vs Spot (75 min)

What you're building. Train Go1 in the same way as Day 23's Spot, then deploy at three different command speeds (0.3, 0.6, 1.0 m/s) and record videos. This is your reference Go1 for the Week 7 capstone Track C.
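Alongside the videos, it helps to record how closely the robot tracks each commanded speed. The sketch below summarizes logged forward velocities for the three speeds named above. How the velocities are logged depends on your rollout code, so the rollouts dictionary here is a hypothetical input with made-up sample values used only to exercise the functions.

```python
from statistics import mean

COMMAND_SPEEDS = (0.3, 0.6, 1.0)  # m/s, from the lab description


def tracking_report(commanded, measured_vx):
    """Summarize one rollout: mean forward speed and its error vs the command."""
    avg = mean(measured_vx)
    err = abs(avg - commanded)
    return {
        "commanded": commanded,
        "mean_vx": avg,
        "abs_error": err,
        "rel_error": err / commanded,
    }


def benchmark(rollouts):
    """rollouts maps each commanded speed to a list of measured forward speeds."""
    return [tracking_report(s, rollouts[s]) for s in COMMAND_SPEEDS]


# Placeholder numbers, not real results.
example_rollouts = {
    0.3: [0.28, 0.30, 0.32],
    0.6: [0.50, 0.50, 0.50],
    1.0: [0.80, 0.90, 1.00],
}
for row in benchmark(example_rollouts):
    print(row)
```

Running the same report on the Day 23 Spot policy gives a like-for-like comparison for the benchmark part of the lab.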

Step 1 — Train (allow up to 60 min; 100M steps)

cd ~/robo47-rl
cp train_spot.py train_go1.py
# Edit ENV_NAME = "Go1JoystickFlatTerrain", NUM_TIMESTEPS = 100_000_000
python train_go1.py

Expected final reward ≈ 30–45. 100M steps at ~80k steps/sec is about 21 min of stepping, so expect ~25 min of wall-clock time in total on an H100.
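The timing figures can be checked with a few lines of arithmetic, which is also handy for estimating run time on other hardware. The helper names below are illustrative, and the calculation ignores compilation and evaluation overhead.

```python
def train_minutes(num_steps, steps_per_sec):
    """Pure stepping time in minutes, ignoring compile and eval overhead."""
    return num_steps / steps_per_sec / 60.0


def required_steps_per_sec(num_steps, budget_minutes):
    """Throughput needed to finish num_steps within a time budget."""
    return num_steps / (budget_minutes * 60.0)


print("minutes at 80k steps/sec:", round(train_minutes(100_000_000, 80_000), 1))
print("steps/sec for a 60 min budget:", round(required_steps_per_sec(100_000_000, 60)))
```

If your measured throughput is well below the second number, reduce NUM_TIMESTEPS or plan for a longer run.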

Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.

Completion controls unlock when this day graduates from placeholder to full lab.