Day 24
Quadruped locomotion in depth: Go1 sim-to-real basics
This is a valid v1.0 placeholder page for the later curriculum arc. Full interactive lab treatment ships after Week 1 dogfooding.
LECTURE & READING
Glossary primer (12 min)
- Go1 / Go2: Unitree's $3K-5K quadrupeds and the main consumer-grade research platform.
- Action space (joints): 12-dim, 4 legs × (hip abduction, hip pitch, knee). The actions are PD position targets in radians.
- Observation space: Typically 48-d: joint pos/vel (24) + IMU gravity vector and angular velocity (6) + last action (12) + command (3: linear x, linear y, and angular z velocity targets). These terms total 45; 48-d layouts such as legged_gym's also include a 3-d base linear velocity.
- Joystick command conditioning: The policy takes a 3-d command (v_x, v_y, ω_z) as input. Commands are randomized during training and set by the user at deployment.
- Reward terms: Tracking (match the command), stability (don't fall), action smoothness (low variation between steps), contact (avoid foot slipping). The weighted terms sum to a scalar.
- Curriculum: Start with easy commands (low speed) and gradually increase them as the policy improves.
- Sim-to-real (sim2real) gap: Differences between the simulator and the real robot: motor delay, sensor noise, mass distribution, friction.
- Action filtering: Smooth high-frequency action noise before sending targets to the motors. Critical for hardware.
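The observation and curriculum entries above can be sketched in a few lines. This is a minimal illustration, not the layout of any particular environment: the field names in OBS_LAYOUT, the helper names, and the threshold, step, and cap values are all assumptions chosen for the example. The layout follows the common 48-d convention, which includes a 3-d base linear velocity, and for simplicity the same limit is applied to all three command components.

```python
import random

# Hypothetical 48-d observation layout (field names are illustrative).
OBS_LAYOUT = {
    "base_lin_vel": 3,    # assumed term that brings the total to 48
    "base_ang_vel": 3,    # IMU angular velocity
    "gravity_vector": 3,  # gravity direction in the body frame
    "command": 3,         # (v_x, v_y, w_z) targets
    "joint_pos": 12,
    "joint_vel": 12,
    "last_action": 12,
}

def obs_dim(layout):
    """Total observation size implied by a layout."""
    return sum(layout.values())

def sample_command(max_speed, rng):
    """Draw a random (v_x, v_y, w_z) command within the current limit."""
    return tuple(rng.uniform(-max_speed, max_speed) for _ in range(3))

def update_limit(max_speed, tracking_score, threshold=0.8, step=0.1, cap=1.0):
    """Widen the command range only once tracking is good enough."""
    if tracking_score > threshold:
        return min(max_speed + step, cap)
    return max_speed

rng = random.Random(0)
limit = 0.3                                    # start with slow commands
cmd = sample_command(limit, rng)               # random command for training
limit = update_limit(limit, tracking_score=0.9)  # policy tracked well: widen
print(obs_dim(OBS_LAYOUT), cmd, limit)
```

At deployment the random sampler is replaced by the user's joystick input; the policy sees the same 3-d command slot either way.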
Real-world analogy
A locomotion policy is a reflex network: given "go forward at 1 m/s, turn left", it outputs joint targets at 50 Hz that produce the desired motion despite slips, bumps, and pushes. Training it is teaching the reflex; deploying it is hoping the reflex handles the real robot's inputs as well as it handled the simulator's.
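One tick of that 50 Hz reflex loop can be sketched as follows. The filter coefficient and PD gains are placeholder values chosen for illustration, not Go1 settings, and the function names are hypothetical. The sketch applies an exponential low-pass filter to the policy's joint targets, then a PD law that turns the filtered targets into torques.

```python
CONTROL_HZ = 50      # policy rate from the text
ALPHA = 0.2          # assumed filter coefficient (1.0 = no filtering)
KP, KD = 20.0, 0.5   # assumed PD gains, illustrative only

def low_pass(prev_targets, new_targets, alpha=ALPHA):
    """Exponential moving average over joint targets (radians)."""
    return [alpha * n + (1.0 - alpha) * p
            for p, n in zip(prev_targets, new_targets)]

def pd_torque(q_target, q, qd, kp=KP, kd=KD):
    """PD law: pull each joint toward its target and damp its velocity."""
    return [kp * (t - pos) - kd * vel
            for t, pos, vel in zip(q_target, q, qd)]

# Stand-in for a policy whose output jitters by +/-0.1 rad around 0.5 rad.
filtered = [0.5] * 12
raw_swing, filtered_swing = 0.0, 0.0
for step in range(CONTROL_HZ):  # one second of control
    raw = [0.5 + (0.1 if step % 2 == 0 else -0.1)] * 12
    filtered = low_pass(filtered, raw)
    raw_swing = max(raw_swing, abs(raw[0] - 0.5))
    filtered_swing = max(filtered_swing, abs(filtered[0] - 0.5))

torques = pd_torque(filtered, q=[0.5] * 12, qd=[0.0] * 12)
print(raw_swing, filtered_swing)
```

The filtered targets swing far less than the raw ones, which is the point of action filtering on hardware; the cost is added lag, so the coefficient is a trade-off rather than a free win.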
Hour 1 — Reading
- Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (Rudin et al. 2022, ETH), abstract + Section 3 (~25 min): https://arxiv.org/abs/2109.11978
- Walk These Ways (MIT), abstract + visualization (~15 min): https://gmargo11.github.io/walk-these-ways/
- A 2024–2025 quadruped demo paper of choice, e.g. Extreme Parkour (CMU) or DribbleBot (MIT). Skim the figures.
Hour 2 — Inspect the Go1/Go2 env
from mujoco_playground import registry
env = registry.load("Go1JoystickFlatTerrain")
print("obs:", env.observation_size, "act:", env.action_size)
print("reward terms (from env source):")
# Open env source: ~/.../mujoco_playground/locomotion/go1/joystick.py
# Read for 30 min: find reward_tracking_lin_vel, reward_action_rate, reward_orientation, ...
Read the env's _compute_reward method end-to-end. Each reward term encodes a finding from a paper. The reward shaping is everything.
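As a reading aid, here is a hedged sketch of what reward terms like these often look like. The function names mirror the ones mentioned above, but the bodies, the sigma value, and the weights are assumptions for illustration; the env source remains the authority on what the environment actually computes. The tracking term uses an exponential kernel, a common choice for velocity tracking.

```python
import math

def reward_tracking_lin_vel(cmd_xy, vel_xy, sigma=0.25):
    """1.0 when velocity matches the command, decaying with squared error."""
    err = sum((c - v) ** 2 for c, v in zip(cmd_xy, vel_xy))
    return math.exp(-err / sigma)

def reward_action_rate(action, last_action):
    """Squared change between consecutive actions (a penalty term)."""
    return sum((a - b) ** 2 for a, b in zip(action, last_action))

def reward_orientation(gravity_xy):
    """Squared horizontal gravity components; zero when the base is level."""
    return sum(g ** 2 for g in gravity_xy)

# Hypothetical weights: penalties enter with a negative sign.
WEIGHTS = {"tracking_lin_vel": 1.0, "action_rate": -0.01, "orientation": -1.0}

def total_reward(terms, weights=WEIGHTS):
    """Weighted sum of the named terms, giving one scalar per step."""
    return sum(weights[name] * value for name, value in terms.items())

terms = {
    "tracking_lin_vel": reward_tracking_lin_vel((0.6, 0.0), (0.5, 0.0)),
    "action_rate": reward_action_rate([0.1] * 12, [0.0] * 12),
    "orientation": reward_orientation((0.0, 0.0)),
}
print(total_reward(terms))
```

When reading the real source, it helps to note for each term whether it is a reward or a penalty and how large its weight is relative to the tracking term.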
LAB
Hour 3 — Lab: train Go1 + benchmark vs Spot (75 min)
What you're building. Train Go1 the same way as Day 23's Spot, then deploy at three command speeds (0.3, 0.6, 1.0 m/s) and record videos. This is your reference Go1 policy for the Week 7 capstone Track C.
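One way to make the three deployments comparable is to log the measured forward velocity for each command and report the tracking error. The sketch below assumes a hypothetical log format (a dict from commanded speed to a list of measured v_x samples); the sample numbers are made up to exercise the function and are not results.

```python
from statistics import mean

COMMAND_SPEEDS = (0.3, 0.6, 1.0)  # m/s, from the lab description

def tracking_report(logs):
    """Mean measured speed and mean absolute error per commanded speed."""
    report = {}
    for cmd, samples in logs.items():
        report[cmd] = {
            "mean_speed": mean(samples),
            "mean_abs_err": mean(abs(s - cmd) for s in samples),
        }
    return report

# Made-up samples purely to show the call; replace with logged rollout data.
example_logs = {
    0.3: [0.28, 0.31, 0.30],
    0.6: [0.55, 0.58, 0.61],
    1.0: [0.85, 0.90, 0.95],
}
report = tracking_report(example_logs)
for cmd in COMMAND_SPEEDS:
    row = report[cmd]
    print(f"cmd={cmd:.1f} m/s  mean={row['mean_speed']:.2f}  "
          f"err={row['mean_abs_err']:.3f}")
```

Running the same report on the Day 23 Spot policy would give a like-for-like comparison alongside the videos.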
Step 1 — Train (60 min wall-clock for 100M steps)
cd ~/robo47-rl
cp train_spot.py train_go1.py
# Edit ENV_NAME = "Go1JoystickFlatTerrain", NUM_TIMESTEPS = 100_000_000
python train_go1.py
Expected final return ≈ 30–45. Throughput is ~80k steps/sec on an H100, so total wall-clock is ~25 min for 100M steps; the 60 min in the step heading presumably leaves headroom for slower GPUs.
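The throughput figure can be sanity-checked with simple arithmetic. This sketch only divides the step budget by the quoted rate; the gap between its result and the ~25 min wall-clock figure is presumably compilation and evaluation overhead, which is an assumption rather than a measurement.

```python
NUM_TIMESTEPS = 100_000_000  # step budget from the training config
STEPS_PER_SEC = 80_000       # quoted H100 throughput

stepping_seconds = NUM_TIMESTEPS / STEPS_PER_SEC
stepping_minutes = stepping_seconds / 60
print(f"{stepping_seconds:.0f} s = {stepping_minutes:.1f} min of pure stepping")
```

On a slower GPU, substitute the throughput your training log reports to estimate your own wall-clock time before starting the run.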
Full source continues in the committed curriculum files. The v1.0 page exposes the day flow and lab surface without inventing content.