VLA Inference#
This guide covers running a trained Isaac-GR00T VLA policy on the Unitree G1 robot using the Sonic whole-body control stack.
Overview#
The inference pipeline consists of:
Isaac-GR00T PolicyServer — loads the VLA model and serves actions over ZMQ
VLA inference client (
run_vla_inference.py) — reads camera + robot state, queries the PolicyServer, and publishes actions to the C++ control loopC++ deploy (
gear_sonic_deploy) — executes whole-body control on the robotCamera server — provides camera images over ZMQ (runs as a systemd service)
Data exporter (optional) — records episodes during inference
┌──────────────────────┐
│ Isaac-GR00T │
│ PolicyServer │
│ (GPU machine) │
└──────┬───────────────┘
│ ZMQ REQ/REP
▼
┌─────────────────────┐ ZMQ TCP ┌──────────────────────┐
│ VLA Inference │ ◄─────────── │ Camera Server │
│ (run_vla_inference)│ │ (on robot) │
└────┬───────────┬────┘ └──────────────────────┘
│ │
│ ZMQ PUB │ ZMQ SUB
│ (actions) │ (state)
▼ ▼
┌─────────────────────┐
│ C++ Deploy │
│ (gear_sonic_deploy)│
└─────────────────────┘
Prerequisites#
1. Isaac-GR00T PolicyServer#
The PolicyServer runs on a machine with a GPU. It loads your finetuned VLA model and serves inference over ZMQ.
Install Isaac-GR00T and start the server:
# On the GPU machine (from the Isaac-GR00T repo)
uv run python gr00t/eval/run_gr00t_server.py \
--model-path /path/to/your/finetuned_model \
--embodiment-tag UNITREE_G1_SONIC \
--device cuda:0 \
--port 5550
2. Inference Environment#
On the inference machine (can be the same as the PolicyServer or a separate PC):
bash install_scripts/install_inference.sh
This creates .venv_inference with the Isaac-GR00T PolicyClient and all
inference dependencies.
3. Camera Server#
The camera server should be running as a systemd service on the robot. See Data Collection for camera server setup.
4. C++ Deploy#
The gear_sonic_deploy binary must be built. See the main README.
Action Space#
The Sonic embodiment (unitree_g1_sonic) uses a 78-dimensional action
space: 64-dim motion token + 7-dim left hand joints + 7-dim right hand joints.
Quick Start — tmux Launcher#
The easiest way to run inference is with the all-in-one tmux launcher:
# Real robot
python gear_sonic/scripts/launch_inference.py \
--prompt "pick up the apple" \
--camera-host 192.168.123.164
# Simulation
python gear_sonic/scripts/launch_inference.py --sim \
--prompt "pick up the apple"
# Without data recording
python gear_sonic/scripts/launch_inference.py \
--no-data-exporter \
--prompt "pick up the apple"
The launcher creates a tmux session with four panes:
Pane |
Component |
Description |
|---|---|---|
0 (top-left) |
C++ Deploy |
Whole-body controller |
1 (bottom-left) |
Keyboard Publisher |
Type keyboard commands here |
2 (top-right) |
VLA Inference |
Policy client + action loop |
3 (bottom-right) |
Data Exporter |
Records episodes (optional) |
Keyboard Controls#
Type these keys in the Keyboard Publisher pane (pane 1):
Key |
Action |
|---|---|
|
Start / stop the C++ control loop |
|
Send initial pose and switch to POSE mode |
|
Pause / resume policy inference |
|
Toggle left hand open/closed (initial pose) |
|
Toggle right hand open/closed (initial pose) |
|
Change the inference prompt (e.g., |
|
Start recording an episode (data exporter) |
|
Stop recording — success (data exporter) |
|
Stop recording — failure / discard (data exporter) |
Typical Workflow#
Wait for all panes to initialize
Click on pane 0 (C++ Deploy) and press Enter to confirm deployment
Switch to pane 1 (Keyboard Publisher)
Press
kto start the C++ control loop (starts in PLANNER mode)Press
ito send the initial pose (switches to POSE mode)Note: The initial motion token in
gear_sonic/utils/inference/initial_poses.pyis specific to the SONIC checkpoint used during training. If you change the SONIC checkpoint, you must updateLATENT_INITIAL_MOTION_TOKENto a safe standing pose from the new checkpoint’s latent space.Press
pto unpause the inference loopThe robot will begin executing VLA-predicted actions
Press
pto pause,kto stop the control loop when done
Manual Setup (Without tmux)#
If you prefer to run each component in separate terminals:
Terminal 1 — Isaac-GR00T PolicyServer (GPU machine)#
# From the Isaac-GR00T repo
uv run python gr00t/eval/run_gr00t_server.py \
--model-path /path/to/your/finetuned_model \
--embodiment-tag UNITREE_G1_SONIC \
--device cuda:0 \
--port 5550
Terminal 2 — C++ Deploy#
cd gear_sonic_deploy
./deploy.sh --input-type zmq_manager real
Terminal 3 — VLA Inference#
source .venv_inference/bin/activate
python gear_sonic/scripts/run_vla_inference.py \
--host <policy_server_ip> \
--port 5550 \
--embodiment-tag unitree_g1_sonic \
--prompt "pick up the apple" \
--camera-host 192.168.123.164
Terminal 4 — Data Exporter (optional)#
source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_data_exporter.py \
--task-prompt "pick up the apple" \
--camera-host 192.168.123.164
Configuration Reference#
VLA Inference (run_vla_inference.py)#
Flag |
Default |
Description |
|---|---|---|
|
|
PolicyServer host |
|
|
PolicyServer port |
|
|
Embodiment tag |
|
|
Language prompt |
|
|
Action publish rate (Hz) |
|
|
Actions per inference chunk |
|
|
Inference rate (Hz) |
|
|
Camera server host |
|
|
Camera server port |
|
|
Always print loop timing |
tmux Launcher (launch_inference.py)#
The launcher exposes all the above flags plus deploy and data exporter options.
Run python gear_sonic/scripts/launch_inference.py --help for the full list.
Remote PolicyServer#
When running the PolicyServer on a separate GPU machine:
# On the inference machine, point to the remote server
python gear_sonic/scripts/launch_inference.py \
--policy-host <gpu_machine_ip> \
--policy-port 5550 \
--camera-host 192.168.123.164 \
--prompt "pick up the apple"
Make sure port 5550 (or your chosen port) is accessible between the two machines.
Latency Compensation#
The inference loop automatically compensates for network and compute latency.
When a new action chunk arrives, the system calculates how many actions in the
chunk are already “stale” based on the time elapsed since inference started,
and skips to the appropriate action index. This is controlled by --action-publish-rate
and --action-horizon.