VLA Inference

VLA Inference#

This guide covers running a trained Isaac-GR00T VLA policy on the Unitree G1 robot using the Sonic whole-body control stack.

Overview#

The inference pipeline consists of:

Isaac-GR00T PolicyServer — loads the VLA model and serves actions over ZMQ
VLA inference client (run_vla_inference.py) — reads camera + robot state, queries the PolicyServer, and publishes actions to the C++ control loop
C++ deploy (gear_sonic_deploy) — executes whole-body control on the robot
Camera server — provides camera images over ZMQ (runs as a systemd service)
Data exporter (optional) — records episodes during inference

┌──────────────────────┐
│  Isaac-GR00T         │
│  PolicyServer        │
│  (GPU machine)       │
└──────┬───────────────┘
       │ ZMQ REQ/REP
       ▼
┌─────────────────────┐    ZMQ TCP    ┌──────────────────────┐
│  VLA Inference      │ ◄─────────── │  Camera Server       │
│  (run_vla_inference)│              │  (on robot)          │
└────┬───────────┬────┘              └──────────────────────┘
     │           │
     │ ZMQ PUB   │ ZMQ SUB
     │ (actions) │ (state)
     ▼           ▼
┌─────────────────────┐
│  C++ Deploy         │
│  (gear_sonic_deploy)│
└─────────────────────┘

Prerequisites#

1. Isaac-GR00T PolicyServer#

The PolicyServer runs on a machine with a GPU. It loads your finetuned VLA model and serves inference over ZMQ.

Install Isaac-GR00T and start the server:

# On the GPU machine (from the Isaac-GR00T repo)
uv run python gr00t/eval/run_gr00t_server.py \
    --model-path /path/to/your/finetuned_model \
    --embodiment-tag UNITREE_G1_SONIC \
    --device cuda:0 \
    --port 5550

2. Inference Environment#

On the inference machine (can be the same as the PolicyServer or a separate PC):

bash install_scripts/install_inference.sh

This creates .venv_inference with the Isaac-GR00T PolicyClient and all inference dependencies.

3. Camera Server#

The camera server should be running as a systemd service on the robot. See Data Collection for camera server setup.

4. C++ Deploy#

The gear_sonic_deploy binary must be built. See the main README.

Action Space#

The Sonic embodiment (unitree_g1_sonic) uses a 78-dimensional action space: 64-dim motion token + 7-dim left hand joints + 7-dim right hand joints.

Quick Start — tmux Launcher#

The easiest way to run inference is with the all-in-one tmux launcher:

# Real robot
python gear_sonic/scripts/launch_inference.py \
    --prompt "pick up the apple" \
    --camera-host 192.168.123.164

# Simulation
python gear_sonic/scripts/launch_inference.py --sim \
    --prompt "pick up the apple"

# Without data recording
python gear_sonic/scripts/launch_inference.py \
    --no-data-exporter \
    --prompt "pick up the apple"

The launcher creates a tmux session with four panes:

Pane	Component	Description
0 (top-left)	C++ Deploy	Whole-body controller
1 (bottom-left)	Keyboard Publisher	Type keyboard commands here
2 (top-right)	VLA Inference	Policy client + action loop
3 (bottom-right)	Data Exporter	Records episodes (optional)

Keyboard Controls#

Type these keys in the Keyboard Publisher pane (pane 1):

Key	Action
`k`	Start / stop the C++ control loop
`i`	Send initial pose and switch to POSE mode
`p`	Pause / resume policy inference
`[`	Toggle left hand open/closed (initial pose)
`]`	Toggle right hand open/closed (initial pose)
`t <text>`	Change the inference prompt (e.g., `t pick up the cup`)
`c`	Start recording an episode (data exporter)
`s`	Stop recording — success (data exporter)
`f`	Stop recording — failure / discard (data exporter)

Typical Workflow#

Wait for all panes to initialize
Click on pane 0 (C++ Deploy) and press Enter to confirm deployment
Switch to pane 1 (Keyboard Publisher)
Press k to start the C++ control loop (starts in PLANNER mode)
Press i to send the initial pose (switches to POSE mode)

Note: The initial motion token in gear_sonic/utils/inference/initial_poses.py is specific to the SONIC checkpoint used during training. If you change the SONIC checkpoint, you must update LATENT_INITIAL_MOTION_TOKEN to a safe standing pose from the new checkpoint’s latent space.
Press p to unpause the inference loop
The robot will begin executing VLA-predicted actions
Press p to pause, k to stop the control loop when done

Manual Setup (Without tmux)#

If you prefer to run each component in separate terminals:

Terminal 1 — Isaac-GR00T PolicyServer (GPU machine)#

# From the Isaac-GR00T repo
uv run python gr00t/eval/run_gr00t_server.py \
    --model-path /path/to/your/finetuned_model \
    --embodiment-tag UNITREE_G1_SONIC \
    --device cuda:0 \
    --port 5550

Terminal 2 — C++ Deploy#

cd gear_sonic_deploy
./deploy.sh --input-type zmq_manager real

Terminal 3 — VLA Inference#

source .venv_inference/bin/activate
python gear_sonic/scripts/run_vla_inference.py \
    --host <policy_server_ip> \
    --port 5550 \
    --embodiment-tag unitree_g1_sonic \
    --prompt "pick up the apple" \
    --camera-host 192.168.123.164

Terminal 4 — Data Exporter (optional)#

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_data_exporter.py \
    --task-prompt "pick up the apple" \
    --camera-host 192.168.123.164

Configuration Reference#

VLA Inference (`run_vla_inference.py`)#

Flag	Default	Description
`--host`	`localhost`	PolicyServer host
`--port`	`5550`	PolicyServer port
`--embodiment-tag`	`unitree_g1_sonic`	Embodiment tag
`--prompt`	`demo`	Language prompt
`--action-publish-rate`	`50`	Action publish rate (Hz)
`--action-horizon`	`40`	Actions per inference chunk
`--rate`	`2.5`	Inference rate (Hz)
`--camera-host`	`localhost`	Camera server host
`--camera-port`	`5555`	Camera server port
`--verbose-timing`	`false`	Always print loop timing

tmux Launcher (`launch_inference.py`)#

The launcher exposes all the above flags plus deploy and data exporter options. Run python gear_sonic/scripts/launch_inference.py --help for the full list.

Remote PolicyServer#

When running the PolicyServer on a separate GPU machine:

# On the inference machine, point to the remote server
python gear_sonic/scripts/launch_inference.py \
    --policy-host <gpu_machine_ip> \
    --policy-port 5550 \
    --camera-host 192.168.123.164 \
    --prompt "pick up the apple"

Make sure port 5550 (or your chosen port) is accessible between the two machines.

Latency Compensation#

The inference loop automatically compensates for network and compute latency. When a new action chunk arrives, the system calculates how many actions in the chunk are already “stale” based on the time elapsed since inference started, and skips to the appropriate action index. This is controlled by --action-publish-rate and --action-horizon.