Data Collection for VLA#

Record teleop demonstrations as LeRobot datasets for post-training with Isaac-GR00T. The data exporter runs alongside the SONIC deployment and VR teleop stack, capturing robot state, SMPL teleop poses, and camera images at a configurable frequency.

Deployment model

Everything runs offboard on your workstation except the camera server, which runs onboard the robot computer (e.g., Jetson Orin) where the physical cameras are connected. The camera server publishes JPEG frames over ZMQ to the workstation.

Supported cameras

The tested and supported camera setup uses Luxonis OAK cameras (OAK-D, OAK-1, etc.). This includes a head/ego-view OAK camera and optional OAK wrist cameras. Other camera drivers (RealSense, USB webcam) are included in the codebase but have not been tested recently.

Prerequisites

  1. Completed the Quick Start — you can run the sim2sim loop (includes installing the deployment and downloading model checkpoints).

  2. Completed the VR Teleop Setup — PICO hardware is calibrated and .venv_teleop is ready.

  3. Camera server running on the robot — see Camera Server Setup below. For simulation, the MuJoCo sim loop publishes camera images automatically — no camera server needed.


One-Time Setup (Workstation)#

On your workstation (where you run the C++ deployment, teleop, and data exporter), run the install script from the repo root to create a dedicated virtual environment with all data collection dependencies (LeRobot, PyAV, OpenCV, etc.):

bash install_scripts/install_data_collection.sh

This creates .venv_data_collection using Python 3.10 via uv. It installs gear_sonic[data_collection] which includes lerobot, av, opencv-python, and other required packages. It also installs espeak (system package) for voice feedback during recording.

Tip

This environment is separate from .venv_teleop and .venv_sim — the data exporter has heavier ML dependencies that are not needed for teleop or simulation.


Camera Server Setup (On-Robot)#

The camera server is the only component that runs on the robot computer (e.g., Jetson Orin). Everything else — the C++ deployment, PICO teleop streamer, data exporter, and camera viewer — runs on your workstation.

The camera server captures frames from the OAK cameras physically connected to the robot and publishes them over ZMQ to the workstation.

Step 1: Clone the repo on the robot#

SSH into your robot computer and clone this repository:

git clone https://github.com/NVlabs/GR00T-WholeBodyControl.git
cd GR00T-WholeBodyControl

Step 2: Run the install script#

The install script handles everything: creates the virtual environment, installs all dependencies (including the DepthAI SDK for OAK cameras), detects connected cameras, and optionally installs a systemd service so the camera server starts automatically on boot.

bash install_scripts/install_camera_server.sh

The script will:

  1. Create .venv_camera with gear_sonic[camera] (DepthAI, ZMQ, msgpack, OpenCV, tyro).

  2. Detect connected OAK cameras and list their MxIDs.

  3. Prompt you for each camera position (ego view, and optionally left/right wrist) and its device ID.

  4. Ask whether to install the camera server as a systemd service (recommended). If you answer y, it generates the unit file, installs, enables, and starts the service automatically.

After the script finishes, verify the service is running:

sudo systemctl status composed_camera_server.service
journalctl -u composed_camera_server.service -f

Note

Other camera drivers (RealSense, USB webcam) are included in the codebase but have not been tested recently for data collection. If you need RealSense, install pyrealsense2 into the venv after setup. See the driver files in gear_sonic/camera/drivers/ for details.

Manual setup (alternative)#

If you prefer not to use the install script, or need to reconfigure:

Finding camera device IDs:

Each OAK camera has a unique MxID. List all connected OAK devices:

source .venv_camera/bin/activate
python -c "import depthai as dai; print(dai.Device.getAllAvailableDevices())"

Example output:

[XLinkDeviceState.X_LINK_BOOTED, MxId: 18443010E1ABC12300, ...]

Starting the camera server manually:

source .venv_camera/bin/activate

# Single camera (ego view only)
python -m gear_sonic.camera.composed_camera \
    --ego-view-camera oak \
    --ego-view-device-id <YOUR_MXID> \
    --port 5555

# Multiple cameras (ego view + wrist cameras)
python -m gear_sonic.camera.composed_camera \
    --ego-view-camera oak --ego-view-device-id <EGO_MXID> \
    --left-wrist-camera oak --left-wrist-device-id <LEFT_WRIST_MXID> \
    --right-wrist-camera oak --right-wrist-device-id <RIGHT_WRIST_MXID> \
    --port 5555

Run python -m gear_sonic.camera.composed_camera --help for all options including --fps, --use-mjpeg, and --mjpeg-quality.

Manual systemd setup:

# 1. Edit the service file to match your camera setup
nano systemd/composed_camera_server.service

# 2. Copy to systemd, enable, and start
sudo cp systemd/composed_camera_server.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable composed_camera_server.service
sudo systemctl start composed_camera_server.service

Once the systemd service is running, the camera server starts automatically whenever the robot boots — no manual intervention needed.

Connecting from the workstation#

On your workstation, the data exporter and camera viewer connect to the robot’s camera server over the network. Pass the robot’s IP address (the G1 robot’s default IP is 192.168.123.164):

# Data exporter
python gear_sonic/scripts/run_data_exporter.py \
    --task-prompt "pick up the cup" \
    --camera-host 192.168.123.164 --camera-port 5555

# Camera viewer (to verify the feed)
python gear_sonic/scripts/run_camera_viewer.py \
    --camera-host 192.168.123.164 --camera-port 5555

The tmux launcher also accepts --camera-host:

python gear_sonic/scripts/launch_data_collection.py \
    --camera-host 192.168.123.164 \
    --task-prompt "pick up the cup"

ZMQ message format#

The camera server publishes a single msgpack-encoded payload per frame cycle containing all camera images:

{
    "timestamps": {"ego_view": 1712345678.123, "left_wrist": 1712345678.125},
    "images": {"ego_view": "<base64-jpeg>", "left_wrist": "<base64-jpeg>"}
}

Images are JPEG-compressed (quality 80) and either base64-encoded strings or raw JPEG bytes (when MJPEG on-device encoding is enabled). The data exporter’s ComposedCameraClientSensor handles both formats automatically.
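As an illustrative sketch of the client side (the real handling lives in ComposedCameraClientSensor; the helper name here is hypothetical), a subscriber can normalize both encodings to raw JPEG bytes:

```python
import base64

def normalize_image_field(value):
    """Return raw JPEG bytes for one camera entry in the payload.

    The server sends either a base64-encoded string, or raw JPEG bytes
    when on-device MJPEG encoding is enabled; accept both.
    """
    if isinstance(value, str):
        return base64.b64decode(value)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError(f"unexpected image field type: {type(value).__name__}")
```

The normalized bytes can then be decoded to an image array (e.g., with cv2.imdecode) for display or re-encoding.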


Architecture#

The data exporter receives data from three ZMQ sources. The C++ deployment, PICO teleop, and data exporter all run offboard on the workstation. The camera server runs onboard the robot and streams frames to the workstation over the network.

    Workstation (offboard)                          Robot (onboard)
┌──────────────────────┐  ┌──────────────────────┐  ┌───────────────┐
│  C++ deploy          │  │  pico_manager        │  │  Camera       │
│  (zmq_output_handler)│  │  _thread_server.py   │  │  server       │
│                      │  │                      │  │  (OAK cameras)│
│  port 5557           │  │  port 5556           │  │  port 5555    │
│  topics: g1_debug,   │  │  topic: pose         │  │  (JPEG/ZMQ)   │
│          robot_config│  │  (SMPL body params)  │  │               │
└──────────┬───────────┘  └──────────┬───────────┘  └───────┬───────┘
           │                         │                      │
           └────────────┬────────────┴──────────────────────┘
                        │           (network)
               ┌────────▼────────┐
               │  run_data_      │
               │  exporter.py    │
               │  (workstation)  │
               │                 │
               │  LeRobot dataset│
               │  (parquet + mp4)│
               └─────────────────┘

| Source | Runs on | ZMQ Topic | Default Port | Provides |
|---|---|---|---|---|
| C++ deployment | Workstation | g1_debug | 5557 | Joint positions, velocities, IMU quaternion |
| C++ deployment | Workstation | robot_config | 5557 | One-shot robot configuration at startup |
| PICO teleop streamer | Workstation | pose | 5556 | SMPL body parameters (teleop target poses) |
| Camera server | Robot | (raw TCP) | 5555 | JPEG-compressed camera images (ego view + optional wrist views) |


Running Data Collection#

There are two ways to run the data collection stack: an all-in-one tmux launcher (recommended) or a manual multi-terminal setup.

Option A: tmux Launcher (Recommended)#

The launcher starts the full stack in a single tmux session:

python gear_sonic/scripts/launch_data_collection.py \
    --task-prompt "pick up the cup"

For a real robot, add --camera-host <robot-ip> so the data exporter connects to the on-robot camera server.

Option B: Manual Multi-Terminal Setup#

If you prefer individual control over each process, run them in separate terminals:

Terminal 1 — MuJoCo Simulator (skip for real robot):

source .venv_sim/bin/activate
python gear_sonic/scripts/run_sim_loop.py \
    --enable-image-publish --enable-offscreen --camera-port 5555

The --enable-image-publish and --enable-offscreen flags are required so the sim renders camera images and streams them over ZMQ on the specified port. The data exporter subscribes to this port the same way it subscribes to a physical camera server.

For real robot deployment, skip this terminal and see VR Whole-Body Teleop instead.

Terminal 2 — C++ Deployment (from gear_sonic_deploy/):

cd gear_sonic_deploy
source scripts/setup_env.sh
./deploy.sh --input-type zmq_manager sim
# Wait until you see "Init done"

Terminal 3 — PICO Teleop Streamer:

source .venv_teleop/bin/activate
python gear_sonic/scripts/pico_manager_thread_server.py --manager

Terminal 4 — Data Exporter:

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_data_exporter.py --task-prompt "pick up the cup"

Terminal 5 (optional) — Camera Viewer:

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_camera_viewer.py

All options are provided via CLI flags — no interactive prompts. Key flags:

| Flag | Default | Description |
|---|---|---|
| --task-prompt | "demo" | Language task description for this session |
| --dataset-name | (auto: timestamp) | Dataset name. Omit to create a new one, or pass an existing name to append episodes |
| --data-collection-frequency | 50 | Recording frequency (Hz) |
| --root-output-dir | outputs | Parent directory for saved datasets |

Tip

Datasets are saved under <root-output-dir>/<dataset-name>/. If --dataset-name is not specified, a timestamped name is generated automatically.
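The auto-generated name appears to follow a timestamp-plus-robot-tag pattern (see the example directory under Output Format). A hypothetical reconstruction of the path resolution, with the robot-tag suffix as an explicit assumption:

```python
from datetime import datetime
from pathlib import Path

def dataset_output_dir(root_output_dir="outputs", dataset_name=None,
                       robot_tag="G1-robot01", now=None):
    """Resolve <root-output-dir>/<dataset-name>, generating a timestamped
    name when --dataset-name is omitted.

    The robot-tag suffix is an assumption based on the example output
    directory in these docs; the exporter's exact scheme may differ.
    """
    if dataset_name is None:
        now = now or datetime.now()
        dataset_name = f"{now:%Y-%m-%d-%H-%M-%S}-{robot_tag}"
    return Path(root_output_dir) / dataset_name
```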

Recording Controls#

There are two ways to control recording: PICO VR controllers (recommended during teleop) or keyboard over ZMQ.

PICO VR Controllers (via manager_state topic):

| Input | Action |
|---|---|
| Left Grip + A | Toggle recording: starts a new episode, or stops and saves the current one |
| Left Grip + B | Discard the current episode without saving |

These buttons work in any manager mode (POSE, PLANNER, etc.) and are independent of the mode-switching controls.

Keyboard over ZMQ:

Key

Action

c

Toggle recording (same as Left Grip + A)

x

Discard episode (same as Left Grip + B)

Note

Keyboard commands are sent via a separate ZMQ publisher (default port 5580). The data exporter subscribes to this channel automatically. You can send keys from any ZMQ publisher on that port, or integrate with the C++ deployment’s keyboard handler.
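The control semantics can be modeled as a small state machine (illustrative only; the class name is hypothetical and this is not the exporter's actual implementation):

```python
class RecordingControl:
    """Models the exporter's episode controls: 'c' toggles recording
    (stopping saves the episode), 'x' discards without saving."""

    def __init__(self):
        self.recording = False
        self.saved_episodes = 0
        self.current_frames = []

    def handle_key(self, key):
        if key == "c":                  # toggle (same as Left Grip + A)
            if self.recording:          # stop -> save the current episode
                self.saved_episodes += 1
            self.current_frames = []
            self.recording = not self.recording
        elif key == "x":                # discard (same as Left Grip + B)
            self.current_frames = []
            self.recording = False
```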


Camera Viewer#

A standalone camera viewer is available for monitoring camera feeds and recording raw video independently of the data exporter.

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_camera_viewer.py --camera-host localhost --camera-port 5555

The viewer connects to the same ZMQ camera server used by the data exporter and displays all detected camera streams in a tiled OpenCV window.

Controls (OpenCV window must be focused):

| Key | Action |
|---|---|
| R | Start/stop video recording |
| Q | Quit |

Recordings are saved to camera_recordings/rec_<timestamp>/ with one MP4 per camera stream. This is useful for:

  • Verifying camera placement and image quality before starting data collection

  • Recording reference videos alongside the LeRobot dataset

  • Debugging camera server connectivity

Run python gear_sonic/scripts/run_camera_viewer.py --help for all options.


CLI Options#

All options can be viewed with --help:

python gear_sonic/scripts/run_data_exporter.py --help

Key options:

| Flag | Default | Description |
|---|---|---|
| --task-prompt | "demo" | Language task description for annotation |
| --dataset-name | (auto: timestamp) | Dataset name; omit to auto-generate, or reuse an existing name to append |
| --data-collection-frequency | 50 | Recording frequency in Hz |
| --camera-host | localhost | Camera server hostname |
| --camera-port | 5555 | Camera server port |
| --sonic-zmq-host | localhost | SMPL pose publisher host |
| --sonic-zmq-port | 5556 | SMPL pose publisher port |
| --state-zmq-host | localhost | Robot state publisher host |
| --state-zmq-port | 5557 | Robot state publisher port |
| --root-output-dir | outputs | Root directory for saved datasets |
| --text-to-speech / --no-text-to-speech | True | Voice feedback via espeak |


Output Format#

Datasets are saved in the LeRobot v2.1 format under <root-output-dir>/<dataset-name>/:

outputs/2026-04-03-14-30-00-G1-robot01/
├── data/
│   ├── train-00000.parquet      # Tabular data (joint states, actions, annotations)
│   └── ...
├── videos/
│   ├── observation.images.ego_view/
│   │   ├── episode_000000.mp4   # H264-encoded ego camera video
│   │   └── ...
│   ├── observation.images.left_wrist/   # (only with --record-wrist-cameras)
│   └── observation.images.right_wrist/  # (only with --record-wrist-cameras)
└── meta/
    ├── info.json                # Dataset metadata (fps, features, sizes)
    ├── modality.json            # GR00T modality configuration
    ├── episodes.jsonl           # Per-episode metadata
    └── tasks.jsonl              # Task prompt definitions
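Because the metadata files are plain JSON/JSONL, a quick stdlib sanity check of an exported dataset might look like this (the helper name is illustrative):

```python
import json
from pathlib import Path

def summarize_dataset(dataset_dir):
    """Read meta/info.json and meta/episodes.jsonl from a LeRobot-style
    dataset directory and return (fps, number of episodes)."""
    meta = Path(dataset_dir) / "meta"
    info = json.loads((meta / "info.json").read_text())
    episodes = [json.loads(line)
                for line in (meta / "episodes.jsonl").read_text().splitlines()
                if line.strip()]
    return info.get("fps"), len(episodes)
```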

Recorded Data Channels#

Each frame contains:

| Feature | Shape | Description |
|---|---|---|
| observation.state.joint_position | (N,) | Actuated joint positions (rad) |
| observation.state.joint_velocity | (N,) | Actuated joint velocities (rad/s) |
| observation.state.body_rotation_6d | (6,) | Base orientation (6D rotation) |
| observation.state.projected_gravity | (3,) | Gravity vector in body frame |
| observation.images.ego_view | (480, 640, 3) | Ego camera image (saved as MP4 video) |
| observation.images.left_wrist | (480, 640, 3) | Left wrist camera (only with --record-wrist-cameras) |
| observation.images.right_wrist | (480, 640, 3) | Right wrist camera (only with --record-wrist-cameras) |
| action.joint_position | (N,) | Teleop target joint positions |
| action.body_rotation_6d | (6,) | Teleop target body rotation |
| annotation.human.action.task_description | string | Task prompt for this frame |
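The body_rotation_6d channels use the common 6D rotation parameterization: the first two columns of the 3x3 rotation matrix, which avoids the discontinuities of quaternions and Euler angles. A sketch of the conversion from a unit quaternion (the exporter's exact column/flattening convention is an assumption here):

```python
import math

def quat_to_rotation_6d(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 6D rotation vector:
    the first two columns of the rotation matrix, flattened column-first.
    (The flattening order used by the exporter is an assumption.)"""
    r = [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]
    return [r[0][0], r[1][0], r[2][0], r[0][1], r[1][1], r[2][1]]
```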


Post-Processing Datasets#

After recording, you can clean and merge datasets using the processing script. All commands below run in the data collection virtual environment:

source .venv_data_collection/bin/activate

Remove Stale SMPL Frames#

Teleop pauses or ZMQ frame drops create frames where teleop.smpl_pose is all zeros. The processing script detects these and also removes consecutive frozen (identical) lead-in frames that precede them:

# Clean a single dataset in-place
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/my_dataset

# Clean and write to a new directory (non-destructive)
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/my_dataset \
    --output-path outputs/my_dataset_cleaned

Merge Multiple Datasets#

Combine several recording sessions into a single dataset. The script validates that all sessions share the same script_config (robot configuration) before merging:

# Merge by listing datasets on the command line
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/session1 outputs/session2 outputs/session3 \
    --output-path outputs/merged_dataset

# Or use a text file (one dataset path per line, # for comments)
python gear_sonic/scripts/process_dataset.py \
    --dataset-list datasets.txt \
    --output-path outputs/merged_dataset

SMPL cleaning is applied by default during merging: frames where the SMPL teleop pose is stuck at zeros (operator pauses, or ZMQ packet-drop periods where the stream stops updating) are removed, along with the consecutive frozen (identical) frames that lead into each zero block, since those hold stale data from just before the dropout. To skip the cleaning and merge only, add --no-remove-stale-smpl.
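The cleaning rule described above can be sketched in a few lines (illustrative only; the real implementation is in process_dataset.py, and keeping the first frame of a frozen run is an assumption of this sketch):

```python
def stale_frame_indices(smpl_frames):
    """Return indices to drop: frames whose SMPL pose is all zeros, plus
    the repeated (frozen) copies of the pose immediately preceding each
    zero block.  The first occurrence of a frozen pose is kept."""
    zero = [not any(frame) for frame in smpl_frames]
    drop = {i for i, z in enumerate(zero) if z}
    for i, z in enumerate(zero):
        if z and (i == 0 or not zero[i - 1]):   # start of a zero block
            j = i - 1
            # walk back over identical frames leading into the block
            while j >= 1 and smpl_frames[j] == smpl_frames[j - 1]:
                drop.add(j)                     # frozen lead-in frame
                j -= 1
    return sorted(drop)
```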


Using Datasets with Isaac-GR00T#

The output dataset is directly compatible with the Isaac-GR00T post-training pipeline. Point the Isaac-GR00T training config to the dataset directory:

# In the Isaac-GR00T repository
python scripts/gr00t_finetune.py \
    --dataset-path /path/to/outputs/2026-04-03-14-30-00-G1-robot01 \
    ...

See the Isaac-GR00T documentation for full post-training instructions.