Data Collection for VLA#

Record teleop demonstrations as LeRobot datasets for post-training with Isaac-GR00T. The data exporter runs alongside the SONIC deployment and VR teleop stack, capturing robot state, SMPL teleop poses, and camera images at a configurable frequency.

Deployment model

Everything runs offboard on your workstation except the camera server, which runs onboard the robot computer (e.g., Jetson Orin) where the physical cameras are connected. The camera server publishes JPEG frames over ZMQ to the workstation.

Supported cameras

The tested and supported camera setup uses Luxonis OAK cameras (OAK-D, OAK-1, etc.). This includes a head/ego-view OAK camera and optional OAK wrist cameras. Other camera drivers (RealSense, USB webcam) are included in the codebase but have not been tested recently.

Prerequisites

  1. Completed the Quick Start — you can run the sim2sim loop (includes installing the deployment and downloading model checkpoints).

  2. Completed the VR Teleop Setup — PICO hardware is calibrated and .venv_teleop is ready.

  3. Camera server running on the robot — see Camera Server Setup below. For simulation, the MuJoCo sim loop publishes camera images automatically — no camera server needed.


One-Time Setup (Workstation)#

On your workstation (where you run the C++ deployment, teleop, and data exporter), run the install script from the repo root to create a dedicated virtual environment with all data collection dependencies (LeRobot, PyAV, OpenCV, etc.):

bash install_scripts/install_data_collection.sh

This creates .venv_data_collection using Python 3.10 via uv. It installs gear_sonic[data_collection] which includes lerobot, av, opencv-python, and other required packages. It also installs espeak (system package) for voice feedback during recording.

Tip

This environment is separate from .venv_teleop and .venv_sim — the data exporter has heavier ML dependencies that are not needed for teleop or simulation.


Camera Server Setup (On-Robot)#

The camera server is the only component that runs on the robot computer (e.g., Jetson Orin). Everything else — the C++ deployment, PICO teleop streamer, data exporter, and camera viewer — runs on your workstation.

The camera server captures frames from the OAK cameras physically connected to the robot and publishes them over ZMQ to the workstation.

Step 1: Clone the repo on the robot#

SSH into your robot computer and clone this repository:

git clone https://github.com/NVlabs/GR00T-WholeBodyControl.git
cd GR00T-WholeBodyControl

Step 2: Run the install script#

The install script handles everything: creates the virtual environment, installs all dependencies (including the DepthAI SDK for OAK cameras), detects connected cameras, and optionally installs a systemd service so the camera server starts automatically on boot.

bash install_scripts/install_camera_server.sh

The script will:

  1. Create .venv_camera with gear_sonic[camera] (DepthAI, ZMQ, msgpack, OpenCV, tyro).

  2. Detect connected OAK cameras and list their MxIDs.

  3. Prompt you for each camera position (ego view, and optionally left/right wrist) and its device ID.

  4. Ask whether to install the camera server as a systemd service (recommended). If you answer y, it generates the unit file, installs, enables, and starts the service automatically.

After the script finishes, verify the service is running:

sudo systemctl status composed_camera_server.service
journalctl -u composed_camera_server.service -f

Note

Other camera drivers (RealSense, USB webcam) are included in the codebase but have not been tested recently for data collection. If you need RealSense, install pyrealsense2 into the venv after setup. See the driver files in gear_sonic/camera/drivers/ for details.

Manual setup (alternative)#

If you prefer not to use the install script, or need to reconfigure:

Finding camera device IDs:

Each OAK camera has a unique MxID. List all connected OAK devices:

source .venv_camera/bin/activate
python -c "import depthai as dai; print(dai.Device.getAllAvailableDevices())"

Example output:

[XLinkDeviceState.X_LINK_BOOTED, MxId: 18443010E1ABC12300, ...]

Starting the camera server manually:

source .venv_camera/bin/activate

# Single camera (ego view only)
python -m gear_sonic.camera.composed_camera \
    --ego-view-camera oak \
    --ego-view-device-id <YOUR_MXID> \
    --port 5555

# Multiple cameras (ego view + wrist cameras)
python -m gear_sonic.camera.composed_camera \
    --ego-view-camera oak --ego-view-device-id <EGO_MXID> \
    --left-wrist-camera oak --left-wrist-device-id <LEFT_WRIST_MXID> \
    --right-wrist-camera oak --right-wrist-device-id <RIGHT_WRIST_MXID> \
    --port 5555

Run python -m gear_sonic.camera.composed_camera --help for all options including --fps, --use-mjpeg, and --mjpeg-quality.

Manual systemd setup:

# 1. Edit the service file to match your camera setup
nano systemd/composed_camera_server.service

# 2. Copy to systemd, enable, and start
sudo cp systemd/composed_camera_server.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable composed_camera_server.service
sudo systemctl start composed_camera_server.service

Once the systemd service is running, the camera server starts automatically whenever the robot boots — no manual intervention needed.

Connecting from the workstation#

On your workstation, the data exporter and camera viewer connect to the robot’s camera server over the network. Pass the robot’s IP address (the G1 robot’s default IP is 192.168.123.164):

# Data exporter
python gear_sonic/scripts/run_data_exporter.py \
    --task-prompt "pick up the cup" \
    --camera-host 192.168.123.164 --camera-port 5555

# Camera viewer (to verify the feed)
python gear_sonic/scripts/run_camera_viewer.py \
    --camera-host 192.168.123.164 --camera-port 5555

The tmux launcher also accepts --camera-host:

python gear_sonic/scripts/launch_data_collection.py \
    --camera-host 192.168.123.164 \
    --task-prompt "pick up the cup"

ZMQ message format#

The camera server publishes a single msgpack-encoded payload per frame cycle containing all camera images:

{
    "timestamps": {"ego_view": 1712345678.123, "left_wrist": 1712345678.125},
    "images": {"ego_view": "<base64-jpeg>", "left_wrist": "<base64-jpeg>"}
}

Images are JPEG-compressed (quality 80) and either base64-encoded strings or raw JPEG bytes (when MJPEG on-device encoding is enabled). The data exporter’s ComposedCameraClientSensor handles both formats automatically.
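As an illustrative sketch of the client side (the real handling lives in ComposedCameraClientSensor; the helper name here is hypothetical), a subscriber can normalize both encodings to raw JPEG bytes:

```python
import base64

def normalize_image_field(value):
    """Return raw JPEG bytes for one camera entry in the payload.

    The server sends either a base64-encoded string, or raw JPEG bytes
    when on-device MJPEG encoding is enabled; accept both.
    """
    if isinstance(value, str):
        return base64.b64decode(value)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError(f"unexpected image field type: {type(value).__name__}")
```

The normalized bytes can then be decoded to an image array (e.g., with cv2.imdecode) for display or re-encoding.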


Architecture#

The data exporter receives data from three ZMQ sources. The C++ deployment, PICO teleop, and data exporter all run offboard on the workstation. The camera server runs onboard the robot and streams frames to the workstation over the network.

    Workstation (offboard)                          Robot (onboard)
┌──────────────────────┐  ┌──────────────────────┐  ┌───────────────┐
│  C++ deploy          │  │  pico_manager        │  │  Camera       │
│  (zmq_output_handler)│  │  _thread_server.py   │  │  server       │
│                      │  │                      │  │  (OAK cameras)│
│  port 5557           │  │  port 5556           │  │  port 5555    │
│  topics: g1_debug,   │  │  topic: pose         │  │  (JPEG/ZMQ)   │
│          robot_config│  │  (SMPL body params)  │  │               │
└──────────┬───────────┘  └──────────┬───────────┘  └───────┬───────┘
           │                         │                      │
           └────────────┬────────────┴──────────────────────┘
                        │           (network)
               ┌────────▼────────┐
               │  run_data_      │
               │  exporter.py    │
               │  (workstation)  │
               │                 │
               │  LeRobot dataset│
               │  (parquet + mp4)│
               └─────────────────┘

| Source | Runs on | ZMQ Topic | Default Port | Provides |
|---|---|---|---|---|
| C++ deployment | Workstation | g1_debug | 5557 | Joint positions, velocities, IMU quaternion |
| C++ deployment | Workstation | robot_config | 5557 | One-shot robot configuration at startup |
| PICO teleop streamer | Workstation | pose | 5556 | SMPL body parameters (teleop target poses) |
| Camera server | Robot | (raw TCP) | 5555 | JPEG-compressed camera images (ego view + optional wrist views) |


Running Data Collection#

There are two ways to run the data collection stack: an all-in-one tmux launcher (recommended) or a manual multi-terminal setup.

Option A: tmux Launcher (Recommended)#

The launcher starts the full stack in a single tmux session:

python gear_sonic/scripts/launch_data_collection.py \
    --task-prompt "pick up the cup"

For a real robot, add --camera-host <robot-ip> so the data exporter connects to the on-robot camera server.

Option B: Manual Multi-Terminal Setup#

If you prefer individual control over each process, run them in separate terminals:

Terminal 1 — MuJoCo Simulator (skip for real robot):

source .venv_sim/bin/activate
python gear_sonic/scripts/run_sim_loop.py \
    --enable-image-publish --enable-offscreen --camera-port 5555

The --enable-image-publish and --enable-offscreen flags are required so the sim renders camera images and streams them over ZMQ on the specified port. The data exporter subscribes to this port the same way it subscribes to a physical camera server.

For real robot deployment, skip this terminal and see VR Whole-Body Teleop instead.

Terminal 2 — C++ Deployment (from gear_sonic_deploy/):

cd gear_sonic_deploy
source scripts/setup_env.sh
./deploy.sh --input-type zmq_manager sim
# Wait until you see "Init done"

Terminal 3 — PICO Teleop Streamer:

source .venv_teleop/bin/activate
python gear_sonic/scripts/pico_manager_thread_server.py --manager

Terminal 4 — Data Exporter:

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_data_exporter.py --task-prompt "pick up the cup"

Terminal 5 (optional) — Camera Viewer:

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_camera_viewer.py

All options are provided via CLI flags — no interactive prompts. Key flags:

| Flag | Default | Description |
|---|---|---|
| --task-prompt | "demo" | Language task description for this session |
| --dataset-name | (auto: timestamp) | Dataset name. Omit to create a new one, or pass an existing name to append episodes |
| --data-collection-frequency | 50 | Recording frequency (Hz) |
| --root-output-dir | outputs | Parent directory for saved datasets |

Tip

Datasets are saved under <root-output-dir>/<dataset-name>/. If --dataset-name is not specified, a timestamped name is generated automatically.
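The auto-generated name appears to follow a timestamp-plus-robot-tag pattern (see the example directory under Output Format). A hypothetical reconstruction of the path resolution, with the robot-tag suffix as an explicit assumption:

```python
from datetime import datetime
from pathlib import Path

def dataset_output_dir(root_output_dir="outputs", dataset_name=None,
                       robot_tag="G1-robot01", now=None):
    """Resolve <root-output-dir>/<dataset-name>, generating a timestamped
    name when --dataset-name is omitted.

    The robot-tag suffix is an assumption based on the example output
    directory in these docs; the exporter's exact scheme may differ.
    """
    if dataset_name is None:
        now = now or datetime.now()
        dataset_name = f"{now:%Y-%m-%d-%H-%M-%S}-{robot_tag}"
    return Path(root_output_dir) / dataset_name
```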

Recording Controls#

There are two ways to control recording: PICO VR controllers (recommended during teleop) or keyboard over ZMQ.

PICO VR Controllers (via manager_state topic):

| Input | Action |
|---|---|
| Left Grip + A | Toggle recording: starts a new episode, or stops and saves the current one |
| Left Grip + B | Discard the current episode without saving |

These buttons work in any manager mode (POSE, PLANNER, etc.) and are independent of the mode-switching controls.

Keyboard over ZMQ:

Key

Action

c

Toggle recording (same as Left Grip + A)

x

Discard episode (same as Left Grip + B)

Note

Keyboard commands are sent via a separate ZMQ publisher (default port 5580). The data exporter subscribes to this channel automatically. You can send keys from any ZMQ publisher on that port, or integrate with the C++ deployment’s keyboard handler.
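The control semantics can be modeled as a small state machine (illustrative only; the class name is hypothetical and this is not the exporter's actual implementation):

```python
class RecordingControl:
    """Models the exporter's episode controls: 'c' toggles recording
    (stopping saves the episode), 'x' discards without saving."""

    def __init__(self):
        self.recording = False
        self.saved_episodes = 0
        self.current_frames = []

    def handle_key(self, key):
        if key == "c":                  # toggle (same as Left Grip + A)
            if self.recording:          # stop -> save the current episode
                self.saved_episodes += 1
            self.current_frames = []
            self.recording = not self.recording
        elif key == "x":                # discard (same as Left Grip + B)
            self.current_frames = []
            self.recording = False
```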


Camera Viewer#

A standalone camera viewer is available for monitoring camera feeds and recording raw video independently of the data exporter.

source .venv_data_collection/bin/activate
python gear_sonic/scripts/run_camera_viewer.py --camera-host localhost --camera-port 5555

The viewer connects to the same ZMQ camera server used by the data exporter and displays all detected camera streams in a tiled OpenCV window.

Controls (OpenCV window must be focused):

| Key | Action |
|---|---|
| R | Start/stop video recording |
| Q | Quit |

Recordings are saved to camera_recordings/rec_<timestamp>/ with one MP4 per camera stream. This is useful for:

  • Verifying camera placement and image quality before starting data collection

  • Recording reference videos alongside the LeRobot dataset

  • Debugging camera server connectivity

Run python gear_sonic/scripts/run_camera_viewer.py --help for all options.


CLI Options#

All options can be viewed with --help:

python gear_sonic/scripts/run_data_exporter.py --help

Key options:

| Flag | Default | Description |
|---|---|---|
| --task-prompt | "demo" | Language task description for annotation |
| --dataset-name | (auto: timestamp) | Dataset name; omit to auto-generate, or reuse an existing name to append |
| --data-collection-frequency | 50 | Recording frequency in Hz |
| --camera-host | localhost | Camera server hostname |
| --camera-port | 5555 | Camera server port |
| --sonic-zmq-host | localhost | SMPL pose publisher host |
| --sonic-zmq-port | 5556 | SMPL pose publisher port |
| --state-zmq-host | localhost | Robot state publisher host |
| --state-zmq-port | 5557 | Robot state publisher port |
| --root-output-dir | outputs | Root directory for saved datasets |
| --text-to-speech / --no-text-to-speech | True | Voice feedback via espeak |


Output Format#

Datasets are saved in the LeRobot v2.1 format under <root-output-dir>/<dataset-name>/:

outputs/2026-04-03-14-30-00-G1-robot01/
├── data/
│   ├── train-00000.parquet      # Tabular data (joint states, actions, annotations)
│   └── ...
├── videos/
│   ├── observation.images.ego_view/
│   │   ├── episode_000000.mp4   # H264-encoded ego camera video
│   │   └── ...
│   ├── observation.images.left_wrist/   # (only with --record-wrist-cameras)
│   └── observation.images.right_wrist/  # (only with --record-wrist-cameras)
└── meta/
    ├── info.json                # Dataset metadata (fps, features, sizes)
    ├── modality.json            # GR00T modality configuration
    ├── episodes.jsonl           # Per-episode metadata
    └── tasks.jsonl              # Task prompt definitions
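Because the metadata files are plain JSON/JSONL, a quick stdlib sanity check of an exported dataset might look like this (the helper name is illustrative):

```python
import json
from pathlib import Path

def summarize_dataset(dataset_dir):
    """Read meta/info.json and meta/episodes.jsonl from a LeRobot-style
    dataset directory and return (fps, number of episodes)."""
    meta = Path(dataset_dir) / "meta"
    info = json.loads((meta / "info.json").read_text())
    episodes = [json.loads(line)
                for line in (meta / "episodes.jsonl").read_text().splitlines()
                if line.strip()]
    return info.get("fps"), len(episodes)
```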

Recorded Data Channels#

Each frame contains:

| Feature | Shape | Description |
|---|---|---|
| observation.state.joint_position | (N,) | Actuated joint positions (rad) |
| observation.state.joint_velocity | (N,) | Actuated joint velocities (rad/s) |
| observation.state.body_rotation_6d | (6,) | Base orientation (6D rotation) |
| observation.state.projected_gravity | (3,) | Gravity vector in body frame |
| observation.images.ego_view | (480, 640, 3) | Ego camera image (saved as MP4 video) |
| observation.images.left_wrist | (480, 640, 3) | Left wrist camera (only with --record-wrist-cameras) |
| observation.images.right_wrist | (480, 640, 3) | Right wrist camera (only with --record-wrist-cameras) |
| action.joint_position | (N,) | Teleop target joint positions |
| action.body_rotation_6d | (6,) | Teleop target body rotation |
| annotation.human.action.task_description | string | Task prompt for this frame |
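The body_rotation_6d channels use the common 6D rotation parameterization: the first two columns of the 3x3 rotation matrix, which avoids the discontinuities of quaternions and Euler angles. A sketch of the conversion from a unit quaternion (the exporter's exact column/flattening convention is an assumption here):

```python
import math

def quat_to_rotation_6d(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 6D rotation vector:
    the first two columns of the rotation matrix, flattened column-first.
    (The flattening order used by the exporter is an assumption.)"""
    r = [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]
    return [r[0][0], r[1][0], r[2][0], r[0][1], r[1][1], r[2][1]]
```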


Post-Processing Datasets#

After recording, you can clean and merge datasets using the processing script. All commands below run in the data collection virtual environment:

source .venv_data_collection/bin/activate

Remove Stale SMPL Frames#

Teleop pauses or ZMQ frame drops create frames where teleop.smpl_pose is all zeros. The processing script detects these and also removes consecutive frozen (identical) lead-in frames that precede them:

# Clean a single dataset in-place
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/my_dataset

# Clean and write to a new directory (non-destructive)
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/my_dataset \
    --output-path outputs/my_dataset_cleaned

Merge Multiple Datasets#

Combine several recording sessions into a single dataset. The script validates that all sessions share the same script_config (robot configuration) before merging:

# Merge by listing datasets on the command line
python gear_sonic/scripts/process_dataset.py \
    --dataset-path outputs/session1 outputs/session2 outputs/session3 \
    --output-path outputs/merged_dataset

# Or use a text file (one dataset path per line, # for comments)
python gear_sonic/scripts/process_dataset.py \
    --dataset-list datasets.txt \
    --output-path outputs/merged_dataset

SMPL cleaning is applied by default during merging: frames where the SMPL teleop pose is stuck at zeros (operator pauses, or ZMQ packet-drop periods where the stream stops updating) are removed, along with the consecutive frozen (identical) frames that lead into each zero block, since those hold stale data from just before the dropout. To skip the cleaning and merge only, add --no-remove-stale-smpl.
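The cleaning rule described above can be sketched in a few lines (illustrative only; the real implementation is in process_dataset.py, and keeping the first frame of a frozen run is an assumption of this sketch):

```python
def stale_frame_indices(smpl_frames):
    """Return indices to drop: frames whose SMPL pose is all zeros, plus
    the repeated (frozen) copies of the pose immediately preceding each
    zero block.  The first occurrence of a frozen pose is kept."""
    zero = [not any(frame) for frame in smpl_frames]
    drop = {i for i, z in enumerate(zero) if z}
    for i, z in enumerate(zero):
        if z and (i == 0 or not zero[i - 1]):   # start of a zero block
            j = i - 1
            # walk back over identical frames leading into the block
            while j >= 1 and smpl_frames[j] == smpl_frames[j - 1]:
                drop.add(j)                     # frozen lead-in frame
                j -= 1
    return sorted(drop)
```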


Using Datasets with Isaac-GR00T#

The output dataset is directly compatible with the Isaac-GR00T post-training pipeline. Point the Isaac-GR00T training config to the dataset directory:

# In the Isaac-GR00T repository
python scripts/gr00t_finetune.py \
    --dataset-path /path/to/outputs/2026-04-03-14-30-00-G1-robot01 \
    ...

See the Isaac-GR00T documentation for full post-training instructions.