Feature Mapping¶
Fuse neural image features into a TSDF map and query semantic regions.
This tutorial extends curobo.examples.getting_started.volumetric_mapping
with a learned feature channel. cuRobo still integrates depth frames into a
block-sparse Truncated Signed Distance Field (TSDF), but each RGB frame is also
encoded by NVIDIA C-RADIO and passed to CameraObservation
as feature_grid. The mapper fuses those patch features into the allocated
TSDF blocks, so later queries can find parts of the 3D map that are visually or
semantically similar.
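Concretely, each frame ends up as a single CameraObservation that carries geometry, color, and the learned features together. A minimal sketch of that idea, where every field name other than feature_grid is an assumption about the CameraObservation signature rather than something taken from this example:
from curobo.types.camera import CameraObservation

# depth, rgb, K, and cam_pose come from the dataset loader; feat_grid is the
# (h, w, D) patch-feature grid produced by C-RADIO for this frame.
obs = CameraObservation(
    depth_image=depth,
    rgb_image=rgb,
    intrinsics=K,
    pose=cam_pose,
    feature_grid=feat_grid,  # learned patch features, fused alongside the TSDF
)
# The mapper then integrates depth, color, and features into its TSDF blocks;
# the exact integration call is shown in the example script itself.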
C-RADIO (Reduce All Domains Into One) distills multiple vision foundation models, including DINOv2, SAM, CLIP, and SigLIP, into one backbone. This example uses the C-RADIO v3-B checkpoint and its per-image patch embeddings in two beginner-friendly ways:
Project image or map features to RGB with Principal Component Analysis (PCA) so feature clusters can be inspected visually.
When the viewer is enabled, project block features through the fixed SigLIP adaptor and match them against text prompts such as
table or chair.
This example downloads C-RADIO v3-B through torch.hub on first use. The
first run must be able to reach NVlabs/RADIO on GitHub, download the
checkpoint, and install any missing RADIO dependencies.
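The download itself goes through the standard torch.hub entry point. A minimal sketch, assuming the radio_model entry point from the NVlabs/RADIO repository and the c-radio_v3-b version string reported in the example's log output:
import torch

# First use reaches GitHub, downloads the checkpoint, and caches it under the
# local torch.hub directory; later runs reuse the cache.
radio = torch.hub.load(
    "NVlabs/RADIO",
    "radio_model",
    version="c-radio_v3-b",
    progress=True,
)
radio = radio.cuda().eval()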
By the end of this tutorial you will have:
Loaded an RGB-D sequence from Sun3D
Extracted C-RADIO v3-B patch features for each selected RGB frame
Fused depth, color, and learned features into a TSDF map
Saved side-by-side RGB | PCA(features) images for quick inspection
Visualized the map and highlighted blocks that match a text prompt when --visualize is enabled
Before starting¶
Install the extra feature mapping dependencies. If your environment needs
Hugging Face authentication for checkpoint downloads, export HF_TOKEN before
running the example:
export HF_TOKEN=<your_huggingface_token>
uv pip install timm transformers torchvision einops
How the mapper uses features¶
The feature mapper follows the same geometry path as the volumetric mapping tutorial, with one extra input: a lower-resolution grid of learned patch features from the RGB image.
RGB-D feature mapping data flow¶
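A rough sketch of how that lower-resolution grid can be produced for one RGB frame, assuming the (summary, spatial_features) output convention and patch_size attribute of the public RADIO hub models:
import torch

def extract_feature_grid(radio, image):
    # image: (3, H, W) float tensor in [0, 1], with H and W divisible by the
    # model's patch size; radio is the torch.hub model loaded earlier.
    with torch.no_grad():
        summary, spatial = radio(image.unsqueeze(0))  # spatial: (1, T, D)
    p = radio.patch_size
    h, w = image.shape[1] // p, image.shape[2] // p
    # One feature vector per image patch, laid out as an (h, w, D) grid that
    # is much coarser than the depth image the mapper integrates.
    return spatial.reshape(h, w, -1)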
Step 1: Download the dataset¶
This tutorial uses the same Sun3D indoor RGB-D scene as the volumetric mapping tutorial. It contains color images, depth maps, camera intrinsics, and ground-truth camera poses.
Quick start (downloads one scene, about 1400 MB):
wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-mit_76_studyroom-76-1studyroom2.zip
mkdir -p datasets/sun3d
unzip sun3d-mit_76_studyroom-76-1studyroom2.zip -d datasets/sun3d
The extracted directory should look like:
datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2/
camera-intrinsics.txt
<sequence_name>/
000001.color.png
000001.depth.png
000001.pose.txt
...
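If you want to inspect the data before running the example, a single frame can be loaded with a few lines. This sketch assumes the 3DMatch conventions: 16-bit depth PNGs in millimeters, a 4x4 camera-to-world matrix in each .pose.txt, and a 3x3 matrix in camera-intrinsics.txt:
import numpy as np
import imageio.v3 as iio

root = "datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2"
seq = "<sequence_name>"  # replace with the extracted sequence directory

rgb = iio.imread(f"{root}/{seq}/000001.color.png")       # (H, W, 3) uint8
depth_mm = iio.imread(f"{root}/{seq}/000001.depth.png")  # (H, W) uint16
depth_m = depth_mm.astype(np.float32) / 1000.0           # depth in meters
pose = np.loadtxt(f"{root}/{seq}/000001.pose.txt")       # 4x4 camera-to-world
K = np.loadtxt(f"{root}/camera-intrinsics.txt")          # 3x3 intrinsics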
Step 2: Run a quick feature-fusion pass¶
Start with a small number of frames because C-RADIO inference is heavier than plain depth integration:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 50 \
--stride 10 \
--save-pca
When --save-pca is enabled, the tutorial writes side-by-side RGB and
feature-PCA panels to ~/.cache/curobo/examples/feature_mapping/. The colors
are not object labels; they are a three-dimensional PCA projection of
high-dimensional feature vectors, so nearby colors usually indicate similar
visual embeddings.
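The PCA coloring itself is only a few lines. A minimal sketch of the idea, independent of the exact helper names used in the example:
import torch

def features_to_rgb(features):
    # features: (N, D) patch or voxel embeddings.
    # Project onto the top-3 principal components and rescale to [0, 1],
    # so every feature vector becomes one RGB color.
    centered = features - features.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(centered, q=3)
    proj = centered @ v[:, :3]                  # (N, 3)
    lo = proj.min(dim=0).values
    hi = proj.max(dim=0).values
    return (proj - lo) / (hi - lo + 1e-8)       # (N, 3) colors in [0, 1]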
Step 3: Inspect the map interactively¶
Add --visualize to open a Viser server at
http://localhost:8080:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 100 \
--stride 5 \
--visualize
The viewer shows:
/reconstruction/features_pca: occupied voxels colored by fused C-RADIO features projected through a map-wide PCA basis.
/reconstruction/rgb: occupied voxels colored by the TSDF color channel. This layer is hidden by default and can be toggled from the scene tree.
Current RGB and Current Feature PCA panels for the latest frame.
Step 4: Try text matching¶
Use --visualize to open the Text Matching panel. The example uses the
C-RADIO v3-B SigLIP adaptor for text queries:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 100 \
--stride 5 \
--visualize
Enter a prompt in the panel to highlight the top matching TSDF blocks under
/reconstruction/text_matched. Clear Matches demonstrates how matched
blocks can be removed from the dynamic map. For a geometric clearing example,
--clear-aabb xmin ymin zmin xmax ymax zmax clears all allocated blocks that
intersect the given world-space bounds in meters.
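Under the hood, text matching is a cosine-similarity ranking between the prompt embedding and block features projected into the adaptor's language-aligned space. A rough sketch of the ranking step, assuming the block features have already been projected into the same space as the SigLIP text embedding (the adaptor call itself is specific to the RADIO release and omitted here):
import torch
import torch.nn.functional as F

def top_matching_blocks(block_feats, text_feat, k=20):
    # block_feats: (N, D) per-block features in the SigLIP text space.
    # text_feat:   (D,) embedding of the prompt from the SigLIP text encoder.
    sim = F.cosine_similarity(block_feats, text_feat.unsqueeze(0), dim=-1)
    scores, idx = torch.topk(sim, k=min(k, sim.numel()))
    return idx, scores  # indices and scores of the highlighted TSDF blocks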
Step 5: Check the output¶
When the tutorial finishes successfully you will see output similar to:
Loading Sun3D from ./datasets/sun3d...
Found 200 frames
Loading C-RADIO (c-radio_v3-B) via NVlabs/RADIO torch.hub...
Feature dim: 768
Mapper initialized: 64.0 MB
integrating: 100%|...
Mapper memory: 64.0 MB
PCA panels saved to: ~/.cache/curobo/examples/feature_mapping
Once you have run the tutorial, open
curobo.examples.getting_started.feature_mapping in your editor.
The inline comments walk through the key design decisions: why depth is
filtered before integration, how C-RADIO patch features are attached to
CameraObservation.feature_grid, how block features are visualized with PCA,
and how optional text matching projects map features into the SigLIP adaptor
space.