Feature Mapping

Fuse neural image features into a TSDF map and query semantic regions.

This tutorial extends curobo.examples.getting_started.volumetric_mapping with a learned feature channel. cuRobo still integrates depth frames into a block-sparse Truncated Signed Distance Field (TSDF), but each RGB frame is also encoded by NVIDIA C-RADIO and passed to CameraObservation as feature_grid. The mapper fuses those patch features into the allocated TSDF blocks, so later queries can find parts of the 3D map that are visually or semantically similar.

Feature Integration

C-RADIO builds on RADIO (Reduce All Domains Into One), which distills multiple vision foundation models, including DINOv2, SAM, CLIP, and SigLIP, into a single backbone. This example uses the C-RADIO v3-B checkpoint and its per-image patch embeddings in two beginner-friendly ways:

  • Project image or map features to RGB with Principal Component Analysis (PCA) so feature clusters can be inspected visually.

  • When the viewer is enabled, project block features through the fixed SigLIP adaptor and match them against text prompts such as "table" or "chair".
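
The PCA trick in the first bullet can be sketched with plain NumPy. The function below is illustrative, not the example's actual code; the (N, D) feature layout and the min-max normalization are assumptions:

```python
import numpy as np

def features_to_rgb(features: np.ndarray) -> np.ndarray:
    """Project (N, D) feature vectors to (N, 3) RGB colors in [0, 1] via PCA."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Top-3 principal directions from the SVD of the centered features.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T  # (N, 3) PCA coordinates
    # Min-max normalize each channel so it can be displayed as a color.
    lo, hi = projected.min(axis=0), projected.max(axis=0)
    return (projected - lo) / np.maximum(hi - lo, 1e-8)

rng = np.random.default_rng(0)
colors = features_to_rgb(rng.normal(size=(100, 768)))  # e.g. 768-dim C-RADIO features
print(colors.shape)  # (100, 3)
```

Points with similar feature vectors land close together in PCA space, so they receive similar colors, which is exactly why nearby colors in the saved panels suggest similar embeddings.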

This example downloads C-RADIO v3-B through torch.hub on first use. The first run must be able to reach NVlabs/RADIO on GitHub, download the checkpoint, and install any missing RADIO dependencies.

By the end of this tutorial you will have:

  • Loaded an RGB-D sequence from Sun3D

  • Extracted C-RADIO v3-B patch features for each selected RGB frame

  • Fused depth, color, and learned features into a TSDF map

  • Saved side-by-side RGB | PCA(features) images for quick inspection

  • Visualized the map and highlighted blocks that match a text prompt when --visualize is enabled

Before starting

Install the extra feature mapping dependencies. If your environment needs Hugging Face authentication for checkpoint downloads, export HF_TOKEN before running the example:

export HF_TOKEN=<your_huggingface_token>
uv pip install timm transformers torchvision einops

How the mapper uses features

The feature mapper follows the same geometry path as the volumetric mapping tutorial, with one extra input: a lower-resolution grid of learned patch features from the RGB image.
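
The mapper's exact fusion rule is not spelled out here; a common choice, and the same scheme classic TSDF fusion uses for signed distances, is a weighted running average per block. A minimal sketch (the function name and the max_weight cap are hypothetical):

```python
import numpy as np

def fuse_feature(block_feat, block_weight, new_feat, new_weight=1.0, max_weight=64.0):
    """Weighted running average of per-block features, with a capped weight."""
    total = block_weight + new_weight
    fused = (block_feat * block_weight + new_feat * new_weight) / total
    return fused, min(total, max_weight)

# Fuse two observations of the same block: features of all-1s, then all-3s.
feat, weight = np.zeros(4), 0.0
for frame_feat in [np.ones(4), 3.0 * np.ones(4)]:
    feat, weight = fuse_feature(feat, weight, frame_feat)
print(feat, weight)  # [2. 2. 2. 2.] 2.0
```

The weight cap keeps long-observed blocks from becoming immovable, so later frames can still update features when the scene changes.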

digraph FeatureMapping {
    rankdir=LR;
    edge [color="#2B4162", fontsize=10];
    node [shape="box", style="rounded, filled", fontsize=12, color="#cccccc"];
    rgb [label="RGB image", color="#708090", fontcolor="white"];
    depth [label="Depth image", color="#708090", fontcolor="white"];
    camera [label="Camera pose\n+ intrinsics", color="#708090", fontcolor="white"];
    radio [label="C-RADIO\npatch features", color="#558c8c", fontcolor="white"];
    obs [label="CameraObservation\ndepth + RGB + feature_grid", color="#76b900", fontcolor="white"];
    mapper [label="Mapper.integrate()\nblock-sparse TSDF", color="#76b900", fontcolor="white"];
    blocks [label="TSDF blocks\ngeometry + color + features", color="#558c8c", fontcolor="white"];
    pca [label="PCA colors\nfeature clusters", color="#708090", fontcolor="white"];
    text [label="Text matching\nSigLIP adaptor", color="#708090", fontcolor="white"];
    rgb -> radio -> obs;
    depth -> obs;
    camera -> obs;
    obs -> mapper -> blocks;
    blocks -> pca;
    blocks -> text;
}

RGB-D feature mapping data flow

Step 1: Download the dataset

This tutorial uses the same Sun3D indoor RGB-D scene as the volumetric mapping tutorial. It contains color images, depth maps, camera intrinsics, and ground-truth camera poses.

Quick start (downloads one scene, about 1400 MB):

wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-mit_76_studyroom-76-1studyroom2.zip
mkdir -p datasets/sun3d
unzip sun3d-mit_76_studyroom-76-1studyroom2.zip -d datasets/sun3d

The extracted directory should look like:

datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2/
    camera-intrinsics.txt
    <sequence_name>/
        000001.color.png
        000001.depth.png
        000001.pose.txt
        ...
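
In the 3DMatch export of Sun3D, camera-intrinsics.txt holds a 3x3 matrix and each *.pose.txt a 4x4 camera-to-world transform, both as whitespace-separated text (an assumption about the exact layout; verify against your download). That makes loading them a one-liner with numpy.loadtxt, demonstrated here on a synthetic pose file:

```python
import os
import tempfile
import numpy as np

def load_matrix(path: str, shape) -> np.ndarray:
    """Read a whitespace-separated text matrix, e.g. intrinsics (3, 3) or a pose (4, 4)."""
    return np.loadtxt(path).reshape(shape)

# Write a synthetic camera-to-world pose, then read it back (illustration only).
pose = np.eye(4)
pose[:3, 3] = [0.1, 0.2, 0.3]  # translation in meters
path = os.path.join(tempfile.mkdtemp(), "000001.pose.txt")
np.savetxt(path, pose)
loaded = load_matrix(path, (4, 4))
print(np.allclose(loaded, pose))  # True
```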

Step 2: Run a quick feature-fusion pass

Start with a small number of frames because C-RADIO inference is heavier than plain depth integration:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 50 \
    --stride 10 \
    --save-pca

When --save-pca is enabled, the tutorial writes side-by-side RGB and feature-PCA panels to ~/.cache/curobo/examples/feature_mapping/. The colors are not object labels; they are a three-dimensional PCA projection of high-dimensional feature vectors, so nearby colors usually indicate similar visual embeddings.
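
A panel like this can be assembled by projecting patch features to colors and upsampling them to image resolution. The sketch below substitutes random patch colors for real PCA output and uses nearest-neighbor upsampling; the 224-pixel image and 14x14 patch grid are illustrative sizes, not the example's actual configuration:

```python
import numpy as np

def make_panel(rgb: np.ndarray, patch_colors: np.ndarray) -> np.ndarray:
    """Concatenate an RGB image with patch colors upsampled to the same size."""
    H, W, _ = rgb.shape
    h, w, _ = patch_colors.shape
    # Nearest-neighbor upsample: repeat each patch color over its pixel footprint.
    up = patch_colors.repeat(H // h, axis=0).repeat(W // w, axis=1)
    return np.concatenate([rgb, up], axis=1)  # side-by-side: RGB | PCA(features)

rgb = np.zeros((224, 224, 3))
patch_colors = np.random.default_rng(1).random((14, 14, 3))  # one color per 16x16 patch
panel = make_panel(rgb, patch_colors)
print(panel.shape)  # (224, 448, 3)
```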

Step 3: Inspect the map interactively

Add --visualize to open a Viser server at http://localhost:8080:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 100 \
    --stride 5 \
    --visualize

The viewer shows:

  • /reconstruction/features_pca: occupied voxels colored by fused C-RADIO features projected through a map-wide PCA basis.

  • /reconstruction/rgb: occupied voxels colored by the TSDF color channel. This layer is hidden by default and can be toggled from the scene tree.

  • Current RGB and Current Feature PCA panels for the latest frame.


Step 4: Try text matching

Use --visualize to open the Text Matching panel. The example uses the C-RADIO v3-B SigLIP adaptor for text queries:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 100 \
    --stride 5 \
    --visualize

Enter a prompt in the panel to highlight the top matching TSDF blocks under /reconstruction/text_matched. Clicking Clear Matches removes the matched blocks, demonstrating how regions can be deleted from the dynamic map. For a geometric alternative, --clear-aabb xmin ymin zmin xmax ymax zmax clears all allocated blocks that intersect the given world-space bounds in meters.
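
Under the hood, prompt matching reduces to cosine similarity between each block's fused feature (projected into the SigLIP text space) and the encoded prompt, followed by a top-k selection. A toy version with random vectors standing in for real embeddings:

```python
import numpy as np

def top_matching_blocks(block_feats, text_feat, k=3):
    """Rank blocks by cosine similarity between fused features and a text embedding."""
    b = block_feats / np.linalg.norm(block_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = b @ t
    order = np.argsort(-sims)[:k]  # indices of the k best-matching blocks
    return order, sims[order]

rng = np.random.default_rng(2)
blocks = rng.normal(size=(50, 8))                 # 50 blocks, 8-dim toy features
query = blocks[7] + 0.01 * rng.normal(size=8)     # a query almost identical to block 7
idx, sims = top_matching_blocks(blocks, query)
print(idx[0])  # 7
```

In the real example the text embedding comes from the SigLIP adaptor, so block features and prompts live in a shared space and the same similarity ranking applies.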

Text Feature Alignment

Step 5: Check the output

When the tutorial finishes successfully you will see output similar to:

Loading Sun3D from ./datasets/sun3d...
Found 200 frames
Loading C-RADIO (c-radio_v3-B) via NVlabs/RADIO torch.hub...
Feature dim: 768
Mapper initialized: 64.0 MB
integrating: 100%|...

Mapper memory: 64.0 MB
PCA panels saved to: ~/.cache/curobo/examples/feature_mapping

Once you have run the tutorial, open curobo.examples.getting_started.feature_mapping in your editor. The inline comments walk through the key design decisions: why depth is filtered before integration, how C-RADIO patch features are attached to CameraObservation.feature_grid, how block features are visualized with PCA, and how optional text matching projects map features into the SigLIP adaptor space.