Feature Mapping¶
Fuse neural image features into a TSDF map and query semantic regions.
This tutorial extends curobo.examples.getting_started.volumetric_mapping
with a learned feature channel. cuRobo still integrates depth frames into a
block-sparse Truncated Signed Distance Field (TSDF), but each RGB frame is also
encoded by NVIDIA C-RADIO and passed to CameraObservation
as feature_grid. The mapper fuses those patch features into the allocated
TSDF blocks, so later queries can find parts of the 3D map that are visually or
semantically similar.
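Concretely, each frame ends up as a single CameraObservation that carries geometry, color, and the learned features together. A minimal sketch of that idea, where every field name other than feature_grid is an assumption about the CameraObservation signature rather than something taken from this example:
from curobo.types.camera import CameraObservation

# depth, rgb, K, and cam_pose come from the dataset loader; feat_grid is the
# (h, w, D) patch-feature grid produced by C-RADIO for this frame.
obs = CameraObservation(
    depth_image=depth,
    rgb_image=rgb,
    intrinsics=K,
    pose=cam_pose,
    feature_grid=feat_grid,  # learned patch features, fused alongside the TSDF
)
# The mapper then integrates depth, color, and features into its TSDF blocks;
# the exact integration call is shown in the example script itself.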
C-RADIO (Reduce All Domains Into One) distills multiple vision foundation models, including DINOv2, SAM, CLIP, and SigLIP, into one backbone. This example uses the C-RADIO v3-B checkpoint and its per-image patch embeddings in two beginner-friendly ways:
Project image or map features to RGB with Principal Component Analysis (PCA) so feature clusters can be inspected visually.
When the viewer is enabled, project block features through the fixed SigLIP adaptor and match them against text prompts such as
table or chair.
This example downloads C-RADIO v3-B through torch.hub on first use. The
first run must be able to reach NVlabs/RADIO on GitHub, download the
checkpoint, and install any missing RADIO dependencies.
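The download itself goes through the standard torch.hub entry point. A minimal sketch, assuming the radio_model entry point from the NVlabs/RADIO repository and the c-radio_v3-b version string reported in the example's log output:
import torch

# First use reaches GitHub, downloads the checkpoint, and caches it under the
# local torch.hub directory; later runs reuse the cache.
radio = torch.hub.load(
    "NVlabs/RADIO",
    "radio_model",
    version="c-radio_v3-b",
    progress=True,
)
radio = radio.cuda().eval()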
By the end of this tutorial you will have:
Loaded an RGB-D sequence from Sun3D
Extracted C-RADIO v3-B patch features for each selected RGB frame
Fused depth, color, and learned features into a TSDF map
Saved side-by-side RGB | PCA(features) images for quick inspection
Visualized the map and highlighted blocks that match a text prompt when --visualize is enabled
Before starting¶
Install the extra feature mapping dependencies. If your environment needs
Hugging Face authentication for checkpoint downloads, export HF_TOKEN before
running the example:
export HF_TOKEN=<your_huggingface_token>
uv pip install timm transformers torchvision einops
How the mapper uses features¶
The feature mapper follows the same geometry path as the volumetric mapping tutorial, with one extra input: a lower-resolution grid of learned patch features from the RGB image.
RGB-D feature mapping data flow¶
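A rough sketch of how that lower-resolution grid can be produced for one RGB frame, assuming the (summary, spatial_features) output convention and patch_size attribute of the public RADIO hub models:
import torch

def extract_feature_grid(radio, image):
    # image: (3, H, W) float tensor in [0, 1], with H and W divisible by the
    # model's patch size; radio is the torch.hub model loaded earlier.
    with torch.no_grad():
        summary, spatial = radio(image.unsqueeze(0))  # spatial: (1, T, D)
    p = radio.patch_size
    h, w = image.shape[1] // p, image.shape[2] // p
    # One feature vector per image patch, laid out as an (h, w, D) grid that
    # is much coarser than the depth image the mapper integrates.
    return spatial.reshape(h, w, -1)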
Step 1: Download the dataset¶
This tutorial uses the same Sun3D indoor RGB-D scene as the volumetric mapping tutorial. It contains color images, depth maps, camera intrinsics, and ground-truth camera poses.
Quick start (downloads one scene, about 1400 MB):
wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-mit_76_studyroom-76-1studyroom2.zip
mkdir -p datasets/sun3d
unzip sun3d-mit_76_studyroom-76-1studyroom2.zip -d datasets/sun3d
The extracted directory should look like:
datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2/
camera-intrinsics.txt
<sequence_name>/
000001.color.png
000001.depth.png
000001.pose.txt
...
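If you want to inspect the data before running the example, a single frame can be loaded with a few lines. This sketch assumes the 3DMatch conventions: 16-bit depth PNGs in millimeters, a 4x4 camera-to-world matrix in each .pose.txt, and a 3x3 matrix in camera-intrinsics.txt:
import numpy as np
import imageio.v3 as iio

root = "datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2"
seq = "<sequence_name>"  # replace with the extracted sequence directory

rgb = iio.imread(f"{root}/{seq}/000001.color.png")       # (H, W, 3) uint8
depth_mm = iio.imread(f"{root}/{seq}/000001.depth.png")  # (H, W) uint16
depth_m = depth_mm.astype(np.float32) / 1000.0           # depth in meters
pose = np.loadtxt(f"{root}/{seq}/000001.pose.txt")       # 4x4 camera-to-world
K = np.loadtxt(f"{root}/camera-intrinsics.txt")          # 3x3 intrinsics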
Step 2: Run a quick feature-fusion pass¶
Start with a small number of frames because C-RADIO inference is heavier than plain depth integration:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 50 \
--stride 10 \
--save-pca
When --save-pca is enabled, the tutorial writes side-by-side RGB and
feature-PCA panels to ~/.cache/curobo/examples/feature_mapping/. The colors
are not object labels; they are a three-dimensional PCA projection of
high-dimensional feature vectors, so nearby colors usually indicate similar
visual embeddings.
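The PCA coloring itself is only a few lines. A minimal sketch of the idea, independent of the exact helper names used in the example:
import torch

def features_to_rgb(features):
    # features: (N, D) patch or voxel embeddings.
    # Project onto the top-3 principal components and rescale to [0, 1],
    # so every feature vector becomes one RGB color.
    centered = features - features.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(centered, q=3)
    proj = centered @ v[:, :3]                  # (N, 3)
    lo = proj.min(dim=0).values
    hi = proj.max(dim=0).values
    return (proj - lo) / (hi - lo + 1e-8)       # (N, 3) colors in [0, 1]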
Step 3: Inspect the map interactively¶
Add --visualize to open a Viser server at
http://localhost:8080:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 100 \
--stride 5 \
--visualize
The viewer shows:
/reconstruction/features_pca: occupied voxels colored by fused C-RADIO features projected through a map-wide PCA basis.
/reconstruction/rgb: occupied voxels colored by the TSDF color channel. This layer is hidden by default and can be toggled from the scene tree.
Current RGB and Current Feature PCA panels for the latest frame.
Step 4: Try text matching¶
Use --visualize to open the Text Matching panel. The example uses the
C-RADIO v3-B SigLIP adaptor for text queries:
python -m curobo.examples.getting_started.feature_mapping \
--root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
--num-frames 100 \
--stride 5 \
--visualize
Enter a prompt in the panel to highlight the top matching TSDF blocks under
/reconstruction/text_matched. Clear Matches demonstrates how matched
blocks can be removed from the dynamic map. For a geometric clearing example,
--clear-aabb xmin ymin zmin xmax ymax zmax clears all allocated blocks that
intersect the given world-space bounds in meters.
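Under the hood, text matching is a cosine-similarity ranking between the prompt embedding and block features projected into the adaptor's language-aligned space. A rough sketch of the ranking step, assuming the block features have already been projected into the same space as the SigLIP text embedding (the adaptor call itself is specific to the RADIO release and omitted here):
import torch
import torch.nn.functional as F

def top_matching_blocks(block_feats, text_feat, k=20):
    # block_feats: (N, D) per-block features in the SigLIP text space.
    # text_feat:   (D,) embedding of the prompt from the SigLIP text encoder.
    sim = F.cosine_similarity(block_feats, text_feat.unsqueeze(0), dim=-1)
    scores, idx = torch.topk(sim, k=min(k, sim.numel()))
    return idx, scores  # indices and scores of the highlighted TSDF blocks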
Step 5: Check the output¶
When the tutorial finishes successfully you will see output similar to:
Loading Sun3D from ./datasets/sun3d...
Found 200 frames
Loading C-RADIO (c-radio_v3-B) via NVlabs/RADIO torch.hub...
Feature dim: 768
Mapper initialized: 64.0 MB
integrating: 100%|...
Mapper memory: 64.0 MB
PCA panels saved to: ~/.cache/curobo/examples/feature_mapping
Once you have run the tutorial, open
curobo.examples.getting_started.feature_mapping in your editor.
The inline comments walk through the key design decisions: why depth is
filtered before integration, how C-RADIO patch features are attached to
CameraObservation.feature_grid, how block features are visualized with PCA,
and how optional text matching projects map features into the SigLIP adaptor
space.