curobo.examples.getting_started.feature_mapping module

Fuse neural image features into a TSDF map and query semantic regions.

This tutorial extends curobo.examples.getting_started.volumetric_mapping with a learned feature channel. cuRobo still integrates depth frames into a block-sparse Truncated Signed Distance Field (TSDF), but each RGB frame is also encoded by NVIDIA C-RADIO and passed to CameraObservation as feature_grid. The mapper fuses those patch features into the allocated TSDF blocks, so later queries can find parts of the 3D map that are visually or semantically similar.

Feature Integration

C-RADIO (Reduce All Domains Into One) distills multiple vision foundation models, including DINOv2, SAM, CLIP, and SigLIP, into one backbone. This example uses the C-RADIO v3-B checkpoint and its per-image patch embeddings in two beginner-friendly ways:

  • Project image or map features to RGB with Principal Component Analysis (PCA) so feature clusters can be inspected visually.

  • When the viewer is enabled, project block features through the fixed SigLIP adaptor and match them against text prompts such as "table" or "chair".

This example downloads C-RADIO v3-B through torch.hub on first use. The first run must be able to reach NVlabs/RADIO on GitHub, download the checkpoint, and install any missing RADIO dependencies.
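For reference, the download is a standard torch.hub call along these lines (a minimal sketch; the exact version key is an assumption and is normally resolved by resolve_torchhub_version below):

import torch

# Load C-RADIO through NVlabs/RADIO's torch.hub entry point.
# The version key 'c-radio_v3-b' is an assumed mapping of 'c-radio_v3-B'.
model = torch.hub.load("NVlabs/RADIO", "radio_model",
                       version="c-radio_v3-b", progress=True)
model = model.eval().to("cuda")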

By the end of this tutorial you will have:

  • Loaded an RGB-D sequence from Sun3D

  • Extracted C-RADIO v3-B patch features for each selected RGB frame

  • Fused depth, color, and learned features into a TSDF map

  • Saved side-by-side RGB | PCA(features) images for quick inspection

  • Visualized the map and highlighted blocks that match a text prompt when --visualize is enabled

Before starting

Install the extra feature mapping dependencies. If your environment needs Hugging Face authentication for checkpoint downloads, export HF_TOKEN before running the example:

export HF_TOKEN=<your_huggingface_token>
uv pip install timm transformers torchvision einops

How the mapper uses features

The feature mapper follows the same geometry path as the volumetric mapping tutorial, with one extra input: a lower-resolution grid of learned patch features from the RGB image.

digraph FeatureMapping {
    rankdir=LR;
    edge [color="#2B4162", fontsize=10];
    node [shape="box", style="rounded, filled", fontsize=12, color="#cccccc"];
    rgb [label="RGB image", color="#708090", fontcolor="white"];
    depth [label="Depth image", color="#708090", fontcolor="white"];
    camera [label="Camera pose\n+ intrinsics", color="#708090", fontcolor="white"];
    radio [label="C-RADIO\npatch features", color="#558c8c", fontcolor="white"];
    obs [label="CameraObservation\ndepth + RGB + feature_grid", color="#76b900", fontcolor="white"];
    mapper [label="Mapper.integrate()\nblock-sparse TSDF", color="#76b900", fontcolor="white"];
    blocks [label="TSDF blocks\ngeometry + color + features", color="#558c8c", fontcolor="white"];
    pca [label="PCA colors\nfeature clusters", color="#708090", fontcolor="white"];
    text [label="Text matching\nSigLIP adaptor", color="#708090", fontcolor="white"];
    rgb -> radio -> obs;
    depth -> obs;
    camera -> obs;
    obs -> mapper -> blocks;
    blocks -> pca;
    blocks -> text;
}

RGB-D feature mapping data flow
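In code, one fused frame follows the diagram above. A minimal sketch, assuming the tutorial's CRadioInference helper and hedging on CameraObservation's exact constructor arguments:

# Encode the RGB frame, attach the patch features, and integrate.
feats = radio.extract_patch_features(rgb)   # (H_p, W_p, D) patch features
obs = CameraObservation(
    depth_image=depth,                      # (H, W) metric depth
    rgb_image=rgb,                          # (H, W, 3) uint8 color
    feature_grid=feats,                     # lower-resolution learned features
    pose=camera_pose,                       # camera-to-world pose
    intrinsics=intrinsics,
)
mapper.integrate(obs)                       # fuse into the block-sparse TSDF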

Step 1: Download the dataset

This tutorial uses the same Sun3D indoor RGB-D scene as the volumetric mapping tutorial. It contains color images, depth maps, camera intrinsics, and ground-truth camera poses.

Quick start (downloads one scene, about 1400 MB):

wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-mit_76_studyroom-76-1studyroom2.zip
mkdir -p datasets/sun3d
unzip sun3d-mit_76_studyroom-76-1studyroom2.zip -d datasets/sun3d

The extracted directory should look like:

datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2/
    camera-intrinsics.txt
    <sequence_name>/
        000001.color.png
        000001.depth.png
        000001.pose.txt
        ...
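The intrinsics and per-frame poses are plain-text matrices, so they are easy to inspect. A hedged sketch, assuming the usual 3DMatch/Sun3D conventions (3x3 intrinsics, 4x4 camera-to-world poses, 16-bit depth PNGs in millimeters):

import numpy as np

root = "datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2"
K = np.loadtxt(f"{root}/camera-intrinsics.txt")               # (3, 3) intrinsics
pose = np.loadtxt(f"{root}/<sequence_name>/000001.pose.txt")  # (4, 4) cam-to-world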

Step 2: Run a quick feature-fusion pass

Start with a small number of frames because C-RADIO inference is heavier than plain depth integration:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 50 \
    --stride 10 \
    --save-pca

When --save-pca is enabled, the tutorial writes side-by-side RGB and feature-PCA panels to ~/.cache/curobo/examples/feature_mapping/. The colors are not object labels; they are a three-dimensional PCA projection of high-dimensional feature vectors, so nearby colors usually indicate similar visual embeddings.
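The PCA coloring itself is a small amount of tensor math. A minimal sketch of the idea, not the tutorial's exact implementation, using the same 2%/98% percentile normalization:

import torch

def pca_rgb(feats_flat: torch.Tensor) -> torch.Tensor:
    """Map (N, D) features to (N, 3) uint8 colors via a 3-component PCA."""
    x = feats_flat - feats_flat.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(x, q=3)       # v: (D, 3) principal directions
    proj = x @ v                              # (N, 3) projected features
    lo = proj.quantile(0.02, dim=0)           # robust per-channel range
    hi = proj.quantile(0.98, dim=0)
    rgb = ((proj - lo) / (hi - lo + 1e-8)).clamp(0.0, 1.0)
    return (rgb * 255.0).to(torch.uint8)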

Step 3: Inspect the map interactively

Add --visualize to open a Viser server at http://localhost:8080:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 100 \
    --stride 5 \
    --visualize

The viewer shows:

  • /reconstruction/features_pca: occupied voxels colored by fused C-RADIO features projected through a map-wide PCA basis.

  • /reconstruction/rgb: occupied voxels colored by the TSDF color channel. This layer is hidden by default and can be toggled from the scene tree.

  • Current RGB and Current Feature PCA panels for the latest frame.

Feature Integration

Step 4: Try text matching

Use --visualize to open the Text Matching panel. The example uses the C-RADIO v3-B SigLIP adaptor for text queries:

python -m curobo.examples.getting_started.feature_mapping \
    --root ./datasets/sun3d/sun3d-mit_76_studyroom-76-1studyroom2 \
    --num-frames 100 \
    --stride 5 \
    --visualize

Enter a prompt in the panel to highlight the top matching TSDF blocks under /reconstruction/text_matched. Clear Matches demonstrates how matched blocks can be removed from the dynamic map. For a geometric clearing example, --clear-aabb xmin ymin zmin xmax ymax zmax clears all allocated blocks that intersect the given world-space bounds in meters.
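Under the hood, matching reduces to cosine similarity in the adaptor's text-aligned space. A hedged sketch using this tutorial's CRadioInference helpers (the value of k and the variable names are illustrative):

text_feat = radio.encode_text(["chair"])         # (1, D_teacher), L2-normalized
block_feat = radio.project_features(map_feats)   # (N, D_teacher), L2-normalized
scores = (block_feat @ text_feat.T).squeeze(1)   # (N,) cosine similarities
top_blocks = scores.topk(k=20).indices           # best-matching TSDF blocks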

Text Feature Alignment

Step 5: Check the output

When the tutorial finishes successfully you will see output similar to:

Loading Sun3D from ./datasets/sun3d...
Found 200 frames
Loading C-RADIO (c-radio_v3-B) via NVlabs/RADIO torch.hub...
Feature dim: 768
Mapper initialized: 64.0 MB
integrating: 100%|...

Mapper memory: 64.0 MB
PCA panels saved to: ~/.cache/curobo/examples/feature_mapping

class CRadioInference(model_name='c-radio_v3-B', device='cuda:0', text_adaptor_name=None)

Bases: object

Own all C-RADIO neural-network inference used by this tutorial.

The mapper itself is not a neural network: it fuses depth, color, and feature tensors into a TSDF map. This class keeps the learned pieces together so the rest of the example can treat them as three simple operations:

  1. extract patch features from an RGB image;

  2. optionally encode text with the requested RADIO adaptor;

  3. optionally project map features into the adaptor’s text-aligned space.
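A hedged sketch of those three operations in order (the adaptor name 'siglip' is an assumption; pass whatever text_adaptor_name your checkpoint exposes):

radio = CRadioInference(model_name='c-radio_v3-B', device='cuda:0',
                        text_adaptor_name='siglip')   # adaptor name assumed
feats = radio.extract_patch_features(rgb)     # 1. RGB image -> patch features
text = radio.encode_text(["table"])           # 2. text -> teacher-space features
aligned = radio.project_features(map_feats)   # 3. map features -> text-aligned space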

Parameters:
  • model_name (str)

  • device (str)

  • text_adaptor_name (str | None)

__init__(model_name='c-radio_v3-B', device='cuda:0', text_adaptor_name=None)

Parameters:
  • model_name (str)

  • device (str)

  • text_adaptor_name (str | None)

static resolve_torchhub_version(model_name)

Map a C-RADIO model id to NVlabs/RADIO’s torch.hub version key.

Return type:

str

Parameters:

model_name (str)

extract_patch_features(rgb_uint8)

Extract patch features from one RGB image.

Parameters:

rgb_uint8 (Tensor) – (H, W, 3) uint8 image on self.device.

Return type:

Tensor

Returns:

(H_p, W_p, D) float32 feature tensor, where H_p = target_h // patch_size and similarly for W_p.
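For example, assuming rgb is an (H, W, 3) uint8 tensor already on self.device:

feats = radio.extract_patch_features(rgb)   # e.g. (H_p, W_p, 768) for C-RADIO v3-B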

encode_text(text)

Encode one or more strings to (N, D_teacher) L2-normalized features.

Return type:

Tensor

project_features(features)

Project (N, D_radio) map features to L2-normalized teacher features.

Return type:

Tensor

Parameters:

features (torch.Tensor)

pca_colorize_tensor(feats_flat, prev_basis=None, low_pct=0.02, high_pct=0.98)

Fit or reuse a 3-component PCA on (N, D) features and map to RGB.

Returns (colors, basis) where colors is (N, 3) uint8 and basis is (D, 3) float32. Non-finite rows are dropped from the fit and receive black in the output so bad inputs are visually obvious but don’t poison the principal directions. When a compatible prev_basis is provided, it is reused instead of refitting so the expensive PCA/SVD-style solve happens only once.

Return type:

Tuple[Tensor, Tensor]

Parameters:
  • feats_flat (Tensor)

  • prev_basis (Tensor | None)

  • low_pct (float)

  • high_pct (float)

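A hedged usage sketch of the basis reuse (variable names are illustrative):

colors, basis = pca_colorize_tensor(first_feats)                 # first frame: fit
colors2, _ = pca_colorize_tensor(later_feats, prev_basis=basis)  # later frames: reuse
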
pca_colorize_with_basis(feats, prev_basis=None, low_pct=0.02, high_pct=0.98)

Project (H, W, D) features to an (H, W, 3) uint8 image via PCA.

Return type:

Tuple[ndarray, Tensor]

Parameters:
  • feats (Tensor)

  • prev_basis (Tensor | None)

  • low_pct (float)

  • high_pct (float)

pca_colorize(feats, low_pct=0.02, high_pct=0.98)

Project (H, W, D) features to an (H, W, 3) uint8 image via PCA.

Return type:

ndarray

Parameters:
  • feats (Tensor)

  • low_pct (float)

  • high_pct (float)

upsample_nn(img_uint8, target_hw)

Nearest-neighbor upsample (H, W, 3) uint8 to target_hw.

Return type:

ndarray

Parameters:
  • img_uint8 (ndarray)

  • target_hw

downsample_for_gui(img_uint8, max_width=320)

Cheap preview downsample for viser GUI image widgets.

Return type:

ndarray

Parameters:
  • img_uint8 (ndarray)

  • max_width (int)

show_empty_reconstruction(visualizer, voxel_size)

Publish empty RGB and feature-PCA point clouds to clear the viewer.

Return type:

None

Parameters:

voxel_size (float)

show_feature_reconstruction(visualizer, voxels, block_colors_pca, voxel_size)

Draw occupied voxels as RGB and feature-PCA point clouds in Viser.

Return type:

None

Parameters:

voxel_size (float)

process_frame(obs, mapper, feature_model, depth_filter, prev_pca_basis=None, surface_only=True, extract_voxels=False, timer=None)

Integrate one RGB-D frame and optionally prepare visualization data.

The mapper expects a batched CameraObservation, even when this tutorial uses one camera. This helper keeps the per-frame flow in one place: clean depth, extract C-RADIO features, integrate into the mapper, and optionally extract occupied voxels for the live PCA point cloud.

Returns:

(feats, voxels, block_colors_pca, pca_basis, tsdf_time_ms). feats is the raw (H_p, W_p, D) RADIO patch map used for per-image PCA.

Parameters:
  • obs (CameraObservation)

  • mapper (Mapper)

  • feature_model (CRadioInference)

  • depth_filter

  • prev_pca_basis (Tensor | None)

  • surface_only (bool)

  • extract_voxels (bool)

  • timer

main()