curobo.perception module

Perception utilities module.

This module provides sensor processing utilities for robot perception tasks, including robot segmentation and volumetric mapping from depth images.

Example (robot segmentation):

```python
from curobo.perception import RobotSegmenter
from curobo.kinematics import KinematicsCfg
from curobo.types import CameraObservation, JointState

# Create segmenter
segmenter = RobotSegmenter.from_robot_file(
    robot_file="franka.yml",
    distance_threshold=0.05,
)

# Segment robot from depth image
camera_obs = CameraObservation(
    depth_image=depth_tensor,
    intrinsics=camera_intrinsics,
    pose=camera_pose,
)
joint_state = JointState(position=current_joint_angles)

mask, filtered_depth = segmenter.get_robot_mask(camera_obs, joint_state)
# mask: binary mask of robot pixels
# filtered_depth: depth image with robot removed
```

Example (volumetric mapping):

```python
from curobo.perception import FilterDepth, Mapper, MapperCfg

mapper = Mapper(MapperCfg(voxel_size=0.02, ...))
depth_filter = FilterDepth(image_shape=depth.shape[-2:])
filtered, valid = depth_filter(depth.unsqueeze(0))
mapper.integrate(camera_obs)
```

Example (robot pose estimation):

```python
from curobo.perception import (
    DetectorCfg,
    PoseDetector,
    RobotMesh,
    SDFDetectorCfg,
    SDFPoseDetector,
)

mesh = RobotMesh.from_kinematics(kinematics, initial_joint_angles=joint_state)
detector = PoseDetector(mesh, DetectorCfg(...))
result = detector.detect_from_points(pointcloud, joint_state)  # DetectionResult
```

class DetectorCfg(
n_mesh_points_coarse=500,
n_observed_points_coarse=2000,
n_rotation_samples=64,
n_iterations_coarse=50,
distance_threshold_coarse=0.5,
n_mesh_points_fine=2000,
n_observed_points_fine=10000,
n_iterations_fine=50,
distance_threshold_fine=0.01,
use_svd=False,
use_huber_loss=True,
huber_delta=0.02,
save_iterations=False,
device_cfg=<factory>,
)

Bases: object

Configuration for pose detector (point-to-plane ICP with Huber loss).

Parameters:
n_mesh_points_coarse: int = 500
n_observed_points_coarse: int = 2000
n_rotation_samples: int = 64
n_iterations_coarse: int = 50
distance_threshold_coarse: float = 0.5
n_mesh_points_fine: int = 2000
n_observed_points_fine: int = 10000
n_iterations_fine: int = 50
distance_threshold_fine: float = 0.01
use_svd: bool = False
use_huber_loss: bool = True
huber_delta: float = 0.02
save_iterations: bool = False
device_cfg: curobo._src.types.device_cfg.DeviceCfg
__init__(
n_mesh_points_coarse=500,
n_observed_points_coarse=2000,
n_rotation_samples=64,
n_iterations_coarse=50,
distance_threshold_coarse=0.5,
n_mesh_points_fine=2000,
n_observed_points_fine=10000,
n_iterations_fine=50,
distance_threshold_fine=0.01,
use_svd=False,
use_huber_loss=True,
huber_delta=0.02,
save_iterations=False,
device_cfg=<factory>,
)
Parameters:
Return type:

None

class FilterDepth(
image_shape,
depth_minimum_distance=0.1,
depth_maximum_distance=10.0,
flying_pixel_threshold=0.5,
bilateral_kernel_size=5,
bilateral_sigma_spatial=10.0,
bilateral_sigma_depth=0.1,
device='cuda',
num_batch=1,
)

Bases: object

GPU-accelerated depth filtering using fused Warp kernels.

Combines range filtering, flying pixel detection, and bilateral smoothing into a single efficient GPU pass. Pre-allocates buffers at initialization for zero-allocation filtering during runtime.

Operates strictly on batched depth tensors of shape (B, H, W). Set num_batch > 1 at init to pre-allocate batched output buffers so all images in the batch are filtered in a single kernel launch. Single-image callers must explicitly pass depth.unsqueeze(0) and index filtered[0] on return.
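A minimal sketch of the single-image path described above (the tensor shapes and values are illustrative):

```python
import torch

from curobo.perception import FilterDepth

# Illustrative (H, W) depth image in meters, already on the GPU.
depth = torch.rand(480, 640, device="cuda") * 5.0

depth_filter = FilterDepth(image_shape=(480, 640), num_batch=1)

# FilterDepth operates strictly on (B, H, W) tensors: add the batch
# dimension on the way in and index it away on the way out.
filtered_batch, valid_batch = depth_filter(depth.unsqueeze(0))
filtered, valid = filtered_batch[0], valid_batch[0]
```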

Parameters:
  • image_shape (Tuple[int, int]) – Tuple of (height, width) for each depth image.

  • depth_minimum_distance (float) – Minimum valid depth in meters. Default: 0.1m.

  • depth_maximum_distance (float) – Maximum valid depth in meters. Default: 10.0m.

  • flying_pixel_threshold (Optional[float]) – Filter aggressiveness 0.0-1.0 for flying pixels. Higher = more aggressive. Set to None to disable. Default: 0.5.

  • bilateral_kernel_size (Optional[int]) – Size of bilateral kernel (must be odd). Set to None to disable bilateral filtering. Default: 5.

  • bilateral_sigma_spatial (float) – Spatial sigma in pixels. Default: 10.0.

  • bilateral_sigma_depth (float) – Depth sigma in meters. Default: 0.1.

  • device (str) – CUDA device. Default: "cuda".

  • num_batch (int) – Pre-allocate buffers for this batch size. Default: 1.

Initialize FilterDepth with configuration and pre-allocated buffers.

__init__(
image_shape,
depth_minimum_distance=0.1,
depth_maximum_distance=10.0,
flying_pixel_threshold=0.5,
bilateral_kernel_size=5,
bilateral_sigma_spatial=10.0,
bilateral_sigma_depth=0.1,
device='cuda',
num_batch=1,
)

Initialize FilterDepth with configuration and pre-allocated buffers.

Parameters:
  • image_shape (Tuple[int, int])

  • depth_minimum_distance (float)

  • depth_maximum_distance (float)

  • flying_pixel_threshold (float | None)

  • bilateral_kernel_size (int | None)

  • bilateral_sigma_spatial (float)

  • bilateral_sigma_depth (float)

  • device (str)

  • num_batch (int)

update_config(
depth_minimum_distance=None,
depth_maximum_distance=None,
flying_pixel_threshold=None,
bilateral_sigma_depth=None,
)

Update filter configuration without reallocating buffers.

Only updates the specified parameters. Set to None to keep current value.

Parameters:
  • depth_minimum_distance (Optional[float]) – New minimum depth.

  • depth_maximum_distance (Optional[float]) – New maximum depth.

  • flying_pixel_threshold (Optional[float]) – New flying pixel threshold (set to 0 to disable).

  • bilateral_sigma_depth (Optional[float]) – New bilateral depth sigma.
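A short sketch of runtime retuning, reusing the depth_filter instance from the sketch above:

```python
# Tighten the valid range and filter flying pixels more aggressively,
# without reallocating the pre-allocated buffers.
depth_filter.update_config(
    depth_minimum_distance=0.2,
    depth_maximum_distance=4.0,
    flying_pixel_threshold=0.8,
)
# bilateral_sigma_depth was left as None, so it keeps its current value.
```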

classmethod from_config(
config,
image_shape,
device='cuda',
num_batch=1,
)

Create FilterDepth from a configuration object.

Parameters:
  • config (FilterDepthConfig) – FilterDepthConfig with filter parameters.

  • image_shape (Tuple[int, int]) – Tuple of (height, width) for depth images.

  • device (str) – CUDA device.

  • num_batch (int) – Pre-allocate buffers for this batch size.

Return type:

FilterDepth

Returns:

Configured FilterDepth instance.

class Mapper(config)

Bases: object

Volumetric mapper using block-sparse storage.

Encapsulates block-sparse TSDF integrator with ESDF generation. Provides a simplified API compared to BlockSparseESDFIntegrator.

Parameters:

config (MapperCfg) – MapperCfg with grid and integration parameters.

Example

```python
config = MapperCfg(
    extent_meters_xyz=(2.0, 2.0, 2.0),
    voxel_size=0.005,
)
mapper = Mapper(config)

for obs in observations:
    mapper.integrate(obs)

voxel_grid = mapper.compute_esdf()
mesh = mapper.extract_mesh()
```

Initialize Mapper with configuration.

__init__(config)

Initialize Mapper with configuration.

Parameters:

config (curobo._src.perception.mapper.mapper_cfg.MapperCfg)

property integrator: curobo._src.perception.mapper.integrator_esdf.BlockSparseESDFIntegrator

Access underlying integrator for advanced operations.

property tsdf: curobo._src.perception.mapper.storage.BlockSparseTSDF

Access the BlockSparseTSDF storage.

integrate(observation)

Integrate batched depth observation into TSDF.

The observation must have a leading camera dimension matching config.num_cameras. See BlockSparseTSDFIntegrator.integrate for the expected tensor shapes.

Parameters:

observation (CameraObservation) – Batched camera observation.

Return type:

None
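A hedged sketch of the camera-dimension contract (field names follow the module-level example; the exact shapes are defined by BlockSparseTSDFIntegrator.integrate):

```python
# With config.num_cameras == 1, each per-camera tensor carries a leading
# camera dimension of size 1.
camera_obs = CameraObservation(
    depth_image=depth_tensor.unsqueeze(0),      # (num_cameras, H, W)
    intrinsics=camera_intrinsics.unsqueeze(0),  # (num_cameras, 3, 3)
    pose=camera_pose,                           # camera-to-world Pose
)
mapper.integrate(camera_obs)  # updates the TSDF in place, returns None
```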

clear_region(
bounds_min,
bounds_max,
)

Clear dynamic map contents inside a conservative world-space AABB.

Blocks remain allocated. The cached ESDF voxel grid is invalidated and must be recomputed with compute_esdf.

Return type:

int

clear_blocks(pool_indices)

Clear dynamic map contents for explicit block-pool indices.

Return type:

int

compute_esdf(
esdf_origin=None,
esdf_voxel_size=None,
)

Compute ESDF from current TSDF and return as VoxelGrid.

Parameters:
  • esdf_origin (Optional[Tensor]) – Override ESDF origin (for sliding window).

  • esdf_voxel_size (Optional[float]) – Override ESDF voxel size (meters).

Return type:

VoxelGrid

Returns:

VoxelGrid containing the signed distance field for collision checking.

extract_mesh(
refine_iterations=2,
surface_only=True,
)

Extract mesh using GPU marching cubes.

Parameters:
  • refine_iterations (int) – Newton-Raphson iterations for vertex refinement.

  • surface_only (bool) – Only extract mesh near surface.

Return type:

Mesh

Returns:

Mesh object with vertices, faces, and colors.

extract_occupied_voxels(
surface_only=False,
sdf_threshold=None,
)

Extract occupied voxel centers and per-voxel block indices.

Parameters:
  • surface_only (bool) – If True, only extract voxels near the TSDF surface. If False, include inside voxels too.

  • sdf_threshold (Optional[float]) – Surface threshold. Defaults to the underlying TSDF integrator’s voxel size.

Return type:

OccupiedVoxels

Returns:

OccupiedVoxels with voxel centers, per-voxel block pool indices, and a view over per-block storage.
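A short sketch combining extraction with the OccupiedVoxels accessors documented further below:

```python
# Extract near-surface voxels and gather per-voxel colors.
occupied = mapper.extract_occupied_voxels(surface_only=True)

centers = occupied.centers              # (N, 3) float32 world positions
colors = occupied.colors_uint8()        # (N, 3) uint8 RGB
blocks = occupied.block_idx_per_voxel   # (N,) int32 pool indices
```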

extract_matching_feature_voxels(
feature_vector,
top_k,
surface_only=False,
sdf_threshold=None,
minimum_score=None,
feature_projector=None,
)

Extract voxels from the top-k blocks most similar (cosine) to feature_vector. Requires MapperCfg.feature_dim > 0.

minimum_score (cosine similarity in the query feature space) drops matched blocks below the cut after the top-k selection, so the voxel-extraction kernel skips work in proportion to the number of filtered-out blocks. Pair a generous top_k with minimum_score to get all matches above a quality threshold.

Returns a MatchedVoxels carrying the extracted voxels plus parallel (K,) block_pool_idx / block_scores tensors in descending score order. block_pool_idx can be passed directly to clear_blocks to drop the matched blocks from the map.

feature_projector optionally maps stored block features into the query space before scoring. This keeps model-specific heads (e.g. RADIO -> SigLIP) outside the mapper while still using the mapper’s all-allocated-block scoring and extraction path.

Return type:

MatchedVoxels

Parameters:
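A hedged sketch of the top_k + minimum_score pattern described above, ending with block removal (the query vector, the threshold, and the feature_dim variable are illustrative):

```python
import torch

# Query feature in the same space as the stored block features
# (requires MapperCfg.feature_dim > 0); random here for illustration.
query_feature = torch.randn(feature_dim, device="cuda")

matched = mapper.extract_matching_feature_voxels(
    feature_vector=query_feature,
    top_k=64,            # generous candidate pool ...
    minimum_score=0.7,   # ... cut down to matches above the quality bar
    surface_only=True,
)

# block_pool_idx / block_scores are parallel (K,) tensors, descending.
mapper.clear_blocks(matched.block_pool_idx)  # drop matched blocks
```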
get_matching_feature_voxels(
feature_vector,
top_k,
surface_only=False,
sdf_threshold=None,
minimum_score=None,
feature_projector=None,
)

Compatibility wrapper for extract_matching_feature_voxels.

Return type:

MatchedVoxels

Parameters:
reset()

Reset mapper for new scene.

Return type:

None

get_stats(
scan_pool=True,
scan_hash=False,
)

Get mapper statistics.

Parameters:
  • scan_pool (bool) – Forwarded to the integrator’s stats call. When True (default) returns the ground-truth active_blocks and the holes invariant via an O(num_allocated) GPU reduction. When False, falls back to the cheap num_allocated - free_count approximation.

  • scan_hash (bool) – Forwarded to the integrator’s stats call. When True, adds hash table occupancy stats via an O(hash_capacity) reduction. Periodic use only.

Return type:

dict

Returns:

Dictionary with block usage, memory stats, frame count, and last_integration_kernel_timings_ms. See BlockSparseTSDF.get_stats for the full block-pool key list.

memory_usage_mb()

Get total GPU memory usage in megabytes.

Return type:

float

update_static_obstacles(
scene,
env_idx=0,
)

Update static obstacles in the TSDF from scene collision tensors.

This method clears the static SDF channel and re-stamps all primitives from the provided scene. The static channel does not decay and is combined with the dynamic (depth) channel using min() for collision.

Note: This is not intended to be called every frame. Use it when static obstacle poses change (e.g., after robot reconfiguration).

Parameters:
  • scene (SceneData) – Scene collision tensors containing cuboids and meshes.

  • env_idx (int) – Environment index for multi-environment scenes.

Return type:

None

render(
intrinsics,
pose,
image_shape,
)

Render depth and normals from current TSDF.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics (3, 3) or (4,) as [fx, fy, cx, cy].

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of (depth, normals, valid_mask).
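A hedged sketch of rendering from the live map (the pinhole intrinsics are illustrative):

```python
import torch

intrinsics = torch.tensor([600.0, 600.0, 320.0, 240.0], device="cuda")  # [fx, fy, cx, cy]

depth, normals, valid_mask = mapper.render(
    intrinsics=intrinsics,
    pose=camera_pose,        # camera-to-world Pose
    image_shape=(480, 640),
)
# depth: (H, W) meters; normals: (H, W, 3); valid_mask: (H, W) bool
```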

render_color(
intrinsics,
pose,
image_shape,
)

Render depth, normals, and color from current TSDF.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics (3, 3) or (4,) as [fx, fy, cx, cy].

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]

Returns:

Tuple of (depth, normals, color, valid_mask):
  • depth: (H, W) in meters

  • normals: (H, W, 3) world-frame normals

  • color: (H, W, 3) uint8 RGB

  • valid_mask: (H, W) bool

render_depth(
intrinsics,
pose,
image_shape,
)

Render only depth image.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics.

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Returns:

depth: (H, W) depth image in meters.

Return type:

Tensor

render_color_only(
intrinsics,
pose,
image_shape,
)

Render only color image.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics.

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Returns:

color: (H, W, 3) uint8 RGB image.

Return type:

Tensor

render_shaded(
intrinsics,
pose,
image_shape,
light_direction=(0.0, 0.0, 1.0),
ambient=1.0,
use_color=True,
)

Render Lambertian-shaded image.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics.

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

  • light_direction (Tuple[float, float, float]) – Light direction in camera frame.

  • ambient (float) – Ambient lighting factor (0-1).

  • use_color (bool) – If True, use TSDF color. If False, use gray.

Returns:

shaded: (H, W, 3) uint8 RGB shaded image.

Return type:

Tensor

render_depth_colormap(
intrinsics,
pose,
image_shape,
)

Render depth as colormap for visualization.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics.

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Returns:

colormap: (H, W, 3) uint8 RGB colormap.

Return type:

Tensor

render_normal_colormap(
intrinsics,
pose,
image_shape,
)

Render normals as colormap for visualization.

Parameters:
  • intrinsics (Tensor) – Camera intrinsics.

  • pose (Pose) – Camera-to-world transform as Pose.

  • image_shape (Tuple[int, int]) – (height, width) of output images.

Returns:

colormap: (H, W, 3) uint8 RGB colormap.

Return type:

Tensor

class MapperCfg(
extent_meters_xyz,
voxel_size=0.005,
esdf_voxel_size=0.05,
extent_esdf_meters_xyz=None,
grid_center=None,
truncation_distance=0.04,
minimum_tsdf_weight=0.1,
depth_minimum_distance=0.1,
depth_maximum_distance=10.0,
decay_factor=1.0,
frustum_decay_factor=1.0,
block_size=4,
hash_load_factor=0.5,
roughness=3.0,
seeding_method='gather',
edt_solver='pba',
enable_static=False,
static_obstacle_color=(20, 20, 20),
num_cameras=1,
image_height=None,
image_width=None,
feature_dim=0,
feature_grid_height=None,
feature_grid_width=None,
max_visible_blocks_per_integration=None,
max_support_pixels_per_block_camera=8,
feature_channels_per_thread=8,
max_feature_tile_channels=4096,
feature_integration_kernel='auto',
profile_integration_kernel_timings=False,
accumulator_w_max=1000.0,
device='cuda:0',
)

Bases: object

Configuration for volumetric mapping.

Coordinate Convention:
  • extent_meters_xyz: (x, y, z) physical extent of voxel bounds (not centers)

  • grid_shape: (nz, ny, nx) voxel counts - Z is slowest, X is fastest

  • Memory layout: row-major with X contiguous (optimal for image projection)

  • grid_center: world coordinate at center of grid (default: origin)

Note: Actual extent may be slightly larger than requested due to ceil() rounding. Use get_actual_extent() to get the true physical dimensions.

Index-to-World Transform (voxel center):

world_x = grid_center[0] + (ix - (nx-1)/2) * voxel_size
world_y = grid_center[1] + (iy - (ny-1)/2) * voxel_size
world_z = grid_center[2] + (iz - (nz-1)/2) * voxel_size
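A worked instance of the transform, a sketch with illustrative numbers:

```python
# One axis of the index-to-world transform: ix = 0 in a 400-voxel-wide
# grid (voxel_size = 0.005 m, grid_center at the origin).
nx, voxel_size, cx = 400, 0.005, 0.0
world_x = cx + (0 - (nx - 1) / 2) * voxel_size
print(world_x)  # -0.9975: half a voxel inside the -1.0 m grid bound
```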

Parameters:
  • extent_meters_xyz (Tuple[float, float, float])

  • voxel_size (float)

  • esdf_voxel_size (float)

  • extent_esdf_meters_xyz (Tuple[float, float, float])

  • grid_center (torch.Tensor | None)

  • truncation_distance (float)

  • minimum_tsdf_weight (float)

  • depth_minimum_distance (float)

  • depth_maximum_distance (float)

  • decay_factor (float)

  • frustum_decay_factor (float)

  • block_size (int)

  • hash_load_factor (float)

  • roughness (float)

  • seeding_method (str)

  • edt_solver (str)

  • enable_static (bool)

  • static_obstacle_color (Tuple[int, int, int])

  • num_cameras (int)

  • image_height (int | None)

  • image_width (int | None)

  • feature_dim (int)

  • feature_grid_height (int | None)

  • feature_grid_width (int | None)

  • max_visible_blocks_per_integration (int | None)

  • max_support_pixels_per_block_camera (int)

  • feature_channels_per_thread (int)

  • max_feature_tile_channels (int)

  • feature_integration_kernel (str)

  • profile_integration_kernel_timings (bool)

  • accumulator_w_max (float)

  • device (str)

extent_meters_xyz

Physical extent (x, y, z) of voxel bounds in meters.

voxel_size

Voxel edge length in meters.

grid_center

World coordinate at grid center (default: origin).

truncation_distance

TSDF truncation distance in meters.

minimum_tsdf_weight

Minimum weight to consider a voxel as observed.

depth_minimum_distance

Minimum valid depth in meters.

depth_maximum_distance

Maximum valid depth in meters.

decay_factor

Weight decay applied to ALL voxels each frame (1.0 = no decay).

frustum_decay_factor

Additional decay for in-view voxels.

block_size

Voxels per block edge. Supported values are 1 or powers of 2 in [2, 32] (default 4). Specializes the Warp kernel builder; two Mappers with different block sizes can coexist.

hash_load_factor

Target hash table load factor.

device

CUDA device string.

extent_meters_xyz: Tuple[float, float, float]
voxel_size: float = 0.005
esdf_voxel_size: float = 0.05
extent_esdf_meters_xyz: Tuple[float, float, float] | None = None
grid_center: torch.Tensor | None = None
truncation_distance: float = 0.04
minimum_tsdf_weight: float = 0.1
depth_minimum_distance: float = 0.1
depth_maximum_distance: float = 10.0
decay_factor: float = 1.0
frustum_decay_factor: float = 1.0
block_size: int = 4

Voxels per block edge. Supported values are 1 or powers of 2 in [2, 32]. See BlockSparseKernels.

hash_load_factor: float = 0.5
roughness: float = 3.0
seeding_method: str = 'gather'

"scatter" or "gather".

  • "scatter": iterates every allocated TSDF voxel and maps each surface voxel to the single ESDF cell that contains its center. The resulting seed band is exactly one ESDF voxel thick at the surface, faithfully matching the TSDF. Launch dimension depends on the number of allocated TSDF blocks, so it is not CUDA-graph safe.

  • "gather": iterates every ESDF voxel and probes 7 TSDF positions (cell center + 6 face centers at ±esdf_vs/2). Because the face-center samples lie on the ESDF cell boundary, they can detect surface voxels that belong to neighboring ESDF cells, producing a dilated seed band (~1.5 voxels thick). This thicker band gives PBA more seed sites near the surface, which typically yields slightly higher collision recall at the cost of not matching the TSDF surface exactly. Fixed launch dimension (D×H×W), CUDA-graph safe.

Type:

ESDF surface seeding strategy

edt_solver: str = 'pba'
enable_static: bool = False
static_obstacle_color: Tuple[int, int, int] = (20, 20, 20)
num_cameras: int = 1
image_height: int | None = None

Camera image height in pixels. Required; used to pre-allocate the voxel-project scratch buffer at Mapper construction (buffer size num_cameras * image_height * image_width * num_samples) so no per-frame reallocation or D2H syncs occur.

image_width: int | None = None

Camera image width in pixels. Required; see image_height.

feature_dim: int = 0

Per-block feature channel dimensionality. 0 disables features.

feature_grid_height: int | None = None

Compile-time feature-grid height. Required when feature_dim > 0; must be None when features are disabled.

feature_grid_width: int | None = None

Compile-time feature-grid width. Required when feature_dim > 0; must be None when features are disabled.

max_visible_blocks_per_integration: int | None = None

Maximum number of visible blocks a single integration frame may process. When None, defaults to max_blocks once that value has been computed.

max_support_pixels_per_block_camera: int = 8

Maximum support pixels stored per visible block per camera for RGB and feature integration. Larger values use more scratch memory and preserve more per-block color/feature evidence.

feature_channels_per_thread: int = 8

Number of adjacent feature channels accumulated by one feature-kernel thread. Kept explicit so Python launch grouping and Warp kernel decoding cannot drift apart.

max_feature_tile_channels: int = 4096

Compile-time cap for feature channels accumulated by one tiled feature-kernel CTA. The generated tile width is min(feature_dim, max_feature_tile_channels).

feature_integration_kernel: str = 'auto'

"auto", "grouped", or "tiled". "auto" resolves to the tiled kernel only for feature dimensions and support capacities where benchmarks showed a win.

Type:

Feature integration launch policy

profile_integration_kernel_timings: bool = False

Record per-kernel integration timings into Mapper.get_stats()["last_integration_kernel_timings_ms"]. Disabled by default to avoid profiling synchronizations in normal integration.

accumulator_w_max: float = 1000.0

Upper bound on per-block accumulator weight. Caps fp16 block_rgb / block_features magnitudes each frame and sets the EMA decay rate for old observations (effective window ~W_max / mean_per_frame_weight). For example, with W_max = 1000 and a mean per-frame weight near 1, old observations fade over a window of roughly 1000 frames. Raise for longer memory, lower for faster adaptation to dynamic scenes.

device: str = 'cuda:0'
property grid_shape: Tuple[int, int, int]

Returns (nz, ny, nx) voxel counts.

property max_blocks: int

Compute max allocated blocks using surface area × thickness heuristic.

TSDF only allocates blocks near surfaces, not throughout the volume. The surface is a volumetric band with thickness = 2 × truncation_distance.

Formula:

max_blocks = surface_area_blocks × thickness_factor × roughness

This scales O(N²) with grid size, not O(N³), matching actual TSDF usage.

property hash_capacity: int

Hash table capacity based on target load factor.

get_actual_extent()

Returns actual physical extent (x, y, z) after ceil() rounding.

Return type:

Tuple[float, float, float]

voxel_to_world(iz, iy, ix)

Convert voxel index to world coordinate at voxel center.

Parameters:
  • iz (int) – Z index (0 to nz-1)

  • iy (int) – Y index (0 to ny-1)

  • ix (int) – X index (0 to nx-1)

Return type:

Tuple[float, float, float]

Returns:

(world_x, world_y, world_z) coordinate at voxel center.

world_to_voxel(
world_x,
world_y,
world_z,
)

Convert world coordinate to voxel index.

Parameters:
  • world_x (float) – World X coordinate.

  • world_y (float) – World Y coordinate.

  • world_z (float) – World Z coordinate.

Return type:

Tuple[int, int, int]

Returns:

(iz, iy, ix) voxel indices, or (-1, -1, -1) if out of bounds.

__init__(
extent_meters_xyz,
voxel_size=0.005,
esdf_voxel_size=0.05,
extent_esdf_meters_xyz=None,
grid_center=None,
truncation_distance=0.04,
minimum_tsdf_weight=0.1,
depth_minimum_distance=0.1,
depth_maximum_distance=10.0,
decay_factor=1.0,
frustum_decay_factor=1.0,
block_size=4,
hash_load_factor=0.5,
roughness=3.0,
seeding_method='gather',
edt_solver='pba',
enable_static=False,
static_obstacle_color=(20, 20, 20),
num_cameras=1,
image_height=None,
image_width=None,
feature_dim=0,
feature_grid_height=None,
feature_grid_width=None,
max_visible_blocks_per_integration=None,
max_support_pixels_per_block_camera=8,
feature_channels_per_thread=8,
max_feature_tile_channels=4096,
feature_integration_kernel='auto',
profile_integration_kernel_timings=False,
accumulator_w_max=1000.0,
device='cuda:0',
)
Parameters:
  • extent_meters_xyz (Tuple[float, float, float])

  • voxel_size (float)

  • esdf_voxel_size (float)

  • extent_esdf_meters_xyz (Tuple[float, float, float])

  • grid_center (torch.Tensor | None)

  • truncation_distance (float)

  • minimum_tsdf_weight (float)

  • depth_minimum_distance (float)

  • depth_maximum_distance (float)

  • decay_factor (float)

  • frustum_decay_factor (float)

  • block_size (int)

  • hash_load_factor (float)

  • roughness (float)

  • seeding_method (str)

  • edt_solver (str)

  • enable_static (bool)

  • static_obstacle_color (Tuple[int, int, int])

  • num_cameras (int)

  • image_height (int | None)

  • image_width (int | None)

  • feature_dim (int)

  • feature_grid_height (int | None)

  • feature_grid_width (int | None)

  • max_visible_blocks_per_integration (int | None)

  • max_support_pixels_per_block_camera (int)

  • feature_channels_per_thread (int)

  • max_feature_tile_channels (int)

  • feature_integration_kernel (str)

  • profile_integration_kernel_timings (bool)

  • accumulator_w_max (float)

  • device (str)

Return type:

None

get_grid_bounds()

Get world coordinates of grid corners.

Return type:

Tuple[Tuple[float, float, float], Tuple[float, float, float]]

Returns:

(min_corner, max_corner) as ((x, y, z), (x, y, z)).

class MatchedVoxels(
voxels,
block_pool_idx,
block_scores,
)

Bases: object

Result of a feature-similarity query on a BlockSparseTSDF.

Voxels are extracted from the top-K matched blocks. block_pool_idx and block_scores are parallel (K,) tensors in descending score order — index i in one corresponds to index i in the other.

Scores are cosine similarity in the same feature space as the query vector (range [-1, 1]). Callers querying through a projection (e.g., RADIO -> teacher space) must interpret thresholds in the projected space.

block_pool_idx is suitable for direct use with Mapper.clear_blocks to drop the matched blocks from the map.

Parameters:
voxels

Extracted voxels from the matched blocks. len(voxels) may be 0 even when len(block_pool_idx) > 0 if surface filtering rejected every voxel inside the matched blocks.

block_pool_idx

(K,) int32 global pool_idx of matched blocks, sorted by block_scores descending.

block_scores

(K,) float32 cosine scores, parallel to block_pool_idx (same descending order).

voxels: curobo._src.perception.mapper.storage.OccupiedVoxels
block_pool_idx: torch.Tensor
block_scores: torch.Tensor
scores_per_voxel(
fill_value=nan,
)

Gather per-voxel score from per-block scores.

Convenience for visualization (e.g., color voxels by similarity). Granularity is fundamentally per-block — every voxel in a matched block shares its block’s score. Prefer block_scores for ranking and filtering.

Parameters:

fill_value (float) – Value written for voxels whose pool_idx is not in block_pool_idx. Defaults to NaN so an unset entry is loud rather than silently masquerading as a low-similarity match. Voxels in voxels should always come from matched blocks; fill_value is a safety net.

Return type:

Tensor

Returns:

(N,) float32 tensor parallel to voxels.centers.
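A hedged sketch of the visualization use case (the normalization choice is illustrative):

```python
import torch

scores = matched.scores_per_voxel()   # (N,), NaN for unmatched voxels
valid = ~torch.isnan(scores)

# Map cosine scores from [-1, 1] to [0, 1] for a colormap.
normalized = torch.zeros_like(scores)
normalized[valid] = (scores[valid] + 1.0) / 2.0

centers = matched.voxels.centers      # (N, 3), parallel to scores
# feed (centers, normalized) to a point-cloud viewer
```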

__init__(
voxels,
block_pool_idx,
block_scores,
)
Parameters:
Return type:

None

class OccupiedVoxels(
centers,
block_idx_per_voxel,
block_data,
)

Bases: object

Result of extract_occupied_voxels.

Parameters:
  • centers (torch.Tensor)

  • block_idx_per_voxel (torch.Tensor)

  • block_data (curobo._src.perception.mapper.storage.BlockDataView)

centers

(N, 3) float32 voxel world positions.

block_idx_per_voxel

(N,) int32 global pool_idx — index into BlockDataView fields.

block_data

Reference view of per-block storage tensors.

centers: torch.Tensor
block_idx_per_voxel: torch.Tensor
block_data: curobo._src.perception.mapper.storage.BlockDataView
colors_uint8(eps=1e-06)

Gather per-voxel RGB colors as (N, 3) uint8.

Weighted sums (stored fp16, RGB normalized to [0, 1] at the integration site) are divided in fp32 and rescaled back to [0, 255] before the uint8 cast. Clamp protects degenerate blocks with zero weight from division by zero.

Return type:

Tensor

Parameters:

eps (float)

features(eps=1e-06)

Gather per-voxel normalized features as (N, feature_dim) float32.

Weighted sums (stored fp16) are divided in fp32 to avoid compounding ulp loss. The per-block feature weight is clamped to avoid division by zero for blocks that never received a feature observation.

Return type:

Tensor

Parameters:

eps (float)

__init__(
centers,
block_idx_per_voxel,
block_data,
)
Parameters:
  • centers (torch.Tensor)

  • block_idx_per_voxel (torch.Tensor)

  • block_data (curobo._src.perception.mapper.storage.BlockDataView)

Return type:

None

class PoseDetector(geometry, config)

Bases: object

Point-to-plane ICP detector with Huber loss and coarse-to-fine strategy.

Achieves millimeter-level accuracy (1.3 mm translation, 0.03° rotation) on robot tracking. Works with any geometry class that provides sample_surface_points and get_dof methods.

Supported geometry types:
  • RigidObjectGeometry: Static objects

  • ArticulatedRobotGeometry: Robots with FK-based point transformation

  • RobotMesh: New unified mesh class with Warp support

Key features:
  • Point-to-plane ICP (3× better than point-to-point)

  • Huber loss for outlier robustness (12× improvement)

  • Coarse-to-fine search over 64 random rotation initializations

  • Cached surface points and normals for speed

Initialize pose detector.

Parameters:
  • geometry (Union[RigidObjectGeometry, ArticulatedRobotGeometry, RobotMesh]) – Geometry model (RigidObjectGeometry, ArticulatedRobotGeometry, or RobotMesh).

  • config (DetectorCfg) – Detector configuration.

__init__(geometry, config)

Initialize pose detector.

Parameters:
  • geometry (Union[RigidObjectGeometry, ArticulatedRobotGeometry, RobotMesh]) – Geometry model (RigidObjectGeometry, ArticulatedRobotGeometry, or RobotMesh).

  • config (DetectorCfg) – Detector configuration.

detect(camera_obs, config)

Detect pose from camera observation.

Parameters:
  • camera_obs (CameraObservation) – Camera observation with depth and segmentation.

  • config (Tensor) – Object configuration (joint angles for robot, ignored for rigid).

Return type:

DetectionResult

Returns:

DetectionResult with pose and alignment error.

detect_from_points(
observed_points,
config,
initial_pose=None,
)

Detect pose from pre-segmented 3D points.

This is useful when you already have segmented points (e.g., from a bounding box or semantic segmentation) and don’t need to extract them from a camera observation.

Parameters:
  • observed_points (Tensor) – [N, 3] tensor of observed 3D points.

  • config (Tensor) – Object configuration (joint angles for robot, ignored for rigid).

  • initial_pose (Optional[Pose]) – Optional initial pose guess. If provided, only refines from this pose (no random rotation sampling). If None, uses random rotations.

Returns:

Detection result with pose and alignment error.

Return type:

DetectionResult
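A hedged sketch of the pre-segmented path (the point source and pose guess are illustrative):

```python
import torch

# Points from, e.g., a bounding-box crop or a semantic mask.
segmented_points = torch.rand(5000, 3, device="cuda")  # [N, 3] placeholder

result = detector.detect_from_points(
    observed_points=segmented_points,
    config=joint_angles,      # joint angles for a robot; ignored for rigid
    initial_pose=rough_pose,  # refine only: skips random rotation sampling
)
```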

class RobotMesh(
vertices,
faces,
device='cuda:0',
kinematics=None,
link_vertex_ranges=None,
link_names=None,
link_vertices_local=None,
)

Bases: object

Unified mesh representation for rigid objects and articulated robots.

Provides mesh data for both ICP-based and SDF-based pose detection:
  • mesh property: wp.Mesh for SDF queries

  • sample_surface_points(): returns points + normals for ICP

For articulated robots, call update(joint_angles) to transform vertices and refit BVH. The mesh reference remains constant (CUDA graph safe).

Example

```python
# Rigid object
robot_mesh = RobotMesh.from_trimesh(trimesh.load("object.stl"))
points, normals = robot_mesh.sample_surface_points(1000)

# Articulated robot
robot_mesh = RobotMesh.from_kinematics(kinematics)
robot_mesh.update(joint_angles)
points, normals = robot_mesh.sample_surface_points(1000)

# Use with SDF detector
sdf = wp.mesh_query_point(robot_mesh.mesh.id, query_point, max_dist)
```

Initialize RobotMesh.

Use factory methods from_trimesh() or from_kinematics() instead of calling this constructor directly.

Parameters:
  • vertices (Tensor) – [V, 3] mesh vertices in world frame.

  • faces (Tensor) – [F, 3] face indices.

  • device (str) – CUDA device string.

  • kinematics (Optional[Kinematics]) – Kinematics instance for articulated robots.

  • link_vertex_ranges (Optional[List[Tuple[int, int]]]) – List of (start, end) indices for each link’s vertices.

  • link_names (Optional[List[str]]) – List of link names (for FK lookup).

  • link_vertices_local (Optional[List[Tensor]]) – List of vertices in each link’s local frame.

__init__(
vertices,
faces,
device='cuda:0',
kinematics=None,
link_vertex_ranges=None,
link_names=None,
link_vertices_local=None,
)

Initialize RobotMesh.

Use factory methods from_trimesh() or from_kinematics() instead of calling this constructor directly.

Parameters:
  • vertices (Tensor) – [V, 3] mesh vertices in world frame.

  • faces (Tensor) – [F, 3] face indices.

  • device (str) – CUDA device string.

  • kinematics (Optional[Kinematics]) – Kinematics instance for articulated robots.

  • link_vertex_ranges (Optional[List[Tuple[int, int]]]) – List of (start, end) indices for each link’s vertices.

  • link_names (Optional[List[str]]) – List of link names (for FK lookup).

  • link_vertices_local (Optional[List[Tensor]]) – List of vertices in each link’s local frame.

classmethod from_trimesh(
mesh,
device='cuda:0',
)

Create RobotMesh from a trimesh object (rigid object).

Parameters:
  • mesh (Trimesh) – Trimesh object.

  • device (str) – CUDA device string.

Return type:

RobotMesh

Returns:

RobotMesh instance for rigid object.

classmethod from_kinematics(
kinematics,
device='cuda:0',
initial_joint_angles=None,
)

Create RobotMesh from a Kinematics instance (articulated robot).

Combines all link meshes into a single mesh. Call update(joint_angles) to transform vertices based on FK.

Parameters:
  • kinematics (Kinematics) – CuRobo Kinematics instance.

  • device (str) – CUDA device string.

  • initial_joint_angles (Optional[Tensor]) – Initial joint configuration. If None, uses zeros.

Return type:

RobotMesh

Returns:

RobotMesh instance for articulated robot.

property mesh: warp._src.types.Mesh

Get Warp mesh for SDF queries.

The mesh reference is constant (CUDA graph safe), but vertices are updated in-place when update() is called.

Returns:

wp.Mesh instance.

property mesh_id: warp._src.types.uint64

Get Warp mesh ID for kernel use.

Returns:

Mesh ID as wp.uint64.

property vertices: torch.Tensor

Get current vertices as torch tensor.

Returns:

[V, 3] vertex positions in world frame.

property faces: torch.Tensor

Get face indices.

Returns:

[F, 3] face vertex indices.

property n_vertices: int

Number of vertices in mesh.

property n_faces: int

Number of faces in mesh.

property is_articulated: bool

True if this is an articulated robot mesh.

property current_joint_angles: torch.Tensor | None

Current joint configuration (None for rigid objects).

update(joint_angles)

Update mesh vertices based on joint configuration.

Transforms all link vertices using FK and refits BVH. No-op for rigid objects.

Parameters:

joint_angles (Tensor) – [N_joints] tensor of joint angles.

Return type:

None

sample_surface_points(
n_points,
resample=False,
)

Sample points and normals from mesh surface.

Uses pre-cached sample indices for consistency. Set resample=True to generate new random samples.

Parameters:
  • n_points (int) – Number of points to sample.

  • resample (bool) – If True, regenerate sample indices. If False, reuse cached.

Returns:

points: [n_points, 3] surface points in world frame.
normals: [n_points, 3] surface normals (unit vectors).

Return type:

Tuple[Tensor, Tensor]

get_dof()

Get degrees of freedom.

Return type:

int

Returns:

0 for rigid objects, n_joints for articulated robots.

get_trimesh()

Get current mesh as trimesh object.

Useful for visualization and debugging.

Return type:

Trimesh

Returns:

trimesh.Trimesh with current vertex positions.

class RobotSegmenter(
kinematics,
distance_threshold=0.05,
use_cuda_graph=True,
ops_dtype=torch.bfloat16,
)

Bases: object

Segment robot from depth images using collision spheres.

This class computes masks for robot pixels in depth images by checking distances between depth points and robot collision spheres.

Example

```python
from curobo._src.perception.robot_segmenter import RobotSegmenter

segmenter = RobotSegmenter.from_robot_file("franka.yml")
mask, filtered_image = segmenter.get_robot_mask(camera_obs, joint_state)
```

Initialize the robot segmenter.

Parameters:
  • kinematics (Kinematics) – Robot kinematics model for forward kinematics.

  • distance_threshold (float) – Distance threshold for segmentation.

  • use_cuda_graph (bool) – Whether to use CUDA graphs for acceleration.

  • ops_dtype (dtype) – Data type for operations.

__init__(
kinematics,
distance_threshold=0.05,
use_cuda_graph=True,
ops_dtype=torch.bfloat16,
)

Initialize the robot segmenter.

Parameters:
  • kinematics (Kinematics) – Robot kinematics model for forward kinematics.

  • distance_threshold (float) – Distance threshold for segmentation.

  • use_cuda_graph (bool) – Whether to use CUDA graphs for acceleration.

  • ops_dtype (dtype) – Data type for operations.

static from_robot_file(
robot_file,
collision_sphere_buffer=None,
distance_threshold=0.05,
use_cuda_graph=True,
device_cfg=DeviceCfg(device=device(type='cuda', index=0), dtype=torch.float32, collision_geometry_dtype=torch.float32, collision_gradient_dtype=torch.float32, collision_distance_dtype=torch.float32),
)

Create a RobotSegmenter from a robot configuration file.

Parameters:
  • robot_file (Union[str, Dict]) – Path to robot configuration file or dict.

  • collision_sphere_buffer (Optional[float]) – Optional buffer to add to collision spheres.

  • distance_threshold (float) – Distance threshold for segmentation.

  • use_cuda_graph (bool) – Whether to use CUDA graphs.

  • device_cfg (DeviceCfg) – Tensor device configuration.

Return type:

RobotSegmenter

Returns:

Configured RobotSegmenter instance.

get_pointcloud_from_depth(
camera_obs,
)

Convert depth image to point cloud.

Parameters:

camera_obs (CameraObservation) – Camera observation with depth image.

Return type:

Tensor

Returns:

Point cloud tensor.

update_camera_projection(
camera_obs,
)

Update camera projection rays from observation.

Parameters:

camera_obs (CameraObservation) – Camera observation with intrinsics.

Return type:

None

get_robot_mask(
camera_obs,
joint_state,
)

Get robot mask from camera observation and joint state.

Assumes a single robot, with a batch of depth images and a matching batch of camera poses.

Parameters:
  • camera_obs (CameraObservation) – Camera observation with depth image.

  • joint_state (JointState) – Current joint state of the robot.

Return type:

Tuple[Tensor, Tensor]

Returns:

Tuple of (mask, filtered_image).

Raises:

ValueError – If depth image is not 3D (batch, height, width).

get_robot_mask_from_active_js(
camera_obs,
active_joint_state,
)

Get robot mask using active joint state.

Parameters:
  • camera_obs (CameraObservation) – Camera observation with depth image.

  • active_joint_state (JointState) – Joint state with only active joints.

Return type:

Tuple[Tensor, Tensor]

Returns:

Tuple of (mask, filtered_image).

property kinematics: curobo._src.robot.kinematics.kinematics.Kinematics

Get the robot kinematics model.

property base_link: str

Get the robot base link name.

class SDFDetectorCfg(
max_iterations=100,
inner_iterations=25,
convergence_threshold=1e-05,
rotation_convergence_threshold=1e-05,
use_cuda_graph=True,
distance_threshold=0.2,
min_valid_ratio=0.1,
use_huber=True,
huber_delta=0.1,
lambda_initial=0.001,
lambda_factor=10.0,
lambda_min=1e-07,
lambda_max=10000000.0,
rho_min=0.25,
n_points=5000,
device_cfg=<factory>,
)

Bases: object

Configuration for SDF-based pose detector (Levenberg-Marquardt optimization).

Uses mesh SDF queries for implicit correspondence and analytic gradients.

Parameters:
max_iterations: int = 100
inner_iterations: int = 25
convergence_threshold: float = 1e-05
rotation_convergence_threshold: float = 1e-05
use_cuda_graph: bool = True
distance_threshold: float = 0.2
min_valid_ratio: float = 0.1
use_huber: bool = True
huber_delta: float = 0.1
lambda_initial: float = 0.001
lambda_factor: float = 10.0
lambda_min: float = 1e-07
lambda_max: float = 10000000.0
rho_min: float = 0.25
n_points: int = 5000
device_cfg: curobo._src.types.device_cfg.DeviceCfg
property max_distance
__init__(
max_iterations=100,
inner_iterations=25,
convergence_threshold=1e-05,
rotation_convergence_threshold=1e-05,
use_cuda_graph=True,
distance_threshold=0.2,
min_valid_ratio=0.1,
use_huber=True,
huber_delta=0.1,
lambda_initial=0.001,
lambda_factor=10.0,
lambda_min=1e-07,
lambda_max=10000000.0,
rho_min=0.25,
n_points=5000,
device_cfg=<factory>,
)
Parameters:
Return type:

None

class SDFPoseDetector(robot_mesh, config=None)

Bases: object

SDF-based pose detector using mesh SDF queries and Levenberg-Marquardt.

Uses ground-truth mesh SDF for implicit correspondence finding. Supports both rigid objects and articulated robots via RobotMesh.

Uses a two-pass kernel design for reduced register pressure and better GPU occupancy:
  • Pass 1: Query mesh SDF (BVH traversal)

  • Pass 2: Compute Jacobian and block-reduce

Initialize SDF pose detector.

Parameters:
__init__(
robot_mesh,
config=None,
)

Initialize SDF pose detector.

Parameters:
detect(
camera_obs,
config=None,
initial_pose=None,
)

Detect pose from camera observation.

Parameters:
  • camera_obs (CameraObservation) – Camera observation with depth and segmentation.

  • config (Optional[Tensor]) – Joint angles for articulated robots (updates mesh). Ignored for rigid.

  • initial_pose (Optional[Pose]) – Initial pose estimate. Required for SDF detector.

Return type:

DetectionResult

Returns:

DetectionResult with refined pose.
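A hedged sketch chaining a coarse estimate into the SDF refiner (coarse_pose is illustrative, e.g. a PoseDetector result):

```python
robot_mesh = RobotMesh.from_kinematics(kinematics)
sdf_detector = SDFPoseDetector(robot_mesh, SDFDetectorCfg())

result = sdf_detector.detect(
    camera_obs,
    config=joint_angles,       # updates the articulated mesh before solving
    initial_pose=coarse_pose,  # required: the SDF detector only refines
)
```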

detect_from_points(
observed_points,
config=None,
initial_pose=None,
)

Detect object pose from pre-segmented 3D points.

Parameters:
  • observed_points (Tensor) – [N, 3] observed points in world frame.

  • config (Optional[Tensor]) – Joint angles for articulated robots (updates mesh). Ignored for rigid.

  • initial_pose (Optional[Pose]) – Initial pose estimate. Required for SDF detector (no random sampling).

Return type:

DetectionResult

Returns:

DetectionResult with refined pose.