curobo.perception module¶
Perception utilities module.
This module provides sensor processing utilities for robot perception tasks, including robot segmentation and volumetric mapping from depth images.
- Example (robot segmentation):

```python
from curobo.perception import RobotSegmenter
from curobo.kinematics import KinematicsCfg
from curobo.types import CameraObservation, JointState

# Create segmenter
segmenter = RobotSegmenter.from_robot_file(
    robot_file="franka.yml",
    distance_threshold=0.05,
)

# Segment robot from depth image
camera_obs = CameraObservation(
    depth_image=depth_tensor,
    intrinsics=camera_intrinsics,
    pose=camera_pose,
)
joint_state = JointState(position=current_joint_angles)

mask, filtered_depth = segmenter.get_robot_mask(camera_obs, joint_state)
# mask: binary mask of robot pixels
# filtered_depth: depth image with robot removed
```
- Example (volumetric mapping):

```python
from curobo.perception import FilterDepth, Mapper, MapperCfg

mapper = Mapper(MapperCfg(voxel_size=0.02, ...))
depth_filter = FilterDepth(image_shape=depth.shape[-2:])
filtered, valid = depth_filter(depth.unsqueeze(0))
mapper.integrate(camera_obs)
```
- Example (robot pose estimation):

```python
from curobo.perception import (
    DetectorCfg, PoseDetector, RobotMesh, SDFDetectorCfg, SDFPoseDetector,
)

mesh = RobotMesh.from_kinematics(kinematics, joint_state)
detector = PoseDetector(mesh, DetectorCfg(...))
pose = detector.detect(pointcloud)
```
- class DetectorCfg(
- n_mesh_points_coarse=500,
- n_observed_points_coarse=2000,
- n_rotation_samples=64,
- n_iterations_coarse=50,
- distance_threshold_coarse=0.5,
- n_mesh_points_fine=2000,
- n_observed_points_fine=10000,
- n_iterations_fine=50,
- distance_threshold_fine=0.01,
- use_svd=False,
- use_huber_loss=True,
- huber_delta=0.02,
- save_iterations=False,
- device_cfg=<factory>,
Bases: object

Configuration for pose detector (point-to-plane ICP with Huber loss).
- Parameters:
n_mesh_points_coarse (int)
n_observed_points_coarse (int)
n_rotation_samples (int)
n_iterations_coarse (int)
distance_threshold_coarse (float)
n_mesh_points_fine (int)
n_observed_points_fine (int)
n_iterations_fine (int)
distance_threshold_fine (float)
use_svd (bool)
use_huber_loss (bool)
huber_delta (float)
save_iterations (bool)
device_cfg (curobo._src.types.device_cfg.DeviceCfg)
- device_cfg: curobo._src.types.device_cfg.DeviceCfg¶
- __init__(
- n_mesh_points_coarse=500,
- n_observed_points_coarse=2000,
- n_rotation_samples=64,
- n_iterations_coarse=50,
- distance_threshold_coarse=0.5,
- n_mesh_points_fine=2000,
- n_observed_points_fine=10000,
- n_iterations_fine=50,
- distance_threshold_fine=0.01,
- use_svd=False,
- use_huber_loss=True,
- huber_delta=0.02,
- save_iterations=False,
- device_cfg=<factory>,
- Parameters:
n_mesh_points_coarse (int)
n_observed_points_coarse (int)
n_rotation_samples (int)
n_iterations_coarse (int)
distance_threshold_coarse (float)
n_mesh_points_fine (int)
n_observed_points_fine (int)
n_iterations_fine (int)
distance_threshold_fine (float)
use_svd (bool)
use_huber_loss (bool)
huber_delta (float)
save_iterations (bool)
device_cfg (curobo._src.types.device_cfg.DeviceCfg)
- Return type:
None
- class FilterDepth(
- image_shape,
- depth_minimum_distance=0.1,
- depth_maximum_distance=10.0,
- flying_pixel_threshold=0.5,
- bilateral_kernel_size=5,
- bilateral_sigma_spatial=10.0,
- bilateral_sigma_depth=0.1,
- device='cuda',
- num_batch=1,
Bases: object

GPU-accelerated depth filtering using fused Warp kernels.
Combines range filtering, flying pixel detection, and bilateral smoothing into a single efficient GPU pass. Pre-allocates buffers at initialization for zero-allocation filtering during runtime.
Operates strictly on batched depth tensors of shape (B, H, W). Set num_batch > 1 at init to pre-allocate batched output buffers so all images in the batch are filtered in a single kernel launch. Single-image callers must explicitly pass depth.unsqueeze(0) and index filtered[0] on return.

- Parameters:
image_shape (Tuple[int, int]) – Tuple of (height, width) for each depth image.
depth_minimum_distance (float) – Minimum valid depth in meters. Default: 0.1 m.
depth_maximum_distance (float) – Maximum valid depth in meters. Default: 10.0 m.
flying_pixel_threshold (Optional[float]) – Filter aggressiveness 0.0-1.0 for flying pixels; higher is more aggressive. Set to None to disable. Default: 0.5.
bilateral_kernel_size (Optional[int]) – Size of bilateral kernel (must be odd). Set to None to disable bilateral filtering. Default: 5.
bilateral_sigma_spatial (float) – Spatial sigma in pixels. Default: 10.0.
bilateral_sigma_depth (float) – Depth sigma in meters. Default: 0.1.
device (str) – CUDA device. Default: "cuda".
num_batch (int) – Pre-allocate buffers for this batch size. Default: 1.
Initialize FilterDepth with configuration and pre-allocated buffers.
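The three stages can be illustrated on a toy 1-D depth row. This is a hedged pure-Python sketch of the concepts (range clamp, flying-pixel rejection, bilateral smoothing), not the fused Warp kernels; note the real flying_pixel_threshold is a 0-1 aggressiveness knob, while here it is a plain depth-jump threshold in meters for clarity:

```python
import math

def filter_depth_row(row, d_min=0.1, d_max=10.0, fly_thresh=0.5,
                     k=5, sigma_s=10.0, sigma_d=0.1):
    """Toy 1-D version of range clamp + flying-pixel rejection + bilateral smoothing."""
    n = len(row)
    # 1) Range filter: depths outside [d_min, d_max] become invalid.
    valid = [d_min <= d <= d_max for d in row]
    # 2) Flying pixels: reject a pixel whose depth jumps far from both neighbors.
    for i in range(1, n - 1):
        if (valid[i] and abs(row[i] - row[i - 1]) > fly_thresh
                and abs(row[i] - row[i + 1]) > fly_thresh):
            valid[i] = False
    # 3) Bilateral smoothing: average neighbors weighted by spatial and depth proximity.
    out = []
    half = k // 2
    for i in range(n):
        if not valid[i]:
            out.append(0.0)
            continue
        w_sum, d_sum = 0.0, 0.0
        for j in range(max(0, i - half), min(n, i + half + 1)):
            if not valid[j]:
                continue
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)
                         - ((row[i] - row[j]) ** 2) / (2 * sigma_d ** 2))
            w_sum += w
            d_sum += w * row[j]
        out.append(d_sum / w_sum)
    return out, valid

# 0.05 is below d_min, 12.0 is above d_max, and 3.0 is a flying pixel.
row = [0.05, 1.00, 1.02, 3.0, 1.01, 1.03, 12.0]
filtered, valid = filter_depth_row(row)
```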
- __init__(
- image_shape,
- depth_minimum_distance=0.1,
- depth_maximum_distance=10.0,
- flying_pixel_threshold=0.5,
- bilateral_kernel_size=5,
- bilateral_sigma_spatial=10.0,
- bilateral_sigma_depth=0.1,
- device='cuda',
- num_batch=1,
Initialize FilterDepth with configuration and pre-allocated buffers.
- update_config(
- depth_minimum_distance=None,
- depth_maximum_distance=None,
- flying_pixel_threshold=None,
- bilateral_sigma_depth=None,
Update filter configuration without reallocating buffers.
Only updates the specified parameters. Set to None to keep current value.
- classmethod from_config(
- config,
- image_shape,
- device='cuda',
- num_batch=1,
Create FilterDepth from a configuration object.
- Parameters:
- Return type:
- Returns:
Configured FilterDepth instance.
- class Mapper(config)¶
Bases: object

Volumetric mapper using block-sparse storage.
Encapsulates block-sparse TSDF integrator with ESDF generation. Provides a simplified API compared to BlockSparseESDFIntegrator.
- Parameters:
config (MapperCfg) – MapperCfg with grid and integration parameters.
Example

```python
config = MapperCfg(
    extent_meters_xyz=(2.0, 2.0, 2.0),
    voxel_size=0.005,
)
mapper = Mapper(config)

for obs in observations:
    mapper.integrate(obs)

voxel_grid = mapper.compute_esdf()
mesh = mapper.extract_mesh()
```
Initialize Mapper with configuration.
- __init__(config)¶
Initialize Mapper with configuration.
- Parameters:
- property integrator: curobo._src.perception.mapper.integrator_esdf.BlockSparseESDFIntegrator¶
Access underlying integrator for advanced operations.
- property tsdf: curobo._src.perception.mapper.storage.BlockSparseTSDF¶
Access the BlockSparseTSDF storage.
- integrate(observation)¶
Integrate batched depth observation into TSDF.
The observation must have a leading camera dimension matching config.num_cameras. See BlockSparseTSDFIntegrator.integrate for the expected tensor shapes.

- Parameters:
observation (CameraObservation) – Batched camera observation.
- Return type:
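Conceptually, each integrate call performs a weighted projective TSDF update per voxel. A minimal single-voxel sketch of that update rule (illustrative only; the real implementation is block-sparse and runs in Warp kernels):

```python
def tsdf_update(tsdf, weight, voxel_depth, measured_depth, trunc=0.04, w_new=1.0):
    """One projective TSDF update for a single voxel.

    voxel_depth: distance of the voxel center along the camera ray.
    measured_depth: depth measured at the pixel the voxel projects to.
    """
    sdf = measured_depth - voxel_depth          # signed distance along the ray
    if sdf < -trunc:
        return tsdf, weight                     # voxel far behind surface: no update
    d = max(-1.0, min(1.0, sdf / trunc))        # truncate and normalize to [-1, 1]
    new_w = weight + w_new
    new_tsdf = (tsdf * weight + d * w_new) / new_w
    return new_tsdf, new_w

# Voxel 1 cm in front of a surface at 2.0 m, truncation 4 cm:
t, w = 0.0, 0.0
t, w = tsdf_update(t, w, voxel_depth=1.99, measured_depth=2.0)
# sdf = +0.01 -> normalized TSDF value 0.25
```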
- clear_region(
- bounds_min,
- bounds_max,
Clear dynamic map contents inside a conservative world-space AABB.
Blocks remain allocated. The cached ESDF voxel grid is invalidated and must be recomputed with compute_esdf.
- Return type:
- clear_blocks(pool_indices)¶
Clear dynamic map contents for explicit block-pool indices.
- Return type:
- compute_esdf(
- esdf_origin=None,
- esdf_voxel_size=None,
Compute ESDF from current TSDF and return as VoxelGrid.
- extract_mesh(
- refine_iterations=2,
- surface_only=True,
Extract mesh using GPU marching cubes.
- extract_occupied_voxels(
- surface_only=False,
- sdf_threshold=None,
Extract occupied voxel centers and per-voxel block indices.
- Parameters:
- Return type:
- Returns:
OccupiedVoxels with voxel centers, per-voxel block pool indices, and a view over per-block storage.
- extract_matching_feature_voxels(
- feature_vector,
- top_k,
- surface_only=False,
- sdf_threshold=None,
- minimum_score=None,
- feature_projector=None,
Extract voxels from the top-k blocks most similar (cosine) to feature_vector. Requires MapperCfg.feature_dim > 0. minimum_score (cosine in the query feature space) drops matched blocks below the cut after the top-k selection, so the voxel-extraction kernel skips work proportional to filtered blocks. Pair a generous top_k with minimum_score to get all matches above a quality threshold.

Returns a MatchedVoxels carrying the extracted voxels plus parallel (K,) block_pool_idx / block_scores tensors in descending score order. block_pool_idx can be passed directly to clear_blocks to drop the matched blocks from the map.

feature_projector optionally maps stored block features into the query space before scoring. This keeps model-specific heads (e.g. RADIO -> SigLIP) outside the mapper while still using the mapper's all-allocated-block scoring and extraction path.
- Return type:
- Parameters:
feature_vector (torch.Tensor)
top_k (int)
surface_only (bool)
sdf_threshold (float | None)
minimum_score (float | None)
feature_projector (Callable[[torch.Tensor], torch.Tensor] | None)
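The top-k plus minimum_score selection can be sketched in plain Python. This is a hedged sketch of cosine scoring over per-block feature vectors (the real path scores all allocated blocks on GPU; block_features and the query here are made-up toy data):

```python
def match_blocks(block_features, query, top_k, minimum_score=None):
    """Score every block by cosine similarity to `query`, keep the top_k,
    then drop matches below minimum_score.
    Returns (indices, scores) in descending score order."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def norm(a):
        return dot(a, a) ** 0.5
    qn = norm(query)
    scores = [dot(f, query) / (norm(f) * qn) for f in block_features]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    if minimum_score is not None:
        order = [i for i in order if scores[i] >= minimum_score]
    return order, [scores[i] for i in order]

blocks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
idx, sc = match_blocks(blocks, query=[1.0, 0.0], top_k=3, minimum_score=0.5)
# block 0 (cos 1.0) and block 2 (cos ~0.707) survive; block 1 (cos 0.0) is cut.
```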
- get_matching_feature_voxels(
- feature_vector,
- top_k,
- surface_only=False,
- sdf_threshold=None,
- minimum_score=None,
- feature_projector=None,
Compatibility wrapper for extract_matching_feature_voxels.
- Return type:
- Parameters:
feature_vector (torch.Tensor)
top_k (int)
surface_only (bool)
sdf_threshold (float | None)
minimum_score (float | None)
feature_projector (Callable[[torch.Tensor], torch.Tensor] | None)
- get_stats(
- scan_pool=True,
- scan_hash=False,
Get mapper statistics.
- Parameters:
scan_pool (bool) – Forwarded to the integrator's stats call. When True (default) returns the ground-truth active_blocks and the holes invariant via an O(num_allocated) GPU reduction. When False, falls back to the cheap num_allocated - free_count approximation.
scan_hash (bool) – Forwarded to the integrator's stats call. When True, adds hash table occupancy stats via an O(hash_capacity) reduction. Periodic use only.
- Return type:
- Returns:
Dictionary with block usage, memory stats, frame count, and last_integration_kernel_timings_ms. See BlockSparseTSDF.get_stats for the full block-pool key list.
- update_static_obstacles(
- scene,
- env_idx=0,
Update static obstacles in the TSDF from scene collision tensors.
This method clears the static SDF channel and re-stamps all primitives from the provided scene. The static channel does not decay and is combined with the dynamic (depth) channel using min() for collision.
Note: This is not intended to be called every frame. Use it when static obstacle poses change (e.g., after robot reconfiguration).
- render(
- intrinsics,
- pose,
- image_shape,
Render depth and normals from current TSDF.
- render_color(
- intrinsics,
- pose,
- image_shape,
Render depth, normals, and color from current TSDF.
- Parameters:
- Return type:
- Returns:
Tuple of (depth, normals, color, valid_mask).
- depth: (H, W) in meters
- normals: (H, W, 3) world-frame normals
- color: (H, W, 3) uint8 RGB
- valid_mask: (H, W) bool
- render_depth(
- intrinsics,
- pose,
- image_shape,
Render only depth image.
- render_color_only(
- intrinsics,
- pose,
- image_shape,
Render only color image.
- render_shaded(
- intrinsics,
- pose,
- image_shape,
- light_direction=(0.0, 0.0, 1.0),
- ambient=1.0,
- use_color=True,
Render Lambertian-shaded image.
- Parameters:
intrinsics (Tensor) – Camera intrinsics.
pose (Pose) – Camera-to-world transform as Pose.
image_shape (Tuple[int, int]) – (height, width) of output images.
light_direction (Tuple[float, float, float]) – Light direction in camera frame.
ambient (float) – Ambient lighting factor (0-1).
use_color (bool) – If True, use TSDF color. If False, use gray.
- Returns:
(H, W, 3) uint8 RGB shaded image.
- Return type:
shaded
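The per-pixel shading rule can be sketched as follows; a hedged approximation of Lambertian shading with an ambient floor (the exact blend used by render_shaded may differ):

```python
def lambert_shade(color, normal, light_dir, ambient=1.0):
    """Shade one pixel: ambient term plus diffuse term from the surface normal.
    color is (r, g, b) uint8; normal and light_dir are unit 3-vectors."""
    ndotl = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    intensity = min(1.0, ambient + (1.0 - ambient) * ndotl)
    return tuple(int(round(c * intensity)) for c in color)

# Surface facing the light: full brightness. Surface edge-on: ambient only.
lit = lambert_shade((200, 100, 50), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), ambient=0.2)
dark = lambert_shade((200, 100, 50), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), ambient=0.2)
```

With the default ambient=1.0 the diffuse term vanishes and the output is flat (unshaded) color.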
- render_depth_colormap(
- intrinsics,
- pose,
- image_shape,
Render depth as colormap for visualization.
- render_normal_colormap(
- intrinsics,
- pose,
- image_shape,
Render normals as colormap for visualization.
- class MapperCfg(
- extent_meters_xyz,
- voxel_size=0.005,
- esdf_voxel_size=0.05,
- extent_esdf_meters_xyz=None,
- grid_center=None,
- truncation_distance=0.04,
- minimum_tsdf_weight=0.1,
- depth_minimum_distance=0.1,
- depth_maximum_distance=10.0,
- decay_factor=1.0,
- frustum_decay_factor=1.0,
- block_size=4,
- hash_load_factor=0.5,
- roughness=3.0,
- seeding_method='gather',
- edt_solver='pba',
- enable_static=False,
- static_obstacle_color=(20, 20, 20),
- num_cameras=1,
- image_height=None,
- image_width=None,
- feature_dim=0,
- feature_grid_height=None,
- feature_grid_width=None,
- max_visible_blocks_per_integration=None,
- max_support_pixels_per_block_camera=8,
- feature_channels_per_thread=8,
- max_feature_tile_channels=4096,
- feature_integration_kernel='auto',
- profile_integration_kernel_timings=False,
- accumulator_w_max=1000.0,
- device='cuda:0',
Bases: object

Configuration for volumetric mapping.
- Coordinate Convention:
extent_meters_xyz: (x, y, z) physical extent of voxel bounds (not centers)
grid_shape: (nz, ny, nx) voxel counts - Z is slowest, X is fastest
Memory layout: row-major with X contiguous (optimal for image projection)
grid_center: world coordinate at center of grid (default: origin)
Note: Actual extent may be slightly larger than requested due to ceil() rounding. Use get_actual_extent() to get the true physical dimensions.
- Index-to-World Transform (voxel center):
world_x = grid_center[0] + (ix - (nx-1)/2) * voxel_size
world_y = grid_center[1] + (iy - (ny-1)/2) * voxel_size
world_z = grid_center[2] + (iz - (nz-1)/2) * voxel_size
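These formulas, and the ceil() rounding noted above, can be sanity-checked with a plain-Python sketch; the helper names here are illustrative, not the MapperCfg API:

```python
import math

def voxel_to_world(idx, grid_center, grid_shape, voxel_size):
    """(iz, iy, ix) voxel index -> world (x, y, z) at the voxel center.
    grid_shape is (nz, ny, nx); grid_center is (x, y, z)."""
    iz, iy, ix = idx
    nz, ny, nx = grid_shape
    return (grid_center[0] + (ix - (nx - 1) / 2) * voxel_size,
            grid_center[1] + (iy - (ny - 1) / 2) * voxel_size,
            grid_center[2] + (iz - (nz - 1) / 2) * voxel_size)

def world_to_voxel(p, grid_center, grid_shape, voxel_size):
    """Inverse mapping: world (x, y, z) -> nearest (iz, iy, ix) voxel index."""
    nz, ny, nx = grid_shape
    ix = round((p[0] - grid_center[0]) / voxel_size + (nx - 1) / 2)
    iy = round((p[1] - grid_center[1]) / voxel_size + (ny - 1) / 2)
    iz = round((p[2] - grid_center[2]) / voxel_size + (nz - 1) / 2)
    return (iz, iy, ix)

def actual_extent(extent, voxel_size):
    """Requested extent rounded up to a whole number of voxels per axis."""
    return tuple(math.ceil(e / voxel_size) * voxel_size for e in extent)
```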
- Parameters:
voxel_size (float)
esdf_voxel_size (float)
grid_center (torch.Tensor | None)
truncation_distance (float)
minimum_tsdf_weight (float)
depth_minimum_distance (float)
depth_maximum_distance (float)
decay_factor (float)
frustum_decay_factor (float)
block_size (int)
hash_load_factor (float)
roughness (float)
seeding_method (str)
edt_solver (str)
enable_static (bool)
num_cameras (int)
image_height (int | None)
image_width (int | None)
feature_dim (int)
feature_grid_height (int | None)
feature_grid_width (int | None)
max_visible_blocks_per_integration (int | None)
max_support_pixels_per_block_camera (int)
feature_channels_per_thread (int)
max_feature_tile_channels (int)
feature_integration_kernel (str)
profile_integration_kernel_timings (bool)
accumulator_w_max (float)
device (str)
- extent_meters_xyz¶
Physical extent (x, y, z) of voxel bounds in meters.
- voxel_size¶
Voxel edge length in meters.
- grid_center¶
World coordinate at grid center (default: origin).
- truncation_distance¶
TSDF truncation distance in meters.
- minimum_tsdf_weight¶
Minimum weight to consider a voxel as observed.
- depth_minimum_distance¶
Minimum valid depth in meters.
- depth_maximum_distance¶
Maximum valid depth in meters.
- decay_factor¶
Weight decay applied to ALL voxels each frame (1.0 = no decay).
- frustum_decay_factor¶
Additional decay for in-view voxels.
- block_size¶
Voxels per block edge. Supported values are 1 or powers of 2 in [2, 32] (default 4). Specializes the Warp kernel builder; two Mappers with different block sizes can coexist.
- hash_load_factor¶
Target hash table load factor.
- device¶
CUDA device string.
- grid_center: torch.Tensor | None = None¶
- block_size: int = 4¶
Voxels per block edge. Supported values are 1 or powers of 2 in [2, 32]. See BlockSparseKernels.
- seeding_method: str = 'gather'¶
"scatter" or "gather".

"scatter": iterates every allocated TSDF voxel and maps each surface voxel to the single ESDF cell that contains its center. The resulting seed band is exactly one ESDF voxel thick at the surface, faithfully matching the TSDF. Launch dimension depends on the number of allocated TSDF blocks, so it is not CUDA-graph safe.

"gather": iterates every ESDF voxel and probes 7 TSDF positions (cell center + 6 face centers at ±esdf_vs/2). Because the face-center samples lie on the ESDF cell boundary, they can detect surface voxels that belong to neighboring ESDF cells, producing a dilated seed band (~1.5 voxels thick). This thicker band gives PBA more seed sites near the surface, which typically yields slightly higher collision recall at the cost of not matching the TSDF surface exactly. Fixed launch dimension (D×H×W), CUDA-graph safe.
- Type:
ESDF surface seeding strategy
- image_height: int | None = None¶
Camera image height in pixels. Required; used to pre-allocate the voxel-project scratch buffer at Mapper construction (buffer size num_cameras * image_height * image_width * num_samples) so no per-frame reallocation or D2H syncs occur.
- image_width: int | None = None¶
Camera image width in pixels. Required; see image_height.
- feature_grid_height: int | None = None¶
Compile-time feature-grid height. Required when feature_dim > 0; must be None when features are disabled.
- feature_grid_width: int | None = None¶
Compile-time feature-grid width. Required when feature_dim > 0; must be None when features are disabled.
- max_visible_blocks_per_integration: int | None = None¶
Maximum visible blocks one integration frame may process. Defaults to max_blocks after that value is computed.
- max_support_pixels_per_block_camera: int = 8¶
Maximum support pixels stored per visible block per camera for RGB and feature integration. Larger values use more scratch memory and preserve more per-block color/feature evidence.
- feature_channels_per_thread: int = 8¶
Number of adjacent feature channels accumulated by one feature-kernel thread. Kept explicit so Python launch grouping and Warp kernel decoding cannot drift apart.
- max_feature_tile_channels: int = 4096¶
Compile-time cap for feature channels accumulated by one tiled feature-kernel CTA. The generated tile width is min(feature_dim, max_feature_tile_channels).
- feature_integration_kernel: str = 'auto'¶
"auto", "grouped", or "tiled". "auto" resolves to the tiled kernel only for feature dimensions and support capacities where benchmarks showed a win.
- Type:
- profile_integration_kernel_timings: bool = False¶
Record per-kernel integration timings into Mapper.get_stats()["last_integration_kernel_timings_ms"]. Disabled by default to avoid profiling synchronizations in normal integration.
- accumulator_w_max: float = 1000.0¶
Upper bound on per-block accumulator weight. Caps fp16 block_rgb / block_features magnitudes each frame and sets the EMA decay rate for old observations (effective window ~ W_max / mean_per_frame_weight). Raise for longer memory, lower for faster adaptation to dynamic scenes.
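A weight-capped running average behaves like an exponential moving average once the cap is reached, which is where the effective-window estimate comes from. A minimal sketch of that mechanism (illustrative; the real accumulator runs per block in fp16 on GPU):

```python
def capped_accumulate(mean, weight, sample, w_frame=1.0, w_max=1000.0):
    """Weighted running average whose total weight is capped at w_max.
    Once the cap is hit, the update degenerates into an EMA with
    effective window ~ w_max / w_frame frames."""
    weight = min(weight + w_frame, w_max)
    alpha = w_frame / weight
    return mean + alpha * (sample - mean), weight

# With a small cap the estimate converges quickly and forgets old data quickly:
m, w = 0.0, 0.0
for _ in range(100):
    m, w = capped_accumulate(m, w, sample=1.0, w_max=10.0)
# m has converged to 1.0; w is pinned at the cap of 10.
```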
- property max_blocks: int¶
Compute max allocated blocks using surface area × thickness heuristic.
TSDF only allocates blocks near surfaces, not throughout the volume. The surface is a volumetric band with thickness = 2 × truncation_distance.
- Formula:
max_blocks = surface_area_blocks × thickness_factor × roughness
This scales O(N²) with grid size, not O(N³), matching actual TSDF usage.
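A back-of-envelope version of the heuristic; the constants and exact bookkeeping in the real max_blocks property may differ, so treat this purely as a sketch of the O(N²) scaling argument:

```python
import math

def max_blocks_estimate(extent_m, voxel_size, block_size, truncation_m, roughness):
    """Surface-area x thickness heuristic (illustrative).

    Blocks only appear in a band of thickness 2 * truncation around surfaces,
    so the count scales with the grid's largest cross-section (O(N^2)),
    not its volume (O(N^3))."""
    block_m = block_size * voxel_size
    nx = math.ceil(extent_m[0] / block_m)
    ny = math.ceil(extent_m[1] / block_m)
    nz = math.ceil(extent_m[2] / block_m)
    # One dominant surface sheet spanning the grid's largest cross-section:
    surface_area_blocks = max(nx * ny, ny * nz, nx * nz)
    thickness_blocks = math.ceil(2 * truncation_m / block_m)
    return surface_area_blocks * thickness_blocks * roughness

def dense_blocks(extent_m, block_m):
    """Dense O(N^3) block count, for comparison."""
    return (math.ceil(extent_m[0] / block_m)
            * math.ceil(extent_m[1] / block_m)
            * math.ceil(extent_m[2] / block_m))

# 2 m cube, 5 mm voxels, 4-voxel blocks, 4 cm truncation, roughness 3:
est = max_blocks_estimate((2.0, 2.0, 2.0), 0.005, 4, 0.04, 3.0)
# Far fewer blocks than the dense 100 x 100 x 100 block grid.
```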
- get_actual_extent()¶
Returns actual physical extent (x, y, z) after ceil() rounding.
- voxel_to_world(iz, iy, ix)¶
Convert voxel index to world coordinate at voxel center.
- world_to_voxel(
- world_x,
- world_y,
- world_z,
Convert world coordinate to voxel index.
- __init__(
- extent_meters_xyz,
- voxel_size=0.005,
- esdf_voxel_size=0.05,
- extent_esdf_meters_xyz=None,
- grid_center=None,
- truncation_distance=0.04,
- minimum_tsdf_weight=0.1,
- depth_minimum_distance=0.1,
- depth_maximum_distance=10.0,
- decay_factor=1.0,
- frustum_decay_factor=1.0,
- block_size=4,
- hash_load_factor=0.5,
- roughness=3.0,
- seeding_method='gather',
- edt_solver='pba',
- enable_static=False,
- static_obstacle_color=(20, 20, 20),
- num_cameras=1,
- image_height=None,
- image_width=None,
- feature_dim=0,
- feature_grid_height=None,
- feature_grid_width=None,
- max_visible_blocks_per_integration=None,
- max_support_pixels_per_block_camera=8,
- feature_channels_per_thread=8,
- max_feature_tile_channels=4096,
- feature_integration_kernel='auto',
- profile_integration_kernel_timings=False,
- accumulator_w_max=1000.0,
- device='cuda:0',
- Parameters:
voxel_size (float)
esdf_voxel_size (float)
grid_center (torch.Tensor | None)
truncation_distance (float)
minimum_tsdf_weight (float)
depth_minimum_distance (float)
depth_maximum_distance (float)
decay_factor (float)
frustum_decay_factor (float)
block_size (int)
hash_load_factor (float)
roughness (float)
seeding_method (str)
edt_solver (str)
enable_static (bool)
num_cameras (int)
image_height (int | None)
image_width (int | None)
feature_dim (int)
feature_grid_height (int | None)
feature_grid_width (int | None)
max_visible_blocks_per_integration (int | None)
max_support_pixels_per_block_camera (int)
feature_channels_per_thread (int)
max_feature_tile_channels (int)
feature_integration_kernel (str)
profile_integration_kernel_timings (bool)
accumulator_w_max (float)
device (str)
- Return type:
None
- class MatchedVoxels(
- voxels,
- block_pool_idx,
- block_scores,
Bases: object

Result of a feature-similarity query on a BlockSparseTSDF.

Voxels are extracted from the top-K matched blocks. block_pool_idx and block_scores are parallel (K,) tensors in descending score order: index i in one corresponds to index i in the other.

Scores are cosine similarity in the same feature space as the query vector (range [-1, 1]). Callers querying through a projection (e.g., RADIO -> teacher space) must interpret thresholds in the projected space. block_pool_idx is suitable for direct use with Mapper.clear_blocks to drop the matched blocks from the map.

- Parameters:
voxels (curobo._src.perception.mapper.storage.OccupiedVoxels)
block_pool_idx (torch.Tensor)
block_scores (torch.Tensor)
- voxels¶
Extracted voxels from the matched blocks. len(voxels) may be 0 even when len(block_pool_idx) > 0 if surface filtering rejected every voxel inside the matched blocks.
- block_pool_idx¶
(K,) int32 global pool_idx of matched blocks, sorted by block_scores descending.
- block_scores¶
(K,) float32 cosine scores, parallel to block_pool_idx (same descending order).
- block_pool_idx: torch.Tensor¶
- block_scores: torch.Tensor¶
- scores_per_voxel(
- fill_value=nan,
Gather per-voxel score from per-block scores.
Convenience for visualization (e.g., color voxels by similarity). Granularity is fundamentally per-block: every voxel in a matched block shares its block's score. Prefer block_scores for ranking and filtering.

- Parameters:
fill_value (float) – Value written for voxels whose pool_idx is not in block_pool_idx. Defaults to NaN so an unset entry is loud rather than silently masquerading as a low-similarity match. Voxels in voxels should always come from matched blocks; fill_value is a safety net.
- Return type:
- Returns:
(N,) float32 tensor parallel to voxels.centers.
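The gather semantics reduce to a per-block lookup broadcast over voxels. A toy sketch (plain Python lists standing in for the tensors):

```python
import math

def scores_per_voxel(voxel_block_idx, block_pool_idx, block_scores, fill_value=math.nan):
    """Broadcast per-block scores to voxels: every voxel inherits the score of
    its block; voxels from unmatched blocks get fill_value (NaN by default,
    so an unset entry is loud)."""
    lookup = dict(zip(block_pool_idx, block_scores))
    return [lookup.get(b, fill_value) for b in voxel_block_idx]

per_voxel = scores_per_voxel(
    voxel_block_idx=[7, 7, 3, 9],   # block of each voxel
    block_pool_idx=[7, 3],          # matched blocks, descending score
    block_scores=[0.9, 0.6],
)
# [0.9, 0.9, 0.6, nan] -- block 9 was not matched.
```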
- __init__(
- voxels,
- block_pool_idx,
- block_scores,
- Parameters:
voxels (curobo._src.perception.mapper.storage.OccupiedVoxels)
block_pool_idx (torch.Tensor)
block_scores (torch.Tensor)
- Return type:
None
- class OccupiedVoxels(
- centers,
- block_idx_per_voxel,
- block_data,
Bases: object

Result of extract_occupied_voxels.

- Parameters:
centers (torch.Tensor)
block_idx_per_voxel (torch.Tensor)
block_data (curobo._src.perception.mapper.storage.BlockDataView)
- centers¶
(N, 3) float32 voxel world positions.
- block_idx_per_voxel¶
(N,) int32 global pool_idx, an index into BlockDataView fields.
- block_data¶
Reference view of per-block storage tensors.
- centers: torch.Tensor¶
- block_idx_per_voxel: torch.Tensor¶
- block_data: curobo._src.perception.mapper.storage.BlockDataView¶
- colors_uint8(eps=1e-06)¶
Gather per-voxel RGB colors as (N, 3) uint8.

Weighted sums (stored fp16, RGB normalized to [0, 1] at the integration site) are divided in fp32 and rescaled back to [0, 255] before the uint8 cast. Clamping protects degenerate blocks with zero weight from division by zero.
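The recovery step is a guarded weighted-average division; a toy single-voxel sketch (the real code is vectorized over all voxels on GPU):

```python
def voxel_color_uint8(rgb_weighted_sum, weight, eps=1e-6):
    """Recover uint8 RGB from a weighted color sum whose channels were
    normalized to [0, 1] when accumulated. Clamping the weight protects
    zero-weight blocks from division by zero."""
    w = max(weight, eps)
    return tuple(
        min(255, max(0, int(round(c / w * 255.0)))) for c in rgb_weighted_sum
    )

# Two observations of a (1.0, 0.5, 0.0)-ish color, total weight 2:
color = voxel_color_uint8((1.0, 0.5, 0.0), 2.0)
# A block that never received color evidence stays black instead of crashing:
unseen = voxel_color_uint8((0.0, 0.0, 0.0), 0.0)
```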
- features(eps=1e-06)¶
Gather per-voxel normalized features as (N, feature_dim) float32.

Weighted sums (stored fp16) are divided in fp32 to avoid compounding ulp loss. The per-block feature weight is clamped to avoid division by zero for blocks that never received a feature observation.
- __init__(
- centers,
- block_idx_per_voxel,
- block_data,
- Parameters:
centers (torch.Tensor)
block_idx_per_voxel (torch.Tensor)
block_data (curobo._src.perception.mapper.storage.BlockDataView)
- Return type:
None
- class PoseDetector(geometry, config)¶
Bases: object

Point-to-plane ICP detector with Huber loss and a coarse-to-fine strategy.

Achieves millimeter-level accuracy (1.3 mm translation, 0.03° rotation) on robot tracking. Works with any geometry class that provides sample_surface_points and get_dof methods.
Supported geometry types:
- RigidObjectGeometry: Static objects
- ArticulatedRobotGeometry: Robots with FK-based point transformation
- RobotMesh: New unified mesh class with Warp support

Key features:
- Point-to-plane ICP (3× better than point-to-point)
- Huber loss for outlier robustness (12× improvement)
- Coarse-to-fine with 64 random rotation initializations
- Cached surface points and normals for speed
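The two core ingredients, point-to-plane residuals and Huber down-weighting, fit in a few lines of illustrative Python (not the detector's actual solver; points here are made-up toy data):

```python
def point_to_plane_residuals(src, dst, normals):
    """Point-to-plane residual r_i = n_i . (p_i - q_i): distance of the
    transformed source point p_i to the tangent plane at its match q_i."""
    return [sum(n[k] * (p[k] - q[k]) for k in range(3))
            for p, q, n in zip(src, dst, normals)]

def huber_weight(r, delta=0.02):
    """IRLS weight for the Huber loss: quadratic inside delta, linear outside,
    so a gross outlier contributes delta/|r| instead of pulling quadratically."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a

src = [(0.0, 0.0, 0.005), (0.0, 0.0, 1.0)]   # second point is a 1 m outlier
dst = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
res = point_to_plane_residuals(src, dst, normals)
weights = [huber_weight(r) for r in res]
# The inlier keeps weight 1.0; the outlier's weight shrinks to 0.02.
```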
Initialize pose detector.
- Parameters:
geometry (Union[RigidObjectGeometry, ArticulatedRobotGeometry, RobotMesh]) – Geometry model (RigidObjectGeometry, ArticulatedRobotGeometry, or RobotMesh).
config (DetectorCfg) – Detector configuration.
- __init__(geometry, config)¶
Initialize pose detector.
- Parameters:
geometry (Union[RigidObjectGeometry, ArticulatedRobotGeometry, RobotMesh]) – Geometry model (RigidObjectGeometry, ArticulatedRobotGeometry, or RobotMesh).
config (DetectorCfg) – Detector configuration.
- detect(camera_obs, config)¶
Detect pose from camera observation.
- Parameters:
camera_obs (CameraObservation) – Camera observation with depth and segmentation.
config (Tensor) – Object configuration (joint angles for robot, ignored for rigid).
- Return type:
DetectionResult
- Returns:
DetectionResult with pose and alignment error.
- detect_from_points(
- observed_points,
- config,
- initial_pose=None,
Detect pose from pre-segmented 3D points.
This is useful when you already have segmented points (e.g., from a bounding box or semantic segmentation) and don’t need to extract them from a camera observation.
- Parameters:
observed_points (Tensor) – [N, 3] tensor of observed 3D points.
config (Tensor) – Object configuration (joint angles for robot, ignored for rigid).
initial_pose (Optional[Pose]) – Optional initial pose guess. If provided, only refines from this pose (no random rotation sampling). If None, uses random rotations.
- Returns:
Detection result with pose and alignment error.
- Return type:
result
- class RobotMesh(
- vertices,
- faces,
- device='cuda:0',
- kinematics=None,
- link_vertex_ranges=None,
- link_names=None,
- link_vertices_local=None,
Bases: object

Unified mesh representation for rigid objects and articulated robots.
Provides mesh data for both ICP-based and SDF-based pose detection:
- mesh property: wp.Mesh for SDF queries
- sample_surface_points(): returns points + normals for ICP
For articulated robots, call update(joint_angles) to transform vertices and refit BVH. The mesh reference remains constant (CUDA graph safe).
Example

```python
# Rigid object
robot_mesh = RobotMesh.from_trimesh(trimesh.load("object.stl"))
points, normals = robot_mesh.sample_surface_points(1000)

# Articulated robot
robot_mesh = RobotMesh.from_kinematics(kinematics)
robot_mesh.update(joint_angles)
points, normals = robot_mesh.sample_surface_points(1000)

# Use with SDF detector
sdf = wp.mesh_query_point(robot_mesh.mesh.id, query_point, max_dist)
```
Initialize RobotMesh.
Use factory methods from_trimesh() or from_kinematics() instead of calling this constructor directly.
- Parameters:
vertices (Tensor) – [V, 3] mesh vertices in world frame.
faces (Tensor) – [F, 3] face indices.
device (str) – CUDA device string.
kinematics (Optional[Kinematics]) – Kinematics instance for articulated robots.
link_vertex_ranges (Optional[List[Tuple[int, int]]]) – List of (start, end) indices for each link's vertices.
link_names (Optional[List[str]]) – List of link names (for FK lookup).
link_vertices_local (Optional[List[Tensor]]) – List of vertices in each link's local frame.
- __init__(
- vertices,
- faces,
- device='cuda:0',
- kinematics=None,
- link_vertex_ranges=None,
- link_names=None,
- link_vertices_local=None,
Initialize RobotMesh.
Use factory methods from_trimesh() or from_kinematics() instead of calling this constructor directly.
- Parameters:
vertices (Tensor) – [V, 3] mesh vertices in world frame.
faces (Tensor) – [F, 3] face indices.
device (str) – CUDA device string.
kinematics (Optional[Kinematics]) – Kinematics instance for articulated robots.
link_vertex_ranges (Optional[List[Tuple[int, int]]]) – List of (start, end) indices for each link's vertices.
link_names (Optional[List[str]]) – List of link names (for FK lookup).
link_vertices_local (Optional[List[Tensor]]) – List of vertices in each link's local frame.
- classmethod from_trimesh(
- mesh,
- device='cuda:0',
Create RobotMesh from a trimesh object (rigid object).
- classmethod from_kinematics(
- kinematics,
- device='cuda:0',
- initial_joint_angles=None,
Create RobotMesh from a Kinematics instance (articulated robot).
Combines all link meshes into a single mesh. Call update(joint_angles) to transform vertices based on FK.
- Parameters:
kinematics (Kinematics) – CuRobo Kinematics instance.
device (str) – CUDA device string.
initial_joint_angles (Optional[Tensor]) – Initial joint configuration. If None, uses zeros.
- Return type:
- Returns:
RobotMesh instance for articulated robot.
- property mesh: warp._src.types.Mesh¶
Get Warp mesh for SDF queries.
The mesh reference is constant (CUDA graph safe), but vertices are updated in-place when update() is called.
- Returns:
wp.Mesh instance.
- property mesh_id: warp._src.types.uint64¶
Get Warp mesh ID for kernel use.
- Returns:
Mesh ID as wp.uint64.
- property vertices: torch.Tensor¶
Get current vertices as torch tensor.
- Returns:
[V, 3] vertex positions in world frame.
- property faces: torch.Tensor¶
Get face indices.
- Returns:
[F, 3] face vertex indices.
- property current_joint_angles: torch.Tensor | None¶
Current joint configuration (None for rigid objects).
- update(joint_angles)¶
Update mesh vertices based on joint configuration.
Transforms all link vertices using FK and refits BVH. No-op for rigid objects.
- sample_surface_points(
- n_points,
- resample=False,
Sample points and normals from mesh surface.
Uses pre-cached sample indices for consistency. Set resample=True to generate new random samples.
- get_dof()¶
Get degrees of freedom.
- Return type:
- Returns:
0 for rigid objects, n_joints for articulated robots.
- class RobotSegmenter(
- kinematics,
- distance_threshold=0.05,
- use_cuda_graph=True,
- ops_dtype=torch.bfloat16,
Bases: object

Segment robot from depth images using collision spheres.
This class computes masks for robot pixels in depth images by checking distances between depth points and robot collision spheres.
Example

```python
from curobo._src.perception.robot_segmenter import RobotSegmenter

segmenter = RobotSegmenter.from_robot_file("franka.yml")
mask, filtered_image = segmenter.get_robot_mask(camera_obs, joint_state)
```
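Under the hood, the mask is a distance test against the robot's collision spheres. A toy CPU sketch of the idea (the real version batches this on GPU over depth-derived point clouds; all data here is illustrative):

```python
def robot_mask(points, spheres, distance_threshold=0.05):
    """Mark points within distance_threshold of any collision sphere's
    surface. Each sphere is (cx, cy, cz, radius) in meters."""
    mask = []
    for p in points:
        hit = False
        for cx, cy, cz, r in spheres:
            d = ((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2) ** 0.5
            if d <= r + distance_threshold:
                hit = True
                break
        mask.append(hit)
    return mask

spheres = [(0.0, 0.0, 0.5, 0.1)]                   # one link sphere
points = [(0.0, 0.0, 0.55), (0.5, 0.5, 0.5)]       # robot point, background point
mask = robot_mask(points, spheres)
# The first point lies on the robot; the second is background.
```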
Initialize the robot segmenter.
- Parameters:
kinematics (Kinematics) – Robot kinematics model for forward kinematics.
distance_threshold (float) – Distance threshold for segmentation.
use_cuda_graph (bool) – Whether to use CUDA graphs for acceleration.
ops_dtype (dtype) – Data type for operations.
- __init__(
- kinematics,
- distance_threshold=0.05,
- use_cuda_graph=True,
- ops_dtype=torch.bfloat16,
Initialize the robot segmenter.
- Parameters:
kinematics (Kinematics) – Robot kinematics model for forward kinematics.
distance_threshold (float) – Distance threshold for segmentation.
use_cuda_graph (bool) – Whether to use CUDA graphs for acceleration.
ops_dtype (dtype) – Data type for operations.
- static from_robot_file(
- robot_file,
- collision_sphere_buffer=None,
- distance_threshold=0.05,
- use_cuda_graph=True,
- device_cfg=DeviceCfg(device=device(type='cuda', index=0), dtype=torch.float32, collision_geometry_dtype=torch.float32, collision_gradient_dtype=torch.float32, collision_distance_dtype=torch.float32),
Create a RobotSegmenter from a robot configuration file.
- Parameters:
robot_file (Union[str, Dict]) – Path to robot configuration file or dict.
collision_sphere_buffer (Optional[float]) – Optional buffer to add to collision spheres.
distance_threshold (float) – Distance threshold for segmentation.
use_cuda_graph (bool) – Whether to use CUDA graphs.
device_cfg (DeviceCfg) – Tensor device configuration.
- Return type:
RobotSegmenter
- Returns:
Configured RobotSegmenter instance.
- get_pointcloud_from_depth(
- camera_obs,
Convert depth image to point cloud.
- Parameters:
camera_obs (CameraObservation) – Camera observation with depth image.
- Return type:
Tensor
- Returns:
Point cloud tensor.
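The conversion follows the standard pinhole back-projection. The sketch below uses NumPy under the usual intrinsics convention (fx, fy, cx, cy); the actual method operates on `CameraObservation` tensors and cached projection rays, so the function name and signature here are illustrative assumptions.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image to an (H, W, 3) point map."""
    h, w = depth.shape
    # Pixel grids: u[i, j] = j (column), v[i, j] = i (row).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx  # X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy  # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1)
```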
- update_camera_projection(
- camera_obs,
Update camera projection rays from observation.
- Parameters:
camera_obs (CameraObservation) – Camera observation with intrinsics.
- Return type:
- get_robot_mask(
- camera_obs,
- joint_state,
Get robot mask from camera observation and joint state.
Assumes 1 robot and batch of depth images, batch of poses.
- Parameters:
camera_obs (CameraObservation) – Camera observation with depth image.
joint_state (JointState) – Current joint state of the robot.
- Return type:
- Returns:
Tuple of (mask, filtered_image).
- Raises:
ValueError – If depth image is not 3D (batch, height, width).
- get_robot_mask_from_active_js(
- camera_obs,
- active_joint_state,
Get robot mask using active joint state.
- Parameters:
camera_obs (CameraObservation) – Camera observation with depth image.
active_joint_state (JointState) – Joint state with only active joints.
- Return type:
- Returns:
Tuple of (mask, filtered_image).
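A typical way to consume the returned tuple, sketched with NumPy stand-ins for the tensors (the actual return values are torch tensors; zeroing masked pixels is an assumption consistent with "depth image with robot removed" above):

```python
import numpy as np

depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
mask = np.array([[True, False],
                 [False, True]])  # stand-in for the robot mask
# Remove robot pixels: invalidate depth where the mask is set.
filtered = np.where(mask, 0.0, depth)
```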
- property kinematics: curobo._src.robot.kinematics.kinematics.Kinematics¶
Get the robot kinematics model.
- class SDFDetectorCfg(
- max_iterations=100,
- inner_iterations=25,
- convergence_threshold=1e-05,
- rotation_convergence_threshold=1e-05,
- use_cuda_graph=True,
- distance_threshold=0.2,
- min_valid_ratio=0.1,
- use_huber=True,
- huber_delta=0.1,
- lambda_initial=0.001,
- lambda_factor=10.0,
- lambda_min=1e-07,
- lambda_max=10000000.0,
- rho_min=0.25,
- n_points=5000,
- device_cfg=<factory>,
Bases: object
Configuration for SDF-based pose detector (Levenberg-Marquardt optimization).
Uses mesh SDF queries for implicit correspondence and analytic gradients.
- Parameters:
max_iterations (int)
inner_iterations (int)
convergence_threshold (float)
rotation_convergence_threshold (float)
use_cuda_graph (bool)
distance_threshold (float)
min_valid_ratio (float)
use_huber (bool)
huber_delta (float)
lambda_initial (float)
lambda_factor (float)
lambda_min (float)
lambda_max (float)
rho_min (float)
n_points (int)
device_cfg (curobo._src.types.device_cfg.DeviceCfg)
- device_cfg: curobo._src.types.device_cfg.DeviceCfg¶
- property max_distance¶
- __init__(
- max_iterations=100,
- inner_iterations=25,
- convergence_threshold=1e-05,
- rotation_convergence_threshold=1e-05,
- use_cuda_graph=True,
- distance_threshold=0.2,
- min_valid_ratio=0.1,
- use_huber=True,
- huber_delta=0.1,
- lambda_initial=0.001,
- lambda_factor=10.0,
- lambda_min=1e-07,
- lambda_max=10000000.0,
- rho_min=0.25,
- n_points=5000,
- device_cfg=<factory>,
- Parameters:
max_iterations (int)
inner_iterations (int)
convergence_threshold (float)
rotation_convergence_threshold (float)
use_cuda_graph (bool)
distance_threshold (float)
min_valid_ratio (float)
use_huber (bool)
huber_delta (float)
lambda_initial (float)
lambda_factor (float)
lambda_min (float)
lambda_max (float)
rho_min (float)
n_points (int)
device_cfg (curobo._src.types.device_cfg.DeviceCfg)
- Return type:
None
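The `lambda_*` and `rho_min` parameters follow the standard Levenberg-Marquardt damping schedule driven by the gain ratio. A minimal sketch; the exact accept/reject rule used internally is an assumption:

```python
def lm_damping_step(lmbda, rho, lambda_factor=10.0,
                    lambda_min=1e-7, lambda_max=1e7, rho_min=0.25):
    """One damping update from the gain ratio rho.

    rho = (actual cost reduction) / (reduction predicted by the
    linearized model). Returns (new_lambda, step_accepted).
    """
    if rho > rho_min:
        # Good agreement with the local model: accept the step and
        # relax damping toward Gauss-Newton behavior.
        return max(lmbda / lambda_factor, lambda_min), True
    # Poor step: reject it and increase damping toward gradient descent.
    return min(lmbda * lambda_factor, lambda_max), False
```

Clamping to `[lambda_min, lambda_max]` keeps the normal equations well conditioned without freezing progress.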
- class SDFPoseDetector(robot_mesh, config=None)¶
Bases: object
SDF-based pose detector using mesh SDF queries and Levenberg-Marquardt.
Uses ground-truth mesh SDF for implicit correspondence finding. Supports both rigid objects and articulated robots via RobotMesh.
Uses a two-pass kernel design for reduced register pressure and better GPU occupancy:
- Pass 1: Query mesh SDF (BVH traversal)
- Pass 2: Compute Jacobian and block-reduce
Initialize SDF pose detector.
- Parameters:
robot_mesh (RobotMesh) – RobotMesh instance providing mesh for SDF queries.
config (Optional[SDFDetectorCfg]) – Detector configuration.
- __init__(
- robot_mesh,
- config=None,
Initialize SDF pose detector.
- Parameters:
robot_mesh (RobotMesh) – RobotMesh instance providing mesh for SDF queries.
config (Optional[SDFDetectorCfg]) – Detector configuration.
- detect(
- camera_obs,
- config=None,
- initial_pose=None,
Detect pose from camera observation.
- Parameters:
camera_obs (CameraObservation) – Camera observation with depth and segmentation.
config (Optional[Tensor]) – Joint angles for articulated robots (updates mesh). Ignored for rigid objects.
initial_pose (Optional[Pose]) – Initial pose estimate. Required for the SDF detector.
- Return type:
DetectionResult
- Returns:
DetectionResult with refined pose.
- detect_from_points(
- observed_points,
- config=None,
- initial_pose=None,
Detect object pose from pre-segmented 3D points.
- Parameters:
- Return type:
DetectionResult
- Returns:
DetectionResult with refined pose.
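The implicit-correspondence idea can be illustrated on a toy rigid case: observed points are aligned to a known SDF (here an analytic unit sphere rather than a mesh BVH) by Gauss-Newton on the residuals r_i = sdf(p_i - t). This is a simplified sketch; the actual detector additionally optimizes rotation and applies Huber weighting and LM damping.

```python
import numpy as np

def sphere_sdf(p):
    """Signed distance to a unit sphere at the origin; p: (N, 3)."""
    return np.linalg.norm(p, axis=-1) - 1.0

def sdf_grad(p):
    """Analytic SDF gradient (outward unit normal at p)."""
    n = np.linalg.norm(p, axis=-1, keepdims=True)
    return p / np.maximum(n, 1e-9)

def refine_translation(points, t, iters=20):
    """Gauss-Newton on r_i = sdf(p_i - t); d r_i / d t = -grad sdf."""
    for _ in range(iters):
        q = points - t
        r = sphere_sdf(q)       # residuals: distance to the surface
        J = -sdf_grad(q)        # (N, 3) Jacobian rows
        # Normal equations (J^T J) dt = -J^T r, lightly regularized.
        dt = np.linalg.solve(J.T @ J + 1e-9 * np.eye(3), -J.T @ r)
        t = t + dt
    return t
```

With well-spread points this recovers the translation of the zero level set; the real detector replaces the analytic SDF and gradient with BVH mesh queries (Pass 1) and a fused Jacobian reduction (Pass 2).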