ScanNet Example

This example trains a semantic segmentation model on ScanNet indoor scenes using a MinkUNet-style encoder-decoder built with sparse convolutions.

Dataset

The script uses the pre-processed ScanNet 3D point clouds from the OpenScene project. Each scene is stored as (coords, colors, labels):

  • coords: (N, 3) float32 — 3D point positions
  • colors: (N, 3) float32 — RGB color features
  • labels: (N,) int — semantic class labels (20 classes, 255 = ignore)

The 20 semantic classes include: wall, floor, cabinet, bed, chair, sofa, table, door, window, bookshelf, picture, counter, desk, curtain, refrigerator, shower curtain, toilet, sink, bathtub, and other furniture.

The dataset is downloaded automatically on first run (~1.3 GB) to ./data/scannet_3d/.

No data augmentation

This example does not apply augmentations (random rotation, scaling, color jitter, etc.). For high-quality training results, implement your own augmentation pipeline.

Network architecture

The default model is MinkUNet18, a U-Net with sparse convolution encoder and decoder blocks connected by skip connections. Available models:

Model Description
mink_unet.MinkUNet18 Lightweight U-Net (default)
mink_unet.MinkUNet34 Deeper encoder
mink_unet.MinkUNet50 ResNet-50 style blocks
mink_unet.MinkUNet101 ResNet-101 style blocks

Input points are voxelized at voxel_size=0.02 and wrapped via PointToSparseWrapper, which handles the point-to-voxel conversion and maps output features back to the original point resolution.

The model outputs per-point logits with shape (N, 20).

Setup

Install the optional model and training dependencies:

pip install "warpconvnet[models]"

Additional requirements: hydra-core, omegaconf, torchmetrics.

Run

python examples/scannet.py

The script uses Hydra for configuration. Override any parameter on the command line:

# Smaller batch size for limited GPU memory
python examples/scannet.py train.batch_size=4

# Use a deeper model
python examples/scannet.py model._target_=mink_unet.MinkUNet34

# Change voxel size and learning rate
python examples/scannet.py data.voxel_size=0.05 train.lr=0.01

Configuration reference

Paths:

Key Default Description
paths.data_dir ./data/scannet_3d Dataset directory
paths.output_dir ./results/ Output directory
paths.ckpt_path null Checkpoint path to resume from

Training:

Key Default Description
train.batch_size 12 Training batch size
train.lr 0.001 AdamW learning rate
train.epochs 100 Number of training epochs
train.step_size 20 StepLR decay period (epochs)
train.gamma 0.7 StepLR decay factor
train.num_workers 8 DataLoader workers

Test:

Key Default Description
test.batch_size 12 Test batch size
test.num_workers 4 DataLoader workers

Data:

Key Default Description
data.num_classes 20 Number of semantic classes
data.voxel_size 0.02 Voxelization resolution (meters)
data.ignore_index 255 Label index to ignore in loss/metrics

Model:

Key Default Description
model._target_ mink_unet.MinkUNet18 Model class to instantiate
model.in_channels 3 Input feature channels (RGB)
model.out_channels 20 Output channels (num classes)
model.in_type voxel Input type (voxel wraps model with PointToSparseWrapper)

General:

Key Default Description
device cuda Device
use_wandb false Enable Weights & Biases logging
seed 42 Random seed

Expected output

Each epoch prints a progress bar followed by test-set evaluation with accuracy and mean IoU:

Train Epoch: 1 Loss:  2.143: 100%|██████████| 104/104
Test set: Average loss:  1.8234, Accuracy:  42.15%, mIoU:  18.73%

After 100 epochs with default settings, expect roughly:

  • Overall accuracy: ~75-80%
  • mIoU: ~55-65%

Results will vary with augmentation, model choice, and voxel size. This example is intended as a starting point, not a benchmark-tuned recipe.