Compilation Guide¶
WarpConvNet ships CUDA C++ extensions that must be compiled against your specific PyTorch and CUDA versions. This page covers all compilation methods and common issues.
Prerequisites¶
- NVIDIA GPU with compute capability >= 7.0 (Volta or newer)
- CUDA toolkit with
nvccmatching your PyTorch CUDA version - PyTorch with CUDA support (CPU-only builds are not supported)
- ninja build system (for parallel compilation)
- C++17 compatible compiler (GCC >= 7, Clang >= 5)
Method 1: Pre-built wheels (no compilation)¶
The fastest option. Pre-built wheels are available for common PyTorch + CUDA combinations:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install "warpconvnet==1.5.0+torch2.10cu128" \
--find-links https://github.com/NVlabs/WarpConvNet/releases/latest/download/
Replace the version string to match your PyTorch + CUDA combo. See available wheels.
Method 2: pip install from source¶
Builds the CUDA extension automatically. Takes ~10 minutes.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install build ninja
# From PyPI
pip install warpconvnet
# Or from a local clone
git clone https://github.com/NVlabs/WarpConvNet.git
cd WarpConvNet
git submodule update --init 3rdparty/cutlass
pip install -e . --no-build-isolation
Targeting specific GPU architectures¶
By default, WarpConvNet auto-detects your current GPU and compiles only for that architecture. To target specific architectures (e.g., for deployment across different GPUs):
export TORCH_CUDA_ARCH_LIST="8.0 8.9" # A100 + RTX 6000 Ada
pip install -e . --no-build-isolation
Common architecture codes:
| GPU family | Compute capability |
|---|---|
| Volta (V100) | 7.0 |
| Turing (RTX 20xx) | 7.5 |
| Ampere (A100, RTX 30xx) | 8.0, 8.6 |
| Ada Lovelace (RTX 40xx, RTX 6000 Ada) | 8.9 |
| Hopper (H100) | 9.0 |
| Blackwell (B200) | 10.0 |
Method 3: build_ext (in-place compilation only)¶
Use setup.py build_ext --inplace when you want to compile the C++ extension
without a full pip install. This is useful for development iteration — it
builds warpconvnet/_C.*.so directly into the source tree.
git clone https://github.com/NVlabs/WarpConvNet.git
cd WarpConvNet
git submodule update --init 3rdparty/cutlass
pip install build ninja # build dependencies
python setup.py build_ext --inplace
setuptools-scm version detection¶
WarpConvNet uses setuptools-scm to derive the package version from git
tags (e.g., tag v1.5.0 produces version 1.5.0). This can fail in several
situations:
- Detached HEAD (e.g., during
git rebase) - Shallow clones (
git clone --depth 1) - No git tags in the clone
- Worktrees without tag visibility
When version detection fails, you'll see an error like:
LookupError: setuptools-scm was unable to detect version
Fix: Set the SETUPTOOLS_SCM_PRETEND_VERSION environment variable to
bypass git-based version detection:
# build_ext only (development)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace
# Full editable install
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 pip install -e . --no-build-isolation
# Clean rebuild (removes stale build artifacts)
rm -rf build/
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace
The version string is only used for metadata — it doesn't affect
functionality. Use 0.0.0 for local development or any valid version string.
Verify the build¶
After compilation, verify the extension loads correctly:
python -c "import warpconvnet; print('OK')"
If the C++ extension fails to load, you'll see an ImportError with details
about the missing symbol or ABI mismatch.
Troubleshooting¶
ninja: build stopped: subcommand failed¶
The actual error is usually above the ninja message. Scroll up to find the
nvcc or g++ error. Common causes:
- CUDA version mismatch:
nvcc --versionmust match the CUDA version your PyTorch was built with. Check withpython -c "import torch; print(torch.version.cuda)". - Missing CUTLASS submodule: Run
git submodule update --init 3rdparty/cutlass.
undefined symbol at import time¶
The compiled extension was built against a different PyTorch version than the one currently installed. Rebuild:
rm -rf build/
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace
Slow compilation¶
Compilation builds 30+ CUDA source files. To speed it up:
- Install
ninja(pip install ninja) — enables parallel compilation. - Limit target architectures:
export TORCH_CUDA_ARCH_LIST="8.9"(your GPU only). - Use
ccacheif available.
Multiple Python versions¶
Each Python version needs its own compiled .so file
(e.g., _C.cpython-311-x86_64-linux-gnu.so). If you switch Python versions,
rebuild the extension.