Compilation Guide

WarpConvNet ships CUDA C++ extensions that must be compiled against your specific PyTorch and CUDA versions. This page covers all compilation methods and common issues.

Prerequisites

NVIDIA GPU with compute capability >= 7.0 (Volta or newer)
CUDA toolkit with nvcc matching your PyTorch CUDA version
PyTorch with CUDA support (CPU-only builds are not supported)
ninja build system (for parallel compilation)
C++17 compatible compiler (GCC >= 7, Clang >= 5)
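
The checklist above can be verified before starting a build. The following is a minimal sketch, not part of WarpConvNet: the helper names (capability_ok, missing_tools, report) are hypothetical, and it assumes that nvcc and ninja are expected to be on PATH.

```python
import shutil

MIN_CAPABILITY = (7, 0)  # Volta, the minimum listed in the prerequisites


def capability_ok(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) compute capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


def missing_tools(tools=("nvcc", "ninja")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report():
    """Print a short prerequisite summary (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No usable CUDA device, or a CPU-only PyTorch build")
    else:
        for i in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(i)
            status = "ok" if capability_ok(cap) else "too old"
            name = torch.cuda.get_device_name(i)
            print(f"GPU {i}: {name} sm_{cap[0]}{cap[1]} ({status})")
    for tool in missing_tools():
        print("missing from PATH:", tool)


if __name__ == "__main__":
    report()
```

The compiler version is not checked here; run gcc --version or clang --version separately if you suspect a C++17 problem.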

Method 1: Pre-built wheels (no compilation)

The fastest option. Pre-built wheels are available for common PyTorch + CUDA combinations:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

pip install "warpconvnet==1.5.0+torch2.10cu128" \
    --find-links https://github.com/NVlabs/WarpConvNet/releases/latest/download/

Replace the version string to match your PyTorch + CUDA combo. See available wheels.
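
If you are unsure which version string matches your environment, it can be derived from the installed PyTorch. This sketch assumes the local version tag always follows the pattern of the single example above (torch<major>.<minor>cu<CUDA digits>); wheel_requirement is a hypothetical helper, and a wheel for the resulting combination is not guaranteed to exist.

```python
def wheel_requirement(package_version, torch_version, cuda_version):
    """Build a pip requirement string such as warpconvnet==1.5.0+torch2.10cu128.

    The tag layout is inferred from the example in this guide and is an
    assumption, not a documented naming scheme.
    """
    # "2.10.0+cu128" -> "2.10" (drop the local tag, keep major.minor)
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    # "12.8" -> "cu128"
    cuda_tag = "cu" + "".join(cuda_version.split(".")[:2])
    return f"warpconvnet=={package_version}+torch{torch_mm}{cuda_tag}"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        if torch.version.cuda is None:
            print("CPU-only PyTorch build; no matching wheel")
        else:
            print(wheel_requirement("1.5.0", torch.__version__, torch.version.cuda))
```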

Method 2: pip install from source

Builds the CUDA extension automatically. Takes ~10 minutes.

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install build ninja

# From PyPI
pip install warpconvnet

# Or from a local clone
git clone https://github.com/NVlabs/WarpConvNet.git
cd WarpConvNet
git submodule update --init 3rdparty/cutlass
pip install -e . --no-build-isolation

Targeting specific GPU architectures

By default, WarpConvNet auto-detects your current GPU and compiles only for that architecture. To target specific architectures (e.g., for deployment across different GPUs):

export TORCH_CUDA_ARCH_LIST="8.0 8.9"  # A100 + RTX 6000 Ada
pip install -e . --no-build-isolation

Common architecture codes:

GPU family	Compute capability
Volta (V100)	7.0
Turing (RTX 20xx)	7.5
Ampere (A100, RTX 30xx)	8.0, 8.6
Ada Lovelace (RTX 40xx, RTX 6000 Ada)	8.9
Hopper (H100)	9.0
Blackwell (B200)	10.0
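
To build the TORCH_CUDA_ARCH_LIST value from the GPUs visible on a machine instead of looking them up in the table, a sketch along these lines may help. arch_list is a hypothetical helper; the only PyTorch calls used are torch.cuda.device_count and torch.cuda.get_device_capability.

```python
def arch_list(capabilities):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value.

    Duplicates are removed and entries are sorted numerically.
    """
    unique = sorted(set(tuple(c) for c in capabilities))
    return " ".join(f"{major}.{minor}" for major, minor in unique)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        caps = [
            torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())
        ]
        if caps:
            print(f'export TORCH_CUDA_ARCH_LIST="{arch_list(caps)}"')
        else:
            print("No CUDA devices visible")
```

For a deployment target that is not attached to the build machine, pass its capability by hand, for example arch_list([(8, 0), (8, 9)]).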

Method 3: build_ext (in-place compilation only)

Use setup.py build_ext --inplace when you want to compile the C++ extension without a full pip install. This is useful for development iteration — it builds warpconvnet/_C.*.so directly into the source tree.

git clone https://github.com/NVlabs/WarpConvNet.git
cd WarpConvNet
git submodule update --init 3rdparty/cutlass
pip install build ninja  # build dependencies

python setup.py build_ext --inplace

setuptools-scm version detection

WarpConvNet uses setuptools-scm to derive the package version from git tags (e.g., tag v1.5.0 produces version 1.5.0). This can fail in several situations:

Detached HEAD (e.g., during git rebase)
Shallow clones (git clone --depth 1)
No git tags in the clone
Worktrees without tag visibility

When version detection fails, you'll see an error like:

LookupError: setuptools-scm was unable to detect version

Fix: Set the SETUPTOOLS_SCM_PRETEND_VERSION environment variable to bypass git-based version detection:

# build_ext only (development)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace

# Full editable install
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 pip install -e . --no-build-isolation

# Clean rebuild (removes stale build artifacts)
rm -rf build/
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace

The version string is used only for package metadata and does not affect functionality. For local development, use 0.0.0 or any other valid version string.
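
The override can also be applied only when it is needed. The sketch below uses git describe --tags as a rough proxy for whether setuptools-scm can find a version; that proxy is an assumption (setuptools-scm's own detection logic is more involved), and both helper names are hypothetical.

```python
import os
import subprocess


def git_version_detectable(repo="."):
    """Return True if `git describe --tags` succeeds in `repo`.

    This is only a heuristic stand-in for setuptools-scm's detection.
    """
    try:
        result = subprocess.run(
            ["git", "describe", "--tags"],
            cwd=repo,
            capture_output=True,
            text=True,
        )
    except OSError:  # git missing, or `repo` does not exist
        return False
    return result.returncode == 0


def build_env(detectable, base=None, fallback="0.0.0"):
    """Return a copy of the environment, adding the pretend version if needed."""
    env = dict(os.environ if base is None else base)
    if not detectable:
        # Keep any value the user has already exported.
        env.setdefault("SETUPTOOLS_SCM_PRETEND_VERSION", fallback)
    return env


if __name__ == "__main__":
    env = build_env(git_version_detectable())
    print(
        "SETUPTOOLS_SCM_PRETEND_VERSION =",
        env.get("SETUPTOOLS_SCM_PRETEND_VERSION", "(not set)"),
    )
```

The returned dictionary can be passed as the env argument of subprocess.run when launching python setup.py build_ext --inplace from a script.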

Verify the build

After compilation, verify the extension loads correctly:

python -c "import warpconvnet; print('OK')"

If the C++ extension fails to load, you'll see an ImportError with details about the missing symbol or ABI mismatch.
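
For a slightly more informative check, the package and its compiled module can be imported separately so that the failing one is reported with its error message. This is a sketch; the module name warpconvnet._C is inferred from the warpconvnet/_C.*.so file mentioned under Method 3, and try_import is a hypothetical helper.

```python
import importlib


def try_import(name):
    """Import `name` and return (ok, message) instead of raising."""
    try:
        importlib.import_module(name)
    except Exception as exc:  # ImportError, OSError from the loader, etc.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # warpconvnet._C is an assumed module path, based on the .so file name.
    for module in ("warpconvnet", "warpconvnet._C"):
        ok, message = try_import(module)
        print(f"{module}: {message}")
```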

Troubleshooting

`ninja: build stopped: subcommand failed`

The actual error is usually above the ninja message. Scroll up to find the nvcc or g++ error. Common causes:

CUDA version mismatch: the release reported by nvcc --version must match the CUDA version your PyTorch build was compiled against. Check the latter with python -c "import torch; print(torch.version.cuda)".
Missing CUTLASS submodule: Run git submodule update --init 3rdparty/cutlass.
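
The CUDA version comparison can be scripted. This sketch assumes nvcc prints a line containing "release <major>.<minor>", which is the usual format of nvcc --version output; parse_nvcc_release and versions_match are hypothetical helpers.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract "major.minor" from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def versions_match(nvcc_release, torch_cuda):
    """Compare two version strings on major.minor only."""
    if nvcc_release is None or torch_cuda is None:
        return False
    return nvcc_release.split(".")[:2] == torch_cuda.split(".")[:2]


if __name__ == "__main__":
    nvcc_release = None
    if shutil.which("nvcc"):
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        nvcc_release = parse_nvcc_release(result.stdout)
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    print("nvcc:", nvcc_release, "| torch:", torch_cuda)
    print("match" if versions_match(nvcc_release, torch_cuda) else "MISMATCH")
```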

`undefined symbol` at import time

This usually means the compiled extension was built against a different PyTorch version than the one currently installed. Remove the stale artifacts and rebuild:

rm -rf build/
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 python setup.py build_ext --inplace

Slow compilation

Compilation builds 30+ CUDA source files. To speed it up:

Install ninja (pip install ninja) — enables parallel compilation.
Limit target architectures to your own GPU: export TORCH_CUDA_ARCH_LIST="8.9" (replace 8.9 with your GPU's compute capability).
Use ccache if available.

Multiple Python versions

Each Python version needs its own compiled .so file (e.g., _C.cpython-311-x86_64-linux-gnu.so). If you switch Python versions, rebuild the extension.
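
To see whether the source tree already contains a build for the interpreter you are running, the file names can be compared against the interpreter's extension suffix. This is a sketch: extension_status is a hypothetical helper, the _C stem comes from the file name above, and the *.so glob assumes a Linux or macOS build.

```python
import sysconfig
from pathlib import Path


def extension_status(package_dir, stem="_C"):
    """Map each built `<stem>.*.so` file to whether it fits this interpreter."""
    directory = Path(package_dir)
    if not directory.is_dir():
        return {}
    # e.g. ".cpython-311-x86_64-linux-gnu.so" on CPython 3.11 / Linux
    expected = stem + sysconfig.get_config_var("EXT_SUFFIX")
    return {
        path.name: path.name == expected
        for path in sorted(directory.glob(f"{stem}.*.so"))
    }


if __name__ == "__main__":
    status = extension_status("warpconvnet")
    if not any(status.values()):
        print("No extension built for this Python version; rebuild needed")
    for name, current in status.items():
        print(name, "(current interpreter)" if current else "(other interpreter)")
```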