Building from Source#
This appendix walks through setting up a development environment for cuda-oxide from a fresh checkout. If you just want to run an example, the Writing Your First Kernel chapter is faster.
Requirements#
Dependency |
Version |
Purpose |
|---|---|---|
Rust nightly |
|
Compiler toolchain with |
CUDA Toolkit |
12.x+ |
Driver API, |
Clang |
21+ ( |
|
Linux |
Tested on Ubuntu 24.04 |
Windows and macOS are not supported |
GPU |
sm_80, sm_90, sm_100a |
Hardware target |
Clone the repository#
git clone https://github.com/NVlabs/cuda-oxide.git
cd cuda-oxide
Install the Rust toolchain#
The repo ships a rust-toolchain.toml that pins the exact nightly version and
components. Rustup picks it up automatically:
# rust-toolchain.toml (already in the repo root)
[toolchain]
channel = "nightly-2026-04-03"
components = ["rust-src", "rustc-dev", "rust-analyzer", "clippy", "llvm-tools"]
If you need to install manually:
rustup toolchain install nightly-2026-04-03
rustup component add rust-src rustc-dev --toolchain nightly-2026-04-03
rust-src provides the standard library source for cross-compilation and
rustc-dev exposes compiler internals that the codegen backend links against.
Install CUDA#
Make sure the CUDA toolkit is on your PATH:
export PATH="/usr/local/cuda/bin:$PATH"
nvcc --version # should print 12.x or later
If you are building on a system without a GPU (e.g. CI), the toolkit is still
required for ptxas and header files, but you will not be able to run kernels.
Install LLVM (usually optional)#
The codegen pipeline emits LLVM IR and invokes llc to produce PTX. The
pinned Rust toolchain (nightly-2026-04-03) already ships LLVM 22 with the
NVPTX backend enabled via the llvm-tools component, so the recommended
path is:
rustup component add llvm-tools
The component is already listed in rust-toolchain.toml, so on a fresh
clone rustup installs it automatically; running the command above is the
one-shot fix for older clones. The pipeline auto-detects this llc at
<sysroot>/lib/rustlib/<host>/bin/llc.
If you would rather use a system LLVM (for a specific patch level, or
because you already have one installed), the pipeline falls back to
llc-22 / llc-21 on PATH. LLVM 21 is the minimum — earlier releases
reject the TMA / tcgen05 / WGMMA intrinsic signatures that cuda-oxide
emits.
# Ubuntu / Debian
sudo apt install llvm-21
If your distro packages do not provide llvm-21, use LLVM’s apt helper:
sudo apt-get install -y lsb-release wget software-properties-common gnupg
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
sudo ./llvm.sh 21
# Verify NVPTX support
llc-21 --version | grep nvptx
You should see nvptx64 - NVIDIA PTX 64-bit in the target list.
To pin a specific binary (rustup’s, a distro’s, or a custom build), set
CUDA_OXIDE_LLC=/path/to/llc. The pipeline’s full resolution order is:
$CUDA_OXIDE_LLC(if set)The Rust toolchain’s
llvm-toolsllcllc-22, thenllc-21, then barellconPATH
Note
Older llc binaries (LLVM 20 and earlier) will compile simpler kernels when
pointed at via CUDA_OXIDE_LLC=/path/to/llc-20, but any example that uses
modern TMA / tcgen05 / WGMMA intrinsics (tma_copy, gemm_sol,
tcgen05_matmul, wgmma, …) will fail with
Intrinsic has incorrect argument type! until you upgrade to LLVM 21+.
Install Clang (for host cuda-bindings)#
The host cuda-bindings crate runs bindgen, which loads libclang and needs
clang’s own resource-dir stddef.h — a bare libclang1-* runtime is not
enough.
sudo apt install clang-21 # or libclang-common-21-dev
cargo oxide doctor verifies this up front.
Build the workspace#
The main workspace contains the user-facing crates (cuda-device, cuda-core,
cuda-async, etc.) and the build tooling (cargo-oxide):
cargo build
Note
The codegen backend (crates/rustc-codegen-cuda/) is intentionally not a
workspace member because it requires special nightly features and a different
build process. cargo-oxide handles building it transparently.
Install cargo-oxide#
cargo-oxide is the cargo subcommand that drives the full compilation
pipeline. Inside the repo, it works via a workspace alias. For standalone use:
cargo install --git https://github.com/NVlabs/cuda-oxide.git cargo-oxide
On first run, cargo-oxide automatically fetches and builds the codegen backend
dylib.
Verify the installation#
# Check all prerequisites
cargo oxide doctor
# Compile and run the canonical first example
cargo oxide run vecadd
cargo oxide doctor validates your Rust toolchain, CUDA toolkit (including
libNVVM / nvJitLink / libdevice for kernels that use math intrinsics), LLVM
installation, and codegen backend. If everything is configured correctly,
cargo oxide run vecadd compiles a Rust kernel to PTX, launches it on the GPU,
and prints a success message.
Common commands#
# Build and run an example
cargo oxide run <example>
# Show the full compilation pipeline (MIR → LLVM IR → PTX)
cargo oxide pipeline <example>
# Debug with cuda-gdb
cargo oxide debug <example> --tui
# Build NVVM IR for libNVVM/nvJitLink interop
cargo oxide build <example> --emit-nvvm-ir --arch sm_120
Building the book#
The documentation lives in cuda-oxide-book/ and uses Sphinx with MyST
Markdown. To build and serve locally:
cd cuda-oxide-book
make setup # creates venv, installs dependencies
source .venv/bin/activate
make livehtml # starts dev server on http://localhost:8000
Generating API documentation#
Standard cargo doc works for the workspace crates:
cargo doc --no-deps --open
This generates rustdoc for cuda-device, cuda-core, cuda-async, and all
other workspace members. The codegen backend is excluded since it is not a
workspace member.
Workspace structure#
cuda-oxide/
├── Cargo.toml # Workspace root
├── rust-toolchain.toml # Pinned nightly + components
├── crates/
│ ├── cuda-device/ # Device intrinsics (#![no_std])
│ ├── cuda-host/ # Host launch APIs
│ ├── cuda-macros/ # Proc macros (#[kernel], #[device], gpu_printf!)
│ ├── cuda-bindings/ # Raw bindgen FFI to cuda.h
│ ├── cuda-core/ # Safe RAII wrappers (CudaContext, DeviceBuffer)
│ ├── cuda-async/ # Async execution (DeviceOperation, DeviceFuture)
│ ├── cargo-oxide/ # Cargo subcommand
│ ├── rustc-codegen-cuda/ # Codegen backend (not a workspace member)
│ ├── mir-importer/ # MIR → Pliron IR translation
│ ├── mir-lower/ # `dialect-mir` → `dialect-llvm` lowering
│ ├── dialect-mir/ # pliron dialect modelling Rust MIR
│ ├── dialect-llvm/ # pliron dialect modelling LLVM IR (+ export)
│ ├── dialect-nvvm/ # NVVM intrinsics dialect
│ ├── libnvvm-sys/ # dlopen bindings to libNVVM
│ ├── nvjitlink-sys/ # dlopen bindings to nvJitLink
│ ├── reserved-oxide-symbols/ # Shared naming contract
│ └── fuzzer/ # Differential testing support
└── cuda-oxide-book/ # This book (Sphinx + MyST)
Troubleshooting#
llcnot found or missing NVPTXThe fastest fix is
rustup component add llvm-tools— the pinned toolchain’sllcis LLVM 22 with NVPTX enabled and is auto-picked up. Otherwise install a system LLVM 21+ (sudo apt install llvm-21); the pipeline probes the rustupllcfirst, thenllc-22→llc-21onPATH. To pin a specific binary setCUDA_OXIDE_LLC=/path/to/llc.Intrinsic has incorrect argument type!(fromllc)Your
llcis older than LLVM 21 and cannot lower the modern TMA / tcgen05 / WGMMA intrinsic signatures. Installllvm-21and re-run.error[E0463]: can't find crate for rustc_middleYou are missing the
rustc-devcomponent. Run:rustup component add rustc-dev --toolchain nightly-2026-04-03.- CUDA driver version mismatch
The toolkit version (compile-time) and driver version (runtime) must be compatible. Run
nvidia-smito check the driver version, andnvcc --versionfor the toolkit.cargo oxide doctorfails on codegen backendThe backend is built on first use. If the build fails, check that
rust-srcis installed and that the nightly version matchesrust-toolchain.toml.