The cuda-oxide Book
cuda-oxide is an experimental Rust-to-CUDA compiler that lets you write (SIMT) GPU kernels in safe(ish), idiomatic Rust. It compiles standard Rust code directly to PTX: no DSLs, no foreign language bindings, just Rust.
Note
This book assumes familiarity with the Rust programming language, including ownership, traits, and generics. Later chapters on async GPU programming also assume working knowledge of async/.await and runtimes like tokio.
For a refresher, see The Rust Programming Language, Rust by Example, or the Async Book.
Project Status
The v0.1.0 release is an early-stage alpha: expect bugs, incomplete features, and API breakage as we work to improve it. We hope you'll try it and help shape its direction by sharing feedback on your experience.
Quick start
use cuda_device::{kernel, thread, DisjointSlice};
use cuda_core::{CudaContext, DeviceBuffer, LaunchConfig};
use cuda_host::{cuda_launch, load_kernel_module};

#[kernel]
fn vecadd(a: &[f32], b: &[f32], mut c: DisjointSlice<f32>) {
    let idx = thread::index_1d();
    if let Some(c_elem) = c.get_mut(idx) {
        *c_elem = a[idx.get()] + b[idx.get()];
    }
}

fn main() {
    let ctx = CudaContext::new(0).unwrap();
    let stream = ctx.default_stream();
    let module = load_kernel_module(&ctx, "vecadd").unwrap();

    let a = DeviceBuffer::from_host(&stream, &[1.0f32; 1024]).unwrap();
    let b = DeviceBuffer::from_host(&stream, &[2.0f32; 1024]).unwrap();
    let mut c = DeviceBuffer::<f32>::zeroed(&stream, 1024).unwrap();

    cuda_launch! {
        kernel: vecadd, stream: stream, module: module,
        config: LaunchConfig::for_num_elems(1024),
        args: [slice(a), slice(b), slice_mut(c)]
    }.unwrap();

    let result = c.to_host_vec(&stream).unwrap();
    assert_eq!(result[0], 3.0);
}
After installing the prerequisites, build and run the example with cargo oxide run vecadd.
Note
The module name passed to load_kernel_module is the kernel artifact basename;
for workspace examples that is the example name.
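To make the guard in the vecadd kernel easier to picture, here is a minimal CPU-only sketch in plain Rust. It does not use cuda-oxide. It models a launch that rounds the thread count up to a whole number of blocks, so that some threads fall past the end of the output and must do nothing. The names BLOCK_SIZE, blocks_for, vecadd_thread, and vecadd_reference are hypothetical. The block size of 256 is an assumption, not the value LaunchConfig::for_num_elems actually picks. Plain slices and usize indices stand in for DisjointSlice and the thread index type.

```rust
/// Hypothetical block size. The value that `LaunchConfig::for_num_elems`
/// really chooses is not stated here; 256 is only an assumption.
const BLOCK_SIZE: usize = 256;

/// Number of blocks needed so that `blocks * BLOCK_SIZE >= n`
/// (ceiling division).
fn blocks_for(n: usize) -> usize {
    (n + BLOCK_SIZE - 1) / BLOCK_SIZE
}

/// CPU model of the kernel body for one logical thread.
/// A thread whose index lies past the end of `c` does nothing,
/// mirroring the `if let Some(..) = c.get_mut(idx)` guard in the kernel.
fn vecadd_thread(idx: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    if let Some(c_elem) = c.get_mut(idx) {
        *c_elem = a[idx] + b[idx];
    }
}

/// Runs every logical thread of the modelled launch, one after another.
fn vecadd_reference(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let n = a.len();
    let mut c = vec![0.0f32; n];
    // The modelled launch may contain more threads than elements.
    for idx in 0..blocks_for(n) * BLOCK_SIZE {
        vecadd_thread(idx, a, b, &mut c);
    }
    c
}

fn main() {
    // 1000 is deliberately not a multiple of BLOCK_SIZE,
    // so the last block contains idle threads.
    let a = [1.0f32; 1000];
    let b = [2.0f32; 1000];
    let c = vecadd_reference(&a, &b);
    assert_eq!(c.len(), 1000);
    assert!(c.iter().all(|&x| x == 3.0));
    println!("blocks = {}, c[0] = {}", blocks_for(1000), c[0]);
}
```

A host-side reference like this can also serve as a check on the GPU output: compare the vector returned by to_host_vec against vecadd_reference for the same inputs.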
Why cuda-oxide?
Write GPU kernels with Rust's type system and ownership model. Safety is a first-class goal, but GPUs have subtleties; read about the safety model.
Not a DSL. A custom rustc codegen backend that compiles pure Rust to PTX.
Compose GPU work as lazy DeviceOperation graphs. Schedule across stream pools. Await results with .await.