Fermat
Host & Device
Users of CUGAR need to be aware that on a GPU-equipped system there are both a host, controlled by a CPU, and one or more GPU devices, each with its own distinct memory space. Hence, there can be several types of functions and data-structures:
  • single-threaded functions that can be called by a host thread
  • single-threaded functions that can be called by a device thread
  • single-threaded functions that can be called both on the host and the device
  • parallel functions that can be called by a host thread, and spawn one or more sets of host threads
  • parallel functions that can be called by a host thread, but spawn one or more sets of device threads
  • data-structures that encapsulate host data and are meant to be used on the host (e.g. a resizable host vector, cugar::vector<host_tag,T>)
  • data-structures that encapsulate device data but are meant to be used on the host (e.g. a resizable device vector, cugar::vector<device_tag,T>)
  • data-structures that encapsulate device data and are meant to be used on the device
Unified Virtual Memory makes it possible to use any data-structure anywhere, but for performance-oriented applications it can be beneficial to retain explicit control over placement in the memory hierarchy, as in the sketch below.
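To make the distinction concrete, here is a minimal sketch of some of these categories: a single-threaded function callable on both the host and the device, plus the two vector flavors. The header path and the cross-space copy semantics are assumptions made for illustration, on the premise that cugar::vector mirrors Thrust's host/device vector behavior:

#include <cugar/basic/vector.h>   // assumed header path

// a single-threaded function callable both on the host and the device
__host__ __device__ inline uint32 square(const uint32 x) { return x * x; }

void host_side_setup()
{
    // host data, meant to be used on the host
    cugar::vector<host_tag,uint32>   h_vec( 1024, 1u );

    // device data, but the container itself is manipulated on the host;
    // constructing it from a host vector performs a host-to-device copy
    // (assuming Thrust-like copy semantics)
    cugar::vector<device_tag,uint32> d_vec = h_vec;
}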

Plain Views

The fact that some data structures contain device data but can only be used from the host, coupled with the fact that CUDA currently does not allow passing references as device kernel arguments and requires kernel parameters to be PODs, leads naturally to the definition of plain views: in CUGAR's parlance, the plain view of an object is essentially a shallow reference to the object's data, encapsulated in a POD data structure that can be passed as a kernel parameter.
CUGAR defines the generic function plain_view() to obtain the plain view of a given object. Analogously, it defines the meta-function plain_view_subtype<T>::type to get the type of the plain view of any given type T (where defined). Moreover, by convention CUGAR's data structures T define the subtypes T::plain_view_type and T::const_plain_view_type to identify their mutable and constant plain view types.
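These traits make it possible to write generic host code that is agnostic of the concrete container. The helper below is a hypothetical illustration built only on the names introduced above, not part of the library:

// a hypothetical generic helper: obtain the plain view of any type T for
// which plain_view() and plain_view_subtype<T> are defined
template <typename T>
typename cugar::plain_view_subtype<T>::type view_of(T& container)
{
    return cugar::plain_view( container );
}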
As an example, consider the following situation: on the host, you have created a large device vector that you want a device kernel to fill. Ideally, you'd simply pass a reference to the vector to your kernel, as in:
__global__ void my_kernel(                                    // the CUDA kernel
    cugar::vector<device_tag,uint32>& vec)                    // ideally, receive a reference: doesn't work without UVM!
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1000000 );          // the device vector to be filled (size chosen arbitrarily)

    const uint32 blockdim = 128;
    const uint32 n_blocks = util::divide_ri( vec.size(), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( vec );
}
With UVM-capable GPUs this is technically possible, though it requires page migration. With CUGAR, you can do this instead:
__global__ void my_kernel(                                    // the CUDA kernel
    cugar::vector_view<uint32> vec)                           // CUGAR's surrogate of a reference
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x; // compute a linear thread id
    if (tid < vec.size())
        vec[tid] = tid * 10;
}

int main()
{
    cugar::vector<device_tag,uint32> vec( 1000000 );          // the device vector to be filled (size chosen arbitrarily)

    const uint32 blockdim = 128;
    const uint32 n_blocks = util::divide_ri( vec.size(), blockdim );

    my_kernel<<<n_blocks,blockdim>>>( cugar::plain_view( vec ) );
}
This basic pattern can be applied to all of CUGAR's data structures that are meant to be set up from the host and accessed from the device.
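For instance, a kernel that reads one container and writes another can rely solely on the T::plain_view_type and T::const_plain_view_type convention described above; the following is a sketch of that pattern, in which the kernel name and the scaling logic are invented for illustration:

typedef cugar::vector<device_tag,uint32> device_vector_type;

__global__ void scale_kernel(                                 // a hypothetical kernel
    device_vector_type::const_plain_view_type in,             // read-only plain view
    device_vector_type::plain_view_type       out)            // writable plain view
{
    const uint32 tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < in.size())
        out[tid] = in[tid] * 2;
}

On the host, the two views would be obtained by calling cugar::plain_view() on a const and a non-const reference to the respective vectors.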

Next: Hello CUGAR!