A simple caching allocator for device memory allocations.
- Overview
- The allocator is thread-safe and stream-safe and is capable of managing cached device allocations on multiple devices. It behaves as follows:
- Allocations from the allocator are associated with an active_stream. Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
- Allocations are categorized and cached by bin size. A new allocation request of a given size will only consider cached allocations within the corresponding bin.
- Bin limits progress geometrically in accordance with the growth factor bin_growth provided during construction. Unused device allocations within a larger bin cache are not reused for allocation requests that categorize to smaller bin sizes.
- Allocation requests below (bin_growth ^ min_bin) are rounded up to (bin_growth ^ min_bin).
- Allocations above (bin_growth ^ max_bin) are not rounded up to the nearest bin and are simply freed when they are deallocated instead of being returned to a bin-cache.
- If the total storage of cached allocations on a given device will exceed max_cached_bytes, allocations for that device are simply freed when they are deallocated instead of being returned to their bin-cache.
- For example, the default-constructed CachingDeviceAllocator is configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = 6MB - 1B
- which delineates five bin sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
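The sizing rules above can be sketched in host-only C++. This is a minimal model of the documented behavior, not CUB's implementation: `BinInfo` and `ClassifyRequest` are hypothetical names introduced for illustration, and the default arguments simply mirror the default configuration listed above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type (not part of CUB).
struct BinInfo {
    std::size_t bin_bytes;  // bytes that would actually be allocated
    bool        cacheable;  // false when the request exceeds the largest bin
};

// Hypothetical helper (not part of CUB): applies the documented rounding rules.
inline BinInfo ClassifyRequest(std::size_t bytes,
                               unsigned int bin_growth = 8,
                               unsigned int min_bin = 3,
                               unsigned int max_bin = 7)
{
    assert(bin_growth > 1 && min_bin <= max_bin);

    // Start from the smallest bin: bin_growth ^ min_bin.
    unsigned int bin = 0;
    std::size_t bin_bytes = 1;
    while (bin < min_bin) { bin_bytes *= bin_growth; ++bin; }

    // Grow geometrically until the request fits or max_bin is reached.
    while (bin_bytes < bytes && bin < max_bin) { bin_bytes *= bin_growth; ++bin; }

    if (bin_bytes < bytes) {
        // Above bin_growth ^ max_bin: not rounded up and never cached.
        return BinInfo{bytes, false};
    }
    return BinInfo{bin_bytes, true};
}
```

Under these assumptions, a 1-byte request lands in the 512B bin, a 513-byte request lands in the 4KB bin, and a request of 2MB + 1B is left at its exact size and marked as not cacheable.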
- Examples:
- example_device_partition_flagged.cu, example_device_partition_if.cu, example_device_radix_sort.cu, example_device_reduce.cu, example_device_scan.cu, example_device_select_flagged.cu, example_device_select_if.cu, and example_device_select_unique.cu.
| CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin=1, unsigned int max_bin=INVALID_BIN, size_t max_cached_bytes=INVALID_SIZE, bool skip_cleanup=false, bool debug=false) | Constructor. |
| CachingDeviceAllocator (bool skip_cleanup=false, bool debug=false) | Default constructor. |
cudaError_t | SetMaxCachedBytes (size_t max_cached_bytes_) | Sets the limit on the number of bytes this allocator is allowed to cache per device. |
cudaError_t | DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream=0) | Provides a suitable allocation of device memory for the given size on the specified device. |
cudaError_t | DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream=0) | Provides a suitable allocation of device memory for the given size on the current device. |
cudaError_t | DeviceFree (int device, void *d_ptr) | Frees a live allocation of device memory on the specified device, returning it to the allocator. |
cudaError_t | DeviceFree (void *d_ptr) | Frees a live allocation of device memory on the current device, returning it to the allocator. |
cudaError_t | FreeAllCached () | Frees all cached device allocations on all devices. |
virtual | ~CachingDeviceAllocator () | Destructor. |
cub::CachingDeviceAllocator::CachingDeviceAllocator (unsigned int bin_growth, unsigned int min_bin = 1, unsigned int max_bin = INVALID_BIN, size_t max_cached_bytes = INVALID_SIZE, bool skip_cleanup = false, bool debug = false)
inline
Constructor.
- Parameters
bin_growth | Geometric growth factor for bin-sizes |
min_bin | Minimum bin (default is bin_growth ^ 1) |
max_bin | Maximum bin (default is no max bin) |
max_cached_bytes | Maximum aggregate cached bytes per device (default is no limit) |
skip_cleanup | Whether or not to skip a call to FreeAllCached() when the destructor is called (default is to deallocate) |
debug | Whether or not to print (de)allocation events to stdout (default is no output) |
cub::CachingDeviceAllocator::CachingDeviceAllocator (bool skip_cleanup = false, bool debug = false)
inline
Default constructor.
Configured with:
bin_growth = 8
min_bin = 3
max_bin = 7
max_cached_bytes = ((bin_growth ^ max_bin) * 3) - 1 = 6,291,455 bytes
which delineates five bin sizes (512B, 4KB, 32KB, 256KB, and 2MB) and sets a maximum of 6,291,455 cached bytes per device.
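The arithmetic behind these defaults can be checked at compile time. Note that `^` in the text above denotes exponentiation, whereas in C++ it is bitwise XOR, so the sketch below uses a hypothetical `IntPow` helper. The `k`-prefixed constants are illustrative names, not CUB identifiers.

```cpp
#include <cstddef>

// Hypothetical helper (not part of CUB): integer exponentiation.
constexpr std::size_t IntPow(std::size_t base, unsigned int exp)
{
    return exp == 0 ? 1 : base * IntPow(base, exp - 1);
}

// Illustrative constants mirroring the default configuration.
constexpr std::size_t  kBinGrowth = 8;
constexpr unsigned int kMinBin    = 3;
constexpr unsigned int kMaxBin    = 7;
constexpr std::size_t  kMaxCachedBytes = IntPow(kBinGrowth, kMaxBin) * 3 - 1;

static_assert(IntPow(kBinGrowth, kMinBin) == 512,     "smallest bin is 512B");
static_assert(IntPow(kBinGrowth, kMaxBin) == 2097152, "largest bin is 2MB");
static_assert(kMaxCachedBytes == 6291455,             "limit is 6MB - 1B");
```

Because the limit is one byte short of three 2MB blocks, at most two blocks from the largest bin fit in a device's cache at once under this configuration.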
virtual cub::CachingDeviceAllocator::~CachingDeviceAllocator ()
inline virtual
cudaError_t cub::CachingDeviceAllocator::SetMaxCachedBytes (size_t max_cached_bytes_)
inline
Sets the limit on the number of bytes this allocator is allowed to cache per device.
Changing the ceiling of cached bytes does not cause any allocations (in-use or cached-in-reserve) to be freed. To release cached allocations, see FreeAllCached().
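The interaction between the cache ceiling and deallocation can be illustrated with a small host-only model. This is a sketch under stated assumptions: `CacheModel` and its members are hypothetical, track a single device, and only mimic the documented contract. They do not reflect how CUB stores its bookkeeping.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for ONE device; names are illustrative only.
struct CacheModel {
    std::size_t max_cached_bytes;  // ceiling set at construction or later
    std::size_t cached_bytes = 0;  // bytes currently held in bin caches

    explicit CacheModel(std::size_t limit) : max_cached_bytes(limit) {}

    // Mirrors the documented contract: only the ceiling changes,
    // nothing already cached is released.
    void SetMaxCachedBytes(std::size_t limit) { max_cached_bytes = limit; }

    // Returns true if a freed block of `bin_bytes` would be kept in the
    // cache, false if it would be released to the device instead.
    bool OnFree(std::size_t bin_bytes) {
        assert(bin_bytes > 0);
        // Overflow-safe form of: cached_bytes + bin_bytes > max_cached_bytes.
        if (cached_bytes > max_cached_bytes ||
            bin_bytes > max_cached_bytes - cached_bytes) {
            return false;
        }
        cached_bytes += bin_bytes;
        return true;
    }
};
```

With a ceiling of 6,291,455 bytes, this model keeps the first two freed 2MB blocks and releases the third. Lowering the ceiling afterwards leaves `cached_bytes` unchanged, which matches the statement that changing the limit frees nothing by itself.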
cudaError_t cub::CachingDeviceAllocator::DeviceAllocate (int device, void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)
inline
Provides a suitable allocation of device memory for the given size on the specified device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
- Parameters
[in] | device | Device on which to place the allocation |
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
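The stream-reuse rule described above can be expressed as a small predicate. The sketch below is host-only and deliberately simplified: `StreamId` stands in for `cudaStream_t`, a boolean flag stands in for "all prior work submitted to active_stream has completed", and `CachedBlock`, `CanReuse`, and `CheckReuseRule` are hypothetical names. How the real allocator detects completion is not stated here and is not modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for cudaStream_t.
using StreamId = int;

// Hypothetical record of a freed, cached allocation.
struct CachedBlock {
    StreamId associated_stream;  // active_stream passed at allocation time
    bool     prior_work_done;    // has work submitted to that stream finished?
};

inline bool CanReuse(const CachedBlock& block, StreamId requesting_stream)
{
    // Same stream: available immediately after the free.
    if (block.associated_stream == requesting_stream) return true;
    // Different stream: available only once the earlier work has completed.
    return block.prior_work_done;
}

// Small self-check of the rule; call from any test or main().
inline void CheckReuseRule()
{
    CachedBlock busy{1, false};
    assert(CanReuse(busy, 1));   // same stream
    assert(!CanReuse(busy, 2));  // other stream, work still pending
    CachedBlock idle{1, true};
    assert(CanReuse(idle, 2));   // other stream, work completed
}
```

The same-stream case needs no waiting because operations within one stream execute in submission order, so any later use of the block is ordered after the earlier work.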
cudaError_t cub::CachingDeviceAllocator::DeviceAllocate (void **d_ptr, size_t bytes, cudaStream_t active_stream = 0)
inline
Provides a suitable allocation of device memory for the given size on the current device.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
- Parameters
[out] | d_ptr | Reference to pointer to the allocation |
[in] | bytes | Minimum number of bytes for the allocation |
[in] | active_stream | The stream to be associated with this allocation |
cudaError_t cub::CachingDeviceAllocator::DeviceFree (int device, void *d_ptr)
inline
Frees a live allocation of device memory on the specified device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
cudaError_t cub::CachingDeviceAllocator::DeviceFree (void *d_ptr)
inline
Frees a live allocation of device memory on the current device, returning it to the allocator.
Once freed, the allocation becomes available immediately for reuse within the active_stream with which it was associated during allocation, and it becomes available for reuse within other streams when all prior work submitted to active_stream has completed.
cudaError_t cub::CachingDeviceAllocator::FreeAllCached ()
inline
Frees all cached device allocations on all devices.
const unsigned int cub::CachingDeviceAllocator::INVALID_BIN = (unsigned int) -1
static
const size_t cub::CachingDeviceAllocator::INVALID_SIZE = (size_t) -1
static
The documentation for this struct was generated from the following file: