GPU Allocator

AllocatorFlag

tensorrt.AllocatorFlag

Members:

RESIZABLE : TensorRT may call reallocate() on this allocation.
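As an illustration of how an allocator might test this flag, the sketch below assumes that the flags argument is a bitmask in which a member with integer value n selects bit 1 << n. The text above does not state this encoding, so treat it as an assumption. The AllocatorFlag enum in the snippet is a hypothetical stand-in for tensorrt.AllocatorFlag, with RESIZABLE = 0 chosen only for the example, so that the code runs without TensorRT installed.

```python
from enum import IntEnum


class AllocatorFlag(IntEnum):
    """Hypothetical stand-in for tensorrt.AllocatorFlag (value is assumed)."""
    RESIZABLE = 0


def is_resizable(flags: int) -> bool:
    """Return True if the RESIZABLE bit is set in an allocation's flags.

    Assumption: flags is a bitmask and each member selects bit 1 << value.
    """
    return bool(flags & (1 << AllocatorFlag.RESIZABLE))


print(is_resizable(0))                             # no flags set
print(is_resizable(1 << AllocatorFlag.RESIZABLE))  # RESIZABLE requested
```

An allocator could use such a check to decide whether to reserve extra capacity for an allocation that may later be resized.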

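class tensorrt.IGpuAllocator(self: tensorrt.tensorrt.IGpuAllocator)

Application-implemented class for controlling allocation on the GPU.

To implement a custom allocator, make sure to explicitly call the base-class constructor in __init__():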

import tensorrt as trt

class MyAllocator(trt.IGpuAllocator):
    def __init__(self):
        # The base class must be initialized explicitly.
        trt.IGpuAllocator.__init__(self)

    ...

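Note that all of the methods below (allocate, reallocate, deallocate, allocate_async, deallocate_async) must be overridden in the custom allocator; otherwise pybind11 will not be able to call them from the custom allocator.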
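The overall shape of such a subclass might look like the sketch below. It is not a working GPU allocator: BookkeepingAllocator is a hypothetical name, the addresses it hands out are fake integers rather than device pointers, and the fallback base class exists only so the snippet runs when TensorRT is not installed. A real implementation would obtain memory from a CUDA binding and return whatever pointer representation the binding expects for the capsule type.

```python
try:
    import tensorrt as trt
    _Base = trt.IGpuAllocator
except ImportError:
    # Fallback so the sketch runs without TensorRT installed.
    class _Base:
        def __init__(self):
            pass


class BookkeepingAllocator(_Base):
    """Hypothetical allocator that hands out fake addresses, not GPU memory."""

    def __init__(self):
        _Base.__init__(self)  # explicit base-class initialization
        self._next = 0x1000   # next fake address to hand out
        self._live = {}       # address -> size of live allocations

    def allocate(self, size, alignment, flags):
        if size == 0:
            return None       # zero-size requests return None
        align = alignment or 1  # alignment 0 means "any alignment"
        addr = (self._next + align - 1) // align * align
        self._next = addr + size
        self._live[addr] = size
        return addr

    def allocate_async(self, size, alignment, flags, stream):
        # Wrapper around the synchronous method; the stream is unused.
        return self.allocate(size, alignment, flags)

    def deallocate(self, memory):
        if not memory:
            # Assumption: a None/0 handle needs no work and counts as success.
            return True
        return self._live.pop(memory, None) is not None

    def deallocate_async(self, memory, stream):
        # Wrapper around the synchronous method; the stream is unused.
        return self.deallocate(memory)

    def reallocate(self, address, alignment, new_size):
        return None           # resizing is not supported in this sketch
```

All five methods are defined, including reallocate, even though this sketch never resizes anything.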

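allocate(self: tensorrt.tensorrt.IGpuAllocator, size: int, alignment: int, flags: int) → capsule

[DEPRECATED] Deprecated in TensorRT 10.0. Use allocate_async instead. A callback implemented by the application to handle acquisition of GPU memory. If an allocation request of size 0 is made, None should be returned.

If an allocation request cannot be satisfied, None should be returned.

Parameters:

size – The size of the memory required.
alignment – The required alignment of the memory. The alignment will be zero or a power of two not exceeding the alignment guaranteed by cudaMalloc, so this allocator can be safely implemented with cudaMalloc/cudaFree. An alignment value of zero indicates that any alignment is acceptable.
flags – Allocation flags. See AllocatorFlag.

Returns:

The address of the allocated memory.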
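The alignment rule matters mostly for allocators that carve sub-allocations out of a larger pool instead of calling cudaMalloc per request. A minimal sketch of the arithmetic follows; the helper names is_valid_alignment and align_up are hypothetical and not part of TensorRT.

```python
def is_valid_alignment(alignment: int) -> bool:
    """Return True if alignment is zero or a positive power of two."""
    return alignment == 0 or (alignment > 0 and (alignment & (alignment - 1)) == 0)


def align_up(address: int, alignment: int) -> int:
    """Round address up to a multiple of alignment (0 means no constraint)."""
    if not is_valid_alignment(alignment):
        raise ValueError(f"unsupported alignment: {alignment}")
    if alignment == 0:
        return address
    # For a power of two, masking off the low bits rounds down to a multiple.
    return (address + alignment - 1) & ~(alignment - 1)


print(align_up(1000, 256))  # 1024
print(align_up(1000, 0))    # 1000
```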

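allocate_async(self: tensorrt.tensorrt.IGpuAllocator, size: int, alignment: int, flags: int, stream: int) → capsule

A callback implemented by the application to handle acquisition of GPU memory asynchronously. This is just a wrapper around the synchronous method allocate. For asynchronous allocation, use the corresponding IGpuAsyncAllocator class. If an allocation request of size 0 is made, None should be returned.

If an allocation request cannot be satisfied, None should be returned.

Parameters:

size – The size of the memory required.
alignment – The required alignment of the memory. The alignment will be zero or a power of two not exceeding the alignment guaranteed by cudaMalloc, so this allocator can be safely implemented with cudaMalloc/cudaFree. An alignment value of zero indicates that any alignment is acceptable.
flags – Allocation flags. See AllocatorFlag.
stream – The CUDA stream.

Returns:

The address of the allocated memory.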

deallocate(self: tensorrt.tensorrt.IGpuAllocator, memory: capsule) → bool

[DEPRECATED] Deprecated in TensorRT 10.0. Use deallocate_async instead. A callback implemented by the application to handle release of GPU memory.

TensorRT may pass 0 to this function if allocate() previously returned it.

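deallocate_async(self: tensorrt.tensorrt.IGpuAllocator, memory: capsule, stream: int) → bool

A callback implemented by the application to handle release of GPU memory asynchronously. This is just a wrapper around the synchronous method deallocate. For asynchronous deallocation, use the corresponding IGpuAsyncAllocator class.

TensorRT may pass 0 to this function if allocate() previously returned it.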

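Parameters:

memory – The address of the memory to release.
stream – The CUDA stream.

Returns:

True if the acquired memory is released successfully.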

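reallocate(self: tensorrt.tensorrt.IGpuAllocator, address: capsule, alignment: int, new_size: int) → capsule

A callback implemented by the application to resize an existing allocation.

Only allocations that were allocated with AllocatorFlag.RESIZABLE will be resized.

The options are one of:
- Resize in place, leaving min(old_size, new_size) bytes unchanged, and return the original address.
- Move min(old_size, new_size) bytes to a new location of sufficient size and return its address.
- Return None to indicate that the request could not be fulfilled.

If None is returned, TensorRT will assume that reallocate() is not implemented and that the allocation at address is still valid.

This method makes it possible to delegate the resize strategy to the application, which creates opportunities for better memory management. One possible implementation is to allocate a large virtual device buffer and progressively commit physical memory with cuMemMap. CU_MEM_ALLOC_GRANULARITY_RECOMMENDED is suggested in this case.

TensorRT may call reallocate() to increase the buffer by relatively small amounts.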

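Parameters:

address – The address of the original allocation.
alignment – The alignment used by the original allocation.
new_size – The new memory size required.

Returns:

The address of the reallocated memory.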
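The three outcomes described for reallocate can be modelled without a GPU. The sketch below uses a hypothetical SimulatedPool class that keeps host-side bytearrays under fake addresses; it ignores the alignment argument and is meant only to show the decision order (resize in place, otherwise move, otherwise return None), not how device memory would actually be managed.

```python
class SimulatedPool:
    """Hypothetical host-side model of device memory, for illustration only."""

    def __init__(self, limit):
        self.limit = limit      # total bytes the pool may reserve
        self.used = 0           # bytes of capacity currently reserved
        self.blocks = {}        # address -> [capacity, size, data]
        self._next = 0x1000     # next fake address

    def allocate(self, size, capacity=None):
        capacity = capacity or size
        if size == 0 or self.used + capacity > self.limit:
            return None
        addr = self._next
        self._next += capacity
        self.used += capacity
        self.blocks[addr] = [capacity, size, bytearray(capacity)]
        return addr

    def reallocate(self, address, alignment, new_size):
        # alignment is ignored in this simplified model.
        capacity, old_size, data = self.blocks[address]
        if new_size <= capacity:
            # Option 1: resize in place and return the original address.
            self.blocks[address][1] = new_size
            return address
        new_addr = self.allocate(new_size)
        if new_addr is None:
            # Option 3: request not fulfilled; the original block stays valid.
            return None
        # Option 2: move min(old_size, new_size) bytes to the new block.
        keep = min(old_size, new_size)
        self.blocks[new_addr][2][:keep] = data[:keep]
        self.used -= capacity
        del self.blocks[address]
        return new_addr
```

Reserving spare capacity at allocation time, as the capacity argument does here, is one way to make small incremental growth requests cheap, since they can then be served in place.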