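IExecutionContext

class tensorrt.IOutputAllocator(self: tensorrt.tensorrt.IOutputAllocator)

Application-implemented class for controlling output tensor allocation.

To implement a custom output allocator, make sure to explicitly instantiate the base class in __init__():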

import tensorrt as trt


class MyOutputAllocator(trt.IOutputAllocator):
    def __init__(self):
        # The base class must be initialized explicitly.
        trt.IOutputAllocator.__init__(self)

    def reallocate_output(self, tensor_name, memory, size, alignment):
        ... # Your implementation here

    def reallocate_output_async(self, tensor_name, memory, size, alignment, stream):
        ... # Your implementation here

    def notify_shape(self, tensor_name, shape):
        ... # Your implementation here

__init__(self: tensorrt.tensorrt.IOutputAllocator) → None
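
As an illustration of how the three callbacks might be filled in, the following is a minimal sketch rather than a reference implementation. The class name TrackingOutputAllocator and the alloc_fn callable (size in bytes to device address) are hypothetical, and returning the buffer address as an int from reallocate_output() is an assumption about the expected return value. The import fallback exists only so that the sketch can be exercised on a machine without TensorRT.

```python
import itertools

try:
    import tensorrt as trt
    _Base = trt.IOutputAllocator
except ImportError:  # fallback so the sketch also runs without TensorRT
    _Base = object


class TrackingOutputAllocator(_Base):
    """Grows one buffer per output tensor and records the reported shapes."""

    def __init__(self, alloc_fn):
        _Base.__init__(self)  # explicit base-class initialization
        self._alloc_fn = alloc_fn  # hypothetical: size in bytes -> device address
        self.addresses = {}
        self.sizes = {}
        self.shapes = {}

    def reallocate_output(self, tensor_name, memory, size, alignment):
        size = max(size, 1)  # avoid zero-byte allocations
        if self.sizes.get(tensor_name, 0) < size:
            # Freeing the previous buffer is omitted from this sketch.
            self.addresses[tensor_name] = self._alloc_fn(size)
            self.sizes[tensor_name] = size
        return self.addresses[tensor_name]

    def reallocate_output_async(self, tensor_name, memory, size, alignment, stream):
        # This sketch ignores the stream and reuses the synchronous path.
        return self.reallocate_output(tensor_name, memory, size, alignment)

    def notify_shape(self, tensor_name, shape):
        self.shapes[tensor_name] = tuple(shape)


# Fake allocator handing out made-up addresses, for demonstration only.
_fake_addresses = itertools.count(0x1000, 0x1000)
allocator = TrackingOutputAllocator(lambda size: next(_fake_addresses))
first = allocator.reallocate_output("scores", 0, 256, 256)
allocator.notify_shape("scores", (4, 16))
```

In real use, alloc_fn would wrap a device allocation call, and the recorded shapes would be read back after inference to interpret each output buffer.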

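class tensorrt.IExecutionContext

A context for executing inference using an ICudaEngine. Multiple IExecutionContext instances can exist for one ICudaEngine, which allows the same engine to execute multiple batches simultaneously.

Variables:

debug_sync – bool The debugging synchronization flag. If this flag is set to True, the ICudaEngine logs the successful execution of each kernel during execute_v2().
profiler – IProfiler The profiler in use by this IExecutionContext.
engine – ICudaEngine The associated ICudaEngine.
name – str The name of the IExecutionContext.
device_memory – capsule The device memory for use by this execution context. The memory must be aligned according to the CUDA memory alignment property (obtained via cuda.cudart.cudaGetDeviceProperties()), and its size must be large enough to perform inference with the given network inputs. engine.device_memory_size() and engine.get_device_memory_size_for_profile() report upper bounds of the size. If the reported size is 0, the memory may be set to nullptr. If the network is run with execute_async_v3(), the memory is in use from the call to execute_async_v3() until network execution completes. With execute_v2(), it is in use until execute_v2() returns. Releasing the memory or using it for other purposes during this time, including in another execution context running in parallel, results in undefined behavior.
active_optimization_profile – int The active optimization profile for the context. The selected profile is used in subsequent calls to execute_v2(). Profile 0 is selected by default. This is a read-only property; the active optimization profile can be changed with set_optimization_profile_async(). Changing the profile invalidates all dynamic bindings for the current execution context, so they must be set again with set_input_shape() before calling execute_v2().
all_binding_shapes_specified – bool Whether all dynamic dimensions of input tensors have been specified by calling set_input_shape(). Trivially true if the network has no dynamically shaped input tensors. Does not work with name-based interfaces such as set_input_shape(); use infer_shapes() instead.
all_shape_inputs_specified – bool Whether values for all input shape tensors have been specified by calling set_shape_input(). Trivially true if the network has no input shape bindings. Does not work with name-based interfaces such as set_input_shape(); use infer_shapes() instead.
error_recorder – IErrorRecorder Application-implemented error reporting interface for TensorRT objects.
enqueue_emits_profile – bool Whether enqueue emits layer timing to the profiler. The default value is True. If set to False, enqueue is asynchronous when a profiler is attached, and an extra call to IExecutionContext.report_to_profiler() is needed to collect the profiling data and report it to the attached profiler.
persistent_cache_limit – The maximum size of persistent L2 cache that this execution context may use for activation caching. Activation caching is not supported on all architectures; see "How TensorRT Uses Memory" in the developer guide for details. The default is 0 bytes.
nvtx_verbosity – The NVTX verbosity of the execution context. Building with DETAILED verbosity generally increases latency in enqueueV3() (execute_async_v3() in Python). Set this attribute to select the NVTX verbosity of this execution context at runtime. The default is the verbosity with which the engine was built, and the verbosity cannot be raised above that level. This setting does not affect how IEngineInspector interacts with the engine.
temporary_allocator – IGpuAllocator The GPU allocator used for internal temporary storage.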

__del__(self: tensorrt.tensorrt.IExecutionContext) → None

__exit__(exc_type, exc_value, traceback)

Context managers are deprecated and have no effect. Objects are automatically freed when the reference count reaches 0.

__init__(*args, **kwargs)

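execute_async_v3(self: tensorrt.tensorrt.IExecutionContext, stream_handle: int) → bool

Asynchronously execute inference.

Modifying or releasing memory that has been registered for the tensors before stream synchronization, or before the event passed to set_input_consumed_event() has been triggered, results in undefined behavior.

Input tensors can be released once the event passed to set_input_consumed_event() has been triggered, whereas output tensors require stream synchronization.

Parameters: stream_handle – The CUDA stream on which the inference kernels will be enqueued. Using the default stream may cause performance issues, because TensorRT issues additional cudaDeviceSynchronize() calls to ensure correct synchronization. Use a non-default stream instead.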
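
The call sequence implied by these rules can be sketched as follows. This is a minimal illustration, not a definitive implementation: run_async and the synchronize callable (for example, a thin wrapper around cudaStreamSynchronize) are hypothetical names, and MagicMock stands in for a real IExecutionContext so that the snippet runs without a GPU.

```python
from unittest.mock import MagicMock


def run_async(context, tensor_addresses, stream_handle, synchronize):
    """Bind tensor addresses, enqueue inference, then wait for the stream."""
    for name, address in tensor_addresses.items():
        if not context.set_tensor_address(name, address):
            raise RuntimeError(f"could not set address for tensor {name!r}")
    if not context.execute_async_v3(stream_handle):
        raise RuntimeError("execute_async_v3() reported failure")
    # Outputs are only safe to read, and buffers safe to free, after this.
    synchronize(stream_handle)


# Stand-in objects so the sketch runs without a GPU.
context = MagicMock()
context.set_tensor_address.return_value = True
context.execute_async_v3.return_value = True
synchronize = MagicMock()

run_async(context, {"input": 0x1000, "output": 0x2000}, 7, synchronize)
```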

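execute_v2(self: tensorrt.tensorrt.IExecutionContext, bindings: List[int]) → bool

Synchronously execute inference on a batch. This method requires an array of input and output buffers.

Parameters: bindings – A list of integers representing the input and output buffer addresses for the network.
Returns: True if execution succeeded.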
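
One way to assemble the bindings list is sketched below. It assumes a single optimization profile and that the binding order matches the engine's I/O tensor index order as reported by engine.num_io_tensors and engine.get_tensor_name(); both are assumptions to verify against the TensorRT version in use. build_bindings is a hypothetical helper, and MagicMock replaces the real engine and context so that the snippet runs without a GPU.

```python
from unittest.mock import MagicMock


def build_bindings(engine, buffers):
    """Return device addresses ordered by the engine's I/O tensor index."""
    names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    missing = [name for name in names if name not in buffers]
    if missing:
        raise KeyError(f"no buffer provided for tensors: {missing}")
    return [int(buffers[name]) for name in names]


# Stand-in engine and context with made-up tensor names and addresses.
engine = MagicMock()
engine.num_io_tensors = 2
engine.get_tensor_name.side_effect = lambda i: ["input", "output"][i]
context = MagicMock()
context.execute_v2.return_value = True

bindings = build_bindings(engine, {"output": 0x2000, "input": 0x1000})
ok = context.execute_v2(bindings)
```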

get_debug_listener(self: tensorrt.tensorrt.IExecutionContext) → tensorrt.tensorrt.IDebugListener

Get the debug listener of the execution context.

Returns: The IDebugListener of the execution context.

get_debug_state(self: tensorrt.tensorrt.IExecutionContext, name: str) → bool

Get the debug state of the tensor.

Parameters: name – The name of the tensor.

get_input_consumed_event(self: tensorrt.tensorrt.IExecutionContext) → int

Return the event associated with consuming the input tensors.

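get_max_output_size(self: tensorrt.tensorrt.IExecutionContext, name: str) → int

Return an upper bound on the size, in bytes, of an output tensor, based on the current optimization profile.

Returns -1 if the profile or input shapes have not been set yet, or if the provided name does not map to an output.

Parameters: name – The tensor name.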
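
Because -1 signals an error rather than a size, callers may want to check for it before allocating. The sketch below shows one way to do that; max_output_sizes is a hypothetical helper, the tensor names and sizes are made up, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def max_output_sizes(context, output_names):
    """Collect upper-bound byte sizes, rejecting the -1 error value."""
    sizes = {}
    for name in output_names:
        size = context.get_max_output_size(name)
        if size < 0:
            raise ValueError(
                f"no size bound for {name!r}: set the profile and input "
                "shapes first, and check that the name is an output"
            )
        sizes[name] = size
    return sizes


# Stand-in context that knows two outputs and returns -1 for anything else.
known = {"boxes": 4096, "scores": 1024}
context = MagicMock()
context.get_max_output_size.side_effect = lambda name: known.get(name, -1)

sizes = max_output_sizes(context, ["boxes", "scores"])
```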

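get_output_allocator(self: tensorrt.tensorrt.IExecutionContext, name: str) → tensorrt.tensorrt.IOutputAllocator

Return the output allocator associated with the given output tensor, or None if the provided name does not map to an output tensor.

Parameters: name – The tensor name.

get_tensor_address(self: tensorrt.tensorrt.IExecutionContext, name: str) → int

Get the memory address of the given input or output tensor.

Parameters: name – The tensor name.

get_tensor_shape(self: tensorrt.tensorrt.IExecutionContext, name: str) → tensorrt.tensorrt.Dims

Return the shape of the given input or output tensor.

Parameters: name – The tensor name.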

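get_tensor_strides(self: tensorrt.tensorrt.IExecutionContext, name: str) → tensorrt.tensorrt.Dims

Return the strides of the buffer for the given tensor name.

Note that strides can differ between execution contexts when dynamic shapes are used.

Parameters: name – The tensor name.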

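infer_shapes(self: tensorrt.tensorrt.IExecutionContext) → List[str]

Infer shapes and return the names of any tensors that are insufficiently specified.

An input tensor is insufficiently specified if either of the following is true:

It has dynamic dimensions, and its runtime dimensions have not yet been specified via set_input_shape().
is_shape_inference_io(t) is True, and the tensor's address has not yet been set.

Returns: A List[str] naming any insufficiently specified tensors, or an empty list on success.
Raises: RuntimeError if shape inference fails for a reason other than insufficiently specified tensors.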
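
A minimal sketch of how set_input_shape() and infer_shapes() might be combined before enqueueing is given here. set_shapes_and_check is a hypothetical helper, the tensor names are made up, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def set_shapes_and_check(context, input_shapes):
    """Set runtime input shapes, then confirm nothing is left unspecified."""
    for name, shape in input_shapes.items():
        if not context.set_input_shape(name, tuple(shape)):
            raise ValueError(f"shape {tuple(shape)} rejected for input {name!r}")
    # infer_shapes() itself may raise RuntimeError for other failures.
    missing = context.infer_shapes()
    if missing:
        raise ValueError(f"insufficiently specified tensors: {missing}")


# Stand-in context that accepts every shape and reports nothing missing.
context = MagicMock()
context.set_input_shape.return_value = True
context.infer_shapes.return_value = []

set_shapes_and_check(context, {"images": [8, 3, 224, 224]})
```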

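report_to_profiler(self: tensorrt.tensorrt.IExecutionContext) → bool

Calculate layer timing information for the current optimization profile in the IExecutionContext and update the profiler after one inference launch.

If the enqueue_emits_profile flag is set to True, the enqueue function calculates layer timing implicitly when a profiler is provided, and there is no need to call this function. If the enqueue_emits_profile flag is set to False, the enqueue function records the CUDA event timers when a profiler is provided, but does not perform the layer timing calculation. This function must then be called explicitly to calculate the layer timing for the previous inference launch.

In the CUDA graph launch scenario, a graph captured from an IExecutionContext with a profiler enabled records the same set of CUDA events as the regular enqueue functions. This function must be called after the graph launch to report the layer timing information to the profiler.

Profiling CUDA graphs is only available from CUDA 11.1 onwards.

Returns: True if the call succeeded, otherwise False (for example, no profiler was provided, or the context is in CUDA graph capture mode).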
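
The deferred-profiling flow described above might look like the following sketch. It assumes a profiler has already been attached to the context; waiting for the stream before calling report_to_profiler() is a conservative assumption rather than a documented requirement. profile_one_launch and synchronize are hypothetical names, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def profile_one_launch(context, stream_handle, synchronize):
    """Run one inference with deferred layer timing, then report it."""
    context.enqueue_emits_profile = False  # defer the timing computation
    if not context.execute_async_v3(stream_handle):
        raise RuntimeError("execute_async_v3() reported failure")
    synchronize(stream_handle)  # hypothetical stream-synchronization helper
    return context.report_to_profiler()


# Stand-in context so the sketch runs without a GPU.
context = MagicMock()
context.execute_async_v3.return_value = True
context.report_to_profiler.return_value = True

reported = profile_one_launch(context, 3, synchronize=lambda stream: None)
```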

set_all_tensors_debug_state(self: tensorrt.tensorrt.IExecutionContext, flag: bool) → bool

Turn the debug state of all debug tensors on or off.

Parameters: flag – True to turn the debug state of the tensors on, False to turn it off.

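set_aux_streams(self: tensorrt.tensorrt.IExecutionContext, aux_streams: List[int]) → None

Set the auxiliary streams on which TensorRT should launch kernels in the next execute_async_v3() call.

If set, TensorRT launches the kernels that are supposed to run on auxiliary streams using the streams provided by the user through this API. If this API is not called before the execute_async_v3() call, TensorRT uses the auxiliary streams that it created internally.

TensorRT always inserts event synchronizations between the main stream provided via the execute_async_v3() call and the auxiliary streams:

At the beginning of the execute_async_v3() call, TensorRT makes all auxiliary streams wait on the activities on the main stream.
At the end of the execute_async_v3() call, TensorRT makes the main stream wait on the activities on all auxiliary streams.

The provided auxiliary streams must not be the default stream and must all be different, to avoid deadlocks.

Parameters: aux_streams – A list of CUDA streams. If the list is longer than engine.num_aux_streams, only the first engine.num_aux_streams streams are used. If it is shorter than engine.num_aux_streams (for example, an empty list), TensorRT uses the provided streams for the first few auxiliary streams and creates additional streams internally for the rest.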
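
The constraints on auxiliary streams can be checked on the application side before the call, as in this sketch. Treating handle 0 as the default stream is an assumption, apply_aux_streams is a hypothetical helper, and MagicMock stands in for a real IExecutionContext and its engine.

```python
from unittest.mock import MagicMock


def apply_aux_streams(context, aux_streams):
    """Validate and register auxiliary streams; return the ones TensorRT uses."""
    streams = list(aux_streams)
    if any(stream == 0 for stream in streams):
        raise ValueError("the default stream cannot be an auxiliary stream")
    if len(set(streams)) != len(streams):
        raise ValueError("auxiliary streams must all be different")
    context.set_aux_streams(streams)
    # Only the first engine.num_aux_streams entries are used.
    return streams[: context.engine.num_aux_streams]


# Stand-in context whose engine uses two auxiliary streams.
context = MagicMock()
context.engine.num_aux_streams = 2

used = apply_aux_streams(context, [11, 12, 13])
```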

set_debug_listener(self: tensorrt.tensorrt.IExecutionContext, listener: tensorrt.tensorrt.IDebugListener) → bool

Set the debug listener for the execution context.

Parameters: listener – The IDebugListener.

set_device_memory(self: tensorrt.tensorrt.IExecutionContext, memory: int, size: int) → None

Set the device memory for use by this IExecutionContext.

Parameters:

memory – 256-byte aligned device memory.
size – The size of the provided memory. This must be at least as large as CudaEngine.get_device_memory_size_v2.

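If using execute_async_v3(), the memory is in use from the call until network execution completes. Releasing the memory or using it for other purposes during this time results in undefined behavior. This includes using the same memory for an execution context running in parallel.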

set_input_consumed_event(self: tensorrt.tensorrt.IExecutionContext, event: int) → bool

Mark all input tensors as consumed.

Parameters: event – The CUDA event that is triggered after all input tensors have been consumed.
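
The sketch below shows one way the event could be used to recycle input buffers before the whole stream has finished. run_and_recycle_inputs, wait_for_event (for example, a wrapper around cudaEventSynchronize) and recycle_inputs are hypothetical names, and MagicMock stands in for a real IExecutionContext; the log list only records the order of operations for illustration.

```python
from unittest.mock import MagicMock


def run_and_recycle_inputs(context, stream_handle, event_handle,
                           wait_for_event, recycle_inputs):
    """Enqueue inference and reuse input buffers once they are consumed."""
    if not context.set_input_consumed_event(event_handle):
        raise RuntimeError("could not register the input-consumed event")
    if not context.execute_async_v3(stream_handle):
        raise RuntimeError("execute_async_v3() reported failure")
    wait_for_event(event_handle)  # inputs are consumed after this returns
    recycle_inputs()  # output buffers still require stream synchronization


# Stand-ins that record the order in which each step happens.
log = []
context = MagicMock()
context.set_input_consumed_event.side_effect = lambda e: log.append("register") or True
context.execute_async_v3.side_effect = lambda s: log.append("enqueue") or True

run_and_recycle_inputs(
    context,
    stream_handle=7,
    event_handle=42,
    wait_for_event=lambda event: log.append("wait"),
    recycle_inputs=lambda: log.append("recycle"),
)
```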

set_input_shape(*args, **kwargs)

Overloaded function.

set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: tuple) -> bool
set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: list) -> bool
set_input_shape(self: tensorrt.tensorrt.IExecutionContext, name: str, shape: tensorrt.tensorrt.Dims) -> bool

Set the shape for the given input tensor. The three overloads take the same arguments and differ only in the type accepted for shape.

Parameters:

name – The input tensor name.
shape – The input tensor shape.

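set_optimization_profile_async(self: tensorrt.tensorrt.IExecutionContext, profile_index: int, stream_handle: int) → bool

Set the optimization profile with asynchronous semantics.

Parameters:

profile_index – The index of the optimization profile.
stream_handle – The CUDA stream on which the work to switch the optimization profile can be enqueued.

When the optimization profile is switched via this API, TensorRT may need to copy data via cudaMemcpyAsync. The application is responsible for guaranteeing that synchronization occurs between the profile switch stream and the enqueue stream.

Returns: True if the optimization profile was set successfully.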
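
A profile switch followed by re-specifying the input shapes might be sketched as follows. Synchronizing the switch stream immediately is one simple way to meet the synchronization requirement, chosen here as an assumption; recording an event and waiting on it from the enqueue stream would be another. switch_profile and synchronize are hypothetical names, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def switch_profile(context, profile_index, stream_handle, input_shapes, synchronize):
    """Switch optimization profiles, then set the dynamic input shapes again."""
    if not context.set_optimization_profile_async(profile_index, stream_handle):
        raise RuntimeError(f"could not select profile {profile_index}")
    synchronize(stream_handle)  # make sure the switch finished before enqueueing
    # Switching profiles invalidates previously set dynamic shapes.
    for name, shape in input_shapes.items():
        if not context.set_input_shape(name, tuple(shape)):
            raise ValueError(f"shape {tuple(shape)} rejected for input {name!r}")


# Stand-in objects so the sketch runs without a GPU.
context = MagicMock()
context.set_optimization_profile_async.return_value = True
context.set_input_shape.return_value = True
synchronize = MagicMock()

switch_profile(context, 1, 7, {"images": (16, 3, 224, 224)}, synchronize)
```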

set_output_allocator(self: tensorrt.tensorrt.IExecutionContext, name: str, output_allocator: tensorrt.tensorrt.IOutputAllocator) → bool

Set the output allocator to use for the given output tensor.

Pass None to unset the output allocator.

The allocator is called by execute_async_v3().

Parameters:

name – The tensor name.
output_allocator – The output allocator.
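
Registering one allocator for several outputs, and later unsetting it, could be sketched like this. attach_allocator and detach_allocator are hypothetical helpers, the tensor names are made up, a plain object stands in for an IOutputAllocator instance, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def attach_allocator(context, output_names, allocator):
    """Register the same output allocator for every listed output tensor."""
    for name in output_names:
        if not context.set_output_allocator(name, allocator):
            raise ValueError(f"could not set an output allocator for {name!r}")


def detach_allocator(context, output_names):
    """Unset the output allocator by passing None."""
    for name in output_names:
        context.set_output_allocator(name, None)


# Stand-in context and allocator so the sketch runs without a GPU.
context = MagicMock()
context.set_output_allocator.return_value = True
allocator = object()

attach_allocator(context, ["boxes", "scores"], allocator)
attached = [c[0] for c in context.set_output_allocator.call_args_list]
detach_allocator(context, ["boxes", "scores"])
```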

set_tensor_address(self: tensorrt.tensorrt.IExecutionContext, name: str, memory: int) → bool

Set the memory address for the given input or output tensor.

Parameters:

name – The tensor name.
memory – The memory address.

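set_tensor_debug_state(self: tensorrt.tensorrt.IExecutionContext, name: str, flag: bool) → bool

Turn the debug state of a tensor on or off. The tensor must have been marked as a debug tensor at build time.

Parameters:

name – The name of the target tensor.
flag – True to turn the debug state of the tensor on, False to turn it off.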

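update_device_memory_size_for_shapes(self: tensorrt.tensorrt.IExecutionContext) → int

Recompute the internal activation buffer sizes based on the current input shapes, and return the total amount of memory required.

Users can allocate device memory based on the returned size and provide it to TRT by assigning to IExecutionContext.device_memory. All input shapes and the optimization profile to use must be specified before calling this function; otherwise the partition will be invalidated.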
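
A minimal sketch of sizing and providing the device memory is given here. It passes the buffer through set_device_memory(memory, size) rather than by assigning to device_memory, and checks the 256-byte alignment stated for that method. provide_device_memory and alloc_fn (size in bytes to device address) are hypothetical, the address is made up, and MagicMock stands in for a real IExecutionContext.

```python
from unittest.mock import MagicMock


def provide_device_memory(context, alloc_fn):
    """Allocate activation memory sized for the current input shapes."""
    required = context.update_device_memory_size_for_shapes()
    address = alloc_fn(required)
    if address % 256 != 0:
        raise ValueError("device memory must be 256-byte aligned")
    # The buffer must stay alive and unshared while inference is running.
    context.set_device_memory(address, required)
    return address


# Stand-in context reporting a made-up memory requirement.
context = MagicMock()
context.update_device_memory_size_for_shapes.return_value = 4096

address = provide_device_memory(context, alloc_fn=lambda size: 0x70000000)
```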