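1.4. FP4 Conversion and Data Movement

To use these functions, include the header file cuda_fp4.h in your program.

Enumerations

__nv_fp4_interpretation_t: Enumerates the possible interpretations of a 4-bit value when it is referred to as an fp4 type.

Functions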

__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_bfloat16raw2_to_fp4x2(const __nv_bfloat162_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts a vector of two nv_bfloat16 precision numbers packed in __nv_bfloat162_raw x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4_storage_t __nv_cvt_bfloat16raw_to_fp4(const __nv_bfloat16_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts input nv_bfloat16 precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_double2_to_fp4x2(const double2 x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts a vector of two double precision numbers packed in double2 x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4_storage_t __nv_cvt_double_to_fp4(const double x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts input double precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_float2_to_fp4x2(const float2 x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts a vector of two single precision numbers packed in float2 x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4_storage_t __nv_cvt_float_to_fp4(const float x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts input single precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __half_raw __nv_cvt_fp4_to_halfraw(const __nv_fp4_storage_t x, const __nv_fp4_interpretation_t fp4_interpretation): Converts input fp4 x of the specified kind to half precision.
__host__ __device__ __half2_raw __nv_cvt_fp4x2_to_halfraw2(const __nv_fp4x2_storage_t x, const __nv_fp4_interpretation_t fp4_interpretation): Converts a vector of two fp4 values of the specified kind into a vector of two half precision values packed in a __half2_raw structure.
__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_halfraw2_to_fp4x2(const __half2_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts a vector of two half precision numbers packed in __half2_raw x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4_storage_t __nv_cvt_halfraw_to_fp4(const __half_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding): Converts input half precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(): Default constructor.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const unsigned long int val): Constructs from an unsigned long int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const double f): Constructs from a double value, relying on __NV_SATFINITE behavior for out-of-range values and the cudaRoundNearest rounding mode.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const long int val): Constructs from a long int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const float f): Constructs from a float value, relying on __NV_SATFINITE behavior for out-of-range values and the cudaRoundNearest rounding mode.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const int val): Constructs from an int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const unsigned short int val): Constructs from an unsigned short int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const long long int val): Constructs from a long long int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const short int val): Constructs from a short int value.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const __nv_bfloat16 f): Constructs from a __nv_bfloat16 value, relying on __NV_SATFINITE behavior for out-of-range values and the cudaRoundNearest rounding mode.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const unsigned int val): Constructs from an unsigned int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const unsigned long long int val): Constructs from an unsigned long long int value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4_e2m1::__nv_fp4_e2m1(const __half f): Constructs from a __half value, relying on __NV_SATFINITE behavior for out-of-range values and the cudaRoundNearest rounding mode.
__host__ __device__ __nv_fp4x2_e2m1::__nv_fp4x2_e2m1(const double2 f): Constructs from a double2 value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x2_e2m1::__nv_fp4x2_e2m1(const __nv_bfloat162 f): Constructs from a __nv_bfloat162 value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x2_e2m1::__nv_fp4x2_e2m1(const __half2 f): Constructs from a __half2 value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x2_e2m1::__nv_fp4x2_e2m1(const float2 f): Constructs from a float2 value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x2_e2m1::__nv_fp4x2_e2m1(): Default constructor.
__host__ __device__ __nv_fp4x4_e2m1::__nv_fp4x4_e2m1(): Default constructor.
__host__ __device__ __nv_fp4x4_e2m1::__nv_fp4x4_e2m1(const __nv_bfloat162 flo, const __nv_bfloat162 fhi): Constructs from a pair of __nv_bfloat162 values, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x4_e2m1::__nv_fp4x4_e2m1(const double4 f): Constructs from a double4 vector value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x4_e2m1::__nv_fp4x4_e2m1(const float4 f): Constructs from a float4 vector value, relying on __NV_SATFINITE behavior for out-of-range values.
__host__ __device__ __nv_fp4x4_e2m1::__nv_fp4x4_e2m1(const __half2 flo, const __half2 fhi): Constructs from a pair of __half2 values, relying on __NV_SATFINITE behavior for out-of-range values.

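Typedefs

__nv_fp4_storage_t: 8-bit unsigned integer type abstraction used for storage of an fp4 floating-point number.
__nv_fp4x2_storage_t: 8-bit unsigned integer type abstraction used for storage of a pair of fp4 floating-point numbers.
__nv_fp4x4_storage_t: 16-bit unsigned integer type abstraction used for storage of four fp4 floating-point numbers.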

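1.4.1. Enumerations

enum __nv_fp4_interpretation_t

Enumerates the possible interpretations of a 4-bit value when it is referred to as an fp4 type.

Values

enumerator __NV_E2M1: Stands for fp4 numbers of the e2m1 kind.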
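
To illustrate what the e2m1 interpretation means, the sketch below decodes a 4-bit pattern in plain C++. It assumes the usual E2M1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit, no infinities or NaNs). The name decode_e2m1 is hypothetical and is not part of cuda_fp4.h; this is a software model, not the library's implementation.

```cpp
#include <cstdint>

// Hypothetical host-side model of e2m1 decoding (not a cuda_fp4.h function).
// Only the low 4 bits of `bits` are read: bit 3 is the sign, bits 2..0 select
// one of the eight representable magnitudes.
inline float decode_e2m1(std::uint8_t bits) {
    static const float magnitudes[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                        2.0f, 3.0f, 4.0f, 6.0f};
    const float magnitude = magnitudes[bits & 0x7u];
    return (bits & 0x8u) ? -magnitude : magnitude;
}
```

Under these assumptions the largest finite magnitude (MAXNORM) is 6.0, encoded as 0x7 for the positive value and 0xF for the negative one.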

1.4.2. Functions

__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_bfloat16raw2_to_fp4x2(const __nv_bfloat162_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

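Converts a vector of two nv_bfloat16 precision numbers packed in __nv_bfloat162_raw x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input vector x to a vector of two fp4 values of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4x2_storage_t value holding the result of the conversion.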
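
The saturation and NaN rules described above can be modeled on the host without the CUDA headers. The following is a minimal sketch for a single float input, assuming round-to-nearest-even (the behavior usually associated with cudaRoundNearest) and MAXNORM = 6 for e2m1. The helper encode_e2m1_rn is hypothetical and is not the library's implementation; its handling of negative zero and of small negative inputs that round to zero is also an assumption.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical host-side model of a float -> e2m1 conversion.
// Returns the 4-bit code in the low nibble of the result.
inline std::uint8_t encode_e2m1_rn(float x) {
    // Magnitudes representable in e2m1, indexed by the 3-bit magnitude code.
    static const float magnitudes[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                        2.0f, 3.0f, 4.0f, 6.0f};

    // NaN maps to positive MAXNORM (code 0x7), as the reference text states.
    if (std::isnan(x)) {
        return 0x7u;
    }

    const unsigned sign = std::signbit(x) ? 0x8u : 0x0u;
    const float a = std::fabs(x);

    // Default: saturate to MAXNORM (covers a >= 6, including infinity).
    unsigned code = 7u;
    if (a < magnitudes[7]) {
        for (unsigned i = 0u; i < 7u; ++i) {
            const float lo = magnitudes[i];
            const float hi = magnitudes[i + 1u];
            if (a <= hi) {
                const float mid = 0.5f * (lo + hi);
                if (a < mid) {
                    code = i;
                } else if (a > mid) {
                    code = i + 1u;
                } else {
                    // Exact tie: pick the neighbor whose mantissa bit is 0,
                    // which is the even-numbered code.
                    code = (i % 2u == 0u) ? i : i + 1u;
                }
                break;
            }
        }
    }
    return static_cast<std::uint8_t>(sign | code);
}
```

In this model, an input of 2.5f lies exactly halfway between 2.0 and 3.0 and resolves to code 0x4 (the value 2.0), while 100.0f and -100.0f saturate to 0x7 and 0xF respectively.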

__host__ __device__ __nv_fp4_storage_t __nv_cvt_bfloat16raw_to_fp4(const __nv_bfloat16_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

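Converts input nv_bfloat16 precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input x to an fp4 value of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4_storage_t value holding the result of the conversion.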

__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_double2_to_fp4x2(const double2 x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

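Converts a vector of two double precision numbers packed in double2 x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input vector x to a vector of two fp4 values of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4x2_storage_t value holding the result of the conversion.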

__host__ __device__ __nv_fp4_storage_t __nv_cvt_double_to_fp4(const double x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

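Converts input double precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input x to an fp4 value of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4_storage_t value holding the result of the conversion.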

__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_float2_to_fp4x2(const float2 x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

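Converts a vector of two single precision numbers packed in float2 x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input vector x to a vector of two fp4 values of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4x2_storage_t value holding the result of the conversion.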

__host__ __device__ __nv_fp4_storage_t __nv_cvt_float_to_fp4(const float x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

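Converts input single precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input x to an fp4 value of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4_storage_t value holding the result of the conversion.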

__host__ __device__ __half_raw __nv_cvt_fp4_to_halfraw(const __nv_fp4_storage_t x, const __nv_fp4_interpretation_t fp4_interpretation)

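Converts input fp4 x of the specified kind to half precision.

Converts input x, an fp4 value of the kind specified by the fp4_interpretation parameter, to half precision.

Returns

The __half_raw value holding the result of the conversion.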

__host__ __device__ __half2_raw __nv_cvt_fp4x2_to_halfraw2(const __nv_fp4x2_storage_t x, const __nv_fp4_interpretation_t fp4_interpretation)

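Converts a vector of two fp4 values of the specified kind into a vector of two half precision values packed in a __half2_raw structure.

Converts input vector x, holding two fp4 values of the kind specified by the fp4_interpretation parameter, to a vector of two half precision values and returns it as a __half2_raw structure.

Returns

The __half2_raw value holding the result of the conversion.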

__host__ __device__ __nv_fp4x2_storage_t __nv_cvt_halfraw2_to_fp4x2(const __half2_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

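Converts a vector of two half precision numbers packed in __half2_raw x into a vector of two fp4 values of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input vector x to a vector of two fp4 values of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4x2_storage_t value holding the result of the conversion.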

__host__ __device__ __nv_fp4_storage_t __nv_cvt_halfraw_to_fp4(const __half_raw x, const __nv_fp4_interpretation_t fp4_interpretation, const enum cudaRoundMode rounding)

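Converts input half precision x to an fp4 value of the requested kind, using the specified rounding mode and saturating out-of-range values.

Converts input x to an fp4 value of the kind specified by the fp4_interpretation parameter, using the rounding mode specified by the rounding parameter. Large out-of-range values saturate to MAXNORM of the same sign. NaN input values result in positive MAXNORM.

Returns

The __nv_fp4_storage_t value holding the result of the conversion.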

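1.4.3. Typedefs

typedef __nv_fp8_storage_t __nv_fp4_storage_t: 8-bit unsigned integer type abstraction used for storage of an fp4 floating-point number.

typedef __nv_fp8_storage_t __nv_fp4x2_storage_t: 8-bit unsigned integer type abstraction used for storage of a pair of fp4 floating-point numbers.

typedef __nv_fp8x2_storage_t __nv_fp4x4_storage_t: 16-bit unsigned integer type abstraction used for storage of four fp4 floating-point numbers.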
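
The storage types above hold raw 4-bit codes in ordinary unsigned integers. A rough sketch of how two codes could share one byte follows. Placing the first element in the low nibble is an assumption about the layout, not something this section states, and pack_fp4x2, low_fp4, and high_fp4 are hypothetical names that do not exist in cuda_fp4.h.

```cpp
#include <cstdint>

// Hypothetical helpers modeling an 8-bit container for two 4-bit fp4 codes.
// Assumed layout: first element in bits 3..0, second element in bits 7..4.
inline std::uint8_t pack_fp4x2(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint8_t>(((second & 0xFu) << 4) | (first & 0xFu));
}

// Extracts the code stored in the low nibble.
inline std::uint8_t low_fp4(std::uint8_t packed) {
    return static_cast<std::uint8_t>(packed & 0xFu);
}

// Extracts the code stored in the high nibble.
inline std::uint8_t high_fp4(std::uint8_t packed) {
    return static_cast<std::uint8_t>(packed >> 4);
}
```

The same idea extends to a 16-bit container for four codes by combining two such bytes.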