SegFormer - NVIDIA Docs

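SegFormer is a semantic segmentation model developed by NVIDIA and included in TAO. SegFormer supports the following tasks:

train
evaluate
inference
export

These tasks can be invoked from the command line using the TAO Launcher with the following convention: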

            tao model segformer <sub_task> <args_per_subtask>

Here, args_per_subtask are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections.

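Data Input for SegFormer

SegFormer requires the data to be provided as image and mask folders. For more information about the input data format for SegFormer, see the Data Annotation Format page.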

Creating a Training Experiment Spec File

Configuration for a Custom Dataset

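This document shows example configurations and commands for training on the ISBI dataset. The ISBI Challenge: Segmentation of neuronal structures in EM stacks dataset is used for binary segmentation. For more details, see the example notebook in TAO Computer Vision Samples. Because the dataset contains grayscale images, `input_type` is set to grayscale.
For RGB input images, set `input_type` to rgb instead of grayscale.
Configure the mean and standard deviation in img_norm_cfg according to your input dataset.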
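
As a rough illustration of what the mean and standard deviation do, the sketch below applies the usual per-channel `(pixel - mean) / std` normalization. The helper name `normalize_pixel` is hypothetical, and the formula is assumed to match the standard normalization used by MMSeg-style pipelines rather than taken from the TAO source. With both mean and std set to 127.5, 8-bit values map to the range [-1, 1].

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel (a sequence of channel values) channel by channel."""
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


# Values from the img_norm_cfg used in this document.
mean = [127.5, 127.5, 127.5]
std = [127.5, 127.5, 127.5]

print(normalize_pixel([0, 127.5, 255], mean, std))  # [-1.0, 0.0, 1.0]
```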

The following is an example spec file for training a SegFormer model with the mit_b5 backbone on the ISBI dataset.

train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  validate: True
  validation_interval: 500
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"
dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

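The SegFormer training experiment spec consists of three main components:

train
dataset
model

train

The train config contains the parameters related to training. They are described as follows: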

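Parameter	Datatype	Default	Description	Supported Values
`exp_config`	Dict	None	Contains the following parameter: * `manual_seed` (int, default 49): the random seed used to make training deterministic	–
`max_iters`	int	10	The maximum number of iterations (steps) for which training should run
`checkpoint_interval`	int	1	The interval, in steps, at which checkpoints are saved
`logging_interval`	int	10	The interval, in steps, at which experiment logs are saved. Logs are saved in the logs directory.
`resume_training_checkpoint_path`	str	None	The path of the checkpoint from which to resume training
`validate`	bool	False	A flag that enables validation during training
`validation_interval`	int	int	The interval, in iterations, at which validation is performed during training. Note that the validation interval should be at least 1 less than the checkpoint interval to prevent states from being overwritten.
`trainer`	Dict	None	Contains the parameters required by the MMSeg trainer: * `find_unused_parameters` (bool, default False): sets this parameter in DDP. For more information, see DDP_PyT. * `sf_optim` (Dict, default None): the SegFormer optimizer config. For more information, see optimizer_spec. * `lr_config` (Dict, default None): the SegFormer learning rate config. For more information, see creating_lr_config_sf.	–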
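
To make the interval parameters concrete, the following sketch lists the iterations at which checkpoints would be saved and validation would run for the sample spec above. It assumes, for illustration only, that each event fires whenever the 1-based iteration count is a multiple of its interval; the helper `interval_steps` is hypothetical and not part of TAO.

```python
def interval_steps(max_iters, interval):
    """Return the 1-based iterations that are multiples of `interval`."""
    return [step for step in range(1, max_iters + 1) if step % interval == 0]


max_iters = 1000  # train.max_iters in the sample spec
checkpoint_steps = interval_steps(max_iters, 200)  # train.checkpoint_interval
validation_steps = interval_steps(max_iters, 500)  # train.validation_interval

print(checkpoint_steps)  # [200, 400, 600, 800, 1000]
print(validation_steps)  # [500, 1000]
```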

sf_optim

sf_optim:
  lr: 0.00006
  betas:
    - 0.0
    - 0.999
  paramwise_cfg:
    pos_block:
      decay_mult: 0.0
    norm:
      decay_mult: 0.0
    head:
      lr_mult: 10.0
  weight_decay: 5e-4

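Parameter	Datatype	Default	Description	Supported Values
`lr`	float	0.00006	The learning rate	>=0.0
`betas`	List[float]	[0.0, 0.9]	The beta parameters of the Adam optimizer	>=0.0
`paramwise_cfg`	Dict	None	Configuration parameters for the Adam optimizer: * `pos_block` (Dict, default None): `decay_mult` (float, default 0.0) * `norm` (Dict, default None): `decay_mult` (float, default 0.0) * `head` (Dict, default None): `lr_mult` (float, default 10.0)	`decay_mult`: >=0.0; `lr_mult`: >=0.0
`weight_decay`	float	5e-4	The weight decay hyperparameter used for regularization	>=0.0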
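
The sketch below illustrates one plausible reading of `paramwise_cfg`: a parameter whose name contains a key such as `norm` or `head` has its weight decay scaled by `decay_mult` and its learning rate scaled by `lr_mult`. The substring matching rule, the helper `build_param_groups`, and the parameter names are assumptions made for illustration; they are not taken from the TAO implementation.

```python
def build_param_groups(param_names, base_lr, base_weight_decay, paramwise_cfg):
    """Assign a learning rate and weight decay to each parameter name.

    A parameter takes the multipliers of the first key in `paramwise_cfg`
    that occurs as a substring of its name (assumed matching rule).
    """
    groups = []
    for name in param_names:
        lr, weight_decay = base_lr, base_weight_decay
        for key, mults in paramwise_cfg.items():
            if key in name:
                lr = base_lr * mults.get("lr_mult", 1.0)
                weight_decay = base_weight_decay * mults.get("decay_mult", 1.0)
                break
        groups.append({"name": name, "lr": lr, "weight_decay": weight_decay})
    return groups


paramwise_cfg = {
    "pos_block": {"decay_mult": 0.0},
    "norm": {"decay_mult": 0.0},
    "head": {"lr_mult": 10.0},
}

# Hypothetical parameter names, for illustration only.
names = [
    "backbone.block1.norm1.weight",
    "backbone.block1.attn.q.weight",
    "decode_head.conv_seg.weight",
]

for group in build_param_groups(names, 0.00006, 5e-4, paramwise_cfg):
    print(group)
```

With these settings, the normalization weights keep the base learning rate but receive no weight decay, while the decode head trains with ten times the base learning rate.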

lr_config

lr_config:
  warmup_iters: 1500
  warmup_ratio: 1e-6
  power: 1.0
  min_lr: 0.0

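Parameter	Datatype	Default	Description	Supported Values
`warmup_iters`	int	1500	The number of iterations or epochs that warmup lasts	>=0.0
`warmup_ratio`	float	1e-6	The LR used at the beginning of warmup equals `warmup_ratio * initial_lr`	>=0.0
`power`	float	1.0	The power to which the multiplicative coefficient is raised	>=0.0
`min_lr`	float	0.0	The minimum LR used by the LR scheduler	>=0.0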
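
The following sketch shows how these four parameters could combine into a learning rate schedule. It assumes a linear warmup followed by a polynomial ("poly") decay, as in the MMCV-style schedulers commonly used with MMSeg; the document itself does not name the policy, so treat the formulas as an assumption. The function name `poly_lr_with_warmup` and the value `max_iters = 160000` are hypothetical.

```python
def poly_lr_with_warmup(step, max_iters, base_lr=0.00006, warmup_iters=1500,
                        warmup_ratio=1e-6, power=1.0, min_lr=0.0):
    """Learning rate at `step` under linear warmup plus polynomial decay."""
    # Polynomial decay from base_lr down to min_lr over max_iters steps.
    coeff = (1.0 - step / max_iters) ** power
    regular_lr = (base_lr - min_lr) * coeff + min_lr
    if step < warmup_iters:
        # Linear warmup: the scale factor grows from warmup_ratio to 1.
        k = (1.0 - step / warmup_iters) * (1.0 - warmup_ratio)
        return regular_lr * (1.0 - k)
    return regular_lr


max_iters = 160000  # hypothetical training length, chosen for illustration
for step in (0, 750, 1500, 80000, 160000):
    print(step, poly_lr_with_warmup(step, max_iters))
```

At step 0 the value is approximately `warmup_ratio * lr`, which matches the description of `warmup_ratio` in the table; at the final step it reaches `min_lr`.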

model

The following example model config provides options to change the SegFormer architecture for training.

model:
  input_height: 512
  input_width: 512
  pretrained_model_path: null
  backbone:
    type: "mit_b5"

The model config is also used during SegFormer evaluation and inference. Its parameters are as follows:

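Parameter	Datatype	Default	Description	Supported Values
`pretrained_model_path`	string	None	The optional path to the pretrained backbone file	string to the path
`backbone`	Dict	None	A dictionary containing the following configurable parameter: * `type` (string): the name of the backbone to use	`type`: mit_b0, mit_b1, mit_b2, mit_b3, mit_b4, mit_b5, fan_tiny_8_p4_hybrid, fan_large_16_p4_hybrid, fan_small_12_p4_hybrid, fan_base_16_p4_hybrid
`decode_head`	Dict	None	A dictionary containing the decoder parameters: * `decoder_params`: contains the following network parameters * `embed_dims` (int, default 768): the embedding dimension * `align_corners` (bool, default False): if set to True, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels * `dropout_ratio` (float, default 0.1): the dropout probability for dropping neurons in the network	`embed_dims`: 256, 512, 768; `align_corners`: True, False; `dropout_ratio`: >=0.0
`input_width`	int	512	The input width of the model	>0
`input_height`	int	512	The input height of the model	>0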

dataset

The dataset parameter defines the dataset source, training batch size, and augmentation. An example dataset config is provided below.

dataset:
  data_root: /tlt-pytorch
  input_type: "grayscale"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
    to_rgb: True
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  val_dataset:
    img_dir: /data/images/val
    ann_dir: /data/masks/val
  palette:
    - seg_class: foreground
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: foreground
    - seg_class: background
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: background
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

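Parameter	Datatype	Default	Description	Supported Values
`img_norm_cfg`	Dict	None	The image normalization config, which contains the following parameters: * `mean` (List[float], default [123.675, 116.28, 103.53]): the mean to subtract during preprocessing * `std` (List[float], default [58.395, 57.12, 57.375]): the standard deviation by which to divide the image * `to_rgb` (bool, default True): whether to convert the input format from BGR to RGB	`mean`: >=0, <=255; `std`: >=0.0; `to_rgb`: True, False
`input_type`	String	"rgb"	Whether the input type is RGB or grayscale	"rgb", "grayscale"
`palette`	List[Dict]	None	The palette config: * `seg_class` (string, default background): the segmentation class * `mapping_class` (string, default background): the class into which this class is grouped * `label_id` (int, default 0): the integer class ID * `rgb` (List[int], default [255, 255, 255]): the color overlaid for this class during inference	`seg_class`: string; `mapping_class`: string; `label_id`: >=0; `rgb`: >=0, <=255
`batch_size`	unsigned int	32	The batch size for training and validation	>0
`workers_per_gpu`	unsigned int	8	The number of worker processes that load data in parallel	>0
`train_dataset`	dict config	None	The parameters that define the training dataset: * `img_dir` (str, default None): the path to the image directory * `ann_dir` (str, default None): the path to the PNG mask directory * `pipeline` (dict config, default None) * `augmentation_config` (dict config): the augmentation config details (for more information, see augmentation_config) * `Pad` (dict config): the padding augmentation config * `size_ht` (int): the height to which to pad the image/mask * `size_wd` (int): the width to which to pad the image/mask * `pad_val` (int): the padding value for the input image * `seg_pad_val` (int): the padding value for the segmentation	`size_ht`: 1024; `size_wd`: 1024; `pad_val`: 0; `seg_pad_val`: 255
`val_dataset`	dict config	None	The validation config, which contains the following parameters for validation during training: * `img_dir` (str, default None): the path to the image directory * `ann_dir` (str, default None): the path to the PNG mask directory * `pipeline` (dict config, default None) * `multi_scale` (List[int], default [2048, 1024]): the maximum scale of the image	`multi_scale`: >=0
`test_dataset`	dict config	None	The parameters that define the test dataset: * `img_dir` (str, default None): the path to the image directory * `ann_dir` (str, default None): the path to the PNG mask directory * `pipeline` (dict config, default None) * `multi_scale` (List[int], default [2048, 1024]): the maximum scale of the image	`multi_scale`: >=0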
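
As an illustration of how the `palette` entries relate label IDs to overlay colors, the sketch below colors a tiny mask of label IDs using the two-class palette from the sample spec. The `colorize` helper is hypothetical and only demonstrates the lookup; it is not how TAO renders its overlays.

```python
# The two-class palette from the sample spec in this document.
palette = [
    {"seg_class": "foreground", "rgb": [0, 0, 0],
     "label_id": 0, "mapping_class": "foreground"},
    {"seg_class": "background", "rgb": [255, 255, 255],
     "label_id": 1, "mapping_class": "background"},
]


def colorize(mask, palette):
    """Replace each label ID in a 2D mask with the RGB color of its class."""
    id_to_rgb = {entry["label_id"]: tuple(entry["rgb"]) for entry in palette}
    return [[id_to_rgb[label] for label in row] for row in mask]


mask = [[0, 1], [1, 1]]
print(colorize(mask, palette))
# [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (255, 255, 255)]]
```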

augmentation_config

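Parameter	Datatype	Default	Description	Supported Values
`random_crop`	Dict	None	The random_crop config has the following parameters: * `crop_size` (List[int], default [1024, 1024]): the crop size used for augmentation * `cat_max_ratio` (float, default 0.75)	`crop_size`: 0 < h, w <= img_ht, img_wd; `cat_max_ratio`: >=0.0
`resize`	Dict	None	The resize config has the following configurable parameters: * `img_scale`: the [height, width] scale to which the input image should be rescaled * `ratio_range` (default [0.5, 2.0]): a ratio is randomly sampled from the range specified by `ratio_range` and then multiplied by `img_scale` to generate the sampled scale * `keep_ratio` (bool, default True): whether to preserve the aspect ratio	`img_scale`: >=0; `ratio_range`: >=0.0; `keep_ratio`: True/False
`random_flip`	Dict	None	The random_flip config contains the following parameter for flip augmentation: * `prob` (default 0.5): the probability with which an image should be flipped	`prob`: >=0.0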
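
The interaction between `ratio_range` and `img_scale` can be sketched as follows. The sketch assumes the ratio is drawn uniformly from the range and that the scaled sides are truncated to integers; both are assumptions for illustration, as are the helper name `sample_scale` and the `img_scale` value of [512, 512].

```python
import random


def sample_scale(img_scale, ratio_range, rng=random):
    """Sample a ratio from ratio_range and multiply img_scale by it."""
    low, high = ratio_range
    ratio = rng.uniform(low, high)
    return [int(side * ratio) for side in img_scale], ratio


rng = random.Random(49)  # fixed seed so the sketch is repeatable
scale, ratio = sample_scale([512, 512], [0.5, 2.0], rng)
print(scale, ratio)
```

With `ratio_range` set to [0.5, 2.0], each training image is rescaled to between half and twice the base scale before cropping and flipping.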

Training the Model

Use the following command to run SegFormer training:

            tao model segformer train [-h] -e <experiment_spec_file>
                    [results_dir=<global_results_dir>]
                    [model.<model_option>=<model_option_value>]
                    [dataset.<dataset_option>=<dataset_option_value>]
                    [train.<train_option>=<train_option_value>]
                    [train.gpu_ids=<gpu indices>]
                    [train.num_gpus=<number of gpus>]

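Required Arguments

The only required argument is the path to the experiment spec:

-e, --experiment_spec: The experiment spec file used to set up the training experiment

Optional Arguments

You can set optional arguments to override the option values in the experiment spec file.

-h, --help: Show this help message and exit.
model.<model_option>: The model options.
dataset.<dataset_option>: The dataset options.
train.<train_option>: The train options.
train.train_config.optimizer.<optim_option>: The optimizer options.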
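
When many options need to be overridden, it can help to generate the `key.subkey=value` arguments from a nested dictionary. The helper below is a hypothetical convenience, not part of TAO; it only builds the strings that follow the spec file on the command line, and the override values shown are made up for illustration.

```python
def to_overrides(options, prefix=""):
    """Flatten a nested dict into `a.b.c=value` command-line overrides."""
    overrides = []
    for key, value in options.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as model.backbone.
            overrides.extend(to_overrides(value, path))
        else:
            overrides.append(f"{path}={value}")
    return overrides


args = to_overrides({
    "train": {"max_iters": 2000, "num_gpus": 2},
    "model": {"backbone": {"type": "mit_b1"}},
})
print(" ".join(args))
# train.max_iters=2000 train.num_gpus=2 model.backbone.type=mit_b1
```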

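Note

For training, evaluation, and inference, each task exposes two variables: num_gpus and gpu_ids, which default to 1 and [0], respectively. If both are passed but are inconsistent, for example num_gpus = 1 and gpu_ids = [0, 1], they are modified to follow the setting that uses more GPUs, for example num_gpus = 1 -> num_gpus = 2.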
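
The reconciliation rule in the note can be sketched as below. The case where `gpu_ids` lists more devices than `num_gpus` follows the note directly; the opposite case (more GPUs requested than IDs listed) is not spelled out in this document, so the fallback to the first `num_gpus` device indices is an assumption. The function `reconcile_gpus` is hypothetical.

```python
def reconcile_gpus(num_gpus=1, gpu_ids=None):
    """Return a consistent (num_gpus, gpu_ids) pair, preferring the larger setting."""
    gpu_ids = [0] if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) > num_gpus:
        # More IDs than requested GPUs: raise num_gpus to match the ID list.
        num_gpus = len(gpu_ids)
    elif num_gpus > len(gpu_ids):
        # Assumption: fall back to the first num_gpus device indices.
        gpu_ids = list(range(num_gpus))
    return num_gpus, gpu_ids


print(reconcile_gpus(1, [0, 1]))  # (2, [0, 1])
```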

Evaluating the Model

The evaluation metric for SegFormer is the mean IoU. For more details on the mean IoU metric, see meanIOU.

Use the following command to run SegFormer evaluation:

            tao model segformer evaluate -e <experiment_spec>
                    evaluate.checkpoint=<evaluation model>
                    results_dir=<path to output evaluation results>
                    [evaluate.gpu_ids=<gpu indices>]
                    [evaluate.num_gpus=<number of gpus>]

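Required Arguments

-e, --experiment_spec_file: The experiment spec file used to set up the evaluation experiment.
evaluate.checkpoint: The .pth model.

The following is sample output from the SegFormer evaluation command: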

+------------+-------+-------+
| Class      | IoU   | Acc   |
+------------+-------+-------+
| foreground | 37.81 | 44.56 |
| background | 83.81 | 95.51 |
+------------+-------+-------+
Summary:

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 60.81 | 70.03 | 85.26 |
+--------+-------+-------+-------+
  ...
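
In the summary above, mIoU is the mean of the per-class IoU values: (37.81 + 83.81) / 2 = 60.81. The sketch below shows how such metrics are conventionally derived from a confusion matrix, with IoU = TP / (TP + FP + FN), Acc as per-class pixel accuracy, and aAcc as overall pixel accuracy. These are the usual definitions in MMSeg-style evaluation and are assumed rather than taken from the TAO source; the pixel counts are made up, and the sketch assumes every class occurs at least once.

```python
def segmentation_metrics(confusion):
    """Compute IoU/Acc per class plus mIoU, mAcc, and aAcc.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    ious, accs = [], []
    for c in range(n):
        tp = confusion[c][c]
        gt = sum(confusion[c])                          # pixels labelled c
        pred = sum(confusion[r][c] for r in range(n))   # pixels predicted as c
        ious.append(tp / (gt + pred - tp))              # TP / (TP + FP + FN)
        accs.append(tp / gt)                            # per-class accuracy
    return {
        "IoU": ious,
        "Acc": accs,
        "mIoU": sum(ious) / n,
        "mAcc": sum(accs) / n,
        "aAcc": sum(confusion[c][c] for c in range(n)) / total,
    }


# Made-up pixel counts for two classes (not taken from the run above).
metrics = segmentation_metrics([[40, 10], [20, 130]])
print(metrics)
```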

Running Inference on the Model

Use the following command to run inference on SegFormer with the .pth model.

            tao model segformer inference -e <experiment_spec>
                    inference.checkpoint=<inference model>
                    results_dir=<path to output directory for inference>
                    [inference.gpu_ids=<gpu indices>]
                    [inference.num_gpus=<number of gpus>]

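The output mask PNG images with class IDs are saved in vis_tao. The overlaid mask images are saved in mask_tao.

Required Arguments

-e, --experiment_spec: The experiment spec file used to set up inference
inference.checkpoint: The .pth model used to perform inference
results_dir: The path where the inference masks and mask overlay images are saved. Inference creates two directories.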

Exporting the Model

Use the following command to export the model.

            tao model segformer export [-h] -e <experiment spec file>
                    results_dir=<path to results dir>
                    export.checkpoint=<trained pth model to be exported>
                    export.onnx_file=<onnx path>

Required Arguments

-e, --experiment_spec: The path to the experiment spec file
results_dir: The path where the export logs are saved
export.checkpoint: The .pth model to be exported
export.onnx_file: The `.onnx` file to be stored

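TensorRT Engine Generation, Validation, and INT8 Calibration

For deployment, refer to the TAO Deploy documentation.

Deploying to DeepStream

For more information about deploying a SegFormer model to DeepStream, see the Integrating a SegFormer Model page.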