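Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions, or on features not yet available in 2.0, refer to the NeMo 24.07 documentation.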

Common Configuration Files

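This section describes the NeMo configuration file setup that is specific to models in the MM Text2Img collection. For general information about setting up and running experiments that is common to all NeMo models (e.g. Experiment Manager and PyTorch Lightning trainer parameters), see the Core Documentation section.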

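The model section of a NeMo Multimodal Text2Img configuration file generally requires information about the dataset(s) being used, the text and image encoders, the parameters for any augmentation being performed, and the model architecture specification. The sections on this page cover each of these in more detail.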

Example configuration files for all of the NeMo Multimodal Text2Img scripts can be found in the config directory of the examples.

Dataset Configuration

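Training, validation, and test parameters are specified in the train, validation, and test sections of the configuration file, respectively. Depending on the task, there may also be arguments that specify dataset augmentations, a resolution filter for filtering out images, and so on.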

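Any initialization parameter accepted by the Dataset class used in the experiment can be set in the config file. Refer to the Datasets section of the API for a list of datasets and their respective parameters.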

An example Text2Img training configuration should look similar to the following:

model:
  data:
    num_workers: 16 # The number of workers for dataloader process
    train:
      dataset_path: # List of wdinfo files for the datasets to train on
        - dataset1.pkl
        - dataset2.pkl
      augmentations:
        resize_smallest_side: 64 # Resize the smallest side of the image to the specified resolution
        center_crop_h_w: 64, 64 # Center cropping
        horizontal_flip: False # Whether to perform horizontal flip
      filterings:
        resolution:
          method: larger
          value: 64
    webdataset:
      use_webdataset: True
      infinite_sampler: false
      local_root_path: ??? # Path that stores the dataset
      verbose: False # Whether to print detail debugging information

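Currently, our diffusion-based Text2Img models do not require validation steps, which allows faster convergence. As discussed in Datasets, storing the training dataset in webdataset format is a requirement for all text2img training pipelines. Using webdataset.infinite_sampler=True is the preferred way to train, especially when the dataset is large, as suggested by the Webdataset Multinode Training Guideline.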

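Enabling train.filterings lets you filter out images (and their corresponding text pairs) based on common use cases (e.g. minimum resolution) without having to create a redundant subset of the webdataset on disk before training. The example above shows how to filter the dataset so that only images with a resolution larger than 64x64 are used for training. Concatenating multiple webdatasets is as easy as listing all of their wdinfo files in train.dataset_path.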
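To make the filtering rule concrete, here is a minimal Python sketch of a resolution check driven by the `filterings.resolution` settings. It is an illustration only, not NeMo's implementation: the function name `keep_sample` is hypothetical, and it assumes that `method: larger` keeps an image when both sides are at least `value` pixels (whether the boundary is inclusive is an assumption).

```python
def keep_sample(width: int, height: int, method: str, value: int) -> bool:
    """Decide whether an image passes a resolution filter.

    Hypothetical helper: assumes 'larger' means both sides are >= value.
    """
    if method == "larger":
        return width >= value and height >= value
    # Only the method shown in the example config is sketched here.
    raise ValueError(f"unsupported filtering method: {method}")


# Mirror the example config: method=larger, value=64.
samples = [(128, 96), (64, 64), (63, 200)]
kept = [s for s in samples if keep_sample(s[0], s[1], "larger", 64)]
```

Because the check runs per sample at load time, no filtered copy of the dataset has to be written to disk.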

Trainer Configuration

The trainer configuration specifies the arguments for the PyTorch Lightning Trainer object.

trainer:
  devices: 1 # number of GPUs (0 for CPU), or list of the GPUs to use e.g. [0, 1]
  num_nodes: 1
  max_epochs: -1
  max_steps: 2500000 # precedence over max_epochs
  logger: False  # Provided by exp_manager
  precision: bf16 # Should be set to 16 for O1 and O2 to enable the AMP.
  accelerator: gpu
  log_every_n_steps: 5  # Interval of logging.
  resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
  num_sanity_val_steps: 10 # number of validation steps to run as a sanity check before training starts; setting it to 0 disables the check
  enable_checkpointing: False # Provided by exp_manager
  accumulate_grad_batches: 1 # do not modify, grad acc is automatic for training megatron models
  gradient_clip_val: 1.0
  benchmark: False
  enable_model_summary: True

Refer to the PyTorch Lightning Trainer API section for all possible arguments.

Experiment Manager Configuration

The NeMo Experiment Manager provides a convenient way to configure logging, saving, resuming options, and more.

exp_manager:
  exp_dir: null  # exp_dir for your experiment, if None, defaults to "./nemo_experiments"
  name: ${name}
  create_wandb_logger: True  # Whether you want exp_manager to create a Wandb logger
  wandb_logger_kwargs:
    name: training-session
    project: text2img
    group: nemo
    resume: True
  create_tensorboard_logger: True  # Whether you want exp_manager to create a tb logger
  create_checkpoint_callback: True  # Whether you want exp_manager to create a model checkpoint callback
  checkpoint_callback_params:
    monitor: reduced_train_loss
    save_top_k: 5
    every_n_epochs: 0 # Save checkpoint frequency.
    every_n_train_steps: 1000 # Mutually exclusive with every_n_epochs. It is recommended to set this if training on large-scale dataset.
    filename: '${name}--{reduced_train_loss:.2f}-{step}-{consumed_samples}'
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  resume_from_checkpoint: ${model.resume_from_checkpoint}
  ema:
    enable: True
    decay: 0.9999
    validate_original_weights: False
    every_n_steps: 1
    cpu_offload: False

The EMA feature can be enabled by setting exp_manager.ema.enable=True.
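For readers unfamiliar with EMA (exponential moving average) of weights, the following Python sketch shows the standard update rule that the `decay` and `every_n_steps` settings refer to. It is a plain-Python illustration on lists of floats, not NeMo's callback; the function name `ema_update` and the toy values are assumptions made for this example.

```python
from typing import List


def ema_update(shadow: List[float], params: List[float], decay: float) -> List[float]:
    """Standard EMA step: shadow <- decay * shadow + (1 - decay) * param."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy training loop: the shadow weights slowly track the live weights.
decay = 0.9999        # matches exp_manager.ema.decay in the example config
every_n_steps = 1     # matches exp_manager.ema.every_n_steps
params = [0.0]
shadow = list(params)
for step in range(1, 101):
    params = [p + 0.01 for p in params]  # stand-in for an optimizer step
    if step % every_n_steps == 0:
        shadow = ema_update(shadow, params, decay)
```

With a decay this close to 1, the shadow weights change very slowly, which is why EMA weights are typically smoother than the live weights.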

Optimizer Configuration

optim:
  name: fused_adam
  lr: 0.0001
  eps: 1e-8
  betas: [ 0.9, 0.999 ]
  weight_decay: 0.01
  sched:
    name: WarmupPolicy
    warmup_steps: 10000
    warmup_ratio: null

By default we use fused_adam as the optimizer; refer to the NeMo user guide for all supported optimizers. The learning rate scheduler can be specified in the optim.sched section.
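As a rough illustration of what a warmup schedule does with `lr` and `warmup_steps`, the sketch below ramps the learning rate linearly and then holds it constant. This is an assumption about the general shape of a warmup policy, not a reproduction of NeMo's `WarmupPolicy` class; the function name `warmup_lr` and the exact ramp formula are hypothetical.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then constant (assumed schedule shape)."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp from a small positive value up to base_lr.
        return base_lr * (step + 1) / (warmup_steps + 1)
    return base_lr


# Values from the example config: lr=0.0001, warmup_steps=10000.
early = warmup_lr(0, 1e-4, 10000)
halfway = warmup_lr(4999, 1e-4, 10000)
after = warmup_lr(20000, 1e-4, 10000)
```

Here `early < halfway < after`, and `after` equals the configured `lr`.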

Model Architecture Configuration

Each configuration file should describe the model architecture used for the experiment.

The following parameters in the model section are shared among most of the MM Text2Img models:

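Parameter	Datatype	Description
`micro_batch_size`	int	Micro batch size that fits on each GPU
`global_batch_size`	int	Global batch size, taking gradient accumulation and data parallelism into account
`inductor`	bool	Enables TorchInductor optimization
`channels_last`	bool	Enables NHWC training format
`seed`	int	Seed used in training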
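The relationship between `micro_batch_size` and `global_batch_size` can be sketched as follows. This assumes the usual convention that the global batch equals the micro batch times the number of data-parallel ranks times the number of gradient accumulation steps; the function name `grad_accumulation_steps` and the `data_parallel_size` argument are hypothetical names used only for this example.

```python
def grad_accumulation_steps(global_batch_size: int,
                            micro_batch_size: int,
                            data_parallel_size: int) -> int:
    """Derive accumulation steps from the batch-size settings.

    Assumes: global = micro * data_parallel_size * accumulation_steps.
    """
    per_step = micro_batch_size * data_parallel_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            "global_batch_size must be divisible by "
            "micro_batch_size * data_parallel_size"
        )
    return global_batch_size // per_step


# Example: 8 GPUs in data parallel, 8 samples per GPU, global batch of 256.
steps = grad_accumulation_steps(256, 8, 8)
```

Under this assumption, the accumulation count follows from the two batch sizes, which is consistent with the trainer config comment that `accumulate_grad_batches` should not be modified by hand.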