Troubleshooting an Unsuccessful Firmware Update

Firmware Update Terminates Because a Component Was Not Found

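When you update the firmware for the GPU tray with the motherboard firmware package, the update stops and displays the following output message: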

...
{
  "@odata.type": "#Message.v1_0_8.Message",
  "Message": "Given PLDMBundle Status Message : Requested component was not found in the firmware bundle.",
  "MessageArgs": [
    "Requested component was not found in the firmware bundle."
  ],
  "MessageId": "UpdateService.1.0.FwUpdateStatusMessage",
  "Resolution": "None",
  "Severity": "Warning"
},
...

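The message indicates that the firmware file specified with the -p argument of the nvfwupd command is not valid for the target component. Retry the update and specify the firmware file that matches the component. For example, for a GPU tray update, use the GPU firmware file that contains the HGX string in its name. For firmware file names and components, refer to version 25.01.1.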
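The file-name check described above can be automated before starting an update. The following is a minimal sketch, not part of nvfwupd: the helper name `looks_like_gpu_tray_package` and the example file names are hypothetical, and the only rule it applies is the one stated above (a GPU tray package name contains the HGX string).

```python
from pathlib import Path


def looks_like_gpu_tray_package(package_path: str) -> bool:
    """Return True if the package file name contains the string 'HGX'.

    Hypothetical pre-flight helper: it inspects only the file name,
    not the contents of the firmware bundle.
    """
    return "HGX" in Path(package_path).name.upper()


if __name__ == "__main__":
    # Both file names are placeholders for illustration only.
    for name in ("nvfw_HGX_xxxx_xxxxxx.x.x.fwpkg",
                 "nvfw_DGXB200_xxxx_xxxxxx.x.x.fwpkg"):
        verdict = "GPU tray package" if looks_like_gpu_tray_package(name) else "not a GPU tray package"
        print(f"{name}: {verdict}")
```

A name check of this kind only catches the most common mix-up; the update service remains the authority on whether the bundle contains the requested component.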

No Devices Were Detected for Handle ID 0

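When you update the firmware with the Redfish API, the following output message indicates that the firmware file specified in the -F UpdateFile= argument is not the correct file for the component specified in the JSON file.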

...
{
  "@odata.type": "#Message.v1_0_8.Message",
  "Message": "Given PLDMBundle Status Message : No devices where detected for handle id 0.",
  "MessageArgs": [
    "No devices where detected for handle id 0"
  ],
  "MessageId": "UpdateService.1.0.FwUpdateStatusMessage",
  "Resolution": "None",
  "Severity": "Warning"
},
...

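Retry the update and specify the firmware file that matches the component. For information about using the Redfish API, refer to Redfish API Support in the NVIDIA DGX B200 System User Guide.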
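Both status messages shown so far share the same MessageId and a Warning severity, so they can be picked out of a task's message list programmatically. The sketch below assumes the messages have already been collected into a list of dictionaries with the fields shown in the output above; the function name `pldm_status_warnings` is hypothetical, and how the list is retrieved is outside the scope of this example.

```python
from typing import Any, Dict, List

# MessageId taken from the output messages shown in this section.
PLDM_STATUS_ID = "UpdateService.1.0.FwUpdateStatusMessage"


def pldm_status_warnings(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the text of each PLDM bundle status message with Warning severity.

    Prefers the first entry of MessageArgs and falls back to the
    Message field when MessageArgs is missing or empty.
    """
    warnings = []
    for msg in messages:
        if msg.get("MessageId") != PLDM_STATUS_ID:
            continue
        if msg.get("Severity") != "Warning":
            continue
        args = msg.get("MessageArgs") or []
        warnings.append(args[0] if args else msg.get("Message", ""))
    return warnings


if __name__ == "__main__":
    sample = [
        {
            "MessageId": PLDM_STATUS_ID,
            "Severity": "Warning",
            "Message": "Given PLDMBundle Status Message : No devices where detected for handle id 0.",
            "MessageArgs": ["No devices where detected for handle id 0"],
        }
    ]
    for text in pldm_status_warnings(sample):
        print("Firmware file mismatch suspected:", text)
```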

Wait for FirmwareUpdateStarted Id

The output of an unsuccessful firmware update with the nvfwupd command might look like the following example:

FW recipe: ['nvfw_DGXB200_xxxx_xxxxxx.x.x.fwpkg']
{"@odata.type": "#UpdateService.v1_6_0.UpdateService", "Messages": [{"@odata.type": "#Message.v1_0_8.Message", "Message": "A new task /redfish/v1/TaskService/Tasks/4 was created.", "MessageArgs": ["/redfish/v1/TaskService/Tasks/4"], "MessageId": "Task.1.0.New", "Resolution": "None", "Severity": "OK"}, {"@odata.type": "#Message.v1_0_8.Message", "Message": "The action UpdateService.MultipartPush was submitted to do firmware update.", "MessageArgs": ["UpdateService.MultipartPush"], "MessageId": "UpdateService.1.0.StartFirmwareUpdate", "Resolution": "None", "Severity": "OK"}]}
FW update started, Task Id: 4

Wait for FirmwareUpdateStarted Id in Messages
Wait for FirmwareUpdateStarted Id in Messages
 Task Message: Task /redfish/v1/UpdateService/upload has stopped due to an exception condition.
Firmware update failed, retry the firmware update

As the command output indicates, retry the firmware update.
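If updates are scripted, the retry can be wrapped in a small loop. The following is a minimal sketch under stated assumptions: `run_update` is a hypothetical callable, supplied by you, that starts one update attempt and returns the command output as text. The loop looks only for the failure line shown in the example output, so the absence of that line is not proof that the update succeeded.

```python
from typing import Callable, Tuple

# Failure line copied from the example output in this section.
FAILURE_MARKER = "Firmware update failed, retry the firmware update"


def update_with_retry(run_update: Callable[[], str],
                      max_attempts: int = 3) -> Tuple[bool, int]:
    """Call run_update until its output no longer contains the failure line.

    Returns (no_failure_seen, attempts_used). run_update is a hypothetical
    caller-supplied function that performs one update attempt and returns
    the captured output.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        output = run_update()
        if FAILURE_MARKER not in output:
            return True, attempt
    return False, max_attempts


if __name__ == "__main__":
    # Simulated outputs: the first attempt fails, the second does not.
    simulated = iter([
        "Wait for FirmwareUpdateStarted Id in Messages\n" + FAILURE_MARKER + "\n",
        "FW update started, Task Id: 5\n",
    ])
    print(update_with_retry(lambda: next(simulated)))
```

Capping the number of attempts keeps a persistent fault, such as a wrong firmware file, from retrying indefinitely.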