Appendix A: Helm Chart Values

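Deployment Parameters

Name	Description	Value
`affinity`	Affinity settings for the deployment. Allows constraining Pods to nodes.	`{}`
`securityContext`	Specify privilege and access control settings for the container (affects only the main container).	`{}`
`envVars`	Add arbitrary environment variables to the main container as key-value pairs.	`{}`
`extraVolumes`	Add arbitrary additional volumes to the deployment set definition.	`{}`
`image.repository`	NIM-LLM image repository.	`""`
`image.tag`	Image tag.	`""`
`image.pullPolicy`	Image pull policy.	`""`
`imagePullSecrets`	Specify the secret names needed for the main container and any init containers. Object keys are the names of the secrets.	`{}`
`nodeSelector`	Specify labels to ensure that NeMo Inference is deployed only on certain nodes (depending on the cluster setup, it is likely best to set this to `nvidia.com/gpu.present: "true"`).	`{}`
`podAnnotations`	Specify additional annotations for the main deployment Pods.	`{}`
`podSecurityContext`	Specify privilege and access control settings for the Pod (affects only the main Pod).
`podSecurityContext.runAsUser`	Specify the user UID for the Pod.	`1000`
`podSecurityContext.runAsGroup`	Specify the group ID for the Pod.	`1000`
`podSecurityContext.fsGroup`	Specify the file system owner group ID.	`1000`
`replicaCount`	Specify the replica count for the deployment.	`1`
`resources`	Specify resource limits and requests for the running service.
`resources.limits.nvidia.com/gpu`	Specify the number of GPUs to present to the running service.	`1`
`serviceAccount.create`	Specify whether a service account should be created.	`false`
`serviceAccount.annotations`	Specify annotations to add to the service account.	`{}`
`serviceAccount.automount`	Specify whether to automatically mount the service account into the container.	`{}`
`serviceAccount.name`	Specify the name of the service account to use. If not set and create is true, a name is generated using the fullname template.	`""`
`tolerations`	Specify tolerations for Pod assignment. Allows the scheduler to place Pods on nodes with matching taints.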
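
As a rough illustration of how the keys in this table fit together, an override file might look like the sketch below. The file name, image repository, and tag are placeholders invented for this example and are not chart defaults. The nesting is inferred from the dotted key names in the table.

```yaml
# custom-values.yaml (hypothetical file name)
image:
  repository: registry.example.com/my-org/my-nim  # placeholder, not a real image
  tag: "1.0.0"                                     # placeholder tag
  pullPolicy: IfNotPresent

# Schedule only on nodes that advertise a GPU, as the table suggests.
nodeSelector:
  nvidia.com/gpu.present: "true"

replicaCount: 1

resources:
  limits:
    nvidia.com/gpu: 1   # GPUs presented to the running service

# Same IDs as the documented defaults, shown here only for the structure.
podSecurityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
```

A file like this would typically be passed with `helm install <release> <chart> -f custom-values.yaml`, where the release and chart names depend on your environment.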

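Autoscaling Parameters

Values used for autoscaling. These values are ignored if autoscaling is not enabled. Override them on a per-model basis, guided by quality-of-service metrics as well as cost metrics. Enabling autoscaling is not recommended unless you use the custom metrics API (for example, through prometheus-adapter); the standard CPU and memory metrics are of limited use for scaling NIM.

Name	Description	Value
`autoscaling.enabled`	Enable the horizontal Pod autoscaler.	`false`
`autoscaling.minReplicas`	Specify the minimum number of replicas for autoscaling.	`1`
`autoscaling.maxReplicas`	Specify the maximum number of replicas for autoscaling.	`10`
`autoscaling.metrics`	Array of metrics used for autoscaling.	`[]`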
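
The sketch below shows one way the autoscaling values could be filled in with a custom metric. It assumes that the chart passes `autoscaling.metrics` through to the `metrics` field of a Kubernetes `autoscaling/v2` HorizontalPodAutoscaler, which this appendix does not state. The metric name `example_queue_depth` is hypothetical and would have to be published through the custom metrics API, for example by prometheus-adapter.

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 4
  metrics:
    # Entry in autoscaling/v2 MetricSpec form (assumed to be passed through as-is).
    - type: Pods
      pods:
        metric:
          name: example_queue_depth   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "10"          # scale out when the per-Pod average exceeds 10
```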

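Ingress Parameters

Name	Description	Value
`ingress.enabled`	Enable Ingress.	`false`
`ingress.className`	Specify the class name for the Ingress.	`""`
`ingress.annotations`	Specify additional annotations for the Ingress.	`{}`
`ingress.hosts`	Specify the list of hosts, each containing a list of paths.
`ingress.hosts[0].host`	Specify the host name.	`chart-example.local`
`ingress.hosts[0].paths[0].path`	Specify the Ingress path.	`/`
`ingress.hosts[0].paths[0].pathType`	Specify the path type.	`ImplementationSpecific`
`ingress.hosts[0].paths[0].serviceType`	Specify the service type. It can be nemo or openai; make sure that your model serves the appropriate port.	`openai`
`ingress.tls`	Specify a list of pairs of TLS secretName and hosts.	`[]`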
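
To make the indexed keys above more concrete, here is a minimal sketch of an Ingress override. The host name, class name, and TLS secret name are placeholders chosen for this example. The shape of each `ingress.tls` entry (a `secretName` plus a `hosts` list) is an assumption based on the table's description.

```yaml
ingress:
  enabled: true
  className: nginx            # assumes an NGINX ingress controller is installed
  hosts:
    - host: nim.example.com   # placeholder host name
      paths:
        - path: /
          pathType: ImplementationSpecific
          serviceType: openai # route to the OpenAI-compatible port
  tls:
    - secretName: nim-example-tls   # hypothetical pre-created TLS secret
      hosts:
        - nim.example.com
```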

Probe Parameters

Name	Description	Value
`livenessProbe.enabled`	Enable the livenessProbe.	`true`
`livenessProbe.method`	LivenessProbe method, either http or script; no script is currently provided.	`http`
`livenessProbe.path`	LivenessProbe endpoint path.	`/v1/health/live`
`livenessProbe.initialDelaySeconds`	Initial delay in seconds for the livenessProbe.	`15`
`livenessProbe.timeoutSeconds`	Timeout in seconds for the livenessProbe.	`1`
`livenessProbe.periodSeconds`	Period in seconds for the livenessProbe.	`10`
`livenessProbe.successThreshold`	Success threshold for the livenessProbe.	`1`
`livenessProbe.failureThreshold`	Failure threshold for the livenessProbe.	`3`
`readinessProbe.enabled`	Enable the readinessProbe.	`true`
`readinessProbe.path`	Readiness endpoint path.	`/v1/health/ready`
`readinessProbe.initialDelaySeconds`	Initial delay in seconds for the readinessProbe.	`15`
`readinessProbe.timeoutSeconds`	Timeout in seconds for the readinessProbe.	`1`
`readinessProbe.periodSeconds`	Period in seconds for the readinessProbe.	`10`
`readinessProbe.successThreshold`	Success threshold for the readinessProbe.	`1`
`readinessProbe.failureThreshold`	Failure threshold for the readinessProbe.	`3`
`startupProbe.enabled`	Enable the startupProbe.	`true`
`startupProbe.path`	StartupProbe endpoint path.	`/v1/health/ready`
`startupProbe.initialDelaySeconds`	Initial delay in seconds for the startupProbe.	`40`
`startupProbe.timeoutSeconds`	Timeout in seconds for the startupProbe.	`1`
`startupProbe.periodSeconds`	Period in seconds for the startupProbe.	`10`
`startupProbe.successThreshold`	Success threshold for the startupProbe.	`1`
`startupProbe.failureThreshold`	Failure threshold for the startupProbe.	`180`
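
With the defaults above, the startup probe gives a container roughly 40 + 180 × 10 = 1840 seconds (about 30 minutes) to become ready before Kubernetes restarts it. The sketch below doubles that budget, which may help when a large model takes longer to load. The value 360 is only an illustration, not a recommendation from the chart.

```yaml
startupProbe:
  enabled: true
  path: /v1/health/ready
  initialDelaySeconds: 40
  periodSeconds: 10
  # Roughly 40 + 360 * 10 = 3640 s (about an hour) before the container is restarted.
  failureThreshold: 360
```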

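Storage Parameters

Name	Description	Value
`persistence`	Specify settings to modify the path `/model-store` if `model.legacyCompat` is enabled; otherwise, modify the `/.cache` volume from which the model is served.
`persistence.enabled`	Enable persistent volumes.	`false`
`persistence.existingClaimName`	Specify an existing claim. If you use existingClaim, run only one replica or use a ReadWriteMany storage setup.	`""`
`persistence.class`	Specify the persistent volume storage class. If null (the default), no storageClassName spec is set and the default provisioner is chosen.	`""`
`persistence.retain`	Specify whether the persistent volume should be retained when the helm chart is upgraded or deleted.	`""`
`persistence.createPV`	True if you need the chart to create a PV for hostPath use cases.	`false`
`persistence.accessMode`	Specify accessModes. If you use NFS or a similar setup, you can use ReadWriteMany.	`ReadWriteOnce`
`persistence.size`	Specify the size of the claim (for example, 8Gi).	`50Gi`
`hostPath`	Configure the model cache on local disk on the nodes using hostPath, for special cases. Investigate and understand the security implications before using this option.	`""`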
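
The constraint on `persistence.existingClaimName` can be sketched as follows: several replicas share one pre-created claim, so that claim needs to support ReadWriteMany. The claim name `nim-model-cache` is hypothetical, and the example assumes the PersistentVolumeClaim already exists in the release namespace.

```yaml
replicaCount: 2

persistence:
  enabled: true
  # Hypothetical PVC created outside the chart. With more than one replica,
  # the claim itself must have been provisioned with ReadWriteMany access
  # (for example, on NFS or a similar shared file system).
  existingClaimName: nim-model-cache
```

With a single replica, leaving `existingClaimName` empty and setting only `persistence.enabled`, `persistence.size`, and optionally `persistence.class` would be the simpler choice.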

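Service Parameters

Name	Description	Value
`service.type`	Specify the service type for the deployment.	`ClusterIP`
`service.name`	Override the default service name.	`""`
`service.http_port`	Specify the HTTP port for the service.	`8080`
`service.annotations`	Specify additional annotations to add to the service.	`{}`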

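OpenTelemetry Parameters

Name	Description	Value
`zipkinDeployed`	Specify whether this chart should deploy zipkin for metrics collection.	`false`
`otelDeployed`	Specify whether this chart should deploy OpenTelemetry for metrics collection.	`false`
`otelEnabled`	Specify whether this chart should sink metrics to OpenTelemetry.	`false`
`otelEnvVars`	Environment variables that configure OTEL in the container; the chart has sane defaults.	`{}`
`logLevel`	Log level to set for the container and metrics collection.	`{}`

The OpenTelemetry configuration can be found in the values section of the OpenTelemetry repository.

Note

Configure the OpenTelemetry exporters according to your needs. The provided helm chart includes sample configurations for exporting traces to Zipkin and metrics to an OTLP-compatible receiver, stored in opentelemetry-collector.config.exporters.zipkin and opentelemetry-collector.config.exporters.otlp, respectively.

For example, if your metrics setup is pull-based and you want to expose NIM metrics in Prometheus format, you can do so by replacing the OTLP exporter with the Prometheus exporter.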
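
A minimal sketch of that replacement is shown below. It assumes that the collector image bundled with the chart includes the Prometheus exporter, that the metrics pipeline keeps the receivers and processors defined in the chart defaults, and that port 8889 (chosen here only for illustration) is reachable by your Prometheus scraper. None of these points is confirmed by this appendix, so check them against the chart's actual values.

```yaml
otelDeployed: true
otelEnabled: true

opentelemetry-collector:
  config:
    exporters:
      # Prometheus exporter from the OpenTelemetry Collector: serves metrics
      # on an HTTP endpoint for a scraper to pull.
      prometheus:
        endpoint: "0.0.0.0:8889"   # illustrative port, not a chart default
    service:
      pipelines:
        metrics:
          # Point the metrics pipeline at the Prometheus exporter instead of OTLP.
          exporters: [prometheus]
```

Because Helm merges maps from override files into the chart defaults, the original `otlp` exporter definition stays in the rendered configuration; only the pipeline's exporter list changes here.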