Structured Generation#
NIM for VLMs supports structured outputs: you can specify a JSON schema, a regular expression, or a context-free grammar, or constrain the output to a set of specific choices.
This can be useful when the NIM is part of a larger pipeline and the VLM output needs to follow a particular format, for example to:
Ensure consistent output formats for downstream processing
Validate complex data structures
Automate data extraction from unstructured text
Improve reliability in multi-step pipelines
The following examples show how to constrain the output in different ways.
Note: Only the OpenAI endpoint exposes the input fields used for structured generation.
JSON Schema#
You can constrain the output to a specific JSON schema by using the response_format parameter in the OpenAI schema with json_schema as the type. For details, see the OpenAI documentation (the section on the new options for the response_format parameter).
NVIDIA recommends that you use the json_schema type rather than json_object to specify a JSON schema. The json_object type lets the model generate any valid JSON, including an empty one.
Example: Extracting Information from a Movie Poster#
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
# Define the pydantic models for the response format
class Date(BaseModel):
    day: int = Field(ge=1, le=31)
    month: int = Field(ge=1, le=12)
    year: Optional[int] = Field(ge=1895)

class MovieDetails(BaseModel):
    title: str
    release_date: Date
    publishers: List[str]
# Prepare the question and input image
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Look at the poster image. Return the title and other information about this movie in JSON format."
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://vignette1.wikia.nocookie.net/disney/images/f/f2/Walleposter.jpg"
            }
        }
    ]},
]
# Send the request with `json_schema`
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "MovieDetails", "schema": MovieDetails.model_json_schema()}
    }
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# { "title": "WALL-E", "release_date": {"year": 2008, "month": 6, "day": 27}, "publishers": ["Walt Disney Pictures", "Pixar Animation Studios"] }
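Because the completion is constrained to the schema, the returned string can be loaded straight into the Pydantic model for downstream use. The following is a minimal sketch that assumes the assistant_message and MovieDetails definitions from the example above (Pydantic v2):
# Validate and parse the schema-constrained JSON string into the pydantic model
movie = MovieDetails.model_validate_json(assistant_message)
print(movie.title)              # WALL-E
print(movie.release_date.year)  # 2008
print(movie.publishers)         # ['Walt Disney Pictures', 'Pixar Animation Studios']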
Newer versions of the OpenAI SDK provide native support for Pydantic objects, as described in the Native SDK Support section. Run pip install -U openai to install the latest SDK version.
response = client.beta.chat.completions.parse(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    response_format=MovieDetails,
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# { "title": "WALL-E", "release_date": {"year": 2008, "month": 6, "day": 27}, "publishers": ["Walt Disney Pictures", "Pixar Animation Studios"] }
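With the parse helper, recent SDK versions also expose the validated Pydantic object directly on the message, so no manual parsing step is needed. A short sketch continuing the request above:
# The parse helper returns the validated pydantic instance on `message.parsed`
movie = response.choices[0].message.parsed
print(type(movie).__name__)  # MovieDetails
print(movie.title)           # WALL-E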
By using a JSON schema, you can ensure that the VLM output conforms to a specific structure, making the generated data easier to process and validate in your application's workflow.
Regular Expression#
You can specify a regular expression for the output format by using the guided_regex parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
regex = "[1-5]"
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Return the number of cars seen in this image"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://cdn.ebaumsworld.com/mediaFiles/picture/202553/84419818.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_regex": regex}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# 2
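The same parameter accepts more elaborate patterns. The following hypothetical request (not part of the original example, reusing the client defined above and the movie poster from the JSON schema section) forces the answer into an ISO-style date:
# Hypothetical example: constrain the answer to a YYYY-MM-DD date string
date_regex = r"\d{4}-\d{2}-\d{2}"
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "When was the movie on this poster released?"},
            {"type": "image_url", "image_url": {"url": "https://vignette1.wikia.nocookie.net/disney/images/f/f2/Walleposter.jpg"}}
        ]},
    ],
    extra_body={"nvext": {"guided_regex": date_regex}},
    stream=False
)
print(response.choices[0].message.content)
# Prints something like:
# 2008-06-27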
Choices#
You can specify a list of choices for the output by using the guided_choice parameter in the nvext extension to the OpenAI schema.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
choices = ["Good", "Bad", "Neutral"]
# We send the list of choices in the prompt to help the model, but this is not
# strictly necessary; the model has to produce one of the choices in any case
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": f"What is the state of pollution in this image? It should be one of {choices}"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://m.media-amazon.com/images/I/51A5iA+lNcL._AC_.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_choice": choices}},
    stream=False
)
assistant_message = response.choices[0].message.content
print(assistant_message)
# Prints:
# Bad
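Because guided_choice guarantees that the completion is exactly one of the provided strings, it can be used directly in downstream logic, for example as a dictionary key, without defensive parsing. A minimal sketch continuing the example above (the score mapping is illustrative):
# The answer is guaranteed to be one of the choices, so it can index a lookup table safely
pollution_score = {"Good": 0, "Neutral": 1, "Bad": 2}
print(pollution_score[assistant_message])
# Prints:
# 2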
Context-Free Grammar#
You can specify a context-free grammar in EBNF format by using the guided_grammar parameter in the nvext extension to the OpenAI schema.
The grammar is defined using the EBNF language.
from openai import OpenAI
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")
grammar = """
?start: "There are " num " cars in this image."
?num: /[1-5]/
"""
messages = [
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "What is in this image?"
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://m.media-amazon.com/images/I/51A5iA+lNcL._AC_.jpg"
            }
        }
    ]},
]
response = client.chat.completions.create(
    model="meta/llama-3.2-11b-vision-instruct",
    messages=messages,
    extra_body={"nvext": {"guided_grammar": grammar}},
    stream=False
)
completion = response.choices[0].message.content
print(completion)
# Prints:
# There are 2 cars in this image.
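Because the grammar fixes the shape of the sentence, the numeric value can be recovered with a trivial pattern match. A short sketch continuing the example above:
import re

# The grammar guarantees the form "There are <1-5> cars in this image."
num_cars = int(re.search(r"[1-5]", completion).group())
print(num_cars)
# Prints:
# 2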