Triton Inference Server Ray Serve Deployment

Using the Triton Inference Server in-process Python API, you can integrate Triton Inference Server based models into any Python framework, including FastAPI and Ray Serve.

This directory contains an example Triton Inference Server Ray Serve deployment based on FastAPI.

Installation

The Stable Diffusion pipeline is based on the Popular_Models_Guide/StableDiffusion tutorial.

git clone https://github.com/triton-inference-server/tutorials.git
cd tutorials/Triton_Inference_Server_Python_API

Note that the following command takes several minutes, depending on your hardware configuration and network connection.

./build.sh --framework diffusion --build-models

The following commands start a container, volume-mount the current directory as the workspace, and change to the example directory.

./run.sh --framework diffusion
cd examples/rayserve

The following command starts a local Ray cluster. It also starts Prometheus and Grafana instances with the default Ray and Ray Serve metrics and dashboards enabled.

./start_ray.sh

The following command starts the Ray Serve deployment.

serve run tritonserver_deployment:deployment

The deployment includes two endpoints:

The identity endpoint accepts a string and returns the same string.

curl --request GET "http://127.0.0.1:8000/identity?string_input=hello_world!"

"hello_world!"

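The same identity request can be sent from Python. The following is a minimal sketch using only the standard library, not part of the example itself. The helper names `identity_url` and `call_identity` are hypothetical, the address is assumed to match the curl command above, and the response body is assumed to be a JSON-encoded string, as the quoted output suggests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the deployment listens on the same address as the curl example.
BASE_URL = "http://127.0.0.1:8000"


def identity_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the identity endpoint, percent-encoding the input."""
    return f"{base_url}/identity?{urlencode({'string_input': text})}"


def call_identity(text: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request and decode the body (requires a running deployment).

    Assumption: the body is a JSON-encoded string such as "hello_world!".
    """
    with urlopen(identity_url(text, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the deployment running, `call_identity("hello_world!")` should return the same string that the curl command prints.
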
The generate endpoint accepts a prompt, uses Stable Diffusion to generate an image from the prompt, and saves the image to a file.

curl --request GET "http://127.0.0.1:8000/generate?prompt=car,model-t,realistic,4k&filename=/workspace/examples/rayserve/car_sample.jpg"

[Image: car_sample]
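
The generate request takes two query parameters, and both contain characters (commas, slashes) that are safer to percent-encode when the URL is built programmatically. The sketch below is one possible Python client under stated assumptions: `generate_url` and `call_generate` are hypothetical helper names, the address matches the curl command above, and the long timeout is a guess, since image generation time depends on hardware.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the deployment listens on the same address as the curl example.
BASE_URL = "http://127.0.0.1:8000"


def generate_url(prompt: str, filename: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the generate endpoint with encoded query parameters."""
    query = urlencode({"prompt": prompt, "filename": filename})
    return f"{base_url}/generate?{query}"


def call_generate(prompt: str, filename: str, base_url: str = BASE_URL,
                  timeout: float = 600.0) -> int:
    """Send the request and return the HTTP status code.

    Requires a running deployment. The image is written by the server to
    `filename`, which is a path inside the container, not on the client.
    """
    with urlopen(generate_url(prompt, filename, base_url), timeout=timeout) as response:
        return response.status
```

For example, `call_generate("car,model-t,realistic,4k", "/workspace/examples/rayserve/car_sample.jpg")` issues the same request as the curl command above.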

The Ray and Ray Serve dashboards are hosted on the default port and can be used to visualize various metrics:

<IP_ADDRESS>:8265
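
Before opening the dashboard in a browser, it can help to confirm that the port is reachable. The following is a minimal sketch with a hypothetical helper, `port_is_open`. It only checks that a TCP connection succeeds and says nothing about whether the dashboard itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open(host, 8265)` with the address of the machine running Ray as `host` gives a quick reachability check for the dashboard port shown above.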

The following command stops the local Ray cluster and also stops the Prometheus and Grafana instances.

./stop_ray.sh