Sample RAG Application#
Prerequisites#
The following NVIDIA NIM microservices are deployed:
NVIDIA NIM for LLMs
NeMo Retriever Text Embedding NIM
NeMo Retriever Text Reranking NIM
You have an active subscription to an NVIDIA AI Enterprise product or are a member of the NVIDIA Developer Program. Access to the Helm chart and containers is restricted.
Install a Vector Database#
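NVIDIA used a standalone Milvus configuration during development and testing. Milvus provides GPU-accelerated vector storage. The Chain Server application also supports pgvector.
If you are not already running Milvus, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.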
Tip
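The Milvus Helm chart does not specify a storage class for the PVC. If your cluster does not have a default storage class provisioner, you can mark an existing storage class as the default by running a command like the following example: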
$ kubectl patch storageclass <storage-class-name> \
-p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
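Before patching, it can help to confirm whether the cluster already has a default storage class. The following is a minimal sketch rather than a command from the Milvus or NVIDIA documentation: the helper name has_default_storageclass is hypothetical, and it relies on kubectl get storageclass marking the default class with (default) after its name.

```shell
# Hypothetical helper: reads `kubectl get storageclass` output on stdin and
# succeeds (exit status 0) only if some class is marked "(default)".
has_default_storageclass() {
  grep -q '(default)'
}

# Usage against a live cluster (requires kubectl access):
#   if kubectl get storageclass | has_default_storageclass; then
#     echo "default storage class found"
#   else
#     echo "no default storage class; consider the patch command above"
#   fi
```

The helper only inspects text, so it does not change anything in the cluster.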
Install the NVIDIA Multi-Turn RAG Application#
Create a RAG sample namespace
$ kubectl create namespace rag-sample
Fetch the Helm chart from NGC
$ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
    --username='$oauthtoken' --password=<ngc-api-key>
Save the values from the chart in a file
$ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml
Edit the values.yaml file and update the imagePullSecret.password field and the env block under query (query.env) with the following environment variables:

imagePullSecret:
  ...
  password: "<ngc-api-key>"

env:
  APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
  APP_VECTORSTORE_NAME: "milvus"
  APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
  APP_LLM_MODELNAME: meta/llama3-8b-instruct
  APP_LLM_MODELENGINE: nvidia-ai-endpoints
  APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
  APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
  APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
  APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
  APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
  APP_RANKING_MODELENGINE: nvidia-ai-endpoints
  COLLECTION_NAME: multi_turn_rag
  APP_RETRIEVER_TOPK: 2
  APP_RETRIEVER_SCORETHRESHOLD: 0.25
  APP_TEXTSPLITTER_CHUNKSIZE: 506
  APP_TEXTSPLITTER_CHUNKOVERLAP: 200

The reranking microservice is optional. Set the APP_RANKING_SERVERURL and APP_RANKING_MODELNAME variables to empty ("") to prevent the chain server from attempting to use the reranking microservice. Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.
The APP_VECTORSTORE_URL value is for Milvus running in a milvus namespace. Substitute your cluster-specific namespace or another address if Milvus is not running in the same cluster.
If you use pgvector, specify a connection string like pgvector.<namespace>:5432 and APP_VECTORSTORE_NAME: pgvector.
The APP_xxxxx_SERVERURL values are for services running in the nim-service namespace. Substitute your cluster-specific namespace.
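When substituting your own namespaces, the server addresses above all follow the same in-cluster DNS pattern. The sketch below shows that pattern; the helper name service_addr is hypothetical, and it assumes the cluster uses the default cluster domain, cluster.local.

```shell
# Hypothetical helper: prints the in-cluster address of a Kubernetes service
# in the <service>.<namespace>.svc.cluster.local:<port> form used above.
# Assumes the default cluster domain, cluster.local.
service_addr() {
  # $1 = service name, $2 = namespace, $3 = port
  printf '%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
}

# Example: the embedding NIM in the nim-service namespace.
service_addr nv-embedqa-e5-v5 nim-service 8000
# prints: nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000
```

Note that the APP_VECTORSTORE_URL value in the example values also carries an http:// prefix, which this helper does not add.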
Install the Helm chart
$ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
Optional: List resources in the namespace
$ kubectl get all -n rag-sample
Example Output
NAME                                                READY   STATUS    RESTARTS   AGE
pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chain-server-multi-turn        1/1     1            1           99s
deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
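To check readiness without reading the table by eye, the pod listing can be filtered. This is a minimal sketch, not part of the chart: the helper name not_ready_pods is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# name of every pod that is not fully ready or not in the Running state.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY column, for example "1/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n rag-sample | not_ready_pods
# No output means every listed pod is ready.
```

As an alternative, kubectl wait --for=condition=Ready pod --all -n rag-sample --timeout=300s blocks until the pods report ready or the timeout expires.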
Accessing the RAG Playground#
If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.
Determine the node port for the sample chat application
$ kubectl get service -n rag-sample rag-playground-multiturn-rag
In the following sample output, the application is listening on node port 30817.
NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h
Forward the port
$ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001
After you forward the port, you can access the application at http://127.0.0.1:30817.
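The node port lookup in the steps above can also be scripted. The sketch below extracts the node port from a PORT(S) value; the helper name node_port_from_ports is hypothetical, and it assumes the service-port:node-port/protocol layout shown in the sample output.

```shell
# Hypothetical helper: extracts the node port from a PORT(S) value such as
# "3001:30817/TCP" (service port, node port, protocol).
node_port_from_ports() {
  ports=${1#*:}                 # drop the service port and the colon
  printf '%s\n' "${ports%%/*}"  # drop the protocol suffix
}

node_port_from_ports '3001:30817/TCP'
# prints: 30817
```

On a live cluster, kubectl get service -n rag-sample rag-playground-multiturn-rag -o jsonpath='{.spec.ports[0].nodePort}' returns the same value directly. kubectl port-forward does not require the local port to match the node port; any free local port works in place of 30817.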
Next Steps#
You can uninstall the Helm chart by running the following command:
$ helm uninstall -n rag-sample multiturn-rag