Sample RAG Application

Prerequisites

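- NVIDIA NIM microservices are deployed:
  - NVIDIA NIM for LLMs
  - NeMo Retriever Text Embedding NIM
  - NeMo Retriever Text Reranking NIM (optional)
- You have an active subscription to an NVIDIA AI Enterprise product or are a member of the NVIDIA Developer Program. Access to the Helm chart and containers is restricted.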

Install a Vector Database

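NVIDIA developed and tested this sample with a separately deployed and configured Milvus instance. Milvus provides GPU-accelerated vector storage. The Chain Server application also supports pgvector.

If you are not already running Milvus, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.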

Tip

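The Milvus Helm chart does not specify a storage class for its PVCs. If your cluster does not have a default storage class provisioner, you can mark an existing storage class as the default with a command like the following example: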

$ kubectl patch storageclass <storage-class-name> \
    -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
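Before patching, you might check whether the cluster already has a default storage class. `kubectl get storageclass` prints `(default)` after the name of the default class. The helper below is a hypothetical sketch, not part of the chart, that reads that table on stdin and prints the default class name, assuming kubectl's standard column layout.

```shell
# Hypothetical helper (not part of the chart): print the name of the default
# storage class from `kubectl get storageclass` output read on stdin.
# Assumes kubectl's default table layout, where the default class appears
# as "<name> (default)" in the NAME column.
default_storage_class() {
  # Skip the header row; the second whitespace-separated field is "(default)"
  # only on the default class's row.
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage:
#   kubectl get storageclass | default_storage_class
# Empty output means no class is marked as the default, so the patch above applies.
```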

Install the NVIDIA Multi-Turn RAG Application

Create a RAG sample namespace
```
$ kubectl create namespace rag-sample
```

Fetch the Helm chart from NGC

$ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
    --username='$oauthtoken' --password=<ngc-api-key>

Save the values from the chart in a file

$ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml

Edit the values.yaml file: set imagePullSecret.password to your NGC API key and set the following environment variables in the query.env field:

   imagePullSecret:
     ...
     password: "<ngc-api-key>"

   query:
     ...
     env:
       APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
       APP_VECTORSTORE_NAME: "milvus"
       APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
       APP_LLM_MODELNAME: meta/llama3-8b-instruct
       APP_LLM_MODELENGINE: nvidia-ai-endpoints
       APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
       APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
       APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
       APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
       APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
       APP_RANKING_MODELENGINE: nvidia-ai-endpoints
       COLLECTION_NAME: multi_turn_rag
       APP_RETRIEVER_TOPK: 2
       APP_RETRIEVER_SCORETHRESHOLD: 0.25
       APP_TEXTSPLITTER_CHUNKSIZE: 506
       APP_TEXTSPLITTER_CHUNKOVERLAP: 200
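A forgotten placeholder in values.yaml typically shows up only after installation, as an image pull failure. A minimal pre-install check, assuming you kept the literal `<ngc-api-key>` text from the example until you replaced it, could look like the following; the function name is hypothetical.

```shell
# Hypothetical pre-install check: fail if the values file is missing or still
# contains the literal <ngc-api-key> placeholder from the example above.
check_values_file() {
  local file="${1:-values.yaml}"
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 1
  fi
  # -F matches the placeholder as a fixed string, not a regular expression.
  if grep -qF '<ngc-api-key>' "$file"; then
    echo "error: $file still contains the <ngc-api-key> placeholder" >&2
    return 1
  fi
  echo "ok: no <ngc-api-key> placeholder in $file"
}

# Usage:
#   check_values_file values.yaml && helm install -n rag-sample multiturn-rag \
#     rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
```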

The reranking microservice is optional. To prevent the chain server from attempting to use it, set APP_RANKING_SERVERURL and APP_RANKING_MODELNAME to empty strings (""). Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.
The APP_VECTORSTORE_URL value assumes that Milvus runs in the milvus namespace. If Milvus runs in a different namespace, substitute that namespace; if Milvus runs outside the cluster, substitute its address.
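If you skip the reranking microservice, one option is to leave values.yaml as fetched and pass a second values file, because Helm merges multiple `-f` files and later files take precedence. The sketch below writes such an override. It assumes `query.env` is a map, as the example above shows; the function and the file name `values-no-rerank.yaml` are hypothetical.

```shell
# Hypothetical helper: write a values override that disables reranking by
# emptying the ranking server URL and model name, while keeping the model
# engine set to nvidia-ai-endpoints as the note above requires.
write_no_rerank_values() {
  local out="${1:-values-no-rerank.yaml}"
  # Quoted 'EOF' keeps the heredoc literal (no variable expansion).
  cat > "$out" <<'EOF'
query:
  env:
    APP_RANKING_SERVERURL: ""
    APP_RANKING_MODELNAME: ""
    APP_RANKING_MODELENGINE: nvidia-ai-endpoints
EOF
}

# Usage (the later -f file overrides matching keys from values.yaml):
#   write_no_rerank_values values-no-rerank.yaml
#   helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz \
#     -f values.yaml -f values-no-rerank.yaml
```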

If you use pgvector, set APP_VECTORSTORE_URL to a connection string such as pgvector.<namespace>:5432 and set APP_VECTORSTORE_NAME: pgvector.
The APP_*_SERVERURL values assume that the NIM services run in the nim-service namespace. Substitute the namespace that your cluster uses.
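When the NIM services run in another namespace, only the namespace segment of each server URL changes. The hypothetical helper below writes an override file with that segment substituted; the service names and port 8000 are copied from the example values, and the `query.env` layout is assumed from the editing step above.

```shell
# Hypothetical helper: write a values override that points the chain server at
# NIM services in namespace $1 instead of nim-service. Service names and port
# 8000 are copied from the example values; adjust them if yours differ.
write_nim_values() {
  local ns="$1" out="${2:-values-nim.yaml}"
  if [ -z "$ns" ]; then
    echo "usage: write_nim_values <namespace> [output-file]" >&2
    return 1
  fi
  # Unquoted EOF so that ${ns} is expanded inside the heredoc.
  cat > "$out" <<EOF
query:
  env:
    APP_LLM_SERVERURL: "meta-llama3-8b-instruct.${ns}.svc.cluster.local:8000"
    APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.${ns}.svc.cluster.local:8000"
    APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.${ns}.svc.cluster.local:8000"
EOF
}

# Usage:
#   write_nim_values my-nim-namespace values-nim.yaml
#   helm install ... -f values.yaml -f values-nim.yaml
```

If you disabled reranking, remove the APP_RANKING_SERVERURL line from the generated file so that it does not override the empty value.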

Install the Helm chart

$ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml

Optional: List resources in the namespace

$ kubectl get all -n rag-sample

Example Output

NAME                                                READY   STATUS    RESTARTS   AGE
pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/chain-server-multi-turn        1/1     1            1           99s
deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
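In the example output, every pod is `Running` with all containers ready (`1/1`). To block until that is true, kubectl can wait for you, for example `kubectl wait --for=condition=Ready pod --all -n rag-sample --timeout=300s`. For a one-off check of a snapshot instead, the hypothetical awk filter below reads `kubectl get pods` output on stdin and fails unless at least one pod is listed and every pod is `Running` and fully ready; it assumes kubectl's default column order (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and return
# non-zero unless at least one pod is listed and every pod is Running with
# all containers ready (READY column "n/n").
all_pods_ready() {
  awk '
    NR > 1 {
      # READY is "ready/total"; a pod is fully ready when both numbers match.
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    # Fail on any bad row, or when no pod rows were read at all.
    END { exit (bad || NR < 2) }
  '
}

# Usage:
#   kubectl get pods -n rag-sample | all_pods_ready && echo "all pods ready"
```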

Accessing the RAG Playground

If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.

Determine the node port for the sample chat application

$ kubectl get service -n rag-sample rag-playground-multiturn-rag

In the following sample output, the application is listening on node port 30817.

NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h

Forward the port

$ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001

After you forward the port, you can access the application at http://localhost:30817.
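The node port lookup and the port forward can be combined in one function. This is a sketch that assumes the service name and namespace used above and a service that exposes a single port; the function name is hypothetical. `kubectl port-forward` maps LOCAL:REMOTE and accepts any free local port, so reusing the node port number is a convenience rather than a requirement.

```shell
# Hypothetical helper: look up the playground's node port, then forward that
# same local port to service port 3001 (kubectl port-forward LOCAL:REMOTE).
# Assumes the service exposes a single port, as in the sample output above.
forward_playground() {
  local ns="${1:-rag-sample}" svc="rag-playground-multiturn-rag" port
  port="$(kubectl get service "$svc" -n "$ns" \
    -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
  if [ -z "$port" ]; then
    echo "error: no node port found for service $svc in namespace $ns" >&2
    return 1
  fi
  echo "Open http://localhost:${port} once forwarding starts; press Ctrl+C to stop."
  # Runs in the foreground until interrupted.
  kubectl port-forward "service/$svc" -n "$ns" "${port}:3001"
}
```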

Next Steps

You can uninstall the Helm chart by running helm uninstall -n rag-sample multiturn-rag.
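`helm uninstall` removes the release's resources but not the rag-sample namespace, which was created separately with `kubectl create namespace`. A sketch of a full cleanup, assuming nothing else runs in that namespace (deleting a namespace deletes everything in it); the function name is hypothetical.

```shell
# Hypothetical cleanup helper: remove the Helm release, then the namespace
# that was created manually at the start of the installation.
# WARNING: deleting the namespace also deletes any other resources in it.
cleanup_rag_sample() {
  helm uninstall -n rag-sample multiturn-rag
  # --ignore-not-found makes the delete succeed if the namespace is already gone.
  kubectl delete namespace rag-sample --ignore-not-found
}
```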