Sample RAG Application

Prerequisites

  • The following NVIDIA NIM microservices are deployed:

    • NVIDIA NIM for LLMs

    • NeMo Retriever Text Embedding NIM

    • NeMo Retriever Text Reranking NIM

  • An active subscription to an NVIDIA AI Enterprise product, or membership in the NVIDIA Developer Program. Access to the Helm chart and containers is restricted.
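
Before continuing, it can help to confirm that the NIM services are present in the cluster. The following is a minimal sketch, assuming the services run in the nim-service namespace and use the service names from the values.yaml example later on this page; substitute your own names and namespace. The function name check_nim_services is hypothetical. Because the reranking microservice is optional, a "missing" line for it is not necessarily a problem.

```shell
# Sketch: report which prerequisite NIM services exist in a namespace.
# Assumptions: the namespace and service names below match your deployment.
check_nim_services() {
  namespace="${1:-nim-service}"
  missing=0
  for svc in meta-llama3-8b-instruct nv-embedqa-e5-v5 nv-rerankqa-mistral-4b-v3; do
    if kubectl get service "$svc" -n "$namespace" >/dev/null 2>&1; then
      echo "found: $svc"
    else
      echo "missing: $svc"
      missing=1
    fi
  done
  # Non-zero exit status if at least one service was not found.
  return "$missing"
}

# Example use (requires cluster access):
#   check_nim_services nim-service
```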

Install a Vector Database

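NVIDIA used a standalone Milvus deployment during development and testing. Milvus provides GPU-accelerated vector storage. The Chain Server application also supports pgvector.

If you are not already running Milvus, refer to Run Milvus with GPU Support Using Helm Chart in the Milvus documentation.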

Tip

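The Milvus Helm chart does not specify a storage class for its PVCs. If your cluster does not have a default storage class provisioner, you can run a command like the following example: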

$ kubectl patch storageclass <storage-class-name> \
    -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
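
To see whether the cluster already has a default storage class before patching one, you can inspect the output of kubectl get storageclass, which marks the default class with (default) after its name. The following is a minimal sketch; the function name has_default_storageclass is hypothetical.

```shell
# Sketch: succeed if the cluster reports a default storage class.
# `kubectl get storageclass` appends "(default)" to the name of the default class.
has_default_storageclass() {
  kubectl get storageclass 2>/dev/null | grep -q '(default)'
}

# Example use (requires cluster access):
#   has_default_storageclass || echo "No default storage class found."
```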

Install the NVIDIA Multi-Turn RAG Application

  1. Create a RAG sample namespace

    $ kubectl create namespace rag-sample
    
  2. Fetch the Helm chart from NGC

    $ helm fetch https://helm.ngc.nvidia.com/ohlfw0olaadg/ea-participants/charts/rag-app-multiturn-chatbot-v24.06.tgz \
        --username='$oauthtoken' --password=<ngc-api-key>
    
  3. Save the values from the chart in a file

    $ helm show values rag-app-multiturn-chatbot-v24.06.tgz > values.yaml
    
  4. Edit the values.yaml file: set imagePullSecret.password to your NGC API key and update the query.env fields with the following environment variables

       imagePullSecret:
         ...
         password: "<ngc-api-key>"
    
       env:
         APP_VECTORSTORE_URL: "http://milvus.milvus.svc.cluster.local:19530"
         APP_VECTORSTORE_NAME: "milvus"
         APP_LLM_SERVERURL: "meta-llama3-8b-instruct.nim-service.svc.cluster.local:8000"
         APP_LLM_MODELNAME: meta/llama3-8b-instruct
         APP_LLM_MODELENGINE: nvidia-ai-endpoints
         APP_EMBEDDINGS_SERVERURL: "nv-embedqa-e5-v5.nim-service.svc.cluster.local:8000"
         APP_EMBEDDINGS_MODELNAME: nvidia/nv-embedqa-e5-v5
         APP_EMBEDDINGS_MODELENGINE: nvidia-ai-endpoints
         APP_RANKING_SERVERURL: "nv-rerankqa-mistral-4b-v3.nim-service.svc.cluster.local:8000"
         APP_RANKING_MODELNAME: nvidia/nv-rerankqa-mistral-4b-v3
         APP_RANKING_MODELENGINE: nvidia-ai-endpoints
         COLLECTION_NAME: multi_turn_rag
         APP_RETRIEVER_TOPK: 2
         APP_RETRIEVER_SCORETHRESHOLD: 0.25
         APP_TEXTSPLITTER_CHUNKSIZE: 506
         APP_TEXTSPLITTER_CHUNKOVERLAP: 200
    
    • The reranking microservice is optional. Set the APP_RANKING_SERVERURL and APP_RANKING_MODELNAME variables to an empty string ("") to prevent the chain server from attempting to use the reranking microservice. Keep APP_RANKING_MODELENGINE: nvidia-ai-endpoints even if you did not deploy a reranking microservice.

    • The APP_VECTORSTORE_URL value is for Milvus running in a milvus namespace. Substitute your cluster-specific namespace or another address if Milvus is not running in the same cluster.

      If you use pgvector, set APP_VECTORSTORE_URL to a connection string like pgvector.<namespace>:5432 and set APP_VECTORSTORE_NAME: pgvector.

    • The APP_xxxxx_SERVERURL values are for services running in the nim-service namespace. Substitute your cluster-specific namespace.

  5. Install the Helm chart

    $ helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml
    
  6. Optional: List resources in the namespace

    $ kubectl get all -n rag-sample
    

    Example Output

    NAME                                                READY   STATUS    RESTARTS   AGE
    pod/chain-server-multi-turn-9759ff9ff-62fdh         1/1     Running   0          99s
    pod/rag-playground-multiturn-rag-5cbdc574d6-tgb9l   1/1     Running   0          99s
    
    NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    service/chain-server-multi-turn        ClusterIP   10.105.82.33    <none>        8082/TCP         99s
    service/rag-playground-multiturn-rag   NodePort    10.99.241.217   <none>        3001:30621/TCP   99s
    
    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/chain-server-multi-turn        1/1     1            1           99s
    deployment.apps/rag-playground-multiturn-rag   1/1     1            1           99s
    
    NAME                                                      DESIRED   CURRENT   READY   AGE
    replicaset.apps/chain-server-multi-turn-9759ff9ff         1         1         1       99s
    replicaset.apps/rag-playground-multiturn-rag-5cbdc574d6   1         1         1       99s
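
Instead of listing resources and checking the READY column by eye, the install in step 5 can be followed by a rollout check that blocks until both deployments are ready. The following is a minimal sketch, assuming the chart archive and the edited values.yaml are in the current directory and the deployment names match the example output above. The function name install_and_wait and the 300-second timeout are illustrative choices, not values from the chart.

```shell
# Sketch: install the chart (step 5) and wait for the workloads to become ready.
# Assumptions: the chart archive and the edited values.yaml are in the current
# directory, and the deployment names match the example output.
install_and_wait() {
  helm install -n rag-sample multiturn-rag rag-app-multiturn-chatbot-v24.06.tgz -f values.yaml || return 1
  for deploy in chain-server-multi-turn rag-playground-multiturn-rag; do
    # Blocks until the deployment is rolled out or the timeout expires.
    kubectl rollout status "deployment/$deploy" -n rag-sample --timeout=300s || return 1
  done
}

# Example use (requires cluster access):
#   install_and_wait && echo "RAG sample is ready"
```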
    

Accessing the RAG Playground

If your cluster is not configured to work with an external load balancer or ingress, you can port-forward the HTTP connection to the sample chat application.

  1. Determine the node port for the sample chat application

    $ kubectl get service -n rag-sample rag-playground-multiturn-rag
    

    In the following sample output, the application is listening on node port 30817.

    NAME                           TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    rag-playground-multiturn-rag   NodePort   10.109.239.27   <none>        3001:30817/TCP   5d19h
    
  2. Forward the port

    $ kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample 30817:3001
    

After you forward the port, you can access the application at http://127.0.0.1:30817.
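
The two steps above can also be scripted. The following is a minimal sketch, assuming the service exposes a single port as in the sample output; the function names are hypothetical. The node port lookup uses a JSONPath query rather than reading the table by eye.

```shell
# Sketch: print the node port of the playground service.
# Assumption: the service has one port entry, as in the sample output.
playground_node_port() {
  kubectl get service rag-playground-multiturn-rag -n rag-sample \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Sketch: forward a local port (default 30817, as in the example) to
# service port 3001. Runs in the foreground until interrupted.
forward_playground() {
  local_port="${1:-30817}"
  kubectl port-forward service/rag-playground-multiturn-rag -n rag-sample "${local_port}:3001"
}

# Example use (requires cluster access):
#   forward_playground 30817
```

The local port passed to kubectl port-forward is independent of the node port, so any free local port works. With the forward running in another terminal, a command such as curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:30817 prints the HTTP status code returned by the playground.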

Next Steps

  • You can uninstall the Helm chart by running helm uninstall -n rag-sample multiturn-rag.
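
If you also want to remove the namespace created in step 1, the uninstall can be wrapped as follows. This is a minimal sketch, assuming nothing else you want to keep lives in the rag-sample namespace; the function name and the --delete-namespace argument are hypothetical conveniences, not part of the chart.

```shell
# Sketch: remove the release, then optionally the rag-sample namespace.
# Assumption: the namespace holds only resources from this sample.
uninstall_multiturn_rag() {
  helm uninstall -n rag-sample multiturn-rag || return 1
  if [ "${1:-}" = "--delete-namespace" ]; then
    kubectl delete namespace rag-sample
  fi
}

# Example use (requires cluster access):
#   uninstall_multiturn_rag --delete-namespace
```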