Examples

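In this section, we show an example of how to define a quantum operator and a quantum state, and then compute the action of the operator on the state. For clarity, the quantum operator is defined in a separate header file, transverse_ising_full_fused_noisy.h, where it is wrapped in the helper C++ class UserDefinedLiouvillian. We also provide a utility header, helpers.h, which contains convenience functions for instantiating arrays in GPU memory.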

Compiling the code

Assuming cuQuantum has been extracted to CUQUANTUM_ROOT and cuTENSOR to CUTENSOR_ROOT, update the library path as follows:

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

Depending on your CUDA Toolkit version, you might need to choose a different library version (e.g., ${CUTENSOR_ROOT}/lib/11).

The serial sample code discussed below (operator_action_example.cpp) can be compiled with the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example

To link statically against the cuDensityMat library, use the following command:

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_ROOT}/lib/12 -lcutensor -o operator_action_example

To build the parallel (MPI) version of the example, operator_action_mpi_example.cpp, you need a CUDA-aware MPI library installed (e.g., a recent OpenMPI, MPICH, or MVAPICH), and you must set the environment variable $CUDENSITYMAT_COMM_LIB to the path of the MPI interface wrapper shared library libcudensitymat_distributed_interface_mpi.so. This wrapper library can be built inside the ${CUQUANTUM_ROOT}/distributed_interfaces folder by running the provided build script. To link the executable against the CUDA-aware MPI library, add -I${MPI_PATH}/include and -L${MPI_PATH}/lib -lmpi to the build command:

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example

Warning

Running operator_action_mpi_example.cpp without a CUDA-aware MPI library will crash the program.

Note

Depending on how the cuQuantum package was obtained, you may need to replace lib above with lib64 to match the folder name used in your cuQuantum package.

Code example (serial execution on a single GPU)

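The following code example illustrates the common steps needed to compute the action of a quantum many-body operator on a quantum state with the cuDensityMat library. The full sample code can be found in the NVIDIA/cuQuantum repository (main serial code, operator definition, and utility code).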

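First, let us introduce a helper class that constructs a quantum many-body operator, in this case a transverse-field Ising Hamiltonian with fused ZZ terms plus additional noise (dissipation) terms.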
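Written out, the super-operator that the header assembles acts on a density matrix ρ as shown below. This is read directly off the header's comments and the coefficients passed to cudensitymatOperatorAppendTerm (indices i, j run over the spin modes starting at 0); it restates the code rather than deriving a physical model.

```latex
\begin{aligned}
\mathcal{L}(t)[\rho] &= -i\,\bigl(H_1 + f(t)\,H_2\bigr)\,\rho \;+\; i\,\rho\,\bigl(H_1 + f(t)\,H_2\bigr) \;+\; d\sum_{i} Y_i\,\rho\,Y_i \\
                     &= -i\,[H_1, \rho] \;-\; i\,f(t)\,[H_2, \rho] \;+\; d\sum_{i} Y_i\,\rho\,Y_i, \\
H_1 &= \sum_{i} h_i\,X_i, \qquad h_i = \frac{1}{i+1}, \\
H_2 &= \sum_{i<j} g_{ij}\,Z_i Z_j, \qquad g_{ij} = -\frac{1}{i+j+1}, \\
f(t) &= \cos(\omega t) + i\,\sin(\omega t) = e^{i\omega t}, \qquad d = 0.42 .
\end{aligned}
```

Because both appended copies of the two-body term carry the same callback f(t), the left- and right-acting parts combine into a single commutator scaled by f(t).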

/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#pragma once

#include <cudensitymat.h> // cuDensityMat library header
#include "helpers.h"   // helper functions

#include <cmath>
#include <complex>
#include <vector>
#include <iostream>
#include <cassert>


/* Time-dependent transverse-field Ising Hamiltonian operator
   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i
      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of fused {Z_i * Z_j} terms
      + d * sum_{i} {Y_i * {..} * Y_i}     // dissipation terms {Y_i * {..} * Y_i} will be fused into the YY_ii super-operator
   where {..} is the placeholder for the density matrix to show that the operators act from a different side
*/


// User-defined C++ callback function defining a time-dependent coefficient inside the Hamiltonian:
// f(t) = cos(omega * t) + i * sin(omega * t)
extern "C"
int32_t tdCoefComplex64(double time,             // time point
                        int32_t numParams,       // number of external user-defined Liouvillian parameters (= 1 here)
                        const double params[],   // params[0] is omega (user-defined Liouvillian parameter)
                        cudaDataType_t dataType, // data type (CUDA_C_64F here)
                        void * scalarStorage)    // CPU storage for the returned function value
{
  const auto omega = params[0];
  auto * tdCoef = static_cast<std::complex<double>*>(scalarStorage); // casting to complex<double> because it returns CUDA_C_64F data type
  *tdCoef = {std::cos(omega * time), std::sin(omega * time)};
  return 0; // error code (0: Success)
}


/** Convenience class to encapsulate the Liouvillian operator:
 *   - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 *   - get() method returns a reference to the constructed Liouvillian operator
 *   - Destructor releases all resources used by the Liouvillian operator
 */
class UserDefinedLiouvillian final
{
private:
  // Data members
  cudensitymatHandle_t handle;              // library context handle
  const std::vector<int64_t> spaceShape;    // Hilbert space shape
  void * spinXelems {nullptr};              // elements of the X spin operator in GPU RAM
  void * spinYYelems {nullptr};             // elements of the YY two-spin operator in GPU RAM
  void * spinZZelems {nullptr};             // elements of the ZZ two-spin operator in GPU RAM
  cudensitymatElementaryOperator_t spinX;   // X spin operator
  cudensitymatElementaryOperator_t spinYY;  // YY two-spin operator
  cudensitymatElementaryOperator_t spinZZ;  // ZZ two-spin operator
  cudensitymatOperatorTerm_t oneBodyTerm;   // operator term: H1 = sum_{i} {h_i * X_i}
  cudensitymatOperatorTerm_t twoBodyTerm;   // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
  cudensitymatOperatorTerm_t noiseTerm;     // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix
  cudensitymatOperator_t liouvillian;       // (-i * (H1 + f(t) * H2) * rho) + (i * rho * (H1 + f(t) * H2)) + D1

public:

  // Constructor constructs a user-defined Liouvillian operator
  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,              // library context handle
                         const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
    handle(contextHandle), spaceShape(hilbertSpaceShape)
  {
    // Define the necessary elementary tensors in GPU memory (F-order storage!)
    spinXelems = createArrayGPU<std::complex<double>>(
                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X

    spinYYelems = createArrayGPU<std::complex<double>>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY

    spinZZelems = createArrayGPU<std::complex<double>>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ

    // Construct the necessary Elementary Tensor Operators
    //   X_i operator
    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                        1,                                   // one-body operator
                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
                        0,                                   // 0 for dense tensors
                        nullptr,                             // nullptr for dense tensors
                        CUDA_C_64F,                          // data type
                        spinXelems,                          // tensor elements in GPU memory
                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
                        &spinX));                            // the created elementary tensor operator
    //  ZZ_ij = Z_i * Z_j fused operator
    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                        2,                                   // two-body operator
                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
                        0,                                   // 0 for dense tensors
                        nullptr,                             // nullptr for dense tensors
                        CUDA_C_64F,                          // data type
                        spinZZelems,                         // tensor elements in GPU memory
                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
                        &spinZZ));                           // the created elementary tensor operator
    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
                        2,                                   // two-body operator
                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
                        0,                                   // 0 for dense tensors
                        nullptr,                             // nullptr for dense tensors
                        CUDA_C_64F,                          // data type
                        spinYYelems,                         // tensor elements in GPU memory
                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
                        &spinYY));                           // the created elementary tensor operator

    // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
    //  Create an empty operator term
    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
                        spaceShape.data(),                   // Hilbert space shape
                        &oneBodyTerm));                      // the created empty operator term
    //  Define the operator term
    for (int32_t i = 0; i < spaceShape.size(); ++i) {
      const double h_i = 1.0 / static_cast<double>(i+1);  // just some value (time-independent h_i coefficient)
      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                          oneBodyTerm,
                          1,                                                             // number of elementary tensor operators in the product
                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
                          {nullptr, nullptr}));                                          // no time-dependent coefficient associated with the operator product
    }
    //  Create an empty operator term
    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
                        spaceShape.data(),                   // Hilbert space shape
                        &twoBodyTerm));                      // the created empty operator term
    //  Define the operator term
    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
        const double g_ij = -1.0 / static_cast<double>(i + j + 1);  // just some value (time-independent g_ij coefficient)
        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                            twoBodyTerm,
                            1,                                                              // number of elementary tensor operators in the product
                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
                            {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
      }
    }
    //  Create an empty operator term
    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
                        spaceShape.data(),                   // Hilbert space shape
                        &noiseTerm));                        // the created empty operator term
    //  Define the operator term
    for (int32_t i = 0; i < spaceShape.size(); ++i) {
      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
                          noiseTerm,
                          1,                                                              // number of elementary tensor operators in the product
                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
                          {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
    }

    // Construct the full Liouvillian operator as a sum of the operator terms
    //  Create an empty operator (super-operator)
    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
                        spaceShape.size(),               // Hilbert space rank (number of dimensions)
                        spaceShape.data(),               // Hilbert space shape
                        &liouvillian));                  // the created empty operator (super-operator)
    //  Append an operator term to the operator (super-operator)
    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                        liouvillian,
                        oneBodyTerm,                     // appended operator term
                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                        make_cuDoubleComplex(0.0, -1.0), // -i constant
                        {nullptr, nullptr}));            // no time-dependent coefficient associated with the operator term as a whole
    //  Append an operator term to the operator (super-operator)
    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                        liouvillian,
                        twoBodyTerm,                     // appended operator term
                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                        make_cuDoubleComplex(0.0, -1.0), // -i constant
                        {tdCoefComplex64, nullptr}));    // function callback defining the time-dependent coefficient associated with this operator term as a whole
    //  Append an operator term to the operator (super-operator)
    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                        liouvillian,
                        oneBodyTerm,                    // appended operator term
                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                        make_cuDoubleComplex(0.0, 1.0), // i constant
                        {nullptr, nullptr}));           // no time-dependent coefficient associated with the operator term as a whole
    //  Append an operator term to the operator (super-operator)
    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                        liouvillian,
                        twoBodyTerm,                    // appended operator term
                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
                        make_cuDoubleComplex(0.0, 1.0), // i constant
                        {tdCoefComplex64, nullptr}));   // function callback defining the time-dependent coefficient associated with this operator term as a whole
    //  Append an operator term to the operator (super-operator)
    const double d = 0.42; // just some value (time-independent coefficient)
    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
                        liouvillian,
                        noiseTerm,                    // appended operator term
                        0,                            // operator term action duality as a whole (no duality reversing in this case)
                        make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
                        {nullptr, nullptr}));         // no time-dependent coefficient associated with the operator term as a whole
  }

  // Destructor destructs the user-defined Liouvillian operator
  ~UserDefinedLiouvillian()
  {
    // Destroy the Liouvillian operator
    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));

    // Destroy operator terms
    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));

    // Destroy elementary tensor operators
    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));

    // Destroy elementary tensors
    destroyArrayGPU(spinYYelems);
    destroyArrayGPU(spinZZelems);
    destroyArrayGPU(spinXelems);
  }

  // Disable copy and move: the class owns raw library handles and GPU buffers,
  // so a member-wise copy or move would make two objects release the same resources
  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
  UserDefinedLiouvillian(UserDefinedLiouvillian &&) = delete;
  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) = delete;

  // Get access to the constructed Liouvillian
  cudensitymatOperator_t & get()
  {
    return liouvillian;
  }

};

Now we can use this quantum many-body operator in the main code.

/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <cudensitymat.h>  // cuDensityMat library header
#include "helpers.h"       // helper functions


// Transverse Ising Hamiltonian with double summation ordering
// and spin-operator fusion, plus fused dissipation terms
#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example


#include <cmath>
#include <complex>
#include <vector>
#include <chrono>
#include <iostream>
#include <cassert>


// Number of times to perform operator action on a quantum state
constexpr int NUM_REPEATS = 2;

// Logging verbosity
bool verbose = true;


// Example workflow
void exampleWorkflow(cudensitymatHandle_t handle)
{
  // Define the composite Hilbert space shape and
  // quantum state batch size (number of individual quantum states)
  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)

  if (verbose) {
    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
    for (const auto & dimsn: spaceShape)
      std::cout << dimsn << ",";
    std::cout << ")" << std::endl;
    std::cout << "Quantum state batch size = " << batchSize << std::endl;
  }

  // Construct a user-defined Liouvillian operator using a convenience C++ class
  UserDefinedLiouvillian liouvillian(handle, spaceShape);
  if (verbose)
    std::cout << "Constructed the Liouvillian operator\n";

  // Declare the input quantum state
  cudensitymatState_t inputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      CUDA_C_64F,  // data type must match that of the operators created above
                      &inputState));

  // Query the size of the quantum state storage
  std::size_t storageSize {0}; // only one storage component (tensor) is needed
  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
                      inputState,
                      1,               // only one storage component
                      &storageSize));  // storage size in bytes
  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
  if (verbose)
    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;

  // Prepare some initial value for the input quantum state
  std::vector<std::complex<double>> inputStateValue(stateVolume);
  for (std::size_t i = 0; i < stateVolume; ++i) {
    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
  }

  // Allocate initialized GPU storage for the input quantum state with prepared values
  auto * inputStateElems = createArrayGPU(inputStateValue);

  // Attach initialized GPU storage to the input quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      inputState,
                      1,                                                 // only one storage component (tensor)
                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed input quantum state\n";

  // Declare the output quantum state of the same shape
  cudensitymatState_t outputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      CUDA_C_64F,  // data type must match that of the operators created above
                      &outputState));

  // Allocate initialized GPU storage for the output quantum state
  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));

  // Attach initialized GPU storage to the output quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      outputState,
                      1,                                                 // only one storage component (no tensor factorization)
                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed output quantum state\n";

  // Declare a workspace descriptor
  cudensitymatWorkspaceDescriptor_t workspaceDescr;
  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));

  // Query free GPU memory
  std::size_t freeMem = 0, totalMem = 0;
  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
  if (verbose)
    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;

  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
  const auto startTime = std::chrono::high_resolution_clock::now();
  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
                      liouvillian.get(),
                      inputState,
                      outputState,
                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
                      freeMem,                   // max available GPU free memory for the workspace
                      workspaceDescr,            // workspace descriptor
                      0x0));                     // default CUDA stream
  const auto finishTime = std::chrono::high_resolution_clock::now();
  const std::chrono::duration<double> timeSec = finishTime - startTime;
  if (verbose)
    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;

  // Query the required workspace buffer size (bytes)
  std::size_t requiredBufferSize {0};
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      &requiredBufferSize));
  if (verbose)
    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

  // Allocate GPU storage for the workspace buffer
  // (round up so the allocation covers all requiredBufferSize bytes)
  const std::size_t bufferVolume = (requiredBufferSize + sizeof(std::complex<double>) - 1) / sizeof(std::complex<double>);
  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
  if (verbose)
    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Attach the workspace buffer to the workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      workspaceBuffer,
                      requiredBufferSize));
  if (verbose)
    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Zero out the output quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
                      outputState,
                      0x0));
  if (verbose)
    std::cout << "Initialized the output state to zero\n";

  // Apply the Liouvillian operator to the input quantum state
  // and accumulate its action into the output quantum state (note += semantics)
  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    const auto startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
                        liouvillian.get(),
                        0.01,                                  // time point
                        1,                                     // number of external user-defined Hamiltonian parameters
                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
                        inputState,                            // input quantum state
                        outputState,                           // output quantum state
                        workspaceDescr,                        // workspace descriptor
                        0x0));                                 // default CUDA stream
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    const auto finishTime = std::chrono::high_resolution_clock::now();
    const std::chrono::duration<double> timeSec = finishTime - startTime;
    if (verbose)
      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
  }

  // Compute the squared norm of the output quantum state
  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
                      outputState,
                      norm2,
                      0x0));
  if (verbose)
    std::cout << "Computed the output state norm\n";
  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
  destroyArrayGPU(norm2);

  // Destroy workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));

  // Destroy workspace buffer storage
  destroyArrayGPU(workspaceBuffer);

  // Destroy quantum states
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));

  // Destroy quantum state storage
  destroyArrayGPU(outputStateElems);
  destroyArrayGPU(inputStateElems);

  if (verbose)
    std::cout << "Destroyed resources\n" << std::flush;
}


int main(int argc, char ** argv)
{
  // Assign a GPU to the process
  HANDLE_CUDA_ERROR(cudaSetDevice(0));
  if (verbose)
    std::cout << "Set active device\n";

  // Create a library handle
  cudensitymatHandle_t handle;
  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
  if (verbose)
    std::cout << "Created a library handle\n";

  // Run the example
  exampleWorkflow(handle);

  // Destroy the library handle
  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
  if (verbose)
    std::cout << "Destroyed the library handle\n";

  // Done
  return 0;
}

Code example (parallel execution on multiple GPUs)

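Adapting the main serial code to run in parallel across multiple GPU devices, including devices on multiple nodes, is straightforward. We illustrate this with an example that uses the Message Passing Interface (MPI) as the communication layer. Below we show the few additions needed to enable distributed parallel execution; the serial workflow itself does not need to change.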

The full sample code can be found in the NVIDIA/cuQuantum repository (main MPI code, operator definition, and utility code).

Here is the updated main code for multi-GPU runs.

/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <cudensitymat.h>  // cuDensityMat library header
#include "helpers.h"       // helper functions


// Transverse Ising Hamiltonian with double summation ordering
// and spin-operator fusion, plus fused dissipation terms
#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example


// MPI library (optional)
#ifdef MPI_ENABLED
#include <mpi.h>
#endif

#include <cmath>
#include <complex>
#include <vector>
#include <chrono>
#include <iostream>
#include <cassert>


// Number of times to perform operator action on a quantum state
constexpr int NUM_REPEATS = 2;

// Logging verbosity
bool verbose = true;


// Example workflow
void exampleWorkflow(cudensitymatHandle_t handle)
{
  // Define the composite Hilbert space shape and
  // quantum state batch size (number of individual quantum states)
  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)

  if (verbose) {
    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
    for (const auto & dimsn: spaceShape)
      std::cout << dimsn << ",";
    std::cout << ")" << std::endl;
    std::cout << "Quantum state batch size = " << batchSize << std::endl;
  }

  // Construct a user-defined Liouvillian operator using a convenience C++ class
  UserDefinedLiouvillian liouvillian(handle, spaceShape);
  if (verbose)
    std::cout << "Constructed the Liouvillian operator\n";

  // Declare the input quantum state
  cudensitymatState_t inputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      CUDA_C_64F,  // data type must match that of the operators created above
                      &inputState));

  // Query the size of the quantum state storage
  std::size_t storageSize {0}; // only one storage component (tensor) is needed
  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
                      inputState,
                      1,               // only one storage component
                      &storageSize));  // storage size in bytes
  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
  if (verbose)
    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;

  // Prepare some initial value for the input quantum state
  std::vector<std::complex<double>> inputStateValue(stateVolume);
  for (std::size_t i = 0; i < stateVolume; ++i) {
    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
  }

  // Allocate initialized GPU storage for the input quantum state with prepared values
  auto * inputStateElems = createArrayGPU(inputStateValue);

  // Attach initialized GPU storage to the input quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      inputState,
                      1,                                                 // only one storage component (tensor)
                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed input quantum state\n";

  // Declare the output quantum state of the same shape
  cudensitymatState_t outputState;
  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
                      spaceShape.size(),
                      spaceShape.data(),
                      batchSize,
                      CUDA_C_64F,  // data type must match that of the operators created above
                      &outputState));

  // Allocate initialized GPU storage for the output quantum state
  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));

  // Attach initialized GPU storage to the output quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
                      outputState,
                      1,                                                 // only one storage component (no tensor factorization)
                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
  if (verbose)
    std::cout << "Constructed output quantum state\n";

  // Declare a workspace descriptor
  cudensitymatWorkspaceDescriptor_t workspaceDescr;
  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));

  // Query free GPU memory
  std::size_t freeMem = 0, totalMem = 0;
  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
  if (verbose)
    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;

  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
  const auto startTime = std::chrono::high_resolution_clock::now();
  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
                      liouvillian.get(),
                      inputState,
                      outputState,
                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
                      freeMem,                   // max available GPU free memory for the workspace
                      workspaceDescr,            // workspace descriptor
                      0x0));                     // default CUDA stream
  const auto finishTime = std::chrono::high_resolution_clock::now();
  const std::chrono::duration<double> timeSec = finishTime - startTime;
  if (verbose)
    std::cout << "Operator action preparation time (sec) = " << timeSec.count() << std::endl;

  // Query the required workspace buffer size (bytes)
  std::size_t requiredBufferSize {0};
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      &requiredBufferSize));
  if (verbose)
    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;

  // Allocate GPU storage for the workspace buffer
  const std::size_t bufferVolume = (requiredBufferSize + sizeof(std::complex<double>) - 1) / sizeof(std::complex<double>);  // round up so the buffer covers requiredBufferSize bytes
  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
  if (verbose)
    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Attach the workspace buffer to the workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
                      workspaceDescr,
                      CUDENSITYMAT_MEMSPACE_DEVICE,
                      CUDENSITYMAT_WORKSPACE_SCRATCH,
                      workspaceBuffer,
                      requiredBufferSize));
  if (verbose)
    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;

  // Zero out the output quantum state
  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
                      outputState,
                      0x0));
  if (verbose)
    std::cout << "Initialized the output state to zero\n";

  // Apply the Liouvillian operator to the input quantum state
  // and accumulate its action into the output quantum state (note += semantics)
  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    const auto startTime = std::chrono::high_resolution_clock::now();
    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
                        liouvillian.get(),
                        0.01,                                  // time point
                        1,                                     // number of external user-defined Hamiltonian parameters
                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
                        inputState,                            // input quantum state
                        outputState,                           // output quantum state
                        workspaceDescr,                        // workspace descriptor
                        0x0));                                 // default CUDA stream
    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
    const auto finishTime = std::chrono::high_resolution_clock::now();
    const std::chrono::duration<double> timeSec = finishTime - startTime;
    if (verbose)
      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
  }

  // Compute the squared norm of the output quantum state
  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
                      outputState,
                      norm2,
                      0x0));
  if (verbose)
    std::cout << "Computed the output state norm\n";
  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
  destroyArrayGPU(norm2);

  // Destroy workspace descriptor
  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));

  // Destroy workspace buffer storage
  destroyArrayGPU(workspaceBuffer);

  // Destroy quantum states
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));

  // Destroy quantum state storage
  destroyArrayGPU(outputStateElems);
  destroyArrayGPU(inputStateElems);

  if (verbose)
    std::cout << "Destroyed resources\n" << std::flush;
}


int main(int argc, char ** argv)
{
  // Initialize MPI library (if needed)
#ifdef MPI_ENABLED
  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
  int procRank {-1};
  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
  int numProcs {0};
  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
  if (procRank != 0) verbose = false;
  if (verbose)
    std::cout << "Initialized MPI library\n";
#else
  const int procRank {0};
  const int numProcs {1};
#endif

  // Assign a GPU to the process
  int numDevices {0};
  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
  const int deviceId = procRank % numDevices;
  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
  if (verbose)
    std::cout << "Set active device\n";

  // Create a library handle
  cudensitymatHandle_t handle;
  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
  if (verbose)
    std::cout << "Created a library handle\n";

  // Reset distributed configuration (once)
#ifdef MPI_ENABLED
  MPI_Comm comm;
  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
                      &comm, sizeof(comm)));
#endif

  // Run the example
  exampleWorkflow(handle);

  // Synchronize MPI processes
#ifdef MPI_ENABLED
  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
#endif

  // Destroy the library handle
  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
  if (verbose)
    std::cout << "Destroyed the library handle\n";

  // Free the duplicated communicator and finalize the MPI library
#ifdef MPI_ENABLED
  HANDLE_MPI_ERROR(MPI_Comm_free(&comm));
  HANDLE_MPI_ERROR(MPI_Finalize());
  if (verbose)
    std::cout << "Finalized MPI library\n";
#endif

  // Done
  return 0;
}

Useful Tips

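For debugging, you can set the environment variable CUDENSITYMAT_LOG_LEVEL=n. The level n = 0, 1, …, 5 corresponds to a logger level as described in the table below. The environment variable CUDENSITYMAT_LOG_FILE=<filepath> can be used to redirect the log output to a custom file at <filepath> instead of stdout.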

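Level	Summary	Long Description
0	Off	Logging is disabled (default)
1	Errors	Only errors will be logged
2	Performance Trace	API calls that launch CUDA kernels will log their parameters and important information
3	Performance Hints	Hints that can potentially improve the application's performance
4	Heuristics Trace	Provides general information about the library execution, may contain details about heuristic status
5	API Trace	API calls will log their parameters and important information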