示例

在本节中,我们将展示一个关于如何定义量子算符、量子态,然后计算量子算符对量子态的作用的示例。为了清晰起见,量子算符在单独的头文件 transverse_ising_full_fused_noisy.h 中定义,并在辅助 C++ 类 UserDefinedLiouvillian 中包装。我们还提供了一个实用程序头文件 helpers.h,其中包含方便的 GPU 数组实例化函数。

编译代码

假设 cuQuantum 已解压到 CUQUANTUM_ROOT,cuTENSOR 已解压到 CUTENSOR_ROOT,我们按如下方式更新库路径

export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}

根据您的 CUDA 工具包,您可能需要选择不同的库版本(例如,${CUTENSOR_ROOT}/lib/11)。

下面讨论的串行示例代码 (operator_action_example.cpp) 可以通过以下命令编译

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example

为了静态链接到 cuDensityMat 库,请使用以下命令

nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example

为了构建示例 operator_action_mpi_example.cpp 的并行 (MPI) 版本,您需要安装一个CUDA-aware MPI 库(例如,最新的 OpenMPI、MPICH 或 MVAPICH),然后设置环境变量 $CUDENSITYMAT_COMM_LIB 为 MPI 接口包装器共享库 libcudensitymat_distributed_interface_mpi.so 的路径。MPI 接口包装器共享库 libcudensitymat_distributed_interface_mpi.so 可以在 ${CUQUANTUM_ROOT}/distributed_interfaces 文件夹内通过调用提供的构建脚本来构建。为了将可执行文件链接到 CUDA-aware MPI 库,您需要将 -I${MPI_PATH}/include-L${MPI_PATH}/lib -lmpi 添加到构建命令

nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example

警告

在没有 CUDA-aware MPI 的情况下运行 operator_action_mpi_example.cpp,程序将崩溃。

注意

根据 cuQuantum 包的来源,您可能需要将上面的 lib 替换为 lib64,具体取决于您的 cuQuantum 包中使用的文件夹名称。

代码示例(单 GPU 上的串行执行)

以下代码示例说明了使用 cuDensityMat 库计算量子多体算符对量子态的作用所需的常用步骤。完整的示例代码可以在 NVIDIA/cuQuantum 存储库中找到(主串行代码算符定义 以及实用程序代码)。

首先,让我们介绍一个辅助类来构建量子多体算符,例如,具有融合 ZZ 项和附加噪声项的横向场 Ising 哈密顿量。

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#pragma once
  7
  8#include <cudensitymat.h> // cuDensityMat library header
  9#include "helpers.h"   // helper functions
 10
 11#include <cmath>
 12#include <complex>
 13#include <vector>
 14#include <iostream>
 15#include <cassert>
 16
 17
 18/* Time-dependent transverse-field Ising Hamiltonian operator
 19   with ordered and fused ZZ terms, plus fused unitary dissipation terms:
 20    H = sum_{i} {h_i * X_i}                // transverse field sum of X_i
 21      + f(t) * sum_{i < j} {g_ij * ZZ_ij}  // modulated sum of fused {Z_i * Z_j} terms
 22      + d * sum_{i} {Y_i * {..} * Y_i}     // dissipation terms {Y_i * {..} * Y_i} will be fused into the YY_ii super-operator
 23   where {..} is the placeholder for the density matrix to show that the operators act from a different side
 24*/
 25
 26
 27// User-defined C++ callback function defining a time-dependent coefficient inside the Hamiltonian:
 28// f(t) = cos(omega * t) + i * sin(omega * t)
 29extern "C"
 30int32_t tdCoefComplex64(double time,             // time point
 31                        int32_t numParams,       // number of external user-defined Liouvillian parameters (= 1 here)
 32                        const double params[],   // params[0] is omega (user-defined Liouvillian parameter)
 33                        cudaDataType_t dataType, // data type (CUDA_C_64F here)
 34                        void * scalarStorage)    // CPU storage for the returned function value
 35{
 36  const auto omega = params[0];
 37  auto * tdCoef = static_cast<std::complex<double>*>(scalarStorage); // casting to complex<double> because it returns CUDA_C_64F data type
 38  *tdCoef = {std::cos(omega * time), std::sin(omega * time)};
 39  return 0; // error code (0: Success)
 40}
 41
 42
 43/** Convenience class to encapsulate the Liouvillian operator:
 44 *   - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
 45 *   - get() method returns a reference to the constructed Liouvillian operator
 46 *   - Destructor releases all resources used by the Liouvillian operator
 47 */
 48class UserDefinedLiouvillian final
 49{
 50private:
 51  // Data members
 52  cudensitymatHandle_t handle;              // library context handle
 53  const std::vector<int64_t> spaceShape;    // Hilbert space shape
 54  void * spinXelems {nullptr};              // elements of the X spin operator in GPU RAM
 55  void * spinYYelems {nullptr};             // elements of the YY two-spin operator in GPU RAM
 56  void * spinZZelems {nullptr};             // elements of the ZZ two-spin operator in GPU RAM
 57  cudensitymatElementaryOperator_t spinX;   // X spin operator
 58  cudensitymatElementaryOperator_t spinYY;  // YY two-spin operator
 59  cudensitymatElementaryOperator_t spinZZ;  // ZZ two-spin operator
 60  cudensitymatOperatorTerm_t oneBodyTerm;   // operator term: H1 = sum_{i} {h_i * X_i}
 61  cudensitymatOperatorTerm_t twoBodyTerm;   // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
 62  cudensitymatOperatorTerm_t noiseTerm;     // operator term: D1 = d * sum_{i} {YY_ii}  // Y_i operators act from different sides on the density matrix
 63  cudensitymatOperator_t liouvillian;       // (-i * (H1 + f(t) * H2) * rho) + (i * rho * (H1 + f(t) * H2)) + D1
 64
 65public:
 66
 67  // Constructor constructs a user-defined Liouvillian operator
 68  UserDefinedLiouvillian(cudensitymatHandle_t contextHandle,              // library context handle
 69                         const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
 70    handle(contextHandle), spaceShape(hilbertSpaceShape)
 71  {
 72    // Define the necessary elementary tensors in GPU memory (F-order storage!)
 73    spinXelems = createArrayGPU<std::complex<double>>(
 74                  {{0.0, 0.0}, {1.0, 0.0},   // 1st column of matrix X
 75                   {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
 76
 77    spinYYelems = createArrayGPU<std::complex<double>>(  // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
 78                    {{0.0, 0.0},  {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0},  // 1st column of matrix YY
 79                     {0.0, 0.0},  {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0},   // 2nd column of matrix YY
 80                     {0.0, 0.0},  {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix YY
 81                     {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
 82
 83    spinZZelems = createArrayGPU<std::complex<double>>(  // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
 84                    {{1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {0.0, 0.0},   // 1st column of matrix ZZ
 85                     {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},   // 2nd column of matrix ZZ
 86                     {0.0, 0.0}, {0.0, 0.0},  {-1.0, 0.0}, {0.0, 0.0},   // 3rd column of matrix ZZ
 87                     {0.0, 0.0}, {0.0, 0.0},  {0.0, 0.0},  {1.0, 0.0}}); // 4th column of matrix ZZ
 88
 89    // Construct the necessary Elementary Tensor Operators
 90    //   X_i operator
 91    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
 92                        1,                                   // one-body operator
 93                        std::vector<int64_t>({2}).data(),    // acts in tensor space of shape {2}
 94                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
 95                        0,                                   // 0 for dense tensors
 96                        nullptr,                             // nullptr for dense tensors
 97                        CUDA_C_64F,                          // data type
 98                        spinXelems,                          // tensor elements in GPU memory
 99                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
100                        &spinX));                            // the created elementary tensor operator
101    //  ZZ_ij = Z_i * Z_j fused operator
102    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
103                        2,                                   // two-body operator
104                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
105                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
106                        0,                                   // 0 for dense tensors
107                        nullptr,                             // nullptr for dense tensors
108                        CUDA_C_64F,                          // data type
109                        spinZZelems,                         // tensor elements in GPU memory
110                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
111                        &spinZZ));                           // the created elementary tensor operator
112    //  YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
113    HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
114                        2,                                   // two-body operator
115                        std::vector<int64_t>({2,2}).data(),  // acts in tensor space of shape {2,2}
116                        CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
117                        0,                                   // 0 for dense tensors
118                        nullptr,                             // nullptr for dense tensors
119                        CUDA_C_64F,                          // data type
120                        spinYYelems,                         // tensor elements in GPU memory
121                        {nullptr, nullptr},                  // no tensor callback function (tensor is not time-dependent)
122                        &spinYY));                           // the created elementary tensor operator
123
124    // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
125    //  Create an empty operator term
126    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
127                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
128                        spaceShape.data(),                   // Hilbert space shape
129                        &oneBodyTerm));                      // the created empty operator term
130    //  Define the operator term
131    for (int32_t i = 0; i < spaceShape.size(); ++i) {
132      const double h_i = 1.0 / static_cast<double>(i+1);  // just some value (time-independent h_i coefficient)
133      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
134                          oneBodyTerm,
135                          1,                                                             // number of elementary tensor operators in the product
136                          std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
137                          std::vector<int32_t>({i}).data(),                              // space modes acted on by the operator product
138                          std::vector<int32_t>({0}).data(),                              // space mode action duality (0: from the left; 1: from the right)
139                          make_cuDoubleComplex(h_i, 0.0),                                // h_i constant coefficient: Always 64-bit-precision complex number
140                          {nullptr, nullptr}));                                          // no time-dependent coefficient associated with the operator product
141    }
142    //  Create an empty operator term
143    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
144                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
145                        spaceShape.data(),                   // Hilbert space shape
146                        &twoBodyTerm));                      // the created empty operator term
147    //  Define the operator term
148    for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
149      for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
150        const double g_ij = -1.0 / static_cast<double>(i + j + 1);  // just some value (time-independent g_ij coefficient)
151        HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
152                            twoBodyTerm,
153                            1,                                                              // number of elementary tensor operators in the product
154                            std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
155                            std::vector<int32_t>({i, j}).data(),                            // space modes acted on by the operator product
156                            std::vector<int32_t>({0, 0}).data(),                            // space mode action duality (0: from the left; 1: from the right)
157                            make_cuDoubleComplex(g_ij, 0.0),                                // g_ij constant coefficient: Always 64-bit-precision complex number
158                            {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
159      }
160    }
161    //  Create an empty operator term
162    HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
163                        spaceShape.size(),                   // Hilbert space rank (number of dimensions)
164                        spaceShape.data(),                   // Hilbert space shape
165                        &noiseTerm));                        // the created empty operator term
166    //  Define the operator term
167    for (int32_t i = 0; i < spaceShape.size(); ++i) {
168      HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
169                          noiseTerm,
170                          1,                                                              // number of elementary tensor operators in the product
171                          std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
172                          std::vector<int32_t>({i, i}).data(),                            // space modes acted on by the operator product (from different sides)
173                          std::vector<int32_t>({0, 1}).data(),                            // space mode action duality (0: from the left; 1: from the right)
174                          make_cuDoubleComplex(1.0, 0.0),                                 // default coefficient: Always 64-bit-precision complex number
175                          {nullptr, nullptr}));                                           // no time-dependent coefficient associated with the operator product
176    }
177
178    // Construct the full Liouvillian operator as a sum of the operator terms
179    //  Create an empty operator (super-operator)
180    HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
181                        spaceShape.size(),               // Hilbert space rank (number of dimensions)
182                        spaceShape.data(),               // Hilbert space shape
183                        &liouvillian));                  // the created empty operator (super-operator)
184    //  Append an operator term to the operator (super-operator)
185    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
186                        liouvillian,
187                        oneBodyTerm,                     // appended operator term
188                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
189                        make_cuDoubleComplex(0.0, -1.0), // -i constant
190                        {nullptr, nullptr}));            // no time-dependent coefficient associated with the operator term as a whole
191    //  Append an operator term to the operator (super-operator)
192    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193                        liouvillian,
194                        twoBodyTerm,                     // appended operator term
195                        0,                               // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196                        make_cuDoubleComplex(0.0, -1.0), // -i constant
197                        {tdCoefComplex64, nullptr}));    // function callback defining the time-dependent coefficient associated with this operator term as a whole
198    //  Append an operator term to the operator (super-operator)
199    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200                        liouvillian,
201                        oneBodyTerm,                    // appended operator term
202                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203                        make_cuDoubleComplex(0.0, 1.0), // i constant
204                        {nullptr, nullptr}));           // no time-dependent coefficient associated with the operator term as a whole
205    //  Append an operator term to the operator (super-operator)
206    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207                        liouvillian,
208                        twoBodyTerm,                    // appended operator term
209                        1,                              // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210                        make_cuDoubleComplex(0.0, 1.0), // i constant
211                        {tdCoefComplex64, nullptr}));   // function callback defining the time-dependent coefficient associated with this operator term as a whole
212    //  Append an operator term to the operator (super-operator)
213    const double d = 0.42; // just some value (time-independent coefficient)
214    HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
215                        liouvillian,
216                        noiseTerm,                    // appended operator term
217                        0,                            // operator term action duality as a whole (no duality reversing in this case)
218                        make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
219                        {nullptr, nullptr}));         // no time-dependent coefficient associated with the operator term as a whole
220  }
221
222  // Destructor destructs the user-defined Liouvillian operator
223  ~UserDefinedLiouvillian()
224  {
225    // Destroy the Liouvillian operator
226    HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
227
228    // Destroy operator terms
229    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
230    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
231    HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
232
233    // Destroy elementary tensor operators
234    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
235    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
236    HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
237
238    // Destroy elementary tensors
239    destroyArrayGPU(spinYYelems);
240    destroyArrayGPU(spinZZelems);
241    destroyArrayGPU(spinXelems);
242  }
243
244  // Disable copy constructor/assignment (GPU resources are private, no deep copy)
245  UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
246  UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
247  UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
248  UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
249
250  // Get access to the constructed Liouvillian
251  cudensitymatOperator_t & get()
252  {
253    return liouvillian;
254  }
255
256};

现在我们可以在主代码中使用这个量子多体算符。

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15#include <cmath>
 16#include <complex>
 17#include <vector>
 18#include <chrono>
 19#include <iostream>
 20#include <cassert>
 21
 22
 23// Number of times to perform operator action on a quantum state
 24constexpr int NUM_REPEATS = 2;
 25
 26// Logging verbosity
 27bool verbose = true;
 28
 29
 30// Example workflow
 31void exampleWorkflow(cudensitymatHandle_t handle)
 32{
 33  // Define the composite Hilbert space shape and
 34  // quantum state batch size (number of individual quantum states)
 35  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 36  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 37
 38  if (verbose) {
 39    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 40    for (const auto & dimsn: spaceShape)
 41      std::cout << dimsn << ",";
 42    std::cout << ")" << std::endl;
 43    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 44  }
 45
 46  // Construct a user-defined Liouvillian operator using a convenience C++ class
 47  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 48  if (verbose)
 49    std::cout << "Constructed the Liouvillian operator\n";
 50
 51  // Declare the input quantum state
 52  cudensitymatState_t inputState;
 53  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 54                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 55                      spaceShape.size(),
 56                      spaceShape.data(),
 57                      batchSize,
 58                      CUDA_C_64F,  // data type must match that of the operators created above
 59                      &inputState));
 60
 61  // Query the size of the quantum state storage
 62  std::size_t storageSize {0}; // only one storage component (tensor) is needed
 63  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 64                      inputState,
 65                      1,               // only one storage component
 66                      &storageSize));  // storage size in bytes
 67  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 68  if (verbose)
 69    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 70
 71  // Prepare some initial value for the input quantum state
 72  std::vector<std::complex<double>> inputStateValue(stateVolume);
 73  for (std::size_t i = 0; i < stateVolume; ++i) {
 74    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 75  }
 76
 77  // Allocate initialized GPU storage for the input quantum state with prepared values
 78  auto * inputStateElems = createArrayGPU(inputStateValue);
 79
 80  // Attach initialized GPU storage to the input quantum state
 81  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 82                      inputState,
 83                      1,                                                 // only one storage component (tensor)
 84                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 85                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 86  if (verbose)
 87    std::cout << "Constructed input quantum state\n";
 88
 89  // Declare the output quantum state of the same shape
 90  cudensitymatState_t outputState;
 91  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 92                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 93                      spaceShape.size(),
 94                      spaceShape.data(),
 95                      batchSize,
 96                      CUDA_C_64F,  // data type must match that of the operators created above
 97                      &outputState));
 98
 99  // Allocate initialized GPU storage for the output quantum state
100  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
101
102  // Attach initialized GPU storage to the output quantum state
103  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
104                      outputState,
105                      1,                                                 // only one storage component (no tensor factorization)
106                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
107                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
108  if (verbose)
109    std::cout << "Constructed output quantum state\n";
110
111  // Declare a workspace descriptor
112  cudensitymatWorkspaceDescriptor_t workspaceDescr;
113  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
114
115  // Query free GPU memory
116  std::size_t freeMem = 0, totalMem = 0;
117  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
118  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
119  if (verbose)
120    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
121
122  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
123  const auto startTime = std::chrono::high_resolution_clock::now();
124  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
125                      liouvillian.get(),
126                      inputState,
127                      outputState,
128                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
129                      freeMem,                   // max available GPU free memory for the workspace
130                      workspaceDescr,            // workspace descriptor
131                      0x0));                     // default CUDA stream
132  const auto finishTime = std::chrono::high_resolution_clock::now();
133  const std::chrono::duration<double> timeSec = finishTime - startTime;
134  if (verbose)
135    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
136
137  // Query the required workspace buffer size (bytes)
138  std::size_t requiredBufferSize {0};
139  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
140                      workspaceDescr,
141                      CUDENSITYMAT_MEMSPACE_DEVICE,
142                      CUDENSITYMAT_WORKSPACE_SCRATCH,
143                      &requiredBufferSize));
144  if (verbose)
145    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
146
147  // Allocate GPU storage for the workspace buffer
148  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
149  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
150  if (verbose)
151    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
152
153  // Attach the workspace buffer to the workspace descriptor
154  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
155                      workspaceDescr,
156                      CUDENSITYMAT_MEMSPACE_DEVICE,
157                      CUDENSITYMAT_WORKSPACE_SCRATCH,
158                      workspaceBuffer,
159                      requiredBufferSize));
160  if (verbose)
161    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163  // Zero out the output quantum state
164  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
165                      outputState,
166                      0x0));
167  if (verbose)
168    std::cout << "Initialized the output state to zero\n";
169
170  // Apply the Liouvillian operator to the input quatum state
171  // and accumulate its action into the output quantum state (note += semantics)
172  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
173    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
174    const auto startTime = std::chrono::high_resolution_clock::now();
175    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
176                        liouvillian.get(),
177                        0.01,                                  // time point
178                        1,                                     // number of external user-defined Hamiltonian parameters
179                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
180                        inputState,                            // input quantum state
181                        outputState,                           // output quantum state
182                        workspaceDescr,                        // workspace descriptor
183                        0x0));                                 // default CUDA stream
184    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
185    const auto finishTime = std::chrono::high_resolution_clock::now();
186    const std::chrono::duration<double> timeSec = finishTime - startTime;
187    if (verbose)
188      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
189  }
190
191  // Compute the squared norm of the output quantum state
192  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
193  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
194                      outputState,
195                      norm2,
196                      0x0));
197  if (verbose)
198    std::cout << "Computed the output state norm\n";
199  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
200  destroyArrayGPU(norm2);
201
202  // Destroy workspace descriptor
203  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
204
205  // Destroy workspace buffer storage
206  destroyArrayGPU(workspaceBuffer);
207
208  // Destroy quantum states
209  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
210  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
211
212  // Destroy quantum state storage
213  destroyArrayGPU(outputStateElems);
214  destroyArrayGPU(inputStateElems);
215
216  if (verbose)
217    std::cout << "Destroyed resources\n" << std::flush;
218}
219
220
221int main(int argc, char ** argv)
222{
223  // Assign a GPU to the process
224  HANDLE_CUDA_ERROR(cudaSetDevice(0));
225  if (verbose)
226    std::cout << "Set active device\n";
227
228  // Create a library handle
229  cudensitymatHandle_t handle;
230  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
231  if (verbose)
232    std::cout << "Created a library handle\n";
233
234  // Run the example
235  exampleWorkflow(handle);
236
237  // Destroy the library handle
238  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
239  if (verbose)
240    std::cout << "Destroyed the library handle\n";
241
242  // Done
243  return 0;
244}

代码示例(多 GPU 上的并行执行)

适配 主串行代码 并启用跨多个/多个 GPU 设备(跨多个/多个节点)的并行执行非常简单。我们将用一个使用消息传递接口 (MPI) 作为通信层的示例来说明这一点。下面我们展示了为了启用分布式并行执行而需要进行的少量添加,而无需对原始串行源代码进行任何更改。

完整的示例代码可以在 NVIDIA/cuQuantum 存储库中找到(主 MPI 代码算符定义 以及实用程序代码)。

这是多 GPU 运行的更新后的主代码。

  1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
  2 *
  3 * SPDX-License-Identifier: BSD-3-Clause
  4 */
  5
  6#include <cudensitymat.h>  // cuDensityMat library header
  7#include "helpers.h"       // helper functions
  8
  9
 10// Transverse Ising Hamiltonian with double summation ordering
 11// and spin-operator fusion, plus fused dissipation terms
 12#include "transverse_ising_full_fused_noisy.h"  // user-defined Liouvillian operator example
 13
 14
 15// MPI library (optional)
 16#ifdef MPI_ENABLED
 17#include <mpi.h>
 18#endif
 19
 20#include <cmath>
 21#include <complex>
 22#include <vector>
 23#include <chrono>
 24#include <iostream>
 25#include <cassert>
 26
 27
 28// Number of times to perform operator action on a quantum state
 29constexpr int NUM_REPEATS = 2;
 30
 31// Logging verbosity
 32bool verbose = true;
 33
 34
 35// Example workflow
 36void exampleWorkflow(cudensitymatHandle_t handle)
 37{
 38  // Define the composite Hilbert space shape and
 39  // quantum state batch size (number of individual quantum states)
 40  const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
 41  const int64_t batchSize = 1;                              // number of quantum states per batch (default is 1)
 42
 43  if (verbose) {
 44    std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
 45    for (const auto & dimsn: spaceShape)
 46      std::cout << dimsn << ",";
 47    std::cout << ")" << std::endl;
 48    std::cout << "Quantum state batch size = " << batchSize << std::endl;
 49  }
 50
 51  // Construct a user-defined Liouvillian operator using a convenience C++ class
 52  UserDefinedLiouvillian liouvillian(handle, spaceShape);
 53  if (verbose)
 54    std::cout << "Constructed the Liouvillian operator\n";
 55
 56  // Declare the input quantum state
 57  cudensitymatState_t inputState;
 58  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 59                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 60                      spaceShape.size(),
 61                      spaceShape.data(),
 62                      batchSize,
 63                      CUDA_C_64F,  // data type must match that of the operators created above
 64                      &inputState));
 65
 66  // Query the size of the quantum state storage
 67  std::size_t storageSize {0}; // only one storage component (tensor) is needed
 68  HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
 69                      inputState,
 70                      1,               // only one storage component
 71                      &storageSize));  // storage size in bytes
 72  const std::size_t stateVolume = storageSize / sizeof(std::complex<double>);  // quantum state tensor volume (number of elements)
 73  if (verbose)
 74    std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
 75
 76  // Prepare some initial value for the input quantum state
 77  std::vector<std::complex<double>> inputStateValue(stateVolume);
 78  for (std::size_t i = 0; i < stateVolume; ++i) {
 79    inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
 80  }
 81
 82  // Allocate initialized GPU storage for the input quantum state with prepared values
 83  auto * inputStateElems = createArrayGPU(inputStateValue);
 84
 85  // Attach initialized GPU storage to the input quantum state
 86  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
 87                      inputState,
 88                      1,                                                 // only one storage component (tensor)
 89                      std::vector<void*>({inputStateElems}).data(),      // pointer to the GPU storage for the quantum state
 90                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
 91  if (verbose)
 92    std::cout << "Constructed input quantum state\n";
 93
 94  // Declare the output quantum state of the same shape
 95  cudensitymatState_t outputState;
 96  HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
 97                      CUDENSITYMAT_STATE_PURITY_MIXED,  // pure (state vector) or mixed (density matrix) state
 98                      spaceShape.size(),
 99                      spaceShape.data(),
100                      batchSize,
101                      CUDA_C_64F,  // data type must match that of the operators created above
102                      &outputState));
103
104  // Allocate initialized GPU storage for the output quantum state
105  auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107  // Attach initialized GPU storage to the output quantum state
108  HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109                      outputState,
110                      1,                                                 // only one storage component (no tensor factorization)
111                      std::vector<void*>({outputStateElems}).data(),     // pointer to the GPU storage for the quantum state
112                      std::vector<std::size_t>({storageSize}).data()));  // size of the GPU storage for the quantum state
113  if (verbose)
114    std::cout << "Constructed output quantum state\n";
115
116  // Declare a workspace descriptor
117  cudensitymatWorkspaceDescriptor_t workspaceDescr;
118  HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120  // Query free GPU memory
121  std::size_t freeMem = 0, totalMem = 0;
122  HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123  freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124  if (verbose)
125    std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127  // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128  const auto startTime = std::chrono::high_resolution_clock::now();
129  HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130                      liouvillian.get(),
131                      inputState,
132                      outputState,
133                      CUDENSITYMAT_COMPUTE_64F,  // GPU compute type
134                      freeMem,                   // max available GPU free memory for the workspace
135                      workspaceDescr,            // workspace descriptor
136                      0x0));                     // default CUDA stream
137  const auto finishTime = std::chrono::high_resolution_clock::now();
138  const std::chrono::duration<double> timeSec = finishTime - startTime;
139  if (verbose)
140    std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142  // Query the required workspace buffer size (bytes)
143  std::size_t requiredBufferSize {0};
144  HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145                      workspaceDescr,
146                      CUDENSITYMAT_MEMSPACE_DEVICE,
147                      CUDENSITYMAT_WORKSPACE_SCRATCH,
148                      &requiredBufferSize));
149  if (verbose)
150    std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152  // Allocate GPU storage for the workspace buffer
153  const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154  auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155  if (verbose)
156    std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158  // Attach the workspace buffer to the workspace descriptor
159  HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160                      workspaceDescr,
161                      CUDENSITYMAT_MEMSPACE_DEVICE,
162                      CUDENSITYMAT_WORKSPACE_SCRATCH,
163                      workspaceBuffer,
164                      requiredBufferSize));
165  if (verbose)
166    std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168  // Zero out the output quantum state
169  HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170                      outputState,
171                      0x0));
172  if (verbose)
173    std::cout << "Initialized the output state to zero\n";
174
175  // Apply the Liouvillian operator to the input quatum state
176  // and accumulate its action into the output quantum state (note += semantics)
177  for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179    const auto startTime = std::chrono::high_resolution_clock::now();
180    HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181                        liouvillian.get(),
182                        0.01,                                  // time point
183                        1,                                     // number of external user-defined Hamiltonian parameters
184                        std::vector<double>({13.42}).data(),   // Hamiltonian parameter(s)
185                        inputState,                            // input quantum state
186                        outputState,                           // output quantum state
187                        workspaceDescr,                        // workspace descriptor
188                        0x0));                                 // default CUDA stream
189    HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
190    const auto finishTime = std::chrono::high_resolution_clock::now();
191    const std::chrono::duration<double> timeSec = finishTime - startTime;
192    if (verbose)
193      std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
194  }
195
196  // Compute the squared norm of the output quantum state
197  void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
198  HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
199                      outputState,
200                      norm2,
201                      0x0));
202  if (verbose)
203    std::cout << "Computed the output state norm\n";
204  HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
205  destroyArrayGPU(norm2);
206
207  // Destroy workspace descriptor
208  HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
209
210  // Destroy workspace buffer storage
211  destroyArrayGPU(workspaceBuffer);
212
213  // Destroy quantum states
214  HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
215  HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
216
217  // Destroy quantum state storage
218  destroyArrayGPU(outputStateElems);
219  destroyArrayGPU(inputStateElems);
220
221  if (verbose)
222    std::cout << "Destroyed resources\n" << std::flush;
223}
224
225
226int main(int argc, char ** argv)
227{
228  // Initialize MPI library (if needed)
229#ifdef MPI_ENABLED
230  HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
231  int procRank {-1};
232  HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
233  int numProcs {0};
234  HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
235  if (procRank != 0) verbose = false;
236  if (verbose)
237    std::cout << "Initialized MPI library\n";
238#else
239  const int procRank {0};
240  const int numProcs {1};
241#endif
242
243  // Assign a GPU to the process
244  int numDevices {0};
245  HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
246  const int deviceId = procRank % numDevices;
247  HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
248  if (verbose)
249    std::cout << "Set active device\n";
250
251  // Create a library handle
252  cudensitymatHandle_t handle;
253  HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
254  if (verbose)
255    std::cout << "Created a library handle\n";
256
257  // Reset distributed configuration (once)
258#ifdef MPI_ENABLED
259  MPI_Comm comm;
260  HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
261  HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
262                      CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
263                      &comm, sizeof(comm)));
264#endif
265
266  // Run the example
267  exampleWorkflow(handle);
268
269  // Synchronize MPI processes
270#ifdef MPI_ENABLED
271  HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
272#endif
273
274  // Destroy the library handle
275  HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276  if (verbose)
277    std::cout << "Destroyed the library handle\n";
278
279  // Finalize the MPI library
280#ifdef MPI_ENABLED
281  HANDLE_MPI_ERROR(MPI_Finalize());
282  if (verbose)
283    std::cout << "Finalized MPI library\n";
284#endif
285
286  // Done
287  return 0;
288}

实用技巧

  • 对于调试,可以设置环境变量 CUDENSITYMAT_LOG_LEVEL=n。级别 n = 0, 1, …, 5 对应于下表描述的日志记录器级别。环境变量 CUDENSITYMAT_LOG_FILE=<filepath> 可用于将日志输出重定向到 <filepath> 处的自定义文件,而不是 stdout

级别

摘要

详细描述

0

关闭

禁用日志记录(默认)

1

错误

仅记录错误

2

性能跟踪

启动 CUDA 内核的 API 调用将记录其参数和重要信息

3

性能提示

可能提高应用程序性能的提示

4

启发式跟踪

提供关于库执行的常规信息,可能包含关于启发式状态的详细信息

5

API 跟踪

API 调用将记录其参数和重要信息