示例¶
在本节中,我们将展示一个关于如何定义量子算符、量子态,然后计算量子算符对量子态的作用的示例。为了清晰起见,量子算符在单独的头文件 transverse_ising_full_fused_noisy.h
中定义,并在辅助 C++ 类 UserDefinedLiouvillian
中包装。我们还提供了一个实用程序头文件 helpers.h
,其中包含方便的 GPU 数组实例化函数。
编译代码¶
假设 cuQuantum 已解压到 CUQUANTUM_ROOT
,cuTENSOR 已解压到 CUTENSOR_ROOT
,我们按如下方式更新库路径
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/12:${LD_LIBRARY_PATH}
根据您的 CUDA 工具包,您可能需要选择不同的库版本(例如,${CUTENSOR_ROOT}/lib/11
)。
下面讨论的串行示例代码 (operator_action_example.cpp
) 可以通过以下命令编译
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -o operator_action_example
为了静态链接到 cuDensityMat 库,请使用以下命令
nvcc operator_action_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include ${CUQUANTUM_ROOT}/lib/libcudensitymat_static.a -L${CUTENSOR_DIR}/lib/12 -lcutensor -o operator_action_example
为了构建示例 operator_action_mpi_example.cpp
的并行 (MPI) 版本,您需要安装一个CUDA-aware MPI 库(例如,最新的 OpenMPI、MPICH 或 MVAPICH),然后设置环境变量 $CUDENSITYMAT_COMM_LIB
为 MPI 接口包装器共享库 libcudensitymat_distributed_interface_mpi.so
的路径。MPI 接口包装器共享库 libcudensitymat_distributed_interface_mpi.so
可以在 ${CUQUANTUM_ROOT}/distributed_interfaces
文件夹内通过调用提供的构建脚本来构建。为了将可执行文件链接到 CUDA-aware MPI 库,您需要将 -I${MPI_PATH}/include
和 -L${MPI_PATH}/lib -lmpi
添加到构建命令
nvcc operator_action_mpi_example.cpp -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -I${MPI_PATH}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/12 -lcudensitymat -lcutensor -L${MPI_PATH}/lib -lmpi -o operator_action_mpi_example
警告
在没有 CUDA-aware MPI 的情况下运行 operator_action_mpi_example.cpp
,程序将崩溃。
注意
根据 cuQuantum 包的来源,您可能需要将上面的 lib
替换为 lib64
,具体取决于您的 cuQuantum 包中使用的文件夹名称。
代码示例(单 GPU 上的串行执行)¶
以下代码示例说明了使用 cuDensityMat 库计算量子多体算符对量子态的作用所需的常用步骤。完整的示例代码可以在 NVIDIA/cuQuantum 存储库中找到(主串行代码 和 算符定义 以及实用程序代码)。
首先,让我们介绍一个辅助类来构建量子多体算符,例如,具有融合 ZZ 项和附加噪声项的横向场 Ising 哈密顿量。
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#pragma once
7
8#include <cudensitymat.h> // cuDensityMat library header
9#include "helpers.h" // helper functions
10
11#include <cmath>
12#include <complex>
13#include <vector>
14#include <iostream>
15#include <cassert>
16
17
18/* Time-dependent transverse-field Ising Hamiltonian operator
19 with ordered and fused ZZ terms, plus fused unitary dissipation terms:
20 H = sum_{i} {h_i * X_i} // transverse field sum of X_i
21 + f(t) * sum_{i < j} {g_ij * ZZ_ij} // modulated sum of fused {Z_i * Z_j} terms
22 + d * sum_{i} {Y_i * {..} * Y_i} // dissipation terms {Y_i * {..} * Y_i} will be fused into the YY_ii super-operator
23 where {..} is the placeholder for the density matrix to show that the operators act from a different side
24*/
25
26
27// User-defined C++ callback function defining a time-dependent coefficient inside the Hamiltonian:
28// f(t) = cos(omega * t) + i * sin(omega * t)
29extern "C"
30int32_t tdCoefComplex64(double time, // time point
31 int32_t numParams, // number of external user-defined Liouvillian parameters (= 1 here)
32 const double params[], // params[0] is omega (user-defined Liouvillian parameter)
33 cudaDataType_t dataType, // data type (CUDA_C_64F here)
34 void * scalarStorage) // CPU storage for the returned function value
35{
36 const auto omega = params[0];
37 auto * tdCoef = static_cast<std::complex<double>*>(scalarStorage); // casting to complex<double> because it returns CUDA_C_64F data type
38 *tdCoef = {std::cos(omega * time), std::sin(omega * time)};
39 return 0; // error code (0: Success)
40}
41
42
43/** Convenience class to encapsulate the Liouvillian operator:
44 * - Constructor constructs the desired Liouvillian operator (`cudensitymatOperator_t`)
45 * - get() method returns a reference to the constructed Liouvillian operator
46 * - Destructor releases all resources used by the Liouvillian operator
47 */
48class UserDefinedLiouvillian final
49{
50private:
51 // Data members
52 cudensitymatHandle_t handle; // library context handle
53 const std::vector<int64_t> spaceShape; // Hilbert space shape
54 void * spinXelems {nullptr}; // elements of the X spin operator in GPU RAM
55 void * spinYYelems {nullptr}; // elements of the YY two-spin operator in GPU RAM
56 void * spinZZelems {nullptr}; // elements of the ZZ two-spin operator in GPU RAM
57 cudensitymatElementaryOperator_t spinX; // X spin operator
58 cudensitymatElementaryOperator_t spinYY; // YY two-spin operator
59 cudensitymatElementaryOperator_t spinZZ; // ZZ two-spin operator
60 cudensitymatOperatorTerm_t oneBodyTerm; // operator term: H1 = sum_{i} {h_i * X_i}
61 cudensitymatOperatorTerm_t twoBodyTerm; // operator term: H2 = f(t) * sum_{i < j} {g_ij * ZZ_ij}
62 cudensitymatOperatorTerm_t noiseTerm; // operator term: D1 = d * sum_{i} {YY_ii} // Y_i operators act from different sides on the density matrix
63 cudensitymatOperator_t liouvillian; // (-i * (H1 + f(t) * H2) * rho) + (i * rho * (H1 + f(t) * H2)) + D1
64
65public:
66
67 // Constructor constructs a user-defined Liouvillian operator
68 UserDefinedLiouvillian(cudensitymatHandle_t contextHandle, // library context handle
69 const std::vector<int64_t> & hilbertSpaceShape): // Hilbert space shape
70 handle(contextHandle), spaceShape(hilbertSpaceShape)
71 {
72 // Define the necessary elementary tensors in GPU memory (F-order storage!)
73 spinXelems = createArrayGPU<std::complex<double>>(
74 {{0.0, 0.0}, {1.0, 0.0}, // 1st column of matrix X
75 {1.0, 0.0}, {0.0, 0.0}}); // 2nd column of matrix X
76
77 spinYYelems = createArrayGPU<std::complex<double>>( // YY[i0, i1; j0, j1] := Y[i0; j0] * Y[i1; j1]
78 {{0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, // 1st column of matrix YY
79 {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix YY
80 {0.0, 0.0}, {1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix YY
81 {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}}); // 4th column of matrix YY
82
83 spinZZelems = createArrayGPU<std::complex<double>>( // ZZ[i0, i1; j0, j1] := Z[i0; j0] * Z[i1; j1]
84 {{1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 1st column of matrix ZZ
85 {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, // 2nd column of matrix ZZ
86 {0.0, 0.0}, {0.0, 0.0}, {-1.0, 0.0}, {0.0, 0.0}, // 3rd column of matrix ZZ
87 {0.0, 0.0}, {0.0, 0.0}, {0.0, 0.0}, {1.0, 0.0}}); // 4th column of matrix ZZ
88
89 // Construct the necessary Elementary Tensor Operators
90 // X_i operator
91 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
92 1, // one-body operator
93 std::vector<int64_t>({2}).data(), // acts in tensor space of shape {2}
94 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
95 0, // 0 for dense tensors
96 nullptr, // nullptr for dense tensors
97 CUDA_C_64F, // data type
98 spinXelems, // tensor elements in GPU memory
99 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
100 &spinX)); // the created elementary tensor operator
101 // ZZ_ij = Z_i * Z_j fused operator
102 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
103 2, // two-body operator
104 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
105 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
106 0, // 0 for dense tensors
107 nullptr, // nullptr for dense tensors
108 CUDA_C_64F, // data type
109 spinZZelems, // tensor elements in GPU memory
110 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
111 &spinZZ)); // the created elementary tensor operator
112 // YY_ii = Y_i * {..} * Y_i fused operator (note action from different sides)
113 HANDLE_CUDM_ERROR(cudensitymatCreateElementaryOperator(handle,
114 2, // two-body operator
115 std::vector<int64_t>({2,2}).data(), // acts in tensor space of shape {2,2}
116 CUDENSITYMAT_OPERATOR_SPARSITY_NONE, // dense tensor storage
117 0, // 0 for dense tensors
118 nullptr, // nullptr for dense tensors
119 CUDA_C_64F, // data type
120 spinYYelems, // tensor elements in GPU memory
121 {nullptr, nullptr}, // no tensor callback function (tensor is not time-dependent)
122 &spinYY)); // the created elementary tensor operator
123
124 // Construct the necessary Operator Terms from direct products of Elementary Tensor Operators
125 // Create an empty operator term
126 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
127 spaceShape.size(), // Hilbert space rank (number of dimensions)
128 spaceShape.data(), // Hilbert space shape
129 &oneBodyTerm)); // the created empty operator term
130 // Define the operator term
131 for (int32_t i = 0; i < spaceShape.size(); ++i) {
132 const double h_i = 1.0 / static_cast<double>(i+1); // just some value (time-independent h_i coefficient)
133 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
134 oneBodyTerm,
135 1, // number of elementary tensor operators in the product
136 std::vector<cudensitymatElementaryOperator_t>({spinX}).data(), // elementary tensor operators forming the product
137 std::vector<int32_t>({i}).data(), // space modes acted on by the operator product
138 std::vector<int32_t>({0}).data(), // space mode action duality (0: from the left; 1: from the right)
139 make_cuDoubleComplex(h_i, 0.0), // h_i constant coefficient: Always 64-bit-precision complex number
140 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
141 }
142 // Create an empty operator term
143 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
144 spaceShape.size(), // Hilbert space rank (number of dimensions)
145 spaceShape.data(), // Hilbert space shape
146 &twoBodyTerm)); // the created empty operator term
147 // Define the operator term
148 for (int32_t i = 0; i < spaceShape.size() - 1; ++i) {
149 for (int32_t j = (i + 1); j < spaceShape.size(); ++j) {
150 const double g_ij = -1.0 / static_cast<double>(i + j + 1); // just some value (time-independent g_ij coefficient)
151 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
152 twoBodyTerm,
153 1, // number of elementary tensor operators in the product
154 std::vector<cudensitymatElementaryOperator_t>({spinZZ}).data(), // elementary tensor operators forming the product
155 std::vector<int32_t>({i, j}).data(), // space modes acted on by the operator product
156 std::vector<int32_t>({0, 0}).data(), // space mode action duality (0: from the left; 1: from the right)
157 make_cuDoubleComplex(g_ij, 0.0), // g_ij constant coefficient: Always 64-bit-precision complex number
158 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
159 }
160 }
161 // Create an empty operator term
162 HANDLE_CUDM_ERROR(cudensitymatCreateOperatorTerm(handle,
163 spaceShape.size(), // Hilbert space rank (number of dimensions)
164 spaceShape.data(), // Hilbert space shape
165 &noiseTerm)); // the created empty operator term
166 // Define the operator term
167 for (int32_t i = 0; i < spaceShape.size(); ++i) {
168 HANDLE_CUDM_ERROR(cudensitymatOperatorTermAppendElementaryProduct(handle,
169 noiseTerm,
170 1, // number of elementary tensor operators in the product
171 std::vector<cudensitymatElementaryOperator_t>({spinYY}).data(), // elementary tensor operators forming the product
172 std::vector<int32_t>({i, i}).data(), // space modes acted on by the operator product (from different sides)
173 std::vector<int32_t>({0, 1}).data(), // space mode action duality (0: from the left; 1: from the right)
174 make_cuDoubleComplex(1.0, 0.0), // default coefficient: Always 64-bit-precision complex number
175 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator product
176 }
177
178 // Construct the full Liouvillian operator as a sum of the operator terms
179 // Create an empty operator (super-operator)
180 HANDLE_CUDM_ERROR(cudensitymatCreateOperator(handle,
181 spaceShape.size(), // Hilbert space rank (number of dimensions)
182 spaceShape.data(), // Hilbert space shape
183 &liouvillian)); // the created empty operator (super-operator)
184 // Append an operator term to the operator (super-operator)
185 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
186 liouvillian,
187 oneBodyTerm, // appended operator term
188 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
189 make_cuDoubleComplex(0.0, -1.0), // -i constant
190 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
191 // Append an operator term to the operator (super-operator)
192 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
193 liouvillian,
194 twoBodyTerm, // appended operator term
195 0, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
196 make_cuDoubleComplex(0.0, -1.0), // -i constant
197 {tdCoefComplex64, nullptr})); // function callback defining the time-dependent coefficient associated with this operator term as a whole
198 // Append an operator term to the operator (super-operator)
199 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
200 liouvillian,
201 oneBodyTerm, // appended operator term
202 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
203 make_cuDoubleComplex(0.0, 1.0), // i constant
204 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
205 // Append an operator term to the operator (super-operator)
206 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
207 liouvillian,
208 twoBodyTerm, // appended operator term
209 1, // operator term action duality as a whole (0: acting from the left; 1: acting from the right)
210 make_cuDoubleComplex(0.0, 1.0), // i constant
211 {tdCoefComplex64, nullptr})); // function callback defining the time-dependent coefficient associated with this operator term as a whole
212 // Append an operator term to the operator (super-operator)
213 const double d = 0.42; // just some value (time-independent coefficient)
214 HANDLE_CUDM_ERROR(cudensitymatOperatorAppendTerm(handle,
215 liouvillian,
216 noiseTerm, // appended operator term
217 0, // operator term action duality as a whole (no duality reversing in this case)
218 make_cuDoubleComplex(d, 0.0), // constant coefficient associated with the operator term as a whole
219 {nullptr, nullptr})); // no time-dependent coefficient associated with the operator term as a whole
220 }
221
222 // Destructor destructs the user-defined Liouvillian operator
223 ~UserDefinedLiouvillian()
224 {
225 // Destroy the Liouvillian operator
226 HANDLE_CUDM_ERROR(cudensitymatDestroyOperator(liouvillian));
227
228 // Destroy operator terms
229 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(noiseTerm));
230 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(twoBodyTerm));
231 HANDLE_CUDM_ERROR(cudensitymatDestroyOperatorTerm(oneBodyTerm));
232
233 // Destroy elementary tensor operators
234 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinYY));
235 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinZZ));
236 HANDLE_CUDM_ERROR(cudensitymatDestroyElementaryOperator(spinX));
237
238 // Destroy elementary tensors
239 destroyArrayGPU(spinYYelems);
240 destroyArrayGPU(spinZZelems);
241 destroyArrayGPU(spinXelems);
242 }
243
244 // Disable copy constructor/assignment (GPU resources are private, no deep copy)
245 UserDefinedLiouvillian(const UserDefinedLiouvillian &) = delete;
246 UserDefinedLiouvillian & operator=(const UserDefinedLiouvillian &) = delete;
247 UserDefinedLiouvillian(UserDefinedLiouvillian &&) noexcept = default;
248 UserDefinedLiouvillian & operator=(UserDefinedLiouvillian &&) noexcept = default;
249
250 // Get access to the constructed Liouvillian
251 cudensitymatOperator_t & get()
252 {
253 return liouvillian;
254 }
255
256};
现在我们可以在主代码中使用这个量子多体算符。
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15#include <cmath>
16#include <complex>
17#include <vector>
18#include <chrono>
19#include <iostream>
20#include <cassert>
21
22
23// Number of times to perform operator action on a quantum state
24constexpr int NUM_REPEATS = 2;
25
26// Logging verbosity
27bool verbose = true;
28
29
30// Example workflow
31void exampleWorkflow(cudensitymatHandle_t handle)
32{
33 // Define the composite Hilbert space shape and
34 // quantum state batch size (number of individual quantum states)
35 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
36 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
37
38 if (verbose) {
39 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
40 for (const auto & dimsn: spaceShape)
41 std::cout << dimsn << ",";
42 std::cout << ")" << std::endl;
43 std::cout << "Quantum state batch size = " << batchSize << std::endl;
44 }
45
46 // Construct a user-defined Liouvillian operator using a convenience C++ class
47 UserDefinedLiouvillian liouvillian(handle, spaceShape);
48 if (verbose)
49 std::cout << "Constructed the Liouvillian operator\n";
50
51 // Declare the input quantum state
52 cudensitymatState_t inputState;
53 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
54 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
55 spaceShape.size(),
56 spaceShape.data(),
57 batchSize,
58 CUDA_C_64F, // data type must match that of the operators created above
59 &inputState));
60
61 // Query the size of the quantum state storage
62 std::size_t storageSize {0}; // only one storage component (tensor) is needed
63 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
64 inputState,
65 1, // only one storage component
66 &storageSize)); // storage size in bytes
67 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
68 if (verbose)
69 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
70
71 // Prepare some initial value for the input quantum state
72 std::vector<std::complex<double>> inputStateValue(stateVolume);
73 for (std::size_t i = 0; i < stateVolume; ++i) {
74 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
75 }
76
77 // Allocate initialized GPU storage for the input quantum state with prepared values
78 auto * inputStateElems = createArrayGPU(inputStateValue);
79
80 // Attach initialized GPU storage to the input quantum state
81 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
82 inputState,
83 1, // only one storage component (tensor)
84 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
85 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
86 if (verbose)
87 std::cout << "Constructed input quantum state\n";
88
89 // Declare the output quantum state of the same shape
90 cudensitymatState_t outputState;
91 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
92 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
93 spaceShape.size(),
94 spaceShape.data(),
95 batchSize,
96 CUDA_C_64F, // data type must match that of the operators created above
97 &outputState));
98
99 // Allocate initialized GPU storage for the output quantum state
100 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
101
102 // Attach initialized GPU storage to the output quantum state
103 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
104 outputState,
105 1, // only one storage component (no tensor factorization)
106 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
107 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
108 if (verbose)
109 std::cout << "Constructed output quantum state\n";
110
111 // Declare a workspace descriptor
112 cudensitymatWorkspaceDescriptor_t workspaceDescr;
113 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
114
115 // Query free GPU memory
116 std::size_t freeMem = 0, totalMem = 0;
117 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
118 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
119 if (verbose)
120 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
121
122 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
123 const auto startTime = std::chrono::high_resolution_clock::now();
124 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
125 liouvillian.get(),
126 inputState,
127 outputState,
128 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
129 freeMem, // max available GPU free memory for the workspace
130 workspaceDescr, // workspace descriptor
131 0x0)); // default CUDA stream
132 const auto finishTime = std::chrono::high_resolution_clock::now();
133 const std::chrono::duration<double> timeSec = finishTime - startTime;
134 if (verbose)
135 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
136
137 // Query the required workspace buffer size (bytes)
138 std::size_t requiredBufferSize {0};
139 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
140 workspaceDescr,
141 CUDENSITYMAT_MEMSPACE_DEVICE,
142 CUDENSITYMAT_WORKSPACE_SCRATCH,
143 &requiredBufferSize));
144 if (verbose)
145 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
146
147 // Allocate GPU storage for the workspace buffer
148 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
149 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
150 if (verbose)
151 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
152
153 // Attach the workspace buffer to the workspace descriptor
154 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
155 workspaceDescr,
156 CUDENSITYMAT_MEMSPACE_DEVICE,
157 CUDENSITYMAT_WORKSPACE_SCRATCH,
158 workspaceBuffer,
159 requiredBufferSize));
160 if (verbose)
161 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
162
163 // Zero out the output quantum state
164 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
165 outputState,
166 0x0));
167 if (verbose)
168 std::cout << "Initialized the output state to zero\n";
169
170 // Apply the Liouvillian operator to the input quatum state
171 // and accumulate its action into the output quantum state (note += semantics)
172 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
173 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
174 const auto startTime = std::chrono::high_resolution_clock::now();
175 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
176 liouvillian.get(),
177 0.01, // time point
178 1, // number of external user-defined Hamiltonian parameters
179 std::vector<double>({13.42}).data(), // Hamiltonian parameter(s)
180 inputState, // input quantum state
181 outputState, // output quantum state
182 workspaceDescr, // workspace descriptor
183 0x0)); // default CUDA stream
184 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
185 const auto finishTime = std::chrono::high_resolution_clock::now();
186 const std::chrono::duration<double> timeSec = finishTime - startTime;
187 if (verbose)
188 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
189 }
190
191 // Compute the squared norm of the output quantum state
192 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
193 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
194 outputState,
195 norm2,
196 0x0));
197 if (verbose)
198 std::cout << "Computed the output state norm\n";
199 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
200 destroyArrayGPU(norm2);
201
202 // Destroy workspace descriptor
203 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
204
205 // Destroy workspace buffer storage
206 destroyArrayGPU(workspaceBuffer);
207
208 // Destroy quantum states
209 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
210 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
211
212 // Destroy quantum state storage
213 destroyArrayGPU(outputStateElems);
214 destroyArrayGPU(inputStateElems);
215
216 if (verbose)
217 std::cout << "Destroyed resources\n" << std::flush;
218}
219
220
221int main(int argc, char ** argv)
222{
223 // Assign a GPU to the process
224 HANDLE_CUDA_ERROR(cudaSetDevice(0));
225 if (verbose)
226 std::cout << "Set active device\n";
227
228 // Create a library handle
229 cudensitymatHandle_t handle;
230 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
231 if (verbose)
232 std::cout << "Created a library handle\n";
233
234 // Run the example
235 exampleWorkflow(handle);
236
237 // Destroy the library handle
238 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
239 if (verbose)
240 std::cout << "Destroyed the library handle\n";
241
242 // Done
243 return 0;
244}
代码示例(多 GPU 上的并行执行)¶
适配 主串行代码 并启用跨多个/多个 GPU 设备(跨多个/多个节点)的并行执行非常简单。我们将用一个使用消息传递接口 (MPI) 作为通信层的示例来说明这一点。下面我们展示了为了启用分布式并行执行而需要进行的少量添加,而无需对原始串行源代码进行任何更改。
完整的示例代码可以在 NVIDIA/cuQuantum 存储库中找到(主 MPI 代码 和 算符定义 以及实用程序代码)。
这是多 GPU 运行的更新后的主代码。
1/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.
2 *
3 * SPDX-License-Identifier: BSD-3-Clause
4 */
5
6#include <cudensitymat.h> // cuDensityMat library header
7#include "helpers.h" // helper functions
8
9
10// Transverse Ising Hamiltonian with double summation ordering
11// and spin-operator fusion, plus fused dissipation terms
12#include "transverse_ising_full_fused_noisy.h" // user-defined Liouvillian operator example
13
14
15// MPI library (optional)
16#ifdef MPI_ENABLED
17#include <mpi.h>
18#endif
19
20#include <cmath>
21#include <complex>
22#include <vector>
23#include <chrono>
24#include <iostream>
25#include <cassert>
26
27
28// Number of times to perform operator action on a quantum state
29constexpr int NUM_REPEATS = 2;
30
31// Logging verbosity
32bool verbose = true;
33
34
35// Example workflow
36void exampleWorkflow(cudensitymatHandle_t handle)
37{
38 // Define the composite Hilbert space shape and
39 // quantum state batch size (number of individual quantum states)
40 const std::vector<int64_t> spaceShape({2,2,2,2,2,2,2,2}); // dimensions of quantum degrees of freedom
41 const int64_t batchSize = 1; // number of quantum states per batch (default is 1)
42
43 if (verbose) {
44 std::cout << "Hilbert space rank = " << spaceShape.size() << "; Shape = (";
45 for (const auto & dimsn: spaceShape)
46 std::cout << dimsn << ",";
47 std::cout << ")" << std::endl;
48 std::cout << "Quantum state batch size = " << batchSize << std::endl;
49 }
50
51 // Construct a user-defined Liouvillian operator using a convenience C++ class
52 UserDefinedLiouvillian liouvillian(handle, spaceShape);
53 if (verbose)
54 std::cout << "Constructed the Liouvillian operator\n";
55
56 // Declare the input quantum state
57 cudensitymatState_t inputState;
58 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
59 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
60 spaceShape.size(),
61 spaceShape.data(),
62 batchSize,
63 CUDA_C_64F, // data type must match that of the operators created above
64 &inputState));
65
66 // Query the size of the quantum state storage
67 std::size_t storageSize {0}; // only one storage component (tensor) is needed
68 HANDLE_CUDM_ERROR(cudensitymatStateGetComponentStorageSize(handle,
69 inputState,
70 1, // only one storage component
71 &storageSize)); // storage size in bytes
72 const std::size_t stateVolume = storageSize / sizeof(std::complex<double>); // quantum state tensor volume (number of elements)
73 if (verbose)
74 std::cout << "Quantum state storage size (bytes) = " << storageSize << std::endl;
75
76 // Prepare some initial value for the input quantum state
77 std::vector<std::complex<double>> inputStateValue(stateVolume);
78 for (std::size_t i = 0; i < stateVolume; ++i) {
79 inputStateValue[i] = std::complex<double>{double(i+1), double(-(i+2))}; // just some value
80 }
81
82 // Allocate initialized GPU storage for the input quantum state with prepared values
83 auto * inputStateElems = createArrayGPU(inputStateValue);
84
85 // Attach initialized GPU storage to the input quantum state
86 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
87 inputState,
88 1, // only one storage component (tensor)
89 std::vector<void*>({inputStateElems}).data(), // pointer to the GPU storage for the quantum state
90 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
91 if (verbose)
92 std::cout << "Constructed input quantum state\n";
93
94 // Declare the output quantum state of the same shape
95 cudensitymatState_t outputState;
96 HANDLE_CUDM_ERROR(cudensitymatCreateState(handle,
97 CUDENSITYMAT_STATE_PURITY_MIXED, // pure (state vector) or mixed (density matrix) state
98 spaceShape.size(),
99 spaceShape.data(),
100 batchSize,
101 CUDA_C_64F, // data type must match that of the operators created above
102 &outputState));
103
104 // Allocate initialized GPU storage for the output quantum state
105 auto * outputStateElems = createArrayGPU(std::vector<std::complex<double>>(stateVolume, {0.0, 0.0}));
106
107 // Attach initialized GPU storage to the output quantum state
108 HANDLE_CUDM_ERROR(cudensitymatStateAttachComponentStorage(handle,
109 outputState,
110 1, // only one storage component (no tensor factorization)
111 std::vector<void*>({outputStateElems}).data(), // pointer to the GPU storage for the quantum state
112 std::vector<std::size_t>({storageSize}).data())); // size of the GPU storage for the quantum state
113 if (verbose)
114 std::cout << "Constructed output quantum state\n";
115
116 // Declare a workspace descriptor
117 cudensitymatWorkspaceDescriptor_t workspaceDescr;
118 HANDLE_CUDM_ERROR(cudensitymatCreateWorkspace(handle, &workspaceDescr));
119
120 // Query free GPU memory
121 std::size_t freeMem = 0, totalMem = 0;
122 HANDLE_CUDA_ERROR(cudaMemGetInfo(&freeMem, &totalMem));
123 freeMem = static_cast<std::size_t>(static_cast<double>(freeMem) * 0.95); // take 95% of the free memory for the workspace buffer
124 if (verbose)
125 std::cout << "Max workspace buffer size (bytes) = " << freeMem << std::endl;
126
127 // Prepare the Liouvillian operator action on a quantum state (needs to be done only once)
128 const auto startTime = std::chrono::high_resolution_clock::now();
129 HANDLE_CUDM_ERROR(cudensitymatOperatorPrepareAction(handle,
130 liouvillian.get(),
131 inputState,
132 outputState,
133 CUDENSITYMAT_COMPUTE_64F, // GPU compute type
134 freeMem, // max available GPU free memory for the workspace
135 workspaceDescr, // workspace descriptor
136 0x0)); // default CUDA stream
137 const auto finishTime = std::chrono::high_resolution_clock::now();
138 const std::chrono::duration<double> timeSec = finishTime - startTime;
139 if (verbose)
140 std::cout << "Operator action prepation time (sec) = " << timeSec.count() << std::endl;
141
142 // Query the required workspace buffer size (bytes)
143 std::size_t requiredBufferSize {0};
144 HANDLE_CUDM_ERROR(cudensitymatWorkspaceGetMemorySize(handle,
145 workspaceDescr,
146 CUDENSITYMAT_MEMSPACE_DEVICE,
147 CUDENSITYMAT_WORKSPACE_SCRATCH,
148 &requiredBufferSize));
149 if (verbose)
150 std::cout << "Required workspace buffer size (bytes) = " << requiredBufferSize << std::endl;
151
152 // Allocate GPU storage for the workspace buffer
153 const std::size_t bufferVolume = requiredBufferSize / sizeof(std::complex<double>);
154 auto * workspaceBuffer = createArrayGPU(std::vector<std::complex<double>>(bufferVolume, {0.0, 0.0}));
155 if (verbose)
156 std::cout << "Allocated workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
157
158 // Attach the workspace buffer to the workspace descriptor
159 HANDLE_CUDM_ERROR(cudensitymatWorkspaceSetMemory(handle,
160 workspaceDescr,
161 CUDENSITYMAT_MEMSPACE_DEVICE,
162 CUDENSITYMAT_WORKSPACE_SCRATCH,
163 workspaceBuffer,
164 requiredBufferSize));
165 if (verbose)
166 std::cout << "Attached workspace buffer of size (bytes) = " << requiredBufferSize << std::endl;
167
168 // Zero out the output quantum state
169 HANDLE_CUDM_ERROR(cudensitymatStateInitializeZero(handle,
170 outputState,
171 0x0));
172 if (verbose)
173 std::cout << "Initialized the output state to zero\n";
174
175 // Apply the Liouvillian operator to the input quatum state
176 // and accumulate its action into the output quantum state (note += semantics)
177 for (int32_t repeat = 0; repeat < NUM_REPEATS; ++repeat) { // repeat multiple times for accurate timing
178 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
179 const auto startTime = std::chrono::high_resolution_clock::now();
180 HANDLE_CUDM_ERROR(cudensitymatOperatorComputeAction(handle,
181 liouvillian.get(),
182 0.01, // time point
183 1, // number of external user-defined Hamiltonian parameters
184 std::vector<double>({13.42}).data(), // Hamiltonian parameter(s)
185 inputState, // input quantum state
186 outputState, // output quantum state
187 workspaceDescr, // workspace descriptor
188 0x0)); // default CUDA stream
189 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
190 const auto finishTime = std::chrono::high_resolution_clock::now();
191 const std::chrono::duration<double> timeSec = finishTime - startTime;
192 if (verbose)
193 std::cout << "Operator action computation time (sec) = " << timeSec.count() << std::endl;
194 }
195
196 // Compute the squared norm of the output quantum state
197 void * norm2 = createArrayGPU(std::vector<double>(batchSize, 0.0));
198 HANDLE_CUDM_ERROR(cudensitymatStateComputeNorm(handle,
199 outputState,
200 norm2,
201 0x0));
202 if (verbose)
203 std::cout << "Computed the output state norm\n";
204 HANDLE_CUDA_ERROR(cudaDeviceSynchronize());
205 destroyArrayGPU(norm2);
206
207 // Destroy workspace descriptor
208 HANDLE_CUDM_ERROR(cudensitymatDestroyWorkspace(workspaceDescr));
209
210 // Destroy workspace buffer storage
211 destroyArrayGPU(workspaceBuffer);
212
213 // Destroy quantum states
214 HANDLE_CUDM_ERROR(cudensitymatDestroyState(outputState));
215 HANDLE_CUDM_ERROR(cudensitymatDestroyState(inputState));
216
217 // Destroy quantum state storage
218 destroyArrayGPU(outputStateElems);
219 destroyArrayGPU(inputStateElems);
220
221 if (verbose)
222 std::cout << "Destroyed resources\n" << std::flush;
223}
224
225
226int main(int argc, char ** argv)
227{
228 // Initialize MPI library (if needed)
229#ifdef MPI_ENABLED
230 HANDLE_MPI_ERROR(MPI_Init(&argc, &argv));
231 int procRank {-1};
232 HANDLE_MPI_ERROR(MPI_Comm_rank(MPI_COMM_WORLD, &procRank));
233 int numProcs {0};
234 HANDLE_MPI_ERROR(MPI_Comm_size(MPI_COMM_WORLD, &numProcs));
235 if (procRank != 0) verbose = false;
236 if (verbose)
237 std::cout << "Initialized MPI library\n";
238#else
239 const int procRank {0};
240 const int numProcs {1};
241#endif
242
243 // Assign a GPU to the process
244 int numDevices {0};
245 HANDLE_CUDA_ERROR(cudaGetDeviceCount(&numDevices));
246 const int deviceId = procRank % numDevices;
247 HANDLE_CUDA_ERROR(cudaSetDevice(deviceId));
248 if (verbose)
249 std::cout << "Set active device\n";
250
251 // Create a library handle
252 cudensitymatHandle_t handle;
253 HANDLE_CUDM_ERROR(cudensitymatCreate(&handle));
254 if (verbose)
255 std::cout << "Created a library handle\n";
256
257 // Reset distributed configuration (once)
258#ifdef MPI_ENABLED
259 MPI_Comm comm;
260 HANDLE_MPI_ERROR(MPI_Comm_dup(MPI_COMM_WORLD, &comm));
261 HANDLE_CUDM_ERROR(cudensitymatResetDistributedConfiguration(handle,
262 CUDENSITYMAT_DISTRIBUTED_PROVIDER_MPI,
263 &comm, sizeof(comm)));
264#endif
265
266 // Run the example
267 exampleWorkflow(handle);
268
269 // Synchronize MPI processes
270#ifdef MPI_ENABLED
271 HANDLE_MPI_ERROR(MPI_Barrier(MPI_COMM_WORLD));
272#endif
273
274 // Destroy the library handle
275 HANDLE_CUDM_ERROR(cudensitymatDestroy(handle));
276 if (verbose)
277 std::cout << "Destroyed the library handle\n";
278
279 // Finalize the MPI library
280#ifdef MPI_ENABLED
281 HANDLE_MPI_ERROR(MPI_Finalize());
282 if (verbose)
283 std::cout << "Finalized MPI library\n";
284#endif
285
286 // Done
287 return 0;
288}
实用技巧¶
对于调试,可以设置环境变量
CUDENSITYMAT_LOG_LEVEL=n
。级别n
= 0, 1, …, 5 对应于下表描述的日志记录器级别。环境变量CUDENSITYMAT_LOG_FILE=<filepath>
可用于将日志输出重定向到<filepath>
处的自定义文件,而不是stdout
。
级别 |
摘要 |
详细描述 |
0 |
关闭 |
禁用日志记录(默认) |
1 |
错误 |
仅记录错误 |
2 |
性能跟踪 |
启动 CUDA 内核的 API 调用将记录其参数和重要信息 |
3 |
性能提示 |
可能提高应用程序性能的提示 |
4 |
启发式跟踪 |
提供关于库执行的常规信息,可能包含关于启发式状态的详细信息 |
5 |
API 跟踪 |
API 调用将记录其参数和重要信息 |