高级密度矩阵 API

pythonic API 紧密遵循密度矩阵 C-API 的结构,为不同的概念(例如量子态和算符)提供单独的对象。

WorkStream

pythonic API 的基础是 WorkStream 对象,它将一些配置设置、cuDensityMat 库句柄、CUDA 流、CUDA 设备和工作区缓冲区聚合到一个对象中。工作区缓冲区根据与 WorkStream 实例交互的其他 API 对象的要求自动分配。可以在创建 WorkStream 实例时指定分配的工作区缓冲区大小的上限。工作区缓冲区在显式调用 WorkStream.release_workspace() 方法之前不会释放。在一般用法中,每个设备应该只创建一个 WorkStream。其他 API 对象在其生命周期内只能与一个 WorkStream 交互。目前仅支持阻塞操作,即 API 调用仅在设备上完成任务后才返回。

from cuquantum.densitymat import WorkStream
ctx = WorkStream(device_id=0)

通过将其 WorkStream.set_communicator() 方法将通信器传递给 WorkStream,支持多 GPU 多节点执行。请注意,此步骤必须在 WorkStream 传递给任何其他 API 对象之前执行。请按照 cuDensityMat MPI 设置 中的说明设置 MGMN 执行的环境变量。除了设置通信器之外,多 GPU 执行产生任何影响的唯一地方是指定量子态,其系数将分布在所有 GPU 上。

from mpi4py import MPI
import cupy as cp
from cuquantum.densitymat import WorkStream

comm = MPI.COMM_WORLD.Dup()
rank = comm.Get_rank()
num_devices = cp.cuda.runtime.getDeviceCount()
dev_id = rank % num_devices
cp.cuda.Device(dev_id).use()
ctx = WorkStream(device_id=dev_id)
ctx.set_communicator(comm, provider="MPI")

张量空间和量子态

张量空间是作为一个或多个向量空间的张量积构造的复合线性空间。每个构成向量空间表示一个量子自由度,其维度等于向量空间的维度。复合张量空间的形状是构成向量空间的维度元组,按其原始顺序排列。请注意,张量空间的形状保持不变,无论我们处理纯态(位于张量空间中的向量)还是混合态(与同一张量空间关联的矩阵)。

纯量子态由来自指定张量空间的张量表示,因此纯量子态的形状与张量空间形状一致。混合量子态由属于张量空间的张量表示,该张量空间构造为指定张量空间(主空间或 ket 空间)及其对偶张量空间(对偶空间或 bra 空间)的张量积。因此,混合量子态的形状是两个张量空间形状的串联,其中张量模式的前半部分对应于 ket 空间,而后半部分张量模式对应于 bra 空间。例如,如果张量 \(V[i0, i1, i2]\) 表示来自由三个向量空间(量子自由度)构造的张量空间的纯量子态,则张量 \(D[i0, i1, i2; j0, j1, j2]\) 表示来自同一张量空间的混合量子态,其中模式 \(i0, i1, i2\) 是 ket 模式,而模式 \(j0, j1, j2\) 是 bra 模式。在这两种情况下,对应的张量空间形状都是 \([range(i0), range(i1), range(i2)]\)

在 pythonic API 中,纯(混合)量子态由 DensePureState ( DenseMixedState ) 的实例表示。下面,我们使用密集纯态 DensePureState 类作为示例来提供详细说明。密集混合态的工作流程是相同的,并且这两个类具有相同的方法和属性。

为了创建量子态,我们传递 WorkStream、希尔伯特空间维度、批量大小和数据类型

import cupy as cp
from cuquantum.densitymat import DensePureState

hilbert_space_dims = (2, 6, 4, 3)
batch_size = 1
dtype = "complex64"
psi_1 = DensePureState(ctx, hilbert_space_dims, batch_size, dtype)

创建的量子态尚无关联的物理存储。

为了将物理存储附加到量子态,有两种选择

psi_1.allocate_storage()

DensePureState.allocate_storage() 将分配一个具有 psi1.storage_size 元素的正确数据类型的 cupy.ndarray,并初始化为零。

size = psi_1.storage_size
psi_1.attach_storage(cp.zeros(size, dtype = psi_1.dtype))

DensePureState.attach_storage() 允许用户传递一个具有 DensePureState.storage_size 元素的 cupy.ndarray,该数组位于与传递给 WorkStream 的设备相同的设备上。对于单 GPU 计算,存储大小对应于状态的系数数量,并且传递的数组可以是多维的。在这种情况下,数组需要是 Fortran 顺序的、连续的,并且与希尔伯特空间维度和批量大小匹配。

对于多 GPU 设置,每个进程仅存储状态系数的子集,我们将其称为切片。切片的大小可能等于或小于存储缓冲区大小。在多 GPU 执行中,仅允许一维存储缓冲区。切片始终位于存储缓冲区的前 DensePureState.storage_size 个元素中。

DensePureState.storage 返回附加的存储缓冲区。对于总模式数少于 32 个的状态,DensePureState.view() 返回切片上的多维可写视图。对于更大的系统,由于 cupy.ndarray 不支持超过 32 个维度,因此此方法将失败。DensePureState.local_info 返回本地切片的形状以及沿每个维度的偏移量。解释状态的系数或修改它们(例如,指定初始状态)需要此信息。请注意,即使批量大小为 1,批量大小也始终报告为最后一个模式。

对于纯粹用于单 GPU 执行的脚本,DensePureState.storage_sizeDensePureState.local_info 都直接从希尔伯特空间维度和批量大小得出,用户可以省略这些步骤。请注意,通过传递大小为 DensePureState.storage_size 的一维数组,用户脚本将在单 GPU 和多 GPU 执行之间直接可移植。

对于具有相同希尔伯特空间维度和批量大小的状态,通过 DensePureState.clone() 提供了一种创建新实例并直接附加存储的便捷方法。

psi_2 = psi_1.clone(cp.zeros_like(psi_1.storage))

除了用户直接操作存储缓冲区之外,还有几种方法允许逐元素操作,DensePureState.inplace_scale()DensePureState.inplace_accumulate()DensePureStateDenseMixedState 实例上的单项和二元归约通过 DensePureState.norm()DensePureState.inner_product()DensePureState.trace() 提供。

量子多体算符(超算符)

广泛的量子 Liouvillian 超算符或一般的量子多体算符可以表示为基本张量算符的张量积之和,其中一些系数通常是时变的,其中基本张量算符是作用于特定数量的量子自由度的张量算符,可以从右侧或左侧或两侧作用。每个基本张量算符具有相同数量的 ket 模式和 bra 模式,其中 ket 模式在其张量中位于 bra 模式之前。基本张量算符的 bra 模式作用于量子态的 ket 模式(从左侧作用),而基本张量算符的 ket 模式作用于量子态的 bra 模式(从右侧作用)。在实践中,许多量子 Liouvillian 算符用单体基本张量算符的张量积表示,例如产生/湮灭算符、粒子数算符、坐标/动量算符、单个自旋算符等。为了具体起见,让我们考虑以下量子 Liouvillian 算符作为示例

\[L = \omega_{c} b^{\dagger} b + \frac{1}{2} \omega_{a} \sigma_{z} + [ \frac{1}{2} g(t, \alpha, \beta) b^{\dagger} \sigma_{-} + \frac{1}{2} g(t, \alpha, \beta) b \sigma_{+} ] + \gamma \sigma_{-}\]

在上述 Liouville 算符中,\(b\)\(b^{\dagger}\)\(\sigma_{z}\)\(\sigma_{-}\)\(\sigma_{+}\) 是基本张量算符,并且每一个都只作用于单个量子自由度,由二阶张量 \(U[i0; j0]\) 表示,其中 \(i0\) 是右矢模式,\(j0\) 是左矢模式(k 阶基本张量算符由具有 \(k\) 个右矢模式和 \(k\) 个左矢模式的张量表示)。\(\omega_{c}\)\(\omega_{a}\)\(\gamma\) 是常数标量。\(g(t, \alpha, \beta)\) 是一个随时间变化的标量系数,它取决于时间和两个实参数 \(\alpha\)\(\beta\),这两个参数参数化了 Liouville 算符。

基本算符

定义基本算符的张量可以直接作为 cupy.ndarray(在 WorkStream 的设备上)或 numpy.ndarray 提供。Pythonic API 也提供了 DenseOperatorMultiDiagonalOperator 对象。MultiDiagonalOperator 是作用于单个模式的算符,以稀疏 DIA 格式指定。目前不支持多对角张量(即覆盖多个模式)和主对角线上方或下方超过第一条次对角线的对角矩阵。DenseOperator 具有底层密集存储。这两个对象都允许选择一元和二元运算,并且还支持传入签名为 (t: float, args: Sequence) -> numpy.ndarray 的 CPU 回调函数,该函数动态设置张量存储的元素。无论张量是以 cupy.ndarray / numpy.ndarray 还是以 DenseOperator / MultidiagonalOperator 的形式传递,都会制作原始提供数组的副本。如果张量数据最初以 numpy.ndarray 形式提供,则它将在第一次将基本算符用作 OperatorOperatorAction 的一部分时移动到 GPU(WorkStream 的设备)。如果一个基本算符在量子多体算符的表达式中多次出现,则使用 DenseOperatorMultiDiagonalOperator 允许我们重用相同的实例,因此是首选方法。

用基本算符组合符号表达式

单个基本算符可以组合成乘积和它们的和。API 提供了几个层用于通过聚合组合此类乘积和和。

基本算符的乘积最方便地通过 tensor_product() 自由函数组合。张量积的参数是元组,其中包含基本算符、它所支持的模式序列,以及可选的布尔序列(“对偶性”),该序列表示算符是与状态的右矢模式还是左矢模式收缩(对于混合状态,默认为 False,即右矢模式)。此外,可以通过命名参数提供系数。

算符的张量积通过 tensor_product() 创建。此函数返回一个 OperatorTerm 对象,该对象表示基本张量算符的乘积之和。OperatorTerm 支持就地(+=OperatorTerm.append() 方法)和异地加法(+)来组合和,以及与 numbers.NumberCallable 的乘法。请注意,所有使用的 Callable(对于系数或基本算符回调函数)都具有 (t, args) 的签名,并且必须接受相同的参数。仅在 OperatorTerm 开始使用 WorkStream 之前(见下文)才支持就地加法。此外,支持 OperatorTerm 的乘法,从而创建一个新实例,该实例包含因子乘积的和的乘积,其中和中的项数等于原始 OperatorTerm 中的项数的乘积。我们想指出的是,计算时间大致与 OperatorTerm 中的乘积数量成线性比例,因此建议不要使用 OperatorTerm 的乘法,如果可以手动执行简化操作的话。

Operator 包含 OperatorTerm 的总和,并且每个 OperatorTerm 都与系数(默认为 1)和一个布尔值对偶性(默认为 False)相关联。如果对偶性True,则在 OperatorTerm 中为每个乘积指定的对左矢模式和右矢模式的作用将翻转。Operator.dual() 返回一个新实例,其中这些布尔参数被翻转,这对于表达例如对易子很有用。Operator 支持与其他 Operator 实例的异地加法和减法,以及 OperatorTerm 的就地(+=Operator.append() 方法)和异地(+)加法。仅在 Operator 开始使用 WorkStream 之前(见下文)才支持就地加法。此外,允许与 numbers.NumberCallable 进行标量乘法。

可以通过 Operator.compute_action() 计算 Operator 对量子态 state_in 的作用,并将其累积到量子态 state_out 中。可以通过 Operator.compute_expectation() 评估量子态的期望值。在执行这些调用之前,应调用一次 Operator.prepare_action()(或 Operator.prepare_expectation()),该调用在对象的生命周期内保持有效。

虽然 Liouville 超算符的概念及其对量子态的作用足以指定广泛使用的主方程(如 Lindblad 方程)的右侧,但在更通用的设置中,可能需要能够指定耦合常微分方程组 (ODE),以同时演化多个(可能批量处理的)量子态在它们自己的 Liouville 算符下,如在运动分层方程 (HEOM) 方法中。

这样的 ODE 可以通过 OperatorAction 表示。OperatorAction 包含 Operator 序列,可以通过 OperatorAction.compute() 将其应用于量子态序列,其中结果累积到单个输出状态中。

请注意,在 Operator.prepare_action() / Operator.prepare_expectation() 上调用 Operator 或创建 OperatorAction 可能会对其依赖的所有对象(直接和间接)产生副作用。特别是,该对象及其依赖项将使用传入的 WorkStream 实例,如上所述,这是不可逆转的。如果依赖对象已经在使用不同的 WorkStream 实例,则会引发错误。通常,我们建议每个 GPU 不使用超过一个 WorkStream 实例。

使用示例

以下示例说明了定义基本算符并将它们组合成 OperatorTermOperatorOperatorAction 的各种方法。它是为单 GPU 执行编写的,尽管启用多 GPU 执行很简单,如下一个示例所示。

# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from cuquantum.densitymat import (
    tensor_product,
    DenseMixedState,
    DenseOperator,
    WorkStream,
    Operator,
    OperatorAction,
)

dev = cp.cuda.Device()  # get current device
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])
print("========================")


# define the shape of the composite tensor product space
hilbert_space_dims = (4, 5, 2, 6, 3, 7)  # six quantum degrees of freedom

# define some elementary tensor operators
A = np.random.random((hilbert_space_dims[2],) * 2)  # one-body elementary tensor operator


b = np.random.random(  # two-body elementary tensor operator, arbitrary strides also supported
    (
        hilbert_space_dims[3],
        hilbert_space_dims[5],
    )
    * 2
)


# We can wrap the NDArray in a TensorOperator, which is useful if we want to use the same tensor multiple times.
B = DenseOperator(b)


# one-body elementary tensor callback operator
def c_callback(t, args):
    return np.random.random((hilbert_space_dims[1],) * 2)


c_representative_tensor = c_callback(0.0, ())
# making an instance of TensorOperator is optional, we can also pass the tuple of (c_representative_tensor, c_callback) directly below in palce of `C`
C = DenseOperator(c_representative_tensor, c_callback)

print("Defined elementary operators A, B, C.")


# define a scalar callback function (time-dependent coefficient)
def my_callback(t, args):  # args is an arbitrary list of real user-defined parameters
    _omega = args[0]
    return np.sin(np.pi * _omega * t)  # return the scalar parameterized coefficient at time t


# construct tensor products of elementary tensor operators
ab = tensor_product(
    (
        A,  # elementary tensor operator
        (2,),  # quantum degrees of freedom it acts on
    ),
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    coeff=1.0,  # constant (static) coefficient
)

bc = tensor_product(
    (
        B,  # elementary tensor operator
        (3, 5),  # quantum degrees of freedom it acts on
    ),
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
    ),
    coeff=my_callback,  # time-dependent parameterized coefficient represented by a user-defined callback function
)

# construct different operator terms
term1 = ab + bc  # an OperatorTerm composed of a sum of two tensor operator products
term1 += bc  # `OperatorTerm` also supports in-place addition

term2 = tensor_product(  # an operator term composed of a single elementary tensor operator
    (
        C,  # elementary tensor operator
        (1,),  # quantum degrees of freedom it acts on
        (False,),  # operator action duality (side: left/right) for each quantum degree of freedom
    ),
    coeff=1.0,  # constant (static) coefficient
)

print("Created OperatorTerms term1 and term2.")

# construct the Hamiltonian operator from two operator terms
hamiltonian = Operator(
    hilbert_space_dims,  # shape of the composite tensor space
    (term1,),  # first operator term with a default coefficient 1.0
    (
        term2,
        my_callback,
    ),  # second operator term modulated by a parameterized time-dependent coefficient (callback function)
)

print("Created Hamiltonian Operator from term1 and term2.")

# construct the Liouvillian for the von Neumann equation
liouvillian = (
    hamiltonian - hamiltonian.dual()
)  # Hamiltonian action on the left minus Hamiltonian action on the right: [H, *]

print("Created Liouvillian Operator from Hamiltonian.")

# open a work stream over a CUDA stream
my_stream = cp.cuda.Stream()
ctx = WorkStream(stream=my_stream)

# construct the Liouvillian action for a single quantum state
liouvillian_action = OperatorAction(ctx, (liouvillian,))

print("Created Liouvillian OperatorAction from Liouvillian.")

# create a mixed quantum state (density matrix) with zero initialized data buffer
batch_size = 1
rho0 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
slice_shape, slice_offsets = rho0.local_info
rho0.attach_storage(cp.zeros(rho0.storage_size, dtype=rho0.dtype))
# set storage to a Haar random unnormalized state
# for MGMN execution, the data buffer may be larger than the locally stored slice of the state
# the view method returns a tensor shaped view on the local slice (the full state for single-GPU execution)
rho0.view()[:] = cp.random.normal(size=slice_shape) + (
    1j * cp.random.normal(size=slice_shape)
)
# for non-random initialization and MGMN execution, we would use slice_offsets to determine how to set the elements
norm = rho0.norm().get()[()]
rho0.inplace_scale(np.sqrt(1 / norm))
assert np.isclose(rho0.norm().get()[()], 1)

print(
    "Created a Haar random normalized mixed quantum state (not physical due to lack of hermitianity)."
)

# two ways of creating another mixed quantum state of the same shape and init it to zero
rho1 = rho0.clone(cp.zeros_like(rho0.storage))
rho2 = DenseMixedState(ctx, hilbert_space_dims, batch_size, "complex128")
rho2.allocate_storage()

print("Created a zero-initialized output mixed quantum state.")


# prepare operator action on a mixed quantum state
liouvillian_action.prepare(ctx, (rho0,))

print("Prepared Liouvillian action through OperatorAction.prepare.")

# set a parameter for the callback function to some value
omega = 2.4

# compute the operator action on a given quantum state
liouvillian_action.compute(
    0.0,  # time value
    (omega,),  # user-defined parameters
    (rho0,),  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through OperatorAction.compute.")

# alternatively, prepare the operator action directly via the operator
liouvillian.prepare_action(ctx, rho0)

print("Prepared Liouvillian action through Operator.prepare_action.")

# compute the operator action directly via the operator
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)
# OperatorAction.compute and Operator.compute_action are accumulative, so rho0p should now be equivalent to twice the action of the Liouvillian.

print("Computed Liouvillian action through Operator.compute_action.")

# prepare the operator expectation value computation
liouvillian.prepare_expectation(ctx, rho0)

print("Prepared expectation through Operator.prepare_expectation.")

# compute the operator expectation value
expval = liouvillian.compute_expectation(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho1,  # input quantum state
)

print("Computed expectation through Operator.compute_expectation.")

# we can compute the operator action again without another prepare call
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho1,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action.")

# assuming we want to some other task in between it may be useful to release the workspace
ctx.release_workspace()

# releasing the workspace has no user-facing side effects
# for example, we do not need to reissue prepare calls
liouvillian.compute_action(
    0.0,  # time value
    (omega,),  # user-defined parameters
    rho0,  # input quantum state
    rho2,  # output quantum state
)

print("Computed Liouvillian action through Operator.compute_action after releasing workspace.")

# synchronize the work stream
my_stream.synchronize()

print("Finished computation and exit.")

以下是多 GPU 多节点 (MGMN) 执行的最小工作示例。它包含一个实用函数,用于将量子态设置为计算基中的特定乘积态,这说明了如何解释 DensePureState.local_info 以及如何使用 MGMN 执行所需的一维量子态存储缓冲区。

# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
#
# SPDX-License-Identifier: BSD-3-Clause

import cupy as cp
import numpy as np
from mpi4py import MPI

from cuquantum.densitymat import (
    tensor_product,
    DensePureState,
    DenseOperator,
    WorkStream,
    OperatorTerm,
    Operator,
    OperatorAction,
)

NUM_DEVICES = cp.cuda.runtime.getDeviceCount()
rank = MPI.COMM_WORLD.Get_rank()
dev = cp.cuda.Device(rank % NUM_DEVICES)
dev.use()
props = cp.cuda.runtime.getDeviceProperties(dev.id)
print("===== device info ======")
print("GPU-local-id:", dev.id)
print("GPU-name:", props["name"].decode())
print("GPU-clock:", props["clockRate"])
print("GPU-memoryClock:", props["memoryClockRate"])
print("GPU-nSM:", props["multiProcessorCount"])
print("GPU-major:", props["major"])
print("GPU-minor:", props["minor"])
print("========================")


# create Workstream on the current device
ctx = WorkStream(device_id=dev.id)

# setup MPI communicator
ctx.set_communicator(comm=MPI.COMM_WORLD.Dup(), provider="MPI")

# define the shape of the composite tensor product space
hilbert_space_dims = (4, 4, 4, 4, 4)  # six quantum degrees of freedom
batch_size = 2

# define some elementary tensor operators
identity = DenseOperator(np.eye(hilbert_space_dims[0], dtype="complex128"))
op_term = OperatorTerm(dtype="complex128")
for i in range(len(hilbert_space_dims)):
    op_term += tensor_product(
        (
            identity,
            (1,),
        )
    )
# This operator will just be proportional to the identity
op = Operator(hilbert_space_dims, (op_term,))
op_action = OperatorAction(ctx, (op,))


def set_ditstring(state, batch_index, ditstring: list):
    """
    Set's the state's coefficient at for the `batch_index`'th quantum state to the product state in the computational basis encoded by `ditstring`.
    """
    slice_shape, slice_offsets = state.local_info
    ditstring = np.asarray(
        ditstring
        + [
            batch_index,
        ],
        dtype="int",
    )
    ditstring_is_local = True
    state_inds = []
    for slice_dim, slice_offset, state_dit in zip(
        slice_shape, slice_offsets, ditstring
    ):
        ditstring_is_local = state_dit in range(slice_offset, slice_offset + slice_dim)
        if not ditstring_is_local:
            break
        else:
            state_inds.append(
                range(slice_offset, slice_offset + slice_dim).index(state_dit)
            )
    if ditstring_is_local:
        strides = (1,) + tuple(np.cumprod(np.array(slice_shape)[:-1]))
        ind = np.sum(strides * np.array(state_inds))
        state.storage[ind] = 1.0


# product states to be set for each batch state
global_ditstrings = [[0, 1, 3, 2, 0], [1, 0, 3, 2, 1]]

# make initial state
state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
required_buffer_size = state.storage_size
state.attach_storage(cp.zeros((required_buffer_size,), dtype="complex128"))
# set product states for each batch input state
for batch_ind in range(batch_size):
    set_ditstring(state, batch_ind, global_ditstrings[batch_ind])
# more ways to make a State instance
state_out = state.clone(cp.zeros(required_buffer_size, dtype="complex128"))
another_state = DensePureState(ctx, hilbert_space_dims, batch_size, "complex128")
another_state.allocate_storage()
# prepare and compute Operator action
op.prepare_action(ctx, state)
op.compute_action(0.0, [], state, state_out)
state_out_slice = state_out.view()
# compute Operator action
op_action.compute(
    0.0,
    [],
    [
        state,
    ],
    another_state,
)
# OperatorAction and Operator for this specific example have the same effect
assert cp.allclose(another_state.view(), state_out_slice)