DOCA Bench Sample Invocations
This guide provides examples of various invocations of the tool, to help provide guidance and insight into the tool and the features being tested.
To keep the examples clear, some verbose output and repeated information has been removed or shortened. In particular, the configuration and default-value output that DOCA Bench produces on first execution has been removed.
Command-line options may need to be updated to suit your environment (e.g., TCP addresses, port numbers, interface names, usernames). Refer to the "Command-line Arguments" section for more information.
This test invokes DOCA Bench in Ethernet receive mode, configured to receive Ethernet frames of 1500 bytes in size.
The test runs for 3 seconds using a single core, with a maximum burst size of 512 frames.
The test runs in the default throughput mode; throughput figures are displayed at the end of the test run.
The companion application continually sends Ethernet frames of 1500 bytes in size using core 6, until stopped by DOCA Bench (the companion connection string format is sketched below).
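For reference, the companion connection string is a comma-separated list of key=value fields (proto, addr, port, user, and dev in this example; some later examples also set mode). A minimal shell sketch of composing it from variables, so one script can target different setups; the variable names are illustrative only:

# Illustrative sketch: compose the companion connection string from parts.
# RUSER is used instead of USER to avoid clobbering the shell's own variable.
ADDR=10.10.10.10
PORT=12345
RUSER=bob
DEV=ens4f1np1
CONN="proto=tcp,addr=${ADDR},port=${PORT},user=${RUSER},dev=${DEV}"
echo "--companion-connection-string ${CONN}"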
Command Line
doca_bench --core-mask 0x02 \
    --pipeline-steps doca_eth::rx \
    --device b1:00.1 \
    --data-provider random-data \
    --uniform-job-size 1500 \
    --run-limit-seconds 3 \
    --attribute doca_eth.max-burst-size=512 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
    --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
    --companion-core-list 6 \
    --job-output-buffer-size 1500 \
    --mtu-size raw_eth
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
    Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
    Companion configuration: [
        Device: ens4f1np1
        Remote IP address: "bob@10.10.10.10"
        Core set: [6]
    ]
    Pipelines: [
        Steps: [
            name: "doca_eth::rx"
            attributes: []
        ]
        Use remote input buffers: no
        Use remote output buffers: no
        Latency bucket_range: 10000ns-110000ns
    ]
    Run limits: [
        Max execution time: 3seconds
        Max jobs executed: -- not configured --
        Max bytes processed: -- not configured --
    ]
    Data provider: [
        Name: "random-data"
        Job output buffer size: 1500
    ]
    Device: "b1:00.1"
    Device representor: "-- not configured --"
    Warm up job count: 100
    Input files dir: "-- not configured --"
    Output files dir: "-- not configured --"
    Core set: [1]
    Benchmark mode: throughput
    Warnings as errors: no
    CSV output: [
        File name: -- not configured --
        Selected stats: []
        Deselected stats: []
        Separate dynamic values: no
        Collect environment information: no
        Append to stats file: no
    ]
]
Test permutations: [
    Attributes: []
    Uniform job size: 1500
    Core count: 1
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: ETH_FRAME
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
EAL: Detected CPU lcores: 36
EAL: Detected NUMA nodes: 4
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /run/user/48679/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
EAL: Probe PCI driver: mlx5_pci (15b3:a2d6) device: 0000:b1:00.1 (socket 2)
[08:19:32:110524][398304][DOCA][WRN][engine_model.c:90][adapt_queue_depth] adapting queue depth to 128.
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 3000633 micro seconds
    Enqueued jobs: 611215
    Dequeued jobs: 611215
    Throughput: 000.204 MOperations/s
    Ingress rate: 002.276 Gib/s
    Egress rate: 002.276 Gib/s
Results Overview
Since a single core was specified, only one statistics output section is displayed.
This test invokes DOCA Bench in Ethernet transmit mode, configured to send Ethernet frames of 1500 bytes in size.
Random data is used to populate the Ethernet frames.
The test runs for 3 seconds using a single core, with a maximum burst size of 512 frames.
L3 and L4 checksum offloads are disabled.
The test runs in the default throughput mode; throughput figures are displayed at the end of the test run.
The companion application continually receives Ethernet frames of 1500 bytes in size using core 6, until stopped by DOCA Bench.
Command Line
doca_bench --core-mask 0x02 \
    --pipeline-steps doca_eth::tx \
    --device b1:00.1 \
    --data-provider random-data \
    --uniform-job-size 1500 \
    --run-limit-seconds 3 \
    --attribute doca_eth.max-burst-size=512 \
    --attribute doca_eth.l4-chksum-offload=false \
    --attribute doca_eth.l3-chksum-offload=false \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ens4f1np1 \
    --attribute doption.companion_app.path=/opt/mellanox/doca/tools/doca_bench_companion \
    --companion-core-list 6 \
    --job-output-buffer-size 1500
Results Output
[main] doca_bench : 2.7.0084
[main] release build
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench supported modules: [doca_comm_channel, doca_compress, doca_dma, doca_ec, doca_eth, doca_sha, doca_comch, doca_rdma, doca_aes_gcm]
+ + + + + + + + + + + + + + + + + + + + + + + + + +
DOCA bench configuration
Static configuration: [
    Attributes: [doca_eth.l4-chksum-offload:false, doca_eth.max-burst-size:512, doption.companion_app.path:/opt/mellanox/doca/tools/doca_bench_companion, doca_eth.l3-chksum-offload:false]
    Companion configuration: [
        Device: ens4f1np1
        Remote IP address: "bob@10.10.10.10"
        Core set: [6]
    ]
    Pipelines: [
        Steps: [
            name: "doca_eth::tx"
            attributes: []
        ]
        Use remote input buffers: no
        Use remote output buffers: no
        Latency bucket_range: 10000ns-110000ns
    ]
    Run limits: [
        Max execution time: 3seconds
        Max jobs executed: -- not configured --
        Max bytes processed: -- not configured --
    ]
    Data provider: [
        Name: "random-data"
        Job output buffer size: 1500
    ]
    Device: "b1:00.1"
    Device representor: "-- not configured --"
    Warm up job count: 100
    Input files dir: "-- not configured --"
    Output files dir: "-- not configured --"
    Core set: [1]
    Benchmark mode: throughput
    Warnings as errors: no
    CSV output: [
        File name: -- not configured --
        Selected stats: []
        Deselected stats: []
        Separate dynamic values: no
        Collect environment information: no
        Append to stats file: no
    ]
]
Test permutations: [
    Attributes: []
    Uniform job size: 1500
    Core count: 1
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing...
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 3000049 micro seconds
    Enqueued jobs: 17135128
    Dequeued jobs: 17135128
    Throughput: 005.712 MOperations/s
    Ingress rate: 063.832 Gib/s
    Egress rate: 063.832 Gib/s
Results Overview
Since a single core was specified, only one statistics output section is displayed.
This test invokes DOCA Bench on the x86 host side to run an AES-GCM decrypt step.
A file-set file is used to indicate the files to be decrypted; its contents list the filenames of the files to decrypt (a hypothetical sketch of such a file follows this list).
The key used for encryption and decryption is specified using the doca_aes_gcm.key-file attribute, which names the file containing the key to use.
It will run until 5000 jobs have been processed.
It runs in precision-latency mode; latency and throughput figures are displayed at the end of the test run.
A core mask is specified, indicating that cores 12, 13, 14, and 15 are to be used for this test.
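For illustration, the file-set contents are simply the names of the input files to process. A hypothetical sketch of creating one (the filenames below are placeholders, and the one-name-per-line layout is an assumption based on the description above):

# Hypothetical sketch: a file-set file listing the input files to decrypt.
# Placeholder filenames; one name per line is assumed.
cat > aes_64_128.fileset <<'EOF'
encrypted_block_000.bin
encrypted_block_001.bin
EOF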
Command Line
doca_bench --mode precision-latency \
    --core-mask 0xf000 \
    --warm-up-jobs 32 \
    --device 17:00.0 \
    --data-provider file-set \
    --data-provider-input-file aes_64_128.fileset \
    --run-limit-jobs 5000 \
    --pipeline-steps doca_aes_gcm::decrypt \
    --attribute doca_aes_gcm.key-file='aes128.key' \
    --job-output-buffer-size 80
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 12) stats:
    Duration: 10697 micro seconds
    Enqueued jobs: 5000
    Dequeued jobs: 5000
    Throughput: 000.467 MOperations/s
    Ingress rate: 000.265 Gib/s
    Egress rate: 000.223 Gib/s
Worker thread[1](core: 13) stats:
    Duration: 10700 micro seconds
    Enqueued jobs: 5000
    Dequeued jobs: 5000
    Throughput: 000.467 MOperations/s
    Ingress rate: 000.265 Gib/s
    Egress rate: 000.223 Gib/s
Worker thread[2](core: 14) stats:
    Duration: 10733 micro seconds
    Enqueued jobs: 5000
    Dequeued jobs: 5000
    Throughput: 000.466 MOperations/s
    Ingress rate: 000.264 Gib/s
    Egress rate: 000.222 Gib/s
Worker thread[3](core: 15) stats:
    Duration: 10788 micro seconds
    Enqueued jobs: 5000
    Dequeued jobs: 5000
    Throughput: 000.463 MOperations/s
    Ingress rate: 000.262 Gib/s
    Egress rate: 000.221 Gib/s
Aggregate stats
    Duration: 10788 micro seconds
    Enqueued jobs: 20000
    Dequeued jobs: 20000
    Throughput: 001.854 MOperations/s
    Ingress rate: 001.050 Gib/s
    Egress rate: 000.884 Gib/s
    min: 1878ns
    max: 4956ns
    median: 2134ns
    mean: 2145ns
    90th %ile: 2243ns
    95th %ile: 2285ns
    99th %ile: 2465ns
    99.9th %ile: 3193ns
    99.99th %ile: 4487ns
Results Overview
Since a core mask was specified without a core count, all of the cores in the mask are used.
A statistics section is displayed for each core used, in addition to the aggregate statistics.
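As a quick check of the mask-to-core mapping, the mask can be derived from the desired core numbers with shell arithmetic (a minimal sketch; one bit per logical core, so 0xf000 selects cores 12-15):

# One bit per logical core: cores 12..15 -> 0xf000.
mask=0
for core in 12 13 14 15; do
    mask=$(( mask | (1 << core) ))
done
printf '0x%x\n' "${mask}"   # prints 0xf000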
This test invokes DOCA Bench on the BlueField side to run an AES-GCM encrypt step.
A text file of 2KB in size is the input to the encryption stage.
The key used for encryption and decryption is specified inline using the doca_aes_gcm.key attribute (see the key-generation sketch following this list).
It will run until 2000 jobs have been processed.
It runs in bulk-latency mode; latency and throughput figures are displayed at the end of the test run.
A single core with 2 threads is specified.
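The inline key in the command below is 32 hexadecimal characters (128 bits). If a fresh key of the same shape is needed, one can be generated with OpenSSL; a sketch (any 32-hex-character value has the same shape):

# Generate a random 128-bit key as 32 hex characters, the same shape as
# the inline doca_aes_gcm.key value used in the command below.
KEY=$(openssl rand -hex 16)
echo "--attribute doca_aes_gcm.key=\"${KEY}\""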
Command Line
doca_bench --mode bulk-latency \
    --core-list 3 \
    --threads-per-core 2 \
    --warm-up-jobs 32 \
    --device 03:00.0 \
    --data-provider file \
    --data-provider-input-file plaintext_2k.txt \
    --run-limit-jobs 2000 \
    --pipeline-steps doca_aes_gcm::encrypt \
    --attribute doca_aes_gcm.key="0123456789abcdef0123456789abcdef" \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096
Results Output
[main] Completed! tearing down...
Worker thread[0](core: 3) stats:
    Duration: 501 micro seconds
    Enqueued jobs: 2048
    Dequeued jobs: 2048
    Throughput: 004.082 MOperations/s
    Ingress rate: 062.279 Gib/s
    Egress rate: 062.644 Gib/s
Worker thread[1](core: 3) stats:
    Duration: 466 micro seconds
    Enqueued jobs: 2048
    Dequeued jobs: 2048
    Throughput: 004.386 MOperations/s
    Ingress rate: 066.922 Gib/s
    Egress rate: 067.314 Gib/s
Aggregate stats
    Duration: 501 micro seconds
    Enqueued jobs: 4096
    Dequeued jobs: 4096
    Throughput: 008.163 MOperations/s
    Ingress rate: 124.558 Gib/s
    Egress rate: 125.287 Gib/s
Latency report:
:
:
:
:
:
::
::
::
::
.::. . . ..
------------------------------------------------------------------------------------------------------
[<10000ns]: 0
.. OUTPUT RETRACTED (SHORTENED) ..
[26000ns -> 26999ns]: 0
[27000ns -> 27999ns]: 128
[28000ns -> 28999ns]: 2176
[29000ns -> 29999ns]: 1152
[30000ns -> 30999ns]: 128
[31000ns -> 31999ns]: 0
[32000ns -> 32999ns]: 0
[33000ns -> 33999ns]: 128
[34000ns -> 34999ns]: 0
[35000ns -> 35999ns]: 0
[36000ns -> 36999ns]: 0
[37000ns -> 37999ns]: 0
[38000ns -> 38999ns]: 128
[39000ns -> 39999ns]: 0
[40000ns -> 40999ns]: 0
[41000ns -> 41999ns]: 0
[42000ns -> 42999ns]: 0
[43000ns -> 43999ns]: 128
[44000ns -> 44999ns]: 128
[45000ns -> 45999ns]: 0
.. OUTPUT RETRACTED (SHORTENED) ..
[>110000ns]: 0
Results Overview
With a single core and two threads per core specified, a statistics section is displayed for each worker thread, in addition to the aggregate statistics.
This test invokes DOCA Bench on the host side to run two AES-GCM steps in a pipeline, first encrypting a text file and then decrypting the corresponding output of the encrypt step.
A text file of 2KB in size is the input to the encryption stage.
The input-cwd option directs DOCA Bench to look for input files in a different location, in this case the parent directory.
The key used for encryption and decryption is specified using the doca_aes_gcm.key-file attribute, indicating that the key can be found in the named file.
It will run until 204800 bytes have been processed.
It runs in the default throughput mode; throughput figures are displayed at the end of the test run.
Command Line
doca_bench --core-mask 0xf00 \
    --core-count 1 \
    --warm-up-jobs 32 \
    --device 17:00.0 \
    --data-provider file \
    --input-cwd ../. \
    --data-provider-input-file plaintext_2k.txt \
    --run-limit-bytes 204800 \
    --pipeline-steps doca_aes_gcm::encrypt,doca_aes_gcm::decrypt \
    --attribute doca_aes_gcm.key-file='aes128.key' \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096
Results Output
Executing...
Worker thread[0](core: 8) [doca_aes_gcm::encrypt>>doca_aes_gcm::decrypt] started...
Worker thread[0] Executing 32 warm-up tasks using 32 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 79 micro seconds
    Enqueued jobs: 214
    Dequeued jobs: 214
    Throughput: 002.701 MOperations/s
    Ingress rate: 041.214 Gib/s
    Egress rate: 041.214 Gib/s
Results Overview
Since a single core was specified, only one statistics output section is displayed.
This test invokes DOCA Bench on the host side to perform SHA operations using the SHA256 algorithm, and to create a CSV file containing the test configuration and statistics.
A single core is specified (core mask 2, i.e., core 1), with a per-core thread count of 2.
Command Line
doca_bench --core-mask 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha256 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_256_test.csv
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 1)
    Duration: 3000064 micro seconds
    Enqueued jobs: 3713935
    Dequeued jobs: 3713935
    Throughput: 001.238 MOperations/s
    Ingress rate: 018.890 Gib/s
    Egress rate: 000.295 Gib/s
Stats for thread[1](core: 1)
    Duration: 3000056 micro seconds
    Enqueued jobs: 3757335
    Dequeued jobs: 3757335
    Throughput: 001.252 MOperations/s
    Ingress rate: 019.110 Gib/s
    Egress rate: 000.299 Gib/s
Aggregate stats
    Duration: 3000064 micro seconds
    Enqueued jobs: 7471270
    Dequeued jobs: 7471270
    Throughput: 002.490 MOperations/s
    Ingress rate: 038.000 Gib/s
    Egress rate: 000.594 Gib/s
Results Overview
Since a single core with a thread count of 2 was specified, statistics are displayed for each thread, in addition to the aggregate statistics.
It can also be observed that 2 threads were started on core 1, with each thread executing warm-up jobs.
The contents of /tmp/sha_256_test.csv are shown below. The configuration used by the test is listed, along with the statistics for the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[1],throughput,0,,,,,,,,sha256,2048,1,2,1024,128,1 fragments,,,,,,,7471270,7471270,15301160960,239109312,038.000 Gib/s,000.594 Gib/s,2.490370 MOperations/s,2.490370 MOperations/s
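Because each run is packed into one wide CSV row, it can help to print the statistics columns as name=value pairs. A minimal awk sketch (assumes one header row followed by one data row per run; the naive comma split is fine for this file, but quoted fields containing commas, such as the multi-core cfg.core_set values seen later, would need a CSV-aware parser):

# Print only the stats.* columns of a DOCA Bench CSV as name=value pairs.
awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ /^stats\./) keep[i] = $i; next }
    { for (i in keep) printf "%s=%s\n", keep[i], $i }
' /tmp/sha_256_test.csv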
This test invokes DOCA Bench on the host side to perform SHA operations using the SHA512 algorithm, and to create a CSV file containing the test configuration and statistics.
The command is then repeated with the csv-append-mode option added. This directs DOCA Bench to append the test run statistics to the existing CSV file.
A core list containing a single core is provided, with a per-core thread count of 2.
Command Line
Create the initial /tmp/sha_512_test.csv file:
doca_bench --core-list 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha512 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_512_test.csv
The second command is:
./doca_bench --core-list 2 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha512 \
    --warm-up-jobs 100 \
    --csv-output-file /tmp/sha_512_test.csv \
    --csv-append-mode
This causes DOCA Bench to append the configuration and statistics of the second command run to the /tmp/sha_512_test.csv file (a sketch of scripting repeated appends follows).
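The same pattern extends to any number of repeated runs: write the first run normally, then append the remainder. A sketch using the flags from the commands above:

# Sketch: run the benchmark three times, appending runs 2 and 3 to the
# CSV produced by run 1 via --csv-append-mode.
append=""
for i in 1 2 3; do
    doca_bench --core-list 2 \
        --threads-per-core 2 \
        --pipeline-steps doca_sha \
        --device d8:00.0 \
        --data-provider random-data \
        --uniform-job-size 2048 \
        --job-output-buffer-size 2048 \
        --run-limit-seconds 3 \
        --attribute doca_sha.algorithm=sha512 \
        --warm-up-jobs 100 \
        --csv-output-file /tmp/sha_512_test.csv ${append}
    append="--csv-append-mode"
done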
Results Output
This is a snapshot of the results output from the first command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
    Duration: 3015185 micro seconds
    Enqueued jobs: 3590717
    Dequeued jobs: 3590717
    Throughput: 001.191 MOperations/s
    Ingress rate: 018.171 Gib/s
    Egress rate: 000.568 Gib/s
Stats for thread[1](core: 2)
    Duration: 3000203 micro seconds
    Enqueued jobs: 3656044
    Dequeued jobs: 3656044
    Throughput: 001.219 MOperations/s
    Ingress rate: 018.594 Gib/s
    Egress rate: 000.581 Gib/s
Aggregate stats
    Duration: 3015185 micro seconds
    Enqueued jobs: 7246761
    Dequeued jobs: 7246761
    Throughput: 002.403 MOperations/s
    Ingress rate: 036.673 Gib/s
    Egress rate: 001.146 Gib/s
This is a snapshot of the results output from the second command run:
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
    Duration: 3000072 micro seconds
    Enqueued jobs: 3602562
    Dequeued jobs: 3602562
    Throughput: 001.201 MOperations/s
    Ingress rate: 018.323 Gib/s
    Egress rate: 000.573 Gib/s
Stats for thread[1](core: 2)
    Duration: 3000062 micro seconds
    Enqueued jobs: 3659148
    Dequeued jobs: 3659148
    Throughput: 001.220 MOperations/s
    Ingress rate: 018.611 Gib/s
    Egress rate: 000.582 Gib/s
Aggregate stats
    Duration: 3000072 micro seconds
    Enqueued jobs: 7261710
    Dequeued jobs: 7261710
    Throughput: 002.421 MOperations/s
    Ingress rate: 036.934 Gib/s
    Egress rate: 001.154 Gib/s
Results Overview
Since a single core with a thread count of 2 was specified, statistics are displayed for each thread, in addition to the aggregate statistics.
It can also be observed that 2 threads were started on core 2, with each thread executing warm-up jobs.
The contents of /tmp/sha_512_test.csv after the first command run are shown below. The configuration used by the test is listed, along with the statistics for the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
The contents of /tmp/sha_512_test.csv after the second command run are shown below. A second entry has been added, detailing the configuration used by the test along with the statistics for the test run:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7246761,7246761,14841366528,463850048,036.673 Gib/s,001.146 Gib/s,2.403422 MOperations/s,2.403422 MOperations/s
,[doca_sha],0,0,10000,1000,3,,,random-data,2048,d8:00.0,,,100,[2],throughput,0,,,,,,,,sha512,2048,1,2,1024,128,1 fragments,,,,,,,7261710,7261710,14871982080,464806784,036.934 Gib/s,001.154 Gib/s,2.420512 MOperations/s,2.420512 MOperations/s
This test invokes DOCA Bench on the BlueField side to perform SHA operations using the SHA1 algorithm, displaying statistics every 2000 milliseconds during the test run.
A core list containing 3 cores is provided, with a per-core thread count of 2 and a core count of 1.
The core count directs DOCA Bench to use only the first core number from the core list, in this case core number 2.
Command Line
doca_bench --core-list 2,3,4 \
    --core-count 1 \
    --threads-per-core 2 \
    --pipeline-steps doca_sha \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --attribute doca_sha.algorithm=sha1 \
    --warm-up-jobs 100 \
    --rt-stats-interval 2000
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Stats for thread[0](core: 2)
    Duration: 965645 micro seconds
    Enqueued jobs: 1171228
    Dequeued jobs: 1171228
    Throughput: 001.213 MOperations/s
    Ingress rate: 018.505 Gib/s
    Egress rate: 000.181 Gib/s
Stats for thread[1](core: 2)
    Duration: 965645 micro seconds
    Enqueued jobs: 1171754
    Dequeued jobs: 1171754
    Throughput: 001.213 MOperations/s
    Ingress rate: 018.514 Gib/s
    Egress rate: 000.181 Gib/s
Aggregate stats
    Duration: 965645 micro seconds
    Enqueued jobs: 2342982
    Dequeued jobs: 2342982
    Throughput: 002.426 MOperations/s
    Ingress rate: 037.019 Gib/s
    Egress rate: 000.362 Gib/s
Stats for thread[0](core: 2)
    Duration: 2968088 micro seconds
    Enqueued jobs: 3653691
    Dequeued jobs: 3653691
    Throughput: 001.231 MOperations/s
    Ingress rate: 018.783 Gib/s
    Egress rate: 000.183 Gib/s
Stats for thread[1](core: 2)
    Duration: 2968088 micro seconds
    Enqueued jobs: 3689198
    Dequeued jobs: 3689198
    Throughput: 001.243 MOperations/s
    Ingress rate: 018.965 Gib/s
    Egress rate: 000.185 Gib/s
Aggregate stats
    Duration: 2968088 micro seconds
    Enqueued jobs: 7342889
    Dequeued jobs: 7342889
    Throughput: 002.474 MOperations/s
    Ingress rate: 037.748 Gib/s
    Egress rate: 000.369 Gib/s
Cleanup...
[main] Completed! tearing down...
Stats for thread[0](core: 2)
    Duration: 3000122 micro seconds
    Enqueued jobs: 3694128
    Dequeued jobs: 3694128
    Throughput: 001.231 MOperations/s
    Ingress rate: 018.789 Gib/s
    Egress rate: 000.184 Gib/s
Stats for thread[1](core: 2)
    Duration: 3000089 micro seconds
    Enqueued jobs: 3751128
    Dequeued jobs: 3751128
    Throughput: 001.250 MOperations/s
    Ingress rate: 019.079 Gib/s
    Egress rate: 000.186 Gib/s
Aggregate stats
    Duration: 3000122 micro seconds
    Enqueued jobs: 7445256
    Dequeued jobs: 7445256
    Throughput: 002.482 MOperations/s
    Ingress rate: 037.867 Gib/s
    Egress rate: 000.370 Gib/s
Results Overview
Although a core list containing 3 cores was specified, the core count of 1 directs DOCA Bench to use only the first entry in the core list.
It can be seen that, since a thread count of 2 was specified, 2 threads were created on core 2.
A real-time statistics interval of 2000 milliseconds was specified; interim statistics can be seen for each thread, followed by the final aggregate statistics.
This test invokes DOCA Bench to perform local DMA operations on the host.
It specifies that a core-count sweep should be performed using core counts of 1, 2, and 4, via the option --sweep core-count,1,4,*2.
The test output is saved in the CSV file /tmp/dma_sweep.csv, with a filter applied so that only statistics are recorded; no configuration information is recorded.
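The sweep takes a start value, a limit, and a step, where a *N step multiplies the value each iteration; --sweep core-count,1,4,*2 therefore enumerates core counts 1, 2, and 4. A sketch of the equivalent manual runs (the sweep form additionally collects all permutations into a single CSV):

# Sketch: manual equivalent of --sweep core-count,1,4,*2 (values 1, 2, 4).
for cores in 1 2 4; do
    doca_bench --core-mask 0xff \
        --core-count "${cores}" \
        --pipeline-steps doca_dma \
        --device d8:00.0 \
        --data-provider random-data \
        --uniform-job-size 2048 \
        --job-output-buffer-size 2048 \
        --run-limit-seconds 5
done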
Command Line
doca_bench --core-mask 0xff \
    --sweep core-count,1,4,*2 \
    --pipeline-steps doca_dma \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 5 \
    --csv-output-file /tmp/dma_sweep.csv \
    --csv-stats "stats.*"
Results Output
Test permutations: [
    Attributes: []
    Uniform job size: 2048
    Core count: 1
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
    --------------------------------
    Attributes: []
    Uniform job size: 2048
    Core count: 2
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
    --------------------------------
    Attributes: []
    Uniform job size: 2048
    Core count: 4
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 3...
Executing permutation 1 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 3...
Aggregate stats
    Duration: 5000191 micro seconds
    Enqueued jobs: 22999128
    Dequeued jobs: 22999128
    Throughput: 004.600 MOperations/s
    Ingress rate: 070.185 Gib/s
    Egress rate: 070.185 Gib/s
Preparing permutation 2 of 3...
Executing permutation 2 of 3...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [1] started...
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 3...
Stats for thread[0](core: 0)
    Duration: 5000066 micro seconds
    Enqueued jobs: 14409794
    Dequeued jobs: 14409794
    Throughput: 002.882 MOperations/s
    Ingress rate: 043.975 Gib/s
    Egress rate: 043.975 Gib/s
Stats for thread[1](core: 1)
    Duration: 5000188 micro seconds
    Enqueued jobs: 14404708
    Dequeued jobs: 14404708
    Throughput: 002.881 MOperations/s
    Ingress rate: 043.958 Gib/s
    Egress rate: 043.958 Gib/s
Aggregate stats
    Duration: 5000188 micro seconds
    Enqueued jobs: 28814502
    Dequeued jobs: 28814502
    Throughput: 005.763 MOperations/s
    Ingress rate: 087.932 Gib/s
    Egress rate: 087.932 Gib/s
Preparing permutation 3 of 3...
Executing permutation 3 of 3...
Data path thread [1] started...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
WT[1] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [3] started...
WT[3] Executing 100 warm-up tasks using 100 unique tasks
Data path thread [2] started...
WT[2] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 3 of 3...
[main] Completed! tearing down...
Stats for thread[0](core: 0)
    Duration: 5000092 micro seconds
    Enqueued jobs: 7227025
    Dequeued jobs: 7227025
    Throughput: 001.445 MOperations/s
    Ingress rate: 022.055 Gib/s
    Egress rate: 022.055 Gib/s
Stats for thread[1](core: 1)
    Duration: 5000081 micro seconds
    Enqueued jobs: 7223269
    Dequeued jobs: 7223269
    Throughput: 001.445 MOperations/s
    Ingress rate: 022.043 Gib/s
    Egress rate: 022.043 Gib/s
Stats for thread[2](core: 2)
    Duration: 5000047 micro seconds
    Enqueued jobs: 7229678
    Dequeued jobs: 7229678
    Throughput: 001.446 MOperations/s
    Ingress rate: 022.063 Gib/s
    Egress rate: 022.063 Gib/s
Stats for thread[3](core: 3)
    Duration: 5000056 micro seconds
    Enqueued jobs: 7223037
    Dequeued jobs: 7223037
    Throughput: 001.445 MOperations/s
    Ingress rate: 022.043 Gib/s
    Egress rate: 022.043 Gib/s
Aggregate stats
    Duration: 5000092 micro seconds
    Enqueued jobs: 28903009
    Dequeued jobs: 28903009
    Throughput: 005.780 MOperations/s
    Ingress rate: 088.203 Gib/s
    Egress rate: 088.203 Gib/s
Results Overview
The output gives a summary of the permutations being executed, and then goes on to display the statistics for each permutation.
It can be seen that the CSV output file contents contain only statistics; no configuration information is included.
There is one entry for each sweep permutation:
stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate
22999128,22999128,47102214144,47102214144,070.185 Gib/s,070.185 Gib/s,4.599650 MOperations/s,4.599650 MOperations/s
28814502,28814502,59012100096,59012100096,087.932 Gib/s,087.932 Gib/s,5.762683 MOperations/s,5.762683 MOperations/s
28903009,28903009,59193362432,59193362432,088.203 Gib/s,088.203 Gib/s,5.780495 MOperations/s,5.780495 MOperations/s
This test invokes DOCA Bench to perform local DMA operations on the host.
It specifies that a uniform-job-size sweep should be performed using job sizes of 1024 and 2048, via the option --sweep uniform-job-size,1024,2048.
The test output is saved in the CSV file /tmp/dma_sweep_job_size.csv, and environment information collection is enabled.
Command Line
doca_bench --core-mask 0xff \
    --core-count 1 \
    --pipeline-steps doca_dma \
    --device d8:00.0 \
    --data-provider random-data \
    --sweep uniform-job-size,1024,2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 5 \
    --csv-output-file /tmp/dma_sweep_job_size.csv \
    --enable-environment-information
Results Output
Test permutations: [
    Attributes: []
    Uniform job size: 1024
    Core count: 1
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
    --------------------------------
    Attributes: []
    Uniform job size: 2048
    Core count: 1
    Per core thread count: 1
    Task pool size: 1024
    Data provider job count: 128
    MTU size: -- not configured --
    SQ depth: -- not configured --
    RQ depth: -- not configured --
    Input data file: -- not configured --
]
[main] Initialize framework...
[main] Start execution...
Preparing permutation 1 of 2...
Executing permutation 1 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 1 of 2...
Aggregate stats
    Duration: 5000083 micro seconds
    Enqueued jobs: 23645128
    Dequeued jobs: 23645128
    Throughput: 004.729 MOperations/s
    Ingress rate: 036.079 Gib/s
    Egress rate: 036.079 Gib/s
Preparing permutation 2 of 2...
Executing permutation 2 of 2...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup permutation 2 of 2...
[main] Completed! tearing down...
Aggregate stats
    Duration: 5000027 micro seconds
    Enqueued jobs: 22963128
    Dequeued jobs: 22963128
    Throughput: 004.593 MOperations/s
    Ingress rate: 070.078 Gib/s
    Egress rate: 070.078 Gib/s
Results Overview
The output gives a summary of the permutations being executed, and then goes on to display the statistics for each permutation.
It can be seen that the CSV output file contents contain both statistics and environment information.
There is one entry for each sweep permutation:
cfg.companion.connection_string,cfg.pipeline.steps,cfg.pipeline.use_remote_input_buffers,cfg.pipeline.use_remote_output_buffers,cfg.pipeline.bulk_latency.lower_bound,cfg.pipeline.bulk_latency.bucket_width,cfg.run_limit.duration,cfg.run_limit.jobs,cfg.run_limit.bytes,cfg.data_provider.type,cfg.data_provider.output_buffer_size,cfg.device.pci_address,cfg.input.cwd,cfg.output.cwd,cfg.warmup_job_count,cfg.core_set,cfg.benchmark_mode,cfg.warnings_are_errors,cfg.attribute.doca_compress.algorithm,cfg.attribute.doca_ec.matrix_type,cfg.attribute.doca_ec.data_block_count,cfg.attribute.doca_ec.redundancy_block_count,cfg.attribute.doca_ec.use_precomputed_matrix,cfg.attribute.doca_eth.l3_chksum_offload,cfg.attribute.doca_eth.l4_chksum_offload,cfg.attribute.doca_sha.algorithm,cfg.uniform_job_size,cfg.core_count,cfg.per_core_thread_count,cfg.task_pool_size,cfg.data_provider_job_count,cfg.sg_config,cfg.mtu-size,cfg.send-queue-size,cfg.receive-queue-size,cfg.data-provider-input-file,cfg.attribute.mmo.log_qp_depth,cfg.attribute.mmo.log_num_qps,stats.input.job_count,stats.output.job_count,stats.input.byte_count,stats.output.byte_count,stats.input.throughput.bytes,stats.output.throughput.bytes,stats.input.throughput.rate,stats.output.throughput.rate,host.pci.3.address,host.pci.3.ext_tag,host.pci.3.link_type,host.pci.2.ext_tag,host.pci.2.address,host.cpu.0.model,host.ofed_version,host.pci.4.max_read_request,host.pci.2.width,host.cpu.1.logical_cores,host.pci.2.eswitch_mode,host.pci.3.max_read_request,host.pci.4.address,host.pci.2.link_type,host.pci.1.max_read_request,host.pci.4.link_type,host.cpu.socket_count,host.pci.0.ext_tag,host.pci.6.port_speed,host.cpu.0.physical_cores,host.pci.7.port_speed,host.memory.dimm_slot_count,host.cpu.1.model,host.pci.0.max_payload_size,host.pci.6.relaxed_ordering,host.doca_host_package_version,host.pci.6.max_payload_size,host.pci.0.gen,host.pci.4.width,host.pci.2.gen,host.pci.1.max_payload_size,host.pci.4.relaxed_ordering,host.pci.3.width,host.cpu.0.logical_cores,host.cpu.0.arch,host.pci.4.port_speed,host.pci.4.eswitch_mode,host.pci.7.address,host.pci.5.eswitch_mode,host.pci.5.address,host.cpu.1.arch,host.pci.0.eswitch_mode,host.pci.7.width,host.pci.7.link_type,host.pci.1.link_type,host.pci.3.gen,host.pci.7.max_read_request,host.pci.7.eswitch_mode,host.pci.6.gen,host.pci.2.port_speed,host.pci.7.gen,host.pci.2.relaxed_ordering,host.pci.6.width,host.pci.4.gen,host.pci.6.address,host.hostname,host.pci.5.link_type,host.pci.6.link_type,host.pci.6.max_read_request,host.pci.7.max_payload_size,host.pci.5.gen,host.pci.6.eswitch_mode,host.pci.5.width,host.pci.3.relaxed_ordering,host.pci.4.ext_tag,host.pci.0.width,host.pci.5.port_speed,host.pci.2.max_payload_size,host.pci.3.max_payload_size,host.pci.5.max_payload_size,host.pci.2.max_read_request,host.pci.0.address,host.pci.gen,host.os.family,host.pci.1.gen,host.pci.5.relaxed_ordering,host.pci.1.port_speed,host.pci.7.ext_tag,host.pci.1.address,host.pci.3.eswitch_mode,host.pci.3.port_speed,host.pci.0.max_read_request,host.pci.1.ext_tag,host.pci.0.relaxed_ordering,host.pci.0.link_type,host.pci.5.max_read_request,host.pci.4.max_payload_size,host.pci.device_count,host.memory.populated_dimm_count,host.memory.installed_capacity,host.pci.6.ext_tag,host.os.kernel_version,host.pci.0.port_speed,host.pci.1.width,host.pci.7.relaxed_ordering,host.pci.1.relaxed_ordering,host.os.version,host.os.name,host.cpu.1.physical_cores,host.numa_node_count,host.pci.5.ext_tag,host.pci.1.eswitch_mode
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,1024,1,1,1024,128,1 fragments,,,,,,,23645128,23645128,24212611072,24212611072,036.079 Gib/s,036.079 Gib/s,4.728947 MOperations/s,4.728947 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
,[doca_dma],0,0,10000,1000,5,,,random-data,2048,d8:00.0,,,100,"[0, 1, 2, 3, 4, 5, 6, 7]",throughput,0,,,,,,,,,2048,1,1,1024,128,1 fragments,,,,,,,22963128,22963128,47028486144,47028486144,070.078 Gib/s,070.078 Gib/s,4.592600 MOperations/s,4.592600 MOperations/s,0000:5e:00.1,true,Infiniband,true,0000:5e:00.0,N/A,OFED-internal-24.04-0.4.8,N/A,x63,N/A,N/A,N/A,0000:af:00.0,Infiniband,N/A,Ethernet,2,true,N/A,N/A,N/A,N/A,N/A,N/A,true,<none>,N/A,Gen15,x63,Gen15,N/A,true,x63,N/A,x86_64,104857600000,N/A,0000:d8:00.1,N/A,0000:af:00.1,x86_64,N/A,x63,Ethernet,Infiniband,Gen15,N/A,N/A,Gen15,N/A,Gen15,true,x63,Gen15,0000:d8:00.0,zibal,Ethernet,Ethernet,N/A,N/A,Gen15,N/A,x63,true,true,x63,104857600000,N/A,N/A,N/A,N/A,0000:3b:00.0,N/A,Linux,Gen15,true,N/A,true,0000:3b:00.1,N/A,N/A,N/A,true,true,Infiniband,N/A,N/A,8,N/A,270049112064,true,5.4.0-174-generic,N/A,x63,true,true,20.04.1 LTS (Focal Fossa),Ubuntu,N/A,2,true,N/A
This test invokes DOCA Bench to perform remote DMA operations on the host.
It specifies the companion connection details to be used on the host, and that remote output buffers are to be used.
Command Line
doca_bench --core-list 12 \
    --pipeline-steps doca_dma \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --use-remote-output-buffers \
    --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10 \
    --run-limit-seconds 5
Results Output
Executing...
Worker thread[0](core: 12) [doca_dma] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 5000073 micro seconds
    Enqueued jobs: 32202128
    Dequeued jobs: 32202128
    Throughput: 006.440 MOperations/s
    Ingress rate: 098.272 Gib/s
    Egress rate: 098.272 Gib/s
Results Overview
None.
This test is only relevant to BlueField-2.
This test invokes DOCA Bench to run compression using random data as input.
The compression algorithm specified is "deflate".
Command Line
doca_bench --core-list 2 \
    --pipeline-steps doca_compress::compress \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 4096 \
    --run-limit-seconds 3 \
    --attribute doca_compress.algorithm="deflate"
Results Output
Executing...
Data path thread [0] started...
WT[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 3000146 micro seconds
    Enqueued jobs: 5340128
    Dequeued jobs: 5340128
    Throughput: 001.780 MOperations/s
    Ingress rate: 027.160 Gib/s
    Egress rate: 027.748 Gib/s
Results Overview
None.
This test invokes DOCA Bench on the BlueField side to run LZ4 decompression.
The test specifies a file-set data provider whose file lists the filenames of LZ4-compressed files.
The use of remote buffers for the jobs is specified via the --use-remote-output-buffers option.
It specifies the companion connection details to be used on the host for the remote buffers.
Command Line
doca_bench --core-list 12 \
    --pipeline-steps doca_compress::decompress \
    --device 03:00.0 \
    --data-provider file-set \
    --data-provider-input-file lz4_compressed_64b_buffers.fs \
    --job-output-buffer-size 4096 \
    --run-limit-seconds 3 \
    --attribute doca_compress.algorithm="lz4" \
    --use-remote-output-buffers \
    --companion-connection-string proto=tcp,port=12345,mode=host,dev=17:00.0,user=bob,addr=10.10.10.10
Results Output
Executing...
Worker thread[0](core: 12) [doca_compress::decompress] started...
Worker thread[0] Executing 100 warm-up tasks using 100 unique tasks
Cleanup...
[main] Completed! tearing down...
Aggregate stats
    Duration: 3000043 micro seconds
    Enqueued jobs: 15306128
    Dequeued jobs: 15306128
    Throughput: 005.102 MOperations/s
    Ingress rate: 003.155 Gib/s
    Egress rate: 002.433 Gib/s
Results Notes
None.
This test invokes DOCA Bench to run an EC create step.
It runs in bulk-latency mode and specifies the data_block_count, redundancy_block_count, and matrix_type attributes of doca_ec.
Command Line
doca_bench --mode bulk-latency \
    --core-list 12 \
    --pipeline-steps doca_ec::create \
    --device 17:00.0 \
    --data-provider random-data \
    --uniform-job-size 1024 \
    --job-output-buffer-size 1024 \
    --run-limit-seconds 3 \
    --attribute doca_ec.data_block_count=16 \
    --attribute doca_ec.redundancy_block_count=16 \
    --attribute doca_ec.matrix_type=cauchy
Results Output
The bulk-latency output will be similar to that presented in the BlueField-side decompress LZ4 example section.
Results Notes
The bulk-latency output will be similar to the output presented earlier on this page.
This test invokes DOCA Bench to run an EC create step.
It runs in precision-latency mode and specifies the data_block_count, redundancy_block_count, and matrix_type attributes of doca_ec.
Command Line
doca_bench --mode precision-latency \
    --core-list 12 \
    --pipeline-steps doca_ec::create \
    --device 03:00.0 \
    --data-provider random-data \
    --uniform-job-size 1024 \
    --job-output-buffer-size 1024 \
    --run-limit-jobs 5000 \
    --attribute doca_ec.data_block_count=16 \
    --attribute doca_ec.redundancy_block_count=16 \
    --attribute doca_ec.matrix_type=cauchy
Results Output
None.
Results Notes
The precision-latency output will be similar to the output presented earlier on this page.
This test invokes DOCA Bench in Comch consumer mode, using a core list on both the host side and the BlueField side.
The run limit is 500 jobs.
Command Line
./doca_bench --core-list 4 \
    --warm-up-jobs 32 \
    --pipeline-steps doca_comch::consumer \
    --device ca:00.0 \
    --data-provider random-data \
    --run-limit-jobs 500 \
    --core-count 1 \
    --uniform-job-size 4096 \
    --job-output-buffer-size 4096 \
    --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
    --attribute dopt.companion_app.path=<path to DPU doca_bench_companion application location> \
    --data-provider-job-count 256 \
    --companion-core-list 12
Results Output
[main] Completed! tearing down...
Aggregate stats
    Duration: 1415 micro seconds
    Enqueued jobs: 500
    Dequeued jobs: 500
    Throughput: 000.353 MOperations/s
    Ingress rate: 000.000 Gib/s
    Egress rate: 010.782 Gib/s
Results Notes
The aggregate statistics show that the test completed after processing 500 jobs.
This test invokes DOCA Bench in Comch producer mode, using a core list on both the host side and the BlueField side.
The run limit is 500 jobs.
Command Line
doca_bench --core-list 4 \
    --warm-up-jobs 32 \
    --pipeline-steps doca_comch::producer \
    --device ca:00.0 \
    --data-provider random-data \
    --run-limit-jobs 500 \
    --core-count 1 \
    --uniform-job-size 4096 \
    --job-output-buffer-size 4096 \
    --companion-connection-string proto=tcp,mode=dpu,dev=03:00.0,user=bob,addr=10.10.10.10,port=12345 \
    --attribute dopt.companion_app.path=<path to DPU doca_bench_companion location> \
    --data-provider-job-count 256 \
    --companion-core-list 12
Results Output
[main] Completed! tearing down...
Aggregate stats
    Duration: 407 micro seconds
    Enqueued jobs: 500
    Dequeued jobs: 500
    Throughput: 001.226 MOperations/s
    Ingress rate: 037.402 Gib/s
    Egress rate: 000.000 Gib/s
Results Notes
The aggregate statistics show that the test completed after processing 500 jobs.
This test invokes DOCA Bench in RDMA send mode, using a core list on both the sending and receiving sides.
The send queue size is configured with 50 entries.
Command Line
doca_bench --pipeline-steps doca_rdma::send \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --send-queue-size 50 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
    --companion-core-list 12 \
    --core-list 12
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: 50
RQ depth: -- not configured --
Input data file: -- not configured --
]
Results Notes
The configuration output shows the send queue size configured as 50.
This test invokes DOCA Bench in RDMA receive mode, using a core list on both the sending and receiving sides.
The receive queue size is configured with 100 entries.
Command Line
doca_bench --pipeline-steps doca_rdma::receive \
    --device d8:00.0 \
    --data-provider random-data \
    --uniform-job-size 2048 \
    --job-output-buffer-size 2048 \
    --run-limit-seconds 3 \
    --receive-queue-size 100 \
    --companion-connection-string proto=tcp,addr=10.10.10.10,port=12345,user=bob,dev=ca:00.0 \
    --companion-core-list 12 \
    --core-list 12
Results Output
Test permutations: [
Attributes: []
Uniform job size: 2048
Core count: 1
Per core thread count: 1
Task pool size: 1024
Data provider job count: 128
MTU size: -- not configured --
SQ depth: -- not configured --
RQ depth: 100
Input data file: -- not configured --
]
Results Overview
The configuration output shows the receive queue size configured as 100.