Performance
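You can use the genai-perf tool to benchmark the performance of the Text Embedding NIM under simulated production load. genai-perf comes pre-installed in the Triton Server SDK container.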
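To run a performance benchmark, first create a dataset of sample texts that genai-perf can use when it sends requests to the embedding service. These samples should be representative of the data you expect to receive in production. Format the dataset as a JSONL file in which each line contains one {"text": ...} object, as in the following example.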
Example (embeddings.jsonl):
{"text": "What was the first car ever driven?"}
{"text": "Who served as the 5th President of the United States of America?"}
{"text": "Is the Sydney Opera House located in Australia?"}
{"text": "In what state did they film Shrek 2?"}
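If the sample texts already live in a Python list or another data source, a short script can produce and check the file. The following is a minimal sketch, not part of the NIM or genai-perf tooling: the helper names `write_jsonl` and `validate_jsonl` are hypothetical, and the only assumption about the format is the one stated above (one `{"text": ...}` object per line).

```python
import json
from pathlib import Path

# Sample texts taken from the example file above.
SAMPLES = [
    "What was the first car ever driven?",
    "Who served as the 5th President of the United States of America?",
    "Is the Sydney Opera House located in Australia?",
    "In what state did they film Shrek 2?",
]


def write_jsonl(texts, path):
    """Write one {"text": ...} object per line; return the number of lines written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # skip empty samples
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


def validate_jsonl(path):
    """Raise ValueError unless every line is a JSON object with a non-empty "text" string."""
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            record = json.loads(line)
            text = record.get("text") if isinstance(record, dict) else None
            if not isinstance(text, str) or not text:
                raise ValueError(f"line {lineno}: expected a non-empty 'text' string")
    return True
```

Calling `write_jsonl(SAMPLES, "datasets/embeddings.jsonl")` followed by `validate_jsonl("datasets/embeddings.jsonl")` would create the file in a `datasets/` directory under the current working directory.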
Use the following example to run the Triton Inference Server SDK Docker container, mounting the directory in which you created the JSONL file (datasets/ in the example below).
export RELEASE="yy.mm" # e.g. export RELEASE="24.10"
docker run -it --rm \
  --gpus=all \
  --network="host" \
  --mount type=bind,source=${PWD}/datasets,target=/datasets \
  nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
Run the following command to execute the performance benchmark with the genai-perf command-line tool.
genai-perf profile \
  -m nvidia/nv-embedqa-e5-v5 \
  --service-kind openai \
  --endpoint-type embeddings \
  --batch-size 2 \
  --input-file /datasets/embeddings.jsonl \
  --extra-inputs input_type:query \
  --extra-inputs truncate:END \
  --concurrency 5 \
  --url http://127.0.0.1:8000
For the full set of genai-perf command-line options, see the Command Line Options section of the GenAI-Perf documentation.
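To measure several concurrency levels, the same command can be generated once per level. The sketch below only builds and prints the command lines. It uses exactly the flags shown in the command above, and the helper names `build_genai_perf_cmd` and `sweep_commands` are hypothetical.

```python
import shlex


def build_genai_perf_cmd(concurrency, batch_size=2, input_type="query",
                         model="nvidia/nv-embedqa-e5-v5",
                         input_file="/datasets/embeddings.jsonl",
                         url="http://127.0.0.1:8000"):
    """Return the argument list for one genai-perf run (flags copied from the command above)."""
    return [
        "genai-perf", "profile",
        "-m", model,
        "--service-kind", "openai",
        "--endpoint-type", "embeddings",
        "--batch-size", str(batch_size),
        "--input-file", input_file,
        "--extra-inputs", f"input_type:{input_type}",
        "--extra-inputs", "truncate:END",
        "--concurrency", str(concurrency),
        "--url", url,
    ]


def sweep_commands(concurrencies, **kwargs):
    """Build one command per concurrency level; extra keyword arguments are passed through."""
    return [build_genai_perf_cmd(c, **kwargs) for c in concurrencies]


# Print the commands for a small sweep so they can be reviewed before running.
for cmd in sweep_commands([1, 3, 5]):
    print(shlex.join(cmd))
```

To execute the runs instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)`, assuming a Python interpreter is available in the environment where genai-perf is installed.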
Benchmarks
All latency measurements are reported in milliseconds.

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 7.0 | 7.0 | 8.0 | 8.0 | 140.7 |
query | 20 | 1 | 3 | 10.0 | 10.0 | 10.0 | 10.0 | 291.7 |
query | 20 | 1 | 5 | 16.0 | 17.0 | 17.0 | 17.0 | 307.0 |
query | 20 | 1 | 7 | 22.0 | 23.0 | 24.0 | 24.0 | 310.9 |
query | 20 | 1 | 9 | 25.0 | 26.0 | 31.0 | 31.0 | 329.7 |
query | 20 | 1 | 11 | 30.0 | 32.0 | 38.0 | 38.0 | 332.8 |
query | 20 | 1 | 13 | 36.0 | 37.0 | 43.0 | 44.0 | 340.0 |
query | 20 | 1 | 15 | 42.0 | 43.0 | 48.0 | 50.0 | 327.5 |
passage | 300 | 64 | 1 | 159.0 | 159.0 | 162.0 | 163.0 | 401.8 |
passage | 300 | 64 | 3 | 267.0 | 269.0 | 275.0 | 277.0 | 709.4 |
passage | 300 | 64 | 5 | 392.0 | 390.0 | 401.0 | 411.0 | 814.5 |
passage | 512 | 64 | 1 | 222.0 | 218.0 | 233.0 | 235.0 | 286.3 |
passage | 512 | 64 | 3 | 431.0 | 430.0 | 450.0 | 455.0 | 440.1 |
passage | 512 | 64 | 5 | 615.0 | 617.0 | 694.0 | 701.0 | 504.3 |
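The columns of the preceding table are related. If the load generator keeps a fixed number of requests in flight (an assumption about how the concurrency setting behaves), throughput should be roughly concurrency × batch size ÷ average latency. The sketch below checks that approximation against four rows of the table above. The function name `expected_throughput` is hypothetical.

```python
def expected_throughput(concurrency, batch_size, avg_latency_ms):
    """Approximate inputs/s for a closed-loop load: inputs in flight divided by latency."""
    return concurrency * batch_size / (avg_latency_ms / 1000.0)


# (input type, batch size, concurrency, avg latency in ms, reported inputs/s),
# copied from the table above.
ROWS = [
    ("query", 1, 1, 7.0, 140.7),
    ("query", 1, 3, 10.0, 291.7),
    ("passage", 64, 1, 159.0, 401.8),
    ("passage", 64, 3, 267.0, 709.4),
]

for input_type, batch, conc, latency, reported in ROWS:
    estimate = expected_throughput(conc, batch, latency)
    print(f"{input_type:8s} concurrency={conc} estimate={estimate:7.1f} reported={reported:7.1f}")
```

For these rows the estimate lands within a few percent of the reported throughput. The small gap is plausibly due to rounding in the latency column and to per-request overhead outside the measured latency.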

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 8.0 | 8.0 | 8.0 | 9.0 | 125.6 |
query | 20 | 1 | 3 | 12.0 | 12.0 | 13.0 | 13.0 | 246.1 |
query | 20 | 1 | 5 | 20.0 | 21.0 | 22.0 | 22.0 | 247.9 |
query | 20 | 1 | 7 | 25.0 | 26.0 | 30.0 | 30.0 | 267.5 |
query | 20 | 1 | 9 | 34.0 | 34.0 | 38.0 | 39.0 | 251.5 |
query | 20 | 1 | 11 | 43.0 | 46.0 | 47.0 | 48.0 | 237.7 |
query | 20 | 1 | 13 | 46.0 | 49.0 | 53.0 | 54.0 | 260.9 |
query | 20 | 1 | 15 | 55.0 | 58.0 | 65.0 | 65.0 | 248.4 |
passage | 300 | 64 | 1 | 186.0 | 186.0 | 190.0 | 192.0 | 342.7 |
passage | 300 | 64 | 3 | 345.0 | 346.0 | 354.0 | 356.0 | 550.7 |
passage | 300 | 64 | 5 | 525.0 | 524.0 | 535.0 | 544.0 | 608.3 |
passage | 512 | 64 | 1 | 269.0 | 265.0 | 280.0 | 284.0 | 237.2 |
passage | 512 | 64 | 3 | 563.0 | 564.0 | 581.0 | 584.0 | 337.3 |
passage | 512 | 64 | 5 | 846.0 | 861.0 | 946.0 | 956.0 | 366.0 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 10.0 | 10.0 | 10.0 | 11.0 | 98.0 |
query | 20 | 1 | 3 | 15.0 | 17.0 | 17.0 | 17.0 | 192.5 |
query | 20 | 1 | 5 | 24.0 | 23.0 | 28.0 | 28.0 | 200.8 |
query | 20 | 1 | 7 | 34.0 | 34.0 | 39.0 | 40.0 | 197.0 |
query | 20 | 1 | 9 | 44.0 | 45.0 | 51.0 | 51.0 | 197.3 |
query | 20 | 1 | 11 | 53.0 | 56.0 | 62.0 | 62.0 | 196.8 |
query | 20 | 1 | 13 | 63.0 | 67.0 | 73.0 | 73.0 | 190.6 |
query | 20 | 1 | 15 | 73.0 | 79.0 | 84.0 | 84.0 | 188.3 |
passage | 300 | 64 | 1 | 277.0 | 278.0 | 280.0 | 281.0 | 230.3 |
passage | 300 | 64 | 3 | 615.0 | 617.0 | 629.0 | 632.0 | 309.4 |
passage | 300 | 64 | 5 | 976.0 | 975.0 | 983.0 | 987.0 | 327.4 |
passage | 512 | 64 | 1 | 443.0 | 441.0 | 448.0 | 454.0 | 144.3 |
passage | 512 | 64 | 3 | 1071.0 | 1077.0 | 1089.0 | 1090.0 | 177.6 |
passage | 512 | 64 | 5 | 1736.0 | 1735.0 | 1752.0 | 1758.0 | 184.1 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 12.0 | 12.0 | 12.0 | 12.0 | 85.7 |
query | 20 | 1 | 3 | 19.0 | 21.0 | 22.0 | 22.0 | 150.3 |
query | 20 | 1 | 5 | 32.0 | 29.0 | 36.0 | 36.0 | 152.8 |
query | 20 | 1 | 7 | 46.0 | 50.0 | 50.0 | 50.0 | 148.0 |
query | 20 | 1 | 9 | 55.0 | 57.0 | 65.0 | 65.0 | 156.3 |
query | 20 | 1 | 11 | 68.0 | 72.0 | 79.0 | 79.0 | 154.5 |
query | 20 | 1 | 13 | 79.0 | 86.0 | 93.0 | 93.0 | 154.4 |
query | 20 | 1 | 15 | 84.0 | 86.0 | 95.0 | 101.0 | 165.0 |
passage | 300 | 64 | 1 | 350.0 | 350.0 | 353.0 | 354.0 | 182.8 |
passage | 300 | 64 | 3 | 821.0 | 823.0 | 831.0 | 842.0 | 231.7 |
passage | 300 | 64 | 5 | 1320.0 | 1317.0 | 1330.0 | 1346.0 | 242.1 |
passage | 512 | 64 | 1 | 567.0 | 566.0 | 573.0 | 575.0 | 112.7 |
passage | 512 | 64 | 3 | 1440.0 | 1448.0 | 1464.0 | 1475.0 | 132.2 |
passage | 512 | 64 | 5 | 2370.0 | 2371.0 | 2391.0 | 2396.0 | 134.8 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 8.0 | 8.0 | 9.0 | 10.0 | 122.3 |
query | 20 | 1 | 3 | 11.0 | 12.0 | 14.0 | 14.0 | 252.9 |
query | 20 | 1 | 5 | 20.0 | 19.0 | 23.0 | 24.0 | 250.9 |
query | 20 | 1 | 7 | 27.0 | 27.0 | 31.0 | 32.0 | 251.6 |
query | 20 | 1 | 9 | 30.0 | 30.0 | 37.0 | 42.0 | 280.1 |
query | 20 | 1 | 11 | 41.0 | 42.0 | 47.0 | 51.0 | 250.9 |
query | 20 | 1 | 13 | 46.0 | 49.0 | 53.0 | 57.0 | 262.0 |
query | 20 | 1 | 15 | 50.0 | 51.0 | 60.0 | 63.0 | 269.7 |
passage | 300 | 64 | 1 | 325.0 | 324.0 | 331.0 | 333.0 | 196.9 |
passage | 300 | 64 | 3 | 699.0 | 699.0 | 717.0 | 723.0 | 272.1 |
passage | 300 | 64 | 5 | 1095.0 | 1093.0 | 1109.0 | 1113.0 | 291.8 |
passage | 512 | 64 | 1 | 496.0 | 497.0 | 502.0 | 518.0 | 128.8 |
passage | 512 | 64 | 3 | 1165.0 | 1169.0 | 1199.0 | 1209.0 | 163.2 |
passage | 512 | 64 | 5 | 1872.0 | 1870.0 | 1904.0 | 1916.0 | 170.5 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 10.0 | 9.0 | 10.0 | 10.0 | 104.7 |
query | 20 | 1 | 3 | 12.0 | 12.0 | 12.0 | 12.0 | 246.7 |
query | 20 | 1 | 5 | 18.0 | 19.0 | 20.0 | 20.0 | 276.7 |
query | 20 | 1 | 7 | 25.0 | 24.0 | 28.0 | 28.0 | 268.9 |
query | 20 | 1 | 9 | 30.0 | 31.0 | 36.0 | 36.0 | 291.4 |
query | 20 | 1 | 11 | 38.0 | 39.0 | 44.0 | 44.0 | 270.3 |
query | 20 | 1 | 13 | 43.0 | 44.0 | 52.0 | 52.0 | 272.4 |
query | 20 | 1 | 15 | 49.0 | 52.0 | 56.0 | 57.0 | 276.4 |
passage | 300 | 64 | 1 | 362.0 | 362.0 | 367.0 | 369.0 | 176.8 |
passage | 300 | 64 | 3 | 786.0 | 789.0 | 803.0 | 812.0 | 241.9 |
passage | 300 | 64 | 5 | 1240.0 | 1239.0 | 1258.0 | 1262.0 | 257.6 |
passage | 512 | 64 | 1 | 550.0 | 548.0 | 565.0 | 570.0 | 116.2 |
passage | 512 | 64 | 3 | 1327.0 | 1335.0 | 1354.0 | 1359.0 | 143.4 |
passage | 512 | 64 | 5 | 2145.0 | 2145.0 | 2180.0 | 2187.0 | 149.0 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 21.0 | 21.0 | 21.0 | 21.0 | 48.6 |
query | 20 | 1 | 3 | 39.0 | 44.0 | 45.0 | 45.0 | 75.8 |
query | 20 | 1 | 5 | 61.0 | 60.0 | 75.0 | 75.0 | 80.2 |
query | 20 | 1 | 7 | 85.0 | 89.0 | 104.0 | 105.0 | 79.2 |
query | 20 | 1 | 9 | 109.0 | 119.0 | 134.0 | 134.0 | 79.0 |
query | 20 | 1 | 11 | 144.0 | 149.0 | 164.0 | 164.0 | 73.2 |
query | 20 | 1 | 13 | 159.0 | 164.0 | 194.0 | 194.0 | 76.5 |
query | 20 | 1 | 15 | 175.0 | 179.0 | 209.0 | 209.0 | 79.4 |
passage | 300 | 64 | 1 | 888.0 | 888.0 | 899.0 | 899.0 | 72.1 |
passage | 300 | 64 | 3 | 2272.0 | 2280.0 | 2329.0 | 2341.0 | 83.9 |
passage | 300 | 64 | 5 | 3795.0 | 3801.0 | 3828.0 | 3840.0 | 84.2 |
passage | 512 | 64 | 1 | 1451.0 | 1451.0 | 1471.0 | 1473.0 | 44.1 |
passage | 512 | 64 | 3 | 3926.0 | 3947.0 | 3988.0 | 4009.0 | 48.6 |
passage | 512 | 64 | 5 | 6577.0 | 6571.0 | 6632.0 | 6657.0 | 48.6 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 13.0 | 13.0 | 14.0 | 15.0 | 74.2 |
query | 20 | 1 | 3 | 21.0 | 23.0 | 23.0 | 23.0 | 138.0 |
query | 20 | 1 | 5 | 32.0 | 31.0 | 39.0 | 39.0 | 152.2 |
query | 20 | 1 | 7 | 46.0 | 46.0 | 53.0 | 54.0 | 145.8 |
query | 20 | 1 | 9 | 56.0 | 61.0 | 69.0 | 69.0 | 150.3 |
query | 20 | 1 | 11 | 67.0 | 69.0 | 76.0 | 77.0 | 154.0 |
query | 20 | 1 | 13 | 86.0 | 91.0 | 98.0 | 99.0 | 141.2 |
query | 20 | 1 | 15 | 98.0 | 105.0 | 113.0 | 114.0 | 141.0 |
passage | 300 | 64 | 1 | 724.0 | 725.0 | 730.0 | 733.0 | 88.3 |
passage | 300 | 64 | 3 | 1865.0 | 1876.0 | 1891.0 | 1892.0 | 102.1 |
passage | 300 | 64 | 5 | 3117.0 | 3127.0 | 3147.0 | 3150.0 | 102.1 |
passage | 512 | 64 | 1 | 1300.0 | 1300.0 | 1314.0 | 1318.0 | 49.2 |
passage | 512 | 64 | 3 | 3551.0 | 3577.0 | 3606.0 | 3618.0 | 53.6 |
passage | 512 | 64 | 5 | 5940.0 | 5968.0 | 6000.0 | 6007.0 | 53.6 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
query | 20 | 1 | 1 | 18.0 | 18.0 | 18.0 | 19.0 | 55.8 |
query | 20 | 1 | 3 | 33.0 | 37.0 | 38.0 | 38.0 | 87.7 |
query | 20 | 1 | 5 | 55.0 | 62.0 | 63.0 | 63.0 | 87.4 |
query | 20 | 1 | 7 | 71.0 | 75.0 | 88.0 | 88.0 | 94.6 |
query | 20 | 1 | 9 | 96.0 | 101.0 | 114.0 | 114.0 | 90.3 |
query | 20 | 1 | 11 | 115.0 | 126.0 | 139.0 | 139.0 | 89.8 |
query | 20 | 1 | 13 | 131.0 | 138.0 | 164.0 | 164.0 | 91.4 |
query | 20 | 1 | 15 | 154.0 | 157.0 | 189.0 | 189.0 | 88.4 |
passage | 300 | 64 | 1 | 1011.0 | 1011.0 | 1019.0 | 1019.0 | 63.3 |
passage | 300 | 64 | 3 | 2702.0 | 2716.0 | 2734.0 | 2754.0 | 70.6 |
passage | 300 | 64 | 5 | 4527.0 | 4527.0 | 4550.0 | 4571.0 | 70.6 |
passage | 512 | 64 | 1 | 1705.0 | 1710.0 | 1733.0 | 1740.0 | 37.5 |
passage | 512 | 64 | 3 | 4770.0 | 4799.0 | 4860.0 | 4872.0 | 39.9 |
passage | 512 | 64 | 5 | 8001.0 | 8025.0 | 8079.0 | 8085.0 | 39.9 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 99.8 | 100.9 | 107.9 | 108.6 | 639.0 |
passage | 300 | 64 | 3 | 143.8 | 143.3 | 156.6 | 159.0 | 1330.0 |
passage | 300 | 64 | 5 | 239.7 | 239.7 | 259.1 | 265.0 | 1331.0 |
passage | 512 | 64 | 1 | 114.6 | 114.4 | 115.9 | 117.0 | 556.5 |
passage | 512 | 64 | 3 | 170.2 | 169.9 | 171.2 | 171.8 | 1124.2 |
passage | 512 | 64 | 5 | 284.6 | 284.5 | 285.6 | 286.1 | 1121.4 |
query | 20 | 1 | 1 | 5.1 | 5.1 | 5.4 | 5.4 | 196.3 |
query | 20 | 1 | 3 | 6.0 | 5.5 | 7.4 | 7.6 | 498.5 |
query | 20 | 1 | 5 | 11.9 | 12.3 | 12.8 | 12.9 | 418.3 |
query | 20 | 1 | 7 | 16.5 | 17.2 | 18.0 | 18.1 | 422.0 |
query | 20 | 1 | 9 | 21.4 | 22.3 | 23.3 | 23.6 | 418.3 |
query | 20 | 1 | 11 | 26.0 | 26.0 | 28.4 | 28.6 | 421.3 |
query | 20 | 1 | 13 | 30.7 | 30.9 | 33.1 | 33.6 | 422.2 |
query | 20 | 1 | 15 | 37.3 | 37.9 | 39.1 | 39.3 | 401.4 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 2554.3 | 2563.9 | 2678.1 | 2698.3 | 25.0 |
passage | 300 | 64 | 3 | 7349.2 | 7502.1 | 7889.3 | 7968.1 | 25.5 |
passage | 300 | 64 | 5 | 11913.2 | 12461.9 | 12893.4 | 12969.4 | 25.6 |
passage | 512 | 64 | 1 | 3701.9 | 3701.6 | 3703.1 | 3703.4 | 17.3 |
passage | 512 | 64 | 3 | 10730.2 | 10985.2 | 10987.0 | 11029.2 | 17.5 |
passage | 512 | 64 | 5 | 17355.4 | 14691.3 | 22035.4 | 22035.7 | 17.4 |
query | 20 | 1 | 1 | 32.4 | 32.4 | 32.7 | 32.8 | 30.7 |
query | 20 | 1 | 3 | 82.5 | 85.6 | 85.9 | 86.0 | 36.3 |
query | 20 | 1 | 5 | 135.5 | 142.9 | 143.3 | 143.3 | 36.8 |
query | 20 | 1 | 7 | 191.7 | 200.2 | 200.5 | 200.6 | 36.5 |
query | 20 | 1 | 9 | 246.9 | 257.4 | 257.8 | 257.9 | 36.4 |
query | 20 | 1 | 11 | 301.7 | 314.6 | 315.1 | 315.2 | 36.4 |
query | 20 | 1 | 13 | 356.6 | 371.6 | 372.2 | 372.4 | 36.4 |
query | 20 | 1 | 15 | 409.5 | 401.4 | 429.8 | 429.9 | 36.5 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 176.5 | 177.1 | 188.6 | 190.4 | 362.2 |
passage | 300 | 64 | 3 | 336.1 | 337.0 | 359.1 | 365.8 | 570.2 |
passage | 300 | 64 | 5 | 560.2 | 562.9 | 592.3 | 634.6 | 569.8 |
passage | 512 | 64 | 1 | 205.3 | 204.7 | 208.2 | 210.8 | 311.4 |
passage | 512 | 64 | 3 | 410.9 | 411.1 | 412.5 | 412.7 | 466.4 |
passage | 512 | 64 | 5 | 681.5 | 682.0 | 683.6 | 684.1 | 468.7 |
query | 20 | 1 | 1 | 5.3 | 5.3 | 5.6 | 5.7 | 186.3 |
query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.7 | 403.8 |
query | 20 | 1 | 5 | 11.9 | 12.4 | 12.6 | 12.8 | 419.2 |
query | 20 | 1 | 7 | 16.6 | 17.3 | 17.5 | 17.6 | 421.5 |
query | 20 | 1 | 9 | 21.2 | 22.1 | 22.5 | 22.6 | 423.9 |
query | 20 | 1 | 11 | 26.1 | 27.2 | 27.7 | 27.8 | 420.5 |
query | 20 | 1 | 13 | 30.8 | 31.2 | 32.6 | 32.7 | 422.3 |
query | 20 | 1 | 15 | 36.4 | 37.3 | 37.9 | 38.0 | 411.7 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 188.4 | 191.7 | 197.7 | 198.8 | 338.7 |
passage | 300 | 64 | 3 | 371.7 | 372.7 | 393.8 | 471.0 | 515.3 |
passage | 300 | 64 | 5 | 619.5 | 621.7 | 648.5 | 728.8 | 515.1 |
passage | 512 | 64 | 1 | 222.7 | 222.3 | 226.0 | 227.4 | 286.5 |
passage | 512 | 64 | 3 | 447.0 | 447.0 | 448.7 | 449.4 | 428.4 |
passage | 512 | 64 | 5 | 742.3 | 742.8 | 745.0 | 745.5 | 430.1 |
query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.1 | 149.3 |
query | 20 | 1 | 3 | 7.4 | 7.3 | 7.6 | 7.7 | 404.8 |
query | 20 | 1 | 5 | 11.8 | 12.2 | 12.5 | 12.6 | 421.5 |
query | 20 | 1 | 7 | 16.4 | 17.1 | 17.4 | 17.5 | 426.4 |
query | 20 | 1 | 9 | 20.9 | 21.9 | 22.3 | 22.4 | 429.9 |
query | 20 | 1 | 11 | 25.7 | 26.8 | 27.4 | 27.7 | 427.2 |
query | 20 | 1 | 13 | 30.4 | 31.5 | 32.1 | 32.2 | 427.4 |
query | 20 | 1 | 15 | 35.6 | 36.4 | 37.9 | 38.0 | 420.6 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 377.9 | 376.4 | 396.5 | 404.5 | 169.1 |
passage | 300 | 64 | 3 | 929.9 | 932.3 | 972.0 | 979.2 | 205.8 |
passage | 300 | 64 | 5 | 1545.5 | 1555.2 | 1597.3 | 1610.3 | 205.9 |
passage | 512 | 64 | 1 | 469.5 | 468.6 | 473.9 | 475.1 | 136.2 |
passage | 512 | 64 | 3 | 1178.9 | 1182.4 | 1183.0 | 1183.2 | 162.2 |
passage | 512 | 64 | 5 | 1958.1 | 1970.9 | 1971.6 | 1971.8 | 162.2 |
query | 20 | 1 | 1 | 11.1 | 11.1 | 11.5 | 11.6 | 89.8 |
query | 20 | 1 | 3 | 19.3 | 20.3 | 20.8 | 21.0 | 154.9 |
query | 20 | 1 | 5 | 32.1 | 34.0 | 34.6 | 34.8 | 155.5 |
query | 20 | 1 | 7 | 44.8 | 47.4 | 48.1 | 48.2 | 156.0 |
query | 20 | 1 | 9 | 57.7 | 60.9 | 61.8 | 62.0 | 155.8 |
query | 20 | 1 | 11 | 70.6 | 74.0 | 75.5 | 75.7 | 155.5 |
query | 20 | 1 | 13 | 83.8 | 82.8 | 89.2 | 89.6 | 154.9 |
query | 20 | 1 | 15 | 97.5 | 96.6 | 103.1 | 103.4 | 153.7 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 483.6 | 483.6 | 505.2 | 509.5 | 132.3 |
passage | 300 | 64 | 3 | 1328.0 | 1334.4 | 1367.9 | 1379.6 | 143.9 |
passage | 300 | 64 | 5 | 2181.8 | 2203.7 | 2241.9 | 2250.4 | 145.5 |
passage | 512 | 64 | 1 | 633.8 | 633.8 | 638.6 | 639.4 | 100.9 |
passage | 512 | 64 | 3 | 1744.9 | 1755.3 | 1761.5 | 1763.1 | 109.4 |
passage | 512 | 64 | 5 | 2892.2 | 2923.9 | 2934.8 | 2936.8 | 109.4 |
query | 20 | 1 | 1 | 8.0 | 8.0 | 8.3 | 8.3 | 124.1 |
query | 20 | 1 | 3 | 11.2 | 12.2 | 12.6 | 12.8 | 266.1 |
query | 20 | 1 | 5 | 19.9 | 20.6 | 21.1 | 21.2 | 250.3 |
query | 20 | 1 | 7 | 27.6 | 28.9 | 29.4 | 29.6 | 253.0 |
query | 20 | 1 | 9 | 35.1 | 36.7 | 37.3 | 37.5 | 256.1 |
query | 20 | 1 | 11 | 42.7 | 44.6 | 45.5 | 45.7 | 256.9 |
query | 20 | 1 | 13 | 50.7 | 50.3 | 54.0 | 54.2 | 255.9 |
query | 20 | 1 | 15 | 57.4 | 57.9 | 62.2 | 62.5 | 261.0 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 552.1 | 555.1 | 572.6 | 575.1 | 115.5 |
passage | 300 | 64 | 3 | 1228.5 | 1229.5 | 1325.3 | 1527.0 | 155.4 |
passage | 300 | 64 | 5 | 2045.3 | 2058.5 | 2153.8 | 2231.9 | 155.3 |
passage | 512 | 64 | 1 | 730.0 | 729.7 | 732.3 | 733.6 | 87.4 |
passage | 512 | 64 | 3 | 1775.8 | 1779.3 | 1784.0 | 1784.5 | 107.7 |
passage | 512 | 64 | 5 | 2945.8 | 2539.2 | 3431.0 | 3432.8 | 107.6 |
query | 20 | 1 | 1 | 14.6 | 14.6 | 15.2 | 15.4 | 67.9 |
query | 20 | 1 | 3 | 29.1 | 30.7 | 31.6 | 31.9 | 102.7 |
query | 20 | 1 | 5 | 48.7 | 51.4 | 52.6 | 52.9 | 102.3 |
query | 20 | 1 | 7 | 68.2 | 72.0 | 73.7 | 74.0 | 102.4 |
query | 20 | 1 | 9 | 86.7 | 90.2 | 94.0 | 94.6 | 103.7 |
query | 20 | 1 | 11 | 106.3 | 105.3 | 115.0 | 115.5 | 103.3 |
query | 20 | 1 | 13 | 125.3 | 125.0 | 134.9 | 135.8 | 103.6 |
query | 20 | 1 | 15 | 144.4 | 145.0 | 155.3 | 156.1 | 103.7 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 693.1 | 702.0 | 719.3 | 721.3 | 92.0 |
passage | 300 | 64 | 3 | 1674.4 | 1687.0 | 1849.1 | 2192.7 | 114.0 |
passage | 300 | 64 | 5 | 2780.4 | 2797.6 | 3082.0 | 3389.9 | 113.8 |
passage | 512 | 64 | 1 | 930.9 | 931.5 | 935.9 | 936.9 | 68.6 |
passage | 512 | 64 | 3 | 2398.1 | 2395.9 | 2403.8 | 2407.5 | 79.6 |
passage | 512 | 64 | 5 | 4056.4 | 4079.8 | 4098.5 | 4315.1 | 78.5 |
query | 20 | 1 | 1 | 19.8 | 19.7 | 20.6 | 20.7 | 50.1 |
query | 20 | 1 | 3 | 42.3 | 44.0 | 45.2 | 45.5 | 70.8 |
query | 20 | 1 | 5 | 70.1 | 73.4 | 75.1 | 75.8 | 71.1 |
query | 20 | 1 | 7 | 97.7 | 102.6 | 104.5 | 104.9 | 71.6 |
query | 20 | 1 | 9 | 124.9 | 131.3 | 134.2 | 134.8 | 71.9 |
query | 20 | 1 | 11 | 151.7 | 149.8 | 163.6 | 164.3 | 72.4 |
query | 20 | 1 | 13 | 180.3 | 178.8 | 193.3 | 194.0 | 72.0 |
query | 20 | 1 | 15 | 208.4 | 207.8 | 222.5 | 223.4 | 71.9 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1322.4 | 1322.2 | 1362.2 | 1369.5 | 48.3 |
passage | 300 | 64 | 3 | 3670.5 | 3674.6 | 3798.4 | 3824.3 | 52.2 |
passage | 300 | 64 | 5 | 6188.9 | 6219.1 | 6368.4 | 6378.3 | 51.4 |
passage | 512 | 64 | 1 | 1990.0 | 1990.5 | 2013.8 | 2018.5 | 32.1 |
passage | 512 | 64 | 3 | 5586.0 | 5601.7 | 5683.0 | 5689.6 | 34.3 |
passage | 512 | 64 | 5 | 9358.7 | 9398.1 | 9525.5 | 9570.5 | 34.0 |
query | 20 | 1 | 1 | 21.5 | 21.5 | 21.8 | 21.8 | 46.3 |
query | 20 | 1 | 3 | 47.8 | 51.1 | 51.5 | 51.7 | 62.5 |
query | 20 | 1 | 5 | 82.1 | 85.3 | 85.8 | 85.9 | 60.8 |
query | 20 | 1 | 7 | 112.1 | 119.2 | 120.0 | 120.2 | 62.3 |
query | 20 | 1 | 9 | 143.5 | 151.5 | 154.2 | 154.4 | 62.6 |
query | 20 | 1 | 11 | 176.5 | 174.3 | 188.5 | 188.8 | 62.2 |
query | 20 | 1 | 13 | 208.2 | 205.8 | 222.2 | 222.4 | 62.3 |
query | 20 | 1 | 15 | 239.0 | 239.5 | 256.2 | 256.6 | 62.7 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1954.6 | 1957.3 | 2010.7 | 2029.4 | 32.7 |
passage | 300 | 64 | 3 | 5650.6 | 5734.4 | 5950.8 | 7649.8 | 33.3 |
passage | 300 | 64 | 5 | 9470.5 | 9790.5 | 9947.0 | 10511.1 | 32.7 |
passage | 512 | 64 | 1 | 3038.8 | 3045.9 | 3079.6 | 3080.5 | 21.0 |
passage | 512 | 64 | 3 | 8659.0 | 8835.5 | 8944.2 | 8960.5 | 21.8 |
passage | 512 | 64 | 5 | 14292.3 | 14782.0 | 14948.8 | 14986.1 | 21.6 |
query | 20 | 1 | 1 | 29.3 | 29.2 | 29.5 | 29.6 | 34.0 |
query | 20 | 1 | 3 | 71.2 | 73.1 | 73.3 | 73.4 | 42.0 |
query | 20 | 1 | 5 | 113.8 | 121.7 | 122.2 | 122.3 | 43.9 |
query | 20 | 1 | 7 | 159.3 | 170.2 | 171.0 | 171.1 | 43.9 |
query | 20 | 1 | 9 | 204.7 | 217.6 | 219.9 | 220.0 | 43.9 |
query | 20 | 1 | 11 | 253.3 | 266.7 | 268.8 | 268.9 | 43.4 |
query | 20 | 1 | 13 | 299.2 | 295.0 | 317.5 | 317.7 | 43.4 |
query | 20 | 1 | 15 | 346.4 | 342.2 | 366.3 | 366.4 | 43.2 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1618.5 | 1619.8 | 1666.8 | 1700.4 | 39.5 |
passage | 300 | 64 | 3 | 4245.1 | 4279.2 | 4465.5 | 5715.4 | 44.6 |
passage | 300 | 64 | 5 | 7009.0 | 7159.7 | 7465.8 | 8631.8 | 44.6 |
passage | 512 | 64 | 1 | 2316.9 | 2317.3 | 2321.5 | 2323.1 | 27.6 |
passage | 512 | 64 | 3 | 6328.9 | 6407.9 | 6414.3 | 6415.5 | 29.9 |
passage | 512 | 64 | 5 | 10559.8 | 10698.6 | 11012.5 | 11124.2 | 29.7 |
query | 20 | 1 | 1 | 22.5 | 22.5 | 22.8 | 22.9 | 44.4 |
query | 20 | 1 | 3 | 49.5 | 53.2 | 53.6 | 53.8 | 60.6 |
query | 20 | 1 | 5 | 81.2 | 88.5 | 89.1 | 89.2 | 61.6 |
query | 20 | 1 | 7 | 114.8 | 123.9 | 124.5 | 124.7 | 60.9 |
query | 20 | 1 | 9 | 147.6 | 145.4 | 160.0 | 160.1 | 60.9 |
query | 20 | 1 | 11 | 179.3 | 177.9 | 195.4 | 195.6 | 61.3 |
query | 20 | 1 | 13 | 212.8 | 213.6 | 231.3 | 231.5 | 61.0 |
query | 20 | 1 | 15 | 243.0 | 248.4 | 266.5 | 266.7 | 61.7 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 1887.4 | 1890.5 | 1954.8 | 1970.8 | 33.9 |
passage | 300 | 64 | 3 | 5411.6 | 5587.9 | 5787.5 | 6121.0 | 34.9 |
passage | 300 | 64 | 5 | 8957.8 | 9469.9 | 11139.8 | 11486.4 | 34.6 |
passage | 512 | 64 | 1 | 2839.6 | 2851.3 | 2973.2 | 3008.1 | 22.5 |
passage | 512 | 64 | 3 | 8179.3 | 8529.3 | 8662.5 | 8678.7 | 23.0 |
passage | 512 | 64 | 5 | 13935.1 | 14520.8 | 14928.1 | 15156.9 | 22.5 |
query | 20 | 1 | 1 | 24.0 | 23.9 | 24.2 | 24.3 | 41.6 |
query | 20 | 1 | 3 | 51.3 | 54.4 | 55.3 | 55.5 | 58.4 |
query | 20 | 1 | 5 | 87.2 | 91.3 | 92.8 | 93.1 | 57.3 |
query | 20 | 1 | 7 | 120.8 | 126.9 | 129.5 | 129.8 | 57.9 |
query | 20 | 1 | 9 | 154.8 | 162.4 | 166.6 | 166.9 | 58.1 |
query | 20 | 1 | 11 | 187.7 | 185.9 | 203.5 | 203.8 | 58.5 |
query | 20 | 1 | 13 | 223.0 | 222.1 | 239.8 | 240.3 | 58.2 |
query | 20 | 1 | 15 | 256.2 | 258.2 | 276.6 | 277.3 | 58.5 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 16 | 1 | 2927.9 | 2927.4 | 3059.5 | 3089.5 | 5.5 |
passage | 300 | 16 | 3 | 8379.6 | 8563.5 | 8739.8 | 9256.8 | 5.6 |
passage | 300 | 16 | 5 | 13629.2 | 14355.2 | 14642.3 | 14710.0 | 5.6 |
passage | 300 | 64 | 1 | 11385.6 | 11350.1 | 11646.0 | 11693.3 | 5.6 |
passage | 300 | 64 | 3 | 29783.1 | 33442.8 | 33609.1 | 33687.4 | 5.7 |
passage | 300 | 64 | 5 | 43320.7 | 55557.3 | 55833.8 | 55911.2 | 5.8 |
query | 20 | 1 | 1 | 39.8 | 39.7 | 40.2 | 40.4 | 25.1 |
query | 20 | 1 | 3 | 95.5 | 100.7 | 101.4 | 101.7 | 31.4 |
query | 20 | 1 | 5 | 157.1 | 167.9 | 168.8 | 169.0 | 31.8 |
query | 20 | 1 | 7 | 224.1 | 235.1 | 236.2 | 236.5 | 31.2 |
query | 20 | 1 | 9 | 284.4 | 302.1 | 303.6 | 303.9 | 31.6 |
query | 20 | 1 | 11 | 345.7 | 339.8 | 370.6 | 370.9 | 31.8 |
query | 20 | 1 | 13 | 410.6 | 406.0 | 437.9 | 438.2 | 31.6 |
query | 20 | 1 | 15 | 470.4 | 472.7 | 505.1 | 505.6 | 31.8 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 16 | 1 | 1753.4 | 1745.5 | 1831.3 | 1850.1 | 9.1 |
passage | 300 | 16 | 3 | 5090.3 | 5173.0 | 5259.7 | 5416.0 | 9.3 |
passage | 300 | 16 | 5 | 8349.1 | 8605.0 | 8750.8 | 8830.5 | 9.3 |
passage | 300 | 64 | 1 | 7270.1 | 7290.9 | 7380.2 | 7390.7 | 8.8 |
passage | 300 | 64 | 3 | 20045.6 | 21451.2 | 21691.0 | 21695.3 | 8.9 |
passage | 300 | 64 | 5 | 31066.4 | 35673.0 | 36053.9 | 36088.0 | 8.9 |
query | 20 | 1 | 1 | 66.4 | 66.2 | 67.1 | 67.2 | 15.0 |
query | 20 | 1 | 3 | 168.6 | 179.1 | 180.8 | 181.5 | 17.8 |
query | 20 | 1 | 5 | 278.9 | 298.7 | 300.6 | 300.9 | 17.9 |
query | 20 | 1 | 7 | 388.5 | 417.5 | 419.9 | 420.6 | 18.0 |
query | 20 | 1 | 9 | 501.5 | 535.8 | 539.7 | 540.5 | 17.9 |
query | 20 | 1 | 11 | 616.4 | 603.0 | 659.1 | 659.8 | 17.8 |
query | 20 | 1 | 13 | 728.6 | 722.0 | 778.9 | 779.8 | 17.8 |
query | 20 | 1 | 15 | 838.3 | 840.9 | 897.7 | 898.6 | 17.8 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 98.0 | 95.0 | 105.8 | 106.6 | 651.7 |
passage | 300 | 64 | 3 | 144.9 | 142.3 | 159.3 | 174.9 | 1320.4 |
passage | 300 | 64 | 5 | 243.3 | 238.8 | 283.4 | 298.1 | 1311.5 |
passage | 512 | 64 | 1 | 112.0 | 112.0 | 112.9 | 113.2 | 569.2 |
passage | 512 | 64 | 3 | 223.2 | 253.4 | 257.1 | 257.7 | 857.7 |
passage | 512 | 64 | 5 | 300.7 | 295.7 | 356.5 | 360.0 | 1061.4 |
query | 20 | 1 | 1 | 4.6 | 4.6 | 4.8 | 4.8 | 215.6 |
query | 20 | 1 | 3 | 7.0 | 7.2 | 7.5 | 7.8 | 426.4 |
query | 20 | 1 | 5 | 11.4 | 11.9 | 12.1 | 12.2 | 434.7 |
query | 20 | 1 | 7 | 16.0 | 16.7 | 16.9 | 17.0 | 434.7 |
query | 20 | 1 | 9 | 20.6 | 21.4 | 21.8 | 21.9 | 435.3 |
query | 20 | 1 | 11 | 25.2 | 26.2 | 26.7 | 26.9 | 435.8 |
query | 20 | 1 | 13 | 30.2 | 31.2 | 31.8 | 32.1 | 429.8 |
query | 20 | 1 | 15 | 34.9 | 35.8 | 36.4 | 36.6 | 429.0 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 2586.4 | 2617.0 | 2802.0 | 2803.0 | 24.7 |
passage | 300 | 64 | 3 | 7438.4 | 7622.4 | 7961.6 | 8037.8 | 25.2 |
passage | 300 | 64 | 5 | 12158.7 | 12724.5 | 13256.9 | 13319.0 | 25.1 |
passage | 512 | 64 | 1 | 3727.8 | 3727.5 | 3728.9 | 3729.3 | 17.2 |
passage | 512 | 64 | 3 | 10810.7 | 11063.3 | 11102.2 | 11154.1 | 17.3 |
passage | 512 | 64 | 5 | 17458.0 | 14878.2 | 22157.8 | 22183.8 | 17.3 |
query | 20 | 1 | 1 | 32.3 | 32.2 | 32.6 | 32.7 | 30.8 |
query | 20 | 1 | 3 | 81.1 | 85.5 | 85.9 | 86.0 | 36.9 |
query | 20 | 1 | 5 | 136.5 | 142.8 | 143.1 | 143.3 | 36.6 |
query | 20 | 1 | 7 | 189.4 | 199.9 | 200.4 | 200.5 | 36.9 |
query | 20 | 1 | 9 | 245.6 | 257.0 | 257.6 | 257.8 | 36.6 |
query | 20 | 1 | 11 | 297.5 | 313.4 | 314.5 | 314.7 | 36.9 |
query | 20 | 1 | 13 | 350.9 | 344.2 | 371.6 | 371.8 | 37.0 |
query | 20 | 1 | 15 | 409.1 | 427.2 | 429.0 | 429.3 | 36.6 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 172.1 | 171.9 | 186.3 | 189.1 | 371.4 |
passage | 300 | 64 | 3 | 334.3 | 335.5 | 363.5 | 383.4 | 573.4 |
passage | 300 | 64 | 5 | 556.5 | 557.5 | 585.4 | 600.3 | 573.6 |
passage | 512 | 64 | 1 | 203.5 | 202.6 | 206.5 | 207.2 | 314.2 |
passage | 512 | 64 | 3 | 406.1 | 406.7 | 497.6 | 502.4 | 472.0 |
passage | 512 | 64 | 5 | 673.6 | 673.2 | 718.2 | 760.0 | 474.1 |
query | 20 | 1 | 1 | 5.3 | 5.2 | 5.6 | 5.7 | 188.6 |
query | 20 | 1 | 3 | 7.3 | 7.4 | 7.5 | 7.5 | 408.6 |
query | 20 | 1 | 5 | 11.9 | 12.3 | 12.5 | 12.5 | 417.7 |
query | 20 | 1 | 7 | 16.5 | 17.2 | 17.4 | 17.5 | 423.6 |
query | 20 | 1 | 9 | 21.2 | 22.1 | 22.3 | 22.4 | 424.4 |
query | 20 | 1 | 11 | 25.9 | 27.0 | 27.3 | 27.4 | 423.7 |
query | 20 | 1 | 13 | 30.8 | 31.9 | 32.4 | 32.5 | 421.9 |
query | 20 | 1 | 15 | 35.2 | 34.9 | 37.1 | 37.2 | 425.5 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 189.4 | 187.6 | 203.0 | 204.9 | 337.5 |
passage | 300 | 64 | 3 | 371.2 | 372.3 | 396.3 | 404.4 | 516.2 |
passage | 300 | 64 | 5 | 621.1 | 622.6 | 655.5 | 677.5 | 513.7 |
passage | 512 | 64 | 1 | 228.0 | 227.0 | 233.5 | 234.5 | 280.4 |
passage | 512 | 64 | 3 | 462.9 | 467.3 | 559.3 | 570.4 | 414.1 |
passage | 512 | 64 | 5 | 840.0 | 807.1 | 1040.2 | 1089.9 | 379.9 |
query | 20 | 1 | 1 | 6.6 | 6.6 | 7.0 | 7.2 | 150.7 |
query | 20 | 1 | 3 | 7.4 | 7.4 | 7.5 | 7.6 | 399.7 |
query | 20 | 1 | 5 | 12.1 | 12.5 | 12.7 | 12.8 | 411.2 |
query | 20 | 1 | 7 | 16.8 | 17.4 | 17.8 | 17.9 | 413.2 |
query | 20 | 1 | 9 | 21.7 | 22.4 | 22.9 | 23.0 | 413.7 |
query | 20 | 1 | 11 | 26.3 | 27.3 | 27.6 | 27.7 | 417.1 |
query | 20 | 1 | 13 | 31.1 | 32.0 | 32.6 | 32.7 | 416.9 |
query | 20 | 1 | 15 | 36.4 | 37.3 | 37.7 | 37.8 | 411.0 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 373.3 | 366.4 | 397.5 | 401.4 | 171.3 |
passage | 300 | 64 | 3 | 918.5 | 919.8 | 958.2 | 963.8 | 208.4 |
passage | 300 | 64 | 5 | 1527.0 | 1534.3 | 1589.6 | 1594.6 | 208.4 |
passage | 512 | 64 | 1 | 470.4 | 469.4 | 475.0 | 476.0 | 136.0 |
passage | 512 | 64 | 3 | 1180.7 | 1184.0 | 1184.5 | 1184.7 | 162.0 |
passage | 512 | 64 | 5 | 1960.7 | 1973.6 | 1974.0 | 1974.2 | 162.0 |
query | 20 | 1 | 1 | 10.7 | 10.7 | 11.0 | 11.0 | 93.2 |
query | 20 | 1 | 3 | 19.6 | 20.4 | 20.9 | 21.0 | 152.6 |
query | 20 | 1 | 5 | 32.2 | 34.2 | 34.9 | 35.3 | 154.9 |
query | 20 | 1 | 7 | 45.9 | 48.0 | 48.8 | 49.1 | 152.2 |
query | 20 | 1 | 9 | 59.5 | 62.0 | 63.0 | 63.2 | 151.1 |
query | 20 | 1 | 11 | 72.7 | 76.0 | 77.3 | 77.8 | 151.1 |
query | 20 | 1 | 13 | 85.7 | 89.0 | 90.7 | 90.9 | 151.6 |
query | 20 | 1 | 15 | 99.9 | 103.5 | 104.4 | 104.7 | 149.9 |

Input type | Input tokens | Batch size | Concurrency | Avg latency (ms) | P50 latency (ms) | P90 latency (ms) | P95 latency (ms) | Throughput (inputs/s) |
---|---|---|---|---|---|---|---|---|
passage | 300 | 64 | 1 | 489.9 | 488.4 | 519.3 | 521.3 | 130.6 |
passage | 300 | 64 | 3 | 1355.4 | 1354.3 | 1413.0 | 1423.2 | 141.0 |
passage | 300 | 64 | 5 | 2251.5 | 2271.3 | 2338.8 | 2345.4 | 140.9 |
passage | 512 | 64 | 1 | 641.5 | 640.7 | 647.5 | 648.6 | 99.7 |
passage | 512 | 64 | 3 | 1797.2 | 1807.8 | 1813.7 | 1814.9 | 106.2 |
passage | 512 | 64 | 5 | 2979.6 | 3014.9 | 3020.7 | 3021.9 | 106.2 |
query | 20 | 1 | 1 | 7.9 | 7.9 | 8.2 | 8.4 | 125.6 |
query | 20 | 1 | 3 | 11.9 | 12.3 | 12.6 | 12.7 | 251.4 |
query | 20 | 1 | 5 | 20.0 | 20.6 | 20.9 | 20.9 | 249.5 |
query | 20 | 1 | 7 | 27.7 | 28.9 | 29.4 | 29.5 | 251.5 |
query | 20 | 1 | 9 | 35.6 | 37.0 | 37.6 | 37.8 | 252.2 |
query | 20 | 1 | 11 | 43.6 | 45.3 | 45.9 | 46.1 | 251.9 |
query | 20 | 1 | 13 | 51.5 | 53.3 | 54.2 | 54.4 | 252.2 |
query | 20 | 1 | 15 | 59.6 | 59.3 | 63.0 | 63.3 | 251.0 |
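
When reading tables like these, one useful question is where throughput stops improving as concurrency grows. The sketch below is one possible way to answer it, not a method prescribed by the benchmark. It parses pipe-separated rows (here the query rows of the first benchmark table) and returns the lowest concurrency whose throughput is within a chosen tolerance of the peak. The function names and the 5% default tolerance are assumptions made for illustration.

```python
# Query rows copied from the first benchmark table in this section.
TABLE = """\
query | 20 | 1 | 1 | 7.0 | 7.0 | 8.0 | 8.0 | 140.7 |
query | 20 | 1 | 3 | 10.0 | 10.0 | 10.0 | 10.0 | 291.7 |
query | 20 | 1 | 5 | 16.0 | 17.0 | 17.0 | 17.0 | 307.0 |
query | 20 | 1 | 7 | 22.0 | 23.0 | 24.0 | 24.0 | 310.9 |
query | 20 | 1 | 9 | 25.0 | 26.0 | 31.0 | 31.0 | 329.7 |
query | 20 | 1 | 11 | 30.0 | 32.0 | 38.0 | 38.0 | 332.8 |
query | 20 | 1 | 13 | 36.0 | 37.0 | 43.0 | 44.0 | 340.0 |
query | 20 | 1 | 15 | 42.0 | 43.0 | 48.0 | 50.0 | 327.5 |
"""


def parse_rows(text):
    """Parse pipe-separated benchmark rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append({
            "input_type": cells[0],
            "concurrency": int(cells[3]),
            "avg_latency_ms": float(cells[4]),
            "throughput": float(cells[8]),
        })
    return rows


def saturation_concurrency(rows, tolerance=0.05):
    """Lowest concurrency whose throughput is within `tolerance` of the peak throughput."""
    peak = max(r["throughput"] for r in rows)
    return min(r["concurrency"] for r in rows
               if r["throughput"] >= peak * (1 - tolerance))


print(saturation_concurrency(parse_rows(TABLE)))
```

For these rows the function returns 9. Beyond that point, added concurrency mostly increases latency while throughput stays close to its peak of 340.0 inputs/s.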