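Monitoring Interfaces and Transceivers Using NVUE
NVUE enables you to check the status of an interface, and to view and clear interface counters. Interface counters provide information about an interface, such as the number of dropped packets and the number of inbound and outbound packets not transmitted because of errors.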
Show Interface Configuration and Statistics
To check the configuration and statistics for an interface, run the nv show interface <interface> command:
cumulus@switch:~$ nv show interface swp1
operational applied pending
------------------------ ----------------- ------- -------
type swp swp
[acl]
evpn
multihoming
uplink off
ptp
enable off
router
adaptive-routing
enable off
ospf
enable on
area none
cost auto
mtu-ignore off
network-type broadcast
passive on
priority 1
authentication
enable off
bfd
enable off
timers
dead-interval 40
hello-interval 10
retransmit-interval 5
transmit-delay 1
ospf6
enable off
pbr
[map]
pim
enable off
synce
enable off
ip
igmp
enable off
ipv4
forward on
ipv6
enable on
forward on
neighbor-discovery
enable on
[dnssl]
home-agent
enable off
[prefix]
[rdnss]
router-advertisement
enable on
fast-retransmit on
hop-limit 64
interval 600000
interval-option off
lifetime 1800
managed-config off
other-config off
reachable-time 0
retransmit-time 0
router-preference medium
vrrp
enable off
vrf default
[gateway]
link
auto-negotiate off on
duplex full full
speed 1G auto
fec auto
mtu 9216 9216
[breakout]
state up up
stats
carrier-transitions 4
in-bytes 300 Bytes
in-drops 5
in-errors 0
in-pkts 5
out-bytes 9.73 MB
out-drops 0
out-errors 0
out-pkts 140188
mac 48:b0:2d:ef:52:b8
ifindex 3
Show Interface Counters
NVUE provides the following commands to show counters (statistics) for the interfaces on the switch.
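NVUE Command | Description |
---|---|
nv show interface --view counters | Shows all statistics for all the interfaces configured on the switch, such as the total number of received and transmitted packets, and the number of received and transmitted dropped packets and error packets. |
nv show interface <interface> counters | Shows all statistics for a specific interface, such as the number of received and transmitted unicast, multicast, and broadcast packets, the number of received and transmitted dropped packets and error packets, and the number of received and transmitted packets of a certain size. |
nv show interface <interface> counters errors | Shows the number of error packets for a specific interface, such as the number of received and transmitted packet alignment, oversize, undersize, and jabber errors. |
nv show interface <interface> counters drops | Shows the number of received and transmitted packet drops for a specific interface, such as ACL drops, buffer drops, queue drops, and non-queue drops. |
nv show interface <interface> counters pktdist | Shows the number of received and transmitted packets of a certain size for a specific interface. |
nv show interface <interface> counters qos | Shows QoS statistics for a specific interface. See Show QoS Counters. |
nv show interface <interface> counters ptp | Shows PTP statistics for a specific interface. See Show PTP Counters. |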
The following example shows all statistics for all the interfaces configured on the switch:
cumulus@switch$ nv show interface --view counters
Interface MTU RX_OK RX_ERR RX_DRP RX_OVR TX_OK TX_ERR TX_DRP TX_OVR Flg
-------------- ----- ----- ------ ------ ------ ----- ------ ------ ------ -----
BLUE 65575 1 0 0 0 0 0 4 0 OmRU
RED 65575 1 0 0 0 0 0 4 0 OmRU
bond1 9000 718 0 0 0 1091 0 0 0 BMmRU
bond2 9000 727 0 0 0 1088 0 0 0 BMmRU
bond3 9000 722 0 0 0 1089 0 0 0 BMmRU
br_default 9216 360 0 10 0 475 0 0 0 BMRU
eth0 1500 946 0 0 0 299 0 0 0 BMRU
lo 65536 651 0 0 0 651 0 0 0 LRU
mgmt 65575 283 0 0 0 0 0 4 0 OmRU
peerlink 9216 4972 0 0 0 5028 0 0 0 BMmRU
peerlink.4094 9216 3263 0 0 0 3224 0 0 0 BMRU
swp1 9000 721 0 0 0 1091 0 0 0 BMsRU
swp2 9000 730 0 0 0 1088 0 0 0 BMsRU
swp3 9000 725 0 0 0 1089 0 0 0 BMsRU
swp49 9216 2807 0 0 0 2691 0 0 0 BMsRU
swp50 9216 2165 0 0 0 2337 0 0 0 BMsRU
swp51 9216 685 0 0 0 690 0 0 0 BMRU
swp52 9216 703 0 0 0 722 0 0 0 BMRU
swp53 9216 738 0 0 0 710 0 0 0 BMRU
swp54 9216 682 0 0 0 730 0 0 0 BMRU
vlan10 9216 108 0 20 0 91 0 0 0 BMRU
vlan10-v0 9216 63 0 20 0 45 0 0 0 BMRU
vlan20 9216 104 0 20 0 88 0 0 0 BMRU
vlan20-v0 9216 58 0 20 0 44 0 0 0 BMRU
vlan30 9216 112 0 20 0 94 0 0 0 BMRU
vlan30-v0 9216 61 0 20 0 44 0 0 0 BMRU
vlan4024_l3 9216 1 0 0 0 82 0 0 0 BMRU
vlan4024_l3-v0 9216 0 0 0 0 36 0 0 0 BMRU
vlan4036_l3 9216 1 0 0 0 85 0 0 0 BMRU
vlan4036_l3-v0 9216 0 0 0 0 37 0 0 0 BMRU
vxlan48 9216 45 0 0 0 21 0 0 0 BMRU
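When the table is long, it can help to filter it for interfaces that report errors or drops. The following Python sketch is one way to do that, assuming the output has been captured as text and that every row is whitespace-separated with the same columns as the header (as in the sample above). The function names parse_counters and interfaces_with_problems are hypothetical helpers, not part of NVUE, and the embedded sample is a shortened copy of the output above.

```python
SAMPLE = """\
Interface MTU RX_OK RX_ERR RX_DRP RX_OVR TX_OK TX_ERR TX_DRP TX_OVR Flg
-------------- ----- ----- ------ ------ ------ ----- ------ ------ ------ -----
BLUE 65575 1 0 0 0 0 0 4 0 OmRU
br_default 9216 360 0 10 0 475 0 0 0 BMRU
swp1 9000 721 0 0 0 1091 0 0 0 BMsRU
vlan10 9216 108 0 20 0 91 0 0 0 BMRU
"""


def parse_counters(text):
    """Return {interface: {column: int}} from the whitespace-separated table."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip the dashed separator row and anything that is not a full row.
        if len(fields) != len(header) or set(fields[0]) == {"-"}:
            continue
        # Numeric columns sit between the interface name and the flags.
        rows[fields[0]] = {
            col: int(val) for col, val in zip(header[1:-1], fields[1:-1])
        }
    return rows


def interfaces_with_problems(rows):
    """Sorted names of interfaces with a non-zero error or drop counter."""
    watch = ("RX_ERR", "RX_DRP", "TX_ERR", "TX_DRP")
    return sorted(name for name, c in rows.items() if any(c[k] for k in watch))


rows = parse_counters(SAMPLE)
print(interfaces_with_problems(rows))
```

With the shortened sample, the filter reports BLUE (TX_DRP 4), br_default (RX_DRP 10), and vlan10 (RX_DRP 20).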
The following example shows all statistics for swp1:
cumulus@switch$ nv show interface swp1 counters
operational applied
------------------- ----------- -------
carrier-transitions 4
Detailed Counters
====================
Counter Receive Transmit
----------------- ------- --------
Broadcast Packets 0 0
Multicast Packets 0 0
Total Octets 0 0
Total Packets 0 0
Unicast Packets 0 0
Drop Counters
================
Counter Receive Transmit
--------------- ------- --------
ACL Drops 0 n/a
Buffer Drops 0 n/a
Non-Queue Drops n/a 0
Queue Drops n/a 0
Total Drops 0 0
Error Counters
=================
Counter Receive Transmit
---------------- ------- --------
Alignment Errors 0 n/a
FCS Errors 0 n/a
Jabber Errors 0 n/a
Length Errors 0 n/a
Oversize Errors 0 n/a
Symbol Errors 0 n/a
Total Errors 0 0
Undersize Errors 0 n/a
Packet Size Statistics
=========================
Counter Receive Transmit
---------- ------- --------
64 0 0
65-127 0 0
128-255 0 0
256-511 0 0
512-1023 0 0
1024-1518 0 0
1519-2047 0 0
2048-4095 0 0
4096-16383 0 0
Ingress Buffer Statistics
============================
priority-group rx-frames rx-buffer-discards rx-shared-buffer-discards
-------------- --------- ------------------ -------------------------
0 0 0 Bytes 0 Bytes
1 0 0 Bytes 0 Bytes
2 0 0 Bytes 0 Bytes
3 0 0 Bytes 0 Bytes
4 0 0 Bytes 0 Bytes
5 0 0 Bytes 0 Bytes
6 0 0 Bytes 0 Bytes
7 0 0 Bytes 0 Bytes
...
The following example shows the error counters for swp1:
cumulus@switch$ nv show interface swp1 counters errors
Counter Receive Transmit
---------------- ------- --------
Alignment Errors 0 n/a
FCS Errors 0 n/a
Jabber Errors 0 n/a
Length Errors 0 n/a
Oversize Errors 0 n/a
Symbol Errors 0 n/a
Total Errors 0 0
Undersize Errors 0 n/a
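- NVUE does not show detailed statistics for logical interfaces, such as bonds, VLAN interfaces, or subinterfaces. To see basic statistics for logical interfaces, run the nv show interface <interface> link stats command.
- On NVIDIA Spectrum switches, Cumulus Linux updates physical counters to the kernel every two seconds and virtual interfaces (such as VLAN interfaces) every ten seconds. You cannot change these values. Because the update process has a lower priority than other switchd processes, the interval might be longer when the system is under heavy load.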
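Because the kernel counters refresh only every two seconds for physical ports (ten seconds for virtual interfaces), a rate computed from two readings is only meaningful over an interval comfortably longer than that. The following Python sketch shows the calculation, assuming you have already collected two snapshots as dictionaries. The helper name counter_rates is hypothetical; the first snapshot reuses the in-pkts and out-pkts values from the swp1 example, and the second snapshot's values are invented for illustration.

```python
def counter_rates(before, after, seconds):
    """Per-second rate of change for each counter present in both snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {k: (after[k] - before[k]) / seconds for k in before if k in after}


# Hypothetical snapshots taken 10 seconds apart (well above the 2 s refresh).
first = {"in-pkts": 5, "out-pkts": 140188}
second = {"in-pkts": 25, "out-pkts": 140688}

rates = counter_rates(first, second, 10)
print(rates)
```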
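AmBER PHY Health Management
To show physical layer information, such as the error counters for each lane on a port, run the nv show interface <interface> link phy-detail command. This command highlights link integrity issues.
The effective-ber value in the command output is the uncorrectable bit error rate, which is the same as uncorrected FEC errors.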
cumulus@switch$ nv show interface swp1 link phy-detail
operational
------------------------- -----------------
time-since-last-clear-min 324
phy-received-bits 15561574400000000
symbol-errors 0
effective-errors 0
phy-raw-errors-lane0 747567424
phy-raw-errors-lane1 215603747
phy-raw-errors-lane2 158456437
phy-raw-errors-lane3 30578923
phy-raw-errors-lane4 121708834
phy-raw-errors-lane5 29244642
phy-raw-errors-lane6 79102523
phy-raw-errors-lane7 96656135
raw-ber 1E-7
symbol-ber 15E-255
effective-ber 15E-255
raw-ber-lane0 3E-6
raw-ber-lane1 9E-7
raw-ber-lane2 6E-7
raw-ber-lane3 1E-7
raw-ber-lane4 5E-7
raw-ber-lane5 1E-7
raw-ber-lane6 3E-7
raw-ber-lane7 4E-7
rs-num-corr-err-bin0 757956054591
rs-num-corr-err-bin1 598244758
rs-num-corr-err-bin2 807002
rs-num-corr-err-bin3 3371
rs-num-corr-err-bin4 180
rs-num-corr-err-bin5 1
rs-num-corr-err-bin6 0
rs-num-corr-err-bin7 0
rs-num-corr-err-bin8 1
rs-num-corr-err-bin9 0
rs-num-corr-err-bin10 0
rs-num-corr-err-bin11 0
rs-num-corr-err-bin12 0
rs-num-corr-err-bin13 0
rs-num-corr-err-bin14 0
rs-num-corr-err-bin15 0
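As a rough cross-check of the sample output above, you can sum the per-lane raw error counts and divide by phy-received-bits. The Python sketch below does only that arithmetic on the printed values. It assumes that the aggregate raw-ber is this ratio rounded to one significant digit, which the output itself does not state, and it makes no claim about how the per-lane raw-ber values are derived.

```python
# Values copied from the phy-detail sample output above.
received_bits = 15561574400000000
lane_errors = [
    747567424, 215603747, 158456437, 30578923,
    121708834, 29244642, 79102523, 96656135,
]


def raw_ber(errors, bits):
    """Bit error ratio: total errored bits divided by total received bits."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    return sum(errors) / bits


ber = raw_ber(lane_errors, received_bits)
print(f"{ber:.1e}")
```

The result is about 9.5E-8, which is consistent with the reported raw-ber of 1E-7.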
To show physical layer diagnostic information for a port, run the nv show interface <interface> link phy-diag command:
cumulus@switch$ nv show interface swp20 link phy-diag
operational
-------------------------------- -----------
pd-fsm-state 0x7
eth-an-fsm-state 0x6
phy-hst-fsm-state 0x8
psi-fsm-state 0x0
phy-manager-link-enabled 0x9bff0
core-to-phy-link-enabled 0x9b800
cable-proto-cap-ext 0x0
loopback-mode 0x0
retran-mode-request 0x0
retran-mode-active 0x0
fec-mode-request 0x1
profile-fec-in-use 0x4
pd-link-enabled 0x80000
phy-hst-link-enabled 0x80000
eth-an-link-enabled 0x0
phy-manager-state 0x3
eth-proto-admin 0x0
ext-eth-proto-admin 0x0
eth-proto-capability 0x0
ext-eth-proto-capability 0x0
data-rate-oper 0x0
an-status 0x0
an-disable-admin 0x0
proto-mask 0x2
module-info-ext 0x0
ethernet-compliance-code 0x1c
ext-ethernet-compliance-code 0x32
memory-map-rev 0x40
linear-direct-drive 0x0
cable-breakout 0x0
cable-rx-amp 0x1
cable-rx-pre-emphasis 0x0
cable-rx-post-emphasis 0x0
cable-tx-equalization 0x0
cable-attenuation-53g 0x0
cable-attenuation-25g 0x0
cable-attenuation-12g 0x0
cable-attenuation-7g 0x0
cable-attenuation-5g 0x0
tx-input-freq-sync 0x0
tx-cdr-state 0xff
rx-cdr-state 0xff
module-fw-version 0x2e820043
module-st 0x3
dp-st-lane0 0x4
dp-st-lane1 0x4
dp-st-lane2 0x4
dp-st-lane3 0x4
dp-st-lane4 0x4
dp-st-lane5 0x4
dp-st-lane6 0x4
dp-st-lane7 0x4
rx-output-valid 0x0
rx-power-type 0x1
active-set-host-compliance-code 0x52
active-set-media-compliance-code 0x1c
error-code-response 0x0
temp-flags 0x0
vcc-flags 0x0
mod-fw-fault 0x0
dp-fw-fault 0x0
rx-los-cap 0x0
tx-fault 0x0
tx-los 0x0
tx-cdr-lol 0x0
tx-ad-eq-fault 0x0
rx-los 0x0
rx-cdr-lol 0x0
rx-output-valid-change 0x0
flag-in-use 0x0
Switches with the Spectrum 1 ASIC do not support the nv show interface <interface> link phy-detail command or the nv show interface <interface> link phy-diag command.
Clear Interface Counters
To clear the counters (statistics) for all interfaces, run the nv action clear interface counters command:
cumulus@switch$ nv action clear interface counters
all interface counters cleared
Action succeeded
To clear the counters for a specific interface, run the nv action clear interface <interface> counters command:
cumulus@switch$ nv action clear interface swp1 counters
swp1 counters cleared
Action succeeded
The nv action clear interface <interface> counters command does not clear counters in the hardware.
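Reset a Transceiver
NVUE provides a command to reset a specific transceiver to its initial stable state without having to be physically present in the data center to pull the transceiver.
The following example resets the transceiver in swp1: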
cumulus@switch:~$ nv action reset platform transceiver swp1
Action executing ...
Resetting module swp1 ... OK
Action succeeded
The following example resets a range of transceivers:
cumulus@switch:~$ nv action reset platform transceiver swp1-3
Action executing ...
Resetting module swp1 ... OK
Action executing ...
Resetting module swp2 ... OK
Action executing ...
Resetting module swp3 ... OK
Action succeeded
The following example resets the transceivers in swp5 and swp7:
cumulus@switch:~$ nv action reset platform transceiver swp5,swp7
Action executing ...
Resetting module swp5 ... OK
Action executing ...
Resetting module swp7 ... OK
Action executing ...
Action succeeded
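The examples above use two forms of port specification: a range (swp1-3) and a comma-separated list (swp5,swp7). To make explicit which ports each form covers, the following Python sketch expands both into individual port names. The function expand_ports is a hypothetical helper that handles only these two forms; NVUE's own parser may accept others.

```python
import re


def expand_ports(spec):
    """Expand 'swp1-3' or 'swp5,swp7' into a list of individual port names."""
    ports = []
    for part in spec.split(","):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)(?:-(\d+))?", part.strip())
        if not match:
            raise ValueError(f"unrecognized port spec: {part!r}")
        prefix, start, end = match.group(1), int(match.group(2)), match.group(3)
        stop = int(end) if end is not None else start
        if stop < start:
            raise ValueError(f"descending range: {part!r}")
        ports.extend(f"{prefix}{n}" for n in range(start, stop + 1))
    return ports


print(expand_ports("swp1-3"))
print(expand_ports("swp5,swp7"))
```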
When the reset completes successfully, you see a syslog message similar to the following:
2024-12-06T07:12:37.996339+00:00 cumulus nvue-port-reset: The module reset was successfully completed on swp1
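If you collect syslog centrally, a message of this shape can be matched to confirm which port was reset. The Python sketch below assumes the message format is exactly as in the sample line above (timestamp, hostname, the nv​ue-port-reset tag, then the fixed sentence); the pattern and the function parse_reset_message are hypothetical and would need adjusting if your syslog format differs.

```python
import re

# Pattern modeled on the single sample message shown above.
RESET_OK = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) nvue-port-reset: "
    r"The module reset was successfully completed on (?P<port>\S+)$"
)


def parse_reset_message(line):
    """Return timestamp, host, and port for a reset message, else None."""
    match = RESET_OK.match(line.strip())
    return match.groupdict() if match else None


line = (
    "2024-12-06T07:12:37.996339+00:00 cumulus nvue-port-reset: "
    "The module reset was successfully completed on swp1"
)
info = parse_reset_message(line)
print(info["host"], info["port"])
```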
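- To reset a transceiver on a breakout port, specify the parent port; for example, to reset swp1s0, run the nv action reset platform transceiver swp1 command.
- If a cable is faulty, the nv action reset platform transceiver <transceiver-id> command completes successfully, but the transceiver details do not show until you resolve the issue or, if necessary, restart the system.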
Show Transceiver Information
To show the identifier, vendor name, part number, serial number, and revision for all modules, run the nv show platform transceiver command:
cumulus@switch:~$ nv show platform transceiver
Transceiver Identifier Vendor name Vendor PN Vendor SN Vendor revision
----------- ---------- ----------- ---------------- ------------- ---------------
swp1 QSFP28 Mellanox MCP1600-C001E30N MT2039VB01185 A3
swp10 QSFP28 Mellanox MCP1600-C001E30N MT2211VS01792 A3
swp11 QSFP28 Mellanox MCP1600-C001E30N MT2211VS01792 A3
swp12 QSFP28 Mellanox MCP1650-V00AE30 MT2122VB02220 A2
swp13 QSFP28 Mellanox MCP1650-V00AE30 MT2122VB02220 A2
swp14 QSFP-DD Mellanox MCP1660-W00AE30 MT2121VS01645 A3
swp15 QSFP-DD Mellanox MCP1660-W00AE30 MT2121VS01645 A3
swp18 QSFP28 Mellanox MCP1600-C001E30N MT2211VS01967 A3
swp20 QSFP28 Mellanox MFA1A00-C003 MT2108FT02204 B2
swp21 QSFP28 Mellanox MFA1A00-C003 MT2108FT02204 B2
swp22 QSFP28 Mellanox MFA1A00-C003 MT2108FT02194 B2
swp23 QSFP28 Mellanox MFA1A00-C003 MT2108FT02194 B2
swp31 QSFP28 Mellanox MCP1600-C001E30N MT2039VB01191 A3
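To show a detailed view of module information for all ports, including cable length, cable type, diagnostics, current status, and error status, run the nv show platform transceiver details command.
To show hardware capabilities and measurement information for the module in a specific port, run the nv show platform transceiver <interface> command: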
cumulus@switch:~$ nv show platform transceiver swp2
cable-type : Active cable
cable-length : 3m
supported-cable-length : 0m om1, 0m om2, 0m om3, 3m om4, 0m om5
diagnostics-status : Diagnostic Data Available
status : plugged_enabled
error-status : N/A
vendor-date-code : 210215__
identifier : QSFP28
vendor-rev : B2
vendor-name : Mellanox
vendor-pn : MFA1A00-C003
vendor-sn : MT2108FT02204
temperature:
temperature : 42.56 C
high-alarm-threshold : 80.00 C
low-alarm-threshold : -10.00 C
high-warning-threshold: 70.00 C
low-warning-threshold : 0.00 C
alarm : Off
voltage:
voltage : 3.2862 V
high-alarm-threshold : 3.5000 V
low-alarm-threshold : 3.1000 V
high-warning-threshold: 3.4650 V
low-warning-threshold : 3.1350 V
alarm : Off
channel:
channel-1:
rx-power:
power : 0.8625 mW / -0.64 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -13.31 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -10.30 dBm
alarm : Off
tx-power:
power : 0.8988 mW / -0.46 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -11.40 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -8.40 dBm
alarm : Off
tx-bias-current:
current : 6.750 mA
high-alarm-threshold : 8.500 mA
low-alarm-threshold : 5.492 mA
high-warning-threshold: 8.000 mA
low-warning-threshold : 6.000 mA
alarm : Off
...
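- The nv show platform transceiver commands show information only for front panel physical ports, such as swp1. These commands do not show information for logical ports, such as SVIs, bonds, or eth0.
- To show information for a subinterface, run the nv show interface <subinterface> transceiver command.
You can also run the nv show interface <interface> transceiver command to show transceiver data in a more condensed format: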
cumulus@switch:~$ nv show interface swp1 transceiver
cable-type : Active cable
cable-length : 3m
supported-cable-length : 0m om1, 0m om2, 0m om3, 3m om4, 0m om5
diagnostics-status : Diagnostic Data Available
status : plugged_enabled
error-status : N/A
revision-compliance : SFF-8636 Rev 2.5/2.6/2.7
vendor-date-code : 210215__
identifier : QSFP28
vendor-rev : B2
vendor-oui : 00:02:c9
vendor-name : Mellanox
vendor-pn : MFA1A00-C003
vendor-sn : MT2108FT02204
temperature : 42.56 degrees C / 108.61 degrees F
voltage : 3.2888 V
ch-1-rx-power : 0.8625 mW / -0.64 dBm
ch-1-tx-power : 0.8988 mW / -0.46 dBm
ch-1-tx-bias-current : 6.750 mA
ch-2-rx-power : 0.8385 mW / -0.76 dBm
ch-2-tx-power : 0.9154 mW / -0.38 dBm
ch-2-tx-bias-current : 6.750 mA
ch-3-rx-power : 0.8556 mW / -0.68 dBm
ch-3-tx-power : 0.9537 mW / -0.21 dBm
ch-3-tx-bias-current : 6.750 mA
ch-4-rx-power : 0.8576 mW / -0.67 dBm
ch-4-tx-power : 0.9695 mW / -0.13 dBm
ch-4-tx-bias-current : 6.750 mA
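The output above reports each optical power reading in both mW and dBm, and the temperature in both Celsius and Fahrenheit. The two pairs are related by the standard conversions dBm = 10 * log10(P / 1 mW) and F = C * 9/5 + 32. The Python sketch below reproduces the printed values for channel 1 and for the temperature; the function names are hypothetical.

```python
import math


def mw_to_dbm(milliwatts):
    """Convert optical power in milliwatts to dBm (decibels relative to 1 mW)."""
    if milliwatts <= 0:
        raise ValueError("power must be positive to express it in dBm")
    return 10 * math.log10(milliwatts)


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Readings taken from the sample output above.
print(round(mw_to_dbm(0.8625), 2))              # ch-1-rx-power
print(round(mw_to_dbm(0.8988), 2))              # ch-1-tx-power
print(round(celsius_to_fahrenheit(42.56), 2))   # temperature
```

Rounded to two decimal places, these give -0.64 dBm, -0.46 dBm, and 108.61 degrees F, matching the values in the sample.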
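To show channel information for the module in a specific port, run the nv show platform transceiver <interface> channel command. To show information for a specific channel of the module in a specific port, run the nv show platform transceiver <interface> channel <channel> command.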
cumulus@switch:~$ nv show platform transceiver swp25 channel
channel:
channel-1:
rx-power:
power : 0.8625 mW / -0.64 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -13.31 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -10.30 dBm
alarm : Off
tx-power:
power : 0.8988 mW / -0.46 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -11.40 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -8.40 dBm
alarm : Off
tx-bias-current:
current : 6.750 mA
high-alarm-threshold : 8.500 mA
low-alarm-threshold : 5.492 mA
high-warning-threshold: 8.000 mA
low-warning-threshold : 6.000 mA
alarm : Off
channel-2:
rx-power:
power : 0.8385 mW / -0.76 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -13.31 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -10.30 dBm
alarm : Off
tx-power:
power : 0.9154 mW / -0.38 dBm
high-alarm-threshold : 5.40 dBm
low-alarm-threshold : -11.40 dBm
high-warning-threshold: 2.40 dBm
low-warning-threshold : -8.40 dBm
alarm : Off
...
To show the thresholds for the module in a specific interface, run the nv show interface <interface> transceiver thresholds command:
cumulus@switch:~$ nv show interface swp3 transceiver thresholds
Ch Value High Alarm High Warn Low Warn Low Alarm Alt Value
Threshold Threshold Threshold Threshold
------------------------------------------------------------------------------------------------------------------------
temperature - 42.74 C 80.00 C 70.00 C 0.00 C -10.00 C 108.94F
voltage - 3.2862 V 3.5000 V 3.4650 V 3.1350 V 3.1000 V
rx-power 1 -0.64 dBm 5.40 dBm 2.40 dBm -10.30 dBm -13.31 dBm 0.8625 mW
2 -0.70 dBm 5.40 dBm 2.40 dBm -10.30 dBm -13.31 dBm 0.8514 mW
3 -0.68 dBm 5.40 dBm 2.40 dBm -10.30 dBm -13.31 dBm 0.8556 mW
4 -0.60 dBm 5.40 dBm 2.40 dBm -10.30 dBm -13.31 dBm 0.8704 mW
tx-power 1 -0.48 dBm 5.40 dBm 2.40 dBm -8.40 dBm -11.40 dBm 0.8963 mW
2 -0.38 dBm 5.40 dBm 2.40 dBm -8.40 dBm -11.40 dBm 0.9154 mW
3 -0.19 dBm 5.40 dBm 2.40 dBm -8.40 dBm -11.40 dBm 0.9562 mW
4 -0.13 dBm 5.40 dBm 2.40 dBm -8.40 dBm -11.40 dBm 0.9695 mW
tx-bias-current 1 6.750 mA 8.500 mA 8.000 mA 6.000 mA 5.492 mA
2 6.750 mA 8.500 mA 8.000 mA 6.000 mA 5.492 mA
3 6.750 mA 8.500 mA 8.000 mA 6.000 mA 5.492 mA
4 6.750 mA 8.500 mA 8.000 mA 6.000 mA 5.492 mA
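To read a row of this table, compare the value against the four thresholds: outside the alarm thresholds is the most severe condition, between a warning and an alarm threshold is a warning, and anything inside the warning band is normal. The Python sketch below encodes that comparison. The function classify is a hypothetical helper, and treating a reading exactly equal to a threshold as crossing it is an assumption; the module itself decides when to raise its alarm flag.

```python
def classify(value, high_alarm, high_warn, low_warn, low_alarm):
    """Classify a reading against its thresholds (all in the same unit)."""
    # Assumption: a reading equal to a threshold counts as crossing it.
    if value >= high_alarm or value <= low_alarm:
        return "alarm"
    if value >= high_warn or value <= low_warn:
        return "warning"
    return "ok"


# rx-power, channel 1, from the table above (values in dBm).
print(classify(-0.64, 5.40, 2.40, -10.30, -13.31))
# tx-bias-current, channel 1, from the table above (values in mA).
print(classify(6.750, 8.500, 8.000, 6.000, 5.492))
```

Both sample readings fall inside their warning bands, so each is classified as ok.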