It seems that whenever I disconnect from an OpenVPN connection (via systemd unit), the Netdata agent stops showing up as alive in the cloud dashboard. The local dashboard works fine.
The local dashboard also tells me that the agent is not currently connected to the cloud.
Interestingly, reconnecting the VPN does not resolve the issue, but restarting the Netdata service does.
The error.log does not show any activity when the issue occurs either; perhaps something important has fallen over?
When the issue happened, I used netcat to check that port 443 was reachable on “api.netdata.cloud” and “mqtt.netdata.cloud”, and both came back as live.
(The issue reproduces every time, BTW.)
EDIT: I ran metric correlation against the time frame of the issue and saw something interesting. Right before the service restart I see IPv6 bandwidth being used and not IPv4, but right at and after the issue I see it has switched to IPv4 only.
It could be a red herring, but perhaps the agent is not handling a switch between IP protocol versions at run time?
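To narrow down the IPv4/IPv6 theory, the netcat check could be repeated separately for each address family. Below is a minimal sketch of such a check in Python, assuming only the two hostnames and port 443 mentioned above. The `reachable` helper is a hypothetical name for illustration and not part of Netdata. It only tests whether a TCP connection can be opened, so it says nothing about whether the agent's own cloud session is healthy.

```python
import socket

# Hosts and port taken from the netcat check described above.
HOSTS = ["api.netdata.cloud", "mqtt.netdata.cloud"]
PORT = 443


def reachable(host, port, family, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over `family`.

    `family` is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No address of this family could be resolved (e.g. no AAAA record).
        return False
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or no route; try the next address.
            continue
    return False


if __name__ == "__main__":
    for host in HOSTS:
        for name, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
            state = "reachable" if reachable(host, PORT, family) else "unreachable"
            print(f"{host}:{PORT} over {name}: {state}")
```

Running this once with the VPN up and once after disconnecting would show whether one address family stops being usable at the moment the agent drops off the cloud.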
EDIT 2: I have reproduced the same behaviour when switching from a non-VPN connection to a VPN connection.
Environment:
Ubuntu 22.04.1 LTS (arm64)
Edge Version 108.0.1462.46 (Official build) (64-bit)
Netdata agent:
Version: netdata v1.37.0-40-nightly
Configure options: '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--with-bundled-protobuf' 'CFLAGS=-O2 -pipe' 'LDFLAGS='
Install type: kickstart-build
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK: YES
TLS Host Verification: YES
Machine Learning: YES
Stream Compression: YES
Libraries:
protobuf: YES (bundled)
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: NO
EBPF: NO
IPMI: NO
NFACCT: NO
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: NO
Debug/Developer Features:
Trace Allocations: NO
What do you guys need to troubleshoot this?
I’m using the standard systemd unit file for OpenVPN on Ubuntu (can provide that if needed).