Hello,
My monitoring setup consists of 24 headless collectors streaming to 1 parent. SSL is enabled using a self-signed certificate.
Everything is deployed with Ansible.
Yesterday I tried to upgrade all my nodes to v1.37.1 which resulted in a complete failure.
Since I had at least 2 different install types amongst my nodes, I decided to uninstall everything, and then deploy again so I could have deb packages installed everywhere.
This solved all my previous issues, but a new one came up:
I now have 2 nodes that are unable to stream to the parent, with the following logs:
2022-12-15 14:21:10: netdata INFO : STREAM_SENDER[child.node] : STREAM child.node: attempting to connect to 'tcp:parent.node:19998' (default port: 19999)...
2022-12-15 14:21:10: netdata INFO : STREAM_SENDER[child.node] : STREAM child.node [send to tcp:parent.node:19998]: initializing communication...
2022-12-15 14:21:10: netdata INFO : STREAM_SENDER[child.node] : STREAM child.node [send to tcp:parent.node:19998]: waiting response from remote netdata...
2022-12-15 14:21:10: netdata INFO : STREAM_SENDER[child.node] : STREAM child.node [send to tcp:parent.node:19998]: established link with negotiated capabilities: VCAPS HLABELS CLAIM CLABELS FUNCTIONS REPLICATION BINARY
2022-12-15 14:21:10: netdata ERROR : STREAM_SENDER[child.node] : Clearing stream_collected_metrics flag in charts of host child.node
2022-12-15 14:21:10: netdata INFO : STREAM_SENDER[child.node] : STREAM child.node [send to tcp:parent.node:19998]: enabling metrics streaming...
2022-12-15 14:21:10: netdata ERROR : STREAM_SENDER[child.node] : SSL_read() returned -1 bytes, SSL error 2 (errno 11, Resource temporarily unavailable)
2022-12-15 14:21:10: netdata ERROR : STREAM_SENDER[child.node] : Clearing stream_collected_metrics flag in charts of host child.node
2022-12-15 14:21:10: netdata ERROR : STREAM_SENDER[child.node] : Clearing stream_collected_metrics flag in charts of host child.node
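For anyone reading the error above, here is a small decoder I sketched (Python 3, my own helper, not part of Netdata) that turns the numbers in the SSL_read() line into names. SSL error 2 is OpenSSL's SSL_ERROR_WANT_READ and errno 11 is EAGAIN on Linux. As far as I understand, that combination means a non-blocking read simply had no data available yet, rather than a certificate failure, and the log shows it happening right after the link was established. That interpretation is my assumption, not something I have confirmed in the Netdata code.

```python
import errno
import re
import ssl

# Matches the SSL_read() error format seen in the child's netdata error log.
SSL_READ_RE = re.compile(
    r"SSL_read\(\) returned (-?\d+) bytes, SSL error (\d+) \(errno (\d+), ([^)]*)\)"
)


def decode_ssl_read_error(line):
    """Return the numeric fields of an SSL_read() log line with symbolic names, or None."""
    m = SSL_READ_RE.search(line)
    if m is None:
        return None
    returned, ssl_code, errno_code = (int(g) for g in m.group(1, 2, 3))
    try:
        # OpenSSL SSL_get_error() codes, as exposed by Python's ssl module.
        ssl_name = ssl.SSLErrorNumber(ssl_code).name
    except ValueError:
        ssl_name = f"unknown ({ssl_code})"
    return {
        "returned": returned,
        "ssl_error": ssl_name,
        "errno": errno_code,
        # errno numbering is platform specific; 11 is EAGAIN (alias EWOULDBLOCK) on Linux.
        "errno_name": errno.errorcode.get(errno_code, "unknown"),
        "message": m.group(4),
    }


line = ("2022-12-15 14:21:10: netdata ERROR : STREAM_SENDER[child.node] : "
        "SSL_read() returned -1 bytes, SSL error 2 (errno 11, Resource temporarily unavailable)")
print(decode_ssl_read_error(line))
```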
$ netdata -W buildinfo
Version: netdata v1.37.1
Configure options: '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--disable-dependency-tracking' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Install type: binpkg-deb
Binary architecture: x86_64
Packaging distro:
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK: YES
TLS Host Verification: YES
Machine Learning: YES
Stream Compression: NO
Libraries:
protobuf: YES (system)
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: YES
EBPF: YES
IPMI: YES
NFACCT: YES
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: YES
Debug/Developer Features:
Trace Allocations: NO
The 2 problematic servers are both part of a replicated group of servers in my infrastructure. Their counterparts work flawlessly with the very same installation procedure (Ansible playbook), configuration files, OS, hardware…
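Since the failing nodes and their working counterparts should be identical, one way to compare them outside of Netdata might be a plain TLS handshake against the parent from each node. Below is a minimal sketch (Python 3 standard library only; the function names are my own, hypothetical helpers). It assumes the parent listens on parent.node:19998 as in the logs, and that verification is either done against a local copy of the self-signed certificate (the `cafile` argument) or skipped entirely.

```python
import socket
import ssl


def make_context(cafile=None):
    """Build a TLS client context: verify against cafile if given, else skip verification."""
    if cafile:
        # Trust only the given (self-signed) certificate; hostname checking stays on.
        ctx = ssl.create_default_context(cafile=cafile)
    else:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(host, port, cafile=None, timeout=5.0):
    """Open a TLS connection and return the negotiated protocol version and cipher name."""
    ctx = make_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling `probe("parent.node", 19998)` on a working child and on a failing one and comparing the results would at least show whether the TLS layer itself behaves differently between the two.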
Any help troubleshooting this is welcome and I’m obviously happy to provide more information if needed.
Thanks