Agent auto-restarting on connection loss or manual restart

Environment

Two print servers based on Xubuntu 20.04.3 LTS
netdata v1.33.1 + cups plugin
Install type: binpkg-deb

Problem/Question

I have been happily using netdata to monitor my two print servers for a year now. For the past couple of weeks, however, the agent has been restarting automatically on both print servers. It happens when the provider resets the internet connection in the morning around 5 o’clock, and sometimes when I restart netdata.service. It then takes about half an hour until the service runs stably and the cloud receives the metrics. This can also be seen on both nodes under http://localhost:19999


[Screenshot: 2022-02-18_09-08]

This also triggers some kind of flood protection in the cloud, which seems to block all alerts fired on the node. That breaks my monitoring workflow for the printers at the moment.

2022-02-17 11:43:04: alarm-notify.sh: FATAL: All notification methods are disabled. Not sending notification for host 'tgprint-h1', chart 'cups.job_num' to 'root' for 'hang_pending_print_jobs_hall1' = '1' for status 'CRITICAL'.
2022-02-17 11:44:24: alarm-notify.sh: FATAL: All notification methods are disabled. Not sending notification for host 'tgprint-h1', chart 'cups.job_num' to 'root' for 'hang_processing_print_jobs_hall1' = '1' for status 'CRITICAL'.

The error.log is flooded with errors, so I reduced it to the “aclk” messages (from the first node):

2022-02-18 05:53:27: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:53:27: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:53:27: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:53:27: netdata INFO  : ACLK_Main : thread created with task id 808782
2022-02-18 05:53:27: netdata INFO  : ACLK_Main : set name of thread 808782 to ACLK_Main
2022-02-18 05:53:27: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:53:32: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:53:32: netdata INFO  : ACLK_Stats : thread created with task id 809093
2022-02-18 05:53:32: netdata INFO  : ACLK_Stats : set name of thread 809093 to ACLK_Stats
2022-02-18 05:53:32: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:53:32: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:53:32: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:53:32: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:53:33: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:53:33: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 05:53:34: netdata INFO  : ACLK_Query_0 : thread created with task id 809103
2022-02-18 05:53:34: netdata INFO  : ACLK_Query_0 : set name of thread 809103 to ACLK_Query_0
2022-02-18 05:53:34: netdata INFO  : ACLK_Query_1 : thread created with task id 809104
2022-02-18 05:53:34: netdata INFO  : ACLK_Query_1 : set name of thread 809104 to ACLK_Query_1
2022-02-18 05:53:34: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 05:54:12: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:54:12: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:54:12: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:54:12: netdata INFO  : ACLK_Main : thread created with task id 809374
2022-02-18 05:54:12: netdata INFO  : ACLK_Main : set name of thread 809374 to ACLK_Main
2022-02-18 05:54:12: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:54:17: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:54:56: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:54:57: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:54:57: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:54:57: netdata INFO  : ACLK_Main : thread created with task id 809932
2022-02-18 05:54:57: netdata INFO  : ACLK_Main : set name of thread 809932 to ACLK_Main
2022-02-18 05:54:57: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:55:02: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:55:02: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:55:02: netdata INFO  : ACLK_Stats : thread created with task id 810261
2022-02-18 05:55:02: netdata INFO  : ACLK_Stats : set name of thread 810261 to ACLK_Stats
2022-02-18 05:55:02: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:55:02: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:55:02: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:55:40: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:55:41: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:55:41: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:55:41: netdata INFO  : ACLK_Main : thread created with task id 810519
2022-02-18 05:55:41: netdata INFO  : ACLK_Main : set name of thread 810519 to ACLK_Main
2022-02-18 05:55:41: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:55:46: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:56:25: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:56:25: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:56:25: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:56:25: netdata INFO  : ACLK_Main : thread created with task id 811122
2022-02-18 05:56:25: netdata INFO  : ACLK_Main : set name of thread 811122 to ACLK_Main
2022-02-18 05:56:25: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:56:30: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:56:30: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:56:30: netdata INFO  : ACLK_Stats : thread created with task id 811444
2022-02-18 05:56:30: netdata INFO  : ACLK_Stats : set name of thread 811444 to ACLK_Stats
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 05:56:31: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 05:56:32: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 05:56:32: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 05:56:32: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 05:56:32: netdata INFO  : ACLK_Query_0 : thread created with task id 811453
2022-02-18 05:56:32: netdata INFO  : ACLK_Query_1 : thread created with task id 811454
2022-02-18 05:56:32: netdata INFO  : ACLK_Query_1 : set name of thread 811454 to ACLK_Query_1
2022-02-18 05:56:32: netdata INFO  : ACLK_Query_0 : set name of thread 811453 to ACLK_Query_0
2022-02-18 05:56:32: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 05:57:08: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:57:09: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:57:09: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:57:09: netdata INFO  : ACLK_Main : thread created with task id 811711
2022-02-18 05:57:09: netdata INFO  : ACLK_Main : set name of thread 811711 to ACLK_Main
2022-02-18 05:57:09: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:57:14: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:57:14: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:57:14: netdata INFO  : ACLK_Stats : thread created with task id 812030
2022-02-18 05:57:14: netdata INFO  : ACLK_Stats : set name of thread 812030 to ACLK_Stats
2022-02-18 05:57:14: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:57:14: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:57:14: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:57:15: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:57:15: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 05:57:15: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 05:57:15: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 05:57:15: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 05:57:16: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 05:57:16: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 05:57:16: netdata INFO  : ACLK_Query_0 : thread created with task id 812055
2022-02-18 05:57:16: netdata INFO  : ACLK_Query_1 : thread created with task id 812056
2022-02-18 05:57:16: netdata INFO  : ACLK_Query_1 : set name of thread 812056 to ACLK_Query_1
2022-02-18 05:57:16: netdata INFO  : ACLK_Query_0 : set name of thread 812055 to ACLK_Query_0
2022-02-18 05:57:16: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 05:57:54: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:57:54: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:57:54: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:57:54: netdata INFO  : ACLK_Main : thread created with task id 812297
2022-02-18 05:57:54: netdata INFO  : ACLK_Main : set name of thread 812297 to ACLK_Main
2022-02-18 05:57:54: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:57:59: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:57:59: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:57:59: netdata INFO  : ACLK_Stats : thread created with task id 812608
2022-02-18 05:57:59: netdata INFO  : ACLK_Stats : set name of thread 812608 to ACLK_Stats
2022-02-18 05:58:00: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:58:00: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:58:00: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:58:00: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:58:00: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 05:58:01: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 05:58:01: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 05:58:01: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 05:58:01: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 05:58:01: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 05:58:01: netdata INFO  : ACLK_Query_0 : thread created with task id 812623
2022-02-18 05:58:01: netdata INFO  : ACLK_Query_1 : thread created with task id 812624
2022-02-18 05:58:01: netdata INFO  : ACLK_Query_1 : set name of thread 812624 to ACLK_Query_1
2022-02-18 05:58:01: netdata INFO  : ACLK_Query_0 : set name of thread 812623 to ACLK_Query_0
2022-02-18 05:58:02: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 05:58:38: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:58:38: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:58:38: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:58:38: netdata INFO  : ACLK_Main : thread created with task id 812872
2022-02-18 05:58:38: netdata INFO  : ACLK_Main : set name of thread 812872 to ACLK_Main
2022-02-18 05:58:38: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:58:43: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 05:58:43: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 05:58:43: netdata INFO  : ACLK_Stats : thread created with task id 813191
2022-02-18 05:58:43: netdata INFO  : ACLK_Stats : set name of thread 813191 to ACLK_Stats
2022-02-18 05:58:44: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:58:44: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 05:58:44: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 05:58:44: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 05:58:44: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 05:58:45: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 05:58:45: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 05:58:45: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 05:58:45: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 05:58:45: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 05:58:45: netdata INFO  : ACLK_Query_0 : thread created with task id 813200
2022-02-18 05:58:45: netdata INFO  : ACLK_Query_1 : thread created with task id 813201
2022-02-18 05:58:45: netdata INFO  : ACLK_Query_1 : set name of thread 813201 to ACLK_Query_1
2022-02-18 05:58:45: netdata INFO  : ACLK_Query_0 : set name of thread 813200 to ACLK_Query_0
2022-02-18 05:58:46: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 05:59:21: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 05:59:21: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 05:59:21: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 05:59:21: netdata INFO  : ACLK_Main : thread created with task id 813462
2022-02-18 05:59:21: netdata INFO  : ACLK_Main : set name of thread 813462 to ACLK_Main
2022-02-18 05:59:21: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 05:59:26: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:00:06: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:00:06: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:00:06: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:00:06: netdata INFO  : ACLK_Main : thread created with task id 814056
2022-02-18 06:00:06: netdata INFO  : ACLK_Main : set name of thread 814056 to ACLK_Main
2022-02-18 06:00:06: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:00:11: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:00:11: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 06:00:11: netdata INFO  : ACLK_Stats : thread created with task id 814380
2022-02-18 06:00:11: netdata INFO  : ACLK_Stats : set name of thread 814380 to ACLK_Stats
2022-02-18 06:00:11: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:00:11: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 06:00:11: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 06:00:12: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:00:12: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 06:00:12: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 06:00:12: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 06:00:13: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 06:00:13: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 06:00:13: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 06:00:13: netdata INFO  : ACLK_Query_1 : thread created with task id 814406
2022-02-18 06:00:13: netdata INFO  : ACLK_Query_1 : set name of thread 814406 to ACLK_Query_1
2022-02-18 06:00:13: netdata INFO  : ACLK_Query_0 : thread created with task id 814405
2022-02-18 06:00:13: netdata INFO  : ACLK_Query_0 : set name of thread 814405 to ACLK_Query_0
2022-02-18 06:00:13: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
.........
2022-02-18 06:28:28: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:28:29: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:28:29: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:28:29: netdata INFO  : ACLK_Main : thread created with task id 833701
2022-02-18 06:28:29: netdata INFO  : ACLK_Main : set name of thread 833701 to ACLK_Main
2022-02-18 06:28:29: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:28:34: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:28:34: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 06:28:34: netdata INFO  : ACLK_Stats : thread created with task id 834020
2022-02-18 06:28:34: netdata INFO  : ACLK_Stats : set name of thread 834020 to ACLK_Stats
2022-02-18 06:28:34: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:28:34: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 06:28:34: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 06:28:35: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:28:35: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 06:28:35: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 06:28:35: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 06:28:35: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 06:28:36: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 06:28:36: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 06:28:36: netdata INFO  : ACLK_Query_0 : thread created with task id 834029
2022-02-18 06:28:36: netdata INFO  : ACLK_Query_0 : set name of thread 834029 to ACLK_Query_0
2022-02-18 06:28:36: netdata INFO  : ACLK_Query_1 : thread created with task id 834030
2022-02-18 06:28:36: netdata INFO  : ACLK_Query_1 : set name of thread 834030 to ACLK_Query_1
2022-02-18 06:28:36: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 06:29:14: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:29:14: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:29:14: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:29:14: netdata INFO  : ACLK_Main : thread created with task id 834291
2022-02-18 06:29:14: netdata INFO  : ACLK_Main : set name of thread 834291 to ACLK_Main
2022-02-18 06:29:14: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:29:19: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:29:19: netdata INFO  : ACLK_Main : Attempting connection now
2022-02-18 06:29:19: netdata INFO  : ACLK_Stats : thread created with task id 834612
2022-02-18 06:29:19: netdata INFO  : ACLK_Stats : set name of thread 834612 to ACLK_Stats
2022-02-18 06:29:19: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:29:19: netdata INFO  : ACLK_Main : Getting Cloud /env successful
2022-02-18 06:29:19: netdata INFO  : ACLK_Main : Switching ACLK to new protobuf protocol. Due to /env response.
2022-02-18 06:29:20: netdata INFO  : ACLK_Main : HTTPS "GET" request to "app.netdata.cloud" finished with HTTP code: 200
2022-02-18 06:29:20: netdata INFO  : ACLK_Main : ACLK_OTP Got Challenge from Cloud
2022-02-18 06:29:20: netdata INFO  : ACLK_Main : HTTPS "POST" request to "app.netdata.cloud" finished with HTTP code: 201
2022-02-18 06:29:20: netdata INFO  : ACLK_Main : ACLK_OTP Got Password from Cloud
2022-02-18 06:29:21: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2022-02-18 06:29:21: netdata INFO  : ACLK_Main : ACLK connection successfully established
2022-02-18 06:29:21: netdata INFO  : ACLK_Main : Starting 2 query threads.
2022-02-18 06:29:21: netdata INFO  : ACLK_Query_0 : thread created with task id 834620
2022-02-18 06:29:21: netdata INFO  : ACLK_Query_1 : thread created with task id 834621
2022-02-18 06:29:21: netdata INFO  : ACLK_Query_0 : set name of thread 834620 to ACLK_Query_0
2022-02-18 06:29:21: netdata INFO  : ACLK_Query_1 : set name of thread 834621 to ACLK_Query_1
2022-02-18 06:29:21: netdata INFO  : ACLK_Main : Queuing status update for node=c1d2dcc4-59db-40cb-8e3b-d9f4d0ec2379, live=1, hops=0
2022-02-18 06:29:59: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:29:59: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:29:59: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:29:59: netdata INFO  : ACLK_Main : thread created with task id 834864
2022-02-18 06:29:59: netdata INFO  : ACLK_Main : set name of thread 834864 to ACLK_Main
2022-02-18 06:29:59: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:30:04: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:30:04: netdata INFO  : ACLK_Stats : thread created with task id 835207
2022-02-18 06:30:42: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:30:42: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:30:42: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:30:42: netdata INFO  : ACLK_Main : thread created with task id 835455
2022-02-18 06:30:42: netdata INFO  : ACLK_Main : set name of thread 835455 to ACLK_Main
2022-02-18 06:30:42: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:30:47: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2022-02-18 06:31:27: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes
2022-02-18 06:31:27: netdata INFO  : MAIN : SQLite aclk sync initialization
2022-02-18 06:31:27: netdata INFO  : MAIN : SQLite aclk sync initialization completed
2022-02-18 06:31:27: netdata INFO  : ACLK_Main : thread created with task id 836045
2022-02-18 06:31:27: netdata INFO  : ACLK_Main : set name of thread 836045 to ACLK_Main
2022-02-18 06:31:27: netdata INFO  : ACLK_Main : Waiting for Cloud to be enabled
2022-02-18 06:31:32: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
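
To put a number on the restarts, the gaps between agent starts can be measured from the log. The following is a minimal Python sketch, not an official Netdata tool. It assumes that each "Starting ACLK sync thread" line from MAIN marks a fresh agent start, which the log itself does not state; the function names are made up for illustration.

```python
from datetime import datetime

# Assumption: this MAIN-thread message is logged once per agent start.
START_MARKER = "Starting ACLK sync thread"


def restart_times(lines):
    """Collect the timestamp of every line containing START_MARKER."""
    times = []
    for line in lines:
        if START_MARKER in line:
            # The first 19 characters hold "YYYY-MM-DD HH:MM:SS".
            times.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return times


def restart_intervals(times):
    """Seconds between consecutive start timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Four lines copied from the excerpt above.
sample = [
    "2022-02-18 05:53:27: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes",
    "2022-02-18 05:53:34: netdata INFO  : ACLK_Main : ACLK connection successfully established",
    "2022-02-18 05:54:12: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes",
    "2022-02-18 05:54:56: netdata INFO  : MAIN : Starting ACLK sync thread for host ccec8d2e-8e72-11ec-8d3c-0bbeb1006e53 -- scratch area 786944 bytes",
]

times = restart_times(sample)
intervals = restart_intervals(times)
print(len(times), "starts, gaps in seconds:", intervals)
```

On the sample lines this reports 3 starts with gaps of 45 and 44 seconds, which matches the roughly 45-second rhythm visible in the excerpt. To check a whole log, pass an open file object instead of the sample list (the log is probably at /var/log/netdata/error.log on a package install, but that path is an assumption).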

The cups plugin throws many errors like these:

2022-02-18 08:49:32: netdata INFO  : PLUGINSD[cups] : Collector updated metadata for chart cups.job_size_HP_LaserJet_Pro_M404_M405_A57537_
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : RRDSET: chart name 'cups.job_num_HP_LaserJet_Pro_M404_M405_A59541_' on host 'tgprint-h1' already exists.
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : Collector updated metadata for chart cups.job_num_HP_LaserJet_Pro_M404_M405_A59541_
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : RRDSET: chart name 'cups.job_size_HP_LaserJet_Pro_M404_M405_A59541_' on host 'tgprint-h1' already exists.
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : Collector updated metadata for chart cups.job_size_HP_LaserJet_Pro_M404_M405_A59541_
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : RRDSET: chart name 'cups.job_num_HP_LaserJet_Pro_M404_M405_A5843E_' on host 'tgprint-h1' already exists.
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : Collector updated metadata for chart cups.job_num_HP_LaserJet_Pro_M404_M405_A5843E_
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : RRDSET: chart name 'cups.job_size_HP_LaserJet_Pro_M404_M405_A5843E_' on host 'tgprint-h1' already exists.
2022-02-18 08:49:33: netdata INFO  : PLUGINSD[cups] : Collector updated metadata for chart cups.job_size_HP_LaserJet_Pro_M404_M405_A5843E_
2022-02-18 08:49:33: netdata LOG FLOOD PROTECTION too many logs (201 logs in 584 seconds, threshold is set to 200 logs in 1200 seconds). Preventing more logs from process 'netdata' for 616 seconds.

This could be happening because I use cups-browsed.

What I expected to happen

I expected the agent to reconnect to the cloud without the repeated automatic restarts of netdata.service.

At the very least, once the agent service on the node is running smoothly again, the flood protection should be lifted so that I get the alerts. At the moment it seems to be active around the clock. Could this be caused by the LOG FLOOD PROTECTION?

Many thanks in advance for any suggestions!

Hi @tonyh ! Welcome, thanks for your report!

Indeed, something seems to be wrong: the agent connects to and disconnects from the cloud repeatedly.

Could you please forward the complete error.log file to manolis at netdata dot cloud so we can take a look?

Thank you!

Hello @Manolis_Vasilakis ! Many thanks for your quick response. I will send the logs from both nodes.

Hi @tonyh

It seems that the agent is crashing at short intervals… We will need to look into it further, but as a first step, could you disable the ebpf.plugin? You can do that by following this link: eBPF monitoring with Netdata | Learn Netdata (but use ebpf = no). We can then check whether it continues to behave the same way.
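
For reference, the setting goes in the [plugins] section of netdata.conf. A minimal fragment, assuming the default configuration directory /etc/netdata of a package install and the bundled edit-config helper, would look roughly like this:

```
# /etc/netdata/netdata.conf
# (assumed location; open it with: cd /etc/netdata && sudo ./edit-config netdata.conf)
[plugins]
    ebpf = no
```

The agent has to be restarted afterwards for the change to take effect, for example with sudo systemctl restart netdata.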

Thanks!

Hi @Manolis_Vasilakis

I changed the config and restarted the service on both nodes at around 2022-02-18 15:35:00.
On node1 it seems to have gotten worse, since it now took much longer to send the metrics again:

On node2 it seems to be pretty much the same.

But the log output seems to be quite different. I have sent you the current ones.

Thank you very much!

EDIT: Sorry, there was a typo in the plugin setting on node 2. I have to check there again.

Hi Tony! Thanks, ok, got them, will have a look and let you know, thanks!

Hi Tony! First of all, many thanks for sticking around for this and for your help in providing logs. I think the issue has been found, and a PR is up to try to fix it: Skip info field in protobuf alerts messages if it doesn't exist. by MrZammler · Pull Request #12210 · netdata/netdata · GitHub

The problem is that the custom alert you have is missing an info field. Until the patch reaches your setup though, there are a couple of things to try and work around it. First of all, add an info field in the alert with just a text entry to describe it (doesn’t functionally matter what it reads).
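
To find which custom alerts lack the field, the health configuration files can be scanned for alarm or template entries that have no info line. The following is a minimal Python sketch and not part of Netdata. It assumes the usual "key: value" layout of health entries, where each entry starts with an alarm: or template: line. The alarm names and the chart come from the notification log earlier in this thread; the remaining lines of the sample and the info text are invented for illustration and are not the real alert definitions.

```python
def alerts_missing_info(config_text):
    """Return the names of alarm/template entries that have no 'info' line."""
    missing = []
    name, has_info = None, False
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not "key: value".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in ("alarm", "template"):
            # A new entry begins; close the previous one first.
            if name is not None and not has_info:
                missing.append(name)
            name, has_info = value, False
        elif key == "info" and name is not None:
            has_info = True
    # Close the last entry in the file.
    if name is not None and not has_info:
        missing.append(name)
    return missing


# Hypothetical sample: only the alarm names and the chart are from this thread.
sample = """
 alarm: hang_pending_print_jobs_hall1
    on: cups.job_num
  crit: $this > 0
    to: root

 alarm: hang_processing_print_jobs_hall1
    on: cups.job_num
  crit: $this > 0
  info: print jobs stuck in processing in hall 1
    to: root
"""

print(alerts_missing_info(sample))
```

For the sample this prints ['hang_pending_print_jobs_hall1'], the one entry without an info line. Adding a line such as the info: entry shown in the second alarm is the workaround described above.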

New alerts raised by the updated definition (with the added info) will then be okay. However, the agent will still try to send the old messages (previously raised alerts) to the cloud and will keep failing. At some point the agent will clear the backlog of messages, and from then on it should be okay, but it might take a while.

A more aggressive solution would be to delete the sqlite database, but I would advise trying the above first.

Of course waiting for the patch to reach a release is also an option, in which case you shouldn’t have to change anything (it will handle the alert with missing info).

Thanks again and sorry for all the trouble!

Hi @Manolis_Vasilakis !

Thank you very much for tracking down the issue! I added the info: parameter to all my custom alerts and restarted the service. I haven’t had any issues since. The alerts are triggered successfully and the agents connect right away.

Excellent! Glad I could help! Let us know of any other issues you may encounter!

Thanks!