change alarm not applied

vahid_sohrabloo · July 31, 2022, 10:10am

Hi
I changed tcp_listen.conf to this:

 alarm: 1m_tcp_accept_queue_overflows
       on: ip.tcp_accept_queue
    class: Workload
     type: System
component: Network
       os: linux
    hosts: *
   lookup: average -60s unaligned absolute of ListenOverflows
    units: overflows
    every: 10s
     warn: $this > 1
     crit: $this > 30
    delay: up 0 down 5m multiplier 1.5 max 1h
     info: average number of overflows in the TCP accept queue over the last minute
       to: sysadmin

and then restarted the agent.
But in the cloud I still see the old config:

$this > (($status == $CRITICAL) ? (1) : (5))

Manolis_Vasilakis · August 1, 2022, 9:34am

Hey @vahid_sohrabloo !

Can you please check under http://localhost:19999/api/v1/alarms?all that the new alert is active?

Can you also please send us the node_id as it appears under http://localhost:19999/api/v1/info when the agent is connected to the cloud?

Thanks!

vahid_sohrabloo · August 1, 2022, 9:43am

Hi @Manolis_Vasilakis
This is from http://localhost:19999/api/v1/alarms?all:

"ip.tcp_accept_queue.1m_tcp_accept_queue_overflows": {
			"id": 1655335791,
			"config_hash_id": "6856b639-35df-0620-16e1-15bd49176581",
			"name": "1m_tcp_accept_queue_overflows",
			"chart": "ip.tcp_accept_queue",
			"family": "tcp",
			"class": "Workload",
			"component": "Network",
			"type": "System",
			"active": true,
			"disabled": false,
			"silenced": false,
			"exec": "/usr/libexec/netdata/plugins.d/alarm-notify.sh",
			"recipient": "sysadmin",
			"source": "21@/etc/netdata/health.d/tcp_listen.conf",
			"units": "overflows",
			"info": "average number of overflows in the TCP accept queue over the last minute",
			"status": "CLEAR",
			"last_status_change": 1659319268,
			"last_updated": 1659346898,
			"next_update": 1659346908,
			"update_every": 10,
			"delay_up_duration": 0,
			"delay_down_duration": 300,
			"delay_max_duration": 3600,
			"delay_multiplier": 1.500000,
			"delay": 300,
			"delay_up_to_timestamp": 1659319568,
			"warn_repeat_every": "0",
			"crit_repeat_every": "0",
			"value_string": "0 overflows",
			"last_repeat": "0",
			"times_repeat": 0,
			"lookup_dimensions":"ListenOverflows",
			"db_after": 1659346838,
			"db_before": 1659346897,
			"lookup_method": "average",
			"lookup_after": -60,
			"lookup_before": 0,
			"lookup_options": "absolute unaligned",
			"warn":"$this &gt; 1",
			"warn_parsed":"(${this} &gt; 1)",
			"crit":"$this &gt; 30",
			"crit_parsed":"(${this} &gt; 30)",
			"green":null,
			"red":null,
			"value":0
		},

And the node ID:

 "node_id": "d9dd4a47-4737-49af-9d10-e4fc41ad6b80"

Manolis_Vasilakis · August 1, 2022, 10:29am

Hi, thanks for this.

In which part on the cloud are you seeing the old configuration? Is it on the alert configuration tab, or in the alert drawer when such an alert is raised?

Can you check again please, and have you tried a refresh on that page?

vahid_sohrabloo · August 1, 2022, 9:04pm

Hi. Thanks for your response.
Now it’s OK. It’s really weird. I applied the change a week ago and have received a lot of error messages since that time. I copied it from the alert tab here.
thanks.

vahid_sohrabloo · August 2, 2022, 9:28am

Hi @Manolis_Vasilakis
I received an alert in Slack with this:
x is critical, `ip.tcp_accept_queue` (*tcp* ), **1m tcp accept queue overflows = 11.2 overflows**
But as you can see, I set the critical threshold to more than 30.
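
To make the mismatch concrete, here is a minimal Python sketch that mimics the two `crit` expressions quoted in this thread and applies them to the reported value of 11.2. The function names are hypothetical and this is not Netdata code; it only restates the arithmetic of the expressions.

```python
# Hypothetical helpers mirroring the expressions quoted in this thread.
# The agent evaluates the real expressions itself; this only restates them.

def old_crit(value: float, currently_critical: bool) -> bool:
    # Old rule: $this > (($status == $CRITICAL) ? (1) : (5))
    threshold = 1 if currently_critical else 5
    return value > threshold


def new_crit(value: float) -> bool:
    # New rule: $this > 30
    return value > 30


def new_warn(value: float) -> bool:
    # New rule: $this > 1
    return value > 1


value = 11.2  # value reported in the Slack notification
print("old crit:", old_crit(value, currently_critical=False))
print("new crit:", new_crit(value))
print("new warn:", new_warn(value))
```

Under these rules, 11.2 is critical only with the old expression; with the new one it would be at most a warning, which suggests the notification was produced with the old thresholds.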

Manolis_Vasilakis · August 2, 2022, 9:53am

Hi @vahid_sohrabloo

This is strange. So it appeared to be running OK at first, and then you got an alert with the old configuration?

Can you tell me a bit more about your configuration? Is it a Docker instance? How did you edit the alert? Did you use edit-config?

Receiving this alert on slack means that the alert comes from the agent itself. Without restarting the agent that sent the alert, is it possible to check again http://localhost:19999/api/v1/alarms?all to make sure the alert has the new (crit: $this > 30) rather than the old ($this > (($status == $CRITICAL) ? (1) : (5))) configuration?
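
A rough Python sketch of that check, for anyone who wants to script it across several nodes. It assumes the agent listens on localhost:19999 and that the response keeps the alerts under a top-level `alarms` object; both the helper names and that layout are assumptions, so adjust them to what your agent actually returns.

```python
import html
import json
from urllib.request import urlopen

# Key as it appears in the output posted earlier in this thread.
ALERT_KEY = "ip.tcp_accept_queue.1m_tcp_accept_queue_overflows"


def fetch_alarms(base_url: str = "http://localhost:19999") -> dict:
    """Download the full alarm list from a running agent (hypothetical helper)."""
    with urlopen(f"{base_url}/api/v1/alarms?all", timeout=10) as response:
        return json.load(response)


def crit_of(alarms_response: dict, key: str = ALERT_KEY) -> str:
    """Return the alert's crit expression with HTML entities decoded."""
    # Assumption: alerts live under "alarms"; otherwise treat the dict itself as the map.
    alarms = alarms_response.get("alarms", alarms_response)
    # The API output in this thread shows "&gt;" instead of ">", so decode it.
    return html.unescape(alarms[key]["crit"])
```

Calling `crit_of(fetch_alarms())` on the affected node should return `$this > 30` if the new configuration is loaded, and the old `$status == $CRITICAL` expression otherwise.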

vahid_sohrabloo · August 2, 2022, 10:02am

Hi. It runs directly on the host.
This is the config:

			"id": 1655335791,
			"config_hash_id": "6856b639-35df-0620-16e1-15bd49176581",
			"name": "1m_tcp_accept_queue_overflows",
			"chart": "ip.tcp_accept_queue",
			"family": "tcp",
			"class": "Workload",
			"component": "Network",
			"type": "System",
			"active": true,
			"disabled": false,
			"silenced": false,
			"exec": "/usr/libexec/netdata/plugins.d/alarm-notify.sh",
			"recipient": "sysadmin",
			"source": "21@/etc/netdata/health.d/tcp_listen.conf",
			"units": "overflows",
			"info": "average number of overflows in the TCP accept queue over the last minute",
			"status": "CLEAR",
			"last_status_change": 1659405672,
			"last_updated": 1659434252,
			"next_update": 1659434262,
			"update_every": 10,
			"delay_up_duration": 0,
			"delay_down_duration": 300,
			"delay_max_duration": 3600,
			"delay_multiplier": 1.500000,
			"delay": 300,
			"delay_up_to_timestamp": 1659405972,
			"warn_repeat_every": "0",
			"crit_repeat_every": "0",
			"value_string": "0 overflows",
			"last_repeat": "0",
			"times_repeat": 0,
			"lookup_dimensions":"ListenOverflows",
			"db_after": 1659434192,
			"db_before": 1659434251,
			"lookup_method": "average",
			"lookup_after": -60,
			"lookup_before": 0,
			"lookup_options": "absolute unaligned",
			"warn":"$this &gt; 1",
			"warn_parsed":"(${this} &gt; 1)",
			"crit":"$this &gt; 30",
			"crit_parsed":"(${this} &gt; 30)",
			"green":null,
			"red":null,
			"value":0
		},

I use Ansible to configure a lot of nodes.
I also tried a few times with edit-config and restarted the agent multiple times.
The other nodes work correctly.

Manolis_Vasilakis · August 2, 2022, 10:17am

The alert you received a little while ago, can you check if it was raised recently?
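
One way to check this is to convert the `last_status_change` epoch value from the output above into a readable time. A small Python sketch (the helper name is hypothetical):

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds: int) -> str:
    """Format a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# last_status_change from the most recent output posted in this thread
print(to_utc(1659405672))  # 2022-08-02T02:01:12+00:00
```

Comparing that time with the time the Slack notification arrived shows whether the notification matches the alert's last recorded status change on this agent.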

vahid_sohrabloo · August 2, 2022, 12:50pm

I received that alert today. (4 PM CEST)

Manolis_Vasilakis · August 4, 2022, 10:07am

Hi!

Can you please do the following:

In netdata.conf can you add: debug flags = 0x0000000000800000 under the [logs] section and restart netdata?

This should write some information to the debug.log file. Could you then please share it with me at manolis at netdata dot cloud?

Thanks a lot!
