Changed alarm not applied

Hi
I changed tcp_listen.conf to this:

 alarm: 1m_tcp_accept_queue_overflows
       on: ip.tcp_accept_queue
    class: Workload
     type: System
component: Network
       os: linux
    hosts: *
   lookup: average -60s unaligned absolute of ListenOverflows
    units: overflows
    every: 10s
     warn: $this > 1
     crit: $this > 30
    delay: up 0 down 5m multiplier 1.5 max 1h
     info: average number of overflows in the TCP accept queue over the last minute
       to: sysadmin

and then restarted the agent.
But in the cloud I still see the old config:

$this > (($status == $CRITICAL) ? (1) : (5))

Hey @vahid_sohrabloo !

Can you please check under http://localhost:19999/api/v1/alarms?all that the new alert is active?

Can you also please send us the node_id as it appears under http://localhost:19999/api/v1/info when the agent is connected to the cloud?
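For anyone who prefers to script these two checks, here is a minimal Python sketch. It assumes the agent listens on localhost:19999 as in the URLs above, and that the alarms response keeps its alerts under a top-level "alarms" object keyed by "<chart>.<alarm name>"; the helper names fetch_json and summarize_alert are made up for illustration.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:19999"  # local agent address used in this thread
ALERT_KEY = "ip.tcp_accept_queue.1m_tcp_accept_queue_overflows"


def fetch_json(path, base_url=BASE_URL):
    """GET a JSON document from the local agent (hypothetical helper)."""
    with urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def summarize_alert(alarms_doc, key=ALERT_KEY):
    """Return the fields that show which configuration the agent loaded.

    Assumption: alerts live under a top-level "alarms" object.
    Returns None when the alert is not present at all.
    """
    alert = alarms_doc.get("alarms", {}).get(key)
    if alert is None:
        return None
    return {f: alert.get(f) for f in ("source", "warn", "crit", "status", "value")}
```

On the node itself, summarize_alert(fetch_json("/api/v1/alarms?all")) should show the loaded warn and crit expressions, and fetch_json("/api/v1/info").get("node_id") should return the node ID once the agent is connected to the cloud.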

Thanks!

Hi @Manolis_Vasilakis
This is from http://localhost:19999/api/v1/alarms?all:

"ip.tcp_accept_queue.1m_tcp_accept_queue_overflows": {
			"id": 1655335791,
			"config_hash_id": "6856b639-35df-0620-16e1-15bd49176581",
			"name": "1m_tcp_accept_queue_overflows",
			"chart": "ip.tcp_accept_queue",
			"family": "tcp",
			"class": "Workload",
			"component": "Network",
			"type": "System",
			"active": true,
			"disabled": false,
			"silenced": false,
			"exec": "/usr/libexec/netdata/plugins.d/alarm-notify.sh",
			"recipient": "sysadmin",
			"source": "21@/etc/netdata/health.d/tcp_listen.conf",
			"units": "overflows",
			"info": "average number of overflows in the TCP accept queue over the last minute",
			"status": "CLEAR",
			"last_status_change": 1659319268,
			"last_updated": 1659346898,
			"next_update": 1659346908,
			"update_every": 10,
			"delay_up_duration": 0,
			"delay_down_duration": 300,
			"delay_max_duration": 3600,
			"delay_multiplier": 1.500000,
			"delay": 300,
			"delay_up_to_timestamp": 1659319568,
			"warn_repeat_every": "0",
			"crit_repeat_every": "0",
			"value_string": "0 overflows",
			"last_repeat": "0",
			"times_repeat": 0,
			"lookup_dimensions":"ListenOverflows",
			"db_after": 1659346838,
			"db_before": 1659346897,
			"lookup_method": "average",
			"lookup_after": -60,
			"lookup_before": 0,
			"lookup_options": "absolute unaligned",
			"warn":"$this > 1",
			"warn_parsed":"(${this} > 1)",
			"crit":"$this > 30",
			"crit_parsed":"(${this} > 30)",
			"green":null,
			"red":null,
			"value":0
		},

And the node ID:

 "node_id": "d9dd4a47-4737-49af-9d10-e4fc41ad6b80"

Hi, thanks for this.

In which part on the cloud are you seeing the old configuration? Is it on the alert configuration tab, or in the alert drawer when such an alert is raised?

Can you check again please, and have you tried a refresh on that page?


Hi. Thanks for your response.
Now it’s OK. It’s really weird. I applied the change a week ago and have received a lot of error messages since then. I copied it from the alert tab here.
Thanks.


Hi @Manolis_Vasilakis
I received an alert in Slack with this:
x is critical, `ip.tcp_accept_queue` (*tcp* ), **1m tcp accept queue overflows = 11.2 overflows**
But as you can see, I set critical to more than 30.
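To make the mismatch concrete, here is a small Python sketch that evaluates the reported value of 11.2 against both the old critical expression and the new thresholds. It is only an illustration of the arithmetic, not how the agent evaluates expressions internally; the function names are invented for this example.

```python
CRITICAL = "CRITICAL"
CLEAR = "CLEAR"


def old_crit(value, status):
    """Old expression: $this > (($status == $CRITICAL) ? (1) : (5))."""
    threshold = 1 if status == CRITICAL else 5
    return value > threshold


def new_crit(value):
    """New expression: $this > 30."""
    return value > 30


def new_warn(value):
    """New expression: $this > 1."""
    return value > 1


value = 11.2  # the value reported in the Slack notification

print("old crit fires:", old_crit(value, CLEAR))
print("new crit fires:", new_crit(value))
print("new warn fires:", new_warn(value))
```

With the new thresholds, 11.2 exceeds only the warning limit, so a critical notification at that value matches the old expression rather than the edited one.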

Hi @vahid_sohrabloo

This is strange. So it appeared to be running OK at one point, and then you got an alert with the old configuration?

Can you tell me a bit more about your configuration? Is it a Docker instance? How did you edit the alert? Did you use edit-config?

Receiving this alert on slack means that the alert comes from the agent itself. Without restarting the agent that sent the alert, is it possible to check again http://localhost:19999/api/v1/alarms?all to make sure the alert has the new (crit: $this > 30) rather than the old ($this > (($status == $CRITICAL) ? (1) : (5))) configuration?

Hi. It runs directly on the host.
This is the config:

			"id": 1655335791,
			"config_hash_id": "6856b639-35df-0620-16e1-15bd49176581",
			"name": "1m_tcp_accept_queue_overflows",
			"chart": "ip.tcp_accept_queue",
			"family": "tcp",
			"class": "Workload",
			"component": "Network",
			"type": "System",
			"active": true,
			"disabled": false,
			"silenced": false,
			"exec": "/usr/libexec/netdata/plugins.d/alarm-notify.sh",
			"recipient": "sysadmin",
			"source": "21@/etc/netdata/health.d/tcp_listen.conf",
			"units": "overflows",
			"info": "average number of overflows in the TCP accept queue over the last minute",
			"status": "CLEAR",
			"last_status_change": 1659405672,
			"last_updated": 1659434252,
			"next_update": 1659434262,
			"update_every": 10,
			"delay_up_duration": 0,
			"delay_down_duration": 300,
			"delay_max_duration": 3600,
			"delay_multiplier": 1.500000,
			"delay": 300,
			"delay_up_to_timestamp": 1659405972,
			"warn_repeat_every": "0",
			"crit_repeat_every": "0",
			"value_string": "0 overflows",
			"last_repeat": "0",
			"times_repeat": 0,
			"lookup_dimensions":"ListenOverflows",
			"db_after": 1659434192,
			"db_before": 1659434251,
			"lookup_method": "average",
			"lookup_after": -60,
			"lookup_before": 0,
			"lookup_options": "absolute unaligned",
			"warn":"$this > 1",
			"warn_parsed":"(${this} > 1)",
			"crit":"$this > 30",
			"crit_parsed":"(${this} > 30)",
			"green":null,
			"red":null,
			"value":0
		},

I use Ansible to configure a lot of nodes.
I also tried a few times with edit-config and restarted the agent multiple times.
The other nodes work correctly.

The alert you received a little while ago, can you check if it was raised recently?

I received that alert today. (4 PM CEST)

Hi!

Can you please do the following:

In netdata.conf can you add: debug flags = 0x0000000000800000 under the [logs] section and restart netdata?

This should write some information to the debug.log file. Could you then please share it with me at manolis at netdata dot cloud?

Thanks a lot!