Slack Notification is recurring

Hey there,

we recently started using Netdata on our servers and are having some trouble with the Slack notifications. In general, they work fine and as expected, but sometimes we get the same notification multiple times. In the screenshot you can see that we received the same recovery message (dated the 27th of February) twice: once yesterday and once today, without a warning in between. It’s not the first time we’ve noticed this. I think last time we deactivated Slack for a while, and afterwards it worked fine again.

Any ideas about this?

Hi @benedikt.allendorf :wave:

Let’s check error.log; perhaps Netdata failed to send a warning notification for some reason.

grep "slack notification" error.log

Hey,

that shows some INFOs like “[…] is CLEAR/WARNING without specifying a channel” (seems like for every Slack notification we get). Could that already be part of the problem? We don’t get duplicates for each notification, only for a few.

Edit: Additionally, the notifications we get are (at least mostly) not sent by the servers running Netdata themselves. Instead, we have one node which receives the streamed data and sends the Slack notifications.


Could you paste the error log, as requested by @ilyam8, in a code block? That log should come from the node which receives the streaming data and runs the health checks (the parent node).

Sure, the full log looks like this:


2021-03-03 06:46:58: alarm-notify.sh: INFO: sent slack notification for: ServerA mem.available.ram_available is CLEAR without specifying a channel
2021-03-04 07:23:38: alarm-notify.sh: INFO: sent slack notification for: ServerA mem.available.ram_available is CLEAR without specifying a channel
2021-03-04 10:06:43: alarm-notify.sh: INFO: sent slack notification for: ServerB mem.available.ram_available is WARNING without specifying a channel
2021-03-04 10:07:53: alarm-notify.sh: INFO: sent slack notification for: ServerB system.ram.ram_in_use is WARNING without specifying a channel
2021-03-04 10:39:14: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is WARNING without specifying a channel
2021-03-04 10:39:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is WARNING without specifying a channel
2021-03-04 10:45:15: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel
2021-03-04 10:45:15: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-04 10:55:55: alarm-notify.sh: INFO: sent slack notification for: ServerB mem.available.ram_available is CRITICAL without specifying a channel
2021-03-04 12:46:04: alarm-notify.sh: INFO: sent slack notification for: ServerB system.ram.ram_in_use is CLEAR without specifying a channel
2021-03-04 12:49:34: alarm-notify.sh: INFO: sent slack notification for: ServerB mem.available.ram_available is CLEAR without specifying a channel
2021-03-04 13:29:43: alarm-notify.sh: INFO: sent slack notification for: ServerD disk_backlog.sda.10min_disk_backlog is WARNING without specifying a channel
2021-03-04 13:50:44: alarm-notify.sh: INFO: sent slack notification for: ServerD disk_backlog.sda.10min_disk_backlog is CLEAR without specifying a channel
2021-03-04 14:54:14: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is WARNING without specifying a channel
2021-03-04 14:54:26: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is WARNING without specifying a channel
2021-03-04 15:00:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel
2021-03-04 15:00:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-04 18:02:55: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is WARNING without specifying a channel
2021-03-04 18:03:04: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is WARNING without specifying a channel
2021-03-04 18:09:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel
2021-03-04 18:09:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-04 21:04:43: alarm-notify.sh: INFO: sent slack notification for: ServerD disk_backlog.sda.10min_disk_backlog is WARNING without specifying a channel
2021-03-04 21:29:43: alarm-notify.sh: INFO: sent slack notification for: ServerD disk_backlog.sda.10min_disk_backlog is CLEAR without specifying a channel
2021-03-04 22:27:24: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is WARNING without specifying a channel
2021-03-04 22:28:09: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is WARNING without specifying a channel
2021-03-04 22:33:44: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-04 22:33:44: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel
2021-03-04 23:07:14: alarm-notify.sh: INFO: sent slack notification for: ServerB disk_backlog.sda.10min_disk_backlog is WARNING without specifying a channel
2021-03-04 23:33:15: alarm-notify.sh: INFO: sent slack notification for: ServerB disk_backlog.sda.10min_disk_backlog is CLEAR without specifying a channel
2021-03-05 00:13:15: alarm-notify.sh: INFO: sent slack notification for: ServerE disk_backlog.sda.10min_disk_backlog is WARNING without specifying a channel
2021-03-05 00:34:14: alarm-notify.sh: INFO: sent slack notification for: ServerE disk_backlog.sda.10min_disk_backlog is CLEAR without specifying a channel
2021-03-05 06:49:55: alarm-notify.sh: INFO: sent slack notification for: ServerF web_log_apache_vhosts.excluded_requests.web_log_1m_unmatched is WARNING without specifying a channel
2021-03-05 07:00:15: alarm-notify.sh: INFO: sent slack notification for: ServerA mem.available.ram_available is CLEAR without specifying a channel
2021-03-05 07:00:15: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel
2021-03-05 07:00:15: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-05 07:10:14: alarm-notify.sh: INFO: sent slack notification for: ServerF web_log_apache_vhosts.excluded_requests.web_log_1m_unmatched is CLEAR without specifying a channel
2021-03-05 09:03:14: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is WARNING without specifying a channel
2021-03-05 09:03:25: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is WARNING without specifying a channel
2021-03-05 09:10:14: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_accept_queue.1m_tcp_accept_queue_drops is CLEAR without specifying a channel
2021-03-05 09:10:14: alarm-notify.sh: INFO: sent slack notification for: ServerC ip.tcp_syn_queue.1m_tcp_syn_queue_cookies is CLEAR without specifying a channel

I analyzed the logs. According to them:

  • there are no "failed to send" events
  • indeed, sometimes there are several sequential CLEAR notifications without a WARNING/CRITICAL in between

In the tree below, [!!!] marks a wrong transition:

ServerA
  └mem.available.ram_available
    └CLEAR->[!!!]CLEAR->[!!!]CLEAR
ServerB
  └mem.available.ram_available
    └WARNING->CRITICAL->CLEAR
  └system.ram.ram_in_use
    └WARNING->CLEAR
  └disk_backlog.sda.10min_disk_backlog
    └WARNING->CLEAR
ServerC
  └ip.tcp_accept_queue.1m_tcp_accept_queue_drops
    └WARNING->CLEAR->WARNING->CLEAR->WARNING->CLEAR->WARNING->CLEAR->[!!!]CLEAR->WARNING->CLEAR
  └ip.tcp_syn_queue.1m_tcp_syn_queue_cookies
    └WARNING->CLEAR->WARNING->CLEAR->WARNING->CLEAR->WARNING->CLEAR->[!!!]CLEAR->WARNING->CLEAR
ServerD
  └disk_backlog.sda.10min_disk_backlog
    └WARNING->CLEAR->WARNING->CLEAR
ServerE
  └disk_backlog.sda.10min_disk_backlog
    └WARNING->CLEAR
ServerF
  └web_log_apache_vhosts.excluded_requests.web_log_1m_unmatched
    └WARNING->CLEAR
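
For anyone who wants to repeat this check on their own log, here is a minimal Python sketch of the same idea: group the "sent slack notification" lines by host and alarm, then flag every notification whose status equals the previous one for that pair. The function name `find_repeated_states` and the regular expression are my own and only assume the line format shown in the log above; this is not a Netdata tool.

```python
import re

# Matches the alarm-notify.sh lines in the format shown in the log above.
# Assumption: host and alarm names contain no spaces.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): alarm-notify\.sh: INFO: "
    r"sent slack notification for: (?P<host>\S+) (?P<alarm>\S+) is (?P<status>[A-Z]+)"
)


def find_repeated_states(lines):
    """Return (timestamp, host, alarm, status) for every notification whose
    status equals the previously sent status for the same host and alarm."""
    last_status = {}  # (host, alarm) -> last status seen
    repeats = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a slack notification line
        key = (m["host"], m["alarm"])
        if last_status.get(key) == m["status"]:
            repeats.append((m["ts"], m["host"], m["alarm"], m["status"]))
        last_status[key] = m["status"]
    return repeats


# Four lines copied from the log above.
sample = [
    "2021-03-03 06:46:58: alarm-notify.sh: INFO: sent slack notification for: ServerA mem.available.ram_available is CLEAR without specifying a channel",
    "2021-03-04 07:23:38: alarm-notify.sh: INFO: sent slack notification for: ServerA mem.available.ram_available is CLEAR without specifying a channel",
    "2021-03-04 10:06:43: alarm-notify.sh: INFO: sent slack notification for: ServerB mem.available.ram_available is WARNING without specifying a channel",
    "2021-03-04 10:55:55: alarm-notify.sh: INFO: sent slack notification for: ServerB mem.available.ram_available is CRITICAL without specifying a channel",
]

for ts, host, alarm, status in find_repeated_states(sample):
    print(ts, host, alarm, "repeated", status)
```

To run it against a real log, pass an open file instead of the sample list, for example `find_repeated_states(open("error.log"))`; the path is an assumption and depends on where your installation writes its logs. The first notification for each host/alarm pair is never flagged, because the log does not show what state came before it.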

This will be much easier to debug once the pull request "[health] Dispatch some alarms into health.log instead of debug" by Saruspete (netdata/netdata, Pull Request #7576 on GitHub) is merged.

Is there anything else I could look up or do? Or should I just wait for that pull request?

Does it happen only with CLEAR notifications?

Mostly, but not exclusively I think.

When scrolling through the Slack channel, I found at least one WARNING that was dated two days earlier but apparently had not been sent at all on the original day (so more a delay than a duplicate). Those were swap WARNINGs which happened multiple times per day (and were cleared multiple times). One time the CLEAR occurred on the next day; maybe that broke something.