How to debug missed local and cloud notifications?

I believe I’m experiencing missed notifications on at least one of our nodes.

How do I debug and make sure that both local and cloud notifications are being sent?

e.g. local and cloud notifications are sometimes inconsistent.

I find that reloading the health config seems to send the missed emails:

sudo netdatacli reload-health
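Beyond reloading, a quick way to confirm what the agent itself currently considers raised is its alarms API (`/api/v1/alarms?all` on the default port 19999). A minimal sketch; the endpoint is stock, but the helper function and sample JSON below are my own illustration, not Netdata code:

```shell
#!/bin/sh
# Sketch: summarize alert statuses reported by the agent's alarms API.
# /api/v1/alarms?all is the agent's stock v1 endpoint; the helper below
# just counts "status" fields in the JSON it is given.

count_statuses() {
  # Reads alarms JSON on stdin; prints a count per status value.
  grep -o '"status":"[A-Z]*"' | sort | uniq -c
}

# Real use (assumes a local agent listening on the default port 19999):
#   curl -s "http://localhost:19999/api/v1/alarms?all" | count_statuses

# Demo on a tiny made-up sample so the sketch runs without an agent:
printf '%s' '{"alarms":{"a":{"status":"WARNING"},"b":{"status":"CLEAR"}}}' \
  | count_statuses
```

If the API shows a WARNING or CRITICAL status that never produced an email, the problem is on the notification side rather than the detection side.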

Your help is kindly appreciated. Thanks

Edit 1

Out of the blue, some missing notifications began to appear overnight. Odd.

Edit 2

I’ve just discovered some scripts in this dir:

/usr/libexec/netdata/plugins.d

I will check it out and see what I can find.

Edit 3

Q: If I were to manually trigger a notification (e.g. fill a disk to over 90%), which log file can I check to confirm that netdata has picked up the event and is sending a notification on the local node, as well as forwarding the event to the cloud instance?

Edit 4

I’ve just found /var/log/netdata/health.log, which appears to be what I’m looking for.
But the last entry in that file was from last year: 2023-11-23 00:17:34.

Q: Why is my health log file not being populated?
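One possible explanation (an assumption on my part, worth verifying): recent agent versions on systemd hosts log to systemd-journald by default instead of the flat files under /var/log/netdata, which would leave health.log untouched. If so, alert transitions should show up in the journal instead:

```shell
#!/bin/sh
# Sketch, assuming the agent logs to systemd-journald rather than the flat
# files under /var/log/netdata (recent agents default to the journal on
# systemd hosts, which would explain empty health.log/error.log files).

# Follow alert transitions live (needs a running agent, so commented out):
#   journalctl -u netdata -f --no-pager | grep -i -E 'alarm|alert'

# The same filter, demonstrated on a made-up journal line so it runs anywhere:
echo 'netdata[1234]: ALARM disk_space._ = WARNING' | grep -i -E 'alarm|alert'
```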

Edit 5

I’m finding that my netdata logs are largely not being written to, even though alerts are firing:

-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 health.log
-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 error.log
-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 collector.log
-rw-rw---- 1 netdata netdata       0 Jul 13 07:22 daemon.log
-rw-rw---- 1 netdata netdata       0 Jul 13 07:22 collectors.log
-rw-r--r-- 1 netdata netdata     434 Jul 24 19:58 access.log.14.gz
...
-rw-r--r-- 1 netdata netdata 2156554 Aug  5 06:31 aclk.log.1
-rw-r--r-- 1 netdata netdata  100641 Aug  5 23:52 access.log.2.gz
-rw-r--r-- 1 netdata netdata       0 Aug  6 00:00 aclk.log
-rw-r--r-- 1 netdata netdata  185422 Aug  6 23:51 access.log.1
-rw-r--r-- 1 netdata netdata     252 Aug  7 00:17 access.log

Q: Is it strange that many of the logs here are empty?

Environment

  • netdata v1.46.3
  • Debian 11 (Bullseye)
  • Default configurations

In Netdata Cloud, checking the logs for the node, I can see some events that should have triggered a notification:

But I am not receiving them either on my local node or in Netdata Cloud.

Is this because the window between the event being triggered and it returning to normal is too short for detection?

Similarly, for disk space usage, of the 4 events I only received an email notification for 1:

Can someone confirm whether this is normal or odd?

Note: I have default configurations

Following directions from here:

I have run the command to test the local node notifications:

bash -x /usr/libexec/netdata/plugins.d/alarm-notify.sh test

I receive all three emails:

  • Warning
  • Critical
  • Clear

So while the test clearly sends all three, in real-world situations the Warning is almost always the only one I receive.
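One way to narrow down which transitions went unnotified might be the agent's own alert-transition log, exposed at `/api/v1/alarm_log` on the v1 API. The endpoint is stock; the helper function and the sample entry below are my own illustration:

```shell
#!/bin/sh
# Sketch: list alert transitions the agent itself recorded, to compare
# against the emails actually received.

list_transitions() {
  # Reads alarm_log JSON on stdin; prints name/old_status/status fields.
  grep -oE '"(name|old_status|status)":"[^"]*"'
}

# Real use (assumes a local agent listening on the default port 19999):
#   curl -s "http://localhost:19999/api/v1/alarm_log" | list_transitions

# Demo on a made-up entry so the sketch runs without an agent:
printf '%s' '[{"name":"disk_space_usage","old_status":"WARNING","status":"CRITICAL"}]' \
  | list_transitions
```

A WARNING→CRITICAL transition listed here but missing from the inbox would point at the notification path rather than at health detection.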


Environment

  • netdata agent: v1.46.3
  • OS: Debian 11 (Bullseye) on AWS
  • Configuration: Default
  • Cloud plan: Community (2023.11)

Hi @ow-blaze-v!
First, let me apologize for the late assistance; I can see you opened the issue a couple of days back.

If I understood correctly, the issue you are facing is that, although alerts change status, you do not receive their respective notifications, right?
I also understand you did not configure any notification integration on the Agent, and you expect these notifications to come from Cloud, which happens automatically if the Agent is claimed to the Cloud.

In order to assist you further, could you please share your Space ID in a private message? You can find it in your space settings on the Cloud.

Correct.

Sorry, I don’t fully understand the question. I do receive some notifications from the agent to the root email account on the same server. I also receive notifications from Netdata Cloud to my Gmail account. It should be noted that the notifications from the agent don’t always correspond directly with the notifications from the cloud.

I will send you our SpaceID in a PM.

I took a closer look into your space alerts and everything looks correct. Unfortunately, the alerts you shared are too old for the traces we keep on our Cloud systems, so we cannot debug them properly.
One possible culprit is the alert config being set to silent, in which case no notifications are dispatched for status changes.

We have some stock alerts being silenced by default.

If you see any recent alert (at most 15 days old) for which you expected a notification and didn’t receive one, please share the alert name and the node where it was triggered, and I can trace it down.

Agent and Cloud notifications are two separate systems and may not behave exactly the same. Cloud has a flood-protection mechanism to avoid spamming users, such as not sending another email if the alert status is the same as in the previously sent email.
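To illustrate the flood-protection idea, here is a toy sketch of the rule described above (my own model, not Cloud’s actual implementation): an email is suppressed when the alert status equals the status of the previous email sent.

```shell
#!/bin/sh
# Toy model of the dedup rule: suppress an email when the alert status
# equals the status of the previous email sent.

last_sent=""

notify() {
  status="$1"
  if [ "$status" = "$last_sent" ]; then
    echo "suppressed duplicate: $status"
  else
    echo "send email: $status"
    last_sent="$status"
  fi
}

notify WARNING    # first WARNING -> email goes out
notify WARNING    # same status again -> suppressed
notify CRITICAL   # status changed -> email goes out
```

Under a rule like this, repeated WARNING→CLEAR→WARNING flapping could plausibly produce fewer emails than raw status changes, which matches receiving 1 email for 4 similar events.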