How to debug missed local and cloud notifications?

I believe I’m experiencing missed notifications on at least one of our nodes.

How do I debug and make sure that both local and cloud notifications are being sent?

e.g. local and cloud notifications are sometimes inconsistent.

I find that reloading the health config seems to send the missed emails:

sudo netdatacli reload-health
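Beyond reloading, a quick way to confirm what the agent itself currently considers raised is its alarms API (`/api/v1/alarms?all` on the default port 19999). A minimal sketch; the endpoint is stock, but the helper function and sample JSON below are my own illustration, not Netdata code:

```shell
#!/bin/sh
# Sketch: summarize alert statuses reported by the agent's alarms API.
# /api/v1/alarms?all is the agent's stock v1 endpoint; the helper below
# just counts "status" fields in the JSON it is given.

count_statuses() {
  # Reads alarms JSON on stdin; prints a count per status value.
  grep -o '"status":"[A-Z]*"' | sort | uniq -c
}

# Real use (assumes a local agent listening on the default port 19999):
#   curl -s "http://localhost:19999/api/v1/alarms?all" | count_statuses

# Demo on a tiny made-up sample so the sketch runs without an agent:
printf '%s' '{"alarms":{"a":{"status":"WARNING"},"b":{"status":"CLEAR"}}}' \
  | count_statuses
```

If the API shows a WARNING or CRITICAL status that never produced an email, the problem is on the notification side rather than the detection side.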

Your help is kindly appreciated. Thanks

Edit 1

Out of the blue, some missing notifications began to appear overnight. Odd.

Edit 2

I’ve just discovered some scripts in this dir:

/usr/libexec/netdata/plugins.d

I will check it out and see what I can find.

Edit 3

Q: If I were to manually trigger a notification (e.g. fill a disk to over 90%), which log file can I check to confirm that netdata has picked up the event and is sending a notification on the local node, as well as forwarding the event to the cloud instance?

Edit 4

I’ve just found /var/log/netdata/health.log, which appears to be what I’m looking for.
But the last entry in that file was from last year: 2023-11-23 00:17:34.

Q: Why is my health log file not being populated?
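One possible explanation (an assumption on my part, worth verifying): recent agent versions on systemd hosts log to systemd-journald by default instead of the flat files under /var/log/netdata, which would leave health.log untouched. If so, alert transitions should show up in the journal instead:

```shell
#!/bin/sh
# Sketch, assuming the agent logs to systemd-journald rather than the flat
# files under /var/log/netdata (recent agents default to the journal on
# systemd hosts, which would explain empty health.log/error.log files).

# Follow alert transitions live (needs a running agent, so commented out):
#   journalctl -u netdata -f --no-pager | grep -i -E 'alarm|alert'

# The same filter, demonstrated on a made-up journal line so it runs anywhere:
echo 'netdata[1234]: ALARM disk_space._ = WARNING' | grep -i -E 'alarm|alert'
```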

Edit 5

I’m finding that my netdata logs are largely not being written to, even though alerts are firing:

-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 health.log
-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 error.log
-rw-r--r-- 1 netdata netdata       0 Nov 24  2023 collector.log
-rw-rw---- 1 netdata netdata       0 Jul 13 07:22 daemon.log
-rw-rw---- 1 netdata netdata       0 Jul 13 07:22 collectors.log
-rw-r--r-- 1 netdata netdata     434 Jul 24 19:58 access.log.14.gz
...
-rw-r--r-- 1 netdata netdata 2156554 Aug  5 06:31 aclk.log.1
-rw-r--r-- 1 netdata netdata  100641 Aug  5 23:52 access.log.2.gz
-rw-r--r-- 1 netdata netdata       0 Aug  6 00:00 aclk.log
-rw-r--r-- 1 netdata netdata  185422 Aug  6 23:51 access.log.1
-rw-r--r-- 1 netdata netdata     252 Aug  7 00:17 access.log

Q: Is it strange that many of the logs here are empty?

Environment

  • netdata v1.46.3
  • Debian 11 (Bullseye)
  • Default configurations

In Netdata Cloud, checking the logs for the node, I can see some events that should have triggered a notification:

But I am not receiving them either on my local node or in Netdata Cloud.

Is this because the window between the event being triggered and it returning to normal is too short for detection?

Similarly, for disk space usage, of the 4 events I only received an email notification for 1:

Can someone confirm whether this is normal or odd?

Note: I have default configurations

Following directions from here:

I have run the command to test the local node notifications:

bash -x /usr/libexec/netdata/plugins.d/alarm-notify.sh test

I receive all three emails:

  • Warning
  • Critical
  • Clear

So while the test clearly sends all three, in real-world situations the Warning is almost always the only one I receive.
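One way to narrow down which transitions went unnotified might be the agent's own alert-transition log, exposed at `/api/v1/alarm_log` on the v1 API. The endpoint is stock; the helper function and the sample entry below are my own illustration:

```shell
#!/bin/sh
# Sketch: list alert transitions the agent itself recorded, to compare
# against the emails actually received.

list_transitions() {
  # Reads alarm_log JSON on stdin; prints name/old_status/status fields.
  grep -oE '"(name|old_status|status)":"[^"]*"'
}

# Real use (assumes a local agent listening on the default port 19999):
#   curl -s "http://localhost:19999/api/v1/alarm_log" | list_transitions

# Demo on a made-up entry so the sketch runs without an agent:
printf '%s' '[{"name":"disk_space_usage","old_status":"WARNING","status":"CRITICAL"}]' \
  | list_transitions
```

A WARNING→CRITICAL transition listed here but missing from the inbox would point at the notification path rather than at health detection.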


Environment

  • netdata agent: v1.46.3
  • OS: Debian 11 (Bullseye) on AWS
  • Configuration: Default
  • Cloud plan: Community (2023.11)

Hi @ow-blaze-v!
First, let me apologize for the late assistance; I can see you opened the issue a couple of days back.

If I understood correctly, the issue you are facing is that, although alerts change status, you do not receive their respective notifications, right?
I also understand you did not configure any notification integration on the Agent, and you expect these notifications to come from Cloud, which happens automatically if the Agent is claimed to the Cloud.

In order to assist you further, could you please share your Space ID in a private message? You can find it in your space settings on the Cloud.

Correct.

Sorry, I don’t fully understand the question. I do receive some notifications from the agent to the root email account on the same server. I also receive notifications from Netdata Cloud to my Gmail account. It should be noted that the notifications from the agent don’t always correspond directly with the notifications from the cloud.

I will send you our SpaceID in a PM.

I took a closer look into your space alerts and everything looks correct. Unfortunately, the alerts you shared are too old for the traces we keep on our Cloud systems, so we cannot debug them properly.
One possible culprit is the alert config being set to silent, in which case no notifications are dispatched for status changes.

We have some stock alerts being silenced by default.

If you see any recent alert (at most 15 days old) for which you expected a notification and didn’t receive one, please share the alert name and the node where it was triggered, and I can trace it down.

Agent and Cloud notifications are two separate systems and may not behave exactly the same. Cloud has a flood-protection mechanism to avoid spamming users, such as not sending another email if the alert status is the same as in the previously sent email.
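To illustrate the flood-protection idea, here is a toy sketch of the rule described above (my own model, not Cloud’s actual implementation): an email is suppressed when the alert status equals the status of the previous email sent.

```shell
#!/bin/sh
# Toy model of the dedup rule: suppress an email when the alert status
# equals the status of the previous email sent.

last_sent=""

notify() {
  status="$1"
  if [ "$status" = "$last_sent" ]; then
    echo "suppressed duplicate: $status"
  else
    echo "send email: $status"
    last_sent="$status"
  fi
}

notify WARNING    # first WARNING -> email goes out
notify WARNING    # same status again -> suppressed
notify CRITICAL   # status changed -> email goes out
```

Under a rule like this, repeated WARNING→CLEAR→WARNING flapping could plausibly produce fewer emails than raw status changes, which matches receiving 1 email for 4 similar events.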