Alarms from some nodes and not others

Environment

CentOS 7, installed via EPEL
Name : netdata
Arch : x86_64
Version : 1.31.0
Release : 1.el7

First of all, just wanted to say I love Netdata and have found the addition of Netdata cloud to be amazing for my use case.

I have a heterogeneous collection of clients’ servers, mostly LAMP but some other stacks, that I’m monitoring with Netdata cloud.

These are clients’ individual servers that they’ve given me access to in order to maintain and monitor, so I decided to go with RPM/DEB installations to get Netdata installed as cleanly as possible. For this scenario, that’s far preferable to the netdata-installer from my POV: e.g. if they ever cease to be my clients, it will be very easy to remove Netdata if necessary, though I would strongly suggest the next guy or girl use it too :slight_smile:

Problem/Question

The issue is: I’m getting automatic email notifications (no setup/config needed) via Netdata Cloud for some of these machines, but not others. I should mention that the most recent batch of machines (which don’t seem to be alerting) had Netdata installed via the EPEL repo for CentOS 7, whereas the older ones (which do seem to be alerting) were installed via the packagecloud repo. EPEL seemed to have an up-to-date version, so I went with that rather than add yet another YUM repo to the boxes.

What I have noticed is that the EPEL version has its many alert config files in /etc/netdata/conf.d/health.d, whereas the packagecloud version has them in /usr/lib/netdata/conf.d/health.d/
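To compare the two layouts on a given box, a small check like the following can report which of the two directories exists and how many alert config files each holds. This is only a sketch: the two paths are the ones mentioned above, and the helper name `count_health_configs` is made up for illustration.

```python
from pathlib import Path

# Candidate locations taken from the two package layouts described above.
CANDIDATE_DIRS = (
    "/etc/netdata/conf.d/health.d",
    "/usr/lib/netdata/conf.d/health.d",
)


def count_health_configs(dirs=CANDIDATE_DIRS):
    """Map each directory to its number of *.conf files (None if it is missing)."""
    counts = {}
    for d in dirs:
        path = Path(d)
        counts[d] = len(list(path.glob("*.conf"))) if path.is_dir() else None
    return counts


if __name__ == "__main__":
    for directory, n in count_health_configs().items():
        print(directory, "missing" if n is None else f"{n} config file(s)")
```

Finding the files on disk only shows that the package shipped them; it does not show that the running agent is configured to read them.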

I recently had a situation where the CPU was pinned at 100% for hours on end, causing issues with a client’s live site, yet I received no alerts.

There were no problems connecting these EPEL Netdata instances to Netdata Cloud.

What I expected to happen

I would expect these automatically configured alerts to fire via Netdata Cloud if there was an issue (as there has been recently). If more config is needed, or if I should switch to the packagecloud repo, I’m happy to do that, but I’m keen to know what the root cause might be, even if only to file a bug with the EPEL maintainer in order to improve the package if there’s a bug (and be a good open source citizen :slightly_smiling_face: )

Many thanks in advance.

Hey @ticktockhouse ! Welcome !

Can you please send the error.log (typically found under /var/log/netdata) to manolis at netdata dot cloud, from one of the EPEL installed instances? Would like to have a look and maybe check what is happening.

So in general, the EPEL-installed Netdata instances do connect to the cloud, but no alerts are sent over, right?

Thanks!!

You can also check http://a-machines-ip/api/v1/info, in the alarms section, to see whether there are any alarms in any state.
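A rough way to script that check across several machines is sketched below. It assumes the info response is JSON with an "alarms" object holding "normal", "warning" and "critical" counters; the function names are hypothetical, and the agent's web server must be reachable for the fetch to work.

```python
import json
from urllib.request import urlopen


def fetch_info(base_url, timeout=5):
    """Fetch and parse <base_url>/api/v1/info from an agent."""
    with urlopen(f"{base_url.rstrip('/')}/api/v1/info", timeout=timeout) as resp:
        return json.load(resp)


def alarm_counts(info):
    """Return per-state alarm counters from a parsed info response.

    Assumption: the response carries an "alarms" object with "normal",
    "warning" and "critical" fields; anything missing is counted as 0.
    """
    alarms = info.get("alarms") or {}
    return {s: int(alarms.get(s, 0)) for s in ("normal", "warning", "critical")}


def health_looks_unloaded(info):
    """True when no alarm exists in any state, a hint that no alert config was loaded."""
    return sum(alarm_counts(info).values()) == 0
```

For example, `health_looks_unloaded(fetch_info("http://a-machines-ip:19999"))` returning True would point at the agent never loading any alert definitions, rather than at alerts being lost on the way to the cloud (19999 is the agent's default port; adjust as needed).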

Hi Manolis,

That’s correct. I have emailed the log as requested.

The issue seems to be that no alarms are triggered in the first place…

Thanks :slight_smile:

Ok, got the logs, thanks! (Will wait for the rotated ones too)!

However, in the meantime, I set up Netdata on CentOS 7.9 via the EPEL package:

Installed Packages
Name        : netdata
Arch        : x86_64
Version     : 1.31.0
Release     : 1.el7
Size        : 5.0 M
Repo        : installed
From repo   : epel

and, although the alert configuration files are indeed in /etc/netdata/conf.d/health.d/, the packaged configuration file (/etc/netdata/netdata.conf) correctly points at them:

...
[health]
    stock health configuration directory = /etc/netdata/conf.d/health.d
...

So in a quick test, I ate all the RAM with stress-ng, claimed the node to the cloud, and did get alerts there.

(Btw, is it possible you’re using a different netdata.conf, which is missing the path to alert configuration files?)

So there must be something in your setup we need to find. I think the logs should provide some info.

Thanks!

Aha! I think that may be it…

I’m templating the netdata.conf using Ansible and have indeed overwritten it with this:

# Netdata configuration

[global]
  hostname = myserver
  dbengine multihost disk space = 2048

[web]
  mode = none

I’ll try putting the necessary parameters back in and see how I go…
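For anyone templating netdata.conf the same way, a check along these lines could catch a rendered file that drops the [health] setting before it gets deployed. It is a minimal sketch with a hand-rolled parser; `read_setting` and `stock_health_dir` are illustrative names, and the section and key are the ones from the packaged EPEL config quoted earlier in the thread.

```python
def read_setting(conf_text, section, key):
    """Return the value of `key` inside `[section]`, or None if it is absent."""
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # entering a new section
            continue
        if current == section and "=" in line:
            name, value = line.split("=", 1)
            if name.strip() == key:
                return value.strip()
    return None


def stock_health_dir(conf_text):
    """Return the stock health config directory, or None if netdata.conf omits it."""
    return read_setting(conf_text, "health", "stock health configuration directory")
```

Run against the overwritten template above, `stock_health_dir` returns None, while the packaged file yields /etc/netdata/conf.d/health.d. On the EPEL layout that missing line is what kept the stock alerts from loading.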

Thank you so much for your help! :star_struck:

Hey! No problem! Let us know how it goes!

Thanks!!

That worked a treat :smiley:

Now I have to find out how to fix the causes of all these alerts I’m getting(!)

Not sure if I have to mark this thread “SOLVED” or anything…

Marked it for you, thanks!!