Alarms from some nodes and not others

ticktockhouse · November 24, 2021, 1:08pm

Environment

CentOS 7, installed via EPEL
Name : netdata
Arch : x86_64
Version : 1.31.0
Release : 1.el7

First of all, just wanted to say I love Netdata and have found the addition of Netdata cloud to be amazing for my use case.

I have a heterogeneous collection of clients’ servers, mostly LAMP but some other stacks, that I’m monitoring with Netdata cloud.

These are clients’ individual servers that they’ve given me access to in order to maintain and monitor, so I have decided to go with RPM/DEB installations to get Netdata installed as cleanly as possible. For this scenario, that’s far preferable to the netdata-installer from my POV: if they ever cease to be my clients, it will be very easy to remove Netdata if necessary, though I would strongly suggest the next guy or girl use it too.

Problem/Question

The issue is: I’m getting automatic email notifications (no setup/config needed) via Netdata Cloud for some of these machines, but not others. I should mention that the most recent batch of machines (which don’t seem to be alerting) had Netdata installed via the EPEL repo for CentOS 7, whereas the older ones (which do seem to be alerting) were installed via the packagecloud repo. EPEL seemed to have an up-to-date version, so I went with that rather than add yet another YUM repo to the boxes.

What I have noticed is that the EPEL version seems to have its many alert config files in /etc/netdata/conf.d/health.d, whereas the packagecloud version seems to have these config files in /usr/lib/netdata/conf.d/health.d/
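To see which of the two locations actually holds alert definitions on a given box, a small sketch along these lines may help. It assumes only the two paths mentioned above and that alert definitions are `*.conf` files; the helper name `count_health_confs` is made up for illustration.

```python
from pathlib import Path
from typing import Dict, Iterable

# The two directories mentioned in this post (EPEL vs packagecloud layout).
CANDIDATE_DIRS = (
    "/etc/netdata/conf.d/health.d",
    "/usr/lib/netdata/conf.d/health.d",
)


def count_health_confs(dirs: Iterable[str] = CANDIDATE_DIRS) -> Dict[str, int]:
    """Count *.conf files in each directory; a missing directory counts as 0."""
    counts: Dict[str, int] = {}
    for d in dirs:
        path = Path(d)
        counts[d] = len(list(path.glob("*.conf"))) if path.is_dir() else 0
    return counts


if __name__ == "__main__":
    for directory, n in count_health_confs().items():
        print(f"{directory}: {n} alert config file(s)")
```

Run on each node, this only shows where the files live; it says nothing about which directory the agent is configured to read.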

I recently had a situation where the CPU was pinned at 100% for hours on end, causing issues with a client’s live site, yet I received no alerts.

There were no problems connecting these EPEL Netdata instances to Netdata Cloud.

What I expected to happen

I would expect these automatically configured alerts to fire via Netdata Cloud if there was an issue (as there has been recently). If more config is needed, or if I should switch to the packagecloud repo, I’m happy to do that, but I’m keen to know what the root cause might be, even if only to file a bug with the EPEL maintainer to improve the package (and be a good open source citizen).

Many thanks in advance

Manolis_Vasilakis · November 24, 2021, 1:24pm

Hey @ticktockhouse ! Welcome !

Can you please send the error.log (typically found under /var/log/netdata) to manolis at netdata dot cloud, from one of the EPEL installed instances? Would like to have a look and maybe check what is happening.

So in general, the EPEL-installed Netdata instances do connect to the cloud, but no alerts are sent over, right?

Thanks!!

Manolis_Vasilakis · November 24, 2021, 1:32pm

You can also check http://a-machines-ip/api/v1/info: does the alarms section show any alarms in any state?
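If you want to script that check across several nodes, here is a minimal sketch. It assumes the `/api/v1/info` response is JSON with an `alarms` object holding per-state counts (`normal`, `warning`, `critical`), and that the agent's web server is enabled and reachable on the default port 19999. The helper names and the sample payloads are made up for illustration and are not real agent output.

```python
import json
from urllib.request import urlopen

STATES = ("normal", "warning", "critical")  # assumed keys of the "alarms" object


def summarize_alarms(info: dict) -> dict:
    """Return per-state alarm counts from a parsed /api/v1/info payload."""
    alarms = info.get("alarms") or {}
    # Any state missing from the payload is reported as 0.
    return {state: int(alarms.get(state, 0)) for state in STATES}


def health_looks_loaded(info: dict) -> bool:
    """True if the agent knows about at least one alarm in any state."""
    return sum(summarize_alarms(info).values()) > 0


def fetch_info(host: str, port: int = 19999, timeout: float = 5.0) -> dict:
    """Fetch and decode /api/v1/info from one agent."""
    with urlopen(f"http://{host}:{port}/api/v1/info", timeout=timeout) as resp:
        return json.load(resp)


# Illustrative payloads only.
sample_ok = {"version": "v1.31.0", "alarms": {"normal": 42, "warning": 1, "critical": 0}}
sample_empty = {"version": "v1.31.0", "alarms": {"normal": 0, "warning": 0, "critical": 0}}
```

Something like `health_looks_loaded(fetch_info("a-machines-ip"))` returning False would suggest that no alert definitions were loaded at all, rather than alerts being lost on the way to the cloud.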

ticktockhouse · November 24, 2021, 2:22pm

Hi Manolis,

That’s correct. I have emailed the log as requested.

The issue seems to be that no alarms are triggered in the first place…

Thanks

Manolis_Vasilakis · November 24, 2021, 2:38pm

Ok, got the logs, thanks! (Will wait for the rotated ones too)!

However, in the meantime, I set up Netdata on CentOS 7.9 via the EPEL package:

Installed Packages
Name        : netdata
Arch        : x86_64
Version     : 1.31.0
Release     : 1.el7
Size        : 5.0 M
Repo        : installed
From repo   : epel

and, although the alert configuration files are indeed in /etc/netdata/conf.d/health.d/, the packaged configuration file (/etc/netdata/netdata.conf) correctly points at them:

...
[health]
    stock health configuration directory = /etc/netdata/conf.d/health.d
...

So in a quick test, I ate all the RAM with stress-ng and, with the node claimed to the cloud, I did get alerts there.

(Btw, is it possible you’re using a different netdata.conf, which is missing the path to alert configuration files?)

So there must be something in your setup we need to find. I think the logs should provide some info.

Thanks!

ticktockhouse · November 24, 2021, 4:20pm

Aha! I think that may be it…

I’m templating the netdata.conf using Ansible and have indeed overwritten it with this:

# Netdata configuration

[global]
  hostname = myserver
  dbengine multihost disk space = 2048

[web]
  mode = none

I’ll try putting the necessary parameters back in and see how I go…
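For anyone else templating this file, a rough sketch of a check that a rendered netdata.conf still carries the [health] setting quoted earlier in the thread. It assumes the file is plain INI-style text with `#` comments; the function names are made up, and the two sample strings are just the snippets from this thread.

```python
from typing import Dict, Optional

HEALTH_KEY = "stock health configuration directory"


def parse_netdata_conf(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI-style netdata.conf text into {section: {key: value}}."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()  # leading spaces or tabs are ignored by this parser
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections


def stock_health_dir(text: str) -> Optional[str]:
    """Return the configured stock health directory, or None if it is not set."""
    return parse_netdata_conf(text).get("health", {}).get(HEALTH_KEY)


# The overwritten config from this post: no [health] section at all.
overwritten = """# Netdata configuration

[global]
  hostname = myserver
  dbengine multihost disk space = 2048

[web]
\tmode = none
"""

# The fragment from the packaged EPEL netdata.conf quoted above.
packaged_fragment = """[health]
    stock health configuration directory = /etc/netdata/conf.d/health.d
"""
```

With these inputs, `stock_health_dir(overwritten)` comes back as None, while appending `packaged_fragment` to the template makes it return the EPEL path, which is the part my template had dropped.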

Thank you so much for your help!

Manolis_Vasilakis · November 25, 2021, 7:14am

Hey! No problem! Let us know how it goes!

Thanks!!

ticktockhouse · November 25, 2021, 11:06am

That worked a treat

Now I have to find out how to fix the causes of all these alerts I’m getting(!)

Not sure if I have to mark this thread “SOLVED” or anything…

Manolis_Vasilakis · November 25, 2021, 2:57pm

Marked it for you, thanks!!
