Setting alerts

Hi Guys,

I’m not sure if I’m missing something obvious or if it’s just the design of the system. I’m trying to figure out how to set up a global alert for each resource. For example, if any of our monitored CPUs goes above X, send an alert.
In the docs it says each alert should be configured on the agent itself. Is it really that complicated? We have about 250 machines, so this is impossible to manage…
Or maybe I can just use a simple API request to pull all CPUs at once and make my own alert (I didn’t find this either)…

Thanks.

Hi @Pavel_Rekun

In general, yes: each agent runs its own health (alerting) on its own metrics. So if you need to set up an alert, you need to do it on each one separately.

If you use any special way for deployment of Netdata, you could include this custom alert as part of the deployment.

The other way to achieve this is to have those agents stream their metrics to a “parent” Netdata agent. That parent can be configured to run alerts on its “children” nodes.

Are those 250 nodes currently in such a setup, or is each on its own?

What’s meant by “special way for deployment” here is that, for 250 nodes, we expect you are already using a provisioning and/or configuration management (infrastructure as code) tool like Terraform, Ansible, Chef, or Puppet.
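To make the "alert as part of the deployment" idea concrete, here is a minimal sketch of a script that a provisioning tool could run on each node. It assumes the stock Netdata health definition syntax (`template`, `on`, `lookup`, `warn`, `crit`) and that custom definitions live in a directory such as `/etc/netdata/health.d/`. The alert name `fleet_cpu_usage`, the file name `fleet_cpu.conf`, and the thresholds are made up for illustration, so check them against the health reference for your Netdata version.

```python
from pathlib import Path


def render_cpu_alert(warn_pct: int = 80, crit_pct: int = 95) -> str:
    """Return the text of a health definition for overall CPU usage.

    The alert name and thresholds are illustrative assumptions.
    """
    lines = [
        " template: fleet_cpu_usage",
        "       on: system.cpu",
        "   lookup: average -1m unaligned of user,system",
        "    units: %",
        "    every: 1m",
        f"     warn: $this > {warn_pct}",
        f"     crit: $this > {crit_pct}",
        "     info: average CPU utilization over the last minute",
    ]
    return "\n".join(lines) + "\n"


def write_alert(target_dir: Path, filename: str = "fleet_cpu.conf") -> Path:
    """Write the definition into target_dir and return the file path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_text(render_cpu_alert())
    return path
```

Calling `write_alert(Path("/etc/netdata/health.d"))` from the deployment step would drop the same definition on every node. The agent has to reload its health configuration afterwards for the new alert to take effect.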

Also, for production deployments you should always set up streaming and replication. Read Deployment strategies | Learn Netdata; we’re improving it heavily these days.

Currently about half of them are Windows machines. We are still testing out the parent setup (this is the way we can monitor Windows machines; I already opened a bug about a parent crash, so it’s on hold).

As for the alarms, I’m not sure why the design is so complicated. If we already have all the data in the cloud web GUI, why not add an option to manage it from the cloud? This complicates things very much for us.
Is it possible to fetch this with the API? For example, the CPU of all machines?
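For reference, a rough sketch of what such a poll could look like, assuming each agent exposes the `/api/v1/data` endpoint on its usual port and returns the default JSON shape with `labels` and `data` arrays. The host names and the threshold are placeholders. On a parent, children are, as far as I know, reachable under a `/host/<hostname>/` prefix, which would let a single base URL per child point at the parent; treat that as an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cpu_url(base_url: str, seconds: int = 60) -> str:
    """Build a query for the average of system.cpu over the last `seconds`."""
    query = urlencode({
        "chart": "system.cpu",
        "after": -seconds,
        "points": 1,
        "group": "average",
        "format": "json",
    })
    return f"{base_url.rstrip('/')}/api/v1/data?{query}"


def cpu_percent(payload: dict) -> float:
    """Sum every returned dimension except the timestamp and idle."""
    labels = payload["labels"]
    row = payload["data"][0]
    return sum(
        value
        for label, value in zip(labels, row)
        if label not in ("time", "idle") and value is not None
    )


def hosts_over(readings: dict, threshold: float) -> list:
    """Return the sorted names of hosts whose CPU reading exceeds threshold."""
    return sorted(host for host, pct in readings.items() if pct > threshold)


def poll(base_urls: dict, threshold: float) -> list:
    """Query each agent (name -> base URL) and list those over threshold."""
    readings = {}
    for host, base in base_urls.items():
        with urlopen(cpu_url(base), timeout=5) as resp:
            readings[host] = cpu_percent(json.load(resp))
    return hosts_over(readings, threshold)
```

Something like `poll({"node1": "http://node1:19999"}, 90)` would then return the names of the machines above 90% CPU, which a cron job could turn into a notification. This is a workaround sketch, not a replacement for the built-in health engine.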

We currently have about 25 machines connected to test out Netdata. The UI and usability are excellent, but I’m not sure what to do with the alerts now…