Some of our customers have become interested in alert notifications, so we are about to extend our configuration to serve not only our own needs but also those of clients who want to receive selective notifications. We did some research and think everything we need can be done, but we want to make sure we got it right, hence a number of questions:
We can define our own roles and are not limited to the stock roles like sysadmin, dba, webmaster. Is that right?
Each template can have multiple roles in the to field, separated by spaces, right?
When overwriting the to field of a template, we do that in /etc/netdata. When using edit-config, we get a file that overwrites all templates of a health category. Can we overwrite just one template in a category by creating a file that contains only that overwritten template?
Can a chart overwrite the to field of the alert templates? Use case is that we have many http checks configured, but for one of them, the alert should go to an additional custom role.
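A minimal sketch of what questions 1 and 2 describe, assuming email notifications and the stock layout of /etc/netdata/health_alarm_notify.conf, where each role is a key of the role_recipients_email array. The role name customer_acme and both addresses are made up for illustration.

```shell
# Fragment of /etc/netdata/health_alarm_notify.conf (the file uses shell syntax).

# Stock role, pointed at our own team (example address).
role_recipients_email[sysadmin]="ops@example.com"

# Custom role for one client (hypothetical role name and address).
role_recipients_email[customer_acme]="alerts@acme.example"
```

A template that should notify both would then list the roles separated by spaces in its to line, for example: to: sysadmin customer_acme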
If I understand you right, no. User and stock configuration files are not merged: Netdata tries to read the user config and, only if it doesn’t exist, reads the stock config. For instance, if you create an empty /etc/netdata/health.d/cpu.conf, you get no CPU alarms (the stock cpu.conf is ignored).
No, it cannot. That is an interesting idea, but I believe it is impossible right now (many http checks configured, but for one of them, the alert should go to an additional custom role). Correct me if I am wrong, @Thiago_Marques_0 @vlvkobal.
No, you are right: right now a chart cannot overwrite any field of an alert template. When health processes data, it reads the alarm values, but it does not use any chart field to dispatch messages. Like you, I also consider this a good idea.
As for 3, that’s a bit unfortunate. Copying the complete stock file and making customizations is not a problem. I’m only worried about Netdata updates, where we don’t want to miss out on template improvements in stock. Any suggestions on how to handle that?
As for 4, if we have individual charts that should have modified notifications, should we write our own templates for those? Is it even possible to assign specific templates to charts, or is there a better way of doing this?
For 3:
When you use edit-config, the script copies the original stock file to /etc/netdata/health.d. Netdata never overwrites these files; updates only overwrite the content inside /usr/lib/netdata/conf.d. For example, if you run the following commands:
# cd /etc/netdata/
# ./edit-config health.d/haproxy.conf
# ls health.d/haproxy.conf
you will be able to configure your alarms, and you do not need to worry about losing them.
For 4:
When we use template, we apply alarms to contexts, so they are more generic; if you use alarm instead, you can apply it to a specific chart.
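To make the difference concrete, here is a rough sketch of both forms in health.d syntax. Everything specific in it is an assumption for illustration: the entity names, the chart ID httpcheck_shop.status, the threshold, and the custom role customer_acme. Take the real lookup and thresholds from the stock file of your version.

```conf
# template: attached to a context, so it applies to every matching chart
# (here: all HTTP check status charts).
template: web_service_up
      on: httpcheck.status
  lookup: average -1m unaligned percentage of success
   every: 10s
    crit: $this < 75
      to: webmaster

# alarm: attached to one specific chart ID (hypothetical ID),
# so it can carry its own list of recipients.
   alarm: shop_service_up
      on: httpcheck_shop.status
  lookup: average -1m unaligned percentage of success
   every: 10s
    crit: $this < 75
      to: webmaster customer_acme
```

The first entity is keyed by context and the second by chart ID, which is what lets the to line differ for a single check.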
Well, that workflow regarding 3 is OK for a single host. But if you maintain hundreds or thousands of hosts, you can’t do it that way by hand. You need a provisioning tool like Ansible, which we already have in place, so that’s not the issue. We take the stock templates from the latest version and turn them into parameterized Ansible templates, which we copy to each host when rolling out Netdata, and again later when we update Netdata to a newer version. And that’s when I get worried. The new version of Netdata will come with improved templates, but we won’t benefit from those improvements unless we compare each of the new stock templates with our custom templates, or review Netdata’s Git repository. Neither option is practical, to be honest.
As for 4, I don’t yet see how we could define individual alarm and notification settings for individual charts. Are there any examples or a guide somewhere?
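If the copy-and-customize route is kept, one way to narrow the review after an update is to compare only the stock directories of the two versions and look at the files that actually changed. The function below is a sketch under that assumption; its name and the idea of keeping a snapshot of the old stock health.d directory are ours, not something Netdata provides.

```shell
# List stock health files that were added or changed between two versions.
# $1: snapshot of the stock health.d directory our templates were based on
# $2: stock health.d directory of the newly installed version
changed_stock_health_configs() {
    old_dir=$1
    new_dir=$2
    for new_file in "$new_dir"/*.conf; do
        # Skip the unexpanded glob when the directory has no .conf files.
        [ -e "$new_file" ] || continue
        name=$(basename "$new_file")
        if [ ! -e "$old_dir/$name" ]; then
            echo "added: $name"
        elif ! cmp -s "$old_dir/$name" "$new_file"; then
            echo "changed: $name"
        fi
    done
}
```

It could be called as changed_stock_health_configs /path/to/old-snapshot /usr/lib/netdata/conf.d/health.d, where the snapshot path is a placeholder. Only the listed files then need a manual comparison with the custom templates.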
That sounds better than expected. Regarding config files: if I keep the stock config where it is, untouched, and create my own template file with a different filename, I can overwrite individual templates with my own version without losing anything from the other stock templates. @ilyam8, I hope I got this right; I’m just trying to rephrase what you explained two comments ago.
I’ll also revisit the alarms and templates documentation that you linked to. I had been there before but had issues understanding it all. I may well come back here with more specific questions then.
Yes, you got it right. Use a different filename, but give the alarm/template in it the same name as in the stock config file => this overwrites that specific alarm/template.
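As a sketch of that approach, assuming the stock cpu.conf stays untouched: the file below has a hypothetical name, reuses the template name 10min_cpu_usage from the stock file, and changes only the to line. The body is abbreviated from memory and the role customer_acme is made up, so copy the actual definition from the stock file of your version before changing it.

```conf
# /etc/netdata/health.d/custom_cpu.conf  (hypothetical filename, not cpu.conf)
# Same template name as in the stock file, so only this one entity is overwritten.
template: 10min_cpu_usage
      on: system.cpu
  lookup: average -10m unaligned of user,system,softirq,irq,guest
   units: %
   every: 1m
    warn: $this > (($status >= $WARNING)  ? (75) : (85))
    crit: $this > (($status == $CRITICAL) ? (85) : (95))
      to: sysadmin customer_acme
```

All other templates in the stock cpu.conf keep applying as shipped, so they continue to pick up changes from Netdata updates.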
Just to let you all know, we’ve achieved what we needed to with the help from Netdata people in the chat, thanks a lot. As we do everything in Ansible, we’ve also documented our approach and you can read about it at Ansible Role NetData - DevOps Tools
@ilyam8 that’s a great move! It means we have to adjust our Ansible scripts when we update to that new version eventually, but it makes things more readable and avoids confusion, I agree.
@OdysLam I’d be happy to write a little guide, but I wouldn’t want to replicate the source code repository on GitHub, because our code is already publicly available on our GitLab instance. So if it’s OK with you that the code stays where it currently is and the guide links to it, then that’s OK for us too, and we could provide you with something by the end of the week.