Our Setup
We’re using Docker Swarm to deploy a whole stack of containers to a variety of nodes, some in a datacenter and some in the cloud. Netdata runs as one of these containers, so every node has one netdata-collector (the child), and a leader node has a netdata-parent. We do not expose the port to external networks; Netdata is only reachable from within the Docker Swarm network.
To get notified when any of the nodes goes offline (we are aware that the parent here is a SPOF), we recently added all nodes to Netdata Cloud and connected its notifications to our Slack. We have a hybrid approach: the parent keeps a historic backlog of the metrics and is accessible to many people, whereas Cloud is only accessible to a few with administrative privileges.
Our deployment removes the whole stack, including the Netdata children and the parent, and then recreates it. This results in many notifications that all our nodes are down. As this is expected, we are looking for a way to silence the notifications temporarily before we deploy and, if needed, re-enable them once the deployment has succeeded, to cut down on the noise.
We found this and this, but we do not want to use the Netdata Cloud UI. We are looking for a solution that can be automated in our CI or be a CLI command (e.g., curl), as we deploy multiple times a day and, as implied before, not everyone who is permitted to deploy has this kind of access to our metrics and monitoring.
Question
What is the correct/intended way to temporarily disable alerts in Netdata Cloud? Both links above provide a way to tell a single NODE not to send alerts, but this is fruitless if the node won’t exist in the next moment. I guess we’d have to use some Cloud API (if there is one) to indicate that it’s temporarily OK for these nodes to be gone, since that is also where the alerts come from.
Unfortunately, either I have missed the relevant documentation, or there is a lack of it for this set of features.
Is there a way to temporarily, programmatically disable alerts for a single node, a list of nodes, a single room, or a list of rooms? Preferably with an optional time range (e.g., disable for 15 minutes) so I don’t have to manually remove the silencing afterwards, assuming the deploy goes through without problems.
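For context, this is roughly what the node-level approach from the Health API docs looks like when scripted. It is only a sketch to show why it does not fit our case: it talks to one Agent's health management endpoint, not to Netdata Cloud. The host name netdata-collector, the default port 19999, and the token value are assumptions for illustration; the helper names are made up.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_health_request(base_url: str, token: str, cmd: str) -> Request:
    """Build (but do not send) a request to the Agent health management API.

    cmd is one of the documented commands, e.g. "SILENCE ALL",
    "DISABLE ALL" or "RESET".
    """
    # quote_via=quote encodes the space as %20 rather than "+".
    query = urlencode({"cmd": cmd}, quote_via=quote)
    return Request(
        f"{base_url}/api/v1/manage/health?{query}",
        # The Agent expects its API key in this header.
        headers={"X-Auth-Token": token},
    )


def send_health_command(base_url: str, token: str, cmd: str) -> int:
    """Send the command to a single Agent and return the HTTP status code."""
    with urlopen(build_health_request(base_url, token, cmd), timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical values: service name taken from our stack, placeholder token.
    request = build_health_request(
        "http://netdata-collector:19999", "my-api-token", "SILENCE ALL"
    )
    print(request.full_url)
```

Since this setting lives on the Agent that is about to be removed, and the "node unreachable" notification is raised by Cloud, silencing the Agent this way before a deploy does not seem to help.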
Relevant Docs You Followed
The links mentioned above:
- Health API Calls | Learn Netdata
- How to mute alerts during maintenance windows or scheduled backups? | Netdata
Environment/Browser/Agent’s Version etc.
- Debian 12
- Docker 24
- Netdata Agent 1.45 oci