Hi,
I’d like to know how I can create an alert that fires if another Docker container on my Docker server has crashed and stopped.
I can already see all the metrics for all my other docker containers.
Thank you
Yes, absolutely.
Check out this topic for a number of very helpful resources on configuring alarms. What you really want is to create an alarm using the $last_collected_t variable.
So, if the container is killed, then that variable will have the timestamp of the last time when Netdata was able to collect data from that container.
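To make the idea concrete, here is a rough sketch of the arithmetic such an alarm would do. Netdata evaluates this in its own health configuration syntax, so the Python below only illustrates the logic. The function name and the 5x / 10x multipliers are my own assumptions, not values taken from this thread.

```python
def staleness_level(now, last_collected_t, update_every,
                    warn_factor=5, crit_factor=10):
    """Classify how stale a chart is, given the time of its last collection.

    now, last_collected_t: UNIX timestamps in seconds.
    update_every: expected collection interval in seconds.
    warn_factor, crit_factor: hypothetical thresholds, expressed as a
    number of missed collection intervals.
    """
    age = now - last_collected_t  # seconds since data was last collected
    if age > crit_factor * update_every:
        return "CRITICAL"
    if age > warn_factor * update_every:
        return "WARNING"
    return "CLEAR"


# A container that stopped reporting 20 seconds ago, with 1-second updates
print(staleness_level(now=100, last_collected_t=80, update_every=1))
```

The longer the container stays down, the larger the gap grows, so the alarm stays raised until data starts flowing again.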
What I would do is use those resources to understand how the alarm syntax works (plus the reference documentation at Agent alert notifications | Learn Netdata), and then create an alarm just like the Apache one shown in the forum topic I linked above. Finally, instead of using the Apache chart, I would use a chart from that particular container.
That should work. Pinging @ilyam8 to verify my thinking.
Cheers!
It is possible to alert on any stopped container. I am not sure that is a good idea, though: you can stop a container deliberately (docker stop ...).
Alerting when a container has crashed does sound good. But how do we detect container crashes?
Not a specialist, just guessing here
[ilyam@pc ~]$ docker ps -a
CONTAINER ID   IMAGE                                           COMMAND                  CREATED          STATUS                      PORTS   NAMES
8c398b978036   nginx                                           "/docker-entrypoint.…"   27 minutes ago   Exited (0) 10 minutes ago           pedantic_johnson
a6f1754d5361   gcr.io/k8s-minikube/kicbase:v0.0.15-snapshot4   "/usr/local/bin/entr…"   3 weeks ago      Exited (137) 4 days ago             minikube
Perhaps if its status is Exited and the exit code != 0?
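A minimal sketch of that heuristic, assuming the STATUS column always looks like "Exited (N) ..." for stopped containers, as in the output above. The function names are made up for illustration. Keep in mind that a non-zero code is only a hint: for example, 137 is also what you get when docker stop has to fall back to SIGKILL after its grace period.

```python
import re
from typing import Optional

# Matches the start of a STATUS value such as "Exited (137) 4 days ago"
_EXITED = re.compile(r"^Exited \((\d+)\)")


def exit_code_from_status(status: str) -> Optional[int]:
    """Return the exit code from a `docker ps -a` STATUS string, or None
    if the container is not in the Exited state."""
    match = _EXITED.match(status.strip())
    return int(match.group(1)) if match else None


def looks_crashed(status: str) -> bool:
    """Heuristic: the container has exited and the exit code is non-zero."""
    code = exit_code_from_status(status)
    return code is not None and code != 0


# The two containers from the listing above
print(looks_crashed("Exited (0) 10 minutes ago"))
print(looks_crashed("Exited (137) 4 days ago"))
```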
We have the dockerd python collector. It connects to the Docker Engine API, which means it has all the metadata (container ID, name, image, status, exit code, etc.). Perhaps we could extend it.
First question we need to answer - how do we detect a container crash?
I’m no expert on Docker, but from my experience I would consider a container crashed when it emits a die or oom event (see docker events | Docker Documentation).
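Here is a rough sketch of how that could be checked, assuming the events are read as one JSON object per line (for example from docker events --format '{{json .}}'). The function name is hypothetical, the sample events below are made up, and treating a die event with exit code 0 as "not a crash" is my own assumption.

```python
import json
from typing import Iterable, Iterator


def crash_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a small summary for every container event that looks like a crash.

    A crash is taken to be an `oom` event, or a `die` event whose
    exitCode attribute is non-zero (an assumption, see above).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Type") != "container":
            continue
        action = event.get("Action", "")
        attrs = event.get("Actor", {}).get("Attributes", {})
        name = attrs.get("name", "<unknown>")
        if action == "oom":
            yield {"name": name, "reason": "oom", "exit_code": None}
        elif action == "die":
            # exitCode arrives as a string attribute on die events
            code = int(attrs.get("exitCode", "0"))
            if code != 0:
                yield {"name": name, "reason": "die", "exit_code": code}


# Made-up sample events, for illustration only
sample = [
    '{"Type": "container", "Action": "die", "Actor": {"Attributes": {"name": "web", "exitCode": "0"}}}',
    '{"Type": "container", "Action": "die", "Actor": {"Attributes": {"name": "db", "exitCode": "137"}}}',
    '{"Type": "container", "Action": "oom", "Actor": {"Attributes": {"name": "cache"}}}',
    '{"Type": "network", "Action": "disconnect", "Actor": {"Attributes": {"name": "bridge"}}}',
]
for crash in crash_events(sample):
    print(crash)
```

In practice you would feed it the live event stream instead of a list, for instance by piping the docker events output into a script that passes sys.stdin to crash_events.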