Netdata Community

Alert for another docker container crash

Hi,
I’d like to know how I can create an alert that fires when another Docker container on my Docker server has crashed and stopped.

I can already see all the metrics for all my other docker containers.

Thank you


Yes, absolutely.

Check out this topic for a number of very helpful resources on configuring alarms. What you really want is to create an alarm using the $last_collected_t variable.

So, if the container is killed, then that variable will have the timestamp of the last time when Netdata was able to collect data from that container.
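To make the idea concrete, here is a minimal Python sketch of the staleness check that such an alarm would perform. It only illustrates the logic and is not Netdata's alarm syntax; the function names and the `max_missed` threshold are hypothetical, and `last_collected_t` stands in for the value of $last_collected_t.

```python
import time


def seconds_since_last_collection(last_collected_t, now=None):
    """Return how many seconds have passed since the last successful collection."""
    if now is None:
        now = time.time()
    return now - last_collected_t


def is_stale(last_collected_t, update_every=1, max_missed=5, now=None):
    """Treat the chart as stale once more than `max_missed` collection
    intervals have gone by without new data (hypothetical threshold)."""
    age = seconds_since_last_collection(last_collected_t, now)
    return age > max_missed * update_every
```

For example, with a 1-second collection interval, a chart last collected 10 seconds ago would count as stale, while one collected 3 seconds ago would not.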

What I would do is use those resources to understand how the alarm syntax works (plus the reference documentation at Alarm notifications · Netdata Agent | Learn Netdata), and then create an alarm just like the apache one shown in the forum topic I linked before. Instead of the apache chart, I would use a chart from that particular container.

That should work. Pinging @ilyam8 to verify my thinking.

Cheers!

Hi @zakimakarena

It is possible to alert on any stopped container, but I am not sure that is a good idea: you may have stopped it deliberately (docker stop ...).

Alerting when a container has crashed sounds good. But how do we detect a container crash?

Not a specialist, just guessing here

[ilyam@pc ~]$ docker ps -a
CONTAINER ID   IMAGE                                           COMMAND                  CREATED          STATUS                      PORTS     NAMES
8c398b978036   nginx                                           "/docker-entrypoint.…"   27 minutes ago   Exited (0) 10 minutes ago             pedantic_johnson
a6f1754d5361   gcr.io/k8s-minikube/kicbase:v0.0.15-snapshot4   "/usr/local/bin/entr…"   3 weeks ago      Exited (137) 4 days ago               minikube

Perhaps when its status is Exited and its exit code != 0?
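As a rough illustration of that check, the sketch below pulls the exit code out of a STATUS string like the ones in the docker ps -a output above. The function names are hypothetical, and it assumes the status always has the form "Exited (N) ..." for stopped containers.

```python
import re

# Matches the start of a STATUS value such as "Exited (137) 4 days ago".
_EXITED = re.compile(r"^Exited \((\d+)\)")


def exit_code_from_status(status):
    """Return the exit code from a `docker ps -a` STATUS string,
    or None if the container is not in the Exited state."""
    match = _EXITED.match(status)
    return int(match.group(1)) if match else None


def exited_with_error(status):
    """True when the container has exited with a non-zero exit code."""
    code = exit_code_from_status(status)
    return code is not None and code != 0
```

With the two containers listed above, the nginx one ("Exited (0) ...") would not be flagged, while minikube ("Exited (137) ...") would.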

We have the dockerd python collector. It connects to the Docker Engine API, which means it has access to all the metadata (container id, name, image, status, exit code, etc.). Perhaps we could extend it.

The first question we need to answer: how do we detect a container crash?

@ilyam8

I’m no expert on Docker, but from my experience, this is how I would decide that a container has crashed:

  1. The Docker event should be die or oom (see docker events | Docker Documentation).
  2. The container exit code should be 1 or 139 (see https://medium.com/better-programming/understanding-docker-container-exit-codes-5ee79a1d58f6).
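
The two criteria above could be sketched in Python like this. It is only one possible reading: it assumes both conditions must hold at the same time, and the function and constant names are hypothetical rather than part of any Netdata collector.

```python
# Docker event types treated as crash signals (criterion 1).
CRASH_EVENTS = {"die", "oom"}

# Exit codes treated as crashes (criterion 2):
# 1 is a generic application error, 139 is 128 + 11 (SIGSEGV).
CRASH_EXIT_CODES = {1, 139}


def looks_crashed(event, exit_code):
    """Return True if both criteria hold (assumption: they are combined with AND)."""
    return event in CRASH_EVENTS and exit_code in CRASH_EXIT_CODES
```

Keeping the events and exit codes in sets makes it easy to adjust the definition of a crash, for example by adding other exit codes, without touching the check itself.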