docker_unhealthy_containers

Docker is an open-source containerization platform. It enables developers to package applications into containers: standardized executable components that combine application source code with the operating system (OS) libraries and dependencies required to run that code in any environment.

Sometimes the application inside a running container crashes or stops responding. To detect those events, container runtimes (CR) and orchestrators run health checks against endpoints inside the container. A container marked as unhealthy by the CR is malfunctioning and should be stopped. These health checks are defined by the creator of the container with the HEALTHCHECK instruction [1].
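As an illustration of how a health check can be attached to a container, here is a minimal sketch that uses the `docker run` health flags instead of a HEALTHCHECK line in the Dockerfile. The helper name `run_with_healthcheck`, the probe command, and the timing values are assumptions chosen for the example. The probe also assumes `curl` is installed in the image and that the application answers on port 80.

```shell
# Hypothetical helper: start a container with a health check defined at run time.
# $1 = container name, $2 = image (both supplied by the caller).
run_with_healthcheck() {
    # The probe runs inside the container; a non-zero exit counts as a failure.
    # After 3 consecutive failures Docker marks the container as unhealthy.
    docker run -d --name "$1" \
        --health-cmd 'curl -f http://localhost/ || exit 1' \
        --health-interval 30s \
        --health-timeout 3s \
        --health-retries 3 \
        "$2"
}
```

For example, `run_with_healthcheck web nginx` would start a container named web. Its status then appears in the STATUS column of `docker ps` as starting, healthy, or unhealthy.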

The Netdata Agent monitors the average number of unhealthy Docker containers over the last 10 seconds. This alert indicates that some containers are failing their health checks.

This alert is raised to warning when at least one container in your Docker engine is unhealthy.

References and sources
  1. HEALTHCHECK instruction in Docker docs

Troubleshooting section

Inspect and restart the UNHEALTHY container
  1. Check all the containers in the system.

    root@netdata # docker ps -a
    
  2. Find the NAME of the container that is marked as UNHEALTHY.

  3. Check the logs of this container to get some insight into what is going wrong.

    root@netdata # docker logs <UNHEALTHY_CONTAINER>
    

    In many cases, your app’s logs may not appear in the Docker log collector. A simple workaround is to redirect your app’s logs to stderr; use this workaround deliberately. Another workaround is to send any log output directly to /proc/self/fd/2.

  4. Restart the container and see if this fixes the problem.

    root@netdata # docker restart <UNHEALTHY_CONTAINER>
    
  5. If you receive this alert often, you may need to investigate further why this event occurs.
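The steps above can be partly scripted. The following is a minimal sketch, assuming the `docker` CLI is available and the current user is allowed to talk to the Docker daemon. The function names are hypothetical helpers invented for this example. It relies on the `health` filter of `docker ps` and on the `.State.Health` field reported by `docker inspect`.

```shell
# List the names of all containers whose health status is "unhealthy".
list_unhealthy() {
    docker ps -a --filter "health=unhealthy" --format '{{.Names}}'
}

# Print the recorded health state (status, failing streak, recent probe output)
# of a single container as JSON. $1 = container name.
show_health() {
    docker inspect --format '{{json .State.Health}}' "$1"
}

# For every unhealthy container, print its name followed by its health details.
report_unhealthy() {
    list_unhealthy | while IFS= read -r name; do
        [ -n "$name" ] || continue
        echo "== $name =="
        show_health "$name"
    done
}
```

After pasting these functions into a shell, running `report_unhealthy` prints one block per unhealthy container. The probe output in the JSON is often the quickest hint as to why the health check fails, before you look at `docker logs`.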

Hi guys,

Sorry if this question is documented somewhere, but where can I see which container is in this status?

It has not been easy for me to find it in the notification I receive:

Hi, @Juanra. Unfortunately, Netdata does not collect the health status of individual containers, so it is not possible to know which container is unhealthy from the Netdata user interface (and alarms).

Found the command for it: ./edit-config health.d/dockerd.conf. It is confusing that we edit docker.conf to configure the Docker plugin, but dockerd.conf to configure the Docker health alert.

How do you modify this alert so it is not too sensitive? Can it be averaged over a longer period of time, or can the threshold value be changed?

Per container health metrics implemented in docker: add per-container stats by ilyam8 · Pull Request #1148 · netdata/go.d.plugin · GitHub

How do we use this feature?

It is enabled by default and available in both the latest stable and nightly releases. Netdata creates a “Docker container health status” chart for every Docker container, plus an alarm for unhealthy containers.

Can I exclude one or more specific containers from sending unhealthy alerts? I have one with a VPN, and sometimes it restarts if not connected. How can I exclude it?

I think it would be easy enough to define a silencing rule in Netdata Cloud for the alert instance of that specific container: Cloud alert notifications | Learn Netdata.

@ilyam8 do you know if there is another way to essentially disable it for some subset of containers, via the on or chart_labels lines maybe?

What if I had configured alerts for Telegram at the node level?

chart_labels maybe

Yes, we have it in our documentation - chart_labels description with examples.

chart labels: container_name=!NameToExclude *

Cool, which file should I edit?

Add the chart labels line to this, I think.

/health.d/docker.conf
