A couple of days ago I started a new container-based project and naturally, I installed Netdata to be able to have per-second metrics with minimal effort.
This was crucial, as the project concerned Urbit, a fascinating project that wants to reinvent the personal computer. We won’t go into detail, but it’s a sort of VM that I deployed on my Raspberry Pi using balena.
Balena is a container-based platform for managing IoT devices and the lifecycle of their applications, so I deployed two containers on my Raspberry Pi: one running Urbit and one running Netdata on Docker.
To deploy Netdata, I defined a docker-compose file based on the documentation and a Dockerfile that used netdata/netdata as base.
Why Modify the Netdata Dockerfile?
Netdata’s Dockerfile is robust, created by our own @Austin_Hemmelgarn, but if you want to customize your Netdata installation, you will need to use our Dockerfile as a base and create your own.
Customizing the Dockerfile is a great way to add custom configuration for Netdata, collectors, and alarms. This configuration will be copied every time you build the container, making it a great choice for automation. SSHing into the container to use ./edit-config is far from ideal.
Another great reason to do this is the ability to load custom software into the container. For example, you might want to have a proper ssh server inside the container, so that you can ssh into it remotely.
How to fix container name resolution
It’s an issue that we see from time to time: Netdata fails to translate the container_id into the human-readable container name, making the integration considerably harder to use.
This boils down to a particular script that Netdata uses, named cgroup-name.sh. This script is run by the Netdata Agent and communicates over HTTP or a Unix socket with various container daemons in order to find the name.
In our case, it communicates with the docker daemon, over the docker socket.
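As a rough illustration of that exchange, the sketch below asks the Docker Engine API for a container's details over the socket and extracts the name from the JSON reply. It is not the code that cgroup-name.sh runs: the helper container_name_from_json is hypothetical, <container_id> is a placeholder, and the text matching assumes the first "Name" field in the reply is the container's own name.

```shell
#!/bin/sh
# Hypothetical helper: read the JSON of a Docker "inspect container" reply
# from stdin and print the first "Name" value without its leading slash.
# Plain text matching is fragile; it stands in for a real JSON parser here.
container_name_from_json() {
    grep -o '"Name": *"[^"]*"' | head -n 1 | cut -d '"' -f 4 | sed 's#^/##'
}

# Example (needs curl and read access to the socket):
#   curl -s --unix-socket /var/run/docker.sock \
#       "http://localhost/containers/<container_id>/json" | container_name_from_json
```

If the curl call fails with a permission error when run as the netdata user, that is the same access problem that breaks name resolution.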
As per the documentation, there are a bunch of different ways to tackle this. I chose an option that is both somewhat secure and somewhat easy: adding the netdata user to the docker group, thus allowing netdata to use the socket and get the information it needs.
The problem is that in order to do that, we need to define the PGID of the docker group in the docker-compose.yml file (check the image above).
And here lies the gotcha.
In the Netdata docs, we use /etc/group as the source of truth to find the PGID of the docker group. If we look at the startup script of the Netdata container, it creates the docker group and proceeds to add the netdata user to it.
Thus, when we read /etc/group, we read the value that we in fact populated ourselves, by defining the PGID and then running the default ENTRYPOINT of the Netdata container.
It’s a self-fulfilling prophecy.
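To make the circularity concrete, here is a minimal sketch of a group-file lookup. The function name gid_from_group_file is an assumption for illustration; the point is that it can only report whatever was written into the file it reads, including a value we put there ourselves.

```shell
#!/bin/sh
# Print the GID recorded for a group name in an /etc/group-style file.
# $1 = group name, $2 = path to the group file (defaults to /etc/group).
gid_from_group_file() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "${2:-/etc/group}"
}

# Example: gid_from_group_file docker
```

Run inside the Netdata container, this would return the PGID that was passed in at startup, which says nothing about the group that actually owns the socket.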
How does container name resolution happen in Netdata?
As we have said, Netdata needs access to the docker socket that lives in /var/run/docker.sock. This is the endpoint that the cgroup-name.sh script uses to translate IDs to names.
Thus, what we really want is for the netdata user to belong to the same group that owns that file.
Thus, the best way to tackle container name resolution is:
- Run the Netdata container without defining a PGID.
- Open a shell inside the container (e.g. docker exec -it <container> /bin/bash).
- Run ls -l /var/run. For docker.sock, the group column shows a number when no group inside the container has that ID; that number is the PGID we are searching for.
- Go back to our docker run command or docker-compose.yml and set the PGID.
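As an alternative to reading the ls -l columns by eye, the numeric group ID of the socket can be printed directly. This is a small sketch, assuming a stat that supports the -c option (GNU coreutils and BusyBox do; BSD and macOS stat use different flags). The function name socket_gid is made up for this example.

```shell
#!/bin/sh
# Print the numeric GID of the group that owns a file,
# for example a container engine socket.
socket_gid() {
    stat -c '%g' "$1"
}

# Example, from a shell inside the container:
#   socket_gid /var/run/docker.sock
```

The number it prints is the value to use as the PGID.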
In the image below, the PGID has been set correctly. This means that the PGID of the docker group and the PGID of the group that owns docker.sock are the same.
Thus, ls -l will not output a PGID, but instead the name of the group which has that PGID, in our case docker.
In conclusion
- If name resolution doesn’t work, it’s probably because the netdata user can’t access the socket.
- To fix this, run ls -l /var/run/ and find docker.sock. If the group is not properly set, it will have a number instead of docker. That number is the PGID. Note that alternative container solutions could have docker-compatible sockets with different names. For example, balena has balena-engine.sock. You will need to define this in the docker-compose.yml file, the custom Dockerfile or docker run.
- Go back to docker-compose/docker run and replace the PGID.
- Build the container(s) again.
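Since the socket name depends on the container engine, a small helper can look for the known candidates before reading the group ID. This is only a sketch: the function name find_engine_socket is hypothetical, and the candidate list holds just the two socket names mentioned in this post.

```shell
#!/bin/sh
# Print the path of the first engine socket found in a directory
# (defaults to /var/run), or return 1 if none of the candidates exists.
# The check uses -e (path exists); swap it for -S to require a real socket.
find_engine_socket() {
    dir="${1:-/var/run}"
    for name in docker.sock balena-engine.sock; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Example, from a shell inside the container:
#   stat -c '%g' "$(find_engine_socket)"
```

The printed path is what gets mounted into the Netdata container, and the group ID of that path is the PGID to set.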
What do you think of Netdata + Containers? What would you like us to improve?
Comment below