A couple of days ago I started a new container-based project and, naturally, I installed Netdata to get per-second metrics with minimal effort.
This was crucial, as the project concerned Urbit, a fascinating project that wants to re-invent the personal computer. We won’t go into detail, but it’s a sort of VM that I deployed on my Raspberry Pi using balena.
Balena is a container-based platform to manage IoT devices and the lifecycle of their applications, so I deployed 2 containers on my Raspberry Pi: a container running Urbit and a container running Netdata on Docker.
To deploy Netdata, I defined a docker-compose file based on the documentation and a Dockerfile that used `netdata/netdata` as base.
Netdata’s Dockerfile, created by our own @Austin_Hemmelgarn, is robust, but if you want to customize your Netdata installation, you will need to create your own Dockerfile that uses ours as a base.
Customizing the Dockerfile is a great way to add custom configuration for Netdata, collectors, and alarms. This configuration will be copied into the image every time you build the container, making it a great choice for automation. SSHing into the container to use `./edit-config` is far from ideal.
Another great reason to do this is the ability to load custom software into the container. For example, you might want to have a proper `ssh` server inside the container, so that you can ssh into it remotely.
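As a rough illustration of the idea, the sketch below writes a minimal custom Dockerfile that layers one local config file on top of the official image. The function name `write_netdata_dockerfile`, the file name `netdata.conf`, and the target path `/etc/netdata/netdata.conf` are assumptions for the example, not taken from my actual setup.

```shell
#!/bin/sh
# Write a minimal Dockerfile that extends the official Netdata image.
# $1 = directory that will hold the Dockerfile (hypothetical build context).
write_netdata_dockerfile() {
    target_dir="$1"
    mkdir -p "$target_dir"
    cat > "$target_dir/Dockerfile" <<'EOF'
FROM netdata/netdata
# Copy our own configuration over the image defaults (assumed config path).
COPY netdata.conf /etc/netdata/netdata.conf
EOF
}
```

You would then put your own `netdata.conf` in that directory and build it, for example with `docker build -t my-netdata ./netdata` (the tag and directory are placeholders).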
It’s an issue that we see from time to time: Netdata fails to translate the `container_id` into the human-readable container name, making the integration considerably harder to use.
This boils down to a particular script that Netdata uses, named `cgroup-name.sh`. The Netdata Agent runs this script, which communicates with various container daemons over a Unix socket in order to find the name.
In our case, it communicates with the Docker daemon over the Docker socket.
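To make this concrete, here is a minimal sketch of the kind of lookup involved: asking the Docker Engine API for a container's name over the Unix socket. It is not the code from `cgroup-name.sh`; the function names are made up for this example, and it assumes `curl` and `jq` are installed.

```shell
#!/bin/sh
# Extract the container name from the JSON returned by GET /containers/<id>/json.
# Docker reports names with a leading slash ("/urbit"), which we strip.
name_from_inspect_json() {
    jq -r '.Name' | sed 's|^/||'
}

# Ask the Docker Engine API for a container's name over the Unix socket.
# $1 = container id, $2 = socket path (defaults to the usual Docker location).
container_name() {
    curl --silent --unix-socket "${2:-/var/run/docker.sock}" \
        "http://localhost/containers/$1/json" | name_from_inspect_json
}
```

If the calling user cannot read the socket, the `curl` call fails, and there is no name to show. That is the failure this post is about.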
As per the documentation, there are a bunch of different ways to tackle this. I chose an option that is both somewhat secure and somewhat easy: add the `netdata` user to the `docker` group, thus allowing Netdata to use the socket and get the information it needs.
The problem is that in order to do that, we need to define the `PGID` of the `docker` group in the `docker-compose.yml` file (see the image above).
And here lies the gotcha.
In the Netdata docs, we use `/etc/group` as the source of truth to find the PGID of the `docker` group. But if we look at the startup script of the Netdata container, it creates the `docker` group and then adds the `netdata` user to it.
Thus, when we read `/etc/group`, we read the value that we in fact populated ourselves, by defining the `PGID` and then running the default `ENTRYPOINT` of the Netdata container.
It’s a self-fulfilling prophecy.
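For reference, this is roughly what reading the group ID out of an `/etc/group`-style file looks like. The helper name `group_gid` is an assumption for this sketch. Run against the container's own `/etc/group`, it can only echo back whatever `PGID` the entrypoint wrote there, which is why it proves nothing about the socket.

```shell
#!/bin/sh
# Print the numeric GID recorded for a group in an /etc/group-style file.
# $1 = group name, $2 = file to read (defaults to /etc/group).
# Prints nothing if the group is not listed.
group_gid() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "${2:-/etc/group}"
}
```

For example, `group_gid docker` prints the third field of the `docker:` line, if there is one.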
As we have said, Netdata needs access to the Docker socket that lives in `/var/run/docker.sock`. This is the endpoint that the `cgroup-name.sh` script uses to translate IDs to names.
So what we really want is for the `netdata` user to belong to the same group that owns that file.
Thus, the best way to tackle container name resolution is:
- Run the Netdata container without defining a PGID.
- Open a shell inside the container (e.g. `docker exec -it <container> /bin/bash`).
- Run `ls -l /var/run`. The second number in the `docker.sock` row, in the group column, is the PGID that we are searching for.
- Go back to our `docker-compose.yml` and modify the PGID.
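The lookup in the steps above can also be done without reading `ls` columns by eye. This is a small sketch, assuming a `stat` that accepts `-c` (GNU coreutils and BusyBox do; BSD and macOS use different flags). The helper name `file_gid` is made up for the example.

```shell
#!/bin/sh
# Print the numeric GID of the group that owns a file, e.g. the Docker socket.
# $1 = path to inspect.
file_gid() {
    stat -c '%g' "$1"
}
```

Inside the container, `file_gid /var/run/docker.sock` prints the number to use as the PGID. From the host, something like `docker exec <container> stat -c '%g' /var/run/docker.sock` should give the same answer, where `<container>` is a placeholder for your container's name.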
In the image below, the PGID has been set correctly. This means that the PGID of the `docker` group and the PGID of the group that owns `docker.sock` are the same. In that case, `ls -l` does not output a PGID, but instead the name of the group that has that PGID.
- If name resolution doesn’t work, it’s probably because the `netdata` user can’t access the socket.
- To fix this, run `ls -l /var/run/` and find `docker.sock`. If the group is not properly set, it will show a number instead of `docker`. That number is the PGID. Note that alternative container solutions could have Docker-compatible sockets with different names. For example, balena has `balena-engine.sock`. You will need to define this in the `docker-compose.yaml` file, the custom Dockerfile or
- Go back to your docker-compose file or `docker run` command and replace the PGID.
- Build the container(s) again.
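After rebuilding, a quick sanity check is to confirm that the user really is a member of the group that owns the socket. The sketch below is one way to do that. Both function names are hypothetical, and it again assumes a `stat` that supports `-c`.

```shell
#!/bin/sh
# Succeed if a numeric GID appears in a space-separated list of GIDs.
# $1 = GID to look for, $2 = list of GIDs (as printed by `id -G`).
gid_in_list() {
    for g in $2; do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# Succeed if the user belongs to the group that owns the socket.
# $1 = user name, $2 = socket path.
user_in_socket_group() {
    gid_in_list "$(stat -c '%g' "$2")" "$(id -G "$1")"
}
```

Run inside the container as `user_in_socket_group netdata /var/run/docker.sock && echo ok`. On balena, you would pass the `balena-engine.sock` path instead.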
What do you think of Netdata + Containers? What would you like us to improve?