GPU monitoring in Google Colab cluster

Hi all, I would like to monitor GPU usage through Netdata. I've successfully added the Colab instance as a node in Netdata.

I’m following this tutorial: Netdata GPU collector

nvidia-smi is working on the Google Colab free version and the GPU is enabled.

The tutorial lists as a requirement that I modify the Netdata configuration file in the /etc/netdata/python.d folder, but this folder is empty; it does not contain the nvidia_smi.conf file either.

Moreover, from Google Colab I can't run the edit-config script contained in the /etc/netdata folder. As a workaround, can I configure it manually from the Google Colab interface?

Here's a screenshot showing what I mean by configuring the file manually:

In conclusion, I can't integrate GPU monitoring with Netdata using the Google Colab free version as a node.

The edit-config script is a quick shortcut: it locates the shipped file under the /etc/netdata/orig directory (actually a symlink), copies it to the correct place under /etc/netdata, and lets you edit that copy. You can do all of that manually.

DESKTOP-C7OKV71:/etc/netdata# ls -l orig
lrwxrwxrwx    1 root     root            23 Dec  7 00:36 orig -> /usr/lib/netdata/conf.d
DESKTOP-C7OKV71:/etc/netdata# ls -l /usr/lib/netdata/conf.d/
total 132
-rw-r--r--    1 netdata  root         13832 Dec  7 00:36 apps_groups.conf
drwxr-xr-x    2 netdata  root          4096 Dec  7 00:36 charts.d
-rw-r--r--    1 netdata  root          1550 Dec  7 00:36 charts.d.conf
drwxr-xr-x    2 netdata  root          4096 Dec  7 00:36 ebpf.d
-rw-r--r--    1 netdata  root          2909 Dec  7 00:36 ebpf.d.conf
-rw-r--r--    1 netdata  root          2785 Dec  7 00:36 exporting.conf
-rw-r--r--    1 netdata  root          1233 Dec  7 00:36 fping.conf
drwxr-xr-x    2 netdata  root          4096 Nov 28 09:02 go.d
-rw-r--r--    1 netdata  root          1720 Nov 28 09:02 go.d.conf
drwxr-xr-x    2 netdata  root          4096 Dec  7 00:36 health.d
-rw-r--r--    1 netdata  root         47824 Dec  7 00:36 health_alarm_notify.conf
-rw-r--r--    1 netdata  root            57 Dec  7 00:36 health_email_recipients.conf
-rw-r--r--    1 netdata  root          1010 Dec  7 00:36 ioping.conf
drwxr-xr-x    2 netdata  root          4096 Dec  7 00:36 python.d
-rw-r--r--    1 netdata  root          1551 Dec  7 00:36 python.d.conf
drwxr-xr-x    2 netdata  root          4096 Dec  7 00:36 statsd.d
-rw-r--r--    1 netdata  root          9789 Dec  7 00:36 stream.conf
DESKTOP-C7OKV71:/etc/netdata# ls -l /usr/lib/netdata/conf.d/python.d/nvidia_smi.conf
-rw-r--r--    1 netdata  root          2826 Dec  7 00:36 /usr/lib/netdata/conf.d/python.d/nvidia_smi.conf

So in my case, I can just
cp /usr/lib/netdata/conf.d/python.d/nvidia_smi.conf /etc/netdata/python.d/nvidia_smi.conf
and do the edit there.
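
If you need to do this from the Google Colab interface rather than a shell, a notebook cell along these lines should work. This is only a sketch: the paths come from the listing above, and the restart command in particular is an assumption about how Netdata was started on that instance.

# Run from a Colab cell; the leading "!" executes a shell command.
# 1. Copy the stock collector config into the live config directory
!cp /usr/lib/netdata/conf.d/python.d/nvidia_smi.conf /etc/netdata/python.d/nvidia_smi.conf

# 2. Confirm the GPU is visible to the tool the collector wraps
!nvidia-smi -L

# 3. Restart the agent so it picks up the change (adjust to however
#    Netdata is actually run on your node; "service" may not exist in Colab)
!service netdata restart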

Now I've understood how to manage Netdata files. However, I've followed this tutorial https://blog.ronin.cloud/netdata/ and now have a copy of the python.d.conf file in the /etc/netdata folder, where I've deleted the # in front of nvidia_smi (please note that nvidia-smi is actually working on my node).
I restarted the netdata service, but unfortunately the dashboard on the right side does not show any nvidia_smi label :confused:
I just want to monitor my GPU; sorry, but I need help again.

I've just figured out that Netdata detects the Google Colab node as a "Docker container", and the original Netdata tutorial says that it's not possible to use GPU monitoring in a Docker container.

Yes, the nvidia_smi collector doesn’t work if Netdata runs in a Docker container.
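
For completeness, on a node where Netdata is not containerized you can test the module itself by running the python.d plugin in debug mode. The plugin path below assumes a standard install and may differ on yours:

# run as the same user the agent uses, so permissions match
sudo su -s /bin/bash netdata

# run only the nvidia_smi module once, with verbose output
/usr/libexec/netdata/plugins.d/python.d.plugin nvidia_smi debug trace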

I’ve deleted the # in nvidia_smi.

You need to uncomment it and set the value to yes.
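
For reference, after that edit the relevant line in /etc/netdata/python.d.conf should end up looking roughly like this (a sketch; the surrounding lines in your copy may differ):

# /etc/netdata/python.d.conf (excerpt)
nvidia_smi: yes

Keep in mind the earlier point still applies, though: if Netdata itself runs inside a Docker container, as it does on the Colab node, the collector won't produce charts even with this set.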