Netdata Community

Dynamically register a custom app to monitor

It would be amazing to be able to monitor a custom application together with all of its subprocesses, and to export the recorded metrics for that application only.
The mechanism currently available through apps_groups.conf is not well suited to this.

Problem:

Imagine a user wants to know the memory and CPU requirements of a program. The host he is using is a multi-user system, and he already has one or several instances of that program running on it. At the same time, other users might also be running instances of that program on the same host.
In that case, defining an app group for that program will not help, because the group would cover every running instance rather than the one of interest. The program might also run for a long time, so it would be good to have the option to extend the time span over which data is collected, so that it is not necessary to save the data periodically. As only a very limited number of metrics is involved, extending the time span would do no harm.

Use case:

A user needs to know the resources an application requires in order to run it on a cluster. For example, asking for too much memory might reduce the number of jobs running in parallel, because there is not enough memory to use all available CPUs. On the other hand, asking for too little memory will get the program killed at some point. A program might be able to use several CPUs and therefore gain some speed-up from using more than one, but it cannot always utilize them fully and might be limited to a maximum number. On recent hardware with a huge number of CPUs, having data about the number of CPUs a program uses over time makes it possible to calculate the optimal number of CPUs per instance, so that running several instances in parallel utilizes the system in the most efficient way.
Having the possibility to dynamically define a custom app for a process ID and all of its subprocesses would be amazing.
For example, one could use the API to register a PID and a NAME. The web interface would then display a menu item such as "custom app" with a sub-entry for the NAME. It should also be possible to use the API to get all collected data concerning only that process ID and all of its subprocesses. In case the process runs longer than the default time for which data is stored, there should be a parameter to define a custom maximum range (for example 7 days instead of the default 24h).
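
To make "a process ID and all of its subprocesses" concrete, here is a minimal bash sketch of how such a process tree could be walked and its resident memory summed. It is not a Netdata feature and not how apps.plugin works. The function names descendants and tree_rss_kib are hypothetical, and the sketch assumes that pgrep and ps (procps) are available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only (not part of Netdata).

# Print a PID followed by all of its descendants, one PID per line.
descendants() {
    local pid=$1 child
    echo "$pid"
    # pgrep -P lists the direct children of the given parent PID.
    for child in $(pgrep -P "$pid"); do
        descendants "$child"
    done
}

# Sum the resident set size (in KiB) over a whole process tree.
tree_rss_kib() {
    local total=0 rss pid
    for pid in $(descendants "$1"); do
        # "rss=" suppresses the header; processes that have already
        # exited yield an empty value, which is counted as 0.
        rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
        total=$(( total + ${rss:-0} ))
    done
    echo "$total"
}
```

Calling tree_rss_kib "$PID2watch" in a loop while the program runs would give a rough memory profile for that one instance only, which is exactly what a name-based app group cannot provide when several instances are running.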

A usage example using a bash script could look like this:

PID2watch=$$
range=604800   # 7 days in seconds
NAME="CustomProgram1"
data="./${NAME}.netdata.zip"
# monitor the process using netdata (proposed endpoint, does not exist yet)
curl -X GET "http://${HOSTNAME}:19999/api/v1/monitorcustomapp?pid=${PID2watch}&name=${NAME}&range=${range}"
# run the program
myCustomProgram1
# save the data from netdata (proposed endpoint, does not exist yet)
curl -X GET "http://${HOSTNAME}:19999/api/v1/getcustomapp?pid=${PID2watch}&name=${NAME}" -H "accept: application/zip" > "${data}"

Hello @koebi001,

Firstly, welcome to our community!

Thank you for your suggestion. I think @Manos_Saratsis and @dim08 will be interested in reading it and thinking it over.

What I can tell you is that next year we will have charts with specific information about CPU usage and hard disk I/O. I am currently working on latency charts that will appear in the apps submenu and give some idea of the resources a piece of software is consuming. We will deliver this through the integration between ebpf.plugin and apps.plugin.

Best regards!