Using nvidia-smi to monitor Quadro GPUs



  • Hi everybody,
    I have just discovered netdata and I already think it is a great tool.

    I installed it the recommended way using ‘curl’, and I have the latest stable branch on RHEL 7.8 with a Quadro P5200. nvidia-smi usually works like a charm, and a normal user can run it without elevated permissions.

    Inside the /etc/netdata/ folder, I ran:

    sudo ./edit-config python.d/nvidia_smi.conf
    

    The conf file was created with the following values:

    nvidia-hm:
    	name: 		nvidia-smi
    	update_every: 	1
    	priority: 	60000
    	penalty: 	no
    	autodetection_retry:	0
    	loop_mode: 	yes
    	poll_seconds: 	1
    

    After that, I restarted the netdata service using:

    sudo systemctl restart netdata
    

    … but nothing Nvidia-related appears on the dashboard. I cannot wrap my head around it. I must have missed something, but I have not found the answer either on GitHub or in the official documentation.

    Could someone please help me?
    Thanks in advance,

    Alexandre



  • Hi @eidal!

    You should also enable the nvidia_smi Python module in python.d.conf.
    To do so, run

    sudo ./edit-config python.d.conf

    and uncomment the "nvidia_smi: yes" line.
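
For anyone who wants to double-check that the edit took effect, here is a rough sketch in Python. It assumes the module is toggled by a line of the form "nvidia_smi: yes" and that a leading "#" comments it out; the helper name `module_enabled` and the sample strings are made up for illustration and are not part of netdata.

```python
import re

# Assumption: the module is switched on by an uncommented "nvidia_smi: yes" line.
ENABLED = re.compile(r"^\s*nvidia_smi\s*:\s*yes\s*(#.*)?$")


def module_enabled(conf_text):
    """Return True if an uncommented 'nvidia_smi: yes' line is present."""
    return any(ENABLED.match(line) for line in conf_text.splitlines())


# Illustrative input only, not the real contents of python.d.conf.
print(module_enabled("# nvidia_smi: yes\n"))  # False (still commented out)
print(module_enabled("nvidia_smi: yes\n"))    # True
```

To run it against the real file, read /etc/netdata/python.d.conf into a string and pass it to the helper.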


  • Staff

    @eidal, please report back on the results of the suggestion above; we want to make sure that you got it working!

    Thanks @rybue for the suggestion! 🙏



  • Hi @Rybue,
    Thanks a lot for your quick response, and I apologize for the late reply.
    I ran:

    sudo ./edit-config python.d.conf
    

    and uncommented the "nvidia_smi: yes" line. I then restarted the service, but still nothing… On the dashboard, I looked under NetData Monitoring > python.d, but there is still no Nvidia GPU… Am I missing something?



  • A new item called nvidia smi should appear on the dashboard. It will not be under NetData Monitoring > python.d.



  • Nope. Nothing called Nvidia-smi on the dashboard. I have a screenshot, but I do not know how to attach it to my reply…



  • Could you try simplifying the python.d/nvidia_smi.conf file so that it contains only these lines:

    loop_mode: yes
    poll_seconds: 1
    

    Then restart the service and check the dashboard.

    Also, check the /var/log/netdata/error.log file for any errors related to nvidia.
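
If the log is long, filtering it can help. Below is a minimal sketch of that idea in Python; the function name `nvidia_lines` and the placeholder sample lines are assumptions for illustration, and only the log path comes from the post above.

```python
def nvidia_lines(log_lines):
    """Keep only the lines that mention nvidia (case-insensitive)."""
    return [line for line in log_lines if "nvidia" in line.lower()]


# To scan the real log (reading it may require elevated permissions):
#     with open("/var/log/netdata/error.log") as fh:
#         for line in nvidia_lines(fh):
#             print(line, end="")

# Placeholder lines, not real netdata output.
sample = ["an unrelated line", "a line about NVIDIA", "a line about nvidia_smi"]
print(nvidia_lines(sample))  # ['a line about NVIDIA', 'a line about nvidia_smi']
```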



  • Well, I did all of that. Thanks for the path to the error log. Among other messages, I get:

    2020-09-14 10:35:22: python.d ERROR: plugin[main] : [nvidia_smi] error on loading '/etc/netdata/python.d/nvidia_smi.conf' : ScannerError()
    2020-09-14 10:35:22: python.d INFO: plugin[main] : [nvidia_smi] has no job configs, skipping it
    


  • OK! Thanks to the hint in the error log, I have the solution. It appears that the file cannot contain tab characters… my bad. An old habit from formatting code…
    Huge thanks, @Rybue, for your valuable help and your quick responses 🙂
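
For readers who hit the same ScannerError: YAML does not allow tab characters for indentation, so a quick scan for tabs can save some time. The following is only a sketch in Python; the helper names `lines_with_tabs` and `tabs_to_spaces` are hypothetical, and the sample text is a shortened version of the config from the first post.

```python
def lines_with_tabs(conf_text):
    """Return the 1-based numbers of the lines that contain a tab character."""
    return [n for n, line in enumerate(conf_text.splitlines(), start=1)
            if "\t" in line]


def tabs_to_spaces(conf_text, width=4):
    """Replace every tab with spaces, padding to tab stops of the given width."""
    return conf_text.expandtabs(width)


# Shortened version of the config from the first post, tabs included.
sample = "nvidia-hm:\n\tname: \t\tnvidia-smi\n\tupdate_every: \t1\n"

print(lines_with_tabs(sample))                  # [2, 3]
print(lines_with_tabs(tabs_to_spaces(sample)))  # []
```

The same check can be applied to /etc/netdata/python.d/nvidia_smi.conf by reading the file into a string first. Whether the converted text is otherwise a valid job definition is a separate question that this sketch does not check.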


  • Staff

    Thanks @rybue for providing another answer, you rock man 🙂

