Impossible to set GPU Alerts

Hey All!

I’ve been wrestling with this for about 6 hours now and still can’t get this GPU alert to work. I’m starting to think it might actually be impossible! ^^

  • Using netdata v2.8.4, agent only, Ubuntu

  • I have an Nvidia GPU and use the nvidia_smi collector. All charts show up correctly under Metrics

  • I created gpu.conf under /etc/netdata/health.d with nvidia_smi.gpu_utilization, since that is what the usage chart shows:

    alarm: gpu_usage
       on: nvidia_smi.gpu_utilization
    lookup: average -1m
    units: %
    every: 1m
     warn: $this > 80
     crit: $this > 90
     info: GPU usage monitoring
    
  • The above and dozens of other configurations did not work. The alert does not show up in the web UI or in /api/v1/alarms?all

  • There is no relevant info in the journal or in error.log

  • To narrow down the problem, I changed only *on: system.ram* in the config above, and the alert appeared successfully. So I guess it must be a bigger problem with nvidia_smi?
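A quick way to narrow this down further is to ask the agent which nvidia_smi charts it actually knows about, and compare their chart ids with their contexts. This is a sketch assuming the default agent on localhost:19999 and `jq` installed:

```shell
# List the chart ids and contexts of all nvidia_smi charts the agent serves.
# In the /api/v1/charts response, .charts is an object keyed by chart id;
# each entry carries both an "id" and a "context" field.
curl -s "http://localhost:19999/api/v1/charts" \
  | jq -r '.charts[]
           | select(.context | startswith("nvidia_smi"))
           | "id=\(.id)  context=\(.context)"'
```

If the output shows per-GPU chart ids under a shared context named nvidia_smi.gpu_utilization, that explains the behavior you are seeing (see the accepted answer below the config listing).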

Please help, I’m sooo stuck with this :frowning:

Hey, @JanItor. Use `template` instead of `alarm`. An `alarm` attaches to one specific chart by its chart id, while a `template` attaches to every chart that shares the given context. nvidia_smi.gpu_utilization is a context (each GPU gets its own chart instance under it), so an `alarm` on it never matches any chart. system.ram happens to be a chart id as well, which is why your test worked.

template: gpu_usage 
      on: nvidia_smi.gpu_utilization 
  lookup: average -1m 
   units: % 
   every: 1m 
    warn: $this > 80 
    crit: $this > 90 
    info: High GPU usage ${label:product_name}
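After saving the template, you can apply it and check that it registered without restarting the whole agent. A minimal sketch, assuming the default agent on localhost:19999:

```shell
# Reload only the health configuration (picks up files in health.d/)
sudo netdatacli reload-health

# Confirm the new alert is now listed by the alarms API
curl -s "http://localhost:19999/api/v1/alarms?all" | grep gpu_usage
```

If the grep prints nothing, check error.log again right after the reload; parse errors in health.d files are logged at that point.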

Of course! So obvious, yet so tedious to track down…

Thank You!