Netdata Community

Multiple issues setting an alarm on bandwidth overuse using an SNMP-fetched metric

Environment

Running the netdata/netdata:v1.29.3 Docker image on an aarch64 host

Problem/Question

This is the snmp.conf I use to collect my router's bandwidth (BW) data (some values are faked for security reasons):

{
    "enable_autodetect": false,
    "update_every": 5,
    "max_request_size": 100,
    "servers": [
        {
            "hostname": "192.161.1.1",
            "community": "my_comunity",
            "update_every": 10,
            "max_request_size": 50,
            "options": {
                "timeout": 10000
            },
            "charts": {
                "snmp.lte1_tx_rx_rate": {
                    "title": "SNMP Bandwidth for lte1",
                    "units": "kilobits/s",
                    "type": "area",
                    "priority": 1,
                    "family": "mikrotik",
                    "dimensions": {
                        "in": {
                            "oid": "1.3.6.1.2.1.31.1.1.1.6.5",
                            "algorithm": "incremental",
                            "multiplier": 8,
                            "divisor": 1024,
                            "offset": 0
                        },
                        "out": {
                            "oid": "1.3.6.1.2.1.31.1.1.1.10.5",
                            "algorithm": "incremental",
                            "multiplier": -8,
                            "divisor": 1024,
                            "offset": 0
                        }
                    }
                }
            }
        }
    ]
}

Inspired by: Netdata + SNMP + Mikrotik

Let x = average rate over the last day
bw = projected total traffic over a month, assuming the rate stays at x

if bw > 50 GB: warn
if bw > 70 GB: critical
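
As a rough check of the logic above, here is a minimal Python sketch of the projection. It assumes x is given in kilobits/s with 1 kilobit = 1024 bits (matching the divisor of 1024 in snmp.conf), a 30-day month, and 1 GB = 1024 * 1024 kilobytes. The function names and the example rates are hypothetical and only illustrate the arithmetic; they are not part of Netdata.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumption: a 30-day month


def projected_monthly_gb(rate_in_kbps: float, rate_out_kbps: float) -> float:
    """Project monthly traffic in GB from average in/out rates in kilobits/s."""
    # "out" is charted as a negative value, so take absolute values first.
    total_kbps = abs(rate_in_kbps) + abs(rate_out_kbps)
    # kilobits/s * seconds -> kilobits; / 8 -> kilobytes; / 1024 / 1024 -> GB
    return total_kbps * SECONDS_PER_MONTH / 8 / 1024 / 1024


def classify(bw_gb: float, warn_gb: float = 50.0, crit_gb: float = 70.0) -> str:
    """Apply the warn/critical thresholds to a projected monthly total."""
    if bw_gb > crit_gb:
        return "critical"
    if bw_gb > warn_gb:
        return "warn"
    return "clear"


if __name__ == "__main__":
    # Hypothetical rates: 150 kilobits/s in, 50 kilobits/s out (charted as -50).
    bw = projected_monthly_gb(150.0, -50.0)
    print(f"{bw:.2f} GB/month -> {classify(bw)}")
```

With these made-up rates, a combined average of 200 kilobits/s already lands between the two thresholds, which gives a feel for how small the daily average has to stay.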

To calculate x I ran into some issues and had to split it into two lookups, one per dimension. I also configured the final calculation to run every 12 hours. The following config is my best attempt at defining the alarm:

alarm : monthly_bandwidth_rate_in
    on: snmp.lte1_tx_rx_rate
lookup: average -1d every 10s unaligned absolute of in
  info: monthly BW average kilobits/s of in
    to: sysadmin

alarm : monthly_bandwidth_rate_out
    on: snmp.lte1_tx_rx_rate
lookup: average -1d every 10s unaligned absolute of out
  info: monthly BW average kilobits/s of out
    to: sysadmin

alarm : monthly_bandwidth
    on: snmp.lte1_tx_rx_rate
 every: 12h
  calc: ($monthly_bandwidth_rate_out + $monthly_bandwidth_rate_in) * 60 * 60 * 24 * 30  / 8 / 1024 / 1024
  units: Gbyte
  warn: $this > 50
  crit: $this > 70
  info: monthly BW average GB/month
    to: sysadmin


alarm : monthly_bandwidth_fake1
    on: snmp.lte1_tx_rx_rate
 every: 12h
  calc: ($monthly_bandwidth_rate_out + $monthly_bandwidth_rate_in)
  warn: $this > 50
  crit: $this > 70
  info: monthly BW average GB/month
    to: sysadmin



alarm : monthly_bandwidth_fake2
    on: snmp.lte1_tx_rx_rate
 every: 12h
  calc: ($monthly_bandwidth_rate_out)
  warn: $this > 50
  crit: $this > 70
  info: monthly BW average GB/month
    to: sysadmin

There are two issues with the result:

  1. The monthly_bandwidth units are shown as kilobits/s, and I wasn’t able to change them to anything else

  2. The calculated average appears to be wrong, since running the following query

curl 'http://10.9.0.6:19999/api/v1/data?chart=snmp.lte1_tx_rx_rate&points=1&&group=average&after=-86400d&before=0' --compressed  --insecure
{
 "labels": ["time", "in", "out"],
    "data":
 [
      [ 1616371200, 11.4808157, -2.3677532]
  ]
}

resulted in different average values:
11.4808157 and abs(-2.3677532) using api
vs
1.08 and 0.77 in my defined alarms

I am not sure how to proceed with debugging these issues.
Any help would be appreciated

Hi @Ofir_Marcus

I tested it with the following alarms (I have no network devices, so I used my server's network bandwidth metrics).

template: 1h_traffic_average_rate_in
      on: net.net
  lookup: average -1h unaligned absolute of received
    calc: $this / 1000
   units: Megabits/s
   every: 10s
    info: average rate for the last hour
      to: sysadmin

template: 1h_traffic_average_rate_out
      on: net.net
  lookup: average -1h unaligned absolute of sent
    calc: abs($this) / 1000
   units: Megabits/s
   every: 10s
    info: average rate for the last hour
      to: sysadmin

template: 1h_traffic_average_rate
      on: net.net
    calc: $1h_traffic_average_rate_in + $1h_traffic_average_rate_out
   units: Megabits/s
   every: 10s
    info: average rate for the last hour
      to: sysadmin

template: day_traffic_predict
      on: net.net
    calc: $1h_traffic_average_rate * 60 * 60 * 24 / 8000
   units: Gigabytes
   every: 10s
    info: predicted total traffic for the upcoming day
      to: sysadmin

template: week_traffic_predict
      on: net.net
    calc: $1h_traffic_average_rate * 60 * 60 * 24 * 7 / 8000
   units: Gigabytes
   every: 10s
    info: predicted total traffic for the upcoming week
      to: sysadmin

template: month_traffic_predict
      on: net.net
    calc: $1h_traffic_average_rate * 60 * 60 * 24 * 30 / 8000
   units: Gigabytes
   every: 10s
    info: predicted total traffic for the upcoming 30 days
      to: sysadmin

I get the same data from the API (average in and out for the last hour).

[ilyam@pc health.d]$ curl 'http://127.0.0.1:19999/api/v1/data?chart=net.wlp5s0&points=1&after=-3600&options=absolute&options=unaligned'
{
 "labels": ["time", "received", "sent"],
    "data":
 [
      [ 1616447455, 1093.0917114, 80.8864254]
  ]
}
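
To cross-check the alarm values by hand, the API sample above can be pushed through the same arithmetic as the templates. The sketch below is only an illustration: it assumes the chart reports kilobits/s (hence the / 1000 to get Megabits/s, as in the templates) and uses decimal units (/ 8000 for Megabits to Gigabytes). The function name is hypothetical.

```python
import json

# The API response shown above, copied as a literal so the sketch runs offline.
SAMPLE_RESPONSE = """
{
  "labels": ["time", "received", "sent"],
  "data": [[1616447455, 1093.0917114, 80.8864254]]
}
"""


def predict_gigabytes(response_text: str, days: int) -> float:
    """Predict traffic in Gigabytes over `days`, mirroring the template calcs."""
    payload = json.loads(response_text)
    row = payload["data"][0]
    # Sum every dimension except "time"; abs() covers a negative "sent" value.
    kbps = sum(
        abs(value)
        for label, value in zip(payload["labels"], row)
        if label != "time"
    )
    mbps = kbps / 1000  # same as "$this / 1000" in the rate templates
    return mbps * 60 * 60 * 24 * days / 8000  # same as the *_traffic_predict calcs


if __name__ == "__main__":
    for days in (1, 7, 30):
        print(f"{days:>2} day(s): {predict_gigabytes(SAMPLE_RESPONSE, days):.2f} GB")
```

If the numbers printed here differ noticeably from what the dashboard shows for the alarms, the lookup window or options probably do not match the query.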

If you use unaligned absolute in the alarms, don’t forget to add options=absolute&options=unaligned to your query.
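
One way to keep the query and the alarm lookup in sync is to build the URL from the same parameters. This is a minimal sketch, not an official client: the helper name is hypothetical, the host and chart are taken from the example above, and it assumes `after` is a negative number of seconds relative to now (as in the -3600 used above for one hour). It repeats `options=` once per option, mirroring the query above.

```python
from urllib.parse import urlencode


def build_data_url(base_url, chart, window_seconds, options=("absolute", "unaligned")):
    """Build a /api/v1/data URL that averages one point over the given window."""
    params = [
        ("chart", chart),
        ("points", 1),
        ("group", "average"),
        ("after", -window_seconds),  # relative window, in seconds
    ]
    # One "options=" pair per option, as in the curl example above.
    params += [("options", option) for option in options]
    return f"{base_url}/api/v1/data?{urlencode(params)}"


if __name__ == "__main__":
    # Last hour, matching "lookup: average -1h unaligned absolute"
    print(build_data_url("http://127.0.0.1:19999", "net.wlp5s0", 3600))
    # Last day, matching "lookup: average -1d unaligned absolute"
    print(build_data_url("http://127.0.0.1:19999", "net.wlp5s0", 86400))
```

For a one-day lookup the window would be 86400 seconds, so the query should carry after=-86400 to cover the same range as the alarm.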