Netdata Community

Anomaly detection, custom models not working

Hi
I am trying to implement the custom models option in the anomalies.conf file, but it looks like the anomalies config fails. My config is as follows:

snmp:
    name: 'snmp'
    host: '127.0.0.1:19999'
    protocol: 'http'
    charts_regex: 'snmp_router\..*'
    charts_to_exclude: 'None'
    model: 'hbos'
    train_max_n: 100000
    train_every_n: 1800
    train_n_secs: 14400
    offset_n_secs: 0
    initial_train_data_after: 1610967339
# initial_train_data_before: 1604593257
    lags_n: 5
    smooth_n: 3
    diffs_n: 1
    contamination: 0.1
    include_average_prob: false
    custom_models:
      - name: 'pppoe'
        dimensions: 'pppoe'
      - name: 'internal'
        dimensions: 'internal'

The error is not very clear:

2021-01-18 15:12:42: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 15:12:43: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)

I am pulling data from an SNMP configuration as follows. The SNMP charts are created as expected:

{
    "enable_autodetect": false,
    "update_every": 5,
    "max_request_size": 100,
    "servers": [
        {
            "hostname": "x.x.x.x",
            "community": "public",
            "update_every": 10,
            "max_request_size": 50,
            "options": {
                "timeout": 10000
            },
            "charts": {
                "snmp_router.bandwidth_pppoe": {
                    "title": "Switch Bandwidth for port pppoe",
                    "units": "kilobits/s",
                    "type": "area",
                    "priority": 1,
                    "family": "ports",
                    "dimensions": {
                        "pppoe": {
                            "oid": "1.3.6.1.2.1.31.1.1.1.6.8",
                            "algorithm": "incremental",
                            "multiplier": 8,
                            "divisor": 1024,
                            "offset": 0
                        }
                    }
                },
                "snmp_router.bandwidth_internal": {
                    "title": "Switch Bandwidth for port internal",
                    "units": "kilobits/s",
                    "type": "area",
                    "priority": 1,
                    "family": "ports",
                    "dimensions": {
                        "internal": {
                            "oid": "1.3.6.1.2.1.31.1.1.1.6.2",
                            "algorithm": "incremental",
                            "multiplier": 8,
                            "divisor": 1024,
                            "offset": 0
                        }
                    }
                }
            }
        }
    ]
}

@andrewm4894, could you please help us here?

While we are waiting for @andrewm4894:

@Morne_Supra, let's look at the debug output; it should give us more info.

# cd to the plugins.d dir
sudo su -s /bin/bash netdata
./python.d.plugin nolock debug trace anomalies
bash-4.2$ ./python.d.plugin -ppython3 debug nolock trace anomalies
2021-01-18 18:26:54: python.d INFO: plugin[main] : using python v3
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : loading '/etc/netdata/python.d.conf'
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : '/etc/netdata/python.d.conf' is loaded
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-01-18 18:26:54: python.d DEBUG: plugin[main] : '/var/lib/netdata/pythond-jobs-statuses.json' is loaded
2021-01-18 18:26:56: python.d DEBUG: plugin[main] : [anomalies] looking for 'anomalies.conf' in ['/etc/netdata/python.d', '/usr/lib/netdata/conf.d/python.d']
2021-01-18 18:26:56: python.d DEBUG: plugin[main] : [anomalies] loading '/etc/netdata/python.d/anomalies.conf'
2021-01-18 18:26:56: python.d DEBUG: plugin[main] : [anomalies] '/etc/netdata/python.d/anomalies.conf' is loaded
2021-01-18 18:26:56: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-18 18:26:56: python.d DEBUG: plugin[main] : anomalies[snmp] was previously active, applying recovering settings
2021-01-18 18:26:57: python.d INFO: plugin[main] : anomalies[snmp] : check success
CHART netdata.runtime_anomalies_snmp '' 'Execution time for anomalies_snmp' 'ms' 'python.d' netdata.pythond_runtime line 145000 1
DIMENSION run_time 'run time' absolute 1 1

2021-01-18 18:26:57: python.d DEBUG: anomalies[snmp] : started, update frequency: 1
2021-01-18 18:26:57: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 18:26:57: python.d ERROR: anomalies[snmp] : Traceback (most recent call last):
  File "/usr/libexec/netdata/python.d/python_modules/bases/FrameworkServices/SimpleService.py", line 197, in run
    updated = self.update(interval=since)
  File "/usr/libexec/netdata/python.d/python_modules/bases/FrameworkServices/SimpleService.py", line 222, in update
    data = self.get_data()
  File "/usr/libexec/netdata/python.d/anomalies.chart.py", line 331, in get_data
    train_data_before=self.initial_train_data_before)
  File "/usr/libexec/netdata/python.d/anomalies.chart.py", line 245, in train
    sort_cols=True, numeric_only=True, protocol=self.protocol, float_size='float32', user=self.username, pwd=self.password
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 172, in get_data
    df = trio.run(get_charts, api_calls, col_sep, timeout, numeric_only, float_size, host_prefix, host_sep)
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/trio/_core/_run.py", line 1896, in run
    raise runner.main_task_outcome.error
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 102, in get_charts
    nursery.start_soon(get_chart, api_call, data, col_sep, numeric_only, float_size, host_prefix, host_sep)
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/trio/_core/_run.py", line 741, in __aexit__
    raise combined_error_from_nursery
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 64, in get_chart
    r_json = r.json()
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/asks/response_objects.py", line 80, in json
    return _json.loads(body, **kwargs)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
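
The last frames of the traceback show `r.json()` failing inside netdata_pandas, which means the body returned for at least one chart request was not JSON. Here is a minimal sketch of that failure mode using only the standard library. The `describe_body` helper and the sample body text are hypothetical; they are not part of the collector or of netdata_pandas.

```python
import json


def describe_body(body: str) -> str:
    """Return 'ok' if body parses as JSON, otherwise the decoder's message."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        # str(err) has the same form as the message logged by python.d
        return str(err)
    return "ok"


# A plain-text body (hypothetical example) fails at the very first character.
print(describe_body("chart not found"))

# A JSON body parses without error.
print(describe_body('{"labels": ["time", "pppoe"], "data": []}'))
```

The first call prints the same "Expecting value: line 1 column 1 (char 0)" message as the log. In this thread the likely trigger is a custom model entry that points at a chart id the agent does not have, so the fix belongs in the config rather than in the JSON handling.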

It should be something like:

custom_models:
      - name: 'pppoe'
        dimensions: 'chartname.pppoe'
      - name: 'internal'
        dimensions: 'chartname.internal'

where ‘chartname’ is whatever the correct chart id is.

For example from the example conf:

custom_models:
      - name: 'python_d'
        dimensions: 'apps.cpu|python.d.plugin,apps.mem|python.d.plugin'

would create an anomaly model called ‘python_d’, which builds one custom model based on the ‘python.d.plugin’ dimension from the ‘apps.cpu’ chart and the ‘python.d.plugin’ dimension from the ‘apps.mem’ chart.

So the syntax is:

custom_models:
      - name: 'some_nice_name'
        dimensions: '<chart.id1>|<dimension.id1>,<chart.id2>|<dimension.id2>'

The collector will then take the dimensions string, split on the ‘,’ and then use the list of ‘chart.id|dimension.id’ to determine what charts and dimensions to use in the model called “some_nice_name”.

It’s a little bit funky, but it seemed more compact to write it as a single string than to add lots of extra config.
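
The split described above can be sketched roughly like this. It illustrates the documented string format and is not the collector's actual code: the `parse_custom_model` name and the strict check for a missing ‘|’ are my own assumptions.

```python
from typing import List, Tuple


def parse_custom_model(dimensions: str) -> List[Tuple[str, str]]:
    """Split 'chart|dim,chart|dim' into (chart_id, dimension_id) pairs."""
    pairs = []
    for item in dimensions.split(","):
        item = item.strip()
        if "|" not in item:
            # Assumption: treat an entry without a chart id as a config error.
            raise ValueError(f"expected '<chart.id>|<dimension.id>', got {item!r}")
        chart_id, dimension_id = item.split("|", 1)
        pairs.append((chart_id, dimension_id))
    return pairs


# The string from the example conf yields two (chart, dimension) pairs.
print(parse_custom_model("apps.cpu|python.d.plugin,apps.mem|python.d.plugin"))

# A bare dimension name, as in the original post, has no chart id to look up.
try:
    parse_custom_model("pppoe")
except ValueError as err:
    print(err)
```

Each pair names one chart to fetch and one dimension to keep from it, so the part before the ‘|’ has to be a complete chart id.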

So try this:

snmp:
    name: 'snmp'
    host: '127.0.0.1:19999'
    protocol: 'http'
    charts_regex: 'snmp_router\..*'
    charts_to_exclude: 'None'
    model: 'hbos'
    train_max_n: 100000
    train_every_n: 1800
    train_n_secs: 14400
    offset_n_secs: 0
# initial_train_data_after: 1610967339
# initial_train_data_before: 1604593257
    lags_n: 5
    smooth_n: 3
    diffs_n: 1
    contamination: 0.1
    include_average_prob: false
    custom_models:
      - name: 'pppoe'
        dimensions: 'snmp_router|pppoe'
      - name: 'internal'
        dimensions: 'snmp_router|internal'

Hi @andrewm4894. Thanks for the feedback. I made the change as suggested, but I still get the same error:

snmp:
    name: 'snmp'
    host: '127.0.0.1:19999'
    protocol: 'http'
    charts_regex: 'snmp_router\..*'
#    charts_regex: 'snmp_router.ppoe'
    charts_to_exclude: 'None'
#    model: 'pca'
    model: 'hbos'
    train_max_n: 100000
    train_every_n: 1800
    train_n_secs: 14400
    offset_n_secs: 0
    initial_train_data_after: 1610967339
# initial_train_data_before: 1604593257
    lags_n: 5
    smooth_n: 3
    diffs_n: 1
    contamination: 0.1
    include_average_prob: false
    custom_models:
      - name: 'pppoe'
        dimensions: 'snmp_router|pppoe'
      - name: 'internal'
        dimensions: 'snmp_router|internal'

2021-01-18 20:11:04: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-18 20:11:05: python.d INFO: plugin[main] : anomalies[snmp] : check success
2021-01-18 20:11:05: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 20:11:06: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 20:11:07: netdata INFO  : PLUGINSD[apps] : 2021-01-18 20:11:07: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 20:11:08: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
bash-4.2$ ./python.d.plugin -ppython3 debug nolock trace anomalies
2021-01-18 20:12:15: python.d INFO: plugin[main] : using python v3
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : loading '/etc/netdata/python.d.conf'
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : '/etc/netdata/python.d.conf' is loaded
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-01-18 20:12:15: python.d DEBUG: plugin[main] : '/var/lib/netdata/pythond-jobs-statuses.json' is loaded
2021-01-18 20:12:17: python.d DEBUG: plugin[main] : [anomalies] looking for 'anomalies.conf' in ['/etc/netdata/python.d', '/usr/lib/netdata/conf.d/python.d']
2021-01-18 20:12:17: python.d DEBUG: plugin[main] : [anomalies] loading '/etc/netdata/python.d/anomalies.conf'
2021-01-18 20:12:17: python.d DEBUG: plugin[main] : [anomalies] '/etc/netdata/python.d/anomalies.conf' is loaded
2021-01-18 20:12:17: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-18 20:12:17: python.d DEBUG: plugin[main] : anomalies[snmp] was previously active, applying recovering settings
2021-01-18 20:12:17: python.d INFO: plugin[main] : anomalies[snmp] : check success
CHART netdata.runtime_anomalies_snmp '' 'Execution time for anomalies_snmp' 'ms' 'python.d' netdata.pythond_runtime line 145000 1
DIMENSION run_time 'run time' absolute 1 1

2021-01-18 20:12:17: python.d DEBUG: anomalies[snmp] : started, update frequency: 1
2021-01-18 20:12:17: python.d ERROR: anomalies[snmp] : update() unhandled exception: Expecting value: line 1 column 1 (char 0)
2021-01-18 20:12:17: python.d ERROR: anomalies[snmp] : Traceback (most recent call last):
  File "/usr/libexec/netdata/python.d/python_modules/bases/FrameworkServices/SimpleService.py", line 197, in run
    updated = self.update(interval=since)
  File "/usr/libexec/netdata/python.d/python_modules/bases/FrameworkServices/SimpleService.py", line 222, in update
    data = self.get_data()
  File "/usr/libexec/netdata/python.d/anomalies.chart.py", line 331, in get_data
    train_data_before=self.initial_train_data_before)
  File "/usr/libexec/netdata/python.d/anomalies.chart.py", line 245, in train
    sort_cols=True, numeric_only=True, protocol=self.protocol, float_size='float32', user=self.username, pwd=self.password
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 172, in get_data
    df = trio.run(get_charts, api_calls, col_sep, timeout, numeric_only, float_size, host_prefix, host_sep)
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/trio/_core/_run.py", line 1896, in run
    raise runner.main_task_outcome.error
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 102, in get_charts
    nursery.start_soon(get_chart, api_call, data, col_sep, numeric_only, float_size, host_prefix, host_sep)
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/trio/_core/_run.py", line 741, in __aexit__
    raise combined_error_from_nursery
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/netdata_pandas/data.py", line 64, in get_chart
    r_json = r.json()
  File "/var/lib/netdata/.local/lib/python3.6/site-packages/asks/response_objects.py", line 80, in json
    return _json.loads(body, **kwargs)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This seems to be the correct syntax:

   custom_models:
      - name: 'pppoe'
        dimensions: 'snmp_router.bandwidth_pppoe|pppoe'
      - name: 'internal'
        dimensions: 'snmp_router.bandwidth_internal|internal'

2021-01-18 20:31:54: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-18 20:32:13: python.d INFO: plugin[main] : anomalies[snmp] : check success
2021-01-18 20:32:17: python.d INFO: anomalies[snmp] : training complete in 3.33 seconds (runs_counter=1, model=hbos, train_n_secs=14400, models=2, n_fit_success=2, n_fit_fails=0, after=1610980334, before=1610994734).
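
The working syntax pairs each full chart id from the SNMP config with a dimension id defined under it. As a rough sketch, those strings can be generated from the SNMP config itself. The `custom_model_dimensions` helper is hypothetical, and it assumes the agent uses the chart and dimension ids from the config verbatim, which matches the working config above.

```python
import json
from typing import List


def custom_model_dimensions(snmp_conf: dict) -> List[str]:
    """List every '<chart.id>|<dimension.id>' pair defined in an SNMP config."""
    entries = []
    for server in snmp_conf.get("servers", []):
        for chart_id, chart in server.get("charts", {}).items():
            for dimension_id in chart.get("dimensions", {}):
                entries.append(f"{chart_id}|{dimension_id}")
    return entries


# Trimmed copy of the SNMP config posted earlier in this thread.
SNMP_CONF = json.loads("""
{
    "servers": [
        {
            "hostname": "x.x.x.x",
            "charts": {
                "snmp_router.bandwidth_pppoe": {
                    "dimensions": {"pppoe": {"oid": "1.3.6.1.2.1.31.1.1.1.6.8"}}
                },
                "snmp_router.bandwidth_internal": {
                    "dimensions": {"internal": {"oid": "1.3.6.1.2.1.31.1.1.1.6.2"}}
                }
            }
        }
    ]
}
""")

for entry in custom_model_dimensions(SNMP_CONF):
    print(entry)
# snmp_router.bandwidth_pppoe|pppoe
# snmp_router.bandwidth_internal|internal
```

Each printed line can be used as a `dimensions` value on its own, or several can be joined with ‘,’ to build one model across multiple charts.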

Ah yes, my bad. I didn’t have an agent with SNMP set up that I could check easily, and I’m not really familiar with those charts.

P.S. It looks like you have defined initial_train_data_after but not initial_train_data_before. As a result, I think the collector is ignoring your initial_train_data_after of 1610967339. You can see from the log message that it used after=1610980334, before=1610994734 for the training data, which is simply the most recent train_n_secs (14400 seconds).

initial_train_data_after and initial_train_data_before both need to be set together if you want to tell the collector what window to use for the initial model. The idea is to be able to train on a fixed window: train once and then use a really high train_every_n to never train again (I think train_every_n = -1 also works to basically turn off retraining). That way you can freeze the model so it is trained only on a specific window you know is ‘normal’. Or at least that was the thinking behind those two params.
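
To make the window selection concrete, here is a small sketch of the behaviour as described in this thread. It is inferred from the log line and the explanation above and is not the collector's code. The `training_window` function is hypothetical, and the second `initial_before` value is an invented example.

```python
from typing import Optional, Tuple


def training_window(
    now: int,
    train_n_secs: int,
    offset_n_secs: int = 0,
    initial_after: Optional[int] = None,
    initial_before: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (after, before) timestamps used for the first training run."""
    # Assumption: the fixed window is honoured only when both ends are set.
    if initial_after is not None and initial_before is not None:
        return initial_after, initial_before
    # Otherwise fall back to the most recent train_n_secs, shifted by the offset.
    before = now - offset_n_secs
    return before - train_n_secs, before


# Only initial_train_data_after set, as in the config above: it is ignored.
print(training_window(1610994734, 14400, 0, initial_after=1610967339))
# (1610980334, 1610994734), the same window as in the log message

# Both ends set (the 'before' value here is a made-up example): window is fixed.
print(training_window(1610994734, 14400, 0, 1610967339, 1610981739))
# (1610967339, 1610981739)
```

The first result matches the after and before values in the training log, which fits the explanation that a lone initial_train_data_after has no effect.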

@andrewm4894 , thanks for the info.