Anomalies section missing on local agent dashboard and on cloud

Hi
I followed the anomaly detection guide, but I cannot see the Anomalies section or any dashboards.

Using Centos7 with agent v1.28.0-131-nightly

Regards

Hi @Morne_Supra

Let’s try to debug the problem.

Running the anomalies collector in debug mode will likely tell us what is wrong. Please do the following:

cd /path/to/netdata/plugins.d/ # /opt/netdata/usr/libexec/netdata/plugins.d or /usr/libexec/netdata/plugins.d


sudo su -s /bin/bash netdata

./python.d.plugin debug nolock trace anomalies 

And share the output

Hi @ilyam8
Thanks for your reply. Below is the output:

bash-4.2$ ./python.d.plugin debug nolock trace anomalies
2021-01-15 11:29:19: python.d INFO: plugin[main] : using python v2
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : loading '/etc/netdata/python.d.conf'
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : '/etc/netdata/python.d.conf' is loaded
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-01-15 11:29:19: python.d DEBUG: plugin[main] : '/var/lib/netdata/pythond-jobs-statuses.json' is loaded
2021-01-15 11:29:19: python.d WARNING: plugin[main] : [anomalies] error on loading source : SyntaxError('invalid syntax', ('/usr/libexec/netdata/python.d/anomalies.chart.py', 83, 108, "        self.charts_available = [c for c in list(requests.get(f'{self.protocol}://{self.host}/api/v1/charts').json().get('charts', {}).keys())]\n")), skipping it
2021-01-15 11:29:19: python.d INFO: plugin[main] : no jobs to run

OK, that is the problem: the anomalies collector requires python3. If you have python3 installed, see "Syntax error: anomalies.chart.py" (Issue #10499 in netdata/netdata on GitHub).
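For anyone wondering why python v2 chokes on that line: it is an f-string, and f-strings only parse on Python 3.6 or newer. Here is a minimal sketch that checks whether the current interpreter accepts that syntax. The snippet is a shortened stand-in for the line in the log, and the `protocol` and `host` values are made up for illustration.

```python
import sys

# Shortened stand-in for the f-string line reported in the log (not the real collector code).
SNIPPET = "url = f'{protocol}://{host}/api/v1/charts'"


def can_compile(source):
    """Return True if this interpreter can parse the given source."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print("python", sys.version_info[:2], "accepts f-strings:", can_compile(SNIPPET))

# On an interpreter that accepts it, the line builds the charts URL.
# The protocol and host below are hypothetical example values.
if can_compile(SNIPPET):
    namespace = {"protocol": "http", "host": "127.0.0.1:19999"}
    exec(SNIPPET, namespace)
    print(namespace["url"])
```

Because the f-string sits inside an ordinary string literal, the script itself also runs on python v2, where it should report that the syntax is not accepted.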

Hi @ilyam8

I made the required change and then restarted the agent:

[plugin:python.d]
        # update every = 1
        # command options =
        command options = -ppython3

I then ran ./python.d.plugin debug nolock trace anomalies again and got the same error:

bash-4.2$ ./python.d.plugin debug nolock trace anomalies
2021-01-15 11:38:46: python.d INFO: plugin[main] : using python v2
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : looking for 'python.d.conf' in ['/etc/netdata', '/usr/lib/netdata/conf.d']
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : loading '/etc/netdata/python.d.conf'
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : '/etc/netdata/python.d.conf' is loaded
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : looking for 'pythond-jobs-statuses.json' in /var/lib/netdata
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : loading '/var/lib/netdata/pythond-jobs-statuses.json'
2021-01-15 11:38:46: python.d DEBUG: plugin[main] : '/var/lib/netdata/pythond-jobs-statuses.json' is loaded
2021-01-15 11:38:46: python.d WARNING: plugin[main] : [anomalies] error on loading source : SyntaxError('invalid syntax', ('/usr/libexec/netdata/python.d/anomalies.chart.py', 83, 108, "        self.charts_available = [c for c in list(requests.get(f'{self.protocol}://{self.host}/api/v1/charts').json().get('charts', {}).keys())]\n")), skipping it
2021-01-15 11:38:46: python.d INFO: plugin[main] : no jobs to run

I definitely have python3 installed:

bash-4.2$ python3 --version
Python 3.6.8
bash-4.2$ python --version
Python 2.7.5
bash-4.2$ exit
exit

Looks like python is defaulting to v2. Do you know how to set the default to version 3 for the netdata user?

The changes you applied to netdata.conf are not picked up in debug mode because python.d.plugin doesn’t read that file. Netdata reads it and tells python.d.plugin to use python3, so the change should work after a netdata.service restart.

In debug mode you need to pass the option yourself:

./python.d.plugin -ppython3 debug nolock trace anomalies

Thanks @ilyam8

The following is from error.log and from ./python.d.plugin -ppython3 debug nolock trace anomalies. Is it expected?

2021-01-15 11:52:00: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-15 11:52:01: python.d WARNING: plugin[main] : anomalies[local] : unhandled exception on check : DataError('No numeric types to aggregate',), skipping the job

Hi @ilyam8

Fixed it. I had a typo in my anomalies.conf.

Now just waiting for the Anomalies section to appear:

2021-01-15 12:03:58: python.d INFO: plugin[main] : [anomalies] built 1 job(s) configs
2021-01-15 12:03:59: python.d INFO: plugin[main] : anomalies[local] : check success
2021-01-15 12:04:02: python.d INFO: anomalies[local] : training complete in 3.14 seconds (runs_counter=1, model=pca, train_n_secs=14400, models=23, n_fit_success=23, n_fit_fails=0, after=1610690639, before=1610705039).

Thanks @ilyam8

I see my anomaly score for system..*. Will now start playing around with it to see if I can trigger an anomaly.

Nice that you got it working :tada:

Consider checking out the "Anomalies collector feedback megathread!" and sharing your opinion :grinning_face_with_smiling_eyes:


Yeah, I definitely will. I am setting up an SNMP config, so I can get values from my Mikrotik router. As soon as I have these values, I can start playing around with detection and then share my findings.


This is due to the way GroupBy objects handle the different aggregation methods: sum and mean are handled differently. GroupBy.mean dispatches to self._cython_agg_general, which checks for numeric types and raises a DataError if it doesn’t find any (which is the case in your example). Although the call to self._cython_agg_general is wrapped in try/except, a GroupByError is simply re-raised, and DataError inherits from GroupByError. Hence the exception.

By applying pd.to_numeric you convert the values to a numeric type, and the aggregation works.
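A small sketch of that fix, assuming pandas is installed. The frame and its column names are invented for illustration and are not what the collector builds internally. The exact exception also depends on the pandas version: older releases raise the DataError seen in the log, and newer ones, as far as I know, raise a TypeError instead, so the sketch catches both broadly.

```python
import pandas as pd

# Hypothetical data: the values arrived as strings, so the column is non-numeric.
df = pd.DataFrame({"chart": ["a", "a", "b"], "value": ["1.0", "3.0", "5.0"]})

try:
    df.groupby("chart")["value"].mean()
    print("mean worked on the raw column")
except Exception as exc:  # exception type varies across pandas versions
    print("mean failed on the raw column:", type(exc).__name__)

# Convert the column to a numeric dtype, then aggregate again.
df["value"] = pd.to_numeric(df["value"])
means = df.groupby("chart")["value"].mean()
print(means)
```

After the conversion the group means are ordinary floats (2.0 for "a" and 5.0 for "b").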


Thanks @sambutle for that awesome explanation!

I think I have seen this come up sometimes too when I configure the collector to cover a handful of charts that sometimes return “None” or “null” via the REST API. It’s usually transient, iirc. It would be nice to handle this better within the collector, but I’m not 100% sure how to do it as gracefully as possible without adding more complex logic or data validation of some sort.
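One possible direction, purely as a sketch and not what the collector does today: coerce anything that is not a number into NaN before aggregating, so a transient “None” or “null” becomes a gap instead of an exception. The helper name `to_float_or_nan` and the sample row are hypothetical.

```python
import math


def to_float_or_nan(value):
    """Return value as a float, or NaN when it is None or cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers strings such as "null".
        return math.nan


# Hypothetical row as it might come back from the REST API.
row = [1.5, None, "null", "2"]
cleaned = [to_float_or_nan(v) for v in row]
print(cleaned)
```

The trade-off is that bad values are hidden silently, so logging how many values were coerced would probably be worth adding.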