Metrics Correlation

It is time-consuming to go through thousands of metrics and hundreds of charts to manually identify which metrics are correlated.



Correlated Metrics

For a specific time window in which an anomaly occurred, it should be possible to automatically find the subset of charts that show an anomaly in that same time window.
Let’s say a software issue resulted in high CPU and disk load over thirty seconds during an outage. Selecting that time window for the CPU load chart will provide users with the option to ask Netdata to find correlated charts. Netdata can then display the disk load chart automatically, while also hiding charts that are not relevant.
This will allow you to very quickly narrow down all the metrics being affected by a specific issue, saving a lot of time in debugging specific software or infrastructure issues.
We also want users to be able to share specific correlations they have found by sharing a link with their team.
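To make the idea concrete, here is a minimal sketch of how charts could be ranked against a highlighted window. It is only an illustration and not Netdata's implementation: the function names, the chart names, the `threshold` value, and the choice of a two-sample Kolmogorov-Smirnov statistic as the score are all assumptions made for this example. Each chart's samples before the highlighted window are treated as a baseline, and charts whose highlighted samples are distributed very differently from their own baseline are reported.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples (0.0 to 1.0)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def correlated_charts(charts, start, end, threshold=0.8):
    """Rank charts by how much they changed in the highlighted window.

    charts: hypothetical dict of chart name -> list of samples.
    start, end: sample indices of the highlighted window [start, end).
    threshold: assumed cut-off; charts scoring below it are hidden.
    """
    scores = {}
    for name, samples in charts.items():
        baseline, window = samples[:start], samples[start:end]
        if not baseline or not window:
            continue  # not enough data to compare
        scores[name] = ks_statistic(baseline, window)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


if __name__ == "__main__":
    # Made-up samples: CPU and disk spike in the last four points, network does not.
    charts = {
        "system.cpu": [10, 12, 11, 13, 12, 11, 10, 12, 90, 95, 92, 94],
        "disk.io": [5, 6, 5, 7, 6, 5, 6, 5, 80, 85, 82, 88],
        "net.traffic": [20, 21, 19, 20, 22, 20, 21, 19, 20, 21, 19, 22],
    }
    for name, score in correlated_charts(charts, start=8, end=12):
        print(name, score)
```

With the made-up data above, highlighting the last four samples would keep the CPU and disk charts and hide the network chart, which matches the outage scenario described in the post.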

Thanks @andrewm4894 for keeping this up to date!

Done and live to play with: Introducing our first Netdata Cloud Insights feature: Metric Correlations for faster root cause analysis - Netdata


Working on this at the moment - happy to chat in more detail about it with anyone interested, just reply in here.
