Hello - this is Andy from Netdata Cloud.
I’m reaching out to the Netdata community to see if anyone might be able to help me.
I’m the Machine Learning engineer at Netdata Cloud (a team of 1) and am working on prototypes of various ML-driven features we would like to explore adding to the cloud or the agent (wherever fits best and makes the most sense).
To that end, I’m always on the lookout for any real-world Netdata data I can get my hands on to try the various prototypes on - Netdatas in the wild, as I like to think of them.
I typically pull some data from the various demo sites via the REST API (/api/v1/... etc.). This is fine for getting a POC up and running, but as you can imagine, the data on those demo sites is neither very representative nor very realistic.
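To give a feel for how lightweight these pulls are, here is a minimal sketch of the kind of API call I mean, using only the Python standard library. The host, chart name, and parameter values are just examples - the real /api/v1/data endpoint accepts these query parameters, but you'd swap in whatever chart and time range you care about.

```python
import json
import urllib.parse
import urllib.request


def build_data_url(host, chart, after=-600, points=60):
    """Build a /api/v1/data URL for one chart on a Netdata agent."""
    params = urllib.parse.urlencode({
        "chart": chart,      # e.g. "system.cpu"
        "after": after,      # negative = seconds into the past, relative to now
        "points": points,    # number of points to return
        "format": "json",
    })
    return f"https://{host}/api/v1/data?{params}"


def fetch_chart_data(host, chart, after=-600, points=60):
    """Pull recent data for one chart and decode the JSON response."""
    url = build_data_url(host, chart, after=after, points=points)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode())


# Example: last 10 minutes of CPU data from the London demo site.
# data = fetch_chart_data("london.my-netdata.io", "system.cpu")
# print(data["labels"])
```

A single call like this every now and then is all a prototype needs - nothing continuous, nothing streamed.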
So I’m wondering if anyone has any publicly accessible Netdatas (have we agreed what the plural is here?) whose API they don’t mind me hitting every now and then to pull some data, as I build and test various prototype features that could end up making their way into either the cloud or the agent as new ‘insights’ features.
If so, please share their URLs here - or, if you’re happy to share but not publicly, you can email me at firstname.lastname@example.org. I’m also happy to give you my static IP address if you would rather whitelist me than make the agent public.
We had previously looked at asking people who were interested in contributing to reach out, and in a few cases we streamed their metrics to some parent nodes in our cloud. But most of the features we are looking at for the short to medium term follow the pattern: pull some data from the agent, do some ML, display results and insights. So full-on streaming of all metrics is overkill for what we need, and honestly a bit of an overhead for me to manage too. A list of publicly accessible Netdatas that I can pull some data from every now and then while iterating on the POCs would be much cleaner and would probably scale better.
To give a concrete example, since you have read this far: at the moment I’m working on a POC that uses traditional market-basket analysis, aka association rule mining (I’m sure many have heard the beer-and-diapers story before). We take your last N alarm events, or all alarms in the last N hours, add a time window around each alarm, and use the alarms that fall inside it as a “basket of alarms”. Then we can see whether association rule mining approaches find frequent sets of co-occurring alarms, and maybe even some association rules that turn out to be useful or interesting - especially when troubleshooting a period with lots of alarms, for example.
I have a POC working, but at the moment I’m pulling in the alarm log from the demo sites (e.g. https://london.my-netdata.io/api/v1/alarm_log) as the raw data, and as you can imagine this data is not really realistic or varied enough to get a feel for whether what I’ve done so far might be useful.
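The basket-building step described above can be sketched in a few lines. This is a simplified illustration, not the actual POC code: it assumes you have already extracted (timestamp, alarm_name) pairs from the alarm_log response (roughly the “when” and “name” fields), and it counts frequent pairs directly rather than running a full Apriori-style miner.

```python
from collections import Counter
from itertools import combinations


def alarms_to_baskets(events, window=300):
    """Group alarm events into 'baskets' of co-occurring alarm names.

    events: list of (timestamp_seconds, alarm_name) tuples.
    window: seconds around each alarm to treat as co-occurring.
    Each alarm seeds its own basket, so nearby alarms appear in
    several baskets - a deliberate simplification for this sketch.
    """
    events = sorted(events)
    baskets = []
    for ts, _ in events:
        basket = frozenset(name for t, name in events if abs(t - ts) <= window)
        if len(basket) > 1:  # a basket of one alarm tells us nothing
            baskets.append(basket)
    return baskets


def frequent_pairs(baskets, min_support=2):
    """Count alarm pairs that co-occur in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy example: "cpu" and "ram" alarms keep firing close together.
events = [(0, "cpu"), (100, "ram"), (1000, "disk"),
          (1100, "cpu"), (2000, "cpu"), (2100, "ram")]
print(frequent_pairs(alarms_to_baskets(events), min_support=3))
# → {('cpu', 'ram'): 4}
```

From frequent pairs like these you can then derive association rules (“when the cpu alarm fires, the ram alarm usually fires too”) by comparing pair counts against individual alarm counts.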
I think there could be a useful feature there, but it would be a lot better to run it against a much wider range of Netdata agents to get a good feel for how useful (or not) it is on average, and to learn which parts of the approach need special handling or redesign - in terms of both the ML framing and how it could fit into the product.
Anyway, this is starting to look like a wall of text, so I’ll stop here.
P.S. I’d be happy to do some sessions sharing examples of these POC-type features running on your own data, to see if this is something you would find useful as a proper product feature somewhere in Netdata.
Any help is much appreciated! I might even be able to get some swag sent your way.