DBEngine files format/specification

Hi,

We currently have a netdata streaming setup with 5 servers mirrored on a “monitor”.

I’d like to access the metrics to do some stats/simulations/etc. Is there any more documentation on the file format of the DBEngine used by netdata? How can I read these files (I mean short of going through the source code)…?

Many thanks for any input


Hey @Skykeeper,

Thanks for reaching out. I think the best way forward would be to use the Netdata Agent API. We use it in our ML efforts, where we extract metrics from Netdata Agents, and we have even shipped a Python library that extracts the data and converts it to pandas DataFrames. @andrewm4894 is the :brain: behind these efforts, so he might have more insights into this.

P.S. Welcome to the community, do stick around and ping us whenever we can help with anything :slight_smile:

Hi, I have sort of built my own PyPI library that just pulls the data from the REST API, for multiple hosts if you like, into a pandas DataFrame. So if you're looking to just get your data into Python and a pandas df, that’s what I’ve been using as the first step in my own internal ML-related research.

Here are some examples of using it in a Jupyter notebook, in the netdata/community repo on GitHub under netdata-agent-api/netdata-pandas.

Just let me know if you run into any issues using it, if Python is something you're looking to use.

If Python and pandas are not your thing, then the REST API is for sure the best place to start.

P.S. I know we are using SQLite a bit more in places going forward, which would be cool as you could potentially just plug into the db using whatever driver or client. But for now, from what I could gather, the db engine would need to be queried via the underlying C API, which was a bit of a stretch for me as I don’t know C.
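To give a rough idea of what "starting from the REST API" looks like, here is a minimal stdlib-only sketch that builds a `/api/v1/data` request for one chart and flattens the `format=json` response (a `labels` list plus a `data` list of rows) into dicts. The helper names (`build_data_url`, `rows_from_payload`, `fetch_chart`) and the `localhost:19999` host are just illustrative assumptions, not part of any Netdata library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_data_url(host, chart, after=-600, before=0, points=0):
    """Build a /api/v1/data URL for one chart on one agent (hypothetical helper)."""
    query = urlencode({
        "chart": chart,
        "after": after,    # negative = seconds relative to `before`
        "before": before,  # 0 = now
        "points": points,  # 0 = all available points
        "format": "json",
    })
    return f"http://{host}/api/v1/data?{query}"


def rows_from_payload(payload):
    """Turn a format=json payload into a list of {label: value} dicts."""
    labels = payload["labels"]
    return [dict(zip(labels, row)) for row in payload["data"]]


def fetch_chart(host, chart, **kwargs):
    """Fetch one chart from a running agent and return its rows."""
    with urlopen(build_data_url(host, chart, **kwargs), timeout=10) as resp:
        return rows_from_payload(json.load(resp))


if __name__ == "__main__":
    # Assumes an agent is listening on localhost:19999.
    for row in fetch_chart("localhost:19999", "system.cpu", after=-60)[:5]:
        print(row)
```

The list of dicts can be passed straight to `pandas.DataFrame(rows)` if you do want a DataFrame at the end.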


Thank you @OdysLam for the kind welcome and reply, and thanks @andrewm4894 for linking me to your repo. I will look at what you have, see if I can get inspired for my purposes. I am not really bound to a programming language, so any way to get it done is the good way I guess :wink:

I had also considered the API, but it seems a shame to go through all the socketing for data I have available locally, and I figured I don’t really want to export to another backend if all the data is already on my disk…

Anyway I will start from your project and see how I can go… worst comes to worst, I will spend time reading the C source, or export :smiley:

Cheers


I think the next best solution is to use the export mechanism and put it in a TSDB or something like Mongo, then you can easily use it in any way you prefer.

I wouldn’t go through the source code and dbengine directly. It seems far more complex, and I am not sure it is even possible.

OK I will keep this in mind.
Do you know whether, if I set up the export to MongoDB, my backlog (= 1 month) of data will be exported as well, or just the new data being collected?

@Skykeeper the netdata-pandas lib makes async calls, so that might help a bit there. I have been wanting to get an endpoint added to the REST API whereby you could query multiple (or all) charts via /charts or /data or some new endpoint. At the moment you kinda have to query each chart individually, which can be a bit annoying to work with when you just want all the data for some time range or a subset of charts in one batch.
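Until something like that exists, the per-chart requests can at least be issued in parallel and stitched together afterwards. The sketch below is one way to do it with a thread pool; it is not how netdata-pandas is implemented. `fetch_one` is assumed to be any callable `(host, chart) -> list of row dicts with a "time" key` that you supply yourself, and the `chart|dimension` column naming is an arbitrary choice for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch_one, host, charts, max_workers=8):
    """Fetch several charts concurrently; returns {chart: rows}."""
    charts = list(charts)  # materialise so we can pair ids with results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip lines results up with charts.
        results = pool.map(lambda chart: fetch_one(host, chart), charts)
        return dict(zip(charts, results))


def merge_by_time(per_chart):
    """Join rows from all charts on 'time', prefixing columns with the chart id."""
    merged = {}
    for chart, rows in per_chart.items():
        for row in rows:
            slot = merged.setdefault(row["time"], {"time": row["time"]})
            for key, value in row.items():
                if key != "time":
                    slot[f"{chart}|{key}"] = value
    # One row per timestamp, oldest first.
    return [merged[t] for t in sorted(merged)]
```

Threads are enough here because the work is network-bound; one request per chart is still made, it just stops being sequential.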

I will make a feature request for it actually, and we can see if people vote on it. That might help get it prioritized, so it's not just me asking for it :slight_smile:

I made a feature request here, so we can see if anyone else in the community is interested over the next while.

https://community.netdata.cloud/t/ability-to-get-data-for-multiple-or-all-charts-across-a-user-defined-time-range-via-rest-api/656