I have three Linux machines acting as Child nodes, each streaming ~5000 Netdata metrics to one Parent Node. Each of these three machines is consuming very high bandwidth, about 110 GB/month (~45 kB/s).
They all run the same Netdata version (v2.1.1) and the same Linux firmware.
The Child nodes are remote and on different networks. They use a VPN service called Tailscale to stream data to the Parent Node, which is a virtual machine in Google Cloud Platform (GCP).
Relevant docs you followed/actions you took to solve the issue
Netdata Data Retention for 1000 metrics = ~60 MB/month
I used the Linux command-line tool Nethogs, which reports the kB/s consumed by Tailscale. I collected Nethogs logs for 24 hours on all six machines, both WITH and WITHOUT Netdata Child-Parent streaming:
Here are the results:
WITH Netdata Child-Parent streaming: ~45 kB/s
WITHOUT Netdata Child-Parent streaming: ~20 B/s
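For context, here is a rough back-of-the-envelope conversion of those sustained rates into monthly volume (assuming a 30-day month), which roughly matches the ~110 GB/month figure above:

```python
# Rough conversion of a sustained rate (bytes/second) into monthly volume.
# Assumes a 30-day month; real months vary slightly.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s

def monthly_volume_gb(rate_bytes_per_s: float) -> float:
    """GB transferred per month at the given sustained rate."""
    return rate_bytes_per_s * SECONDS_PER_MONTH / 1e9

print(monthly_volume_gb(45_000))  # ~45 kB/s -> ~116 GB/month
print(monthly_volume_gb(20))      # ~20 B/s  -> ~0.05 GB/month (negligible)
```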
I expected the streaming to consume about 300 MB/month for the ~5000 metrics my Child nodes are streaming, calculated per the official Netdata documentation. [Operational Considerations Long-Term Data Storage and Retention in Netdata | Netdata]
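For reference, this is the arithmetic behind that expectation: I simply scaled the documented figure (~60 MB/month per 1000 metrics) linearly to my metric count.

```python
# My estimate: linear scaling of the documented ~60 MB/month per 1000 metrics.
mb_per_month_per_1000_metrics = 60
metrics = 5000

expected_mb_per_month = mb_per_month_per_1000_metrics * metrics / 1000
print(expected_mb_per_month)  # 300 MB/month
```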
Could you help me understand why there is such a big variation in bandwidth?
Thank you
Hi, @meghnamscs. To reduce the data volume, you will need to limit the number of metrics being collected. This can be accomplished by disabling collectors that aren’t essential to your needs. You can also adjust the data collection frequency (for example, changing from every 1 minute to every 5 minutes) to decrease the volume.
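As a rough illustration of why the collection interval matters (a sketch only, under the simplifying assumption that streamed volume scales roughly linearly with the number of samples sent), compare the samples per month at different intervals:

```python
# Illustrative only: streamed volume scales roughly with the number of samples sent.
SECONDS_PER_MONTH = 30 * 24 * 3600

def samples_per_month(metrics: int, update_every_s: int) -> int:
    """Total samples a Child would send per month at the given interval."""
    return metrics * SECONDS_PER_MONTH // update_every_s

for interval_s in (1, 5, 60):
    print(f"{interval_s:>2} s interval: {samples_per_month(5000, interval_s):,} samples/month")
# 1 s  -> 12,960,000,000 samples/month
# 5 s  ->  2,592,000,000 samples/month (5x fewer)
# 60 s ->    216,000,000 samples/month (60x fewer)
```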
I expected the streaming to consume about 300 MB/month for the ~5000 metrics
Thanks for those suggestions, but I want to know how I can calculate how much bandwidth Netdata is supposed to consume to transfer 5000 metrics from a Child to the Parent. Isn’t this documentation the right resource for arriving at my figure of 300 MB/month? Long-Term Data Storage and Retention in Netdata | Netdata
These are my testing steps, in case they help us get to the bottom of the issue:
Test 1-
Netdata Child-Parent streaming is the only service using our VPN (Tailscale) to send data to the Parent Node, and turning OFF Child-Parent streaming produced a drastic decrease in the bandwidth Tailscale uses. That confirms Netdata is indeed what is consuming the high bandwidth.
Here is the observation:
(1) Machine 1: SEM-SC
(a) WITH NETDATA CHILD-PARENT STREAMING: Mean Bandwidth=41.8 kB/s
Test 2-
I performed the same tests on two similar machines, and they showed much lower bandwidth consumption, which is strange.
(1) Machine 3: HARV
WITH NETDATA CHILD-PARENT STREAMING: Mean Bandwidth=256 B/s
The OS, firmware, and architecture of all four machines are the same, and they all run the same Netdata version: v2.1.1.
They are all remote and on different networks.
All of them are running the default netdata.conf configuration and are streaming about the same number of metrics (~5000) at a 1 s update interval. None of them has been tuned, so I would expect the bandwidth to be roughly the same across all of them. If Netdata can stream 5000 metrics at an average of ~300 B/s on one setup, why does it need ~40 kB/s on the other setups? That is the question I’m trying to answer.
As the Netdata team, you can help me get insight into why this is happening. Please let me know what you think about this problem and what you can gather from my observations.
Expecting only 300 MB/month of total traffic when streaming 5000 metrics at 1-second granularity is unrealistic. That figure is not achievable given the data volume involved, even with the ZSTD compression Netdata applies to streaming between instances.
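As a rough sanity check (the bytes-per-sample values below are illustrative assumptions, not exact figures from the streaming protocol), even a tiny effective per-sample cost adds up quickly at 1-second granularity for 5000 metrics:

```python
# Back-of-the-envelope: 5000 metrics, one sample per second each, 30-day month.
# bytes_per_sample values are assumed for illustration, not protocol constants.
SECONDS_PER_MONTH = 30 * 24 * 3600
metrics = 5000

for bytes_per_sample in (2, 10, 25):
    monthly_gb = metrics * SECONDS_PER_MONTH * bytes_per_sample / 1e9
    print(f"{bytes_per_sample:>2} B/sample -> ~{monthly_gb:.0f} GB/month")
# 2  B/sample -> ~26 GB/month
# 10 B/sample -> ~130 GB/month
# 25 B/sample -> ~324 GB/month
```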
What is a realistic bandwidth range that I can expect?
In my experiments I’m seeing two very different figures: ~600 MB/month on one machine and ~100 GB/month on the others.
My remote machines will have a limited bandwidth quota of about 2 GB/month, and I want to get Netdata streaming working within that limit. It is critical for my work to know how much bandwidth Netdata is actually supposed to consume and how I can estimate it.
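For reference, this is the sustained rate my quota allows (assuming a 30-day month), which is the budget Netdata streaming would need to fit within:

```python
# Average sustained rate that fits within a 2 GB/month quota (30-day month assumed).
SECONDS_PER_MONTH = 30 * 24 * 3600
quota_bytes = 2e9

print(quota_bytes / SECONDS_PER_MONTH)  # ~771 bytes/second on average
```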