I’m new here, but I’ve been using Netdata for a while. Recently I’ve been wanting to expand its capabilities by monitoring more things.
I discovered this project through one of Netdata’s own posts: How to monitor Internet quality and ISP performance with Netdata | Netdata Blog
I’ve had nothing but problems with it so far, but I’d like to get it working. Here’s what I’ve done:
- Installed on two Raspberry Pis. The host OS was DietPi, which is based on Debian. The charts weren’t being displayed (Overview).
- Repeated the setup with a different OS, Raspbian Lite. This time the speedtest ran in debug mode and the charts appeared in Overview, but no data was returned inside the charts.
Yesterday the charts were there, but with a red critical alert: the node didn’t have the requested context.
Today, I’m getting ‘no data’.
2022-12-14 07:28:27: charts.d: INFO: main: started from './charts.d.plugin' with options: speedtest
2022-12-14 07:28:27: charts.d: INFO: main: Configuration file '/usr/lib/netdata/conf.d/charts.d.conf' loaded.
/etc/netdata/charts.d.conf: line 1: speedtest: yes: command not found
2022-12-14 07:28:27: charts.d: ERROR: main: Config file '/etc/netdata/charts.d.conf' loaded with errors.
2022-12-14 07:28:27: charts.d: INFO: example: is disabled. Add a line with example=force in '/etc/netdata/charts.d.conf' to enable it (or remove the line that disables it).
2022-12-14 07:28:27: charts.d: DEBUG: speedtest: is enabled for auto-detection.
2022-12-14 07:28:27: charts.d: DEBUG: speedtest: loading module: './../charts.d/speedtest.chart.sh'
2022-12-14 07:28:27: charts.d: DEBUG: speedtest: not found module configuration: '/usr/lib/netdata/conf.d/charts.d/speedtest.conf'
2022-12-14 07:28:27: charts.d: DEBUG: speedtest: loading module configuration: '/etc/netdata/charts.d/speedtest.conf'
2022-12-14 07:28:43: charts.d: DEBUG: speedtest: module 'speedtest' activated
2022-12-14 07:28:43: charts.d: DEBUG: main: activated modules: speedtest
2022-12-14 07:28:43: charts.d: DEBUG: main: requested to run only for: 'speedtest'
2022-12-14 07:28:43: charts.d: DEBUG: main: activated charts: speedtest
2022-12-14 07:28:43: charts.d: DEBUG: speedtest: calling 'speedtest_create()'...
CHART speedtest.download_speed '' 'Download Bandwidth' 'kilobits/s' 'speed' 'speedtest.download' area 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.download_speed '' absolute 1 1
CHART speedtest.upload_speed '' 'Upload Bandwidth' 'kilobits/s' 'speed' 'speedtest.upload' area 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.upload_speed '' absolute 1 1
CHART speedtest.packet_loss '' 'Packet Loss' 'packet loss %' 'loss' 'speedtest.packetloss' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.packet_loss '' percentage-of-absolute-row 1 1
CHART speedtest.idle_latency '' 'Idle Latency' 'milliseconds' 'latency' 'speedtest.idle_latency' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.idle_latency '' absolute 1 1
CHART speedtest.download_latency '' 'Download Latency' 'milliseconds' 'latency' 'speedtest.download_latency' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.download_latency '' absolute 1 1
CHART speedtest.upload_latency '' 'Upload Latency' 'milliseconds' 'latency' 'speedtest.upload_latency' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.upload_latency '' absolute 1 1
CHART speedtest.idle_jitter '' 'Idle Jitter' 'milliseconds' 'jitter' 'speedtest.idle_jitter' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.idle_jitter '' absolute 1 1
CHART speedtest.download_jitter '' 'Download Jitter' 'milliseconds' 'jitter' 'speedtest.download_jitter' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.download_jitter '' absolute 1 1
CHART speedtest.upload_jitter '' 'Upload Jitter' 'milliseconds' 'jitter' 'speedtest.upload_jitter' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.upload_jitter '' absolute 1 1
CHART speedtest.download_bytes '' 'Bytes downloaded' 'bytes' 'bytes transmitted' 'speedtest.download_bytes' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.download_bytes '' absolute 1 1
CHART speedtest.upload_bytes '' 'Bytes uploaded' 'bytes' 'bytes transmitted' 'speedtest.upload_bytes' line 150000 1800 '' '' 'speedtest'
DIMENSION speedtest.upload_bytes '' absolute 1 1
2022-12-14 07:28:43: charts.d: DEBUG: speedtest: 'speedtest' initialized.
2022-12-14 07:28:43: charts.d: DEBUG: main: run_charts=' speedtest'
CHART netdata.plugin_chartsd_speedtest '' 'Execution time for speedtest plugin' 'milliseconds / run' charts.d netdata.plugin_charts area 145000 1800 '' '' 'speedtest'
DIMENSION run_time 'run time' absolute 1 1
2022-12-14 07:28:43: charts.d: DEBUG: speedtest: sleeping for 76.659 seconds.
./loopsleepms.sh.inc: line 97: /tmp/.netdata_bash_sleep_timer_fifo: Permission denied
./charts.d.plugin: Cannot use read for sleeping (return code 1).
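Two lines in that output look suspicious to me: the `speedtest: yes: command not found` parse error and the fifo `Permission denied`. Based on the charts.d documentation, charts.d.conf is sourced as a bash script, so I suspect fixes along these lines (paths taken from the log above; the fifo fix assumes a stale fifo left over from running the plugin as a different user):

```shell
# Fix 1: /etc/netdata/charts.d.conf is sourced by bash, so options need
# shell assignment syntax ("speedtest=force"), not "speedtest: yes":
sudo sed -i 's/^speedtest: yes$/speedtest=force/' /etc/netdata/charts.d.conf

# Fix 2: the "Permission denied" on the sleep-timer fifo can happen when a
# stale fifo was created by a different user (e.g. a debug run as root);
# removing it lets the plugin recreate it with the right ownership:
sudo rm -f /tmp/.netdata_bash_sleep_timer_fifo
```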
Thanks in advance.
Hey there, do you see any problems when you run speedtest manually from the command line as the netdata user? (This needs to be done once manually to accept the license agreements. You probably know this from the blog already.)
Trying to understand if the speedtest CLI itself runs without issues.
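For reference, assuming the Ookla speedtest CLI is installed and the agent runs as the netdata user, the check can be done like this (the two flags are the CLI’s non-interactive license switches):

```shell
# Run one test as the same user the plugin runs under, accepting the
# Ookla license non-interactively:
sudo -u netdata speedtest --accept-license --accept-gdpr
```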
Yep, sure did. Saw the license and agreed.
That’s interesting. I just tried this out but it won’t even display the actual charts.
The ‘speedtest’ entry shows up under the overview menu on the node’s local web UI, but when I click it I just get a bunch of recycle/refresh symbols followed by “netdata”, which I take to be placeholders for the actual charts.
When running the plugin in debug mode I can see the charts being generated but where should I see the values for the charts? In @exa’s logs above and in mine I don’t see the actual metrics. I can see from the network activity that it is running a speedtest because the bandwidth profile is the same as when I run speedtest from the command-line manually. So I’m guessing the issue is with the plugin itself.
Also testing this on a Raspberry Pi 4B 4GB (Ubuntu 22.04.1 LTS).
Let me reproduce these steps on my raspberry pi, to get to the bottom of this.
OK, I’ve got data to show up.
The issue here is that the test/sampling happens every 30 minutes, at the top and bottom of the hour, e.g. 10:00/10:30, 11:00/11:30, and so on.
So, on the default view, which shows the last 5 minutes, the charts show no information. If I expand the window to the past 6 hours, then I get data back.
Not ideal, but at least I understand it.
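A quick way to verify this without widening the dashboard window is to query the agent’s data API directly (chart id taken from the debug output earlier; port 19999 is the agent default):

```shell
# Ask the local agent for the last 6 hours of the download chart; a
# non-empty "data" array confirms points exist even when the default
# dashboard window looks blank:
curl -s "http://localhost:19999/api/v1/data?chart=speedtest.download_speed&after=-21600"
```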
The second issue (sorry, I realise this is a community plugin, but I just want to be thorough) is that if I select a time frame containing only one data point (e.g. it is now 22:18 and I’ve set the UI to display the past 30 minutes), then the rendering gets rather messed up. I end up with a dot representing the data point sitting hard against the left edge of each chart.
In the first place, that doesn’t make sense: the sample must be from 22:00, but the window is 22:18 minus 30 minutes, so shouldn’t I see the chart starting from 21:48 with a dot at 22:00?
But also, I don’t seem to see the metric’s timestamp come up like it does for other charts, so it’s hard to be sure I’m seeing the proper time for the metric rather than just a weird rendering.
The third issue is that no matter what time-frame I select, I get empty charts in the netdata cloud view.
Hope that helps!
Thanks, sir, for looking into this.
Follow-up: Luis was correct, and I was able to duplicate what he describes. Perhaps there is another way to do this. I’m disappointed in the graphing/charting, but again, it’s a metrics/timing thing and a matter of how the plugin works. Thanks again.
Thanks @Luis_Johnstone this explains it alright.
I actually made a few PRs to change the default to 30 minutes and add some warnings.
Scary cloud cost story time…
I actually had this enabled on all 20 of my ML demo nodes running on GCP, with an update_every of 2 seconds.
It ended up burning through over $1k USD per day for a day or two, due to the ingress/egress costs of the speedtest constantly running and the enormous bandwidth available on my GCP VMs. I ended up with hundreds of GB being sent back and forth on all my nodes as a result of the speedtest running every 2 seconds.
Some images below show this in the GCP console.
So I added the very conservative 30-minute default to make it much safer out of the box, in case anyone ends up in a similar situation, and also added some warnings to the README.
The idea is that running every 30 minutes is fine, and over a few days and weeks the data becomes useful for monitoring ISP speed without actually eating into your bandwidth. Typically it would not make sense to run it at intervals much below 10 or 15 minutes.
But it all depends, really. The main thing is that the default update_every = 2 was very much not a “sane default”, as we found out the hard way.
I will add another note about this to the README. I can understand how it would appear confusing, since with the default settings it won’t really start showing data properly until after a few hours (or not at all if you’re using the default time filter of 15 minutes).
@Luis_Johnstone @exa @andrewm4894 it appears the charting/graphing gets messed up when the sampling happens as slowly as every 30 minutes.
To test this, I set the sampling (update_every) to 3 minutes (180 seconds), and it seems to work well on the local dashboard as well as in Netdata Cloud.
So if you are on an unlimited connection on a Raspberry Pi, and are OK with speedtest downloading and uploading a bunch of data, then setting the speedtest_update_every variable in the speedtest.conf file to a smaller value should do the trick.
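For example (paths assume a standard install; on most installs you’d open the file via Netdata’s edit-config helper):

```shell
# Open the module config:
#   sudo /etc/netdata/edit-config charts.d/speedtest.conf
# and set the collection interval in seconds, e.g. 3 minutes:
speedtest_update_every=180
# then restart the agent:
#   sudo systemctl restart netdata
```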
I think the default you set is a good one given the potential impact. I also think that the prominence it has on the project page is perfect.
Yes, I have now tested that too, but in a way it’s potentially somewhat misleading, because it implies that anything below 30 minutes per sample will work, when it won’t. Even a 4-minute sampling interval won’t work (it works locally but not on the cloud dashboard).
The issue here is that there are really good reasons why you might not want to sample every second. Obviously this is a fundamental discussion about Netdata architecture, but a while back I raised this same issue with the Pi-hole plugin, which collects data every second by querying a web API; each query generates an entry in an access log, and I’d imagine the same is true for a lot of other modules. The problem is that you fill your logs with noise, which could cause you to lose data related to real issues or even security incidents, because the extra activity makes the logs roll over so quickly; and even if you store them, you have an awful lot of chaff to get through when checking them.
Might I suggest a compromise: an RFC to enable displaying time-series data with custom sampling intervals via the custom dashboards feature? That way no major changes would need to be made to the code for the main overview and node views, but it would still allow people to build out and display their data.
I’ve now found that 3 minutes (180 seconds) works locally but not on the cloud. It seems like something more fundamental might be going on here.
Can anyone else repro what I’m seeing?
I’m having the same problem. Can anyone suggest a better solution so that it doesn’t come back?