Hey there, do you see any problems when you run speedtest manually from the command line as the netdata user? (This needs to be done once manually to accept the license agreements. You probably know this from the blog already.)
Trying to understand if the speedtest CLI itself runs without issues.
That’s interesting. I just tried this out but it won’t even display the actual charts.
The ‘speedtest’ entry shows up under the overview menu on the node’s local web UI, but when I click it I just get a bunch of recycle/refresh symbols followed by “netdata”, which I take to be placeholders for the actual charts.
When running the plugin in debug mode I can see the charts being generated but where should I see the values for the charts? In @exa’s logs above and in mine I don’t see the actual metrics. I can see from the network activity that it is running a speedtest because the bandwidth profile is the same as when I run speedtest from the command-line manually. So I’m guessing the issue is with the plugin itself.
Also testing this on a Raspberry Pi 4B 4GB (Ubuntu 22.04.1 LTS).
OK, I’ve got data to show up.
The issue here is that the test/sampling happens every 30 minutes at the top of the hour and bottom of the hour, e.g. 10:00/10:30, 11:00/11:30, and so on.
So, on the default view which shows the last 5 minutes the chart shows no information. If I expand it out to over the past 6 hours then I get data back.
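To make the empty five-minute view concrete, here is a minimal sketch that lists which samples fall inside a given time window. It assumes samples land exactly on clock-aligned 30-minute marks, as described above; `samples_in_window` is a hypothetical helper for illustration, not part of the plugin or of Netdata.

```python
from datetime import datetime, timedelta


def samples_in_window(now, window, interval=timedelta(minutes=30)):
    """Return the assumed sample times that fall inside [now - window, now].

    Assumption: samples are taken on clock-aligned multiples of `interval`
    (e.g. 10:00, 10:30, 11:00 for a 30-minute interval).
    """
    start = now - window
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = start - midnight
    # Ceiling division: number of whole intervals needed to reach `start`.
    steps = -((-elapsed) // interval)
    t = midnight + steps * interval
    samples = []
    while t <= now:
        samples.append(t)
        t += interval
    return samples


if __name__ == "__main__":
    now = datetime(2023, 1, 1, 10, 18)
    print(samples_in_window(now, timedelta(minutes=5)))      # no samples
    print(len(samples_in_window(now, timedelta(hours=6))))   # several samples
```

With a window shorter than the sampling interval, the list is often empty, which matches the blank chart on the default view.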
Not ideal but at least I understand it.
The second issue (sorry, I realise this is a community plugin, but just want to be thorough) is that if I select a time frame for which there is only one data-point (e.g., the time is now 22:18 and I’ve set the UI to display the past 30 minutes) then the rendering gets rather messed up. I end up with a dot representing the data point on each chart sat hard to the left of the chart.
That doesn’t make sense in the first place: the sample must be from 22:00, and the time window starts at 22:18 minus 30 minutes, so shouldn’t I see the chart starting from 21:48 with a dot at 22:00?
But also, I don’t seem to see the information about the metric timestamp come up like it does for other charts and so it makes it hard to be sure that I am seeing the proper time for the metric rather than just getting a weird rendering.
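For the single-data-point case, the expected placement can be worked out with a few lines of arithmetic. This is only a sketch of where I would expect the dot to land if the x-axis is linear in time; `expected_position` is a hypothetical helper, and how the dashboard actually lays out a lone point is not something I have verified.

```python
from datetime import datetime, timedelta


def expected_position(now, window, sample):
    """Return the window start and the sample's fractional x-position.

    0.0 means the left edge of the chart, 1.0 the right edge.
    Assumption: the chart's x-axis is linear in time over [now - window, now].
    """
    start = now - window
    fraction = (sample - start) / window
    return start, fraction


if __name__ == "__main__":
    now = datetime(2023, 1, 1, 22, 18)
    sample = datetime(2023, 1, 1, 22, 0)
    start, fraction = expected_position(now, timedelta(minutes=30), sample)
    print(start.strftime("%H:%M"), fraction)
```

Under that assumption the 22:00 sample sits 12 minutes into a 30-minute window, so it should appear well inside the chart rather than hard against the left edge.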
The third issue is that no matter what time-frame I select, I get empty charts in the netdata cloud view.
Follow-up. Luis was correct and I was able to duplicate what he mentions below. Perhaps, there is another way to do this. I’m disappointed in graphing/charting. Again, it’s a metrics/timing thing and how the plugin works. Thanks again.
I actually opened a few PRs to change the default to 30 minutes and add some warnings.
Scary cloud cost story time…
I actually had this enabled on all 20 of my ML demo nodes running on GCP, with an update_every of 2 seconds.
It ended up burning through over $1k USD per day for a day or two due to ingress/egress costs, because the speedtest was running constantly and the bandwidth on my GCP VMs is insane. I ended up with hundreds of GB being sent back and forth on all my nodes as part of the speedtest running every 2 seconds.
So I added the very conservative 30 minutes to make it much safer out of the box in case anyone ends up in a similar situation to mine. I also added some warnings to the README.
The idea being that this would be fine to run every 30 mins and then over a few days and weeks be useful for monitoring isp speed etc while not actually eating into your bandwidth. Typically it would not really make sense to run it at much more frequent intervals below maybe 10 or 15 minutes.
But it all depends really. The main thing was that the default update_every = 2 was very much not a “sane default”, as we found out the hard way.
I will add another note about this to the README. I can understand how it would appear confusing, as it won’t really start showing data properly until after a few hours with default settings (or not at all if using the default time filter of 15 minutes etc).
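A rough way to see why the interval matters so much is to estimate the daily transfer for a given setting. The sketch below is back-of-the-envelope only: the per-run data volume and run duration are made-up placeholder numbers, not measurements, and it assumes a new test cannot start until the previous one has finished.

```python
SECONDS_PER_DAY = 86_400


def runs_per_day(update_every_s):
    """Number of test runs per day if one run starts every `update_every_s` seconds."""
    return SECONDS_PER_DAY / update_every_s


def daily_transfer_gb(update_every_s, gb_per_run, run_duration_s):
    """Estimate GB transferred per day on one node.

    Assumption: runs do not overlap, so the effective interval is the larger
    of the configured interval and the time a single run takes.
    """
    effective_interval = max(update_every_s, run_duration_s)
    return runs_per_day(effective_interval) * gb_per_run


if __name__ == "__main__":
    # Hypothetical figures: 1 GB moved per run, 20 seconds per run.
    for interval in (2, 600, 1800):
        print(interval, daily_transfer_gb(interval, gb_per_run=1.0, run_duration_s=20))
```

Whatever the real per-run numbers are on a given link, the transfer scales inversely with the interval, which is why a 30-minute default is so much safer than a few seconds.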
So if you are on an unlimited connection on a raspberry pi and are OK to have speedtest download and upload a bunch of data then setting the speedtest_update_every variable in the speedtest.conf file to a smaller value should do the trick.
I think the default you set is a good one given the potential impact. I also think that the prominence it has on the project page is perfect.
Yes, I have now tested that too but in a way it’s potentially somewhat misleading because it implies that anything less than 30 minutes per sampling will work when it won’t. Even a 4 minute sampling won’t work (it’ll work locally but not on the cloud dashboard).
The issue here is that there are really good reasons why you might not want to sample every second. Obviously this is a fundamental discussion about netdata architecture but a while back I raised the issue with the pihole plugin collecting data every second. It does this by querying a web API interface which then generates an entry in an access logfile; and I’d imagine it does this for a lot of other modules. The problem then is that you fill up logs with noise that could well cause you to lose data related to issues or even security because the additional activity causes logs to roll-over so quickly; and even if you store them you have an awful lot of chaff to get through when checking them.
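The log roll-over concern can be put into rough numbers. This is a sketch under stated assumptions: the log size cap and bytes per entry are placeholder values, `hours_until_rollover` is a hypothetical helper, and real rotation policies (time-based, compressed, multiple generations) will behave differently.

```python
def hours_until_rollover(max_log_bytes, bytes_per_entry, poll_interval_s,
                         other_entries_per_hour=0):
    """Estimate how many hours a size-capped log lasts before rolling over.

    Assumption: every poll writes exactly one access-log entry of
    `bytes_per_entry` bytes, on top of `other_entries_per_hour` real entries.
    """
    entries_per_hour = 3600 / poll_interval_s + other_entries_per_hour
    return max_log_bytes / (entries_per_hour * bytes_per_entry)


if __name__ == "__main__":
    # Hypothetical figures: 7.2 MB cap, 200 bytes per entry.
    print(hours_until_rollover(7_200_000, 200, poll_interval_s=1))
    print(hours_until_rollover(7_200_000, 200, poll_interval_s=60))
```

With these placeholder figures, polling once a minute instead of once a second stretches the retention window by a factor of 60, which is the gist of the argument for configurable sampling intervals.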
Might I suggest that a compromise might be an RFC to enable displaying of time-series data with custom sampling intervals via the custom dashboards feature? That way no major changes would probably have to be made to the code for the main overview and node views but would allow people to build out and display their data.
Have you run speedtest-cli as a Netdata user once to accept the license agreements?
Have you reduced the time interval to something smaller to see the charts being populated sooner? (As mentioned above this will increase data consumption so be careful)
If you have done both of the above, then I think the problem is due to the path that the install script (community/install-collector.sh on the main branch of the netdata/community GitHub repository) uses when copying the collector code. Please verify whether the path used is different from the Netdata path on your system. If it is, you can either update the script or do the steps manually.
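One quick way to check the path is to see which candidate directories actually exist on the machine. The sketch below is illustrative only: the two directories listed are assumptions about common Netdata install layouts (a package-style install and a static install under /opt/netdata), and the directory the install script really targets should be read from the script itself.

```python
from pathlib import Path

# Assumed candidate locations; adjust to match your install and the script.
CANDIDATE_DIRS = [
    "/usr/libexec/netdata/charts.d",
    "/opt/netdata/usr/libexec/netdata/charts.d",
]


def existing_dirs(candidates):
    """Return the candidate paths that exist as directories on this system."""
    return [Path(c) for c in candidates if Path(c).is_dir()]


if __name__ == "__main__":
    found = existing_dirs(CANDIDATE_DIRS)
    if found:
        for path in found:
            print("found:", path)
    else:
        print("none of the assumed directories exist; check your install prefix")
```

If the directory the script copies into is not among the ones that exist, that mismatch would explain why the collector never loads.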