I use Netdata to monitor my production app but don’t check it that often (a couple of times a month) outside of when we get issues reported. I’ve been using it for over a year, but in the last six months I’ve found that the dashboards I create break after a short time and I have to recreate them. It’s very annoying and, to be honest, is spoiling the whole tool for me at the moment.
Here is an example of what one of these dashboards looks like:
The node is fine, I can see it in the nodes list and it’s reporting metrics. The dashboard just shows nothing. In the console there are a bunch of 404 errors like this:
It’s good that you don’t check Netdata that often; it means that your systems are working optimally.
Just to verify my understanding: you mean that the custom dashboards you create through Netdata Cloud become “unusable”, as shown in the screenshot, after a random amount of time?
When I first set up ND I created a bunch of dashboards. They worked fine for ages (months) until a recent(ish) update. It was around the time that the non-white UI came in I think. Anyway, I’ll create the custom dashboards, usually only connected to a single node, and they work fine. To confirm they are on the ND Cloud product. Fast forward an unknown amount of time (maybe a couple of weeks) and they will stop working.
I’ve just gone and checked and currently it’s only affecting 2 of my nodes, but again I haven’t changed anything and the nodes are both accessible via the Node tab.
Screenshot of all my dashboards: those marked X are completely unusable (single node only). Those marked \ are partially unusable and show no data for the affected nodes.
Yes, @OdysLam. It is something we need to improve. Soon we will be able to better detect that it is the same node, even if it was claimed from scratch (using the machine_guid).
@swhitf Is it possible that you run the claiming script on machine startup, or as part of another script?
Now that I think about it, I might have had an issue with this node being “not reachable” and tried reclaiming to get it working again; I have a vague memory of something like this. Given how fast time appears to be moving in my life at the moment, it’s also possible I’m way off with my timelines.
So basically, if I reclaim, I have to recreate the dashboards for the respective node? If that’s the case at least I know why it’s happening and will be more mindful next time. If it happens again and I definitely didn’t reclaim myself I’ll be back in touch. I’m not running it automatically anywhere.
Anyway, thanks for your prompt responses, great support. Love the product also, waiting for the day I can pay for it!
So basically, if I reclaim, I have to recreate the dashboards for the respective node?
Yes, this is the case for now. We already plan to improve this behavior so that, even after a reclaim, the cloud will detect that it is the same node.
Also, we will communicate better that the particular node is actually offline, instead of this misleading loading graphic. Thanks for bringing this to our attention.
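To illustrate the reclaim behavior discussed above, here is a minimal sketch of why a dashboard can lose its node after a reclaim and how keying on the machine_guid could avoid that. This is not how Netdata Cloud is implemented: `NodeRegistry`, `claim_without_guid`, `claim`, and the GUID value are all hypothetical names made up for the example.

```python
import uuid


class NodeRegistry:
    """Hypothetical registry; names and behavior are illustrative only."""

    def __init__(self):
        # Maps a machine_guid to the node ID that dashboards would reference.
        self._node_id_by_guid = {}

    def claim_without_guid(self):
        """Every claim creates a brand-new node ID, orphaning old charts."""
        return str(uuid.uuid4())

    def claim(self, machine_guid):
        """Reuse the node ID already linked to this machine_guid, if any."""
        if machine_guid not in self._node_id_by_guid:
            self._node_id_by_guid[machine_guid] = str(uuid.uuid4())
        return self._node_id_by_guid[machine_guid]


registry = NodeRegistry()
guid = "11111111-2222-3333-4444-555555555555"  # made-up machine_guid

# Reclaiming the same machine yields the same node ID, so charts still resolve.
first_id = registry.claim(guid)
second_id = registry.claim(guid)

# Without the GUID lookup, each claim looks like a different node.
fresh_a = registry.claim_without_guid()
fresh_b = registry.claim_without_guid()
```

In the second case a dashboard that stored `fresh_a` would point at a node that no longer reports, which matches the symptom of charts that never load.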
Back again. I just tried changing the affected dashboard and had issues saving. I think it’s possible that this bug actually happened to me a few times without me realising, and it’s why I kept thinking it was re-happening.
Steps to reproduce are in a video here:
TL/DW though is that if you have a dashboard made up of only charts from a dead node, you can’t ever get it to save and have to delete and re-create.
Let me know if you have any questions, though I’m literally going on holiday in 5 minutes so I won’t reply until next week. Laters!
From our logs I have detected 14 warnings from when you tried to update your dashboard, around “2021-05-27T18:47:16.925136065Z” (UTC). The reason for those is “dashboard version conflict”.
This can happen when you try to edit a dashboard that is already being edited in another session. A simple page refresh should fix it.
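A “version conflict” of this kind is typical of optimistic concurrency control. The following is only a sketch of that general technique, assuming the server compares a version number sent with each save; `DashboardStore`, `VersionConflict`, and the chart names are hypothetical and not taken from the Netdata Cloud code.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a save is based on an outdated dashboard version."""


@dataclass
class DashboardStore:
    # Hypothetical in-memory store; not the Netdata Cloud implementation.
    version: int = 1
    charts: list = field(default_factory=list)

    def load(self):
        """Return the current version together with a copy of the charts."""
        return self.version, list(self.charts)

    def save(self, based_on_version, charts):
        """Accept the update only if the client edited the latest version."""
        if based_on_version != self.version:
            raise VersionConflict(
                f"client has v{based_on_version}, server has v{self.version}"
            )
        self.charts = list(charts)
        self.version += 1
        return self.version


store = DashboardStore()
tab_a_version, tab_a_charts = store.load()
tab_b_version, tab_b_charts = store.load()

store.save(tab_a_version, tab_a_charts + ["system.cpu"])  # tab A saves first

try:
    store.save(tab_b_version, tab_b_charts + ["system.ram"])  # tab B is stale
    conflict = False
except VersionConflict:
    conflict = True

# Reloading (the "page refresh") picks up the new version, so the save works.
tab_b_version, tab_b_charts = store.load()
store.save(tab_b_version, tab_b_charts + ["system.ram"])
```

The refresh only helps if it really fetches the latest version; a reload that is answered from a stale cache would leave tab B on the old version and the conflict would repeat.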
@novykh We have a caching issue as well: when I refresh the page (after modifying the dashboard in another tab), the request returns 304.
Cache related headers:
cache-control: max-age=0
if-modified-since: Wed, 19 May 2021 11:45:16 GMT
if-none-match: "60a4fa4c-a93"
We need to review our caching policies and middlewares.
If I disable the cache on chrome it works ok.
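For reference, the headers above can be decoded with a few lines of Python. The ETag has the shape nginx generates for static files (hex modification time, a dash, hex content length); that it was actually produced by nginx is an assumption. `decode_nginx_style_etag` and `conditional_status` are hypothetical helpers that only model a plain validator comparison, not our middleware.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def decode_nginx_style_etag(etag):
    """Split an ETag of the form "<hex mtime>-<hex size>" into its parts."""
    mtime_hex, size_hex = etag.strip('"').split("-")
    modified = datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc)
    return modified, int(size_hex, 16)


def conditional_status(request_headers, current_etag):
    """Return 304 when the client's validator matches, otherwise 200."""
    if request_headers.get("if-none-match") == current_etag:
        return 304
    return 200


# Request headers copied from the post above.
request_headers = {
    "cache-control": "max-age=0",
    "if-modified-since": "Wed, 19 May 2021 11:45:16 GMT",
    "if-none-match": '"60a4fa4c-a93"',
}

etag_time, etag_size = decode_nginx_style_etag(request_headers["if-none-match"])
header_time = parsedate_to_datetime(request_headers["if-modified-since"])

# While the server-side validator stays the same, a revalidation gets 304.
revalidation = conditional_status(request_headers, '"60a4fa4c-a93"')

# A request sent without validators cannot be answered with 304.
unconditional = conditional_status({}, '"60a4fa4c-a93"')
```

The decoded ETag timestamp equals the if-modified-since value, so both validators describe the same unchanged resource. That is consistent with the 304, and with the request succeeding once the browser cache is disabled and no validators are sent.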
@swhitf A hard refresh (Shift+F5) should be enough as a quick fix until we fix it on our end.
We haven’t paid much attention to dashboards lately, but your feedback was really valuable for pinpointing all those issues. We will keep you posted on the fixes.