My dashboards keep "dying" and becoming useless so I have to recreate them. How can I stop this happening?

swhitf · May 27, 2021, 4:32pm

I use NetData to monitor my production app but don’t check it that often (couple of times a month) outside of when we get issues reported. I’ve been using it over a year but I’ve found in the last six months the dashboards that I create become broken after a short time and I have to recreate them. It’s very annoying and to be honest is spoiling the whole tool for me at the moment.

Here is an example of what one of these dashboards looks like:

The node is fine, I can see it in the nodes list and it’s reporting metrics. The dashboard just shows nothing. In the console there are a bunch of 404 errors like this:

Seems like the dashboard entries have gone?

Cheers
Steve

OdysLam · May 27, 2021, 5:26pm

Hi,

It’s good that you don’t check Netdata that often, it means that your systems are working optimally

Just to verify my understanding: you mean that the custom dashboards you create through Cloud become "unusable", as shown in the screenshot, after a random amount of time?

swhitf · May 27, 2021, 5:49pm

Haha yeah :]

When I first set up ND I created a bunch of dashboards. They worked fine for ages (months) until a recent(ish) update. It was around the time that the non-white UI came in I think. Anyway, I’ll create the custom dashboards, usually only connected to a single node, and they work fine. To confirm they are on the ND Cloud product. Fast forward an unknown amount of time (maybe a couple of weeks) and they will stop working.

I’ve just gone and checked and currently it’s only affecting 2 of my nodes, but again I haven’t changed anything and the nodes are both accessible via the Node tab.

Screenshot of all my dashboards, X marked are completely unusable, only single node. \ marked are partially unusable and not showing data to the affected nodes.

Example of one that is half working:

Let me know if you need any more information.

swhitf · May 27, 2021, 5:50pm

Just checked and ALL my nodes are running netdata v1.26.0-333-nightly.

OdysLam · May 27, 2021, 5:52pm

Is it possible to update 1 node and see if it fixes the issue?

In any case, this is troubling.

Thank you for being so thorough with your analysis.

Leonidas_Vrachnis · May 27, 2021, 6:07pm

Hello @swhitf.

I looked into this and the agent you have in the dashboard has been re-claimed. I can see multiple ids for the same hostname.

The last time we heard from that particular id was 2020-10-03 15:59:20.660894+00:00 (UTC).

Re-claiming can happen automatically in some environments.

OdysLam · May 27, 2021, 6:15pm

So, if you reclaim a node, it’s basically a new agent, thus the dashboards will no longer work, right?

new agent from the PoV of the cloud

Leonidas_Vrachnis · May 27, 2021, 6:18pm

Yes, @OdysLam. It is something we need to improve. Soon we should be able to better detect that it is the same node, even if it was claimed from scratch (using the machine_guid).

@swhitf Is it possible that you run the claiming script on machine startup, or as part of another script?
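To illustrate the behaviour described above, here is a minimal sketch of why a dashboard tied to a claim-time id stops resolving after a re-claim, and how a lookup through the machine_guid could recover it. This is not Netdata Cloud's actual implementation: `NodeRegistry`, its methods, and the id strings are all hypothetical names made up for the example.

```python
from typing import Dict, Optional


class NodeRegistry:
    """Hypothetical registry; not the real Netdata Cloud data model."""

    def __init__(self) -> None:
        # Every id ever issued at claim time -> machine_guid of the agent.
        self._guid_by_id: Dict[str, str] = {}
        # machine_guid -> id issued by the most recent claim.
        self._live_id_by_guid: Dict[str, str] = {}

    def claim(self, node_id: str, machine_guid: str) -> None:
        """Record a claim; a re-claim of the same machine gets a new id."""
        self._guid_by_id[node_id] = machine_guid
        self._live_id_by_guid[machine_guid] = node_id

    def is_live(self, node_id: str) -> bool:
        """Strict lookup: only the most recent id for a machine is live."""
        guid = self._guid_by_id.get(node_id)
        return guid is not None and self._live_id_by_guid[guid] == node_id

    def resolve(self, node_id: str) -> Optional[str]:
        """Lenient lookup: follow the machine_guid to the current id."""
        guid = self._guid_by_id.get(node_id)
        return None if guid is None else self._live_id_by_guid[guid]


registry = NodeRegistry()
registry.claim("id-first-claim", "guid-A")   # original claim
dashboard_node_ref = "id-first-claim"        # id stored in the dashboard
registry.claim("id-second-claim", "guid-A")  # same machine, re-claimed

stale = not registry.is_live(dashboard_node_ref)
current_id = registry.resolve(dashboard_node_ref)
```

With the strict lookup the dashboard's stored id no longer matches a live node, which is the "dead dashboard" symptom; the lenient lookup maps it to the id from the latest claim.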

swhitf · May 27, 2021, 6:28pm

Now that I think about it, I might have had an issue with this node being “not reachable” and tried reclaiming to get it working again; I have a vague memory of something like this. Given how fast time appears to be moving in my life at the moment, it’s also possible I’m way off with my timelines.

So basically, if I reclaim, I have to recreate the dashboards for the respective node? If that’s the case at least I know why it’s happening and will be more mindful next time. If it happens again and I definitely didn’t reclaim myself I’ll be back in touch. I’m not running it automatically anywhere.

Anyway, thanks for your prompt responses, great support. Love the product also, waiting for the day I can pay for it!

Cheers
Steve

Leonidas_Vrachnis · May 27, 2021, 6:34pm

Thanks for the kind words.

“So basically, if I reclaim, I have to recreate the dashboards for the respective node?”

Yes, this is the case for now. We already plan to improve this behavior, so even after a reclaim, the cloud would detect it is the same node.

Also, we will communicate better that the particular node is actually offline, instead of this misleading loading graphic. Thanks for bringing this to our attention.

swhitf · May 27, 2021, 6:52pm

Back again. I just tried changing the affected dashboard and had issues saving. I think it’s possible that this bug actually happened to me a few times without me realising, and it’s why I kept thinking it was re-happening.

Steps to reproduce are in a video here:

TL/DW though is that if you have a dashboard made up of only charts from a dead node, you can’t ever get it to save and have to delete and re-create.

Let me know if you have any questions, though I’m literally going on holiday in 5 minutes so I won’t reply until next week. Laters!

Steve

novykh · May 27, 2021, 7:04pm

Hey Steve,

Could you please verify in the network tab that the PATCH request is getting a 409 Conflict error?

Thanks,
Johnny

Leonidas_Vrachnis · May 27, 2021, 7:06pm

From our logs I have detected 14 warnings from when you tried to update your dashboard, at around “2021-05-27T18:47:16.925136065Z” (UTC). The reason for those is “dashboard version conflict”.

This can happen when you try to edit a dashboard that has already been edited in another session. A simple page refresh should fix it.

We probably don’t handle it correctly.
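As a rough illustration of how a “dashboard version conflict” can arise and be handled, here is a minimal sketch of version-checked saves. It assumes the backend rejects a save based on an outdated version, which is what a 409 would signal; `DashboardStore`, `VersionConflict`, and `save_with_refetch` are hypothetical names, not the real Netdata Cloud API.

```python
from typing import Callable, List, Tuple


class VersionConflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class DashboardStore:
    """Hypothetical in-memory stand-in for the dashboard backend."""

    def __init__(self) -> None:
        self.version = 1
        self.charts: List[str] = []

    def get(self) -> Tuple[int, List[str]]:
        return self.version, list(self.charts)

    def patch(self, based_on_version: int, charts: List[str]) -> int:
        # Reject the write if the caller edited an outdated copy.
        if based_on_version != self.version:
            raise VersionConflict(
                f"expected version {self.version}, got {based_on_version}"
            )
        self.charts = list(charts)
        self.version += 1
        return self.version


def save_with_refetch(
    store: DashboardStore,
    known_version: int,
    known_charts: List[str],
    edit: Callable[[List[str]], List[str]],
) -> int:
    """Apply an edit; on conflict, refetch the latest copy and retry once."""
    try:
        return store.patch(known_version, edit(known_charts))
    except VersionConflict:
        latest_version, latest_charts = store.get()
        return store.patch(latest_version, edit(latest_charts))


store = DashboardStore()
tab_a = store.get()            # both sessions load version 1
tab_b = store.get()
store.patch(tab_a[0], ["cpu"])  # session A saves first, version becomes 2

conflict_seen = False
try:
    store.patch(tab_b[0], ["ram"])  # session B still holds version 1
except VersionConflict:
    conflict_seen = True

new_version = save_with_refetch(store, tab_b[0], tab_b[1], lambda c: c + ["ram"])
```

The refetch step only helps if it actually returns the latest version; a client that keeps reusing an outdated copy would keep hitting the conflict.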

Leonidas_Vrachnis · May 27, 2021, 7:11pm

@novykh I’ve replicated it; we are not handling the 409 correctly.

novykh · May 27, 2021, 7:16pm

Also, we are missing a notification there. Will be added in the next release.
@swhitf thanks

Leonidas_Vrachnis · May 27, 2021, 7:17pm

It seems we have another issue, as the dashboard version cannot be updated even after a page refresh.

We will check it out.

Leonidas_Vrachnis · May 27, 2021, 7:33pm

@novykh We have a caching issue as well: when I refresh the page (after I modified the dashboard in another tab), the request returns a 304.

Cache related headers:

cache-control: max-age=0
if-modified-since: Wed, 19 May 2021 11:45:16 GMT
if-none-match: "60a4fa4c-a93"

We need to review our caching policies and middlewares.

If I disable the cache on chrome it works ok.
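For reference, the two validators in the headers above can be cross-checked. This is a small sketch that assumes the ETag uses the nginx-style layout of hex last-modified time, a dash, then hex content length; that layout is an assumption about the server, and `decode_nginx_style_etag` is a helper written for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Tuple


def decode_nginx_style_etag(etag: str) -> Tuple[datetime, int]:
    """Split '"<hex mtime>-<hex size>"' into a UTC timestamp and a byte count.

    Assumes the nginx-style ETag layout; other servers format ETags differently.
    """
    mtime_hex, size_hex = etag.strip('"').split("-")
    modified = datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc)
    return modified, int(size_hex, 16)


# Values copied from the request headers quoted in the post.
modified, size = decode_nginx_style_etag('"60a4fa4c-a93"')
header_time = parsedate_to_datetime("Wed, 19 May 2021 11:45:16 GMT")

# True when the ETag and If-Modified-Since describe the same cached copy.
validators_agree = modified == header_time
```

Under that assumption both validators point at the same copy, last modified on 19 May 2021, so a 304 tells the browser to keep using that copy.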

@swhitf A hard refresh (Shift+F5) should be enough as a quick fix until we fix it on our end.

We haven’t paid much attention to dashboards lately, but your feedback was really valuable for pinpointing all those issues. We will keep you posted with the fixes.
