I’m monitoring DNS response times as a basis for deciding which DNS resolver to choose for performance reasons.
After some recent updates, the scale of the graphs changed, which makes me unhappy. Before, all graphs used the same scale (milliseconds, because seconds for DNS queries really suck!), but now some graphs show milliseconds and others show seconds.
Another thing I dislike: I grouped DNS servers of the same company, but the charts are now on a per-server basis, which makes things messier than before:
I really liked it before as you can see on an older screenshot on my GitHub site.
And is “query status” new for “how often did the DNS server respond” (in percent)? Well, this might be indeed a good indicator not to choose Quad-9 (in my case)
Hmmm, let me see if I can help (and thanks for the feedback!).
Can you share a screenshot of this? I have not seen this before, so I am curious what might have changed to cause it. A screenshot might help me figure out who to ask about it. I agree that showing seconds and milliseconds on the same chart would be silly.
I think if you use the charts in Netdata Cloud, you could group by the job name from the config to aggregate per job. In the example below we just have one job, I believe, but if you had a job per company you could aggregate by _collect_job and end up with one line per job, aggregated over all of that job's servers.
here is a link to the public netdata-demo space that has some DNS stuff set up. I think in your case you might end up with a dimension called “ext_nextdns” for example.
The new grouping (a chart per server) was done taking into account Netdata Cloud aggregated charts. We will switch to chart aggregation only in the future.
The status chart reflects the query’s current status. The dimensions are a set of individual statuses (success, network_error, and dns_error). Each value is boolean (0 or 1), and only one of them is 1 at a given time (the current state). So the current value is not a percentage. But an aggregation over time can be read as a percentage (e.g. 30 mins average: success 0.8, network_error 0.1, dns_error 0.1 => for the last 30 mins, 80% success, 10% network_error and 10% dns_error).
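To make the "average of booleans reads as a percentage" point concrete, here is a minimal Python sketch. It is not Netdata code: the function name `status_shares` and the sample window are made up for illustration, and it assumes exactly one status is set per collected sample.

```python
from collections import Counter

# Dimension names taken from the status chart described above.
STATUSES = ("success", "network_error", "dns_error")


def status_shares(samples):
    """Average one-hot status samples over a time window.

    Each sample names the single status that was 1 at that moment,
    so the mean of each 0/1 dimension is simply count / total.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return {status: counts[status] / len(samples) for status in STATUSES}


# Hypothetical window of 10 samples (invented data, not a real measurement).
window = ["success"] * 8 + ["network_error"] + ["dns_error"]
shares = status_shares(window)

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

With this made-up window the averages are 0.8, 0.1 and 0.1, which is the 80% / 10% / 10% reading from the example above.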
some graphs with milliseconds, others with seconds
All of them are in seconds actually (decimal). It is the “Scale Units” feature (dashboard) that changes them based on precision.
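A rough Python sketch of how per-chart unit scaling can end up with different Y-axis units even though every value is stored in seconds. This is only an illustration under the assumption that the unit is picked from the largest value on each chart; the dashboard's actual rule may differ, and `pick_unit` and `scale_chart` are hypothetical names.

```python
def pick_unit(max_seconds):
    """Choose a display unit and multiplier for a chart (assumed rule)."""
    magnitude = abs(max_seconds)
    if magnitude >= 1:
        return "s", 1
    if magnitude >= 1e-3:
        return "ms", 1_000
    return "us", 1_000_000


def scale_chart(values_seconds):
    """Rescale one chart's values; the unit is decided per chart."""
    unit, factor = pick_unit(max(values_seconds, default=0.0))
    return unit, [v * factor for v in values_seconds]


# Invented sample data: a fast resolver and one with a slow outlier.
fast_unit, fast_values = scale_chart([0.012, 0.045, 0.030])
slow_unit, slow_values = scale_chart([0.200, 1.500, 0.350])

print(fast_unit, slow_unit)
```

Under this assumed rule the first chart would be labelled in milliseconds and the second in seconds, although both were collected in seconds.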
you could agg by _collect_job
In Netdata Cloud the charts can be aggregated by server. This gives a single chart with one dimension per server (as it was on the old chart).
@andrewm4894 I’m using and speaking about local netdata “dashboard” or “system overview” where no “group_by” exists in my netdata version: v1.36.0-264-nightly (“You already have the latest netdata!”)
Seconds and milliseconds are NOT mixed in the SAME chart for “DNS Query Time”. But one chart uses seconds on the Y-axis and another uses milliseconds, as you can see in the screenshots I’ve already provided.