Netdata Community

Composite charts (Overview pane) are here!

Composite charts are here!

In Netdata Cloud, your nodes are organized into War Rooms. One of the two available views for a War Room is the Overview, which uses composite charts to display real-time, aggregated metrics from all the nodes (or a filtered selection) in a given War Room.

With Overview’s composite charts, you can see your infrastructure from a single pane of glass, discover trends or anomalies, then drill down with filtering or single-node dashboards to see more. In the screenshot below, each chart visualizes the average or sum of metric values across 5 distributed nodes.

That’s right: simply add all the nodes you want to the same War Room, click the view dropdown in the War Room utility bar, and select Overview.

You don’t need to set up anything. We do all the heavy lifting :muscle: for you.

Caveat

Only nodes with v1.25.0-127 or later of the Netdata Agent can contribute to composite charts. If your node(s) use an earlier, incompatible version of the Netdata Agent, you will see them marked as needs upgrade in the tooltip that appears when hovering over X Issues.

Tooltip showing nodes that need to be upgraded

See our update docs for the preferred update method based on how you installed the Agent.

Aggregate Metrics

Composite charts currently group all the nodes in the same War Room by chart. Each composite chart shows the aggregated metrics for that chart from all the nodes.

These charts have 4 distinct modes (SUM, MAX, MIN, and AVG), each of which applies the corresponding aggregate function to a chart's metrics from all nodes.

This way, you can choose MAX for CPU consumption and see that, out of the tens of nodes you may have in the War Room, even the node with the highest CPU consumption is within normal bounds, so the rest of the nodes are normal too.
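As a rough illustration of what the four modes compute, the sketch below aggregates one metric across nodes at each timestamp. The node names and sample values are made up, and this is not how Netdata Cloud implements it internally; it only shows the arithmetic.

```python
from statistics import mean

# Hypothetical CPU utilisation (%) samples per node, one value per timestamp.
samples = {
    "node-a": [12.0, 15.0, 11.0],
    "node-b": [40.0, 42.0, 38.0],
    "node-c": [5.0, 6.0, 4.0],
}

# The four modes mapped to plain Python aggregate functions.
AGGREGATES = {"SUM": sum, "MAX": max, "MIN": min, "AVG": mean}


def composite(samples, mode):
    """Aggregate the per-node series into a single series, point by point."""
    func = AGGREGATES[mode]
    # zip(*...) lines up the values that share a timestamp across nodes.
    return [func(values) for values in zip(*samples.values())]


for mode in AGGREGATES:
    print(mode, composite(samples, mode))
```

With MAX selected, the composite line is always the busiest node at that moment, so if it stays within normal bounds, every other node does too.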

:page_with_curl: You can read more about the overview mode in our documentation.

Filters

We have also added filters, so you can easily view only the nodes that share some characteristic, such as a particular service.

:page_with_curl: You can read more about filters in our documentation.

More information

Manos Saratsis, Senior Product Manager at Netdata, has written a detailed blog post about the Overview pane and the way it revolutionizes the Netdata experience.

Time & Date picker

We have added a brand-new date picker to Netdata Cloud. You can now either choose one of our predefined selections (e.g., last 12 hours) or define a custom date.

:page_with_curl: You can read about the Time & Date picker in our documentation.

We need you!

This feature is brand-new and it’s only the first step towards a robust solution for overview visualization of your infrastructure. Soon, you will be able to do a lot more, like grouping metrics by node instead of by chart. But we can’t do it alone; we need your feedback.

  1. Go ahead and update your Agents to the newest version. It will only take a few minutes.
  2. Log in to your Netdata Cloud account, add the nodes of interest to a War Room, and play around with the new Overview view.
  3. Come back here, at this very topic, and share your thoughts.

Tell us what you liked, what you didn’t like, what was hard, and what you would prefer to be different. Every bit of information is valuable to us. At the same time, you will be shaping the next iterations of this feature, ensuring that your voice is heard in the product you love.

@Luis-Johnstone FYI, grouping by node is out now!

Thanks @Luis-Johnstone for the detailed review. As you can see, engineers and product are already on top of your feedback. Our product development is so much better when users like you engage!
Cheers to @andrewm4894 @manos-saratsis and @leonidas-vrachnis for chiming in :slight_smile:

@Luis-Johnstone Thank you for sharing your investigation story. I am glad you found the first iteration of composite charts useful; “group by node” will make your experience even better.


@Luis-Johnstone thx a lot for your feedback. Some comments:

  1. We will offer composite charts grouped by node very soon.
  2. a) The bug is raised. Thanks again. b) Hopefully this week you will see the date & time picker in nightlies.
  3. Within the Cloud, the date & time picker configuration is persisted when you switch between views. This way users can investigate without having to change the timeframe all the time. Can you please elaborate on your use case?
  4. The filter shows the services/applications that are autodetected (e.g., service is mysql). If you have applications installed on your hosts that are not shown in the list, please share the OS version and the applications.

This is cool feedback. I wonder if some other aggregate might be useful for a situation like this, such as median, range, or standard deviation across the underlying nodes. All of those are a bit messier and more complicated to implement, with various trade-offs, but it would be cool if you did not have to look at the min and then the max separately. I guess percentiles could be useful here too, but all of these seem tricky for small node counts like the 4 nodes here, which would probably be a more typical use case. Range would be an easy one, but then the y-axis would show something different, which could end up being confusing in its own right.
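To make the idea concrete, here is a minimal sketch of those extra aggregates computed across nodes for a single point in time. The node names and values are invented, and nothing here reflects an existing Netdata feature; it just uses Python's standard `statistics` module.

```python
from statistics import median, pstdev

# Hypothetical CPU utilisation (%) of four nodes at one point in time.
cpu_by_node = {"node-1": 85.0, "node-2": 2.0, "node-3": 3.0, "node-4": 2.0}


def spread_summary(values):
    """Summarise how far apart the nodes are from each other."""
    values = list(values)
    return {
        "median": median(values),
        "range": max(values) - min(values),
        # Population standard deviation: the nodes are the whole population.
        "stdev": pstdev(values),
    }


summary = spread_summary(cpu_by_node.values())
print(summary)
```

A large range or standard deviation flags the "one node is doing all the work" case in a single number, at the cost of a y-axis that no longer shows CPU usage itself.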

Great feedback though! I love seeing real examples like this on the community, as they help me learn more about how users are using things :slight_smile:

I’d say that this can be quite useful for things like monitoring clusters for capacity. Here’s my test cluster showing me, overall, how much more free memory I have left (much better than checking each node and adding up the free memory):


This can also be useful where you expect a certain workload to be high; I’m thinking of things like network links, which, if they are quiet, might be a sign that something has gone wrong.
I’ve got a somewhat imperfect example of this regarding CPU usage on a cluster that runs at circa 85% CPU 24x7 (intentionally running computational work). It is imperfect because the numbers in the chart below would normally be a bit higher, but the cluster has had fewer work units during the time I have metrics for.
So this is the current view for the last 6 hours:


So a little bit off since I’d expect it to be a bit higher…BUT…switch the measure from “Average” to Minimum and…


Well, ru-roh…someone isn’t doing any work…and someone is doing a lot (switch measure to “Maximum”):


So now I have a look and, yes, only one of the nodes actually has any work units to do…lucky him :slight_smile:

I reckon this would be even more useful when monitoring network infrastructure.

So on to constructive feedback :slight_smile:

1.
I see that having a drill-down option is flagged for the next iteration; that’s good because this would be a lot more useful if I could switch to show individual nodes. For example, in the CPU view all I can see is the overall figure, which is great, but I’d also like to be able to see what portion each node is contributing to it. That is obviously a simpler calculation when using SUM rather than average, but it should be doable. In my example, it would have been even faster to see the CPU average of each node represented by its own line.

2a.
I can see that I now have a date picker on my custom dashboards as well, but unfortunately the ranges I choose never take effect. This is the same for newly created dashboards.
It does work fine on the new “Overview” dashboard.

2b.
Do you plan to port this feature into the web UI on the local nodes?

3.
Is there a way to make the date/time range picker sticky for a specific dashboard?
e.g. I have a dashboard for monitoring temperatures and a date range of 12-24 hours makes sense but the setting seems to be lost when I close the browser.

  4. When I use the filter and set it to “service” I never seem to get results. I’ve tried “contains” with “ssh” or “k8s”, but it always shows zero nodes in the results. What kind of services should I be able to filter on?

@Manos_Saratsis

I can confirm that 2a is now resolved and works across all dashboards and down to the node views that I’ve tested so far.

Regarding 4:

Well, Netdata is showing menus on the right side for things like ZFS, BOINC, and fail2ban, for example, and those come up with zero nodes applicable.

If I drill into the “Systemd Services” node/menu then one service (for example) listed is fail2ban and, as mentioned above, that brings up zero hits using the filter. So exactly what is the services filter supposed to match against? :smiley:
Interestingly, it does offer to autocomplete “IPVS”…

OS here is Ubuntu 20.04.1 LTS on ARM and x84. Let me know what else you need :slight_smile:

Hello @Luis_Johnstone.

Really appreciate the time you took for your feedback.

We may have an issue in the way we are mapping the agent collectors as services in the cloud.

Can you please post the collectors on the “fail2ban” agent? To do this, visit this endpoint on your agent: http://localhost:19999/api/v1/info. It should contain an array with collectors.
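If a browser is not handy, something like the following sketch could pull the same information from a script. It assumes the Agent is reachable at the URL above and that the response contains a "collectors" array whose entries have "plugin" and "module" fields; the function names are just for illustration.

```python
import json
from urllib.request import urlopen

# URL from the post above; adjust host and port for your own Agent.
INFO_URL = "http://localhost:19999/api/v1/info"


def fetch_info(url=INFO_URL, timeout=5):
    """Download and parse the Agent's info document."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_collectors(info):
    """Return (plugin, module) pairs from the "collectors" array, if present."""
    return [
        (entry.get("plugin", ""), entry.get("module", ""))
        for entry in info.get("collectors", [])
    ]


# Example with a small hand-written document instead of a live Agent:
example = {"collectors": [{"plugin": "python.d.plugin", "module": "fail2ban"}]}
print(list_collectors(example))
```

Against a running Agent, `list_collectors(fetch_info())` would give the full list to paste here.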

Thank you!

No worries :slight_smile:
As requested:

"collectors": [
	{
		"plugin": "proc.plugin",
		"module": "/proc/pressure"
	},
	{
		"plugin": "apps.plugin",
		"module": ""
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/diskstats"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/softirqs"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/uptime"
	},
	{
		"plugin": "cgroups.plugin",
		"module": "systemd"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/meminfo"
	},
	{
		"plugin": "tc.plugin",
		"module": ""
	},
	{
		"plugin": "idlejitter.plugin",
		"module": ""
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/dev"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/stat/nf_conntrack"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/loadavg"
	},
	{
		"plugin": "proc.plugin",
		"module": "ipc"
	},
	{
		"plugin": "python.d.plugin",
		"module": "fail2ban"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/sockstat"
	},
	{
		"plugin": "statsd.plugin",
		"module": "stats"
	},
	{
		"plugin": "cgroups.plugin",
		"module": "stats"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/interrupts"
	},
	{
		"plugin": "go.d",
		"module": "k8s_kubeproxy"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/snmp"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/stat"
	},
	{
		"plugin": "netdata",
		"module": "stats"
	},
	{
		"plugin": "diskspace.plugin",
		"module": ""
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/ip_vs_stats"
	},
	{
		"plugin": "web",
		"module": "stats"
	},
	{
		"plugin": "python.d.plugin",
		"module": "sensors"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/netstat"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/vmstat"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/snmp6"
	},
	{
		"plugin": "python.d.plugin",
		"module": ""
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/softnet_stat"
	},
	{
		"plugin": "proc.plugin",
		"module": "/proc/net/sockstat6"
	}
],
"cloud-enabled": true,
"cloud-available": true,
"agent-claimed": true,
"aclk-available": true

}


@Luis_Johnstone thanks for the great feedback and for making our products better

@Manos_Saratsis
OK just tested the Group by Node in the Overview dashboard and very good work. Thanks very much for getting that done.
A slight tweak I’d suggest concerns charts with lots of dimensions, such as those under the Applications node. Right now the view switches to an aggregated view, so you lose the resolution of seeing individual apps. This is an inevitable trade-off with grouping by node, but might I suggest that if a user selects just one dimension, the view switches to showing one line for that dimension per host?


A couple of days ago we released group by node for the Overview pane. You can now see all the metrics for a specific chart by node, allowing you to quickly spot abnormalities in the infrastructure.
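For anyone wondering how the two groupings differ, here is a small sketch with invented readings. It assumes the default view aggregates each dimension across nodes, while group by node collapses a node's dimensions into one series; the actual Cloud implementation may differ.

```python
from collections import defaultdict

# Hypothetical readings for one chart: (node, dimension, value).
readings = [
    ("node-a", "user", 10.0),
    ("node-a", "system", 5.0),
    ("node-b", "user", 30.0),
    ("node-b", "system", 7.0),
]


def group_sum(readings, key_index):
    """Sum values by the field at key_index (0 = node, 1 = dimension)."""
    totals = defaultdict(float)
    for reading in readings:
        totals[reading[key_index]] += reading[2]
    return dict(totals)


print("by dimension:", group_sum(readings, 1))
print("by node:", group_sum(readings, 0))
```

Grouping by node makes the outlier host obvious at a glance, but the per-dimension detail is folded into each node's total.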

This is exciting :muscle:

Read more about it in our documentation!