Now that we have our metrics, let’s talk about creating a Prometheus collector for Netdata.
Netdata already ships a generic Prometheus endpoint collector, which means it can easily scrape any Prometheus endpoint and create a chart per metric.
That collector is limited, though: it can only perform very elementary computation on the metrics.
Thus, we opted to create our own collector, which is only a bit harder but offers much more flexibility. It’s still fairly easy, because all Prometheus endpoint-related collectors share the same architecture and functionality.
It’s really just a matter of changing the following types of variables:
- Metric names that are being scraped
- Chart definitions.
Let’s see an example.
As the basis for our work, we will use the VerneMQ collector:
Let’s look at the files, in order:
vernemq.go
Changes: Change the endpoint, the object name, and the package name. For example:
```go
package geth

import (
	"errors"
	"time"

	"github.com/netdata/go.d.plugin/pkg/prometheus"
	"github.com/netdata/go.d.plugin/pkg/web"

	"github.com/netdata/go.d.plugin/agent/module"
)

func init() {
	creator := module.Creator{
		Create: func() module.Module { return New() },
	}
	module.Register("geth", creator)
}

func New() *Geth {
	config := Config{
		HTTP: web.HTTP{
			Request: web.Request{
				URL: "http://127.0.0.1:6060/metrics/debug/prometheus",
			},
			Client: web.Client{
				Timeout: web.Duration{Duration: time.Second},
			},
		},
	}
	return &Geth{
		Config: config,
		charts: charts.Copy(),
		cache:  make(cache),
	}
}

type (
	Config struct {
		web.HTTP `yaml:",inline"`
	}
	Geth struct {
		module.Base
		Config `yaml:",inline"`

		prom   prometheus.Prometheus
		charts *Charts
		cache  cache
	}
	cache map[string]bool
)
```
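Under the hood, every go.d collector implements the same small lifecycle, which is why swapping one endpoint collector for another is mostly renaming. A minimal, self-contained sketch of that architecture (the interface below mirrors go.d’s `module` package but is a stand-in, not the real library, and the metric name and value are illustrative):

```go
package main

import "fmt"

// Stand-in for go.d's module interface: the framework calls
// Init once, Check once, then Collect on every update interval.
type Module interface {
	Init() bool                // parse config, build the HTTP client
	Check() bool               // trial scrape to verify the endpoint works
	Collect() map[string]int64 // gather the current metric values
}

type Geth struct {
	url string
}

func (g *Geth) Init() bool  { return g.url != "" }
func (g *Geth) Check() bool { return len(g.Collect()) > 0 }

// Collect would normally scrape the Prometheus endpoint at g.url;
// it returns a fixed value here so the sketch runs standalone.
func (g *Geth) Collect() map[string]int64 {
	return map[string]int64{"chain_head_block": 12345}
}

func main() {
	var m Module = &Geth{url: "http://127.0.0.1:6060/metrics/debug/prometheus"}
	if m.Init() && m.Check() {
		fmt.Println(m.Collect()["chain_head_block"]) // prints 12345
	}
}
```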
metrics.go
This file is responsible for the metric definitions. In essence, we simply translate the metrics we care about from the Prometheus format into Go constants.
Changes: Choose the metrics you will use and change both the Prometheus metric name and the corresponding variable.
```go
const (
	// AUTH
	metricAUTHReceived = "mqtt_auth_received" // v5 has 'reason_code' label
	metricAUTHSent     = "mqtt_auth_sent"     // v5 has 'reason_code' label
	// ...
)
```
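For reference, the raw exposition lines these constants match look something like this (sample values are invented; per the comments above, MQTT v5 adds a `reason_code` label):

```
# TYPE mqtt_auth_received counter
mqtt_auth_received{reason_code="success"} 4
# TYPE mqtt_auth_sent counter
mqtt_auth_sent{reason_code="success"} 4
```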
collect.go
This is the source file responsible for collecting the metrics from the endpoint. A best practice is to divide the functions by chart. We use the metric variables as we defined them in metrics.go above.
```go
func (v *Geth) collectGeth(pms prometheus.Metrics) map[string]float64 {
	mx := make(map[string]float64)
	collectSockets(mx, pms)
	collectQueues(mx, pms)
	collectSubscriptions(mx, pms)
	v.collectErlangVM(mx, pms)
	collectBandwidth(mx, pms)
	collectRetain(mx, pms)
	collectCluster(mx, pms)
	collectUptime(mx, pms)

	v.collectAUTH(mx, pms)
	v.collectCONNECT(mx, pms)
	v.collectDISCONNECT(mx, pms)
	v.collectSUBSCRIBE(mx, pms)
	v.collectUNSUBSCRIBE(mx, pms)
	v.collectPUBLISH(mx, pms)
	v.collectPING(mx, pms)
	v.collectMQTTInvalidMsgSize(mx, pms)
	return mx
}

func (v *Geth) collectCONNECT(mx map[string]float64, pms prometheus.Metrics) {
	pms = pms.FindByNames(
		metricCONNECTReceived,
		metricCONNACKSent,
	)
	v.collectMQTT(mx, pms)
}
```
Changes: Change the name of every function to correspond to some “logical division” of the metrics.
We may want to make some computations, e.g.:

```go
collectNonMQTT(mx, pms)
mx["open_sockets"] = mx[metricSocketOpen] - mx[metricSocketClose]
```
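To make the pattern concrete, here is a self-contained sketch of that idea: filter the scraped samples by name, sum them into the `mx` map, then derive a gauge from two counters. The `Metric` type and `FindByNames` below are simplified stand-ins for go.d’s `prometheus` package, and the constant values are invented:

```go
package main

import "fmt"

// Metric is a simplified stand-in for a sample scraped from a
// Prometheus endpoint: a metric name plus its value.
type Metric struct {
	Name  string
	Value float64
}

type Metrics []Metric

// FindByNames mimics pms.FindByNames: keep only the samples whose
// name matches one of the given metric names.
func (ms Metrics) FindByNames(names ...string) Metrics {
	var out Metrics
	for _, m := range ms {
		for _, n := range names {
			if m.Name == n {
				out = append(out, m)
			}
		}
	}
	return out
}

const (
	metricSocketOpen  = "socket_open"
	metricSocketClose = "socket_close"
)

// collectSockets copies the raw counters into mx and derives the
// "open_sockets" gauge as open minus close.
func collectSockets(mx map[string]float64, pms Metrics) {
	for _, m := range pms.FindByNames(metricSocketOpen, metricSocketClose) {
		mx[m.Name] += m.Value
	}
	mx["open_sockets"] = mx[metricSocketOpen] - mx[metricSocketClose]
}

func main() {
	pms := Metrics{{metricSocketOpen, 10}, {metricSocketClose, 7}}
	mx := make(map[string]float64)
	collectSockets(mx, pms)
	fmt.Println(mx["open_sockets"]) // prints 3
}
```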
Finally,
charts.go
Now that we have defined our metrics and how we are going to collect them, it’s time to organize them into charts.
```go
chartOpenSockets = Chart{
	ID:    "sockets",
	Title: "Open Sockets",
	Units: "sockets",
	Fam:   "sockets",
	Ctx:   "vernemq.sockets",
	Dims: Dims{
		{ID: "open_sockets", Name: "open"},
	},
}
chartSocketEvents = Chart{
	ID:    "socket_events",
	Title: "Socket Open and Close Events",
	Units: "events/s",
	Fam:   "sockets",
	Ctx:   "vernemq.socket_operations",
	Type:  module.Stacked,
	Dims: Dims{
		{ID: metricSocketOpen, Name: "open", Algo: module.Incremental},
		{ID: metricSocketClose, Name: "close", Algo: module.Incremental},
	},
}
```
Changes: Change every aspect of these charts. Moreover, make sure to change the `charts` object so that it uses the charts that you will define.
```go
var charts = Charts{
	chartOpenSockets.Copy(),
	chartSocketEvents.Copy(),
	chartClientKeepaliveExpired.Copy(),
	chartSocketErrors.Copy(),
	chartSocketCloseTimeout.Copy(),
	// ...
}
```
Here are some clarifications:
- `Fam` = family
- `Ctx` = context
- `Dims` = dimensions (the `ID` of every dimension is the name of the metric variable that we have defined above)
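Once such a collector ships with the agent, enabling it is just a small job configuration. A minimal sketch, assuming the module is registered as `geth` (the file path follows go.d conventions and may differ on your install):

```yaml
# /etc/netdata/go.d/geth.conf
jobs:
  - name: local
    url: http://127.0.0.1:6060/metrics/debug/prometheus
```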
???
Yup, that was it. As you can see, the challenge here is not to create a new collector, but rather to understand how the metrics should be surfaced to the user.
We need you
Yup, that’s right.
We need your expertise to organize the metrics that I have posted in the previous post into meaningful charts. Then, after we define the charts, we will be able to define meaningful alerts and have a complete turn-key experience.
Just fire and forget. Exactly as you would expect from a modern monitoring solution.
On top of that, you get to leverage the rest of the features, such as eBPF monitoring and per-second metric granularity. Not to mention the whole range of sweet sweet things we offer with Netdata Cloud.
The goal is to offer a solution that is as transparent as possible to the Ethereum Node operator. They will focus on keeping the network secure, while we focus on what we know best, monitoring systems and keeping them up and running.