What are collectors and how do they work?

OdysLam · April 15, 2021, 12:01pm

Netdata gathers data from data sources using a collector system. Collectors can be implemented in different ways, but each one is a logical “unit” that:

Gathers data from a data source
Organizes raw data into charts (and performs any processing required)
Stores the data into the database and/or sends data to the streaming module

There are 2 kinds of collectors:

Internal Collectors
External Collectors

Internal Collectors are implemented inside the core Netdata Agent.

External collectors are separate processes that run outside the Netdata daemon but are managed by it. They send data through pipes to an internal Netdata collector that is responsible for reading data from external collectors. It’s called plugins.d.
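The pipe mechanism can be illustrated with a minimal Python sketch. This is not Netdata's implementation (the agent itself is written in C); it only shows the general idea of a parent process spawning a child and reading the child's stdout line by line. `CHILD_CODE` and `read_collector_output` are hypothetical names used for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for an external collector: a child process that
# prints two lines to its stdout and then exits.
CHILD_CODE = "print('BEGIN example.random'); print('END')"


def read_collector_output(argv):
    """Spawn argv as a child process and return the lines it wrote to stdout."""
    # stdout=PIPE connects the child's stdout to a pipe whose read end
    # stays with the parent.
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as child:
        lines = [line.rstrip("\n") for line in child.stdout]
    return lines


if __name__ == "__main__":
    for line in read_collector_output([sys.executable, "-c", CHILD_CODE]):
        print("received:", line)
```

A long-running reader would handle each line as it arrives instead of collecting them into a list, since a real collector keeps producing output for as long as it runs.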

Note: Any external process can send data to Netdata and be considered as a collector. It only has to follow the external collector API. The external collectors that we implement and maintain are part of our philosophy of zero-configuration and auto-detection of data sources.
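As a rough illustration of what following the external collector API involves, the sketch below prints a chart definition and then values to stdout as plain text lines. The `CHART`, `DIMENSION`, `BEGIN`, `SET` and `END` keywords come from the external plugins API, but the field order used here is written from memory and should be checked against the documentation. The chart `example.random` and all function names are made up for illustration.

```python
import random
import sys
import time

CHART_ID = "example.random"  # hypothetical chart, used only for illustration


def chart_definition(update_every):
    """Lines that declare one chart with a single dimension (sent once)."""
    return [
        f"CHART {CHART_ID} '' 'A random number' 'value' "
        f"random example.random line 90000 {update_every}",
        "DIMENSION random1 '' absolute 1 1",
    ]


def chart_update(value):
    """Lines that report one collected value for the chart."""
    return [f"BEGIN {CHART_ID}", f"SET random1 = {value}", "END"]


def run(update_every=1, iterations=3, out=sys.stdout, sleep=time.sleep):
    """Define the chart, then emit one update per iteration."""
    for line in chart_definition(update_every):
        print(line, file=out)
    for _ in range(iterations):
        for line in chart_update(random.randint(0, 100)):
            print(line, file=out)
        out.flush()  # the reader sits on the other end of a pipe
        sleep(update_every)
```

Calling `run()` prints the definition once and then three updates, one per second. A real collector would loop indefinitely and replace the random value with a measurement.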

The External Collectors are further divided into:

Normal external collectors (e.g. apps.plugin)
Orchestrator Plugins

Orchestrator plugins are called that way because they orchestrate different language-specific modules that are executed as jobs.

This way, an orchestrator can run many different collectors: every collector is defined as a module (e.g. the PostgreSQL collector) that is written in a language (e.g. Python) and orchestrated by the orchestrator for that language (python.d.plugin).

For every module, the orchestrator spawns as many jobs as are defined. It drops any job that fails to gather data, as well as any job that has the same name as an already running job of the same module.
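The job selection rule can be sketched in a few lines of Python. This is a simplified model and not the orchestrator's actual code: `select_jobs`, `has_url` and the example job dictionaries are hypothetical, and the `check` callback stands in for "the job managed to gather data".

```python
def select_jobs(jobs, check):
    """Return the jobs that would keep running for one module.

    jobs:  job configurations (dicts with a "name" key), in definition order
    check: callable that returns True if the job can gather data
    """
    running = {}
    for job in jobs:
        name = job["name"]
        if name in running:
            continue  # a job with this name is already running: drop it
        if not check(job):
            continue  # the job failed to gather data: drop it
        running[name] = job
    return list(running.values())


# Hypothetical jobs: two "localhost" variants that use different methods.
EXAMPLE_JOBS = [
    {"name": "localhost", "socket": "/tmp/example.sock"},
    {"name": "localhost", "url": "http://127.0.0.1/status"},
    {"name": "remote", "url": "http://10.0.0.5/status"},
]


def has_url(job):
    """Toy check: pretend that only HTTP jobs succeed."""
    return "url" in job


if __name__ == "__main__":
    for job in select_jobs(EXAMPLE_JOBS, has_url):
        print(job)
```

In this toy run the socket-based `localhost` job fails its check, so the HTTP-based `localhost` job takes the name, and `remote` runs alongside it.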

This way, the user can:

Define many different jobs that gather data from the same data source using different methods (e.g. HTTP and UNIX sockets). If they share the same name (e.g. localhost), only one of them will run, so the user can define several default jobs to cover edge cases and be certain that only one will be active.
Use the same module as a blueprint to gather data from different data sources of the same nature (e.g. 5 running PostgreSQL databases). The user defines 5 different jobs, passing a different configuration (e.g. hostname, username, password) to each one. The orchestrator uses the same module to create, in essence, 5 collectors that gather data from 5 different data sources. The collectors are nearly identical, save for their configuration variables.
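The "module as a blueprint" idea can be pictured with a short sketch: one class definition, several instances that differ only in configuration. `PostgresJob`, its fields, and the host names are hypothetical and do not reflect how the real PostgreSQL module is written; no database connection is made.

```python
from dataclasses import dataclass


@dataclass
class PostgresJob:
    """Hypothetical stand-in for one job created from a module."""
    name: str
    host: str
    user: str
    password: str

    def describe(self):
        return f"{self.name}: collecting from {self.user}@{self.host}"


# Five job configurations for five databases (made-up values).
JOB_CONFIGS = [
    {
        "name": f"pg{i}",
        "host": f"db{i}.example.internal",
        "user": "netdata",
        "password": "changeme",
    }
    for i in range(1, 6)
]


def build_jobs(configs):
    """Create one job per configuration from the same blueprint."""
    return [PostgresJob(**cfg) for cfg in configs]


if __name__ == "__main__":
    for job in build_jobs(JOB_CONFIGS):
        print(job.describe())
```

Each job has a distinct name here, so under the rule described above none of them would be dropped as a duplicate.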

The above can be summarised in the diagrams below:

Relevant Documentation:

bwolf453677056 · January 28, 2022, 1:56pm

I’ve been working with Netdata recently and plan to port it to embedded devices. I can see that apps.plugin writes the data it collects to stdout, but I can’t find the code that reads that stdout data and saves it. How is this part implemented? Could you point me to some code snippets?
