What are collectors and how do they work?

Netdata gathers data from data sources using a collector system. Collectors can be implemented in different ways, but each one is a logical “unit” that:

  1. Gathers data from a data source
  2. Organizes raw data into charts (and performs any processing required)
  3. Stores the data into the database and/or sends data to the streaming module

There are 2 kinds of collectors:

  1. Internal Collectors
  2. External Collectors

Internal Collectors are implemented inside the core Netdata Agent.

External collectors are separate processes that run outside the Netdata daemon but are managed by it. They send data through pipes to an internal collector called plugins.d, which is responsible for reading the data that external collectors produce.

Note: Any external process can send data to Netdata and be considered a collector. It only has to follow the external collector API. The external collectors that we implement and maintain are part of our philosophy of zero-configuration and auto-detection of data sources.
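
To make the pipe mechanism concrete, here is a minimal sketch in Python of both ends. The child process prints lines in the plugins.d text format (`CHART`, `DIMENSION`, `BEGIN`, `SET`, `END`) to its stdout. The parent starts the child with stdout connected to a pipe and reads those lines back. The chart `example.random`, the helper names `parse_updates` and `run_collector`, and the parent loop itself are assumptions made for illustration; the real reader is C code inside the Netdata Agent, and this sketch only mirrors the idea.

```python
import subprocess
import sys

# Child side: a tiny external collector. Everything it "sends" is plain
# text on stdout. The chart and dimension names are made up for this sketch.
CHILD_SOURCE = r'''
import sys
print("CHART example.random random_numbers 'A random number' 'value' random")
print("DIMENSION value value absolute 1 1")
for sample in (7, 42, 13):
    print("BEGIN example.random")
    print(f"SET value = {sample}")
    print("END")
    sys.stdout.flush()  # stdout is block-buffered when it is a pipe
'''


def parse_updates(lines):
    """Collect (chart, {dimension: value}) samples from protocol lines."""
    samples = []
    current_chart = None
    values = {}
    for raw in lines:
        words = raw.split()
        if not words:
            continue
        keyword = words[0]
        if keyword == "BEGIN":
            current_chart = words[1]
            values = {}
        elif keyword == "SET" and current_chart is not None:
            # Format: SET <dimension> = <value>
            values[words[1]] = int(words[3])
        elif keyword == "END" and current_chart is not None:
            samples.append((current_chart, values))
            current_chart = None
    return samples


def run_collector():
    """Parent side: spawn the child and read its stdout through a pipe."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    with child.stdout:
        samples = parse_updates(child.stdout)
    child.wait()
    return samples


if __name__ == "__main__":
    for chart, values in run_collector():
        print(chart, values)
```

The parser ignores the `CHART` and `DIMENSION` definitions and keeps only the values between `BEGIN` and `END`, which is enough to show where the data written to stdout ends up: in whichever process holds the read end of the pipe.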

The External Collectors are further divided into:

  1. Normal external collectors (e.g. apps.plugin)
  2. Orchestrator Plugins

Orchestrator plugins are called that way because they orchestrate different language-specific modules that are executed as jobs.

This way, a single orchestrator can run many different collectors. Every collector is defined as a module (e.g. the PostgreSQL collector) that is written in a particular language (e.g. Python) and orchestrated by the orchestrator for that language (python.d.plugin).

For every module, the orchestrator spawns as many jobs as are defined. It drops any job that fails to gather data, and any job that has the same name as an already running job of the same module.
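
The selection rule above can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the orchestrator's actual code; `select_jobs`, `can_connect`, and the job dictionaries are hypothetical names.

```python
def select_jobs(job_definitions, check):
    """Return the jobs that would keep running for one module.

    job_definitions: list of dicts with at least a "name" key, in the
    order they are defined.
    check: callable returning True when a job can gather data.
    """
    running = {}
    for job in job_definitions:
        name = job["name"]
        if name in running:
            continue  # a job with this name already runs: drop the duplicate
        if not check(job):
            continue  # the job cannot gather data: drop it
        running[name] = job
    return list(running.values())


def can_connect(job):
    # Stand-in for a real connection attempt; pretend only HTTP works here.
    return job["method"] == "http"


DEFINED_JOBS = [
    {"name": "localhost", "method": "unix_socket"},  # fails, dropped
    {"name": "localhost", "method": "http"},         # works, keeps running
    {"name": "localhost", "method": "http"},         # same name, dropped
    {"name": "backup", "method": "http"},            # different name, runs
]

running = select_jobs(DEFINED_JOBS, can_connect)
print([(job["name"], job["method"]) for job in running])
```

Because a name is only reserved once a job succeeds, a failed `localhost` job does not block a later `localhost` job that uses a different method.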

This way, the user can:

  • Define many different jobs that gather data from the same data source using different methods (e.g. HTTP and UNIX sockets). If the jobs share a name (e.g. localhost), only one of them will run, so the user can define several default jobs to cover edge cases and be certain that exactly one is active.
  • Use the same module as a blueprint to gather data from different data sources of the same nature (e.g. 5 running PostgreSQL databases). The user defines 5 different jobs, passing a different configuration (e.g. hostname, username, password) to each. The orchestrator uses the same module to create, in essence, 5 collectors that gather data from 5 different data sources. The collectors are nearly identical, save for the different configuration variables.
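
The "blueprint" idea in the second point can be pictured as one class instantiated once per job. The sketch below is a hedged illustration only: `PostgresJob`, `JOB_CONFIGS`, and the `example.com` hostnames are hypothetical, and no real database connection is made.

```python
from dataclasses import dataclass


@dataclass
class PostgresJob:
    """One job: the shared module logic plus one job's configuration."""
    name: str
    host: str
    user: str
    port: int = 5432

    def target(self):
        # A real job would open a connection here; this sketch only
        # reports which data source the job would query.
        return f"{self.user}@{self.host}:{self.port}"


# Hypothetical job definitions: same module, different configuration.
JOB_CONFIGS = {
    f"db{i}": {"host": f"db{i}.example.com", "user": "netdata"}
    for i in range(1, 6)
}

# The same blueprint yields five independent collectors.
jobs = [PostgresJob(name=name, **config) for name, config in JOB_CONFIGS.items()]

for job in jobs:
    print(job.name, "->", job.target())
```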

The above can be summarised in the diagrams below:

Relevant Documentation:


I’ve been working with Netdata recently, and I’m going to port it to embedded devices. I have a question: apps.plugin writes the data it collects to stdout, but I can’t find any code that reads this stdout data and saves it. How is this part implemented? Could you share some code snippets?