Intro
Today, monitoring software does not give you all the metrics you need. Most tools provide system metrics plus packaged and custom application metrics, but there is more to collect. eBPF is a technological advancement that allows monitoring software to pull rich information out of the Linux Kernel and present it.
At Netdata we are in a position to leverage the power of eBPF, and we will try to:
- Get the most out of the Linux Kernel and offer you monitoring insights, based on data gathered directly from the Kernel, that you haven’t seen before
- Improve our coverage with metrics and charts, giving you richer information about areas such as CPU, Hard Disk, Memory, Network, Applications, Interrupt Requests, and other Hardware
- Enrich our alarms with a new set of use cases
- Educate you about the Linux Kernel and its secrets
But most of all, we will offer insights from the Linux Kernel to each one of you effortlessly: everybody can have them out of the box with a one-line installation of the Netdata agent.
Roadmap
We will start our roadmap by using a set of eBPF tools that are already available. The plan is to move them into the Netdata Agent and simplify how you access and use them. IOvisor with the BCC tools and Cloudflare with the eBPF exporter did amazing groundwork on eBPF; we are thankful for their contribution, and we intend to extend the reach of their work by bringing the value of these tools to all of you.
We will start by offering charts for the following areas:
Area | Short description |
---|---|
Memory (HD access) | Page cache access, creation and modification |
Memory (HD access) | File access using directory cache |
Memory (HD access) | Memory swap access per process |
Memory (HD access) | Count of unused pages in the cache |
Memory | Memory allocation (page allocation) per application |
Hard Disk | I/O operations and Latency to measure hardware health |
Hard Disk | Monitor when a mount point is added or removed |
Hard Disk | Monitor RAID flush |
Hard Disk | Monitor the sync() system call, which flushes data from memory to disk |
Hard Disk/FS | I/O operations for specific filesystems (ext4*, xfs*, zfs*, nfs*, and btrfs*) |
Hardware X86_64 | Hardware errors |
CPU | Instructions per CPU cycle, with hints on whether your system needs better Memory or CPU |
CPU | Last level cache usage per application |
CPU | Monitor how long tasks spend waiting for their turn to run on the CPU |
IRQ - Interrupt Requests | Usage of hardware IRQs |
IRQ - Interrupt Requests | Usage of software IRQs |
Network | Port usage |
Network | Number of successfully accepted connections per second |
Network | Possible port scans against your systems |
Network | Problems accepting connections over IPv4 and IPv6 |
Network | TCP connections to external servers |
Network | TCP connections that were dropped |
Network | Network interface quality and established connections round-trip time |
Network | Summarize TCP connection state changes and statuses |
Network | TCP information per subnet |
Network | Number of available TCP connections that the server can receive by measuring the TCP SYN backlog size |
Network | TCP connections per application |
Network | Bandwidth per Network interface |
Network | DNS latency |
Application | Process termination either by fatal error or exit |
Application | File lifetime, important for temporary files |
Application | Mutex lock events |
Application | Out of memory killer traces |
Application | Monitor shared memory events |
Sidenotes:
- Linux Kernel versions newer than 4.11 are supported
- Netdata already has some of the above charts; with eBPF we will gain benefits in performance and resource utilization
- The naming of the areas does not follow the OSI model; it is an internal naming convention
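As a quick way to check the first sidenote on a given machine, here is a minimal sketch that compares the running kernel release against 4.11. It is not part of the Netdata agent: the helper names `parse_kernel_release` and `kernel_is_supported` are hypothetical, and it assumes "newer than 4.11" means strictly greater than 4.11.

```python
import platform
import re
from typing import Tuple

# Assumption: "newer than 4.11" is read as strictly greater than 4.11.
MIN_EXCLUSIVE = (4, 11)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """Return True if the release is newer than the 4.11 baseline."""
    return parse_kernel_release(release) > MIN_EXCLUSIVE


if __name__ == "__main__":
    # Only meaningful on Linux; other systems report unrelated release strings.
    if platform.system() == "Linux":
        release = platform.release()
        status = "supported" if kernel_is_supported(release) else "too old"
        print(f"kernel {release}: {status}")
```

The comparison only proves the kernel is new enough by version number; whether eBPF is actually enabled also depends on how the kernel was built.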
Some of the charts will be always on, while others that require a lot of resources will be available on demand. You do not want your monitoring tools to consume your system resources, but you do need more actionable insights when you troubleshoot and analyze system behaviour.
We will offer all of the above, and we are just warming up. The Linux Kernel has many secrets to reveal, and we will keep adding use cases to our roadmap. We plan to offer more than charts: other visualizations will follow that will help you better consume and understand the Linux Kernel information. On top of that, we will offer tools for increased security, tracing, and observability overall.
We need your help
Help us improve and offer you meaningful products.
- Help us prioritize our work on a per-area basis by voting. Tell us which area is most important to you in the following poll
- CPU
- Hard Disk
- Memory
- Network
- Applications
- Interrupt Requests
- Hardware
- Give us feedback about the charts and the alerts that we develop
- Let us help you with your eBPF needs. If you know of use cases that would be useful to our community, please reach out to us. Who knows, we might be able to solve your problem.