Linux Kernel insights with eBPF

Manos_Saratsis · February 25, 2021, 12:46pm

Intro

Today you do not get all the metrics you need from your monitoring software. Most tools provide system metrics plus packaged and custom application metrics, but there is more to see. eBPF is a technological advancement that lets monitoring software get rich information out of the Kernel and present it.

At Netdata we are in a position to leverage the power of eBPF, and we will try to:

Get the most out of the Linux Kernel and offer you monitoring insights, based on data gathered directly from the Linux Kernel, that you haven’t seen before
Improve our coverage with metrics and charts, giving you richer information about areas such as CPU, Hard Disk, Memory, Network, Applications, Interrupt Requests, and other Hardware
Enrich our alarming with a new set of use cases
Educate you about the Linux Kernel and its secrets

But most of all, we will offer insights from the Linux Kernel to each one of you effortlessly: everybody can have insights out of the box with a one-line installation of the Netdata agent.

Roadmap

We will start our roadmap by using a set of eBPF tools that are already available. The plan is to move them into the Netdata Agent and simplify how you access and use them. IOvisor with the BCC tools and Cloudflare with the eBPF exporter did amazing groundwork on eBPF; we are thankful for their contribution, and we intend to increase the reach of their work by bringing the value of these tools to all of you.

We will start by offering charts in the areas below:

Area	Short description
Memory (HD access)	Page cache access, creation and modification
Memory (HD access)	File access using directory cache
Memory (HD access)	Memory swap access per process
Memory (HD access)	Count of unused pages in the cache
Memory	Memory allocation (page allocation) per application
Hard Disk	I/O operations and Latency to measure hardware health
Hard Disk	Monitor when a mount point is added or removed
Hard Disk	Monitor RAID flush
Hard Disk	Monitor sync() system calls, when data is flushed from memory to disk
Hard Disk/FS	I/O operations for specific filesystems (ext4*, xfs, zfs, nfs, and btrfs)
Hardware X86_64	Hardware errors
CPU	Instructions per CPU cycle, with hints on whether your system needs better Memory or CPU
CPU	Last level cache usage per application
CPU	Monitor how long tasks spend waiting for their turn to run on the CPU
IRQ - Interrupt Requests	Usage of hardware IRQs
IRQ - Interrupt Requests	Usage of software IRQs
Network	Port usage
Network	Number of successfully accepted connections per second
Network	Possible port scans against your systems
Network	Problems accepting connections over IPv4 and IPv6
Network	TCP connections to external servers
Network	TCP connections that were dropped
Network	Network interface quality and established connections round-trip time
Network	Summarize TCP connection state changes and statuses
Network	TCP information per subnet
Network	Number of available TCP connections that the server can receive by measuring the TCP SYN backlog size
Network	TCP connections per application
Network	Bandwidth per Network interface
Network	DNS latency
Application	Process termination either by fatal error or exit
Application	File lifetime, important for temporary files
Application	Mutex lock events
Application	Out of memory killer traces
Application	Monitor shared memory events
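
To make the "Instructions per CPU cycle" row more concrete, here is a minimal sketch of the arithmetic behind such a hint. It does not collect anything with eBPF: the counter values are made up, the function names are hypothetical, and the threshold of 1.0 is only a commonly quoted rule of thumb, not something Netdata has specified.

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return IPC, the ratio of retired instructions to elapsed CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be a positive number")
    return instructions / cycles


def bottleneck_hint(ipc: float, threshold: float = 1.0) -> str:
    """Turn an IPC value into a rough hint.

    Assumption: a low IPC suggests the CPU is often stalled waiting on
    memory, a high IPC suggests it is busy executing instructions.
    The default threshold is illustrative and varies by workload and CPU.
    """
    if ipc < threshold:
        return "likely memory-bound"
    return "likely CPU-bound"


if __name__ == "__main__":
    # Hypothetical counter readings for one sampling interval.
    ipc = instructions_per_cycle(instructions=1_200_000, cycles=2_000_000)
    print(f"IPC = {ipc:.2f}: {bottleneck_hint(ipc)}")
```

In a real collector the two counters would come from hardware performance events rather than hard-coded numbers, and the hint would be one signal among several.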

Sidenotes:

Linux Kernel versions newer than 4.11 are supported
Netdata already has some of the above charts; with eBPF we will gain in performance and resource utilization
The area names do not follow the OSI layers; they are an internal naming convention
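
If you want to check a machine against the kernel requirement in the first sidenote, a small sketch like the following can help. It reads "newer than 4.11" literally as strictly greater than 4.11, which is an assumption; change the comparison if 4.11 itself should count. The helper names are hypothetical and not part of the Netdata Agent.

```python
import platform
import re
from typing import Tuple

# Baseline taken from the sidenote above.
MIN_KERNEL = (4, 11)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.4.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(release: str) -> bool:
    """Return True if the release is newer than the baseline (strict comparison)."""
    return parse_kernel_release(release) > MIN_KERNEL


if __name__ == "__main__":
    # platform.release() returns the running kernel release on Linux.
    release = platform.release()
    print(release, "supported" if is_supported(release) else "not supported")
```

Note that some distributions backport eBPF features to older kernels, so a version check alone is only a first approximation.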

Some of the charts will be always on, while others that require a lot of resources will be available on demand. You do not want your monitoring tools to use up your system resources, but you do need more actionable insights when you troubleshoot and analyze system behaviour.

We will offer all of the above, and we are just warming up. The Linux Kernel has many secrets to reveal, and we will keep adding use cases to our roadmap. We plan to offer more than charts: other visualizations will follow to help you better consume and understand Linux Kernel information. On top of that, we will offer tools for increased security, tracing, and overall observability.

We need your help

Help us improve and offer you meaningful products.

Help us prioritize our work on a per-area basis by voting. Tell us which section is most important to you in the following poll:

Sections of interest

CPU
Hard Disk
Memory
Network
Applications
Interrupt Requests
Hardware

Give us feedback about the charts and the alerts that we develop
Let us help you with your eBPF needs. If you know of use cases that would be useful to our community, please reach out to us. Who knows, we might be able to solve your problem.
