Netdata Community

Linux Kernel insights with eBPF

Intro

Today, monitoring software does not give you all the metrics you need. Most tools provide system metrics plus packaged and custom application metrics, but there is more to see. eBPF is a technological advancement that lets monitoring software pull rich information out of the Linux Kernel and present it.

At Netdata we are in a position to leverage the power of eBPF, and we will try to:

  • Get the most out of the Linux Kernel and offer you monitoring insights you haven’t seen before, based on data gathered directly from the Kernel
  • Improve our coverage with metrics and charts that give you richer information about areas such as CPU, Hard Disk, Memory, Network, Applications, Interrupt Requests, and other Hardware
  • Enrich our alarming with a new set of use cases
  • Educate you about the Linux Kernel and its secrets

Most of all, we will offer insights from the Linux Kernel to each one of you effortlessly: everybody can have them out of the box with a one-line installation of the Netdata agent.

Roadmap

We will start our roadmap by using a set of eBPF tools that are already available. The plan is to move them into the Netdata Agent and simplify how you access and use them. IOvisor with the BCC tools and Cloudflare with the eBPF exporter did amazing groundwork on eBPF; we are thankful for their contribution, and we intend to extend the reach of their work by bringing the value of these tools to all of you.

We will start by offering charts for the following areas:

| Area | Short description |
|------|-------------------|
| Memory (HD access) | Page cache access, creation and modification |
| Memory (HD access) | File access using directory cache |
| Memory (HD access) | Memory swap access per process |
| Memory (HD access) | Count of unused pages in the cache |
| Memory | Memory allocation (page allocation) per application |
| Hard Disk | I/O operations and latency, to measure hardware health |
| Hard Disk | Monitor when a mount point is added or removed |
| Hard Disk | Monitor RAID flush |
| Hard Disk | Monitor sync() system calls, when data is written from memory to disk |
| Hard Disk/FS | I/O operations for specific filesystems (ext4*, xfs*, zfs*, nfs*, and btrfs*) |
| Hardware | x86_64 hardware errors |
| CPU | Instructions per CPU cycle, with hints on whether your system needs better memory or CPU |
| CPU | Last-level cache usage per application |
| CPU | How long tasks spend waiting for their turn to run on the CPU |
| IRQ - Interrupt Requests | Usage of hardware IRQs |
| IRQ - Interrupt Requests | Usage of software IRQs |
| Network | Port usage |
| Network | Number of successfully accepted connections per second |
| Network | Possible port scans against your systems |
| Network | Problems accepting connections for IPv4 and IPv6 |
| Network | TCP connections to external servers |
| Network | TCP connections that were dropped |
| Network | Network interface quality and round-trip time of established connections |
| Network | Summary of TCP connection state changes and statuses |
| Network | TCP information per subnet |
| Network | Number of TCP connections the server can still accept, measured from the TCP SYN backlog size |
| Network | TCP connections per application |
| Network | Bandwidth per network interface |
| Network | DNS latency |
| Application | Process termination, either by fatal error or exit |
| Application | File lifetime, important for temporary files |
| Application | Mutex lock events |
| Application | Out-of-memory killer traces |
| Application | Shared memory events |
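
To make the "instructions per CPU cycle" entry above more concrete, here is a minimal sketch of how such a hint could be derived from two hardware counters. The function names and the 1.0 threshold are illustrative assumptions (a commonly quoted rule of thumb), not the way the Netdata Agent computes or interprets this metric.

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return IPC: retired instructions divided by elapsed CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def bottleneck_hint(ipc: float, threshold: float = 1.0) -> str:
    """Give a rough hint based on IPC.

    Assumption: an IPC below the threshold suggests the CPU is often stalled
    waiting on memory, while a higher IPC suggests the workload is limited by
    the CPU itself. The right threshold depends on the hardware.
    """
    if ipc < threshold:
        return "likely memory-bound"
    return "likely CPU-bound"


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    ipc = instructions_per_cycle(instructions=500, cycles=1000)
    print(ipc, bottleneck_hint(ipc))
```

In practice the two counters would come from the CPU's performance monitoring unit; the sketch only shows the arithmetic and the kind of hint that can be built on top of it.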

Sidenotes:

  • Linux Kernel versions newer than 4.11 are supported
  • Netdata already has some of the above charts; with eBPF we expect better performance and lower resource utilization
  • The area names follow an internal naming convention, not the layers of the OSI model
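
If you want to check whether a machine meets the kernel requirement, a minimal sketch could look like the following. It assumes that "newer than 4.11" means a major.minor version strictly greater than 4.11; the helper names are hypothetical and this is not how the Netdata installer performs the check.

```python
import platform
import re
from typing import Tuple

# Assumption: "newer than 4.11" is read as major.minor strictly above (4, 11).
MIN_EXCLUSIVE = (4, 11)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.4.0-42-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(release: str) -> bool:
    """Return True if the kernel release is newer than the 4.11 series."""
    return parse_kernel_release(release) > MIN_EXCLUSIVE


if __name__ == "__main__":
    release = platform.release()  # e.g. the output of `uname -r` on Linux
    print(release, "supported" if is_supported(release) else "not supported")
```

Distribution kernels sometimes backport features, so a version comparison like this is only a first approximation.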

Some of the charts will be always on, while others that consume a lot of resources will be available on demand. You do not want your monitoring tools to eat up your system resources, but you do need more actionable insights when you troubleshoot and analyze system behaviour.

We will offer all of the above, and we are just warming up. The Linux Kernel has many secrets to reveal, and we will keep adding use cases to our roadmap. We plan to offer more than charts: other visualizations will follow to help you better consume and understand Linux Kernel information. On top of that, we will offer tools for increased security, tracing, and overall observability.

We need your help

Help us improve and offer you meaningful products.

  • Help us prioritize our work on a per-area basis by voting. Tell us which section is most important to you in the following poll:
Sections of interest
  • CPU
  • Hard Disk
  • Memory
  • Network
  • Applications
  • Interrupt Requests
  • Hardware

  • Give us feedback about the charts and the alerts that we develop
  • Let us help you with your eBPF needs. If you know of use cases that would be useful to our community, please reach out to us. Who knows, we might be able to solve your problem.