High CPU usage from eBPF collectors when monitoring 3rd-party mobile automation bridges?

Hi everyone,
I’ve been using Netdata to monitor a small cluster of Mac minis that we use for automated iOS testing, and I’ve run into a weird performance spike. I’m trying to use the ebpf.plugin to get a better look at system call latency, but I’m seeing CPU usage jump by 15-20% whenever my automation scripts are active.

For context, the environment I’m monitoring uses a delta mod as the primary execution layer for our mobile scripts. It seems that when the mod starts injecting logic into the mobile simulator, Netdata’s apps.plugin starts reporting massive amounts of “interrupt” and “softirq” activity on the host machine. I can’t tell if Netdata is struggling to keep up with the rapid-fire process creation or if the mod itself is causing a bottleneck that Netdata is just reporting.

Has anyone here successfully tuned their netdata.conf for high-velocity script environments like this? I’m considering disabling the cgroups collector for these specific containers to see if it lowers the overhead, but I don’t want to lose visibility into the memory usage. Should I be looking into the per_core settings for the eBPF plugin, or is there a way to exclude the executor’s specific child processes from the real-time tracking? I’d love to hear how you guys handle monitoring volatile automation tools without the monitoring agent itself becoming the biggest resource consumer on the box!

Dear @lola667,

Thanks for your feedback! We have an open PR that completely refactors ebpf.plugin (https://github.com/netdata/netdata/pull/21676). We expect it to be reviewed soon and to fix this issue.

To be sure we haven’t missed anything, would you mind sharing your distribution? That way we can test on a setup just like yours.

In the meantime, you can disable the plugin in netdata.conf or disable specific threads in ebpf.d.conf.
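
As a rough sketch of both options: the first fragment turns the whole plugin off in netdata.conf, and the second keeps the plugin running but switches off individual threads in ebpf.d.conf. The section and key names are the ones I would expect in a stock install, so please check them against the files shipped with your version, as they can differ between releases. The choice of threads is only an illustration.

```ini
# netdata.conf: disable ebpf.plugin entirely
[plugins]
    ebpf = no

# ebpf.d.conf: keep the plugin, but stop selected threads
# (the threads chosen here are illustrative, not a recommendation)
[ebpf programs]
    process = yes
    socket = no
    cachestat = no
```

Restart the Netdata service after editing either file so the change takes effect.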