Fping plugin not starting properly most of the time

Hello,

I have been using netdata v1.31.1 for a couple of months now, and most things work well apart from the fping plugin, which fails randomly.

On my Netdata master, I have set up the plugin several times for different hosts, as described in the documentation:

[root@monitoring-apps [DEV] ~]# ls -altr /opt/netdata/netdata-configs/fping*
-rw-r--r--. 1 netdata netdata 1233 Jul  9 11:27 /opt/netdata/netdata-configs/fping.conf
-rw-r--r--. 1 netdata netdata 1434 Sep 14 12:59 /opt/netdata/netdata-configs/fpingtalendrcc.conf
-rw-r--r--. 1 netdata netdata 1293 Sep 14 13:00 /opt/netdata/netdata-configs/fpingalerta.conf
-rw-r--r--. 1 netdata netdata 1299 Sep 14 13:00 /opt/netdata/netdata-configs/fpingattunityrcc.conf
-rw-r--r--. 1 netdata netdata 1277 Sep 14 13:00 /opt/netdata/netdata-configs/fpingbackupdns.conf
-rw-r--r--. 1 netdata netdata 1770 Sep 14 13:00 /opt/netdata/netdata-configs/fpingconfluentrcc.conf
-rw-r--r--. 1 netdata netdata 1302 Sep 14 13:00 /opt/netdata/netdata-configs/fpingselfservicepasswordrcc.conf

Each file contains at most 8 hosts, so this is a small setup, but 95% of the time only half of the plugins are active after a restart.

2021-09-14 13:05:48: fpingalerta.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingalerta.conf'...
2021-09-14 13:05:48: perf.plugin INFO  : MAIN : no charts enabled - nothing to do.
2021-09-14 13:05:48: fpingalerta.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 alerta.xxxxxxxxxxxxxx.local
2021-09-14 13:05:48: netdata INFO  : PLUGINSD[perf] : called DISABLE. Disabling it.
2021-09-14 13:05:48: netdata INFO  : PLUGINSD[perf] : PARSER ended
2021-09-14 13:05:48: netdata ERROR : PLUGINSD[perf] : '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 16683) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-14 13:05:48: netdata ERROR : PLUGINSD[perf] : child pid 16683 exited with code 1.
2021-09-14 13:05:48: netdata ERROR : PLUGINSD[perf] : '/opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin' (pid 16683) exited with error code 1 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-14 13:05:48: netdata INFO  : PLUGINSD[perf] : thread with task id 16680 finished

So a couple of plugins generate this error, not always the same, and it’s REALLY complicated to finally have all of them available at the same time after restart. Of course, when I run it manually, everything works fine.

Can somebody help me solve this, please?

Best,
Jerome

Hi, @jrevillard. Let’s see what is happening.

The topic is about fping.plugin, but later you say you have problems with other plugins.

  • fping.plugin

According to your logs, it works (or at least I don’t see it failing):

2021-09-14 13:05:48: fpingalerta.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 alerta.xxxxxxxxxxxxxx.local

  • other plugins

I see only perf.plugin in the logs. It doesn’t work, indeed. But that is expected because it is disabled by default (can be changed in netdata.conf).


What plugins do you mean by saying “all of them”?

To see running external plugins (fping is an external) you can use ps faxu | grep "[n]etdata"

This is what I get on my VM:

ilyam@debian-s-1vcpu-1gb-fra1-01:~$ ps faxu | grep "[n]etdata"
netdata  16003  1.2 13.5 402456 138196 ?       Ssl  11:23   4:47 /opt/netdata/usr/sbin/netdata -P /opt/netdata/var/run/netdata/netdata.pid -D
netdata  16027  0.0  0.2  51524  2464 ?        Sl   11:23   0:00  \_ /opt/netdata/usr/sbin/netdata --special-spawn-server
netdata  16246  1.0  0.4  37676  4760 ?        S    11:23   4:02  \_ /opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin 1
netdata  16247  0.0  0.3   7848  3128 ?        S    11:23   0:23  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.8
netdata  16248  0.5  2.3 723888 23504 ?        Sl   11:23   2:00  \_ /opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin 1
netdata  16249  0.2  5.1 108904 52808 ?        Sl   11:23   0:52  \_ /usr/bin/python /opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin 1
netdata  18672  0.0  0.3  36136  3356 ?        S    15:23   0:00  \_ /opt/netdata/usr/libexec/netdata/plugins.d/perf.plugin 1 cycles instructions
netdata  20054  0.1  0.2   9724  2796 ?        S    17:23   0:02  \_ bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1

OK, sorry if I didn’t explain this properly… perhaps the log excerpt that I posted is not actually relevant.

My problem is only with the fping plugin (at least, that is the one I am concentrating on for the moment). As you can see in my first post, I have 6 different fping configurations, but when I check the processes, only 5 are actually running:

[root@monitoring-apps [DEV] ~]#  ps faxu | grep "[n]etdata" | grep fping
netdata  25323  0.0  0.0   1288  1124 ?        SN   Sep14   0:35  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 ...
netdata  25327  0.0  0.0   1288   856 ?        SN   Sep14   0:35  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 ...
netdata  25328  0.0  0.0   1288   860 ?        SN   Sep14   0:30  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 ...
netdata  25333  0.0  0.0   1288   860 ?        SN   Sep14   0:30  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 ...
netdata  25335  0.1  0.0   1292  1124 ?        SN   Sep14   0:52  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 ...

This is not a configuration issue because, for instance, if I restart again I may end up with all 6 available, or only 4, and not always the same ones… This is really annoying because at every restart it is really complicated to get everything in place.

Best,
Jerome

OK, so the problem occurs when using Multiple fping Plugins With Different Settings. I need to test it.

Btw, according to the documentation, that feature is meant for cases like this:

For example, you may need to ping a few hosts 10 times per second, and others once per second.

From your “ps” output it looks like all the plugins have the same settings (different targets only). Have you considered using one plugin with multiple targets?
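
As a rough sketch of that single-plugin setup, assuming the usual fping.conf variables (hosts, update_every, ping_every, fping_opts): the values below are inferred from the -Q 5 -p 1000 options in your ps output, and the two example.local host names are placeholders, not your real targets.

```shell
# Sketch of a single /opt/netdata/etc/netdata/fping.conf covering all targets.
# Host names ending in example.local are placeholders (assumption).
hosts="alerta.example.local kafka.example.local 192.168.50.20 192.168.50.30"

# Chart update interval in seconds (assumed to map to fping -Q 5)
update_every=5

# Interval between pings to each host, in milliseconds (assumed to map to fping -p 1000)
ping_every=1000

# Remaining options, copied from the ps output
fping_opts="-R -b 56 -i 1 -r 0 -t 5000"
```

With this layout only one fping process is started, so there is a single point to check after a restart.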

I need to keep them separate because of the way we deploy everything (Infrastructure as Code)… so for the moment, yes, I need separate files…

but this does not explain the current behaviour…

No, it doesn’t explain it. I suggested that so you can have a working fping.plugin while we are investigating the problem.

Could you guide me, please? What kind of logs do you need?

OK, I tried to reproduce the issue, but it worked for me.

  • I created 10 fping.plugin instances
[ilyam@pc ~]$ ps faxu | grep netdata | grep fping
netdata    64694  0.0  0.0   3504  1972 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.4
netdata    64699  0.0  0.0   3504  1756 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.8
netdata    64715  0.0  0.0   3504  2000 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.9
netdata    64725  0.0  0.0   3504  1968 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.88
netdata    64732  0.0  0.0   3504  1872 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.10
netdata    64733  0.0  0.0   3504  1964 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.3
netdata    64739  0.0  0.0   3504  1900 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.2
netdata    64748  0.0  0.0   3504  1964 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.5
netdata    64754  0.0  0.0   3504  1872 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.7
netdata    64764  0.0  0.0   3504  1968 ?        S    14:04   0:01  \_ /usr/bin/fping -N -l -Q 1 -p 200 -R -b 56 -i 1 -r 0 -t 5000 8.8.8.6
  • restarted netdata.service 10 times and checked the number of running fping instances
[pc ilyam]# for i in $(seq 1 10); do systemctl restart netdata.service; sleep 5; echo "run $i, fping instances: $(ps faxu | grep netdata | grep -c fping)"; done
run 1, fping instances: 10
run 2, fping instances: 10
run 3, fping instances: 10
run 4, fping instances: 10
run 5, fping instances: 10
run 6, fping instances: 10
run 7, fping instances: 10
run 8, fping instances: 10
run 9, fping instances: 10
run 10, fping instances: 10

Let’s do the following:

cd /opt/netdata/var/log/netdata/
sudo systemctl stop netdata
sudo cp /dev/null error.log
sudo systemctl start netdata
# wait for 5 seconds
grep fping error.log

Ok, so here it is:

[root@monitoring-apps [DEV] netdata]# grep fping error.log
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingconfluentrcc] : thread created with task id 20256
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingconfluentrcc] : set name of thread 20256 to PLUGINSD[fpingc
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingcruisecontrolrcc] : thread created with task id 20266
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingcruisecontrolrcc] : set name of thread 20266 to PLUGINSD[fpingc
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingalerta] : thread created with task id 20252
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingalerta] : set name of thread 20252 to PLUGINSD[fpinga
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingselfservicepasswordrcc] : thread created with task id 20254
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingselfservicepasswordrcc] : set name of thread 20254 to PLUGINSD[fpings
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingalerta] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingalerta.plugin' running on pid 20274
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingcruisecontrolrcc] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingcruisecontrolrcc.plugin' running on pid 20268
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : thread created with task id 20259
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : set name of thread 20259 to PLUGINSD[fping]
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingselfservicepasswordrcc] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingselfservicepasswordrcc.plugin' running on pid 20279
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingbackupdns] : thread created with task id 20253
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingbackupdns] : set name of thread 20253 to PLUGINSD[fpingb
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingconfluentrcc] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingconfluentrcc.plugin' running on pid 20267
2021-09-16 14:18:41: fpingcruisecontrolrcc.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingcruisecontrolrcc.conf'.
2021-09-16 14:18:41: fpingselfservicepasswordrcc.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingselfservicepasswordrcc.conf'.
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fping.plugin' running on pid 20288
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingbackupdns] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingbackupdns.plugin' running on pid 20306
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingattunityrcc] : 2021-09-16 14:18:41: fpingalerta.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingalerta.conf'.
2021-09-16 14:18:41: fpingconfluentrcc.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingconfluentrcc.conf'.
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingattunityrcc] : set name of thread 20255 to PLUGINSD[fpinga
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingtalendrcc] : thread created with task id 20265
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingtalendrcc] : 2021-09-16 14:18:41: tc-qos-helper.sh: WARNING: FireQoS is not installed on this system. Use FireQoS to apply traffic QoS and expose the class names to netdata. Check https://github.com/netdata/netdata/tree/master/collectors/tc.plugin#tcplugin
set name of thread 20265 to PLUGINSD[fpingt
2021-09-16 14:18:41: netdata INFO  : WEB_SERVER[static1] : 2021-09-16 14:18:41: fpingselfservicepasswordrcc.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingselfservicepasswordrcc.conf'...
2021-09-16 14:18:41: fpingcruisecontrolrcc.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingcruisecontrolrcc.conf'...
2021-09-16 14:18:41: fpingconfluentrcc.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingconfluentrcc.conf'...
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingattunityrcc] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingattunityrcc.plugin' running on pid 2021-09-16 14:18:41: fpingalerta.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingalerta.conf'...
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingtalendrcc] : connected to '/opt/netdata/usr/libexec/netdata/plugins.d/fpingtalendrcc.plugin' running on pid 20329
2021-09-16 14:18:41: fpingconfluentrcc.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 kafka.........
2021-09-16 14:18:41: fpingselfservicepasswordrcc.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 self-service-password..........
2021-09-16 14:18:41: fpingbackupdns.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingbackupdns.conf'.
2021-09-16 14:18:41: fping.plugin: INFO: Loading config file '/opt/netdata/usr/lib/netdata/conf.d/fping.conf'...
2021-09-16 14:18:41: fpingcruisecontrolrcc.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 cruise-control...........
2021-09-16 14:18:41: fpingalerta.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 alerta................
2021-09-16 14:18:41: fpingbackupdns.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingbackupdns.conf'...
2021-09-16 14:18:41: fpingattunityrcc.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingattunityrcc.conf'.
2021-09-16 14:18:41: fping.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fping.conf'...
2021-09-16 14:18:41: apps.plugin INFO  : MAIN : 2021-09-16 14:18:41: fpingattunityrcc.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingattunityrcc.conf'...
2021-09-16 14:18:41: fpingbackupdns.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 192.168.50.20 192.168.50.30
2021-09-16 14:18:41: fpingtalendrcc.plugin: WARNING: Cannot find file '/opt/netdata/usr/lib/netdata/conf.d/fpingtalendrcc.conf'.
2021-09-16 14:18:41: fpingattunityrcc.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 attunity...........
2021-09-16 14:18:41: fpingtalendrcc.plugin: INFO: Loading config file '/opt/netdata/etc/netdata/fpingtalendrcc.conf'...
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[ioping] : 2021-09-16 14:18:41: fping.plugin: FATAL: no hosts configured - nothing to do.
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : called DISABLE. Disabling it.
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : PARSER ended
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fping] : '/opt/netdata/usr/libexec/netdata/plugins.d/fping.plugin' (pid 20288) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fping] : child pid 20288 exited with code 1.
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fping] : '/opt/netdata/usr/libexec/netdata/plugins.d/fping.plugin' (pid 20288) exited with error code 1 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fping] : thread with task id 20259 finished
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingalerta] : read failed: end of file (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingalerta] : PARSER ended
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingalerta] : 2021-09-16 14:18:41: fpingtalendrcc.plugin: INFO: starting fping: /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 talend.............
'/opt/netdata/usr/libexec/netdata/plugins.d/fpingalerta.plugin' (pid 20274) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingalerta] : child pid 20274 exited with code 2.
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingalerta] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingalerta.plugin' (pid 20274) exited with error code 2 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingalerta] : thread with task id 20252 finished
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingattunityrcc] : read failed: end of file (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingattunityrcc] : PARSER ended
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingattunityrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingattunityrcc.plugin' (pid 20321) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingattunityrcc] : child pid 20321 exited with code 2.
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingattunityrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingattunityrcc.plugin' (pid 20321) exited with error code 2 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingattunityrcc] : thread with task id 20255 finished
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingcruisecontrolrcc] : read failed: end of file (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingcruisecontrolrcc] : PARSER ended
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingcruisecontrolrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingcruisecontrolrcc.plugin' (pid 20268) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingcruisecontrolrcc] : child pid 20268 exited with code 2.
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingcruisecontrolrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingcruisecontrolrcc.plugin' (pid 20268) exited with error code 2 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingcruisecontrolrcc] : thread with task id 20266 finished
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingtalendrcc] : read failed: end of file (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingtalendrcc] : PARSER ended
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingtalendrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingtalendrcc.plugin' (pid 20329) disconnected after 0 successful data collections (ENDs). (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingtalendrcc] : child pid 20329 exited with code 2.
2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingtalendrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingtalendrcc.plugin' (pid 20329) exited with error code 2 and haven't collected any data. Disabling it. (errno 22, Invalid argument)
2021-09-16 14:18:41: netdata INFO  : PLUGINSD[fpingtalendrcc] : thread with task id 20265 finished
2021-09-16 14:18:42: go.d ERROR: prometheus[fping-exporter_local] Get "http://127.0.0.1:9605/metrics": dial tcp 127.0.0.1:9605: connect: connection refused
2021-09-16 14:18:42: go.d ERROR: prometheus[fping-exporter_local] check failed

And for this restart, only 3 fping processes are active:

[root@monitoring-apps [DEV] netdata]#  ps faxu | grep "[n]etdata" | grep fping
netdata  20267  0.0  0.0   1292     4 ?        SN   14:18   0:00  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 kafka........
netdata  20279  0.0  0.0   1288     4 ?        SN   14:18   0:00  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 self-service-password.........
netdata  20306  0.0  0.0   1288     4 ?        SN   14:18   0:00  \_ /opt/netdata/bin/fping -N -l -Q 5 -p 1000 -R -b 56 -i 1 -r 0 -t 5000 192.168.50.20 192.168.50.30
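
To see at a glance which plugins were disabled in a given restart, something like the following could help. It is only a sketch: list_disabled_plugins is a hypothetical helper name, and it assumes the "exited with error code N" line format shown in the log above.

```shell
# Hypothetical helper: print "<plugin name> <exit code>" for every plugin
# that netdata disabled, based on the "exited with error code" log lines.
list_disabled_plugins() {
    # $1: path to netdata's error.log
    sed -n 's/.*PLUGINSD\[\([^]]*\)] : .* exited with error code \([0-9][0-9]*\) and .*/\1 \2/p' "$1" | sort -u
}
```

Running it as list_disabled_plugins error.log from /opt/netdata/var/log/netdata/ should print one line per disabled plugin, which is easier to compare with the ps output than the raw log.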

Best,
Jerome

Dear @ilyam8, would you have time to look at the problem, please?

Please configure fping.conf so that the plugin doesn’t stop with "fping.plugin: FATAL: no hosts configured - nothing to do." What are the contents of the logs after that?

Hey, @jrevillard

I am not sure what is happening, but I suspect there is a problem with resolving DNS names.

You have several log lines like

2021-09-16 14:18:41: netdata ERROR : PLUGINSD[fpingtalendrcc] : '/opt/netdata/usr/libexec/netdata/plugins.d/fpingtalendrcc.plugin' (pid 20329) exited with error code 2 and haven't collected any data. Disabling it. (errno 22, Invalid argument)

See the fping man page:

DIAGNOSTICS
Exit status is 0 if all the hosts are reachable, 1 if some hosts were unreachable, 2 if any IP addresses were not found, 3 for invalid command line arguments, and 4 for a system call failure.
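
For quick reference, the exit statuses quoted above can be turned into a small lookup. This is just a sketch; fping_exit_meaning is a made-up helper name, and the wording follows the man page excerpt.

```shell
# Hypothetical helper: translate an fping exit status into the meaning
# given in the DIAGNOSTICS section of the fping man page.
fping_exit_meaning() {
    case "$1" in
        0) echo "all hosts are reachable" ;;
        1) echo "some hosts were unreachable" ;;
        2) echo "some IP addresses were not found" ;;
        3) echo "invalid command line arguments" ;;
        4) echo "system call failure" ;;
        *) echo "unknown exit status" ;;
    esac
}
```

The plugins in your log exit with code 2, which corresponds to the "IP addresses were not found" case.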

Hi @ilyam8,

Perhaps, but what can I do about it? I already have a DNS cache, and if I run everything manually, everything works… Perhaps starting all the fping processes at the same time overloads DNS… Would it be possible to have some kind of retry functionality?
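
To illustrate the kind of retry I have in mind, here is a minimal sketch. It is not something netdata provides: wait_for_dns and RESOLVE_CMD are hypothetical names, and I am assuming getent is available for the lookups.

```shell
# Hypothetical helper, not part of netdata: wait until every name resolves.
# RESOLVE_CMD is an assumption so the lookup command can be swapped out.
RESOLVE_CMD="${RESOLVE_CMD:-getent hosts}"

wait_for_dns() {
    # usage: wait_for_dns <max_attempts> <delay_seconds> <host>...
    max_attempts="$1"
    delay="$2"
    shift 2
    for host in "$@"; do
        attempt=1
        # Retry the lookup until it succeeds or the attempts run out
        until $RESOLVE_CMD "$host" >/dev/null 2>&1; do
            if [ "$attempt" -ge "$max_attempts" ]; then
                echo "giving up on $host after $attempt attempts" >&2
                return 1
            fi
            attempt=$((attempt + 1))
            sleep "$delay"
        done
    done
    return 0
}
```

The idea would be to call something like wait_for_dns 5 1 followed by the host names before fping is started, so that a temporary lookup failure does not immediately disable the plugin.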

Best,
Jerome

@jrevillard let’s confirm that assumption, can you try switching to IP addresses temporarily and see if this fixes the problem?
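
If it helps with the temporary switch, the addresses could be collected along these lines. This is only a sketch, assuming getent is available; names_to_ips and LOOKUP_CMD are hypothetical names.

```shell
# Hypothetical helper: print the first address found for each host name,
# space separated, ready to paste into the hosts setting.
# LOOKUP_CMD defaults to "getent hosts"; overriding it is an assumption.
LOOKUP_CMD="${LOOKUP_CMD:-getent hosts}"

names_to_ips() {
    ips=""
    for name in "$@"; do
        # getent prints "ADDRESS NAME [ALIASES...]"; keep only the first address
        ip=$($LOOKUP_CMD "$name" 2>/dev/null | awk 'NR == 1 { print $1 }')
        if [ -n "$ip" ]; then
            ips="${ips:+$ips }$ip"
        else
            echo "could not resolve $name" >&2
        fi
    done
    echo "$ips"
}
```

Any name that fails to resolve is reported on stderr and left out of the list.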

Thx @ilyam8, I confirm that with IPs everything works… I can tell you that I have the same issue with the x509check and httpcheck plugins…

The thing is that DNS resolution works well… perhaps there is some flooding when netdata restarts?

Also, for information, I installed nscd on the server, but it does not help…