Netdata v2.2.0 broke php-fpm graphs

Problem/Question

We’ve been doing some php-fpm tuning lately. In the last day or so we noticed short, semi-sporadic gaps in the php-fpm graphs. Each period of “empty data” lasts between 15s and 60s, and the gaps occur at intervals of 60s to 300s.

To try to resolve this, we first restarted php-fpm, which did not help. We then restarted netdata. After the netdata restart, all php-fpm graphs showed “empty data”.

I have run some debugging and I can see data coming through:

BEGIN 'phpfpm_php-fpm.connections' 999465
SET 'active' = 15
SET 'max active' = 60
SET 'idle' = 48
END
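
For anyone wanting to check output like the above programmatically, here is a minimal sketch of a parser for those lines. It only handles the BEGIN/SET/END lines exactly as they appear in this excerpt and is not a full implementation of the plugin protocol; the function name `parse_plugin_output` is made up for illustration.

```python
import shlex


def parse_plugin_output(text):
    """Collect SET values per chart from BEGIN/SET/END debug output.

    Only the three line types shown in the excerpt are understood;
    anything else is ignored.
    """
    charts = {}
    current = None
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles quoted names such as 'max active'
        if not tokens:
            continue
        keyword = tokens[0]
        if keyword == "BEGIN" and len(tokens) >= 2:
            current = charts.setdefault(tokens[1], {})
        elif (keyword == "SET" and current is not None
              and len(tokens) == 4 and tokens[2] == "="):
            current[tokens[1]] = int(tokens[3])
        elif keyword == "END":
            current = None
    return charts


sample = """\
BEGIN 'phpfpm_php-fpm.connections' 999465
SET 'active' = 15
SET 'max active' = 60
SET 'idle' = 48
END
"""

print(parse_plugin_output(sample))
```

If a chart name shows up here with values, the collector did produce data for that interval.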

There are also no errors in the php-fpm log.

I believe that restarting netdata updated the netdata agent to v2.2.0 and that some incompatibility is causing this issue.

EDIT 1:

While doing more debugging I’m seeing some errors now.

DEBUG ERROR 01

DBG phpfpm/client.go:146 failed to JSON decode data: {...}
...
ERR phpfpm/collector.go:92 error on decoding response from socket '/var/run/php/php7.4-fpm.sock': invalid character '&' after object key:value pair collector=phpfpm job=php-fpm

DEBUG ERROR 02

DBG jobmgr/manager.go:303 creating phpfpm[php-fpm] job, config: map[__provider__:file reader __source__:discoverer=file_reader,file=/etc/netdata/go.d/phpfpm.conf __source_type__:user autodetection_retry:0 module:phpfpm name:php-fpm priority:70000 socket:/var/run/php/php7.4-fpm.sock status_path:/status?full&json update_every:1 user:www-data] component="job manager"
...
ERR module/job.go:243 check failed: error on decoding response from socket '/var/run/php/php7.4-fpm.sock': invalid character '"' after object key:value pair collector=phpfpm job=php-fpm

I don’t know if these are directly related or another issue altogether.

Relevant docs you followed/actions you took to solve the issue

Environment/Browser/Agent’s version etc

Netdata agent: v2.2.0 (stable)
OS: Debian 11 (bullseye)
PHP: 7.4

Can provide more info if required.

What I expected to happen

Having php-fpm data coming through to the cloud graphs

@route-dist we haven’t changed the phpfpm collector in v2.2.0.

DBG phpfpm/client.go:146 failed to JSON decode data: {…}

Can you share the data?

I’m sorry for the late reply.

I thought I had saved the data somewhere, but it seems I did not.

Going back to the debug, however, the error is no longer present.

Just a note, though. I seem to have resolved it prior with the following steps:

  • Changed phpfpm config to values before the issue appeared
  • Restarted phpfpm
  • Restarted apache
  • Restarted netdata

After these steps phpfpm data seems to have gone back online.

Seeing as phpfpm data was back online, I changed the phpfpm configs to my preferred values again, followed the restart process as above and phpfpm remained online.

The only difference of note with my restart process was that I added the apache restart.

There were no other changes.

@ilyam8

Upon further investigation I have identified the cause as malicious HTTP requests hitting our web servers at the time. Everything else was coincidental.

It seems that while phpfpm was “fine”, the /status data that was sent to netdata caused parsing errors.

So while I think the version change wasn’t the cause, the php-fpm collector is still the issue.

How do you expect it to work? What’s the issue?

While php-fpm is still working on the node, I would expect the corresponding graphs to not have missing data points.

The php-fpm collector appears to be having trouble parsing some URIs. This stops it from sending data to netdata cloud, which causes the gaps in the graphs. It also means the collector cannot initialize properly after a restart, so it then sends no data at all.

But it failed to decode the response

DBG phpfpm/client.go:146 failed to JSON decode data: {…}

That means no data was collected.

The php-fpm collector appears to be having trouble parsing some URIs.

What do you mean by “parsing some URIs”? Can you give me a data sample?

I can’t help but feel this is a failure (as you say) with netdata but you seem to suggest the failure is elsewhere.

I don’t have any json data that caused the issue. I only managed to capture the error and forgot to capture the data.

My assumption is that some of the malicious URIs caused the json data not to be parsed properly.

This may be related:

A case study on this exact issue in two blog posts:

The bottom line is, this issue was caused by the improper json_encoding of data in php (prior to PHP 8.1).
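
To illustrate the kind of failure involved, here is a minimal sketch using Python's standard json module. The payloads are invented (I never captured the real data), with field names modeled on php-fpm's full status output, and the assumption is that a request URI containing a double quote ends up in the JSON without being escaped. Go's decoder words the error differently ("invalid character ... after object key:value pair", as in the logs above), but the effect is the same: the whole response is rejected.

```python
import json

# Hypothetical payloads: same request URI, once with the inner quotes
# escaped and once without. All values are invented for illustration.
well_formed = r'{"pool":"www","processes":[{"pid":1234,"request uri":"/index.php?x=\"1\"&y=2"}]}'
malformed = r'{"pool":"www","processes":[{"pid":1234,"request uri":"/index.php?x="1"&y=2"}]}'


def try_decode(raw):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as err:
        return None, str(err)


for label, raw in (("escaped", well_formed), ("unescaped", malformed)):
    data, error = try_decode(raw)
    if error is None:
        print(label, "decoded, request uri:", data["processes"][0]["request uri"])
    else:
        print(label, "failed to decode:", error)
```

A single bad value in one process entry is enough to make the entire status response undecodable, which would explain why every php-fpm chart goes empty for that interval rather than just one.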

The solution according to Thomas Pike is to replace:

  • This : http://localhost/status?full&json
  • With this: http://localhost/status?json

The downside to making this change is that a few php-fpm graphs will not receive any data. However, this is an acceptable trade-off. No further changes are required.
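
If the status URL lives in several configs or scripts, the rewrite can be automated. This is only a sketch using Python's standard urllib.parse; the helper name `drop_full_flag` is hypothetical, and it assumes the only goal is to remove the `full` flag while leaving the rest of the URL untouched.

```python
from urllib.parse import parse_qsl, quote_plus, urlsplit, urlunsplit


def drop_full_flag(status_url):
    """Return status_url with the 'full' query flag removed."""
    parts = urlsplit(status_url)
    # keep_blank_values=True preserves bare flags such as "json"
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != "full"]
    # Rebuild the query, writing bare flags without a trailing "="
    query = "&".join(
        quote_plus(key) if value == "" else f"{quote_plus(key)}={quote_plus(value)}"
        for key, value in kept
    )
    return urlunsplit(parts._replace(query=query))


print(drop_full_flag("http://localhost/status?full&json"))
print(drop_full_flag("/status?full&json"))
```

The same function works on a bare path such as the `/status?full&json` value seen in the job config log above.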


Side Notes:

PHP rectified the issue here:

Blog post identifying problem and providing a solution: