Virtualmin / Websites monitoring httpcheck, problem flood alerting

Hi,

I recently started using Netdata on my VPS, which runs Webmin/Virtualmin and multiple websites.
I want to set up Netdata to monitor each of my websites individually: latency, up/down status, number of requests, …

I found the plugin httpcheck.conf
https://learn.netdata.cloud/docs/collecting-metrics/synthetic-checks/http-endpoints

After setting up httpcheck.conf in my go.d directory, it seems to be working, as I can see it in my metrics.
However, I get multiple alerts every couple of minutes from all websites (Raised to Warning / Recovered) on timeouts.

I cannot find a way to investigate these alerts.
Are there indeed many timeouts?
Should I use web_log plugin to investigate?
Lowering the alert frequency is not my first choice, as I don’t yet understand the underlying problem.

I would like to know how to proceed with investigating and what setup of plugins is the best for my goal.

My setup:
netdata v1.44.3
Debian Linux 11

My httpcheck.conf:

# All available configuration options, their descriptions and default values:
# https://github.com/netdata/go.d.plugin/tree/master/modules/httpcheck

update_every : 30
#autodetection_retry : 0
#priority : 70000

jobs:

My questions:

  • Is there something wrong in my settings for httpcheck?
  • What is the way to go for investigating my timeout alerts?

Kind regards

It seems you are running Netdata on the same server that is hosting the websites, so you are testing them locally.

You could try curl -I https://site.nl on the same server and see what that says.

When testing locally, you could get different results than externally, depending on your setup and which interface(s) your webserver/load balancer is bound to.

Hi,

Thank you for your answer.
Actually, I have 2 VPS systems, both with Debian 11 and Webmin/Virtualmin.

VPS1 is the production server with all my websites running.
VPS2 is a test server where I set up httpcheck.conf.

So httpcheck is checking the live websites (on VPS1) from VPS2; it is not testing locally.

The curl -I command gives the same result on both VPSs:

HTTP/2 200
last-modified: Fri, 02 Feb 2024 13:34:43 GMT
etag: "1ca8a-610662cb497a8"
accept-ranges: bytes
content-length: 117386
cache-control: max-age=0
expires: Fri, 01 Mar 2024 11:32:26 GMT
vary: Accept-Encoding,User-Agent
strict-transport-security: max-age=31536000
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: no-referrer-when-downgrade
content-type: text/html
date: Fri, 01 Mar 2024 11:32:26 GMT
server: Apache

Hm, the response looks good. Maybe run it with the same interval as the httpcheck to see if it fails every once in a while?

Hi,

I played around with curl -I https://www.site.nl in a while loop with 30s wait.
It does not give me any timeouts at all, for around half an hour.
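
One difference may be worth ruling out here: curl -I sends a HEAD request and has no overall time limit by default, while httpcheck (as far as I understand its defaults) sends a GET and gives up after its timeout, which defaults to 1s. The sketch below tries to mimic that more closely by fetching the full page with a hard limit. The function names probe and classify are made up for this example.

```shell
#!/bin/sh
# classify RC OUT TIMEOUT: turn a curl exit code into a one-line verdict.
classify() {
    case "$1" in
        0)  echo "OK $2" ;;
        28) echo "TIMEOUT after ${3}s" ;;   # 28 = curl's "operation timed out"
        *)  echo "ERROR curl exit $1" ;;
    esac
}

# probe URL TIMEOUT: one full GET (body discarded) with a hard time limit.
# Prints "OK <status> <total seconds>" on success.
probe() {
    out=$(curl -s -o /dev/null --max-time "$2" -w '%{http_code} %{time_total}' "$1")
    classify "$?" "$out" "$2"
}

# Usage (runs until interrupted), one probe every 30 seconds:
#   while true; do echo "$(date '+%T') $(probe https://www.site.nl 1)"; sleep 30; done
```

If this loop reports timeouts at 1s but not at 3s, the site is probably just slow to deliver the full page rather than down.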

This has been driving me crazy for days now: the httpcheck plugin keeps alerting while curl shows no problems.

My question now:
How to investigate the httpcheck_web_service_timeouts alert?
In the Netdata dashboard?
Or perhaps with help of my server logs?

Or is there an alternative?
I only want to see whether my websites are up and running, and get emailed when one goes down.

Hello, while I don’t have a direct answer to your question — I just stumbled onto the following in the documentation:

To troubleshoot issues with the httpcheck collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.
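
For reference, a rough sketch of what that debug run can look like. The plugin directory /usr/libexec/netdata/plugins.d is the usual location but is an assumption here (static installs often live under /opt/netdata), and the wrapper function name is made up for this example.

```shell
#!/bin/sh
# run_httpcheck_debug [PLUGIN_DIR]: run only the httpcheck module in debug mode.
# PLUGIN_DIR defaults to the usual location; adjust it for your install.
run_httpcheck_debug() {
    plugin_dir=${1:-/usr/libexec/netdata/plugins.d}
    if [ ! -x "$plugin_dir/go.d.plugin" ]; then
        echo "go.d.plugin not found in $plugin_dir" >&2
        return 1
    fi
    # -d enables debug output, -m limits the run to a single module.
    # Running as the netdata user matches how the agent starts the plugin.
    sudo -u netdata "$plugin_dir/go.d.plugin" -d -m httpcheck
}
```

The debug output should show each request and why it was counted as a timeout or failure. Stop it with Ctrl+C.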

Thanks for the tip.
Unfortunately I have read that doc over and over again, but it did not help me find an answer.

I now have a setup that monitors only one site in httpcheck.
I will keep experimenting with parameters like timeout to see what happens.
The default timeout is 1s; I have set it to 3s and am waiting for results now.
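
For anyone with many sites, here is a small sketch that prints a jobs section with the same timeout for each URL, to be pasted into httpcheck.conf. The helper name gen_httpcheck_jobs and the naming scheme (host name with dots replaced by underscores) are assumptions for illustration; name, url and timeout are the per-job options as I understand them from the module docs.

```shell
#!/bin/sh
# gen_httpcheck_jobs TIMEOUT URL...: print a jobs section with one job per URL.
gen_httpcheck_jobs() {
    timeout=$1; shift
    echo "jobs:"
    for url in "$@"; do
        # Job name: the host part of the URL with dots replaced by underscores.
        name=$(printf '%s' "$url" | sed -e 's|^[a-z]*://||' -e 's|/.*$||' -e 's|\.|_|g')
        printf '  - name: %s\n    url: %s\n    timeout: %s\n' "$name" "$url" "$timeout"
    done
}

# Usage:
#   gen_httpcheck_jobs 3 https://www.site.nl
```

Review the output before using it, and restart Netdata after changing httpcheck.conf.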

Hi,

I just want to let you know what was happening on my webserver and what solution I implemented for the “flood alerting”.

The problem was:
I installed Netdata on my 2 servers, with the HTTP endpoint collector on server1.
The HTTP endpoint checks were set up for all my websites running on server2.
This resulted in a flood of timeout alerts.

My solution:
Setting the timeout to 3 seconds resulted in fewer timeout alerts, but I thought there were still too many.

Investigating further on server2, where all my WordPress websites run, I saw in Netdata a huge number of HTTP requests answered with 404. Digging into the individual access logs, I found these 404 GETs.
What is that?
As I already knew, WordPress websites all over the world are under constant attack by hackers.
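
A quick way to see who is behind such 404s is to count them per client IP in an access log. This is only a sketch: it assumes Apache's common/combined log format (client IP in field 1, status code in field 9), and the function name top_404_ips is made up for this example.

```shell
#!/bin/sh
# top_404_ips LOGFILE [N]: list the N client IPs with the most 404 responses.
# Assumes Apache common/combined log format: field 1 = client IP, field 9 = status.
top_404_ips() {
    awk '$9 == 404 { count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" |
        sort -rn | head -n "${2:-10}"
}

# Usage (the log path is an example, adjust it to your virtual server):
#   top_404_ips /var/log/virtualmin/example.com_access_log 20
```

A handful of IPs with hundreds of 404s each usually points to scanners probing for vulnerable files rather than real visitors.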

My solution would be:
Stop the flood as soon as possible by having my server block the offending IP addresses.
I already have my webserver protected with Fail2ban, which covers sshd and postfix: offending IP addresses get banned.
I implemented a jail for attackers of my websites, and now I see that a lot of IP addresses get banned for a “404 flood” as well.
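
For readers who want to try something similar, here is a rough sketch of such a jail, not the exact one used here. The jail name apache-404, the regex and the thresholds (20 hits in 60 seconds, 1 hour ban) are all assumptions to adapt, and the helper write_404_jail only writes the two files into a Fail2ban-style directory.

```shell
#!/bin/sh
# write_404_jail CONF_DIR LOGPATH: write a 404 filter and jail under CONF_DIR.
# All names and thresholds below are hypothetical examples.
write_404_jail() {
    conf_dir=$1; logpath=$2
    mkdir -p "$conf_dir/filter.d" "$conf_dir/jail.d"

    # Filter: match any request that was answered with a 404.
    cat > "$conf_dir/filter.d/apache-404.conf" <<'EOF'
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD) [^"]+" 404\s
ignoreregex =
EOF

    # Jail: ban an IP that triggers the filter maxretry times within findtime.
    cat > "$conf_dir/jail.d/apache-404.conf" <<EOF
[apache-404]
enabled  = true
port     = http,https
filter   = apache-404
logpath  = $logpath
maxretry = 20
findtime = 60
bantime  = 3600
EOF
}

# Usage (as root; the log path is an assumption for a Virtualmin setup):
#   write_404_jail /etc/fail2ban "/var/log/virtualmin/*_access_log"
#   fail2ban-client reload && fail2ban-client status apache-404
```

Before enabling it, the filter can be checked against a real log with fail2ban-regex. Keep in mind that ordinary visitors also cause the odd 404 (a missing favicon, for example), so the threshold should not be too low, and the monitoring server's IP is best added to ignoreip.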

For now this seems a good solution for me.
404 floods are stopped as soon as they start.
And the timeout alerts from the HTTP endpoint checks in Netdata have stopped as well.

Just letting you know :wink:

Regards,
Jan