I am using httpcheck to verify that my websites are up and running.
I set this up for several websites in go.d/httpcheck.conf.
This works very well.
The problem is with the timeout alerts defined in health.d/httpcheck.conf.
The response times are good, so I expect no alerts to be raised.
However, they are often raised and then recovered.
Looking at the alert definition:
template: httpcheck_web_service_timeouts
on: httpcheck.status
class: Latency
type: Web Server
component: HTTP endpoint
lookup: average -5m unaligned percentage of timeout
every: 10s
units: %
warn: $this >= 10 AND $this < 40
crit: $this >= 40
delay: down 5m multiplier 1.5 max 1h
summary: HTTP check for ${label:url} timeouts
info: Percentage of timed-out HTTP requests to ${label:url} in the last 5 minutes
to: webmaster
This alert seems to calculate an average of the timeouts, so the response times are related to each other…
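Assuming the httpcheck.status chart records exactly one status per check (the matching dimension, such as timeout, is 1 and the others are 0), the lookup `average -5m unaligned percentage of timeout` should come down to the share of checks in the last 5 minutes that timed out, rather than a comparison of response times. The rough Python sketch below reproduces that arithmetic and the warn/crit thresholds from the definition; the function names and the one-check-per-second rate are illustrative assumptions, not Netdata internals.

```python
from collections import Counter


def timeout_percentage(statuses):
    """Percentage of checks in the window whose status was "timeout".

    Assumption: one status string per check (e.g. "success", "timeout"),
    mirroring the one-hot dimensions of the httpcheck.status chart.
    """
    if not statuses:
        return 0.0
    return 100.0 * Counter(statuses)["timeout"] / len(statuses)


def severity(pct):
    """Apply the warn/crit expressions from the alert definition."""
    if pct >= 40:  # crit: $this >= 40
        return "critical"
    if pct >= 10:  # warn: $this >= 10 AND $this < 40
        return "warning"
    return "clear"


if __name__ == "__main__":
    # Hypothetical 5-minute window at one check per second: 300 checks.
    window = ["timeout"] * 30 + ["success"] * 270
    pct = timeout_percentage(window)
    print(pct, severity(pct))
```

Under these assumptions, 30 timed-out checks scattered over 5 minutes (about one every 10 seconds) would already raise a warning, even if every other response is fast.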
I would like to be alerted only when the timeout specified in go.d/httpcheck.conf is reached several times, say 3 times, within the last 2 minutes or so.
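The rule described above is a count over a sliding time window rather than an average. A minimal Python sketch of that logic follows; `TimeoutWindow`, `record`, and the defaults of 3 timeouts in 120 seconds are hypothetical names and values chosen for illustration, not part of Netdata.

```python
from collections import deque


class TimeoutWindow:
    """Hypothetical helper: fires when at least `threshold` timeouts
    occurred within the last `window_s` seconds."""

    def __init__(self, threshold=3, window_s=120.0):
        self.threshold = threshold
        self.window_s = window_s
        self._timeouts = deque()  # timestamps of timed-out checks, oldest first

    def record(self, ts, status):
        """Record one check result at time `ts` (seconds).

        Returns True if the alert condition holds after this check.
        """
        if status == "timeout":
            self._timeouts.append(ts)
        # Forget timeouts that have slid out of the window.
        while self._timeouts and self._timeouts[0] <= ts - self.window_s:
            self._timeouts.popleft()
        return len(self._timeouts) >= self.threshold


if __name__ == "__main__":
    w = TimeoutWindow(threshold=3, window_s=120.0)
    checks = [(0, "timeout"), (30, "timeout"), (60, "success"),
              (90, "timeout"), (200, "success")]
    for ts, status in checks:
        print(ts, status, w.record(ts, status))
```

In health.d terms, something along the lines of `lookup: sum -2m unaligned of timeout` with `warn: $this >= 3` might express the same idea, assuming the timeout dimension is 1 for each timed-out check. Treat that as an untested starting point and verify it against the Netdata health configuration reference.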
How should I change the settings in go.d/httpcheck.conf and the alert definition in health.d/httpcheck.conf?
I think I understand it.
I have set the timeout in go.d/httpcheck.conf.
In the netdata httpcheck.status chart I can see the timeouts occurring.
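A check should only land in the timeout dimension when a request takes longer than the configured timeout, not when it is merely slower than usual. The Python sketch below is a rough analogue, not the go.d collector's code: `check` and `start_demo_server` are hypothetical names, the status labels are loosely modeled on the httpcheck.status dimensions, and a local test server with a deliberately slow `/slow` path stands in for a real website.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def check(url, timeout_s):
    """Return a status label for one HTTP check (labels are illustrative)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
        return "success"
    except urllib.error.HTTPError:
        return "bad_status"  # server answered, but with a 4xx/5xx code
    except urllib.error.URLError as err:
        # Timeouts while connecting or sending arrive wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return "no_connection"
    except socket.timeout:
        # Timeouts while waiting for the response are raised directly.
        return "timeout"


class _DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a website: /slow answers after 1 second."""

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(1.0)  # slower than the client timeout used below
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up after its timeout

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_server():
    """Start a local server on a free port; returns (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


if __name__ == "__main__":
    server, base = start_demo_server()
    try:
        print(check(base + "/fast", timeout_s=2.0))
        print(check(base + "/slow", timeout_s=0.2))
    finally:
        server.shutdown()
        server.server_close()
```

Here `timeout_s` plays the role of the timeout set in go.d/httpcheck.conf: the same slow endpoint counts as a success with a generous timeout and as a timeout with a tight one.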
Now I just need to find out from my server logs what is causing those timeouts.
I cannot find it quickly in the /var/log files on my Debian server, so I will have to dig into it.