Alert status escalation

Hello Team,

We would like to enhance an alert for monitoring PostgreSQL replication slots by raising a warning if the condition persists for 6 hours and escalating to critical if it persists for 24 hours, based on the following alert configuration:

 template: Alerts_Inactive_Replication_Slots
       on: Alerts.Inactive_Replication_Slots
   lookup: average -5m unaligned of Inactive
    units: slots
    every: 60s
     warn: $this >= (1)
    delay: up 6h
  summary: Inactive Replication Slots
     info: Inactive Replication Slots
       to: dba

Can you help us to find a solution?

Regards,
Richard Barrantes

Hello @Richard_Barrantes,

The configuration you shared is pretty close to what you want; it just needs a few small changes.

If you want the alert to evaluate a longer time window, you need to change the query (the lookup field). Changing the delay field would only delay the notification; the alert would still trigger on the average of the last 5 minutes.
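
To illustrate the difference, here is a minimal Python sketch (not Netdata code). It assumes one sample per minute and uses a made-up helper, window_average, to mimic an average lookup over the most recent samples. A 10-minute outage already pushes the 5-minute average to 1, while the 6-hour average only reaches 1 once the slot has been inactive for the whole window.

```python
def window_average(samples, window):
    """Average of the last `window` samples (hypothetical helper, not a Netdata API)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

# Assumption: one sample per minute; 0 = no inactive slots, 1 = one inactive slot.
healthy = [0] * (6 * 60)                  # six healthy hours
short_outage = healthy + [1] * 10         # followed by 10 inactive minutes
long_outage = healthy + [1] * (6 * 60)    # followed by 6 inactive hours

five_min = 5
six_hours = 6 * 60

short_5m = window_average(short_outage, five_min)   # short window sees only the outage
short_6h = window_average(short_outage, six_hours)  # long window is still mostly healthy
long_6h = window_average(long_outage, six_hours)    # whole window is inactive
```

Keep in mind that an average can also reach 1 in other ways, for example two inactive slots for half of the window, so this is an approximation of "inactive for the whole period".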

Querying a large time window is expensive: it bypasses tiering and requests high-resolution data from tier 0 (per second) rather than tier 1 (per minute) or tier 2 (per hour). For that reason we probably do not want to check the alert as often as every 60s. Let’s run the check every 15 minutes, for example, by changing the every field.
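
As a rough back-of-the-envelope sketch in Python, assuming one stored point per second in tier 0 and that every check reads the full lookup window (the function name is made up for illustration):

```python
def points_read_per_day(window_seconds, every_seconds):
    """Rough estimate: checks per day times points read per check.

    Assumes one stored point per second (tier 0) and that every check
    reads the full lookup window. Hypothetical helper, illustration only.
    """
    checks_per_day = 24 * 3600 // every_seconds
    return checks_per_day * window_seconds

six_hours = 6 * 3600

every_minute = points_read_per_day(six_hours, 60)       # 1440 checks per day
every_15_min = points_read_per_day(six_hours, 15 * 60)  # 96 checks per day
reduction = every_minute // every_15_min                # how much less data is read
```

Under these assumptions, checking every 15 minutes reads 15 times fewer points per day than checking every 60 seconds.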

So the end result should be something like this:

 template: Alerts_Inactive_Replication_Slots_6h
       on: Alerts.Inactive_Replication_Slots
   lookup: average -6h unaligned of Inactive
    units: slots
    every: 15m
     warn: $this >= 1
  summary: Inactive Replication Slots 6h
     info: Inactive Replication Slots 6h
       to: dba

Unfortunately, a single alert cannot use different lookup time windows for its warning and critical conditions, so for the 24h check we need to create a second alert:

 template: Alerts_Inactive_Replication_Slots_24h
       on: Alerts.Inactive_Replication_Slots
   lookup: average -24h unaligned of Inactive
    units: slots
    every: 30m
     crit: $this >= 1
  summary: Inactive Replication Slots 24h
     info: Inactive Replication Slots 24h
       to: dba

Also, I see that both the chart (the on field) and the dimension (Inactive in lookup: average ... of Inactive) contain upper-case letters. Please confirm those are correct; usually they are all lower case.

I hope this helps.

Hello @car12o

Thank you for the information! I will proceed with setting up a separate alert for the 24-hour check. This was very helpful.

Regards,