Unable to add or edit health checks from WebUI (Plan: Business)

Suggested template:

Problem/Question

Unable to edit or create new health checks via the web UI.

Relevant docs you followed/actions you took to solve the issue

Environment/Browser/Agent’s version etc

  • Netdata 1.46.3
  • 2 Parents ( streaming between parents active, agents streaming to parents)
  • Plan: business

What I expected to happen

As reported in the June 6th blog post and included in the docs (alerts-&-notifications), I expect to be able to define new alerts or edit existing alerts via the web UI, or at least to use the old rule wizard from the web UI to generate working snippets that can be added later via health.d.

Steps to reproduce:

  • Create a new alert using the web UI (chart → bell icon → manage alert → add alert): this leads to a page (Manage Space / / Configurations) for node configurations, which does not show the alerts configuration.
  • Edit an existing alert using the web UI (alerts → edit this alert configuration): this leads to the same page (Manage Space / / Configurations) for node configurations, which does not show the alerts configuration either.

Hi, @Alessio_C. Can you record a short video? I tried your “Step to reproduce” and it worked for me.

Hi @ilyam8 ,

sure thing! I’ve just uploaded a short video here: let me know if I can provide additional details. Thanks

Hi @Alessio_C, thanks for reporting this!
I also tried this and it worked for me. Do you get any console errors in the browser developer tools when you try this?

@kapantzak One of the issues I see in @Alessio_C video:

  • Select node “gatetest” from “Nodes”.
  • Click on “Add alert” (system.cpu chart).
  • Issue: the UI opens a different node's Dyncfg menu (az-maestro-cache-…).

Hi @kapantzak, you are welcome, and thanks for your time and interest.

In the browser dev console, I only get a couple of “port disconnected from addon code” messages.

I’ll provide more information on our setup to help you reproduce this:

  • 2 parents running on separate hosts deployed via docker
  • stream synchronization across parents working as expected
  • agent (clients) streaming data to parents (parent1 or parent2, an HA setup)
  • both parents are streaming to Cloud
  • both parents’ health checks delegated to Cloud (parents configured via the netdata.conf [health] section, with enabled set to no)

UPDATE:
As pointed out by @kapantzak, the UI opens a different node's Dyncfg menu. If I scroll down to a parent (let’s say parent1), I can access the health menu!

But it’s still a bit confusing to me: how should I proceed to set up a new chart alert in this scenario?

Am I supposed to select a parent node and then distribute the new/modified alert on both parents?

@Alessio_C indeed, as @ilyam8 pointed out, we have a problem there: the form opens with the wrong node selected. I’m working on a fix and will release it asap.

@Alessio_C I just released a fix that hopefully solves the incorrectly selected node.

@kapantzak, I'm not sure whether the fix you mentioned is already live in production (Cloud). If it is, I’m still experiencing the same issue, as per screenshot1 and screenshot2.

@Alessio_C this “gatetest” node. Is “health” disabled on this node? Can you check the [health] section via http://IP:19999/netdata.conf?
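For anyone who wants to script this check, here is a minimal sketch. It assumes the text served at `/netdata.conf` uses `[section]` headers and `key = value` lines, and that lines starting with `#` are commented-out defaults. The helper names `fetch_conf` and `health_enabled` are made up for illustration and are not part of Netdata.

```python
import re
import urllib.request


def fetch_conf(base_url, timeout=5.0):
    """Download the configuration text served by the agent at /netdata.conf."""
    with urllib.request.urlopen(f"{base_url}/netdata.conf", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def health_enabled(conf_text):
    """Return the explicit `enabled` value from [health], or None if it is not set."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            section = header.group(1).strip()
            continue
        # Skip other sections, blank lines, and '#' lines (assumed to be defaults).
        if section != "health" or not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "enabled":
            return value.strip()
    return None


# Usage (hypothetical address, replace IP with the node you want to inspect):
# print(health_enabled(fetch_conf("http://IP:19999")))
```

A return value of `None` only means no explicit setting was found, so the agent's default applies.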

Hi @ilyam8 ,

confirmed: this “gatetest” host, like our other hosts, runs the netdata agent with [health] enabled = no, because we use parents.

Then you need to configure alerts on the parent instance.

@ilyam8 we have given up on the form for now, as it frequently messes up the ID anyway. Our current fallback is to set up health.d checks by injecting configuration files into both parents and then reloading [health] via automation. I hope this will be fixed in 2.x!
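A rough sketch of what such automation could look like, in case it helps others with a similar setup. Everything specific here is an assumption, not our actual tooling: the SSH host names, the container name, and the idea that the host's `/etc/netdata/health.d` directory is bind-mounted into the container. The only Netdata command used is `netdatacli reload-health`.

```python
import shlex
import subprocess

# All values below are assumptions for illustration only.
PARENTS = ["parent1", "parent2"]      # hypothetical SSH host names
CONTAINER = "netdata"                 # hypothetical Docker container name
HEALTH_DIR = "/etc/netdata/health.d"  # assumed to be bind-mounted into the container


def build_commands(local_file, parents=PARENTS, container=CONTAINER, health_dir=HEALTH_DIR):
    """Return the copy and reload commands for every parent, in order."""
    commands = []
    for host in parents:
        # Copy the same file to each parent so the rules stay identical.
        commands.append(["scp", local_file, f"{host}:{health_dir}/"])
        # Ask the running agent to reload its health configuration.
        commands.append(
            ["ssh", host, "docker", "exec", container, "netdatacli", "reload-health"]
        )
    return commands


def deploy(local_file, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in build_commands(local_file):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `deploy("my_alert.conf")` with the default `dry_run=True` only prints the commands, which makes it easy to review them before running anything against the parents.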

What do you mean? Can you provide an example?

A typical chain of actions could be the following:

  1. Go to a host chart (e.g. docker_container_status) and select alerts, then add.
  2. A new form appears with node names, and it generates an ID error.
  3. Select one of the parent nodes from the list, click on Health, and define the rule.
  4. You can apply the rule to: a) only the selected parent, or b) the selected parent and the selected node (the agent providing metrics).

In step 4 the additional parent (the second one in our case) is missing. AFAIK this is not fine in a multi-parent environment, where health rules should be aligned across parents.
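To illustrate what "aligned" means here, a small sketch that compares the health configuration text collected from each parent. How the text is collected is left out, and the function names are hypothetical. It treats lines starting with `#` as comments, which is an assumption about the health.d file format.

```python
import hashlib


def fingerprint(conf_text):
    """Hash a health config, ignoring blank lines, '#' lines, and surrounding whitespace."""
    lines = [ln.strip() for ln in conf_text.splitlines()]
    meaningful = [ln for ln in lines if ln and not ln.startswith("#")]
    return hashlib.sha256("\n".join(meaningful).encode("utf-8")).hexdigest()


def out_of_sync(configs):
    """Given {parent_name: config_text}, return the parents that differ from the first one."""
    names = list(configs)
    if not names:
        return []
    reference = fingerprint(configs[names[0]])
    return [name for name in names[1:] if fingerprint(configs[name]) != reference]
```

An empty list from `out_of_sync` means every parent carries the same effective rules as the first one listed.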

EDIT:
We bumped everything to v1.47.0, and the “missing ID” issue now seems to be less frequent or gone.