Netdata Community

Declaim node from waroom via command line

At the moment, operators can only remove a node from a War Room manually, through the Cloud console.

We hope a script/command can be made available so that we can automate the operation. Maybe something like below:

sudo netdata-declaim.sh -token=<token> -rooms=<roomid> -url=https://app.netdata.cloud

Use case:

An operator manages a server pool of 200+ machines. On a weekly basis, some machines get rebuilt or removed from the pool. Currently, the operator needs to log in to netdata.cloud, navigate to the War Room, select "manage War Room", search for the node, and then remove the removed/rebuilt machines from the War Room.
With a script/command available, the operator could declaim a node from the War Room much more easily.

@Leonidas-Vrachnis We need to cover 2 major use cases regarding unclaiming:

  1. Users can unclaim their agents from the Cloud GUI through space management.
  2. Users can unclaim their agents through command line via the unclaiming script

The latter could be used even from an external machine, provided it has SSH access to the hosts where the agent runs.
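A minimal sketch of that external-machine approach, assuming a plain text file with one hostname per line and key-based SSH access to each host. The helper name `run_on_hosts`, the `SSH_CMD` override, and the hosts file are illustrative assumptions; the command to run remotely is passed in by the caller, since the unclaiming script itself does not exist yet.

```shell
#!/bin/sh
# Sketch only: run_on_hosts and SSH_CMD are hypothetical names, not part of Netdata.
# SSH_CMD can be overridden, e.g. SSH_CMD="ssh -o BatchMode=yes".
SSH_CMD="${SSH_CMD:-ssh}"

run_on_hosts() {
    hosts_file="$1"   # one hostname per line; blank lines and #-comments are skipped
    remote_cmd="$2"   # command to execute on every host
    failed=0
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so ssh does not consume the rest of the hosts file
        if ! $SSH_CMD "$host" "$remote_cmd" </dev/null; then
            echo "FAILED: $host" >&2
            failed=$((failed + 1))
        fi
    done < "$hosts_file"
    # succeed only if every host succeeded
    [ "$failed" -eq 0 ]
}
```

It could be called as `run_on_hosts retired-hosts.txt '<unclaim command>'`, where the file name and the remote command are placeholders. Hosts that are already dead will simply show up as failures, which is why this only covers machines that are still reachable.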

@Dongdong-Zheng said in Declaim node from waroom via command line:

If I can put a weight on the above two scenarios, I would estimate 80% UseCase A and 20% UseCase B.

Sorry if I sound fussy with my ‘greedy’ requirements. Unfortunately, managing a cloud server pool is no fun, and we have to deal with various issues on a daily basis.

First of all, thank you again, and note that your feedback is very valuable to us.

Your requirements are not greedy. Cloud infrastructures are indeed much more dynamic and thus require more effort. However, we are determined to find a good solution.

My early thoughts are:

  1. For UseCaseA: sudo netdata-unclaimed.sh would do the trick.
  2. For UseCaseB: a garbage collection setting that could be set either at the space level or at the room level. Nodes that are unreachable for a prolonged period would be removed automatically, with no action on your end.
  3. Alternative for UseCaseB: a way to access a subset of the Cloud API using a token, so that you can perform actions like node addition or removal programmatically. This would be useful if you already have a programmatic way of detecting non-responsive servers.
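To make the "unreachable for a prolonged period" rule concrete, here is a small sketch of the selection logic only. It assumes a hypothetical input of `node_id last_seen_epoch` pairs, one per line, coming from whatever monitoring you already have; neither this format nor the `stale_nodes` helper is something Cloud provides, and the actual removal step is deliberately left out because no such API exists yet.

```shell
#!/bin/sh
# Sketch only: stale_nodes and its input format are assumptions for illustration.
# Reads "node_id last_seen_epoch" lines on stdin and prints the ids of nodes
# whose last contact is older than max_age seconds.
stale_nodes() {
    now="$1"       # current time, seconds since the epoch
    max_age="$2"   # threshold in seconds
    awk -v now="$now" -v max_age="$max_age" \
        'NF >= 2 && (now - $2) > max_age { print $1 }'
}
```

For example, `stale_nodes "$(date +%s)" $((7 * 24 * 3600)) < last_seen.txt` would list nodes not seen for more than a week (the file name is a placeholder). The threshold is the knob that option 2 would expose as a setting, and the printed list is what option 3 would feed into a removal call.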

@Dongdong-Zheng I would like to ask: what do you think about options 2 and 3, in terms of effort and satisfaction on your end?

@Manos-Saratsis any thoughts on 2?

I am going to think and discuss with the team about it. So I may reply with extra feedback later.

Hi @OdysLam, an interim script to fulfill the unclaim task would be very helpful at this stage. I’d very much appreciate it if you can share your expertise on the matter.

Hi @Leonidas-Vrachnis, Thank you for clarifying the planned solution and I really appreciate the simplicity and beauty of the to-be solution.

In my real-world scenarios, servers to be rebuilt/terminated could be either dead or still alive. So I might have to reply ‘Yes’ to both your questions:
Could we run the script before a machine shuts down for a rebuild? YES, if the server in question is still running. (UseCase A)
Or would you need a script that could work from an external machine? YES, when the server is not responsive. (UseCase B)

If I can put a weight on the above two scenarios, I would estimate 80% UseCase A and 20% UseCase B.

Sorry if I sound fussy with my ‘greedy’ requirements. Unfortunately, managing a cloud server pool is no fun, and we have to deal with various issues on a daily basis.

Respectfully yours,

Thank you very much for your feedback.

As mentioned, introducing unclaiming is something we have planned, but it has been postponed due to other priorities.

However, I would like to describe how we envision the CLI approach working, and ask whether it would be useful in your use case.

sudo netdata-unclaimed.sh

Some key points:

  • The script will need to run on an agent that is already claimed.
  • You will not need to specify a token, rooms, or a URL.
  • The script will use the existing ACLK configuration and authorization mechanism

That means, however, that the script would need to run on the machine with the claimed agent. Would that work in your case?
Could we run the script before a machine shuts down for a rebuild?
Or would you need a script that could work from an external machine?

This is very good feedback, thanks! As Manos said, we are working on this. Would it help you if I created a simple script that bundles all the individual commands that are required to unclaim (e.g., deleting cloud.d, etc.)?
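As a rough idea of what such an interim script might look like, here is a sketch that covers only the one step named above, deleting the `cloud.d` directory. The function name `unclaim_agent` is made up, and the default library directory `/var/lib/netdata` is an assumption that may differ per install. It only drops the agent's local claiming state; it does not by itself remove the node from a War Room.

```shell
#!/bin/sh
# Sketch only: unclaim_agent is a hypothetical helper, not an official script.
# Removes the agent's local claiming state (the cloud.d directory).
unclaim_agent() {
    lib_dir="${1:-/var/lib/netdata}"   # assumed default; pass your own path if different
    claim_dir="$lib_dir/cloud.d"
    if [ ! -d "$claim_dir" ]; then
        echo "nothing to do: $claim_dir not found"
        return 0
    fi
    rm -rf -- "$claim_dir" || return 1
    echo "removed $claim_dir; restart the agent to apply"
}
```

On a real host this would need root, for instance by putting the function in a script and running that script with sudo, followed by a restart of the agent (on systemd hosts, typically `sudo systemctl restart netdata`).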

@Dongdong-Zheng Many thanks for using Cloud. We plan to offer unclaiming both from the command line and from the Cloud UI. Unclaiming will remove the node from the War Rooms as well. Unfortunately, you will have to wait a couple of months due to other priorities.