Environment
Debian 10
Problem/Question
Hi,
I’m trying to set up encrypted streaming between two netdata instances, one acting as a parent, and the other as a child.
Everything works as expected when streaming without SSL. But when I enable SSL in the child destination configuration, the child suddenly can’t connect to the parent. It keeps trying to connect unsuccessfully.
Here is a sample of the child’s error.log :
2021-01-13 15:22:42: netdata INFO : STREAM_SENDER[child] : STREAM slave [send to parent:19996]: connecting...
2021-01-13 15:22:42: netdata INFO : STREAM_SENDER[child] : STREAM slave [send to parent:19996]: initializing communication...
2021-01-13 15:22:42: netdata ERROR : STREAM_SENDER[child] : SSL cannot connect with the server: error:00000000:lib(0):func(0):reason(0)
And here is a sample of the parent’s access.log :
2021-01-13 14:30:20: 521: 9620 '[CHILD_IP]:50842' 'CONNECTED'
2021-01-13 14:30:20: 521: 9620 '[CHILD_IP]:50842' 'DISCONNECTED'
2021-01-13 14:30:25: 522: 9620 '[CHILD_IP]:50880' 'CONNECTED'
2021-01-13 14:30:25: 522: 9620 '[CHILD_IP]:50880' 'DISCONNECTED'
2021-01-13 14:30:30: 523: 9620 '[CHILD_IP]:50922' 'CONNECTED'
2021-01-13 14:30:30: 523: 9620 '[CHILD_IP]:50922' 'DISCONNECTED'
My configuration
Port 19999 is used behind an Nginx proxy with basic auth for the dashboard.
Port 19996 is open and directly used for streaming.
I use a letsencrypt certificate for SSL communications but this should not matter as I disabled certificate verification for testing.
Parent’s netdata.conf
[web]
ssl key = /etc/letsencrypt/live/certificate/privkey.pem
ssl certificate = /etc/letsencrypt/live/certificate/cert.pem
bind to = *:19999=dashboard|netdata.conf^SSL=optional, *:19996=streaming^SSL=optional
Parent’s stream.conf
[111...555]
enabled = yes
allow from = *
default history = 3600
default memory mode = ram
health enabled = yes
default postpone alarms on connect seconds = 60
Child’s stream.conf
[stream]
enabled = yes
destination = parent:19996:SSL
ssl skip certificate verification = yes
api key = 111...555
timeout seconds = 60
default port = 19999
send charts matching = *
buffer size bytes = 1048576
reconnect delay seconds = 5
initial clock resync iterations = 60
Hello @lmoretba ,
Let me ask a few questions to better understand the problem you are having:
1 - Can you access Netdata dashboard without any warning using a browser?
2 - Does Netdata have permission to access the key and certificate?
3 - What is the OpenSSL version installed on your host?
4 - Finally, when you run the following command, can you see any SSL library linked to netdata?
bash-5.1# ldd /usr/sbin/netdata | grep ssl
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f696e413000)
Best regards!
Hello @Thiago_Marques_0 , thank you very much for your quick answer!
There is no problem with the Netdata dashboard, I can access it behind the Nginx proxy and it works flawlessly without giving any alert.
I made sure that Netdata has access to the certificate files but this does not change anything (as expected since certificate check is currently disabled for testing).
The OpenSSL version is the same on both servers:
$ openssl version
OpenSSL 1.1.1d 10 Sep 2019
And it seems that libssl is indeed linked to netdata:
$ ldd /usr/sbin/netdata | grep ssl
libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007fa7fd1e0000)
Best regards!
Hello @lmoretba ,
Sorry for the delay; I had to deal with personal matters and could only now work with Debian.
I have now tested streaming with Netdata on Debian 10.7 with the same OpenSSL version you reported, and I set up an analogous configuration to be sure I could recreate the problem. Unfortunately, I could not reproduce it. After checking my netdata.conf, I remembered that I am using memory mode = ram instead of memory mode = dbengine there, and that I also changed my stream.conf to use RAM mode by setting default memory mode = ram, to simplify some tests I am doing during development. Could you please change the memory mode on the parent in both netdata.conf and stream.conf, and then restart your Netdata parent? I suspect that your database could be corrupted.
Best regards!
Hello @Thiago_Marques_0 ,
I just tried switching my parent’s netdata.conf to memory mode = ram and stream.conf to default memory mode = ram.
I then tried to restart both instances, but it did not change anything, so this probably rules out the database.
At this point I think that I probably made a silly mistake somewhere on my side or I forgot something. But I’m stuck figuring it out.
I tried temporarily disabling firewalls on both servers but it did not change anything.
I still don’t get why unencrypted communication with destination = parent:19996 works well, but destination = parent:19996:SSL does not.
Once again, thank you for your quick reply!
Best regards!
Hello,
I remember having a problem with Letsencrypt at the beginning of the project: my certificate was not recognized, because when OpenSSL tried to verify whether the certificate was valid, the verifiers did not recognize it. If this is your case, you can copy the server certificate to the machine and set the variable CApath = /etc/ssl/certs/ inside the child's stream.conf. I will try to create a certificate using certbot to test whether this is the problem.
Now, about the configuration: I have the following in my environment:
On Parent:
My netdata.conf:
[global]
run as user = thiago
[web]
ssl key = /etc/netdata/ssl/key.pem
ssl certificate = /etc/netdata/ssl/cert.pem
bind to = *=dashboard|registry|streaming|management|netdata.conf|badges *:20000=dashboard|registry|streaming|netdata.conf|badges|management^SSL=optional *:20001=dashboard|registry|streaming|badges|management^SSL=force unix:/tmp/netdata/netdata.sock *:20002=streaming^SSL=optional
I tested using port 20002 and everything worked.
My stream.conf:
[11111111-2222-3333-4444-555555555555]
enabled = yes
allow from = *
health enabled by default = auto
default memory mode = ram
default postpone alarms on connect seconds = 60
update every = 15
On child:
My stream.conf:
[stream]
enabled = yes
destination = 192.168.0.12:20002:SSL
ssl skip certificate verification = yes
api key = 11111111-2222-3333-4444-555555555555
Hi again,
I’ve been working on this again today. I’ve tried different things and finally got it working, but things don’t make much sense anymore.
First, I tried using a self-signed OpenSSL certificate on the parent instead of the letsencrypt certificate.
I set the child to skip certificate verification and it did not work.
Then I added the self-signed certificate to the list of trusted certificates on the child following the instructions here. After that, SSL streaming was finally working.
That made me think that there was a problem with the letsencrypt certificate.
But I still didn’t understand why I had to add the certificate on the child server and why skip certificate verification = yes would not work.
So I removed the certificate from the child, and now skip certificate verification behaved as expected: if set to yes, I did not need to have the certificate on the child.
So then, out of curiosity, I tried using the letsencrypt certificate again, and it worked with skip certificate verification = yes, but the child could not verify it.
So at this point I’m a bit lost and I don’t really know why it did not work initially nor why it’s working now.
I also tried connecting other freshly installed netdata instances as child and they behave identically so the problem is probably on the parent side?
Yesterday I finally found out what went wrong and I feel dumb, but this still raises a few questions.
I basically forgot that with Letsencrypt certificates generated by certbot, the live directory does not contain the actual certificate files, but only symlinks that point to the real files in the archive directory.
So when I set permissions recursively on the live directory, they did not apply to those files. Now that the permissions are fixed, everything works fine on a fresh install.
What made me realize that there might be an issue with permissions is the fact that this OpenSSL error (error:00000000:lib(0):func(0):reason(0)) also happened when I used an incorrect certificate path on purpose.
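For anyone hitting the same error, here is a minimal sketch of a check that follows the symlinks before testing read access. The function name check_readable is made up for illustration, and it assumes GNU readlink (as shipped on Debian); it is not something Netdata provides.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can read the file
# that a path finally resolves to. Certbot's live/ entries are symlinks,
# so the check has to follow them to the real file.
check_readable() {
    # readlink -f follows every symlink in the chain (GNU coreutils).
    target=$(readlink -f "$1") || return 2
    if [ -r "$target" ]; then
        echo "readable: $target"
    else
        echo "NOT readable: $target"
        return 1
    fi
}
```

The result only means something when the function runs as the account Netdata runs under (often netdata, but that is an assumption about your install), called on the ssl key and ssl certificate paths from netdata.conf. Run as root, it will report everything as readable.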
Now, it seems that when I previously tried using Letsencrypt certificates again just after using self-signed certificates, Netdata still used the self-signed certificates instead of erroring out like it should have done because it didn’t have permissions for Letsencrypt certificates?
Thank you very much @Thiago_Marques_0 for taking the time to investigate my problem on this dumb issue :')
Hello @lmoretba ,
First, I apologize for the delay; I have been busy these days working on new features for Netdata.
I will run some tests with certificates and permissions so that we can give a better error message. Thank you for reporting this error.
You are welcome! It is always a pleasure to help our users!
Best regards!