Hi team,
I have one server acting as the master (parent) and four other servers as child nodes.
On the master node, I have configured the dbengine disk space (based on the retention calculation) to keep metrics for 90 days:
[global]
run as user = netdata
dbengine multihost disk space = 284766
# the default database size - 1 hour
history = 3600
update every = 5
apps = no
cgroups = no
However, metrics are not retained for more than about 4 days on average.
Let me know which logs you need to troubleshoot this further.
Below is the output of cat error.log | grep -i stream | grep -v -i info:
2022-06-22 09:55:03: netdata ERROR : STREAM_RECEIVER[london,[192.168.1.18]:56572] : STREAM london [receive from [192.168.1.18]:56572]: disconnected (completed 31384 updates).
2022-06-22 09:55:04: netdata ERROR : STREAM_RECEIVER[miami,[192.168.1.90]:40140] : STREAM miami [receive from [192.168.1.90]:40140]: disconnected (completed 27563 updates).
2022-06-22 09:55:04: netdata ERROR : STREAM_RECEIVER[hollywood,[192.168.1.30]:59062] : STREAM hollywood [receive from [192.168.1.30]:59062]: disconnected (completed 1906 updates).
2022-06-22 09:55:05: netdata ERROR : STREAM_RECEIVER[vegas,[192.168.1.27]:45496] : STREAM vegas [receive from [192.168.1.27]:45496]: disconnected (completed 37149 updates).
2022-06-22 09:55:07: go.d ERROR: prometheus[stream_exporter_local] Get "http://127.0.0.1:9178/metrics": dial tcp 127.0.0.1:9178: connect: connection refused
2022-06-22 09:55:07: go.d ERROR: prometheus[stream_exporter_local] check failed
2022-06-22 09:55:07: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] Get "http://127.0.0.1:9513/metrics": dial tcp 127.0.0.1:9513: connect: connection refused
2022-06-22 09:55:07: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] check failed
2022-06-22 10:49:59: netdata ERROR : STREAM_RECEIVER[miami,[192.168.1.90]:40150] : STREAM miami [receive from [192.168.1.90]:40150]: disconnected (completed 214152 updates).
2022-06-22 10:49:59: netdata ERROR : STREAM_RECEIVER[hollywood,[192.168.1.30]:59068] : STREAM hollywood [receive from [192.168.1.30]:59068]: disconnected (completed 222798 updates).
2022-06-22 10:50:00: netdata ERROR : STREAM_RECEIVER[vegas,[192.168.1.27]:45696] : STREAM vegas [receive from [192.168.1.27]:45696]: disconnected (completed 281981 updates).
2022-06-22 10:50:00: netdata ERROR : STREAM_RECEIVER[london,[192.168.1.18]:56588] : STREAM london [receive from [192.168.1.18]:56588]: disconnected (completed 249167 updates).
2022-06-22 10:50:01: go.d ERROR: prometheus[stream_exporter_local] Get "http://127.0.0.1:9178/metrics": dial tcp 127.0.0.1:9178: connect: connection refused
2022-06-22 10:50:01: go.d ERROR: prometheus[stream_exporter_local] check failed
2022-06-22 10:50:01: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] Get "http://127.0.0.1:9513/metrics": dial tcp 127.0.0.1:9513: connect: connection refused
2022-06-22 10:50:01: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] check failed
2022-06-22 10:59:14: netdata ERROR : STREAM_RECEIVER[hollywood,[192.168.1.30]:59072] : STREAM hollywood [receive from [192.168.1.30]:59072]: disconnected (completed 37234 updates).
2022-06-22 10:59:15: netdata ERROR : STREAM_RECEIVER[miami,[192.168.1.90]:41866] : STREAM miami [receive from [192.168.1.90]:41866]: disconnected (completed 40442 updates).
2022-06-22 10:59:15: netdata ERROR : STREAM_RECEIVER[london,[192.168.1.18]:58296] : STREAM london [receive from [192.168.1.18]:58296]: disconnected (completed 45954 updates).
2022-06-22 10:59:15: netdata ERROR : STREAM_RECEIVER[vegas,[192.168.1.27]:47212] : STREAM vegas [receive from [192.168.1.27]:47212]: disconnected (completed 54532 updates).
2022-06-22 10:59:37: go.d ERROR: prometheus[stream_exporter_local] Get "http://127.0.0.1:9178/metrics": dial tcp 127.0.0.1:9178: connect: connection refused
2022-06-22 10:59:37: go.d ERROR: prometheus[stream_exporter_local] check failed
2022-06-22 10:59:37: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] Get "http://127.0.0.1:9513/metrics": dial tcp 127.0.0.1:9513: connect: connection refused
2022-06-22 10:59:37: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] check failed
2022-06-22 11:03:18: netdata ERROR : STREAM_RECEIVER[london,[192.168.1.18]:58302] : STREAM london [receive from [192.168.1.18]:58302]: disconnected (completed 19981 updates).
2022-06-22 11:03:19: netdata ERROR : STREAM_RECEIVER[miami,[192.168.1.90]:41870] : STREAM miami [receive from [192.168.1.90]:41870]: disconnected (completed 17537 updates).
2022-06-22 11:03:19: netdata ERROR : STREAM_RECEIVER[hollywood,[192.168.1.30]:59076] : STREAM hollywood [receive from [192.168.1.30]:59076]: disconnected (completed 16240 updates).
2022-06-22 11:03:20: netdata ERROR : STREAM_RECEIVER[vegas,[192.168.1.27]:47216] : STREAM vegas [receive from [192.168.1.27]:47216]: disconnected (completed 23742 updates).
2022-06-22 11:03:21: go.d ERROR: prometheus[stream_exporter_local] Get "http://127.0.0.1:9178/metrics": dial tcp 127.0.0.1:9178: connect: connection refused
2022-06-22 11:03:21: go.d ERROR: prometheus[stream_exporter_local] check failed
2022-06-22 11:03:21: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] Get "http://127.0.0.1:9513/metrics": dial tcp 127.0.0.1:9513: connect: connection refused
2022-06-22 11:03:21: go.d ERROR: prometheus[openconfig_streaming_telemetry_exporter_local] check failed
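The disconnects above come in bursts: all four children drop within about two seconds of each other (09:55, 10:49, 10:59, 11:03), each followed by the same go.d check failures. That pattern may point at something on the parent side (for example a restart) rather than at any single child, though this is only an inference from the timestamps. A minimal Python sketch for summarising such bursts in a longer error.log, assuming the STREAM_RECEIVER line format shown above; the 10-second grouping window and the script name are arbitrary choices:

```python
import re
import sys
from datetime import datetime, timedelta

# Matches the netdata STREAM_RECEIVER "disconnected" lines in the format shown above.
DISCONNECT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): netdata ERROR : "
    r"STREAM_RECEIVER\[(?P<host>[^,]+),.*?"
    r"disconnected \(completed (?P<updates>\d+) updates\)"
)


def parse_disconnects(lines):
    """Return (timestamp, child hostname, completed updates) for each disconnect line."""
    events = []
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
            events.append((ts, m["host"], int(m["updates"])))
    return events


def group_bursts(events, window=timedelta(seconds=10)):
    """Group disconnects that fall within `window` of the first event of a burst."""
    bursts = []
    for event in sorted(events):
        if bursts and event[0] - bursts[-1][0][0] <= window:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return bursts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 stream_bursts.py /path/to/error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for burst in group_bursts(parse_disconnects(f)):
            hosts = ", ".join(sorted(host for _, host, _ in burst))
            print(f"{burst[0][0]}  {len(burst)} disconnects: {hosts}")
```

If every burst lists all children at once, comparing those timestamps with the parent's own start-up messages in the same log would be a reasonable next check.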
root@mediapc:/etc/netdata# cat /etc/netdata/stream.conf | grep -v " #" | grep -v ^#
[stream]
enabled = no
destination =
api key =
timeout seconds = 60
default port = 19999
send charts matching = *
buffer size bytes = 10485760
reconnect delay seconds = 5
initial clock resync iterations = 60
[xxxxx]
enabled = yes
allow from = *
default history = 3600
default memory mode = dbengine
health enabled by default = auto
default postpone alarms on connect seconds = 60
[MACHINE_GUID]
enabled = no
allow from = *
history = 3600
memory mode = save
health enabled = yes
postpone alarms on connect seconds = 60
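In this stream.conf, children that connect with the API key section ([xxxxx]) get its "default memory mode = dbengine", while the [MACHINE_GUID] section is still the literal template name, so it should match no real child and its "memory mode = save" should never apply. A quick sanity check in Python, under a simplified model (an assumption, not Netdata's actual lookup code): a section named after a child's machine GUID overrides the API key default, otherwise the API key default applies. Python's configparser is stricter than Netdata's own parser, so this only works on files that are plain INI like the one above:

```python
import configparser


def effective_memory_mode(stream_conf_text, api_key, machine_guid=None):
    """Return the memory mode a child would get on the parent (simplified model).

    Assumption: a [machine_guid] section with 'memory mode' overrides the API key
    section's 'default memory mode'; otherwise the API key default applies.
    """
    # interpolation=None so values such as '*' or '%' are taken literally
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(stream_conf_text)
    if (
        machine_guid
        and cp.has_section(machine_guid)
        and cp.has_option(machine_guid, "memory mode")
    ):
        return cp.get(machine_guid, "memory mode")
    return cp.get(api_key, "default memory mode", fallback="(netdata default)")
```

With the file above, any real child GUID falls through to "dbengine", which suggests the children should be using the multihost dbengine space rather than "save".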
root@mediapc:/etc/netdata# netdata -W buildinfo
Version: netdata v1.35.0-54-nightly
Configure options: '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--disable-dependency-tracking' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Install type: binpkg-deb
Binary architecture: x86_64
Packaging distro:
Features:
dbengine: YES
Native HTTPS: YES
Netdata Cloud: YES
ACLK Next Generation: YES
ACLK-NG New Cloud Protocol: YES
ACLK Legacy: NO
TLS Host Verification: YES
Machine Learning: YES
Stream Compression: NO
Libraries:
protobuf: YES (system)
jemalloc: NO
JSON-C: YES
libcap: NO
libcrypto: YES
libm: YES
tcalloc: NO
zlib: YES
Plugins:
apps: YES
cgroup Network Tracking: YES
CUPS: YES
EBPF: YES
IPMI: YES
NFACCT: YES
perf: YES
slabinfo: YES
Xen: NO
Xen VBD Error Tracking: NO
Exporters:
AWS Kinesis: NO
GCP PubSub: NO
MongoDB: NO
Prometheus Remote Write: YES
root@mediapc:/etc/netdata# netdatacli aclk-state
ACLK Available: Yes
ACLK Version: 2
Protocols Supported: Legacy, Protobuf
Protocol Used: Protobuf
MQTT Version: 3
Claimed: Yes
Claimed Id: xxxx
Cloud URL: https://app.netdata.cloud
Online: Yes
Reconnect count: 1
Banned By Cloud: No
Last Connection Time: 2022-06-22 10:38:22
Last Connection Time + 3 PUBACKs received: 2022-06-22 10:38:23
Last Disconnect Time: 2022-06-22 10:34:28
Received Cloud MQTT Messages: 16
MQTT Messages Confirmed by Remote Broker (PUBACKs): 11
> Node Instance for mGUID: "xxx hostname "mediapc"
Claimed ID: xxxx
Node ID: xxxxxxx
Streaming Hops: 0
Relationship: self
Alert Streaming Status:
Updates: 1
Batch ID: 46
Last Acked Seq ID: 839
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Last Submitted Seq ID: 839
Chart Streaming Status:
Updates: 1
Batch ID: 79
Min Seq ID: 1
Max Seq ID: 3766
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Sent Min Seq ID: 1
Sent Max Seq ID: 3766
Acked Min Seq ID: 1
Acked Max Seq ID: 3766
> Node Instance for mGUID: "xxx hostname "vegas"
Claimed ID: xxxx
Node ID: xxxxxxx
Streaming Hops: 1
Relationship: child
Streaming Connection Live: true
Alert Streaming Status:
Updates: 1
Batch ID: 119
Last Acked Seq ID: 4786
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Last Submitted Seq ID: 4786
Chart Streaming Status:
Updates: 1
Batch ID: 23
Min Seq ID: 3470
Max Seq ID: 3753
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Sent Min Seq ID: 3470
Sent Max Seq ID: 3753
Acked Min Seq ID: 3470
Acked Max Seq ID: 3753
> Node Instance for mGUID: "xxx hostname "hollywood"
Claimed ID: xxxx
Node ID: xxxxxxx
Streaming Hops: 1
Relationship: child
Streaming Connection Live: true
Alert Streaming Status:
Updates: 1
Batch ID: 20
Last Acked Seq ID: 919
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Last Submitted Seq ID: 919
Chart Streaming Status:
Updates: 1
Batch ID: 42
Min Seq ID: 9923
Max Seq ID: 10841
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Sent Min Seq ID: 9923
Sent Max Seq ID: 10841
Acked Min Seq ID: 9923
Acked Max Seq ID: 10841
> Node Instance for mGUID: "xxx hostname "miami"
Claimed ID: xxxx
Node ID: xxxxxxx
Streaming Hops: 1
Relationship: child
Streaming Connection Live: true
Alert Streaming Status:
Updates: 1
Batch ID: 28
Last Acked Seq ID: 1076
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Last Submitted Seq ID: 1076
Chart Streaming Status:
Updates: 1
Batch ID: 27
Min Seq ID: 4537
Max Seq ID: 7462
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Sent Min Seq ID: 4537
Sent Max Seq ID: 7462
Acked Min Seq ID: 4537
Acked Max Seq ID: 7462
> Node Instance for mGUID: "xxx hostname "london"
Claimed ID: xxxx
Node ID: xxxxxxx
Streaming Hops: 1
Relationship: child
Streaming Connection Live: true
Alert Streaming Status:
Updates: 1
Batch ID: 28
Last Acked Seq ID: 1180
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Last Submitted Seq ID: 1180
Chart Streaming Status:
Updates: 1
Batch ID: 49
Min Seq ID: 3425
Max Seq ID: 3955
Pending Min Seq ID: 0
Pending Max Seq ID: 0
Sent Min Seq ID: 3425
Sent Max Seq ID: 3955
Acked Min Seq ID: 3425
Acked Max Seq ID: 3955
root@mediapc:/etc/netdata# cat netdata.conf
# netdata configuration
#
# You can download the latest version of this file, using:
#
# wget -O /etc/netdata/netdata.conf http://localhost:19999/netdata.conf
# or
# curl -o /etc/netdata/netdata.conf http://localhost:19999/netdata.conf
#
# You can uncomment and change any of the options below.
# The value shown in the commented settings, is the default value.
#
[global]
run as user = netdata
dbengine multihost disk space = 284766
# the default database size - 1 hour
history = 3600
update every = 5
apps = no
cgroups = no
[directories]
cache = /netdata/cache
log = /netdata/logs
[plugins]
proc = yes
python.d = yes
charts.d = yes
go.d = yes
idlejitter = no
ebpf = no
timex = no
[logs]
debug log = none
error log = none
access log = none
[plugin:proc]
/proc/softirqs = no
/proc/interrupts = no
/proc/pressure = no
/proc/sys/kernel/random/entropy_avail = no
/proc/net/snmp6 = no
/proc/net/softnet_sta = no
/proc/spl/kstat/zfs/arcstats = no
/proc/spl/kstat/zfs/pool/state = no
/proc/net/rpc/nfsd = no
/proc/net/softnet_stat = no
/proc/net/sockstat6 = no
/proc/net/sockstat = no
/proc/net/ip_vs/stats = no
/proc/net/stat/conntrack = no
/proc/net/stat/synproxy = no
ipc = no
/sys/class/infiniband = no
[plugin:proc:/proc/meminfo]
slab memory = no
[plugin:proc:diskspace]
check for new mount points every = 15
exclude space metrics on paths = /boot/* /run/* /dev/* /proc/* /sys/* /var/run/user/* /run/user/* /snap/* /var/lib/docker/*
[plugin:timex]
clock synchronization state = no
time offset = no
[plugin:proc:/proc/net/sockstat6]
ipv6 TCP sockets = no
ipv6 UDP sockets = no
ipv6 UDPLITE sockets = no
ipv6 RAW sockets = no
ipv6 FRAG sockets = no
[plugin:proc:/proc/net/snmp6]
ipv6 packets = no
ipv6 fragments sent = no
ipv6 fragments assembly = no
ipv6 errors = no
ipv6 UDP packets = no
ipv6 UDP errors = no
ipv6 UDPlite packets = no
ipv6 UDPlite errors = no
bandwidth = no
multicast bandwidth = no
broadcast bandwidth = no
multicast packets = no
icmp = no
icmp redirects = no
icmp errors = no
icmp echos = no
icmp group membership = no
icmp router = no
icmp neighbor = no
icmp mldv2 = no
icmp types = no
ect = no
How many metrics does each child collect? And the parent?
Is 'update every' set to 5 on each child as well?
The calculator in "Change how long Netdata stores metrics" (Learn Netdata) is fairly accurate. That page also explains where to see how many metrics each child collects.
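Both questions matter because the "dbengine multihost disk space" on the parent is shared by the parent and every child it stores, and each node's share grows with its metric count and shrinks as its 'update every' grows. A rough Python sketch of that arithmetic; BYTES_PER_SAMPLE is an illustrative placeholder for the average on-disk cost of one stored sample, not a figure from Netdata, so the official calculator should still be used for real sizing:

```python
from dataclasses import dataclass

# Illustrative assumption: average on-disk bytes per stored sample after compression.
# This placeholder is NOT a value published by Netdata.
BYTES_PER_SAMPLE = 1.0

MIB = 1024 * 1024
SECONDS_PER_DAY = 86_400


@dataclass
class Node:
    name: str
    metrics: int       # number of metrics collected on this node
    update_every: int  # collection interval in seconds


def samples_per_day(nodes):
    """Total samples per day the parent stores for all nodes combined."""
    return sum(n.metrics * SECONDS_PER_DAY / n.update_every for n in nodes)


def required_disk_mib(nodes, retention_days, bytes_per_sample=BYTES_PER_SAMPLE):
    """Multihost disk space (MiB) needed to keep `retention_days` for every node."""
    return samples_per_day(nodes) * retention_days * bytes_per_sample / MIB


def expected_retention_days(nodes, disk_mib, bytes_per_sample=BYTES_PER_SAMPLE):
    """Inverse: days of data that fit into `disk_mib` of multihost disk space."""
    return disk_mib * MIB / (samples_per_day(nodes) * bytes_per_sample)
```

For this setup, the node list would be the parent plus the four children, e.g. Node("mediapc", N, 5), Node("london", N, 5) and so on, with N replaced by each node's real metric count from its dashboard (the thread does not state those counts).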
Hi Christopher,
First, thank you for your prompt response; I really appreciate it.
Yes, 'update every' is set to 5 on all children as well.
Please find attached a screenshot of the dashboard.
It looks like a bug, unless I am missing something. You should only need the following on the parent to get 90 days of retention:
[global]
dbengine multihost disk space = 44495
You have configured considerably more than that. I will try to get more help to debug this.
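To put a number on "you have more": assuming the same metric counts and 'update every' behind the 44495 figure, and that retention grows linearly with disk space, the configured value should allow roughly

$$
\frac{284766\ \text{MiB}}{44495\ \text{MiB}} \approx 6.4,
\qquad
90\ \text{days} \times 6.4 \approx 576\ \text{days},
\qquad
\frac{576\ \text{days}}{4\ \text{days}} = 144
$$

so the observed ~4 days is more than two orders of magnitude short of what the disk budget should hold under those assumptions.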
Thank you, @Christopher_Akritid1, for looking into this.
Please keep me posted, and let me know if you need any logs to troubleshoot this further.
Hi @Vicky_Ingle! Can you please send a full, recent error.log to manolis at netdata dot cloud? Thanks!
Please let me know how I should share the logs with you. The forum's upload option only accepts pictures.
Shyam, June 28, 2022, 7:04am:
Please email the logs to the address Manolis mentioned above. Thanks.
I have emailed him the relevant logs and information.