Collaborator

hpatro commented Apr 26, 2025

This PR logs the CLUSTER INFO / CLUSTER NODES output to the log file every 5 seconds when the log level is verbose or debug.

At times a few nodes have not converged with the rest of the cluster, and no logs capture the divergence. This logging could help us analyze such cases in test setups, where we can afford to log cluster information more aggressively.
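
The gating described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the log level constants, `CLUSTER_STATE_LOG_PERIOD_MS`, and `cluster_state_log_due` are hypothetical names, and the levels are assumed to be ordered from most to least verbose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log levels, assumed to be ordered from most to least verbose. */
enum { LOG_DEBUG = 0, LOG_VERBOSE = 1, LOG_NOTICE = 2, LOG_WARNING = 3 };

/* The 5 second period mentioned in the description (hypothetical constant). */
#define CLUSTER_STATE_LOG_PERIOD_MS 5000

/* Returns true when a periodic dump is due: the configured verbosity must be
 * verbose or debug, and at least one full period must have elapsed since the
 * previous dump. Updates *last_log_ms only when it returns true. */
bool cluster_state_log_due(int verbosity, int64_t now_ms, int64_t *last_log_ms) {
    if (verbosity > LOG_VERBOSE) return false;
    if (now_ms - *last_log_ms < CLUSTER_STATE_LOG_PERIOD_MS) return false;
    *last_log_ms = now_ms;
    return true;
}
```

A caller running on a periodic timer would check this helper on each tick and emit the cluster state only when it returns true.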

@hpatro hpatro requested review from enjoy-binbin and madolson April 26, 2025 07:14
codecov bot commented Apr 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.99%. Comparing base (0b94ca6) to head (4e7f83c).
Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2011      +/-   ##
============================================
- Coverage     71.01%   70.99%   -0.03%     
============================================
  Files           123      123              
  Lines         66033    66125      +92     
============================================
+ Hits          46892    46944      +52     
- Misses        19141    19181      +40     
Files with missing lines Coverage Δ
src/cluster_legacy.c 86.19% <100.00%> (+0.10%) ⬆️

... and 22 files with indirect coverage changes

sds cluster_info = genClusterInfoString();
sds cluster_nodes = clusterGenNodesDescription(NULL, 0, 0);

sds infostring = sdscatprintf(sdsempty(), "\r\n# Cluster info\r\n");
Member

Newlines break log parsing. If someone turns this on for some reason, the output should still consist of "valid" log lines. I'm OK with logging the state, but I don't think it should be the info fields verbatim.
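
One way to keep such output on a single line is to collapse the line breaks before logging. The sketch below is only an illustration of that idea under stated assumptions: `flatten_for_log` is a hypothetical helper, not an existing function in the codebase, and it works on plain C strings rather than `sds`.

```c
#include <stddef.h>

/* Copies src into dst, collapsing every run of '\r' and '\n' characters into
 * a single separator so the result fits on one log line. Leading and trailing
 * line breaks are dropped. The output is truncated if dst is too small and is
 * always NUL-terminated when dstlen > 0. Returns the number of bytes written,
 * excluding the terminating NUL. */
size_t flatten_for_log(const char *src, char *dst, size_t dstlen, char sep) {
    size_t n = 0;
    int pending_sep = 0;
    if (dstlen == 0) return 0;
    for (; *src != '\0'; src++) {
        if (*src == '\r' || *src == '\n') {
            /* Remember the break, but only if some text precedes it. */
            if (n > 0) pending_sep = 1;
            continue;
        }
        /* Reserve room for the separator (if any), this byte, and the NUL. */
        size_t need = pending_sep ? 2 : 1;
        if (n + need >= dstlen) break;
        if (pending_sep) {
            dst[n++] = sep;
            pending_sep = 0;
        }
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

With a space as the separator, an input such as `"\r\n# Cluster info\r\ncluster_state:ok\r\n"` becomes `# Cluster info cluster_state:ok`.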

Collaborator Author

Interesting. So the crash report is an exception in being allowed to contain newlines?

Collaborator Author

@madolson and I discussed this offline. We agreed that we can emit a single log line with information about the failed nodes rather than all the nodes.
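
As a rough sketch of what such a single line could look like, the helper below formats only the failed nodes into one buffer. The `node_view` struct, the `format_failed_nodes` function, and the `failed_nodes=<count> ids=<a>,<b>` layout are all assumptions made for illustration; they are not the cluster's real node type or the final log format.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of a cluster node for illustration only. */
typedef struct {
    const char *name; /* node identifier */
    int failed;       /* non-zero if the node is considered failed */
} node_view;

/* Writes "failed_nodes=<count>" followed by " ids=<a>,<b>,..." when at least
 * one node is failed. The result contains no line breaks. Returns the length
 * written (excluding the NUL), or -1 if buf is too small, in which case buf
 * may hold a truncated prefix. */
int format_failed_nodes(const node_view *nodes, size_t count, char *buf, size_t buflen) {
    size_t failed = 0;
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].failed) failed++;
    }
    int len = snprintf(buf, buflen, "failed_nodes=%zu", failed);
    if (len < 0 || (size_t)len >= buflen) return -1;
    size_t used = (size_t)len;
    int first = 1;
    for (size_t i = 0; i < count; i++) {
        if (!nodes[i].failed) continue;
        len = snprintf(buf + used, buflen - used, "%s%s", first ? " ids=" : ",", nodes[i].name);
        if (len < 0 || (size_t)len >= buflen - used) return -1;
        used += (size_t)len;
        first = 0;
    }
    return (int)used;
}
```

Keeping the count ahead of the identifiers lets a log parser read the number of failed nodes without splitting the list.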

@enjoy-binbin
Member

Is the main purpose debugging? I.e., someone finds that the cluster is not behaving normally, adjusts the loglevel to verbose, and catches it?

@hpatro hpatro changed the title from "Log cluster state periodically to capture transient state for debuggability" to "Log failed cluster node(s) state periodically to capture transient state for debuggability" Jun 16, 2025
Collaborator Author

hpatro commented Jun 16, 2025

> Is the main purpose debugging? I.e., someone finds that the cluster is not behaving normally, adjusts the loglevel to verbose, and catches it?

Yes. Even when investigating an incident that occurred in the past, it is quite difficult for operators to figure out the issue with the current state of logging. I would like this to be active at the NOTICE level, limited to the failed-node information that is actually relevant: #2011 (comment)

@hpatro hpatro mentioned this pull request Jun 27, 2025
15 tasks