
Add metrics and reduce log verbosity in iceberg-source #6648

Open
lawofcycles wants to merge 3 commits into opensearch-project:main from lawofcycles:iceberg-source-add-metrics



@lawofcycles (Contributor)

Description

Adds operational metrics to the iceberg-source plugin and reduces log verbosity for high-frequency messages, following the patterns established by the DynamoDB and RDS source plugins. This is a follow-up to #6554.

Metrics added:

| Metric | Type | Description |
|---|---|---|
| changeEventsProcessed | Counter | CDC events written to buffer |
| changeEventsProcessingErrors | Counter | CDC partition processing failures |
| exportRecordsProcessed | Counter | Initial load records written to buffer |
| exportRecordsProcessingErrors | Counter | Initial load processing failures |
| snapshotsProcessed | Counter | Iceberg snapshots fully processed |
| bytesProcessed | DistributionSummary | Data file bytes read |
| carryoverRowsRemoved | DistributionSummary | Carryover rows removed per task |
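The choice between the two meter types affects how the numbers read: a Counter only accumulates, while a DistributionSummary records one sample per event, so carryoverRowsRemoved can report how many tasks ran, the total rows removed, and the largest per-task removal. The sketch below illustrates that difference with hypothetical, stdlib-only stand-ins; SimpleCounter, SimpleSummary and IcebergSourceMetricsSketch are illustrative names, not classes from this PR, and the plugin presumably registers real meters through Data Prepper's Micrometer-backed plugin metrics instead. The summary accessors mirror Micrometer's count(), totalAmount() and max().

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a Micrometer Counter: a monotonically increasing count.
final class SimpleCounter {
    private final AtomicLong count = new AtomicLong();

    void increment() { count.incrementAndGet(); }
    void increment(long amount) { count.addAndGet(amount); }
    long count() { return count.get(); }
}

// Hypothetical stand-in for a Micrometer DistributionSummary: tracks how many
// samples were recorded, their total, and the largest single sample.
final class SimpleSummary {
    private final AtomicLong samples = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void record(long value) {
        samples.incrementAndGet();
        total.addAndGet(value);
        max.accumulateAndGet(value, Math::max);
    }

    long count() { return samples.get(); }
    long totalAmount() { return total.get(); }
    // Reports 0 until the first sample arrives.
    long max() { return samples.get() == 0 ? 0 : max.get(); }
}

// Hypothetical holder grouping the seven metrics listed in the table above.
final class IcebergSourceMetricsSketch {
    final SimpleCounter changeEventsProcessed = new SimpleCounter();
    final SimpleCounter changeEventsProcessingErrors = new SimpleCounter();
    final SimpleCounter exportRecordsProcessed = new SimpleCounter();
    final SimpleCounter exportRecordsProcessingErrors = new SimpleCounter();
    final SimpleCounter snapshotsProcessed = new SimpleCounter();
    final SimpleSummary bytesProcessed = new SimpleSummary();
    final SimpleSummary carryoverRowsRemoved = new SimpleSummary();
}
```

In this sketch, a task that removed zero carryover rows still shows up as a sample in carryoverRowsRemoved.count(), which a plain counter of removed rows could not distinguish from a task that never ran.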

Log level changes:

Per-partition and per-file log messages (e.g. "Processing partition...", "Reading file...", "Carryover removal...") are changed from INFO to DEBUG. INFO is now reserved for lifecycle events (start, stop) and snapshot-level progress in LeaderScheduler.
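The effect of this level split can be sketched with java.util.logging, whose FINE level plays the role of SLF4J's DEBUG here; Data Prepper code typically logs through SLF4J, so this is only an analogy. The hypothetical processSnapshot method logs snapshot-level progress at INFO and each file at FINE, so at an INFO threshold only the two lifecycle lines are emitted no matter how many files a snapshot contains. CapturingHandler and LogLevelSketch are illustrative names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Collects every record that reaches this handler, so a test can see what would be emitted.
final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override public void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }
    @Override public void flush() { }
    @Override public void close() { }
}

final class LogLevelSketch {
    // Hypothetical worker: lifecycle/snapshot progress at INFO, per-file detail at FINE.
    static void processSnapshot(Logger log, long snapshotId, List<String> files) {
        log.info(() -> "Processing snapshot " + snapshotId);
        for (String file : files) {
            // Supplier overload: the message string is only built if FINE is enabled.
            log.fine(() -> "Reading file " + file);
        }
        log.info(() -> "Finished snapshot " + snapshotId);
    }
}
```

The Supplier overloads defer building the message until the level check passes, which is the same reason SLF4J code prefers `{}` placeholders over string concatenation on high-frequency paths.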

Refactoring:

Extracted IcebergDataFileReader from ChangelogWorker to decouple Iceberg file I/O (Parquet/Avro/ORC reading) from the processing logic. Because tests can substitute the file reader, the metrics recording can now be unit-tested.
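The testability argument can be illustrated with constructor injection: if the worker depends on a narrow reader interface rather than decoding Parquet/Avro/ORC itself, a test can pass a fake reader and assert on the metric values. RowReader, PartitionWorkerSketch and their methods are hypothetical names for this sketch; the actual IcebergDataFileReader and ChangelogWorker signatures may differ, and the AtomicLong fields only stand in for the changeEventsProcessed and changeEventsProcessingErrors counters.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical narrow interface playing the role of IcebergDataFileReader:
// the worker only asks for the rows of a file, not how the file format is decoded.
interface RowReader {
    List<String> readRows(String filePath) throws IOException;
}

// Hypothetical worker: both the reader and the buffer sink are injected,
// so tests can substitute fakes for them.
final class PartitionWorkerSketch {
    private final RowReader reader;
    private final Consumer<String> buffer;
    // Stand-ins for the changeEventsProcessed / changeEventsProcessingErrors counters.
    private final AtomicLong eventsProcessed = new AtomicLong();
    private final AtomicLong processingErrors = new AtomicLong();

    PartitionWorkerSketch(RowReader reader, Consumer<String> buffer) {
        this.reader = reader;
        this.buffer = buffer;
    }

    // Processes every file of one partition. Rows are counted as they are written;
    // a read failure stops the partition and is counted once as a partition failure.
    boolean processPartition(List<String> files) {
        try {
            for (String file : files) {
                List<String> rows = reader.readRows(file);
                rows.forEach(buffer);
                eventsProcessed.addAndGet(rows.size());
            }
            return true;
        } catch (IOException e) {
            processingErrors.incrementAndGet();
            return false;
        }
    }

    long eventsProcessedCount() { return eventsProcessed.get(); }
    long processingErrorCount() { return processingErrors.get(); }
}
```

In this shape, a failing partition still counts the rows it wrote before the error; the PR does not say whether the real worker counts partial progress the same way.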

Issues Resolved

Related to #6552

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
  • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sotaro Hikita <bering1814@gmail.com>
…o LeaderScheduler and TaskGrouper

@lawofcycles (Contributor, Author)

I have opened #6682 which adds source-layer shuffle to the iceberg-source plugin. It modifies ChangelogWorker, LeaderScheduler, and TaskGrouper significantly. Once #6682 is merged, this PR will need a rebase to resolve conflicts, particularly around the IcebergDataFileReader extraction and the SHUFFLE_WRITE/READ processing added to ChangelogWorker.
