Skip to content

feat(api): add scheduled cleanup task for dataset_queries#35734

Open
Echo0ff wants to merge 3 commits intolanggenius:mainfrom
Echo0ff:feat/clean-dataset-queries-task
Open

feat(api): add scheduled cleanup task for dataset_queries#35734
Echo0ff wants to merge 3 commits intolanggenius:mainfrom
Echo0ff:feat/clean-dataset-queries-task

Conversation

@Echo0ff
Copy link
Copy Markdown
Contributor

@Echo0ff Echo0ff commented Apr 30, 2026

Summary

Fixes #35733

The dataset_queries table grows without bound because every RAG retrieval and hit-testing operation inserts a row. There is currently no periodic cleanup task for this table — the only deletions happen when an entire dataset is removed via clean_dataset_task.

This PR adds a configurable Celery Beat task (clean_dataset_queries_task) that deletes rows older than a retention period in batches.

Changes

  1. api/schedule/clean_dataset_queries_task.py — New task: Redis lock + batch deletion with automatic retention clamping and warning logging when the configured retention falls below the safe threshold
  2. api/configs/feature/__init__.py — 4 new config fields in CeleryScheduleTasksConfig:
    • ENABLE_CLEAN_DATASET_QUERIES_TASK (default: False)
    • CLEAN_DATASET_QUERIES_RETENTION_DAYS (default: 60)
    • CLEAN_DATASET_QUERIES_BATCH_SIZE (default: 500)
    • CLEAN_DATASET_QUERIES_LOCK_TTL (default: 3600)
  3. api/extensions/ext_celery.py — Register in beat_schedule, hour=5 to avoid collision with existing tasks at 0/2/3/4
  4. api/models/dataset.py — Add created_at index declaration to DatasetQuery.__table_args__
  5. api/migrations/versions/2026_04_30_1600-67b5709d7d0a_add_dataset_queries_created_at_idx.py — Alembic migration
  6. api/tests/unit_tests/schedule/test_clean_dataset_queries_task.py — 4 unit tests

Key constraint

clean_unused_datasets_task reads DatasetQuery.created_at to determine whether a dataset has been queried recently (threshold = PLAN_SANDBOX_CLEAN_DAY_SETTING, default 30 days). The new task uses a default retention of 60 days (> 30). If a user manually sets retention below 30, the task clamps it and logs a warning.

Test plan

  • make lint passes
  • basedpyright api/schedule/clean_dataset_queries_task.py — 0 errors
  • 4 unit tests all pass
  • Manual verification: set ENABLE_CLEAN_DATASET_QUERIES_TASK=true, start celery beat, confirm the task is scheduled and deletes in batches

🤖 Generated with Claude Code

The dataset_queries table grows without bound because every RAG retrieval
and hit-test inserts a row. This adds a configurable Celery Beat task
(clean_dataset_queries_task) that deletes rows older than a retention
period (default 60 days) in batches, gated by ENABLE_CLEAN_DATASET_QUERIES_TASK.

Retention is clamped to max(config, PLAN_SANDBOX_CLEAN_DAY_SETTING) to
avoid breaking clean_unused_datasets_task which reads DatasetQuery.created_at.

Also adds a created_at index on dataset_queries via alembic migration
to keep the delete scan performant as the table grows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Echo0ff Echo0ff requested a review from a team April 30, 2026 08:30
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. 💪 enhancement New feature or request labels Apr 30, 2026
@Qodo-Free-For-OSS
Copy link
Copy Markdown

Hi, When the Redis lock is already held, the cleanup task prints a 'skipped' message but then re-raises LockError, marking the run as failed. This creates misleading failures/alerts for normal lock contention.

Severity: remediation recommended | Category: reliability

How to fix: Return on lock contention

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Lock contention is expected for scheduled tasks; re-raising LockError causes false task failures.

Issue Context

The task already uses blocking=False and prints a skip message, so it should exit cleanly.

Fix Focus Areas

  • api/schedule/clean_dataset_queries_task.py[57-108]

Recommended fix

  • In the except LockError: block, log at info/warning (not exception) and return instead of raise.
  • Keep exception logging/raising for unexpected exceptions only.

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo code review

echooffx and others added 2 commits May 7, 2026 14:52
…task

Re-raise LockError after printing a skip message caused false task
failures for normal lock contention. Return instead to exit cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Echo0ff
Copy link
Copy Markdown
Contributor Author

Echo0ff commented May 7, 2026

Thanks for the review! Fixed the LockError re-raise in ca8a742.

Regarding "We noticed a couple of other issues" — could you please share those as well? Happy to fix them in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💪 enhancement New feature or request size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dataset_queries table grows without bound — no periodic cleanup task

2 participants