-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: improve document processing #1879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
ee/features/security/lib/ratelimit.ts (2)
9-13: Policy mismatch: comment vs limiter window (3/hour vs 3 per 20 minutes).Either update the comment or the limiter window to reflect the intended policy.
Apply one of:
- // 3 auth attempts per hour per IP + // 3 auth attempts per 20 minutes per IP - limiter: Ratelimit.slidingWindow(3, "20 m"), + limiter: Ratelimit.slidingWindow(3, "20 m"),or (if 3/hour is intended):
- limiter: Ratelimit.slidingWindow(3, "20 m"), + limiter: Ratelimit.slidingWindow(3, "60 m"),
18-25: Policy mismatch: comment says 5/hour; limiter enforces 3/30m.Align config and comment to avoid surprising enforcement.
If 5/hour is intended:
- // 5 billing operations per hour per IP - limiter: Ratelimit.slidingWindow(3, "30 m"), + // 5 billing operations per hour per IP + limiter: Ratelimit.slidingWindow(5, "60 m"),If 3/30m is intended, adjust the comment:
- // 5 billing operations per hour per IP + // 3 billing operations per 30 minutes per IPpages/api/mupdf/convert-page.ts (3)
12-15: Outdated comment vs config.Comment says “maximum of 120 seconds” but
maxDurationis 180.-// This function can run for a maximum of 120 seconds +// This function can run for a maximum of 180 seconds
147-182: Keyword check works; consider case-insensitive matching and lighter disclosure.
- Matching is case-sensitive; normalize to reduce misses.
- Consider returning/logging only the hostname to reduce sensitive URL leakage.
Minimal changes:
- const keywords = await get("keywords"); + const keywords = await get<string[]>("keywords"); @@ - for (const link of embeddedLinks) { + for (const link of embeddedLinks) { if (link.href) { - const matchedKeyword = keywords.find( - (keyword) => - typeof keyword === "string" && link.href.includes(keyword), - ); + const hrefLower = link.href.toLowerCase(); + const matchedKeyword = keywords.find( + (keyword) => + typeof keyword === "string" && + hrefLower.includes(keyword.toLowerCase()), + ); @@ - await log({ + const hostname = (() => { try { return new URL(link.href).hostname } catch { return undefined }})() + await log({ - message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``, + message: `Document processing blocked: ${matchedKeyword}${hostname ? ` (host: ${hostname})` : ""} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``, type: "error", mention: true, }); res.status(400).json({ error: "Document processing blocked", - matchedUrl: link.href, + matchedHostname: hostname ?? null, matchedKeyword: matchedKeyword, pageNumber: pageNumber, });Optional: if you need full URLs internally, keep logging the full value but mask query strings.
293-297: Free the PDF document object as well.You destroy the page and pixmap; also destroy the
docto reduce memory pressure for large PDFs.scaledPixmap.destroy(); // free memory page.destroy(); // free memory + doc.destroy?.(); // free memory (guard if method not present)lib/trigger/pdf-to-image-route.ts (1)
14-16: Trigger.dev compliance check: OK; consider adding an example payload.Using
taskfrom@trigger.dev/sdk/v3and exporting the task meets guidelines. Add a sample payload to docs/tests for easier triggering.Example payload:
{ "documentId": "doc_123", "documentVersionId": "dv_456", "teamId": "team_789", "versionNumber": 2 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ee/features/security/lib/ratelimit.ts(2 hunks)lib/trigger/pdf-to-image-route.ts(1 hunks)pages/api/mupdf/convert-page.ts(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/trigger/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/rule-trigger-typescript.mdc)
**/trigger/**/*.ts: You MUST use@trigger.dev/sdk/v3when implementing Trigger.dev tasks.
You MUST NEVER useclient.defineJobin Trigger.dev task files, as it is deprecated and will break the application.
You MUSTexportevery task, including subtasks, in Trigger.dev task files.
If you are able to generate an example payload for a task, do so.
When implementing a Trigger.dev task, always use thetaskfunction from@trigger.dev/sdk/v3and follow the correct pattern for task definition.
When implementing scheduled (cron) tasks, useschedules.taskfrom@trigger.dev/sdk/v3and follow the correct pattern for schedule definition.
When implementing schema-validated tasks, useschemaTaskfrom@trigger.dev/sdk/v3and provide a schema using Zod or another supported library.
When triggering tasks from backend code, use thetasks.trigger,tasks.batchTrigger, ortasks.triggerAndPollmethods from@trigger.dev/sdk/v3and use type-only imports for type safety.
When triggering a task from inside another task, use the correct methods (trigger,batchTrigger,triggerAndWait,batchTriggerAndWait) on the task instance as shown in the guide.
When using metadata in tasks, use themetadataAPI from@trigger.dev/sdk/v3only inside run functions or task lifecycle hooks.
When using idempotency, use theidempotencyKeysAPI from@trigger.dev/sdk/v3and provide anidempotencyKeywhen triggering tasks.
When logging inside tasks, use theloggerAPI from@trigger.dev/sdk/v3and provide a message and a key-value object.
Files:
lib/trigger/pdf-to-image-route.ts
🧬 Code graph analysis (2)
lib/trigger/pdf-to-image-route.ts (1)
lib/utils/generate-trigger-status.ts (1)
updateStatus(20-24)
pages/api/mupdf/convert-page.ts (1)
lib/utils.ts (1)
log(64-124)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Vercel Agent Review
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
lib/trigger/pdf-to-image-route.ts (1)
136-153: Smart blocked‑document handling and safe error parsing.Good defensive parse and explicit halt on blocked responses; status updates are clear.
- Confirm
pages/api/mupdf/convert-pagereturns{ error: "Document processing blocked", matchedUrl|matchedHostname, matchedKeyword }as implemented so the logger fields here are always defined.- Optional: set a distinct status (e.g., “Blocked by content policy”) to distinguish from generic failures.
ee/features/security/lib/ratelimit.ts (1)
14-16: ConfirmenableProtectionsupport in the installed@upstash/ratelimitversion.package.json declares @upstash/ratelimit: ^2.0.6 — Upstash added
enableProtectionin ratelimit v1.2.1, so the declared range should include it. (upstash.com)Repo run did not find a resolved lockfile entry or vendored types; verify the actual installed/resolved version and TypeScript types in your environment to avoid compile-time errors. Quick checks:
- Check lockfile: rg -n "@upstash/ratelimit" -g "pnpm-lock.yaml|package-lock.json|yarn.lock"
- Check installed package / types: npm ls @upstash/ratelimit || rg -n "enableProtection" node_modules/@upstash/ratelimit -S
pages/api/mupdf/convert-page.ts (1)
4-6:waitUntilrequires Edge runtime; this route runs on the Node runtime.Using
waitUntilhere can be a no-op or throw at runtime. Prefer awaiting the log before responding, and remove the import.Apply this diff:
-import { waitUntil } from "@vercel/functions";And in the blocked branch below:
- waitUntil( - log({ - message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``, - type: "error", - mention: true, - }), - ); + await log({ + message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``, + type: "error", + mention: true, + });Likely an incorrect or invalid review comment.
Summary by CodeRabbit
New Features
Bug Fixes
Chores