Conversation

@mfts
Owner

@mfts mfts commented Sep 21, 2025

Summary by CodeRabbit

  • New Features

    • PDF processing now blocks documents containing restricted links, with clear error feedback.
  • Bug Fixes

    • Improved PDF-to-image error handling: clearer blocked-document status, safer error parsing, and more reliable failure reporting.
  • Chores

    • Strengthened rate limiting protection for authentication and billing operations.

@vercel

vercel bot commented Sep 21, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
papermark | Ready | Preview | Comment | Sep 21, 2025 1:19pm

@coderabbitai
Contributor

coderabbitai bot commented Sep 21, 2025

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title "feat: improve document processing" concisely captures the primary intent of the changeset—improvements to PDF/document processing (link-blocking in convert-page and enhanced error handling in pdf-to-image-route) and related protections—so it is directly related to the modified files. It is short, single-line, and not misleading, though somewhat broad.
Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
ee/features/security/lib/ratelimit.ts (2)

9-13: Policy mismatch: comment vs limiter window (3/hour vs 3 per 20 minutes).

Either update the comment or the limiter window to reflect the intended policy.

Apply one of:

-  // 3 auth attempts per hour per IP
+  // 3 auth attempts per 20 minutes per IP

or (if 3/hour is intended):

-    limiter: Ratelimit.slidingWindow(3, "20 m"),
+    limiter: Ratelimit.slidingWindow(3, "60 m"),

18-25: Policy mismatch: comment says 5/hour; limiter enforces 3/30m.

Align config and comment to avoid surprising enforcement.

If 5/hour is intended:

-  // 5 billing operations per hour per IP
-    limiter: Ratelimit.slidingWindow(3, "30 m"),
+  // 5 billing operations per hour per IP
+    limiter: Ratelimit.slidingWindow(5, "60 m"),

If 3/30m is intended, adjust the comment:

-  // 5 billing operations per hour per IP
+  // 3 billing operations per 30 minutes per IP
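
To make the difference between the two policies concrete, here is a minimal in-memory sketch of a sliding-window limiter. It is a simplified "sliding log" variant written for illustration and is not the algorithm or API of @upstash/ratelimit; `createSlidingWindowLimiter` and the other names are hypothetical.

```typescript
// Illustrative only: a simplified in-memory sliding-log limiter.
// This is NOT how @upstash/ratelimit is implemented; all names are hypothetical.
type Decision = { success: boolean; remaining: number };

function createSlidingWindowLimiter(limit: number, windowMs: number) {
  // Timestamps of accepted requests, keyed by identifier (for example an IP).
  const hits = new Map<string, number[]>();

  return function check(key: string, nowMs: number): Decision {
    // Keep only the accepted requests that still fall inside the window.
    const recent = (hits.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    const success = recent.length < limit;
    if (success) recent.push(nowMs);
    hits.set(key, recent);
    return { success, remaining: Math.max(0, limit - recent.length) };
  };
}

const MINUTE = 60 * 1000;
const perTwentyMinutes = createSlidingWindowLimiter(3, 20 * MINUTE);
const perHour = createSlidingWindowLimiter(3, 60 * MINUTE);

// Three attempts at t = 0 exhaust both limiters.
for (let i = 0; i < 3; i++) {
  perTwentyMinutes("1.2.3.4", 0);
  perHour("1.2.3.4", 0);
}

// A fourth attempt 25 minutes later is allowed under "3 per 20 minutes"
// but still rejected under "3 per hour".
const retryShortWindow = perTwentyMinutes("1.2.3.4", 25 * MINUTE);
const retryHourWindow = perHour("1.2.3.4", 25 * MINUTE);
console.log(retryShortWindow.success, retryHourWindow.success);
```

The same user is unblocked after 20 minutes under one policy and after 60 under the other, which is why the comment and the limiter window should agree.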
pages/api/mupdf/convert-page.ts (3)

12-15: Outdated comment vs config.

Comment says “maximum of 120 seconds” but maxDuration is 180.

-// This function can run for a maximum of 120 seconds
+// This function can run for a maximum of 180 seconds

147-182: Keyword check works; consider case-insensitive matching and lighter disclosure.

  • Matching is case-sensitive; normalize to reduce misses.
  • Consider returning/logging only the hostname to reduce sensitive URL leakage.

Minimal changes:

-        const keywords = await get("keywords");
+        const keywords = await get<string[]>("keywords");
@@
-          for (const link of embeddedLinks) {
+          for (const link of embeddedLinks) {
             if (link.href) {
-              const matchedKeyword = keywords.find(
-                (keyword) =>
-                  typeof keyword === "string" && link.href.includes(keyword),
-              );
+              const hrefLower = link.href.toLowerCase();
+              const matchedKeyword = keywords.find(
+                (keyword) =>
+                  typeof keyword === "string" &&
+                  hrefLower.includes(keyword.toLowerCase()),
+              );
@@
-                await log({
+                const hostname = (() => { try { return new URL(link.href).hostname } catch { return undefined }})()
+                await log({
-                  message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``,
+                  message: `Document processing blocked: ${matchedKeyword}${hostname ? ` (host: ${hostname})` : ""} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``,
                   type: "error",
                   mention: true,
                 });
                 res.status(400).json({
                   error: "Document processing blocked",
-                  matchedUrl: link.href,
+                  matchedHostname: hostname ?? null,
                   matchedKeyword: matchedKeyword,
                   pageNumber: pageNumber,
                 });
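
The case-insensitive matching and hostname-only disclosure suggested above can also be sketched as standalone helpers. This is a rough sketch under assumptions: `safeHostname` and `findBlockedKeyword` are hypothetical names, not functions from convert-page.ts, and the empty-keyword guard is an addition not present in the diff.

```typescript
// Hypothetical helpers; not taken from pages/api/mupdf/convert-page.ts.
type LinkMatch = { keyword: string; hostname: string | null };

// Returns the hostname of a URL, or null when the string cannot be parsed.
function safeHostname(href: string): string | null {
  try {
    return new URL(href).hostname;
  } catch {
    return null;
  }
}

// Finds the first keyword contained in href, ignoring case.
// Non-string entries are skipped; empty strings are skipped too (an added
// guard, since "" is a substring of every URL and would block everything).
function findBlockedKeyword(href: string, keywords: unknown[]): LinkMatch | null {
  const hrefLower = href.toLowerCase();
  const keyword = keywords.find(
    (k): k is string =>
      typeof k === "string" && k.length > 0 && hrefLower.includes(k.toLowerCase()),
  );
  return keyword === undefined ? null : { keyword, hostname: safeHostname(href) };
}

const match = findBlockedKeyword("https://Evil.Example.com/Login?token=abc", ["example.com"]);
const noMatch = findBlockedKeyword("https://papermark.io/docs", ["example.com", 42]);
const unparsable = findBlockedKeyword("not a url with EXAMPLE.COM", ["example.com"]);
```

Returning only the keyword and hostname keeps tokens in the path or query string out of logs and API responses.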

Optional: if you need full URLs internally, keep logging the full value but mask query strings.
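
One possible way to mask query strings before logging, assuming the standard WHATWG `URL` class is available (it is in Node.js and browsers). `maskQuery` is a hypothetical helper; dropping the fragment as well is an extra precaution beyond what the comment asks for.

```typescript
// Hypothetical helper: strips the query string (and fragment) from a URL
// so that tokens or signed parameters do not end up in logs.
function maskQuery(href: string): string {
  try {
    const url = new URL(href);
    url.search = ""; // removes "?key=value..."
    url.hash = ""; // removes "#fragment" (assumption: fragments may also carry secrets)
    return url.toString();
  } catch {
    // Do not echo strings that are not parseable as URLs.
    return "[unparsable URL]";
  }
}

const masked = maskQuery("https://example.com/path?token=secret#frag");
const maskedInvalid = maskQuery("definitely not a url");
```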


293-297: Free the PDF document object as well.

You destroy the page and pixmap; also destroy the doc to reduce memory pressure for large PDFs.

     scaledPixmap.destroy(); // free memory
     page.destroy(); // free memory
+    doc.destroy?.(); // free memory (guard if method not present)
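
If early returns or thrown errors can skip the cleanup lines, a try/finally pattern guarantees that every acquired object is released. The sketch below uses fake resources and a hypothetical `CleanupStack`; it assumes only that each object exposes a `destroy()` method and does not depend on the mupdf API.

```typescript
// Hypothetical minimal shape; real mupdf objects may expose more than this.
interface Destroyable {
  destroy(): void;
}

// Tracks resources and destroys them in reverse order of acquisition.
class CleanupStack {
  private readonly items: Destroyable[] = [];

  track<T extends Destroyable>(resource: T): T {
    this.items.push(resource);
    return resource;
  }

  destroyAll(): void {
    let resource = this.items.pop();
    while (resource !== undefined) {
      resource.destroy();
      resource = this.items.pop();
    }
  }
}

// Fake resources that record when they are destroyed.
const destroyed: string[] = [];
const fakeResource = (name: string) => ({
  name,
  destroy: (): void => {
    destroyed.push(name);
  },
});

function renderPage(shouldFail: boolean): string {
  const stack = new CleanupStack();
  try {
    const doc = stack.track(fakeResource("doc"));
    const page = stack.track(fakeResource("page"));
    if (shouldFail) throw new Error("conversion failed");
    const pixmap = stack.track(fakeResource("pixmap"));
    return `${doc.name}/${page.name}/${pixmap.name}`;
  } finally {
    // Runs on success and on failure alike.
    stack.destroyAll();
  }
}

const rendered = renderPage(false);
let failure = "";
try {
  renderPage(true);
} catch (e) {
  failure = e instanceof Error ? e.message : String(e);
}
```

On the failing call the document and page are still destroyed, even though the pixmap was never created.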
lib/trigger/pdf-to-image-route.ts (1)

14-16: Trigger.dev compliance check: OK; consider adding an example payload.

Using task from @trigger.dev/sdk/v3 and exporting the task meets guidelines. Add a sample payload to docs/tests for easier triggering.

Example payload:

{
  "documentId": "doc_123",
  "documentVersionId": "dv_456",
  "teamId": "team_789",
  "versionNumber": 2
}
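
A payload like this could be checked at runtime before triggering the task. The guard below is a sketch: the field names come from the example payload, while the types and the choice to treat `versionNumber` as optional are assumptions rather than the task's actual contract.

```typescript
// Field names follow the example payload; the types are assumptions.
type ConvertPdfPayload = {
  documentId: string;
  documentVersionId: string;
  teamId: string;
  versionNumber?: number; // assumed optional
};

// Runtime type guard for untrusted input (for example a parsed JSON body).
function isConvertPdfPayload(value: unknown): value is ConvertPdfPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.documentId === "string" &&
    typeof v.documentVersionId === "string" &&
    typeof v.teamId === "string" &&
    (v.versionNumber === undefined || Number.isInteger(v.versionNumber))
  );
}

const samplePayload: unknown = JSON.parse(
  '{"documentId":"doc_123","documentVersionId":"dv_456","teamId":"team_789","versionNumber":2}',
);
const sampleIsValid = isConvertPdfPayload(samplePayload);
const missingTeamIsValid = isConvertPdfPayload({ documentId: "doc_123", documentVersionId: "dv_456" });
```

The guidelines quoted in this review also mention `schemaTask` with a Zod schema, which would be the more idiomatic place for this validation in a Trigger.dev task.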
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02f303c and 15aa25a.

📒 Files selected for processing (3)
  • ee/features/security/lib/ratelimit.ts (2 hunks)
  • lib/trigger/pdf-to-image-route.ts (1 hunks)
  • pages/api/mupdf/convert-page.ts (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/trigger/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/rule-trigger-typescript.mdc)

**/trigger/**/*.ts: You MUST use @trigger.dev/sdk/v3 when implementing Trigger.dev tasks.
You MUST NEVER use client.defineJob in Trigger.dev task files, as it is deprecated and will break the application.
You MUST export every task, including subtasks, in Trigger.dev task files.
If you are able to generate an example payload for a task, do so.
When implementing a Trigger.dev task, always use the task function from @trigger.dev/sdk/v3 and follow the correct pattern for task definition.
When implementing scheduled (cron) tasks, use schedules.task from @trigger.dev/sdk/v3 and follow the correct pattern for schedule definition.
When implementing schema-validated tasks, use schemaTask from @trigger.dev/sdk/v3 and provide a schema using Zod or another supported library.
When triggering tasks from backend code, use the tasks.trigger, tasks.batchTrigger, or tasks.triggerAndPoll methods from @trigger.dev/sdk/v3 and use type-only imports for type safety.
When triggering a task from inside another task, use the correct methods (trigger, batchTrigger, triggerAndWait, batchTriggerAndWait) on the task instance as shown in the guide.
When using metadata in tasks, use the metadata API from @trigger.dev/sdk/v3 only inside run functions or task lifecycle hooks.
When using idempotency, use the idempotencyKeys API from @trigger.dev/sdk/v3 and provide an idempotencyKey when triggering tasks.
When logging inside tasks, use the logger API from @trigger.dev/sdk/v3 and provide a message and a key-value object.

Files:

  • lib/trigger/pdf-to-image-route.ts
🧬 Code graph analysis (2)
lib/trigger/pdf-to-image-route.ts (1)
lib/utils/generate-trigger-status.ts (1)
  • updateStatus (20-24)
pages/api/mupdf/convert-page.ts (1)
lib/utils.ts (1)
  • log (64-124)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Vercel Agent Review
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
lib/trigger/pdf-to-image-route.ts (1)

136-153: Smart blocked‑document handling and safe error parsing.

Good defensive parse and explicit halt on blocked responses; status updates are clear.

  • Confirm pages/api/mupdf/convert-page returns { error: "Document processing blocked", matchedUrl|matchedHostname, matchedKeyword } as implemented so the logger fields here are always defined.
  • Optional: set a distinct status (e.g., “Blocked by content policy”) to distinguish from generic failures.
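
For reference, defensive parsing of a failed convert-page response might look like the following. This is a sketch, not the code in pdf-to-image-route.ts: `parseConvertError` and `ConvertError` are hypothetical names, and the response shape is taken from the error body shown earlier in this review.

```typescript
// Hypothetical result type for a failed conversion request.
type ConvertError = { blocked: boolean; message: string; matchedKeyword?: string };

// Parses an error response body without assuming it is valid JSON.
function parseConvertError(bodyText: string, httpStatus: number): ConvertError {
  const fallback = `Conversion failed with status ${httpStatus}`;
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed !== "object" || parsed === null) {
      return { blocked: false, message: fallback };
    }
    const body = parsed as Record<string, unknown>;
    const message = typeof body.error === "string" ? body.error : fallback;
    return {
      // The blocked branch is identified by its error string.
      blocked: message === "Document processing blocked",
      message,
      matchedKeyword: typeof body.matchedKeyword === "string" ? body.matchedKeyword : undefined,
    };
  } catch {
    // Non-JSON bodies (HTML error pages, empty responses) end up here.
    return { blocked: false, message: fallback };
  }
}

const blockedResult = parseConvertError(
  '{"error":"Document processing blocked","matchedKeyword":"example.com","pageNumber":3}',
  400,
);
const gatewayResult = parseConvertError("<html>Gateway Timeout</html>", 504);
```

A caller can then stop processing the remaining pages when `blocked` is true and report a distinct status, while treating everything else as a generic failure.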
ee/features/security/lib/ratelimit.ts (1)

14-16: Confirm enableProtection support in the installed @upstash/ratelimit version.

package.json declares @upstash/ratelimit: ^2.0.6 — Upstash added enableProtection in ratelimit v1.2.1, so the declared range should include it. (upstash.com)

Repo run did not find a resolved lockfile entry or vendored types; verify the actual installed/resolved version and TypeScript types in your environment to avoid compile-time errors. Quick checks:

  • Check lockfile: rg -n "@upstash/ratelimit" -g "{pnpm-lock.yaml,package-lock.json,yarn.lock}"
  • Check installed package / types: npm ls @upstash/ratelimit || rg -n "enableProtection" node_modules/@upstash/ratelimit -S
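
The range reasoning above can be spelled out: the lowest version that a caret range `^2.0.6` can resolve to is 2.0.6, so it is enough to confirm that 2.0.6 is not older than the version the comment cites for `enableProtection` (1.2.1, taken from the comment and not verified here). The helper below is a hypothetical sketch that handles plain MAJOR.MINOR.PATCH strings only, with no pre-release or build tags.

```typescript
// Compares two plain "MAJOR.MINOR.PATCH" strings.
// Returns -1, 0, or 1. Pre-release and build metadata are not supported.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

const lowestResolvable = "2.0.6"; // lower bound of the declared range ^2.0.6
const featureIntroducedIn = "1.2.1"; // version cited in the comment above
const rangeIncludesFeature = compareVersions(lowestResolvable, featureIntroducedIn) >= 0;
```

This only covers the declared range; the lockfile check is still needed to see which version is actually installed.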
pages/api/mupdf/convert-page.ts (1)

4-6: waitUntil requires Edge runtime; this route runs on the Node runtime.

Using waitUntil here can be a no-op or throw at runtime. Prefer awaiting the log before responding, and remove the import.

Apply this diff:

-import { waitUntil } from "@vercel/functions";

And in the blocked branch below:

-                waitUntil(
-                  log({
-                    message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``,
-                    type: "error",
-                    mention: true,
-                  }),
-                );
+                await log({
+                  message: `Document processing blocked: ${matchedKeyword} \n\n \`Metadata: {teamId: ${teamId}, documentVersionId: ${documentVersionId}, pageNumber: ${pageNumber}}\``,
+                  type: "error",
+                  mention: true,
+                });

Likely an incorrect or invalid review comment.

@mfts mfts merged commit 240f3b8 into main Sep 21, 2025
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 21, 2025
@mfts mfts deleted the feat/index-improvements branch November 19, 2025 11:46