Skip to content

Conversation

@mfts
Copy link
Owner

@mfts mfts commented Aug 23, 2025

Summary by CodeRabbit

  • New Features

    • Stronger URL and file-path validation to block unsafe inputs (SSRF, traversal, double-encoding).
    • Server-side validation for document uploads across relevant endpoints with clear error responses.
    • Webhook file URLs now require HTTPS and run enhanced security checks.
    • Pre-upload content-type check for webhook files that logs informative warnings on mismatches.
  • Bug Fixes

    • Fewer failed uploads and processing errors due to stricter input validation.
  • Chores

    • Pinned Trigger.dev CLI scripts to v3 for consistent deployments.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 23, 2025

Walkthrough

Adds a new URL/path validation module with SSRF/path-security checks and Zod schemas, applies server-side document payload validation across multiple document APIs and webhook ingestion, adds a content-type precheck for fetched webhook files, and pins two npm scripts to Trigger.dev CLI v3.

Changes

Cohort / File(s) Summary
Security & URL/Path Validation Module
lib/zod/url-validation.ts
New module: path traversal and double-encoding checks, SSRF protection, combined URL security validator, filePathSchema, documentUploadSchema, and webhookFileUrlSchema. Enforces storage-type/URL consistency and MIME/type checks; recognizes Notion hosts and optional VERCEL_BLOB_HOST.
Document API Validation
pages/api/teams/[teamId]/documents/index.ts, pages/api/teams/[teamId]/documents/agreement.ts, pages/api/teams/[teamId]/documents/[id]/versions/index.ts
Introduces Zod-based validation using documentUploadSchema.safeParse; returns 400 on failure with logged errors; on success consumes validationResult.data for downstream processing (no change to core creation/versioning logic).
Webhook Ingestion Hardening
pages/api/webhooks/services/[...path]/index.ts
Replaces generic URL validation with webhookFileUrlSchema (HTTPS + SSRF/path checks). Adds non-fatal response content-type precheck after fetching the file (logs warning on mismatch) before buffering/uploading; updates step annotations.
Tooling Scripts
package.json
Pins Trigger.dev CLI to v3 in scripts: npx trigger.dev@3 dev and npx trigger.dev@3 deploy (replacing @latest).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as API: Team Documents
  participant Validator as Zod Schemas
  participant Store as Storage/DB

  Client->>API: POST /teams/:teamId/documents (body)
  API->>Validator: documentUploadSchema.safeParse(body)
  alt Invalid payload
    API-->>Client: 400 Bad Request (validation errors)
  else Valid payload
    API->>Store: Create/Process Document (using validated data)
    Store-->>API: Result
    API-->>Client: 200 OK (document info)
  end
Loading
sequenceDiagram
  autonumber
  participant WH as Webhook Source
  participant API as Webhook Handler
  participant V as URL Security (webhookFileUrlSchema)
  participant Net as HTTP Fetch
  participant Storage as Upload Target
  participant DB as DB

  WH->>API: DocumentCreate event (fileUrl, contentType, ...)
  API->>V: Validate fileUrl (HTTPS + SSRF/path checks)
  alt Invalid URL
    API-->>WH: 400/ignore (per handler policy)
  else Valid URL
    API->>Net: Fetch fileUrl (HEAD/GET)
    Net-->>API: Response (headers, stream)
    API->>API: Compare response content-type vs expected (log warn if mismatch)
    API->>API: Convert to buffer
    API->>Storage: Upload buffer
    Storage-->>API: URL/path
    API->>DB: Create document record
    DB-->>API: Record
    API-->>WH: Ack/Success
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3103b43 and f626372.

📒 Files selected for processing (3)
  • lib/zod/url-validation.ts (1 hunks)
  • pages/api/teams/[teamId]/documents/agreement.ts (4 hunks)
  • pages/api/webhooks/services/[...path]/index.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pages/api/teams/[teamId]/documents/agreement.ts
  • pages/api/webhooks/services/[...path]/index.ts
  • lib/zod/url-validation.ts
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/issue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vercel
Copy link

vercel bot commented Aug 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
papermark Ready Ready Preview Comment Aug 23, 2025 1:58pm

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)

95-114: Remove duplicate code block

Lines 105-114 are an exact duplicate of lines 95-103. This appears to be a copy-paste error.

   // turn off isPrimary flag for all other versions
   await prisma.documentVersion.updateMany({
     where: {
       documentId: documentId,
       id: { not: version.id },
     },
     data: {
       isPrimary: false,
     },
   });

-  // turn off isPrimary flag for all other versions
-  await prisma.documentVersion.updateMany({
-    where: {
-      documentId: documentId,
-      id: { not: version.id },
-    },
-    data: {
-      isPrimary: false,
-    },
-  });
pages/api/webhooks/services/[...path]/index.ts (2)

286-299: Guard against SSRF via redirects, unbounded downloads, and hung connections

Right now we validate only the initial URL and perform an unbounded fetch. Redirects could bypass initial checks, large files can exhaust memory, and a slow origin can hang the request.

Apply the following patch to add a timeout, re‑validate the final URL after redirects, and reject obviously too‑large payloads via Content-Length:

-  // 4. Fetch file from URL
-  const response = await fetch(fileUrl);
+  // 4. Fetch file from URL with timeout and safety checks
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), 30_000); // 30s safety timeout
+  let response: Response;
+  try {
+    response = await fetch(fileUrl, { signal: controller.signal });
+  } finally {
+    clearTimeout(timeout);
+  }
   if (!response.ok) {
     return res.status(400).json({ error: "Failed to fetch file from URL" });
   }
+
+  // 4b. Re-validate the final response URL (guards against SSRF via redirects)
+  try {
+    webhookFileUrlSchema.parse(response.url);
+  } catch {
+    return res
+      .status(400)
+      .json({ error: "File URL redirect violated security policy" });
+  }
+
+  // 4c. Enforce a sane max size using Content-Length when available
+  const MAX_BYTES =
+    Number(process.env.WEBHOOK_MAX_FILE_BYTES ?? 50 * 1024 * 1024); // 50 MB default
+  const contentLength = response.headers.get("content-length");
+  if (contentLength && Number(contentLength) > MAX_BYTES) {
+    return res.status(413).json({ error: "File too large" });
+  }
 
-  // 5. Validate response content type matches expected
-  const responseContentType = response.headers.get("content-type");
-  if (responseContentType && !responseContentType.startsWith(contentType)) {
-    console.warn(
-      `Content type mismatch: expected ${contentType}, got ${responseContentType}`,
-    );
-    // Log but don't fail - some services return generic types
-  }
+  // 5. Validate response content type matches expected
+  const responseContentType = response.headers.get("content-type");
+  if (responseContentType && !responseContentType.startsWith(contentType)) {
+    console.warn(
+      `Content type mismatch: expected ${contentType}, got ${responseContentType}`,
+    );
+    // Log but don't fail - some services return generic types
+  }

Follow-up (optional next step): stream with a byte cap instead of arrayBuffer() to hard-limit memory; if putFileServer can accept streams, we can wire that in a separate change.


286-299: Harden untrusted external fetch() calls with timeouts and redirect checks

We ran a repo-wide search and found numerous fetch(...) usages—many of which pull from user-provided URLs (e.g. fileUrl, url) without any timeout or redirect validation. To prevent hangs, redirect loops, or unwanted host access, please audit and harden each external fetch by:

  • Wrapping in an AbortController and enforcing a sensible timeout (e.g. 30 seconds)
  • Setting redirect: 'manual' (or equivalent) and explicitly handling non-OK redirect responses
  • Verifying the request URL against an allowlist or validating its hostname
  • Applying any necessary size limits or content-type checks

Key locations to address immediately:

  • pages/api/webhooks/services/[...path]/index.ts (line 286):
    const response = await fetch(fileUrl);
  • pages/api/mupdf/get-pages.ts (line 25):
    const response = await fetch(url);
  • lib/trigger/optimize-video-files.ts (line 43):
    const response = await fetch(fileUrl);
  • lib/files/bulk-download.ts (lines 67–70):
    const response = await fetch(url);
  • lib/trigger/pdf-to-image-route.ts
    – line 64: const response = await fetch(.../api/mupdf/get-pages)
    – line 118: const response = await fetch(.../api/mupdf/convert-page)

And any similar fetches in lib/mupdf/convert-page.ts, lib/mupdf/annotate-document.ts, or other modules that fetch arbitrary URLs.

Please refactor these to include the above safeguards and review any additional external fetches uncovered in your scan.

🧹 Nitpick comments (5)
lib/zod/url-validation.ts (2)

25-27: Consider additional double-encoding patterns

While the current checks cover common double-encoding attacks, consider adding checks for:

  • %252E (double-encoded dot)
  • %255C (double-encoded backslash)
  • Mixed case variants like %2e%2e or %2E%2e
   // Prevent double encoding attacks
-  if (pathOrUrl.includes("%2E%2E") || pathOrUrl.includes("%2F%2F")) {
+  const doubleEncodedPatterns = [
+    "%2E%2E", "%2e%2e", "%2E%2e", "%2e%2E",  // Various cases of ..
+    "%2F%2F", "%2f%2f",                       // //
+    "%252E", "%252e",                         // Double-encoded dot
+    "%255C", "%255c"                          // Double-encoded backslash
+  ];
+  if (doubleEncodedPatterns.some(pattern => pathOrUrl.toLowerCase().includes(pattern.toLowerCase()))) {
     return false;
   }

186-212: Duplicate regex pattern

The S3 path validation regex is duplicated in lines 109 and 208. Consider extracting it to a constant for maintainability.

+// Define the S3 path pattern as a constant at the top of the file
+const S3_PATH_PATTERN = /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;
+
 // Custom validator for file paths - either Notion URLs or S3 storage paths
 const createFilePathValidator = () => {
   return z
     .string()
     .min(1, "File path is required")
     .refine(
       (path) => {
         // ... existing code ...
         
         // Case 2: file storage paths - must match pattern: <id>/doc_<someId>/<name>.<ext>
-        const s3PathPattern =
-          /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;
-        return s3PathPattern.test(path);
+        return S3_PATH_PATTERN.test(path);
       },
       // ... rest of the code
     )
     
 // And update line 208:
-        return /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/.test(
+        return S3_PATH_PATTERN.test(
           data.url,
         );
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)

35-39: Consider using a more descriptive dummy name

The dummy name Version ${new Date().toISOString()} is added solely to satisfy schema validation. Consider using a more meaningful pattern or updating the schema to make the name optional for version creation.

     // Validate request body using Zod schema for security
     const validationResult = documentUploadSchema.safeParse({
       ...req.body,
-      name: `Version ${new Date().toISOString()}`, // Dummy name for validation
+      name: req.body.name || `Version_${documentId}_${Date.now()}`, // Use provided name or generate a meaningful default
     });
pages/api/webhooks/services/[...path]/index.ts (2)

55-64: Switching fileUrl to webhookFileUrlSchema is a solid upgrade

Strong + consistent validation at the schema layer reduces attack surface before any network I/O. One minor improvement you might consider in webhookFileUrlSchema: trim whitespace to avoid subtle failures (.transform((s) => s.trim())) before .url(). Not a blocker here.


300-303: Add a post-read size guard in case Content-Length was missing

If the server doesn’t send Content-Length, we can still enforce the limit after reading.

   // 6. Convert to buffer
   const fileBuffer = Buffer.from(await response.arrayBuffer());
+  if (typeof MAX_BYTES !== "undefined" && fileBuffer.byteLength > MAX_BYTES) {
+    return res.status(413).json({ error: "File too large" });
+  }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4338400 and 3103b43.

📒 Files selected for processing (6)
  • lib/zod/url-validation.ts (1 hunks)
  • package.json (1 hunks)
  • pages/api/teams/[teamId]/documents/[id]/versions/index.ts (2 hunks)
  • pages/api/teams/[teamId]/documents/agreement.ts (3 hunks)
  • pages/api/teams/[teamId]/documents/index.ts (3 hunks)
  • pages/api/webhooks/services/[...path]/index.ts (4 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing schema-validated tasks, use `schemaTask` from `trigger.dev/sdk/v3` and provide a schema using Zod or another supported library.
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : You MUST use `trigger.dev/sdk/v3` when implementing Trigger.dev tasks.

Applied to files:

  • package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Before generating any code for Trigger.dev tasks, verify that you are importing from `trigger.dev/sdk/v3`, exporting every task, and not generating any deprecated code patterns.

Applied to files:

  • package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing a Trigger.dev task, always use the `task` function from `trigger.dev/sdk/v3` and follow the correct pattern for task definition.

Applied to files:

  • package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When triggering tasks from backend code, use the `tasks.trigger`, `tasks.batchTrigger`, or `tasks.triggerAndPoll` methods from `trigger.dev/sdk/v3` and use type-only imports for type safety.

Applied to files:

  • package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to trigger.config.ts : When using build extensions in Trigger.dev, use only the supported extensions such as `additionalFiles`, `additionalPackages`, `aptGet`, `emitDecoratorMetadata`, `prismaExtension`, `syncEnvVars`, `puppeteer`, `ffmpeg`, and `esbuildPlugin`.

Applied to files:

  • package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing scheduled (cron) tasks, use `schedules.task` from `trigger.dev/sdk/v3` and follow the correct pattern for schedule definition.

Applied to files:

  • package.json
🧬 Code graph analysis (5)
pages/api/teams/[teamId]/documents/index.ts (2)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (131-218)
lib/utils.ts (1)
  • log (64-124)
pages/api/teams/[teamId]/documents/agreement.ts (2)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (131-218)
lib/utils.ts (2)
  • getExtension (27-30)
  • log (64-124)
pages/api/webhooks/services/[...path]/index.ts (1)
lib/zod/url-validation.ts (1)
  • webhookFileUrlSchema (221-242)
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (2)
lib/zod/url-validation.ts (1)
  • documentUploadSchema (131-218)
lib/utils.ts (1)
  • log (64-124)
lib/zod/url-validation.ts (2)
lib/constants.ts (2)
  • SUPPORTED_DOCUMENT_SIMPLE_TYPES (157-169)
  • SUPPORTED_DOCUMENT_MIME_TYPES (67-98)
lib/utils/get-content-type.ts (1)
  • getSupportedContentType (1-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
package.json (1)

16-17: Good security practice: Pinning Trigger.dev CLI to v3

Pinning the Trigger.dev CLI to version 3 prevents unexpected breaking changes from newer major versions and ensures consistent behavior across different development environments.

lib/zod/url-validation.ts (1)

51-53: Incomplete private IP validation

The current regex doesn't properly validate the full range of private IPs (172.16.0.0 - 172.31.255.255). It currently accepts any 172.1x.x.x or 172.2x.x.x address.

   // Block private IP ranges (simplified check)
-  if (hostname.match(/^10\.|^172\.(1[6-9]|2[0-9]|3[01])\.|^192\.168\./)) {
+  // More precise private IP validation
+  const ipv4Pattern = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
+  const match = hostname.match(ipv4Pattern);
+  if (match) {
+    const [, a, b, c, d] = match.map(Number);
+    // 10.0.0.0/8
+    if (a === 10) return false;
+    // 172.16.0.0/12
+    if (a === 172 && b >= 16 && b <= 31) return false;
+    // 192.168.0.0/16
+    if (a === 192 && b === 168) return false;
+  }
-    return false;
-  }

Likely an incorrect or invalid review comment.

pages/api/teams/[teamId]/documents/index.ts (2)

244-256: Good security improvement with input validation

The addition of Zod schema validation strengthens the API's security posture by preventing injection attacks and ensuring data integrity before processing.


260-268: Verify processDocument handles the renamed fields
I ran searches in lib/api/documents/process-document.ts and did not find any references to fileUrl or fileType inside the processDocument implementation. It looks like the function’s parameter destructuring and body still expect the old field names (e.g. url, type) or never extract the new ones.

Please review and ensure that:

  • The payload passed into processDocument includes fileUrl and fileType (rather than url and type).
  • The function signature destructures those two fields out of documentData (or top-level args) and uses them consistently throughout.
pages/api/teams/[teamId]/documents/agreement.ts (1)

35-50: Good validation implementation

The schema validation ensures data integrity and provides detailed error messages for better debugging.

pages/api/webhooks/services/[...path]/index.ts (3)

22-22: Hardened URL schema import — good call

Importing webhookFileUrlSchema centralizes SSRF/URL checks and avoids ad‑hoc validators scattered around the codebase.


305-315: Step renumbering and wording — looks correct

Comment updates align with the new pre-buffer validation step.


320-336: Step renumbering — consistent

The create‑document step is now correctly labeled after adding the content‑type precheck.

@mfts mfts merged commit 6898498 into main Aug 23, 2025
9 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants