-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add url validation #1829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add url validation #1829
Conversation
WalkthroughAdds a new URL/path validation module with SSRF/path-security checks and Zod schemas, applies server-side document payload validation across multiple document APIs and webhook ingestion, adds a content-type precheck for fetched webhook files, and pins two npm scripts to Trigger.dev CLI v3. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as API: Team Documents
participant Validator as Zod Schemas
participant Store as Storage/DB
Client->>API: POST /teams/:teamId/documents (body)
API->>Validator: documentUploadSchema.safeParse(body)
alt Invalid payload
API-->>Client: 400 Bad Request (validation errors)
else Valid payload
API->>Store: Create/Process Document (using validated data)
Store-->>API: Result
API-->>Client: 200 OK (document info)
end
sequenceDiagram
autonumber
participant WH as Webhook Source
participant API as Webhook Handler
participant V as URL Security (webhookFileUrlSchema)
participant Net as HTTP Fetch
participant Storage as Upload Target
participant DB as DB
WH->>API: DocumentCreate event (fileUrl, contentType, ...)
API->>V: Validate fileUrl (HTTPS + SSRF/path checks)
alt Invalid URL
API-->>WH: 400/ignore (per handler policy)
else Valid URL
API->>Net: Fetch fileUrl (HEAD/GET)
Net-->>API: Response (headers, stream)
API->>API: Compare response content-type vs expected (log warn if mismatch)
API->>API: Convert to buffer
API->>Storage: Upload buffer
Storage-->>API: URL/path
API->>DB: Create document record
DB-->>API: Record
API-->>WH: Ack/Success
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)
95-114: Remove duplicate code blockLines 105-114 are an exact duplicate of lines 95-103. This appears to be a copy-paste error.
// turn off isPrimary flag for all other versions await prisma.documentVersion.updateMany({ where: { documentId: documentId, id: { not: version.id }, }, data: { isPrimary: false, }, }); - // turn off isPrimary flag for all other versions - await prisma.documentVersion.updateMany({ - where: { - documentId: documentId, - id: { not: version.id }, - }, - data: { - isPrimary: false, - }, - });pages/api/webhooks/services/[...path]/index.ts (2)
286-299: Guard against SSRF via redirects, unbounded downloads, and hung connectionsRight now we validate only the initial URL and perform an unbounded
fetch. Redirects could bypass initial checks, large files can exhaust memory, and a slow origin can hang the request.Apply the following patch to add a timeout, re‑validate the final URL after redirects, and reject obviously too‑large payloads via Content-Length:
- // 4. Fetch file from URL - const response = await fetch(fileUrl); + // 4. Fetch file from URL with timeout and safety checks + const controller = new AbortController(); + const timeout = setTimeout(() => controller.abort(), 30_000); // 30s safety timeout + let response: Response; + try { + response = await fetch(fileUrl, { signal: controller.signal }); + } finally { + clearTimeout(timeout); + } if (!response.ok) { return res.status(400).json({ error: "Failed to fetch file from URL" }); } + + // 4b. Re-validate the final response URL (guards against SSRF via redirects) + try { + webhookFileUrlSchema.parse(response.url); + } catch { + return res + .status(400) + .json({ error: "File URL redirect violated security policy" }); + } + + // 4c. Enforce a sane max size using Content-Length when available + const MAX_BYTES = + Number(process.env.WEBHOOK_MAX_FILE_BYTES ?? 50 * 1024 * 1024); // 50 MB default + const contentLength = response.headers.get("content-length"); + if (contentLength && Number(contentLength) > MAX_BYTES) { + return res.status(413).json({ error: "File too large" }); + } - // 5. Validate response content type matches expected - const responseContentType = response.headers.get("content-type"); - if (responseContentType && !responseContentType.startsWith(contentType)) { - console.warn( - `Content type mismatch: expected ${contentType}, got ${responseContentType}`, - ); - // Log but don't fail - some services return generic types - } + // 5. Validate response content type matches expected + const responseContentType = response.headers.get("content-type"); + if (responseContentType && !responseContentType.startsWith(contentType)) { + console.warn( + `Content type mismatch: expected ${contentType}, got ${responseContentType}`, + ); + // Log but don't fail - some services return generic types + }Follow-up (optional next step): stream with a byte cap instead of
arrayBuffer()to hard-limit memory; ifputFileServercan accept streams, we can wire that in a separate change.
286-299: Harden untrusted external fetch() calls with timeouts and redirect checksWe ran a repo-wide search and found numerous
fetch(...)usages—many of which pull from user-provided URLs (e.g.fileUrl,url) without any timeout or redirect validation. To prevent hangs, redirect loops, or unwanted host access, please audit and harden each external fetch by:
- Wrapping in an AbortController and enforcing a sensible timeout (e.g. 30 seconds)
- Setting
redirect: 'manual'(or equivalent) and explicitly handling non-OK redirect responses- Verifying the request URL against an allowlist or validating its hostname
- Applying any necessary size limits or content-type checks
Key locations to address immediately:
- pages/api/webhooks/services/[...path]/index.ts (line 286):
const response = await fetch(fileUrl);- pages/api/mupdf/get-pages.ts (line 25):
const response = await fetch(url);- lib/trigger/optimize-video-files.ts (line 43):
const response = await fetch(fileUrl);- lib/files/bulk-download.ts (lines 67–70):
const response = await fetch(url);- lib/trigger/pdf-to-image-route.ts
– line 64:const response = await fetch(.../api/mupdf/get-pages)
– line 118:const response = await fetch(.../api/mupdf/convert-page)And any similar fetches in
lib/mupdf/convert-page.ts,lib/mupdf/annotate-document.ts, or other modules that fetch arbitrary URLs.Please refactor these to include the above safeguards and review any additional external fetches uncovered in your scan.
🧹 Nitpick comments (5)
lib/zod/url-validation.ts (2)
25-27: Consider additional double-encoding patternsWhile the current checks cover common double-encoding attacks, consider adding checks for:
%252E(double-encoded dot)%255C(double-encoded backslash)- Mixed case variants like
%2e%2eor%2E%2e// Prevent double encoding attacks - if (pathOrUrl.includes("%2E%2E") || pathOrUrl.includes("%2F%2F")) { + const doubleEncodedPatterns = [ + "%2E%2E", "%2e%2e", "%2E%2e", "%2e%2E", // Various cases of .. + "%2F%2F", "%2f%2f", // // + "%252E", "%252e", // Double-encoded dot + "%255C", "%255c" // Double-encoded backslash + ]; + if (doubleEncodedPatterns.some(pattern => pathOrUrl.toLowerCase().includes(pattern.toLowerCase()))) { return false; }
186-212: Duplicate regex patternThe S3 path validation regex is duplicated in lines 109 and 208. Consider extracting it to a constant for maintainability.
+// Define the S3 path pattern as a constant at the top of the file +const S3_PATH_PATTERN = /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/; + // Custom validator for file paths - either Notion URLs or S3 storage paths const createFilePathValidator = () => { return z .string() .min(1, "File path is required") .refine( (path) => { // ... existing code ... // Case 2: file storage paths - must match pattern: <id>/doc_<someId>/<name>.<ext> - const s3PathPattern = - /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/; - return s3PathPattern.test(path); + return S3_PATH_PATTERN.test(path); }, // ... rest of the code ) // And update line 208: - return /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/.test( + return S3_PATH_PATTERN.test( data.url, );pages/api/teams/[teamId]/documents/[id]/versions/index.ts (1)
35-39: Consider using a more descriptive dummy nameThe dummy name
Version ${new Date().toISOString()}is added solely to satisfy schema validation. Consider using a more meaningful pattern or updating the schema to make the name optional for version creation.// Validate request body using Zod schema for security const validationResult = documentUploadSchema.safeParse({ ...req.body, - name: `Version ${new Date().toISOString()}`, // Dummy name for validation + name: req.body.name || `Version_${documentId}_${Date.now()}`, // Use provided name or generate a meaningful default });pages/api/webhooks/services/[...path]/index.ts (2)
55-64: SwitchingfileUrltowebhookFileUrlSchemais a solid upgradeStrong + consistent validation at the schema layer reduces attack surface before any network I/O. One minor improvement you might consider in
webhookFileUrlSchema: trim whitespace to avoid subtle failures (.transform((s) => s.trim())) before.url(). Not a blocker here.
300-303: Add a post-read size guard in case Content-Length was missingIf the server doesn’t send Content-Length, we can still enforce the limit after reading.
// 6. Convert to buffer const fileBuffer = Buffer.from(await response.arrayBuffer()); + if (typeof MAX_BYTES !== "undefined" && fileBuffer.byteLength > MAX_BYTES) { + return res.status(413).json({ error: "File too large" }); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
lib/zod/url-validation.ts(1 hunks)package.json(1 hunks)pages/api/teams/[teamId]/documents/[id]/versions/index.ts(2 hunks)pages/api/teams/[teamId]/documents/agreement.ts(3 hunks)pages/api/teams/[teamId]/documents/index.ts(3 hunks)pages/api/webhooks/services/[...path]/index.ts(4 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing schema-validated tasks, use `schemaTask` from `trigger.dev/sdk/v3` and provide a schema using Zod or another supported library.
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : You MUST use `trigger.dev/sdk/v3` when implementing Trigger.dev tasks.
Applied to files:
package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Before generating any code for Trigger.dev tasks, verify that you are importing from `trigger.dev/sdk/v3`, exporting every task, and not generating any deprecated code patterns.
Applied to files:
package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing a Trigger.dev task, always use the `task` function from `trigger.dev/sdk/v3` and follow the correct pattern for task definition.
Applied to files:
package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When triggering tasks from backend code, use the `tasks.trigger`, `tasks.batchTrigger`, or `tasks.triggerAndPoll` methods from `trigger.dev/sdk/v3` and use type-only imports for type safety.
Applied to files:
package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to trigger.config.ts : When using build extensions in Trigger.dev, use only the supported extensions such as `additionalFiles`, `additionalPackages`, `aptGet`, `emitDecoratorMetadata`, `prismaExtension`, `syncEnvVars`, `puppeteer`, `ffmpeg`, and `esbuildPlugin`.
Applied to files:
package.json
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
PR: mfts/papermark#0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : When implementing scheduled (cron) tasks, use `schedules.task` from `trigger.dev/sdk/v3` and follow the correct pattern for schedule definition.
Applied to files:
package.json
🧬 Code graph analysis (5)
pages/api/teams/[teamId]/documents/index.ts (2)
lib/zod/url-validation.ts (1)
documentUploadSchema(131-218)lib/utils.ts (1)
log(64-124)
pages/api/teams/[teamId]/documents/agreement.ts (2)
lib/zod/url-validation.ts (1)
documentUploadSchema(131-218)lib/utils.ts (2)
getExtension(27-30)log(64-124)
pages/api/webhooks/services/[...path]/index.ts (1)
lib/zod/url-validation.ts (1)
webhookFileUrlSchema(221-242)
pages/api/teams/[teamId]/documents/[id]/versions/index.ts (2)
lib/zod/url-validation.ts (1)
documentUploadSchema(131-218)lib/utils.ts (1)
log(64-124)
lib/zod/url-validation.ts (2)
lib/constants.ts (2)
SUPPORTED_DOCUMENT_SIMPLE_TYPES(157-169)SUPPORTED_DOCUMENT_MIME_TYPES(67-98)lib/utils/get-content-type.ts (1)
getSupportedContentType(1-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
package.json (1)
16-17: Good security practice: Pinning Trigger.dev CLI to v3Pinning the Trigger.dev CLI to version 3 prevents unexpected breaking changes from newer major versions and ensures consistent behavior across different development environments.
lib/zod/url-validation.ts (1)
51-53: Incomplete private IP validationThe current regex doesn't properly validate the full range of private IPs (172.16.0.0 - 172.31.255.255). It currently accepts any 172.1x.x.x or 172.2x.x.x address.
// Block private IP ranges (simplified check) - if (hostname.match(/^10\.|^172\.(1[6-9]|2[0-9]|3[01])\.|^192\.168\./)) { + // More precise private IP validation + const ipv4Pattern = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/; + const match = hostname.match(ipv4Pattern); + if (match) { + const [, a, b, c, d] = match.map(Number); + // 10.0.0.0/8 + if (a === 10) return false; + // 172.16.0.0/12 + if (a === 172 && b >= 16 && b <= 31) return false; + // 192.168.0.0/16 + if (a === 192 && b === 168) return false; + } - return false; - }Likely an incorrect or invalid review comment.
pages/api/teams/[teamId]/documents/index.ts (2)
244-256: Good security improvement with input validationThe addition of Zod schema validation strengthens the API's security posture by preventing injection attacks and ensuring data integrity before processing.
260-268: VerifyprocessDocumenthandles the renamed fields
I ran searches inlib/api/documents/process-document.tsand did not find any references tofileUrlorfileTypeinside theprocessDocumentimplementation. It looks like the function’s parameter destructuring and body still expect the old field names (e.g.url,type) or never extract the new ones.Please review and ensure that:
- The payload passed into
processDocumentincludesfileUrlandfileType(rather thanurlandtype).- The function signature destructures those two fields out of
documentData(or top-level args) and uses them consistently throughout.pages/api/teams/[teamId]/documents/agreement.ts (1)
35-50: Good validation implementationThe schema validation ensures data integrity and provides detailed error messages for better debugging.
pages/api/webhooks/services/[...path]/index.ts (3)
22-22: Hardened URL schema import — good callImporting
webhookFileUrlSchemacentralizes SSRF/URL checks and avoids ad‑hoc validators scattered around the codebase.
305-315: Step renumbering and wording — looks correctComment updates align with the new pre-buffer validation step.
320-336: Step renumbering — consistentThe create‑document step is now correctly labeled after adding the content‑type precheck.
Summary by CodeRabbit
New Features
Bug Fixes
Chores