-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: url validation for notion #1830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Currently processing new changes in this PR. This may take a few minutes, please wait... 📒 Files selected for processing (1)
WalkthroughUpdates validation and types: documentUploadSchema now defaults Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant Validator as documentUploadSchema
participant API as createDocument handlers
Client->>Validator: validate({ url, storageType?, contentType? })
alt contentType provided
Validator->>Validator: validate contentType against supported types or "text/html"
else contentType missing/null
Note right of Validator #DDEBF7: contentType validation skipped
end
alt storageType provided
Validator->>Validator: enforce storageType rules
Note right of Validator #F7F3DD: S3_PATH => non-URL path\nVERCEL_BLOB => Notion URL or S3-like path
alt mismatch
Validator-->>Client: error: "Storage type does not match the URL/path format, or missing storage type for non-Notion files"
else ok
Validator-->>Client: success
end
else storageType missing
alt url is HTTPS
Validator->>Validator: require Notion host (notion.so, www.notion.so, *.notion.site)
alt Notion host
Validator-->>Client: success
else invalid host
Validator-->>Client: error: "Storage type does not match the URL/path format, or missing storage type for non-Notion files"
end
else non-HTTPS or path
Validator-->>Client: error (storageType required for non-Notion files)
end
end
Note over Client,API #EDF7EE: On success, payload may include contentType:null
Client->>API: submit validated payload (contentType: string | null)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Tip You can validate your CodeRabbit configuration file in your editor.If your editor has YAML language server, you can enable auto-completion and validation by adding ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
lib/zod/url-validation.ts (5)
200-218: Deduplicate Notion host validation and make it less error-proneYou re-implement the Notion hostname checks here. Extract a helper (also usable by
createFilePathValidator) to keep a single source of truth and reduce drift.Apply this refactor:
// Place near the top of the file const isNotionHostname = (hostname: string): boolean => { const h = hostname.toLowerCase(); return h === "www.notion.so" || h === "notion.so" || h.endsWith(".notion.site"); }; const isNotionUrl = (maybeUrl: string): boolean => { if (!maybeUrl.startsWith("https://")) return false; try { return isNotionHostname(new URL(maybeUrl).hostname); } catch { return false; } };Then simplify this block:
- if (data.url.startsWith("https://")) { - try { - const urlObj = new URL(data.url); - const hostname = urlObj.hostname; - return ( - hostname === "www.notion.so" || - hostname === "notion.so" || - hostname.endsWith(".notion.site") - ); - } catch { - return false; - } - } + if (data.url.startsWith("https://")) { + return isNotionUrl(data.url); + }
101-103: Make Vercel Blob host detection robust (case-insensitive, allow subdomains, avoid startsWith)
startsWith(process.env.VERCEL_BLOB_HOST)is brittle and case-sensitive. Vercel Blob hostnames often use subdomains; prefer exact match or subdomain match (endsWith).Apply this change:
- if (process.env.VERCEL_BLOB_HOST) { - isVercelBlob = hostname.startsWith(process.env.VERCEL_BLOB_HOST); - } + const blobHost = process.env.VERCEL_BLOB_HOST?.toLowerCase(); + if (blobHost) { + const h = hostname.toLowerCase(); + isVercelBlob = h === blobHost || h.endsWith(`.${blobHost}`); + }If you adopt the
isNotionHostnamehelper, consider a parallelisVercelBlobHostnamehelper for consistency.
117-119: Grammar nit: fix article in error messageSmall copy edit.
- "File path must be either a Notion URL or an file storage path", + "File path must be either a Notion URL or a file storage path",
221-246: Tighten S3 validation and remove regex duplication
- The S3 path regex is duplicated (here and in
createFilePathValidator). Extract a sharedconst S3_PATH_REGEX = ...to prevent divergence.- In the
S3_PATHbranch you only assert “not https.” Relying on the earlierfilePathSchemarefine is fine, but explicitly testing the sameS3_PATH_REGEXhere would make this block self-sufficient and safer against future changes.Example:
- if (data.storageType === "S3_PATH") { - // S3_PATH should use file paths, not URLs - return !data.url.startsWith("https://"); - } else if (data.storageType === "VERCEL_BLOB") { + if (data.storageType === "S3_PATH") { + // S3_PATH should use file paths, not URLs + return !data.url.startsWith("https://") && S3_PATH_REGEX.test(data.url); + } else if (data.storageType === "VERCEL_BLOB") { // VERCEL_BLOB can use either Notion URLs or S3 paths (for migration)And hoist:
const S3_PATH_REGEX = /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;
36-69: Optional: broaden SSRF protectionsYour SSRF checks are a good start. Consider covering:
- 0.0.0.0/8, 127.0.0.0/8 beyond 127.0.0.1, and IPv4-mapped IPv6 (
::ffff:127.0.0.1).- Numeric, octal, or hex IP encodings.
- DNS resolution to private ranges (needs server-side DNS check, not just string matching).
If desired, I can add a utility using
ipaddr.js(or Node’snetmodule plus a resolver) to harden this without overcomplicating the client side.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
lib/zod/url-validation.ts(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
lib/zod/url-validation.ts (2)
171-173: LGTM: Allowing text/html and making contentType optional fits NotionAllowing
text/htmlfor Notion docs is correct, and makingcontentTypeoptional aligns with ingestion realities—provided the earlier bug is fixed to limit the skip to Notion.
143-147: storageType optional: verify downstream correctness before mergingMaking
storageTypeoptional in thedocumentUploadSchemadoes introduce a breaking change: any call to the schema that omitsstorageTypewill now produceundefinedinstead of the previous required enum, and this value is immediately passed into Prisma and routing logic without a fallback. We’ve confirmed numerous call sites that destructure and usestorageTypedirectly (e.g., inpages/api/teams/[teamId]/documents/agreement.ts,lib/trigger/convert-files.ts,lib/files/copy-file-to-bucket-server.ts, and many client components) where anundefinedvalue could surface.Please manually verify for each critical path that:
- No logic assumes
storageTypeis always defined/truthy (e.g., the.with("S3_PATH", ...)and.with("VERCEL_BLOB", ...)branches in file-copy functions).- Passing
undefinedtoprisma.document.create({ data: { storageType } })either correctly falls back to the database default (VERCEL_BLOB) or is explicitly handled.- Any UI components or API endpoints that read
storageTypewon’t break or render incorrectly when it’s missing.Optional refactor suggestion: add a transform just after your object schema to compute a reliable fallback based on the URL, so downstream code always receives a concrete value:
documentUploadSchema = z .object({ name: z.string().min(1).max(255), url: filePathSchema, storageType: z .enum(["S3_PATH", "VERCEL_BLOB"], { errorMap: () => ({ message: "Invalid storage type" }), }) .optional(), }) .transform((data) => { const isUrl = data.url.startsWith("https://"); const isNotion = isUrl && isNotionHostname(new URL(data.url).hostname); return { ...data, storageType: data.storageType ?? (isNotion ? "VERCEL_BLOB" : "S3_PATH"), }; }) .refine(/* existing refine logic */);This ensures every consumer always sees a defined
storageTypewithout re-implementing inference logic downstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
lib/documents/create-document.ts (2)
53-56: Avoid throwing Error with a non-string payload
await response.json()is likely an object;new Error(object)will stringify to "[object Object]". Prefer extracting amessageor falling back to status text.Apply:
if (!response.ok) { - const error = await response.json(); - throw new Error(error); + let message = `HTTP error! status: ${response.status}`; + try { + const err = await response.json(); + message = err?.message ?? message; + } catch {} + throw new Error(message); }
135-139: Don’t mask network/HTTP failures as ID validation errorsThe catch transforms any error (including fetch/network failures and 5xx) into "Invalid document ID or team ID", which is misleading and harms debuggability.
Apply:
} catch (error) { console.error("Error creating new document version:", error); - throw new Error("Invalid document ID or team ID"); + if (error instanceof z.ZodError) { + throw new Error("Invalid document ID"); + } + // Re-throw original error for network/HTTP issues + throw error; }lib/zod/url-validation.ts (2)
224-243: Allow Vercel Blob HTTPS URLs consistently and harden host matching
- The path validator allows Vercel Blob hosts, but the storageType refinement for
VERCEL_BLOBrejects them (it only accepts Notion HTTPS or S3-style paths). This inconsistency blocks valid Blob URLs.- Also,
startsWith(process.env.VERCEL_BLOB_HOST)is an unsafe host check; it will acceptblob.vercel-storage.com.evil.com.Apply:
@@ - // Check for vercel blob storage - let isVercelBlob = false; - if (process.env.VERCEL_BLOB_HOST) { - isVercelBlob = hostname.startsWith(process.env.VERCEL_BLOB_HOST); - } + // Check for Vercel Blob storage (exact or subdomain match with dot-boundary) + let isVercelBlob = false; + if (process.env.VERCEL_BLOB_HOST) { + const allowed = process.env.VERCEL_BLOB_HOST.toLowerCase(); + const h = hostname.toLowerCase(); + isVercelBlob = h === allowed || h.endsWith(`.${allowed}`); + } @@ } else if (data.storageType === "VERCEL_BLOB") { // VERCEL_BLOB can use either Notion URLs or S3 paths (for migration) if (data.url.startsWith("https://")) { - // Must be a Notion URL for VERCEL_BLOB - try { - const urlObj = new URL(data.url); - const hostname = urlObj.hostname; - return ( - hostname === "www.notion.so" || - hostname === "notion.so" || - hostname.endsWith(".notion.site") - ); - } catch { - return false; - } + // Allow Notion or Vercel Blob hosts for VERCEL_BLOB + try { + const urlObj = new URL(data.url); + const hostname = urlObj.hostname.toLowerCase(); + if ( + hostname === "www.notion.so" || + hostname === "notion.so" || + hostname.endsWith(".notion.site") + ) { + return true; + } + if (process.env.VERCEL_BLOB_HOST) { + const allowed = process.env.VERCEL_BLOB_HOST.toLowerCase(); + if (hostname === allowed || hostname.endsWith(`.${allowed}`)) { + return true; + } + } + } catch { + // ignore and continue to S3-style path check + } } // Or an S3 path (allowed for migration) return /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/.test( data.url, );Also applies to: 100-106
117-119: Fix typo in user-facing error message"an file storage path" → "a file storage path".
Apply:
- message: - "File path must be either a Notion URL or an file storage path", + message: + "File path must be either a Notion URL or a file storage path",
♻️ Duplicate comments (1)
lib/zod/url-validation.ts (1)
171-173: Bug: contentType becomes optional for all types, not just NotionThe early return skips MIME↔type consistency for any item with falsy
contentType. This lets non-Notion uploads bypass the check by sendingnull.Apply this minimal, safe fix (gate by verified Notion hosts to prevent spoofing via type):
- // Skip content type validation if it's not provided (e.g., for Notion files) - if (!data.contentType) { - return true; - } + // Skip only when contentType is missing AND URL is a verified Notion host + if (!data.contentType) { + try { + const hostname = new URL(data.url).hostname; + return ( + hostname === "www.notion.so" || + hostname === "notion.so" || + hostname.endsWith(".notion.site") + ); + } catch { + return false; + } + }Also applies to: 178-181
🧹 Nitpick comments (3)
lib/documents/create-document.ts (1)
31-33: Optional: Fallback to relative URL when NEXT_PUBLIC_BASE_URL is unsetIn environments where
NEXT_PUBLIC_BASE_URLisn't defined, this will generate an invalid URL. Consider a safe fallback.Example:
- const response = await fetch( - `${process.env.NEXT_PUBLIC_BASE_URL}/api/teams/${teamId}/documents`, - { + const base = process.env.NEXT_PUBLIC_BASE_URL; + const endpoint = base + ? `${base}/api/teams/${teamId}/documents` + : `/api/teams/${teamId}/documents`; + const response = await fetch(endpoint, {lib/zod/url-validation.ts (2)
24-27: Make double-encoding check case-insensitiveAttackers can use lowercase percent-encodings (
%2e%2e,%2f%2f). Current check only catches uppercase.Apply:
// Prevent double encoding attacks - if (pathOrUrl.includes("%2E%2E") || pathOrUrl.includes("%2F%2F")) { + const lower = pathOrUrl.toLowerCase(); + if (lower.includes("%2e%2e") || lower.includes("%2f%2f")) { return false; }
79-130: Optional: Deduplicate Notion host logic with a small helperHost checks are repeated in three places. Extracting a single helper reduces drift and makes future domain updates safer.
Example helper (add once near the top of the file):
const isNotionHost = (hostname: string) => hostname === "www.notion.so" || hostname === "notion.so" || hostname.endsWith(".notion.site");Then reuse in
createFilePathValidatorand theVERCEL_BLOBbranch.Also applies to: 220-246
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
lib/documents/create-document.ts(1 hunks)lib/zod/url-validation.ts(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
lib/documents/create-document.ts (1)
8-8: All callers correctly passcontentType: no explicitundefinedfoundI ran the provided grep across the codebase and confirmed:
- In
lib/webhook/triggers/link-created.tsanddocument-created.ts, eachcontentTypeis set todocument.contentType.- In
lib/documents/create-document.ts(at lines 46, 85, 124), it’s alwaysdocumentData.contentType.- In
lib/api/views/send-webhook-event.ts,components/upload-zone.tsx, andcomponents/welcome/containers/upload-container.tsx,contentTypelikewise comes fromdocument.contentType.No instances of
contentType: undefinedor use of optional chaining (
?.) were detected. Upstream callers therefore uniformly pass either a string ornull, matching the schema’s.nullable()requirement. No further changes are needed.lib/zod/url-validation.ts (1)
255-276: Heads-up: SSRF checks are hostname-onlyThe SSRF check does not resolve hostnames to IPs, so domains like nip.io/xip.io that resolve to private IPs may slip through. If this endpoint ever dereferences URLs server-side, consider DNS+IP verification at request-time.
Would you like a follow-up PR that adds DNS resolution and CIDR checks (including IPv6) before fetching?
| storageType: z | ||
| .enum(["S3_PATH", "VERCEL_BLOB"], { | ||
| errorMap: () => ({ message: "Invalid storage type" }), | ||
| }) | ||
| .default("VERCEL_BLOB"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Defaulting storageType makes the “missing storageType” branch unreachable—simplify and align semantics
Adding .default("VERCEL_BLOB") ensures storageType is never falsy post-parse, so the branch that handles missing storageType won’t run. This is confusing and risks drift between intended and actual behavior.
Two options—pick one:
- Option A (recommended): Keep the default and remove the dead branch. Also tighten the error message.
- Option B: Remove the default so the missing-storageType logic applies for HTTPS Notion URLs.
Diff for Option A:
storageType: z
.enum(["S3_PATH", "VERCEL_BLOB"], {
errorMap: () => ({ message: "Invalid storage type" }),
})
.default("VERCEL_BLOB"),
@@
- // Skip storage type validation if not provided (e.g., for Notion files)
- if (!data.storageType) {
- // For Notion URLs, storage type is not required
- if (data.url.startsWith("https://")) {
- try {
- const urlObj = new URL(data.url);
- const hostname = urlObj.hostname;
- return (
- hostname === "www.notion.so" ||
- hostname === "notion.so" ||
- hostname.endsWith(".notion.site")
- );
- } catch {
- return false;
- }
- }
- // For file paths without storage type, this is invalid
- return false;
- }
+ // storageType is always set due to default; proceed with consistency checks
@@
- message:
- "Storage type does not match the URL/path format, or missing storage type for non-Notion files",
+ message: "Storage type does not match the URL/path format",If you prefer Option B, I can provide the inverse diff.
Also applies to: 200-218, 248-250
🤖 Prompt for AI Agents
In lib/zod/url-validation.ts around lines 143 to 147, the
.default("VERCEL_BLOB") makes the "missing storageType" branch unreachable;
follow Option A: keep the default on storageType and remove the dead branch that
handles missing storageType (and any conditional logic relying on falsy
storageType), tighten the enum error message to be more specific (e.g., "invalid
storageType, expected S3_PATH or VERCEL_BLOB"), and apply the same change
pattern to the other occurrences noted (lines ~200-218 and ~248-250) so
semantics and validation are aligned.
Summary by CodeRabbit
New Features
Bug Fixes