Skip to content

Conversation

@mfts
Copy link
Owner

@mfts mfts commented Aug 23, 2025

Summary by CodeRabbit

  • New Features

    • Content type can be omitted/null during uploads; storage type now defaults to the common blob option when not provided.
    • Automatic detection and validation of Notion-hosted URLs (including notion.site variants) when storage type is absent.
  • Bug Fixes

    • Stricter validation across upload URL/path and storage type combinations (S3 vs. blob vs. Notion).
    • Clearer error message when storage type and URL/path formats mismatch or storage type is missing for non-Notion files.

@vercel
Copy link

vercel bot commented Aug 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
papermark Ready Ready Preview Comment Aug 23, 2025 2:42pm

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 23, 2025

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

📥 Commits

Reviewing files that changed from the base of the PR and between c2aabcb and 8cc0b40.

📒 Files selected for processing (1)
  • lib/zod/url-validation.ts (4 hunks)
 _____________________________________________________________
< Oompa Loompa doompadee doo, I've got a code review for you. >
 -------------------------------------------------------------
  \
   \   (\__/)
       (•ㅅ•)
       /   づ

Walkthrough

Updates validation and types: documentUploadSchema now defaults storageType to "VERCEL_BLOB", allows contentType to be null, adds guards to skip validations when fields are absent, expands Notion-host detection and storage-type rules; DocumentData.contentType type widened to string | null.

Changes

Cohort / File(s) Summary
Schema and validation logic
lib/zod/url-validation.ts
Added .default("VERCEL_BLOB") for storageType; made contentType nullable; added guards to skip contentType/storageType checks when absent; expanded Notion-host recognition (notion.so, www.notion.so, *.notion.site); refined rules for S3_PATH vs VERCEL_BLOB; updated storage-type mismatch error message.
Document type update
lib/documents/create-document.ts
Widened exported DocumentData.contentType type from string to `string

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Validator as documentUploadSchema
  participant API as createDocument handlers

  Client->>Validator: validate({ url, storageType?, contentType? })

  alt contentType provided
    Validator->>Validator: validate contentType against supported types or "text/html"
  else contentType missing/null
    Note right of Validator #DDEBF7: contentType validation skipped
  end

  alt storageType provided
    Validator->>Validator: enforce storageType rules
    Note right of Validator #F7F3DD: S3_PATH => non-URL path\nVERCEL_BLOB => Notion URL or S3-like path
    alt mismatch
      Validator-->>Client: error: "Storage type does not match the URL/path format, or missing storage type for non-Notion files"
    else ok
      Validator-->>Client: success
    end
  else storageType missing
    alt url is HTTPS
      Validator->>Validator: require Notion host (notion.so, www.notion.so, *.notion.site)
      alt Notion host
        Validator-->>Client: success
      else invalid host
        Validator-->>Client: error: "Storage type does not match the URL/path format, or missing storage type for non-Notion files"
      end
    else non-HTTPS or path
      Validator-->>Client: error (storageType required for non-Notion files)
    end
  end

  Note over Client,API #EDF7EE: On success, payload may include contentType:null
  Client->>API: submit validated payload (contentType: string | null)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Tip

You can validate your CodeRabbit configuration file in your editor.

If your editor has YAML language server, you can enable auto-completion and validation by adding # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json at the top of your CodeRabbit configuration file.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/issue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
lib/zod/url-validation.ts (5)

200-218: Deduplicate Notion host validation and make it less error-prone

You re-implement the Notion hostname checks here. Extract a helper (also usable by createFilePathValidator) to keep a single source of truth and reduce drift.

Apply this refactor:

// Place near the top of the file
const isNotionHostname = (hostname: string): boolean => {
  const h = hostname.toLowerCase();
  return h === "www.notion.so" || h === "notion.so" || h.endsWith(".notion.site");
};
const isNotionUrl = (maybeUrl: string): boolean => {
  if (!maybeUrl.startsWith("https://")) return false;
  try {
    return isNotionHostname(new URL(maybeUrl).hostname);
  } catch {
    return false;
  }
};

Then simplify this block:

-        if (data.url.startsWith("https://")) {
-          try {
-            const urlObj = new URL(data.url);
-            const hostname = urlObj.hostname;
-            return (
-              hostname === "www.notion.so" ||
-              hostname === "notion.so" ||
-              hostname.endsWith(".notion.site")
-            );
-          } catch {
-            return false;
-          }
-        }
+        if (data.url.startsWith("https://")) {
+          return isNotionUrl(data.url);
+        }

101-103: Make Vercel Blob host detection robust (case-insensitive, allow subdomains, avoid startsWith)

startsWith(process.env.VERCEL_BLOB_HOST) is brittle and case-sensitive. Vercel Blob hostnames often use subdomains; prefer exact match or subdomain match (endsWith).

Apply this change:

-            if (process.env.VERCEL_BLOB_HOST) {
-              isVercelBlob = hostname.startsWith(process.env.VERCEL_BLOB_HOST);
-            }
+            const blobHost = process.env.VERCEL_BLOB_HOST?.toLowerCase();
+            if (blobHost) {
+              const h = hostname.toLowerCase();
+              isVercelBlob = h === blobHost || h.endsWith(`.${blobHost}`);
+            }

If you adopt the isNotionHostname helper, consider a parallel isVercelBlobHostname helper for consistency.


117-119: Grammar nit: fix article in error message

Small copy edit.

-          "File path must be either a Notion URL or an file storage path",
+          "File path must be either a Notion URL or a file storage path",

221-246: Tighten S3 validation and remove regex duplication

  • The S3 path regex is duplicated (here and in createFilePathValidator). Extract a shared const S3_PATH_REGEX = ... to prevent divergence.
  • In the S3_PATH branch you only assert “not https.” Relying on the earlier filePathSchema refine is fine, but explicitly testing the same S3_PATH_REGEX here would make this block self-sufficient and safer against future changes.

Example:

-      if (data.storageType === "S3_PATH") {
-        // S3_PATH should use file paths, not URLs
-        return !data.url.startsWith("https://");
-      } else if (data.storageType === "VERCEL_BLOB") {
+      if (data.storageType === "S3_PATH") {
+        // S3_PATH should use file paths, not URLs
+        return !data.url.startsWith("https://") && S3_PATH_REGEX.test(data.url);
+      } else if (data.storageType === "VERCEL_BLOB") {
         // VERCEL_BLOB can use either Notion URLs or S3 paths (for migration)

And hoist:

const S3_PATH_REGEX =
  /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/;

36-69: Optional: broaden SSRF protections

Your SSRF checks are a good start. Consider covering:

  • 0.0.0.0/8, 127.0.0.0/8 beyond 127.0.0.1, and IPv4-mapped IPv6 (::ffff:127.0.0.1).
  • Numeric, octal, or hex IP encodings.
  • DNS resolution to private ranges (needs server-side DNS check, not just string matching).

If desired, I can add a utility using ipaddr.js (or Node’s net module plus a resolver) to harden this without overcomplicating the client side.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6898498 and dc4a668.

📒 Files selected for processing (1)
  • lib/zod/url-validation.ts (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
lib/zod/url-validation.ts (2)

171-173: LGTM: Allowing text/html and making contentType optional fits Notion

Allowing text/html for Notion docs is correct, and making contentType optional aligns with ingestion realities—provided the earlier bug is fixed to limit the skip to Notion.


143-147: storageType optional: verify downstream correctness before merging

Making storageType optional in the documentUploadSchema does introduce a breaking change: any call to the schema that omits storageType will now produce undefined instead of the previous required enum, and this value is immediately passed into Prisma and routing logic without a fallback. We’ve confirmed numerous call sites that destructure and use storageType directly (e.g., in pages/api/teams/[teamId]/documents/agreement.ts, lib/trigger/convert-files.ts, lib/files/copy-file-to-bucket-server.ts, and many client components) where an undefined value could surface.

Please manually verify for each critical path that:

  • No logic assumes storageType is always defined/truthy (e.g., the .with("S3_PATH", ...) and .with("VERCEL_BLOB", ...) branches in file-copy functions).
  • Passing undefined to prisma.document.create({ data: { storageType } }) either correctly falls back to the database default (VERCEL_BLOB) or is explicitly handled.
  • Any UI components or API endpoints that read storageType won’t break or render incorrectly when it’s missing.

Optional refactor suggestion: add a transform just after your object schema to compute a reliable fallback based on the URL, so downstream code always receives a concrete value:

documentUploadSchema = z
  .object({
    name: z.string().min(1).max(255),
    url: filePathSchema,
    storageType: z
      .enum(["S3_PATH", "VERCEL_BLOB"], {
        errorMap: () => ({ message: "Invalid storage type" }),
      })
      .optional(),
  })
  .transform((data) => {
    const isUrl = data.url.startsWith("https://");
    const isNotion = isUrl && isNotionHostname(new URL(data.url).hostname);
    return {
      ...data,
      storageType:
        data.storageType ??
        (isNotion ? "VERCEL_BLOB" : "S3_PATH"),
    };
  })
  .refine(/* existing refine logic */);

This ensures every consumer always sees a defined storageType without re-implementing inference logic downstream.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
lib/documents/create-document.ts (2)

53-56: Avoid throwing Error with a non-string payload

await response.json() is likely an object; new Error(object) will stringify to "[object Object]". Prefer extracting a message or falling back to status text.

Apply:

   if (!response.ok) {
-    const error = await response.json();
-    throw new Error(error);
+    let message = `HTTP error! status: ${response.status}`;
+    try {
+      const err = await response.json();
+      message = err?.message ?? message;
+    } catch {}
+    throw new Error(message);
   }

135-139: Don’t mask network/HTTP failures as ID validation errors

The catch transforms any error (including fetch/network failures and 5xx) into "Invalid document ID or team ID", which is misleading and harms debuggability.

Apply:

   } catch (error) {
     console.error("Error creating new document version:", error);
-    throw new Error("Invalid document ID or team ID");
+    if (error instanceof z.ZodError) {
+      throw new Error("Invalid document ID");
+    }
+    // Re-throw original error for network/HTTP issues
+    throw error;
   }
lib/zod/url-validation.ts (2)

224-243: Allow Vercel Blob HTTPS URLs consistently and harden host matching

  • The path validator allows Vercel Blob hosts, but the storageType refinement for VERCEL_BLOB rejects them (it only accepts Notion HTTPS or S3-style paths). This inconsistency blocks valid Blob URLs.
  • Also, startsWith(process.env.VERCEL_BLOB_HOST) is an unsafe host check; it will accept blob.vercel-storage.com.evil.com.

Apply:

@@
-            // Check for vercel blob storage
-            let isVercelBlob = false;
-            if (process.env.VERCEL_BLOB_HOST) {
-              isVercelBlob = hostname.startsWith(process.env.VERCEL_BLOB_HOST);
-            }
+            // Check for Vercel Blob storage (exact or subdomain match with dot-boundary)
+            let isVercelBlob = false;
+            if (process.env.VERCEL_BLOB_HOST) {
+              const allowed = process.env.VERCEL_BLOB_HOST.toLowerCase();
+              const h = hostname.toLowerCase();
+              isVercelBlob = h === allowed || h.endsWith(`.${allowed}`);
+            }
@@
       } else if (data.storageType === "VERCEL_BLOB") {
         // VERCEL_BLOB can use either Notion URLs or S3 paths (for migration)
         if (data.url.startsWith("https://")) {
-          // Must be a Notion URL for VERCEL_BLOB
-          try {
-            const urlObj = new URL(data.url);
-            const hostname = urlObj.hostname;
-            return (
-              hostname === "www.notion.so" ||
-              hostname === "notion.so" ||
-              hostname.endsWith(".notion.site")
-            );
-          } catch {
-            return false;
-          }
+          // Allow Notion or Vercel Blob hosts for VERCEL_BLOB
+          try {
+            const urlObj = new URL(data.url);
+            const hostname = urlObj.hostname.toLowerCase();
+            if (
+              hostname === "www.notion.so" ||
+              hostname === "notion.so" ||
+              hostname.endsWith(".notion.site")
+            ) {
+              return true;
+            }
+            if (process.env.VERCEL_BLOB_HOST) {
+              const allowed = process.env.VERCEL_BLOB_HOST.toLowerCase();
+              if (hostname === allowed || hostname.endsWith(`.${allowed}`)) {
+                return true;
+              }
+            }
+          } catch {
+            // ignore and continue to S3-style path check
+          }
         }
         // Or an S3 path (allowed for migration)
         return /^[a-zA-Z0-9_-]+\/doc_[a-zA-Z0-9_-]+\/[a-zA-Z0-9_.-]+\.[a-zA-Z0-9]+$/.test(
           data.url,
         );

Also applies to: 100-106


117-119: Fix typo in user-facing error message

"an file storage path" → "a file storage path".

Apply:

-        message:
-          "File path must be either a Notion URL or an file storage path",
+        message:
+          "File path must be either a Notion URL or a file storage path",
♻️ Duplicate comments (1)
lib/zod/url-validation.ts (1)

171-173: Bug: contentType becomes optional for all types, not just Notion

The early return skips MIME↔type consistency for any item with falsy contentType. This lets non-Notion uploads bypass the check by sending null.

Apply this minimal, safe fix (gate by verified Notion hosts to prevent spoofing via type):

-      // Skip content type validation if it's not provided (e.g., for Notion files)
-      if (!data.contentType) {
-        return true;
-      }
+      // Skip only when contentType is missing AND URL is a verified Notion host
+      if (!data.contentType) {
+        try {
+          const hostname = new URL(data.url).hostname;
+          return (
+            hostname === "www.notion.so" ||
+            hostname === "notion.so" ||
+            hostname.endsWith(".notion.site")
+          );
+        } catch {
+          return false;
+        }
+      }

Also applies to: 178-181

🧹 Nitpick comments (3)
lib/documents/create-document.ts (1)

31-33: Optional: Fallback to relative URL when NEXT_PUBLIC_BASE_URL is unset

In environments where NEXT_PUBLIC_BASE_URL isn't defined, this will generate an invalid URL. Consider a safe fallback.

Example:

-  const response = await fetch(
-    `${process.env.NEXT_PUBLIC_BASE_URL}/api/teams/${teamId}/documents`,
-    {
+  const base = process.env.NEXT_PUBLIC_BASE_URL;
+  const endpoint = base
+    ? `${base}/api/teams/${teamId}/documents`
+    : `/api/teams/${teamId}/documents`;
+  const response = await fetch(endpoint, {
lib/zod/url-validation.ts (2)

24-27: Make double-encoding check case-insensitive

Attackers can use lowercase percent-encodings (%2e%2e, %2f%2f). Current check only catches uppercase.

Apply:

   // Prevent double encoding attacks
-  if (pathOrUrl.includes("%2E%2E") || pathOrUrl.includes("%2F%2F")) {
+  const lower = pathOrUrl.toLowerCase();
+  if (lower.includes("%2e%2e") || lower.includes("%2f%2f")) {
     return false;
   }

79-130: Optional: Deduplicate Notion host logic with a small helper

Host checks are repeated in three places. Extracting a single helper reduces drift and makes future domain updates safer.

Example helper (add once near the top of the file):

const isNotionHost = (hostname: string) =>
  hostname === "www.notion.so" ||
  hostname === "notion.so" ||
  hostname.endsWith(".notion.site");

Then reuse in createFilePathValidator and the VERCEL_BLOB branch.

Also applies to: 220-246

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dc4a668 and c2aabcb.

📒 Files selected for processing (2)
  • lib/documents/create-document.ts (1 hunks)
  • lib/zod/url-validation.ts (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
lib/documents/create-document.ts (1)

8-8: All callers correctly pass contentType: no explicit undefined found

I ran the provided grep across the codebase and confirmed:

  • In lib/webhook/triggers/link-created.ts and document-created.ts, each contentType is set to document.contentType.
  • In lib/documents/create-document.ts (at lines 46, 85, 124), it’s always documentData.contentType.
  • In lib/api/views/send-webhook-event.ts, components/upload-zone.tsx, and components/welcome/containers/upload-container.tsx, contentType likewise comes from document.contentType.

No instances of

contentType: undefined

or use of optional chaining (?.) were detected. Upstream callers therefore uniformly pass either a string or null, matching the schema’s .nullable() requirement. No further changes are needed.

lib/zod/url-validation.ts (1)

255-276: Heads-up: SSRF checks are hostname-only

The SSRF check does not resolve hostnames to IPs, so domains like nip.io/xip.io that resolve to private IPs may slip through. If this endpoint ever dereferences URLs server-side, consider DNS+IP verification at request-time.

Would you like a follow-up PR that adds DNS resolution and CIDR checks (including IPv6) before fetching?

Comment on lines +143 to +147
storageType: z
.enum(["S3_PATH", "VERCEL_BLOB"], {
errorMap: () => ({ message: "Invalid storage type" }),
})
.default("VERCEL_BLOB"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Defaulting storageType makes the “missing storageType” branch unreachable—simplify and align semantics

Adding .default("VERCEL_BLOB") ensures storageType is never falsy post-parse, so the branch that handles missing storageType won’t run. This is confusing and risks drift between intended and actual behavior.

Two options—pick one:

  • Option A (recommended): Keep the default and remove the dead branch. Also tighten the error message.
  • Option B: Remove the default so the missing-storageType logic applies for HTTPS Notion URLs.

Diff for Option A:

   storageType: z
     .enum(["S3_PATH", "VERCEL_BLOB"], {
       errorMap: () => ({ message: "Invalid storage type" }),
     })
     .default("VERCEL_BLOB"),
@@
-      // Skip storage type validation if not provided (e.g., for Notion files)
-      if (!data.storageType) {
-        // For Notion URLs, storage type is not required
-        if (data.url.startsWith("https://")) {
-          try {
-            const urlObj = new URL(data.url);
-            const hostname = urlObj.hostname;
-            return (
-              hostname === "www.notion.so" ||
-              hostname === "notion.so" ||
-              hostname.endsWith(".notion.site")
-            );
-          } catch {
-            return false;
-          }
-        }
-        // For file paths without storage type, this is invalid
-        return false;
-      }
+      // storageType is always set due to default; proceed with consistency checks
@@
-      message:
-        "Storage type does not match the URL/path format, or missing storage type for non-Notion files",
+      message: "Storage type does not match the URL/path format",

If you prefer Option B, I can provide the inverse diff.

Also applies to: 200-218, 248-250

🤖 Prompt for AI Agents
In lib/zod/url-validation.ts around lines 143 to 147, the
.default("VERCEL_BLOB") makes the "missing storageType" branch unreachable;
follow Option A: keep the default on storageType and remove the dead branch that
handles missing storageType (and any conditional logic relying on falsy
storageType), tighten the enum error message to be more specific (e.g., "invalid
storageType, expected S3_PATH or VERCEL_BLOB"), and apply the same change
pattern to the other occurrences noted (lines ~200-218 and ~248-250) so
semantics and validation are aligned.

@mfts mfts merged commit 70a2b0d into main Aug 23, 2025
8 of 9 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants