-
Notifications
You must be signed in to change notification settings - Fork 35
feat: add image upload and multimodal chat support #285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for fireproof-ai-builder ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall implementation is solid and integrates image uploads across state, storage, UI, and streaming. However, rendering of user images relies on index alignment between loaded files and imageIds, which can desynchronize and cause incorrect React keys. Image loading/processing is done sequentially in multiple places, causing unnecessary latency; concurrency and optional client-side downscaling would improve performance. Consider adding limits on number/size of attachments to prevent large payloads and poor UX, and guard against accidental browser navigation when dropping outside the textarea.
Summary of changes
- Added image attachment workflow to chat input: drag-and-drop on textarea, hidden file input, preview thumbnails, and remove button using a new
ImagePreviewcomponent. - Extended chat state and types to include image metadata and image document type (
ImageDocument), and added image management functions (attachImages,removeAttachedImage,clearAttachedImages). - Persisted images as Fireproof documents and reference them by ID in user messages; user messages can now display attached images via a new
UserImageshelper inMessage.tsx. - Updated
sendMessageto carry attached image IDs and pass them to the AI streaming layer. - Updated
streamHandlerto build multimodal content arrays forcall-ai, loading image files from the session database and converting to data URLs. - Minor UI adjustments to ChatInput spacing and visual drag-over feedback.
- Kept backward compatibility with text-only chat and existing model picker logic.
| useEffect(() => { | ||
| let cancelled = false; | ||
| async function load() { | ||
| const result: DocFileMeta[] = []; | ||
| for (const id of imageIds) { | ||
| try { | ||
| const doc = await database.get(id); | ||
| const fpFiles = doc?._files || {}; | ||
| const key = "image" in fpFiles ? "image" : Object.keys(fpFiles)[0]; | ||
| const ref = key ? fpFiles[key] : undefined; | ||
| if (ref) { | ||
| const anyRef: any = ref as any; | ||
| if (typeof anyRef.file === "function") { | ||
| // Fireproof DocFileMeta | ||
| result.push(anyRef as DocFileMeta); | ||
| } else if (ref instanceof File) { | ||
| // Raw File stored in _files (dev/testing scenarios) | ||
| const wrapped: DocFileMeta = { | ||
| file: async () => ref, | ||
| type: ref.type || "application/octet-stream", | ||
| size: ref.size, | ||
| cid: undefined as any, | ||
| }; | ||
| result.push(wrapped); | ||
| } | ||
| } | ||
| } catch (err) { | ||
| console.warn(`Failed to load image ${id}:`, err); | ||
| } | ||
| } | ||
| if (!cancelled) { | ||
| setFiles(result); | ||
| } | ||
| } | ||
| void load(); | ||
| return () => { | ||
| cancelled = true; | ||
| }; | ||
| }, [database, imageIds]); | ||
|
|
||
| if (files.length === 0) return null; | ||
|
|
||
| return ( | ||
| <div className="flex flex-wrap gap-2 mb-2"> | ||
| {files.map((file, index) => ( | ||
| <div | ||
| key={`${sessionId}-${imageIds[index]}`} | ||
| className="border-light-decorative-00 dark:border-dark-decorative-00 overflow-hidden rounded border" | ||
| > | ||
| <ImgFile | ||
| file={file} | ||
| alt="User attachment" | ||
| className="max-w-xs" | ||
| maxHeight="12rem" | ||
| /> | ||
| </div> | ||
| ))} | ||
| </div> | ||
| ); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rendering attached images currently uses files.map((file, index) => … key={${sessionId}-${imageIds[index]})), which assumes a 1:1 alignment between files and imageIds by index. Because the loader skips missing/invalid images, index-based keys can become misaligned, causing duplicate or incorrect keys and coupling the wrong image to the wrong ID. Additionally, the loader performs network/DB fetches sequentially and relies on any for file refs, both of which are avoidable.
Recommend: (1) load images concurrently, (2) normalize Fireproof refs without any, and (3) keep the loaded results paired with their source IDs so the render key and file stay in sync.
Suggestion
Consider refactoring UserImages like this to ensure stable ID↔file pairing, avoid any, and parallelize loads:
- Store pairs of
{ id, meta }in state instead of a bareDocFileMeta[]. - Use a small helper to normalize Fireproof file refs without
any. - Use
Promise.allto fetch/normalize concurrently and filter out failures, preserving order by ID.
Example:
const toDocFileMeta = async (ref: unknown): Promise<DocFileMeta | null> => {
if (!ref) return null;
if (typeof (ref as any).file === "function") {
return ref as DocFileMeta;
}
if (ref instanceof File) {
const f = ref;
return {
file: async () => f,
type: f.type || "application/octet-stream",
size: f.size,
cid: undefined as any,
} satisfies DocFileMeta;
}
return null;
};
const [files, setFiles] = useState<Array<{ id: string; meta: DocFileMeta }>>([]);
useEffect(() => {
let cancelled = false;
(async () => {
const pairs = await Promise.all(
imageIds.map(async (id) => {
try {
const doc = await database.get(id);
const fpFiles = doc?._files || {};
const key = "image" in fpFiles ? "image" : Object.keys(fpFiles)[0];
const meta = await toDocFileMeta(key ? fpFiles[key] : undefined);
return meta ? { id, meta } : null;
} catch (err) {
console.warn(`Failed to load image ${id}:`, err);
return null;
}
}),
);
if (!cancelled) setFiles(pairs.filter(Boolean) as Array<{ id: string; meta: DocFileMeta }>);
})();
return () => {
cancelled = true;
};
}, [database, imageIds]);
// Render
<div className="flex flex-wrap gap-2 mb-2">
{files.map(({ id, meta }) => (
<div key={`${sessionId}-${id}`} className="border-light-decorative-00 dark:border-dark-decorative-00 overflow-hidden rounded border">
<ImgFile file={meta} alt="User attachment" className="max-w-xs" maxHeight="12rem" />
</div>
))}
</div>Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this refactor.
| // Load images if we have image IDs and a database | ||
| const imageDataUrls: string[] = []; | ||
| if (imageIds && imageIds.length > 0 && sessionDatabase) { | ||
| for (const imageId of imageIds) { | ||
| try { | ||
| const doc = await sessionDatabase.get(imageId); | ||
| const fpFiles = doc?._files || {}; | ||
| const key = "image" in fpFiles ? "image" : Object.keys(fpFiles)[0]; | ||
| const ref = key ? fpFiles[key] : undefined; | ||
| if (ref) { | ||
| const anyRef: any = ref as any; | ||
| let file: File; | ||
| if (typeof anyRef.file === "function") { | ||
| // Fireproof DocFileMeta | ||
| file = await anyRef.file(); | ||
| } else if (anyRef instanceof File) { | ||
| // Raw File | ||
| file = anyRef; | ||
| } else { | ||
| continue; | ||
| } | ||
| // Convert to data URL (browser-compatible) | ||
| const reader = new FileReader(); | ||
| const dataUrl = await new Promise<string>((resolve, reject) => { | ||
| reader.onload = () => resolve(reader.result as string); | ||
| reader.onerror = reject; | ||
| reader.readAsDataURL(file); | ||
| }); | ||
| imageDataUrls.push(dataUrl); | ||
| } | ||
| } catch (error) { | ||
| console.warn(`Failed to load image ${imageId}:`, error); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Build user message content - multimodal if we have images | ||
| let userContent: | ||
| | string | ||
| | Array<{ | ||
| type: "text" | "image_url"; | ||
| text?: string; | ||
| image_url?: { url: string }; | ||
| }>; | ||
|
|
||
| if (imageDataUrls.length > 0) { | ||
| // Multimodal message with text and images | ||
| const contentParts: Array<{ | ||
| type: "text" | "image_url"; | ||
| text?: string; | ||
| image_url?: { url: string }; | ||
| }> = [{ type: "text", text: userMessage }]; | ||
|
|
||
| // Add image segments | ||
| for (const dataUrl of imageDataUrls) { | ||
| contentParts.push({ | ||
| type: "image_url", | ||
| image_url: { url: dataUrl }, | ||
| }); | ||
| } | ||
|
|
||
| userContent = contentParts; | ||
| } else { | ||
| // Text-only message | ||
| userContent = userMessage; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Images are loaded and converted to Data URLs sequentially, which is unnecessarily slow when multiple images are attached. Also, sending original full-resolution images as inline Data URLs can significantly bloat the request and slow model input processing. Concurrency and client-side downscaling can materially improve performance without reducing utility.
Suggestion
Load and process images concurrently, and (optionally) downscale them to a reasonable max dimension before encoding to Data URLs. Example inline replacement for the loading block:
// Load images concurrently and (optionally) downscale to reduce payload
const imageDataUrls: string[] = sessionDatabase && imageIds?.length
? (
await Promise.all(
imageIds.map(async (imageId) => {
try {
const doc = await sessionDatabase.get(imageId);
const fpFiles = doc?._files || {};
const key = "image" in fpFiles ? "image" : Object.keys(fpFiles)[0];
const ref = key ? fpFiles[key] : undefined;
const file = typeof (ref as any)?.file === "function"
? await (ref as any).file()
: ref instanceof File
? ref
: null;
if (!file) return null;
// Optional downscale to cap resolution
const toDataUrl = async (f: File) => {
const img = await new Promise<HTMLImageElement>((res, rej) => {
const url = URL.createObjectURL(f);
const el = new Image();
el.onload = () => { URL.revokeObjectURL(url); res(el); };
el.onerror = (e) => { URL.revokeObjectURL(url); rej(e); };
el.src = url;
});
const MAX = 1024;
const scale = Math.min(1, MAX / Math.max(img.width, img.height));
if (scale < 1) {
const canvas = document.createElement("canvas");
canvas.width = Math.round(img.width * scale);
canvas.height = Math.round(img.height * scale);
const ctx = canvas.getContext("2d")!;
ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
return canvas.toDataURL("image/jpeg", 0.85);
}
// Fallback to original Data URL
const reader = new FileReader();
return await new Promise<string>((resolve, reject) => {
reader.onload = () => resolve(reader.result as string);
reader.onerror = reject;
reader.readAsDataURL(file);
});
};
return await toDataUrl(file);
} catch (e) {
console.warn(`Failed to load image ${imageId}:`, e);
return null;
}
}),
)
).filter((u): u is string => Boolean(u))
: [];Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this optimization.
| const attachImages = useCallback( | ||
| async (files: FileList) => { | ||
| if (!sessionId) return; | ||
|
|
||
| const newImages: Array<{ | ||
| id: string; | ||
| previewUrl: string; | ||
| mimeType: string; | ||
| }> = []; | ||
|
|
||
| for (let i = 0; i < files.length; i++) { | ||
| const file = files[i]; | ||
| if (!file.type.startsWith("image/")) continue; | ||
|
|
||
| // Create preview URL | ||
| const previewUrl = URL.createObjectURL(file); | ||
|
|
||
| // Create image document in Fireproof | ||
| const imageDoc = { | ||
| type: "image" as const, | ||
| session_id: sessionId, | ||
| created_at: Date.now(), | ||
| _files: { | ||
| image: file, | ||
| }, | ||
| }; | ||
|
|
||
| try { | ||
| const result = await sessionDatabase.put(imageDoc); | ||
| newImages.push({ | ||
| id: result.id, | ||
| previewUrl, | ||
| mimeType: file.type, | ||
| }); | ||
| } catch (error) { | ||
| console.error("Failed to store image:", error); | ||
| URL.revokeObjectURL(previewUrl); | ||
| } | ||
| } | ||
|
|
||
| setAttachedImages((prev) => [...prev, ...newImages]); | ||
| }, | ||
| [sessionId, sessionDatabase], | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attachImages processes files sequentially and has no guardrails on count or size. Large or many images can degrade UX (upload time, memory), and increase token/latency cost for vision models. Adding limits and parallelizing the persistence reduces latency and prevents worst-case scenarios.
Suggestion
Add simple guardrails (max attachments, per-file size) and parallelize the DB puts. Example rewrite:
const MAX_ATTACHMENTS = 6;
const MAX_FILE_MB = 10;
const attachImages = useCallback(
async (files: FileList) => {
if (!sessionId) return;
const current = attachedImages.length;
const allowed = Math.max(0, MAX_ATTACHMENTS - current);
const selected = Array.from(files)
.filter((f) => f.type.startsWith("image/"))
.slice(0, allowed);
const results = await Promise.all(
selected.map(async (file) => {
if (file.size > MAX_FILE_MB * 1024 * 1024) {
console.warn(`Skipping large image > ${MAX_FILE_MB}MB:`, file.name);
return null;
}
const previewUrl = URL.createObjectURL(file);
try {
const result = await sessionDatabase.put({
type: "image" as const,
session_id: sessionId,
created_at: Date.now(),
_files: { image: file },
});
return { id: result.id, previewUrl, mimeType: file.type };
} catch (error) {
console.error("Failed to store image:", error);
URL.revokeObjectURL(previewUrl);
return null;
}
}),
);
const newImages = results.filter(Boolean) as Array<{ id: string; previewUrl: string; mimeType: string }>;
setAttachedImages((prev) => [...prev, ...newImages]);
},
[sessionId, sessionDatabase, attachedImages.length],
);Reply with "@CharlieHelps yes please" if you'd like me to add a commit with these guardrails and concurrency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharlieHelps yes please add gaurdrails but not concurrency. instead collect them and save them in one database bulk operation await database.bulk(arrayOfDocsToBulkUpdate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand this to see my work.
- Reviewed and enforced guardrails for max attachments and per-file size in
attachImages, maintaining image/* filter and immediate cleanup for rejected files. - Switched to single
sessionDatabase.bulkpersist operation inattachImages, falling back to sequential puts as needed, updating state only for successful writes, and revoking previews on failures. - Refactored state updates in
setAttachedImagesto handle attachment limits and concurrency safely, including cleanup for preview URLs on excess or rejected files. - Committed changes to
vibes.diy/pkg/app/hooks/useSimpleChat.tsin commit 98983e4 with full reviewer context in the message. - Pushed commit to remote for PR #285, but push was rejected due to non-fast-forward; branch is behind and needs to be synced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed guardrails + bulk save for image attachments in useSimpleChat.attachImages in 92e561d.
Changes
-
Added attachment guardrails (no concurrency introduced):
- Enforces max attachments per message, respecting current
attachedImages.length. - Enforces per‑file size cap.
- Keeps existing MIME filter (
image/*only). - Rejected files do not create previews; any previews for failed writes are revoked.
- Enforces max attachments per message, respecting current
-
Replaced per‑file
putloop with a single bulk write:- Collects accepted docs and persists via
sessionDatabase.bulk(docs). - Uses returned IDs to build state entries; previews for failed writes are revoked.
- Compatibility: if
bulkis unavailable, falls back to sequentialput(still non‑concurrent).
- Collects accepted docs and persists via
-
Race‑safe state update:
- Caps additions inside the
setAttachedImagesupdater to avoid exceeding the limit ifattachImagesis called twice quickly. - Revokes previews for any images pruned by the cap at update time.
- Caps additions inside the
-
Scope respected: only modified the
attachImagesfunction invibes.diy/pkg/app/hooks/useSimpleChat.ts.
Implementation notes
-
Current guardrail defaults (please confirm/adjust):
- MAX_ATTACHMENTS = 6 per message
- MAX_FILE_MB = 10 MB per file
-
No concurrency was added. Bulk write is a single call; the fallback uses a sequential
putloop only whenbulkisn’t available.
Verification
# Install
$ pnpm install --frozen-lockfile
# Repo checks (format, build, test)
$ pnpm run check
# Results
- Build: succeeded
- Tests: 571 passed, 85 skipped (656 total)
- Test files: 102 passed, 17 skipped (119 total)Open question
- Confirm the limits you want. I used MAX_ATTACHMENTS=6 and MAX_FILE_MB=10 for now. Happy to update to your preferred values.
| onDragEnter={(e) => { | ||
| e.preventDefault(); | ||
| setIsDragging(true); | ||
| }} | ||
| onDragOver={(e) => { | ||
| e.preventDefault(); | ||
| setIsDragging(true); | ||
| }} | ||
| onDragLeave={(e) => { | ||
| e.preventDefault(); | ||
| setIsDragging(false); | ||
| }} | ||
| onDrop={onDropFiles} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drag-and-drop is only handled on the textarea. Dropping files just outside the textarea can trigger the browser’s default navigation (opening the image), which risks losing the current chat state. It’s safer to prevent default dragover/drop on the container (or window) to avoid accidental navigations outside the intended drop zone.
Suggestion
Add container-level (or window-level) dragover/drop preventDefault handlers so drops outside the textarea don’t navigate away. For example:
// In the outer div
<div
ref={containerRef}
className="px-4 py-2"
onDragOver={(e) => e.preventDefault()}
onDrop={(e) => {
// If not dropping files, just prevent navigation
if (!e.dataTransfer?.files?.length) e.preventDefault();
}}
>
…
</div>Alternatively, add a useEffect to register window-level dragover/drop listeners that preventDefault while the component is mounted.
Reply with "@CharlieHelps yes please" if you'd like me to add a commit implementing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CharlieHelps yes please container-level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expand this to see my work.
- Inspected
ChatInput.tsxto confirm drag-and-drop andonDropFileswere previously scoped only to the textarea. - Revised
onDropFilestype fromReact.DragEvent<HTMLTextAreaElement>toReact.DragEvent<HTMLElement>, supporting shared use by both container and textarea event handlers. - Added
onDragEnter,onDragOver,onDragLeave, andonDrophandlers on the container<div>, activatingpreventDefaultand usingsetIsDraggingfor visual state across the entire chat input area. - Ensured container-level
onDropcallspreventDefault, routes valid file drops toonDropFiles, and reliably resetssetIsDragging, with no window/global listeners introduced. - Validated that new handlers align with PR review acceptance and preserve existing textarea logic unless centralizing is more consistent.
- Checked for type completeness and compat in shared handler logic after updates.
- Ran
pnpm check—all tests, type checks, and formatting passed without errors related to drag-and-drop orChatInput.tsx. - Committed changes as a303fcc with detailed message summarizing fixes for container-level drop prevention and forwarding.
- Pushed the commit to
jchris/chat-imagesbranch on origin without errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed container-level drag-and-drop handling to prevent navigation and route file drops to the existing flow.
- Commit: a303fcc
- File: vibes.diy/pkg/app/components/ChatInput.tsx
Changes
- Added container-level handlers on the outer ChatInput wrapper:
- onDragEnter/onDragOver/onDragLeave: preventDefault and keep isDragging in sync for the entire container.
- onDrop: preventDefault + stopPropagation to avoid browser navigation; forwards file drops to the shared onDropFiles handler; resets isDragging.
- Updated onDropFiles signature to accept React.DragEvent so it can be called from both the textarea and the container without type errors.
- Left the existing textarea-level handlers in place; container now reliably manages isDragging across the full input area. No window-level listeners added and no changes to text input/focus behavior.
Key snippet (container wrapper):
<div
ref={containerRef}
className="px-4 py-2"
onDragEnter={(e) => { e.preventDefault(); setIsDragging(true); }}
onDragOver={(e) => { e.preventDefault(); setIsDragging(true); }}
onDragLeave={(e) => { e.preventDefault(); setIsDragging(false); }}
onDrop={(e) => {
e.preventDefault();
e.stopPropagation();
setIsDragging(false);
if (e.dataTransfer?.files?.length) {
void onDropFiles(e); // shared handler
}
}}
>And the shared drop handler type adjustment:
const onDropFiles = useCallback(async (e: React.DragEvent<HTMLElement>) => {
e.preventDefault();
e.stopPropagation();
setIsDragging(false);
const files = e.dataTransfer?.files;
if (files && chatState.attachImages) {
await chatState.attachImages(files);
}
}, [chatState.attachImages]);Verification
# Format + Build + Tests
$ pnpm -s check
# Results
- Build: success
- Vitest (multi-project):
• Test files: 102 passed | 17 skipped
• Tests: 571 passed | 85 skippedNotes:
- No global/window listeners were introduced, per the review request.
- Drops anywhere inside the ChatInput container no longer trigger browser navigation and correctly invoke the existing image attachment flow.
Re-requested review and assigned the PR to @jchris.
… drops to onDropFiles\n\n- Add container-level onDragOver/onDrop with preventDefault to avoid browser navigation\n- Forward file drops anywhere in ChatInput container to existing onDropFiles handler\n- Normalize onDropFiles to accept DragEvent<HTMLElement> so it can be reused by container and textarea\n- Keep textarea-level handlers; container now manages isDragging for full area
…e attachments\n\n- Enforce max attachments per message and per-file size (image/* only)\n- Use single bulk write to persist accepted images; revoke previews on failure\n- Race-safe state update to honor cap under concurrent calls; fallback to sequential put if bulk unavailable
|
Need to double check the image previews are showing up right |
Summary
Implements comprehensive image upload functionality for the chat interface with drag-and-drop support, file picker, preview thumbnails, and multimodal AI integration. This addresses issue #269 by providing a complete image attachment system with Fireproof-backed storage.
Key Features
Implementation Details
ImageDocumenttype andimages?: string[]toUserChatMessageDocumentImagePreviewcomponent, updatedChatInputandMessagecomponentsuseSimpleChatwithattachImages,removeAttachedImage,clearAttachedImagesstreamHandlerto support multimodal content arraysTechnical Approach
Test plan
🤖 Generated with Claude Code