feat(web): Similar Images - CLIP-based visual similarity detection #8787
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7ad2d210e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5df9d5db48
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 31d47fa1f9
// Reset cache when expanded groups or data changes
React.useEffect(() => {
    listRef.current?.resetAfterIndex(0);
}, [expandedGroups, similarImageGroups]);
Reset list size cache when layout params change
The row height calculation depends on layoutParams (columns/itemHeight/gap), but the cache reset only runs when expandedGroups or similarImageGroups change. If the container width changes (resize, split view, sidebar toggle), layoutParams changes without a reset, so VariableSizeList keeps stale item sizes and rows render with incorrect heights until another expansion/data change occurs. Consider adding layoutParams (or width/height) to the reset effect so resizing recalculates sizes.
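A minimal sketch of why the stale sizes appear, written without React so it runs on its own. `LayoutParams`, `RowHeightCache`, and `groupHeight` are hypothetical stand-ins rather than names from the PR; only `resetAfterIndex` mirrors the method called in the snippet under review.

```typescript
// Hypothetical layout parameters, named after the fields the review mentions.
interface LayoutParams {
    columns: number;
    itemHeight: number;
    gap: number;
}

// Minimal stand-in for the per-row size cache that VariableSizeList keeps.
class RowHeightCache {
    private heights = new Map<number, number>();
    private compute: (index: number) => number;

    constructor(compute: (index: number) => number) {
        this.compute = compute;
    }

    get(index: number): number {
        let height = this.heights.get(index);
        if (height === undefined) {
            height = this.compute(index);
            this.heights.set(index, height);
        }
        return height;
    }

    // Forget cached sizes for every row at or after `index`.
    resetAfterIndex(index: number): void {
        for (const key of Array.from(this.heights.keys())) {
            if (key >= index) this.heights.delete(key);
        }
    }
}

// Height of a group showing `itemCount` thumbnails under a given layout.
const groupHeight = (itemCount: number, p: LayoutParams): number => {
    const rows = Math.ceil(itemCount / p.columns);
    return rows * p.itemHeight + Math.max(rows - 1, 0) * p.gap;
};

let layout: LayoutParams = { columns: 4, itemHeight: 100, gap: 8 };
const itemCounts = [8, 3];
const cache = new RowHeightCache((i) => groupHeight(itemCounts[i] ?? 0, layout));

const wide = cache.get(0); // measured with 4 columns
layout = { columns: 2, itemHeight: 100, gap: 8 }; // container narrowed
const stale = cache.get(0); // still the 4-column height
cache.resetAfterIndex(0); // what the effect would do with layoutParams as a dependency
const fresh = cache.get(0); // recomputed with 2 columns
```

In the component itself, the equivalent change would presumably be to append `layoutParams` to the effect's dependency array, assuming `layoutParams` is memoized so the effect does not fire on every render.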
    }
}

onProgress?.(65);
Avoid regressing progress after index build
When the index is rebuilt from scratch, progress is advanced to ~95% during the build/save path, but this unconditional onProgress(65) forces the UI backwards before search starts. Users will see the progress bar jump from near-complete back to 65% on cache misses. This should only run when the index was loaded (not rebuilt), or be replaced with a monotonic update.
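One way to get the monotonic update suggested here is to wrap the callback once and route every progress report through the wrapper. This is a sketch under assumptions: `monotonicProgress` and `ProgressCallback` are hypothetical names, and the only thing taken from the PR is the optional `onProgress` callback that receives a percentage.

```typescript
type ProgressCallback = (percent: number) => void;

// Wrap a progress callback so that reported values never go backwards.
const monotonicProgress = (onProgress?: ProgressCallback): ProgressCallback => {
    let highest = -1; // nothing reported yet
    return (percent) => {
        const clamped = Math.min(100, Math.max(0, percent));
        if (clamped <= highest) return; // drop regressions and repeats
        highest = clamped;
        onProgress?.(clamped);
    };
};

// Simulate the cache-miss path: the build reaches 95, then the
// unconditional 65 arrives before the search finishes.
const reported: number[] = [];
const report = monotonicProgress((p) => {
    reported.push(p);
});
report(40);
report(95);
report(65); // ignored by the wrapper
report(100);
```

With the wrapper in place the existing `onProgress?.(65)` call could stay where it is, since it becomes a no-op whenever the rebuild path has already reported a higher value.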
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5084dcbb9d
const cacheKey = `si_${distanceThreshold.toFixed(3)}_${hashFileIDs(
    fileIDs,
)}`;
const cacheEntry: CachedSimilarImages = {
    id: cacheKey,
Use cache key format expected by loader
The cache is saved under si_${distanceThreshold.toFixed(3)}_${hashFileIDs(fileIDs)}, but loadSimilarImagesCache() builds keys via getSimilarImagesCacheKey which appends _v${clipIndexingVersion} (see ml/db.ts). That mismatch means the saved entries are never retrieved, so every run is a cache miss and triggers full recomputation/rebuild even when nothing changed. Saving with the same key builder as the loader (including the version suffix) would make cache hits work.
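The fix amounts to having the save path and the load path call the same key builder. The sketch below illustrates that with an in-memory `Map`; the signature of `getSimilarImagesCacheKey`, the value of `clipIndexingVersion`, the body of `hashFileIDs`, and the `saveCache`/`loadCache` helpers are all assumptions made for illustration, not the code in ml/db.ts.

```typescript
// Assumed value; the real constant is defined elsewhere in the codebase.
const clipIndexingVersion = 1;

// Hypothetical stand-in for hashFileIDs: an FNV-1a style hash over sorted IDs.
const hashFileIDs = (fileIDs: number[]): string => {
    let h = 0x811c9dc5;
    const sorted = fileIDs.slice().sort((a, b) => a - b);
    for (const id of sorted) {
        const text = String(id) + ",";
        for (let i = 0; i < text.length; i++) {
            h ^= text.charCodeAt(i);
            h = Math.imul(h, 0x01000193) >>> 0;
        }
    }
    return h.toString(16);
};

// Single source of truth for the key, including the version suffix.
const getSimilarImagesCacheKey = (
    distanceThreshold: number,
    fileIDs: number[],
): string =>
    `si_${distanceThreshold.toFixed(3)}_${hashFileIDs(fileIDs)}_v${clipIndexingVersion}`;

interface CachedGroups {
    id: string;
    groups: number[][];
}

// In-memory stand-in for the persistent cache.
const store = new Map<string, CachedGroups>();

const saveCache = (threshold: number, fileIDs: number[], groups: number[][]): void => {
    const id = getSimilarImagesCacheKey(threshold, fileIDs);
    store.set(id, { id, groups });
};

const loadCache = (threshold: number, fileIDs: number[]): CachedGroups | undefined =>
    store.get(getSimilarImagesCacheKey(threshold, fileIDs));

saveCache(0.04, [3, 1, 2], [[1, 2]]);
const hit = loadCache(0.04, [1, 2, 3]);
const miss = loadCache(0.02, [1, 2, 3]);
```

Because both paths derive the key from one function, bumping the indexing version invalidates old entries without either side needing to know the key layout.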
if (!_clipHNSWIndex) {
    console.log(`[HNSW] Creating new index with capacity: ${capacity}`);
    _clipHNSWIndex = new HNSWIndex(
        512, // CLIP embedding dimension
        capacity,
Recreate index when skipInit is requested
skipInit only takes effect when a new instance is created. If Similar Images is opened again in the same SPA session, _clipHNSWIndex is already initialized but callers still request getCLIPHNSWIndex(..., true) and then call loadIndex(). loadIndex() explicitly throws when the index is initialized, so the caller clears metadata and rebuilds every time. Consider destroying/recreating the index when skipInit is true on an existing instance, or bypassing loadIndex() when the in-memory index is already valid.
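The first option (recreating the index when `skipInit` is requested on an existing instance) could look roughly like this. `FakeHNSWIndex` and `getIndex` are hypothetical stand-ins that model only the behavior the review describes: `loadIndex()` throws when the index is already initialized, and `skipInit` matters only at construction time.

```typescript
// Hypothetical stand-in for the PR's HNSWIndex wrapper.
class FakeHNSWIndex {
    private initialized: boolean;
    readonly dim: number;
    readonly capacity: number;

    constructor(dim: number, capacity: number, skipInit = false) {
        this.dim = dim;
        this.capacity = capacity;
        this.initialized = !skipInit;
    }

    // Mirrors the described behavior: loading into an initialized index fails.
    loadIndex(): void {
        if (this.initialized) throw new Error("Index already initialized");
        this.initialized = true;
    }
}

let _index: FakeHNSWIndex | undefined;

const getIndex = (capacity: number, skipInit = false): FakeHNSWIndex => {
    // Recreate when the caller asks for an uninitialized instance to load into,
    // instead of handing back the already initialized singleton.
    if (!_index || skipInit) {
        _index = new FakeHNSWIndex(512, capacity, skipInit);
    }
    return _index;
};

// First visit to the page in this session.
const first = getIndex(1000, true);
first.loadIndex();

// Second visit in the same session: would throw without the `|| skipInit` check.
const second = getIndex(1000, true);
let secondLoadThrew = false;
try {
    second.loadIndex();
} catch {
    secondLoadThrew = true;
}
```

A real implementation would likely also need to release whatever resources the old instance holds before replacing it; that step is omitted here because the wrapper's cleanup API is not shown in this thread.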
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03ef360b69
f282ad6 to c9f4558
Similar Images: CLIP-based visual similarity detection
Summary
Implements a Similar Images feature that helps users find and manage visually similar photos using CLIP embeddings and HNSW indexing for efficient nearest-neighbor search.
Key capabilities:
UI Integration
Added to Sidebar → Free up space → Similar Images submenu, alongside:
This logical grouping makes it easy for users to discover cleanup tools in one place.
Technical Implementation
Similarity Detection
Category Thresholds
Matches mobile implementation:
Based on CLIP cosine distance where 0 = identical, 2 = opposite.
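For reference, a distance with that range is conventionally computed as one minus the cosine similarity. The sketch below assumes that definition; `cosineDistance` is a hypothetical helper, not the function used in the PR.

```typescript
// Cosine distance: 1 - (a . b) / (|a| * |b|), ranging from 0 to 2.
const cosineDistance = (a: ArrayLike<number>, b: ArrayLike<number>): number => {
    if (a.length !== b.length) throw new Error("Dimension mismatch");
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        const x = a[i] ?? 0;
        const y = b[i] ?? 0;
        dot += x * y;
        normA += x * x;
        normB += y * y;
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    if (denom === 0) throw new Error("Zero-length embedding");
    return 1 - dot / denom;
};

const identical = cosineDistance([1, 0], [1, 0]); // same direction
const orthogonal = cosineDistance([1, 0], [0, 1]); // unrelated
const opposite = cosineDistance([1, 0], [-1, 0]); // opposite direction
```

If the embeddings are already normalized to unit length, the denominator is 1 and the distance reduces to one minus the dot product.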
Smart Deletion Logic
"Best Photo" selection prioritizes keeping:
The first item in each group (best photo) is automatically protected from deletion. Users can also manually select/deselect individual items.
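The protection rule can be sketched as a default selection that skips the first entry of every group. `SimilarGroupSketch` and `defaultDeletionSelection` are hypothetical names, and the sketch assumes each group's file IDs are already ordered with the best photo first.

```typescript
// Hypothetical group shape: fileIDs[0] is the "best photo" of the group.
interface SimilarGroupSketch {
    fileIDs: number[];
}

// Everything except the first item of each group is selected for deletion.
const defaultDeletionSelection = (groups: SimilarGroupSketch[]): Set<number> => {
    const selected = new Set<number>();
    for (const group of groups) {
        for (const id of group.fileIDs.slice(1)) {
            selected.add(id);
        }
    }
    return selected;
};

const selection = defaultDeletionSelection([
    { fileIDs: [1, 2, 3] },
    { fileIDs: [4, 5] },
]);
const selectedIDs = Array.from(selection).sort((a, b) => a - b);
```

Manual selection and deselection would then add to or remove from this set.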
Collection Handling
Properly handles files in multiple collections:
Changes
New Files
- web/packages/new/photos/services/similar-images.ts - Core similarity detection (855 lines)
- web/packages/new/photos/services/similar-images-types.ts - Type definitions (89 lines)
- web/packages/new/photos/services/similar-images-delete.ts - Deletion logic (173 lines)
- web/packages/new/photos/pages/similar-images.tsx - UI page component (1203 lines)
- web/packages/new/photos/services/ml/hnsw.ts - HNSW index wrapper (456 lines)
- web/packages/new/photos/services/__tests__/similar-images.test.ts - Test suite (569 lines)
- web/apps/photos/src/pages/similar-images.tsx - Next.js route (10 lines)
Modified Files
- web/packages/new/photos/services/ml/db.ts - Added similar-images cache schema
- web/packages/new/photos/services/ml/clip.ts - Exported getCLIPIndexes alias
- web/packages/new/photos/components/Tiles.tsx - Added LargeFileTileOverlay export
- web/apps/photos/src/components/Sidebar.tsx - Added submenu integration
- web/packages/new/photos/services/search/types.ts - Added freeUpSpace.similarImages action
Performance Characteristics
Initial Analysis (first time):
Subsequent Analyses:
Search Performance: