-
Notifications
You must be signed in to change notification settings - Fork 1k
Cross-Origin Storage API Extension #1442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
ab41df3
19a6e03
f36e0bf
4cb6baf
ae95ed4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,71 @@ | ||||||
| class CrossOriginStorage { | ||||||
| static isAvailable = () => "crossOriginStorage" in navigator; | ||||||
|
|
||||||
| match = async (request) => { | ||||||
| const hashValue = await this._getFileHash(request); | ||||||
| if (!hashValue) { | ||||||
| return undefined; | ||||||
| } | ||||||
| const hash = { algorithm: "SHA-256", value: hashValue }; | ||||||
|
||||||
| try { | ||||||
| // @ts-expect-error | ||||||
| const [handle] = await navigator.crossOriginStorage.requestFileHandles([ | ||||||
| hash, | ||||||
| ]); | ||||||
| const blob = await handle.getFile(); | ||||||
| return new Response(blob); | ||||||
| } catch (err) { | ||||||
| return undefined; | ||||||
| } | ||||||
| }; | ||||||
| put = async (request, response) => { | ||||||
| const blob = await response.blob(); | ||||||
| const hash = await this._getBlobHash(blob); | ||||||
| // @ts-expect-error | ||||||
| const [handle] = await navigator.crossOriginStorage.requestFileHandles( | ||||||
| [hash], | ||||||
| { create: true }, | ||||||
| ); | ||||||
| const writableStream = await handle.createWritable(); | ||||||
| await writableStream.write(blob); | ||||||
| await writableStream.close(); | ||||||
| }; | ||||||
|
|
||||||
| _getFileHash = async (url) => { | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't "see" the requests for the ORT Wasm files. Those should be 100% cached in COS for guaranteed cache hits as any Transformers.js or ONNX Runtime Web uses the same few files. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this needs to happen in ORT, or you can of course do it "by hand". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Wasm file fetch might happen here (line 12), but not 100% sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes we have been meaning to "control" this on the Transformers.js side by loading and caching the binary, then pointing Just need to get around to adding it :) |
||||||
| if (/\/resolve\/main\/onnx\//.test(url)) { | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function is essentially scraping the website. Maybe leave the original comment from my code where this was linked to an explanation on the HF docs. Also see the comment above about future-proofing this for possible algorithm changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
@nico-martin Just realized this needs to be checking for just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @nico-martin Actually, the more I play with it, the more corner cases I discover. You can also point at the non- There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you sure thi model is supported by transformers.js? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, this is purely about the URL structure, not the model itself. The point is that the original code checked for |
||||||
| const rawUrl = url.replace(/\/resolve\//, "/raw/"); | ||||||
| const text = await fetch(rawUrl).then((response) => response.text()); | ||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This runs every time, which means you can't run fully offline. Instead, this should cache the mapping url=>hash and return the cached value. I had this in my initial implementation and remember there was some trickery needed to make it work with the actual URLs (I don't remember, but maybe it had to do with the post-redirect URLs that point at the CDN? Just copy what I had, this worked :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I did that deliberately. From my point of view, it's a question of separation of concerns/responsibilities. I dont think it is the responsibility of transformers.js to ensure that everything works offline. It is our responsibility to do our best to keep the download payload as little as possible. But here I dont this we need to cache this request since it is tiny. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. And stale-while-revalidate as a caching strategy for these "get SHA-256 hash" routes would work perfectly both for always being offline-capable and for never missing a new model. This should likely be added somewhere as a best practice in the docs, but for here: LGTM. |
||||||
| if (!text.includes("oid sha256:")) { | ||||||
| return null; | ||||||
| } | ||||||
| return text.replace(/.*?\n^oid sha256:(\w+)\n.*?$/gm, "$1") || null; | ||||||
| } | ||||||
| return null; | ||||||
| }; | ||||||
|
|
||||||
| _getBlobHash = async (blob) => { | ||||||
| const hashAlgorithmIdentifier = "SHA-256"; | ||||||
|
|
||||||
| // Get the contents of the blob as binary data contained in an ArrayBuffer. | ||||||
| const arrayBuffer = await blob.arrayBuffer(); | ||||||
|
|
||||||
| // Hash the arrayBuffer using SHA-256. | ||||||
| const hashBuffer = await crypto.subtle.digest( | ||||||
| hashAlgorithmIdentifier, | ||||||
| arrayBuffer, | ||||||
| ); | ||||||
|
|
||||||
| // Convert the ArrayBuffer to a hex string. | ||||||
| const hashArray = Array.from(new Uint8Array(hashBuffer)); | ||||||
| const hashHex = hashArray | ||||||
| .map((byte) => byte.toString(16).padStart(2, "0")) | ||||||
| .join(""); | ||||||
|
|
||||||
| return { | ||||||
| algorithm: hashAlgorithmIdentifier, | ||||||
| value: hashHex, | ||||||
| }; | ||||||
| }; | ||||||
| } | ||||||
|
|
||||||
| export default CrossOriginStorage; | ||||||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use
typeofcheck here, otherwise we get crashes in Node.js. For example, from the unit tests:See
transformers.js/src/env.js
Line 35 in fcf2ec9
for example