feat: HF /scan endpoint
#2566
Merged
Commits (51)
ec2f26d dave-gray101: start by checking /scan during the checksum update
629bd36 dave-gray101: add back in golang side features: downloader/uri gets struct and scan…
aafa7e3 dave-gray101: Merge branch 'master' into feat-hf-api-scan
4159c26 dave-gray101: add a param to scan specific urls - useful for debugging
e0261bb dave-gray101: helpful printouts
de1d9f6 dave-gray101: fix offsets
4cf3673 dave-gray101: fix error and naming
9106391 dave-gray101: expose error
8929d4c dave-gray101: fix json tags
af5a49b dave-gray101: slight wording change
198aa2a dave-gray101: Merge branch 'master' into feat-hf-api-scan
105355f dave-gray101: go mod tidy - getting warnings
53e30cd dave-gray101: Merge branch 'master' into feat-hf-api-scan
53f7d2b dave-gray101: split out python to make editing easier, add some simple code to del…
45f1d11 dave-gray101: Merge branch 'master' into feat-hf-api-scan
e7e6b3f dave-gray101: Merge branch 'master' into feat-hf-api-scan
1d9ded2 dave-gray101: Merge branch 'master' into feat-hf-api-scan
09538ea dave-gray101: Merge branch 'master' into feat-hf-api-scan
312e7f8 dave-gray101: Merge branch 'master' into feat-hf-api-scan
2e1cc01 dave-gray101: Merge branch 'master' into feat-hf-api-scan
c66a467 dave-gray101: Merge branch 'master' into feat-hf-api-scan
ffbd4cb dave-gray101: Merge branch 'master' into feat-hf-api-scan
60c589c dave-gray101: Merge branch 'master' into feat-hf-api-scan
23ad54b dave-gray101: Merge branch 'master' into feat-hf-api-scan
7df0ce6 dave-gray101: Merge branch 'master' into feat-hf-api-scan
f4cd408 dave-gray101: Merge branch 'master' into feat-hf-api-scan
157193d dave-gray101: Merge branch 'master' into feat-hf-api-scan
14005a1 dave-gray101: manual merge
68f6ef4 dave-gray101: Merge branch 'master' into feat-hf-api-scan
b662870 dave-gray101: Merge branch 'master' into feat-hf-api-scan
372bc83 dave-gray101: o7 to my favorite part of our old name, go-skynet
991532e dave-gray101: Merge branch 'master' into feat-hf-api-scan
1f3e942 dave-gray101: Merge branch 'master' into feat-hf-api-scan
245b6a8 dave-gray101: merge fix
ff55c99 dave-gray101: merge fix
cd43e9a dave-gray101: merge fix
a8889ec dave-gray101: Merge branch 'master' into feat-hf-api-scan
b89d61a dave-gray101: Merge branch 'master' into feat-hf-api-scan
729648d dave-gray101: Merge branch 'master' into feat-hf-api-scan
135a245 dave-gray101: Merge branch 'master' into feat-hf-api-scan
7e02416 dave-gray101: Merge branch 'master' into feat-hf-api-scan
8823490 dave-gray101: Merge branch 'master' into feat-hf-api-scan
082ccd8 dave-gray101: Merge branch 'master' into feat-hf-api-scan
950edef dave-gray101: address review comments
5e15661 dave-gray101: forgot secscan could accept multiple URL at once
cf89fd8 dave-gray101: invert naming and actually use it
355be77 dave-gray101: Merge branch 'master' into feat-hf-api-scan
ca65d65 dave-gray101: missed cli/models.go
3481934 dave-gray101: Merge branch 'feat-hf-api-scan' of ghgray101:/dave-gray101/LocalAI in…
700c631 dave-gray101: Update .github/check_and_update.py
7127029 mudler: Merge branch 'master' into feat-hf-api-scan
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| import hashlib | ||
| from huggingface_hub import hf_hub_download, get_paths_info | ||
| import requests | ||
| import sys | ||
| import os | ||
|
|
||
| uri = sys.argv[1] | ||
| file_name = uri.split('/')[-1] | ||
|
|
||
| # Function to parse the URI and determine download method | ||
| def parse_uri(uri): | ||
| if uri.startswith('huggingface://'): | ||
| repo_id = uri.split('://')[1] | ||
| return 'huggingface', repo_id.rsplit('/', 1)[0] | ||
| elif 'huggingface.co' in uri: | ||
| parts = uri.split('/resolve/') | ||
| if len(parts) > 1: | ||
| repo_path = parts[0].split('https://huggingface.co/')[-1] | ||
| return 'huggingface', repo_path | ||
| return 'direct', uri | ||
|
|
||
| def calculate_sha256(file_path): | ||
| sha256_hash = hashlib.sha256() | ||
| with open(file_path, 'rb') as f: | ||
| for byte_block in iter(lambda: f.read(4096), b''): | ||
| sha256_hash.update(byte_block) | ||
| return sha256_hash.hexdigest() | ||
|
|
||
| def manual_safety_check_hf(repo_id): | ||
| scanResponse = requests.get('https://huggingface.co/api/models/' + repo_id + "/scan") | ||
| scan = scanResponse.json() | ||
| if scan['hasUnsafeFile']: | ||
| return scan | ||
| return None | ||
|
|
||
| download_type, repo_id_or_url = parse_uri(uri) | ||
|
|
||
| new_checksum = None | ||
|
|
||
| # Decide download method based on URI type | ||
| if download_type == 'huggingface': | ||
| # Check if the repo is flagged as dangerous by HF | ||
| hazard = manual_safety_check_hf(repo_id_or_url) | ||
| if hazard is not None: | ||
| print(f'Error: HuggingFace has detected security problems for {repo_id_or_url}: {str(hazard)}', file=sys.stderr) | ||
| sys.exit(5) | ||
| # Use HF API to pull sha | ||
| for file in get_paths_info(repo_id_or_url, [file_name], repo_type='model'): | ||
| try: | ||
| new_checksum = file.lfs.sha256 | ||
| break | ||
| except Exception as e: | ||
| print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr) | ||
| sys.exit(2) | ||
| if new_checksum is None: | ||
| try: | ||
| file_path = hf_hub_download(repo_id=repo_id_or_url, filename=file_name) | ||
| except Exception as e: | ||
| print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr) | ||
| sys.exit(2) | ||
| else: | ||
| response = requests.get(repo_id_or_url) | ||
| if response.status_code == 200: | ||
| with open(file_name, 'wb') as f: | ||
| f.write(response.content) | ||
| file_path = file_name | ||
| elif response.status_code == 404: | ||
| print(f'File not found: {response.status_code}', file=sys.stderr) | ||
| sys.exit(2) | ||
| else: | ||
| print(f'Error downloading file: {response.status_code}', file=sys.stderr) | ||
| sys.exit(1) | ||
|
|
||
| if new_checksum is None: | ||
| new_checksum = calculate_sha256(file_path) | ||
| print(new_checksum) | ||
| os.remove(file_path) | ||
| else: | ||
| print(new_checksum) | ||
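
The diff view above drops leading whitespace, which makes the control flow of `parse_uri` easy to misread. The following is a minimal sketch with indentation restored. It assumes the final `return 'direct', uri` sits at function level, so a `huggingface.co` URL without a `/resolve/` segment falls through to a direct download. The example URIs are hypothetical and only illustrate the three cases.

```python
def parse_uri(uri):
    """Classify a model URI as a Hugging Face repo or a direct download."""
    if uri.startswith('huggingface://'):
        # huggingface://<owner>/<repo>/<file> -> drop the trailing file name
        repo_id = uri.split('://')[1]
        return 'huggingface', repo_id.rsplit('/', 1)[0]
    elif 'huggingface.co' in uri:
        # https://huggingface.co/<owner>/<repo>/resolve/<rev>/<file>
        parts = uri.split('/resolve/')
        if len(parts) > 1:
            repo_path = parts[0].split('https://huggingface.co/')[-1]
            return 'huggingface', repo_path
    # Assumed fall-through: anything else is fetched directly.
    return 'direct', uri


# Hypothetical URIs, one per branch.
examples = [
    'huggingface://owner/repo/model.gguf',
    'https://huggingface.co/owner/repo/resolve/main/model.gguf',
    'https://example.com/files/model.gguf',
]
for example in examples:
    print(parse_uri(example))
```

In both Hugging Face cases the second element is the repository id, which is the value the script later passes to the `/scan` request and to `get_paths_info`.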
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,23 +1,35 @@ | ||
| package cli | ||
|
|
||
| import ( | ||
| "encoding/json" | ||
| "errors" | ||
| "fmt" | ||
|
|
||
| "github.com/rs/zerolog/log" | ||
|
|
||
| cliContext "github.com/mudler/LocalAI/core/cli/context" | ||
| "github.com/mudler/LocalAI/core/config" | ||
| "github.com/mudler/LocalAI/core/gallery" | ||
| "github.com/mudler/LocalAI/pkg/downloader" | ||
| gguf "github.com/thxcode/gguf-parser-go" | ||
| ) | ||
|
|
||
| type UtilCMD struct { | ||
| GGUFInfo GGUFInfoCMD `cmd:"" name:"gguf-info" help:"Get information about a GGUF file"` | ||
| HFScan HFScanCMD `cmd:"" name:"hf-scan" help:"Checks installed models for known security issues. WARNING: this is a best-effort feature and may not catch everything!"` | ||
|
Owner commented: 💯
||
| } | ||
|
|
||
| type GGUFInfoCMD struct { | ||
| Args []string `arg:"" optional:"" name:"args" help:"Arguments to pass to the utility command"` | ||
| Header bool `optional:"" default:"false" name:"header" help:"Show header information"` | ||
| } | ||
|
|
||
| type HFScanCMD struct { | ||
| ModelsPath string `env:"LOCALAI_MODELS_PATH,MODELS_PATH" type:"path" default:"${basepath}/models" help:"Path containing models used for inferencing" group:"storage"` | ||
| Galleries string `env:"LOCALAI_GALLERIES,GALLERIES" help:"JSON list of galleries" group:"models" default:"${galleries}"` | ||
| ToScan []string `arg:""` | ||
| } | ||
|
|
||
| func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error { | ||
| if u.Args == nil || len(u.Args) == 0 { | ||
| return fmt.Errorf("no GGUF file provided") | ||
|
|
@@ -53,3 +65,37 @@ func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error { | |
|
|
||
| return nil | ||
| } | ||
|
|
||
| func (hfscmd *HFScanCMD) Run(ctx *cliContext.Context) error { | ||
| log.Info().Msg("LocalAI Security Scanner - This is BEST EFFORT functionality! Currently limited to huggingface models!") | ||
| if len(hfscmd.ToScan) == 0 { | ||
| log.Info().Msg("Checking all installed models against galleries") | ||
| var galleries []config.Gallery | ||
| if err := json.Unmarshal([]byte(hfscmd.Galleries), &galleries); err != nil { | ||
| log.Error().Err(err).Msg("unable to load galleries") | ||
| } | ||
|
|
||
| err := gallery.SafetyScanGalleryModels(galleries, hfscmd.ModelsPath) | ||
| if err == nil { | ||
| log.Info().Msg("No security warnings were detected for your installed models. Please note that this is a BEST EFFORT tool, and all issues may not be detected.") | ||
| } else { | ||
| log.Error().Err(err).Msg("! WARNING ! A known-vulnerable model is installed!") | ||
| } | ||
| return err | ||
| } else { | ||
| var errs error = nil | ||
| for _, uri := range hfscmd.ToScan { | ||
| log.Info().Str("uri", uri).Msg("scanning specific uri") | ||
| scanResults, err := downloader.HuggingFaceScan(uri) | ||
| if err != nil && !errors.Is(err, downloader.ErrNonHuggingFaceFile) { | ||
| log.Error().Err(err).Strs("clamAV", scanResults.ClamAVInfectedFiles).Strs("pickles", scanResults.DangerousPickles).Msg("! WARNING ! A known-vulnerable model is included in this repo!") | ||
| errs = errors.Join(errs, err) | ||
| } | ||
| } | ||
| if errs != nil { | ||
| return errs | ||
| } | ||
| log.Info().Msg("No security warnings were detected for your installed models. Please note that this is a BEST EFFORT tool, and all issues may not be detected.") | ||
| return nil | ||
| } | ||
| } | ||
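
The per-URI branch of `HFScanCMD.Run` follows a pattern worth spelling out: every URI is scanned, the "not a Hugging Face file" error is ignored, and all other errors are accumulated so that one bad URI does not hide the rest. The sketch below is a Python analogue of that control flow only, not the Go implementation. `NonHuggingFaceFile`, `scan_uris`, and `fake_scan` are hypothetical names introduced for illustration, and the stub makes no network calls.

```python
class NonHuggingFaceFile(Exception):
    """Hypothetical stand-in for downloader.ErrNonHuggingFaceFile."""


def scan_uris(uris, scan_one):
    """Scan every URI and return the list of errors worth reporting.

    scan_one is a caller-supplied function (hypothetical) that raises
    NonHuggingFaceFile for URIs it cannot check and any other exception
    when a scan fails or reports a problem.
    """
    problems = []
    for uri in uris:
        try:
            scan_one(uri)
        except NonHuggingFaceFile:
            # Not scannable: skipped rather than treated as a failure.
            continue
        except Exception as err:
            # Keep going so every flagged URI is reported, not just the first.
            problems.append(err)
    return problems


def fake_scan(uri):
    # Illustrative stub only: flags any URI containing "bad".
    if not uri.startswith('huggingface://'):
        raise NonHuggingFaceFile(uri)
    if 'bad' in uri:
        raise RuntimeError(f'unsafe files reported for {uri}')


found = scan_uris(
    [
        'huggingface://owner/good/model.gguf',
        'https://example.com/model.gguf',
        'huggingface://owner/bad/model.gguf',
    ],
    fake_scan,
)
print(len(found))
```

Collecting errors instead of returning on the first one matches the use of `errors.Join` in the diff: the caller gets one combined result after all URIs have been checked.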