
use weights_only in conversion script to prevent model arbitrary code execution#32

Merged
ggerganov merged 1 commit into ggml-org:master from deepdiffuser:master
Mar 12, 2023
Conversation

@deepdiffuser
Contributor

This restricts malicious weights from executing arbitrary code by restricting the unpickler to loading only tensors, primitive types, and dictionaries.

See the torch.load docs:

https://pytorch.org/docs/stable/generated/torch.load.html

I tested this and it seems to work the same as before.
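For illustration only: `weights_only=True` works by running the checkpoint's pickle stream through a restricted unpickler that refuses to resolve arbitrary globals. A minimal stdlib sketch of the same idea (the whitelist below is a made-up example for this sketch, not torch's actual allow-list):

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    # Hypothetical whitelist of (module, name) pairs that may be resolved.
    # Anything else (e.g. os.system smuggled in via __reduce__) is rejected
    # before it can be called.
    ALLOWED = {("builtins", "dict"), ("builtins", "list"),
               ("builtins", "int"), ("builtins", "str")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# A benign payload of primitives and dicts round-trips fine
# (plain containers never trigger find_class at all):
safe = pickle.dumps({"n_layers": 12, "name": "model"})
print(restricted_loads(safe))  # {'n_layers': 12, 'name': 'model'}

# A malicious payload that tries to invoke os.system is rejected at load time:
class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError:
    print("blocked")
```

The key point is that the command in the malicious payload never runs: the unpickler raises while resolving the global, before any call happens.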

@ggerganov ggerganov merged commit a931202 into ggml-org:master Mar 12, 2023
@wizzard0
Contributor

@deepdiffuser I do support this change, but now I get

TypeError: 'weights_only' is an invalid keyword argument for Unpickler()

Any ideas?

@deepdiffuser
Contributor Author

What version of PyTorch? I believe you need 1.13.1 for this arg.

@wizzard0
Contributor

Ah, I see. Conda gets you 1.12.1. Let's keep this thread for posterity.
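The version requirement from the exchange above can be checked up front instead of surfacing as a `TypeError` from the unpickler. A minimal sketch, assuming a torch-style version string such as `"1.12.1+cu113"` (the helper name is made up for this sketch):

```python
# Hypothetical helper: decide whether an installed torch is new enough
# for torch.load's weights_only kwarg (added in PyTorch 1.13).
def supports_weights_only(version: str) -> bool:
    # Keep only the numeric major.minor part; drop local tags like "+cu113".
    major, minor = (int(part) for part in version.split("+")[0].split(".")[:2])
    return (major, minor) >= (1, 13)

print(supports_weights_only("1.12.1"))        # False (the conda build above)
print(supports_weights_only("1.13.1+cu117"))  # True
```

In a real script this would gate the call: use `weights_only=True` when supported, and otherwise fail with a clear message asking the user to upgrade rather than silently falling back to unrestricted unpickling.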

flowgrad pushed a commit to flowgrad/llama.cpp that referenced this pull request Jun 27, 2023
Alcpz added a commit to Alcpz/llama.cpp that referenced this pull request Dec 1, 2025
Cherry pick of ggml-org#32

Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Zen4 flash attention: moving useful parts from the kq_fused_softmax branch

* Add flash attention with soft-cap and fix D = 256 case

* Flash attention refinements

* Update FlashAttn comment

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request Mar 26, 2026
…n data

Part of ggml-org#32: turbo3 prefill degrades relative to q8_0 with context length.

Changes so far:
- Skip ggml_cont when tensors already contiguous (+1%, minimal)
- Generated 32x32 rotation matrices (turbo-rotation-data-32.h) for
  reduced group size approach (16x less matmul compute)
- Fixed V un-rotation to check v->type not k->type

Next: update QK_TURBO3_GROUP, Metal WHT kernel, and KV cache for d=32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
didlawowo pushed a commit to didlawowo/llama.cpp that referenced this pull request Mar 27, 2026
…n data

Part of ggml-org#32: turbo3 prefill degrades relative to q8_0 with context length.

Changes so far:
- Skip ggml_cont when tensors already contiguous (+1%, minimal)
- Generated 32x32 rotation matrices (turbo-rotation-data-32.h) for
  reduced group size approach (16x less matmul compute)
- Fixed V un-rotation to check v->type not k->type

Next: update QK_TURBO3_GROUP, Metal WHT kernel, and KV cache for d=32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai