Skip to content

Prevent out-of-bounds memory access caused by corrupt tensor.ndim in gguf file#3359

Open
MillaFleurs wants to merge 5 commits intoml-explore:mainfrom
MillaFleurs:PR3358-get_shape-bounds-checking
Open

Prevent out-of-bounds memory access caused by corrupt tensor.ndim in gguf file#3359
MillaFleurs wants to merge 5 commits intoml-explore:mainfrom
MillaFleurs:PR3358-get_shape-bounds-checking

Conversation

@MillaFleurs
Copy link
Copy Markdown
Contributor

Proposed changes

Fix for #3358 get_shape() lacks bounds checking.

The GGUF tensor loader in Apple MLX trusts the ndim field from a crafted GGUF file without bounds-checking it against the fixed-size dim[] array in the gguf_tensor struct. In release builds, the upstream gguflib.c assert() guard is compiled out (NDEBUG), leaving no enforcement. MLX's get_shape() function iterates ndim times over the stack-allocated dim[] array (maximum 8 elements), reading beyond its bounds when ndim > 8.

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can prevent a bad/malicious model file.

For example the file could provide ndim: 4 with a tensor of 0 size, which would still crash the reading program.

@MillaFleurs
Copy link
Copy Markdown
Contributor Author

You're right that this doesn't prevent all malformed files — but it's not intended to. This specifically fixes an out-of-bounds memory read where ndim > 8 causes get_shape() to read past the end of the fixed-size dim[8] array into adjacent stack memory. A zero-dimension tensor with ndim=4 stays within array bounds and doesn't trigger the same class of bug. Additional validation for semantic issues like zero-size tensors could be a good follow-up, but is orthogonal to this memory safety fix.

Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, to be honest I think it should be checked in gguflib, can you send a PR there?

MillaFleurs and others added 2 commits April 3, 2026 21:32
@MillaFleurs
Copy link
Copy Markdown
Contributor Author

I made the requested changes @zcbenz . Thank you for the feedback it's extremely helpful and insightful.

@MillaFleurs
Copy link
Copy Markdown
Contributor Author

Thanks for the clarification, to be honest I think it should be checked in gguflib, can you send a PR there?

I will file a PR on github.com/antirez/gguf-tools as well that's a great idea. We can do both as well. By keeping the
MLX-side check we are not dependent on waiting for their fix.

Copy link
Copy Markdown
Collaborator

@zcbenz zcbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zcbenz zcbenz changed the title PR #3358 fixing get_shape() bounds checking and updating per gguf spe… Prevent out-of-bounds memory access caused by corrupt tensor.ndim in gguf file Apr 4, 2026
@zcbenz
Copy link
Copy Markdown
Collaborator

zcbenz commented Apr 4, 2026

The test does not work since the assertion works under debug build, I think we can #define NDEBUG when using gguflib or wait for gguflib to fix it, either way we shouldn't need the fix in this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants