Fix convert_hf_to_gguf.py script on s390x #17431

Merged
CISC merged 3 commits into ggml-org:master from AlekseiNikiforovIBM:s390x_hf_convert
Nov 25, 2025

Conversation

@AlekseiNikiforovIBM
Contributor

Assume the converted model data is originally little-endian. On s390x, byteswap the data after reading it so that the values are in the correct representation for any transformation needed, such as calculating weight tensors.

Then byteswap the data back to little-endian before passing it to GGUFWriter; GGUFWriter will byteswap the data to big-endian if big-endian output is requested.
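The read path described above can be sketched as follows. This is a minimal illustration using plain NumPy rather than the script's actual lazy-tensor machinery; the helper names are hypothetical, not part of the gguf package:

```python
import numpy as np

def to_native(le_bytes: bytes, dtype=np.float32) -> np.ndarray:
    """Interpret raw little-endian model data in the host's native byte order."""
    arr = np.frombuffer(le_bytes, dtype=np.dtype(dtype).newbyteorder("<"))
    # On a big-endian host (e.g. s390x) astype() performs a real byteswap;
    # on a little-endian host it is just a copy.
    return arr.astype(np.dtype(dtype).newbyteorder("="))

def to_little_endian(arr: np.ndarray) -> np.ndarray:
    """Convert a native-order array back to little-endian before writing."""
    return arr.astype(arr.dtype.newbyteorder("<"))

data = np.array([1.0, 2.0], dtype="<f4").tobytes()
native = to_native(data)              # safe to transform on any host
out = to_little_endian(native * 2.0)  # hand off to the writer as little-endian
```

The same two helpers give identical numeric results on either endianness, which is the point of normalizing to native order before any tensor math.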

byteswap(inplace=True) calls don't work with lazy tensor and array wrappers, so use the copying byteswap to work around this behaviour.
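A plain-NumPy analogue of the failure mode (a read-only buffer stands in for the lazy wrappers, which similarly reject in-place mutation):

```python
import numpy as np

# frombuffer over a bytes object yields a read-only array.
arr = np.frombuffer(np.array([1, 2], dtype="<u4").tobytes(), dtype="<u4")

try:
    arr.byteswap(inplace=True)   # mutation of a read-only view raises ValueError
    swapped_in_place = True
except ValueError:
    swapped_in_place = False

copied = arr.byteswap()          # the copying variant always works
```

The copying form allocates a new array, which costs memory bandwidth but never touches the original buffer.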

Make GGUFWriter accept tensors in native endianness instead of little-endian.

With this change, when no byteswapping is actually needed, two redundant byteswaps can be skipped on s390x.
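The optimization amounts to comparing the requested file endianness against the host's and swapping only on mismatch. A hedged sketch (the function name and signature are hypothetical, not the gguf API):

```python
import sys
import numpy as np

def prepare_for_writer(arr: np.ndarray, file_endian: str) -> np.ndarray:
    """file_endian is 'little' or 'big'. Swap only if it differs from the host."""
    if sys.byteorder == file_endian:
        return arr  # native order already matches the file: no byteswap at all
    # Swap the bytes and relabel the dtype so the numeric values are preserved.
    return arr.byteswap().view(arr.dtype.newbyteorder())
```

On a little-endian host writing little-endian output (the common case), this returns the input untouched, which is exactly the pair of byteswaps the PR eliminates.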

@CISC CISC requested a review from compilade November 21, 2025 16:07
Collaborator

@compilade compilade left a comment


Thanks! Did it ever work before or was it broken by #15667?

Comment on lines +10049 to +10053
torch.uint64: np.uint64,
torch.int32: np.int32,
torch.uint32: np.uint32,
torch.int16: np.int16,
torch.uint16: np.uint16,
Collaborator


Might be relevant to uncomment the unsigned int types in _dtype_str_map as well (U16, U32, U64) if those are expected to exist.

They seem to be available since PyTorch 2.3.0, while the requirements.txt has version 2.6.0, so it should be fine.
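For illustration, a hypothetical fragment of a safetensors-style dtype string map extended with the unsigned types discussed above; NumPy dtypes stand in for the torch ones here so the sketch is self-contained:

```python
import numpy as np

# Hypothetical map from safetensors dtype strings to numpy dtypes,
# including the unsigned entries (U16, U32, U64) mentioned in the review.
dtype_str_map = {
    "I16": np.int16, "U16": np.uint16,
    "I32": np.int32, "U32": np.uint32,
    "I64": np.int64, "U64": np.uint64,
}

def decode(name: str, raw: bytes) -> np.ndarray:
    # safetensors stores tensor data little-endian, hence the explicit "<".
    return np.frombuffer(raw, dtype=np.dtype(dtype_str_map[name]).newbyteorder("<"))
```

Reading with an explicit little-endian dtype keeps decoding correct on s390x as well.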

Contributor Author


I've mentioned those numpy types in case they're ever encountered. I'd be fine with updating _dtype_str_map too, but maybe that could be done separately?

@AlekseiNikiforovIBM
Contributor Author

Thanks! Did it ever work before or was it broken by #15667?

I don't know because I didn't test it before.

@AlekseiNikiforovIBM
Contributor Author

Is this change ok to merge with latest commit? If yes, how do I merge it?

@CISC
Member

CISC commented Nov 25, 2025

Is this change ok to merge with latest commit? If yes, how do I merge it?

Yes, LGTM, I'll merge.

@CISC CISC merged commit 05872ac into ggml-org:master Nov 25, 2025
6 of 7 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* Fix convert_hf_to_gguf.py script on s390x
* Make GGUFWriter accept tensors in native endianness instead of little-endian
* Fix byteswapping in convert_hf_to_gguf.py for remote models
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

Labels

python python script changes

3 participants