Fix convert_hf_to_gguf.py script on s390x#17431
Conversation
Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour.
…-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x
de7b5ec to
ed94707
Compare
| torch.uint64: np.uint64, | ||
| torch.int32: np.int32, | ||
| torch.uint32: np.uint32, | ||
| torch.int16: np.int16, | ||
| torch.uint16: np.uint16, |
There was a problem hiding this comment.
Might be relevant to uncomment the unsigned int types in _dtype_str_map as well (U16, U32, U64) if those are expected to exist.
They seem to be available since PyTorch 2.3.0, while the requirements.txt has version 2.6.0, so it should be fine.
There was a problem hiding this comment.
I've mentioned those types for numpy in case they're ever encountered. It would be fine for me to get _dtype_str_map updated, but maybe it could be done separately?
I don't know because I didn't test it before. |
|
Is this change ok to merge with latest commit? If yes, how do I merge it? |
Yes, LGTM, I'll merge. |
* Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models
* Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models
Assume converted model data is originally little-endian.
Byteswap data on s390x after reading it to put values in correct presentation
for any transformation needed, like calculating weight tensors.
Then byteswap data to little-endian before passing it to GGUFWriter while
GGUFWriter will byteswap data back to big endian if big endian output is requested.
byteswap(inplace=True) calls don't work with lazy tensor and array wrappers.
Use byteswap with copying data to workaround this behaviour.
Make GGUFWriter accept tensors in native endianness instead of little-endian.
With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x.