This repository was archived by the owner on Sep 9, 2025. It is now read-only.


@fabianlim
Collaborator

Dolomite uses the scaling factors m_emb, m_residual, and m_width, which are not part of the standard HF architecture.

So when performing export_to_huggingface_llama and import_from_huggingface_llama, we need to account for this scaling:

  • export_to_huggingface_llama: done for m_emb and m_residual, but not for m_width
  • import_from_huggingface_llama: not done

The key idea is to absorb each constant into specific parts of the weights. The difficulty with m_width is that the lm_head is tied to the embedding weights, so the two cannot be rescaled independently.
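To illustrate the absorption idea, here is a minimal NumPy sketch on a toy model. It is not the code in this PR. It assumes that m_emb multiplies the embedding output, m_residual multiplies each residual branch, and m_width divides the logits; the toy has a single linear residual branch and no normalization layers, and the function names and numeric values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 11, 4
# Illustrative values only; not taken from any real checkpoint.
m_emb, m_residual, m_width = 12.0, 0.22, 8.0

E = rng.normal(size=(vocab, d))  # input embedding matrix
W = rng.normal(size=(d, d))      # one linear residual branch (toy stand-in for a block)
tokens = np.array([3, 7, 1])


def scaled_forward(E, W, lm_head):
    """Toy forward pass with explicit scalings (assumed placement)."""
    h = m_emb * E[tokens]
    h = h + m_residual * (h @ W)
    return (h @ lm_head.T) / m_width


def plain_forward(E, W, lm_head):
    """Same toy forward pass with no scaling constants at all."""
    h = E[tokens]
    h = h + h @ W
    return h @ lm_head.T


# Reference: scaled model with a tied lm_head (lm_head is the embedding matrix).
reference = scaled_forward(E, W, lm_head=E)

# Untied export: each constant is absorbed into its own weight.
untied = plain_forward(m_emb * E, m_residual * W, lm_head=E / m_width)
untied_matches = np.allclose(reference, untied)

# Tied export: one matrix must serve as both embedding and lm_head,
# so it cannot carry m_emb and 1 / m_width at the same time.
tied = plain_forward(m_emb * E, m_residual * W, lm_head=m_emb * E)
tied_matches = np.allclose(reference, tied)

print("untied export matches:", untied_matches)
print("tied export matches:", tied_matches)
```

In this toy, the tied export is off by a factor of m_emb * m_width, so it only matches when that product is 1. A real model has normalization layers that may offer other places to absorb the constant, which this sketch does not cover.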

Still, here is a demo of how these match:

image

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim fabianlim marked this pull request as draft July 29, 2024 15:31
@fabianlim fabianlim requested a review from mayank31398 July 29, 2024 15:32
@fabianlim fabianlim assigned aldopareja and unassigned aldopareja Jul 29, 2024
@fabianlim fabianlim requested a review from aldopareja July 29, 2024 15:33

3 participants