@EduardDurech
Contributor

Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

  • xIELU Activation
  • QK-norm
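For readers unfamiliar with the two changes listed above, here is a minimal, dependency-free sketch of what they compute. It is not the code from this PR. The xIELU branch formulas follow the published formulation as I understand it; the coefficient values (`alpha_p`, `alpha_n`, `beta`) are illustrative placeholders, and real implementations typically keep them as learned, constrained parameters. The helper names `rms_norm` and `qk_norm_score` are hypothetical, and the sketch assumes QK-norm means RMS-normalising the query and key vectors over the head dimension before the scaled dot product (learned norm weights omitted).

```python
import math


def xielu(x, alpha_p=0.8, alpha_n=0.8, beta=0.5):
    """Scalar xIELU sketch: quadratic branch for x > 0, ELU-like branch otherwise.

    The coefficients are passed in directly here; treat the defaults as
    illustrative values, not as the ones used by Apertus.
    """
    if x > 0:
        return alpha_p * x * x + beta * x
    # expm1(x) = exp(x) - 1, which is numerically stable near zero.
    return alpha_n * (math.expm1(x) - x) + beta * x


def rms_norm(vec, eps=1e-6):
    """Root-mean-square normalisation of one vector (no learned weight)."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in vec]


def qk_norm_score(q, k):
    """Attention logit for one query/key pair, with both vectors normalised first."""
    qn = rms_norm(q)
    kn = rms_norm(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


if __name__ == "__main__":
    print(xielu(2.0), xielu(-1.0))
    # Rescaling the query barely changes the logit once QK-norm is applied.
    print(qk_norm_score([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
    print(qk_norm_score([10.0, 20.0, 30.0, 40.0], [4.0, 3.0, 2.0, 1.0]))
```

Both branches of `xielu` meet at zero with the same slope (`beta`), so the function is smooth at the origin. The second pair of prints illustrates the usual motivation for QK-norm: the attention logits stop depending on the raw magnitude of the query and key activations.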

@chiffa
chiffa commented Jul 14, 2025

@Cyrilvallez - this is part 1 of the PR from the Swiss AI Initiative

Collaborator

@ArthurZucker ArthurZucker left a comment

very nice and very transformers like! Do you mind using modular to isolate the changes?

@EduardDurech
Contributor Author

> very nice and very transformers like! Do you mind using modular to isolate the changes?

Yep, I was planning to, and Cyril suggested it too. I built from Alex's original implementation, but I'll refactor.

@ArthurZucker
Collaborator

It should not require too many changes, don't worry; it's already in an excellent state!

The tests are written for a config without default scaling (which is not the case in Apertus); besides, RoPE scaling is tested in other models, so it's all safe.
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: apertus, auto

Member

@Cyrilvallez Cyrilvallez left a comment

Alright! Very very nice, congrats! Super efficient work! Merging it now! 🤗

@ArthurZucker
Collaborator

Hey @EduardDurech do you think we can add integration tests now? 🤗

@EduardDurech
Contributor Author

@ArthurZucker yeah, we should be able to; the models are hosted on HF now. @dhia680 would you be able to? I'm too busy with RL for a bit.

@ArthurZucker
Collaborator

Nice! 🤗 We can also have a go if neither of you can!!

vermouth1992 pushed a commit to volcengine/verl that referenced this pull request Sep 13, 2025
Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

- xIELU Activation
- QK-norm

Associated Transformers PR
huggingface/transformers#39381
Associated vLLM PR vllm-project/vllm#23068
Associated SGLang PR sgl-project/sglang#9774

GSM8K
<img width="430" height="262" alt="image"
src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed"
/>
<img width="436" height="266" alt="image"
src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08"
/>
wlf-darkmatter pushed a commit to wlf-darkmatter/verl that referenced this pull request Sep 13, 2025
@EduardDurech
Contributor Author

@andresnowak has a draft PR for tests, #41037, if you want to check. In the meantime, there are xIELU CUDA parity issues (an already known issue) that I asked the group about; we'll see whether that's fixed beforehand and can be included.

VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
paolo328 added a commit to paolo328/Verl that referenced this pull request Nov 27, 2025