
Conversation


@AllenHaoHuang AllenHaoHuang commented Aug 13, 2025

Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

  • QK Norm
  • xIELU Activation
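
Of the two modifications, QK Norm is the more standard one: a normalization (typically RMSNorm) is applied per attention head to the query and key projections before rotary embeddings and attention, which bounds the scale of attention logits. A minimal NumPy sketch of the general technique — not the exact Apertus/vLLM implementation; the function names and shapes here are illustrative assumptions:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over the last dimension: x / sqrt(mean(x^2) + eps) * weight."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def qk_norm(q, k, q_weight, k_weight):
    """Apply RMSNorm to per-head query/key vectors before attention.

    q, k: arrays of shape (num_heads, head_dim) for a single token.
    """
    return rms_norm(q, q_weight), rms_norm(k, k_weight)
```

With `weight` initialized to ones, each normalized head vector has unit RMS, which keeps the q·k dot products from growing with the scale of the projections.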

Corresponding transformers PR whose outputs this vLLM implementation matches: huggingface/transformers#39381

The code passes the following tests with an early 1B checkpoint uploaded to Saesara/swissai. That checkpoint is not included as part of this PR because the naming is still being finalized; it will eventually be replaced with a checkpoint uploaded to https://huggingface.co/swiss-ai.

tests/models/registry.py
"ApertusForCausalLM": _HfExamplesInfo("Saesara/swissai", trust_remote_code=True),

tests/models/language/generation/test_common.py
pytest.param("Saesara/swissai"),

AllenHaoHuang and others added 30 commits April 6, 2025 07:20
* Update swissai.py

Replaced LlamaConfig with SwissAIConfig
Changed up_proj from RowParallelLinear to ColumnParallelLinear

* `ColumnParallelLinear` import

* `LLaMa` -> `SwissAI`

---------

Co-authored-by: EduardDurech <[email protected]>
Bugfixes in swiss-ai/main
vLLM uses float16 and recasts bfloat16 to float16
Removed unnecessary comments
Compatible with torch.dynamo; passes tests, matches HF and vLLM outputs
Temporarily commenting out tests for other models
vLLM xIELU fix
Mirrors transformers with an additional torch._dynamo.is_compiling() check
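
The `torch._dynamo.is_compiling()` check mentioned in the commit above exists so the custom CUDA op is skipped while `torch.compile` is tracing. A standalone sketch of that dispatch guard — hedged: `torch` is imported lazily here only so the snippet runs without it installed, and `use_cuda_kernel` is an illustrative helper name, not the vLLM code:

```python
def use_cuda_kernel(input_is_cuda: bool) -> bool:
    """Decide whether to dispatch to a fused CUDA kernel.

    Skip the kernel while TorchDynamo is tracing, since custom
    torch.classes ops may not be traceable into the compiled graph.
    """
    try:
        import torch
        compiling = torch._dynamo.is_compiling()
    except Exception:  # torch absent in this standalone sketch
        compiling = False
    return input_is_cuda and not compiling
```

In eager mode on a CUDA tensor this returns True; under compilation it returns False, so execution falls back to the pure-PyTorch path.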
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the new-model (Requests to new models) label Aug 13, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the Apertus model, which features a new xIELU activation function and QK normalization. The implementation of the Apertus model and its registration are well structured and align with the existing codebase. My review identifies a critical bug in the XIELU activation function that could cause a TypeError, and a high-severity issue where print is used instead of a proper logger, which is better practice for library code.

return result.view(original_shape)

def forward(self, input: torch.Tensor) -> torch.Tensor:
if self._xielu_cuda_obj is not None and input.is_cuda and not torch._dynamo.is_compiling():


critical

There is a potential TypeError in the forward method. If torch._dynamo.allow_in_graph is not available or fails to be imported, self._xielu_cuda_fn will remain None. However, self._xielu_cuda_obj could be non-None if xielu.ops was imported successfully. In this case, the condition self._xielu_cuda_obj is not None would pass, and the code would attempt to call self._xielu_cuda_fn(input), which would result in None(input), raising a TypeError.

To fix this, the condition should check self._xielu_cuda_fn's availability instead of self._xielu_cuda_obj.

Suggested change
if self._xielu_cuda_obj is not None and input.is_cuda and not torch._dynamo.is_compiling():
if self._xielu_cuda_fn is not None and input.is_cuda and not torch._dynamo.is_compiling():
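
The failure mode is easy to reproduce with plain Python objects. A minimal stand-in — the attribute names are copied from the review, but the class itself is a hypothetical stub, not the vLLM code — showing that guarding on the wrapped function rather than the raw CUDA object falls back cleanly instead of calling None:

```python
class XIELUStub:
    """Stub mimicking the two attributes discussed in the review."""

    def __init__(self, has_cuda_obj: bool, has_dynamo_fn: bool):
        # The CUDA op object can exist even when dynamo wrapping failed.
        self._xielu_cuda_obj = object() if has_cuda_obj else None
        self._xielu_cuda_fn = (lambda x: ("cuda", x)) if has_dynamo_fn else None

    def forward(self, x):
        # Guard on the callable itself, as the suggested change does.
        if self._xielu_cuda_fn is not None:
            return self._xielu_cuda_fn(x)
        return ("python", x)  # eager fallback path
```

`XIELUStub(True, False).forward(1.0)` takes the Python path; with the original guard on `_xielu_cuda_obj`, the same configuration would raise `TypeError: 'NoneType' object is not callable`.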

Comment on lines +55 to +59
except Exception as err:
print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")


high

Using print for warnings in a library is discouraged as it can interfere with the logging configuration of downstream applications. It's better to use the logging module for this.

Please replace the print calls with logger.warning. You'll need to add the following at the beginning of the file:

from vllm.logger import init_logger

logger = init_logger(__name__)
Suggested change
except Exception as err:
print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")
except Exception as err:
logger.warning(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
logger.warning(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the Apertus model, which includes a new xIELU activation function and QK normalization. The implementation looks solid, but I've identified a critical bug in the XIELU activation layer that could lead to a TypeError at runtime. My review includes a fix for this issue along with suggestions for improving logging and exception handling.

Comment on lines +46 to +59
self._xielu_cuda_obj = None
self._xielu_cuda_fn = None # Will be set if CUDA available
try:
import xielu.ops # noqa: F401

self._xielu_cuda_obj = torch.classes.xielu.XIELU()
try:
from torch._dynamo import allow_in_graph
self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
except Exception as err:
print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")


critical

There are a few issues in this block that could lead to runtime errors and maintenance difficulties:

  1. Critical Bug: If torch._dynamo.allow_in_graph cannot be imported or fails (e.g., on older PyTorch versions), self._xielu_cuda_fn remains None. However, the forward method will still attempt to call it if the CUDA object self._xielu_cuda_obj was successfully created, leading to a TypeError.
  2. Logging: Using print() for warnings is not ideal in a library. It's better to use the logging module for better control over verbosity and output streams. I recommend replacing print with logger.warning.
  3. Exception Handling: Catching the broad Exception can hide unexpected errors. It's better to catch more specific exceptions like ImportError and AttributeError.

I've provided a suggestion to fix the critical bug by setting a fallback for self._xielu_cuda_fn.

Suggested change
self._xielu_cuda_obj = None
self._xielu_cuda_fn = None # Will be set if CUDA available
try:
import xielu.ops # noqa: F401
self._xielu_cuda_obj = torch.classes.xielu.XIELU()
try:
from torch._dynamo import allow_in_graph
self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
except Exception as err:
print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")
self._xielu_cuda_obj = None
self._xielu_cuda_fn = None # Will be set if CUDA available
try:
import xielu.ops # noqa: F401
self._xielu_cuda_obj = torch.classes.xielu.XIELU()
self._xielu_cuda_fn = self._xielu_cuda
try:
from torch._dynamo import allow_in_graph
self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
except Exception as err:
print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
except Exception as err:
print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
"Install with: pip install git+https://github.com/nickjbrowning/XIELU")
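
Points 2 and 3 of the review above (logger instead of print, narrow exception types) combine into one small pattern. A hedged sketch using the stdlib `logging` module — the real vLLM code would use `vllm.logger.init_logger` as suggested earlier, and `load_optional_extension` is an illustrative helper name, not an actual vLLM function:

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def load_optional_extension(module_name: str):
    """Import an optional accelerator module, catching only the errors the
    review calls out (missing package, missing attribute) so unrelated bugs
    still surface instead of being swallowed by a bare `except Exception`."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as err:
        logger.warning(
            "Optional extension %s unavailable (%s); falling back to the "
            "pure-Python implementation.", module_name, err)
        return None
```

Note the `%s` lazy formatting: unlike an f-string, the message is only interpolated when the warning is actually emitted, which is the idiomatic style for library logging.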

AllenHaoHuang added a commit to swiss-ai/vllm that referenced this pull request Aug 15, 2025