Closed

Apertus #22810

34 commits
304e7f4
vllm support for swissai model
AllenHaoHuang Mar 22, 2025
88fc1a5
Merge pull request #2 from EduardDurech/v0.8.2
AllenHaoHuang Apr 6, 2025
f03bedf
SwissAI parallelization bugfix (#3)
AllenHaoHuang May 2, 2025
0d79f60
Merge pull request #1 from swiss-ai/main
AllenHaoHuang May 7, 2025
1323808
CUDA xIELU support
AllenHaoHuang May 28, 2025
0457a41
Changed bfloat16 to float16 and fixed warning
AllenHaoHuang May 28, 2025
2cf6bbf
Code working for XIELUfn but not XIELU (CUDA)
AllenHaoHuang May 29, 2025
6e44971
Update activation.py
AllenHaoHuang May 29, 2025
9c56ab9
Update activation.py
AllenHaoHuang May 29, 2025
99757fa
Fixing casting from f16 back to bf16
AllenHaoHuang Jun 8, 2025
d225c42
Integrating CUDA xIELU forward with vLLM
AllenHaoHuang Jun 26, 2025
fa815e6
Cleanup code
AllenHaoHuang Jul 2, 2025
cc49328
CUDA xIELU support for VLLM
AllenHaoHuang Jul 10, 2025
93ac959
Fix dtype and added input.is_cuda
AllenHaoHuang Jul 10, 2025
40f8c36
Update test_models.py
AllenHaoHuang Jul 14, 2025
0bf40d9
Update activation.py
AllenHaoHuang Jul 14, 2025
ae7df9b
Update activation.py
AllenHaoHuang Jul 21, 2025
37d11f7
Rename to Apertus
AllenHaoHuang Jul 30, 2025
55453ae
Rename to Apertus
AllenHaoHuang Jul 30, 2025
bc769a7
Rename to Apertus
AllenHaoHuang Jul 30, 2025
c9443f2
Rename to Apertus
AllenHaoHuang Jul 30, 2025
ce36321
Rename to Apertus
AllenHaoHuang Jul 30, 2025
4dae158
Update registry.py
AllenHaoHuang Aug 7, 2025
35ce6a2
Update test_models.py
AllenHaoHuang Aug 7, 2025
e005580
Merge pull request #2 from AllenHaoHuang/main
AllenHaoHuang Aug 7, 2025
6241bff
Add apertus to registry.py
AllenHaoHuang Aug 7, 2025
9da245d
add apertus to test_common.py
AllenHaoHuang Aug 7, 2025
cb326a1
Merge branch 'vllm-project:main' into apertus
AllenHaoHuang Aug 8, 2025
4268ad8
rename swissai to apertus
AllenHaoHuang Aug 8, 2025
7553f6a
Merge branch 'vllm-project:main' into apertus
AllenHaoHuang Aug 9, 2025
874249d
Update test_common.py
AllenHaoHuang Aug 9, 2025
89c05fa
Merge branch 'vllm-project:main' into apertus
AllenHaoHuang Aug 9, 2025
55d7d8f
Update test_common.py
AllenHaoHuang Aug 13, 2025
528f01a
Update registry.py
AllenHaoHuang Aug 13, 2025
75 changes: 75 additions & 0 deletions vllm/model_executor/layers/activation.py
@@ -16,9 +16,84 @@
from vllm.utils import LazyDict


@CustomOp.register("xielu")
class XIELU(CustomOp):
    """
    Applies the xIELU activation function introduced in https://arxiv.org/abs/2411.13010
    If the user has installed the nickjbrowning/XIELU wheel, we import xIELU CUDA

Check failure on line 23 in vllm/model_executor/layers/activation.py

GitHub Actions / pre-commit

Ruff (E501)

vllm/model_executor/layers/activation.py:23:81: E501 Line too long (81 > 80)
    Otherwise, we emit a single warning and use xIELU Python
    """
    def __init__(
        self,
        alpha_p_init: float = 0.8,
        alpha_n_init: float = 0.8,
        beta: float = 0.5,
        eps: float = -1e-6,
        with_vector_loads: bool = True,
    ) -> None:
        super().__init__()
        # Initialize parameters
        self.alpha_p = nn.Parameter(
            torch.log(torch.exp(torch.tensor(alpha_p_init)) - 1).unsqueeze(0))
        self.alpha_n = nn.Parameter(
            torch.log(torch.exp(torch.tensor(alpha_n_init - beta)) - 1).unsqueeze(0))

        # Register beta and eps as buffers (fixed tensors)
        self.register_buffer('beta', torch.tensor(beta), persistent=False)
        self.register_buffer('eps', torch.tensor(eps), persistent=False)
        self.with_vector_loads = with_vector_loads

        self._xielu_cuda_obj = None
        self._xielu_cuda_fn = None  # Will be set if CUDA available
        try:
            import xielu.ops  # noqa: F401

            self._xielu_cuda_obj = torch.classes.xielu.XIELU()
            try:
                from torch._dynamo import allow_in_graph
                self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
            except Exception as err:
                print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
        except Exception as err:
            print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
                  "Install with: pip install git+https://github.com/nickjbrowning/XIELU")

Check failure on line 59 in vllm/model_executor/layers/activation.py

GitHub Actions / pre-commit

Ruff (E501)

vllm/model_executor/layers/activation.py:59:81: E501 Line too long (112 > 80)
Comment on lines +55 to +59
Contributor

high

Using print for warnings in a library is discouraged as it can interfere with the logging configuration of downstream applications. It's better to use the logging module for this.

Please replace the print calls with logger.warning. You'll need to add the following at the beginning of the file:

from vllm.logger import init_logger

logger = init_logger(__name__)
Suggested change
-            except Exception as err:
-                print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
-        except Exception as err:
-            print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
-                  "Install with: pip install git+https://github.com/nickjbrowning/XIELU")
+            except Exception as err:
+                logger.warning(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
+        except Exception as err:
+            logger.warning(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
+                           "Install with: pip install git+https://github.com/nickjbrowning/XIELU")

Comment on lines +46 to +59
Contributor

critical

There are a few issues in this block that could lead to runtime errors and maintenance difficulties:

  1. Critical Bug: If torch._dynamo.allow_in_graph cannot be imported or fails (e.g., on older PyTorch versions), self._xielu_cuda_fn remains None. However, the forward method will still attempt to call it if the CUDA object self._xielu_cuda_obj was successfully created, leading to a TypeError.
  2. Logging: Using print() for warnings is not ideal in a library. It's better to use the logging module for better control over verbosity and output streams. I recommend replacing print with logger.warning.
  3. Exception Handling: Catching the broad Exception can hide unexpected errors. It's better to catch more specific exceptions like ImportError and AttributeError.

I've provided a suggestion to fix the critical bug by setting a fallback for self._xielu_cuda_fn.

Suggested change
-        self._xielu_cuda_obj = None
-        self._xielu_cuda_fn = None  # Will be set if CUDA available
-        try:
-            import xielu.ops  # noqa: F401
-            self._xielu_cuda_obj = torch.classes.xielu.XIELU()
-            try:
-                from torch._dynamo import allow_in_graph
-                self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
-            except Exception as err:
-                print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
-        except Exception as err:
-            print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
-                  "Install with: pip install git+https://github.com/nickjbrowning/XIELU")
+        self._xielu_cuda_obj = None
+        self._xielu_cuda_fn = None  # Will be set if CUDA available
+        try:
+            import xielu.ops  # noqa: F401
+            self._xielu_cuda_obj = torch.classes.xielu.XIELU()
+            self._xielu_cuda_fn = self._xielu_cuda
+            try:
+                from torch._dynamo import allow_in_graph
+                self._xielu_cuda_fn = allow_in_graph(self._xielu_cuda)
+            except Exception as err:
+                print(f"Could not enable torch._dynamo for xIELU ({err}) - this may result in slower performance.")
+        except Exception as err:
+            print(f"CUDA-fused xIELU not available ({err}) - using Python implementation. "
+                  "Install with: pip install git+https://github.com/nickjbrowning/XIELU")
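
The fallback pattern in this suggestion can be exercised without PyTorch or the XIELU wheel. The toy class below is a sketch under stated assumptions: `OptionalFastPath`, its methods, and the module names passed to it are all hypothetical, and `importlib.import_module` stands in for the two optional imports. The idea it demonstrates is the one from the review: assign a plain callable as soon as the backend loads, treat the decoration step as optional, and narrow the caught exceptions.

```python
import importlib
from typing import Callable, Optional


class OptionalFastPath:
    """Toy stand-in for XIELU: prefer a fast path, fall back to a slow one."""

    def __init__(self, backend_module: str, decorator_module: str) -> None:
        self._fast_fn: Optional[Callable[[float], float]] = None
        try:
            # Plays the role of `import xielu.ops`.
            importlib.import_module(backend_module)
        except ImportError:
            return  # no fast path; forward() will use the slow path

        # Assign a usable callable first, so that a failure in the optional
        # step below cannot leave the attribute as None.
        self._fast_fn = self._fast
        try:
            # Plays the role of `allow_in_graph`; `decorate` is hypothetical.
            decorate = importlib.import_module(decorator_module).decorate
            self._fast_fn = decorate(self._fast)
        except (ImportError, AttributeError):
            pass  # keep the undecorated fast path

    def _fast(self, x: float) -> float:
        return 2.0 * x

    def _slow(self, x: float) -> float:
        return x + x

    def forward(self, x: float) -> float:
        # Guard on the callable that is actually invoked.
        if self._fast_fn is not None:
            return self._fast_fn(x)
        return self._slow(x)


# Backend missing: slow path. Backend present, decorator missing: fast path.
no_backend = OptionalFastPath("no_such_backend_xyz", "no_such_decorator_xyz")
backend_only = OptionalFastPath("math", "no_such_decorator_xyz")
```

In both configurations `forward` returns a value instead of calling `None`, which is the failure mode the comment describes.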


    def _xielu_python(self, x: torch.Tensor) -> torch.Tensor:
        alpha_p = F.softplus(self.alpha_p)
        alpha_n = self.beta + F.softplus(self.alpha_n)

Check failure on line 63 in vllm/model_executor/layers/activation.py

GitHub Actions / pre-commit

Ruff (E501)

vllm/model_executor/layers/activation.py:63:81: E501 Line too long (89 > 80)
        return torch.where(
            x > 0,
            alpha_p * x * x + self.beta * x,
            (torch.expm1(torch.min(x, self.eps)) - x) * alpha_n + self.beta * x,
        )

    def _xielu_cuda(self, x: torch.Tensor) -> torch.Tensor:
        """Firewall function to prevent torch.compile from seeing .item() calls"""
        original_shape = x.shape
        # CUDA kernel expects 3D tensors, reshape if needed
        while x.dim() < 3:
            x = x.unsqueeze(0)
        if x.dim() > 3:
            x = x.view(-1, 1, x.size(-1))
        result = self._xielu_cuda_obj.forward(

Check failure on line 78 in vllm/model_executor/layers/activation.py

GitHub Actions / pre-commit

Ruff (E501)

vllm/model_executor/layers/activation.py:78:81: E501 Line too long (82 > 80)
            x,
            self.alpha_p,
            self.alpha_n,
            self.beta.item(),
            self.eps.item(),
            self.with_vector_loads,
        )
        return result.view(original_shape)
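
The reshaping in `_xielu_cuda` can be checked without a GPU by tracking shapes alone. The sketch below is an illustration only: `kernel_shape` is a hypothetical helper, not part of the PR, and it mirrors the `unsqueeze(0)` loop and the `view(-1, 1, x.size(-1))` call on plain tuples.

```python
from math import prod
from typing import Tuple


def kernel_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the 3-D shape that the reshaping logic would hand to the kernel."""
    dims = list(shape)
    # Mirrors `while x.dim() < 3: x = x.unsqueeze(0)`: pad with leading 1s.
    while len(dims) < 3:
        dims.insert(0, 1)
    # Mirrors `x.view(-1, 1, x.size(-1))`: fold all leading dims into one.
    if len(dims) > 3:
        dims = [prod(dims[:-1]), 1, dims[-1]]
    return tuple(dims)


shapes = {
    "1-D": kernel_shape((4096,)),
    "2-D": kernel_shape((8, 4096)),
    "3-D": kernel_shape((2, 3, 4096)),
    "4-D": kernel_shape((2, 3, 5, 4096)),
}
```

Because the last dimension is always preserved and `result.view(original_shape)` restores the rest, the element count is unchanged in every case.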

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        if self._xielu_cuda_obj is not None and input.is_cuda and not torch._dynamo.is_compiling():
Contributor

critical

There is a potential TypeError in the forward method. If torch._dynamo.allow_in_graph is not available or fails to be imported, self._xielu_cuda_fn will remain None. However, self._xielu_cuda_obj could be non-None if xielu.ops was imported successfully. In this case, the condition self._xielu_cuda_obj is not None would pass, and the code would attempt to call self._xielu_cuda_fn(input), which would result in None(input), raising a TypeError.

To fix this, the condition should check self._xielu_cuda_fn's availability instead of self._xielu_cuda_obj.

Suggested change
-        if self._xielu_cuda_obj is not None and input.is_cuda and not torch._dynamo.is_compiling():
+        if self._xielu_cuda_fn is not None and input.is_cuda and not torch._dynamo.is_compiling():

            return self._xielu_cuda_fn(input)
        return self._xielu_python(input)
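
For reference, the arithmetic of the Python fallback path can be reproduced on scalars with the standard library alone. This is a minimal sketch, not the PR's implementation: `softplus`, `inverse_softplus`, and `xielu_scalar` are illustrative names, and the defaults are copied from the `__init__` signature in the diff. It also makes the initialization visible: `log(exp(v) - 1)` is the inverse of softplus, so the effective `alpha_p` and `alpha_n` both start at their `*_init` values.

```python
import math


def softplus(v: float) -> float:
    """softplus(v) = log(1 + exp(v))."""
    return math.log1p(math.exp(v))


def inverse_softplus(v: float) -> float:
    """Inverse of softplus for v > 0: log(exp(v) - 1), as used in __init__."""
    return math.log(math.expm1(v))


def xielu_scalar(x: float, raw_alpha_p: float, raw_alpha_n: float,
                 beta: float = 0.5, eps: float = -1e-6) -> float:
    """Scalar version of the expression in `_xielu_python`."""
    alpha_p = softplus(raw_alpha_p)
    alpha_n = beta + softplus(raw_alpha_n)
    if x > 0:
        return alpha_p * x * x + beta * x
    return (math.expm1(min(x, eps)) - x) * alpha_n + beta * x


# Raw parameters as stored by __init__ for alpha_p_init = alpha_n_init = 0.8.
raw_p = inverse_softplus(0.8)
raw_n = inverse_softplus(0.8 - 0.5)

y_pos = xielu_scalar(1.0, raw_p, raw_n)   # quadratic branch
y_neg = xielu_scalar(-1.0, raw_p, raw_n)  # expm1 branch
```

With these defaults the positive branch at `x = 1.0` evaluates to `alpha_p + beta`, that is, approximately 1.3.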


@CustomOp.register("fatrelu_and_mul")
class FatreluAndMul(CustomOp):
    """An activation function for FATReLU.

Check failure on line 96 in vllm/model_executor/layers/activation.py

GitHub Actions / pre-commit

Ruff (E501)

vllm/model_executor/layers/activation.py:96:81: E501 Line too long (97 > 80)

    The function computes x -> FATReLU(x[:d]) * x[d:] where
    d = x.shape[-1] // 2.