
Conversation

@McLavish (Owner) commented Nov 5, 2025

First implementation of an LLM Language GPU Benchmark

@McLavish self-assigned this Nov 5, 2025

Copilot AI left a comment


Pull Request Overview

This PR adds a new inference benchmark (412.language-bert) that performs sentence classification using a compact BERT model served via ONNX Runtime. The benchmark is added to the serverless benchmarks suite and integrates with the existing testing infrastructure. Additionally, the PR updates the benchmark data repository URL to a forked version.

  • Added new BERT-based language inference benchmark for sentence classification
  • Updated benchmark data repository references from spcl/serverless-benchmarks-data to McLavish/serverless-benchmarks-data-dphpc
  • Enhanced mypy configuration to ignore missing imports for docker submodules
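
To make the review concrete, here is a rough sketch of the inference path such a handler typically implements with tokenizers and ONNX Runtime. The paths, tensor names, and the classify helper are illustrative assumptions, not code taken from the PR:

import os

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIRECTORY = "/tmp/model"  # hypothetical location; the PR's actual path may differ

# Load the tokenizer and model once per container so warm invocations reuse them.
_tokenizer = Tokenizer.from_file(os.path.join(MODEL_DIRECTORY, "tokenizer.json"))
# CPU provider keeps the sketch runnable anywhere; the PR itself pins CUDAExecutionProvider.
_session = ort.InferenceSession(
    os.path.join(MODEL_DIRECTORY, "model.onnx"), providers=["CPUExecutionProvider"]
)

def classify(sentence: str) -> int:
    # Encode the sentence into token IDs plus an attention mask.
    enc = _tokenizer.encode(sentence)
    feeds = {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        # Some exported BERT graphs also expect token_type_ids; drop if absent.
        "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
    }
    logits = _session.run(None, feeds)[0]
    # The predicted class is the index of the largest logit.
    return int(np.argmax(logits, axis=-1)[0])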

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Summary per file:

sebs/regression.py: Adds "412.language-bert" to the Python benchmarks list for regression testing
install.py: Updates git clone URL to forked benchmark data repository
.gitmodules: Updates submodule URL to forked benchmark data repository
docs/benchmarks.md: Adds documentation entry and description for the new language inference benchmark
benchmarks/400.inference/412.language-bert/python/requirements.txt*: Defines Python dependencies (numpy, onnxruntime-gpu, tokenizers) for multiple Python versions
benchmarks/400.inference/412.language-bert/python/package.sh: Provides packaging script for stripping unnecessary files and handling torch dependencies
benchmarks/400.inference/412.language-bert/python/init.sh: Initialization script (no additional setup required)
benchmarks/400.inference/412.language-bert/python/function.py: Main benchmark implementation with model loading, tokenization, and inference logic
benchmarks/400.inference/412.language-bert/input.py: Input generation and file upload utilities for the benchmark
benchmarks/400.inference/412.language-bert/config.json: Benchmark configuration specifying timeout, memory, languages, and required modules
.mypy.ini: Adds ignore rule for docker.* submodules to prevent type checking errors


Comment on lines +42 to +43
with tarfile.open(archive_path, "r:gz") as tar:
    tar.extractall(MODEL_DIRECTORY)

Copilot AI Nov 5, 2025


Using tar.extractall() without validation is vulnerable to path traversal attacks. Malicious archives could extract files outside the intended directory. Use tar.extractall(MODEL_DIRECTORY, filter='data') (Python 3.12+) or manually validate each member's path before extraction for older Python versions.
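
For pre-3.12 interpreters, here is a minimal sketch of the manual validation Copilot describes; safe_extractall is a hypothetical helper, not part of the PR:

import os
import tarfile

def safe_extractall(archive_path: str, dest_dir: str) -> None:
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and security backports): reject escaping members.
            tar.extractall(dest_dir, filter="data")
        else:
            # Resolve each member's target and confirm it stays inside
            # dest_dir before extracting anything.
            dest_root = os.path.realpath(dest_dir)
            for member in tar.getmembers():
                target = os.path.realpath(os.path.join(dest_dir, member.name))
                if os.path.commonpath([dest_root, target]) != dest_root:
                    raise RuntimeError(f"Blocked path traversal: {member.name}")
            tar.extractall(dest_dir)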

@McLavish (Owner, Author)


I don't care

Comment on lines +63 to +66
if "CUDAExecutionProvider" not in available:
raise RuntimeError(f"CUDAExecutionProvider unavailable (have: {available})")

_session = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])

Copilot AI Nov 5, 2025


The code requires CUDAExecutionProvider but the benchmark uses onnxruntime-gpu. This creates a hard dependency on GPU availability, which may fail in CPU-only environments. Consider falling back to CPUExecutionProvider if CUDA is unavailable, or document this GPU requirement clearly in the benchmark configuration. The existing 411.image-recognition benchmark uses CPU-only inference for broader compatibility.

Suggested change
if "CUDAExecutionProvider" not in available:
raise RuntimeError(f"CUDAExecutionProvider unavailable (have: {available})")
_session = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])
if "CUDAExecutionProvider" in available:
providers = ["CUDAExecutionProvider"]
print("Using CUDAExecutionProvider for ONNX Runtime inference.")
else:
providers = ["CPUExecutionProvider"]
print("CUDAExecutionProvider unavailable, falling back to CPUExecutionProvider for ONNX Runtime inference.")
_session = ort.InferenceSession(onnx_path, providers=providers)

@McLavish (Owner, Author)


I don't care

@@ -0,0 +1,3 @@
+numpy==1.24.4
+onnxruntime-gpu==1.16.3

Copilot AI Nov 5, 2025


Using onnxruntime-gpu requires CUDA dependencies and may not be compatible with all serverless environments. Consider using onnxruntime (CPU version) for better portability across different cloud platforms, or provide separate CPU and GPU requirement files. The code on lines 63-64 of function.py enforces the GPU requirement, but many serverless platforms don't provide GPU access by default.

Suggested change
-onnxruntime-gpu==1.16.3
+onnxruntime==1.16.3
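
One way to realize the separate-requirements idea while keeping the versions this PR pins; the file names are hypothetical, and tokenizers (also required by the benchmark) would be pinned identically in both files:

# requirements.cpu.txt (hypothetical): portable, CPU-only inference
numpy==1.24.4
onnxruntime==1.16.3

# requirements.gpu.txt (hypothetical): CUDA-capable environments
numpy==1.24.4
onnxruntime-gpu==1.16.3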

@McLavish (Owner, Author)


I don't care

Comment on lines +28 to +35
TORCH_DIR=".python_packages/lib/site-packages/torch"
if [ -d "$1/${TORCH_DIR}" ]; then
    cd $1
    zip -qr torch.zip ${TORCH_DIR}
    rm -rf ${TORCH_DIR}
    cd ${CUR_DIR}
    echo "Torch-zipped size $(du -sh $1 | cut -f1)"
fi

Copilot AI Nov 5, 2025


This benchmark packages torch despite not using PyTorch in its requirements or implementation (uses ONNX Runtime instead). The torch packaging logic appears to be copied from the 411.image-recognition benchmark but is unnecessary here. Consider removing lines 28-35 to avoid confusion and reduce package size.

Suggested change
-TORCH_DIR=".python_packages/lib/site-packages/torch"
-if [ -d "$1/${TORCH_DIR}" ]; then
-    cd $1
-    zip -qr torch.zip ${TORCH_DIR}
-    rm -rf ${TORCH_DIR}
-    cd ${CUR_DIR}
-    echo "Torch-zipped size $(du -sh $1 | cut -f1)"
-fi

@McLavish (Owner, Author)


NO

 # clone
 else:
-    execute(f"git clone https://github.com/spcl/serverless-benchmarks-data.git {data_dir}")
+    execute(f"git clone https://github.com/McLavish/serverless-benchmarks-data-dphpc.git {data_dir}")

Copilot AI Nov 5, 2025


The repository URL change from spcl/serverless-benchmarks-data to McLavish/serverless-benchmarks-data-dphpc suggests this is a fork for a specific project (dphpc). Using a personal fork in production code may cause maintenance issues if the fork becomes outdated or unavailable. Consider using the official repository or clearly documenting why this fork is necessary.

Suggested change
execute(f"git clone https://github.com/McLavish/serverless-benchmarks-data-dphpc.git {data_dir}")
execute(f"git clone https://github.com/spcl/serverless-benchmarks-data.git {data_dir}")

@McLavish (Owner, Author)


who

 [submodule "benchmarks-data"]
     path = benchmarks-data
-    url = https://github.com/spcl/serverless-benchmarks-data.git
+    url = https://github.com/McLavish/serverless-benchmarks-data-dphpc.git

Copilot AI Nov 5, 2025


The submodule URL change to a personal fork (McLavish/serverless-benchmarks-data-dphpc) may cause synchronization issues with the main project. If this fork contains essential data for the new benchmark, consider contributing it back to the upstream repository or documenting the dependency clearly in the project README.

@McLavish (Owner, Author)


asked

@McLavish merged commit c478c91 into development Nov 5, 2025
7 checks passed
@McLavish added this to the benchmark implementation 1 milestone Nov 6, 2025
