
Conversation

@McLavish (Owner) commented Nov 5, 2025

First implementation of an LLM Language GPU Benchmark

@McLavish self-assigned this Nov 5, 2025

Copilot AI left a comment


Pull Request Overview

This PR adds a new inference benchmark (412.language-bert) that performs sentence classification using a compact BERT model served via ONNX Runtime. The benchmark is added to the serverless benchmarks suite and integrates with the existing testing infrastructure. Additionally, the PR updates the benchmark data repository URL to a forked version.

  • Added new BERT-based language inference benchmark for sentence classification
  • Updated benchmark data repository references from spcl/serverless-benchmarks-data to McLavish/serverless-benchmarks-data-dphpc
  • Enhanced mypy configuration to ignore missing imports for docker submodules
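
To make the review concrete, here is a rough sketch of the inference path such a handler typically implements with tokenizers and ONNX Runtime. The paths, tensor names, and the classify helper are illustrative assumptions, not code taken from the PR:

import os

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

MODEL_DIRECTORY = "/tmp/model"  # hypothetical location; the PR's actual path may differ

# Load the tokenizer and model once per container so warm invocations reuse them.
_tokenizer = Tokenizer.from_file(os.path.join(MODEL_DIRECTORY, "tokenizer.json"))
# CPU provider keeps the sketch runnable anywhere; the PR itself pins CUDAExecutionProvider.
_session = ort.InferenceSession(
    os.path.join(MODEL_DIRECTORY, "model.onnx"), providers=["CPUExecutionProvider"]
)

def classify(sentence: str) -> int:
    # Encode the sentence into token IDs plus an attention mask.
    enc = _tokenizer.encode(sentence)
    feeds = {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        # Some exported BERT graphs also expect token_type_ids; drop if absent.
        "token_type_ids": np.array([enc.type_ids], dtype=np.int64),
    }
    logits = _session.run(None, feeds)[0]
    # The predicted class is the index of the largest logit.
    return int(np.argmax(logits, axis=-1)[0])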

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Summary per file:

sebs/regression.py: Adds "412.language-bert" to the Python benchmarks list for regression testing
install.py: Updates git clone URL to forked benchmark data repository
.gitmodules: Updates submodule URL to forked benchmark data repository
docs/benchmarks.md: Adds documentation entry and description for the new language inference benchmark
benchmarks/400.inference/412.language-bert/python/requirements.txt*: Defines Python dependencies (numpy, onnxruntime-gpu, tokenizers) for multiple Python versions
benchmarks/400.inference/412.language-bert/python/package.sh: Provides packaging script for stripping unnecessary files and handling torch dependencies
benchmarks/400.inference/412.language-bert/python/init.sh: Initialization script (no additional setup required)
benchmarks/400.inference/412.language-bert/python/function.py: Main benchmark implementation with model loading, tokenization, and inference logic
benchmarks/400.inference/412.language-bert/input.py: Input generation and file upload utilities for the benchmark
benchmarks/400.inference/412.language-bert/config.json: Benchmark configuration specifying timeout, memory, languages, and required modules
.mypy.ini: Adds ignore rule for docker.* submodules to prevent type checking errors


Comment on lines +42 to +43
with tarfile.open(archive_path, "r:gz") as tar:
    tar.extractall(MODEL_DIRECTORY)

Copilot AI Nov 5, 2025


Using tar.extractall() without validation is vulnerable to path traversal attacks. Malicious archives could extract files outside the intended directory. Use tar.extractall(MODEL_DIRECTORY, filter='data') (Python 3.12+) or manually validate each member's path before extraction for older Python versions.
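
For pre-3.12 interpreters, here is a minimal sketch of the manual validation Copilot describes; safe_extractall is a hypothetical helper, not part of the PR:

import os
import tarfile

def safe_extractall(archive_path: str, dest_dir: str) -> None:
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and security backports): reject escaping members.
            tar.extractall(dest_dir, filter="data")
        else:
            # Resolve each member's target and confirm it stays inside
            # dest_dir before extracting anything.
            dest_root = os.path.realpath(dest_dir)
            for member in tar.getmembers():
                target = os.path.realpath(os.path.join(dest_dir, member.name))
                if os.path.commonpath([dest_root, target]) != dest_root:
                    raise RuntimeError(f"Blocked path traversal: {member.name}")
            tar.extractall(dest_dir)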

@McLavish (Owner, Author)


I don't care

Comment on lines +63 to +66
if "CUDAExecutionProvider" not in available:
raise RuntimeError(f"CUDAExecutionProvider unavailable (have: {available})")

_session = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])

Copilot AI Nov 5, 2025


The code requires CUDAExecutionProvider but the benchmark uses onnxruntime-gpu. This creates a hard dependency on GPU availability, which may fail in CPU-only environments. Consider falling back to CPUExecutionProvider if CUDA is unavailable, or document this GPU requirement clearly in the benchmark configuration. The existing 411.image-recognition benchmark uses CPU-only inference for broader compatibility.

Suggested change
if "CUDAExecutionProvider" not in available:
raise RuntimeError(f"CUDAExecutionProvider unavailable (have: {available})")
_session = ort.InferenceSession(onnx_path, providers=["CUDAExecutionProvider"])
if "CUDAExecutionProvider" in available:
providers = ["CUDAExecutionProvider"]
print("Using CUDAExecutionProvider for ONNX Runtime inference.")
else:
providers = ["CPUExecutionProvider"]
print("CUDAExecutionProvider unavailable, falling back to CPUExecutionProvider for ONNX Runtime inference.")
_session = ort.InferenceSession(onnx_path, providers=providers)

@McLavish (Owner, Author)


I don't care

@@ -0,0 +1,3 @@
+numpy==1.24.4
+onnxruntime-gpu==1.16.3

Copilot AI Nov 5, 2025


Using onnxruntime-gpu requires CUDA dependencies and may not be compatible with all serverless environments. Consider using onnxruntime (CPU version) for better portability across different cloud platforms, or provide separate CPU and GPU requirement files. The code on lines 63-64 of function.py enforces the GPU requirement, but many serverless platforms don't provide GPU access by default.

Suggested change
-onnxruntime-gpu==1.16.3
+onnxruntime==1.16.3
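
One way to realize the separate-requirements idea while keeping the versions this PR pins; the file names are hypothetical, and tokenizers (also required by the benchmark) would be pinned identically in both files:

# requirements.cpu.txt (hypothetical): portable, CPU-only inference
numpy==1.24.4
onnxruntime==1.16.3

# requirements.gpu.txt (hypothetical): CUDA-capable environments
numpy==1.24.4
onnxruntime-gpu==1.16.3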

@McLavish (Owner, Author)


I don't care

Comment on lines +28 to +35
TORCH_DIR=".python_packages/lib/site-packages/torch"
if [ -d "$1/${TORCH_DIR}" ]; then
    cd $1
    zip -qr torch.zip ${TORCH_DIR}
    rm -rf ${TORCH_DIR}
    cd ${CUR_DIR}
    echo "Torch-zipped size $(du -sh $1 | cut -f1)"
fi

Copilot AI Nov 5, 2025


This benchmark packages torch despite not using PyTorch in its requirements or implementation (uses ONNX Runtime instead). The torch packaging logic appears to be copied from the 411.image-recognition benchmark but is unnecessary here. Consider removing lines 28-35 to avoid confusion and reduce package size.

Suggested change
-TORCH_DIR=".python_packages/lib/site-packages/torch"
-if [ -d "$1/${TORCH_DIR}" ]; then
-    cd $1
-    zip -qr torch.zip ${TORCH_DIR}
-    rm -rf ${TORCH_DIR}
-    cd ${CUR_DIR}
-    echo "Torch-zipped size $(du -sh $1 | cut -f1)"
-fi

@McLavish (Owner, Author)


NO

 # clone
 else:
-    execute(f"git clone https://github.com/spcl/serverless-benchmarks-data.git {data_dir}")
+    execute(f"git clone https://github.com/McLavish/serverless-benchmarks-data-dphpc.git {data_dir}")

Copilot AI Nov 5, 2025


The repository URL change from spcl/serverless-benchmarks-data to McLavish/serverless-benchmarks-data-dphpc suggests this is a fork for a specific project (dphpc). Using a personal fork in production code may cause maintenance issues if the fork becomes outdated or unavailable. Consider using the official repository or clearly documenting why this fork is necessary.

Suggested change
execute(f"git clone https://github.com/McLavish/serverless-benchmarks-data-dphpc.git {data_dir}")
execute(f"git clone https://github.com/spcl/serverless-benchmarks-data.git {data_dir}")

@McLavish (Owner, Author)


who

 [submodule "benchmarks-data"]
     path = benchmarks-data
-    url = https://github.com/spcl/serverless-benchmarks-data.git
+    url = https://github.com/McLavish/serverless-benchmarks-data-dphpc.git

Copilot AI Nov 5, 2025


The submodule URL change to a personal fork (McLavish/serverless-benchmarks-data-dphpc) may cause synchronization issues with the main project. If this fork contains essential data for the new benchmark, consider contributing it back to the upstream repository or documenting the dependency clearly in the project README.

@McLavish (Owner, Author)


asked

@McLavish merged commit c478c91 into development Nov 5, 2025
7 checks passed
@McLavish added this to the benchmark implementation 1 milestone Nov 6, 2025
