- For the example code, the tritonserver image must include `opencv-python`.
- Run `scripts/create_image.sh` to create the Docker image we need:

```bash
cd scripts
chmod 775 create_image.sh
./create_image.sh
```

- This repository needs the MobileNetV2 and ResNet50 ONNX files to run the example code.
- You can download model files from Google Drive
- Or you can create model files manually with PyTorch
- Notice that a model must have a dynamic axis on the batch dimension if you want to use dynamic batching, as in the export script below:
```python
import torch
from torchvision.models import (
    resnet50,
    ResNet50_Weights,
    mobilenet_v2,
    MobileNet_V2_Weights,
)

targets = (
    (resnet50, ResNet50_Weights, "resnet50.onnx"),
    (mobilenet_v2, MobileNet_V2_Weights, "mobilenetv2.onnx"),
)
for module, weights, fname in targets:
    model = module(weights=weights.IMAGENET1K_V2)
    model.eval()
    torch.onnx.export(
        model,
        torch.randn((1, 3, 224, 224), dtype=torch.float32),
        fname,
        opset_version=17,
        input_names=["input"],
        output_names=["output"],
        # Mark the batch dimension as dynamic so dynamic batching works.
        dynamic_axes={
            "input": {0: "batch_size"},
            "output": {0: "batch_size"},
        },
    )
```
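If you want to confirm the export worked, a quick sanity check with the `onnx` package (assuming it is installed) is sketched below; a dynamic axis is stored as a named `dim_param` instead of a fixed `dim_value`:

```python
import onnx

for fname in ("resnet50.onnx", "mobilenetv2.onnx"):
    model = onnx.load(fname)
    onnx.checker.check_model(model)
    for tensor in model.graph.input:
        dims = tensor.type.tensor_type.shape.dim
        # A dynamic axis prints as its name, a fixed one as its size.
        print(fname, tensor.name, [d.dim_param or d.dim_value for d in dims])
        # Expected: ... input ['batch_size', 3, 224, 224]
```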
- The ONNX files should be located in `model_repository` as shown below.
- The model file must be named `model.onnx`.
```bash
tree ./model_repository
# model_repository/
# ├── classification
# │   ├── mobilenetv2
# │   │   ├── mobilenetv2
# │   │   │   ├── 1
# │   │   │   └── config.pbtxt
# │   │   ├── mobilenetv2_config
# │   │   │   ├── 1
# │   │   │   │   └── model.py
# │   │   │   └── config.pbtxt
# │   │   └── mobilenetv2_onnx
# │   │       ├── 1
# │   │       │   └── model.onnx
# │   │       └── config.pbtxt
# │   └── resnet50
# │       ├── resnet50
# │       │   ├── 1
# │       │   └── config.pbtxt
# │       ├── resnet50_config
# │       │   ├── 1
# │       │   │   └── model.py
# │       │   └── config.pbtxt
# │       └── resnet50_onnx
# │           ├── 1
# │           │   └── model.onnx
# │           └── config.pbtxt
# └── common_utils
#     ├── classifier_postprocessor
#     │   ├── 1
#     │   │   └── model.py
#     │   └── config.pbtxt
#     └── image_preprocessor
#         ├── 1
#         │   └── model.py
#         └── config.pbtxt
```

- If you turn `dynamic_batching` on, tritonserver can handle a large number of requests by grouping them into batches. Otherwise, every request is processed as if the batch size were 1.
- See more about it here.
- You can add `dynamic_batching` to the config files, except for the ensemble model.
- When `dynamic_batching` is on, `max_batch_size` should be set larger than `0`.
- Notice that `dynamic_batching` is not compatible with ensemble models; an ensemble is handled by `ensemble_scheduling` instead. Just add `dynamic_batching` to each model used inside the ensemble, as in the config below:
```protobuf
name: "mobilenetv2_config"
backend: "python"

max_batch_size: 8
instance_group [
  {
    kind: KIND_CPU
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```
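To see dynamic batching in action, you can fire several requests concurrently so they arrive within the queue delay window. Below is a minimal sketch using `tritonclient` (installable via `pip install tritonclient[http]`), assuming the server from the next section is running on `localhost:8000`; the tensor name `"INPUT"` and its shape are placeholders, so use the names and dims from your actual `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient

# Send several requests at once so the server can group them into a batch.
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=8)

pending = []
for _ in range(8):
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)
    pending.append(client.async_infer("mobilenetv2_config", inputs=[infer_input]))

# Requests arriving within max_queue_delay_microseconds of each other
# can be merged into one execution of up to max_batch_size.
results = [request.get_result() for request in pending]
```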
- First, you should deploy tritonserver on your machine.
- Run `scripts/launch_tritonserver.sh` to create the tritonserver Docker container:

```bash
./scripts/launch_tritonserver.sh
```

- `inference.py` uses the ensemble model, which includes pre-processing, inference, and post-processing.
- For MobileNetV2, the ensemble structure is written in `./model_repository/classification/mobilenetv2/mobilenetv2/config.pbtxt`.
- The model name is `mobilenetv2` and it uses the `ensemble` platform. Its input name is `RAW_INPUT` and its output name is `SCORES`.
- You can change these names to whatever you want.
```protobuf
name: "mobilenetv2"
platform: "ensemble"

max_batch_size: 8

input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1, -1, 3 ]
  }
]
output [
  {
    name: "SCORES"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```
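For reference, a minimal client call against this ensemble might look like the sketch below; `inference.py` in this repository is the full version. The server address and image path are assumptions, and since the ensemble does its own pre-processing, the client just sends the raw image bytes:

```python
import cv2
import numpy as np
import tritonclient.http as httpclient

# Minimal sketch of querying the ensemble directly.
client = httpclient.InferenceServerClient(url="localhost:8000")

image = cv2.imread("./resources/images/red_fox.jpg")  # HWC, uint8
batched = image[np.newaxis, ...]                      # add the batch dim
infer_input = httpclient.InferInput("RAW_INPUT", list(batched.shape), "UINT8")
infer_input.set_data_from_numpy(batched)

response = client.infer("mobilenetv2", inputs=[infer_input])
scores = response.as_numpy("SCORES")                  # shape: (1, num_classes)
print(int(scores[0].argmax()), float(scores[0].max()))
```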
```bash
python inference.py --model mobilenetv2
# ==================================================
# ./resources/images/golden_retrieval.jpg
# [ 207] Golden Retriever : 47.27%
# [ 208] Labrador Retriever : 7.38%
# [ 222] Kuvasz : 4.41%
# [ 257] Pyrenean Mountain Dog : 3.78%
# [ 176] Saluki : 1.10%
# --------------------------------------------------
# ./resources/images/red_fox.jpg
# [ 277] red fox : 12.15%
# [ 272] coyote : 7.00%
# [ 278] kit fox : 5.79%
# [ 274] dhole : 4.90%
# [ 269] grey wolf : 1.49%
# ==================================================
```
