PyTorch implementation of MediaPipe
mp2torch uses torchvision to transform images so that it can process them in batches, whereas the original MediaPipe uses OpenCV.
Please use mp2torch with caution: its results may differ from those of the original MediaPipe because of this difference in image-transformation backend.
As the image size increases, the predictions of mp2torch are more likely to diverge from those of MediaPipe.
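To see where such differences can come from, the short sketch below (not mp2torch's actual preprocessing code) resizes the same image with torchvision and with OpenCV and prints the per-pixel discrepancy; the image path and the 128x128 target size are placeholders.

```python
import cv2
import numpy as np
import torch
from torchvision.transforms.functional import InterpolationMode, resize

# load an image with OpenCV (BGR -> RGB); "sample.jpg" is a placeholder path
image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)

# bilinear resize with torchvision (CHW float tensor in, CHW float tensor out)
tensor = torch.from_numpy(image).permute(2, 0, 1).float()
resized_tv = resize(tensor, [128, 128], interpolation=InterpolationMode.BILINEAR)
resized_tv = resized_tv.permute(1, 2, 0).numpy()

# bilinear resize with OpenCV (HWC ndarray in, HWC ndarray out)
resized_cv = cv2.resize(image, (128, 128), interpolation=cv2.INTER_LINEAR).astype(np.float32)

# the two backends interpolate slightly differently, which is one source of
# the prediction differences mentioned above
print("max abs pixel difference:", np.abs(resized_tv - resized_cv).max())
```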
Supported models:

- FaceDetection
  - Currently supports only short-range detection
- FaceMesh
  - Does not support static_image_mode=False

Supported devices:

- CPU
- GPU
- MPS (macOS GPU)
  - Slower than CPU even when using batch processing
Benchmark environment:

| Environment | OS | CPU | GPU |
|---|---|---|---|
| CPU | Ubuntu 22.04 | AMD EPYC 7742 | NVIDIA A100-SXM 80GB |
| GPU | Ubuntu 22.04 | AMD EPYC 7742 | NVIDIA A100-SXM 80GB |
| MPS | macOS Sonoma | Apple M2 Pro | Apple M2 Pro |
The benchmarks below report the time to process a single 240-frame video.
FaceDetection benchmark: processing-time and relative-time plots (figures omitted).
| #batch | mediapipe (CPU) | mp2torch (CPU) | mp2torch (GPU) | mp2torch (MPS) |
|---|---|---|---|---|
| 1 | 0.856 s | 4.053 s | 2.136 s | 6.541 s |
| 8 | - | 1.880 s | 0.769 s | 2.030 s |
| 16 | - | 1.634 s | 0.637 s | 1.652 s |
| 32 | - | 1.337 s | 0.594 s | 1.508 s |
FaceLandmarker benchmark: processing-time and relative-time plots (figures omitted).
| #batch | mediapipe (CPU) | mp2torch (CPU) | mp2torch (GPU) | mp2torch (MPS) |
|---|---|---|---|---|
| 1 | 2.268 s | 12.076 s | 5.179 s | 32.993 s |
| 8 | - | 6.856 s | 1.497 s | 9.015 s |
| 16 | - | 5.736 s | 1.128 s | 4.569 s |
| 32 | - | 4.094 s | 0.928 s | 4.007 s |
You can install mp2torch through pip:
$ pip install git+https://github.com/reazon-research/mp2torch.git

You can also clone this repository and install it:
$ git clone https://github.com/reazon-research/mp2torch.git
$ pip install ./mp2torch

If you do not have ffmpeg, please install it:

$ sudo apt install ffmpeg  # Ubuntu / Debian
$ brew install ffmpeg      # macOS

If you use Python 3.10 with venv and create the virtual environment in .venv/, the installed libraries can be found in .venv/lib/python3.10/site-packages/.
Please execute the commands below to convert .tflite models to .onnx models:
$ . .venv/bin/activate
$ mkdir -p models/onnx/
$ python -m tf2onnx.convert --opset 16 \
    --tflite .venv/lib/python3.10/site-packages/mediapipe/modules/face_detection/face_detection_short_range.tflite \
    --output models/onnx/face_detection_short_range.onnx
$ python -m tf2onnx.convert --opset 16 \
    --tflite .venv/lib/python3.10/site-packages/mediapipe/modules/face_landmark/face_landmark.tflite \
    --output models/onnx/face_landmark.onnx
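If you want to check that the conversion succeeded before using the models, a minimal sanity check with the onnx package (installed as a dependency of tf2onnx) could look like this; the paths simply mirror the output paths above.

```python
import onnx

# load each converted model and run ONNX's structural checker
for path in [
    "models/onnx/face_detection_short_range.onnx",
    "models/onnx/face_landmark.onnx",
]:
    model = onnx.load(path)
    onnx.checker.check_model(model)
    inputs = [i.name for i in model.graph.input]
    outputs = [o.name for o in model.graph.output]
    print(f"{path}: inputs={inputs}, outputs={outputs}")
```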
You can perform face detection to batch-process one video:

from mp2torch import BlazeFace
from mp2torch.utils import VideoFramesBatchLoader
from tqdm import tqdm

model = BlazeFace(
    onnx_path="./models/onnx/face_detection_short_range.onnx",
    min_score_threshold=0.5,
)
loader = VideoFramesBatchLoader(batch_size=64)
video_path = "..."
for batch, _ in tqdm(loader.make_loader(video_path)):
    outs = model(batch.to(model.device()))  # SegmentedTensor
    # outs = model(batch.to(model.device()), return_device="cpu")  # SegmentedTensor on cpu
    # outs = model(batch.to(model.device()), return_tensors="np")  # list[np.NDArray]
    # outs = model(batch.to(model.device()), return_tensors="list")  # list[list]
    det_results = outs.filterd_detections
    for detections in det_results.get_all_segments():
        # detections corresponding to an input image in batch

You can use the model without any preprocessing or postprocessing:
from mp2torch import FaceDetectionShortRange
model = FaceDetectionShortRange()
batch = ...
bboxes, scores = model(batch)

You can perform face landmark detection to batch-process one video:
from mp2torch import FaceLandmarker
from mp2torch.utils import VideoFramesBatchLoader
from tqdm import tqdm

model = FaceLandmarker(
    onnx_path="./models/onnx/face_landmark.onnx",
    onnx_path_face_detection_short_range="./models/onnx/face_detection_short_range.onnx",
    static_image_mode=True,
    max_num_faces=-1,  # -1 means unlimited detection
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)
loader = VideoFramesBatchLoader(batch_size=64)
video_path = "..."
for batch, _ in tqdm(loader.make_loader(video_path)):
    outs = model(batch.to(model.device()))
    for landmarks in outs.landmarks.get_all_segments():
        # landmarks corresponding to an input image in batch

You can use the model without any preprocessing or postprocessing:
from mp2torch import FaceMesh
model = FaceMesh()
batch = ...
landmark_tensors, face_flag_tensor = model(batch)

VideoFramesBatchLoader loads a video and yields batched frames:
from mp2torch.utils import VideoFramesBatchLoader
loader = VideoFramesBatchLoader(batch_size=1)
video_path = "..."
for batch, has_frame in loader.make_loader(video_path):
    # process

VideoBatchLoader loads multiple videos in a batch and yields batched frames.
Once every frame of every video has been loaded, the loader stops yielding batches.
If a video has already finished, its slot in the batch is filled with a zero tensor and the corresponding entry of has_frame is set to False; the sketch after the example below shows one way to use this flag.
from mp2torch.utils import VideoBatchLoader
loader = VideoBatchLoader()
video_paths = ["...", ...]
for batch, has_frame in loader.make_loader(video_paths):
    # process
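For example, a minimal sketch of using has_frame to drop the zero-padded slots before further processing, assuming batch is a tensor whose first dimension indexes the videos and has_frame can be converted to a per-video boolean tensor:

```python
import torch

from mp2torch.utils import VideoBatchLoader

loader = VideoBatchLoader()
video_paths = ["...", "..."]  # placeholder paths, as above

for batch, has_frame in loader.make_loader(video_paths):
    # finished videos are padded with zero images and flagged False in has_frame,
    # so keep only the slots that still contain real frames
    mask = torch.as_tensor(has_frame, dtype=torch.bool)
    frames = batch[mask]
    if frames.shape[0] == 0:
        continue
    # process frames ...
```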
ImageBatchLoader loads multiple images in a batch and yields batched images:

from mp2torch.utils import ImageBatchLoader
loader = ImageBatchLoader(batch_size=32)
image_paths = ["...", ...]
for batch in loader.make_loader(image_paths):
    # process
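As a more end-to-end sketch, the loaders can be combined with the models above; the snippet below runs BlazeFace over a folder of images (the glob pattern is a placeholder and the per-image handling is left as a stub):

```python
from glob import glob

from mp2torch import BlazeFace
from mp2torch.utils import ImageBatchLoader

model = BlazeFace(
    onnx_path="./models/onnx/face_detection_short_range.onnx",
    min_score_threshold=0.5,
)
loader = ImageBatchLoader(batch_size=32)
image_paths = sorted(glob("path/to/images/*.jpg"))  # placeholder pattern

for batch in loader.make_loader(image_paths):
    outs = model(batch.to(model.device()))
    for detections in outs.filterd_detections.get_all_segments():
        # detections for one input image
        ...
```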
If you want to benchmark in your own environment, please run these commands:

$ python scripts/benchmark_facedetection.py --video=$video_path --cpu --cuda
$ python scripts/benchmark_facelandmarker.py --video=$video_path --cpu --cuda

You can also benchmark on an MPS environment:
$ python scripts/benchmark_facedetection.py --video=$video_path --cpu --mps


