diff --git a/.new_docs/cn/faq/add_new_model.md b/.new_docs/cn/faq/add_new_model.md deleted file mode 100644 index ab5afb9b2fb..00000000000 --- a/.new_docs/cn/faq/add_new_model.md +++ /dev/null @@ -1,268 +0,0 @@ -# FastDeploy外部模型集成指引 - -在FastDeploy里面新增一个模型,包括增加C++/Python的部署支持。 本文以torchvision v0.12.0中的ResNet50模型为例,介绍使用FastDeploy做外部[模型集成](#modelsupport),具体包括如下3步。 - -| 步骤 | 说明 | 创建或修改的文件 | -|:------:|:-------------------------------------:|:---------------------------------------------:| -| [1](#step2) | 在fastdeploy/vision相应任务模块增加模型实现 | resnet.h、resnet.cc、vision.h | -| [2](#step4) | 通过pybind完成Python接口绑定 | resnet_pybind.cc、classification_pybind.cc | -| [3](#step5) | 实现Python相应调用接口 | resnet.py、\_\_init\_\_.py | - -在完成上述3步之后,一个外部模型就集成好了。 -
-如果您想为FastDeploy贡献代码,还需要为新增模型添加测试代码、说明文档和代码注释,可在[测试](#test)中查看。 -## 模型集成 - -### 模型准备 - - -在集成外部模型之前,先要将训练好的模型(.pt,.pdparams 等)转换成FastDeploy支持部署的模型格式(.onnx,.pdmodel)。多数开源仓库会提供模型转换脚本,可以直接利用脚本做模型的转换。由于torchvision没有提供转换脚本,因此手动编写转换脚本,本文中将 `torchvison.models.resnet50` 转换为 `resnet50.onnx`, 参考代码如下: - -```python -import torch -import torchvision.models as models -model = models.resnet50(pretrained=True) -batch_size = 1 #批处理大小 -input_shape = (3, 224, 224) #输入数据,改成自己的输入shape -model.eval() -x = torch.randn(batch_size, *input_shape) # 生成张量 -export_onnx_file = "resnet50.onnx" # 目的ONNX文件名 -torch.onnx.export(model, - x, - export_onnx_file, - opset_version=12, - input_names=["input"], # 输入名 - output_names=["output"], # 输出名 - dynamic_axes={"input":{0:"batch_size"}, # 批处理变量 - "output":{0:"batch_size"}}) -``` -执行上述脚本将会得到 `resnet50.onnx` 文件。 - -### C++部分 -* 创建`resnet.h`文件 - * 创建位置 - * FastDeploy/fastdeploy/vision/classification/contrib/resnet.h (FastDeploy/C++代码存放位置/视觉模型/任务名称/外部模型/模型名.h) - * 创建内容 - * 首先在resnet.h中创建 ResNet类并继承FastDeployModel父类,之后声明`Predict`、`Initialize`、`Preprocess`、`Postprocess`和`构造函数`,以及必要的变量,具体的代码细节请参考[resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e)。 - -```C++ -class FASTDEPLOY_DECL ResNet : public FastDeployModel { - public: - ResNet(...); - virtual bool Predict(...); - private: - bool Initialize(); - bool Preprocess(...); - bool Postprocess(...); -}; -``` - -* 创建`resnet.cc`文件 - * 创建位置 - * FastDeploy/fastdeploy/vision/classification/contrib/resnet.cc (FastDeploy/C++代码存放位置/视觉模型/任务名称/外部模型/模型名.cc) - * 创建内容 - * 在`resnet.cc`中实现`resnet.h`中声明函数的具体逻辑,其中`PreProcess` 和 `PostProcess`需要参考源官方库的前后处理逻辑复现,ResNet每个函数具体逻辑如下,具体的代码请参考[resnet.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-d229d702de28345253a53f2a5839fd2c638f3d32fffa6a7d04d23db9da13a871)。 - -```C++ -ResNet::ResNet(...) { - // 构造函数逻辑 - // 1. 指定 Backend 2. 设置RuntimeOption 3. 调用Initialize()函数 -} -bool ResNet::Initialize() { - // 初始化逻辑 - // 1. 全局变量赋值 2. 调用InitRuntime()函数 - return true; -} -bool ResNet::Preprocess(Mat* mat, FDTensor* output) { -// 前处理逻辑 -// 1. Resize 2. BGR2RGB 3. Normalize 4. HWC2CHW 5. 处理结果存入 FDTensor类中 - return true; -} -bool ResNet::Postprocess(FDTensor& infer_result, ClassifyResult* result, int topk) { - //后处理逻辑 - // 1. Softmax 2. Choose topk labels 3. 结果存入 ClassifyResult类 - return true; -} -bool ResNet::Predict(cv::Mat* im, ClassifyResult* result, int topk) { - Preprocess(...) - Infer(...) - Postprocess(...) - return true; -} -``` - -* 在`vision.h`文件中加入新增模型文件 - * 修改位置 - * FastDeploy/fastdeploy/vision.h - * 修改内容 - -```C++ -#ifdef ENABLE_VISION -#include "fastdeploy/vision/classification/contrib/resnet.h" -#endif -``` - - -### Pybind部分 - -* 创建Pybind文件 - * 创建位置 - * FastDeploy/fastdeploy/vision/classification/contrib/resnet_pybind.cc (FastDeploy/C++代码存放位置/视觉模型/任务名称/外部模型/模型名_pybind.cc) - * 创建内容 - * 利用Pybind将C++中的函数变量绑定到Python中,具体代码请参考[resnet_pybind.cc](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-270af0d65720310e2cfbd5373c391b2110d65c0f4efa547f7b7eeffcb958bdec)。 -```C++ -void BindResNet(pybind11::module& m) { - pybind11::class_( - m, "ResNet") - .def(pybind11::init()) - .def("predict", ...) 
- .def_readwrite("size", &vision::classification::ResNet::size) - .def_readwrite("mean_vals", &vision::classification::ResNet::mean_vals) - .def_readwrite("std_vals", &vision::classification::ResNet::std_vals); -} -``` - -* 调用Pybind函数 - * 修改位置 - * FastDeploy/fastdeploy/vision/classification/classification_pybind.cc (FastDeploy/C++代码存放位置/视觉模型/任务名称/任务名称}_pybind.cc) - * 修改内容 -```C++ -void BindResNet(pybind11::module& m); -void BindClassification(pybind11::module& m) { - auto classification_module = - m.def_submodule("classification", "Image classification models."); - BindResNet(classification_module); -} -``` - - -### Python部分 - - -* 创建`resnet.py`文件 - * 创建位置 - * FastDeploy/python/fastdeploy/vision/classification/contrib/resnet.py (FastDeploy/Python代码存放位置/fastdeploy/视觉模型/任务名称/外部模型/模型名.py) - * 创建内容 - * 创建ResNet类继承自FastDeployModel,实现 `\_\_init\_\_`、Pybind绑定的函数(如`predict()`)、以及`对Pybind绑定的全局变量进行赋值和获取的函数`,具体代码请参考[resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157)。 - -```python - -class ResNet(FastDeployModel): - def __init__(self, ...): - self._model = C.vision.classification.ResNet(...) - def predict(self, input_image, topk=1): - return self._model.predict(input_image, topk) - @property - def size(self): - return self._model.size - @size.setter - def size(self, wh): - ... -``` - -* 导入ResNet类 - * 修改位置 - * FastDeploy/python/fastdeploy/vision/classification/\_\_init\_\_.py (FastDeploy/Python代码存放位置/fastdeploy/视觉模型/任务名称/\_\_init\_\_.py) - * 修改内容 - -```Python -from .contrib.resnet import ResNet -``` - -## 测试 -### 编译 - * C++ - * 位置:FastDeploy/ - -``` -mkdir build & cd build -cmake .. -DENABLE_ORT_BACKEND=ON -DENABLE_VISION=ON -DCMAKE_INSTALL_PREFIX=${PWD/fastdeploy-0.0.3 --DENABLE_PADDLE_BACKEND=ON -DENABLE_TRT_BACKEND=ON -DWITH_GPU=ON -DTRT_DIRECTORY=/PATH/TO/TensorRT/ -make -j8 -make install -``` - - 编译会得到 build/fastdeploy-0.0.3/。 - - * Python - * 位置:FastDeploy/python/ - -``` -export TRT_DIRECTORY=/PATH/TO/TensorRT/ # 如果用TensorRT 需要填写TensorRT所在位置,并开启 ENABLE_TRT_BACKEND -export ENABLE_TRT_BACKEND=ON -export WITH_GPU=ON -export ENABLE_PADDLE_BACKEND=ON -export ENABLE_OPENVINO_BACKEND=ON -export ENABLE_VISION=ON -export ENABLE_ORT_BACKEND=ON -python setup.py build -python setup.py bdist_wheel -cd dist -pip install fastdeploy_gpu_python-版本号-cpxx-cpxxm-系统架构.whl -``` - -### 编写测试代码 - * 创建位置: FastDeploy/examples/vision/classification/resnet/ (FastDeploy/示例目录/视觉模型/任务名称/模型名/) - * 创建目录结构 - -``` -. -├── cpp -│ ├── CMakeLists.txt -│ ├── infer.cc // C++ 版本测试代码 -│ └── README.md // C++版本使用文档 -├── python -│ ├── infer.py // Python 版本测试代码 -│ └── README.md // Python版本使用文档 -└── README.md // ResNet 模型集成说明文档 -``` - -* C++ - * 编写CmakeLists文件、C++ 代码以及 README.md 内容请参考[cpp/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-afcbe607b796509581f89e38b84190717f1eeda2df0419a2ac9034197ead5f96)。 - * 编译 infer.cc - * 位置:FastDeploy/examples/vision/classification/resnet/cpp/ - -``` -mkdir build & cd build -cmake .. 
-DFASTDEPLOY_INSTALL_DIR=/PATH/TO/FastDeploy/build/fastdeploy-0.0.3/ -make -``` - -* Python - * Python 代码以及 README.md 内容请参考[python/](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-5a0d6be8c603a8b81454ac14c17fb93555288d9adf92bbe40454449309700135)。 - -### 为代码添加注释 -为了方便用户理解代码,我们需要为新增代码添加注释,添加注释方法可参考如下示例。 -- C++ 代码 -您需要在resnet.h文件中为函数和变量增加注释,有如下三种注释方式,具体可参考[resnet.h](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-69128489e918f305c208476ba793d8167e77de2aa7cadf5dcbac30da448bd28e)。 - -```C++ -/** \brief Predict for the input "im", the result will be saved in "result". -* -* \param[in] im Input image for inference. -* \param[in] result Saving the inference result. -* \param[in] topk The length of return values, e.g., if topk==2, the result will include the 2 most possible class label for input image. -*/ -virtual bool Predict(cv::Mat* im, ClassifyResult* result, int topk = 1); - -/// Tuple of (width, height) -std::vector size; -/*! @brief Initialize for ResNet model, assign values to the global variables and call InitRuntime() -*/ -bool Initialize(); -``` -- Python 代码 -你需要为resnet.py文件中的函数和变量增加适当的注释,示例如下,具体可参考[resnet.py](https://github.com/PaddlePaddle/FastDeploy/pull/347/files#diff-a4dc5ec2d450e91f1c03819bf314c238b37ac678df56d7dea3aab7feac10a157)。 - -```python - def predict(self, input_image, topk=1): - """Classify an input image - - :param input_image: (numpy.ndarray)The input image data, 3-D array with layout HWC, BGR format - :param topk: (int)The topk result by the classify confidence score, default 1 - :return: ClassifyResult - """ - return self._model.predict(input_image, topk) -``` - -对于集成模型过程中的其他文件,您也可以对实现的细节添加适当的注释说明。 diff --git a/docs/api_docs/python/index.rst b/docs/api_docs/python/index.rst index 5761cd3a511..873f83489a9 100644 --- a/docs/api_docs/python/index.rst +++ b/docs/api_docs/python/index.rst @@ -18,3 +18,4 @@ FastDeploy image_classification.md keypoint_detection.md matting.md + vision_results_en.md diff --git a/docs/api_docs/python/vision_results_en.md b/docs/api_docs/python/vision_results_en.md index a7ae54a5709..513a011d7a3 100644 --- a/docs/api_docs/python/vision_results_en.md +++ b/docs/api_docs/python/vision_results_en.md @@ -1,5 +1,7 @@ # Description of Vision Results +本文档的中文版本参考[视觉模型预测结果说明](./vision_results_cn.md) + ## ClassifyResult The code of ClassifyResult is defined in `fastdeploy/vision/common/result.h` and is used to indicate the classification label result and confidence the image. diff --git a/docs/cn/build_and_install/android.md b/docs/cn/build_and_install/android.md index c091b75d550..899ec2985a5 100644 --- a/docs/cn/build_and_install/android.md +++ b/docs/cn/build_and_install/android.md @@ -1,3 +1,105 @@ # Android部署库编译 -进行中... 
+FastDeploy当前在Android仅支持Paddle-Lite后端推理,支持armeabi-v7a和arm64-v8a两种cpu架构,在armv8.2架构的arm设备支持fp16精度推理。相关编译选项说明如下: + +|编译选项|默认值|说明|备注| +|:---|:---|:---|:---| +|ENABLE_LITE_BACKEND|OFF|编译Android库时需要设置为ON| - | +|WITH_OPENCV_STATIC|OFF|是否使用OpenCV静态库| - | +|WITH_LITE_STATIC|OFF|是否使用Lite静态库| 暂不支持使用Lite静态库 | +|WITH_LITE_FULL_API|ON|是否使用Lite Full API库| 目前必须为ON | +|WITH_LITE_FP16|OFF|是否使用带FP16支持的Lite库| 目前仅支持 arm64-v8a 架构| + +更多编译选项请参考[FastDeploy编译选项说明](./README.md) + +## Android C++ SDK 编译安装 + +编译需要满足: + +- Android SDK API >= 21 +- Android NDK >= 20 (当前仅支持clang编译工具链) +- cmake >= 3.10.0 + +编译前请先检查您的Android SDK 和 NDK 是否已经配置,如: +```bash +➜ echo $ANDROID_SDK +/Users/xxx/Library/Android/sdk +➜ echo $ANDROID_NDK +/Users/xxx/Library/Android/sdk/ndk/25.1.8937393 +``` +推荐使用 NDK>=20 进行交叉编译,编译命令如下: +```bash +# Download the latest source code +git clone https://github.com/PaddlePaddle/FastDeploy.git +cd FastDeploy + +# Setting up Android toolchanin +ANDROID_ABI=arm64-v8a # 'arm64-v8a', 'armeabi-v7a' +ANDROID_PLATFORM="android-21" # API >= 21 +ANDROID_STL=c++_shared # 'c++_shared', 'c++_static' +ANDROID_TOOLCHAIN=clang # 'clang' only +TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake + +# Create build directory +BUILD_ROOT=build/Android +BUILD_DIR=${BUILD_ROOT}/${ANDROID_ABI}-api-21 +FASDEPLOY_INSTALL_DIR="${BUILD_DIR}/install" +mkdir build && mkdir ${BUILD_ROOT} && mkdir ${BUILD_DIR} +cd ${BUILD_DIR} + +# Check fp16 support (only support arm64-v8a now) +WITH_LITE_FP16=ON +if [ "$ANDROID_ABI" = "armeabi-v7a" ]; then + WITH_LITE_FP16=OFF +fi + +# CMake configuration with Android toolchain +cmake -DCMAKE_TOOLCHAIN_FILE=${TOOLCHAIN_FILE} \ + -DCMAKE_BUILD_TYPE=MinSizeRel \ + -DANDROID_ABI=${ANDROID_ABI} \ + -DANDROID_NDK=${ANDROID_NDK} \ + -DANDROID_PLATFORM=${ANDROID_PLATFORM} \ + -DANDROID_STL=${ANDROID_STL} \ + -DANDROID_TOOLCHAIN=${ANDROID_TOOLCHAIN} \ + -DENABLE_LITE_BACKEND=ON \ + -DENABLE_VISION=ON \ + -DWITH_LITE_FP16=${WITH_LITE_FP16} \ + -DCMAKE_INSTALL_PREFIX=${FASDEPLOY_INSTALL_DIR} \ + -Wno-dev ../../.. + +# Build FastDeploy Android C++ SDK +make -j8 +make install +``` +编译完成后,Android C++ SDK 保存在 `build/Android/arm64-v8a-api-21/install` 目录下,目录结构如下: +```bash +➜ tree . -d -L 3 +. +├── examples +├── include +│   └── fastdeploy # FastDeploy 头文件 +├── lib +│   └── arm64-v8a # FastDeploy Android 动态库 +└── third_libs # 第三方依赖库 + └── install + ├── opencv + └── paddlelite +``` +在examples/vision目录下可查看Android C++ SDK 使用案例: +```bash +. +├── classification +│   ├── paddleclas +│   │   ├── android # 图像分类Android使用案例 +│   │   ├── cpp +... +├── detection +│   ├── paddledetection +│   │   ├── android # 目标检测Android使用案例 +│   │   ├── cpp +... 
+``` +如何使用FastDeploy Android C++ SDK 请参考使用案例文档: +- [图像分类Android使用文档](../../../examples/vision/classification/paddleclas/android/README.md) +- [目标检测Android使用文档](../../../examples/vision/detection/paddledetection/android/README.md) +- [在 Android 通过 JNI 中使用 FastDeploy C++ SDK](../../../../../docs/cn/faq/use_cpp_sdk_on_android.md) diff --git a/docs/cn/build_and_install/download_prebuilt_libraries.md b/docs/cn/build_and_install/download_prebuilt_libraries.md index a320b692440..9182aa69fc7 100644 --- a/docs/cn/build_and_install/download_prebuilt_libraries.md +++ b/docs/cn/build_and_install/download_prebuilt_libraries.md @@ -17,7 +17,7 @@ FastDeploy提供各平台预编译库,供开发者直接下载安装使用。 Release版本(当前最新0.3.0)安装 ```bash -pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html +pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html ``` 其中推荐使用Conda配置开发环境 @@ -51,7 +51,7 @@ pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy. ## C++ SDK安装 -Release版本(当前最新0.3.0) +Release版本(当前最新0.3.0,Android为0.4.0 pre-release) | 平台 | 文件 | 说明 | | :--- | :--- | :---- | @@ -59,5 +59,5 @@ Release版本(当前最新0.3.0) | Windows x64 | [fastdeploy-win-x64-0.3.0.zip](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-win-x64-0.3.0.zip) | Visual Studio 16 2019编译产出 | | Mac OSX x64 | [fastdeploy-osx-x86_64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-x86_64-0.3.0.tgz) | clang++ 10.0.0编译产出| | Mac OSX arm64 | [fastdeploy-osx-arm64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-osx-arm64-0.3.0.tgz) | clang++ 13.0.0编译产出 | -| Linux aarch64 | [fastdeploy-linux-aarch64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-0.3.0.tgz) | g++ 6.3.0编译产出 | - +| Linux aarch64 | [fastdeploy-linux-aarch64-0.3.0.tgz](https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-aarch64-0.3.0.tgz) | g++ 6.3.0编译产出 | +| Android armv7&v8 | [fastdeploy-android-0.4.0-shared.tgz](https://bj.bcebos.com/fastdeploy/release/android/fastdeploy-android-0.4.0-shared.tgz) | NDK 25及clang++编译产出, 支持arm64-v8a及armeabi-v7a | diff --git a/docs/cn/faq/use_cpp_sdk_on_android.md b/docs/cn/faq/use_cpp_sdk_on_android.md new file mode 100644 index 00000000000..bdea635b4b2 --- /dev/null +++ b/docs/cn/faq/use_cpp_sdk_on_android.md @@ -0,0 +1,231 @@ +# 在 Android 中通过 JNI 使用 FastDeploy C++ SDK +本文档将以PicoDet为例,讲解如何通过JNI,将FastDeploy中的模型封装到Android中进行调用。阅读本文档,您至少需要了解C++、Java、JNI以及Android的基础知识。如果您主要关注如何在Java层如何调用FastDeploy的API,则可以不阅读本文档。 + +## 目录 +- [新建Java类并定义native API](#Java) +- [Android Studio 生成JNI函数定义](#JNI) +- [在C++层实现JNI函数](#CPP) +- [编写CMakeLists.txt及配置build.gradle](#CMakeAndGradle) +- [更多FastDeploy Android 使用案例](#Examples) + +## 新建Java类并定义native API +
+ +```java +public class PicoDet { + protected long mNativeModelContext = 0; // Context from native. + protected boolean mInitialized = false; + // ... + // Bind predictor from native context. + private static native long bindNative(String modelFile, + String paramsFile, + String configFile, + int cpuNumThread, + boolean enableLiteFp16, + int litePowerMode, + String liteOptimizedModelDir, + boolean enableRecordTimeOfRuntime, + String labelFile); + + // Call prediction from native context. + private static native long predictNative(long nativeModelContext, + Bitmap ARGB8888Bitmap, + boolean saved, + String savedImagePath, + float scoreThreshold, + boolean rendering); + + // Release buffers allocated in native context. + private static native boolean releaseNative(long nativeModelContext); + + // Initializes at the beginning. + static { + FastDeployInitializer.init(); + } +} +``` +这些被标记为native的接口是需要通过JNI的方式实现,并在Java层供PicoDet类调用。完整的PicoDet Java代码请参考 [PicoDet.java](../../../examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java) 。各个函数说明如下: +- `bindNative`: C++层初始化模型资源,如果成功初始化,则返回指向该模型的指针(long类型),否则返回0指针 +- `predictNative`: 通过已经初始化好的模型指针,在C++层执行预测代码,如果预测成功则返回指向预测结果的指针,否则返回0指针。注意,该结果指针在当次预测使用完之后需要释放,具体操作请参考 [PicoDet.java](../../../examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java) 中的predict函数。 +- `releaseNative`: 根据传入的模型指针,在C++层释放模型资源。 + +## Android Studio 生成JNI函数定义 +
+
+在 Android Studio 中,将鼠标停留在 Java 里定义的 native 函数上,Android Studio 便会提示是否要创建对应的 JNI 函数定义;这里,我们把 JNI 函数定义创建在事先创建好的 C++ 文件 `picodet_jni.cc` 中。
+
+- 使用Android Studio创建JNI函数定义:
+![](https://user-images.githubusercontent.com/31974251/197341065-cdf8f626-4bb1-4a57-8d7a-80b382fe994e.png)
+
+- 将JNI函数定义创建在picodet_jni.cc上:
+![](https://user-images.githubusercontent.com/31974251/197341190-b887dec5-fa75-43c9-9ab3-7ead50c0eb45.png)
+
+- 创建的JNI函数定义如下:
+![](https://user-images.githubusercontent.com/31974251/197341274-e9671bac-9e77-4043-a870-9d5db914586b.png)
+
+其他 native 函数对应的 JNI 函数定义,按同样的流程创建即可。
+
+## 在C++层实现JNI函数
+<div id="CPP"></div>
+
+以下为PicoDet JNI层实现的示例,相关的辅助函数不在此处赘述,完整的C++代码请参考 [android/app/src/main/cpp](../../../examples/vision/detection/paddledetection/android/app/src/main/cpp/)。
+```C++
+#include <jni.h>             // NOLINT
+#include "fastdeploy_jni.h"  // NOLINT
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// 绑定C++层的模型
+JNIEXPORT jlong JNICALL
+Java_com_baidu_paddle_fastdeploy_vision_detection_PicoDet_bindNative(
+    JNIEnv *env, jclass clazz, jstring model_file, jstring params_file,
+    jstring config_file, jint cpu_num_thread, jboolean enable_lite_fp16,
+    jint lite_power_mode, jstring lite_optimized_model_dir,
+    jboolean enable_record_time_of_runtime, jstring label_file) {
+  std::string c_model_file = fastdeploy::jni::ConvertTo(env, model_file);
+  std::string c_params_file = fastdeploy::jni::ConvertTo(env, params_file);
+  std::string c_config_file = fastdeploy::jni::ConvertTo(env, config_file);
+  std::string c_label_file = fastdeploy::jni::ConvertTo(env, label_file);
+  std::string c_lite_optimized_model_dir = fastdeploy::jni::ConvertTo(env, lite_optimized_model_dir);
+  auto c_cpu_num_thread = static_cast<int>(cpu_num_thread);
+  auto c_enable_lite_fp16 = static_cast<bool>(enable_lite_fp16);
+  auto c_lite_power_mode = static_cast<fastdeploy::LitePowerMode>(lite_power_mode);
+  fastdeploy::RuntimeOption c_option;
+  c_option.UseCpu();
+  c_option.UseLiteBackend();
+  c_option.SetCpuThreadNum(c_cpu_num_thread);
+  c_option.SetLitePowerMode(c_lite_power_mode);
+  c_option.SetLiteOptimizedModelDir(c_lite_optimized_model_dir);
+  if (c_enable_lite_fp16) {
+    c_option.EnableLiteFP16();
+  }
+  // 如果您实现的是其他模型,比如PPYOLOE,请注意修改此处绑定的C++类型
+  auto c_model_ptr = new fastdeploy::vision::detection::PicoDet(
+      c_model_file, c_params_file, c_config_file, c_option);
+  // Enable record Runtime time costs.
+  if (enable_record_time_of_runtime) {
+    c_model_ptr->EnableRecordTimeOfRuntime();
+  }
+  // Load detection labels if label path is not empty.
+  if ((!fastdeploy::jni::AssetsLoaderUtils::IsDetectionLabelsLoaded()) &&
+      (!c_label_file.empty())) {
+    fastdeploy::jni::AssetsLoaderUtils::LoadDetectionLabels(c_label_file);
+  }
+  // WARN: need to release manually in Java !
+  return reinterpret_cast<jlong>(c_model_ptr);  // native model context
+}
+
+// 通过传入的模型指针在C++层进行预测
+JNIEXPORT jlong JNICALL
+Java_com_baidu_paddle_fastdeploy_vision_detection_PicoDet_predictNative(
+    JNIEnv *env, jclass clazz, jlong native_model_context,
+    jobject argb8888_bitmap, jboolean saved, jstring saved_image_path,
+    jfloat score_threshold, jboolean rendering) {
+  if (native_model_context == 0) {
+    return 0;
+  }
+  cv::Mat c_bgr;
+  if (!fastdeploy::jni::ARGB888Bitmap2BGR(env, argb8888_bitmap, &c_bgr)) {
+    return 0;
+  }
+  auto c_model_ptr = reinterpret_cast<fastdeploy::vision::detection::PicoDet *>(
+      native_model_context);
+  auto c_result_ptr = new fastdeploy::vision::DetectionResult();
+  auto t = fastdeploy::jni::GetCurrentTime();
+  if (!c_model_ptr->Predict(&c_bgr, c_result_ptr)) {
+    delete c_result_ptr;
+    return 0;
+  }
+  // ...
+  return reinterpret_cast<jlong>(c_result_ptr);  // native result context
+}
+
+// 在C++层释放模型资源
+JNIEXPORT jboolean JNICALL
+Java_com_baidu_paddle_fastdeploy_vision_detection_PicoDet_releaseNative(
+    JNIEnv *env, jclass clazz, jlong native_model_context) {
+  if (native_model_context == 0) {
+    return JNI_FALSE;
+  }
+  auto c_model_ptr = reinterpret_cast<fastdeploy::vision::detection::PicoDet *>(
+      native_model_context);
+  // ...
+  delete c_model_ptr;
+  return JNI_TRUE;
+}
+
+#ifdef __cplusplus
+}
+#endif
+```
+## 编写CMakeLists.txt及配置build.gradle
+<div id="CMakeAndGradle"></div>
+ +实现好的JNI代码,需要被编译成so库,才能被Java调用,为实现该目的,需要在build.gradle中添加JNI项目支持,并编写对应的CMakeLists.txt。 +- build.gradle中配置NDK、CMake以及Android ABI +```java +android { + defaultConfig { + // 省略其他配置 ... + externalNativeBuild { + cmake { + arguments '-DANDROID_PLATFORM=android-21', '-DANDROID_STL=c++_shared', "-DANDROID_TOOLCHAIN=clang" + abiFilters 'armeabi-v7a', 'arm64-v8a' + cppFlags "-std=c++11" + } + } + } + // 省略其他配置 ... + externalNativeBuild { + cmake { + path file('src/main/cpp/CMakeLists.txt') + version '3.10.2' + } + } + ndkVersion '20.1.5948944' +} +``` +- 编写CMakeLists.txt示例 +```cmake +cmake_minimum_required(VERSION 3.10.2) +project("fastdeploy_jni") + +set(FastDeploy_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../libs/fastdeploy-android-0.4.0-shared") + +find_package(FastDeploy REQUIRED) + +include_directories(${CMAKE_CURRENT_SOURCE_DIR}) +include_directories(${FastDeploy_INCLUDE_DIRS}) + +add_library( + fastdeploy_jni + SHARED + utils_jni.cc + bitmap_jni.cc + vision/results_jni.cc + vision/visualize_jni.cc + vision/detection/picodet_jni.cc + vision/classification/paddleclas_model_jni.cc) + +find_library(log-lib log) + +target_link_libraries( + # Specifies the target library. + fastdeploy_jni + jnigraphics + ${FASTDEPLOY_LIBS} + GLESv2 + EGL + ${log-lib} +) +``` +完整的工程示例,请参考 [android/app/src/main/cpp/CMakelists.txt](../../../examples/vision/detection/paddledetection/android/app/src/main/cpp/) 以及 [android/app/build.gradle](../../../examples/vision/detection/paddledetection/android/app/build.gradle). + +## 更多FastDeploy Android 使用案例 +
+ +更多FastDeploy Android 使用案例请参考以下文档: +- [图像分类Android使用文档](../../../examples/vision/classification/paddleclas/android/README.md) +- [目标检测Android使用文档](../../../examples/vision/detection/paddledetection/android/README.md) diff --git a/docs/cn/faq/use_java_sdk_on_android.md b/docs/cn/faq/use_java_sdk_on_android.md new file mode 100644 index 00000000000..832ca1909c8 --- /dev/null +++ b/docs/cn/faq/use_java_sdk_on_android.md @@ -0,0 +1,2 @@ +## 在 Android 中使用 FastDeploy Java SDK +- TODO diff --git a/docs/cn/quantize.md b/docs/cn/quantize.md index 5411cac2995..6277c8385c0 100644 --- a/docs/cn/quantize.md +++ b/docs/cn/quantize.md @@ -2,7 +2,7 @@ # 量化加速 量化是一种流行的模型压缩方法,量化后的模型拥有更小的体积和更快的推理速度. -FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDeploy支持部署量化后的模型, 帮助用户实现推理加速. +FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDeploy支持推理部署量化后的模型, 帮助用户实现推理加速. ## FastDeploy 多个引擎和硬件支持量化模型部署 @@ -17,7 +17,7 @@ FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDe ## 模型量化 ### 量化方法 -基于PaddleSlim, 目前FastDeploy提供的的量化方法有量化蒸馏训练和离线量化, 量化蒸馏训练通过模型训练来获得量化模型, 离线量化不需要模型训练即可完成模型的量化. FastDeploy 对两种方式产出的量化模型均能部署. +基于PaddleSlim,目前FastDeploy提供的的量化方法有量化蒸馏训练和离线量化,量化蒸馏训练通过模型训练来获得量化模型,离线量化不需要模型训练即可完成模型的量化。 FastDeploy 对两种方式产出的量化模型均能部署。 两种方法的主要对比如下表所示: | 量化方法 | 量化过程耗时 | 量化模型精度 | 模型体积 | 推理速度 | @@ -25,13 +25,14 @@ FastDeploy基于PaddleSlim, 集成了一键模型量化的工具, 同时, FastDe | 离线量化 | 无需训练,耗时短 | 比量化蒸馏训练稍低 | 两者一致 | 两者一致 | | 量化蒸馏训练 | 需要训练,耗时稍高 | 较未量化模型有少量损失 | 两者一致 |两者一致 | -### 用户使用FastDeploy一键模型量化工具来量化模型 -Fastdeploy基于PaddleSlim, 为用户提供了一键模型量化的工具,请参考如下文档进行模型量化. +### 使用FastDeploy一键模型量化工具来量化模型 +Fastdeploy基于PaddleSlim, 为用户提供了一键模型量化的工具,请参考如下文档进行模型量化。 - [FastDeploy 一键模型量化](../../tools/quantization/) -当用户获得产出的量化模型之后,即可以使用FastDeploy来部署量化模型. +当用户获得产出的量化模型之后,即可以使用FastDeploy来部署量化模型。 -## 量化示例 +## 量化benchmark + 目前, FastDeploy已支持的模型量化如下表所示: ### YOLO 系列 diff --git a/examples/application/js/README.md b/examples/application/js/README.md index 4544c061a27..ced46e74129 100644 --- a/examples/application/js/README.md +++ b/examples/application/js/README.md @@ -33,3 +33,9 @@ |目标检测| coming soon | |图像分割| coming soon | |物品分类| coming soon | + + +## Contributor + +感谢飞桨开发者专家(PPDE)陈千鹤(github: [chenqianhe](https://github.com/chenqianhe))贡献的小程序、Web demo,项目[链接](https://github.com/chenqianhe/Paddle.js)。 + diff --git a/examples/vision/classification/paddleclas/android/README.md b/examples/vision/classification/paddleclas/android/README.md index 5a595241daf..05164d8435a 100644 --- a/examples/vision/classification/paddleclas/android/README.md +++ b/examples/vision/classification/paddleclas/android/README.md @@ -1,3 +1,155 @@ -## 图像分类 Android Demo 使用文档 +## 图像分类 PaddleClas Android Demo 使用文档 -- TODO +在 Android 上实现实时的图像分类功能,此 Demo 有很好的的易用性和开放性,如在 Demo 中跑自己训练好的模型等。 + +## 环境准备 + +1. 在本地环境安装好 Android Studio 工具,详细安装方法请见[Android Stuido 官网](https://developer.android.com/studio)。 +2. 准备一部 Android 手机,并开启 USB 调试模式。开启方法: `手机设置 -> 查找开发者选项 -> 打开开发者选项和 USB 调试模式` + +**注意**:如果您的 Android Studio 尚未配置 NDK ,请根据 Android Studio 用户指南中的[安装及配置 NDK 和 CMake ](https://developer.android.com/studio/projects/install-ndk)内容,预先配置好 NDK 。您可以选择最新的 NDK 版本,或者使用 FastDeploy Android 预测库版本一样的 NDK + +## 部署步骤 + +1. 目标检测 PaddleClas Demo 位于 `fastdeploy/examples/vision/classification/paddleclas/android` 目录 +2. 用 Android Studio 打开 paddleclas/android 工程 +3. 手机连接电脑,打开 USB 调试和文件传输模式,并在 Android Studio 上连接自己的手机设备(手机需要开启允许从 USB 安装软件权限) + +
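+
+(可选)连接完成后,可以先用 adb 确认设备已被正确识别。下面是一个通用的检查命令,并非本工程特有,这里假设本机已安装 Android SDK platform-tools 并将其加入 PATH:
+
+```bash
+# 列出当前已连接的设备,正常情况下状态应为 device,而非 unauthorized/offline
+adb devices
+```
+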

+image +

+ +> **注意:** +>> 如果您在导入项目、编译或者运行过程中遇到 NDK 配置错误的提示,请打开 ` File > Project Structure > SDK Location`,修改 `Andriod NDK location` 为您本机配置的 NDK 所在路径。本工程默认使用的NDK版本为20. +>> 如果您是通过 Andriod Studio 的 SDK Tools 下载的 NDK (见本章节"环境准备"),可以直接点击下拉框选择默认路径。 +>> 还有一种 NDK 配置方法,你可以在 `paddleclas/android/local.properties` 文件中手动完成 NDK 路径配置,如下图所示 +>> 如果以上步骤仍旧无法解决 NDK 配置错误,请尝试根据 Andriod Studio 官方文档中的[更新 Android Gradle 插件](https://developer.android.com/studio/releases/gradle-plugin?hl=zh-cn#updating-plugin)章节,尝试更新Android Gradle plugin版本。 + +4. 点击 Run 按钮,自动编译 APP 并安装到手机。(该过程会自动下载预编译的 FastDeploy Android 库,需要联网) +成功后效果如下,图一:APP 安装到手机;图二: APP 打开后的效果,会自动识别图片中的物体并标记;图三:APP设置选项,点击右上角的设置图片,可以设置不同选项进行体验。 + + | APP 图标 | APP 效果 | APP设置项 + | --- | --- | --- | + | ![app_pic ](https://user-images.githubusercontent.com/31974251/197170082-a2bdd49d-60ea-4df0-af63-18ed898a746e.jpg) | ![app_res](https://user-images.githubusercontent.com/31974251/197339363-ae7acd5d-88b8-4365-aea5-b27826c6a25f.jpg) | ![app_setup](https://user-images.githubusercontent.com/31974251/197339378-bb30b108-2d77-4b30-981d-d687b6fca8f6.jpg) | + +## PaddleClasModel Java API 说明 +- 模型初始化 API: 模型初始化API包含两种方式,方式一是通过构造函数直接初始化;方式二是,通过调用init函数,在合适的程序节点进行初始化。PaddleClasModel初始化参数说明如下: + - modelFile: String, paddle格式的模型文件路径,如 model.pdmodel + - paramFile: String, paddle格式的参数文件路径,如 model.pdiparams + - configFile: String, 模型推理的预处理配置文件,如 infer_cfg.yml + - labelFile: String, 可选参数,表示label标签文件所在路径,用于可视化,如 imagenet1k_label_list.txt,每一行包含一个label + - option: RuntimeOption,可选参数,模型初始化option。如果不传入该参数则会使用默认的运行时选项。 + +```java +// 构造函数: constructor w/o label file +public PaddleClasModel(); // 空构造函数,之后可以调用init初始化 +public PaddleClasModel(String modelFile, String paramsFile, String configFile); +public PaddleClasModel(String modelFile, String paramsFile, String configFile, String labelFile); +public PaddleClasModel(String modelFile, String paramsFile, String configFile, RuntimeOption option); +public PaddleClasModel(String modelFile, String paramsFile, String configFile, String labelFile, RuntimeOption option); +// 手动调用init初始化: call init manually w/o label file +public boolean init(String modelFile, String paramsFile, String configFile, RuntimeOption option); +public boolean init(String modelFile, String paramsFile, String configFile, String labelFile, RuntimeOption option); +``` +- 模型预测 API:模型预测API包含直接预测的API以及带可视化功能的API。直接预测是指,不保存图片以及不渲染结果到Bitmap上,仅预测推理结果。预测并且可视化是指,预测结果以及可视化,并将可视化后的图片保存到指定的途径,以及将可视化结果渲染在Bitmap(目前支持ARGB8888格式的Bitmap), 后续可将该Bitmap在camera中进行显示。 +```java +// 直接预测:不保存图片以及不渲染结果到Bitmap上 +public ClassifyResult predict(Bitmap ARGB8888Bitmap); +// 预测并且可视化:预测结果以及可视化,并将可视化后的图片保存到指定的途径,以及将可视化结果渲染在Bitmap上 +public ClassifyResult predict(Bitmap ARGB8888Bitmap, String savedImagePath, float scoreThreshold) +``` +- 模型资源释放 API:调用 release() API 可以释放模型资源,返回true表示释放成功,false表示失败;调用 initialized() 可以判断模型是否初始化成功,true表示初始化成功,false表示失败。 +```java +public boolean release(); // 释放native资源 +public boolean initialized(); // 检查是否初始化成功 +``` +- RuntimeOption设置说明 +```java +public void enableLiteFp16(); // 开启fp16精度推理 +public void disableLiteFP16(); // 关闭fp16精度推理 +public void setCpuThreadNum(int threadNum); // 设置线程数 +public void setLitePowerMode(LitePowerMode mode); // 设置能耗模式 +public void setLitePowerMode(String modeStr); // 通过字符串形式设置能耗模式 +public void enableRecordTimeOfRuntime(); // 是否打印模型运行耗时 +``` + +- 模型结果ClassifyResult说明 +```java +public float[] mScores; // [n] 得分 +public int[] mLabelIds; // [n] 分类ID +public boolean initialized(); // 检测结果是否有效 +``` + +- 模型调用示例1:使用构造函数以及默认的RuntimeOption +```java +import 
java.nio.ByteBuffer; +import android.graphics.Bitmap; +import android.opengl.GLES20; + +import com.baidu.paddle.fastdeploy.vision.ClassifyResult; +import com.baidu.paddle.fastdeploy.vision.classification.PaddleClasModel; + +// 初始化模型 +PaddleClasModel model = new PaddleClasModel("MobileNetV1_x0_25_infer/inference.pdmodel", + "MobileNetV1_x0_25_infer/inference.pdiparams", + "MobileNetV1_x0_25_infer/inference_cls.yml"); + +// 读取图片: 以下仅为读取Bitmap的伪代码 +ByteBuffer pixelBuffer = ByteBuffer.allocate(width * height * 4); +GLES20.glReadPixels(0, 0, width, height, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixelBuffer); +Bitmap ARGB8888ImageBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888); +ARGB8888ImageBitmap.copyPixelsFromBuffer(pixelBuffer); + +// 模型推理 +ClassifyResult result = model.predict(ARGB8888ImageBitmap); + +// 释放模型资源 +model.release(); +``` + +- 模型调用示例2: 在合适的程序节点,手动调用init,并自定义RuntimeOption +```java +// import 同上 ... +import com.baidu.paddle.fastdeploy.RuntimeOption; +import com.baidu.paddle.fastdeploy.LitePowerMode; +import com.baidu.paddle.fastdeploy.vision.ClassifyResult; +import com.baidu.paddle.fastdeploy.vision.classification.PaddleClasModel; +// 新建空模型 +PaddleClasModel model = new PaddleClasModel(); +// 模型路径 +String modelFile = "MobileNetV1_x0_25_infer/inference.pdmodel"; +String paramFile = "MobileNetV1_x0_25_infer/inference.pdiparams"; +String configFile = "MobileNetV1_x0_25_infer/inference_cls.yml"; +// 指定RuntimeOption +RuntimeOption option = new RuntimeOption(); +option.setCpuThreadNum(2); +option.setLitePowerMode(LitePowerMode.LITE_POWER_HIGH); +option.enableRecordTimeOfRuntime(); +option.enableLiteFp16(); +// 使用init函数初始化 +model.init(modelFile, paramFile, configFile, option); +// Bitmap读取、模型预测、资源释放 同上 ... +``` +更详细的用法请参考 [MainActivity](./app/src/main/java/com/baidu/paddle/fastdeploy/examples/MainActivity.java#L207) 中的用法 + +## 替换 FastDeploy 预测库和模型 +替换FastDeploy预测库和模型的步骤非常简单。预测库所在的位置为 `app/libs/fastdeploy-android-xxx-shared`,其中 `xxx` 表示当前您使用的预测库版本号。模型所在的位置为,`app/src/main/assets/models/MobileNetV1_x0_25_infer`。 +- 替换FastDeploy预测库的步骤: + - 下载或编译最新的FastDeploy Android预测库,解压缩后放在 `app/libs` 目录下; + - 修改 `app/src/main/cpp/CMakeLists.txt` 中的预测库路径,指向您下载或编译的预测库路径。如: +```cmake +set(FastDeploy_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../libs/fastdeploy-android-xxx-shared") +``` +- 替换PaddleClas模型的步骤: + - 将您的PaddleClas分类模型放在 `app/src/main/assets/models` 目录下; + - 修改 `app/src/main/res/values/strings.xml` 中模型路径的默认值,如: +```xml + +models/MobileNetV1_x0_25_infer +labels/imagenet1k_label_list.txt +``` + +## 如何通过 JNI 在 Native 层接入 FastDeploy C++ API ? 
+如果您对如何通过JNI来接入FastDeploy C++ API感兴趣,可以参考以下内容: +- [app/src/main/cpp 代码实现](./app/src/main/cpp/) +- [在 Android 中通过 JNI 使用 FastDeploy C++ SDK](../../../../../docs/cn/faq/use_cpp_sdk_on_android.md) diff --git a/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java b/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java index f0524604e05..b4a56e30955 100644 --- a/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java +++ b/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java @@ -14,6 +14,20 @@ public PaddleClasModel() { mInitialized = false; } + // Constructor with default runtime option + public PaddleClasModel(String modelFile, + String paramsFile, + String configFile) { + init_(modelFile, paramsFile, configFile, "", new RuntimeOption()); + } + + public PaddleClasModel(String modelFile, + String paramsFile, + String configFile, + String labelFile) { + init_(modelFile, paramsFile, configFile, labelFile, new RuntimeOption()); + } + // Constructor without label file public PaddleClasModel(String modelFile, String paramsFile, diff --git a/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java b/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java index bbd52eecee7..9729eeb8a43 100644 --- a/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java +++ b/examples/vision/classification/paddleclas/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java @@ -14,6 +14,20 @@ public PicoDet() { mInitialized = false; } + // Constructor with default runtime option + public PicoDet(String modelFile, + String paramsFile, + String configFile) { + init_(modelFile, paramsFile, configFile, "", new RuntimeOption()); + } + + public PicoDet(String modelFile, + String paramsFile, + String configFile, + String labelFile) { + init_(modelFile, paramsFile, configFile, labelFile, new RuntimeOption()); + } + // Constructor without label file public PicoDet(String modelFile, String paramsFile, diff --git a/examples/vision/detection/paddledetection/android/README.md b/examples/vision/detection/paddledetection/android/README.md index 6a82dd18c1f..4a57f98f953 100644 --- a/examples/vision/detection/paddledetection/android/README.md +++ b/examples/vision/detection/paddledetection/android/README.md @@ -1,3 +1,156 @@ -## 目标检测 Android Demo 使用文档 +# 目标检测 PicoDet Android Demo 使用文档 -- TODO +在 Android 上实现实时的目标检测功能,此 Demo 有很好的的易用性和开放性,如在 Demo 中跑自己训练好的模型等。 + +## 环境准备 + +1. 在本地环境安装好 Android Studio 工具,详细安装方法请见[Android Stuido 官网](https://developer.android.com/studio)。 +2. 准备一部 Android 手机,并开启 USB 调试模式。开启方法: `手机设置 -> 查找开发者选项 -> 打开开发者选项和 USB 调试模式` + +**注意**:如果您的 Android Studio 尚未配置 NDK ,请根据 Android Studio 用户指南中的[安装及配置 NDK 和 CMake ](https://developer.android.com/studio/projects/install-ndk)内容,预先配置好 NDK 。您可以选择最新的 NDK 版本,或者使用 FastDeploy Android 预测库版本一样的 NDK + +## 部署步骤 + +1. 目标检测 PicoDet Demo 位于 `fastdeploy/examples/vision/detection/paddledetection/android` 目录 +2. 用 Android Studio 打开 paddledetection/android 工程 +3. 
手机连接电脑,打开 USB 调试和文件传输模式,并在 Android Studio 上连接自己的手机设备(手机需要开启允许从 USB 安装软件权限) + +
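+
+(可选)若后续 APP 安装到手机后出现闪退,可以借助 adb 查看崩溃日志辅助排查。以下为通用的调试命令,并非本工程特有,假设设备已通过 USB 连接并开启了调试模式:
+
+```bash
+# 只输出 Java 层未捕获异常(应用崩溃)相关的日志
+adb logcat -s AndroidRuntime
+```
+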

+image +

+ +> **注意:** +>> 如果您在导入项目、编译或者运行过程中遇到 NDK 配置错误的提示,请打开 ` File > Project Structure > SDK Location`,修改 `Andriod NDK location` 为您本机配置的 NDK 所在路径。本工程默认使用的NDK版本为20. +>> 如果您是通过 Andriod Studio 的 SDK Tools 下载的 NDK (见本章节"环境准备"),可以直接点击下拉框选择默认路径。 +>> 还有一种 NDK 配置方法,你可以在 `paddledetection/android/local.properties` 文件中手动完成 NDK 路径配置,如下图所示 +>> 如果以上步骤仍旧无法解决 NDK 配置错误,请尝试根据 Andriod Studio 官方文档中的[更新 Android Gradle 插件](https://developer.android.com/studio/releases/gradle-plugin?hl=zh-cn#updating-plugin)章节,尝试更新Android Gradle plugin版本。 + +4. 点击 Run 按钮,自动编译 APP 并安装到手机。(该过程会自动下载预编译的 FastDeploy Android 库,需要联网) +成功后效果如下,图一:APP 安装到手机;图二: APP 打开后的效果,会自动识别图片中的物体并标记;图三:APP设置选项,点击右上角的设置图片,可以设置不同选项进行体验。 + + | APP 图标 | APP 效果 | APP设置项 + | --- | --- | --- | + | ![app_pic ](https://user-images.githubusercontent.com/31974251/197170082-a2bdd49d-60ea-4df0-af63-18ed898a746e.jpg) | ![app_res](https://user-images.githubusercontent.com/31974251/197169609-bb214af3-d6e7-4433-bb96-1225cddd441c.jpg) | ![app_setup](https://user-images.githubusercontent.com/31974251/197332983-afbfa6d5-4a3b-4c54-a528-4a3e58441be1.jpg) | + +## PicoDet Java API 说明 +- 模型初始化 API: 模型初始化API包含两种方式,方式一是通过构造函数直接初始化;方式二是,通过调用init函数,在合适的程序节点进行初始化。PicoDet初始化参数说明如下: + - modelFile: String, paddle格式的模型文件路径,如 model.pdmodel + - paramFile: String, paddle格式的参数文件路径,如 model.pdiparams + - configFile: String, 模型推理的预处理配置文件,如 infer_cfg.yml + - labelFile: String, 可选参数,表示label标签文件所在路径,用于可视化,如 coco_label_list.txt,每一行包含一个label + - option: RuntimeOption,可选参数,模型初始化option。如果不传入该参数则会使用默认的运行时选项。 + +```java +// 构造函数: constructor w/o label file +public PicoDet(); // 空构造函数,之后可以调用init初始化 +public PicoDet(String modelFile, String paramsFile, String configFile); +public PicoDet(String modelFile, String paramsFile, String configFile, String labelFile); +public PicoDet(String modelFile, String paramsFile, String configFile, RuntimeOption option); +public PicoDet(String modelFile, String paramsFile, String configFile, String labelFile, RuntimeOption option); +// 手动调用init初始化: call init manually w/o label file +public boolean init(String modelFile, String paramsFile, String configFile, RuntimeOption option); +public boolean init(String modelFile, String paramsFile, String configFile, String labelFile, RuntimeOption option); +``` +- 模型预测 API:模型预测API包含直接预测的API以及带可视化功能的API。直接预测是指,不保存图片以及不渲染结果到Bitmap上,仅预测推理结果。预测并且可视化是指,预测结果以及可视化,并将可视化后的图片保存到指定的途径,以及将可视化结果渲染在Bitmap(目前支持ARGB8888格式的Bitmap), 后续可将该Bitmap在camera中进行显示。 +```java +// 直接预测:不保存图片以及不渲染结果到Bitmap上 +public DetectionResult predict(Bitmap ARGB8888Bitmap); +// 预测并且可视化:预测结果以及可视化,并将可视化后的图片保存到指定的途径,以及将可视化结果渲染在Bitmap上 +public DetectionResult predict(Bitmap ARGB8888Bitmap, String savedImagePath, float scoreThreshold) +``` +- 模型资源释放 API:调用 release() API 可以释放模型资源,返回true表示释放成功,false表示失败;调用 initialized() 可以判断模型是否初始化成功,true表示初始化成功,false表示失败。 +```java +public boolean release(); // 释放native资源 +public boolean initialized(); // 检查是否初始化成功 +``` +- RuntimeOption设置说明 +```java +public void enableLiteFp16(); // 开启fp16精度推理 +public void disableLiteFP16(); // 关闭fp16精度推理 +public void setCpuThreadNum(int threadNum); // 设置线程数 +public void setLitePowerMode(LitePowerMode mode); // 设置能耗模式 +public void setLitePowerMode(String modeStr); // 通过字符串形式设置能耗模式 +public void enableRecordTimeOfRuntime(); // 是否打印模型运行耗时 +``` + +- 模型结果DetectionResult说明 +```java +public float[][] mBoxes; // [n,4] 检测框 (x1,y1,x2,y2) +public float[] mScores; // [n] 得分 +public int[] mLabelIds; // [n] 分类ID +public boolean initialized(); // 检测结果是否有效 +``` + +- 模型调用示例1:使用构造函数以及默认的RuntimeOption +```java +import 
java.nio.ByteBuffer; +import android.graphics.Bitmap; +import android.opengl.GLES20; + +import com.baidu.paddle.fastdeploy.vision.DetectionResult; +import com.baidu.paddle.fastdeploy.vision.detection.PicoDet; + +// 初始化模型 +PicoDet model = new PicoDet("picodet_s_320_coco_lcnet/model.pdmodel", + "picodet_s_320_coco_lcnet/model.pdiparams", + "picodet_s_320_coco_lcnet/infer_cfg.yml"); + +// 读取图片: 以下仅为读取Bitmap的伪代码 +ByteBuffer pixelBuffer = ByteBuffer.allocate(width * height * 4); +GLES20.glReadPixels(0, 0, width, height, GLES20.GL_RGBA, GLES20.GL_UNSIGNED_BYTE, pixelBuffer); +Bitmap ARGB8888ImageBitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888); +ARGB8888ImageBitmap.copyPixelsFromBuffer(pixelBuffer); + +// 模型推理 +DetectionResult result = model.predict(ARGB8888ImageBitmap); + +// 释放模型资源 +model.release(); +``` + +- 模型调用示例2: 在合适的程序节点,手动调用init,并自定义RuntimeOption +```java +// import 同上 ... +import com.baidu.paddle.fastdeploy.RuntimeOption; +import com.baidu.paddle.fastdeploy.LitePowerMode; +import com.baidu.paddle.fastdeploy.vision.DetectionResult; +import com.baidu.paddle.fastdeploy.vision.detection.PicoDet; +// 新建空模型 +PicoDet model = new PicoDet(); +// 模型路径 +String modelFile = "picodet_s_320_coco_lcnet/model.pdmodel"; +String paramFile = "picodet_s_320_coco_lcnet/model.pdiparams"; +String configFile = "picodet_s_320_coco_lcnet/infer_cfg.yml"; +// 指定RuntimeOption +RuntimeOption option = new RuntimeOption(); +option.setCpuThreadNum(2); +option.setLitePowerMode(LitePowerMode.LITE_POWER_HIGH); +option.enableRecordTimeOfRuntime(); +option.enableLiteFp16(); +// 使用init函数初始化 +model.init(modelFile, paramFile, configFile, option); +// Bitmap读取、模型预测、资源释放 同上 ... +``` +更详细的用法请参考 [MainActivity](./app/src/main/java/com/baidu/paddle/fastdeploy/examples/MainActivity.java#L207) 中的用法 + +## 替换 FastDeploy 预测库和模型 +替换FastDeploy预测库和模型的步骤非常简单。预测库所在的位置为 `app/libs/fastdeploy-android-xxx-shared`,其中 `xxx` 表示当前您使用的预测库版本号。模型所在的位置为,`app/src/main/assets/models/picodet_s_320_coco_lcnet`。 +- 替换FastDeploy预测库的步骤: + - 下载或编译最新的FastDeploy Android预测库,解压缩后放在 `app/libs` 目录下; + - 修改 `app/src/main/cpp/CMakeLists.txt` 中的预测库路径,指向您下载或编译的预测库路径。如: +```cmake +set(FastDeploy_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../../../libs/fastdeploy-android-xxx-shared") +``` +- 替换PicoDet模型的步骤: + - 将您的PicoDet模型放在 `app/src/main/assets/models` 目录下; + - 修改 `app/src/main/res/values/strings.xml` 中模型路径的默认值,如: +```xml + +models/picodet_s_320_coco_lcnet +labels/coco_label_list.txt +``` + +## 如何通过 JNI 在 Native 层接入 FastDeploy C++ API ? 
+如果您对如何通过JNI来接入FastDeploy C++ API感兴趣,可以参考以下内容: +- [app/src/main/cpp 代码实现](./app/src/main/cpp/) +- [在 Android 中使用 FastDeploy C++ SDK](../../../../../docs/cn/faq/use_cpp_sdk_on_android.md) diff --git a/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java b/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java index f0524604e05..b4a56e30955 100644 --- a/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java +++ b/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/classification/PaddleClasModel.java @@ -14,6 +14,20 @@ public PaddleClasModel() { mInitialized = false; } + // Constructor with default runtime option + public PaddleClasModel(String modelFile, + String paramsFile, + String configFile) { + init_(modelFile, paramsFile, configFile, "", new RuntimeOption()); + } + + public PaddleClasModel(String modelFile, + String paramsFile, + String configFile, + String labelFile) { + init_(modelFile, paramsFile, configFile, labelFile, new RuntimeOption()); + } + // Constructor without label file public PaddleClasModel(String modelFile, String paramsFile, diff --git a/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java b/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java index bbd52eecee7..9729eeb8a43 100644 --- a/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java +++ b/examples/vision/detection/paddledetection/android/app/src/main/java/com/baidu/paddle/fastdeploy/vision/detection/PicoDet.java @@ -14,6 +14,20 @@ public PicoDet() { mInitialized = false; } + // Constructor with default runtime option + public PicoDet(String modelFile, + String paramsFile, + String configFile) { + init_(modelFile, paramsFile, configFile, "", new RuntimeOption()); + } + + public PicoDet(String modelFile, + String paramsFile, + String configFile, + String labelFile) { + init_(modelFile, paramsFile, configFile, labelFile, new RuntimeOption()); + } + // Constructor without label file public PicoDet(String modelFile, String paramsFile, diff --git a/fastdeploy/backends/paddle/paddle_backend.cc b/fastdeploy/backends/paddle/paddle_backend.cc index 674a3795481..25951dae550 100644 --- a/fastdeploy/backends/paddle/paddle_backend.cc +++ b/fastdeploy/backends/paddle/paddle_backend.cc @@ -19,6 +19,7 @@ namespace fastdeploy { void PaddleBackend::BuildOption(const PaddleBackendOption& option) { + option_ = option; if (option.use_gpu) { config_.EnableUseGpu(option.gpu_mem_init_size, option.gpu_id); if (option.enable_trt) { @@ -190,6 +191,7 @@ bool PaddleBackend::Infer(std::vector& inputs, outputs->resize(outputs_desc_.size()); for (size_t i = 0; i < outputs_desc_.size(); ++i) { auto handle = predictor_->GetOutputHandle(outputs_desc_[i].name); + (*outputs)[i].is_pinned_memory = option_.enable_pinned_memory; CopyTensorToCpu(handle, &((*outputs)[i])); } return true; diff --git a/fastdeploy/backends/paddle/paddle_backend.h b/fastdeploy/backends/paddle/paddle_backend.h index 78b939feaea..1d4f53db3d8 100755 --- a/fastdeploy/backends/paddle/paddle_backend.h +++ b/fastdeploy/backends/paddle/paddle_backend.h @@ -53,6 +53,7 
@@ struct PaddleBackendOption { int gpu_mem_init_size = 100; // gpu device id int gpu_id = 0; + bool enable_pinned_memory = false; std::vector delete_pass_names = {}; }; @@ -105,6 +106,7 @@ class PaddleBackend : public BaseBackend { std::map>* opt_shape) const; void SetTRTDynamicShapeToConfig(const PaddleBackendOption& option); #endif + PaddleBackendOption option_; paddle_infer::Config config_; std::shared_ptr predictor_; std::vector inputs_desc_; diff --git a/fastdeploy/backends/paddle/util.cc b/fastdeploy/backends/paddle/util.cc index 005f9966b1a..d8cc1dbb936 100644 --- a/fastdeploy/backends/paddle/util.cc +++ b/fastdeploy/backends/paddle/util.cc @@ -67,7 +67,7 @@ void CopyTensorToCpu(std::unique_ptr& tensor, std::vector shape; auto tmp_shape = tensor->shape(); shape.assign(tmp_shape.begin(), tmp_shape.end()); - fd_tensor->Allocate(shape, fd_dtype, tensor->name()); + fd_tensor->Resize(shape, fd_dtype, tensor->name()); if (fd_tensor->dtype == FDDataType::FP32) { tensor->CopyToCpu(static_cast(fd_tensor->MutableData())); return; diff --git a/fastdeploy/backends/tensorrt/trt_backend.cc b/fastdeploy/backends/tensorrt/trt_backend.cc index 395215db030..363a9d1ce42 100644 --- a/fastdeploy/backends/tensorrt/trt_backend.cc +++ b/fastdeploy/backends/tensorrt/trt_backend.cc @@ -306,17 +306,21 @@ bool TrtBackend::Infer(std::vector& inputs, SetInputs(inputs); AllocateOutputsBuffer(outputs); + if (!context_->enqueueV2(bindings_.data(), stream_, nullptr)) { FDERROR << "Failed to Infer with TensorRT." << std::endl; return false; } for (size_t i = 0; i < outputs->size(); ++i) { FDASSERT(cudaMemcpyAsync((*outputs)[i].Data(), - outputs_buffer_[(*outputs)[i].name].data(), + outputs_device_buffer_[(*outputs)[i].name].data(), (*outputs)[i].Nbytes(), cudaMemcpyDeviceToHost, stream_) == 0, "[ERROR] Error occurs while copy memory from GPU to CPU."); } + FDASSERT(cudaStreamSynchronize(stream_) == cudaSuccess, + "[ERROR] Error occurs while sync cuda stream."); + return true; } @@ -332,10 +336,10 @@ void TrtBackend::GetInputOutputInfo() { auto dtype = engine_->getBindingDataType(i); if (engine_->bindingIsInput(i)) { inputs_desc_.emplace_back(TrtValueInfo{name, shape, dtype}); - inputs_buffer_[name] = FDDeviceBuffer(dtype); + inputs_device_buffer_[name] = FDDeviceBuffer(dtype); } else { outputs_desc_.emplace_back(TrtValueInfo{name, shape, dtype}); - outputs_buffer_[name] = FDDeviceBuffer(dtype); + outputs_device_buffer_[name] = FDDeviceBuffer(dtype); } } bindings_.resize(num_binds); @@ -357,30 +361,31 @@ void TrtBackend::SetInputs(const std::vector& inputs) { "please use INT32 input"); } else { // no copy - inputs_buffer_[item.name].SetExternalData(dims, item.Data()); + inputs_device_buffer_[item.name].SetExternalData(dims, item.Data()); } } else { // Allocate input buffer memory - inputs_buffer_[item.name].resize(dims); + inputs_device_buffer_[item.name].resize(dims); // copy from cpu to gpu if (item.dtype == FDDataType::INT64) { int64_t* data = static_cast(const_cast(item.Data())); std::vector casted_data(data, data + item.Numel()); - FDASSERT(cudaMemcpyAsync(inputs_buffer_[item.name].data(), + FDASSERT(cudaMemcpyAsync(inputs_device_buffer_[item.name].data(), static_cast(casted_data.data()), item.Nbytes() / 2, cudaMemcpyHostToDevice, stream_) == 0, "Error occurs while copy memory from CPU to GPU."); } else { - FDASSERT(cudaMemcpyAsync(inputs_buffer_[item.name].data(), item.Data(), + FDASSERT(cudaMemcpyAsync(inputs_device_buffer_[item.name].data(), + item.Data(), item.Nbytes(), cudaMemcpyHostToDevice, stream_) == 0, 
"Error occurs while copy memory from CPU to GPU."); } } // binding input buffer - bindings_[idx] = inputs_buffer_[item.name].data(); + bindings_[idx] = inputs_device_buffer_[item.name].data(); } } @@ -399,15 +404,19 @@ void TrtBackend::AllocateOutputsBuffer(std::vector* outputs) { "Cannot find output: %s of tensorrt network from the original model.", outputs_desc_[i].name.c_str()); auto ori_idx = iter->second; + // set user's outputs info std::vector shape(output_dims.d, output_dims.d + output_dims.nbDims); + (*outputs)[ori_idx].is_pinned_memory = option_.enable_pinned_memory; (*outputs)[ori_idx].Resize(shape, GetFDDataType(outputs_desc_[i].dtype), outputs_desc_[i].name); + // Allocate output buffer memory - outputs_buffer_[outputs_desc_[i].name].resize(output_dims); + outputs_device_buffer_[outputs_desc_[i].name].resize(output_dims); + // binding output buffer - bindings_[idx] = outputs_buffer_[outputs_desc_[i].name].data(); + bindings_[idx] = outputs_device_buffer_[outputs_desc_[i].name].data(); } } diff --git a/fastdeploy/backends/tensorrt/trt_backend.h b/fastdeploy/backends/tensorrt/trt_backend.h index ad3ace6a438..09f18b2dff7 100755 --- a/fastdeploy/backends/tensorrt/trt_backend.h +++ b/fastdeploy/backends/tensorrt/trt_backend.h @@ -70,6 +70,7 @@ struct TrtBackendOption { std::map> min_shape; std::map> opt_shape; std::string serialize_file = ""; + bool enable_pinned_memory = false; // inside parameter, maybe remove next version bool remove_multiclass_nms_ = false; @@ -118,8 +119,8 @@ class TrtBackend : public BaseBackend { std::vector bindings_; std::vector inputs_desc_; std::vector outputs_desc_; - std::map inputs_buffer_; - std::map outputs_buffer_; + std::map inputs_device_buffer_; + std::map outputs_device_buffer_; std::string calibration_str_; diff --git a/fastdeploy/backends/tensorrt/utils.h b/fastdeploy/backends/tensorrt/utils.h index f7623052607..7f2e7344bad 100644 --- a/fastdeploy/backends/tensorrt/utils.h +++ b/fastdeploy/backends/tensorrt/utils.h @@ -206,6 +206,8 @@ class FDGenericBuffer { }; using FDDeviceBuffer = FDGenericBuffer; +using FDDeviceHostBuffer = FDGenericBuffer; class FDTrtLogger : public nvinfer1::ILogger { public: diff --git a/fastdeploy/core/allocate.cc b/fastdeploy/core/allocate.cc index 285642d5ce3..e71cd34434a 100644 --- a/fastdeploy/core/allocate.cc +++ b/fastdeploy/core/allocate.cc @@ -34,6 +34,12 @@ bool FDDeviceAllocator::operator()(void** ptr, size_t size) const { void FDDeviceFree::operator()(void* ptr) const { cudaFree(ptr); } +bool FDDeviceHostAllocator::operator()(void** ptr, size_t size) const { + return cudaMallocHost(ptr, size) == cudaSuccess; +} + +void FDDeviceHostFree::operator()(void* ptr) const { cudaFreeHost(ptr); } + #endif } // namespace fastdeploy diff --git a/fastdeploy/core/allocate.h b/fastdeploy/core/allocate.h index c48bb7cee3c..1e88787f4d6 100644 --- a/fastdeploy/core/allocate.h +++ b/fastdeploy/core/allocate.h @@ -45,6 +45,16 @@ class FASTDEPLOY_DECL FDDeviceFree { void operator()(void* ptr) const; }; +class FASTDEPLOY_DECL FDDeviceHostAllocator { + public: + bool operator()(void** ptr, size_t size) const; +}; + +class FASTDEPLOY_DECL FDDeviceHostFree { + public: + void operator()(void* ptr) const; +}; + #endif } // namespace fastdeploy diff --git a/fastdeploy/core/fd_tensor.cc b/fastdeploy/core/fd_tensor.cc index 1161d2b0e2d..e98a81e1b78 100644 --- a/fastdeploy/core/fd_tensor.cc +++ b/fastdeploy/core/fd_tensor.cc @@ -207,9 +207,27 @@ bool FDTensor::ReallocFn(size_t nbytes) { "-DWITH_GPU=ON," "so this is an unexpected problem 
happend."); #endif + } else { + if (is_pinned_memory) { +#ifdef WITH_GPU + size_t original_nbytes = Nbytes(); + if (nbytes > original_nbytes) { + if (buffer_ != nullptr) { + FDDeviceHostFree()(buffer_); + } + FDDeviceHostAllocator()(&buffer_, nbytes); + } + return buffer_ != nullptr; +#else + FDASSERT(false, + "The FastDeploy FDTensor allocator didn't compile under " + "-DWITH_GPU=ON," + "so this is an unexpected problem happend."); +#endif + } + buffer_ = realloc(buffer_, nbytes); + return buffer_ != nullptr; } - buffer_ = realloc(buffer_, nbytes); - return buffer_ != nullptr; } void FDTensor::FreeFn() { @@ -220,7 +238,13 @@ void FDTensor::FreeFn() { FDDeviceFree()(buffer_); #endif } else { - FDHostFree()(buffer_); + if (is_pinned_memory) { +#ifdef WITH_GPU + FDDeviceHostFree()(buffer_); +#endif + } else { + FDHostFree()(buffer_); + } } buffer_ = nullptr; } @@ -231,7 +255,6 @@ void FDTensor::CopyBuffer(void* dst, const void* src, size_t nbytes) { #ifdef WITH_GPU FDASSERT(cudaMemcpy(dst, src, nbytes, cudaMemcpyDeviceToDevice) == 0, "[ERROR] Error occurs while copy memory from GPU to GPU"); - #else FDASSERT(false, "The FastDeploy didn't compile under -DWITH_GPU=ON, so copying " @@ -239,7 +262,19 @@ void FDTensor::CopyBuffer(void* dst, const void* src, size_t nbytes) { "an unexpected problem happend."); #endif } else { - std::memcpy(dst, src, nbytes); + if (is_pinned_memory) { +#ifdef WITH_GPU + FDASSERT(cudaMemcpy(dst, src, nbytes, cudaMemcpyHostToHost) == 0, + "[ERROR] Error occurs while copy memory from host to host"); +#else + FDASSERT(false, + "The FastDeploy didn't compile under -DWITH_GPU=ON, so copying " + "gpu buffer is " + "an unexpected problem happend."); +#endif + } else { + std::memcpy(dst, src, nbytes); + } } } diff --git a/fastdeploy/core/fd_tensor.h b/fastdeploy/core/fd_tensor.h index 7e8bb785199..1619fe27118 100644 --- a/fastdeploy/core/fd_tensor.h +++ b/fastdeploy/core/fd_tensor.h @@ -40,6 +40,10 @@ struct FASTDEPLOY_DECL FDTensor { // so we can skip data transfer, which may improve the efficience Device device = Device::CPU; + // Whether the data buffer is in pinned memory, which is allocated + // with cudaMallocHost() + bool is_pinned_memory = false; + // if the external data is not on CPU, we use this temporary buffer // to transfer data to CPU at some cases we need to visit the // other devices' data diff --git a/fastdeploy/pybind/runtime.cc b/fastdeploy/pybind/runtime.cc index 6d8eb78048a..70f9a5917b1 100755 --- a/fastdeploy/pybind/runtime.cc +++ b/fastdeploy/pybind/runtime.cc @@ -44,6 +44,8 @@ void BindRuntime(pybind11::module& m) { .def("enable_trt_fp16", &RuntimeOption::EnableTrtFP16) .def("disable_trt_fp16", &RuntimeOption::DisableTrtFP16) .def("set_trt_cache_file", &RuntimeOption::SetTrtCacheFile) + .def("enable_pinned_memory", &RuntimeOption::EnablePinnedMemory) + .def("disable_pinned_memory", &RuntimeOption::DisablePinnedMemory) .def("enable_paddle_trt_collect_shape", &RuntimeOption::EnablePaddleTrtCollectShape) .def("disable_paddle_trt_collect_shape", &RuntimeOption::DisablePaddleTrtCollectShape) .def_readwrite("model_file", &RuntimeOption::model_file) @@ -200,6 +202,7 @@ void BindRuntime(pybind11::module& m) { .def("numel", &FDTensor::Numel) .def("nbytes", &FDTensor::Nbytes) .def_readwrite("name", &FDTensor::name) + .def_readwrite("is_pinned_memory", &FDTensor::is_pinned_memory) .def_readonly("shape", &FDTensor::shape) .def_readonly("dtype", &FDTensor::dtype) .def_readonly("device", &FDTensor::device); diff --git a/fastdeploy/runtime.cc 
b/fastdeploy/runtime.cc index 0877402d727..5037dc1206d 100755 --- a/fastdeploy/runtime.cc +++ b/fastdeploy/runtime.cc @@ -356,6 +356,10 @@ void RuntimeOption::EnableTrtFP16() { trt_enable_fp16 = true; } void RuntimeOption::DisableTrtFP16() { trt_enable_fp16 = false; } +void RuntimeOption::EnablePinnedMemory() { enable_pinned_memory = true; } + +void RuntimeOption::DisablePinnedMemory() { enable_pinned_memory = false; } + void RuntimeOption::SetTrtCacheFile(const std::string& cache_file_path) { trt_serialize_file = cache_file_path; } @@ -503,6 +507,7 @@ void Runtime::CreatePaddleBackend() { pd_option.gpu_id = option.device_id; pd_option.delete_pass_names = option.pd_delete_pass_names; pd_option.cpu_thread_num = option.cpu_thread_num; + pd_option.enable_pinned_memory = option.enable_pinned_memory; #ifdef ENABLE_TRT_BACKEND if (pd_option.use_gpu && option.pd_enable_trt) { pd_option.enable_trt = true; @@ -516,6 +521,7 @@ void Runtime::CreatePaddleBackend() { trt_option.min_shape = option.trt_min_shape; trt_option.opt_shape = option.trt_opt_shape; trt_option.serialize_file = option.trt_serialize_file; + trt_option.enable_pinned_memory = option.enable_pinned_memory; pd_option.trt_option = trt_option; } #endif @@ -606,6 +612,7 @@ void Runtime::CreateTrtBackend() { trt_option.min_shape = option.trt_min_shape; trt_option.opt_shape = option.trt_opt_shape; trt_option.serialize_file = option.trt_serialize_file; + trt_option.enable_pinned_memory = option.enable_pinned_memory; // TODO(jiangjiajun): inside usage, maybe remove this later trt_option.remove_multiclass_nms_ = option.remove_multiclass_nms_; diff --git a/fastdeploy/runtime.h b/fastdeploy/runtime.h index 32ad1615c76..021103cb21f 100755 --- a/fastdeploy/runtime.h +++ b/fastdeploy/runtime.h @@ -204,6 +204,15 @@ struct FASTDEPLOY_DECL RuntimeOption { */ void SetTrtCacheFile(const std::string& cache_file_path); + /** + * @brief Enable pinned memory. Pinned memory can be utilized to speedup the data transfer between CPU and GPU. Currently it's only suppurted in TRT backend and Paddle Inference backend. + */ + void EnablePinnedMemory(); + + /** + * @brief Disable pinned memory + */ + void DisablePinnedMemory(); /** * @brief Enable to collect shape in paddle trt backend @@ -223,6 +232,8 @@ struct FASTDEPLOY_DECL RuntimeOption { Device device = Device::CPU; + bool enable_pinned_memory = false; + // ======Only for ORT Backend======== // -1 means use default value by ort // 0: ORT_DISABLE_ALL 1: ORT_ENABLE_BASIC 2: ORT_ENABLE_EXTENDED 3: diff --git a/python/fastdeploy/runtime.py b/python/fastdeploy/runtime.py index 90e64d4005c..61d1039318b 100755 --- a/python/fastdeploy/runtime.py +++ b/python/fastdeploy/runtime.py @@ -319,6 +319,16 @@ def disable_trt_fp16(self): """ return self._option.disable_trt_fp16() + def enable_pinned_memory(self): + """Enable pinned memory. Pinned memory can be utilized to speedup the data transfer between CPU and GPU. Currently it's only suppurted in TRT backend and Paddle Inference backend. + """ + return self._option.enable_pinned_memory() + + def disable_pinned_memory(self): + """Disable pinned memory. + """ + return self._option.disable_pinned_memory() + def enable_paddle_to_trt(self): """While using TensorRT backend, enable_paddle_to_trt() will change to use Paddle Inference backend, and use its integrated TensorRT instead. 
""" diff --git a/tests/release_task/compare_with_gt.py b/tests/release_task/compare_with_gt.py index 2770fe18212..d248b1b1810 100644 --- a/tests/release_task/compare_with_gt.py +++ b/tests/release_task/compare_with_gt.py @@ -1,6 +1,39 @@ import numpy as np import re +diff_score_threshold = { + "linux-x64": { + "label_diff": 0, + "score_diff": 1e-4, + "boxes_diff_ratio": 1e-4, + "boxes_diff": 1e-3 + }, + "linux-aarch64": { + "label_diff": 0, + "score_diff": 1e-4, + "boxes_diff_ratio": 1e-4, + "boxes_diff": 1e-3 + }, + "osx-x86_64": { + "label_diff": 0, + "score_diff": 1e-4, + "boxes_diff_ratio": 2e-4, + "boxes_diff": 1e-3 + }, + "osx-arm64": { + "label_diff": 0, + "score_diff": 1e-4, + "boxes_diff_ratio": 2e-4, + "boxes_diff": 1e-3 + }, + "win-x64": { + "label_diff": 0, + "score_diff": 5e-4, + "boxes_diff_ratio": 1e-3, + "boxes_diff": 1e-3 + } +} + def parse_arguments(): import argparse @@ -59,6 +92,7 @@ def save_numpy_result(file_path, error_msg): def check_result(gt_result, infer_result, args): + platform = args.platform if len(gt_result) != len(infer_result): infer_result = infer_result[-len(gt_result):] diff = np.abs(gt_result - infer_result) @@ -68,17 +102,19 @@ def check_result(gt_result, infer_result, args): boxes_diff_ratio = boxes_diff / (infer_result[:, :-2] + 1e-6) is_diff = False backend = args.result_path.split(".")[0] - if (label_diff > 0).any(): + if (label_diff > diff_score_threshold[platform]["label_diff"]).any(): print(args.platform, args.device, "label diff ", label_diff) is_diff = True label_diff_bool_file = args.platform + "_" + backend + "_" + "label_diff_bool.txt" save_numpy_result(label_diff_bool_file, label_diff > 0) - if (score_diff > 2e-4).any(): + if (score_diff > diff_score_threshold[platform]["score_diff"]).any(): print(args.platform, args.device, "score diff ", score_diff) is_diff = True score_diff_bool_file = args.platform + "_" + backend + "_" + "score_diff_bool.txt" save_numpy_result(score_diff_bool_file, score_diff > 1e-4) - if (boxes_diff_ratio > 1e-4).any() and (boxes_diff > 1e-3).any(): + if (boxes_diff_ratio > diff_score_threshold[platform]["boxes_diff_ratio"] + ).any() and ( + boxes_diff > diff_score_threshold[platform]["boxes_diff"]).any(): print(args.platform, args.device, "boxes diff ", boxes_diff_ratio) is_diff = True boxes_diff_bool_file = args.platform + "_" + backend + "_" + "boxes_diff_bool.txt" diff --git a/tests/release_task/cpp_run.bat b/tests/release_task/cpp_run.bat index f79a0c01e53..2c8a8aed628 100644 --- a/tests/release_task/cpp_run.bat +++ b/tests/release_task/cpp_run.bat @@ -17,7 +17,7 @@ if "%DEVICE%" == "gpu" ( set CPP_FASTDEPLOY_PACKAGE=fastdeploy-%PLATFORM%-%DEVICE%-%VERSION% set RUN_CASES=ort paddle trt ) else ( - set CPP_FASTDEPLOY_PACKAGE=fastdeploy-python + set CPP_FASTDEPLOY_PACKAGE=fastdeploy-%PLATFORM%-%VERSION% set RUN_CASES=ort paddle openvino ) @@ -36,7 +36,7 @@ set COMPARE_SHELL=%CURRENT_DIR%\compare_with_gt.py python -c "from download import *; download_and_decompress('https://fastdeploy.bj.bcebos.com/dev/cpp/%CPP_FASTDEPLOY_PACKAGE%.zip', './')" mkdir build && cd build -cmake .. -G "Visual Studio 16 2019" -A x64 -DFASTDEPLOY_INSTALL_DIR=%cd%\..\%CPP_FASTDEPLOY_PACKAGE% -DCUDA_DIRECTORY="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2" -DCMAKE_CXX_COMPILER=%CMAKE_CXX_COMPILER% +cmake .. 
-G "Visual Studio 16 2019" -A x64 -DFASTDEPLOY_INSTALL_DIR=%cd%\..\%CPP_FASTDEPLOY_PACKAGE% -DCUDA_DIRECTORY="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2" msbuild infer_demo.sln /m:4 /p:Configuration=Release /p:Platform=x64 @@ -47,7 +47,6 @@ echo "FASTDEPLOY_HOME" %FASTDEPLOY_HOME% copy /Y %FASTDEPLOY_HOME%\third_libs\install\onnxruntime\lib\onnxruntime* Release\ set PATH=%FASTDEPLOY_HOME%\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\onnxruntime\lib;%PATH% -set PATH=%FASTDEPLOY_HOME%\third_libs\install\opencv-win-x64-3.4.16\build\x64\vc15\bin;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\paddle\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mkldnn\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\paddle_inference\third_party\install\mklml\lib;%PATH% @@ -56,9 +55,9 @@ set PATH=%FASTDEPLOY_HOME%\third_libs\install\tensorrt\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\faster_tokenizer\third_party\lib;%PATH% set PATH=%FASTDEPLOY_HOME%\third_libs\install\yaml-cpp\lib;%PATH% -set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\bin;%PATH% -set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\3rdparty\tbb\bin;%PATH% - +set PATH=%FASTDEPLOY_HOME%\third_libs\install\opencv\build\x64\vc15\bin;%PATH% +set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\bin;%PATH% +set PATH=%FASTDEPLOY_HOME%\third_libs\install\openvino\runtime\3rdparty\tbb\bin;%PATH% echo "set path done" cd %cd%\Release for %%b in (%RUN_CASES%) do ( @@ -71,11 +70,9 @@ for %%b in (%RUN_CASES%) do ( if %%b == trt ( infer_ppyoloe_demo.exe --model_dir=%MODEL_PATH% --image_file=%IMAGE_PATH% --device=gpu --backend=%%b >> cpp_%%b_trt_result.txt python %COMPARE_SHELL% --gt_path %GROUND_TRUTH_PATH% --result_path cpp_%%b_trt_result.txt --platform %PLATFORM% --device trt --conf_threshold 0.5 - ) else if %%b == ort ( + ) else ( infer_ppyoloe_demo.exe --model_dir=%MODEL_PATH% --image_file=%IMAGE_PATH% --device=gpu --backend=%%b >> cpp_%%b_gpu_result.txt python %COMPARE_SHELL% --gt_path %GROUND_TRUTH_PATH% --result_path cpp_%%b_gpu_result.txt --platform %PLATFORM% --device gpu --conf_threshold 0.5 - ) else if %%b == paddle ( - echo "Temporarily skip paddle gpu case in windows for Inaccurate inference precision" ) ) ) diff --git a/tests/release_task/cpp_run.sh b/tests/release_task/cpp_run.sh index 0c14904ca71..21f1df8bfb5 100644 --- a/tests/release_task/cpp_run.sh +++ b/tests/release_task/cpp_run.sh @@ -12,7 +12,8 @@ fi echo $CPP_FASTDEPLOY_PACKAGE LINUX_X64_GPU_CASE=('ort' 'paddle' 'trt') LINUX_X64_CPU_CASE=('ort' 'paddle' 'openvino') -LINUX_AARCH_CPU_CASE=('ort' 'openvino') +#LINUX_AARCH_CPU_CASE=('ort' 'openvino') +LINUX_AARCH_CPU_CASE=('ort') MACOS_INTEL_CPU_CASE=('ort' 'paddle' 'openvino') MACOS_ARM64_CPU_CASE=('default') wget -q https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz @@ -25,6 +26,7 @@ GROUND_TRUTH_PATH=$CURRENT_DIR/release_task_groud_truth_result.txt COMPARE_SHELL=$CURRENT_DIR/compare_with_gt.py RUN_CASE=() +CONF_THRESHOLD=0 if [ "$DEVICE" = "gpu" ] && [ "$PLATFORM" = "linux-x64" ];then RUN_CASE=(${LINUX_X64_GPU_CASE[*]}) elif [ "$DEVICE" = "cpu" ] && [ "$PLATFORM" = "linux-x64" ];then @@ -35,6 +37,7 @@ elif [ "$DEVICE" = "cpu" ] && [ "$PLATFORM" = "osx-x86_64" ];then RUN_CASE=(${MACOS_INTEL_CPU_CASE[*]}) elif [ "$DEVICE" = "cpu" ] && [ "$PLATFORM" = "osx-arm64" ];then RUN_CASE=(${MACOS_ARM64_CPU_CASE[*]}) + 
+    CONF_THRESHOLD=0.5
 fi
 
 case_number=${#RUN_CASE[@]}
@@ -52,16 +55,16 @@
 do
     echo "Cpp Backend:" $backend
     if [ "$backend" != "trt" ];then
        ./infer_ppyoloe_demo --model_dir=$MODEL_PATH --image_file=$IMAGE_PATH --device=cpu --backend=$backend >> cpp_cpu_result.txt
-       python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_cpu_result.txt --platform $PLATFORM --device cpu
+       python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_cpu_result.txt --platform $PLATFORM --device cpu --conf_threshold $CONF_THRESHOLD
     fi
     if [ "$DEVICE" = "gpu" ];then
        if [ "$backend" = "trt" ];then
            ./infer_ppyoloe_demo --model_dir=$MODEL_PATH --image_file=$IMAGE_PATH --device=gpu --backend=$backend >> cpp_trt_result.txt
-           python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_trt_result.txt --platform $PLATFORM --device trt
+           python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_trt_result.txt --platform $PLATFORM --device trt --conf_threshold $CONF_THRESHOLD
        else
            ./infer_ppyoloe_demo --model_dir=$MODEL_PATH --image_file=$IMAGE_PATH --device=gpu --backend=$backend >> cpp_gpu_result.txt
-           python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_gpu_result.txt --platform $PLATFORM --device gpu
+           python $COMPARE_SHELL --gt_path $GROUND_TRUTH_PATH --result_path cpp_gpu_result.txt --platform $PLATFORM --device gpu --conf_threshold $CONF_THRESHOLD
        fi
     fi
 done
@@ -78,4 +81,3 @@ else
     cat $res_file
     exit -1
 fi
-
diff --git a/tests/release_task/py_run.bat b/tests/release_task/py_run.bat
index b2e179d1b6c..5be6475494a 100644
--- a/tests/release_task/py_run.bat
+++ b/tests/release_task/py_run.bat
@@ -46,12 +46,10 @@ for %%b in (%RUN_CASES%) do (
         if %%b == trt (
            python infer_ppyoloe.py --model_dir=%MODEL_PATH% --image=%IMAGE_PATH% --device=gpu --backend=%%b >> py_%%b_trt_result.txt
            python %COMPARE_SHELL% --gt_path %GROUND_TRUTH_PATH% --result_path py_%%b_trt_result.txt --platform %PLATFORM% --device trt --conf_threshold 0.5
-        ) else if %%b == ort (
+        ) else (
            python infer_ppyoloe.py --model_dir=%MODEL_PATH% --image=%IMAGE_PATH% --device=gpu --backend=%%b >> py_%%b_gpu_result.txt
            python %COMPARE_SHELL% --gt_path %GROUND_TRUTH_PATH% --result_path py_%%b_gpu_result.txt --platform %PLATFORM% --device gpu --conf_threshold 0.5
-        ) else if %%b == paddle (
-            echo "Temporarily skip paddle gpu case in windows for Inaccurate inference precision"
-        )
+        )
     )
 )
diff --git a/tests/release_task/py_run.sh b/tests/release_task/py_run.sh
index 2856bc4da3c..36624246306 100644
--- a/tests/release_task/py_run.sh
+++ b/tests/release_task/py_run.sh
@@ -12,7 +12,8 @@ fi
 echo $PY_FASTDEPLOY_PACKAGE
 LINUX_X64_GPU_CASE=('ort' 'paddle' 'trt')
 LINUX_X64_CPU_CASE=('ort' 'paddle' 'openvino')
-LINUX_AARCH_CPU_CASE=('ort' 'openvino')
+#LINUX_AARCH_CPU_CASE=('ort' 'openvino')
+LINUX_AARCH_CPU_CASE=('ort')
 MACOS_INTEL_CPU_CASE=('ort' 'paddle' 'openvino')
 MACOS_ARM64_CPU_CASE=('default')
 wget -q https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
@@ -60,7 +61,7 @@ do
     fi
 done
 
-ret=$? 
+ret=$?
 res_file="result.txt"
 
 if [ ! -f $res_file ];then
diff --git a/tools/quantization/README.md b/tools/quantization/README.md
index 2bf2da7f220..4459d526d2d 100644
--- a/tools/quantization/README.md
+++ b/tools/quantization/README.md
@@ -47,6 +47,7 @@ tar -xvf COCO_val_320.tar.gz
 
 ```shell
 fastdeploy_quant --config_path=./configs/detection/yolov5s_quant.yaml --method='PTQ' --save_dir='./yolov5s_ptq_model/'
 ```
+【Note】Offline quantization (i.e. post-training quantization) is abbreviated as PTQ
 
 ##### 3.参数说明