init Inference top APIs #10549
@@ -0,0 +1,27 @@
# Embed Paddle Inference in Your Application

Paddle Inference offers APIs in `C` and `C++`.
Is it necessary to split this into C and C++? There is only a C++ API at the moment; could we document just the C++ API for now?
Yes. The C API will be added separately, probably in another PR.
One can easily deploy a model trained by Paddle by following the steps below:
## Optimize the Native Fluid Model
The native model obtained from the training phase needs to be optimized for inference.
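We take the model saved with `save_inference_model` during the training phase. That call adds feed and fetch ops and applies some pruning optimizations. A raw training-phase model has no feed and fetch ops, so it cannot run as-is.
Strategies 1, 2, and 3 mentioned here should already be handled by `save_inference_model`.
Should this section list only the additional optimization strategies, such as third-party engines and operator fusion?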
Right, this part only explains why the tool is needed.
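To make the pruning and feed/fetch step discussed in this thread concrete, here is a minimal C++ sketch of the idea. It is not Paddle's `save_inference_model` implementation: the `Op` struct, `PruneForInference`, and `AddFeedFetch` are hypothetical names, and the program is modeled as a flat list of ops in which each variable is written once.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified op record (not Paddle's OpDesc).
struct Op {
  std::string type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Keep only the ops that contribute to `fetch_targets`: scan the program
// backwards and grow the set of variables that are still needed.
std::vector<Op> PruneForInference(const std::vector<Op>& program,
                                  const std::vector<std::string>& fetch_targets) {
  assert(!fetch_targets.empty());
  std::set<std::string> needed(fetch_targets.begin(), fetch_targets.end());
  std::vector<bool> keep(program.size(), false);
  for (std::size_t i = program.size(); i-- > 0;) {
    bool contributes = false;
    for (const auto& out : program[i].outputs) {
      if (needed.count(out) != 0) contributes = true;
    }
    if (!contributes) continue;  // e.g. loss ops are dropped here
    keep[i] = true;
    for (const auto& in : program[i].inputs) needed.insert(in);
  }
  std::vector<Op> pruned;
  for (std::size_t i = 0; i < program.size(); ++i) {
    if (keep[i]) pruned.push_back(program[i]);
  }
  return pruned;
}

// Wrap the pruned program with explicit feed/fetch ops so a runtime knows
// where inputs enter and where results leave.
std::vector<Op> AddFeedFetch(const std::vector<Op>& program,
                             const std::vector<std::string>& feeds,
                             const std::vector<std::string>& fetches) {
  std::vector<Op> result;
  for (const auto& var : feeds) result.push_back({"feed", {"feed"}, {var}});
  result.insert(result.end(), program.begin(), program.end());
  for (const auto& var : fetches) result.push_back({"fetch", {var}, {"fetch"}});
  return result;
}
```

For a `mul -> relu -> cross_entropy` training program with fetch target `y` (the `relu` output), pruning keeps `mul` and `relu`, and wrapping adds one feed op in front and one fetch op at the end.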
const std::vector<std::vector<int>>& input_shapes,
const std::vector<std::vector<int>>& output_shapes,
const std::vector<std::vector<float>>& input_data,
std::vector<std::vector<float>>* output_data);
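This interface no longer works for NLP. Should we consider using LoDTensor directly in the interface?
Users' data formats vary widely, so it is more reasonable to let users convert their data to LoDTensor themselves. We can also provide some conversion tools or functions, but the `Run` interface should keep taking LoDTensor.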
bool Run(const std::vector<LoDTensor>& input,
std::vector<LoDTensor>* output);
The inputs and outputs are not needed; the feed and fetch ops already contain them.
Paddle/paddle/fluid/inference/tests/test_helper.h
Lines 93 to 96 in 4c8ff72
The unit tests already wrap this quite cleanly.
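A minimal sketch of why a LoD-style tensor suits NLP inputs better than fixed `input_shapes`/`input_data` arrays: variable-length sequences are concatenated into one buffer, and offsets record where each sequence starts. `SimpleLoDTensor`, `PackSequences`, `SequenceLength`, and `PredictorInterface` are hypothetical stand-ins for illustration, not Paddle's `LoDTensor` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a LoD tensor (not Paddle's LoDTensor):
// a flat buffer plus offsets marking where each sequence begins.
struct SimpleLoDTensor {
  std::vector<int> shape;        // {total_elements, 1}
  std::vector<float> data;       // all sequences concatenated
  std::vector<std::size_t> lod;  // offsets {0, len0, len0 + len1, ...}
};

// User-side conversion helper: packs variable-length sequences
// (for example, the token ids of several sentences) into one tensor.
SimpleLoDTensor PackSequences(const std::vector<std::vector<float>>& seqs) {
  SimpleLoDTensor t;
  t.lod.push_back(0);
  for (const auto& s : seqs) {
    t.data.insert(t.data.end(), s.begin(), s.end());
    t.lod.push_back(t.data.size());
  }
  t.shape = {static_cast<int>(t.data.size()), 1};
  return t;
}

// Length of the i-th sequence, recovered from the offsets.
std::size_t SequenceLength(const SimpleLoDTensor& t, std::size_t i) {
  assert(i + 1 < t.lod.size());
  return t.lod[i + 1] - t.lod[i];
}

// Shape of the Run interface proposed in the review (hypothetical class).
class PredictorInterface {
 public:
  virtual ~PredictorInterface() = default;
  virtual bool Run(const std::vector<SimpleLoDTensor>& inputs,
                   std::vector<SimpleLoDTensor>* outputs) = 0;
};
```

Two sentences of lengths 3 and 2 become one 5-element buffer with offsets `{0, 3, 5}`; no per-input shape list is needed to describe them.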
We also need to consider multi-threaded inference here; a `const int thread_nums` parameter should be added.
There is no multithreading inside the library; multithreading means external threads calling the inference library.
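The model described in this reply, where the library stays single-threaded and the caller creates the threads, can be sketched as follows. This assumes, hypothetically, that one predictor instance is not safe to share across threads, so each worker clones its own. `ToyPredictor`, `Clone`, and `RunInParallel` are illustrative names, and the `thread_nums` knob from the earlier comment lives in caller code rather than in the inference API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical predictor whose Run() mutates per-instance scratch state,
// so a single instance must not be shared between threads.
class ToyPredictor {
 public:
  std::unique_ptr<ToyPredictor> Clone() const {
    return std::make_unique<ToyPredictor>(*this);
  }
  float Run(const std::vector<float>& input) {
    scratch_ = 0.f;
    for (float v : input) scratch_ += v * weight_;
    return scratch_;
  }

 private:
  float weight_ = 2.f;   // stands in for loaded model parameters
  float scratch_ = 0.f;  // per-call working memory
};

// Caller-side parallelism: thread_nums workers, each with its own clone,
// process the batches in a strided split. The library spawns no threads.
std::vector<float> RunInParallel(const ToyPredictor& base,
                                 const std::vector<std::vector<float>>& batches,
                                 int thread_nums) {
  assert(thread_nums > 0);
  std::vector<float> results(batches.size());
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_nums; ++t) {
    workers.emplace_back([&base, &batches, &results, t, thread_nums] {
      std::unique_ptr<ToyPredictor> local = base.Clone();
      for (std::size_t i = static_cast<std::size_t>(t); i < batches.size();
           i += static_cast<std::size_t>(thread_nums)) {
        results[i] = local->Run(batches[i]);  // each index written by one thread
      }
    });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

Cloning per thread trades some memory for lock-free calls; if a real predictor's `Run` turned out to be thread-safe, the threads could share one instance instead.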
class Predictor {
 public:
  struct Attr;
Not Network; it stands for attribute.
kAnakin,             // Use Anakin for inference.
kTensorRT,           // Use TensorRT for inference.
kAutoMixedAnakin,    // Automatically mix Fluid with Anakin.
kAutoMixedTensorRT,  // Automatically mix Fluid with TensorRT.
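- kAutoMixedAnakin and kAutoMixedTensorRT can be removed; kAnakin should already cover kAutoMixedAnakin.
- kNone should be further split into a CPU mode and a GPU mode.
- Does MKLDNN belong under kNone, or should it be listed separately?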
It doesn't. Here kTensorRT means running the whole graph on TensorRT; the subgraph mode is a separate switch, kAutoMixedTensorRT.
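For users, the subgraph vs. whole-graph distinction is somewhat complicated. If they choose TensorRT, they understand it as "optimize with TensorRT". Whether subgraph or whole-graph optimization is used (and the whole graph is just a special case of a subgraph) should be an internal implementation detail.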
Some of the listed features don't exist yet; they are here only to let the business teams know we are working on them.
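The reviewer's proposal of exposing a single TensorRT switch and deciding whole-graph versus subgraph execution internally could look roughly like this sketch. The `Attr` layout, the `Device` split of `kNone`, and `ChooseTensorRTMode` are assumptions for illustration only, and where MKLDNN fits is left open as in the thread. The rule "whole graph when every op is supported, subgraphs otherwise" follows the comment that whole-graph mode is a special case of subgraph mode.

```cpp
#include <cassert>

// Hypothetical user-facing attributes: one switch per engine, with the
// device choice split out of the native Fluid mode.
struct Attr {
  enum class Engine { kNone, kAnakin, kTensorRT };
  enum class Device { kCPU, kGPU };
  Engine engine = Engine::kNone;
  Device device = Device::kCPU;
};

// Internal detail the user never selects directly.
enum class TrtMode { kDisabled, kWholeGraph, kSubgraph };

// Whole-graph execution is the special case where the TensorRT-supported
// subgraph covers every op; otherwise only supported subgraphs are offloaded.
TrtMode ChooseTensorRTMode(const Attr& attr, int supported_ops, int total_ops) {
  assert(0 <= supported_ops && supported_ops <= total_ops);
  if (attr.engine != Attr::Engine::kTensorRT || supported_ops == 0) {
    return TrtMode::kDisabled;
  }
  return supported_ops == total_ops ? TrtMode::kWholeGraph : TrtMode::kSubgraph;
}
```

With this split the user-facing enum needs no `kAutoMixed*` entries; the mixed mode becomes whatever `ChooseTensorRTMode` returns for a partially supported graph.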
- Memory reuse for the native Fluid executor;
- Translate the model storage format into a third-party engine's format, so that the inference API can use that engine for acceleration;
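As an illustration of the first item, memory reuse, here is a sketch of greedy liveness-based buffer assignment: a variable's buffer returns to a free pool after the last op that reads it, and later outputs take buffers from that pool. This is a generic technique, not necessarily what the Fluid executor does; `VarOp` and `AssignBuffers` are hypothetical, and every variable is assumed to be written exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical op record: only the variable names it reads and writes.
struct VarOp {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Greedy liveness-based reuse. Variables that are read but never produced
// (feeds, parameters) get no buffer here; variables never read again
// (final outputs) are never freed.
std::map<std::string, int> AssignBuffers(const std::vector<VarOp>& ops,
                                         int* num_buffers) {
  assert(num_buffers != nullptr);
  std::map<std::string, std::size_t> last_use;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (const auto& in : ops[i].inputs) last_use[in] = i;
  }
  std::map<std::string, int> buffer_of;
  std::vector<int> free_pool;
  int next_buffer = 0;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    // Allocate outputs first so an op's outputs never alias its own inputs.
    for (const auto& out : ops[i].outputs) {
      if (buffer_of.count(out) != 0) continue;
      if (!free_pool.empty()) {
        buffer_of[out] = free_pool.back();
        free_pool.pop_back();
      } else {
        buffer_of[out] = next_buffer++;
      }
    }
    // Release inputs whose last reader is this op.
    for (const auto& in : ops[i].inputs) {
      auto it = last_use.find(in);
      if (it == last_use.end() || it->second != i) continue;
      last_use.erase(it);  // avoids freeing twice if an input is listed twice
      auto buf = buffer_of.find(in);
      if (buf != buffer_of.end()) free_pool.push_back(buf->second);
    }
  }
  *num_buffers = next_buffer;
  return buffer_of;
}
```

For a simple chain `x -> a -> b -> c -> d`, four intermediate variables fit in two buffers: `a` and `c` share one, `b` and `d` share the other.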
We provide an official tool for this optimization; run `paddle_inference_optimize --help` for more information.
Is `paddle_inference_optimize` a binary or a Python script?
For example, `python paddle_inference_optimize src_model_dir dst_model_dir --inference_optimize_method=2` would mean applying the second optimization strategy.
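If the tool follows the command-line form suggested in this comment, argument handling might look like this sketch. Whether `paddle_inference_optimize` is a binary or a script, and what the `--inference_optimize_method` values mean, are open questions in this thread; `OptimizeOptions` and `ParseArgs` are hypothetical, and `method` is simply left at 0 when the flag is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options for the optimization tool discussed above.
struct OptimizeOptions {
  std::string src_model_dir;
  std::string dst_model_dir;
  int method = 0;  // 0 when --inference_optimize_method is not given
};

// Parses arguments (excluding the program name). Returns false on a wrong
// number of directories, an unknown flag, or a non-numeric method.
bool ParseArgs(const std::vector<std::string>& args, OptimizeOptions* opts) {
  assert(opts != nullptr);
  const std::string kMethodFlag = "--inference_optimize_method=";
  std::vector<std::string> positional;
  for (const auto& arg : args) {
    if (arg.compare(0, kMethodFlag.size(), kMethodFlag) == 0) {
      const std::string value = arg.substr(kMethodFlag.size());
      try {
        std::size_t consumed = 0;
        opts->method = std::stoi(value, &consumed);
        if (consumed != value.size()) return false;  // e.g. "2x"
      } catch (const std::exception&) {
        return false;  // empty, non-numeric, or out of range
      }
    } else if (arg.rfind("--", 0) == 0) {
      return false;  // unknown flag
    } else {
      positional.push_back(arg);
    }
  }
  if (positional.size() != 2) return false;
  opts->src_model_dir = positional[0];
  opts->dst_model_dir = positional[1];
  return true;
}
```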
panyx0718 left a comment:
Let's kick this off. It's in contrib, so it's just experimental for now.
Please include a README.md with some description of (or plan for) how to use the APIs.