init Inference top APIs by Superjomn · Pull Request #10549 · PaddlePaddle/Paddle

Superjomn · 2018-05-10T03:19:39Z

Includes a README.md that describes the plan for how the APIs will be used.

luotao1 · 2018-05-10T06:05:30Z

@@ -0,0 +1,27 @@
+# Embed Paddle Inference in Your Application
+
+Paddle inference offers the APIs in `C` and `C++` languages.


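Is it necessary to list both C and C++ here? Only the C++ API exists at the moment; could we document just the C++ API for now?

Yes. A C API will be added separately, probably in another PR.

If C is not needed for now, leave it out.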

luotao1 · 2018-05-10T06:05:52Z

+
+Paddle inference offers the APIs in `C` and `C++` languages.
+
+One can easily deploy a model trained by Paddle following the steps as below:


Paddle->PaddlePaddle

luotao1 · 2018-05-10T06:12:07Z

+
+## Optimize the native Fluid Model
+
+The native model that get from the training phase needs to be optimized for that.


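We use the output of save_inference_model from the training phase, which adds the feed and fetch ops and applies some pruning. A model taken directly from training has no feed and fetch ops, so it cannot run.

Strategies 1, 2, and 3 mentioned here should already be applied when save_inference_model is called.
Should this section only offer additional optimization strategies, such as third-party engines and operator fusion?

Right, this section only explains why the tool is needed.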

luotao1 · 2018-05-10T06:14:58Z

+           const std::vector<std::vector<int>>& input_shapes,
+           const std::vector<std::vector<int>>& output_shapes,
+           const std::vector<std::vector<float>>& input_data,
+           std::vector<std::vector<float>>* output_data);


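This interface no longer works for NLP. Could the interface use LoDTensor directly?
Users' data formats vary widely, so it is more reasonable to let users convert their data to LoDTensor themselves. We can provide conversion tools or helper functions, but the Run interface should keep using LoDTensor.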

bool Run(const std::vector<LoDTensor>& input, std::vector<LoDTensor>* output);
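To illustrate why nested `std::vector<float>` plus fixed shapes is a poor fit for variable-length NLP input, here is a minimal sketch of the idea behind a LoD-style tensor: a flat buffer plus an offset table that records sequence boundaries. `SimpleLoDTensor`, `PackSequences`, and `SequenceLength` are hypothetical names used only for illustration; they are not the real `paddle::framework::LoDTensor` API, which is richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a LoD-style tensor (NOT the real LoDTensor).
struct SimpleLoDTensor {
  std::vector<float> data;           // all sequences concatenated
  std::vector<std::size_t> offsets;  // sequence i spans [offsets[i], offsets[i+1])
};

// Pack variable-length sequences into one flat buffer plus an offset table.
SimpleLoDTensor PackSequences(const std::vector<std::vector<float>>& seqs) {
  SimpleLoDTensor t;
  t.offsets.push_back(0);
  for (const auto& s : seqs) {
    t.data.insert(t.data.end(), s.begin(), s.end());
    t.offsets.push_back(t.data.size());
  }
  return t;
}

// Recover the length of sequence i from the offset table.
std::size_t SequenceLength(const SimpleLoDTensor& t, std::size_t i) {
  assert(i + 1 < t.offsets.size());
  return t.offsets[i + 1] - t.offsets[i];
}
```

Packing `{{1, 2}, {3, 4, 5}}` this way yields five values with offsets `0, 2, 5`, so the sequence boundaries survive without padding.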

The inputs and outputs arguments are not needed; the feed and fetch ops already carry that information.

Paddle/paddle/fluid/inference/tests/test_helper.h
Lines 93 to 96 in 4c8ff72

void TestInference(const std::string& dirname,
                   const std::vector<paddle::framework::LoDTensor*>& cpu_feeds,
                   const std::vector<paddle::framework::LoDTensor*>& cpu_fetchs,
                   const int repeat = 1, const bool is_combined = false) {

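The unit test already wraps this fairly cleanly.

Multi-threaded inference also needs to be considered here; a `const int thread_nums` parameter should be added.

There is no multithreading inside the library. Multithreading means the caller's threads invoke the inference library.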
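The last reply can be sketched as follows: the library spawns no threads, and the application creates threads that each call into it. `FakePredictor` and `RunFromCallerThreads` are hypothetical stand-ins, and giving each thread its own predictor instance is an assumption made here for safety; the thread does not say whether a predictor may be shared across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a predictor; it only doubles its input.
class FakePredictor {
 public:
  bool Run(const std::vector<float>& input, std::vector<float>* output) const {
    assert(output != nullptr);
    output->clear();
    for (float v : input) output->push_back(2.0f * v);
    return true;
  }
};

// The caller owns the threads. Each thread uses its own predictor and writes
// to its own output slot, so no locking is needed.
std::vector<std::vector<float>> RunFromCallerThreads(
    const std::vector<std::vector<float>>& inputs) {
  std::vector<std::vector<float>> outputs(inputs.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    workers.emplace_back([&inputs, &outputs, i] {
      FakePredictor predictor;  // assumption: one instance per thread
      predictor.Run(inputs[i], &outputs[i]);
    });
  }
  for (auto& w : workers) w.join();
  return outputs;
}
```

Under this arrangement a `thread_nums` parameter on the API would be unnecessary, because the degree of parallelism is decided by the caller.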

luotao1 · 2018-05-10T06:31:31Z

+
+class Predictor {
+public:
+  struct Attr;


Attr -> Network?

Not Network; it stands for attribute.

luotao1 · 2018-05-10T06:36:59Z

+      kAnakin,             // Use Anakin for inference.
+      kTensorRT,           // Use TensorRT for inference.
+      kAutoMixedAnakin,    // Automatically mix Fluid with Anakin.
+      kAutoMixedTensorRT,  // Automatically mix Fluid with TensorRT.


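kAutoMixedAnakin and kAutoMixedTensorRT can be removed; kAnakin should already cover kAutoMixedAnakin.

kNone should also be split into a CPU mode and a GPU mode.

Does MKLDNN belong under kNone, or should it be listed separately?

No, it does not cover it. Here kTensorRT means running the whole graph on TensorRT; the subgraph case has its own switch, kAutoMixedTensorRT.

For users, the distinction between subgraph and whole graph is a bit complicated. Choosing TensorRT should simply mean that TensorRT is used for optimization; whether that happens on a subgraph or on the whole graph (the whole graph being one case of a subgraph) should be an internal implementation detail.

The partial-support feature does not exist yet; it is listed here only so that the business teams know we are working on it.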
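One way to read the reviewer's suggestion (which the author did not adopt in this thread) is to keep the engine choice and the device choice as two independent settings and leave subgraph versus whole-graph offloading to the implementation. The sketch below is illustrative only: `kNone`, `kAnakin`, and `kTensorRT` come from the diff, while `Device`, `EngineConfig`, and `Describe` are hypothetical names.

```cpp
#include <string>

// Engine kinds taken from the diff, without the kAutoMixed* variants.
enum class EngineKind { kNone, kAnakin, kTensorRT };

// Hypothetical device switch, as suggested for the kNone case.
enum class Device { kCPU, kGPU };

// Hypothetical configuration that combines the two independent choices.
struct EngineConfig {
  EngineKind engine = EngineKind::kNone;
  Device device = Device::kCPU;
};

// Human-readable summary of a configuration, e.g. for logging.
std::string Describe(const EngineConfig& cfg) {
  std::string engine;
  switch (cfg.engine) {
    case EngineKind::kNone:     engine = "fluid";    break;
    case EngineKind::kAnakin:   engine = "anakin";   break;
    case EngineKind::kTensorRT: engine = "tensorrt"; break;
  }
  return engine + (cfg.device == Device::kGPU ? "/gpu" : "/cpu");
}
```

The trade-off raised in the thread still applies: a single `kTensorRT` value hides whether the whole graph or only part of it is offloaded, which is simpler for users but less explicit about a feature that is not implemented yet.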

luotao1 · 2018-05-10T06:39:16Z

+- Memory reuse for native Fluid executor;
+- Translate the model storage format to some third-party engine's, so that the inference API can utilize the engine for acceleration;
+
+We have an official tool to do the optimization, call `paddle_inference_optimize --help` for more information.


Is paddle_inference_optimize a binary or a Python script?
For example, `python paddle_inference_optimize src_model_dir dst_model_dir --inference_optimize_method=2` would mean using the second optimization strategy.

Either a binary or a script.

panyx0718

Let's kick this off. It lives in contrib and is experimental for now.

panyx0718 · 2018-05-10T10:37:37Z

@@ -0,0 +1,27 @@
+# Embed Paddle Inference in Your Application
+
+Paddle inference offers the APIs in `C` and `C++` languages.


If C is not needed for now, leave it out.

init

6a86630

Superjomn requested review from Xreki and luotao1 May 10, 2018 03:19

fix grammar

eece930

luotao1 reviewed May 10, 2018


panyx0718 approved these changes May 10, 2018


Superjomn merged commit 6d371e4 into PaddlePaddle:develop May 10, 2018

Superjomn deleted the feature/inference_api branch May 10, 2018 12:04

Xreki added the 预测原名Inference，包含Capi预测问题等 ("Prediction, formerly named Inference; includes C API prediction issues, etc.") label May 16, 2018

