Commit 5f734c1 (parent e8a96a8): Add design doc for inference.

File changed: doc/design/inference.md (+105 lines)
# Design Doc: Inferencer

In Fluid, a neural network is represented as a protobuf message [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), whose Python wrapper is `Program`.
Given a `ProgramDesc`, it can be run in any execution environment.
In Fluid, we call the execution environment `Runtime`, which includes a `Place`, a `Scope` and an `Executor`.

## Representation of the Inference Network

In Python, an inference network is defined as:

```python
image = fluid.layers.data(name='x', shape=[784], dtype='float32')
predict = fluid.layers.fc(input=image,
                          size=10,
                          act='softmax')
```

After training for several passes, the parameters can be saved using `fluid.io.save_inference_model`, which also saves the binary protobuf string of the network.
```python
fluid.io.save_inference_model(
    "./inference_model/", ["x"], [predict], exe)
```

The saved model contains everything the inference network needs, including all operators and variables. Thus, the `inference_program` can be initialized either from the model file or from a pre-loaded buffer.

Given an `inference_program`, it is easy to derive a `load_program`, which is composed of `load_op`s and is responsible for initializing all the parameter variables in the `inference_program`. The `load_program` is executed once, while the `inference_program` is executed as many times as you need.

To summarize, an inferencer should:

- be initialized from files or from buffers
- be composed of two `ProgramDesc`s, namely the `inference_program` and the `load_program`

All the initialization is designed to be done in the constructor.

## Support of Switching Runtime

In Fluid, the execution environment is composed of three key concepts: `Place`, `Scope` and `Executor`.

There are two types of `Place` in the current framework: `CPUPlace` for CPU and `CUDAPlace` for CUDA GPU. `Scope` is independent of `Place`. Given a place, you define an `Executor` and run it on a `Scope`.

In the Inferencer, the `Runtime` is declared as follows:

```c++
class Runtime {
  platform::Place* place;
  framework::Scope* scope;
  framework::Executor* executor;
};
```

With the definition of `Runtime`, the `Inferencer` will have the following features:

- **Switch runtime**. Different `Runtime`s can have either different or the same type of `Place`, with different `Scope`s and `Executor`s. An `Inferencer` can run on different `Runtime`s at the same time, independently.
- **Share parameters among different networks**. Users can run different `Inferencer`s, i.e. different networks, on the same `Runtime`; parameters with the same name will be shared.
- **Share parameters among different threads**. Multiple threads can be launched to run an `Inferencer` in parallel on the same `Runtime`.

## Overview of the Inference API

In a simple design, users use the core data structures, `Tensor` and `LoDTensor`, to feed input data and fetch output data.
An `Inferencer` should provide the following members and public interfaces:

- Members:
  - a pointer to the `inference_program`
  - a pointer to the `load_program`
  - vectors of strings to record the `feed_var_names` and `fetch_var_names`
  - a pointer to the current `Runtime`
- Important interfaces:
  - constructors, to initialize the `inference_program` and `load_program`. Once initialized, they cannot be changed.
  - `Run`, to run the inference based on the current runtime.
  - `SetRuntime`, to set the current runtime. When the runtime is set, the `load_program` will be run once to load parameters from files.
- Utility interfaces:
  - `GetFeed/FetchVarNames`, to help users debug.
  - `GetFeed/FetchVarShape`, to help users verify the size of input and output data.

```c++
class Inferencer {
 public:
  // Initialize from file
  Inferencer(const std::string& filename);
  // Initialize from buffer
  Inferencer(const char* buffer, const size_t num_bytes);

  void SetRuntime(Runtime* runtime);

  void Run(const std::vector<framework::Tensor>& feeds,
           std::vector<framework::Tensor>& fetches);

  // utility interfaces
  const std::vector<std::string>& GetFeedVarNames() const;
  const std::vector<std::string>& GetFetchVarNames() const;
  std::vector<int64_t> GetFeedVarShape(const size_t index);
  std::vector<int64_t> GetFetchVarShape(const size_t index);

 private:
  framework::ProgramDesc* inference_program_;
  framework::ProgramDesc* load_program_;
  std::vector<std::string> feed_var_names_;
  std::vector<std::string> fetch_var_names_;

  Runtime* runtime_;
};
```

### Issues

- Normally, all fetch variables' names should be written in the `ProgramDesc` and read from file. If users want to add some extra fetch variables, for debugging or other uses, they need to regenerate the file. Do we need to allow users to append extra fetch variables?
- How do we support multiple devices?