Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 38 additions & 11 deletions paddle/fluid/operators/jit/README.en.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# JIT Kernel

JIT(Just In Time) Kernel contains actually generated code and some other implemenations with the same logic.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Each implementations has its own condition to use, defined in `UseMe`.
Each implementations has its own condition to use, defined in `CanBeUsed`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each 后面不应跟复数,另外请确定一下implementations 这个词的单复数是同样的写法吗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

They are combined together to get the best performance of one single independent function.
They could be some very simple functions like vector multiply, or some complicated functions like LSTM.
And they can be composed with some other exited jit kernels to build up a complex function.
Expand Down Expand Up @@ -42,35 +42,62 @@ All basical definations of jit kernels are addressed in `paddle/fluid/operators/

## How to use

One simple function `jit::Get`, which is very easy to use, is supported to get the kernel.
It can automatically return the expected function with best performance under the given attributes.
All kernels are inlcuded in `paddle/fluid/operators/jit/kernels.h`, you can only include this one header to get all the registered kernels.
We present these methods to get the functions:
- `GetAllCandidateFuncs`. It can return all the implementations supported. All of the implementations can get the same result. You can do some runtime benchmark to choose which should actually be used.
- `GetDefaultBestFunc`. It only return one defualt function pointer, which is tuning offline with some genenal configures and attributes. This should cover most situations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- `KernelFuncs::Cache()`. It can get the defualt functions and save it for next time with the same attribute.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- `GetReferFunc`. It can only get the reference code in CPU, and all the others implementations have same logic with this reference code.

And here are some examples:

Get from cache:

```cpp
using T = float;
jit::seq_pool_attr_t attr(width, jit::SeqPoolType::kSum);
auto seqpool_func = jit::KernelFuncs<jit::SeqPoolTuple<T>, platform::CPUPlace>::Cache().At(attr);
seqpool_func(src_data, dst_data, &attr);
```

Get all implementations and run once:

```cpp
using T = float;
jit::seq_pool_attr_t attr(width, jit::SeqPoolType::kSum);
auto funcs = jit::GetAllCandidateFuncsWithTypes<jit::SeqPoolTuple<T>, platform::CPUPlace>(attr);
for (auto f : funcs) {
LOG(INFO) << "Kernel implementation type: " << f.first;
f.second(src_data, dst_data, &attr);
}
```

All kernels are inlcuded in `paddle/fluid/operators/jit/kernels.h` which is automatically generated in compile time, you can only include this one header to get all the registered kernels.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All kernels are inlcuded in paddle/fluid/operators/jit/kernels.h 后面应跟逗号

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done


## Solid Test

- Unit Test
All functions should be compared with the corresponding reference functions, including data tyep `float` and `double`.
- Benchmark
All functions should be tested, and make sure the `jit::Get` function obtain the best performance with all attributes.
All functions should be tested, and make sure the `jit::GetDefaultBestFunc` function obtain the best performance with all attributes.

# How to add new kernel

## Required

1. Add `your_key` at `KernelType`.
2. Add reference function of `your_key`.
2. Add your new `KernelTuple` which must inlucde `your_key`. I should be a combination of the data type, attribute type and function type. You can refer `SeqPoolTuple`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks done.

3. Add reference function of `your_key`.
Note:
- this should be run on CPU and do not depend on any third-party.
- Add `USE_JITKERNEL_REFER(your_key)` in `refer/CmakeLists.txt` to make sure this code can be used.
3. Add unit test in `test.cc`, and verfiy at least `float` and `double`.
4. Add unit test in `test.cc`, and verfiy at least `float` and `double`.
Test more data type for some special functions if necessary, for example `int8`.
4. Add functions in `benchmark.cc` to test all function of same `KernelType`. Make sure `jit::Get` always get the best one.
5. Add functions in `benchmark.cc` to test all function of same `KernelType`. Make sure `GetDefaultBestFunc` always get the best one.

## Optional

Add more implementations of `your_kery` for performance enhancement.

1. Add functions based on generated code in `gen`. It should be derived from `JitCode` and should have corepsonding creator from `JitCodeCreator` which will be registered on the `your_key`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correpsonding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks done.

Note: Add new `KernelTuples` if necessary,your can refer to `XYZNTuples`.
Specialie method `JitCodeKey` when add new attribute type。
2. Add more functions in `more`,you can use any third party you wish, like mkl, mkldnn or intrinsic code to reach the best performance.
2. If new attribute type is added, you should specialize `JitCodeKey` of this type.
3. Add more functions in `more`,you can use any third party you wish, like mkl, mkldnn or intrinsic code to reach the best performance.
50 changes: 39 additions & 11 deletions paddle/fluid/operators/jit/README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# JIT Kernel

结合函数模板和JIT生成需要的kernel函数。
这里的kernel是比Operator中kernel更小级别的算子单元,更侧重的是在不同硬件上的性能。可以有多重第三方库的实现,每种实现有自己的`UseMe`函数负责什么条件下可以被调用。
这里的kernel是比Operator中kernel更小级别的算子单元,更侧重的是在不同硬件上的性能。可以有多重第三方库的实现,每种实现有自己的`CanBeUsed`函数负责什么条件下可以被调用。
这里实现的函数可以非常细粒度的函数方法,比如Vector MUL, 也可以是一个复杂的逻辑比如LSTM等。复杂的逻辑也可以由自己的底层函数拼接而成。
目前仅支持CPU上的高性能计算。

Expand Down Expand Up @@ -39,27 +39,55 @@ PaddlePaddle/Paddle/paddle/fluid/

## 动态获取

提供一个`jit::Get`方法,根据kernel类别获取,每种实现都有自己的使用范围,根据范围动态和当前条件选择需要的kernel函数。
- 提供`GetAllCandidateFuncs`方法,根据输入的kernel类别,获取满足要求的所有函数实现。所有实现保证结果一致,但是速度不一致,可以根据具体输入属性大小,动态测试得到当前最优实现,手动选择最优函数。
- 提供`GetDefaultBestFunc`方法,返回一个默认最优的函数实现。该函数是根据一些通用配置离线tuning之后的结果,能覆盖大多数情况下最优结果。
- 提供`KernelFuncs::Cache()`方法,该方法会返回默认最优的函数,同时会缓存该函数指针,如果出现属性一致的情况,直接返回上次的函数指针,如果不存在测根据属性新建。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果不存在则

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.done

- 提供`GetReferFunc` 方法,返回该kernel最原始的逻辑函数。该方法与kernel的输入大小和属性没有任何关系,有且并只有一个在CPU上的实现。该方法表征了kernel的原始逻辑,其他所有实现的逻辑与它保持一致。

### 例子

所有kernel的调用只需要在头文件中包含`"paddle/fluid/operators/jit/kernels.h"`, 该文件是编译时自动生成的。

直接从缓存中获取默认最优的函数。

```cpp
using T = float;
jit::seq_pool_attr_t attr(width, jit::SeqPoolType::kSum);
auto seqpool_func = jit::KernelFuncs<jit::SeqPoolTuple<T>, platform::CPUPlace>::Cache().At(attr);
seqpool_func(src_data, dst_data, &attr);
```

跑一遍所有实现,并输出实现类别。

```cpp
using T = float;
jit::seq_pool_attr_t attr(width, jit::SeqPoolType::kSum);
auto funcs = jit::GetAllCandidateFuncsWithTypes<jit::SeqPoolTuple<T>, platform::CPUPlace>(attr);
for (auto f : funcs) {
LOG(INFO) << "Kernel implementation type: " << f.first;
f.second(src_data, dst_data, &attr);
}
```

## 测试

- 逻辑测试
所有实现都要与refer的code对比,需要满足精度要求, 包括float和double的数据类型
- 性能测试
所有实现的性能对比,并且与最终的`jit::Get`方法对比,该方法拿到的性能需要在各种条件下都是最好的。
所有实现的性能对比,并且与最终的`jit::GetDefaultBestFunc`方法对比,该方法拿到的性能需要在各种条件下都是最好的。

# 如何添加新的算子

- 在`KernelType` 中添加 `your_key` .
- 实现Reference 的逻辑,这个是必须是在CPU上的实现,并且不能依赖任何第三方库。实现后在`refer/CmakeLists.txt`中添加`USE_JITKERNEL_REFER(your_key)`来使用该kernel.
- (optional) 实现更多的算法在`more`目录下,可以依赖mkl,intrinsic或者mkldnn等第三方库。
- (optional) 实现基于Xbyak的生成code,在`gen`目下。 jitcode需要实现自己的`JitCodeCreator`,并注册在与refer相同的`KernelType`上。
- 必要时可以添加新的`KernelTuples`,可以参考`XYZNTuples`,新加的Attr类型需要特例化`JitCodeKey`方法。
- 在`test.cc`中添加unit test,至少需要测试`float`和`double`两种数据类型,如有必要需要支持额外的数据类型,比如`int8`的相关函数。
- 在`benchmark.cc`中添加相应的性能对比,同一种kernel需要对比所有实现,并且确保`jit::Get`得到的实现一直是速度最快的。
1. 在`KernelType` 中添加 `your_key`
2. 实现Reference 的逻辑,这个是必须是在CPU上的实现,并且不能依赖任何第三方库。实现后在`refer/CmakeLists.txt`中添加`USE_JITKERNEL_REFER(your_key)`来使用该kernel
3. (optional) 实现更多的算法在`more`目录下,可以依赖mkl,intrinsic或者mkldnn等第三方库。
4. (optional) 实现基于Xbyak的生成code,在`gen`目下。 jitcode需要实现自己的`JitCodeCreator`,并注册在与refer相同的`KernelType`上。
5. 添加新的`KernelTuple`,需要与`KernelType`一一对应,是所有类型的一个打包,包括数据类型,属性的类型,以及返回的函数类型。可以参考`SeqPoolTuple`,新加的Attr类型需要特例化`JitCodeKey`方法。
6. 在`test.cc`中添加unit test,至少需要测试`float`和`double`两种数据类型,如有必要需要支持额外的数据类型,比如`int8`的相关函数。
7. 在`benchmark.cc`中添加相应的性能对比,同一种kernel需要对比所有实现,并且确保`GetDefaultBestFunc`得到的实现一直是速度最快的。

# 优点
- 统一的Get方法,接口简单
- 接口方便,灵活调用
- 同一套逻辑可以有多套实现,可以依赖多套第三方库,互不影响。
- 目录结构清晰,不会在某个文件中有多个宏定义,导致的可读性差问题。
- 优化方便,可以直接针对某种属性针对性优化,并不影响其他属性下的性能。
Expand Down