Commit 360bc05

advanced_transforms
1 parent 37c120c commit 360bc05

4 files changed

+478
-3
lines changed

.readthedocs.yml

Lines changed: 5 additions & 2 deletions
@@ -2,9 +2,12 @@ version: 2
 
 formats:
   - epub
-
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
 python:
-  version: 3.7
+  version: 3.8
   install:
     - requirements: requirements/docs.txt
     - requirements: requirements/readthedocs.txt
Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
# Customize Data Transformation and Augmentation

## Data Transform

In the OpenMMLab algorithm libraries, dataset construction and data preparation are decoupled. Dataset construction usually only parses the dataset and records the basic information of each sample, while data preparation applies a series of data transforms that use this basic information to load, preprocess, and format the data.

### Using Data Transformations

The **data transformation** and **data augmentation** classes in **MMPose** are defined in the [$MMPose/datasets/transforms](https://github.com/open-mmlab/mmpose/tree/dev-1.x/mmpose/datasets/transforms) directory, which has the following file structure:

```txt
mmpose
|----datasets
    |----transforms
        |----bottomup_transforms  # Bottom-Up transforms
        |----common_transforms  # Common transforms
        |----converting  # Keypoint converting
        |----formatting  # Input data formatting
        |----loading  # Raw data loading
        |----pose3d_transforms  # Pose3d transforms
        |----topdown_transforms  # Top-Down transforms
```

In **MMPose**, **data augmentation** and **data transformation** are stages that users often need to customize. You can refer to the following process when designing them:

```mermaid
flowchart LR
    subgraph Transforms1
        Flip
        HalfBody
        Rotation
        Shift
        Resize
        ColorJitter
    end
    subgraph Transforms2
        GenerateTarget
    end
    subgraph Transforms3
        PackPoseInput
    end
    RawImage --> Transforms1
    RawLabel --> Transforms1
    Transforms1 --> InputImage
    Transforms1 --> InputCoordinates
    InputCoordinates --> Transforms2
    Transforms2 --> Transforms3
    InputImage --> Transforms3

```

The `common_transforms` component provides the commonly used **data augmentation** transforms `RandomFlip` and `RandomHalfBody`.

- For `Top-Down` methods, operations such as `Shift`, `Rotate`, and `Resize` are implemented by [RandomBBoxTransform](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L435).
- For `Bottom-Up` methods, the corresponding transform is [BottomupResize](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L327).
- For `pose-3d`, the corresponding transform is [RandomFlipAroundRoot](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/pose3d_transforms.py#L13).
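
To make the `Shift`, `Rotate`, and `Resize` augmentation concrete, here is a minimal sketch of random bbox jitter. It is not the MMPose implementation: the function name `jitter_bbox`, the uniform sampling, and the default values are assumptions made for illustration, and the real transform may sample from a different distribution.

```python
import random
from typing import Optional, Tuple

Point = Tuple[float, float]


def jitter_bbox(center: Point,
                scale: Point,
                shift_factor: float = 0.16,
                scale_factor: Tuple[float, float] = (0.75, 1.25),
                rotate_factor: float = 60.0,
                rng: Optional[random.Random] = None
                ) -> Tuple[Point, Point, float]:
    """Randomly shift, rescale, and rotate a bbox (hypothetical helper)."""
    rng = rng or random.Random()
    cx, cy = center
    w, h = scale
    # Shift the center by a random fraction of the bbox size.
    dx = rng.uniform(-shift_factor, shift_factor) * w
    dy = rng.uniform(-shift_factor, shift_factor) * h
    # Rescale width and height by one shared random factor.
    s = rng.uniform(scale_factor[0], scale_factor[1])
    # Sample a rotation angle in degrees.
    rotation = rng.uniform(-rotate_factor, rotate_factor)
    return (cx + dx, cy + dy), (w * s, h * s), rotation


new_center, new_scale, rotation = jitter_bbox(
    (100.0, 80.0), (50.0, 100.0), shift_factor=0.0, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which is convenient when debugging a pipeline.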

**MMPose** provides corresponding data transformation interfaces for `Top-Down`, `Bottom-Up`, and `pose-3d`. These transforms map the image and coordinate labels from the `original_image_space` to the `input_image_space` with an affine transformation.

- For `Top-Down` methods, this is [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14).
- For `Bottom-Up` methods, this is [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134).
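
The mapping from the `original_image_space` to the `input_image_space` can be illustrated with a small sketch. It assumes an axis-aligned bbox given by its center and size, and it ignores rotation. The helper names `bbox_to_input_affine` and `apply_affine` are hypothetical and are not part of MMPose.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bbox_to_input_affine(center: Point, scale: Point,
                         input_size: Tuple[int, int]) -> List[List[float]]:
    """Build a 2x3 affine matrix that maps the bbox onto the input image."""
    cx, cy = center
    bw, bh = scale
    in_w, in_h = input_size
    sx, sy = in_w / bw, in_h / bh
    # The bbox top-left corner (cx - bw / 2, cy - bh / 2) must land on (0, 0).
    return [[sx, 0.0, -sx * (cx - bw / 2)],
            [0.0, sy, -sy * (cy - bh / 2)]]


def apply_affine(matrix: Sequence[Sequence[float]],
                 points: Sequence[Point]) -> List[Point]:
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


matrix = bbox_to_input_affine(center=(100.0, 80.0), scale=(50.0, 100.0),
                              input_size=(192, 256))
# The bbox center should map to the center of the input image.
mapped = apply_affine(matrix, [(100.0, 80.0), (75.0, 30.0)])
```

The same matrix is applied to both the image (through a warp) and the keypoint coordinates, which keeps the labels aligned with the pixels.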

Taking `RandomFlip` as an example, this transform randomly flips the `original_image` and converts it into an `input_image` or an `intermediate_image`. To define a data transformation, inherit from the [BaseTransform](https://github.com/open-mmlab/mmcv/blob/main/mmcv/transforms/base.py) class and register the new class with `TRANSFORMS`:

```python
from typing import List, Union

from mmcv.transforms import BaseTransform
from mmengine import is_list_of
from mmpose.registry import TRANSFORMS


@TRANSFORMS.register_module()
class RandomFlip(BaseTransform):
    """Randomly flip the image, bbox and keypoints.

    Required Keys:

        - img
        - img_shape
        - flip_indices
        - input_size (optional)
        - bbox (optional)
        - bbox_center (optional)
        - keypoints (optional)
        - keypoints_visible (optional)
        - img_mask (optional)

    Modified Keys:

        - img
        - bbox (optional)
        - bbox_center (optional)
        - keypoints (optional)
        - keypoints_visible (optional)
        - img_mask (optional)

    Added Keys:

        - flip
        - flip_direction

    Args:
        prob (float | list[float]): The flipping probability. If a list is
            given, the argument `direction` should be a list with the same
            length. And each element in `prob` indicates the flipping
            probability of the corresponding one in ``direction``. Defaults
            to 0.5
        direction (str | list[str]): The flipping direction. Options are
            ``'horizontal'``, ``'vertical'`` and ``'diagonal'``. If a list
            is given, each data sample's flipping direction will be sampled
            from a distribution determined by the argument ``prob``. Defaults
            to ``'horizontal'``.
    """

    def __init__(self,
                 prob: Union[float, List[float]] = 0.5,
                 direction: Union[str, List[str]] = 'horizontal') -> None:
        # Validate the flipping probability (a float or a list of floats).
        if isinstance(prob, list):
            assert is_list_of(prob, float)
            assert 0 <= sum(prob) <= 1
        elif isinstance(prob, float):
            assert 0 <= prob <= 1
        else:
            raise ValueError('prob must be float or list of float, but '
                             f'got `{type(prob)}`.')
        self.prob = prob

        # Validate the flipping direction (a str or a list of str).
        valid_directions = ['horizontal', 'vertical', 'diagonal']
        if isinstance(direction, str):
            assert direction in valid_directions
        elif isinstance(direction, list):
            assert is_list_of(direction, str)
            assert set(direction).issubset(set(valid_directions))
        else:
            raise ValueError('direction must be either str or list of str, '
                             f'but got `{type(direction)}`.')
        self.direction = direction

        # A list of probabilities needs one entry per direction.
        if isinstance(prob, list):
            assert len(prob) == len(self.direction)

    # The `transform` method is not shown in this excerpt.
```

**Input**:

- `prob` specifies the flipping probability. It is either a single `float` in the range \[0,1\] or a `list` of floats, one per entry in `direction`, whose sum is in the range \[0,1\].
- `direction` specifies the flipping direction, given as a single string or a list of strings:
  - `horizontal`
  - `vertical`
  - `diagonal`

**Output**:

- Returns the transformed data as a `dict`.
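
The role of the `flip_indices` key can be illustrated with a small sketch of a horizontal keypoint flip. It is a simplified illustration rather than the library code: the function name `flip_keypoints_horizontal` is hypothetical, keypoints are plain `(x, y)` tuples, and the three-keypoint skeleton (nose, left eye, right eye) is made up for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def flip_keypoints_horizontal(keypoints: Sequence[Point],
                              flip_indices: Sequence[int],
                              img_width: int) -> List[Point]:
    """Mirror keypoints left-right and swap symmetric joints."""
    # After mirroring, the left eye becomes the right eye and vice versa,
    # so each slot takes the keypoint of its mirrored counterpart.
    swapped = [keypoints[i] for i in flip_indices]
    # Mirror the x coordinate around the vertical axis of the image.
    return [(img_width - 1 - x, y) for x, y in swapped]


# Hypothetical skeleton: 0 = nose, 1 = left eye, 2 = right eye.
keypoints = [(10, 5), (20, 6), (2, 7)]
flip_indices = [0, 2, 1]
flipped = flip_keypoints_horizontal(keypoints, flip_indices, img_width=100)
```

Without the index swap, a mirrored image would carry labels that call the person's left eye the right eye, which is why the flip needs `flip_indices` in addition to the image width.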

Here is a simple example of applying a diagonal `RandomFlip`:

```python
import mmcv
from mmpose.datasets.transforms import LoadImage, RandomFlip

# Load the original image from the path
results = dict(
    img_path='data/test/multi-person.jpeg'
)
transform = LoadImage()
results = transform(results)
# At this point, `results` is a `dict`
# that contains the following keys:
# - `img_path`: Path of the image
# - `img`: Pixel data of the image
# - `img_shape`: The shape of the image
# - `ori_shape`: The original shape of the image

# Perform a diagonal flip on the original image
transform = RandomFlip(prob=1., direction='diagonal')
results = transform(results)
# At this point, `results` is a `dict`
# that contains the following keys:
# - `img_path`: Path of the image
# - `img`: Pixel data of the image
# - `img_shape`: The shape of the image
# - `ori_shape`: The original shape of the image
# - `flip`: Whether the image was flipped
# - `flip_direction`: The direction in which
#   the image was flipped

# Show the image after flipping
mmcv.imshow(results['img'])
```

For more information on using custom data transformations and augmentations, please refer to [$MMPose/tests/test_datasets/test_transforms/test_common_transforms](https://github.com/open-mmlab/mmpose/blob/main/tests/test_datasets/test_transforms/test_common_transforms.py#L59).

#### RandomHalfBody

The [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L263) **data augmentation** algorithm probabilistically applies a half-body transform that focuses on either the upper or the lower body.

**Input**:

- `min_total_keypoints`: the minimum total number of keypoints
- `min_half_keypoints`: the minimum number of half-body keypoints
- `padding`: the padding ratio of the bbox
- `prob`: the probability of applying the half-body transform when the number of keypoints meets the requirements

**Output**:

- Returns the transformed data as a `dict`.
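
How these parameters interact can be sketched as follows. This is a simplified illustration under stated assumptions, not the MMPose implementation: the helper names, the default values, the uniform choice between the two halves, and the index sets passed in as `upper_ids` and `lower_ids` are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def select_half_body(visible: Sequence[bool],
                     upper_ids: Sequence[int],
                     lower_ids: Sequence[int],
                     min_total_keypoints: int = 9,
                     min_half_keypoints: int = 2,
                     prob: float = 0.3,
                     rng: Optional[random.Random] = None
                     ) -> Optional[List[int]]:
    """Return the keypoint indices of the chosen half, or None to skip."""
    rng = rng or random.Random()
    upper = [i for i in upper_ids if visible[i]]
    lower = [i for i in lower_ids if visible[i]]
    # Skip samples that have too few visible keypoints overall.
    if len(upper) + len(lower) < min_total_keypoints:
        return None
    # Apply the transform only with probability `prob`.
    if rng.random() >= prob:
        return None
    # Keep only halves that have enough visible keypoints.
    candidates = [half for half in (upper, lower)
                  if len(half) >= min_half_keypoints]
    return rng.choice(candidates) if candidates else None


def padded_bbox(points: Sequence[Point],
                padding: float = 1.5) -> Tuple[Point, Point]:
    """Compute the bbox (center, size) around points, enlarged by padding."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    size = ((max(xs) - min(xs)) * padding, (max(ys) - min(ys)) * padding)
    return center, size
```

A caller would pass the keypoints selected by `select_half_body` to `padded_bbox` and use the result as the new bbox for the later affine step.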

#### Topdown Affine

The [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14) data transformation algorithm transforms the `original image` into an `input image` through an affine transformation.

**Input**:

- `input_size`: the bbox area is cropped and rectified to the \[w,h\] size
- `use_udp`: whether to use unbiased data processing ([UDP](https://arxiv.org/abs/1911.07524))

**Output**:

- Returns the transformed data as a `dict`.
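
Rectifying the bbox to the \[w,h\] size implies that the bbox must first match the aspect ratio of `input_size`; otherwise the crop would be distorted. A minimal sketch of that step, assuming the bbox is only ever enlarged and never shrunk (the function name `fix_aspect_ratio` is used here for illustration):

```python
from typing import Tuple


def fix_aspect_ratio(bbox_scale: Tuple[float, float],
                     input_size: Tuple[int, int]) -> Tuple[float, float]:
    """Enlarge the bbox so that its w/h ratio matches the input size."""
    w, h = bbox_scale
    aspect = input_size[0] / input_size[1]
    if w > h * aspect:
        # The bbox is too wide: grow the height.
        return (w, w / aspect)
    # The bbox is too tall (or already matches): grow the width.
    return (h * aspect, h)


# A square bbox and a tall bbox, both adapted to a 192x256 input.
square = fix_aspect_ratio((100.0, 100.0), (192, 256))
tall = fix_aspect_ratio((30.0, 100.0), (192, 256))
```

Growing rather than shrinking keeps all of the original bbox content inside the crop.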

### Using Data Augmentation and Transformation in the Pipeline

The **data augmentation** and **data transformation** steps are configured in the configuration file, as in the following example:

```python
train_pipeline_stage2 = [
    ...
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(
        type='RandomBBoxTransform',
        shift_factor=0.,
        scale_factor=[0.75, 1.25],
        rotate_factor=60),
    dict(
        type='TopdownAffine',
        input_size=codec['input_size']),
    ...
]
```

The pipeline in the example performs **data augmentation** on the input data: it applies a random horizontal flip and a random half-body transform, then the `Top-Down` `Shift`, `Rotate`, and `Resize` operations through `RandomBBoxTransform`, and finally uses `TopdownAffine` to apply the affine transformation into the `input_image_space`.
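
To show how a list of `dict(type=...)` entries can become a chain of transforms, here is a toy sketch of a registry and a sequential pipeline. It is a stand-in written for illustration, not the OpenMMLab registry: the `TRANSFORMS` dictionary, `register`, `build_pipeline`, `run_pipeline`, and the two toy transforms are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a transform registry: class name -> class.
TRANSFORMS: Dict[str, type] = {}


def register(cls: type) -> type:
    """Register a transform class under its class name."""
    TRANSFORMS[cls.__name__] = cls
    return cls


@register
class AddOffset:
    """Toy transform: shift every keypoint along x."""

    def __init__(self, dx: float) -> None:
        self.dx = dx

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        results = dict(results)
        results['keypoints'] = [(x + self.dx, y)
                                for x, y in results['keypoints']]
        return results


@register
class Scale:
    """Toy transform: scale every keypoint."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        results = dict(results)
        results['keypoints'] = [(x * self.factor, y * self.factor)
                                for x, y in results['keypoints']]
        return results


def build_pipeline(cfgs: List[Dict[str, Any]]) -> List[Callable]:
    """Instantiate each transform from its config dict."""
    transforms = []
    for cfg in cfgs:
        args = dict(cfg)  # Copy so the config itself is not modified.
        cls = TRANSFORMS[args.pop('type')]
        transforms.append(cls(**args))
    return transforms


def run_pipeline(transforms: List[Callable],
                 results: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the transforms in order; each takes and returns a dict."""
    for transform in transforms:
        results = transform(results)
    return results


pipeline = build_pipeline([dict(type='AddOffset', dx=1),
                           dict(type='Scale', factor=2)])
output = run_pipeline(pipeline, {'keypoints': [(1, 2)]})
```

Because each transform consumes the `dict` produced by the previous one, the order of the entries in the configuration list matters.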
