Commit e7c5178

advanced_transforms
1 parent a8fd28f commit e7c5178

4 files changed

+429
-3
lines changed

.readthedocs.yml

Lines changed: 4 additions & 2 deletions
@@ -2,9 +2,11 @@ version: 2
 
 formats:
   - epub
-
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
 python:
-  version: 3.7
   install:
     - requirements: requirements/docs.txt
     - requirements: requirements/readthedocs.txt
Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
1+
# Customize Data Transformation and Augmentation
2+
3+
## Data Transform
4+
5+
In the OpenMMLab algorithm libraries, dataset construction and data preparation are decoupled. Dataset construction usually only parses the dataset and records the basic information of each sample, while data preparation applies a series of data transforms that, based on this basic information, perform data loading, preprocessing, formatting, and other operations.
6+
7+
### The use of data transformation
8+
9+
The **data transformation** and **data augmentation** classes in **MMPose** are defined in the [$MMPose/datasets/transforms](https://github.com/open-mmlab/mmpose/tree/dev-1.x/mmpose/datasets/transforms) directory, and the corresponding file structure is as follows:
10+
11+
```txt
mmpose
|----datasets
    |----transforms
        |----bottomup_transforms  # Bottom-Up transforms
        |----common_transforms    # Common transforms
        |----converting           # Keypoint converting
        |----formatting           # Input data formatting
        |----loading              # Raw data loading
        |----pose3d_transforms    # Pose3d transforms
        |----topdown_transforms   # Top-Down transforms
```
23+
24+
In **MMPose**, **data augmentation** and **data transformation** are stages that users often need to customize. You can refer to the following process when designing them:
25+
26+
[![](https://mermaid.ink/img/pako:eNp9UbFOwzAQ_ZXIczuQbBkYKAKKOlRpJ5TlGp8TC9sX2WdVpeq_Y0cClahl8rv3nt_d2WfRkURRC2Xo2A3gudg0rSuKEA-9h3Eo9h5cUORteMj8i9FjPt_AqCeSp4wbYmBNLuPdoBVPJAb9hRmtyJB_18zkc4lO3mlQZv4VHXpg3IPvkf-_UGV-C93nlgKu3Riv_Q0c1xZ6LJbLx_kWSdvAAc0t7aqc5Cl3Srqrroi81C5NHbJnzs26lH9zyplc_UbcGr8SC2HRW9Ay_do5e1vBA1psRZ2gRAXRcCtad0lWiEy7k-tEzT7iQsRRpomeNaSntKJWYEJiR3AfRD_15RuTF7md?type=png)](https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNp9UbFOwzAQ_ZXIczuQbBkYKAKKOlRpJ5TlGp8TC9sX2WdVpeq_Y0cClahl8rv3nt_d2WfRkURRC2Xo2A3gudg0rSuKEA-9h3Eo9h5cUORteMj8i9FjPt_AqCeSp4wbYmBNLuPdoBVPJAb9hRmtyJB_18zkc4lO3mlQZv4VHXpg3IPvkf-_UGV-C93nlgKu3Riv_Q0c1xZ6LJbLx_kWSdvAAc0t7aqc5Cl3Srqrroi81C5NHbJnzs26lH9zyplc_UbcGr8SC2HRW9Ay_do5e1vBA1psRZ2gRAXRcCtad0lWiEy7k-tEzT7iQsRRpomeNaSntKJWYEJiR3AfRD_15RuTF7md)
27+
28+
The `common_transforms` component provides the commonly used `RandomFlip` and `RandomHalfBody` **data augmentation** transforms.
29+
30+
- In the `Top-Down` method, operations such as `Shift`, `Rotate`, and `Resize` are implemented by [RandomBBoxTransform](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L435).
31+
- In the `Bottom-Up` method, resizing is implemented by [BottomupResize](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L327).
32+
- For `pose-3d`, flipping is implemented by [RandomFlipAroundRoot](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/pose3d_transforms.py#L13).
33+
34+
**MMPose** provides corresponding data conversion interfaces for `Top-Down`, `Bottom-Up`, and `pose-3d`. These interfaces use an affine transformation to map the image and coordinate labels from the `original_image_space` to the `input_image_space`.
35+
36+
- The `Top-Down` method uses [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14).
37+
- The `Bottom-Up` method uses [BottomupRandomAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/bottomup_transforms.py#L134).
38+
39+
Taking `RandomFlip` as an example, this method randomly flips the `original_image` and converts it into an `input_image` or an `intermediate_image`. To define a data transformation, you need to inherit from the [BaseTransform](https://github.com/open-mmlab/mmcv/blob/main/mmcv/transforms/base.py) class and register the new class with `TRANSFORMS`:
40+
41+
```python
from typing import List, Union

from mmcv.transforms import BaseTransform
from mmengine import is_list_of

from mmpose.registry import TRANSFORMS


@TRANSFORMS.register_module()
class RandomFlip(BaseTransform):
    """Randomly flip the image, bbox and keypoints.

    Required Keys:

        - img
        - img_shape
        - flip_indices
        - input_size (optional)
        - bbox (optional)
        - bbox_center (optional)
        - keypoints (optional)
        - keypoints_visible (optional)
        - img_mask (optional)

    Modified Keys:

        - img
        - bbox (optional)
        - bbox_center (optional)
        - keypoints (optional)
        - keypoints_visible (optional)
        - img_mask (optional)

    Added Keys:

        - flip
        - flip_direction

    Args:
        prob (float | list[float]): The flipping probability. If a list is
            given, the argument `direction` should be a list with the same
            length. And each element in `prob` indicates the flipping
            probability of the corresponding one in ``direction``. Defaults
            to 0.5
        direction (str | list[str]): The flipping direction. Options are
            ``'horizontal'``, ``'vertical'`` and ``'diagonal'``. If a list
            is given, each data sample's flipping direction will be sampled
            from a distribution determined by the argument ``prob``. Defaults
            to ``'horizontal'``.
    """

    def __init__(self,
                 prob: Union[float, List[float]] = 0.5,
                 direction: Union[str, List[str]] = 'horizontal') -> None:
        # Validate the flip probability (a float or a list of floats).
        if isinstance(prob, list):
            assert is_list_of(prob, float)
            assert 0 <= sum(prob) <= 1
        elif isinstance(prob, float):
            assert 0 <= prob <= 1
        else:
            raise ValueError('probs must be float or list of float, but '
                             f'got `{type(prob)}`.')
        self.prob = prob

        # Validate the flip direction (a str or a list of str).
        valid_directions = ['horizontal', 'vertical', 'diagonal']
        if isinstance(direction, str):
            assert direction in valid_directions
        elif isinstance(direction, list):
            assert is_list_of(direction, str)
            assert set(direction).issubset(set(valid_directions))
        else:
            raise ValueError('direction must be either str or list of str, '
                             f'but got `{type(direction)}`.')
        self.direction = direction

        # A list of probabilities needs one entry per direction.
        if isinstance(prob, list):
            assert len(prob) == len(self.direction)

    # The `transform(self, results)` method that performs the flip is
    # omitted from this excerpt.
```
114+
115+
**Input**:
116+
117+
- `prob` specifies the flip probability. It is either a single `float` in the range \[0,1\] or a `list` of floats, one per entry in `direction`, whose sum is at most 1.
118+
- `direction` specifies the direction of data transformation:
119+
- `horizontal`
120+
- `vertical`
121+
- `diagonal`
122+
123+
**Output**:
124+
125+
- Returns a `dict` containing the transformed data.
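
The way `prob` and `direction` interact can be sketched in plain Python. This is an illustrative sketch only, not the MMPose implementation: the helper name `choose_flip_direction` is hypothetical, and the even split of a single float `prob` across several directions is an assumption.

```python
import random
from typing import List, Optional, Union


def choose_flip_direction(
    prob: Union[float, List[float]],
    direction: Union[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one flip direction, or None for "no flip" (hypothetical helper)."""
    rng = rng or random.Random()
    directions = [direction] if isinstance(direction, str) else list(direction)
    if isinstance(prob, list):
        # One probability per direction.
        probs = list(prob)
    else:
        # Assumption: a single float is split evenly across the directions.
        probs = [prob / len(directions)] * len(directions)
    # Whatever probability mass is left over means "do not flip".
    no_flip = max(0.0, 1.0 - sum(probs))
    candidates = directions + [None]
    return rng.choices(candidates, weights=probs + [no_flip], k=1)[0]
```

With `prob=1.0` and a single direction the helper always returns that direction, and with `prob=0.0` it always returns `None`.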
126+
127+
Here is a simple example of using `RandomFlip` with `direction='diagonal'`:
128+
129+
```python
from mmpose.datasets.transforms import LoadImage, RandomFlip
import mmcv

# Load the original image from the path
results = dict(
    img_path='data/test/multi-person.jpeg'
)
transform = LoadImage()
results = transform(results)
# At this point, `results` is a `dict` describing the loaded image.
# It contains the following keys:
# - `img_path`: Absolute path of the image
# - `img`: Pixel data of the image
# - `img_shape`: The shape of the image
# - `ori_shape`: The original shape of the image

# Perform a diagonal flip on the original image
transform = RandomFlip(prob=1., direction='diagonal')
results = transform(results)
# At this point, `results` describes the flipped image.
# It contains the following keys:
# - `img_path`: Absolute path of the image
# - `img`: Pixel data of the image
# - `img_shape`: The shape of the image
# - `ori_shape`: The original shape of the image
# - `flip`: Whether the image was flipped
# - `flip_direction`: The direction in which the image was flipped

# Show the image after flipping
mmcv.imshow(results['img'])
```
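
When keypoints are present, a flip also has to update their coordinates. The following is a minimal pure-Python sketch of that idea, not the MMPose code: the function `flip_keypoints` as written here, its `(width, height)` argument order, and the choice to swap symmetric keypoints only for single-axis flips are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def flip_keypoints(
    keypoints: Sequence[Point],
    image_size: Tuple[int, int],
    flip_indices: Sequence[int],
    direction: str = 'horizontal',
) -> List[Point]:
    """Flip (x, y) keypoints inside a (width, height) image (illustrative)."""
    w, h = image_size
    if direction not in ('horizontal', 'vertical', 'diagonal'):
        raise ValueError(f'unsupported direction: {direction!r}')
    # Assumption: symmetric (left/right) keypoints are swapped only for
    # single-axis flips. A diagonal flip is a 180-degree rotation, which
    # does not mirror the image, so left and right stay the same.
    if direction in ('horizontal', 'vertical'):
        keypoints = [keypoints[i] for i in flip_indices]
    flipped = []
    for x, y in keypoints:
        if direction in ('horizontal', 'diagonal'):
            x = w - 1 - x
        if direction in ('vertical', 'diagonal'):
            y = h - 1 - y
        flipped.append((x, y))
    return flipped
```

For example, with two keypoints `[(10, 20), (30, 20)]` whose `flip_indices` are `[1, 0]` in a 100x50 image, a horizontal flip first swaps the pair and then mirrors the x coordinates.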
162+
163+
For more information on using custom data transformations and augmentations, please refer to [$MMPose/tests/test_datasets/test_transforms/test_common_transforms](https://github.com/open-mmlab/mmpose/blob/main/tests/test_datasets/test_transforms/test_common_transforms.py#L59).
164+
165+
#### RandomHalfBody
166+
167+
The [RandomHalfBody](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/common_transforms.py#L263) **data augmentation** algorithm randomly, with a given probability, restricts a sample to its upper or lower body.
168+
169+
**Input**:
170+
171+
- `min_total_keypoints`: the minimum total number of keypoints required to apply the transform
172+
- `min_half_keypoints`: the minimum number of keypoints required in the selected half body
173+
- `padding`: the padding ratio of the bbox
174+
- `prob`: the probability of applying the half-body transform when the keypoint counts meet the requirements
175+
176+
**Output**:
177+
178+
- Returns a `dict` containing the transformed data.
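
The role of the four inputs can be sketched as follows. This is a simplified illustration under stated assumptions, not the MMPose algorithm: the helper `half_body_bbox`, the `upper_ids` and `lower_ids` index lists, the uniform choice between the two halves, and the min/max bbox formula are all hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def half_body_bbox(
    keypoints: Sequence[Point],
    visible: Sequence[int],
    upper_ids: Sequence[int],
    lower_ids: Sequence[int],
    min_total_keypoints: int,
    min_half_keypoints: int,
    padding: float,
    prob: float,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[Point, Point]]:
    """Return (center, size) of a half-body bbox, or None to keep the full body."""
    rng = rng or random.Random()
    # Skip the transform if too few keypoints are visible or by chance.
    if sum(visible) < min_total_keypoints or rng.random() >= prob:
        return None
    upper = [i for i in upper_ids if visible[i]]
    lower = [i for i in lower_ids if visible[i]]
    # Only halves with enough visible keypoints are candidates.
    candidates = [ids for ids in (upper, lower) if len(ids) >= min_half_keypoints]
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    xs = [keypoints[i][0] for i in chosen]
    ys = [keypoints[i][1] for i in chosen]
    # Assumption: the new bbox tightly encloses the chosen keypoints and is
    # then enlarged by the padding ratio.
    center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    size = ((max(xs) - min(xs)) * padding, (max(ys) - min(ys)) * padding)
    return center, size
```

A pipeline step built on this idea would then overwrite the sample's bbox with the returned center and size so that later cropping focuses on the selected half.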
179+
180+
#### Topdown Affine
181+
182+
The [TopdownAffine](https://github.com/open-mmlab/mmpose/blob/dev-1.x/mmpose/datasets/transforms/topdown_transforms.py#L14) data transformation algorithm transforms the `original image` into an `input image` through an affine transformation. It takes the following inputs:
183+
184+
- `input_size`: the \[w,h\] size to which the bbox area is cropped and warped
185+
- `use_udp`: whether to use unbiased data processing ([UDP](https://arxiv.org/abs/1911.07524))
186+
187+
**Output**:
188+
189+
- Returns a `dict` containing the transformed data.
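
The mapping from a bbox in the original image to the `input_size` canvas can be written as a 2x3 affine matrix. The sketch below is a generic construction under simple assumptions (scale by `w / bbox_w` and `h / bbox_h`, rotate about the bbox center, no UDP correction). The function names are hypothetical and the exact conventions used by `TopdownAffine` may differ.

```python
import math
from typing import List, Tuple


def bbox_to_input_matrix(
    center: Tuple[float, float],
    scale: Tuple[float, float],
    rot_deg: float,
    output_size: Tuple[int, int],
) -> List[List[float]]:
    """Build a 2x3 matrix mapping a bbox (center, scale) onto a (w, h) canvas."""
    cx, cy = center
    bw, bh = scale
    w, h = output_size
    sx, sy = w / bw, h / bh
    cos_r = math.cos(math.radians(rot_deg))
    sin_r = math.sin(math.radians(rot_deg))
    # p' = S * R * (p - bbox_center) + canvas_center
    a, b = sx * cos_r, -sx * sin_r
    c, d = sy * sin_r, sy * cos_r
    tx = w / 2 - (a * cx + b * cy)
    ty = h / 2 - (c * cx + d * cy)
    return [[a, b, tx], [c, d, ty]]


def apply_affine(matrix: List[List[float]],
                 point: Tuple[float, float]) -> Tuple[float, float]:
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

The same matrix is applied to both the image (via a warp) and the keypoint coordinates, which keeps the labels aligned with the cropped input.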
190+
191+
### Using Data Augmentation and Transformation in the Pipeline
192+
193+
The **data augmentation** and **data transformation** process can be configured in the configuration file as in the following example:
194+
195+
```python
train_pipeline_stage2 = [
    ...
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='RandomHalfBody'),
    dict(
        type='RandomBBoxTransform',
        shift_factor=0.,
        scale_factor=[0.75, 1.25],
        rotate_factor=60),
    dict(
        type='TopdownAffine',
        input_size=codec['input_size']),
    ...
]
```
211+
212+
The pipeline in the example performs **data augmentation** on the input data: a random horizontal flip, a random half-body transform, and the `Top-Down` `Shift`, `Rotate`, and `Resize` operations of `RandomBBoxTransform`. `TopdownAffine` then applies an affine transformation that maps the result to the `input_image_space`.
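
How a list of `dict(type=...)` entries turns into a chain of transforms can be illustrated with a toy registry. This is a conceptual sketch only: the `TOY_TRANSFORMS` registry, the `register` decorator, and the `FlipX` and `ShiftX` classes are hypothetical stand-ins and do not use the MMPose or MMEngine APIs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a transform registry: maps a type name to a class.
TOY_TRANSFORMS: Dict[str, type] = {}


def register(cls: type) -> type:
    """Register a transform class under its class name."""
    TOY_TRANSFORMS[cls.__name__] = cls
    return cls


@register
class FlipX:
    """Mirror the `x` entry inside an image of the given width."""

    def __init__(self, width: int) -> None:
        self.width = width

    def __call__(self, results: dict) -> dict:
        results['x'] = self.width - 1 - results['x']
        return results


@register
class ShiftX:
    """Shift the `x` entry by a fixed offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, results: dict) -> dict:
        results['x'] = results['x'] + self.offset
        return results


def build_pipeline(cfgs: List[dict]) -> List[Callable[[dict], dict]]:
    """Instantiate each config entry: `type` picks the class, the rest are kwargs."""
    transforms = []
    for cfg in cfgs:
        kwargs = dict(cfg)  # copy so the config itself is not modified
        transforms.append(TOY_TRANSFORMS[kwargs.pop('type')](**kwargs))
    return transforms


def run_pipeline(transforms: List[Callable[[dict], dict]], results: dict) -> dict:
    """Pass the results dict through every transform in order."""
    for transform in transforms:
        results = transform(results)
    return results
```

Because each transform reads and writes the same `results` dict, the order of entries in the config matters: swapping `FlipX` and `ShiftX` in the list generally gives a different output.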
