Refine dropout gpu memory #17095

Merged
sneaxiy merged 3 commits into PaddlePaddle:develop from sneaxiy:refine_dropout_mem
Apr 28, 2019

Collaborator

@sneaxiy sneaxiy commented Apr 25, 2019

This PR changes the element type of the Mask output of dropout_op to uint8_t. (Going further, Mask could be packed into single bits, similar to std::vector<bool>.)

With this change, the Transformer model in the benchmark repo runs stably at a maximum batch size of 12000.
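
To illustrate why a narrower mask type saves memory, here is a minimal CPU-side sketch. It is not the code from this PR: `DropoutForward`, `DropoutBackward`, `MaskBytes` and `BitPackedMaskBytes` are hypothetical names, the dropout is a simplified variant without output rescaling, and the comparison assumes the mask previously used the same element type as the input (e.g. float), which the description above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical sketch, not PaddlePaddle code.
// Forward pass: out[i] = x[i] * mask[i], where mask[i] is 0 (dropped) or 1 (kept).
// The mask is stored as uint8_t, i.e. one byte per element.
std::vector<std::uint8_t> DropoutForward(const std::vector<float>& x,
                                         float dropout_prob, unsigned seed,
                                         std::vector<float>* out) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  std::vector<std::uint8_t> mask(x.size());
  out->resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    // Keep the element when the random draw is at least dropout_prob.
    mask[i] = static_cast<std::uint8_t>(dist(rng) >= dropout_prob ? 1 : 0);
    (*out)[i] = x[i] * static_cast<float>(mask[i]);
  }
  return mask;
}

// Backward pass: the gradient flows only through the kept elements,
// so the mask must stay alive between forward and backward.
std::vector<float> DropoutBackward(const std::vector<float>& dout,
                                   const std::vector<std::uint8_t>& mask) {
  assert(dout.size() == mask.size());
  std::vector<float> dx(dout.size());
  for (std::size_t i = 0; i < dout.size(); ++i) {
    dx[i] = dout[i] * static_cast<float>(mask[i]);
  }
  return dx;
}

// Bytes needed to store an n-element mask with element type T.
template <typename T>
constexpr std::size_t MaskBytes(std::size_t n) {
  return n * sizeof(T);
}

// Bytes needed if the mask were packed to one bit per element,
// in the spirit of std::vector<bool>.
constexpr std::size_t BitPackedMaskBytes(std::size_t n) {
  return (n + 7) / 8;
}
```

Under these assumptions, a mask over n elements takes n bytes as uint8_t instead of n * sizeof(float) bytes, and a bit-packed mask would take roughly n / 8 bytes. Because the mask has to be kept from the forward pass until the backward pass, the saving applies to every dropout layer in the model.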

chengduoZH previously approved these changes Apr 28, 2019
Contributor

@chengduoZH chengduoZH left a comment

Excellent

# The first commit's message is:
remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (PaddlePaddle#17066)

# This is the 2nd commit message:

Fleet unify distributed training (PaddlePaddle#16791)

* implement distributed transpiler with fleet
# This is the 3rd commit message:

ParallelDyGraph with GPU collective mode (PaddlePaddle#16827)

implement dygraph.parallel.DataParallel to hook reduce op.

# This is the 4th commit message:

Init mixed precision training interface (PaddlePaddle#16856)

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

# This is the 5th commit message:

fix reference_count_pass,test=develop (PaddlePaddle#17060)

test=develop
# This is the 6th commit message:

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (PaddlePaddle#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

# This is the 7th commit message:

remove unnecessary prepare_data (PaddlePaddle#17080)

test=develop
# This is the 8th commit message:

fix interpolate cu. test=develop (PaddlePaddle#17101)

# This is the 9th commit message:

test=develop, double backward leaky_relu (PaddlePaddle#17067)

backward of backward: leaky_relu
# This is the 10th commit message:

fix fuse optimizer ops (PaddlePaddle#17102)

test=develop
# This is the 11th commit message:

truncated_gaussian_random supported in distributed training, test=develop (PaddlePaddle#17091)

# This is the 12th commit message:

 Detailed coordinate description for yolov3 loss (PaddlePaddle#17007)

* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop

# This is the 13th commit message:

fix test_weight_decay (PaddlePaddle#17109)

test=develop
# This is the 14th commit message:

Path flag (PaddlePaddle#17105)

* fix python/paddle/fluid/__init__.py detecting problems
@sneaxiy sneaxiy force-pushed the refine_dropout_mem branch from 7ead7f9 to d8389c2 on April 28, 2019 02:24
Contributor

@chengduoZH chengduoZH left a comment

Excellent

@sneaxiy sneaxiy merged commit 28d69d7 into PaddlePaddle:develop Apr 28, 2019
@sneaxiy sneaxiy deleted the refine_dropout_mem branch October 17, 2019 07:09