Implementation of ResMLP and gMLP (with improvements to MLPMixer and PatchEmbedding)
#125
Conversation
1. Refactor of MLPMixer to allow for more customisation
2. Refined API for PatchEmbedding, DropPath and LayerScale
3. Cleaned up some utility functions
4. Fixed minor formatting errors
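As background for item 2 above, here is a minimal sketch of the usual patch-embedding construction: a convolution whose kernel and stride both equal the patch size, followed by flattening the spatial grid into a token dimension. The names and the refined `PatchEmbedding` API in this PR may well differ.

```julia
using Flux

# Hypothetical sketch, not the PR's actual API: embed non-overlapping
# patches with a strided convolution, then flatten (W, H) into a token
# dimension so the output has shape (planes, npatches, batch).
function patch_embedding(patch_size::Int; inchannels = 3, planes = 768)
    return Chain(Conv((patch_size, patch_size), inchannels => planes;
                      stride = patch_size),
                 x -> permutedims(reshape(x, :, size(x, 3), size(x, 4)),
                                  (2, 1, 3)))
end
```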
What is with the Ubuntu failures with the KILL signal 🥲
Per one of Kyle's earlier comments, this may be more OOMs. I saw a similar issue pop up in FluxBench, so perhaps something is leaking in the test suite, something is being tested with too large an input, or GC is not running aggressively enough?
Inputs are similar for all models, so that's probably unlikely? Not sure about GC or leaks in the test suite, although it's interesting that it seems to be OS-dependent somehow. macOS has been unaffected across PRs, while Windows is fine on this one but errors on the res2net one.
Right, but some of the newer models might be larger. I wonder if the glibc memory bloat issue might be a factor on Linux, but this is all speculation.
These OOMs are why we disable some of the tests. Can you try adding manual GC calls? Maybe we can see if FluxML can get some more dedicated CI options. Otherwise, we can always "chunk" the tests so that each CI job stays within the memory limit.
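For concreteness, a sketch of what chunking plus manual GC calls could look like in a test runner; the `TEST_GROUP` variable and group names here are hypothetical, not the repository's actual setup.

```julia
using Test

# Hypothetical: let each CI job run only a subset of the suite, selected
# via an environment variable, and collect garbage between groups so no
# single job's peak memory exceeds the runner's limit.
const TEST_GROUP = get(ENV, "TEST_GROUP", "all")

if TEST_GROUP in ("all", "mixers")
    @testset "MLPMixer / ResMLP / gMLP" begin
        # ... forward and gradient tests for the mixer variants ...
    end
    GC.gc()  # the manual collection suggested above
end

if TEST_GROUP in ("all", "other")
    @testset "other models" begin
        # ... remaining model tests ...
    end
    GC.gc()
end
```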
Huh, that's unexpected. I did add the manual GC calls.
The GC can certainly be version-dependent. It's also possible that we are right on the edge of maxing out and the results are non-deterministic.
I don't think the GC calls are making any difference. What is consistent between both runs so far is that the KILL signal happens right when we start the "other" tests. So it seems like the MLP mixer variants are more memory intensive; maybe right at our 7 GB limit. Can you try reducing the batch size to 1? As for why macOS never has these issues, it looks like those machines get twice as much memory as Windows or Linux: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
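Concretely, the suggestion amounts to something like this in the tests (shapes assume the standard 224×224 RGB input; `MLPMixer()` stands in for whichever variant is under test):

```julia
using Flux, Metalhead

model = MLPMixer()
x = rand(Float32, 224, 224, 3, 1)  # batch size 1 instead of a larger batch
y = model(x)  # peak memory of the forward/backward pass scales with batch size
```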
Sorry, edited my comment above after you pushed, but let's see what happens anyways.
This is exactly what's confusing me: they're not. The ViT variants are the heaviest by far, and they didn't trip the memory limit when they were merged, so this is surprising, because the MLPMixer variants are less intensive on both memory and compute. Edit: oh, I just realised we aren't testing the ViT variants, only the base model 🤦🏽‍♂️ that probably explains it.
But yeah, we will probably need a different approach for the tests anyways, given that some of the ViT variants will be in the multi-100M-parameter range; this seems hacky and also doesn't seem to work all the time. Edit: nvm, figured out that some of the models are up to 1 GB in size (especially the […]).
1. Cleaned up `mlpblock` implementation
2. More elaborate API for the `mlpmixer` constructor
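For context, a minimal sketch of the kind of two-layer, expand-then-project MLP block the first item refers to; the actual `mlpblock` in the PR may take different arguments or defaults.

```julia
using Flux

# Illustrative only: a channel-mixing MLP block as in the MLPMixer paper,
# expanding to a hidden width and projecting back, with dropout throughout.
mlpblock(planes::Int, hidden_planes::Int; activation = gelu, dropout = 0.0) =
    Chain(Dense(planes, hidden_planes, activation),
          Dropout(dropout),
          Dense(hidden_planes, planes),
          Dropout(dropout))
```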
Is this GTG?
darsnack left a comment
Sorry about that, I left some comments. I'll need a little more time to review the spatial gating unit properly.
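For readers following along, here is a rough sketch of the spatial gating unit as the gMLP paper describes it, assuming a (channels, tokens, batch) layout; the PR's actual implementation may arrange dimensions or initialisation differently (the paper also initialises the token projection near zero with unit bias so the unit starts near identity).

```julia
using Flux

# Sketch of the gMLP spatial gating unit: split channels in half, normalise
# one half, project it across tokens, and gate the other half with it.
struct SpatialGatingUnit{N, P}
    norm::N   # LayerNorm over the gated half of the channels
    proj::P   # Dense layer acting across the token dimension
end

Flux.@functor SpatialGatingUnit

SpatialGatingUnit(channels::Int, ntokens::Int) =
    SpatialGatingUnit(LayerNorm(channels ÷ 2), Dense(ntokens, ntokens))

function (sgu::SpatialGatingUnit)(x)          # x: (channels, tokens, batch)
    c = size(x, 1) ÷ 2
    u, v = x[1:c, :, :], x[(c + 1):end, :, :] # split along channels
    v = sgu.norm(v)
    v = permutedims(v, (2, 1, 3))             # (tokens, channels ÷ 2, batch)
    v = sgu.proj(v)                           # linear projection across tokens
    v = permutedims(v, (2, 1, 3))             # back to (channels ÷ 2, tokens, batch)
    return u .* v
end
```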
CI failures are upstream on nightly, during precompilation, because […]
darsnack left a comment
Wow, this is a very clean implementation now that I've had the chance to appreciate it!
Why is Windows CI OOMing on nightly 😑 it's the same code as on the stable version.
Yeah, the memory issues are still problematic on GitHub Actions... we'll probably need a long-term alternative as the number of models increases.
darsnack left a comment
Some small changes, but looks ready otherwise.
Co-authored-by: Kyle Daruwalla <[email protected]>
Thanks @theabhirath!
This is an implementation of ResMLP. In the process, I ended up doing quite a lot of things, including an almost complete rewrite of the base MLPMixer model itself to make it cleaner and more understandable, as well as fixing stuff like `PatchEmbedding`, `DropPath` and `LayerScale`. There is also some utility function cleanup (and some random formatting errors that I fixed as I found them), but mostly this PR deals with MLPMixer, ResMLP and `PatchEmbedding`. `gradtest` passes, so I think this should be fine on that front.
Edit: also added gMLP to the implementations.
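As a note on `gradtest`: a hedged sketch of the kind of gradient smoke test this refers to (the helper in the actual test suite may differ) is below.

```julia
using Flux

# Illustrative gradient smoke test: run a backward pass through the model
# and check that every gradient that exists is finite.
function gradtest(model, input)
    ps = Flux.params(model)
    gs = Flux.gradient(() -> sum(model(input)), ps)
    return all(p -> gs[p] === nothing || all(isfinite, gs[p]), ps)
end

# e.g. gradtest(model, rand(Float32, 224, 224, 3, 1))
```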