
Would SGD w/ momentum or NAG be a good fit for the gradient masking? #7

@kaimatzu

Description

Just wondering whether it's a good idea, in theory, to try the gradient masking with SGD (with momentum or NAG), since it should behave better than optimizers that track both first and second moments (e.g., Adam). Thoughts?
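The trade-off can be made concrete with a small sketch. It assumes "gradient masking" means an elementwise 0/1 mask multiplied into the gradient before the optimizer step; the project may do something different. The helpers `masked_sgd_step` and `masked_adam_step` are hypothetical, written for illustration only. The SGD update follows the PyTorch-style heavy-ball/Nesterov formulation with zero dampening. The sketch shows two effects worth checking. First, Adam's per-coordinate normalization makes the first step roughly `lr` regardless of gradient size, while SGD's step scales with the gradient. Second, any momentum-based optimizer (SGD momentum, NAG, and Adam alike) keeps moving a coordinate that is masked *after* its buffer has accumulated, unless that state is also reset.

```python
import math
from typing import List, Tuple

Vec = List[float]


def masked_sgd_step(params: Vec, grads: Vec, mask: Vec, buf: Vec,
                    lr: float = 0.1, momentum: float = 0.9,
                    nesterov: bool = False) -> Tuple[Vec, Vec]:
    """One SGD step with heavy-ball or Nesterov momentum on a masked gradient.

    PyTorch-style formulation with dampening = 0:
        b    <- momentum * b + g
        step  = g + momentum * b  (Nesterov)   or   b  (heavy ball)
    """
    new_params, new_buf = [], []
    for p, g, m, b in zip(params, grads, mask, buf):
        g = g * m                      # assumption: binary elementwise gradient mask
        b = momentum * b + g           # buffer still carries earlier, unmasked gradients
        step = g + momentum * b if nesterov else b
        new_params.append(p - lr * step)
        new_buf.append(b)
    return new_params, new_buf


def masked_adam_step(params: Vec, grads: Vec, mask: Vec, m_state: Vec,
                     v_state: Vec, t: int, lr: float = 0.1, beta1: float = 0.9,
                     beta2: float = 0.999, eps: float = 1e-8) -> Tuple[Vec, Vec, Vec]:
    """One Adam step (first + second moment estimates) on a masked gradient; t starts at 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, msk, m, v in zip(params, grads, mask, m_state, v_state):
        g = g * msk                            # same assumed 0/1 mask
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m)
        new_v.append(v)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # 1) Step size: Adam's first step is ~lr for both a tiny and a large gradient,
    #    while SGD's step is proportional to the gradient magnitude.
    g = [1e-3, 1.0]
    p_sgd, _ = masked_sgd_step([0.0, 0.0], g, [1.0, 1.0], [0.0, 0.0])
    p_adam, _, _ = masked_adam_step([0.0, 0.0], g, [1.0, 1.0],
                                    [0.0, 0.0], [0.0, 0.0], t=1)
    print("SGD  first step:", p_sgd)
    print("Adam first step:", p_adam)

    # 2) Carry-over: coordinate 1 is masked on the second step only,
    #    yet it keeps moving because its momentum buffer is non-zero.
    p, b = masked_sgd_step([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
    p, b = masked_sgd_step(p, [1.0, 1.0], [1.0, 0.0], b)
    print("masked coordinate after 2 steps:", p[1])
```

In this toy run the masked coordinate goes from -0.1 to roughly -0.19 on the step where its gradient is zeroed. So if the masking is meant to freeze parameters, the momentum (and, for Adam, moment) state for those coordinates would also need handling, whichever optimizer is used.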

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
