feat: allow not using the prefix cache by justinwangx · Pull Request #3 · GraySwanAI/nanoGCG

justinwangx · 2024-08-12T18:41:50Z

No description provided.

* chore: housekeeping
* chore: linter/formatter
* chore: more linter/formatter + default model
* feat: probe sampling for nanoGCG - plumbing (#3)
* feat: probe sampling - checkpointing
* a bit more cleanups
* feat: probe sampling for nanoGCG - parallelization (#4)
  The major change is the parallelization. It is relatively straightforward; what took the most time was debugging the CUDA device-side assertions.
* feat: allow retrying (#5)
  During probe sampling, a noticeable number of iterations don't actually bring down the losses. This adds retry functionality to hopefully tackle the issue.
* fix: correct condition of buffer.size == 0
* docs: Update README.md
* docs: wordsmithing on README
* debug: try using another pad token
* fix: correct `optim_ids` assignment (#6)
  * A trivial bug that unfortunately blocked me for a couple of days.
  * Symptom: When probe sampling is enabled, the losses don't seem to be optimized at all.
  * Debugging: Manually tweaked `R` such that probe sampling effectively looks at all `B` candidates; the issue still persists.
  * Approach: In the end, a silly mistake was discovered in the `current_loss, optim_ids` assignment stage: in probe sampling, the indices in `sampled_ids` don't match the actual calculated optimal candidate.
  * Fixed by making the probe sampling function return the optimal candidate.
* chore: undo retry feature (#7)
  Thanks to psyclaudeZ#6, the retry mechanism is probably not helpful. Hence reverting.
* chore: code cleanup (#8)
* refactor: better capture draft mode with a dedicated config
* docs: comments
* perf: GPU memory cleanup (#9)
* perf: actually might not need the cleanup
* perf: no_grad for draft loss calculation (#10)
* docs: revert changes to README
* address feedback: excessive logger.debug, linter, simply.py
* address feedback: pad tokens, deps
* chore: lowerbound for transformers dep
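
The `optim_ids` fix described in the commit list above concerns an index mismatch: when only a subset of candidates is evaluated, the position of the best loss inside the subset is not the position of that candidate in the full batch. The following is a minimal sketch of that class of bug and of the "return the optimal candidate" remedy. It is not nanoGCG's code; `probe_select`, `loss_fn`, and the toy candidates are hypothetical names chosen for illustration.

```python
import random


def probe_select(candidates, loss_fn, subset_size, seed=0):
    """Hypothetical helper: score a random subset and return (best_loss, best_candidate).

    Returning the candidate itself, rather than an index, means the caller
    never has to map a subset-local index back into the full batch.
    """
    rng = random.Random(seed)
    # Indices into the FULL candidate list that were chosen for evaluation.
    sampled_idx = rng.sample(range(len(candidates)), subset_size)
    sampled_losses = [loss_fn(candidates[i]) for i in sampled_idx]

    # Position of the minimum loss WITHIN the subset.
    local_best = min(range(subset_size), key=sampled_losses.__getitem__)

    # The buggy pattern would be `candidates[local_best]`, which indexes the
    # full batch with a subset-local position and can pick the wrong candidate.
    best_candidate = candidates[sampled_idx[local_best]]
    return sampled_losses[local_best], best_candidate


# Toy data (assumption): the loss of a candidate is simply its length.
toy_candidates = ["aaaaa", "bb", "cccc", "d", "eee"]
best_loss, best_candidate = probe_select(toy_candidates, len, subset_size=3)
```

Whatever subset is drawn, the returned loss is the loss of the returned candidate, which is the property the original symptom ("the losses don't seem to be optimized at all") suggests was violated.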

feat: allow not using the prefix cache

eebe8d6

justinwangx merged commit d2eaa78 into main Aug 12, 2024

justinwangx deleted the optional-prefix-cache branch August 12, 2024 18:44

justinwangx mentioned this pull request Aug 12, 2024

Fix for Gemma 2 #1

Closed

justinwangx merged 1 commit into main from optional-prefix-cache
