allow for empty before_ids and after_ids by rimon15 · Pull Request #8 · GraySwanAI/nanoGCG

rimon15 · 2024-08-21T16:06:39Z

Currently, if the chat template has no prefix or suffix text to add, the run fails: torch defaults to the float32 dtype for empty tensors, so the embedding lookup at gcg.py:223 raises an error. This change allows empty before_ids and after_ids, which is useful for models that do not have prefixes/suffixes in their templates (e.g., a base model).
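The failure mode can be reproduced outside nanoGCG. The snippet below is a minimal sketch, not the code from this PR: the tiny `embedding` layer, the variable names, and the choice to cast with `.to(torch.int64)` are all assumptions made for illustration. It shows that an empty tensor built without an explicit dtype is float32, that an embedding lookup rejects it, and that casting to an integer dtype makes the lookup return an empty embedding that concatenates cleanly with the rest of the prompt.

```python
import torch

# Hypothetical stand-in for the model's input embedding layer.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

# An empty batch of token ids, as might result from a template with no prefix.
# With no values to infer from, torch falls back to its default dtype (float32).
empty_ids = torch.tensor([[]])  # shape (1, 0)

# Embedding lookups require integer indices, so the float32 tensor is rejected.
lookup_failed = False
try:
    embedding(empty_ids)
except RuntimeError:
    lookup_failed = True

# Assumed remedy for this sketch: cast the ids to int64 before the lookup.
fixed_ids = empty_ids.to(torch.int64)
fixed_embeds = embedding(fixed_ids)  # shape (1, 0, 4): valid, just empty

# The empty prefix embedding concatenates with the remaining prompt embeddings.
prompt_embeds = embedding(torch.tensor([[1, 2, 3]]))
full_embeds = torch.cat([fixed_embeds, prompt_embeds], dim=1)
```

Casting at the point where the ids are created keeps the downstream code unchanged, since an empty int64 tensor behaves like any other id tensor of length zero.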

justinwangx · 2024-08-22T15:33:35Z

this is great, thank you!

* chore: housekeeping
* chore: linter/formatter
* chore: more linter/formatter + default model
* feat: probe sampling for nanoGCG - plumbing (#3)
* feat: probe sampling - checkpointing
* a bit more cleanups
* feat: probe sampling for nanoGCG - parallelization (#4)
  Major thing is the parallelization. It's actually relatively straightforward; what took the most time was debugging the CUDA device-side assertions.
* feat: allow retrying (#5)
  During probe sampling, a noticeable number of iterations don't actually bring down the losses. Adding a retry functionality to hopefully tackle the issue.
* fix: correct condition of buffer.size == 0
* docs: Update README.md
* docs: wordsmithing on README
* debug: try using another pad token
* fix: correct `optim_ids` assignment (#6)
  * A trivial bug that unfortunately blocked me for a couple of days.
  * Symptom: When probe sampling is enabled, the losses don't seem to be optimized at all.
  * Debugging: Manually tweaked `R` such that probe sampling effectively looks at all `B` candidates; the issue still persists.
  * Approach: In the end a silly mistake was discovered during the `current_loss, optim_ids` assignment stage, where in probe sampling the indices in `sampled_ids` don't match the actual calculated optimal candidate.
  * Fixed by making the probe sampling function return the optimal candidate.
* chore: undo retry feature (#7)
  Thanks to psyclaudeZ#6, the retry mechanism is probably not helpful. Hence reverting.
* chore: code cleanup (#8)
* refactor: better capture draft mode with a dedicated config
* docs: comments
* perf: GPU memory cleanup (#9)
* perf: actually might not need the cleanup
* perf: no_grad for draft loss calculation (#10)
* docs: revert changes to README
* address feedback: excessive logger.debug, linter, simply.py
* address feedback: pad tokens, deps
* chore: lowerbound for transformers dep

allow for empty before_ids and after_ids

a7c37a3

justinwangx merged commit 654aba5 into GraySwanAI:main Aug 22, 2024

justinwangx merged 1 commit into GraySwanAI:main from rimon15:fix_empty_ids
