allow for empty before_ids and after_ids #8

Merged: justinwangx merged 1 commit into GraySwanAI:main from rimon15:fix_empty_ids on Aug 22, 2024

@rimon15 rimon15 commented Aug 21, 2024

Contributor

Currently, if the chat template has no prefix or suffix text to add, the run fails: torch defaults to the float32 dtype for empty tensors, so the embedding lookup at gcg.py:223 raises an error. This change allows empty before_ids and after_ids, which is useful for models whose templates have no prefix or suffix (e.g., a base model).
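
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from gcg.py: the small `embedding` layer and the `try_lookup` helper are hypothetical stand-ins for the model's embedding layer and the lookup at gcg.py:223, and passing `dtype=torch.long` is only one possible way to avoid the problem, not necessarily how this PR does it.

```python
import torch

# An empty list gives torch nothing to infer a dtype from, so it falls back
# to the default floating-point dtype (float32 unless it has been changed).
empty_default = torch.tensor([])
# Requesting an integer dtype explicitly keeps the tensor usable as indices.
empty_long = torch.tensor([], dtype=torch.long)

# Hypothetical stand-in for the model's embedding layer.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)


def try_lookup(ids):
    """Return the shape of the embedded ids, or None if the lookup raises."""
    try:
        return tuple(embedding(ids).shape)
    except (RuntimeError, TypeError):
        # Embedding layers accept only integer index tensors.
        return None


default_result = try_lookup(empty_default)  # float32 indices are rejected
long_result = try_lookup(empty_long)        # empty but valid index tensor

print(empty_default.dtype, default_result)
print(empty_long.dtype, long_result)
```

With an integer dtype, the empty tensor passes through the embedding layer and yields an empty embedding that can be concatenated with the rest of the prompt, which is what a template without a prefix or suffix needs.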

@justinwangx

Collaborator

This is great, thank you!

@justinwangx justinwangx merged commit 654aba5 into GraySwanAI:main Aug 22, 2024
justinwangx pushed a commit that referenced this pull request Feb 3, 2025
* chore: housekeeping

* chore: linter/formatter

* chore: more linter/formatter + default model

* feat: probe sampling for nanoGCG - plumbing (#3)

* feat: probe sampling - checkpointing

* a bit more cleanup

* feat: probe sampling for nanoGCG - parallelization (#4)

The major change is the parallelization. It was relatively straightforward; what took the most time was debugging the CUDA device-side assertions.

* feat: allow retrying (#5)

During probe sampling, a noticeable number of iterations don't actually bring down the losses. Adding retry functionality to hopefully tackle the issue.

* fix: correct condition of buffer.size == 0

* docs: Update README.md

* docs: wordsmithing on README

* debug: try using another pad token

* fix: correct `optim_ids` assignment (#6)

* A trivial bug that unfortunately blocked me for a couple of days.
* Symptom: When probe sampling is enabled, the losses don't seem to be optimized at all.
* Debugging: Manually tweaked `R` so that probe sampling effectively looks at all `B` candidates; the issue still persisted.
* Root cause: A silly mistake in the `current_loss, optim_ids` assignment stage. With probe sampling, the indices into `sampled_ids` don't match the candidate that was actually calculated to be optimal.
* Fix: Make the probe sampling function return the optimal candidate.

* chore: undo retry feature (#7)

Thanks to psyclaudeZ#6, the retry mechanism is probably not helpful. Hence reverting.

* chore: code cleanup (#8)

* refactor: better capture draft mode with a dedicated config

* docs: comments

* perf: GPU memory cleanup (#9)

* perf: actually might not need the cleanup

* perf: no_grad for draft loss calculation (#10)

* docs: revert changes to README

* address feedback: excessive logger.debug, linter, simply.py

* address feedback: pad tokens, deps

* chore: lowerbound for transformers dep