Fix/prompt encoding fixes #4964
Merged
transformers 5 defaults tokenizer.add_tokens() to normalized=True, so the CLIP tokenizer lowercases added embedding names. tokenizer.tokenize() then returns the lowercased surface, maybe_convert_prompt never matches mixed-case names in added_tokens_encoder, and multi-vector expansion is skipped. Every mixed-case multi-vector embedding collapsed to its first vector and looked ignored. Add embedding tokens as AddedToken(name, normalized=False) so they stay case-sensitive and tokenize() surfaces them verbatim. Valid on transformers 4.x and 5.x; convert_tokens_to_ids and encoding are unaffected.
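To make the failure mode concrete, here is a minimal pure-Python sketch of the multi-vector expansion step. It models the lookup with plain lists and dicts rather than a real tokenizer; the helper `expand_multivector`, the embedding name `MyEmbed`, and the token ids are hypothetical, and the loop only approximates what `maybe_convert_prompt` does.

```python
def expand_multivector(prompt, tokens, added_tokens_encoder):
    """Expand each known embedding name into name, name_1, name_2, ..."""
    for token in set(tokens):
        if token in added_tokens_encoder:
            replacement = token
            i = 1
            # Collect the extra vectors registered as name_1, name_2, ...
            while f"{token}_{i}" in added_tokens_encoder:
                replacement += f" {token}_{i}"
                i += 1
            prompt = prompt.replace(token, replacement)
    return prompt


# Hypothetical 3-vector embedding registered under its original case.
added = {"MyEmbed": 49408, "MyEmbed_1": 49409, "MyEmbed_2": 49410}
prompt = "photo, MyEmbed"

# Surface forms as tokenize() would report them in each mode (modelled by hand).
tokens_normalized = ["photo", ",", "myembed"]      # normalized=True: lowercased
tokens_verbatim = ["photo", ",", "MyEmbed"]        # normalized=False: unchanged

collapsed = expand_multivector(prompt, tokens_normalized, added)
expanded = expand_multivector(prompt, tokens_verbatim, added)
print(collapsed)
print(expanded)
```

With the lowercased surface the name never matches, so the prompt is returned unexpanded and only the first vector is used. In the real code the fix amounts to registering the name as `AddedToken(name, normalized=False)` instead of a bare string.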
transformers 5.6 flattened CLIPTextModel, removing the .text_model wrapper that compel_hijack and the xhinker parser dereference on the normalized clip-skip path. On SD1.5 at clip-skip >= 2 this raised AttributeError; processing_prompt caught it and silently fell back to fixed-attention encoding, dropping textual inversion and prompt weighting. Resolve the submodule via getattr(te, 'text_model', te), which is correct for flattened CLIPTextModel, CLIPTextModelWithProjection (still nested), and transformers < 5.6.
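A small sketch of the `getattr` fallback, using stand-in classes instead of real transformers models. The class names, the `resolve_text_model` helper, and the string-returning `final_layer_norm` are all hypothetical; they only mimic the nested and flattened layouts described above.

```python
class _Transformer:
    """Stand-in for the module that owns final_layer_norm."""

    def final_layer_norm(self, hidden):
        return f"normed({hidden})"


class NestedTextEncoder:
    """Older layout: the transformer sits behind a .text_model wrapper."""

    def __init__(self):
        self.text_model = _Transformer()


class FlatTextEncoder(_Transformer):
    """Flattened layout: no .text_model attribute, the methods live on the model."""


def resolve_text_model(te):
    # Use the wrapper's inner module when present, otherwise the encoder itself.
    return getattr(te, "text_model", te)


nested = NestedTextEncoder()
flat = FlatTextEncoder()

# Both layouts now reach final_layer_norm through the same call.
out_nested = resolve_text_model(nested).final_layer_norm("h")
out_flat = resolve_text_model(flat).final_layer_norm("h")
print(out_nested, out_flat)
```

Writing `flat.text_model.final_layer_norm(...)` directly would raise `AttributeError` on the flattened layout, which is the crash the clip-skip path was hitting.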
The vendored encode_prompt copies in the PAG, APG, ControlNet-XS and differential diffusion pipelines dereference the .text_model wrapper that transformers 5.6 removed from CLIPTextModel, so their clip-skip path crashes on SD1.5 and SDXL TE1. Apply the same getattr(te, 'text_model', te) fix as the core parser. Mirrors upstream diffusers, which still carries this deref in pipeline encode_prompt; only the single-file loader was fixed there.
clip_skip and the uni_pc_* opts live in opts.data without an OptionInfo in data_labels (compatibility_opts). Options.set() read data_labels[key].onchange unconditionally, so setting clip_skip via /sdapi/v1/options raised KeyError and returned 500; the override_settings restore path had the same unguarded data_labels[k] access for falsy-valued compat opts. Guard the onchange lookup and read the stored value via getattr(opts, k), which already falls back through data then data_labels.
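The guard can be sketched with a toy options store. This is an illustration under assumptions, not the project's `Options` class: the `OptionInfo` and `Options` shapes below are simplified stand-ins, and `sd_model` is a hypothetical labelled option.

```python
class OptionInfo:
    def __init__(self, default, onchange=None):
        self.default = default
        self.onchange = onchange


class Options:
    def __init__(self, data_labels, data):
        self.data_labels = data_labels  # options that have metadata
        self.data = data                # stored values, including compat opts

    def __getattr__(self, item):
        # Only called when normal lookup fails: fall back through data, then labels.
        data = self.__dict__.get("data", {})
        if item in data:
            return data[item]
        labels = self.__dict__.get("data_labels", {})
        if item in labels:
            return labels[item].default
        raise AttributeError(item)

    def set(self, key, value):
        if getattr(self, key, None) == value:
            return False
        self.data[key] = value
        # Compat opts have no data_labels entry, so look it up defensively.
        info = self.data_labels.get(key)
        if info is not None and info.onchange is not None:
            info.onchange()
        return True


calls = []
opts = Options(
    data_labels={"sd_model": OptionInfo("base", onchange=lambda: calls.append("sd_model"))},
    data={"clip_skip": 1},  # compat opt: value only, no OptionInfo
)
changed_compat = opts.set("clip_skip", 2)        # no KeyError, no callback
changed_labelled = opts.set("sd_model", "other")  # callback fires
print(changed_compat, changed_labelled, calls)
```

An unguarded `self.data_labels[key].onchange` in `set()` would raise `KeyError` for `clip_skip` here, which is what surfaced as the 500 from the options endpoint.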
Owner
lgtm!
Description
Here are 4 hours of my life I'm not getting back.
Restore textual inversion (embeddings), which stopped applying in newer Transformers versions. Two independent regressions in the prompt-encoding path, plus a related clip-skip options-API crash.
Notes
Three root causes:
1. `modules/textual_inversion.py`: transformers 5 defaults `add_tokens()` to `normalized=True`, so CLIP lowercases added embedding names. `tokenizer.tokenize()` returns the lowercased surface, `maybe_convert_prompt` no longer matches the original-case name in `added_tokens_encoder`, and multi-vector embeddings never expand (each collapses to its first vector, so a 16-vector negative applies 1/16). Adding tokens as `AddedToken(name, normalized=False)` keeps them case-sensitive; `convert_tokens_to_ids` and encoding are unchanged.
2. `prompt_parser_diffusers.py`, `prompt_parser_xhinker.py`, vendored `pag/pipe_sd.py`, `apg/pipeline_stable_diffusion_apg.py`, `control/units/xs_pipe.py`, `scripts/differential_diffusion.py`: transformers 5.6 flattened `CLIPTextModel` and dropped the `.text_model` wrapper. The clip-skip branch dereferences it, so SD1.5 at clip-skip >= 2 raised `AttributeError`, which `processing_prompt` caught before silently falling back to fixed-attention encoding (embeddings and prompt weighting lost). `getattr(te, 'text_model', te)` handles flattened `CLIPTextModel`, `CLIPTextModelWithProjection` (still nested), and transformers < 5.6.
3. `options_handler.py`, `processing.py`: `clip_skip` and `uni_pc_*` are `compatibility_opts` (in `opts.data`, no `data_labels` entry). `Options.set()` read `data_labels[key].onchange` unconditionally -> KeyError -> 500 on `POST /sdapi/v1/options {clip_skip}`; the override_settings restore path had the same unguarded access. Guarded both with `getattr(opts, k)`.

Environment and Testing