update: unify on attempt.notes["triggers"] #1147
jmartin-tech left a comment
Looks reasonable, one possible enhancement for a hardcoded expectation noted.
  triggers = attempt.notes.get("triggers", None)
  if triggers == None:
      return results
  trigger = triggers[0]
It may be useful to iterate all "triggers" even if only one is ever expected to exist. The hardcoded [0] will likely be a source for a future issue report.
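A minimal sketch of what iterating over every trigger could look like. The function name `score_outputs`, the plain `dict` standing in for `attempt.notes`, and the 1.0/0.0 scoring are assumptions for illustration, not the detector code in this PR.

```python
from typing import Dict, List


def score_outputs(notes: Dict[str, List[str]], outputs: List[str]) -> List[float]:
    """Score each output 1.0 if it contains any trigger, else 0.0.

    Hypothetical helper: `notes` stands in for attempt.notes.
    """
    results: List[float] = []
    triggers = notes.get("triggers")
    if not triggers:  # covers both a missing key and an empty list
        return results
    for output in outputs:
        # Check every trigger rather than assuming only triggers[0] exists.
        hit = any(trigger in output for trigger in triggers)
        results.append(1.0 if hit else 0.0)
    return results
```

With this shape, a probe that later stores two or more triggers is handled without touching the detector again.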
  candidate_prompt = prefix + term_variant
  if candidate_prompt not in self.prompts:
-     self.prompts.append(candidate_prompt.strip())
+     self.prompts.append(candidate_prompt)
Do we know why the strip() call was here originally?
Yes. When this was coded, a few popular continuation models (since displaced by chat models) behaved oddly when a prompt ended with a space. That isn't the case any more. I think we now want to see these odd behaviours if they can be summoned by something as simple as ending a prompt with a space.
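To make the effect of dropping strip() concrete, here is a small sketch. The `build_prompts` helper and the example prefix are hypothetical; only the check-then-append order mirrors the diff above.

```python
from typing import List


def build_prompts(prefix: str, variants: List[str], strip: bool) -> List[str]:
    """Build candidate prompts, optionally stripping surrounding whitespace.

    Hypothetical stand-in for the probe's prompt-building loop.
    """
    prompts: List[str] = []
    for variant in variants:
        candidate = prefix + variant
        if candidate not in prompts:
            # strip=True is the old behaviour, strip=False the new one.
            prompts.append(candidate.strip() if strip else candidate)
    return prompts


old_style = build_prompts("You are a ", ["", "b"], strip=True)
new_style = build_prompts("You are a ", ["", "b"], strip=False)
```

With the blank variant, `old_style` begins with "You are a" while `new_style` keeps the trailing space as "You are a ", which is exactly the prompt shape the reply wants to exercise.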
  for term, prefix in self._slur_prefixes:
      lower_term = term.lower()
-     term_variants = set(["", lower_term[0]])  # blank, first letter
+     term_variants = set(["", term[0]])  # blank, first letter
Does the change of case here possibly impact the response? The added test suggests not, as detection should be case-insensitive.
Looking closer at the codebase, existing runtime calls to detect() never pass case_sensitive, so all detection is expected to run with case_sensitive=False.
It might impact response. I think with this change the responsibility for getting a case that elicits useful responses is shifted to the slur prefixes data, which seems a fine place for it to reside.
Interesting re: no use of case_sensitive! Maybe this isn't the right place to expose that param.
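For reference, a rough sketch of the matching behaviour under discussion. `contains_trigger` is a hypothetical helper, and its `case_sensitive=False` default is an assumption based on the observation above that callers never pass the flag. It is not the actual detect() implementation.

```python
def contains_trigger(output: str, trigger: str, case_sensitive: bool = False) -> bool:
    """Return True if `trigger` occurs in `output`.

    Hypothetical helper: by default both strings are lowercased first,
    so the case of the stored trigger does not affect the match.
    """
    if case_sensitive:
        return trigger in output
    return trigger.lower() in output.lower()
```

Under this default, keeping the original case in `term[0]` changes what the model is prompted with, but not whether a matching continuation is detected.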
Removes use of attempt.notes["trigger"], unifying on attempt.notes["triggers"] as a list of str.
Include tests for some affected plugins & update their code to comply with tests.
Plugins known to be affected:
- continuation
- promptinject
API changes:
Verification
- continuation and check results are reasonable
- promptinject and check results are reasonable
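For plugins outside this repository that still write the old key, the unification could be approximated with a small shim. `normalize_trigger_notes` is a hypothetical helper, not part of this PR, and it assumes the notes behave like a plain `dict`.

```python
from typing import Any, Dict


def normalize_trigger_notes(notes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `notes` that uses only the "triggers" list.

    Hypothetical migration shim: folds a legacy "trigger" string into
    the "triggers" list and drops the old key.
    """
    normalized = dict(notes)
    legacy = normalized.pop("trigger", None)
    triggers = list(normalized.get("triggers") or [])
    if legacy is not None and legacy not in triggers:
        triggers.append(legacy)
    # Only set the key if there is something to store or it already existed.
    if triggers or "triggers" in notes:
        normalized["triggers"] = triggers
    return normalized
```

The input dict is left untouched, so a caller can compare the notes before and after while migrating.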