Skip to content

update: unify on attempt.notes["triggers"]#1147

Merged
leondz merged 4 commits intoNVIDIA:mainfrom
leondz:update/trigger_triggers
Apr 2, 2025
Merged

update: unify on attempt.notes["triggers"]#1147
leondz merged 4 commits intoNVIDIA:mainfrom
leondz:update/trigger_triggers

Conversation

@leondz
Copy link
Collaborator

@leondz leondz commented Apr 2, 2025

Removes use of attempt.notes["trigger"], unifying on attempt.notes["triggers"] as a list of str

Include tests for some affected plugins & update their code to comply with tests

Plugins known to be affected:

  • continuation
  • promptinject

API changes:

  • hitlog param "trigger" (str) is now "triggers" (list)

Verification

  • Run general tests
  • Run the new tests
  • Run a scan with continuation and check results are reasonable
  • Run a scan with promptinject and check results are reasonable

@leondz leondz added the architecture Architectural upgrades label Apr 2, 2025
@leondz leondz requested a review from jmartin-tech April 2, 2025 08:32
Copy link
Collaborator

@jmartin-tech jmartin-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, one possible enhancement for a hardcoded expectation noted.

triggers = attempt.notes.get("triggers", None)
if triggers == None:
return results
trigger = triggers[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to iterate all "triggers" even if only one is ever expected to exist. The hardcoded [0] will likely be a source for a future issue report.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, will address

candidate_prompt = prefix + term_variant
if candidate_prompt not in self.prompts:
self.prompts.append(candidate_prompt.strip())
self.prompts.append(candidate_prompt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why the strip() call was here originally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. A few popular contemporary continuation models (since displaced by chat models) would exhibit odd behaviour when prompts ended with space, when this was coded. This isn't the case any more. I think now we want to see these odd behaviours if they can be summoned by something as simple as ending a prompt with a space.

for term, prefix in self._slur_prefixes:
lower_term = term.lower()
term_variants = set(["", lower_term[0]]) # blank, first letter
term_variants = set(["", term[0]]) # blank, first letter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the change of case here possibly impact response? The added test suggests no as this should be case-insensitive.

Looking closer at the codebase existing runtime calls to detect() never pass case_sensitive so all detection is expected to be case_sensitive=False.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might impact response. I think with this change the responsibility for getting a case that elicits useful responses is shifted to the slur prefixes data, which seems a fine place for it to reside.

Interesting re: no use of case_sensitive! Maybe this isn't the right place to expose that param.

@leondz leondz merged commit 744d7b0 into NVIDIA:main Apr 2, 2025
9 checks passed
@leondz leondz deleted the update/trigger_triggers branch April 2, 2025 19:09
@github-actions github-actions bot locked and limited conversation to collaborators Apr 2, 2025
@leondz leondz restored the update/trigger_triggers branch April 2, 2025 19:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

architecture Architectural upgrades

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants