feat: add reasoning guardrail connector #1565
base: develop
nemoguardrails/llm/taskmanager.py
Outdated
    "sample_conversation_two_turns": self.config.sample_conversation,
    # Make rails config available in templates for accessing settings like
    # content_safety.reasoning.enabled, content_safety.multilingual.enabled, etc.
    "config": self.config.rails.config,
@Pouyanpi @tgasser-nv this was needed so that we can toggle between reasoning-on and reasoning-off modes in the Jinja2 prompt template depending on config.yml.
I think it is better to pass reasoning_enabled in actions.py via the render_task_prompt call (see this commit). Content safety should own its configuration, and this also keeps LLMTaskManager generic 👍🏻 Please let me know if you see any issues.
Greptile Summary
This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model.
Key changes:
Issue found:
| Filename | Overview |
|---|---|
| nemoguardrails/llm/output_parsers.py | Added two new output parsers (nemotron_reasoning_parse_prompt_safety and nemotron_reasoning_parse_response_safety) with helper functions for parsing Nemotron Content Safety Reasoning model outputs |
| nemoguardrails/llm/taskmanager.py | Registered new output parsers and exposed config.rails.config object in template rendering context to enable access to configuration settings |
| nemoguardrails/rails/llm/config.py | Added ReasoningConfig model to support reasoning mode toggle for content safety models |
| examples/configs/content_safety_reasoning/prompts.yml | Prompt templates for content safety checks - uses incorrect variable reasoning_mode instead of config.content_safety.reasoning.enabled |
Sequence Diagram
sequenceDiagram
participant User
participant LLMRails
participant TaskManager
participant ContentSafetyModel as Nemotron-4B
participant MainLLM
User->>LLMRails: generate(message)
Note over LLMRails,TaskManager: Input Rail Flow
LLMRails->>TaskManager: render_task_prompt("content_safety_check_input")
TaskManager->>TaskManager: Apply template with config.content_safety.reasoning.enabled
TaskManager-->>LLMRails: Prompt with /think or /no_think
LLMRails->>ContentSafetyModel: Check input safety
ContentSafetyModel-->>LLMRails: "Prompt harm: harmful/unharmful"
LLMRails->>TaskManager: parse_task_output(output)
TaskManager->>TaskManager: nemotron_reasoning_parse_prompt_safety()
TaskManager->>TaskManager: Strip <think> tags, extract harm value
TaskManager-->>LLMRails: [True] or [False]
alt Input is safe
LLMRails->>MainLLM: Generate response
MainLLM-->>LLMRails: Response text
Note over LLMRails,TaskManager: Output Rail Flow
LLMRails->>TaskManager: render_task_prompt("content_safety_check_output")
TaskManager-->>LLMRails: Prompt with /think or /no_think
LLMRails->>ContentSafetyModel: Check output safety
ContentSafetyModel-->>LLMRails: "Response Harm: harmful/unharmful"
LLMRails->>TaskManager: parse_task_output(output)
TaskManager->>TaskManager: nemotron_reasoning_parse_response_safety()
TaskManager-->>LLMRails: [True] or [False]
alt Output is safe
LLMRails-->>User: Response
else Output is unsafe
LLMRails-->>User: "I'm sorry, I can't respond to that."
end
else Input is unsafe
LLMRails-->>User: "I'm sorry, I can't respond to that."
end
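The parsing step in the diagram (strip `<think>` tags, extract the harm value, return `[True]` or `[False]`) can be sketched as follows. This is a minimal illustration, not the PR's `nemotron_reasoning_parse_prompt_safety` implementation: `strip_think_tags` and `parse_harm_field` are hypothetical names, and two behaviours are assumptions: `True` means "safe", and output that cannot be parsed is treated as unsafe.

```python
import re

# Matches a reasoning trace such as "<think> ... </think>", including newlines.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_tags(text: str) -> str:
    """Remove any reasoning trace so only the verdict lines remain."""
    return THINK_BLOCK.sub("", text).strip()


def parse_harm_field(text: str, field: str) -> list:
    """Return [True] when the named field reads 'unharmful', else [False]."""
    cleaned = strip_think_tags(text)
    match = re.search(rf"{re.escape(field)}\s*:\s*(\w+)", cleaned, re.IGNORECASE)
    if match is None:
        # Assumption: fail closed when the model output has no such field.
        return [False]
    return [match.group(1).lower() == "unharmful"]
```

The same helper would serve both rails by passing `"Prompt harm"` for the input check and `"Response harm"` for the output check. Matching is case-insensitive, so a field printed as `Response Harm` is still found.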
9 files reviewed, 2 comments
Codecov Report
✅ All modified and coverable lines are covered by tests.
e36d9fc to 4c134db
Signed-off-by: Makesh Sreedhar <[email protected]>
Include the `reasoning_enabled` flag from the content safety config in the context passed to both input and output content safety check prompts. This enables prompt templates to conditionally adjust behavior based on the reasoning setting. Removes redundant config exposure in LLMTaskManager's default context.
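The idea in this commit message can be pictured with a small, dependency-free sketch. It is not the code from the PR: `ReasoningConfig` here is a stand-in dataclass, `build_content_safety_context` and `render_prompt` are hypothetical helpers, and the prompt wording is a placeholder. The assumption is that the content safety action reads the flag from its own config and passes it to the template as `reasoning_enabled`, and that the template ends the prompt with `/think` or `/no_think`.

```python
from dataclasses import dataclass


@dataclass
class ReasoningConfig:
    """Stand-in for the reasoning toggle read from config.yml."""
    enabled: bool = False


def build_content_safety_context(user_input: str, reasoning: ReasoningConfig) -> dict:
    """Build the per-call context the action hands to the prompt renderer."""
    return {"user_input": user_input, "reasoning_enabled": reasoning.enabled}


def render_prompt(context: dict) -> str:
    """Hypothetical renderer mirroring a template that branches on the flag."""
    mode = "/think" if context["reasoning_enabled"] else "/no_think"
    # Placeholder prompt text; the real template lives in prompts.yml.
    return f"User message: {context['user_input']}\n{mode}"
```

Because the flag travels in the per-call context, the default context of the task manager does not need to know anything about content safety settings.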
4c134db to 921f1d7
    - type: content_safety_reasoning
      engine: openai
      parameters:
        openai_api_base: http://localhost:8001/v1
        model_name: nvidia/Nemotron-Content-Safety-Reasoning-4B
        temperature: 0.6
        top_p: 0.95
I think it still works if we change it to:

    - type: content_safety_reasoning
      engine: openai
      model: nvidia/Nemotron-Content-Safety-Reasoning-4B
      parameters:
        openai_api_base: http://localhost:8001/v1
        temperature: 0.6
        top_p: 0.95

Can you please check?
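The equivalence this comment asks about can be stated as a small sketch. `resolve_model_name` is a hypothetical helper, not a NeMo Guardrails function; it only illustrates the assumption that a top-level `model` key and a `model_name` (or `model`) key under `parameters` should resolve to the same model. Whether the library actually treats both layouts identically is the point that still needs checking.

```python
def resolve_model_name(entry: dict) -> str:
    """Return the model name from a models entry, wherever it was declared."""
    params = entry.get("parameters") or {}
    # Prefer the top-level key, then fall back to the parameters block.
    name = entry.get("model") or params.get("model_name") or params.get("model")
    if not name:
        raise ValueError("model entry does not declare a model name")
    return name


old_layout = {
    "type": "content_safety_reasoning",
    "engine": "openai",
    "parameters": {"model_name": "nvidia/Nemotron-Content-Safety-Reasoning-4B"},
}
new_layout = {
    "type": "content_safety_reasoning",
    "engine": "openai",
    "model": "nvidia/Nemotron-Content-Safety-Reasoning-4B",
    "parameters": {},
}
```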
We need to modify the example in the tutorial as well.
Thanks a lot @makeshn, this is great 🚀
I just made a minor change based on my comment above and added tests. Let me know if you see any issues; we can revert or change it.
Please look at the tests to ensure the assertions are aligned with your expectations. As I cannot run it myself, would you please run it and pass some inputs to check that everything still works? Thanks 👍🏻
trebedea left a comment
Looks good, some suggestions.
    :maxdepth: 2
    Check Harmful Content <nemotron-safety-guard-deployment>
    Content Safety Reasoning <nemotron-content-safety-reasoning-deployment>
Should we use "Respect Custom Safety and Dialogue Policies" as the title instead? This would focus on the use case rather than on how it works.
    :link: nemotron-content-safety-reasoning-deployment
    :link-type: doc

    Deploy Nemotron-Content-Safety-Reasoning-4B for customizable content safety with reasoning traces.
Similar here, maybe we want to focus on use-case rather than how the model works. An alternative would be:
:::{grid-item-card} Respect Custom Safety and Dialogue Policies
:link: nemotron-content-safety-reasoning-deployment
:link-type: doc
Enforce customizable content safety and dialogue policies for your use-case by using our reasoning guard model Nemotron-Content-Safety-Reasoning-4B.
```text
Prompt harm: unharmful
Response harm: None
```
    - Response harm: None
    + Response harm: unharmful
| ```text | ||
| <think> | ||
| The human user's request falls under S21 (Illegal Activity) as it seeks guidance on theft, | ||
| which is a criminal act. The AI assistant's response is a refusal and provides an ethical |
Here the input rail is failing - it does not really make sense to have the output rail (including the bot response) in the example.
**Reasoning On (`/think`)**: Provides explicit reasoning traces for decisions. Better for complex or novel custom policies. Higher latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-1-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-on-mode).

**Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard safety checks. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).
    - **Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard safety checks. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).
    + **Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard content safety policies. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).
    # fmt: off
    # isort: skip_file

    # start-prerequisites

    # end-prerequisites
Remove these and move the print_moderation_output function after the imports
Description
This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model.
Key Changes:
Checklist