Conversation


@makeshn makeshn commented Jan 5, 2026

Description

This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model.

Key Changes:

  • Added a new model type content_safety_reasoning that integrates with the Nemotron Content Safety Reasoning model
  • Added support for reasoning mode toggle (/think vs /no_think) via config option
  • Added new output parsers (nemotron_reasoning_parse_prompt_safety, nemotron_reasoning_parse_response_safety) for parsing the model's safety classification output
  • Added an example configuration demonstrating content safety checks on both input and output flows, as well as reasoning-off and reasoning-on modes
  • Added a tutorial showing how to serve and set up the model via vLLM with an OpenAI-compatible API
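The harmful/unharmful parsing described above can be sketched roughly as follows. This is a minimal illustration, not the actual nemoguardrails implementation; the function name, regex, and fail-closed fallback are assumptions based only on the output format described in this PR:

```python
import re


def parse_prompt_safety(output: str) -> bool:
    """Hypothetical sketch of a Nemotron reasoning safety parser.

    Strips any <think>...</think> reasoning trace (present in /think
    mode), then reads the "Prompt harm: harmful/unharmful" verdict.
    Returns True when the prompt is classified as harmful.
    """
    # Drop the optional reasoning trace so it cannot confuse the match.
    cleaned = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    match = re.search(r"Prompt harm:\s*(harmful|unharmful)", cleaned)
    if match is None:
        # Fail closed: treat unparseable output as harmful.
        return True
    return match.group(1) == "harmful"
```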

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@makeshn makeshn self-assigned this Jan 5, 2026

github-actions bot commented Jan 5, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1565

"sample_conversation_two_turns": self.config.sample_conversation,
# Make rails config available in templates for accessing settings like
# content_safety.reasoning.enabled, content_safety.multilingual.enabled, etc.
"config": self.config.rails.config,
Collaborator Author


@Pouyanpi @tgasser-nv this was needed so that we can toggle reasoning-on and reasoning-off modes in the Jinja2 prompt template depending on config.yml.

Collaborator


I think it is better to pass reasoning_enabled in actions.py from render_task_prompt (see this commit). Content safety should own its configuration, and this also keeps LLMTaskManager generic 👍🏻 Please let me know if you see any issues.
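For illustration, the suggested shape of that change might look something like the sketch below (a hypothetical helper and dict layout, not the code from the linked commit):

```python
def build_safety_check_context(config: dict, user_input: str) -> dict:
    """Build the render context for a content safety check prompt.

    Rather than exposing the entire rails config to every template,
    the content safety action passes only the flag it owns.
    """
    reasoning = config.get("content_safety", {}).get("reasoning", {})
    return {
        "user_input": user_input,
        # The template toggles /think vs /no_think on this flag alone.
        "reasoning_enabled": reasoning.get("enabled", False),
    }
```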


greptile-apps bot commented Jan 5, 2026

Greptile Summary

This PR adds a new Reasoning Guardrail Connector for the nvidia/Nemotron-Content-Safety-Reasoning-4B model, enabling content safety checks with optional reasoning traces.

Key changes:

  • Added two new output parsers (nemotron_reasoning_parse_prompt_safety and nemotron_reasoning_parse_response_safety) to parse the model's "harmful/unharmful" output format and strip <think> tags
  • Added ReasoningConfig to the configuration schema to allow toggling between reasoning on/off modes via rails.config.content_safety.reasoning.enabled
  • Exposed config.rails.config in Jinja2 template rendering context (in taskmanager.py:138) to allow templates to access configuration settings
  • Provided comprehensive example configuration demonstrating both input and output rails with the new model type
  • Added detailed tutorial documentation explaining deployment with vLLM and configuration options
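For reference, enabling the toggle in config.yml would presumably look like the following fragment (field names assumed from the rails.config.content_safety.reasoning.enabled path above):

```yaml
rails:
  config:
    content_safety:
      reasoning:
        enabled: true  # set to false for /no_think (faster, no reasoning trace)
```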

Issue found:
The example prompts.yml file uses an undefined variable reasoning_mode instead of config.content_safety.reasoning.enabled, which will cause the template rendering to fail or default incorrectly. The documentation shows the correct usage.
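As an illustration, the corrected template toggle referenced here would look roughly like this Jinja2 fragment (the exact template text lives in the PR's prompts.yml):

```jinja
{% if config.content_safety.reasoning.enabled %}/think{% else %}/no_think{% endif %}
```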

Confidence Score: 3/5

  • This PR has a critical bug in the example configuration that will prevent the reasoning mode toggle from working correctly
  • The core implementation (output parsers, config schema, taskmanager changes) is solid and well-structured. However, the example prompts.yml contains a template variable error that references an undefined variable (reasoning_mode) instead of the correct path (config.content_safety.reasoning.enabled). This inconsistency between the example and documentation will cause confusion and broken functionality for users following the example.
  • examples/configs/content_safety_reasoning/prompts.yml requires fixes to use the correct template variable path

Important Files Changed

Filename Overview
nemoguardrails/llm/output_parsers.py Added two new output parsers (nemotron_reasoning_parse_prompt_safety and nemotron_reasoning_parse_response_safety) with helper functions for parsing Nemotron Content Safety Reasoning model outputs
nemoguardrails/llm/taskmanager.py Registered new output parsers and exposed config.rails.config object in template rendering context to enable access to configuration settings
nemoguardrails/rails/llm/config.py Added ReasoningConfig model to support reasoning mode toggle for content safety models
examples/configs/content_safety_reasoning/prompts.yml Prompt templates for content safety checks - uses incorrect variable reasoning_mode instead of config.content_safety.reasoning.enabled

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant TaskManager
    participant ContentSafetyModel as Nemotron-4B
    participant MainLLM
    
    User->>LLMRails: generate(message)
    
    Note over LLMRails,TaskManager: Input Rail Flow
    LLMRails->>TaskManager: render_task_prompt("content_safety_check_input")
    TaskManager->>TaskManager: Apply template with config.content_safety.reasoning.enabled
    TaskManager-->>LLMRails: Prompt with /think or /no_think
    
    LLMRails->>ContentSafetyModel: Check input safety
    ContentSafetyModel-->>LLMRails: "Prompt harm: harmful/unharmful"
    
    LLMRails->>TaskManager: parse_task_output(output)
    TaskManager->>TaskManager: nemotron_reasoning_parse_prompt_safety()
    TaskManager->>TaskManager: Strip <think> tags, extract harm value
    TaskManager-->>LLMRails: [True] or [False]
    
    alt Input is safe
        LLMRails->>MainLLM: Generate response
        MainLLM-->>LLMRails: Response text
        
        Note over LLMRails,TaskManager: Output Rail Flow
        LLMRails->>TaskManager: render_task_prompt("content_safety_check_output")
        TaskManager-->>LLMRails: Prompt with /think or /no_think
        
        LLMRails->>ContentSafetyModel: Check output safety
        ContentSafetyModel-->>LLMRails: "Response Harm: harmful/unharmful"
        
        LLMRails->>TaskManager: parse_task_output(output)
        TaskManager->>TaskManager: nemotron_reasoning_parse_response_safety()
        TaskManager-->>LLMRails: [True] or [False]
        
        alt Output is safe
            LLMRails-->>User: Response
        else Output is unsafe
            LLMRails-->>User: "I'm sorry, I can't respond to that."
        end
    else Input is unsafe
        LLMRails-->>User: "I'm sorry, I can't respond to that."
    end


@greptile-apps greptile-apps bot left a comment


9 files reviewed, 2 comments



codecov bot commented Jan 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Pouyanpi Pouyanpi changed the title add reasoning guardrail connector feat: add reasoning guardrail connector Jan 6, 2026
@Pouyanpi Pouyanpi force-pushed the feature/reasoning_guardrail_connector branch from e36d9fc to 4c134db Compare January 8, 2026 10:19
makeshn and others added 5 commits January 8, 2026 11:22
Include the `reasoning_enabled` flag from the content safety config in
the context passed to both input and output content safety check
prompts.
This enables prompt templates to conditionally adjust behavior based on
the reasoning setting. Removes redundant config exposure in
LLMTaskManager's default context.
@Pouyanpi Pouyanpi force-pushed the feature/reasoning_guardrail_connector branch from 4c134db to 921f1d7 Compare January 8, 2026 10:24
Comment on lines +8 to +14
- type: content_safety_reasoning
engine: openai
parameters:
openai_api_base: http://localhost:8001/v1
model_name: nvidia/Nemotron-Content-Safety-Reasoning-4B
temperature: 0.6
top_p: 0.95
Collaborator


I think it still works if we change it to:

  - type: content_safety_reasoning
    engine: openai
    model: nvidia/Nemotron-Content-Safety-Reasoning-4B
    parameters:
      openai_api_base: http://localhost:8001/v1
      temperature: 0.6
      top_p: 0.95

Can you please check?

Member


We need to modify the example in the tutorial as well.

@Pouyanpi Pouyanpi self-requested a review January 8, 2026 10:30
Collaborator

@Pouyanpi Pouyanpi left a comment


Thanks a lot @makeshn , this is great 🚀

I just made a minor change based on my comment above and added tests. Let me know if you see any issues, we can revert/change.

Please look at the tests to ensure the assertions are aligned with your expectations. As I cannot run it myself, could you please run it and pass some inputs to verify that everything is still working? Thanks 👍🏻

@Pouyanpi Pouyanpi added the enhancement New feature or request label Jan 8, 2026
@Pouyanpi Pouyanpi added this to the 0.20.0 milestone Jan 8, 2026
Member

@trebedea trebedea left a comment


Looks good, some suggestions.

:maxdepth: 2
Check Harmful Content <nemotron-safety-guard-deployment>
Content Safety Reasoning <nemotron-content-safety-reasoning-deployment>
Member


Should we use "Respect Custom Safety and Dialogue Policies"? This would focus on the use case rather than how it works.

:link: nemotron-content-safety-reasoning-deployment
:link-type: doc

Deploy Nemotron-Content-Safety-Reasoning-4B for customizable content safety with reasoning traces.
Member


Similarly here, maybe we want to focus on the use case rather than how the model works. An alternative would be:

:::{grid-item-card} Respect Custom Safety and Dialogue Policies
:link: nemotron-content-safety-reasoning-deployment
:link-type: doc

Enforce customizable content safety and dialogue policies for your use-case by using our reasoning guard model Nemotron-Content-Safety-Reasoning-4B.


```text
Prompt harm: unharmful
Response harm: None
```
Member


Suggested change:
- Response harm: None
+ Response harm: unharmful

```text
<think>
The human user's request falls under S21 (Illegal Activity) as it seeks guidance on theft,
which is a criminal act. The AI assistant's response is a refusal and provides an ethical
```
Member


Here the input rail is failing, so it does not really make sense to have the output rail (including the bot response) in the example.


**Reasoning On (`/think`)**: Provides explicit reasoning traces for decisions. Better for complex or novel custom policies. Higher latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-1-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-on-mode).

**Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard safety checks. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).
Member


Suggested change:
- **Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard safety checks. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).
+ **Reasoning Off (`/no_think`)**: Fast classification without reasoning. Suitable for standard content safety policies. Lower latency. See [example](https://huggingface.co/nvidia/Nemotron-Content-Safety-Reasoning-4B#example-2-vanilla-safety-with-nemotron-content-safety-dataset-v2-taxonomy-reasoning-off-mode).


Comment on lines +16 to +21
# fmt: off
# isort: skip_file

# start-prerequisites

# end-prerequisites
Member

The reason will be displayed to describe this comment to others. Learn more.

Remove these and move the print_moderation_output function after the imports.


Labels

enhancement New feature or request

4 participants