fix: enforce disabled alert handlers for TICKscript handler methods #2885

Open
rjoost wants to merge 4 commits into master from for_testing_not_intended_for_merging_kapacitor

@rjoost rjoost commented Mar 12, 2026

Required checklist

NOTE: the branch name contains "not_intended_for_merging_", but this PR is intended for merging.

  • Sample config files updated — N/A (no config changes; uses existing --disable-alert-handlers CLI flag)
  • openapi swagger.yml updated — N/A (no API changes)
  • Signed CLA (if not already signed)

Description

The -disable-handlers exec flag only blocked exec handlers registered via the topic-handler REST API, but did not block .exec() calls directly on AlertNode in TICKscripts. This closes that bypass so all 25 TICKscript alert handler methods (.exec(), .log(), .tcp(), .email(), etc.) respect the disabled handlers flag. Tasks using a disabled handler now fail to start with a clear error message.

Context

Why: Mandiant finding CSA-H-01 identified that -disable-handlers exec could be bypassed via TICKscript .exec() calls, enabling remote code execution. See influxdata/edge#1044.

Value: Closes the RCE bypass — the disable-handlers flag now works uniformly across both code paths (REST API topic-handlers and TICKscript direct handlers).

Risk: Tasks that were previously running with .exec() (or other disabled handler methods) in TICKscripts will fail to start after this change if the handler is disabled. This is the intended secure behavior.

Affected areas (if applicable):

No user-visible CLI, API, or config changes. The existing --disable-alert-handlers flag now additionally applies to TICKscript handler methods. Users with disabled handlers who have TICKscripts using those handlers will see an error when enabling the task (e.g., "exec alert handler is disabled, TICKscripts using .exec() cannot be enabled").

Severity

Recommend upgrading immediately for deployments using -disable-handlers exec as a security control.

Note for reviewers:

Semantic commit type: Fix — security bug fix (RCE bypass via TICKscript .exec() when handler is disabled).

Files changed:

  • alert.go — Added disabled-handler checks for all 25 handler types in newAlertNode. Explicit TICKscript handlers return an error (fail closed). Global handler fallbacks are silently skipped.
  • task_master.go — Added DisabledHandlers field, initialized in NewTaskMaster, propagated in New().
  • server/server.go — Wired disabledAlertHandlers into TaskMaster at startup (1 line).
  • server/server_test.go — Added TestServer_AlertHandlers_disable_tickscript with test cases for exec, log, and tcp.
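
The behavior described for alert.go can be sketched as follows. This is a minimal illustration, not the code from the PR: the names `disabledSet`, `handlerSpec`, and `resolveHandlers` are hypothetical, and the error wording for handlers other than exec is assumed to follow the same pattern as the exec message quoted above.

```go
package main

import "fmt"

// disabledSet is a hypothetical stand-in for the DisabledHandlers field.
type disabledSet map[string]bool

// handlerSpec is a hypothetical description of one handler on an alert node.
type handlerSpec struct {
	Kind     string // e.g. "exec", "log", "tcp"
	Explicit bool   // true: called directly in the TICKscript; false: global fallback
}

// resolveHandlers returns the handler kinds that may be attached.
// A disabled handler requested explicitly is an error (fail closed);
// a disabled global fallback is silently skipped.
func resolveHandlers(disabled disabledSet, specs []handlerSpec) ([]string, error) {
	var active []string
	for _, s := range specs {
		if disabled[s.Kind] {
			if s.Explicit {
				return nil, fmt.Errorf(
					"%s alert handler is disabled, TICKscripts using .%s() cannot be enabled",
					s.Kind, s.Kind)
			}
			continue // disabled global fallback: skip without failing the task
		}
		active = append(active, s.Kind)
	}
	return active, nil
}

func main() {
	disabled := disabledSet{"exec": true}

	// Explicit .exec() in a TICKscript: the task must fail to start.
	_, err := resolveHandlers(disabled, []handlerSpec{{Kind: "exec", Explicit: true}})
	fmt.Println(err)

	// Disabled exec as a global fallback is dropped; the explicit log handler stays.
	active, _ := resolveHandlers(disabled, []handlerSpec{{Kind: "exec"}, {Kind: "log", Explicit: true}})
	fmt.Println(active)
}
```

Returning an error for explicit handlers keeps the flag usable as a security control, while skipping global fallbacks avoids breaking tasks that never asked for the disabled handler.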

@rjoost rjoost self-assigned this Mar 12, 2026
@rjoost rjoost changed the title from "handling disabled_handlers implementation" to "fix: enforce disabled alert handlers for TICKscript handler methods" Mar 12, 2026
@rjoost rjoost requested a review from srebhan March 12, 2026 04:28

rjoost commented Mar 12, 2026

@srebhan could you review this fix? It is intended to remediate the vulnerability described in the linked issue. Let me know if you have other thoughts, thanks.

@bednar bednar requested a review from karel-rehor March 12, 2026 09:22
@rjoost rjoost marked this pull request as ready for review March 12, 2026 16:09

rjoost commented Mar 12, 2026

The "-disable-handlers" flag, as documented here, does not list "exec" among the supported alert handlers.
This fix addresses that gap.

@bednar bednar left a comment

@rjoost thanks for your PR, in general looks good 👍

Please update the CHANGELOG.md.

@karel-rehor karel-rehor left a comment

After a first look at the changes in alert.go, everything looks good. However, I went ahead and ran a build and repeated the steps in the edge issue (https://github.com/influxdata/edge/issues/1044). I also tried a couple of additional steps, such as ./kapacitor list tasks and ./kapacitor show exec-test. The ./kapacitor list tasks command shows that the task is enabled.

$./kapacitor list tasks
ID        Type      Status    Executing Databases and Retention Policies
exec-test stream    enabled   false     ["telegraf"."autogen"]

In addition, ./kapacitor show exec-test reports that an error occurred, which implies the task should be disabled, but the Status field shows that it is enabled.

$ ./kapacitor show exec-test
ID: exec-test
Error: exec alert handler is disabled, TICKscripts using .exec() cannot be enabled
Template: 
Type: stream
Status: enabled
Executing: false
Created: 16 Mar 26 14:50 CET
Modified: 16 Mar 26 14:50 CET
LastEnabled: 16 Mar 26 14:50 CET
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
stream
    |from()
        .measurement('cpu')
    |alert()
        .crit(lambda: TRUE)
        .exec('/bin/touch', '/tmp/exec_handler_works')

DOT:
digraph exec-test {
stream0 -> from1;
from1 -> alert2;
}

Following up with curl -s -XPOST 'http://localhost:9092/kapacitor/v1/write?db=telegraf&rp=autogen' --data-binary 'cpu,host=test value=1' shows that the file /tmp/exec_handler_works was not created or updated by the touch command. So in practice the handler is disabled, but the state information is not correct.

The status information of the task object should reflect the error information. This requires some further investigation.

In addition, after creating the exec-test task over curl the response was an error message.

{
    "error": "exec alert handler is disabled, TICKscripts using .exec() cannot be enabled",
    "message": "exec alert handler is disabled, TICKscripts using .exec() cannot be enabled"
}

On successful task creation the json response describes the newly created task...

{
    "link": {
        "rel": "self",
        "href": "/kapacitor/v1/tasks/exec-test2"
    },
    "id": "exec-test2",
    "template-id": "",
    "type": "stream",
    "dbrps": [
        {
            "db": "telegraf",
            "rp": "autogen"
        }
    ],
    "script": "stream\n    |from()\n        .measurement('cpu')\n    |alert()\n        .crit(lambda: TRUE)\n        .exec('/bin/touch', '/tmp/exec_handler_works')\n",
    "vars": {},
    "dot": "digraph exec-test2 {\ngraph [throughput=\"0.00 points/s\"];\n\nstream0 [avg_exec_time_ns=\"0s\" errors=\"0\" working_cardinality=\"0\" ];\nstream0 -\u003e from1 [processed=\"0\"];\n\nfrom1 [avg_exec_time_ns=\"0s\" errors=\"0\" working_cardinality=\"0\" ];\nfrom1 -\u003e alert2 [processed=\"0\"];\n\nalert2 [alerts_inhibited=\"0\" alerts_triggered=\"0\" avg_exec_time_ns=\"0s\" crits_triggered=\"0\" errors=\"0\" infos_triggered=\"0\" oks_triggered=\"0\" warns_triggered=\"0\" working_cardinality=\"0\" ];\n}",
    "status": "enabled",
    "executing": true,
    "error": "",
    "stats": {
        "task-stats": {
...

This includes an "error": field.

  1. The current error response can give the impression that the task was not created, when it was.
  2. Perhaps this error information should be added to the error field of the newly created task.
  3. In addition the initial "status" should be set to "disabled".

I know Kapacitor is a legacy product in maintenance-only mode, so these remarks could be classified as SHOULD but not necessarily MUST. The security criteria appear to be met. The remainder is a question of budgeting and time allotment.

rjoost commented Mar 16, 2026

@karel-rehor thanks for the thorough testing — really appreciate you going beyond the diff and actually running the build against the edge issue steps.

I see what you mean about the task status UX being misleading and that the root cause seems to be the existing create-then-start pattern in task_store/service.go — the task is saved to the store first, then startTask is called. When startTask fails (now due to a disabled handler), the task persists in the store with status: enabled even though it's not executing.

This is a pre-existing architectural issue that's now more visible because disabled-handler errors are a deliberate, expected failure mode rather than a transient one. I'd prefer to address the status consistency improvement in a follow-up PR rather than mixing task lifecycle changes into this security fix. Specifically:

  1. Set task status to disabled when startTask fails due to a disabled handler
  2. Return the task object in the API response (with the error in the error field) instead of a bare HTTP 500

The security criteria are met as-is — the exec is blocked, the command does not run. OK to merge this and track the UX fix separately?

rjoost commented Mar 16, 2026

I opened PR #2886, which addresses the UX issue as a separate fix, since it is not directly related to the security fix.
Please review both and let me know.

@rjoost rjoost requested review from bednar and karel-rehor March 16, 2026 22:59

rjoost commented Mar 16, 2026

I also ran a docker test and it passed:

docker build -f Dockerfile_build_ubuntu64 -t kapacitor-build .
docker run --rm -v $(pwd):/kapacitor kapacitor-build test --junit
docker run --rm --platform linux/amd64 -v "$(pwd)":/kapacitor --entrypoint bash kapacitor-build -c "cd /kapacitor && go test -run 'TestServer_AlertHandlers_disable' -count=1 -v -timeout 120s ./server/"
All tests pass:

  - TestServer_AlertHandlers_disable/alerta-0 — PASS
  - TestServer_AlertHandlers_disable_tickscript/exec — PASS
  - TestServer_AlertHandlers_disable_tickscript/log — PASS
  - TestServer_AlertHandlers_disable_tickscript/tcp — PASS

Since both @karel-rehor's testing and this run passed, what is the next step to get this approved and merged?

@karel-rehor karel-rehor left a comment

I'll look at fixes for problems with managing Task state in #2886.

Apart from that...

Looks good to me 🚴 🏁

@rjoost rjoost closed this Mar 18, 2026
@rjoost rjoost deleted the for_testing_not_intended_for_merging_kapacitor branch March 18, 2026 00:09
@rjoost rjoost restored the for_testing_not_intended_for_merging_kapacitor branch March 18, 2026 00:11
@rjoost rjoost reopened this Mar 18, 2026

rjoost commented Mar 18, 2026

I had opened PR #2886 for the UX status update, but I closed it and integrated the change into this PR.

@karel-rehor - the fix for misleading UX status is integrated

@bednar - CHANGELOG.md is updated.

All CircleCI and molecule statuses are successful.

Please review and approve or give additional feedback.

@rjoost rjoost requested a review from karel-rehor March 18, 2026 00:24