fix: enforce disabled alert handlers for TICKscript handler methods#2885
fix: enforce disabled alert handlers for TICKscript handler methods#2885
Conversation
|
@srebhan perhaps this can be reviewed for the fix for this issue to help remediate the vulnerability. Let me know if you have other thoughts, thanks |
|
The flag, "-disable-handlers," as documented here, does not support "exec" in the list of alert-handlers. |
karel-rehor
left a comment
There was a problem hiding this comment.
After taking a first look at the changes in alert.go everything looks good. However I've gone ahead and run a build and repeated the steps in the edge issue (https://github.com/influxdata/edge/issues/1044). I've tried a couple of additional steps, such as $./kapacitor list tasks and ./kapacitor show task exec-test. The ./kapacitor list tasks command, shows the task is enabled.
$./kapacitor list tasks
ID Type Status Executing Databases and Retention Policies
exec-test stream enabled false ["telegraf"."autogen"]
In addition ./kapacitor show task exec-test, shows an error occurred and that the task should be disabled. But the Status field shows that it is enabled.
$ ./kapacitor show exec-test
ID: exec-test
Error: exec alert handler is disabled, TICKscripts using .exec() cannot be enabled
Template:
Type: stream
Status: enabled
Executing: false
Created: 16 Mar 26 14:50 CET
Modified: 16 Mar 26 14:50 CET
LastEnabled: 16 Mar 26 14:50 CET
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
stream
|from()
.measurement('cpu')
|alert()
.crit(lambda: TRUE)
.exec('/bin/touch', '/tmp/exec_handler_works')
DOT:
digraph exec-test {
stream0 -> from1;
from1 -> alert2;
}
Following up with curl -s -XPOST 'http://localhost:9092/kapacitor/v1/write?db=telegraf&rp=autogen' \ --data-binary 'cpu,host=test value=1' shows that the file /tmp/exec_handler_works was not created or updated with the touch command. So in practice it is disabled, but the state information is not correct.
The status information of the task object should reflect the error information. This requires some further investigation.
In addition, after creating the exec-test task over curl the response was an error message.
{
"error": "exec alert handler is disabled, TICKscripts using .exec() cannot be enabled",
"message": "exec alert handler is disabled, TICKscripts using .exec() cannot be enabled"
}
On successful task creation the json response describes the newly created task...
{
"link": {
"rel": "self",
"href": "/kapacitor/v1/tasks/exec-test2"
},
"id": "exec-test2",
"template-id": "",
"type": "stream",
"dbrps": [
{
"db": "telegraf",
"rp": "autogen"
}
],
"script": "stream\n |from()\n .measurement('cpu')\n |alert()\n .crit(lambda: TRUE)\n .exec('/bin/touch', '/tmp/exec_handler_works')\n",
"vars": {},
"dot": "digraph exec-test2 {\ngraph [throughput=\"0.00 points/s\"];\n\nstream0 [avg_exec_time_ns=\"0s\" errors=\"0\" working_cardinality=\"0\" ];\nstream0 -\u003e from1 [processed=\"0\"];\n\nfrom1 [avg_exec_time_ns=\"0s\" errors=\"0\" working_cardinality=\"0\" ];\nfrom1 -\u003e alert2 [processed=\"0\"];\n\nalert2 [alerts_inhibited=\"0\" alerts_triggered=\"0\" avg_exec_time_ns=\"0s\" crits_triggered=\"0\" errors=\"0\" infos_triggered=\"0\" oks_triggered=\"0\" warns_triggered=\"0\" working_cardinality=\"0\" ];\n}",
"status": "enabled",
"executing": true,
"error": "",
"stats": {
"task-stats": {
...
This includes an "error": field.
- The current error response can cause the impression that the task was not created, when it was.
- Perhaps this error information should be added to the error field of the newly created task.
- In addition the initial "status" should be set to "disabled".
I know Kapacitor is a legacy product in maintenance only mode, so these remarks could be classified as SHOULD but not necessarily MUST. The security criteria appear to be met. The remainder is a question of budgeting and time allotment.
|
@karel-rehor thanks for the thorough testing — really appreciate you going beyond the diff and actually running the build against the edge issue steps. I see what you mean about the task status UX being misleading and that the root cause seems to be the existing create-then-start pattern in This is a pre-existing architectural issue that's now more visible because disabled-handler errors are a deliberate, expected failure mode rather than a transient one. I'd prefer to address the status consistency improvement in a follow-up PR rather than mixing task lifecycle changes into this security fix. Specifically:
The security criteria are met as-is — the exec is blocked, the command does not run. OK to merge this and track the UX fix separately? |
|
I opened pr #2886 addresses UX as a separate fix since it is not directly related to the security criteria fix. |
|
I also ran a docker test and it passed: Since testing passed both @karel-rehor and this one, what is the next step to get this approved and merged? |
karel-rehor
left a comment
There was a problem hiding this comment.
I'll look at fixes for problems with managing Task state in #2886.
Apart from that...
Looks good to me 🚴 🏁
|
I had opened pr#2886 for UX status update but closed it and integrated it into this pr. @karel-rehor - the fix for misleading UX status is integrated @bednar - CHANGELOG.md is updated. All circelci and molecule statuses are successful. Please review and approve or give additional feedback. |
Required checklist
NOTE: the branch name has "not_intended_for_merging_" but it is intended for merging.
--disable-alert-handlersCLI flag)Description
The
-disable-handlers execflag only blocked exec handlers registered via the topic-handler REST API, but did not block.exec()calls directly on AlertNode in TICKscripts. This closes that bypass so all 25 TICKscript alert handler methods (.exec(),.log(),.tcp(),.email(), etc.) respect the disabled handlers flag. Tasks using a disabled handler now fail to start with a clear error message.Context
Why: Mandiant finding CSA-H-01 identified that
-disable-handlers execcould be bypassed via TICKscript.exec()calls, enabling remote code execution. See influxdata/edge#1044.Value: Closes the RCE bypass — the disable-handlers flag now works uniformly across both code paths (REST API topic-handlers and TICKscript direct handlers).
Risk: Tasks that were previously running with
.exec()(or other disabled handler methods) in TICKscripts will fail to start after this change if the handler is disabled. This is the intended secure behavior.Affected areas (if applicable):
No user-visible CLI, API, or config changes. The existing
--disable-alert-handlersflag now additionally applies to TICKscript handler methods. Users with disabled handlers who have TICKscripts using those handlers will see an error when enabling the task (e.g.,"exec alert handler is disabled, TICKscripts using .exec() cannot be enabled").Severity
Recommend upgrading immediately for deployments using
-disable-handlers execas a security control.Note for reviewers:
Semantic commit type: Fix — security bug fix (RCE bypass via TICKscript
.exec()when handler is disabled).Files changed:
alert.go— Added disabled-handler checks for all 25 handler types innewAlertNode. Explicit TICKscript handlers return an error (fail closed). Global handler fallbacks are silently skipped.task_master.go— AddedDisabledHandlersfield, initialized inNewTaskMaster, propagated inNew().server/server.go— WireddisabledAlertHandlersintoTaskMasterat startup (1 line).server/server_test.go— AddedTestServer_AlertHandlers_disable_tickscriptwith test cases for exec, log, and tcp.