Merged
Changes from 4 commits
11 changes: 11 additions & 0 deletions docs/src/content/docs/pattern-syntax/matching-behavior.md
@@ -81,6 +81,17 @@ Tokens that do not start with `-` are matched **in order**:
| `git push origin main` | Matches |
| `git push main origin` | Does not match |

## Backslash Escapes

A backslash (`\`) in a pattern escapes the following character. During matching, the backslash is stripped and the remaining character is compared literally. This is useful for characters that have special meaning in shells, such as `;`:

```yaml
# \; in the pattern matches ; in the command
- "find * -exec <cmd> \\;|+"
```

The shell resolves `\;` to `;` before runok sees the command, so the pattern's `\;`, once its backslash is stripped, matches the command's `;`.
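
The strip-then-compare step described above can be illustrated with a minimal sketch. It assumes the pattern token contains no glob `*`, and the names `strip_escapes` and `escaped_literal_matches` are hypothetical, chosen for illustration rather than taken from runok's API.

```rust
/// Remove one level of backslash escaping: `\;` becomes `;`, `\\` becomes `\`.
/// A trailing lone backslash is dropped (an assumption of this sketch).
fn strip_escapes(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(ch) = chars.next() {
        if ch == '\\' {
            // Keep the escaped character, discard the backslash itself.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(ch);
        }
    }
    out
}

/// Compare a pattern token with a command token after unescaping the pattern.
fn escaped_literal_matches(pattern: &str, token: &str) -> bool {
    strip_escapes(pattern) == token
}
```

With these definitions, `escaped_literal_matches(r"\;", ";")` holds, while `escaped_literal_matches(r"\;", "x")` does not.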

## Combined Short Flags

Combined short flags like `-am` are **not** split into individual flags — they are matched as a single token, exactly as written:
1 change: 1 addition & 0 deletions docs/src/content/docs/pattern-syntax/overview.md
@@ -21,6 +21,7 @@ Patterns are parsed exactly as written, with no hidden rewriting or implicit tra
| [Optional group](/pattern-syntax/optional-groups/) | `[-f]`, `[-X POST]` | Matches with or without the group |
| [Flag with value](/pattern-syntax/matching-behavior/#flag-schema-inference) | `-X\|--request POST` | A flag-value pair matched in any order |
| [Placeholder](/pattern-syntax/placeholders/) | `<cmd>`, `<opts>`, `<path:...>` | Special tokens in `<...>` with various behaviors (see below) |
| Backslash escape | `\;` | Literal match after removing the backslash |
| Quoted literal | `"WIP*"`, `'hello'` | Exact match without glob expansion |
| [Multi-word alternation](/pattern-syntax/alternation/#multi-word-alternation) | `"npx prettier"\|prettier` | Alternatives that include multi-word commands |

7 changes: 7 additions & 0 deletions docs/src/content/docs/pattern-syntax/placeholders.md
@@ -161,8 +161,15 @@ definitions:

# Handles: xargs [flags...] command [args...]
- 'xargs <opts> <cmd>'

# Handles: find [args...] -exec|-execdir|-ok|-okdir command [args...] \;|+
- "find * -exec|-execdir|-ok|-okdir <cmd> \\;|+"
```

:::note
In the `find` wrapper example, `\\;` is how a YAML double-quoted string spells `\;`: YAML decodes `\\` to a single backslash. The pattern parser preserves that backslash (`\;`), and the matcher strips it during comparison so that it matches the shell-unescaped `;` in the actual command.
:::
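
The two layers in the note can be traced in a short sketch. Both helper names are hypothetical. The first models only the `\\` escape of YAML double-quoted strings rather than a full YAML parser, and the second models the matcher's view of a single token that starts with one backslash.

```rust
/// Model the one YAML double-quoted escape relevant here: `\\` decodes to `\`.
/// Other YAML escapes are deliberately ignored in this sketch.
fn yaml_double_quoted_to_pattern(body: &str) -> String {
    // r"\\" is two backslash characters; "\\" is one.
    body.replace(r"\\", "\\")
}

/// Model what the matcher compares against the command token: the pattern
/// token with its single leading backslash removed, if it has one.
fn matcher_view(pattern_token: &str) -> &str {
    pattern_token.strip_prefix('\\').unwrap_or(pattern_token)
}
```

Under these assumptions, the YAML text `\\;|+` yields the pattern `\;|+`, whose alternatives `\;` and `+` are compared as `;` and `+`.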

## Restrictions

- `<cmd>` captures one or more tokens; it tries all possible split points to find a valid wrapped command
125 changes: 124 additions & 1 deletion src/rules/pattern_matcher.rs
@@ -619,13 +619,117 @@ fn optional_flags_absent(optional_tokens: &[PatternToken], cmd_tokens: &[&str])
/// where `*` matches zero or more arbitrary characters. Otherwise, an
/// exact string comparison is performed.
fn literal_matches(pattern: &str, token: &str) -> bool {
if pattern.contains('*') {
if pattern.contains('\\') {
// Strip backslash escapes so that pattern `\;` matches command token `;`.
// The pattern lexer preserves backslash-escaped characters as-is (e.g. `\;`),
// while the command tokenizer resolves them (e.g. `\;` -> `;`).
// Uses sentinel-based matching so that `\*` is treated as a literal `*`,
// not a glob, even when the same token also contains a bare `*`.
unescape_and_match(pattern, token)
} else if pattern.contains('*') {
glob_match(pattern, token)
} else {
pattern == token
}
}

/// Remove backslash escapes and perform matching that correctly distinguishes
/// escaped characters from glob wildcards.
///
/// `\;` → matches `;`, `\*` → matches literal `*` (not a glob), `\\` → matches `\`.
///
/// When the pattern contains both `\*` (literal) and bare `*` (glob), the
/// escaped `*` characters are temporarily replaced with a sentinel (`\x00`)
/// during glob expansion so they are not treated as wildcards.
fn unescape_and_match(pattern: &str, token: &str) -> bool {
let mut unescaped = String::with_capacity(pattern.len());
let mut has_unescaped_glob = false;
let mut has_escaped_star = false;
let mut chars = pattern.chars();
while let Some(ch) = chars.next() {
if ch == '\\' {
if let Some(next) = chars.next() {
if next == '*' {
// Use a sentinel for escaped `*` so glob matching won't treat it
// as a wildcard. It is turned back into `*` when compared with the token.
unescaped.push('\x00');
has_escaped_star = true;
} else {
unescaped.push(next);
}
}
} else {
if ch == '*' {
has_unescaped_glob = true;
}
unescaped.push(ch);
}
}
if has_unescaped_glob {
// Perform glob matching. Escaped `*` characters are stored as `\x00`
// sentinels, so splitting the pattern on `*` only sees real wildcards.
// The token is a real command string and is assumed not to contain
// `\x00`, so each literal segment has its sentinels turned back into
// `*` before that segment is compared with the token.
glob_match_with_sentinel(&unescaped, token)
} else {
// No glob — restore sentinels to `*` and do exact comparison.
if has_escaped_star {
let plain = unescaped.replace('\x00', "*");
plain == token
} else {
unescaped == token
}
}
}

/// Glob matching that treats `\x00` in pattern literal segments as a literal `*`.
///
/// Splits the pattern on `*` (the real glob wildcards). Each resulting segment
/// may contain `\x00` which represents a literal `*` from `\*` in the original
/// pattern. When comparing segments against the text, `\x00` matches `*`.
fn glob_match_with_sentinel(pattern: &str, text: &str) -> bool {
let parts: Vec<&str> = pattern.split('*').collect();

if parts.iter().all(|p| p.is_empty()) {
return true;
}

let mut pos = 0;

for (i, part) in parts.iter().enumerate() {
if part.is_empty() {
continue;
}
// Replace sentinel back to `*` for comparison against the actual text
let segment = part.replace('\x00', "*");
if i == 0 {
if !text.starts_with(&segment) {
return false;
}
pos = segment.len();
} else if i == parts.len() - 1 {
if !text[pos..].ends_with(&segment) {
return false;
}
pos = text.len();
} else {
match text[pos..].find(&*segment) {
Some(offset) => pos += offset + segment.len(),
None => return false,
}
}
}

if !pattern.ends_with('*') {
return pos == text.len();
}

true
}
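
For comparison, the same escape-versus-glob distinction can be sketched without a sentinel character by parsing the pattern into typed pieces first. This is an alternative design for illustration, not the code in this PR: `Piece`, `parse_pieces`, `pieces_match`, and `sketch_matches` are hypothetical names, and the backtracking matcher is exponential in the worst case, which is acceptable only for short tokens.

```rust
/// One piece of a parsed pattern: literal text or a bare `*` wildcard.
#[derive(Debug, PartialEq)]
enum Piece {
    Literal(String),
    Star,
}

/// Split a pattern into pieces, resolving backslash escapes up front so that
/// `\*` lands inside a `Literal` and only a bare `*` becomes `Star`.
fn parse_pieces(pattern: &str) -> Vec<Piece> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    let mut chars = pattern.chars();
    while let Some(ch) = chars.next() {
        match ch {
            '\\' => {
                // Escaped character: always literal, even if it is `*`.
                if let Some(next) = chars.next() {
                    current.push(next);
                }
            }
            '*' => {
                if !current.is_empty() {
                    pieces.push(Piece::Literal(std::mem::take(&mut current)));
                }
                pieces.push(Piece::Star);
            }
            _ => current.push(ch),
        }
    }
    if !current.is_empty() {
        pieces.push(Piece::Literal(current));
    }
    pieces
}

/// Match `text` against the pieces; a `Star` tries every possible split point.
fn pieces_match(pieces: &[Piece], text: &str) -> bool {
    match pieces.split_first() {
        None => text.is_empty(),
        Some((Piece::Literal(lit), rest)) => text
            .strip_prefix(lit.as_str())
            .map_or(false, |tail| pieces_match(rest, tail)),
        Some((Piece::Star, rest)) => text
            .char_indices()
            .map(|(i, _)| i)
            .chain(std::iter::once(text.len()))
            .any(|i| pieces_match(rest, &text[i..])),
    }
}

/// Convenience wrapper: parse the pattern, then match the token.
fn sketch_matches(pattern: &str, token: &str) -> bool {
    pieces_match(&parse_pieces(pattern), token)
}
```

Because escapes are resolved during parsing, no placeholder character has to be assumed absent from the command token. The sketch agrees with the `literal_matches` test cases added in this PR, though it is not claimed to be equivalent on every input.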

/// Simple glob matching where `*` matches zero or more arbitrary characters.
///
/// Only supports `*` as a wildcard; no other glob syntax (e.g. `?`, `[...]`)
@@ -1619,4 +1723,23 @@ mod tests {
"pattern {pattern_str:?} vs command {command_str:?}",
);
}

// === literal_matches: backslash escape ===

#[rstest]
#[case::backslash_semicolon(r"\;", ";", true)]
#[case::backslash_semicolon_no_match(r"\;", "x", false)]
#[case::backslash_star_literal(r"\*", "*", true)]
#[case::backslash_star_not_glob(r"\*", "foo", false)]
#[case::escaped_and_bare_glob(r"\*.*", "*.foo", true)]
#[case::escaped_and_bare_glob_no_match(r"\*.*", "foo.bar", false)]
#[case::no_backslash("foo", "foo", true)]
#[case::plain_glob("fo*", "foobar", true)]
fn literal_matches_cases(#[case] pattern: &str, #[case] token: &str, #[case] expected: bool) {
assert_eq!(
literal_matches(pattern, token),
expected,
"literal_matches({pattern:?}, {token:?})",
);
}
}
45 changes: 19 additions & 26 deletions src/rules/pattern_parser.rs
@@ -187,8 +187,7 @@ fn build_pattern_tokens(
// alternation so that flag-with-value and order-independent
// matching work the same as for `-X|--request` style patterns.
if let Some(&(j, next)) = iter.peek() {
if should_consume_as_value_strict(next, j + 1 < lex_tokens.len(), inside_group)
{
if should_consume_as_value(next, j + 1 < lex_tokens.len(), inside_group) {
let (_, next_token) = iter.next().ok_or(
PatternParseError::InvalidSyntax("unexpected end of tokens".into()),
)?;
@@ -357,26 +356,12 @@ fn should_consume_as_value(next: &LexToken, has_more_after: bool, inside_group:
LexToken::Literal(s) if s == "]" => false,
LexToken::Literal(s) if is_flag(s) => false,
LexToken::Alternation(alts) if alts.iter().any(|a| is_flag(a)) => false,
LexToken::Placeholder(_) => false,
LexToken::Wildcard => inside_group || has_more_after,
_ => true,
}
}
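
The effect of moving the placeholder check into `should_consume_as_value` can be seen in a reduced, self-contained sketch. `Tok` is a hypothetical stand-in for the real `LexToken`, which likely has more variants, and only the match arms visible in this hunk are reproduced. Any arms above the hunk are unknown and omitted.

```rust
/// Simplified stand-in for the lexer's token type (hypothetical).
enum Tok {
    Literal(String),
    Alternation(Vec<String>),
    Placeholder(String),
    Wildcard,
}

/// Check if a string looks like a flag (starts with `-`).
fn is_flag(s: &str) -> bool {
    s.starts_with('-')
}

/// Decide whether `next` may be consumed as the value of the preceding flag.
/// Placeholders are now rejected for every caller, not only for bare flags.
fn consume_as_value(next: &Tok, has_more_after: bool, inside_group: bool) -> bool {
    match next {
        Tok::Literal(s) if s == "]" => false,
        Tok::Literal(s) if is_flag(s) => false,
        Tok::Alternation(alts) if alts.iter().any(|a| is_flag(a)) => false,
        Tok::Placeholder(_) => false,
        Tok::Wildcard => inside_group || has_more_after,
        _ => true,
    }
}
```

In this sketch a placeholder such as `<cmd>` after `-exec|-execdir` is never taken as a flag value, which is what lets the `find` wrapper pattern parse as an alternation followed by a placeholder.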

/// Like [`should_consume_as_value`], but stricter: also refuses to consume
/// placeholder tokens as flag values. Used for bare flags (e.g. `-c`) where
/// the flag is written without alternation syntax and the next token may be a
/// wrapper placeholder (e.g. `<cmd>`) rather than a flag value.
fn should_consume_as_value_strict(
next: &LexToken,
has_more_after: bool,
inside_group: bool,
) -> bool {
match next {
LexToken::Placeholder(_) => false,
_ => should_consume_as_value(next, has_more_after, inside_group),
}
}

/// Check if a string looks like a flag (starts with `-`).
fn is_flag(s: &str) -> bool {
s.starts_with('-')
@@ -496,17 +481,13 @@ mod tests {
PatternToken::Alternation(vec!["-f".into(), "--force".into()]),
PatternToken::Wildcard,
])]
#[case::placeholder_value("cmd -o|--option <cmd>", "cmd", vec![
PatternToken::FlagWithValue {
aliases: vec!["-o".into(), "--option".into()],
value: Box::new(PatternToken::Placeholder("cmd".into())),
},
#[case::placeholder_not_consumed_as_flag_value("cmd -o|--option <cmd>", "cmd", vec![
PatternToken::Alternation(vec!["-o".into(), "--option".into()]),
PatternToken::Placeholder("cmd".into()),
])]
#[case::path_ref_value("cmd -c|--config <path:config>", "cmd", vec![
PatternToken::FlagWithValue {
aliases: vec!["-c".into(), "--config".into()],
value: Box::new(PatternToken::PathRef("config".into())),
},
PatternToken::Alternation(vec!["-c".into(), "--config".into()]),
PatternToken::PathRef("config".into()),
])]
fn parse_flag_with_value(
#[case] input: &str,
@@ -641,6 +622,18 @@ mod tests {
#[case::path_ref("cat <path:sensitive>", "cat", vec![
PatternToken::PathRef("sensitive".into()),
])]
#[case::flag_alternation_then_placeholder(
r"find * -exec|-execdir|-ok|-okdir <cmd> \;|+",
"find",
vec![
PatternToken::Wildcard,
PatternToken::Alternation(vec![
"-exec".into(), "-execdir".into(), "-ok".into(), "-okdir".into(),
]),
PatternToken::Placeholder("cmd".into()),
PatternToken::Alternation(vec![r"\;".into(), "+".into()]),
],
)]
fn parse_placeholder(
#[case] input: &str,
#[case] expected_command: &str,
32 changes: 32 additions & 0 deletions tests/integration/wrapper_recursive_evaluation.rs
@@ -757,3 +757,35 @@ fn wrapper_compound_with_sandbox(empty_context: EvalContext) {
assert_eq!(result.action, Action::Allow);
assert_eq!(result.sandbox_preset.as_deref(), Some("py_sandbox"));
}

// ========================================
// find -exec/-execdir wrapper: flag alternation followed by <cmd>
// placeholder is parsed correctly, enabling recursive evaluation
// ========================================

#[rstest]
#[case::find_exec_rm_denied_semicolon("find . -exec rm -rf / \\;", assert_deny as ActionAssertion)]
#[case::find_exec_rm_denied_plus("find . -exec rm -rf / +", assert_deny as ActionAssertion)]
#[case::find_execdir_echo_allowed("find . -execdir echo hello +", assert_allow as ActionAssertion)]
#[case::find_ok_rm_denied("find /tmp -ok rm -rf / \\;", assert_deny as ActionAssertion)]
#[case::find_okdir_ls_allowed("find . -okdir ls -la +", assert_allow as ActionAssertion)]
#[case::find_exec_unmatched_default("find . -exec hg status +", assert_default as ActionAssertion)]
fn find_exec_wrapper_evaluates_inner(
#[case] command: &str,
#[case] expected: ActionAssertion,
empty_context: EvalContext,
) {
let config = parse_config(indoc! {"
rules:
- deny: 'rm -rf *'
- allow: 'echo *'
- allow: 'ls *'
definitions:
wrappers:
- 'find * -exec|-execdir|-ok|-okdir <cmd> \\;|+'
"})
.unwrap();

let result = evaluate_command(&config, command, &empty_context).unwrap();
expected(&result.action);
}