85 changes: 60 additions & 25 deletions docs/src/en/filter/filter.md
@@ -2,47 +2,82 @@
outline: deep
---
# Built-in Filter Rules
RedisShake provides various built-in filter rules that users can choose from according to their needs.

## Filtering Keys
RedisShake supports filtering by key name, key name prefixes, and suffixes. You can set the following options in the configuration file, for example:
RedisShake evaluates filter rules after commands are parsed but before anything is sent to the destination. The filter therefore controls which commands ever leave RedisShake, and only the commands that pass this stage are eligible for further processing by the optional [function](./function.md) hook.

## Where filtering happens

```
source reader --> filter rules --> (optional Lua function) --> writer / target
```

* Commands enter the filter after RedisShake has parsed the RESP payload from the reader. At this point the request is already considered valid and would be forwarded if no filters were configured.
* Filtering happens before any other transformation stage, so blocked commands never reach the optional Lua function or the writer.
* The stage operates on the same command representation that writers use, which keeps behaviour consistent for all readers.

## How Filter Evaluation Works

1. **Block rules run first.** If a key, database, command, or command group matches a `block_*` rule, the entire entry is dropped immediately.
2. **Allow lists are optional.** When no `allow_*` rule is configured for a category, everything is permitted by default. As soon as you define an allow list, only the explicitly listed items will pass.
3. **Multi-key consistency.** Commands with multiple keys (for example, `MSET`) must either pass for all keys or the entry is discarded. RedisShake also emits logs when a mixed result is detected to help you troubleshoot your patterns.

Combining allow and block lists lets you quickly express exceptions such as “allow user keys except temporary cache variants.” Block rules take precedence, so avoid listing the same pattern in both allow and block lists.
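For example, the exception above can be expressed as (the prefixes and suffixes are illustrative):

```toml
[filter]
allow_key_prefix = ["user:"]   # keep user data...
block_key_suffix = [":cache"]  # ...but drop temporary cache variants such as "user:42:cache"
```

Because block rules run first, `user:42:cache` is dropped even though it also matches the allow prefix.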

## Key Filtering

RedisShake supports filtering by key names, prefixes, suffixes, and regular expressions. For example:

```toml
[filter]
allow_keys = ["user:1001", "product:2001"] # allowed key names
allow_key_prefix = ["user:", "product:"] # allowed key name prefixes
allow_key_suffix = [":active", ":valid"] # allowed key name suffixes
allow_key_regex = [":\\d{11}:"] # allowed key name regex, 11-digit mobile phone number
block_keys = ["temp:1001", "cache:2001"] # blocked key names
block_key_prefix = ["temp:", "cache:"] # blocked key name prefixes
block_key_suffix = [":tmp", ":old"] # blocked key name suffixes
block_key_regex = [":test:\\d{11}:"] # blocked key name regex, 11-digit mobile phone number with "test" prefix
allow_keys = ["user:1001", "product:2001"] # allow-listed key names
allow_key_prefix = ["user:", "product:"] # allow-listed key prefixes
allow_key_suffix = [":active", ":valid"] # allow-listed key suffixes
allow_key_regex = [":\\d{11}:"] # allow-listed key regex (11-digit phone numbers)
block_keys = ["temp:1001", "cache:2001"] # block-listed key names
block_key_prefix = ["temp:", "cache:"] # block-listed key prefixes
block_key_suffix = [":tmp", ":old"] # block-listed key suffixes
block_key_regex = [":test:\\d{11}:"] # block-listed key regex with "test" prefix
```
If these options are not set, all keys are allowed by default.

## Filtering Databases
You can specify allowed or blocked database numbers, for example:
Regular expressions follow Go’s syntax. Escape backslashes carefully when writing inline TOML strings. Regex support allows complex tenant-isolation scenarios, such as filtering phone numbers or shard identifiers.
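As a sketch, the two TOML string forms below produce the same pattern:

```toml
[filter]
# Basic (double-quoted) strings process escape sequences, so the regex
# backslash must be doubled:
allow_key_regex = [":\\d{11}:"]
# Literal (single-quoted) strings pass characters through unchanged:
block_key_regex = [':test:\d{11}:']
```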

## Database Filtering

Limit synchronization to specific logical databases or skip known noisy ones:

```toml
[filter]
allow_db = [0, 1, 2]
block_db = [3, 4, 5]
```
If these options are not set, all databases are allowed by default.

## Filtering Commands
RedisShake allows you to filter specific Redis commands, for example:
If neither `allow_db` nor `block_db` is set, all databases are synchronized.

## Command and Command-Group Filtering

Restrict traffic by command name or by Redis command group. This is useful when the destination lacks support for scripting or cluster-administration commands.

```toml
[filter]
allow_command = ["GET", "SET"]
block_command = ["DEL", "FLUSHDB"]
```

## Filtering Command Groups

You can also filter by command groups. Available command groups include:
SERVER, STRING, CLUSTER, CONNECTION, BITMAP, LIST, SORTED_SET, GENERIC, TRANSACTIONS, SCRIPTING, TAIRHASH, TAIRSTRING, TAIRZSET, GEO, HASH, HYPERLOGLOG, PUBSUB, SET, SENTINEL, STREAM
For example:
```toml
[filter]
allow_command_group = ["STRING", "HASH"]
block_command_group = ["SCRIPTING", "PUBSUB"]
```

Command groups follow the [Redis command key specifications](https://redis.io/docs/reference/key-specs/). Use groups to efficiently exclude entire data structures (for example, block `SCRIPTING` to avoid unsupported Lua scripts when synchronizing to a cluster).

## Configuration Reference

| Option | Type | Description |
| --- | --- | --- |
| `allow_keys` / `block_keys` | `[]string` | Exact key names to allow or block. |
| `allow_key_prefix` / `block_key_prefix` | `[]string` | Filter keys by prefix. |
| `allow_key_suffix` / `block_key_suffix` | `[]string` | Filter keys by suffix. |
| `allow_key_regex` / `block_key_regex` | `[]string` | Regular expressions evaluated against the full key. |
| `allow_db` / `block_db` | `[]int` | Logical database numbers to include or exclude. |
| `allow_command` / `block_command` | `[]string` | Redis command names. |
| `allow_command_group` / `block_command_group` | `[]string` | Redis command groups such as `STRING`, `HASH`, `SCRIPTING`. |

All options are optional. When both an allow and block rule apply to the same category, block rules win. Keep configurations symmetrical across active/standby clusters to avoid asymmetric data drops during failover.
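A minimal configuration combining several categories might look like this (the key names, database numbers, and groups are illustrative):

```toml
[filter]
allow_key_prefix = ["user:", "order:"]   # only business keys pass
block_key_suffix = [":tmp"]              # block wins, so "user:1:tmp" is dropped
allow_db = [0]                           # restrict to a single logical database
block_command_group = ["SCRIPTING"]      # destination cannot run Lua scripts
```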
98 changes: 69 additions & 29 deletions docs/src/en/filter/function.md
@@ -4,15 +4,28 @@ outline: deep

# What is function

RedisShake provides a function feature that implements the `transform` capability in [ETL (Extract-Transform-Load)](https://en.wikipedia.org/wiki/Extract,_transform,_load). By utilizing functions, you can achieve similar functionalities:
* Change the `db` to which data belongs, for example, writing data from source `db 0` to destination `db 1`.
* Filter data, for instance, only writing source data with keys starting with `user:` to the destination.
* Modify key prefixes, such as writing a source key `prefix_old_key` to a destination key `prefix_new_key`.
* ...
The **function** option extends the `[filter]` section with a Lua hook. Built-in filter rules run first to decide whether a command should leave RedisShake; only the surviving commands enter the Lua function, where you can reshape, split, or enrich them before they reach the destination. This hook is intended for lightweight adjustments that are difficult to express with static allow/block lists.

To use the function feature, you only need to write a Lua script. After RedisShake retrieves data from the source, it converts the data into Redis commands. Then, it processes these commands, parsing information such as `KEYS`, `ARGV`, `SLOTS`, `GROUP`, and passes this information to the Lua script. The Lua script processes this data and returns the processed commands. Finally, RedisShake writes the processed data to the destination.
With the function feature you can:

* Change the database (`db`) to which data belongs (for example, write source `db 0` into destination `db 1`).
* Filter or drop specific data, keeping only keys that match custom business rules.
* Rewrite commands, such as expanding `MSET` into multiple `SET` commands or adding new key prefixes.
* Emit additional commands (for metrics or cache warming) derived from the incoming data stream.

## Execution Flow

1. RedisShake retrieves commands from the reader and parses metadata such as command name, keys, key slots, and group.
2. Built-in filter rules evaluate the command. Anything blocked here never reaches Lua or the writer.
3. For the remaining entries, RedisShake creates a Lua state and exposes read-only context variables (`DB`, `CMD`, `KEYS`, and so on) plus helper functions under the `shake` table.
4. Your Lua code decides which commands to send downstream by calling `shake.call` zero or more times.

If your script does not invoke `shake.call`, the original command is suppressed. This makes it easy to implement drop-and-replace logic, but also means forgetting a `shake.call` will silently discard data. Always add logging while testing.
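A minimal sketch of that testing pattern (the log message is illustrative):

```lua
-- Log every command while testing, then forward it unchanged.
shake.log("cmd=" .. CMD .. " keys=" .. table.concat(KEYS, ","))
shake.call(DB, ARGV) -- without this line every command would be silently dropped
```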

## Quick Start

Place the Lua script inline in the `[filter]` section of the configuration file:

Here's a specific example:
```toml
[filter]
function = """
@@ -30,47 +43,52 @@ address = "127.0.0.1:6379"
[redis_writer]
address = "127.0.0.1:6380"
```
`DB` is information provided by RedisShake, indicating the db to which the current data belongs. `shake.log` is used for logging, and `shake.call` is used to call Redis commands. The purpose of the above script is to discard data from source `db 0` and write data from other `db`s to the destination.

In addition to `DB`, there is other information such as `KEYS`, `ARGV`, `SLOTS`, `GROUP`, and available functions include `shake.log` and `shake.call`. For details, please refer to [function API](#function-api).
`DB` is information provided by RedisShake, indicating the database to which the current data belongs. `shake.log` is used for logging, and `shake.call` emits a Redis command to the destination. The above script discards data from source `db 0` and forwards data from the other databases.

## function API

### Variables

Because some commands contain multiple keys, such as the `mset` command, the variables `KEYS`, `KEY_INDEXES`, and `SLOTS` are all array types. If you are certain that a command has only one key, you can directly use `KEYS[1]`, `KEY_INDEXES[1]`, `SLOTS[1]`.
Because some commands contain multiple keys, such as `MSET`, the variables `KEYS`, `KEY_INDEXES`, and `SLOTS` are all array types. If you are certain that a command has only one key, you can directly use `KEYS[1]`, `KEY_INDEXES[1]`, and `SLOTS[1]`.

| Variable | Type | Example | Description |
|-|-|-|-----|
| DB | number | 1 | The `db` to which the command belongs |
| GROUP | string | "LIST" | The `group` to which the command belongs, conforming to [Command key specifications](https://redis.io/docs/reference/key-specs/). You can check the `group` field for each command in [commands](https://github.com/tair-opensource/RedisShake/tree/v4/scripts/commands) |
| CMD | string | "XGROUP-DELCONSUMER" | The name of the command |
| KEYS | table | {"key1", "key2"} | All keys of the command |
| KEY_INDEXES | table | {2, 4} | The indexes of all keys in `ARGV` |
| SLOTS | table | {9189, 4998} | The [slots](https://redis.io/docs/reference/cluster-spec/#key-distribution-model) to which all keys of the current command belong |
| ARGV | table | {"mset", "key1", "value1", "key2", "value2"} | All parameters of the command |
| --- | --- | --- | --- |
| `DB` | number | `1` | The database to which the command belongs. |
| `CMD` | string | `"XGROUP-DELCONSUMER"` | The name of the command. |
| `GROUP` | string | `"LIST"` | The command group, conforming to [Command key specifications](https://redis.io/docs/reference/key-specs/). You can check the `group` field for each command in [commands](https://github.com/tair-opensource/RedisShake/tree/v4/scripts/commands). |
| `KEYS` | table | `{"key1", "key2"}` | All keys of the command. |
| `KEY_INDEXES` | table | `{2, 4}` | Indexes of all keys inside `ARGV`. |
| `SLOTS` | table | `{9189, 4998}` | Hash slots of the keys (cluster mode). |
| `ARGV` | table | `{"mset", "key1", "value1", "key2", "value2"}` | All command arguments, including the command name at index `1`. |

### Functions
* `shake.call(DB, ARGV)`: Returns a Redis command that RedisShake will write to the destination.
* `shake.log(msg)`: Prints logs.

* `shake.call(db, argv_table)`: Emits a command to the writer. The first element of `argv_table` must be the command name. You can call `shake.call` multiple times to split one input into several outputs (for example, expand `MSET` into multiple `SET`).
* `shake.log(msg)`: Writes `msg` to the RedisShake log, prefixed with `lua log:`. Use this to verify script behaviour during testing.

## Best Practices

### General Recommendations

* **Keep scripts idempotent.** RedisShake may retry commands, so ensure the emitted commands do not rely on side effects.
* **Guard against missing keys.** Always check whether `KEYS[1]` exists before slicing to avoid runtime errors with keyless commands such as `PING`.
* **Prefer simple logic.** Complex loops increase Lua VM time and can slow down synchronization. Offload heavy transformations to upstream processes when possible.

### Filtering Keys

```lua
local prefix = "user:"
local prefix_len = #prefix

if string.sub(KEYS[1], 1, prefix_len) ~= prefix then
if not KEYS[1] or string.sub(KEYS[1], 1, prefix_len) ~= prefix then
return
end

shake.call(DB, ARGV)
```

The effect is to only write source data with keys starting with `user:` to the destination. This doesn't consider cases of multi-key commands like `mset`.
The effect is to only write source data with keys starting with `user:` to the destination. This does not consider cases of multi-key commands like `MSET`.
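A variant that also handles multi-key commands could iterate over `KEYS` instead (a sketch; note that keyless commands such as `PING` would still pass because the loop body never runs):

```lua
local prefix = "user:"

-- Forward the entry only when every key carries the prefix.
for _, key in ipairs(KEYS) do
  if string.sub(key, 1, #prefix) ~= prefix then
    return -- one non-matching key drops the whole entry
  end
end

shake.call(DB, ARGV)
```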

### Filtering DB

@@ -85,12 +103,12 @@ shake.call(DB, ARGV)

The effect is to discard data from source `db 0` and write data from other `db`s to the destination.


### Filtering Certain Data Structures

You can use the `GROUP` variable to determine the data structure type. Supported data structure types include: `STRING`, `LIST`, `SET`, `ZSET`, `HASH`, `SCRIPTING`, etc.
You can use the `GROUP` variable to determine the data structure type. Supported data structure types include `STRING`, `LIST`, `SET`, `ZSET`, `HASH`, `SCRIPTING`, and more.

#### Filtering Hash Type Data

```lua
if GROUP == "HASH" then
return
@@ -100,7 +118,7 @@ shake.call(DB, ARGV)

The effect is to discard `hash` type data from the source and write other data to the destination.

#### Filtering [LUA Scripts](https://redis.io/docs/interact/programmability/eval-intro/)
#### Filtering [Lua Scripts](https://redis.io/docs/interact/programmability/eval-intro/)

```lua
if GROUP == "SCRIPTING" then
@@ -109,7 +127,22 @@ end
shake.call(DB, ARGV)
```

The effect is to discard `lua` scripts from the source and write other data to the destination. This is common when synchronizing from master-slave to cluster, where there are LUA scripts not supported by the cluster.
The effect is to discard Lua scripts from the source and write other data to the destination. This is common when synchronizing from master-slave to cluster, where there are Lua scripts not supported by the cluster.

### Splitting Commands

```lua
if CMD == "MSET" then
for i = 2, #ARGV, 2 do
shake.call(DB, {"SET", ARGV[i], ARGV[i + 1]})
end
return
end

shake.call(DB, ARGV)
```

This pattern expands one `MSET` into several `SET` commands to improve compatibility with destinations that prefer single-key writes.

### Modifying Key Prefixes

@@ -119,20 +152,21 @@ local prefix_new = "prefix_new_"

shake.log("old=" .. table.concat(ARGV, " "))

for i, index in ipairs(KEY_INDEXES) do
for _, index in ipairs(KEY_INDEXES) do
local key = ARGV[index]
if string.sub(key, 1, #prefix_old) == prefix_old then
if key and string.sub(key, 1, #prefix_old) == prefix_old then
ARGV[index] = prefix_new .. string.sub(key, #prefix_old + 1)
end
end

shake.log("new=" .. table.concat(ARGV, " "))
shake.call(DB, ARGV)
```

The effect is to write the source key `prefix_old_key` to the destination key `prefix_new_key`.

### Swapping DBs

```lua
local db1 = 1
local db2 = 2
@@ -146,3 +180,9 @@ shake.call(DB, ARGV)
```

The effect is to write source `db 1` to destination `db 2`, write source `db 2` to destination `db 1`, and leave other `db`s unchanged.

## Troubleshooting

* **Script fails to compile:** RedisShake validates the Lua code during startup and panics on syntax errors. Check the configuration logs for the exact line number.
* **No data reaches the destination:** Ensure that `shake.call` is invoked for every branch. Adding `shake.log` statements helps confirm which code path runs.
* **Performance drops:** Heavy scripts may become CPU-bound. Consider narrowing the scope with filters or moving expensive operations out of RedisShake.
7 changes: 6 additions & 1 deletion docs/src/en/guide/config.md
@@ -39,7 +39,12 @@ RedisShake provides different Writers to interface with different targets, see t

## filter Configuration

You can set filter rules through the configuration file. Refer to [Filter and Processing](../filter/filter.md) and [function](../filter/function.md).
The `[filter]` section contains two layers:

* **Rule engine:** Configure `allow_*` and `block_*` lists to keep or drop keys, databases, commands, and command groups. See [Filter and Processing](../filter/filter.md) for detailed semantics and examples.
* **Lua function hook:** Provide inline Lua code via the `function` option to rewrite commands after they pass the rule engine. See [function](../filter/function.md) for API details and best practices.

Filters always run before the Lua hook. Commands blocked by the rule engine never enter the script or reach the writer, so you can reserve the Lua layer for the smaller, approved subset of traffic.
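A minimal configuration exercising both layers might look like this (the prefix is illustrative):

```toml
[filter]
block_key_prefix = ["temp:"] # rule engine: temp keys never reach the Lua hook
function = """
-- Lua hook: sees only the commands that passed the rules above
shake.call(DB, ARGV)
"""
```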

## advanced Configuration
