Commit 55e3c5b

[tool] fix: support non-ascii characters in search results (#3044)
### What does this PR do?

A small change from `json.dumps({"result": final_result})` to `json.dumps({"result": final_result}, ensure_ascii=False)`, supporting customized search engines that return docs containing non-ascii characters (e.g., CJK characters).

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: #1682
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

N/A

### API and Usage Example

N/A

### Design & Code Changes

N/A

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
1 parent d253526 commit 55e3c5b

1 file changed: +8 -6 lines changed

verl/tools/utils/search_r1_like_utils.py

Lines changed: 8 additions & 6 deletions
@@ -198,11 +198,11 @@ def perform_single_search_batch(
         "formatted_result": None,
     }
 
-    result_text = json.dumps({"result": "Search request failed or timed out after retries."})
+    result_text = json.dumps({"result": "Search request failed or timed out after retries."}, ensure_ascii=False)
 
     if error_msg:
         metadata["status"] = "api_error"
-        result_text = json.dumps({"result": f"Search error: {error_msg}"})
+        result_text = json.dumps({"result": f"Search error: {error_msg}"}, ensure_ascii=False)
         logger.error(f"Batch search: API error occurred: {error_msg}")
     elif api_response:
         logger.debug(f"Batch search: API Response: {api_response}")
@@ -220,24 +220,26 @@ def perform_single_search_batch(
                     total_results += len(retrieval) if isinstance(retrieval, list) else 1
 
                 final_result = "\n---\n".join(pretty_results)
-                result_text = json.dumps({"result": final_result})
+                result_text = json.dumps({"result": final_result}, ensure_ascii=False)
                 metadata["status"] = "success"
                 metadata["total_results"] = total_results
                 metadata["formatted_result"] = final_result
                 logger.info(f"Batch search: Successful, got {total_results} total results")
             else:
-                result_text = json.dumps({"result": "No search results found."})
+                result_text = json.dumps({"result": "No search results found."}, ensure_ascii=False)
                 metadata["status"] = "no_results"
                 metadata["total_results"] = 0
                 logger.info("Batch search: No results found")
         except Exception as e:
             error_msg = f"Error processing search results: {e}"
-            result_text = json.dumps({"result": error_msg})
+            result_text = json.dumps({"result": error_msg}, ensure_ascii=False)
             metadata["status"] = "processing_error"
             logger.error(f"Batch search: {error_msg}")
     else:
         metadata["status"] = "unknown_api_state"
-        result_text = json.dumps({"result": "Unknown API state (no response and no error message)."})
+        result_text = json.dumps(
+            {"result": "Unknown API state (no response and no error message)."}, ensure_ascii=False
+        )
         logger.error("Batch search: Unknown API state.")
 
     return result_text, metadata
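
For reviewers unfamiliar with the flag, here is a minimal sketch of what `ensure_ascii=False` changes in the serialized output. The sample document string is hypothetical and is not taken from this repository or any real search engine response; only `json.dumps` and `json.loads` from the standard library are used.

```python
import json

# Hypothetical search result containing CJK text (illustrative only).
final_result = "Doc 1 (Title: 北京) 北京是中国的首都。"

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps({"result": final_result})

# Behaviour after this commit: non-ASCII characters are written out unchanged.
readable = json.dumps({"result": final_result}, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode to the same object;
# only the serialized text differs.
assert json.loads(escaped) == json.loads(readable)
assert escaped.isascii() and not readable.isascii()
```

Both forms round-trip identically through `json.loads`, so the difference only matters when `result_text` is read as plain text rather than parsed, in which case the escaped form hides the original characters.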
