DeepResearch/evaluation at main · Alibaba-NLP/DeepResearch

Files in this directory:

README.md
evaluate_deepsearch_official.py
evaluate_hle_official.py
prompt.py

For HLE:

Export the required environment variables:

export API_KEY="your_api_key"
export BASE_URL="your_base_url"

Then run the evaluation:

python eval_hle_old_react.py --input_fp your_input_folder --model_path your_qwen_model_path

For other benchmarks:

Export the required environment variables:

export OPENAI_API_KEY="your_openai_api_key"
export OPENAI_API_BASE="your_openai_api_base"
export API_KEY="your_api_key"
export BASE_URL="your_base_url"
export Qwen2_5_7B_PATH="your_qwen_model_path"

Then run the evaluation:

python evaluate_all_official.py --input_fp your_input_folder --dataset your_evaluated_dataset
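
Because both evaluation commands depend on exported variables, it can help to confirm they are set before starting a long run. The following is a minimal sketch of such a check, not part of the repository: the variable names come from the export lines above, while the group labels `hle` and `other`, the `REQUIRED_ENV` table, and the helper `missing_env_vars` are hypothetical names introduced only for illustration.

```python
import os

# Variable names are copied from the export lines in this README.
# The group labels ("hle", "other") are hypothetical and exist only in this sketch.
REQUIRED_ENV = {
    "hle": ["API_KEY", "BASE_URL"],
    "other": [
        "OPENAI_API_KEY",
        "OPENAI_API_BASE",
        "API_KEY",
        "BASE_URL",
        "Qwen2_5_7B_PATH",
    ],
}


def missing_env_vars(group, env=None):
    """Return the variables required for `group` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[group] if not env.get(name)]


if __name__ == "__main__":
    # Report the status of each group without stopping at the first problem.
    for group in REQUIRED_ENV:
        missing = missing_env_vars(group)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{group}: {status}")
```

Running this in the same shell session as the export lines shows which variables, if any, still need to be set for each benchmark group.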
