Mellen Y. Pu Tao Lin Hongyu Chen
Westlake University
New to PiFlow? Check out our
Colab Tutorial to get started in minutes!
We introduce PiFlow (Principle Flow), an information-theoretical framework that treats automated scientific discovery as a structured uncertainty reduction problem, guided by foundational scientific principles. This approach ensures a more systematic and rational exploration of scientific problems by learning and optimization. You can directly use PiFlow for ANY of your specific tasks to assist scientific discovery!
🔮 Supported Features:
- Budget-limited iterative hypothesis-testing.
- Saving the full running log of agents.
- Colored output of different agents in terminal.
- Interactive Tutorial: Step-by-step guidance for scientists.
- Case study:
- Discovered nanohelix geo-structure with high chiral property (g-factor > 1.8).
PiFlow has demonstrated significant advancements in scientific discovery across multiple domains:
-
Comprehensive evaluation: Tested across three distinct scientific domains:
- 🔬 Nanohelix materials
- 🧬 Bio-molecules
- ⚡ Superconductors
-
Enhanced efficiency: Demonstrates a 73.55% increase in the Area Under the Curve (AUC) of property values versus exploration steps, significantly improving discovery efficiency.
-
Superior solution quality: Achieves an impressive 94.06% improvement in solution quality compared to a baseline agent system.
PiFlow serves as a plug-and-play method that establishes a novel paradigm for highly efficient automated scientific discovery, paving the way for more robust and accelerated AI-driven research. The framework accommodates various scenarios (bio-molecules, nanomaterials, and superconductor discovery) with diverse experimental conditions, requiring minimal to no prompt engineering for effective agent-level interaction.
To prepare the environment, execute the following conda commands:
git clone https://github.com/amair-lab/PiFlow && cd PiFlow
conda env create -f environment.yml # The environment name will be `piflow`
conda activate piflow
# We recommend using mamba for faster installation, though conda works as well
# If you want to use mamba for package management, run these commands instead:
# conda install mamba -n base -c conda-forge
# mamba env create -f environment.yml
# mamba activate piflowYou need to obtain the OpenAI-compatible API key for calling the language models. Ensure that the model embedded within the experimental agent is able to use tools. We recommend using official large models supported by Alibaba Cloud, e.g. QwenMax.
In the configuration for language models, each agent's profile contains API info and settings like verbose mode:
streaming: # bool, true means the output will be in streaming format, consistent with the original API output
api_config:
base_url: # Follow the OpenAI API format
model_name: # Follow the OpenAI API format
is_reasoning: # bool, true if it is a reasoning model like o1, R1, QwQ, etc. (typically has longer processing time)
api_key: # Follow the OpenAI API format
temperature: # Follow the OpenAI API, generally set to 0.6 or 0.4
max_tokens: # We set this to 4096
tools: [] # A list of tool names; they should be defined and declared in `tools/__init__.py`, following YAML list syntaxOther fields like enabled indicate whether this agent is a member of group chat. Fields like name and description are used for building the agent profile. Note that the name field should not contain any blank characters.
After configuration, you can run the demo task for nanohelix material discovery (you could also modify the args by argparse, see inference.py) by running the command:
python inference.pyYou will see detailed output with colored chat history at the command line if success. Our philosophy is that one configuration file corresponds to one task, and you should prepare both task and model configurations for running PiFlow.
PiFlow also offers extensive flexibility to adapt to your own scenarios, such as quantum science, MOF synthesis, and other domains, as demonstrated in our manuscript - its Plug-and-Play ability. The framework can assist with various discovery processes.
To adapt PiFlow to your task, you can design tools based on the provided examples in tools/.... Simply copy one file (e.g., [tool_nanomaterial.py](tools/tool_nanomaterial.py)_nanohelix_tools.py) and create a new file named according to your scenario (e.g., _mof_tools.py), the tools manager will automatically load this new tool, but please remember to add @register_tool for assigning both name and desc. Additionally, you should also prepare the output as a JSON format data with the same fields supported by us:
{
"tool_name": "xxx", # same with `@register_tool`
"input": "", # exact hypothesis candidate, str, etc
"output": None, # the `reward` for PiFlow to maximize
"success": True, # boolean value for checking the status
"error": "xxx" # helper info
}Remember to add the tool into the agent profile at model's config file. This is essential because the program will screen all tools defined at tools/ and load them into the agent's context by filtering the tool name.
If you have any problems or want to discuss something about PiFlow, welcome to join our Discord!
@misc{pu2025piflow,
title={PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration},
author={Yingming Pu and Tao Lin and Hongyu Chen},
year={2025},
eprint={2505.15047},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.15047},
}This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. Under this license, you are free to share and adapt this work for non-commercial purposes, provided you give appropriate attribution.

