I would like, if possible, to avoid going down the YAML files path, as I don't see any real benefit in creating an extra abstraction layer or a tool for creating tools. Keeping in mind that the goal is to test how accurate and scalable using MCPs is for this purpose, I was expecting a much simpler approach. Let's discuss what benefits the YAML files bring to the table.
Suggested simpler approach.
Following up on #21, creating new tools and new ways to answer questions should be as simple as:
# Assumes `mcp` is an existing MCP server instance (e.g. FastMCP, whose decorator is `tool`)
import pandas as pd

@mcp.tool()
def survivors_of_the_titanic():
    """Answer the question of how many people survived the Titanic."""
    data_url = "https://datasets.org/titanic.csv"
    df = pd.read_csv(data_url)
    # Count the rows flagged as survivors
    result = (df['Survived'] == 1).sum()
    return f"The survivors are {result}. Based on {data_url}."
And then repeat for each question we want to answer. Each question will have a simpler or more complicated query, but the structure stays the same and is straightforward. If we want another one, we can build something like:
# other questions...
@mcp.tool()
def survivors_of_the_titanic_by_sex(sex=None):
    """Answer how many people survived the Titanic, optionally filtered by sex."""
    data_url = "https://datasets.org/titanic.csv"
    df = pd.read_csv(data_url)
    survived_mask = df['Survived'] == 1
    if sex is not None:
        # Narrow the survivors to the requested sex before counting
        result = (survived_mask & (df['Sex'] == sex)).sum()
        return f"{result} people of sex {sex} survived. Based on {data_url}"
    result = survived_mask.sum()
    return f"{result} people survived without considering sex. Based on {data_url}"
And so on....
So, considering that each tool follows the same pattern (read, query, return), I'm struggling to see the benefit of the YAML file. We would be moving from a straightforward, well-known pandas structure to a new YAML-based definition of how to query.
Reasons I don't like the YAML approach
- It doesn't really save typing or repetition. For each question we have a ~25-line YAML file where we need to repeat every piece of information (url, data, filter, label, etc.).
- It adds a layer of abstraction that we need to maintain.
- It makes debugging more complicated. If my YAML specification is not working, how do I debug and test it? The difference in development experience is significant here, because a pandas pipeline can be easily debugged and improved.
- We need to document the YAML specification and maintain it (what if we want to make changes in the future? Are we going to version it?).
- We are creating yet another new query language.
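To illustrate the debugging point, here is a minimal sketch of how the query part of a tool could be checked against a small in-memory DataFrame, with no network access and no MCP server running. The helper name `count_survivors` and the sample rows are hypothetical and made up for illustration; only the `Survived` and `Sex` column names come from the examples above.

```python
import pandas as pd


def count_survivors(df, sex=None):
    """Count rows with Survived == 1, optionally restricted to one sex.

    Hypothetical helper: the same query logic as the tools above,
    separated from the CSV download so it can be tested on its own.
    """
    mask = df["Survived"] == 1
    if sex is not None:
        mask = mask & (df["Sex"] == sex)
    return int(mask.sum())


# Tiny hand-made sample standing in for the real CSV (made-up rows)
sample = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 1],
        "Sex": ["female", "male", "male", "female"],
    }
)

# Plain assertions are enough to check the query logic
assert count_survivors(sample) == 3
assert count_survivors(sample, sex="female") == 2
assert count_survivors(sample, sex="male") == 1
```

The same function can be stepped through in a debugger or a notebook, which is the development experience I'd like to keep.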
Benefits of YAML files
I know there are some benefits:
- They seem easier to read (although this depends on who is writing them).
- Maybe AI tools can quickly create new ones (but they can just as well create new pandas pipelines).
- It abstracts the implementation, so we could swap in pandas, polars, or SQL engines.
Let's discuss as I'd like to keep this as simple as possible for the pilots.