[Discussion] Engines and YAML files #25

@pdelboca

If possible, I would like to avoid going down the YAML files path, as I don't see any real benefit in creating an extra abstraction layer or a tool for creating tools. Keeping in mind that the goal is to test how accurate and scalable using MCPs is for this purpose, I was expecting a much simpler approach. Let's discuss what benefits the YAML files bring to the table.

Suggested simpler approach.

Following up on #21, creating new tools and new ways to answer questions should be as simple as:

import pandas as pd

# `mcp` is the MCP server instance, defined elsewhere.
@mcp.tool()
def survivors_of_the_titanic():
    """Answer the question of how many people survived the Titanic."""
    data_url = "https://datasets.org/titanic.csv"

    df = pd.read_csv(data_url)
    # Count the rows where the passenger survived.
    result = (df['Survived'] == 1).sum()

    return f"The survivors are {result}. Based on {data_url}."

And then repeat for each question we want to answer. Each question will have a simpler or more complicated query, but the structure stays the same and is straightforward. If we want another one, we can build something like:

# other questions...

@mcp.tool()
def survivors_of_the_titanic_by_sex(sex=None):
    """Count Titanic survivors, optionally restricted to one sex."""
    data_url = "https://datasets.org/titanic.csv"

    df = pd.read_csv(data_url)
    survived_mask = df['Survived'] == 1
    if sex is not None:
        # Combine both conditions, then count the matching rows.
        result = (survived_mask & (df['Sex'] == sex)).sum()
        return f"{result} people survived of sex {sex}. Based on {data_url}"
    result = survived_mask.sum()
    return f"{result} people survived without considering sex. Based on {data_url}"

And so on....

So, considering that each tool follows the same pattern of read, query, return, I'm struggling to see the benefit of the YAML file. We would be changing from a straightforward, well-known pandas structure to a new YAML-based definition of how to query.
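
The read, query, return shape can be sketched with plain functions, one per step. This is only an illustration under stated assumptions: the inline CSV stands in for the real dataset URL, and `load_csv`, `count_survivors`, and `describe` are hypothetical names that do not exist in the project.

```python
import io

import pandas as pd

# Assumption: a tiny inline CSV stands in for the remote Titanic dataset.
SAMPLE_CSV = """Survived,Sex
1,female
0,male
1,male
1,female
0,female
"""


def load_csv(source):
    """Read step: accepts a URL, a path, or a file-like object."""
    return pd.read_csv(source)


def count_survivors(df, sex=None):
    """Query step: count rows with Survived == 1, optionally for one sex."""
    mask = df["Survived"] == 1
    if sex is not None:
        mask = mask & (df["Sex"] == sex)
    return int(mask.sum())


def describe(count, source_name, sex=None):
    """Return step: format the answer together with its source."""
    scope = f" of sex {sex}" if sex is not None else ""
    return f"{count} people survived{scope}. Based on {source_name}."


df = load_csv(io.StringIO(SAMPLE_CSV))
print(describe(count_survivors(df), "sample data"))
print(describe(count_survivors(df, sex="female"), "sample data", sex="female"))
```

Each tool would then be a thin wrapper that wires these three steps together for one question.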

Reasons I don't like the YAML approach

  1. It doesn't really save typing or repetition. For each question we have a ~25-line YAML file where we need to repeat all the information (URL, data, filter, label, etc.).
  2. It adds a layer of abstraction that we need to maintain.
  3. It makes debugging more complicated: if my YAML specification is not working, how do I debug and test it? The difference in development experience is significant here, because the pandas pipeline can be easily debugged and improved.
  4. We need to document the YAML specification and maintain it. (What if we want to make changes in the future? Are we going to version it?)
  5. We are creating yet another query language.
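
To illustrate point 3, a pandas query can be checked step by step against a handful of hand-written rows, with no network access. The sketch below is one possible way to do that; the sample rows and the `survivors_by_sex` helper are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real tool would read the Titanic CSV instead.
df = pd.DataFrame(
    {
        "Survived": [1, 0, 1, 1, 0],
        "Sex": ["female", "male", "male", "female", "female"],
    }
)


def survivors_by_sex(frame):
    """Count survivors per sex using a plain pandas pipeline."""
    return frame[frame["Survived"] == 1].groupby("Sex").size().to_dict()


# Each intermediate step can be inspected on its own while debugging.
survived = df[df["Survived"] == 1]
print(survived)              # the filtered rows
print(survivors_by_sex(df))  # the final answer

# The same helper doubles as a unit-test target.
assert survivors_by_sex(df) == {"female": 2, "male": 1}
```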

Benefits of YAML files

I know there are some benefits:

  1. They seem easier to read (although this depends on who is writing them).
  2. Maybe AI tools can quickly create new ones (but they can just as well create new pandas pipelines).
  3. It abstracts the implementation, so we could have pandas, polars, or SQL engines.
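
For reference, the engine independence in benefit 3 means that the same question can be answered by different backends. A minimal sketch of that idea, assuming an in-memory SQLite table as the "SQL engine" and made-up sample rows, could look like this (the function names and the `ENGINES` mapping are hypothetical):

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows: (Survived, Sex).
ROWS = [(1, "female"), (0, "male"), (1, "male"), (1, "female"), (0, "female")]


def survivors_pandas(rows):
    """Answer the question with a pandas boolean mask."""
    df = pd.DataFrame(rows, columns=["Survived", "Sex"])
    return int((df["Survived"] == 1).sum())


def survivors_sql(rows):
    """Answer the same question with a SQL query on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titanic (Survived INTEGER, Sex TEXT)")
    conn.executemany("INSERT INTO titanic VALUES (?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM titanic WHERE Survived = 1"
    ).fetchone()
    conn.close()
    return count


# One question, two interchangeable engines.
ENGINES = {"pandas": survivors_pandas, "sql": survivors_sql}
for name, engine in ENGINES.items():
    print(name, engine(ROWS))
```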

Let's discuss as I'd like to keep this as simple as possible for the pilots.
