Proposal: add WFGY 16-problem RAG failure map as a taxonomy for eval analysis

Hi Evals team,

People come to OpenAI Evals to understand why a prompt or system fails, not just whether it passes or fails a metric.

I maintain the **WFGY RAG 16 Problem Map**, an open-source failure taxonomy for RAG / LLM pipelines. It includes:

- A list of **16 common failure types**  
- A one-page **Global Debug Card**  
- A **triage prompt** that can classify failing examples and suggest structural fixes

Repo (MIT):  
https://github.com/onestardao/WFGY  
Main reference page:  
https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

The map has already been integrated into or cited by other RAG / LLM projects, including **RAGFlow, LlamaIndex, ToolUniverse (Harvard MIMS Lab), Rankify (Univ. of Innsbruck), Multimodal RAG Survey (QCRI LLM Lab)**, as well as curated lists such as **Awesome LLM Apps**.

Proposal:

Offer WFGY’s 16-problem map as a **recommended taxonomy for eval analysis** in Evals. For example:

1. An example eval where:
   - An LLM classifies each RAG output that fails a metric into one of the 16 WFGY failure types.
   - The analysis notebook aggregates failure counts per type.

2. A short docs section that:
   - Introduces the 16-problem map.
   - Shows how users can plug in their own failure taxonomies, with WFGY as a concrete example.
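
To make the two items above concrete, here is a minimal sketch of what the example could look like. It does not use any Evals API. `FailureTaxonomy`, `aggregate_failures`, `stub_classifier`, and the sample dict shape are all hypothetical names introduced for illustration, and the labels `type_1` … `type_16` are placeholders, not the real WFGY names (those are in the ProblemMap README). The classifier here is a stub standing in for the LLM triage call.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FailureTaxonomy:
    """A named set of failure labels. Users can define their own."""
    name: str
    labels: tuple[str, ...]


# Placeholder labels only (assumption): the real names are in the WFGY README.
WFGY_16 = FailureTaxonomy(
    name="wfgy-16",
    labels=tuple(f"type_{i}" for i in range(1, 17)),
)


def aggregate_failures(
    samples: Iterable[dict],
    taxonomy: FailureTaxonomy,
    classify: Callable[[dict, FailureTaxonomy], str],
) -> Counter:
    """Count failing samples per taxonomy label.

    Samples that passed the metric are skipped. Labels outside the
    taxonomy are counted under "unclassified".
    """
    counts = Counter({label: 0 for label in taxonomy.labels})
    for sample in samples:
        if sample["passed"]:
            continue
        label = classify(sample, taxonomy)
        if label not in taxonomy.labels:
            label = "unclassified"
        counts[label] += 1
    return counts


def stub_classifier(sample: dict, taxonomy: FailureTaxonomy) -> str:
    """Stand-in for an LLM triage call; reads a pre-assigned label."""
    return sample.get("debug_label", "unclassified")


# Hypothetical eval results: one pass, four failures.
samples = [
    {"id": 1, "passed": True},
    {"id": 2, "passed": False, "debug_label": "type_3"},
    {"id": 3, "passed": False, "debug_label": "type_3"},
    {"id": 4, "passed": False, "debug_label": "type_7"},
    {"id": 5, "passed": False, "debug_label": "not-a-label"},
]

counts = aggregate_failures(samples, WFGY_16, stub_classifier)
```

Because the taxonomy is just a name plus a tuple of labels, a user could pass their own `FailureTaxonomy` without touching the aggregation code, with WFGY as one concrete instance.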

This would help eval users move from “the score went down” to “we see more cases of failure types 3 and 7, which point to specific pipeline fixes”.
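
As a rough illustration of that shift, the sketch below compares per-type failure counts between two eval runs and reports only the types that changed. The function name `failure_delta`, the `type_N` labels, and the counts are all made up for this example.

```python
from collections import Counter


def failure_delta(baseline: Counter, candidate: Counter) -> dict:
    """Return per-label change in failure count, omitting unchanged labels."""
    labels = sorted(set(baseline) | set(candidate))
    return {
        label: candidate.get(label, 0) - baseline.get(label, 0)
        for label in labels
        if candidate.get(label, 0) != baseline.get(label, 0)
    }


# Hypothetical per-type failure counts from two runs of the same eval.
baseline = Counter({"type_3": 2, "type_7": 1, "type_9": 4})
candidate = Counter({"type_3": 6, "type_7": 5, "type_9": 4})

delta = failure_delta(baseline, candidate)
for label, change in delta.items():
    print(f"{label}: {change:+d}")
```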

If this aligns with Evals’ direction, I am happy to draft the example and documentation text.

Proposal: add WFGY 16-problem RAG failure map as a taxonomy for eval analysis #1629

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Proposal: add WFGY 16-problem RAG failure map as a taxonomy for eval analysis #1629

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions