Hi! We benchmarked NEXO Brain, an open-source MCP memory server built on an Atkinson-Shiffrin cognitive architecture, on LoCoMo.
Results (v0.5.0)
| System | F1 | Hardware |
|---|---|---|
| NEXO Brain v0.5.0 | 0.588 | CPU only |
| GPT-4 (128K full context) | 0.379 | GPU cloud |
| Gemini Pro 1.0 | 0.313 | GPU cloud |
| LLaMA-3 70B | 0.295 | A100 GPU |
| GPT-3.5 + Contriever RAG | 0.283 | GPU |
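For readers comparing against these numbers, here is a minimal sketch of a token-overlap F1 of the kind commonly used for QA benchmarks. It assumes SQuAD-style normalization (lowercasing, stripping punctuation and articles); the official LoCoMo scoring script may normalize differently, and `normalize` and `token_f1` are illustrative names, not part of NEXO Brain or LoCoMo.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A benchmark-level F1 would then be the mean of `token_f1` over all question/answer pairs.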
Setup
- Embedding model: BAAI/bge-base-en-v1.5 (768 dims, CPU)
- Answer generation: Claude Sonnet 4
- Retrieval: Hybrid vector+BM25, HyDE expansion, cross-encoder reranking, multi-query decomposition
- Memory architecture: STM/LTM stores with adaptive Ebbinghaus decay, intelligent chunking, session summaries
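Two of the ideas in the setup can be sketched in a few lines. This is an illustration under stated assumptions, not the NEXO Brain implementation: the decay is modeled as the classic Ebbinghaus curve R = exp(-t / S), "adaptive" is assumed to mean that stability grows each time a memory is recalled, and the vector and BM25 rankings are assumed to be merged with reciprocal rank fusion (the post does not say which fusion method is used). All function names and the `boost` and `k` defaults are hypothetical.

```python
import math


def retention(elapsed_hours: float, stability_hours: float) -> float:
    """Ebbinghaus-style forgetting curve: R = exp(-t / S)."""
    if stability_hours <= 0:
        raise ValueError("stability_hours must be positive")
    return math.exp(-elapsed_hours / stability_hours)


def reinforce(stability_hours: float, boost: float = 1.5) -> float:
    """Assumed adaptation rule: each recall multiplies stability by `boost`."""
    return stability_hours * boost


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, `rrf_fuse([vector_hits, bm25_hits])` takes one ranked list of memory ids per retriever and returns a single merged ranking, which a cross-encoder could then rerank.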
Key findings
- Outperforms GPT-4 (128K full context) by 55% relative on F1 (0.588 vs. 0.379)
- 93.3% adversarial rejection rate (446 questions)
- 74.9% recall across 1,986 questions
- Runs entirely on CPU with 768-dim embeddings
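The 55% figure is a relative gain over the GPT-4 baseline, not an absolute difference in F1 points. A short check using the two F1 values from the results table (`relative_gain` is just an illustrative helper):

```python
def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement of `score` over `baseline`, as a fraction."""
    return (score - baseline) / baseline


gain = relative_gain(0.588, 0.379)
print(f"{gain:.1%}")  # prints 55.1%
```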
Full results: https://github.com/wazionapps/nexo/tree/main/benchmarks/locomo
We believe this is the highest published score on LoCoMo. It would be great if you'd consider adding external benchmark results to your repo or a leaderboard.
Thanks for building such a useful benchmark!