Replies: 3 comments
-
Hey @hironow, really appreciate you taking the time to point these out! The diff you shared is super helpful for fixing the dataset, and thanks a lot as well for the kind words about the project. It means a lot! CC: @prateekchhikara
-
@hironow Thanks a lot for taking the time to review the dataset and point out these potential annotation issues. We really appreciate the level of detail you provided along with the diff. Just to clarify: the dataset used is the publicly available LOCOMO benchmark, and we haven't made any modifications to it. You are right that some of the evidence spans look incorrect or out of range, but in our experiments we relied only on the provided answer values, not the evidence references, so these issues don't affect our reported results.
-
@parshvadaftari @prateekchhikara Thank you both for your kind and detailed replies! I appreciate the clarification that this is from the original LOCOMO benchmark and that it doesn't affect your reported results. That makes perfect sense. Keep up the great work on the project!
-
Hi team,
I was reviewing the dataset linked in the evaluation/README.md and I believe I've found 7 potential annotation errors. The issues seem to be related to invalid reference formats or line numbers that are out of range.
Here is the list of items I identified:
- Item 3 / QA 59
  Q: What things has Nate reccomended to Joanna?
  evidence: [D2:14, D9:12, D9:14, D10:11, D19:17, D27:23, D10:19]
  unresolved:
  - ref=D10:19 reason=line 19 out of range for D10 (len=16)
- Item 3 / QA 89
  Q: What is one of Joanna's favorite movies?
  evidence: [D1:18, D, D1:20]
  unresolved:
  - ref=D reason=invalid ref format (expected D#:line)
- Item 4 / QA 19
  Q: What authors has Tim read books from?
  evidence: [D1:14, D2:7, D4:7, D5:15, D:11:26, D20:21, D26:36]
  unresolved:
  - ref=D:11:26 reason=invalid ref format (expected D#:line)
- Item 6 / QA 39
  Q: What happened to John's job situation in 2022?
  evidence: [D4:36, D18:1, D18:7]
  unresolved:
  - ref=D4:36 reason=line 36 out of range for D4 (len=25)
- Item 8 / QA 32
  Q: How might Evan and Sam's experiences with health and lifestyle changes influence their approach to stress and challenges?
  evidence: [D9:1 D4:4 D4:6]
  unresolved:
  - ref=D9:1 D4:4 D4:6 reason=invalid ref format (expected D#:line)
- Item 8 / QA 39
  Q: What role does nature and the outdoors play in Evan and Sam's mental well-being?
  evidence: [D22:1 D22:2 D9:10 D9:11]
  unresolved:
  - ref=D22:1 D22:2 D9:10 D9:11 reason=invalid ref format (expected D#:line)
- Item 8 / QA 47
  Q: How do Evan and Sam use creative outlets to cope with life's challenges?
  evidence: [D21:18 D21:22 D11:15 D11:19]
  unresolved:
  - ref=D21:18 D21:22 D11:15 D11:19 reason=invalid ref format (expected D#:line)

I went ahead and created a commit with what I believe are the correct annotations. You can view the diff here:
hironow/dspy@aa5a138
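For anyone who wants to reproduce this kind of check, here is a minimal sketch of a validator that yields "unresolved" entries like the ones listed above. It is an illustration only, not necessarily the script behind the linked commit. It assumes each evidence entry should be a single `D#:line` string, that line numbers are 1-indexed, and that the number of turns per session is already known. The names `find_unresolved` and `session_lengths` are hypothetical and not part of LOCOMO or any library.

```python
import re

# Expected shape of one evidence reference: "D<session>:<line>", e.g. "D10:19".
REF_PATTERN = re.compile(r"D(\d+):(\d+)")


def find_unresolved(evidence, session_lengths):
    """Return (ref, reason) pairs for references that cannot be resolved.

    evidence: list of reference strings, e.g. ["D2:14", "D10:19"].
    session_lengths: hypothetical mapping of session number -> number of turns.
    Line numbers are assumed to be 1-indexed.
    """
    unresolved = []
    for ref in evidence:
        # fullmatch rejects partial matches such as "D9:1 D4:4" or "D:11:26".
        match = REF_PATTERN.fullmatch(ref)
        if match is None:
            unresolved.append((ref, "invalid ref format (expected D#:line)"))
            continue
        session, line = int(match.group(1)), int(match.group(2))
        length = session_lengths.get(session)
        if length is None:
            unresolved.append((ref, f"unknown session D{session}"))
        elif not 1 <= line <= length:
            unresolved.append(
                (ref, f"line {line} out of range for D{session} (len={length})")
            )
    return unresolved


if __name__ == "__main__":
    # len(D10) = 16 is taken from the report above; the D2 length is a placeholder.
    lengths = {2: 20, 10: 16}
    for ref, reason in find_unresolved(["D2:14", "D10:19", "D"], lengths):
        print(f"ref={ref} reason={reason}")
```

Treating a space-separated entry such as `D9:1 D4:4 D4:6` as one invalid ref, rather than splitting it, matches how those items are reported in the list above.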
I'm starting this as a discussion to share my findings. I hope the provided diff is helpful for you to correct the dataset. Please let me know your thoughts.
Thanks for your great work on this project!