docs: add dataset and experiment documentation #45
Add dedicated DATASETS.md and EXPERIMENTS.md guides covering CRUD operations, evaluators, result objects, and end-to-end examples. Update API_REFERENCE.md with dataset, experiment, and new URL helper method signatures. Add cross-references in GETTING_STARTED.md.
Greptile Overview

Greptile Summary

This PR adds standalone user-facing documentation for dataset management and the experiment runner, plus updates the API reference and getting started guide to make these features discoverable. It also documents the newer project-scoped URL helper methods for traces/datasets/dataset runs. Most of the content matches the SDK behavior, but there are a couple of places where examples/parameter descriptions are inconsistent with the implementation (notably

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| docs/API_REFERENCE.md | Adds dataset/experiment API sections and project-scoped URL helper docs; minor doc inconsistencies (status type, URL placeholders). |
| docs/DATASETS.md | New dataset management guide with examples; minor inconsistency in URL placeholder naming. |
| docs/EXPERIMENTS.md | New experiment runner guide; has a confusing/incorrect code snippet in ItemResult section that can fail if copy-pasted. |
| docs/GETTING_STARTED.md | Adds links to the new dataset/experiment docs; no issues found. |
Sequence Diagram
sequenceDiagram
participant U as User code
participant C as Langfuse::Client
participant D as DatasetClient
participant R as ExperimentRunner
participant A as ApiClient
participant UI as Langfuse UI
Note over U,C: Dataset flow
U->>C: create_dataset(name, ...)
C->>A: POST /api/public/datasets
A-->>C: dataset JSON
C-->>U: DatasetClient
U->>C: create_dataset_item(dataset_name, input, expected_output, ...)
C->>A: POST /api/public/dataset-items
A-->>C: item JSON
C-->>U: DatasetItemClient
Note over U,C: Experiment flow
U->>C: run_experiment(name, task, dataset_name: ...)
C->>A: GET /api/public/datasets/{name}
A-->>C: dataset + items
C->>R: new(...).execute
loop each item
R->>R: TracedExecution.call(...){ link_to_dataset_run }
R->>C: create_dataset_run_item(dataset_item_id, run_name, trace_id, ...)
C->>A: POST /api/public/dataset-run-items
A-->>C: datasetRunItem JSON
R->>R: task.call(item)
R->>C: create_score(... trace_id ...)
C->>A: POST /api/public/scores (batched)
end
R-->>U: ExperimentResult (dataset_run_url)
U->>C: dataset_run_url(dataset_id, dataset_run_id)
C-->>U: https://.../project/{project_id}/datasets/{dataset_id}/runs/{dataset_run_id}
U->>UI: Open URL
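The experiment flow in the diagram can be sketched in plain Ruby. This is a minimal sketch, not the SDK's implementation: `RecordingClient`, `run_items`, the hash-shaped items, and the `exact_match` score name are hypothetical stand-ins, and one generated trace ID per item is an assumption. Only the per-item call order (link the run item, run the task, create a score) is taken from the diagram.

```ruby
require "securerandom"

# Hypothetical stand-in for the real client: records calls instead of
# sending HTTP requests.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create_dataset_run_item(dataset_item_id:, run_name:, trace_id:)
    @calls << [:create_dataset_run_item, dataset_item_id, run_name, trace_id]
  end

  def create_score(name:, value:, trace_id:)
    @calls << [:create_score, name, value, trace_id]
  end
end

# Hypothetical runner loop mirroring the diagram's per-item order.
def run_items(client, items, run_name:, task:, evaluator:)
  items.map do |item|
    trace_id = SecureRandom.hex(16) # assumption: one trace per item
    client.create_dataset_run_item(dataset_item_id: item[:id], run_name: run_name, trace_id: trace_id)
    output = task.call(item)
    score = evaluator.call(output: output, expected_output: item[:expected_output])
    client.create_score(name: "exact_match", value: score, trace_id: trace_id)
    { item_id: item[:id], output: output, score: score }
  end
end

client = RecordingClient.new
items = [
  { id: "item-1", input: "2+2", expected_output: "4" },
  { id: "item-2", input: "3+3", expected_output: "7" }
]
task = ->(item) { item[:input].split("+").sum(&:to_i).to_s }
exact_match = ->(output:, expected_output:) { output == expected_output ? 1.0 : 0.0 }

results = run_items(client, items, run_name: "demo-run", task: task, evaluator: exact_match)
```

Because the client only records calls, the sketch can be run without network access to check the ordering of run-item and score creation.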
4 files reviewed, 5 comments
}

pass_rate = ->(item_results:) {
Incorrect API call in example
In the end-to-end example, pass_rate calls item_results.count(&:success?). Because success? is defined on ItemResult (not ExperimentResult) and the block is applied to each ItemResult element, this call is correct and does not raise NoMethodError. The actual issue is earlier in the doc, where result.success? is shown under ItemResult.
}
pass_rate = ->(item_results:) {
  passed = item_results.count(&:success?)
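A run-level evaluator such as `pass_rate` can be exercised in isolation. The sketch below uses a hypothetical `ItemResult` Struct as a stand-in for the SDK class, assuming only that `success?` means no error was recorded (as the doc's own snippet states). The hash return shape is also an assumption.

```ruby
# Hypothetical stand-in for the SDK's ItemResult; only `output` and `error`
# are modeled.
ItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end

  def failed?
    !success?
  end
end

# Run-level evaluator: fraction of items that completed without an error.
pass_rate = ->(item_results:) {
  passed = item_results.count(&:success?)
  rate = item_results.empty? ? 0.0 : passed.fdiv(item_results.size)
  { name: "pass_rate", value: rate } # return shape is an assumption
}

item_results = [
  ItemResult.new(output: "a"),
  ItemResult.new(error: "boom"),
  ItemResult.new(output: "b"),
  ItemResult.new(output: "c")
]

summary = pass_rate.call(item_results: item_results)
```

Guarding the empty case avoids a `NaN` rate when an experiment has no items.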
**Methods:**

```ruby
result.success? # => true if no error
```
Wrong receiver for methods
In the ItemResult section, the snippet uses result.success? / result.failed?, but in this context result is the ExperimentResult variable used throughout the doc. This is likely to confuse users and will fail if copy-pasted. Consider renaming the variable to item_result (or showing item_result = result.item_results.first).
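The receiver distinction can be made concrete with stand-in objects. In this sketch, `ExperimentResult` and `ItemResult` are hypothetical Structs, not the SDK's classes. They model only what the comment describes: `success?` and `failed?` live on the per-item result, which is reached through `item_results`.

```ruby
# Hypothetical stand-ins for the SDK classes named in the review comment.
ItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end

  def failed?
    !success?
  end
end

ExperimentResult = Struct.new(:name, :item_results, keyword_init: true)

result = ExperimentResult.new(
  name: "demo",
  item_results: [ItemResult.new(output: "ok"), ItemResult.new(error: "timeout")]
)

# Pick an individual item result before asking whether it succeeded.
item_result = result.item_results.first
first_ok = item_result.success?
last_failed = result.item_results.last.failed?

# In this sketch the experiment-level object has no success? method at all.
experiment_has_success = result.respond_to?(:success?)
```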
**Example:**

```ruby
dataset = client.create_dataset(
```
Type mismatch for status
create_dataset_item docs list status as a Symbol (:active/:archived), but DatasetItemClient#status is a String ("ACTIVE"/"ARCHIVED") and the API payload likely expects those canonical strings. If the public API actually accepts lowercase symbols, the docs should also mention the normalized server values; otherwise, update the parameter type/value examples to match the implementation.
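One way to reconcile the two representations is to normalize at the boundary. The helper below is a hypothetical sketch, not part of the SDK. It assumes the canonical values are exactly `"ACTIVE"` and `"ARCHIVED"`, as quoted in the comment, and accepts either symbols or strings.

```ruby
# Assumption: these are the only canonical status values.
VALID_STATUSES = %w[ACTIVE ARCHIVED].freeze

# Hypothetical helper: maps :active / "active" / "ACTIVE" to "ACTIVE" and
# rejects anything outside the known set.
def normalize_status(status)
  normalized = status.to_s.upcase
  unless VALID_STATUSES.include?(normalized)
    raise ArgumentError, "unknown status: #{status.inspect}"
  end

  normalized
end

active = normalize_status(:active)
archived = normalize_status("ARCHIVED")
```

With a helper like this, the docs could list both the accepted input forms and the normalized server values.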
```ruby
)
```

| Parameter               | Type   | Required | Description                                |
| ----------------------- | ------ | -------- | ------------------------------------------ |
| `dataset_name`          | String | Yes      | Parent dataset name                        |
| `input`                 | Object | No       | Input data                                 |
| `expected_output`       | Object | No       | Expected output for evaluation             |
Inconsistent URL placeholder
This example uses {pid} (dataset.url # => "https://.../project/{pid}/datasets/..."), while other docs use {project_id}. Keeping a single placeholder avoids copy/paste confusion when users search/replace.
`# => "https://cloud.langfuse.com/project/{project_id}/datasets/clx..."`
**Signature:**

```ruby
trace_url(trace_id) # => String
trace_url(trace_id) # => String | nil
```
Example output format mismatch
trace_url shows project/{project_id}, which matches the earlier URL example comment, while DATASETS.md uses {pid}. Please standardize placeholder naming across docs (e.g., {project_id} everywhere) to prevent inconsistent examples.
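To show what the `{project_id}` placeholder stands for, here is a minimal sketch of a project-scoped URL builder. `dataset_run_url` below is a hypothetical free function, not the SDK method of the same name. The path shape follows the URL in the sequence diagram, and returning `nil` when the project ID is unknown is an assumption modeled on the `String | nil` signature shown for `trace_url`.

```ruby
# Hypothetical helper; the path shape follows the example URL in this PR.
def dataset_run_url(base_url:, project_id:, dataset_id:, dataset_run_id:)
  return nil if project_id.nil? # assumption: no URL without a project ID

  "#{base_url}/project/#{project_id}/datasets/#{dataset_id}/runs/#{dataset_run_id}"
end

url = dataset_run_url(
  base_url: "https://cloud.langfuse.com",
  project_id: "proj-123",
  dataset_id: "ds-1",
  dataset_run_id: "run-9"
)

missing = dataset_run_url(
  base_url: "https://cloud.langfuse.com",
  project_id: nil,
  dataset_id: "ds-1",
  dataset_run_id: "run-9"
)
```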
Pull request overview
Adds dedicated user-facing documentation for dataset management and the experiment runner, and wires these docs into the existing getting started + API reference so the new SDK capabilities are discoverable.
Changes:
- Add new standalone guides: `DATASETS.md` (CRUD + item linking) and `EXPERIMENTS.md` (runner, evaluators, results)
- Update `GETTING_STARTED.md` and `API_REFERENCE.md` to link to and document dataset/experiment APIs
- Extend API reference “Utilities” docs for project-scoped URL helpers (trace/dataset/dataset run)
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/GETTING_STARTED.md | Adds links to the new datasets/experiments guides. |
| docs/EXPERIMENTS.md | Introduces experiment runner guide with examples, parameters, and result object docs. |
| docs/DATASETS.md | Introduces dataset CRUD + dataset item linking guide. |
| docs/API_REFERENCE.md | Adds datasets/experiments sections and updates URL helper reference text/signatures. |
| `output` | Object | The task's return value |
| `expected_output` | Object | The item's expected output |
| `item` | DatasetItemClient / ExperimentItem | The original item |
| `metadata` | Hash (optional) | Item metadata (only passed if evaluator accepts it) |
Copilot
AI
Feb 8, 2026
The note for the metadata evaluator keyword is slightly inaccurate relative to the current implementation: metadata is only passed when the evaluator explicitly declares a metadata: keyword parameter (not when it only has a **/keyword-rest arg). Consider rewording this to avoid implying that ** will receive metadata.
| `metadata` | Hash (optional) | Item metadata (only passed if evaluator accepts it) |
| `metadata` | Hash (optional) | Item metadata (only passed if the evaluator defines a `metadata:` keyword parameter) |
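The behavior described in this comment can be reproduced with Ruby's `Proc#parameters`, which reports explicit keywords as `:key` or `:keyreq` and a keyword-rest argument as `:keyrest`. The sketch below is an illustration of that distinction, not the SDK's code: `declares_metadata_keyword?` and `call_evaluator` are hypothetical helpers.

```ruby
# True only when the callable declares an explicit `metadata:` keyword
# (optional or required); a `**rest` parameter does not count.
def declares_metadata_keyword?(callable)
  callable.parameters.any? do |type, name|
    %i[key keyreq].include?(type) && name == :metadata
  end
end

# Hypothetical dispatcher: adds metadata to the call only when declared.
def call_evaluator(evaluator, output:, expected_output:, item:, metadata:)
  args = { output: output, expected_output: expected_output, item: item }
  args[:metadata] = metadata if declares_metadata_keyword?(evaluator)
  evaluator.call(**args)
end

with_metadata = ->(output:, metadata:, **) { metadata }
rest_only = ->(output:, **rest) { rest.keys }

seen_metadata = call_evaluator(
  with_metadata, output: "a", expected_output: "a", item: nil, metadata: { source: "test" }
)
seen_keys = call_evaluator(
  rest_only, output: "a", expected_output: "a", item: nil, metadata: { source: "test" }
)
```

Under this rule, `rest_only` never sees `:metadata` among its captured keys, which is the case the reworded table row is meant to make explicit.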
result.success? # => true if no error
result.failed? # => true if error present
Copilot
AI
Feb 8, 2026
In the ItemResult methods example, success?/failed? are methods on ItemResult, but the snippet calls them on result (which elsewhere refers to ExperimentResult). This example should use an item result variable (e.g., item_result.success?).
result.success? # => true if no error
result.failed? # => true if error present
item_result.success? # => true if no error
item_result.failed? # => true if error present
- `list_prompts` defaults are nil, not `page: 1` / `limit: 50`
- `flush_scores` takes no parameters (remove phantom `timeout` param)
- `create_score` was missing `dataset_run_id` and `config_id` params
- `cache_stale_ttl` default is 0, not 60
- SDK auto-reads `LANGFUSE_PUBLIC_KEY`/`SECRET_KEY`/`BASE_URL` from env
- Remove nonexistent `cache_stale_ttl` auto-configuration claim
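The environment-variable and default-value fixes can be pictured with a small configuration sketch. `Config` and `config_from_env` are hypothetical and do not reflect the SDK's internals. The sketch also assumes the abbreviated names in the commit message expand to `LANGFUSE_SECRET_KEY` and `LANGFUSE_BASE_URL`.

```ruby
# Hypothetical config object; not the SDK's class.
Config = Struct.new(:public_key, :secret_key, :base_url, :cache_stale_ttl, keyword_init: true)

# Reads credentials from an env-like hash (ENV by default).
def config_from_env(env = ENV)
  Config.new(
    public_key: env["LANGFUSE_PUBLIC_KEY"],
    secret_key: env["LANGFUSE_SECRET_KEY"], # assumed full variable name
    base_url: env["LANGFUSE_BASE_URL"],     # assumed full variable name
    cache_stale_ttl: 0                      # default per the fix list above
  )
end

config = config_from_env(
  "LANGFUSE_PUBLIC_KEY" => "pk-test",
  "LANGFUSE_SECRET_KEY" => "sk-test"
)
```

Passing a plain hash instead of `ENV` keeps the sketch testable without touching the process environment.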
TL;DR

Add dedicated documentation for the new dataset management and experiment runner features.

Why

The dataset CRUD methods (#41) and experiment runner (#41), plus project-scoped URL helpers (#43), shipped without user-facing guides. This adds standalone docs and updates the API reference so users can discover and use these features.
Checklist