docs: add dataset and experiment documentation #45

kxzk · 2026-02-08T23:06:51Z

`TL;DR`

Add dedicated documentation for the new dataset management and experiment runner features.

`Why`

The dataset CRUD methods (#41) and experiment runner (#41), plus project-scoped URL helpers (#43), shipped without user-facing guides. This adds standalone docs and updates the API reference so users can discover and use these features.

`Checklist`

Has label
Has linked issue
Tests added for new behavior
Docs updated (if user-facing)

Add dedicated DATASETS.md and EXPERIMENTS.md guides covering CRUD operations, evaluators, result objects, and end-to-end examples. Update API_REFERENCE.md with dataset, experiment, and new URL helper method signatures. Add cross-references in GETTING_STARTED.md.

greptile-apps · 2026-02-08T23:09:04Z

Greptile Overview

Greptile Summary

This PR adds standalone user-facing documentation for dataset management and the experiment runner, plus updates the API reference and getting started guide to make these features discoverable. It also documents the newer project-scoped URL helper methods for traces/datasets/dataset runs.

Most of the content matches the SDK behavior, but there are a couple of places where examples/parameter descriptions are inconsistent with the implementation (notably ItemResult example variable naming and dataset item status type/value), which are likely to break or confuse users if copy-pasted.

Confidence Score: 4/5

Safe to merge after fixing a couple of user-facing doc inaccuracies/confusing snippets.
Changes are documentation-only and largely consistent with the SDK, but a few example/code snippets are inconsistent with actual method receivers and returned types/values, which can mislead users and cause copy-paste failures.
Files needing attention: docs/EXPERIMENTS.md and docs/API_REFERENCE.md

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/API_REFERENCE.md | Adds dataset/experiment API sections and project-scoped URL helper docs; minor doc inconsistencies (status type, URL placeholders). |
| docs/DATASETS.md | New dataset management guide with examples; minor inconsistency in URL placeholder naming. |
| docs/EXPERIMENTS.md | New experiment runner guide; has a confusing/incorrect code snippet in ItemResult section that can fail if copy-pasted. |
| docs/GETTING_STARTED.md | Adds links to the new dataset/experiment docs; no issues found. |

Sequence Diagram

sequenceDiagram
  participant U as User code
  participant C as Langfuse::Client
  participant D as DatasetClient
  participant R as ExperimentRunner
  participant A as ApiClient
  participant UI as Langfuse UI

  Note over U,C: Dataset flow
  U->>C: create_dataset(name, ...)
  C->>A: POST /api/public/datasets
  A-->>C: dataset JSON
  C-->>U: DatasetClient

  U->>C: create_dataset_item(dataset_name, input, expected_output, ...)
  C->>A: POST /api/public/dataset-items
  A-->>C: item JSON
  C-->>U: DatasetItemClient

  Note over U,C: Experiment flow
  U->>C: run_experiment(name, task, dataset_name: ...)
  C->>A: GET /api/public/datasets/{name}
  A-->>C: dataset + items
  C->>R: new(...).execute

  loop each item
    R->>R: TracedExecution.call(...){ link_to_dataset_run }
    R->>C: create_dataset_run_item(dataset_item_id, run_name, trace_id, ...)
    C->>A: POST /api/public/dataset-run-items
    A-->>C: datasetRunItem JSON
    R->>R: task.call(item)
    R->>C: create_score(... trace_id ...)
    C->>A: POST /api/public/scores (batched)
  end

  R-->>U: ExperimentResult (dataset_run_url)
  U->>C: dataset_run_url(dataset_id, dataset_run_id)
  C-->>U: https://.../project/{project_id}/datasets/{dataset_id}/runs/{dataset_run_id}
  U->>UI: Open URL

greptile-apps

4 files reviewed, 5 comments


greptile-apps · 2026-02-08T23:09:08Z

docs/EXPERIMENTS.md

+}
+
+pass_rate = ->(item_results:) {


Incorrect API call in example

In the end-to-end example, pass_rate calls item_results.count(&:success?). That call is valid: success? is defined on ItemResult (not ExperimentResult), and each element of item_results is an ItemResult, so it will not raise NoMethodError. The actual issue is earlier in the doc, where result.success? is shown in the ItemResult section even though result is the ExperimentResult variable.

Suggested change

}

pass_rate = ->(item_results:) {

passed = item_results.count(&:success?)
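
For illustration, a run-level evaluator along these lines could look like the sketch below. `ItemResult` here is a hypothetical stand-in Struct, not the SDK class; the only behavior taken from this thread is that each element of `item_results` responds to `success?`. The bare Float return value is also an assumption, since the shape the SDK expects from a run evaluator is not shown here.

```ruby
# Hypothetical stand-in for the SDK's ItemResult; only success? is assumed.
ItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end
end

# Run-level evaluator: fraction of items that finished without an error.
pass_rate = ->(item_results:) {
  passed = item_results.count(&:success?)
  item_results.empty? ? 0.0 : passed.fdiv(item_results.size)
}

item_results = [
  ItemResult.new(output: "Paris"),
  ItemResult.new(output: nil, error: "timeout")
]

puts pass_rate.call(item_results: item_results) # prints 0.5
```

The empty-list guard avoids a division by zero when an experiment has no items.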

greptile-apps · 2026-02-08T23:09:09Z

docs/EXPERIMENTS.md

+
+**Methods:**
+
+```ruby
+result.success?  # => true if no error


Wrong receiver for methods

In the ItemResult section, the snippet uses result.success? / result.failed?, but in this context result is the ExperimentResult variable used throughout the doc. This is likely to confuse users and will fail if copy-pasted. Consider renaming the variable to item_result (or showing item_result = result.item_results.first).
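
The receiver distinction can be sketched as follows. Both Structs are hypothetical stand-ins written for this example, not the SDK classes; they only mirror what the review describes, namely that `success?` and `failed?` live on the per-item object and that the experiment-level object exposes `item_results`.

```ruby
# Hypothetical stand-ins, NOT the SDK classes.
ItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end

  def failed?
    !success?
  end
end

ExperimentResult = Struct.new(:item_results, keyword_init: true)

result = ExperimentResult.new(
  item_results: [ItemResult.new(output: "ok"), ItemResult.new(error: "boom")]
)

# Pull out a single item before calling the per-item predicates.
item_result = result.item_results.first
puts item_result.success?            # per-item predicate
puts result.respond_to?(:success?)   # the stand-in experiment result has no success?
```

Naming the variable `item_result` in the docs makes it obvious which object the predicates belong to.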

greptile-apps · 2026-02-08T23:09:10Z

docs/API_REFERENCE.md

+
+**Example:**
+
+```ruby
+dataset = client.create_dataset(


Type mismatch for status

create_dataset_item docs list status as a Symbol (:active/:archived), but DatasetItemClient#status is a String ("ACTIVE"/"ARCHIVED") and the API payload likely expects those canonical strings. If the public API actually accepts lowercase symbols, the docs should also mention the normalized server values; otherwise, update the parameter type/value examples to match the implementation.
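
If the docs keep the Symbol form, one way to reconcile it with the canonical strings is a small normalization step. The sketch below is an assumption for illustration: `normalize_status` and `ALLOWED_STATUSES` are hypothetical names, not part of the SDK, and whether the SDK performs any such conversion is not confirmed in this thread.

```ruby
# Hypothetical helper, not part of the SDK.
ALLOWED_STATUSES = %w[ACTIVE ARCHIVED].freeze

# Accepts :active, "active", or "ACTIVE" and returns the canonical
# uppercase string; raises on anything outside the allowed set.
def normalize_status(status)
  normalized = status.to_s.upcase
  unless ALLOWED_STATUSES.include?(normalized)
    raise ArgumentError, "unknown status: #{status.inspect}"
  end
  normalized
end

puts normalize_status(:active)
puts normalize_status("ARCHIVED")
```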

greptile-apps · 2026-02-08T23:09:11Z

docs/DATASETS.md

+)
+```
+
+| Parameter               | Type   | Required | Description                                |
+| ----------------------- | ------ | -------- | ------------------------------------------ |
+| `dataset_name`          | String | Yes      | Parent dataset name                        |
+| `input`                 | Object | No       | Input data                                 |
+| `expected_output`       | Object | No       | Expected output for evaluation             |


Inconsistent URL placeholder

This example uses {pid} (dataset.url # => "https://.../project/{pid}/datasets/..."), while other docs use {project_id}. Keeping a single placeholder avoids copy/paste confusion when users search/replace.

Suggested change

)

```

| Parameter | Type | Required | Description |
| ----------------------- | ------ | -------- | ------------------------------------------ |
| `dataset_name` | String | Yes | Parent dataset name |
| `input` | Object | No | Input data |
| `expected_output` | Object | No | Expected output for evaluation |

# => "https://cloud.langfuse.com/project/{project_id}/datasets/clx..."


greptile-apps · 2026-02-08T23:09:12Z

docs/API_REFERENCE.md


 **Signature:**

 ```ruby
-trace_url(trace_id) # => String
+trace_url(trace_id) # => String | nil
 ```


Example output format mismatch

The trace_url example and the earlier URL example comment both use {project_id}, while DATASETS.md uses {pid}. Please standardize placeholder naming across docs (e.g., {project_id} everywhere) to prevent inconsistent examples.
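
To make the `String | nil` contract concrete, here is a minimal sketch of a project-scoped URL builder that follows the path layout from the sequence diagram above. The method below is a hypothetical standalone function, not the SDK implementation, and the IDs are made up. The thread does not say when the real helpers return nil; this sketch assumes it happens when the project ID is unavailable.

```ruby
# Hypothetical helper, not the SDK implementation.
# Returns nil when the project ID is unknown (an assumption of this sketch).
def dataset_run_url(base_url:, project_id:, dataset_id:, dataset_run_id:)
  return nil if project_id.nil? || project_id.empty?

  "#{base_url.chomp('/')}/project/#{project_id}/datasets/#{dataset_id}/runs/#{dataset_run_id}"
end

puts dataset_run_url(
  base_url: "https://cloud.langfuse.com/",
  project_id: "proj-123",
  dataset_id: "ds-456",
  dataset_run_id: "run-789"
)
```

Callers would need to handle the nil case before opening or logging the URL.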

Copilot

Pull request overview

Adds dedicated user-facing documentation for dataset management and the experiment runner, and wires these docs into the existing getting started + API reference so the new SDK capabilities are discoverable.

Changes:

Add new standalone guides: DATASETS.md (CRUD + item linking) and EXPERIMENTS.md (runner, evaluators, results)
Update GETTING_STARTED.md and API_REFERENCE.md to link to and document dataset/experiment APIs
Extend API reference “Utilities” docs for project-scoped URL helpers (trace/dataset/dataset run)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/GETTING_STARTED.md | Adds links to the new datasets/experiments guides. |
| docs/EXPERIMENTS.md | Introduces experiment runner guide with examples, parameters, and result object docs. |
| docs/DATASETS.md | Introduces dataset CRUD + dataset item linking guide. |
| docs/API_REFERENCE.md | Adds datasets/experiments sections and updates URL helper reference text/signatures. |


Copilot · 2026-02-08T23:10:24Z

docs/EXPERIMENTS.md

+| `output`          | Object                            | The task's return value     |
+| `expected_output` | Object                            | The item's expected output  |
+| `item`            | DatasetItemClient / ExperimentItem| The original item           |
+| `metadata`        | Hash (optional)                   | Item metadata (only passed if evaluator accepts it) |


The note for the metadata evaluator keyword is slightly inaccurate relative to the current implementation: metadata is only passed when the evaluator explicitly declares a metadata: keyword parameter (not when it only has a **/keyword-rest arg). Consider rewording this to avoid implying that ** will receive metadata.

Suggested change

| `metadata` | Hash (optional) | Item metadata (only passed if evaluator accepts it) |

| `metadata` | Hash (optional) | Item metadata (only passed if the evaluator defines a `metadata:` keyword parameter) |
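
The difference between an explicit `metadata:` keyword and a keyword-rest argument can be observed with Ruby's own `Proc#parameters`. The sketch below is not the SDK's code: `declares_metadata_keyword?` is a hypothetical helper that shows one way such a check could be written.

```ruby
# Hypothetical helper: true only when the callable names a `metadata:`
# keyword (required or optional); a bare **rest does not count.
def declares_metadata_keyword?(callable)
  callable.parameters.any? do |type, name|
    %i[key keyreq].include?(type) && name == :metadata
  end
end

explicit = ->(output:, expected_output:, metadata:) { output == expected_output }
optional = ->(output:, expected_output:, metadata: nil) { output == expected_output }
splat    = ->(output:, expected_output:, **rest) { output == expected_output }

puts declares_metadata_keyword?(explicit) # required keyword is detected
puts declares_metadata_keyword?(optional) # optional keyword is detected
puts declares_metadata_keyword?(splat)    # keyword-rest alone is not
```

Under this kind of check, an evaluator that only accepts `**rest` would never receive `metadata`, which is why the reworded table entry names the keyword explicitly.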

Copilot · 2026-02-08T23:10:24Z

docs/EXPERIMENTS.md

+result.success?  # => true if no error
+result.failed?   # => true if error present


In the ItemResult methods example, success?/failed? are methods on ItemResult, but the snippet calls them on result (which elsewhere refers to ExperimentResult). This example should use an item result variable (e.g., item_result.success?).

Suggested change

result.success? # => true if no error

result.failed? # => true if error present

item_result.success? # => true if no error

item_result.failed? # => true if error present

- list_prompts defaults are nil, not page: 1 / limit: 50
- flush_scores takes no parameters (remove phantom timeout param)
- create_score was missing dataset_run_id and config_id params
- cache_stale_ttl default is 0, not 60
- SDK auto-reads LANGFUSE_PUBLIC_KEY/SECRET_KEY/BASE_URL from env
- Remove nonexistent cache_stale_ttl auto-configuration claim

kxzk added the documentation Improvements or additions to documentation label Feb 8, 2026

Copilot AI review requested due to automatic review settings February 8, 2026 23:06

Copilot started reviewing on behalf of kxzk February 8, 2026 23:07

greptile-apps bot reviewed Feb 8, 2026


Copilot AI reviewed Feb 8, 2026


kxzk merged commit 2369a78 into main Feb 8, 2026
10 checks passed

kxzk deleted the feature/update-docs-for-new-dataset-and-experiment-functionality branch February 8, 2026 23:19
