@kxzk
Collaborator

@kxzk kxzk commented Feb 8, 2026

TL;DR

Add dedicated documentation for the new dataset management and experiment runner features.

Why

The dataset CRUD methods (#41) and experiment runner (#41), plus project-scoped URL helpers (#43), shipped without user-facing guides. This adds standalone docs and updates the API reference so users can discover and use these features.

Checklist

  • Has label
  • Has linked issue
  • Tests added for new behavior
  • Docs updated (if user-facing)

Add dedicated DATASETS.md and EXPERIMENTS.md guides covering CRUD
operations, evaluators, result objects, and end-to-end examples.
Update API_REFERENCE.md with dataset, experiment, and new URL helper
method signatures. Add cross-references in GETTING_STARTED.md.
@kxzk kxzk added the documentation Improvements or additions to documentation label Feb 8, 2026
Copilot AI review requested due to automatic review settings February 8, 2026 23:06
@greptile-apps

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

This PR adds standalone user-facing documentation for dataset management and the experiment runner, plus updates the API reference and getting started guide to make these features discoverable. It also documents the newer project-scoped URL helper methods for traces/datasets/dataset runs.

Most of the content matches the SDK behavior, but there are a couple of places where examples/parameter descriptions are inconsistent with the implementation (notably ItemResult example variable naming and dataset item status type/value), which are likely to break or confuse users if copy-pasted.

Confidence Score: 4/5

  • Safe to merge after fixing a couple of user-facing doc inaccuracies/confusing snippets.
  • Changes are documentation-only and largely consistent with the SDK, but a few example/code snippets are inconsistent with actual method receivers and returned types/values, which can mislead users and cause copy-paste failures.
  • docs/EXPERIMENTS.md and docs/API_REFERENCE.md

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/API_REFERENCE.md | Adds dataset/experiment API sections and project-scoped URL helper docs; minor doc inconsistencies (status type, URL placeholders). |
| docs/DATASETS.md | New dataset management guide with examples; minor inconsistency in URL placeholder naming. |
| docs/EXPERIMENTS.md | New experiment runner guide; has a confusing/incorrect code snippet in ItemResult section that can fail if copy-pasted. |
| docs/GETTING_STARTED.md | Adds links to the new dataset/experiment docs; no issues found. |

Sequence Diagram

sequenceDiagram
  participant U as User code
  participant C as Langfuse::Client
  participant D as DatasetClient
  participant R as ExperimentRunner
  participant A as ApiClient
  participant UI as Langfuse UI

  Note over U,C: Dataset flow
  U->>C: create_dataset(name, ...)
  C->>A: POST /api/public/datasets
  A-->>C: dataset JSON
  C-->>U: DatasetClient

  U->>C: create_dataset_item(dataset_name, input, expected_output, ...)
  C->>A: POST /api/public/dataset-items
  A-->>C: item JSON
  C-->>U: DatasetItemClient

  Note over U,C: Experiment flow
  U->>C: run_experiment(name, task, dataset_name: ...)
  C->>A: GET /api/public/datasets/{name}
  A-->>C: dataset + items
  C->>R: new(...).execute

  loop each item
    R->>R: TracedExecution.call(...){ link_to_dataset_run }
    R->>C: create_dataset_run_item(dataset_item_id, run_name, trace_id, ...)
    C->>A: POST /api/public/dataset-run-items
    A-->>C: datasetRunItem JSON
    R->>R: task.call(item)
    R->>C: create_score(... trace_id ...)
    C->>A: POST /api/public/scores (batched)
  end

  R-->>U: ExperimentResult (dataset_run_url)
  U->>C: dataset_run_url(dataset_id, dataset_run_id)
  C-->>U: https://.../project/{project_id}/datasets/{dataset_id}/runs/{dataset_run_id}
  U->>UI: Open URL

@greptile-apps greptile-apps bot left a comment


4 files reviewed, 5 comments


Comment on lines +233 to +235
}

pass_rate = ->(item_results:) {

Incorrect API call in example

In the end-to-end example, pass_rate uses item_results.count(&:success?). success? is a method on ItemResult, not ExperimentResult, and each element of item_results is an ItemResult, so this call is correct as written and will not raise NoMethodError. The actual issue is earlier in the doc, where result.success? is shown under ItemResult.

Suggested change
}
pass_rate = ->(item_results:) {
passed = item_results.count(&:success?)
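
For reference, a minimal sketch of how a run-level evaluator like pass_rate could count successes. `FakeItemResult` is a hypothetical stand-in for the SDK's ItemResult (only `success?`/`failed?` are modeled), and returning a bare Float is an assumption; the return shape the runner actually expects is not shown in this thread.

```ruby
# Hypothetical stand-in for the SDK's ItemResult; only the members used here.
FakeItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end

  def failed?
    !success?
  end
end

# Run-level evaluator: fraction of items that finished without an error.
# Returning a Float is an assumption made for this sketch.
pass_rate = ->(item_results:) {
  return 0.0 if item_results.empty?

  passed = item_results.count(&:success?)
  passed.to_f / item_results.size
}

item_results = [
  FakeItemResult.new(output: "Paris"),
  FakeItemResult.new(output: "Berlin"),
  FakeItemResult.new(error: StandardError.new("timeout")),
  FakeItemResult.new(output: "Rome")
]

rate = pass_rate.call(item_results: item_results)
puts rate # => 0.75
```

Here `success?` is sent to each element of the array, which is why the call in the guide works even though the experiment-level result does not define it.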

Comment on lines +201 to +205

**Methods:**

```ruby
result.success? # => true if no error

Wrong receiver for methods

In the ItemResult section, the snippet uses result.success? / result.failed?, but in this context result is the ExperimentResult variable used throughout the doc. This is likely to confuse users and will fail if copy-pasted. Consider renaming the variable to item_result (or showing item_result = result.item_results.first).
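
A small sketch of the suggested rename, assuming the experiment result exposes its per-item results as `item_results`. `DemoExperimentResult` and `DemoItemResult` are hypothetical stand-ins, not the SDK classes, and the field names are chosen only for illustration.

```ruby
# Hypothetical stand-ins; the real SDK objects carry more fields.
DemoItemResult = Struct.new(:output, :error, keyword_init: true) do
  def success?
    error.nil?
  end

  def failed?
    !success?
  end
end

DemoExperimentResult = Struct.new(:name, :item_results, keyword_init: true)

result = DemoExperimentResult.new(
  name: "qa-baseline",
  item_results: [
    DemoItemResult.new(output: "42"),
    DemoItemResult.new(error: RuntimeError.new("boom"))
  ]
)

# Pull out a single item result before calling the per-item predicates.
item_result = result.item_results.first
puts item_result.success? # true when no error was recorded
puts item_result.failed?  # true when an error is present

# In this sketch the experiment-level object has no success? method,
# which is the copy-paste failure the comment describes.
puts result.respond_to?(:success?)
```

Using a distinct variable name keeps `result` reserved for the experiment-level object throughout the guide.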

Comment on lines +641 to +645

**Example:**

```ruby
dataset = client.create_dataset(

Type mismatch for status

create_dataset_item docs list status as a Symbol (:active/:archived), but DatasetItemClient#status is a String ("ACTIVE"/"ARCHIVED") and the API payload likely expects those canonical strings. If the public API actually accepts lowercase symbols, the docs should also mention the normalized server values; otherwise, update the parameter type/value examples to match the implementation.
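
If the docs keep the Symbol form, one way to reconcile the two representations is to normalize before building the payload. This is only a sketch: `normalize_status` and `VALID_STATUSES` are hypothetical names, and whether the SDK performs any such normalization is exactly the open question in this comment.

```ruby
# Hypothetical helper, not part of the SDK: maps :active / "active" / "ACTIVE"
# to the canonical uppercase strings reported by DatasetItemClient#status.
VALID_STATUSES = %w[ACTIVE ARCHIVED].freeze

def normalize_status(status)
  normalized = status.to_s.upcase
  unless VALID_STATUSES.include?(normalized)
    raise ArgumentError, "unknown dataset item status: #{status.inspect}"
  end

  normalized
end

puts normalize_status(:active)    # Symbol form used in the docs
puts normalize_status("ARCHIVED") # canonical String form
```

Either way, the parameter table should state which form callers pass and which form they get back.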

Comment on lines +75 to +82
)
```

| Parameter | Type | Required | Description |
| ----------------------- | ------ | -------- | ------------------------------------------ |
| `dataset_name` | String | Yes | Parent dataset name |
| `input` | Object | No | Input data |
| `expected_output` | Object | No | Expected output for evaluation |

Inconsistent URL placeholder

This example uses {pid} (dataset.url # => "https://.../project/{pid}/datasets/..."), while other docs use {project_id}. Keeping a single placeholder avoids copy/paste confusion when users search/replace.

Suggested change
)
```
| Parameter | Type | Required | Description |
| ----------------------- | ------ | -------- | ------------------------------------------ |
| `dataset_name` | String | Yes | Parent dataset name |
| `input` | Object | No | Input data |
| `expected_output` | Object | No | Expected output for evaluation |
# => "https://cloud.langfuse.com/project/{project_id}/datasets/clx..."


Comment on lines 1016 to 1021

**Signature:**

```ruby
trace_url(trace_id) # => String
trace_url(trace_id) # => String | nil
```

Example output format mismatch

trace_url and the earlier URL example comment both use {project_id}, while DATASETS.md uses {pid}. Please standardize placeholder naming across docs (e.g., {project_id} everywhere) to prevent inconsistent examples.
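
To make the placeholder convention concrete, here is a sketch of how a project-scoped URL could be assembled with the {project_id} naming. `build_dataset_run_url` and `BASE_URL` are hypothetical names, the path layout is taken from the sequence diagram above, and returning nil when no project ID is available is an assumption about why the signature is `String | nil`.

```ruby
# Hypothetical helper for illustration; not the SDK's dataset_run_url.
BASE_URL = "https://cloud.langfuse.com"

def build_dataset_run_url(project_id:, dataset_id:, dataset_run_id:)
  # Assumption: without a project ID there is no project-scoped URL to return.
  return nil if project_id.nil?

  "#{BASE_URL}/project/#{project_id}/datasets/#{dataset_id}/runs/#{dataset_run_id}"
end

puts build_dataset_run_url(project_id: "proj_1", dataset_id: "ds_1", dataset_run_id: "run_1")
puts build_dataset_run_url(project_id: nil, dataset_id: "ds_1", dataset_run_id: "run_1").inspect
```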


Copilot AI left a comment


Pull request overview

Adds dedicated user-facing documentation for dataset management and the experiment runner, and wires these docs into the existing getting started + API reference so the new SDK capabilities are discoverable.

Changes:

  • Add new standalone guides: DATASETS.md (CRUD + item linking) and EXPERIMENTS.md (runner, evaluators, results)
  • Update GETTING_STARTED.md and API_REFERENCE.md to link to and document dataset/experiment APIs
  • Extend API reference “Utilities” docs for project-scoped URL helpers (trace/dataset/dataset run)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/GETTING_STARTED.md | Adds links to the new datasets/experiments guides. |
| docs/EXPERIMENTS.md | Introduces experiment runner guide with examples, parameters, and result object docs. |
| docs/DATASETS.md | Introduces dataset CRUD + dataset item linking guide. |
| docs/API_REFERENCE.md | Adds datasets/experiments sections and updates URL helper reference text/signatures. |


| `output` | Object | The task's return value |
| `expected_output` | Object | The item's expected output |
| `item` | DatasetItemClient / ExperimentItem | The original item |
| `metadata` | Hash (optional) | Item metadata (only passed if evaluator accepts it) |

Copilot AI Feb 8, 2026


The note for the metadata evaluator keyword is slightly inaccurate relative to the current implementation: metadata is only passed when the evaluator explicitly declares a metadata: keyword parameter (not when it only has a **/keyword-rest arg). Consider rewording this to avoid implying that ** will receive metadata.

Suggested change
| `metadata` | Hash (optional) | Item metadata (only passed if evaluator accepts it) |
| `metadata` | Hash (optional) | Item metadata (only passed if the evaluator defines a `metadata:` keyword parameter) |
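
The distinction can be illustrated with Ruby's `Proc#parameters`, which reports explicit keywords as `:key`/`:keyreq` and a keyword-rest argument as `:keyrest`. The helpers below (`accepts_metadata?`, `call_evaluator`) are hypothetical and only pass a subset of the documented keywords; they sketch the behavior described here, not the SDK's actual dispatch code.

```ruby
# Hypothetical helper: true only when the evaluator names `metadata:` explicitly.
def accepts_metadata?(evaluator)
  evaluator.parameters.any? do |type, name|
    name == :metadata && %i[key keyreq].include?(type)
  end
end

# Hypothetical dispatcher: adds metadata only for evaluators that declare it.
def call_evaluator(evaluator, output:, expected_output:, metadata: nil)
  kwargs = { output: output, expected_output: expected_output }
  kwargs[:metadata] = metadata if accepts_metadata?(evaluator)
  evaluator.call(**kwargs)
end

explicit = ->(output:, expected_output:, metadata:) { metadata }
splat    = ->(output:, expected_output:, **rest) { rest }

meta = { lang: "en" }
p call_evaluator(explicit, output: "a", expected_output: "a", metadata: meta)
p call_evaluator(splat, output: "a", expected_output: "a", metadata: meta)
```

In this sketch the second evaluator receives an empty `rest` hash, which matches the point that a bare `**` does not receive metadata.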

Comment on lines +205 to +206
result.success? # => true if no error
result.failed? # => true if error present

Copilot AI Feb 8, 2026


In the ItemResult methods example, success?/failed? are methods on ItemResult, but the snippet calls them on result (which elsewhere refers to ExperimentResult). This example should use an item result variable (e.g., item_result.success?).

Suggested change
result.success? # => true if no error
result.failed? # => true if error present
item_result.success? # => true if no error
item_result.failed? # => true if error present

- list_prompts defaults are nil, not page: 1 / limit: 50
- flush_scores takes no parameters (remove phantom timeout param)
- create_score was missing dataset_run_id and config_id params
- cache_stale_ttl default is 0, not 60
- SDK auto-reads LANGFUSE_PUBLIC_KEY/SECRET_KEY/BASE_URL from env
- Remove nonexistent cache_stale_ttl auto-configuration claim
@kxzk kxzk merged commit 2369a78 into main Feb 8, 2026
10 checks passed
@kxzk kxzk deleted the feature/update-docs-for-new-dataset-and-experiment-functionality branch February 8, 2026 23:19

Labels

documentation Improvements or additions to documentation


1 participant