docs: add dataset and experiment documentation #45
Merged: kxzk merged 2 commits into `main` from `feature/update-docs-for-new-dataset-and-experiment-functionality` (Feb 8, 2026)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,8 @@ Complete method reference for the Langfuse Ruby SDK. | |
| - [Prompt Management](#prompt-management) | ||
| - [Tracing & Observability](#tracing--observability) | ||
| - [Scoring](#scoring) | ||
| - [Datasets](#datasets) | ||
| - [Experiments](#experiments) | ||
| - [Attribute Propagation](#attribute-propagation) | ||
| - [Types](#types) | ||
| - [Exceptions](#exceptions) | ||
|
|
@@ -41,7 +43,7 @@ Block receives a configuration object with these properties: | |
| | `cache_backend` | Symbol | No | `:memory` | `:memory` or `:rails` | | ||
| | `cache_lock_timeout` | Integer | No | `10` | Lock timeout (seconds) | | ||
| | `cache_stale_while_revalidate` | Boolean | No | `false` | Enable stale-while-revalidate | | ||
| | `cache_stale_ttl` | Integer | No | `60` when SWR is enabled | Stale TTL (seconds) | | ||
| | `cache_stale_ttl` | Integer | No | `0` | Stale TTL (seconds) | | ||
| | `cache_refresh_threads` | Integer | No | `5` | Background refresh threads | | ||
| | `batch_size` | Integer | No | `50` | Score batch size | | ||
| | `flush_interval` | Integer | No | `10` | Score flush interval (seconds) | | ||
|
|
@@ -218,15 +220,15 @@ List all prompts in the project. | |
| **Signature:** | ||
|
|
||
| ```ruby | ||
| list_prompts(page: 1, limit: 50) | ||
| list_prompts(page: nil, limit: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Default | Description | | ||
| | --------- | ------- | -------- | ------- | ---------------- | | ||
| | `page` | Integer | No | `1` | Page number | | ||
| | `limit` | Integer | No | `50` | Results per page | | ||
| | `page` | Integer | No | - | Page number | | ||
| | `limit` | Integer | No | - | Results per page | | ||
|
|
||
| **Returns:** Array of prompt hashes | ||
|
|
||
|
|
@@ -520,7 +522,8 @@ Create a score for a trace or observation. | |
| **Signature:** | ||
|
|
||
| ```ruby | ||
| create_score(name:, value:, trace_id: nil, observation_id: nil, comment: nil, metadata: nil, data_type: :numeric) | ||
| create_score(name:, value:, trace_id: nil, observation_id: nil, comment: nil, metadata: nil, | ||
| data_type: :numeric, dataset_run_id: nil, config_id: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
@@ -534,6 +537,8 @@ create_score(name:, value:, trace_id: nil, observation_id: nil, comment: nil, me | |
| | `comment` | String | No | Score comment | | ||
| | `metadata` | Hash | No | Additional metadata | | ||
| | `data_type` | Symbol | No | `:numeric`, `:boolean`, or `:categorical` | | ||
| | `dataset_run_id` | String | No | Dataset run ID to associate with | | ||
| | `config_id` | String | No | Score config ID | | ||
|
|
||
| **Note:** Must provide at least one of `trace_id` or `observation_id`. | ||
|
|
||
|
|
@@ -586,15 +591,9 @@ Immediately flush all queued scores to API. | |
| **Signature:** | ||
|
|
||
| ```ruby | ||
| flush_scores(timeout: 30) | ||
| flush_scores | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Default | Description | | ||
| | --------- | ------- | -------- | ------- | ----------------------- | | ||
| | `timeout` | Integer | No | `30` | Flush timeout (seconds) | | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
|
|
@@ -615,6 +614,247 @@ Langfuse.flush_scores | |
|
|
||
| See [SCORING.md](SCORING.md) for complete guide. | ||
|
|
||
| ## Datasets | ||
|
|
||
| ### `Client#create_dataset` | ||
|
|
||
| Create a new dataset. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| create_dataset(name:, description: nil, metadata: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | ------------- | ------ | -------- | -------------------------- | | ||
| | `name` | String | Yes | Dataset name | | ||
| | `description` | String | No | Human-readable description | | ||
| | `metadata` | Hash | No | Arbitrary key-value pairs | | ||
|
|
||
| **Returns:** `DatasetClient` | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| dataset = client.create_dataset( | ||
| name: "qa-eval", | ||
| description: "QA evaluation set", | ||
| metadata: { domain: "support" } | ||
| ) | ||
| ``` | ||
|
|
||
| ### `Client#get_dataset` | ||
|
|
||
| Fetch a dataset by name. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| get_dataset(name) # => DatasetClient | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ------ | -------- | -------------------------------------------------------- | | ||
| | `name` | String | Yes | Dataset name (supports folder paths like "eval/qa-set") | | ||
|
|
||
| **Returns:** `DatasetClient` | ||
|
|
||
| **Raises:** `NotFoundError` if the dataset doesn't exist | ||
|
|
||
| ### `Client#list_datasets` | ||
|
|
||
| List all datasets in the project. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| list_datasets(page: nil, limit: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | --------- | ------- | -------- | ---------------- | | ||
| | `page` | Integer | No | Page number | | ||
| | `limit` | Integer | No | Results per page | | ||
|
|
||
| **Returns:** `Array<Hash>` of dataset metadata | ||
|
|
||
| ### `Client#create_dataset_item` | ||
|
|
||
| Create a new dataset item. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| create_dataset_item(dataset_name:, input: nil, expected_output: nil, | ||
| metadata: nil, id: nil, source_trace_id: nil, | ||
| source_observation_id: nil, status: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | ----------------------- | ------ | -------- | ---------------------------------------- | | ||
| | `dataset_name` | String | Yes | Parent dataset name | | ||
| | `input` | Object | No | Input data | | ||
| | `expected_output` | Object | No | Expected output for evaluation | | ||
| | `metadata` | Hash | No | Arbitrary metadata | | ||
| | `id` | String | No | Explicit ID (enables upsert) | | ||
| | `source_trace_id` | String | No | Link to source trace | | ||
| | `source_observation_id` | String | No | Link to source observation | | ||
| | `status` | Symbol | No | `:active` or `:archived` | | ||
|
|
||
| **Returns:** `DatasetItemClient` | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| item = client.create_dataset_item( | ||
| dataset_name: "qa-eval", | ||
| input: { question: "What is Ruby?" }, | ||
| expected_output: { answer: "A programming language" } | ||
| ) | ||
| ``` | ||
|
|
||
| ### `Client#get_dataset_item` | ||
|
|
||
| Fetch a dataset item by ID. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| get_dataset_item(id) # => DatasetItemClient | ||
| ``` | ||
|
|
||
| **Raises:** `NotFoundError` if the item doesn't exist | ||
|
|
||
| ### `Client#list_dataset_items` | ||
|
|
||
| List items in a dataset. Auto-paginates when `page` is nil. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| list_dataset_items(dataset_name:, page: nil, limit: nil, | ||
| source_trace_id: nil, source_observation_id: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | ----------------------- | ------- | -------- | ---------------------------------------- | | ||
| | `dataset_name` | String | Yes | Dataset name | | ||
| | `page` | Integer | No | Page number (nil = fetch all pages) | | ||
| | `limit` | Integer | No | Results per page | | ||
| | `source_trace_id` | String | No | Filter by source trace | | ||
| | `source_observation_id` | String | No | Filter by source observation | | ||
|
|
||
| **Returns:** `Array<DatasetItemClient>` | ||
|
|
||
| ### `Client#delete_dataset_item` | ||
|
|
||
| Delete a dataset item by ID. Idempotent (404 treated as success). | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| delete_dataset_item(id) # => nil | ||
| ``` | ||
|
|
||
| ### `Client#create_dataset_run_item` | ||
|
|
||
| Link a trace to a dataset item within a named run. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| create_dataset_run_item(dataset_item_id:, run_name:, trace_id: nil, | ||
| observation_id: nil, metadata: nil, run_description: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | ----------------- | ------ | -------- | ------------------- | | ||
| | `dataset_item_id` | String | Yes | Dataset item ID | | ||
| | `run_name` | String | Yes | Run name | | ||
| | `trace_id` | String | No | Trace ID | | ||
| | `observation_id` | String | No | Observation ID | | ||
| | `metadata` | Hash | No | Optional metadata | | ||
| | `run_description` | String | No | Run description | | ||
|
|
||
| **Returns:** `Hash` (created dataset run item data) | ||
|
|
||
| See [DATASETS.md](DATASETS.md) for complete guide. | ||
|
|
||
| ## Experiments | ||
|
|
||
| ### `Client#run_experiment` | ||
|
|
||
| Run an experiment against a named dataset or local data. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| run_experiment(name:, task:, data: nil, dataset_name: nil, description: nil, | ||
| evaluators: [], run_evaluators: [], metadata: nil, run_name: nil) | ||
| ``` | ||
|
|
||
| **Parameters:** | ||
|
|
||
| | Parameter | Type | Required | Description | | ||
| | ---------------- | ------------- | -------- | ----------------------------------------------- | | ||
| | `name` | String | Yes | Experiment name | | ||
| | `task` | Proc | Yes | Callable receiving item, returning output | | ||
| | `dataset_name` | String | No* | Dataset to run against | | ||
| | `data` | Array | No* | Local data items (hashes or DatasetItemClients) | | ||
| | `description` | String | No | Run description | | ||
| | `evaluators` | Array\<Proc\> | No | Item-level evaluators | | ||
| | `run_evaluators` | Array\<Proc\> | No | Run-level evaluators | | ||
| | `metadata` | Hash | No | Metadata attached to each trace | | ||
| | `run_name` | String | No | Explicit run name (default: "name - timestamp") | | ||
|
|
||
| \* Provide exactly one of `dataset_name` or `data`. | ||
|
|
||
| **Returns:** `ExperimentResult` | ||
|
|
||
| **Raises:** `ArgumentError` if both or neither of `data`/`dataset_name` provided | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| result = client.run_experiment( | ||
| name: "qa-v1", | ||
| dataset_name: "qa-eval", | ||
| task: ->(item) { my_llm_call(item.input) }, | ||
| evaluators: [my_evaluator], | ||
| metadata: { model: "gpt-4o" } | ||
| ) | ||
| ``` | ||
|
|
||
| ### `DatasetClient#run_experiment` | ||
|
|
||
| Run an experiment against this dataset's items. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| dataset.run_experiment(name:, task:, description: nil, evaluators: [], | ||
| run_evaluators: [], metadata: nil, run_name: nil) | ||
| ``` | ||
|
|
||
| Same parameters as `Client#run_experiment` minus `dataset_name` and `data`. | ||
|
|
||
| **Returns:** `ExperimentResult` | ||
|
|
||
| See [EXPERIMENTS.md](EXPERIMENTS.md) for complete guide. | ||
|
|
||
| ## Attribute Propagation | ||
|
|
||
| ### `Langfuse.propagate_attributes` | ||
|
|
@@ -769,19 +1009,58 @@ See [ERROR_HANDLING.md](ERROR_HANDLING.md) for complete guide. | |
|
|
||
| ### `Client#trace_url` | ||
|
|
||
| Generate Langfuse UI URL for a trace. | ||
| Generate a project-scoped Langfuse UI URL for a trace. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| trace_url(trace_id) # => String | ||
| trace_url(trace_id) # => String | nil | ||
| ``` | ||
|
Review comment on lines 1013 to 1018: Example output format mismatch
|
||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| url = client.trace_url("abc123") | ||
| # => "https://cloud.langfuse.com/traces/abc123" | ||
| # => "https://cloud.langfuse.com/project/{project_id}/traces/abc123" | ||
| ``` | ||
|
|
||
| Returns `nil` if the project ID cannot be fetched. | ||
|
|
||
| ### `Client#dataset_url` | ||
|
|
||
| Generate a project-scoped Langfuse UI URL for a dataset. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| dataset_url(dataset_id) # => String | nil | ||
| ``` | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| url = client.dataset_url("dataset-uuid") | ||
| # => "https://cloud.langfuse.com/project/{project_id}/datasets/dataset-uuid" | ||
| ``` | ||
|
|
||
| ### `Client#dataset_run_url` | ||
|
|
||
| Generate a project-scoped Langfuse UI URL for a dataset run. | ||
|
|
||
| **Signature:** | ||
|
|
||
| ```ruby | ||
| dataset_run_url(dataset_id:, dataset_run_id:) # => String | nil | ||
| ``` | ||
|
|
||
| **Example:** | ||
|
|
||
| ```ruby | ||
| url = client.dataset_run_url( | ||
| dataset_id: "dataset-uuid", | ||
| dataset_run_id: "run-uuid" | ||
| ) | ||
| # => "https://cloud.langfuse.com/project/{project_id}/datasets/dataset-uuid/runs/run-uuid" | ||
| ``` | ||
|
|
||
| ### `Langfuse.shutdown` | ||
|
|
@@ -830,4 +1109,6 @@ Langfuse.force_flush(timeout: 10) | |
| - [PROMPTS.md](PROMPTS.md) - Prompt management | ||
| - [TRACING.md](TRACING.md) - Tracing patterns | ||
| - [SCORING.md](SCORING.md) - Scoring guide | ||
| - [DATASETS.md](DATASETS.md) - Dataset management | ||
| - [EXPERIMENTS.md](EXPERIMENTS.md) - Experiment runner | ||
| - [ERROR_HANDLING.md](ERROR_HANDLING.md) - Exception handling | ||
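
The `run_experiment` reference in this diff says to provide exactly one of `dataset_name` or `data`, and that an `ArgumentError` is raised if both or neither are given. A minimal sketch of that rule follows. The helper name `resolve_experiment_source` and its return shape are hypothetical, chosen only for illustration; this is not the SDK's actual implementation.

```ruby
# Hypothetical stand-in for the "exactly one of data / dataset_name" check
# described in the run_experiment docs. Not the SDK's real code.
def resolve_experiment_source(data: nil, dataset_name: nil)
  # Both nil or both present means the caller broke the exactly-one rule.
  if data.nil? == dataset_name.nil?
    raise ArgumentError, "Provide exactly one of `data` or `dataset_name`"
  end

  # Tag the source so later code can branch on where the items come from.
  data ? [:local, data] : [:dataset, dataset_name]
end
```

Comparing the two `nil?` results covers both failure cases (both given, neither given) in a single condition.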
Review comment: Type mismatch for status
The `create_dataset_item` docs list `status` as a `Symbol` (`:active`/`:archived`), but `DatasetItemClient#status` is a String (`"ACTIVE"`/`"ARCHIVED"`), and the API payload likely expects those canonical strings. If the public API actually accepts lowercase symbols, the docs should also mention the normalized server values; otherwise, update the parameter type/value examples to match the implementation.
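
One way to reconcile the two forms raised in this comment is to normalize whatever the caller passes into the uppercase string form before it is sent. The sketch below assumes the server values are exactly `"ACTIVE"` and `"ARCHIVED"`, as the comment suggests. `normalize_status` is a hypothetical helper and not part of the SDK.

```ruby
# Hypothetical helper: accepts :active, "active", or "ACTIVE" and returns
# the uppercase string form the review comment describes ("ACTIVE"/"ARCHIVED").
def normalize_status(status)
  return nil if status.nil? # status is optional in create_dataset_item

  valid = %w[ACTIVE ARCHIVED] # assumed canonical server values
  normalized = status.to_s.upcase
  unless valid.include?(normalized)
    raise ArgumentError, "status must be one of #{valid.join(', ')}"
  end

  normalized
end
```

If the SDK already performs a conversion like this internally, the docs could keep the symbol form and note the normalized values alongside it.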