| layout | default |
|---|---|
| title | Langfuse Tutorial - Chapter 3: Prompt Management |
| nav_order | 3 |
| has_children | false |
| parent | Langfuse Tutorial |
Welcome to Chapter 3: Prompt Management. In this part of the Langfuse Tutorial (LLM Observability, Evaluation, and Prompt Operations), you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Version, release, and A/B test prompts directly from Langfuse.
Prompts are the most frequently changed part of any LLM application. A small wording tweak can dramatically affect quality, cost, and safety. Langfuse lets you store prompts centrally, version them automatically, attach labels for release management, and fetch them at runtime -- all without redeploying your application. In this chapter you will:
- Understand the prompt lifecycle from creation to monitoring.
- Work with both Chat and Text prompt types.
- Use variables and templates effectively.
- Fetch and cache prompts in your application.
- Monitor prompt performance across versions.
- Walk through a complete prompt iteration workflow.
- Learn prompt engineering best practices within Langfuse.
A prompt in Langfuse goes through a clear lifecycle. Understanding these stages helps you manage changes safely:
graph LR
A[Create Prompt] --> B[New Version<br/>auto-incremented]
B --> C[Assign Label<br/>e.g. staging]
C --> D[Deploy to App<br/>SDK fetches by label]
D --> E[Monitor Performance<br/>scores, cost, latency]
E -->|iterate| B
style A fill:#e0f2fe,stroke:#0284c7
style B fill:#fef3c7,stroke:#d97706
style C fill:#f3e8ff,stroke:#9333ea
style D fill:#dcfce7,stroke:#16a34a
style E fill:#fce7f3,stroke:#db2777
- Create -- Define a prompt in the Langfuse UI or via the API. Give it a descriptive name like support_reply or summarizer_v2.
- Version -- Every edit creates a new, immutable version. Versions are auto-incremented integers (1, 2, 3, ...).
- Label -- Assign labels like production, staging, or beta to specific versions. Labels are movable pointers -- you can relabel instantly to roll back.
- Deploy -- Your application fetches the prompt by name and label at runtime. No redeploy needed.
- Monitor -- Attach the prompt_version to your traces and use Langfuse analytics to compare quality, cost, and latency across versions.
Langfuse supports two prompt types. Choosing the right one depends on how your LLM expects its input.
A text prompt is a single string with optional {{variables}}. It is ideal for completion-style models or when you build your own message array.
You are a helpful customer support agent for {{company_name}}.
The customer's name is {{customer_name}} and their issue is: {{issue}}.
Respond politely and provide a clear solution based on the following context:
{{context}}
When you call prompt.compile(...), Langfuse returns the rendered string.
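As a minimal sketch of that flow -- the prompt name support_reply_text and its content are illustrative, not a prompt used elsewhere in this chapter:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Create a text prompt -- a single string template (this becomes version 1)
langfuse.create_prompt(
    name="support_reply_text",  # illustrative name
    type="text",
    prompt=(
        "You are a helpful customer support agent for {{company_name}}.\n"
        "The customer's name is {{customer_name}} and their issue is: {{issue}}.\n"
        "Respond politely based on the following context:\n{{context}}"
    ),
    labels=["staging"],
)

# Fetch and render it -- compile() returns a plain string for text prompts
prompt = langfuse.get_prompt("support_reply_text", label="staging")
rendered = prompt.compile(
    company_name="Acme Corp",
    customer_name="Alex",
    issue="billing error on invoice #789",
    context="We refunded invoice #789 and updated the payment method on file.",
)
print(rendered)
```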
A chat prompt is an array of messages, each with a role and content. This maps directly to the format expected by OpenAI, Anthropic, and most chat-based APIs.
[
{
"role": "system",
"content": "You are a support agent for {{company_name}}. Be concise and helpful."
},
{
"role": "user",
"content": "Hi, my name is {{customer_name}}. I have an issue with {{issue}}.\n\nContext: {{context}}"
}
]
When you call prompt.compile(...) on a chat prompt, Langfuse returns a list of message dictionaries ready to pass to your LLM client.
| Use case | Prompt type |
|---|---|
| OpenAI / Anthropic chat completions | Chat |
| Single-prompt completions or embeddings | Text |
| Complex multi-turn templates | Chat |
| Simple string interpolation | Text |
You can create prompts through the UI or the SDK.
- Navigate to Prompts in the sidebar.
- Click New Prompt.
- Choose Chat or Text type.
- Enter a name (e.g., support_reply).
- Write your template with {{variable}} placeholders.
- Click Save -- this creates version 1.
from langfuse import Langfuse
langfuse = Langfuse()
# Create a chat prompt
langfuse.create_prompt(
name="support_reply",
type="chat",
prompt=[
{
"role": "system",
"content": "You are a support agent for {{company_name}}. Be helpful and concise.",
},
{
"role": "user",
"content": "Customer {{customer_name}} asks: {{issue}}\n\nContext: {{context}}",
},
],
labels=["staging"], # immediately label this version
config={
"model": "gpt-4o-mini",
"temperature": 0.3,
"max_tokens": 500,
},
)
The optional config object lets you store model parameters alongside the prompt. Your application can read these at runtime to stay in sync.
Variables are the bridge between your static prompt template and the dynamic data in each request. They use double-curly-brace syntax: {{variable_name}}.
When you write a prompt template, any {{token}} becomes a variable. Langfuse automatically detects them and lists them in the UI.
The compile method replaces variables with the values you provide:
prompt = langfuse.get_prompt("support_reply", label="production")
messages = prompt.compile(
customer_name="Alex",
issue="billing error on invoice #789",
context="We refunded invoice #789 and updated the payment method on file.",
company_name="Acme Corp",
)
# messages is now a list of dicts ready for your LLM client
- Name variables descriptively: {{customer_name}} is better than {{name}}.
- Document expected types: In the prompt description field, note whether a variable expects a string, a list, or structured data.
- Provide defaults in your code: If a variable might be missing, handle it gracefully before calling compile (see the sketch after this list).
- Avoid secrets: Never pass API keys, passwords, or tokens as prompt variables. They would be stored in Langfuse.
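A minimal sketch of the "provide defaults in your code" advice. The wrapper function and the fallback values are illustrative, not part of the Langfuse API:

```python
from langfuse import Langfuse

langfuse = Langfuse()

def compile_support_reply(prompt, **variables):
    """Fill in safe fallbacks for optional variables before calling compile().

    The defaults below are illustrative -- adjust them to your own template.
    """
    defaults = {
        "company_name": "Acme Corp",
        "context": "No additional context available.",
    }
    return prompt.compile(**{**defaults, **variables})

prompt = langfuse.get_prompt("support_reply", label="production")
messages = compile_support_reply(
    prompt,
    customer_name="Alex",
    issue="billing error on invoice #789",
)
```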
The config field on a prompt is a free-form dictionary. A common pattern is to store model settings there:
prompt = langfuse.get_prompt("support_reply", label="production")
# Read model config from the prompt
model = prompt.config.get("model", "gpt-4o-mini")
temperature = prompt.config.get("temperature", 0.7)
max_tokens = prompt.config.get("max_tokens", 500)
messages = prompt.compile(customer_name="Alex", issue="billing", context="...", company_name="Acme")
resp = client.chat.completions.create(
model=model,
temperature=temperature,
max_tokens=max_tokens,
messages=messages,
)
This way, non-engineers on your team can adjust model parameters from the Langfuse UI without touching code.
Every time you save a prompt (in the UI or via create_prompt), Langfuse creates a new immutable version. Versions are integers that increment automatically. You can never edit an existing version -- only create a new one.
Labels are movable pointers to versions. Think of them like Git tags that you can reassign:
- production -- the version your live application uses.
- staging -- the version being tested before promotion.
- beta -- an experimental version for a subset of users.
- latest -- Langfuse automatically assigns this to the newest version.
To promote a staging prompt to production, simply move the production label to the staging version. Instant rollout, instant rollback.
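Because versions are immutable, you can also pin an exact version when you need reproducibility, for example while investigating a regression. A minimal sketch; the version number is illustrative:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Normal operation: follow the movable "production" label
prompt = langfuse.get_prompt("support_reply", label="production")

# Reproducing an incident or comparing against an older prompt:
# pin an exact, immutable version (the number 3 is illustrative)
old_prompt = langfuse.get_prompt("support_reply", version=3)

print(prompt.version, old_prompt.version)
```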
Always record which prompt version generated each response. This makes performance comparison possible:
prompt = langfuse.get_prompt("support_reply", label="production")
messages = prompt.compile(customer_name="Alex", issue="billing", context="...", company_name="Acme")
trace = langfuse.trace(name="support-query", user_id="user_123")
span = trace.span(
name="support-llm",
input=messages,
metadata={
"prompt_name": prompt.name,
"prompt_version": prompt.version,
"prompt_label": "production",
},
)
Want to test two prompt versions head-to-head? Assign different labels and split traffic in your code:
import hashlib
def get_prompt_label(user_id: str) -> str:
"""Deterministic split: same user always gets the same variant."""
hash_val = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
return "production" if hash_val % 2 == 0 else "beta"
label = get_prompt_label(user_id)
prompt = langfuse.get_prompt("support_reply", label=label)
# Tag the trace so you can filter by variant later
trace = langfuse.trace(
name="support-query",
user_id=user_id,
tags=[f"prompt-variant:{label}"],
)
Then in the Langfuse dashboard, filter traces by the prompt-variant:production and prompt-variant:beta tags to compare scores, latency, and cost side by side.
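For that comparison to be meaningful, each trace also needs a score to aggregate. Continuing the split-traffic snippet above (trace and label come from it), here is a hedged sketch using the v2-style SDK shown in this chapter; the metric name "resolved" and its 0/1 value are illustrative:

```python
# Continuing the split-traffic example: `trace` and `label` come from that snippet.
ticket_was_resolved = True  # placeholder -- derive this from your own success criterion

# Attach a score so both variants can be compared on the same metric in the dashboard.
trace.score(
    name="resolved",                        # illustrative metric name
    value=1 if ticket_was_resolved else 0,
    comment=f"variant={label}",
)
langfuse.flush()
```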
Fetching prompts from Langfuse on every request adds a network call. The SDK includes built-in caching to minimize this overhead.
The Python SDK caches prompts in memory with a default TTL (time-to-live). When you call get_prompt, it returns the cached version if the TTL has not expired.
# Cache for 5 minutes (300 seconds)
prompt = langfuse.get_prompt("support_reply", label="production", cache_ttl_seconds=300)
# Force a fresh fetch (bypass cache)
prompt = langfuse.get_prompt("support_reply", label="production", cache_ttl_seconds=0)| Environment | Recommended TTL | Reason |
|---|---|---|
| Development | 0 (no cache) | See prompt changes immediately. |
| Staging | 30-60 seconds | Quick iteration, but reduce API calls. |
| Production | 300-600 seconds | Stable prompts; minimize latency. |
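One way to apply this table is to pick the TTL from an environment setting at startup. The APP_ENV variable and the mapping below are assumptions for illustration, not Langfuse conventions:

```python
import os

from langfuse import Langfuse

langfuse = Langfuse()

# Map the table above to a runtime setting; APP_ENV is an assumed env var.
TTL_BY_ENV = {"development": 0, "staging": 60, "production": 300}
cache_ttl = TTL_BY_ENV.get(os.getenv("APP_ENV", "development"), 0)

prompt = langfuse.get_prompt(
    "support_reply",
    label="production",
    cache_ttl_seconds=cache_ttl,
)
```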
For latency-sensitive applications, fetch your prompts during initialization:
# At application startup
prompts = {
"support_reply": langfuse.get_prompt("support_reply", label="production"),
"summarizer": langfuse.get_prompt("summarizer", label="production"),
}
# During request handling -- already cached
messages = prompts["support_reply"].compile(...)
Connecting prompts to evaluation (covered in detail in Chapter 4) closes the feedback loop. Here is the pattern:
- Tag traces with prompt metadata -- name, version, label.
- Score traces -- either with LLM-as-judge or human feedback.
- Filter by prompt version in the Langfuse dashboard to compare.
This lets you answer questions like:
- Did version 5 of support_reply improve helpfulness scores compared to version 4?
- Which prompt variant has lower cost per successful response?
- Are there regressions in safety scores after the latest prompt edit?
Let's walk through a realistic workflow where you iterate on a prompt, test it, and promote it to production.
langfuse.create_prompt(
name="ticket_classifier",
type="chat",
prompt=[
{
"role": "system",
"content": (
"Classify the support ticket into one of these categories: "
"billing, technical, account, other.\n"
"Respond with only the category name."
),
},
{"role": "user", "content": "{{ticket_text}}"},
],
labels=["production"],
config={"model": "gpt-4o-mini", "temperature": 0},
)
After monitoring, you notice that tickets about refunds are being classified as other instead of billing. Time to iterate.
langfuse.create_prompt(
name="ticket_classifier",
type="chat",
prompt=[
{
"role": "system",
"content": (
"Classify the support ticket into one of these categories: "
"billing (includes refunds, invoices, payments), "
"technical (includes bugs, errors, integrations), "
"account (includes login, password, profile), "
"other.\n"
"Respond with only the category name in lowercase."
),
},
{"role": "user", "content": "{{ticket_text}}"},
],
labels=["staging"],
config={"model": "gpt-4o-mini", "temperature": 0},
This creates version 2 with the staging label, while version 1 still carries the production label.
prompt = langfuse.get_prompt("ticket_classifier", label="staging")
test_tickets = [
"I need a refund for my last invoice",
"The API returns a 500 error",
"I cannot log into my account",
"I want to request a feature",
]
for ticket in test_tickets:
messages = prompt.compile(ticket_text=ticket)
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, temperature=0)
category = resp.choices[0].message.content.strip()
trace = langfuse.trace(name="classifier-test", tags=["staging-test"])
trace.span(
name="classify",
input=ticket,
output=category,
metadata={"prompt_version": prompt.version},
)
print(f"Ticket: {ticket} -> {category}")
langfuse.flush()
If the staging results look good, promote by moving the production label:
# In the Langfuse UI: go to Prompts > ticket_classifier > Version 2 > Add label "production"
# Or via SDK:
langfuse.create_prompt(
name="ticket_classifier",
type="chat",
prompt=[...], # same content as version 2; note this SDK call creates a new version rather than relabeling version 2
labels=["production"], # this version now gets the production label
config={"model": "gpt-4o-mini", "temperature": 0},
)
Your live application immediately picks up the new version on the next cache refresh -- no redeploy required.
These tips will help you get the most out of Langfuse prompt management:
- Put instructions in the system message.
- Put user input in the user message.
- Use numbered lists or bullet points for multi-step instructions.
- Separate concerns: one prompt per task (classify, summarize, generate).
- Name prompts by function: ticket_classifier, support_reply, doc_summarizer.
- Avoid generic names like prompt_1 or test.
- Use the description field to document what the prompt does and what variables it expects.
- Only parameterize what actually changes between requests.
- Hard-code instructions, formatting rules, and output schemas in the template itself.
- The fewer variables, the less room for injection or misuse.
- Always test new prompt versions on a sample of real inputs before moving the production label.
- Use Langfuse evaluation (Chapter 4) to compare scores between versions.
- Keep at least one known-good version labeled production at all times.
- Write a brief note in the prompt description when you create a new version explaining what changed and why.
- Clean up old labels that are no longer in use.
- Review prompt performance weekly to catch regressions early.
- The prompt lifecycle: Create, Version, Label, Deploy, Monitor.
- The difference between Chat and Text prompt types.
- How to use variables, templates, and the config object.
- How to cache prompts for performance.
- How to link prompt versions to traces for performance monitoring.
- A complete prompt iteration workflow from creation through production promotion.
- Prompt engineering best practices within Langfuse.
| Previous: Chapter 2 -- Tracing Fundamentals | Next: Chapter 4 -- Evaluation |