Langfuse 101: See Inside Your AI App

Langfuse is an open-source observability platform for LLM applications. It is free to start, runs on their cloud or your own infrastructure, and gives you a full trace of every LLM call your app makes, including what went in, what came out, how long it took, and how many tokens it used. This post walks you through going from zero to your first scored trace.

Install and Connect

Install the Langfuse Python SDK alongside your Anthropic SDK. Then set your API keys as environment variables. You can get your Langfuse public and secret keys from the project settings page at cloud.langfuse.com.

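# Install (run this in your shell, not in Python).
# The examples below import from langfuse.decorators, which is the v2 SDK
# layout, so the SDK is pinned below version 3 here.
pip install "langfuse<3" anthropic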

# Set environment variables (add to .env or your shell profile)
# LANGFUSE_PUBLIC_KEY=pk-lf-...
# LANGFUSE_SECRET_KEY=sk-lf-...
# LANGFUSE_HOST=https://cloud.langfuse.com
# ANTHROPIC_API_KEY=...  (read by anthropic.Anthropic() below)

import os
from langfuse import Langfuse
import anthropic

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
)
anthropic_client = anthropic.Anthropic()

print("Connected to Langfuse:", langfuse.auth_check())

If auth_check() returns True, you are connected and ready to start tracing.
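
If the connection check fails, a missing or empty environment variable is a common cause. The sketch below checks for that before the client is constructed. The helper name `missing_langfuse_settings` is hypothetical and not part of the Langfuse SDK; it only looks at the two key variables listed above.

```python
import os

# The two variables the setup above requires; LANGFUSE_HOST has a default.
REQUIRED_KEYS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")

def missing_langfuse_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_langfuse_settings()` with no arguments reads the real environment; passing a plain dict makes the check easy to unit test.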

Your First Trace

The simplest way to create a trace is to use the @observe decorator from the Langfuse decorators module. Any function decorated with @observe automatically becomes a traced span. Nested calls to other observed functions become child spans within the same trace.

from langfuse.decorators import observe, langfuse_context

@observe()
def answer_user_question(question: str) -> str:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": question}]
    )
    answer = response.content[0].text

    # Attach metadata to the current trace
    langfuse_context.update_current_trace(
        name="answer_user_question",
        input=question,
        output=answer,
        metadata={"model": "claude-haiku-4-5"}
    )
    return answer

# Call it normally - tracing happens automatically
result = answer_user_question("What is the capital of France?")
print(result)

# In a short-lived script, flush buffered events before the process exits
langfuse_context.flush()

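After running this, open your Langfuse dashboard and the trace should appear within a few seconds, showing the input, output, and latency. Token counts show up only when usage is recorded on the observation; a plain @observe() wrapper around an Anthropic call does not necessarily capture them.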
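
To make the parent and child span behavior concrete, here is a toy decorator that nests calls the same way conceptually. It is not how Langfuse implements @observe: `toy_observe`, `last_trace`, and the dict layout are all invented for illustration, and nothing is sent anywhere.

```python
import contextvars
import functools
import time

# Holds the span that is currently open in this execution context.
_current_span = contextvars.ContextVar("current_span", default=None)

def toy_observe(func):
    """Toy stand-in for @observe: record a span and nest it under its caller."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parent = _current_span.get()
        span = {"name": func.__name__, "children": [], "duration_ms": None}
        if parent is not None:
            parent["children"].append(span)  # called inside another span
        token = _current_span.set(span)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            _current_span.reset(token)
            if parent is None:
                wrapper.last_trace = span  # root span closed: keep the tree
    wrapper.last_trace = None
    return wrapper

@toy_observe
def retrieve(query):
    return "docs for " + query

@toy_observe
def pipeline(query):
    return retrieve(query).upper()
```

After `pipeline("returns")` runs, `pipeline.last_trace` holds one root span named `pipeline` with a single child named `retrieve`, which mirrors the tree you would expect to see in the dashboard for nested observed functions.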

Viewing Spans and Setting Up Score Tracking

Spans let you see which part of your pipeline is slow or failing. Add a span around any significant step by creating a nested observed function. To track quality, post a score to any trace using its trace ID.

from langfuse.decorators import observe, langfuse_context

@observe()
def retrieve_context(query: str) -> str:
    # Your retrieval logic here
    return "Relevant document content..."

@observe()
def generate_answer(question: str) -> dict:
    context = retrieve_context(question)

    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": "Context: {}\n\nQuestion: {}".format(context, question)}]
    )
    answer = response.content[0].text
    trace_id = langfuse_context.get_current_trace_id()
    return {"answer": answer, "trace_id": trace_id}

# After user provides feedback, log it as a score
def log_user_feedback(trace_id: str, thumbs_up: bool):
    langfuse.score(
        trace_id=trace_id,
        name="user_feedback",
        value=1.0 if thumbs_up else 0.0
    )
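
Because thumbs up and thumbs down are stored as 1.0 and 0.0, the mean of the score values is the thumbs-up rate. A minimal sketch of that arithmetic, with hypothetical helper names and no Langfuse calls:

```python
def feedback_to_value(thumbs_up):
    """Same encoding as log_user_feedback: 1.0 for up, 0.0 for down."""
    return 1.0 if thumbs_up else 0.0

def thumbs_up_rate(values):
    """Mean of 1.0/0.0 feedback values, or None when there is no feedback."""
    values = list(values)
    if not values:
        return None
    return sum(values) / len(values)

# Three thumbs up and one thumbs down
rate = thumbs_up_rate(feedback_to_value(v) for v in [True, True, False, True])
```

Returning `None` for an empty input keeps "no feedback yet" distinct from "all feedback was negative".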

Prompt

"You are reviewing a Langfuse trace for an AI question-answering app. The trace shows two spans: a retrieval step taking 800ms and an LLM generation step taking 1200ms. The user's question was about a product return policy. The response correctly cited the policy but included three unrelated sentences. Identify the quality issues and suggest one prompt change and one retrieval change to address them."

Once you have traces and scores flowing, the Langfuse dashboard becomes your primary tool for understanding your AI feature. You can filter traces by date, by score range, or by latency percentile. The data you collect in the first week often surfaces issues that are hard to spot any other way.
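
The latency-percentile and score-range filters can also be reasoned about offline. The sketch below assumes traces have been exported as plain dicts; the field names `latency_ms` and `score` and both helper names are illustrative, not a Langfuse export schema.

```python
import statistics

def latency_percentile(latencies_ms, percentile):
    """Return the requested percentile (1-99) of a list of latencies."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # n=100 yields 99 cut points; index 0 is the 1st percentile.
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cut_points[percentile - 1]

def filter_traces(traces, max_score, min_latency_ms):
    """Keep traces that are both low-scoring and slow."""
    return [
        t for t in traces
        if t["score"] <= max_score and t["latency_ms"] >= min_latency_ms
    ]

# Hypothetical exported records
traces = [
    {"id": "t1", "latency_ms": 900, "score": 1.0},
    {"id": "t2", "latency_ms": 2400, "score": 0.0},
    {"id": "t3", "latency_ms": 1100, "score": 0.0},
    {"id": "t4", "latency_ms": 2000, "score": 1.0},
]
p50 = latency_percentile([t["latency_ms"] for t in traces], 50)
slow_and_bad = filter_traces(traces, max_score=0.0, min_latency_ms=p50)
```

Here only `t2` is both at or above the median latency and scored 0.0, so it is the first trace worth opening.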



Aki Wijesundara
