
Langfuse 101: See Inside Your AI App

Getting started with Langfuse takes about ten minutes and immediately shows you every input, output, and latency breakdown your AI app produces.

June 26, 2026
Aki Wijesundara

Key Takeaways

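  • Langfuse records the input, output, and latency of every call you trace
  • The @observe decorator turns a function into a trace, and nested observed functions into child spans
  • Scores posted against a trace ID tie user feedback to the exact request that produced it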

Langfuse is an open-source observability platform for LLM applications. It is free to start, runs on Langfuse Cloud or your own infrastructure, and gives you a full trace of every LLM call you instrument: what went in, what came out, how long it took, and, when usage is reported, how many tokens it used. This post walks you from zero to your first scored trace.

Install and Connect

Install the Langfuse Python SDK alongside the Anthropic SDK, then set your API keys as environment variables. Your Langfuse public and secret keys are on the project settings page at cloud.langfuse.com. The Anthropic client reads its key from ANTHROPIC_API_KEY.

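# Install (the examples use the langfuse.decorators module from the v2 SDK, so pin below v3)
pip install "langfuse<3" anthropic

# Set environment variables (add to .env or your shell profile)
# LANGFUSE_PUBLIC_KEY=pk-lf-...
# LANGFUSE_SECRET_KEY=sk-lf-...
# LANGFUSE_HOST=https://cloud.langfuse.com
# ANTHROPIC_API_KEY=sk-ant-...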

import os
from langfuse import Langfuse
import anthropic

langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
)
anthropic_client = anthropic.Anthropic()

print("Connected to Langfuse:", langfuse.auth_check())

If auth_check() returns True, you are connected and ready to start tracing.
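If the connection fails, the most common cause is a missing or empty environment variable. Here is a minimal sketch of a pre-flight check you could run before creating the clients. The helper name `missing_env_vars` is hypothetical and not part of either SDK, and the list of required names is an assumption based on the setup above.

```python
import os

# Assumed set of variables for this walkthrough; adjust to your own setup.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "ANTHROPIC_API_KEY")

def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Demo against a sample mapping instead of the real environment
sample_env = {"LANGFUSE_PUBLIC_KEY": "pk-lf-example", "LANGFUSE_SECRET_KEY": ""}
print("Missing:", missing_env_vars(sample_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment, so you can raise a clear error at startup instead of a less obvious authentication failure later.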

Your First Trace

The simplest way to create a trace is the @observe decorator from the Langfuse decorators module. The outermost function decorated with @observe starts a trace, and nested calls to other observed functions become child spans within that trace.

from langfuse.decorators import observe, langfuse_context

@observe()
def answer_user_question(question: str) -> str:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": question}]
    )
    answer = response.content[0].text

    # Attach metadata to the current trace
    langfuse_context.update_current_trace(
        name="answer_user_question",
        input=question,
        output=answer,
        metadata={"model": "claude-haiku-4-5"}
    )
    return answer

# Call it normally - tracing happens automatically
result = answer_user_question("What is the capital of France?")
print(result)
# Events are sent in the background, so flush before a short script exits
langfuse_context.flush()

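After running this, open your Langfuse dashboard. The trace should appear within a few seconds with its input, output, and latency. Token counts show up only when usage is reported on a generation observation, which a plain @observe() wrapper around an SDK call may not do on its own.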

Viewing Spans and Setting Up Score Tracking

Spans let you see which part of your pipeline is slow or failing. Add a span around any significant step by creating a nested observed function. To track quality, post a score to any trace using its trace ID.

from langfuse.decorators import observe, langfuse_context

@observe()
def retrieve_context(query: str) -> str:
    # Your retrieval logic here
    return "Relevant document content..."

@observe()
def generate_answer(question: str) -> dict:
    context = retrieve_context(question)

    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": "Context: {}\n\nQuestion: {}".format(context, question)}]
    )
    answer = response.content[0].text
    trace_id = langfuse_context.get_current_trace_id()
    return {"answer": answer, "trace_id": trace_id}

# After user provides feedback, log it as a score
def log_user_feedback(trace_id: str, thumbs_up: bool):
    langfuse.score(
        trace_id=trace_id,
        name="user_feedback",
        value=1.0 if thumbs_up else 0.0
    )
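Feedback rarely arrives in a single shape. As a sketch, the hypothetical helper below normalizes either a thumbs-up flag or a star rating into the trace_id, name, and value fields used by the score call above. The score name "user_rating" and the linear mapping of stars onto 0.0 to 1.0 are assumptions chosen for illustration.

```python
def feedback_to_score(trace_id, thumbs_up=None, stars=None, max_stars=5):
    """Build keyword arguments for a score call from exactly one feedback signal."""
    if (thumbs_up is None) == (stars is None):
        raise ValueError("Provide exactly one of thumbs_up or stars")
    if thumbs_up is not None:
        name, value = "user_feedback", 1.0 if thumbs_up else 0.0
    else:
        if not 1 <= stars <= max_stars:
            raise ValueError("stars must be between 1 and max_stars")
        # Map 1..max_stars linearly onto 0.0..1.0
        name, value = "user_rating", (stars - 1) / (max_stars - 1)
    return {"trace_id": trace_id, "name": name, "value": value}

print(feedback_to_score("trace-123", stars=3))
```

The returned dictionary can be unpacked into the score call, for example `langfuse.score(**feedback_to_score(trace_id, stars=4))`, assuming the same client and method as in the snippet above.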

Prompt

"You are reviewing a Langfuse trace for an AI question-answering app. The trace shows two spans: a retrieval step taking 800ms and an LLM generation step taking 1200ms. The user's question was about a product return policy. The response correctly cited the policy but included three unrelated sentences. Identify the quality issues and suggest one prompt change and one retrieval change to address them."

Once traces and scores are flowing, the Langfuse dashboard becomes your primary tool for understanding how your AI feature behaves. You can filter traces by date, score range, and latency percentile. Even a week of data is often enough to surface slow steps and low-scored answers that are hard to spot from logs alone.
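The same kind of filtering can be reproduced offline once you have trace data in hand. The sketch below works on a small list of made-up records. The field names `latency_ms` and `user_feedback` are assumptions, not a Langfuse export format, and the percentile uses the standard library's inclusive method.

```python
import statistics

# Hypothetical records; real exports will have different fields
traces = [
    {"id": "t1", "latency_ms": 900, "user_feedback": 1.0},
    {"id": "t2", "latency_ms": 1200, "user_feedback": 0.0},
    {"id": "t3", "latency_ms": 800, "user_feedback": 1.0},
    {"id": "t4", "latency_ms": 4000, "user_feedback": 0.0},
    {"id": "t5", "latency_ms": 2000, "user_feedback": None},
]

def p95_latency(records):
    """95th percentile latency; quantiles(n=20) yields 19 cut points, the last is p95."""
    latencies = [r["latency_ms"] for r in records]
    return statistics.quantiles(latencies, n=20, method="inclusive")[-1]

def low_scored(records, threshold=0.5):
    """IDs of traces that have a score below the threshold (unscored traces are skipped)."""
    return [
        r["id"] for r in records
        if r["user_feedback"] is not None and r["user_feedback"] < threshold
    ]

p95 = p95_latency(traces)
slow = [r["id"] for r in traces if r["latency_ms"] > p95]
print("p95 latency (ms):", p95)
print("slower than p95:", slow)
print("low scored:", low_scored(traces))
```

Traces that show up in both the slow list and the low-scored list are usually the first ones worth opening in the dashboard.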
