Langfuse 101: See Inside Your AI App
Getting started with Langfuse takes about ten minutes and immediately shows you every input, output, and latency breakdown your AI app produces.
Langfuse is an open-source observability platform for LLM applications. It is free to start, runs on Langfuse Cloud or on your own infrastructure, and can give you a full trace of every LLM call your app makes: what went in, what came out, how long it took, and how many tokens it used. This post walks you through going from zero to your first scored trace.
Install and Connect
Install the Langfuse Python SDK alongside the Anthropic SDK, then set your API keys as environment variables. You can get your Langfuse public and secret keys from the project settings page at cloud.langfuse.com. The Anthropic client reads its own key from the ANTHROPIC_API_KEY environment variable.
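# Install (the examples below use the v2 decorator API in langfuse.decorators,
# so pin below v3 if a newer SDK release no longer provides that module)
pip install "langfuse<3" anthropic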
# Set environment variables (add to .env or your shell profile)
# LANGFUSE_PUBLIC_KEY=pk-lf-...
# LANGFUSE_SECRET_KEY=sk-lf-...
# LANGFUSE_HOST=https://cloud.langfuse.com
# ANTHROPIC_API_KEY=...   (read by anthropic.Anthropic() below)
import os
from langfuse import Langfuse
import anthropic
langfuse = Langfuse(
    public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
    secret_key=os.environ["LANGFUSE_SECRET_KEY"],
    host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
)
anthropic_client = anthropic.Anthropic()
print("Connected to Langfuse:", langfuse.auth_check())
If auth_check() returns True, you are connected and ready to start tracing.
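A missing or misspelled environment variable is a likely cause of a failed connection. As a minimal sketch using only the standard library, the hypothetical helper `missing_langfuse_env` below (not part of the Langfuse SDK) reports which of the variable names used in this post are unset, so you can fail early with a readable message.

```python
import os

# Variable names as used in this post. LANGFUSE_HOST is left out because
# the client code above falls back to the cloud URL when it is unset.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langfuse_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required Langfuse variables are set.")
```

Running a check like this before constructing the client turns a KeyError into a message that names the variable to fix.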
Your First Trace
The simplest way to create a trace is to use the @observe decorator from the Langfuse decorators module. Any function decorated with @observe automatically becomes a traced span. Nested calls to other observed functions become child spans within the same trace.
from langfuse.decorators import observe, langfuse_context

@observe()
def answer_user_question(question: str) -> str:
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": question}],
    )
    answer = response.content[0].text

    # Attach metadata to the current trace
    langfuse_context.update_current_trace(
        name="answer_user_question",
        input=question,
        output=answer,
        metadata={"model": "claude-haiku-4-5"},
    )
    return answer

# Call it normally - tracing happens automatically
result = answer_user_question("What is the capital of France?")
print(result)

# Events are sent in the background; flush before a short script exits
langfuse_context.flush()
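After running this, open your Langfuse dashboard and you should see the trace appear within a few seconds with its input, output, and latency. Token counts are shown only if usage is reported to Langfuse, which this minimal example does not do explicitly.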
Viewing Spans and Setting Up Score Tracking
Spans let you see which part of your pipeline is slow or failing. Add a span around any significant step by creating a nested observed function. To track quality, post a score to any trace using its trace ID.
from langfuse.decorators import observe, langfuse_context

@observe()
def retrieve_context(query: str) -> str:
    # Your retrieval logic here
    return "Relevant document content..."

@observe()
def generate_answer(question: str) -> dict:
    # retrieve_context is also observed, so it shows up as a child span
    context = retrieve_context(question)
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": "Context: {}\nQuestion: {}".format(context, question),
        }],
    )
    answer = response.content[0].text
    # Return the trace ID so feedback can be attached to this trace later
    trace_id = langfuse_context.get_current_trace_id()
    return {"answer": answer, "trace_id": trace_id}

# After user provides feedback, log it as a score
def log_user_feedback(trace_id: str, thumbs_up: bool):
    langfuse.score(
        trace_id=trace_id,
        name="user_feedback",
        value=1.0 if thumbs_up else 0.0,
    )
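If your app collects more than one kind of feedback, it can help to normalize everything to the same 0-to-1 range before logging it. The sketch below is one possible approach, not a Langfuse convention: `feedback_to_score` is a hypothetical helper, the 1-to-5 star scale is an assumption, and the returned dictionary simply mirrors the keyword arguments used in log_user_feedback.

```python
def feedback_to_score(trace_id, thumbs_up=None, stars=None):
    """Build score keyword arguments from exactly one kind of feedback.

    thumbs_up: bool, mapped to 1.0 or 0.0.
    stars: int from 1 to 5 (assumed scale), mapped linearly onto 0.0-1.0.
    """
    if (thumbs_up is None) == (stars is None):
        raise ValueError("provide exactly one of thumbs_up or stars")
    if thumbs_up is not None:
        value = 1.0 if thumbs_up else 0.0
    else:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        value = (stars - 1) / 4  # 1 star -> 0.0, 5 stars -> 1.0
    return {"trace_id": trace_id, "name": "user_feedback", "value": value}


payload = feedback_to_score("trace-123", stars=4)
print(payload)
```

The resulting dictionary could then be unpacked into the score call, for example langfuse.score(**payload). Whether star ratings and thumbs belong under a single score name is a design choice; separate names keep the two signals apart on the dashboard.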
Prompt
"You are reviewing a Langfuse trace for an AI question-answering app. The trace shows two spans: a retrieval step taking 800ms and an LLM generation step taking 1200ms. The user's question was about a product return policy. The response correctly cited the policy but included three unrelated sentences. Identify the quality issues and suggest one prompt change and one retrieval change to address them."
Once you have traces and scores flowing, the Langfuse dashboard becomes your primary tool for understanding your AI feature. You can filter traces by date, by score range, and by latency percentile. The data you collect in the first week is likely to surface issues that are hard to spot without traces.
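To make the idea of filtering by latency percentile and score concrete, here is a small standard-library sketch of the same kind of query run over made-up data. The trace records, their field names, and the `percentile` helper are all hypothetical; the dashboard performs this sort of filtering for you.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct percent
    of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample data standing in for traces with a user_feedback score.
traces = [
    {"id": "t1", "latency_ms": 900, "user_feedback": 1.0},
    {"id": "t2", "latency_ms": 2100, "user_feedback": 0.0},
    {"id": "t3", "latency_ms": 1200, "user_feedback": 1.0},
    {"id": "t4", "latency_ms": 3400, "user_feedback": 0.0},
    {"id": "t5", "latency_ms": 1500, "user_feedback": 1.0},
]

p80 = percentile([t["latency_ms"] for t in traces], 80)

# Traces that are both slow (at or above p80) and rated poorly.
slow_and_disliked = [
    t["id"]
    for t in traces
    if t["latency_ms"] >= p80 and t["user_feedback"] < 0.5
]
print("p80 latency (ms):", p80)
print("slow and disliked:", slow_and_disliked)
```

Combining a latency threshold with a low score is a quick way to shortlist the traces worth reading first.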
Aki Wijesundara