The AI-Native Data Scientist: Analyze Anything in English

Learn how to use Claude Code for data analysis: describe what you want in plain English, let it write and run the Python and pandas code, and have it interpret the output.

June 26, 2026
5 min read
Aki Wijesundara
#Data Science #Claude Code #Analysis

The New Model for Data Analysis

The traditional data science loop goes like this: formulate a question, write code, debug syntax errors, re-run, interpret output, write more code. Claude Code breaks this loop by letting you stay in English the entire time. You describe what you want to know, it writes and runs the Python, and then explains what it found. The code becomes a by-product, not the goal.

This is not about replacing Python knowledge. It is about removing the friction between having a question and getting an answer.

A Real Walkthrough: Analyzing a Sales Dataset

Start with a clear question and a data file. Here is a session that covers revenue trends, country-level patterns, and customer churn risk.

Prompt

"I have sales.csv with columns: order_date, customer_id, product, category, units, price, country. Tell me: which product categories drive the most revenue, whether any countries show declining trends over the last 6 months, and which customers are at risk of churning based on purchase recency."

Claude Code loads the data, checks the schema, and breaks the analysis into three parts. For the churn analysis, it writes and runs something like this.

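import pandas as pd

# Load the sales data and parse order_date as a datetime column
df = pd.read_csv('sales.csv', parse_dates=['order_date'])

# Revenue per row; assumes price is the per-unit price
df['revenue'] = df['units'] * df['price']

# Use the most recent order in the data as the reference date
snapshot_date = df['order_date'].max()
rfm = df.groupby('customer_id').agg(
    last_purchase=('order_date', 'max'),
    frequency=('order_date', 'count'),  # number of order rows per customer
    total_revenue=('revenue', 'sum')
).reset_index()

# Flag customers with no purchase in the 90 days before the snapshot date
rfm['recency_days'] = (snapshot_date - rfm['last_purchase']).dt.days
rfm['churn_risk'] = rfm['recency_days'] > 90

print(f"Churn risk customers: {rfm['churn_risk'].sum()}")
print(f"Revenue at risk: ${rfm.loc[rfm['churn_risk'], 'total_revenue'].sum():,.0f}")
rfm.to_csv('churn_risk_report.csv', index=False)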

After running this, Claude Code surfaces a plain-English summary: how many customers are at risk, how much revenue is exposed, and which customers represent the highest-value churn risk.
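The other two questions in the prompt, category revenue and country-level trends, follow the same pattern. What follows is a minimal sketch of how they might be answered, not output from an actual Claude Code session. It assumes that price is a per-unit price and that "declining" means a negative least-squares slope in monthly revenue over the last six calendar months in the data. The function names revenue_by_category and declining_countries are illustrative.

```python
import numpy as np
import pandas as pd


def revenue_by_category(df: pd.DataFrame) -> pd.Series:
    """Total revenue per category, largest first (assumes revenue = units * price)."""
    revenue = df["units"] * df["price"]
    return revenue.groupby(df["category"]).sum().sort_values(ascending=False)


def declining_countries(df: pd.DataFrame, months: int = 6) -> pd.DataFrame:
    """Countries whose monthly revenue has a negative linear trend
    over the last `months` calendar months present in the data."""
    data = df.assign(
        revenue=df["units"] * df["price"],
        month=df["order_date"].dt.to_period("M"),
    )
    # Calendar months ending at the most recent month in the data
    window = pd.period_range(end=data["month"].max(), periods=months, freq="M")

    # Rows = months, columns = countries; months with no sales count as zero
    monthly = (
        data[data["month"] >= window[0]]
        .groupby(["month", "country"])["revenue"]
        .sum()
        .unstack("country", fill_value=0)
        .reindex(window, fill_value=0)
    )

    # Fit one straight line per country; the first coefficient row is the slope
    x = np.arange(len(window))
    slopes = np.polyfit(x, monthly.to_numpy(dtype=float), deg=1)[0]

    trend = pd.DataFrame({"country": monthly.columns, "slope_per_month": slopes})
    return trend[trend["slope_per_month"] < 0].sort_values("slope_per_month")


# Usage, assuming sales.csv has the columns listed in the prompt:
# df = pd.read_csv("sales.csv", parse_dates=["order_date"])
# print(revenue_by_category(df))
# print(declining_countries(df))
```

A linear slope is only one reasonable definition of a declining trend. Comparing the last three months with the three before them would be another, and it is worth telling Claude Code which definition you want.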

Reading Output and Asking Follow-up Questions

Every output opens a new question. The key insight is that each follow-up costs almost nothing, so you can explore far more hypotheses than you could with manual analysis.

Prompt

"Show me the top 20 churn-risk customers with their last purchase category and total lifetime value, sorted by lifetime value descending. Export to a CSV I can send to the account team."
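For a follow-up like this one, the generated code might look roughly like the sketch below. It is an illustration under assumptions, not a transcript: top_churn_risk is a hypothetical helper name, revenue is assumed to be units times a per-unit price, and the 90-day threshold is carried over from the earlier churn definition.

```python
import pandas as pd


def top_churn_risk(df: pd.DataFrame, n: int = 20, threshold_days: int = 90) -> pd.DataFrame:
    """Highest-value customers with no purchase in the last `threshold_days` days."""
    # Sort by date so that "last" below picks each customer's most recent category
    data = df.assign(revenue=df["units"] * df["price"]).sort_values("order_date")
    snapshot_date = data["order_date"].max()

    per_customer = data.groupby("customer_id").agg(
        last_purchase=("order_date", "max"),
        last_category=("category", "last"),
        lifetime_value=("revenue", "sum"),
    ).reset_index()

    per_customer["recency_days"] = (snapshot_date - per_customer["last_purchase"]).dt.days
    at_risk = per_customer[per_customer["recency_days"] > threshold_days]
    return at_risk.sort_values("lifetime_value", ascending=False).head(n)


# Usage, assuming sales.csv has the columns listed in the first prompt
# (the output filename is a placeholder):
# df = pd.read_csv("sales.csv", parse_dates=["order_date"])
# top_churn_risk(df).to_csv("top_churn_risk_customers.csv", index=False)
```

Here "lifetime value" simply means total historical revenue per customer. If the account team uses a different definition, state it in the prompt.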

An analysis that might take a junior analyst half a day can happen in about 20 minutes of back-and-forth. The shift is not just speed: because each question is cheap, you end up testing hypotheses you would otherwise have skipped.
