The AI-Native Data Scientist: Analyze Anything in English
Learn how to use Claude Code for data analysis: describe what you want in plain English, have it write and run Python with pandas, and then have it interpret the output.
Key Takeaways
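- Describe the analysis you want in plain English; Claude Code writes and runs the Python and pandas code, then explains what it found.
- The goal is less friction between a question and its answer, not a replacement for Python knowledge.
- Follow-up questions cost almost nothing, so you can explore more hypotheses than you could by hand.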
The New Model for Data Analysis
The traditional data science loop goes like this: formulate a question, write code, debug syntax errors, re-run, interpret output, write more code. Claude Code breaks this loop by letting you stay in English the entire time. You describe what you want to know, it writes and runs the Python, and then explains what it found. The code becomes a by-product, not the goal.
This is not about replacing Python knowledge. It is about removing the friction between having a question and getting an answer.
A Real Walkthrough: Analyzing a Sales Dataset
Start with a clear question and a data file. Here is a session that covers revenue trends, country-level patterns, and customer churn risk.
Prompt
"I have sales.csv with columns: order_date, customer_id, product, category, units, price, country. Tell me: which product categories drive the most revenue, whether any countries show declining trends over the last 6 months, and which customers are at risk of churning based on purchase recency."
Claude Code loads the data, checks the schema, and breaks the analysis into three parts. For the churn analysis, it writes and runs something like this.
import pandas as pd

# Load the sales data and parse order dates as timestamps
df = pd.read_csv('sales.csv', parse_dates=['order_date'])

# Revenue per order line (assumes price is the per-unit price)
df['revenue'] = df['units'] * df['price']

# Measure recency against the most recent order in the data
snapshot_date = df['order_date'].max()

rfm = df.groupby('customer_id').agg(
    last_purchase=('order_date', 'max'),
    frequency=('order_date', 'count'),
    total_revenue=('revenue', 'sum')
).reset_index()

# Flag customers with no purchase in the last 90 days
rfm['recency_days'] = (snapshot_date - rfm['last_purchase']).dt.days
rfm['churn_risk'] = rfm['recency_days'] > 90

print(f"Churn risk customers: {rfm['churn_risk'].sum()}")
print(f"Revenue at risk: ${rfm.loc[rfm['churn_risk'], 'total_revenue'].sum():,.0f}")
rfm.to_csv('churn_risk_report.csv', index=False)
After running this, Claude Code surfaces a plain-English summary: how many customers are at risk, how much revenue is exposed, and which customers represent the highest-value churn risk.
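The other two parts of the prompt, revenue by category and declining countries, could be handled along the following lines. This is a minimal sketch, not the code Claude Code necessarily produces. The function names `revenue_by_category` and `declining_countries` are hypothetical, revenue is assumed to be `units * price`, and "declining" is assumed to mean a negative least-squares slope of monthly revenue over the last six calendar months in the data.

```python
import numpy as np
import pandas as pd


def revenue_by_category(df: pd.DataFrame) -> pd.Series:
    """Total revenue per category, largest first.

    Assumes `price` is the per-unit price, so revenue = units * price.
    """
    revenue = df["units"] * df["price"]
    return revenue.groupby(df["category"]).sum().sort_values(ascending=False)


def declining_countries(df: pd.DataFrame, months: int = 6) -> pd.Series:
    """Countries whose monthly revenue slopes downward over the last `months` months.

    Returns the fitted slope (revenue change per month) for each country
    with a negative slope, most negative first. `order_date` must already
    be a datetime column.
    """
    data = df.assign(
        revenue=df["units"] * df["price"],
        month=df["order_date"].dt.to_period("M"),
    )
    # Window of calendar months ending at the latest month in the data
    last_month = data["month"].max()
    window = pd.period_range(end=last_month, periods=months, freq="M")
    recent = data[data["month"] >= window[0]]

    # Rows are months, columns are countries; months with no sales count as 0
    monthly = (
        recent.groupby(["month", "country"])["revenue"]
        .sum()
        .unstack(fill_value=0)
        .reindex(window, fill_value=0)
    )

    # Fit a straight line per country; the first coefficient is the slope
    x = np.arange(len(monthly))
    slopes = pd.Series(
        np.polyfit(x, monthly.to_numpy(dtype=float), 1)[0],
        index=monthly.columns,
    )
    return slopes[slopes < 0].sort_values()
```

With the walkthrough's data loaded as `df` (using `parse_dates=['order_date']`), `revenue_by_category(df)` ranks the categories and `declining_countries(df)` lists the countries to look at first. A fitted slope is only one reasonable definition of a declining trend; comparing the last three months against the previous three would be another.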
Reading Output and Asking Follow-up Questions
Every output opens a new question. The key insight is that each follow-up costs almost nothing, so you can explore far more hypotheses than you could with manual analysis.
Prompt
"Show me the top 20 churn-risk customers with their last purchase category and total lifetime value, sorted by lifetime value descending. Export to a CSV I can send to the account team."
An analysis that might take a junior analyst half a day can come together in about 20 minutes of back-and-forth. The gain is not only speed: because each question costs so little, you can afford to test hypotheses you would otherwise skip.
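For the follow-up about the top churn-risk customers, the generated code might look something like this. It is a sketch under assumptions: the function name `top_churn_risk` is hypothetical, lifetime value is taken to be the sum of `units * price`, the 90-day threshold is carried over from the earlier churn analysis, and "last purchase category" is read as the category on the customer's most recent order row.

```python
import pandas as pd


def top_churn_risk(df: pd.DataFrame, threshold_days: int = 90, n: int = 20) -> pd.DataFrame:
    """Top-n churn-risk customers by lifetime value, highest first.

    Assumes `order_date` is a datetime column and `price` is per unit.
    """
    # Sort by date so the "last" category in each group is the most recent one
    data = df.assign(revenue=df["units"] * df["price"]).sort_values("order_date")
    snapshot_date = data["order_date"].max()

    summary = data.groupby("customer_id").agg(
        last_purchase=("order_date", "max"),
        last_category=("category", "last"),
        lifetime_value=("revenue", "sum"),
    )
    summary["recency_days"] = (snapshot_date - summary["last_purchase"]).dt.days

    # Keep only customers past the recency threshold, then rank by value
    at_risk = summary[summary["recency_days"] > threshold_days]
    return (
        at_risk.sort_values("lifetime_value", ascending=False)
        .head(n)
        .reset_index()
    )
```

To produce the file for the account team, call `top_churn_risk(df).to_csv('top_churn_risk_customers.csv', index=False)`; the filename here is only a placeholder. If a customer has several orders on the same final date, which category is reported as the last one is arbitrary.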
Aki Wijesundara