
Claude Code for Data Scientists: 35+ Agent Skills


Claude Code for data scientists: 35+ agent skills for pandas EDA, matplotlib, Streamlit dashboards, SQL, and Jupyter workflows.

Shirley · June 11, 2026 · Updated July 7, 2026 · 17 min read

Every data scientist has the same first 50 lines. Import pandas. Load the CSV. Check dtypes. Count nulls. Describe the distributions. Plot the histograms. It is the same boilerplate in every notebook, every project, every dataset. Agent Skills encode that entire opening ritual - plus your team's specific domain knowledge, data dictionaries, and analysis patterns - into a folder Claude loads once and executes automatically.

I cataloged 35+ data science skills across 9 repos. The ecosystem is deeper than I expected. K-Dense-AI has 144 scientific skills (RDKit, Scanpy, BioPython - the real stuff). Anthropic's official sql-queries skill covers 5 SQL dialects. The official OpenAI jupyter-notebook skill has 2.4K installs. Streamlit published their own agent skill. There is even a publication-quality matplotlib skill with journal presets for Nature, Science, and Cell.

Here is what the data-analyst skill produces compared to a generic prompt. Without a skill, you ask "analyze this CSV" and get a wall of df.describe() output. With the skill installed, every analysis follows the What / So What / Now What framework:

What: Monthly churn rate spiked from 3.2% to 5.8% in April, driven entirely by the Basic plan tier (8.1% churn vs 1.9% for Pro).

So What: At this rate, Basic plan revenue drops $47K/month within 90 days. The Pro plan is healthy - this is a tier-specific problem, not a product-wide one.

Now What: Investigate what changed for Basic users in March (pricing change? feature removal? onboarding regression?). Run a cohort analysis on Basic signups from Q1 vs Q4 to isolate the variable.

Same data. The skill forces structure: finding, business impact, recommended action. That is the difference between "here are some numbers" and an analysis a stakeholder can act on.

Why Do Agent Skills Beat ChatGPT Uploads for Data Work?

The default "upload a CSV and ask questions" workflow has three problems:

No memory. Every session starts fresh. You re-explain your schema, your business rules, your column naming conventions every time.
No scripts. The model writes code but cannot execute persistent analysis scripts. Your EDA pipeline, your feature engineering transforms, your model evaluation harness - they live in notebooks, not in the model's toolkit.
No domain context. "Revenue" in your dataset might mean ARR or MRR or gross revenue or net revenue. The model guesses. A skill encodes the answer.

Agent Skills solve all three. The SKILL.md carries your data dictionary, analysis patterns, and domain-specific gotchas. The scripts/ directory carries your pandas pipelines, plotting utilities, and model evaluation code. The references/ directory holds your schema documentation and statistical methodology notes. The agent loads it all when a data task triggers the skill.
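To make the three-part layout concrete, here is a minimal sketch that scaffolds a skill folder with a SKILL.md, a scripts/ directory, and a references/ directory. The `scaffold_skill` helper and the placeholder SKILL.md body are illustrative assumptions, not part of any official tooling; the `.claude/skills/` location, the frontmatter fields, and the sample gotcha are taken from this article.

```python
import tempfile
from pathlib import Path

# Placeholder SKILL.md content; replace with your own data dictionary and gotchas.
SKILL_MD_TEMPLATE = """\
---
name: {name}
description: Run exploratory data analysis on tabular datasets.
---

## Data dictionary
- mrr: monthly recurring revenue

## Gotchas
- Revenue columns are stored in cents, not dollars.
"""


def scaffold_skill(repo_root: Path, name: str = "eda") -> Path:
    """Create <repo_root>/.claude/skills/<name> with SKILL.md, scripts/, references/."""
    skill_dir = repo_root / ".claude" / "skills" / name
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=True)
    (skill_dir / "references").mkdir(exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        SKILL_MD_TEMPLATE.format(name=name), encoding="utf-8"
    )
    return skill_dir


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written into a real repo.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_skill(Path(tmp))
        for path in sorted(created.rglob("*")):
            print(path.relative_to(tmp))
```

Your pandas pipelines would then go under scripts/ and your schema notes under references/.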

How Are Real Data Scientists Using Claude Code?

The skills below are building blocks. Here is what the finished workflows look like in practice, drawn from two data scientists who documented their day-to-day use.

Priya, a senior data scientist at Stripe (previously Uber, 7 years in the field), says Claude Code "replaced maybe 75% of the way I do my job." Her framing is the useful part: do not think of it as a chatbot. Think of it as a staff-level teammate who has already read your repo and understands your project context. Her highest-value uses:

EDA and data wrangling - the 80% of the job that is data sourcing, cleaning, and preprocessing, not model tuning.
Debugging with full context - paste a traceback and it reasons over every surrounding file and dependency, not just the snippet you pasted.
A self-audit trick worth stealing: open a fresh session in the same project directory (empty context window, but the files are still there) and ask it to "find all the potential issues and assumptions made in this project and list them in an assumptions.md file." She surfaced over 100 hidden assumptions on a single project.
Business translation - the thing that separates a data scientist from a software engineer. Feed it your metrics (precision, recall, PR-AUC) and it translates them into dollars saved at a given trade-off, then builds the stakeholder deck. Give it a reference .md of your own past write-ups so the summary sounds like you.
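The business-translation step is mostly arithmetic, and it helps to see it written out. The sketch below turns a precision/recall operating point into a net dollar figure. The function name, the per-case dollar values, and the two operating points are all hypothetical; plug in your own costs.

```python
def net_value(precision: float, recall: float, n_positives: int,
              value_per_tp: float, cost_per_fp: float) -> float:
    """Estimate net dollars for one operating point of a classifier.

    n_positives   - actual positive cases in the period (e.g. true churners)
    value_per_tp  - dollars gained per correctly flagged case (assumed)
    cost_per_fp   - dollars spent per incorrectly flagged case (assumed)
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    true_positives = recall * n_positives
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    false_positives = true_positives * (1 - precision) / precision
    return true_positives * value_per_tp - false_positives * cost_per_fp


if __name__ == "__main__":
    # Two hypothetical thresholds for the same model: cautious vs aggressive.
    for label, precision, recall in [("cautious", 0.8, 0.5), ("aggressive", 0.5, 0.8)]:
        dollars = net_value(precision, recall, n_positives=1000,
                            value_per_tp=200.0, cost_per_fp=50.0)
        print(f"{label}: ${dollars:,.0f}")
```

Comparing operating points this way makes the trade-off something a stakeholder can weigh in dollars rather than in PR-AUC.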

Ryan & Matt Data Science documented 10 concrete use cases, each a single prompt. The full list is a good map of where Claude Code earns its keep:

#	Use case	What the one-shot prompt produced
1	Full EDA	Shapes, dtypes, null summary, descriptive stats, correlation heatmap, top-5 products, outlier flags, all charts saved as PNG, plus a rerunnable EDA script
2	Data cleaning	Title-cased names, fixed emails, standardized phones/dates/currency/country, removed impossible ages, caught non-adjacent duplicates, before/after diff
3	Charts → PowerPoint	Presentation-ready charts and a full deck with per-slide insights, delivered as a reusable Python file for recurring monthly reports
4	SQL from plain English	3 CSVs loaded into in-memory SQLite, 5 business questions answered - and it flagged a data limitation (no refund status; used "canceled" as a proxy and said so)
5	API pull	CoinGecko top-20 crypto → DataFrame → bar chart → CSV
6	Web scraping	Wikipedia table → DataFrame → charts → CSV (works on harder Cloudflare-protected sites with back-and-forth)
7	Feature engineering	8+ engineered churn features with a rationale for each - and a data-leakage warning when one feature perfectly predicted the target
8	A/B testing	Chose the right tests (Mann-Whitney U, Welch's t-test), ran a power analysis, segmented by device, wrote a plain-English "ship it, with nuance" summary
9	ML pipeline	Stratified split, imputation, encoding, scaling, cross-validation, hyperparameter tuning, multi-model comparison on accuracy/precision/recall/F1
10	Streamlit app	A deployable churn-predictor app from the trained model - it even installed Streamlit itself when it hit the missing dependency
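Use case 4 in the table above (CSVs loaded into in-memory SQLite) needs nothing beyond the standard library. The following is a rough sketch of that pattern, not the code the agent actually produced; the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for one of the CSV files.
ORDERS_CSV = """order_id,customer,status,amount
1,acme,completed,120.0
2,acme,canceled,80.0
3,globex,completed,200.0
4,globex,completed,50.0
"""


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Create a table from CSV text; every column is stored as text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    column_defs = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )


def revenue_by_customer(conn: sqlite3.Connection) -> list:
    """Completed-order revenue per customer, highest first."""
    query = """
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM orders
        WHERE status = 'completed'
        GROUP BY customer
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "orders", ORDERS_CSV)
    for customer, revenue in revenue_by_customer(conn):
        print(customer, revenue)
```

The explicit `CAST` is there because the loader stores everything as text; a fuller version would infer column types up front.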

The EDA prompt (use case #1) is the one to try first - it is the "first 50 lines" ritual from the top of this article, collapsed into a single paste. Point it at your own file. The closing request for a rerunnable script is what turns a one-off analysis into a monthly report:

code

Run a full EDA on <your-file.csv>: dataset shape, dtypes, null summary,
descriptive statistics, distribution plots, a correlation heatmap, and
top categories by <metric>. Flag any outliers or data-quality issues.
Save all charts as PNG, write a summary of key findings, and save the
analysis as a rerunnable Python script.
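For a sense of what the "rerunnable script" part might contain, here is a minimal pandas sketch of the tabular half of that prompt (shape, dtypes, nulls, descriptive stats, correlations). It is an assumption about the general shape of such a script, not the agent's actual output; plotting is left out to keep it short, and the inline DataFrame stands in for `pd.read_csv("<your-file.csv>")`.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Collect the standard opening checks for a tabular dataset."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "null_counts": df.isna().sum().to_dict(),
        "describe": numeric.describe(),
        "correlation": numeric.corr(),
    }


if __name__ == "__main__":
    # Hypothetical sample data; in practice load your own CSV here.
    df = pd.DataFrame({
        "plan": ["basic", "pro", "basic", "pro"],
        "age": [34, None, 29, 41],
        "mrr": [10.0, 50.0, 12.0, 55.0],
    })
    report = quick_eda(df)
    print("shape:", report["shape"])
    print("nulls:", report["null_counts"])
    print(report["describe"])
```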

Two honest caveats from watching these run. First, verify the output: in the scraping example it returned 5 rows when asked for the top 10 - a two-second fix, but you have to catch it. Second, the model is only as sharp as the context you hand it, which is the whole point of the next section.

Set Up Your CLAUDE.md First (Context Beats Clever Prompts)

Every practitioner lands on the same lesson: Claude Code is only as good as your project directory and file-structure context. Skills carry reusable procedures; your CLAUDE.md carries the project-specific truth. Skip it and you are back to re-explaining your schema every session - the exact problem skills were meant to kill.

A data science CLAUDE.md should include:

Project context and goals - what this analysis or model is actually for.
Data schemas and base tables - the columns, the grain, the join keys.
Common mistakes to avoid - the gotchas that bite every newcomer to this dataset.
Structured output format - how you want reports and analyses shaped.
References to your other .md files - e.g. a nextsteps.md so it knows where the project is heading.

Treat it as living. When you notice Claude getting the same thing wrong more than once, add the correction to the file so it stops repeating it. And you do not have to write it cold - open Claude in the repo and ask it to draft the CLAUDE.md with you.

This pairs directly with the custom EDA skill below: the skill encodes the procedure, the CLAUDE.md encodes the project. Together they are the context a generic ChatGPT upload can never match.

What Data Science Skills Should You Install? (by Workflow Stage)

Stage 1: Exploration & EDA

Skill	What It Does	Source	Install
exploratory-data-analysis	Automated EDA across 200+ scientific file formats. Detects file type, generates markdown reports with quality metrics, statistical summaries, missing value analysis. Includes `eda_analyzer.py` script.	K-Dense-AI/scientific-agent-skills	`npx skills add K-Dense-AI/scientific-agent-skills`
analytics-data-analysis	Expert pandas/numpy/seaborn workflow: vectorized ops, descriptive stats, correlation, pivot tables, visualization templates. 551 installs.	skills.sh	`npx skills add mindrally/skills --skill analytics-data-analysis`
data-analyst	Senior analyst workflow: frame business questions as testable hypotheses, write/validate SQL with CTEs, cohort/funnel/hypothesis testing, What/So What/Now What reporting. 368 installs.	skills.sh	`npx skills add borghei/claude-skills --skill data-analyst`
data-scientist	Full statistical pipeline: EDA, A/B testing, hypothesis testing, causal inference, ML model building, time series forecasting, SHAP/LIME interpretability.	skills.sh	`npx skills add 404kidwiz/claude-supercode-skills --skill data-scientist`

What makes these useful: The data-analyst skill from borghei encodes the "What/So What/Now What" reporting framework. Every analysis output includes the finding, the business impact, and the recommended action. That alone saves 30 minutes per report.

Stage 2: Data Cleaning & Transformation

Skill	What It Does	Source	Install
csv-data-wrangler	CSV processing expert: tool selection by file size (pandas vs polars vs DuckDB vs Spark), encoding/delimiter handling, data cleaning pipelines, SQL querying via DuckDB.	skills.sh	`npx skills add 404kidwiz/claude-supercode-skills --skill csv-data-wrangler`
data-cleaning-pipeline	Systematic cleaning: schema validation, missing value imputation (mean/median/predictive), duplicate removal, outlier detection (IQR/Z-score), type standardization, data quality scoring.	skills.sh	`npx skills add aj-geddes/useful-ai-prompts --skill data-cleaning-pipeline`
data-transform	Universal data transforms with pandas, numpy, sklearn: normalization, scaling (StandardScaler, MinMaxScaler, RobustScaler), encoding (LabelEncoder, OneHotEncoder), reshaping.	Microck/ordinary-claude-skills	Clone repo

What makes these useful: The csv-data-wrangler skill automatically selects the right tool for your file size. Under 1GB? pandas. 1-10GB? polars or DuckDB. Over 10GB? Spark. That decision tree alone prevents the #1 mistake data scientists make: loading a 5GB CSV into pandas and wondering why their laptop catches fire.
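The decision tree described above is simple enough to sketch directly. This is not the skill's own code, just the thresholds as stated in this article; the function names are hypothetical, and treating a gigabyte as 1024³ bytes is an assumption.

```python
import os

GB = 1024 ** 3  # assumption: binary gigabytes


def pick_engine(size_bytes: int) -> str:
    """Map a file size to the tool suggested for it."""
    if size_bytes < 1 * GB:
        return "pandas"
    if size_bytes <= 10 * GB:
        return "polars or DuckDB"
    return "Spark"


def pick_engine_for_file(path: str) -> str:
    """Same decision, driven by the size of a file on disk."""
    return pick_engine(os.path.getsize(path))


if __name__ == "__main__":
    for size in (200 * 1024 ** 2, 5 * GB, 40 * GB):
        print(f"{size / GB:.1f} GB -> {pick_engine(size)}")
```

On-disk size is only a proxy: a CSV usually takes more memory once parsed, so it is worth leaving headroom below each threshold.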

Stage 3: Visualization

Skill	What It Does	Source	Install
scientific-visualization	Publication-quality figures: matplotlib/seaborn/plotly, multi-panel layouts, error bars, significance markers, colorblind-safe palettes. Journal presets for Nature, Science, Cell. Export PDF/EPS/TIFF.	K-Dense-AI/scientific-agent-skills	Same repo
matplotlib (tvhahn)	9 opinionated chart patterns: horizontal bar, violin+strip, lollipop, decision boundary, heatmap, PR/ROC curves. DejaVu Sans, cubehelix/ColorBrewer palettes. Includes 10 source datasets + 29 public datasets.	tvhahn/matplotlib-skill	Clone repo
interactive-dashboard-builder	Self-contained HTML/JS dashboards with Chart.js, filters, interactivity, and professional styling. By Anthropic. 591 installs.	skills.sh	`npx skills add anthropics/knowledge-work-plugins --skill interactive-dashboard-builder`
visualization-expert	Chart selection guide, dashboard design, visual communication best practices. 2.9K installs.	skills.sh	`npx skills add shubhamsaboo/awesome-llm-apps --skill visualization-expert`
developing-with-streamlit	Official Streamlit skill: bundled reference docs for dashboards, themes, layouts, session state, custom components. Auto-discovers project environment. 1.8K installs.	streamlit/agent-skills	`npx skills add streamlit/agent-skills --skill developing-with-streamlit`

What makes these useful: The tvhahn matplotlib skill is remarkable. 9 opinionated chart patterns with journal-ready defaults. It includes 10 bundled datasets and references 29 public datasets. The Nature/Science/Cell presets for the scientific-visualization skill mean no more fighting with rcParams to get publication formatting right.

Stage 4: SQL & Query Generation

Skill	What It Does	Source	Install
sql-queries	Production SQL across 5 major dialects (PostgreSQL, Snowflake, BigQuery, Redshift, Databricks): window functions, CTEs, cohort retention, funnel analysis, `EXPLAIN ANALYZE`. By Anthropic.	skills.sh	`npx skills add anthropics/knowledge-work-plugins --skill sql-queries`
dbt integration	Translates business questions to dbt models: semantic layer first, then SQL modification, model discovery. Builds `ref()` and `source()` properly. Official dbt Labs skill.	dbt-labs/dbt-agent-skills	`npx skills add dbt-labs/dbt-agent-skills`
motherduck-query	DuckDB SQL for MotherDuck: write, validate, optimize analytical queries against cloud DuckDB. Part of 17-skill analytics catalog.	motherduckdb/agent-skills	Clone repo

What makes these useful: Anthropic's sql-queries skill covers 5 dialects with dialect-specific optimization hints. It knows that Snowflake prefers QUALIFY over subqueries, that BigQuery needs UNNEST for arrays, and that Redshift's DISTKEY matters for join performance. The dbt Labs skill is official - it understands ref() and source() natively.

Stage 5: Machine Learning & Modeling

Skill	What It Does	Source	Install
scikit-learn	Comprehensive classical ML: classification, regression, clustering, dimensionality reduction, preprocessing, hyperparameter tuning, production pipelines.	davila7/claude-code-templates	Clone repo
ai-ml-data-science	Full ML engineering: drift detection, leakage prevention, baselines-first model selection, slice analysis. LightGBM, CatBoost, sklearn, PyTorch, Polars.	skills.sh	`npx skills add vasilyu1983/ai-agents-public --skill ai-ml-data-science`
ml-pipeline-workflow	End-to-end MLOps: DAG orchestration (Airflow, Dagster, Kubeflow), experiment tracking, model versioning, deployment automation.	skills.sh	`npx skills add wshobson/agents --skill ml-pipeline-workflow`

The ai-ml-data-science skill enforces "baselines first" - it always starts with a simple model (logistic regression, random forest) before reaching for gradient boosting or neural nets. It also checks for data leakage automatically. Skipping the baseline and missing leakage are two of the most time-wasting mistakes in data science work.
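One cheap leakage check is to look for a feature that perfectly determines the target. The sketch below is a rough heuristic written for illustration, not how the skill itself does it: it flags any low-cardinality feature whose every value maps to exactly one target value. The function name, the cardinality cutoff, and the sample rows are all assumptions.

```python
from collections import defaultdict


def suspicious_features(rows: list, target: str,
                        max_cardinality_ratio: float = 0.5) -> list:
    """Return features whose values perfectly determine the target.

    High-cardinality columns (IDs, timestamps) trivially "predict" the
    target, so they are skipped using max_cardinality_ratio.
    """
    if not rows:
        return []
    flagged = []
    for feature in (key for key in rows[0] if key != target):
        targets_by_value = defaultdict(set)
        for row in rows:
            targets_by_value[row[feature]].add(row[target])
        deterministic = all(len(seen) == 1 for seen in targets_by_value.values())
        low_cardinality = len(targets_by_value) <= max_cardinality_ratio * len(rows)
        if deterministic and low_cardinality:
            flagged.append(feature)
    return flagged


if __name__ == "__main__":
    # Hypothetical churn data: cancel_reason_filled is only set after churn.
    rows = [
        {"customer_id": 1, "plan": "basic", "cancel_reason_filled": True, "churned": 1},
        {"customer_id": 2, "plan": "basic", "cancel_reason_filled": False, "churned": 0},
        {"customer_id": 3, "plan": "pro", "cancel_reason_filled": False, "churned": 0},
        {"customer_id": 4, "plan": "pro", "cancel_reason_filled": True, "churned": 1},
        {"customer_id": 5, "plan": "basic", "cancel_reason_filled": True, "churned": 1},
        {"customer_id": 6, "plan": "pro", "cancel_reason_filled": False, "churned": 0},
    ]
    print(suspicious_features(rows, target="churned"))
```

A flagged column is a prompt to ask when that value is recorded relative to the outcome, not proof of leakage.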

Stage 6: Jupyter Notebooks

Skill	What It Does	Source	Install
jupyter-notebook	Clean, reproducible notebooks. Bundled templates + helper script to avoid JSON mistakes. Structured sections, metadata, reproducibility patterns. Official OpenAI skill. 2.4K installs.	openai/skills	`npx skills add openai/skills --skill jupyter-notebook`
working-in-notebooks	Jupyter, JupyterLab, and marimo notebook workflows.	legout/data-platform-agent-skills	Clone repo

The OpenAI jupyter-notebook skill is the one to start with. It includes bundled templates and a helper script that prevents the #1 notebook mistake: corrupting the .ipynb JSON structure by hand-editing cells.


Stage 7: Data Pipelines & ETL

Skill	What It Does	Source	Install
building-data-pipelines	Core batch ETL with Polars, DuckDB, PyArrow.	legout/data-platform-agent-skills	Clone repo
building-streaming-pipelines	Streaming ETL and real-time data processing.	legout/data-platform-agent-skills	Same repo
data-engineering-data-pipeline	ETL/ELT, Lambda, Kappa, Lakehouse architectures. Airflow/Prefect orchestration, dbt/Spark transforms, Delta Lake/Iceberg storage.	skills.sh	`npx skills add rmyndharis/antigravity-skills`
assuring-data-pipelines	Data quality and observability: Great Expectations, OpenTelemetry.	legout/data-platform-agent-skills	Same repo

The legout repo is a complete data platform in skill form: 14 skills covering ETL, streaming, quality, notebooks, ML, and app building. If you build data pipelines (not just analyze data), install this entire collection.

Stage 8: Document Output (xlsx, pdf, pptx, docx)

Source: anthropics/skills (Official, bundled with Claude)

Install:

code

/plugin install document-skills@anthropic-agent-skills

xlsx: Programmatic Excel generation. Multi-sheet workbooks with pivot tables, charts, conditional formatting. Eliminates the "can you put that in a spreadsheet?" request.
pdf: Extract text, tables, and structured data from research papers. Parse vendor reports and compliance documents.
pptx: Generate the entire stakeholder presentation from your analysis. Charts, key metrics, methodology, recommendations.
docx: Formal analysis reports with executive summary, methodology section, results tables, and appendices.

Stage 9: Scientific Research (144 skills)

K-Dense-AI/scientific-agent-skills is the largest scientific skill library at 27.9K stars. It covers:

70+ Python package skills: RDKit, Scanpy, BioPython, PyTorch Lightning, and more
78+ scientific databases: UniProt, PDB, ChEMBL, TCGA, and more
Domain-specific workflows: RNA-seq pipelines, single-cell analysis, drug discovery, molecular dynamics, geospatial science, time series forecasting, quantum computing (PennyLane, Qiskit)

If you do computational biology, chemistry, or physics, this repo alone is worth the install.

Which Repos Have the Best Data Science Skills?

Repo	Stars	Skills	Focus
K-Dense-AI/scientific-agent-skills	27.9K	144	Biology, chemistry, physics, genomics, drug discovery
davila7/claude-code-templates	27.8K	20+ scientific	Scientific Python suite (matplotlib, sklearn, dask, scipy)
anthropics/skills	149K	21+	Document skills (xlsx, pdf, docx, pptx) + dashboard builder + SQL
legout/data-platform-agent-skills	-	14	Data engineering platform (ETL, ML, notebooks, quality)
motherduckdb/agent-skills	-	17	DuckDB/MotherDuck analytics workflows
tvhahn/matplotlib-skill	-	1	Publication-quality matplotlib (9 chart patterns, 39 datasets)
streamlit/agent-skills	-	1	Official Streamlit dashboard skill
dbt-labs/dbt-agent-skills	-	1+	Official dbt integration

What Is Missing? Building a Custom EDA Skill

Generic EDA skills exist (see Stage 1 above). But the biggest value comes from building one tuned to YOUR data. Here is the template:

yaml

---
name: eda
description: Run exploratory data analysis on tabular datasets. Use when the 
  user loads a CSV, Parquet, or database table and wants to understand its 
  structure, distributions, missing values, correlations, and anomalies. 
  Produces a formatted analysis report with visualizations.
dependencies: python>=3.10, pandas>=2.0, matplotlib, seaborn, scipy
---

What the skill should encode:

Schema discovery. dtypes, cardinality, sample values for every column. Classify columns as numeric, categorical, datetime, identifier, or free text.
Missing value analysis. Not just counts. Patterns of missingness (MCAR, MAR, MNAR). Recommendations for imputation strategy.
Distribution analysis. Histograms for numerics, value counts for categoricals, time series plots for datetime columns. Flag skew, outliers, and multimodal distributions.
Correlation analysis. Numeric correlations (Pearson and Spearman). Categorical associations (Cramér's V). Feature importance ranking.
Data quality checks. Duplicates, impossible values (negative ages, future dates), inconsistent categories ("US" vs "United States" vs "USA").
Gotchas list. Domain-specific corrections: "Revenue columns in this company are always in cents, not dollars." "The 'status' column uses legacy codes: 1=active, 2=churned, 3=paused."
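Of the checks listed above, Cramér's V is the one people most often have to look up. It is defined as V = sqrt(chi² / (n · (min(r, c) − 1))), where chi² is the chi-squared statistic of the r × c contingency table. Below is a dependency-free sketch of the uncorrected version (no bias correction); the function name and sample values are illustrative, and a production skill would more likely lean on scipy.

```python
import math
from collections import Counter


def cramers_v(xs: list, ys: list) -> float:
    """Uncorrected Cramér's V between two categorical sequences (0 to 1)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    joint = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    k = min(len(x_totals), len(y_totals))
    if k < 2:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    country = ["US", "US", "DE", "DE"]
    print(cramers_v(country, ["a", "a", "b", "b"]))  # perfectly associated
    print(cramers_v(country, ["a", "b", "a", "b"]))  # independent
```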

Why build it yourself instead of waiting for the ecosystem: Your data dictionary is your competitive moat. A generic EDA skill runs the same analysis on every dataset. Your custom EDA skill knows that mrr means monthly recurring revenue, that plan_id=3 is the Enterprise tier, and that the created_at column has a known gap from the migration on 2025-03-15.

Use the skill-creator meta-skill to scaffold it:

code

"Create an EDA skill that analyzes tabular data. It should use pandas for 
data loading, seaborn for visualizations, and scipy for statistical tests. 
Include a gotchas list for common data quality issues."

Then customize the gotchas list with your domain knowledge. That is the skill nobody else can build for you.

How Do You Stack Skills for a Data Science Workflow?

code

Data ingestion:
  → composio (connect to databases, APIs, cloud storage)
  → sql-queries (Anthropic - 5 SQL dialects with optimization hints)
  → dbt integration (official dbt Labs - semantic layer, ref/source)
  → pdf skill (extract tables from research papers and reports)

Exploration & cleaning:
  → exploratory-data-analysis (K-Dense-AI - 200+ file formats)
  → data-analyst (borghei - What/So What/Now What reporting)
  → csv-data-wrangler (auto-selects pandas vs polars vs DuckDB by file size)
  → data-cleaning-pipeline (schema validation, imputation, dedup)

Analysis & modeling:
  → data-scientist (hypothesis testing, causal inference, SHAP/LIME)
  → scikit-learn (davila7 - full classical ML reference)
  → ai-ml-data-science (drift detection, leakage prevention, MLOps)

Visualization & dashboards:
  → scientific-visualization (publication-quality, journal presets)
  → matplotlib (tvhahn - 9 chart patterns, 39 datasets)
  → developing-with-streamlit (official Streamlit skill)
  → interactive-dashboard-builder (Anthropic - HTML/JS dashboards)

Reporting & output:
  → xlsx skill (multi-sheet workbooks, pivot tables)
  → pptx skill (stakeholder presentations from results)
  → docx skill (formal analysis reports)
  → jupyter-notebook (official OpenAI - clean reproducible notebooks)

Once the stack runs end to end, the next move is running it without you sitting there. Wire the whole chain into a scheduled agent that reruns the EDA, retrains the model, and regenerates the report every night, and you have gone from prompting Claude to loop engineering: designing the loop that ships the work while you sleep.

How Does Jupyter Integration Work Today?

Claude Code operates in the terminal, not inside Jupyter. But the workflows compose well:

Claude Code generates the notebook. Ask it to create a .ipynb file with your analysis pipeline. It writes the cells, installs the dependencies, and structures the notebook with markdown explanations.
Claude Code generates scripts that notebooks import. Put your data transforms, feature engineering, and model evaluation code in .py files. Import them from notebooks. Claude Code maintains the Python files; you interact with the analysis through Jupyter.
Claude Code processes notebook output. Export results from Jupyter as CSVs or DataFrames saved to disk. Claude Code picks them up for report generation (xlsx, pptx, docx).

The gap is real-time notebook interaction. Claude Code cannot execute cells inside a running Jupyter kernel. But for the bookends (setup and reporting), skills eliminate the manual work.
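For the first workflow (generating the notebook), it helps to know that an .ipynb file is plain JSON. The sketch below builds a minimal notebook from a list of cells using only the standard library. It targets nbformat 4.4, which does not require per-cell ids; the helper names are hypothetical, and a real skill may well use the `nbformat` package instead.

```python
import json
from pathlib import Path


def make_notebook(cells: list) -> dict:
    """Build a minimal nbformat-4 notebook from (cell_type, source) pairs."""
    notebook_cells = []
    for cell_type, source in cells:
        if cell_type not in ("markdown", "code"):
            raise ValueError(f"unsupported cell type: {cell_type}")
        cell = {
            "cell_type": cell_type,
            "metadata": {},
            "source": source.splitlines(keepends=True),
        }
        if cell_type == "code":
            # Code cells carry two extra required fields.
            cell["execution_count"] = None
            cell["outputs"] = []
        notebook_cells.append(cell)
    return {"cells": notebook_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path: str, cells: list) -> None:
    """Serialize the notebook so Jupyter can open it."""
    Path(path).write_text(json.dumps(make_notebook(cells), indent=1), encoding="utf-8")


if __name__ == "__main__":
    notebook = make_notebook([
        ("markdown", "# Churn analysis"),
        ("code", "import pandas as pd\ndf = pd.read_csv('data.csv')"),
    ])
    print(json.dumps(notebook, indent=1))
```

Generating the JSON programmatically, rather than editing it by hand, is what keeps the file structurally valid.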

Which MCP Servers Pair With Data Science Skills?

Skills tell the agent what to do. MCP servers give it access to data. The combination is where the real power is.

MCP Server	What It Connects To	Pairs With
Postgres MCP	PostgreSQL databases	EDA skill, scientific-agent-skills
BigQuery MCP	Google BigQuery	xlsx skill for report generation
Snowflake MCP	Snowflake data warehouse	Any analysis skill
S3 MCP	AWS S3 buckets	pdf skill for document processing
Composio	100+ apps (Stripe, Salesforce, HubSpot)	All data skills


Start Here

Install the Document Skills first (/plugin install document-skills@anthropic-agent-skills). Generate an Excel dashboard from your next analysis. Then build a custom EDA skill with your team's data dictionary and domain gotchas. That is the skill nobody else can build for you.

Want to build your own data science skills from scratch? The skills in this guide are starting points. The real value is encoding YOUR data dictionary, YOUR analysis patterns, YOUR domain-specific gotchas. Agent Skills 101 covers the full SKILL.md spec, progressive loading architecture, security best practices, and how to build skills that work across Claude Code, Cursor, and 50+ other agents. 25 lessons.


Frequently Asked Questions

Can Claude Code replace Jupyter notebooks?

No. Claude Code and Jupyter serve different purposes. Jupyter is an interactive exploration environment with rich output rendering. Claude Code is a task-execution agent that generates, modifies, and processes files. They work best together: Claude Code generates notebooks and scripts, Jupyter executes and visualizes, Claude Code processes the results into reports.

What about large datasets?

Claude Code processes files on your local machine. It can handle datasets that fit in memory using pandas, or use chunked processing for larger files. For datasets that require distributed compute (Spark, Dask), use Claude Code to generate the pipeline code, then execute it in your cluster environment.
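As a sketch of what chunked processing can look like, the example below aggregates a CSV in fixed-size pieces with pandas' `chunksize` option, so only one chunk is in memory at a time. The function name, column names, and sample data are made up for illustration.

```python
import io

import pandas as pd


def chunked_sum_by_key(csv_source, key: str, value: str,
                       chunksize: int = 100_000) -> pd.Series:
    """Sum `value` per `key` without loading the whole file at once."""
    totals = None
    reader = pd.read_csv(csv_source, usecols=[key, value], chunksize=chunksize)
    for chunk in reader:
        partial = chunk.groupby(key)[value].sum()
        # Align on the key; keys missing from one side count as zero.
        totals = partial if totals is None else totals.add(partial, fill_value=0)
    return totals if totals is not None else pd.Series(dtype="float64")


if __name__ == "__main__":
    # Hypothetical data; pass a file path instead of StringIO for a real file.
    sample = io.StringIO("plan,mrr\nbasic,10\npro,50\nbasic,12\npro,55\nbasic,8\n")
    print(chunked_sum_by_key(sample, key="plan", value="mrr", chunksize=2))
```

This works for aggregations that combine across chunks (sums, counts, min/max); medians and exact distinct counts need a different approach.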

Do I need to know Python to use data science skills?

Not for the document skills (xlsx, pdf, pptx, docx) - those produce output files directly. For skills that generate Python code (scientific-agent-skills, custom EDA), basic Python literacy helps for reviewing and debugging the output. But the skill handles the code generation; you describe what you want in natural language.

Can skills connect to my company's databases?

Not directly. Skills are instruction files. Use an MCP server (Postgres MCP, BigQuery MCP, Snowflake MCP) to provide the database connection. The skill tells the agent what analysis to run; the MCP server provides the data access.

Will Claude Code catch data problems like leakage or bad joins?

Sometimes, and impressively. In documented workflows it flagged a data-leakage risk when one feature perfectly predicted the target, and when a dataset had no refund status it said so and stated that it was using 'canceled' as a proxy. But it also makes quiet mistakes - returning 5 rows when asked for 10, for example - so treat it as a sharp junior analyst whose work you still review, not an oracle. A useful audit trick: open a fresh session in the same project directory and ask it to list every assumption made in the project in an assumptions.md file.

How do I share data science skills with my team?

Commit the skill folder to your repo's .claude/skills/ or .cursor/skills/ directory. Everyone who clones the repo gets the skill. For org-wide skills, use the Claude Plugins Marketplace or a shared GitHub repo. Your data dictionary and domain-specific gotchas should be in the skill - they are the most valuable part to share.

Sources & Verification

This guide is written from hands-on testing, then cross-checked against primary sources - official documentation and first-party announcements. Field results and opinions are labeled as such. See our editorial standards.

Anthropic Agent Skills Repository (GitHub) - Official document skills (xlsx, pdf, pptx, docx) and example skills
K-Dense-AI/scientific-agent-skills (GitHub) - Research, science, and finance workflow skills
composiohq/composio (GitHub) - 100+ app connectors with managed authentication
Agent Skills Specification - The open standard for portable AI agent capabilities
VoltAgent/awesome-agent-skills (GitHub) - Directory of 1,424+ curated skills
How I Use Claude Code as a Data Scientist (10 Real Use Cases) - Ryan & Matt Data Science (YouTube) - 10 single-prompt data science workflows: EDA, cleaning, SQL, feature engineering, A/B testing, ML pipeline, Streamlit
Claude Code for Data Scientists in 9 minutes - Priya (Senior DS at Stripe) (YouTube) - Practitioner workflow: CLAUDE.md context, assumptions.md self-audit, business translation, saving tasks as skills
