├── requirements.txt
├── csv-data-summarizer.zip
├── .gitignore
├── resources
│   ├── sample.csv
│   └── README.md
├── SKILL.md
├── examples
│   └── showcase_financial_pl_data.csv
├── analyze.py
└── README.md
/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas>=2.0.0
2 | matplotlib>=3.7.0
3 | seaborn>=0.12.0
4 |
5 |
--------------------------------------------------------------------------------
/csv-data-summarizer.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/coffeefuelbump/csv-data-summarizer-claude-skill/HEAD/csv-data-summarizer.zip
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Python
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | *.so
6 | .Python
7 | env/
8 | venv/
9 | .venv/
10 |
11 | # Distribution / packaging
12 | *.egg-info/
13 | dist/
14 | build/
15 |
16 | # Jupyter Notebook
17 | .ipynb_checkpoints
18 |
19 | # IDE
20 | .vscode/
21 | .idea/
22 | *.swp
23 | *.swo
24 | *~
25 |
26 | # Project specific
27 | *.png
28 | *.jpg
29 | *.jpeg
30 | chart.png
31 | correlation_heatmap.png
32 | time_series_analysis.png
33 | distributions.png
34 | categorical_distributions.png
35 |
36 | # Allow the skill zip file to be committed
37 | !csv-data-summarizer.zip
38 |
39 | # OS
40 | .DS_Store
41 | Thumbs.db
42 |
43 |
--------------------------------------------------------------------------------
/resources/sample.csv:
--------------------------------------------------------------------------------
1 | date,product,quantity,revenue,customer_id,region
2 | 2024-01-15,Widget A,5,129.99,C001,North
3 | 2024-01-16,Widget B,3,89.97,C002,South
4 | 2024-01-17,Widget A,7,181.98,C003,East
5 | 2024-01-18,Widget C,2,199.98,C001,North
6 | 2024-01-19,Widget B,4,119.96,C004,West
7 | 2024-01-20,Widget A,6,155.94,C005,South
8 | 2024-01-21,Widget C,1,99.99,C002,South
9 | 2024-01-22,Widget B,8,239.92,C006,East
10 | 2024-01-23,Widget A,3,77.97,C007,North
11 | 2024-01-24,Widget C,5,499.95,C003,East
12 | 2024-01-25,Widget B,2,59.98,C008,West
13 | 2024-01-26,Widget A,9,233.91,C004,West
14 | 2024-01-27,Widget C,3,299.97,C009,North
15 | 2024-01-28,Widget B,6,179.94,C010,South
16 | 2024-01-29,Widget A,4,103.96,C005,South
17 | 2024-01-30,Widget C,7,699.93,C011,East
18 | 2024-01-31,Widget B,5,149.95,C012,West
19 | 2024-02-01,Widget A,8,207.92,C013,North
20 | 2024-02-02,Widget C,2,199.98,C014,South
21 | 2024-02-03,Widget B,10,299.90,C015,East
22 |
23 |
--------------------------------------------------------------------------------
/resources/README.md:
--------------------------------------------------------------------------------
1 | # CSV Data Summarizer - Resources
2 |
3 | ---
4 |
5 | ## 🌟 Connect & Learn More
6 |
7 |
8 |
9 | ### 🚀 **Join Our Community**
10 | [Join the community on Skool](https://www.skool.com/ai-for-your-business/about)
11 |
12 | ### 🔗 **All My Links**
13 | [All my links on Linktree](https://linktr.ee/corbin_brown)
14 |
15 | ### 🛠️ **Become a Builder**
16 | [Become a builder on YouTube](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
17 |
18 | ### 🐦 **Follow on Twitter**
19 | [Follow @corbin_braun on Twitter](https://twitter.com/corbin_braun)
20 |
21 |
22 |
23 | ---
24 |
25 | ## Sample Data
26 |
27 | The `sample.csv` file contains example sales data with the following columns:
28 |
29 | - **date**: Transaction date
30 | - **product**: Product name (Widget A, B, or C)
31 | - **quantity**: Number of items sold
32 | - **revenue**: Total revenue from the transaction
33 | - **customer_id**: Unique customer identifier
34 | - **region**: Geographic region (North, South, East, West)
35 |
36 | ## Usage Examples
37 |
38 | ### Basic Summary
39 | ```
40 | Analyze sample.csv
41 | ```
42 |
43 | ### With Custom CSV
44 | ```
45 | Here's my sales_data.csv file. Can you summarize it?
46 | ```
47 |
48 | ### Focus on Specific Insights
49 | ```
50 | What are the revenue trends in this dataset?
51 | ```
52 |
53 | ## Testing the Skill
54 |
55 | You can test the skill locally before uploading to Claude (run these commands from the `resources/` directory):
56 |
57 | ```bash
58 | # Install dependencies
59 | pip install -r ../requirements.txt
60 |
61 | # Run the analysis
62 | python ../analyze.py sample.csv
63 | ```
64 |
65 | ## Expected Output
66 |
67 | The analysis will provide:
68 |
69 | 1. **Dataset dimensions** - Row and column counts
70 | 2. **Column information** - Names and data types
71 | 3. **Summary statistics** - Mean, median, std dev, min/max for numeric columns
72 | 4. **Data quality** - Missing value detection and counts
73 | 5. **Visualizations** - Time-series plots, correlation heatmaps, distribution histograms, and categorical charts, depending on the columns present
74 |
75 | ## Customization
76 |
77 | To adapt this skill for your specific use case:
78 |
79 | 1. Modify `analyze.py` to include domain-specific calculations (see the sketch below)
80 | 2. Add custom visualization types in the plotting section
81 | 3. Include validation rules specific to your data
82 | 4. Add more sample datasets to test different scenarios
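For example, a domain-specific calculation could be appended to the summary. A minimal sketch, assuming the `revenue` and `quantity` columns from `sample.csv` and the `summary` list built inside `summarize_csv()`:

```python
# Hypothetical extension to summarize_csv() in analyze.py:
# report average revenue per unit when the expected columns exist.
if {'revenue', 'quantity'}.issubset(df.columns):
    revenue_per_unit = df['revenue'].sum() / df['quantity'].sum()
    summary.append(f"Average revenue per unit: ${revenue_per_unit:.2f}")
```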
83 |
84 |
--------------------------------------------------------------------------------
/SKILL.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: csv-data-summarizer
3 | description: Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
4 | metadata:
5 | version: 2.1.0
6 | dependencies: python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0
7 | ---
8 |
9 | # CSV Data Summarizer
10 |
11 | This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.
12 |
13 | ## When to Use This Skill
14 |
15 | Claude should use this Skill whenever the user:
16 | - Uploads or references a CSV file
17 | - Asks to summarize, analyze, or visualize tabular data
18 | - Requests insights from CSV data
19 | - Wants to understand data structure and quality
20 |
21 | ## How It Works
22 |
23 | ### ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️
24 |
25 | **DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
26 | **DO NOT OFFER OPTIONS OR CHOICES.**
27 | **DO NOT SAY "What would you like me to help you with?"**
28 | **DO NOT LIST POSSIBLE ANALYSES.**
29 |
30 | **IMMEDIATELY AND AUTOMATICALLY:**
31 | 1. Run the comprehensive analysis
32 | 2. Generate ALL relevant visualizations
33 | 3. Present complete results
34 | 4. NO questions, NO options, NO waiting for user input
35 |
36 | **THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**
37 |
38 | ### Automatic Analysis Steps
39 |
40 | **The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.**
41 |
42 | 1. **Load and inspect** the CSV file into a pandas DataFrame
43 | 2. **Identify data structure** - column types, date columns, numeric columns, categories
44 | 3. **Determine relevant analyses** based on what's actually in the data:
45 | - **Sales/E-commerce data** (order dates, revenue, products): Time-series trends, revenue analysis, product performance
46 | - **Customer data** (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
47 | - **Financial data** (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
48 | - **Operational data** (timestamps, metrics, status): Time-series, performance metrics, distributions
49 | - **Survey data** (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
50 | - **Generic tabular data**: Adapts based on column types found
51 |
52 | 4. **Only create visualizations that make sense** for the specific dataset (see the sketch after this list):
53 | - Time-series plots ONLY if date/timestamp columns exist
54 | - Correlation heatmaps ONLY if multiple numeric columns exist
55 | - Category distributions ONLY if categorical columns exist
56 | - Histograms for numeric distributions when relevant
57 |
58 | 5. **Generate comprehensive output** automatically including:
59 | - Data overview (rows, columns, types)
60 | - Key statistics and metrics relevant to the data type
61 | - Missing data analysis
62 | - Multiple relevant visualizations (only those that apply)
63 | - Actionable insights based on patterns found in THIS specific dataset
64 |
65 | 6. **Present everything** in one complete analysis - no follow-up questions
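Concretely, the visualization gating condenses to a few checks on column types (a sketch of the logic in `analyze.py`):

```python
numeric_cols = df.select_dtypes(include='number').columns.tolist()
categorical_cols = [c for c in df.select_dtypes(include='object').columns
                    if 'id' not in c.lower()]  # skip identifier-like columns
date_cols = [c for c in df.columns if 'date' in c.lower() or 'time' in c.lower()]

if len(numeric_cols) > 1:        # heatmap needs 2+ numeric columns
    ...                          # correlation heatmap
if date_cols and numeric_cols:   # trends need a date/time column
    ...                          # time-series plots
if categorical_cols:             # frequency charts for categories
    ...                          # bar charts of top values
```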
66 |
67 | **Example adaptations:**
68 | - Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
69 | - Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis
70 | - Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
71 | - Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
72 |
73 | ### Behavior Guidelines
74 |
75 | ✅ **CORRECT APPROACH - SAY THIS:**
76 | - "I'll analyze this data comprehensively right now."
77 | - "Here's the complete analysis with visualizations:"
78 | - "I've identified this as [type] data and generated relevant insights:"
79 | - Then IMMEDIATELY show the full analysis
80 |
81 | ✅ **DO:**
82 | - Immediately run the analysis script
83 | - Generate ALL relevant charts automatically
84 | - Provide complete insights without being asked
85 | - Be thorough and complete in first response
86 | - Act decisively without asking permission
87 |
88 | ❌ **NEVER SAY THESE PHRASES:**
89 | - "What would you like to do with this data?"
90 | - "What would you like me to help you with?"
91 | - "Here are some common options:"
92 | - "Let me know what you'd like help with"
93 | - "I can create a comprehensive analysis if you'd like!"
94 | - Any sentence ending with "?" asking for user direction
95 | - Any list of options or choices
96 | - Any conditional "I can do X if you want"
97 |
98 | ❌ **FORBIDDEN BEHAVIORS:**
99 | - Asking what the user wants
100 | - Listing options for the user to choose from
101 | - Waiting for user direction before analyzing
102 | - Providing partial analysis that requires follow-up
103 | - Describing what you COULD do instead of DOING it
104 |
105 | ### Usage
106 |
107 | The Skill provides a Python function `summarize_csv(file_path)` that:
108 | - Accepts a path to a CSV file
109 | - Returns a comprehensive text summary with statistics
110 | - Generates multiple visualizations automatically based on data structure
111 |
112 | ### Example Prompts
113 |
114 | > "Here's `sales_data.csv`. Can you summarize this file?"
115 |
116 | > "Analyze this customer data CSV and show me trends."
117 |
118 | > "What insights can you find in `orders.csv`?"
119 |
120 | ### Example Output
121 |
122 | **Dataset Overview**
123 | - 5,000 rows × 8 columns
124 | - 3 numeric columns, 1 date column
125 |
126 | **Summary Statistics**
127 | - Average order value: $58.20
128 | - Standard deviation: $12.40
129 | - Missing values: 2% (100 cells)
130 |
131 | **Insights**
132 | - Sales show upward trend over time
133 | - Peak activity in Q4
134 | *(Attached: trend plot)*
135 |
136 | ## Files
137 |
138 | - `analyze.py` - Core analysis logic
139 | - `requirements.txt` - Python dependencies
140 | - `resources/sample.csv` - Example dataset for testing
141 | - `resources/README.md` - Additional documentation
142 |
143 | ## Notes
144 |
145 | - Automatically detects date/time columns (column names containing 'date' or 'time')
146 | - Handles missing data gracefully
147 | - Generates time-series plots only when date/time columns are present; other charts depend on the column types found
148 | - All numeric columns are included in statistical summary
149 |
150 |
--------------------------------------------------------------------------------
/examples/showcase_financial_pl_data.csv:
--------------------------------------------------------------------------------
1 | month,year,quarter,product_line,total_revenue,cost_of_goods_sold,gross_profit,gross_margin_pct,marketing_expense,sales_expense,rd_expense,admin_expense,total_operating_expenses,operating_income,operating_margin_pct,interest_expense,tax_expense,net_income,net_margin_pct,customer_acquisition_cost,customer_lifetime_value,units_sold,avg_selling_price,headcount,revenue_per_employee
2 | Jan,2023,Q1,SaaS Platform,450000,135000,315000,70.0,65000,85000,45000,35000,230000,85000,18.9,5000,16000,64000,14.2,125,2400,1200,375,45,10000
3 | Jan,2023,Q1,Enterprise Solutions,280000,112000,168000,60.0,35000,55000,25000,20000,135000,33000,11.8,3000,6600,23400,8.4,450,8500,450,622,45,6222
4 | Jan,2023,Q1,Professional Services,125000,50000,75000,60.0,15000,22000,8000,12000,57000,18000,14.4,1500,3600,12900,10.3,200,3200,95,1316,45,2778
5 | Feb,2023,Q1,SaaS Platform,475000,142500,332500,70.0,68000,89000,47000,36000,240000,92500,19.5,5200,18500,68800,14.5,120,2500,1300,365,47,10106
6 | Feb,2023,Q1,Enterprise Solutions,295000,118000,177000,60.0,38000,58000,27000,22000,145000,32000,10.8,3200,6400,22400,7.6,440,8600,470,628,47,6277
7 | Feb,2023,Q1,Professional Services,135000,54000,81000,60.0,16000,24000,9000,13000,62000,19000,14.1,1600,3800,13600,10.1,195,3300,105,1286,47,2872
8 | Mar,2023,Q1,SaaS Platform,520000,156000,364000,70.0,75000,95000,52000,40000,262000,102000,19.6,5500,19250,77250,14.9,115,2650,1450,359,50,10400
9 | Mar,2023,Q1,Enterprise Solutions,325000,130000,195000,60.0,42000,63000,30000,25000,160000,35000,10.8,3500,7000,24500,7.5,425,8800,520,625,50,6500
10 | Mar,2023,Q1,Professional Services,148000,59200,88800,60.0,18000,26000,10000,14000,68000,20800,14.1,1800,4160,14840,10.0,190,3400,115,1287,50,2960
11 | Apr,2023,Q2,SaaS Platform,555000,166500,388500,70.0,80000,100000,55000,42000,277000,111500,20.1,5800,22300,83400,15.0,110,2750,1550,358,52,10673
12 | Apr,2023,Q2,Enterprise Solutions,340000,136000,204000,60.0,45000,65000,32000,26000,168000,36000,10.6,3700,7200,25100,7.4,420,9000,540,630,52,6538
13 | Apr,2023,Q2,Professional Services,158000,63200,94800,60.0,19000,27000,11000,15000,72000,22800,14.4,1900,4560,16340,10.3,185,3500,125,1264,52,3038
14 | May,2023,Q2,SaaS Platform,590000,177000,413000,70.0,85000,105000,58000,44000,292000,121000,20.5,6000,24200,90800,15.4,105,2850,1650,358,55,10727
15 | May,2023,Q2,Enterprise Solutions,365000,146000,219000,60.0,48000,68000,35000,28000,179000,40000,11.0,4000,8000,28000,7.7,410,9200,580,629,55,6636
16 | May,2023,Q2,Professional Services,172000,68800,103200,60.0,21000,29000,12000,16000,78000,25200,14.7,2100,5040,18060,10.5,180,3600,135,1274,55,3127
17 | Jun,2023,Q2,SaaS Platform,625000,187500,437500,70.0,90000,110000,62000,46000,308000,129500,20.7,6200,25850,97450,15.6,100,2950,1750,357,58,10776
18 | Jun,2023,Q2,Enterprise Solutions,385000,154000,231000,60.0,50000,70000,37000,29000,186000,45000,11.7,4200,9000,31800,8.3,400,9400,610,631,58,6638
19 | Jun,2023,Q2,Professional Services,185000,74000,111000,60.0,22000,31000,13000,17000,83000,28000,15.1,2200,5580,20220,10.9,175,3700,145,1276,58,3190
20 | Jul,2023,Q3,SaaS Platform,665000,199500,465500,70.0,95000,115000,65000,48000,323000,142500,21.4,6500,28500,107500,16.2,95,3050,1850,359,60,11083
21 | Jul,2023,Q3,Enterprise Solutions,410000,164000,246000,60.0,53000,73000,40000,31000,197000,49000,12.0,4400,9800,34800,8.5,390,9600,650,631,60,6833
22 | Jul,2023,Q3,Professional Services,198000,79200,118800,60.0,24000,33000,14000,18000,89000,29800,15.1,2400,5960,21440,10.8,170,3800,155,1277,60,3300
23 | Aug,2023,Q3,SaaS Platform,705000,211500,493500,70.0,100000,120000,68000,50000,338000,155500,22.1,6800,31100,117600,16.7,90,3150,1950,362,63,11190
24 | Aug,2023,Q3,Enterprise Solutions,435000,174000,261000,60.0,56000,76000,42000,33000,207000,54000,12.4,4600,10800,38600,8.9,380,9800,690,630,63,6905
25 | Aug,2023,Q3,Professional Services,210000,84000,126000,60.0,25000,35000,15000,19000,94000,32000,15.2,2500,6400,23100,11.0,165,3900,165,1273,63,3333
26 | Sep,2023,Q3,SaaS Platform,750000,225000,525000,70.0,108000,128000,72000,53000,361000,164000,21.9,7200,33360,123440,16.5,88,3250,2080,360,65,11538
27 | Sep,2023,Q3,Enterprise Solutions,465000,186000,279000,60.0,60000,80000,45000,35000,220000,59000,12.7,5000,11800,42200,9.1,370,10000,735,633,65,7154
28 | Sep,2023,Q3,Professional Services,225000,90000,135000,60.0,27000,37000,16000,20000,100000,35000,15.6,2700,6920,25380,11.3,160,4000,175,1286,65,3462
29 | Oct,2023,Q4,SaaS Platform,795000,238500,556500,70.0,115000,135000,75000,55000,380000,176500,22.2,7500,35870,133130,16.7,85,3350,2200,361,68,11691
30 | Oct,2023,Q4,Enterprise Solutions,490000,196000,294000,60.0,63000,83000,47000,36000,229000,65000,13.3,5200,13000,46800,9.6,360,10200,770,636,68,7206
31 | Oct,2023,Q4,Professional Services,238000,95200,142800,60.0,29000,39000,17000,21000,106000,36800,15.5,2800,7360,26640,11.2,158,4100,185,1286,68,3500
32 | Nov,2023,Q4,SaaS Platform,840000,252000,588000,70.0,122000,142000,78000,58000,400000,188000,22.4,7800,38440,141760,16.9,82,3450,2320,362,70,12000
33 | Nov,2023,Q4,Enterprise Solutions,520000,208000,312000,60.0,67000,87000,50000,38000,242000,70000,13.5,5500,14100,50400,9.7,355,10400,815,638,70,7429
34 | Nov,2023,Q4,Professional Services,252000,100800,151200,60.0,31000,41000,18000,22000,112000,39200,15.6,3000,7728,28472,11.3,155,4200,195,1292,70,3600
35 | Dec,2023,Q4,SaaS Platform,895000,268500,626500,70.0,130000,150000,82000,62000,424000,202500,22.6,8200,41145,153155,17.1,80,3550,2480,361,72,12431
36 | Dec,2023,Q4,Enterprise Solutions,555000,222000,333000,60.0,72000,92000,53000,40000,257000,76000,13.7,6000,15400,54600,9.8,350,10600,870,638,72,7708
37 | Dec,2023,Q4,Professional Services,268000,107200,160800,60.0,33000,43000,19000,23000,118000,42800,16.0,3200,8352,31248,11.7,152,4300,205,1307,72,3722
38 | Jan,2024,Q1,SaaS Platform,925000,277500,647500,70.0,135000,155000,85000,64000,439000,208500,22.5,8500,42070,157930,17.1,78,3650,2550,363,75,12333
39 | Jan,2024,Q1,Enterprise Solutions,575000,230000,345000,60.0,75000,95000,55000,42000,267000,78000,13.6,6200,15760,56040,9.7,345,10800,900,639,75,7667
40 | Jan,2024,Q1,Professional Services,280000,112000,168000,60.0,34000,45000,20000,24000,123000,45000,16.1,3300,8770,32930,11.8,150,4400,215,1302,75,3733
41 | Feb,2024,Q1,SaaS Platform,965000,289500,675500,70.0,140000,160000,88000,66000,454000,221500,23.0,8800,44510,168190,17.4,75,3750,2660,363,77,12532
42 | Feb,2024,Q1,Enterprise Solutions,600000,240000,360000,60.0,78000,98000,57000,43000,276000,84000,14.0,6400,16800,60800,10.1,340,11000,940,638,77,7792
43 | Feb,2024,Q1,Professional Services,295000,118000,177000,60.0,36000,47000,21000,25000,129000,48000,16.3,3500,9420,35080,11.9,148,4500,225,1311,77,3831
44 | Mar,2024,Q1,SaaS Platform,1020000,306000,714000,70.0,148000,168000,92000,69000,477000,237000,23.2,9200,47880,179920,17.6,73,3850,2810,363,80,12750
45 | Mar,2024,Q1,Enterprise Solutions,635000,254000,381000,60.0,82000,103000,60000,45000,290000,91000,14.3,6800,18200,66000,10.4,335,11200,990,641,80,7938
46 | Mar,2024,Q1,Professional Services,312000,124800,187200,60.0,38000,49000,22000,26000,135000,52200,16.7,3700,10230,38270,12.3,145,4600,240,1300,80,3900
47 |
--------------------------------------------------------------------------------
/analyze.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import matplotlib.pyplot as plt
3 | import seaborn as sns
4 | import sys
5 |
6 | def summarize_csv(file_path):
7 |     """
8 |     Comprehensively analyzes a CSV file and generates multiple visualizations.
9 |
10 |     Args:
11 |         file_path (str): Path to the CSV file
12 |
13 |     Returns:
14 |         str: Formatted comprehensive analysis of the dataset
15 |     """
16 |     df = pd.read_csv(file_path)
17 |     summary = []
18 |     charts_created = []
19 |
20 |     # Basic info
21 |     summary.append("=" * 60)
22 |     summary.append("📊 DATA OVERVIEW")
23 |     summary.append("=" * 60)
24 |     summary.append(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
25 |     summary.append(f"\nColumns: {', '.join(df.columns.tolist())}")
26 |
27 |     # Data types
28 |     summary.append("\n📋 DATA TYPES:")
29 |     for col, dtype in df.dtypes.items():
30 |         summary.append(f" • {col}: {dtype}")
31 |
32 |     # Missing data analysis
33 |     missing = df.isnull().sum().sum()
34 |     missing_pct = (missing / (df.shape[0] * df.shape[1])) * 100
35 |     summary.append("\n🔍 DATA QUALITY:")
36 |     if missing:
37 |         summary.append(f"Missing values: {missing:,} ({missing_pct:.2f}% of total data)")
38 |         summary.append("Missing by column:")
39 |         for col in df.columns:
40 |             col_missing = df[col].isnull().sum()
41 |             if col_missing > 0:
42 |                 col_pct = (col_missing / len(df)) * 100
43 |                 summary.append(f" • {col}: {col_missing:,} ({col_pct:.1f}%)")
44 |     else:
45 |         summary.append("✓ No missing values - dataset is complete!")
46 |
47 |     # Numeric analysis
48 |     numeric_cols = df.select_dtypes(include='number').columns.tolist()
49 |     if numeric_cols:
50 |         summary.append("\n📈 NUMERICAL ANALYSIS:")
51 |         summary.append(str(df[numeric_cols].describe()))
52 |
53 |         # Correlations if multiple numeric columns
54 |         if len(numeric_cols) > 1:
55 |             summary.append("\n🔗 CORRELATIONS:")
56 |             corr_matrix = df[numeric_cols].corr()
57 |             summary.append(str(corr_matrix))
58 |
59 |             # Create correlation heatmap
60 |             plt.figure(figsize=(10, 8))
61 |             sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0,
62 |                         square=True, linewidths=1)
63 |             plt.title('Correlation Heatmap')
64 |             plt.tight_layout()
65 |             plt.savefig('correlation_heatmap.png', dpi=150)
66 |             plt.close()
67 |             charts_created.append('correlation_heatmap.png')
68 |
69 |     # Categorical analysis (identifier-like columns are skipped)
70 |     categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
71 |     categorical_cols = [c for c in categorical_cols if 'id' not in c.lower()]
72 |
73 |     if categorical_cols:
74 |         summary.append("\n📊 CATEGORICAL ANALYSIS:")
75 |         for col in categorical_cols[:5]:  # Limit to first 5
76 |             value_counts = df[col].value_counts()
77 |             summary.append(f"\n{col}:")
78 |             for val, count in value_counts.head(10).items():
79 |                 pct = (count / len(df)) * 100
80 |                 summary.append(f" • {val}: {count:,} ({pct:.1f}%)")
81 |
82 |     # Time series analysis
83 |     date_cols = [c for c in df.columns if 'date' in c.lower() or 'time' in c.lower()]
84 |     if date_cols:
85 |         summary.append("\n📅 TIME SERIES ANALYSIS:")
86 |         date_col = date_cols[0]
87 |         df[date_col] = pd.to_datetime(df[date_col], errors='coerce')
88 |
89 |         date_range = df[date_col].max() - df[date_col].min()
90 |         summary.append(f"Date range: {df[date_col].min()} to {df[date_col].max()}")
91 |         summary.append(f"Span: {date_range.days} days")
92 |
93 |         # Create time-series plots for numeric columns
94 |         if numeric_cols:
95 |             fig, axes = plt.subplots(min(3, len(numeric_cols)), 1,
96 |                                      figsize=(12, 4 * min(3, len(numeric_cols))))
97 |             if len(numeric_cols) == 1:
98 |                 axes = [axes]
99 |
100 |             for idx, num_col in enumerate(numeric_cols[:3]):
101 |                 ax = axes[idx]
102 |                 daily_data = df.groupby(date_col)[num_col].agg(['mean', 'sum', 'count'])
103 |                 daily_data['mean'].plot(ax=ax, label='Average', linewidth=2)
104 |                 ax.set_title(f'{num_col} Over Time')
105 |                 ax.set_xlabel('Date')
106 |                 ax.set_ylabel(num_col)
107 |                 ax.legend()
108 |                 ax.grid(True, alpha=0.3)
109 |
110 |             plt.tight_layout()
111 |             plt.savefig('time_series_analysis.png', dpi=150)
112 |             plt.close()
113 |             charts_created.append('time_series_analysis.png')
114 |
115 |     # Distribution plots for numeric columns (up to four, in a 2x2 grid)
116 |     if numeric_cols:
117 |         n_cols = min(4, len(numeric_cols))
118 |         fig, axes = plt.subplots(2, 2, figsize=(12, 10))
119 |         axes = axes.flatten()
120 |
121 |         for idx, col in enumerate(numeric_cols[:n_cols]):
122 |             axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)
123 |             axes[idx].set_title(f'Distribution of {col}')
124 |             axes[idx].set_xlabel(col)
125 |             axes[idx].set_ylabel('Frequency')
126 |             axes[idx].grid(True, alpha=0.3)
127 |
128 |         # Hide unused subplots
129 |         for idx in range(n_cols, 4):
130 |             axes[idx].set_visible(False)
131 |
132 |         plt.tight_layout()
133 |         plt.savefig('distributions.png', dpi=150)
134 |         plt.close()
135 |         charts_created.append('distributions.png')
136 |
137 |     # Categorical distributions
138 |     if categorical_cols:
139 |         fig, axes = plt.subplots(2, 2, figsize=(14, 10))
140 |         axes = axes.flatten()
141 |
142 |         for idx, col in enumerate(categorical_cols[:4]):
143 |             value_counts = df[col].value_counts().head(10)
144 |             axes[idx].barh(range(len(value_counts)), value_counts.values)
145 |             axes[idx].set_yticks(range(len(value_counts)))
146 |             axes[idx].set_yticklabels(value_counts.index)
147 |             axes[idx].set_title(f'Top Values in {col}')
148 |             axes[idx].set_xlabel('Count')
149 |             axes[idx].grid(True, alpha=0.3, axis='x')
150 |
151 |         # Hide unused subplots
152 |         for idx in range(len(categorical_cols[:4]), 4):
153 |             axes[idx].set_visible(False)
154 |
155 |         plt.tight_layout()
156 |         plt.savefig('categorical_distributions.png', dpi=150)
157 |         plt.close()
158 |         charts_created.append('categorical_distributions.png')
159 |
160 |     # Summary of visualizations
161 |     if charts_created:
162 |         summary.append("\n📊 VISUALIZATIONS CREATED:")
163 |         for chart in charts_created:
164 |             summary.append(f" ✓ {chart}")
165 |
166 |     summary.append("\n" + "=" * 60)
167 |     summary.append("✅ COMPREHENSIVE ANALYSIS COMPLETE")
168 |     summary.append("=" * 60)
169 |
170 |     return "\n".join(summary)
171 |
172 |
173 | if __name__ == "__main__":
174 |     # Analyze the CSV given on the command line, or fall back to the sample
175 |     if len(sys.argv) > 1:
176 |         file_path = sys.argv[1]
177 |     else:
178 |         file_path = "resources/sample.csv"
179 |
180 |     print(summarize_csv(file_path))
181 |
182 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | [Join the AI for Your Business community on Skool](https://www.skool.com/ai-for-your-business)
4 | [GitHub: coffeefuelbump](https://github.com/coffeefuelbump)
5 |
6 | [Linktree: corbin_brown](https://linktr.ee/corbin_brown)
7 | [YouTube: Become a Builder](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
8 |
9 |
10 |
11 | ---
12 |
13 | # 📊 CSV Data Summarizer - Claude Skill
14 |
15 | A powerful Claude Skill that automatically analyzes CSV files and generates comprehensive insights with visualizations. Upload any CSV and get instant, intelligent analysis without being asked what you want!
16 |
17 |
18 |
19 | [GitHub repository](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill)
20 | [Python 3.8+](https://www.python.org/)
21 | [MIT License](LICENSE)
22 |
23 |
24 |
25 | ## 🚀 Features
26 |
27 | - **🤖 Intelligent & Adaptive** - Automatically detects data type (sales, customer, financial, survey, etc.) and applies relevant analysis
28 | - **📈 Comprehensive Analysis** - Generates statistics, correlations, distributions, and trends
29 | - **🎨 Auto Visualizations** - Creates multiple charts based on what's in your data:
30 | - Time-series plots for date-based data
31 | - Correlation heatmaps for numeric relationships
32 | - Distribution histograms
33 | - Categorical breakdowns
34 | - **⚡ Proactive** - No questions asked! Just upload CSV and get complete analysis immediately
35 | - **🔍 Data Quality Checks** - Automatically detects and reports missing values
36 | - **📊 Multi-Industry Support** - Adapts to e-commerce, healthcare, finance, operations, surveys, and more
37 |
38 | ## 📥 Quick Download
39 |
40 |
41 |
42 | ### Get Started in 2 Steps
43 |
44 | **1️⃣ Download the Skill**
45 | [Download csv-data-summarizer.zip](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/csv-data-summarizer.zip)
46 |
47 | **2️⃣ Try the Demo Data**
48 | [Download showcase_financial_pl_data.csv](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/examples/showcase_financial_pl_data.csv)
49 |
50 |
51 |
52 | ---
53 |
54 | ## 📦 What's Included
55 |
56 | ```
57 | csv-data-summarizer-claude-skill/
58 | ├── SKILL.md # Claude Skill definition
59 | ├── analyze.py # Comprehensive analysis engine
60 | ├── requirements.txt # Python dependencies
61 | ├── examples/
62 | │ └── showcase_financial_pl_data.csv # Demo P&L financial dataset (15 months, 25 metrics)
63 | └── resources/
64 | ├── sample.csv # Example dataset
65 | └── README.md # Usage documentation
66 | ```
67 |
68 | ## 🎯 How It Works
69 |
70 | 1. **Upload** any CSV file to Claude.ai
71 | 2. **Skill activates** automatically when CSV is detected
72 | 3. **Analysis runs** immediately - inspects data structure and adapts
73 | 4. **Results delivered** - Complete analysis with multiple visualizations
74 |
75 | No prompting needed. No options to choose. Just instant, comprehensive insights!
76 |
77 | ## 📥 Installation
78 |
79 | ### For Claude.ai Users
80 |
81 | 1. Download the latest release: [`csv-data-summarizer.zip`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/releases)
82 | 2. Go to [Claude.ai](https://claude.ai) → Settings → Capabilities → Skills
83 | 3. Upload the zip file
84 | 4. Enable the skill
85 | 5. Done! Upload any CSV and watch it work ✨
86 |
87 | ### For Developers
88 |
89 | ```bash
90 | git clone git@github.com:coffeefuelbump/csv-data-summarizer-claude-skill.git
91 | cd csv-data-summarizer-claude-skill
92 | pip install -r requirements.txt
93 | ```
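Then run the analyzer against the demo dataset; the summary prints to the terminal and charts are saved as PNG files in the current directory:

```bash
python analyze.py examples/showcase_financial_pl_data.csv
```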
94 |
95 | ## 📊 Sample Dataset Highlights
96 |
97 | The included demo CSV contains **15 months of P&L data** with:
98 | - 3 product lines (SaaS, Enterprise, Services)
99 | - 25 financial metrics including revenue, expenses, margins, CAC, LTV
100 | - Quarterly trends showing business growth
101 | - Perfect for showcasing time-series analysis, correlations, and financial insights
102 |
103 | ## 🎨 Example Use Cases
104 |
105 | - **📊 Sales Data** → Revenue trends, product performance, regional analysis
106 | - **👥 Customer Data** → Demographics, segmentation, geographic patterns
107 | - **💰 Financial Data** → Transaction analysis, trend detection, correlations
108 | - **⚙️ Operational Data** → Performance metrics, time-series analysis
109 | - **📋 Survey Data** → Response distributions, cross-tabulations
110 |
111 | ## 🛠️ Technical Details
112 |
113 | **Dependencies:**
114 | - Python 3.8+
115 | - pandas 2.0+
116 | - matplotlib 3.7+
117 | - seaborn 0.12+
118 |
119 | **Visualizations Generated:**
120 | - Time-series trend plots
121 | - Correlation heatmaps
122 | - Distribution histograms
123 | - Categorical bar charts
124 |
125 | ## 📝 Example Output
126 |
127 | ```
128 | ============================================================
129 | 📊 DATA OVERVIEW
130 | ============================================================
131 | Rows: 100 | Columns: 15
132 |
133 | 📋 DATA TYPES:
134 | • order_date: object
135 | • total_revenue: float64
136 | • customer_segment: object
137 | ...
138 |
139 | 🔍 DATA QUALITY:
140 | ✓ No missing values - dataset is complete!
141 |
142 | 📈 NUMERICAL ANALYSIS:
143 | [Summary statistics for all numeric columns]
144 |
145 | 🔗 CORRELATIONS:
146 | [Correlation matrix showing relationships]
147 |
148 | 📅 TIME SERIES ANALYSIS:
149 | Date range: 2024-01-05 to 2024-04-11
150 | Span: 97 days
151 |
152 | 📊 VISUALIZATIONS CREATED:
153 | ✓ correlation_heatmap.png
154 | ✓ time_series_analysis.png
155 | ✓ distributions.png
156 | ✓ categorical_distributions.png
157 | ```
158 |
159 | ## 🌟 Connect & Learn More
160 |
161 |
162 |
163 | [Join the AI for Your Business community on Skool](https://www.skool.com/ai-for-your-business/about)
164 |
165 | [Linktree: corbin_brown](https://linktr.ee/corbin_brown)
166 |
167 | [YouTube: Become a Builder](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
168 |
169 | [Twitter: @corbin_braun](https://twitter.com/corbin_braun)
170 |
171 |
172 |
173 | ## 🤝 Contributing
174 |
175 | Contributions are welcome! Feel free to:
176 | - Report bugs
177 | - Suggest new features
178 | - Submit pull requests
179 | - Share your use cases
180 |
181 | ## 📄 License
182 |
183 | MIT License - feel free to use this skill for personal or commercial projects!
184 |
185 | ## 🙏 Acknowledgments
186 |
187 | Built for [Anthropic's Claude Skills platform](https://www.anthropic.com/news/skills).
188 |
189 | ---
190 |
191 |
192 |
193 | **Made with ❤️ for the AI community**
194 |
195 | ⭐ Star this repo if you find it useful!
196 |
197 |
198 |
199 |
--------------------------------------------------------------------------------