├── requirements.txt
├── csv-data-summarizer.zip
├── .gitignore
├── resources
    ├── sample.csv
    └── README.md
├── SKILL.md
├── examples
    └── showcase_financial_pl_data.csv
├── analyze.py
└── README.md


/requirements.txt:
--------------------------------------------------------------------------------
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0


--------------------------------------------------------------------------------
/csv-data-summarizer.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/coffeefuelbump/csv-data-summarizer-claude-skill/HEAD/csv-data-summarizer.zip
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv/

# Distribution / packaging
*.egg-info/
dist/
build/

# Jupyter Notebook
.ipynb_checkpoints

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Project specific
*.png
*.jpg
*.jpeg
chart.png
correlation_heatmap.png
time_series_analysis.png
distributions.png
categorical_distributions.png

# Allow the skill zip file to be committed
!csv-data-summarizer.zip

# OS
.DS_Store
Thumbs.db


--------------------------------------------------------------------------------
/resources/sample.csv:
--------------------------------------------------------------------------------
date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
2024-01-18,Widget C,2,199.98,C001,North
2024-01-19,Widget B,4,119.96,C004,West
2024-01-20,Widget A,6,155.94,C005,South
2024-01-21,Widget C,1,99.99,C002,South
2024-01-22,Widget B,8,239.92,C006,East
2024-01-23,Widget A,3,77.97,C007,North
2024-01-24,Widget C,5,499.95,C003,East
2024-01-25,Widget B,2,59.98,C008,West
2024-01-26,Widget A,9,233.91,C004,West
2024-01-27,Widget C,3,299.97,C009,North
2024-01-28,Widget B,6,179.94,C010,South
2024-01-29,Widget A,4,103.96,C005,South
2024-01-30,Widget C,7,699.93,C011,East
2024-01-31,Widget B,5,149.95,C012,West
2024-02-01,Widget A,8,207.92,C013,North
2024-02-02,Widget C,2,199.98,C014,South
2024-02-03,Widget B,10,299.90,C015,East


--------------------------------------------------------------------------------
/resources/README.md:
--------------------------------------------------------------------------------
# CSV Data Summarizer - Resources

---

## 🌟 Connect & Learn More

### 🚀 **Join Our Community**
[![Join AI Community](https://img.shields.io/badge/Join-AI%20Community%20(FREE)-blue?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJ3aGl0ZSI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6bTAgM2MxLjY2IDAgMyAxLjM0IDMgM3MtMS4zNCAzLTMgMy0zLTEuMzQtMy0zIDEuMzQtMyAzLTN6bTAgMTQuMmMtMi41IDAtNC43MS0xLjI4LTYtMy4yMi4wMy0xLjk5IDQtMy4wOCA2LTMuMDggMS45OSAwIDUuOTcgMS4wOSA2IDMuMDgtMS4yOSAxLjk0LTMuNSAzLjIyLTYgMy4yMnoiLz48L3N2Zz4=)](https://www.skool.com/ai-for-your-business/about)

### 🔗 **All My Links**
[![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)

### 🛠️ **Become a Builder**
[![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

### 🐦 **Follow on Twitter**
[![Twitter Follow](https://img.shields.io/badge/Twitter-Follow%20@corbin__braun-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/corbin_braun)

---

## Sample Data

The `sample.csv` file contains example sales data with the following columns:

- **date**: Transaction date
- **product**: Product name (Widget A, B, or C)
- **quantity**: Number of items sold
- **revenue**: Total revenue from the transaction
- **customer_id**: Unique customer identifier
- **region**: Geographic region (North, South, East, West)

## Usage Examples

### Basic Summary
```
Analyze sample.csv
```

### With Custom CSV
```
Here's my sales_data.csv file. Can you summarize it?
```

### Focus on Specific Insights
```
What are the revenue trends in this dataset?
```

## Testing the Skill

You can test the skill locally before uploading it to Claude. Run these commands from the `resources/` directory; the chart images are saved to the current working directory:

```bash
# Install dependencies
pip install -r ../requirements.txt

# Run the analysis
python ../analyze.py sample.csv
```

## Expected Output

The analysis will provide:

1. **Dataset dimensions** - Row and column counts
2. **Column information** - Names and data types
3. **Summary statistics** - Count, mean, std dev, min/max, and quartiles (the 50% value is the median) for numeric columns
4. **Data quality** - Missing value detection and counts
5. **Correlations** - A correlation matrix when there are two or more numeric columns
6. **Categorical breakdowns** - Top values for text columns (ID-like columns are skipped)
7. **Visualizations** - Charts matched to the columns found: a correlation heatmap (two or more numeric columns), histograms (numeric columns), bar charts (categorical columns), and time-series plots (a date column)

## Customization

To adapt this skill for your specific use case:

1. Modify `analyze.py` to include domain-specific calculations
2. Add custom visualization types in the plotting section
3. Include validation rules specific to your data
4. Add more sample datasets to test different scenarios


--------------------------------------------------------------------------------
/SKILL.md:
--------------------------------------------------------------------------------
---
name: csv-data-summarizer
description: Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
metadata:
  version: 2.1.0
  dependencies: python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0
---

# CSV Data Summarizer

This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.

## When to Use This Skill

Claude should use this Skill whenever the user:
- Uploads or references a CSV file
- Asks to summarize, analyze, or visualize tabular data
- Requests insights from CSV data
- Wants to understand data structure and quality

## How It Works

### ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️

**DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
**DO NOT OFFER OPTIONS OR CHOICES.**
**DO NOT SAY "What would you like me to help you with?"**
**DO NOT LIST POSSIBLE ANALYSES.**

**IMMEDIATELY AND AUTOMATICALLY:**
1. Run the comprehensive analysis
2. Generate ALL relevant visualizations
3. Present complete results
4. NO questions, NO options, NO waiting for user input

**THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**

### Automatic Analysis Steps:

**The skill adapts to different data types and industries by inspecting the data first, then determining which analyses are most relevant.**

1. **Load and inspect** the CSV file as a pandas DataFrame
2. **Identify data structure** - column types, date columns, numeric columns, categories
3. **Determine relevant analyses** based on what's actually in the data:
   - **Sales/E-commerce data** (order dates, revenue, products): Time-series trends, revenue analysis, product performance
   - **Customer data** (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
   - **Financial data** (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
   - **Operational data** (timestamps, metrics, status): Time-series, performance metrics, distributions
   - **Survey data** (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
   - **Generic tabular data**: Adapts based on column types found

4. **Only create visualizations that make sense** for the specific dataset:
   - Time-series plots ONLY if date/timestamp columns exist
   - Correlation heatmaps ONLY if multiple numeric columns exist
   - Category distributions ONLY if categorical columns exist
   - Histograms for numeric distributions when relevant

5. **Generate comprehensive output** automatically including:
   - Data overview (rows, columns, types)
   - Key statistics and metrics relevant to the data type
   - Missing data analysis
   - Multiple relevant visualizations (only those that apply)
   - Actionable insights based on patterns found in THIS specific dataset

6. **Present everything** in one complete analysis - no follow-up questions

**Example adaptations:**
- Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
- Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis
- Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
- Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns

### Behavior Guidelines

✅ **CORRECT APPROACH - SAY THIS:**
- "I'll analyze this data comprehensively right now."
- "Here's the complete analysis with visualizations:"
- "I've identified this as [type] data and generated relevant insights:"
- Then IMMEDIATELY show the full analysis

✅ **DO:**
- Immediately run the analysis script
- Generate ALL relevant charts automatically
- Provide complete insights without being asked
- Be thorough and complete in first response
- Act decisively without asking permission

❌ **NEVER SAY THESE PHRASES:**
- "What would you like to do with this data?"
- "What would you like me to help you with?"
- "Here are some common options:"
- "Let me know what you'd like help with"
- "I can create a comprehensive analysis if you'd like!"
- Any sentence ending with "?" asking for user direction
- Any list of options or choices
- Any conditional "I can do X if you want"

❌ **FORBIDDEN BEHAVIORS:**
- Asking what the user wants
- Listing options for the user to choose from
- Waiting for user direction before analyzing
- Providing partial analysis that requires follow-up
- Describing what you COULD do instead of DOING it

### Usage

The Skill provides a Python function `summarize_csv(file_path)` that:
- Accepts a path to a CSV file
- Returns a comprehensive text summary with statistics
- Saves charts as PNG files in the current working directory, choosing which ones to create based on the data structure

### Example Prompts

> "Here's `sales_data.csv`. Can you summarize this file?"

> "Analyze this customer data CSV and show me trends."

> "What insights can you find in `orders.csv`?"

### Example Output

**Dataset Overview**
- 5,000 rows × 8 columns
- 3 numeric columns, 1 date column

**Summary Statistics**
- Average order value: $58.2
- Standard deviation: $12.4
- Missing values: 0.25% (100 of 40,000 cells)

**Insights**
- Sales show upward trend over time
- Peak activity in Q4
*(Attached: trend plot)*

## Files

- `analyze.py` - Core analysis logic
- `requirements.txt` - Python dependencies
- `resources/sample.csv` - Example dataset for testing
- `resources/README.md` - Additional documentation

## Notes

- Detects date columns from their names (names mentioning "date" or "time", such as `order_date`) and parses the first match with `pd.to_datetime`
- Reports missing values per column; unparseable dates become `NaT`, and missing numbers are dropped from histograms
- Creates time-series plots only when a date column is present; the correlation heatmap, histograms, and categorical bar charts depend on the numeric and categorical columns found
- All numeric columns are included in the statistical summary (time-series plots cover the first three, histograms the first four)


--------------------------------------------------------------------------------
/examples/showcase_financial_pl_data.csv:
--------------------------------------------------------------------------------
month,year,quarter,product_line,total_revenue,cost_of_goods_sold,gross_profit,gross_margin_pct,marketing_expense,sales_expense,rd_expense,admin_expense,total_operating_expenses,operating_income,operating_margin_pct,interest_expense,tax_expense,net_income,net_margin_pct,customer_acquisition_cost,customer_lifetime_value,units_sold,avg_selling_price,headcount,revenue_per_employee
Jan,2023,Q1,SaaS Platform,450000,135000,315000,70.0,65000,85000,45000,35000,230000,85000,18.9,5000,16000,64000,14.2,125,2400,1200,375,45,10000
Jan,2023,Q1,Enterprise Solutions,280000,112000,168000,60.0,35000,55000,25000,20000,135000,33000,11.8,3000,6600,23400,8.4,450,8500,450,622,45,6222
Jan,2023,Q1,Professional Services,125000,50000,75000,60.0,15000,22000,8000,12000,57000,18000,14.4,1500,3600,12900,10.3,200,3200,95,1316,45,2778
Feb,2023,Q1,SaaS Platform,475000,142500,332500,70.0,68000,89000,47000,36000,240000,92500,19.5,5200,18500,68800,14.5,120,2500,1300,365,47,10106
Feb,2023,Q1,Enterprise Solutions,295000,118000,177000,60.0,38000,58000,27000,22000,145000,32000,10.8,3200,6400,22400,7.6,440,8600,470,628,47,6277
Feb,2023,Q1,Professional Services,135000,54000,81000,60.0,16000,24000,9000,13000,62000,19000,14.1,1600,3800,13600,10.1,195,3300,105,1286,47,2872
Mar,2023,Q1,SaaS Platform,520000,156000,364000,70.0,75000,95000,52000,40000,262000,102000,19.6,5500,19250,77250,14.9,115,2650,1450,359,50,10400
Mar,2023,Q1,Enterprise Solutions,325000,130000,195000,60.0,42000,63000,30000,25000,160000,35000,10.8,3500,7000,24500,7.5,425,8800,520,625,50,6500
Mar,2023,Q1,Professional Services,148000,59200,88800,60.0,18000,26000,10000,14000,68000,20800,14.1,1800,4160,14840,10.0,190,3400,115,1287,50,2960
Apr,2023,Q2,SaaS Platform,555000,166500,388500,70.0,80000,100000,55000,42000,277000,111500,20.1,5800,22300,83400,15.0,110,2750,1550,358,52,10673
Apr,2023,Q2,Enterprise Solutions,340000,136000,204000,60.0,45000,65000,32000,26000,168000,36000,10.6,3700,7200,25100,7.4,420,9000,540,630,52,6538
Apr,2023,Q2,Professional Services,158000,63200,94800,60.0,19000,27000,11000,15000,72000,22800,14.4,1900,4560,16340,10.3,185,3500,125,1264,52,3038
May,2023,Q2,SaaS Platform,590000,177000,413000,70.0,85000,105000,58000,44000,292000,121000,20.5,6000,24200,90800,15.4,105,2850,1650,358,55,10727
May,2023,Q2,Enterprise Solutions,365000,146000,219000,60.0,48000,68000,35000,28000,179000,40000,11.0,4000,8000,28000,7.7,410,9200,580,629,55,6636
May,2023,Q2,Professional Services,172000,68800,103200,60.0,21000,29000,12000,16000,78000,25200,14.7,2100,5040,18060,10.5,180,3600,135,1274,55,3127
Jun,2023,Q2,SaaS Platform,625000,187500,437500,70.0,90000,110000,62000,46000,308000,129500,20.7,6200,25850,97450,15.6,100,2950,1750,357,58,10776
Jun,2023,Q2,Enterprise Solutions,385000,154000,231000,60.0,50000,70000,37000,29000,186000,45000,11.7,4200,9000,31800,8.3,400,9400,610,631,58,6638
Jun,2023,Q2,Professional Services,185000,74000,111000,60.0,22000,31000,13000,17000,83000,28000,15.1,2200,5580,20220,10.9,175,3700,145,1276,58,3190
Jul,2023,Q3,SaaS Platform,665000,199500,465500,70.0,95000,115000,65000,48000,323000,142500,21.4,6500,28500,107500,16.2,95,3050,1850,359,60,11083
Jul,2023,Q3,Enterprise Solutions,410000,164000,246000,60.0,53000,73000,40000,31000,197000,49000,12.0,4400,9800,34800,8.5,390,9600,650,631,60,6833
Jul,2023,Q3,Professional Services,198000,79200,118800,60.0,24000,33000,14000,18000,89000,29800,15.1,2400,5960,21440,10.8,170,3800,155,1277,60,3300
Aug,2023,Q3,SaaS Platform,705000,211500,493500,70.0,100000,120000,68000,50000,338000,155500,22.1,6800,31100,117600,16.7,90,3150,1950,362,63,11190
Aug,2023,Q3,Enterprise Solutions,435000,174000,261000,60.0,56000,76000,42000,33000,207000,54000,12.4,4600,10800,38600,8.9,380,9800,690,630,63,6905
Aug,2023,Q3,Professional Services,210000,84000,126000,60.0,25000,35000,15000,19000,94000,32000,15.2,2500,6400,23100,11.0,165,3900,165,1273,63,3333
Sep,2023,Q3,SaaS Platform,750000,225000,525000,70.0,108000,128000,72000,53000,361000,164000,21.9,7200,33360,123440,16.5,88,3250,2080,360,65,11538
Sep,2023,Q3,Enterprise Solutions,465000,186000,279000,60.0,60000,80000,45000,35000,220000,59000,12.7,5000,11800,42200,9.1,370,10000,735,633,65,7154
Sep,2023,Q3,Professional Services,225000,90000,135000,60.0,27000,37000,16000,20000,100000,35000,15.6,2700,6920,25380,11.3,160,4000,175,1286,65,3462
Oct,2023,Q4,SaaS Platform,795000,238500,556500,70.0,115000,135000,75000,55000,380000,176500,22.2,7500,35870,133130,16.7,85,3350,2200,361,68,11691
Oct,2023,Q4,Enterprise Solutions,490000,196000,294000,60.0,63000,83000,47000,36000,229000,65000,13.3,5200,13000,46800,9.6,360,10200,770,636,68,7206
Oct,2023,Q4,Professional Services,238000,95200,142800,60.0,29000,39000,17000,21000,106000,36800,15.5,2800,7360,26640,11.2,158,4100,185,1286,68,3500
Nov,2023,Q4,SaaS Platform,840000,252000,588000,70.0,122000,142000,78000,58000,400000,188000,22.4,7800,38440,141760,16.9,82,3450,2320,362,70,12000
Nov,2023,Q4,Enterprise Solutions,520000,208000,312000,60.0,67000,87000,50000,38000,242000,70000,13.5,5500,14100,50400,9.7,355,10400,815,638,70,7429
Nov,2023,Q4,Professional Services,252000,100800,151200,60.0,31000,41000,18000,22000,112000,39200,15.6,3000,7728,28472,11.3,155,4200,195,1292,70,3600
Dec,2023,Q4,SaaS Platform,895000,268500,626500,70.0,130000,150000,82000,62000,424000,202500,22.6,8200,41145,153155,17.1,80,3550,2480,361,72,12431
Dec,2023,Q4,Enterprise Solutions,555000,222000,333000,60.0,72000,92000,53000,40000,257000,76000,13.7,6000,15400,54600,9.8,350,10600,870,638,72,7708
Dec,2023,Q4,Professional Services,268000,107200,160800,60.0,33000,43000,19000,23000,118000,42800,16.0,3200,8352,31248,11.7,152,4300,205,1307,72,3722
Jan,2024,Q1,SaaS Platform,925000,277500,647500,70.0,135000,155000,85000,64000,439000,208500,22.5,8500,42070,157930,17.1,78,3650,2550,363,75,12333
Jan,2024,Q1,Enterprise Solutions,575000,230000,345000,60.0,75000,95000,55000,42000,267000,78000,13.6,6200,15760,56040,9.7,345,10800,900,639,75,7667
Jan,2024,Q1,Professional Services,280000,112000,168000,60.0,34000,45000,20000,24000,123000,45000,16.1,3300,8770,32930,11.8,150,4400,215,1302,75,3733
Feb,2024,Q1,SaaS Platform,965000,289500,675500,70.0,140000,160000,88000,66000,454000,221500,23.0,8800,44510,168190,17.4,75,3750,2660,363,77,12532
Feb,2024,Q1,Enterprise Solutions,600000,240000,360000,60.0,78000,98000,57000,43000,276000,84000,14.0,6400,16800,60800,10.1,340,11000,940,638,77,7792
Feb,2024,Q1,Professional Services,295000,118000,177000,60.0,36000,47000,21000,25000,129000,48000,16.3,3500,9420,35080,11.9,148,4500,225,1311,77,3831
Mar,2024,Q1,SaaS Platform,1020000,306000,714000,70.0,148000,168000,92000,69000,477000,237000,23.2,9200,47880,179920,17.6,73,3850,2810,363,80,12750
Mar,2024,Q1,Enterprise Solutions,635000,254000,381000,60.0,82000,103000,60000,45000,290000,91000,14.3,6800,18200,66000,10.4,335,11200,990,641,80,7938
Mar,2024,Q1,Professional Services,312000,124800,187200,60.0,38000,49000,22000,26000,135000,52200,16.7,3700,10230,38270,12.3,145,4600,240,1300,80,3900


--------------------------------------------------------------------------------
/analyze.py:
--------------------------------------------------------------------------------
import re

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def summarize_csv(file_path):
    """
    Comprehensively analyzes a CSV file and generates multiple visualizations.

    Charts are saved as PNG files in the current working directory.

    Args:
        file_path (str): Path to the CSV file

    Returns:
        str: Formatted comprehensive analysis of the dataset
    """
    df = pd.read_csv(file_path)
    summary = []
    charts_created = []

    # Basic info
    summary.append("=" * 60)
    summary.append("📊 DATA OVERVIEW")
    summary.append("=" * 60)
    summary.append(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
    summary.append(f"\nColumns: {', '.join(df.columns.tolist())}")

    # Data types
    summary.append("\n📋 DATA TYPES:")
    for col, dtype in df.dtypes.items():
        summary.append(f"  • {col}: {dtype}")

    # Missing data analysis
    missing = df.isnull().sum().sum()
    missing_pct = (missing / (df.shape[0] * df.shape[1])) * 100
    summary.append("\n🔍 DATA QUALITY:")
    if missing:
        summary.append(f"Missing values: {missing:,} ({missing_pct:.2f}% of total data)")
        summary.append("Missing by column:")
        for col in df.columns:
            col_missing = df[col].isnull().sum()
            if col_missing > 0:
                col_pct = (col_missing / len(df)) * 100
                summary.append(f"  • {col}: {col_missing:,} ({col_pct:.1f}%)")
    else:
        summary.append("✓ No missing values - dataset is complete!")

    # Numeric analysis
    numeric_cols = df.select_dtypes(include='number').columns.tolist()
    if numeric_cols:
        summary.append("\n📈 NUMERICAL ANALYSIS:")
        summary.append(str(df[numeric_cols].describe()))

        # Correlations if multiple numeric columns
        if len(numeric_cols) > 1:
            summary.append("\n🔗 CORRELATIONS:")
            corr_matrix = df[numeric_cols].corr()
            summary.append(str(corr_matrix))

            # Create correlation heatmap
            plt.figure(figsize=(10, 8))
            sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0,
                        square=True, linewidths=1)
            plt.title('Correlation Heatmap')
            plt.tight_layout()
            plt.savefig('correlation_heatmap.png', dpi=150)
            plt.close()
            charts_created.append('correlation_heatmap.png')

    # Date columns: 'date' anywhere in the name, or 'time'/'timestamp' as a whole
    # word, so that names such as 'customer_lifetime_value' are not parsed as dates
    date_cols = [
        c for c in df.columns
        if 'date' in c.lower()
        or {'time', 'timestamp'} & set(re.split(r'[^a-z0-9]+', c.lower()))
    ]

    # Categorical analysis (ID-like and date columns are skipped)
    categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
    categorical_cols = [c for c in categorical_cols
                        if 'id' not in c.lower() and c not in date_cols]

    if categorical_cols:
        summary.append("\n📊 CATEGORICAL ANALYSIS:")
        for col in categorical_cols[:5]:  # Limit to first 5
            value_counts = df[col].value_counts()
            summary.append(f"\n{col}:")
            for val, count in value_counts.head(10).items():
                pct = (count / len(df)) * 100
                summary.append(f"  • {val}: {count:,} ({pct:.1f}%)")

    # Time series analysis (uses the first date column found)
    if date_cols:
        summary.append("\n📅 TIME SERIES ANALYSIS:")
        date_col = date_cols[0]
        df[date_col] = pd.to_datetime(df[date_col], errors='coerce')

        if df[date_col].notna().any():
            date_range = df[date_col].max() - df[date_col].min()
            summary.append(f"Date range: {df[date_col].min()} to {df[date_col].max()}")
            summary.append(f"Span: {date_range.days} days")

            # Create time-series plots for up to three numeric columns
            plot_cols = numeric_cols[:3]
            if plot_cols:
                # squeeze=False always returns a 2-D array of axes
                fig, axes = plt.subplots(len(plot_cols), 1,
                                         figsize=(12, 4 * len(plot_cols)),
                                         squeeze=False)
                for ax, num_col in zip(axes[:, 0], plot_cols):
                    # Average per date, since several rows can share the same date
                    daily_mean = df.groupby(date_col)[num_col].mean()
                    daily_mean.plot(ax=ax, label='Average', linewidth=2)
                    ax.set_title(f'{num_col} Over Time')
                    ax.set_xlabel('Date')
                    ax.set_ylabel(num_col)
                    ax.legend()
                    ax.grid(True, alpha=0.3)

                plt.tight_layout()
                plt.savefig('time_series_analysis.png', dpi=150)
                plt.close()
                charts_created.append('time_series_analysis.png')
        else:
            summary.append(f"Could not parse any dates in '{date_col}'")

    # Distribution plots for up to four numeric columns
    if numeric_cols:
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        axes = axes.flatten()

        for idx, col in enumerate(numeric_cols[:4]):
            axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)
            axes[idx].set_title(f'Distribution of {col}')
            axes[idx].set_xlabel(col)
            axes[idx].set_ylabel('Frequency')
            axes[idx].grid(True, alpha=0.3)

        # Hide unused subplots
        for idx in range(len(numeric_cols[:4]), 4):
            axes[idx].set_visible(False)

        plt.tight_layout()
        plt.savefig('distributions.png', dpi=150)
        plt.close()
        charts_created.append('distributions.png')

    # Bar charts of the top values for up to four categorical columns
    if categorical_cols:
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))
        axes = axes.flatten()

        for idx, col in enumerate(categorical_cols[:4]):
            value_counts = df[col].value_counts().head(10)
            axes[idx].barh(range(len(value_counts)), value_counts.values)
            axes[idx].set_yticks(range(len(value_counts)))
            axes[idx].set_yticklabels(value_counts.index)
            axes[idx].set_title(f'Top Values in {col}')
            axes[idx].set_xlabel('Count')
            axes[idx].grid(True, alpha=0.3, axis='x')

        # Hide unused subplots
        for idx in range(len(categorical_cols[:4]), 4):
            axes[idx].set_visible(False)

        plt.tight_layout()
        plt.savefig('categorical_distributions.png', dpi=150)
        plt.close()
        charts_created.append('categorical_distributions.png')

    # Summary of visualizations
    if charts_created:
        summary.append("\n📊 VISUALIZATIONS CREATED:")
        for chart in charts_created:
            summary.append(f"  ✓ {chart}")

    summary.append("\n" + "=" * 60)
    summary.append("✅ COMPREHENSIVE ANALYSIS COMPLETE")
    summary.append("=" * 60)

    return "\n".join(summary)


if __name__ == "__main__":
    import sys

    # Analyze the CSV given on the command line, or the bundled sample by default
    file_path = sys.argv[1] if len(sys.argv) > 1 else "resources/sample.csv"
    print(summarize_csv(file_path))


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[![Join AI Community](https://img.shields.io/badge/🚀_Join-AI_Community_(FREE)-4F46E5?style=for-the-badge)](https://www.skool.com/ai-for-your-business)
[![GitHub Profile](https://img.shields.io/badge/GitHub-@coffeefuelbump-181717?style=for-the-badge&logo=github)](https://github.com/coffeefuelbump)

[![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)
[![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

---

# 📊 CSV Data Summarizer - Claude Skill

A powerful Claude Skill that automatically analyzes CSV files and generates comprehensive insights with visualizations. Upload any CSV and get instant, intelligent analysis without being asked what you want!

[![Version](https://img.shields.io/badge/version-2.1.0-blue.svg)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill)
[![Python](https://img.shields.io/badge/python-3.8+-green.svg)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-orange.svg)](LICENSE)

## 🚀 Features

- **🤖 Intelligent & Adaptive** - Automatically detects data type (sales, customer, financial, survey, etc.) and applies relevant analysis
- **📈 Comprehensive Analysis** - Generates statistics, correlations, distributions, and trends
- **🎨 Auto Visualizations** - Creates multiple charts based on what's in your data:
  - Time-series plots for date-based data
  - Correlation heatmaps for numeric relationships
  - Distribution histograms
  - Categorical breakdowns
- **⚡ Proactive** - No questions asked! Just upload a CSV and get a complete analysis immediately
- **🔍 Data Quality Checks** - Automatically detects and reports missing values
- **📊 Multi-Industry Support** - Adapts to e-commerce, healthcare, finance, operations, surveys, and more

## 📥 Quick Download

### Get Started in 2 Steps

**1️⃣ Download the Skill**
[![Download Skill](https://img.shields.io/badge/Download-CSV%20Data%20Summarizer%20Skill-blue?style=for-the-badge&logo=download)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/csv-data-summarizer.zip)

**2️⃣ Try the Demo Data**
[![Download Demo CSV](https://img.shields.io/badge/Download-Sample%20P%26L%20Financial%20Data-green?style=for-the-badge&logo=data)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/examples/showcase_financial_pl_data.csv)
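
The demo P&L file has no single date column: its time dimension is split across `month`, `year`, and `quarter`, and a plain substring check for `"time"` also matches `customer_lifetime_value`. The sketch that follows is not part of `analyze.py`; it assumes pandas 2.x and uses two hypothetical helpers, `looks_like_date_column` (matches whole name tokens instead of substrings) and `add_period_column` (builds a hypothetical `period_date` column from `month` and `year`), to show one way to prepare the file for a time-series view.

```python
import re

import pandas as pd

# Hypothetical helper: treat a column as date-like only when "date", "datetime",
# "time", or "timestamp" appears as a whole token of its name.
DATE_TOKENS = {"date", "datetime", "time", "timestamp"}


def looks_like_date_column(name: str) -> bool:
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return any(token in DATE_TOKENS for token in tokens)


def add_period_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a datetime 'period_date' built from 'month' and 'year'."""
    out = df.copy()
    # "Jan" + " " + "2023" -> "Jan 2023", parsed as the first day of that month
    out["period_date"] = pd.to_datetime(
        out["month"] + " " + out["year"].astype(str), format="%b %Y"
    )
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({
        "month": ["Jan", "Feb", "Jan"],
        "year": [2023, 2023, 2024],
        "customer_lifetime_value": [2400, 2500, 3650],
    })
    print([c for c in demo.columns if "time" in c.lower()])        # ['customer_lifetime_value']
    print([c for c in demo.columns if looks_like_date_column(c)])  # []
    dated = add_period_column(demo)
    print(dated["period_date"].dt.strftime("%Y-%m").tolist())      # ['2023-01', '2023-02', '2024-01']
```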

---

## 📦 What's Included

```
csv-data-summarizer-claude-skill/
├── SKILL.md                              # Claude Skill definition
├── analyze.py                            # Comprehensive analysis engine
├── requirements.txt                      # Python dependencies
├── examples/
│   └── showcase_financial_pl_data.csv    # Demo P&L financial dataset (15 months, 25 columns)
└── resources/
    ├── sample.csv                        # Example dataset
    └── README.md                         # Usage documentation
```

## 🎯 How It Works

1. **Upload** any CSV file to Claude.ai
2. **Skill activates** automatically when a CSV is detected
3. **Analysis runs** immediately - inspects the data structure and adapts
4. **Results delivered** - Complete analysis with multiple visualizations

No prompting needed. No options to choose. Just instant, comprehensive insights!

## 📥 Installation

### For Claude.ai Users

1. Download the latest release: [`csv-data-summarizer.zip`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/releases)
2. Go to [Claude.ai](https://claude.ai) → Settings → Capabilities → Skills
3. Upload the zip file
4. Enable the skill
5. Done! Upload any CSV and watch it work ✨

### For Developers

```bash
git clone https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill.git
cd csv-data-summarizer-claude-skill
pip install -r requirements.txt

# Run the analysis locally (charts are saved to the current directory)
python analyze.py resources/sample.csv
```

## 📊 Sample Dataset Highlights

The included demo CSV contains **15 months of P&L data** (January 2023 to March 2024) with:
- 3 product lines (SaaS Platform, Enterprise Solutions, Professional Services)
- 25 columns: month, year, quarter, and product line, plus 21 metrics including revenue, expenses, margins, CAC, and LTV
- Quarterly trends showing business growth (revenue rises every month for all three product lines)
- Well suited to showcasing correlations, distributions, and financial insights; note that time is stored in separate `month`, `year`, and `quarter` columns rather than a single date column

## 🎨 Example Use Cases

- **📊 Sales Data** → Revenue trends, product performance, regional analysis
- **👥 Customer Data** → Demographics, segmentation, geographic patterns
- **💰 Financial Data** → Transaction analysis, trend detection, correlations
- **⚙️ Operational Data** → Performance metrics, time-series analysis
- **📋 Survey Data** → Response distributions, cross-tabulations

## 🛠️ Technical Details

**Dependencies:**
- Python 3.8+
- pandas 2.0+
- matplotlib 3.7+
- seaborn 0.12+

**Visualizations Generated** (saved as PNG files, depending on the columns present):
- Time-series trend plots (`time_series_analysis.png`, when a date column exists)
- Correlation heatmaps (`correlation_heatmap.png`, two or more numeric columns)
- Distribution histograms (`distributions.png`)
- Categorical bar charts (`categorical_distributions.png`)

## 📝 Example Output

```
============================================================
📊 DATA OVERVIEW
============================================================
Rows: 100 | Columns: 15

📋 DATA TYPES:
  • order_date: object
  • total_revenue: float64
  • customer_segment: object
  ...

🔍 DATA QUALITY:
✓ No missing values - dataset is complete!

📈 NUMERICAL ANALYSIS:
[Summary statistics for all numeric columns]

🔗 CORRELATIONS:
[Correlation matrix showing relationships]

📊 CATEGORICAL ANALYSIS:
[Top values for each text column]

📅 TIME SERIES ANALYSIS:
Date range: 2024-01-05 00:00:00 to 2024-04-11 00:00:00
Span: 97 days

📊 VISUALIZATIONS CREATED:
  ✓ correlation_heatmap.png
  ✓ time_series_analysis.png
  ✓ distributions.png
  ✓ categorical_distributions.png
```

## 🌟 Connect & Learn More

[![Join AI Community](https://img.shields.io/badge/Join-AI%20Community%20(FREE)-blue?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJ3aGl0ZSI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6bTAgM2MxLjY2IDAgMyAxLjM0IDMgM3MtMS4zNCAzLTMgMy0zLTEuMzQtMy0zIDEuMzQtMyAzLTN6bTAgMTQuMmMtMi41IDAtNC43MS0xLjI4LTYtMy4yMi4wMy0xLjk5IDQtMy4wOCA2LTMuMDggMS45OSAwIDUuOTcgMS4wOSA2IDMuMDgtMS4yOSAxLjk0LTMuNSAzLjIyLTYgMy4yMnoiLz48L3N2Zz4=)](https://www.skool.com/ai-for-your-business/about)

[![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)

[![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

[![Twitter Follow](https://img.shields.io/badge/Twitter-Follow%20@corbin__braun-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/corbin_braun)

## 🤝 Contributing

Contributions are welcome! Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Share your use cases

## 📄 License

MIT License - feel free to use this skill for personal or commercial projects!

## 🙏 Acknowledgments

Built for Anthropic's [Claude Skills](https://www.anthropic.com/news/skills) platform.

---

**Made with ❤️ for the AI community**

⭐ Star this repo if you find it useful!


--------------------------------------------------------------------------------