├── requirements.txt
├── csv-data-summarizer.zip
├── .gitignore
├── resources
│   ├── sample.csv
│   └── README.md
├── SKILL.md
├── examples
│   └── showcase_financial_pl_data.csv
├── analyze.py
└── README.md


/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas>=2.0.0
2 | matplotlib>=3.7.0
3 | seaborn>=0.12.0
4 | 
5 | 


--------------------------------------------------------------------------------
/csv-data-summarizer.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/coffeefuelbump/csv-data-summarizer-claude-skill/HEAD/csv-data-summarizer.zip


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # Python
 2 | __pycache__/
 3 | *.py[cod]
 4 | *$py.class
 5 | *.so
 6 | .Python
 7 | env/
 8 | venv/
 9 | .venv/
10 | 
11 | # Distribution / packaging
12 | *.egg-info/
13 | dist/
14 | build/
15 | 
16 | # Jupyter Notebook
17 | .ipynb_checkpoints
18 | 
19 | # IDE
20 | .vscode/
21 | .idea/
22 | *.swp
23 | *.swo
24 | *~
25 | 
26 | # Project specific
27 | *.png
28 | *.jpg
29 | *.jpeg
30 | chart.png
31 | correlation_heatmap.png
32 | time_series_analysis.png
33 | distributions.png
34 | categorical_distributions.png
35 | 
36 | # Allow the skill zip file to be committed
37 | !csv-data-summarizer.zip
38 | 
39 | # OS
40 | .DS_Store
41 | Thumbs.db
42 | 
43 | 


--------------------------------------------------------------------------------
/resources/sample.csv:
--------------------------------------------------------------------------------
 1 | date,product,quantity,revenue,customer_id,region
 2 | 2024-01-15,Widget A,5,129.99,C001,North
 3 | 2024-01-16,Widget B,3,89.97,C002,South
 4 | 2024-01-17,Widget A,7,181.98,C003,East
 5 | 2024-01-18,Widget C,2,199.98,C001,North
 6 | 2024-01-19,Widget B,4,119.96,C004,West
 7 | 2024-01-20,Widget A,6,155.94,C005,South
 8 | 2024-01-21,Widget C,1,99.99,C002,South
 9 | 2024-01-22,Widget B,8,239.92,C006,East
10 | 2024-01-23,Widget A,3,77.97,C007,North
11 | 2024-01-24,Widget C,5,499.95,C003,East
12 | 2024-01-25,Widget B,2,59.98,C008,West
13 | 2024-01-26,Widget A,9,233.91,C004,West
14 | 2024-01-27,Widget C,3,299.97,C009,North
15 | 2024-01-28,Widget B,6,179.94,C010,South
16 | 2024-01-29,Widget A,4,103.96,C005,South
17 | 2024-01-30,Widget C,7,699.93,C011,East
18 | 2024-01-31,Widget B,5,149.95,C012,West
19 | 2024-02-01,Widget A,8,207.92,C013,North
20 | 2024-02-02,Widget C,2,199.98,C014,South
21 | 2024-02-03,Widget B,10,299.90,C015,East
22 | 
23 | 


--------------------------------------------------------------------------------
/resources/README.md:
--------------------------------------------------------------------------------
 1 | # CSV Data Summarizer - Resources
 2 | 
 3 | ---
 4 | 
 5 | ## 🌟 Connect & Learn More
 6 | 
 7 | <div align="center">
 8 | 
 9 | ### 🚀 **Join Our Community**
10 | [![Join AI Community](https://img.shields.io/badge/Join-AI%20Community%20(FREE)-blue?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJ3aGl0ZSI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6bTAgM2MxLjY2IDAgMyAxLjM0IDMgM3MtMS4zNCAzLTMgMy0zLTEuMzQtMy0zIDEuMzQtMyAzLTN6bTAgMTQuMmMtMi41IDAtNC43MS0xLjI4LTYtMy4yMi4wMy0xLjk5IDQtMy4wOCA2LTMuMDggMS45OSAwIDUuOTcgMS4wOSA2IDMuMDgtMS4yOSAxLjk0LTMuNSAzLjIyLTYgMy4yMnoiLz48L3N2Zz4=)](https://www.skool.com/ai-for-your-business/about)
11 | 
12 | ### 🔗 **All My Links**
13 | [![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)
14 | 
15 | ### 🛠️ **Become a Builder**
16 | [![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
17 | 
18 | ### 🐦 **Follow on Twitter**
19 | [![Twitter Follow](https://img.shields.io/badge/Twitter-Follow%20@corbin__braun-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/corbin_braun)
20 | 
21 | </div>
22 | 
23 | ---
24 | 
25 | ## Sample Data
26 | 
27 | The `sample.csv` file contains example sales data with the following columns:
28 | 
29 | - **date**: Transaction date
30 | - **product**: Product name (Widget A, B, or C)
31 | - **quantity**: Number of items sold
32 | - **revenue**: Total revenue from the transaction
33 | - **customer_id**: Unique customer identifier
34 | - **region**: Geographic region (North, South, East, West)
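pandas does not infer dates on its own here. Without `parse_dates`, `read_csv` loads the `date` column as text, so it has to be converted before any time-based analysis. Below is a minimal sketch, assuming pandas 2.x. It inlines the first three rows of `sample.csv` so it runs without the file:

```python
import io

import pandas as pd

# First three rows of resources/sample.csv, inlined so the snippet runs on its own
SAMPLE_CSV = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Without parse_dates, read_csv leaves the date column as text
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # False

# Convert explicitly; errors="coerce" turns unparseable values into NaT
df["date"] = pd.to_datetime(df["date"], errors="coerce")
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # True

# Revenue per product over these three rows
print(df.groupby("product")["revenue"].sum())
```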
35 | 
36 | ## Usage Examples
37 | 
38 | ### Basic Summary
39 | ```
40 | Analyze sample.csv
41 | ```
42 | 
43 | ### With Custom CSV
44 | ```
45 | Here's my sales_data.csv file. Can you summarize it?
46 | ```
47 | 
48 | ### Focus on Specific Insights
49 | ```
50 | What are the revenue trends in this dataset?
51 | ```
52 | 
53 | ## Testing the Skill
54 | 
 55 | You can test the skill locally before uploading it to Claude. Run these commands from inside the `resources/` directory:
56 | 
57 | ```bash
58 | # Install dependencies
59 | pip install -r ../requirements.txt
60 | 
61 | # Run the analysis
62 | python ../analyze.py sample.csv
63 | ```
64 | 
65 | ## Expected Output
66 | 
67 | The analysis will provide:
68 | 
69 | 1. **Dataset dimensions** - Row and column counts
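 70 | 2. **Column information** - Names and data types, plus the most frequent values of categorical columns
 71 | 3. **Summary statistics** - Count, mean, std dev, min/max, and quartiles (the 50% quartile is the median) for numeric columns, plus a correlation matrix when there are two or more
 72 | 4. **Data quality** - Missing value detection and counts
 73 | 5. **Visualizations** - PNG charts: histograms for numeric columns, a correlation heatmap, bar charts for categorical columns, and a time-series plot when a date column is present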
74 | 
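The data-quality figures follow a simple convention in `analyze.py`. The overall percentage is taken over all cells (rows × columns), and each per-column percentage is taken over the number of rows. Below is a small sketch of that calculation. `missing_report` is a hypothetical helper, not a function the skill exports:

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> dict:
    """Summarize missing values the way analyze.py reports them (hypothetical helper)."""
    total_cells = df.shape[0] * df.shape[1]
    missing = int(df.isna().sum().sum())
    # Overall percentage is taken over every cell (rows x columns); guard empty frames
    overall_pct = 100 * missing / total_cells if total_cells else 0.0
    # Per-column percentage is taken over the number of rows
    by_column = {
        col: round(100 * int(n) / len(df), 1)
        for col, n in df.isna().sum().items()
        if n > 0
    }
    return {"missing": missing, "overall_pct": overall_pct, "by_column": by_column}


df = pd.DataFrame({
    "product": ["Widget A", None, "Widget C", "Widget B"],
    "revenue": [129.99, 89.97, None, None],
})
print(missing_report(df))
# {'missing': 3, 'overall_pct': 37.5, 'by_column': {'product': 25.0, 'revenue': 50.0}}
```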
75 | ## Customization
76 | 
77 | To adapt this skill for your specific use case:
78 | 
79 | 1. Modify `analyze.py` to include domain-specific calculations
80 | 2. Add custom visualization types in the plotting section
81 | 3. Include validation rules specific to your data
82 | 4. Add more sample datasets to test different scenarios
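For item 3, a rule for the sample sales data might flag transactions with a non-positive quantity or negative revenue. The following is a hypothetical sketch. `validate_sales` and both of its rules are illustrations and are not part of `analyze.py`:

```python
from typing import List

import pandas as pd


def validate_sales(df: pd.DataFrame) -> List[str]:
    """Return readable problems found in sales-style data (hypothetical rules)."""
    problems = []
    # Rule 1: every transaction should sell at least one item
    bad_qty = df.loc[df["quantity"] <= 0].index.tolist()
    if bad_qty:
        problems.append(f"non-positive quantity in rows {bad_qty}")
    # Rule 2: revenue should never be negative
    bad_rev = df.loc[df["revenue"] < 0].index.tolist()
    if bad_rev:
        problems.append(f"negative revenue in rows {bad_rev}")
    return problems


df = pd.DataFrame({"quantity": [5, 0, 3], "revenue": [129.99, 10.0, -5.0]})
print(validate_sales(df))
# ['non-positive quantity in rows [1]', 'negative revenue in rows [2]']
```

Returning messages instead of raising on the first failure lets a summary report every problem at once.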
83 | 
84 | 


--------------------------------------------------------------------------------
/SKILL.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | name: csv-data-summarizer
  3 | description: Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
  4 | metadata:
  5 |   version: 2.1.0
  6 |   dependencies: python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0
  7 | ---
  8 | 
  9 | # CSV Data Summarizer
 10 | 
 11 | This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.
 12 | 
 13 | ## When to Use This Skill
 14 | 
 15 | Claude should use this Skill whenever the user:
 16 | - Uploads or references a CSV file
 17 | - Asks to summarize, analyze, or visualize tabular data
 18 | - Requests insights from CSV data
 19 | - Wants to understand data structure and quality
 20 | 
 21 | ## How It Works
 22 | 
 23 | ### ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️
 24 | 
 25 | **DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
 26 | **DO NOT OFFER OPTIONS OR CHOICES.**
 27 | **DO NOT SAY "What would you like me to help you with?"**
 28 | **DO NOT LIST POSSIBLE ANALYSES.**
 29 | 
 30 | **IMMEDIATELY AND AUTOMATICALLY:**
 31 | 1. Run the comprehensive analysis
 32 | 2. Generate ALL relevant visualizations
 33 | 3. Present complete results
 34 | 4. NO questions, NO options, NO waiting for user input
 35 | 
 36 | **THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**
 37 | 
 38 | ### Automatic Analysis Steps:
 39 | 
 40 | **The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.**
 41 | 
 42 | 1. **Load and inspect** the CSV file by reading it into a pandas DataFrame
 43 | 2. **Identify data structure** - column types, date columns, numeric columns, categories
 44 | 3. **Determine relevant analyses** based on what's actually in the data:
 45 |    - **Sales/E-commerce data** (order dates, revenue, products): Time-series trends, revenue analysis, product performance
 46 |    - **Customer data** (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
 47 |    - **Financial data** (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
 48 |    - **Operational data** (timestamps, metrics, status): Time-series, performance metrics, distributions
 49 |    - **Survey data** (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
 50 |    - **Generic tabular data**: Adapts based on column types found
 51 | 
 52 | 4. **Only create visualizations that make sense** for the specific dataset:
 53 |    - Time-series plots ONLY if date/timestamp columns exist
 54 |    - Correlation heatmaps ONLY if multiple numeric columns exist
 55 |    - Category distributions ONLY if categorical columns exist
 56 |    - Histograms for numeric distributions when relevant
 57 |    
 58 | 5. **Generate comprehensive output** automatically including:
 59 |    - Data overview (rows, columns, types)
 60 |    - Key statistics and metrics relevant to the data type
 61 |    - Missing data analysis
 62 |    - Multiple relevant visualizations (only those that apply)
 63 |    - Actionable insights based on patterns found in THIS specific dataset
 64 |    
 65 | 6. **Present everything** in one complete analysis - no follow-up questions
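The checks in steps 2 and 4 reduce to a few rules about column types. Below is a minimal sketch, assuming pandas. `plan_charts` is a hypothetical helper that only decides which charts apply; it does not draw them. It treats a column as a date if its name contains `date` or its dtype is already datetime:

```python
import pandas as pd


def plan_charts(df: pd.DataFrame) -> list:
    """Return the chart types that fit df's columns (hypothetical helper)."""
    # A column counts as a date if its name says so or it is already datetime
    date_cols = [
        c for c in df.columns
        if "date" in str(c).lower() or pd.api.types.is_datetime64_any_dtype(df[c])
    ]
    numeric_cols = [c for c in df.select_dtypes(include="number").columns
                    if c not in date_cols]
    categorical_cols = [c for c in df.columns
                        if c not in date_cols and c not in numeric_cols]

    charts = []
    if date_cols and numeric_cols:
        charts.append("time_series")          # only when a date column exists
    if len(numeric_cols) > 1:
        charts.append("correlation_heatmap")  # needs two or more numeric columns
    if numeric_cols:
        charts.append("histograms")
    if categorical_cols:
        charts.append("category_bars")        # only when categorical columns exist
    return charts


sales = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-16"],
    "product": ["Widget A", "Widget B"],
    "quantity": [5, 3],
    "revenue": [129.99, 89.97],
})
survey = pd.DataFrame({"answer": ["yes", "no"], "score": [4, 5]})
print(plan_charts(sales))   # ['time_series', 'correlation_heatmap', 'histograms', 'category_bars']
print(plan_charts(survey))  # ['histograms', 'category_bars']
```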
 66 | 
 67 | **Example adaptations:**
 68 | - Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
 69 | - Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis  
 70 | - Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
 71 | - Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
 72 | 
 73 | ### Behavior Guidelines
 74 | 
 75 | ✅ **CORRECT APPROACH - SAY THIS:**
 76 | - "I'll analyze this data comprehensively right now."
 77 | - "Here's the complete analysis with visualizations:"
 78 | - "I've identified this as [type] data and generated relevant insights:"
 79 | - Then IMMEDIATELY show the full analysis
 80 | 
 81 | ✅ **DO:**
 82 | - Immediately run the analysis script
 83 | - Generate ALL relevant charts automatically
 84 | - Provide complete insights without being asked
 85 | - Be thorough and complete in first response
 86 | - Act decisively without asking permission
 87 | 
 88 | ❌ **NEVER SAY THESE PHRASES:**
 89 | - "What would you like to do with this data?"
 90 | - "What would you like me to help you with?"
 91 | - "Here are some common options:"
 92 | - "Let me know what you'd like help with"
 93 | - "I can create a comprehensive analysis if you'd like!"
 94 | - Any sentence ending with "?" asking for user direction
 95 | - Any list of options or choices
 96 | - Any conditional "I can do X if you want"
 97 | 
 98 | ❌ **FORBIDDEN BEHAVIORS:**
 99 | - Asking what the user wants
100 | - Listing options for the user to choose from
101 | - Waiting for user direction before analyzing
102 | - Providing partial analysis that requires follow-up
103 | - Describing what you COULD do instead of DOING it
104 | 
105 | ### Usage
106 | 
107 | The Skill provides a Python function `summarize_csv(file_path)` in `analyze.py` that:
108 | - Accepts a path to a CSV file
109 | - Returns a comprehensive text summary with statistics
110 | - Saves the charts it generates as PNG files in the current working directory, choosing which charts to draw from the data's structure
111 | 
112 | ### Example Prompts
113 | 
114 | > "Here's `sales_data.csv`. Can you summarize this file?"
115 | 
116 | > "Analyze this customer data CSV and show me trends."
117 | 
118 | > "What insights can you find in `orders.csv`?"
119 | 
120 | ### Example Output
121 | 
122 | **Dataset Overview**
123 | - 5,000 rows × 8 columns  
124 | - 3 numeric columns, 1 date column  
125 | 
126 | **Summary Statistics**
127 | - Average order value: $58.20
128 | - Standard deviation: $12.40
129 | - Missing values: 0.25% (100 of 40,000 cells)
130 | 
131 | **Insights**
132 | - Sales show upward trend over time
133 | - Peak activity in Q4
134 | *(Attached: trend plot)*
135 | 
136 | ## Files
137 | 
138 | - `analyze.py` - Core analysis logic
139 | - `requirements.txt` - Python dependencies
140 | - `resources/sample.csv` - Example dataset for testing
141 | - `resources/README.md` - Additional documentation
142 | 
143 | ## Notes
144 | 
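145 | - Detects date columns by name (for example `date`, `order_date`, or `timestamp`) and parses the first one with `pd.to_datetime`
146 | - Reports missing values overall and per column; histograms skip missing values, and unparseable dates become `NaT`
147 | - Draws histograms when numeric columns exist, a correlation heatmap for two or more numeric columns, bar charts for categorical columns, and a time-series plot only when a date column is found; all charts are saved as PNG files in the working directory
148 | - All numeric columns are included in the statistical summary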
149 | 
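Name-based detection needs care. A plain substring test for `time` also matches `customer_lifetime_value` in `examples/showcase_financial_pl_data.csv`. Below is a sketch of token-based matching, assuming column names use underscores, hyphens, or spaces as separators. `looks_like_date_column` is a hypothetical name:

```python
DATE_TOKENS = {"date", "time", "timestamp", "datetime"}


def looks_like_date_column(name: str) -> bool:
    """Match whole name tokens rather than substrings (hypothetical helper)."""
    tokens = name.lower().replace("-", "_").replace(" ", "_").split("_")
    return any(token in DATE_TOKENS for token in tokens)


columns = ["date", "order_date", "Event Time", "customer_lifetime_value", "region"]
print([c for c in columns if looks_like_date_column(c)])
# ['date', 'order_date', 'Event Time']
print([c for c in columns if "time" in c.lower() or "date" in c.lower()])
# ['date', 'order_date', 'Event Time', 'customer_lifetime_value']
```

camelCase names such as `orderDate` form a single token and are not matched. This approach accepts some missed matches in exchange for fewer false positives.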
150 | 


--------------------------------------------------------------------------------
/examples/showcase_financial_pl_data.csv:
--------------------------------------------------------------------------------
 1 | month,year,quarter,product_line,total_revenue,cost_of_goods_sold,gross_profit,gross_margin_pct,marketing_expense,sales_expense,rd_expense,admin_expense,total_operating_expenses,operating_income,operating_margin_pct,interest_expense,tax_expense,net_income,net_margin_pct,customer_acquisition_cost,customer_lifetime_value,units_sold,avg_selling_price,headcount,revenue_per_employee
 2 | Jan,2023,Q1,SaaS Platform,450000,135000,315000,70.0,65000,85000,45000,35000,230000,85000,18.9,5000,16000,64000,14.2,125,2400,1200,375,45,10000
 3 | Jan,2023,Q1,Enterprise Solutions,280000,112000,168000,60.0,35000,55000,25000,20000,135000,33000,11.8,3000,6600,23400,8.4,450,8500,450,622,45,6222
 4 | Jan,2023,Q1,Professional Services,125000,50000,75000,60.0,15000,22000,8000,12000,57000,18000,14.4,1500,3600,12900,10.3,200,3200,95,1316,45,2778
 5 | Feb,2023,Q1,SaaS Platform,475000,142500,332500,70.0,68000,89000,47000,36000,240000,92500,19.5,5200,18500,68800,14.5,120,2500,1300,365,47,10106
 6 | Feb,2023,Q1,Enterprise Solutions,295000,118000,177000,60.0,38000,58000,27000,22000,145000,32000,10.8,3200,6400,22400,7.6,440,8600,470,628,47,6277
 7 | Feb,2023,Q1,Professional Services,135000,54000,81000,60.0,16000,24000,9000,13000,62000,19000,14.1,1600,3800,13600,10.1,195,3300,105,1286,47,2872
 8 | Mar,2023,Q1,SaaS Platform,520000,156000,364000,70.0,75000,95000,52000,40000,262000,102000,19.6,5500,19250,77250,14.9,115,2650,1450,359,50,10400
 9 | Mar,2023,Q1,Enterprise Solutions,325000,130000,195000,60.0,42000,63000,30000,25000,160000,35000,10.8,3500,7000,24500,7.5,425,8800,520,625,50,6500
10 | Mar,2023,Q1,Professional Services,148000,59200,88800,60.0,18000,26000,10000,14000,68000,20800,14.1,1800,4160,14840,10.0,190,3400,115,1287,50,2960
11 | Apr,2023,Q2,SaaS Platform,555000,166500,388500,70.0,80000,100000,55000,42000,277000,111500,20.1,5800,22300,83400,15.0,110,2750,1550,358,52,10673
12 | Apr,2023,Q2,Enterprise Solutions,340000,136000,204000,60.0,45000,65000,32000,26000,168000,36000,10.6,3700,7200,25100,7.4,420,9000,540,630,52,6538
13 | Apr,2023,Q2,Professional Services,158000,63200,94800,60.0,19000,27000,11000,15000,72000,22800,14.4,1900,4560,16340,10.3,185,3500,125,1264,52,3038
14 | May,2023,Q2,SaaS Platform,590000,177000,413000,70.0,85000,105000,58000,44000,292000,121000,20.5,6000,24200,90800,15.4,105,2850,1650,358,55,10727
15 | May,2023,Q2,Enterprise Solutions,365000,146000,219000,60.0,48000,68000,35000,28000,179000,40000,11.0,4000,8000,28000,7.7,410,9200,580,629,55,6636
16 | May,2023,Q2,Professional Services,172000,68800,103200,60.0,21000,29000,12000,16000,78000,25200,14.7,2100,5040,18060,10.5,180,3600,135,1274,55,3127
17 | Jun,2023,Q2,SaaS Platform,625000,187500,437500,70.0,90000,110000,62000,46000,308000,129500,20.7,6200,25850,97450,15.6,100,2950,1750,357,58,10776
18 | Jun,2023,Q2,Enterprise Solutions,385000,154000,231000,60.0,50000,70000,37000,29000,186000,45000,11.7,4200,9000,31800,8.3,400,9400,610,631,58,6638
19 | Jun,2023,Q2,Professional Services,185000,74000,111000,60.0,22000,31000,13000,17000,83000,28000,15.1,2200,5580,20220,10.9,175,3700,145,1276,58,3190
20 | Jul,2023,Q3,SaaS Platform,665000,199500,465500,70.0,95000,115000,65000,48000,323000,142500,21.4,6500,28500,107500,16.2,95,3050,1850,359,60,11083
21 | Jul,2023,Q3,Enterprise Solutions,410000,164000,246000,60.0,53000,73000,40000,31000,197000,49000,12.0,4400,9800,34800,8.5,390,9600,650,631,60,6833
22 | Jul,2023,Q3,Professional Services,198000,79200,118800,60.0,24000,33000,14000,18000,89000,29800,15.1,2400,5960,21440,10.8,170,3800,155,1277,60,3300
23 | Aug,2023,Q3,SaaS Platform,705000,211500,493500,70.0,100000,120000,68000,50000,338000,155500,22.1,6800,31100,117600,16.7,90,3150,1950,362,63,11190
24 | Aug,2023,Q3,Enterprise Solutions,435000,174000,261000,60.0,56000,76000,42000,33000,207000,54000,12.4,4600,10800,38600,8.9,380,9800,690,630,63,6905
25 | Aug,2023,Q3,Professional Services,210000,84000,126000,60.0,25000,35000,15000,19000,94000,32000,15.2,2500,6400,23100,11.0,165,3900,165,1273,63,3333
26 | Sep,2023,Q3,SaaS Platform,750000,225000,525000,70.0,108000,128000,72000,53000,361000,164000,21.9,7200,33360,123440,16.5,88,3250,2080,360,65,11538
27 | Sep,2023,Q3,Enterprise Solutions,465000,186000,279000,60.0,60000,80000,45000,35000,220000,59000,12.7,5000,11800,42200,9.1,370,10000,735,633,65,7154
28 | Sep,2023,Q3,Professional Services,225000,90000,135000,60.0,27000,37000,16000,20000,100000,35000,15.6,2700,6920,25380,11.3,160,4000,175,1286,65,3462
29 | Oct,2023,Q4,SaaS Platform,795000,238500,556500,70.0,115000,135000,75000,55000,380000,176500,22.2,7500,35870,133130,16.7,85,3350,2200,361,68,11691
30 | Oct,2023,Q4,Enterprise Solutions,490000,196000,294000,60.0,63000,83000,47000,36000,229000,65000,13.3,5200,13000,46800,9.6,360,10200,770,636,68,7206
31 | Oct,2023,Q4,Professional Services,238000,95200,142800,60.0,29000,39000,17000,21000,106000,36800,15.5,2800,7360,26640,11.2,158,4100,185,1286,68,3500
32 | Nov,2023,Q4,SaaS Platform,840000,252000,588000,70.0,122000,142000,78000,58000,400000,188000,22.4,7800,38440,141760,16.9,82,3450,2320,362,70,12000
33 | Nov,2023,Q4,Enterprise Solutions,520000,208000,312000,60.0,67000,87000,50000,38000,242000,70000,13.5,5500,14100,50400,9.7,355,10400,815,638,70,7429
34 | Nov,2023,Q4,Professional Services,252000,100800,151200,60.0,31000,41000,18000,22000,112000,39200,15.6,3000,7728,28472,11.3,155,4200,195,1292,70,3600
35 | Dec,2023,Q4,SaaS Platform,895000,268500,626500,70.0,130000,150000,82000,62000,424000,202500,22.6,8200,41145,153155,17.1,80,3550,2480,361,72,12431
36 | Dec,2023,Q4,Enterprise Solutions,555000,222000,333000,60.0,72000,92000,53000,40000,257000,76000,13.7,6000,15400,54600,9.8,350,10600,870,638,72,7708
37 | Dec,2023,Q4,Professional Services,268000,107200,160800,60.0,33000,43000,19000,23000,118000,42800,16.0,3200,8352,31248,11.7,152,4300,205,1307,72,3722
38 | Jan,2024,Q1,SaaS Platform,925000,277500,647500,70.0,135000,155000,85000,64000,439000,208500,22.5,8500,42070,157930,17.1,78,3650,2550,363,75,12333
39 | Jan,2024,Q1,Enterprise Solutions,575000,230000,345000,60.0,75000,95000,55000,42000,267000,78000,13.6,6200,15760,56040,9.7,345,10800,900,639,75,7667
40 | Jan,2024,Q1,Professional Services,280000,112000,168000,60.0,34000,45000,20000,24000,123000,45000,16.1,3300,8770,32930,11.8,150,4400,215,1302,75,3733
41 | Feb,2024,Q1,SaaS Platform,965000,289500,675500,70.0,140000,160000,88000,66000,454000,221500,23.0,8800,44510,168190,17.4,75,3750,2660,363,77,12532
42 | Feb,2024,Q1,Enterprise Solutions,600000,240000,360000,60.0,78000,98000,57000,43000,276000,84000,14.0,6400,16800,60800,10.1,340,11000,940,638,77,7792
43 | Feb,2024,Q1,Professional Services,295000,118000,177000,60.0,36000,47000,21000,25000,129000,48000,16.3,3500,9420,35080,11.9,148,4500,225,1311,77,3831
44 | Mar,2024,Q1,SaaS Platform,1020000,306000,714000,70.0,148000,168000,92000,69000,477000,237000,23.2,9200,47880,179920,17.6,73,3850,2810,363,80,12750
45 | Mar,2024,Q1,Enterprise Solutions,635000,254000,381000,60.0,82000,103000,60000,45000,290000,91000,14.3,6800,18200,66000,10.4,335,11200,990,641,80,7938
46 | Mar,2024,Q1,Professional Services,312000,124800,187200,60.0,38000,49000,22000,26000,135000,52200,16.7,3700,10230,38270,12.3,145,4600,240,1300,80,3900
47 | 
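The columns in this file are linked by the usual P&L identities:

- gross profit = revenue - COGS
- total operating expenses = marketing + sales + R&D + admin
- operating income = gross profit - operating expenses
- net income = operating income - interest - tax

Because of these identities, the columns are a natural target for validation rules. Below is a minimal pandas sketch that checks them on the two January 2023 rows for SaaS Platform and Enterprise Solutions, inlined here. Pointing `pd.read_csv` at the full file instead is a straightforward extension:

```python
import pandas as pd

# Jan 2023 SaaS Platform and Enterprise Solutions rows from showcase_financial_pl_data.csv
df = pd.DataFrame({
    "total_revenue": [450000, 280000],
    "cost_of_goods_sold": [135000, 112000],
    "gross_profit": [315000, 168000],
    "marketing_expense": [65000, 35000],
    "sales_expense": [85000, 55000],
    "rd_expense": [45000, 25000],
    "admin_expense": [35000, 20000],
    "total_operating_expenses": [230000, 135000],
    "operating_income": [85000, 33000],
    "interest_expense": [5000, 3000],
    "tax_expense": [16000, 6600],
    "net_income": [64000, 23400],
})

opex_parts = ["marketing_expense", "sales_expense", "rd_expense", "admin_expense"]
# Expected value of each derived column, recomputed from its inputs
checks = {
    "gross_profit": df["total_revenue"] - df["cost_of_goods_sold"],
    "total_operating_expenses": df[opex_parts].sum(axis=1),
    "operating_income": df["gross_profit"] - df["total_operating_expenses"],
    "net_income": df["operating_income"] - df["interest_expense"] - df["tax_expense"],
}
# Count rows where the stored value disagrees with the recomputed one
mismatches = {col: int((df[col] != expected).sum()) for col, expected in checks.items()}
print(mismatches)
# {'gross_profit': 0, 'total_operating_expenses': 0, 'operating_income': 0, 'net_income': 0}
```

The percentage columns (`gross_margin_pct` and similar) are rounded to one decimal place, so a check on them would need a tolerance rather than exact equality.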


--------------------------------------------------------------------------------
/analyze.py:
--------------------------------------------------------------------------------
  1 | import sys
  2 | 
  3 | import pandas as pd
  4 | import matplotlib.pyplot as plt
  5 | import seaborn as sns
  6 | 
  7 | # Name tokens that mark a column as a date/time column. Matching whole
  8 | # tokens instead of substrings keeps names such as
  9 | # "customer_lifetime_value" from being mistaken for dates.
 10 | DATE_NAME_TOKENS = {"date", "time", "timestamp", "datetime"}
 11 | 
 12 | 
 13 | def is_date_column_name(name):
 14 |     """Return True if a column name contains a date/time token.
 15 | 
 16 |     Names are split on underscores, hyphens, and spaces, so "order_date"
 17 |     and "Order Date" match while "customer_lifetime_value" does not.
 18 |     """
 19 |     tokens = str(name).lower().replace("-", "_").replace(" ", "_").split("_")
 20 |     return any(token in DATE_NAME_TOKENS for token in tokens)
 21 | 
 22 | 
 23 | def summarize_csv(file_path):
 24 |     """
 25 |     Comprehensively analyzes a CSV file and generates multiple visualizations.
 26 | 
 27 |     Charts are saved as PNG files in the current working directory.
 28 | 
 29 |     Args:
 30 |         file_path (str): Path to the CSV file
 31 | 
 32 |     Returns:
 33 |         str: Formatted comprehensive analysis of the dataset
 34 |     """
 35 |     df = pd.read_csv(file_path)
 36 |     summary = []
 37 |     charts_created = []
 38 | 
 39 |     # Basic info
 40 |     summary.append("=" * 60)
 41 |     summary.append("📊 DATA OVERVIEW")
 42 |     summary.append("=" * 60)
 43 |     summary.append(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
 44 |     summary.append(f"\nColumns: {', '.join(df.columns.tolist())}")
 45 | 
 46 |     # Data types
 47 |     summary.append("\n📋 DATA TYPES:")
 48 |     for col, dtype in df.dtypes.items():
 49 |         summary.append(f"  • {col}: {dtype}")
 50 | 
 51 |     # Missing data analysis; the percentage is over all cells (rows x columns)
 52 |     missing = int(df.isnull().sum().sum())
 53 |     total_cells = df.shape[0] * df.shape[1]
 54 |     missing_pct = (missing / total_cells) * 100 if total_cells else 0.0
 55 |     summary.append("\n🔍 DATA QUALITY:")
 56 |     if missing:
 57 |         summary.append(f"Missing values: {missing:,} ({missing_pct:.2f}% of total data)")
 58 |         summary.append("Missing by column:")
 59 |         for col in df.columns:
 60 |             col_missing = df[col].isnull().sum()
 61 |             if col_missing > 0:
 62 |                 col_pct = (col_missing / len(df)) * 100
 63 |                 summary.append(f"  • {col}: {col_missing:,} ({col_pct:.1f}%)")
 64 |     else:
 65 |         summary.append("✓ No missing values - dataset is complete!")
 66 | 
 67 |     # Classify columns once, up front. Date columns are detected by name and
 68 |     # kept out of the numeric and categorical groups (a "date" column is read
 69 |     # as text, so it would otherwise be counted as a category).
 70 |     date_cols = [c for c in df.columns if is_date_column_name(c)]
 71 |     numeric_cols = [c for c in df.select_dtypes(include='number').columns
 72 |                     if c not in date_cols]
 73 |     # Remaining non-numeric columns; ID-like columns are skipped because
 74 |     # their value counts are rarely informative
 75 |     categorical_cols = [c for c in df.select_dtypes(exclude='number').columns
 76 |                         if c not in date_cols and 'id' not in c.lower()]
 77 | 
 78 |     # Numeric analysis
 79 |     if numeric_cols:
 80 |         summary.append("\n📈 NUMERICAL ANALYSIS:")
 81 |         summary.append(str(df[numeric_cols].describe()))
 82 | 
 83 |         # Correlations if multiple numeric columns
 84 |         if len(numeric_cols) > 1:
 85 |             summary.append("\n🔗 CORRELATIONS:")
 86 |             corr_matrix = df[numeric_cols].corr()
 87 |             summary.append(str(corr_matrix))
 88 | 
 89 |             # Create correlation heatmap; cell labels overlap on large
 90 |             # matrices, so only annotate up to 10 columns
 91 |             plt.figure(figsize=(10, 8))
 92 |             sns.heatmap(corr_matrix, annot=len(numeric_cols) <= 10, fmt='.2f',
 93 |                         cmap='coolwarm', center=0, square=True, linewidths=1)
 94 |             plt.title('Correlation Heatmap')
 95 |             plt.tight_layout()
 96 |             plt.savefig('correlation_heatmap.png', dpi=150)
 97 |             plt.close()
 98 |             charts_created.append('correlation_heatmap.png')
 99 | 
100 |     # Categorical analysis
101 |     if categorical_cols:
102 |         summary.append("\n📊 CATEGORICAL ANALYSIS:")
103 |         for col in categorical_cols[:5]:  # Limit to first 5
104 |             value_counts = df[col].value_counts()
105 |             summary.append(f"\n{col}:")
106 |             for val, count in value_counts.head(10).items():
107 |                 pct = (count / len(df)) * 100
108 |                 summary.append(f"  • {val}: {count:,} ({pct:.1f}%)")
109 | 
110 |     # Time series analysis (first detected date column only)
111 |     if date_cols:
112 |         summary.append("\n📅 TIME SERIES ANALYSIS:")
113 |         date_col = date_cols[0]
114 |         df[date_col] = pd.to_datetime(df[date_col], errors='coerce')
115 |         dates = df[date_col].dropna()
116 | 
117 |         if dates.empty:
118 |             summary.append(f"Could not parse any dates in '{date_col}'")
119 |         else:
120 |             summary.append(f"Date range: {dates.min()} to {dates.max()}")
121 |             summary.append(f"Span: {(dates.max() - dates.min()).days} days")
122 | 
123 |         # Plot the per-date average of up to 3 numeric columns
124 |         plot_cols = numeric_cols[:3]
125 |         if plot_cols and not dates.empty:
126 |             # squeeze=False always returns a 2-D array of axes, even for 1 row
127 |             fig, axes = plt.subplots(len(plot_cols), 1, squeeze=False,
128 |                                      figsize=(12, 4 * len(plot_cols)))
129 |             for ax, num_col in zip(axes[:, 0], plot_cols):
130 |                 daily_mean = df.groupby(date_col)[num_col].mean()
131 |                 daily_mean.plot(ax=ax, label='Average', linewidth=2)
132 |                 ax.set_title(f'{num_col} Over Time')
133 |                 ax.set_xlabel('Date')
134 |                 ax.set_ylabel(num_col)
135 |                 ax.legend()
136 |                 ax.grid(True, alpha=0.3)
137 | 
138 |             plt.tight_layout()
139 |             plt.savefig('time_series_analysis.png', dpi=150)
140 |             plt.close()
141 |             charts_created.append('time_series_analysis.png')
142 | 
143 |     # Distribution plots for numeric columns (first 4)
144 |     if numeric_cols:
145 |         fig, axes = plt.subplots(2, 2, figsize=(12, 10))
146 |         axes = axes.flatten()
147 | 
148 |         for idx, col in enumerate(numeric_cols[:4]):
149 |             axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)
150 |             axes[idx].set_title(f'Distribution of {col}')
151 |             axes[idx].set_xlabel(col)
152 |             axes[idx].set_ylabel('Frequency')
153 |             axes[idx].grid(True, alpha=0.3)
154 | 
155 |         # Hide unused subplots
156 |         for idx in range(len(numeric_cols[:4]), 4):
157 |             axes[idx].set_visible(False)
158 | 
159 |         plt.tight_layout()
160 |         plt.savefig('distributions.png', dpi=150)
161 |         plt.close()
162 |         charts_created.append('distributions.png')
163 | 
164 |     # Categorical distributions (first 4 columns, top 10 values each)
165 |     if categorical_cols:
166 |         fig, axes = plt.subplots(2, 2, figsize=(14, 10))
167 |         axes = axes.flatten()
168 | 
169 |         for idx, col in enumerate(categorical_cols[:4]):
170 |             value_counts = df[col].value_counts().head(10)
171 |             axes[idx].barh(range(len(value_counts)), value_counts.values)
172 |             axes[idx].set_yticks(range(len(value_counts)))
173 |             axes[idx].set_yticklabels(value_counts.index.astype(str))
174 |             axes[idx].set_title(f'Top Values in {col}')
175 |             axes[idx].set_xlabel('Count')
176 |             axes[idx].grid(True, alpha=0.3, axis='x')
177 | 
178 |         # Hide unused subplots
179 |         for idx in range(len(categorical_cols[:4]), 4):
180 |             axes[idx].set_visible(False)
181 | 
182 |         plt.tight_layout()
183 |         plt.savefig('categorical_distributions.png', dpi=150)
184 |         plt.close()
185 |         charts_created.append('categorical_distributions.png')
186 | 
187 |     # Summary of visualizations
188 |     if charts_created:
189 |         summary.append("\n📊 VISUALIZATIONS CREATED:")
190 |         for chart in charts_created:
191 |             summary.append(f"  ✓ {chart}")
192 | 
193 |     summary.append("\n" + "=" * 60)
194 |     summary.append("✅ COMPREHENSIVE ANALYSIS COMPLETE")
195 |     summary.append("=" * 60)
196 | 
197 |     return "\n".join(summary)
198 | 
199 | 
200 | if __name__ == "__main__":
201 |     # Default to the bundled sample data when no path is given
202 |     file_path = sys.argv[1] if len(sys.argv) > 1 else "resources/sample.csv"
203 |     print(summarize_csv(file_path))
204 | 
205 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <div align="center">
  2 | 
  3 | [![Join AI Community](https://img.shields.io/badge/🚀_Join-AI_Community_(FREE)-4F46E5?style=for-the-badge)](https://www.skool.com/ai-for-your-business)
  4 | [![GitHub Profile](https://img.shields.io/badge/GitHub-@coffeefuelbump-181717?style=for-the-badge&logo=github)](https://github.com/coffeefuelbump)
  5 | 
  6 | [![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)
  7 | [![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
  8 | 
  9 | </div>
 10 | 
 11 | ---
 12 | 
 13 | # 📊 CSV Data Summarizer - Claude Skill
 14 | 
 15 | A powerful Claude Skill that automatically analyzes CSV files and generates comprehensive insights with visualizations. Upload any CSV and get instant, intelligent analysis without being asked what you want!
 16 | 
 17 | <div align="center">
 18 | 
 19 | [![Version](https://img.shields.io/badge/version-2.1.0-blue.svg)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill)
 20 | [![Python](https://img.shields.io/badge/python-3.8+-green.svg)](https://www.python.org/)
 21 | [![License](https://img.shields.io/badge/license-MIT-orange.svg)](LICENSE)
 22 | 
 23 | </div>
 24 | 
 25 | ## 🚀 Features
 26 | 
 27 | - **🤖 Intelligent & Adaptive** - Automatically detects data type (sales, customer, financial, survey, etc.) and applies relevant analysis
 28 | - **📈 Comprehensive Analysis** - Generates statistics, correlations, distributions, and trends
 29 | - **🎨 Auto Visualizations** - Creates multiple charts based on what's in your data:
 30 |   - Time-series plots for date-based data
 31 |   - Correlation heatmaps for numeric relationships
 32 |   - Distribution histograms
 33 |   - Categorical breakdowns
 34 | - **⚡ Proactive** - No questions asked! Just upload CSV and get complete analysis immediately
 35 | - **🔍 Data Quality Checks** - Automatically detects and reports missing values
 36 | - **📊 Multi-Industry Support** - Adapts to e-commerce, healthcare, finance, operations, surveys, and more
 37 | 
 38 | ## 📥 Quick Download
 39 | 
 40 | <div align="center">
 41 | 
 42 | ### Get Started in 2 Steps
 43 | 
 44 | **1️⃣ Download the Skill**  
 45 | [![Download Skill](https://img.shields.io/badge/Download-CSV%20Data%20Summarizer%20Skill-blue?style=for-the-badge&logo=download)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/csv-data-summarizer.zip)
 46 | 
 47 | **2️⃣ Try the Demo Data**  
 48 | [![Download Demo CSV](https://img.shields.io/badge/Download-Sample%20P%26L%20Financial%20Data-green?style=for-the-badge&logo=data)](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/examples/showcase_financial_pl_data.csv)
 49 | 
 50 | </div>
 51 | 
 52 | ---
 53 | 
 54 | ## 📦 What's Included
 55 | 
 56 | ```
 57 | csv-data-summarizer-claude-skill/
 58 | ├── SKILL.md              # Claude Skill definition
 59 | ├── analyze.py            # Comprehensive analysis engine
 60 | ├── requirements.txt      # Python dependencies
 61 | ├── examples/
 62 | │   └── showcase_financial_pl_data.csv  # Demo P&L financial dataset (15 months, 25 columns)
 63 | └── resources/
 64 |     ├── sample.csv        # Example dataset
 65 |     └── README.md         # Usage documentation
 66 | ```
 67 | 
 68 | ## 🎯 How It Works
 69 | 
 70 | 1. **Upload** any CSV file to Claude.ai
 71 | 2. **Skill activates** automatically when a CSV is detected
 72 | 3. **Analysis runs** immediately - the skill inspects the data structure and adapts its analysis
 73 | 4. **Results delivered** - Complete analysis with multiple visualizations
 74 | 
 75 | No prompting needed. No options to choose. Just instant, comprehensive insights!
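Step 3 can be pictured as two passes: classify each column, then pick the charts that fit. The sketch below is an assumption about how such logic might look, not a copy of `analyze.py`; `profile_columns`, `plan_charts`, the 90% date-parsing threshold and the chart-selection rules are hypothetical, while the chart file names come from the example output further down.

```python
import io
from typing import Dict, List

import pandas as pd


def profile_columns(df: pd.DataFrame, date_share: float = 0.9) -> Dict[str, List[str]]:
    """Hypothetical helper: group columns into numeric, datetime and categorical."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    datetimes: List[str] = []
    categorical: List[str] = []
    for col in df.columns:
        if col in numeric:
            continue
        # Strict ISO-8601 parsing (pandas 2.0+) avoids guessing on IDs like "C001".
        parsed = pd.to_datetime(df[col], format="ISO8601", errors="coerce")
        non_null = int(df[col].notna().sum())
        if non_null and parsed.notna().sum() >= date_share * non_null:
            datetimes.append(col)
        else:
            categorical.append(col)
    return {"numeric": numeric, "datetime": datetimes, "categorical": categorical}


def plan_charts(profile: Dict[str, List[str]]) -> List[str]:
    """Choose which chart files to produce from the column profile (assumed rules)."""
    charts = []
    if len(profile["numeric"]) >= 2:
        charts.append("correlation_heatmap.png")
    if profile["datetime"] and profile["numeric"]:
        charts.append("time_series_analysis.png")
    if profile["numeric"]:
        charts.append("distributions.png")
    if profile["categorical"]:
        charts.append("categorical_distributions.png")
    return charts


# The first three rows of resources/sample.csv.
sample = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
"""
profile = profile_columns(pd.read_csv(io.StringIO(sample)))
print(profile)
print(plan_charts(profile))
```

Keeping classification separate from chart selection means a dataset with no date column simply skips the time-series plot instead of failing.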
 76 | 
 77 | ## 📥 Installation
 78 | 
 79 | ### For Claude.ai Users
 80 | 
 81 | 1. Download the latest release: [`csv-data-summarizer.zip`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/releases)
 82 | 2. Go to [Claude.ai](https://claude.ai) → Settings → Capabilities → Skills
 83 | 3. Upload the zip file
 84 | 4. Enable the skill
 85 | 5. Done! Upload any CSV and watch it work ✨
 86 | 
 87 | ### For Developers
 88 | 
 89 | ```bash
 90 | git clone https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill.git
 91 | cd csv-data-summarizer-claude-skill
 92 | pip install -r requirements.txt
 93 | ```
 94 | 
 95 | ## 📊 Sample Dataset Highlights
 96 | 
 97 | The included demo CSV contains **15 months of P&L data** with:
 98 | - 3 product lines (SaaS, Enterprise, Services)
 99 | - 25 financial metrics including revenue, expenses, margins, customer acquisition cost (CAC), and lifetime value (LTV)
100 | - Quarterly trends showing business growth
101 | - Perfect for showcasing time-series analysis, correlations, and financial insights
102 | 
103 | ## 🎨 Example Use Cases
104 | 
105 | - **📊 Sales Data** → Revenue trends, product performance, regional analysis
106 | - **👥 Customer Data** → Demographics, segmentation, geographic patterns
107 | - **💰 Financial Data** → Transaction analysis, trend detection, correlations
108 | - **⚙️ Operational Data** → Performance metrics, time-series analysis
109 | - **📋 Survey Data** → Response distributions, cross-tabulations
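Telling these dataset kinds apart could be approximated by matching column names against keyword lists. This sketch is purely illustrative: `DOMAIN_KEYWORDS` and `guess_domain` are hypothetical, and the skill's actual detection rules are not documented in this README.

```python
from typing import Dict, Iterable, Set

# Hypothetical keyword lists; the skill's real detection rules may differ.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "sales": {"revenue", "quantity", "product", "order", "price", "sales"},
    "customer": {"customer", "age", "gender", "segment", "email", "city"},
    "financial": {"expense", "margin", "profit", "cac", "ltv", "cost", "transaction"},
    "survey": {"response", "rating", "score", "question", "satisfaction"},
}


def guess_domain(columns: Iterable[str]) -> str:
    """Return the domain whose keywords appear in the most column names."""
    names = [c.lower() for c in columns]
    # Count how many column names contain at least one keyword of each domain.
    scores = {
        domain: sum(any(k in name for k in keywords) for name in names)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first domain listed
    return best if scores[best] > 0 else "generic"


# Column names from resources/sample.csv.
print(guess_domain(["date", "product", "quantity", "revenue", "customer_id", "region"]))
```

Substring matching is deliberately loose (it catches `customer_id`), at the cost of occasional false hits; a real implementation would likely also look at value types.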
110 | 
111 | ## 🛠️ Technical Details
112 | 
113 | **Dependencies:**
114 | - Python 3.8+
115 | - pandas 2.0+
116 | - matplotlib 3.7+
117 | - seaborn 0.12+
118 | 
119 | **Visualizations Generated:**
120 | - Time-series trend plots
121 | - Correlation heatmaps
122 | - Distribution histograms
123 | - Categorical bar charts
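A correlation heatmap of this kind can be drawn from `DataFrame.corr`. The project lists seaborn, whose `heatmap` is a common choice, but this sketch sticks to plain matplotlib to stay minimal; `save_correlation_heatmap` is a hypothetical helper, not the function in `analyze.py`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def save_correlation_heatmap(df: pd.DataFrame, path: str = "correlation_heatmap.png"):
    """Hypothetical helper: plot Pearson correlations of numeric columns to `path`."""
    corr = df.corr(numeric_only=True)
    if corr.shape[0] < 2:
        return None  # a heatmap needs at least two numeric columns
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    # Write each coefficient into its cell.
    for i in range(corr.shape[0]):
        for j in range(corr.shape[1]):
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax, label="Pearson r")
    ax.set_title("Correlation heatmap")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return corr


# resources/sample.csv, inlined.
sample = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
2024-01-18,Widget C,2,199.98,C001,North
2024-01-19,Widget B,4,119.96,C004,West
2024-01-20,Widget A,6,155.94,C005,South
"""
corr = save_correlation_heatmap(pd.read_csv(io.StringIO(sample)))
print(corr)
```

The output file lands in the working directory; `correlation_heatmap.png` is already listed in the repository's `.gitignore`.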
124 | 
125 | ## 📝 Example Output
126 | 
127 | ```
128 | ============================================================
129 | 📊 DATA OVERVIEW
130 | ============================================================
131 | Rows: 100 | Columns: 15
132 | 
133 | 📋 DATA TYPES:
134 |   • order_date: object
135 |   • total_revenue: float64
136 |   • customer_segment: object
137 |   ...
138 | 
139 | 🔍 DATA QUALITY:
140 | ✓ No missing values - dataset is complete!
141 | 
142 | 📈 NUMERICAL ANALYSIS:
143 | [Summary statistics for all numeric columns]
144 | 
145 | 🔗 CORRELATIONS:
146 | [Correlation matrix showing relationships]
147 | 
148 | 📅 TIME SERIES ANALYSIS:
149 | Date range: 2024-01-05 to 2024-04-11
150 | Span: 97 days
151 | 
152 | 📊 VISUALIZATIONS CREATED:
153 |   ✓ correlation_heatmap.png
154 |   ✓ time_series_analysis.png
155 |   ✓ distributions.png
156 |   ✓ categorical_distributions.png
157 | ```
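The overview, data-quality and time-series parts of the report above can be approximated with a small pandas function. This sketch mirrors the layout of that output but is not `analyze.py` itself; `overview_report` and its `date_col` parameter are hypothetical.

```python
import io

import pandas as pd

SEP = "=" * 60


def overview_report(df: pd.DataFrame, date_col=None) -> str:
    """Hypothetical helper: build a text summary shaped like the example output."""
    lines = [SEP, "📊 DATA OVERVIEW", SEP, f"Rows: {len(df)} | Columns: {df.shape[1]}"]
    # Text columns print as "object" in pandas 2.x (newer pandas may show "str").
    lines += ["", "📋 DATA TYPES:"]
    lines += [f"  • {col}: {dtype}" for col, dtype in df.dtypes.items()]
    lines += ["", "🔍 DATA QUALITY:"]
    total_missing = int(df.isna().sum().sum())
    if total_missing == 0:
        lines.append("✓ No missing values - dataset is complete!")
    else:
        lines.append(f"⚠ {total_missing} missing values found")
    if date_col is not None:
        dates = pd.to_datetime(df[date_col], errors="coerce").dropna()
        if not dates.empty:
            start, end = dates.min(), dates.max()
            lines += ["", "📅 TIME SERIES ANALYSIS:",
                      f"Date range: {start.date()} to {end.date()}",
                      f"Span: {(end - start).days} days"]
    return "\n".join(lines)


# resources/sample.csv, inlined.
sample = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
2024-01-18,Widget C,2,199.98,C001,North
2024-01-19,Widget B,4,119.96,C004,West
2024-01-20,Widget A,6,155.94,C005,South
"""
report = overview_report(pd.read_csv(io.StringIO(sample)), date_col="date")
print(report)
```

The span is `(end - start).days`; for the range shown in the example output, 2024-01-05 to 2024-04-11, that gives 97 days.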
158 | 
159 | ## 🌟 Connect & Learn More
160 | 
161 | <div align="center">
162 | 
163 | [![Join AI Community](https://img.shields.io/badge/Join-AI%20Community%20(FREE)-blue?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJ3aGl0ZSI+PHBhdGggZD0iTTEyIDJDNi40OCAyIDIgNi40OCAyIDEyczQuNDggMTAgMTAgMTAgMTAtNC40OCAxMC0xMFMxNy41MiAyIDEyIDJ6bTAgM2MxLjY2IDAgMyAxLjM0IDMgM3MtMS4zNCAzLTMgMy0zLTEuMzQtMy0zIDEuMzQtMyAzLTN6bTAgMTQuMmMtMi41IDAtNC43MS0xLjI4LTYtMy4yMi4wMy0xLjk5IDQtMy4wOCA2LTMuMDggMS45OSAwIDUuOTcgMS4wOSA2IDMuMDgtMS4yOSAxLjk0LTMuNSAzLjIyLTYgMy4yMnoiLz48L3N2Zz4=)](https://www.skool.com/ai-for-your-business/about)
164 | 
165 | [![Link Tree](https://img.shields.io/badge/Linktree-Everything-green?style=for-the-badge&logo=linktree&logoColor=white)](https://linktr.ee/corbin_brown)
166 | 
167 | [![YouTube Membership](https://img.shields.io/badge/YouTube-Become%20a%20Builder-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)
168 | 
169 | [![Twitter Follow](https://img.shields.io/badge/Twitter-Follow%20@corbin__braun-1DA1F2?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/corbin_braun)
170 | 
171 | </div>
172 | 
173 | ## 🤝 Contributing
174 | 
175 | Contributions are welcome! Feel free to:
176 | - Report bugs
177 | - Suggest new features
178 | - Submit pull requests
179 | - Share your use cases
180 | 
181 | ## 📄 License
182 | 
183 | MIT License - feel free to use this skill for personal or commercial projects!
184 | 
185 | ## 🙏 Acknowledgments
186 | 
187 | Built for the Claude Skills platform by [Anthropic](https://www.anthropic.com/news/skills).
188 | 
189 | ---
190 | 
191 | <div align="center">
192 | 
193 | **Made with ❤️ for the AI community**
194 | 
195 | ⭐ Star this repo if you find it useful!
196 | 
197 | </div>
198 | 
199 | 


--------------------------------------------------------------------------------