├── requirements.txt
├── data
    ├── Superstore-Data.csv
    └── Features_Target_Description.txt
├── Product Analysis Tableau.txt
├── README.md
└── notebooks
    └── Products-Analysis.sql


/requirements.txt:
--------------------------------------------------------------------------------
numpy
pandas
matplotlib
seaborn
squarify
scikit-learn
gradio

--------------------------------------------------------------------------------
/data/Superstore-Data.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikitaprasad21/Product-Recommendation-Analysis-Project/HEAD/data/Superstore-Data.csv


--------------------------------------------------------------------------------
/Product Analysis Tableau.txt:
--------------------------------------------------------------------------------
I designed a Quarterly Sales Forecasting Dashboard in Tableau to get insights into the state-wise distribution of sales and the profit time series.

Here is the link to the dashboard on my Tableau Public profile.

https://public.tableau.com/app/profile/nikita.prasad/viz/QuarterlySalesForecastingAnalysisDahboard/AnalysisDashboard

The observations made are as follows:

1. Quarterly sales are forecast to grow in 2018 Q4.
2. The "Consumer" segment contains the most profitable customers for our business.
3. "California" is the biggest and most profitable market in the Consumer segment, whereas "North Dakota" is the least profitable.
4. "Phones" are the highest-selling product, but "Copiers" generate more profit.
5. "Fasteners" are the lowest-selling product, but "Tables" even have a negative profit margin.
--------------------------------------------------------------------------------
/data/Features_Target_Description.txt:
--------------------------------------------------------------------------------
Superstore Data Table

Variable Name       Description
--------------------------------------------------------
Row ID              Unique ID for each row.
Order ID            Unique Order ID for each Customer.
Order Date          Order Date of the product.
Ship Date           Shipping Date of the Product.
Ship Mode           Shipping Mode specified by the Customer.
Customer ID         Unique ID to identify each Customer.
Customer Name       Name of the Customer.
Segment             The segment where the Customer belongs.
Country             Country of residence of the Customer.
City                City of residence of the Customer.
State               State of residence of the Customer.
Postal Code         Postal Code of every Customer.
Region              Region where the Customer belongs.
Product ID          Unique ID of the Product.
Category            Category of the product ordered.
Sub-Category        Sub-Category of the product ordered.
Product Name        Name of the Product.
Sales               Sales of the Product.
Quantity            Quantity of the Product.
Discount            Discount provided.
Profit              Profit/Loss incurred.



Superstore Reviews Data Table

Variable Name       Description
--------------------------------------------------------
Row ID              Unique ID for each row.
Order ID            Unique Order ID for each Customer.
Customer ID         Unique ID to identify each Customer.
Product ID          Unique ID of the Product.
Segment             The segment where the Customer belongs.
Category            Category of the product ordered.
Sub-Category        Sub-Category of the product ordered.
Product Name        Name of the Product.
Sales               Sales of the Product.
Quantity            Quantity of the Product.
Discount            Discount provided.
Profit              Profit/Loss incurred.
Rate                Rating from 1 to 5.
Review              Review of the product given by the customer.
Summary             Summary of the customer review or experience.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Product Recommendation System for E-commerce Project

## Project Background
This project focuses on enhancing product recommendations and understanding market trends for an online retail platform. A **SQL**-based data warehouse was designed to store and manage customer and sales data. The project involved analyzing 10,000 customer records, forecasting sales trends, and building personalized recommendation systems using **Python** to improve user experience. These insights help businesses identify profitable segments and optimize marketing efforts, leading to growth and increased customer satisfaction.

## Dataset Structure
The dataset for this project was organized into a SQL-based data warehouse with multiple tables for customer data, sales records, product information, and regional market data. The analysis and modeling rely primarily on the combination of two sources:

* [Superstore Data](https://github.com/nikitaprasad21/Product-Recommendation-Analysis-Project/blob/main/data/Superstore-Data.csv): Superstore purchase history and revenue data.
* [Superstore Reviews Data](https://github.com/nikitaprasad21/Product-Recommendation-Analysis-Project/blob/main/data/Superstore-Dataset-Reviews.csv): Ratings, reviews, and review summaries given by customers for the products.

Note: The dataset comprises 10,000+ records with 36 columns across the two tables (21 in the Superstore data and 15 in the reviews data). The feature and target descriptions can be found [here](https://github.com/nikitaprasad21/Product-Recommendation-Analysis-Project/blob/main/data/Features_Target_Description.txt).
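
As a rough illustration of how the two sources might be combined, the sketch below left-joins review fields onto order rows. It is not the project's pipeline: the miniature CSV texts are made-up stand-ins for the real files, and the choice of `Order ID`, `Customer ID`, and `Product ID` as join keys is an assumption based on the columns the two tables share.

```python
import csv
import io

# Made-up miniature stand-ins for Superstore-Data.csv and
# Superstore-Dataset-Reviews.csv (illustrative rows, not real records).
ORDERS_CSV = """Row ID,Order ID,Customer ID,Product ID,Sales,Profit
1,ORD-001,CUST-1,PROD-A,100.0,20.0
2,ORD-002,CUST-2,PROD-B,50.0,-5.0
"""
REVIEWS_CSV = """Row ID,Order ID,Customer ID,Product ID,Rate,Review
1,ORD-001,CUST-1,PROD-A,5,Works well
"""

# Assumption: an order line and its review share these three IDs.
JOIN_KEYS = ("Order ID", "Customer ID", "Product ID")


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_reviews(orders, reviews):
    """Left-join the Rate and Review fields onto order rows using JOIN_KEYS."""
    by_key = {tuple(r[k] for k in JOIN_KEYS): r for r in reviews}
    merged = []
    for order in orders:
        review = by_key.get(tuple(order[k] for k in JOIN_KEYS), {})
        merged.append(
            {**order, "Rate": review.get("Rate"), "Review": review.get("Review")}
        )
    return merged


combined = attach_reviews(read_rows(ORDERS_CSV), read_rows(REVIEWS_CSV))
```

Orders without a matching review keep `None` in the review fields. With pandas (listed in `requirements.txt`), the same join could be written as `pandas.merge(orders, reviews, on=list(JOIN_KEYS), how="left")`.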


## Executive Summary
This project implemented a SQL-based Superstore Data Warehouse and developed product recommendation models to improve customer experience and sales strategies. By analyzing data from 10,000 customers, the system identified profitable market segments, forecasted sales trends, and provided personalized product recommendations.

Key insights included the dominance of the "Consumer" segment in profitability, the significant role of California as the top-performing market, and strategies for improvement in underperforming regions such as North Dakota.


## Codes
* The targeted **SQL queries** addressing various business questions can be found [here](https://github.com/nikitaprasad21/Product-Recommendation-Analysis-Project/blob/main/notebooks/Products-Analysis.sql).
* The interactive **Tableau dashboard** used to report and explore sales trends can be found [here](https://public.tableau.com/app/profile/nikita.prasad/viz/QuarterlySalesForecastingAnalysisDahboard/AnalysisDashboard).
* The **Python pipeline** used for EDA, building the recommendation models, and deploying the model can be found [here](https://github.com/nikitaprasad21/Product-Recommendation-Analysis-Project/blob/main/notebooks/Product-Recommendation-Project.ipynb).
* The **Recommendation App** demo can be found [here](https://huggingface.co/spaces/nikitaprasad-analyst/product-recommendation-system).

## Dashboard

Here is a glimpse of the dashboard.
![Screenshot 2023-04-12 165416](https://user-images.githubusercontent.com/84131752/231447810-39810cfc-f423-4463-b6c8-e2eb4c73f878.png)


## Insights

#### Category 1: Profitable Market Segment
* The "Consumer" segment is the most profitable of our customer segments, generating the highest revenue and profit, which highlights its significance to our business.

#### Category 2: Top Performing Region
* California emerged as the largest and most profitable market, contributing significantly to total sales.
#### Category 3: Underperforming Markets
* North Dakota represents an underperforming market, requiring strategic attention to improve profitability and market share.
#### Category 4: Growth Forecast
* The sales forecast predicts increased sales in Q4 of 2018, presenting growth opportunities for targeted marketing efforts.

## Recommendations

#### Category 1: Capitalize on Growth Opportunities

* With quarterly sales forecast to grow in 2018 Q4, it is advisable to allocate additional marketing resources to maximize sales opportunities during this period.

#### Category 2: Focus on the "Consumer" Category
* Develop targeted marketing campaigns, personalized promotions, and enhanced customer experiences to further boost profitability within the "Consumer" segment.

#### Category 3: Optimize Market Strategies
* As California is the most profitable market, it is essential to continue investing in this region through focused marketing efforts, maintaining customer loyalty, and increasing market share.

#### Category 4: Improve Underperforming Markets

* Conduct in-depth market research in North Dakota to understand consumer behavior and tailor marketing efforts to boost sales and profitability in the region.

## Assumptions and Caveats
#### Assumptions:

* The data provided for analysis is accurate and reflects customer preferences and purchasing patterns.
* The sales forecast is based on historical trends and assumes no major disruptions in market behavior.
#### Caveats:

* Seasonal fluctuations and external factors like economic changes or unforeseen events may impact sales trends and market behavior, making forecast predictions less accurate.
* The recommendation models are built on available data; their effectiveness may vary if customer preferences shift or if there is insufficient data for new products.

--------------------------------------------------------------------------------
/notebooks/Products-Analysis.sql:
--------------------------------------------------------------------------------
-- SQL analysis of 10K customer records to identify trends and gain insight into the profitable market segments.

-- 1. Combine the tables from all regions into a single common table expression (CTE)

WITH superstore_db AS (
    SELECT * FROM superstore_dataset___west
    UNION ALL
    SELECT * FROM superstore_dataset___central
    UNION ALL
    SELECT * FROM superstore_dataset___east
    UNION ALL
    SELECT * FROM superstore_dataset___south
)

SELECT * FROM superstore_db;


-- 2. Rename table columns, replacing spaces and hyphens with "_"

ALTER TABLE `superstore_dataset___west` CHANGE `Sub-Category` `Sub_Category` VARCHAR(11) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL;


-- 3. Find total sales and profit for each region and state

SELECT Region, State, ROUND(SUM(Sales), 2) AS Total_Sales, ROUND(SUM(Profit), 2) AS Total_Profit
FROM superstore_dataset___west
GROUP BY Region, State
UNION ALL
SELECT Region, State, ROUND(SUM(Sales), 2) AS Total_Sales, ROUND(SUM(Profit), 2) AS Total_Profit
FROM superstore_dataset___central
GROUP BY Region, State
UNION ALL
SELECT Region, State, ROUND(SUM(Sales), 2) AS Total_Sales, ROUND(SUM(Profit), 2) AS Total_Profit
FROM superstore_dataset___east
GROUP BY Region, State
UNION ALL
SELECT Region, State, ROUND(SUM(Sales), 2) AS Total_Sales, ROUND(SUM(Profit), 2) AS Total_Profit
FROM superstore_dataset___south
GROUP BY Region, State;


-- 4. Find the top 10 products by average sales (averaged within each regional table)
-- The trailing ORDER BY and LIMIT apply to the combined result of all four SELECTs.

SELECT Product_Name, ROUND(AVG(Sales), 2) AS Avg_Sales
FROM superstore_dataset___west
GROUP BY Product_Name
UNION ALL
SELECT Product_Name, ROUND(AVG(Sales), 2) AS Avg_Sales
FROM superstore_dataset___central
GROUP BY Product_Name
UNION ALL
SELECT Product_Name, ROUND(AVG(Sales), 2) AS Avg_Sales
FROM superstore_dataset___east
GROUP BY Product_Name
UNION ALL
SELECT Product_Name, ROUND(AVG(Sales), 2) AS Avg_Sales
FROM superstore_dataset___south
GROUP BY Product_Name
ORDER BY Avg_Sales DESC
LIMIT 10;


-- 5. Find the top 5 category, sub-category, and product combinations by profit

SELECT Category, Sub_Category, Product_Name, ROUND(SUM(Profit), 2) AS Profit
FROM superstore_dataset___west
GROUP BY Category, Sub_Category, Product_Name
UNION ALL
SELECT Category, Sub_Category, Product_Name, ROUND(SUM(Profit), 2) AS Profit
FROM superstore_dataset___central
GROUP BY Category, Sub_Category, Product_Name
UNION ALL
SELECT Category, Sub_Category, Product_Name, ROUND(SUM(Profit), 2) AS Profit
FROM superstore_dataset___east
GROUP BY Category, Sub_Category, Product_Name
UNION ALL
SELECT Category, Sub_Category, Product_Name, ROUND(SUM(Profit), 2) AS Profit
FROM superstore_dataset___south
GROUP BY Category, Sub_Category, Product_Name
ORDER BY Profit DESC
LIMIT 5;


-- 6. Get all details for the customers "Darrin Van Huff" and "Brosina Hoffman"

SELECT *
FROM superstore_dataset___west
WHERE Customer_Name IN ('Darrin Van Huff', 'Brosina Hoffman');


-- 7. Get the top 10 potential customers by number of purchases

SELECT Customer_ID, Region, Customer_Name, COUNT(Sales) AS Potential_Customers
FROM superstore_dataset___west
GROUP BY Customer_ID, Region, Customer_Name
UNION ALL
SELECT Customer_ID, Region, Customer_Name, COUNT(Sales) AS Potential_Customers
FROM superstore_dataset___central
GROUP BY Customer_ID, Region, Customer_Name
UNION ALL
SELECT Customer_ID, Region, Customer_Name, COUNT(Sales) AS Potential_Customers
FROM superstore_dataset___east
GROUP BY Customer_ID, Region, Customer_Name
UNION ALL
SELECT Customer_ID, Region, Customer_Name, COUNT(Sales) AS Potential_Customers
FROM superstore_dataset___south
GROUP BY Customer_ID, Region, Customer_Name
ORDER BY `Potential_Customers` DESC
LIMIT 10;

--------------------------------------------------------------------------------
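
The per-region queries above repeat the same aggregation once for each regional table. As a hedged sketch of an alternative, the Python snippet below stacks the regional tables once in a CTE and aggregates over the combined rows, using the standard-library `sqlite3` module instead of MySQL. The shortened table names (`superstore_west`, etc.), the reduced column set, and all inserted rows are hypothetical and made up for illustration; they are not values from `Superstore-Data.csv`.

```python
import sqlite3

# Hypothetical, simplified regional tables; every row below is a made-up
# illustration, not a value taken from Superstore-Data.csv.
REGIONS = ("west", "central", "east", "south")


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one small table per region."""
    conn = sqlite3.connect(":memory:")
    for region in REGIONS:
        conn.execute(
            f"CREATE TABLE superstore_{region} "
            "(Region TEXT, State TEXT, Sales REAL, Profit REAL)"
        )
    conn.executemany(
        "INSERT INTO superstore_west VALUES (?, ?, ?, ?)",
        [("West", "California", 100.0, 20.0), ("West", "California", 50.0, 5.5)],
    )
    conn.execute(
        "INSERT INTO superstore_central VALUES ('Central', 'North Dakota', 30.0, -2.0)"
    )
    conn.execute("INSERT INTO superstore_east VALUES ('East', 'New York', 80.0, 12.0)")
    conn.execute("INSERT INTO superstore_south VALUES ('South', 'Florida', 40.0, 4.0)")
    return conn


def sales_by_state(conn: sqlite3.Connection):
    """Return (Region, State, Total_Sales, Total_Profit) rows, highest sales first."""
    # Stack the regional tables once (UNION ALL keeps every row),
    # then aggregate over the combined rows in a single GROUP BY.
    union_sql = " UNION ALL ".join(
        f"SELECT Region, State, Sales, Profit FROM superstore_{region}"
        for region in REGIONS
    )
    query = f"""
        WITH superstore_db AS ({union_sql})
        SELECT Region, State,
               ROUND(SUM(Sales), 2) AS Total_Sales,
               ROUND(SUM(Profit), 2) AS Total_Profit
        FROM superstore_db
        GROUP BY Region, State
        ORDER BY Total_Sales DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_by_state(build_demo_db()):
        print(row)
```

Aggregating after the union also matters for queries such as the per-product average: a product sold in several regions then yields one combined figure rather than one figure per regional table.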