├── SQL
│   ├── SQL Project
│   │   ├── Readme.md
│   │   └── DANNYSDINER CASE-STUDY.sql
│   ├── README.md
│   └── Most Powerful women
│       ├── Most powerful women SQL data wrangling.sql
│       └── The Worlds 100 Most Powerful Women.csv
├── POWER BI
│   └── README.md
├── EXCEL
│   └── README.md
├── R
│   ├── README.md
│   ├── Cyclistic bike share.R
│   └── Quantium Chips.R
├── Python Projects
│   ├── README.md
│   └── WEB SCRAPPING E- COMMERCE SITE (EBAY).ipynb
└── README.md

/SQL/SQL Project/Readme.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/POWER BI/README.md:
--------------------------------------------------------------------------------
#### 📍I AM SO GLAD THAT YOU HAVE TAKEN INTEREST IN MY WORK.
* THANK YOU FOR YOUR TIME.

🔴PLEASE DO FOLLOW THE LINK BELOW TO INTERACT WITH MY POWER BI DASHBOARD ON NOVY PRO.

https://www.novypro.com/profile_projects/mary-anene
--------------------------------------------------------------------------------
/EXCEL/README.md:
--------------------------------------------------------------------------------
#### I APPRECIATE YOUR INTEREST IN MY EXCEL PORTFOLIO.
* PLEASE DO FIND MY WORK ON EXCEL VIA THIS LINK

https://www.canva.com/design/DAFdrRT376E/z1Nv_fbSAF9VXERRIO6QCQ/view?utm_content=DAFdrRT376E&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
--------------------------------------------------------------------------------
/SQL/README.md:
--------------------------------------------------------------------------------
## DATA ANALYTICS WITH SQL

Welcome to the world of data analytics with SQL! In this folder, you will find a collection of projects and resources that showcase the power and versatility of SQL in data analysis.

SQL, or Structured Query Language, is a powerful tool for managing and analyzing data.
It allows you to perform complex queries and extract valuable insights from large and complex data sets. In this repository, you will find examples of SQL queries and scripts that demonstrate my proficiency in data analysis using SQL.

You'll find a diverse set of projects that covers a wide range of data analysis techniques, such as database creation, data cleaning, database querying and critical thinking. These projects showcase my ability to extract insights from data and present them in a clear and visually appealing way. I have also included examples of reports that I have created using SQL. These projects demonstrate my ability to analyze and present data in a way that is easy for decision makers to understand.

I have also included resources to help you improve your data analysis skills with SQL.

I hope you find this repository to be informative and engaging, and I welcome any opportunity to discuss my qualifications and experience further. Thank you for visiting my repository and happy data analyzing!
--------------------------------------------------------------------------------
/R/README.md:
--------------------------------------------------------------------------------
### R PROGRAMMING DATA ANALYSIS

Welcome to the world of data analytics with R! In this repository, you will find a collection of projects and resources that showcase the power and versatility of R in data analysis.

R is a popular programming language and software environment for data analysis. It is widely used by statisticians, data scientists, and researchers because of its powerful tools for data manipulation, visualization, and statistical modeling.

In this repository, you will find examples of R code that demonstrate my proficiency in data analysis using R.
You'll find a diverse set of projects that covers a wide range of data analysis techniques, such as data cleaning, data exploration, data visualization, statistical modeling, and machine learning. These projects showcase my ability to extract insights from data and present them in a clear and visually appealing way.

I have also included resources to help you improve your data analysis skills with R. These resources include tutorials, exercises, and examples that will help you learn the basics of R and how to use it for data analysis. Whether you are new to data analysis or a seasoned data analyst looking to improve your skills, you will find something of value in this repository.

I hope you find this repository to be informative and engaging. I welcome any opportunity to discuss my qualifications and experience further. Thank you for visiting my repository and happy data analyzing with R!
--------------------------------------------------------------------------------
/Python Projects/README.md:
--------------------------------------------------------------------------------
### DATA ANALYTICS WITH PYTHON
Welcome to the world of data analytics with Python! In this folder, you will find a collection of projects and resources that showcase the power and versatility of Python in data analysis.

Python is a powerful and versatile programming language that is widely used in data analysis. It has a large ecosystem of libraries and frameworks that make it easy to perform complex data analysis tasks, such as data mining, data visualization, statistical modeling, and machine learning.

In this folder, you will find examples of Python code that demonstrate my proficiency in data analysis using Python. You'll find a diverse set of projects that covers a wide range of data analysis techniques, such as data cleaning, data exploration, data visualization, and statistical modeling.
These projects showcase my ability to extract insights from data and present them in a clear and visually appealing way.

I have also included resources to help you improve your data analysis skills with Python. These resources include tutorials, exercises, and examples that will help you learn the basics of Python and how to use it for data analysis. Whether you are new to data analysis or a seasoned data analyst looking to improve your skills, you will find something of value in this folder.

I hope you find this repository to be informative and engaging. I welcome any opportunity to discuss my qualifications and experience further. Thank you for visiting my repository and happy data analyzing with Python!
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# DATA ANALYTICS PORTFOLIO
Welcome to my Data Analyst Portfolio!

I am a data analyst with experience in using various tools and technologies to collect, organize, and analyze data to inform business decisions. I am proficient in Python, R, SQL, Excel, Power BI, and Tableau, and have a solid understanding of data analytics techniques such as dashboard building, report writing, data mining, data merging, statistics, and critical thinking.

In this portfolio, I have included a variety of projects that showcase my data analytics skills. You will find links to the dashboards and reports I have created using various tools such as Power BI, Tableau, and Excel. These projects demonstrate my ability to analyze and present data in a clear and visually appealing way, making it easy for decision makers to understand the insights and take action.

I have also included my work in programming languages such as Python, SQL and R.
These projects showcase my ability to extract valuable insights from large and complex data sets, and to combine data from multiple sources to create a comprehensive view of the data.

In addition to my technical skills, I also pride myself on my critical thinking and problem-solving abilities. I am able to approach data analysis with a strategic mindset, and to identify key issues and opportunities in the data.

I am confident that my data analytics skills and experience make me an ideal candidate for any data analyst role. I hope you find my portfolio to be informative and engaging, and I welcome any opportunity to discuss my qualifications further with you.

Thank you for your time. I look forward to hearing from you soon.
--------------------------------------------------------------------------------
/SQL/Most Powerful women/Most powerful women SQL data wrangling.sql:
--------------------------------------------------------------------------------
# For this SQL project I will be answering 10 questions from this data set.
# Let's start by introducing the table.

SELECT *
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`;

# 1. What is the average age of the women in the dataset?
# The CSV marks missing ages with '-'; NULLIF turns those into NULL so AVG skips them.

SELECT AVG(NULLIF(AGE, '-')) AS AVG_AGE
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`;

# 2. How many women in the dataset are from each country?

SELECT COUNT(NAME) AS NUM_OF_WOMEN,
       LOCATION
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
GROUP BY LOCATION
ORDER BY COUNT(NAME) DESC;

# 3. What are the top 5 most common occupations among the women in the dataset?

SELECT CATEGORY,
       COUNT(CATEGORY) AS NUM_OF_WOMEN
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
GROUP BY 1
ORDER BY NUM_OF_WOMEN DESC
LIMIT 5;

# 4. How many women in the dataset are over the age of 40?

SELECT COUNT(AGE) AS OVER40WOMEN
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE AGE > 40;

#-----OR---
# List the women over 40 instead of counting them.

SELECT NAME,
       AGE
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE AGE > 40
ORDER BY AGE DESC;

# 5. What is the youngest and oldest age of the women in the dataset?
# YOUNGEST (rows with a missing age, stored as '-' in the CSV, are skipped)
SELECT `RANK`,
       NAME,
       AGE AS YOUNGEST_WOMAN, LOCATION, CATEGORY
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE AGE <> '-'
ORDER BY YOUNGEST_WOMAN
LIMIT 1;

# OLDEST

SELECT `RANK`,
       NAME,
       AGE AS OLDEST_WOMAN, LOCATION, CATEGORY
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
ORDER BY OLDEST_WOMAN DESC
LIMIT 1;

# 6. How many women in the dataset are from North America?
# Mexico appears in the data set, so it is included in the list.

SELECT LOCATION,
       COUNT(LOCATION) AS NORTH_AMERICA
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE LOCATION IN ('United States', 'Mexico', 'Honduras', 'Barbados')
GROUP BY LOCATION;

# 7. How many women in the dataset are in the Technology category?

SELECT CATEGORY,
       COUNT(CATEGORY) AS WOMEN_IN_TECH
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE CATEGORY = 'Technology'
GROUP BY CATEGORY;

# 8. How many women in the dataset are from Asia?

SELECT LOCATION,
       COUNT(LOCATION) AS ASIAN_GIRLPOWER
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE LOCATION IN ('China', 'India', 'Singapore', 'Indonesia', 'Bangladesh', 'Japan', 'South Korea', 'Taiwan', 'United Arab Emirates', 'Iran')
GROUP BY LOCATION
ORDER BY ASIAN_GIRLPOWER DESC;

#---or----
# A single total for the whole continent.

SELECT COUNT(*) AS ASIAN_WOMEN_TOTAL
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
WHERE LOCATION IN ('China', 'India', 'Singapore', 'Indonesia', 'Bangladesh', 'Japan', 'South Korea', 'Taiwan', 'United Arab Emirates', 'Iran');

# 9. What is the standard deviation of the ages of the women in the dataset?

SELECT STDDEV(NULLIF(AGE, '-')) AS STDDEV_AGE
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`;


# 10. Assign each location to a derived CONTINENT column
SELECT *,
       CASE
           WHEN LOCATION IN ('United States', 'Mexico', 'Honduras', 'Barbados') THEN 'NORTH AMERICA'
           WHEN LOCATION IN ('Germany', 'Belgium', 'Italy', 'United Kingdom', 'Spain', 'France', 'Denmark', 'Turkey', 'Finland', 'Slovakia') THEN 'EUROPE'
           WHEN LOCATION IN ('China', 'India', 'Taiwan', 'Singapore', 'Indonesia', 'Bangladesh', 'Japan', 'South Korea', 'United Arab Emirates', 'Iran') THEN 'ASIA'
           WHEN LOCATION IN ('Australia', 'New Zealand') THEN 'OCEANIA'
           WHEN LOCATION IN ('Nigeria', 'Tanzania') THEN 'AFRICA'
           ELSE 'OTHER' END AS CONTINENT
FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
ORDER BY CONTINENT;


# 11. What is the median age of the women in the dataset?
# MySQL has no MEDIAN(); number the sorted ages and average the middle row(s).
# Needs MySQL 8.0+ for window functions; rows with a missing age ('-') are skipped.

SELECT AVG(AGE) AS MEDIAN_AGE
FROM (SELECT AGE,
             ROW_NUMBER() OVER (ORDER BY AGE) AS RN,
             COUNT(*) OVER () AS TOTAL
      FROM `data_analytics_ with_sql`.`the worlds 100 most powerful women`
      WHERE AGE <> '-') AS RANKED
WHERE RN IN (FLOOR((TOTAL + 1) / 2), CEIL((TOTAL + 1) / 2));

--------------------------------------------------------------------------------
/SQL/Most Powerful women/The Worlds 100 Most Powerful Women.csv:
--------------------------------------------------------------------------------
RANK,NAME,AGE,LOCATION,CATEGORY
1.,Ursula von der Leyen,64,Belgium,Politics & Policy
2.,Christine Lagarde,66,Germany,Politics & Policy
3.,Kamala Harris,58,United States,Politics & Policy
4.,Mary Barra,60,United States,Business
5.,Abigail Johnson,60,United States,Money
6.,Melinda French Gates,58,United States,Philanthropy
7.,Giorgia Meloni,45,Italy,Politics & Policy
8.,Karen Lynch,58,United States,Business
9.,Julie Sweet,55,United States,Business
10.,Jane Fraser,55,United States,Finance
11.,MacKenzie Scott,52,United States,Impact
12.,Kristalina Georgieva,69,United States,Politics & Policy
13.,Rosalind Brewer,59,United States,Business
14.,Emma Walmsley,53,United Kingdom,Business
15.,Ana Patricia Botín,62,Spain,Finance
16.,Gail Boudreaux,62,United States,Business
17.,Tsai Ing-wen,66,Taiwan,Politics & Policy
18.,Ruth Porat,65,United States,Technology
19.,Safra Catz,60,United States,Technology
20.,Martina Merz,59,Germany,Business
21.,Carol Tomé,65,United States,Business
22.,Judith McKenna,56,United States,Business
23.,Susan Wojcicki,54,United States,Technology
24.,Oprah Winfrey,68,United States,Media & Entertainment
25.,Nancy Pelosi,82,United States,Politics & Policy
26.,Laurene Powell Jobs & family,59,United States,Philanthropy
27.,Amanda Blanc,55,United Kingdom,Business
28.,Amy Hood,50,United States,Technology
29.,Catherine MacGregor,50,France,Business
30.,Phebe Novakovic,65,United States,Entrepreneurs
31.,Gwynne Shotwell,59,United States,Technology
32.,Shari Redstone,68,United States,Lifestyle
33.,Janet Yellen,76,United States,Politics & Policy
34.,Jessica Tan,45,China,Finance
35.,Ho Ching,69,Singapore,Finance
36.,Nirmala Sitharaman,63,India,Politics & Policy
37.,Thasunda Brown Duckett,49,United States,Finance
38.,Kathy Warden,51,United States,Business
39.,"Marianne Lake, Jennifer Piepszak",-,United States,Finance
40.,Jacinda Ardern,42,New Zealand,Politics & Policy
41.,Dana Walden,58,United States,Media & Entertainment
42.,Sheikh Hasina Wajed,75,Bangladesh,Politics & Policy
43.,Mary Callahan Erdoes,55,United States,Money
44.,Adena Friedman,53,United States,Finance
45.,Gina Rinehart,68,Australia,Business
46.,Lynn Martin,46,United States,Finance
47.,Sri Mulyani Indrawati,60,Indonesia,Politics & Policy
48.,Vicki Hollub,63,United States,Entrepreneurs
49.,Nicke Widyawati,54,Indonesia,Business
50.,Lisa Su,53,United States,Entrepreneurs
51.,Tricia Griffith,58,United States,Business
52.,Shemara Wikramanayake,60,Australia,Business
53.,Roshni Nadar Malhotra,41,India,Technology
54.,Madhabi Puri Buch,56,India,Politics & Policy
55.,Sinead Gorman,45,United Kingdom,Business
56.,Tokiko Shimizu,57,Japan,Finance
57.,Yuriko Koike,70,Japan,Politics & Policy
58.,Jennifer Salke,58,United States,Media & Entertainment
59.,Jenny Johnson,58,United States,Money
60.,Hana Al Rostamani,-,United Arab Emirates,Finance
61.,Donna Langley,54,United States,Media & Entertainment
62.,Dong Mingzhu,68,China,Business
63.,Judy Faulkner,79,United States,Entrepreneurs
64.,Robyn Denholm,59,Australia,Business
65.,Suzanne Scott,56,United States,Media & Entertainment
66.,Lynn Good,63,United States,Business
67.,Soma Mondal,59,India,Business
68.,Belén Garijo,62,Germany,Business
69.,Melanie Kreis,51,Germany,Business
70.,Bela Bajaria,51,United States,Media & Entertainment
71.,Paula Santilli,-,Mexico,Business
72.,Laura Cha,-,China,Finance
73.,Rihanna,34,United States,Media & Entertainment
74.,Mette Frederiksen,45,Denmark,Politics & Policy
75.,Mary Meeker,63,United States,Venture Capital
76.,Joey Wat,51,China,Business
77.,Kiran Mazumdar-Shaw,69,India,Business
78.,Jenny Lee,50,Singapore,Venture Capital
79.,Taylor Swift,32,United States,Media & Entertainment
80.,Beyoncé Knowles,41,United States,Media & Entertainment
81.,Güler Sabanci,67,Turkey,Business
82.,Linda Thomas-Greenfield,70,United States,Impact
83.,Sanna Marin,37,Finland,Politics & Policy
84.,Solina Chau,60,China,Philanthropy
85.,Lee Boo-jin,52,South Korea,Business
86.,Reese Witherspoon,46,United States,Media & Entertainment
87.,Zuzana Caputova,49,Slovakia,Politics & Policy
88.,Dominique Senequier,69,France,Finance
89.,Falguni Nayar,59,India,Business
90.,Julia Gillard,61,Australia,Philanthropy
91.,Ngozi Okonjo-Iweala,68,Nigeria,Politics & Policy
92.,Raja Easa Al Gurg,-,United Arab Emirates,Business
93.,Shonda Rhimes,52,United States,Entertainment
94.,Xiomara Castro,63,Honduras,Politics & Policy
95.,Samia Suluhu Hassan,62,Tanzania,Politics & Policy
96.,Dolly Parton,76,United States,Lifestyle
97.,Kirsten Green,51,United States,Money
98.,Mia Mottley,57,Barbados,Politics & Policy
99.,Mo Abudu,58,Nigeria,Media & Entertainment
100.,Mahsa Amini (posthumous),-,Iran,Politics & Policy
--------------------------------------------------------------------------------
/SQL/SQL Project/DANNYSDINER CASE-STUDY.sql:
--------------------------------------------------------------------------------
/* --------------------
   Case Study Questions
   --------------------*/

-- 1. What is the total amount each customer spent at the restaurant?
-- 2. How many days has each customer visited the restaurant?
-- 3. What was the first item from the menu purchased by each customer?
-- 4. What is the most purchased item on the menu and how many times was it purchased by all customers?
-- 5. Which item was the most popular for each customer?
-- 6. Which item was purchased first by the customer after they became a member?
-- 7. Which item was purchased just before the customer became a member?
-- 8. What is the total items and amount spent for each member before they became a member?
-- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
-- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

-- Preview each table:

-- Members table
SELECT
    *
FROM dannys_diner.dbo.members;


-- Menu table
SELECT
    *
FROM dannys_diner.dbo.menu;


-- Sales table
SELECT
    *
FROM dannys_diner.dbo.sales;


-- 1. What is the total amount each customer spent at the restaurant?

SELECT
    s.customer_id,
    SUM(m.price) AS total_amount_spent
FROM dannys_diner.dbo.sales s
JOIN dannys_diner.dbo.menu m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id;



-- 2. How many days has each customer visited the restaurant?

SELECT
    customer_id,
    COUNT(DISTINCT order_date) AS total_days_visited
FROM dannys_diner.dbo.sales
GROUP BY customer_id
ORDER BY customer_id;


-- 3. What was the first item from the menu purchased by each customer?

-- Earliest order date per customer and product.
-- SQL Server rejects ORDER BY inside a CTE unless TOP or OFFSET is used, so sorting is left to the outer query.
WITH customer_first_purchase AS (
    SELECT
        s.customer_id,
        m.product_name,
        MIN(s.order_date) AS first_purchase_date
    FROM dannys_diner.dbo.sales s
    JOIN dannys_diner.dbo.menu m
        ON s.product_id = m.product_id
    GROUP BY s.customer_id, m.product_name
)
SELECT
    c.customer_id,
    c.product_name
FROM customer_first_purchase c
WHERE c.first_purchase_date = (
    SELECT MIN(first_purchase_date)
    FROM customer_first_purchase
    WHERE customer_id = c.customer_id
)
ORDER BY c.customer_id;

-- Alternative: join each customer's first order date back to the sales table.
SELECT c.customer_id, m.product_name
FROM (
    SELECT customer_id, MIN(order_date) AS first_order_date
    FROM dannys_diner.dbo.sales
    GROUP BY customer_id
) AS c
JOIN dannys_diner.dbo.sales AS s ON c.customer_id = s.customer_id AND c.first_order_date = s.order_date
JOIN dannys_diner.dbo.menu AS m ON s.product_id = m.product_id
ORDER BY c.customer_id;



-- 4. What is the most purchased item on the menu and how many times was it purchased by all customers?

-- TOP 1 keeps only the best-selling item.
SELECT TOP 1
    M.product_id,
    product_name,
    price,
    COUNT(S.product_id) AS total_purchases
FROM dannys_diner.dbo.menu AS M
INNER JOIN dannys_diner.dbo.sales AS S
    ON M.product_id = S.product_id
GROUP BY M.product_id, product_name, price
ORDER BY total_purchases DESC;



-- 5. Which item was the most popular for each customer?

-- RANK() keeps ties, so a customer with several equally popular items gets one row per item.
WITH popular_items AS (
    SELECT
        customer_id,
        product_id,
        COUNT(*) AS order_count,
        RANK() OVER (PARTITION BY customer_id ORDER BY COUNT(*) DESC) AS rnk
    FROM dannys_diner.dbo.sales
    GROUP BY customer_id, product_id
)
SELECT
    p.customer_id,
    m.product_name AS most_popular_item
FROM popular_items p
JOIN dannys_diner.dbo.menu m
    ON p.product_id = m.product_id
WHERE p.rnk = 1
ORDER BY p.customer_id;



-- 6. Which item was purchased first by the customer after they became a member?

-- Rank each member's purchases after the join date and keep the earliest one(s).
WITH purchases_after_joining AS (
    SELECT
        m.customer_id,
        m.join_date,
        s.order_date,
        u.product_name,
        RANK() OVER (PARTITION BY m.customer_id ORDER BY s.order_date) AS rnk
    FROM dannys_diner.dbo.members m
    JOIN dannys_diner.dbo.sales s ON m.customer_id = s.customer_id
    JOIN dannys_diner.dbo.menu u ON s.product_id = u.product_id
    WHERE s.order_date > m.join_date
)
SELECT
    customer_id,
    join_date,
    order_date AS first_purchase_date,
    product_name AS first_purchase_item
FROM purchases_after_joining
WHERE rnk = 1
ORDER BY customer_id;



-- 7. Which item was purchased just before the customer became a member?

-- Same approach, ranking the purchases before the join date from latest to earliest.
WITH purchases_before_joining AS (
    SELECT
        m.customer_id,
        m.join_date,
        s.order_date,
        u.product_name,
        RANK() OVER (PARTITION BY m.customer_id ORDER BY s.order_date DESC) AS rnk
    FROM dannys_diner.dbo.members m
    JOIN dannys_diner.dbo.sales s
        ON m.customer_id = s.customer_id
    JOIN dannys_diner.dbo.menu u
        ON s.product_id = u.product_id
    WHERE s.order_date < m.join_date
)
SELECT
    customer_id,
    join_date,
    order_date AS last_purchase_date,
    product_name AS last_purchase_item
FROM purchases_before_joining
WHERE rnk = 1
ORDER BY customer_id;


-- 8. What is the total items and amount spent for each member before they became a member?

SELECT
    m.customer_id,
    m.join_date,
    COUNT(s.product_id) AS total_items,
    SUM(u.price) AS total_amount_spent
FROM dannys_diner.dbo.members m
JOIN dannys_diner.dbo.sales s
    ON m.customer_id = s.customer_id
JOIN dannys_diner.dbo.menu u
    ON s.product_id = u.product_id
WHERE s.order_date < m.join_date
GROUP BY m.customer_id, m.join_date
ORDER BY m.customer_id;


-- 9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?

SELECT
    s.customer_id,
    SUM(
        CASE
            WHEN m.product_name = 'sushi' THEN 20 * m.price
            ELSE 10 * m.price
        END
    ) AS total_points
FROM dannys_diner.dbo.sales s
JOIN dannys_diner.dbo.menu m
    ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id;


-- 10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

WITH customer_points AS (
    SELECT s.customer_id, s.order_date, u.product_name,
        CASE
            -- First week of membership (join date plus six days): 2x points on every item
            WHEN s.order_date BETWEEN m.join_date AND DATEADD(DAY, 6, m.join_date) THEN 20 * u.price
            -- Outside that week sushi still earns 2x points
            WHEN u.product_name = 'sushi' THEN 20 * u.price
            ELSE 10 * u.price
        END AS points
    FROM dannys_diner.dbo.sales s
    JOIN dannys_diner.dbo.menu u ON s.product_id = u.product_id
    JOIN dannys_diner.dbo.members m ON s.customer_id = m.customer_id
    WHERE s.order_date <= '2021-01-31' -- End of January
)
SELECT customer_id, SUM(points) AS total_points
FROM customer_points
WHERE customer_id IN ('A', 'B')
GROUP BY customer_id
ORDER BY customer_id;

--------------------------------------------------------------------------------
/R/Cyclistic bike share.R:
--------------------------------------------------------------------------------
library(tidyverse)  # helps wrangle data
library(lubridate)  # helps wrangle date attributes
library(ggplot2)    # helps visualize data
getwd()             # displays your working directory
setwd("/Users/Oluchukwu Anene/Documents/Cyclistic bike share")





# COLLECT DATA
#====================================================
# Upload Divvy datasets (csv files) here

Jan_2022 <- read_csv("202201-divvy-tripdata.csv")
Feb_2022 <- read_csv("202202-divvy-tripdata.csv")
Mar_2022 <- read_csv("202203-divvy-tripdata.csv")
Apr_2022 <- read_csv("202204-divvy-tripdata.csv")
May_2022 <- read_csv("202205-divvy-tripdata.csv")
Jun_2022 <- read_csv("202206-divvy-tripdata.csv")
Jul_2022 <- read_csv("202207-divvy-tripdata.csv")
Aug_2022 <- read_csv("202208-divvy-tripdata.csv")
Sep_2022 <- read_csv("202209-divvy-publictripdata.csv")
Oct_2022 <- read_csv("202210-divvy-tripdata.csv")
Nov_2022 <- read_csv("202211-divvy-tripdata.csv")
Dec_2022 <- read_csv("202212-divvy-tripdata.csv")





# WRANGLE DATA AND COMBINE INTO A SINGLE FILE
#====================================================

# Compare column names in each of the files

colnames(Jan_2022)
colnames(Feb_2022)
colnames(Mar_2022)
colnames(Apr_2022)
colnames(May_2022)
colnames(Jun_2022)
colnames(Jul_2022)
colnames(Aug_2022)
colnames(Sep_2022)
colnames(Oct_2022)
colnames(Nov_2022)
colnames(Dec_2022)



# Inspect the data frames and look for incongruencies
str(Jan_2022)
str(Feb_2022)
str(Mar_2022)
str(Apr_2022)
str(May_2022)
str(Jun_2022)
str(Jul_2022)
str(Aug_2022)
str(Sep_2022)
str(Oct_2022)
str(Nov_2022)
str(Dec_2022)


# Stack individual month's data frames into one big data frame
Bike_Share <- bind_rows(Jan_2022, Feb_2022, Mar_2022, Apr_2022, May_2022, Jun_2022, Jul_2022, Aug_2022, Sep_2022, Oct_2022, Nov_2022, Dec_2022)



# Remove the start and end station name and id columns: they hold inconsistent values and nulls, so they are not usable
Bike_Share <- Bike_Share %>%
  select(-c(start_station_name, start_station_id, end_station_name, end_station_id))




# Rename columns to give them more relatable names

(Bike_Share <- rename(Bike_Share
                      ,Ride_id = ride_id
                      ,Ride_types = rideable_type
                      ,Start_time = started_at
                      ,End_time = ended_at
                      ,Start_lat = start_lat
                      ,Start_lng = start_lng
                      ,End_lat = end_lat
                      ,End_lng = end_lng
                      ,User_types = member_casual))





# CLEAN UP AND ADD DATA TO PREPARE FOR ANALYSIS
#======================================================
# Inspect the new table that has been created
colnames(Bike_Share)  # List of column names
nrow(Bike_Share)      # How many rows are in the data frame?
dim(Bike_Share)       # Dimensions of the data frame
head(Bike_Share)      # See the first 6 rows of the data frame
tail(Bike_Share)      # See the last 6 rows of the data frame
str(Bike_Share)       # See list of columns and data types (numeric, character, etc)
summary(Bike_Share)   # Statistical summary of data. Mainly for numerics




# Add columns that list the date, month, day, and year of each ride
Bike_Share$Date <- as.Date(Bike_Share$Start_time) # The default format is yyyy-mm-dd
Bike_Share$Month <- format(as.Date(Bike_Share$Date), "%m")
Bike_Share$Day <- format(as.Date(Bike_Share$Date), "%d")
Bike_Share$Year <- format(as.Date(Bike_Share$Date), "%Y")
Bike_Share$Day_of_week <- format(as.Date(Bike_Share$Date), "%A")





# Add a "Ride_length" calculation to Bike_Share (in seconds)
# units = "secs" is set explicitly because difftime() otherwise picks the unit automatically
Bike_Share$Ride_length <- difftime(Bike_Share$End_time, Bike_Share$Start_time, units = "secs")

# Inspect the structure of the columns
str(Bike_Share)


# Convert "Ride_length" from difftime to numeric so we can run calculations on the data
is.factor(Bike_Share$Ride_length)
Bike_Share$Ride_length <- as.numeric(Bike_Share$Ride_length)
is.numeric(Bike_Share$Ride_length)

# Removing "bad" data
# The data frame includes a few hundred entries where Ride_length was negative
# Creating a new data frame (Bike_Share_p2) to store the cleaned data set
Bike_Share_p2 <- Bike_Share[!(Bike_Share$Ride_length < 0), ]






# DESCRIPTIVE ANALYSIS
#=====================================

# Descriptive analysis on Ride_length (all figures in seconds)

summary(Bike_Share_p2$Ride_length)

# Compare members and casual users
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types, FUN = mean)
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types, FUN = median)
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types, FUN = max)
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types, FUN = min)

# Average ride time by each day for members vs casual users
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types + Bike_Share_p2$Day_of_week, FUN = mean)

# Put the days of the week in order
Bike_Share_p2$Day_of_week <- ordered(Bike_Share_p2$Day_of_week, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

# Run the average ride time by each day for members vs casual users again, now in weekday order
aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types + Bike_Share_p2$Day_of_week, FUN = mean)

# Analyze ridership data by type and weekday
Bike_Share_p2 %>%
  mutate(weekday = wday(Start_time, label = TRUE)) %>%  # creates weekday field using wday()
  group_by(User_types, weekday) %>%                     # groups by User_types and weekday
  summarise(number_of_rides = n()                       # calculates the number of rides
            ,average_duration = mean(Ride_length)) %>%  # calculates the average duration
  arrange(User_types, weekday)                          # sorts

# Plot the number of rides by rider type
Bike_Share_p2 %>%
  mutate(weekday = wday(Start_time, label = TRUE)) %>%
  group_by(User_types, weekday) %>%
  summarise(Number_of_rides = n()
            ,average_duration = mean(Ride_length)) %>%
  arrange(User_types, weekday) %>%
  ggplot(aes(x = weekday, y = Number_of_rides, fill = User_types)) +
  geom_col(position = "dodge")

# Plot for average duration
Bike_Share_p2 %>%
  mutate(weekday = wday(Start_time, label = TRUE)) %>%
  group_by(User_types, weekday) %>%
  summarise(Number_of_rides = n()
            ,average_duration = mean(Ride_length)) %>%
  arrange(User_types, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration, fill = User_types)) +
  geom_col(position = "dodge")



# EXPORT SUMMARY FILE FOR FURTHER ANALYSIS
#=================================================

# Create csv files to export

# Despite its name, counts holds the mean ride length per user type and weekday
counts <- aggregate(Bike_Share_p2$Ride_length ~ Bike_Share_p2$User_types + Bike_Share_p2$Day_of_week, FUN = mean)
write.csv(counts, file = '~/Cyclistic bike share/avg_ridecyc_length.csv')
write.csv(Bike_Share_p2, file = '~/Cyclistic bike share/Cyclistics_sharing.csv')
--------------------------------------------------------------------------------
/R/Quantium Chips.R:
--------------------------------------------------------------------------------
## Load required libraries
library(data.table) # enhanced version of data frames
library(tidyverse)  # helps wrangle data
library(lubridate)  # helps wrangle date attributes
library(ggplot2)    # helps visualize data
library(ggmosaic)   # mosaic plots for ggplot2
library(stringr)    # functions for working with strings
library(readr)      # reads delimited files such as CSV
library(dplyr)      # helps manipulate data

#=======================================================================
# Set the working directory that holds the data sets
#======================================================================

getwd() # displays your working directory
setwd("/Users/Oluchukwu Anene/Documents/Quantium Chips data")


#========================================================
# Load CSV files
#=======================================================
transactionData <- read_csv("QVI_transaction_data.csv")
customerData <- read_csv("QVI_purchase_behaviour.csv")

#========================================================
#Exploratory data analysis for transactionData
#========================================================
# View columns
colnames(transactionData)


#### Examine transaction data
str(transactionData)


#### Examine PROD_NAME
table(transactionData["PROD_NAME"]) # List the unique product names and their number of occurrences

# Alternatively, you can use

transactionData %>%
  count(PROD_NAME)

# From the output, PROD_NAME is populated mostly by chips, so we will be working with various chip products.

'There are salsa products in the dataset but we are only interested in the chips category, so let’s remove
these.'
#--------------------------------------------------------------------------------------------------------

#### Remove salsa products
# Create a new column indicating whether each row contains the word "salsa" in the PROD_NAME column
transactionData <- transactionData %>%
  mutate(SALSA = grepl("salsa", tolower(PROD_NAME)))

# Remove rows with the word "salsa" in the PROD_NAME column
transactionData <- transactionData %>%
  filter(SALSA == FALSE) %>%
  select(-SALSA)
#----------------------------------------------------------------------------------------------------------

#### Summarise the data to check for nulls and possible outliers
summary(transactionData)

#======================================================================================================
# PROD_QTY has an outlier of 200
# Filter the dataset to find the outlier
#=======================================================================================================

# Select rows with PROD_QTY equal to 200
transactionData[transactionData$PROD_QTY == 200, ]

# Let's see if the customer has had other transactions
transactionData[transactionData$LYLTY_CARD_NBR == 226000, ]

' It looks like this
customer has only had the two transactions over the year and is not an ordinary retail
customer. The customer might be buying chips for commercial purposes instead. We’ll remove this loyalty
card number from further analysis.'
#-----------------------------------------------------------------------------------------------------------

# Use base R subsetting to keep only rows where LYLTY_CARD_NBR does not equal 226000
transactionData <- transactionData[transactionData$LYLTY_CARD_NBR != 226000, ]

# Re-examine transaction data
summary(transactionData)

# That’s better. Now, let’s look at the number of transaction lines over time to see if there are any obvious data issues such as missing data.
#------------------------------------------------------------------------------------------------------------------------------------------------
## Count the number of transactions by date
transactionData %>%
  group_by(DATE) %>%
  summarize(n = n())

# Show the first 5 rows in date order (the earliest transactions)
head(arrange(transactionData, DATE),5)

# Show the last 5 rows in date order (the latest transactions)
tail(arrange(transactionData, DATE),5)

'There are only 364 rows in the count by date, meaning only 364 dates, which indicates a missing date. Let’s create a sequence of
dates from 1 Jul 2018 to 30 Jun 2019 and use this to create a chart of number of transactions over time to
find the missing date.'
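# A minimal sketch, not part of the original analysis: instead of reading the gap off a chart,
# the missing day can also be listed directly by comparing the observed dates with the full
# calendar. find_missing_dates() is a hypothetical helper introduced here for illustration,
# and it assumes the dates passed in are already of class Date.
find_missing_dates <- function(dates, from, to) {
  # Every calendar day in the period of interest
  expected <- seq(as.Date(from), as.Date(to), by = "day")
  # Keep only the days that never appear in the data
  expected[!expected %in% unique(dates)]
}

# Toy example. On the real data the call would presumably be
# find_missing_dates(transactionData$DATE, "2018-07-01", "2019-06-30")
toy_dates <- as.Date(c("2018-12-23", "2018-12-24", "2018-12-26"))
find_missing_dates(toy_dates, "2018-12-23", "2018-12-26")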
102 | #---------------------------------------------------------------------------------------------------------------------- 103 | # Create a sequence of dates from 2018-07-01 to 2019-06-30 104 | allDates <- data.frame(DATE = seq(as.Date("2018-07-01"), as.Date("2019-06-30"), by = "day")) 105 | 106 | # Count the number of transactions by date and join it with the allDates dataframe 107 | transactions_by_day <- left_join(allDates, 108 | transactionData %>% group_by(DATE) %>% summarize(transactions = n()), 109 | by = c("DATE")) 110 | 111 | # Plot transactions over time 112 | ggplot(transactions_by_day, aes(x = DATE, y = transactions)) + 113 | geom_line() + 114 | scale_x_date(date_breaks = "1 month", date_labels = "%b-%Y") + 115 | labs(title = "Transactions over time", x = "Date", y = "Number of transactions") + 116 | theme_bw() + 117 | theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) 118 | #This code creates a sequence of dates between 2018-07-01 and 2019-06-30, and a data frame allDates which contains all the dates and labels them as "DATE". Then it uses group_by(), summarize() and left_join() functions to count the number of transactions by date, and join this dataframe with allDates dataframe. 119 | 120 | #The scale_x_date() function allows to format the x-axis with breaks set to 1 month and labels set to month-year format and other cosmetics are set using ggplot2's theme options. 121 | 122 | 123 | 'We can see that there is an increase in purchases in December and a break in late December. Let’s zoom in 124 | on this.' 
125 | #-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 126 | #Filter to December and look at individual days 127 | ggplot(transactions_by_day[month(transactions_by_day$DATE) == 12, ], aes(x = DATE, y = transactions)) + 128 | geom_line() + 129 | scale_x_date(date_breaks = "1 day", date_labels = "%d-%b") + 130 | labs(title = "Transactions over time in December", x = "Date", y = "Number of transactions") + 131 | theme_bw() + 132 | theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) 133 | #This code uses the month() function to filter out only December's dates, and then uses ggplot() function to create a line plot. The scale_x_date() function allows to format the x-axis with breaks set to 1 day and labels set to day-month format, cosmetics are set with ggplot2's theme options. 134 | 135 | 136 | ' We can see that the increase in sales occurs in the lead-up to Christmas and that there are zero sales on 137 | Christmas day itself. This is due to shops being closed on Christmas day. 138 | Now that we are satisfied that the data no longer has outliers, we can move on to creating other features 139 | such as brand of chips or pack size from PROD_NAME. We will start with pack size.' 
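# A hedged sketch, not the author's method: the extraction that follows takes the first run of
# digits in PROD_NAME and later drops the last 4 characters of the name. If a pack size ever sits
# in the middle of a name, a pattern anchored on the trailing "g" may be safer.
# split_pack_size() and its column names are hypothetical, introduced only for illustration.
library(stringr)

split_pack_size <- function(prod_name) {
  data.frame(
    # Remove the "<digits>g" token wherever it occurs, then tidy up the spacing
    NAME = str_squish(str_remove(prod_name, "\\d+\\s*[gG]")),
    # Keep only the digits that are directly followed by "g" or "G"
    PACK_SIZE = as.numeric(str_extract(prod_name, "\\d+(?=\\s*[gG])")),
    stringsAsFactors = FALSE
  )
}

# Toy product names in the style of the dataset (assumed examples)
split_pack_size(c("Kettle 135g Swt Pot Sea Salt", "Smiths Crinkle Cut Chips 175G"))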

#========================================================================================================================
# Extract the pack size from PROD_NAME column
transactionData$PACK_SIZE <- as.numeric(str_extract(transactionData$PROD_NAME, "\\d+"))

# Check the result
transactionData %>%
  group_by(PACK_SIZE) %>%
  summarize(count = n()) %>%
  arrange(PACK_SIZE)
#===========================================================================================================================

# View data set
transactionData
#----------------------------------------------------------------
# Since we now have a pack size column, remove the size suffix (e.g. "134g") from the end of every product name,
# because we are analyzing the types of chips, not their size.
# Note: this assumes the pack size always occupies the last 4 characters of PROD_NAME;
# a name with the size elsewhere would be truncated incorrectly.

# Remove last 4 characters from the PROD_NAME column
transactionData$PROD_NAME <- substr(transactionData$PROD_NAME, 1, nchar(transactionData$PROD_NAME) - 4)

# View data set
transactionData
#==================================================================================================================

# Plot a histogram of PACK_SIZE
ggplot(transactionData, aes(x = PACK_SIZE)) +
  geom_histogram(binwidth = 10, color = "black", fill = "white") +
  labs(title = "Histogram of Pack Size", x = "Pack Size", y = "Frequency") +
  theme_bw()

'Pack sizes created look reasonable and now to create brands, we can use the first word in PROD_NAME to
work out the brand name'

#====================================================================================================================
# Extract the brand from PROD_NAME column
transactionData$BRAND <- substr(transactionData$PROD_NAME,1,regexpr(" ",transactionData$PROD_NAME)-1)
transactionData$BRAND <- toupper(transactionData$BRAND)

# Check the result
transactionData %>%
group_by(BRAND) %>% 181 | summarize(count = n()) %>% 182 | arrange(-count) 183 | 184 | # Clean brand names 185 | transactionData$BRAND <- 186 | ifelse(transactionData$BRAND == "RED", "RRD", 187 | ifelse(transactionData$BRAND == "SNBTS", "SUNBITES", 188 | ifelse(transactionData$BRAND == "INFZNS", "INFUZIONS", 189 | ifelse(transactionData$BRAND == "WW", "WOOLWORTHS", 190 | ifelse(transactionData$BRAND == "SMITH", "SMITHS", 191 | ifelse(transactionData$BRAND == "NCC", "NATURAL", 192 | ifelse(transactionData$BRAND == "DORITO", "DORITOS", 193 | ifelse(transactionData$BRAND == "GRAIN", "GRNWVES", 194 | transactionData$BRAND)))))))) 195 | 196 | # Check the result 197 | transactionData %>% 198 | group_by(BRAND) %>% 199 | summarize(count = n()) %>% 200 | arrange(BRAND) 201 | 202 | 203 | 204 | #======================================================================================================================= 205 | #Examining customer data 206 | #Now that the transaction dataset has been wrangled and cleaned, I can look at the customer dataset. 207 | 208 | #======================================================================================================================= 209 | #### Examining customer data 210 | str(customerData) 211 | 212 | summary(customerData) 213 | #----------------------------------------------------------------------------------------------------------------------- 214 | #Checking the LIFESTAGE and PREMIUM_CUSTOMER columns. 
#Examine the values of LIFESTAGE
customerData %>%
  group_by(LIFESTAGE) %>%
  summarize(count = n()) %>%
  arrange(-count)

#--------------------------------------------------------------------------------------------------------------------------
#Examine the values of PREMIUM_CUSTOMER
customerData %>%
  group_by(PREMIUM_CUSTOMER) %>%
  summarize(count = n()) %>%
  arrange(-count)

'As there do not seem to be any issues with the customer data, we can now go ahead and join the transaction
and customer data sets together'
#============================================================================================================
#### Merge transaction data to customer data
Chipsdata <- merge(transactionData, customerData, all.x = TRUE)

#View merged dataset
Chipsdata
#=============================================================================================================
# Check whether some customers were not matched by checking for nulls.

# Count the number of rows with missing LIFESTAGE
Chipsdata %>%
  filter(is.na(LIFESTAGE)) %>%
  nrow()


# Count the number of rows with missing PREMIUM_CUSTOMER
Chipsdata %>%
  filter(is.na(PREMIUM_CUSTOMER)) %>%
  nrow()
#There are no missing values



###Data exploration is now complete!
254 | #=================================================================== 255 | 256 | 257 | #### Data analysis on customer segments 258 | #======================================================================================================= 259 | 'Now that the data is ready for analysis, we can define some metrics of interest to the client: 260 | • Who spends the most on chips (total sales), describing customers by lifestage and how premium their 261 | general purchasing behaviour is 262 | • How many customers are in each segment 263 | • How many chips are bought per customer by segment 264 | • What’s the average chip price by customer segment 265 | We could also ask our data team for more information. Examples are: 266 | • The customer’s total spend over the period and total spend for each transaction to understand what 267 | proportion of their grocery spend is on chips 268 | • Proportion of customers in each customer segment overall to compare against the mix of customers 269 | who purchase chips 270 | Let’s start with calculating total sales by LIFESTAGE and PREMIUM_CUSTOMER and plotting the split by 271 | these segments to describe which customer segment contribute most to chip sales.' 
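# A minimal sketch, not part of the original script: the four metrics listed above (total sales,
# number of customers, units per customer, average price per unit) can be computed in a single
# grouped summary. segment_metrics() and the toy data are hypothetical; the column names are the
# ones used elsewhere in this script.
library(dplyr)

segment_metrics <- function(data) {
  data %>%
    group_by(LIFESTAGE, PREMIUM_CUSTOMER) %>%
    summarise(
      SALES = sum(TOT_SALES),                                          # total sales
      CUSTOMERS = n_distinct(LYLTY_CARD_NBR),                          # customers in the segment
      UNITS_PER_CUSTOMER = sum(PROD_QTY) / n_distinct(LYLTY_CARD_NBR), # chips bought per customer
      PRICE_PER_UNIT = sum(TOT_SALES) / sum(PROD_QTY),                 # average chip price
      .groups = "drop"
    ) %>%
    arrange(desc(SALES))
}

# Toy data standing in for Chipsdata
toy_chips <- data.frame(
  LIFESTAGE = c("RETIREES", "RETIREES", "YOUNG FAMILIES"),
  PREMIUM_CUSTOMER = c("Budget", "Budget", "Premium"),
  LYLTY_CARD_NBR = c(1, 1, 2),
  PROD_QTY = c(2, 1, 2),
  TOT_SALES = c(6, 3, 8)
)
segment_metrics(toy_chips)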
#===================================================================================================================
#==================================================================================================================
# Sum total sales by LIFESTAGE and PREMIUM_CUSTOMER
sales <- Chipsdata %>%
  group_by(LIFESTAGE, PREMIUM_CUSTOMER) %>%
  summarize(SALES = sum(TOT_SALES))

#### Create plot
p <- ggplot(data = sales) +
  geom_mosaic(aes(weight = SALES, x = product(PREMIUM_CUSTOMER, LIFESTAGE),
                  fill = PREMIUM_CUSTOMER)) +
  labs(x = "Lifestage", y = "Premium customer flag", title = "Proportion of sales") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
#### Plot and label with proportion of sales
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x = (xmin + xmax)/2 , y =
              (ymin + ymax)/2, label = as.character(paste(round(.wt/sum(.wt),3)*100,
              '%'))))




'Sales are coming mainly from Budget - older families, Mainstream - young singles/couples, and Mainstream
- retirees'
#==========================================================================================================
#Let’s see if the higher sales are due to there being more customers who buy chips.
#==========================================================================================================
# Number of customers by LIFESTAGE and PREMIUM_CUSTOMER
customers <- Chipsdata %>%
  group_by(LIFESTAGE, PREMIUM_CUSTOMER) %>%
  summarize(CUSTOMERS = n_distinct(LYLTY_CARD_NBR)) %>%
  arrange(-CUSTOMERS)


#### Create Plot
p <- ggplot(data = customers) +
  geom_mosaic(aes(weight = CUSTOMERS, x = product(PREMIUM_CUSTOMER,
                  LIFESTAGE), fill = PREMIUM_CUSTOMER)) +
  labs(x = "Lifestage", y = "Premium customer flag", title = "Proportion of customers") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
#### Plot and label with Proportion of customers
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x = (xmin + xmax)/2 , y =
              (ymin + ymax)/2, label = as.character(paste(round(.wt/sum(.wt),3)*100,
              '%'))))

'There are more Mainstream - young singles/couples and Mainstream - retirees who buy chips. This
contributes to there being more sales to these customer segments but this is not a major driver for the Budget
- Older families segment.'

#======================================================================================================================================
#Higher sales may also be driven by more units of chips being bought per customer.
#======================================================================================================================================

# Average number of units per customer by LIFESTAGE and PREMIUM_CUSTOMER
avg_units <- Chipsdata %>%
  group_by(LIFESTAGE, PREMIUM_CUSTOMER) %>%
  summarize(AVG = sum(PROD_QTY) / n_distinct(LYLTY_CARD_NBR)) %>%
  arrange(-AVG)

## Create Plot
ggplot(data = avg_units, aes(weight = AVG, x = LIFESTAGE, fill =
                             PREMIUM_CUSTOMER)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Lifestage", y = "Avg units per customer", title = "Units per customer") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

'Older families and young families in general buy more chips per customer'

#=========================================================================================================
# Now for the average price per unit of chips bought by each customer segment, as this is also a driver of total sales.
#========================================================================================================
# Average price per unit by LIFESTAGE and PREMIUM_CUSTOMER
avg_price <- Chipsdata %>%
  group_by(LIFESTAGE, PREMIUM_CUSTOMER) %>%
  summarize(AVG = sum(TOT_SALES) / sum(PROD_QTY)) %>%
  arrange(-AVG)

#### Create Plot
ggplot(data = avg_price, aes(weight = AVG, x = LIFESTAGE, fill =
                             PREMIUM_CUSTOMER)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Lifestage", y = "Avg price per unit", title = "Price per unit") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

'Mainstream midage and young singles and couples are more willing to pay more per packet of chips
compared to their budget and premium counterparts.
This may be due to premium shoppers being more likely to
buy healthy snacks and when they buy chips, this is mainly for entertainment purposes rather than their own
consumption. This is also supported by there being fewer premium midage and young singles and couples
buying chips compared to their mainstream counterparts.'

#===================================================================================================================
#As the difference in average price per unit isn’t large, we can check if this difference is statistically significant.
#===================================================================================================================

# Perform an independent t-test between Mainstream and the other (Budget and Premium) customers
pricePerUnit <- Chipsdata %>%
  mutate(price = TOT_SALES/PROD_QTY)

# Mean unit price by PREMIUM_CUSTOMER for the two lifestages being compared
result <- pricePerUnit %>%
  filter(LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES")) %>%
  group_by(PREMIUM_CUSTOMER) %>%
  summarize(mean = mean(price))

t.test(pricePerUnit[pricePerUnit$PREMIUM_CUSTOMER == "Mainstream" & pricePerUnit$LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES"),'price'],
       pricePerUnit[pricePerUnit$PREMIUM_CUSTOMER != "Mainstream" & pricePerUnit$LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES"),'price'],
       alternative = "greater")


'The t-test results in a p-value < 2.2e-16, i.e. the unit price for mainstream, young and mid-age singles and
couples is significantly higher than that of budget or premium, young and midage singles and couples.'


#===============================================================================================================
'Checking customer segments that contribute the most to sales to retain them or further
increase sales. Let’s look at Mainstream - young singles/couples.
For instance, let’s find out if they tend to 389 | buy a particular brand of chips.' 390 | #================================================================================================================= 391 | 392 | # Deep dive into mainstream, young singles/couples 393 | segment1 <- Chipsdata %>% 394 | filter(LIFESTAGE == "YOUNG SINGLES/COUPLES", PREMIUM_CUSTOMER == "Mainstream") 395 | other <- Chipsdata %>% 396 | filter(!(LIFESTAGE == "YOUNG SINGLES/COUPLES" & PREMIUM_CUSTOMER == "Mainstream")) 397 | 398 | # Brand affinity compared to the rest of the population 399 | quantity_segment1 <- segment1 %>% 400 | summarize(total_qty = sum(PROD_QTY)) %>% 401 | pull(total_qty) 402 | quantity_other <- other %>% 403 | summarize(total_qty = sum(PROD_QTY)) %>% 404 | pull(total_qty) 405 | 406 | quantity_segment1_by_brand <- segment1 %>% 407 | group_by(BRAND) %>% 408 | summarize(targetSegment = sum(PROD_QTY)/quantity_segment1) 409 | quantity_other_by_brand <- other %>% 410 | group_by(BRAND) %>% 411 | summarize(other = sum(PROD_QTY)/quantity_other) 412 | 413 | brand_proportions <- quantity_segment1_by_brand %>% 414 | left_join(quantity_other_by_brand) %>% 415 | mutate(affinityToBrand = targetSegment/other) %>% 416 | arrange(affinityToBrand) 417 | 418 | 419 | #Create plot 420 | ggplot(brand_proportions, aes(x = BRAND, y = affinityToBrand)) + 421 | geom_col() + 422 | labs(x = "Brand", y = "Affinity to Brand", title = "Brand Affinity of Young Singles/Couples in Mainstream Segment") 423 | 424 | 'We can see that : 425 | • Mainstream young singles/couples are 23% more likely to purchase Tyrrells chips compared to the 426 | rest of the population 427 | • Mainstream young singles/couples are 56% less likely to purchase Burger Rings compared to the rest 428 | of the population' 429 | 430 | 431 | #====================================================================================================================== 432 | #Let’s also find out if our target segment tends to buy larger 
packs of chips. 433 | #====================================================================================================================== 434 | # Join the dataset by pack_size column and calculate the proportion 435 | 436 | #### Preferred pack size compared to the rest of the population 437 | # Pack_size affinity compared to the rest of the population 438 | # convert dataframes to data.tables 439 | segment1 <- as.data.table(segment1) 440 | other <- as.data.table(other) 441 | 442 | #convert PACK_SIZE to numeric 443 | segment1[, PACK_SIZE := as.numeric(PACK_SIZE)] 444 | other[, PACK_SIZE := as.numeric(PACK_SIZE)] 445 | 446 | quantity_segment1 <- sum(segment1[, PROD_QTY]) 447 | quantity_other <- sum(other[, PROD_QTY]) 448 | 449 | quantity_segment1_by_pack <- segment1[, .(targetSegment = sum(PROD_QTY) / quantity_segment1), by = PACK_SIZE] 450 | 451 | quantity_other_by_pack <- other[, .(other = sum(PROD_QTY) / quantity_other), by = PACK_SIZE] 452 | 453 | quantity_segment1_by_pack 454 | 455 | 456 | 'It looks like Mainstream young singles/couples are more likely to purchase a 270g pack of chips compared to the rest of the population' 457 | 458 | #========================================================================================================================== 459 | #let’s dive into what brands sell this pack size. 460 | #========================================================================================================================== 461 | 462 | # convert dataframe to data.table 463 | Chipsdata <- as.data.table(Chipsdata) 464 | 465 | # filter for rows where PACK_SIZE is equal to 270 466 | Chipsdata[Chipsdata$PACK_SIZE == 270, PROD_NAME] 467 | 468 | 469 | # filter for rows where PACK_SIZE is equal to 270 and select BRAND column 470 | Chipsdata[Chipsdata$PACK_SIZE == 270, unique(BRAND)] 471 | 472 | # 473 | 'Twisties are the only brand offering 270g packs and so this may instead be reflecting a higher likelihood of 474 | purchasing Twisties.' 
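# A hedged sketch, not the author's code: the pack-size step above builds the two share tables
# but does not show them being joined. One way to finish the comparison, mirroring the brand
# affinity calculation, is to merge on PACK_SIZE and divide the shares. pack_affinity() and the
# toy tables are hypothetical, introduced only for illustration.
library(data.table)

pack_affinity <- function(target, rest) {
  target <- as.data.table(target)
  rest <- as.data.table(rest)
  total_target <- sum(target$PROD_QTY)
  total_rest <- sum(rest$PROD_QTY)
  # Share of units sold per pack size within each group
  target_share <- target[, .(targetSegment = sum(PROD_QTY) / total_target), by = PACK_SIZE]
  rest_share <- rest[, .(other = sum(PROD_QTY) / total_rest), by = PACK_SIZE]
  # Values above 1 mean the target segment over-indexes on that pack size
  out <- merge(target_share, rest_share, by = "PACK_SIZE")
  out[, affinityToPack := targetSegment / other]
  out[order(-affinityToPack)]
}

# Toy tables standing in for segment1 and other
toy_segment <- data.frame(PACK_SIZE = c(270, 175), PROD_QTY = c(3, 1))
toy_rest <- data.frame(PACK_SIZE = c(270, 175), PROD_QTY = c(1, 3))
pack_affinity(toy_segment, toy_rest)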
#===============================================

# Save the new dataset for Task 2
write.csv(Chipsdata, file = '~/Quantium Chips data/Chipsdata.csv')

#================================================================================================
#Conclusion
#================================================================================================

'* Sales have mainly been due to Budget - older families, Mainstream - young singles/couples, and Mainstream - retirees shoppers.

* We found that the high spend on chips for Mainstream young singles/couples and retirees is due to there being more of them than other buyers. Mainstream midage and young singles and
couples are also more likely to pay more per packet of chips. This is indicative of impulse buying behaviour.

* We’ve also found that Mainstream young singles and couples are 23% more likely to purchase Tyrrells chips
compared to the rest of the population.

* The Category Manager may want to increase the category’s performance by off-locating some Tyrrells and smaller packs of chips in discretionary space near segments
where young singles and couples frequent more often to increase visibility and impulse behaviour.

* Quantium data analysts can help the Category Manager with recommendations of where these segments are and further
help them with measuring the impact of the changed placement.'
-------------------------------------------------------------------------------- /Python Projects/WEB SCRAPPING E- COMMERCE SITE (EBAY).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "3a887880", 6 | "metadata": {}, 7 | "source": [ 8 | "\n", 9 | "# " 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "id": "e79c20d7", 15 | "metadata": {}, 16 | "source": [ 17 | "# WEBSCRAPE EBAY PRODUCT DATA \n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "id": "0517bf9e", 23 | "metadata": {}, 24 | "source": [ 25 | "#### This data below show a filter category of books that contain Business Intelligence in its title " 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "id": "5a84cec0", 31 | "metadata": {}, 32 | "source": [ 33 | "# " 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "id": "ddfd62f2", 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "# Import all libraries required\n", 44 | "\n", 45 | "import requests\n", 46 | "import pandas as pd\n", 47 | "from bs4 import BeautifulSoup\n", 48 | "import matplotlib.pyplot as plt\n", 49 | "import seaborn as sns\n", 50 | "from prettytable import PrettyTable" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 26, 56 | "id": "05c1dc68", 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "# Specify the URL of the Ebay item\n", 61 | "\n", 62 | "url_pattern = 'https://www.ebay.com/sch/267/i.html?_from=R40&_nkw=Business+Intelligence+&rt=nc'" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 27, 68 | "id": "36d1f2c1", 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "# create an empty list to store the scraped data\n", 73 | "product_data = []\n", 74 | "\n", 75 | "# iterate over the page numbers\n", 76 | "for page_num in range(1, 11): # scrape data from page 1 to 10\n", 77 | " # create the url for the 
current page\n", 78 | " url = url_pattern.format(page_num=page_num)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 28, 84 | "id": "5bc231a9", 85 | "metadata": {}, 86 | "outputs": [], 87 | "source": [ 88 | "#send a GET request to the URL and extract the HTML content\n", 89 | "\n", 90 | "response = requests.get(url)\n", 91 | "content = response.content" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 29, 97 | "id": "d890fbe9", 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "#Use Beautiful Soup to parse the HTML content\n", 102 | "\n", 103 | "soup = BeautifulSoup(content, 'html.parser')" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 30, 109 | "id": "540bc991", 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | " Title Price_sold \\\n", 117 | "0 Shop on eBay 20.00 \n", 118 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 119 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 120 | "3 Business Intelligence With Cold Fusion By John... 13.78 \n", 121 | "4 Better Business Intelligence By Robert Collins 14.63 \n", 122 | "\n", 123 | " Shipping_cost Item_location Item_seller \\\n", 124 | "0 0.0 \n", 125 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 126 | "2 17.48 shipping United Kingdom webuybooks (1,740,389) 99.7% \n", 127 | "3 6.99 shipping United States awesomebooksusa (387,183) 98.2% \n", 128 | "4 5.35 shipping United Kingdom book_fountain (166,979) 99.2% \n", 129 | "\n", 130 | " Link \n", 131 | "0 https://ebay.com/itm/123456?hash=item28caef0a3... \n", 132 | "1 https://www.ebay.com/itm/134303280275?epid=738... \n", 133 | "2 https://www.ebay.com/itm/134435858075?epid=230... \n", 134 | "3 https://www.ebay.com/itm/394419887418?hash=ite... \n", 135 | "4 https://www.ebay.com/itm/175596253189?hash=ite... 
\n" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "#Extract the product information needed\n", 141 | "items = soup.find_all('div', {'class': 's-item__wrapper clearfix'})\n", 142 | " \n", 143 | "\n", 144 | "for item in items:\n", 145 | " title = item.find('div', {'class': 's-item__title'}).text.strip()\n", 146 | " \n", 147 | " price_sold = float(item.find('span', {'class': 's-item__price'}).text.replace('$','').replace(',','').strip())\n", 148 | " shipping_cost = item.find('span', {'class': 's-item__shipping s-item__logisticsCost'})\n", 149 | " if shipping_cost:\n", 150 | " shipping_cost = shipping_cost.text.replace('+','').replace('$','').replace(',','').strip()\n", 151 | " else:\n", 152 | " shipping_cost = 0.0\n", 153 | " item_location = item.find('span', {'class': 's-item__location s-item__itemLocation'})\n", 154 | " if item_location:\n", 155 | " item_location = item_location.text.replace('from','').strip()\n", 156 | " else:\n", 157 | " item_location = ''\n", 158 | " item_seller = item.find('span', {'class':'s-item__seller-info'})\n", 159 | " if item_seller:\n", 160 | " item_seller = item_seller.text.strip()\n", 161 | " else:\n", 162 | " item_seller = ''\n", 163 | " link = item.find('a', {'class': 's-item__link'})['href']\n", 164 | " product_data.append([title, price_sold, shipping_cost, item_location, item_seller, link])\n", 165 | " \n", 166 | "BIbooks = pd.DataFrame(product_data, columns=['Title', 'Price_sold', 'Shipping_cost', 'Item_location','Item_seller', 'Link'])\n", 167 | "print(BIbooks.head())\n" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 31, 173 | "id": "d243d1c3", 174 | "metadata": {}, 175 | "outputs": [ 176 | { 177 | "data": { 178 | "text/html": [ 179 | "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Title</th><th>Price_sold</th><th>Shipping_cost</th><th>Item_location</th><th>Item_seller</th><th>Link</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Shop on eBay</td><td>20.00</td><td>0.0</td><td></td><td></td><td>https://ebay.com/itm/123456?hash=item28caef0a3...</td></tr>\n",
"<tr><th>1</th><td>Business Intelligence : Practices, Technologie...</td><td>54.99</td><td>57.45 shipping</td><td>United States</td><td>vbbc2015 (11,848) 100%</td><td>https://www.ebay.com/itm/134303280275?epid=738...</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" 227 | ], 228 | "text/plain": [ 229 | " Title Price_sold \\\n", 230 | "0 Shop on eBay 20.00 \n", 231 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 232 | "\n", 233 | " Shipping_cost Item_location Item_seller \\\n", 234 | "0 0.0 \n", 235 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 236 | "\n", 237 | " Link \n", 238 | "0 https://ebay.com/itm/123456?hash=item28caef0a3... \n", 239 | "1 https://www.ebay.com/itm/134303280275?epid=738... " 240 | ] 241 | }, 242 | "execution_count": 31, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "#Display the first 2 rows of the Business Intelligence (BI) books\n", 249 | "\n", 250 | "BIbooks.head(2)" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 32, 256 | "id": "056b98c5", 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "data": { 261 | "text/html": [ 262 | "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Title</th><th>Price_sold</th><th>Shipping_cost</th><th>Item_location</th><th>Item_seller</th><th>Link</th><th>Seller_name</th><th>Seller_feedback</th><th>Seller_Rating%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Shop on eBay</td><td>20.00</td><td>0.0</td><td></td><td></td><td>https://ebay.com/itm/123456?hash=item28caef0a3...</td><td></td><td>None</td><td>None</td></tr>\n",
"<tr><th>1</th><td>Business Intelligence : Practices, Technologie...</td><td>54.99</td><td>57.45 shipping</td><td>United States</td><td>vbbc2015 (11,848) 100%</td><td>https://www.ebay.com/itm/134303280275?epid=738...</td><td>vbbc2015</td><td>(11,848)</td><td>100%</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" 319 | ], 320 | "text/plain": [ 321 | " Title Price_sold \\\n", 322 | "0 Shop on eBay 20.00 \n", 323 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 324 | "\n", 325 | " Shipping_cost Item_location Item_seller \\\n", 326 | "0 0.0 \n", 327 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 328 | "\n", 329 | " Link Seller_name \\\n", 330 | "0 https://ebay.com/itm/123456?hash=item28caef0a3... \n", 331 | "1 https://www.ebay.com/itm/134303280275?epid=738... vbbc2015 \n", 332 | "\n", 333 | " Seller_feedback Seller_Rating% \n", 334 | "0 None None \n", 335 | "1 (11,848) 100% " 336 | ] 337 | }, 338 | "execution_count": 32, 339 | "metadata": {}, 340 | "output_type": "execute_result" 341 | } 342 | ], 343 | "source": [ 344 | "#Split the Item_seller column into three columns: seller name, feedback count, and rating percentage\n", 345 | "\n", 346 | "BIbooks[['Seller_name','Seller_feedback', 'Seller_Rating%']] = BIbooks['Item_seller'].str.split(' ', expand=True)\n", 347 | "BIbooks.head(2)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": 33, 353 | "id": "376eb65f", 354 | "metadata": {}, 355 | "outputs": [ 356 | { 357 | "data": { 358 | "text/plain": [ 359 | "Title 0\n", 360 | "Price_sold 0\n", 361 | "Shipping_cost 0\n", 362 | "Item_location 0\n", 363 | "Item_seller 0\n", 364 | "Link 0\n", 365 | "Seller_name 0\n", 366 | "Seller_feedback 1\n", 367 | "Seller_Rating% 1\n", 368 | "dtype: int64" 369 | ] 370 | }, 371 | "execution_count": 33, 372 | "metadata": {}, 373 | "output_type": "execute_result" 374 | } 375 | ], 376 | "source": [ 377 | "#Check for null values\n", 378 | "\n", 379 | "BIbooks.isnull().sum()" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 34, 385 | "id": "aa4156ee", 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/html": [ 391 | "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Title</th><th>Price_sold</th><th>Shipping_cost</th><th>Item_location</th><th>Item_seller</th><th>Link</th><th>Seller_name</th><th>Seller_feedback</th><th>Seller_Rating%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>Business Intelligence : Practices, Technologie...</td><td>54.99</td><td>57.45 shipping</td><td>United States</td><td>vbbc2015 (11,848) 100%</td><td>https://www.ebay.com/itm/134303280275?epid=738...</td><td>vbbc2015</td><td>(11,848)</td><td>100%</td></tr>\n",
"<tr><th>2</th><td>The Artificial Intelligence Imperative: A Prac...</td><td>30.17</td><td>17.48 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134435858075?epid=230...</td><td>webuybooks</td><td>(1,740,389)</td><td>99.7%</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" 448 | ], 449 | "text/plain": [ 450 | " Title Price_sold \\\n", 451 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 452 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 453 | "\n", 454 | " Shipping_cost Item_location Item_seller \\\n", 455 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 456 | "2 17.48 shipping United Kingdom webuybooks (1,740,389) 99.7% \n", 457 | "\n", 458 | " Link Seller_name \\\n", 459 | "1 https://www.ebay.com/itm/134303280275?epid=738... vbbc2015 \n", 460 | "2 https://www.ebay.com/itm/134435858075?epid=230... webuybooks \n", 461 | "\n", 462 | " Seller_feedback Seller_Rating% \n", 463 | "1 (11,848) 100% \n", 464 | "2 (1,740,389) 99.7% " 465 | ] 466 | }, 467 | "execution_count": 34, 468 | "metadata": {}, 469 | "output_type": "execute_result" 470 | } 471 | ], 472 | "source": [ 473 | "#Drop the placeholder 'Shop on eBay' row (index 0), which has no seller data\n", 474 | "\n", 475 | "BIbooks = BIbooks.drop([0], axis=0) #drop the first row\n", 476 | "\n", 477 | "BIbooks.head(2)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 35, 483 | "id": "00ed5c09", 484 | "metadata": {}, 485 | "outputs": [ 486 | { 487 | "data": { 488 | "text/html": [ 489 | "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Title</th><th>Price_sold</th><th>Shipping_cost</th><th>Item_location</th><th>Item_seller</th><th>Link</th><th>Seller_name</th><th>Seller_feedback</th><th>Seller_Rating%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>Business Intelligence : Practices, Technologie...</td><td>54.99</td><td>57.45 shipping</td><td>United States</td><td>vbbc2015 (11,848) 100%</td><td>https://www.ebay.com/itm/134303280275?epid=738...</td><td>vbbc2015</td><td>11848</td><td>100.0</td></tr>\n",
"<tr><th>2</th><td>The Artificial Intelligence Imperative: A Prac...</td><td>30.17</td><td>17.48 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134435858075?epid=230...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" 546 | ], 547 | "text/plain": [ 548 | " Title Price_sold \\\n", 549 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 550 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 551 | "\n", 552 | " Shipping_cost Item_location Item_seller \\\n", 553 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 554 | "2 17.48 shipping United Kingdom webuybooks (1,740,389) 99.7% \n", 555 | "\n", 556 | " Link Seller_name \\\n", 557 | "1 https://www.ebay.com/itm/134303280275?epid=738... vbbc2015 \n", 558 | "2 https://www.ebay.com/itm/134435858075?epid=230... webuybooks \n", 559 | "\n", 560 | " Seller_feedback Seller_Rating% \n", 561 | "1 11848 100.0 \n", 562 | "2 1740389 99.7 " 563 | ] 564 | }, 565 | "execution_count": 35, 566 | "metadata": {}, 567 | "output_type": "execute_result" 568 | } 569 | ], 570 | "source": [ 571 | "#Remove the parentheses and thousands separators from the 'Seller_feedback' column with str.replace()\n", 572 | "BIbooks['Seller_feedback'] = BIbooks['Seller_feedback'].str.replace('[(),]', '', regex=True)\n", 573 | "\n", 574 | "\n", 575 | "#Remove the percent sign from the 'Seller_Rating%' column (a literal match, so no regex is needed)\n", 576 | "BIbooks['Seller_Rating%'] = BIbooks['Seller_Rating%'].str.replace('%', '', regex=False)\n", 577 | "\n", 578 | "\n", 579 | "#Convert the cleaned columns to numeric dtypes with astype()\n", 580 | "\n", 581 | "BIbooks['Seller_feedback'] = BIbooks['Seller_feedback'].astype(int)\n", 582 | "BIbooks['Seller_Rating%'] = BIbooks['Seller_Rating%'].astype(float)\n", 583 | "BIbooks.head(2)" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 36, 589 | "id": "408bce41", 590 | "metadata": {}, 591 | "outputs": [ 592 | { 593 | "data": { 594 | "text/plain": [ 595 | "Title 0\n", 596 | "Price_sold 0\n", 597 | "Shipping_cost 0\n", 598 | "Item_location 0\n", 599 | "Item_seller 0\n", 600 | "Link 0\n", 601 | "Seller_name 0\n", 602 | "Seller_feedback 0\n", 603 | "Seller_Rating% 0\n", 604 | "dtype: int64" 605 | ] 606 | }, 607 | "execution_count": 
36, 608 | "metadata": {}, 609 | "output_type": "execute_result" 610 | } 611 | ], 612 | "source": [ 613 | "#Check for null values\n", 614 | "\n", 615 | "BIbooks.isnull().sum()" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": 37, 621 | "id": "b482b1c9", 622 | "metadata": { 623 | "scrolled": true 624 | }, 625 | "outputs": [ 626 | { 627 | "name": "stdout", 628 | "output_type": "stream", 629 | "text": [ 630 | "<class 'pandas.core.frame.DataFrame'>\n", 631 | "RangeIndex: 75 entries, 1 to 75\n", 632 | "Data columns (total 9 columns):\n", 633 | " # Column Non-Null Count Dtype \n", 634 | "--- ------ -------------- ----- \n", 635 | " 0 Title 75 non-null object \n", 636 | " 1 Price_sold 75 non-null float64\n", 637 | " 2 Shipping_cost 75 non-null object \n", 638 | " 3 Item_location 75 non-null object \n", 639 | " 4 Item_seller 75 non-null object \n", 640 | " 5 Link 75 non-null object \n", 641 | " 6 Seller_name 75 non-null object \n", 642 | " 7 Seller_feedback 75 non-null int32 \n", 643 | " 8 Seller_Rating% 75 non-null float64\n", 644 | "dtypes: float64(2), int32(1), object(6)\n", 645 | "memory usage: 5.1+ KB\n" 646 | ] 647 | } 648 | ], 649 | "source": [ 650 | "#Check column dtypes and non-null counts\n", 651 | "\n", 652 | "BIbooks.info()" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": 38, 658 | "id": "25fa79ab", 659 | "metadata": {}, 660 | "outputs": [ 661 | { 662 | "data": { 663 | "text/html": [ 664 | "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Title</th><th>Price_sold</th><th>Shipping_cost</th><th>Item_location</th><th>Item_seller</th><th>Link</th><th>Seller_name</th><th>Seller_feedback</th><th>Seller_Rating%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>Business Intelligence : Practices, Technologie...</td><td>54.99</td><td>57.45 shipping</td><td>United States</td><td>vbbc2015 (11,848) 100%</td><td>https://www.ebay.com/itm/134303280275?epid=738...</td><td>vbbc2015</td><td>11848</td><td>100.0</td></tr>\n",
"<tr><th>2</th><td>The Artificial Intelligence Imperative: A Prac...</td><td>30.17</td><td>17.48 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134435858075?epid=230...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>3</th><td>Business Intelligence With Cold Fusion By John...</td><td>13.78</td><td>6.99 shipping</td><td>United States</td><td>awesomebooksusa (387,183) 98.2%</td><td>https://www.ebay.com/itm/394419887418?hash=ite...</td><td>awesomebooksusa</td><td>387183</td><td>98.2</td></tr>\n",
"<tr><th>4</th><td>Better Business Intelligence By Robert Collins</td><td>14.63</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/175596253189?hash=ite...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>5</th><td>Business Intelligence, Analytics Data Science ...</td><td>49.99</td><td>30.40 shipping</td><td>United States</td><td>geda-n (1,065) 100%</td><td>https://www.ebay.com/itm/334775697963?epid=234...</td><td>geda-n</td><td>1065</td><td>100.0</td></tr>\n",
"<tr><th>6</th><td>Definitive Guide to DAX, The Business intellig...</td><td>25.14</td><td>Shipping not specified</td><td>United Kingdom</td><td>miscgoodstuff (307) 100%</td><td>https://www.ebay.com/itm/394408552413?epid=200...</td><td>miscgoodstuff</td><td>307</td><td>100.0</td></tr>\n",
"<tr><th>7</th><td>Successful Business Intelligence: Unlock the V...</td><td>30.00</td><td>60.72 shipping</td><td>United States</td><td>thodgez28 (476) 99.3%</td><td>https://www.ebay.com/itm/175688111518?epid=280...</td><td>thodgez28</td><td>476</td><td>99.3</td></tr>\n",
"<tr><th>8</th><td>Beyond the Balanced Scorecard : Improving Busi...</td><td>5.99</td><td>25.39 shipping</td><td>United States</td><td>readingbooks_sendingpostcards (355) 100%</td><td>https://www.ebay.com/itm/223503389279?epid=593...</td><td>readingbooks_sendingpostcards</td><td>355</td><td>100.0</td></tr>\n",
"<tr><th>9</th><td>Business Intelligence, Analytics, and Data Sci...</td><td>22.90</td><td>84.01 shipping</td><td>United States</td><td>dunkin_bookstore (30,353) 99.2%</td><td>https://www.ebay.com/itm/233198562933?epid=234...</td><td>dunkin_bookstore</td><td>30353</td><td>99.2</td></tr>\n",
"<tr><th>10</th><td>Data Mining for Business Intelligence: Concept...</td><td>8.12</td><td>22.83 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134523252613?hash=ite...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>11</th><td>Business Intelligence Competency Centers: A Te...</td><td>6.34</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/385311444832?epid=955...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>12</th><td>Data Mining for Business Intelligence: Concept...</td><td>11.49</td><td>79.84 shipping</td><td>United States</td><td>hpb-inc (31,625) 98.9%</td><td>https://www.ebay.com/itm/334841553738?epid=844...</td><td>hpb-inc</td><td>31625</td><td>98.9</td></tr>\n",
"<tr><th>13</th><td>Definitive Guide to DAX, The: Business intelli...</td><td>29.95</td><td>24.65 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134544064239?epid=216...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>14</th><td>Project Management - an Artificial Intelligent...</td><td>23.20</td><td>11.56 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/364187110385?epid=220...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>15</th><td>Data Visualization for Oracle Business Intelli...</td><td>13.78</td><td>6.99 shipping</td><td>United States</td><td>awesomebooksusa (387,183) 98.2%</td><td>https://www.ebay.com/itm/314493314854?epid=201...</td><td>awesomebooksusa</td><td>387183</td><td>98.2</td></tr>\n",
"<tr><th>16</th><td>SAP Business Intelligence: Up-to-date for SAP ...</td><td>18.41</td><td>6.99 shipping</td><td>United States</td><td>awesomebooksusa (387,183) 98.2%</td><td>https://www.ebay.com/itm/334740223837?epid=602...</td><td>awesomebooksusa</td><td>387183</td><td>98.2</td></tr>\n",
"<tr><th>17</th><td>New ListingPOPULAR SCIENCE Technology and Comm...</td><td>19.00</td><td>102.55 shipping</td><td>United States</td><td>vanpatrick2005 (2,220) 100%</td><td>https://www.ebay.com/itm/256060365330?hash=ite...</td><td>vanpatrick2005</td><td>2220</td><td>100.0</td></tr>\n",
"<tr><th>18</th><td>Leadership Presence (HBR Emotional Intelligenc...</td><td>6.37</td><td>4.00 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/364200255647?epid=280...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>19</th><td>SAP Business Intelligence: Up-to-date for SAP ...</td><td>10.60</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/385405901406?epid=107...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>20</th><td>Delivering Business Intelligence with Microsof...</td><td>12.04</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/175678784438?epid=222...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>21</th><td>Data Mining for Business Intelligence: Concept...</td><td>8.49</td><td>30.00 shipping</td><td>United States</td><td>wonderbooks (587,952) 99.7%</td><td>https://www.ebay.com/itm/354646465558?epid=844...</td><td>wonderbooks</td><td>587952</td><td>99.7</td></tr>\n",
"<tr><th>22</th><td>Business Intelligence and Analytics: Systems f...</td><td>25.00</td><td>85.00 shipping</td><td>United States</td><td>justsafa (1,192) 98.1%</td><td>https://www.ebay.com/itm/334772967747?epid=177...</td><td>justsafa</td><td>1192</td><td>98.1</td></tr>\n",
"<tr><th>23</th><td>Building Business Intelligence Using SAS: Cont...</td><td>15.38</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/175686504625?epid=113...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>24</th><td>New ListingIntelligent Enterprise, Quinn, Jame...</td><td>14.45</td><td>2.64 shipping</td><td>United Kingdom</td><td>thecotswoldlibrary (586,859) 99.6%</td><td>https://www.ebay.com/itm/385583605619?epid=950...</td><td>thecotswoldlibrary</td><td>586859</td><td>99.6</td></tr>\n",
"<tr><th>25</th><td>Expert Systems: Artificial Intelligence in Bus...</td><td>13.86</td><td>6.27 shipping</td><td>United Kingdom</td><td>cmedia_group (763,774) 99.6%</td><td>https://www.ebay.com/itm/134407217856?epid=888...</td><td>cmedia_group</td><td>763774</td><td>99.6</td></tr>\n",
"<tr><th>26</th><td>Karen Berman Financial Intelligence, Revised E...</td><td>29.37</td><td>1.98 shipping</td><td>United Kingdom</td><td>rarewaves-outlet (909,494) 99.4%</td><td>https://www.ebay.com/itm/354309562663?epid=138...</td><td>rarewaves-outlet</td><td>909494</td><td>99.4</td></tr>\n",
"<tr><th>27</th><td>Daniel Goleman Emotionally Intelligent Leader ...</td><td>13.41</td><td>3.76 shipping</td><td>United Kingdom</td><td>rarewaves-outlet (909,494) 99.4%</td><td>https://www.ebay.com/itm/354225696018?epid=150...</td><td>rarewaves-outlet</td><td>909494</td><td>99.4</td></tr>\n",
"<tr><th>28</th><td>Intelligent Marketing for Employment Lawyers: ...</td><td>6.15</td><td>6.27 shipping</td><td>United Kingdom</td><td>cmedia_group (763,774) 99.6%</td><td>https://www.ebay.com/itm/385204743329?epid=404...</td><td>cmedia_group</td><td>763774</td><td>99.6</td></tr>\n",
"<tr><th>29</th><td>Oracle Business Intelligence 11g Developers Gu...</td><td>19.56</td><td>40.48 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/155486480064?epid=117...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>30</th><td>Decision Support, Analytics, and Business Inte...</td><td>8.30</td><td>38.17 shipping</td><td>United States</td><td>bombbooks (7,358) 98.8%</td><td>https://www.ebay.com/itm/255810593954?epid=215...</td><td>bombbooks</td><td>7358</td><td>98.8</td></tr>\n",
"<tr><th>31</th><td>The Leader's Guide to Emotional Agility (Emoti...</td><td>8.80</td><td>9.01 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134479529800?epid=221...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>32</th><td>Effective Strategy Execution: Improving Perfor...</td><td>12.82</td><td>12.09 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/364131898642?hash=ite...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>33</th><td>Learn Microsoft PowerApps: Build customized bu...</td><td>36.63</td><td>57.17 shipping</td><td>United States</td><td>goodwillrs (211,859) 99.6%</td><td>https://www.ebay.com/itm/165803498246?epid=210...</td><td>goodwillrs</td><td>211859</td><td>99.6</td></tr>\n",
"<tr><th>34</th><td>Business Intelligence for the Enterprise, Bier...</td><td>10.43</td><td>9.69 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/155494228400?hash=ite...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>35</th><td>Power and Impact (HBR Emotional Intelligence S...</td><td>8.98</td><td>2.64 shipping</td><td>United Kingdom</td><td>thecotswoldlibrary (586,859) 99.6%</td><td>https://www.ebay.com/itm/385450221653?epid=704...</td><td>thecotswoldlibrary</td><td>586859</td><td>99.6</td></tr>\n",
"<tr><th>36</th><td>Intelligent Business, Upper Intermediate Workb...</td><td>26.28</td><td>25.38 shipping</td><td>Germany</td><td>rheinberg-buch-at (85,845) 99.8%</td><td>https://www.ebay.com/itm/364238567063?hash=ite...</td><td>rheinberg-buch-at</td><td>85845</td><td>99.8</td></tr>\n",
"<tr><th>37</th><td>Financial Intelligence for HR Professionals: W...</td><td>8.26</td><td>10.27 shipping</td><td>United Kingdom</td><td>webuybooks (1,740,389) 99.7%</td><td>https://www.ebay.com/itm/134430226236?hash=ite...</td><td>webuybooks</td><td>1740389</td><td>99.7</td></tr>\n",
"<tr><th>38</th><td>Business Intelligence, Analytics, and Data Sci...</td><td>110.00</td><td>40.00 shipping</td><td>United States</td><td>valleegirls (8,391) 100%</td><td>https://www.ebay.com/itm/145030796056?epid=234...</td><td>valleegirls</td><td>8391</td><td>100.0</td></tr>\n",
"<tr><th>39</th><td>Building Business Intelligence Using SAS: Cont...</td><td>15.12</td><td>6.99 shipping</td><td>United States</td><td>awesomebooksusa (387,183) 98.2%</td><td>https://www.ebay.com/itm/334829285079?epid=113...</td><td>awesomebooksusa</td><td>387183</td><td>98.2</td></tr>\n",
"<tr><th>40</th><td>Business Intelligence, Analytics, and Data Sci...</td><td>33.04</td><td>6.60 shipping</td><td>India</td><td>princess-shopee (2,671) 86.5%</td><td>https://www.ebay.com/itm/195582064511?epid=233...</td><td>princess-shopee</td><td>2671</td><td>86.5</td></tr>\n",
"<tr><th>41</th><td>New ListingCompetitive Intelligence: From Blac...</td><td>14.78</td><td>2.64 shipping</td><td>United Kingdom</td><td>thecotswoldlibrary (586,859) 99.6%</td><td>https://www.ebay.com/itm/385584174988?epid=955...</td><td>thecotswoldlibrary</td><td>586859</td><td>99.6</td></tr>\n",
"<tr><th>42</th><td>Happiness (HBR Emotional Intelligence Series) ...</td><td>10.71</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/385546852155?epid=235...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>43</th><td>Delivering Business Intelligence with Microsof...</td><td>12.15</td><td>6.27 shipping</td><td>United Kingdom</td><td>cmedia_group (763,774) 99.6%</td><td>https://www.ebay.com/itm/385528696514?epid=222...</td><td>cmedia_group</td><td>763774</td><td>99.6</td></tr>\n",
"<tr><th>44</th><td>Expert Systems: Artificial Intelligence in Bus...</td><td>5.15</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/385343928440?epid=954...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"<tr><th>45</th><td>Confidence (HBR Emotional Intelligence Series)...</td><td>10.71</td><td>5.35 shipping</td><td>United Kingdom</td><td>book_fountain (166,979) 99.2%</td><td>https://www.ebay.com/itm/175695774186?epid=220...</td><td>book_fountain</td><td>166979</td><td>99.2</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" 1237 | ], 1238 | "text/plain": [ 1239 | " Title Price_sold \\\n", 1240 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 1241 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 1242 | "3 Business Intelligence With Cold Fusion By John... 13.78 \n", 1243 | "4 Better Business Intelligence By Robert Collins 14.63 \n", 1244 | "5 Business Intelligence, Analytics Data Science ... 49.99 \n", 1245 | "6 Definitive Guide to DAX, The Business intellig... 25.14 \n", 1246 | "7 Successful Business Intelligence: Unlock the V... 30.00 \n", 1247 | "8 Beyond the Balanced Scorecard : Improving Busi... 5.99 \n", 1248 | "9 Business Intelligence, Analytics, and Data Sci... 22.90 \n", 1249 | "10 Data Mining for Business Intelligence: Concept... 8.12 \n", 1250 | "11 Business Intelligence Competency Centers: A Te... 6.34 \n", 1251 | "12 Data Mining for Business Intelligence: Concept... 11.49 \n", 1252 | "13 Definitive Guide to DAX, The: Business intelli... 29.95 \n", 1253 | "14 Project Management - an Artificial Intelligent... 23.20 \n", 1254 | "15 Data Visualization for Oracle Business Intelli... 13.78 \n", 1255 | "16 SAP Business Intelligence: Up-to-date for SAP ... 18.41 \n", 1256 | "17 New ListingPOPULAR SCIENCE Technology and Comm... 19.00 \n", 1257 | "18 Leadership Presence (HBR Emotional Intelligenc... 6.37 \n", 1258 | "19 SAP Business Intelligence: Up-to-date for SAP ... 10.60 \n", 1259 | "20 Delivering Business Intelligence with Microsof... 12.04 \n", 1260 | "21 Data Mining for Business Intelligence: Concept... 8.49 \n", 1261 | "22 Business Intelligence and Analytics: Systems f... 25.00 \n", 1262 | "23 Building Business Intelligence Using SAS: Cont... 15.38 \n", 1263 | "24 New ListingIntelligent Enterprise, Quinn, Jame... 14.45 \n", 1264 | "25 Expert Systems: Artificial Intelligence in Bus... 13.86 \n", 1265 | "26 Karen Berman Financial Intelligence, Revised E... 29.37 \n", 1266 | "27 Daniel Goleman Emotionally Intelligent Leader ... 
13.41 \n", 1267 | "28 Intelligent Marketing for Employment Lawyers: ... 6.15 \n", 1268 | "29 Oracle Business Intelligence 11g Developers Gu... 19.56 \n", 1269 | "30 Decision Support, Analytics, and Business Inte... 8.30 \n", 1270 | "31 The Leader's Guide to Emotional Agility (Emoti... 8.80 \n", 1271 | "32 Effective Strategy Execution: Improving Perfor... 12.82 \n", 1272 | "33 Learn Microsoft PowerApps: Build customized bu... 36.63 \n", 1273 | "34 Business Intelligence for the Enterprise, Bier... 10.43 \n", 1274 | "35 Power and Impact (HBR Emotional Intelligence S... 8.98 \n", 1275 | "36 Intelligent Business, Upper Intermediate Workb... 26.28 \n", 1276 | "37 Financial Intelligence for HR Professionals: W... 8.26 \n", 1277 | "38 Business Intelligence, Analytics, and Data Sci... 110.00 \n", 1278 | "39 Building Business Intelligence Using SAS: Cont... 15.12 \n", 1279 | "40 Business Intelligence, Analytics, and Data Sci... 33.04 \n", 1280 | "41 New ListingCompetitive Intelligence: From Blac... 14.78 \n", 1281 | "42 Happiness (HBR Emotional Intelligence Series) ... 10.71 \n", 1282 | "43 Delivering Business Intelligence with Microsof... 12.15 \n", 1283 | "44 Expert Systems: Artificial Intelligence in Bus... 5.15 \n", 1284 | "45 Confidence (HBR Emotional Intelligence Series)... 
10.71 \n", 1285 | "\n", 1286 | " Shipping_cost Item_location \\\n", 1287 | "1 57.45 shipping United States \n", 1288 | "2 17.48 shipping United Kingdom \n", 1289 | "3 6.99 shipping United States \n", 1290 | "4 5.35 shipping United Kingdom \n", 1291 | "5 30.40 shipping United States \n", 1292 | "6 Shipping not specified United Kingdom \n", 1293 | "7 60.72 shipping United States \n", 1294 | "8 25.39 shipping United States \n", 1295 | "9 84.01 shipping United States \n", 1296 | "10 22.83 shipping United Kingdom \n", 1297 | "11 5.35 shipping United Kingdom \n", 1298 | "12 79.84 shipping United States \n", 1299 | "13 24.65 shipping United Kingdom \n", 1300 | "14 11.56 shipping United Kingdom \n", 1301 | "15 6.99 shipping United States \n", 1302 | "16 6.99 shipping United States \n", 1303 | "17 102.55 shipping United States \n", 1304 | "18 4.00 shipping United Kingdom \n", 1305 | "19 5.35 shipping United Kingdom \n", 1306 | "20 5.35 shipping United Kingdom \n", 1307 | "21 30.00 shipping United States \n", 1308 | "22 85.00 shipping United States \n", 1309 | "23 5.35 shipping United Kingdom \n", 1310 | "24 2.64 shipping United Kingdom \n", 1311 | "25 6.27 shipping United Kingdom \n", 1312 | "26 1.98 shipping United Kingdom \n", 1313 | "27 3.76 shipping United Kingdom \n", 1314 | "28 6.27 shipping United Kingdom \n", 1315 | "29 40.48 shipping United Kingdom \n", 1316 | "30 38.17 shipping United States \n", 1317 | "31 9.01 shipping United Kingdom \n", 1318 | "32 12.09 shipping United Kingdom \n", 1319 | "33 57.17 shipping United States \n", 1320 | "34 9.69 shipping United Kingdom \n", 1321 | "35 2.64 shipping United Kingdom \n", 1322 | "36 25.38 shipping Germany \n", 1323 | "37 10.27 shipping United Kingdom \n", 1324 | "38 40.00 shipping United States \n", 1325 | "39 6.99 shipping United States \n", 1326 | "40 6.60 shipping India \n", 1327 | "41 2.64 shipping United Kingdom \n", 1328 | "42 5.35 shipping United Kingdom \n", 1329 | "43 6.27 shipping United Kingdom \n", 1330 | 
"44 5.35 shipping United Kingdom \n", 1331 | "45 5.35 shipping United Kingdom \n", 1332 | "\n", 1333 | " Item_seller \\\n", 1334 | "1 vbbc2015 (11,848) 100% \n", 1335 | "2 webuybooks (1,740,389) 99.7% \n", 1336 | "3 awesomebooksusa (387,183) 98.2% \n", 1337 | "4 book_fountain (166,979) 99.2% \n", 1338 | "5 geda-n (1,065) 100% \n", 1339 | "6 miscgoodstuff (307) 100% \n", 1340 | "7 thodgez28 (476) 99.3% \n", 1341 | "8 readingbooks_sendingpostcards (355) 100% \n", 1342 | "9 dunkin_bookstore (30,353) 99.2% \n", 1343 | "10 webuybooks (1,740,389) 99.7% \n", 1344 | "11 book_fountain (166,979) 99.2% \n", 1345 | "12 hpb-inc (31,625) 98.9% \n", 1346 | "13 webuybooks (1,740,389) 99.7% \n", 1347 | "14 webuybooks (1,740,389) 99.7% \n", 1348 | "15 awesomebooksusa (387,183) 98.2% \n", 1349 | "16 awesomebooksusa (387,183) 98.2% \n", 1350 | "17 vanpatrick2005 (2,220) 100% \n", 1351 | "18 webuybooks (1,740,389) 99.7% \n", 1352 | "19 book_fountain (166,979) 99.2% \n", 1353 | "20 book_fountain (166,979) 99.2% \n", 1354 | "21 wonderbooks (587,952) 99.7% \n", 1355 | "22 justsafa (1,192) 98.1% \n", 1356 | "23 book_fountain (166,979) 99.2% \n", 1357 | "24 thecotswoldlibrary (586,859) 99.6% \n", 1358 | "25 cmedia_group (763,774) 99.6% \n", 1359 | "26 rarewaves-outlet (909,494) 99.4% \n", 1360 | "27 rarewaves-outlet (909,494) 99.4% \n", 1361 | "28 cmedia_group (763,774) 99.6% \n", 1362 | "29 webuybooks (1,740,389) 99.7% \n", 1363 | "30 bombbooks (7,358) 98.8% \n", 1364 | "31 webuybooks (1,740,389) 99.7% \n", 1365 | "32 webuybooks (1,740,389) 99.7% \n", 1366 | "33 goodwillrs (211,859) 99.6% \n", 1367 | "34 webuybooks (1,740,389) 99.7% \n", 1368 | "35 thecotswoldlibrary (586,859) 99.6% \n", 1369 | "36 rheinberg-buch-at (85,845) 99.8% \n", 1370 | "37 webuybooks (1,740,389) 99.7% \n", 1371 | "38 valleegirls (8,391) 100% \n", 1372 | "39 awesomebooksusa (387,183) 98.2% \n", 1373 | "40 princess-shopee (2,671) 86.5% \n", 1374 | "41 thecotswoldlibrary (586,859) 99.6% \n", 1375 | "42 book_fountain 
(166,979) 99.2% \n", 1376 | "43 cmedia_group (763,774) 99.6% \n", 1377 | "44 book_fountain (166,979) 99.2% \n", 1378 | "45 book_fountain (166,979) 99.2% \n", 1379 | "\n", 1380 | " Link \\\n", 1381 | "1 https://www.ebay.com/itm/134303280275?epid=738... \n", 1382 | "2 https://www.ebay.com/itm/134435858075?epid=230... \n", 1383 | "3 https://www.ebay.com/itm/394419887418?hash=ite... \n", 1384 | "4 https://www.ebay.com/itm/175596253189?hash=ite... \n", 1385 | "5 https://www.ebay.com/itm/334775697963?epid=234... \n", 1386 | "6 https://www.ebay.com/itm/394408552413?epid=200... \n", 1387 | "7 https://www.ebay.com/itm/175688111518?epid=280... \n", 1388 | "8 https://www.ebay.com/itm/223503389279?epid=593... \n", 1389 | "9 https://www.ebay.com/itm/233198562933?epid=234... \n", 1390 | "10 https://www.ebay.com/itm/134523252613?hash=ite... \n", 1391 | "11 https://www.ebay.com/itm/385311444832?epid=955... \n", 1392 | "12 https://www.ebay.com/itm/334841553738?epid=844... \n", 1393 | "13 https://www.ebay.com/itm/134544064239?epid=216... \n", 1394 | "14 https://www.ebay.com/itm/364187110385?epid=220... \n", 1395 | "15 https://www.ebay.com/itm/314493314854?epid=201... \n", 1396 | "16 https://www.ebay.com/itm/334740223837?epid=602... \n", 1397 | "17 https://www.ebay.com/itm/256060365330?hash=ite... \n", 1398 | "18 https://www.ebay.com/itm/364200255647?epid=280... \n", 1399 | "19 https://www.ebay.com/itm/385405901406?epid=107... \n", 1400 | "20 https://www.ebay.com/itm/175678784438?epid=222... \n", 1401 | "21 https://www.ebay.com/itm/354646465558?epid=844... \n", 1402 | "22 https://www.ebay.com/itm/334772967747?epid=177... \n", 1403 | "23 https://www.ebay.com/itm/175686504625?epid=113... \n", 1404 | "24 https://www.ebay.com/itm/385583605619?epid=950... \n", 1405 | "25 https://www.ebay.com/itm/134407217856?epid=888... \n", 1406 | "26 https://www.ebay.com/itm/354309562663?epid=138... \n", 1407 | "27 https://www.ebay.com/itm/354225696018?epid=150... 
\n", 1408 | "28 https://www.ebay.com/itm/385204743329?epid=404... \n", 1409 | "29 https://www.ebay.com/itm/155486480064?epid=117... \n", 1410 | "30 https://www.ebay.com/itm/255810593954?epid=215... \n", 1411 | "31 https://www.ebay.com/itm/134479529800?epid=221... \n", 1412 | "32 https://www.ebay.com/itm/364131898642?hash=ite... \n", 1413 | "33 https://www.ebay.com/itm/165803498246?epid=210... \n", 1414 | "34 https://www.ebay.com/itm/155494228400?hash=ite... \n", 1415 | "35 https://www.ebay.com/itm/385450221653?epid=704... \n", 1416 | "36 https://www.ebay.com/itm/364238567063?hash=ite... \n", 1417 | "37 https://www.ebay.com/itm/134430226236?hash=ite... \n", 1418 | "38 https://www.ebay.com/itm/145030796056?epid=234... \n", 1419 | "39 https://www.ebay.com/itm/334829285079?epid=113... \n", 1420 | "40 https://www.ebay.com/itm/195582064511?epid=233... \n", 1421 | "41 https://www.ebay.com/itm/385584174988?epid=955... \n", 1422 | "42 https://www.ebay.com/itm/385546852155?epid=235... \n", 1423 | "43 https://www.ebay.com/itm/385528696514?epid=222... \n", 1424 | "44 https://www.ebay.com/itm/385343928440?epid=954... \n", 1425 | "45 https://www.ebay.com/itm/175695774186?epid=220... 
\n", 1426 | "\n", 1427 | " Seller_name Seller_feedback Seller_Rating% \n", 1428 | "1 vbbc2015 11848 100.0 \n", 1429 | "2 webuybooks 1740389 99.7 \n", 1430 | "3 awesomebooksusa 387183 98.2 \n", 1431 | "4 book_fountain 166979 99.2 \n", 1432 | "5 geda-n 1065 100.0 \n", 1433 | "6 miscgoodstuff 307 100.0 \n", 1434 | "7 thodgez28 476 99.3 \n", 1435 | "8 readingbooks_sendingpostcards 355 100.0 \n", 1436 | "9 dunkin_bookstore 30353 99.2 \n", 1437 | "10 webuybooks 1740389 99.7 \n", 1438 | "11 book_fountain 166979 99.2 \n", 1439 | "12 hpb-inc 31625 98.9 \n", 1440 | "13 webuybooks 1740389 99.7 \n", 1441 | "14 webuybooks 1740389 99.7 \n", 1442 | "15 awesomebooksusa 387183 98.2 \n", 1443 | "16 awesomebooksusa 387183 98.2 \n", 1444 | "17 vanpatrick2005 2220 100.0 \n", 1445 | "18 webuybooks 1740389 99.7 \n", 1446 | "19 book_fountain 166979 99.2 \n", 1447 | "20 book_fountain 166979 99.2 \n", 1448 | "21 wonderbooks 587952 99.7 \n", 1449 | "22 justsafa 1192 98.1 \n", 1450 | "23 book_fountain 166979 99.2 \n", 1451 | "24 thecotswoldlibrary 586859 99.6 \n", 1452 | "25 cmedia_group 763774 99.6 \n", 1453 | "26 rarewaves-outlet 909494 99.4 \n", 1454 | "27 rarewaves-outlet 909494 99.4 \n", 1455 | "28 cmedia_group 763774 99.6 \n", 1456 | "29 webuybooks 1740389 99.7 \n", 1457 | "30 bombbooks 7358 98.8 \n", 1458 | "31 webuybooks 1740389 99.7 \n", 1459 | "32 webuybooks 1740389 99.7 \n", 1460 | "33 goodwillrs 211859 99.6 \n", 1461 | "34 webuybooks 1740389 99.7 \n", 1462 | "35 thecotswoldlibrary 586859 99.6 \n", 1463 | "36 rheinberg-buch-at 85845 99.8 \n", 1464 | "37 webuybooks 1740389 99.7 \n", 1465 | "38 valleegirls 8391 100.0 \n", 1466 | "39 awesomebooksusa 387183 98.2 \n", 1467 | "40 princess-shopee 2671 86.5 \n", 1468 | "41 thecotswoldlibrary 586859 99.6 \n", 1469 | "42 book_fountain 166979 99.2 \n", 1470 | "43 cmedia_group 763774 99.6 \n", 1471 | "44 book_fountain 166979 99.2 \n", 1472 | "45 book_fountain 166979 99.2 " 1473 | ] 1474 | }, 1475 | "execution_count": 38, 1476 | "metadata": {}, 
1477 | "output_type": "execute_result" 1478 | } 1479 | ], 1480 | "source": [ 1481 | "BIbooks.head(45)" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "code", 1486 | "execution_count": 39, 1487 | "id": "623bd55a", 1488 | "metadata": {}, 1489 | "outputs": [ 1490 | { 1491 | "data": { 1492 | "text/html": [ 1493 | "
\n", 1494 | "\n", 1507 | "\n", 1508 | " \n", 1509 | " \n", 1510 | " \n", 1511 | " \n", 1512 | " \n", 1513 | " \n", 1514 | " \n", 1515 | " \n", 1516 | " \n", 1517 | " \n", 1518 | " \n", 1519 | " \n", 1520 | " \n", 1521 | " \n", 1522 | " \n", 1523 | " \n", 1524 | " \n", 1525 | " \n", 1526 | " \n", 1527 | " \n", 1528 | " \n", 1529 | " \n", 1530 | " \n", 1531 | " \n", 1532 | " \n", 1533 | " \n", 1534 | " \n", 1535 | " \n", 1536 | " \n", 1537 | " \n", 1538 | " \n", 1539 | " \n", 1540 | " \n", 1541 | " \n", 1542 | " \n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | "
TitlePrice_soldShipping_costItem_locationItem_sellerLinkSeller_nameSeller_feedbackSeller_Rating%Shipping_cost_valueShipping_type
1Business Intelligence : Practices, Technologie...54.9957.45 shippingUnited Statesvbbc2015 (11,848) 100%https://www.ebay.com/itm/134303280275?epid=738...vbbc201511848100.057.45shipping
2The Artificial Intelligence Imperative: A Prac...30.1717.48 shippingUnited Kingdomwebuybooks (1,740,389) 99.7%https://www.ebay.com/itm/134435858075?epid=230...webuybooks174038999.717.48shipping
\n", 1555 | "
" 1556 | ], 1557 | "text/plain": [ 1558 | " Title Price_sold \\\n", 1559 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 1560 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 1561 | "\n", 1562 | " Shipping_cost Item_location Item_seller \\\n", 1563 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 1564 | "2 17.48 shipping United Kingdom webuybooks (1,740,389) 99.7% \n", 1565 | "\n", 1566 | " Link Seller_name \\\n", 1567 | "1 https://www.ebay.com/itm/134303280275?epid=738... vbbc2015 \n", 1568 | "2 https://www.ebay.com/itm/134435858075?epid=230... webuybooks \n", 1569 | "\n", 1570 | " Seller_feedback Seller_Rating% Shipping_cost_value Shipping_type \n", 1571 | "1 11848 100.0 57.45 shipping \n", 1572 | "2 1740389 99.7 17.48 shipping " 1573 | ] 1574 | }, 1575 | "execution_count": 39, 1576 | "metadata": {}, 1577 | "output_type": "execute_result" 1578 | } 1579 | ], 1580 | "source": [ 1581 | "#Create new columns for the shipping cost value and shipping type from the 'Shipping_cost' column\n", 1582 | "BIbooks[['Shipping_cost_value', 'Shipping_type']] = BIbooks['Shipping_cost'].str.extract(r'([\\d\\.]+)\\s*([a-zA-Z\\s]+)', expand=True)\n", 1583 | "\n", 1584 | "#Show the updated dataframe\n", 1585 | "BIbooks.head(2)" 1586 | ] 1587 | }, 1588 | { 1589 | "cell_type": "code", 1590 | "execution_count": 40, 1591 | "id": "23850af2", 1592 | "metadata": {}, 1593 | "outputs": [ 1594 | { 1595 | "data": { 1596 | "text/html": [ 1597 | "
\n", 1598 | "\n", 1611 | "\n", 1612 | " \n", 1613 | " \n", 1614 | " \n", 1615 | " \n", 1616 | " \n", 1617 | " \n", 1618 | " \n", 1619 | " \n", 1620 | " \n", 1621 | " \n", 1622 | " \n", 1623 | " \n", 1624 | " \n", 1625 | " \n", 1626 | " \n", 1627 | " \n", 1628 | " \n", 1629 | " \n", 1630 | " \n", 1631 | " \n", 1632 | " \n", 1633 | " \n", 1634 | " \n", 1635 | " \n", 1636 | " \n", 1637 | " \n", 1638 | " \n", 1639 | " \n", 1640 | " \n", 1641 | " \n", 1642 | " \n", 1643 | " \n", 1644 | " \n", 1645 | " \n", 1646 | " \n", 1647 | " \n", 1648 | " \n", 1649 | " \n", 1650 | " \n", 1651 | " \n", 1652 | " \n", 1653 | " \n", 1654 | " \n", 1655 | " \n", 1656 | " \n", 1657 | " \n", 1658 | "
TitlePrice_soldShipping_costItem_locationItem_sellerLinkSeller_nameSeller_feedbackSeller_Rating%Shipping_cost_valueShipping_type
1Business Intelligence : Practices, Technologie...54.9957.45 shippingUnited Statesvbbc2015 (11,848) 100%https://www.ebay.com/itm/134303280275?epid=738...vbbc201511848100.057.45Paid shipping
2The Artificial Intelligence Imperative: A Prac...30.1717.48 shippingUnited Kingdomwebuybooks (1,740,389) 99.7%https://www.ebay.com/itm/134435858075?epid=230...webuybooks174038999.717.48Paid shipping
\n", 1659 | "
" 1660 | ], 1661 | "text/plain": [ 1662 | " Title Price_sold \\\n", 1663 | "1 Business Intelligence : Practices, Technologie... 54.99 \n", 1664 | "2 The Artificial Intelligence Imperative: A Prac... 30.17 \n", 1665 | "\n", 1666 | " Shipping_cost Item_location Item_seller \\\n", 1667 | "1 57.45 shipping United States vbbc2015 (11,848) 100% \n", 1668 | "2 17.48 shipping United Kingdom webuybooks (1,740,389) 99.7% \n", 1669 | "\n", 1670 | " Link Seller_name \\\n", 1671 | "1 https://www.ebay.com/itm/134303280275?epid=738... vbbc2015 \n", 1672 | "2 https://www.ebay.com/itm/134435858075?epid=230... webuybooks \n", 1673 | "\n", 1674 | " Seller_feedback Seller_Rating% Shipping_cost_value Shipping_type \n", 1675 | "1 11848 100.0 57.45 Paid shipping \n", 1676 | "2 1740389 99.7 17.48 Paid shipping " 1677 | ] 1678 | }, 1679 | "execution_count": 40, 1680 | "metadata": {}, 1681 | "output_type": "execute_result" 1682 | } 1683 | ], 1684 | "source": [ 1685 | "#Add \"Paid\" in front of \"shipping\" in the Shipping_type column (rows with no match stay NaN)\n", 1686 | "BIbooks['Shipping_type'] = 'Paid ' + BIbooks['Shipping_type']\n", 1687 | "\n", 1688 | "\n", 1689 | "#Fill the NaN values in the Shipping_type column with 'Free International shipping'\n", 1690 | "BIbooks['Shipping_type'] = BIbooks['Shipping_type'].fillna('Free International shipping')\n", 1691 | "\n", 1692 | "\n", 1693 | "#Fill the NaN values in the Shipping_cost_value column with 0\n", 1694 | "BIbooks['Shipping_cost_value'] = BIbooks['Shipping_cost_value'].fillna(0)\n", 1695 | "\n", 1696 | "\n", 1697 | "#Convert data type to float\n", 1698 | "BIbooks['Shipping_cost_value'] = BIbooks['Shipping_cost_value'].astype(float)\n", 1699 | "\n", 1700 | "\n", 1701 | "BIbooks.head(2)" 1702 | ] 1703 | }, 1704 | { 1705 | "cell_type": "code", 1706 | "execution_count": 41, 1707 | "id": "0c532a7e", 1708 | "metadata": {}, 1709 | "outputs": [ 1710 | { 1711 | "data": { 1712 | "text/html": [ 1713 | "
\n", 1714 | "\n", 1727 | "\n", 1728 | " \n", 1729 | " \n", 1730 | " \n", 1731 | " \n", 1732 | " \n", 1733 | " \n", 1734 | " \n", 1735 | " \n", 1736 | " \n", 1737 | " \n", 1738 | " \n", 1739 | " \n", 1740 | " \n", 1741 | " \n", 1742 | " \n", 1743 | " \n", 1744 | " \n", 1745 | " \n", 1746 | " \n", 1747 | " \n", 1748 | " \n", 1749 | " \n", 1750 | " \n", 1751 | " \n", 1752 | " \n", 1753 | " \n", 1754 | " \n", 1755 | " \n", 1756 | " \n", 1757 | " \n", 1758 | " \n", 1759 | " \n", 1760 | " \n", 1761 | " \n", 1762 | " \n", 1763 | " \n", 1764 | " \n", 1765 | " \n", 1766 | " \n", 1767 | " \n", 1768 | " \n", 1769 | " \n", 1770 | " \n", 1771 | " \n", 1772 | " \n", 1773 | " \n", 1774 | " \n", 1775 | " \n", 1776 | " \n", 1777 | " \n", 1778 | " \n", 1779 | " \n", 1780 | " \n", 1781 | " \n", 1782 | " \n", 1783 | " \n", 1784 | " \n", 1785 | " \n", 1786 | " \n", 1787 | " \n", 1788 | " \n", 1789 | " \n", 1790 | " \n", 1791 | " \n", 1792 | " \n", 1793 | " \n", 1794 | " \n", 1795 | " \n", 1796 | " \n", 1797 | " \n", 1798 | " \n", 1799 | " \n", 1800 | " \n", 1801 | " \n", 1802 | " \n", 1803 | " \n", 1804 | " \n", 1805 | " \n", 1806 | " \n", 1807 | " \n", 1808 | " \n", 1809 | " \n", 1810 | " \n", 1811 | " \n", 1812 | " \n", 1813 | " \n", 1814 | " \n", 1815 | " \n", 1816 | " \n", 1817 | " \n", 1818 | " \n", 1819 | " \n", 1820 | " \n", 1821 | " \n", 1822 | " \n", 1823 | " \n", 1824 | " \n", 1825 | " \n", 1826 | " \n", 1827 | " \n", 1828 | " \n", 1829 | " \n", 1830 | "
TitlePrice_soldShipping_costItem_locationItem_sellerLinkSeller_nameSeller_feedbackSeller_Rating%Shipping_cost_valueShipping_type
41New ListingCompetitive Intelligence: From Blac...14.782.64 shippingUnited Kingdomthecotswoldlibrary (586,859) 99.6%https://www.ebay.com/itm/385584174988?epid=955...thecotswoldlibrary58685999.62.64Paid shipping
42Happiness (HBR Emotional Intelligence Series) ...10.715.35 shippingUnited Kingdombook_fountain (166,979) 99.2%https://www.ebay.com/itm/385546852155?epid=235...book_fountain16697999.25.35Paid shipping
43Delivering Business Intelligence with Microsof...12.156.27 shippingUnited Kingdomcmedia_group (763,774) 99.6%https://www.ebay.com/itm/385528696514?epid=222...cmedia_group76377499.66.27Paid shipping
44Expert Systems: Artificial Intelligence in Bus...5.155.35 shippingUnited Kingdombook_fountain (166,979) 99.2%https://www.ebay.com/itm/385343928440?epid=954...book_fountain16697999.25.35Paid shipping
45Confidence (HBR Emotional Intelligence Series)...10.715.35 shippingUnited Kingdombook_fountain (166,979) 99.2%https://www.ebay.com/itm/175695774186?epid=220...book_fountain16697999.25.35Paid shipping
46Data Mining for Business Intelligence Galit Sh...13.9984.87 shippingUnited Statesstephensusedbookstore (9,208) 100%https://www.ebay.com/itm/394596385395?epid=844...stephensusedbookstore9208100.084.87Paid shipping
\n", 1831 | "
" 1832 | ], 1833 | "text/plain": [ 1834 | " Title Price_sold \\\n", 1835 | "41 New ListingCompetitive Intelligence: From Blac... 14.78 \n", 1836 | "42 Happiness (HBR Emotional Intelligence Series) ... 10.71 \n", 1837 | "43 Delivering Business Intelligence with Microsof... 12.15 \n", 1838 | "44 Expert Systems: Artificial Intelligence in Bus... 5.15 \n", 1839 | "45 Confidence (HBR Emotional Intelligence Series)... 10.71 \n", 1840 | "46 Data Mining for Business Intelligence Galit Sh... 13.99 \n", 1841 | "\n", 1842 | " Shipping_cost Item_location Item_seller \\\n", 1843 | "41 2.64 shipping United Kingdom thecotswoldlibrary (586,859) 99.6% \n", 1844 | "42 5.35 shipping United Kingdom book_fountain (166,979) 99.2% \n", 1845 | "43 6.27 shipping United Kingdom cmedia_group (763,774) 99.6% \n", 1846 | "44 5.35 shipping United Kingdom book_fountain (166,979) 99.2% \n", 1847 | "45 5.35 shipping United Kingdom book_fountain (166,979) 99.2% \n", 1848 | "46 84.87 shipping United States stephensusedbookstore (9,208) 100% \n", 1849 | "\n", 1850 | " Link Seller_name \\\n", 1851 | "41 https://www.ebay.com/itm/385584174988?epid=955... thecotswoldlibrary \n", 1852 | "42 https://www.ebay.com/itm/385546852155?epid=235... book_fountain \n", 1853 | "43 https://www.ebay.com/itm/385528696514?epid=222... cmedia_group \n", 1854 | "44 https://www.ebay.com/itm/385343928440?epid=954... book_fountain \n", 1855 | "45 https://www.ebay.com/itm/175695774186?epid=220... book_fountain \n", 1856 | "46 https://www.ebay.com/itm/394596385395?epid=844... 
stephensusedbookstore \n", 1857 | "\n", 1858 | " Seller_feedback Seller_Rating% Shipping_cost_value Shipping_type \n", 1859 | "41 586859 99.6 2.64 Paid shipping \n", 1860 | "42 166979 99.2 5.35 Paid shipping \n", 1861 | "43 763774 99.6 6.27 Paid shipping \n", 1862 | "44 166979 99.2 5.35 Paid shipping \n", 1863 | "45 166979 99.2 5.35 Paid shipping \n", 1864 | "46 9208 100.0 84.87 Paid shipping " 1865 | ] 1866 | }, 1867 | "execution_count": 41, 1868 | "metadata": {}, 1869 | "output_type": "execute_result" 1870 | } 1871 | ], 1872 | "source": [ 1873 | "BIbooks.iloc[40:46, :]" 1874 | ] 1875 | }, 1876 | { 1877 | "cell_type": "code", 1878 | "execution_count": 42, 1879 | "id": "4e5ff180", 1880 | "metadata": {}, 1881 | "outputs": [ 1882 | { 1883 | "name": "stdout", 1884 | "output_type": "stream", 1885 | "text": [ 1886 | "<class 'pandas.core.frame.DataFrame'>\n", 1887 | "RangeIndex: 75 entries, 1 to 75\n", 1888 | "Data columns (total 11 columns):\n", 1889 | " # Column Non-Null Count Dtype \n", 1890 | "--- ------ -------------- ----- \n", 1891 | " 0 Title 75 non-null object \n", 1892 | " 1 Price_sold 75 non-null float64\n", 1893 | " 2 Shipping_cost 75 non-null object \n", 1894 | " 3 Item_location 75 non-null object \n", 1895 | " 4 Item_seller 75 non-null object \n", 1896 | " 5 Link 75 non-null object \n", 1897 | " 6 Seller_name 75 non-null object \n", 1898 | " 7 Seller_feedback 75 non-null int32 \n", 1899 | " 8 Seller_Rating% 75 non-null float64\n", 1900 | " 9 Shipping_cost_value 75 non-null float64\n", 1901 | " 10 Shipping_type 75 non-null object \n", 1902 | "dtypes: float64(3), int32(1), object(7)\n", 1903 | "memory usage: 6.3+ KB\n" 1904 | ] 1905 | } 1906 | ], 1907 | "source": [ 1908 | "#Check for info\n", 1909 | "\n", 1910 | "BIbooks.info()" 1911 | ] 1912 | }, 1913 | { 1914 | "cell_type": "markdown", 1915 | "id": "494a2c1c", 1916 | "metadata": {}, 1917 | "source": [ 1918 | "# " 1919 | ] 1920 | }, 1921 | { 1922 | "cell_type": "markdown", 1923 | "id": "a39de738", 1924 | "metadata": {}, 1925 | "source": [ 1926 
| "# " 1927 | ] 1928 | }, 1929 | { 1930 | "cell_type": "markdown", 1931 | "id": "d1eb65aa", 1932 | "metadata": {}, 1933 | "source": [ 1934 | "## Data Analysis \n", 1935 | "\n", 1936 | "What is the: \n", 1937 | "* Average price of books\n", 1938 | "* Average shipping cost\n", 1939 | "* Seller feedback and rating\n", 1940 | "* Shipping type\n", 1941 | "* Item location\n", 1942 | "* Price vs. shipping cost\n", 1943 | "* Top sellers\n", 1944 | "* Price distribution by seller" 1945 | ] 1946 | }, 1947 | { 1948 | "cell_type": "code", 1949 | "execution_count": 43, 1950 | "id": "48a087bf", 1951 | "metadata": {}, 1952 | "outputs": [ 1953 | { 1954 | "data": { 1955 | "text/html": [ 1956 | "
\n", 1957 | "\n", 1970 | "\n", 1971 | " \n", 1972 | " \n", 1973 | " \n", 1974 | " \n", 1975 | " \n", 1976 | " \n", 1977 | " \n", 1978 | " \n", 1979 | " \n", 1980 | " \n", 1981 | " \n", 1982 | " \n", 1983 | " \n", 1984 | " \n", 1985 | " \n", 1986 | " \n", 1987 | " \n", 1988 | " \n", 1989 | " \n", 1990 | " \n", 1991 | " \n", 1992 | " \n", 1993 | " \n", 1994 | " \n", 1995 | " \n", 1996 | " \n", 1997 | " \n", 1998 | " \n", 1999 | " \n", 2000 | " \n", 2001 | " \n", 2002 | " \n", 2003 | " \n", 2004 | " \n", 2005 | " \n", 2006 | " \n", 2007 | " \n", 2008 | " \n", 2009 | " \n", 2010 | " \n", 2011 | " \n", 2012 | " \n", 2013 | " \n", 2014 | " \n", 2015 | " \n", 2016 | " \n", 2017 | " \n", 2018 | " \n", 2019 | " \n", 2020 | " \n", 2021 | " \n", 2022 | " \n", 2023 | " \n", 2024 | " \n", 2025 | " \n", 2026 | " \n", 2027 | " \n", 2028 | " \n", 2029 | " \n", 2030 | " \n", 2031 | " \n", 2032 | " \n", 2033 | " \n", 2034 | " \n", 2035 | " \n", 2036 | " \n", 2037 | " \n", 2038 | "
Price_soldSeller_feedbackSeller_Rating%Shipping_cost_value
count75.0000007.500000e+0175.00000075.000000
mean21.1425335.062751e+0599.18266727.472133
std20.8040596.457872e+052.16516534.213422
min2.0000008.100000e+0186.5000000.000000
25%9.9700006.916000e+0399.2000005.350000
50%13.8600001.669790e+0599.60000011.560000
75%25.7100007.637740e+0599.90000032.425000
max130.2800001.740389e+06100.000000202.230000
\n", 2039 | "
" 2040 | ], 2041 | "text/plain": [ 2042 | " Price_sold Seller_feedback Seller_Rating% Shipping_cost_value\n", 2043 | "count 75.000000 7.500000e+01 75.000000 75.000000\n", 2044 | "mean 21.142533 5.062751e+05 99.182667 27.472133\n", 2045 | "std 20.804059 6.457872e+05 2.165165 34.213422\n", 2046 | "min 2.000000 8.100000e+01 86.500000 0.000000\n", 2047 | "25% 9.970000 6.916000e+03 99.200000 5.350000\n", 2048 | "50% 13.860000 1.669790e+05 99.600000 11.560000\n", 2049 | "75% 25.710000 7.637740e+05 99.900000 32.425000\n", 2050 | "max 130.280000 1.740389e+06 100.000000 202.230000" 2051 | ] 2052 | }, 2053 | "execution_count": 43, 2054 | "metadata": {}, 2055 | "output_type": "execute_result" 2056 | } 2057 | ], 2058 | "source": [ 2059 | "BIbooks.describe()" 2060 | ] 2061 | }, 2062 | { 2063 | "cell_type": "markdown", 2064 | "id": "ff45c32a", 2065 | "metadata": {}, 2066 | "source": [ 2067 | "\n", 2068 | "* The average price of BI books on eBay is 21.14\n", 2069 | "* The average shipping cost of BI books on eBay is 27.47\n", 2070 | "\n", 2071 | "* The minimum price of BI books on eBay is 2\n", 2072 | "* The minimum shipping cost of BI books on eBay is 0 (listings with no numeric shipping cost were filled with 0)\n", 2073 | "\n", 2074 | "* The maximum price of BI books on eBay is 130.28\n", 2075 | "* The maximum shipping cost of BI books on eBay is 202.23\n" 2076 | ] 2077 | }, 2078 | { 2079 | "cell_type": "markdown", 2080 | "id": "2f6926c4", 2081 | "metadata": {}, 2082 | "source": [ 2083 | "# " 2084 | ] 2085 | }, 2086 | { 2087 | "cell_type": "code", 2088 | "execution_count": 44, 2089 | "id": "636ae4f5", 2090 | "metadata": { 2091 | "scrolled": true 2092 | }, 2093 | "outputs": [ 2094 | { 2095 | "data": { 2096 | "text/plain": [ 2097 | "Title 70\n", 2098 | "Price_sold 72\n", 2099 | "Shipping_cost 52\n", 2100 | "Item_location 5\n", 2101 | "Item_seller 36\n", 2102 | "Link 75\n", 2103 | "Seller_name 36\n", 2104 | "Seller_feedback 36\n", 2105 | "Seller_Rating% 13\n", 2106 | "Shipping_cost_value 52\n", 2107 | "Shipping_type 2\n", 2108 | "dtype: 
int64" 2109 | ] 2110 | }, 2111 | "execution_count": 44, 2112 | "metadata": {}, 2113 | "output_type": "execute_result" 2114 | } 2115 | ], 2116 | "source": [ 2117 | "BIbooks.nunique()" 2118 | ] 2119 | }, 2120 | { 2121 | "cell_type": "code", 2122 | "execution_count": 45, 2123 | "id": "d4ff8f4c", 2124 | "metadata": {}, 2125 | "outputs": [ 2126 | { 2127 | "data": { 2128 | "text/html": [ 2129 | "
\n", 2130 | "\n", 2143 | "\n", 2144 | " \n", 2145 | " \n", 2146 | " \n", 2147 | " \n", 2148 | " \n", 2149 | " \n", 2150 | " \n", 2151 | " \n", 2152 | " \n", 2153 | " \n", 2154 | " \n", 2155 | " \n", 2156 | " \n", 2157 | " \n", 2158 | " \n", 2159 | " \n", 2160 | " \n", 2161 | " \n", 2162 | " \n", 2163 | " \n", 2164 | " \n", 2165 | " \n", 2166 | " \n", 2167 | " \n", 2168 | " \n", 2169 | " \n", 2170 | " \n", 2171 | " \n", 2172 | " \n", 2173 | " \n", 2174 | " \n", 2175 | " \n", 2176 | " \n", 2177 | " \n", 2178 | " \n", 2179 | " \n", 2180 | " \n", 2181 | " \n", 2182 | " \n", 2183 | "
Price_soldSeller_feedbackSeller_Rating%Shipping_cost_value
Price_sold1.000000-0.236621-0.0348040.325289
Seller_feedback-0.2366211.0000000.144459-0.345786
Seller_Rating%-0.0348040.1444591.0000000.138973
Shipping_cost_value0.325289-0.3457860.1389731.000000
\n", 2184 | "
" 2185 | ], 2186 | "text/plain": [ 2187 | " Price_sold Seller_feedback Seller_Rating% \\\n", 2188 | "Price_sold 1.000000 -0.236621 -0.034804 \n", 2189 | "Seller_feedback -0.236621 1.000000 0.144459 \n", 2190 | "Seller_Rating% -0.034804 0.144459 1.000000 \n", 2191 | "Shipping_cost_value 0.325289 -0.345786 0.138973 \n", 2192 | "\n", 2193 | " Shipping_cost_value \n", 2194 | "Price_sold 0.325289 \n", 2195 | "Seller_feedback -0.345786 \n", 2196 | "Seller_Rating% 0.138973 \n", 2197 | "Shipping_cost_value 1.000000 " 2198 | ] 2199 | }, 2200 | "execution_count": 45, 2201 | "metadata": {}, 2202 | "output_type": "execute_result" 2203 | } 2204 | ], 2205 | "source": [ 2206 | "BIbooks.corr()" 2207 | ] 2208 | }, 2209 | { 2210 | "cell_type": "markdown", 2211 | "id": "97c54451", 2212 | "metadata": {}, 2213 | "source": [ 2214 | "# " 2215 | ] 2216 | }, 2217 | { 2218 | "cell_type": "markdown", 2219 | "id": "eb7051c0", 2220 | "metadata": {}, 2221 | "source": [ 2222 | "\n", 2223 | "\n", 2224 | "* The correlation coefficient between Price_sold and Shipping_cost_value is 0.325289. This indicates a weak positive correlation: books with higher prices show some tendency to have higher shipping costs.\n", 2225 | "\n", 2226 | "\n", 2227 | "* The correlation coefficient between Price_sold and Seller_feedback is -0.236621. This indicates a weak negative correlation: as a seller's feedback count increases, the price of the book tends to decrease slightly. 
\n", 2228 | "\n", 2229 | "\n", 2230 | "* The correlation coefficient between Price_sold and Seller_Rating% is -0.034804. This is close to zero, indicating essentially no linear correlation between them.\n" 2231 | ] 2232 | }, 2233 | { 2234 | "cell_type": "markdown", 2235 | "id": "8ffab4dc", 2236 | "metadata": {}, 2237 | "source": [ 2238 | "# " 2239 | ] 2240 | }, 2241 | { 2242 | "cell_type": "code", 2243 | "execution_count": 49, 2244 | "id": "047047c5", 2245 | "metadata": {}, 2246 | "outputs": [ 2247 | { 2248 | "data": { 2249 | "text/plain": [ 2250 | "United Kingdom 39\n", 2251 | "United States 30\n", 2252 | "Germany 3\n", 2253 | "India 2\n", 2254 | "Australia 1\n", 2255 | "Name: Item_location, dtype: int64" 2256 | ] 2257 | }, 2258 | "execution_count": 49, 2259 | "metadata": {}, 2260 | "output_type": "execute_result" 2261 | } 2262 | ], 2263 | "source": [ 2264 | "#Number of books by location\n", 2265 | "BIbooks['Item_location'].value_counts()" 2266 | ] 2267 | }, 2268 | { 2269 | "cell_type": "code", 2270 | "execution_count": 50, 2271 | "id": "a619a960", 2272 | "metadata": {}, 2273 | "outputs": [ 2274 | { 2275 | "data": { 2276 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX4AAAEWCAYAAABhffzLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAiX0lEQVR4nO3deZgcVdn+8e9NCCCGLWTEBAiDYVFewCCBVwU1gCiKbIogsosGN0DFH0ZEDaKAC6iIqCBLRBZZDEsAWQIB+SlL2BMJBmNEICRhT1ACgef945whRWemp2cy1c2k7s91zdW111M11U+dPlV1ShGBmZlVx3KtDsDMzJrLid/MrGKc+M3MKsaJ38ysYpz4zcwqxonfzKxinPiXYZJ+LenbfbSs4ZIWSBqQ+ydL+mxfLDsv7xpJB/bV8nqw3u9LelLSE01e7zhJv1/KZYyW9GhfxdRKkt4n6aFWx1EVTvz9lKRZkv4rab6kZyX9RdLnJb32P42Iz0fEcQ0u64P1pomIRyJiUES80gexL5H0IuIjETF+aZfdwzjWBY4ENomIt3YyfrSkV/MJb4GkxyQd28wYm0XSOZK+38T1haQNOvoj4s8RsXGz1l91Tvz92y4RsQqwHnAi8A3gzL5eiaTl+3qZbxDrAU9FxNw60zyeT3iDgG2BQyTt3pTozErixL8MiIjnIuIKYG/gQEmbwutLcZKGSJqYfx08LenPkpaTdC4wHLgyl2qPktSeS2SHSHoEuLEwrHgSGCHpDknPSbpc0uC8riWqIDp+VUjaCTga2Duv7748/rWqoxzXMZL+JWmupN9JWi2P64jjQEmP5Gqab3W1byStluefl5d3TF7+B4HrgWE5jnMa2M//BP4CbFJY/nsl3Zn3wZ2S3lsYN0zSFXl/Pyzpc13EOFDSBZIulbSCpK0lTZH0vKQ5kk6uF5eko/N+mCVp3zxsqzzv8oXpPiHp3u62s5Plfy7H/3TenmGFcf8j6fo8bo6ko/PwrSX9NR9vsyWdKmmFPO6WPPt9ed/vXXvMSHpHPiaelTRN0q6FcedI+qWkq5R+8d4uaURPt6vKnPiXIRFxB/Ao8L5ORh+Zx7UBa5GSb0TE/sAjpF8PgyLiR4V5PgC8A/hwF6s8APgMMAxYBJzSQIx/Ao4H/pDX985OJjso/20HvA0YBJxaM822wMbADsB3JL2ji1X+AlgtL+cDOeaDI+IG4CMsLtEf1F3skjYEtgFuy/2DgatI270mcDJwlaQ18ywXkPb5MGBP4HhJO9Qs803AZcBCYK+IeAn4OfDziFgVGAFcVCestwJDgLWBA4HTJW0cEXcCTwE7FqbdDzi3u+2siW974ARgL2Ao8C/gwjxuFeAG4E95GzcAJuVZXwG+mmN7D+n/9EWAiHh/nuaded//oWadA4ErgeuAtwCHAedJKlYF7QMcC6wBPAz8oCfbVXVO/Muex4HBnQx/mfTFXS8iXs51qt011DQuIl6IiP92Mf7ciJgaES8A3wb2Ur74u5T2BU6OiJkRsQD4JvCpml8bx0bEfyPiPuA+YIkTSI5lb+CbETE/ImYBJwH79yCWYbnU+Tzwd+B24NY8bmdgRkScGxGLIuICYDqwi9L1g22Bb0TEixFxL/DbmnWvSkqa/yCdjDqun7wMbCBpSEQsiIjbuonx2xGxMCJuJp2I9srDx5OSfcdJ6sPA+T3Ydkj/i7Mi4u6IWEj6X7xHUjvwMeCJiDgpb+P8iLgdICLuiojb8n6ZBfyGdOJtxLtJJ/sTI+KliLgRmEhK9h3+GBF3RMQi4DxgZA+3q9Kc+Jc9awNPdzL8x6SS0XWSZkoa28Cy/t2D8f8CBpJKeEtrWF5ecdnLk36pdCjehfMfUqKoNQRYoZNlrd2DWB6PiNVz6Xt14L+khNpZnMXlDwOejoj5ddb9bmBzUoIrnoQPATYCpufqo4/Vie+ZfOItrqOjKub3pJPQINLJ4M8RMbvu1i7pdduYT8RP5e1Yl3TSWoKkjZS
qFp/IJ83jafzYGAb8OyJeLQyr3XeN/P+tC078yxBJW5G+HLfWjsulsSMj4m3ALsDXCtUOXZX8u/tFsG6hezippPok8AKwciGuAaQqpkaX+zjpwmtx2YuAOd3MV+vJHFPtsh7r4XKAdC2FVGLepYs4i8t/HBicq0O6Wvd1pGqUSZJeO6lFxIyI2IdUzfFD4BJJb+4irDVqxg3P6yYiHgP+CuxB+qXRo2qe7HXbmNe1Zt6Of5OqojrzK9Kvnw3zSfNoQD1Y57oq3KHGUvzfbElO/MsASavmUuGFwO8j4oFOpvmYpA0kCXieVAfbUbUwh1QH3lP7SdpE0srA94BLcnXF34GVJO2c62uPAVYszDcHaK/5YhddAHxV0vq5tNpxTWBRT4LLsVwE/EDSKpLWA75GKgn3WI7lU8C0POhqYCNJn5a0vKS9SRd+J0bEv0kXgk+QtJKkzUkl+fNqYvwR6WQySdKQvJ79JLXlEu+zedJ6t9Eemy8Kv49U/XJxYdzvgKOAzYAJ3WzigBxrx98KObaDJY2UtCLpf3F7rr6ZCLxV0lckrZj38f/mZa1COs4WSHo78IWaddU75m4nFR6OUrrwPZp0sr2wm/itQU78/duVkuaTSl7fIl1cPLiLaTckXYhbQCoFnhYRk/O4E4Bjcl3213uw/nOBc0g/u1cCDofXSsZfJNVpP0b6Ehfv8ulITE9JuruT5Z6Vl30L8E/gRdIFvt44LK9/JumX0Pl5+Y3quOtnAam6YTCp3puIeIqUaI8kVX8cBXwsIp7M8+4DtJNKsBOA70bE9bUryM9aXAbckOvidwKm5XX+HPhURLzYRXxPAM/kdZwHfD4iphfGTyCV2CfUVAl1ZiypKqvj78aImES6fnMpMJtUwv9Ujns+6eLxLjmOGaQL8gBfBz4NzAfOAF53ARcYB4zPx9xexRH5AveupIvvTwKnAQfUbJctBflFLGbLNkn/AA7NdzKZucRvtiyT9AnSNZUbWx2LvXEsq09kmlWepMmkaw7719whYxXnqh4zs4pxVY+ZWcX0i6qeIUOGRHt7e6vDMDPrV+66664nI6Ktdni/SPzt7e1MmTKl1WGYmfUrkmqfLAeaUNUjaYCkeyRNzP2Dc2t+M/LnGmXHYGZmizWjjv8I4MFC/1hgUkRsSGrJr5E2Y8zMrI+UmvglrUNqwfC3hcG7sbiRq/HA7mXGYGZmr1d2if9npMfYi/cQr9XRQmD+fEtnM0oao/Qyiinz5s0rOUwzs+ooLfHnRsPmRsRdvZk/Ik6PiFERMaqtbYmL0mZm1ktl3tWzDbCrpI+SGvBaVekF23MkDY2I2ZKGAvXed2pmZn2stBJ/RHwzItaJiHZSa343RsR+wBWkV8SRPy8vKwYzM1tSK57cPRHYUdIMUpOuJ7YgBjOzymrKA1y53ffJufsp0ouXzcysBfrFk7tLo33sVa0Ooc/MOnHnVodgZssAN9JmZlYxTvxmZhXjxG9mVjFO/GZmFePEb2ZWMU78ZmYV48RvZlYxTvxmZhXjxG9mVjFO/GZmFePEb2ZWMU78ZmYV48RvZlYxTvxmZhXjxG9mVjFO/GZmFePEb2ZWMaUlfkkrSbpD0n2Spkk6Ng8fJ+kxSffmv4+WFYOZmS2pzFcvLgS2j4gFkgYCt0q6Jo/7aUT8pMR1m5lZF0pL/BERwILcOzD/RVnrMzOzxpRaxy9pgKR7gbnA9RFxex71ZUn3SzpL0hpdzDtG0hRJU+bNm1dmmGZmlVJq4o+IVyJiJLAOsLWkTYFfASOAkcBs4KQu5j09IkZFxKi2trYywzQzq5Sm3NUTEc8Ck4GdImJOPiG8CpwBbN2MGMzMLCnzrp42Savn7jcBHwSmSxpamGwPYGpZMZiZ2ZLKvKtnKDBe0gDSCeaiiJgo6VxJI0kXemcBh5YYg5mZ1Sjzrp77gS06Gb5/Wes0M7Pu+cldM7OKceI3M6sYJ34zs4px4jczqxgnfjOzinHiNzOrGCd+M7OKceI
\n", 2277 | "text/plain": [ 2278 | "
" 2279 | ] 2280 | }, 2281 | "metadata": { 2282 | "needs_background": "light" 2283 | }, 2284 | "output_type": "display_data" 2285 | } 2286 | ], 2287 | "source": [ 2288 | "#number of books by location\n", 2289 | "Book_location = BIbooks['Item_location'].value_counts()\n", 2290 | "\n", 2291 | "\n", 2292 | "plt.bar(Book_location.index, Book_location.values)\n", 2293 | "\n", 2294 | "\n", 2295 | "plt.title('Distribution of Books by Location')\n", 2296 | "plt.xlabel('Location')\n", 2297 | "plt.ylabel('Number of Books')\n", 2298 | "\n", 2299 | "\n", 2300 | "plt.show()\n" 2301 | ] 2302 | }, 2303 | { 2304 | "cell_type": "markdown", 2305 | "id": "98bd80c7", 2306 | "metadata": {}, 2307 | "source": [ 2308 | "# " 2309 | ] 2310 | }, 2311 | { 2312 | "cell_type": "markdown", 2313 | "id": "8ea92714", 2314 | "metadata": {}, 2315 | "source": [ 2316 | "# " 2317 | ] 2318 | }, 2319 | { 2320 | "cell_type": "code", 2321 | "execution_count": 51, 2322 | "id": "47dde10d", 2323 | "metadata": {}, 2324 | "outputs": [ 2325 | { 2326 | "data": { 2327 | "image/png": "\n", 2328 | "text/plain": [ 2329 | "
" 2330 | ] 2331 | }, 2332 | "metadata": { 2333 | "needs_background": "light" 2334 | }, 2335 | "output_type": "display_data" 2336 | } 2337 | ], 2338 | "source": [ 2339 | "#Relationship between 'Price_sold' and 'Shipping_cost_value'\n", 2340 | "\n", 2341 | "plt.scatter(x=BIbooks['Price_sold'], y=BIbooks['Shipping_cost_value'])\n", 2342 | "\n", 2343 | "plt.title('Relationship between Price_sold and Shipping_cost_value')\n", 2344 | "plt.xlabel('Price_sold')\n", 2345 | "plt.ylabel('Shipping_cost_value')\n", 2346 | "\n", 2347 | "\n", 2348 | "plt.show()\n" 2349 | ] 2350 | }, 2351 | { 2352 | "cell_type": "markdown", 2353 | "id": "fca2de35", 2354 | "metadata": {}, 2355 | "source": [ 2356 | "* The scatterplot also indicates a weak positive correlation, which confirms that there is some tendency for books with higher prices to have higher shipping costs." 2357 | ] 2358 | }, 2359 | { 2360 | "cell_type": "markdown", 2361 | "id": "87416ecd", 2362 | "metadata": {}, 2363 | "source": [ 2364 | "# " 2365 | ] 2366 | }, 2367 | { 2368 | "cell_type": "code", 2369 | "execution_count": 52, 2370 | "id": "71cd6ed1", 2371 | "metadata": {}, 2372 | "outputs": [ 2373 | { 2374 | "data": { 2375 | "image/png": "\n", 2376 | "text/plain": [ 2377 | "
" 2378 | ] 2379 | }, 2380 | "metadata": { 2381 | "needs_background": "light" 2382 | }, 2383 | "output_type": "display_data" 2384 | } 2385 | ], 2386 | "source": [ 2387 | "#distribution of Price_sold\n", 2388 | "\n", 2389 | "plt.hist(BIbooks['Price_sold'], bins=10)\n", 2390 | "\n", 2391 | "plt.title('Distribution of Price_sold')\n", 2392 | "plt.xlabel('Price_sold')\n", 2393 | "plt.ylabel('Frequency')\n", 2394 | "plt.show()" 2395 | ] 2396 | }, 2397 | { 2398 | "cell_type": "markdown", 2399 | "id": "ed421023", 2400 | "metadata": {}, 2401 | "source": [ 2402 | "* Most BI books on eBay are sold for between 2 and 20 dollars." 2403 | ] 2404 | }, 2405 | { 2406 | "cell_type": "markdown", 2407 | "id": "3a27e27a", 2408 | "metadata": {}, 2409 | "source": [ 2410 | "# " 2411 | ] 2412 | }, 2413 | { 2414 | "cell_type": "code", 2415 | "execution_count": 53, 2416 | "id": "854cdfed", 2417 | "metadata": {}, 2418 | "outputs": [ 2419 | { 2420 | "data": { 2421 | "text/plain": [ 2422 | "Paid shipping 74\n", 2423 | "Free International shipping 1\n", 2424 | "Name: Shipping_type, dtype: int64" 2425 | ] 2426 | }, 2427 | "execution_count": 53, 2428 | "metadata": {}, 2429 | "output_type": "execute_result" 2430 | } 2431 | ], 2432 | "source": [ 2433 | "#Count of shipping type\n", 2434 | "BIbooks['Shipping_type'].value_counts()" 2435 | ] 2436 | }, 2437 | { 2438 | "cell_type": "code", 2439 | "execution_count": 54, 2440 | "id": "b3809d9e", 2441 | "metadata": { 2442 | "scrolled": true 2443 | }, 2444 | "outputs": [ 2445 | { 2446 | "data": { 2447 | "text/plain": [ 2448 | "Shipping_type\n", 2449 | "Free International shipping 0.00\n", 2450 | "Paid shipping 2060.41\n", 2451 | "Name: Shipping_cost_value, dtype: float64" 2452 | ] 2453 | }, 2454 | "execution_count": 54, 2455 | "metadata": {}, 2456 | "output_type": "execute_result" 2457 | } 2458 | ], 2459 | "source": [ 2460 | "# total shipping cost per shipping type\n", 2461 | "\n", 2462 | 
"BIbooks.groupby('Shipping_type')['Shipping_cost_value'].sum()" 2463 | ] 2464 | }, 2465 | { 2466 | "cell_type": "code", 2467 | "execution_count": 55, 2468 | "id": "d37d31b7", 2469 | "metadata": {}, 2470 | "outputs": [ 2471 | { 2472 | "data": { 2473 | "text/plain": [ 2474 | "Shipping_type\n", 2475 | "Free International shipping 0.00\n", 2476 | "Paid shipping 202.23\n", 2477 | "Name: Shipping_cost_value, dtype: float64" 2478 | ] 2479 | }, 2480 | "execution_count": 55, 2481 | "metadata": {}, 2482 | "output_type": "execute_result" 2483 | } 2484 | ], 2485 | "source": [ 2486 | "# max shipping cost per shipping type\n", 2487 | "\n", 2488 | "BIbooks.groupby('Shipping_type')['Shipping_cost_value'].max()" 2489 | ] 2490 | }, 2491 | { 2492 | "cell_type": "code", 2493 | "execution_count": 56, 2494 | "id": "8834dad1", 2495 | "metadata": {}, 2496 | "outputs": [ 2497 | { 2498 | "data": { 2499 | "image/png": "\n", 2500 | "text/plain": [ 2501 | "
" 2502 | ] 2503 | }, 2504 | "metadata": { 2505 | "needs_background": "light" 2506 | }, 2507 | "output_type": "display_data" 2508 | } 2509 | ], 2510 | "source": [ 2511 | "#relationship between Shipping_cost_value and Shipping_type \n", 2512 | "\n", 2513 | "\n", 2514 | "\n", 2515 | "\n", 2516 | "sns.boxplot(x='Shipping_type', y='Shipping_cost_value', data=BIbooks)\n", 2517 | "plt.show()\n" 2518 | ] 2519 | }, 2520 | { 2521 | "cell_type": "markdown", 2522 | "id": "7f0b151c", 2523 | "metadata": {}, 2524 | "source": [ 2525 | "# " 2526 | ] 2527 | }, 2528 | { 2529 | "cell_type": "markdown", 2530 | "id": "5752d198", 2531 | "metadata": {}, 2532 | "source": [ 2533 | "74 of the BI books have paid shipping, which comes to a total shipping cost of $2060.41, and the highest shipping cost for a single book is $202.23.\n", 2534 | " " 2535 | ] 2536 | }, 2537 | { 2538 | "cell_type": "markdown", 2539 | "id": "de86fb13", 2540 | "metadata": {}, 2541 | "source": [ 2542 | "# " 2543 | ] 2544 | }, 2545 | { 2546 | "cell_type": "code", 2547 | "execution_count": 57, 2548 | "id": "1e195e41", 2549 | "metadata": { 2550 | "scrolled": false 2551 | }, 2552 | "outputs": [ 2553 | { 2554 | "data": { 2555 | "text/plain": [ 2556 | "Seller_name\n", 2557 | "webuybooks 1740389\n", 2558 | "rarewaves-outlet 909494\n", 2559 | "cmedia_group 763774\n", 2560 | "wonderbooks 587952\n", 2561 | "thecotswoldlibrary 586859\n", 2562 | "awesomebooksusa 387183\n", 2563 | "goodwillrs 211859\n", 2564 | "book_fountain 166979\n", 2565 | "bluevasemarketplace 125322\n", 2566 | "bookmans_exchange 112385\n", 2567 | "Name: Seller_feedback, dtype: int32" 2568 | ] 2569 | }, 2570 | "execution_count": 57, 2571 | "metadata": {}, 2572 | "output_type": "execute_result" 2573 | } 2574 | ], 2575 | "source": [ 2576 | "#find the top 10 Seller_name with the highest Seller_feedback\n", 2577 | "\n", 2578 | "BIbooks.groupby('Seller_name')['Seller_feedback'].max().nlargest(10)" 2579 | ] 2580 | }, 2581 | { 2582 | "cell_type": "code", 2583 | "execution_count": 58, 
2584 | "id": "e94d152e", 2585 | "metadata": {}, 2586 | "outputs": [ 2587 | { 2588 | "data": { 2589 | "image/png": "\n", 2590 | "text/plain": [ 2591 | "
" 2592 | ] 2593 | }, 2594 | "metadata": { 2595 | "needs_background": "light" 2596 | }, 2597 | "output_type": "display_data" 2598 | } 2599 | ], 2600 | "source": [ 2601 | "# Create a scatter plot\n", 2602 | "plt.scatter(BIbooks['Seller_feedback'], BIbooks['Price_sold'])\n", 2603 | "\n", 2604 | "# Add chart labels\n", 2605 | "plt.title('Relationship between Seller Feedback and Price Sold')\n", 2606 | "plt.xlabel('Seller Feedback')\n", 2607 | "plt.ylabel('Price Sold')\n", 2608 | "\n", 2609 | "# Display the chart\n", 2610 | "plt.show()\n" 2611 | ] 2612 | }, 2613 | { 2614 | "cell_type": "markdown", 2615 | "id": "ef5da34b", 2616 | "metadata": {}, 2617 | "source": [ 2618 | "# " 2619 | ] 2620 | }, 2621 | { 2622 | "cell_type": "code", 2623 | "execution_count": 59, 2624 | "id": "af633db5", 2625 | "metadata": {}, 2626 | "outputs": [ 2627 | { 2628 | "name": "stdout", 2629 | "output_type": "stream", 2630 | "text": [ 2631 | "+---------------------------------------------------------------------------------------------+------------+\n", 2632 | "| Title | Price Sold |\n", 2633 | "+---------------------------------------------------------------------------------------------+------------+\n", 2634 | "| Business Education Course Entrepreneurs DVD/Textbooks Venture Academy Christian | 130.28 |\n", 2635 | "| Business Intelligence, Analytics, and Data Science: A Managerial Perspective 4th | 110.0 |\n", 2636 | "| Entrepreneurs Course DVD/Textbooks Venture Academy Christian Business Education | 59.99 |\n", 2637 | "| Business Intelligence : Practices, Technologies, and Management by Irma...TPB | 54.99 |\n", 2638 | "| Business Intelligence, Analytics Data Science A Managerial Perspective Sharda | 49.99 |\n", 2639 | "| 2010 Business Intelligence: A Managerial Approach Book Efraim Turban 2nd Edition | 48.73 |\n", 2640 | "| Intelligent Business, Elementary Skills Book, w. CD-ROM ~ Ch ... 
9781405881418 | 42.84 |\n", 2641 | "| New ListingMixed Intelligent Systems: Developing Models for Project Management and Evaluati | 38.3 |\n", 2642 | "| Learn Microsoft PowerApps: Build customized business applications without writin | 36.63 |\n", 2643 | "| Business Intelligence, Analytics, and Data Science - International Edition | 36.0 |\n", 2644 | "+---------------------------------------------------------------------------------------------+------------+\n" 2645 | ] 2646 | } 2647 | ], 2648 | "source": [ 2649 | "# top 10 most expensive books by title\n", 2650 | "top_books = BIbooks.groupby('Title')['Price_sold'].max().nlargest(10)\n", 2651 | "\n", 2652 | "\n", 2653 | "table = PrettyTable()\n", 2654 | "table.field_names = [\"Title\", \"Price Sold\"]\n", 2655 | "\n", 2656 | "\n", 2657 | "for title, price_sold in zip(top_books.index, top_books.values):\n", 2658 | "    table.add_row([title, price_sold])\n", 2659 | "\n", 2660 | "\n", 2661 | "print(table)\n" 2662 | ] 2663 | }, 2664 | { 2665 | "cell_type": "markdown", 2666 | "id": "8f2ef743", 2667 | "metadata": {}, 2668 | "source": [ 2669 | "# " 2670 | ] 2671 | }, 2672 | { 2673 | "cell_type": "code", 2674 | "execution_count": null, 2675 | "id": "ecf1c8c6", 2676 | "metadata": {}, 2677 | "outputs": [], 2678 | "source": [ 2679 | "# save the dataframe to CSV\n", 2680 | "BIbooks.to_csv(r'C:\\Users\\O---u A\\Downloads\\WEB Scrapping\\Ebay_BusinessIntelligenceBooks.csv', index=False)" 2681 | ] 2682 | }, 2683 | { 2684 | "cell_type": "markdown", 2685 | "id": "ddbacbd7", 2686 | "metadata": {}, 2687 | "source": [ 2688 | "# " 2689 | ] 2690 | } 2691 | ], 2692 | "metadata": { 2693 | "kernelspec": { 2694 | "display_name": "Python 3 (ipykernel)", 2695 | "language": "python", 2696 | "name": "python3" 2697 | }, 2698 | "language_info": { 2699 | "codemirror_mode": { 2700 | "name": "ipython", 2701 | "version": 3 2702 | }, 2703 | "file_extension": ".py", 2704 | "mimetype": "text/x-python", 2705 | "name": "python", 2706 | "nbconvert_exporter": 
"python", 2707 | "pygments_lexer": "ipython3", 2708 | "version": "3.9.12" 2709 | } 2710 | }, 2711 | "nbformat": 4, 2712 | "nbformat_minor": 5 2713 | } 2714 | --------------------------------------------------------------------------------