├── Adidas US Sales Analysis ├── Adidas Sales Analysis with reports.xlsx ├── Dashboard.xlsx └── Original-dataset.xlsx ├── Certificates ├── Certificate Agile Requirements Foundations.pdf ├── Certificate Business Analysis Foundations.pdf ├── Certificate Business Analyst and Project Manager Collaboration.pdf ├── Certificate Business Benefits Realization Foundations.pdf ├── Certificate Excel Working Together with Power Query and Power Pivot.pdf ├── Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf ├── Data analysis with Pandas.pdf ├── ETL and Data Pipelines with Shell, Airflow and Kafka.pdf ├── Google Data Data Analytics.pdf ├── IBM data analysis Python.pdf ├── Intermediate SQL Data Reporting and Analysis.pdf ├── Linkedin Learning Excel Data Analysis.pdf └── Linkedin Learning Python Data Analysis.pdf ├── Data Analyst CV.pdf ├── German cars Analysis ├── German cars presentation Pavel Liaoshka.pdf ├── German cars presentation Pavel Liaoshka.pptx ├── German-cars-Analysis.ipynb └── original dataset.csv ├── README.md └── Real Estate USA data SQL and Power BI analysis ├── Data_cleaning.sql ├── Exploring_data.sql ├── Quered data property sold.csv ├── Quered data year and month.csv ├── Real Estate USA Dashboards.pbix └── Real-Estate USA Dashboards.pdf /Adidas US Sales Analysis/Adidas Sales Analysis with reports.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Adidas Sales Analysis with reports.xlsx -------------------------------------------------------------------------------- /Adidas US Sales Analysis/Dashboard.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Dashboard.xlsx 
-------------------------------------------------------------------------------- /Adidas US Sales Analysis/Original-dataset.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Original-dataset.xlsx -------------------------------------------------------------------------------- /Certificates/Certificate Agile Requirements Foundations.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Agile Requirements Foundations.pdf -------------------------------------------------------------------------------- /Certificates/Certificate Business Analysis Foundations.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Analysis Foundations.pdf -------------------------------------------------------------------------------- /Certificates/Certificate Business Analyst and Project Manager Collaboration.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Analyst and Project Manager Collaboration.pdf -------------------------------------------------------------------------------- /Certificates/Certificate Business Benefits Realization Foundations.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Benefits Realization Foundations.pdf -------------------------------------------------------------------------------- /Certificates/Certificate Excel Working Together with Power Query and Power Pivot.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Excel Working Together with Power Query and Power Pivot.pdf -------------------------------------------------------------------------------- /Certificates/Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf -------------------------------------------------------------------------------- /Certificates/Data analysis with Pandas.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Data analysis with Pandas.pdf -------------------------------------------------------------------------------- /Certificates/ETL and Data Pipelines with Shell, Airflow and Kafka.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/ETL and Data Pipelines with Shell, Airflow and Kafka.pdf -------------------------------------------------------------------------------- /Certificates/Google Data Data 
Analytics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Google Data Data Analytics.pdf -------------------------------------------------------------------------------- /Certificates/IBM data analysis Python.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/IBM data analysis Python.pdf -------------------------------------------------------------------------------- /Certificates/Intermediate SQL Data Reporting and Analysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Intermediate SQL Data Reporting and Analysis.pdf -------------------------------------------------------------------------------- /Certificates/Linkedin Learning Excel Data Analysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Linkedin Learning Excel Data Analysis.pdf -------------------------------------------------------------------------------- /Certificates/Linkedin Learning Python Data Analysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Linkedin Learning Python Data Analysis.pdf -------------------------------------------------------------------------------- /Data Analyst CV.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Data Analyst CV.pdf -------------------------------------------------------------------------------- /German cars Analysis/German cars presentation Pavel Liaoshka.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/German cars Analysis/German cars presentation Pavel Liaoshka.pdf -------------------------------------------------------------------------------- /German cars Analysis/German cars presentation Pavel Liaoshka.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/German cars Analysis/German cars presentation Pavel Liaoshka.pptx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Pavel Liaoshka - Data Analysis portfolio 2 | ## About me 3 | Hello everyone! My name is Pavel, and this is my portfolio.
4 | I love investigating different types of data, discovering insights, and presenting them with beautiful visuals.
5 | I have a background in digital marketing and financial data analysis.
6 | 7 | You can see more information in my [**CV**](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Data%20Analyst%20CV.pdf). 8 | 9 | This repository was created to showcase my analytical and technical skills (Excel, Python, SQL, Power BI, PowerPoint, and others). 10 | ## Contents 11 | * [About me](#about-me) 12 | * [Portfolio Projects](#portfolio-projects) 13 | - [German cars data analysis](#german-cars-data-analysis) 14 | - [Adidas US sales data analysis](#adidas-us-sales-data-analysis) 15 | - [Real estate US SQL and Power BI analysis](#real-estate-us-sql-and-power-bi-analysis) 16 | * [Study projects](#study-projects) 17 | - [Python Data Analysis](#python-data-analysis) 18 | - [Excel Exercises](#excel-exercises) 19 | - [R Exercises](#r-exercises) 20 | - [SQL Exercises](#sql-exercises) 21 | - [ETL pipelines project](#etl-pipelines-project) 22 | * [Certificates](#certificates) 23 | * [Contacts](#contacts) 24 | ## Portfolio Projects 25 | This section contains a list of projects with brief descriptions. 26 | ### German cars data analysis 27 | **Description:** The main goal of this project is to use Python libraries to analyze a dataset scraped from https://www.autoscout24.de. The dataset contains 46405 rows describing cars on sale in Germany with production years from 2011 to 2021. The insights found are visualized and presented using MS PowerPoint.
28 | **Code:** 29 | german-cars-data-analysis.ipynb
30 | **Presentation:** 31 | PDF static version 32 | PowerPoint PPTX dynamic version
33 | **Original dataset:** 34 | german-cars-dataset.csv
35 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization, presentations
36 | **Hard skills:** MS PowerPoint, Python: Pandas, NumPy, Matplotlib, Seaborn.
37 | **Results:** An analysis of a dataset containing information about cars posted on AutoScout24 was conducted. Some insights were found, visualized, and included in the presentation. 38 | ### Adidas US sales data analysis 39 | **Description:** The goal of this project is to use Excel functionality to analyze data about Adidas product sales in the United States. The dataset contains 9652 rows and 14 columns covering the fiscal years 2020 and 2021. The work includes visualizing the data and preparing several types of reports and an interactive dashboard.
40 | **Reports and conclusions:** 41 | adidas-sales-data-analysis.xlsx
42 | **Dashboard:** 43 | adidas-sales-dashboard.xlsx
44 | **Original dataset:** 45 | adidas-us-sales.xlsx
46 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization.
47 | **Hard skills:** Excel, Pivot Tables, Formulas, Functions, Charts, Dashboards, Slicers, Pivot Charts.
48 | **Results:** An analysis of financial data on the sale of Adidas products in the USA for 2020 and 2021 was performed. Reports and a dashboard were created. 49 | ### Real estate US SQL and Power BI analysis 50 | **Description:** The main goals of this project are:
1) To clean the dataset from Kaggle about real estate in the United States using SQL,
2) Explore data in BigQuery,
3) Group and prepare data for visualization in Power BI,
4) Create comprehensive dashboards, including interactive property stats, the price calculator that can count average prices by given parameters, and a dashboard that shows the quantity of sold properties in different states and the sales distribution among the months in the year for effective data exploration.
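The price-calculator idea in goal 4 (an average price for a given set of parameters) can be sketched in plain Python. This is only an illustrative sketch, not the Power BI implementation: the sample rows, the `LISTINGS` name, and the `average_price` helper are hypothetical, and only the column names follow the cleaned table.

```python
from statistics import mean

# Hypothetical sample rows. The column names mirror the cleaned table
# (state, city, bedrooms, price); the values are made up for illustration.
LISTINGS = [
    {"state": "New York", "city": "New York", "bedrooms": 2, "price": 850000},
    {"state": "New York", "city": "New York", "bedrooms": 2, "price": 950000},
    {"state": "New York", "city": "Bronx", "bedrooms": 3, "price": 600000},
    {"state": "New Jersey", "city": "Jersey City", "bedrooms": 2, "price": 500000},
]


def average_price(rows, **filters):
    """Return the mean price of the rows matching every filter, or None if none match."""
    matching = [
        row["price"]
        for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]
    return mean(matching) if matching else None


if __name__ == "__main__":
    # Average price of two-bedroom listings in the state of New York.
    print(average_price(LISTINGS, state="New York", bedrooms=2))
```

Passing the filters as keyword arguments keeps the helper independent of any particular set of columns, similar to choosing slicer values on a dashboard.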
51 | **Data cleaning and exploring SQL:** 52 | data_cleaning.sql 53 | data_exploration.sql
54 | **Power BI dashboards:** 55 | real-estate-dashboards.pbix 56 | real-estate-dashboards.pdf
57 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization
58 | **Hard skills:** BigQuery SQL, Power BI, Dashboards
59 | **Results:** The data was cleaned (which reduced the amount of data by about nine times), explored in BigQuery, imported into Power BI, and transformed, and three dashboards were created. 60 | ## Study Projects 61 | ### Python Data Analysis 62 | **Repository:** 63 | python-data-analysis
64 | **Description:** This repository contains ipynb files with tasks that were completed during the Python Data Analysis course on LinkedIn Learning. The main goal is to improve my data analysis skills in Python.
65 | **Skills:** Python, Pandas, NumPy, Matplotlib, Data Analysis, Data Visualization
66 | **Status:** Completed in 2022 67 | ### Excel Exercises 68 | **Repository:** 69 | excel-data-analysis
70 | **Description:** This repository contains xlsx files with completed tasks from the Excel Data Analysis course on LinkedIn Learning. The main goal is to advance knowledge in spreadsheets and statistical analysis.
71 | **Skills:** Excel, Formulas, Functions, Data Analysis, Statistical Analysis.
72 | **Status:** Completed in 2022 73 | ### R Exercises 74 | **Repository:** 75 | r-data-analysis
76 | **Description:** This repository contains some completed exercises with R practice. The main goal is to gain basic knowledge of the R programming language and its use for data analysis and visualization.
77 | **Skills:** R programming language, Data Analysis, Data Visualization.
78 | **Status:** Completed in 2022 79 | ### SQL Exercises 80 | **Repository:** 81 | sql-queries
82 | **Description:** This repository contains SQL queries that I wrote to complete tasks in various courses. The main goal is to improve SQL knowledge and fluency.
83 | **Skills:** SQL, Queries, data analysis, data cleaning
84 | **Status:** Completed in 2022 85 | ### ETL pipelines project 86 | **Repository:** 87 | etl-pipelines
88 | **Description:** This repository contains screenshots of the final project in the ETL and Data Pipelines with Shell, Airflow, and Kafka course. Main goal: create data pipelines using Apache Airflow, Bash commands, and Kafka.
89 | **Skills:** Bash, Apache Airflow, Kafka
90 | **Status:** Completed in 2022 91 | ## Certificates 92 | * [Google Data Analytics Certificate](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Google%20Data%20Data%20Analytics.pdf) - Coursera, 2022 93 | * [IBM Data Analysis with Python](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/IBM%20data%20analysis%20Python.pdf) - Coursera, 2022 94 | * [Mastering Data Analysis with Pandas](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Data%20analysis%20with%20Pandas.pdf) - Coursera, 2022 95 | * [Python Data Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Linkedin%20Learning%20Python%20Data%20Analysis.pdf) - Linkedin Learning, 2022 96 | * [Excel Data Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Linkedin%20Learning%20Excel%20Data%20Analysis.pdf) - Linkedin Learning, 2022 97 | * [ETL and Data Pipelines with Shell, Airflow and Kafka](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/ETL%20and%20Data%20Pipelines%20with%20Shell%2C%20Airflow%20and%20Kafka.pdf) - Coursera, 2022 98 | * [Intermediate SQL: Data Reporting and Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Intermediate%20SQL%20Data%20Reporting%20and%20Analysis.pdf) - Linkedin Learning, 2022 99 | * [Prepare, Clean, Transform, and Load Data using Power BI](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Coursera%20Prepare%2C%20Clean%2C%20Transform%2C%20and%20Load%20Data%20using%20Power%20BI.pdf) - Coursera, 2022 100 | * [Excel: Working Together with Power Query and Power Pivot](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Excel%20Working%20Together%20with%20Power%20Query%20and%20Power%20Pivot.pdf) - Linkedin Learning, 2022 101 | * 
[Business Benefits Realization Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Benefits%20Realization%20Foundations.pdf) - Linkedin Learning, 2023 102 | * [Business Analyst and Project Manager Collaboration](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Analyst%20and%20Project%20Manager%20Collaboration.pdf) - Linkedin Learning, 2023 103 | * [Business Analysis Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Analysis%20Foundations.pdf) - Linkedin Learning, 2023 104 | * [Agile Requirements Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Agile%20Requirements%20Foundations.pdf) - Linkedin Learning, 2023 105 | 106 | ## Contacts 107 | * Linkedin: https://www.linkedin.com/in/pavelliaoshka 108 | * Email: pavelliaoshka818@gmail.com 109 | * Telegram: @franch8888 110 | -------------------------------------------------------------------------------- /Real Estate USA data SQL and Power BI analysis/Data_cleaning.sql: -------------------------------------------------------------------------------- 1 | /* Data cleaning using SQL 2 | Skills used: CREATE, GROUP BY, ORDER BY, REPLACE, CASE, Remove duplicates, inadequate, and unnecessary data.*/ 3 | 4 | ---------------------------------------------------------------------------------- 5 | 6 | /*First look at the dataset*/ 7 | 8 | SELECT 9 | * 10 | FROM 11 | `myproject8888-357816.real_estate_us.re-us` 12 | LIMIT 13 | 1000 14 | 15 | 16 | 17 | 18 | /* Remove Duplicates and create a new table */ 19 | 20 | CREATE OR REPLACE TABLE myproject8888-357816.real_estate_us.re_us1 21 | AS 22 | SELECT 23 | DISTINCT * 24 | FROM 25 | `myproject8888-357816.real_estate_us.re-us` 26 | 27 | /* The dataset has decreased by almost 9 times */ 28 | 29 | 30 | 31 | /* Change datatypes 
and names in the 'bed', 'bath', 'zip_code' columns*/ 32 | 33 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us1` 34 | AS 35 | SELECT 36 | status, 37 | price, 38 | CAST(bed AS INT64) AS bedrooms, 39 | CAST(bath AS INT64) AS bathrooms, 40 | acre_lot, 41 | full_address, 42 | street, 43 | city, 44 | state, 45 | CAST(zip_code AS STRING) AS zipcode, 46 | house_size, 47 | sold_date 48 | FROM 49 | `myproject8888-357816.real_estate_us.re_us1` 50 | 51 | 52 | 53 | /*Check distinct values in the status column and their counts*/ 54 | SELECT 55 | status, 56 | COUNT(status) 57 | FROM 58 | `myproject8888-357816.real_estate_us.re_us1` 59 | GROUP BY 60 | status 61 | 62 | 63 | 64 | /*Check sold_date column with status 'ready_to_build'*/ 65 | 66 | SELECT 67 | * 68 | FROM 69 | `myproject8888-357816.real_estate_us.re_us1` 70 | WHERE 71 | status = 'ready_to_build' AND 72 | sold_date IS NOT NULL 73 | 74 | /* The status column has two distinct values, the majority of which are "for_sale" (113512). 75 | There is no sold_date for any of the rows with "ready_to_build" (277 rows). Later, we will exclude 'ready_to_build' rows from our analysis, 76 | because they are not actually existing buildings. */ 77 | 78 | 79 | 80 | /*Check bedrooms column (renamed from 'bed' in re_us1)*/ 81 | 82 | SELECT 83 | bedrooms, 84 | COUNT(bedrooms) AS count_bed 85 | FROM 86 | `myproject8888-357816.real_estate_us.re_us1` 87 | GROUP BY 88 | bedrooms 89 | ORDER BY 90 | count_bed DESC 91 | 92 | 93 | 94 | SELECT 95 | * 96 | FROM 97 | `myproject8888-357816.real_estate_us.re_us1` 98 | WHERE 99 | bedrooms > 11 100 | 101 | 102 | 103 | SELECT 104 | * 105 | FROM 106 | `myproject8888-357816.real_estate_us.re_us1` 107 | WHERE 108 | bedrooms IS NULL 109 | 110 | /* Here are some rows with an enormously high quantity of bedrooms; for example, 123 is the maximum value. However, a portion of the rows do not have a null sold_date. 111 | There aren't many rows like this. We will leave it as is. 
17516 with null values in this column */ 112 | 113 | 114 | 115 | /* Check the other columns */ 116 | 117 | SELECT 118 | bathrooms, 119 | COUNT(bathrooms) as count_bath 120 | FROM 121 | `myproject8888-357816.real_estate_us.re_us1` 122 | GROUP BY 123 | bathrooms 124 | ORDER BY 125 | count_bath DESC 126 | 127 | SELECT 128 | * 129 | FROM 130 | `myproject8888-357816.real_estate_us.re_us1` 131 | WHERE bathrooms > 12 132 | 133 | SELECT 134 | * 135 | FROM 136 | `myproject8888-357816.real_estate_us.re_us1` 137 | WHERE 138 | bathrooms IS NULL 139 | 140 | /* 16297 null values in the bathrooms column. Enormously high values (more than 11) in the bathroom column in 210 rows. 141 | There are more bathrooms than bedrooms in these rows. Maybe it's a mistake. But we don't know exactly; it's not the goal of our analysis right now. 142 | And it will not skew the results; we will leave it as it is. */ 143 | 144 | 145 | SELECT 146 | * 147 | FROM 148 | `myproject8888-357816.real_estate_us.re_us1` 149 | WHERE 150 | state IS NULL 151 | 152 | 153 | 154 | SELECT 155 | state, 156 | COUNT(state) AS counts 157 | FROM 158 | `myproject8888-357816.real_estate_us.re_us1` 159 | GROUP BY 160 | state 161 | ORDER BY 162 | counts DESC 163 | 164 | 165 | /* There are no null values in the "state" column. 166 | Virginia (7), Georgia (5), South Carolina, Tennessee, Wyoming, and West Virginia (1) have a low quantity of rows. 167 | We will exclude them from our analysis. */ 168 | 169 | 170 | 171 | SELECT 172 | * 173 | FROM 174 | `myproject8888-357816.real_estate_us.re_us1` 175 | WHERE 176 | sold_date IS NULL 177 | 178 | 179 | /* There are 54092 null values in the "sold_date" column. These rows cannot be used for time-series analysis. 180 | So, let's create two tables: one for analysis by time periods and prediction, and another for the basic exploration. */ 181 | 182 | 183 | 184 | /* Remove the states of Virginia, Georgia, South Carolina, Tennessee, Wyoming, and West Virginia; "ready_to_build' status.  
185 | Drop the status, full_address, and zipcode columns. 186 | I use CREATE OR REPLACE TABLE because DML is not available in the BigQuery sandbox. */ 187 | 188 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us1` 189 | AS 190 | SELECT 191 | state, 192 | city, 193 | street, 194 | price, 195 | bedrooms, 196 | bathrooms, 197 | acre_lot, 198 | house_size, 199 | sold_date 200 | FROM 201 | `myproject8888-357816.real_estate_us.re_us1` 202 | WHERE 203 | status != 'ready_to_build' AND 204 | state != 'Virginia' AND 205 | state != 'Georgia' AND 206 | state != 'South Carolina' AND 207 | state != 'Tennessee' AND 208 | state != 'Wyoming' AND 209 | state != 'West Virginia' 210 | 211 | 212 | /* Inspect city column */ 213 | SELECT 214 | city, 215 | COUNT(city) AS counts, 216 | state 217 | FROM `myproject8888-357816.real_estate_us.re_us1` 218 | GROUP BY 219 | city, 220 | state 221 | ORDER BY 222 | counts DESC 223 | 224 | 225 | SELECT 226 | city, 227 | COUNT(city) AS counts, 228 | state 229 | FROM `myproject8888-357816.real_estate_us.re_us1` 230 | WHERE city LIKE 'N%' AND state = 'New York' 231 | GROUP BY city, state 232 | ORDER BY counts DESC 233 | 234 | 235 | 236 | /* There are different spellings of New York (New York City, Ny, Nyc). 
Let's fix it */ 237 | 238 | SELECT 239 | city, 240 | COUNT(city) AS counts, 241 | REPLACE(REPLACE(REPLACE(city, 'New York City', 'New York'),'Nyc', 'New York'), 'Ny', 'New York') AS ny, 242 | state 243 | FROM 244 | `myproject8888-357816.real_estate_us.re_us1` 245 | WHERE 246 | city LIKE 'N%' AND state = 'New York' 247 | GROUP BY 248 | city, 249 | state 250 | ORDER BY 251 | counts DESC 252 | 253 | 254 | CREATE OR REPLACE TABLE 255 | `myproject8888-357816.real_estate_us.re_us1` 256 | AS 257 | SELECT 258 | state, 259 | REPLACE(REPLACE(REPLACE(city, 'New York City', 'New York'),'Nyc', 'New York'), 'Ny', 'New York') AS city, 260 | street, 261 | price, 262 | bedrooms, 263 | bathrooms, 264 | acre_lot, 265 | house_size, 266 | sold_date 267 | FROM 268 | `myproject8888-357816.real_estate_us.re_us1` 269 | 270 | 271 | 272 | /* Fix 23 null values in the "city" column and extract the year from 'sold_date'. 273 | Check null and suspiciously low values (51 rows) in the 'price' column and remove them. The checks run on re_us1, because re_us2 is only created below. */ 274 | 275 | SELECT 276 | * 277 | FROM 278 | `myproject8888-357816.real_estate_us.re_us1` 279 | WHERE 280 | city IS NULL 281 | 282 | 283 | SELECT 284 | * 285 | FROM 286 | `myproject8888-357816.real_estate_us.re_us1` 287 | WHERE 288 | price IS NULL 289 | 290 | 291 | 292 | SELECT 293 | * 294 | FROM 295 | `myproject8888-357816.real_estate_us.re_us1` 296 | WHERE 297 | price < 5000 298 | 299 | 300 | 301 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us2` 302 | AS 303 | SELECT 304 | state, 305 | CASE 306 | WHEN street IN ('163 Union and Mt Wash Ea','155-A La Vallee Nb','123 Catherines Hope Eb', '21 N Grapetree Eb', '42 43 Shoys Ea', '8-B Teagues Bay Eb', '242 Union and Mt Wash Ea', '96 Hard Labor Pr') AND city IS NULL 307 | THEN 'Christiansted' 308 | WHEN street IN ('4 Prosperity Nb', '20 River Pr', '17 Prosperity Nb', '94V I Corp Lands Pr', '14 Diamond Pr', '192 La Vallee Nb') AND city IS NULL THEN 'Frederiksted' 309 | WHEN street IN ('240 St John Qu') AND 
city IS NULL THEN 'Saint John' 310 | WHEN street IN ('230 S Stevens Ave') AND city IS NULL THEN 'South Amboy' 311 | WHEN street IN ('0 Block 32 Quinton Alloway Quinton Rd Lot 11 01') AND city IS NULL THEN 'Quinton' 312 | WHEN street = '641 State Route 82' AND city IS NULL THEN 'Hopewell Junction' 313 | WHEN street = '32 Devereux Dr' AND city IS NULL THEN 'Manchester Township' 314 | WHEN street = '9-11 Putnam Park Rd' AND city IS NULL THEN 'Bethel' 315 | WHEN street = '68 Avondale St' AND city IS NULL THEN 'Valley Stream' 316 | WHEN street = '824-26 Berckman St' AND city IS NULL THEN 'Plainfield' 317 | WHEN street = '689 Luis M Marin Blvd Unit 1009' AND city IS NULL THEN 'Jersey City' 318 | ELSE city 319 | END AS city, 320 | street, 321 | price, 322 | bedrooms, 323 | bathrooms, 324 | acre_lot, 325 | house_size, 326 | sold_date, 327 | EXTRACT(YEAR FROM sold_date) AS year 328 | FROM 329 | `myproject8888-357816.real_estate_us.re_us1` 330 | WHERE 331 | price > 5000 332 | 333 | 334 | /* Create house_size_m2 and hectare_lot columns. 335 | Replace incorrect highest value, update new highest value according to realtor.com data. 
336 | Correct the record for the property at 421 W 250th St (done with CASE expressions in a rebuilt table, because DML statements such as UPDATE are not available in the BigQuery sandbox) */ 337 | 338 | 339 | SELECT 340 | * 341 | FROM 342 | `myproject8888-357816.real_estate_us.re_us2` 343 | ORDER BY 344 | price DESC 345 | 346 | 347 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us2` 348 | AS 349 | SELECT 350 | state, 351 | CASE 352 | WHEN street = '421 W 250th St' AND city = 'New York' THEN 'Bronx' 353 | ELSE city 354 | END AS city, 355 | street, 356 | CASE 357 | WHEN street = '952 E 223 St Units 4858 & 66' AND price = 875000000 THEN 850000 358 | WHEN street = '432 Park Ave Unit Penthouse' AND price = 169000000 THEN 180000000 359 | WHEN street = '421 W 250th St' AND price = 120000000 THEN 8750000 360 | ELSE price 361 | END AS price, 362 | CASE 363 | WHEN street = '421 W 250th St' AND bedrooms = 123 THEN 8 364 | ELSE bedrooms 365 | END AS bedrooms, 366 | CASE 367 | WHEN street = '421 W 250th St' AND bathrooms = 123 THEN 10 368 | ELSE bathrooms 369 | END AS bathrooms, 370 | acre_lot, 371 | acre_lot*0.404686 AS hectare_lot, 372 | CASE 373 | WHEN street = '421 W 250th St' AND house_size IS NULL THEN 11135 374 | ELSE house_size 375 | END AS house_size, 376 | house_size/10.7639 AS house_size_m2, 377 | CASE 378 | WHEN street = '421 W 250th St' AND sold_date = '2012-06-29' THEN NULL 379 | ELSE sold_date 380 | END AS sold_date 381 | FROM 382 | `myproject8888-357816.real_estate_us.re_us2` 383 | ORDER BY 384 | price DESC 385 | 386 | 387 | /* Here are some more duplicates with slightly different street column values but the same other columns. 
We need to solve this */ 388 | 389 | SELECT 390 | DISTINCT 391 | state, 392 | city, 393 | price, 394 | bedrooms, 395 | bathrooms, 396 | acre_lot, 397 | house_size, 398 | sold_date 399 | FROM 400 | `myproject8888-357816.real_estate_us.re_us2` 401 | ORDER BY 402 | price DESC 403 | 404 | /* There are 111016 Distinct rows excluding street column */ 405 | 406 | /* Check the duplicate rows to decide how to treat them. */ 407 | 408 | SELECT 409 | * 410 | FROM 411 | `myproject8888-357816.real_estate_us.re_us4` a 412 | JOIN (SELECT state, 413 | city, 414 | price, 415 | IFNULL(bedrooms, 0) AS bedrooms, 416 | IFNULL(bathrooms, 0) AS bathrooms, 417 | IFNULL(acre_lot, 0) AS acre_lot, 418 | IFNULL(house_size, 0) AS house_size, 419 | COUNT(*) 420 | FROM `myproject8888-357816.real_estate_us.re_us4` 421 | GROUP BY state, 422 | city, 423 | price, 424 | bedrooms, 425 | bathrooms, 426 | acre_lot, 427 | house_size 428 | HAVING COUNT(*) > 1) b 429 | ON a.state = b.state 430 | AND a.city = b.city 431 | AND a.price = b.price 432 | AND a.bedrooms = b.bedrooms 433 | AND a.bathrooms = b.bathrooms 434 | AND a.acre_lot = b.acre_lot 435 | AND a.house_size = b.house_size 436 | ORDER BY 437 | a.price 438 | 439 | /* With a few exceptions, we can tell from the web information about duplicate row addresses that the majority of them are the same property. 440 | We can remove this duplicates 441 | But it's necessary to check rows where bedrooms, bathrooms, acre_lot, house_size are nulls to see if they are the same.*/  442 | /* Create a table with the changed datatypes and replaced null values. 
*/ 443 | 444 | CREATE OR REPLACE TABLE 445 | `myproject8888-357816.real_estate_us.re_us5` 446 | AS 447 | SELECT 448 | state, 449 | city, 450 | street, 451 | CAST(price AS INT64) AS price, 452 | IFNULL(bedrooms, 0) AS bedrooms, 453 | IFNULL(bathrooms, 0) AS bathrooms, 454 | IFNULL(CAST(acre_lot AS STRING), '0') AS acre_lot, 455 | IFNULL(CAST(house_size AS STRING), '0') AS house_size, 456 | IFNULL(CAST(sold_date AS STRING), '0') AS sold_date, 457 | FROM 458 | `myproject8888-357816.real_estate_us.re_us4` 459 | 460 | 461 | WITH cte AS ( 462 | SELECT *, 463 | row_number() OVER(PARTITION BY state, 464 | city, 465 | price, 466 | bedrooms, 467 | bathrooms, 468 | acre_lot, 469 | house_size, 470 | sold_date ORDER BY price DESC) AS rn 471 | FROM `myproject8888-357816.real_estate_us.re_us5` 472 | ) 473 | Select * from cte WHERE rn > 1 AND bedrooms = 0 AND bathrooms = 0 AND acre_lot = '0' AND house_size = '0' 474 | 475 | /* 33 rows with null in bedrooms, bathrooms, acre_lot, house_size columns at the same time. 476 | Part of them are different plots of land, and another part are the duplicate properties. We can remove duplicates here. */ 477 | 478 | 479 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_noduplicates` 480 | AS 481 | WITH CTE AS ( 482 | SELECT *, 483 | row_number() OVER(PARTITION BY state, 484 | city, 485 | price, 486 | bedrooms, 487 | bathrooms, 488 | acre_lot, 489 | house_size, 490 | sold_date ORDER BY price DESC) AS rn 491 | FROM 492 | `myproject8888-357816.real_estate_us.re_us5` 493 | ) 494 | SELECT 495 | * 496 | FROM 497 | CTE 498 | WHERE 499 | rn = 1 500 | ORDER BY price DESC 501 | 502 | 503 | /* Also, we need to separate plots of land from property for our analysis. 
*/

-- Add a sequential id so that individual rows can be referenced.
CREATE OR REPLACE TABLE
  `myproject8888-357816.real_estate_us.re_us_noduplicates`
AS
SELECT
  ROW_NUMBER() OVER(ORDER BY price DESC) AS id,
  state,
  city,
  street,
  price,
  bedrooms,
  bathrooms,
  acre_lot,
  house_size,
  sold_date
FROM
  `myproject8888-357816.real_estate_us.re_us_noduplicates`;


-- Remove rows that have a lot size but no bedrooms, bathrooms or house size (plots of land).
CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_noduplicates`
AS
SELECT
  *
FROM
  `myproject8888-357816.real_estate_us.re_us_noduplicates`
WHERE
  id NOT IN (SELECT id FROM `myproject8888-357816.real_estate_us.re_us_noduplicates` WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot != '0' AND house_size = '0')
ORDER BY
  price DESC;

/* Check rows with nulls in the bedrooms, bathrooms, acre_lot and house_size columns */

WITH cte AS (
  SELECT *,
    ROW_NUMBER() OVER(PARTITION BY state,
                                   city,
                                   price,
                                   bedrooms,
                                   bathrooms,
                                   acre_lot,
                                   house_size,
                                   sold_date ORDER BY price DESC) AS rn
  FROM `myproject8888-357816.real_estate_us.re_us5`
)
SELECT * FROM cte WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot = '0' AND house_size = '0'
ORDER BY price DESC;

/* There are 552 such rows. Most of them are properties. This information was probably just skipped during data entry.
Leave these rows in our property data.
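   If these rows need to stay identifiable later, they could be flagged instead of being
   tracked by hand. A sketch only: the missing_details column is a hypothetical name, and
   the check relies on the zero placeholders stored in re_us5. */

SELECT
  *,
  -- TRUE when none of the four descriptive fields was filled in
  (bedrooms = 0 AND bathrooms = 0 AND acre_lot = '0' AND house_size = '0') AS missing_details
FROM
  `myproject8888-357816.real_estate_us.re_us5`;

/* The placeholders are turned back into NULLs in the next step.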
*/

/* Change datatypes, add columns */

-- acre_lot, house_size and sold_date are stored as strings in re_us5,
-- so the '0' placeholder is compared as a string before casting.
SELECT
  state,
  city,
  street,
  price,
  bedrooms,
  bathrooms,
  CAST(CASE WHEN acre_lot = '0'
         THEN NULL
         ELSE acre_lot
       END AS FLOAT64) AS acre_lot,
  CAST(CASE WHEN house_size = '0'
         THEN NULL
         ELSE house_size
       END AS FLOAT64) AS house_size,
  CAST(CASE WHEN sold_date = '0'
         THEN NULL
         ELSE sold_date
       END AS DATE) AS sold_date
FROM
  `myproject8888-357816.real_estate_us.re_us5`;


CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_property`
AS
SELECT
  state,
  city,
  street,
  price,
  bedrooms,
  bathrooms,
  acre_lot,
  acre_lot*0.404686 AS hectare_lot,       -- acres to hectares
  house_size,
  house_size/10.7639 AS house_size_m2,    -- square feet to square metres
  sold_date,
  EXTRACT(YEAR FROM sold_date) AS year
FROM
  `myproject8888-357816.real_estate_us.re_us_property`;


/* Create a second table with only non-null values in the 'sold_date' column */

CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_sold`
AS
SELECT
  *
FROM
  `myproject8888-357816.real_estate_us.re_us2`
WHERE
  sold_date IS NOT NULL;


/* Create a table with plots of land.
But we need to explore this data further to be sure that the table really contains information about plots.
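   One way to start that exploration is a per-state summary of the candidate rows.
   This is a sketch under the same placeholder assumptions as re_us5; the output names
   (num_candidate_plots, min_acres, max_acres, avg_price_per_acre) are hypothetical. */

SELECT
  state,
  COUNT(*) AS num_candidate_plots,
  MIN(SAFE_CAST(acre_lot AS FLOAT64)) AS min_acres,
  MAX(SAFE_CAST(acre_lot AS FLOAT64)) AS max_acres,
  -- SAFE_DIVIDE returns NULL instead of failing when the lot size is zero
  AVG(SAFE_DIVIDE(price, SAFE_CAST(acre_lot AS FLOAT64))) AS avg_price_per_acre
FROM
  `myproject8888-357816.real_estate_us.re_us5`
WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot != '0' AND house_size = '0'
GROUP BY
  state
ORDER BY
  num_candidate_plots DESC;

/* The table of candidate plots is created below.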
*/


CREATE OR REPLACE TABLE
  `myproject8888-357816.real_estate_us.re_us_plots`
AS
SELECT
  state,
  city,
  street,
  price,
  CAST(acre_lot AS FLOAT64) AS acre_lot,
  sold_date
FROM `myproject8888-357816.real_estate_us.re_us5`
WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot != '0' AND house_size = '0'
ORDER BY price DESC;

--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Exploring_data.sql:
--------------------------------------------------------------------------------
/* Exploring data about real estate in USA in SQL
Skills used:*/

------------------------------------------------------------------------


SELECT
  year,
  COUNT(year) AS property_sold
FROM
  `myproject8888-357816.real_estate_us.re_us2`
GROUP BY
  year
ORDER BY
  property_sold DESC;

/* 2023 - highest, 1901 - lowest */


/* Explore state */

SELECT
  state,
  COUNT(state) AS num_of_property,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us4`
GROUP BY
  state
ORDER BY
  num_of_property DESC;

SELECT
  year,
  state,
  COUNT(state) AS num_of_property,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE
  year IS NOT NULL
GROUP BY
  state, year
ORDER BY
  num_of_property DESC;



/* Explore city */

SELECT
  city,
  COUNT(city) AS num_of_property,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us4`
GROUP BY
  city
ORDER BY
  num_of_property DESC;


/* Explore bathrooms */


SELECT
  state,
  bathrooms,
  COUNT(bathrooms) AS count_bath,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
GROUP BY
  state, bathrooms
ORDER BY
  count_bath DESC, state;


SELECT
  bathrooms,
  COUNT(bathrooms) AS count_bath,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
GROUP BY
  bathrooms
ORDER BY
  count_bath DESC;



/* Explore bedrooms */

SELECT
  state,
  bedrooms,
  COUNT(bedrooms) AS count_bed,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
GROUP BY
  bedrooms, state
ORDER BY
  count_bed DESC;


SELECT
  bedrooms,
  COUNT(bedrooms) AS count_bed,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
GROUP BY
  bedrooms
ORDER BY
  count_bed DESC;


/* Adding "year" column. */

CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_property`
AS
SELECT
  *,
  EXTRACT(YEAR FROM sold_date) AS year
FROM `myproject8888-357816.real_estate_us.re_us_property`;


/* Query data for Power BI exploration by year. */
SELECT
  state,
  city,
  year,
  bedrooms,
  bathrooms,
  COUNT(state) AS num_of_property,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE year IS NOT NULL
GROUP BY
  state, city, year, bedrooms, bathrooms
ORDER BY
  num_of_property DESC;



/* Query data for Power BI exploration by year and month.
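   Month names stored as text sort alphabetically unless a sort column is supplied, so a
   numeric month next to the name can help in the report. A sketch only: month_num is a
   hypothetical column that the original export does not contain. */

SELECT
  year,
  EXTRACT(MONTH FROM sold_date) AS month_num,   -- 1 to 12, usable as a sort key
  FORMAT_DATE('%B', sold_date) AS month,
  COUNT(*) AS num_of_property
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE year IS NOT NULL
GROUP BY
  year, month_num, month
ORDER BY
  year, month_num;

/* The full query used for the export follows.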
*/
SELECT
  state,
  city,
  year,
  FORMAT_DATE('%B', sold_date) AS month,
  bedrooms,
  bathrooms,
  COUNT(state) AS num_of_property,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE year IS NOT NULL
GROUP BY
  state, city, year, month, bedrooms, bathrooms
ORDER BY
  num_of_property DESC;


/* Explore data with null sold date */
SELECT
  state,
  city,
  bedrooms,
  bathrooms,
  COUNT(state) AS num_of_property,
  SUM(price) AS market_size,
  AVG(price) AS avg_price,
  MIN(price) AS min_price,
  MAX(price) AS max_price,
  AVG(house_size_m2) AS avg_size,
  MIN(house_size_m2) AS min_size,
  MAX(house_size_m2) AS max_size,
  AVG(hectare_lot) AS avg_lot,
  MIN(hectare_lot) AS min_lot,
  MAX(hectare_lot) AS max_lot
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE year IS NULL
GROUP BY
  state, city, bedrooms, bathrooms
ORDER BY
  num_of_property DESC;



/* Explore from lowest price */
SELECT
  *
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
WHERE year IS NULL
ORDER BY
  price ASC;

/* Some of the properties in the data are off-market right now, and some are still on sale.
This is not very useful for analysis: the data mixes properties that are currently on sale with properties already sold at an unknown time.
We could separate the two groups only by checking them manually.
There is no need to do such big work.
We can additionally try to visualize the whole dataset in Power BI; maybe it will show something. */

--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Real Estate USA Dashboards.pbix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Real Estate USA data SQL and Power BI analysis/Real Estate USA Dashboards.pbix
--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Real-Estate USA Dashboards.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Real Estate USA data SQL and Power BI analysis/Real-Estate USA Dashboards.pdf
--------------------------------------------------------------------------------