├── Adidas US Sales Analysis
│   ├── Adidas Sales Analysis with reports.xlsx
│   ├── Dashboard.xlsx
│   └── Original-dataset.xlsx
├── Certificates
│   ├── Certificate Agile Requirements Foundations.pdf
│   ├── Certificate Business Analysis Foundations.pdf
│   ├── Certificate Business Analyst and Project Manager Collaboration.pdf
│   ├── Certificate Business Benefits Realization Foundations.pdf
│   ├── Certificate Excel Working Together with Power Query and Power Pivot.pdf
│   ├── Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf
│   ├── Data analysis with Pandas.pdf
│   ├── ETL and Data Pipelines with Shell, Airflow and Kafka.pdf
│   ├── Google Data Data Analytics.pdf
│   ├── IBM data analysis Python.pdf
│   ├── Intermediate SQL Data Reporting and Analysis.pdf
│   ├── Linkedin Learning Excel Data Analysis.pdf
│   └── Linkedin Learning Python Data Analysis.pdf
├── Data Analyst CV.pdf
├── German cars Analysis
│   ├── German cars presentation Pavel Liaoshka.pdf
│   ├── German cars presentation Pavel Liaoshka.pptx
│   ├── German-cars-Analysis.ipynb
│   └── original dataset.csv
├── README.md
└── Real Estate USA data SQL and Power BI analysis
    ├── Data_cleaning.sql
    ├── Exploring_data.sql
    ├── Quered data property sold.csv
    ├── Quered data year and month.csv
    ├── Real Estate USA Dashboards.pbix
    └── Real-Estate USA Dashboards.pdf
/Adidas US Sales Analysis/Adidas Sales Analysis with reports.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Adidas Sales Analysis with reports.xlsx
--------------------------------------------------------------------------------
/Adidas US Sales Analysis/Dashboard.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Dashboard.xlsx
--------------------------------------------------------------------------------
/Adidas US Sales Analysis/Original-dataset.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Adidas US Sales Analysis/Original-dataset.xlsx
--------------------------------------------------------------------------------
/Certificates/Certificate Agile Requirements Foundations.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Agile Requirements Foundations.pdf
--------------------------------------------------------------------------------
/Certificates/Certificate Business Analysis Foundations.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Analysis Foundations.pdf
--------------------------------------------------------------------------------
/Certificates/Certificate Business Analyst and Project Manager Collaboration.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Analyst and Project Manager Collaboration.pdf
--------------------------------------------------------------------------------
/Certificates/Certificate Business Benefits Realization Foundations.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Business Benefits Realization Foundations.pdf
--------------------------------------------------------------------------------
/Certificates/Certificate Excel Working Together with Power Query and Power Pivot.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Certificate Excel Working Together with Power Query and Power Pivot.pdf
--------------------------------------------------------------------------------
/Certificates/Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Coursera Prepare, Clean, Transform, and Load Data using Power BI.pdf
--------------------------------------------------------------------------------
/Certificates/Data analysis with Pandas.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Data analysis with Pandas.pdf
--------------------------------------------------------------------------------
/Certificates/ETL and Data Pipelines with Shell, Airflow and Kafka.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/ETL and Data Pipelines with Shell, Airflow and Kafka.pdf
--------------------------------------------------------------------------------
/Certificates/Google Data Data Analytics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Google Data Data Analytics.pdf
--------------------------------------------------------------------------------
/Certificates/IBM data analysis Python.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/IBM data analysis Python.pdf
--------------------------------------------------------------------------------
/Certificates/Intermediate SQL Data Reporting and Analysis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Intermediate SQL Data Reporting and Analysis.pdf
--------------------------------------------------------------------------------
/Certificates/Linkedin Learning Excel Data Analysis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Linkedin Learning Excel Data Analysis.pdf
--------------------------------------------------------------------------------
/Certificates/Linkedin Learning Python Data Analysis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Certificates/Linkedin Learning Python Data Analysis.pdf
--------------------------------------------------------------------------------
/Data Analyst CV.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Data Analyst CV.pdf
--------------------------------------------------------------------------------
/German cars Analysis/German cars presentation Pavel Liaoshka.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/German cars Analysis/German cars presentation Pavel Liaoshka.pdf
--------------------------------------------------------------------------------
/German cars Analysis/German cars presentation Pavel Liaoshka.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/German cars Analysis/German cars presentation Pavel Liaoshka.pptx
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Pavel Liaoshka - Data Analysis portfolio
2 | ## About me
3 | Hello everyone! My name is Pavel, and this is my portfolio.
4 | I love investigating different types of data, discovering insights, and presenting them with beautiful visuals.
5 | I have a background in digital marketing and financial data analysis.
6 |
7 | You can see more information in my [**CV**](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Data%20Analyst%20CV.pdf).
8 |
9 | This repository was created to showcase my analytical and technical skills (Excel, Python, SQL, Power BI, PowerPoint, and others).
10 | ## Contents
11 | * [About me](#about-me)
12 | * [Portfolio Projects](#portfolio-projects)
13 |   - [German cars data analysis](#german-cars-data-analysis)
14 |   - [Adidas US sales data analysis](#adidas-us-sales-data-analysis)
15 |   - [Real estate US SQL and Power BI analysis](#real-estate-us-sql-and-power-bi-analysis)
16 | * [Study projects](#study-projects)
17 |   - [Python Data Analysis](#python-data-analysis)
18 |   - [Excel Exercises](#excel-exercises)
19 |   - [R Exercises](#r-exercises)
20 |   - [SQL Exercises](#sql-exercises)
21 |   - [ETL pipelines project](#etl-pipelines-project)
22 | * [Certificates](#certificates)
23 | * [Contacts](#contacts)
24 | ## Portfolio Projects
25 | This section contains a list of projects with brief descriptions.
26 | ### German cars data analysis
27 | **Description:** The main goal of this project is to analyze a dataset scraped from https://www.autoscout24.de using different Python libraries. The dataset contains 46405 rows about cars on sale in Germany with production years from 2011 to 2021. The project finds interesting insights, visualizes them, and presents them in MS PowerPoint.
28 | **Code:**
29 | german-cars-data-analysis.ipynb
30 | **Presentation:**
31 | PDF static version
32 | PowerPoint PPTX dynamic version
33 | **Original dataset:**
34 | german-cars-dataset.csv
35 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization, presentations
36 | **Hard skills:** MS PowerPoint, Python: Pandas, NumPy, Matplotlib, Seaborn.
37 | **Results:** An analysis of a dataset containing information about cars posted on AutoScout24 was conducted. Some insights were found, visualized, and included in the presentation.
38 | ### Adidas US sales data analysis
39 | **Description:** The goal of this project is to use Excel functionality to analyze data about Adidas product sales in the United States. The dataset contains 9652 rows and 14 columns for the fiscal years 2020 and 2021. The project visualizes the data and prepares different types of reports and an interactive dashboard.
40 | **Reports and conclusions:**
41 | adidas-sales-data-analysis.xlsx
42 | **Dashboard:**
43 | adidas-sales-dashboard.xlsx
44 | **Original dataset:**
45 | adidas-us-sales.xlsx
46 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization.
47 | **Hard skills:** Excel, Pivot Tables, Formulas, Functions, Charts, Dashboards, Slicers, Pivot Charts.
48 | **Results:** An analysis of financial data on the sale of Adidas products in the USA for 2020 and 2021 was performed. Reports and a dashboard were created.
49 | ### Real estate US SQL and Power BI analysis
50 | **Description:** The main goals of this project are:
1) Clean the dataset from Kaggle about real estate in the United States using SQL,
2) Explore the data in BigQuery,
3) Group and prepare the data for visualization in Power BI,
4) Create comprehensive dashboards for effective data exploration: interactive property stats, a price calculator that computes average prices for given parameters, and a dashboard that shows the quantity of sold properties in different states and the distribution of sales across the months of the year.
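
The price calculator from step 4 can be pictured as a filtered average. The following is a minimal Python sketch of that idea, not the Power BI implementation: the sample rows, the `average_price` helper, and its keyword filters are hypothetical and only borrow column names from the cleaned table (`state`, `city`, `bedrooms`, `price`).

```python
from statistics import mean

# Hypothetical sample rows; the values are illustrative, not taken from the dataset.
LISTINGS = [
    {"state": "New York", "city": "New York", "bedrooms": 2, "price": 900_000},
    {"state": "New York", "city": "New York", "bedrooms": 2, "price": 1_100_000},
    {"state": "New York", "city": "Bronx", "bedrooms": 3, "price": 600_000},
    {"state": "New Jersey", "city": "Jersey City", "bedrooms": 2, "price": 500_000},
]


def average_price(rows, **filters):
    """Return the mean price of rows matching every filter, or None if none match."""
    matching = [
        row["price"]
        for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]
    return mean(matching) if matching else None


if __name__ == "__main__":
    # Average price of two-bedroom listings in the state of New York.
    print(average_price(LISTINGS, state="New York", bedrooms=2))
    # No listings match, so the helper returns None instead of raising.
    print(average_price(LISTINGS, state="Texas"))
```

Passing the filters as keyword arguments keeps the helper open to any combination of parameters, which is roughly how slicers narrow a dashboard measure.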
51 | **Data cleaning and exploring SQL:**
52 | data_cleaning.sql
53 | data_exploration.sql
54 | **Power BI dashboards:**
55 | real-estate-dashboards.pbix
56 | real-estate-dashboards.pdf
57 | **Skills:** analytical thinking, data cleaning, data analysis, data visualization
58 | **Hard skills:** BigQuery SQL, Power BI, Dashboards
59 | **Results:** Data was cleaned (which reduced the amount of data by about nine times), explored in BigQuery, imported into Power BI, and transformed, and 3 dashboards were created.
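
The size reduction comes from removing exact duplicate rows with `SELECT DISTINCT *`. The sketch below reproduces that step on a tiny scale, assuming an in-memory SQLite database as a stand-in for BigQuery. The `listings` table, its three columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical listings: three distinct rows, each present three times.
UNIQUE_ROWS = [
    ("New York", "New York", 900000),
    ("New Jersey", "Jersey City", 500000),
    ("Connecticut", "Bethel", 350000),
]
RAW_ROWS = UNIQUE_ROWS * 3


def deduplicate(rows):
    """Load rows into SQLite, drop exact duplicates, return (before, after) counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (state TEXT, city TEXT, price INTEGER)")
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?)", rows)
    # Same idea as CREATE OR REPLACE TABLE ... AS SELECT DISTINCT * in the cleaning script.
    conn.execute("CREATE TABLE listings_clean AS SELECT DISTINCT * FROM listings")
    before = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
    after = conn.execute("SELECT COUNT(*) FROM listings_clean").fetchone()[0]
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = deduplicate(RAW_ROWS)
    print(f"{before} rows before, {after} rows after")
```

Comparing the two counts is a quick way to confirm how much of a table consists of exact duplicates before any further cleaning.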
60 | ## Study Projects
61 | ### Python Data Analysis
62 | **Repository:**
63 | python-data-analysis
64 | **Description:** This repository contains ipynb files with tasks that were completed during the Python Data Analysis course on LinkedIn Learning. The main goal is to improve Python's analytical abilities.
65 | **Skills:** Python, Pandas, NumPy, Matplotlib, Data Analysis, Data Visualization
66 | **Status:** Completed in 2022
67 | ### Excel Exercises
68 | **Repository:**
69 | excel-data-analysis
70 | **Description:** This repository contains xlsx files with completed tasks from the Excel Data Analysis course on LinkedIn Learning. The main goal is to advance knowledge in spreadsheets and statistical analysis.
71 | **Skills:** Excel, Formulas, Functions, Data Analysis, Statistical Analysis.
72 | **Status:** Completed in 2022
73 | ### R Exercises
74 | **Repository:**
75 | r-data-analysis
76 | **Description:** This repository contains some completed exercises with R practice. The main goal is to gain basic knowledge of the R programming language and its use for data analysis and visualization.
77 | **Skills:** R programming language, Data Analysis, Data Visualization.
78 | **Status:** Completed in 2022
79 | ### SQL Exercises
80 | **Repository:**
81 | sql-queries
82 | **Description:** This repository contains SQL queries that I wrote to accomplish different tasks during different courses. The main goal is to improve SQL knowledge and fluency.
83 | **Skills:** SQL, Queries, data analysis, data cleaning
84 | **Status:** Completed in 2022
85 | ### ETL pipelines project
86 | **Repository:**
87 | etl-pipelines
88 | **Description:** This repository contains screenshots of the final project in the ETL and Data Pipelines with Shell, Airflow, and Kafka course. Main goal: create data pipelines using Apache Airflow, Bash commands, and Kafka.
89 | **Skills:** Bash, Apache Airflow, Kafka
90 | **Status:** Completed in 2022
91 | ## Certificates
92 | * [Google Data Analytics Certificate](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Google%20Data%20Data%20Analytics.pdf) - Coursera, 2022
93 | * [IBM Data Analysis with Python](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/IBM%20data%20analysis%20Python.pdf) - Coursera, 2022
94 | * [Mastering Data Analysis with Pandas](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Data%20analysis%20with%20Pandas.pdf) - Coursera, 2022
95 | * [Python Data Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Linkedin%20Learning%20Python%20Data%20Analysis.pdf) - Linkedin Learning, 2022
96 | * [Excel Data Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Linkedin%20Learning%20Excel%20Data%20Analysis.pdf) - Linkedin Learning, 2022
97 | * [ETL and Data Pipelines with Shell, Airflow and Kafka](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/ETL%20and%20Data%20Pipelines%20with%20Shell%2C%20Airflow%20and%20Kafka.pdf) - Coursera, 2022
98 | * [Intermediate SQL: Data Reporting and Analysis](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Intermediate%20SQL%20Data%20Reporting%20and%20Analysis.pdf) - Linkedin Learning, 2022
99 | * [Prepare, Clean, Transform, and Load Data using Power BI](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Coursera%20Prepare%2C%20Clean%2C%20Transform%2C%20and%20Load%20Data%20using%20Power%20BI.pdf) - Coursera, 2022
100 | * [Excel: Working Together with Power Query and Power Pivot](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Excel%20Working%20Together%20with%20Power%20Query%20and%20Power%20Pivot.pdf) - Linkedin Learning, 2022
101 | * [Business Benefits Realization Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Benefits%20Realization%20Foundations.pdf) - Linkedin Learning, 2023
102 | * [Business Analyst and Project Manager Collaboration](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Analyst%20and%20Project%20Manager%20Collaboration.pdf) - Linkedin Learning, 2023
103 | * [Business Analysis Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Business%20Analysis%20Foundations.pdf) - Linkedin Learning, 2023
104 | * [Agile Requirements Foundations](https://github.com/paulo81818/Data-Business-Analysis-Portfolio/blob/main/Certificates/Certificate%20Agile%20Requirements%20Foundations.pdf) - Linkedin Learning, 2023
105 |
106 | ## Contacts
107 | * Linkedin: https://www.linkedin.com/in/pavelliaoshka
108 | * Email: pavelliaoshka818@gmail.com
109 | * Telegram: @franch8888
110 |
--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Data_cleaning.sql:
--------------------------------------------------------------------------------
1 | /* Data cleaning using SQL
2 | Skills used: CREATE, GROUP BY, ORDER BY, REPLACE, CASE, removing duplicate, invalid, and unnecessary data. */
3 |
4 | ----------------------------------------------------------------------------------
5 |
6 | /*First look at the dataset*/
7 |
8 | SELECT
9 | *
10 | FROM
11 | `myproject8888-357816.real_estate_us.re-us`
12 | LIMIT
13 | 1000
14 |
15 |
16 |
17 |
18 | /* Remove Duplicates and create a new table */
19 |
20 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us1`
21 | AS
22 | SELECT
23 | DISTINCT *
24 | FROM
25 | `myproject8888-357816.real_estate_us.re-us`
26 |
27 | /* The dataset has decreased by almost 9 times */
28 |
29 |
30 |
31 | /* Change datatypes and names in the 'bed', 'bath', 'zip_code' columns*/
32 |
33 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us1`
34 | AS
35 | SELECT
36 | status,
37 | price,
38 | CAST(bed AS INT64) AS bedrooms,
39 | CAST(bath AS INT64) AS bathrooms,
40 | acre_lot,
41 | full_address,
42 | street,
43 | city,
44 | state,
45 | CAST(zip_code AS STRING) AS zipcode,
46 | house_size,
47 | sold_date
48 | FROM
49 | `myproject8888-357816.real_estate_us.re_us1`
50 |
51 |
52 |
53 | /* Check the distinct values in the status column and their counts */
54 | SELECT
55 | status,
56 | COUNT(status)
57 | FROM
58 | `myproject8888-357816.real_estate_us.re_us1`
59 | GROUP BY
60 | status
61 |
62 |
63 |
64 | /*Check sold_date column with status 'ready_to_build'*/
65 |
66 | SELECT
67 | *
68 | FROM
69 | `myproject8888-357816.real_estate_us.re_us1`
70 | WHERE
71 | status = 'ready_to_build' AND
72 | sold_date IS NOT NULL
73 |
74 | /* The status column has two distinct values, the majority of which are "for_sale" (113512 rows).
75 | None of the "ready_to_build" rows (277 rows) has a sold_date. Later, we will exclude the 'ready_to_build' rows from our analysis,
76 | because they are not actually existing buildings. */
77 |
78 |
79 |
80 | /*Check bed column*/
81 |
82 | SELECT
83 | bedrooms,
84 | COUNT(bedrooms) as count_bed
85 | FROM
86 | `myproject8888-357816.real_estate_us.re_us1`
87 | GROUP BY
88 | bedrooms
89 | ORDER BY
90 | count_bed DESC
91 |
92 |
93 |
94 | SELECT
95 | *
96 | FROM
97 | `myproject8888-357816.real_estate_us.re_us1`
98 | WHERE
99 | bedrooms > 11
100 |
101 |
102 |
103 | SELECT
104 | *
105 | FROM
106 | `myproject8888-357816.real_estate_us.re_us1`
107 | WHERE
108 | bedrooms IS NULL
109 |
110 | /* Some rows have an implausibly high number of bedrooms; the maximum value is 123. However, some of these rows have a non-null sold_date.
111 | There aren't many rows like this, so we will leave them as they are. There are 17516 rows with null values in this column. */
112 |
113 |
114 |
115 | /* Check the other columns */
116 |
117 | SELECT
118 | bathrooms,
119 | COUNT(bathrooms) as count_bath
120 | FROM
121 | `myproject8888-357816.real_estate_us.re_us1`
122 | GROUP BY
123 | bathrooms
124 | ORDER BY
125 | count_bath DESC
126 |
127 | SELECT
128 | *
129 | FROM
130 | `myproject8888-357816.real_estate_us.re_us1`
131 | WHERE bathrooms > 12
132 |
133 | SELECT
134 | *
135 | FROM
136 | `myproject8888-357816.real_estate_us.re_us1`
137 | WHERE
138 | bathrooms IS NULL
139 |
140 | /* 16297 null values in the bathrooms column. Enormously high values (more than 11) in the bathroom column in 210 rows.
141 | There are more bathrooms than bedrooms in these rows. Maybe it's a mistake. But we don't know exactly; it's not the goal of our analysis right now.
142 | And it will not skew the results; we will leave it as it is. */
143 |
144 |
145 | SELECT
146 | *
147 | FROM
148 | `myproject8888-357816.real_estate_us.re_us1`
149 | WHERE
150 | state IS NULL
151 |
152 |
153 |
154 | SELECT
155 | state,
156 | COUNT(state) AS counts
157 | FROM
158 | `myproject8888-357816.real_estate_us.re_us1`
159 | GROUP BY
160 | state
161 | ORDER BY
162 | counts DESC
163 |
164 |
165 | /* There are no null values in the "state" column.
166 | Virginia (7), Georgia (5), South Carolina, Tennessee, Wyoming, and West Virginia (1) have a low quantity of rows.
167 | We will exclude them from our analysis. */
168 |
169 |
170 |
171 | SELECT
172 | *
173 | FROM
174 | `myproject8888-357816.real_estate_us.re_us1`
175 | WHERE
176 | sold_date IS NULL
177 |
178 |
179 | /* There are 54092 null values in the "sold_date" column. These rows cannot be used for time-series analysis.
180 | So, let's create two tables: one for analysis by time periods and prediction, and another for the basic exploration. */
181 |
182 |
183 |
184 | /* Remove the states of Virginia, Georgia, South Carolina, Tennessee, Wyoming, and West Virginia, and the 'ready_to_build' status.
185 | Drop the status, full_address, and zipcode columns.
186 | I use CREATE OR REPLACE TABLE because DML is not available in the BigQuery Sandbox. */
187 |
188 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us1`
189 | AS
190 | SELECT
191 | state,
192 | city,
193 | street,
194 | price,
195 | bedrooms,
196 | bathrooms,
197 | acre_lot,
198 | house_size,
199 | sold_date
200 | FROM
201 | `myproject8888-357816.real_estate_us.re_us1`
202 | WHERE
203 | status != 'ready_to_build' AND
204 | state != 'Virginia' AND
205 | state != 'Georgia' AND
206 | state != 'South Carolina' AND
207 | state != 'Tennessee' AND
208 | state != 'Wyoming' AND
209 | state != 'West Virginia'
210 |
211 |
212 | /* Inspect city column */
213 | SELECT
214 | city,
215 | COUNT(city) AS counts,
216 | state
217 | FROM `myproject8888-357816.real_estate_us.re_us1`
218 | GROUP BY
219 | city,
220 | state
221 | ORDER BY
222 | counts DESC
223 |
224 |
225 | SELECT
226 | city,
227 | COUNT(city) AS counts,
228 | state
229 | FROM `myproject8888-357816.real_estate_us.re_us1`
230 | WHERE city LIKE 'N%' AND state = 'New York'
231 | GROUP BY city, state
232 | ORDER BY counts DESC
233 |
234 |
235 |
236 | /* There are different spellings of New York (New York City, Ny, Nyc). Let's fix them. */
237 |
238 | SELECT
239 | city,
240 | COUNT(city) AS counts,
241 | CASE WHEN city IN ('New York City', 'Nyc', 'Ny') THEN 'New York' ELSE city END AS ny,
242 | state
243 | FROM
244 | `myproject8888-357816.real_estate_us.re_us1`
245 | WHERE
246 | city LIKE 'N%' AND state = 'New York'
247 | GROUP BY
248 | city,
249 | state
250 | ORDER BY
251 | counts DESC
252 |
253 |
254 | CREATE OR REPLACE TABLE
255 | `myproject8888-357816.real_estate_us.re_us1`
256 | AS
257 | SELECT
258 | state,
259 | CASE WHEN city IN ('New York City', 'Nyc', 'Ny') THEN 'New York' ELSE city END AS city,
260 | street,
261 | price,
262 | bedrooms,
263 | bathrooms,
264 | acre_lot,
265 | house_size,
266 | sold_date
267 | FROM
268 | `myproject8888-357816.real_estate_us.re_us1`
269 |
270 |
271 |
272 | /* Fix 23 null values in the "city" column and extract the year from 'sold_date'.
273 | Check null and suspiciously low values (51 rows) in the 'price' column and remove them. */
274 |
275 | SELECT
276 | *
277 | FROM
278 | `myproject8888-357816.real_estate_us.re_us1`
279 | WHERE
280 | city IS NULL
281 |
282 |
283 | SELECT
284 | *
285 | FROM
286 | `myproject8888-357816.real_estate_us.re_us1`
287 | WHERE
288 | price IS NULL
289 |
290 |
291 |
292 | SELECT
293 | *
294 | FROM
295 | `myproject8888-357816.real_estate_us.re_us1`
296 | WHERE
297 | price < 5000
298 |
299 |
300 |
301 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us2`
302 | AS
303 | SELECT
304 | state,
305 | CASE
306 | WHEN street IN ('163 Union and Mt Wash Ea','155-A La Vallee Nb','123 Catherines Hope Eb', '21 N Grapetree Eb', '42 43 Shoys Ea', '8-B Teagues Bay Eb', '242 Union and Mt Wash Ea', '96 Hard Labor Pr') AND city IS NULL
307 | THEN 'Christiansted'
308 | WHEN street IN ('4 Prosperity Nb', '20 River Pr', '17 Prosperity Nb', '94V I Corp Lands Pr', '14 Diamond Pr', '192 La Vallee Nb') AND city IS NULL THEN 'Frederiksted'
309 | WHEN street IN ('240 St John Qu') AND city IS NULL THEN 'Saint John'
310 | WHEN street IN ('230 S Stevens Ave') AND city IS NULL THEN 'South Amboy'
311 | WHEN street IN ('0 Block 32 Quinton Alloway Quinton Rd Lot 11 01') AND city IS NULL THEN 'Quinton'
312 | WHEN street = '641 State Route 82' AND city IS NULL THEN 'Hopewell Junction'
313 | WHEN street = '32 Devereux Dr' AND city IS NULL THEN 'Manchester Township'
314 | WHEN street = '9-11 Putnam Park Rd' AND city IS NULL THEN 'Bethel'
315 | WHEN street = '68 Avondale St' AND city IS NULL THEN 'Valley Stream'
316 | WHEN street = '824-26 Berckman St' AND city IS NULL THEN 'Plainfield'
317 | WHEN street = '689 Luis M Marin Blvd Unit 1009' AND city IS NULL THEN 'Jersey City'
318 | ELSE city
319 | END AS city,
320 | street,
321 | price,
322 | bedrooms,
323 | bathrooms,
324 | acre_lot,
325 | house_size,
326 | sold_date,
327 | EXTRACT(YEAR FROM sold_date) AS year
328 | FROM
329 | `myproject8888-357816.real_estate_us.re_us1`
330 | WHERE
331 | price > 5000
332 |
333 |
334 | /* Create house_size_m2 and hectare_lot columns.
335 | Replace the incorrect highest value and update the new highest value according to realtor.com data.
336 | Change the information about the property at 421 W 250th St (done in a roundabout way, because DML statements such as UPDATE are not available in the BigQuery Sandbox). */
337 |
338 |
339 | SELECT
340 | *
341 | FROM
342 | `myproject8888-357816.real_estate_us.re_us2`
343 | ORDER BY
344 | price DESC
345 |
346 |
347 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us2`
348 | AS
349 | SELECT
350 | state,
351 | CASE
352 | WHEN street = '421 W 250th St' AND city = 'New York' THEN 'Bronx'
353 | ELSE city
354 | END AS city,
355 | street,
356 | CASE
357 | WHEN street = '952 E 223 St Units 4858 & 66' AND price = 875000000 THEN 850000
358 | WHEN street = '432 Park Ave Unit Penthouse' AND price = 169000000 THEN 180000000
359 | WHEN street = '421 W 250th St' AND price = 120000000 THEN 8750000
360 | ELSE price
361 | END AS price,
362 | CASE
363 | WHEN street = '421 W 250th St' AND bedrooms = 123 THEN 8
364 | ELSE bedrooms
365 | END AS bedrooms,
366 | CASE
367 | WHEN street = '421 W 250th St' AND bathrooms = 123 THEN 10
368 | ELSE bathrooms
369 | END AS bathrooms,
370 | acre_lot,
371 | acre_lot*0.404686 AS hectare_lot,
372 | CASE
373 | WHEN street = '421 W 250th St' AND house_size IS NULL THEN 11135
374 | ELSE house_size
375 | END AS house_size,
376 | house_size/10.7639 AS house_size_m2,
377 | CASE
378 | WHEN street = '421 W 250th St' AND sold_date = '2012-06-29' THEN NULL
379 | ELSE sold_date
380 | END AS sold_date
381 | FROM
382 | `myproject8888-357816.real_estate_us.re_us2`
383 | ORDER BY
384 | price DESC
385 |
386 |
387 | /* Here are some more duplicates with slightly different street column values but the same other columns. We need to solve this */
388 |
389 | SELECT
390 | DISTINCT
391 | state,
392 | city,
393 | price,
394 | bedrooms,
395 | bathrooms,
396 | acre_lot,
397 | house_size,
398 | sold_date
399 | FROM
400 | `myproject8888-357816.real_estate_us.re_us2`
401 | ORDER BY
402 | price DESC
403 |
404 | /* There are 111016 Distinct rows excluding street column */
405 |
406 | /* Check the duplicate rows to decide how to treat them. */
407 |
408 | SELECT
409 | *
410 | FROM
411 | `myproject8888-357816.real_estate_us.re_us4` a
412 | JOIN (SELECT state,
413 | city,
414 | price,
415 | IFNULL(bedrooms, 0) AS bedrooms,
416 | IFNULL(bathrooms, 0) AS bathrooms,
417 | IFNULL(acre_lot, 0) AS acre_lot,
418 | IFNULL(house_size, 0) AS house_size,
419 | COUNT(*)
420 | FROM `myproject8888-357816.real_estate_us.re_us4`
421 | GROUP BY state,
422 | city,
423 | price,
424 | bedrooms,
425 | bathrooms,
426 | acre_lot,
427 | house_size
428 | HAVING COUNT(*) > 1) b
429 | ON a.state = b.state
430 | AND a.city = b.city
431 | AND a.price = b.price
432 | AND a.bedrooms = b.bedrooms
433 | AND a.bathrooms = b.bathrooms
434 | AND a.acre_lot = b.acre_lot
435 | AND a.house_size = b.house_size
436 | ORDER BY
437 | a.price
438 |
439 | /* With a few exceptions, web information about the addresses of the duplicate rows shows that most of them are the same property.
440 | We can remove these duplicates.
441 | But it's necessary to check rows where bedrooms, bathrooms, acre_lot, and house_size are null to see if they are the same. */
442 | /* Create a table with the changed datatypes and replaced null values. */
443 |
444 | CREATE OR REPLACE TABLE
445 | `myproject8888-357816.real_estate_us.re_us5`
446 | AS
447 | SELECT
448 | state,
449 | city,
450 | street,
451 | CAST(price AS INT64) AS price,
452 | IFNULL(bedrooms, 0) AS bedrooms,
453 | IFNULL(bathrooms, 0) AS bathrooms,
454 | IFNULL(CAST(acre_lot AS STRING), '0') AS acre_lot,
455 | IFNULL(CAST(house_size AS STRING), '0') AS house_size,
456 | IFNULL(CAST(sold_date AS STRING), '0') AS sold_date,
457 | FROM
458 | `myproject8888-357816.real_estate_us.re_us4`
459 |
460 |
461 | WITH cte AS (
462 | SELECT *,
463 | row_number() OVER(PARTITION BY state,
464 | city,
465 | price,
466 | bedrooms,
467 | bathrooms,
468 | acre_lot,
469 | house_size,
470 | sold_date ORDER BY price DESC) AS rn
471 | FROM `myproject8888-357816.real_estate_us.re_us5`
472 | )
473 | SELECT * FROM cte WHERE rn > 1 AND bedrooms = 0 AND bathrooms = 0 AND acre_lot = '0' AND house_size = '0'
474 |
475 | /* There are 33 rows with nulls in the bedrooms, bathrooms, acre_lot, and house_size columns at the same time.
476 | Some of them are different plots of land, and others are duplicate properties. We can remove the duplicates here. */
477 |
478 |
479 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_noduplicates`
480 | AS
481 | WITH CTE AS (
482 | SELECT *,
483 | row_number() OVER(PARTITION BY state,
484 | city,
485 | price,
486 | bedrooms,
487 | bathrooms,
488 | acre_lot,
489 | house_size,
490 | sold_date ORDER BY price DESC) AS rn
491 | FROM
492 | `myproject8888-357816.real_estate_us.re_us5`
493 | )
494 | SELECT
495 | *
496 | FROM
497 | CTE
498 | WHERE
499 | rn = 1
500 | ORDER BY price DESC
501 |
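/* The deduplication above could also be written in one step with QUALIFY,
   which filters on a window function without a CTE and without adding the rn column.
   This is only a sketch, not the original query: the table name
   re_us_noduplicates_sketch is hypothetical, and which row of each duplicate
   group survives is arbitrary, as in the query above. */

CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_noduplicates_sketch`
AS
SELECT
  *
FROM
  `myproject8888-357816.real_estate_us.re_us5`
WHERE
  TRUE  -- kept for safety: QUALIFY has historically needed an accompanying WHERE, GROUP BY, or HAVING
QUALIFY
  ROW_NUMBER() OVER(PARTITION BY state, city, price, bedrooms, bathrooms, acre_lot, house_size, sold_date
                    ORDER BY price DESC) = 1
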
502 |
503 | /* Also, we need to separate plots of land from property for our analysis. */
504 |
505 | CREATE OR REPLACE TABLE
506 | `myproject8888-357816.real_estate_us.re_us_noduplicates`
507 | AS
508 | SELECT
509 | ROW_NUMBER() OVER(ORDER BY price DESC) AS id,
510 | state,
511 | city,
512 | street,
513 | price,
514 | bedrooms,
515 | bathrooms,
516 | acre_lot,
517 | house_size,
518 | sold_date
519 | FROM
520 | `myproject8888-357816.real_estate_us.re_us_noduplicates`
521 |
522 |
523 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_noduplicates`
524 | AS
525 | SELECT
526 | *
527 | FROM
528 | `myproject8888-357816.real_estate_us.re_us_noduplicates`
529 | WHERE
530 | id NOT IN (SELECT id FROM `myproject8888-357816.real_estate_us.re_us_noduplicates` WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot != '0' AND house_size = '0')
531 | ORDER BY
532 | price DESC
533 |
534 | /* Check rows with nulls in all four columns: bedrooms, bathrooms, acre_lot, house_size */
535 |
536 | WITH cte AS (
537 | SELECT *,
538 | row_number() OVER(PARTITION BY state,
539 | city,
540 | price,
541 | bedrooms,
542 | bathrooms,
543 | acre_lot,
544 | house_size,
545 | sold_date ORDER BY price DESC) AS rn
546 | FROM `myproject8888-357816.real_estate_us.re_us5`
547 | )
548 | SELECT * FROM cte WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot = '0' AND house_size = '0'
549 | ORDER BY price DESC
550 |
551 | /* There are 552 such rows. Most of them are properties; this information was probably just skipped during data entry.
552 | We keep these rows in our property data. */
553 |
554 | /* Change datatypes, add columns */
555 |
556 | SELECT
557 | state,
558 | city,
559 | street,
560 | price,
561 | bedrooms,
562 | bathrooms,
563 | CAST(CASE WHEN acre_lot = '0' -- acre_lot is a STRING in re_us5, so compare with '0'
564 | THEN NULL
565 | ELSE acre_lot
566 | END AS FLOAT64) AS acre_lot,
567 | CAST(CASE WHEN house_size = '0' -- house_size is a STRING in re_us5 as well
568 | THEN NULL
569 | ELSE house_size
570 | END AS FLOAT64) AS house_size,
571 | CAST(CASE WHEN sold_date = '0'
572 | THEN NULL
573 | ELSE sold_date
574 | END AS DATE) AS sold_date
575 | FROM
576 | `myproject8888-357816.real_estate_us.re_us5`
577 |
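/* A possible shorter form of the conversion above (a sketch, not the original query).
   NULLIF turns the '0' placeholder back into NULL, and SAFE_CAST returns NULL
   instead of raising an error when a value cannot be converted, so malformed
   values would be hidden rather than reported.
   It assumes acre_lot, house_size, and sold_date are stored as STRING, as in re_us5. */

SELECT
  state,
  city,
  street,
  price,
  bedrooms,
  bathrooms,
  SAFE_CAST(NULLIF(acre_lot, '0') AS FLOAT64) AS acre_lot,
  SAFE_CAST(NULLIF(house_size, '0') AS FLOAT64) AS house_size,
  SAFE_CAST(NULLIF(sold_date, '0') AS DATE) AS sold_date
FROM
  `myproject8888-357816.real_estate_us.re_us5`
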
578 |
579 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_property`
580 | AS
581 | SELECT
582 | state,
583 | city,
584 | street,
585 | price,
586 | bedrooms,
587 | bathrooms,
588 | acre_lot,
589 | acre_lot*0.404686 AS hectare_lot,
590 | house_size,
591 | house_size/10.7639 AS house_size_m2,
592 | sold_date,
593 | EXTRACT(YEAR FROM sold_date) AS year
594 | FROM
595 | `myproject8888-357816.real_estate_us.re_us_property`
596 |
597 |
598 | /* Create a second table containing only the rows with a non-null 'sold_date'. */
599 |
600 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_sold`
601 | AS
602 | SELECT
603 | *
604 | FROM
605 | `myproject8888-357816.real_estate_us.re_us2`
606 | WHERE
607 | sold_date IS NOT NULL
608 |
609 |
610 | /* Create a table with plots of land.
611 | This data needs more exploration before we can be sure that the table really describes plots. */
612 |
613 |
614 | CREATE OR REPLACE TABLE
615 | `myproject8888-357816.real_estate_us.re_us_plots`
616 | AS
617 | SELECT
618 | state,
619 | city,
620 | street,
621 | price,
622 | CAST(acre_lot AS FLOAT64) AS acre_lot,
623 | sold_date
624 | FROM `myproject8888-357816.real_estate_us.re_us5`
625 | WHERE bedrooms = 0 AND bathrooms = 0 AND acre_lot != '0' AND house_size = '0'
626 | ORDER BY price DESC
627 |
--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Exploring_data.sql:
--------------------------------------------------------------------------------
1 | /* Exploring data about real estate in USA in SQL
2 | Skills used:*/
3 |
4 | ------------------------------------------------------------------------
5 |
6 |
7 | SELECT
8 | year,
9 | COUNT(year) AS property_sold
10 | FROM
11 | `myproject8888-357816.real_estate_us.re_us2`
12 | GROUP BY
13 | year
14 | ORDER BY
15 | property_sold DESC
16 |
17 | /* 2023 - highest, 1901 - lowest */
18 |
19 |
20 | /* Explore state */
21 |
22 | SELECT
23 | state,
24 | COUNT(state) AS num_of_property,
25 | AVG(price) AS avg_price,
26 | MIN(price) AS min_price,
27 | MAX(price) AS max_price,
28 | AVG(house_size_m2) AS avg_size,
29 | MIN(house_size_m2) AS min_size,
30 | MAX(house_size_m2) AS max_size,
31 | AVG(hectare_lot) AS avg_lot,
32 | MIN(hectare_lot) AS min_lot,
33 | MAX(hectare_lot) AS max_lot,
34 | FROM
35 | `myproject8888-357816.real_estate_us.re_us4`
36 | GROUP BY
37 | state
38 | ORDER BY
39 | num_of_property DESC
40 |
41 | SELECT
42 | year,
43 | state,
44 | COUNT(state) AS num_of_property,
45 | AVG(price) AS avg_price,
46 | MIN(price) AS min_price,
47 | MAX(price) AS max_price,
48 | AVG(house_size_m2) AS avg_size,
49 | MIN(house_size_m2) AS min_size,
50 | MAX(house_size_m2) AS max_size,
51 | AVG(hectare_lot) AS avg_lot,
52 | MIN(hectare_lot) AS min_lot,
53 | MAX(hectare_lot) AS max_lot,
54 | FROM
55 | `myproject8888-357816.real_estate_us.re_us_property`
56 | WHERE
57 | year IS NOT NULL
58 | GROUP BY
59 | state, year
60 | ORDER BY
61 | num_of_property DESC
62 |
63 |
64 |
65 | /* Explore city */
66 |
67 | SELECT
68 | city,
69 | COUNT(city) AS num_of_property,
70 | AVG(price) AS avg_price,
71 | MIN(price) AS min_price,
72 | MAX(price) AS max_price,
73 | AVG(house_size_m2) AS avg_size,
74 | MIN(house_size_m2) AS min_size,
75 | MAX(house_size_m2) AS max_size,
76 | AVG(hectare_lot) AS avg_lot,
77 | MIN(hectare_lot) AS min_lot,
78 | MAX(hectare_lot) AS max_lot,
79 | FROM
80 | `myproject8888-357816.real_estate_us.re_us4`
81 | GROUP BY
82 | city
83 | ORDER BY
84 | num_of_property DESC
85 |
86 |
87 | /* Explore bathrooms */
88 |
89 |
90 | SELECT
91 | state,
92 | bathrooms,
93 | COUNT(bathrooms) AS count_bath,
94 | AVG(price) AS avg_price,
95 | MIN(price) AS min_price,
96 | MAX(price) AS max_price,
97 | AVG(house_size_m2) AS avg_size,
98 | MIN(house_size_m2) AS min_size,
99 | MAX(house_size_m2) AS max_size,
100 | FROM
101 | `myproject8888-357816.real_estate_us.re_us_property`
102 | GROUP BY
103 | state, bathrooms
104 | ORDER BY
105 | count_bath DESC, state
106 |
107 |
108 | SELECT
109 | bathrooms,
110 | COUNT(bathrooms) AS count_bath,
111 | AVG(price) AS avg_price,
112 | MIN(price) AS min_price,
113 | MAX(price) AS max_price,
114 | AVG(house_size_m2) AS avg_size,
115 | MIN(house_size_m2) AS min_size,
116 | MAX(house_size_m2) AS max_size,
117 | FROM
118 | `myproject8888-357816.real_estate_us.re_us_property`
119 | GROUP BY
120 | bathrooms
121 | ORDER BY
122 | count_bath DESC
123 |
124 |
125 |
126 | /* Explore bedrooms */
127 |
128 | SELECT
129 | state,
130 | bedrooms,
131 | COUNT(bedrooms) AS count_bed,
132 | AVG(price) AS avg_price,
133 | MIN(price) AS min_price,
134 | MAX(price) AS max_price,
135 | AVG(house_size_m2) AS avg_size,
136 | MIN(house_size_m2) AS min_size,
137 | MAX(house_size_m2) AS max_size,
138 | FROM
139 | `myproject8888-357816.real_estate_us.re_us_property`
140 | GROUP BY
141 | bedrooms, state
142 | ORDER BY
143 | count_bed DESC
144 |
145 |
146 | SELECT
147 | bedrooms,
148 | COUNT(bedrooms) AS count_bed,
149 | AVG(price) AS avg_price,
150 | MIN(price) AS min_price,
151 | MAX(price) AS max_price,
152 | AVG(house_size_m2) AS avg_size,
153 | MIN(house_size_m2) AS min_size,
154 | MAX(house_size_m2) AS max_size,
155 | FROM
156 | `myproject8888-357816.real_estate_us.re_us_property`
157 | GROUP BY
158 | bedrooms
159 | ORDER BY
160 | count_bed DESC
161 |
162 |
163 | /* Adding "year" column. */
164 |
165 | CREATE OR REPLACE TABLE `myproject8888-357816.real_estate_us.re_us_property`
166 | AS
167 | SELECT
168 | *,
169 | EXTRACT(YEAR FROM sold_date) AS year,
170 | FROM `myproject8888-357816.real_estate_us.re_us_property`
171 |
172 |
173 | /* Query data for Power BI exploration by year. */
174 | SELECT
175 | state,
176 | city,
177 | year,
178 | bedrooms,
179 | bathrooms,
180 | COUNT(state) AS num_of_property,
181 | AVG(price) AS avg_price,
182 | MIN(price) AS min_price,
183 | MAX(price) AS max_price,
184 | AVG(house_size_m2) AS avg_size,
185 | MIN(house_size_m2) AS min_size,
186 | MAX(house_size_m2) AS max_size,
187 | AVG(hectare_lot) AS avg_lot,
188 | MIN(hectare_lot) AS min_lot,
189 | MAX(hectare_lot) AS max_lot,
190 | FROM
191 | `myproject8888-357816.real_estate_us.re_us_property`
192 | WHERE year IS NOT NULL
193 | GROUP BY
194 | state, city, year, bedrooms, bathrooms
195 | ORDER BY
196 | num_of_property DESC
197 |
198 |
199 |
200 | /* Query data for Power BI exploration by year and month. */
201 | SELECT
202 | state,
203 | city,
204 | year,
205 | FORMAT_DATE('%B', sold_date) AS month,
206 | bedrooms,
207 | bathrooms,
208 | COUNT(state) AS num_of_property,
209 | AVG(price) AS avg_price,
210 | MIN(price) AS min_price,
211 | MAX(price) AS max_price,
212 | AVG(house_size_m2) AS avg_size,
213 | MIN(house_size_m2) AS min_size,
214 | MAX(house_size_m2) AS max_size,
215 | AVG(hectare_lot) AS avg_lot,
216 | MIN(hectare_lot) AS min_lot,
217 | MAX(hectare_lot) AS max_lot,
218 | FROM
219 | `myproject8888-357816.real_estate_us.re_us_property`
220 | WHERE year IS NOT NULL
221 | GROUP BY
222 | state, city, year, month, bedrooms, bathrooms
223 | ORDER BY
224 | num_of_property DESC
225 |
226 |
227 | /* Explore data with null sold date */
228 | SELECT
229 | state,
230 | city,
231 | bedrooms,
232 | bathrooms,
233 | COUNT(state) AS num_of_property,
234 | SUM(price) AS market_size,
235 | AVG(price) AS avg_price,
236 | MIN(price) AS min_price,
237 | MAX(price) AS max_price,
238 | AVG(house_size_m2) AS avg_size,
239 | MIN(house_size_m2) AS min_size,
240 | MAX(house_size_m2) AS max_size,
241 | AVG(hectare_lot) AS avg_lot,
242 | MIN(hectare_lot) AS min_lot,
243 | MAX(hectare_lot) AS max_lot,
244 | FROM
245 | `myproject8888-357816.real_estate_us.re_us_property`
246 | WHERE year IS NULL
247 | GROUP BY
248 | state, city, bedrooms, bathrooms
249 | ORDER BY
250 | num_of_property DESC
251 |
252 |
253 |
254 | /* Explore from lowest price */
255 | SELECT
256 | *
257 | FROM
258 | `myproject8888-357816.real_estate_us.re_us_property`
259 | WHERE year IS NULL
260 | ORDER BY
261 | price ASC
262 |
263 | /* Some of the properties in the data are off the market now, and some are still for sale.
264 | This is not very useful for analysis: the data mixes properties currently for sale with properties already sold at an unknown time.
265 | We could separate them only by checking each one manually,
266 | and that is not worth the effort. Instead, we can try to visualize the whole dataset in Power BI; maybe it will show something. */
267 |
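/* A quick way to see how much of the table is affected (a sketch, assuming the
   year column is NULL exactly when sold_date is NULL): count the rows with and
   without a sale year for each state. */

SELECT
  state,
  COUNTIF(year IS NULL) AS without_sold_date,
  COUNTIF(year IS NOT NULL) AS with_sold_date
FROM
  `myproject8888-357816.real_estate_us.re_us_property`
GROUP BY
  state
ORDER BY
  without_sold_date DESC
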
--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Real Estate USA Dashboards.pbix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Real Estate USA data SQL and Power BI analysis/Real Estate USA Dashboards.pbix
--------------------------------------------------------------------------------
/Real Estate USA data SQL and Power BI analysis/Real-Estate USA Dashboards.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/paulo81818/Data-Business-Analysis-Portfolio/a03c682609731826fbf301cdc43e5dc808a676fd/Real Estate USA data SQL and Power BI analysis/Real-Estate USA Dashboards.pdf
--------------------------------------------------------------------------------