├── .gitattributes
├── .gitignore
├── Analyzing Police Activity with pandas
├── Codings
│   ├── Chapter 1-Preparing the data for analysis.py
│   ├── Chapter 2-Exploring the relationship between gender and policing.py
│   ├── Chapter 3-Visual exploratory data analysis.py
│   └── Chapter 4-Analyzing the effect of weather on policing.py
├── Datasets
│   ├── police.csv
│   └── weather.csv
└── Notes
│   ├── chapter1.pdf
│   ├── chapter2.pdf
│   ├── chapter3.pdf
│   └── chapter4.pdf
├── Cleaning Data in Python
├── Codings
│   ├── Chapter 1-Exploring your data.py
│   ├── Chapter 2-Tidying data for analysis.py
│   ├── Chapter 3-Combining data for analysis.py
│   ├── Chapter 4-Cleaning data for analysis.py
│   └── Chapter 5-Case study.py
├── Datasets
│   ├── airquality.csv
│   ├── dob_job_application_filings_subset.csv
│   ├── ebola.csv
│   ├── gapminder.csv
│   ├── nyc_uber_2014.csv
│   ├── tb.csv
│   └── tips.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   ├── ch4_slides.pdf
│   └── ch5_slides.pdf
├── Conda Essentials
└── Codings
│   ├── Chapter 1-Installing Packages.py
│   ├── Chapter 2-Utilizing Channels.py
│   ├── Chapter 3-Working with Environments.py
│   └── Chapter 4-Case Study on Using Environments.py
├── Importing Data in Python (Part 1)
├── Codings
│   ├── Chapter 1-Introduction and flat files.py
│   ├── Chapter 2-Importing data from other file types.py
│   └── Chapter 3-Working with relational databases in Python.py
├── Datasets
│   ├── Chinook.sqlite
│   ├── L-L1_LOSC_4_V1-1126259446-32.hdf5
│   ├── battledeath.xlsx
│   ├── disarea.dta
│   ├── ja_data2.mat
│   ├── mnist_kaggle_some_rows.csv
│   ├── sales.sas7bdat
│   ├── seaslug.txt
│   └── titanic_sub.csv
└── Notes
│   ├── ch2_pdf_slides.pdf
│   ├── ch_1_slides.pdf
│   └── ch_3_slides.pdf
├── Importing Data in Python (Part 2)
├── Codings
│   ├── Chapter 1-Importing data from the Internet.py
│   ├── Chapter 2-Interacting with APIs to import data from the web.py
│   └── Chapter 3-Diving deep into the Twitter API.py
├── Datasets
│   ├── latitude.xls
│   ├── tweets3.txt
│   └── winequality-red.csv
└── Notes
│   ├── ch_1_slides.pdf
│   ├── ch_2_slides.pdf
│   └── ch_3_slides.pdf
├── Interactive Data Visualization with Bokeh
├── Codings
│   ├── Chapter 1-Basic plotting with Bokeh.py
│   ├── Chapter 2-Layouts, Interactions, and Annotations.py
│   ├── Chapter 3-Building interactive apps with Bokeh.py
│   └── Chapter 4-Putting It All Together! A Case Study.py
├── Datasets
│   ├── aapl.csv
│   ├── auto-mpg.csv
│   ├── gapminder_tidy.csv
│   ├── glucose.csv
│   ├── literacy_birth_rate.csv
│   └── sprint.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch4_slides.pdf
│   └── ch5_slides.pdf
├── Intermediate Python for Data Science
├── Codings
│   ├── Chapter 1-Matplotlib
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 11.py
│   │   ├── 12.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
│   ├── Chapter 2-Dictionaries & Pandas
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 11.py
│   │   ├── 12.py
│   │   ├── 13.py
│   │   ├── 14.py
│   │   ├── 15.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
│   ├── Chapter 3-Logic, Control Flow and Filtering
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 11.py
│   │   ├── 12.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
│   ├── Chapter 4-Loops
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 11.py
│   │   ├── 12.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
│   └── Chapter 5-Case Study Hacker Statistics
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
├── Datasets
│   ├── brics.csv
│   ├── cars.csv
│   └── gapminder.csv
└── Notes
│   ├── intermediate_python_ch1_slides.pdf
│   ├── intermediate_python_ch2_slides.pdf
│   ├── intermediate_python_ch3_slides.pdf
│   ├── intermediate_python_ch4_slides.pdf
│   └── intermediate_python_ch5_slides.pdf
├── Introduction to Data Visualization with Python
├── Codings
│   ├── Chapter 1-Customizing plots.py
│   ├── Chapter 2-Plotting 2D arrays.py
│   ├── Chapter 3-Statistical plots with Seaborn.py
│   └── Chapter 4- Analyzing time series and images.py
├── Datasets
│   ├── auto-mpg.csv
│   ├── percent-bachelors-degrees-women-usa.csv
│   └── stocks.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   └── ch4_slides.pdf
├── Introduction to Python
├── Codings
│   ├── Chapter 1-Introduction to Python
│   │   ├── 1.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   └── 8.py
│   ├── Chapter 2-Python Lists
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
│   ├── Chapter 3-Functions and Packages
│   │   ├── 1.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   └── 7.py
│   └── Chapter 4-NumPy
│   │   ├── 1.py
│   │   ├── 10.py
│   │   ├── 11.py
│   │   ├── 12.py
│   │   ├── 2.py
│   │   ├── 3.py
│   │   ├── 4.py
│   │   ├── 5.py
│   │   ├── 6.py
│   │   ├── 7.py
│   │   ├── 8.py
│   │   └── 9.py
├── Datasets
│   ├── baseball.csv
│   └── fifa.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   └── ch4_slides.pdf
├── Introduction to Relational Databases in SQL
├── Codings
│   ├── Chapter 1-Your first database.py
│   ├── Chapter 2-Enforce data consistency with attribute constraints.py
│   ├── Chapter 3-Uniquely identify records with key constraints.py
│   └── Chapter 4-Glue together tables with foreign keys.py
└── Notes
│   ├── chapter1.pdf
│   ├── chapter2.pdf
│   ├── chapter3.pdf
│   └── chapter4.pdf
├── Introduction to Shell for Data Science
└── Shell codings
│   ├── Chapter 1-Manipulating files and directories.py
│   ├── Chapter 2-Manipulating data.py
│   ├── Chapter 3-Combining tools.py
│   ├── Chapter 4-Batch processing.py
│   └── Chapter 5-Creating new tools.py
├── LICENSE
├── Manipulating DataFrames with pandas
├── Codings
│   ├── Chapter 1-Extracting and transforming data.py
│   ├── Chapter 2-Advanced indexing.py
│   ├── Chapter 3-Rearranging and reshaping data.py
│   ├── Chapter 4-Grouping data.py
│   └── Chapter 5-Bringing it all together.py
├── Datasets
│   ├── all_medalists.csv
│   ├── gapminder_tidy.csv
│   ├── pennsylvania2012_turnout.csv
│   ├── pittsburgh2013.csv
│   ├── sales.zip
│   ├── titanic.csv
│   └── users.csv
└── Notes
│   ├── Python For Data Science Cheat Sheet.pdf
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   ├── ch4_slides.pdf
│   └── ch5_slides.pdf
├── Merging DataFrames with pandas
├── Codings
│   ├── Chapter 1-Preparing data.py
│   ├── Chapter 2-Concatenating data.py
│   ├── Chapter 3-Merging data.py
│   └── Chapter 4-Case Study - Summer Olympics.py
├── Datasets
│   ├── Baby names.zip
│   ├── GDP.zip
│   ├── Sales.zip
│   ├── Summer Olympic medals.zip
│   ├── automobiles.csv
│   ├── exchange.csv
│   ├── oil_price.csv
│   ├── pittsburgh2013.csv
│   └── sp500.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   └── ch4_slides.pdf
├── Natural Language Processing Fundamentals in Python
├── Chapter 1-Regular expressions & word tokenization
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   └── chapter1.pdf
├── Chapter 2-Simple topic identification
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   └── chapter2.pdf
├── Chapter 3-Named-entity recognition
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   └── chapter3.pdf
├── Chapter 4-Building a fake news classifier
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   └── chapter4.pdf
└── Datasets
│   ├── News articles
│   └── News articles
│   │   ├── articles.txt
│   │   ├── blaise.txt
│   │   ├── french.txt
│   │   └── uber_apple.txt
│   ├── Wikipedia articles
│   └── Wikipedia articles
│   │   ├── wiki_text_bug.txt
│   │   ├── wiki_text_computer.txt
│   │   ├── wiki_text_crash.txt
│   │   ├── wiki_text_debugger.txt
│   │   ├── wiki_text_debugging.txt
│   │   ├── wiki_text_exception.txt
│   │   ├── wiki_text_hopper.txt
│   │   ├── wiki_text_language.txt
│   │   ├── wiki_text_malware.txt
│   │   ├── wiki_text_program.txt
│   │   ├── wiki_text_reversing.txt
│   │   └── wiki_text_software.txt
│   ├── english_stopwords.txt
│   └── grail.txt
├── Python Data Science Toolbox (Part 1)
├── Chapter 1-Writing your own functions
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   └── ch1_slides.pdf
├── Chapter 2-Default arguments, variable-length arguments and scope
│   ├── 1.py
│   ├── 10.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   ├── 9.py
│   └── ch2_slides.pdf
├── Chapter 3-Lambda functions and error-handling
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   ├── 9.py
│   └── ch3_slides.pdf
└── Datasets
│   └── tweets.csv
├── Python Data Science Toolbox (Part 2)
├── Chapter 1-Using iterators in PythonLand
│   ├── 1.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   └── ch1_slides.pdf
├── Chapter 2-List comprehensions and generators
│   ├── 1.py
│   ├── 10.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   ├── 9.py
│   └── ch2_slides.pdf
├── Chapter 3-Bringing it all together!
│   ├── 1.py
│   ├── 10.py
│   ├── 11.py
│   ├── 12.py
│   ├── 2.py
│   ├── 3.py
│   ├── 4.py
│   ├── 5.py
│   ├── 6.py
│   ├── 7.py
│   ├── 8.py
│   ├── 9.py
│   └── ch3_slides.pdf
└── Datasets
│   ├── tweets.csv
│   └── world_ind_pop_data.csv
├── README.md
├── Statistical Thinking in Python (Part 1)
├── Codings
│   ├── Chapter 1-Graphical exploratory data analysis.py
│   ├── Chapter 2-Quantitative exploratory data analysis.py
│   ├── Chapter 3-Thinking probabilistically-- Discrete variables.py
│   └── Chapter 4-Thinking probabilistically-- Continuous variables.py
├── Datasets
│   ├── 2008_all_states.csv
│   ├── 2008_swing_states.csv
│   ├── belmont.csv
│   └── michelson_speed_of_light.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   └── ch4_slides.pdf
├── Statistical Thinking in Python (Part 2)
├── Codings
│   ├── Chapter 1-Parameter estimation by optimization.py
│   ├── Chapter 2-Bootstrap confidence intervals.py
│   ├── Chapter 3 -Introduction to hypothesis testing.py
│   ├── Chapter 4-Hypothesis test examples.py
│   └── Chapter 5-Putting it all together- a case study.py
├── Datasets
│   ├── anscombe.csv
│   ├── bee_sperm.csv
│   ├── female_literacy_fertility.csv
│   ├── finch_beaks_1975.csv
│   ├── finch_beaks_2012.csv
│   ├── fortis_beak_depth_heredity.csv
│   ├── frog_tongue.csv
│   ├── mlb_nohitters.csv
│   ├── scandens_beak_depth_heredity.csv
│   └── sheffield_weather_station.csv
└── Notes
│   ├── ch1_slides.pdf
│   ├── ch2_slides.pdf
│   ├── ch3_slides.pdf
│   ├── ch4_slides.pdf
│   └── ch5_slides.pdf
├── _config.yml
└── pandas Foundations
├── Codings
├── Chapter 1-Data ingestion & inspection.py
├── Chapter 2-Exploratory data analysis.py
├── Chapter 3-Time series in pandas.py
└── Chapter 4-Case Study - Sunlight in Austin.py
├── Datasets
├── NOAA_QCLCD_2011_hourly_13904.txt
├── austin_airport_departure_data_2015_july.csv
├── auto-mpg.csv
├── life_expectancy_at_birth.csv
├── messy_stock_data.tsv
├── percent-bachelors-degrees-women-usa.csv
├── tips.csv
├── titanic.csv
├── weather_data_austin_2010.csv
├── world_ind_pop_data.csv
└── world_population.csv
└── Notes
├── ch1_slides.pdf
├── ch2_slides.pdf
├── ch3_slides.pdf
└── ch4_slides.pdf
/.gitattributes:
--------------------------------------------------------------------------------
# Auto detect text files and perform LF normalization
* text=auto
--------------------------------------------------------------------------------
/Analyzing Police Activity with pandas/Notes/chapter1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Analyzing Police Activity with pandas/Notes/chapter1.pdf
--------------------------------------------------------------------------------
/Analyzing Police Activity with pandas/Notes/chapter2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Analyzing Police Activity with pandas/Notes/chapter2.pdf
--------------------------------------------------------------------------------
/Analyzing Police Activity with pandas/Notes/chapter3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Analyzing Police Activity with pandas/Notes/chapter3.pdf
--------------------------------------------------------------------------------
/Analyzing Police Activity with pandas/Notes/chapter4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Analyzing Police Activity with pandas/Notes/chapter4.pdf
--------------------------------------------------------------------------------
/Cleaning Data in Python/Notes/ch1_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Cleaning Data in Python/Notes/ch1_slides.pdf
--------------------------------------------------------------------------------
/Cleaning Data in Python/Notes/ch2_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Cleaning Data in Python/Notes/ch2_slides.pdf
--------------------------------------------------------------------------------
/Cleaning Data in Python/Notes/ch3_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Cleaning Data in Python/Notes/ch3_slides.pdf
--------------------------------------------------------------------------------
/Cleaning Data in Python/Notes/ch4_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Cleaning Data in Python/Notes/ch4_slides.pdf
--------------------------------------------------------------------------------
/Cleaning Data in Python/Notes/ch5_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Cleaning Data in Python/Notes/ch5_slides.pdf
--------------------------------------------------------------------------------
/Conda Essentials/Codings/Chapter 2-Utilizing Channels.py:
--------------------------------------------------------------------------------
# Exercise
# Installing from a channel
# We saw in the last exercise that there are about 30,000 linux-64 packages on conda-forge. Across all the channels there are about 50,000 packages, most of them available for at least 3 of the 5 main platforms (osx-64, linux-32, linux-64, win-32, win-64; 32-bit support is of diminishing importance compared to 64-bit). There are around 2500 channels that have been active in the last 6 months; most are individual users, but a fair number belong to projects or organizations. A majority of package names are published by more than one channel; sometimes just as a copy, other times with a tweak or compiler optimization, or in a different version.

# The whole point of having channels is to be able to install packages from them. For this exercise, you will install a version of a package not available on the default channel. Adding a channel to install from simply requires using the same --channel or -c switch we have seen in other conda commands, but with the conda install command.

# For example:

# conda install --channel my-organization the-package

# Instructions 1/2
# A package named youtube-dl exists on conda-forge but is not available on the default channel. Please install it.
$ conda install -c conda-forge youtube-dl -y --no-deps

# Instructions 2/2
# You should examine what software is installed in your current environment now. You should notice that, unlike other packages, the newly installed youtube-dl came from a non-default channel.
$ conda list
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/Chinook.sqlite:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/Chinook.sqlite
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/L-L1_LOSC_4_V1-1126259446-32.hdf5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/L-L1_LOSC_4_V1-1126259446-32.hdf5
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/battledeath.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/battledeath.xlsx
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/disarea.dta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/disarea.dta
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/ja_data2.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/ja_data2.mat
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/sales.sas7bdat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Datasets/sales.sas7bdat
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Datasets/seaslug.txt:
--------------------------------------------------------------------------------
Time Percent
99 0.067
99 0.133
99 0.067
99 0
99 0
0 0.5
0 0.467
0 0.857
0 0.5
0 0.357
0 0.533
5 0.467
5 0.467
5 0.125
5 0.4
5 0.214
5 0.4
10 0.067
10 0.067
10 0.333
10 0.333
10 0.133
10 0.133
15 0.267
15 0.286
15 0.333
15 0.214
15 0
15 0
20 0.267
20 0.2
20 0.267
20 0.437
20 0.077
20 0.067
25 0.133
25 0.267
25 0.412
25 0
25 0.067
25 0.133
30 0
30 0.071
30 0
30 0.067
30 0.067
30 0.133
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Notes/ch2_pdf_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Notes/ch2_pdf_slides.pdf
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Notes/ch_1_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Notes/ch_1_slides.pdf
--------------------------------------------------------------------------------
/Importing Data in Python (Part 1)/Notes/ch_3_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 1)/Notes/ch_3_slides.pdf
--------------------------------------------------------------------------------
/Importing Data in Python (Part 2)/Datasets/latitude.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 2)/Datasets/latitude.xls
--------------------------------------------------------------------------------
/Importing Data in Python (Part 2)/Notes/ch_1_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 2)/Notes/ch_1_slides.pdf
--------------------------------------------------------------------------------
/Importing Data in Python (Part 2)/Notes/ch_2_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 2)/Notes/ch_2_slides.pdf
--------------------------------------------------------------------------------
/Importing Data in Python (Part 2)/Notes/ch_3_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Importing Data in Python (Part 2)/Notes/ch_3_slides.pdf
--------------------------------------------------------------------------------
/Interactive Data Visualization with Bokeh/Notes/ch1_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Interactive Data Visualization with Bokeh/Notes/ch1_slides.pdf
--------------------------------------------------------------------------------
/Interactive Data Visualization with Bokeh/Notes/ch2_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Interactive Data Visualization with Bokeh/Notes/ch2_slides.pdf
--------------------------------------------------------------------------------
/Interactive Data Visualization with Bokeh/Notes/ch4_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Interactive Data Visualization with Bokeh/Notes/ch4_slides.pdf
--------------------------------------------------------------------------------
/Interactive Data Visualization with Bokeh/Notes/ch5_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Interactive Data Visualization with Bokeh/Notes/ch5_slides.pdf
--------------------------------------------------------------------------------
/Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/1.py:
--------------------------------------------------------------------------------
# Exercise
# Line plot (1)
# With matplotlib, you can create a bunch of different plots in Python. The most basic plot is the line plot. A general recipe is given here.

# import matplotlib.pyplot as plt
# plt.plot(x,y)
# plt.show()

# In the video, you already saw how much the world population has grown over the past years. Will it continue to do so? The World Bank has estimates of the world population for the years 1950 up to 2100. The years are loaded in your workspace as a list called year, and the corresponding populations as a list called pop.

# This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the Python for Data Science Cheat Sheet and keep it handy!

# Instructions
# 100 XP
# print() the last item from both the year and the pop list to see what the predicted population for the year 2100 is. Use two print() functions.
# Before you can start, you should import matplotlib.pyplot as plt. pyplot is a sub-package of matplotlib, hence the dot.
# Use plt.plot() to build a line plot. year should be mapped on the horizontal axis, pop on the vertical axis. Don't forget to finish off with the show() function to actually display the plot.
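The recipe above relies on DataCamp preloading the year and pop lists in your workspace. A self-contained sketch of the same exercise, with made-up values standing in for the World Bank estimates (the numbers below are illustrative, not the real data):

```python
# Illustrative stand-ins for DataCamp's preloaded lists; values are made up.
import matplotlib
matplotlib.use("Agg")  # headless backend: render without opening a window
import matplotlib.pyplot as plt

year = [1950, 1975, 2000, 2025, 2050, 2075, 2100]
pop = [2.5, 4.1, 6.1, 8.0, 9.7, 10.5, 10.8]  # world population in billions (made up)

# Inspect the last (year 2100) entries
print(year[-1])  # 2100
print(pop[-1])   # 10.8

# Line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)
plt.savefig("line_plot.png")  # plt.show() would open a window interactively instead
```

With the Agg backend, savefig() replaces the interactive plt.show() call, which is handy when running the exercise scripts outside DataCamp.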
# Print the last item from year and pop
print(year[-1])
print(pop[-1])

# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt

# Make a line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)

# Display the plot with plt.show()
plt.show()
--------------------------------------------------------------------------------
/Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/10.py:
--------------------------------------------------------------------------------
# Sizes
# Right now, the scatter plot is just a cloud of blue dots, indistinguishable from each other. Let's change this. Wouldn't it be nice if the size of the dots corresponded to the population?

# To accomplish this, there is a list pop loaded in your workspace. It contains population numbers for each country expressed in millions. You can see that this list is added to the scatter method, as the argument s, for size.

# Instructions
# 100 XP
# Run the script to see how the plot changes.
# Looks good, but increasing the size of the bubbles will make things stand out more.
# Import the numpy package as np.
# Use np.array() to create a numpy array from the list pop. Call this NumPy array np_pop.
# Double the values in np_pop by assigning np_pop * 2 to np_pop again. Because np_pop is a NumPy array, each array element will be doubled.
# Change the s argument inside plt.scatter() to be np_pop instead of pop.
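The doubling step above works because NumPy applies arithmetic element-wise, whereas multiplying a plain Python list repeats it. A quick sketch with made-up population values:

```python
import numpy as np

pop = [31.9, 3.6, 33.3]  # illustrative populations, in millions

# NumPy arithmetic is element-wise: every entry is doubled
np_pop = np.array(pop) * 2
print(np_pop)   # [63.8  7.2 66.6]

# Contrast with list arithmetic: * 2 repeats the list instead
print(pop * 2)  # [31.9, 3.6, 33.3, 31.9, 3.6, 33.3]
```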
# Import numpy as np
import numpy as np

# Store pop as a numpy array: np_pop
np_pop = np.array(pop)

# Double np_pop
np_pop = np_pop * 2

# Update: set s argument to np_pop
plt.scatter(gdp_cap, life_exp, s = np_pop)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])

# Display the plot
plt.show()
--------------------------------------------------------------------------------
/Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/11.py:
--------------------------------------------------------------------------------
# Colors
# The code you've written up to now is available in the script on the right.

# The next step is making the plot more colorful! To do this, a list col has been created for you. It's a list with a color for each corresponding country, depending on the continent the country is part of.

# How did we make the list col you ask? The Gapminder data contains a list continent with the continent each country belongs to. A dictionary is constructed that maps continents onto colors:

# dict = {
#     'Asia':'red',
#     'Europe':'green',
#     'Africa':'blue',
#     'Americas':'yellow',
#     'Oceania':'black'
# }

# Nothing to worry about now; you will learn about dictionaries in the next chapter.

# Instructions
# 100 XP
# Add c = col to the arguments of the plt.scatter() function.
# Change the opacity of the bubbles by setting the alpha argument to 0.8 inside plt.scatter(). Alpha can be set from zero to one, where zero is totally transparent, and one is not at all transparent.
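A col list like the one described above can be built by looking each country's continent up in that dictionary. A minimal sketch (the continent values are made up, and the mapping is named colors here so it doesn't shadow Python's built-in dict):

```python
# Continent-to-color mapping, as described in the exercise
colors = {
    'Asia': 'red',
    'Europe': 'green',
    'Africa': 'blue',
    'Americas': 'yellow',
    'Oceania': 'black',
}

# Illustrative per-country continents; Gapminder's real list is much longer
continent = ['Asia', 'Europe', 'Africa', 'Americas']

# One color per country, in the same order as the countries
col = [colors[c] for c in continent]
print(col)  # ['red', 'green', 'blue', 'yellow']
```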
# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Show the plot
plt.show()
--------------------------------------------------------------------------------
/Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/12.py:
--------------------------------------------------------------------------------
# Additional Customizations
# If you have another look at the script, under # Additional Customizations, you'll see that there are two plt.text() functions now. They add the words "India" and "China" in the plot.

# Instructions
# 100 XP
# Add plt.grid(True) after the plt.text() calls so that gridlines are drawn on the plot.
# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])

# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call
plt.grid(True)

# Show the plot
plt.show()
--------------------------------------------------------------------------------
/Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/2.py:
--------------------------------------------------------------------------------
# Exercise
# Line plot (3)
# Now that you've built your first line plot, let's start working on the data that Professor Hans Rosling used to build his beautiful bubble chart. It was collected in 2007. Two lists are available for you:

# life_exp, which contains the life expectancy for each country, and
# gdp_cap, which contains the GDP per capita (i.e. per person) for each country, expressed in US Dollars.

# GDP stands for Gross Domestic Product. It basically represents the size of the economy of a country. Divide this by the population and you get the GDP per capita.

# matplotlib.pyplot is already imported as plt, so you can get started straight away.

# Instructions
# 100 XP
# Print the last item from both the list gdp_cap and the list life_exp; it is information about Zimbabwe.
# Build a line chart, with gdp_cap on the x-axis, and life_exp on the y-axis. Does it make sense to plot this data on a line plot?
# Don't forget to finish off with a plt.show() command, to actually display the plot.
19 | 20 | # Print the last item of gdp_cap and life_exp 21 | print(gdp_cap[-1]) 22 | print(life_exp[-1]) 23 | 24 | # Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis 25 | plt.plot(gdp_cap, life_exp) 26 | 27 | # Display the plot 28 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/3.py: -------------------------------------------------------------------------------- 1 | # Scatter Plot (1) 2 | # When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you're trying to assess if there's a correlation between two variables, for example, the scatter plot is the better choice. Below is an example of how to build a scatter plot. 3 | 4 | # import matplotlib.pyplot as plt 5 | # plt.scatter(x,y) 6 | # plt.show() 7 | # Let's continue with the gdp_cap versus life_exp plot, the GDP and life expectancy data for different countries in 2007. Maybe a scatter plot will be a better alternative? 8 | 9 | # Again, the matplotlib.pyplot package is available as plt. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Change the line plot that's coded in the script to a scatter plot. 14 | # A correlation will become clear when you display the GDP per capita on a logarithmic scale. Add the line plt.xscale('log'). 15 | # Finish off your script with plt.show() to display the plot. 16 | 17 | plt.scatter(gdp_cap, life_exp) 18 | 19 | # Put the x-axis on a logarithmic scale 20 | plt.xscale('log') 21 | 22 | # Show plot 23 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/4.py: -------------------------------------------------------------------------------- 1 | # Scatter plot (2) 2 | # In the previous exercise, you saw that higher GDP usually corresponds to a higher life expectancy.
In other words, there is a positive correlation. 3 | 4 | # Do you think there's a relationship between population and life expectancy of a country? The list life_exp from the previous exercise is already available. In addition, now also pop is available, listing the corresponding populations for the countries in 2007. The populations are in millions of people. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Start from scratch: import matplotlib.pyplot as plt. 9 | # Build a scatter plot, where pop is mapped on the horizontal axis, and life_exp is mapped on the vertical axis. 10 | # Finish the script with plt.show() to actually display the plot. Do you see a correlation? 11 | 12 | # Import package 13 | import matplotlib.pyplot as plt 14 | 15 | # Build Scatter plot 16 | plt.scatter(pop, life_exp) 17 | plt.xscale('log') 18 | 19 | # Show plot 20 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/5.py: -------------------------------------------------------------------------------- 1 | # Build a histogram (1) 2 | # life_exp, the list containing data on the life expectancy for different countries in 2007, is available in your Python shell. 3 | 4 | # To see how life expectancy in different countries is distributed, let's create a histogram of life_exp. 5 | 6 | # matplotlib.pyplot is already available as plt. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use plt.hist() to create a histogram of the values in life_exp. Do not specify the number of bins; Python will set the number of bins to 10 by default for you. 11 | # Add plt.show() to actually display the histogram. Can you tell which bin contains the most observations? 
12 | 13 | # Create histogram of life_exp data 14 | plt.hist(life_exp) 15 | 16 | # Display histogram 17 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/6.py: -------------------------------------------------------------------------------- 1 | # Build a histogram (2): bins 2 | # In the previous exercise, you didn't specify the number of bins. By default, Python sets the number of bins to 10 in that case. The number of bins is pretty important. Too few bins will oversimplify reality and won't show you the details. Too many bins will overcomplicate reality and won't show the bigger picture. 3 | 4 | # To control the number of bins your data is divided into, you can set the bins argument. 5 | 6 | # That's exactly what you'll do in this exercise. You'll be making two plots here. The code in the script already includes plt.show() and plt.clf() calls; plt.show() displays a plot; plt.clf() cleans it up again so you can start afresh. 7 | 8 | # As before, life_exp is available and matplotlib.pyplot is imported as plt. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Instructions 13 | # 100 XP 14 | # Build a histogram of life_exp, with 5 bins. Can you tell which bin contains the most observations? 15 | # Build another histogram of life_exp, this time with 20 bins. Is this better?
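The question "which bin contains the most observations?" can also be answered programmatically rather than by eyeballing the plot. A minimal sketch using np.histogram, which computes the same bin counts that plt.hist draws; the life_exp values below are an invented stand-in, since the course data isn't bundled here:

```python
import numpy as np

# Invented stand-in for the course's life_exp list (values are illustrative only)
life_exp = [43.8, 50.7, 59.4, 64.1, 66.8, 70.2, 72.5,
            73.0, 75.6, 76.4, 78.3, 79.8, 80.7, 81.2]

# np.histogram returns the per-bin counts and the bin edges without drawing anything
counts, edges = np.histogram(life_exp, bins=5)

# Index of the fullest bin, plus the interval it covers
fullest = int(np.argmax(counts))
print("counts per bin:", counts.tolist())
print("fullest bin spans [%.2f, %.2f)" % (edges[fullest], edges[fullest + 1]))
```

The same counts apply whether you draw with 5 or 20 bins; only the `bins` argument changes.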
16 | # Build histogram with 5 bins 17 | plt.hist(life_exp, bins = 5) 18 | 19 | # Show and clean up plot 20 | plt.show() 21 | plt.clf() 22 | 23 | # Build histogram with 20 bins 24 | plt.hist(life_exp, bins = 20) 25 | 26 | # Show and clean up again 27 | plt.show() 28 | plt.clf() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/7.py: -------------------------------------------------------------------------------- 1 | # Build a histogram (3): compare 2 | # In the video, you saw population pyramids for the present day and for the future. Because we were using a histogram, it was very easy to make a comparison. 3 | 4 | # Let's do a similar comparison. life_exp contains life expectancy data for different countries in 2007. You also have access to a second list now, life_exp1950, containing similar data for 1950. Can you make a histogram for both datasets? 5 | 6 | # You'll again be making two plots. The plt.show() and plt.clf() commands to render everything nicely are already included. Also matplotlib.pyplot is imported for you, as plt. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Build a histogram of life_exp with 15 bins. 11 | # Build a histogram of life_exp1950, also with 15 bins. Is there a big difference with the histogram for the 2007 data? 12 | 13 | # Histogram of life_exp, 15 bins 14 | plt.hist(life_exp, bins=15) 15 | 16 | # Show and clear plot 17 | plt.show() 18 | plt.clf() 19 | 20 | # Histogram of life_exp1950, 15 bins 21 | plt.hist(life_exp1950, bins=15) 22 | 23 | # Show and clear plot again 24 | plt.show() 25 | plt.clf() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/8.py: -------------------------------------------------------------------------------- 1 | # Labels 2 | # It's time to customize your own plot. This is the fun part, you will see your plot come to life! 
3 | 4 | # You're going to work on the scatter plot with world development data: GDP per capita on the x-axis (logarithmic scale), life expectancy on the y-axis. The code for this plot is available in the script. 5 | 6 | # As a first step, let's add axis labels and a title to the plot. You can do this with the xlabel(), ylabel() and title() functions, available in matplotlib.pyplot. This sub-package is already imported as plt. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Instructions 11 | # 100 XP 12 | # The strings xlab and ylab are already set for you. Use these variables to set the label of the x- and y-axis. 13 | # The string title is also coded for you. Use it to add a title to the plot. 14 | # After these customizations, finish the script with plt.show() to actually display the plot. 15 | 16 | # Basic scatter plot, log scale 17 | plt.scatter(gdp_cap, life_exp) 18 | plt.xscale('log') 19 | 20 | # Strings 21 | xlab = 'GDP per Capita [in USD]' 22 | ylab = 'Life Expectancy [in years]' 23 | title = 'World Development in 2007' 24 | 25 | # Add axis labels 26 | plt.xlabel(xlab) 27 | plt.ylabel(ylab) 28 | 29 | # Add title 30 | plt.title(title) 31 | 32 | # After customizing, display the plot 33 | plt.show() 34 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 1-Matplotlib/9.py: -------------------------------------------------------------------------------- 1 | # Ticks 2 | # The customizations you've coded up to now are available in the script, in a more concise form. 3 | 4 | # In the video, Filip has demonstrated how you could control the y-ticks by specifying two arguments: 5 | 6 | # plt.yticks([0,1,2], ["one","two","three"]) 7 | # In this example, the ticks corresponding to the numbers 0, 1 and 2 will be replaced by one, two and three, respectively. 8 | 9 | # Let's do a similar thing for the x-axis of your world development chart, with the xticks() function. 
The tick values 1000, 10000 and 100000 should be replaced by 1k, 10k and 100k. To this end, two lists have already been created for you: tick_val and tick_lab. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Use tick_val and tick_lab as inputs to the xticks() function to make the plot more readable. 14 | # As usual, display the plot with plt.show() after you've added the customizations. 15 | 16 | # Scatter plot 17 | plt.scatter(gdp_cap, life_exp) 18 | 19 | # Previous customizations 20 | plt.xscale('log') 21 | plt.xlabel('GDP per Capita [in USD]') 22 | plt.ylabel('Life Expectancy [in years]') 23 | plt.title('World Development in 2007') 24 | 25 | # Definition of tick_val and tick_lab 26 | tick_val = [1000, 10000, 100000] 27 | tick_lab = ['1k', '10k', '100k'] 28 | 29 | # Adapt the ticks on the x-axis 30 | plt.xticks(tick_val, tick_lab) 31 | 32 | # After customizing, display the plot 33 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/1.py: -------------------------------------------------------------------------------- 1 | # Motivation for dictionaries 2 | # To see why dictionaries are useful, have a look at the two lists defined on the right. countries contains the names of some European countries. capitals lists the corresponding capitals. 3 | 4 | # Instructions 5 | # 100 XP 6 | # Use the index() method on countries to find the index of 'germany'. Store this index as ind_ger. 7 | # Use ind_ger to access the capital of Germany from the capitals list. Print it out.
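One detail worth knowing before relying on the solution below: list.index() returns only the position of the first match, and it raises ValueError when the item is absent, so a membership check (or try/except) is the safer pattern. A small sketch using the same countries list:

```python
countries = ['spain', 'france', 'germany', 'norway']

# index() gives the position of the first occurrence
ind_ger = countries.index('germany')
print(ind_ger)  # 2

# A missing item raises ValueError, so guard the lookup
if 'italy' in countries:
    print(countries.index('italy'))
else:
    print("'italy' not found")
```

This fragility of parallel lists is exactly the motivation for dictionaries in the next exercises.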
8 | # Definition of countries and capital 9 | countries = ['spain', 'france', 'germany', 'norway'] 10 | capitals = ['madrid', 'paris', 'berlin', 'oslo'] 11 | 12 | # Get index of 'germany': ind_ger 13 | ind_ger = countries.index('germany') 14 | 15 | # Use ind_ger to print out capital of Germany 16 | print(capitals[ind_ger]) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/10.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # CSV to DataFrame (2) 4 | # Your read_csv() call to import the CSV data didn't generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name. 5 | 6 | # Remember index_col, an argument of read_csv(), that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here! 7 | 8 | # Python code that solves the previous exercise is already included; can you make the appropriate changes to fix the data import? 9 | 10 | # Instructions 11 | # 100 XP 12 | # Run the code with Submit Answer and assert that the first column should actually be used as row labels. 13 | # Specify the index_col argument inside pd.read_csv(): set it to 0, so that the first column is used as row labels. 14 | # Has the printout of cars improved now? 
15 | 16 | # Import pandas as pd 17 | import pandas as pd 18 | 19 | # Fix import by including index_col 20 | cars = pd.read_csv('https://assets.datacamp.com/production/course_799/datasets/cars.csv', index_col=0) 21 | 22 | 23 | # Print out cars 24 | print(cars) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/11.py: -------------------------------------------------------------------------------- 1 | # Square Brackets (1) 2 | # In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful way, is to use square brackets. 3 | 4 | # In the sample code on the right, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use: 5 | 6 | # cars['cars_per_cap'] 7 | # cars[['cars_per_cap']] 8 | # The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Use single square brackets to print out the country column of cars as a Pandas Series. 13 | # Use double square brackets to print out the country column of cars as a Pandas DataFrame. 14 | # Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.
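The Series-versus-DataFrame distinction is easy to verify with type(). A minimal sketch using a hand-built stand-in for the cars DataFrame, so no CSV download is needed (the values are illustrative):

```python
import pandas as pd

# Hand-built stand-in for the course's cars DataFrame
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588],
     'country': ['United States', 'Australia', 'Japan'],
     'drives_right': [True, False, False]},
    index=['US', 'AUS', 'JAP'])

print(type(cars['country']))                     # single brackets -> Series
print(type(cars[['country']]))                   # double brackets -> DataFrame
print(cars[['country', 'drives_right']].shape)   # (3, 2)
```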
15 | 16 | # Import cars data 17 | import pandas as pd 18 | cars = pd.read_csv('cars.csv', index_col = 0) 19 | 20 | # Print out country column as Pandas Series 21 | print(cars['country']) 22 | 23 | # Print out country column as Pandas DataFrame 24 | print(cars[['country']]) 25 | 26 | # Print out DataFrame with country and drives_right columns 27 | print(cars[['country','drives_right']]) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/12.py: -------------------------------------------------------------------------------- 1 | # Square Brackets (2) 2 | # Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame: 3 | 4 | # cars[0:5] 5 | # The result is another DataFrame containing only the rows you specified. 6 | 7 | # Pay attention: You can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer indexes of the rows here, not the row labels! 8 | 9 | # Instructions 10 | # 100 XP 11 | # Select the first 3 observations from cars and print them out. 12 | # Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out. 13 | 14 | # Import cars data 15 | import pandas as pd 16 | cars = pd.read_csv('cars.csv', index_col = 0) 17 | 18 | # Print out first 3 observations 19 | print(cars[0:3]) 20 | 21 | # Print out fourth, fifth and sixth observation 22 | print(cars[3:6]) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/13.py: -------------------------------------------------------------------------------- 1 | # loc and iloc (1) 2 | # With loc and iloc you can do practically any data selection operation on DataFrames you can think of. 
loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise. 3 | 4 | # Try out the following commands in the IPython Shell to experiment with loc and iloc to select observations. Each pair of commands here gives the same result. 5 | 6 | # cars.loc['RU'] 7 | # cars.iloc[4] 8 | 9 | # cars.loc[['RU']] 10 | # cars.iloc[[4]] 11 | 12 | # cars.loc[['RU', 'AUS']] 13 | # cars.iloc[[4, 1]] 14 | # As before, code is included that imports the cars data as a Pandas DataFrame. 15 | 16 | # Instructions 17 | # 100 XP 18 | # Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JAP, the index is 2. Make sure to print the resulting Series. 19 | # Use loc or iloc to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting cars in the IPython Shell. Make sure to print the resulting DataFrame. 20 | 21 | # Import cars data 22 | import pandas as pd 23 | cars = pd.read_csv('cars.csv', index_col = 0) 24 | 25 | # Print out observation for Japan 26 | print(cars.loc['JAP']) 27 | 28 | # Print out observations for Australia and Egypt 29 | print(cars.loc[['AUS','EG']]) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/14.py: -------------------------------------------------------------------------------- 1 | # loc and iloc (2) 2 | # loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell. Again, paired commands produce the same result. 
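That the paired loc and iloc commands really do agree can be checked with .equals(). A sketch on a small stand-in frame that mimics the row order assumed by the course examples (US, AUS, JAP, IN, RU):

```python
import pandas as pd

# Stand-in for cars.csv with the row order assumed by the course examples
cars = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200],
     'country': ['United States', 'Australia', 'Japan', 'India', 'Russia']},
    index=['US', 'AUS', 'JAP', 'IN', 'RU'])

# Label-based and position-based selection return identical objects
print(cars.loc['RU'].equals(cars.iloc[4]))                 # True
print(cars.loc[['RU', 'AUS']].equals(cars.iloc[[4, 1]]))   # True
```

The difference only matters once rows are reordered or the index is changed: labels follow the data, integer positions do not.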
3 | 4 | # cars.loc['IN', 'cars_per_cap'] 5 | # cars.iloc[3, 0] 6 | 7 | # cars.loc[['IN', 'RU'], 'cars_per_cap'] 8 | # cars.iloc[[3, 4], 0] 9 | 10 | # cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']] 11 | # cars.iloc[[3, 4], [0, 1]] 12 | # Instructions 13 | # 100 XP 14 | # Instructions 15 | # 100 XP 16 | # Print out the drives_right value of the row corresponding to Morocco (its row label is MOR). 17 | # Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right. 18 | 19 | # Import cars data 20 | import pandas as pd 21 | cars = pd.read_csv('cars.csv', index_col = 0) 22 | 23 | # Print out drives_right value of Morocco 24 | print(cars.loc['MOR', 'drives_right']) 25 | 26 | # Print sub-DataFrame 27 | print(cars.loc[['RU','MOR'],['country','drives_right']]) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/15.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # loc and iloc (3) 4 | # It's also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma: 5 | 6 | # cars.loc[:, 'country'] 7 | # cars.iloc[:, 1] 8 | 9 | # cars.loc[:, ['country','drives_right']] 10 | # cars.iloc[:, [1, 2]] 11 | # Instructions 12 | # 100 XP 13 | # Instructions 14 | # 100 XP 15 | # Print out the drives_right column as a Series using loc or iloc. 16 | # Print out the drives_right column as a DataFrame using loc or iloc. 17 | # Print out both the cars_per_cap and drives_right column as a DataFrame using loc or iloc.
18 | 19 | # Import cars data 20 | import pandas as pd 21 | cars = pd.read_csv('cars.csv', index_col = 0) 22 | 23 | # Print out drives_right column as Series 24 | print(cars.loc[:, 'drives_right']) 25 | 26 | # Print out drives_right column as DataFrame 27 | print(cars.loc[:, ['drives_right']]) 28 | 29 | # Print out cars_per_cap and drives_right as DataFrame 30 | print(cars.loc[:, ['cars_per_cap','drives_right']]) 31 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/2.py: -------------------------------------------------------------------------------- 1 | # Create dictionary 2 | # The countries and capitals lists are again available in the script. It's your job to convert this data to a dictionary where the country names are the keys and the capitals are the corresponding values. As a refresher, here is a recipe for creating a dictionary: 3 | 4 | # my_dict = { 5 | # "key1":"value1", 6 | # "key2":"value2", 7 | # } 8 | # In this recipe, both the keys and the values are strings. This will also be the case for this exercise. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Instructions 13 | # 100 XP 14 | # With the strings in countries and capitals, create a dictionary called europe with 4 key:value pairs. Beware of capitalization! Make sure you use lowercase characters everywhere. 15 | # Print out europe to see if the result is what you expected. 
16 | 17 | # Definition of countries and capital 18 | countries = ['spain', 'france', 'germany', 'norway'] 19 | capitals = ['madrid', 'paris', 'berlin', 'oslo'] 20 | 21 | # From string in countries and capitals, create dictionary europe 22 | europe = {'spain':'madrid', 23 | 'france':'paris', 24 | 'germany':'berlin', 25 | 'norway':'oslo'} 26 | 27 | # Print europe 28 | print(europe) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/3.py: -------------------------------------------------------------------------------- 1 | # Access dictionary 2 | # If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for France from europe you can use: 3 | 4 | # europe['france'] 5 | # Here, 'france' is the key and 'paris' the value is returned. 6 | 7 | # Instructions 8 | # 100 XP 9 | # Check out which keys are in europe by calling the keys() method on europe. Print out the result. 10 | # Print out the value that belongs to the key 'norway'. 11 | 12 | # Definition of dictionary 13 | europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' } 14 | 15 | 16 | # Print out the keys in europe 17 | print(europe.keys()) 18 | 19 | # Print out value that belongs to key 'norway' 20 | print(europe['norway']) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/4.py: -------------------------------------------------------------------------------- 1 | # Dictionary Manipulation (1) 2 | # If you know how to access a dictionary, you can also assign a new value to it. To add a new key-value pair to europe you can use something like this: 3 | 4 | # europe['iceland'] = 'reykjavik' 5 | # Instructions 6 | # 100 XP 7 | # Add the key 'italy' with the value 'rome' to europe. 
8 | # To assert that 'italy' is now a key in europe, print out 'italy' in europe. 9 | # Add another key:value pair to europe: 'poland' is the key, 'warsaw' is the corresponding value. 10 | # Print out europe. 11 | 12 | # Definition of dictionary 13 | europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' } 14 | 15 | # Add italy to europe 16 | europe['italy'] = 'rome' 17 | 18 | # Print out italy in europe 19 | print('italy' in europe) 20 | 21 | # Add poland to europe 22 | europe['poland'] = 'warsaw' 23 | 24 | # Print europe 25 | print(europe) 26 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/5.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Dictionary Manipulation (2) 4 | # Somebody thought it would be funny to mess with your accurately generated dictionary. An adapted version of the europe dictionary is available in the script on the right. 5 | 6 | # Can you clean up? Do not do this by adapting the definition of europe, but by adding Python commands to the script to update and remove key:value pairs. 7 | 8 | # Instructions 9 | # 100 XP 10 | # The capital of Germany is not 'bonn'; it's 'berlin'. Update its value. 11 | # Australia is not in Europe, Austria is! Remove the key 'australia' from europe. 12 | # Print out europe to see if your cleaning work paid off. 
13 | 14 | # Definition of dictionary 15 | europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn', 16 | 'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 17 | 'australia':'vienna' } 18 | 19 | # Update capital of germany 20 | europe['germany'] = 'berlin' 21 | 22 | # Remove australia 23 | del europe['australia'] 24 | 25 | # Print europe 26 | print(europe) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/6.py: -------------------------------------------------------------------------------- 1 | # Dictionariception 2 | # Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries. 3 | 4 | # As an example, have a look at the script where another version of europe - the dictionary you've been working with all along - is coded. The keys are still the country names, but the values are dictionaries that contain more information than just the capital. 5 | 6 | # It's perfectly possible to chain square brackets to select elements. To fetch the population for Spain from europe, for example, you need: 7 | 8 | # europe['spain']['population'] 9 | # Instructions 10 | # 100 XP 11 | # Use chained square brackets to select and print out the capital of France. 12 | # Create a dictionary, named data, with the keys 'capital' and 'population'. Set them to 'rome' and 59.83, respectively. 13 | # Add a new key-value pair to europe; the key is 'italy' and the value is data, the dictionary you just built.
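Chained square brackets raise a KeyError as soon as one key is missing; dict.get() with a default is the safe alternative. A short sketch with a trimmed-down version of the nested europe dictionary:

```python
# Trimmed-down version of the nested europe dictionary
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}

# Chained brackets drill into the inner dictionary
print(europe['france']['capital'])  # paris

# get() with a default avoids a KeyError for missing countries
print(europe.get('italy', {}).get('capital', 'unknown'))  # unknown
```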
14 | # Dictionary of dictionaries 15 | europe = { 'spain': { 'capital':'madrid', 'population':46.77 }, 16 | 'france': { 'capital':'paris', 'population':66.03 }, 17 | 'germany': { 'capital':'berlin', 'population':80.62 }, 18 | 'norway': { 'capital':'oslo', 'population':5.084 } } 19 | 20 | 21 | # Print out the capital of France 22 | print(europe['france']['capital']) 23 | # Create sub-dictionary data 24 | data = {'capital':'rome','population':59.83} 25 | 26 | # Add data to europe under key 'italy' 27 | europe['italy'] = data 28 | 29 | # Print europe 30 | print(europe) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/8.py: -------------------------------------------------------------------------------- 1 | # Dictionary to DataFrame (2) 2 | # The Python code that solves the previous exercise is included on the right. Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6? 3 | 4 | # To solve this a list row_labels has been created. You can use it to specify the row labels of the cars DataFrame. You do this by setting the index attribute of cars, that you can access as cars.index. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Hit Submit Answer to see that, indeed, the row labels are not correctly set. 9 | # Specify the row labels by setting cars.index equal to row_labels. 10 | # Print out cars again and check if the row labels are correct this time. 
11 | 12 | import pandas as pd 13 | 14 | # Build cars DataFrame 15 | names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'] 16 | dr = [True, False, False, False, True, True, True] 17 | cpc = [809, 731, 588, 18, 200, 70, 45] 18 | dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc } 19 | cars = pd.DataFrame(dict) 20 | print(cars) 21 | 22 | # Definition of row_labels 23 | row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'] 24 | 25 | # Specify row labels of cars 26 | cars.index = row_labels 27 | 28 | # Print cars again 29 | print(cars) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 2-Dictionaries & Pandas/9.py: -------------------------------------------------------------------------------- 1 | # CSV to DataFrame (1) 2 | # Putting data in a dictionary and then building a DataFrame works, but it's not very efficient. What if you're dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for "comma-separated values". 3 | 4 | # To import CSV data into Python as a Pandas DataFrame you can use read_csv(). 5 | 6 | # Let's explore this function with the same cars data from the previous exercises. This time, however, the data is available in a CSV file, named cars.csv. It is available in your current working directory, so the path to the file is simply 'cars.csv'. 7 | 8 | # Instructions 9 | # 100 XP 10 | # To import CSV files you still need the pandas package: import it as pd. 11 | # Use pd.read_csv() to import cars.csv data as a DataFrame. Store this dataframe as cars. 12 | # Print out cars. Does everything look OK? 
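The effect of the index_col argument (the subject of 10.py above) can be demonstrated without downloading cars.csv by reading an inline CSV string. A sketch; the two-row sample below is made up:

```python
import pandas as pd
from io import StringIO

# Made-up two-row sample standing in for cars.csv
csv_text = "code,country,drives_right\nUS,United States,True\nAUS,Australia,False\n"

# Without index_col, the labels land in an ordinary column
plain = pd.read_csv(StringIO(csv_text))
print(plain.columns.tolist())    # ['code', 'country', 'drives_right']

# With index_col=0, the first column becomes the row index
indexed = pd.read_csv(StringIO(csv_text), index_col=0)
print(indexed.index.tolist())    # ['US', 'AUS']
```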
13 | 14 | # Import pandas as pd 15 | import pandas as pd 16 | 17 | # Import the cars.csv data: cars 18 | cars = pd.read_csv('https://assets.datacamp.com/production/course_799/datasets/cars.csv') 19 | 20 | # Print out cars 21 | print(cars) 22 | 23 | 24 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/1.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Equality 4 | # To check if two Python values, or variables, are equal you can use ==. To check for inequality, you need !=. As a refresher, have a look at the following examples that all result in True. Feel free to try them out in the IPython Shell. 5 | 6 | # 2 == (1 + 1) 7 | # "intermediate" != "python" 8 | # True != False 9 | # "Python" != "python" 10 | # When you write these comparisons in a script, you will need to wrap a print() function around them to see the output. 11 | 12 | # Instructions 13 | # 100 XP 14 | # Instructions 15 | # 100 XP 16 | # In the editor on the right, write code to see if True equals False. 17 | # Write Python code to check if -5 * 15 is not equal to 75. 18 | # Ask Python whether the strings "pyscript" and "PyScript" are equal. 19 | # What happens if you compare booleans and integers? Write code to see if True and 1 are equal. 
20 | # Comparison of booleans 21 | print(True == False) 22 | 23 | # Comparison of integers 24 | print(-5 * 15 != 75) 25 | 26 | # Comparison of strings 27 | print("pyscript" == "PyScript") 28 | 29 | # Compare a boolean with an integer 30 | print(True == 1) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/10.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Driving right (2) 4 | # The code in the previous example worked fine, but you actually unnecessarily created a new variable dr. You can achieve the same result without this intermediate variable. Put the code that computes dr straight into the square brackets that select observations from cars. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Convert the code on the right to a one-liner that calculates the variable sel as before. 9 | 10 | # Import cars data 11 | import pandas as pd 12 | cars = pd.read_csv('cars.csv', index_col = 0) 13 | 14 | # Convert code to a one-liner 15 | dr = cars['drives_right'] 16 | sel = cars[dr] 17 | 18 | # Print sel 19 | print(sel) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/11.py: -------------------------------------------------------------------------------- 1 | # Cars per capita (1) 2 | # Let's stick to the cars data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars. 3 | 4 | # Similar to the previous example, you'll want to build up a boolean Series, that you can then use to subset the cars DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine! 
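The comparison-then-subset pattern described above can be sketched on a tiny stand-in DataFrame (the values are illustrative):

```python
import pandas as pd

# Tiny stand-in for the cars DataFrame
cars = pd.DataFrame({'cars_per_cap': [809, 731, 588, 18, 200]},
                    index=['US', 'AUS', 'JAP', 'IN', 'RU'])

# A comparison on a column yields a boolean Series...
many_cars = cars['cars_per_cap'] > 500
print(many_cars.tolist())                 # [True, True, True, False, False]

# ...which selects the matching rows when passed back in square brackets
print(cars[many_cars].index.tolist())     # ['US', 'AUS', 'JAP']
```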
5 | 6 | # Instructions 7 | # 100 XP 8 | # Instructions 9 | # 100 XP 10 | # Select the cars_per_cap column from cars as a Pandas Series and store it as cpc. 11 | # Use cpc in combination with a comparison operator and 500. You want to end up with a boolean Series that's True if the corresponding country has a cars_per_cap of more than 500 and False otherwise. Store this boolean Series as many_cars. 12 | # Use many_cars to subset cars, similar to what you did before. Store the result as car_maniac. 13 | # Print out car_maniac to see if you got it right. 14 | 15 | # Import cars data 16 | import pandas as pd 17 | cars = pd.read_csv('cars.csv', index_col = 0) 18 | 19 | # Create car_maniac: observations that have a cars_per_cap over 500 20 | cpc = cars["cars_per_cap"] 21 | many_cars = cpc > 500 22 | car_maniac = cars[many_cars] 23 | 24 | # Print car_maniac 25 | print(car_maniac) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/12.py: -------------------------------------------------------------------------------- 1 | # Cars per capita (2) 2 | # Remember about np.logical_and(), np.logical_or() and np.logical_not(), the Numpy variants of the and, or and not operators? You can also use them on Pandas Series to do more advanced filtering operations. 3 | 4 | # Take this example that selects the observations that have a cars_per_cap between 10 and 80. Try out these lines of code step by step to see what's happening. 5 | 6 | # cpc = cars['cars_per_cap'] 7 | # between = np.logical_and(cpc > 10, cpc < 80) 8 | # medium = cars[between] 9 | # Instructions 10 | # 100 XP 11 | # Use the code sample above to create a DataFrame medium, that includes all the observations of cars that have a cars_per_cap between 100 and 500. 12 | # Print out medium. 
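The filtering described above can be sketched on a small stand-in DataFrame. The values below are hypothetical placeholders for cars.csv, which ships with the course; pandas and numpy are assumed to be installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cars.csv data used in the course
cars = pd.DataFrame({"cars_per_cap": [809, 731, 588, 200, 70, 45, 18]},
                    index=["US", "AUS", "JAP", "RU", "MOR", "EG", "IN"])

cpc = cars["cars_per_cap"]
between = np.logical_and(cpc > 100, cpc < 500)  # element-wise AND -> boolean Series
medium = cars[between]                          # keeps only the True rows
print(medium)
```

Note that the plain `and` operator would raise an error here; `np.logical_and()` (or the `&` operator) works element-wise on a Series.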
13 | 14 | # Import cars data 15 | import pandas as pd 16 | cars = pd.read_csv('cars.csv', index_col = 0) 17 | 18 | # Import numpy, you'll need this 19 | import numpy as np 20 | 21 | # Create medium: observations with cars_per_cap between 100 and 500 22 | cpc = cars["cars_per_cap"] 23 | between = np.logical_and(cpc > 100, cpc < 500) 24 | medium = cars[between] 25 | 26 | # Print medium 27 | print(medium) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/2.py: -------------------------------------------------------------------------------- 1 | # Greater and less than 2 | # In the video, Filip also talked about the less than and greater than signs, < and > in Python. You can combine them with an equals sign: <= and >=. Pay attention: <= is valid syntax, but =< is not. 3 | 4 | # All Python expressions in the following code chunk evaluate to True: 5 | 6 | # 3 < 4 7 | # 3 <= 4 8 | # "alpha" <= "beta" 9 | # Remember that for string comparison, Python determines the relationship based on alphabetical order. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Write Python expressions, wrapped in a print() function, to check whether: 14 | # x is greater than or equal to -10. x has already been defined for you. 15 | # "test" is less than or equal to y. y has already been defined for you. 16 | # True is greater than False. 
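As a quick illustration of the alphabetical-order rule mentioned above: string comparison works character by character on Unicode code points, and booleans compare as the integers 1 and 0.

```python
# Strings compare character by character, by Unicode code point.
print("alpha" <= "beta")    # True: 'a' (97) sorts before 'b' (98)
print(ord("a"), ord("b"))   # 97 98

# Uppercase letters have lower code points than lowercase ones.
print("Python" < "python")  # True: 'P' (80) < 'p' (112)

# Booleans compare as integers: True is 1, False is 0.
print(True > False)         # True
```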
17 | 18 | # Comparison of integers 19 | x = -3 * 6 20 | print(x >= -10) 21 | 22 | # Comparison of strings 23 | y = "test" 24 | print("test" <= y) 25 | 26 | # Comparison of booleans 27 | print(True > False) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/3.py: -------------------------------------------------------------------------------- 1 | # Compare arrays 2 | # Out of the box, you can also use comparison operators with Numpy arrays. 3 | 4 | # Remember areas, the list of area measurements for different rooms in your house from the previous course? This time there's two Numpy arrays: my_house and your_house. They both contain the areas for the kitchen, living room, bedroom and bathroom in the same order, so you can compare them. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Instructions 9 | # 100 XP 10 | # Using comparison operators, generate boolean arrays that answer the following questions: 11 | # Which areas in my_house are greater than or equal to 18? 12 | # You can also compare two Numpy arrays element-wise. Which areas in my_house are smaller than the ones in your_house? 13 | # Make sure to wrap both commands in a print() statement, so that you can inspect the output. 14 | 15 | # Create arrays 16 | import numpy as np 17 | my_house = np.array([18.0, 20.0, 10.75, 9.50]) 18 | your_house = np.array([14.0, 24.0, 14.25, 9.0]) 19 | 20 | # my_house greater than or equal to 18 21 | print(my_house >= 18) 22 | 23 | # my_house less than your_house 24 | print(my_house < your_house) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/4.py: -------------------------------------------------------------------------------- 1 | # and, or, not (1) 2 | # A boolean is either 1 or 0, True or False. 
With boolean operators such as and, or and not, you can combine these booleans to perform more advanced queries on your data. 3 | 4 | # In the sample code on the right, two variables are defined: my_kitchen and your_kitchen, representing areas. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Write Python expressions, wrapped in a print() function, to check whether: 9 | # my_kitchen is bigger than 10 and smaller than 18. 10 | # my_kitchen is smaller than 14 or bigger than 17. 11 | # double the area of my_kitchen is smaller than triple the area of your_kitchen. 12 | # Define variables 13 | my_kitchen = 18.0 14 | your_kitchen = 14.0 15 | 16 | # my_kitchen bigger than 10 and smaller than 18? 17 | print(my_kitchen > 10 and my_kitchen < 18) 18 | 19 | # my_kitchen smaller than 14 or bigger than 17? 20 | print(my_kitchen < 14 or my_kitchen > 17) 21 | 22 | 23 | # Double my_kitchen smaller than triple your_kitchen? 24 | print(2 * my_kitchen < 3 * your_kitchen) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/5.py: -------------------------------------------------------------------------------- 1 | # Boolean operators with Numpy 2 | # Before, the comparison operators like < and >= worked with Numpy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not. 3 | 4 | # To use these operators with Numpy, you will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the my_house and your_house arrays from before to give you an idea: 5 | 6 | # np.logical_and(your_house > 13, 7 | # your_house < 15) 8 | # Instructions 9 | # 100 XP 10 | # Instructions 11 | # 100 XP 12 | # Generate boolean arrays that answer the following questions: 13 | # Which areas in my_house are greater than 18.5 or smaller than 10? 14 | # Which areas are smaller than 11 in both my_house and your_house?
Make sure to wrap both commands in a print() statement, so that you can inspect the output. 15 | 16 | # Create arrays 17 | import numpy as np 18 | my_house = np.array([18.0, 20.0, 10.75, 9.50]) 19 | your_house = np.array([14.0, 24.0, 14.25, 9.0]) 20 | 21 | # my_house greater than 18.5 or smaller than 10 22 | print(np.logical_or(my_house > 18.5, my_house < 10)) 23 | 24 | # Both my_house and your_house smaller than 11 25 | print(np.logical_and(my_house < 11, your_house < 11)) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/6.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # if 4 | # It's time to take a closer look around in your house. 5 | 6 | # Two variables are defined in the sample code: room, a string that tells you which room of the house we're looking at, and area, the area of that room. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Examine the if statement that prints out "Looking around in the kitchen." if room equals "kit". 11 | # Write another if statement that prints out "big place!" if area is greater than 15. 12 | 13 | 14 | room = "kit" 15 | area = 14.0 16 | 17 | # if statement for room 18 | if room == "kit" : 19 | print("looking around in the kitchen.") 20 | 21 | # if statement for area 22 | if area > 15: 23 | print("big place!") -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/7.py: -------------------------------------------------------------------------------- 1 | # Add else 2 | # On the right, the if construct for room has been extended with an else statement so that "looking around elsewhere." is printed if the condition room == "kit" evaluates to False.
3 | 4 | # Can you do a similar thing to add more functionality to the if construct for area? 5 | 6 | # Instructions 7 | # 100 XP 8 | # Add an else statement to the second control structure so that "pretty small." is printed out if area > 15 evaluates to False. 9 | 10 | # Define variables 11 | room = "kit" 12 | area = 14.0 13 | 14 | # if-else construct for room 15 | if room == "kit" : 16 | print("looking around in the kitchen.") 17 | else : 18 | print("looking around elsewhere.") 19 | 20 | # if-else construct for area 21 | if area > 15 : 22 | print("big place!") 23 | else: 24 | print("pretty small.") 25 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/8.py: -------------------------------------------------------------------------------- 1 | # Customize further: elif 2 | # It's also possible to have a look around in the bedroom. The sample code contains an elif part that checks if room equals "bed". In that case, "looking around in the bedroom." is printed out. 3 | 4 | # It's up to you now! Make a similar addition to the second control structure to further customize the messages for different values of area. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Add an elif to the second control structure such that "medium size, nice!" is printed out if area is greater than 10. 
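The order of the branches matters: conditions are tested top to bottom and the first one that is True wins, which is why area > 15 must come before area > 10. A small sketch, wrapping the construct in a helper function (the function name is just for illustration):

```python
def describe(area):
    """Return a message for a room area; the first true branch wins."""
    if area > 15:
        return "big place!"
    elif area > 10:
        return "medium size, nice!"
    else:
        return "pretty small."

print(describe(18.5))  # big place!
print(describe(14.0))  # medium size, nice!
print(describe(9.0))   # pretty small.
```

If the area > 10 test came first, an area of 18.5 would be reported as medium, since later branches are never reached once a condition matches.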
9 | 10 | # Define variables 11 | room = "bed" 12 | area = 14.0 13 | 14 | # if-elif-else construct for room 15 | if room == "kit" : 16 | print("looking around in the kitchen.") 17 | elif room == "bed": 18 | print("looking around in the bedroom.") 19 | else : 20 | print("looking around elsewhere.") 21 | 22 | # if-elif-else construct for area 23 | if area > 15 : 24 | print("big place!") 25 | elif area > 10: 26 | print("medium size, nice!") 27 | else : 28 | print("pretty small.") -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 3-Logic, Control Flow and Filtering/9.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Driving right (1) 4 | # Remember that cars dataset, containing the cars per 1000 people (cars_per_cap) and whether people drive right (drives_right) for different countries (country)? The code that imports this data in CSV format into Python as a DataFrame is available on the right. 5 | 6 | # In the video, you saw a step-by-step approach to filter observations from a DataFrame based on boolean arrays. Let's start simple and try to find all observations in cars where drives_right is True. 7 | 8 | # drives_right is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from cars. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Extract the drives_right column as a Pandas Series and store it as dr. 13 | # Use dr, a boolean Series, to subset the cars DataFrame. Store the resulting selection in sel. 14 | # Print sel, and assert that drives_right is True for all observations. 
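The boolean-Series subsetting described above, including the assert the instructions mention, can be sketched on a tiny stand-in DataFrame (the values are hypothetical placeholders for cars.csv, which ships with the course; pandas is assumed installed):

```python
import pandas as pd

# Hypothetical stand-in for the cars.csv data used in the course
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 200],
     "drives_right": [True, False, False, True]},
    index=["US", "AUS", "JAP", "RU"])

dr = cars["drives_right"]  # boolean Series
sel = cars[dr]             # keeps only rows where dr is True

# Every observation left in sel drives on the right
assert sel["drives_right"].all()
print(sel)
```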
15 | 16 | # Import cars data 17 | import pandas as pd 18 | cars = pd.read_csv('cars.csv', index_col = 0) 19 | 20 | # Extract drives_right column as Series: dr 21 | dr = cars["drives_right"] 22 | 23 | # Use dr to subset cars: sel 24 | sel = cars[dr] 25 | 26 | # Print sel 27 | print(sel) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/1.py: -------------------------------------------------------------------------------- 1 | # Basic while loop 2 | # Below you can find the example from the video where the error variable, initially equal to 50.0, is divided by 4 and printed out on every run: 3 | 4 | # error = 50.0 5 | # while error > 1 : 6 | # error = error / 4 7 | # print(error) 8 | # This example will come in handy, because it's time to build a while loop yourself! We're going to code a while loop that implements a very basic control system for an inverted pendulum. If there's an offset from standing perfectly straight, the while loop will incrementally fix this offset. 9 | 10 | # Note that if your while loop takes too long to run, you might have made a mistake! 11 | 12 | # Instructions 13 | # 100 XP 14 | # Create the variable offset with an initial value of 8. 15 | # Code a while loop that keeps running as long as offset is not equal to 0. Inside the while loop: 16 | # Print out the sentence "correcting...". 17 | # Next, decrease the value of offset by 1. You can do this with offset = offset - 1. 18 | # Finally, print out offset so you can see how it changes. 
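A sketch of the loop described above, additionally collecting each intermediate value in a list so the behaviour is easy to inspect:

```python
offset = 8
trace = []

# The loop body runs once per correction until offset reaches 0.
while offset != 0:
    print("correcting...")
    offset = offset - 1
    trace.append(offset)
    print(offset)

print(trace)  # [7, 6, 5, 4, 3, 2, 1, 0]
```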
19 | # Initialize offset 20 | offset = 8 21 | 22 | # Code the while loop 23 | while offset != 0: 24 | print("correcting...") 25 | offset = offset - 1 26 | print(offset) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/10.py: -------------------------------------------------------------------------------- 1 | # Loop over DataFrame (2) 2 | # The row data that's generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets: 3 | 4 | # for lab, row in brics.iterrows() : 5 | # print(row['country']) 6 | # Instructions 7 | # 100 XP 8 | # Adapt the code in the for loop such that the first iteration prints out "US: 809", the second iteration "AUS: 731", and so on. The output should be in the form "country: cars_per_cap". Make sure to print out this exact string, with the correct spacing. 9 | 10 | # Import cars data 11 | import pandas as pd 12 | cars = pd.read_csv('cars.csv', index_col = 0) 13 | 14 | # Adapt for loop 15 | for lab, row in cars.iterrows() : 16 | print(lab + ': ' + str(row['cars_per_cap'])) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/11.py: -------------------------------------------------------------------------------- 1 | # Add column (1) 2 | # In the video, Filip showed you how to add the length of the country names of the brics DataFrame in a new column: 3 | 4 | # for lab, row in brics.iterrows() : 5 | # brics.loc[lab, "name_length"] = len(row["country"]) 6 | # You can do similar things on the cars DataFrame. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the "country" column. You can use the string method upper() for this.
11 | # To see if your code worked, print out cars. Don't indent this code, so that it's not part of the for loop. 12 | 13 | # Import cars data 14 | import pandas as pd 15 | cars = pd.read_csv('cars.csv', index_col = 0) 16 | 17 | # Code for loop that adds COUNTRY column 18 | for lab, row in cars.iterrows() : 19 | cars.loc[lab, "COUNTRY"] = row["country"].upper() 20 | 21 | 22 | # Print cars 23 | print(cars) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/12.py: -------------------------------------------------------------------------------- 1 | # Add column (2) 2 | # Using iterrows() to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you're creating a new Pandas Series. 3 | 4 | # If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you'll want to use apply(). 5 | 6 | # Compare the iterrows() version with the apply() version to get the same result in the brics DataFrame: 7 | 8 | # for lab, row in brics.iterrows() : 9 | # brics.loc[lab, "name_length"] = len(row["country"]) 10 | 11 | # brics["name_length"] = brics["country"].apply(len) 12 | # We can do a similar thing to call the upper() method on every name in the country column. However, upper() is a method, so we'll need a slightly different approach: 13 | 14 | # Instructions 15 | # 100 XP 16 | # Instructions 17 | # 100 XP 18 | # Replace the for loop with a one-liner that uses .apply(str.upper). The call should give the same result: a column COUNTRY should be added to cars, containing an uppercase version of the country names. 
19 | # As usual, print out cars to see the fruits of your hard labor 20 | # Import cars data 21 | import pandas as pd 22 | cars = pd.read_csv('cars.csv', index_col = 0) 23 | 24 | # Use .apply(str.upper) 25 | cars["COUNTRY"] = cars["country"].apply(str.upper) 26 | print(cars) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/2.py: -------------------------------------------------------------------------------- 1 | # Add conditionals 2 | # The while loop that corrects the offset is a good start, but what if offset is negative? You can try to run the following code where offset is initialized to -6: 3 | 4 | # # Initialize offset 5 | # offset = -6 6 | 7 | # # Code the while loop 8 | # while offset != 0 : 9 | # print("correcting...") 10 | # offset = offset - 1 11 | # print(offset) 12 | # but your session will be disconnected. The while loop will never stop running, because offset will be further decreased on every run. offset != 0 will never become False and the while loop continues forever. 13 | 14 | # Fix things by putting an if-else statement inside the while loop. If your code is still taking too long to run, you probably made a mistake! 15 | 16 | # Instructions 17 | # 100 XP 18 | # Inside the while loop, replace offset = offset - 1 by an if-else statement: 19 | # If offset is greater than zero, you should decrease offset by 1. 20 | # Else, you should increase offset by 1. 21 | # If you've coded things correctly, hitting Submit Answer should work this time. If your code is still taking too long to run, you probably made a mistake! 
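The if-else fix described above can be packaged as a small helper to show that the loop now terminates from either side of zero (the function name is just for illustration):

```python
def correct(offset):
    """Drive offset to zero one unit at a time; return the number of steps."""
    steps = 0
    while offset != 0:
        if offset > 0:
            offset = offset - 1
        else:
            offset = offset + 1
        steps = steps + 1
    return steps

print(correct(8))   # 8
print(correct(-6))  # 6: no infinite loop for negative offsets anymore
```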
22 | 23 | # Initialize offset 24 | offset = -6 25 | 26 | # Code the while loop 27 | while offset != 0 : 28 | print("correcting...") 29 | 30 | if offset > 0: 31 | offset = offset - 1 32 | else: 33 | offset = offset + 1 34 | 35 | print(offset) 36 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/3.py: -------------------------------------------------------------------------------- 1 | # Loop over a list 2 | # Have another look at the for loop that Filip showed in the video: 3 | 4 | # fam = [1.73, 1.68, 1.71, 1.89] 5 | # for height in fam : 6 | # print(height) 7 | # As usual, you simply have to indent the code with 4 spaces to tell Python which code should be executed in the for loop. 8 | 9 | # The areas variable, containing the area of different rooms in your house, is already defined. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Write a for loop that iterates over all elements of the areas list and prints out every element separately. 14 | 15 | # areas list 16 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 17 | 18 | # Code the for loop 19 | for item in areas: 20 | print(item) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/4.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Indexes and values (1) 4 | # Using a for loop to iterate over a list only gives you access to every list element in each run, one after the other. If you also want to access the index information, so where the list element you're iterating over is located, you can use enumerate(). 
5 | 6 | # As an example, have a look at how the for loop from the video was converted: 7 | 8 | # fam = [1.73, 1.68, 1.71, 1.89] 9 | # for index, height in enumerate(fam) : 10 | # print("person " + str(index) + ": " + str(height)) 11 | # Instructions 12 | # 100 XP 13 | # Instructions 14 | # 100 XP 15 | # Adapt the for loop in the sample code to use enumerate() and use two iterator variables. 16 | # Update the print() statement so that on each run, a line of the form "room x: y" should be printed, where x is the index of the list element and y is the actual list element, i.e. the area. Make sure to print out this exact string, with the correct spacing. 17 | 18 | # areas list 19 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 20 | 21 | # Change for loop to use enumerate() and update print() 22 | for index, value in enumerate(areas): 23 | print("room " + str(index) + ": " + str(value)) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/5.py: -------------------------------------------------------------------------------- 1 | # Indexes and values (2) 2 | # For non-programmer folks, room 0: 11.25 is strange. Wouldn't it be better if the count started at 1? 3 | 4 | # Instructions 5 | # 100 XP 6 | # Adapt the print() function in the for loop on the right so that the first printout becomes "room 1: 11.25", the second one "room 2: 18.0" and so on. 7 | 8 | 9 | # areas list 10 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 11 | 12 | # Code the for loop 13 | for index, area in enumerate(areas) : 14 | print("room " + str(index + 1) + ": " + str(area)) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/6.py: -------------------------------------------------------------------------------- 1 | # Loop over list of lists 2 | # Remember the house variable from the Intro to Python course? 
Have a look at its definition on the right. It's basically a list of lists, where each sublist contains the name and area of a room in your house. 3 | 4 | # It's up to you to build a for loop from scratch this time! 5 | 6 | # Instructions 7 | # 100 XP 8 | # Write a for loop that goes through each sublist of house and prints out the x is y sqm, where x is the name of the room and y is the area of the room. 9 | 10 | # house list of lists 11 | house = [["hallway", 11.25], 12 | ["kitchen", 18.0], 13 | ["living room", 20.0], 14 | ["bedroom", 10.75], 15 | ["bathroom", 9.50]] 16 | 17 | # Build a for loop from scratch 18 | for item in (house): 19 | print("the " + item[0] + " is " + str(item[1]) + " sqm") -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/7.py: -------------------------------------------------------------------------------- 1 | # Loop over dictionary 2 | # In Python 3, you need the items() method to loop over a dictionary: 3 | 4 | # world = { "afghanistan":30.55, 5 | # "albania":2.77, 6 | # "algeria":39.21 } 7 | 8 | # for key, value in world.items() : 9 | # print(key + " -- " + str(value)) 10 | # Remember the europe dictionary that contained the names of some European countries as key and their capitals as corresponding value? Go ahead and write a loop to iterate over it! 11 | 12 | # Instructions 13 | # 100 XP 14 | # Write a for loop that goes through each key:value pair of europe. On each iteration, "the capital of x is y" should be printed out, where x is the key and y is the value of the pair. 
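A minimal sketch of the items() pattern, with an f-string shown as an alternative way to build each line:

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin'}

# items() yields (key, value) tuples, unpacked into two loop variables.
for country, capital in europe.items():
    print(f"the capital of {country} is {capital}")
```

Since Python 3.7, dictionaries preserve insertion order, so the lines print in the order the pairs were defined.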
15 | 16 | # Definition of dictionary 17 | europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 18 | 'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' } 19 | 20 | # Iterate over europe 21 | for key, value in europe.items(): 22 | print("the capital of " + key + " is " + str(value)) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/8.py: -------------------------------------------------------------------------------- 1 | # Loop over Numpy array 2 | # If you're dealing with a 1D Numpy array, looping over all elements can be as simple as: 3 | 4 | # for x in my_array : 5 | # ... 6 | # If you're dealing with a 2D Numpy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax: 7 | 8 | # for x in np.nditer(my_array) : 9 | # ... 10 | # Two Numpy arrays that you might recognize from the intro course are available in your Python session: np_height, a Numpy array containing the heights of Major League Baseball players, and np_baseball, a 2D Numpy array that contains both the heights (first column) and weights (second column) of those players. 11 | 12 | # Instructions 13 | # 100 XP 14 | # Instructions 15 | # 100 XP 16 | # Import the numpy package under the local alias np. 17 | # Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array. 18 | # Write a for loop that visits every element of the np_baseball array and prints it out. 
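Since np_height and np_baseball only exist inside the course session, here is a self-contained sketch with small stand-in arrays (the numbers are hypothetical):

```python
import numpy as np

np_height = np.array([74, 74, 72])                         # 1D array
np_baseball = np.array([[74, 180], [74, 215], [72, 210]])  # 2D array

for x in np_height:
    print(str(x) + " inches")

# np.nditer() visits every element of the 2D array, row by row.
for x in np.nditer(np_baseball):
    print(x)
```

A plain `for x in np_baseball` would yield whole rows (1D arrays) rather than single elements; nditer flattens the iteration.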
19 | 20 | # Import numpy as np 21 | import numpy as np 22 | 23 | # For loop over np_height 24 | for x in np_height : 25 | print(str(x) + " inches") 26 | 27 | # For loop over np_baseball 28 | for x in np.nditer(np_baseball) : 29 | print(x) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 4-Loops/9.py: -------------------------------------------------------------------------------- 1 | # Loop over DataFrame (1) 2 | # Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available: 3 | 4 | # for lab, row in brics.iterrows() : 5 | # ... 6 | # In this and the following exercises you will be working on the cars DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Write a for loop that iterates over the rows of cars and on each iteration performs two print() calls: one to print out the row label and one to print out all of the row's contents. 11 | 12 | # Import cars data 13 | import pandas as pd 14 | cars = pd.read_csv('cars.csv', index_col = 0) 15 | 16 | # Iterate over rows of cars 17 | for lab, row in cars.iterrows(): 18 | print(lab) 19 | print(row) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/1.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Random float 4 | # Randomness has many uses in science, art, statistics, cryptography, gaming, gambling, and other fields. You're going to use randomness to simulate a game. 5 | 6 | # All the functionality you need is contained in the random package, a sub-package of numpy.
In this exercise, you'll be using two functions from this package: 7 | 8 | # seed(): sets the random seed, so that your results are reproducible between simulations. As an argument, it takes an integer of your choosing. If you call the function, no output will be generated. 9 | # rand(): if you don't specify any arguments, it generates a random float between zero and one. 10 | # Instructions 11 | # 100 XP 12 | # Import numpy as np. 13 | # Use seed() to set the seed; as an argument, pass 123. 14 | # Generate your first random float with rand() and print it out. 15 | 16 | # Import numpy as np 17 | import numpy as np 18 | 19 | # Set the seed 20 | np.random.seed(123) 21 | 22 | # Generate and print random float 23 | print(np.random.rand()) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/2.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Roll the dice 4 | # In the previous exercise, you used rand(), which generates a random float between 0 and 1. 5 | 6 | # As Filip explained in the video, you can just as well use randint(), also a function of the random package, to generate integers randomly. The following call generates the integer 4, 5, 6 or 7 randomly. 8 is not included. 7 | 8 | # import numpy as np 9 | # np.random.randint(4, 8) 10 | # Numpy has already been imported as np and a seed has been set. Can you roll some dice? 11 | 12 | # Instructions 13 | # 100 XP 14 | # Use randint() with the appropriate arguments to randomly generate the integer 1, 2, 3, 4, 5 or 6. This simulates a die. Print it out. 15 | # Repeat the throw to see if the second outcome is different. Again, print out the result.
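A small sketch of why seeding matters here: re-seeding with the same integer replays exactly the same "random" sequence, which is what makes these simulations reproducible (numpy is assumed installed):

```python
import numpy as np

np.random.seed(123)
first = np.random.randint(1, 7)   # the upper bound 7 is exclusive
second = np.random.randint(1, 7)
print(first, second)

# Same seed -> same sequence
np.random.seed(123)
assert np.random.randint(1, 7) == first
assert np.random.randint(1, 7) == second
```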
16 | 17 | # Import numpy and set seed 18 | import numpy as np 19 | np.random.seed(123) 20 | 21 | # Use randint() to simulate a dice 22 | print(np.random.randint(1,7)) 23 | 24 | # Use randint() again 25 | print(np.random.randint(1,7)) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/3.py: -------------------------------------------------------------------------------- 1 | # Determine your next move 2 | # In the Empire State Building bet, your next move depends on the number of eyes you throw with the dice. We can perfectly code this with an if-elif-else construct! 3 | 4 | # The sample code assumes that you're currently at step 50. Can you fill in the missing pieces to finish the script? numpy is already imported as np and the seed has been set to 123, so you don't have to worry about that anymore. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Instructions 9 | # 100 XP 10 | # Roll the dice. Use randint() to create the variable dice. 11 | # Finish the if-elif-else construct by replacing ___: 12 | # If dice is 1 or 2, you go one step down. 13 | # if dice is 3, 4 or 5, you go one step up. 14 | # Else, you throw the dice again. The number of eyes is the number of steps you go up. 15 | # Print out dice and step. Given the value of dice, was step updated correctly? 
16 | # Numpy is imported, seed is set 17 | 18 | # Starting step 19 | step = 50 20 | 21 | # Roll the dice 22 | dice = np.random.randint(1,7) 23 | 24 | # Finish the control construct 25 | if dice <= 2 : 26 | step = step - 1 27 | elif dice <= 5 : 28 | step = step + 1 29 | else : 30 | step = step + np.random.randint(1,7) 31 | 32 | # Print out dice and step 33 | print(dice) 34 | print(step) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/4.py: -------------------------------------------------------------------------------- 1 | # The next step 2 | # Before, you have already written Python code that determines the next step based on the previous step. Now it's time to put this code inside a for loop so that we can simulate a random walk. 3 | 4 | # Instructions 5 | # 100 XP 6 | # Instructions 7 | # 100 XP 8 | # Make a list random_walk that contains the first step, which is the integer 0. 9 | # Finish the for loop: 10 | # The loop should run 100 times. 11 | # On each iteration, set step equal to the last element in the random_walk list. You can use the index -1 for this. 12 | # Next, let the if-elif-else construct update step for you. 13 | # The code that appends step to random_walk is already coded. 14 | # Print out random_walk. 
15 | 16 | # Numpy is imported, seed is set 17 | 18 | # Initialize random_walk 19 | random_walk = [0] 20 | 21 | # Complete the for loop 22 | for x in range(100) : 23 | # Set step: last element in random_walk 24 | step = random_walk[-1] 25 | 26 | # Roll the dice 27 | dice = np.random.randint(1,7) 28 | 29 | # Determine next step 30 | if dice <= 2: 31 | step = step - 1 32 | elif dice <= 5: 33 | step = step + 1 34 | else: 35 | step = step + np.random.randint(1,7) 36 | 37 | # Append step to random_walk 38 | random_walk.append(step) 39 | 40 | # Print random_walk 41 | print(random_walk) 42 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/5.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # How low can you go? 4 | # Things are shaping up nicely! You already have code that calculates your location in the Empire State Building after 100 dice throws. However, there's something we haven't thought about - you can't go below 0! 5 | 6 | # A typical way to solve problems like this is by using max(). If you pass max() two arguments, the biggest one gets returned. For example, to make sure that a variable x never goes below 10 when you decrease it, you can use: 7 | 8 | # x = max(10, x - 1) 9 | # Instructions 10 | # 100 XP 11 | # Use max() in a similar way to make sure that step doesn't go below zero if dice <= 2. 12 | # Hit Submit Answer and check the contents of random_walk.
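To see why max() acts as a floor here (an aside, not the graded code): whichever argument is larger is returned, so the result can never drop below the first argument:

```python
# Illustration: max(0, ...) clamps a decreasing value at 0.
step = 1
step = max(0, step - 1)
print(step)  # 0
step = max(0, step - 1)  # would be -1 without the clamp
print(step)  # still 0
```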
13 | 14 | # Numpy is imported, seed is set 15 | 16 | # Initialize random_walk 17 | random_walk = [0] 18 | 19 | for x in range(100) : 20 | step = random_walk[-1] 21 | dice = np.random.randint(1,7) 22 | 23 | if dice <= 2: 24 | # Replace below: use max to make sure step can't go below 0 25 | step = max(0, step - 1) 26 | elif dice <= 5: 27 | step = step + 1 28 | else: 29 | step = step + np.random.randint(1,7) 30 | 31 | random_walk.append(step) 32 | 33 | print(random_walk) -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/6.py: -------------------------------------------------------------------------------- 1 | # Visualize the walk 2 | # Let's visualize this random walk! Remember how you could use matplotlib to build a line plot? 3 | 4 | # import matplotlib.pyplot as plt 5 | # plt.plot(x, y) 6 | # plt.show() 7 | # The first list you pass is mapped onto the x axis and the second list is mapped onto the y axis. 8 | 9 | # If you pass only one argument, Python will know what to do and will use the index of the list to map onto the x axis, and the values in the list onto the y axis. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Add some lines of code after the for loop: 14 | 15 | # Import matplotlib.pyplot as plt. 16 | # Use plt.plot() to plot random_walk. 17 | # Finish off with plt.show() to actually display the plot. 
18 | 19 | # Numpy is imported, seed is set 20 | 21 | # Initialization 22 | np.random.seed(123) 23 | random_walk = [0] 24 | 25 | for x in range(100) : 26 | step = random_walk[-1] 27 | dice = np.random.randint(1,7) 28 | 29 | if dice <= 2: 30 | step = max(0, step - 1) 31 | elif dice <= 5: 32 | step = step + 1 33 | else: 34 | step = step + np.random.randint(1,7) 35 | 36 | random_walk.append(step) 37 | 38 | # Import matplotlib.pyplot as plt 39 | import matplotlib.pyplot as plt 40 | 41 | # Plot random_walk 42 | plt.plot(random_walk) 43 | 44 | # Show the plot 45 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Codings/Chapter 5-Case Study Hacker Statistics/9.py: -------------------------------------------------------------------------------- 1 | # Implement clumsiness 2 | # With this neatly written code of yours, changing the number of times the random walk should be simulated is super-easy. You simply update the range() function in the top-level for loop. 3 | 4 | # There's still something we forgot! You're a bit clumsy and you have a 0.1% chance of falling down. That calls for another random number generation. Basically, you can generate a random float between 0 and 1. If this value is less than or equal to 0.001, you should reset step to 0. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Change the range() function so that the simulation is performed 250 times. 9 | # Finish the if condition so that step is set to 0 if a random float is less than or equal to 0.001. Use np.random.rand().
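As a sanity check on the 0.1% clumsiness rule (illustrative, not part of the exercise): np.random.rand() draws uniformly from [0, 1), so a draw lands at or below 0.001 about once in a thousand times. The sample size of one million is an arbitrary choice for this check:

```python
import numpy as np

np.random.seed(123)
# Estimate P(draw <= 0.001) empirically from one million uniform draws.
draws = np.random.rand(1_000_000)
p = np.mean(draws <= 0.001)
print(p)  # close to 0.001
```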
10 | 11 | # numpy and matplotlib imported, seed set 12 | 13 | # Initialize all_walks 14 | all_walks = [] 15 | 16 | # Simulate random walk 250 times 17 | for i in range(250) : 18 | random_walk = [0] 19 | for x in range(100) : 20 | step = random_walk[-1] 21 | dice = np.random.randint(1,7) 22 | if dice <= 2: 23 | step = max(0, step - 1) 24 | elif dice <= 5: 25 | step = step + 1 26 | else: 27 | step = step + np.random.randint(1,7) 28 | 29 | # Implement clumsiness 30 | if np.random.rand() <= 0.001 : 31 | step = 0 32 | 33 | random_walk.append(step) 34 | all_walks.append(random_walk) 35 | 36 | # Create and plot np_aw_t 37 | np_aw_t = np.transpose(np.array(all_walks)) 38 | plt.plot(np_aw_t) 39 | plt.show() -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Datasets/brics.csv: -------------------------------------------------------------------------------- 1 | ,country,capital,area,population 2 | BR,Brazil,Brasilia,8.516,200.4 3 | RU,Russia,Moscow,17.10,143.5 4 | IN,India,New Delhi,3.286,1252 5 | CH,China,Beijing,9.597,1357 6 | SA,South Africa,Pretoria,1.221,52.98 -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Datasets/cars.csv: -------------------------------------------------------------------------------- 1 | ,cars_per_cap,country,drives_right 2 | US,809,United States,True 3 | AUS,731,Australia,False 4 | JAP,588,Japan,False 5 | IN,18,India,False 6 | RU,200,Russia,True 7 | MOR,70,Morocco,True 8 | EG,45,Egypt,True 9 | -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Notes/intermediate_python_ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Intermediate Python for Data
Science/Notes/intermediate_python_ch1_slides.pdf -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Notes/intermediate_python_ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Intermediate Python for Data Science/Notes/intermediate_python_ch2_slides.pdf -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Notes/intermediate_python_ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Intermediate Python for Data Science/Notes/intermediate_python_ch3_slides.pdf -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Notes/intermediate_python_ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Intermediate Python for Data Science/Notes/intermediate_python_ch4_slides.pdf -------------------------------------------------------------------------------- /Intermediate Python for Data Science/Notes/intermediate_python_ch5_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Intermediate Python for Data Science/Notes/intermediate_python_ch5_slides.pdf -------------------------------------------------------------------------------- /Introduction to Data 
Visualization with Python/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Data Visualization with Python/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Introduction to Data Visualization with Python/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Data Visualization with Python/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /Introduction to Data Visualization with Python/Notes/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Data Visualization with Python/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Introduction to Data Visualization with Python/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Data Visualization with Python/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/1.py: -------------------------------------------------------------------------------- 1 | # The Python Interface 2 | # In the Python script on the right, you can type Python code to solve the 
exercises. If you hit Run Code or Submit Answer, your Python script (script.py) is executed and the output is shown in the IPython Shell. Submit Answer checks whether your submission is correct and gives you feedback. 3 | 4 | # You can hit Run Code and Submit Answer as often as you want. If you're stuck, you can click Get Hint, and ultimately Get Solution. 5 | 6 | # You can also use the IPython Shell interactively by simply typing commands and hitting Enter. When you work in the shell directly, your code will not be checked for correctness, so it is a great way to experiment. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Experiment in the IPython Shell; type 5 / 8, for example. 11 | # Add another line of code to the Python script on the top-right (not in the Shell): print(7 + 10). 12 | # Hit Submit Answer to execute the Python script and receive feedback. 13 | 14 | # Example, do not modify! 15 | print(5 / 8) 16 | 17 | # Put code below here 18 | print(7 + 10) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/2.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Any comments? 4 | # Something that Filip didn't mention in his videos is that you can add comments to your Python scripts. Comments are important to make sure that you and others can understand what your code is about. 5 | 6 | # To add comments to your Python script, you can use the # character. These comments are not run as Python code, so they will not influence your result. As an example, take the comment on the right, # Division; it is completely ignored during execution. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Above the print(7 + 10), add the comment # Addition.
11 | 12 | 13 | # Division 14 | print(5 / 8) 15 | 16 | # Addition 17 | print(7 + 10) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/3.py: -------------------------------------------------------------------------------- 1 | # Python as a calculator 2 | # Python is perfectly suited to do basic calculations. Apart from addition, subtraction, multiplication and division, there is also support for more advanced operations such as: 3 | 4 | # Exponentiation: **. This operator raises the number to its left to the power of the number to its right. For example 4**2 will give 16. 5 | # Modulo: %. This operator returns the remainder of the division of the number to the left by the number on its right. For example 18 % 7 equals 4. 6 | # The code in the script on the right gives some examples. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Suppose you have $100, which you can invest with a 10% return each year. After one year, it's 100×1.1=110 dollars, and after two years it's 100×1.1×1.1=121. Add code on the right to calculate how much money you end up with after 7 years. 13 | 14 | # Addition, subtraction 15 | print(5 + 5) 16 | print(5 - 5) 17 | 18 | # Multiplication, division, modulo, and exponentiation 19 | print(3 * 5) 20 | print(10 / 2) 21 | print(18 % 7) 22 | print(4 ** 2) 23 | 24 | # How much is your $100 worth after 7 years? 25 | print(100 * 1.1**7) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/4.py: -------------------------------------------------------------------------------- 1 | # Variable Assignment 2 | # In Python, a variable allows you to refer to a value with a name. To create a variable use =, like this example: 3 | 4 | # x = 5 5 | # You can now use the name of this variable, x, instead of the actual value, 5.
6 | 7 | # Remember, = in Python means assignment; it doesn't test equality! 8 | 9 | # Instructions 10 | # 100 XP 11 | # Create a variable savings with the value 100. 12 | # Check out this variable by typing print(savings) in the script. 13 | 14 | # Create a variable savings 15 | savings = 100 16 | 17 | # Print out savings 18 | print(savings) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/5.py: -------------------------------------------------------------------------------- 1 | # Calculations with variables 2 | # Remember how you calculated the money you ended up with after 7 years of investing $100? You did something like this: 3 | 4 | # 100 * 1.1 ** 7 5 | # Instead of calculating with the actual values, you can use variables. The savings variable you've created in the previous exercise represents the $100 you started with. It's up to you to create a new variable to represent 1.1 and then redo the calculations! 6 | 7 | # Instructions 8 | # 100 XP 9 | # Create a variable growth_multiplier, equal to 1.1. 10 | # Create a variable, result, equal to the amount of money you saved after 7 years. 11 | # Print out the value of result. 12 | 13 | # Create a variable savings 14 | savings = 100 15 | 16 | # Create a variable growth_multiplier 17 | growth_multiplier = 1.1 18 | 19 | # Calculate result 20 | result = savings * growth_multiplier ** 7 21 | 22 | # Print out result 23 | print(result) 24 | -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/6.py: -------------------------------------------------------------------------------- 1 | # Other variable types 2 | # In the previous exercise, you worked with two Python data types: 3 | 4 | # int, or integer: a number without a fractional part. savings, with the value 100, is an example of an integer.
5 | # float, or floating point: a number that has both an integer and fractional part, separated by a point. growth_multiplier, with the value 1.1, is an example of a float. 6 | # Next to numerical data types, there are two other very common data types: 7 | 8 | # str, or string: a type to represent text. You can use single or double quotes to build a string. 9 | # bool, or boolean: a type to represent logical values. It can only be True or False (the capitalization is important!). 10 | # Instructions 11 | # 100 XP 12 | 13 | 14 | # Create a new string, desc, with the value "compound interest". 15 | # Create a new boolean, profitable, with the value True. 16 | 17 | # Create a variable desc 18 | desc = "compound interest" 19 | 20 | # Create a variable profitable 21 | profitable = True -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/7.py: -------------------------------------------------------------------------------- 1 | # Operations with other types 2 | # Filip mentioned that different types behave differently in Python. 3 | 4 | # When you sum two strings, for example, you'll get different behavior than when you sum two integers or two booleans. 5 | 6 | # In the script some variables with different types have already been created. It's up to you to use them. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Calculate the product of savings and growth_multiplier. Store the result in year1. 13 | # What do you think the resulting type will be? Find out by printing out the type of year1. 14 | # Calculate the sum of desc and desc and store the result in a new variable doubledesc. 15 | # Print out doubledesc. Did you expect this?
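For intuition before running the script (an aside; the literal values mirror the earlier exercises): the + and * operators dispatch on type, so numbers are summed, strings are concatenated, and booleans are treated as the integers 1 and 0:

```python
# Illustration: the same operators behave differently per type.
print(100 * 1.1)                 # a float near 110.0
print("compound " + "interest")  # string concatenation
print(True + True)               # booleans act as 1, so this is 2
```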
16 | 17 | savings = 100 18 | growth_multiplier = 1.1 19 | desc = "compound interest" 20 | 21 | # Assign product of growth_multiplier and savings to year1 22 | year1 = savings * growth_multiplier 23 | 24 | # Print the type of year1 25 | print(type(year1)) 26 | 27 | # Assign sum of desc and desc to doubledesc 28 | doubledesc = desc + desc 29 | 30 | # Print out doubledesc 31 | print(doubledesc) 32 | -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 1-Introduction to Python/8.py: -------------------------------------------------------------------------------- 1 | # Type conversion 2 | # Using the + operator to paste together two strings can be very useful in building custom messages. 3 | 4 | # Suppose, for example, that you've calculated the return of your investment and want to summarize the results in a string. Assuming the floats savings and result are defined, you can try something like this: 5 | 6 | # print("I started with $" + savings + " and now have $" + result + ". Awesome!") 7 | # This will not work, though, as you cannot simply sum strings and floats. 8 | 9 | # To fix the error, you'll need to explicitly convert the types of your variables. More specifically, you'll need str(), to convert a value into a string. str(savings), for example, will convert the float savings to a string. 10 | 11 | # Similar functions such as int(), float() and bool() will help you convert Python values into any type. 12 | 13 | # Instructions 14 | # 100 XP 15 | # Hit Run Code to run the code on the right. Try to understand the error message. 16 | # Fix the code on the right such that the printout runs without errors; use the function str() to convert the variables to strings. 17 | # Convert the variable pi_string to a float and store this float as a new variable, pi_float. 
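The conversion functions named above can be tried on their own (a sketch, separate from the graded script; the literal values are just examples):

```python
# Illustration: str() makes concatenation legal, float() parses a numeric string.
savings = 100
message = "I started with $" + str(savings)  # str() avoids the TypeError
print(message)

pi_float = float("3.1415926")  # parse a string into a float
print(type(pi_float))  # <class 'float'>
```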
18 | 19 | # Definition of savings and result 20 | savings = 100 21 | result = 100 * 1.10 ** 7 22 | 23 | # Fix the printout 24 | print("I started with $" + str(savings) + " and now have $" + str(result) + ". Awesome!") 25 | 26 | # Definition of pi_string 27 | pi_string = "3.1415926" 28 | 29 | # Convert pi_string into float: pi_float 30 | pi_float = float(pi_string) 31 | 32 | print(type(pi_float)) 33 | -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/1.py: -------------------------------------------------------------------------------- 1 | # Create a list 2 | # As opposed to int, bool etc., a list is a compound data type; you can group values together: 3 | 4 | # a = "is" 5 | # b = "nice" 6 | # my_list = ["my", "list", a, b] 7 | # After measuring the height of your family, you decide to collect some information on the house you're living in. The areas of the different parts of your house are stored in separate variables for now, as shown in the script. 8 | 9 | # Instructions 10 | # 100 XP 11 | # Instructions 12 | # 100 XP 13 | # Create a list, areas, that contains the area of the hallway (hall), kitchen (kit), living room (liv), bedroom (bed) and bathroom (bath), in this order. Use the predefined variables. 14 | # Print areas with the print() function. 15 | 16 | # area variables (in square meters) 17 | hall = 11.25 18 | kit = 18.0 19 | liv = 20.0 20 | bed = 10.75 21 | bath = 9.50 22 | 23 | # Create list areas 24 | areas = [hall, kit, liv, bed, bath] 25 | 26 | # Print areas 27 | print(areas) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/10.py: -------------------------------------------------------------------------------- 1 | # Inner workings of lists 2 | # At the end of the video, Filip explained how Python lists work behind the scenes. 
In this exercise you'll get some hands-on experience with this. 3 | 4 | # The Python code in the script already creates a list with the name areas and a copy named areas_copy. Next, the first element in the areas_copy list is changed and the areas list is printed out. If you hit Run Code you'll see that, although you've changed areas_copy, the change also takes effect in the areas list. That's because areas and areas_copy point to the same list. 5 | 6 | # If you want to prevent changes in areas_copy from also taking effect in areas, you'll have to do a more explicit copy of the areas list. You can do this with list() or by using [:]. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Change the second command, that creates the variable areas_copy, such that areas_copy is an explicit copy of areas. After your edit, changes made to areas_copy shouldn't affect areas. Hit Submit Answer to check this. 11 | 12 | # Create list areas 13 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 14 | 15 | # Create areas_copy 16 | areas_copy = list(areas) 17 | 18 | # Change areas_copy 19 | areas_copy[0] = 5.0 20 | 21 | # Print areas 22 | print(areas) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/2.py: -------------------------------------------------------------------------------- 1 | # Create list with different types 2 | # A list can contain any Python type. Although it's not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc. 3 | 4 | # The printout of the previous exercise wasn't really satisfying. It's just a list of numbers representing the areas, but you can't tell which area corresponds to which part of your house. 5 | 6 | # The code on the right is the start of a solution. For some of the areas, the name of the corresponding room is already placed in front. Pay attention here! 
"bathroom" is a string, while bath is a variable that represents the float 9.50 you specified earlier. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Finish the line of code that creates the areas list. Build the list so that the list first contains the name of each room as a string and then its area. In other words, add the strings "hallway", "kitchen" and "bedroom" at the appropriate locations. 13 | # Print areas again; is the printout more informative this time? 14 | 15 | # area variables (in square meters) 16 | hall = 11.25 17 | kit = 18.0 18 | liv = 20.0 19 | bed = 10.75 20 | bath = 9.50 21 | 22 | # Adapt list areas 23 | areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath] 24 | 25 | # Print areas 26 | print(areas) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/3.py: -------------------------------------------------------------------------------- 1 | # List of lists 2 | # As a data scientist, you'll often be dealing with a lot of data, and it will make sense to group some of this data. 3 | 4 | # Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. The script on the right can already give you an idea. 5 | 6 | # Don't get confused here: "hallway" is a string, while hall is a variable that represents the float 11.25 you specified earlier. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Finish the list of lists so that it also contains the bedroom and bathroom data. Make sure you enter these in order! 11 | # Print out house; does this way of structuring your data make more sense? 12 | # Print out the type of house. Are you still dealing with a list?
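One detail the instructions don't spell out (an aside, not the solution): subsetting a list of lists takes two subscripts, the first for the inner list and the second for the element inside it. The two-room example below is illustrative:

```python
# Illustration: double indexing into a list of lists.
house = [["hallway", 11.25], ["kitchen", 18.0]]
print(house[1])     # the whole inner list: ['kitchen', 18.0]
print(house[1][0])  # 'kitchen'
print(house[1][1])  # 18.0
```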
13 | 14 | # area variables (in square meters) 15 | hall = 11.25 16 | kit = 18.0 17 | liv = 20.0 18 | bed = 10.75 19 | bath = 9.50 20 | 21 | # house information as list of lists 22 | house = [["hallway", hall], 23 | ["kitchen", kit], 24 | ["living room", liv], 25 | ["bedroom", bed], 26 | ["bathroom", bath] 27 | ] 28 | # Print out house 29 | print(house) 30 | 31 | # Print out the type of house 32 | print(type(house)) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/4.py: -------------------------------------------------------------------------------- 1 | # Subset and conquer 2 | # Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects "b" from it. Remember that this is the second element, so it has index 1. You can also use negative indexing. 3 | 4 | # x = ["a", "b", "c", "d"] 5 | # x[1] 6 | # x[-3] # same result! 7 | # Remember the areas list from before, containing both strings and floats? Its definition is already in the script. Can you add the correct code to do some Python subsetting? 8 | 9 | # Instructions 10 | # 100 XP 11 | 12 | 13 | # Print out the second element from the areas list (it has the value 11.25). 14 | # Subset and print out the last element of areas, being 9.50. Using a negative index makes sense here! 15 | # Select the number representing the area of the living room (20.0) and print it out.
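The x[1] / x[-3] equivalence quoted above can be checked directly (an aside, not the graded code): a negative index counts from the end, so index -k names the same element as index len(x) - k:

```python
# Illustration: positive and negative indexes name the same elements.
x = ["a", "b", "c", "d"]
print(x[1])   # 'b'
print(x[-3])  # 'b' again, since -3 + len(x) == 1
```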
16 | # Create the areas list 17 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 18 | 19 | # Print out second element from areas 20 | print(areas[1]) 21 | 22 | # Print out last element from areas 23 | print(areas[-1]) 24 | 25 | # Print out the area of the living room 26 | print(areas[5]) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/5.py: -------------------------------------------------------------------------------- 1 | # Subset and calculate 2 | # After you've extracted values from a list, you can use them to perform additional calculations. Take this example, where the second and fourth element of a list x are extracted. The strings that result are pasted together using the + operator: 3 | 4 | # x = ["a", "b", "c", "d"] 5 | # print(x[1] + x[3]) 6 | # Instructions 7 | # 100 XP 8 | # Using a combination of list subsetting and variable assignment, create a new variable, eat_sleep_area, that contains the sum of the area of the kitchen and the area of the bedroom. 9 | # Print the new variable eat_sleep_area. 10 | 11 | # Create the areas list 12 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 13 | 14 | # Sum of kitchen and bedroom area: eat_sleep_area 15 | eat_sleep_area = areas[3] + areas[7] 16 | 17 | # Print the variable eat_sleep_area 18 | print(eat_sleep_area) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/6.py: -------------------------------------------------------------------------------- 1 | # Slicing and dicing 2 | # Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements from your list. 
Use the following syntax: 3 | 4 | # my_list[start:end] 5 | # The start index will be included, while the end index is not. 6 | 7 | # The code sample below shows an example. The elements "b" and "c", corresponding to indexes 1 and 2, are selected from a list x: 8 | 9 | # x = ["a", "b", "c", "d"] 10 | # x[1:3] 11 | # The elements with index 1 and 2 are included, while the element with index 3 is not. 12 | 13 | # Instructions 14 | # 100 XP 15 | # Use slicing to create a list, downstairs, that contains the first 6 elements of areas. 16 | # Do a similar thing to create a new variable, upstairs, that contains the last 4 elements of areas. 17 | # Print both downstairs and upstairs using print(). 18 | 19 | 20 | # Create the areas list 21 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 22 | 23 | # Use slicing to create downstairs 24 | downstairs = areas[:6] 25 | 26 | # Use slicing to create upstairs 27 | upstairs = areas[6:] 28 | 29 | # Print out downstairs and upstairs 30 | print(downstairs) 31 | print(upstairs) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/7.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Slicing and dicing (2) 4 | # In the video, Filip first discussed the syntax where you specify both where to begin and end the slice of your list: 5 | 6 | # my_list[begin:end] 7 | # However, it's also possible not to specify these indexes. If you don't specify the begin index, Python figures out that you want to start your slice at the beginning of your list. If you don't specify the end index, the slice will go all the way to the last element of your list.
To experiment with this, try the following commands in the IPython Shell: 8 | 9 | # x = ["a", "b", "c", "d"] 10 | # x[:2] 11 | # x[2:] 12 | # x[:] 13 | # Instructions 14 | # 100 XP 15 | 16 | 17 | # Create downstairs again, as the first 6 elements of areas. This time, simplify the slicing by omitting the begin index. 18 | # Create upstairs again, as the last 4 elements of areas. This time, simplify the slicing by omitting the end index. 19 | 20 | # Create the areas list 21 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 22 | 23 | # Alternative slicing to create downstairs 24 | downstairs = areas[:6] 25 | 26 | # Alternative slicing to create upstairs 27 | upstairs = areas[6:] -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/8.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Replace list elements 4 | # Replacing list elements is pretty easy. Simply subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once. 5 | 6 | # Use the IPython Shell to experiment with the commands below. Can you tell what's happening and why? 7 | 8 | # x = ["a", "b", "c", "d"] 9 | # x[1] = "r" 10 | # x[2:] = ["s", "t"] 11 | # For this and the following exercises, you'll continue working on the areas list that contains the names and areas of different rooms in a house. 12 | 13 | # Instructions 14 | # 100 XP 15 | # Update the area of the bathroom to be 10.50 square meters instead of 9.50. 16 | # Make the areas list more trendy! Change "living room" to "chill zone".
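The shell commands quoted in the description behave as follows (an aside you can verify before editing the areas list):

```python
# Illustration: single-element and slice assignment from the description.
x = ["a", "b", "c", "d"]
x[1] = "r"          # replace one element
x[2:] = ["s", "t"]  # replace a whole slice at once
print(x)  # ['a', 'r', 's', 't']
```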
17 | # Create the areas list 18 | areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50] 19 | 20 | # Correct the bathroom area 21 | areas[-1] = 10.5 22 | print(areas) 23 | 24 | # Change "living room" to "chill zone" 25 | areas[4] = "chill zone" 26 | print(areas) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 2-Python Lists/9.py: -------------------------------------------------------------------------------- 1 | # Extend a list 2 | # If you can change elements in a list, you sure want to be able to add elements to it, right? You can use the + operator: 3 | 4 | # x = ["a", "b", "c", "d"] 5 | # y = x + ["e", "f"] 6 | # You just won the lottery, awesome! You decide to build a poolhouse and a garage. Can you add the information to the areas list? 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use the + operator to paste the list ["poolhouse", 24.5] to the end of the areas list. Store the resulting list as areas_1. 11 | # Further extend areas_1 by adding data on your garage. Add the string "garage" and float 15.45. Name the resulting list areas_2. 12 | 13 | # Create the areas list and make some changes 14 | areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0, 15 | "bedroom", 10.75, "bathroom", 10.50] 16 | 17 | # Add poolhouse data to areas, new list is areas_1 18 | areas_1 = areas + ["poolhouse", 24.5] 19 | print(areas_1) 20 | 21 | # Add garage data to areas_1, new list is areas_2 22 | areas_2 = areas_1 + ["garage", 15.45] 23 | print(areas_2) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/1.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Familiar functions 4 | # Out of the box, Python offers a bunch of built-in functions to make your life as a data scientist easier. 
You already know two such functions: print() and type(). You've also used the functions str(), int(), bool() and float() to switch between data types. These are built-in functions as well. 5 | 6 | # Calling a function is easy. To get the type of 3.0 and store the output as a new variable, result, you can use the following: 7 | 8 | # result = type(3.0) 9 | # The general recipe for calling functions and saving the result to a variable is thus: 10 | 11 | # output = function_name(input) 12 | # Instructions 13 | # 100 XP 14 | # Use print() in combination with type() to print out the type of var1. 15 | # Use len() to get the length of the list var1. Wrap it in a print() call to directly print it out. 16 | # Use int() to convert var2 to an integer. Store the output as out2. 17 | # Create variables var1 and var2 18 | var1 = [1, 2, 3, 4] 19 | var2 = True 20 | 21 | # Print out type of var1 22 | print(type(var1)) 23 | 24 | # Print out length of var1 25 | print(len(var1)) 26 | 27 | # Convert var2 to an integer: out2 28 | out2 = int(var2) 29 | print(out2) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/3.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # String Methods 4 | # Strings come with a bunch of methods. Follow the instructions closely to discover some of them. If you want to discover them in more detail, you can always type help(str) in the IPython Shell. 5 | 6 | # A string place has already been created for you to experiment with. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use the upper() method on place and store the result in place_up. Use the syntax for calling methods that you learned in the previous video. 11 | # Print out place and place_up. Did both change? 12 | # Print out the number of o's on the variable place by calling count() on place and passing the letter 'o' as an input to the method. 
We're talking about the variable place, not the word "place"! 13 | # string to experiment with: place 14 | place = "poolhouse" 15 | print(place) 16 | 17 | # Use upper() on place: place_up 18 | place_up = place.upper() 19 | print(place_up) 20 | # Print out the number of o's in place 21 | print(place.count("o")) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/4.py: -------------------------------------------------------------------------------- 1 | # List Methods 2 | # Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you'll be experimenting with: 3 | 4 | # index(), to get the index of the first element of a list that matches its input and 5 | # count(), to get the number of times an element appears in a list. 6 | # You'll be working on the list with the area of different parts of a house: areas. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use the index() method to get the index of the element in areas that is equal to 20.0. Print out this index. 11 | # Call count() on areas to find out how many times 9.50 appears in the list. Again, simply print out this number. 12 | 13 | # Create list areas 14 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 15 | 16 | # Print out the index of the element 20.0 17 | print(areas.index(20.0)) 18 | 19 | # Print out how often 9.50 appears in areas 20 | print(areas.count(9.50)) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/5.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # List Methods (2) 4 | # Most list methods will change the list they're called on.
Examples are: 5 | 6 | # append(), that adds an element to the list it is called on, 7 | # remove(), that removes the first element of a list that matches the input, and 8 | # reverse(), that reverses the order of the elements in the list it is called on. 9 | # You'll be working on the list with the area of different parts of the house: areas. 10 | 11 | # Instructions 12 | # 100 XP 13 | # Instructions 14 | # 100 XP 15 | # Use append() twice to add the size of the poolhouse and the garage again: 24.5 and 15.45, respectively. Make sure to add them in this order. 16 | # Print out areas 17 | # Use the reverse() method to reverse the order of the elements in areas. 18 | # Print out areas once more. 19 | 20 | # Create list areas 21 | areas = [11.25, 18.0, 20.0, 10.75, 9.50] 22 | 23 | # Use append twice to add poolhouse and garage size 24 | areas.append(24.5) 25 | areas.append(15.45) 26 | 27 | # Print out areas 28 | print(areas) 29 | 30 | # Reverse the order of the elements in areas 31 | areas.reverse() 32 | 33 | # Print out areas 34 | print(areas) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/6.py: -------------------------------------------------------------------------------- 1 | # Import package 2 | # As a data scientist, some notions of geometry never hurt. Let's refresh some of the basics. 3 | 4 | # For a fancy clustering algorithm, you want to find the circumference, C, and area, A, of a circle. When the radius of the circle is r, you can calculate C and A as: 5 | 6 | # C = 2 * π * r 7 | # A = π * r ** 2 8 | # To use the constant pi, you'll need the math package. A variable r is already coded in the script. Fill in the code to calculate C and A and see how the print() functions create some nice printouts. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Import the math package. Now you can access the constant pi with math.pi.
13 | # Calculate the circumference of the circle and store it in C. 14 | # Calculate the area of the circle and store it in A. 15 | 16 | # Definition of radius 17 | r = 0.43 18 | 19 | # Import the math package 20 | import math 21 | 22 | # Calculate C 23 | C = 2 * math.pi * r 24 | 25 | # Calculate A 26 | A = math.pi * r ** 2 27 | 28 | # Build printout 29 | print("Circumference: " + str(C)) 30 | print("Area: " + str(A)) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 3-Functions and Packages/7.py: -------------------------------------------------------------------------------- 1 | # Selective import 2 | # General imports, like import math, make all functionality from the math package available to you. However, if you decide to only use a specific part of a package, you can always make your import more selective: 3 | 4 | # from math import pi 5 | # Let's say the Moon's orbit around planet Earth is a perfect circle, with a radius r (in km) that is defined in the script. 6 | 7 | # Instructions 8 | # 100 XP 9 | # Perform a selective import from the math package where you only import the radians function. 10 | # Calculate the distance travelled by the Moon over 12 degrees of its orbit. Assign the result to dist. You can calculate this as r * phi, where r is the radius and phi is the angle in radians. To convert an angle in degrees to an angle in radians, use the radians() function, which you just imported. 11 | # Print out dist. 12 | 13 | # Definition of radius 14 | r = 192500 15 | 16 | # Import radians function of math package 17 | from math import radians 18 | 19 | # Travel distance of Moon over 12 degrees. Store in dist. 
20 | phi = radians(12) 21 | dist = r * phi 22 | 23 | # Print out dist 24 | print(dist) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/1.py: -------------------------------------------------------------------------------- 1 | # Your First NumPy Array 2 | # In this chapter, we're going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of numpy, a powerful package to do data science. 3 | 4 | # A list baseball has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code here and there to create a numpy array from it? 5 | 6 | # Instructions 7 | # 100 XP 8 | # Import the numpy package as np, so that you can refer to numpy with np. 9 | # Use np.array() to create a numpy array from baseball. Name this array np_baseball. 10 | # Print out the type of np_baseball to check that you got it right. 11 | 12 | # Create list baseball 13 | baseball = [180, 215, 210, 210, 188, 176, 209, 200] 14 | 15 | # Import the numpy package as np 16 | import numpy as np 17 | 18 | # Create a Numpy array from baseball: np_baseball 19 | np_baseball = np.array(baseball) 20 | print(np_baseball) 21 | 22 | # Print out type of np_baseball 23 | print(type(np_baseball)) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/10.py: -------------------------------------------------------------------------------- 1 | # Average versus median 2 | # You now know how to use numpy functions to get a better feeling for your data. It basically comes down to importing numpy and then calling several simple functions on the numpy arrays: 3 | 4 | # import numpy as np 5 | # x = [1, 4, 8, 10, 12] 6 | # np.mean(x) 7 | # np.median(x) 8 | # The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 1015 rows. 
The name of this numpy array is np_baseball. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you're dealing with so-called outliers. 9 | 10 | # Instructions 11 | # 100 XP 12 | # Instructions 13 | # 100 XP 14 | # Create numpy array np_height_in that is equal to the first column of np_baseball. 15 | # Print out the mean of np_height_in. 16 | # Print out the median of np_height_in. 17 | # np_baseball is available 18 | 19 | # Import numpy 20 | import numpy as np 21 | 22 | # Create np_height_in from np_baseball 23 | np_height_in = np_baseball[:,0] 24 | 25 | # Print out the mean of np_height_in 26 | print(np.mean(np_height_in)) 27 | 28 | # Print out the median of np_height_in 29 | print(np.median(np_height_in)) 30 | -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/11.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Explore the baseball data 4 | # Because the mean and median are so far apart, you decide to complain to the MLB. They find the error and send the corrected data over to you. It's again available as a 2D Numpy array np_baseball, with three columns. 5 | 6 | # The Python script on the right already includes code to print out informative messages with the different summary statistics. Can you finish the job? 7 | 8 | # Instructions 9 | # 100 XP 10 | # The code to print out the mean height is already included. Complete the code for the median height. Replace None with the correct code. 11 | # Use np.std() on the first column of np_baseball to calculate stddev. Replace None with the correct code. 12 | # Do big players tend to be heavier? Use np.corrcoef() to store the correlation between the first and second column of np_baseball in corr. Replace None with the correct code.
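As a side note on the last instruction: np.corrcoef() does not return a single number but a 2x2 correlation matrix, where the diagonal is always 1.0 and the off-diagonal entries hold the correlation between the two inputs. The snippet below is a small illustration with made-up height and weight values, not the MLB data:

```python
import numpy as np

# Illustrative arrays (hypothetical, not the MLB data)
heights = np.array([70.0, 72.0, 74.0, 68.0, 75.0])
weights = np.array([170.0, 185.0, 200.0, 160.0, 210.0])

# np.corrcoef returns a 2x2 correlation matrix: ones on the
# diagonal, the height-weight correlation off the diagonal.
corr_matrix = np.corrcoef(heights, weights)
print(corr_matrix.shape)   # (2, 2)
print(corr_matrix[0, 1])   # the correlation coefficient itself
```

To report a single number, index the off-diagonal entry with corr_matrix[0, 1].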
13 | # np_baseball is available 14 | 15 | # Import numpy 16 | import numpy as np 17 | # Print mean height (first column) 18 | avg = np.mean(np_baseball[:,0]) 19 | print("Average: " + str(avg)) 20 | 21 | # Print median height. Replace 'None' 22 | med = np.median(np_baseball[:,0]) 23 | print("Median: " + str(med)) 24 | 25 | # Print out the standard deviation on height. Replace 'None' 26 | stddev = np.std(np_baseball[:,0]) 27 | print("Standard Deviation: " + str(stddev)) 28 | 29 | # Print out correlation between first and second column. Replace 'None' 30 | corr = np.corrcoef(np_baseball[:,0],np_baseball[:,1]) 31 | print("Correlation: " + str(corr)) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/2.py: -------------------------------------------------------------------------------- 1 | # Baseball players' height 2 | # You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Can you make a numpy array out of it and convert the units to meters? 3 | 4 | # height_in is already available and the numpy package is loaded, so you can start straight away (Source: stat.ucla.edu). 5 | 6 | # Instructions 7 | # 100 XP 8 | # Instructions 9 | # 100 XP 10 | # Create a numpy array from height_in. Name this new array np_height_in. 11 | # Print np_height_in. 12 | # Multiply np_height_in with 0.0254 to convert all height measurements from inches to meters. Store the new values in a new array, np_height_m. 13 | # Print out np_height_m and check if the output makes sense. 
14 | # height_in is available as a regular list 15 | 16 | # Import numpy 17 | import numpy as np 18 | 19 | # Create a numpy array from height_in: np_height_in 20 | np_height_in = np.array(height_in) 21 | 22 | # Print out np_height_in 23 | print(np_height_in) 24 | 25 | # Convert np_height_in to m: np_height_m 26 | np_height_m = np_height_in * 0.0254 27 | 28 | # Print np_height_m 29 | print(np_height_m) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/3.py: -------------------------------------------------------------------------------- 1 | # Baseball players' BMI 2 | # The MLB also offers to let you analyze their weight data. Again, both are available as regular Python lists: height_in and weight_lb. height_in is in inches and weight_lb is in pounds. 3 | 4 | # It's now possible to calculate the BMI of each baseball player. Python code to convert height_in to a numpy array with the correct units is already available in the workspace. Follow the instructions step by step and finish the game! 5 | 6 | # Instructions 7 | # 100 XP 8 | # Instructions 9 | # 100 XP 10 | # Create a numpy array from the weight_lb list with the correct units. Multiply by 0.453592 to go from pounds to kilograms. Store the resulting numpy array as np_weight_kg. 11 | # Use np_height_m and np_weight_kg to calculate the BMI of each player. Use the following equation: 12 | # BMI = weight (kg) / height (m) ** 2 13 | # Save the resulting numpy array as bmi. 14 | # Print out bmi.
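The BMI equation reads as weight in kilograms divided by the square of height in meters. A quick worked example with hypothetical numbers (not taken from the MLB data) shows the arithmetic:

```python
# Sanity check of the BMI formula with made-up values:
# BMI = weight (kg) / height (m) ** 2
height_m = 1.88   # hypothetical player height in meters
weight_kg = 90.0  # hypothetical player weight in kilograms

bmi = weight_kg / height_m ** 2
print(round(bmi, 1))  # 25.5
```

On numpy arrays the same expression works element-wise, which is why the solution below needs no loop.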
15 | 16 | # height and weight are available as regular lists 17 | 18 | # Import numpy 19 | import numpy as np 20 | 21 | # Create array from height_in with metric units: np_height_m 22 | np_height_m = np.array(height_in) * 0.0254 23 | 24 | # Create array from weight_lb with metric units: np_weight_kg 25 | np_weight_kg = np.array(weight_lb) * 0.453592 26 | 27 | # Calculate the BMI: bmi 28 | bmi = np_weight_kg / np_height_m ** 2 29 | 30 | # Print out bmi 31 | print(bmi) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/4.py: -------------------------------------------------------------------------------- 1 | # Lightweight baseball players 2 | # To subset both regular Python lists and numpy arrays, you can use square brackets: 3 | 4 | # x = [4 , 9 , 6, 3, 1] 5 | # x[1] 6 | # import numpy as np 7 | # y = np.array(x) 8 | # y[1] 9 | # For numpy specifically, you can also use boolean numpy arrays: 10 | 11 | # high = y > 5 12 | # y[high] 13 | # The code that calculates the BMI of all baseball players is already included. Follow the instructions and reveal interesting things from the data! 14 | 15 | # Instructions 16 | # 100 XP 17 | # Instructions 18 | # 100 XP 19 | # Create a boolean numpy array: the element of the array should be True if the corresponding baseball player's BMI is below 21. You can use the < operator for this. Name the array light. 20 | # Print the array light. 21 | # Print out a numpy array with the BMIs of all baseball players whose BMI is below 21. Use light inside square brackets to do a selection on the bmi array. 
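One related detail worth knowing: combining two boolean conditions on a numpy array requires the element-wise operators & and | (or np.logical_and()), because Python's plain and/or raise an error on arrays. A small sketch with made-up BMI values:

```python
import numpy as np

# Made-up BMI values, just to illustrate boolean masks
bmi = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# A comparison on an array yields an array of booleans
light = bmi < 21
print(light)
print(bmi[light])  # [20.97]

# Combining conditions needs element-wise operators, not `and`/`or`
mid = np.logical_and(bmi > 21, bmi < 22)
print(bmi[mid])    # [21.85 21.75 21.44]
```

Indexing with the boolean array keeps only the positions where it is True.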
22 | # height and weight are available as regular lists 23 | 24 | # Import numpy 25 | import numpy as np 26 | 27 | # Calculate the BMI: bmi 28 | np_height_m = np.array(height_in) * 0.0254 29 | np_weight_kg = np.array(weight_lb) * 0.453592 30 | bmi = np_weight_kg / np_height_m ** 2 31 | 32 | # Create the light array 33 | light = bmi < 21 34 | 35 | # Print out light 36 | print(light) 37 | 38 | # Print out BMIs of all baseball players whose BMI is below 21 39 | print(bmi[light]) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/5.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Subsetting NumPy Arrays 4 | # You've seen it with your own eyes: Python lists and numpy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same. To see this for yourself, try the following lines of code in the IPython Shell: 5 | 6 | # x = ["a", "b", "c"] 7 | # x[1] 8 | 9 | # np_x = np.array(x) 10 | # np_x[1] 11 | # The script on the right already contains code that imports numpy as np, and stores both the height and weight of the MLB players as numpy arrays. 12 | 13 | # Instructions 14 | # 100 XP 15 | # Subset np_weight_lb by printing out the element at index 50. 16 | # Print out a sub-array of np_height_in that contains the elements at index 100 up to and including index 110.
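The phrase "up to and including index 110" is why the slice in the solution ends at 111: slice end indexes are exclusive. A tiny stand-in array makes this easy to verify:

```python
import numpy as np

# A small stand-in array so the indices are easy to follow:
# values 10..29 sit at indexes 0..19
np_x = np.array(range(10, 30))

# Slice ends are exclusive, so [5:9] covers indexes 5, 6, 7, 8 --
# to include index 110 you therefore slice with [100:111].
print(np_x[5:9])       # [15 16 17 18]
print(len(np_x[5:9]))  # 4
```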
17 | # height and weight are available as regular lists 18 | 19 | # Import numpy 20 | import numpy as np 21 | 22 | # Store weight and height lists as numpy arrays 23 | np_weight_lb = np.array(weight_lb) 24 | np_height_in = np.array(height_in) 25 | 26 | # Print out the weight at index 50 27 | print(np_weight_lb[50]) 28 | 29 | # Print out sub-array of np_height_in: index 100 up to and including index 110 30 | print(np_height_in[100:111]) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/6.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | # Exercise 3 | # Your First 2D NumPy Array 4 | # Before working on the actual MLB data, let's try to create a 2D numpy array from a small list of lists. 5 | 6 | # In this exercise, baseball is a list of lists. The main list contains 4 elements, one per baseball player. Each of these elements is itself a list containing the height and the weight of that player, in this order. baseball is already coded for you in the script. 7 | 8 | # Instructions 9 | # 100 XP 10 | # Use np.array() to create a 2D numpy array from baseball. Name it np_baseball. 11 | # Print out the type of np_baseball. 12 | # Print out the shape attribute of np_baseball. Use np_baseball.shape.
13 | 14 | # Create baseball, a list of lists 15 | baseball = [[180, 78.4], 16 | [215, 102.7], 17 | [210, 98.5], 18 | [188, 75.2]] 19 | 20 | # Import numpy 21 | import numpy as np 22 | 23 | # Create a 2D numpy array from baseball: np_baseball 24 | np_baseball = np.array(baseball) 25 | 26 | # Print out the type of np_baseball 27 | print(type(np_baseball)) 28 | 29 | # Print out the shape of np_baseball 30 | print(np_baseball.shape) 31 | 32 | print(np_baseball) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/7.py: -------------------------------------------------------------------------------- 1 | # Baseball data in 2D form 2 | # You have another look at the MLB data and realize that it makes more sense to restructure all this information in a 2D numpy array. This array should have 1015 rows, corresponding to the 1015 baseball players you have information on, and 2 columns (for height and weight). 3 | 4 | # The MLB was, again, very helpful and passed you the data in a different structure, a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this embedded list is baseball. 5 | 6 | # Can you store the data as a 2D array to unlock numpy's extra functionality? 7 | 8 | # Instructions 9 | # 100 XP 10 | # Instructions 11 | # 100 XP 12 | # Use np.array() to create a 2D numpy array from baseball. Name it np_baseball. 13 | # Print out the shape attribute of np_baseball. 
14 | # baseball is available as a regular list of lists 15 | 16 | # Import numpy package 17 | import numpy as np 18 | 19 | # Create a 2D numpy array from baseball: np_baseball 20 | np_baseball = np.array(baseball) 21 | 22 | # Print out the shape of np_baseball 23 | print(np_baseball.shape) -------------------------------------------------------------------------------- /Introduction to Python/Codings/Chapter 4-NumPy/9.py: -------------------------------------------------------------------------------- 1 | # 2D Arithmetic 2 | # Remember how you calculated the Body Mass Index for all baseball players? numpy was able to perform all calculations element-wise (i.e. element by element). For 2D numpy arrays this isn't any different! You can combine matrices with single numbers, with vectors, and with other matrices. 3 | 4 | # Execute the code below in the IPython shell and see if you understand: 5 | 6 | # import numpy as np 7 | # np_mat = np.array([[1, 2], 8 | # [3, 4], 9 | # [5, 6]]) 10 | # np_mat * 2 11 | # np_mat + np.array([10, 10]) 12 | # np_mat + np_mat 13 | # np_baseball is coded for you; it's again a 2D numpy array with 3 columns representing height (in inches), weight (in pounds) and age (in years). 14 | 15 | # Instructions 16 | # 100 XP 17 | # You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D numpy array, updated. Add np_baseball and updated and print out the result. 18 | # You want to convert the units of height and weight to metric (meters and kilograms respectively). As a first step, create a numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion. 19 | # Multiply np_baseball with conversion and print out the result. 
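The conversion trick in the last instruction relies on numpy broadcasting: a 1-D array of per-column factors is stretched across every row of the 2-D array, so each column is scaled by its own factor in one expression. A minimal sketch with a made-up two-row matrix:

```python
import numpy as np

# Tiny stand-in for np_baseball: two rows of (height in, weight lb, age yr)
np_mat = np.array([[74.0, 180.0, 22.0],
                   [72.0, 215.0, 34.0]])

# Broadcasting repeats the 1-D conversion vector for every row:
# inches -> meters, pounds -> kilograms, age left unchanged.
conversion = np.array([0.0254, 0.453592, 1.0])
converted = np_mat * conversion

print(converted.shape)  # (2, 3)
print(converted[0])     # first player's height in m, weight in kg, age
```

The same one-liner scales to the full 1015-row array without any loop.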
20 | # baseball is available as a regular list of lists 21 | # updated is available as 2D numpy array 22 | 23 | # Import numpy package 24 | import numpy as np 25 | 26 | # Create np_baseball (3 cols) 27 | np_baseball = np.array(baseball) 28 | 29 | # Print out addition of np_baseball and updated 30 | print(np_baseball + updated) 31 | 32 | # Create numpy array: conversion 33 | conversion = np.array([0.0254, 0.453592, 1]) 34 | 35 | # Print out product of np_baseball and conversion 36 | print(np_baseball * conversion) -------------------------------------------------------------------------------- /Introduction to Python/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Python/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Introduction to Python/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Python/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /Introduction to Python/Notes/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Python/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Introduction to Python/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Python/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Introduction to Relational Databases in SQL/Notes/chapter1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Relational Databases in SQL/Notes/chapter1.pdf -------------------------------------------------------------------------------- /Introduction to Relational Databases in SQL/Notes/chapter2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Relational Databases in SQL/Notes/chapter2.pdf -------------------------------------------------------------------------------- /Introduction to Relational Databases in SQL/Notes/chapter3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Relational Databases in SQL/Notes/chapter3.pdf -------------------------------------------------------------------------------- /Introduction to Relational Databases in SQL/Notes/chapter4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Introduction to Relational Databases in SQL/Notes/chapter4.pdf -------------------------------------------------------------------------------- /Manipulating 
DataFrames with pandas/Datasets/sales.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Datasets/sales.zip -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Datasets/users.csv: -------------------------------------------------------------------------------- 1 | ,weekday,city,visitors,signups 2 | 0,Sun,Austin,139,7 3 | 1,Sun,Dallas,237,12 4 | 2,Mon,Austin,326,3 5 | 3,Mon,Dallas,456,5 6 | -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/Python For Data Science Cheat Sheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/Python For Data Science Cheat Sheet.pdf -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/ch2_slides.pdf 
-------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Manipulating DataFrames with pandas/Notes/ch5_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Manipulating DataFrames with pandas/Notes/ch5_slides.pdf -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Datasets/Baby names.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Datasets/Baby names.zip -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Datasets/GDP.zip: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Datasets/GDP.zip -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Datasets/Sales.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Datasets/Sales.zip -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Datasets/Summer Olympic medals.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Datasets/Summer Olympic medals.zip -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Notes/ch3_slides.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Merging DataFrames with pandas/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Merging DataFrames with pandas/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 1-Regular expressions & word tokenization/2.py: -------------------------------------------------------------------------------- 1 | # Word tokenization with NLTK 2 | # Here, you'll be using the first scene of Monty Python's Holy Grail, which has been pre-loaded as scene_one. Feel free to check it out in the IPython Shell! 3 | 4 | # Your job in this exercise is to utilize word_tokenize and sent_tokenize from nltk.tokenize to tokenize both words and sentences from Python strings - in this case, the first scene of Monty Python's Holy Grail. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Import the sent_tokenize and word_tokenize functions from nltk.tokenize. 11 | # Tokenize all the sentences in scene_one using the sent_tokenize() function. 12 | # Tokenize the fourth sentence in sentences, which you can access as sentences[3], using the word_tokenize() function. 13 | # Find the unique tokens in the entire scene by using word_tokenize() on scene_one and then converting it into a set using set(). 14 | # Print the unique tokens found. This has been done for you, so hit 'Submit Answer' to see the results!
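For a sense of what the two tokenizers do, here is a standard-library-only sketch of the same steps. nltk's sent_tokenize and word_tokenize are far more robust than these crude regexes, and the sample text is invented, since scene_one is preloaded only in the course environment:

```python
import re

# Invented stand-in for the preloaded scene_one text
text = "SOLDIER #1: Halt! Who goes there? ARTHUR: It is I, Arthur."

# Crude sentence split: break after end punctuation followed by whitespace
sentences = re.split(r"(?<=[.!?])\s+", text)

# Crude word tokenizer: runs of word characters, or single punctuation marks
tokens = re.findall(r"\w+|[^\w\s]", sentences[0])

print(sentences)
print(tokens)
```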
15 | 16 | from nltk.tokenize import sent_tokenize 17 | from nltk.tokenize import word_tokenize 18 | # Split scene_one into sentences: sentences 19 | sentences = sent_tokenize(scene_one) 20 | 21 | # Use word_tokenize to tokenize the fourth sentence: tokenized_sent 22 | tokenized_sent = word_tokenize(sentences[3]) 23 | 24 | # Make a set of unique tokens in the entire scene: unique_tokens 25 | unique_tokens = set(word_tokenize(scene_one)) 26 | 27 | # Print the unique tokens result 28 | print(unique_tokens) 29 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 1-Regular expressions & word tokenization/5.py: -------------------------------------------------------------------------------- 1 | # Non-ASCII tokenization 2 | # In this exercise, you'll practice advanced tokenization by tokenizing some non-ASCII text. You'll be using German with emoji! 3 | 4 | # Here, you have access to a string called german_text, which has been printed for you in the Shell. Notice the emoji and the German characters! 5 | 6 | # The following modules have been pre-imported from nltk.tokenize: regexp_tokenize and word_tokenize. 7 | 8 | # Unicode ranges for emoji are: 9 | 10 | # ('\U0001F300'-'\U0001F5FF'), ('\U0001F600'-'\U0001F64F'), ('\U0001F680'-'\U0001F6FF'), ('\u2600'-'\u26FF'), and ('\u2700'-'\u27BF'). 11 | 12 | # Instructions 13 | # 100 XP 16 | # Tokenize all the words in german_text using word_tokenize(), and print the result. 17 | # Tokenize only the capital words in german_text. 18 | # First, write a pattern called capital_words to match only capital words. Make sure to check for the German Ü! 19 | # Then, tokenize it using regexp_tokenize(). 20 | # Tokenize only the emoji in german_text. The pattern using the unicode ranges for emoji given in the assignment text has been written for you. Your job is to use regexp_tokenize() to tokenize the emoji.
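The capital-word and emoji patterns can be tried outside the course environment with the standard-library re module (regexp_tokenize is essentially a thin wrapper around this kind of pattern matching). The German sample string below is invented, since german_text is preloaded only on DataCamp:

```python
import re

# Invented stand-in for the preloaded german_text
text = "Wer will Pizza essen? 🍕 Ümit fährt mit Über. 🚕"

# Capitalized words, allowing for German capital umlauts
capital_words = re.findall(r"[A-ZÄÖÜ]\w+", text)
print(capital_words)

# Emoji, using the unicode ranges quoted in the exercise
emoji = re.findall("[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u26FF\u2700-\u27BF]", text)
print(emoji)
```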
21 | 22 | # Non-ASCII tokenization 23 | # Tokenize and print all words in german_text 24 | all_words = word_tokenize(german_text) 25 | print(all_words) 26 | 27 | # Tokenize and print only capital words 28 | capital_words = r"[A-ZÜ]\w+" 29 | print(regexp_tokenize(german_text, capital_words)) 30 | 31 | # Tokenize and print only emoji 32 | emoji = "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u26FF\u2700-\u27BF]" 33 | print(regexp_tokenize(german_text, emoji)) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 1-Regular expressions & word tokenization/chapter1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Natural Language Processing Fundamentals in Python/Chapter 1-Regular expressions & word tokenization/chapter1.pdf -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 2-Simple topic identification/1.py: -------------------------------------------------------------------------------- 3 | # Building a Counter with bag-of-words 4 | # In this exercise, you'll build your first (in this course) bag-of-words counter using a Wikipedia article, which has been pre-loaded as article. Try doing the bag-of-words without looking at the full article text, and guessing what the topic is! If you'd like to peek at the title at the end, we've included it as article_title. Note that this article text has had very little preprocessing from the raw Wikipedia database entry. 5 | 6 | # word_tokenize has been imported for you. 7 | 8 | # Instructions 9 | # 100 XP 12 | # Import Counter from collections.
13 | # Use word_tokenize() to split the article into tokens. 14 | # Use a list comprehension with t as the iterator variable to convert all the tokens into lowercase. The .lower() method converts text into lowercase. 15 | # Create a bag-of-words counter called bow_simple by using Counter() with lower_tokens as the argument. 16 | # Use the .most_common() method of bow_simple to print the 10 most common tokens. 17 | 18 | from collections import Counter 19 | 20 | # Tokenize the article: tokens 21 | tokens = word_tokenize(article) 22 | 23 | # Convert the tokens into lowercase: lower_tokens 24 | lower_tokens = [t.lower() for t in tokens] 25 | 26 | # Create a Counter with the lowercase tokens: bow_simple 27 | bow_simple = Counter(lower_tokens) 28 | 29 | # Print the 10 most common tokens 30 | print(bow_simple.most_common(10)) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 2-Simple topic identification/5.py: -------------------------------------------------------------------------------- 1 | # Tf-idf with Wikipedia 2 | # Now it's your turn to determine new significant terms for your corpus by applying gensim's tf-idf. You will again have access to the same corpus and dictionary objects you created in the previous exercises - dictionary, corpus, and doc. Will tf-idf make for more interesting results on the document level? 3 | 4 | # Instructions 5 | # 100 XP 8 | # Import TfidfModel from gensim.models.tfidfmodel. 9 | # Initialize a new TfidfModel called tfidf using corpus. 10 | # Use doc to calculate the weights. You can do this by passing [doc] to tfidf. 11 | # Print the first five term ids with weights. 12 | # Sort the term ids and weights in a new list from highest to lowest weight. This has been done for you. 13 | # Print the top five weighted words (term_id) from sorted_tfidf_weights along with their weighted score (weight).
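What TfidfModel computes can be sketched by hand: a term's weight is its in-document frequency times a penalty for how many documents contain it. A minimal pure-Python version, assuming gensim's default log2(N/df) idf formula and an invented three-document corpus:

```python
import math
from collections import Counter

# Invented three-document corpus, already tokenized
docs = [["cat", "sat", "mat"],
        ["cat", "cat", "dog"],
        ["dog", "ran", "far"]]

# Document frequency: in how many documents each term occurs
df = Counter(term for d in docs for term in set(d))

def tfidf(doc, n_docs=len(docs)):
    """Weight each term by term frequency times log2(N / df)."""
    tf = Counter(doc)
    return {t: tf[t] * math.log2(n_docs / df[t]) for t in tf}

weights = tfidf(docs[1])
# Sort from highest to lowest weight, as in the exercise
print(sorted(weights.items(), key=lambda w: w[1], reverse=True))
```

A term appearing in every document gets idf log2(N/N) = 0, which is why ubiquitous words drop out of the ranking.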
14 | 15 | # Import TfidfModel 16 | from gensim.models.tfidfmodel import TfidfModel 17 | 18 | # Create a new TfidfModel using the corpus: tfidf 19 | tfidf = TfidfModel(corpus) 20 | 21 | # Calculate the tfidf weights of doc: tfidf_weights 22 | tfidf_weights = tfidf[doc] 23 | 24 | # Print the first five weights 25 | print(tfidf_weights[:5]) 26 | 27 | # Sort the weights from highest to lowest: sorted_tfidf_weights 28 | sorted_tfidf_weights = sorted(tfidf_weights, key=lambda w: w[1], reverse=True) 29 | 30 | # Print the top 5 weighted words 31 | for term_id, weight in sorted_tfidf_weights[:5]: 32 | print(dictionary.get(term_id), weight) 33 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 2-Simple topic identification/chapter2.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Natural Language Processing Fundamentals in Python/Chapter 2-Simple topic identification/chapter2.pdf -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/3.py: -------------------------------------------------------------------------------- 1 | # Comparing NLTK with spaCy NER 2 | # Using the same text you used in the first exercise of this chapter, you'll now see the results using spaCy's NER annotator. How will they compare? 3 | 4 | # The article has been pre-loaded as article. To minimize execution times, you'll be asked to specify the keyword arguments tagger=False, parser=False, matcher=False when loading the spaCy model, because you only care about the entities in this exercise. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Import spacy. 11 | # Load the 'en' model using spacy.load().
Specify the additional keyword arguments tagger=False, parser=False, matcher=False. 12 | # Create a spacy document object by passing article into nlp(). 13 | # Using ent as your iterator variable, iterate over the entities of doc and print out the labels (ent.label_) and text (ent.text). 14 | 15 | # Import spacy 16 | import spacy 17 | 18 | # Instantiate the English model: nlp 19 | nlp = spacy.load('en', tagger=False, parser=False, matcher=False) 20 | 21 | # Create a new document: doc 22 | doc = nlp(article) 23 | 24 | # Print all of the found entities and their labels 25 | for ent in doc.ents: 26 | print(ent.label_, ent.text) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/4.py: -------------------------------------------------------------------------------- 1 | # French NER with polyglot I 2 | # In this exercise and the next, you'll use the polyglot library to identify French entities. The library functions slightly differently than spacy, so you'll use a few of the new things you learned in the last video to display the named entity text and category. 3 | 4 | # You have access to the full article string in article. Additionally, the Text class of polyglot has been imported from polyglot.text. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Create a new Text object called txt. 9 | # Iterate over txt.entities and print each entity, ent. 10 | # Print the type() of ent. 
11 | 12 | # Create a new text object using Polyglot's Text class: txt 13 | txt = Text(article) 14 | 15 | # Print each of the entities found 16 | for ent in txt.entities: 17 | print(ent) 18 | 19 | # Print the type of ent (the last entity left over from the loop) 20 | print(type(ent)) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/5.py: -------------------------------------------------------------------------------- 1 | # French NER with polyglot II 2 | # Here, you'll complete the work you began in the previous exercise. 3 | 4 | # Your task is to use a list comprehension to create a list of tuples, in which the first element is the entity tag, and the second element is the full string of the entity text. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Use a list comprehension to create a list of tuples called entities. 11 | # The output expression of your list comprehension should be a tuple. Remember to use () to create the tuple. 12 | # The first element of each tuple is the entity tag, which you can access using its .tag attribute. 13 | # The second element is the full string of the entity text, which you can access using ' '.join(ent). 14 | # Your iterator variable should be ent, and you should iterate over all of the entities of the polyglot Text object, txt. 15 | # Print entities to see what you've created.
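The (ent.tag, ' '.join(ent)) pattern can be exercised without installing polyglot by faking its entity objects, which behave like lists of words carrying a .tag attribute. The class and sample entities below are invented stand-ins, not polyglot's real API:

```python
# Hypothetical stand-in for a polyglot entity chunk: a list of words
# plus a .tag attribute (just enough surface to run the comprehension)
class FakeEntity(list):
    def __init__(self, tag, words):
        super().__init__(words)
        self.tag = tag

ents = [FakeEntity("I-PER", ["Gabriel", "García", "Márquez"]),
        FakeEntity("I-LOC", ["Colombia"])]

# The same comprehension as in the exercise
entities = [(ent.tag, " ".join(ent)) for ent in ents]
print(entities)
```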
16 | 17 | # Create the list of tuples: entities 18 | entities = [(ent.tag, ' '.join(ent)) for ent in txt.entities] 19 | 20 | # Print the entities 21 | print(entities) 22 | 23 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/6.py: -------------------------------------------------------------------------------- 3 | # Spanish NER with polyglot 4 | # You'll continue your exploration of polyglot now with some Spanish annotation. This article is not written by a newspaper, so it is your first example of a more blog-like text. How do you think that might compare when finding entities? 5 | 6 | # The Text object has been created as txt, and each entity has been printed, as you can see in the IPython Shell. 7 | 8 | # Your specific task is to determine how many of the entities contain the words "Márquez" or "Gabo" - these refer to the same person in different ways! 9 | 10 | # Instructions 11 | # 100 XP 14 | # Iterate over all of the entities of txt, using ent as your iterator variable. 15 | # Check whether the entity contains "Márquez" or "Gabo". If it does, increment count. 16 | # Hit 'Submit Answer' to see what percentage of entities refer to Gabriel García Márquez (aka Gabo).
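The counting logic here is plain Python membership testing, so it can be tried with invented entities, each represented as a list of words the way polyglot yields them:

```python
# Invented entities standing in for txt.entities
ents = [["Gabriel", "García", "Márquez"], ["Gabo"], ["Colombia"], ["Márquez"]]

# `"Márquez" in ent` is a membership test over the entity's words
count = sum(1 for ent in ents if "Márquez" in ent or "Gabo" in ent)
percentage = count / len(ents)
print(count, percentage)
```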
17 | 18 | count = 0 19 | 20 | # Iterate over all the entities 21 | for ent in txt.entities: 22 | # Check whether the entity contains 'Márquez' or 'Gabo' 23 | if "Márquez" in ent or "Gabo" in ent: 24 | # Increment count 25 | count += 1 26 | 27 | # Print count 28 | print(count) 29 | 30 | # Calculate the percentage of entities that refer to "Gabo": percentage 31 | percentage = count * 1.0 / len(txt.entities) 32 | print(percentage) 33 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/chapter3.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Natural Language Processing Fundamentals in Python/Chapter 3-Named-entity recognition/chapter3.pdf -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/2.py: -------------------------------------------------------------------------------- 1 | # TfidfVectorizer for text classification 2 | # Similar to the sparse CountVectorizer created in the previous exercise, you'll work on creating tf-idf vectors for your documents. You'll set up a TfidfVectorizer and investigate some of its features. 3 | 4 | # In this exercise, you'll use pandas and sklearn along with the same X_train, y_train and X_test, y_test DataFrames and Series you created in the last exercise. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Import TfidfVectorizer from sklearn.feature_extraction.text. 11 | # Create a TfidfVectorizer object called tfidf_vectorizer. When doing so, specify the keyword arguments stop_words="english" and max_df=0.7. 12 | # Fit and transform the training data. 13 | # Transform the test data.
14 | # Print the first 10 features of tfidf_vectorizer. 15 | # Print the first 5 vectors of the tfidf training data using slicing on the .A (or array) attribute of tfidf_train. 16 | 17 | # Import TfidfVectorizer 18 | from sklearn.feature_extraction.text import TfidfVectorizer 19 | 20 | # Initialize a TfidfVectorizer object: tfidf_vectorizer 21 | tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7) 22 | 23 | # Fit and transform the training data: tfidf_train 24 | tfidf_train = tfidf_vectorizer.fit_transform(X_train) 25 | 26 | # Transform the test data: tfidf_test 27 | tfidf_test = tfidf_vectorizer.transform(X_test) 28 | 29 | # Print the first 10 features 30 | print(tfidf_vectorizer.get_feature_names()[:10]) 31 | 32 | # Print the first 5 vectors of the tfidf training data 33 | print(tfidf_train.A[:5]) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/4.py: -------------------------------------------------------------------------------- 1 | # Training and testing the "fake news" model with CountVectorizer 2 | # Now it's your turn to train the "fake news" model using the features you identified and extracted. In this first exercise you'll train and test a Naive Bayes model using the CountVectorizer data. 3 | 4 | # The training and test sets have been created, and count_vectorizer, count_train, and count_test have been computed. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Import the metrics module from sklearn and MultinomialNB from sklearn.naive_bayes. 11 | # Instantiate a MultinomialNB classifier called nb_classifier. 12 | # Fit the classifier to the training data. 13 | # Compute the predicted tags for the test data. 14 | # Calculate and print the accuracy score of the classifier. 15 | # Compute the confusion matrix. To make it easier to read, specify the keyword argument labels=['FAKE', 'REAL'].
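How the accuracy score and confusion matrix relate can be seen by computing both by hand on toy labels; sklearn's metrics module does exactly this bookkeeping (the labels below are invented):

```python
# Toy true labels and predictions
y_test = ["FAKE", "REAL", "REAL", "FAKE", "REAL"]
pred   = ["FAKE", "REAL", "FAKE", "FAKE", "REAL"]

labels = ["FAKE", "REAL"]
# cm[i][j]: samples whose true label is labels[i], predicted as labels[j]
cm = [[sum(1 for t, p in zip(y_test, pred) if t == a and p == b)
       for b in labels] for a in labels]

# Accuracy is the diagonal mass over the total number of samples
accuracy = sum(t == p for t, p in zip(y_test, pred)) / len(y_test)
print(cm)
print(accuracy)
```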
16 | 17 | from sklearn.naive_bayes import MultinomialNB 18 | from sklearn import metrics 19 | 20 | # Instantiate a Multinomial Naive Bayes classifier: nb_classifier 21 | nb_classifier = MultinomialNB() 22 | 23 | # Fit the classifier to the training data 24 | nb_classifier.fit(count_train, y_train) 25 | 26 | # Create the predicted tags: pred 27 | pred = nb_classifier.predict(count_test) 28 | 29 | # Calculate the accuracy score: score 30 | score = metrics.accuracy_score(y_test, pred) 31 | print(score) 32 | 33 | # Calculate the confusion matrix: cm 34 | cm = metrics.confusion_matrix(y_test, pred, labels=['FAKE', 'REAL']) 35 | print(cm) 36 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/5.py: -------------------------------------------------------------------------------- 1 | # Training and testing the "fake news" model with TfidfVectorizer 2 | # Now that you have evaluated the model using the CountVectorizer, you'll do the same using the TfidfVectorizer with a Naive Bayes model. 3 | 4 | # The training and test sets have been created, and tfidf_vectorizer, tfidf_train, and tfidf_test have been computed. Additionally, MultinomialNB and metrics have been imported from, respectively, sklearn.naive_bayes and sklearn. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Instantiate a MultinomialNB classifier called nb_classifier. 11 | # Fit the classifier to the training data. 12 | # Compute the predicted tags for the test data. 13 | # Calculate and print the accuracy score of the classifier. 14 | # Compute the confusion matrix. As in the previous exercise, specify the keyword argument labels=['FAKE', 'REAL'] so that the resulting confusion matrix is easier to read.
15 | 16 | 17 | nb_classifier = MultinomialNB() 18 | 19 | # Fit the classifier to the training data 20 | nb_classifier.fit(tfidf_train, y_train) 21 | 22 | # Create the predicted tags: pred 23 | pred = nb_classifier.predict(tfidf_test) 24 | 25 | # Calculate the accuracy score: score 26 | score = metrics.accuracy_score(y_test, pred) 27 | print(score) 28 | 29 | # Calculate the confusion matrix: cm 30 | cm = metrics.confusion_matrix(y_test, pred, labels=['FAKE', 'REAL']) 31 | print(cm) -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/6.py: -------------------------------------------------------------------------------- 3 | # Improving your model 4 | # Your job in this exercise is to test a few different alpha levels using the Tfidf vectors to determine if there is a better performing combination. 5 | 6 | # The training and test sets have been created, and tfidf_vectorizer, tfidf_train, and tfidf_test have been computed. 7 | 8 | # Instructions 9 | # 100 XP 12 | # Create a list of alphas to try using np.arange(). Values should range from 0 to 1 with steps of 0.1. 13 | # Create a function train_and_predict() that takes in one argument: alpha. The function should: 14 | # Instantiate a MultinomialNB classifier with alpha=alpha. 15 | # Fit it to the training data. 16 | # Compute predictions on the test data. 17 | # Compute and return the accuracy score. 18 | # Using a for loop, print the alpha, score and a newline in between. Use your train_and_predict() function to compute the score. Does the score change along with the alpha? What is the best alpha?
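What the alpha hyperparameter changes is additive (Lidstone) smoothing of the per-class word probabilities: MultinomialNB estimates P(word | class) as (count + alpha) / (total + alpha * vocab_size), so with alpha=0 any word unseen in a class gets probability zero. A sketch with invented counts:

```python
def smoothed_prob(count, total, vocab_size, alpha):
    """Lidstone-smoothed estimate of P(word | class)."""
    return (count + alpha) / (total + alpha * vocab_size)

# A word never seen in a class of 5000 tokens over a 1000-word vocabulary:
# alpha=0 gives probability 0, which zeroes out any document containing it
for alpha in (0.0, 0.1, 1.0):
    print(alpha, smoothed_prob(0, 5000, 1000, alpha))
```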
19 | 20 | import numpy as np 21 | # Create the list of alphas: alphas 22 | alphas = np.arange(0, 1, .1) 23 | 24 | # Define train_and_predict() 25 | def train_and_predict(alpha): 26 | # Instantiate the classifier: nb_classifier 27 | nb_classifier = MultinomialNB(alpha=alpha) 28 | # Fit to the training data 29 | nb_classifier.fit(tfidf_train, y_train) 30 | # Predict the labels: pred 31 | pred = nb_classifier.predict(tfidf_test) 32 | # Compute accuracy: score 33 | score = metrics.accuracy_score(y_test, pred) 34 | return score 35 | 36 | # Iterate over the alphas and print the corresponding score 37 | for alpha in alphas: 38 | print('Alpha: ', alpha) 39 | print('Score: ', train_and_predict(alpha)) 40 | print() -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/7.py: -------------------------------------------------------------------------------- 1 | # Inspecting your model 2 | # Now that you have built a "fake news" classifier, you'll investigate what it has learned. You can map the important vector weights back to actual words using some simple inspection techniques. 3 | 4 | # You have your well performing tfidf Naive Bayes classifier available as nb_classifier, and the vectors as tfidf_vectorizer. 5 | 6 | # Instructions 7 | # 100 XP 10 | # Save the class labels as class_labels by accessing the .classes_ attribute of nb_classifier. 11 | # Extract the features using the .get_feature_names() method of tfidf_vectorizer. 12 | # Create a zipped array of the classifier coefficients with the feature names and sort them by the coefficients. To do this, first use zip() with the arguments nb_classifier.coef_[0] and feature_names. Then, use sorted() on this. 13 | # Print the top 20 weighted features for the first label of class_labels. 14 | # Print the bottom 20 weighted features for the second label of class_labels.
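The zip-and-sort idiom from the instructions can be seen on invented numbers: sorted() on (coefficient, name) pairs orders by coefficient, so the head of the list holds the lowest-weighted features and the tail the highest:

```python
# Invented coefficients and feature names
coefs = [-8.0, -2.5, -6.1]
feature_names = ["aardvark", "breaking", "news"]

# Pairing weights with names and sorting orders by weight, lowest first
feat_with_weights = sorted(zip(coefs, feature_names))
print(feat_with_weights[:1])    # lowest-weighted feature
print(feat_with_weights[-1:])   # highest-weighted feature
```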
15 | 16 | class_labels = nb_classifier.classes_ 17 | 18 | # Extract the features: feature_names 19 | feature_names = tfidf_vectorizer.get_feature_names() 20 | 21 | # Zip the feature names together with the coefficient array and sort by weights: feat_with_weights 22 | feat_with_weights = sorted(zip(nb_classifier.coef_[0], feature_names)) 23 | 24 | # Print the first class label and the top 20 feat_with_weights entries 25 | print(class_labels[0], feat_with_weights[:20]) 26 | 27 | # Print the second class label and the bottom 20 feat_with_weights entries 28 | print(class_labels[1], feat_with_weights[-20:]) 29 | -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/chapter4.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Natural Language Processing Fundamentals in Python/Chapter 4-Building a fake news classifier/chapter4.pdf -------------------------------------------------------------------------------- /Natural Language Processing Fundamentals in Python/Datasets/english_stopwords.txt: -------------------------------------------------------------------------------- 1 | i 2 | me 3 | my 4 | myself 5 | we 6 | our 7 | ours 8 | ourselves 9 | you 10 | your 11 | yours 12 | yourself 13 | yourselves 14 | he 15 | him 16 | his 17 | himself 18 | she 19 | her 20 | hers 21 | herself 22 | it 23 | its 24 | itself 25 | they 26 | them 27 | their 28 | theirs 29 | themselves 30 | what 31 | which 32 | who 33 | whom 34 | this 35 | that 36 | these 37 | those 38 | am 39 | is 40 | are 41 | was 42 | were 43 | be 44 | been 45 | being 46 | have 47 | has 48 | had 49 | having 50 | do 51 | does 52 | did 53 | doing 54 | a 55 | an 56 | the 57 | and 58 | but 59 | if 60 | or 61 | because 62 | as 63 | until 64 | 
while 65 | of 66 | at 67 | by 68 | for 69 | with 70 | about 71 | against 72 | between 73 | into 74 | through 75 | during 76 | before 77 | after 78 | above 79 | below 80 | to 81 | from 82 | up 83 | down 84 | in 85 | out 86 | on 87 | off 88 | over 89 | under 90 | again 91 | further 92 | then 93 | once 94 | here 95 | there 96 | when 97 | where 98 | why 99 | how 100 | all 101 | any 102 | both 103 | each 104 | few 105 | more 106 | most 107 | other 108 | some 109 | such 110 | no 111 | nor 112 | not 113 | only 114 | own 115 | same 116 | so 117 | than 118 | too 119 | very 120 | s 121 | t 122 | can 123 | will 124 | just 125 | don 126 | should 127 | now 128 | d 129 | ll 130 | m 131 | o 132 | re 133 | ve 134 | y 135 | ain 136 | aren 137 | couldn 138 | didn 139 | doesn 140 | hadn 141 | hasn 142 | haven 143 | isn 144 | ma 145 | mightn 146 | mustn 147 | needn 148 | shan 149 | shouldn 150 | wasn 151 | weren 152 | won 153 | wouldn 154 | -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/1.py: -------------------------------------------------------------------------------- 1 | # Write a simple function 2 | # In the last video, Hugo described the basics of how to define a function. You will now write your own function! 3 | 4 | # Define a function, shout(), which simply prints out a string with three exclamation marks '!!!' at the end. The code for the square() function that we wrote earlier is found below. You can use it as a pattern to define shout(). 5 | 6 | # def square(): 7 | # new_value = 4 ** 2 8 | # return new_value 9 | # Note that the function body is indented 4 spaces already for you. Function bodies need to be indented by a consistent number of spaces and the choice of 4 is common. 10 | 11 | # This course touches on a lot of concepts you may have forgotten, so if you ever need a quick refresher, download the Python for data science Cheat Sheet and keep it handy! 
12 | 13 | # Instructions 14 | # 100 XP 17 | # Complete the function header by adding the appropriate function name, shout. 18 | # In the function body, concatenate the string, 'congratulations' with another string, '!!!'. Assign the result to shout_word. 19 | # Print the value of shout_word. 20 | # Call the shout function. 21 | 22 | # Define the function shout 23 | def shout(): 24 | """Print a string with three exclamation marks""" 25 | # Concatenate the strings: shout_word 26 | shout_word = "congratulations" + "!!!" 27 | 28 | # Print shout_word 29 | print(shout_word) 30 | 31 | # Call shout 32 | shout() -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/2.py: -------------------------------------------------------------------------------- 3 | # Single-parameter functions 4 | # Congratulations! You have successfully defined and called your own function! That's pretty cool. 5 | 6 | # In the previous exercise, you defined and called the function shout(), which printed out a string concatenated with '!!!'. You will now update shout() by adding a parameter so that it can accept and process any string argument passed to it. Also note that shout(word), the part of the header that specifies the function name and parameter(s), is known as the signature of the function. You may encounter this term in the wild! 7 | 8 | # Instructions 9 | # 100 XP 10 | # Complete the function header by adding the parameter name, word. 11 | # Assign the result of concatenating word with '!!!' to shout_word. 12 | # Print the value of shout_word. 13 | # Call the shout() function, passing to it the string, 'congratulations'. 14 | 15 | # Define shout with the parameter, word 16 | def shout(word): 17 | """Print a string with three exclamation marks""" 18 | # Concatenate the strings: shout_word 19 | shout_word = word + '!!!'
20 | 21 | # Print shout_word 22 | print(shout_word) 23 | 24 | # Call shout with the string 'congratulations' 25 | shout('congratulations') -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/3.py: -------------------------------------------------------------------------------- 3 | # Functions that return single values 4 | # You're getting very good at this! Try your hand at another modification to the shout() function so that it now returns a single value instead of printing within the function. Recall that the return keyword lets you return values from functions. Parts of the function shout(), which you wrote earlier, are shown. Returning values is generally more desirable than printing them out because, as you saw earlier, a print() call assigned to a variable has type NoneType. 5 | 6 | # Instructions 7 | # 100 XP 10 | # In the function body, concatenate the string in word with '!!!' and assign to shout_word. 11 | # Replace the print() statement with the appropriate return statement. 12 | # Call the shout() function, passing to it the string, 'congratulations', and assigning the call to the variable, yell. 13 | # To check if yell contains the value returned by shout(), print the value of yell. 14 | # Define shout with the parameter, word 15 | def shout(word): 16 | """Return a string with three exclamation marks""" 17 | # Concatenate the strings: shout_word 18 | shout_word = word + '!!!'
19 | 20 | # Replace print with return 21 | return shout_word 22 | 23 | # Pass 'congratulations' to shout: yell 24 | yell = shout('congratulations') 25 | 26 | # Print yell 27 | print(yell) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/4.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | 3 | # Functions with multiple parameters 4 | # Hugo discussed the use of multiple parameters in defining functions in the last lecture. You are now going to use what you've learned to modify the shout() function further. Here, you will modify shout() to accept two arguments. Parts of the function shout(), which you wrote earlier, are shown. 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # Modify the function header such that it accepts two parameters, word1 and word2, in that order. 11 | # Concatenate each of word1 and word2 with '!!!' and assign to shout1 and shout2, respectively. 12 | # Concatenate shout1 and shout2 together, in that order, and assign to new_shout. 13 | # Pass the strings 'congratulations' and 'you', in that order, to a call to shout(). Assign the return value to yell. 14 | 15 | 16 | # Define shout with parameters word1 and word2 17 | def shout(word1, word2): 18 | """Concatenate strings with three exclamation marks""" 19 | # Concatenate word1 with '!!!': shout1 20 | shout1 = word1 + '!!!' 21 | 22 | # Concatenate word2 with '!!!': shout2 23 | shout2 = word2 + '!!!' 
24 | 25 | # Concatenate shout1 with shout2: new_shout 26 | new_shout = shout1 + shout2 27 | 28 | # Return new_shout 29 | return new_shout 30 | 31 | # Pass 'congratulations' and 'you' to shout(): yell 32 | yell = shout('congratulations', 'you') 33 | 34 | # Print yell 35 | print(yell) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/5.py: -------------------------------------------------------------------------------- 1 | # A brief introduction to tuples 2 | # Alongside learning about functions, you've also learned about tuples! Here, you will practice what you've learned about tuples: how to construct, unpack, and access tuple elements. Recall how Hugo unpacked the tuple even_nums in the video: 3 | 4 | # a, b, c = even_nums 5 | 6 | # A three-element tuple named nums has been preloaded for this exercise. Before completing the script, perform the following: 7 | 8 | # Print out the value of nums in the IPython shell. Note the elements in the tuple. 9 | # In the IPython shell, try to change the first element of nums to the value 2 by doing an assignment: nums[0] = 2. What happens? 10 | # Instructions 11 | # 100 XP 12 | 13 | 14 | # Unpack nums to the variables num1, num2, and num3. 15 | # Construct a new tuple, even_nums composed of the same elements in nums, but with the 1st element replaced with the value, 2. 
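The shell experiment suggested above has a definite answer worth spelling out: tuples are immutable, so item assignment raises a TypeError, and the only way to "change" an element is to build a new tuple. A minimal sketch (variable names mirror the exercise):

```python
# Tuples are immutable: item assignment is rejected at runtime.
nums = (3, 4, 6)
try:
    nums[0] = 2
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment

# The workaround is to construct a new tuple instead.
even_nums = (2,) + nums[1:]
print(even_nums)  # (2, 4, 6)
```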
16 | 17 | nums = (3, 4, 6) 18 | print(nums) 19 | 20 | # Unpack nums into num1, num2, and num3 21 | num1, num2, num3 = nums 22 | 23 | # Construct even_nums 24 | even_nums = (2, num2, num3) 25 | print(even_nums) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/6.py: -------------------------------------------------------------------------------- 1 | # Functions that return multiple values 2 | # In the previous exercise, you constructed tuples, assigned tuples to variables, and unpacked tuples. Here you will return multiple values from a function using tuples. Let's now update our shout() function to return multiple values. Instead of returning just one string, we will return two strings with the string !!! concatenated to each. 3 | 4 | # Note that the return statement return x, y has the same result as return (x, y): the former actually packs x and y into a tuple under the hood! 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # Modify the function header such that the function name is now shout_all, and it accepts two parameters, word1 and word2, in that order. 11 | # Concatenate the string '!!!' to each of word1 and word2 and assign to shout1 and shout2, respectively. 12 | # Construct a tuple shout_words, composed of shout1 and shout2. 13 | # Call shout_all() with the strings 'congratulations' and 'you' and assign the result to yell1 and yell2 (remember, shout_all() returns 2 variables!). 14 | 15 | # Define shout_all with parameters word1 and word2 16 | def shout_all(word1, word2): 17 | 18 | # Concatenate word1 with '!!!': shout1 19 | shout1 = word1 + '!!!' 20 | 21 | # Concatenate word2 with '!!!': shout2 22 | shout2 = word2 + '!!!' 
23 | 24 | # Construct a tuple with shout1 and shout2: shout_words 25 | shout_words = (shout1, shout2) 26 | 27 | # Return shout_words 28 | return shout_words 29 | 30 | # Pass 'congratulations' and 'you' to shout_all(): yell1, yell2 31 | yell1, yell2 = shout_all('congratulations', 'you') 32 | 33 | # Print yell1 and yell2 34 | print(yell1) 35 | print(yell2) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 1)/Chapter 1-Writing your own functions/ch1_slides.pdf -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/1.py: -------------------------------------------------------------------------------- 1 | # The keyword global 2 | # Let's work more on your mastery of scope. In this exercise, you will use the keyword global within a function to alter the value of a variable defined in the global scope. 3 | 4 | # Instructions 5 | # 100 XP 6 | # Use the keyword global to alter the object team in the global scope. 7 | # Change the value of team in the global scope to the string "justice league". Assign the result to team. 8 | # Hit the Submit button to see how executing your newly defined function change_team() changes the value of the name team! 
9 | 10 | # Create a string: team 11 | team = "teen titans" 12 | 13 | # Define change_team() 14 | def change_team(): 15 | """Change the value of the global variable team.""" 16 | 17 | # Use team in global scope 18 | global team 19 | 20 | # Change the value of team in global: team 21 | team = "justice league" 22 | 23 | # Print team 24 | print(team) 25 | 26 | # Call change_team() 27 | change_team() 28 | 29 | # Print team 30 | print(team) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/2.py: -------------------------------------------------------------------------------- 1 | # Nested Functions I 2 | # You've learned in the last video about nesting functions within functions. One reason why you'd like to do this is to avoid writing out the same computations within functions repeatedly. There's nothing new about defining nested functions: you simply define one as you would a regular function with def and embed it inside another function! 3 | 4 | # In this exercise, inside a function three_shouts(), you will define a nested function inner() that concatenates a string object with !!!. three_shouts() then returns a tuple of three elements, each a string concatenated with !!! using inner(). Go for it! 5 | 6 | # Instructions 7 | # 100 XP 8 | # Complete the function header of the nested function with the function name inner() and a single parameter word. 9 | # Complete the return value: each element of the tuple should be a call to inner(), passing in the parameters from three_shouts() as arguments to each call. 10 | 11 | # Define three_shouts 12 | def three_shouts(word1, word2, word3): 13 | """Returns a tuple of strings 14 | concatenated with '!!!'.""" 15 | 16 | # Define inner 17 | def inner(word): 18 | """Returns a string concatenated with '!!!'.""" 19 | return word + '!!!' 
20 | 21 | # Return a tuple of strings 22 | return (inner(word1), inner(word2), inner(word3)) 23 | 24 | # Call three_shouts() and print 25 | print(three_shouts('a', 'b', 'c')) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/3.py: -------------------------------------------------------------------------------- 1 | # Nested Functions II 2 | # Great job, you've just nested a function within another function. One other pretty cool reason for nesting functions is the idea of a closure. This means that the nested or inner function remembers the state of its enclosing scope when called. Thus, anything defined locally in the enclosing scope is available to the inner function even when the outer function has finished execution. 3 | 4 | # Let's move forward then! In this exercise, you will complete the definition of the inner function inner_echo() and then call echo() a couple of times, each with a different argument. Complete the exercise and see what the output will be! 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # Complete the function header of the inner function with the function name inner_echo() and a single parameter word1. 11 | # Complete the function echo() so that it returns inner_echo. 12 | # We have called echo(), passing 2 as an argument, and assigned the resulting function to twice. Your job is to call echo(), passing 3 as an argument. Assign the resulting function to thrice. 13 | # Hit Submit to call twice() and thrice() and print the results. 
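One way to convince yourself that the inner function really remembers its enclosing scope is to inspect its __closure__ attribute after the outer function has returned. A small demo of the closure idea described above (the inspection step is extra, not part of the exercise):

```python
def echo(n):
    """Return a function that concatenates n copies of its argument."""
    def inner_echo(word1):
        return word1 * n
    return inner_echo

twice = echo(2)

# echo() has already finished executing, yet inner_echo remembers n.
print(twice('hello'))  # hellohello

# The remembered value is stored in a closure cell on the function object.
print(twice.__closure__[0].cell_contents)  # 2
```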
14 | 15 | # Define echo 16 | def echo(n): 17 | """Return the inner_echo function.""" 18 | 19 | # Define inner_echo 20 | def inner_echo(word1): 21 | """Concatenate n copies of word1.""" 22 | echo_word = word1 * n 23 | return echo_word 24 | 25 | # Return inner_echo 26 | return inner_echo 27 | 28 | # Call echo: twice 29 | twice = echo(2) 30 | 31 | # Call echo: thrice 32 | thrice = echo(3) 33 | 34 | # Call twice() and thrice() then print 35 | print(twice('hello'), thrice('hello')) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/4.py: -------------------------------------------------------------------------------- 1 | # The keyword nonlocal and nested functions 2 | # Let's once again work further on your mastery of scope! In this exercise, you will use the keyword nonlocal within a nested function to alter the value of a variable defined in the enclosing scope. 3 | 4 | # Instructions 5 | # 100 XP 6 | # Assign to echo_word the string word, concatenated with itself. 7 | # Use the keyword nonlocal to alter the value of echo_word in the enclosing scope. 8 | # Alter echo_word to echo_word concatenated with '!!!'. 9 | # Call the function echo_shout(), passing it a single argument 'hello'. 10 | 11 | # Define echo_shout() 12 | def echo_shout(word): 13 | """Change the value of a nonlocal variable""" 14 | 15 | # Concatenate word with itself: echo_word 16 | echo_word = word + word 17 | 18 | # Print echo_word 19 | print(echo_word) 20 | 21 | # Define inner function shout() 22 | def shout(): 23 | """Alter a variable in the enclosing scope""" 24 | # Use echo_word in nonlocal scope 25 | nonlocal echo_word 26 | 27 | # Change echo_word to echo_word concatenated with '!!!' 
29 | 30 | # Call function shout() 31 | shout() 32 | 33 | # Print echo_word 34 | print(echo_word) 35 | 36 | # Call function echo_shout() with argument 'hello' 37 | echo_shout('hello') -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/5.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | 3 | # Functions with one default argument 4 | # In the previous chapter, you've learned to define functions with more than one parameter and then to call those functions by passing the required number of arguments. In the last video, Hugo built on this idea by showing you how to define functions with default arguments. You will practice that skill in this exercise by writing a function that uses a default argument and then calling the function a couple of times. 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # Complete the function header with the function name shout_echo. It accepts an argument word1 and a default argument echo with default value 1, in that order. 11 | # Use the * operator to concatenate echo copies of word1. Assign the result to echo_word. 12 | # Call shout_echo() with just the string, "Hey". Assign the result to no_echo. 13 | # Call shout_echo() with the string "Hey" and the value 5 for the default argument, echo. Assign the result to with_echo. 14 | 15 | # Define shout_echo 16 | def shout_echo(word1, echo=1): 17 | """Concatenate echo copies of word1 and three 18 | exclamation marks at the end of the string.""" 19 | 20 | # Concatenate echo copies of word1 using *: echo_word 21 | echo_word = word1 * echo 22 | 23 | # Concatenate '!!!' to echo_word: shout_word 24 | shout_word = echo_word + '!!!' 
25 | 26 | # Return shout_word 27 | return shout_word 28 | 29 | # Call shout_echo() with "Hey": no_echo 30 | no_echo = shout_echo("Hey") 31 | 32 | # Call shout_echo() with "Hey" and echo=5: with_echo 33 | with_echo = shout_echo("Hey", echo=5) 34 | 35 | # Print no_echo and with_echo 36 | print(no_echo) 37 | print(with_echo) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 1)/Chapter 2-Default arguments, variable-length arguments and scope/ch2_slides.pdf -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/1.py: -------------------------------------------------------------------------------- 1 | # Writing a lambda function you already know 2 | # Some function definitions are simple enough that they can be converted to a lambda function. By doing this, you write fewer lines of code, which is pretty awesome and will come in handy, especially when you're writing and maintaining big programs. In this exercise, you will use what you know about lambda functions to convert a function that does a simple task into a lambda function. Take a look at this function definition: 3 | 4 | # def echo_word(word1, echo): 5 | # """Concatenate echo copies of word1.""" 6 | # words = word1 * echo 7 | # return words 8 | # The function echo_word takes 2 parameters: a string value, word1 and an integer value, echo. It returns a string that is a concatenation of echo copies of word1. Your task is to convert this simple function into a lambda function. 
9 | 10 | # Instructions 11 | # 100 XP 12 | 13 | 14 | # Define the lambda function echo_word using the variables word1 and echo. Replicate what the original function definition for echo_word() does above. 15 | # Call echo_word() with the string argument 'hey' and the value 5, in that order. Assign the call to result. 16 | 17 | # Define echo_word as a lambda function: echo_word 18 | echo_word = (lambda word1, echo: word1 * echo) 19 | 20 | # Call echo_word: result 21 | result = echo_word('hey', 5) 22 | 23 | # Print result 24 | print(result) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/3.py: -------------------------------------------------------------------------------- 1 | # Filter() and lambda functions 2 | # In the previous exercise, you used lambda functions to anonymously embed an operation within map(). You will practice this again in this exercise by using a lambda function with filter(), which may be new to you! The function filter() offers a way to filter out elements from a list that don't satisfy certain criteria. 3 | 4 | # Your goal in this exercise is to use filter() to create, from an input list of strings, a new list that contains only strings that have more than 6 characters. 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # In the filter() call, pass a lambda function and the list of strings, fellowship. The lambda function should check if the number of characters in a string member is greater than 6; use the len() function to do this. Assign the resulting filter object to result. 11 | # Convert result to a list and print out the list. 
12 | 13 | # Create a list of strings: fellowship 14 | fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn', 'boromir', 'legolas', 'gimli', 'gandalf'] 15 | 16 | # Use filter() to apply a lambda function over fellowship: result 17 | result = filter(lambda member: len(member) > 6, fellowship) 18 | 19 | # Convert result to a list: result_list 20 | result_list = list(result) 21 | 22 | # Convert result into a list and print it 23 | print(result_list) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/4.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | 3 | # Reduce() and lambda functions 4 | # You're getting very good at using lambda functions! Here's one more function to add to your repertoire of skills. The reduce() function is useful for performing some computation on a list and, unlike map() and filter(), returns a single value as a result. To use reduce(), you must import it from the functools module. 5 | 6 | # Remember gibberish() from a few exercises back? 7 | 8 | # # Define gibberish 9 | # def gibberish(*args): 10 | # """Concatenate strings in *args together.""" 11 | # hodgepodge = '' 12 | # for word in args: 13 | # hodgepodge += word 14 | # return hodgepodge 15 | # gibberish() simply takes a list of strings as an argument and returns, as a single-value result, the concatenation of all of these strings. In this exercise, you will replicate this functionality by using reduce() and a lambda function that concatenates strings together. 16 | 17 | # Instructions 18 | # 100 XP 19 | # Import the reduce function from the functools module. 20 | # In the reduce() call, pass a lambda function that takes two string arguments item1 and item2 and concatenates them; also pass the list of strings, stark. Assign the result to result. 
The first argument to reduce() should be the lambda function and the second argument is the list stark. 21 | # Import reduce from functools 22 | from functools import reduce 23 | 24 | # Create a list of strings: stark 25 | stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon'] 26 | 27 | # Use reduce() to apply a lambda function over stark: result 28 | result = reduce(lambda item1, item2: item1 + item2, stark) 29 | 30 | # Print the result 31 | print(result) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/6.py: -------------------------------------------------------------------------------- 1 | # Error handling by raising an error 2 | # Another way to raise an error is by using raise. In this exercise, you will add a raise statement to the shout_echo() function you defined before to raise an error message when the value supplied by the user to the echo argument is less than 0. 3 | 4 | # The call to shout_echo() uses valid argument values. To test and see how the raise statement works, simply change the value for the echo argument to a negative value. Don't forget to change it back to valid values to move on to the next exercise! 5 | 6 | # Instructions 7 | # 100 XP 8 | # Complete the if statement by checking if the value of echo is less than 0. 9 | # In the body of the if statement, add a raise statement that raises a ValueError with message 'echo must be greater than 0' when the value supplied by the user to echo is less than 0. 10 | # Define shout_echo 11 | def shout_echo(word1, echo=1): 12 | """Concatenate echo copies of word1 and three 13 | exclamation marks at the end of the string.""" 14 | 15 | # Raise an error with raise 16 | if echo < 0: 17 | raise ValueError('echo must be greater than 0') 18 | 19 | # Concatenate echo copies of word1 using *: echo_word 20 | echo_word = word1 * echo 21 | 22 | # Concatenate '!!!' 
to echo_word: shout_word 23 | shout_word = echo_word + '!!!' 24 | 25 | # Return shout_word 26 | return shout_word 27 | 28 | # Call shout_echo 29 | shout_echo("particle", echo=5) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/7.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | 3 | # Bringing it all together (1) 4 | # This is awesome! You have now learned how to write anonymous functions using lambda, how to pass lambda functions as arguments to other functions such as map(), filter(), and reduce(), as well as how to raise errors and output custom error messages within your functions. You will now put these learnings to good use by working with a Twitter dataset. Before practicing your new error handling skills, in this exercise, you will write a lambda function and use filter() to select retweets, that is, tweets that begin with the string 'RT'. 5 | 6 | # To help you accomplish this, the Twitter data has been imported into the DataFrame, tweets_df. Go for it! 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # In the filter() call, pass a lambda function and the sequence of tweets as strings, tweets_df['text']. The lambda function should check if the first 2 characters in a tweet x are 'RT'. Assign the resulting filter object to result. To get the first 2 characters in a tweet x, use x[0:2]. To check equality, use a Boolean filter with ==. 13 | # Convert result to a list and print out the list. 
14 | 15 | # Select retweets from the Twitter DataFrame: result 16 | result = filter(lambda x: x[0:2] == 'RT', tweets_df['text']) 17 | 18 | # Create list from filter object result: res_list 19 | res_list = list(result) 20 | 21 | # Print all retweets in res_list 22 | for tweet in res_list: 23 | print(tweet) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 1)/Chapter 3-Lambda functions and error-handling/ch3_slides.pdf -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 1-Using iterators in PythonLand/1.py: -------------------------------------------------------------------------------- 1 | # Iterating over iterables (1) 2 | # Great, you're familiar with what iterables and iterators are! In this exercise, you will reinforce your knowledge about these by iterating over and printing from iterables and iterators. 3 | 4 | # You are provided with a list of strings flash. You will practice iterating over the list by using a for loop. You will also create an iterator for the list and access the values from the iterator. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Create a for loop to loop over flash and print the values in the list. Use person as the loop variable. 9 | # Create an iterator for the list flash and assign the result to superspeed. 
10 | # Print each of the items from superspeed using next() 4 times. # Create a list of strings: flash 11 | flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen'] 12 | 13 | # Print each list item in flash using a for loop 14 | for person in flash: 15 | print(person) 16 | 17 | 18 | # Create an iterator for flash: superspeed 19 | superspeed = iter(flash) 20 | 21 | # Print each item from the iterator 22 | print(next(superspeed)) 23 | print(next(superspeed)) 24 | print(next(superspeed)) 25 | print(next(superspeed)) 26 | -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 1-Using iterators in PythonLand/3.py: -------------------------------------------------------------------------------- 1 | # Iterators as function arguments 2 | # You've been using the iter() function to get an iterator object, as well as the next() function to retrieve the values one by one from the iterator object. 3 | 4 | # There are also functions that take iterators and iterables as arguments. For example, the list() and sum() functions return a list and the sum of elements, respectively. 5 | 6 | # In this exercise, you will use these functions by passing an iterable from range() and then printing the results of the function calls. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Create a range object that would produce the values from 10 to 20 using range(). Assign the result to values. 13 | # Use the list() function to create a list of values from the range object values. Assign the result to values_list. 14 | # Use the sum() function to get the sum of the values from 10 to 20 from the range object values. Assign the result to values_sum. 
15 | 16 | # Create a range object: values 17 | values = range(10, 21) 18 | 19 | # Print the range object 20 | print(values) 21 | 22 | # Create a list of integers: values_list 23 | values_list = list(values) 24 | 25 | # Print values_list 26 | print(values_list) 27 | 28 | # Get the sum of values: values_sum 29 | values_sum = sum(values) 30 | 31 | # Print values_sum 32 | print(values_sum) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 1-Using iterators in PythonLand/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 2)/Chapter 1-Using iterators in PythonLand/ch1_slides.pdf -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/1.py: -------------------------------------------------------------------------------- 1 | # Exercise 2 | 3 | # Writing list comprehensions 4 | # You now have all the knowledge necessary to begin writing list comprehensions! Your job in this exercise is to write a list comprehension that produces a list of the squares of the numbers ranging from 0 to 9. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i. 
9 | 10 | # Create list comprehension: squares 11 | squares = [i**2 for i in range(0, 10)] 12 | print(squares) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/10.py: -------------------------------------------------------------------------------- 1 | # Conditional list comprehensions for time-stamped data 2 | # Great, you've successfully extracted the data of interest, the time, from a pandas DataFrame! Let's tweak your work further by adding a conditional that further specifies which entries to select. 3 | 4 | # In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. You will add a conditional expression to the list comprehension so that you only select the times in which entry[17:19] is equal to '19'. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Extract the column 'created_at' from df and assign the result to tweet_time. 9 | # Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'. 
10 | 11 | 12 | # Extract the created_at column from df: tweet_time 13 | tweet_time = df['created_at'] 14 | 15 | # Extract the clock time: tweet_clock_time 16 | tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19'] 17 | 18 | # Print the extracted times 19 | print(tweet_clock_time) 20 | -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/3.py: -------------------------------------------------------------------------------- 1 | # Using conditionals in comprehensions (1) 2 | # You've been using list comprehensions to build lists of values, sometimes using operations to create these values. 3 | 4 | # An interesting mechanism in list comprehensions is that you can also create lists with values that meet only a certain condition. One way of doing this is by using conditionals on iterator variables. In this exercise, you will do exactly that! 5 | 6 | # Recall from the video that you can apply a conditional statement to test the iterator variable by adding an if statement in the optional predicate expression part after the for statement in the comprehension: 7 | 8 | # [ output expression for iterator variable in iterable if predicate expression ]. 9 | 10 | # You will use this recipe to write a list comprehension for this exercise. You are given a list of strings fellowship and, using a list comprehension, you will create a list that only includes the members of fellowship that have 7 characters or more. 11 | 12 | # Instructions 13 | # 100 XP 14 | 15 | 16 | # Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more. 
17 | 18 | # Create a list of strings: fellowship 19 | fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli'] 20 | 21 | # Create list comprehension: new_fellowship 22 | new_fellowship = [member for member in fellowship if len(member) >= 7] 23 | 24 | # Print the new list 25 | print(new_fellowship) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/4.py: -------------------------------------------------------------------------------- 1 | # Using conditionals in comprehensions (2) 2 | # In the previous exercise, you used an if conditional statement in the predicate expression part of a list comprehension to evaluate an iterator variable. In this exercise, you will use an if-else statement on the output expression of the list. 3 | 4 | # You will work on the same list, fellowship and, using a list comprehension and an if-else conditional statement in the output expression, create a list that keeps members of fellowship with 7 or more characters and replaces others with an empty string. Use member as the iterator variable in the list comprehension. 5 | 6 | # Instructions 7 | # 100 XP 8 | # In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "". 
9 | 10 | # Create a list of strings: fellowship 11 | fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli'] 12 | 13 | # Create list comprehension: new_fellowship 14 | new_fellowship = [member if len(member) >= 7 else "" for member in fellowship] 15 | 16 | # Print the new list 17 | print(new_fellowship) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/5.py: -------------------------------------------------------------------------------- 1 | # Dict comprehensions 2 | # Comprehensions aren't relegated merely to the world of lists. There are many other objects you can build using comprehensions, such as dictionaries, pervasive objects in Data Science. You will create a dictionary using the comprehension syntax for this exercise. In this case, the comprehension is called a dict comprehension. 3 | 4 | # Recall that the main difference between a list comprehension and a dict comprehension is the use of curly braces {} instead of []. Additionally, members of the dictionary are created using a colon :, as in key: value. 5 | 6 | # You are given a list of strings fellowship and, using a dict comprehension, create a dictionary with the members of the list as the keys and the length of each string as the corresponding values. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax key: value in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable. 
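The curly-brace, key: value recipe above can be sketched with a few made-up names:

```python
names = ['ada', 'grace', 'alan']

# Braces plus a key: value output expression build a dict, one entry per item.
name_lengths = {name: len(name) for name in names}
print(name_lengths)  # {'ada': 3, 'grace': 5, 'alan': 4}
```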
13 | 14 | # Create a list of strings: fellowship 15 | fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli'] 16 | 17 | # Create dict comprehension: new_fellowship 18 | new_fellowship = {member: len(member) for member in fellowship} 19 | 20 | # Print the new dictionary 21 | print(new_fellowship) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/6.py: -------------------------------------------------------------------------------- 1 | # Write your own generator expressions 2 | # You are familiar with what generators and generator expressions are, as well as how they differ from list comprehensions. In this exercise, you will practice building generator expressions on your own. 3 | 4 | # Recall that generator expressions basically have the same syntax as list comprehensions, except that they use parentheses () instead of brackets []; this should make things feel familiar! Furthermore, if you have ever iterated over a dictionary with .items(), or used the range() function, for example, you have already encountered and used generators before, without knowing it! When you use these functions, Python creates generators for you behind the scenes. 5 | 6 | # Now, you will start simple by creating a generator object that produces numeric values. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression. 13 | # Print the first 5 values by using next() appropriately in print(). 14 | # Print the rest of the values by using a for loop to iterate over the generator object. 
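A small sketch of how a generator expression behaves (the range and values here are chosen arbitrarily for illustration):

```python
# Parentheses build a lazy generator: nothing is computed until requested.
squares = (n ** 2 for n in range(5))

first_two = [next(squares), next(squares)]  # pull two values on demand
rest = list(squares)                        # drain whatever is left
print(first_two)      # [0, 1]
print(rest)           # [4, 9, 16]
print(list(squares))  # [] -- an exhausted generator yields nothing more
```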
15 | 16 | # Create generator object: result 17 | result = (num for num in range(31)) 18 | 19 | # Print the first 5 values 20 | print(next(result)) 21 | print(next(result)) 22 | print(next(result)) 23 | print(next(result)) 24 | print(next(result)) 25 | 26 | # Print the rest of the values 27 | for value in result: 28 | print(value) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/7.py: -------------------------------------------------------------------------------- 1 | # Changing the output in generator expressions 2 | # Great! At this point, you already know how to write a basic generator expression. In this exercise, you will push this idea a little further by adding to the output expression of a generator expression. Because generator expressions and list comprehensions are so alike in syntax, this should be a familiar task for you! 3 | 4 | # You are given a list of strings lannister and, using a generator expression, create a generator object that you will iterate over to print its values. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths. 9 | # Supply the correct iterable in the for loop for printing the values in the generator object. 
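Because a generator expression can transform each item in its output expression, an aggregate such as sum() can consume the stream directly without building a list first; a sketch with made-up names:

```python
words = ['cersei', 'jaime', 'tywin']

# sum() pulls len(w) values one at a time; no intermediate list is created.
total = sum(len(w) for w in words)
print(total)  # 16
```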
10 | 11 | # Create a list of strings: lannister 12 | lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey'] 13 | 14 | # Create a generator object: lengths 15 | lengths = (len(person) for person in lannister) 16 | 17 | # Iterate over and print the values in lengths 18 | for value in lengths: 19 | print(value) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/9.py: -------------------------------------------------------------------------------- 1 | # List comprehensions for time-stamped data 2 | # You will now make use of what you've learned from this chapter to solve a simple data extraction problem. You will also be introduced to a data structure, the pandas Series, in this exercise. We won't elaborate on it much here, but what you should know is that it is a data structure that you will be working with a great deal when analyzing data from pandas DataFrames. You can think of DataFrame columns as single-dimension arrays called Series. 3 | 4 | # In this exercise, you will be using a list comprehension to extract the time from time-stamped Twitter data. The pandas package has been imported as pd and the file 'tweets.csv' has been imported as the df DataFrame for your use. 5 | 6 | # Instructions 7 | # 100 XP 8 | 9 | 10 | # Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure! 11 | # Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing! 
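The [11:19] slice in the instructions above can be checked against a hypothetical timestamp string (the classic Twitter created_at layout is assumed here):

```python
# 0-based indexing: the slice [11:19] covers the 12th through 19th
# characters, which is where the HH:MM:SS clock time sits.
stamp = 'Tue Mar 29 23:40:17 +0000 2016'  # hypothetical tweet timestamp
clock = stamp[11:19]
print(clock)  # 23:40:17
```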
12 | 13 | # Extract the created_at column from df: tweet_time 14 | tweet_time = df['created_at'] 15 | 16 | # Extract the clock time: tweet_clock_time 17 | tweet_clock_time = [entry[11:19] for entry in tweet_time] 18 | 19 | # Print the extracted times 20 | print(tweet_clock_time) 21 | -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 2)/Chapter 2-List comprehensions and generators/ch2_slides.pdf -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/1.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Dictionaries for data science 4 | # For this exercise, you'll use what you've learned about the zip() function and combine two lists into a dictionary. 5 | 6 | # These lists are actually extracted from a bigger dataset file of world development indicators from the World Bank. For pedagogical purposes, we have pre-processed this dataset into the lists that you'll be working with. 7 | 8 | # The first list feature_names contains header names of the dataset and the second list row_vals contains actual values of a row from the dataset, corresponding to each of the header names. 9 | 10 | # Instructions 11 | # 100 XP 12 | 13 | 14 | # Create a zip object by calling zip() and passing to it feature_names and row_vals. Assign the result to zipped_lists. 15 | # Create a dictionary from the zipped_lists zip object by calling dict() with zipped_lists. Assign the resulting dictionary to rs_dict. 
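The zip()-then-dict() recipe can be sketched with a couple of made-up header/value lists (the names below are illustrative, not the preloaded exercise data):

```python
feature_names_demo = ['CountryName', 'Year']  # hypothetical headers
row_vals_demo = ['Norway', 2019]              # hypothetical row values

# zip() pairs the lists positionally; dict() turns the pairs into entries.
record = dict(zip(feature_names_demo, row_vals_demo))
print(record)  # {'CountryName': 'Norway', 'Year': 2019}
```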
16 | 17 | # Zip lists: zipped_lists 18 | zipped_lists = zip(feature_names, row_vals) 19 | print(type(zipped_lists)) 20 | 21 | # Create a dictionary: rs_dict 22 | rs_dict = dict(zipped_lists) 23 | print(type(rs_dict)) 24 | 25 | # Print the dictionary 26 | print(rs_dict) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/2.py: -------------------------------------------------------------------------------- 1 | # Writing a function to help you 2 | # Suppose you needed to repeat the same process done in the previous exercise for many, many rows of data. Rewriting your code again and again could become very tedious, repetitive, and unmaintainable. 3 | 4 | # In this exercise, you will create a function to house the code you wrote earlier to make things easier and much more concise. Why? This way, you only need to call the function and supply the appropriate lists to create your dictionaries! Again, the lists feature_names and row_vals are preloaded and these contain the header names of the dataset and actual values of a row from the dataset, respectively. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Define the function lists2dict() with two parameters: first is list1 and second is list2. 9 | # Return the resulting dictionary rs_dict in lists2dict(). 10 | # Call the lists2dict() function with the arguments feature_names and row_vals. Assign the result of the function call to rs_fxn. 
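One caveat worth remembering about zip objects like the one created above: they are one-shot iterators, so they can only be consumed once (illustrated with throwaway lists):

```python
pairs = zip(['a', 'b'], [1, 2])

first = dict(pairs)   # consumes the zip object completely
second = dict(pairs)  # nothing left to consume
print(first)   # {'a': 1, 'b': 2}
print(second)  # {}
```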
11 | 12 | # Define lists2dict() 13 | def lists2dict(list1, list2): 14 | """Return a dictionary where list1 provides 15 | the keys and list2 provides the values.""" 16 | 17 | # Zip lists: zipped_lists 18 | zipped_lists = zip(list1, list2) 19 | 20 | # Create a dictionary: rs_dict 21 | rs_dict = dict(zipped_lists) 22 | 23 | # Return the dictionary 24 | return rs_dict 25 | 26 | # Call lists2dict: rs_fxn 27 | rs_fxn = lists2dict(feature_names, row_vals) 28 | 29 | # Print rs_fxn 30 | print(rs_fxn) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/3.py: -------------------------------------------------------------------------------- 1 | # Using a list comprehension 2 | # This time, you're going to use the lists2dict() function you defined in the last exercise to turn a bunch of lists into a list of dictionaries with the help of a list comprehension. 3 | 4 | # The lists2dict() function has already been preloaded, together with a couple of lists, feature_names and row_lists. feature_names contains the header names of the World Bank dataset and row_lists is a list of lists, where each sublist is a list of actual values of a row from the dataset. 5 | 6 | # Your goal is to use a list comprehension to generate a list of dicts, where the keys are the header names and the values are the row entries. 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Inspect the contents of row_lists by printing the first two lists in row_lists. 13 | # Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts. 14 | # Look at the first two dictionaries in list_of_dicts by printing them out. 
15 | 16 | # Print the first two lists in row_lists 17 | print(row_lists[0]) 18 | print(row_lists[1]) 19 | 20 | # Turn list of lists into list of dicts: list_of_dicts 21 | list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists] 22 | 23 | # Print the first two dictionaries in list_of_dicts 24 | print(list_of_dicts[0]) 25 | print(list_of_dicts[1]) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/4.py: -------------------------------------------------------------------------------- 1 | # Turning this all into a DataFrame 2 | # You've zipped lists together, created a function to house your code, and even used the function in a list comprehension to generate a list of dictionaries. That was a lot of work and you did a great job! 3 | 4 | # You will now use all of these to convert the list of dictionaries into a pandas DataFrame. You will see how convenient it is to generate a DataFrame from dictionaries with the DataFrame() function from the pandas package. 5 | 6 | # The lists2dict() function, feature_names list, and row_lists list have been preloaded for this exercise. 7 | 8 | # Go for it! 9 | 10 | # Instructions 11 | # 100 XP 12 | # To use the DataFrame() function, first import the pandas package with the alias pd. 13 | # Create a DataFrame from the list of dictionaries in list_of_dicts by calling pd.DataFrame(). Assign the resulting DataFrame to df. 14 | # Inspect the contents of df by printing the head of the DataFrame. The head of the DataFrame df can be accessed by calling df.head(). 
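How pd.DataFrame() treats a list of dicts can be sketched with made-up records: each dict becomes a row and its keys become columns.

```python
import pandas as pd

rows = [
    {'CountryName': 'Norway', 'Year': 2019},  # hypothetical records
    {'CountryName': 'Chile', 'Year': 2018},
]
df_demo = pd.DataFrame(rows)
print(df_demo.shape)          # (2, 2)
print(list(df_demo.columns))  # ['CountryName', 'Year']
```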
15 | 16 | # Import the pandas package 17 | import pandas as pd 18 | 19 | # Turn list of lists into list of dicts: list_of_dicts 20 | list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists] 21 | 22 | # Turn list of dicts into a DataFrame: df 23 | df = pd.DataFrame(list_of_dicts) 24 | 25 | print(type(df)) 26 | 27 | # Print the head of the DataFrame 28 | print(df.head()) 29 | 30 | -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/7.py: -------------------------------------------------------------------------------- 1 | # Writing a generator to load data in chunks (3) 2 | # Great! You've just created a generator function that you can use to help you process large files. 3 | 4 | # Now let's use your generator function to process the World Bank dataset like you did previously. You will process the file line by line, to create a dictionary of the counts of how many times each country appears in a column in the dataset. For this exercise, however, you won't process just 1000 rows of data, you'll process the entire dataset! 5 | 6 | # The generator function read_large_file() and the csv file 'world_dev_ind.csv' are preloaded and ready for your use. Go for it! 7 | 8 | # Instructions 9 | # 100 XP 10 | 11 | 12 | # Bind the file 'world_dev_ind.csv' to file in the context manager with open(). 13 | # Complete the for loop so that it iterates over the generator from the call to read_large_file() to process all the rows of the file. 
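The read_large_file() generator used below is defined in an earlier exercise that is not included in this folder; here is a sketch of what such a generator typically looks like, tested on an in-memory stand-in for the file handle (the data is made up):

```python
import io

def read_large_file(file_object):
    """Yield one line of file_object at a time, so the whole
    file never has to sit in memory."""
    while True:
        data = file_object.readline()
        if not data:       # readline() returns '' at end of file
            break
        yield data

fake_file = io.StringIO('Norway,1\nChile,2\n')  # stand-in for a real handle
lines = [line.strip() for line in read_large_file(fake_file)]
print(lines)  # ['Norway,1', 'Chile,2']
```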
14 | # Initialize an empty dictionary: counts_dict 15 | counts_dict = {} 16 | 17 | # Open a connection to the file 18 | with open('world_dev_ind.csv') as file: 19 | 20 | # skip the column names 21 | file.readline() 22 | 23 | # Iterate over the generator from read_large_file() 24 | for line in read_large_file(file): 25 | 26 | row = line.split(',') 27 | first_col = row[0] 28 | 29 | if first_col in counts_dict.keys(): 30 | counts_dict[first_col] += 1 31 | else: 32 | counts_dict[first_col] = 1 33 | 34 | # Print 35 | print(counts_dict) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/8.py: -------------------------------------------------------------------------------- 1 | # Writing an iterator to load data in chunks (1) 2 | # Another way to read data too large to store in memory in chunks is to read the file in as DataFrames of a certain length, say, 100. For example, with the pandas package (imported as pd), you can do pd.read_csv(filename, chunksize=100). This creates an iterable reader object, which means that you can use next() on it. 3 | 4 | # In this exercise, you will read a file in small DataFrame chunks with read_csv(). You're going to use the World Bank Indicators data 'ind_pop.csv', available in your current directory, to look at the urban population indicator for numerous countries and years. 5 | 6 | # Instructions 7 | # 100 XP 8 | # Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader. 9 | # Print the first two chunks from df_reader. 
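The chunksize mechanics described above can be sketched on a tiny in-memory CSV (hypothetical data, far smaller than the exercise file):

```python
import io
import pandas as pd

csv_data = io.StringIO('x\n1\n2\n3\n4\n5\n')  # header plus five data rows

# chunksize turns read_csv into an iterator of DataFrames.
df_reader = pd.read_csv(csv_data, chunksize=2)
first_chunk = next(df_reader)
remaining_rows = sum(len(chunk) for chunk in df_reader)
print(len(first_chunk))  # 2
print(remaining_rows)    # 3
```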
10 | 11 | 12 | # Import the pandas package 13 | import pandas as pd 14 | 15 | # Initialize reader object: df_reader 16 | df_reader = pd.read_csv('ind_pop.csv', chunksize = 10) 17 | 18 | # Print two chunks 19 | print(next(df_reader)) 20 | print(next(df_reader)) -------------------------------------------------------------------------------- /Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Python Data Science Toolbox (Part 2)/Chapter 3-Bringing it all together!/ch3_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 1)/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 1)/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 1)/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 1)/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 1)/Notes/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 
1)/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 1)/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 1)/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Datasets/anscombe.csv: -------------------------------------------------------------------------------- 1 | 0,0,1,1,2,2,3,3 2 | x,y,x,y,x,y,x,y 3 | 10.0,8.04,10.0,9.14,10.0,7.46,8.0,6.58 4 | 8.0,6.95,8.0,8.14,8.0,6.77,8.0,5.76 5 | 13.0,7.58,13.0,8.74,13.0,12.74,8.0,7.71 6 | 9.0,8.81,9.0,8.77,9.0,7.11,8.0,8.84 7 | 11.0,8.33,11.0,9.26,11.0,7.81,8.0,8.47 8 | 14.0,9.96,14.0,8.10,14.0,8.84,8.0,7.04 9 | 6.0,7.24,6.0,6.13,6.0,6.08,8.0,5.25 10 | 4.0,4.26,4.0,3.10,4.0,5.39,19.0,12.50 11 | 12.0,10.84,12.0,9.13,12.0,8.15,8.0,5.56 12 | 7.0,4.82,7.0,7.26,7.0,6.42,8.0,7.91 13 | 5.0,5.68,5.0,4.74,5.0,5.73,8.0,6.89 14 | -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 2)/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in 
Python (Part 2)/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Notes/ch3_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 2)/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 2)/Notes/ch4_slides.pdf -------------------------------------------------------------------------------- /Statistical Thinking in Python (Part 2)/Notes/ch5_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/Statistical Thinking in Python (Part 2)/Notes/ch5_slides.pdf -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-leap-day -------------------------------------------------------------------------------- /pandas Foundations/Datasets/messy_stock_data.tsv: -------------------------------------------------------------------------------- 1 | The following stock data was collect on 2016-AUG-25 from an unknown source 2 | These kind of ocmments are not very useful, are they? 
3 | probably should just throw this line away too, but not the next since those are column labels 4 | name Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 5 | # So that line you just read has all the column headers labels 6 | IBM 156.08 160.01 159.81 165.22 172.25 167.15 164.75 152.77 145.36 146.11 137.21 137.96 7 | MSFT 45.51 43.08 42.13 43.47 47.53 45.96 45.61 45.51 43.56 48.70 53.88 55.40 8 | # That MSFT is MicroSoft 9 | GOOGLE 512.42 537.99 559.72 540.50 535.24 532.92 590.09 636.84 617.93 663.59 735.39 755.35 10 | APPLE 110.64 125.43 125.97 127.29 128.76 127.81 125.34 113.39 112.80 113.36 118.16 111.73 11 | # Maybe we should have bought some Apple stock in 2008? -------------------------------------------------------------------------------- /pandas Foundations/Datasets/world_population.csv: -------------------------------------------------------------------------------- 1 | Year,Total Population 2 | 1960,3034970564.0 3 | 1970,3684822701.0 4 | 1980,4436590356.0 5 | 1990,5282715991.0 6 | 2000,6115974486.0 7 | 2010,6924282937.0 8 | -------------------------------------------------------------------------------- /pandas Foundations/Notes/ch1_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/pandas Foundations/Notes/ch1_slides.pdf -------------------------------------------------------------------------------- /pandas Foundations/Notes/ch2_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/pandas Foundations/Notes/ch2_slides.pdf -------------------------------------------------------------------------------- /pandas Foundations/Notes/ch3_slides.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/pandas Foundations/Notes/ch3_slides.pdf -------------------------------------------------------------------------------- /pandas Foundations/Notes/ch4_slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/FTiniNadhirah/Datacamp-Data-Scientist-with-Python-2019-and-2020/52ab66237a2373ba820ea3e197809648b09f217b/pandas Foundations/Notes/ch4_slides.pdf --------------------------------------------------------------------------------