├── LICENSE ├── README.md ├── check_env.py ├── img ├── ISLR.jpeg ├── acquire.jpg ├── amit.png ├── approach.jpg ├── art.jpeg ├── bargava.jpg ├── book.png ├── books.jpg ├── break.jpg ├── clay.jpeg ├── craft.jpeg ├── estimating_coefficients.png ├── explore.jpg ├── frame.jpg ├── glass.jpg ├── insight.jpg ├── lens.jpeg ├── model.jpg ├── numbers.jpg ├── onion-image.jpg ├── onion.jpg ├── onion.png ├── overview.jpg ├── pair.jpg ├── postit.jpg ├── r2.gif ├── r_squared.png ├── refine.jpg ├── retail.jpg ├── science.jpeg ├── see.jpeg ├── single.jpeg ├── skills.png ├── slope_intercept.png ├── speak.jpeg ├── sports.jpg ├── stars.jpg ├── think.jpg ├── thinkstats.jpg ├── time.jpg ├── tool.jpg ├── travel.jpg ├── welcome.jpg ├── wesmckinney.jpg └── workshop.jpg ├── installation_instructions.md ├── overview.md ├── overview.pdf ├── python.txt └── time_series ├── 1-Frame.ipynb ├── 2-Acquire.ipynb ├── 3-Refine.ipynb ├── 4-Explore.ipynb ├── 5-Model.ipynb ├── 6-Insight.ipynb ├── MonthWiseMarketArrivals.csv ├── MonthWiseMarketArrivals.html ├── MonthWiseMarketArrivalsJan2016.html ├── MonthWiseMarketArrivals_Clean.csv ├── city_geocode.csv ├── img ├── Cov_nonstationary.png ├── Mean_nonstationary.png ├── Var_nonstationary.png ├── corr.svg ├── left_merge.png ├── onion_small.png ├── onion_tables.png ├── peeling_the_onion_small.png ├── pivot.png ├── splitapplycombine.png ├── subsetcolumns.png └── subsetrows.png └── state_geocode.csv /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Amit Kapoor & Bargava Subramanian 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software 
is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Time Series Analysis using Python 2 | Workshop material for Time Series Analysis in Python 3 | by [Amit Kapoor](http://twitter.com/amitkaps) and [Bargava Subramanian](http://twitter.com/bargava) 4 | 5 | **Experience Level**: Beginner 6 | 7 | **Overview**: Much of the data we see in nature takes the form of a continuous time series. This workshop will provide an overview of how to do time series analysis and introduce time series forecasting. 8 | 9 | **Audience**: People interested in data analytics on time series data. 10 | 11 | **Objective**: 12 | 13 | 1. What is time series data? 14 | 2. How to visualize time series data? 15 | 3. How to analyze time series data? 16 | 4. How to forecast time series data? 17 | 18 | 19 | Weather data, stock prices, and the population of a country are all examples of time series data. The data is recorded at regular intervals: daily, weekly, monthly, etc. While a lot of theory has been developed for representing and analyzing data at a point in time, many of those methods don't work well with continuous time series data. 20 | 21 | The goal of this workshop is two-fold: 22 | 23 | 1. 
How to analyze/visualize time-series data 24 | 2. How to forecast using the available time-series data 25 | 26 | We will take a principled, scientific approach to gathering, preparing, and exploring data. We will create some summary metrics using the available data. 27 | 28 | Then we will define the problem(s) we want to forecast, introduce some common time series forecasting models, and implement them in Python. 29 | 30 | **Outline** 31 | 32 | * Obtaining time series data 33 | * Determine what questions need to be answered 34 | * Generate hypotheses for various solution approaches 35 | * Exploring time series data 36 | * Outliers 37 | * Missing values 38 | * Creating aggregate metrics 39 | * Calculate percentage/proportion metrics 40 | * Summary metrics 41 | * Visualize time series data 42 | * Time Series forecasting 43 | * Linear regression 44 | * Moving average 45 | * Time series decomposition 46 | * ARIMA 47 | * Dynamic Regression Models 48 | * Vector Autoregression 49 | * Exponential Smoothing 50 | 51 | 52 | 53 | *Script to check if requisite libraries for the workshop are present* 54 | Please execute the following command at the command prompt 55 | 56 | $ python check_env.py 57 | 58 | If any library has a `FAIL` message, please install/upgrade that library. 59 | 60 | Installation instructions can be found [here](https://github.com/rouseguy/TimeSeriesAnalysiswithPython/blob/master/installation_instructions.md) 61 | 62 | --- 63 | ### Licensing 64 | 65 | Time Series Analysis using Python by Amit Kapoor and Bargava Subramanian is licensed under an MIT License. 66 | -------------------------------------------------------------------------------- /check_env.py: -------------------------------------------------------------------------------- 1 | # Authors: Amit Kapoor and Bargava Subramanian 2 | # Copyright (c) 2016 Amit Kapoor 3 | # License: MIT License 4 | 5 | """ 6 | This script will check if the environment setup is correct for the workshop. 
7 | 8 | To run, please execute the following command from the command prompt 9 | $ python check_env.py 10 | 11 | The output will indicate if any of the libraries are missing or need to be updated. 12 | 13 | This script is inspired by https://github.com/fonnesbeck/scipy2015_tutorial/blob/master/check_env.py 14 | """ 15 | 16 | from __future__ import print_function 17 | 18 | try: 19 | import curses 20 | curses.setupterm() 21 | assert curses.tigetnum("colors") > 2 22 | OK = "\x1b[1;%dm[ OK ]\x1b[0m" % (30 + curses.COLOR_GREEN) 23 | FAIL = "\x1b[1;%dm[FAIL]\x1b[0m" % (30 + curses.COLOR_RED) 24 | except Exception: 25 | OK = '[ OK ]' 26 | FAIL = '[FAIL]' 27 | 28 | import sys 29 | try: 30 | import importlib 31 | except ImportError: 32 | print(FAIL, "Python version 2.7 or higher is required, but %s is installed." % sys.version) 33 | from distutils.version import LooseVersion as Version 34 | 35 | def import_version(pkg, min_ver, fail_msg=""): 36 | mod = None 37 | try: 38 | mod = importlib.import_module(pkg) 39 | if((pkg=="spacy" or pkg=="wordcloud") and (mod is not None)): 40 | print(OK, '%s ' % (pkg)) 41 | else: 42 | # fall back to VERSION when __version__ is absent 43 | version = getattr(mod, "__version__", 0) or getattr(mod, "VERSION", 0) 44 | if Version(str(version)) < min_ver: 45 | print(FAIL, "%s version %s or higher required, but %s installed." 46 | % (pkg, min_ver, version)) 47 | else: 48 | print(OK, '%s version %s' % (pkg, version)) 49 | except ImportError: 50 | print(FAIL, '%s not installed. %s' % (pkg, fail_msg)) 51 | return mod 52 | 53 | 54 | # first check the python version 55 | print('Using python in', sys.prefix) 56 | print(sys.version) 57 | pyversion = Version(sys.version) 58 | if pyversion < "3": 59 | print(FAIL, "Python version 3 is required, but %s is installed." % sys.version) 60 | elif pyversion >= "2": 61 | if pyversion == "2.7": 62 | print(FAIL, "Python version 2.7 is installed. Please upgrade to version 3." 
) 63 | else: 64 | print(FAIL, "Unknown Python version: %s" % sys.version) 65 | 66 | print() 67 | requirements = { 68 | 69 | 'IPython' : '4.0.3', 70 | 'jupyter' :'1.0.0', 71 | 'matplotlib' :'1.5.0', 72 | 'numpy' : '1.10.4', 73 | 'pandas' : '0.17.1', 74 | 'scipy' : '0.17.0', 75 | 'sklearn' : '0.17', 76 | 'seaborn' :'0.6.0', 77 | 'statsmodels':'0.6.1' 78 | } 79 | 80 | # now the dependencies 81 | for lib, required_version in list(requirements.items()): 82 | import_version(lib, required_version) 83 | 84 | 85 | 86 | -------------------------------------------------------------------------------- /img/ISLR.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/ISLR.jpeg -------------------------------------------------------------------------------- /img/acquire.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/acquire.jpg -------------------------------------------------------------------------------- /img/amit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/amit.png -------------------------------------------------------------------------------- /img/approach.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/approach.jpg -------------------------------------------------------------------------------- /img/art.jpeg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/art.jpeg -------------------------------------------------------------------------------- /img/bargava.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/bargava.jpg -------------------------------------------------------------------------------- /img/book.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/book.png -------------------------------------------------------------------------------- /img/books.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/books.jpg -------------------------------------------------------------------------------- /img/break.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/break.jpg -------------------------------------------------------------------------------- /img/clay.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/clay.jpeg -------------------------------------------------------------------------------- /img/craft.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/craft.jpeg 
-------------------------------------------------------------------------------- /img/estimating_coefficients.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/estimating_coefficients.png -------------------------------------------------------------------------------- /img/explore.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/explore.jpg -------------------------------------------------------------------------------- /img/frame.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/frame.jpg -------------------------------------------------------------------------------- /img/glass.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/glass.jpg -------------------------------------------------------------------------------- /img/insight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/insight.jpg -------------------------------------------------------------------------------- /img/lens.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/lens.jpeg -------------------------------------------------------------------------------- /img/model.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/model.jpg -------------------------------------------------------------------------------- /img/numbers.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/numbers.jpg -------------------------------------------------------------------------------- /img/onion-image.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion-image.jpg -------------------------------------------------------------------------------- /img/onion.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion.jpg -------------------------------------------------------------------------------- /img/onion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion.png -------------------------------------------------------------------------------- /img/overview.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/overview.jpg -------------------------------------------------------------------------------- /img/pair.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/pair.jpg -------------------------------------------------------------------------------- /img/postit.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/postit.jpg -------------------------------------------------------------------------------- /img/r2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/r2.gif -------------------------------------------------------------------------------- /img/r_squared.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/r_squared.png -------------------------------------------------------------------------------- /img/refine.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/refine.jpg -------------------------------------------------------------------------------- /img/retail.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/retail.jpg -------------------------------------------------------------------------------- /img/science.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/science.jpeg 
-------------------------------------------------------------------------------- /img/see.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/see.jpeg -------------------------------------------------------------------------------- /img/single.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/single.jpeg -------------------------------------------------------------------------------- /img/skills.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/skills.png -------------------------------------------------------------------------------- /img/slope_intercept.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/slope_intercept.png -------------------------------------------------------------------------------- /img/speak.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/speak.jpeg -------------------------------------------------------------------------------- /img/sports.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/sports.jpg -------------------------------------------------------------------------------- /img/stars.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/stars.jpg -------------------------------------------------------------------------------- /img/think.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/think.jpg -------------------------------------------------------------------------------- /img/thinkstats.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/thinkstats.jpg -------------------------------------------------------------------------------- /img/time.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/time.jpg -------------------------------------------------------------------------------- /img/tool.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/tool.jpg -------------------------------------------------------------------------------- /img/travel.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/travel.jpg -------------------------------------------------------------------------------- /img/welcome.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/welcome.jpg -------------------------------------------------------------------------------- /img/wesmckinney.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/wesmckinney.jpg -------------------------------------------------------------------------------- /img/workshop.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/workshop.jpg -------------------------------------------------------------------------------- /installation_instructions.md: -------------------------------------------------------------------------------- 1 | # Installation Instructions for the workshop 2 | 3 | 4 | ### Package Manager: Anaconda 5 | 6 | We strongly recommend using Anaconda. It can be downloaded from here: 7 | https://www.continuum.io/downloads 8 | 9 | It comes with `jupyter notebook` which is the IDE we will be using for the workshop 10 | 11 | We recommend using the Python 3.5 version. 12 | 13 | ### Required packages 14 | 15 | Run the following script at the command prompt to check if you have all the requisite packages installed. 16 | To run, please execute the following command from the command prompt 17 | 18 | $ python check_env.py 19 | 20 | The output will indicate if any of the libraries are missing or need to be updated. 
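As a rough illustration of what such a check involves, the sketch below imports a package by name and compares its reported version against a minimum. The helpers `parse_version` and `check_package` are hypothetical names introduced here and are not part of `check_env.py`; the version parsing is deliberately simplistic (leading numeric components only) and only approximates how a real version comparison behaves.

```python
import importlib
import re


def parse_version(text):
    """Turn '1.10.4' or '0.17.1rc1' into a tuple of leading integers.

    Hypothetical helper: stops at the first component with no leading digits.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_package(name, min_version):
    """Return 'missing', 'outdated', or 'ok' for a package name.

    Hypothetical helper: a package without a __version__ attribute is
    treated as version '0'.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "missing"
    installed = getattr(module, "__version__", "0")
    if parse_version(str(installed)) < parse_version(min_version):
        return "outdated"
    return "ok"
```

Calling `check_package("pandas", "0.17.1")`, for example, would return one of the three status strings depending on what is installed in the current environment.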
21 | 22 | Any package that is missing can be installed by running the following command at the command prompt 23 | 24 | $ pip install <package_name> 25 | 26 | Any package that needs to be upgraded can be upgraded by running the following command at the command prompt 27 | 28 | $ pip install --upgrade <package_name> 29 | 30 | 31 | Replace `<package_name>` with the package that needs to be installed/upgraded. 32 | 33 | -------------------------------------------------------------------------------- /overview.md: -------------------------------------------------------------------------------- 1 | ![](img/workshop.jpg) 2 | # Intro to Data Science and Machine Learning 3 | ### @amitkaps | @bargava 4 | 5 | --- 6 | 7 | ![](img/welcome.jpg) 8 | # Welcome 9 | 10 | --- 11 | 12 | # Facilitators 13 | ![](img/amit.png) 14 | ![](img/bargava.jpg) 15 | 16 | --- 17 | 18 | # Amit 19 | ## @amitkaps 20 | ![](img/amit.png) 21 | 22 | --- 23 | 24 | # Bargava 25 | ## @bargava 26 | ![](img/bargava.jpg) 27 | 28 | 29 | --- 30 | 31 | ![](img/lens.jpeg) 32 | # See the world through a data lens 33 | 34 | --- 35 | 36 | ![](img/see.jpeg) 37 | # "Data is just a clue to the end truth" 38 | -- Josh Smith 39 | 40 | --- 41 | 42 | ![](img/sports.jpg) 43 | ![](img/travel.jpg) 44 | ![](img/retail.jpg) 45 | # Data Driven Decisions 46 | 47 | --- 48 | 49 | ![](img/science.jpeg) 50 | # "Science is knowledge which we understand so well that we can teach it to a computer. Everything else is art" 51 | -- Donald Knuth 52 | 53 | --- 54 | 55 | ![](img/art.jpeg) 56 | # Data Science is an Art 57 | 58 | --- 59 | 60 | ![](img/glass.jpg) 61 | # Hypothesis Driven Approach 62 | 63 | --- 64 | 65 | ![](img/frame.jpg) 66 | # Frame 67 | ## "An approximate answer to the right problem is worth a good deal" 68 | 69 | --- 70 | 71 | ![](img/acquire.jpg) 72 | # Acquire 73 | ## "80% perspiration, 10% great idea, 10% great output" 74 | 75 | --- 76 | 77 | ![](img/refine.jpg) 78 | # Refine 79 | ## "All data is messy." 
80 | 81 | --- 82 | 83 | ![](img/explore.jpg) 84 | # Explore 85 | ## "I don't know what I don't know." 86 | 87 | --- 88 | 89 | ![](img/model.jpg) 90 | # Model 91 | ## "All models are wrong, but some are useful" 92 | 93 | --- 94 | 95 | ![](img/insight.jpg) 96 | # Insight 97 | ## "The goal is to turn data into insight" 98 | 99 | --- 100 | 101 | ![](img/approach.jpg) 102 | 103 | 104 | --- 105 | 106 | ![](img/think.jpg) 107 | ## "Doing data analysis requires quite a bit of thinking and we believe that when you’ve completed a good data analysis, you’ve spent more time thinking than doing." 108 | -- Roger Peng 109 | 110 | --- 111 | 112 | ![](img/tool.jpg) 113 | # Python Data Stack 114 | 115 | --- 116 | 117 | ![](img/books.jpg) 118 | # Case Studies 119 | 120 | --- 121 | # Day 1 122 | # Peeling the Onion 123 | ## Time Series Analysis 124 | ![](img/onion.jpg) 125 | 126 | --- 127 | 128 | # Day 2 129 | # Grocery 130 | ## Market Basket Analysis / Collaborative Filtering 131 | 132 | --- 133 | 134 | # Day 2 135 | # Bank Marketing 136 | ## Random Forest and Gradient Boosting 137 | 138 | --- 139 | 140 | # Day 3 141 | # DataTau 142 | ## Text Analytics 143 | 144 | --- 145 | 146 | ![](img/clay.jpeg) 147 | # Learning Approach 148 | 149 | --- 150 | 151 | ![](img/single.jpeg) 152 | # Do the Exercises 153 | 154 | --- 155 | 156 | ![](img/pair.jpg) 157 | # Pair up & Learn 158 | 159 | --- 160 | 161 | ![](img/postit.jpg) 162 | # Call for Help 163 | 164 | --- 165 | 166 | ![](img/numbers.jpg) 167 | # Enjoy the workshop 168 | 169 | --- 170 | 171 | ## Workshop Material is available at the GitHub Repo 172 | ### [https://github.com/amitkaps/machine-learning](https://github.com/amitkaps/machine-learning) 173 | 174 | --- 175 | 176 | # Exercise 177 | 178 | --- 179 | 180 | # 1. Time Series Exercise 181 | 182 | ### "Predict the number of tickets that will be raised in the next week" 183 | 184 | - **Frame**: What to forecast? At what horizon? At what level? 
185 | - **Acquire, Refine, Explore**: Do EDA to understand the trend and pattern within the data 186 | - **Models**: Mean Model, Linear Trend, Random Walk, Simple Moving Average, Exp Smoothing, Decomposition, ARIMA 187 | - **Insight**: Share the insight through a datavis of the models 188 | 189 | --- 190 | 191 | # 2. Text Analytics Exercise 192 | 193 | ### "Identify the entity, features & topics in the 'Comments' data or 'Twitter #machine learning' data" 194 | 195 | - **Frame**: What are the comments you are trying to understand? 196 | - **Acquire, Refine, Explore**: Do Wordcloud, Lemmatization, Part of Speech Analysis, and Entity Chunking 197 | - **Models**: TF-IDF, Topic Modelling, Sentiment Analysis 198 | - **Insight**: Share the insight through word cloud and topic visualisation 199 | 200 | --- 201 | 202 | # Feedback 203 | 204 | ### [https://amitkaps.typeform.com/to/i6wl2E](https://amitkaps.typeform.com/to/i6wl2E) 205 | 206 | 207 | --- 208 | 209 | # Recap 210 | 211 | --- 212 | 213 | ![](img/approach.jpg) 214 | 215 | --- 216 | 217 | ![](img/frame.jpg) 218 | # Frame 219 | - **Toy Problems** 220 | - **Simple Problems** 221 | - Complex Problems 222 | - Business Problems 223 | - Research Problems 224 | 225 | --- 226 | 227 | ![](img/acquire.jpg) 228 | # Acquire 229 | - **Scraping** (structured, unstructured) 230 | - **Files** (csv, xls, json, xml, pdf, ...) 231 | - Database (sqlite, ...) 232 | - APIs 233 | - Streaming 234 | 235 | --- 236 | 237 | ![](img/refine.jpg) 238 | # Refine 239 | - Data Cleaning (inconsistent, missing, ...) 240 | - **Data Refining** (derive, parse, merge, filter, convert, ...) 241 | - **Data Transformations** (group by, pivot, aggregate, sample, summarise, ...) 
242 | 243 | 244 | --- 245 | 246 | ![](img/explore.jpg) 247 | # Explore 248 | - **Simple Vis** 249 | - Multi Dimensional Vis 250 | - Geographic Vis 251 | - Large Data Vis (Bin - Summarise - Smooth) 252 | - Interactive Vis 253 | 254 | --- 255 | 256 | ![](img/model.jpg) 257 | # Model - Supervised Learning 258 | - *Continuous*: Regression - **Linear**, Polynomial, Tree Based Methods - CART, **Random Forest**, Gradient Boosting Machines 259 | - *Categorical*: Classification - **Logistic Regression**, Tree, KNN, SVM, Naive-Bayes, Bayesian Network 260 | 261 | --- 262 | 263 | ![](img/model.jpg) 264 | # Model - Unsupervised Learning 265 | - *Continuous*: Clustering (K-means) & Dimensionality Reduction (PCA, SVD, MDS) 266 | - *Categorical*: Association Analysis 267 | 268 | --- 269 | 270 | ![](img/model.jpg) 271 | # Model - Advanced 272 | - **Time Series** 273 | - **Text Analytics** 274 | - Network / Graph Analytics 275 | - Optimization 276 | 277 | --- 278 | ![](img/model.jpg) 279 | # Model - Specialized 280 | - Reinforcement Learning 281 | - Online Learning 282 | - Deep Learning 283 | - Other Applications: Image, Speech 284 | 285 | 286 | --- 287 | 288 | ![](img/insight.jpg) 289 | # Insight 290 | - Narrative Visualisation 291 | - Dashboard Visualisation 292 | - Decision Making Tools 293 | - Automated Decision Tools 294 | 295 | --- 296 | 297 | # PyData Stack 298 | - **Acquire / Refine**: `Pandas, Beautiful Soup, Selenium, Requests, SQLAlchemy, NumPy, Blaze` 299 | - **Explore**: `Matplotlib, Seaborn, Bokeh, Plotly, Vega, Folium` 300 | - **Model**: `scikit-learn, StatsModels, SciPy, Gensim, Keras, TensorFlow, PySpark` 301 | - **Insight**: `Django, Flask` 302 | 303 | 304 | --- 305 | 306 | # Skills 307 | ![fit](img/skills.png) 308 | 309 | --- 310 | 311 | ![fit](img/skills.png) 312 | 313 | --- 314 | 315 | # Books 316 | 317 | ![fit](img/book.png) 318 | ![fit](img/wesmckinney.jpg) 319 | ![fit](img/thinkstats.jpg) 320 | 321 | 322 | --- 323 | 324 | ![fit](img/book.png) 325 | 
![fit](img/wesmckinney.jpg) 326 | ![fit](img/thinkstats.jpg) 327 | 328 | --- 329 | 330 | ![left](img/ISLR.jpeg) 331 | ## Resources - Statistical Learning 332 | - A good book on statistical learning is ISLR: [An Introduction to Statistical Learning with Applications in R](http://www-bcf.usc.edu/~gareth/ISL/index.html) 333 | - You can find all the ISLR code in Python at this GitHub repo - [https://github.com/JWarmenhoven/ISLR-python](https://github.com/JWarmenhoven/ISLR-python) 334 | 335 | --- 336 | 337 | ## Resources - Time Series 338 | - [Forecasting: Principles and Practice](https://www.otexts.org/fpp) 339 | - [Statistical forecasting: Notes on regression and time series analysis](http://people.duke.edu/~rnau/411home.htm) 340 | 341 | ## Resources - Text Analytics 342 | - [Natural Language Processing with Python](http://www.nltk.org/book/) 343 | 344 | 345 | --- 346 | ![](img/stars.jpg) 347 | # Online Courses 348 | - Harvard Data Science Course - [CS 109 Course](http://cs109.github.io/2015/) (It is structured in a similar way to the approach we shared) 349 | - Data Science Specialisation - [JHU Data Science](https://www.coursera.org/specializations/jhu-data-science) (It is a good course, though the material is coded in R) 350 | 
351 | - Many more on Coursera & Udacity... 352 | 353 | 354 | --- 355 | ![](img/workshop.jpg) 356 | # We enjoyed the workshop! 357 | 358 | --- 359 | ![](img/speak.jpeg) 360 | # Speak to Us! 361 | 362 | --- 363 | 364 | ![](img/numbers.jpg) 365 | # Thank you 366 | ## @amitkaps | @bargava -------------------------------------------------------------------------------- /overview.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/overview.pdf -------------------------------------------------------------------------------- /python.txt: -------------------------------------------------------------------------------- 1 | abstract-rendering==0.5.1 2 | alabaster==0.7.7 3 | anaconda-client==1.2.2 4 | appnope==0.1.0 5 | appscript==1.0.1 6 | argcomplete==1.0.0 7 | astropy==1.1.1 8 | Babel==2.2.0 9 | beautifulsoup4==4.4.1 10 | bitarray==0.8.1 11 | blaze==0.9.0 12 | bokeh==0.11.0 13 | boto==2.39.0 14 | Bottleneck==1.0.0 15 | cffi==1.2.1 16 | clyent==1.2.0 17 | colorama==0.3.6 18 | conda==4.0.4 19 | conda-build==1.19.0 20 | conda-env==2.4.5 21 | configobj==5.0.6 22 | cryptography==1.0.2 23 | cycler==0.10.0 24 | Cython==0.23.4 25 | cytoolz==0.7.5 26 | datashape==0.5.0 27 | decorator==4.0.6 28 | docutils==0.12 29 | dynd===f641248 30 | et-xmlfile==1.0.1 31 | fastcache==1.0.2 32 | Flask==0.10.1 33 | futures==3.0.3 34 | greenlet==0.4.9 35 | h5py==2.5.0 36 | html5lib==0.999 37 | idna==2.0 38 | ipykernel==4.2.2 39 | ipython==4.0.3 40 | ipython-genutils==0.1.0 41 | ipywidgets==4.1.1 42 | itsdangerous==0.24 43 | jdcal==1.2 44 | jedi==0.9.0 45 | Jinja2==2.8 46 | jsonschema==2.4.0 47 | jupyter==1.0.0 48 | jupyter-client==4.1.1 49 | jupyter-console==4.1.0 50 | jupyter-core==4.0.6 51 | llvmlite==0.8.0 52 | lxml==3.5.0 53 | MarkupSafe==0.23 54 | matplotlib==1.5.1 55 | mistune==0.7.1 56 | multipledispatch==0.4.8 57 | nbconvert==4.1.0 58 | 
nbformat==4.0.1 59 | networkx==1.11 60 | nltk==3.2 61 | nose==1.3.7 62 | notebook==4.1.0 63 | numba==0.23.1 64 | numexpr==2.4.6 65 | numpy==1.10.4 66 | odo==0.4.0 67 | openpyxl==2.3.2 68 | pandas==0.17.1 69 | path.py==0.0.0 70 | patsy==0.4.0 71 | pep8==1.7.0 72 | pexpect==3.3 73 | pickleshare==0.5 74 | Pillow==3.1.0 75 | ply==3.8 76 | psutil==3.4.2 77 | ptyprocess==0.5 78 | py==1.4.31 79 | pyasn1==0.1.9 80 | pycosat==0.6.1 81 | pycparser==2.14 82 | pycrypto==2.6.1 83 | pycurl==7.19.5.3 84 | pyflakes==1.0.0 85 | Pygments==2.1 86 | pyOpenSSL==0.15.1 87 | pyparsing==2.0.3 88 | pytest==2.8.5 89 | python-dateutil==2.4.2 90 | pytz==2015.7 91 | PyYAML==3.11 92 | pyzmq==15.2.0 93 | qtconsole==4.1.1 94 | redis==2.10.3 95 | requests==2.9.1 96 | rope-py3k==0.9.4.post1 97 | scikit-image==0.11.3 98 | scikit-learn==0.17 99 | scipy==0.17.0 100 | seaborn==0.7.0 101 | simplegeneric==0.8.1 102 | six==1.10.0 103 | snowballstemmer==1.2.1 104 | sockjs-tornado==1.0.1 105 | Sphinx==1.3.5 106 | sphinx-rtd-theme==0.1.9 107 | spyder==2.3.8 108 | SQLAlchemy==1.0.12 109 | statsmodels==0.6.1 110 | sympy==0.7.6.1 111 | tables==3.2.2 112 | terminado==0.5 113 | toolz==0.7.4 114 | tornado==4.3 115 | traitlets==4.1.0 116 | unicodecsv==0.14.1 117 | Werkzeug==0.11.3 118 | xgboost==0.4a30 119 | xlrd==0.9.4 120 | XlsxWriter==0.8.4 121 | xlwings==0.6.4 122 | xlwt==1.0.0 123 | -------------------------------------------------------------------------------- /time_series/1-Frame.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 1. Frame the Problem\n", 8 | "\n", 9 | "In late 2010, onion prices shot through the roof, causing a grave crisis. The crisis was apparently caused by a lack of rainfall in the major onion-producing regions, Maharashtra and Karnataka, and led to large-scale hoarding by traders. 
The crisis caused political tension in the country and was described as \"a grave concern\" by the then Prime Minister Manmohan Singh.\n", 10 | "\n", 11 | "\n", 12 | "- BBC Article in Dec 2010 - [Stink over onion crisis is enough to make you cry](http://www.bbc.co.uk/blogs/thereporters/soutikbiswas/2010/12/indias_onion_crisis.html)\n", 13 | "- Hindu OpEd in Dec 2010 - [The political price of onions](http://www.thehindu.com/opinion/editorial/article977100.ece)\n", 14 | "\n", 15 | "![](img/peeling_the_onion_small.png)\n", 16 | "\n", 17 | "So what types of questions about onion prices would you like to ask?\n", 18 | "\n", 19 | "\n", 20 | "## Types of Questions\n", 21 | "\n", 22 | "> \"Doing data analysis requires quite a bit of thinking and we believe that when you’ve completed a good data analysis, you’ve spent more time thinking than doing.\" - Roger Peng\n", 23 | "\n", 24 | "1. **Descriptive** - \"seeks to summarize a characteristic of a set of data\"\n", 25 | "2. **Exploratory** - \"analyze the data to see if there are patterns, trends, or relationships between variables\" (hypothesis generating) \n", 26 | "3. **Inferential** - \"a restatement of this proposed hypothesis as a question and would be answered by analyzing a different set of data\" (hypothesis testing)\n", 27 | "4. **Predictive** - \"determine the impact on one factor based on other factor in a population - to make a prediction\"\n", 28 | "5. **Causal** - \"asks whether changing one factor will change another factor in a population - to establish a causal link\" \n", 29 | "6. 
**Mechanistic** - \"establish *how* the change in one factor results in change in another factor in a population - to determine the exact mechanism\"\n", 30 | "\n", 31 | "\n", 32 | "### Descriptive \n", 33 | "- Which states have the highest onion production and sales?\n", 34 | "- Which cities (mandis) have the highest sales?\n", 35 | "- What is the average price of onion across a year in Bangalore?\n", 36 | "- ...\n", 37 | "\n", 38 | "### Exploratory & Inferential \n", 39 | "- Is there a large difference between the high and low prices of onion in a day?\n", 40 | "- What is the trend of onion prices across days or months in Bangalore?\n", 41 | "- How is the price of onion correlated with the volume of onion?\n", 42 | "- How is the export volume of onion correlated with domestic production volume?\n", 43 | "- ...\n", 44 | "\n", 45 | "### Predictive \n", 46 | "- What is the price of onion likely to be the next day?\n", 47 | "- What is the price of onion likely to be next month?\n", 48 | "- What will be the sales quantity of onion tomorrow in Delhi?\n", 49 | "- ...\n", 50 | "\n", 51 | "### Causal\n", 52 | "- Does a change in onion production have an impact on onion prices? \n", 53 | "- Does a change in monsoon rainfall have an impact on onion prices?\n", 54 | "- ...\n", 55 | "\n", 56 | "### Mechanistic\n", 57 | "- How does a change in onion production impact the price of onion?\n", 58 | "- How do onion export volumes impact the prices of onion in local markets in India?\n", 59 | "- ...\n", 60 | "\n", 61 | "\n", 62 | "## Questions we will attempt\n", 63 | "\n", 64 | "### 1. Descriptive: How big is the Bangalore onion market compared to other cities in India?\n", 65 | "\n", 66 | "### 2. Exploratory / Inferential: Has the variation in onion prices in Bangalore really gone up over the years?\n", 67 | "\n", 68 | "### 3. Predictive: Can we predict the price of onion in Bangalore?" 
69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": { 75 | "collapsed": true 76 | }, 77 | "outputs": [], 78 | "source": [] 79 | } 80 | ], 81 | "metadata": { 82 | "kernelspec": { 83 | "display_name": "Python 3", 84 | "language": "python", 85 | "name": "python3" 86 | }, 87 | "language_info": { 88 | "codemirror_mode": { 89 | "name": "ipython", 90 | "version": 3 91 | }, 92 | "file_extension": ".py", 93 | "mimetype": "text/x-python", 94 | "name": "python", 95 | "nbconvert_exporter": "python", 96 | "pygments_lexer": "ipython3", 97 | "version": "3.5.1" 98 | } 99 | }, 100 | "nbformat": 4, 101 | "nbformat_minor": 0 102 | } 103 | -------------------------------------------------------------------------------- /time_series/2-Acquire.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 2. Acquire the Data\n", 8 | "\n", 9 | "\n", 10 | "## Finding Data Sources\n", 11 | "\n", 12 | "There are three places to get onion price and quantity information by market. \n", 13 | "\n", 14 | "1. **[Agmarknet](http://agmarknet.nic.in/)** - This website is run by the Directorate of Marketing & Inspection (DMI), Ministry of Agriculture, Government of India, and provides daily price and arrival data for all agricultural commodities at national and state level. Unfortunately, the link to get the Market-wise Daily Report for a Specific Commodity (Onion for us) leads to a multipage aspx entry form to get data for each date. So it is likely to require an involved scraper to get the data. Too much effort - move on. Here is the best link to go to to get what is available - http://agmarknet.nic.in/agnew/NationalBEnglish/SpecificCommodityWeeklyReport.aspx?ss=1\n", 15 | "\n", 16 | "\n", 17 | "2. **[Data.gov.in](https://data.gov.in/)** - This is normally a good place to get government data in a machine readable form like csv or xml. 
The Variety-wise Daily Market Prices Data of Onion is available for each year as XML, but unfortunately it does not include the quantity information we need. It would be good to have both price and quantity - so even though this is easy, let's see if we can get both from a different source. Here is the best link to go to to get what is available - https://data.gov.in/catalog/variety-wise-daily-market-prices-data-onion#web_catalog_tabs_block_10\n", 18 | "\n", 19 | "\n", 20 | "3. **[NHRDF](http://nhrdf.org/en-us/)** - This is the website of the National Horticultural Research & Development Foundation, which maintains a database on Market Arrivals and Price, Area and Production, and Export Data for three commodities - Garlic, Onion and Potatoes. We are in luck! It has data from 1996 onwards and has only one form to fill to get the data in tabular form. It also has production and export data. Excellent. Let's use this. Here is the best link to go to to get all that is available - http://nhrdf.org/en-us/DatabaseReports\n", 21 | "\n", 22 | "\n", 23 | "## Scraping the Data\n", 24 | "\n", 25 | "\n", 26 | "### Ways to Scrape Data\n", 27 | "We can do this at two different levels of sophistication:\n", 28 | "\n", 29 | "1. **Automate the form filling process**: The form on this page looks simple. But viewing the source in the browser shows that the form has hidden fields, so we will need to access it as a browser would to get the session fields and then submit the form. This is a little more complicated than simply scraping a table from a webpage.\n", 30 | "\n", 31 | "2. **Manually fill the form**: What if we manually fill the form with the desired form fields and then save the page as an HTML file? Then we can read this file and just scrape the table from it. Let's go with the simple way for now.\n", 32 | "\n", 33 | "\n", 34 | "### Scraping - Manual Form Filling\n", 35 | "\n", 36 | "So let us fill the form to get a small subset of data and test our scraping process. 
We will start by getting the [Monthwise Market Arrivals](http://nhrdf.org/en-us/MonthWiseMarketArrivals). \n", 37 | "\n", 38 | "- Crop Name: Onion\n", 39 | "- Month: January\n", 40 | "- Market: All\n", 41 | "- Year: 2016\n", 42 | "\n", 43 | "The saved webpage is available at [MonthWiseMarketArrivalsJan2016.html](MonthWiseMarketArrivalsJan2016.html)\n", 44 | "\n", 45 | "### Understand the HTML Structure\n", 46 | "\n", 47 | "We need to scrape data from this HTML page, so let us try to understand the structure of the page.\n", 48 | "\n", 49 | "1. You can view the source of the page - typically Right Click and View Source in any browser - which gives you the source HTML for any page.\n", 50 | "\n", 51 | "2. You can open the developer tools in your browser and investigate the structure as you mouse over the page \n", 52 | "\n", 53 | "3. You can use a tool like [Selector Gadget](http://selectorgadget.com/) to understand the ids and classes used in the web page\n", 54 | "\n", 55 | "Our data is under the **<table>** tag " 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "### Exercise #1" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "Find the number of tables in the HTML structure of [MonthWiseMarketArrivalsJan2016.html](MonthWiseMarketArrivalsJan2016.html)." 
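One way to inspect the structure without a browser is to count the `<table>` tags and list their `id` attributes using only the Python standard library. This is a minimal sketch, not the notebook's own method: the `TableCounter` class and the `sample_html` string are made up for illustration, and the real saved page contains more tables with different ids.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags and record each table's id attribute (or None)."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag == "table":
            self.count += 1
            self.ids.append(dict(attrs).get("id"))


# Hypothetical stand-in for the saved page, kept tiny on purpose
sample_html = """
<html><body>
  <table id="layout"><tr><td>Menu</td></tr></table>
  <table id="data">
    <tr><th>Market</th><th>Year</th></tr>
    <tr><td>AGRA(UP)</td><td>2016</td></tr>
  </table>
</body></html>
"""

counter = TableCounter()
counter.feed(sample_html)
print(counter.count, counter.ids)
```

To try it on the saved page, read `MonthWiseMarketArrivalsJan2016.html` into a string and pass that to `counter.feed(...)` instead of `sample_html`. The count may differ from what pandas reports, since pandas can skip tables it cannot parse into rows.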
70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": { 76 | "collapsed": true 77 | }, 78 | "outputs": [], 79 | "source": [] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "### Find all the Tables " 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 1, 91 | "metadata": { 92 | "collapsed": false 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "# Import the library we need, which is Pandas\n", 97 | "import pandas as pd" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 2, 103 | "metadata": { 104 | "collapsed": false 105 | }, 106 | "outputs": [], 107 | "source": [ 108 | "# Read all the tables from the html document \n", 109 | "AllTables = pd.read_html('MonthWiseMarketArrivalsJan2016.html')" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 3, 115 | "metadata": { 116 | "collapsed": false, 117 | "scrolled": true 118 | }, 119 | "outputs": [ 120 | { 121 | "data": { 122 | "text/plain": [ 123 | "5" 124 | ] 125 | }, 126 | "execution_count": 3, 127 | "metadata": {}, 128 | "output_type": "execute_result" 129 | } 130 | ], 131 | "source": [ 132 | "# Let us find out how many tables it has found\n", 133 | "len(AllTables)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 4, 139 | "metadata": { 140 | "collapsed": false 141 | }, 142 | "outputs": [ 143 | { 144 | "data": { 145 | "text/plain": [ 146 | "list" 147 | ] 148 | }, 149 | "execution_count": 4, 150 | "metadata": {}, 151 | "output_type": "execute_result" 152 | } 153 | ], 154 | "source": [ 155 | "type(AllTables)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "### Exercise #2\n", 163 | "Find the exact table of data we want in the list `AllTables`." 
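Rather than trying each position by hand, one possible approach is to look for the table whose first row contains a known column label such as `Market`. The sketch below shows the idea with plain lists of rows standing in for the DataFrames in `AllTables`; the `find_table` helper and the sample `tables` data are hypothetical and only illustrate the search.

```python
def find_table(tables, header_cell):
    """Return the index of the first table whose first row contains header_cell.

    Each table is a list of rows and each row is a list of cell strings.
    Returns None when no table matches.
    """
    for index, rows in enumerate(tables):
        if rows and header_cell in rows[0]:
            return index
    return None


# Hypothetical stand-in for the list of tables found in the page
tables = [
    [["Home", "About"]],
    [["Market", "Month Name", "Year"], ["AGRA(UP)", "January", "2016"]],
]

match = find_table(tables, "Market")
print(match)
```

With real DataFrames the same check would be done on each element of `AllTables`, for example by inspecting its first row or its shape.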
164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 5, 169 | "metadata": { 170 | "collapsed": false 171 | }, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/html": [ 176 | "
\n", 802 | "
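| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Market | Month Name | Year | Arrival (q) | Price Minimum (Rs/q) | Price Maximum (Rs/q) | Modal Price (Rs/q) |
| 1 | AGRA(UP) | January | 2016 | 134200 | 1039 | 1443 | 1349 |
| 2 | AHMEDABAD(GUJ) | January | 2016 | 198390 | 646 | 1224 | 997 |
| 3 | AHMEDNAGAR(MS) | January | 2016 | 208751 | 175 | 1722 | 1138 |
| 4 | AJMER(RAJ) | January | 2016 | 4247 | 722 | 1067 | 939 |
| 5 | ALIGARH(UP) | January | 2016 | 12350 | 1219 | 1298 | 1257 |
| 6 | ALWAR(RAJ) | January | 2016 | 9788 | 625 | 1200 | 912 |
| 7 | AMRITSAR(PB) | January | 2016 | 24800 | 913 | 1308 | 1160 |
| 8 | BALLIA(UP) | January | 2016 | 600 | 1400 | 1500 | 1460 |
| 9 | BANGALORE | January | 2016 | 507223 | 200 | 1943 | 1448 |
| 10 | BAREILLY(UP) | January | 2016 | 18435 | 1149 | 1149 | 1149 |
| 11 | BELGAUM(KNT) | January | 2016 | 61564 | 485 | 1826 | 1164 |
| 12 | BHATINDA(PB) | January | 2016 | 5510 | 1218 | 1764 | 1459 |
| 13 | BHAVNAGAR(GUJ) | January | 2016 | 588870 | 767 | 1174 | 969 |
| 14 | BHUBNESWER(OR) | January | 2016 | 34050 | 1551 | 1658 | 1619 |
| 15 | BIJAPUR(KNT) | January | 2016 | 4110 | 413 | 1713 | 1088 |
| 16 | BURDWAN(WB) | January | 2016 | 3880 | 1689 | 1789 | 1739 |
| 17 | CHAKAN(MS) | January | 2016 | 41125 | 940 | 1680 | 1350 |
| 18 | CHANDIGARH | January | 2016 | 4310 | 800 | 1200 | 1000 |
| 19 | CHANDVAD(MS) | January | 2016 | 108619 | 461 | 1361 | 1094 |
| 20 | CHENNAI | January | 2016 | 116700 | 1927 | 2245 | 2086 |
| 21 | DEESA(GUJ) | January | 2016 | 784 | 900 | 1575 | 1350 |
| 22 | DEHRADOON(UTT) | January | 2016 | 10068 | 808 | 1048 | 935 |
| 23 | DELHI | January | 2016 | 249231 | 467 | 1573 | 1154 |
| 24 | DEVALA(MS) | January | 2016 | 127136 | 828 | 1321 | 1075 |
| 25 | DHAVANGERE(KNT) | January | 2016 | 13170 | 435 | 1545 | 990 |
| 26 | DHULIA(MS) | January | 2016 | 98405 | 225 | 1268 | 976 |
| 27 | GONDAL(GUJ) | January | 2016 | 153695 | 292 | 1061 | 760 |
| 28 | GUWAHATI | January | 2016 | 3270 | 1641 | 1784 | 1713 |
| 29 | HASSAN(KNT) | January | 2016 | 12211 | 782 | 1480 | 1255 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 55 | MUMBAI | January | 2016 | 413681 | 654 | 1473 | 1215 |
| 56 | NAGPUR | January | 2016 | 98645 | 954 | 1367 | 1260 |
| 57 | NEWASA(MS) | January | 2016 | 296461 | 250 | 1758 | 1458 |
| 58 | NIPHAD(MS) | January | 2016 | 33110 | 505 | 1108 | 958 |
| 59 | PALAYAM(KER) | January | 2016 | 8100 | 1835 | 2106 | 1994 |
| 60 | PATIALA(PB) | January | 2016 | 1445 | 1000 | 1575 | 1300 |
| 61 | PATNA | January | 2016 | 29200 | 1440 | 1548 | 1497 |
| 62 | PHALTAN (MS) | January | 2016 | 2662 | 433 | 1684 | 1117 |
| 63 | PIMPALGAON(MS) | January | 2016 | 506740 | 403 | 1448 | 1004 |
| 64 | PUNE(MS) | January | 2016 | 325669 | 613 | 1729 | 1488 |
| 65 | PURULIA(WB) | January | 2016 | 1200 | 1640 | 1900 | 1800 |
| 66 | RAHATA(MS) | January | 2016 | 149286 | 427 | 1986 | 1223 |
| 67 | RAHURI(MS) | January | 2016 | 56957 | 200 | 1675 | 1038 |
| 68 | RAICHUR(KNT) | January | 2016 | 17795 | 987 | 1590 | 1331 |
| 69 | RAIPUR(CHGARH) | January | 2016 | 4300 | 1193 | 1371 | 1299 |
| 70 | RAJKOT(GUJ) | January | 2016 | 80400 | 389 | 980 | 783 |
| 71 | SAGAR(MP) | January | 2016 | 1954 | 792 | 1000 | 892 |
| 72 | SAIKHEDA(MS) | January | 2016 | 106742 | 375 | 1260 | 898 |
| 73 | SANGALI(MS) | January | 2016 | 16104 | 413 | 2125 | 1231 |
| 74 | SANGAMNER(MS) | January | 2016 | 117471 | 500 | 1853 | 1312 |
| 75 | SATANA(MS) | January | 2016 | 69286 | 429 | 1354 | 957 |
| 76 | SHRIRAMPUR(MS) | January | 2016 | 23497 | 271 | 1786 | 1350 |
| 77 | SINNAR(MS) | January | 2016 | 73933 | 160 | 1363 | 988 |
| 78 | SOLAPUR(MS) | January | 2016 | 403797 | 124 | 2285 | 958 |
| 79 | SURAT(GUJ) | January | 2016 | 31700 | 943 | 1562 | 1252 |
| 80 | UDAIPUR(RAJ) | January | 2016 | 6456 | 386 | 1307 | 846 |
| 81 | VANI(MS) | January | 2016 | 60983 | 767 | 1323 | 1007 |
| 82 | VARANASI(UP) | January | 2016 | 28900 | 1460 | 1503 | 1484 |
| 83 | YEOLA(MS) | January | 2016 | 437432 | 437 | 1272 | 1034 |
| 84 | NaN | NaN | Total | 9307923 | 751(Avg) | 1490(Avg) | 1186(Avg) |

85 rows × 7 columns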
" 805 | ], 806 | "text/plain": [ 807 | " 0 1 2 3 4 \\\n", 808 | "0 Market Month Name Year Arrival (q) Price Minimum (Rs/q) \n", 809 | "1 AGRA(UP) January 2016 134200 1039 \n", 810 | "2 AHMEDABAD(GUJ) January 2016 198390 646 \n", 811 | "3 AHMEDNAGAR(MS) January 2016 208751 175 \n", 812 | "4 AJMER(RAJ) January 2016 4247 722 \n", 813 | "5 ALIGARH(UP) January 2016 12350 1219 \n", 814 | "6 ALWAR(RAJ) January 2016 9788 625 \n", 815 | "7 AMRITSAR(PB) January 2016 24800 913 \n", 816 | "8 BALLIA(UP) January 2016 600 1400 \n", 817 | "9 BANGALORE January 2016 507223 200 \n", 818 | "10 BAREILLY(UP) January 2016 18435 1149 \n", 819 | "11 BELGAUM(KNT) January 2016 61564 485 \n", 820 | "12 BHATINDA(PB) January 2016 5510 1218 \n", 821 | "13 BHAVNAGAR(GUJ) January 2016 588870 767 \n", 822 | "14 BHUBNESWER(OR) January 2016 34050 1551 \n", 823 | "15 BIJAPUR(KNT) January 2016 4110 413 \n", 824 | "16 BURDWAN(WB) January 2016 3880 1689 \n", 825 | "17 CHAKAN(MS) January 2016 41125 940 \n", 826 | "18 CHANDIGARH January 2016 4310 800 \n", 827 | "19 CHANDVAD(MS) January 2016 108619 461 \n", 828 | "20 CHENNAI January 2016 116700 1927 \n", 829 | "21 DEESA(GUJ) January 2016 784 900 \n", 830 | "22 DEHRADOON(UTT) January 2016 10068 808 \n", 831 | "23 DELHI January 2016 249231 467 \n", 832 | "24 DEVALA(MS) January 2016 127136 828 \n", 833 | "25 DHAVANGERE(KNT) January 2016 13170 435 \n", 834 | "26 DHULIA(MS) January 2016 98405 225 \n", 835 | "27 GONDAL(GUJ) January 2016 153695 292 \n", 836 | "28 GUWAHATI January 2016 3270 1641 \n", 837 | "29 HASSAN(KNT) January 2016 12211 782 \n", 838 | ".. ... ... ... ... ... 
\n", 839 | "55 MUMBAI January 2016 413681 654 \n", 840 | "56 NAGPUR January 2016 98645 954 \n", 841 | "57 NEWASA(MS) January 2016 296461 250 \n", 842 | "58 NIPHAD(MS) January 2016 33110 505 \n", 843 | "59 PALAYAM(KER) January 2016 8100 1835 \n", 844 | "60 PATIALA(PB) January 2016 1445 1000 \n", 845 | "61 PATNA January 2016 29200 1440 \n", 846 | "62 PHALTAN (MS) January 2016 2662 433 \n", 847 | "63 PIMPALGAON(MS) January 2016 506740 403 \n", 848 | "64 PUNE(MS) January 2016 325669 613 \n", 849 | "65 PURULIA(WB) January 2016 1200 1640 \n", 850 | "66 RAHATA(MS) January 2016 149286 427 \n", 851 | "67 RAHURI(MS) January 2016 56957 200 \n", 852 | "68 RAICHUR(KNT) January 2016 17795 987 \n", 853 | "69 RAIPUR(CHGARH) January 2016 4300 1193 \n", 854 | "70 RAJKOT(GUJ) January 2016 80400 389 \n", 855 | "71 SAGAR(MP) January 2016 1954 792 \n", 856 | "72 SAIKHEDA(MS) January 2016 106742 375 \n", 857 | "73 SANGALI(MS) January 2016 16104 413 \n", 858 | "74 SANGAMNER(MS) January 2016 117471 500 \n", 859 | "75 SATANA(MS) January 2016 69286 429 \n", 860 | "76 SHRIRAMPUR(MS) January 2016 23497 271 \n", 861 | "77 SINNAR(MS) January 2016 73933 160 \n", 862 | "78 SOLAPUR(MS) January 2016 403797 124 \n", 863 | "79 SURAT(GUJ) January 2016 31700 943 \n", 864 | "80 UDAIPUR(RAJ) January 2016 6456 386 \n", 865 | "81 VANI(MS) January 2016 60983 767 \n", 866 | "82 VARANASI(UP) January 2016 28900 1460 \n", 867 | "83 YEOLA(MS) January 2016 437432 437 \n", 868 | "84 NaN NaN Total 9307923 751(Avg) \n", 869 | "\n", 870 | " 5 6 \n", 871 | "0 Price Maximum (Rs/q) Modal Price (Rs/q) \n", 872 | "1 1443 1349 \n", 873 | "2 1224 997 \n", 874 | "3 1722 1138 \n", 875 | "4 1067 939 \n", 876 | "5 1298 1257 \n", 877 | "6 1200 912 \n", 878 | "7 1308 1160 \n", 879 | "8 1500 1460 \n", 880 | "9 1943 1448 \n", 881 | "10 1149 1149 \n", 882 | "11 1826 1164 \n", 883 | "12 1764 1459 \n", 884 | "13 1174 969 \n", 885 | "14 1658 1619 \n", 886 | "15 1713 1088 \n", 887 | "16 1789 1739 \n", 888 | "17 1680 1350 \n", 889 | "18 
1200 1000 \n", 890 | "19 1361 1094 \n", 891 | "20 2245 2086 \n", 892 | "21 1575 1350 \n", 893 | "22 1048 935 \n", 894 | "23 1573 1154 \n", 895 | "24 1321 1075 \n", 896 | "25 1545 990 \n", 897 | "26 1268 976 \n", 898 | "27 1061 760 \n", 899 | "28 1784 1713 \n", 900 | "29 1480 1255 \n", 901 | ".. ... ... \n", 902 | "55 1473 1215 \n", 903 | "56 1367 1260 \n", 904 | "57 1758 1458 \n", 905 | "58 1108 958 \n", 906 | "59 2106 1994 \n", 907 | "60 1575 1300 \n", 908 | "61 1548 1497 \n", 909 | "62 1684 1117 \n", 910 | "63 1448 1004 \n", 911 | "64 1729 1488 \n", 912 | "65 1900 1800 \n", 913 | "66 1986 1223 \n", 914 | "67 1675 1038 \n", 915 | "68 1590 1331 \n", 916 | "69 1371 1299 \n", 917 | "70 980 783 \n", 918 | "71 1000 892 \n", 919 | "72 1260 898 \n", 920 | "73 2125 1231 \n", 921 | "74 1853 1312 \n", 922 | "75 1354 957 \n", 923 | "76 1786 1350 \n", 924 | "77 1363 988 \n", 925 | "78 2285 958 \n", 926 | "79 1562 1252 \n", 927 | "80 1307 846 \n", 928 | "81 1323 1007 \n", 929 | "82 1503 1484 \n", 930 | "83 1272 1034 \n", 931 | "84 1490(Avg) 1186(Avg) \n", 932 | "\n", 933 | "[85 rows x 7 columns]" 934 | ] 935 | }, 936 | "execution_count": 5, 937 | "metadata": {}, 938 | "output_type": "execute_result" 939 | } 940 | ], 941 | "source": [ 942 | "AllTables[4]" 943 | ] 944 | }, 945 | { 946 | "cell_type": "markdown", 947 | "metadata": {}, 948 | "source": [ 949 | "### Get the exact table\n", 950 | "To read the exact table we need to pass in an identifier value which would identify the table. We can use the `attrs` parameter in read_html to do so. 
The attribute we will pass is the table's `id`." 951 | ] 952 | }, 953 | { 954 | "cell_type": "code", 955 | "execution_count": 6, 956 | "metadata": { 957 | "collapsed": false 958 | }, 959 | "outputs": [], 960 | "source": [ 961 | "# Read only the table with the given id\n", 962 | "OneTable = pd.read_html('MonthWiseMarketArrivalsJan2016.html', \n", 963 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})" 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": 7, 969 | "metadata": { 970 | "collapsed": false 971 | }, 972 | "outputs": [ 973 | { 974 | "data": { 975 | "text/plain": [ 976 | "1" 977 | ] 978 | }, 979 | "execution_count": 7, 980 | "metadata": {}, 981 | "output_type": "execute_result" 982 | } 983 | ], 984 | "source": [ 985 | "# So how many tables have we got now\n", 986 | "len(OneTable)" 987 | ] 988 | }, 989 | { 990 | "cell_type": "code", 991 | "execution_count": 8, 992 | "metadata": { 993 | "collapsed": false 994 | }, 995 | "outputs": [ 996 | { 997 | "data": { 998 | "text/html": [ 999 | "
\n", 1065 | "
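| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Market | Month Name | Year | Arrival (q) | Price Minimum (Rs/q) | Price Maximum (Rs/q) | Modal Price (Rs/q) |
| 1 | AGRA(UP) | January | 2016 | 134200 | 1039 | 1443 | 1349 |
| 2 | AHMEDABAD(GUJ) | January | 2016 | 198390 | 646 | 1224 | 997 |
| 3 | AHMEDNAGAR(MS) | January | 2016 | 208751 | 175 | 1722 | 1138 |
| 4 | AJMER(RAJ) | January | 2016 | 4247 | 722 | 1067 | 939 |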
\n", 1066 | "
" 1067 | ], 1068 | "text/plain": [ 1069 | " 0 1 2 3 4 \\\n", 1070 | "0 Market Month Name Year Arrival (q) Price Minimum (Rs/q) \n", 1071 | "1 AGRA(UP) January 2016 134200 1039 \n", 1072 | "2 AHMEDABAD(GUJ) January 2016 198390 646 \n", 1073 | "3 AHMEDNAGAR(MS) January 2016 208751 175 \n", 1074 | "4 AJMER(RAJ) January 2016 4247 722 \n", 1075 | "\n", 1076 | " 5 6 \n", 1077 | "0 Price Maximum (Rs/q) Modal Price (Rs/q) \n", 1078 | "1 1443 1349 \n", 1079 | "2 1224 997 \n", 1080 | "3 1722 1138 \n", 1081 | "4 1067 939 " 1082 | ] 1083 | }, 1084 | "execution_count": 8, 1085 | "metadata": {}, 1086 | "output_type": "execute_result" 1087 | } 1088 | ], 1089 | "source": [ 1090 | "# Show the table of data identifed by pandas with just the first five rows\n", 1091 | "OneTable[0].head()" 1092 | ] 1093 | }, 1094 | { 1095 | "cell_type": "markdown", 1096 | "metadata": {}, 1097 | "source": [ 1098 | "However, we have not got the header correctly in our dataframe. Let us see if we can fix this.\n", 1099 | "\n", 1100 | "To get help on any function just use `??` before the function to help. Run this function and see what additional parameter you need to define to get the header correctly" 1101 | ] 1102 | }, 1103 | { 1104 | "cell_type": "code", 1105 | "execution_count": null, 1106 | "metadata": { 1107 | "collapsed": true 1108 | }, 1109 | "outputs": [], 1110 | "source": [ 1111 | "??pd.read_html" 1112 | ] 1113 | }, 1114 | { 1115 | "cell_type": "markdown", 1116 | "metadata": {}, 1117 | "source": [ 1118 | "### Exercise #3\n", 1119 | "Read the html file again and ensure that the correct header is identifed by pandas?" 
1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "code", 1124 | "execution_count": 11, 1125 | "metadata": { 1126 | "collapsed": false 1127 | }, 1128 | "outputs": [], 1129 | "source": [ 1130 | "OneTable = pd.read_html('MonthWiseMarketArrivalsJan2016.html', header = 0,\n", 1131 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})" 1132 | ] 1133 | }, 1134 | { 1135 | "cell_type": "markdown", 1136 | "metadata": {}, 1137 | "source": [ 1138 | "Show the top five rows of the dataframe you have read to ensure the headers are now correct." 1139 | ] 1140 | }, 1141 | { 1142 | "cell_type": "code", 1143 | "execution_count": 10, 1144 | "metadata": { 1145 | "collapsed": false, 1146 | "scrolled": true 1147 | }, 1148 | "outputs": [ 1149 | { 1150 | "data": { 1151 | "text/html": [ 1152 | "
" 1220 | ], 1221 | "text/plain": [ 1222 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n", 1223 | "0 AGRA(UP) January 2016 134200 1039 \n", 1224 | "1 AHMEDABAD(GUJ) January 2016 198390 646 \n", 1225 | "2 AHMEDNAGAR(MS) January 2016 208751 175 \n", 1226 | "3 AJMER(RAJ) January 2016 4247 722 \n", 1227 | "4 ALIGARH(UP) January 2016 12350 1219 \n", 1228 | "\n", 1229 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n", 1230 | "0 1443 1349 \n", 1231 | "1 1224 997 \n", 1232 | "2 1722 1138 \n", 1233 | "3 1067 939 \n", 1234 | "4 1298 1257 " 1235 | ] 1236 | }, 1237 | "execution_count": 10, 1238 | "metadata": {}, 1239 | "output_type": "execute_result" 1240 | } 1241 | ], 1242 | "source": [ 1243 | "OneTable[0].head()" 1244 | ] 1245 | }, 1246 | { 1247 | "cell_type": "markdown", 1248 | "metadata": { 1249 | "collapsed": true 1250 | }, 1251 | "source": [ 1252 | "### Dataframe Viewing " 1253 | ] 1254 | }, 1255 | { 1256 | "cell_type": "code", 1257 | "execution_count": 12, 1258 | "metadata": { 1259 | "collapsed": true 1260 | }, 1261 | "outputs": [], 1262 | "source": [ 1263 | "# Let us store the dataframe in a df variable. 
This is a very common convention when working with pandas\n", 1264 | "df = OneTable[0]" 1265 | ] 1266 | }, 1267 | { 1268 | "cell_type": "code", 1269 | "execution_count": 13, 1270 | "metadata": { 1271 | "collapsed": false 1272 | }, 1273 | "outputs": [ 1274 | { 1275 | "data": { 1276 | "text/plain": [ 1277 | "(84, 7)" 1278 | ] 1279 | }, 1280 | "execution_count": 13, 1281 | "metadata": {}, 1282 | "output_type": "execute_result" 1283 | } 1284 | ], 1285 | "source": [ 1286 | "# Shape of the dataset - number of rows & number of columns in the dataframe\n", 1287 | "df.shape" 1288 | ] 1289 | }, 1290 | { 1291 | "cell_type": "code", 1292 | "execution_count": 14, 1293 | "metadata": { 1294 | "collapsed": false 1295 | }, 1296 | "outputs": [ 1297 | { 1298 | "data": { 1299 | "text/plain": [ 1300 | "Index(['Market', 'Month Name', 'Year', 'Arrival (q)', 'Price Minimum (Rs/q)',\n", 1301 | " 'Price Maximum (Rs/q)', 'Modal Price (Rs/q)'],\n", 1302 | " dtype='object')" 1303 | ] 1304 | }, 1305 | "execution_count": 14, 1306 | "metadata": {}, 1307 | "output_type": "execute_result" 1308 | } 1309 | ], 1310 | "source": [ 1311 | "# Get the names of all the columns \n", 1312 | "df.columns" 1313 | ] 1314 | }, 1315 | { 1316 | "cell_type": "code", 1317 | "execution_count": 15, 1318 | "metadata": { 1319 | "collapsed": false 1320 | }, 1321 | "outputs": [ 1322 | { 1323 | "data": { 1324 | "text/html": [ 1325 | "
" 1393 | ], 1394 | "text/plain": [ 1395 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n", 1396 | "0 AGRA(UP) January 2016 134200 1039 \n", 1397 | "1 AHMEDABAD(GUJ) January 2016 198390 646 \n", 1398 | "2 AHMEDNAGAR(MS) January 2016 208751 175 \n", 1399 | "3 AJMER(RAJ) January 2016 4247 722 \n", 1400 | "4 ALIGARH(UP) January 2016 12350 1219 \n", 1401 | "\n", 1402 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n", 1403 | "0 1443 1349 \n", 1404 | "1 1224 997 \n", 1405 | "2 1722 1138 \n", 1406 | "3 1067 939 \n", 1407 | "4 1298 1257 " 1408 | ] 1409 | }, 1410 | "execution_count": 15, 1411 | "metadata": {}, 1412 | "output_type": "execute_result" 1413 | } 1414 | ], 1415 | "source": [ 1416 | "# Can we see sample rows - the top 5 rows\n", 1417 | "df.head()" 1418 | ] 1419 | }, 1420 | { 1421 | "cell_type": "code", 1422 | "execution_count": 16, 1423 | "metadata": { 1424 | "collapsed": false 1425 | }, 1426 | "outputs": [ 1427 | { 1428 | "data": { 1429 | "text/html": [ 1430 | "
" 1498 | ], 1499 | "text/plain": [ 1500 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n", 1501 | "79 UDAIPUR(RAJ) January 2016 6456 386 \n", 1502 | "80 VANI(MS) January 2016 60983 767 \n", 1503 | "81 VARANASI(UP) January 2016 28900 1460 \n", 1504 | "82 YEOLA(MS) January 2016 437432 437 \n", 1505 | "83 NaN NaN Total 9307923 751(Avg) \n", 1506 | "\n", 1507 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n", 1508 | "79 1307 846 \n", 1509 | "80 1323 1007 \n", 1510 | "81 1503 1484 \n", 1511 | "82 1272 1034 \n", 1512 | "83 1490(Avg) 1186(Avg) " 1513 | ] 1514 | }, 1515 | "execution_count": 16, 1516 | "metadata": {}, 1517 | "output_type": "execute_result" 1518 | } 1519 | ], 1520 | "source": [ 1521 | "# Can we see sample rows - the bottom 5 rows\n", 1522 | "df.tail()" 1523 | ] 1524 | }, 1525 | { 1526 | "cell_type": "code", 1527 | "execution_count": 17, 1528 | "metadata": { 1529 | "collapsed": false 1530 | }, 1531 | "outputs": [ 1532 | { 1533 | "data": { 1534 | "text/plain": [ 1535 | "0 AGRA(UP)\n", 1536 | "1 AHMEDABAD(GUJ)\n", 1537 | "2 AHMEDNAGAR(MS)\n", 1538 | "3 AJMER(RAJ)\n", 1539 | "4 ALIGARH(UP)\n", 1540 | "5 ALWAR(RAJ)\n", 1541 | "6 AMRITSAR(PB)\n", 1542 | "7 BALLIA(UP)\n", 1543 | "8 BANGALORE\n", 1544 | "9 BAREILLY(UP)\n", 1545 | "10 BELGAUM(KNT)\n", 1546 | "11 BHATINDA(PB)\n", 1547 | "12 BHAVNAGAR(GUJ)\n", 1548 | "13 BHUBNESWER(OR)\n", 1549 | "14 BIJAPUR(KNT)\n", 1550 | "15 BURDWAN(WB)\n", 1551 | "16 CHAKAN(MS)\n", 1552 | "17 CHANDIGARH\n", 1553 | "18 CHANDVAD(MS)\n", 1554 | "19 CHENNAI\n", 1555 | "20 DEESA(GUJ)\n", 1556 | "21 DEHRADOON(UTT)\n", 1557 | "22 DELHI\n", 1558 | "23 DEVALA(MS)\n", 1559 | "24 DHAVANGERE(KNT)\n", 1560 | "25 DHULIA(MS)\n", 1561 | "26 GONDAL(GUJ)\n", 1562 | "27 GUWAHATI\n", 1563 | "28 HASSAN(KNT)\n", 1564 | "29 HOSHIARPUR(PB)\n", 1565 | " ... 
\n", 1566 | "54 MUMBAI\n", 1567 | "55 NAGPUR\n", 1568 | "56 NEWASA(MS)\n", 1569 | "57 NIPHAD(MS)\n", 1570 | "58 PALAYAM(KER)\n", 1571 | "59 PATIALA(PB)\n", 1572 | "60 PATNA\n", 1573 | "61 PHALTAN (MS)\n", 1574 | "62 PIMPALGAON(MS)\n", 1575 | "63 PUNE(MS)\n", 1576 | "64 PURULIA(WB)\n", 1577 | "65 RAHATA(MS)\n", 1578 | "66 RAHURI(MS)\n", 1579 | "67 RAICHUR(KNT)\n", 1580 | "68 RAIPUR(CHGARH)\n", 1581 | "69 RAJKOT(GUJ)\n", 1582 | "70 SAGAR(MP)\n", 1583 | "71 SAIKHEDA(MS)\n", 1584 | "72 SANGALI(MS)\n", 1585 | "73 SANGAMNER(MS)\n", 1586 | "74 SATANA(MS)\n", 1587 | "75 SHRIRAMPUR(MS)\n", 1588 | "76 SINNAR(MS)\n", 1589 | "77 SOLAPUR(MS)\n", 1590 | "78 SURAT(GUJ)\n", 1591 | "79 UDAIPUR(RAJ)\n", 1592 | "80 VANI(MS)\n", 1593 | "81 VARANASI(UP)\n", 1594 | "82 YEOLA(MS)\n", 1595 | "83 NaN\n", 1596 | "Name: Market, dtype: object" 1597 | ] 1598 | }, 1599 | "execution_count": 17, 1600 | "metadata": {}, 1601 | "output_type": "execute_result" 1602 | } 1603 | ], 1604 | "source": [ 1605 | "# Can we access a specific column\n", 1606 | "df[\"Market\"]" 1607 | ] 1608 | }, 1609 | { 1610 | "cell_type": "code", 1611 | "execution_count": 18, 1612 | "metadata": { 1613 | "collapsed": false 1614 | }, 1615 | "outputs": [ 1616 | { 1617 | "data": { 1618 | "text/plain": [ 1619 | "0 AGRA(UP)\n", 1620 | "1 AHMEDABAD(GUJ)\n", 1621 | "2 AHMEDNAGAR(MS)\n", 1622 | "3 AJMER(RAJ)\n", 1623 | "4 ALIGARH(UP)\n", 1624 | "5 ALWAR(RAJ)\n", 1625 | "6 AMRITSAR(PB)\n", 1626 | "7 BALLIA(UP)\n", 1627 | "8 BANGALORE\n", 1628 | "9 BAREILLY(UP)\n", 1629 | "10 BELGAUM(KNT)\n", 1630 | "11 BHATINDA(PB)\n", 1631 | "12 BHAVNAGAR(GUJ)\n", 1632 | "13 BHUBNESWER(OR)\n", 1633 | "14 BIJAPUR(KNT)\n", 1634 | "15 BURDWAN(WB)\n", 1635 | "16 CHAKAN(MS)\n", 1636 | "17 CHANDIGARH\n", 1637 | "18 CHANDVAD(MS)\n", 1638 | "19 CHENNAI\n", 1639 | "20 DEESA(GUJ)\n", 1640 | "21 DEHRADOON(UTT)\n", 1641 | "22 DELHI\n", 1642 | "23 DEVALA(MS)\n", 1643 | "24 DHAVANGERE(KNT)\n", 1644 | "25 DHULIA(MS)\n", 1645 | "26 GONDAL(GUJ)\n", 1646 | "27 
GUWAHATI\n", 1647 | "28 HASSAN(KNT)\n", 1648 | "29 HOSHIARPUR(PB)\n", 1649 | " ... \n", 1650 | "54 MUMBAI\n", 1651 | "55 NAGPUR\n", 1652 | "56 NEWASA(MS)\n", 1653 | "57 NIPHAD(MS)\n", 1654 | "58 PALAYAM(KER)\n", 1655 | "59 PATIALA(PB)\n", 1656 | "60 PATNA\n", 1657 | "61 PHALTAN (MS)\n", 1658 | "62 PIMPALGAON(MS)\n", 1659 | "63 PUNE(MS)\n", 1660 | "64 PURULIA(WB)\n", 1661 | "65 RAHATA(MS)\n", 1662 | "66 RAHURI(MS)\n", 1663 | "67 RAICHUR(KNT)\n", 1664 | "68 RAIPUR(CHGARH)\n", 1665 | "69 RAJKOT(GUJ)\n", 1666 | "70 SAGAR(MP)\n", 1667 | "71 SAIKHEDA(MS)\n", 1668 | "72 SANGALI(MS)\n", 1669 | "73 SANGAMNER(MS)\n", 1670 | "74 SATANA(MS)\n", 1671 | "75 SHRIRAMPUR(MS)\n", 1672 | "76 SINNAR(MS)\n", 1673 | "77 SOLAPUR(MS)\n", 1674 | "78 SURAT(GUJ)\n", 1675 | "79 UDAIPUR(RAJ)\n", 1676 | "80 VANI(MS)\n", 1677 | "81 VARANASI(UP)\n", 1678 | "82 YEOLA(MS)\n", 1679 | "83 NaN\n", 1680 | "Name: Market, dtype: object" 1681 | ] 1682 | }, 1683 | "execution_count": 18, 1684 | "metadata": {}, 1685 | "output_type": "execute_result" 1686 | } 1687 | ], 1688 | "source": [ 1689 | "# Using the dot notation\n", 1690 | "df.Market" 1691 | ] 1692 | }, 1693 | { 1694 | "cell_type": "code", 1695 | "execution_count": 19, 1696 | "metadata": { 1697 | "collapsed": false 1698 | }, 1699 | "outputs": [ 1700 | { 1701 | "data": { 1702 | "text/plain": [ 1703 | "0 AGRA(UP)\n", 1704 | "1 AHMEDABAD(GUJ)\n", 1705 | "2 AHMEDNAGAR(MS)\n", 1706 | "3 AJMER(RAJ)\n", 1707 | "4 ALIGARH(UP)\n", 1708 | "Name: Market, dtype: object" 1709 | ] 1710 | }, 1711 | "execution_count": 19, 1712 | "metadata": {}, 1713 | "output_type": "execute_result" 1714 | } 1715 | ], 1716 | "source": [ 1717 | "# Selecting specific column and rows\n", 1718 | "df[0:5][\"Market\"]" 1719 | ] 1720 | }, 1721 | { 1722 | "cell_type": "code", 1723 | "execution_count": 20, 1724 | "metadata": { 1725 | "collapsed": false 1726 | }, 1727 | "outputs": [ 1728 | { 1729 | "data": { 1730 | "text/plain": [ 1731 | "0 AGRA(UP)\n", 1732 | "1 AHMEDABAD(GUJ)\n", 1733 | "2 
AHMEDNAGAR(MS)\n", 1734 | "3 AJMER(RAJ)\n", 1735 | "4 ALIGARH(UP)\n", 1736 | "Name: Market, dtype: object" 1737 | ] 1738 | }, 1739 | "execution_count": 20, 1740 | "metadata": {}, 1741 | "output_type": "execute_result" 1742 | } 1743 | ], 1744 | "source": [ 1745 | "# Works both ways\n", 1746 | "df[\"Market\"][0:5]" 1747 | ] 1748 | }, 1749 | { 1750 | "cell_type": "code", 1751 | "execution_count": 21, 1752 | "metadata": { 1753 | "collapsed": false 1754 | }, 1755 | "outputs": [ 1756 | { 1757 | "data": { 1758 | "text/plain": [ 1759 | "array(['AGRA(UP)', 'AHMEDABAD(GUJ)', 'AHMEDNAGAR(MS)', 'AJMER(RAJ)',\n", 1760 | " 'ALIGARH(UP)', 'ALWAR(RAJ)', 'AMRITSAR(PB)', 'BALLIA(UP)',\n", 1761 | " 'BANGALORE', 'BAREILLY(UP)', 'BELGAUM(KNT)', 'BHATINDA(PB)',\n", 1762 | " 'BHAVNAGAR(GUJ)', 'BHUBNESWER(OR)', 'BIJAPUR(KNT)', 'BURDWAN(WB)',\n", 1763 | " 'CHAKAN(MS)', 'CHANDIGARH', 'CHANDVAD(MS)', 'CHENNAI', 'DEESA(GUJ)',\n", 1764 | " 'DEHRADOON(UTT)', 'DELHI', 'DEVALA(MS)', 'DHAVANGERE(KNT)',\n", 1765 | " 'DHULIA(MS)', 'GONDAL(GUJ)', 'GUWAHATI', 'HASSAN(KNT)',\n", 1766 | " 'HOSHIARPUR(PB)', 'HUBLI(KNT)', 'HYDERABAD', 'INDORE(MP)', 'JAIPUR',\n", 1767 | " 'JALANDHAR(PB)', 'JALGAON(MS)', 'JAMMU', 'JAMNAGAR(GUJ)',\n", 1768 | " 'JODHPUR(RAJ)', 'KALVAN(MS)', 'KANPUR(UP)', 'KARNAL(HR)',\n", 1769 | " 'KHANNA(PB)', 'KOLHAPUR(MS)', 'KOLKATA', 'KOPERGAON(MS)',\n", 1770 | " 'KOTA(RAJ)', 'KURNOOL(AP)', 'LASALGAON(MS)', 'LONAND(MS)',\n", 1771 | " 'LUCKNOW', 'MAHUVA(GUJ)', 'MALEGAON(MS)', 'MANMAD(MS)', 'MUMBAI',\n", 1772 | " 'NAGPUR', 'NEWASA(MS)', 'NIPHAD(MS)', 'PALAYAM(KER)', 'PATIALA(PB)',\n", 1773 | " 'PATNA', 'PHALTAN (MS)', 'PIMPALGAON(MS)', 'PUNE(MS)',\n", 1774 | " 'PURULIA(WB)', 'RAHATA(MS)', 'RAHURI(MS)', 'RAICHUR(KNT)',\n", 1775 | " 'RAIPUR(CHGARH)', 'RAJKOT(GUJ)', 'SAGAR(MP)', 'SAIKHEDA(MS)',\n", 1776 | " 'SANGALI(MS)', 'SANGAMNER(MS)', 'SATANA(MS)', 'SHRIRAMPUR(MS)',\n", 1777 | " 'SINNAR(MS)', 'SOLAPUR(MS)', 'SURAT(GUJ)', 'UDAIPUR(RAJ)',\n", 1778 | " 'VANI(MS)', 'VARANASI(UP)', 'YEOLA(MS)', 
nan], dtype=object)" 1779 | ] 1780 | }, 1781 | "execution_count": 21, 1782 | "metadata": {}, 1783 | "output_type": "execute_result" 1784 | } 1785 | ], 1786 | "source": [ 1787 | "# Getting unique values of Market\n", 1788 | "pd.unique(df['Market'])" 1789 | ] 1790 | }, 1791 | { 1792 | "cell_type": "markdown", 1793 | "metadata": {}, 1794 | "source": [ 1795 | "## Downloading the Entire Month Wise Arrival Data" 1796 | ] 1797 | }, 1798 | { 1799 | "cell_type": "code", 1800 | "execution_count": null, 1801 | "metadata": { 1802 | "collapsed": false 1803 | }, 1804 | "outputs": [], 1805 | "source": [ 1806 | "AllTable = pd.read_html('MonthWiseMarketArrivals.html', header = 0,\n", 1807 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})" 1808 | ] 1809 | }, 1810 | { 1811 | "cell_type": "code", 1812 | "execution_count": null, 1813 | "metadata": { 1814 | "collapsed": false 1815 | }, 1816 | "outputs": [], 1817 | "source": [ 1818 | "AllTable[0].head()" 1819 | ] 1820 | }, 1821 | { 1822 | "cell_type": "code", 1823 | "execution_count": null, 1824 | "metadata": { 1825 | "collapsed": false 1826 | }, 1827 | "outputs": [], 1828 | "source": [ 1829 | "??pd.DataFrame.to_csv" 1830 | ] 1831 | }, 1832 | { 1833 | "cell_type": "code", 1834 | "execution_count": null, 1835 | "metadata": { 1836 | "collapsed": false 1837 | }, 1838 | "outputs": [], 1839 | "source": [ 1840 | "AllTable[0].columns" 1841 | ] 1842 | }, 1843 | { 1844 | "cell_type": "code", 1845 | "execution_count": null, 1846 | "metadata": { 1847 | "collapsed": true 1848 | }, 1849 | "outputs": [], 1850 | "source": [ 1851 | "# Change the column names to simpler ones\n", 1852 | "AllTable[0].columns = ['market', 'month', 'year', 'quantity', 'priceMin', 'priceMax', 'priceMod']" 1853 | ] 1854 | }, 1855 | { 1856 | "cell_type": "code", 1857 | "execution_count": null, 1858 | "metadata": { 1859 | "collapsed": false 1860 | }, 1861 | "outputs": [], 1862 | "source": [ 1863 | "AllTable[0].head()" 1864 | ] 1865 | }, 1866 | { 1867 | 
"cell_type": "code", 1868 | "execution_count": null, 1869 | "metadata": { 1870 | "collapsed": false 1871 | }, 1872 | "outputs": [], 1873 | "source": [ 1874 | "# Save the dataframe to a csv file\n", 1875 | "AllTable[0].to_csv('MonthWiseMarketArrivals.csv', index = False)" 1876 | ] 1877 | } 1878 | ], 1879 | "metadata": { 1880 | "kernelspec": { 1881 | "display_name": "Python 3", 1882 | "language": "python", 1883 | "name": "python3" 1884 | }, 1885 | "language_info": { 1886 | "codemirror_mode": { 1887 | "name": "ipython", 1888 | "version": 3 1889 | }, 1890 | "file_extension": ".py", 1891 | "mimetype": "text/x-python", 1892 | "name": "python", 1893 | "nbconvert_exporter": "python", 1894 | "pygments_lexer": "ipython3", 1895 | "version": "3.5.1" 1896 | } 1897 | }, 1898 | "nbformat": 4, 1899 | "nbformat_minor": 0 1900 | } 1901 | -------------------------------------------------------------------------------- /time_series/3-Refine.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 2. Refine the Data\n", 8 | " \n", 9 | "> \"Data is messy\"\n", 10 | "\n", 11 | "We will perform the following operations on our onion price data to refine it:\n", 12 | "- **Remove** e.g. remove redundant data from the data frame\n", 13 | "- **Derive** e.g. State and City from the market field\n", 14 | "- **Parse** e.g. extract date from year and month columns\n", 15 | "\n", 16 | "Other steps you may need when refining data:\n", 17 | "- **Missing** e.g. Check for missing or incomplete data\n", 18 | "- **Quality** e.g. Check for duplicates, accuracy, unusual data\n", 19 | "- **Convert** e.g. free text to coded value\n", 20 | "- **Calculate** e.g. percentages, proportion\n", 21 | "- **Merge** e.g. first and surname for full name\n", 22 | "- **Aggregate** e.g. rollup by year, cluster by area\n", 23 | "- **Filter** e.g. 
exclude based on location\n", 24 | "- **Sample** e.g. extract a representative sample\n", 25 | "- **Summary** e.g. show summary stats like mean" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": { 32 | "collapsed": true 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "# Import the two libraries we need: pandas and NumPy\n", 37 | "import pandas as pd\n", 38 | "import numpy as np" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": { 45 | "collapsed": false 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "# Read the csv file of Month Wise Market Arrival data that has been scraped.\n", 50 | "df = pd.read_csv('MonthWiseMarketArrivals.csv')" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 3, 56 | "metadata": { 57 | "collapsed": false 58 | }, 59 | "outputs": [ 60 | { 61 | "data": { 62 | "text/html": [ 63 | "
" 131 | ], 132 | "text/plain": [ 133 | " market month year quantity priceMin priceMax priceMod\n", 134 | "0 ABOHAR(PB) January 2005 2350 404 493 446\n", 135 | "1 ABOHAR(PB) January 2006 900 487 638 563\n", 136 | "2 ABOHAR(PB) January 2010 790 1283 1592 1460\n", 137 | "3 ABOHAR(PB) January 2011 245 3067 3750 3433\n", 138 | "4 ABOHAR(PB) January 2012 1035 523 686 605" 139 | ] 140 | }, 141 | "execution_count": 3, 142 | "metadata": {}, 143 | "output_type": "execute_result" 144 | } 145 | ], 146 | "source": [ 147 | "df.head()" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 4, 153 | "metadata": { 154 | "collapsed": false 155 | }, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/html": [ 160 | "
" 228 | ], 229 | "text/plain": [ 230 | " market month year quantity priceMin priceMax priceMod\n", 231 | "10223 YEOLA(MS) December 2012 207066 485 1327 1136\n", 232 | "10224 YEOLA(MS) December 2013 215883 472 1427 1177\n", 233 | "10225 YEOLA(MS) December 2014 201077 446 1654 1456\n", 234 | "10226 YEOLA(MS) December 2015 223315 609 1446 1126\n", 235 | "10227 NaN NaN Total 783438108 647(Avg) 1213(Avg) 984(Avg)" 236 | ] 237 | }, 238 | "execution_count": 4, 239 | "metadata": {}, 240 | "output_type": "execute_result" 241 | } 242 | ], 243 | "source": [ 244 | "df.tail()" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "## Remove the redundant data" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 5, 257 | "metadata": { 258 | "collapsed": false 259 | }, 260 | "outputs": [ 261 | { 262 | "data": { 263 | "text/plain": [ 264 | "market object\n", 265 | "month object\n", 266 | "year object\n", 267 | "quantity int64\n", 268 | "priceMin object\n", 269 | "priceMax object\n", 270 | "priceMod object\n", 271 | "dtype: object" 272 | ] 273 | }, 274 | "execution_count": 5, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "df.dtypes" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 6, 286 | "metadata": { 287 | "collapsed": false 288 | }, 289 | "outputs": [ 290 | { 291 | "data": { 292 | "text/html": [ 293 | "
" 321 | ], 322 | "text/plain": [ 323 | " market month year quantity priceMin priceMax priceMod\n", 324 | "10227 NaN NaN Total 783438108 647(Avg) 1213(Avg) 984(Avg)" 325 | ] 326 | }, 327 | "execution_count": 6, 328 | "metadata": {}, 329 | "output_type": "execute_result" 330 | } 331 | ], 332 | "source": [ 333 | "# Look at the last row of the dataframe - it holds totals and averages, not market data\n", 334 | "df.tail(1)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 7, 340 | "metadata": { 341 | "collapsed": false 342 | }, 343 | "outputs": [], 344 | "source": [ 345 | "# Delete the last row from the dataframe\n", 346 | "df.drop(df.tail(1).index, inplace = True)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 8, 352 | "metadata": { 353 | "collapsed": false 354 | }, 355 | "outputs": [ 356 | { 357 | "data": { 358 | "text/html": [ 359 | "
" 427 | ], 428 | "text/plain": [ 429 | " market month year quantity priceMin priceMax priceMod\n", 430 | "0 ABOHAR(PB) January 2005 2350 404 493 446\n", 431 | "1 ABOHAR(PB) January 2006 900 487 638 563\n", 432 | "2 ABOHAR(PB) January 2010 790 1283 1592 1460\n", 433 | "3 ABOHAR(PB) January 2011 245 3067 3750 3433\n", 434 | "4 ABOHAR(PB) January 2012 1035 523 686 605" 435 | ] 436 | }, 437 | "execution_count": 8, 438 | "metadata": {}, 439 | "output_type": "execute_result" 440 | } 441 | ], 442 | "source": [ 443 | "df.head()" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 56, 449 | "metadata": { 450 | "collapsed": false 451 | }, 452 | "outputs": [ 453 | { 454 | "data": { 455 | "text/html": [ 456 | "
" 524 | ], 525 | "text/plain": [ 526 | " market month year quantity priceMin priceMax priceMod\n", 527 | "10222 YEOLA(MS) December 2011 131326 282 612 526\n", 528 | "10223 YEOLA(MS) December 2012 207066 485 1327 1136\n", 529 | "10224 YEOLA(MS) December 2013 215883 472 1427 1177\n", 530 | "10225 YEOLA(MS) December 2014 201077 446 1654 1456\n", 531 | "10226 YEOLA(MS) December 2015 223315 609 1446 1126" 532 | ] 533 | }, 534 | "execution_count": 56, 535 | "metadata": {}, 536 | "output_type": "execute_result" 537 | } 538 | ], 539 | "source": [ 540 | "df.tail()" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 57, 546 | "metadata": { 547 | "collapsed": false 548 | }, 549 | "outputs": [ 550 | { 551 | "data": { 552 | "text/plain": [ 553 | "market object\n", 554 | "month object\n", 555 | "year object\n", 556 | "quantity int64\n", 557 | "priceMin object\n", 558 | "priceMax object\n", 559 | "priceMod object\n", 560 | "dtype: object" 561 | ] 562 | }, 563 | "execution_count": 57, 564 | "metadata": {}, 565 | "output_type": "execute_result" 566 | } 567 | ], 568 | "source": [ 569 | "df.dtypes" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": 58, 575 | "metadata": { 576 | "collapsed": false 577 | }, 578 | "outputs": [ 579 | { 580 | "data": { 581 | "text/html": [ 582 | "
" 626 | ], 627 | "text/plain": [ 628 | " priceMin priceMax priceMod\n", 629 | "0 404 493 446\n", 630 | "1 487 638 563\n", 631 | "2 1283 1592 1460\n", 632 | "3 3067 3750 3433\n", 633 | "4 523 686 605" 634 | ] 635 | }, 636 | "execution_count": 58, 637 | "metadata": {}, 638 | "output_type": "execute_result" 639 | } 640 | ], 641 | "source": [ 642 | "df.iloc[:,4:7].head()" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": 59, 648 | "metadata": { 649 | "collapsed": false 650 | }, 651 | "outputs": [], 652 | "source": [ 653 | "df[df.columns[2:7]] = df.iloc[:,2:7].astype(int)" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": 60, 659 | "metadata": { 660 | "collapsed": false 661 | }, 662 | "outputs": [ 663 | { 664 | "data": { 665 | "text/plain": [ 666 | "market object\n", 667 | "month object\n", 668 | "year int64\n", 669 | "quantity int64\n", 670 | "priceMin int64\n", 671 | "priceMax int64\n", 672 | "priceMod int64\n", 673 | "dtype: object" 674 | ] 675 | }, 676 | "execution_count": 60, 677 | "metadata": {}, 678 | "output_type": "execute_result" 679 | } 680 | ], 681 | "source": [ 682 | "df.dtypes" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": 61, 688 | "metadata": { 689 | "collapsed": false 690 | }, 691 | "outputs": [ 692 | { 693 | "data": { 694 | "text/html": [ 695 | "
\n", 696 | "\n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | "
       market month  year  quantity  priceMin  priceMax  priceMod
0  ABOHAR(PB)   Jan  2005      2350       404       493       446
1  ABOHAR(PB)   Jan  2006       900       487       638       563
2  ABOHAR(PB)   Jan  2010       790      1283      1592      1460
3  ABOHAR(PB)   Jan  2011       245      3067      3750      3433
4  ABOHAR(PB)   Jan  2012      1035       523       686       605
\n", 762 | "
" 763 | ], 764 | "text/plain": [ 765 | " market month year quantity priceMin priceMax priceMod\n", 766 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446\n", 767 | "1 ABOHAR(PB) Jan 2006 900 487 638 563\n", 768 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460\n", 769 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433\n", 770 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605" 771 | ] 772 | }, 773 | "execution_count": 61, 774 | "metadata": {}, 775 | "output_type": "execute_result" 776 | } 777 | ], 778 | "source": [ 779 | "df.head()" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": 62, 785 | "metadata": { 786 | "collapsed": false 787 | }, 788 | "outputs": [ 789 | { 790 | "data": { 791 | "text/html": [ 792 | "
\n", 793 | "\n", 794 | " \n", 795 | " \n", 796 | " \n", 797 | " \n", 798 | " \n", 799 | " \n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 804 | " \n", 805 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | "
               year        quantity      priceMin      priceMax      priceMod
count  10227.000000    10227.000000  10227.000000  10227.000000  10227.000000
mean    2009.022294    76604.880023    646.944363   1212.760731    984.284345
std        4.372841   124408.698759    673.121850    979.658874    818.471498
min     1996.000000       20.000000     16.000000    145.000000     80.000000
25%     2006.000000     8898.000000    209.000000    557.000000    448.000000
50%     2009.000000    27460.000000    440.000000    923.000000    747.000000
75%     2013.000000    88356.500000    828.000000   1527.000000   1248.000000
max     2016.000000  1639032.000000   6000.000000   8192.000000   6400.000000
\n", 871 | "
" 872 | ], 873 | "text/plain": [ 874 | " year quantity priceMin priceMax priceMod\n", 875 | "count 10227.000000 10227.000000 10227.000000 10227.000000 10227.000000\n", 876 | "mean 2009.022294 76604.880023 646.944363 1212.760731 984.284345\n", 877 | "std 4.372841 124408.698759 673.121850 979.658874 818.471498\n", 878 | "min 1996.000000 20.000000 16.000000 145.000000 80.000000\n", 879 | "25% 2006.000000 8898.000000 209.000000 557.000000 448.000000\n", 880 | "50% 2009.000000 27460.000000 440.000000 923.000000 747.000000\n", 881 | "75% 2013.000000 88356.500000 828.000000 1527.000000 1248.000000\n", 882 | "max 2016.000000 1639032.000000 6000.000000 8192.000000 6400.000000" 883 | ] 884 | }, 885 | "execution_count": 62, 886 | "metadata": {}, 887 | "output_type": "execute_result" 888 | } 889 | ], 890 | "source": [ 891 | "df.describe()" 892 | ] 893 | }, 894 | { 895 | "cell_type": "markdown", 896 | "metadata": {}, 897 | "source": [ 898 | "## Extracting the states from market names" 899 | ] 900 | }, 901 | { 902 | "cell_type": "code", 903 | "execution_count": 63, 904 | "metadata": { 905 | "collapsed": false 906 | }, 907 | "outputs": [ 908 | { 909 | "data": { 910 | "text/plain": [ 911 | "LASALGAON(MS) 242\n", 912 | "PIMPALGAON(MS) 224\n", 913 | "MANMAD(MS) 218\n", 914 | "LONAND(MS) 211\n", 915 | "MAHUVA(GUJ) 210\n", 916 | "Name: market, dtype: int64" 917 | ] 918 | }, 919 | "execution_count": 63, 920 | "metadata": {}, 921 | "output_type": "execute_result" 922 | } 923 | ], 924 | "source": [ 925 | "df.market.value_counts().head()" 926 | ] 927 | }, 928 | { 929 | "cell_type": "code", 930 | "execution_count": 64, 931 | "metadata": { 932 | "collapsed": false 933 | }, 934 | "outputs": [], 935 | "source": [ 936 | "df['state'] = df.market.str.split('(').str[-1]" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": 65, 942 | "metadata": { 943 | "collapsed": false 944 | }, 945 | "outputs": [ 946 | { 947 | "data": { 948 | "text/html": [ 949 | "
\n", 950 | "\n", 951 | " \n", 952 | " \n", 953 | " \n", 954 | " \n", 955 | " \n", 956 | " \n", 957 | " \n", 958 | " \n", 959 | " \n", 960 | " \n", 961 | " \n", 962 | " \n", 963 | " \n", 964 | " \n", 965 | " \n", 966 | " \n", 967 | " \n", 968 | " \n", 969 | " \n", 970 | " \n", 971 | " \n", 972 | " \n", 973 | " \n", 974 | " \n", 975 | " \n", 976 | " \n", 977 | " \n", 978 | " \n", 979 | " \n", 980 | " \n", 981 | " \n", 982 | " \n", 983 | " \n", 984 | " \n", 985 | " \n", 986 | " \n", 987 | " \n", 988 | " \n", 989 | " \n", 990 | " \n", 991 | " \n", 992 | " \n", 993 | " \n", 994 | " \n", 995 | " \n", 996 | " \n", 997 | " \n", 998 | " \n", 999 | " \n", 1000 | " \n", 1001 | " \n", 1002 | " \n", 1003 | " \n", 1004 | " \n", 1005 | " \n", 1006 | " \n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | "
       market month  year  quantity  priceMin  priceMax  priceMod state
0  ABOHAR(PB)   Jan  2005      2350       404       493       446   PB)
1  ABOHAR(PB)   Jan  2006       900       487       638       563   PB)
2  ABOHAR(PB)   Jan  2010       790      1283      1592      1460   PB)
3  ABOHAR(PB)   Jan  2011       245      3067      3750      3433   PB)
4  ABOHAR(PB)   Jan  2012      1035       523       686       605   PB)
\n", 1022 | "
" 1023 | ], 1024 | "text/plain": [ 1025 | " market month year quantity priceMin priceMax priceMod state\n", 1026 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB)\n", 1027 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB)\n", 1028 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB)\n", 1029 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB)\n", 1030 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB)" 1031 | ] 1032 | }, 1033 | "execution_count": 65, 1034 | "metadata": {}, 1035 | "output_type": "execute_result" 1036 | } 1037 | ], 1038 | "source": [ 1039 | "df.head()" 1040 | ] 1041 | }, 1042 | { 1043 | "cell_type": "code", 1044 | "execution_count": 66, 1045 | "metadata": { 1046 | "collapsed": true 1047 | }, 1048 | "outputs": [], 1049 | "source": [ 1050 | "df['city'] = df.market.str.split('(').str[0]" 1051 | ] 1052 | }, 1053 | { 1054 | "cell_type": "code", 1055 | "execution_count": 67, 1056 | "metadata": { 1057 | "collapsed": false 1058 | }, 1059 | "outputs": [ 1060 | { 1061 | "data": { 1062 | "text/html": [ 1063 | "
\n", 1064 | "\n", 1065 | " \n", 1066 | " \n", 1067 | " \n", 1068 | " \n", 1069 | " \n", 1070 | " \n", 1071 | " \n", 1072 | " \n", 1073 | " \n", 1074 | " \n", 1075 | " \n", 1076 | " \n", 1077 | " \n", 1078 | " \n", 1079 | " \n", 1080 | " \n", 1081 | " \n", 1082 | " \n", 1083 | " \n", 1084 | " \n", 1085 | " \n", 1086 | " \n", 1087 | " \n", 1088 | " \n", 1089 | " \n", 1090 | " \n", 1091 | " \n", 1092 | " \n", 1093 | " \n", 1094 | " \n", 1095 | " \n", 1096 | " \n", 1097 | " \n", 1098 | " \n", 1099 | " \n", 1100 | " \n", 1101 | " \n", 1102 | " \n", 1103 | " \n", 1104 | " \n", 1105 | " \n", 1106 | " \n", 1107 | " \n", 1108 | " \n", 1109 | " \n", 1110 | " \n", 1111 | " \n", 1112 | " \n", 1113 | " \n", 1114 | " \n", 1115 | " \n", 1116 | " \n", 1117 | " \n", 1118 | " \n", 1119 | " \n", 1120 | " \n", 1121 | " \n", 1122 | " \n", 1123 | " \n", 1124 | " \n", 1125 | " \n", 1126 | " \n", 1127 | " \n", 1128 | " \n", 1129 | " \n", 1130 | " \n", 1131 | " \n", 1132 | " \n", 1133 | " \n", 1134 | " \n", 1135 | " \n", 1136 | " \n", 1137 | " \n", 1138 | " \n", 1139 | " \n", 1140 | " \n", 1141 | "
       market month  year  quantity  priceMin  priceMax  priceMod state    city
0  ABOHAR(PB)   Jan  2005      2350       404       493       446   PB)  ABOHAR
1  ABOHAR(PB)   Jan  2006       900       487       638       563   PB)  ABOHAR
2  ABOHAR(PB)   Jan  2010       790      1283      1592      1460   PB)  ABOHAR
3  ABOHAR(PB)   Jan  2011       245      3067      3750      3433   PB)  ABOHAR
4  ABOHAR(PB)   Jan  2012      1035       523       686       605   PB)  ABOHAR
\n", 1142 | "
" 1143 | ], 1144 | "text/plain": [ 1145 | " market month year quantity priceMin priceMax priceMod state \\\n", 1146 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB) \n", 1147 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB) \n", 1148 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB) \n", 1149 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB) \n", 1150 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB) \n", 1151 | "\n", 1152 | " city \n", 1153 | "0 ABOHAR \n", 1154 | "1 ABOHAR \n", 1155 | "2 ABOHAR \n", 1156 | "3 ABOHAR \n", 1157 | "4 ABOHAR " 1158 | ] 1159 | }, 1160 | "execution_count": 67, 1161 | "metadata": {}, 1162 | "output_type": "execute_result" 1163 | } 1164 | ], 1165 | "source": [ 1166 | "df.head()" 1167 | ] 1168 | }, 1169 | { 1170 | "cell_type": "code", 1171 | "execution_count": 68, 1172 | "metadata": { 1173 | "collapsed": false 1174 | }, 1175 | "outputs": [ 1176 | { 1177 | "data": { 1178 | "text/plain": [ 1179 | "array(['PB)', 'UP)', 'GUJ)', 'MS)', 'RAJ)', 'BANGALORE', 'KNT)', 'BHOPAL',\n", 1180 | " 'OR)', 'BHR)', 'WB)', 'CHANDIGARH', 'CHENNAI', 'bellary)',\n", 1181 | " 'podisu)', 'UTT)', 'DELHI', 'MP)', 'TN)', 'Podis', 'GUWAHATI',\n", 1182 | " 'HYDERABAD', 'JAIPUR', 'WHITE)', 'JAMMU', 'HR)', 'KOLKATA', 'AP)',\n", 1183 | " 'LUCKNOW', 'MUMBAI', 'NAGPUR', 'KER)', 'PATNA', 'CHGARH)', 'JH)',\n", 1184 | " 'SHIMLA', 'SRINAGAR', 'TRIVENDRUM'], dtype=object)" 1185 | ] 1186 | }, 1187 | "execution_count": 68, 1188 | "metadata": {}, 1189 | "output_type": "execute_result" 1190 | } 1191 | ], 1192 | "source": [ 1193 | "df.state.unique()" 1194 | ] 1195 | }, 1196 | { 1197 | "cell_type": "code", 1198 | "execution_count": 69, 1199 | "metadata": { 1200 | "collapsed": false 1201 | }, 1202 | "outputs": [], 1203 | "source": [ 1204 | "df['state'] = df.state.str.split(')').str[0]" 1205 | ] 1206 | }, 1207 | { 1208 | "cell_type": "code", 1209 | "execution_count": 70, 1210 | "metadata": { 1211 | "collapsed": false 1212 | }, 1213 | "outputs": [ 1214 | { 1215 | "data": { 1216 | "text/plain": [ 1217 
| "array(['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'BANGALORE', 'KNT', 'BHOPAL', 'OR',\n", 1218 | " 'BHR', 'WB', 'CHANDIGARH', 'CHENNAI', 'bellary', 'podisu', 'UTT',\n", 1219 | " 'DELHI', 'MP', 'TN', 'Podis', 'GUWAHATI', 'HYDERABAD', 'JAIPUR',\n", 1220 | " 'WHITE', 'JAMMU', 'HR', 'KOLKATA', 'AP', 'LUCKNOW', 'MUMBAI',\n", 1221 | " 'NAGPUR', 'KER', 'PATNA', 'CHGARH', 'JH', 'SHIMLA', 'SRINAGAR',\n", 1222 | " 'TRIVENDRUM'], dtype=object)" 1223 | ] 1224 | }, 1225 | "execution_count": 70, 1226 | "metadata": {}, 1227 | "output_type": "execute_result" 1228 | } 1229 | ], 1230 | "source": [ 1231 | "df.state.unique()" 1232 | ] 1233 | }, 1234 | { 1235 | "cell_type": "code", 1236 | "execution_count": 71, 1237 | "metadata": { 1238 | "collapsed": false 1239 | }, 1240 | "outputs": [], 1241 | "source": [ 1242 | "dfState = df.groupby(['state', 'market'], as_index=False).count()" 1243 | ] 1244 | }, 1245 | { 1246 | "cell_type": "code", 1247 | "execution_count": 72, 1248 | "metadata": { 1249 | "collapsed": false 1250 | }, 1251 | "outputs": [ 1252 | { 1253 | "data": { 1254 | "text/plain": [ 1255 | "array(['KURNOOL(AP)', 'RAJAHMUNDRY(AP)', 'BANGALORE', 'BHOPAL',\n", 1256 | " 'BIHARSHARIF(BHR)', 'CHANDIGARH', 'CHENNAI', 'RAIPUR(CHGARH)',\n", 1257 | " 'DELHI', 'AHMEDABAD(GUJ)', 'BHAVNAGAR(GUJ)', 'DEESA(GUJ)',\n", 1258 | " 'GONDAL(GUJ)', 'JAMNAGAR(GUJ)', 'MAHUVA(GUJ)', 'RAJKOT(GUJ)',\n", 1259 | " 'SURAT(GUJ)', 'GUWAHATI', 'KARNAL(HR)', 'HYDERABAD', 'JAIPUR',\n", 1260 | " 'JAMMU', 'RANCHI(JH)', 'PALAYAM(KER)', 'BELGAUM(KNT)',\n", 1261 | " 'BIJAPUR(KNT)', 'CHALLAKERE(KNT)', 'CHICKBALLAPUR(KNT)',\n", 1262 | " 'DHAVANGERE(KNT)', 'HASSAN(KNT)', 'HUBLI(KNT)', 'KOLAR(KNT)',\n", 1263 | " 'RAICHUR(KNT)', 'KOLKATA', 'LUCKNOW', 'DEWAS(MP)', 'INDORE(MP)',\n", 1264 | " 'MANDSOUR(MP)', 'NEEMUCH(MP)', 'SAGAR(MP)', 'UJJAIN(MP)',\n", 1265 | " 'AHMEDNAGAR(MS)', 'BOMBORI(MS)', 'CHAKAN(MS)', 'CHANDVAD(MS)',\n", 1266 | " 'DEVALA(MS)', 'DHULIA(MS)', 'DINDORI(MS)', 'JALGAON(MS)',\n", 1267 | " 'JUNNAR(MS)', 'KALVAN(MS)', 
'KOLHAPUR(MS)', 'KOPERGAON(MS)',\n", 1268 | " 'LASALGAON(MS)', 'LONAND(MS)', 'MALEGAON(MS)', 'MANMAD(MS)',\n", 1269 | " 'NANDGAON(MS)', 'NASIK(MS)', 'NEWASA(MS)', 'NIPHAD(MS)',\n", 1270 | " 'PHALTAN (MS)', 'PIMPALGAON(MS)', 'PUNE(MS)', 'RAHATA(MS)',\n", 1271 | " 'RAHURI(MS)', 'SAIKHEDA(MS)', 'SANGALI(MS)', 'SANGAMNER(MS)',\n", 1272 | " 'SATANA(MS)', 'SHRIRAMPUR(MS)', 'SINNAR(MS)', 'SOLAPUR(MS)',\n", 1273 | " 'SRIRAMPUR(MS)', 'VANI(MS)', 'YEOLA(MS)', 'MUMBAI', 'NAGPUR',\n", 1274 | " 'BHUBNESWER(OR)', 'PATNA', 'ABOHAR(PB)', 'AMRITSAR(PB)',\n", 1275 | " 'BHATINDA(PB)', 'HOSHIARPUR(PB)', 'JALANDHAR(PB)', 'KHANNA(PB)',\n", 1276 | " 'LUDHIANA(PB)', 'PATIALA(PB)', 'DINDIGUL(TN)(Podis', 'AJMER(RAJ)',\n", 1277 | " 'ALWAR(RAJ)', 'BIKANER(RAJ)', 'JODHPUR(RAJ)', 'KOTA(RAJ)',\n", 1278 | " 'SRIGANGANAGAR(RAJ)', 'UDAIPUR(RAJ)', 'SHIMLA', 'SRINAGAR',\n", 1279 | " 'DINDIGUL(TN)', 'MADURAI(TN)', 'TRIVENDRUM', 'AGRA(UP)',\n", 1280 | " 'ALIGARH(UP)', 'BALLIA(UP)', 'BAREILLY(UP)', 'DEORIA(UP)',\n", 1281 | " 'ETAWAH(UP)', 'GORAKHPUR(UP)', 'KANPUR(UP)', 'MEERUT(UP)',\n", 1282 | " 'VARANASI(UP)', 'DEHRADOON(UTT)', 'HALDWANI(UTT)', 'BURDWAN(WB)',\n", 1283 | " 'MIDNAPUR(WB)', 'PURULIA(WB)', 'SHEROAPHULY(WB)', 'JALGAON(WHITE)',\n", 1284 | " 'COIMBATORE(TN) (bellary)', 'COIMBATORE(TN) (podisu)'], dtype=object)" 1285 | ] 1286 | }, 1287 | "execution_count": 72, 1288 | "metadata": {}, 1289 | "output_type": "execute_result" 1290 | } 1291 | ], 1292 | "source": [ 1293 | "dfState.market.unique()" 1294 | ] 1295 | }, 1296 | { 1297 | "cell_type": "code", 1298 | "execution_count": 73, 1299 | "metadata": { 1300 | "collapsed": false 1301 | }, 1302 | "outputs": [], 1303 | "source": [ 1304 | "state_now = ['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'BANGALORE', 'KNT', 'BHOPAL', 'OR',\n", 1305 | " 'BHR', 'WB', 'CHANDIGARH', 'CHENNAI', 'bellary', 'podisu', 'UTT',\n", 1306 | " 'DELHI', 'MP', 'TN', 'Podis', 'GUWAHATI', 'HYDERABAD', 'JAIPUR',\n", 1307 | " 'WHITE', 'JAMMU', 'HR', 'KOLKATA', 'AP', 'LUCKNOW', 'MUMBAI',\n", 
1308 | " 'NAGPUR', 'KER', 'PATNA', 'CHGARH', 'JH', 'SHIMLA', 'SRINAGAR',\n", 1309 | " 'TRIVENDRUM']" 1310 | ] 1311 | }, 1312 | { 1313 | "cell_type": "code", 1314 | "execution_count": 74, 1315 | "metadata": { 1316 | "collapsed": false 1317 | }, 1318 | "outputs": [], 1319 | "source": [ 1320 | "state_new =['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'KNT', 'KNT', 'MP', 'OR',\n", 1321 | " 'BHR', 'WB', 'CH', 'TN', 'KNT', 'TN', 'UP',\n", 1322 | " 'DEL', 'MP', 'TN', 'TN', 'ASM', 'AP', 'RAJ',\n", 1323 | " 'MS', 'JK', 'HR', 'WB', 'AP', 'UP', 'MS',\n", 1324 | " 'MS', 'KER', 'BHR', 'HR', 'JH', 'HP', 'JK',\n", 1325 | " 'KEL']" 1326 | ] 1327 | }, 1328 | { 1329 | "cell_type": "code", 1330 | "execution_count": 75, 1331 | "metadata": { 1332 | "collapsed": false 1333 | }, 1334 | "outputs": [], 1335 | "source": [ 1336 | "df.state = df.state.replace(state_now, state_new)" 1337 | ] 1338 | }, 1339 | { 1340 | "cell_type": "code", 1341 | "execution_count": 76, 1342 | "metadata": { 1343 | "collapsed": false 1344 | }, 1345 | "outputs": [ 1346 | { 1347 | "data": { 1348 | "text/plain": [ 1349 | "array(['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'KNT', 'MP', 'OR', 'BHR', 'WB',\n", 1350 | " 'CH', 'TN', 'DEL', 'ASM', 'AP', 'JK', 'HR', 'KER', 'JH', 'HP', 'KEL'], dtype=object)" 1351 | ] 1352 | }, 1353 | "execution_count": 76, 1354 | "metadata": {}, 1355 | "output_type": "execute_result" 1356 | } 1357 | ], 1358 | "source": [ 1359 | "df.state.unique()" 1360 | ] 1361 | }, 1362 | { 1363 | "cell_type": "markdown", 1364 | "metadata": {}, 1365 | "source": [ 1366 | "## Getting the Dates" 1367 | ] 1368 | }, 1369 | { 1370 | "cell_type": "code", 1371 | "execution_count": 77, 1372 | "metadata": { 1373 | "collapsed": false 1374 | }, 1375 | "outputs": [ 1376 | { 1377 | "data": { 1378 | "text/html": [ 1379 | "
\n", 1380 | "\n", 1381 | " \n", 1382 | " \n", 1383 | " \n", 1384 | " \n", 1385 | " \n", 1386 | " \n", 1387 | " \n", 1388 | " \n", 1389 | " \n", 1390 | " \n", 1391 | " \n", 1392 | " \n", 1393 | " \n", 1394 | " \n", 1395 | " \n", 1396 | " \n", 1397 | " \n", 1398 | " \n", 1399 | " \n", 1400 | " \n", 1401 | " \n", 1402 | " \n", 1403 | " \n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | " \n", 1424 | " \n", 1425 | " \n", 1426 | " \n", 1427 | " \n", 1428 | " \n", 1429 | " \n", 1430 | " \n", 1431 | " \n", 1432 | " \n", 1433 | " \n", 1434 | " \n", 1435 | " \n", 1436 | " \n", 1437 | " \n", 1438 | " \n", 1439 | " \n", 1440 | " \n", 1441 | " \n", 1442 | " \n", 1443 | " \n", 1444 | " \n", 1445 | " \n", 1446 | " \n", 1447 | " \n", 1448 | " \n", 1449 | " \n", 1450 | " \n", 1451 | " \n", 1452 | " \n", 1453 | " \n", 1454 | " \n", 1455 | " \n", 1456 | " \n", 1457 | "
       market month  year  quantity  priceMin  priceMax  priceMod state    city
0  ABOHAR(PB)   Jan  2005      2350       404       493       446    PB  ABOHAR
1  ABOHAR(PB)   Jan  2006       900       487       638       563    PB  ABOHAR
2  ABOHAR(PB)   Jan  2010       790      1283      1592      1460    PB  ABOHAR
3  ABOHAR(PB)   Jan  2011       245      3067      3750      3433    PB  ABOHAR
4  ABOHAR(PB)   Jan  2012      1035       523       686       605    PB  ABOHAR
\n", 1458 | "
" 1459 | ], 1460 | "text/plain": [ 1461 | " market month year quantity priceMin priceMax priceMod state \\\n", 1462 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n", 1463 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n", 1464 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n", 1465 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n", 1466 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n", 1467 | "\n", 1468 | " city \n", 1469 | "0 ABOHAR \n", 1470 | "1 ABOHAR \n", 1471 | "2 ABOHAR \n", 1472 | "3 ABOHAR \n", 1473 | "4 ABOHAR " 1474 | ] 1475 | }, 1476 | "execution_count": 77, 1477 | "metadata": {}, 1478 | "output_type": "execute_result" 1479 | } 1480 | ], 1481 | "source": [ 1482 | "df.head()" 1483 | ] 1484 | }, 1485 | { 1486 | "cell_type": "code", 1487 | "execution_count": 78, 1488 | "metadata": { 1489 | "collapsed": false 1490 | }, 1491 | "outputs": [ 1492 | { 1493 | "data": { 1494 | "text/plain": [ 1495 | "Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8,\n", 1496 | " 9,\n", 1497 | " ...\n", 1498 | " 10217, 10218, 10219, 10220, 10221, 10222, 10223, 10224, 10225,\n", 1499 | " 10226],\n", 1500 | " dtype='int64', length=10227)" 1501 | ] 1502 | }, 1503 | "execution_count": 78, 1504 | "metadata": {}, 1505 | "output_type": "execute_result" 1506 | } 1507 | ], 1508 | "source": [ 1509 | "df.index" 1510 | ] 1511 | }, 1512 | { 1513 | "cell_type": "code", 1514 | "execution_count": 79, 1515 | "metadata": { 1516 | "collapsed": false 1517 | }, 1518 | "outputs": [ 1519 | { 1520 | "data": { 1521 | "text/plain": [ 1522 | "Timestamp('2012-01-01 00:00:00')" 1523 | ] 1524 | }, 1525 | "execution_count": 79, 1526 | "metadata": {}, 1527 | "output_type": "execute_result" 1528 | } 1529 | ], 1530 | "source": [ 1531 | "pd.to_datetime('January 2012')" 1532 | ] 1533 | }, 1534 | { 1535 | "cell_type": "code", 1536 | "execution_count": 80, 1537 | "metadata": { 1538 | "collapsed": false 1539 | }, 1540 | "outputs": [], 1541 | "source": [ 1542 | "df['date'] = df['month'] + '-' + df['year'].map(str)" 1543 | ] 1544 
| }, 1545 | { 1546 | "cell_type": "code", 1547 | "execution_count": 82, 1548 | "metadata": { 1549 | "collapsed": true 1550 | }, 1551 | "outputs": [], 1552 | "source": [ 1553 | "??map" 1554 | ] 1555 | }, 1556 | { 1557 | "cell_type": "code", 1558 | "execution_count": 81, 1559 | "metadata": { 1560 | "collapsed": false, 1561 | "scrolled": true 1562 | }, 1563 | "outputs": [ 1564 | { 1565 | "data": { 1566 | "text/html": [ 1567 | "
\n", 1568 | "\n", 1569 | " \n", 1570 | " \n", 1571 | " \n", 1572 | " \n", 1573 | " \n", 1574 | " \n", 1575 | " \n", 1576 | " \n", 1577 | " \n", 1578 | " \n", 1579 | " \n", 1580 | " \n", 1581 | " \n", 1582 | " \n", 1583 | " \n", 1584 | " \n", 1585 | " \n", 1586 | " \n", 1587 | " \n", 1588 | " \n", 1589 | " \n", 1590 | " \n", 1591 | " \n", 1592 | " \n", 1593 | " \n", 1594 | " \n", 1595 | " \n", 1596 | " \n", 1597 | " \n", 1598 | " \n", 1599 | " \n", 1600 | " \n", 1601 | " \n", 1602 | " \n", 1603 | " \n", 1604 | " \n", 1605 | " \n", 1606 | " \n", 1607 | " \n", 1608 | " \n", 1609 | " \n", 1610 | " \n", 1611 | " \n", 1612 | " \n", 1613 | " \n", 1614 | " \n", 1615 | " \n", 1616 | " \n", 1617 | " \n", 1618 | " \n", 1619 | " \n", 1620 | " \n", 1621 | " \n", 1622 | " \n", 1623 | " \n", 1624 | " \n", 1625 | " \n", 1626 | " \n", 1627 | " \n", 1628 | " \n", 1629 | " \n", 1630 | " \n", 1631 | " \n", 1632 | " \n", 1633 | " \n", 1634 | " \n", 1635 | " \n", 1636 | " \n", 1637 | " \n", 1638 | " \n", 1639 | " \n", 1640 | " \n", 1641 | " \n", 1642 | " \n", 1643 | " \n", 1644 | " \n", 1645 | " \n", 1646 | " \n", 1647 | " \n", 1648 | " \n", 1649 | " \n", 1650 | " \n", 1651 | "
       market month  year  quantity  priceMin  priceMax  priceMod state    city      date
0  ABOHAR(PB)   Jan  2005      2350       404       493       446    PB  ABOHAR  Jan-2005
1  ABOHAR(PB)   Jan  2006       900       487       638       563    PB  ABOHAR  Jan-2006
2  ABOHAR(PB)   Jan  2010       790      1283      1592      1460    PB  ABOHAR  Jan-2010
3  ABOHAR(PB)   Jan  2011       245      3067      3750      3433    PB  ABOHAR  Jan-2011
4  ABOHAR(PB)   Jan  2012      1035       523       686       605    PB  ABOHAR  Jan-2012
\n", 1652 | "
" 1653 | ], 1654 | "text/plain": [ 1655 | " market month year quantity priceMin priceMax priceMod state \\\n", 1656 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n", 1657 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n", 1658 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n", 1659 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n", 1660 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n", 1661 | "\n", 1662 | " city date \n", 1663 | "0 ABOHAR Jan-2005 \n", 1664 | "1 ABOHAR Jan-2006 \n", 1665 | "2 ABOHAR Jan-2010 \n", 1666 | "3 ABOHAR Jan-2011 \n", 1667 | "4 ABOHAR Jan-2012 " 1668 | ] 1669 | }, 1670 | "execution_count": 81, 1671 | "metadata": {}, 1672 | "output_type": "execute_result" 1673 | } 1674 | ], 1675 | "source": [ 1676 | "df.head()" 1677 | ] 1678 | }, 1679 | { 1680 | "cell_type": "code", 1681 | "execution_count": 85, 1682 | "metadata": { 1683 | "collapsed": false 1684 | }, 1685 | "outputs": [], 1686 | "source": [ 1687 | "index = pd.to_datetime(df.date)" 1688 | ] 1689 | }, 1690 | { 1691 | "cell_type": "code", 1692 | "execution_count": 86, 1693 | "metadata": { 1694 | "collapsed": false 1695 | }, 1696 | "outputs": [], 1697 | "source": [ 1698 | "df.index = pd.PeriodIndex(df.date, freq='M')" 1699 | ] 1700 | }, 1701 | { 1702 | "cell_type": "code", 1703 | "execution_count": null, 1704 | "metadata": { 1705 | "collapsed": false 1706 | }, 1707 | "outputs": [], 1708 | "source": [ 1709 | "df.columns" 1710 | ] 1711 | }, 1712 | { 1713 | "cell_type": "code", 1714 | "execution_count": 87, 1715 | "metadata": { 1716 | "collapsed": false 1717 | }, 1718 | "outputs": [ 1719 | { 1720 | "data": { 1721 | "text/plain": [ 1722 | "PeriodIndex(['2005-01', '2006-01', '2010-01', '2011-01', '2012-01', '2013-01',\n", 1723 | " '2014-01', '2015-01', '2005-02', '2006-02',\n", 1724 | " ...\n", 1725 | " '2006-12', '2007-12', '2008-12', '2009-12', '2010-12', '2011-12',\n", 1726 | " '2012-12', '2013-12', '2014-12', '2015-12'],\n", 1727 | " dtype='int64', length=10227, freq='M')" 1728 | ] 1729 | }, 
1730 | "execution_count": 87, 1731 | "metadata": {}, 1732 | "output_type": "execute_result" 1733 | } 1734 | ], 1735 | "source": [ 1736 | "df.index" 1737 | ] 1738 | }, 1739 | { 1740 | "cell_type": "code", 1741 | "execution_count": 88, 1742 | "metadata": { 1743 | "collapsed": false 1744 | }, 1745 | "outputs": [ 1746 | { 1747 | "data": { 1748 | "text/html": [ 1749 | "
\n", 1750 | "\n", 1751 | " \n", 1752 | " \n", 1753 | " \n", 1754 | " \n", 1755 | " \n", 1756 | " \n", 1757 | " \n", 1758 | " \n", 1759 | " \n", 1760 | " \n", 1761 | " \n", 1762 | " \n", 1763 | " \n", 1764 | " \n", 1765 | " \n", 1766 | " \n", 1767 | " \n", 1768 | " \n", 1769 | " \n", 1770 | " \n", 1771 | " \n", 1772 | " \n", 1773 | " \n", 1774 | " \n", 1775 | " \n", 1776 | " \n", 1777 | " \n", 1778 | " \n", 1779 | " \n", 1780 | " \n", 1781 | " \n", 1782 | " \n", 1783 | " \n", 1784 | " \n", 1785 | " \n", 1786 | " \n", 1787 | " \n", 1788 | " \n", 1789 | " \n", 1790 | " \n", 1791 | " \n", 1792 | " \n", 1793 | " \n", 1794 | " \n", 1795 | " \n", 1796 | " \n", 1797 | " \n", 1798 | " \n", 1799 | " \n", 1800 | " \n", 1801 | " \n", 1802 | " \n", 1803 | " \n", 1804 | " \n", 1805 | " \n", 1806 | " \n", 1807 | " \n", 1808 | " \n", 1809 | " \n", 1810 | " \n", 1811 | " \n", 1812 | " \n", 1813 | " \n", 1814 | " \n", 1815 | " \n", 1816 | " \n", 1817 | " \n", 1818 | " \n", 1819 | " \n", 1820 | " \n", 1821 | " \n", 1822 | " \n", 1823 | " \n", 1824 | " \n", 1825 | " \n", 1826 | " \n", 1827 | " \n", 1828 | " \n", 1829 | " \n", 1830 | " \n", 1831 | " \n", 1832 | " \n", 1833 | "
             market month  year  quantity  priceMin  priceMax  priceMod state    city      date
2005-01  ABOHAR(PB)   Jan  2005      2350       404       493       446    PB  ABOHAR  Jan-2005
2006-01  ABOHAR(PB)   Jan  2006       900       487       638       563    PB  ABOHAR  Jan-2006
2010-01  ABOHAR(PB)   Jan  2010       790      1283      1592      1460    PB  ABOHAR  Jan-2010
2011-01  ABOHAR(PB)   Jan  2011       245      3067      3750      3433    PB  ABOHAR  Jan-2011
2012-01  ABOHAR(PB)   Jan  2012      1035       523       686       605    PB  ABOHAR  Jan-2012
\n", 1834 | "
" 1835 | ], 1836 | "text/plain": [ 1837 | " market month year quantity priceMin priceMax priceMod state \\\n", 1838 | "2005-01 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n", 1839 | "2006-01 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n", 1840 | "2010-01 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n", 1841 | "2011-01 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n", 1842 | "2012-01 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n", 1843 | "\n", 1844 | " city date \n", 1845 | "2005-01 ABOHAR Jan-2005 \n", 1846 | "2006-01 ABOHAR Jan-2006 \n", 1847 | "2010-01 ABOHAR Jan-2010 \n", 1848 | "2011-01 ABOHAR Jan-2011 \n", 1849 | "2012-01 ABOHAR Jan-2012 " 1850 | ] 1851 | }, 1852 | "execution_count": 88, 1853 | "metadata": {}, 1854 | "output_type": "execute_result" 1855 | } 1856 | ], 1857 | "source": [ 1858 | "df.head()" 1859 | ] 1860 | }, 1861 | { 1862 | "cell_type": "code", 1863 | "execution_count": null, 1864 | "metadata": { 1865 | "collapsed": true 1866 | }, 1867 | "outputs": [], 1868 | "source": [ 1869 | "df.to_csv('MonthWiseMarketArrivals_Clean.csv', index = False)" 1870 | ] 1871 | } 1872 | ], 1873 | "metadata": { 1874 | "kernelspec": { 1875 | "display_name": "Python 3", 1876 | "language": "python", 1877 | "name": "python3" 1878 | }, 1879 | "language_info": { 1880 | "codemirror_mode": { 1881 | "name": "ipython", 1882 | "version": 3 1883 | }, 1884 | "file_extension": ".py", 1885 | "mimetype": "text/x-python", 1886 | "name": "python", 1887 | "nbconvert_exporter": "python", 1888 | "pygments_lexer": "ipython3", 1889 | "version": "3.5.1" 1890 | } 1891 | }, 1892 | "nbformat": 4, 1893 | "nbformat_minor": 0 1894 | } 1895 | -------------------------------------------------------------------------------- /time_series/city_geocode.csv: -------------------------------------------------------------------------------- 1 | city,lon,lat 2 | GUWAHATI,91.7362365,26.1445169 3 | KOLKATA,88.363895,22.572646 4 | SRIRAMPUR,88.3385053,23.4033393 5 | SHEROAPHULY,88.3215014,22.7690032 6 | 
BURDWAN,87.8614793,23.2324214 7 | MIDNAPUR,87.3214908,22.4308892 8 | PURULIA,86.365208,23.3320779 9 | DHULIA,86.0618818,22.0347727 10 | BHUBNESWER,85.8245398,20.2960587 11 | BIHARSHARIF,85.5148735,25.1982147 12 | RANCHI,85.309562,23.3440997 13 | PATNA,85.1375645,25.5940947 14 | BALLIA,84.1487319,25.7584381 15 | DEORIA,83.7838214,26.4862373 16 | GORAKHPUR,83.3731675,26.7605545 17 | VARANASI,82.9739144,25.3176452 18 | RAJAHMUNDRY,81.8040345,17.0005383 19 | RAIPUR,81.6296413,21.2513844 20 | DINDORI,81.0768455,22.9417931 21 | LUCKNOW,80.946166,26.8466937 22 | KANPUR,80.3318736,26.449923 23 | CHENNAI,80.2707184,13.0826802 24 | HALDWANI,79.5129767,29.2182644 25 | BAREILLY,79.4304381,28.3670355 26 | NAGPUR,79.0881546,21.1458004 27 | ETAWAH,79.0046898,26.8117116 28 | SAGAR,78.7378068,23.838805 29 | SAIKHEDA,78.5831181,22.962215 30 | HYDERABAD,78.486671,17.385044 31 | KOLAR,78.1325611,13.1357446 32 | MADURAI,78.1197754,9.9252007 33 | ALIGARH,78.0880129,27.8973944 34 | KURNOOL,78.0372792,15.8281257 35 | DEHRADOON,78.0321918,30.3164945 36 | AGRA,78.0080745,27.1766701 37 | DINDIGUL,77.9802906,10.3673123 38 | CHICKBALLAPUR,77.7280396,13.432366 39 | MEERUT,77.7064137,28.9844618 40 | BANGALORE,77.5945627,12.9715987 41 | BHOPAL,77.412615,23.2599333 42 | RAICHUR,77.3439283,16.2120031 43 | DELHI,77.2090212,28.6139391 44 | SHIMLA,77.1734033,31.1048145 45 | KARNAL,76.9904825,29.6856929 46 | COIMBATORE,76.9558321,11.0168445 47 | PALAYAM,76.9513432,8.5027684 48 | TRIVENDRUM,76.9366376,8.5241391 49 | CHANDIGARH,76.7794179,30.7333148 50 | CHALLAKERE,76.6528225,14.313395 51 | ALWAR,76.6345735,27.5529907 52 | PATIALA,76.3868797,30.3397809 53 | DEVALA,76.3820088,11.4725502 54 | KHANNA,76.2112286,30.697852 55 | HASSAN,76.0995519,13.0068142 56 | DEWAS,76.0507949,22.9622672 57 | DHAVANGERE,75.9238397,14.4663438 58 | HOSHIARPUR,75.911483,31.5143178 59 | SOLAPUR,75.9063906,17.6599188 60 | KOTA,75.8647527,25.2138156 61 | INDORE,75.8577258,22.7195687 62 | LUDHIANA,75.8572758,30.900965 63 | 
JAIPUR,75.7872709,26.9124336 64 | UJJAIN,75.7849097,23.1793013 65 | BIJAPUR,75.710031,16.8301708 66 | JALANDHAR,75.5761829,31.3260152 67 | JALGAON,75.5626039,21.0076578 68 | HUBLI,75.1239547,15.3647083 69 | MANDSOUR,75.0692952,24.076836 70 | BHATINDA,74.9454745,30.210994 71 | SRINAGAR,74.9442585,34.1255413 72 | NEWASA,74.9281063,19.5511772 73 | AMRITSAR,74.8722642,31.6339793 74 | NEEMUCH,74.8624092,24.4763852 75 | JAMMU,74.8576539,32.7217819 76 | AHMEDNAGAR,74.7495916,19.0952075 77 | SHRIRAMPUR,74.6576091,19.6222323 78 | RAHURI,74.6488264,19.392678 79 | AJMER,74.6399163,26.4498954 80 | SANGALI,74.5814773,16.8523973 81 | MALEGAON,74.5100291,20.5547497 82 | BELGAUM,74.4976741,15.8496953 83 | RAHATA,74.483335,19.7127021 84 | YEOLA,74.4818698,20.0471229 85 | KOPERGAON,74.4790898,19.8916791 86 | MANMAD,74.4366016,20.2511789 87 | PHALTAN ,74.4360424,17.9844507 88 | CHANDVAD,74.2472779,20.3271277 89 | KOLHAPUR,74.2432527,16.7049873 90 | LASALGAON,74.2326058,20.1491422 91 | SANGAMNER,74.2079648,19.5771387 92 | SATANA,74.2032581,20.598224 93 | ABOHAR,74.1993043,30.1452928 94 | LONAND,74.1861821,18.041706 95 | NIPHAD,74.1093141,20.0799646 96 | SINNAR,74.0006328,19.8530593 97 | PIMPALGAON,73.9873787,20.1699678 98 | SRIGANGANAGAR,73.8771901,29.9038399 99 | JUNNAR,73.87425,19.2031842 100 | CHAKAN,73.8630346,18.7602664 101 | PUNE,73.8567437,18.5204303 102 | NASIK,73.7898023,19.9974533 103 | UDAIPUR,73.712479,24.585445 104 | BIKANER,73.3119159,28.0229348 105 | JODHPUR,73.0243094,26.2389469 106 | NANDGAON,72.9276008,18.3855337 107 | MUMBAI,72.8776559,19.0759837 108 | SURAT,72.8310607,21.1702401 109 | AHMEDABAD,72.5713621,23.022505 110 | DEESA,72.1906721,24.2585031 111 | BHAVNAGAR,72.1519304,21.7644725 112 | MAHUVA,71.7563169,21.0902193 113 | RAJKOT,70.8021599,22.3038945 114 | GONDAL,70.792297,21.9619463 115 | JAMNAGAR,70.05773,22.4707019 116 | KALVAN,73.13054,19.24033 117 | VANI,73.89189,20.33749 118 | BOMBORI,72.87766,19.07598 
-------------------------------------------------------------------------------- /time_series/img/Cov_nonstationary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Cov_nonstationary.png -------------------------------------------------------------------------------- /time_series/img/Mean_nonstationary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Mean_nonstationary.png -------------------------------------------------------------------------------- /time_series/img/Var_nonstationary.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Var_nonstationary.png -------------------------------------------------------------------------------- /time_series/img/left_merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/left_merge.png -------------------------------------------------------------------------------- /time_series/img/onion_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/onion_small.png -------------------------------------------------------------------------------- /time_series/img/onion_tables.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/onion_tables.png -------------------------------------------------------------------------------- /time_series/img/peeling_the_onion_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/peeling_the_onion_small.png -------------------------------------------------------------------------------- /time_series/img/pivot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/pivot.png -------------------------------------------------------------------------------- /time_series/img/splitapplycombine.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/splitapplycombine.png -------------------------------------------------------------------------------- /time_series/img/subsetcolumns.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/subsetcolumns.png -------------------------------------------------------------------------------- /time_series/img/subsetrows.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/subsetrows.png -------------------------------------------------------------------------------- /time_series/state_geocode.csv: 
--------------------------------------------------------------------------------
"state","name","lon","lat"
"MS","Maharashtra",75.7138884,19.7514798
"GUJ","Gujarat",71.1923805,22.258652
"MP","Madhya pradesh",78.6568942,22.9734229
"TN","Tamil Nadu",78.6568942,11.1271225
"KNT","Karnataka",75.7138884,15.3172775
"DEL","Delhi",77.2090212,28.6139391
"HR","Haryana",76.085601,29.0587757
"RAJ","Rajasthan",74.2179326,27.0238036
"AP","Andhra Pradesh",79.7399875,15.9128998
"UP","Uttar Pradesh",80.9461592,26.8467088
"JK","Jammu & Kashmir",74.8576539,32.7217819
"BHR","Bihar",85.3131194,25.0960742
"WB","West Bengal",87.8549755,22.9867569
"HP","Himachal Pradesh",77.1733901,31.1048294
"ASM","Assam",92.9375739,26.2006043
"KEL","Kerala",76.2710833,10.8505159
"JH","Jharkhand",85.2799354,23.6101808
"OR","Orissa",85.0985236,20.9516658
"PB","Punjab",75.3412179,31.1471305
"KER","Kerala",76.2710833,10.8505159
"CH","Chandigarh",76.7794179,30.7333148
--------------------------------------------------------------------------------