├── LICENSE
├── README.md
├── check_env.py
├── img
│   ├── ISLR.jpeg
│   ├── acquire.jpg
│   ├── amit.png
│   ├── approach.jpg
│   ├── art.jpeg
│   ├── bargava.jpg
│   ├── book.png
│   ├── books.jpg
│   ├── break.jpg
│   ├── clay.jpeg
│   ├── craft.jpeg
│   ├── estimating_coefficients.png
│   ├── explore.jpg
│   ├── frame.jpg
│   ├── glass.jpg
│   ├── insight.jpg
│   ├── lens.jpeg
│   ├── model.jpg
│   ├── numbers.jpg
│   ├── onion-image.jpg
│   ├── onion.jpg
│   ├── onion.png
│   ├── overview.jpg
│   ├── pair.jpg
│   ├── postit.jpg
│   ├── r2.gif
│   ├── r_squared.png
│   ├── refine.jpg
│   ├── retail.jpg
│   ├── science.jpeg
│   ├── see.jpeg
│   ├── single.jpeg
│   ├── skills.png
│   ├── slope_intercept.png
│   ├── speak.jpeg
│   ├── sports.jpg
│   ├── stars.jpg
│   ├── think.jpg
│   ├── thinkstats.jpg
│   ├── time.jpg
│   ├── tool.jpg
│   ├── travel.jpg
│   ├── welcome.jpg
│   ├── wesmckinney.jpg
│   └── workshop.jpg
├── installation_instructions.md
├── overview.md
├── overview.pdf
├── python.txt
└── time_series
    ├── 1-Frame.ipynb
    ├── 2-Acquire.ipynb
    ├── 3-Refine.ipynb
    ├── 4-Explore.ipynb
    ├── 5-Model.ipynb
    ├── 6-Insight.ipynb
    ├── MonthWiseMarketArrivals.csv
    ├── MonthWiseMarketArrivals.html
    ├── MonthWiseMarketArrivalsJan2016.html
    ├── MonthWiseMarketArrivals_Clean.csv
    ├── city_geocode.csv
    ├── img
    │   ├── Cov_nonstationary.png
    │   ├── Mean_nonstationary.png
    │   ├── Var_nonstationary.png
    │   ├── corr.svg
    │   ├── left_merge.png
    │   ├── onion_small.png
    │   ├── onion_tables.png
    │   ├── peeling_the_onion_small.png
    │   ├── pivot.png
    │   ├── splitapplycombine.png
    │   ├── subsetcolumns.png
    │   └── subsetrows.png
    └── state_geocode.csv
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 Amit Kapoor & Bargava Subramanian
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Time Series Analysis using Python
2 | Workshop material for Time Series Analysis in Python
3 | by [Amit Kapoor](http://twitter.com/amitkaps) and [Bargava Subramanian](http://twitter.com/bargava)
4 |
5 | **Experience Level** : Beginner
6 |
7 | **Overview**: A lot of the data that we see in the real world comes in the form of a time series. This workshop will provide an overview of how to do time series analysis and introduce time series forecasting.
8 |
9 | **Audience**: People interested in data analytics on time series data.
10 |
11 | **Objective**:
12 |
13 | 1. What is time series data?
14 | 2. How to visualize time series data?
15 | 3. How to analyze time series data?
16 | 4. How to forecast time series data?
17 |
18 |
19 | Weather data, stock prices, and the population of a country are all examples of time series data. Such data is recorded continuously - daily, weekly, monthly, etc. While a lot of theory has been developed for representing and analyzing data at a single point in time, many of those techniques don't work well with time series data.
20 |
21 | The goal of this workshop is two-fold:
22 |
23 | 1. How to analyze/visualize time-series data
24 | 2. How to forecast using the available time-series data
25 |
26 | We will take a principled, scientific approach to gathering, preparing and exploring the data. We will create some summary metrics from the available data.
27 |
28 | Then we will define the forecasting problem(s), introduce some of the common time series forecasting models, and implement them using Python.
29 |
30 | **Outline**
31 |
32 | * Obtaining time series data
33 | * Determine what questions need to be answered
34 | * Generate hypotheses for various solution approaches
35 | * Exploring time series data
36 | * Outliers
37 | * Missing values
38 | * Creating aggregate metrics
39 | * Calculate percentage/proportion metrics
40 | * Summary metrics
41 | * Visualize time series data
42 | * Time Series forecasting
43 | * Linear regression
44 | * Moving average
45 | * Time series decomposition
46 | * ARIMA
47 | * Dynamic Regression Models
48 | * Vector Autoregression
49 | * Exponential Smoothing
50 |
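To give a flavour of the forecasting part of the outline above, here is a minimal, illustrative sketch of a simple moving average forecast using `pandas`. The file and column names (`prices.csv`, `date`, `price`) are placeholders, not the workshop dataset:

    import pandas as pd

    # hypothetical monthly price series indexed by date
    df = pd.read_csv("prices.csv", parse_dates=["date"], index_col="date")

    # simple moving average forecast: use the mean of the last 12
    # observations as the prediction for the next period
    sma_forecast = df["price"].tail(12).mean()
    print(sma_forecast)

The workshop notebooks in the `time_series` folder build this up step by step; richer models from the outline (such as ARIMA) are typically fit with `statsmodels`, which is part of the environment check below.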
51 |
52 |
53 | *Script to check if the requisite libraries for the workshop are present*
54 | Please execute the following command at the command prompt
55 |
56 | $ python check_env.py
57 |
58 | If any library has a `FAIL` message, please install/upgrade that library.
59 |
60 | Installation instructions can be found [here](https://github.com/rouseguy/TimeSeriesAnalysiswithPython/blob/master/installation_instructions.md)
61 |
62 | ---
63 | ### Licensing
64 |
65 | Time Series Analysis using Python by Amit Kapoor and Bargava Subramanian is licensed under the MIT License.
66 |
--------------------------------------------------------------------------------
/check_env.py:
--------------------------------------------------------------------------------
1 | # Authors: Amit Kapoor and Bargava Subramanian
2 | # Copyright (c) 2016 Amit Kapoor
3 | # License: MIT License
4 |
5 | """
6 | This script will check if the environment setup is correct for the workshop.
7 |
8 | To run, please execute the following command from the command prompt
9 | >>> python check_env.py
10 |
11 | The output will indicate if any of the libraries are missing or need to be updated.
12 |
13 | This script is inspired by https://github.com/fonnesbeck/scipy2015_tutorial/blob/master/check_env.py
14 | """
15 |
16 | from __future__ import print_function
17 |
18 | try:
19 | import curses
20 | curses.setupterm()
21 | assert curses.tigetnum("colors") > 2
22 | OK = "\x1b[1;%dm[ OK ]\x1b[0m" % (30 + curses.COLOR_GREEN)
23 | FAIL = "\x1b[1;%dm[FAIL]\x1b[0m" % (30 + curses.COLOR_RED)
24 | except:
25 | OK = '[ OK ]'
26 | FAIL = '[FAIL]'
27 |
28 | import sys
29 | try:
30 | import importlib
31 | except ImportError:
32 | print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
33 | from distutils.version import LooseVersion as Version
34 |
35 | def import_version(pkg, min_ver, fail_msg=""):
36 | mod = None
37 | try:
38 | mod = importlib.import_module(pkg)
39 |         # for spacy and wordcloud, only check that the import succeeded
40 |         if (pkg == "spacy" or pkg == "wordcloud") and mod is not None:
41 |             print(OK, '%s ' % (pkg))
42 |         else:
43 |             version = getattr(mod, "__version__", 0) or getattr(mod, "VERSION", 0)
44 |             if Version(version) < min_ver:
45 |                 print(FAIL, "%s version %s or higher required, but %s installed."
46 |                       % (pkg, min_ver, version))
47 | else:
48 | print(OK, '%s version %s' % (pkg, version))
49 | except ImportError:
50 | print(FAIL, '%s not installed. %s' % (pkg, fail_msg))
51 | return mod
52 |
53 |
54 | # first check the python version
55 | print('Using python in', sys.prefix)
56 | print(sys.version)
57 | # use only the numeric part of sys.version for a clean comparison
58 | pyversion = Version(sys.version.split()[0])
59 | if pyversion < "3":
60 |     print(FAIL, "Python version 3 is required, but %s is installed." % sys.version)
61 | else:
62 |     print(OK, "Python version %s" % pyversion)
65 |
66 | print()
67 | requirements = {
68 |
69 | 'IPython' : '4.0.3',
70 | 'jupyter' :'1.0.0',
71 | 'matplotlib' :'1.5.0',
72 | 'numpy' : '1.10.4',
73 | 'pandas' : '0.17.1',
74 | 'scipy' : '0.17.0',
75 | 'sklearn' : '0.17',
76 | 'seaborn' :'0.6.0',
77 | 'statsmodels':'0.6.1'
78 | }
79 |
80 | # now the dependencies
81 | for lib, required_version in list(requirements.items()):
82 | import_version(lib, required_version)
83 |
84 |
85 |
86 |
--------------------------------------------------------------------------------
/img/ISLR.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/ISLR.jpeg
--------------------------------------------------------------------------------
/img/acquire.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/acquire.jpg
--------------------------------------------------------------------------------
/img/amit.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/amit.png
--------------------------------------------------------------------------------
/img/approach.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/approach.jpg
--------------------------------------------------------------------------------
/img/art.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/art.jpeg
--------------------------------------------------------------------------------
/img/bargava.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/bargava.jpg
--------------------------------------------------------------------------------
/img/book.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/book.png
--------------------------------------------------------------------------------
/img/books.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/books.jpg
--------------------------------------------------------------------------------
/img/break.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/break.jpg
--------------------------------------------------------------------------------
/img/clay.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/clay.jpeg
--------------------------------------------------------------------------------
/img/craft.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/craft.jpeg
--------------------------------------------------------------------------------
/img/estimating_coefficients.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/estimating_coefficients.png
--------------------------------------------------------------------------------
/img/explore.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/explore.jpg
--------------------------------------------------------------------------------
/img/frame.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/frame.jpg
--------------------------------------------------------------------------------
/img/glass.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/glass.jpg
--------------------------------------------------------------------------------
/img/insight.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/insight.jpg
--------------------------------------------------------------------------------
/img/lens.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/lens.jpeg
--------------------------------------------------------------------------------
/img/model.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/model.jpg
--------------------------------------------------------------------------------
/img/numbers.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/numbers.jpg
--------------------------------------------------------------------------------
/img/onion-image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion-image.jpg
--------------------------------------------------------------------------------
/img/onion.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion.jpg
--------------------------------------------------------------------------------
/img/onion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/onion.png
--------------------------------------------------------------------------------
/img/overview.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/overview.jpg
--------------------------------------------------------------------------------
/img/pair.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/pair.jpg
--------------------------------------------------------------------------------
/img/postit.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/postit.jpg
--------------------------------------------------------------------------------
/img/r2.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/r2.gif
--------------------------------------------------------------------------------
/img/r_squared.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/r_squared.png
--------------------------------------------------------------------------------
/img/refine.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/refine.jpg
--------------------------------------------------------------------------------
/img/retail.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/retail.jpg
--------------------------------------------------------------------------------
/img/science.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/science.jpeg
--------------------------------------------------------------------------------
/img/see.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/see.jpeg
--------------------------------------------------------------------------------
/img/single.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/single.jpeg
--------------------------------------------------------------------------------
/img/skills.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/skills.png
--------------------------------------------------------------------------------
/img/slope_intercept.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/slope_intercept.png
--------------------------------------------------------------------------------
/img/speak.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/speak.jpeg
--------------------------------------------------------------------------------
/img/sports.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/sports.jpg
--------------------------------------------------------------------------------
/img/stars.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/stars.jpg
--------------------------------------------------------------------------------
/img/think.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/think.jpg
--------------------------------------------------------------------------------
/img/thinkstats.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/thinkstats.jpg
--------------------------------------------------------------------------------
/img/time.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/time.jpg
--------------------------------------------------------------------------------
/img/tool.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/tool.jpg
--------------------------------------------------------------------------------
/img/travel.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/travel.jpg
--------------------------------------------------------------------------------
/img/welcome.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/welcome.jpg
--------------------------------------------------------------------------------
/img/wesmckinney.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/wesmckinney.jpg
--------------------------------------------------------------------------------
/img/workshop.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/img/workshop.jpg
--------------------------------------------------------------------------------
/installation_instructions.md:
--------------------------------------------------------------------------------
1 | # Installation Instructions for the workshop
2 |
3 |
4 | ### Package Manager: Anaconda
5 |
6 | We strongly recommend using Anaconda. It can be downloaded from here:
7 | https://www.continuum.io/downloads
8 |
9 | It comes with the `jupyter notebook`, which is the environment we will be using for the workshop.
10 |
11 | We recommend using the Python 3.5 version.
12 |
13 | ### Required packages
14 |
15 | Run the following script to check if you have all the requisite packages installed.
16 | To run it, please execute the following command from the command prompt:
17 |
18 | $ python check_env.py
19 |
20 | The output will indicate if any of the libraries are missing or need to be updated.
21 |
22 | Any package that is missing can be installed by running the following command at the command prompt
23 |
24 |     $ pip install <package_name>
25 |
26 | Any package that needs to be upgraded can be upgraded by running the following command at the command prompt
27 |
28 |     $ pip install --upgrade <package_name>
29 |
30 |
31 | Replace <*package_name*> with the package that needs to be installed/upgraded.
32 |
33 |
--------------------------------------------------------------------------------
/overview.md:
--------------------------------------------------------------------------------
1 | 
2 | # Intro to Data Science and Machine Learning
3 | ### @amitkaps | @bargava
4 |
5 | ---
6 |
7 | 
8 | # Welcome
9 |
10 | ---
11 |
12 | # Facilitators
13 | 
14 | 
15 |
16 | ---
17 |
18 | # Amit
19 | ## @amitkaps
20 | 
21 |
22 | ---
23 |
24 | # Bargava
25 | ## @bargava
26 | 
27 |
28 |
29 | ---
30 |
31 | 
32 | # See the world through a data lens
33 |
34 | ---
35 |
36 | 
37 | # "Data is just a clue to the end truth"
38 | -- Josh Smith
39 |
40 | ---
41 |
42 | 
43 | 
44 | 
45 | # Data Driven Decisions
46 |
47 | ---
48 |
49 | 
50 | # "Science is knowledge which we understand so well that we can teach it to a computer. Everything else is art"
51 | -- Donald Knuth
52 |
53 | ---
54 |
55 | 
56 | # Data Science is an Art
57 |
58 | ---
59 |
60 | 
61 | # Hypothesis Driven Approach
62 |
63 | ---
64 |
65 | 
66 | # Frame
67 | ## "An approximate answer to the right problem is worth a good deal"
68 |
69 | ---
70 |
71 | 
72 | # Acquire
73 | ## "80% perspiration, 10% great idea, 10% great output"
74 |
75 | ---
76 |
77 | 
78 | # Refine
79 | ## "All data is messy."
80 |
81 | ---
82 |
83 | 
84 | # Explore
85 | ## "I don't know, what I don't know."
86 |
87 | ---
88 |
89 | 
90 | # Model
91 | ## "All models are wrong, but some are useful"
92 |
93 | ---
94 |
95 | 
96 | # Insight
97 | ## "The goal is to turn data into insight"
98 |
99 | ---
100 |
101 | 
102 |
103 |
104 | ---
105 |
106 | 
107 | ## "Doing data analyis requires quite a bit of thinking and we believe that when you’ve completed a good data analysis, you’ve spent more time thinking than doing."
108 | -- Roger Peng
109 |
110 | ---
111 |
112 | 
113 | # Python Data Stack
114 |
115 | ---
116 |
117 | 
118 | # Case Studies
119 |
120 | ---
121 | # Day 1
122 | # Peeling the Onion
123 | ## Time Series Analysis
124 | 
125 |
126 | ---
127 |
128 | # Day 2
129 | # Grocery
130 | ## Market Basket Analysis / Collaborative Filter
131 |
132 | ---
133 |
134 | # Day 2
135 | # Bank Marketing
136 | ## Random Forest and Gradient Boosting
137 |
138 | ---
139 |
140 | # Day 3
141 | # DataTau
142 | ## Text Analytics
143 |
144 | ---
145 |
146 | 
147 | # Learning Approach
148 |
149 | ---
150 |
151 | 
152 | # Do the Exercises
153 |
154 | ---
155 |
156 | 
157 | # Pair up & Learn
158 |
159 | ---
160 |
161 | 
162 | # Call for Help
163 |
164 | ---
165 |
166 | 
167 | # Enjoy the workshop
168 |
169 | ---
170 |
171 | ## Workshop Material is available at the Github Repo
172 | ### [https://github.com/amitkaps/machine-learning](https://github.com/amitkaps/machine-learning)
173 |
174 | ---
175 |
176 | # Exercise
177 |
178 | ---
179 |
180 | # 1. Time Series Exercise
181 |
182 | ### "Predict the number of tickets that will be raised in the next week"
183 |
184 | - **Frame**: What to forecast? At what horizon? At what level?
185 | - **Acquire, Refine, Explore**: Do EDA to understand the trend and pattern within the data
186 | - **Models**: Mean Model, Linear Trend, Random Walk, Simple Moving Average, Exp Smoothing, Decomposition, ARIMA
187 | - **Insight**: Share the insight through a datavis of the models
188 |
189 | ---
190 |
191 | # 2. Text Analytics Exercise
192 |
193 | ### "Identify the entity, features & topics in the 'Comments' data or 'Twitter #machine learning' data"
194 |
195 | - **Frame**: What are the comments you are trying to understand?
196 | - **Acquire, Refine, Explore**: Do Wordcloud, Lemmatization, Part of Speech Analysis, and Entity Chunking
197 | - **Models**: TF-IDF, Topic Modelling, Sentiment Analysis
198 | - **Insight**: Share the insight through word cloud and topic visualisation
199 |
200 | ---
201 |
202 | # Feedback
203 |
204 | ### [https://amitkaps.typeform.com/to/i6wl2E](https://amitkaps.typeform.com/to/i6wl2E)
205 |
206 |
207 | ---
208 |
209 | # Recap
210 |
211 | ---
212 |
213 | 
214 |
215 | ---
216 |
217 | 
218 | # Frame
219 | - **Toy Problems**
220 | - **Simple Problems**
221 | - Complex Problems
222 | - Business Problems
223 | - Research Problems
224 |
225 | ---
226 |
227 | 
228 | # Acquire
229 | - **Scraping** (structured, unstructured)
230 | - **Files** (csv, xls, json, xml, pdf, ...)
231 | - Database (sqlite, ...)
232 | - APIs
233 | - Streaming
234 |
235 | ---
236 |
237 | 
238 | # Refine
239 | - Data Cleaning (inconsistent, missing, ...)
240 | - **Data Refining** (derive, parse, merge, filter, convert, ...)
241 | - **Data Transformations** (group by, pivot, aggregate, sample, summarise, ...)
242 |
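A minimal `pandas` illustration of these transformations (the dataframe and column names below are made up for this example):

    import pandas as pd

    df = pd.DataFrame({"city": ["A", "A", "B", "B"],
                       "year": [2015, 2016, 2015, 2016],
                       "price": [10, 12, 9, 11]})

    # group by + aggregate: average price per city
    by_city = df.groupby("city")["price"].mean()

    # pivot: one row per city, one column per year
    wide = df.pivot_table(index="city", columns="year", values="price")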
243 |
244 | ---
245 |
246 | 
247 | # Explore
248 | - **Simple Vis**
249 | - Multi Dimensional Vis
250 | - Geographic Vis
251 | - Large Data Vis (Bin - Summarise - Smooth)
252 | - Interactive Vis
253 |
254 | ---
255 |
256 | 
257 | # Model - Supervised Learning
258 | - *Continuous*: Regression - **Linear**, Polynomial, Tree Based Methods - CART, **Random Forest**, Gradient Boosting Machines
259 | - *Classification* - **Logistic Regression**, Tree, KNN, SVM, Naive Bayes, Bayesian Network
260 |
261 | ---
262 |
263 | 
264 | # Model - Unsupervised Learning
265 | - *Continuous*: Clustering & Dimensionality Reduction like PCA, SVD, MDS, K-means
266 | - *Categorical*: Association Analysis
267 |
268 | ---
269 |
270 | 
271 | # Model - Advanced
272 | - **Time Series**
273 | - **Text Analytics**
274 | - Network / Graph Analytics
275 | - Optimization
276 |
277 | ---
278 | 
279 | # Model - Specialized
280 | - Reinforcement Learning
281 | - Online Learning
282 | - Deep Learning
283 | - Other Applications: Image, Speech
284 |
285 |
286 | ---
287 |
288 | 
289 | # Insight
290 | - Narrative Visualisation
291 | - Dashboard Visualisation
292 | - Decision Making Tools
293 | - Automated Decision Tools
294 |
295 | ---
296 |
297 | # PyData Stack
298 | - **Acquire / Refine**: `Pandas, Beautiful Soup, Selenium, Requests, SQL Alchemy, Numpy, Blaze`
299 | - **Explore**: `MatPlotLib, Seaborn, Bokeh, Plotly, Vega, Folium`
300 | - **Model**: `Scikit-Learn, StatsModels, SciPy, Gensim, Keras, Tensor Flow, PySpark`
301 | - **Insight**: `Django, Flask`
302 |
303 |
304 | ---
305 |
306 | # Skills
307 | 
308 |
309 | ---
310 |
311 | 
312 |
313 | ---
314 |
315 | # Books
316 |
317 | 
318 | 
319 | 
320 |
321 |
322 | ---
323 |
324 | 
325 | 
326 | 
327 |
328 | ---
329 |
330 | 
331 | ## Resources - Statistical Learning
332 | - One of the good books on statistical learning is ISLR -> [An Introduction to Statistical Learning with Applications in R](http://www-bcf.usc.edu/~gareth/ISL/index.html)
333 | - You can find all the ISLR code in python at this github repo - [https://github.com/JWarmenhoven/ISLR-python](https://github.com/JWarmenhoven/ISLR-python)
334 |
335 | ---
336 |
337 | ## Resources - Time Series
338 | - [Forecasting: Principles and Practice](https://www.otexts.org/fpp)
339 | - [Statistical forecasting: Notes on regression and time series analysis](http://people.duke.edu/~rnau/411home.htm)
340 |
341 | ## Resources - Text Analytics
342 | - [Natural Language Processing with Python](http://www.nltk.org/book/)
343 |
344 |
345 | ---
346 | 
347 | # Online Course
348 | - Harvard Data Science Course - [CS 109 Course](http://cs109.github.io/2015/) (It is structured in a similar way to the approach we shared)
349 | - Data Science Specialisation - [JHU Data Science](https://www.coursera.org/specializations/jhu-data-science) (It is a good course, though the material is coded in R)
350 |
351 | - Many more on Coursera & Udacity...
352 |
353 |
354 | ---
355 | 
356 | # We enjoyed the workshop!
357 |
358 | ---
359 | 
360 | # Speak to Us!
361 |
362 | ---
363 |
364 | 
365 | # Thank you
366 | ## @amitkaps | @bargava
--------------------------------------------------------------------------------
/overview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/overview.pdf
--------------------------------------------------------------------------------
/python.txt:
--------------------------------------------------------------------------------
1 | abstract-rendering==0.5.1
2 | alabaster==0.7.7
3 | anaconda-client==1.2.2
4 | appnope==0.1.0
5 | appscript==1.0.1
6 | argcomplete==1.0.0
7 | astropy==1.1.1
8 | Babel==2.2.0
9 | beautifulsoup4==4.4.1
10 | bitarray==0.8.1
11 | blaze==0.9.0
12 | bokeh==0.11.0
13 | boto==2.39.0
14 | Bottleneck==1.0.0
15 | cffi==1.2.1
16 | clyent==1.2.0
17 | colorama==0.3.6
18 | conda==4.0.4
19 | conda-build==1.19.0
20 | conda-env==2.4.5
21 | configobj==5.0.6
22 | cryptography==1.0.2
23 | cycler==0.10.0
24 | Cython==0.23.4
25 | cytoolz==0.7.5
26 | datashape==0.5.0
27 | decorator==4.0.6
28 | docutils==0.12
29 | dynd===f641248
30 | et-xmlfile==1.0.1
31 | fastcache==1.0.2
32 | Flask==0.10.1
33 | futures==3.0.3
34 | greenlet==0.4.9
35 | h5py==2.5.0
36 | html5lib==0.999
37 | idna==2.0
38 | ipykernel==4.2.2
39 | ipython==4.0.3
40 | ipython-genutils==0.1.0
41 | ipywidgets==4.1.1
42 | itsdangerous==0.24
43 | jdcal==1.2
44 | jedi==0.9.0
45 | Jinja2==2.8
46 | jsonschema==2.4.0
47 | jupyter==1.0.0
48 | jupyter-client==4.1.1
49 | jupyter-console==4.1.0
50 | jupyter-core==4.0.6
51 | llvmlite==0.8.0
52 | lxml==3.5.0
53 | MarkupSafe==0.23
54 | matplotlib==1.5.1
55 | mistune==0.7.1
56 | multipledispatch==0.4.8
57 | nbconvert==4.1.0
58 | nbformat==4.0.1
59 | networkx==1.11
60 | nltk==3.2
61 | nose==1.3.7
62 | notebook==4.1.0
63 | numba==0.23.1
64 | numexpr==2.4.6
65 | numpy==1.10.4
66 | odo==0.4.0
67 | openpyxl==2.3.2
68 | pandas==0.17.1
69 | path.py==0.0.0
70 | patsy==0.4.0
71 | pep8==1.7.0
72 | pexpect==3.3
73 | pickleshare==0.5
74 | Pillow==3.1.0
75 | ply==3.8
76 | psutil==3.4.2
77 | ptyprocess==0.5
78 | py==1.4.31
79 | pyasn1==0.1.9
80 | pycosat==0.6.1
81 | pycparser==2.14
82 | pycrypto==2.6.1
83 | pycurl==7.19.5.3
84 | pyflakes==1.0.0
85 | Pygments==2.1
86 | pyOpenSSL==0.15.1
87 | pyparsing==2.0.3
88 | pytest==2.8.5
89 | python-dateutil==2.4.2
90 | pytz==2015.7
91 | PyYAML==3.11
92 | pyzmq==15.2.0
93 | qtconsole==4.1.1
94 | redis==2.10.3
95 | requests==2.9.1
96 | rope-py3k==0.9.4.post1
97 | scikit-image==0.11.3
98 | scikit-learn==0.17
99 | scipy==0.17.0
100 | seaborn==0.7.0
101 | simplegeneric==0.8.1
102 | six==1.10.0
103 | snowballstemmer==1.2.1
104 | sockjs-tornado==1.0.1
105 | Sphinx==1.3.5
106 | sphinx-rtd-theme==0.1.9
107 | spyder==2.3.8
108 | SQLAlchemy==1.0.12
109 | statsmodels==0.6.1
110 | sympy==0.7.6.1
111 | tables==3.2.2
112 | terminado==0.5
113 | toolz==0.7.4
114 | tornado==4.3
115 | traitlets==4.1.0
116 | unicodecsv==0.14.1
117 | Werkzeug==0.11.3
118 | xgboost==0.4a30
119 | xlrd==0.9.4
120 | XlsxWriter==0.8.4
121 | xlwings==0.6.4
122 | xlwt==1.0.0
123 |
--------------------------------------------------------------------------------
/time_series/1-Frame.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# 1. Frame the Problem\n",
8 | "\n",
9 | "In late 2010, Onion prices shot through the roof and causing grave crisis. Apparently the crisis was caused by lack of rainfall in major onion producing region - Maharashtra and Karnataka and led to large scale hoarding by the traders. The crisis caused political tension in the country and described as \"a grave concern\" by then Prime Minister Manmohan Singh.\n",
10 | "\n",
11 | "\n",
12 | "- BBC Article in Dec 2010 - [Stink over onion crisis is enough to make you cry](http://www.bbc.co.uk/blogs/thereporters/soutikbiswas/2010/12/indias_onion_crisis.html)\n",
13 | "- Hindu OpEd in Dec 2010 - [The political price of onions](http://www.thehindu.com/opinion/editorial/article977100.ece)\n",
14 | "\n",
15 | "\n",
16 | "\n",
17 | "So what are the type of questions on Onion Prices - you would like to ask. \n",
18 | "\n",
19 | "\n",
20 | "## Types of Question\n",
21 | "\n",
22 | "> \"Doing data analysis requires quite a bit of thinking and we believe that when you’ve completed a good data analysis, you’ve spent more time thinking than doing.\" - Roger Peng\n",
23 | "\n",
24 | "1. **Descriptive** - \"seeks to summarize a characteristic of a set of data\"\n",
25 | "2. **Exploratory** - \"analyze the data to see if there are patterns, trends, or relationships between variables\" (hypothesis generating) \n",
26 | "3. **Inferential** - \"a restatement of this proposed hypothesis as a question and would be answered by analyzing a different set of data\" (hypothesis testing)\n",
27 | "4. **Predictive** - \"determine the impact on one factor based on other factor in a population - to make a prediction\"\n",
28 | "5. **Causal** - \"asks whether changing one factor will change another factor in a population - to establish a causal link\" \n",
29 | "6. **Mechanistic** - \"establish *how* the change in one factor results in change in another factor in a population - to determine the exact mechanism\"\n",
30 | "\n",
31 | "\n",
32 | "### Descriptive \n",
33 | "- Which states have the highest onion production and sales?\n",
34 | "- Which city (Mandi's) have the highest sales?\n",
35 | "- What is the average price for Onion across a year in Bangalore?\n",
36 | "- ...\n",
37 | "\n",
38 | "### Exploratory & Inferential \n",
39 | "- Is there a large difference between High and Low prices of Onion in a day?\n",
40 | "- What is the trend of onion price across days or months in Bangalore?\n",
41 | "- How is the price on onion correlated with volume of onion?\n",
42 | "- How is the export volume of onion correlated to domestic production volume?\n",
43 | "- ...\n",
44 | "\n",
45 | "### Predictive \n",
46 | "- What is the price of onion likely to be next day?\n",
47 | "- What is the price of onion likely to be next month?\n",
48 | "- What will be the sales quantity of onion tommorrow in Delhi?\n",
49 | "- ...\n",
50 | "\n",
51 | "### Causal\n",
52 | "- Does the change in production of onion have an impact on the onion prices? \n",
53 | "- Does the change in rainfall in monsoon have an impact on onion prices?\n",
54 | "- ...\n",
55 | "\n",
56 | "### Mechanistic\n",
57 | "- How does change in onion production impact the price of onion?\n",
58 | "- How does onion export volumes impact the prices of onion in local markets in India?\n",
59 | "- ...\n",
60 | "\n",
61 | "\n",
62 | "## Questions we will attempt\n",
63 | "\n",
64 | "### 1. Descriptive: How big is the Bangalore onion market compared to other cities in India?\n",
65 | "\n",
66 | "### 2. Exploratory / Inferential: Have the price variation in onion prices in Bangalore really gone up over the years?\n",
67 | "\n",
68 | "### 3. Predictive: Can we predict the price of onion in Bangalore?"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "metadata": {
75 | "collapsed": true
76 | },
77 | "outputs": [],
78 | "source": []
79 | }
80 | ],
81 | "metadata": {
82 | "kernelspec": {
83 | "display_name": "Python 3",
84 | "language": "python",
85 | "name": "python3"
86 | },
87 | "language_info": {
88 | "codemirror_mode": {
89 | "name": "ipython",
90 | "version": 3
91 | },
92 | "file_extension": ".py",
93 | "mimetype": "text/x-python",
94 | "name": "python",
95 | "nbconvert_exporter": "python",
96 | "pygments_lexer": "ipython3",
97 | "version": "3.5.1"
98 | }
99 | },
100 | "nbformat": 4,
101 | "nbformat_minor": 0
102 | }
103 |
--------------------------------------------------------------------------------
/time_series/2-Acquire.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# 2. Acquire the Data\n",
8 | "\n",
9 | "\n",
10 | "## Finding Data Sources\n",
11 | "\n",
12 | "There are three place to get onion price and quantity information by market. \n",
13 | "\n",
14 | "1. **[Agmarket](http://agmarknet.nic.in/)** - This is the website run by the Directorate of Marketing & Inspection (DMI), Ministry of Agriculture, Government of India and provides daily price and arrival data for all agricultural commodities at national and state level. Unfortunately, the link to get Market-wise Daily Report for Specific Commodity (Onion for us) leads to a multipage aspx entry form to get data for each date. So it is like to require an involved scraper to get the data. Too much effort - Move on. Here is the best link to go to get what is available - http://agmarknet.nic.in/agnew/NationalBEnglish/SpecificCommodityWeeklyReport.aspx?ss=1\n",
15 | "\n",
16 | "\n",
17 | "2. **[Data.gov.in](https://data.gov.in/)** - This is normally a good place to get government data in a machine readable form like csv or xml. The Variety-wise Daily Market Prices Data of Onion is available for each year as an XML but unfortunately it does not include quantity information that is needed. It would be good to have both price and quantity - so even though this is easy, lets see if we can get both from a different source. Here is the best link to go to get what is available - https://data.gov.in/catalog/variety-wise-daily-market-prices-data-onion#web_catalog_tabs_block_10\n",
18 | "\n",
19 | "\n",
20 | "3. **[NHRDF](http://nhrdf.org/en-us/)** - This is the website of National Horticultural Research & Development Foundation and maintains a database on Market Arrivals and Price, Area and Production and Export Data for three commodities - Garlic, Onion and Potatoes. We are in luck! It also has data from 1996 onwards and has only got one form to fill to get the data in a tabular form. Further it also has production and export data. Excellent. Lets use this. Here is the best link to got to get all that is available - http://nhrdf.org/en-us/DatabaseReports\n",
21 | "\n",
22 | "\n",
23 | "## Scraping the Data\n",
24 | "\n",
25 | "\n",
26 | "### Ways to Scrape Data\n",
27 | "Now we can do this in two different levels of sophistication\n",
28 | "\n",
29 | "1. **Automate the form filling process**: The form on this page looks simple. But viewing source in the browser shows there form to fill with hidden fields and we will need to access it as a browser to get the session fields and then submit the form. This is a little bit more complicated than simple scraping a table on a webpage\n",
30 | "\n",
31 | "2. **Manually fill the form**: What if we manually fill the form with the desired form fields and then save the page as a html file. Then we can read this file and just scrape the table from it. Lets go with the simple way for now.\n",
32 | "\n",
33 | "\n",
34 | "### Scraping - Manual Form Filling\n",
35 | "\n",
36 | "So let us fill the form to get a small subset of data and test our scraping process. We will start by getting the [Monthwise Market Arrivals](http://nhrdf.org/en-us/MonthWiseMarketArrivals). \n",
37 | "\n",
38 | "- Crop Name: Onion\n",
39 | "- Month: January\n",
40 | "- Market: All\n",
41 | "- Year: 2016\n",
42 | "\n",
43 | "The saved webpage is available at [MonthWiseMarketArrivalsJan2016.html](MonthWiseMarketArrivalsJan2016.html)\n",
44 | "\n",
45 | "### Understand the HTML Structure\n",
46 | "\n",
47 | "We need to scrape data from this html page... So let us try to understand the structure of the page.\n",
48 | "\n",
49 | "1. You can view the source of the page - typically Right Click and View Source on any browser and that would give your the source HTML for any page.\n",
50 | "\n",
51 | "2. You can open the developer tools in your browser and investigate the structure as you mouse over the page \n",
52 | "\n",
53 | "3. We can use a tools like [Selector Gadget](http://selectorgadget.com/) to understand the id's and classes' used in the web page\n",
54 | "\n",
55 | "Our data is under the **<table>** tag "
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "### Exercise #1"
63 | ]
64 | },
65 | {
66 | "cell_type": "markdown",
67 | "metadata": {},
68 | "source": [
69 | "Find the number of tables in the HTML Structure of [MonthWiseMarketArrivalsJan2016.html](MonthWiseMarketArrivalsJan2016.html)?"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {
76 | "collapsed": true
77 | },
78 | "outputs": [],
79 | "source": []
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "### Find all the Tables "
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": 1,
91 | "metadata": {
92 | "collapsed": false
93 | },
94 | "outputs": [],
95 | "source": [
96 | "# Import the library we need, which is Pandas\n",
97 | "import pandas as pd"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 2,
103 | "metadata": {
104 | "collapsed": false
105 | },
106 | "outputs": [],
107 | "source": [
108 | "# Read all the tables from the html document \n",
109 | "AllTables = pd.read_html('MonthWiseMarketArrivalsJan2016.html')"
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 3,
115 | "metadata": {
116 | "collapsed": false,
117 | "scrolled": true
118 | },
119 | "outputs": [
120 | {
121 | "data": {
122 | "text/plain": [
123 | "5"
124 | ]
125 | },
126 | "execution_count": 3,
127 | "metadata": {},
128 | "output_type": "execute_result"
129 | }
130 | ],
131 | "source": [
132 | "# Let us find out how many tables has it found?\n",
133 | "len(AllTables)"
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": 4,
139 | "metadata": {
140 | "collapsed": false
141 | },
142 | "outputs": [
143 | {
144 | "data": {
145 | "text/plain": [
146 | "list"
147 | ]
148 | },
149 | "execution_count": 4,
150 | "metadata": {},
151 | "output_type": "execute_result"
152 | }
153 | ],
154 | "source": [
155 | "type(AllTables)"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "### Exercise #2\n",
163 | "Find the exact table of data we want in the list of AllTables?"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": 5,
169 | "metadata": {
170 | "collapsed": false
171 | },
172 | "outputs": [
173 | {
174 | "data": {
175 | "text/html": [
176 | "\n",
177 | "(table output: 85 rows × 7 columns; see the text/plain rendering below)"
805 | ],
806 | "text/plain": [
807 | " 0 1 2 3 4 \\\n",
808 | "0 Market Month Name Year Arrival (q) Price Minimum (Rs/q) \n",
809 | "1 AGRA(UP) January 2016 134200 1039 \n",
810 | "2 AHMEDABAD(GUJ) January 2016 198390 646 \n",
811 | "3 AHMEDNAGAR(MS) January 2016 208751 175 \n",
812 | "4 AJMER(RAJ) January 2016 4247 722 \n",
813 | "5 ALIGARH(UP) January 2016 12350 1219 \n",
814 | "6 ALWAR(RAJ) January 2016 9788 625 \n",
815 | "7 AMRITSAR(PB) January 2016 24800 913 \n",
816 | "8 BALLIA(UP) January 2016 600 1400 \n",
817 | "9 BANGALORE January 2016 507223 200 \n",
818 | "10 BAREILLY(UP) January 2016 18435 1149 \n",
819 | "11 BELGAUM(KNT) January 2016 61564 485 \n",
820 | "12 BHATINDA(PB) January 2016 5510 1218 \n",
821 | "13 BHAVNAGAR(GUJ) January 2016 588870 767 \n",
822 | "14 BHUBNESWER(OR) January 2016 34050 1551 \n",
823 | "15 BIJAPUR(KNT) January 2016 4110 413 \n",
824 | "16 BURDWAN(WB) January 2016 3880 1689 \n",
825 | "17 CHAKAN(MS) January 2016 41125 940 \n",
826 | "18 CHANDIGARH January 2016 4310 800 \n",
827 | "19 CHANDVAD(MS) January 2016 108619 461 \n",
828 | "20 CHENNAI January 2016 116700 1927 \n",
829 | "21 DEESA(GUJ) January 2016 784 900 \n",
830 | "22 DEHRADOON(UTT) January 2016 10068 808 \n",
831 | "23 DELHI January 2016 249231 467 \n",
832 | "24 DEVALA(MS) January 2016 127136 828 \n",
833 | "25 DHAVANGERE(KNT) January 2016 13170 435 \n",
834 | "26 DHULIA(MS) January 2016 98405 225 \n",
835 | "27 GONDAL(GUJ) January 2016 153695 292 \n",
836 | "28 GUWAHATI January 2016 3270 1641 \n",
837 | "29 HASSAN(KNT) January 2016 12211 782 \n",
838 | ".. ... ... ... ... ... \n",
839 | "55 MUMBAI January 2016 413681 654 \n",
840 | "56 NAGPUR January 2016 98645 954 \n",
841 | "57 NEWASA(MS) January 2016 296461 250 \n",
842 | "58 NIPHAD(MS) January 2016 33110 505 \n",
843 | "59 PALAYAM(KER) January 2016 8100 1835 \n",
844 | "60 PATIALA(PB) January 2016 1445 1000 \n",
845 | "61 PATNA January 2016 29200 1440 \n",
846 | "62 PHALTAN (MS) January 2016 2662 433 \n",
847 | "63 PIMPALGAON(MS) January 2016 506740 403 \n",
848 | "64 PUNE(MS) January 2016 325669 613 \n",
849 | "65 PURULIA(WB) January 2016 1200 1640 \n",
850 | "66 RAHATA(MS) January 2016 149286 427 \n",
851 | "67 RAHURI(MS) January 2016 56957 200 \n",
852 | "68 RAICHUR(KNT) January 2016 17795 987 \n",
853 | "69 RAIPUR(CHGARH) January 2016 4300 1193 \n",
854 | "70 RAJKOT(GUJ) January 2016 80400 389 \n",
855 | "71 SAGAR(MP) January 2016 1954 792 \n",
856 | "72 SAIKHEDA(MS) January 2016 106742 375 \n",
857 | "73 SANGALI(MS) January 2016 16104 413 \n",
858 | "74 SANGAMNER(MS) January 2016 117471 500 \n",
859 | "75 SATANA(MS) January 2016 69286 429 \n",
860 | "76 SHRIRAMPUR(MS) January 2016 23497 271 \n",
861 | "77 SINNAR(MS) January 2016 73933 160 \n",
862 | "78 SOLAPUR(MS) January 2016 403797 124 \n",
863 | "79 SURAT(GUJ) January 2016 31700 943 \n",
864 | "80 UDAIPUR(RAJ) January 2016 6456 386 \n",
865 | "81 VANI(MS) January 2016 60983 767 \n",
866 | "82 VARANASI(UP) January 2016 28900 1460 \n",
867 | "83 YEOLA(MS) January 2016 437432 437 \n",
868 | "84 NaN NaN Total 9307923 751(Avg) \n",
869 | "\n",
870 | " 5 6 \n",
871 | "0 Price Maximum (Rs/q) Modal Price (Rs/q) \n",
872 | "1 1443 1349 \n",
873 | "2 1224 997 \n",
874 | "3 1722 1138 \n",
875 | "4 1067 939 \n",
876 | "5 1298 1257 \n",
877 | "6 1200 912 \n",
878 | "7 1308 1160 \n",
879 | "8 1500 1460 \n",
880 | "9 1943 1448 \n",
881 | "10 1149 1149 \n",
882 | "11 1826 1164 \n",
883 | "12 1764 1459 \n",
884 | "13 1174 969 \n",
885 | "14 1658 1619 \n",
886 | "15 1713 1088 \n",
887 | "16 1789 1739 \n",
888 | "17 1680 1350 \n",
889 | "18 1200 1000 \n",
890 | "19 1361 1094 \n",
891 | "20 2245 2086 \n",
892 | "21 1575 1350 \n",
893 | "22 1048 935 \n",
894 | "23 1573 1154 \n",
895 | "24 1321 1075 \n",
896 | "25 1545 990 \n",
897 | "26 1268 976 \n",
898 | "27 1061 760 \n",
899 | "28 1784 1713 \n",
900 | "29 1480 1255 \n",
901 | ".. ... ... \n",
902 | "55 1473 1215 \n",
903 | "56 1367 1260 \n",
904 | "57 1758 1458 \n",
905 | "58 1108 958 \n",
906 | "59 2106 1994 \n",
907 | "60 1575 1300 \n",
908 | "61 1548 1497 \n",
909 | "62 1684 1117 \n",
910 | "63 1448 1004 \n",
911 | "64 1729 1488 \n",
912 | "65 1900 1800 \n",
913 | "66 1986 1223 \n",
914 | "67 1675 1038 \n",
915 | "68 1590 1331 \n",
916 | "69 1371 1299 \n",
917 | "70 980 783 \n",
918 | "71 1000 892 \n",
919 | "72 1260 898 \n",
920 | "73 2125 1231 \n",
921 | "74 1853 1312 \n",
922 | "75 1354 957 \n",
923 | "76 1786 1350 \n",
924 | "77 1363 988 \n",
925 | "78 2285 958 \n",
926 | "79 1562 1252 \n",
927 | "80 1307 846 \n",
928 | "81 1323 1007 \n",
929 | "82 1503 1484 \n",
930 | "83 1272 1034 \n",
931 | "84 1490(Avg) 1186(Avg) \n",
932 | "\n",
933 | "[85 rows x 7 columns]"
934 | ]
935 | },
936 | "execution_count": 5,
937 | "metadata": {},
938 | "output_type": "execute_result"
939 | }
940 | ],
941 | "source": [
942 | "AllTables[4]"
943 | ]
944 | },
945 | {
946 | "cell_type": "markdown",
947 | "metadata": {},
948 | "source": [
949 | "### Get the exact table\n",
950 | "To read the exact table we need to pass in an identifier value which would identify the table. We can use the `attrs` parameter in read_html to do so. The parameter we will pass is the `id` variable"
951 | ]
952 | },
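If the table's id is not known in advance, it can be read off the page itself. Below is a minimal sketch (not part of the original notebook) that lists the id of every table in the saved HTML file; it assumes BeautifulSoup (bs4) is installed, alongside the HTML parser that pandas.read_html itself relies on.

# Sketch: list the id attribute of every table in the saved page
from bs4 import BeautifulSoup

with open('MonthWiseMarketArrivalsJan2016.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

for table in soup.find_all('table'):
    print(table.get('id'))   # the GridView id printed here is what we pass to attrs below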
953 | {
954 | "cell_type": "code",
955 | "execution_count": 6,
956 | "metadata": {
957 | "collapsed": false
958 | },
959 | "outputs": [],
960 | "source": [
961 | "# So can we read our exact table\n",
962 | "OneTable = pd.read_html('MonthWiseMarketArrivalsJan2016.html', \n",
963 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})"
964 | ]
965 | },
966 | {
967 | "cell_type": "code",
968 | "execution_count": 7,
969 | "metadata": {
970 | "collapsed": false
971 | },
972 | "outputs": [
973 | {
974 | "data": {
975 | "text/plain": [
976 | "1"
977 | ]
978 | },
979 | "execution_count": 7,
980 | "metadata": {},
981 | "output_type": "execute_result"
982 | }
983 | ],
984 | "source": [
985 | "# So how many tables have we got now\n",
986 | "len(OneTable)"
987 | ]
988 | },
989 | {
990 | "cell_type": "code",
991 | "execution_count": 8,
992 | "metadata": {
993 | "collapsed": false
994 | },
995 | "outputs": [
996 | {
997 | "data": {
998 | "text/html": [
999 | "\n",
1000 | "
\n",
1001 | " \n",
1002 | " \n",
1003 | " | \n",
1004 | " 0 | \n",
1005 | " 1 | \n",
1006 | " 2 | \n",
1007 | " 3 | \n",
1008 | " 4 | \n",
1009 | " 5 | \n",
1010 | " 6 | \n",
1011 | "
\n",
1012 | " \n",
1013 | " \n",
1014 | " \n",
1015 | " 0 | \n",
1016 | " Market | \n",
1017 | " Month Name | \n",
1018 | " Year | \n",
1019 | " Arrival (q) | \n",
1020 | " Price Minimum (Rs/q) | \n",
1021 | " Price Maximum (Rs/q) | \n",
1022 | " Modal Price (Rs/q) | \n",
1023 | "
\n",
1024 | " \n",
1025 | " 1 | \n",
1026 | " AGRA(UP) | \n",
1027 | " January | \n",
1028 | " 2016 | \n",
1029 | " 134200 | \n",
1030 | " 1039 | \n",
1031 | " 1443 | \n",
1032 | " 1349 | \n",
1033 | "
\n",
1034 | " \n",
1035 | " 2 | \n",
1036 | " AHMEDABAD(GUJ) | \n",
1037 | " January | \n",
1038 | " 2016 | \n",
1039 | " 198390 | \n",
1040 | " 646 | \n",
1041 | " 1224 | \n",
1042 | " 997 | \n",
1043 | "
\n",
1044 | " \n",
1045 | " 3 | \n",
1046 | " AHMEDNAGAR(MS) | \n",
1047 | " January | \n",
1048 | " 2016 | \n",
1049 | " 208751 | \n",
1050 | " 175 | \n",
1051 | " 1722 | \n",
1052 | " 1138 | \n",
1053 | "
\n",
1054 | " \n",
1055 | " 4 | \n",
1056 | " AJMER(RAJ) | \n",
1057 | " January | \n",
1058 | " 2016 | \n",
1059 | " 4247 | \n",
1060 | " 722 | \n",
1061 | " 1067 | \n",
1062 | " 939 | \n",
1063 | "
\n",
1064 | " \n",
1065 | "
\n",
1066 | "
"
1067 | ],
1068 | "text/plain": [
1069 | " 0 1 2 3 4 \\\n",
1070 | "0 Market Month Name Year Arrival (q) Price Minimum (Rs/q) \n",
1071 | "1 AGRA(UP) January 2016 134200 1039 \n",
1072 | "2 AHMEDABAD(GUJ) January 2016 198390 646 \n",
1073 | "3 AHMEDNAGAR(MS) January 2016 208751 175 \n",
1074 | "4 AJMER(RAJ) January 2016 4247 722 \n",
1075 | "\n",
1076 | " 5 6 \n",
1077 | "0 Price Maximum (Rs/q) Modal Price (Rs/q) \n",
1078 | "1 1443 1349 \n",
1079 | "2 1224 997 \n",
1080 | "3 1722 1138 \n",
1081 | "4 1067 939 "
1082 | ]
1083 | },
1084 | "execution_count": 8,
1085 | "metadata": {},
1086 | "output_type": "execute_result"
1087 | }
1088 | ],
1089 | "source": [
1090 | "# Show the table of data identifed by pandas with just the first five rows\n",
1091 | "OneTable[0].head()"
1092 | ]
1093 | },
1094 | {
1095 | "cell_type": "markdown",
1096 | "metadata": {},
1097 | "source": [
1098 | "However, we have not got the header correctly in our dataframe. Let us see if we can fix this.\n",
1099 | "\n",
1100 | "To get help on any function just use `??` before the function to help. Run this function and see what additional parameter you need to define to get the header correctly"
1101 | ]
1102 | },
1103 | {
1104 | "cell_type": "code",
1105 | "execution_count": null,
1106 | "metadata": {
1107 | "collapsed": true
1108 | },
1109 | "outputs": [],
1110 | "source": [
1111 | "??pd.read_html"
1112 | ]
1113 | },
1114 | {
1115 | "cell_type": "markdown",
1116 | "metadata": {},
1117 | "source": [
1118 | "### Exercise #3\n",
1119 | "Read the html file again and ensure that the correct header is identifed by pandas?"
1120 | ]
1121 | },
1122 | {
1123 | "cell_type": "code",
1124 | "execution_count": 11,
1125 | "metadata": {
1126 | "collapsed": false
1127 | },
1128 | "outputs": [],
1129 | "source": [
1130 | "OneTable = pd.read_html('MonthWiseMarketArrivalsJan2016.html', header = 0,\n",
1131 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})"
1132 | ]
1133 | },
1134 | {
1135 | "cell_type": "markdown",
1136 | "metadata": {},
1137 | "source": [
1138 | "Show the top five rows of the dataframe you have read to ensure the headers are now correct."
1139 | ]
1140 | },
1141 | {
1142 | "cell_type": "code",
1143 | "execution_count": 10,
1144 | "metadata": {
1145 | "collapsed": false,
1146 | "scrolled": true
1147 | },
1148 | "outputs": [
1149 | {
1150 | "data": {
1151 | "text/html": [
1152 | "\n",
1153 | "
\n",
1154 | " \n",
1155 | " \n",
1156 | " | \n",
1157 | " Market | \n",
1158 | " Month Name | \n",
1159 | " Year | \n",
1160 | " Arrival (q) | \n",
1161 | " Price Minimum (Rs/q) | \n",
1162 | " Price Maximum (Rs/q) | \n",
1163 | " Modal Price (Rs/q) | \n",
1164 | "
\n",
1165 | " \n",
1166 | " \n",
1167 | " \n",
1168 | " 0 | \n",
1169 | " AGRA(UP) | \n",
1170 | " January | \n",
1171 | " 2016 | \n",
1172 | " 134200 | \n",
1173 | " 1039 | \n",
1174 | " 1443 | \n",
1175 | " 1349 | \n",
1176 | "
\n",
1177 | " \n",
1178 | " 1 | \n",
1179 | " AHMEDABAD(GUJ) | \n",
1180 | " January | \n",
1181 | " 2016 | \n",
1182 | " 198390 | \n",
1183 | " 646 | \n",
1184 | " 1224 | \n",
1185 | " 997 | \n",
1186 | "
\n",
1187 | " \n",
1188 | " 2 | \n",
1189 | " AHMEDNAGAR(MS) | \n",
1190 | " January | \n",
1191 | " 2016 | \n",
1192 | " 208751 | \n",
1193 | " 175 | \n",
1194 | " 1722 | \n",
1195 | " 1138 | \n",
1196 | "
\n",
1197 | " \n",
1198 | " 3 | \n",
1199 | " AJMER(RAJ) | \n",
1200 | " January | \n",
1201 | " 2016 | \n",
1202 | " 4247 | \n",
1203 | " 722 | \n",
1204 | " 1067 | \n",
1205 | " 939 | \n",
1206 | "
\n",
1207 | " \n",
1208 | " 4 | \n",
1209 | " ALIGARH(UP) | \n",
1210 | " January | \n",
1211 | " 2016 | \n",
1212 | " 12350 | \n",
1213 | " 1219 | \n",
1214 | " 1298 | \n",
1215 | " 1257 | \n",
1216 | "
\n",
1217 | " \n",
1218 | "
\n",
1219 | "
"
1220 | ],
1221 | "text/plain": [
1222 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n",
1223 | "0 AGRA(UP) January 2016 134200 1039 \n",
1224 | "1 AHMEDABAD(GUJ) January 2016 198390 646 \n",
1225 | "2 AHMEDNAGAR(MS) January 2016 208751 175 \n",
1226 | "3 AJMER(RAJ) January 2016 4247 722 \n",
1227 | "4 ALIGARH(UP) January 2016 12350 1219 \n",
1228 | "\n",
1229 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n",
1230 | "0 1443 1349 \n",
1231 | "1 1224 997 \n",
1232 | "2 1722 1138 \n",
1233 | "3 1067 939 \n",
1234 | "4 1298 1257 "
1235 | ]
1236 | },
1237 | "execution_count": 10,
1238 | "metadata": {},
1239 | "output_type": "execute_result"
1240 | }
1241 | ],
1242 | "source": [
1243 | "OneTable[0].head()"
1244 | ]
1245 | },
1246 | {
1247 | "cell_type": "markdown",
1248 | "metadata": {
1249 | "collapsed": true
1250 | },
1251 | "source": [
1252 | "### Dataframe Viewing "
1253 | ]
1254 | },
1255 | {
1256 | "cell_type": "code",
1257 | "execution_count": 12,
1258 | "metadata": {
1259 | "collapsed": true
1260 | },
1261 | "outputs": [],
1262 | "source": [
1263 | "# Let us store the dataframe in a df variable. You will see that as a very common convention in data science pandas use\n",
1264 | "df = OneTable[0]"
1265 | ]
1266 | },
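Beyond the head, tail, shape and columns calls used in the cells that follow, a few other standard pandas helpers are handy for a first look at a freshly loaded dataframe. A small optional sketch:

# Optional quick-look helpers on df (all standard pandas methods)
df.info()       # column names, dtypes and non-null counts in one view
df.describe()   # summary statistics for the numeric columns
df.sample(5)    # five randomly chosen rows, useful on larger frames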
1267 | {
1268 | "cell_type": "code",
1269 | "execution_count": 13,
1270 | "metadata": {
1271 | "collapsed": false
1272 | },
1273 | "outputs": [
1274 | {
1275 | "data": {
1276 | "text/plain": [
1277 | "(84, 7)"
1278 | ]
1279 | },
1280 | "execution_count": 13,
1281 | "metadata": {},
1282 | "output_type": "execute_result"
1283 | }
1284 | ],
1285 | "source": [
1286 | "# Shape of the dateset - number of rows & number of columns in the dataframe\n",
1287 | "df.shape"
1288 | ]
1289 | },
1290 | {
1291 | "cell_type": "code",
1292 | "execution_count": 14,
1293 | "metadata": {
1294 | "collapsed": false
1295 | },
1296 | "outputs": [
1297 | {
1298 | "data": {
1299 | "text/plain": [
1300 | "Index(['Market', 'Month Name', 'Year', 'Arrival (q)', 'Price Minimum (Rs/q)',\n",
1301 | " 'Price Maximum (Rs/q)', 'Modal Price (Rs/q)'],\n",
1302 | " dtype='object')"
1303 | ]
1304 | },
1305 | "execution_count": 14,
1306 | "metadata": {},
1307 | "output_type": "execute_result"
1308 | }
1309 | ],
1310 | "source": [
1311 | "# Get the names of all the columns \n",
1312 | "df.columns"
1313 | ]
1314 | },
1315 | {
1316 | "cell_type": "code",
1317 | "execution_count": 15,
1318 | "metadata": {
1319 | "collapsed": false
1320 | },
1321 | "outputs": [
1322 | {
1323 | "data": {
1324 | "text/html": [
1325 | "\n",
1326 | "
\n",
1327 | " \n",
1328 | " \n",
1329 | " | \n",
1330 | " Market | \n",
1331 | " Month Name | \n",
1332 | " Year | \n",
1333 | " Arrival (q) | \n",
1334 | " Price Minimum (Rs/q) | \n",
1335 | " Price Maximum (Rs/q) | \n",
1336 | " Modal Price (Rs/q) | \n",
1337 | "
\n",
1338 | " \n",
1339 | " \n",
1340 | " \n",
1341 | " 0 | \n",
1342 | " AGRA(UP) | \n",
1343 | " January | \n",
1344 | " 2016 | \n",
1345 | " 134200 | \n",
1346 | " 1039 | \n",
1347 | " 1443 | \n",
1348 | " 1349 | \n",
1349 | "
\n",
1350 | " \n",
1351 | " 1 | \n",
1352 | " AHMEDABAD(GUJ) | \n",
1353 | " January | \n",
1354 | " 2016 | \n",
1355 | " 198390 | \n",
1356 | " 646 | \n",
1357 | " 1224 | \n",
1358 | " 997 | \n",
1359 | "
\n",
1360 | " \n",
1361 | " 2 | \n",
1362 | " AHMEDNAGAR(MS) | \n",
1363 | " January | \n",
1364 | " 2016 | \n",
1365 | " 208751 | \n",
1366 | " 175 | \n",
1367 | " 1722 | \n",
1368 | " 1138 | \n",
1369 | "
\n",
1370 | " \n",
1371 | " 3 | \n",
1372 | " AJMER(RAJ) | \n",
1373 | " January | \n",
1374 | " 2016 | \n",
1375 | " 4247 | \n",
1376 | " 722 | \n",
1377 | " 1067 | \n",
1378 | " 939 | \n",
1379 | "
\n",
1380 | " \n",
1381 | " 4 | \n",
1382 | " ALIGARH(UP) | \n",
1383 | " January | \n",
1384 | " 2016 | \n",
1385 | " 12350 | \n",
1386 | " 1219 | \n",
1387 | " 1298 | \n",
1388 | " 1257 | \n",
1389 | "
\n",
1390 | " \n",
1391 | "
\n",
1392 | "
"
1393 | ],
1394 | "text/plain": [
1395 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n",
1396 | "0 AGRA(UP) January 2016 134200 1039 \n",
1397 | "1 AHMEDABAD(GUJ) January 2016 198390 646 \n",
1398 | "2 AHMEDNAGAR(MS) January 2016 208751 175 \n",
1399 | "3 AJMER(RAJ) January 2016 4247 722 \n",
1400 | "4 ALIGARH(UP) January 2016 12350 1219 \n",
1401 | "\n",
1402 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n",
1403 | "0 1443 1349 \n",
1404 | "1 1224 997 \n",
1405 | "2 1722 1138 \n",
1406 | "3 1067 939 \n",
1407 | "4 1298 1257 "
1408 | ]
1409 | },
1410 | "execution_count": 15,
1411 | "metadata": {},
1412 | "output_type": "execute_result"
1413 | }
1414 | ],
1415 | "source": [
1416 | "# Can we see sample rows - the top 5 rows\n",
1417 | "df.head()"
1418 | ]
1419 | },
1420 | {
1421 | "cell_type": "code",
1422 | "execution_count": 16,
1423 | "metadata": {
1424 | "collapsed": false
1425 | },
1426 | "outputs": [
1427 | {
1428 | "data": {
1429 | "text/html": [
1430 | "\n",
1431 | "
\n",
1432 | " \n",
1433 | " \n",
1434 | " | \n",
1435 | " Market | \n",
1436 | " Month Name | \n",
1437 | " Year | \n",
1438 | " Arrival (q) | \n",
1439 | " Price Minimum (Rs/q) | \n",
1440 | " Price Maximum (Rs/q) | \n",
1441 | " Modal Price (Rs/q) | \n",
1442 | "
\n",
1443 | " \n",
1444 | " \n",
1445 | " \n",
1446 | " 79 | \n",
1447 | " UDAIPUR(RAJ) | \n",
1448 | " January | \n",
1449 | " 2016 | \n",
1450 | " 6456 | \n",
1451 | " 386 | \n",
1452 | " 1307 | \n",
1453 | " 846 | \n",
1454 | "
\n",
1455 | " \n",
1456 | " 80 | \n",
1457 | " VANI(MS) | \n",
1458 | " January | \n",
1459 | " 2016 | \n",
1460 | " 60983 | \n",
1461 | " 767 | \n",
1462 | " 1323 | \n",
1463 | " 1007 | \n",
1464 | "
\n",
1465 | " \n",
1466 | " 81 | \n",
1467 | " VARANASI(UP) | \n",
1468 | " January | \n",
1469 | " 2016 | \n",
1470 | " 28900 | \n",
1471 | " 1460 | \n",
1472 | " 1503 | \n",
1473 | " 1484 | \n",
1474 | "
\n",
1475 | " \n",
1476 | " 82 | \n",
1477 | " YEOLA(MS) | \n",
1478 | " January | \n",
1479 | " 2016 | \n",
1480 | " 437432 | \n",
1481 | " 437 | \n",
1482 | " 1272 | \n",
1483 | " 1034 | \n",
1484 | "
\n",
1485 | " \n",
1486 | " 83 | \n",
1487 | " NaN | \n",
1488 | " NaN | \n",
1489 | " Total | \n",
1490 | " 9307923 | \n",
1491 | " 751(Avg) | \n",
1492 | " 1490(Avg) | \n",
1493 | " 1186(Avg) | \n",
1494 | "
\n",
1495 | " \n",
1496 | "
\n",
1497 | "
"
1498 | ],
1499 | "text/plain": [
1500 | " Market Month Name Year Arrival (q) Price Minimum (Rs/q) \\\n",
1501 | "79 UDAIPUR(RAJ) January 2016 6456 386 \n",
1502 | "80 VANI(MS) January 2016 60983 767 \n",
1503 | "81 VARANASI(UP) January 2016 28900 1460 \n",
1504 | "82 YEOLA(MS) January 2016 437432 437 \n",
1505 | "83 NaN NaN Total 9307923 751(Avg) \n",
1506 | "\n",
1507 | " Price Maximum (Rs/q) Modal Price (Rs/q) \n",
1508 | "79 1307 846 \n",
1509 | "80 1323 1007 \n",
1510 | "81 1503 1484 \n",
1511 | "82 1272 1034 \n",
1512 | "83 1490(Avg) 1186(Avg) "
1513 | ]
1514 | },
1515 | "execution_count": 16,
1516 | "metadata": {},
1517 | "output_type": "execute_result"
1518 | }
1519 | ],
1520 | "source": [
1521 | "# Can we see sample rows - the bottom 5 rows\n",
1522 | "df.tail()"
1523 | ]
1524 | },
1525 | {
1526 | "cell_type": "code",
1527 | "execution_count": 17,
1528 | "metadata": {
1529 | "collapsed": false
1530 | },
1531 | "outputs": [
1532 | {
1533 | "data": {
1534 | "text/plain": [
1535 | "0 AGRA(UP)\n",
1536 | "1 AHMEDABAD(GUJ)\n",
1537 | "2 AHMEDNAGAR(MS)\n",
1538 | "3 AJMER(RAJ)\n",
1539 | "4 ALIGARH(UP)\n",
1540 | "5 ALWAR(RAJ)\n",
1541 | "6 AMRITSAR(PB)\n",
1542 | "7 BALLIA(UP)\n",
1543 | "8 BANGALORE\n",
1544 | "9 BAREILLY(UP)\n",
1545 | "10 BELGAUM(KNT)\n",
1546 | "11 BHATINDA(PB)\n",
1547 | "12 BHAVNAGAR(GUJ)\n",
1548 | "13 BHUBNESWER(OR)\n",
1549 | "14 BIJAPUR(KNT)\n",
1550 | "15 BURDWAN(WB)\n",
1551 | "16 CHAKAN(MS)\n",
1552 | "17 CHANDIGARH\n",
1553 | "18 CHANDVAD(MS)\n",
1554 | "19 CHENNAI\n",
1555 | "20 DEESA(GUJ)\n",
1556 | "21 DEHRADOON(UTT)\n",
1557 | "22 DELHI\n",
1558 | "23 DEVALA(MS)\n",
1559 | "24 DHAVANGERE(KNT)\n",
1560 | "25 DHULIA(MS)\n",
1561 | "26 GONDAL(GUJ)\n",
1562 | "27 GUWAHATI\n",
1563 | "28 HASSAN(KNT)\n",
1564 | "29 HOSHIARPUR(PB)\n",
1565 | " ... \n",
1566 | "54 MUMBAI\n",
1567 | "55 NAGPUR\n",
1568 | "56 NEWASA(MS)\n",
1569 | "57 NIPHAD(MS)\n",
1570 | "58 PALAYAM(KER)\n",
1571 | "59 PATIALA(PB)\n",
1572 | "60 PATNA\n",
1573 | "61 PHALTAN (MS)\n",
1574 | "62 PIMPALGAON(MS)\n",
1575 | "63 PUNE(MS)\n",
1576 | "64 PURULIA(WB)\n",
1577 | "65 RAHATA(MS)\n",
1578 | "66 RAHURI(MS)\n",
1579 | "67 RAICHUR(KNT)\n",
1580 | "68 RAIPUR(CHGARH)\n",
1581 | "69 RAJKOT(GUJ)\n",
1582 | "70 SAGAR(MP)\n",
1583 | "71 SAIKHEDA(MS)\n",
1584 | "72 SANGALI(MS)\n",
1585 | "73 SANGAMNER(MS)\n",
1586 | "74 SATANA(MS)\n",
1587 | "75 SHRIRAMPUR(MS)\n",
1588 | "76 SINNAR(MS)\n",
1589 | "77 SOLAPUR(MS)\n",
1590 | "78 SURAT(GUJ)\n",
1591 | "79 UDAIPUR(RAJ)\n",
1592 | "80 VANI(MS)\n",
1593 | "81 VARANASI(UP)\n",
1594 | "82 YEOLA(MS)\n",
1595 | "83 NaN\n",
1596 | "Name: Market, dtype: object"
1597 | ]
1598 | },
1599 | "execution_count": 17,
1600 | "metadata": {},
1601 | "output_type": "execute_result"
1602 | }
1603 | ],
1604 | "source": [
1605 | "# Can we access a specific columns\n",
1606 | "df[\"Market\"]"
1607 | ]
1608 | },
1609 | {
1610 | "cell_type": "code",
1611 | "execution_count": 18,
1612 | "metadata": {
1613 | "collapsed": false
1614 | },
1615 | "outputs": [
1616 | {
1617 | "data": {
1618 | "text/plain": [
1619 | "0 AGRA(UP)\n",
1620 | "1 AHMEDABAD(GUJ)\n",
1621 | "2 AHMEDNAGAR(MS)\n",
1622 | "3 AJMER(RAJ)\n",
1623 | "4 ALIGARH(UP)\n",
1624 | "5 ALWAR(RAJ)\n",
1625 | "6 AMRITSAR(PB)\n",
1626 | "7 BALLIA(UP)\n",
1627 | "8 BANGALORE\n",
1628 | "9 BAREILLY(UP)\n",
1629 | "10 BELGAUM(KNT)\n",
1630 | "11 BHATINDA(PB)\n",
1631 | "12 BHAVNAGAR(GUJ)\n",
1632 | "13 BHUBNESWER(OR)\n",
1633 | "14 BIJAPUR(KNT)\n",
1634 | "15 BURDWAN(WB)\n",
1635 | "16 CHAKAN(MS)\n",
1636 | "17 CHANDIGARH\n",
1637 | "18 CHANDVAD(MS)\n",
1638 | "19 CHENNAI\n",
1639 | "20 DEESA(GUJ)\n",
1640 | "21 DEHRADOON(UTT)\n",
1641 | "22 DELHI\n",
1642 | "23 DEVALA(MS)\n",
1643 | "24 DHAVANGERE(KNT)\n",
1644 | "25 DHULIA(MS)\n",
1645 | "26 GONDAL(GUJ)\n",
1646 | "27 GUWAHATI\n",
1647 | "28 HASSAN(KNT)\n",
1648 | "29 HOSHIARPUR(PB)\n",
1649 | " ... \n",
1650 | "54 MUMBAI\n",
1651 | "55 NAGPUR\n",
1652 | "56 NEWASA(MS)\n",
1653 | "57 NIPHAD(MS)\n",
1654 | "58 PALAYAM(KER)\n",
1655 | "59 PATIALA(PB)\n",
1656 | "60 PATNA\n",
1657 | "61 PHALTAN (MS)\n",
1658 | "62 PIMPALGAON(MS)\n",
1659 | "63 PUNE(MS)\n",
1660 | "64 PURULIA(WB)\n",
1661 | "65 RAHATA(MS)\n",
1662 | "66 RAHURI(MS)\n",
1663 | "67 RAICHUR(KNT)\n",
1664 | "68 RAIPUR(CHGARH)\n",
1665 | "69 RAJKOT(GUJ)\n",
1666 | "70 SAGAR(MP)\n",
1667 | "71 SAIKHEDA(MS)\n",
1668 | "72 SANGALI(MS)\n",
1669 | "73 SANGAMNER(MS)\n",
1670 | "74 SATANA(MS)\n",
1671 | "75 SHRIRAMPUR(MS)\n",
1672 | "76 SINNAR(MS)\n",
1673 | "77 SOLAPUR(MS)\n",
1674 | "78 SURAT(GUJ)\n",
1675 | "79 UDAIPUR(RAJ)\n",
1676 | "80 VANI(MS)\n",
1677 | "81 VARANASI(UP)\n",
1678 | "82 YEOLA(MS)\n",
1679 | "83 NaN\n",
1680 | "Name: Market, dtype: object"
1681 | ]
1682 | },
1683 | "execution_count": 18,
1684 | "metadata": {},
1685 | "output_type": "execute_result"
1686 | }
1687 | ],
1688 | "source": [
1689 | "# Using the dot notation\n",
1690 | "df.Market"
1691 | ]
1692 | },
1693 | {
1694 | "cell_type": "code",
1695 | "execution_count": 19,
1696 | "metadata": {
1697 | "collapsed": false
1698 | },
1699 | "outputs": [
1700 | {
1701 | "data": {
1702 | "text/plain": [
1703 | "0 AGRA(UP)\n",
1704 | "1 AHMEDABAD(GUJ)\n",
1705 | "2 AHMEDNAGAR(MS)\n",
1706 | "3 AJMER(RAJ)\n",
1707 | "4 ALIGARH(UP)\n",
1708 | "Name: Market, dtype: object"
1709 | ]
1710 | },
1711 | "execution_count": 19,
1712 | "metadata": {},
1713 | "output_type": "execute_result"
1714 | }
1715 | ],
1716 | "source": [
1717 | "# Selecting specific column and rows\n",
1718 | "df[0:5][\"Market\"]"
1719 | ]
1720 | },
1721 | {
1722 | "cell_type": "code",
1723 | "execution_count": 20,
1724 | "metadata": {
1725 | "collapsed": false
1726 | },
1727 | "outputs": [
1728 | {
1729 | "data": {
1730 | "text/plain": [
1731 | "0 AGRA(UP)\n",
1732 | "1 AHMEDABAD(GUJ)\n",
1733 | "2 AHMEDNAGAR(MS)\n",
1734 | "3 AJMER(RAJ)\n",
1735 | "4 ALIGARH(UP)\n",
1736 | "Name: Market, dtype: object"
1737 | ]
1738 | },
1739 | "execution_count": 20,
1740 | "metadata": {},
1741 | "output_type": "execute_result"
1742 | }
1743 | ],
1744 | "source": [
1745 | "# Works both ways\n",
1746 | "df[\"Market\"][0:5]"
1747 | ]
1748 | },
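Chained indexing such as df[0:5]["Market"] is fine for reading, but pandas generally recommends the explicit .loc and .iloc indexers for selection. A short sketch of the equivalent calls:

# Label-based and position-based equivalents of the chained selection above
df.loc[0:4, "Market"]    # .loc uses labels; with the default RangeIndex, 0:4 includes row 4
df.iloc[0:5, 0]          # .iloc uses positions; rows 0-4 of the first column (Market)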
1749 | {
1750 | "cell_type": "code",
1751 | "execution_count": 21,
1752 | "metadata": {
1753 | "collapsed": false
1754 | },
1755 | "outputs": [
1756 | {
1757 | "data": {
1758 | "text/plain": [
1759 | "array(['AGRA(UP)', 'AHMEDABAD(GUJ)', 'AHMEDNAGAR(MS)', 'AJMER(RAJ)',\n",
1760 | " 'ALIGARH(UP)', 'ALWAR(RAJ)', 'AMRITSAR(PB)', 'BALLIA(UP)',\n",
1761 | " 'BANGALORE', 'BAREILLY(UP)', 'BELGAUM(KNT)', 'BHATINDA(PB)',\n",
1762 | " 'BHAVNAGAR(GUJ)', 'BHUBNESWER(OR)', 'BIJAPUR(KNT)', 'BURDWAN(WB)',\n",
1763 | " 'CHAKAN(MS)', 'CHANDIGARH', 'CHANDVAD(MS)', 'CHENNAI', 'DEESA(GUJ)',\n",
1764 | " 'DEHRADOON(UTT)', 'DELHI', 'DEVALA(MS)', 'DHAVANGERE(KNT)',\n",
1765 | " 'DHULIA(MS)', 'GONDAL(GUJ)', 'GUWAHATI', 'HASSAN(KNT)',\n",
1766 | " 'HOSHIARPUR(PB)', 'HUBLI(KNT)', 'HYDERABAD', 'INDORE(MP)', 'JAIPUR',\n",
1767 | " 'JALANDHAR(PB)', 'JALGAON(MS)', 'JAMMU', 'JAMNAGAR(GUJ)',\n",
1768 | " 'JODHPUR(RAJ)', 'KALVAN(MS)', 'KANPUR(UP)', 'KARNAL(HR)',\n",
1769 | " 'KHANNA(PB)', 'KOLHAPUR(MS)', 'KOLKATA', 'KOPERGAON(MS)',\n",
1770 | " 'KOTA(RAJ)', 'KURNOOL(AP)', 'LASALGAON(MS)', 'LONAND(MS)',\n",
1771 | " 'LUCKNOW', 'MAHUVA(GUJ)', 'MALEGAON(MS)', 'MANMAD(MS)', 'MUMBAI',\n",
1772 | " 'NAGPUR', 'NEWASA(MS)', 'NIPHAD(MS)', 'PALAYAM(KER)', 'PATIALA(PB)',\n",
1773 | " 'PATNA', 'PHALTAN (MS)', 'PIMPALGAON(MS)', 'PUNE(MS)',\n",
1774 | " 'PURULIA(WB)', 'RAHATA(MS)', 'RAHURI(MS)', 'RAICHUR(KNT)',\n",
1775 | " 'RAIPUR(CHGARH)', 'RAJKOT(GUJ)', 'SAGAR(MP)', 'SAIKHEDA(MS)',\n",
1776 | " 'SANGALI(MS)', 'SANGAMNER(MS)', 'SATANA(MS)', 'SHRIRAMPUR(MS)',\n",
1777 | " 'SINNAR(MS)', 'SOLAPUR(MS)', 'SURAT(GUJ)', 'UDAIPUR(RAJ)',\n",
1778 | " 'VANI(MS)', 'VARANASI(UP)', 'YEOLA(MS)', nan], dtype=object)"
1779 | ]
1780 | },
1781 | "execution_count": 21,
1782 | "metadata": {},
1783 | "output_type": "execute_result"
1784 | }
1785 | ],
1786 | "source": [
1787 | "#Getting unique values of State\n",
1788 | "pd.unique(df['Market'])"
1789 | ]
1790 | },
1791 | {
1792 | "cell_type": "markdown",
1793 | "metadata": {},
1794 | "source": [
1795 | "## Downloading the Entire Month Wise Arrival Data"
1796 | ]
1797 | },
1798 | {
1799 | "cell_type": "code",
1800 | "execution_count": null,
1801 | "metadata": {
1802 | "collapsed": false
1803 | },
1804 | "outputs": [],
1805 | "source": [
1806 | "AllTable = pd.read_html('MonthWiseMarketArrivals.html', header = 0,\n",
1807 | " attrs = {'id' : 'dnn_ctr974_MonthWiseMarketArrivals_GridView1'})"
1808 | ]
1809 | },
1810 | {
1811 | "cell_type": "code",
1812 | "execution_count": null,
1813 | "metadata": {
1814 | "collapsed": false
1815 | },
1816 | "outputs": [],
1817 | "source": [
1818 | "AllTable[0].head()"
1819 | ]
1820 | },
1821 | {
1822 | "cell_type": "code",
1823 | "execution_count": null,
1824 | "metadata": {
1825 | "collapsed": false
1826 | },
1827 | "outputs": [],
1828 | "source": [
1829 | "??pd.DataFrame.to_csv"
1830 | ]
1831 | },
1832 | {
1833 | "cell_type": "code",
1834 | "execution_count": null,
1835 | "metadata": {
1836 | "collapsed": false
1837 | },
1838 | "outputs": [],
1839 | "source": [
1840 | "AllTable[0].columns"
1841 | ]
1842 | },
1843 | {
1844 | "cell_type": "code",
1845 | "execution_count": null,
1846 | "metadata": {
1847 | "collapsed": true
1848 | },
1849 | "outputs": [],
1850 | "source": [
1851 | "# Change the column names to simpler ones\n",
1852 | "AllTable[0].columns = ['market', 'month', 'year', 'quantity', 'priceMin', 'priceMax', 'priceMod']"
1853 | ]
1854 | },
1855 | {
1856 | "cell_type": "code",
1857 | "execution_count": null,
1858 | "metadata": {
1859 | "collapsed": false
1860 | },
1861 | "outputs": [],
1862 | "source": [
1863 | "AllTable[0].head()"
1864 | ]
1865 | },
1866 | {
1867 | "cell_type": "code",
1868 | "execution_count": null,
1869 | "metadata": {
1870 | "collapsed": false
1871 | },
1872 | "outputs": [],
1873 | "source": [
1874 | "# Save the dataframe to a csv file\n",
1875 | "AllTable[0].to_csv('MonthWiseMarketArrivals.csv', index = False)"
1876 | ]
1877 | }
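As a quick sanity check after writing the CSV, the file can be read straight back. A minimal sketch, not part of the original notebook (the variable name check is only illustrative):

# Re-read the saved file to confirm the simplified column names and row count survived the round trip
check = pd.read_csv('MonthWiseMarketArrivals.csv')
print(check.shape)
print(check.head())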
1878 | ],
1879 | "metadata": {
1880 | "kernelspec": {
1881 | "display_name": "Python 3",
1882 | "language": "python",
1883 | "name": "python3"
1884 | },
1885 | "language_info": {
1886 | "codemirror_mode": {
1887 | "name": "ipython",
1888 | "version": 3
1889 | },
1890 | "file_extension": ".py",
1891 | "mimetype": "text/x-python",
1892 | "name": "python",
1893 | "nbconvert_exporter": "python",
1894 | "pygments_lexer": "ipython3",
1895 | "version": "3.5.1"
1896 | }
1897 | },
1898 | "nbformat": 4,
1899 | "nbformat_minor": 0
1900 | }
1901 |
--------------------------------------------------------------------------------
/time_series/3-Refine.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# 2. Refine the Data\n",
8 | " \n",
9 | "> \"Data is messy\"\n",
10 | "\n",
11 | "We will be performing the following operation on our Onion price to refine it\n",
12 | "- **Remove** e.g. remove redundant data from the data frame\n",
13 | "- **Derive** e.g. State and City from the market field\n",
14 | "- **Parse** e.g. extract date from year and month column\n",
15 | "\n",
16 | "Other stuff you may need to do to refine are...\n",
17 | "- **Missing** e.g. Check for missing or incomplete data\n",
18 | "- **Quality** e.g. Check for duplicates, accuracy, unusual data\n",
19 | "- **Convert** e.g. free text to coded value\n",
20 | "- **Calculate** e.g. percentages, proportion\n",
21 | "- **Merge** e.g. first and surname for full name\n",
22 | "- **Aggregate** e.g. rollup by year, cluster by area\n",
23 | "- **Filter** e.g. exclude based on location\n",
24 | "- **Sample** e.g. extract a representative data\n",
25 | "- **Summary** e.g. show summary stats like mean"
26 | ]
27 | },
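As a preview of the three core steps (Remove, Derive, Parse), here is a compact sketch assuming df has already been loaded from MonthWiseMarketArrivals.csv as in the cells below. The notebook performs each step in more detail afterwards, so treat this only as a summary, not the canonical path.

# Compact preview of the refine steps carried out in this notebook (a sketch)
df = df[df.year != 'Total']                                            # Remove: drop the aggregate "Total" row
df['state'] = df.market.str.split('(').str[-1].str.split(')').str[0]   # Derive: state code from the market name
df['city'] = df.market.str.split('(').str[0]                           # Derive: city name from the market name
df['date'] = pd.to_datetime(df.year.astype(str) + '-' + df.month, format='%Y-%B')  # Parse: year + month into a date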
28 | {
29 | "cell_type": "code",
30 | "execution_count": 1,
31 | "metadata": {
32 | "collapsed": true
33 | },
34 | "outputs": [],
35 | "source": [
36 | "# Import the two library we need, which is Pandas and Numpy\n",
37 | "import pandas as pd\n",
38 | "import numpy as np"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 2,
44 | "metadata": {
45 | "collapsed": false
46 | },
47 | "outputs": [],
48 | "source": [
49 | "# Read the csv file of Month Wise Market Arrival data that has been scraped.\n",
50 | "df = pd.read_csv('MonthWiseMarketArrivals.csv')"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 3,
56 | "metadata": {
57 | "collapsed": false
58 | },
59 | "outputs": [
60 | {
61 | "data": {
62 | "text/html": [
63 | "\n",
64 | "
\n",
65 | " \n",
66 | " \n",
67 | " | \n",
68 | " market | \n",
69 | " month | \n",
70 | " year | \n",
71 | " quantity | \n",
72 | " priceMin | \n",
73 | " priceMax | \n",
74 | " priceMod | \n",
75 | "
\n",
76 | " \n",
77 | " \n",
78 | " \n",
79 | " 0 | \n",
80 | " ABOHAR(PB) | \n",
81 | " January | \n",
82 | " 2005 | \n",
83 | " 2350 | \n",
84 | " 404 | \n",
85 | " 493 | \n",
86 | " 446 | \n",
87 | "
\n",
88 | " \n",
89 | " 1 | \n",
90 | " ABOHAR(PB) | \n",
91 | " January | \n",
92 | " 2006 | \n",
93 | " 900 | \n",
94 | " 487 | \n",
95 | " 638 | \n",
96 | " 563 | \n",
97 | "
\n",
98 | " \n",
99 | " 2 | \n",
100 | " ABOHAR(PB) | \n",
101 | " January | \n",
102 | " 2010 | \n",
103 | " 790 | \n",
104 | " 1283 | \n",
105 | " 1592 | \n",
106 | " 1460 | \n",
107 | "
\n",
108 | " \n",
109 | " 3 | \n",
110 | " ABOHAR(PB) | \n",
111 | " January | \n",
112 | " 2011 | \n",
113 | " 245 | \n",
114 | " 3067 | \n",
115 | " 3750 | \n",
116 | " 3433 | \n",
117 | "
\n",
118 | " \n",
119 | " 4 | \n",
120 | " ABOHAR(PB) | \n",
121 | " January | \n",
122 | " 2012 | \n",
123 | " 1035 | \n",
124 | " 523 | \n",
125 | " 686 | \n",
126 | " 605 | \n",
127 | "
\n",
128 | " \n",
129 | "
\n",
130 | "
"
131 | ],
132 | "text/plain": [
133 | " market month year quantity priceMin priceMax priceMod\n",
134 | "0 ABOHAR(PB) January 2005 2350 404 493 446\n",
135 | "1 ABOHAR(PB) January 2006 900 487 638 563\n",
136 | "2 ABOHAR(PB) January 2010 790 1283 1592 1460\n",
137 | "3 ABOHAR(PB) January 2011 245 3067 3750 3433\n",
138 | "4 ABOHAR(PB) January 2012 1035 523 686 605"
139 | ]
140 | },
141 | "execution_count": 3,
142 | "metadata": {},
143 | "output_type": "execute_result"
144 | }
145 | ],
146 | "source": [
147 | "df.head()"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": 4,
153 | "metadata": {
154 | "collapsed": false
155 | },
156 | "outputs": [
157 | {
158 | "data": {
159 | "text/html": [
160 | "\n",
161 | "
\n",
162 | " \n",
163 | " \n",
164 | " | \n",
165 | " market | \n",
166 | " month | \n",
167 | " year | \n",
168 | " quantity | \n",
169 | " priceMin | \n",
170 | " priceMax | \n",
171 | " priceMod | \n",
172 | "
\n",
173 | " \n",
174 | " \n",
175 | " \n",
176 | " 10223 | \n",
177 | " YEOLA(MS) | \n",
178 | " December | \n",
179 | " 2012 | \n",
180 | " 207066 | \n",
181 | " 485 | \n",
182 | " 1327 | \n",
183 | " 1136 | \n",
184 | "
\n",
185 | " \n",
186 | " 10224 | \n",
187 | " YEOLA(MS) | \n",
188 | " December | \n",
189 | " 2013 | \n",
190 | " 215883 | \n",
191 | " 472 | \n",
192 | " 1427 | \n",
193 | " 1177 | \n",
194 | "
\n",
195 | " \n",
196 | " 10225 | \n",
197 | " YEOLA(MS) | \n",
198 | " December | \n",
199 | " 2014 | \n",
200 | " 201077 | \n",
201 | " 446 | \n",
202 | " 1654 | \n",
203 | " 1456 | \n",
204 | "
\n",
205 | " \n",
206 | " 10226 | \n",
207 | " YEOLA(MS) | \n",
208 | " December | \n",
209 | " 2015 | \n",
210 | " 223315 | \n",
211 | " 609 | \n",
212 | " 1446 | \n",
213 | " 1126 | \n",
214 | "
\n",
215 | " \n",
216 | " 10227 | \n",
217 | " NaN | \n",
218 | " NaN | \n",
219 | " Total | \n",
220 | " 783438108 | \n",
221 | " 647(Avg) | \n",
222 | " 1213(Avg) | \n",
223 | " 984(Avg) | \n",
224 | "
\n",
225 | " \n",
226 | "
\n",
227 | "
"
228 | ],
229 | "text/plain": [
230 | " market month year quantity priceMin priceMax priceMod\n",
231 | "10223 YEOLA(MS) December 2012 207066 485 1327 1136\n",
232 | "10224 YEOLA(MS) December 2013 215883 472 1427 1177\n",
233 | "10225 YEOLA(MS) December 2014 201077 446 1654 1456\n",
234 | "10226 YEOLA(MS) December 2015 223315 609 1446 1126\n",
235 | "10227 NaN NaN Total 783438108 647(Avg) 1213(Avg) 984(Avg)"
236 | ]
237 | },
238 | "execution_count": 4,
239 | "metadata": {},
240 | "output_type": "execute_result"
241 | }
242 | ],
243 | "source": [
244 | "df.tail()"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "## Remove the redundant data"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": 5,
257 | "metadata": {
258 | "collapsed": false
259 | },
260 | "outputs": [
261 | {
262 | "data": {
263 | "text/plain": [
264 | "market object\n",
265 | "month object\n",
266 | "year object\n",
267 | "quantity int64\n",
268 | "priceMin object\n",
269 | "priceMax object\n",
270 | "priceMod object\n",
271 | "dtype: object"
272 | ]
273 | },
274 | "execution_count": 5,
275 | "metadata": {},
276 | "output_type": "execute_result"
277 | }
278 | ],
279 | "source": [
280 | "df.dtypes"
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": 6,
286 | "metadata": {
287 | "collapsed": false
288 | },
289 | "outputs": [
290 | {
291 | "data": {
292 | "text/html": [
293 | "\n",
294 | "
\n",
295 | " \n",
296 | " \n",
297 | " | \n",
298 | " market | \n",
299 | " month | \n",
300 | " year | \n",
301 | " quantity | \n",
302 | " priceMin | \n",
303 | " priceMax | \n",
304 | " priceMod | \n",
305 | "
\n",
306 | " \n",
307 | " \n",
308 | " \n",
309 | " 10227 | \n",
310 | " NaN | \n",
311 | " NaN | \n",
312 | " Total | \n",
313 | " 783438108 | \n",
314 | " 647(Avg) | \n",
315 | " 1213(Avg) | \n",
316 | " 984(Avg) | \n",
317 | "
\n",
318 | " \n",
319 | "
\n",
320 | "
"
321 | ],
322 | "text/plain": [
323 | " market month year quantity priceMin priceMax priceMod\n",
324 | "10227 NaN NaN Total 783438108 647(Avg) 1213(Avg) 984(Avg)"
325 | ]
326 | },
327 | "execution_count": 6,
328 | "metadata": {},
329 | "output_type": "execute_result"
330 | }
331 | ],
332 | "source": [
333 | "# Delete the last row from the dataframe\n",
334 | "df.tail(1)"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 7,
340 | "metadata": {
341 | "collapsed": false
342 | },
343 | "outputs": [],
344 | "source": [
345 | "# Delete a row from the dataframe\n",
346 | "df.drop(df.tail(1).index, inplace = True)"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": 8,
352 | "metadata": {
353 | "collapsed": false
354 | },
355 | "outputs": [
356 | {
357 | "data": {
358 | "text/html": [
359 | "\n",
360 | "
\n",
361 | " \n",
362 | " \n",
363 | " | \n",
364 | " market | \n",
365 | " month | \n",
366 | " year | \n",
367 | " quantity | \n",
368 | " priceMin | \n",
369 | " priceMax | \n",
370 | " priceMod | \n",
371 | "
\n",
372 | " \n",
373 | " \n",
374 | " \n",
375 | " 0 | \n",
376 | " ABOHAR(PB) | \n",
377 | " January | \n",
378 | " 2005 | \n",
379 | " 2350 | \n",
380 | " 404 | \n",
381 | " 493 | \n",
382 | " 446 | \n",
383 | "
\n",
384 | " \n",
385 | " 1 | \n",
386 | " ABOHAR(PB) | \n",
387 | " January | \n",
388 | " 2006 | \n",
389 | " 900 | \n",
390 | " 487 | \n",
391 | " 638 | \n",
392 | " 563 | \n",
393 | "
\n",
394 | " \n",
395 | " 2 | \n",
396 | " ABOHAR(PB) | \n",
397 | " January | \n",
398 | " 2010 | \n",
399 | " 790 | \n",
400 | " 1283 | \n",
401 | " 1592 | \n",
402 | " 1460 | \n",
403 | "
\n",
404 | " \n",
405 | " 3 | \n",
406 | " ABOHAR(PB) | \n",
407 | " January | \n",
408 | " 2011 | \n",
409 | " 245 | \n",
410 | " 3067 | \n",
411 | " 3750 | \n",
412 | " 3433 | \n",
413 | "
\n",
414 | " \n",
415 | " 4 | \n",
416 | " ABOHAR(PB) | \n",
417 | " January | \n",
418 | " 2012 | \n",
419 | " 1035 | \n",
420 | " 523 | \n",
421 | " 686 | \n",
422 | " 605 | \n",
423 | "
\n",
424 | " \n",
425 | "
\n",
426 | "
"
427 | ],
428 | "text/plain": [
429 | " market month year quantity priceMin priceMax priceMod\n",
430 | "0 ABOHAR(PB) January 2005 2350 404 493 446\n",
431 | "1 ABOHAR(PB) January 2006 900 487 638 563\n",
432 | "2 ABOHAR(PB) January 2010 790 1283 1592 1460\n",
433 | "3 ABOHAR(PB) January 2011 245 3067 3750 3433\n",
434 | "4 ABOHAR(PB) January 2012 1035 523 686 605"
435 | ]
436 | },
437 | "execution_count": 8,
438 | "metadata": {},
439 | "output_type": "execute_result"
440 | }
441 | ],
442 | "source": [
443 | "df.head()"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": 56,
449 | "metadata": {
450 | "collapsed": false
451 | },
452 | "outputs": [
453 | {
454 | "data": {
455 | "text/html": [
456 | "\n",
457 | "
\n",
458 | " \n",
459 | " \n",
460 | " | \n",
461 | " market | \n",
462 | " month | \n",
463 | " year | \n",
464 | " quantity | \n",
465 | " priceMin | \n",
466 | " priceMax | \n",
467 | " priceMod | \n",
468 | "
\n",
469 | " \n",
470 | " \n",
471 | " \n",
472 | " 10222 | \n",
473 | " YEOLA(MS) | \n",
474 | " December | \n",
475 | " 2011 | \n",
476 | " 131326 | \n",
477 | " 282 | \n",
478 | " 612 | \n",
479 | " 526 | \n",
480 | "
\n",
481 | " \n",
482 | " 10223 | \n",
483 | " YEOLA(MS) | \n",
484 | " December | \n",
485 | " 2012 | \n",
486 | " 207066 | \n",
487 | " 485 | \n",
488 | " 1327 | \n",
489 | " 1136 | \n",
490 | "
\n",
491 | " \n",
492 | " 10224 | \n",
493 | " YEOLA(MS) | \n",
494 | " December | \n",
495 | " 2013 | \n",
496 | " 215883 | \n",
497 | " 472 | \n",
498 | " 1427 | \n",
499 | " 1177 | \n",
500 | "
\n",
501 | " \n",
502 | " 10225 | \n",
503 | " YEOLA(MS) | \n",
504 | " December | \n",
505 | " 2014 | \n",
506 | " 201077 | \n",
507 | " 446 | \n",
508 | " 1654 | \n",
509 | " 1456 | \n",
510 | "
\n",
511 | " \n",
512 | " 10226 | \n",
513 | " YEOLA(MS) | \n",
514 | " December | \n",
515 | " 2015 | \n",
516 | " 223315 | \n",
517 | " 609 | \n",
518 | " 1446 | \n",
519 | " 1126 | \n",
520 | "
\n",
521 | " \n",
522 | "
\n",
523 | "
"
524 | ],
525 | "text/plain": [
526 | " market month year quantity priceMin priceMax priceMod\n",
527 | "10222 YEOLA(MS) December 2011 131326 282 612 526\n",
528 | "10223 YEOLA(MS) December 2012 207066 485 1327 1136\n",
529 | "10224 YEOLA(MS) December 2013 215883 472 1427 1177\n",
530 | "10225 YEOLA(MS) December 2014 201077 446 1654 1456\n",
531 | "10226 YEOLA(MS) December 2015 223315 609 1446 1126"
532 | ]
533 | },
534 | "execution_count": 56,
535 | "metadata": {},
536 | "output_type": "execute_result"
537 | }
538 | ],
539 | "source": [
540 | "df.tail()"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": 57,
546 | "metadata": {
547 | "collapsed": false
548 | },
549 | "outputs": [
550 | {
551 | "data": {
552 | "text/plain": [
553 | "market object\n",
554 | "month object\n",
555 | "year object\n",
556 | "quantity int64\n",
557 | "priceMin object\n",
558 | "priceMax object\n",
559 | "priceMod object\n",
560 | "dtype: object"
561 | ]
562 | },
563 | "execution_count": 57,
564 | "metadata": {},
565 | "output_type": "execute_result"
566 | }
567 | ],
568 | "source": [
569 | "df.dtypes"
570 | ]
571 | },
572 | {
573 | "cell_type": "code",
574 | "execution_count": 58,
575 | "metadata": {
576 | "collapsed": false
577 | },
578 | "outputs": [
579 | {
580 | "data": {
581 | "text/html": [
582 | "\n",
583 | "
\n",
584 | " \n",
585 | " \n",
586 | " | \n",
587 | " priceMin | \n",
588 | " priceMax | \n",
589 | " priceMod | \n",
590 | "
\n",
591 | " \n",
592 | " \n",
593 | " \n",
594 | " 0 | \n",
595 | " 404 | \n",
596 | " 493 | \n",
597 | " 446 | \n",
598 | "
\n",
599 | " \n",
600 | " 1 | \n",
601 | " 487 | \n",
602 | " 638 | \n",
603 | " 563 | \n",
604 | "
\n",
605 | " \n",
606 | " 2 | \n",
607 | " 1283 | \n",
608 | " 1592 | \n",
609 | " 1460 | \n",
610 | "
\n",
611 | " \n",
612 | " 3 | \n",
613 | " 3067 | \n",
614 | " 3750 | \n",
615 | " 3433 | \n",
616 | "
\n",
617 | " \n",
618 | " 4 | \n",
619 | " 523 | \n",
620 | " 686 | \n",
621 | " 605 | \n",
622 | "
\n",
623 | " \n",
624 | "
\n",
625 | "
"
626 | ],
627 | "text/plain": [
628 | " priceMin priceMax priceMod\n",
629 | "0 404 493 446\n",
630 | "1 487 638 563\n",
631 | "2 1283 1592 1460\n",
632 | "3 3067 3750 3433\n",
633 | "4 523 686 605"
634 | ]
635 | },
636 | "execution_count": 58,
637 | "metadata": {},
638 | "output_type": "execute_result"
639 | }
640 | ],
641 | "source": [
642 | "df.iloc[:,4:7].head()"
643 | ]
644 | },
645 | {
646 | "cell_type": "code",
647 | "execution_count": 59,
648 | "metadata": {
649 | "collapsed": false
650 | },
651 | "outputs": [],
652 | "source": [
653 | "df.iloc[:,2:7] = df.iloc[:,2:7].astype(int)"
654 | ]
655 | },
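Converting columns by position works here, but it silently depends on column order. A sketch of the same conversion keyed on column names, which is a little more robust if columns are later reordered:

# Same dtype conversion, selecting columns by name instead of position
num_cols = ['year', 'quantity', 'priceMin', 'priceMax', 'priceMod']
df[num_cols] = df[num_cols].astype(int)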
656 | {
657 | "cell_type": "code",
658 | "execution_count": 60,
659 | "metadata": {
660 | "collapsed": false
661 | },
662 | "outputs": [
663 | {
664 | "data": {
665 | "text/plain": [
666 | "market object\n",
667 | "month object\n",
668 | "year int64\n",
669 | "quantity int64\n",
670 | "priceMin int64\n",
671 | "priceMax int64\n",
672 | "priceMod int64\n",
673 | "dtype: object"
674 | ]
675 | },
676 | "execution_count": 60,
677 | "metadata": {},
678 | "output_type": "execute_result"
679 | }
680 | ],
681 | "source": [
682 | "df.dtypes"
683 | ]
684 | },
685 | {
686 | "cell_type": "code",
687 | "execution_count": 61,
688 | "metadata": {
689 | "collapsed": false
690 | },
691 | "outputs": [
692 | {
693 | "data": {
694 | "text/html": [
695 | "\n",
696 | "
\n",
697 | " \n",
698 | " \n",
699 | " | \n",
700 | " market | \n",
701 | " month | \n",
702 | " year | \n",
703 | " quantity | \n",
704 | " priceMin | \n",
705 | " priceMax | \n",
706 | " priceMod | \n",
707 | "
\n",
708 | " \n",
709 | " \n",
710 | " \n",
711 | " 0 | \n",
712 | " ABOHAR(PB) | \n",
713 | " Jan | \n",
714 | " 2005 | \n",
715 | " 2350 | \n",
716 | " 404 | \n",
717 | " 493 | \n",
718 | " 446 | \n",
719 | "
\n",
720 | " \n",
721 | " 1 | \n",
722 | " ABOHAR(PB) | \n",
723 | " Jan | \n",
724 | " 2006 | \n",
725 | " 900 | \n",
726 | " 487 | \n",
727 | " 638 | \n",
728 | " 563 | \n",
729 | "
\n",
730 | " \n",
731 | " 2 | \n",
732 | " ABOHAR(PB) | \n",
733 | " Jan | \n",
734 | " 2010 | \n",
735 | " 790 | \n",
736 | " 1283 | \n",
737 | " 1592 | \n",
738 | " 1460 | \n",
739 | "
\n",
740 | " \n",
741 | " 3 | \n",
742 | " ABOHAR(PB) | \n",
743 | " Jan | \n",
744 | " 2011 | \n",
745 | " 245 | \n",
746 | " 3067 | \n",
747 | " 3750 | \n",
748 | " 3433 | \n",
749 | "
\n",
750 | " \n",
751 | " 4 | \n",
752 | " ABOHAR(PB) | \n",
753 | " Jan | \n",
754 | " 2012 | \n",
755 | " 1035 | \n",
756 | " 523 | \n",
757 | " 686 | \n",
758 | " 605 | \n",
759 | "
\n",
760 | " \n",
761 | "
\n",
762 | "
"
763 | ],
764 | "text/plain": [
765 | " market month year quantity priceMin priceMax priceMod\n",
766 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446\n",
767 | "1 ABOHAR(PB) Jan 2006 900 487 638 563\n",
768 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460\n",
769 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433\n",
770 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605"
771 | ]
772 | },
773 | "execution_count": 61,
774 | "metadata": {},
775 | "output_type": "execute_result"
776 | }
777 | ],
778 | "source": [
779 | "df.head()"
780 | ]
781 | },
782 | {
783 | "cell_type": "code",
784 | "execution_count": 62,
785 | "metadata": {
786 | "collapsed": false
787 | },
788 | "outputs": [
789 | {
790 | "data": {
791 | "text/html": [
792 | "\n",
793 | "
\n",
794 | " \n",
795 | " \n",
796 | " | \n",
797 | " year | \n",
798 | " quantity | \n",
799 | " priceMin | \n",
800 | " priceMax | \n",
801 | " priceMod | \n",
802 | "
\n",
803 | " \n",
804 | " \n",
805 | " \n",
806 | " count | \n",
807 | " 10227.000000 | \n",
808 | " 10227.000000 | \n",
809 | " 10227.000000 | \n",
810 | " 10227.000000 | \n",
811 | " 10227.000000 | \n",
812 | "
\n",
813 | " \n",
814 | " mean | \n",
815 | " 2009.022294 | \n",
816 | " 76604.880023 | \n",
817 | " 646.944363 | \n",
818 | " 1212.760731 | \n",
819 | " 984.284345 | \n",
820 | "
\n",
821 | " \n",
822 | " std | \n",
823 | " 4.372841 | \n",
824 | " 124408.698759 | \n",
825 | " 673.121850 | \n",
826 | " 979.658874 | \n",
827 | " 818.471498 | \n",
828 | "
\n",
829 | " \n",
830 | " min | \n",
831 | " 1996.000000 | \n",
832 | " 20.000000 | \n",
833 | " 16.000000 | \n",
834 | " 145.000000 | \n",
835 | " 80.000000 | \n",
836 | "
\n",
837 | " \n",
838 | " 25% | \n",
839 | " 2006.000000 | \n",
840 | " 8898.000000 | \n",
841 | " 209.000000 | \n",
842 | " 557.000000 | \n",
843 | " 448.000000 | \n",
844 | "
\n",
845 | " \n",
846 | " 50% | \n",
847 | " 2009.000000 | \n",
848 | " 27460.000000 | \n",
849 | " 440.000000 | \n",
850 | " 923.000000 | \n",
851 | " 747.000000 | \n",
852 | "
\n",
853 | " \n",
854 | " 75% | \n",
855 | " 2013.000000 | \n",
856 | " 88356.500000 | \n",
857 | " 828.000000 | \n",
858 | " 1527.000000 | \n",
859 | " 1248.000000 | \n",
860 | "
\n",
861 | " \n",
862 | " max | \n",
863 | " 2016.000000 | \n",
864 | " 1639032.000000 | \n",
865 | " 6000.000000 | \n",
866 | " 8192.000000 | \n",
867 | " 6400.000000 | \n",
868 | "
\n",
869 | " \n",
870 | "
\n",
871 | "
"
872 | ],
873 | "text/plain": [
874 | " year quantity priceMin priceMax priceMod\n",
875 | "count 10227.000000 10227.000000 10227.000000 10227.000000 10227.000000\n",
876 | "mean 2009.022294 76604.880023 646.944363 1212.760731 984.284345\n",
877 | "std 4.372841 124408.698759 673.121850 979.658874 818.471498\n",
878 | "min 1996.000000 20.000000 16.000000 145.000000 80.000000\n",
879 | "25% 2006.000000 8898.000000 209.000000 557.000000 448.000000\n",
880 | "50% 2009.000000 27460.000000 440.000000 923.000000 747.000000\n",
881 | "75% 2013.000000 88356.500000 828.000000 1527.000000 1248.000000\n",
882 | "max 2016.000000 1639032.000000 6000.000000 8192.000000 6400.000000"
883 | ]
884 | },
885 | "execution_count": 62,
886 | "metadata": {},
887 | "output_type": "execute_result"
888 | }
889 | ],
890 | "source": [
891 | "df.describe()"
892 | ]
893 | },
894 | {
895 | "cell_type": "markdown",
896 | "metadata": {},
897 | "source": [
898 | "## Extracting the states from market names"
899 | ]
900 | },
901 | {
902 | "cell_type": "code",
903 | "execution_count": 63,
904 | "metadata": {
905 | "collapsed": false
906 | },
907 | "outputs": [
908 | {
909 | "data": {
910 | "text/plain": [
911 | "LASALGAON(MS) 242\n",
912 | "PIMPALGAON(MS) 224\n",
913 | "MANMAD(MS) 218\n",
914 | "LONAND(MS) 211\n",
915 | "MAHUVA(GUJ) 210\n",
916 | "Name: market, dtype: int64"
917 | ]
918 | },
919 | "execution_count": 63,
920 | "metadata": {},
921 | "output_type": "execute_result"
922 | }
923 | ],
924 | "source": [
925 | "df.market.value_counts().head()"
926 | ]
927 | },
928 | {
929 | "cell_type": "code",
930 | "execution_count": 64,
931 | "metadata": {
932 | "collapsed": false
933 | },
934 | "outputs": [],
935 | "source": [
936 | "df['state'] = df.market.str.split('(').str[-1]"
937 | ]
938 | },
939 | {
940 | "cell_type": "code",
941 | "execution_count": 65,
942 | "metadata": {
943 | "collapsed": false
944 | },
945 | "outputs": [
946 | {
947 | "data": {
948 | "text/html": [
949 | "\n",
950 | "
\n",
951 | " \n",
952 | " \n",
953 | " | \n",
954 | " market | \n",
955 | " month | \n",
956 | " year | \n",
957 | " quantity | \n",
958 | " priceMin | \n",
959 | " priceMax | \n",
960 | " priceMod | \n",
961 | " state | \n",
962 | "
\n",
963 | " \n",
964 | " \n",
965 | " \n",
966 | " 0 | \n",
967 | " ABOHAR(PB) | \n",
968 | " Jan | \n",
969 | " 2005 | \n",
970 | " 2350 | \n",
971 | " 404 | \n",
972 | " 493 | \n",
973 | " 446 | \n",
974 | " PB) | \n",
975 | "
\n",
976 | " \n",
977 | " 1 | \n",
978 | " ABOHAR(PB) | \n",
979 | " Jan | \n",
980 | " 2006 | \n",
981 | " 900 | \n",
982 | " 487 | \n",
983 | " 638 | \n",
984 | " 563 | \n",
985 | " PB) | \n",
986 | "
\n",
987 | " \n",
988 | " 2 | \n",
989 | " ABOHAR(PB) | \n",
990 | " Jan | \n",
991 | " 2010 | \n",
992 | " 790 | \n",
993 | " 1283 | \n",
994 | " 1592 | \n",
995 | " 1460 | \n",
996 | " PB) | \n",
997 | "
\n",
998 | " \n",
999 | " 3 | \n",
1000 | " ABOHAR(PB) | \n",
1001 | " Jan | \n",
1002 | " 2011 | \n",
1003 | " 245 | \n",
1004 | " 3067 | \n",
1005 | " 3750 | \n",
1006 | " 3433 | \n",
1007 | " PB) | \n",
1008 | "
\n",
1009 | " \n",
1010 | " 4 | \n",
1011 | " ABOHAR(PB) | \n",
1012 | " Jan | \n",
1013 | " 2012 | \n",
1014 | " 1035 | \n",
1015 | " 523 | \n",
1016 | " 686 | \n",
1017 | " 605 | \n",
1018 | " PB) | \n",
1019 | "
\n",
1020 | " \n",
1021 | "
\n",
1022 | "
"
1023 | ],
1024 | "text/plain": [
1025 | " market month year quantity priceMin priceMax priceMod state\n",
1026 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB)\n",
1027 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB)\n",
1028 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB)\n",
1029 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB)\n",
1030 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB)"
1031 | ]
1032 | },
1033 | "execution_count": 65,
1034 | "metadata": {},
1035 | "output_type": "execute_result"
1036 | }
1037 | ],
1038 | "source": [
1039 | "df.head()"
1040 | ]
1041 | },
1042 | {
1043 | "cell_type": "code",
1044 | "execution_count": 66,
1045 | "metadata": {
1046 | "collapsed": true
1047 | },
1048 | "outputs": [],
1049 | "source": [
1050 | "df['city'] = df.market.str.split('(').str[0]"
1051 | ]
1052 | },
1053 | {
1054 | "cell_type": "code",
1055 | "execution_count": 67,
1056 | "metadata": {
1057 | "collapsed": false
1058 | },
1059 | "outputs": [
1060 | {
1061 | "data": {
1062 | "text/html": [
1063 | "\n",
1064 | "
\n",
1065 | " \n",
1066 | " \n",
1067 | " | \n",
1068 | " market | \n",
1069 | " month | \n",
1070 | " year | \n",
1071 | " quantity | \n",
1072 | " priceMin | \n",
1073 | " priceMax | \n",
1074 | " priceMod | \n",
1075 | " state | \n",
1076 | " city | \n",
1077 | "
\n",
1078 | " \n",
1079 | " \n",
1080 | " \n",
1081 | " 0 | \n",
1082 | " ABOHAR(PB) | \n",
1083 | " Jan | \n",
1084 | " 2005 | \n",
1085 | " 2350 | \n",
1086 | " 404 | \n",
1087 | " 493 | \n",
1088 | " 446 | \n",
1089 | " PB) | \n",
1090 | " ABOHAR | \n",
1091 | "
\n",
1092 | " \n",
1093 | " 1 | \n",
1094 | " ABOHAR(PB) | \n",
1095 | " Jan | \n",
1096 | " 2006 | \n",
1097 | " 900 | \n",
1098 | " 487 | \n",
1099 | " 638 | \n",
1100 | " 563 | \n",
1101 | " PB) | \n",
1102 | " ABOHAR | \n",
1103 | "
\n",
1104 | " \n",
1105 | " 2 | \n",
1106 | " ABOHAR(PB) | \n",
1107 | " Jan | \n",
1108 | " 2010 | \n",
1109 | " 790 | \n",
1110 | " 1283 | \n",
1111 | " 1592 | \n",
1112 | " 1460 | \n",
1113 | " PB) | \n",
1114 | " ABOHAR | \n",
1115 | "
\n",
1116 | " \n",
1117 | " 3 | \n",
1118 | " ABOHAR(PB) | \n",
1119 | " Jan | \n",
1120 | " 2011 | \n",
1121 | " 245 | \n",
1122 | " 3067 | \n",
1123 | " 3750 | \n",
1124 | " 3433 | \n",
1125 | " PB) | \n",
1126 | " ABOHAR | \n",
1127 | "
\n",
1128 | " \n",
1129 | " 4 | \n",
1130 | " ABOHAR(PB) | \n",
1131 | " Jan | \n",
1132 | " 2012 | \n",
1133 | " 1035 | \n",
1134 | " 523 | \n",
1135 | " 686 | \n",
1136 | " 605 | \n",
1137 | " PB) | \n",
1138 | " ABOHAR | \n",
1139 | "
\n",
1140 | " \n",
1141 | "
\n",
1142 | "
"
1143 | ],
1144 | "text/plain": [
1145 | " market month year quantity priceMin priceMax priceMod state \\\n",
1146 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB) \n",
1147 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB) \n",
1148 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB) \n",
1149 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB) \n",
1150 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB) \n",
1151 | "\n",
1152 | " city \n",
1153 | "0 ABOHAR \n",
1154 | "1 ABOHAR \n",
1155 | "2 ABOHAR \n",
1156 | "3 ABOHAR \n",
1157 | "4 ABOHAR "
1158 | ]
1159 | },
1160 | "execution_count": 67,
1161 | "metadata": {},
1162 | "output_type": "execute_result"
1163 | }
1164 | ],
1165 | "source": [
1166 | "df.head()"
1167 | ]
1168 | },
1169 | {
1170 | "cell_type": "code",
1171 | "execution_count": 68,
1172 | "metadata": {
1173 | "collapsed": false
1174 | },
1175 | "outputs": [
1176 | {
1177 | "data": {
1178 | "text/plain": [
1179 | "array(['PB)', 'UP)', 'GUJ)', 'MS)', 'RAJ)', 'BANGALORE', 'KNT)', 'BHOPAL',\n",
1180 | " 'OR)', 'BHR)', 'WB)', 'CHANDIGARH', 'CHENNAI', 'bellary)',\n",
1181 | " 'podisu)', 'UTT)', 'DELHI', 'MP)', 'TN)', 'Podis', 'GUWAHATI',\n",
1182 | " 'HYDERABAD', 'JAIPUR', 'WHITE)', 'JAMMU', 'HR)', 'KOLKATA', 'AP)',\n",
1183 | " 'LUCKNOW', 'MUMBAI', 'NAGPUR', 'KER)', 'PATNA', 'CHGARH)', 'JH)',\n",
1184 | " 'SHIMLA', 'SRINAGAR', 'TRIVENDRUM'], dtype=object)"
1185 | ]
1186 | },
1187 | "execution_count": 68,
1188 | "metadata": {},
1189 | "output_type": "execute_result"
1190 | }
1191 | ],
1192 | "source": [
1193 | "df.state.unique()"
1194 | ]
1195 | },
1196 | {
1197 | "cell_type": "code",
1198 | "execution_count": 69,
1199 | "metadata": {
1200 | "collapsed": false
1201 | },
1202 | "outputs": [],
1203 | "source": [
1204 | "df['state'] = df.state.str.split(')').str[0]"
1205 | ]
1206 | },
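The two splits above can also be collapsed into a single regular-expression extract. A sketch for the common MARKET(STATE) pattern; the handful of irregular names (e.g. COIMBATORE(TN) (bellary)) still need the manual mapping applied further down:

# One-pass alternative: take the code inside the first pair of brackets,
# and fall back to the full market name where there are no brackets (e.g. BANGALORE)
df['state'] = df.market.str.extract(r'\(([^()]*)\)', expand=False).fillna(df.market)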
1207 | {
1208 | "cell_type": "code",
1209 | "execution_count": 70,
1210 | "metadata": {
1211 | "collapsed": false
1212 | },
1213 | "outputs": [
1214 | {
1215 | "data": {
1216 | "text/plain": [
1217 | "array(['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'BANGALORE', 'KNT', 'BHOPAL', 'OR',\n",
1218 | " 'BHR', 'WB', 'CHANDIGARH', 'CHENNAI', 'bellary', 'podisu', 'UTT',\n",
1219 | " 'DELHI', 'MP', 'TN', 'Podis', 'GUWAHATI', 'HYDERABAD', 'JAIPUR',\n",
1220 | " 'WHITE', 'JAMMU', 'HR', 'KOLKATA', 'AP', 'LUCKNOW', 'MUMBAI',\n",
1221 | " 'NAGPUR', 'KER', 'PATNA', 'CHGARH', 'JH', 'SHIMLA', 'SRINAGAR',\n",
1222 | " 'TRIVENDRUM'], dtype=object)"
1223 | ]
1224 | },
1225 | "execution_count": 70,
1226 | "metadata": {},
1227 | "output_type": "execute_result"
1228 | }
1229 | ],
1230 | "source": [
1231 | "df.state.unique()"
1232 | ]
1233 | },
1234 | {
1235 | "cell_type": "code",
1236 | "execution_count": 71,
1237 | "metadata": {
1238 | "collapsed": false
1239 | },
1240 | "outputs": [],
1241 | "source": [
1242 | "dfState = df.groupby(['state', 'market'], as_index=False).count()"
1243 | ]
1244 | },
1245 | {
1246 | "cell_type": "code",
1247 | "execution_count": 72,
1248 | "metadata": {
1249 | "collapsed": false
1250 | },
1251 | "outputs": [
1252 | {
1253 | "data": {
1254 | "text/plain": [
1255 | "array(['KURNOOL(AP)', 'RAJAHMUNDRY(AP)', 'BANGALORE', 'BHOPAL',\n",
1256 | " 'BIHARSHARIF(BHR)', 'CHANDIGARH', 'CHENNAI', 'RAIPUR(CHGARH)',\n",
1257 | " 'DELHI', 'AHMEDABAD(GUJ)', 'BHAVNAGAR(GUJ)', 'DEESA(GUJ)',\n",
1258 | " 'GONDAL(GUJ)', 'JAMNAGAR(GUJ)', 'MAHUVA(GUJ)', 'RAJKOT(GUJ)',\n",
1259 | " 'SURAT(GUJ)', 'GUWAHATI', 'KARNAL(HR)', 'HYDERABAD', 'JAIPUR',\n",
1260 | " 'JAMMU', 'RANCHI(JH)', 'PALAYAM(KER)', 'BELGAUM(KNT)',\n",
1261 | " 'BIJAPUR(KNT)', 'CHALLAKERE(KNT)', 'CHICKBALLAPUR(KNT)',\n",
1262 | " 'DHAVANGERE(KNT)', 'HASSAN(KNT)', 'HUBLI(KNT)', 'KOLAR(KNT)',\n",
1263 | " 'RAICHUR(KNT)', 'KOLKATA', 'LUCKNOW', 'DEWAS(MP)', 'INDORE(MP)',\n",
1264 | " 'MANDSOUR(MP)', 'NEEMUCH(MP)', 'SAGAR(MP)', 'UJJAIN(MP)',\n",
1265 | " 'AHMEDNAGAR(MS)', 'BOMBORI(MS)', 'CHAKAN(MS)', 'CHANDVAD(MS)',\n",
1266 | " 'DEVALA(MS)', 'DHULIA(MS)', 'DINDORI(MS)', 'JALGAON(MS)',\n",
1267 | " 'JUNNAR(MS)', 'KALVAN(MS)', 'KOLHAPUR(MS)', 'KOPERGAON(MS)',\n",
1268 | " 'LASALGAON(MS)', 'LONAND(MS)', 'MALEGAON(MS)', 'MANMAD(MS)',\n",
1269 | " 'NANDGAON(MS)', 'NASIK(MS)', 'NEWASA(MS)', 'NIPHAD(MS)',\n",
1270 | " 'PHALTAN (MS)', 'PIMPALGAON(MS)', 'PUNE(MS)', 'RAHATA(MS)',\n",
1271 | " 'RAHURI(MS)', 'SAIKHEDA(MS)', 'SANGALI(MS)', 'SANGAMNER(MS)',\n",
1272 | " 'SATANA(MS)', 'SHRIRAMPUR(MS)', 'SINNAR(MS)', 'SOLAPUR(MS)',\n",
1273 | " 'SRIRAMPUR(MS)', 'VANI(MS)', 'YEOLA(MS)', 'MUMBAI', 'NAGPUR',\n",
1274 | " 'BHUBNESWER(OR)', 'PATNA', 'ABOHAR(PB)', 'AMRITSAR(PB)',\n",
1275 | " 'BHATINDA(PB)', 'HOSHIARPUR(PB)', 'JALANDHAR(PB)', 'KHANNA(PB)',\n",
1276 | " 'LUDHIANA(PB)', 'PATIALA(PB)', 'DINDIGUL(TN)(Podis', 'AJMER(RAJ)',\n",
1277 | " 'ALWAR(RAJ)', 'BIKANER(RAJ)', 'JODHPUR(RAJ)', 'KOTA(RAJ)',\n",
1278 | " 'SRIGANGANAGAR(RAJ)', 'UDAIPUR(RAJ)', 'SHIMLA', 'SRINAGAR',\n",
1279 | " 'DINDIGUL(TN)', 'MADURAI(TN)', 'TRIVENDRUM', 'AGRA(UP)',\n",
1280 | " 'ALIGARH(UP)', 'BALLIA(UP)', 'BAREILLY(UP)', 'DEORIA(UP)',\n",
1281 | " 'ETAWAH(UP)', 'GORAKHPUR(UP)', 'KANPUR(UP)', 'MEERUT(UP)',\n",
1282 | " 'VARANASI(UP)', 'DEHRADOON(UTT)', 'HALDWANI(UTT)', 'BURDWAN(WB)',\n",
1283 | " 'MIDNAPUR(WB)', 'PURULIA(WB)', 'SHEROAPHULY(WB)', 'JALGAON(WHITE)',\n",
1284 | " 'COIMBATORE(TN) (bellary)', 'COIMBATORE(TN) (podisu)'], dtype=object)"
1285 | ]
1286 | },
1287 | "execution_count": 72,
1288 | "metadata": {},
1289 | "output_type": "execute_result"
1290 | }
1291 | ],
1292 | "source": [
1293 | "dfState.market.unique()"
1294 | ]
1295 | },
1296 | {
1297 | "cell_type": "code",
1298 | "execution_count": 73,
1299 | "metadata": {
1300 | "collapsed": false
1301 | },
1302 | "outputs": [],
1303 | "source": [
1304 | "state_now = ['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'BANGALORE', 'KNT', 'BHOPAL', 'OR',\n",
1305 | " 'BHR', 'WB', 'CHANDIGARH', 'CHENNAI', 'bellary', 'podisu', 'UTT',\n",
1306 | " 'DELHI', 'MP', 'TN', 'Podis', 'GUWAHATI', 'HYDERABAD', 'JAIPUR',\n",
1307 | " 'WHITE', 'JAMMU', 'HR', 'KOLKATA', 'AP', 'LUCKNOW', 'MUMBAI',\n",
1308 | " 'NAGPUR', 'KER', 'PATNA', 'CHGARH', 'JH', 'SHIMLA', 'SRINAGAR',\n",
1309 | " 'TRIVENDRUM']"
1310 | ]
1311 | },
1312 | {
1313 | "cell_type": "code",
1314 | "execution_count": 74,
1315 | "metadata": {
1316 | "collapsed": false
1317 | },
1318 | "outputs": [],
1319 | "source": [
1320 | "state_new =['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'KNT', 'KNT', 'MP', 'OR',\n",
1321 | " 'BHR', 'WB', 'CH', 'TN', 'KNT', 'TN', 'UP',\n",
1322 | " 'DEL', 'MP', 'TN', 'TN', 'ASM', 'AP', 'RAJ',\n",
1323 | " 'MS', 'JK', 'HR', 'WB', 'AP', 'UP', 'MS',\n",
1324 | " 'MS', 'KER', 'BHR', 'HR', 'JH', 'HP', 'JK',\n",
1325 | " 'KEL']"
1326 | ]
1327 | },
1328 | {
1329 | "cell_type": "code",
1330 | "execution_count": 75,
1331 | "metadata": {
1332 | "collapsed": false
1333 | },
1334 | "outputs": [],
1335 | "source": [
1336 | "df.state = df.state.replace(state_now, state_new)"
1337 | ]
1338 | },
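  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`replace(state_now, state_new)` matches the two lists position by position: every occurrence of `state_now[i]` is rewritten as `state_new[i]`. The same mapping can be spelled out as a dict, which makes individual entries easier to spot-check; the cell below is a sketch, not part of the original flow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Sketch: the positional lists expressed as an explicit mapping\n",
    "state_map = dict(zip(state_now, state_new))\n",
    "state_map['BHOPAL'], state_map['GUWAHATI']"
   ]
  },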
1339 | {
1340 | "cell_type": "code",
1341 | "execution_count": 76,
1342 | "metadata": {
1343 | "collapsed": false
1344 | },
1345 | "outputs": [
1346 | {
1347 | "data": {
1348 | "text/plain": [
1349 | "array(['PB', 'UP', 'GUJ', 'MS', 'RAJ', 'KNT', 'MP', 'OR', 'BHR', 'WB',\n",
1350 | " 'CH', 'TN', 'DEL', 'ASM', 'AP', 'JK', 'HR', 'KER', 'JH', 'HP', 'KEL'], dtype=object)"
1351 | ]
1352 | },
1353 | "execution_count": 76,
1354 | "metadata": {},
1355 | "output_type": "execute_result"
1356 | }
1357 | ],
1358 | "source": [
1359 | "df.state.unique()"
1360 | ]
1361 | },
1362 | {
1363 | "cell_type": "markdown",
1364 | "metadata": {},
1365 | "source": [
1366 | "## Getting the Dates"
1367 | ]
1368 | },
1369 | {
1370 | "cell_type": "code",
1371 | "execution_count": 77,
1372 | "metadata": {
1373 | "collapsed": false
1374 | },
1375 | "outputs": [
1376 | {
1377 | "data": {
1378 | "text/html": [
1379 | "\n",
1380 | "
\n",
1381 | " \n",
1382 | " \n",
1383 | " | \n",
1384 | " market | \n",
1385 | " month | \n",
1386 | " year | \n",
1387 | " quantity | \n",
1388 | " priceMin | \n",
1389 | " priceMax | \n",
1390 | " priceMod | \n",
1391 | " state | \n",
1392 | " city | \n",
1393 | "
\n",
1394 | " \n",
1395 | " \n",
1396 | " \n",
1397 | " 0 | \n",
1398 | " ABOHAR(PB) | \n",
1399 | " Jan | \n",
1400 | " 2005 | \n",
1401 | " 2350 | \n",
1402 | " 404 | \n",
1403 | " 493 | \n",
1404 | " 446 | \n",
1405 | " PB | \n",
1406 | " ABOHAR | \n",
1407 | "
\n",
1408 | " \n",
1409 | " 1 | \n",
1410 | " ABOHAR(PB) | \n",
1411 | " Jan | \n",
1412 | " 2006 | \n",
1413 | " 900 | \n",
1414 | " 487 | \n",
1415 | " 638 | \n",
1416 | " 563 | \n",
1417 | " PB | \n",
1418 | " ABOHAR | \n",
1419 | "
\n",
1420 | " \n",
1421 | " 2 | \n",
1422 | " ABOHAR(PB) | \n",
1423 | " Jan | \n",
1424 | " 2010 | \n",
1425 | " 790 | \n",
1426 | " 1283 | \n",
1427 | " 1592 | \n",
1428 | " 1460 | \n",
1429 | " PB | \n",
1430 | " ABOHAR | \n",
1431 | "
\n",
1432 | " \n",
1433 | " 3 | \n",
1434 | " ABOHAR(PB) | \n",
1435 | " Jan | \n",
1436 | " 2011 | \n",
1437 | " 245 | \n",
1438 | " 3067 | \n",
1439 | " 3750 | \n",
1440 | " 3433 | \n",
1441 | " PB | \n",
1442 | " ABOHAR | \n",
1443 | "
\n",
1444 | " \n",
1445 | " 4 | \n",
1446 | " ABOHAR(PB) | \n",
1447 | " Jan | \n",
1448 | " 2012 | \n",
1449 | " 1035 | \n",
1450 | " 523 | \n",
1451 | " 686 | \n",
1452 | " 605 | \n",
1453 | " PB | \n",
1454 | " ABOHAR | \n",
1455 | "
\n",
1456 | " \n",
1457 | "
\n",
1458 | "
"
1459 | ],
1460 | "text/plain": [
1461 | " market month year quantity priceMin priceMax priceMod state \\\n",
1462 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n",
1463 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n",
1464 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n",
1465 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n",
1466 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n",
1467 | "\n",
1468 | " city \n",
1469 | "0 ABOHAR \n",
1470 | "1 ABOHAR \n",
1471 | "2 ABOHAR \n",
1472 | "3 ABOHAR \n",
1473 | "4 ABOHAR "
1474 | ]
1475 | },
1476 | "execution_count": 77,
1477 | "metadata": {},
1478 | "output_type": "execute_result"
1479 | }
1480 | ],
1481 | "source": [
1482 | "df.head()"
1483 | ]
1484 | },
1485 | {
1486 | "cell_type": "code",
1487 | "execution_count": 78,
1488 | "metadata": {
1489 | "collapsed": false
1490 | },
1491 | "outputs": [
1492 | {
1493 | "data": {
1494 | "text/plain": [
1495 | "Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8,\n",
1496 | " 9,\n",
1497 | " ...\n",
1498 | " 10217, 10218, 10219, 10220, 10221, 10222, 10223, 10224, 10225,\n",
1499 | " 10226],\n",
1500 | " dtype='int64', length=10227)"
1501 | ]
1502 | },
1503 | "execution_count": 78,
1504 | "metadata": {},
1505 | "output_type": "execute_result"
1506 | }
1507 | ],
1508 | "source": [
1509 | "df.index"
1510 | ]
1511 | },
1512 | {
1513 | "cell_type": "code",
1514 | "execution_count": 79,
1515 | "metadata": {
1516 | "collapsed": false
1517 | },
1518 | "outputs": [
1519 | {
1520 | "data": {
1521 | "text/plain": [
1522 | "Timestamp('2012-01-01 00:00:00')"
1523 | ]
1524 | },
1525 | "execution_count": 79,
1526 | "metadata": {},
1527 | "output_type": "execute_result"
1528 | }
1529 | ],
1530 | "source": [
1531 | "pd.to_datetime('January 2012')"
1532 | ]
1533 | },
1534 | {
1535 | "cell_type": "code",
1536 | "execution_count": 80,
1537 | "metadata": {
1538 | "collapsed": false
1539 | },
1540 | "outputs": [],
1541 | "source": [
1542 | "df['date'] = df['month'] + '-' + df['year'].map(str)"
1543 | ]
1544 | },
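  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`year` is read as an integer column, so `.map(str)` casts each value to a string before it is concatenated with the month abbreviation, producing strings such as `'Jan-2005'`. `.astype(str)` is an equivalent spelling, sketched below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Sketch: same concatenation using astype(str) instead of map(str)\n",
    "(df['month'] + '-' + df['year'].astype(str)).head()"
   ]
  },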
1545 | {
1546 | "cell_type": "code",
1547 | "execution_count": 82,
1548 | "metadata": {
1549 | "collapsed": true
1550 | },
1551 | "outputs": [],
1552 | "source": [
1553 | "??map"
1554 | ]
1555 | },
1556 | {
1557 | "cell_type": "code",
1558 | "execution_count": 81,
1559 | "metadata": {
1560 | "collapsed": false,
1561 | "scrolled": true
1562 | },
1563 | "outputs": [
1564 | {
1565 | "data": {
1566 | "text/html": [
1567 | "\n",
1568 | "
\n",
1569 | " \n",
1570 | " \n",
1571 | " | \n",
1572 | " market | \n",
1573 | " month | \n",
1574 | " year | \n",
1575 | " quantity | \n",
1576 | " priceMin | \n",
1577 | " priceMax | \n",
1578 | " priceMod | \n",
1579 | " state | \n",
1580 | " city | \n",
1581 | " date | \n",
1582 | "
\n",
1583 | " \n",
1584 | " \n",
1585 | " \n",
1586 | " 0 | \n",
1587 | " ABOHAR(PB) | \n",
1588 | " Jan | \n",
1589 | " 2005 | \n",
1590 | " 2350 | \n",
1591 | " 404 | \n",
1592 | " 493 | \n",
1593 | " 446 | \n",
1594 | " PB | \n",
1595 | " ABOHAR | \n",
1596 | " Jan-2005 | \n",
1597 | "
\n",
1598 | " \n",
1599 | " 1 | \n",
1600 | " ABOHAR(PB) | \n",
1601 | " Jan | \n",
1602 | " 2006 | \n",
1603 | " 900 | \n",
1604 | " 487 | \n",
1605 | " 638 | \n",
1606 | " 563 | \n",
1607 | " PB | \n",
1608 | " ABOHAR | \n",
1609 | " Jan-2006 | \n",
1610 | "
\n",
1611 | " \n",
1612 | " 2 | \n",
1613 | " ABOHAR(PB) | \n",
1614 | " Jan | \n",
1615 | " 2010 | \n",
1616 | " 790 | \n",
1617 | " 1283 | \n",
1618 | " 1592 | \n",
1619 | " 1460 | \n",
1620 | " PB | \n",
1621 | " ABOHAR | \n",
1622 | " Jan-2010 | \n",
1623 | "
\n",
1624 | " \n",
1625 | " 3 | \n",
1626 | " ABOHAR(PB) | \n",
1627 | " Jan | \n",
1628 | " 2011 | \n",
1629 | " 245 | \n",
1630 | " 3067 | \n",
1631 | " 3750 | \n",
1632 | " 3433 | \n",
1633 | " PB | \n",
1634 | " ABOHAR | \n",
1635 | " Jan-2011 | \n",
1636 | "
\n",
1637 | " \n",
1638 | " 4 | \n",
1639 | " ABOHAR(PB) | \n",
1640 | " Jan | \n",
1641 | " 2012 | \n",
1642 | " 1035 | \n",
1643 | " 523 | \n",
1644 | " 686 | \n",
1645 | " 605 | \n",
1646 | " PB | \n",
1647 | " ABOHAR | \n",
1648 | " Jan-2012 | \n",
1649 | "
\n",
1650 | " \n",
1651 | "
\n",
1652 | "
"
1653 | ],
1654 | "text/plain": [
1655 | " market month year quantity priceMin priceMax priceMod state \\\n",
1656 | "0 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n",
1657 | "1 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n",
1658 | "2 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n",
1659 | "3 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n",
1660 | "4 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n",
1661 | "\n",
1662 | " city date \n",
1663 | "0 ABOHAR Jan-2005 \n",
1664 | "1 ABOHAR Jan-2006 \n",
1665 | "2 ABOHAR Jan-2010 \n",
1666 | "3 ABOHAR Jan-2011 \n",
1667 | "4 ABOHAR Jan-2012 "
1668 | ]
1669 | },
1670 | "execution_count": 81,
1671 | "metadata": {},
1672 | "output_type": "execute_result"
1673 | }
1674 | ],
1675 | "source": [
1676 | "df.head()"
1677 | ]
1678 | },
1679 | {
1680 | "cell_type": "code",
1681 | "execution_count": 85,
1682 | "metadata": {
1683 | "collapsed": false
1684 | },
1685 | "outputs": [],
1686 | "source": [
1687 | "index = pd.to_datetime(df.date)"
1688 | ]
1689 | },
1690 | {
1691 | "cell_type": "code",
1692 | "execution_count": 86,
1693 | "metadata": {
1694 | "collapsed": false
1695 | },
1696 | "outputs": [],
1697 | "source": [
1698 | "df.index = pd.PeriodIndex(df.date, freq='M')"
1699 | ]
1700 | },
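  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pd.PeriodIndex(df.date, freq='M')` turns each `'Jan-2005'` string into the monthly period `2005-01`, keeping the month-level granularity of the data explicit. The alternative is a timestamp index (as computed into `index` above but never assigned), which pins every month to its first day; a sketch follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Alternative sketch: timestamps instead of monthly periods\n",
    "pd.to_datetime(df.date).head()"
   ]
  },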
1701 | {
1702 | "cell_type": "code",
1703 | "execution_count": null,
1704 | "metadata": {
1705 | "collapsed": false
1706 | },
1707 | "outputs": [],
1708 | "source": [
1709 | "df.columns"
1710 | ]
1711 | },
1712 | {
1713 | "cell_type": "code",
1714 | "execution_count": 87,
1715 | "metadata": {
1716 | "collapsed": false
1717 | },
1718 | "outputs": [
1719 | {
1720 | "data": {
1721 | "text/plain": [
1722 | "PeriodIndex(['2005-01', '2006-01', '2010-01', '2011-01', '2012-01', '2013-01',\n",
1723 | " '2014-01', '2015-01', '2005-02', '2006-02',\n",
1724 | " ...\n",
1725 | " '2006-12', '2007-12', '2008-12', '2009-12', '2010-12', '2011-12',\n",
1726 | " '2012-12', '2013-12', '2014-12', '2015-12'],\n",
1727 | " dtype='int64', length=10227, freq='M')"
1728 | ]
1729 | },
1730 | "execution_count": 87,
1731 | "metadata": {},
1732 | "output_type": "execute_result"
1733 | }
1734 | ],
1735 | "source": [
1736 | "df.index"
1737 | ]
1738 | },
1739 | {
1740 | "cell_type": "code",
1741 | "execution_count": 88,
1742 | "metadata": {
1743 | "collapsed": false
1744 | },
1745 | "outputs": [
1746 | {
1747 | "data": {
1748 | "text/html": [
1749 | "\n",
1750 | "
\n",
1751 | " \n",
1752 | " \n",
1753 | " | \n",
1754 | " market | \n",
1755 | " month | \n",
1756 | " year | \n",
1757 | " quantity | \n",
1758 | " priceMin | \n",
1759 | " priceMax | \n",
1760 | " priceMod | \n",
1761 | " state | \n",
1762 | " city | \n",
1763 | " date | \n",
1764 | "
\n",
1765 | " \n",
1766 | " \n",
1767 | " \n",
1768 | " 2005-01 | \n",
1769 | " ABOHAR(PB) | \n",
1770 | " Jan | \n",
1771 | " 2005 | \n",
1772 | " 2350 | \n",
1773 | " 404 | \n",
1774 | " 493 | \n",
1775 | " 446 | \n",
1776 | " PB | \n",
1777 | " ABOHAR | \n",
1778 | " Jan-2005 | \n",
1779 | "
\n",
1780 | " \n",
1781 | " 2006-01 | \n",
1782 | " ABOHAR(PB) | \n",
1783 | " Jan | \n",
1784 | " 2006 | \n",
1785 | " 900 | \n",
1786 | " 487 | \n",
1787 | " 638 | \n",
1788 | " 563 | \n",
1789 | " PB | \n",
1790 | " ABOHAR | \n",
1791 | " Jan-2006 | \n",
1792 | "
\n",
1793 | " \n",
1794 | " 2010-01 | \n",
1795 | " ABOHAR(PB) | \n",
1796 | " Jan | \n",
1797 | " 2010 | \n",
1798 | " 790 | \n",
1799 | " 1283 | \n",
1800 | " 1592 | \n",
1801 | " 1460 | \n",
1802 | " PB | \n",
1803 | " ABOHAR | \n",
1804 | " Jan-2010 | \n",
1805 | "
\n",
1806 | " \n",
1807 | " 2011-01 | \n",
1808 | " ABOHAR(PB) | \n",
1809 | " Jan | \n",
1810 | " 2011 | \n",
1811 | " 245 | \n",
1812 | " 3067 | \n",
1813 | " 3750 | \n",
1814 | " 3433 | \n",
1815 | " PB | \n",
1816 | " ABOHAR | \n",
1817 | " Jan-2011 | \n",
1818 | "
\n",
1819 | " \n",
1820 | " 2012-01 | \n",
1821 | " ABOHAR(PB) | \n",
1822 | " Jan | \n",
1823 | " 2012 | \n",
1824 | " 1035 | \n",
1825 | " 523 | \n",
1826 | " 686 | \n",
1827 | " 605 | \n",
1828 | " PB | \n",
1829 | " ABOHAR | \n",
1830 | " Jan-2012 | \n",
1831 | "
\n",
1832 | " \n",
1833 | "
\n",
1834 | "
"
1835 | ],
1836 | "text/plain": [
1837 | " market month year quantity priceMin priceMax priceMod state \\\n",
1838 | "2005-01 ABOHAR(PB) Jan 2005 2350 404 493 446 PB \n",
1839 | "2006-01 ABOHAR(PB) Jan 2006 900 487 638 563 PB \n",
1840 | "2010-01 ABOHAR(PB) Jan 2010 790 1283 1592 1460 PB \n",
1841 | "2011-01 ABOHAR(PB) Jan 2011 245 3067 3750 3433 PB \n",
1842 | "2012-01 ABOHAR(PB) Jan 2012 1035 523 686 605 PB \n",
1843 | "\n",
1844 | " city date \n",
1845 | "2005-01 ABOHAR Jan-2005 \n",
1846 | "2006-01 ABOHAR Jan-2006 \n",
1847 | "2010-01 ABOHAR Jan-2010 \n",
1848 | "2011-01 ABOHAR Jan-2011 \n",
1849 | "2012-01 ABOHAR Jan-2012 "
1850 | ]
1851 | },
1852 | "execution_count": 88,
1853 | "metadata": {},
1854 | "output_type": "execute_result"
1855 | }
1856 | ],
1857 | "source": [
1858 | "df.head()"
1859 | ]
1860 | },
1861 | {
1862 | "cell_type": "code",
1863 | "execution_count": null,
1864 | "metadata": {
1865 | "collapsed": true
1866 | },
1867 | "outputs": [],
1868 | "source": [
1869 | "df.to_csv('MonthWiseMarketArrivals_Clean.csv', index = False)"
1870 | ]
1871 | }
1872 | ],
1873 | "metadata": {
1874 | "kernelspec": {
1875 | "display_name": "Python 3",
1876 | "language": "python",
1877 | "name": "python3"
1878 | },
1879 | "language_info": {
1880 | "codemirror_mode": {
1881 | "name": "ipython",
1882 | "version": 3
1883 | },
1884 | "file_extension": ".py",
1885 | "mimetype": "text/x-python",
1886 | "name": "python",
1887 | "nbconvert_exporter": "python",
1888 | "pygments_lexer": "ipython3",
1889 | "version": "3.5.1"
1890 | }
1891 | },
1892 | "nbformat": 4,
1893 | "nbformat_minor": 0
1894 | }
1895 |
--------------------------------------------------------------------------------
/time_series/city_geocode.csv:
--------------------------------------------------------------------------------
1 | city,lon,lat
2 | GUWAHATI,91.7362365,26.1445169
3 | KOLKATA,88.363895,22.572646
4 | SRIRAMPUR,88.3385053,23.4033393
5 | SHEROAPHULY,88.3215014,22.7690032
6 | BURDWAN,87.8614793,23.2324214
7 | MIDNAPUR,87.3214908,22.4308892
8 | PURULIA,86.365208,23.3320779
9 | DHULIA,86.0618818,22.0347727
10 | BHUBNESWER,85.8245398,20.2960587
11 | BIHARSHARIF,85.5148735,25.1982147
12 | RANCHI,85.309562,23.3440997
13 | PATNA,85.1375645,25.5940947
14 | BALLIA,84.1487319,25.7584381
15 | DEORIA,83.7838214,26.4862373
16 | GORAKHPUR,83.3731675,26.7605545
17 | VARANASI,82.9739144,25.3176452
18 | RAJAHMUNDRY,81.8040345,17.0005383
19 | RAIPUR,81.6296413,21.2513844
20 | DINDORI,81.0768455,22.9417931
21 | LUCKNOW,80.946166,26.8466937
22 | KANPUR,80.3318736,26.449923
23 | CHENNAI,80.2707184,13.0826802
24 | HALDWANI,79.5129767,29.2182644
25 | BAREILLY,79.4304381,28.3670355
26 | NAGPUR,79.0881546,21.1458004
27 | ETAWAH,79.0046898,26.8117116
28 | SAGAR,78.7378068,23.838805
29 | SAIKHEDA,78.5831181,22.962215
30 | HYDERABAD,78.486671,17.385044
31 | KOLAR,78.1325611,13.1357446
32 | MADURAI,78.1197754,9.9252007
33 | ALIGARH,78.0880129,27.8973944
34 | KURNOOL,78.0372792,15.8281257
35 | DEHRADOON,78.0321918,30.3164945
36 | AGRA,78.0080745,27.1766701
37 | DINDIGUL,77.9802906,10.3673123
38 | CHICKBALLAPUR,77.7280396,13.432366
39 | MEERUT,77.7064137,28.9844618
40 | BANGALORE,77.5945627,12.9715987
41 | BHOPAL,77.412615,23.2599333
42 | RAICHUR,77.3439283,16.2120031
43 | DELHI,77.2090212,28.6139391
44 | SHIMLA,77.1734033,31.1048145
45 | KARNAL,76.9904825,29.6856929
46 | COIMBATORE,76.9558321,11.0168445
47 | PALAYAM,76.9513432,8.5027684
48 | TRIVENDRUM,76.9366376,8.5241391
49 | CHANDIGARH,76.7794179,30.7333148
50 | CHALLAKERE,76.6528225,14.313395
51 | ALWAR,76.6345735,27.5529907
52 | PATIALA,76.3868797,30.3397809
53 | DEVALA,76.3820088,11.4725502
54 | KHANNA,76.2112286,30.697852
55 | HASSAN,76.0995519,13.0068142
56 | DEWAS,76.0507949,22.9622672
57 | DHAVANGERE,75.9238397,14.4663438
58 | HOSHIARPUR,75.911483,31.5143178
59 | SOLAPUR,75.9063906,17.6599188
60 | KOTA,75.8647527,25.2138156
61 | INDORE,75.8577258,22.7195687
62 | LUDHIANA,75.8572758,30.900965
63 | JAIPUR,75.7872709,26.9124336
64 | UJJAIN,75.7849097,23.1793013
65 | BIJAPUR,75.710031,16.8301708
66 | JALANDHAR,75.5761829,31.3260152
67 | JALGAON,75.5626039,21.0076578
68 | HUBLI,75.1239547,15.3647083
69 | MANDSOUR,75.0692952,24.076836
70 | BHATINDA,74.9454745,30.210994
71 | SRINAGAR,74.9442585,34.1255413
72 | NEWASA,74.9281063,19.5511772
73 | AMRITSAR,74.8722642,31.6339793
74 | NEEMUCH,74.8624092,24.4763852
75 | JAMMU,74.8576539,32.7217819
76 | AHMEDNAGAR,74.7495916,19.0952075
77 | SHRIRAMPUR,74.6576091,19.6222323
78 | RAHURI,74.6488264,19.392678
79 | AJMER,74.6399163,26.4498954
80 | SANGALI,74.5814773,16.8523973
81 | MALEGAON,74.5100291,20.5547497
82 | BELGAUM,74.4976741,15.8496953
83 | RAHATA,74.483335,19.7127021
84 | YEOLA,74.4818698,20.0471229
85 | KOPERGAON,74.4790898,19.8916791
86 | MANMAD,74.4366016,20.2511789
87 | PHALTAN ,74.4360424,17.9844507
88 | CHANDVAD,74.2472779,20.3271277
89 | KOLHAPUR,74.2432527,16.7049873
90 | LASALGAON,74.2326058,20.1491422
91 | SANGAMNER,74.2079648,19.5771387
92 | SATANA,74.2032581,20.598224
93 | ABOHAR,74.1993043,30.1452928
94 | LONAND,74.1861821,18.041706
95 | NIPHAD,74.1093141,20.0799646
96 | SINNAR,74.0006328,19.8530593
97 | PIMPALGAON,73.9873787,20.1699678
98 | SRIGANGANAGAR,73.8771901,29.9038399
99 | JUNNAR,73.87425,19.2031842
100 | CHAKAN,73.8630346,18.7602664
101 | PUNE,73.8567437,18.5204303
102 | NASIK,73.7898023,19.9974533
103 | UDAIPUR,73.712479,24.585445
104 | BIKANER,73.3119159,28.0229348
105 | JODHPUR,73.0243094,26.2389469
106 | NANDGAON,72.9276008,18.3855337
107 | MUMBAI,72.8776559,19.0759837
108 | SURAT,72.8310607,21.1702401
109 | AHMEDABAD,72.5713621,23.022505
110 | DEESA,72.1906721,24.2585031
111 | BHAVNAGAR,72.1519304,21.7644725
112 | MAHUVA,71.7563169,21.0902193
113 | RAJKOT,70.8021599,22.3038945
114 | GONDAL,70.792297,21.9619463
115 | JAMNAGAR,70.05773,22.4707019
116 | KALVAN,73.13054,19.24033
117 | VANI,73.89189,20.33749
118 | BOMBORI,72.87766,19.07598
--------------------------------------------------------------------------------
/time_series/img/Cov_nonstationary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Cov_nonstationary.png
--------------------------------------------------------------------------------
/time_series/img/Mean_nonstationary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Mean_nonstationary.png
--------------------------------------------------------------------------------
/time_series/img/Var_nonstationary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/Var_nonstationary.png
--------------------------------------------------------------------------------
/time_series/img/left_merge.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/left_merge.png
--------------------------------------------------------------------------------
/time_series/img/onion_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/onion_small.png
--------------------------------------------------------------------------------
/time_series/img/onion_tables.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/onion_tables.png
--------------------------------------------------------------------------------
/time_series/img/peeling_the_onion_small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/peeling_the_onion_small.png
--------------------------------------------------------------------------------
/time_series/img/pivot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/pivot.png
--------------------------------------------------------------------------------
/time_series/img/splitapplycombine.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/splitapplycombine.png
--------------------------------------------------------------------------------
/time_series/img/subsetcolumns.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/subsetcolumns.png
--------------------------------------------------------------------------------
/time_series/img/subsetrows.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/TimeSeriesAnalysiswithPython/1230a50d8fe619c5e150f4ed4bbcda0612ef8ef6/time_series/img/subsetrows.png
--------------------------------------------------------------------------------
/time_series/state_geocode.csv:
--------------------------------------------------------------------------------
1 | "state","name","lon","lat"
2 | "MS","Maharashtra",75.7138884,19.7514798
3 | "GUJ","Gujarat",71.1923805,22.258652
4 | "MP","Madhya pradesh",78.6568942,22.9734229
5 | "TN","Tamil Nadu",78.6568942,11.1271225
6 | "KNT","Karnataka",75.7138884,15.3172775
7 | "DEL","Delhi",77.2090212,28.6139391
8 | "HR","Haryana",76.085601,29.0587757
9 | "RAJ","Rajasthan",74.2179326,27.0238036
10 | "AP","Andhra Pradesh",79.7399875,15.9128998
11 | "UP","Uttar Pradesh",80.9461592,26.8467088
12 | "JK","Jammu & Kashmir",74.8576539,32.7217819
13 | "BHR","Bihar",85.3131194,25.0960742
14 | "WB","West Bengal",87.8549755,22.9867569
15 | "HP","Himachal Pradesh",77.1733901,31.1048294
16 | "ASM","Assam",92.9375739,26.2006043
17 | "KEL","Kerala",76.2710833,10.8505159
18 | "JH","Jharkhand",85.2799354,23.6101808
19 | "OR","Orissa",85.0985236,20.9516658
20 | "PB","Punjab",75.3412179,31.1471305
21 | "KER","Kerala",76.2710833,10.8505159
22 | "CH","Chandigarh",76.7794179,30.7333148
23 |
--------------------------------------------------------------------------------