├── README.md
├── .gitignore
└── CityBudgetExtractor.ipynb


/README.md:
--------------------------------------------------------------------------------
1 | # city_budget_explorer
2 | A repo to explore city budgets
3 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | pip-wheel-metadata/
 24 | share/python-wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | *.py,cover
 51 | .hypothesis/
 52 | .pytest_cache/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | target/
 76 | 
 77 | # Jupyter Notebook
 78 | .ipynb_checkpoints
 79 | 
 80 | # IPython
 81 | profile_default/
 82 | ipython_config.py
 83 | 
 84 | # pyenv
 85 | .python-version
 86 | 
 87 | # pipenv
 88 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 89 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 90 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 91 | #   install all needed dependencies.
 92 | #Pipfile.lock
 93 | 
 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 95 | __pypackages__/
 96 | 
 97 | # Celery stuff
 98 | celerybeat-schedule
 99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 


--------------------------------------------------------------------------------
/CityBudgetExtractor.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# City Budget Extractor"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "code",
 12 |    "execution_count": 5,
 13 |    "metadata": {},
 14 |    "outputs": [],
 15 |    "source": [
 16 |     "import PyPDF2\n",
 17 |     "import pandas as pd\n",
 18 |     "from pprint import pprint"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "After struggling for 20 minutes or so, I realized that the page numbering in the PDF document doesn't match what PyPDF2 pulls: the budget PDF includes some \"intro\" pages that the GUI PDF reader doesn't count but PyPDF2 does."
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "code",
 30 |    "execution_count": 4,
 31 |    "metadata": {},
 32 |    "outputs": [],
 33 |    "source": [
 34 |     "# This PDF has some special formatting that offsets the page numbers\n",
 35 |     "page_offset = 11\n",
 36 |     "filename = \"FY-19-20-Adopted-Budget.pdf\"\n",
 37 |     "\n",
 38 |     "reader = PyPDF2.PdfFileReader(filename)\n",
 39 |     "page = reader.getPage(279 + page_offset)"
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "code",
 44 |    "execution_count": null,
 45 |    "metadata": {},
 46 |    "outputs": [],
 47 |    "source": []
 48 |   },
 49 |   {
 50 |    "cell_type": "markdown",
 51 |    "metadata": {},
 52 |    "source": [
 53 |     "Let's take a look at what the extracted text looks like. Because it's a PDF I'm already expecting something terrible, and that's what it looks like we got. We get back one large string that sort of goes across the page row by row. We'll need to split this apart using some custom logic."
 54 |    ]
 55 |   },
 56 |   {
 57 |    "cell_type": "markdown",
 58 |    "metadata": {},
 59 |    "source": [
 60 |     "For the headers I'm just going to write them down manually. While I could write some clever Python, it's really not worth it because there are only 6 headers, they're the same from page to page, and I don't want the actual string from the text anyway."
 61 |    ]
 62 |   },
 63 |   {
 64 |    "cell_type": "code",
 65 |    "execution_count": 6,
 66 |    "metadata": {},
 67 |    "outputs": [],
 68 |    "source": [
 69 |     "HEADERS = [\"FY2015/16_Actual\", \"FY2016/17_Actual\", \"FY2017/18_Actual\",  \"FY2018/19_Actual\", \"FY2018/19_Revised\", \"FY2019/20_Adopted\"]"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "markdown",
 74 |    "metadata": {},
 75 |    "source": [
 76 |     "With the headers done, let's extract each row. The pattern I see here is:\n",
 77 |     "\n",
 78 |     "1. A string that tells us the fund type, such as \"All Funds\" or \"General Fund\"\n",
 79 |     "2. An all-caps line that signifies the start of the block of expenses\n",
 80 |     "3. The line item of expenses\n",
 81 |     "4. 6 rows that are the actual budget expenses for my city\n",
 82 |     "\n"
 83 |    ]
 84 |   },
 85 |   {
 86 |    "cell_type": "code",
 87 |    "execution_count": 16,
 88 |    "metadata": {},
 89 |    "outputs": [],
 90 |    "source": [
 91 |     "class Parser:\n",
 92 |     "    \"\"\"Parses the PDF file to grab the city expenditures\n",
 93 |     "    Performs the work in three passes\n",
 94 |     "    \n",
 95 |     "    1. Identifying the Block Headers and closing lines\n",
 96 |     "    2. Identifying the row titles\n",
 97 |     "    3. Parsing the row values and determining which ones can be parsed into a valid row\n",
 98 |     "    \n",
 99 |     "    \"\"\"\n",
100 |     "    \n",
101 |     "    HEADERS = [\"FY2015/16_Actual\", \"FY2016/17_Actual\",\n",
102 |     "                         \"FY2017/18_Actual\",  \"FY2018/19_Actual\",\n",
103 |     "                         \"FY2018/19_Revised\", \"FY2019/20_Adopted\"]\n",
104 |     "    \n",
105 |     "    def __init__(self, page=279, page_offset=11, filename=\"FY-19-20-Adopted-Budget.pdf\"):\n",
106 |     "        \"\"\"Gets page from pdf and page text values\n",
107 |     "        \n",
108 |     "        Notes\n",
109 |     "        -----\n",
110 |     "        This PDF has some special formatting that offsets the page numbers\n",
111 |     "\n",
112 |     "        filename = \"FY-19-20-Adopted-Budget.pdf\"\n",
113 |     "        \"\"\"\n",
114 |     "        self.page = PyPDF2.PdfFileReader(filename).getPage(page + page_offset)\n",
115 |     "        \n",
116 |     "        # Split the text into discrete word units and clean up spacing\n",
117 |     "        \n",
118 |     "        # List Comprehension\n",
119 |     "        self.text = tuple([line.strip() for line in self.page.extractText().split(\"\\n\")])\n",
120 |     "        \n",
121 |     "        \n",
122 |     "        self.text = pd.Series(self.text)\n",
123 |     "\n",
124 |     "        # Parse the budget numbers into numbers\n",
125 |     "        self.text = self.text.apply(self.coerce_numbers)\n",
126 |     "        \n",
127 |     "        \n",
128 |     "    def parse(self):\n",
129 |     "        raise NotImplementedError\n",
130 |     "        \n",
131 |     "    @staticmethod\n",
132 |     "    def is_header(line):\n",
133 |     "        \n",
134 |     "        if isinstance(line, int):\n",
135 |     "            return False\n",
136 |     "        else:\n",
137 |     "            return line.isupper() and \"\".join(line.split(\" \")).isalpha()\n",
138 |     "    \n",
139 |     "    @staticmethod\n",
140 |     "    def coerce_numbers(line):\n",
141 |     "        \"\"\"Tries parsing number strings into numbers, else return string\"\"\"\n",
142 |     "        \n",
143 |     "        # Check if number is negative with parentheses\n",
144 |     "        try:\n",
145 |     "            if line[0] == \"(\" and line[-1] == \")\":\n",
146 |     "                neg_number = -int(\"\".join(line[1:-1].split(\",\")))\n",
147 |     "                return neg_number\n",
148 |     "        except (IndexError, ValueError):\n",
149 |     "            pass\n",
150 |     "\n",
151 |     "        # Otherwise try plain logic\n",
152 |     "        try:\n",
153 |     "            return int(\"\".join(line.split(\",\")))\n",
154 |     "        except ValueError:\n",
155 |     "            return line\n",
156 |     "        \n",
157 |     "    def parse_block_headers(self):\n",
158 |     "        \"\"\"Identify the headers from the page as well as ending line\"\"\"\n",
159 |     "        current_header = {}\n",
160 |     "        headers = []\n",
161 |     "        \n",
162 |     "        for i, line in self.text.items():\n",
163 |     "            if self.is_header(line):\n",
164 |     "                if line != current_header.get(\"header\"):\n",
165 |     "                    current_header = {\"header\":line, \"start\":i}\n",
166 |     "                else:\n",
167 |     "                    assert line == current_header.get(\"header\")\n",
168 |     "                    current_header[\"end\"] = i\n",
169 |     "                    headers.append(current_header)\n",
170 |     "                    current_header = {}\n",
171 |     "            \n",
172 |     "        self.headers = pd.DataFrame(headers)\n",
173 |     "        return self.headers\n",
174 |     "        \n",
175 |     "    def parse_row_labels(self):\n",
176 |     "        \"\"\"Identify the row labels from the page \n",
177 |     "        \n",
178 |     "        Notes\n",
179 |     "        -----\n",
180 |     "        Row labels must appear after first header and before last header\n",
181 |     "        \n",
182 |     "        \"\"\"\n",
183 |     "        \n",
184 |     "        rows = []\n",
185 |     "        for i, (header, start, end) in self.headers.iterrows():\n",
186 |     "            \n",
187 |     "            row = {\"header\":header, \"values\":[]}\n",
188 |     "            \n",
189 |     "            # Get text block for this budget item block\n",
190 |     "            text_block = self.text[start+1:end]\n",
191 |     "            \n",
192 |     "            \n",
193 |     "            for line in text_block:\n",
194 |     "                \n",
195 |     "                # All rows should end with a % sign\n",
196 |     "                # TODO: There's still an issue where the line is not bookended always\n",
197 |     "                if \"%\" in str(line):\n",
198 |     "                    if len(row[\"values\"]) == 6:\n",
199 |     "                        row[\"complete\"] = True\n",
200 |     "                    else:\n",
201 |     "                        row[\"complete\"] = False\n",
202 |     "\n",
203 |     "                    # Explode numerical values and pair headers\n",
204 |     "                    numbers = row.pop(\"values\")\n",
205 |     "                    for key, val in zip(self.HEADERS, numbers):\n",
206 |     "                        row[key] = val\n",
207 |     "\n",
208 |     "                    rows.append(row)\n",
209 |     "                    row = {\"header\":header, \"values\":[]}\n",
210 |     "                    \n",
211 |     "                # If line is a string in this block it's a row label\n",
212 |     "                elif isinstance(line, str):\n",
213 |     "                    row[\"line_item\"] = line\n",
214 |     "                \n",
215 |     "                # Otherwise it's a numeric budget value\n",
216 |     "                else:\n",
217 |     "                    assert isinstance(line, int)\n",
218 |     "                    row[\"values\"].append(line)\n",
219 |     "            \n",
220 |     "        self.budget = pd.DataFrame(rows)\n",
221 |     "        return self.budget\n",
222 |     "\n",
223 |     "        \n",
224 |     "    def parse_numbers(self):\n",
225 |     "        \"\"\"Identify indices of valid numbers from the page\"\"\""
226 |    ]
227 |   },
228 |   {
229 |    "cell_type": "code",
230 |    "execution_count": 17,
231 |    "metadata": {},
232 |    "outputs": [
233 |     {
234 |      "data": {
235 |       "text/html": [
236 |        "<div>\n",
237 |        "<style scoped>\n",
238 |        "    .dataframe tbody tr th:only-of-type {\n",
239 |        "        vertical-align: middle;\n",
240 |        "    }\n",
241 |        "\n",
242 |        "    .dataframe tbody tr th {\n",
243 |        "        vertical-align: top;\n",
244 |        "    }\n",
245 |        "\n",
246 |        "    .dataframe thead th {\n",
247 |        "        text-align: right;\n",
248 |        "    }\n",
249 |        "</style>\n",
250 |        "<table border=\"1\" class=\"dataframe\">\n",
251 |        "  <thead>\n",
252 |        "    <tr style=\"text-align: right;\">\n",
253 |        "      <th></th>\n",
254 |        "      <th>header</th>\n",
255 |        "      <th>line_item</th>\n",
256 |        "      <th>complete</th>\n",
257 |        "      <th>FY2015/16_Actual</th>\n",
258 |        "      <th>FY2016/17_Actual</th>\n",
259 |        "      <th>FY2017/18_Actual</th>\n",
260 |        "      <th>FY2018/19_Actual</th>\n",
261 |        "      <th>FY2018/19_Revised</th>\n",
262 |        "      <th>FY2019/20_Adopted</th>\n",
263 |        "    </tr>\n",
264 |        "  </thead>\n",
265 |        "  <tbody>\n",
266 |        "    <tr>\n",
267 |        "      <th>0</th>\n",
268 |        "      <td>PERSONNEL SERVICES</td>\n",
269 |        "      <td>Salaries, Permanent</td>\n",
270 |        "      <td>True</td>\n",
271 |        "      <td>34323749</td>\n",
272 |        "      <td>34654406</td>\n",
273 |        "      <td>25765375</td>\n",
274 |        "      <td>37010295</td>\n",
275 |        "      <td>37033765</td>\n",
276 |        "      <td>36135762</td>\n",
277 |        "    </tr>\n",
278 |        "    <tr>\n",
279 |        "      <th>1</th>\n",
280 |        "      <td>PERSONNEL SERVICES</td>\n",
281 |        "      <td>Salaries, Temporary</td>\n",
282 |        "      <td>True</td>\n",
283 |        "      <td>499772</td>\n",
284 |        "      <td>420908</td>\n",
285 |        "      <td>348015</td>\n",
286 |        "      <td>367098</td>\n",
287 |        "      <td>538702</td>\n",
288 |        "      <td>367948</td>\n",
289 |        "    </tr>\n",
290 |        "    <tr>\n",
291 |        "      <th>2</th>\n",
292 |        "      <td>PERSONNEL SERVICES</td>\n",
293 |        "      <td>Salaries, Overtime</td>\n",
294 |        "      <td>True</td>\n",
295 |        "      <td>5007346</td>\n",
296 |        "      <td>5043233</td>\n",
297 |        "      <td>4093771</td>\n",
298 |        "      <td>3953950</td>\n",
299 |        "      <td>4372335</td>\n",
300 |        "      <td>4049950</td>\n",
301 |        "    </tr>\n",
302 |        "    <tr>\n",
303 |        "      <th>3</th>\n",
304 |        "      <td>PERSONNEL SERVICES</td>\n",
305 |        "      <td>Benefits</td>\n",
306 |        "      <td>False</td>\n",
307 |        "      <td>1466088</td>\n",
308 |        "      <td>1550479</td>\n",
309 |        "      <td>1079615</td>\n",
310 |        "      <td>25343062</td>\n",
311 |        "      <td>26926178</td>\n",
312 |        "      <td>21666643</td>\n",
313 |        "    </tr>\n",
314 |        "    <tr>\n",
315 |        "      <th>4</th>\n",
316 |        "      <td>OPERATING EXPENSES</td>\n",
317 |        "      <td>Utilities</td>\n",
318 |        "      <td>True</td>\n",
319 |        "      <td>17654</td>\n",
320 |        "      <td>31687</td>\n",
321 |        "      <td>30413</td>\n",
322 |        "      <td>19500</td>\n",
323 |        "      <td>19500</td>\n",
324 |        "      <td>19500</td>\n",
325 |        "    </tr>\n",
326 |        "    <tr>\n",
327 |        "      <th>5</th>\n",
328 |        "      <td>OPERATING EXPENSES</td>\n",
329 |        "      <td>Equipment and Supplies</td>\n",
330 |        "      <td>True</td>\n",
331 |        "      <td>1105697</td>\n",
332 |        "      <td>1575090</td>\n",
333 |        "      <td>935471</td>\n",
334 |        "      <td>985254</td>\n",
335 |        "      <td>1489857</td>\n",
336 |        "      <td>1328684</td>\n",
337 |        "    </tr>\n",
338 |        "    <tr>\n",
339 |        "      <th>6</th>\n",
340 |        "      <td>OPERATING EXPENSES</td>\n",
341 |        "      <td>Repairs and Maintenance</td>\n",
342 |        "      <td>True</td>\n",
343 |        "      <td>1106671</td>\n",
344 |        "      <td>939054</td>\n",
345 |        "      <td>752048</td>\n",
346 |        "      <td>964510</td>\n",
347 |        "      <td>986248</td>\n",
348 |        "      <td>964510</td>\n",
349 |        "    </tr>\n",
350 |        "    <tr>\n",
351 |        "      <th>7</th>\n",
352 |        "      <td>OPERATING EXPENSES</td>\n",
353 |        "      <td>Conferences and Training</td>\n",
354 |        "      <td>True</td>\n",
355 |        "      <td>344329</td>\n",
356 |        "      <td>337535</td>\n",
357 |        "      <td>308983</td>\n",
358 |        "      <td>334105</td>\n",
359 |        "      <td>335654</td>\n",
360 |        "      <td>225767</td>\n",
361 |        "    </tr>\n",
362 |        "    <tr>\n",
363 |        "      <th>8</th>\n",
364 |        "      <td>OPERATING EXPENSES</td>\n",
365 |        "      <td>Professional Services</td>\n",
366 |        "      <td>True</td>\n",
367 |        "      <td>503872</td>\n",
368 |        "      <td>458393</td>\n",
369 |        "      <td>391996</td>\n",
370 |        "      <td>335825</td>\n",
371 |        "      <td>735552</td>\n",
372 |        "      <td>335825</td>\n",
373 |        "    </tr>\n",
374 |        "    <tr>\n",
375 |        "      <th>9</th>\n",
376 |        "      <td>OPERATING EXPENSES</td>\n",
377 |        "      <td>Other Contract Services</td>\n",
378 |        "      <td>True</td>\n",
379 |        "      <td>1727604</td>\n",
380 |        "      <td>1790163</td>\n",
381 |        "      <td>1569292</td>\n",
382 |        "      <td>2279087</td>\n",
383 |        "      <td>2355534</td>\n",
384 |        "      <td>2189087</td>\n",
385 |        "    </tr>\n",
386 |        "    <tr>\n",
387 |        "      <th>10</th>\n",
388 |        "      <td>OPERATING EXPENSES</td>\n",
389 |        "      <td>Rental Expense</td>\n",
390 |        "      <td>True</td>\n",
391 |        "      <td>11420</td>\n",
392 |        "      <td>13111</td>\n",
393 |        "      <td>7148</td>\n",
394 |        "      <td>10884</td>\n",
395 |        "      <td>10884</td>\n",
396 |        "      <td>10884</td>\n",
397 |        "    </tr>\n",
398 |        "    <tr>\n",
399 |        "      <th>11</th>\n",
400 |        "      <td>OPERATING EXPENSES</td>\n",
401 |        "      <td>Payments to Other Governments</td>\n",
402 |        "      <td>True</td>\n",
403 |        "      <td>962714</td>\n",
404 |        "      <td>790602</td>\n",
405 |        "      <td>592863</td>\n",
406 |        "      <td>928540</td>\n",
407 |        "      <td>928540</td>\n",
408 |        "      <td>928540</td>\n",
409 |        "    </tr>\n",
410 |        "    <tr>\n",
411 |        "      <th>12</th>\n",
412 |        "      <td>OPERATING EXPENSES</td>\n",
413 |        "      <td>Expense Allowances</td>\n",
414 |        "      <td>True</td>\n",
415 |        "      <td>331430</td>\n",
416 |        "      <td>346883</td>\n",
417 |        "      <td>330933</td>\n",
418 |        "      <td>367000</td>\n",
419 |        "      <td>367000</td>\n",
420 |        "      <td>367000</td>\n",
421 |        "    </tr>\n",
422 |        "    <tr>\n",
423 |        "      <th>13</th>\n",
424 |        "      <td>OPERATING EXPENSES</td>\n",
425 |        "      <td>Other Expenses</td>\n",
426 |        "      <td>True</td>\n",
427 |        "      <td>3736</td>\n",
428 |        "      <td>10147</td>\n",
429 |        "      <td>132</td>\n",
430 |        "      <td>4973</td>\n",
431 |        "      <td>4973</td>\n",
432 |        "      <td>4973</td>\n",
433 |        "    </tr>\n",
434 |        "    <tr>\n",
435 |        "      <th>14</th>\n",
436 |        "      <td>CAPITAL EXPENDITURES</td>\n",
437 |        "      <td>Equipment</td>\n",
438 |        "      <td>True</td>\n",
439 |        "      <td>24028</td>\n",
440 |        "      <td>342171</td>\n",
441 |        "      <td>88629</td>\n",
442 |        "      <td>56895</td>\n",
443 |        "      <td>156000</td>\n",
444 |        "      <td>295922</td>\n",
445 |        "    </tr>\n",
446 |        "  </tbody>\n",
447 |        "</table>\n",
448 |        "</div>"
449 |       ],
450 |       "text/plain": [
451 |        "                  header                      line_item  complete  \\\n",
452 |        "0     PERSONNEL SERVICES            Salaries, Permanent      True   \n",
453 |        "1     PERSONNEL SERVICES            Salaries, Temporary      True   \n",
454 |        "2     PERSONNEL SERVICES             Salaries, Overtime      True   \n",
455 |        "3     PERSONNEL SERVICES                       Benefits     False   \n",
456 |        "4     OPERATING EXPENSES                      Utilities      True   \n",
457 |        "5     OPERATING EXPENSES         Equipment and Supplies      True   \n",
458 |        "6     OPERATING EXPENSES        Repairs and Maintenance      True   \n",
459 |        "7     OPERATING EXPENSES       Conferences and Training      True   \n",
460 |        "8     OPERATING EXPENSES          Professional Services      True   \n",
461 |        "9     OPERATING EXPENSES        Other Contract Services      True   \n",
462 |        "10    OPERATING EXPENSES                 Rental Expense      True   \n",
463 |        "11    OPERATING EXPENSES  Payments to Other Governments      True   \n",
464 |        "12    OPERATING EXPENSES             Expense Allowances      True   \n",
465 |        "13    OPERATING EXPENSES                 Other Expenses      True   \n",
466 |        "14  CAPITAL EXPENDITURES                      Equipment      True   \n",
467 |        "\n",
468 |        "    FY2015/16_Actual  FY2016/17_Actual  FY2017/18_Actual  FY2018/19_Actual  \\\n",
469 |        "0           34323749          34654406          25765375          37010295   \n",
470 |        "1             499772            420908            348015            367098   \n",
471 |        "2            5007346           5043233           4093771           3953950   \n",
472 |        "3            1466088           1550479           1079615          25343062   \n",
473 |        "4              17654             31687             30413             19500   \n",
474 |        "5            1105697           1575090            935471            985254   \n",
475 |        "6            1106671            939054            752048            964510   \n",
476 |        "7             344329            337535            308983            334105   \n",
477 |        "8             503872            458393            391996            335825   \n",
478 |        "9            1727604           1790163           1569292           2279087   \n",
479 |        "10             11420             13111              7148             10884   \n",
480 |        "11            962714            790602            592863            928540   \n",
481 |        "12            331430            346883            330933            367000   \n",
482 |        "13              3736             10147               132              4973   \n",
483 |        "14             24028            342171             88629             56895   \n",
484 |        "\n",
485 |        "    FY2018/19_Revised  FY2019/20_Adopted  \n",
486 |        "0            37033765           36135762  \n",
487 |        "1              538702             367948  \n",
488 |        "2             4372335            4049950  \n",
489 |        "3            26926178           21666643  \n",
490 |        "4               19500              19500  \n",
491 |        "5             1489857            1328684  \n",
492 |        "6              986248             964510  \n",
493 |        "7              335654             225767  \n",
494 |        "8              735552             335825  \n",
495 |        "9             2355534            2189087  \n",
496 |        "10              10884              10884  \n",
497 |        "11             928540             928540  \n",
498 |        "12             367000             367000  \n",
499 |        "13               4973               4973  \n",
500 |        "14             156000             295922  "
501 |       ]
502 |      },
503 |      "execution_count": 17,
504 |      "metadata": {},
505 |      "output_type": "execute_result"
506 |     }
507 |    ],
508 |    "source": [
509 |     "p = Parser()\n",
510 |     "df = p.parse_block_headers()\n",
511 |     "p.parse_row_labels()"
512 |    ]
513 |   }
514 |  ],
515 |  "metadata": {
516 |   "kernelspec": {
517 |    "display_name": "Python 3",
518 |    "language": "python",
519 |    "name": "python3"
520 |   },
521 |   "language_info": {
522 |    "codemirror_mode": {
523 |     "name": "ipython",
524 |     "version": 3
525 |    },
526 |    "file_extension": ".py",
527 |    "mimetype": "text/x-python",
528 |    "name": "python",
529 |    "nbconvert_exporter": "python",
530 |    "pygments_lexer": "ipython3",
531 |    "version": "3.7.6"
532 |   }
533 |  },
534 |  "nbformat": 4,
535 |  "nbformat_minor": 4
536 | }
537 | 


--------------------------------------------------------------------------------