├── 2 clean data ├── 2 Clean data - Complete.ipynb ├── 2 Clean data.ipynb ├── results.csv └── results_clean.csv ├── 3 analyse data ├── 3 Analyse data - Complete.ipynb ├── 3 Analyse data.ipynb ├── Extra 3 Analyse data exercises - Complete.ipynb ├── Extra 3 Analyse data exercises.ipynb ├── donations per party, absolute + percentages.csv └── results_clean.csv ├── 4 scrape data ├── 4 Scrape data - Complete.ipynb ├── 4 Scrape data.ipynb ├── scrapedData single header.csv └── scrapedData.csv ├── LICENSE └── README.md /2 clean data/2 Clean data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "toc": true 7 | }, 8 | "source": [ 9 | "

Table of Contents

\n", 10 | "
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | " # Clean data with Python Pandas" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "Welcome to this Jupyter Notebook! \n", 25 | " \n", 26 | "Today you'll learn how to import a CSV file into a Jupyter Notebook, and how to clean up messy data. This notebook is part of the course Python for Journalists at [Learno.net](learno.net). The data originally comes from [the Electoral Commission website](http://search.electoralcommission.org.uk/Search?currentPage=1&rows=10&sort=AcceptedDate&order=desc&tab=1&open=filter&et=pp&isIrishSourceYes=false&isIrishSourceNo=false&date=Reported&from=&to=&quarters=2018Q12&rptPd=3617&prePoll=false&postPoll=false&donorStatus=individual&donorStatus=tradeunion&donorStatus=company&donorStatus=unincorporatedassociation&donorStatus=publicfund&donorStatus=other&donorStatus=registeredpoliticalparty&donorStatus=friendlysociety&donorStatus=trust&donorStatus=limitedliabilitypartnership&donorStatus=impermissibledonor&donorStatus=na&donorStatus=unidentifiabledonor&donorStatus=buildingsociety&register=ni&register=gb&optCols=Register&optCols=IsIrishSource&optCols=ReportingPeriodName), but has been edited for training purposes. The edited dataset is available on the Learno website. \n", 27 | "\n", 28 | "Remember: before you start working with data, make sure to create a copy of the original dataset." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## About Jupyter Notebooks and Pandas" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Right now you're looking at a Jupyter Notebook: an interactive, browser-based programming environment. You can use these notebooks to program in R, Julia or Python - as you'll be doing later on. 
Read more about Jupyter Notebook in the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). \n", 43 | " \n", 44 | "To clean up our data, we'll be using Python and Pandas. Pandas is an open-source Python library - basically an extra toolkit to go with Python - that is designed for data analysis. Pandas is flexible, easy to use and has lots of useful functions built right in. Read more about Pandas and its features in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/).\n", 45 | "\n", 46 | "**Notebook shortcuts** \n", 47 | "\n", 48 | "Within Jupyter Notebooks, there are some shortcuts you can use. If you'll be using more notebooks for your data analysis in the future, you'll remember these shortcuts soon enough. :) \n", 49 | "\n", 50 | "* `esc` will take you into command mode\n", 51 | "* `a` will insert a cell above\n", 52 | "* `b` will insert a cell below\n", 53 | "* `shift and tab` will show you the documentation for your code\n", 54 | "* `shift and enter` will run your cell\n", 55 | "* `d d` will delete a cell\n", 56 | "\n", 57 | "**Pandas dictionary**\n", 58 | "\n", 59 | "* **dataframe**: dataframe is Pandas speak for a table with a labeled y-axis, also known as an index. (The index usually starts at 0.)\n", 60 | "* **series**: a series is a labeled list of values; a single column of a dataframe is a series.\n", 61 | "\n", 62 | "Before we dive in, a little more about Jupyter Notebooks. Every notebook is made up of cells. A cell can contain either Markdown text - like this one - or code. Code cells can be run. To see what that means, type the following command in the next cell and run it: `print(\"hello world\")`." 
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "OK, if you're good to go, let's dive in..." 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "# Getting started\n", 84 | "\n", 85 | "Before we can work on our data, we need to import all the libraries we'll need. In this case, we need to import the Pandas library. You can do that by typing in `import pandas as pd`." 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "`as pd` means that when you write code you can refer to the library by writing `pd` instead of `pandas`. It's just a little bit shorter and therefore more efficient - something programmers like a lot." 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "### Import data" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "For this course, we'll be using data on donations made to British political parties. 
The data was originally downloaded from [the Electoral Commission website](http://search.electoralcommission.org.uk/Search?currentPage=1&rows=10&sort=AcceptedDate&order=desc&tab=1&open=filter&et=pp&isIrishSourceYes=false&isIrishSourceNo=false&date=Reported&from=&to=&quarters=2018Q12&rptPd=3617&prePoll=false&postPoll=false&donorStatus=individual&donorStatus=tradeunion&donorStatus=company&donorStatus=unincorporatedassociation&donorStatus=publicfund&donorStatus=other&donorStatus=registeredpoliticalparty&donorStatus=friendlysociety&donorStatus=trust&donorStatus=limitedliabilitypartnership&donorStatus=impermissibledonor&donorStatus=na&donorStatus=unidentifiabledonor&donorStatus=buildingsociety&register=ni&register=gb&optCols=Register&optCols=IsIrishSource&optCols=ReportingPeriodName).\n", 114 | "\n", 115 | "We are going to create a dataframe by importing a CSV file with our data. Data can be imported using the following code: `df = pd.read_csv('filename.csv')`. " 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "Sometimes you'll have a CSV file that doesn't use commas to separate values, but uses semicolons or something else entirely. To import such a dataset, change the code to: \n", 130 | "`df = pd.read_csv('filename.csv', delimiter=\";\")`. " 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": { 136 | "collapsed": true 137 | }, 138 | "source": [ 139 | "## Explore data" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "When importing our data, we saved the contents of the CSV file in a dataframe called `df`. We can now explore the data by referring to the dataframe as `df`. 
It's important to 'get to know' your data, so you know what you're working with.\n", 147 | "\n", 148 | "Use `df.head(10)` to look at the first ten rows of the data:" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "Use `df.tail(10)` to look at the last ten rows of the data:" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "To look at a random sample of the data set, type `df.sample(5)`." 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "Next we want to know what data types we're dealing with for each column in our dataframe.\n", 191 | "\n", 192 | "Within Python, different types of information have different names. Using `df.dtypes` you can see what data type is in each column of the dataframe. \n", 193 | "\n", 194 | "**Most common data types**\n", 195 | "* **int**: short for integer, a whole number without decimals\n", 196 | "* **float**: short for floating point, a number with a decimal point. \n", 197 | "* **string**: usually a bit of text, if there are numbers in a string they are not recognized as such. Python will see a string as text.\n", 198 | "* **object**: the type Pandas usually shows for columns of text (strings); if there are numbers in such a column they are not recognized as such. 
Python will see a string as text.\n", 199 | "* **bool**: short for boolean, a binary data type (True/False)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "To see the shape of the dataframe - the number of rows and columns - type `df.shape`." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": null, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "To see summary statistics for our data, including the mean and the median (the 50% row) of every numerical column, type `df.describe()`." 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": null, 233 | "metadata": { 234 | "scrolled": true 235 | }, 236 | "outputs": [], 237 | "source": [] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "As you can see, the Value column is missing from the output of `df.describe()`. What happened? The most interesting data is in there... Well, remember when we asked Python to give us the data type for each column using `df.dtypes`? \n", 244 | "\n", 245 | "Turns out, Python doesn't recognize the values in the Value column as numbers. Spoiler alert: that might have something to do with the commas and pound signs in that column. Guess what? It's time to do some data cleaning. 
\n", 246 | "\n", 247 | "# Clean data\n", 248 | "\n", 249 | "**Clean data to do list**\n", 250 | "- make sure that numerical values are recognized as such\n", 251 | "- dates are just objects, let's make Python recognize dates as dates\n", 252 | "- create new columns based on the date (like a column for year and month)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "## Cleaning strings\n", 260 | "\n", 261 | "If we're going to analyse the data, we need the Value column to be recognized as floats. (Floats, not ints, since the Value column has numbers with decimals in there.)\n", 262 | "\n", 263 | "First, let's remove all of the pound signs (£)... Type `df['ValueClean'] = df['Value'].str.replace('£', '')`. Now, what does this do? It adds the column ValueClean, which is exactly the same as the column Value, but with every '£' replaced by nothing. " 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": null, 269 | "metadata": {}, 270 | "outputs": [], 271 | "source": [] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "Let's have a look at the first rows to see how we've done. Remember `df.head()`? ValueClean will be added at the right end of the table..." 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "We're not done yet with this ValueClean column. We need to remove all commas - Python doesn't accept commas or points as thousands separators; a point is only allowed as the decimal mark. How would you remove all commas in the ValueClean column without creating a new column? \n", 299 | ". \n", 300 | ". \n", 301 | ". \n", 302 | ". \n", 303 | ". \n", 304 | ". 
\n", 305 | ". \n", 306 | ". \n", 307 | ". \n", 308 | ". \n", 309 | ". \n", 310 | ". \n", 311 | " \n", 312 | "The answer looks a lot like `df['ValueClean'] = df['Value'].str.replace('£', '')` but isn't exactly the same...\n", 313 | ". \n", 314 | ". \n", 315 | ". \n", 316 | ". \n", 317 | ". \n", 318 | ". \n", 319 | ". \n", 320 | ". \n", 321 | ". \n", 322 | ". \n", 323 | ". \n", 324 | ". \n", 325 | " \n", 326 | "Type `df['ValueClean'] = df['ValueClean'].str.replace(',', '')`, which will replace the column ValueClean with a version of itself in which all commas are replaced by nothing." 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "Now, let's see if this did the trick. Use `df.dtypes` to see if the ValueClean column is now a float datatype..." 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "Didn't work, huh? That's because we need to explicitly tell Python that the ValueClean column contains float numbers. 
We can use a Pandas function to do this - like all top-level Pandas functions, this one starts with `pd.`: \n", 362 | "\n", 363 | "`df['ValueClean'] = pd.to_numeric(df['ValueClean'])`" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": { 370 | "scrolled": true 371 | }, 372 | "outputs": [], 373 | "source": [] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## Delete columns\n", 387 | "\n", 388 | "Ok, now that we have our Value column cleaned up in ValueClean, we no longer need to keep the original Value column. In Pandas removing or deleting a column is called 'dropping a column'. \n", 389 | "\n", 390 | "Also good to know: in Pandas, the rows (horizontal) of a dataframe are axis 0 (`axis=0`) and the columns (vertical) are axis 1 (`axis=1`). \n", 391 | "\n", 392 | "Knowing this, typing `df = df.drop('Value', axis=1)` should make sense. It means: the dataframe is the dataframe with the column Value dropped. (Recent Pandas versions require the `axis=` keyword; the older shorthand `df.drop('Value', 1)` no longer works there.)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "## Renaming columns" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "With the Value column gone, we can rename ValueClean to Value. 
Use the following command to do this: `df = df.rename(columns={'old_name': 'new_name'})`" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": {}, 433 | "source": [ 434 | "## Clean up column names\n", 435 | "\n", 436 | "Since leading and trailing spaces will always come back to haunt you in your data analysis nightmares, you want to make sure you get them out of your way before analysing your data. \n", 437 | "\n", 438 | "Let's see if there are any of these spaces in our column names, by typing `df.columns`, which will give us a list of all column names." 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": {}, 445 | "outputs": [], 446 | "source": [] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "Looking good to me... Let's check all donor names. Get a list of all unique donor names by using the following command: `df['columnname'].unique()`." 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "metadata": { 459 | "scrolled": false 460 | }, 461 | "outputs": [], 462 | "source": [] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "Well, well, well, Mr Alun Ffred Jones !! It's going to take multiple steps to fix that: \n", 469 | "1. Let's make sure the DonorName column contains strings. \n", 470 | "Use `df['columnname'] = df['columnname'].astype(str)`\n", 471 | "2. Strip all strings in the column of leading and trailing spaces. \n", 472 | "Use `df['columnname'] = df['columnname'].map(str.strip)`" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": {}, 479 | "outputs": [], 480 | "source": [] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": {}, 485 | "source": [ 486 | "Did that work? Let's see. 
`df['DonorName'].unique()`" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": null, 492 | "metadata": {}, 493 | "outputs": [], 494 | "source": [] 495 | }, 496 | { 497 | "cell_type": "markdown", 498 | "metadata": {}, 499 | "source": [ 500 | "## Cleaning dates\n", 501 | "\n", 502 | "Ok, so now we've only got to clean up our dates. We're going to use another Python library: the datetime library, which contains some neat and handy date and time tools. That's just what we need, so type: `import datetime`" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "Let's have another look at some of our data before we start working on the date column. Use `df.head()`, `df.tail()` or my personal favorite `df.sample()`." 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": null, 522 | "metadata": {}, 523 | "outputs": [], 524 | "source": [] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "metadata": {}, 530 | "outputs": [], 531 | "source": [] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "Our dates are in the AcceptedDate column. Let's make sure these dates are recognized as such. Use `df['AcceptedDate'] = pd.to_datetime(df['AcceptedDate'], format=\"%d/%m/%Y\")` to change the data type from object to datetime. Use `df.head()`, `df.tail()` or `df.sample()` to see if it worked." 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "Now, did the data type of the AcceptedDate column change? Use `df.dtypes` to check." 
552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": null, 557 | "metadata": {}, 558 | "outputs": [], 559 | "source": [] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": null, 564 | "metadata": {}, 565 | "outputs": [], 566 | "source": [] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "Worked perfectly! :) \n", 573 | "\n", 574 | "## Adding columns with year and month\n", 575 | "\n", 576 | "Now, let's create two new columns. One with the month and one with the years... Since Python now knows that the AcceptedDate column contains dates, we can use out-of-the-box functions from pandas and the datetime libraries. \n", 577 | "\n", 578 | "Creating a column with the years based on the AcceptedDate column, becomes as easy as `df['Year'] = pd.DatetimeIndex(df['AcceptedDate']).year`." 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": {}, 585 | "outputs": [], 586 | "source": [] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": null, 591 | "metadata": {}, 592 | "outputs": [], 593 | "source": [] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": {}, 598 | "source": [ 599 | "Use `df.head()`, `df.tail()` or `df.sample()` to see if it worked. Our new column will be added on the right side of the dataframe." 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": null, 605 | "metadata": {}, 606 | "outputs": [], 607 | "source": [] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "To add a column with all months is just as easy. `df['Month'] = pd.DatetimeIndex(df['AcceptedDate']).month` will do the trick, it means: in the dataframe called 'df', create a new column called 'Month' and fill it with months, which you should base on the date inside the column AcceptedDate of the dataframe df." 
614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": null, 619 | "metadata": {}, 620 | "outputs": [], 621 | "source": [] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "metadata": {}, 626 | "source": [ 627 | "Use `df.head()`, `df.tail()` or `df.sample()` to see if it worked. For something like this, I like to use the `df.sample()` function; it allows you to see if it worked with different values. \n", 628 | "\n", 629 | "Of course, this new column is also added on the right side of the dataframe." 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": null, 635 | "metadata": {}, 636 | "outputs": [], 637 | "source": [] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 | "# Removing columns\n", 644 | "\n", 645 | "Our dataframe is quite big. Maybe we can remove some columns? Let's see how many columns we've got... Use `df.columns` and `df.shape` to familiarize yourself with the number of columns and their names." 646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": null, 651 | "metadata": {}, 652 | "outputs": [], 653 | "source": [] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": null, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": {}, 665 | "source": [ 666 | "Let's get rid of the columns 'ECRef', 'AccountingUnitName', 'AccountingUnitsAsCentralParty', 'IsSponsorship', 'RegulatedDoneeType', 'CompanyRegistrationNumber', 'Postcode', 'DonationType', 'NatureOfDonation', 'PurposeOfVisit', 'DonationAction', 'ReceivedDate', 'ReportedDate', 'IsReportedPrePoll', 'ReportingPeriodName', 'IsBequest', 'IsAggregation', 'RegulatedEntityId', 'AccountingUnitId', 'RegisterName', 'IsIrishSource', 'AcceptedDateClean'. \n", 667 | "\n", 668 | "Let's drop some columns! Remember the syntax: `df = df.drop('columnname', axis=1)`\n", 669 | "\n", 670 | ". \n", 671 | ". \n", 672 | ". \n", 673 | ". \n", 674 | ". 
\n", 675 | ". \n", 676 | ". \n", 677 | ". \n", 678 | ". \n", 679 | ". \n", 680 | "\n", 681 | "Or, maybe we should just tell the computer which columns we'd like to keep. Might be shorter. :) \n", 682 | "Use `dfMini = df[['RegulatedEntityName', 'AcceptedDate', 'DonorName', 'DonorStatus', 'Year', 'Month', 'Value', 'RegulatedEntityType', 'DonorId', 'CampaigningName']]`" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "metadata": { 689 | "scrolled": false 690 | }, 691 | "outputs": [], 692 | "source": [] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": {}, 698 | "outputs": [], 699 | "source": [] 700 | }, 701 | { 702 | "cell_type": "markdown", 703 | "metadata": {}, 704 | "source": [ 705 | "## Save your data\n", 706 | "\n", 707 | "Now that we've put all this work into cleaning our dataset, let's save a copy. Of course Pandas has a nifty command for that too. Use `dfMini.to_csv('filename.csv', encoding='utf8')`. \n", 708 | "\n", 709 | "Beware: use a different name than the filename of the original data file, or it will be overwritten. " 710 | ] 711 | }, 712 | { 713 | "cell_type": "code", 714 | "execution_count": null, 715 | "metadata": {}, 716 | "outputs": [], 717 | "source": [] 718 | }, 719 | { 720 | "cell_type": "markdown", 721 | "metadata": {}, 722 | "source": [ 723 | "In case you want to check if a new file was created in your directory, you can use the `pwd` and `ls` commands. At the beginning of this module, we used these commands to print the working directory (`pwd`) and list the content of the working directory (`ls`). 
\n", 724 | "\n", 725 | "First, use `pwd` to see in which folder - also known as directory - you are:" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": {}, 732 | "outputs": [], 733 | "source": [] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "Now use `ls` to get a list of all files in this directory. If everything worked your newly saved datafile should be among the files in the list. " 740 | ] 741 | }, 742 | { 743 | "cell_type": "code", 744 | "execution_count": null, 745 | "metadata": {}, 746 | "outputs": [], 747 | "source": [] 748 | } 749 | ], 750 | "metadata": { 751 | "kernelspec": { 752 | "display_name": "Python 3", 753 | "language": "python", 754 | "name": "python3" 755 | }, 756 | "language_info": { 757 | "codemirror_mode": { 758 | "name": "ipython", 759 | "version": 3 760 | }, 761 | "file_extension": ".py", 762 | "mimetype": "text/x-python", 763 | "name": "python", 764 | "nbconvert_exporter": "python", 765 | "pygments_lexer": "ipython3", 766 | "version": "3.6.4" 767 | }, 768 | "toc": { 769 | "nav_menu": {}, 770 | "number_sections": true, 771 | "sideBar": true, 772 | "skip_h1_title": false, 773 | "toc_cell": true, 774 | "toc_position": {}, 775 | "toc_section_display": "block", 776 | "toc_window_display": true 777 | } 778 | }, 779 | "nbformat": 4, 780 | "nbformat_minor": 2 781 | } 782 | -------------------------------------------------------------------------------- /2 clean data/results_clean.csv: -------------------------------------------------------------------------------- 1 | ,RegulatedEntityName,AcceptedDate,DonorName,DonorStatus,Year,Month,Value,RegulatedEntityType,DonorId,CampaigningName 2 | 0,Plaid Cymru - The Party of Wales,2018-12-19,Mr Alun Ffred Jones,Individual,2018,12,20000.0,Political Party,83318, 3 | 1,Liberal Democrats,2017-12-31,Ms Kirsten Bayes,Individual,2017,12,1800.0,Political Party,43033, 4 | 2,Liberal Democrats,2017-12-31,Mr Steve 
Webb,Individual,2017,12,3000.0,Political Party,35400, 5 | 3,Liberal Democrats,2017-12-31,Mr Tim Farron,Individual,2017,12,1560.0,Political Party,76661, 6 | 4,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,7750.0,Political Party,35403, 7 | 5,Liberal Democrats,2017-12-31,Mr Michael Lees,Individual,2017,12,1800.0,Political Party,76645, 8 | 6,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1838.0,Political Party,47793, 9 | 7,Liberal Democrats,2017-12-31,Mr Jeremy Hilton,Individual,2017,12,1779.0,Political Party,83347, 10 | 8,Liberal Democrats,2017-12-31,Baroness Kathryn Parminter,Individual,2017,12,2400.0,Political Party,37433, 11 | 9,Liberal Democrats,2017-12-31,Lady Catherine Bakewell,Individual,2017,12,1560.0,Political Party,50620, 12 | 10,Liberal Democrats,2017-12-31,Mr Martin Elengorn,Individual,2017,12,5350.0,Political Party,48727, 13 | 11,Liberal Democrats,2017-12-31,Cllr Ian Shires,Individual,2017,12,1920.0,Political Party,75238, 14 | 12,Liberal Democrats,2017-12-31,Ms Liz Morris,Individual,2017,12,3000.0,Political Party,72589, 15 | 13,Liberal Democrats,2017-12-31,Dr Alun Griffiths,Individual,2017,12,1800.0,Political Party,35415, 16 | 14,Liberal Democrats,2017-12-31,Mr Dave Hodgson,Individual,2017,12,2250.0,Political Party,34493, 17 | 15,Liberal Democrats,2017-12-31,Mr David Goodwin,Individual,2017,12,1560.0,Political Party,72576, 18 | 16,Liberal Democrats,2017-12-31,Mrs Elizabeth Barraclough,Individual,2017,12,2550.0,Political Party,83350, 19 | 17,Liberal Democrats,2017-12-31,Baroness Shirley Williams,Individual,2017,12,1700.0,Political Party,37445, 20 | 18,Liberal Democrats,2017-12-31,Lord Tim Clement-Jones,Individual,2017,12,2200.0,Political Party,34534, 21 | 19,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,1950.0,Political Party,31106, 22 | 20,Liberal Democrats,2017-12-31,Mr Mark Burch,Individual,2017,12,4000.0,Political Party,74667, 23 | 21,Liberal Democrats,2017-12-31,Mr Arthur 
Hookway,Individual,2017,12,1980.0,Political Party,72143, 24 | 22,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,1504.5,Political Party,35390, 25 | 23,UK Independence Party (UKIP),2017-12-31,Mr Brett Hammond,Individual,2017,12,650.0,Political Party,83324, 26 | 24,Liberal Democrats,2017-12-31,Mr Ashley Wood,Individual,2017,12,1743.93,Political Party,83356, 27 | 25,Liberal Democrats,2017-12-31,Mrs Rowena Hay,Individual,2017,12,1854.0,Political Party,48690, 28 | 26,Liberal Democrats,2017-12-31,Ms Lynne Featherstone,Individual,2017,12,1800.0,Political Party,37406, 29 | 27,Liberal Democrats,2017-12-31,Ms Inga Lockington,Individual,2017,12,1950.0,Political Party,83332, 30 | 28,Liberal Democrats,2017-12-31,Mr Peter Rothery,Individual,2017,12,1950.0,Political Party,34482, 31 | 29,Liberal Democrats,2017-12-31,Mr Richard Keatinge,Individual,2017,12,2000.0,Political Party,83335, 32 | 30,Liberal Democrats,2017-12-31,Mr Cliff Woodcraft,Individual,2017,12,1800.0,Political Party,50642, 33 | 31,Liberal Democrats,2017-12-31,Mr Derek Eastman,Individual,2017,12,1500.57,Political Party,83354, 34 | 32,Liberal Democrats,2017-12-31,Ms Gail Engert,Individual,2017,12,1695.0,Political Party,43024, 35 | 33,Liberal Democrats,2017-12-31,Mr Pathumal Ali,Individual,2017,12,1800.0,Political Party,72580, 36 | 34,Liberal Democrats,2017-12-31,Mrs Klara Sudbury,Individual,2017,12,1579.91,Political Party,35423, 37 | 35,Liberal Democrats,2017-12-31,Mr James Macpherson,Individual,2017,12,1776.0,Political Party,82968, 38 | 36,Liberal Democrats,2017-12-31,Mr David Tutt,Individual,2017,12,1750.0,Political Party,56049, 39 | 37,Liberal Democrats,2017-12-31,Mr Colin Stears,Individual,2017,12,1830.0,Political Party,54333, 40 | 38,Liberal Democrats,2017-12-31,Ms Mary Wane,Individual,2017,12,2200.0,Political Party,83342, 41 | 39,Liberal Democrats,2017-12-31,Mr Alistair Barr,Individual,2017,12,5000.0,Political Party,47844, 42 | 40,Liberal Democrats,2017-12-31,Miss Jocelyn 
Clark,Individual,2017,12,1594.0,Political Party,83346, 43 | 41,Liberal Democrats,2017-12-31,Roger Michael Isherwood,Individual,2017,12,2800.0,Political Party,77464, 44 | 42,Conservative and Unionist Party,2017-12-31,Mr David E D Brownlow,Individual,2017,12,11273.77,Political Party,69570, 45 | 43,Conservative and Unionist Party,2017-12-31,Mr David E Brownlow,Individual,2017,12,16540.0,Political Party,83077, 46 | 44,Liberal Democrats,2017-12-31,Cllr Joe Harris,Individual,2017,12,1557.25,Political Party,75240, 47 | 45,Liberal Democrats,2017-12-31,Mr James Baker,Individual,2017,12,1998.0,Political Party,76630, 48 | 46,Liberal Democrats,2017-12-31,Mr Dennis Meredith,Individual,2017,12,2003.41,Political Party,83355, 49 | 47,Liberal Democrats,2017-12-31,Ms Anne Winstanley,Individual,2017,12,2000.0,Political Party,34181, 50 | 48,Liberal Democrats,2017-12-31,Mr David Brown,Individual,2017,12,1862.0,Political Party,83331, 51 | 49,Liberal Democrats,2017-12-31,Mr Bernard Fisher,Individual,2017,12,1759.0,Political Party,76621, 52 | 50,Liberal Democrats,2017-12-31,Mr A Serge Lourie,Individual,2017,12,3620.0,Political Party,46325, 53 | 51,Renew,2017-12-31,Mr Richard Christopher Breen,Individual,2017,12,16509.1,Political Party,83311, 54 | 52,Liberal Democrats,2017-12-31,Mr Simon Wheeler,Individual,2017,12,1620.0,Political Party,83338, 55 | 53,Liberal Democrats,2017-12-31,Mrs Mary-Jane Jeanes,Individual,2017,12,1700.0,Political Party,76643, 56 | 54,Liberal Democrats,2017-12-31,Lord Richard Allan Of Hallam,Individual,2017,12,2100.0,Political Party,72138, 57 | 55,Liberal Democrats,2017-12-31,Dr Robert Barr,Individual,2017,12,1680.0,Political Party,76651, 58 | 56,Liberal Democrats,2017-12-31,Mrs Marian Radford,Individual,2017,12,1896.0,Political Party,72588, 59 | 57,Liberal Democrats,2017-12-31,Baroness Barbara Janke,Individual,2017,12,1600.0,Political Party,37379, 60 | 58,Liberal Democrats,2017-12-31,Mr M Joe Boyle,Individual,2017,12,1563.0,Political Party,83344, 61 | 59,Liberal 
Democrats,2017-12-31,Mr Dominic Hiscock,Individual,2017,12,2001.66,Political Party,83353, 62 | 60,Liberal Democrats,2017-12-31,Mr Manuel Abellan-San Martin,Individual,2017,12,1560.0,Political Party,83343, 63 | 61,Liberal Democrats,2017-12-31,Ms Helen Clucas,Individual,2017,12,1908.0,Political Party,83348, 64 | 62,Liberal Democrats,2017-12-31,Ms Anne Winstanley,Individual,2017,12,2000.0,Political Party,34181, 65 | 63,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1668.0,Political Party,83345, 66 | 64,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1800.0,Political Party,37401, 67 | 65,Liberal Democrats,2017-12-31,Mr Edward Acland,Individual,2017,12,1600.0,Political Party,83352, 68 | 66,Liberal Democrats,2017-12-31,Ms Joanna Kenny,Individual,2017,12,866.25,Political Party,47811, 69 | 67,Liberal Democrats,2017-12-31,Mr Michael Carter,Individual,2017,12,2200.0,Political Party,83341, 70 | 68,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,5000.0,Political Party,76647, 71 | 69,Liberal Democrats,2017-12-31,Ms Ruth Dombey,Individual,2017,12,1800.0,Political Party,50646, 72 | 70,Liberal Democrats,2017-12-31,Ms Karin Snowden,Individual,2017,12,1550.0,Political Party,83333, 73 | 71,Liberal Democrats,2017-12-31,Mrs Sunita Gordon,Individual,2017,12,1548.0,Political Party,76659, 74 | 72,Liberal Democrats,2017-12-31,Mr John Hale,Individual,2017,12,1600.0,Political Party,76632, 75 | 73,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1800.0,Political Party,83337, 76 | 74,Liberal Democrats,2017-12-31,Lord Paul Strasburger,Individual,2017,12,1600.0,Political Party,73975, 77 | 75,Liberal Democrats,2017-12-31,Mrs Carolyn Lambert,Individual,2017,12,1526.4,Political Party,45278, 78 | 76,Liberal Democrats,2017-12-31,Mr Mark Watkin,Individual,2017,12,1800.0,Political Party,48732, 79 | 77,Liberal Democrats,2017-12-31,Mr Andrew Waller,Individual,2017,12,1657.38,Political Party,78680, 80 | 78,Liberal 
Democrats,2017-12-31,Mr David Brown,Individual,2017,12,1676.78,Political Party,83330, 81 | 79,Liberal Democrats,2017-12-31,Mr David Beacham,Individual,2017,12,3000.0,Political Party,50606, 82 | 80,Liberal Democrats,2017-12-31,Christopher Williams,Individual,2017,12,1506.37,Political Party,19152, 83 | 81,Scottish National Party (SNP),2017-12-31,Mr John Mason,Individual,2017,12,3160.0,Political Party,45126, 84 | 82,Liberal Democrats,2017-12-31,Mr Steven Lambert,Individual,2017,12,1596.0,Political Party,76657, 85 | 83,Liberal Democrats,2017-12-31,Mrs Isobel McCall,Individual,2017,12,2024.62,Political Party,45282, 86 | 84,Liberal Democrats,2017-12-31,Mr Michael Headley,Individual,2017,12,1958.0,Political Party,37420, 87 | 85,Liberal Democrats,2017-12-31,Mr Robert Wood,Individual,2017,12,1737.8,Political Party,76652, 88 | 86,Liberal Democrats,2017-12-31,Mr Mark Petterson,Individual,2017,12,15000.0,Political Party,55995, 89 | 87,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,2230.0,Political Party,55988, 90 | 88,Liberal Democrats,2017-12-31,Lord Nigel D Jones of Cheltenham,Individual,2017,12,1850.0,Political Party,37426, 91 | 89,Liberal Democrats,2017-12-31,Mr Ian Cuthbertson,Individual,2017,12,1521.72,Political Party,76629, 92 | 90,Liberal Democrats,2017-12-31,Mrs Marlene Heron,Individual,2017,12,1720.0,Political Party,54404, 93 | 91,Liberal Democrats,2017-12-31,Mr Owen Temple,Individual,2017,12,1800.0,Political Party,50168, 94 | 92,Liberal Democrats,2017-12-31,Mr Edward Joyce,Individual,2017,12,1980.0,Political Party,83351, 95 | 93,Liberal Democrats,2017-12-31,Mr SImon Curtis,Individual,2017,12,5000.0,Political Party,78769, 96 | 94,Liberal Democrats,2017-12-31,Mr Andrew Mckinlay,Individual,2017,12,1846.0,Political Party,48691, 97 | 95,Liberal Democrats,2017-12-31,Ms Philippa Connor,Individual,2017,12,2061.96,Political Party,54385, 98 | 96,Liberal Democrats,2017-12-31,Mrs Hilary Stephenson,Individual,2017,12,1800.0,Political Party,37408, 99 | 
97,Liberal Democrats,2017-12-31,Mr Chris White,Individual,2017,12,2072.0,Political Party,83329, 100 | 98,Liberal Democrats,2017-12-31,Mr Tom Gosling,Individual,2017,12,10000.0,Political Party,78783, 101 | 99,Liberal Democrats,2017-12-31,Mr Alan Sherwell,Individual,2017,12,1750.0,Political Party,78672, 102 | 100,Liberal Democrats,2017-12-31,Mr Gerald Vernon-Jackson,Individual,2017,12,2290.68,Political Party,34481, 103 | 101,Liberal Democrats,2017-12-31,Mr Keith Crout,Individual,2017,12,1512.0,Political Party,76635, 104 | 102,Liberal Democrats,2017-12-30,Mr Alexey Chudnovskiy,Individual,2017,12,4500.0,Political Party,78673, 105 | 103,Labour Party,2017-12-29,Stella Creasy,Individual,2017,12,1947.05,Political Party,74319, 106 | 104,Liberal Democrats,2017-12-29,Ms Joanna Kenny,Individual,2017,12,1200.0,Political Party,47811, 107 | 105,Conservative and Unionist Party,2017-12-28,Mike Penning,Individual,2017,12,1510.0,Political Party,72665, 108 | 106,Conservative and Unionist Party,2017-12-28,Mr Anthony H Billingham,Individual,2017,12,5000.0,Political Party,37981, 109 | 107,Liberal Democrats,2017-12-27,Mr Peter Wilson,Individual,2017,12,8465.0,Political Party,83334, 110 | 108,Liberal Democrats,2017-12-27,Mrs Susan Howes,Individual,2017,12,5000.0,Political Party,78780, 111 | 109,Liberal Democrats,2017-12-27,Mr Greg Dyke,Individual,2017,12,5000.0,Political Party,78068, 112 | 110,Conservative and Unionist Party,2017-12-27,Mr John B Rutter,Individual,2017,12,1725.0,Political Party,53944, 113 | 111,Conservative and Unionist Party,2017-12-22,Ms Christine E Dawood,Individual,2017,12,12500.0,Political Party,83836, 114 | 112,Conservative and Unionist Party,2017-12-22,Mr Alan T Wicnh,Individual,2017,12,10000.0,Political Party,83878, 115 | 113,Labour Party,2017-12-21,Mr Stephen Kinsella,Individual,2017,12,4957.0,Political Party,37535, 116 | 114,Conservative and Unionist Party,2017-12-21,Mr David W Gray,Individual,2017,12,9000.0,Political Party,83838, 117 | 115,Liberal 
Democrats,2017-12-21,Mr Robert Chicken,Individual,2017,12,22466.38,Political Party,83339, 118 | 116,Conservative and Unionist Party,2017-12-20,Mr Simon D Hume-Kendall,Individual,2017,12,10000.0,Political Party,83065, 119 | 117,Liberal Democrats,2017-12-20,Ms Jennifer Talbot,Individual,2017,12,15000.0,Political Party,78089, 120 | 118,Conservative and Unionist Party,2017-12-20,Sir Michael Hintze,Individual,2017,12,2500.0,Political Party,34254, 121 | 119,Conservative and Unionist Party,2017-12-18,Mr Jeremy J Lefroy,Individual,2017,12,2572.5,Political Party,47086, 122 | 120,Conservative and Unionist Party,2017-12-18,Mr Jeremy J Lefroy,Individual,2017,12,2946.0,Political Party,47086, 123 | 121,Liberal Democrats,2017-12-15,Mrs Gitte Dawson,Individual,2017,12,20000.0,Political Party,35363, 124 | 122,Conservative and Unionist Party,2017-12-15,Mr Sam Singh,Individual,2017,12,18500.0,Political Party,83845, 125 | 123,Conservative and Unionist Party,2017-12-15,Mr Michael C Warshaw,Individual,2017,12,1703.93,Political Party,54007, 126 | 124,Conservative and Unionist Party,2017-12-15,Mr Edmund G Truell,Individual,2017,12,50000.0,Political Party,34290, 127 | 125,Conservative and Unionist Party,2017-12-15,Mr Surinderpal Lit,Individual,2017,12,3150.0,Political Party,83867, 128 | 126,Conservative and Unionist Party,2017-12-15,Mr Stephen S Less,Individual,2017,12,2100.0,Political Party,83866, 129 | 127,Conservative and Unionist Party,2017-12-14,Mr Stephen Howard,Individual,2017,12,2520.0,Political Party,37957, 130 | 128,Conservative and Unionist Party,2017-12-14,Lord Stanley Fink,Individual,2017,12,111600.0,Political Party,47072, 131 | 129,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,2258.0,Political Party,77944, 132 | 130,Conservative and Unionist Party,2017-12-14,Mr Daniel P Hearsum,Individual,2017,12,10737.72,Political Party,76733, 133 | 131,Conservative and Unionist Party,2017-12-14,Baroness Emma Nicholson,Individual,2017,12,2499.99,Political 
Party,76310, 134 | 132,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,900.0,Political Party,77944, 135 | 133,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,6000.0,Political Party,77944, 136 | 134,Conservative and Unionist Party,2017-12-14,Mr Nicholas Brougham,Individual,2017,12,2499.0,Political Party,83060, 137 | 135,Conservative and Unionist Party,2017-12-14,Dr Arujuna Sivananthan,Individual,2017,12,2499.0,Political Party,83069, 138 | 136,Conservative and Unionist Party,2017-12-14,Mr Mark J Page,Individual,2017,12,6250.0,Political Party,83844, 139 | 137,Conservative and Unionist Party,2017-12-14,Mr Michael Davis,Individual,2017,12,271000.0,Political Party,34240, 140 | 138,Conservative and Unionist Party,2017-12-13,Mr Michael J Wade,Individual,2017,12,25000.0,Political Party,47059, 141 | 139,Labour Party,2017-12-13,Mr Stephen Kinsella,Individual,2017,12,4957.0,Political Party,37535, 142 | 140,Liberal Democrats,2017-12-13,Mr Mark Petterson,Individual,2017,12,25000.0,Political Party,55995, 143 | 141,Green Party,2017-12-12,Mr Roger Manser,Individual,2017,12,1500.0,Political Party,54660, 144 | 142,Conservative and Unionist Party,2017-12-12,Mr Richard J Grimes,Individual,2017,12,1900.0,Political Party,67265, 145 | 143,Conservative and Unionist Party,2017-12-12,Mr Palminder Singh,Individual,2017,12,2500.0,Political Party,67567, 146 | 144,Labour Party,2017-12-11,Lord Charles Falconer of Thoroton,Individual,2017,12,833.0,Political Party,83993, 147 | 145,UK Independence Party (UKIP),2017-12-11,Professor Tim Congdon,Individual,2017,12,5000.0,Political Party,38170, 148 | 146,Conservative and Unionist Party,2017-12-11,Mr Arthur P Davidson,Individual,2017,12,10000.0,Political Party,83835, 149 | 147,Conservative and Unionist Party,2017-12-11,Mr Mohamed Mansour,Individual,2017,12,12500.0,Political Party,83832, 150 | 148,Conservative and Unionist Party,2017-12-11,Mr Peter 
Cruddas,Individual,2017,12,12500.0,Political Party,67253, 151 | 149,Conservative and Unionist Party,2017-12-10,Mr Alan C Bolton,Individual,2017,12,7000.0,Political Party,77933, 152 | 150,Scottish National Party (SNP),2017-12-07,Mr Ian McNish,Individual,2017,12,42732.63,Political Party,78128, 153 | 151,Green Party,2017-12-07,Ms Jean Lambert MEP,Individual,2017,12,600.0,Political Party,34382, 154 | 152,UK Independence Party (UKIP),2017-12-07,Mr Ian Pirie,Individual,2017,12,5000.0,Political Party,74686, 155 | 153,Conservative and Unionist Party,2017-12-07,Mr Ian H Leslie-Melville,Individual,2017,12,5000.0,Political Party,77946, 156 | 154,Conservative and Unionist Party,2017-12-07,Mr Michael Slade,Individual,2017,12,5000.0,Political Party,36409, 157 | 155,Conservative and Unionist Party,2017-12-07,Lady Sarah L Keswick,Individual,2017,12,12500.0,Political Party,83840, 158 | 156,Conservative and Unionist Party,2017-12-06,Mr John D Lovering,Individual,2017,12,5000.0,Political Party,45696, 159 | 157,Liberal Democrats,2017-12-06,Mr Dinesh Dhamija,Individual,2017,12,5000.0,Political Party,54347, 160 | 158,Conservative and Unionist Party,2017-12-05,Mr Andrew D Williams,Individual,2017,12,4350.0,Political Party,83861, 161 | 159,Conservative and Unionist Party,2017-12-05,Ms Alison Frost,Individual,2017,12,10000.0,Political Party,37975, 162 | 160,Conservative and Unionist Party,2017-12-04,Mr Raymond Chamberlain,Individual,2017,12,8750.0,Political Party,52210, 163 | 161,Green Party,2017-12-04,Ms Elizabeth Reason,Individual,2017,12,1000.0,Political Party,77455, 164 | 162,Conservative and Unionist Party,2017-12-04,Mr Malcolm Bluemel,Individual,2017,12,50000.0,Political Party,72651, 165 | 163,UK Independence Party (UKIP),2017-12-04,Mrs Catherine Pirie,Individual,2017,12,2000.0,Political Party,74685, 166 | 164,Conservative and Unionist Party,2017-12-04,Mr Andrew Law,Individual,2017,12,13750.0,Political Party,52218, 167 | 165,Scottish Green Party,2017-12-01,Ms Alison 
Johnstone,Individual,2017,12,7956.0,Political Party,45119, 168 | 166,Conservative and Unionist Party,2017-12-01,Mr Simon M Haslam,Individual,2017,12,2395.0,Political Party,44911, 169 | 167,Scottish Green Party,2017-12-01,Mr Patrick Harvie,Individual,2017,12,8196.0,Political Party,38155, 170 | 168,Scottish Green Party,2017-12-01,Mr Ross John Greer,Individual,2017,12,9036.0,Political Party,83319, 171 | 169,Labour Party,2017-12-01,Lord John Crawford,Individual,2017,12,22000.0,Political Party,83992, 172 | 170,Scottish Green Party,2017-12-01,Mr Mark Christopher Ruskell,Individual,2017,12,7805.16,Political Party,83321, 173 | 171,Scottish Green Party,2017-12-01,Mr Andrew Dearg Wightman,Individual,2017,12,7757.16,Political Party,83322, 174 | 172,Scottish Green Party,2017-12-01,Mr John Bradford Finnie,Individual,2017,12,7846.32,Political Party,83320, 175 | 173,Conservative and Unionist Party,2017-12-01,Mr Michael S Thronton OBE,Individual,2017,12,2000.0,Political Party,83855, 176 | 174,Conservative and Unionist Party,2017-11-30,Mr Henry Keswick,Individual,2017,11,5000.0,Political Party,34260, 177 | 175,Liberal Democrats,2017-11-30,Mrs Hazel Watson,Individual,2017,11,1568.08,Political Party,83349, 178 | 176,Renew,2017-11-30,Mr Richard Christopher Breen,Individual,2017,11,12971.08,Political Party,83311, 179 | 177,Conservative and Unionist Party,2017-11-30,Mr Gerald H Elliot,Individual,2017,11,8000.0,Political Party,83837, 180 | 178,Liberal Democrats,2017-11-29,Lord Paul Tyler,Individual,2017,11,2400.0,Political Party,72569, 181 | 179,Conservative and Unionist Party,2017-11-29,Mr Jeremy RS Hunt,Individual,2017,11,1947.9,Political Party,83870, 182 | 180,Labour Party,2017-11-28,Mr Edward John Izzard,Individual,2017,11,10000.0,Political Party,83990, 183 | 181,UK Independence Party (UKIP),2017-11-28,Lord D Stevens of Ludgate,Individual,2017,11,4000.0,Political Party,48879, 184 | 182,Liberal Democrats,2017-11-27,Mr Michael Watson,Individual,2017,11,2500.0,Political Party,83340, 185 
| 183,British National Party,2017-11-27,Ms Marina Smethhurst,Individual,2017,11,10000.0,Political Party,83362, 186 | 184,Labour Party,2017-11-27,Clive Hollick,Individual,2017,11,20000.0,Political Party,83994, 187 | 185,UK Independence Party (UKIP),2017-11-27,Mr Malcolm Bluemel,Individual,2017,11,2000.0,Political Party,83323, 188 | 186,Conservative and Unionist Party,2017-11-27,Mr Abdul-Majid Jafar,Individual,2017,11,12500.0,Political Party,81180, 189 | 187,Conservative and Unionist Party,2017-11-24,Mr Andrew Godson,Individual,2017,11,5500.0,Political Party,47473, 190 | 188,Conservative and Unionist Party,2017-11-22,Mr Stephen P J Matthews,Individual,2017,11,8500.0,Political Party,83843, 191 | 189,Labour Party,2017-11-22,Mr Douglas Reynolds,Individual,2017,11,10000.0,Political Party,83997, 192 | 190,Conservative and Unionist Party,2017-11-21,Ms Nadezda Rodicheva,Individual,2017,11,8500.0,Political Party,81128, 193 | 191,Labour Party,2017-11-21,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 194 | 192,Conservative and Unionist Party,2017-11-21,Ms Rania K Majeed,Individual,2017,11,11750.0,Political Party,83842, 195 | 193,Conservative and Unionist Party,2017-11-21,Mr Alexander Temerko,Individual,2017,11,4250.0,Political Party,43634, 196 | 194,Conservative and Unionist Party,2017-11-21,Mr Howard Leigh,Individual,2017,11,4250.0,Political Party,36384, 197 | 195,Labour Party,2017-11-21,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 198 | 196,Conservative and Unionist Party,2017-11-20,Mr Thomas E Notman,Individual,2017,11,10000.0,Political Party,83859, 199 | 197,Conservative and Unionist Party,2017-11-19,Mrs Sarah J Pidgley,Individual,2017,11,5000.0,Political Party,83851, 200 | 198,Conservative and Unionist Party,2017-11-17,Mr Philip L Wroughton,Individual,2017,11,3975.0,Political Party,53955, 201 | 199,Conservative and Unionist Party,2017-11-17,Mr Kenneth J French,Individual,2017,11,5000.0,Political 
Party,83871, 202 | 200,Plaid Cymru - The Party of Wales,2017-11-17,Ms Jane Mactaggart,Individual,2017,11,61473.89,Political Party,83317, 203 | 201,Liberal Democrats,2017-11-16,Mr John Noel Penstone,Individual,2017,11,3000.0,Political Party,76667, 204 | 202,Conservative and Unionist Party,2017-11-16,Mrs Mary Erbrich,Individual,2017,11,60000.0,Political Party,76714, 205 | 203,Conservative and Unionist Party,2017-11-16,Mr Patrick D Horsfall,Individual,2017,11,2000.0,Political Party,83863, 206 | 204,Liberal Democrats,2017-11-15,Mr Anthony Harris,Individual,2017,11,5080.0,Political Party,54432, 207 | 205,Conservative and Unionist Party,2017-11-15,Ms Jane Mactaggart,Individual,2017,11,10000.0,Political Party,37909, 208 | 206,Conservative and Unionist Party,2017-11-15,Mr Zac F Goldsmith,Individual,2017,11,5000.0,Political Party,69726, 209 | 207,Liberal Democrats,2017-11-15,Mr Stephen Dawson,Individual,2017,11,5000.0,Political Party,74657, 210 | 208,Conservative and Unionist Party,2017-11-15,Mr Peter Brown,Individual,2017,11,5000.0,Political Party,83826, 211 | 209,Labour Party,2017-11-14,Mr Tony Belton,Individual,2017,11,3000.0,Political Party,48385, 212 | 210,Women's Equality Party,2017-11-14,Mr Jonathan Leslie Skeet,Individual,2017,11,10000.0,Political Party,76669, 213 | 211,Labour Party,2017-11-13,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 214 | 212,Liberal Democrats,2017-11-12,William Goodhart,Individual,2017,11,20000.0,Political Party,34527, 215 | 213,Conservative and Unionist Party,2017-11-10,Mr Roger J Kendrick,Individual,2017,11,3000.0,Political Party,83820, 216 | 214,Conservative and Unionist Party,2017-11-07,Mr Michael Davis,Individual,2017,11,1000.0,Political Party,34240, 217 | 215,Labour Party,2017-11-06,Mr William Haughey,Individual,2017,11,30000.0,Political Party,37529, 218 | 216,Green Party,2017-11-05,Ms Jean Lambert MEP,Individual,2017,11,600.0,Political Party,34382, 219 | 217,Conservative and Unionist 
Party,2017-11-03,Ms Dora Bertolutti,Individual,2017,11,2400.0,Political Party,83883, 220 | 218,Conservative and Unionist Party,2017-11-03,Mr Neville A Baxter,Individual,2017,11,10000.0,Political Party,34323, 221 | 219,Conservative and Unionist Party,2017-11-01,Mr Oluwole Kolade,Individual,2017,11,52500.0,Political Party,38762, 222 | 220,Liberal Democrats,2017-11-01,Mr Anthony Bunker,Individual,2017,11,3750.0,Political Party,37466, 223 | 221,Conservative and Unionist Party,2017-11-01,Mr Stephen L Massey,Individual,2017,11,11000.0,Political Party,36454, 224 | 222,Conservative and Unionist Party,2017-10-31,Hon George T Farmer,Individual,2017,10,8500.0,Political Party,83827, 225 | 223,Conservative and Unionist Party,2017-10-31,Mr Laurence Hollingworth,Individual,2017,10,1000.0,Political Party,67568, 226 | 224,Liberal Democrats,2017-10-31,Ms Anna Gallop,Individual,2017,10,2000.0,Political Party,83357, 227 | 225,Liberal Democrats,2017-10-31,Mr Vincent Cable,Individual,2017,10,2000.0,Political Party,82964, 228 | 226,Conservative and Unionist Party,2017-10-31,Mr John D Booth,Individual,2017,10,5000.0,Political Party,81007, 229 | 227,Conservative and Unionist Party,2017-10-30,Mrs Esme Forbes,Individual,2017,10,38295.29,Political Party,76306, 230 | 228,Conservative and Unionist Party,2017-10-30,Mr Iain Aitken,Individual,2017,10,2000.0,Political Party,83856, 231 | 229,Conservative and Unionist Party,2017-10-30,Mr Byron S Huson,Individual,2017,10,400000.0,Political Party,83882, 232 | 230,Conservative and Unionist Party,2017-10-25,Mr Navroz D Udwadia,Individual,2017,10,10000.0,Political Party,83846, 233 | 231,Conservative and Unionist Party,2017-10-25,Mr James R Lupton,Individual,2017,10,3900.0,Political Party,34266, 234 | 232,Conservative and Unionist Party,2017-10-25,Mr James R Lupton,Individual,2017,10,50000.0,Political Party,34266, 235 | 233,Conservative and Unionist Party,2017-10-25,Mr Ravi S Kailas,Individual,2017,10,2000.0,Political Party,74743, 236 | 234,Conservative 
and Unionist Party,2017-10-23,Mr Nicholas Campsie,Individual,2017,10,10000.0,Political Party,76713, 237 | 235,Conservative and Unionist Party,2017-10-23,Mr Nicholas N Moore,Individual,2017,10,2000.0,Political Party,83872, 238 | 236,Conservative and Unionist Party,2017-10-23,Mr William H Salomon,Individual,2017,10,25000.0,Political Party,36405, 239 | 237,Green Party,2017-10-22,Ms Jean Lambert MEP,Individual,2017,10,600.0,Political Party,34382, 240 | 238,Conservative and Unionist Party,2017-10-19,Mr Ian R Taylor,Individual,2017,10,100000.0,Political Party,76312, 241 | 239,Conservative and Unionist Party,2017-10-18,Lord Philip Harris,Individual,2017,10,10000.0,Political Party,34251, 242 | 240,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,1200.0,Political Party,76334, 243 | 241,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,600.0,Political Party,76334, 244 | 242,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,600.0,Political Party,76334, 245 | 243,Conservative and Unionist Party,2017-10-16,Mr Amjad Bseisu,Individual,2017,10,12500.0,Political Party,74074, 246 | 244,Conservative and Unionist Party,2017-10-16,Mr Gary Lydiate,Individual,2017,10,20000.0,Political Party,81113, 247 | 245,Conservative and Unionist Party,2017-10-13,Mr Duncan Greenland,Individual,2017,10,1871.31,Political Party,83876, 248 | 246,Conservative and Unionist Party,2017-10-11,Mr Roger G Orf,Individual,2017,10,8500.0,Political Party,47055, 249 | 247,Labour Party,2017-10-11,Ms Jane Turnell-Read,Individual,2017,10,10000.0,Political Party,83998, 250 | 248,Conservative and Unionist Party,2017-10-11,Ms Lesley Jackson,Individual,2017,10,100000.0,Political Party,38745, 251 | 249,UK Independence Party (UKIP),2017-10-09,Mr Andrew Perloff,Individual,2017,10,20000.0,Political Party,46042, 252 | 250,Conservative and Unionist Party,2017-10-09,Mr Donald J Lewin,Individual,2017,10,10000.0,Political 
Party,83841, 253 | 251,Conservative and Unionist Party,2017-10-09,Mr Peter Kane,Individual,2017,10,50000.0,Political Party,34257, 254 | 252,Conservative and Unionist Party,2017-10-09,Mr Roger Nagioff,Individual,2017,10,50000.0,Political Party,36392, 255 | 253,Conservative and Unionist Party,2017-10-09,Mr Christopher M Higgins,Individual,2017,10,2300.0,Political Party,84031, 256 | 254,Conservative and Unionist Party,2017-10-09,Mr Michael A Dangoor,Individual,2017,10,56600.0,Political Party,83834, 257 | 255,Conservative and Unionist Party,2017-10-09,Mr Robert D Calrow,Individual,2017,10,21000.0,Political Party,38757, 258 | 256,Conservative and Unionist Party,2017-10-09,Mr Michael Cohen,Individual,2017,10,8100.0,Political Party,83833, 259 | 257,Scottish National Party (SNP),2017-10-06,Mr Ian McNish,Individual,2017,10,175000.0,Political Party,78128, 260 | 258,Labour Party,2017-10-05,Lord Charles Falconer of Thoroton,Individual,2017,10,833.0,Political Party,83993, 261 | 259,Conservative and Unionist Party,2017-10-04,Mr Arthur J Taylor,Individual,2017,10,1500.0,Political Party,49844, 262 | 260,Plaid Cymru - The Party of Wales,2017-10-02,Mr David Charles Williams,Individual,2017,10,40357.67,Political Party,83316, 263 | 261,Conservative and Unionist Party,2017-09-01,Mr Robert J Madejski,Individual,2017,9,1875.0,Political Party,34372, 264 | 262,Liberal Democrats,2017-08-31,Mr Andrew Pinnock,Individual,2017,8,1866.0,Political Party,83358, 265 | 263,Liberal Democrats,2017-08-30,Lord John Lee Of Trafford,Individual,2017,8,2400.0,Political Party,72163, 266 | 264,UK Independence Party (UKIP),2017-08-18,Mr Ian Pirie,Individual,2017,8,1000.0,Political Party,74686, 267 | 265,Liberal Democrats,2017-08-06,Mr Mark Petterson,Individual,2017,8,4000.0,Political Party,55995, 268 | 266,Conservative and Unionist Party,2017-07-17,Ms Rachel H MacLean,Individual,2017,7,2000.0,Political Party,83086, 269 | 267,Conservative and Unionist Party,2017-07-17,Ms Rachel H 
MacLean,Individual,2017,7,2000.0,Political Party,83086, 270 | 268,UK Independence Party (UKIP),2017-06-30,Mr Brett Hammond,Individual,2017,6,650.0,Political Party,83324, 271 | 269,Liberal Democrats,2017-06-30,Ms Jane Mactaggart,Individual,2017,6,23000.0,Political Party,55988, 272 | 270,Conservative and Unionist Party,2017-06-24,Mr Elizabeth A Gooch,Individual,2017,6,4000.0,Political Party,47085, 273 | 271,Liberal Democrats,2017-06-08,Mr Albert Mcintosh,Individual,2017,6,5000.0,Political Party,82944, 274 | 272,Liberal Democrats,2017-06-05,Mr Mark Petterson,Individual,2017,6,1500.0,Political Party,55995, 275 | 273,Liberal Democrats,2017-06-01,Ms Karin Snowden,Individual,2017,6,1750.0,Political Party,83333, 276 | 274,UK Independence Party (UKIP),2017-05-30,Mr Ian Pirie,Individual,2017,5,3000.0,Political Party,74686, 277 | 275,UK Independence Party (UKIP),2017-05-30,Mr Malcolm Bluemel,Individual,2017,5,1000.0,Political Party,83323, 278 | 276,UK Independence Party (UKIP),2017-05-26,Mr Malcolm Bluemel,Individual,2017,5,3000.0,Political Party,83323, 279 | 277,Conservative and Unionist Party,2017-05-23,Mr Jeremy J Hosking,Individual,2017,5,5000.0,Political Party,38786, 280 | 278,Labour Party,2017-05-22,Lord Charles Falconer of Thoroton,Individual,2017,5,1500.0,Political Party,83993, 281 | 279,Conservative and Unionist Party,2017-05-12,Mr Neil Record,Individual,2017,5,3000.0,Political Party,36402, 282 | 280,Conservative and Unionist Party,2017-05-11,Mrs Menju Mehrotra,Individual,2017,5,2500.0,Political Party,83864, 283 | 281,Conservative and Unionist Party,2017-05-11,Mr Ravi Mehrotra,Individual,2017,5,2500.0,Political Party,83865, 284 | 282,Conservative and Unionist Party,2017-05-05,Mr Peter D Landale,Individual,2017,5,5000.0,Political Party,83857, 285 | 283,Conservative and Unionist Party,2017-05-02,Mr Henry Keswick,Individual,2017,5,3000.0,Political Party,34260, 286 | 284,Conservative and Unionist Party,2017-04-29,Mr Daniel D Laycock,Individual,2017,4,2500.0,Political 
Party,83824, 287 | 285,Conservative and Unionist Party,2017-04-28,Mr Edward P Weatherall,Individual,2017,4,5000.0,Political Party,83858, 288 | 286,Labour Party,2017-04-24,Mr John O'Hara,Individual,2017,4,1000.0,Political Party,83996, 289 | 287,Conservative and Unionist Party,2017-04-20,Mr Terence F Parkinson,Individual,2017,4,5000.0,Political Party,44896, 290 | 288,Conservative and Unionist Party,2017-04-19,Mr Terence F Parkinson,Individual,2017,4,5000.0,Political Party,44896, 291 | 289,Liberal Democrats,2017-04-08,Lord John Alderdice,Individual,2017,4,2400.0,Political Party,34531, 292 | 290,UK Independence Party (UKIP),2017-03-31,Mr Duncan Greenland,Individual,2017,3,650.0,Political Party,83324, 293 | 291,Conservative and Unionist Party,2017-03-27,Mr Jeremy Hand,Individual,2017,3,2500.0,Political Party,83880, 294 | 292,UK Independence Party (UKIP),2017-02-15,Professor Tim Congdon,Individual,2017,2,2500.0,Political Party,38170, 295 | 293,UK Independence Party (UKIP),2017-02-08,Lord D Stevens of Ludgate,Individual,2017,2,1000.0,Political Party,48879, 296 | 294,UK Independence Party (UKIP),2017-01-26,Professor Tim Congdon,Individual,2017,1,5000.0,Political Party,38170, 297 | 295,UK Independence Party (UKIP),2017-01-26,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,83323, 298 | 296,UK Independence Party (UKIP),2017-01-26,Lord D Stevens of Ludgate,Individual,2017,1,5000.0,Political Party,48879, 299 | 297,UK Independence Party (UKIP),2017-01-13,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,74686, 300 | 298,UK Independence Party (UKIP),2017-01-12,Mr Malcolm Bluemel,Individual,2017,1,1000.0,Political Party,83323, 301 | 299,Labour Party,2017-01-10,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,83996, 302 | -------------------------------------------------------------------------------- /3 analyse data/3 Analyse data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 
| "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Analyse data with Python Pandas" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to this Jupyter Notebook! \n", 15 | " \n", 16 | "Today you'll learn how to import a CSV file into a Jupyter Notebook, and how to analyse already cleaned data. This notebook is part of the course Python for Journalists at [datajournalism.com](https://datajournalism.com/watch/python-for-journalists). The data used originally comes from [the Electoral Commission website](http://search.electoralcommission.org.uk/Search?currentPage=1&rows=10&sort=AcceptedDate&order=desc&tab=1&open=filter&et=pp&isIrishSourceYes=false&isIrishSourceNo=false&date=Reported&from=&to=&quarters=2018Q12&rptPd=3617&prePoll=false&postPoll=false&donorStatus=individual&donorStatus=tradeunion&donorStatus=company&donorStatus=unincorporatedassociation&donorStatus=publicfund&donorStatus=other&donorStatus=registeredpoliticalparty&donorStatus=friendlysociety&donorStatus=trust&donorStatus=limitedliabilitypartnership&donorStatus=impermissibledonor&donorStatus=na&donorStatus=unidentifiabledonor&donorStatus=buildingsociety&register=ni&register=gb&optCols=Register&optCols=IsIrishSource&optCols=ReportingPeriodName), but is edited for training purposes. The edited dataset is available on the course website and its [GitHub repo](https://github.com/winnydejong/pythonforjournalists). \n", 17 | "\n", 18 | "## About Jupyter Notebooks and Pandas\n", 19 | "\n", 20 | "Right now you're looking at a Jupyter Notebook: an interactive, browser-based programming environment. You can use these notebooks to program in R, Julia or Python - as you'll be doing later on. Read more about Jupyter Notebook in the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). \n", 21 | " \n", 22 | "To analyse our data, we'll be using Python and Pandas. 
Pandas is an open-source Python library - basically an extra toolkit to go with Python - that is designed for data analysis. Pandas is flexible, easy to use and has lots of useful functions built right in. Read more about Pandas and its features in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/). Because Pandas works in ways similar to both spreadsheets and SQL databases (though the latter won't be discussed in this course), it is beginner friendly. :) \n", 23 | "\n", 24 | "**Notebook shortcuts** \n", 25 | "\n", 26 | "Within Jupyter Notebooks, there are some shortcuts you can use. If you'll be using more notebooks for your data analysis in the future, you'll remember these shortcuts soon enough. :) \n", 27 | "\n", 28 | "* `esc` will take you into command mode\n", 29 | "* `a` will insert a cell above\n", 30 | "* `b` will insert a cell below\n", 31 | "* `shift and tab` will show you the documentation for your code\n", 32 | "* `shift and enter` will run your cell\n", 33 | "* `d d` will delete a cell\n", 34 | "\n", 35 | "**Pandas dictionary**\n", 36 | "\n", 37 | "* **dataframe**: dataframe is Pandas speak for a table with a labeled y-axis, also known as an index. (The index usually starts at 0.)\n", 38 | "* **series**: a series is a labeled list of values; a single column within a dataframe is a series.\n", 39 | "\n", 40 | "Before we dive in, a little more about Jupyter Notebooks. Every notebook is made out of cells. A cell can either contain Markdown text - like this one - or code. In the latter you can execute your code. To see what that means, type the following command in the next cell: `print(\"hello world\")`." 
41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "## Getting started" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "In the module 'Clean data' from this course, we cleaned up a dataset with donations to political parties in the UK. Now, we're going to analyse the data in that dataset. Let's start by importing the Pandas library, using `import pandas as pd`." 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": { 68 | "collapsed": true, 69 | "jupyter": { 70 | "outputs_hidden": true 71 | } 72 | }, 73 | "outputs": [], 74 | "source": [] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "Now import the cleaned dataset, using `df = pd.read_csv('/path/to/file_with_clean_data.csv')`." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "## Importing data" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": null, 93 | "metadata": {}, 94 | "outputs": [], 95 | "source": [] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "Let's see if the data is anything like you'd expect: use `df.head()`, `df.tail()` or `df.sample()`." 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "Whoops! When we saved the data after cleaning it, the index was saved in an unnamed column. On import, Pandas added a new index... Let's get rid of the 'Unnamed: 0' column. Drop it like it's hot... `df = df.drop('Unnamed: 0', axis=1)`." 
 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": { 122 | "collapsed": true, 123 | "jupyter": { 124 | "outputs_hidden": true 125 | } 126 | }, 127 | "outputs": [], 128 | "source": [] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "Let's see if this worked: use `df.head()`, `df.tail()` or `df.sample()`.\n", 135 | "\n" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": {}, 142 | "outputs": [], 143 | "source": [] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "Now, if this looks better, let's get started and analyse some data.\n", 150 | "\n", 151 | "# Analyse data\n", 152 | "\n", 153 | "## Statistical summary\n", 154 | "\n", 155 | "In the module Clean data, you already saw the power of `df.describe()`. By default, this function gives a basic statistical summary of the numeric columns in the dataset. It will cover every column when you tell the function that you want everything included, like this: `df.describe(include='all')`." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "For columns with numeric values, `df.describe()` will give back the most information. Here's a full list of the parameters and their meaning: \n", 170 | "\n", 171 | "**df.describe() parameters**\n", 172 | "* **count**: number of non-missing values in that column\n", 173 | "* **unique**: number of unique values in that column\n", 174 | "* **top**: the most common value in that column\n", 175 | "* **freq**: the most common value’s frequency\n", 176 | "* **mean**: average\n", 177 | "* **std**: standard deviation\n", 178 | "* **min**: minimum value, lowest value in the column\n", 179 | "* **25%**: first quartile (25th percentile)\n", 180 | "* **50%**: second 
quartile, this is the same as the median\n", 181 | "* **75%**: third quartile (75th percentile)\n", 182 | "* **max**: maximum value, highest value in the column\n", 183 | "\n", 184 | "If a column does not contain numeric values, only the parameters that are applicable are calculated. Pandas gives you NaN-values for the others - NaN is short for Not a Number. \n", 185 | "\n", 186 | "Notice that 'count' is 300 for every column. This means that every column has a value for every row in the dataset. How do I know? I looked at the total number of rows, using `df.shape`." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "## Filter\n", 201 | "\n", 202 | "Let's try to filter the dataframe based on the value in the Value column. You can do this using `df[df['Value'] > 10000]`. This will give you a dataframe with only donations of more than 10,000 pounds." 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "## Sort\n", 217 | "Let's try to sort the data. Using the command `df.sort_values(by='column_name')` will sort the dataframe based on the column of your choosing. By default, sorting is ascending: from small to big. \n", 218 | "\n", 219 | "In case you want to sort from big to small, descending, you'll have to type: `df.sort_values(by='column_name', ascending=False)`.\n", 220 | "\n", 221 | "Now, let's sort the dataframe based on the number in the Value column; that makes it easy to find out who made the biggest donation. \n", 222 | "\n", 223 | "The above commands return a sorted copy of the dataframe; the original `df` keeps its order unless you store the result, for example with `df = df.sort_values(by='column_name')`. 
To sort the data and show only the top 10 of the new order, combine the command with `.head(10)` like this: `df.sort_values(by='column_name').head(10)`.\n", 224 | "\n", 225 | "Now, what would you type if you want to see the 10 smallest donations?" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "If you want to see the biggest donations made, there are two ways to do that. You could use `df.sort_values(by='Value').tail(10)` to see the last 10 rows of the sorted dataframe. Since it is ordered from small to big donation, the biggest donations will be in the last 10 rows.\n", 240 | "\n", 241 | "Another way of doing this is using `df.sort_values(by='Value', ascending=False).head(10)`. This sorts the dataframe based on the Value column from big to small. Personally I prefer the latter... " 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": null, 247 | "metadata": {}, 248 | "outputs": [], 249 | "source": [] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "## Sum\n", 256 | "\n", 257 | "Wow! There are some big donations in our dataset. If you want to know how much money was donated in total, you need to get the sum of the column Value. Use `df['Value'].sum()`." 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "## Count" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "Let's look at the receivers of all this donation money. 
Use `df['RegulatedEntityName'].count()` to count the number of times a regulated entity received a donation.\n", 279 | "\n" 280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": null, 285 | "metadata": {}, 286 | "outputs": [], 287 | "source": [] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "Not really what we were looking for, right? Using `.count()` gives you the number of non-missing values in a column, not the number of appearances per unique value in the column. \n", 294 | "\n", 295 | "You'll need to use `df['RegulatedEntityName'].value_counts()` if you want to know that... " 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "Ok. Let's see if you really understand the difference between `.value_counts()` and `.count()`. If you want to know how often each donor has donated, you should count the values in the DonorName column. Do you use `df['DonorName'].value_counts()` or `df['DonorName'].count()`?\n", 310 | "\n", 311 | "When in doubt, try both. Remember: we're using a Jupyter Notebook here. It's a **Notebook**, so you can't go wrong here. :)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "Interesting: apparently Ms Jane Mactaggart, Mr Duncan Greenland, and Lord Charles Falconer of Thoroton have donated most often. Let's look into that...\n", 333 | "\n", 334 | "## Groupby\n", 335 | "If you're familiar with Excel, you've probably heard of 'pivot tables'. 
Python Pandas has a function very similar to those pivot tables. \n", 336 | "\n", 337 | "Let's start with a small refresher: pivot tables are summaries of a dataset inside a new table. Huh? That might be a lot to take in. \n", 338 | "\n", 339 | "Look at our example: data on donations to political parties in the UK. If we want to know how much each unique donor donated, we are looking for a specific summary of our dataset. To get the answer to the question 'How much have Ms Jane Mactaggart, Mr Duncan Greenland, and Lord Charles Falconer of Thoroton donated in total?', we need Pandas to sum up all donations for every donor in the dataframe. In a way, this is a summary of the original dataframe, made by grouping values by - in this case - the column DonorName. \n", 340 | "\n", 341 | "In Pandas this can be done using the groupby function. Let's create a new series called donors, that has all donors and the total sum of their donations in there. Use `donors = df.groupby('DonorName')['Value'].sum()`. This is a combination of several functions: group data by 'DonorName', and sum the data in the 'Value' column..." 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "To see if it worked, you'll have to add `donors.head(10)`, otherwise your computer won't know that you actually want to see the result of your effort. :)\n", 356 | "\n", 357 | "## Pivot tables\n", 358 | "\n", 359 | "But Pandas has its own pivot table as well. You can get a similar result in a better looking table using the `df.pivot_table` function. \n", 360 | "\n", 361 | "Here's a perfectly fine `.pivot_table` example:\n", 362 | "`df.pivot_table(values=\"Value\", index=\"DonorName\", columns=\"Year\", aggfunc='sum').sort_values(2018).head(10)`\n", 363 | "\n", 364 | "Let's go over this code before running it. 
What will `df.pivot_table(values=\"Value\", index=\"DonorName\", columns=\"Year\", aggfunc='sum').sort_values(2018).head(10)` actually do? \n", 365 | "\n", 366 | "For the dataframe called df, create a pivot table where: \n", 367 | "- the values in the pivot table are based on the Value column\n", 368 | "- the index of the pivot table is based on the DonorName column, in other words: create a row for every unique value in the DonorName column\n", 369 | "- there is a new column for every unique value in the Year column\n", 370 | "- the data that fills up these columns (from the Value column, see?) is aggregated by summing it for every row. \n", 371 | "\n", 372 | "Finally, `.sort_values(2018)` sorts the pivot table by the 2018 column, from small to big, and `.head(10)` shows the first 10 rows. Are you ready to try it yourself?" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## Save your data\n", 387 | "\n", 388 | "Now that we've put all this work into our dataset, let's save a copy. Of course Pandas has a nifty command for that too. Use `dataframe.to_csv('filename.csv', encoding='utf8')`. \n", 389 | "\n", 390 | "Beware: use a different name than the filename of the original data file, or it will be overwritten. " 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": null, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "In case you want to check if a new file was created in your directory, you can use the `pwd` and `ls` commands. At the beginning of this module, we used these commands to print the working directory (`pwd`) and list the content of the working directory (`ls`). 
\n", 405 | "\n", 406 | "First, use `pwd` to see in which folder - also known as directory - you are:" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "Now use `ls` to get a list of all files in this directory. If everything worked, your newly saved data file should be among the files in the list. " 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [] 429 | } 430 | ], 431 | "metadata": { 432 | "kernelspec": { 433 | "display_name": "Python 3", 434 | "language": "python", 435 | "name": "python3" 436 | }, 437 | "language_info": { 438 | "codemirror_mode": { 439 | "name": "ipython", 440 | "version": 3 441 | }, 442 | "file_extension": ".py", 443 | "mimetype": "text/x-python", 444 | "name": "python", 445 | "nbconvert_exporter": "python", 446 | "pygments_lexer": "ipython3", 447 | "version": "3.7.5" 448 | }, 449 | "toc": { 450 | "nav_menu": {}, 451 | "number_sections": true, 452 | "sideBar": true, 453 | "skip_h1_title": true, 454 | "toc_cell": false, 455 | "toc_position": {}, 456 | "toc_section_display": "block", 457 | "toc_window_display": false 458 | } 459 | }, 460 | "nbformat": 4, 461 | "nbformat_minor": 4 462 | } 463 | -------------------------------------------------------------------------------- /3 analyse data/Extra 3 Analyse data exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Extra: 3 Analyse data exercises" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to this Jupyter Notebook! 
\n", 15 | "\n", 16 | "This notebook is part of the course Python for Journalists at [datajournalism.com](https://datajournalism.com/watch/python-for-journalists). The data used originally comes from [the Electoral Commission website](http://search.electoralcommission.org.uk/Search?currentPage=1&rows=10&sort=AcceptedDate&order=desc&tab=1&open=filter&et=pp&isIrishSourceYes=false&isIrishSourceNo=false&date=Reported&from=&to=&quarters=2018Q12&rptPd=3617&prePoll=false&postPoll=false&donorStatus=individual&donorStatus=tradeunion&donorStatus=company&donorStatus=unincorporatedassociation&donorStatus=publicfund&donorStatus=other&donorStatus=registeredpoliticalparty&donorStatus=friendlysociety&donorStatus=trust&donorStatus=limitedliabilitypartnership&donorStatus=impermissibledonor&donorStatus=na&donorStatus=unidentifiabledonor&donorStatus=buildingsociety&register=ni&register=gb&optCols=Register&optCols=IsIrishSource&optCols=ReportingPeriodName), but has been edited for training purposes. The edited dataset is available on the course website and its [GitHub repo](https://github.com/winnydejong/pythonforjournalists). \n", 17 | "\n", 18 | "This notebook contains some exercises for you to practice your newly learned skills with, after finishing module 3 of the Python for Journalists course. Note: since this is a later-added extra, there is no video to accompany this notebook.\n", 19 | "\n", 20 | "## About Jupyter Notebooks and Pandas\n", 21 | "Right now you're looking at a Jupyter Notebook: an interactive, browser-based programming environment. You can use these notebooks to program in R, Julia or Python - as you'll be doing later on. Read more about Jupyter Notebook in the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). \n", 22 | " \n", 23 | "To analyse our data, we'll be using Python and Pandas. 
Pandas is an open-source Python library - basically an extra toolkit to go with Python - that is designed for data analysis. Pandas is flexible, easy to use and has lots of useful functions built right in. Read more about Pandas and its features in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/). Because Pandas works in ways similar to both spreadsheets and SQL databases (though the latter won't be discussed in this course), it is beginner friendly. :) \n", 24 | "\n", 25 | "**Notebook shortcuts** \n", 26 | "\n", 27 | "Within Jupyter Notebooks, there are some shortcuts you can use. If you'll be using more notebooks for your data analysis in the future, you'll remember these shortcuts soon enough. :) \n", 28 | "\n", 29 | "* `esc` will take you into command mode\n", 30 | "* `a` will insert a cell above\n", 31 | "* `b` will insert a cell below\n", 32 | "* `shift and tab` will show you the documentation for your code\n", 33 | "* `shift and enter` will run your cell\n", 34 | "* `d d` will delete a cell\n", 35 | "\n", 36 | "**Pandas dictionary**\n", 37 | "\n", 38 | "* **dataframe**: dataframe is Pandas speak for a table with labeled rows; the row labels are known as the index. (The index usually starts at 0.)\n", 39 | "* **series**: a series is a one-dimensional, list-like sequence of values with an index; a single column of a dataframe is a series.\n", 40 | "\n", 41 | "Before we dive in, a little more about Jupyter Notebooks. Every notebook is made up of cells. A cell can contain either Markdown text - like this one - or code. In a code cell you can execute your code. To see what that means, type the following command in the next cell: `print(\"hello world\")`." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "# Setup\n", 56 | "\n", 57 | "During the exercises you'll use the Pandas library again. 
Import Pandas as pd here:" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "Now we need some data to work with; luckily you know how to import results_clean.csv. Don't you? " 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "toc-hr-collapsed": true 85 | }, 86 | "source": [ 87 | "# Explore the data\n", 88 | "\n", 89 | "Before you start with the exercises below, it's a good idea to get to know the data a bit. " 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "### Dimensions\n", 97 | "\n", 98 | "If you use ``len(df)``, Python will tell you how long your dataframe is; it will give you the number of rows of df." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "In case you'd like to know the number of rows and columns, you can use ``.shape``." 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "To get the total number of elements in the DataFrame, use the ``.size`` attribute. 
It will give you the product of the number of rows and the number of columns:\n" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "### Sample\n", 141 | "Please look at a sample of the dataset: use ``.sample()``." 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "Note that if you simply use ``.sample()``, you'll take a sample of 1. If you want to take a sample of size N, use ``.sample(N)``." 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": null, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "### Statistical description\n", 170 | "\n", 171 | "Use ``.describe()`` to look further into the data." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "# Clean data\n", 186 | "\n", 187 | "Whenever you export a dataframe to a csv, Pandas includes the index unless you explicitly tell it not to do so. When importing said csv again, the unnamed index returns as 'Unnamed: 0'. Let's remove that column using ``.drop()``." 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": null, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "Take a sample to check if it worked." 
 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "# Exercises" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "Like all journalism, most data journalism starts with a question. The difference between journalism and data-driven journalism is that the journalist working the story interviews data tables instead of people.\n", 223 | "\n", 224 | "Therefore all exercises start with a question. In newsrooms people both think and talk in questions all the time. So starting with a simple question means that most people following this course start in their comfort zone. Which is a nice place to begin, don't you think? \n", 225 | "\n", 226 | "**Every exercise follows the same pattern** \n", 227 | "- the question that needs to be answered: formulated to be easily understood by people\n", 228 | "- a breakdown of the question: formulated to be easily understood by computers\n", 229 | "- hints to help you write the code that the breakdown leads to\n", 230 | "- and, if you're looking at the completed notebook, the answer" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": { 236 | "toc-hr-collapsed": true 237 | }, 238 | "source": [ 239 | "## 1a Total donations per party" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "### Question\n", 247 | "Which party received the most money?" 
 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "### Breakdown\n", 255 | "- for every unique party in the dataset:\n", 256 | " - filter dataframe: only donations to said party;\n", 257 | " - while dataframe is filtered: sum the 'Value' column;\n", 258 | " - store both party name and sum of donations (add to list or df);\n", 259 | "- create table with all party names + sum of donations;\n", 260 | "- sort table from highest to lowest\n" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "### Hints\n", 268 | "This can be done using:\n", 269 | "- ``.unique()``, to create a list with all unique parties\n", 270 | "- a for-loop to iterate over all parties from that list\n", 271 | "- a filtered dataframe, using ``df[df['column data should be filtered on'] == 'value']``\n", 272 | "- ``.sum()``, to sum all donations in the filtered dataframe\n", 273 | "- ``.append()``, to add data to a list\n", 274 | "- ``pd.DataFrame()``, to turn a list of lists into a dataframe\n", 275 | "- ``sort_values(by='column name', ascending=False)``, to sort data\n", 276 | "\n", 277 | "Remember: this is a notebook. Use as many cells as you please. :) " 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "toc-hr-collapsed": true 291 | }, 292 | "source": [ 293 | "## 1b Percentage total donations per party" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "### Question\n", 301 | "What percentage of the total amount of donations made went to the Liberal Democrats?" 
 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "### Breakdown\n", 309 | "- calculate the total amount of donations made;\n", 310 | "- for every party:\n", 311 | " - calculate what percentage of the total they received;\n", 312 | " - store this percentage\n", 313 | "- add all percentages to dataframe\n", 314 | "- get percentage for Liberal Democrats" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "### Hints\n", 322 | "This can be done using:\n", 323 | "- ``.unique()``, to create a list with all unique parties\n", 324 | "- a for-loop to iterate over all parties from that list\n", 325 | "- a filtered dataframe, using ``df[df['column data should be filtered on'] == 'value']``\n", 326 | "- ``.sum()``, to sum all donations in the filtered dataframe\n", 327 | "- ``.append()``, to add data to a list\n", 328 | "- ``pd.DataFrame()``, to turn a list of lists into a dataframe\n", 329 | "- ``sort_values(by='column name', ascending=False)``, to sort data\n", 330 | "\n", 331 | "Note: this is 1b, it builds upon the dataframe you just created in exercise 1a. (In case you haven't finished that yet: do so first, or continue to exercise 2.)\n", 332 | "\n", 333 | "Oh, and remember: this is a notebook. Use as many cells as you please. :) " 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "## 1c Store data without index\n", 348 | "As you have seen throughout this course, when exporting data Pandas includes an index, unless stated otherwise. \n", 349 | "\n", 350 | "Let's say you'd like to store the 'donationsPerParty' dataframe as a csv. You know to use ``.to_csv()``. Adding ``index=False`` will give you the same result minus the often useless index. Try it! 
" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": {}, 357 | "outputs": [], 358 | "source": [] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": { 363 | "toc-hr-collapsed": true 364 | }, 365 | "source": [ 366 | "## 2a count donations throughout the year" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "### Question\n", 374 | "In which month are the most donations (count) made? " 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "### Breakdown\n", 382 | "- Group donations by year and month;\n", 383 | "- Use count, not sum." 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "### Hints\n", 391 | "\n", 392 | "This can be done using:\n", 393 | "- the ``pivot_table`` command from pandas\n", 394 | "\n", 395 | "When using the ``pivot_table`` command, I always find it helpful to 'design' my desired table. By which I mean filling in the following blanks: \n", 396 | "In the table that answers my question ('In which month are the most donations (count) made?'); there is a row for every _____________________;\n", 397 | " - and a column for every _____________________; and the value in the cells is based on _____________________. \n", 398 | " \n", 399 | "By thinking your pivot table through, the making of it is literally a fill-in-the-blanks exercise. 
" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "The ``.pivot_table`` method looks like this: \n", 414 | "\n", 415 | "``df.pivot_table(index=\"column(s) you want to use as index; use the 'for every row'-blank\", \n", 416 | " columns=\"for every unique value in this column, there will be a column made\n", 417 | " in your pivot table; use the 'column for every'-blank\",\n", 418 | " values=\"whatever you fill in here will populate the cells; use the 'value in cells'-blank\",\n", 419 | " aggfunc='count')``\n", 420 | "\n", 421 | "By filling out the blanks beforehand, you will probably be able to fill in the ``.pivot_table`` method. Note: you don't need to define the columns..." 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "metadata": {}, 428 | "outputs": [], 429 | "source": [] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": { 434 | "toc-hr-collapsed": true 435 | }, 436 | "source": [ 437 | "## 2b sum donations throughout the year " 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "### Question\n", 445 | "In which month is the most money (sum donations) donated? " 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "### Breakdown\n", 453 | "- Group donations by year and month;\n", 454 | "- Use sum, not count." 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "### Hints\n", 462 | "\n", 463 | "Similar to 2a, use the ``pivot_table`` command from pandas.\n", 464 | "\n", 465 | "Again, when using the ``pivot_table`` command, I always find it helpful to 'design' my desired table. 
By \n", 466 | "which I mean filling in the following blanks: \n", 467 | "In the table that answers my question ('In which month is the most money (sum donations) donated?'); there is a row for every _____________________;\n", 468 | " - and a column for every _____________________; and the value in the cells is based on _____________________. \n", 469 | " \n", 470 | "By thinking your pivot table through, the making of it is literally a fill-in-the-blanks exercise. \n", 471 | "\n", 472 | "But there's a difference. Instead of counting, we now want to sum values in our pivot table. To do that, I'll be using the sum function from numpy, a different Python library. \n", 473 | "\n", 474 | "Before we can get to it, we need to import numpy using ``import numpy as np``." 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": null, 480 | "metadata": {}, 481 | "outputs": [], 482 | "source": [] 483 | }, 484 | { 485 | "cell_type": "markdown", 486 | "metadata": {}, 487 | "source": [ 488 | "With that out of our way, we can now go ahead and create a pivot_table that sums our data...\n", 489 | "\n", 490 | "Use the following template: \n", 491 | "\n", 492 | "When using the ``pivot_table`` command, I always find it helpful to 'design' my desired table. By which I mean filling in the following blanks: \n", 493 | "In the table that answers my question ('In which month is the most money (sum donations) donated?'); there is a row for every ________ (1) _____________;\n", 494 | " - and a column for every _________ (2) ____________; and the value in the cells is based on ________ (3) _____________. 
\n", 495 | "\n", 496 | "``df.pivot_table(index=\"column or list of columns you want to use as index (1)\", \n", 497 | " columns=\"column or list of columns you want to use as columns (2)\",\n", 498 | " values=\"columns used to populate cells in pivot table (3)\",\n", 499 | " aggfunc=np.sum)``" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": {}, 506 | "outputs": [], 507 | "source": [] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "Do you remember how to use ``sort_values()`` to figure out when the most money was donated? " 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": null, 519 | "metadata": {}, 520 | "outputs": [], 521 | "source": [] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": { 526 | "toc-hr-collapsed": true 527 | }, 528 | "source": [ 529 | "## 3a most money donated per person" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "### Question\n", 537 | "What is the top 10 of people who donated the most money? " 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "### Breakdown\n", 545 | "- group data by DonorNames;\n", 546 | "- sum Values for every DonorName;\n", 547 | "- sort Values from highest to lowest;\n", 548 | "- only print top 10" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "### Hints\n", 556 | "This can be done using:\n", 557 | "- ``pivot_table()`` and ``np.sum``\n", 558 | "- ``.sort_values``\n", 559 | "- ``head()``\n", 560 | "\n", 561 | "As you'll see in the completed notebook; I build upon my answer. Meaning: I first created the needed pivot table; then sorted it; then only printed the 10 first rows. 
Try it: " 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": null, 567 | "metadata": {}, 568 | "outputs": [], 569 | "source": [] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "metadata": { 574 | "toc-hr-collapsed": true 575 | }, 576 | "source": [ 577 | "## 3b most often donated" 578 | ] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "### Question\n", 585 | "Who are the top 10 people who donated most often? " 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": {}, 591 | "source": [ 592 | "### Breakdown\n", 593 | "- group data by DonorName;\n", 594 | "- count occurrences for every DonorName;\n", 595 | "- sort the counts from highest to lowest;\n", 596 | "- only print top 10" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "### Hints\n", 604 | "This can be done using:\n", 605 | "- ``pivot_table()`` and ``len`` (to count rows instead of summing them)\n", 606 | "- ``sort_values()``\n", 607 | "- ``head()``" 608 | ] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "Since the table contains both DonorId and DonorName, I'd recommend trying to create pivot tables using both. Hopefully you won't get any differences. 
(In real life, differences would mean further investigations since id-columns most often are unique identifiers...)" 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": null, 620 | "metadata": {}, 621 | "outputs": [], 622 | "source": [] 623 | } 624 | ], 625 | "metadata": { 626 | "kernelspec": { 627 | "display_name": "Python 3", 628 | "language": "python", 629 | "name": "python3" 630 | }, 631 | "language_info": { 632 | "codemirror_mode": { 633 | "name": "ipython", 634 | "version": 3 635 | }, 636 | "file_extension": ".py", 637 | "mimetype": "text/x-python", 638 | "name": "python", 639 | "nbconvert_exporter": "python", 640 | "pygments_lexer": "ipython3", 641 | "version": "3.7.5" 642 | }, 643 | "toc-showcode": true 644 | }, 645 | "nbformat": 4, 646 | "nbformat_minor": 4 647 | } 648 | -------------------------------------------------------------------------------- /3 analyse data/donations per party, absolute + percentages.csv: -------------------------------------------------------------------------------- 1 | party,donations sum,percentage of total donations 2 | Conservative and Unionist Party,2089344.4100000001,66.40174956198913 3 | Liberal Democrats,423098.65,13.446557907279487 4 | Scottish National Party (SNP),220892.63,7.020219848459129 5 | Labour Party,124526.05,3.9575799693281475 6 | Plaid Cymru - The Party of Wales,121831.56,3.871946002366576 7 | UK Independence Party (UKIP),64450.0,2.0482945457853927 8 | Scottish Green Party,48596.64000000001,1.5444566742512997 9 | Renew,29480.18,0.9369137610980858 10 | British National Party,10000.0,0.31781141129331153 11 | Women's Equality Party,10000.0,0.31781141129331153 12 | Green Party,4300.0,0.13665890685612397 13 | -------------------------------------------------------------------------------- /3 analyse data/results_clean.csv: -------------------------------------------------------------------------------- 1 | 
,RegulatedEntityName,AcceptedDate,DonorName,DonorStatus,Year,Month,Value,RegulatedEntityType,DonorId,CampaigningName 2 | 0,Plaid Cymru - The Party of Wales,2018-12-19,Mr Alun Ffred Jones,Individual,2018,12,20000.0,Political Party,83318, 3 | 1,Liberal Democrats,2017-12-31,Ms Kirsten Bayes,Individual,2017,12,1800.0,Political Party,43033, 4 | 2,Liberal Democrats,2017-12-31,Mr Steve Webb,Individual,2017,12,3000.0,Political Party,35400, 5 | 3,Liberal Democrats,2017-12-31,Mr Tim Farron,Individual,2017,12,1560.0,Political Party,76661, 6 | 4,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,7750.0,Political Party,35403, 7 | 5,Liberal Democrats,2017-12-31,Mr Michael Lees,Individual,2017,12,1800.0,Political Party,76645, 8 | 6,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1838.0,Political Party,47793, 9 | 7,Liberal Democrats,2017-12-31,Mr Jeremy Hilton,Individual,2017,12,1779.0,Political Party,83347, 10 | 8,Liberal Democrats,2017-12-31,Baroness Kathryn Parminter,Individual,2017,12,2400.0,Political Party,37433, 11 | 9,Liberal Democrats,2017-12-31,Lady Catherine Bakewell,Individual,2017,12,1560.0,Political Party,50620, 12 | 10,Liberal Democrats,2017-12-31,Mr Martin Elengorn,Individual,2017,12,5350.0,Political Party,48727, 13 | 11,Liberal Democrats,2017-12-31,Cllr Ian Shires,Individual,2017,12,1920.0,Political Party,75238, 14 | 12,Liberal Democrats,2017-12-31,Ms Liz Morris,Individual,2017,12,3000.0,Political Party,72589, 15 | 13,Liberal Democrats,2017-12-31,Dr Alun Griffiths,Individual,2017,12,1800.0,Political Party,35415, 16 | 14,Liberal Democrats,2017-12-31,Mr Dave Hodgson,Individual,2017,12,2250.0,Political Party,34493, 17 | 15,Liberal Democrats,2017-12-31,Mr David Goodwin,Individual,2017,12,1560.0,Political Party,72576, 18 | 16,Liberal Democrats,2017-12-31,Mrs Elizabeth Barraclough,Individual,2017,12,2550.0,Political Party,83350, 19 | 17,Liberal Democrats,2017-12-31,Baroness Shirley Williams,Individual,2017,12,1700.0,Political 
Party,37445, 20 | 18,Liberal Democrats,2017-12-31,Lord Tim Clement-Jones,Individual,2017,12,2200.0,Political Party,34534, 21 | 19,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,1950.0,Political Party,31106, 22 | 20,Liberal Democrats,2017-12-31,Mr Mark Burch,Individual,2017,12,4000.0,Political Party,74667, 23 | 21,Liberal Democrats,2017-12-31,Mr Arthur Hookway,Individual,2017,12,1980.0,Political Party,72143, 24 | 22,Liberal Democrats,2017-12-31,Mr Duncan Greenland,Individual,2017,12,1504.5,Political Party,35390, 25 | 23,UK Independence Party (UKIP),2017-12-31,Mr Brett Hammond,Individual,2017,12,650.0,Political Party,83324, 26 | 24,Liberal Democrats,2017-12-31,Mr Ashley Wood,Individual,2017,12,1743.93,Political Party,83356, 27 | 25,Liberal Democrats,2017-12-31,Mrs Rowena Hay,Individual,2017,12,1854.0,Political Party,48690, 28 | 26,Liberal Democrats,2017-12-31,Ms Lynne Featherstone,Individual,2017,12,1800.0,Political Party,37406, 29 | 27,Liberal Democrats,2017-12-31,Ms Inga Lockington,Individual,2017,12,1950.0,Political Party,83332, 30 | 28,Liberal Democrats,2017-12-31,Mr Peter Rothery,Individual,2017,12,1950.0,Political Party,34482, 31 | 29,Liberal Democrats,2017-12-31,Mr Richard Keatinge,Individual,2017,12,2000.0,Political Party,83335, 32 | 30,Liberal Democrats,2017-12-31,Mr Cliff Woodcraft,Individual,2017,12,1800.0,Political Party,50642, 33 | 31,Liberal Democrats,2017-12-31,Mr Derek Eastman,Individual,2017,12,1500.57,Political Party,83354, 34 | 32,Liberal Democrats,2017-12-31,Ms Gail Engert,Individual,2017,12,1695.0,Political Party,43024, 35 | 33,Liberal Democrats,2017-12-31,Mr Pathumal Ali,Individual,2017,12,1800.0,Political Party,72580, 36 | 34,Liberal Democrats,2017-12-31,Mrs Klara Sudbury,Individual,2017,12,1579.91,Political Party,35423, 37 | 35,Liberal Democrats,2017-12-31,Mr James Macpherson,Individual,2017,12,1776.0,Political Party,82968, 38 | 36,Liberal Democrats,2017-12-31,Mr David Tutt,Individual,2017,12,1750.0,Political Party,56049, 
39 | 37,Liberal Democrats,2017-12-31,Mr Colin Stears,Individual,2017,12,1830.0,Political Party,54333, 40 | 38,Liberal Democrats,2017-12-31,Ms Mary Wane,Individual,2017,12,2200.0,Political Party,83342, 41 | 39,Liberal Democrats,2017-12-31,Mr Alistair Barr,Individual,2017,12,5000.0,Political Party,47844, 42 | 40,Liberal Democrats,2017-12-31,Miss Jocelyn Clark,Individual,2017,12,1594.0,Political Party,83346, 43 | 41,Liberal Democrats,2017-12-31,Roger Michael Isherwood,Individual,2017,12,2800.0,Political Party,77464, 44 | 42,Conservative and Unionist Party,2017-12-31,Mr David E D Brownlow,Individual,2017,12,11273.77,Political Party,69570, 45 | 43,Conservative and Unionist Party,2017-12-31,Mr David E Brownlow,Individual,2017,12,16540.0,Political Party,83077, 46 | 44,Liberal Democrats,2017-12-31,Cllr Joe Harris,Individual,2017,12,1557.25,Political Party,75240, 47 | 45,Liberal Democrats,2017-12-31,Mr James Baker,Individual,2017,12,1998.0,Political Party,76630, 48 | 46,Liberal Democrats,2017-12-31,Mr Dennis Meredith,Individual,2017,12,2003.41,Political Party,83355, 49 | 47,Liberal Democrats,2017-12-31,Ms Anne Winstanley,Individual,2017,12,2000.0,Political Party,34181, 50 | 48,Liberal Democrats,2017-12-31,Mr David Brown,Individual,2017,12,1862.0,Political Party,83331, 51 | 49,Liberal Democrats,2017-12-31,Mr Bernard Fisher,Individual,2017,12,1759.0,Political Party,76621, 52 | 50,Liberal Democrats,2017-12-31,Mr A Serge Lourie,Individual,2017,12,3620.0,Political Party,46325, 53 | 51,Renew,2017-12-31,Mr Richard Christopher Breen,Individual,2017,12,16509.1,Political Party,83311, 54 | 52,Liberal Democrats,2017-12-31,Mr Simon Wheeler,Individual,2017,12,1620.0,Political Party,83338, 55 | 53,Liberal Democrats,2017-12-31,Mrs Mary-Jane Jeanes,Individual,2017,12,1700.0,Political Party,76643, 56 | 54,Liberal Democrats,2017-12-31,Lord Richard Allan Of Hallam,Individual,2017,12,2100.0,Political Party,72138, 57 | 55,Liberal Democrats,2017-12-31,Dr Robert 
Barr,Individual,2017,12,1680.0,Political Party,76651, 58 | 56,Liberal Democrats,2017-12-31,Mrs Marian Radford,Individual,2017,12,1896.0,Political Party,72588, 59 | 57,Liberal Democrats,2017-12-31,Baroness Barbara Janke,Individual,2017,12,1600.0,Political Party,37379, 60 | 58,Liberal Democrats,2017-12-31,Mr M Joe Boyle,Individual,2017,12,1563.0,Political Party,83344, 61 | 59,Liberal Democrats,2017-12-31,Mr Dominic Hiscock,Individual,2017,12,2001.66,Political Party,83353, 62 | 60,Liberal Democrats,2017-12-31,Mr Manuel Abellan-San Martin,Individual,2017,12,1560.0,Political Party,83343, 63 | 61,Liberal Democrats,2017-12-31,Ms Helen Clucas,Individual,2017,12,1908.0,Political Party,83348, 64 | 62,Liberal Democrats,2017-12-31,Ms Anne Winstanley,Individual,2017,12,2000.0,Political Party,34181, 65 | 63,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1668.0,Political Party,83345, 66 | 64,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1800.0,Political Party,37401, 67 | 65,Liberal Democrats,2017-12-31,Mr Edward Acland,Individual,2017,12,1600.0,Political Party,83352, 68 | 66,Liberal Democrats,2017-12-31,Ms Joanna Kenny,Individual,2017,12,866.25,Political Party,47811, 69 | 67,Liberal Democrats,2017-12-31,Mr Michael Carter,Individual,2017,12,2200.0,Political Party,83341, 70 | 68,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,5000.0,Political Party,76647, 71 | 69,Liberal Democrats,2017-12-31,Ms Ruth Dombey,Individual,2017,12,1800.0,Political Party,50646, 72 | 70,Liberal Democrats,2017-12-31,Ms Karin Snowden,Individual,2017,12,1550.0,Political Party,83333, 73 | 71,Liberal Democrats,2017-12-31,Mrs Sunita Gordon,Individual,2017,12,1548.0,Political Party,76659, 74 | 72,Liberal Democrats,2017-12-31,Mr John Hale,Individual,2017,12,1600.0,Political Party,76632, 75 | 73,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,1800.0,Political Party,83337, 76 | 74,Liberal Democrats,2017-12-31,Lord Paul 
Strasburger,Individual,2017,12,1600.0,Political Party,73975, 77 | 75,Liberal Democrats,2017-12-31,Mrs Carolyn Lambert,Individual,2017,12,1526.4,Political Party,45278, 78 | 76,Liberal Democrats,2017-12-31,Mr Mark Watkin,Individual,2017,12,1800.0,Political Party,48732, 79 | 77,Liberal Democrats,2017-12-31,Mr Andrew Waller,Individual,2017,12,1657.38,Political Party,78680, 80 | 78,Liberal Democrats,2017-12-31,Mr David Brown,Individual,2017,12,1676.78,Political Party,83330, 81 | 79,Liberal Democrats,2017-12-31,Mr David Beacham,Individual,2017,12,3000.0,Political Party,50606, 82 | 80,Liberal Democrats,2017-12-31,Christopher Williams,Individual,2017,12,1506.37,Political Party,19152, 83 | 81,Scottish National Party (SNP),2017-12-31,Mr John Mason,Individual,2017,12,3160.0,Political Party,45126, 84 | 82,Liberal Democrats,2017-12-31,Mr Steven Lambert,Individual,2017,12,1596.0,Political Party,76657, 85 | 83,Liberal Democrats,2017-12-31,Mrs Isobel McCall,Individual,2017,12,2024.62,Political Party,45282, 86 | 84,Liberal Democrats,2017-12-31,Mr Michael Headley,Individual,2017,12,1958.0,Political Party,37420, 87 | 85,Liberal Democrats,2017-12-31,Mr Robert Wood,Individual,2017,12,1737.8,Political Party,76652, 88 | 86,Liberal Democrats,2017-12-31,Mr Mark Petterson,Individual,2017,12,15000.0,Political Party,55995, 89 | 87,Liberal Democrats,2017-12-31,Ms Jane Mactaggart,Individual,2017,12,2230.0,Political Party,55988, 90 | 88,Liberal Democrats,2017-12-31,Lord Nigel D Jones of Cheltenham,Individual,2017,12,1850.0,Political Party,37426, 91 | 89,Liberal Democrats,2017-12-31,Mr Ian Cuthbertson,Individual,2017,12,1521.72,Political Party,76629, 92 | 90,Liberal Democrats,2017-12-31,Mrs Marlene Heron,Individual,2017,12,1720.0,Political Party,54404, 93 | 91,Liberal Democrats,2017-12-31,Mr Owen Temple,Individual,2017,12,1800.0,Political Party,50168, 94 | 92,Liberal Democrats,2017-12-31,Mr Edward Joyce,Individual,2017,12,1980.0,Political Party,83351, 95 | 93,Liberal Democrats,2017-12-31,Mr SImon 
Curtis,Individual,2017,12,5000.0,Political Party,78769, 96 | 94,Liberal Democrats,2017-12-31,Mr Andrew Mckinlay,Individual,2017,12,1846.0,Political Party,48691, 97 | 95,Liberal Democrats,2017-12-31,Ms Philippa Connor,Individual,2017,12,2061.96,Political Party,54385, 98 | 96,Liberal Democrats,2017-12-31,Mrs Hilary Stephenson,Individual,2017,12,1800.0,Political Party,37408, 99 | 97,Liberal Democrats,2017-12-31,Mr Chris White,Individual,2017,12,2072.0,Political Party,83329, 100 | 98,Liberal Democrats,2017-12-31,Mr Tom Gosling,Individual,2017,12,10000.0,Political Party,78783, 101 | 99,Liberal Democrats,2017-12-31,Mr Alan Sherwell,Individual,2017,12,1750.0,Political Party,78672, 102 | 100,Liberal Democrats,2017-12-31,Mr Gerald Vernon-Jackson,Individual,2017,12,2290.68,Political Party,34481, 103 | 101,Liberal Democrats,2017-12-31,Mr Keith Crout,Individual,2017,12,1512.0,Political Party,76635, 104 | 102,Liberal Democrats,2017-12-30,Mr Alexey Chudnovskiy,Individual,2017,12,4500.0,Political Party,78673, 105 | 103,Labour Party,2017-12-29,Stella Creasy,Individual,2017,12,1947.05,Political Party,74319, 106 | 104,Liberal Democrats,2017-12-29,Ms Joanna Kenny,Individual,2017,12,1200.0,Political Party,47811, 107 | 105,Conservative and Unionist Party,2017-12-28,Mike Penning,Individual,2017,12,1510.0,Political Party,72665, 108 | 106,Conservative and Unionist Party,2017-12-28,Mr Anthony H Billingham,Individual,2017,12,5000.0,Political Party,37981, 109 | 107,Liberal Democrats,2017-12-27,Mr Peter Wilson,Individual,2017,12,8465.0,Political Party,83334, 110 | 108,Liberal Democrats,2017-12-27,Mrs Susan Howes,Individual,2017,12,5000.0,Political Party,78780, 111 | 109,Liberal Democrats,2017-12-27,Mr Greg Dyke,Individual,2017,12,5000.0,Political Party,78068, 112 | 110,Conservative and Unionist Party,2017-12-27,Mr John B Rutter,Individual,2017,12,1725.0,Political Party,53944, 113 | 111,Conservative and Unionist Party,2017-12-22,Ms Christine E Dawood,Individual,2017,12,12500.0,Political 
Party,83836, 114 | 112,Conservative and Unionist Party,2017-12-22,Mr Alan T Wicnh,Individual,2017,12,10000.0,Political Party,83878, 115 | 113,Labour Party,2017-12-21,Mr Stephen Kinsella,Individual,2017,12,4957.0,Political Party,37535, 116 | 114,Conservative and Unionist Party,2017-12-21,Mr David W Gray,Individual,2017,12,9000.0,Political Party,83838, 117 | 115,Liberal Democrats,2017-12-21,Mr Robert Chicken,Individual,2017,12,22466.38,Political Party,83339, 118 | 116,Conservative and Unionist Party,2017-12-20,Mr Simon D Hume-Kendall,Individual,2017,12,10000.0,Political Party,83065, 119 | 117,Liberal Democrats,2017-12-20,Ms Jennifer Talbot,Individual,2017,12,15000.0,Political Party,78089, 120 | 118,Conservative and Unionist Party,2017-12-20,Sir Michael Hintze,Individual,2017,12,2500.0,Political Party,34254, 121 | 119,Conservative and Unionist Party,2017-12-18,Mr Jeremy J Lefroy,Individual,2017,12,2572.5,Political Party,47086, 122 | 120,Conservative and Unionist Party,2017-12-18,Mr Jeremy J Lefroy,Individual,2017,12,2946.0,Political Party,47086, 123 | 121,Liberal Democrats,2017-12-15,Mrs Gitte Dawson,Individual,2017,12,20000.0,Political Party,35363, 124 | 122,Conservative and Unionist Party,2017-12-15,Mr Sam Singh,Individual,2017,12,18500.0,Political Party,83845, 125 | 123,Conservative and Unionist Party,2017-12-15,Mr Michael C Warshaw,Individual,2017,12,1703.93,Political Party,54007, 126 | 124,Conservative and Unionist Party,2017-12-15,Mr Edmund G Truell,Individual,2017,12,50000.0,Political Party,34290, 127 | 125,Conservative and Unionist Party,2017-12-15,Mr Surinderpal Lit,Individual,2017,12,3150.0,Political Party,83867, 128 | 126,Conservative and Unionist Party,2017-12-15,Mr Stephen S Less,Individual,2017,12,2100.0,Political Party,83866, 129 | 127,Conservative and Unionist Party,2017-12-14,Mr Stephen Howard,Individual,2017,12,2520.0,Political Party,37957, 130 | 128,Conservative and Unionist Party,2017-12-14,Lord Stanley Fink,Individual,2017,12,111600.0,Political 
Party,47072, 131 | 129,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,2258.0,Political Party,77944, 132 | 130,Conservative and Unionist Party,2017-12-14,Mr Daniel P Hearsum,Individual,2017,12,10737.72,Political Party,76733, 133 | 131,Conservative and Unionist Party,2017-12-14,Baroness Emma Nicholson,Individual,2017,12,2499.99,Political Party,76310, 134 | 132,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,900.0,Political Party,77944, 135 | 133,Conservative and Unionist Party,2017-12-14,Mr Dominic R Johnson,Individual,2017,12,6000.0,Political Party,77944, 136 | 134,Conservative and Unionist Party,2017-12-14,Mr Nicholas Brougham,Individual,2017,12,2499.0,Political Party,83060, 137 | 135,Conservative and Unionist Party,2017-12-14,Dr Arujuna Sivananthan,Individual,2017,12,2499.0,Political Party,83069, 138 | 136,Conservative and Unionist Party,2017-12-14,Mr Mark J Page,Individual,2017,12,6250.0,Political Party,83844, 139 | 137,Conservative and Unionist Party,2017-12-14,Mr Michael Davis,Individual,2017,12,271000.0,Political Party,34240, 140 | 138,Conservative and Unionist Party,2017-12-13,Mr Michael J Wade,Individual,2017,12,25000.0,Political Party,47059, 141 | 139,Labour Party,2017-12-13,Mr Stephen Kinsella,Individual,2017,12,4957.0,Political Party,37535, 142 | 140,Liberal Democrats,2017-12-13,Mr Mark Petterson,Individual,2017,12,25000.0,Political Party,55995, 143 | 141,Green Party,2017-12-12,Mr Roger Manser,Individual,2017,12,1500.0,Political Party,54660, 144 | 142,Conservative and Unionist Party,2017-12-12,Mr Richard J Grimes,Individual,2017,12,1900.0,Political Party,67265, 145 | 143,Conservative and Unionist Party,2017-12-12,Mr Palminder Singh,Individual,2017,12,2500.0,Political Party,67567, 146 | 144,Labour Party,2017-12-11,Lord Charles Falconer of Thoroton,Individual,2017,12,833.0,Political Party,83993, 147 | 145,UK Independence Party (UKIP),2017-12-11,Professor Tim 
Congdon,Individual,2017,12,5000.0,Political Party,38170, 148 | 146,Conservative and Unionist Party,2017-12-11,Mr Arthur P Davidson,Individual,2017,12,10000.0,Political Party,83835, 149 | 147,Conservative and Unionist Party,2017-12-11,Mr Mohamed Mansour,Individual,2017,12,12500.0,Political Party,83832, 150 | 148,Conservative and Unionist Party,2017-12-11,Mr Peter Cruddas,Individual,2017,12,12500.0,Political Party,67253, 151 | 149,Conservative and Unionist Party,2017-12-10,Mr Alan C Bolton,Individual,2017,12,7000.0,Political Party,77933, 152 | 150,Scottish National Party (SNP),2017-12-07,Mr Ian McNish,Individual,2017,12,42732.63,Political Party,78128, 153 | 151,Green Party,2017-12-07,Ms Jean Lambert MEP,Individual,2017,12,600.0,Political Party,34382, 154 | 152,UK Independence Party (UKIP),2017-12-07,Mr Ian Pirie,Individual,2017,12,5000.0,Political Party,74686, 155 | 153,Conservative and Unionist Party,2017-12-07,Mr Ian H Leslie-Melville,Individual,2017,12,5000.0,Political Party,77946, 156 | 154,Conservative and Unionist Party,2017-12-07,Mr Michael Slade,Individual,2017,12,5000.0,Political Party,36409, 157 | 155,Conservative and Unionist Party,2017-12-07,Lady Sarah L Keswick,Individual,2017,12,12500.0,Political Party,83840, 158 | 156,Conservative and Unionist Party,2017-12-06,Mr John D Lovering,Individual,2017,12,5000.0,Political Party,45696, 159 | 157,Liberal Democrats,2017-12-06,Mr Dinesh Dhamija,Individual,2017,12,5000.0,Political Party,54347, 160 | 158,Conservative and Unionist Party,2017-12-05,Mr Andrew D Williams,Individual,2017,12,4350.0,Political Party,83861, 161 | 159,Conservative and Unionist Party,2017-12-05,Ms Alison Frost,Individual,2017,12,10000.0,Political Party,37975, 162 | 160,Conservative and Unionist Party,2017-12-04,Mr Raymond Chamberlain,Individual,2017,12,8750.0,Political Party,52210, 163 | 161,Green Party,2017-12-04,Ms Elizabeth Reason,Individual,2017,12,1000.0,Political Party,77455, 164 | 162,Conservative and Unionist Party,2017-12-04,Mr 
Malcolm Bluemel,Individual,2017,12,50000.0,Political Party,72651, 165 | 163,UK Independence Party (UKIP),2017-12-04,Mrs Catherine Pirie,Individual,2017,12,2000.0,Political Party,74685, 166 | 164,Conservative and Unionist Party,2017-12-04,Mr Andrew Law,Individual,2017,12,13750.0,Political Party,52218, 167 | 165,Scottish Green Party,2017-12-01,Ms Alison Johnstone,Individual,2017,12,7956.0,Political Party,45119, 168 | 166,Conservative and Unionist Party,2017-12-01,Mr Simon M Haslam,Individual,2017,12,2395.0,Political Party,44911, 169 | 167,Scottish Green Party,2017-12-01,Mr Patrick Harvie,Individual,2017,12,8196.0,Political Party,38155, 170 | 168,Scottish Green Party,2017-12-01,Mr Ross John Greer,Individual,2017,12,9036.0,Political Party,83319, 171 | 169,Labour Party,2017-12-01,Lord John Crawford,Individual,2017,12,22000.0,Political Party,83992, 172 | 170,Scottish Green Party,2017-12-01,Mr Mark Christopher Ruskell,Individual,2017,12,7805.16,Political Party,83321, 173 | 171,Scottish Green Party,2017-12-01,Mr Andrew Dearg Wightman,Individual,2017,12,7757.16,Political Party,83322, 174 | 172,Scottish Green Party,2017-12-01,Mr John Bradford Finnie,Individual,2017,12,7846.32,Political Party,83320, 175 | 173,Conservative and Unionist Party,2017-12-01,Mr Michael S Thronton OBE,Individual,2017,12,2000.0,Political Party,83855, 176 | 174,Conservative and Unionist Party,2017-11-30,Mr Henry Keswick,Individual,2017,11,5000.0,Political Party,34260, 177 | 175,Liberal Democrats,2017-11-30,Mrs Hazel Watson,Individual,2017,11,1568.08,Political Party,83349, 178 | 176,Renew,2017-11-30,Mr Richard Christopher Breen,Individual,2017,11,12971.08,Political Party,83311, 179 | 177,Conservative and Unionist Party,2017-11-30,Mr Gerald H Elliot,Individual,2017,11,8000.0,Political Party,83837, 180 | 178,Liberal Democrats,2017-11-29,Lord Paul Tyler,Individual,2017,11,2400.0,Political Party,72569, 181 | 179,Conservative and Unionist Party,2017-11-29,Mr Jeremy RS Hunt,Individual,2017,11,1947.9,Political 
Party,83870, 182 | 180,Labour Party,2017-11-28,Mr Edward John Izzard,Individual,2017,11,10000.0,Political Party,83990, 183 | 181,UK Independence Party (UKIP),2017-11-28,Lord D Stevens of Ludgate,Individual,2017,11,4000.0,Political Party,48879, 184 | 182,Liberal Democrats,2017-11-27,Mr Michael Watson,Individual,2017,11,2500.0,Political Party,83340, 185 | 183,British National Party,2017-11-27,Ms Marina Smethhurst,Individual,2017,11,10000.0,Political Party,83362, 186 | 184,Labour Party,2017-11-27,Clive Hollick,Individual,2017,11,20000.0,Political Party,83994, 187 | 185,UK Independence Party (UKIP),2017-11-27,Mr Malcolm Bluemel,Individual,2017,11,2000.0,Political Party,83323, 188 | 186,Conservative and Unionist Party,2017-11-27,Mr Abdul-Majid Jafar,Individual,2017,11,12500.0,Political Party,81180, 189 | 187,Conservative and Unionist Party,2017-11-24,Mr Andrew Godson,Individual,2017,11,5500.0,Political Party,47473, 190 | 188,Conservative and Unionist Party,2017-11-22,Mr Stephen P J Matthews,Individual,2017,11,8500.0,Political Party,83843, 191 | 189,Labour Party,2017-11-22,Mr Douglas Reynolds,Individual,2017,11,10000.0,Political Party,83997, 192 | 190,Conservative and Unionist Party,2017-11-21,Ms Nadezda Rodicheva,Individual,2017,11,8500.0,Political Party,81128, 193 | 191,Labour Party,2017-11-21,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 194 | 192,Conservative and Unionist Party,2017-11-21,Ms Rania K Majeed,Individual,2017,11,11750.0,Political Party,83842, 195 | 193,Conservative and Unionist Party,2017-11-21,Mr Alexander Temerko,Individual,2017,11,4250.0,Political Party,43634, 196 | 194,Conservative and Unionist Party,2017-11-21,Mr Howard Leigh,Individual,2017,11,4250.0,Political Party,36384, 197 | 195,Labour Party,2017-11-21,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 198 | 196,Conservative and Unionist Party,2017-11-20,Mr Thomas E Notman,Individual,2017,11,10000.0,Political Party,83859, 199 | 
197,Conservative and Unionist Party,2017-11-19,Mrs Sarah J Pidgley,Individual,2017,11,5000.0,Political Party,83851, 200 | 198,Conservative and Unionist Party,2017-11-17,Mr Philip L Wroughton,Individual,2017,11,3975.0,Political Party,53955, 201 | 199,Conservative and Unionist Party,2017-11-17,Mr Kenneth J French,Individual,2017,11,5000.0,Political Party,83871, 202 | 200,Plaid Cymru - The Party of Wales,2017-11-17,Ms Jane Mactaggart,Individual,2017,11,61473.89,Political Party,83317, 203 | 201,Liberal Democrats,2017-11-16,Mr John Noel Penstone,Individual,2017,11,3000.0,Political Party,76667, 204 | 202,Conservative and Unionist Party,2017-11-16,Mrs Mary Erbrich,Individual,2017,11,60000.0,Political Party,76714, 205 | 203,Conservative and Unionist Party,2017-11-16,Mr Patrick D Horsfall,Individual,2017,11,2000.0,Political Party,83863, 206 | 204,Liberal Democrats,2017-11-15,Mr Anthony Harris,Individual,2017,11,5080.0,Political Party,54432, 207 | 205,Conservative and Unionist Party,2017-11-15,Ms Jane Mactaggart,Individual,2017,11,10000.0,Political Party,37909, 208 | 206,Conservative and Unionist Party,2017-11-15,Mr Zac F Goldsmith,Individual,2017,11,5000.0,Political Party,69726, 209 | 207,Liberal Democrats,2017-11-15,Mr Stephen Dawson,Individual,2017,11,5000.0,Political Party,74657, 210 | 208,Conservative and Unionist Party,2017-11-15,Mr Peter Brown,Individual,2017,11,5000.0,Political Party,83826, 211 | 209,Labour Party,2017-11-14,Mr Tony Belton,Individual,2017,11,3000.0,Political Party,48385, 212 | 210,Women's Equality Party,2017-11-14,Mr Jonathan Leslie Skeet,Individual,2017,11,10000.0,Political Party,76669, 213 | 211,Labour Party,2017-11-13,Lord Charles Falconer of Thoroton,Individual,2017,11,833.0,Political Party,83993, 214 | 212,Liberal Democrats,2017-11-12,William Goodhart,Individual,2017,11,20000.0,Political Party,34527, 215 | 213,Conservative and Unionist Party,2017-11-10,Mr Roger J Kendrick,Individual,2017,11,3000.0,Political Party,83820, 216 | 214,Conservative and 
Unionist Party,2017-11-07,Mr Michael Davis,Individual,2017,11,1000.0,Political Party,34240, 217 | 215,Labour Party,2017-11-06,Mr William Haughey,Individual,2017,11,30000.0,Political Party,37529, 218 | 216,Green Party,2017-11-05,Ms Jean Lambert MEP,Individual,2017,11,600.0,Political Party,34382, 219 | 217,Conservative and Unionist Party,2017-11-03,Ms Dora Bertolutti,Individual,2017,11,2400.0,Political Party,83883, 220 | 218,Conservative and Unionist Party,2017-11-03,Mr Neville A Baxter,Individual,2017,11,10000.0,Political Party,34323, 221 | 219,Conservative and Unionist Party,2017-11-01,Mr Oluwole Kolade,Individual,2017,11,52500.0,Political Party,38762, 222 | 220,Liberal Democrats,2017-11-01,Mr Anthony Bunker,Individual,2017,11,3750.0,Political Party,37466, 223 | 221,Conservative and Unionist Party,2017-11-01,Mr Stephen L Massey,Individual,2017,11,11000.0,Political Party,36454, 224 | 222,Conservative and Unionist Party,2017-10-31,Hon George T Farmer,Individual,2017,10,8500.0,Political Party,83827, 225 | 223,Conservative and Unionist Party,2017-10-31,Mr Laurence Hollingworth,Individual,2017,10,1000.0,Political Party,67568, 226 | 224,Liberal Democrats,2017-10-31,Ms Anna Gallop,Individual,2017,10,2000.0,Political Party,83357, 227 | 225,Liberal Democrats,2017-10-31,Mr Vincent Cable,Individual,2017,10,2000.0,Political Party,82964, 228 | 226,Conservative and Unionist Party,2017-10-31,Mr John D Booth,Individual,2017,10,5000.0,Political Party,81007, 229 | 227,Conservative and Unionist Party,2017-10-30,Mrs Esme Forbes,Individual,2017,10,38295.29,Political Party,76306, 230 | 228,Conservative and Unionist Party,2017-10-30,Mr Iain Aitken,Individual,2017,10,2000.0,Political Party,83856, 231 | 229,Conservative and Unionist Party,2017-10-30,Mr Byron S Huson,Individual,2017,10,400000.0,Political Party,83882, 232 | 230,Conservative and Unionist Party,2017-10-25,Mr Navroz D Udwadia,Individual,2017,10,10000.0,Political Party,83846, 233 | 231,Conservative and Unionist 
Party,2017-10-25,Mr James R Lupton,Individual,2017,10,3900.0,Political Party,34266, 234 | 232,Conservative and Unionist Party,2017-10-25,Mr James R Lupton,Individual,2017,10,50000.0,Political Party,34266, 235 | 233,Conservative and Unionist Party,2017-10-25,Mr Ravi S Kailas,Individual,2017,10,2000.0,Political Party,74743, 236 | 234,Conservative and Unionist Party,2017-10-23,Mr Nicholas Campsie,Individual,2017,10,10000.0,Political Party,76713, 237 | 235,Conservative and Unionist Party,2017-10-23,Mr Nicholas N Moore,Individual,2017,10,2000.0,Political Party,83872, 238 | 236,Conservative and Unionist Party,2017-10-23,Mr William H Salomon,Individual,2017,10,25000.0,Political Party,36405, 239 | 237,Green Party,2017-10-22,Ms Jean Lambert MEP,Individual,2017,10,600.0,Political Party,34382, 240 | 238,Conservative and Unionist Party,2017-10-19,Mr Ian R Taylor,Individual,2017,10,100000.0,Political Party,76312, 241 | 239,Conservative and Unionist Party,2017-10-18,Lord Philip Harris,Individual,2017,10,10000.0,Political Party,34251, 242 | 240,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,1200.0,Political Party,76334, 243 | 241,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,600.0,Political Party,76334, 244 | 242,Conservative and Unionist Party,2017-10-17,Scirard Lancelyn Green,Individual,2017,10,600.0,Political Party,76334, 245 | 243,Conservative and Unionist Party,2017-10-16,Mr Amjad Bseisu,Individual,2017,10,12500.0,Political Party,74074, 246 | 244,Conservative and Unionist Party,2017-10-16,Mr Gary Lydiate,Individual,2017,10,20000.0,Political Party,81113, 247 | 245,Conservative and Unionist Party,2017-10-13,Mr Duncan Greenland,Individual,2017,10,1871.31,Political Party,83876, 248 | 246,Conservative and Unionist Party,2017-10-11,Mr Roger G Orf,Individual,2017,10,8500.0,Political Party,47055, 249 | 247,Labour Party,2017-10-11,Ms Jane Turnell-Read,Individual,2017,10,10000.0,Political Party,83998, 250 | 
248,Conservative and Unionist Party,2017-10-11,Ms Lesley Jackson,Individual,2017,10,100000.0,Political Party,38745, 251 | 249,UK Independence Party (UKIP),2017-10-09,Mr Andrew Perloff,Individual,2017,10,20000.0,Political Party,46042, 252 | 250,Conservative and Unionist Party,2017-10-09,Mr Donald J Lewin,Individual,2017,10,10000.0,Political Party,83841, 253 | 251,Conservative and Unionist Party,2017-10-09,Mr Peter Kane,Individual,2017,10,50000.0,Political Party,34257, 254 | 252,Conservative and Unionist Party,2017-10-09,Mr Roger Nagioff,Individual,2017,10,50000.0,Political Party,36392, 255 | 253,Conservative and Unionist Party,2017-10-09,Mr Christopher M Higgins,Individual,2017,10,2300.0,Political Party,84031, 256 | 254,Conservative and Unionist Party,2017-10-09,Mr Michael A Dangoor,Individual,2017,10,56600.0,Political Party,83834, 257 | 255,Conservative and Unionist Party,2017-10-09,Mr Robert D Calrow,Individual,2017,10,21000.0,Political Party,38757, 258 | 256,Conservative and Unionist Party,2017-10-09,Mr Michael Cohen,Individual,2017,10,8100.0,Political Party,83833, 259 | 257,Scottish National Party (SNP),2017-10-06,Mr Ian McNish,Individual,2017,10,175000.0,Political Party,78128, 260 | 258,Labour Party,2017-10-05,Lord Charles Falconer of Thoroton,Individual,2017,10,833.0,Political Party,83993, 261 | 259,Conservative and Unionist Party,2017-10-04,Mr Arthur J Taylor,Individual,2017,10,1500.0,Political Party,49844, 262 | 260,Plaid Cymru - The Party of Wales,2017-10-02,Mr David Charles Williams,Individual,2017,10,40357.67,Political Party,83316, 263 | 261,Conservative and Unionist Party,2017-09-01,Mr Robert J Madejski,Individual,2017,9,1875.0,Political Party,34372, 264 | 262,Liberal Democrats,2017-08-31,Mr Andrew Pinnock,Individual,2017,8,1866.0,Political Party,83358, 265 | 263,Liberal Democrats,2017-08-30,Lord John Lee Of Trafford,Individual,2017,8,2400.0,Political Party,72163, 266 | 264,UK Independence Party (UKIP),2017-08-18,Mr Ian 
Pirie,Individual,2017,8,1000.0,Political Party,74686, 267 | 265,Liberal Democrats,2017-08-06,Mr Mark Petterson,Individual,2017,8,4000.0,Political Party,55995, 268 | 266,Conservative and Unionist Party,2017-07-17,Ms Rachel H MacLean,Individual,2017,7,2000.0,Political Party,83086, 269 | 267,Conservative and Unionist Party,2017-07-17,Ms Rachel H MacLean,Individual,2017,7,2000.0,Political Party,83086, 270 | 268,UK Independence Party (UKIP),2017-06-30,Mr Brett Hammond,Individual,2017,6,650.0,Political Party,83324, 271 | 269,Liberal Democrats,2017-06-30,Ms Jane Mactaggart,Individual,2017,6,23000.0,Political Party,55988, 272 | 270,Conservative and Unionist Party,2017-06-24,Mr Elizabeth A Gooch,Individual,2017,6,4000.0,Political Party,47085, 273 | 271,Liberal Democrats,2017-06-08,Mr Albert Mcintosh,Individual,2017,6,5000.0,Political Party,82944, 274 | 272,Liberal Democrats,2017-06-05,Mr Mark Petterson,Individual,2017,6,1500.0,Political Party,55995, 275 | 273,Liberal Democrats,2017-06-01,Ms Karin Snowden,Individual,2017,6,1750.0,Political Party,83333, 276 | 274,UK Independence Party (UKIP),2017-05-30,Mr Ian Pirie,Individual,2017,5,3000.0,Political Party,74686, 277 | 275,UK Independence Party (UKIP),2017-05-30,Mr Malcolm Bluemel,Individual,2017,5,1000.0,Political Party,83323, 278 | 276,UK Independence Party (UKIP),2017-05-26,Mr Malcolm Bluemel,Individual,2017,5,3000.0,Political Party,83323, 279 | 277,Conservative and Unionist Party,2017-05-23,Mr Jeremy J Hosking,Individual,2017,5,5000.0,Political Party,38786, 280 | 278,Labour Party,2017-05-22,Lord Charles Falconer of Thoroton,Individual,2017,5,1500.0,Political Party,83993, 281 | 279,Conservative and Unionist Party,2017-05-12,Mr Neil Record,Individual,2017,5,3000.0,Political Party,36402, 282 | 280,Conservative and Unionist Party,2017-05-11,Mrs Menju Mehrotra,Individual,2017,5,2500.0,Political Party,83864, 283 | 281,Conservative and Unionist Party,2017-05-11,Mr Ravi Mehrotra,Individual,2017,5,2500.0,Political Party,83865, 284 
| 282,Conservative and Unionist Party,2017-05-05,Mr Peter D Landale,Individual,2017,5,5000.0,Political Party,83857, 285 | 283,Conservative and Unionist Party,2017-05-02,Mr Henry Keswick,Individual,2017,5,3000.0,Political Party,34260, 286 | 284,Conservative and Unionist Party,2017-04-29,Mr Daniel D Laycock,Individual,2017,4,2500.0,Political Party,83824, 287 | 285,Conservative and Unionist Party,2017-04-28,Mr Edward P Weatherall,Individual,2017,4,5000.0,Political Party,83858, 288 | 286,Labour Party,2017-04-24,Mr John O'Hara,Individual,2017,4,1000.0,Political Party,83996, 289 | 287,Conservative and Unionist Party,2017-04-20,Mr Terence F Parkinson,Individual,2017,4,5000.0,Political Party,44896, 290 | 288,Conservative and Unionist Party,2017-04-19,Mr Terence F Parkinson,Individual,2017,4,5000.0,Political Party,44896, 291 | 289,Liberal Democrats,2017-04-08,Lord John Alderdice,Individual,2017,4,2400.0,Political Party,34531, 292 | 290,UK Independence Party (UKIP),2017-03-31,Mr Duncan Greenland,Individual,2017,3,650.0,Political Party,83324, 293 | 291,Conservative and Unionist Party,2017-03-27,Mr Jeremy Hand,Individual,2017,3,2500.0,Political Party,83880, 294 | 292,UK Independence Party (UKIP),2017-02-15,Professor Tim Congdon,Individual,2017,2,2500.0,Political Party,38170, 295 | 293,UK Independence Party (UKIP),2017-02-08,Lord D Stevens of Ludgate,Individual,2017,2,1000.0,Political Party,48879, 296 | 294,UK Independence Party (UKIP),2017-01-26,Professor Tim Congdon,Individual,2017,1,5000.0,Political Party,38170, 297 | 295,UK Independence Party (UKIP),2017-01-26,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,83323, 298 | 296,UK Independence Party (UKIP),2017-01-26,Lord D Stevens of Ludgate,Individual,2017,1,5000.0,Political Party,48879, 299 | 297,UK Independence Party (UKIP),2017-01-13,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,74686, 300 | 298,UK Independence Party (UKIP),2017-01-12,Mr Malcolm Bluemel,Individual,2017,1,1000.0,Political 
Party,83323, 301 | 299,Labour Party,2017-01-10,Mr Duncan Greenland,Individual,2017,1,1000.0,Political Party,83996, 302 | -------------------------------------------------------------------------------- /4 scrape data/4 Scrape data.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Scrape data with Python Requests and Beautiful Soup" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to this Jupyter Notebook! \n", 15 | " \n", 16 | "This notebook was made for the Datajournalism.com [course Python for Journalists](https://datajournalism.com/watch/python-for-journalists). In this module you'll learn how to instruct your computer to download structured, not password-protected data from the internet; a technique also known as web scraping. We'll be using the libraries Requests and Beautiful Soup to scrape data. Don't forget to install these libraries to your Anaconda environment. (Otherwise importing these libraries will result in an error message.) Installing these libraries needs to be done in the terminal/cmd prompt, using the commands `conda install requests` and `conda install bs4`.\n", 17 | "\n", 18 | "\n", 19 | "## About Jupyter Notebooks and Pandas\n", 20 | "\n", 21 | "Right now you're looking at a Jupyter Notebook: an interactive, browser-based programming environment. You can use these notebooks to program in R, Julia or Python - as you'll be doing later on. Read more about Jupyter Notebook in the [Jupyter Notebook Quick Start Guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). \n", 22 | " \n", 23 | "To clean up our data, we'll be using Python and Pandas. Pandas is an open-source Python library - basically an extra toolkit to go with Python - that is designed for data analysis. 
Pandas is flexible, easy to use and has lots of useful functions built right in. Read more about Pandas and its features in [the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/).\n", 24 | "\n", 25 | "**Notebook shortcuts** \n", 26 | "\n", 27 | "Within Jupyter Notebooks, there are some shortcuts you can use. If you'll be using more notebooks for your data analysis in the future, you'll remember these shortcuts soon enough. :) \n", 28 | "\n", 29 | "* `esc` will take you into command mode\n", 30 | "* `a` will insert cell above\n", 31 | "* `b` will insert cell below\n", 32 | "* `shift then tab` will show you the documentation for your code\n", 33 | "* `shift and enter` will run your cell\n", 34 | "* ` d d` will delete a cell\n", 35 | "\n", 36 | "**Pandas dictionary**\n", 37 | "\n", 38 | "* **dataframe**: dataframe is Pandas speak for a table with a labeled y-axis, also known as an index. (The index usually starts at 0.)\n", 39 | "* **series**: a series is a list; a series can be made of a single column within a dataframe.\n", 40 | "\n", 41 | "Before we dive in, a little more about Jupyter Notebooks. Every notebook is made up of cells. A cell can either contain Markdown text - like this one - or code. In the latter you can execute your code. To see what that means, type the following command in the next cell: `print(\"hello world\")`." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": { 48 | "collapsed": true, 49 | "jupyter": { 50 | "outputs_hidden": true 51 | } 52 | }, 53 | "outputs": [], 54 | "source": [] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## Getting started" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "Now, let's import the libraries we need to get started with scraping. Type `import requests`, `from bs4 import BeautifulSoup`, `import pandas as pd` and `import csv`." 
68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "ExecuteTime": { 75 | "end_time": "2018-05-10T16:30:36.347564Z", 76 | "start_time": "2018-05-10T16:30:34.864142Z" 77 | }, 78 | "collapsed": true, 79 | "jupyter": { 80 | "outputs_hidden": true 81 | } 82 | }, 83 | "outputs": [], 84 | "source": [] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "**What's in a name** \n", 91 | "Scraping is the act of automatically downloading selected data from a website. Scraping is also known as web scraping, web harvesting, web data extraction and data scraping. It can be a very valuable tool for your newsroom: instead of saving data from the web by hand, you can automate and speed up the process by writing a custom Python program that downloads the information for you. \n", 92 | " \n", 93 | " \n", 94 | "\n", 95 | "\n", 96 | "**What we'll actually be doing, when I say 'we're scraping a website':** \n", 97 | "\n", 98 | "- tell your computer which site to visit: where do you want to download data from? \n", 99 | " - we'll be using the `requests` library to request webpages\n", 100 | "- save the webpage (the html-page) to the computer\n", 101 | " - this too will be done with the `requests` library\n", 102 | "- from the webpage, select the data you want to have\n", 103 | " - we'll be using `BeautifulSoup` to do this\n", 104 | "- write the selection to a csv-file\n", 105 | " - this is done with the `csv` library\n", 106 | "\n", 107 | "If there is more than 1 page where you want to get data from, you can tell your computer to move on to the next page to repeat the process. But that's for another course... 
:) \n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "# Scraping a website\n", 115 | "\n", 116 | "## Request webpage\n", 117 | "We'll be scraping a list of [Power Reactors](https://www.nrc.gov/reactors/operating/list-power-reactor-units.html) from the site of the US government. First we need to let our computer know what site we want to visit; then we can request the site using `requests.get('http://website.com')`." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "ExecuteTime": { 125 | "end_time": "2018-05-10T16:13:59.010151Z", 126 | "start_time": "2018-05-10T16:13:57.969927Z" 127 | }, 128 | "collapsed": true, 129 | "jupyter": { 130 | "outputs_hidden": true 131 | } 132 | }, 133 | "outputs": [], 134 | "source": [] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "If you want your code to become more easily reusable, you can rewrite it to:" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": { 147 | "ExecuteTime": { 148 | "end_time": "2018-05-10T16:36:08.659812Z", 149 | "start_time": "2018-05-10T16:36:08.547577Z" 150 | } 151 | }, 152 | "outputs": [], 153 | "source": [] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Note that `requests.get(url)` doesn't have the url in quotes; it's clear the url is a string by the quotation marks in `url = 'https://www.nrc.gov/reactors/operating/list-power-reactor-units.html'`.\n", 160 | "\n", 161 | "To check if everything went right, we can simply type `page`; this will return a response code. Status codes are issued by a server in response to a client's request made to the server. Read more about these codes on the [Wikipedia page on status codes](). Basically, if you have a 200 response code, the website loaded in just fine." 
162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": { 168 | "ExecuteTime": { 169 | "end_time": "2018-05-10T16:14:01.044665Z", 170 | "start_time": "2018-05-10T16:14:01.025593Z" 171 | } 172 | }, 173 | "outputs": [], 174 | "source": [] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "## Parse HTML, select data\n", 181 | "Now that we've got the page, let's parse the HTML page. To parse is just nerd speak for splitting up the original data into smaller bits. Use `BeautifulSoup(page.content, 'html.parser')`. It's pretty common when scraping to name the first object created with BeautifulSoup 'soup'. This 'soup' variable will contain all html of the page once we're done. \n", 182 | "\n", 183 | "Of course, if you want to see what is in 'soup', you could type `print(soup)`. (Notice how there are no quote marks, since the soup we're referring to is a variable that has data stored inside of it and it is not a string.) But when you add `soup` on a new line, the computer will also print your soup. Again: programmers like things short and sweet.\n", 184 | "\n", 185 | "Btw, the library is named after the Beautiful Soup from Alice in Wonderland... Not kidding.\n", 186 | "\n", 187 | "Now, let's make ourselves some soup..." 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": null, 193 | "metadata": { 194 | "ExecuteTime": { 195 | "end_time": "2018-05-10T16:14:02.754386Z", 196 | "start_time": "2018-05-10T16:14:02.415142Z" 197 | } 198 | }, 199 | "outputs": [], 200 | "source": [] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "Next you want to select the table from this soup. Thanks to the BeautifulSoup library, you can do this by writing `soup.find('table')`. This command will look for the first `<table>` in the source code of the webpage, also known as our soup." 
207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": { 213 | "ExecuteTime": { 214 | "end_time": "2018-05-10T16:14:03.730027Z", 215 | "start_time": "2018-05-10T16:14:03.506059Z" 216 | } 217 | }, 218 | "outputs": [], 219 | "source": [] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "Next, let's get all rows in the table. The HTML code for rows in a table is `<tr>`. We can use the BeautifulSoup command `.find_all('tr')` to get all of these rows." 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "ExecuteTime": { 233 | "end_time": "2018-05-10T16:14:05.172202Z", 234 | "start_time": "2018-05-10T16:14:05.136868Z" 235 | } 236 | }, 237 | "outputs": [], 238 | "source": [] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "See how with `.find_all('')` you can find all rows at once, while `.find('')` will just get you the first one of whatever it is you're looking for.\n", 245 | "\n", 246 | "Since there is only 1 table on this webpage, you can either use `soup.find_all('tr')` or `table.find_all('tr')`. But if there are two or more tables on one page, the `soup.find_all('tr')` command will get you all rows, from all tables. `table.find_all('tr')` builds upon `soup.find('table')`, which will give you the **first** table; meaning that `table.find_all('tr')` will get all rows from the first table only.\n", 247 | "\n", 248 | "Don't believe me? Let's try and use `soup.find_all('tr')`..." 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "ExecuteTime": { 256 | "end_time": "2018-05-10T16:14:06.554377Z", 257 | "start_time": "2018-05-10T16:14:06.517875Z" 258 | } 259 | }, 260 | "outputs": [], 261 | "source": [] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "You see? Exactly the same result. 
Just remember: whatever assignment you give to your computer, it always refers to the data that comes before the `.assignment`. Meaning `soup.find_all('tr')` looks for `tr`s in `soup`, and `table.find_all('tr')` looks for `tr`s in `table`.\n", 268 | "\n", 269 | "\n", 270 | "Now let's say that you are especially interested in the 21st row. What do you do? Since computers start counting at zero, you should ask it for row 20 to get to see the 21st row. And since you saved all rows in the `rows` variable, you can actually say 'dear computer, give me row 20' by typing `rows[20]`." 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": { 277 | "ExecuteTime": { 278 | "end_time": "2018-05-10T16:14:07.757940Z", 279 | "start_time": "2018-05-10T16:14:07.745156Z" 280 | } 281 | }, 282 | "outputs": [], 283 | "source": [] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "Looking at this row, do you recognize the different cells? Every cell starts with `<td>`
, the HTML abbreviation for table data. You can use BeautifulSoup to look for all `td`s in this 21st row by typing: `rows[20].find_all('td')`." 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": { 296 | "ExecuteTime": { 297 | "end_time": "2018-05-10T16:14:08.924923Z", 298 | "start_time": "2018-05-10T16:14:08.917175Z" 299 | } 300 | }, 301 | "outputs": [], 302 | "source": [] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "Just for your information: you can even save the data from the `td`s to a variable called cells. Simply type `cells = rows[21].find_all('td')`" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": { 315 | "ExecuteTime": { 316 | "end_time": "2018-05-10T16:14:10.971479Z", 317 | "start_time": "2018-05-10T16:14:10.965290Z" 318 | } 319 | }, 320 | "outputs": [], 321 | "source": [] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "Now that you know how to only select 1 certain row, you can probably guess how to select a data cell. Exactly, use `cells[0]` to get the first cell of `cells`." 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "ExecuteTime": { 335 | "end_time": "2018-05-10T16:14:13.255237Z", 336 | "start_time": "2018-05-10T16:14:13.248465Z" 337 | } 338 | }, 339 | "outputs": [], 340 | "source": [] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "It works, but it doesn't look too good, does it? Let's get rid of the HTML bits and pieces around our data. Add `.text` to get the job done." 
347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "ExecuteTime": { 354 | "end_time": "2018-05-10T16:14:15.058803Z", 355 | "start_time": "2018-05-10T16:14:15.053231Z" 356 | } 357 | }, 358 | "outputs": [], 359 | "source": [] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "Looks much better, doesn't it? \n", 366 | "\n", 367 | "Unfortunately, there are too many rows in this table to get each cell like we got `Comanche Peak 105000445`. We're going to have to automate it. Luckily this is one of the big benefits of programming. \n", 368 | "\n", 369 | "Here's what we're going to do: \n", 370 | "1. create an empty list to be used later\n", 371 | "2. extract the table from our soup, save it to the `table` variable\n", 372 | "3. 'loop over' our table....\n", 373 | "4. ...to save the data we need for each row in the table\n", 374 | "5. add the selected data to the list\n", 375 | "6. print the list\n", 376 | "\n", 377 | "At step 3 we'll 'loop over' the table. What does that mean? Well, using a for loop, as it's called, means that we'll give our computer an assignment and have it done **for** every something. It's like your mum when she told you to treat your friends with candy: **for every one of your friends, give them a piece of candy**. It's shorter than naming all your friends one by one and repeating the assignment time and time again, right? We're doing exactly the same by telling our computer: **for every row in the table, get the data inside the cells**." 
378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": null, 383 | "metadata": { 384 | "ExecuteTime": { 385 | "end_time": "2018-05-10T16:37:25.348411Z", 386 | "start_time": "2018-05-10T16:37:25.338361Z" 387 | } 388 | }, 389 | "outputs": [], 390 | "source": [] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "ExecuteTime": { 397 | "end_time": "2018-05-10T16:16:29.254893Z", 398 | "start_time": "2018-05-10T16:16:29.198302Z" 399 | } 400 | }, 401 | "outputs": [], 402 | "source": [] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": {}, 407 | "source": [ 408 | "Congrats! You just wrote your very first scraper - well done!\n", 409 | "\n", 410 | "## Saving the scraped data\n", 411 | "\n", 412 | "Now, of course having your data printed inside the notebook is nice. But it would be even better to store the data in a CSV file. Remember that I explained what we'd actually be doing? Of course things are a bit more complicated; let me explain. Here's what I told you before:\n", 413 | "\n", 414 | "- tell your computer which site to visit: where do you want to download data from? \n", 415 | " - we'll be using the `requests` library to request webpages\n", 416 | "- save the webpage (the html-page) to the computer\n", 417 | " - this too will be done with the `requests` library\n", 418 | "- from the webpage, select the data you want to have\n", 419 | " - we'll be using `BeautifulSoup` to do this\n", 420 | "- write the selection to a csv-file\n", 421 | " - this is done with the `csv` library\n", 422 | "\n", 423 | "Here's what the code will actually do: \n", 424 | "1. Create a CSV file to save data in\n", 425 | "2. Create a CSV writer to write data with to the CSV file\n", 426 | "3. Tell your computer which site(s) to visit\n", 427 | "4. Get the webpage\n", 428 | "5. Select data from the webpage\n", 429 | "6. Write data with the CSV writer to the CSV file \n", 430 | "7. 
Save file\n", 431 | "\n", 432 | "## Save data to CSV\n", 433 | "\n", 434 | "Here's how to save data to a CSV file using the CSV library - the process involves a couple of steps:\n", 435 | "1. create a file, open it, make sure it's 'writeable', use `open('filename.csv', 'w', encoding='utf8', newline='')`\n", 436 | "2. create a writer, you'll need a writer if you want to write data to the file, use `csv.writer(filename, delimiter=',')`\n", 437 | "3. write data to the file using the writer, use `writer.writerow([data])`\n", 438 | "\n", 439 | "Of course you can repeat step 3 as often as necessary." 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": null, 445 | "metadata": { 446 | "ExecuteTime": { 447 | "end_time": "2018-05-10T16:24:02.221916Z", 448 | "start_time": "2018-05-10T16:24:02.212331Z" 449 | } 450 | }, 451 | "outputs": [], 452 | "source": [] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "Using the `ls` command you can see that a new file was created. " 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "ExecuteTime": { 466 | "end_time": "2018-05-10T16:38:31.872133Z", 467 | "start_time": "2018-05-10T16:38:31.742944Z" 468 | } 469 | }, 470 | "outputs": [], 471 | "source": [] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "## The scraper\n", 478 | "Earlier, we broke our scraper up into separate sentences, as you would an essay. Now I'll be putting all these sentences together. This way, you can get a good overview of what a scraper could look like. Here's a list of what we need to do, in the exact order: \n", 479 | "1. Create a CSV file, open it, make it writeable\n", 480 | "2. Create a CSV writer to write data\n", 481 | "3. Write the column headers to the file\n", 482 | "4. Tell your computer which site(s) to visit\n", 483 | "5. Get the webpage\n", 484 | "6. Select data from the webpage\n", 485 | "7. 
Write data with the CSV writer to the CSV file \n", 486 | "8. Save file" 487 | ] 488 | }, 489 | { 490 | "cell_type": "code", 491 | "execution_count": null, 492 | "metadata": { 493 | "ExecuteTime": { 494 | "end_time": "2018-05-10T16:32:33.718331Z", 495 | "start_time": "2018-05-10T16:32:32.641607Z" 496 | }, 497 | "collapsed": true, 498 | "jupyter": { 499 | "outputs_hidden": true 500 | } 501 | }, 502 | "outputs": [], 503 | "source": [] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "If you want to check if everything worked as it's supposed to, you can import the ScrapedData.csv file as a dataframe using `pd.read_csv('filename.csv')`. Look at the dataframe to see if there's data in the file. Using `df.shape` you can even quickly check if there is as much data in the file as you'd expect. " 510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": null, 515 | "metadata": { 516 | "ExecuteTime": { 517 | "end_time": "2018-05-10T16:32:37.086801Z", 518 | "start_time": "2018-05-10T16:32:37.041336Z" 519 | } 520 | }, 521 | "outputs": [], 522 | "source": [] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "`df.shape` will give you the number of rows and columns of the dataframe. It's a quick way to check if everything that should be in the CSV file is really there." 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": { 535 | "ExecuteTime": { 536 | "end_time": "2018-05-10T16:34:50.897133Z", 537 | "start_time": "2018-05-10T16:34:50.889696Z" 538 | } 539 | }, 540 | "outputs": [], 541 | "source": [] 542 | }, 543 | { 544 | "cell_type": "markdown", 545 | "metadata": {}, 546 | "source": [ 547 | "Note that the headers are in the dataset twice:\n", 548 | "while scraping we added a header, but we also scraped the headers, since the headers are in the first row of the table and we scraped all table rows...\n", 549 | "\n", 550 | "Now what? 
\n", 551 | "\n", 552 | "You can easily delete a row by using ``df.drop(df.index[N])``, to drop the Nth row by index number.\n", 553 | "\n", 554 | "To make sure you get the index number right, why not print the first rows once more? We're in a notebook after all... You can use ``df.head()``" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": null, 560 | "metadata": {}, 561 | "outputs": [], 562 | "source": [] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "Looking at these first 5 rows, you'll find that you want to delete the row with index number 0. As stated before, you can use ``df.drop``. By default Pandas will create and return a copy of your dataset, and delete the row of your choosing in that copy. This means that the original will still include the dropped row.\n", 569 | "\n", 570 | "Consider this a safety belt when deleting data using Pandas. ;)" 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": null, 576 | "metadata": {}, 577 | "outputs": [], 578 | "source": [] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "To delete the first row in the original dataset - and not in a copy that Pandas will return to you - you'll need to use ``inplace=True``. The full command becomes: ``df.drop(df.index[0], inplace=True)``. \n", 585 | "\n", 586 | "``inplace=True`` will delete the row in the original dataset, and won't return anything. Try it:" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": null, 592 | "metadata": {}, 593 | "outputs": [], 594 | "source": [] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "To see that it worked, request the head of the dataframe..." 
601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": null, 606 | "metadata": {}, 607 | "outputs": [], 608 | "source": [] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "If you want to you can save this cleaned version, by using ``df.to_csv()``..." 615 | ] 616 | }, 617 | { 618 | "cell_type": "code", 619 | "execution_count": null, 620 | "metadata": {}, 621 | "outputs": [], 622 | "source": [] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": {}, 627 | "source": [ 628 | "Well done, happy web scraping!" 629 | ] 630 | } 631 | ], 632 | "metadata": { 633 | "kernelspec": { 634 | "display_name": "Python 3", 635 | "language": "python", 636 | "name": "python3" 637 | }, 638 | "language_info": { 639 | "codemirror_mode": { 640 | "name": "ipython", 641 | "version": 3 642 | }, 643 | "file_extension": ".py", 644 | "mimetype": "text/x-python", 645 | "name": "python", 646 | "nbconvert_exporter": "python", 647 | "pygments_lexer": "ipython3", 648 | "version": "3.7.5" 649 | }, 650 | "toc": { 651 | "nav_menu": {}, 652 | "number_sections": true, 653 | "sideBar": true, 654 | "skip_h1_title": true, 655 | "toc_cell": false, 656 | "toc_position": {}, 657 | "toc_section_display": "block", 658 | "toc_window_display": false 659 | } 660 | }, 661 | "nbformat": 4, 662 | "nbformat_minor": 4 663 | } 664 | -------------------------------------------------------------------------------- /4 scrape data/scrapedData single header.csv: -------------------------------------------------------------------------------- 1 | ,plantNamedocketNumber,licenseNumber,reactorType,location,OwnerOperator,NRCRegion 2 | 1,Arkansas Nuclear 105000313,DPR-51,PWR,"6 miles WNW of Russellville,  AR","Entergy Nuclear Operations, Inc. ",4 3 | 2,Arkansas Nuclear 205000368,NPF-6,PWR,"6 miles WNW of Russellville,  AR","Entergy Nuclear Operations, Inc. 
",4 4 | 3,Beaver Valley 105000334,DPR-66,PWR,"17 miles W of McCandless,  PA",FirstEnergy Nuclear Operating Co. ,1 5 | 4,Beaver Valley 205000412,NPF-73,PWR,"17 miles W of McCandless,  PA",FirstEnergy Nuclear Operating Co. ,1 6 | 5,Braidwood 105000456,NPF-72,PWR,"20 miles SSW of Joliet,  IL","Exelon Generation Co., LLC ",3 7 | 6,Braidwood 205000457,NPF-77,PWR,"20 miles SSW of Joliet,  IL","Exelon Generation Co., LLC ",3 8 | 7,Browns Ferry 105000259,DPR-33,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 9 | 8,Browns Ferry 205000260,DPR-52,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 10 | 9,Browns Ferry 305000296,DPR-68,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 11 | 10,Brunswick 105000325,DPR-71,BWR,"30 miles S of Wilmington, NC","Duke Energy Progress, LLC ",2 12 | 11,Brunswick 205000324,DPR-62,BWR,"30 miles S of Wilmington, NC","Duke Energy Progress, LLC",2 13 | 12,Byron 105000454,NPF-37,PWR,"17 miles SW of Rockford,  IL","Exelon Generation Co., LLC ",3 14 | 13,Byron 205000455,NPF-66,PWR,"17 miles SW of Rockford,  IL","Exelon Generation Co., LLC ",3 15 | 14,Callaway05000483,NPF-30,PWR,"25 miles ENE of Jefferson City, MO",Ameren UE ,4 16 | 15,Calvert Cliffs 105000317,DPR-53,PWR,"40 miles S of Annapolis,  MD",Constellation Energy,1 17 | 16,Calvert Cliffs 205000318,DPR-69,PWR,"40 miles S of Annapolis,  MD",Constellation Energy,1 18 | 17,Catawba 105000413,NPF-35,PWR,"18 miles S of Charlotte, NC","Duke Energy Carolinas, LLC ",2 19 | 18,Catawba 205000414,NPF-52,PWR,"18 miles S of Charlotte, NC","Duke Energy Carolinas, LLC",2 20 | 19,Clinton05000461,NPF-62,BWR,"23 miles SSE of Bloomington, IL","Exelon Generation Co., LLC",3 21 | 20,Columbia Generating Station05000397,NPF-21,BWR,"20 miles NNE of Pasco, WA",Energy Northwest ,4 22 | 21,Comanche Peak 105000445,NPF-87,PWR,"40 miles SW of Fort Worth, TX",TEX Operations Company LLC,4 23 | 22,Comanche Peak 205000446,NPF-89,PWR,"40 miles SW of Fort Worth, TX",TEX Operations 
Company LLC,4 24 | 23,Cooper05000298,DPR-46,BWR,"23 miles S of Nebraska City,  NE",Nebraska Public Power District ,4 25 | 24,D.C. Cook 105000315,DPR-58,PWR,"13 miles S of Benton Harbor,  MI",Indiana/Michigan Power Co. ,3 26 | 25,D.C. Cook 2 05000316,DPR-74,PWR,"13 miles S of Benton Harbor,  MI",Indiana/Michigan Power Co. ,3 27 | 26,Davis-Besse05000346,NPF-3,PWR,"21 miles ESE of Toledo,  OH",FirstEnergy Nuclear Operating Co. ,3 28 | 27,Diablo Canyon 105000275,DPR-80,PWR,"12 miles WSW of San Luis Obispo,  CA",Pacific Gas & Electric Co. ,4 29 | 28,Diablo Canyon 205000323,DPR-82,PWR,"12 miles WSW of San Luis Obispo,  CA",Pacific Gas & Electric Co. ,4 30 | 29,Dresden 205000237,DPR-19,BWR,"25 miles SW of Joliet, IL","Exelon Generation Co., LLC ",3 31 | 30,Dresden 305000249,DPR-25,BWR,"25 miles SW of Joliet, IL","Exelon Generation Co., LLC ",3 32 | 31,Duane Arnold05000331,DPR-49,BWR,"8 miles NW of Cedar Rapids,  IA","NextEra Energy Duane Arnold, LLC ",3 33 | 32,Farley 105000348,NPF-2,PWR,"18 miles E of Dothan,  AL",Southern Nuclear Operating Co. ,2 34 | 33,Farley 205000364,NPF-8,PWR,"18 miles E of Dothan,  AL",Southern Nuclear Operating Co.,2 35 | 34,Fermi 205000341,NPF-43,BWR,"25 miles NE of Toledo,  OH",DTE Electric Company,3 36 | 35,FitzPatrick05000333,DPR-59,BWR,"6 miles NE of Oswego,  NY","Exelon FitzPatrick, LLC/Exelon Generation Company, LLC",1 37 | 36,Ginna05000244,DPR-18,PWR,"20 miles NE of Rochester,  NY",Constellation Energy,1 38 | 37,Grand Gulf 105000416,NPF-29,BWR,"20 miles S of Vicksburg,  MS","Entergy Nuclear Operations, Inc.",4 39 | 38,Hatch 105000321,DPR-57,BWR,"20 miles S of Vidalia, GA","Southern Nuclear Operating Co., Inc. ",2 40 | 39,Hatch 205000366,NPF-5,BWR,"20 miles S of Vidalia, GA","Southern Nuclear Operating Co., Inc. 
",2 41 | 40,Hope Creek 105000354,NPF-57,BWR,"18 miles SE of Wilmington,  DE","PSEG Nuclear, LLC",1 42 | 41,Indian Point 305000286,DPR-64,PWR,"24 miles N of New York City,  NY","Entergy Nuclear Operations, Inc.",1 43 | 42,La Salle 105000373,NPF-11,BWR,"11 miles SE of Ottawa,  IL","Exelon Generation Co., LLC ",3 44 | 43,La Salle 205000374,NPF-18,BWR,"11 miles SE of Ottawa,  IL","Exelon Generation Co., LLC ",3 45 | 44,Limerick 105000352,NPF-39,BWR,"21 miles NW of Philadelphia,  PA","Exelon Generation Co., LLC ",1 46 | 45,Limerick 205000353,NPF-85,BWR,"21 miles NW of Philadelphia,  PA","Exelon Generation Co., LLC ",1 47 | 46,McGuire 105000369,NPF-9,PWR,"17 miles N of Charlotte,  NC","Duke Energy Carolinas, LLC",2 48 | 47,McGuire 205000370,NPF-17,PWR,"17 miles N of Charlotte,  NC","Duke Energy Carolinas, LLC",2 49 | 48,Millstone 205000336,DPR-65,PWR,"3.2 miles WSW of New London,  CT",Dominion Generation ,1 50 | 49,Millstone 305000423,NPF-49,PWR,"3.2 miles WSW of New London,  CT",Dominion Generation ,1 51 | 50,Monticello05000263,DPR-22,BWR,"35 miles NW of Minneapolis,  MN",Northern States Power Company – Minnesota,3 52 | 51,Nine Mile Point 105000220,DPR-63,BWR,"6 miles NE of Oswego,  NY",Constellation Energy,1 53 | 52,Nine Mile Point 205000410,NPF-69,BWR,"6 miles NE of Oswego,  NY",Constellation Energy,1 54 | 53,North Anna 105000338,NPF-4,PWR,"40 miles NW of Richmond,  VA",Dominion Generation,2 55 | 54,North Anna 205000339,NPF-7,PWR,"40 miles NW of Richmond,  VA",Dominion Generation,2 56 | 55,Oconee 105000269,DPR-38,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 57 | 56,Oconee 205000270,DPR-47,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 58 | 57,Oconee 305000287,DPR-55,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 59 | 58,Palisades05000255,DPR-20,PWR,"5 miles S of South Haven,  MI","Entergy Nuclear Operations, Inc.",3 60 | 59,Palo Verde 105000528,NPF-41,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service 
Co. ,4 61 | 60,Palo Verde 205000529,NPF-51,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service Co. ,4 62 | 61,Palo Verde 305000530,NPF-74,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service Co. ,4 63 | 62,Peach Bottom 205000277,DPR-44,BWR,"17.9 miles S of Lancaster,  PA","Exelon Generation Co., LLC ",1 64 | 63,Peach Bottom 305000278,DPR-56,BWR,"17.9 miles S of Lancaster,  PA","Exelon Generation Co., LLC ",1 65 | 64,Perry 105000440,NPF-58,BWR,"35 miles NE of Cleveland, OH",FirstEnergy Nuclear Operating Co. ,3 66 | 65,Point Beach 105000266,DPR-24,PWR,"13 miles NNW of Manitowoc,  WI","NextEra Energy Point Beach, LLC",3 67 | 66,Point Beach 205000301,DPR-27,PWR,"13 miles NNW of Manitowoc,  WI","NextEra Energy Point Beach, LLC",3 68 | 67,Prairie Island 105000282,DPR-42,PWR,"28 miles SE of Minneapolis,  MN",Northern States Power Company – Minnesota ,3 69 | 68,Prairie Island 205000306,DPR-60,PWR,"28 miles SE of Minneapolis,  MN",Northern States Power Company – Minnesota ,3 70 | 69,Quad Cities 105000254,DPR-29,BWR,"20 miles NE of Moline,  IL","Exelon Generation Co., LLC ",3 71 | 70,Quad Cities 205000265,DPR-30,BWR,"20 miles NE of Moline,  IL","Exelon Generation Co., LLC ",3 72 | 71,River Bend 105000458,NPF-47,BWR,"24 miles NNW of Baton Rouge,  LA","Entergy Nuclear Operations, Inc.",4 73 | 72,Robinson 205000261,DPR-23,PWR,"26 miles NW of Florence,  SC","Duke Energy Progress, LLC ",2 74 | 73,Saint Lucie 105000335,DPR-67,PWR,"10 miles SE of Ft. Pierce,  FL",Florida Power & Light Co. ,2 75 | 74,Saint Lucie 205000389,NPF-16,PWR,"10 miles SE of Ft. Pierce,  FL",Florida Power & Light Co. 
,2 76 | 75,Salem 105000272,DPR-70,PWR,"18 miles S of Wilmington,  DE","PSEG Nuclear, LLC",1 77 | 76,Salem 205000311,DPR-75,PWR,"18 miles S of Wilmington,  DE","PSEG Nuclear, LLC",1 78 | 77,Seabrook 105000443,NPF-86,PWR,"13 miles S of Portsmouth,  NH","NextEra Energy Seabrook, LLC",1 79 | 78,Sequoyah 105000327,DPR-77,PWR,"16 miles NE of Chattanooga,  TN",Tennessee Valley Authority ,2 80 | 79,Sequoyah 205000328,DPR-79,PWR,"16 miles NE of Chattanooga,  TN",Tennessee Valley Authority ,2 81 | 80,Shearon Harris 105000400,NPF-63,PWR,"20 miles SW of Raleigh,  NC","Duke Energy Progress, LLC",2 82 | 81,South Texas 105000498,NPF-76,PWR,"90 miles SW of Houston, TX",STP Nuclear Operating Co. ,4 83 | 82,South Texas 205000499,NPF-80,PWR,"90 miles SW of Houston, TX",STP Nuclear Operating Co. ,4 84 | 83,Summer05000395,NPF-12,PWR,"26 miles NW of Columbia,  SC",South Carolina Electric & Gas Co. ,2 85 | -------------------------------------------------------------------------------- /4 scrape data/scrapedData.csv: -------------------------------------------------------------------------------- 1 | plantNamedocketNumber,licenseNumber,reactorType,location,OwnerOperator,NRCRegion 2 | "Plant Name 3 | Docket Number",License Number,"Reactor 4 | Type",Location,Owner/Operator,NRC Region 5 | Arkansas Nuclear 105000313,DPR-51,PWR,"6 miles WNW of Russellville,  AR","Entergy Nuclear Operations, Inc. ",4 6 | Arkansas Nuclear 205000368,NPF-6,PWR,"6 miles WNW of Russellville,  AR","Entergy Nuclear Operations, Inc. ",4 7 | Beaver Valley 105000334,DPR-66,PWR,"17 miles W of McCandless,  PA",FirstEnergy Nuclear Operating Co. ,1 8 | Beaver Valley 205000412,NPF-73,PWR,"17 miles W of McCandless,  PA",FirstEnergy Nuclear Operating Co. 
,1 9 | Braidwood 105000456,NPF-72,PWR,"20 miles SSW of Joliet,  IL","Exelon Generation Co., LLC ",3 10 | Braidwood 205000457,NPF-77,PWR,"20 miles SSW of Joliet,  IL","Exelon Generation Co., LLC ",3 11 | Browns Ferry 105000259,DPR-33,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 12 | Browns Ferry 205000260,DPR-52,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 13 | Browns Ferry 305000296,DPR-68,BWR,"32 miles W of Huntsville, AL",Tennessee Valley Authority ,2 14 | Brunswick 105000325,DPR-71,BWR,"30 miles S of Wilmington, NC","Duke Energy Progress, LLC ",2 15 | Brunswick 205000324,DPR-62,BWR,"30 miles S of Wilmington, NC","Duke Energy Progress, LLC",2 16 | Byron 105000454,NPF-37,PWR,"17 miles SW of Rockford,  IL","Exelon Generation Co., LLC ",3 17 | Byron 205000455,NPF-66,PWR,"17 miles SW of Rockford,  IL","Exelon Generation Co., LLC ",3 18 | Callaway05000483,NPF-30,PWR,"25 miles ENE of Jefferson City, MO",Ameren UE ,4 19 | Calvert Cliffs 105000317,DPR-53,PWR,"40 miles S of Annapolis,  MD",Constellation Energy,1 20 | Calvert Cliffs 205000318,DPR-69,PWR,"40 miles S of Annapolis,  MD",Constellation Energy,1 21 | Catawba 105000413,NPF-35,PWR,"18 miles S of Charlotte, NC","Duke Energy Carolinas, LLC ",2 22 | Catawba 205000414,NPF-52,PWR,"18 miles S of Charlotte, NC","Duke Energy Carolinas, LLC",2 23 | Clinton05000461,NPF-62,BWR,"23 miles SSE of Bloomington, IL","Exelon Generation Co., LLC",3 24 | Columbia Generating Station05000397,NPF-21,BWR,"20 miles NNE of Pasco, WA",Energy Northwest ,4 25 | Comanche Peak 105000445,NPF-87,PWR,"40 miles SW of Fort Worth, TX",TEX Operations Company LLC,4 26 | Comanche Peak 205000446,NPF-89,PWR,"40 miles SW of Fort Worth, TX",TEX Operations Company LLC,4 27 | Cooper05000298,DPR-46,BWR,"23 miles S of Nebraska City,  NE",Nebraska Public Power District ,4 28 | D.C. Cook 105000315,DPR-58,PWR,"13 miles S of Benton Harbor,  MI",Indiana/Michigan Power Co. ,3 29 | D.C. 
Cook 2 05000316,DPR-74,PWR,"13 miles S of Benton Harbor,  MI",Indiana/Michigan Power Co. ,3 30 | Davis-Besse05000346,NPF-3,PWR,"21 miles ESE of Toledo,  OH",FirstEnergy Nuclear Operating Co. ,3 31 | Diablo Canyon 105000275,DPR-80,PWR,"12 miles WSW of San Luis Obispo,  CA",Pacific Gas & Electric Co. ,4 32 | Diablo Canyon 205000323,DPR-82,PWR,"12 miles WSW of San Luis Obispo,  CA",Pacific Gas & Electric Co. ,4 33 | Dresden 205000237,DPR-19,BWR,"25 miles SW of Joliet, IL","Exelon Generation Co., LLC ",3 34 | Dresden 305000249,DPR-25,BWR,"25 miles SW of Joliet, IL","Exelon Generation Co., LLC ",3 35 | Duane Arnold05000331,DPR-49,BWR,"8 miles NW of Cedar Rapids,  IA","NextEra Energy Duane Arnold, LLC ",3 36 | Farley 105000348,NPF-2,PWR,"18 miles E of Dothan,  AL",Southern Nuclear Operating Co. ,2 37 | Farley 205000364,NPF-8,PWR,"18 miles E of Dothan,  AL",Southern Nuclear Operating Co.,2 38 | Fermi 205000341,NPF-43,BWR,"25 miles NE of Toledo,  OH",DTE Electric Company,3 39 | FitzPatrick05000333,DPR-59,BWR,"6 miles NE of Oswego,  NY","Exelon FitzPatrick, LLC/Exelon Generation Company, LLC",1 40 | Ginna05000244,DPR-18,PWR,"20 miles NE of Rochester,  NY",Constellation Energy,1 41 | Grand Gulf 105000416,NPF-29,BWR,"20 miles S of Vicksburg,  MS","Entergy Nuclear Operations, Inc.",4 42 | Hatch 105000321,DPR-57,BWR,"20 miles S of Vidalia, GA","Southern Nuclear Operating Co., Inc. ",2 43 | Hatch 205000366,NPF-5,BWR,"20 miles S of Vidalia, GA","Southern Nuclear Operating Co., Inc. 
",2 44 | Hope Creek 105000354,NPF-57,BWR,"18 miles SE of Wilmington,  DE","PSEG Nuclear, LLC",1 45 | Indian Point 305000286,DPR-64,PWR,"24 miles N of New York City,  NY","Entergy Nuclear Operations, Inc.",1 46 | La Salle 105000373,NPF-11,BWR,"11 miles SE of Ottawa,  IL","Exelon Generation Co., LLC ",3 47 | La Salle 205000374,NPF-18,BWR,"11 miles SE of Ottawa,  IL","Exelon Generation Co., LLC ",3 48 | Limerick 105000352,NPF-39,BWR,"21 miles NW of Philadelphia,  PA","Exelon Generation Co., LLC ",1 49 | Limerick 205000353,NPF-85,BWR,"21 miles NW of Philadelphia,  PA","Exelon Generation Co., LLC ",1 50 | McGuire 105000369,NPF-9,PWR,"17 miles N of Charlotte,  NC","Duke Energy Carolinas, LLC",2 51 | McGuire 205000370,NPF-17,PWR,"17 miles N of Charlotte,  NC","Duke Energy Carolinas, LLC",2 52 | Millstone 205000336,DPR-65,PWR,"3.2 miles WSW of New London,  CT",Dominion Generation ,1 53 | Millstone 305000423,NPF-49,PWR,"3.2 miles WSW of New London,  CT",Dominion Generation ,1 54 | Monticello05000263,DPR-22,BWR,"35 miles NW of Minneapolis,  MN",Northern States Power Company – Minnesota,3 55 | Nine Mile Point 105000220,DPR-63,BWR,"6 miles NE of Oswego,  NY",Constellation Energy,1 56 | Nine Mile Point 205000410,NPF-69,BWR,"6 miles NE of Oswego,  NY",Constellation Energy,1 57 | North Anna 105000338,NPF-4,PWR,"40 miles NW of Richmond,  VA",Dominion Generation,2 58 | North Anna 205000339,NPF-7,PWR,"40 miles NW of Richmond,  VA",Dominion Generation,2 59 | Oconee 105000269,DPR-38,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 60 | Oconee 205000270,DPR-47,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 61 | Oconee 305000287,DPR-55,PWR,"30 miles W of Greenville,  SC","Duke Energy Carolinas, LLC",2 62 | Palisades05000255,DPR-20,PWR,"5 miles S of South Haven,  MI","Entergy Nuclear Operations, Inc.",3 63 | Palo Verde 105000528,NPF-41,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service Co. 
,4 64 | Palo Verde 205000529,NPF-51,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service Co. ,4 65 | Palo Verde 305000530,NPF-74,PWR,"50 miles W of Phoenix,  AZ",Arizona Public Service Co. ,4 66 | Peach Bottom 205000277,DPR-44,BWR,"17.9 miles S of Lancaster,  PA","Exelon Generation Co., LLC ",1 67 | Peach Bottom 305000278,DPR-56,BWR,"17.9 miles S of Lancaster,  PA","Exelon Generation Co., LLC ",1 68 | Perry 105000440,NPF-58,BWR,"35 miles NE of Cleveland, OH",FirstEnergy Nuclear Operating Co. ,3 69 | Point Beach 105000266,DPR-24,PWR,"13 miles NNW of Manitowoc,  WI","NextEra Energy Point Beach, LLC",3 70 | Point Beach 205000301,DPR-27,PWR,"13 miles NNW of Manitowoc,  WI","NextEra Energy Point Beach, LLC",3 71 | Prairie Island 105000282,DPR-42,PWR,"28 miles SE of Minneapolis,  MN",Northern States Power Company – Minnesota ,3 72 | Prairie Island 205000306,DPR-60,PWR,"28 miles SE of Minneapolis,  MN",Northern States Power Company – Minnesota ,3 73 | Quad Cities 105000254,DPR-29,BWR,"20 miles NE of Moline,  IL","Exelon Generation Co., LLC ",3 74 | Quad Cities 205000265,DPR-30,BWR,"20 miles NE of Moline,  IL","Exelon Generation Co., LLC ",3 75 | River Bend 105000458,NPF-47,BWR,"24 miles NNW of Baton Rouge,  LA","Entergy Nuclear Operations, Inc.",4 76 | Robinson 205000261,DPR-23,PWR,"26 miles NW of Florence,  SC","Duke Energy Progress, LLC ",2 77 | Saint Lucie 105000335,DPR-67,PWR,"10 miles SE of Ft. Pierce,  FL",Florida Power & Light Co. ,2 78 | Saint Lucie 205000389,NPF-16,PWR,"10 miles SE of Ft. Pierce,  FL",Florida Power & Light Co. 
,2 79 | Salem 105000272,DPR-70,PWR,"18 miles S of Wilmington,  DE","PSEG Nuclear, LLC",1 80 | Salem 205000311,DPR-75,PWR,"18 miles S of Wilmington,  DE","PSEG Nuclear, LLC",1 81 | Seabrook 105000443,NPF-86,PWR,"13 miles S of Portsmouth,  NH","NextEra Energy Seabrook, LLC",1 82 | Sequoyah 105000327,DPR-77,PWR,"16 miles NE of Chattanooga,  TN",Tennessee Valley Authority ,2 83 | Sequoyah 205000328,DPR-79,PWR,"16 miles NE of Chattanooga,  TN",Tennessee Valley Authority ,2 84 | Shearon Harris 105000400,NPF-63,PWR,"20 miles SW of Raleigh,  NC","Duke Energy Progress, LLC",2 85 | South Texas 105000498,NPF-76,PWR,"90 miles SW of Houston, TX",STP Nuclear Operating Co. ,4 86 | South Texas 205000499,NPF-80,PWR,"90 miles SW of Houston, TX",STP Nuclear Operating Co. ,4 87 | Summer05000395,NPF-12,PWR,"26 miles NW of Columbia,  SC",South Carolina Electric & Gas Co. ,2 88 | Surry 105000280,DPR-32,PWR,"17 miles NW of Newport News,  VA",Dominion Generation,2 89 | Surry 205000281,DPR-37,PWR,"17 miles NW of Newport News,  VA",Dominion Generation,2 90 | Susquehanna 105000387,NPF-14,BWR,"70 miles NE of Harrisburg, PA","Susquehanna Nuclear, LLC",1 91 | Susquehanna 205000388,NPF-22,BWR,"70 miles NE of Harrisburg, PA","Susquehanna Nuclear, LLC",1 92 | Turkey Point 305000250,DPR-31,PWR,"20 miles S of Miami,  FL",Florida Power & Light Co. ,2 93 | Turkey Point 405000251,DPR-41,PWR,"20 miles S of Miami,  FL",Florida Power & Light Co. 
,2 94 | Vogtle 105000424,NPF-68,PWR,"26 miles SE of Augusta,  GA",Southern Nuclear Operating Co.,2 95 | Vogtle 205000425,NPF-81,PWR,"26 miles SE of Augusta,  GA",Southern Nuclear Operating Co.,2 96 | Waterford 305000382,NPF-38,PWR,"25 miles W of New Orleans,  LA","Entergy Nuclear Operations, Inc.",4 97 | Watts Bar 105000390,NPF-90,PWR,"60 miles SW of Knoxville, TN",Tennessee Valley Authority ,2 98 | Watts Bar 205000391,NPF-96,PWR,"60 miles SW of Knoxville, TN",Tennessee Valley Authority ,2 99 | Wolf Creek 105000482,NPF-42,PWR,"3.5 miles NE of Burlington,  KS",Wolf Creek Nuclear Operating Corp. ,4 100 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Winny de Jong 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Python for Journalists 2 | ====================== 3 | *Notebooks and files for the Python for Journalists course on [Datajournalism.com](https://datajournalism.com/watch/python-for-journalists)* 4 | 5 | * [What is Python anyway](#what-is-python-anyway) 6 | * [About the course](#about-the-course) 7 | * [Course Modules](#course-modules) 8 | * 1 Getting started 9 | * 2 Clean data 10 | * 3 Analyse data 11 | * 4 Scrape data 12 | * [Learning More, Reference And Tools](#learning-more-reference-and-tools) 13 | * [About Us](#about-us) 14 | 15 | 16 | What Is Python Anyway 17 | =============================== 18 | 19 | Python is a general-purpose programming language. It's popular among data journalists for its readability, ease of use and efficiency. 20 | 21 | About the Course 22 | ================== 23 | The course Python for Journalists is meant for journalists looking to learn the most common uses of Python for data journalism. The course has four modules. In the first module you'll learn how to set up Python and all Python-related tools on your own computer. Next you'll learn how to clean up messy datasets using the Pandas library. In the third module you'll learn how to analyse data, again using the Pandas library. In the fourth and final module you'll learn how to automatically download data from the web, using both the Beautiful Soup and Requests libraries to dabble in web scraping. 24 | 25 | This Python for Journalists course is meant for those who dabbled in Python but somehow didn't persevere, and for those who can't wait to dive in head first. Though no programming knowledge is required, it helps if you know what a terminal or command prompt is and if you are familiar with Excel. 
26 | 27 | 28 | Course Modules 29 | =============== 30 | 31 | For all modules except module 1 Set up, there is a Jupyter Notebook available to follow along during the course. Each notebook contains exercises and explanations. Happy Pythoning! 32 | 33 | ### 1 Set Up 34 | 35 | This module revolves around installing the right tools on your laptop. To follow along in the coming modules, you'll need Python 3 and several Python libraries, such as Requests, Pandas and BeautifulSoup, installed. Jupyter Notebooks come highly recommended. The easiest route is to install all of this software in one go, using the [Anaconda distribution](https://anaconda.org/). This first module does not include a Jupyter Notebook. 36 | 37 | **On your computer:** 38 | 39 | * Install the [Anaconda distribution](https://www.anaconda.com/download/#macos) to get **Python 3**, the libraries Requests, Pandas, and BeautifulSoup, and Jupyter Notebooks all at once on your computer. 40 | * Note: choose the Anaconda installer that includes **Python 3**; at the time of writing that would be Python 3.6. 41 | 42 | **Extra preparation:** 43 | If you want to make sure you have a solid foundation to build on, you might want to learn about Python syntax first. Here are some places where you can learn about different data types in Python, which might help before continuing with this course. (Since the following tutorials overlap, choosing one is highly recommended.) 44 | * Online beginner tutorials at [LearnPython.org](https://www.learnpython.org/) 45 | * Digital book [Python for you and me](https://pymbook.readthedocs.io/en/py3/) 46 | 47 | ### 2 Clean data 48 | 49 | In this second module we'll show you how to get into your Python conda environment, and how to start a Jupyter Notebook. Once that's out of the way, you'll learn how to import a CSV file into your Jupyter Notebook, to get ready for some data cleaning. 
Among other things you'll learn how to search and replace values inside a column; how to change the datatype of a column; and how to extract data from a column to populate a new column. This module includes two Jupyter Notebooks: one empty and another one completed, both named 'clean data'. 50 | 51 | ### 3 Analyse data 52 | 53 | In this third module, you'll learn how to analyse data using the Pandas library. You'll learn how to explore your dataset, looking at summary statistics (count, median, mean, percentiles, standard deviation etc.) for each column. Next we'll look into how to sort, filter, sum and count values in columns. Finally you'll learn how to group data, creating (for those familiar with Excel) pivot tables, using the Pandas library. This module includes two Jupyter Notebooks: one empty and another one completed, both named 'analyse data'. 54 | 55 | **Extra exercises:** 56 | If you want to make sure you have fully grasped this module, you can take on the extra notebook that contains some exercises. Since this is a later addition to the course, there is no video to accompany this notebook. However, you should be able to work through it without a video. :) Of course there are two extra notebooks: one completed, and one for you to work in. 57 | 58 | 59 | ### 4 Scrape data 60 | 61 | The final module revolves around scraping data using both the Requests and the BeautifulSoup libraries. Though in practice you'll likely first want to scrape data, to later clean and analyse those numbers, this module is last for training purposes. The modules on cleaning and analysing data introduced you to Python, Pandas and Jupyter Notebooks, paving the way for some basic web scraping, including a for loop to collect data as efficiently as possible. After finishing this module you should be able to write some basic web scrapers to collect data from the internet. This module includes two Jupyter Notebooks: one empty and another one completed, both named 'scrape data'. 
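The scraped file `scrapedData.csv` fuses the plant name and the docket number into one column (`plantNamedocketNumber`), which is exactly the kind of "extract data from a column to populate a new column" job module 2 describes. Below is a minimal sketch of that step using only the standard library, so it runs without Pandas installed. The helper names `split_fused` and `clean_rows` are hypothetical, the sample rows are adapted from the scraped file, and the rule "a docket number is the last eight digits of the cell" is an assumption inferred from the data, not something the course states.

```python
import csv
import io
import re

# Three rows adapted from scrapedData.csv; the header names are the file's own.
SAMPLE = '''plantNamedocketNumber,licenseNumber,reactorType,location,OwnerOperator,NRCRegion
Arkansas Nuclear 105000313,DPR-51,PWR,"6 miles WNW of Russellville, AR","Entergy Nuclear Operations, Inc. ",4
Callaway05000483,NPF-30,PWR,"25 miles ENE of Jefferson City, MO",Ameren UE ,4
D.C. Cook 2 05000316,DPR-74,PWR,"13 miles S of Benton Harbor, MI",Indiana/Michigan Power Co. ,3
'''

# Assumption: the docket number is the final eight digits of the fused cell;
# everything before it (minus optional whitespace) is the plant name.
FUSED = re.compile(r"^(?P<plant>.*?)\s*(?P<docket>\d{8})$")


def split_fused(cell):
    """Return (plant_name, docket_number) for one fused cell (hypothetical helper)."""
    match = FUSED.match(cell.strip())
    if match is None:
        # Leave cells that do not fit the assumed pattern untouched.
        return cell.strip(), ""
    return match.group("plant"), match.group("docket")


def clean_rows(text):
    """Parse CSV text and build rows with separate name and docket columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        plant, docket = split_fused(row["plantNamedocketNumber"])
        rows.append({
            "plantName": plant,
            "docketNumber": docket,
            # Operator names in the scraped file often carry a trailing space.
            "OwnerOperator": row["OwnerOperator"].strip(),
        })
    return rows


cleaned = clean_rows(SAMPLE)
for row in cleaned:
    print(row)
```

In a notebook that already has the data in a Pandas DataFrame, the same regular expression could probably be passed to `Series.str.extract`, which returns one column per named group; the standard-library version is used here only to keep the sketch self-contained.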
62 | 63 | Learning More 64 | ================= 65 | 66 | * Allen B. Downey's digital book [Think Python: How to Think Like a Computer Scientist](http://greenteapress.com/thinkpython2/html/index.html) 67 | * Swaroop's free online book [A Byte of Python](https://python.swaroopch.com/) 68 | * Dan Bader's [Python video tutorials on YouTube](https://www.youtube.com/channel/UCI0vQvr9aFn27yR6Ej6n5UA) 69 | * Al Sweigart's [Automate the boring stuff with Python](https://automatetheboringstuff.com/) site 70 | * [Coding for Journalists](https://coding-for-journalists.readthedocs.io/en/latest/) 71 | 72 | ### Courses 73 | * [Your First Python Notebook](http://www.firstpythonnotebook.org/index.html): a step-by-step guide to analyzing data with Python and the Jupyter Notebook. 74 | * DataCamp [Python Courses](https://www.datacamp.com/courses/tech:python) 75 | * Zed Shaw's [Learn Python the Hard Way](https://learnpythonthehardway.org/) 76 | * edX's course [Introduction to Computer Science and Programming Using Python](https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-11) 77 | * Coursera [Python for Everybody Specialization](https://www.coursera.org/specializations/python) 78 | * Coursera [Applied Data Science with Python Specialization](https://www.coursera.org/specializations/data-science-python) 79 | 80 | 81 | About Us 82 | ======== 83 | 84 | ### About DataJournalism.com 85 | The European Journalism Center believes that the use of data in journalism is a cornerstone of building resilience in any newsroom. After 10 years of experience running data journalism programmes, they've created [DataJournalism.com](https://datajournalism.com). The site provides data journalists with free resources, materials, online video courses and community forums. Once you sign in, you can enroll for free in one of our premium online courses or join discussions with the community in our forums. 
Whether you are new to data journalism or deeply familiar with it, membership will expose you to like-minded data journalists and give you a free space to learn or improve your data skills. 86 | 87 | ### About Winny de Jong 88 | Winny works as a data journalist for the Dutch national news broadcast [NOS](https://nos.nl). There she interviews datasets instead of people, trying to find news before it is news. Winny usually speaks about the importance of data literacy, how to develop ideas, and her data journalism workflow. She has presented for organizations like TEDx, Brussels News Summit, DataHarvest+ and multiple journalism colleges. Every Sunday she shares the best of the data journalism web in her [data journalism newsletter](https://datajournalistiek.nl/en/newsletter/). Visit her online at [winnymedia.nl](https://winnymedia.nl) or at [her data blog](https://datajournalistiek.nl/en). 89 | --------------------------------------------------------------------------------