├── .dockerignore ├── .gitignore ├── .jupyter └── jupyter_notebook_config.py ├── Dockerfile-Local ├── README.md ├── cookbook ├── Chapter 1 - Reading from a CSV.ipynb ├── Chapter 2 - Selecting data & finding the most common complaint type.ipynb ├── Chapter 3 - Which borough has the most noise complaints (or, more selecting data).ipynb ├── Chapter 4 - Find out on which weekday people bike the most with groupby and aggregate.ipynb ├── Chapter 6 - String Operations- Which month was the snowiest.ipynb ├── Chapter 7 - Cleaning up messy data.ipynb ├── Chapter 8 - How to deal with timestamps.ipynb └── Chapter 9 - Loading data from SQL databases.ipynb ├── data ├── 311-service-requests.csv ├── README.md ├── bikes.csv ├── popularity-contest ├── weather_2012.csv └── weather_2012.sqlite ├── images ├── function-completion.png ├── tab-4-times.png ├── tab-once.png └── tab-twice.png ├── requirements.txt └── runtime.txt /.dockerignore: -------------------------------------------------------------------------------- 1 | # Development Artifacts 2 | .git 3 | .ipynb_checkpoints 4 | .DS_Store 5 | .gitignore 6 | binder 7 | __pycache__ 8 | data/test_db.sqlite 9 | 10 | # Text Artifacts 11 | Dockerfile 12 | README.md 13 | runtime.txt 14 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | __pycache__ 3 | data/test_db.sqlite 4 | -------------------------------------------------------------------------------- /.jupyter/jupyter_notebook_config.py: -------------------------------------------------------------------------------- 1 | c.MappingKernelManager.default_kernel_name = 'python3' 2 | -------------------------------------------------------------------------------- /Dockerfile-Local: -------------------------------------------------------------------------------- 1 | FROM ipython/scipyserver 2 | 3 | RUN mkdir /polars-cookbook 4 | WORKDIR 
/polars-cookbook 5 | RUN mkdir ./cookbook ./data ./images 6 | 7 | COPY cookbook/*.ipynb ./cookbook/ 8 | COPY cookbook/*.py ./cookbook/ 9 | COPY images/*.png ./images/ 10 | COPY data/* ./data/ 11 | 12 | WORKDIR /polars-cookbook/cookbook/ 13 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | polars cookbook 2 | =============== 3 | 4 | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/escobar-west/polars-cookbook/master) 5 | 6 | This is a fork of the [pandas-cookbook](https://github.com/jvns/pandas-cookbook) modified to use the polars library instead of pandas. 7 | 8 | [polars](https://pola-rs.github.io/polars-book/) is a Python library for doing 9 | data analysis. It's really fast and lets you do exploratory work 10 | incredibly quickly. 11 | 12 | The goal of this cookbook is to give you some concrete examples for 13 | getting started with polars. The [docs](https://pola-rs.github.io/polars/py-polars/html/reference/index.html) 14 | are really comprehensive. However, I've often had people 15 | tell me that they have some trouble getting started, so these are 16 | examples with real-world data, and all the bugs and weirdness 17 | that entails. 18 | 19 | It uses 3 datasets: 20 | 21 | * 311 calls in New York 22 | * How many people were on Montréal's bike paths in 2012 23 | * Montreal's weather for 2012, hourly 24 | 25 | It comes with batteries (data) included, so you can try out all the 26 | examples right away. 27 | 28 | Table of Contents 29 | ================= 30 | 31 | 32 | * [Chapter 1: Reading from a CSV](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%201%20-%20Reading%20from%20a%20CSV.ipynb) 33 |
Reading your data into polars is pretty much the easiest thing. Even when the encoding is wrong! 34 | * [Chapter 2: Selecting data & finding the most common complaint type](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb) 35 |
It's not totally obvious how to select data from a polars dataframe. Here I explain the basics (how to take slices and get columns). 36 | * [Chapter 3: Which borough has the most noise complaints? (or, more selecting data)](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%20%28or%2C%20more%20selecting%20data%29.ipynb) 37 |
Here we get into serious slicing and dicing and learn how to filter dataframes in complicated ways, really fast. 38 | * [Chapter 4: Find out on which weekday people bike the most with groupby and aggregate](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%204%20-%20Find%20out%20on%20which%20weekday%20people%20bike%20the%20most%20with%20groupby%20and%20aggregate.ipynb) 39 |
The groupby/aggregate is seriously my favorite thing about polars and I use it all the time. You should probably read this. 40 | * [Chapter 5: Combining dataframes and scraping Canadian weather data](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%205%20-%20Combining%20dataframes%20and%20scraping%20Canadian%20weather%20data.ipynb) 41 |
This chapter has been omitted due to inactive web URLs. 42 | * [Chapter 6: String operations! Which month was the snowiest?](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%206%20-%20String%20Operations-%20Which%20month%20was%20the%20snowiest.ipynb) 43 |
Strings with polars are great. It has all these vectorized string operations and they're the best. We will turn a bunch of strings containing "Snow" into vectors of numbers in a trice. 44 | * [Chapter 7: Cleaning up messy data](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%207%20-%20Cleaning%20up%20messy%20data.ipynb) 45 |
Cleaning up messy data is never a joy, but with polars it's easier <3 46 | * [Chapter 8: Parsing Unix timestamps](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%208%20-%20How%20to%20deal%20with%20timestamps.ipynb) 47 |
This is basically a quick trick that took me 2 days to figure out. 48 | * [Chapter 9: Loading data from SQL databases](http://nbviewer.jupyter.org/github/escobar-west/polars-cookbook/blob/master/cookbook/Chapter%209%20-%20Loading%20data%20from%20SQL%20databases.ipynb) 49 |
How to load data from an SQL database into polars, with examples using SQLite3, PostgreSQL, and MySQL. 50 | 51 | How to use this cookbook 52 | ======================== 53 | 54 | The easiest way is to try it out instantly online using Binder's awesome service. **[Start by clicking here](https://mybinder.org/v2/gh/escobar-west/polars-cookbook/master)**, wait for it to launch, then click on "cookbook", and you'll be off to the races! It will let you run all the code interactively without having to install anything on your computer. 55 | 56 | To install it locally, you'll need Jupyter notebook and polars on your computer. 57 | 58 | You can get these using `pip` (you may want to do this inside a virtual environment to avoid conflicting with your other libraries). 59 | 60 | ```bash 61 | pip install -r requirements.txt 62 | ``` 63 | 64 | This can be difficult to set up and may require you to compile 65 | a whole bunch of things. I instead use and recommend 66 | [Anaconda](https://store.continuum.io/), a Python distribution that 67 | will give you everything you need. It's free and open source. 68 | 69 | Once you have polars and Jupyter, you can get going! 70 | 71 | ```bash 72 | git clone https://github.com/escobar-west/polars-cookbook.git 73 | cd polars-cookbook/cookbook 74 | jupyter notebook 75 | ``` 76 | 77 | A tab should open up in your browser at `http://localhost:8888` 78 | 79 | Happy polars! 80 | 81 | Running the cookbook inside a Docker container. 82 | =============================================================== 83 | This repository contains a Dockerfile and can be built into a Docker image. 84 | To build the image, run the following command from inside the repository directory: 85 | ``` 86 | docker build -t escobar-west/polars-cookbook -f Dockerfile-Local . 
87 | ``` 88 | Then run the container: 89 | ``` 90 | docker run -d -p 8888:8888 -e "PASSWORD=MakeAPassword" <IMAGE ID> 91 | ``` 92 | You can find the ID of the image by running: 93 | ``` 94 | docker images 95 | ``` 96 | 97 | After starting the container, you can access the Jupyter notebook with the cookbook 98 | on port 8888. 99 | 100 | License 101 | ======= 102 | 103 | Creative Commons License
104 | 105 | This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/) 106 | -------------------------------------------------------------------------------- /cookbook/Chapter 3 - Which borough has the most noise complaints (or, more selecting data).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "1.6.0\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "import polars as pl\n", 18 | "import polars.selectors as cs\n", 19 | "import seaborn as sbn\n", 20 | "import matplotlib.pyplot as plt\n", 21 | "\n", 22 | "# Make the graphs a bit prettier, and bigger\n", 23 | "plt.style.use('ggplot')\n", 24 | "plt.rcParams['figure.figsize'] = (15, 5)\n", 25 | "print(pl.__version__)" 26 | ] 27 | }, 28 | { 29 | "attachments": {}, 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "Let's continue with our NYC 311 service requests example." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "# 'Incident Zip' has mixed types, so we read it as a string to prevent parsing errors\n", 43 | "complaints = pl.read_csv('../data/311-service-requests.csv', schema_overrides={'Incident Zip':pl.String})" 44 | ] 45 | }, 46 | { 47 | "attachments": {}, 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "# 3.1 Selecting only noise complaints" 52 | ] 53 | }, 54 | { 55 | "attachments": {}, 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "I'd like to know which borough has the most noise complaints. 
First, we'll take a look at the data to see what it looks like:" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "data": { 69 | "text/html": [ 70 | "
\n", 77 | "shape: (5, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26589651"10/31/2013 02:08:41 AM"null"NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""11432""90-03 169 STREET""169 STREET""90 AVENUE""91 AVENUE"nullnull"ADDRESS""JAMAICA"null"Precinct""Assigned""10/31/2013 10:08:41 AM""10/31/2013 02:35:17 AM""12 QUEENS""QUEENS"1042027197389"Unspecified""QUEENS""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.708275-73.791604"(40.70827532593202, -73.791603…
26593698"10/31/2013 02:01:04 AM"null"NYPD""New York City Police Departmen…"Illegal Parking""Commercial Overnight Parking""Street/Sidewalk""11378""58 AVENUE""58 AVENUE""58 PLACE""59 STREET"nullnull"BLOCKFACE""MASPETH"null"Precinct""Open""10/31/2013 10:01:04 AM"null"05 QUEENS""QUEENS"1009349201984"Unspecified""QUEENS""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.721041-73.909453"(40.721040535628305, -73.90945…
26594139"10/31/2013 02:00:24 AM""10/31/2013 02:40:32 AM""NYPD""New York City Police Departmen…"Noise - Commercial""Loud Music/Party""Club/Bar/Restaurant""10032""4060 BROADWAY""BROADWAY""WEST 171 STREET""WEST 172 STREET"nullnull"ADDRESS""NEW YORK"null"Precinct""Closed""10/31/2013 10:00:24 AM""10/31/2013 02:39:42 AM""12 MANHATTAN""MANHATTAN"1001088246531"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.84333-73.939144"(40.84332975466513, -73.939143…
26595721"10/31/2013 01:56:23 AM""10/31/2013 02:21:48 AM""NYPD""New York City Police Departmen…"Noise - Vehicle""Car/Truck Horn""Street/Sidewalk""10023""WEST 72 STREET""WEST 72 STREET""COLUMBUS AVENUE""AMSTERDAM AVENUE"nullnull"BLOCKFACE""NEW YORK"null"Precinct""Closed""10/31/2013 09:56:23 AM""10/31/2013 02:21:10 AM""07 MANHATTAN""MANHATTAN"989730222727"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.778009-73.980213"(40.7780087446372, -73.9802134…
26590930"10/31/2013 01:53:44 AM"null"DOHMH""Department of Health and Menta…"Rodent""Condition Attracting Rodents""Vacant Lot""10027""WEST 124 STREET""WEST 124 STREET""LENOX AVENUE""ADAM CLAYTON POWELL JR BOULEVA…nullnull"BLOCKFACE""NEW YORK"null"N/A""Pending""11/30/2013 01:53:44 AM""10/31/2013 01:59:54 AM""10 MANHATTAN""MANHATTAN"998815233545"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.807691-73.947387"(40.80769092704951, -73.947387…
" 78 | ], 79 | "text/plain": [ 80 | "shape: (5, 52)\n", 81 | "┌────────────┬────────────┬───────────┬────────┬───┬───────────┬───────────┬───────────┬───────────┐\n", 82 | "│ Unique Key ┆ Created ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 83 | "│ --- ┆ Date ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 84 | "│ i64 ┆ --- ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 85 | "│ ┆ str ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 86 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 87 | "╞════════════╪════════════╪═══════════╪════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n", 88 | "│ 26589651 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.708275 ┆ -73.79160 ┆ (40.70827 │\n", 89 | "│ ┆ 02:08:41 ┆ ┆ ┆ ┆ ┆ ┆ 4 ┆ 532593202 │\n", 90 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ , -73.791 │\n", 91 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 603… │\n", 92 | "│ 26593698 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.721041 ┆ -73.90945 ┆ (40.72104 │\n", 93 | "│ ┆ 02:01:04 ┆ ┆ ┆ ┆ ┆ ┆ 3 ┆ 053562830 │\n", 94 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ 5, -73.90 │\n", 95 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 945… │\n", 96 | "│ 26594139 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.84333 ┆ -73.93914 ┆ (40.84332 │\n", 97 | "│ ┆ 02:00:24 ┆ 3 ┆ ┆ ┆ ┆ ┆ 4 ┆ 975466513 │\n", 98 | "│ ┆ AM ┆ 02:40:32 ┆ ┆ ┆ ┆ ┆ ┆ , -73.939 │\n", 99 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 143… │\n", 100 | "│ 26595721 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.778009 ┆ -73.98021 ┆ (40.77800 │\n", 101 | "│ ┆ 01:56:23 ┆ 3 ┆ ┆ ┆ ┆ ┆ 3 ┆ 87446372, │\n", 102 | "│ ┆ AM ┆ 02:21:48 ┆ ┆ ┆ ┆ ┆ ┆ -73.98021 │\n", 103 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 34… │\n", 104 | "│ 26590930 ┆ 10/31/2013 ┆ null ┆ DOHMH ┆ … ┆ null ┆ 40.807691 ┆ -73.94738 ┆ (40.80769 │\n", 105 | "│ ┆ 01:53:44 ┆ ┆ ┆ ┆ ┆ ┆ 7 ┆ 092704951 │\n", 106 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ , -73.947 │\n", 107 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 387… │\n", 108 | "└────────────┴────────────┴───────────┴────────┴───┴───────────┴───────────┴───────────┴───────────┘" 109 | ] 110 | }, 111 | "execution_count": 3, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | 
"source": [ 117 | "complaints.head()" 118 | ] 119 | }, 120 | { 121 | "attachments": {}, 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "To get the noise complaints, we need to find the rows where the \"Complaint Type\" column is \"Noise - Street/Sidewalk\". I'll show you how to do that, and then explain what's going on." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "data": { 135 | "text/html": [ 136 | "
\n", 143 | "shape: (3, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26589651"10/31/2013 02:08:41 AM"null"NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""11432""90-03 169 STREET""169 STREET""90 AVENUE""91 AVENUE"nullnull"ADDRESS""JAMAICA"null"Precinct""Assigned""10/31/2013 10:08:41 AM""10/31/2013 02:35:17 AM""12 QUEENS""QUEENS"1042027197389"Unspecified""QUEENS""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.708275-73.791604"(40.70827532593202, -73.791603…
26594086"10/31/2013 12:54:03 AM""10/31/2013 02:16:39 AM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Music/Party""Street/Sidewalk""10310""173 CAMPBELL AVENUE""CAMPBELL AVENUE""HENDERSON AVENUE""WINEGAR LANE"nullnull"ADDRESS""STATEN ISLAND"null"Precinct""Closed""10/31/2013 08:54:03 AM""10/31/2013 02:07:14 AM""01 STATEN ISLAND""STATEN ISLAND"952013171076"Unspecified""STATEN ISLAND""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.636182-74.11615"(40.63618202176914, -74.116150…
26591573"10/31/2013 12:35:18 AM""10/31/2013 02:41:35 AM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""10312""24 PRINCETON LANE""PRINCETON LANE""HAMPTON GREEN""DEAD END"nullnull"ADDRESS""STATEN ISLAND"null"Precinct""Closed""10/31/2013 08:35:18 AM""10/31/2013 01:45:17 AM""03 STATEN ISLAND""STATEN ISLAND"929577140964"Unspecified""STATEN ISLAND""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.553421-74.196743"(40.55342078716953, -74.196743…
" 144 | ], 145 | "text/plain": [ 146 | "shape: (3, 52)\n", 147 | "┌────────────┬────────────┬───────────┬────────┬───┬───────────┬───────────┬───────────┬───────────┐\n", 148 | "│ Unique Key ┆ Created ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 149 | "│ --- ┆ Date ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 150 | "│ i64 ┆ --- ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 151 | "│ ┆ str ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 152 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 153 | "╞════════════╪════════════╪═══════════╪════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n", 154 | "│ 26589651 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.708275 ┆ -73.79160 ┆ (40.70827 │\n", 155 | "│ ┆ 02:08:41 ┆ ┆ ┆ ┆ ┆ ┆ 4 ┆ 532593202 │\n", 156 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ , -73.791 │\n", 157 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 603… │\n", 158 | "│ 26594086 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.636182 ┆ -74.11615 ┆ (40.63618 │\n", 159 | "│ ┆ 12:54:03 ┆ 3 ┆ ┆ ┆ ┆ ┆ ┆ 202176914 │\n", 160 | "│ ┆ AM ┆ 02:16:39 ┆ ┆ ┆ ┆ ┆ ┆ , -74.116 │\n", 161 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 150… │\n", 162 | "│ 26591573 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.553421 ┆ -74.19674 ┆ (40.55342 │\n", 163 | "│ ┆ 12:35:18 ┆ 3 ┆ ┆ ┆ ┆ ┆ 3 ┆ 078716953 │\n", 164 | "│ ┆ AM ┆ 02:41:35 ┆ ┆ ┆ ┆ ┆ ┆ , -74.196 │\n", 165 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 743… │\n", 166 | "└────────────┴────────────┴───────────┴────────┴───┴───────────┴───────────┴───────────┴───────────┘" 167 | ] 168 | }, 169 | "execution_count": 4, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "noise_complaints = complaints.filter(pl.col('Complaint Type') == \"Noise - Street/Sidewalk\")\n", 176 | "noise_complaints.head(3)" 177 | ] 178 | }, 179 | { 180 | "attachments": {}, 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "If you look at `noise_complaints`, you'll see that this worked, and it only contains complaints with the right complaint type. But how does this work? 
Let's deconstruct it into two pieces:" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 5, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "data": { 194 | "text/html": [ 195 | "[(col(\"Complaint Type\")) == (String(Noise - Street/Sidewalk))]" 196 | ], 197 | "text/plain": [ 198 | "" 199 | ] 200 | }, 201 | "execution_count": 5, 202 | "metadata": {}, 203 | "output_type": "execute_result" 204 | } 205 | ], 206 | "source": [ 207 | "pl.col('Complaint Type') == \"Noise - Street/Sidewalk\"" 208 | ] 209 | }, 210 | { 211 | "attachments": {}, 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "This is a polars expression, which describes a transformation of one or more Series. Here, the expression maps the 'Complaint Type' Series (of strings) to a boolean Series that is true wherever the predicate holds. The `filter` method accepts any expression that evaluates to a boolean Series and keeps only the rows where it is true.\n", 216 | "\n", 217 | "You can also store expressions in variables and combine them with the `&` operator, like this:" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 6, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "data": { 227 | "text/html": [ 228 | "
\n", 235 | "shape: (5, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26595564"10/31/2013 12:30:36 AM"null"NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Music/Party""Street/Sidewalk""11236""AVENUE J""AVENUE J""EAST 80 STREET""EAST 81 STREET"nullnull"BLOCKFACE""BROOKLYN"null"Precinct""Open""10/31/2013 08:30:36 AM"null"18 BROOKLYN""BROOKLYN"1008937170310"Unspecified""BROOKLYN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.634104-73.911055"(40.634103775951736, -73.91105…
26595553"10/31/2013 12:05:10 AM""10/31/2013 02:43:43 AM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""11225""25 LEFFERTS AVENUE""LEFFERTS AVENUE""WASHINGTON AVENUE""BEDFORD AVENUE"nullnull"ADDRESS""BROOKLYN"null"Precinct""Closed""10/31/2013 08:05:10 AM""10/31/2013 01:29:29 AM""09 BROOKLYN""BROOKLYN"995366180388"Unspecified""BROOKLYN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.661793-73.959934"(40.6617931276793, -73.9599336…
26594653"10/30/2013 11:26:32 PM""10/31/2013 12:18:54 AM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Music/Party""Street/Sidewalk""11222"nullnullnullnull"DOBBIN STREET""NORMAN STREET""INTERSECTION""BROOKLYN"null"Precinct""Closed""10/31/2013 07:26:32 AM""10/31/2013 12:18:54 AM""01 BROOKLYN""BROOKLYN"996925203271"Unspecified""BROOKLYN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.7246-73.954271"(40.724599563793525, -73.95427…
26591992"10/30/2013 10:02:58 PM""10/30/2013 10:23:20 PM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""11218""DITMAS AVENUE""DITMAS AVENUE"nullnullnullnull"LATLONG""BROOKLYN"null"Precinct""Closed""10/31/2013 06:02:58 AM""10/30/2013 10:23:20 PM""01 BROOKLYN""BROOKLYN"991895171051"Unspecified""BROOKLYN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.636169-73.972455"(40.63616876563881, -73.972455…
26594167"10/30/2013 08:38:25 PM""10/30/2013 10:26:28 PM""NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Music/Party""Street/Sidewalk""11218""126 BEVERLY ROAD""BEVERLY ROAD""CHURCH AVENUE""EAST 2 STREET"nullnull"ADDRESS""BROOKLYN"null"Precinct""Closed""10/31/2013 04:38:25 AM""10/30/2013 10:26:28 PM""12 BROOKLYN""BROOKLYN"990144173511"Unspecified""BROOKLYN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.642922-73.978762"(40.6429222774404, -73.9787617…
" 236 | ], 237 | "text/plain": [ 238 | "shape: (5, 52)\n", 239 | "┌────────────┬────────────┬───────────┬────────┬───┬───────────┬───────────┬───────────┬───────────┐\n", 240 | "│ Unique Key ┆ Created ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 241 | "│ --- ┆ Date ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 242 | "│ i64 ┆ --- ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 243 | "│ ┆ str ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 244 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 245 | "╞════════════╪════════════╪═══════════╪════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n", 246 | "│ 26595564 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.634104 ┆ -73.91105 ┆ (40.63410 │\n", 247 | "│ ┆ 12:30:36 ┆ ┆ ┆ ┆ ┆ ┆ 5 ┆ 377595173 │\n", 248 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ 6, -73.91 │\n", 249 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 105… │\n", 250 | "│ 26595553 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.661793 ┆ -73.95993 ┆ (40.66179 │\n", 251 | "│ ┆ 12:05:10 ┆ 3 ┆ ┆ ┆ ┆ ┆ 4 ┆ 31276793, │\n", 252 | "│ ┆ AM ┆ 02:43:43 ┆ ┆ ┆ ┆ ┆ ┆ -73.95993 │\n", 253 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 36… │\n", 254 | "│ 26594653 ┆ 10/30/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.7246 ┆ -73.95427 ┆ (40.72459 │\n", 255 | "│ ┆ 11:26:32 ┆ 3 ┆ ┆ ┆ ┆ ┆ 1 ┆ 956379352 │\n", 256 | "│ ┆ PM ┆ 12:18:54 ┆ ┆ ┆ ┆ ┆ ┆ 5, -73.95 │\n", 257 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 427… │\n", 258 | "│ 26591992 ┆ 10/30/2013 ┆ 10/30/201 ┆ NYPD ┆ … ┆ null ┆ 40.636169 ┆ -73.97245 ┆ (40.63616 │\n", 259 | "│ ┆ 10:02:58 ┆ 3 ┆ ┆ ┆ ┆ ┆ 5 ┆ 876563881 │\n", 260 | "│ ┆ PM ┆ 10:23:20 ┆ ┆ ┆ ┆ ┆ ┆ , -73.972 │\n", 261 | "│ ┆ ┆ PM ┆ ┆ ┆ ┆ ┆ ┆ 455… │\n", 262 | "│ 26594167 ┆ 10/30/2013 ┆ 10/30/201 ┆ NYPD ┆ … ┆ null ┆ 40.642922 ┆ -73.97876 ┆ (40.64292 │\n", 263 | "│ ┆ 08:38:25 ┆ 3 ┆ ┆ ┆ ┆ ┆ 2 ┆ 22774404, │\n", 264 | "│ ┆ PM ┆ 10:26:28 ┆ ┆ ┆ ┆ ┆ ┆ -73.97876 │\n", 265 | "│ ┆ ┆ PM ┆ ┆ ┆ ┆ ┆ ┆ 17… │\n", 266 | "└────────────┴────────────┴───────────┴────────┴───┴───────────┴───────────┴───────────┴───────────┘" 267 | ] 268 | }, 269 | "execution_count": 6, 270 | "metadata": {}, 271 | 
"output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "is_noise = pl.col('Complaint Type') == \"Noise - Street/Sidewalk\"\n", 276 | "in_brooklyn = pl.col('Borough') == \"BROOKLYN\"\n", 277 | "complaints.filter(is_noise & in_brooklyn).head()" 278 | ] 279 | }, 280 | { 281 | "attachments": {}, 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "Or if we just wanted a few columns:" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 7, 291 | "metadata": {}, 292 | "outputs": [ 293 | { 294 | "data": { 295 | "text/html": [ 296 | "
\n", 303 | "shape: (10, 4)
Complaint TypeBoroughCreated DateDescriptor
strstrstrstr
"Noise - Street/Sidewalk""BROOKLYN""10/31/2013 12:30:36 AM""Loud Music/Party"
"Noise - Street/Sidewalk""BROOKLYN""10/31/2013 12:05:10 AM""Loud Talking"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 11:26:32 PM""Loud Music/Party"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 10:02:58 PM""Loud Talking"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 08:38:25 PM""Loud Music/Party"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 08:32:13 PM""Loud Talking"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 06:07:39 PM""Loud Music/Party"
"Noise - Street/Sidewalk""BROOKLYN""10/30/2013 03:04:51 PM""Loud Talking"
"Noise - Street/Sidewalk""BROOKLYN""10/29/2013 10:07:02 PM""Loud Talking"
"Noise - Street/Sidewalk""BROOKLYN""10/29/2013 08:15:59 PM""Loud Music/Party"
" 304 | ], 305 | "text/plain": [ 306 | "shape: (10, 4)\n", 307 | "┌─────────────────────────┬──────────┬────────────────────────┬──────────────────┐\n", 308 | "│ Complaint Type ┆ Borough ┆ Created Date ┆ Descriptor │\n", 309 | "│ --- ┆ --- ┆ --- ┆ --- │\n", 310 | "│ str ┆ str ┆ str ┆ str │\n", 311 | "╞═════════════════════════╪══════════╪════════════════════════╪══════════════════╡\n", 312 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/31/2013 12:30:36 AM ┆ Loud Music/Party │\n", 313 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/31/2013 12:05:10 AM ┆ Loud Talking │\n", 314 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 11:26:32 PM ┆ Loud Music/Party │\n", 315 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 10:02:58 PM ┆ Loud Talking │\n", 316 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 08:38:25 PM ┆ Loud Music/Party │\n", 317 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 08:32:13 PM ┆ Loud Talking │\n", 318 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 06:07:39 PM ┆ Loud Music/Party │\n", 319 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/30/2013 03:04:51 PM ┆ Loud Talking │\n", 320 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/29/2013 10:07:02 PM ┆ Loud Talking │\n", 321 | "│ Noise - Street/Sidewalk ┆ BROOKLYN ┆ 10/29/2013 08:15:59 PM ┆ Loud Music/Party │\n", 322 | "└─────────────────────────┴──────────┴────────────────────────┴──────────────────┘" 323 | ] 324 | }, 325 | "execution_count": 7, 326 | "metadata": {}, 327 | "output_type": "execute_result" 328 | } 329 | ], 330 | "source": [ 331 | "complaints.filter(is_noise & in_brooklyn).select('Complaint Type', 'Borough', 'Created Date', 'Descriptor').head(10)" 332 | ] 333 | }, 334 | { 335 | "attachments": {}, 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "# 3.2 So, which borough has the most noise complaints?" 
340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 8, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/html": [ 350 | "
\n", 357 | "shape: (6, 2)
Boroughcount
stru32
"MANHATTAN"917
"BROOKLYN"456
"BRONX"292
"QUEENS"226
"STATEN ISLAND"36
"Unspecified"1
" 358 | ], 359 | "text/plain": [ 360 | "shape: (6, 2)\n", 361 | "┌───────────────┬───────┐\n", 362 | "│ Borough ┆ count │\n", 363 | "│ --- ┆ --- │\n", 364 | "│ str ┆ u32 │\n", 365 | "╞═══════════════╪═══════╡\n", 366 | "│ MANHATTAN ┆ 917 │\n", 367 | "│ BROOKLYN ┆ 456 │\n", 368 | "│ BRONX ┆ 292 │\n", 369 | "│ QUEENS ┆ 226 │\n", 370 | "│ STATEN ISLAND ┆ 36 │\n", 371 | "│ Unspecified ┆ 1 │\n", 372 | "└───────────────┴───────┘" 373 | ] 374 | }, 375 | "execution_count": 8, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "noise_complaints = complaints.filter(pl.col('Complaint Type') == \"Noise - Street/Sidewalk\")\n", 382 | "noise_complaints['Borough'].value_counts(sort=True)" 383 | ] 384 | }, 385 | { 386 | "attachments": {}, 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "It's Manhattan! But Manhattan probably has a lot of complaints in total. Maybe it's better to get the percentage of all complaints that are noise complaints? That would be easy too with the `group_by` method:" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 9, 396 | "metadata": {}, 397 | "outputs": [ 398 | { 399 | "data": { 400 | "text/html": [ 401 | "
\n", 408 | "shape: (6, 2)
Boroughnoise_complaint_avg
strf64
"MANHATTAN"0.037755
"BRONX"0.014833
"BROOKLYN"0.013864
"QUEENS"0.010143
"STATEN ISLAND"0.007474
"Unspecified"0.000141
" 409 | ], 410 | "text/plain": [ 411 | "shape: (6, 2)\n", 412 | "┌───────────────┬─────────────────────┐\n", 413 | "│ Borough ┆ noise_complaint_avg │\n", 414 | "│ --- ┆ --- │\n", 415 | "│ str ┆ f64 │\n", 416 | "╞═══════════════╪═════════════════════╡\n", 417 | "│ MANHATTAN ┆ 0.037755 │\n", 418 | "│ BRONX ┆ 0.014833 │\n", 419 | "│ BROOKLYN ┆ 0.013864 │\n", 420 | "│ QUEENS ┆ 0.010143 │\n", 421 | "│ STATEN ISLAND ┆ 0.007474 │\n", 422 | "│ Unspecified ┆ 0.000141 │\n", 423 | "└───────────────┴─────────────────────┘" 424 | ] 425 | }, 426 | "execution_count": 9, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "complaint_avgs = (\n", 433 | " complaints\n", 434 | " .group_by(\"Borough\")\n", 435 | " .agg(noise_complaint_avg=(pl.col('Complaint Type') == \"Noise - Street/Sidewalk\").mean())\n", 436 | " .sort('noise_complaint_avg', descending=True)\n", 437 | ")\n", 438 | "complaint_avgs" 439 | ] 440 | }, 441 | { 442 | "attachments": {}, 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "It looks like noise complaints make up about 3.7% of all complaints in Manhattan. Which isn't a lot, but it's still leading amongst all boroughs." 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 10, 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "data": { 456 | "text/plain": [ 457 | "" 458 | ] 459 | }, 460 | "execution_count": 10, 461 | "metadata": {}, 462 | "output_type": "execute_result" 463 | }, 464 | { 465 | "data": { 466 | "image/png": "", 467 | "text/plain": [ 468 | "
" 469 | ] 470 | }, 471 | "metadata": {}, 472 | "output_type": "display_data" 473 | } 474 | ], 475 | "source": [ 476 | "sbn.barplot(complaint_avgs, x='Borough', y='noise_complaint_avg')" 477 | ] 478 | }, 479 | { 480 | "attachments": {}, 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "So Manhattan really does complain more about noise than the other boroughs! Neat." 485 | ] 486 | }, 487 | { 488 | "attachments": {}, 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": [ 492 | "\n", 89 | "shape: (5, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26589651"10/31/2013 02:08:41 AM"null"NYPD""New York City Police Departmen…"Noise - Street/Sidewalk""Loud Talking""Street/Sidewalk""11432""90-03 169 STREET""169 STREET""90 AVENUE""91 AVENUE"nullnull"ADDRESS""JAMAICA"null"Precinct""Assigned""10/31/2013 10:08:41 AM""10/31/2013 02:35:17 AM""12 QUEENS""QUEENS"1042027197389"Unspecified""QUEENS""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.708275-73.791604"(40.70827532593202, -73.791603…
26593698"10/31/2013 02:01:04 AM"null"NYPD""New York City Police Departmen…"Illegal Parking""Commercial Overnight Parking""Street/Sidewalk""11378""58 AVENUE""58 AVENUE""58 PLACE""59 STREET"nullnull"BLOCKFACE""MASPETH"null"Precinct""Open""10/31/2013 10:01:04 AM"null"05 QUEENS""QUEENS"1009349201984"Unspecified""QUEENS""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.721041-73.909453"(40.721040535628305, -73.90945…
26594139"10/31/2013 02:00:24 AM""10/31/2013 02:40:32 AM""NYPD""New York City Police Departmen…"Noise - Commercial""Loud Music/Party""Club/Bar/Restaurant""10032""4060 BROADWAY""BROADWAY""WEST 171 STREET""WEST 172 STREET"nullnull"ADDRESS""NEW YORK"null"Precinct""Closed""10/31/2013 10:00:24 AM""10/31/2013 02:39:42 AM""12 MANHATTAN""MANHATTAN"1001088246531"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.84333-73.939144"(40.84332975466513, -73.939143…
26595721"10/31/2013 01:56:23 AM""10/31/2013 02:21:48 AM""NYPD""New York City Police Departmen…"Noise - Vehicle""Car/Truck Horn""Street/Sidewalk""10023""WEST 72 STREET""WEST 72 STREET""COLUMBUS AVENUE""AMSTERDAM AVENUE"nullnull"BLOCKFACE""NEW YORK"null"Precinct""Closed""10/31/2013 09:56:23 AM""10/31/2013 02:21:10 AM""07 MANHATTAN""MANHATTAN"989730222727"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.778009-73.980213"(40.7780087446372, -73.9802134…
26590930"10/31/2013 01:53:44 AM"null"DOHMH""Department of Health and Menta…"Rodent""Condition Attracting Rodents""Vacant Lot""10027""WEST 124 STREET""WEST 124 STREET""LENOX AVENUE""ADAM CLAYTON POWELL JR BOULEVA…nullnull"BLOCKFACE""NEW YORK"null"N/A""Pending""11/30/2013 01:53:44 AM""10/31/2013 01:59:54 AM""10 MANHATTAN""MANHATTAN"998815233545"Unspecified""MANHATTAN""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnull40.807691-73.947387"(40.80769092704951, -73.947387…
" 90 | ], 91 | "text/plain": [ 92 | "shape: (5, 52)\n", 93 | "┌────────────┬────────────┬───────────┬────────┬───┬───────────┬───────────┬───────────┬───────────┐\n", 94 | "│ Unique Key ┆ Created ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 95 | "│ --- ┆ Date ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 96 | "│ i64 ┆ --- ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 97 | "│ ┆ str ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 98 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 99 | "╞════════════╪════════════╪═══════════╪════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n", 100 | "│ 26589651 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.708275 ┆ -73.79160 ┆ (40.70827 │\n", 101 | "│ ┆ 02:08:41 ┆ ┆ ┆ ┆ ┆ ┆ 4 ┆ 532593202 │\n", 102 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ , -73.791 │\n", 103 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 603… │\n", 104 | "│ 26593698 ┆ 10/31/2013 ┆ null ┆ NYPD ┆ … ┆ null ┆ 40.721041 ┆ -73.90945 ┆ (40.72104 │\n", 105 | "│ ┆ 02:01:04 ┆ ┆ ┆ ┆ ┆ ┆ 3 ┆ 053562830 │\n", 106 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ 5, -73.90 │\n", 107 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 945… │\n", 108 | "│ 26594139 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.84333 ┆ -73.93914 ┆ (40.84332 │\n", 109 | "│ ┆ 02:00:24 ┆ 3 ┆ ┆ ┆ ┆ ┆ 4 ┆ 975466513 │\n", 110 | "│ ┆ AM ┆ 02:40:32 ┆ ┆ ┆ ┆ ┆ ┆ , -73.939 │\n", 111 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 143… │\n", 112 | "│ 26595721 ┆ 10/31/2013 ┆ 10/31/201 ┆ NYPD ┆ … ┆ null ┆ 40.778009 ┆ -73.98021 ┆ (40.77800 │\n", 113 | "│ ┆ 01:56:23 ┆ 3 ┆ ┆ ┆ ┆ ┆ 3 ┆ 87446372, │\n", 114 | "│ ┆ AM ┆ 02:21:48 ┆ ┆ ┆ ┆ ┆ ┆ -73.98021 │\n", 115 | "│ ┆ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ 34… │\n", 116 | "│ 26590930 ┆ 10/31/2013 ┆ null ┆ DOHMH ┆ … ┆ null ┆ 40.807691 ┆ -73.94738 ┆ (40.80769 │\n", 117 | "│ ┆ 01:53:44 ┆ ┆ ┆ ┆ ┆ ┆ 7 ┆ 092704951 │\n", 118 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ , -73.947 │\n", 119 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 387… │\n", 120 | "└────────────┴────────────┴───────────┴────────┴───┴───────────┴───────────┴───────────┴───────────┘" 121 | ] 122 | }, 123 | "metadata": {}, 124 | "output_type": "display_data" 125 | }, 126 | { 127 | "data": { 128 | 
"text/plain": [ 129 | "Schema([('Unique Key', Int64),\n", 130 | " ('Created Date', String),\n", 131 | " ('Closed Date', String),\n", 132 | " ('Agency', String),\n", 133 | " ('Agency Name', String),\n", 134 | " ('Complaint Type', String),\n", 135 | " ('Descriptor', String),\n", 136 | " ('Location Type', String),\n", 137 | " ('Incident Zip', String),\n", 138 | " ('Incident Address', String),\n", 139 | " ('Street Name', String),\n", 140 | " ('Cross Street 1', String),\n", 141 | " ('Cross Street 2', String),\n", 142 | " ('Intersection Street 1', String),\n", 143 | " ('Intersection Street 2', String),\n", 144 | " ('Address Type', String),\n", 145 | " ('City', String),\n", 146 | " ('Landmark', String),\n", 147 | " ('Facility Type', String),\n", 148 | " ('Status', String),\n", 149 | " ('Due Date', String),\n", 150 | " ('Resolution Action Updated Date', String),\n", 151 | " ('Community Board', String),\n", 152 | " ('Borough', String),\n", 153 | " ('X Coordinate (State Plane)', Int64),\n", 154 | " ('Y Coordinate (State Plane)', Int64),\n", 155 | " ('Park Facility Name', String),\n", 156 | " ('Park Borough', String),\n", 157 | " ('School Name', String),\n", 158 | " ('School Number', String),\n", 159 | " ('School Region', String),\n", 160 | " ('School Code', String),\n", 161 | " ('School Phone Number', String),\n", 162 | " ('School Address', String),\n", 163 | " ('School City', String),\n", 164 | " ('School State', String),\n", 165 | " ('School Zip', String),\n", 166 | " ('School Not Found', String),\n", 167 | " ('School or Citywide Complaint', String),\n", 168 | " ('Vehicle Type', String),\n", 169 | " ('Taxi Company Borough', String),\n", 170 | " ('Taxi Pick Up Location', String),\n", 171 | " ('Bridge Highway Name', String),\n", 172 | " ('Bridge Highway Direction', String),\n", 173 | " ('Road Ramp', String),\n", 174 | " ('Bridge Highway Segment', String),\n", 175 | " ('Garage Lot Name', String),\n", 176 | " ('Ferry Direction', String),\n", 177 | " ('Ferry Terminal Name', 
String),\n", 178 | " ('Latitude', Float64),\n", 179 | " ('Longitude', Float64),\n", 180 | " ('Location', String)])" 181 | ] 182 | }, 183 | "metadata": {}, 184 | "output_type": "display_data" 185 | } 186 | ], 187 | "source": [ 188 | "requests = pl.read_csv('../data/311-service-requests.csv', infer_schema_length=None)\n", 189 | "display(requests.head())\n", 190 | "display(requests.schema)" 191 | ] 192 | }, 193 | { 194 | "attachments": {}, 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "# 7.1 How do we know if it's messy? " 199 | ] 200 | }, 201 | { 202 | "attachments": {}, 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "We're going to look at a few columns here. I already know that there are some problems with the zip code, so let's look at that first.\n", 207 | " \n", 208 | "To get a sense for whether a column has problems, I usually use `.unique()` to look at all its values. If it's a numeric column, I'll instead plot a histogram to get a sense of the distribution.\n", 209 | "\n", 210 | "When we look at the unique values in \"Incident Zip\", it quickly becomes clear that this is a mess.\n", 211 | "\n", 212 | "Some of the problems:\n", 213 | "\n", 214 | "* Some have been parsed as strings, and some as floats\n", 215 | "* There are `null`s \n", 216 | "* Some of the zip codes are `29616-0759` or `00083`\n", 217 | "* There are some N/A values that polars didn't recognize, like 'N/A' and 'NO CLUE'\n", 218 | "\n", 219 | "What we can do:\n", 220 | "\n", 221 | "* Normalize 'N/A' and 'NO CLUE' into regular `null` values\n", 222 | "* Look at what's up with the 00083, and decide what to do\n", 223 | "* Make everything strings" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 4, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/html": [ 234 | "
\n", 241 | "shape: (251,)
Incident Zip
str
null
"00000"
"000000"
"00083"
"02061"
"90010"
"92123"
"N/A"
"NA"
"NO CLUE"
" 242 | ], 243 | "text/plain": [ 244 | "shape: (251,)\n", 245 | "Series: 'Incident Zip' [str]\n", 246 | "[\n", 247 | "\tnull\n", 248 | "\t\"00000\"\n", 249 | "\t\"000000\"\n", 250 | "\t\"00083\"\n", 251 | "\t\"02061\"\n", 252 | "\t…\n", 253 | "\t\"90010\"\n", 254 | "\t\"92123\"\n", 255 | "\t\"N/A\"\n", 256 | "\t\"NA\"\n", 257 | "\t\"NO CLUE\"\n", 258 | "]" 259 | ] 260 | }, 261 | "execution_count": 4, 262 | "metadata": {}, 263 | "output_type": "execute_result" 264 | } 265 | ], 266 | "source": [ 267 | "requests['Incident Zip'].unique().sort()" 268 | ] 269 | }, 270 | { 271 | "attachments": {}, 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "# 7.2 Fixing the null_values and string/float confusion" 276 | ] 277 | }, 278 | { 279 | "attachments": {}, 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "We can pass a `null_values` option to `pl.read_csv` to clean this up a little bit. We can also specify that the type of Incident Zip is a string, not a float." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 5, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "name": "stderr", 293 | "output_type": "stream", 294 | "text": [ 295 | "/var/folders/sz/c22f1dwn4pz41534xrybbydc0000gn/T/ipykernel_26170/480128266.py:2: DeprecationWarning: The argument `dtypes` for `read_csv` is deprecated. It has been renamed to `schema_overrides`.\n", 296 | " requests = pl.read_csv('../data/311-service-requests.csv', null_values=null_values, dtypes={'Incident Zip':pl.String})\n" 297 | ] 298 | }, 299 | { 300 | "data": { 301 | "text/html": [ 302 | "
\n", 309 | "shape: (248,)
Incident Zip
str
null
"00000"
"000000"
"00083"
"02061"
"70711"
"77056"
"77092-2016"
"90010"
"92123"
" 310 | ], 311 | "text/plain": [ 312 | "shape: (248,)\n", 313 | "Series: 'Incident Zip' [str]\n", 314 | "[\n", 315 | "\tnull\n", 316 | "\t\"00000\"\n", 317 | "\t\"000000\"\n", 318 | "\t\"00083\"\n", 319 | "\t\"02061\"\n", 320 | "\t…\n", 321 | "\t\"70711\"\n", 322 | "\t\"77056\"\n", 323 | "\t\"77092-2016\"\n", 324 | "\t\"90010\"\n", 325 | "\t\"92123\"\n", 326 | "]" 327 | ] 328 | }, 329 | "execution_count": 5, 330 | "metadata": {}, 331 | "output_type": "execute_result" 332 | } 333 | ], 334 | "source": [ 335 | "null_values = ['NO CLUE', 'N/A', '0', 'NA']\n", 336 | "requests = pl.read_csv('../data/311-service-requests.csv', null_values=null_values, dtypes={'Incident Zip':pl.String})\n", 337 | "requests['Incident Zip'].unique().sort()" 338 | ] 339 | }, 340 | { 341 | "attachments": {}, 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "# 7.3 What's up with the dashes?" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 6, 351 | "metadata": {}, 352 | "outputs": [ 353 | { 354 | "name": "stdout", 355 | "output_type": "stream", 356 | "text": [ 357 | "number of zip codes with dashes: 5\n" 358 | ] 359 | }, 360 | { 361 | "data": { 362 | "text/html": [ 363 | "
\n", 370 | "shape: (5, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26550551"10/24/2013 06:16:34 PM"null"DCA""Department of Consumer Affairs""Consumer Complaint""False Advertising"null"77092-2016""2700 EAST SELTICE WAY""EAST SELTICE WAY"nullnullnullnullnull"HOUSTON"nullnull"Assigned""11/13/2013 11:15:20 AM""10/29/2013 11:16:16 AM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnullnullnullnull
26548831"10/24/2013 09:35:10 AM"null"DCA""Department of Consumer Affairs""Consumer Complaint""Harassment"null"55164-0737""P.O. BOX 64437""64437"nullnullnullnullnull"ST. PAUL"nullnull"Assigned""11/13/2013 02:30:21 PM""10/29/2013 02:31:06 PM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnullnullnullnull
26488417"10/15/2013 03:40:33 PM"null"TLC""Taxi and Limousine Commission""Taxi Complaint""Driver Complaint""Street""11549-3650""365 HOFSTRA UNIVERSITY""HOFSTRA UNIVERSITY"nullnullnullnullnull"HEMSTEAD"nullnull"Assigned""11/30/2013 01:20:33 PM""10/16/2013 01:21:39 PM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnull"La Guardia Airport"nullnullnullnullnullnullnullnullnullnull
26468296"10/10/2013 12:36:43 PM""10/26/2013 01:07:07 AM""DCA""Department of Consumer Affairs""Consumer Complaint""Debt Not Owed"null"29616-0759""PO BOX 25759""BOX 25759"nullnullnullnullnull"GREENVILLE"nullnull"Closed""10/26/2013 09:20:28 AM""10/26/2013 01:07:07 AM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnullnullnullnull
26461137"10/09/2013 05:23:46 PM""10/25/2013 01:06:41 AM""DCA""Department of Consumer Affairs""Consumer Complaint""Harassment"null"35209-3114""600 BEACON PKWY""BEACON PKWY"nullnullnullnullnull"BIRMINGHAM"nullnull"Closed""10/25/2013 02:43:42 PM""10/25/2013 01:06:41 AM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnullnullnullnullnullnullnullnullnullnullnullnull
" 371 | ], 372 | "text/plain": [ 373 | "shape: (5, 52)\n", 374 | "┌────────────┬────────────┬────────────┬────────┬───┬────────────┬──────────┬───────────┬──────────┐\n", 375 | "│ Unique Key ┆ Created ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 376 | "│ --- ┆ Date ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 377 | "│ i64 ┆ --- ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 378 | "│ ┆ str ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 379 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 380 | "╞════════════╪════════════╪════════════╪════════╪═══╪════════════╪══════════╪═══════════╪══════════╡\n", 381 | "│ 26550551 ┆ 10/24/2013 ┆ null ┆ DCA ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 382 | "│ ┆ 06:16:34 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 383 | "│ ┆ PM ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 384 | "│ 26548831 ┆ 10/24/2013 ┆ null ┆ DCA ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 385 | "│ ┆ 09:35:10 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 386 | "│ ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 387 | "│ 26488417 ┆ 10/15/2013 ┆ null ┆ TLC ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 388 | "│ ┆ 03:40:33 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 389 | "│ ┆ PM ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 390 | "│ 26468296 ┆ 10/10/2013 ┆ 10/26/2013 ┆ DCA ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 391 | "│ ┆ 12:36:43 ┆ 01:07:07 ┆ ┆ ┆ ┆ ┆ ┆ │\n", 392 | "│ ┆ PM ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ │\n", 393 | "│ 26461137 ┆ 10/09/2013 ┆ 10/25/2013 ┆ DCA ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 394 | "│ ┆ 05:23:46 ┆ 01:06:41 ┆ ┆ ┆ ┆ ┆ ┆ │\n", 395 | "│ ┆ PM ┆ AM ┆ ┆ ┆ ┆ ┆ ┆ │\n", 396 | "└────────────┴────────────┴────────────┴────────┴───┴────────────┴──────────┴───────────┴──────────┘" 397 | ] 398 | }, 399 | "execution_count": 6, 400 | "metadata": {}, 401 | "output_type": "execute_result" 402 | } 403 | ], 404 | "source": [ 405 | "rows_with_dashes = requests.filter(\n", 406 | " pl.col('Incident Zip').str.contains('-')\n", 407 | ")\n", 408 | "print('number of zip codes with dashes: ', rows_with_dashes.height)\n", 409 | "rows_with_dashes.head()" 410 | ] 411 | }, 412 | { 413 | "attachments": {}, 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | 
"source": [ 417 | "I thought these were missing data and originally deleted them. But then my friend Dave pointed out that 9-digit zip codes are normal. Let's look at all the zip codes with more than 5 digits, make sure they're okay, and then truncate them." 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 7, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "data": { 427 | "text/html": [ 428 | "
\n", 435 | "shape: (5,)
Incident Zip
str
"77092-2016"
"11549-3650"
"55164-0737"
"35209-3114"
"29616-0759"
" 436 | ], 437 | "text/plain": [ 438 | "shape: (5,)\n", 439 | "Series: 'Incident Zip' [str]\n", 440 | "[\n", 441 | "\t\"77092-2016\"\n", 442 | "\t\"11549-3650\"\n", 443 | "\t\"55164-0737\"\n", 444 | "\t\"35209-3114\"\n", 445 | "\t\"29616-0759\"\n", 446 | "]" 447 | ] 448 | }, 449 | "execution_count": 7, 450 | "metadata": {}, 451 | "output_type": "execute_result" 452 | } 453 | ], 454 | "source": [ 455 | "requests.filter(\n", 456 | " pl.col('Incident Zip').str.contains('-')\n", 457 | ")['Incident Zip'].unique()" 458 | ] 459 | }, 460 | { 461 | "attachments": {}, 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "Those all look okay to truncate to me." 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": 8, 471 | "metadata": {}, 472 | "outputs": [ 473 | { 474 | "data": { 475 | "text/html": [ 476 | "
\n", 483 | "shape: (0,)
Incident Zip
str
" 484 | ], 485 | "text/plain": [ 486 | "shape: (0,)\n", 487 | "Series: 'Incident Zip' [str]\n", 488 | "[\n", 489 | "]" 490 | ] 491 | }, 492 | "execution_count": 8, 493 | "metadata": {}, 494 | "output_type": "execute_result" 495 | } 496 | ], 497 | "source": [ 498 | "requests = requests.with_columns(\n", 499 | " pl.col('Incident Zip').str.slice(0, 5)\n", 500 | ")\n", 501 | "requests.filter(\n", 502 | " pl.col('Incident Zip').str.contains('-')\n", 503 | ")['Incident Zip'].unique()" 504 | ] 505 | }, 506 | { 507 | "attachments": {}, 508 | "cell_type": "markdown", 509 | "metadata": {}, 510 | "source": [ 511 | "Done." 512 | ] 513 | }, 514 | { 515 | "attachments": {}, 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "Earlier I thought 00083 was a broken zip code, but it turns out Central Park's zip code is 00083! Shows what I know. I'm still concerned about the 00000 zip codes, though: let's look at those. " 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 9, 525 | "metadata": {}, 526 | "outputs": [ 527 | { 528 | "data": { 529 | "text/html": [ 530 | "
\n", 537 | "shape: (2, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
26529313"10/22/2013 02:51:06 PM"null"TLC""Taxi and Limousine Commission""Taxi Complaint""Driver Complaint"null"00000""EWR EWR""EWR"nullnullnullnullnull"NEWARK"nullnull"Assigned""12/07/2013 09:53:51 AM""10/23/2013 09:54:43 AM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnull"Other"nullnullnullnullnullnullnullnullnullnull
26507389"10/17/2013 05:48:44 PM"null"TLC""Taxi and Limousine Commission""Taxi Complaint""Driver Complaint""Street""00000""1 NEWARK AIRPORT""NEWARK AIRPORT"nullnullnullnullnull"NEWARK"nullnull"Assigned""12/02/2013 11:59:46 AM""10/18/2013 12:01:08 PM""0 Unspecified""Unspecified"nullnull"Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""Unspecified""N"nullnullnull"Other"nullnullnullnullnullnullnullnullnullnull
" 538 | ], 539 | "text/plain": [ 540 | "shape: (2, 52)\n", 541 | "┌────────────┬──────────────┬────────┬────────┬───┬──────────┬──────────┬───────────┬──────────┐\n", 542 | "│ Unique Key ┆ Created Date ┆ Closed ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 543 | "│ --- ┆ --- ┆ Date ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 544 | "│ i64 ┆ str ┆ --- ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 545 | "│ ┆ ┆ str ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 546 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 547 | "╞════════════╪══════════════╪════════╪════════╪═══╪══════════╪══════════╪═══════════╪══════════╡\n", 548 | "│ 26529313 ┆ 10/22/2013 ┆ null ┆ TLC ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 549 | "│ ┆ 02:51:06 PM ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 550 | "│ 26507389 ┆ 10/17/2013 ┆ null ┆ TLC ┆ … ┆ null ┆ null ┆ null ┆ null │\n", 551 | "│ ┆ 05:48:44 PM ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 552 | "└────────────┴──────────────┴────────┴────────┴───┴──────────┴──────────┴───────────┴──────────┘" 553 | ] 554 | }, 555 | "execution_count": 9, 556 | "metadata": {}, 557 | "output_type": "execute_result" 558 | } 559 | ], 560 | "source": [ 561 | "requests.filter(\n", 562 | " pl.col('Incident Zip') == '00000'\n", 563 | ")" 564 | ] 565 | }, 566 | { 567 | "attachments": {}, 568 | "cell_type": "markdown", 569 | "metadata": {}, 570 | "source": [ 571 | "This looks bad to me. Let's set these to `null`." 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": 10, 577 | "metadata": {}, 578 | "outputs": [ 579 | { 580 | "data": { 581 | "text/html": [ 582 | "
\n", 589 | "shape: (0, 52)
Unique KeyCreated DateClosed DateAgencyAgency NameComplaint TypeDescriptorLocation TypeIncident ZipIncident AddressStreet NameCross Street 1Cross Street 2Intersection Street 1Intersection Street 2Address TypeCityLandmarkFacility TypeStatusDue DateResolution Action Updated DateCommunity BoardBoroughX Coordinate (State Plane)Y Coordinate (State Plane)Park Facility NamePark BoroughSchool NameSchool NumberSchool RegionSchool CodeSchool Phone NumberSchool AddressSchool CitySchool StateSchool ZipSchool Not FoundSchool or Citywide ComplaintVehicle TypeTaxi Company BoroughTaxi Pick Up LocationBridge Highway NameBridge Highway DirectionRoad RampBridge Highway SegmentGarage Lot NameFerry DirectionFerry Terminal NameLatitudeLongitudeLocation
i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstri64i64strstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrstrf64f64str
" 590 | ], 591 | "text/plain": [ 592 | "shape: (0, 52)\n", 593 | "┌────────────┬─────────┬─────────────┬────────┬───┬──────────┬──────────┬───────────┬──────────┐\n", 594 | "│ Unique Key ┆ Created ┆ Closed Date ┆ Agency ┆ … ┆ Ferry ┆ Latitude ┆ Longitude ┆ Location │\n", 595 | "│ --- ┆ Date ┆ --- ┆ --- ┆ ┆ Terminal ┆ --- ┆ --- ┆ --- │\n", 596 | "│ i64 ┆ --- ┆ str ┆ str ┆ ┆ Name ┆ f64 ┆ f64 ┆ str │\n", 597 | "│ ┆ str ┆ ┆ ┆ ┆ --- ┆ ┆ ┆ │\n", 598 | "│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n", 599 | "╞════════════╪═════════╪═════════════╪════════╪═══╪══════════╪══════════╪═══════════╪══════════╡\n", 600 | "└────────────┴─────────┴─────────────┴────────┴───┴──────────┴──────────┴───────────┴──────────┘" 601 | ] 602 | }, 603 | "execution_count": 10, 604 | "metadata": {}, 605 | "output_type": "execute_result" 606 | } 607 | ], 608 | "source": [ 609 | "requests = requests.with_columns(\n", 610 | " pl.when(pl.col('Incident Zip') == '00000').then(None).otherwise(pl.col('Incident Zip')).alias('Incident Zip')\n", 611 | ")\n", 612 | "requests.filter(\n", 613 | " pl.col('Incident Zip') == '00000'\n", 614 | ")" 615 | ] 616 | }, 617 | { 618 | "attachments": {}, 619 | "cell_type": "markdown", 620 | "metadata": {}, 621 | "source": [ 622 | "Great. Let's see where we are now:" 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 11, 628 | "metadata": {}, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/html": [ 633 | "
\n", 640 | "shape: (246,)
Incident Zip
str
null
"00083"
"02061"
"06901"
"07020"
"70711"
"77056"
"77092"
"90010"
"92123"
" 641 | ], 642 | "text/plain": [ 643 | "shape: (246,)\n", 644 | "Series: 'Incident Zip' [str]\n", 645 | "[\n", 646 | "\tnull\n", 647 | "\t\"00083\"\n", 648 | "\t\"02061\"\n", 649 | "\t\"06901\"\n", 650 | "\t\"07020\"\n", 651 | "\t…\n", 652 | "\t\"70711\"\n", 653 | "\t\"77056\"\n", 654 | "\t\"77092\"\n", 655 | "\t\"90010\"\n", 656 | "\t\"92123\"\n", 657 | "]" 658 | ] 659 | }, 660 | "execution_count": 11, 661 | "metadata": {}, 662 | "output_type": "execute_result" 663 | } 664 | ], 665 | "source": [ 666 | "unique_zips = requests['Incident Zip'].unique().sort()\n", 667 | "unique_zips" 668 | ] 669 | }, 670 | { 671 | "attachments": {}, 672 | "cell_type": "markdown", 673 | "metadata": {}, 674 | "source": [ 675 | "Amazing! This is much cleaner. There's something a bit weird here, though -- I looked up 77056 on Google Maps, and that's in Texas.\n", 676 | "\n", 677 | "Let's take a closer look:" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 12, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "data": { 687 | "text/html": [ 688 | "
\n", 695 | "shape: (1, 3)
Incident ZipDescriptorCity
strstrstr
"77056""Debt Not Owed""HOUSTON"
" 696 | ], 697 | "text/plain": [ 698 | "shape: (1, 3)\n", 699 | "┌──────────────┬───────────────┬─────────┐\n", 700 | "│ Incident Zip ┆ Descriptor ┆ City │\n", 701 | "│ --- ┆ --- ┆ --- │\n", 702 | "│ str ┆ str ┆ str │\n", 703 | "╞══════════════╪═══════════════╪═════════╡\n", 704 | "│ 77056 ┆ Debt Not Owed ┆ HOUSTON │\n", 705 | "└──────────────┴───────────────┴─────────┘" 706 | ] 707 | }, 708 | "execution_count": 12, 709 | "metadata": {}, 710 | "output_type": "execute_result" 711 | } 712 | ], 713 | "source": [ 714 | "requests.lazy().select(\n", 715 | " 'Incident Zip',\n", 716 | " 'Descriptor',\n", 717 | " 'City'\n", 718 | ").filter(\n", 719 | " pl.col('Incident Zip') == \"77056\"\n", 720 | ").sort('Incident Zip').collect()" 721 | ] 722 | }, 723 | { 724 | "attachments": {}, 725 | "cell_type": "markdown", 726 | "metadata": {}, 727 | "source": [ 728 | "Okay, there really are requests coming from Houston! Good to know. Filtering by zip code is probably a bad way to handle this -- we should really be looking at the city instead." 729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": 13, 734 | "metadata": {}, 735 | "outputs": [ 736 | { 737 | "data": { 738 | "text/html": [ 739 | "
\n", 746 | "shape: (101, 2)
Citycount
stru32
"BROOKLYN"31662
"NEW YORK"22664
"BRONX"18438
null12215
"STATEN ISLAND"4766
"SYRACUSE"1
"NANUET"1
"FARMINGDALE"1
"NEW YOR"1
"NEWARK AIRPORT"1
" 747 | ], 748 | "text/plain": [ 749 | "shape: (101, 2)\n", 750 | "┌────────────────┬───────┐\n", 751 | "│ City ┆ count │\n", 752 | "│ --- ┆ --- │\n", 753 | "│ str ┆ u32 │\n", 754 | "╞════════════════╪═══════╡\n", 755 | "│ BROOKLYN ┆ 31662 │\n", 756 | "│ NEW YORK ┆ 22664 │\n", 757 | "│ BRONX ┆ 18438 │\n", 758 | "│ null ┆ 12215 │\n", 759 | "│ STATEN ISLAND ┆ 4766 │\n", 760 | "│ … ┆ … │\n", 761 | "│ SYRACUSE ┆ 1 │\n", 762 | "│ NANUET ┆ 1 │\n", 763 | "│ FARMINGDALE ┆ 1 │\n", 764 | "│ NEW YOR ┆ 1 │\n", 765 | "│ NEWARK AIRPORT ┆ 1 │\n", 766 | "└────────────────┴───────┘" 767 | ] 768 | }, 769 | "execution_count": 13, 770 | "metadata": {}, 771 | "output_type": "execute_result" 772 | } 773 | ], 774 | "source": [ 775 | "requests['City'].str.to_uppercase().value_counts(sort=True)" 776 | ] 777 | }, 778 | { 779 | "attachments": {}, 780 | "cell_type": "markdown", 781 | "metadata": {}, 782 | "source": [ 783 | "There are 12,215 `null` values in the `City` column. On closer inspection, many of these rows are also missing their `Incident Zip` value:" 784 | ] 785 | }, 786 | { 787 | "cell_type": "code", 788 | "execution_count": 14, 789 | "metadata": {}, 790 | "outputs": [ 791 | { 792 | "data": { 793 | "text/html": [ 794 | "
\n", 801 | "shape: (12_215, 3)
Incident ZipDescriptorCity
strstrstr
null"Street Light Out"null
null"Street Light Out"null
null"Medicaid"null
null"Controller"null
null"Property Tax Exemption Applica…null
null"Street Light Out"null
null"Street Light Out"null
null"Property Tax Exemption Applica…null
"10022""Driver Complaint"null
"11429""Dead Animal"null
" 802 | ], 803 | "text/plain": [ 804 | "shape: (12_215, 3)\n", 805 | "┌──────────────┬─────────────────────────────────┬──────┐\n", 806 | "│ Incident Zip ┆ Descriptor ┆ City │\n", 807 | "│ --- ┆ --- ┆ --- │\n", 808 | "│ str ┆ str ┆ str │\n", 809 | "╞══════════════╪═════════════════════════════════╪══════╡\n", 810 | "│ null ┆ Street Light Out ┆ null │\n", 811 | "│ null ┆ Street Light Out ┆ null │\n", 812 | "│ null ┆ Medicaid ┆ null │\n", 813 | "│ null ┆ Controller ┆ null │\n", 814 | "│ null ┆ Property Tax Exemption Applica… ┆ null │\n", 815 | "│ … ┆ … ┆ … │\n", 816 | "│ null ┆ Street Light Out ┆ null │\n", 817 | "│ null ┆ Street Light Out ┆ null │\n", 818 | "│ null ┆ Property Tax Exemption Applica… ┆ null │\n", 819 | "│ 10022 ┆ Driver Complaint ┆ null │\n", 820 | "│ 11429 ┆ Dead Animal ┆ null │\n", 821 | "└──────────────┴─────────────────────────────────┴──────┘" 822 | ] 823 | }, 824 | "execution_count": 14, 825 | "metadata": {}, 826 | "output_type": "execute_result" 827 | } 828 | ], 829 | "source": [ 830 | "requests.select(\n", 831 | " 'Incident Zip',\n", 832 | " 'Descriptor',\n", 833 | " 'City'\n", 834 | ").filter(\n", 835 | " pl.col('City').is_null()\n", 836 | ").sort('Incident Zip')" 837 | ] 838 | }, 839 | { 840 | "attachments": {}, 841 | "cell_type": "markdown", 842 | "metadata": {}, 843 | "source": [ 844 | "# 7.4 Putting it together" 845 | ] 846 | }, 847 | { 848 | "attachments": {}, 849 | "cell_type": "markdown", 850 | "metadata": {}, 851 | "source": [ 852 | "Here's what we ended up doing to clean up our zip codes, all together:" 853 | ] 854 | }, 855 | { 856 | "cell_type": "code", 857 | "execution_count": 15, 858 | "metadata": {}, 859 | "outputs": [], 860 | "source": [ 861 | "null_values = ['NO CLUE', 'N/A', '0', 'NA']\n", 862 | "requests = (\n", 863 | " pl.scan_csv('../data/311-service-requests.csv', null_values=null_values, schema_overrides={'Incident Zip':pl.String})\n", 864 | " .with_columns(pl.col('Incident Zip').str.slice(0, 5))\n", 865 | ")\n", 866 | 
"requests = (\n", 867 | " requests\n", 868 | " .with_columns(pl.when(pl.col('Incident Zip') == '00000').then(None).otherwise(pl.col('Incident Zip')).alias('Incident Zip'))\n", 869 | " .filter(pl.col('Incident Zip').is_not_null())\n", 870 | " .collect()\n", 871 | ")" 872 | ] 873 | }, 874 | { 875 | "cell_type": "code", 876 | "execution_count": 16, 877 | "metadata": {}, 878 | "outputs": [ 879 | { 880 | "data": { 881 | "text/html": [ 882 | "
\n", 889 | "shape: (245,)
Incident Zip
str
"00083"
"02061"
"06901"
"07020"
"07087"
"70711"
"77056"
"77092"
"90010"
"92123"
" 890 | ], 891 | "text/plain": [ 892 | "shape: (245,)\n", 893 | "Series: 'Incident Zip' [str]\n", 894 | "[\n", 895 | "\t\"00083\"\n", 896 | "\t\"02061\"\n", 897 | "\t\"06901\"\n", 898 | "\t\"07020\"\n", 899 | "\t\"07087\"\n", 900 | "\t…\n", 901 | "\t\"70711\"\n", 902 | "\t\"77056\"\n", 903 | "\t\"77092\"\n", 904 | "\t\"90010\"\n", 905 | "\t\"92123\"\n", 906 | "]" 907 | ] 908 | }, 909 | "execution_count": 16, 910 | "metadata": {}, 911 | "output_type": "execute_result" 912 | } 913 | ], 914 | "source": [ 915 | "requests['Incident Zip'].unique().sort()" 916 | ] 917 | }, 918 | { 919 | "attachments": {}, 920 | "cell_type": "markdown", 921 | "metadata": {}, 922 | "source": [ 923 | "\n", 91 | "shape: (5, 5)
atimectimepackage-namemru-programtag
i64i64strstrstr
13872957971367633260"perl-base""/usr/bin/perl"null
13872957961354370480"login""/bin/su"null
13872957431354341275"libtalloc2""/usr/lib/x86_64-linux-gnu/libt…null
13872957431387224204"libwbclient0""/usr/lib/x86_64-linux-gnu/libw…"<RECENT-CTIME>"
13872957421354341253"libselinux1""/lib/x86_64-linux-gnu/libselin…null
" 92 | ], 93 | "text/plain": [ 94 | "shape: (5, 5)\n", 95 | "┌────────────┬────────────┬──────────────┬─────────────────────────────────┬────────────────┐\n", 96 | "│ atime ┆ ctime ┆ package-name ┆ mru-program ┆ tag │\n", 97 | "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", 98 | "│ i64 ┆ i64 ┆ str ┆ str ┆ str │\n", 99 | "╞════════════╪════════════╪══════════════╪═════════════════════════════════╪════════════════╡\n", 100 | "│ 1387295797 ┆ 1367633260 ┆ perl-base ┆ /usr/bin/perl ┆ null │\n", 101 | "│ 1387295796 ┆ 1354370480 ┆ login ┆ /bin/su ┆ null │\n", 102 | "│ 1387295743 ┆ 1354341275 ┆ libtalloc2 ┆ /usr/lib/x86_64-linux-gnu/libt… ┆ null │\n", 103 | "│ 1387295743 ┆ 1387224204 ┆ libwbclient0 ┆ /usr/lib/x86_64-linux-gnu/libw… ┆ │\n", 104 | "│ 1387295742 ┆ 1354341253 ┆ libselinux1 ┆ /lib/x86_64-linux-gnu/libselin… ┆ null │\n", 105 | "└────────────┴────────────┴──────────────┴─────────────────────────────────┴────────────────┘" 106 | ] 107 | }, 108 | "execution_count": 3, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "popcon.head()" 115 | ] 116 | }, 117 | { 118 | "attachments": {}, 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "We can explicitly convert the integers to datetimes using the `from_epoch` function:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 4, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "popcon = popcon.with_columns(\n", 132 | " pl.from_epoch('atime', time_unit='s'),\n", 133 | " pl.from_epoch('ctime') #time_unit='s' is default\n", 134 | ")" 135 | ] 136 | }, 137 | { 138 | "attachments": {}, 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "If we look at the dtype now, it's `pl.Datetime`." 
143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 5, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "data": { 152 | "text/plain": [ 153 | "Datetime(time_unit='us', time_zone=None)" 154 | ] 155 | }, 156 | "execution_count": 5, 157 | "metadata": {}, 158 | "output_type": "execute_result" 159 | } 160 | ], 161 | "source": [ 162 | "popcon['atime'].dtype" 163 | ] 164 | }, 165 | { 166 | "attachments": {}, 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "So now we can look at our `atime` and `ctime` as dates!" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 6, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "data": { 180 | "text/html": [ 181 | "
\n", 188 | "shape: (5, 5)
atimectimepackage-namemru-programtag
datetime[μs]datetime[μs]strstrstr
2013-12-17 15:56:372013-05-04 02:07:40"perl-base""/usr/bin/perl"null
2013-12-17 15:56:362012-12-01 14:01:20"login""/bin/su"null
2013-12-17 15:55:432012-12-01 05:54:35"libtalloc2""/usr/lib/x86_64-linux-gnu/libt…null
2013-12-17 15:55:432013-12-16 20:03:24"libwbclient0""/usr/lib/x86_64-linux-gnu/libw…"<RECENT-CTIME>"
2013-12-17 15:55:422012-12-01 05:54:13"libselinux1""/lib/x86_64-linux-gnu/libselin…null
" 189 | ], 190 | "text/plain": [ 191 | "shape: (5, 5)\n", 192 | "┌─────────────────────┬─────────────────────┬──────────────┬──────────────────────┬────────────────┐\n", 193 | "│ atime ┆ ctime ┆ package-name ┆ mru-program ┆ tag │\n", 194 | "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", 195 | "│ datetime[μs] ┆ datetime[μs] ┆ str ┆ str ┆ str │\n", 196 | "╞═════════════════════╪═════════════════════╪══════════════╪══════════════════════╪════════════════╡\n", 197 | "│ 2013-12-17 15:56:37 ┆ 2013-05-04 02:07:40 ┆ perl-base ┆ /usr/bin/perl ┆ null │\n", 198 | "│ 2013-12-17 15:56:36 ┆ 2012-12-01 14:01:20 ┆ login ┆ /bin/su ┆ null │\n", 199 | "│ 2013-12-17 15:55:43 ┆ 2012-12-01 05:54:35 ┆ libtalloc2 ┆ /usr/lib/x86_64-linu ┆ null │\n", 200 | "│ ┆ ┆ ┆ x-gnu/libt… ┆ │\n", 201 | "│ 2013-12-17 15:55:43 ┆ 2013-12-16 20:03:24 ┆ libwbclient0 ┆ /usr/lib/x86_64-linu ┆ │\n", 202 | "│ ┆ ┆ ┆ x-gnu/libw… ┆ │\n", 203 | "│ 2013-12-17 15:55:42 ┆ 2012-12-01 05:54:13 ┆ libselinux1 ┆ /lib/x86_64-linux-gn ┆ null │\n", 204 | "│ ┆ ┆ ┆ u/libselin… ┆ │\n", 205 | "└─────────────────────┴─────────────────────┴──────────────┴──────────────────────┴────────────────┘" 206 | ] 207 | }, 208 | "execution_count": 6, 209 | "metadata": {}, 210 | "output_type": "execute_result" 211 | } 212 | ], 213 | "source": [ 214 | "popcon.head()" 215 | ] 216 | }, 217 | { 218 | "attachments": {}, 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "Now suppose we want to look at all packages that aren't libraries. First, I want to get rid of everything with timestamp 0." 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 7, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | "before filter\n" 235 | ] 236 | }, 237 | { 238 | "data": { 239 | "text/html": [ 240 | "
\n", 247 | "shape: (3, 5)
atimectimepackage-namemru-programtag
datetime[μs]datetime[μs]strstrstr
1970-01-01 00:00:001970-01-01 00:00:00"librsync1""<NOFILES>"null
1970-01-01 00:00:001970-01-01 00:00:00"libindicator-messages-status-p…"<NOFILES>"null
1970-01-01 00:00:001970-01-01 00:00:00"libxfconf-0-2""<NOFILES>"null
" 248 | ], 249 | "text/plain": [ 250 | "shape: (3, 5)\n", 251 | "┌─────────────────────┬─────────────────────┬─────────────────────────────────┬─────────────┬──────┐\n", 252 | "│ atime ┆ ctime ┆ package-name ┆ mru-program ┆ tag │\n", 253 | "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", 254 | "│ datetime[μs] ┆ datetime[μs] ┆ str ┆ str ┆ str │\n", 255 | "╞═════════════════════╪═════════════════════╪═════════════════════════════════╪═════════════╪══════╡\n", 256 | "│ 1970-01-01 00:00:00 ┆ 1970-01-01 00:00:00 ┆ librsync1 ┆ ┆ null │\n", 257 | "│ 1970-01-01 00:00:00 ┆ 1970-01-01 00:00:00 ┆ libindicator-messages-status-p… ┆ ┆ null │\n", 258 | "│ 1970-01-01 00:00:00 ┆ 1970-01-01 00:00:00 ┆ libxfconf-0-2 ┆ ┆ null │\n", 259 | "└─────────────────────┴─────────────────────┴─────────────────────────────────┴─────────────┴──────┘" 260 | ] 261 | }, 262 | "metadata": {}, 263 | "output_type": "display_data" 264 | }, 265 | { 266 | "name": "stdout", 267 | "output_type": "stream", 268 | "text": [ 269 | "after filter\n" 270 | ] 271 | }, 272 | { 273 | "data": { 274 | "text/html": [ 275 | "
\n", 282 | "shape: (3, 5)
atimectimepackage-namemru-programtag
datetime[μs]datetime[μs]strstrstr
2008-11-20 14:38:202012-12-01 05:54:57"libfile-copy-recursive-perl""/usr/share/perl5/File/Copy/Rec…"<OLD>"
2010-02-22 14:59:212012-12-01 05:54:14"libfribidi0""/usr/bin/fribidi""<OLD>"
2010-03-06 14:44:182012-12-01 05:54:37"laptop-detect""/usr/sbin/laptop-detect""<OLD>"
" 283 | ], 284 | "text/plain": [ 285 | "shape: (3, 5)\n", 286 | "┌─────────────────────┬─────────────────────┬───────────────────────┬──────────────────────┬───────┐\n", 287 | "│ atime ┆ ctime ┆ package-name ┆ mru-program ┆ tag │\n", 288 | "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", 289 | "│ datetime[μs] ┆ datetime[μs] ┆ str ┆ str ┆ str │\n", 290 | "╞═════════════════════╪═════════════════════╪═══════════════════════╪══════════════════════╪═══════╡\n", 291 | "│ 2008-11-20 14:38:20 ┆ 2012-12-01 05:54:57 ┆ libfile-copy-recursiv ┆ /usr/share/perl5/Fil ┆ │\n", 292 | "│ ┆ ┆ e-perl ┆ e/Copy/Rec… ┆ │\n", 293 | "│ 2010-02-22 14:59:21 ┆ 2012-12-01 05:54:14 ┆ libfribidi0 ┆ /usr/bin/fribidi ┆ │\n", 294 | "│ 2010-03-06 14:44:18 ┆ 2012-12-01 05:54:37 ┆ laptop-detect ┆ /usr/sbin/laptop-det ┆ │\n", 295 | "│ ┆ ┆ ┆ ect ┆ │\n", 296 | "└─────────────────────┴─────────────────────┴───────────────────────┴──────────────────────┴───────┘" 297 | ] 298 | }, 299 | "metadata": {}, 300 | "output_type": "display_data" 301 | } 302 | ], 303 | "source": [ 304 | "print(\"before filter\")\n", 305 | "display(popcon.bottom_k(3, by='atime'))\n", 306 | "popcon = popcon.filter(\n", 307 | " pl.col('atime') > pl.datetime(1970, 1, 1)\n", 308 | ")\n", 309 | "print(\"after filter\")\n", 310 | "display(popcon.bottom_k(3, by='atime'))" 311 | ] 312 | }, 313 | { 314 | "attachments": {}, 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "Now we can use polars' `filter` and the `str` namespace to look at rows where the package name doesn't contain 'lib'." 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 8, 324 | "metadata": {}, 325 | "outputs": [ 326 | { 327 | "data": { 328 | "text/html": [ 329 | "
\n", 336 | "shape: (10, 5)
atimectimepackage-namemru-programtag
datetime[μs]datetime[μs]strstrstr
2013-12-17 04:55:392013-12-17 04:55:42"ddd""/usr/bin/ddd""<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:05:13"nodejs""/usr/bin/npm""<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:05:04"switchboard-plug-keyboard""/usr/lib/plugs/pantheon/keyboa…"<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:05:04"thunderbird-locale-en""/usr/lib/thunderbird-addons/ex…"<RECENT-CTIME>"
2013-12-16 20:08:272013-12-16 20:05:03"software-center""/usr/sbin/update-software-cent…"<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:05:00"samba-common-bin""/usr/bin/net.samba3""<RECENT-CTIME>"
2013-12-16 20:08:252013-12-16 20:04:59"postgresql-client-9.1""/usr/lib/postgresql/9.1/bin/ps…"<RECENT-CTIME>"
2013-12-16 20:08:232013-12-16 20:04:58"postgresql-9.1""/usr/lib/postgresql/9.1/bin/po…"<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:04:55"php5-dev""/usr/include/php5/main/snprint…"<RECENT-CTIME>"
2013-12-16 20:03:202013-12-16 20:04:54"php-pear""/usr/share/php/XML/Util.php""<RECENT-CTIME>"
" 337 | ], 338 | "text/plain": [ 339 | "shape: (10, 5)\n", 340 | "┌──────────────┬──────────────────────┬─────────────────────┬─────────────────────┬────────────────┐\n", 341 | "│ atime ┆ ctime ┆ package-name ┆ mru-program ┆ tag │\n", 342 | "│ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n", 343 | "│ datetime[μs] ┆ datetime[μs] ┆ str ┆ str ┆ str │\n", 344 | "╞══════════════╪══════════════════════╪═════════════════════╪═════════════════════╪════════════════╡\n", 345 | "│ 2013-12-17 ┆ 2013-12-17 04:55:42 ┆ ddd ┆ /usr/bin/ddd ┆ │\n", 346 | "│ 04:55:39 ┆ ┆ ┆ ┆ │\n", 347 | "│ 2013-12-16 ┆ 2013-12-16 20:05:13 ┆ nodejs ┆ /usr/bin/npm ┆ │\n", 348 | "│ 20:03:20 ┆ ┆ ┆ ┆ │\n", 349 | "│ 2013-12-16 ┆ 2013-12-16 20:05:04 ┆ switchboard-plug-ke ┆ /usr/lib/plugs/pant ┆ │\n", 350 | "│ 20:03:20 ┆ ┆ yboard ┆ heon/keyboa… ┆ │\n", 351 | "│ 2013-12-16 ┆ 2013-12-16 20:05:04 ┆ thunderbird-locale- ┆ /usr/lib/thunderbir ┆ │\n", 352 | "│ 20:03:20 ┆ ┆ en ┆ d-addons/ex… ┆ │\n", 353 | "│ 2013-12-16 ┆ 2013-12-16 20:05:03 ┆ software-center ┆ /usr/sbin/update-so ┆ │\n", 354 | "│ 20:08:27 ┆ ┆ ┆ ftware-cent… ┆ │\n", 355 | "│ 2013-12-16 ┆ 2013-12-16 20:05:00 ┆ samba-common-bin ┆ /usr/bin/net.samba3 ┆ │\n", 356 | "│ 20:03:20 ┆ ┆ ┆ ┆ │\n", 357 | "│ 2013-12-16 ┆ 2013-12-16 20:04:59 ┆ postgresql-client-9 ┆ /usr/lib/postgresql ┆ │\n", 358 | "│ 20:08:25 ┆ ┆ .1 ┆ /9.1/bin/ps… ┆ │\n", 359 | "│ 2013-12-16 ┆ 2013-12-16 20:04:58 ┆ postgresql-9.1 ┆ /usr/lib/postgresql ┆ │\n", 360 | "│ 20:08:23 ┆ ┆ ┆ /9.1/bin/po… ┆ │\n", 361 | "│ 2013-12-16 ┆ 2013-12-16 20:04:55 ┆ php5-dev ┆ /usr/include/php5/m ┆ │\n", 362 | "│ 20:03:20 ┆ ┆ ┆ ain/snprint… ┆ │\n", 363 | "│ 2013-12-16 ┆ 2013-12-16 20:04:54 ┆ php-pear ┆ /usr/share/php/XML/ ┆ │\n", 364 | "│ 20:03:20 ┆ ┆ ┆ Util.php ┆ │\n", 365 | "└──────────────┴──────────────────────┴─────────────────────┴─────────────────────┴────────────────┘" 366 | ] 367 | }, 368 | "execution_count": 8, 369 | "metadata": {}, 370 | "output_type": "execute_result" 371 | } 372 | ], 373 | "source": [ 374 | 
"nonlibraries = popcon.filter(\n", 375 | " ~pl.col('package-name').str.contains('lib')\n", 376 | ")\n", 377 | "nonlibraries.top_k(10, by='ctime')" 378 | ] 379 | } 380 | ], 381 | "metadata": { 382 | "kernelspec": { 383 | "display_name": "Python 3 (ipykernel)", 384 | "language": "python", 385 | "name": "python3" 386 | }, 387 | "language_info": { 388 | "codemirror_mode": { 389 | "name": "ipython", 390 | "version": 3 391 | }, 392 | "file_extension": ".py", 393 | "mimetype": "text/x-python", 394 | "name": "python", 395 | "nbconvert_exporter": "python", 396 | "pygments_lexer": "ipython3", 397 | "version": "3.12.4" 398 | } 399 | }, 400 | "nbformat": 4, 401 | "nbformat_minor": 4 402 | } 403 | -------------------------------------------------------------------------------- /cookbook/Chapter 9 - Loading data from SQL databases.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "1.6.0\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "import polars as pl\n", 18 | "from pathlib import Path\n", 19 | "import sqlite3\n", 20 | "\n", 21 | "print(pl.__version__)" 22 | ] 23 | }, 24 | { 25 | "attachments": {}, 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "# 9.1 Reading data from SQL databases" 30 | ] 31 | }, 32 | { 33 | "attachments": {}, 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "So far we've only talked about reading data from CSV files. That's a pretty common way to store data, but there are many others! Polars has a number of I/O methods at its disposal (see the [documentation](https://pola-rs.github.io/polars/py-polars/html/reference/io.html) for a full list of options). 
In this chapter we'll talk about reading data from SQL databases.\n", 38 | "\n", 39 | "You can read data from a SQL database using the `pl.read_database_uri` function. `read_database_uri` will automatically convert SQL column names to DataFrame column names.\n", 40 | "\n", 41 | "`read_database_uri` takes 2 arguments: a query statement and a connection URI. This is great because it means you can read from *any* kind of SQL database -- it doesn't matter if it's MySQL, SQLite, PostgreSQL, or something else. (Its sibling `pl.read_database` takes an existing connection object instead of a URI.)\n", 42 | "\n", 43 | "This example reads from a SQLite database, but any other database would work the same way." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "
\n", 61 | "shape: (3, 3)
iddate_timetemp
i64datetime[ns]f64
12012-01-01 00:00:00-1.8
22012-01-01 01:00:00-1.8
32012-01-01 02:00:00-1.8
" 62 | ], 63 | "text/plain": [ 64 | "shape: (3, 3)\n", 65 | "┌─────┬─────────────────────┬──────┐\n", 66 | "│ id ┆ date_time ┆ temp │\n", 67 | "│ --- ┆ --- ┆ --- │\n", 68 | "│ i64 ┆ datetime[ns] ┆ f64 │\n", 69 | "╞═════╪═════════════════════╪══════╡\n", 70 | "│ 1 ┆ 2012-01-01 00:00:00 ┆ -1.8 │\n", 71 | "│ 2 ┆ 2012-01-01 01:00:00 ┆ -1.8 │\n", 72 | "│ 3 ┆ 2012-01-01 02:00:00 ┆ -1.8 │\n", 73 | "└─────┴─────────────────────┴──────┘" 74 | ] 75 | }, 76 | "execution_count": 2, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "read_db_path = Path('../data/weather_2012.sqlite').absolute()\n", 83 | "read_uri = f\"sqlite:////{read_db_path}\"\n", 84 | "df = pl.read_database_uri(\"SELECT * from weather_2012 LIMIT 3\", read_uri)\n", 85 | "df" 86 | ] 87 | }, 88 | { 89 | "attachments": {}, 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "# 9.2 Writing to a SQLite database" 94 | ] 95 | }, 96 | { 97 | "attachments": {}, 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "Polars DataFrames have a `write_database` method, which creates a database table from a DataFrame. Let's use it to move our 2012 weather data into SQL."
102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 3, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "text/plain": [ 112 | "8784" 113 | ] 114 | }, 115 | "execution_count": 3, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "weather_df = pl.read_csv('../data/weather_2012.csv')\n", 122 | "write_db_path = Path('../data/test_db.sqlite').absolute()\n", 123 | "write_uri = f\"sqlite:////{write_db_path}\"\n", 124 | "\n", 125 | "con = sqlite3.connect(write_db_path)\n", 126 | "con.execute(\"DROP TABLE IF EXISTS weather_2012\")\n", 127 | "\n", 128 | "weather_df.write_database(\"weather_2012\", write_uri)" 129 | ] 130 | }, 131 | { 132 | "attachments": {}, 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "We can now read from the `weather_2012` table in `test_db.sqlite`, and we see that we get the same data back:" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 4, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/html": [ 147 | "
\n", 154 | "shape: (3, 8)
Date/TimeTemp (C)Dew Point Temp (C)Rel Hum (%)Wind Spd (km/h)Visibility (km)Stn Press (kPa)Weather
strf64f64i64i64f64f64str
"2012-01-01 00:00:00"-1.8-3.98648.0101.24"Fog"
"2012-01-01 01:00:00"-1.8-3.78748.0101.24"Fog"
"2012-01-01 02:00:00"-1.8-3.48974.0101.26"Freezing Drizzle,Fog"
" 155 | ], 156 | "text/plain": [ 157 | "shape: (3, 8)\n", 158 | "┌────────────┬──────────┬────────────┬────────────┬────────────┬───────────┬───────────┬───────────┐\n", 159 | "│ Date/Time ┆ Temp (C) ┆ Dew Point ┆ Rel Hum ┆ Wind Spd ┆ Visibilit ┆ Stn Press ┆ Weather │\n", 160 | "│ --- ┆ --- ┆ Temp (C) ┆ (%) ┆ (km/h) ┆ y (km) ┆ (kPa) ┆ --- │\n", 161 | "│ str ┆ f64 ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ str │\n", 162 | "│ ┆ ┆ f64 ┆ i64 ┆ i64 ┆ f64 ┆ f64 ┆ │\n", 163 | "╞════════════╪══════════╪════════════╪════════════╪════════════╪═══════════╪═══════════╪═══════════╡\n", 164 | "│ 2012-01-01 ┆ -1.8 ┆ -3.9 ┆ 86 ┆ 4 ┆ 8.0 ┆ 101.24 ┆ Fog │\n", 165 | "│ 00:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 166 | "│ 2012-01-01 ┆ -1.8 ┆ -3.7 ┆ 87 ┆ 4 ┆ 8.0 ┆ 101.24 ┆ Fog │\n", 167 | "│ 01:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 168 | "│ 2012-01-01 ┆ -1.8 ┆ -3.4 ┆ 89 ┆ 7 ┆ 4.0 ┆ 101.26 ┆ Freezing │\n", 169 | "│ 02:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ Drizzle,F │\n", 170 | "│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ og │\n", 171 | "└────────────┴──────────┴────────────┴────────────┴────────────┴───────────┴───────────┴───────────┘" 172 | ] 173 | }, 174 | "execution_count": 4, 175 | "metadata": {}, 176 | "output_type": "execute_result" 177 | } 178 | ], 179 | "source": [ 180 | "df = pl.read_database_uri(\"SELECT * from weather_2012 LIMIT 3\", write_uri)\n", 181 | "df" 182 | ] 183 | }, 184 | { 185 | "attachments": {}, 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "The nice thing about having your data in a database is that you can do arbitrary SQL queries. This is cool especially if you're more familiar with SQL. Here's an example of sorting by the Weather column:" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 5, 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "data": { 199 | "text/html": [ 200 | "
\n", 207 | "shape: (3, 8)
Date/TimeTemp (C)Dew Point Temp (C)Rel Hum (%)Wind Spd (km/h)Visibility (km)Stn Press (kPa)Weather
strf64f64i64i64f64f64str
"2012-01-03 19:00:00"-16.9-24.8502425.0101.74"Clear"
"2012-01-05 18:00:00"-7.1-14.4561125.0100.71"Clear"
"2012-01-05 19:00:00"-9.2-15.461725.0100.8"Clear"
" 208 | ], 209 | "text/plain": [ 210 | "shape: (3, 8)\n", 211 | "┌─────────────┬──────────┬─────────────┬─────────┬─────────────┬────────────┬────────────┬─────────┐\n", 212 | "│ Date/Time ┆ Temp (C) ┆ Dew Point ┆ Rel Hum ┆ Wind Spd ┆ Visibility ┆ Stn Press ┆ Weather │\n", 213 | "│ --- ┆ --- ┆ Temp (C) ┆ (%) ┆ (km/h) ┆ (km) ┆ (kPa) ┆ --- │\n", 214 | "│ str ┆ f64 ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ str │\n", 215 | "│ ┆ ┆ f64 ┆ i64 ┆ i64 ┆ f64 ┆ f64 ┆ │\n", 216 | "╞═════════════╪══════════╪═════════════╪═════════╪═════════════╪════════════╪════════════╪═════════╡\n", 217 | "│ 2012-01-03 ┆ -16.9 ┆ -24.8 ┆ 50 ┆ 24 ┆ 25.0 ┆ 101.74 ┆ Clear │\n", 218 | "│ 19:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 219 | "│ 2012-01-05 ┆ -7.1 ┆ -14.4 ┆ 56 ┆ 11 ┆ 25.0 ┆ 100.71 ┆ Clear │\n", 220 | "│ 18:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 221 | "│ 2012-01-05 ┆ -9.2 ┆ -15.4 ┆ 61 ┆ 7 ┆ 25.0 ┆ 100.8 ┆ Clear │\n", 222 | "│ 19:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", 223 | "└─────────────┴──────────┴─────────────┴─────────┴─────────────┴────────────┴────────────┴─────────┘" 224 | ] 225 | }, 226 | "execution_count": 5, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "df = pl.read_database_uri(\"SELECT * from weather_2012 ORDER BY Weather LIMIT 3\", write_uri)\n", 233 | "df" 234 | ] 235 | }, 236 | { 237 | "attachments": {}, 238 | "cell_type": "markdown", 239 | "metadata": {}, 240 | "source": [ 241 | "If you have a PostgreSQL database or MySQL database, reading from it works exactly the same way as reading from a SQLite database." 
242 | ] 243 | }, 244 | { 245 | "attachments": {}, 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "# 9.3 Connecting to other kinds of database" 250 | ] 251 | }, 252 | { 253 | "attachments": {}, 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "To connect to a MySQL database:\n", 258 | "\n", 259 | "*Note: For these to work, you will need a working MySQL / PostgreSQL database, with the correct localhost, database name, etc.*" 260 | ] 261 | }, 262 | { 263 | "cell_type": "raw", 264 | "metadata": {}, 265 | "source": [ 266 | "pl.read_database_uri(\"select * from MY_TABLE\", \"mysql://username:password@server:port/database\")" 267 | ] 268 | }, 269 | { 270 | "attachments": {}, 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "To connect to a PostgreSQL database:" 275 | ] 276 | }, 277 | { 278 | "cell_type": "raw", 279 | "metadata": {}, 280 | "source": [ 281 | "pl.read_database_uri(\"select * from MY_TABLE\", \"postgresql://username:password@server:port/database\")" 282 | ] 283 | }, 284 | { 285 | "attachments": {}, 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "