├── .gitignore ├── AnomalyDetection └── anomalydetection-interactivenotebook-main │ ├── 01-Prerequisites.md │ ├── 02-Dataflow_Pub_Sub_Notebook.md │ ├── Dataflow_Pub_Sub_Notebook.ipynb │ ├── Images │ ├── DataflowJob.png │ ├── Lab_Arch.png │ ├── OrgPolicy.png │ ├── agg-data-results.png │ ├── agg-schema.png │ ├── clonedRepoDisplayed.png │ ├── create_notebook.png │ ├── dataflowFailed.png │ ├── default_notebook_settings.png │ ├── fixed-window.png │ ├── git_clone_icon.png │ ├── navigate_to_workbench.png │ ├── plot.png │ ├── raw-data-results.png │ ├── raw-schema.png │ └── search_for_dataflow.png │ ├── PythonSimulator.ipynb │ └── README.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md └── RealTimePrediction └── realtime-intelligence-main ├── README.md ├── create_train_data.sh ├── images ├── architecture.png ├── batch.png ├── data_folder.png ├── dataflow_jobs1.png ├── dataflow_jobs2.png ├── flights_folder.png ├── ingestion_bq.png ├── ingestion_gcs.png ├── op_externalip.png ├── op_shieldedvm.png ├── prediction.png ├── pubsub.png ├── streaming.png ├── vertex_ai_deployment.png ├── vertex_ai_endpoint.png └── vertex_ai_training.png ├── install_packages.sh ├── predict_flights.sh ├── realtime ├── .gitignore ├── README.md ├── alldata_sample.json ├── call_predict.py ├── create_sample_input.sh ├── create_traindata.py ├── evaluation.ipynb ├── flightstxf │ ├── __init__.py │ └── flights_transforms.py ├── make_predictions.py ├── model.py ├── setup.py ├── simevents_sample.json └── train_on_vertex.py ├── setup_env.sh ├── simulate ├── airports.csv.gz └── simulate.py ├── simulate_flight.sh ├── stage_data.sh └── train_model.sh /.gitignore: -------------------------------------------------------------------------------- 1 | **/.DS_Store 2 | # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider 3 | # Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839 4 | 5 | # User-specific stuff 6 | .idea/**/workspace.xml 7 | .idea/**/tasks.xml 8 | .idea/**/usage.statistics.xml 9 | .idea/**/dictionaries 10 | .idea/**/shelf 11 | *ipynb_checkpoints 12 | -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/01-Prerequisites.md: -------------------------------------------------------------------------------- 1 | # About 2 | 3 | This module includes all prerequisites for this Lab
4 | 5 | [0. Prerequisites](#0-prerequisites)
6 | [1. Variables](#1-variables)
7 | [2. Enable APIs](#2-enable-api-services)
8 | [3. Create a VPC & a subnet](#3-create-vpc--subnet)
9 | [4. Create firewall rules](#4-create-firewall-rules)
10 | [5. Update organizational policies](#5-update-organizational-policies)
11 | [6. Service Account](#6-service-account)
12 | [7. Grant general IAM permissions](#7-grant-permissions-for-service-account-that-you-just-created)
13 | [8. Launch Apache Beam Notebook Instance](#8-launch-an-apache-beam-notebook-instance)
14 | [9. Next Step](#9-next-step)
15 | 16 | ## 0. Prerequisites 17 | 18 | #### 1. Create a project called "anomaly-detection".
19 | Note the project number and project ID. 20 | We will need these for the rest of the lab. 21 | 22 | Set the project back to "anomaly-detection" in the UI. 23 | 24 | ## 1. Variables 25 | 26 | We will use these throughout the lab.<br>
27 | Run the below in Cloud Shell, scoped to the new project you created: 28 | ``` 29 | DEST_PROJECT=`gcloud config get-value project` 30 | VPC=$DEST_PROJECT"-vpc" 31 | SUBNET=$VPC"-subnet" 32 | REGION=us-central1 33 | VPC_FQN=projects/$DEST_PROJECT/global/networks/$VPC 34 | 35 | SERVICE_ACCOUNT="example-name" 36 | SERVICE_ACCOUNT_FQN=$SERVICE_ACCOUNT@$DEST_PROJECT.iam.gserviceaccount.com 37 | YOUR_IP=xx.xxx.xx.xx 38 | 39 | ``` 40 | ## 2. Enable API Services 41 | 42 | From Cloud Shell, run the below: 43 | ``` 44 | gcloud services enable compute.googleapis.com 45 | gcloud services enable aiplatform.googleapis.com 46 | gcloud services enable dataflow.googleapis.com 47 | gcloud services enable datastream.googleapis.com 48 | gcloud services enable datacatalog.googleapis.com 49 | gcloud services enable bigquery.googleapis.com 50 | gcloud services enable composer.googleapis.com 51 | gcloud services enable sourcerepo.googleapis.com 52 | gcloud services enable cloudresourcemanager.googleapis.com 53 | ``` 54 | 55 | ## 3. Create VPC & Subnet 56 | 57 | Run the below from Cloud Shell. 58 | 59 | ``` 60 | gcloud compute networks create $VPC \ 61 | --subnet-mode=custom \ 62 | --bgp-routing-mode=regional \ 63 | --mtu=1500 64 | ``` 65 | 66 | ``` 67 | gcloud compute networks subnets create $SUBNET \ 68 | --network=$VPC \ 69 | --range=10.0.0.0/24 \ 70 | --region=$REGION \ 71 | --enable-private-ip-google-access 72 | ``` 73 | ## 4. Create Firewall Rules 74 | 75 | 4.1) Intra-VPC, allow all communication 76 | 77 | ``` 78 | gcloud compute firewall-rules create allow-all-intra-vpc --project=$DEST_PROJECT --network=$VPC_FQN \ 79 | --description="Allows connection from any source to any instance on the network using custom protocols." --direction=INGRESS \ 80 | --priority=65534 --source-ranges=10.0.0.0/20 --action=ALLOW --rules=all 81 | ``` 82 | 83 | 4.2) Allow-SSH 84 | 85 | ``` 86 | gcloud compute firewall-rules create allow-all-ssh --project=$DEST_PROJECT --network=$VPC_FQN \ 87 | --description="Allows TCP connections from any source to any instance on the network using port 22." --direction=INGRESS \ 88 | --priority=65534 --source-ranges=0.0.0.0/0 --action=ALLOW --rules=tcp:22 89 | ``` 90 | 91 | 4.3) Allow Ingress 92 | 93 | ``` 94 | gcloud compute --project=$DEST_PROJECT firewall-rules create allow-all-to-my-machine --direction=INGRESS --priority=1000 --network=$VPC \ 95 | --action=ALLOW --rules=all --source-ranges=$YOUR_IP 96 | 97 | ``` 98 | 4.4) Allow your computer to access Node-RED from the browser on port 1880. You need this rule to open Node-RED from a browser on your computer. 99 | 100 | ``` 101 | gcloud compute firewall-rules create allow-node-red \ 102 | --project=$DEST_PROJECT \ 103 | --network=$VPC_FQN --description="Allows TCP connections to any instance on the network using port 1880." \ 104 | --direction=INGRESS \ 105 | --priority=1010 \ 106 | --source-ranges=$YOUR_IP \ 107 | --action=ALLOW \ 108 | --rules=tcp:1880 109 | 110 | ``` 111 | 112 | ## 5. Update Organizational Policies 113 | 114 | In the Google Cloud Console, navigate to IAM -> Organization Policies. 115 | 116 | Turn off the following org policies: constraints/compute.vmExternalIpAccess and constraints/iam.disableServiceAccountKeyCreation 117 | 118 | ![OrgPolicy](Images/OrgPolicy.png) 119 | 120 | ## 6. Service Account
121 | 122 | Run the following in Cloud Shell: 123 | 124 | ``` 125 | gcloud iam service-accounts create ${SERVICE_ACCOUNT} \ 126 | --description="User Managed Service Account" \ 127 | --display-name=$SERVICE_ACCOUNT 128 | ``` 129 | 130 | ## 7. Grant Permissions for the Service Account that you just created 131 | 132 | Run the following to grant all the permissions the service account needs to run this lab. (An optional read-only check of these bindings is shown at the end of this page.) 133 | 134 | ``` 135 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 136 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 137 | --role=roles/iam.serviceAccountTokenCreator 138 | ``` 139 | ``` 140 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 141 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 142 | --role=roles/pubsub.editor 143 | ``` 144 | ``` 145 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 146 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 147 | --role=roles/pubsub.publisher 148 | ``` 149 | ``` 150 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 151 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 152 | --role=roles/bigquery.admin 153 | ``` 154 | ``` 155 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 156 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 157 | --role=roles/bigquery.dataEditor 158 | ``` 159 | ``` 160 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 161 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 162 | --role=roles/dataflow.developer 163 | ``` 164 | ``` 165 | gcloud projects add-iam-policy-binding ${DEST_PROJECT} \ 166 | --member=serviceAccount:${SERVICE_ACCOUNT_FQN} \ 167 | --role=roles/dataflow.worker 168 | ``` 169 | ## 8. Launch an Apache Beam notebook instance 170 | 171 | Go to the Google Cloud Console -> Dataflow -> Workbench. 172 | 173 | Make sure that you are on the User-managed notebooks tab. 174 | 175 | In the toolbar, click New notebook. 176 | 177 | Select Apache Beam > Without GPUs. 178 | 179 | On the New notebook page, select the subnetwork you created in Step 3 for the notebook VM. 180 | 181 | Click Create. 182 | 183 | Vertex AI Workbench creates a new Apache Beam notebook instance. When the link becomes active, click Open JupyterLab. 184 | 185 | ## 9. Next Step 186 | 187 | [Data Generation](02-Dataflow_Pub_Sub_Notebook.md)<br>
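Optional: before moving on, you can sanity-check that the role bindings from section 7 landed on the service account. This is a read-only command built only from the variables defined in section 1; it lists the roles currently bound to the service account:

```
gcloud projects get-iam-policy $DEST_PROJECT \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${SERVICE_ACCOUNT_FQN}" \
  --format="table(bindings.role)"
```

You should see the seven roles granted above; if any are missing, rerun the corresponding `add-iam-policy-binding` command.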
-------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/02-Dataflow_Pub_Sub_Notebook.md: -------------------------------------------------------------------------------- 1 | # Real Time Visibility: Anomaly Detection 2 | 3 | ### Overview 4 | 5 | Anomaly Detection is a demo that shows the end-to-end architecture of a streaming pipeline, from raw data ingestion to transforming the data with Dataflow - leveraging Dataflow notebooks, setting up an Apache Beam pipeline, and windowing the data - and finally landing the data in BigQuery for further analysis. 6 | 7 | ### Architecture 8 | ![Lab architecture](Images/Lab_Arch.png) 9 | 10 | ### Getting Started 11 | 12 | Within the GCP Console, type `dataflow` at the top of the search bar. 13 | 14 | ![Search for dataflow](Images/search_for_dataflow.png) 15 | 16 | 17 | ![Navigate to workbench](Images/navigate_to_workbench.png) 18 | 19 | 20 | You will see an existing Notebook called **demo-notebook**. This is a default notebook that has some examples in it. We will leave the default notebook alone and create a new notebook. 21 | 22 | Click on the **User-Managed Notebooks** tab and click **New Notebook** 23 | 24 | ![User managed notebook](Images/create_notebook.png) 25 | 26 | Select Apache Beam > Without GPUs 27 | 28 | Leave the default settings as is and click **CREATE** 29 | 30 | ![Default notebook settings](Images/default_notebook_settings.png) 31 | 32 | You can click *Refresh* to see the notebook being provisioned. 33 | 34 | Vertex AI Workbench will create a new Apache Beam notebook instance. Once it's available, click on **OPEN JUPYTERLAB** 35 | 36 | Once the Notebook is launched, you will see some default files and folders that come pre-installed when you launch a new Notebook 37 | 38 | Next, we will clone a repo in order to get the files we need. Click on the **clone repo** icon: 39 | 40 | ![git_clone_icon](Images/git_clone_icon.png) 41 | 42 | Enter the below HTTPS address to clone the repo. This is a public repo containing the files we will use. (A terminal-based alternative is shown at the end of this page.) 43 | 44 | ```shell 45 | https://github.com/seidou-1/GoogleCloud.git 46 | ``` 47 | 48 | You can leave `Include submodules` **checked** and `Download the repository` **unchecked** 49 | 50 | 51 | Once it's cloned, you'll see a folder called **GoogleCloud** 52 | 53 | 54 | ![clonedRepoDisplayed](Images/clonedRepoDisplayed.png) 55 | 56 | Click into that folder and continue until you reach the subdirectory `anomalydetection-interactivenotebook-main` 57 | 58 | Double click on the file `Dataflow_Pub_Sub_Notebook.ipynb` and follow the instructions in the Notebook. 
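If you prefer a shell to the clone icon, the same clone can be done from a JupyterLab terminal (File > New > Terminal); afterwards, navigate to the same `anomalydetection-interactivenotebook-main` subdirectory described above:

```shell
git clone https://github.com/seidou-1/GoogleCloud.git
```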
59 | 60 | -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Dataflow_Pub_Sub_Notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# How to run the examples on Dataflow\n", 8 | "\n", 9 | "This notebook illustrates a pipeline that streams raw data from Pub/Sub to BigQuery using the Dataflow runner and the interactive runner.\n", 10 | "\n", 11 | "This pipeline processes the raw data from Pub/Sub and loads it into BigQuery; in parallel, it also windows the raw data (using fixed windowing) into 5-second windows and calculates the mean of the sensor values on the windowed data.\n", 12 | "\n", 13 | "\n", 14 | "Note that running this example incurs a small [charge](https://cloud.google.com/dataflow/pricing) from Dataflow.\n", 15 | "\n", 16 | "Let's make sure the dependencies are installed. This allows us to load the BigQuery query results into a dataframe to plot the anomalies.\n", 17 | "\n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "pip install db-dtypes" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "After you do `pip install db-dtypes`, restart the kernel by clicking on the reload icon up top near the navigation menu. Once restarted, proceed with the rest of the steps below." 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | " Let's make sure the Dataflow API is enabled. This [allows](https://cloud.google.com/apis/docs/getting-started#enabling_apis) your project to access the Dataflow service:" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "!gcloud services enable dataflow.googleapis.com\n", 50 | "# Note: it can take a minute or two for the newly enabled API to propagate." 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### 1. Start with necessary imports\n" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "import re\n", 67 | "import json\n", 68 | "from datetime import datetime\n", 69 | "import apache_beam as beam\n", 70 | "import random\n", 71 | "import time\n", 72 | "from google.cloud import pubsub_v1,bigquery\n", 73 | "from apache_beam.options import pipeline_options\n", 74 | "from apache_beam.options.pipeline_options import GoogleCloudOptions\n", 75 | "from apache_beam.runners import DataflowRunner\n", 76 | "from apache_beam.runners.interactive import interactive_runner\n", 77 | "from apache_beam import DoFn, GroupByKey, io, ParDo, Pipeline, PTransform, WindowInto, WithKeys,Create,Map , CombineGlobally ,dataframe\n", 78 | "import apache_beam.runners.interactive.interactive_beam as ib\n", 79 | "import google.auth\n", 80 | "import matplotlib.pyplot as plt\n", 81 | "\n", 82 | "publisher = pubsub_v1.PublisherClient() #Pubsub publisher client\n", 83 | "subscriber = pubsub_v1.SubscriberClient() #Pubsub subscriber client\n", 84 | "client = bigquery.Client() #bigquery client" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "### 2. Set the variables. These variables will be referenced in later sections\n"
92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "dest_project=!gcloud config get-value project\n", 101 | "project_id=dest_project[1]\n", 102 | "print(project_id)\n", 103 | "\n", 104 | "pubsub_topic = project_id + \"-\" + \"topic\" \n", 105 | "pubsub_subscription = pubsub_topic + \"-\" + \"sub\"\n", 106 | "pubsub_topic_path = publisher.topic_path(project_id, pubsub_topic)\n", 107 | "pubsub_subscription_path = subscriber.subscription_path(project_id, pubsub_subscription)\n", 108 | "\n", 109 | "bq_dataset = \"anomaly_detection_demo\"\n", 110 | "bigquery_agg_schema = \"sensorID:STRING,sensorValue:FLOAT,windowStart:DATETIME,windowEnd:DATETIME\"\n", 111 | "bigquery_raw_schema = \"sensorID:STRING,timeStamp:DATETIME,sensorValue:FLOAT\"\n", 112 | "bigquery_raw_table = bq_dataset + \".anomaly_raw_table\" \n", 113 | "bigquery_agg_table = bq_dataset + \".anomaly_windowed_table\" \n", 114 | "region = \"us-central1\"\n", 115 | "bucket_name = project_id " 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### 3: Create Pub/Sub topic" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "!gcloud pubsub topics create {pubsub_topic}" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "If you get an error that says `Run client channel backup poller: UNKNOWN:pollset_`, don't be alarmed; it won't affect the job. It is just a formatting issue." 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "### 4: Create Pub/Sub subscription" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "!gcloud pubsub subscriptions create {pubsub_subscription} --topic={pubsub_topic}" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "### 5. Create BigQuery Dataset\n" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": {}, 168 | "outputs": [], 169 | "source": [ 170 | "!bq --location={region} mk --dataset {project_id}:{bq_dataset}" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "### 6. Create BigQuery Tables\n", 178 | "\n", 179 | "raw BigQuery schema\n", 180 | "\n", 181 | "![raw-schema](Images/raw-schema.png)\n", 182 | "\n", 183 | "aggregated BigQuery schema\n", 184 | "![agg-schema](Images/agg-schema.png)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "!bq mk --schema {bigquery_raw_schema} -t {bigquery_raw_table}\n", 194 | "!bq mk --schema {bigquery_agg_schema} -t {bigquery_agg_table}" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "If you get an error that says `Run client channel backup poller: UNKNOWN:pollset_`, don't be alarmed; it won't affect the job. It is just a formatting issue." 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "### 7. Create GCS Bucket "
209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "!gsutil mb -c standard -l {region} gs://{bucket_name}" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "### 8. IMPORTANT! Open the GCS bucket from the console and create a folder called dataflow.\n", 225 | "The path should be gs://<project_id>/dataflow\n" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "### 9. Set the pipeline options" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": null, 238 | "metadata": {}, 239 | "outputs": [], 240 | "source": [ 241 | "# Setting up the Apache Beam pipeline options.\n", 242 | "options = pipeline_options.PipelineOptions(flags={})\n", 243 | "\n", 244 | "# Sets the pipeline mode to streaming, so we can stream the data from PubSub.\n", 245 | "options.view_as(pipeline_options.StandardOptions).streaming = True\n", 246 | "\n", 247 | "# Sets the project to the default project in your current Google Cloud environment.\n", 248 | "options.view_as(GoogleCloudOptions).project = project_id\n", 249 | "\n", 250 | "# Sets the Google Cloud Region in which Cloud Dataflow runs.\n", 251 | "options.view_as(GoogleCloudOptions).region = region" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "### 10. Create the functions to format the raw data and the processed data" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "# to add window begin datetime and end datetime to the aggregated PCollections.\n", 268 | "class FormatDoFn(beam.DoFn):\n", 269 | " def process(self, element, window=beam.DoFn.WindowParam):\n", 270 | " from datetime import datetime\n", 271 | " window_start = datetime.fromtimestamp(window.start)\n", 272 | " window_end = datetime.fromtimestamp(window.end)\n", 273 | " return [{\n", 274 | " 'sensorID': element[0],\n", 275 | " 'sensorValue': element[1],\n", 276 | " 'windowStart': window_start,\n", 277 | " 'windowEnd': window_end\n", 278 | " }] " 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": null, 284 | "metadata": {}, 285 | "outputs": [], 286 | "source": [ 287 | "# to get the raw PCollections\n", 288 | "class ProcessDoFn(beam.DoFn):\n", 289 | " def process(self, element):\n", 290 | " yield element " 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "### 11. Construct the pipeline \n", 298 | "This step reads from the Pub/Sub subscription and does some processing. It turns the raw data into raw PCollections and the aggregated windowed data into aggregated PCollections.\n", 299 | "\n", 300 | "With the aggregated window, the pipeline will read the data from the Pub/Sub topic and group the data into 5-second intervals. Lastly, it will calculate the mean of the sensor values for each window."
301 | ] 302 | }, 303 | { 304 | "cell_type": "markdown", 305 | "metadata": {}, 306 | "source": [ 307 | "![fixed-window](Images/fixed-window.png)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": null, 313 | "metadata": {}, 314 | "outputs": [], 315 | "source": [ 316 | "# Set pipeline options \n", 317 | "p = beam.Pipeline(interactive_runner.InteractiveRunner(), options=options)\n", 318 | "\n", 319 | "# Pub/Sub => mapped (PCollections)\n", 320 | "mapped = (p | \"ReadFromPubSub\" >> beam.io.gcp.pubsub.ReadFromPubSub(subscription=pubsub_subscription_path)\n", 321 | " | \"Json Loads\" >> Map(json.loads))\n", 322 | "\n", 323 | "# mapped (input PCollections) => raw_data (output PCollections)\n", 324 | "raw_data = (mapped \n", 325 | " | 'Format' >> beam.ParDo(ProcessDoFn()))\n", 326 | "\n", 327 | "# mapped (input PCollections) => agg_data (output PCollections) \n", 328 | "agg_data = (mapped \n", 329 | " | \"Map Keys\" >> Map(lambda x: (x[\"SensorID\"],x[\"SensorValue\"]))\n", 330 | " | \"ApplyFixedWindow\" >> beam.WindowInto(beam.window.FixedWindows(5))\n", 331 | " | \"Total Per Key\" >> beam.combiners.Mean.PerKey()\n", 332 | " | 'Final Format' >> beam.ParDo(FormatDoFn())) " 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "Note that the `Pipeline` is constructed by an `InteractiveRunner`, so you can use operations such as `ib.collect` or `ib.show`.\n", 340 | "### Important \n", 341 | "Run steps 1-4 in the simulator script (PythonSimulator.ipynb) in a separate tab -- this simulates the data and writes it to the Pub/Sub topic to test the interactive runner (one message every 0.1 seconds until it reaches 100 messages).\n", 342 | "\n", 343 | "Remember to **only** run steps 1-4 for now. We will come back to this script to run step 5 later.\n", 344 | " " 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": {}, 351 | "outputs": [], 352 | "source": [ 353 | "ib.show(agg_data)" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "### 12. Dataflow Additions\n", 361 | "\n", 362 | "Now, for something a bit different. Because Dataflow executes in the cloud, you need to output to a cloud sink. In this case, you are loading the transformed data into BigQuery, with Cloud Storage used for staging and temporary files.\n", 363 | "\n", 364 | "First, set up the `PipelineOptions` to specify to the Dataflow service the Google Cloud project, the region to run the Dataflow Job, and the SDK location." 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": null, 370 | "metadata": {}, 371 | "outputs": [], 372 | "source": [ 373 | "# IMPORTANT! Adjust the following to choose a Cloud Storage location. This default matches the bucket and folder created in steps 7-8.\n", 374 | "dataflow_gcs_location = f\"gs://{bucket_name}/dataflow\"\n", 375 | "\n", 376 | "# Dataflow Staging Location. This location is used to stage the Dataflow Pipeline and SDK binary.\n", 377 | "# options.view_as(GoogleCloudOptions).staging_location = dataflow_gcs_location\n", 378 | "\n", 379 | "# Sets the project to the default project in your current Google Cloud environment.\n", 380 | "_, options.view_as(GoogleCloudOptions).project = google.auth.default()\n", 381 | "\n", 382 | "# Dataflow Temp Location. This location is used to store temporary files or intermediate results before finally outputting to the sink.\n", 383 | "options.view_as(GoogleCloudOptions).temp_location = '%s/temp' % dataflow_gcs_location\n", 384 | "\n", 385 | "# Dataflow job name, used when the pipeline runs on the Dataflow runner.\n",
386 | "options.view_as(GoogleCloudOptions).job_name = project_id\n" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "# Specifying the bigquery table to write `raw_data` to,\n", 396 | "# based on the `bigquery_raw_table` variable set earlier.\n", 397 | "(raw_data | 'Write raw data to Bigquery' \n", 398 | " >> beam.io.WriteToBigQuery(\n", 399 | " bigquery_raw_table,\n", 400 | " write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))\n", 401 | "# Specifying the bigquery table to write `agg_data` to,\n", 402 | "# based on the `bigquery_agg_table` variable set earlier.\n", 403 | "(agg_data | 'Write windowed aggregated data to Bigquery' \n", 404 | " >> beam.io.WriteToBigQuery(\n", 405 | " bigquery_agg_table,\n", 406 | " write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": null, 412 | "metadata": {}, 413 | "outputs": [], 414 | "source": [ 415 | "# IMPORTANT! Ensure that the graph is correct before sending it out to Dataflow.\n", 416 | "# Because this is a notebook environment, unintended additions to the graph may have occurred when rerunning cells. \n", 417 | "ib.show_graph(p)" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "### 13. Running the pipeline\n", 425 | "\n", 426 | "Now you are ready to run the pipeline on Dataflow. `run_pipeline()` runs the pipeline and returns a pipeline result object." 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "pipeline_result = DataflowRunner().run_pipeline(p, options=options)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "### Important \n", 443 | "![dataflowStatus](Images/dataflowFailed.png)\n", 444 | "\n", 445 | "Before moving forward, check the Dataflow job to see if it's running (Hamburger menu->Dataflow->Jobs). If the status shows as `failed`, **rerun** the above cell `pipeline_result = DataflowRunner().run_pipeline(p, options=options)` one more time. This happens when the Dataflow API is not fully enabled. It takes a minute or so for the API to propagate fully.\n", 446 | "\n", 447 | "\n" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": {}, 453 | "source": [ 454 | "Using the `pipeline_result` handle, the following code builds a link to the Google Cloud Console web page that shows you details of the Dataflow job you just started:" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": null, 460 | "metadata": {}, 461 | "outputs": [], 462 | "source": [ 463 | "from IPython.core.display import display, HTML\n", 464 | "url = ('https://console.cloud.google.com/dataflow/jobs/%s/%s?project=%s' % \n", 465 | " (pipeline_result._job.location, pipeline_result._job.id, pipeline_result._job.projectId))\n", 466 | "display(HTML('<a href=\"%s\" target=\"_blank\">Click here for the details of your Dataflow job!</a>'
% url))\n" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "Dataflow job\n", 474 | "![dataflow-job](Images/DataflowJob.png)" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "### Important \n", 482 | "Run step 5 in the simulator script (PythonSimulator.ipynb) that is in a separate tab -- this simulates the data and writes it to the Pub/Sub topic to test the Dataflow runner (one message every 0.1 seconds until it reaches 5000 messages). " 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "### 14. Checking the raw table results (note: it will take ~90 sec for the initial data to appear in the table due to Dataflow warmup time)\n", 490 | "raw table results\n", 491 | "![raw-table-results](Images/raw-data-results.png)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": null, 497 | "metadata": {}, 498 | "outputs": [], 499 | "source": [ 500 | "#check the raw data in the BQ raw table\n", 501 | "sql = 'SELECT * FROM `{}` '.format(bigquery_raw_table)\n", 502 | "query_job = client.query(sql) # API request\n", 503 | "raw_df = query_job.to_dataframe()\n", 504 | "raw_df" 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": {}, 510 | "source": [ 511 | "### 15. Checking the agg table results\n", 512 | "agg table results\n", 513 | "\n", 514 | "![agg-table-results](Images/agg-data-results.png)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": null, 520 | "metadata": {}, 521 | "outputs": [], 522 | "source": [ 523 | "#check the agg data in the BQ agg table\n", 524 | "sql = 'SELECT sensorID , case when sensorValue >= 200 then \"Anomaly\" else \"Normal\" end as type, sensorValue,row_number() over (order by windowStart) as cycle FROM `{}` '.format(bigquery_agg_table)\n", 525 | "query_job = client.query(sql) # API request\n", 526 | "agg_df = query_job.to_dataframe()\n", 527 | "agg_df" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "### 16. Plot the results in a simple scatterplot chart \n", 535 | "\n", 536 | "The chart will display anomalies in red and normal readings in green.\n", 537 | "![plot](Images/plot.png)" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "c=['green' if g=='Normal' else 'red' for g in agg_df['type']]\n", 547 | "agg_df.plot(\n", 548 | " kind=\"scatter\",\n", 549 | " x=\"cycle\",\n", 550 | " y=\"sensorValue\" , c = c, s = 150,\n", 551 | " figsize=(20, 10) \n", 552 | " )\n", 553 | "plt.axhline(y=200, color='black', linestyle='-',linewidth=3)\n" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "# Congratulations!!!\n", 561 | "End of lab\n" 562 | ] 563 | } 564 | ], 565 | "metadata": { 566 | "kernelspec": { 567 | "display_name": "01. 
Apache Beam 2.45.0 for Python 3", 568 | "language": "python", 569 | "name": "01-apache-beam-2.45.0" 570 | }, 571 | "language_info": { 572 | "codemirror_mode": { 573 | "name": "ipython", 574 | "version": 3 575 | }, 576 | "file_extension": ".py", 577 | "mimetype": "text/x-python", 578 | "name": "python", 579 | "nbconvert_exporter": "python", 580 | "pygments_lexer": "ipython3", 581 | "version": "3.8.10" 582 | } 583 | }, 584 | "nbformat": 4, 585 | "nbformat_minor": 4 586 | } 587 | -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/DataflowJob.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/DataflowJob.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/Lab_Arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/Lab_Arch.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/OrgPolicy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/OrgPolicy.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/agg-data-results.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/agg-data-results.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/agg-schema.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/agg-schema.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/clonedRepoDisplayed.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/clonedRepoDisplayed.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/create_notebook.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/create_notebook.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/dataflowFailed.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/dataflowFailed.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/default_notebook_settings.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/default_notebook_settings.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/fixed-window.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/fixed-window.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/git_clone_icon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/git_clone_icon.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/navigate_to_workbench.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/navigate_to_workbench.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/plot.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/raw-data-results.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/raw-data-results.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/raw-schema.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/raw-schema.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/Images/search_for_dataflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/AnomalyDetection/anomalydetection-interactivenotebook-main/Images/search_for_dataflow.png -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/PythonSimulator.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "e974ebb8", 6 | "metadata": {}, 7 | "source": [ 8 | "Copyright 2023 Google LLC\n", 9 | "\n", 10 | "Licensed under the Apache License, Version 2.0 (the \"License\");\n", 11 | "you may not use this file except in compliance with the License.\n", 12 | "You may obtain a copy of the License at\n", 13 | "\n", 14 | "&nbsp;&nbsp;&nbsp;&nbsp;https://www.apache.org/licenses/LICENSE-2.0\n", 15 | "\n", 16 | "Unless required by applicable law or agreed to in writing, software\n", 17 | "distributed under the License is distributed on an \"AS IS\" BASIS,\n", 18 | "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 19 | "See the License for the specific language governing permissions and\n", 20 | "limitations under the License." 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "2d419814-3f29-4a93-8fc9-a26eac2e7439", 26 | "metadata": {}, 27 | "source": [ 28 | "### 1. Start with necessary imports" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "id": "3bcb5a61-b6f2-4c0b-8515-fc38666adbe1", 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "from google.cloud import pubsub_v1\n", 39 | "import json\n", 40 | "from datetime import datetime\n", 41 | "import random\n", 42 | "import time\n", 43 | "publisher = pubsub_v1.PublisherClient()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "id": "29290d31-9ffb-4cbe-90e0-e508f2a49eae", 49 | "metadata": {}, 50 | "source": [ 51 | "### 2. Set the variables. These variables will be referenced in later sections" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": null, 57 | "id": "3f189dbc-3ba8-41c9-a37c-1fc37a6256cd", 58 | "metadata": {}, 59 | "outputs": [], 60 | "source": [ 61 | "dest_project=!gcloud config get-value project\n", 62 | "project_id=dest_project[1]\n", 63 | "print(project_id)\n", 64 | "\n", 65 | "pubsub_topic = project_id + \"-\" + \"topic\" \n", 66 | "pubsub_topic_path = publisher.topic_path(project_id, pubsub_topic)\n" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "id": "cceca00c-2ca3-41f2-a80a-114010158947", 72 | "metadata": {}, 73 | "source": [ 74 | "### 3. Create the function to simulate the data"
75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "id": "dc71111e-778e-4df2-8e9c-96427df99fde", 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "def simulator(number): \n", 85 | " \n", 86 | " i = 0 \n", 87 | " while i < number:\n", 88 | " json_object = json.dumps({\"SensorID\":\"75c18751-7a94-453e-86f5-67be2b0c8fd4\",'Timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3],\"SensorValue\":random.uniform(100, 300)})\n", 89 | " data = json_object.encode(\"utf-8\")\n", 90 | " future = publisher.publish(pubsub_topic_path, data)\n", 91 | " time.sleep(0.1) # publish one message every 0.1 seconds\n", 92 | " i= i + 1\n", 93 | "\n" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "id": "07ab8e92-e4a4-4481-84bd-32b5a8881579", 99 | "metadata": {}, 100 | "source": [ 101 | "### 4. Run the simulator to test the interactive runner" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "id": "3b84d710-7d27-4512-ae1e-5d3a618009fb", 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "#interactive test data simulation\n", 112 | "if __name__ == \"__main__\":\n", 113 | " simulator(100)\n" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "id": "d7cfb1b2-e6ab-4da0-a584-599bbb0458a7", 119 | "metadata": {}, 120 | "source": [ 121 | "### 5. Run the simulator to test the Dataflow runner" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "id": "d4303433-773d-43f3-87cb-4a2c6fee2346", 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "#dataflow test data simulation\n", 132 | "if __name__ == \"__main__\":\n", 133 | " simulator(5000)" 134 | ] 135 | } 136 | ], 137 | "metadata": { 138 | "kernelspec": { 139 | "display_name": "01. Apache Beam 2.45.0 for Python 3", 140 | "language": "python", 141 | "name": "01-apache-beam-2.45.0" 142 | }, 143 | "language_info": { 144 | "codemirror_mode": { 145 | "name": "ipython", 146 | "version": 3 147 | }, 148 | "file_extension": ".py", 149 | "mimetype": "text/x-python", 150 | "name": "python", 151 | "nbconvert_exporter": "python", 152 | "pygments_lexer": "ipython3", 153 | "version": "3.8.10" 154 | } 155 | }, 156 | "nbformat": 4, 157 | "nbformat_minor": 5 158 | } 159 | -------------------------------------------------------------------------------- /AnomalyDetection/anomalydetection-interactivenotebook-main/README.md: -------------------------------------------------------------------------------- 1 | Copyright 2023 Google LLC 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 5 | You may obtain a copy of the License at 6 | 7 | &nbsp;&nbsp;&nbsp;&nbsp;https://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 
14 | 15 | # Real Time Visibility - Anomaly Detection 16 | 17 | Demo asset for the Anomaly Detection use case in the Real Time Intelligence go-to-market sales play 18 | 19 | ## About this Lab 20 | 21 | Anomaly Detection is a demo that shows the end-to-end architecture of a streaming pipeline, from raw data ingestion to transforming the data with Dataflow - leveraging Dataflow notebooks, setting up an Apache Beam pipeline, and windowing the data - and finally landing the data in BigQuery for further analysis. Below you will find an architecture diagram of the overall end-to-end solution. 22 | 23 | ## Architecture 24 | 25 | ![Architecture](Images/Lab_Arch.png) 26 | 27 | ## Lab Modules 28 | 29 | This repo is organized across various modules: 30 | 31 | [1. Prerequisites - provisioning, configuring, securing](01-Prerequisites.md)<br>
32 | 33 | [2. Data Integration Pipeline](02-Dataflow_Pub_Sub_Notebook.md)
34 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as 6 | contributors and maintainers pledge to making participation in our project and 7 | our community a harassment-free experience for everyone, regardless of age, body 8 | size, disability, ethnicity, gender identity and expression, level of 9 | experience, education, socio-economic status, nationality, personal appearance, 10 | race, religion, or sexual identity and orientation. 11 | 12 | ## Our Standards 13 | 14 | Examples of behavior that contributes to creating a positive environment 15 | include: 16 | 17 | * Using welcoming and inclusive language 18 | * Being respectful of differing viewpoints and experiences 19 | * Gracefully accepting constructive criticism 20 | * Focusing on what is best for the community 21 | * Showing empathy towards other community members 22 | 23 | Examples of unacceptable behavior by participants include: 24 | 25 | * The use of sexualized language or imagery and unwelcome sexual attention or 26 | advances 27 | * Trolling, insulting/derogatory comments, and personal or political attacks 28 | * Public or private harassment 29 | * Publishing others' private information, such as a physical or electronic 30 | address, without explicit permission 31 | * Other conduct which could reasonably be considered inappropriate in a 32 | professional setting 33 | 34 | ## Our Responsibilities 35 | 36 | Project maintainers are responsible for clarifying the standards of acceptable 37 | behavior and are expected to take appropriate and fair corrective action in 38 | response to any instances of unacceptable behavior. 39 | 40 | Project maintainers have the right and responsibility to remove, edit, or reject 41 | comments, commits, code, wiki edits, issues, and other contributions that are 42 | not aligned to this Code of Conduct, or to ban temporarily or permanently any 43 | contributor for other behaviors that they deem inappropriate, threatening, 44 | offensive, or harmful. 45 | 46 | ## Scope 47 | 48 | This Code of Conduct applies both within project spaces and in public spaces 49 | when an individual is representing the project or its community. Examples of 50 | representing a project or community include using an official project e-mail 51 | address, posting via an official social media account, or acting as an appointed 52 | representative at an online or offline event. Representation of a project may be 53 | further defined and clarified by project maintainers. 54 | 55 | This Code of Conduct also applies outside the project spaces when the Project 56 | Steward has a reasonable belief that an individual's behavior may have a 57 | negative impact on the project or its community. 58 | 59 | ## Conflict Resolution 60 | 61 | We do not believe that all conflict is bad; healthy debate and disagreement 62 | often yield positive results. However, it is never okay to be disrespectful or 63 | to engage in behavior that violates the project’s code of conduct. 64 | 65 | If you see someone violating the code of conduct, you are encouraged to address 66 | the behavior directly with those involved. Many issues can be resolved quickly 67 | and easily, and this gives people more control over the outcome of their 68 | dispute. 
If you are unable to resolve the matter for any reason, or if the 69 | behavior is threatening or harassing, report it. We are dedicated to providing 70 | an environment where participants feel welcome and safe. 71 | 72 | Reports should be directed to *[PROJECT STEWARD NAME(s) AND EMAIL(s)]*, the 73 | Project Steward(s) for *[PROJECT NAME]*. It is the Project Steward’s duty to 74 | receive and address reported violations of the code of conduct. They will then 75 | work with a committee consisting of representatives from the Open Source 76 | Programs Office and the Google Open Source Strategy team. If for any reason you 77 | are uncomfortable reaching out to the Project Steward, please email 78 | opensource@google.com. 79 | 80 | We will investigate every complaint, but you may not receive a direct response. 81 | We will use our discretion in determining when and how to follow up on reported 82 | incidents, which may range from not taking action to permanent expulsion from 83 | the project and project-sponsored spaces. We will notify the accused of the 84 | report and provide them an opportunity to discuss it before any action is taken. 85 | The identity of the reporter will be omitted from the details of the report 86 | supplied to the accused. In potentially harmful situations, such as ongoing 87 | harassment or threats to anyone's safety, we may take action without notice. 88 | 89 | ## Attribution 90 | 91 | This Code of Conduct is adapted from the Contributor Covenant, version 1.4, 92 | available at 93 | https://www.contributor-covenant.org/version/1/4/code-of-conduct.html 94 | 95 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # How to contribute 2 | 3 | We'd love to accept your patches and contributions to this project. 4 | 5 | ## Before you begin 6 | 7 | ### Sign our Contributor License Agreement 8 | 9 | Contributions to this project must be accompanied by a 10 | [Contributor License Agreement](https://cla.developers.google.com/about) (CLA). 11 | You (or your employer) retain the copyright to your contribution; this simply 12 | gives us permission to use and redistribute your contributions as part of the 13 | project. 14 | 15 | If you or your current employer have already signed the Google CLA (even if it 16 | was for a different project), you probably don't need to do it again. 17 | 18 | Visit <https://cla.developers.google.com/> to see your current agreements or to 19 | sign a new one. 20 | 21 | ### Review our community guidelines 22 | 23 | This project follows 24 | [Google's Open Source Community Guidelines](https://opensource.google/conduct/). 25 | 26 | ## Contribution process 27 | 28 | ### Code reviews 29 | 30 | All submissions, including submissions by project members, require review. We 31 | use GitHub pull requests for this purpose. Consult 32 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more 33 | information on using pull requests. -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 
12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 
135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Copyright 2024 Google LLC
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 |     https://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 |
15 | # Real Time Intelligence Hands-on Labs
16 |
17 | ## About
18 | This repository features self-contained, hands-on labs with detailed step-by-step instructions and associated collateral (data, code, configuration) to enable Real Time Intelligence learning.
19 |
20 | ## Labs
21 |
22 | | # | Use Case | Lab summary | Contributed by |
23 | | -- | :--- | :--- |:--- |
24 | | 1. |[Real Time Prediction](RealTimePrediction/realtime-intelligence-main/README.md)|A real-time, streaming, machine learning (ML) prediction pipeline that uses Dataflow, Pub/Sub, Vertex AI, BigQuery and Cloud Storage | Sam Iyer
25 | | 2. |[Anomaly Detection Interactive Notebook](AnomalyDetection/anomalydetection-interactivenotebook-main/README.md)|Running an Apache Beam pipeline using Dataflow notebooks| Smitha Venkat, Purnima Maganti and Mohamed Barry
26 |
27 | ## Contributing
28 | See the contributing [instructions](CONTRIBUTING.md) to get started contributing.
29 |
30 | ## License
31 | All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.
32 |
33 | ## Disclaimer
34 | This repository and its contents are not an official Google Product.
35 |
36 | ## Contact
37 | Share your feedback and ideas by logging [issues](../../issues).
38 |
39 | ## Release History
40 |
41 | | # | Release Summary | Date | Contributor |
42 | | -- | :--- | :--- |:--- |
43 | | 1. |Initial release| 4/4/2023| Various|
44 | | 2. |Code Fix|6/10/2024|realtime/train_on_vertexai.py|
45 |
46 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/README.md:
--------------------------------------------------------------------------------
1 | Copyright 2024 Google LLC
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 |     https://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 |
15 | ## Realtime Prediction
16 | This lab helps you implement a real-time, streaming, machine learning (ML) prediction pipeline that uses Dataflow, Pub/Sub, Vertex AI, BigQuery and Cloud Storage.
17 |
18 | ## Solution Overview
19 | This lab predicts whether a flight will arrive on time, using historical data from the US Bureau of Transportation Statistics (BTS) website. (https://www.bts.gov/topics/airline-time-tables)
20 | This website provides historical on-time performance information for domestic flights in the United States. All major US air carriers are required to file statistics about each of their domestic flights with the BTS. The data they are required to file includes the scheduled departure and arrival times as well as the actual departure and arrival times. From the scheduled and actual arrival times, the arrival delay associated with each flight can be calculated. Therefore, this dataset can give us the ground truth for building a model to predict arrival delay.
21 |
22 | ## Architecture
23 |
24 | ![Architecture](images/architecture.png)
25 | 1. Data Ingestion
26 |
27 | - Ingest - Extract Flight On-Time Performance Data (Date, Flight Number, Origin, Destination, Departure Time, Taxi Time, Arrival Time, etc.) -> Stored in Cloud Storage Bucket
28 |
29 | - Ingest - Extract Airport Information (Airport Code, City, Latitude, Longitude, etc.) -> Stored in Cloud Storage Bucket
30 |
31 | - Store - Store standardized and transformed datasets in BigQuery
32 |
33 | 2. Model Training
34 |
35 | - Batch Dataflow Process to create Training Dataset using simulated events.
36 |
37 | - Use the Training Dataset for Vertex AI Model Training.
38 |
39 | 3. Prediction
40 |
41 | - Simulate - Simulate Realtime Flight Takeoffs & Landings and capture this data in Pub/Sub Topics.
42 |
43 | - Prediction - Streaming Dataflow job to read from Pub/Sub and call the Vertex AI Model to predict on-time arrival of flights.
44 |
45 | - Store - Capture the predictions in a BigQuery Dataset for Analysis and Dashboarding needs.
46 |
47 | ## Datasets
48 |
49 | 1. Inputs
50 |
51 | - Airports Information - airports
52 |
53 | - Ontime Flight Data - flights_raw
54 |
55 | - Time Zone Corrected Data - flights_tzcorr
56 |
57 | - Simulated Flight Events - flights_simevents
58 |
59 | 2. Outputs
60 |
61 | - Streaming Predictions - streaming_preds
62 |
63 |
64 | ## Getting started
65 |
66 | ### Step 01. Create a GCP project and open Cloud Shell
67 |
68 | ### Step 02. Clone this GitHub repository:
69 |
70 | git clone https://github.com/google/real-time-intelligence-workshop.git
71 |
72 | ### Step 03. Change Directory to **RealTimePrediction/realtime-intelligence-main**
73 |
74 | cd real-time-intelligence-workshop/RealTimePrediction/realtime-intelligence-main/
75 |
76 | ### Step 04. Execute script
77 |
78 | ./setup_env.sh
79 |
80 | This script sets up your project (a quick verification sketch follows the list):
81 |
82 | - Creates project variables
83 |
84 | - Enables the necessary APIs
85 |
86 | - Adds the necessary roles for the default compute service account
87 |
88 | - Creates the network, sub-network & firewall rules
89 |
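A quick way to verify the setup from the same Cloud Shell session; a minimal sketch, where the API names and the us-central1 "default" subnetwork are inferred from the pipeline scripts used later in this lab rather than printed by setup_env.sh itself:

    # Confirm a few of the APIs the lab depends on are enabled
    gcloud services list --enabled | grep -E 'dataflow|aiplatform|pubsub'

    # The lab's Dataflow jobs run in the us-central1 subnetwork named "default"
    gcloud compute networks subnets describe default --region=us-central1 --format='value(name,ipCidrRange)'
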
90 | ### Step 05. Execute script
91 |
92 | ./stage_data.sh
93 |
94 | This script stages the following data for the lab:
95 |
96 | - Downloads flight on-time performance data
97 |
98 | - Downloads flight timezone-corrected data
99 |
100 | - Downloads airport information
101 |
102 | - Downloads flight simulated events
103 |
104 | - Uploads the downloaded files to Cloud Storage and BigQuery
105 |
106 | ### Step 06. Validate that data has been copied to Cloud Storage and BigQuery
107 |
108 | Sample image of the GCS Bucket
109 |
110 | ![GCS](images/ingestion_gcs.png)
111 |
112 | - In the Google Cloud Console menu, navigate to Cloud Storage and validate that the -ml bucket was created
113 |
114 | Open the bucket and validate that the following files exist
115 |
116 | - flight_simevents_dump*.gz (5 files)
117 |
118 | - flight folder has 3 sub-folders - airports, raw & tzcorr
119 |
120 | - airports folder has 1 file - airports.csv
121 |
122 | - raw folder has 2 files - 201501.csv & 201502.csv
123 |
124 | - tzcorr folder has 1 file - all_flights*
125 |
126 | Sample Image of BigQuery Dataset
127 |
128 | ![BigQuery](images/ingestion_bq.png)
129 |
130 | - In the Google Cloud Console menu, navigate to BigQuery and validate that the flights dataset was created
131 |
132 | Open the dataset and validate that the following tables exist
133 |
134 | - airports - 13,386 rows
135 |
136 | - flights_raw - 899,159 rows
137 |
138 | - flights_simevents - 2,600,380 rows
139 |
140 | - flights_tzcorr - 65,099 rows
141 |
142 | ### Step 07. Check Organization Policies to review the following constraints
143 |
144 | - In the Google Cloud Console menu, navigate to IAM -> Organization Policies
145 |
146 | - Turn off the Shielded VM policy
147 |
148 | - Filter the following constraint to validate current settings
149 |
150 | constraints/compute.requireShieldedVm
151 |
152 | Sample Image of Shielded VM - Organization Policy
153 |
154 | ![ShieldedVM](images/op_shieldedvm.png)
155 |
156 |
157 | - Allow VM external IP access
158 |
159 | - Filter the following constraint to validate current settings
160 |
161 | constraints/compute.vmExternalIpAccess
162 |
163 | Sample Image of External IP Access - Organization Policy
164 |
165 | ![ExternalIP](images/op_externalip.png)
166 |
167 | ### Step 08. Execute script to install the necessary packages.
168 |
169 | ./install_packages.sh
170 |
171 | - These packages are necessary to run TensorFlow and Apache Beam processes
172 |
173 | ### Step 09. Execute script to create data for model training.
174 |
175 | ./create_train_data.sh
176 |
177 | This script creates data for testing, training and validation of the model.
178 |
179 | - In the Google Cloud Console menu, navigate to Dataflow > Jobs
180 |
181 | - Click on the traindata job to review the job graph
182 |
183 | - Wait for the job to run and succeed - it will take about 20 minutes
184 |
185 | Sample Image of Dataflow Jobs - Note: traindata is a batch job
186 |
187 | ![DataFlow1](images/dataflow_jobs1.png)
188 |
189 |
190 | Sample Image of TrainData Job Graph
191 |
192 | ![Batch](images/batch.png)
193 |
194 | Open the -ml bucket to validate that the following files and folders are present (a command-line spot-check is sketched at the end of this step)
195 |
196 | - train folder with 1 sub-folder - data - with 4 files - all*.csv, test*.csv, train*.csv, validate*.csv
197 |
198 | Sample Image of the bucket
199 |
200 | ![Datafolder](images/data_folder.png)
201 |
202 | - flights folder with 2 sub-folders - staging & temp, that hold staging and temp files
203 |
204 | Sample Image of the bucket
205 |
206 | ![Flightsfolder](images/flights_folder.png)
207 |
208 |
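As a command-line alternative to the console checks above, you can list the generated files from Cloud Shell; a minimal sketch that derives the bucket name the same way the lab's scripts do ($PROJECT_ID-ml):

    # Same bucket naming convention as create_train_data.sh
    export BUCKET=$(gcloud info --format='value(config.project)')-ml

    # List the training CSVs and peek at the CSV header plus the first rows
    gsutil ls gs://$BUCKET/train/data/
    gsutil cat gs://$BUCKET/train/data/train*.csv | head -3
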
209 | ### Step 10. Execute script to train and deploy the model
210 |
211 | ./train_model.sh
212 |
213 | - In the Google Cloud Console menu, navigate to Vertex AI -> Training to monitor the training pipeline.
214 |
215 | Sample Image of Vertex AI Training Pipeline
216 |
217 | ![AITraining](images/vertex_ai_training.png)
218 |
219 | - When the status is Finished, click on the training pipeline name and select the Deploy & Test tab
220 |
221 | Sample Image of Vertex AI Deployment
222 |
223 | ![AIDeployment](images/vertex_ai_deployment.png)
224 |
225 |
226 | - Monitor the deployment status of the model
227 |
228 | - Note: It will take around 20 minutes to complete the model training and deployment.
229 |
230 | - Once the model is deployed, the flights endpoint will be used to call the model for prediction.
231 |
232 | Sample Image of Vertex AI Endpoint
233 |
234 | ![AIEndpoint](images/vertex_ai_endpoint.png)
235 |
236 |
237 | ### Step 11. Open another tab in Cloud Shell and execute the script to stream simulated flight data
238 |
239 | ./simulate_flight.sh
240 |
241 | - In the Google Cloud Console menu, navigate to Pub/Sub -> Topics
242 |
243 | - Review the 3 topics that were created to stream simulated flight events
244 |
245 | - arrived - simulates flight arrivals
246 | - departed - simulates flight departures
247 | - wheels-off - simulates flight take-offs
248 |
249 | Sample Image of Pub/Sub Topics
250 |
251 | ![PubSub](images/pubsub.png)
252 |
253 |
254 | ### Step 12. In the previous tab, execute the script to predict the probability of flights being on time
255 |
256 | ./predict_flights.sh
257 |
258 | This script creates a streaming Dataflow job that calls the AI model trained in Step 10
259 |
260 | Sample Image of Dataflow Jobs - Note: predictions is a streaming job
261 |
262 | ![DataFlow2](images/dataflow_jobs2.png)
263 |
264 | Sample Image of predictions Job Graph
265 |
266 | ![Streaming](images/streaming.png)
267 |
268 | - Wait for 15 minutes.
269 |
270 | - In the Google Cloud Console menu, navigate to BigQuery -> SQL Studio
271 |
272 | - Open the flights dataset and review the streaming_preds table
273 |
274 | - Streaming predictions of the probability of on-time flight arrival are captured in this table
275 |
276 | Sample Image of Streaming Predictions Table
277 |
278 | ![Streaming](images/prediction.png)
279 |
280 |
281 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/create_train_data.sh:
--------------------------------------------------------------------------------
1 | # Copyright 2023 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | #     https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # 15 | # Set environment variables 16 | # 17 | export PROJECT_ID=$(gcloud info --format='value(config.project)') 18 | export BUCKET=$PROJECT_ID-ml 19 | # 20 | # Change directory to realtime directory 21 | # 22 | cd ./realtime 23 | # 24 | # Create data for Training 25 | # Run Dataflow Pipeline to create Training Dataset 26 | # Note: It will take around 15-20 minutes to complete the job. 27 | # 28 | python3 create_traindata.py --input bigquery --project $PROJECT_ID --bucket $BUCKET --region us-central1 29 | # 30 | cd .. 31 | # 32 | #In the GCP Cloud Console menu, navigate to Dataflow > Jobs 33 | #Open Traindata job and review the Job Graph 34 | # -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/architecture.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/batch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/batch.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/data_folder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/data_folder.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/dataflow_jobs1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/dataflow_jobs1.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/dataflow_jobs2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/dataflow_jobs2.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/flights_folder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/flights_folder.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/ingestion_bq.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/ingestion_bq.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/ingestion_gcs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/ingestion_gcs.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/op_externalip.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/op_externalip.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/op_shieldedvm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/op_shieldedvm.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/prediction.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/prediction.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/pubsub.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/pubsub.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/streaming.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/streaming.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/vertex_ai_deployment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/vertex_ai_deployment.png -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/images/vertex_ai_endpoint.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/vertex_ai_endpoint.png 
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/images/vertex_ai_training.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/images/vertex_ai_training.png
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/install_packages.sh:
--------------------------------------------------------------------------------
1 | # Copyright 2023 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | #     https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | #
15 | # Install the following packages in your Cloud Shell or VM
16 | #
17 | pip3 install google-cloud-aiplatform
18 | pip3 install cloudml-hypertune
19 | pip3 install pyfarmhash
20 | pip3 install tensorflow
21 | pip3 install kfp 'apache-beam[gcp]'
22 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/predict_flights.sh:
--------------------------------------------------------------------------------
1 | # Copyright 2023 Google LLC
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | #     https://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | #
15 | # Set environment variables
16 | #
17 | export PROJECT_ID=$(gcloud info --format='value(config.project)')
18 | export BUCKET=$PROJECT_ID-ml
19 | #
20 | # Change directory to the realtime directory
21 | #
22 | cd ./realtime
23 | #
24 | # Predict Flights:
25 | #
26 | python3 make_predictions.py --input pubsub --output bigquery --project $PROJECT_ID --bucket $BUCKET --region us-central1
27 | cd ..
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/realtime/.gitignore:
--------------------------------------------------------------------------------
1 | *.egg-info
2 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/realtime/README.md:
--------------------------------------------------------------------------------
1 | Copyright 2023 Google LLC
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 |     https://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 |
15 | # Machine Learning on Streaming Pipelines
16 |
17 | ### Catch up from previous chapters if necessary
18 | If you didn't go through Chapters 2-9, the simplest way to catch up is to copy data from my bucket:
19 |
20 | #### Catch up from Chapters 2-9
21 | * Open CloudShell and git clone this repo:
22 | ```
23 | git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp
24 | ```
25 | * Go to the 02_ingest folder of the repo, run the program ./ingest_from_crsbucket.sh and specify your bucket name.
26 | * Go to the 04_streaming folder of the repo, run the program ./ingest_from_crsbucket.sh and specify your bucket name.
27 | * Go to the 05_bqnotebook folder of the repo, run the program ./create_trainday.sh and specify your bucket name.
28 | * Go to the 10_mlops folder of the repo, run the program ./ingest_from_crsbucket.sh and specify your bucket name.
29 |
30 | #### From CloudShell
31 | * Install the Python libraries you'll need:
32 | ```
33 | pip3 install google-cloud-aiplatform cloudml-hypertune pyfarmhash
34 | ```
35 | * [Optional] Create a small, local sample of BigQuery datasets for local experimentation:
36 | ```
37 | bash create_sample_input.sh
38 | ```
39 | * [Optional] Run a local pipeline to create a training dataset:
40 | ```
41 | python3 create_traindata.py --input local
42 | ```
43 | Verify the results:
44 | ```
45 | cat /tmp/all_data*
46 | ```
47 | * Run a Dataflow pipeline to create the full training dataset:
48 | ```
49 | python3 create_traindata.py --input bigquery --project <PROJECT> --bucket <BUCKET> --region <REGION>
50 | ```
51 | Note if you get an error similar to:
52 | ```
53 | AttributeError: Can't get attribute '_create_code' on
54 | ```
55 | it is because the global versions of your modules are ahead of/behind what Apache Beam on the server requires. Make sure to submit Apache Beam code to Dataflow from a pristine virtual environment that has only the modules you need:
56 | ```
57 | python -m venv ~/beamenv
58 | source ~/beamenv/bin/activate
59 | pip install apache-beam[gcp] google-cloud-aiplatform cloudml-hypertune pyfarmhash pyparsing==2.4.2
60 | python3 create_traindata.py ...
61 | ```
62 | Note that beamenv is only for submitting to Dataflow. Run train_on_vertexai.py and other code directly in the terminal.
63 | * Run the script that copies over the Ch10 model.py and train_on_vertexai.py files and makes the necessary changes:
64 | ```
65 | python3 change_ch10_files.py
66 | ```
67 | * [Optional] Train an AutoML model on the enriched dataset:
68 | ```
69 | python3 train_on_vertexai.py --automl --project <PROJECT> --bucket <BUCKET> --region <REGION>
70 | ```
71 | Verify performance by running the following BigQuery query:
72 | ```
73 | SELECT
74 | SQRT(SUM(
75 | (CAST(ontime AS FLOAT64) - predicted_ontime.scores[OFFSET(0)])*
76 | (CAST(ontime AS FLOAT64) - predicted_ontime.scores[OFFSET(0)])
77 | )/COUNT(*))
78 | FROM dsongcp.ch11_automl_evaluated
79 | ```
80 | * Train a custom ML model on the enriched dataset:
81 | ```
82 | python3 train_on_vertexai.py --project <PROJECT> --bucket <BUCKET> --region <REGION>
83 | ```
84 | Look at the logs of the training job to determine the final RMSE. An optional endpoint check is sketched below.
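* [Optional] Confirm that training produced a live endpoint. A minimal sketch; the `flights` display name is the one call_predict.py and make_predictions.py look up, and the region is whatever you trained in:
```
gcloud ai endpoints list --region=<REGION> --filter='display_name=flights'
```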
85 | * Run a local pipeline to invoke predictions:
86 | ```
87 | python3 make_predictions.py --input local
88 | ```
89 | Verify the results:
90 | ```
91 | cat /tmp/predictions*
92 | ```
93 | * [Optional] Run a pipeline on the full BigQuery dataset to invoke predictions:
94 | ```
95 | python3 make_predictions.py --input bigquery --project <PROJECT> --bucket <BUCKET> --region <REGION>
96 | ```
97 | Verify the results:
98 | ```
99 | gsutil cat gs://BUCKET/flights/ch11/predictions* | head -5
100 | ```
101 | * [Optional] Simulate the real-time pipeline and check that predictions are being made
102 |
103 |
104 | In one terminal, type:
105 | ```
106 | cd ../04_streaming/simulate
107 | python3 ./simulate.py --startTime '2015-05-01 00:00:00 UTC' \
108 | --endTime '2015-05-04 00:00:00 UTC' --speedFactor=30 --project <PROJECT>
109 | ```
110 |
111 | In another terminal type:
112 | ```
113 | python3 make_predictions.py --input pubsub \
114 | --project <PROJECT> --bucket <BUCKET> --region <REGION>
115 | ```
116 |
117 | Ensure that the pipeline starts; to check that output elements are being written out, do:
118 | ```
119 | gsutil ls gs://BUCKET/flights/ch11/predictions*
120 | ```
121 | Make sure to go to the GCP Console and stop the Dataflow pipeline.
122 |
123 |
124 | * Simulate the real-time pipeline and try out different time ranges, etc.
125 |
126 | In one terminal, type:
127 | ```
128 | cd ../04_streaming/simulate
129 | python3 ./simulate.py --startTime '2015-02-01 00:00:00 UTC' \
130 | --endTime '2015-02-03 00:00:00 UTC' --speedFactor=30 --project <PROJECT>
131 | ```
132 |
133 | In another terminal type:
134 | ```
135 | python3 make_predictions.py --input pubsub --output bigquery \
136 | --project <PROJECT> --bucket <BUCKET> --region <REGION>
137 | ```
138 |
139 | Ensure that the pipeline starts, then look at BigQuery:
140 | ```
141 | SELECT * FROM dsongcp.streaming_preds ORDER BY event_time DESC LIMIT 10
142 | ```
143 | When done, make sure to go to the GCP Console and stop the Dataflow pipeline.
144 |
145 | Note: If you are going to try it a second time around, delete the BigQuery sink, or simulate with a different time range:
146 | ```
147 | bq rm -f dsongcp.streaming_preds
148 | ```
149 |
150 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/realtime/call_predict.py:
--------------------------------------------------------------------------------
1 | #### DO NOT EDIT! Autogenerated from ../mlops/call_predict.py # Copyright 2023 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | 15 | import sys, json 16 | from google.cloud import aiplatform 17 | from google.cloud.aiplatform import gapic as aip 18 | 19 | #ENDPOINT_NAME = 'flights-ch11' 20 | ENDPOINT_NAME = 'flights' 21 | 22 | if __name__ == '__main__': 23 | 24 | endpoints = aiplatform.Endpoint.list( 25 | filter='display_name="{}"'.format(ENDPOINT_NAME), 26 | order_by='create_time desc' 27 | ) 28 | if len(endpoints) == 0: 29 | print("No endpoint named {}".format(ENDPOINT_NAME)) 30 | sys.exit(-1) 31 | 32 | endpoint = endpoints[0] 33 | 34 | input_data = {"instances": [ 35 | {"dep_hour": 2, "is_weekday": 1, "dep_delay": 40, "taxi_out": 17, "distance": 41, "carrier": "AS", "avg_dep_delay": -3.0, "avg_taxi_out": 5.0, 36 | "dep_airport_lat": 58.42527778, "dep_airport_lon": -135.7075, "arr_airport_lat": 58.35472222, 37 | "arr_airport_lon": -134.57472222, "origin": "GST", "dest": "JNU"}, 38 | {"dep_hour": 22, "is_weekday": 0, "dep_delay": -7, "taxi_out": 7, "distance": 201, "carrier": "HA", "avg_dep_delay": 3.0, "avg_taxi_out": 8.0, 39 | "dep_airport_lat": 21.97611111, "dep_airport_lon": -159.33888889, "arr_airport_lat": 20.89861111, 40 | "arr_airport_lon": -156.43055556, "origin": "LIH", "dest": "OGG"} 41 | ]} 42 | 43 | preds = endpoint.predict(input_data['instances']) 44 | print(preds) 45 | 46 | 47 | 48 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/create_sample_input.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright 2023 Google LLC 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | #     https://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | bq query --nouse_legacy_sql --format=sparse \ 17 | "SELECT EVENT_DATA FROM dsongcp.flights_simevents WHERE EVENT_TYPE = 'wheelsoff' AND EVENT_TIME BETWEEN '2015-03-10T10:00:00' AND '2015-03-10T14:00:00' " \ 18 | | grep FL_DATE \ 19 | > simevents_sample.json 20 | 21 | 22 | bq query --nouse_legacy_sql --format=json \ 23 | "SELECT * FROM dsongcp.flights_tzcorr WHERE DEP_TIME BETWEEN '2015-03-10T10:00:00' AND '2015-03-10T14:00:00' " \ 24 | | sed 's/\[//g' | sed 's/\]//g' | sed s'/\},/\}\n/g' \ 25 | > alldata_sample.json 26 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/create_traindata.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Copyright 2023 Google LLC 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | #     https://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | import apache_beam as beam 18 | import logging 19 | import os 20 | import json 21 | 22 | from flightstxf import flights_transforms as ftxf 23 | 24 | CSV_HEADER = 'ontime,dep_delay,taxi_out,distance,origin,dest,dep_hour,is_weekday,carrier,dep_airport_lat,dep_airport_lon,arr_airport_lat,arr_airport_lon,avg_dep_delay,avg_taxi_out,data_split' 25 | 26 | 27 | def dict_to_csv(f): 28 | try: 29 | yield ','.join([str(x) for x in f.values()]) 30 | except Exception as e: 31 | logging.warning('Ignoring {} because: {}'.format(f, e), exc_info=True) 32 | pass 33 | 34 | 35 | def run(project, bucket, region, input): 36 | if input == 'local': 37 | logging.info('Running locally on small extract') 38 | argv = [ 39 | '--runner=DirectRunner' 40 | ] 41 | flights_output = '/tmp/' 42 | else: 43 | logging.info('Running in the cloud on full dataset input={}'.format(input)) 44 | argv = [ 45 | '--project={0}'.format(project), 46 | '--job_name=traindata', 47 | # '--save_main_session', # not needed as we are running as a package now 48 | '--staging_location=gs://{0}/flights/staging/'.format(bucket), 49 | '--temp_location=gs://{0}/flights/temp/'.format(bucket), 50 | '--setup_file=./setup.py', 51 | '--autoscaling_algorithm=THROUGHPUT_BASED', 52 | '--max_num_workers=20', 53 | # '--max_num_workers=4', '--worker_machine_type=m1-ultramem-40', '--disk_size_gb=500', # for full 2015-2019 dataset 54 | '--region={}'.format(region), 55 | '--subnetwork=regions/us-central1/subnetworks/default', 56 | '--runner=DataflowRunner' 57 | ] 58 | flights_output = 'gs://{}/train/data/'.format(bucket) 59 | 60 | with beam.Pipeline(argv=argv) as pipeline: 61 | 62 | # read the event stream 63 | if input == 'local': 64 | input_file = './alldata_sample.json' 65 | logging.info("Reading from {} ... Writing to {}".format(input_file, flights_output)) 66 | events = ( 67 | pipeline 68 | | 'read_input' >> beam.io.ReadFromText(input_file) 69 | | 'parse_input' >> beam.Map(lambda line: json.loads(line)) 70 | ) 71 | elif input == 'bigquery': 72 | input_table = 'flights.flights_tzcorr' 73 | logging.info("Reading from {} ... Writing to {}".format(input_table, flights_output)) 74 | events = ( 75 | pipeline 76 | | 'read_input' >> beam.io.ReadFromBigQuery(table=input_table) 77 | ) 78 | else: 79 | logging.error("Unknown input type {}".format(input)) 80 | return 81 | 82 | # events -> features. 
See ./flights_transforms.py for the code shared between training & prediction 83 | features = ftxf.transform_events_to_features(events) 84 | 85 | # shuffle globally so that we are not at mercy of TensorFlow's shuffle buffer 86 | features = ( 87 | features 88 | | 'into_global' >> beam.WindowInto(beam.window.GlobalWindows()) 89 | | 'shuffle' >> beam.util.Reshuffle() 90 | ) 91 | 92 | # write out 93 | for split in ['ALL', 'TRAIN', 'VALIDATE', 'TEST']: 94 | feats = features 95 | if split != 'ALL': 96 | feats = feats | 'only_{}'.format(split) >> beam.Filter(lambda f: f['data_split'] == split) 97 | ( 98 | feats 99 | | '{}_to_string'.format(split) >> beam.FlatMap(dict_to_csv) 100 | | '{}_to_gcs'.format(split) >> beam.io.textio.WriteToText(os.path.join(flights_output, split.lower()), 101 | file_name_suffix='.csv', header=CSV_HEADER, 102 | # workaround b/207384805 103 | num_shards=1) 104 | ) 105 | 106 | 107 | if __name__ == '__main__': 108 | import argparse 109 | 110 | parser = argparse.ArgumentParser(description='Create training CSV file that includes time-aggregate features') 111 | parser.add_argument('-p', '--project', help='Project to be billed for Dataflow job. Omit if running locally.') 112 | parser.add_argument('-b', '--bucket', help='Training data will be written to gs://BUCKET/train/') 113 | parser.add_argument('-r', '--region', help='Region to run Dataflow job. Choose the same region as your bucket.') 114 | parser.add_argument('-i', '--input', help='local OR bigquery', required=True) 115 | 116 | logging.getLogger().setLevel(logging.INFO) 117 | args = vars(parser.parse_args()) 118 | 119 | if args['input'] != 'local': 120 | if not args['bucket'] or not args['project'] or not args['region']: 121 | print("Project, Bucket, Region are needed in order to run on the cloud on full dataset.") 122 | parser.print_help() 123 | parser.exit() 124 | 125 | run(project=args['project'], bucket=args['bucket'], region=args['region'], input=args['input']) 126 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/flightstxf/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/realtime/flightstxf/__init__.py -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/flightstxf/flights_transforms.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2023 Google LLC 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | #     https://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | import apache_beam as beam 17 | import datetime as dt 18 | import logging 19 | import numpy as np 20 | import farmhash # pip install pyfarmhash 21 | 22 | DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S' 23 | WINDOW_DURATION = 60 * 60 24 | WINDOW_EVERY = 5 * 60 25 | 26 | 27 | def get_data_split(fl_date): 28 | fl_date_str = str(fl_date) 29 | # Use farm fingerprint just like in BigQuery 30 | x = np.abs(np.uint64(farmhash.fingerprint64(fl_date_str)).astype('int64') % 100) 31 | if x < 60: 32 | data_split = 'TRAIN' 33 | elif x < 80: 34 | data_split = 'VALIDATE' 35 | else: 36 | data_split = 'TEST' 37 | return data_split 38 | 39 | 40 | def get_data_split_2019(fl_date): 41 | fl_date_str = str(fl_date) 42 | if fl_date_str > '2019': 43 | data_split = 'TEST' 44 | else: 45 | # Use farm fingerprint just like in BigQuery 46 | x = np.abs(np.uint64(farmhash.fingerprint64(fl_date_str)).astype('int64') % 100) 47 | if x < 95: 48 | data_split = 'TRAIN' 49 | else: 50 | data_split = 'VALIDATE' 51 | return data_split 52 | 53 | 54 | def to_datetime(event_time): 55 | if isinstance(event_time, str): 56 | # In BigQuery, this is a datetime.datetime. In JSON, it's a string 57 | # sometimes it has a T separating the date, sometimes it doesn't 58 | # Handle all the possibilities 59 | event_time = dt.datetime.strptime(event_time.replace('T', ' '), DATETIME_FORMAT) 60 | return event_time 61 | 62 | 63 | def approx_miles_between(lat1, lon1, lat2, lon2): 64 | # convert to radians 65 | lat1 = float(lat1) * np.pi / 180.0 66 | lat2 = float(lat2) * np.pi / 180.0 67 | lon1 = float(lon1) * np.pi / 180.0 68 | lon2 = float(lon2) * np.pi / 180.0 69 | 70 | # apply Haversine formula 71 | d_lat = lat2 - lat1 72 | d_lon = lon2 - lon1 73 | a = (pow(np.sin(d_lat / 2), 2) + 74 | pow(np.sin(d_lon / 2), 2) * 75 | np.cos(lat1) * np.cos(lat2)); 76 | c = 2 * np.arcsin(np.sqrt(a)) 77 | return float(6371 * c * 0.621371) # miles 78 | 79 | 80 | def create_features_and_label(event, for_training): 81 | try: 82 | model_input = {} 83 | 84 | if for_training: 85 | model_input.update({ 86 | 'ontime': 1.0 if float(event['ARR_DELAY'] or 0) < 15 else 0, 87 | }) 88 | 89 | # features for both training and prediction 90 | model_input.update({ 91 | # same as in ch9 92 | 'dep_delay': event['DEP_DELAY'], 93 | 'taxi_out': event['TAXI_OUT'], 94 | # distance is not in wheelsoff 95 | 'distance': approx_miles_between(event['DEP_AIRPORT_LAT'], event['DEP_AIRPORT_LON'], 96 | event['ARR_AIRPORT_LAT'], event['ARR_AIRPORT_LON']), 97 | 'origin': event['ORIGIN'], 98 | 'dest': event['DEST'], 99 | 'dep_hour': to_datetime(event['DEP_TIME']).hour, 100 | 'is_weekday': 1.0 if to_datetime(event['DEP_TIME']).isoweekday() < 6 else 0.0, 101 | 'carrier': event['UNIQUE_CARRIER'], 102 | 'dep_airport_lat': event['DEP_AIRPORT_LAT'], 103 | 'dep_airport_lon': event['DEP_AIRPORT_LON'], 104 | 'arr_airport_lat': event['ARR_AIRPORT_LAT'], 105 | 'arr_airport_lon': event['ARR_AIRPORT_LON'], 106 | # newly computed averages 107 | 'avg_dep_delay': event['AVG_DEP_DELAY'], 108 | 'avg_taxi_out': event['AVG_TAXI_OUT'], 109 | 110 | }) 111 | 112 | if for_training: 113 | model_input.update({ 114 | # training data split 115 | 'data_split': get_data_split(event['FL_DATE']) 116 | }) 117 | else: 118 | model_input.update({ 119 | # prediction output should include timestamp 120 | 'event_time': event['WHEELS_OFF'] 121 | }) 122 | 123 | yield model_input 124 | except Exception as e: 125 | # if any key is not present, don't use for training 126 | logging.warning('Ignoring {} because: {}'.format(event, e), exc_info=True) 
127 |             pass
128 |
129 |
130 | def compute_mean(events, col_name):
131 |     values = [float(event[col_name]) for event in events if col_name in event and event[col_name]]
132 |     return float(np.mean(values)) if len(values) > 0 else None
133 |
134 |
135 | def add_stats(element, window=beam.DoFn.WindowParam):
136 |     # result of a group-by, so this will be called once for each airport and window
137 |     # all averages here are by airport
138 |     airport = element[0]
139 |     events = element[1]
140 |
141 |     # how late are flights leaving?
142 |     avg_dep_delay = compute_mean(events, 'DEP_DELAY')
143 |     avg_taxiout = compute_mean(events, 'TAXI_OUT')
144 |
145 |     # remember that an event will be present for 60 minutes, but we want to emit
146 |     # it only if it has just arrived (if it is within 5 minutes of the start of the window)
147 |     emit_end_time = window.start + WINDOW_EVERY
148 |     for event in events:
149 |         event_time = to_datetime(event['WHEELS_OFF']).timestamp()
150 |         if event_time < emit_end_time:
151 |             event_plus_stat = event.copy()
152 |             event_plus_stat['AVG_DEP_DELAY'] = avg_dep_delay
153 |             event_plus_stat['AVG_TAXI_OUT'] = avg_taxiout
154 |             yield event_plus_stat
155 |
156 |
157 | def assign_timestamp(event):
158 |     try:
159 |         event_time = to_datetime(event['WHEELS_OFF'])
160 |         yield beam.window.TimestampedValue(event, event_time.timestamp())
161 |     except Exception:
162 |         pass  # drop events that lack a parseable WHEELS_OFF time
163 |
164 |
165 | def is_normal_operation(event):
166 |     for flag in ['CANCELLED', 'DIVERTED']:
167 |         if flag in event:
168 |             s = str(event[flag]).lower()
169 |             if s == 'true':
170 |                 return False  # cancelled or diverted
171 |     return True  # normal operation
172 |
173 |
174 | def transform_events_to_features(events, for_training=True):
175 |     # events are assigned the time at which predictions will have to be made -- the wheels off time
176 |     events = events | 'assign_time' >> beam.FlatMap(assign_timestamp)
177 |     events = events | 'remove_cancelled' >> beam.Filter(is_normal_operation)
178 |
179 |     # compute stats by airport, and add to events
180 |     features = (
181 |         events
182 |         | 'window' >> beam.WindowInto(beam.window.SlidingWindows(WINDOW_DURATION, WINDOW_EVERY))
183 |         | 'by_airport' >> beam.Map(lambda x: (x['ORIGIN'], x))
184 |         | 'group_by_airport' >> beam.GroupByKey()
185 |         | 'events_and_stats' >> beam.FlatMap(add_stats)
186 |         | 'events_to_features' >> beam.FlatMap(lambda x: create_features_and_label(x, for_training))
187 |     )
188 |
189 |     return features
190 |
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/realtime/make_predictions.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Copyright 2023 Google Inc.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
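# Overview of the pipeline defined below: events are read from a local JSON extract,
# BigQuery, or the wheelsoff Pub/Sub topic; converted into model features by the
# transforms shared with training (flightstxf/flights_transforms.py); batched; scored
# by the Vertex AI endpoint whose display name is 'flights'; and written either to GCS
# as CSV or to the flights.streaming_preds BigQuery table.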
16 | 17 | import apache_beam as beam 18 | import logging 19 | import json 20 | import os 21 | 22 | from flightstxf import flights_transforms as ftxf 23 | 24 | 25 | CSV_HEADER = 'event_time,dep_delay,taxi_out,distance,origin,dest,dep_hour,is_weekday,carrier,dep_airport_lat,dep_airport_lon,arr_airport_lat,arr_airport_lon,avg_dep_delay,avg_taxi_out,prob_ontime' 26 | 27 | class FlightsModelInvoker(beam.DoFn): 28 | def __init__(self): 29 | self.endpoint = None 30 | 31 | def setup(self): 32 | from google.cloud import aiplatform 33 | endpoint_name = 'flights' 34 | endpoints = aiplatform.Endpoint.list( 35 | filter='display_name="{}"'.format(endpoint_name), 36 | order_by='create_time desc' 37 | ) 38 | if len(endpoints) == 0: 39 | raise EnvironmentError("No endpoint named {}".format(endpoint_name)) 40 | logging.info("Found endpoint {}".format(endpoints[0])) 41 | self.endpoint = endpoints[0] 42 | 43 | def process(self, input_data): 44 | # call predictions and pull out probability 45 | logging.info("Invoking ML model on {} flights".format(len(input_data))) 46 | # drop inputs not needed by model 47 | features = [x.copy() for x in input_data] 48 | for f in features: 49 | f.pop('event_time') 50 | # call model 51 | predictions = self.endpoint.predict(features).predictions 52 | for idx, input_instance in enumerate(input_data): 53 | result = input_instance.copy() 54 | result['prob_ontime'] = predictions[idx][0] 55 | yield result 56 | 57 | 58 | def run(project, bucket, region, source, sink): 59 | if source == 'local': 60 | logging.info('Running locally on small extract') 61 | argv = [ 62 | '--project={0}'.format(project), 63 | '--runner=DirectRunner' 64 | ] 65 | flights_output = '/tmp/predictions' 66 | else: 67 | logging.info('Running in the cloud on full dataset input={}'.format(source)) 68 | argv = [ 69 | '--project={0}'.format(project), 70 | '--job_name=predictions', 71 | '--save_main_session', 72 | '--staging_location=gs://{0}/flights/staging/'.format(bucket), 73 | '--temp_location=gs://{0}/flights/temp/'.format(bucket), 74 | '--setup_file=./setup.py', 75 | '--autoscaling_algorithm=THROUGHPUT_BASED', 76 | '--max_num_workers=8', 77 | '--region={}'.format(region), 78 | '--subnetwork=regions/us-central1/subnetworks/default', 79 | '--runner=DataflowRunner' 80 | ] 81 | if source == 'pubsub': 82 | logging.info("Turning on streaming. Cancel the pipeline from GCP console") 83 | argv += ['--streaming'] 84 | flights_output = 'gs://{}/flights/predictions'.format(bucket) 85 | 86 | with beam.Pipeline(argv=argv) as pipeline: 87 | 88 | # read the event stream 89 | if source == 'local': 90 | input_file = './simevents_sample.json' 91 | logging.info("Reading from {} ... Writing to {}".format(input_file, flights_output)) 92 | events = ( 93 | pipeline 94 | | 'read_input' >> beam.io.ReadFromText(input_file) 95 | | 'parse_input' >> beam.Map(lambda line: json.loads(line)) 96 | ) 97 | elif source == 'bigquery': 98 | input_query = ("SELECT EVENT_DATA FROM flights.flights_simevents " + 99 | "WHERE EVENT_TIME BETWEEN '2015-03-01' AND '2015-03-02'") 100 | logging.info("Reading from {} ... Writing to {}".format(input_query, flights_output)) 101 | events = ( 102 | pipeline 103 | | 'read_input' >> beam.io.ReadFromBigQuery(query=input_query, use_standard_sql=True) 104 | | 'parse_input' >> beam.Map(lambda row: json.loads(row['EVENT_DATA'])) 105 | ) 106 | elif source == 'pubsub': 107 | input_topic = "projects/{}/topics/wheelsoff".format(project) 108 | logging.info("Reading from {} ... 
Writing to {}".format(input_topic, flights_output))
109 |             events = (
110 |                     pipeline
111 |                     | 'read_input' >> beam.io.ReadFromPubSub(topic=input_topic,
112 |                                                              timestamp_attribute='EventTimeStamp')
113 |                     | 'parse_input' >> beam.Map(lambda s: json.loads(s))
114 |             )
115 |         else:
116 |             logging.error("Unknown input type {}".format(source))
117 |             return
118 |
119 |         # events -> features. See ./flights_transforms.py for the code shared between training & prediction
120 |         features = ftxf.transform_events_to_features(events, for_training=False)
121 |
122 |         # call model endpoint
123 |         # shared_handle = beam.utils.shared.Shared()
124 |         preds = (
125 |                 features
126 |                 | 'into_global' >> beam.WindowInto(beam.window.GlobalWindows())
127 |                 | 'batch_instances' >> beam.BatchElements(min_batch_size=1, max_batch_size=64)
128 |                 | 'model_predict' >> beam.ParDo(FlightsModelInvoker())
129 |         )
130 |
131 |         # write it out
132 |         if sink == 'file':
133 |             (preds
134 |              | 'to_string' >> beam.Map(lambda f: ','.join([str(x) for x in f.values()]))
135 |              | 'to_gcs' >> beam.io.textio.WriteToText(flights_output,
136 |                                                       file_name_suffix='.csv', header=CSV_HEADER,
137 |                                                       # workaround b/207384805
138 |                                                       num_shards=1)
139 |              )
140 |         elif sink == 'bigquery':
141 |             preds_schema = ','.join([
142 |                 'event_time:timestamp',
143 |                 'prob_ontime:float',
144 |                 'dep_delay:float',
145 |                 'taxi_out:float',
146 |                 'distance:float',
147 |                 'origin:string',
148 |                 'dest:string',
149 |                 'dep_hour:integer',
150 |                 'is_weekday:integer',
151 |                 'carrier:string',
152 |                 'dep_airport_lat:float,dep_airport_lon:float',
153 |                 'arr_airport_lat:float,arr_airport_lon:float',
154 |                 'avg_dep_delay:float',
155 |                 'avg_taxi_out:float',
156 |             ])
157 |             (preds
158 |              | 'to_bigquery' >> beam.io.WriteToBigQuery(
159 |                         'flights.streaming_preds', schema=preds_schema,
160 |                         # write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
161 |                         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
162 |                         method='STREAMING_INSERTS'
163 |                     )
164 |              )
165 |         else:
166 |             logging.error("Unknown output type {}".format(sink))
167 |             return
168 |
169 |
170 | if __name__ == '__main__':
171 |     import argparse
172 |
173 |     parser = argparse.ArgumentParser(description='Invoke the flights model on batch or streaming events to predict on-time arrival')
174 |     parser.add_argument('-p', '--project', help='Project to be billed for Dataflow/BigQuery', required=True)
175 |     parser.add_argument('-b', '--bucket', help='data will be written to gs://BUCKET/flights/predictions/')
176 |     parser.add_argument('-r', '--region', help='Region to run Dataflow job. 
Choose the same region as your bucket.') 177 | parser.add_argument('-i', '--input', help='local, bigquery OR pubsub', required=True) 178 | parser.add_argument('-o', '--output', help='file, bigquery OR bigtable', default='file') 179 | 180 | logging.getLogger().setLevel(logging.INFO) 181 | args = vars(parser.parse_args()) 182 | 183 | if args['input'] != 'local': 184 | if not args['bucket'] or not args['project'] or not args['region']: 185 | print("Project, Bucket, Region are needed in order to run on the cloud on full dataset.") 186 | parser.print_help() 187 | parser.exit() 188 | 189 | run(project=args['project'], bucket=args['bucket'], region=args['region'], 190 | source=args['input'], sink=args['output']) 191 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/model.py: -------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | # Checking in 16 | 17 | import argparse 18 | import logging 19 | import os, time 20 | import hypertune 21 | import numpy as np 22 | import tensorflow as tf 23 | 24 | BUCKET = None 25 | TF_VERSION = '2-' + tf.__version__[2:3] # needed to choose container 26 | 27 | DEVELOP_MODE = True 28 | NUM_EXAMPLES = 5000 * 1000 # doesn't need to be precise but get order of magnitude right. 
29 | 
30 | NUM_BUCKETS = 5
31 | NUM_EMBEDS = 3
32 | TRAIN_BATCH_SIZE = 64
33 | DNN_HIDDEN_UNITS = '64,32'
34 | 
35 | CSV_COLUMNS = (
36 |         'ontime,dep_delay,taxi_out,distance,origin,dest,dep_hour,is_weekday,carrier,' +
37 |         'dep_airport_lat,dep_airport_lon,arr_airport_lat,arr_airport_lon,avg_dep_delay,avg_taxi_out,data_split'
38 | ).split(',')
39 | 
40 | CSV_COLUMN_TYPES = [
41 |     1.0, -3.0, 5.0, 1037.493622678299, 'OTH', 'DEN', 21, 1.0, 'OO',
42 |     43.41694444, -124.24694444, 39.86166667, -104.67305556, -3.0, 5.0, 'TRAIN'
43 | ]
44 | 
45 | 
46 | def features_and_labels(features):
47 |     label = features.pop('ontime')  # this is what we will train for
48 |     return features, label
49 | 
50 | 
51 | def read_dataset(pattern, batch_size, mode=tf.estimator.ModeKeys.TRAIN, truncate=None):
52 |     dataset = tf.data.experimental.make_csv_dataset(
53 |         pattern, batch_size,
54 |         column_names=CSV_COLUMNS,
55 |         column_defaults=CSV_COLUMN_TYPES,
56 |         sloppy=True,
57 |         num_parallel_reads=2,
58 |         ignore_errors=True,
59 |         num_epochs=1)
60 |     dataset = dataset.map(features_and_labels)
61 |     if mode == tf.estimator.ModeKeys.TRAIN:
62 |         dataset = dataset.shuffle(batch_size * 10)
63 |         dataset = dataset.repeat()
64 |     dataset = dataset.prefetch(1)
65 |     if truncate is not None:
66 |         dataset = dataset.take(truncate)
67 |     return dataset
68 | 
69 | 
70 | def create_model():
71 |     real = {
72 |         colname: tf.feature_column.numeric_column(colname)
73 |         for colname in
74 |         (
75 |                 'dep_delay,taxi_out,distance,dep_hour,is_weekday,' +
76 |                 'dep_airport_lat,dep_airport_lon,' +
77 |                 'arr_airport_lat,arr_airport_lon,avg_dep_delay,avg_taxi_out'
78 |         ).split(',')
79 |     }
80 |     sparse = {
81 |         'carrier': tf.feature_column.categorical_column_with_vocabulary_list('carrier',
82 |                        vocabulary_list='AS,VX,F9,UA,US,WN,HA,EV,MQ,DL,OO,B6,NK,AA'.split(
83 |                            ',')),
84 |         'origin': tf.feature_column.categorical_column_with_hash_bucket('origin', hash_bucket_size=1000),
85 |         'dest': tf.feature_column.categorical_column_with_hash_bucket('dest', hash_bucket_size=1000),
86 |     }
87 | 
88 |     inputs = {
89 |         colname: tf.keras.layers.Input(name=colname, shape=(), dtype='float32')
90 |         for colname in real.keys()
91 |     }
92 |     inputs.update({
93 |         colname: tf.keras.layers.Input(name=colname, shape=(), dtype='string')
94 |         for colname in sparse.keys()
95 |     })
96 | 
97 |     latbuckets = np.linspace(20.0, 50.0, NUM_BUCKETS).tolist()  # USA
98 |     lonbuckets = np.linspace(-120.0, -70.0, NUM_BUCKETS).tolist()  # USA
99 |     disc = {}
100 |     disc.update({
101 |         'd_{}'.format(key): tf.feature_column.bucketized_column(real[key], latbuckets)
102 |         for key in ['dep_airport_lat', 'arr_airport_lat']
103 |     })
104 |     disc.update({
105 |         'd_{}'.format(key): tf.feature_column.bucketized_column(real[key], lonbuckets)
106 |         for key in ['dep_airport_lon', 'arr_airport_lon']
107 |     })
108 | 
109 |     # cross columns that make sense in combination
110 |     sparse['dep_loc'] = tf.feature_column.crossed_column(
111 |         [disc['d_dep_airport_lat'], disc['d_dep_airport_lon']], NUM_BUCKETS * NUM_BUCKETS)
112 |     sparse['arr_loc'] = tf.feature_column.crossed_column(
113 |         [disc['d_arr_airport_lat'], disc['d_arr_airport_lon']], NUM_BUCKETS * NUM_BUCKETS)
114 |     sparse['dep_arr'] = tf.feature_column.crossed_column([sparse['dep_loc'], sparse['arr_loc']], NUM_BUCKETS ** 4)
115 | 
116 |     # embed all the sparse columns
117 |     embed = {
118 |         'embed_{}'.format(colname): tf.feature_column.embedding_column(col, NUM_EMBEDS)
119 |         for colname, col in sparse.items()
120 |     }
121 |     real.update(embed)
122 | 
123 |     # one-hot encode the sparse columns
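# (indicator_column replaces each categorical/crossed column with its one-hot encoding
# so that it can be fed to the linear, "wide" half of the wide-and-deep model below)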
124 |     sparse = {
125 |         colname: tf.feature_column.indicator_column(col)
126 |         for colname, col in sparse.items()
127 |     }
128 | 
129 |     model = wide_and_deep_classifier(
130 |         inputs,
131 |         linear_feature_columns=sparse.values(),
132 |         dnn_feature_columns=real.values(),
133 |         dnn_hidden_units=DNN_HIDDEN_UNITS)
134 | 
135 |     return model
136 | 
137 | 
138 | def wide_and_deep_classifier(inputs, linear_feature_columns, dnn_feature_columns, dnn_hidden_units):
139 |     deep = tf.keras.layers.DenseFeatures(dnn_feature_columns, name='deep_inputs')(inputs)
140 |     layers = [int(x) for x in dnn_hidden_units.split(',')]
141 |     for layerno, numnodes in enumerate(layers):
142 |         deep = tf.keras.layers.Dense(numnodes, activation='relu', name='dnn_{}'.format(layerno + 1))(deep)
143 |     wide = tf.keras.layers.DenseFeatures(linear_feature_columns, name='wide_inputs')(inputs)
144 |     both = tf.keras.layers.concatenate([deep, wide], name='both')
145 |     output = tf.keras.layers.Dense(1, activation='sigmoid', name='pred')(both)
146 |     model = tf.keras.Model(inputs, output)
147 |     model.compile(optimizer='adam',
148 |                   loss='binary_crossentropy',
149 |                   metrics=['accuracy', rmse, tf.keras.metrics.AUC()])
150 |     return model
151 | 
152 | 
153 | def rmse(y_true, y_pred):
154 |     return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))
155 | 
156 | 
157 | def train_and_evaluate(train_data_pattern, eval_data_pattern, test_data_pattern, export_dir, output_dir):
158 |     train_batch_size = TRAIN_BATCH_SIZE
159 |     if DEVELOP_MODE:
160 |         eval_batch_size = 100
161 |         steps_per_epoch = 3
162 |         epochs = 2
163 |         num_eval_examples = eval_batch_size * 10
164 |     else:
165 |         eval_batch_size = 100
166 |         steps_per_epoch = NUM_EXAMPLES // train_batch_size
167 |         epochs = NUM_EPOCHS
168 |         num_eval_examples = eval_batch_size * 100
169 | 
170 |     train_dataset = read_dataset(train_data_pattern, train_batch_size)
171 |     eval_dataset = read_dataset(eval_data_pattern, eval_batch_size, tf.estimator.ModeKeys.EVAL, num_eval_examples)
172 | 
173 |     # checkpoint
174 |     checkpoint_path = '{}/checkpoints/flights.cpt'.format(output_dir)
175 |     logging.info("Checkpointing to {}".format(checkpoint_path))
176 |     cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
177 |                                                      save_weights_only=True,
178 |                                                      verbose=1)
179 | 
180 |     # callback to write out the hyperparameter tuning metric
181 |     METRIC = 'val_rmse'
182 |     hpt = hypertune.HyperTune()
183 | 
184 |     class HpCallback(tf.keras.callbacks.Callback):
185 |         def on_epoch_end(self, epoch, logs=None):
186 |             if logs and METRIC in logs:
187 |                 logging.info("Epoch {}: {} = {}".format(epoch, METRIC, logs[METRIC]))
188 |                 hpt.report_hyperparameter_tuning_metric(hyperparameter_metric_tag=METRIC,
189 |                                                         metric_value=logs[METRIC],
190 |                                                         global_step=epoch)
191 | 
192 |     # train the model
193 |     model = create_model()
194 |     logging.info(f"Training on {train_data_pattern}; eval on {eval_data_pattern}; {epochs} epochs; {steps_per_epoch} steps per epoch")
195 |     history = model.fit(train_dataset,
196 |                         validation_data=eval_dataset,
197 |                         epochs=epochs,
198 |                         steps_per_epoch=steps_per_epoch,
199 |                         callbacks=[cp_callback, HpCallback()])
200 | 
201 |     # export
202 |     logging.info('Exporting to {}'.format(export_dir))
203 |     tf.saved_model.save(model, export_dir)
204 | 
205 |     # write out final metric
206 |     final_rmse = history.history[METRIC][-1]
207 |     logging.info("Validation metric {} on {} samples = {}".format(METRIC, num_eval_examples, final_rmse))
208 | 
209 |     if (not DEVELOP_MODE) and (test_data_pattern is not None) and (not SKIP_FULL_EVAL):
210 |         logging.info("Evaluating over full test dataset")
211 |         test_dataset = read_dataset(test_data_pattern, eval_batch_size, tf.estimator.ModeKeys.EVAL, None)
212 |         final_metrics = model.evaluate(test_dataset)
213 |         logging.info("Final metrics on full test dataset = {}".format(final_metrics))
214 |     else:
215 |         logging.info("Skipping evaluation on full test dataset")
216 | 
217 | 
218 | if __name__ == '__main__':
219 |     logging.info("Tensorflow version " + tf.__version__)
220 |     parser = argparse.ArgumentParser()
221 | 
222 |     parser.add_argument(
223 |         '--bucket',
224 |         #help='Data will be read from gs://BUCKET/ch11/data and output will be in gs://BUCKET/ch11/trained_model',
225 |         help='Data will be read from gs://BUCKET/train/data and output will be in gs://BUCKET/train/trained_model',
226 |         required=True
227 |     )
228 | 
229 |     parser.add_argument(
230 |         '--num_examples',
231 |         help='Number of examples per epoch. Get order of magnitude correct.',
232 |         type=int,
233 |         default=5000000
234 |     )
235 | 
236 |     # for hyper-parameter tuning
237 |     parser.add_argument(
238 |         '--train_batch_size',
239 |         help='Number of examples to compute gradient on',
240 |         type=int,
241 |         default=256  # originally 64
242 |     )
243 |     parser.add_argument(
244 |         '--nbuckets',
245 |         help='Number of bins into which to discretize lats and lons',
246 |         type=int,
247 |         default=10  # originally 5
248 |     )
249 |     parser.add_argument(
250 |         '--nembeds',
251 |         help='Embedding dimension for categorical variables',
252 |         type=int,
253 |         default=3
254 |     )
255 |     parser.add_argument(
256 |         '--num_epochs',
257 |         help='Number of epochs (used only if --develop is not set)',
258 |         type=int,
259 |         default=10
260 |     )
261 |     parser.add_argument(
262 |         '--dnn_hidden_units',
263 |         help='Architecture of DNN part of wide-and-deep network',
264 |         default='64,64,64,8'  # originally '64,32'
265 |     )
266 |     parser.add_argument(
267 |         '--develop',
268 |         help='Train on a small subset in development',
269 |         dest='develop',
270 |         action='store_true')
271 |     parser.set_defaults(develop=False)
272 |     parser.add_argument(
273 |         '--skip_full_eval',
274 |         help='Just train. Do not evaluate on test dataset.',
275 |         dest='skip_full_eval',
276 |         action='store_true')
277 |     parser.set_defaults(skip_full_eval=False)
278 | 
279 |     # parse args
280 |     args = parser.parse_args().__dict__
281 |     logging.getLogger().setLevel(logging.INFO)
282 | 
283 |     # The Vertex AI contract. If not running in Vertex AI Training, these will be None
284 |     OUTPUT_MODEL_DIR = os.getenv("AIP_MODEL_DIR")  # or None
285 |     TRAIN_DATA_PATTERN = os.getenv("AIP_TRAINING_DATA_URI")
286 |     EVAL_DATA_PATTERN = os.getenv("AIP_VALIDATION_DATA_URI")
287 |     TEST_DATA_PATTERN = os.getenv("AIP_TEST_DATA_URI")
288 | 
289 |     # set top-level output directory for checkpoints, etc.
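# (When run locally this defaults to gs://BUCKET/train/train_output; when running on
# Vertex AI Training it is re-derived below from the AIP_MODEL_DIR that the service injects.)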
290 |     BUCKET = args['bucket']
291 |     #OUTPUT_DIR = 'gs://{}/ch11/train_output'.format(BUCKET)
292 |     OUTPUT_DIR = 'gs://{}/train/train_output'.format(BUCKET)
293 |     # During hyperparameter tuning, we need to make sure different trials don't clobber each other
294 |     # https://cloud.google.com/ai-platform/training/docs/distributed-training-details#tf-config-format
295 |     # This doesn't exist in Vertex AI
296 |     # OUTPUT_DIR = os.path.join(
297 |     #     OUTPUT_DIR,
298 |     #     json.loads(
299 |     #         os.environ.get('TF_CONFIG', '{}')
300 |     #     ).get('task', {}).get('trial', '')
301 |     # )
302 |     if OUTPUT_MODEL_DIR:
303 |         # convert gs://ai-analytics-solutions-dsongcp2/aiplatform-custom-job-2021-11-13-22:22:46.175/1/model/
304 |         # to gs://ai-analytics-solutions-dsongcp2/aiplatform-custom-job-2021-11-13-22:22:46.175/1
305 |         OUTPUT_DIR = os.path.join(
306 |             os.path.dirname(OUTPUT_MODEL_DIR if OUTPUT_MODEL_DIR[-1] != '/' else OUTPUT_MODEL_DIR[:-1]),
307 |             'train_output')
308 |     logging.info('Writing checkpoints and other outputs to {}'.format(OUTPUT_DIR))
309 | 
310 |     # Set default values for the contract variables in case we are not running in Vertex AI Training
311 |     if not OUTPUT_MODEL_DIR:
312 |         OUTPUT_MODEL_DIR = os.path.join(OUTPUT_DIR,
313 |                                         'export/flights_{}'.format(time.strftime("%Y%m%d-%H%M%S")))
314 |     if not TRAIN_DATA_PATTERN:
315 |         #TRAIN_DATA_PATTERN = 'gs://{}/ch11/data/train*'.format(BUCKET)
316 |         TRAIN_DATA_PATTERN = 'gs://{}/train/data/train*'.format(BUCKET)
317 |         CSV_COLUMNS.pop()  # the data_split column won't exist
318 |         CSV_COLUMN_TYPES.pop()  # the data_split column won't exist
319 |     if not EVAL_DATA_PATTERN:
320 |         #EVAL_DATA_PATTERN = 'gs://{}/ch11/data/eval*'.format(BUCKET)
321 |         EVAL_DATA_PATTERN = 'gs://{}/train/data/eval*'.format(BUCKET)
322 |     logging.info('Exporting trained model to {}'.format(OUTPUT_MODEL_DIR))
323 |     logging.info("Reading training data from {}".format(TRAIN_DATA_PATTERN))
324 |     logging.info('Writing trained model to {}'.format(OUTPUT_MODEL_DIR))
325 | 
326 |     # other global parameters
327 |     NUM_BUCKETS = args['nbuckets']
328 |     NUM_EMBEDS = args['nembeds']
329 |     NUM_EXAMPLES = args['num_examples']
330 |     NUM_EPOCHS = args['num_epochs']
331 |     TRAIN_BATCH_SIZE = args['train_batch_size']
332 |     DNN_HIDDEN_UNITS = args['dnn_hidden_units']
333 |     DEVELOP_MODE = args['develop']
334 |     SKIP_FULL_EVAL = args['skip_full_eval']
335 | 
336 |     # run
337 |     train_and_evaluate(TRAIN_DATA_PATTERN, EVAL_DATA_PATTERN, TEST_DATA_PATTERN, OUTPUT_MODEL_DIR, OUTPUT_DIR)
338 | 
339 |     logging.info("Done")
340 | 
--------------------------------------------------------------------------------
/RealTimePrediction/realtime-intelligence-main/realtime/setup.py:
--------------------------------------------------------------------------------
1 | #
2 | # Licensed to the Apache Software Foundation (ASF) under one or more
3 | # contributor license agreements. See the NOTICE file distributed with
4 | # this work for additional information regarding copyright ownership.
5 | # The ASF licenses this file to You under the Apache License, Version 2.0
6 | # (the "License"); you may not use this file except in compliance with
7 | # the License. You may obtain a copy of the License at
8 | #
9 | #    http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | #
17 | 
18 | """Setup.py module for the workflow's worker utilities.
19 | 
20 | All the workflow related code is gathered in a package that will be built as a
21 | source distribution, staged in the staging area for the workflow being run and
22 | then installed in the workers when they start running.
23 | 
24 | This behavior is triggered by specifying the --setup_file command line option
25 | when running the workflow for remote execution.
26 | """
27 | 
28 | from distutils.command.build import build as _build
29 | import subprocess
30 | 
31 | import setuptools
32 | 
33 | 
34 | # This class handles the pip install mechanism.
35 | class build(_build):  # pylint: disable=invalid-name
36 |     """A build command class that will be invoked during package install.
37 | 
38 |     The package built using the current setup.py will be staged and later
39 |     installed in the worker using `pip install package'. This class will be
40 |     instantiated during install for this specific scenario and will trigger
41 |     running the custom commands specified.
42 |     """
43 |     sub_commands = _build.sub_commands + [('CustomCommands', None)]
44 | 
45 | 
46 | # Some custom command to run during setup. The command is not essential for this
47 | # workflow. It is used here as an example. Each command will spawn a child
48 | # process. Typically, these commands will include steps to install non-Python
49 | # packages. For instance, to install a C++-based library libjpeg62 the following
50 | # two commands will have to be added:
51 | #
52 | #     ['apt-get', 'update'],
53 | #     ['apt-get', '--assume-yes', 'install', 'libjpeg62'],
54 | #
55 | # First, note that there is no need to use the sudo command because the setup
56 | # script runs with appropriate access.
57 | # Second, if apt-get tool is used then the first command needs to be 'apt-get
58 | # update' so the tool refreshes itself and initializes links to download
59 | # repositories. Without this initial step the other apt-get install commands
60 | # will fail with package not found errors. Note also --assume-yes option which
61 | # shortcuts the interactive confirmation.
62 | #
63 | # The output of custom commands (including failures) will be logged in the
64 | # worker-startup log.
65 | CUSTOM_COMMANDS = [
66 | ]
67 | 
68 | 
69 | class CustomCommands(setuptools.Command):
70 |     """A setuptools Command class able to run arbitrary commands."""
71 | 
72 |     def initialize_options(self):
73 |         pass
74 | 
75 |     def finalize_options(self):
76 |         pass
77 | 
78 |     def RunCustomCommand(self, command_list):
79 |         print('Running command: %s' % command_list)
80 |         p = subprocess.Popen(
81 |             command_list,
82 |             stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
83 |         # Can use communicate(input='y\n'.encode()) if the command run requires
84 |         # some confirmation.
85 |         stdout_data, _ = p.communicate()
86 |         print('Command output: %s' % stdout_data)
87 |         if p.returncode != 0:
88 |             raise RuntimeError(
89 |                 'Command %s failed: exit code: %s' % (command_list, p.returncode))
90 | 
91 |     def run(self):
92 |         for command in CUSTOM_COMMANDS:
93 |             self.RunCustomCommand(command)
94 | 
95 | 
96 | # Configure the required packages and scripts to install.
97 | # Note that the Python Dataflow containers come with numpy already installed
98 | # so this dependency will not trigger anything to be installed unless a version
99 | # restriction is specified.
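# NOTE: these packages are installed on each Dataflow worker, but only when the pipeline
# is launched with --setup_file=/path/to/setup.py (see the module docstring above).
# The exact dill pin below is assumed to match what the Apache Beam SDK release in use
# requires; Beam's pickler is sensitive to the dill version.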
100 | REQUIRED_PACKAGES = [ 101 | 'pyfarmhash', 102 | 'google-cloud-aiplatform', 103 | 'cloudml-hypertune', 104 | 'dill==0.3.1.1' 105 | ] 106 | 107 | 108 | setuptools.setup( 109 | name='flightsdf', 110 | version='0.0.1', 111 | description='Data Science on GCP flights training and prediction pipelines', 112 | install_requires=REQUIRED_PACKAGES, 113 | packages=setuptools.find_packages(), 114 | cmdclass={ 115 | # Command class instantiated and run during pip install scenarios. 116 | 'build': build, 117 | 'CustomCommands': CustomCommands, 118 | } 119 | ) 120 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/simevents_sample.json: -------------------------------------------------------------------------------- 1 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T12:00:00", "DEP_TIME": "2015-03-10T11:56:00", "DEP_DELAY": -4.0, "TAXI_OUT": 21.0, "WHEELS_OFF": "2015-03-10T12:17:00", "CRS_ARR_TIME": "2015-03-10T16:10:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:17:00"} 2 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1467903", "ORIGIN": "SAN", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T13:30:00", "DEP_TIME": "2015-03-10T13:26:00", "DEP_DELAY": -4.0, "TAXI_OUT": 34.0, "WHEELS_OFF": "2015-03-10T14:00:00", "CRS_ARR_TIME": "2015-03-10T14:57:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 32.73361111, "DEP_AIRPORT_LON": -117.18972222, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T14:00:00"} 3 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1483103", "ORIGIN": "SJC", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T13:20:00", "DEP_TIME": "2015-03-10T13:15:00", "DEP_DELAY": -5.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T13:31:00", "CRS_ARR_TIME": "2015-03-10T15:13:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.36277778, "DEP_AIRPORT_LON": -121.92916667, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:31:00"} 4 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1410002", "ORIGIN": "PHL", "DEST_AIRPORT_SEQ_ID": "1530402", "DEST": "TPA", "CRS_DEP_TIME": "2015-03-10T09:50:00", "DEP_TIME": "2015-03-10T09:48:00", "DEP_DELAY": -2.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T10:03:00", "CRS_ARR_TIME": "2015-03-10T12:35:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.87222222, "DEP_AIRPORT_LON": -75.24083333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 27.97555556, "ARR_AIRPORT_LON": -82.53333333, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:03:00"} 5 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1323202", "ORIGIN": "MDW", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": 
"2015-03-10T12:10:00", "DEP_TIME": "2015-03-10T12:08:00", "DEP_DELAY": -2.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T12:18:00", "CRS_ARR_TIME": "2015-03-10T16:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.78583333, "DEP_AIRPORT_LON": -87.7525, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:18:00"} 6 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1323202", "ORIGIN": "MDW", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T13:35:00", "DEP_TIME": "2015-03-10T13:40:00", "DEP_DELAY": 5.0, "TAXI_OUT": 6.0, "WHEELS_OFF": "2015-03-10T13:46:00", "CRS_ARR_TIME": "2015-03-10T17:30:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.78583333, "DEP_AIRPORT_LON": -87.7525, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:46:00"} 7 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1348702", "ORIGIN": "MSP", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:56:00", "DEP_DELAY": -4.0, "TAXI_OUT": 24.0, "WHEELS_OFF": "2015-03-10T13:20:00", "CRS_ARR_TIME": "2015-03-10T16:30:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 44.88194444, "DEP_AIRPORT_LON": -93.22166667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:20:00"} 8 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1405702", "ORIGIN": "PDX", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T13:45:00", "DEP_TIME": "2015-03-10T13:44:00", "DEP_DELAY": -1.0, "TAXI_OUT": 13.0, "WHEELS_OFF": "2015-03-10T13:57:00", "CRS_ARR_TIME": "2015-03-10T16:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 45.58861111, "DEP_AIRPORT_LON": -122.59694444, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:57:00"} 9 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1468303", "ORIGIN": "SAT", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T12:20:00", "DEP_TIME": "2015-03-10T12:20:00", "DEP_DELAY": 0.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T12:30:00", "CRS_ARR_TIME": "2015-03-10T14:55:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 29.53388889, "DEP_AIRPORT_LON": -98.46916667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:30:00"} 10 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1486903", "ORIGIN": "SLC", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T12:00:00", "DEP_TIME": "2015-03-10T11:55:00", "DEP_DELAY": -5.0, "TAXI_OUT": 5.0, "WHEELS_OFF": "2015-03-10T12:00:00", "CRS_ARR_TIME": "2015-03-10T13:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.78833333, "DEP_AIRPORT_LON": -111.97777778, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 
33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:00:00"} 11 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1014103", "ORIGIN": "ABR", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:15:00", "DEP_TIME": "2015-03-10T10:07:00", "DEP_DELAY": -8.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T10:23:00", "CRS_ARR_TIME": "2015-03-10T11:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 45.44833333, "DEP_AIRPORT_LON": -98.4225, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:23:00"} 12 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:25:00", "DEP_TIME": "2015-03-10T12:24:00", "DEP_DELAY": -1.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T12:40:00", "CRS_ARR_TIME": "2015-03-10T15:15:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:40:00"} 13 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:35:00", "DEP_TIME": "2015-03-10T11:35:00", "DEP_DELAY": 0.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T11:50:00", "CRS_ARR_TIME": "2015-03-10T14:11:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:50:00"} 14 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1052904", "ORIGIN": "BDL", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:20:00", "DEP_TIME": "2015-03-10T10:14:00", "DEP_DELAY": -6.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T10:30:00", "CRS_ARR_TIME": "2015-03-10T13:18:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.93916667, "DEP_AIRPORT_LON": -72.68333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:30:00"} 15 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1062002", "ORIGIN": "BIL", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:25:00", "DEP_TIME": "2015-03-10T12:18:00", "DEP_DELAY": -7.0, "TAXI_OUT": 11.0, "WHEELS_OFF": "2015-03-10T12:29:00", "CRS_ARR_TIME": "2015-03-10T14:15:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 45.80777778, "DEP_AIRPORT_LON": -108.54277778, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:29:00"} 16 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "EV", "ORIGIN_AIRPORT_SEQ_ID": "1062702", "ORIGIN": "BIS", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": 
"2015-03-10T12:45:00", "DEP_TIME": "2015-03-10T12:40:00", "DEP_DELAY": -5.0, "TAXI_OUT": 29.0, "WHEELS_OFF": "2015-03-10T13:09:00", "CRS_ARR_TIME": "2015-03-10T14:14:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 46.77277778, "DEP_AIRPORT_LON": -100.74583333, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:09:00"} 17 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1062702", "ORIGIN": "BIS", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:00:00", "DEP_TIME": "2015-03-10T09:53:00", "DEP_DELAY": -7.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T10:02:00", "CRS_ARR_TIME": "2015-03-10T11:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 46.77277778, "DEP_AIRPORT_LON": -100.74583333, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:02:00"} 18 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1063104", "ORIGIN": "BJI", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:05:00", "DEP_TIME": "2015-03-10T09:59:00", "DEP_DELAY": -6.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T10:14:00", "CRS_ARR_TIME": "2015-03-10T11:05:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 47.51083333, "DEP_AIRPORT_LON": -94.93472222, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:14:00"} 19 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1069302", "ORIGIN": "BNA", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T13:15:00", "DEP_TIME": "2015-03-10T12:47:00", "DEP_DELAY": -28.0, "TAXI_OUT": 56.0, "WHEELS_OFF": "2015-03-10T13:43:00", "CRS_ARR_TIME": "2015-03-10T15:32:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 36.12444444, "DEP_AIRPORT_LON": -86.67805556, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:43:00"} 20 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1072102", "ORIGIN": "BOS", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:10:00", "DEP_TIME": "2015-03-10T12:08:00", "DEP_DELAY": -2.0, "TAXI_OUT": 20.0, "WHEELS_OFF": "2015-03-10T12:28:00", "CRS_ARR_TIME": "2015-03-10T15:27:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.36305556, "DEP_AIRPORT_LON": -71.00638889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:28:00"} 21 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1072102", "ORIGIN": "BOS", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:00:00", "DEP_TIME": "2015-03-10T09:55:00", "DEP_DELAY": -5.0, "TAXI_OUT": 19.0, "WHEELS_OFF": "2015-03-10T10:14:00", "CRS_ARR_TIME": "2015-03-10T13:21:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.36305556, "DEP_AIRPORT_LON": -71.00638889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 
44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:14:00"} 22 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "EV", "ORIGIN_AIRPORT_SEQ_ID": "1104203", "ORIGIN": "CLE", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T10:31:00", "DEP_TIME": "2015-03-10T10:19:00", "DEP_DELAY": -12.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T10:31:00", "CRS_ARR_TIME": "2015-03-10T12:02:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.40944444, "DEP_AIRPORT_LON": -81.85472222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:31:00"} 23 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "EV", "ORIGIN_AIRPORT_SEQ_ID": "1104203", "ORIGIN": "CLE", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:10:00", "DEP_TIME": "2015-03-10T11:07:00", "DEP_DELAY": -3.0, "TAXI_OUT": 7.0, "WHEELS_OFF": "2015-03-10T11:14:00", "CRS_ARR_TIME": "2015-03-10T13:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.40944444, "DEP_AIRPORT_LON": -81.85472222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:14:00"} 24 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1105703", "ORIGIN": "CLT", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:20:00", "DEP_TIME": "2015-03-10T10:13:00", "DEP_DELAY": -7.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T10:28:00", "CRS_ARR_TIME": "2015-03-10T13:08:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 35.21361111, "DEP_AIRPORT_LON": -80.94916667, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:28:00"} 25 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1105703", "ORIGIN": "CLT", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T13:35:00", "DEP_TIME": "2015-03-10T13:35:00", "DEP_DELAY": 0.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T13:51:00", "CRS_ARR_TIME": "2015-03-10T16:23:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 35.21361111, "DEP_AIRPORT_LON": -80.94916667, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:51:00"} 26 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1106603", "ORIGIN": "CMH", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T13:05:00", "DEP_TIME": "2015-03-10T13:01:00", "DEP_DELAY": -4.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T13:10:00", "CRS_ARR_TIME": "2015-03-10T15:15:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.99694444, "DEP_AIRPORT_LON": -82.89222222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:10:00"} 27 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1127802", "ORIGIN": "DCA", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": 
"2015-03-10T10:55:00", "DEP_TIME": "2015-03-10T10:49:00", "DEP_DELAY": -6.0, "TAXI_OUT": 7.0, "WHEELS_OFF": "2015-03-10T10:56:00", "CRS_ARR_TIME": "2015-03-10T13:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 38.85194444, "DEP_AIRPORT_LON": -77.03777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:56:00"} 28 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1127802", "ORIGIN": "DCA", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:35:00", "DEP_TIME": "2015-03-10T12:32:00", "DEP_DELAY": -3.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T12:41:00", "CRS_ARR_TIME": "2015-03-10T15:23:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 38.85194444, "DEP_AIRPORT_LON": -77.03777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:41:00"} 29 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1129202", "ORIGIN": "DEN", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:10:00", "DEP_TIME": "2015-03-10T12:07:00", "DEP_DELAY": -3.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T12:21:00", "CRS_ARR_TIME": "2015-03-10T14:10:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.86166667, "DEP_AIRPORT_LON": -104.67305556, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:21:00"} 30 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "NK", "ORIGIN_AIRPORT_SEQ_ID": "1129803", "ORIGIN": "DFW", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:20:00", "DEP_TIME": "2015-03-10T11:17:00", "DEP_DELAY": -3.0, "TAXI_OUT": 19.0, "WHEELS_OFF": "2015-03-10T11:36:00", "CRS_ARR_TIME": "2015-03-10T13:39:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 32.89694444, "DEP_AIRPORT_LON": -97.03805556, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:36:00"} 31 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1143302", "ORIGIN": "DTW", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T12:20:00", "DEP_TIME": "2015-03-10T12:19:00", "DEP_DELAY": -1.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T12:33:00", "CRS_ARR_TIME": "2015-03-10T13:29:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.2125, "DEP_AIRPORT_LON": -83.35333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:33:00"} 32 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1143302", "ORIGIN": "DTW", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:25:00", "DEP_TIME": "2015-03-10T11:22:00", "DEP_DELAY": -3.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T11:37:00", "CRS_ARR_TIME": "2015-03-10T13:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.2125, "DEP_AIRPORT_LON": -83.35333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 
44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:37:00"} 33 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1143302", "ORIGIN": "DTW", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:35:00", "DEP_TIME": "2015-03-10T12:32:00", "DEP_DELAY": -3.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T12:50:00", "CRS_ARR_TIME": "2015-03-10T14:28:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.2125, "DEP_AIRPORT_LON": -83.35333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:50:00"} 34 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1163703", "ORIGIN": "FAR", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T10:00:00", "DEP_TIME": "2015-03-10T10:20:00", "DEP_DELAY": 20.0, "TAXI_OUT": 8.0, "WHEELS_OFF": "2015-03-10T10:28:00", "CRS_ARR_TIME": "2015-03-10T11:12:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 46.92055556, "DEP_AIRPORT_LON": -96.81583333, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:28:00"} 35 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1169703", "ORIGIN": "FLL", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:25:00", "DEP_TIME": "2015-03-10T11:20:00", "DEP_DELAY": -5.0, "TAXI_OUT": 13.0, "WHEELS_OFF": "2015-03-10T11:33:00", "CRS_ARR_TIME": "2015-03-10T15:18:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 26.0725, "DEP_AIRPORT_LON": -80.15277778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:33:00"} 36 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1182304", "ORIGIN": "FWA", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:34:00", "DEP_TIME": "2015-03-10T11:31:00", "DEP_DELAY": -3.0, "TAXI_OUT": 17.0, "WHEELS_OFF": "2015-03-10T11:48:00", "CRS_ARR_TIME": "2015-03-10T13:30:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.97833333, "DEP_AIRPORT_LON": -85.19527778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:48:00"} 37 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1198603", "ORIGIN": "GRR", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:25:00", "DEP_TIME": "2015-03-10T11:18:00", "DEP_DELAY": -7.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T11:34:00", "CRS_ARR_TIME": "2015-03-10T12:59:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.88083333, "DEP_AIRPORT_LON": -85.52277778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:34:00"} 38 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1212903", "ORIGIN": "HIB", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": 
"2015-03-10T12:14:00", "DEP_TIME": "2015-03-10T12:01:00", "DEP_DELAY": -13.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T12:10:00", "CRS_ARR_TIME": "2015-03-10T13:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 47.38666667, "DEP_AIRPORT_LON": -92.83888889, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:10:00"} 39 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1233904", "ORIGIN": "IND", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:12:00", "DEP_TIME": "2015-03-10T11:04:00", "DEP_DELAY": -8.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T11:18:00", "CRS_ARR_TIME": "2015-03-10T13:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.71722222, "DEP_AIRPORT_LON": -86.29472222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:18:00"} 40 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1238902", "ORIGIN": "ISN", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:15:00", "DEP_TIME": "2015-03-10T11:08:00", "DEP_DELAY": -7.0, "TAXI_OUT": 41.0, "WHEELS_OFF": "2015-03-10T11:49:00", "CRS_ARR_TIME": "2015-03-10T13:11:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 48.17805556, "DEP_AIRPORT_LON": -103.64222222, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:49:00"} 41 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1295302", "ORIGIN": "LGA", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T12:30:00", "DEP_TIME": "2015-03-10T12:38:00", "DEP_DELAY": 8.0, "TAXI_OUT": 17.0, "WHEELS_OFF": "2015-03-10T12:55:00", "CRS_ARR_TIME": "2015-03-10T14:55:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.77722222, "DEP_AIRPORT_LON": -73.8725, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:55:00"} 42 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1307602", "ORIGIN": "LSE", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:20:00", "DEP_TIME": "2015-03-10T11:20:00", "DEP_DELAY": 0.0, "TAXI_OUT": 7.0, "WHEELS_OFF": "2015-03-10T11:27:00", "CRS_ARR_TIME": "2015-03-10T12:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 43.87916667, "DEP_AIRPORT_LON": -91.25666667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:27:00"} 43 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1320402", "ORIGIN": "MCO", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T12:45:00", "DEP_TIME": "2015-03-10T12:44:00", "DEP_DELAY": -1.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T12:53:00", "CRS_ARR_TIME": "2015-03-10T15:45:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 28.42944444, "DEP_AIRPORT_LON": -81.30888889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 
42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:53:00"} 44 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1348602", "ORIGIN": "MSO", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:45:00", "DEP_TIME": "2015-03-10T11:38:00", "DEP_DELAY": -7.0, "TAXI_OUT": 21.0, "WHEELS_OFF": "2015-03-10T11:59:00", "CRS_ARR_TIME": "2015-03-10T14:23:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 46.91638889, "DEP_AIRPORT_LON": -114.09055556, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:59:00"} 45 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AA", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1330303", "DEST": "MIA", "CRS_DEP_TIME": "2015-03-10T11:00:00", "DEP_TIME": "2015-03-10T10:54:00", "DEP_DELAY": -6.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T11:06:00", "CRS_ARR_TIME": "2015-03-10T14:03:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 25.79527778, "ARR_AIRPORT_LON": -80.29, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:06:00"} 46 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AA", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1330303", "DEST": "MIA", "CRS_DEP_TIME": "2015-03-10T12:30:00", "DEP_TIME": "2015-03-10T12:21:00", "DEP_DELAY": -9.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T12:37:00", "CRS_ARR_TIME": "2015-03-10T15:38:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 25.79527778, "ARR_AIRPORT_LON": -80.29, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:37:00"} 47 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AA", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1330303", "DEST": "MIA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:55:00", "DEP_DELAY": -5.0, "TAXI_OUT": 13.0, "WHEELS_OFF": "2015-03-10T13:08:00", "CRS_ARR_TIME": "2015-03-10T16:06:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 25.79527778, "ARR_AIRPORT_LON": -80.29, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:08:00"} 48 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "F9", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1330303", "DEST": "MIA", "CRS_DEP_TIME": "2015-03-10T11:00:00", "DEP_TIME": "2015-03-10T10:56:00", "DEP_DELAY": -4.0, "TAXI_OUT": 21.0, "WHEELS_OFF": "2015-03-10T11:17:00", "CRS_ARR_TIME": "2015-03-10T14:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 25.79527778, "ARR_AIRPORT_LON": -80.29, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:17:00"} 49 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1410002", "ORIGIN": "PHL", "DEST_AIRPORT_SEQ_ID": "1330303", "DEST": "MIA", "CRS_DEP_TIME": "2015-03-10T11:35:00", "DEP_TIME": 
"2015-03-10T11:31:00", "DEP_DELAY": -4.0, "TAXI_OUT": 26.0, "WHEELS_OFF": "2015-03-10T11:57:00", "CRS_ARR_TIME": "2015-03-10T14:37:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.87222222, "DEP_AIRPORT_LON": -75.24083333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 25.79527778, "ARR_AIRPORT_LON": -80.29, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:57:00"} 50 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1410702", "ORIGIN": "PHX", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T12:50:00", "DEP_TIME": "2015-03-10T12:47:00", "DEP_DELAY": -3.0, "TAXI_OUT": 8.0, "WHEELS_OFF": "2015-03-10T12:55:00", "CRS_ARR_TIME": "2015-03-10T16:10:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.43416667, "DEP_AIRPORT_LON": -112.01166667, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:55:00"} 51 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1445702", "ORIGIN": "RAP", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T12:35:00", "DEP_TIME": "2015-03-10T12:20:00", "DEP_DELAY": -15.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T12:32:00", "CRS_ARR_TIME": "2015-03-10T14:22:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 44.04527778, "DEP_AIRPORT_LON": -103.05722222, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:32:00"} 52 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "EV", "ORIGIN_AIRPORT_SEQ_ID": "1452401", "ORIGIN": "RIC", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:15:00", "DEP_TIME": "2015-03-10T11:56:00", "DEP_DELAY": 41.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T12:05:00", "CRS_ARR_TIME": "2015-03-10T14:13:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.50527778, "DEP_AIRPORT_LON": -77.31972222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:05:00"} 53 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1469606", "ORIGIN": "SBN", "DEST_AIRPORT_SEQ_ID": "1348702", "DEST": "MSP", "CRS_DEP_TIME": "2015-03-10T11:30:00", "DEP_TIME": "2015-03-10T11:19:00", "DEP_DELAY": -11.0, "TAXI_OUT": 44.0, "WHEELS_OFF": "2015-03-10T12:03:00", "CRS_ARR_TIME": "2015-03-10T13:16:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.70833333, "DEP_AIRPORT_LON": -86.31722222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 44.88194444, "ARR_AIRPORT_LON": -93.22166667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:03:00"} 54 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1501603", "ORIGIN": "STL", "DEST_AIRPORT_SEQ_ID": "1334205", "DEST": "MKE", "CRS_DEP_TIME": "2015-03-10T13:40:00", "DEP_TIME": "2015-03-10T13:38:00", "DEP_DELAY": -2.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T13:53:00", "CRS_ARR_TIME": "2015-03-10T14:50:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 38.74861111, "DEP_AIRPORT_LON": -90.37, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 42.94694444, "ARR_AIRPORT_LON": -87.89694444, 
"ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:53:00"} 55 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1348702", "ORIGIN": "MSP", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T12:05:00", "DEP_TIME": "2015-03-10T12:00:00", "DEP_DELAY": -5.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T12:10:00", "CRS_ARR_TIME": "2015-03-10T15:28:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 44.88194444, "DEP_AIRPORT_LON": -93.22166667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:10:00"} 56 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "UA", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1530402", "DEST": "TPA", "CRS_DEP_TIME": "2015-03-10T11:00:00", "DEP_TIME": "2015-03-10T10:53:00", "DEP_DELAY": -7.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T11:11:00", "CRS_ARR_TIME": "2015-03-10T13:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 27.97555556, "ARR_AIRPORT_LON": -82.53333333, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:11:00"} 57 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1348702", "ORIGIN": "MSP", "DEST_AIRPORT_SEQ_ID": "1410702", "DEST": "PHX", "CRS_DEP_TIME": "2015-03-10T12:30:00", "DEP_TIME": "2015-03-10T12:21:00", "DEP_DELAY": -9.0, "TAXI_OUT": 26.0, "WHEELS_OFF": "2015-03-10T12:47:00", "CRS_ARR_TIME": "2015-03-10T16:16:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 44.88194444, "DEP_AIRPORT_LON": -93.22166667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 33.43416667, "ARR_AIRPORT_LON": -112.01166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:47:00"} 58 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1154003", "ORIGIN": "ELP", "DEST_AIRPORT_SEQ_ID": "1468303", "DEST": "SAT", "CRS_DEP_TIME": "2015-03-10T12:15:00", "DEP_TIME": "2015-03-10T12:12:00", "DEP_DELAY": -3.0, "TAXI_OUT": 6.0, "WHEELS_OFF": "2015-03-10T12:18:00", "CRS_ARR_TIME": "2015-03-10T13:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 31.80722222, "DEP_AIRPORT_LON": -106.37638889, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 29.53388889, "ARR_AIRPORT_LON": -98.46916667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:18:00"} 59 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "UA", "ORIGIN_AIRPORT_SEQ_ID": "1226402", "ORIGIN": "IAD", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T12:21:00", "DEP_TIME": "2015-03-10T12:16:00", "DEP_DELAY": -5.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T12:25:00", "CRS_ARR_TIME": "2015-03-10T18:12:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 38.9475, "DEP_AIRPORT_LON": -77.46, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:25:00"} 60 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "B6", "ORIGIN_AIRPORT_SEQ_ID": "1247802", "ORIGIN": "JFK", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T13:23:00", "DEP_TIME": "2015-03-10T13:15:00", "DEP_DELAY": 
-8.0, "TAXI_OUT": 23.0, "WHEELS_OFF": "2015-03-10T13:38:00", "CRS_ARR_TIME": "2015-03-10T19:39:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.63972222, "DEP_AIRPORT_LON": -73.77888889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:38:00"} 61 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1082103", "ORIGIN": "BWI", "DEST_AIRPORT_SEQ_ID": "1468303", "DEST": "SAT", "CRS_DEP_TIME": "2015-03-10T13:30:00", "DEP_TIME": "2015-03-10T13:36:00", "DEP_DELAY": 6.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T13:46:00", "CRS_ARR_TIME": "2015-03-10T17:30:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.17527778, "DEP_AIRPORT_LON": -76.66833333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 29.53388889, "ARR_AIRPORT_LON": -98.46916667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:46:00"} 62 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1323202", "ORIGIN": "MDW", "DEST_AIRPORT_SEQ_ID": "1468303", "DEST": "SAT", "CRS_DEP_TIME": "2015-03-10T13:40:00", "DEP_TIME": "2015-03-10T13:39:00", "DEP_DELAY": -1.0, "TAXI_OUT": 11.0, "WHEELS_OFF": "2015-03-10T13:50:00", "CRS_ARR_TIME": "2015-03-10T16:35:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.78583333, "DEP_AIRPORT_LON": -87.7525, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 29.53388889, "ARR_AIRPORT_LON": -98.46916667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:50:00"} 63 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1405702", "ORIGIN": "PDX", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T13:45:00", "DEP_TIME": "2015-03-10T13:37:00", "DEP_DELAY": -8.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T13:55:00", "CRS_ARR_TIME": "2015-03-10T16:10:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 45.58861111, "DEP_AIRPORT_LON": -122.59694444, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:55:00"} 64 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1474703", "ORIGIN": "SEA", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T13:35:00", "DEP_TIME": "2015-03-10T13:49:00", "DEP_DELAY": 14.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T13:59:00", "CRS_ARR_TIME": "2015-03-10T16:17:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 47.45, "DEP_AIRPORT_LON": -122.31166667, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:59:00"} 65 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1477101", "ORIGIN": "SFO", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T13:05:00", "DEP_TIME": "2015-03-10T13:00:00", "DEP_DELAY": -5.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T13:12:00", "CRS_ARR_TIME": "2015-03-10T14:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.61888889, "DEP_AIRPORT_LON": -122.375, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, 
"EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:12:00"} 66 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1483103", "ORIGIN": "SJC", "DEST_AIRPORT_SEQ_ID": "1467903", "DEST": "SAN", "CRS_DEP_TIME": "2015-03-10T13:30:00", "DEP_TIME": "2015-03-10T13:27:00", "DEP_DELAY": -3.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:41:00", "CRS_ARR_TIME": "2015-03-10T14:50:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.36277778, "DEP_AIRPORT_LON": -121.92916667, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 32.73361111, "ARR_AIRPORT_LON": -117.18972222, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:41:00"} 67 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1288903", "ORIGIN": "LAS", "DEST_AIRPORT_SEQ_ID": "1468303", "DEST": "SAT", "CRS_DEP_TIME": "2015-03-10T12:55:00", "DEP_TIME": "2015-03-10T13:01:00", "DEP_DELAY": 6.0, "TAXI_OUT": 24.0, "WHEELS_OFF": "2015-03-10T13:25:00", "CRS_ARR_TIME": "2015-03-10T15:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 36.08, "DEP_AIRPORT_LON": -115.15222222, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 29.53388889, "ARR_AIRPORT_LON": -98.46916667, "ARR_AIRPORT_TZOFFSET": -18000.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:25:00"} 68 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1066602", "ORIGIN": "BLI", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:40:00", "DEP_TIME": "2015-03-10T13:37:00", "DEP_DELAY": -3.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T13:52:00", "CRS_ARR_TIME": "2015-03-10T14:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 48.79277778, "DEP_AIRPORT_LON": -122.5375, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:52:00"} 69 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1348702", "ORIGIN": "MSP", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T12:00:00", "DEP_TIME": "2015-03-10T11:55:00", "DEP_DELAY": -5.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T12:10:00", "CRS_ARR_TIME": "2015-03-10T15:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 44.88194444, "DEP_AIRPORT_LON": -93.22166667, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:10:00"} 70 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AA", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:25:00", "DEP_TIME": "2015-03-10T13:36:00", "DEP_DELAY": 11.0, "TAXI_OUT": 17.0, "WHEELS_OFF": "2015-03-10T13:53:00", "CRS_ARR_TIME": "2015-03-10T18:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:53:00"} 71 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T11:00:00", "DEP_TIME": "2015-03-10T10:50:00", "DEP_DELAY": -10.0, "TAXI_OUT": 27.0, "WHEELS_OFF": 
"2015-03-10T11:17:00", "CRS_ARR_TIME": "2015-03-10T15:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:17:00"} 72 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1393003", "ORIGIN": "ORD", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:50:00", "DEP_DELAY": -10.0, "TAXI_OUT": 47.0, "WHEELS_OFF": "2015-03-10T13:37:00", "CRS_ARR_TIME": "2015-03-10T17:30:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 41.97944444, "DEP_AIRPORT_LON": -87.9075, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:37:00"} 73 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1379603", "ORIGIN": "OAK", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:10:00", "DEP_TIME": "2015-03-10T12:57:00", "DEP_DELAY": -13.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:11:00", "CRS_ARR_TIME": "2015-03-10T15:08:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.72277778, "DEP_AIRPORT_LON": -122.22138889, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:11:00"} 74 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "OO", "ORIGIN_AIRPORT_SEQ_ID": "1163805", "ORIGIN": "FAT", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:50:00", "DEP_DELAY": -10.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T13:02:00", "CRS_ARR_TIME": "2015-03-10T15:07:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 36.77666667, "DEP_AIRPORT_LON": -119.71888889, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:02:00"} 75 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1389101", "ORIGIN": "ONT", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:50:00", "DEP_DELAY": -10.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:04:00", "CRS_ARR_TIME": "2015-03-10T15:38:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 34.05611111, "DEP_AIRPORT_LON": -117.60111111, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:04:00"} 76 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1289203", "ORIGIN": "LAX", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:53:00", "DEP_DELAY": -7.0, "TAXI_OUT": 21.0, "WHEELS_OFF": "2015-03-10T13:14:00", "CRS_ARR_TIME": "2015-03-10T15:46:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.9425, "DEP_AIRPORT_LON": -118.40805556, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:14:00"} 
77 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1410702", "ORIGIN": "PHX", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:56:00", "DEP_DELAY": -4.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T13:05:00", "CRS_ARR_TIME": "2015-03-10T15:58:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.43416667, "DEP_AIRPORT_LON": -112.01166667, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:05:00"} 78 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "UA", "ORIGIN_AIRPORT_SEQ_ID": "1477101", "ORIGIN": "SFO", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:17:00", "DEP_TIME": "2015-03-10T13:12:00", "DEP_DELAY": -5.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:26:00", "CRS_ARR_TIME": "2015-03-10T15:22:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 37.61888889, "DEP_AIRPORT_LON": -122.375, "DEP_AIRPORT_TZOFFSET": -25200.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:26:00"} 79 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T12:15:00", "DEP_TIME": "2015-03-10T12:21:00", "DEP_DELAY": 6.0, "TAXI_OUT": 17.0, "WHEELS_OFF": "2015-03-10T12:38:00", "CRS_ARR_TIME": "2015-03-10T17:45:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:38:00"} 80 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "B6", "ORIGIN_AIRPORT_SEQ_ID": "1072102", "ORIGIN": "BOS", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T12:55:00", "DEP_TIME": "2015-03-10T12:48:00", "DEP_DELAY": -7.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T13:06:00", "CRS_ARR_TIME": "2015-03-10T19:07:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.36305556, "DEP_AIRPORT_LON": -71.00638889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:06:00"} 81 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1143302", "ORIGIN": "DTW", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T12:45:00", "DEP_TIME": "2015-03-10T12:41:00", "DEP_DELAY": -4.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T12:59:00", "CRS_ARR_TIME": "2015-03-10T17:39:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.2125, "DEP_AIRPORT_LON": -83.35333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:59:00"} 82 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "UA", "ORIGIN_AIRPORT_SEQ_ID": "1161802", "ORIGIN": "EWR", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:45:00", "DEP_TIME": "2015-03-10T13:46:00", "DEP_DELAY": 1.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T13:56:00", "CRS_ARR_TIME": "2015-03-10T20:01:00", "CANCELLED": false, 
"DIVERTED": false, "DEP_AIRPORT_LAT": 40.6925, "DEP_AIRPORT_LON": -74.16861111, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:56:00"} 83 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "UA", "ORIGIN_AIRPORT_SEQ_ID": "1226402", "ORIGIN": "IAD", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T12:15:00", "DEP_TIME": "2015-03-10T12:09:00", "DEP_DELAY": -6.0, "TAXI_OUT": 16.0, "WHEELS_OFF": "2015-03-10T12:25:00", "CRS_ARR_TIME": "2015-03-10T18:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 38.9475, "DEP_AIRPORT_LON": -77.46, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:25:00"} 84 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1247802", "ORIGIN": "JFK", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T11:30:00", "DEP_TIME": "2015-03-10T12:02:00", "DEP_DELAY": 32.0, "TAXI_OUT": 27.0, "WHEELS_OFF": "2015-03-10T12:29:00", "CRS_ARR_TIME": "2015-03-10T17:40:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.63972222, "DEP_AIRPORT_LON": -73.77888889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:29:00"} 85 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1163002", "ORIGIN": "FAI", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T09:50:00", "DEP_TIME": "2015-03-10T09:43:00", "DEP_DELAY": -7.0, "TAXI_OUT": 26.0, "WHEELS_OFF": "2015-03-10T10:09:00", "CRS_ARR_TIME": "2015-03-10T13:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 64.815, "DEP_AIRPORT_LON": -147.85638889, "DEP_AIRPORT_TZOFFSET": -28800.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:09:00"} 86 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1029904", "ORIGIN": "ANC", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:05:00", "DEP_TIME": "2015-03-10T13:01:00", "DEP_DELAY": -4.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:15:00", "CRS_ARR_TIME": "2015-03-10T16:24:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 61.17416667, "DEP_AIRPORT_LON": -149.99805556, "DEP_AIRPORT_TZOFFSET": -28800.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:15:00"} 87 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AS", "ORIGIN_AIRPORT_SEQ_ID": "1129803", "ORIGIN": "DFW", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:00:00", "DEP_TIME": "2015-03-10T12:57:00", "DEP_DELAY": -3.0, "TAXI_OUT": 13.0, "WHEELS_OFF": "2015-03-10T13:10:00", "CRS_ARR_TIME": "2015-03-10T17:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 32.89694444, "DEP_AIRPORT_LON": -97.03805556, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:10:00"} 88 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "AA", "ORIGIN_AIRPORT_SEQ_ID": "1129803", 
"ORIGIN": "DFW", "DEST_AIRPORT_SEQ_ID": "1474703", "DEST": "SEA", "CRS_DEP_TIME": "2015-03-10T13:05:00", "DEP_TIME": "2015-03-10T13:01:00", "DEP_DELAY": -4.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T13:19:00", "CRS_ARR_TIME": "2015-03-10T17:34:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 32.89694444, "DEP_AIRPORT_LON": -97.03805556, "DEP_AIRPORT_TZOFFSET": -18000.0, "ARR_AIRPORT_LAT": 47.45, "ARR_AIRPORT_LON": -122.31166667, "ARR_AIRPORT_TZOFFSET": -25200.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:19:00"} 89 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "F9", "ORIGIN_AIRPORT_SEQ_ID": "1129202", "ORIGIN": "DEN", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T13:35:00", "DEP_TIME": "2015-03-10T13:32:00", "DEP_DELAY": -3.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T13:41:00", "CRS_ARR_TIME": "2015-03-10T16:50:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.86166667, "DEP_AIRPORT_LON": -104.67305556, "DEP_AIRPORT_TZOFFSET": -21600.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:41:00"} 90 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "B6", "ORIGIN_AIRPORT_SEQ_ID": "1245102", "ORIGIN": "JAX", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T13:15:00", "DEP_TIME": "2015-03-10T13:10:00", "DEP_DELAY": -5.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T13:25:00", "CRS_ARR_TIME": "2015-03-10T15:06:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 30.49416667, "DEP_AIRPORT_LON": -81.68777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:25:00"} 91 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "B6", "ORIGIN_AIRPORT_SEQ_ID": "1169703", "ORIGIN": "FLL", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T13:35:00", "DEP_TIME": "2015-03-10T13:33:00", "DEP_DELAY": -2.0, "TAXI_OUT": 13.0, "WHEELS_OFF": "2015-03-10T13:46:00", "CRS_ARR_TIME": "2015-03-10T16:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 26.0725, "DEP_AIRPORT_LON": -80.15277778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:46:00"} 92 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1233904", "ORIGIN": "IND", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T11:20:00", "DEP_TIME": "2015-03-10T11:19:00", "DEP_DELAY": -1.0, "TAXI_OUT": 9.0, "WHEELS_OFF": "2015-03-10T11:28:00", "CRS_ARR_TIME": "2015-03-10T12:55:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 39.71722222, "DEP_AIRPORT_LON": -86.29472222, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:28:00"} 93 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T13:40:00", "DEP_TIME": "2015-03-10T13:35:00", "DEP_DELAY": -5.0, "TAXI_OUT": 10.0, "WHEELS_OFF": "2015-03-10T13:45:00", "CRS_ARR_TIME": "2015-03-10T15:25:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, 
"DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:45:00"} 94 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T13:20:00", "DEP_TIME": "2015-03-10T13:15:00", "DEP_DELAY": -5.0, "TAXI_OUT": 14.0, "WHEELS_OFF": "2015-03-10T13:29:00", "CRS_ARR_TIME": "2015-03-10T15:01:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:29:00"} 95 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T12:20:00", "DEP_TIME": "2015-03-10T12:20:00", "DEP_DELAY": 0.0, "TAXI_OUT": 19.0, "WHEELS_OFF": "2015-03-10T12:39:00", "CRS_ARR_TIME": "2015-03-10T14:01:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T12:39:00"} 96 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "WN", "ORIGIN_AIRPORT_SEQ_ID": "1039705", "ORIGIN": "ATL", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T10:45:00", "DEP_TIME": "2015-03-10T10:39:00", "DEP_DELAY": -6.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T10:54:00", "CRS_ARR_TIME": "2015-03-10T12:20:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 33.63666667, "DEP_AIRPORT_LON": -84.42777778, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:54:00"} 97 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "B6", "ORIGIN_AIRPORT_SEQ_ID": "1072102", "ORIGIN": "BOS", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T12:42:00", "DEP_TIME": "2015-03-10T13:31:00", "DEP_DELAY": 49.0, "TAXI_OUT": 19.0, "WHEELS_OFF": "2015-03-10T13:50:00", "CRS_ARR_TIME": "2015-03-10T14:19:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.36305556, "DEP_AIRPORT_LON": -71.00638889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T13:50:00"} 98 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "US", "ORIGIN_AIRPORT_SEQ_ID": "1072102", "ORIGIN": "BOS", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T11:00:00", "DEP_TIME": "2015-03-10T10:54:00", "DEP_DELAY": -6.0, "TAXI_OUT": 18.0, "WHEELS_OFF": "2015-03-10T11:12:00", "CRS_ARR_TIME": "2015-03-10T12:46:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.36305556, "DEP_AIRPORT_LON": -71.00638889, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:12:00"} 99 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "DL", "ORIGIN_AIRPORT_SEQ_ID": "1143302", "ORIGIN": 
"DTW", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T11:25:00", "DEP_TIME": "2015-03-10T11:23:00", "DEP_DELAY": -2.0, "TAXI_OUT": 15.0, "WHEELS_OFF": "2015-03-10T11:38:00", "CRS_ARR_TIME": "2015-03-10T13:00:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 42.2125, "DEP_AIRPORT_LON": -83.35333333, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T11:38:00"} 100 | {"FL_DATE": "2015-03-10", "UNIQUE_CARRIER": "EV", "ORIGIN_AIRPORT_SEQ_ID": "1161802", "ORIGIN": "EWR", "DEST_AIRPORT_SEQ_ID": "1127802", "DEST": "DCA", "CRS_DEP_TIME": "2015-03-10T10:36:00", "DEP_TIME": "2015-03-10T10:31:00", "DEP_DELAY": -5.0, "TAXI_OUT": 12.0, "WHEELS_OFF": "2015-03-10T10:43:00", "CRS_ARR_TIME": "2015-03-10T11:51:00", "CANCELLED": false, "DIVERTED": false, "DEP_AIRPORT_LAT": 40.6925, "DEP_AIRPORT_LON": -74.16861111, "DEP_AIRPORT_TZOFFSET": -14400.0, "ARR_AIRPORT_LAT": 38.85194444, "ARR_AIRPORT_LON": -77.03777778, "ARR_AIRPORT_TZOFFSET": -14400.0, "EVENT_TYPE": "wheelsoff", "EVENT_TIME": "2015-03-10T10:43:00"} 101 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/realtime/train_on_vertex.py: -------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
14 | # new commit 15 | 16 | import argparse 17 | import logging 18 | from datetime import datetime 19 | import tensorflow as tf 20 | 21 | from google.cloud import aiplatform 22 | from google.cloud.aiplatform import gapic as aip 23 | from google.cloud.aiplatform import hyperparameter_tuning as hpt 24 | #Remove References to kfp 25 | #from kfp.v2 import compiler, dsl 26 | ENDPOINT_NAME = 'flights' 27 | 28 | 29 | def train_custom_model(data_set, timestamp, develop_mode, cpu_only_mode, tf_version, extra_args=None): 30 | # Set up training and deployment infra 31 | 32 | if cpu_only_mode: 33 | train_image='us-docker.pkg.dev/vertex-ai/training/tf-cpu.{}:latest'.format(tf_version) 34 | deploy_image='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.{}:latest'.format(tf_version) 35 | else: 36 | train_image = "us-docker.pkg.dev/vertex-ai/training/tf-gpu.{}:latest".format(tf_version) 37 | deploy_image = "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.{}:latest".format(tf_version) 38 | 39 | # train 40 | model_display_name = '{}-{}'.format(ENDPOINT_NAME, timestamp) 41 | job = aiplatform.CustomTrainingJob( 42 | display_name='train-{}'.format(model_display_name), 43 | script_path="model.py", 44 | container_uri=train_image, 45 | requirements=['cloudml-hypertune'], # any extra Python packages 46 | model_serving_container_image_uri=deploy_image 47 | ) 48 | model_args = [ 49 | '--bucket', BUCKET, 50 | ] 51 | if develop_mode: 52 | model_args += ['--develop'] 53 | if extra_args: 54 | model_args += extra_args 55 | 56 | if cpu_only_mode: 57 | model = job.run( 58 | dataset=data_set, 59 | # See https://googleapis.dev/python/aiplatform/latest/aiplatform.html# 60 | predefined_split_column_name='data_split', 61 | model_display_name=model_display_name, 62 | args=model_args, 63 | replica_count=1, 64 | machine_type='n1-standard-4', 65 | sync=develop_mode 66 | ) 67 | else: 68 | model = job.run( 69 | dataset=data_set, 70 | # See https://googleapis.dev/python/aiplatform/latest/aiplatform.html# 71 | predefined_split_column_name='data_split', 72 | model_display_name=model_display_name, 73 | args=model_args, 74 | replica_count=1, 75 | machine_type='n1-standard-4', 76 | # See https://cloud.google.com/vertex-ai/docs/general/locations#accelerators 77 | accelerator_type=aip.AcceleratorType.NVIDIA_TESLA_T4.name, 78 | accelerator_count=1, 79 | sync=develop_mode 80 | ) 81 | return model 82 | 83 | 84 | def train_automl_model(data_set, timestamp, develop_mode): 85 | # train 86 | model_display_name = '{}-{}'.format(ENDPOINT_NAME, timestamp) 87 | job = aiplatform.AutoMLTabularTrainingJob( 88 | display_name='train-{}'.format(model_display_name), 89 | optimization_prediction_type='classification' 90 | ) 91 | model = job.run( 92 | dataset=data_set, 93 | # See https://googleapis.dev/python/aiplatform/latest/aiplatform.html# 94 | predefined_split_column_name='data_split', 95 | target_column='ontime', 96 | model_display_name=model_display_name, 97 | budget_milli_node_hours=(300 if develop_mode else 2000), 98 | disable_early_stopping=False, 99 | export_evaluated_data_items=True, 100 | export_evaluated_data_items_bigquery_destination_uri='{}:flights.automl_evaluated'.format(PROJECT), 101 | export_evaluated_data_items_override_destination=True, 102 | sync=develop_mode 103 | ) 104 | return model 105 | 106 | 107 | def do_hyperparameter_tuning(data_set, timestamp, develop_mode, cpu_only_mode, tf_version): 108 | # Vertex AI services require regional API endpoints. 
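# The tuning flow below: (1) build a single-trial CustomJob from model.py, (2) wrap it in a HyperparameterTuningJob that searches train_batch_size, nbuckets, and dnn_hidden_units to minimize val_rmse, (3) rerun the best trial's parameters as a full training job via train_custom_model().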
109 | if cpu_only_mode: 110 | train_image='us-docker.pkg.dev/vertex-ai/training/tf-cpu.{}:latest'.format(tf_version) 111 | else: 112 | train_image = "us-docker.pkg.dev/vertex-ai/training/tf-gpu.{}:latest".format(tf_version) 113 | 114 | # a single trial job 115 | model_display_name = '{}-{}'.format(ENDPOINT_NAME, timestamp) 116 | if cpu_only_mode: 117 | trial_job = aiplatform.CustomJob.from_local_script( 118 | display_name='train-{}'.format(model_display_name), 119 | script_path="model.py", 120 | container_uri=train_image, 121 | args=[ 122 | '--bucket', BUCKET, 123 | '--skip_full_eval', # no need to evaluate on test data set 124 | '--num_epochs', '10', 125 | '--num_examples', '500000' # 1/10 actual size to finish faster 126 | ], 127 | requirements=['cloudml-hypertune'], # any extra Python packages 128 | replica_count=1, 129 | machine_type='n1-standard-4' 130 | ) 131 | else: 132 | trial_job = aiplatform.CustomJob.from_local_script( 133 | display_name='train-{}'.format(model_display_name), 134 | script_path="model.py", 135 | container_uri=train_image, 136 | args=[ 137 | '--bucket', BUCKET, 138 | '--skip_full_eval', # no need to evaluate on test data set 139 | '--num_epochs', '10', 140 | '--num_examples', '500000' # 1/10 actual size to finish faster 141 | ], 142 | requirements=['cloudml-hypertune'], # any extra Python packages 143 | replica_count=1, 144 | machine_type='n1-standard-4', 145 | # See https://cloud.google.com/vertex-ai/docs/general/locations#accelerators 146 | accelerator_type=aip.AcceleratorType.NVIDIA_TESLA_T4.name, 147 | accelerator_count=1, 148 | ) 149 | 150 | # the tuning job 151 | hparam_job = aiplatform.HyperparameterTuningJob( 152 | # See https://googleapis.dev/python/aiplatform/latest/aiplatform.html# 153 | display_name='hparam-{}'.format(model_display_name), 154 | custom_job=trial_job, 155 | metric_spec={'val_rmse': 'minimize'}, 156 | parameter_spec={ 157 | "train_batch_size": hpt.IntegerParameterSpec(min=16, max=256, scale='log'), 158 | "nbuckets": hpt.IntegerParameterSpec(min=5, max=10, scale='linear'), 159 | "dnn_hidden_units": hpt.CategoricalParameterSpec(values=["64,16", "64,16,4", "64,64,64,8", "256,64,16"]) 160 | }, 161 | max_trial_count=2 if develop_mode else NUM_HPARAM_TRIALS, 162 | parallel_trial_count=2, 163 | search_algorithm=None, # Bayesian 164 | ) 165 | 166 | hparam_job.run(sync=True) # has to finish before we can get trials. 167 | 168 | # get the parameters corresponding to the best trial 169 | best = sorted(hparam_job.trials, key=lambda x: x.final_measurement.metrics[0].value)[0] 170 | logging.info('Best trial: {}'.format(best)) 171 | best_params = [] 172 | for param in best.parameters: 173 | best_params.append('--{}'.format(param.parameter_id)) 174 | 175 | if param.parameter_id in ["train_batch_size", "nbuckets"]: 176 | # hparam returns 10.0 even though it's an integer param. so round it. 177 | # but CustomTrainingJob makes integer args into floats. 
so make it a string 178 | best_params.append(str(int(round(param.value)))) 179 | else: 180 | # string or float parameters 181 | best_params.append(param.value) 182 | 183 | # run the best trial to completion 184 | logging.info('Launching full training job with {}'.format(best_params)) 185 | return train_custom_model(data_set, timestamp, develop_mode, cpu_only_mode, tf_version, extra_args=best_params) 186 | 187 | #Remove references to kfp 188 | #@dsl.pipeline(name="flights-pipeline", 189 | # description="ds-on-gcp flights pipeline" 190 | #) 191 | 192 | def main(): 193 | aiplatform.init(project=PROJECT, location=REGION, staging_bucket='gs://{}'.format(BUCKET)) 194 | 195 | # create data set 196 | all_files = tf.io.gfile.glob('gs://{}/train/data/all*.csv'.format(BUCKET)) 197 | logging.info("Training on {}".format(all_files)) 198 | data_set = aiplatform.TabularDataset.create( 199 | display_name='data-{}'.format(ENDPOINT_NAME), 200 | gcs_source=all_files 201 | ) 202 | if TF_VERSION is not None: 203 | tf_version = TF_VERSION.replace(".", "-") 204 | else: 205 | tf_version = '2-' + tf.__version__[2:3] 206 | 207 | # train 208 | if AUTOML: 209 | model = train_automl_model(data_set, TIMESTAMP, DEVELOP_MODE) 210 | elif NUM_HPARAM_TRIALS > 1: 211 | model = do_hyperparameter_tuning(data_set, TIMESTAMP, DEVELOP_MODE, CPU_ONLY_MODE, tf_version) 212 | else: 213 | model = train_custom_model(data_set, TIMESTAMP, DEVELOP_MODE, CPU_ONLY_MODE, tf_version) 214 | 215 | # create endpoint if it doesn't already exist 216 | endpoints = aiplatform.Endpoint.list( 217 | filter='display_name="{}"'.format(ENDPOINT_NAME), 218 | order_by='create_time desc', 219 | project=PROJECT, location=REGION, 220 | ) 221 | if len(endpoints) > 0: 222 | endpoint = endpoints[0] # most recently created 223 | else: 224 | endpoint = aiplatform.Endpoint.create( 225 | display_name=ENDPOINT_NAME, project=PROJECT, location=REGION, 226 | sync=DEVELOP_MODE 227 | ) 228 | 229 | # deploy 230 | model.deploy( 231 | endpoint=endpoint, 232 | traffic_split={"0": 100}, 233 | machine_type='n1-standard-2', 234 | min_replica_count=1, 235 | max_replica_count=1, 236 | sync=DEVELOP_MODE 237 | ) 238 | 239 | if DEVELOP_MODE: 240 | model.wait() 241 | 242 | #Remove run_pipeline function 243 | #def run_pipeline(): 244 | # compiler.Compiler().compile(pipeline_func=main, package_path='flights_pipeline.json') 245 | 246 | # job = aip.PipelineJob( 247 | # display_name="{}-pipeline".format(ENDPOINT_NAME), 248 | # template_path="{}_pipeline.json".format(ENDPOINT_NAME), 249 | # pipeline_root="{}/pipeline_root/intro".format(BUCKET), 250 | # enable_caching=False 251 | # ) 252 | 253 | # job.run() 254 | 255 | 256 | if __name__ == '__main__': 257 | parser = argparse.ArgumentParser() 258 | 259 | parser.add_argument( 260 | '--bucket', 261 | help='Data will be read from gs://BUCKET/train/data and checkpoints will be in gs://BUCKET/train/trained_model', 262 | required=True 263 | ) 264 | parser.add_argument( 265 | '--region', 266 | help='Where to run the trainer', 267 | default='us-central1' 268 | ) 269 | parser.add_argument( 270 | '--project', 271 | help='Project to be billed', 272 | required=True 273 | ) 274 | parser.add_argument( 275 | '--develop', 276 | help='Train on a small subset in development', 277 | dest='develop', 278 | action='store_true') 279 | parser.set_defaults(develop=False) 280 | parser.add_argument( 281 | '--automl', 282 | help='Train an AutoML Table, instead of using model.py', 283 | dest='automl', 284 | action='store_true') 285 | parser.set_defaults(automl=False) 286 
| parser.add_argument( 287 | '--num_hparam_trials', 288 | help='Number of hyperparameter trials. 0/1 means no hyperparam. Ignored if --automl is set.', 289 | type=int, 290 | default=0) 291 | parser.add_argument( 292 | '--pipeline', 293 | help='Run as pipeline', 294 | dest='pipeline', 295 | action='store_true') 296 | parser.add_argument( 297 | '--cpuonly', 298 | help='Run without GPU', 299 | dest='cpuonly', 300 | action='store_true') 301 | parser.set_defaults(cpuonly=False) 302 | parser.add_argument( 303 | '--tfversion', 304 | help='TensorFlow version to use' 305 | ) 306 | 307 | # parse args 308 | logging.getLogger().setLevel(logging.INFO) 309 | args = parser.parse_args().__dict__ 310 | BUCKET = args['bucket'] 311 | PROJECT = args['project'] 312 | REGION = args['region'] 313 | DEVELOP_MODE = args['develop'] 314 | CPU_ONLY_MODE = args['cpuonly'] 315 | TF_VERSION = args['tfversion'] 316 | AUTOML = args['automl'] 317 | NUM_HPARAM_TRIALS = args['num_hparam_trials'] 318 | TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S") 319 | 320 | #if args['pipeline']: 321 | #run_pipeline() 322 | #else: 323 | main() -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/setup_env.sh: -------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 
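# This script assumes it is run in Cloud Shell, where DEVSHELL_PROJECT_ID is pre-populated; it enables the required APIs and grants the default compute service account the roles that the Dataflow, BigQuery, Pub/Sub, and Vertex AI steps of this lab rely on.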
14 | #Set Project Variables 15 | PROJECT_ID=${DEVSHELL_PROJECT_ID} 16 | PROJECT_NBR=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)") 17 | NETWORK=default 18 | SUBNET=default 19 | SUBNET_CIDR=10.6.0.0/24 20 | REGION=us-central1 21 | # 22 | #Enable APIs 23 | # 24 | gcloud services enable compute.googleapis.com 25 | gcloud services enable dataflow.googleapis.com 26 | gcloud services enable pubsub.googleapis.com 27 | gcloud services enable aiplatform.googleapis.com 28 | gcloud services enable logging.googleapis.com 29 | gcloud services enable serviceusage.googleapis.com 30 | gcloud services enable bigquery.googleapis.com 31 | gcloud services enable monitoring.googleapis.com 32 | # 33 | #Provide roles to the compute service account 34 | # 35 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 36 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 37 | --role="roles/storage.objectAdmin" 38 | 39 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 40 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 41 | --role="roles/bigquery.jobUser" 42 | 43 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 44 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 45 | --role="roles/bigquery.dataEditor" 46 | 47 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 48 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 49 | --role="roles/dataflow.worker" 50 | 51 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 52 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 53 | --role="roles/dataflow.developer" 54 | 55 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 56 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 57 | --role="roles/aiplatform.user" 58 | 59 | gcloud projects add-iam-policy-binding $PROJECT_ID \ 60 | --member=serviceAccount:$PROJECT_NBR-compute@developer.gserviceaccount.com \ 61 | --role="roles/pubsub.editor" 62 | 63 | # 64 | #Create VPC 65 | # 66 | #gcloud compute networks create $NETWORK \ 67 | #--project=$PROJECT_ID \ 68 | #--subnet-mode=custom \ 69 | #--mtu=1460 \ 70 | #--bgp-routing-mode=regional 71 | # 72 | #Create Subnet 73 | # 74 | #gcloud compute networks subnets create $SUBNET \ 75 | # --network=$NETWORK \ 76 | # --range=$SUBNET_CIDR \ 77 | # --region=$REGION \ 78 | # --enable-private-ip-google-access \ 79 | # --project=$PROJECT_ID 80 | # 81 | #Create Firewall Rules 82 | # 83 | #gcloud compute --project=$PROJECT_ID firewall-rules create allow-intra-$SUBNET \ 84 | #--direction=INGRESS \ 85 | #--priority=1000 \ 86 | #--network=$NETWORK \ 87 | #--action=ALLOW \ 88 | #--rules=all \ 89 | #--source-ranges=$SUBNET_CIDR 90 | 91 | 92 | # If needed turn off the following policies 93 | # shielded vm policy constraints/compute.requireShieldedVm 94 | # Allow constraints/compute.vmExternalIpAccess 95 | 96 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/simulate/airports.csv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/google/real-time-intelligence-workshop/ea99fd5b6fcf44c4f006dc17adde8c42ca163479/RealTimePrediction/realtime-intelligence-main/simulate/airports.csv.gz -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/simulate/simulate.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2023 Google LLC 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | #     https://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | import time 17 | import pytz 18 | import logging 19 | import argparse 20 | import datetime 21 | import google.cloud.pubsub_v1 as pubsub # Use v1 of the API 22 | import google.cloud.bigquery as bq 23 | 24 | TIME_FORMAT = '%Y-%m-%d %H:%M:%S %Z' 25 | RFC3339_TIME_FORMAT = '%Y-%m-%dT%H:%M:%S-00:00' 26 | 27 | def publish(publisher, topics, allevents, notify_time): 28 | timestamp = notify_time.strftime(RFC3339_TIME_FORMAT) 29 | for key in topics: # 'departed', 'arrived', etc. 30 | topic = topics[key] 31 | events = allevents[key] 32 | # the client automatically batches 33 | logging.info('Publishing {} {} till {}'.format(len(events), key, timestamp)) 34 | for event_data in events: 35 | publisher.publish(topic, event_data.encode(), EventTimeStamp=timestamp) 36 | 37 | def notify(publisher, topics, rows, simStartTime, programStart, speedFactor): 38 | # sleep computation 39 | def compute_sleep_secs(notify_time): 40 | time_elapsed = (datetime.datetime.utcnow() - programStart).total_seconds() 41 | sim_time_elapsed = (notify_time - simStartTime).total_seconds() / speedFactor 42 | to_sleep_secs = sim_time_elapsed - time_elapsed 43 | return to_sleep_secs 44 | 45 | tonotify = {} 46 | for key in topics: 47 | tonotify[key] = list() 48 | 49 | for row in rows: 50 | event_type, notify_time, event_data = row 51 | 52 | # how much time should we sleep? 
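# Pacing logic: wall-clock seconds since program start are compared with simulated seconds since simStartTime, scaled down by speedFactor; events accumulate in tonotify until the computed sleep exceeds one second, at which point the batch is published and the simulator sleeps off the remainder.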
53 | if compute_sleep_secs(notify_time) > 1: 54 | # notify the accumulated tonotify 55 | publish(publisher, topics, tonotify, notify_time) 56 | for key in topics: 57 | tonotify[key] = list() 58 | 59 | # recompute sleep, since notification takes a while 60 | to_sleep_secs = compute_sleep_secs(notify_time) 61 | if to_sleep_secs > 0: 62 | logging.info('Sleeping {} seconds'.format(to_sleep_secs)) 63 | time.sleep(to_sleep_secs) 64 | tonotify[event_type].append(event_data) 65 | 66 | # left-over records; notify again 67 | publish(publisher, topics, tonotify, notify_time) 68 | 69 | 70 | if __name__ == '__main__': 71 | parser = argparse.ArgumentParser(description='Send simulated flight events to Cloud Pub/Sub') 72 | parser.add_argument('--startTime', help='Example: 2015-05-01 00:00:00 UTC', required=True) 73 | parser.add_argument('--endTime', help='Example: 2015-05-03 00:00:00 UTC', required=True) 74 | parser.add_argument('--project', help='your project id, to create pubsub topic', required=True) 75 | parser.add_argument('--speedFactor', help='Example: 60 implies 1 hour of data sent to Cloud Pub/Sub in 1 minute', required=True, type=float) 76 | parser.add_argument('--jitter', help='type of jitter to add: None, uniform, exp are the three options', default='None') 77 | 78 | # set up BigQuery bqclient 79 | logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO) 80 | args = parser.parse_args() 81 | bqclient = bq.Client(args.project) 82 | bqclient.get_table('flights.flights_simevents') # throws exception on failure 83 | 84 | # jitter? 85 | if args.jitter == 'exp': 86 | jitter = 'CAST (-LN(RAND()*0.99 + 0.01)*30 + 90.5 AS INT64)' 87 | elif args.jitter == 'uniform': 88 | jitter = 'CAST(90.5 + RAND()*30 AS INT64)' 89 | else: 90 | jitter = '0' 91 | 92 | 93 | # run the query to pull simulated events (jitter is a SQL expression, so it is inlined into the query text; query parameters carry only values) 94 | querystr = """ 95 | SELECT 96 | EVENT_TYPE, 97 | TIMESTAMP_ADD(EVENT_TIME, INTERVAL {} SECOND) AS NOTIFY_TIME, 98 | EVENT_DATA 99 | FROM 100 | flights.flights_simevents 101 | WHERE 102 | EVENT_TIME >= @startTime 103 | AND EVENT_TIME < @endTime 104 | ORDER BY 105 | EVENT_TIME ASC 106 | """.format(jitter) 107 | job_config = bq.QueryJobConfig( 108 | query_parameters=[ 109 | # jitter is deliberately not a query parameter -- it is an expression, not a value 110 | bq.ScalarQueryParameter("startTime", "TIMESTAMP", args.startTime), 111 | bq.ScalarQueryParameter("endTime", "TIMESTAMP", args.endTime), 112 | ] 113 | ) 114 | rows = bqclient.query(querystr, job_config=job_config) 115 | 116 | # create one Pub/Sub notification topic for each type of event 117 | publisher = pubsub.PublisherClient() 118 | topics = {} 119 | for event_type in ['wheelsoff', 'arrived', 'departed']: 120 | topics[event_type] = publisher.topic_path(args.project, event_type) 121 | try: 122 | publisher.get_topic(topic=topics[event_type]) 123 | logging.info("Already exists: {}".format(topics[event_type])) 124 | except Exception: # the topic does not exist yet, so create it 125 | logging.info("Creating {}".format(topics[event_type])) 126 | publisher.create_topic(name=topics[event_type]) 127 | 128 | 129 | # notify about each row in the dataset 130 | programStartTime = datetime.datetime.utcnow() 131 | simStartTime = datetime.datetime.strptime(args.startTime, TIME_FORMAT).replace(tzinfo=pytz.UTC) 132 | logging.info('Simulation start time is {}'.format(simStartTime)) 133 | notify(publisher, topics, rows, simStartTime, programStartTime, args.speedFactor) 134 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/simulate_flight.sh:
-------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # Set environment variables 16 | # 17 | export PROJECT_ID=$(gcloud info --format='value(config.project)') 18 | export BUCKET=$PROJECT_ID-ml 19 | # 20 | # Change directory to simulate directory 21 | # 22 | cd ./simulate 23 | # 24 | # Simulate Flights: 25 | # 26 | python3 ./simulate.py --startTime '2015-02-01 00:00:00 UTC' --endTime '2015-03-03 00:00:00 UTC' --speedFactor=30 --project $PROJECT_ID 27 | cd .. -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/stage_data.sh: -------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # Create staging environment to generate data needed 16 | # 17 | export PROJECT_ID=${DEVSHELL_PROJECT_ID} 18 | export BUCKET=$PROJECT_ID-ml 19 | # 20 | #Copy raw flights data from US Bureau of Transportation Statistics (https://www.transtats.bts.gov/) 21 | # 22 | gsutil mb -l us-central1 gs://$BUCKET 23 | # 24 | # Download Flight On Time Data from BTS 25 | # 26 | YEAR=2015 27 | # 28 | #BTS URL 29 | # 30 | #SOURCE=https://transtats.bts.gov/PREZIP 31 | # 32 | #Using a mirror 33 | # 34 | SOURCE=https://storage.googleapis.com/data-science-on-gcp/edition2/raw 35 | BASEURL="${SOURCE}/On_Time_Reporting_Carrier_On_Time_Performance_1987_present" 36 | 37 | 38 | for MONTH in `seq 1 2`; do 39 | echo "Downloading YEAR=$YEAR ... MONTH=$MONTH ... 
from $BASEURL" 40 | MONTH2=$(printf "%02d" $MONTH) 41 | # 42 | #Create a temp directory to store downloaded zip files 43 | # 44 | TMPDIR=$(mktemp -d) 45 | ZIPFILE=${TMPDIR}/${YEAR}_${MONTH2}.zip 46 | echo $ZIPFILE 47 | curl -o $ZIPFILE ${BASEURL}_${YEAR}_${MONTH}.zip 48 | unzip -d $TMPDIR $ZIPFILE 49 | gsutil cp $TMPDIR/*.csv gs://$BUCKET/flights/raw/${YEAR}${MONTH2}.csv 50 | rm -rf $TMPDIR 51 | done 52 | # 53 | #Define Schema for BQ flights_raw table 54 | # 55 | SCHEMA=Year:STRING,Quarter:STRING,Month:STRING,DayofMonth:STRING,DayOfWeek:STRING,FlightDate:DATE,Reporting_Airline:STRING,DOT_ID_Reporting_Airline:STRING,IATA_CODE_Reporting_Airline:STRING,Tail_Number:STRING,Flight_Number_Reporting_Airline:STRING,OriginAirportID:STRING,OriginAirportSeqID:STRING,OriginCityMarketID:STRING,Origin:STRING,OriginCityName:STRING,OriginState:STRING,OriginStateFips:STRING,OriginStateName:STRING,OriginWac:STRING,DestAirportID:STRING,DestAirportSeqID:STRING,DestCityMarketID:STRING,Dest:STRING,DestCityName:STRING,DestState:STRING,DestStateFips:STRING,DestStateName:STRING,DestWac:STRING,CRSDepTime:STRING,DepTime:STRING,DepDelay:STRING,DepDelayMinutes:STRING,DepDel15:STRING,DepartureDelayGroups:STRING,DepTimeBlk:STRING,TaxiOut:STRING,WheelsOff:STRING,WheelsOn:STRING,TaxiIn:STRING,CRSArrTime:STRING,ArrTime:STRING,ArrDelay:STRING,ArrDelayMinutes:STRING,ArrDel15:STRING,ArrivalDelayGroups:STRING,ArrTimeBlk:STRING,Cancelled:STRING,CancellationCode:STRING,Diverted:STRING,CRSElapsedTime:STRING,ActualElapsedTime:STRING,AirTime:STRING,Flights:STRING,Distance:STRING,DistanceGroup:STRING,CarrierDelay:STRING,WeatherDelay:STRING,NASDelay:STRING,SecurityDelay:STRING,LateAircraftDelay:STRING,FirstDepTime:STRING,TotalAddGTime:STRING,LongestAddGTime:STRING,DivAirportLandings:STRING,DivReachedDest:STRING,DivActualElapsedTime:STRING,DivArrDelay:STRING,DivDistance:STRING,Div1Airport:STRING,Div1AirportID:STRING,Div1AirportSeqID:STRING,Div1WheelsOn:STRING,Div1TotalGTime:STRING,Div1LongestGTime:STRING,Div1WheelsOff:STRING,Div1TailNum:STRING,Div2Airport:STRING,Div2AirportID:STRING,Div2AirportSeqID:STRING,Div2WheelsOn:STRING,Div2TotalGTime:STRING,Div2LongestGTime:STRING,Div2WheelsOff:STRING,Div2TailNum:STRING,Div3Airport:STRING,Div3AirportID:STRING,Div3AirportSeqID:STRING,Div3WheelsOn:STRING,Div3TotalGTime:STRING,Div3LongestGTime:STRING,Div3WheelsOff:STRING,Div3TailNum:STRING,Div4Airport:STRING,Div4AirportID:STRING,Div4AirportSeqID:STRING,Div4WheelsOn:STRING,Div4TotalGTime:STRING,Div4LongestGTime:STRING,Div4WheelsOff:STRING,Div4TailNum:STRING,Div5Airport:STRING,Div5AirportID:STRING,Div5AirportSeqID:STRING,Div5WheelsOn:STRING,Div5TotalGTime:STRING,Div5LongestGTime:STRING,Div5WheelsOff:STRING,Div5TailNum:STRING 56 | # 57 | #Create BQ Dataset 58 | # 59 | bq --project_id $PROJECT_ID show flights || bq mk --sync flights 60 | # 61 | # Create BQ table flights_raw in flights dataset and load the raw CSV files copied in the previous steps 62 | # 63 | for MONTH in `seq -w 1 2`; do 64 | CSVFILE=gs://$BUCKET/flights/raw/20150$MONTH.csv 65 | bq --project_id $PROJECT_ID --sync load \ 66 | --time_partitioning_field=FlightDate --time_partitioning_type=MONTH \ 67 | --source_format=CSV --ignore_unknown_values --skip_leading_rows=1 --schema=$SCHEMA \ 68 | --replace $PROJECT_ID:flights.flights_raw\$20150$MONTH $CSVFILE 69 | done 70 | # 71 | # Copy all flights time zone corrected JSON file 72 | # 73 | gsutil -m cp gs://data-science-on-gcp/edition2/flights/tzcorr/all_flights-00000-of-00026 gs://${BUCKET}/flights/tzcorr/ 74 | # 75 | # Create BQ table 
flights_tzcorr - time zone corrected 76 | # 77 | bq --project_id $PROJECT_ID load \ 78 | --source_format=NEWLINE_DELIMITED_JSON \ 79 | --autodetect ${PROJECT_ID}:flights.flights_tzcorr gs://${BUCKET}/flights/tzcorr/all_flights-* 80 | # 81 | #Copy Airport Information 82 | # 83 | gsutil cp gs://data-science-on-gcp/edition2/raw/airports.csv gs://${BUCKET}/flights/airports/airports.csv 84 | # 85 | # Create BQ airports table 86 | # 87 | bq --project_id=$PROJECT_ID load --autodetect --replace --source_format=CSV ${PROJECT_ID}:flights.airports gs://${BUCKET}/flights/airports/airports.csv 88 | # 89 | # Copy Simulated events file 90 | # 91 | gsutil -m cp gs://cloud-training/gsp201/simevents/flights_simevents_dump00000000000*.csv.gz gs://$BUCKET/ 92 | # 93 | # Create BQ Simevents Table 94 | # 95 | bq load --replace --autodetect --source_format=CSV flights.flights_simevents gs://$BUCKET/flights_simevents_dump00000000000*.csv.gz 96 | -------------------------------------------------------------------------------- /RealTimePrediction/realtime-intelligence-main/train_model.sh: -------------------------------------------------------------------------------- 1 | # Copyright 2023 Google LLC 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | #     https://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # 15 | # Set environment variables 16 | # 17 | export PROJECT_ID=$(gcloud info --format='value(config.project)') 18 | export BUCKET=$PROJECT_ID-ml 19 | # 20 | # Train custom ML model on the enriched dataset: 21 | # 22 | cd ./realtime 23 | python3 train_on_vertex.py --project $PROJECT_ID --bucket $BUCKET --region us-central1 --develop --cpuonly 24 | # 25 | #In the Cloud Console, on the Navigation menu, 26 | #click Vertex AI > Training to monitor the training pipeline. 27 | #When the status is Finished, click on the training pipeline name to monitor the deployment status. 28 | #Note: It will take around 20 minutes to complete the model training and deployment. --------------------------------------------------------------------------------