├── README.md
├── SECURITY.md
├── azure
    ├── average-tip-per-mile-pipeline.asaql
    └── deployresources.json
└── onprem
    ├── DataLoader
        ├── .vscode
        │   ├── launch.json
        │   └── tasks.json
        ├── DataFormat.cs
        ├── ObjectPool.cs
        ├── Program.cs
        ├── StreamReaderExtensions.cs
        ├── TaxiData.cs
        ├── TaxiFare.cs
        ├── TaxiRide.cs
        └── taxi.csproj
    ├── Dockerfile
    └── main.env

/README.md:
--------------------------------------------------------------------------------
# Stream processing with Azure Stream Analytics

This reference architecture shows an end-to-end stream processing pipeline. The pipeline ingests data from two sources, correlates records in the two streams, and calculates a rolling average across a time window. The results are stored for further analysis.

![](https://docs.microsoft.com/azure/architecture/reference-architectures/data/images/stream-processing-asa/stream-processing-asa.png)

For more information about this reference architecture and guidance about best practices, see the article [Stream processing with Azure Stream Analytics](https://docs.microsoft.com/azure/architecture/reference-architectures/data/stream-processing-stream-analytics) on the Azure Architecture Center.

## Deploy the solution

### Prerequisites

1. Clone, fork, or download the zip file for this repository.

1. Install [Azure CLI 2.0](https://docs.microsoft.com/cli/azure/install-azure-cli?view=azure-cli-latest).

1. From a command prompt, bash prompt, or PowerShell prompt, sign in to your Azure account as follows:

    ```bash
    az login
    ```

### Download the source data files

1. Create a directory named `DataFile` in the root of the repository.

1. Open a web browser and navigate to .

1. Click the **Download** button on this page to download a zip file of all the taxi data for that year.

1. Extract the zip file to the `DataFile` directory.
    > This zip file contains other zip files. Don't extract the child zip files.

    The directory structure should look like the following:

    ```output
    /DataFile
        /FOIL2013
            trip_data_1.zip
            trip_data_2.zip
            trip_data_3.zip
            ...
    ```

### Deploy the Azure resources

1. Run the following commands to deploy the Azure resources:

    ```bash
    export resourceGroup='[Resource group name]'
    export resourceLocation='[Location]'
    export cosmosDatabaseAccount='[Cosmos DB account name]'
    export cosmosDatabase='[Cosmos DB database name]'
    export cosmosDataBaseCollection='[Cosmos DB collection name]'
    export eventHubNamespace='[Event Hubs namespace name]'

    # Create a resource group
    az group create --name $resourceGroup --location $resourceLocation

    # Deploy resources
    az group deployment create --resource-group $resourceGroup \
      --template-file ./azure/deployresources.json --parameters \
      eventHubNamespace=$eventHubNamespace \
      outputCosmosDatabaseAccount=$cosmosDatabaseAccount \
      outputCosmosDatabase=$cosmosDatabase \
      outputCosmosDatabaseCollection=$cosmosDataBaseCollection \
      query='@./azure/average-tip-per-mile-pipeline.asaql'

    # Create a database
    az cosmosdb database create --name $cosmosDatabaseAccount \
        --db-name $cosmosDatabase --resource-group $resourceGroup

    # Create a collection
    az cosmosdb collection create --collection-name $cosmosDataBaseCollection \
        --name $cosmosDatabaseAccount --db-name $cosmosDatabase \
        --resource-group $resourceGroup
    ```

1. In the Azure portal, navigate to the resource group that was created.

1. Open the blade for the Stream Analytics job.

1. Click **Start** to start the job. Select **Now** as the output start time. Wait for the job to start.

### Run the data generator

1.
   Get the Event Hub connection strings. You can get these from the Azure portal, or by running the following CLI commands:

    ```bash
    # RIDE_EVENT_HUB
    az eventhubs eventhub authorization-rule keys list \
        --eventhub-name taxi-ride \
        --name taxi-ride-asa-access-policy \
        --namespace-name $eventHubNamespace \
        --resource-group $resourceGroup \
        --query primaryConnectionString

    # FARE_EVENT_HUB
    az eventhubs eventhub authorization-rule keys list \
        --eventhub-name taxi-fare \
        --name taxi-fare-asa-access-policy \
        --namespace-name $eventHubNamespace \
        --resource-group $resourceGroup \
        --query primaryConnectionString
    ```

2. Navigate to the directory `/onprem` in the GitHub repository.

3. Update the values in the file `main.env` as follows:

    ```output
    RIDE_EVENT_HUB=[Connection string for taxi-ride event hub]
    FARE_EVENT_HUB=[Connection string for taxi-fare event hub]
    RIDE_DATA_FILE_PATH=/DataFile/FOIL2013
    MINUTES_TO_LEAD=0
    PUSH_RIDE_DATA_FIRST=false
    ```

4. Run the following command to build the Docker image.

    ```bash
    docker build --no-cache -t dataloader .
    ```

5. Navigate back to the parent directory.

    ```bash
    cd ..
    ```

6. Run the following command to run the Docker image.

    ```bash
    docker run -v `pwd`/DataFile:/DataFile --env-file=onprem/main.env dataloader:latest
    ```

The output should look like the following:

```output
Created 10000 records for TaxiFare
Created 10000 records for TaxiRide
Created 20000 records for TaxiFare
Created 20000 records for TaxiRide
Created 30000 records for TaxiFare
...
```

Let the program run for at least 5 minutes, which is the window defined in the Stream Analytics query.
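To make the 5-minute figure concrete, here is a minimal Python sketch of what a 5-minute hopping window with a 1-minute hop computes, mirroring the `HoppingWindow(Duration(minute, 5), Hop(minute, 1))` clause and the `SUM(TipAmount) / SUM(TripDistanceInMiles)` expression in `average-tip-per-mile-pipeline.asaql`. It is an offline illustration, not part of this repository: the function names, the tuple layout of the events, and the choice to treat each window as exclusive at its start and inclusive at its end are assumptions made for the example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ceil_to_hop(ts, hop):
    """Round ts up to the next multiple of hop, measured from EPOCH."""
    remainder = (ts - EPOCH) % hop
    return ts if not remainder else ts + (hop - remainder)


def hopping_average_tip_per_mile(events,
                                 duration=timedelta(minutes=5),
                                 hop=timedelta(minutes=1)):
    """Return [(window_end, sum(tips) / sum(miles)), ...] for non-empty windows.

    events: iterable of (timestamp, tip_amount, trip_distance_in_miles).
    This tuple layout is hypothetical and chosen only for the sketch.
    """
    events = sorted(events, key=lambda e: e[0])
    if not events:
        return []
    last_ts = events[-1][0]
    window_end = ceil_to_hop(events[0][0], hop)
    results = []
    # Keep hopping while the window can still contain the last event.
    while window_end - duration < last_ts:
        window_start = window_end - duration
        # Assumed boundary rule: start exclusive, end inclusive.
        in_window = [e for e in events if window_start < e[0] <= window_end]
        tips = sum(e[1] for e in in_window)
        miles = sum(e[2] for e in in_window)
        if in_window and miles > 0:
            results.append((window_end, tips / miles))
        window_end += hop
    return results


if __name__ == "__main__":
    sample = [
        (datetime(2013, 1, 1, 12, 0, 30), 2.0, 1.0),
        (datetime(2013, 1, 1, 12, 3, 10), 3.0, 4.0),
    ]
    for end, average in hopping_average_tip_per_mile(sample):
        print(end.isoformat(), average)
```

Under these assumptions each event contributes to five consecutive one-minute results, so a new value can appear every minute, but a result only reflects a full 5 minutes of data once the generator has been running that long.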
To verify the Stream Analytics job is running correctly, open the Azure portal and navigate to the Cosmos DB database. Open the **Data Explorer** blade and view the documents.

--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), and [Xamarin](https://github.com/xamarin).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/security.md/definition), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/security.md/msrc/create-report).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/security.md/msrc/pgp).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

  * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
  * Full paths of source file(s) related to the manifestation of the issue
  * The location of the affected source code (tag/branch/commit or direct URL)
  * Any special configuration required to reproduce the issue
  * Step-by-step instructions to reproduce the issue
  * Proof-of-concept or exploit code (if possible)
  * Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/security.md/msrc/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/security.md/cvd).

--------------------------------------------------------------------------------
/azure/average-tip-per-mile-pipeline.asaql:
--------------------------------------------------------------------------------
WITH
Step1 AS (
    SELECT PartitionId,
           TRY_CAST(Medallion AS nvarchar(max)) AS Medallion,
           TRY_CAST(HackLicense AS nvarchar(max)) AS HackLicense,
           VendorId,
           TRY_CAST(pickup_datetime AS datetime) AS PickupTime,
           TripDistanceInMiles
    FROM [TaxiRide] PARTITION BY PartitionId
),
Step2 AS (
    SELECT PartitionId,
           medallion AS Medallion,
           hack_license AS HackLicense,
           vendor_id AS VendorId,
           TRY_CAST(pickup_datetime AS datetime) AS PickupTime,
           tip_amount AS TipAmount
    FROM [TaxiFare] PARTITION BY PartitionId
),
Step3 AS (
    SELECT tr.TripDistanceInMiles,
           tf.TipAmount
    FROM [Step1] tr
    PARTITION BY PartitionId
    JOIN [Step2] tf PARTITION BY PartitionId
      ON tr.PartitionId = tf.PartitionId
     AND tr.PickupTime = tf.PickupTime
     AND DATEDIFF(minute, tr, tf) BETWEEN 0 AND 15
)

SELECT System.Timestamp AS WindowTime,
       SUM(tr.TipAmount) / SUM(tr.TripDistanceInMiles) AS AverageTipPerMile
  INTO [TaxiDrain]
  FROM [Step3] tr
  GROUP BY HoppingWindow(Duration(minute, 5), Hop(minute, 1))

--------------------------------------------------------------------------------
/azure/deployresources.json:
--------------------------------------------------------------------------------
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "eventHubNamespace": {
      "type": "string"
    },
    "query": {
      "type": "string"
    },
    "outputCosmosDatabaseAccount": {
      "type": "string"
    },
    "outputCosmosDatabase": {
      "type": "string"
    },
    "outputCosmosDatabaseCollection": {
      "type": "string"
    }
  },
"variables": { 22 | "taxiRideEventHub": "taxi-ride", 23 | "taxiRideConsumerGroup": "[concat(variables('taxiRideEventHub'), '-asa-consumer-group')]", 24 | "taxiRideConsumerGroupResourceId": "[concat(resourceId('Microsoft.EventHub/namespaces/eventhubs', parameters('eventHubNamespace'), variables('taxiRideEventHub')), '/consumergroups/', variables('taxiRideConsumerGroup'))]", 25 | "taxiRideSharedAccessPolicy": "[concat(variables('taxiRideEventHub'), '-asa-access-policy')]", 26 | "taxiRideSharedAccessPolicyResourceId": "[concat(resourceId('Microsoft.EventHub/namespaces/eventhubs', parameters('eventHubNamespace'), variables('taxiRideEventHub')), '/authorizationRules/', variables('taxiRideSharedAccessPolicy'))]", 27 | "taxiFareEventHub": "taxi-fare", 28 | "taxiFareConsumerGroup": "[concat(variables('taxiFareEventHub'), '-asa-consumer-group')]", 29 | "taxiFareConsumerGroupResourceId": "[concat(resourceId('Microsoft.EventHub/namespaces/eventhubs', parameters('eventHubNamespace'), variables('taxiFareEventHub')), '/consumergroups/', variables('taxiFareConsumerGroup'))]", 30 | "taxiFareSharedAccessPolicy": "[concat(variables('taxiFareEventHub'), '-asa-access-policy')]", 31 | "taxiFareSharedAccessPolicyResourceId": "[concat(resourceId('Microsoft.EventHub/namespaces/eventhubs', parameters('eventHubNamespace'), variables('taxiFareEventHub')), '/authorizationRules/', variables('taxiFareSharedAccessPolicy'))]" 32 | }, 33 | "resources": [ 34 | { 35 | "type": "Microsoft.EventHub/namespaces", 36 | "name": "[parameters('eventHubNamespace')]", 37 | "apiVersion": "2017-04-01", 38 | "location": "[resourceGroup().location]", 39 | "sku": { 40 | "name": "Standard", 41 | "tier": "Standard" 42 | }, 43 | "resources": [ 44 | { 45 | "type": "eventhubs", 46 | "name": "[variables('taxiRideEventHub')]", 47 | "apiVersion": "2017-04-01", 48 | "properties": { 49 | "messageRetentionInDays": 3, 50 | "partitionCount": 8 51 | }, 52 | "resources": [ 53 | { 54 | "type": "consumergroups", 55 | "name": 
"[variables('taxiRideConsumerGroup')]", 56 | "apiVersion": "2017-04-01", 57 | "properties": {}, 58 | "dependsOn": [ 59 | "[variables('taxiRideEventHub')]" 60 | ] 61 | }, 62 | { 63 | "type": "authorizationRules", 64 | "name": "[variables('taxiRideSharedAccessPolicy')]", 65 | "apiVersion": "2017-04-01", 66 | "properties": { 67 | "rights": [ 68 | "Listen", 69 | "Send" 70 | ] 71 | }, 72 | "dependsOn": [ 73 | "[variables('taxiRideEventHub')]" 74 | ] 75 | } 76 | ], 77 | "dependsOn": [ 78 | "[parameters('eventHubNamespace')]" 79 | ] 80 | }, 81 | { 82 | "type": "eventhubs", 83 | "name": "[variables('taxiFareEventHub')]", 84 | "apiVersion": "2017-04-01", 85 | "properties": { 86 | "messageRetentionInDays": 3, 87 | "partitionCount": 8 88 | }, 89 | "resources": [ 90 | { 91 | "type": "consumergroups", 92 | "name": "[variables('taxiFareConsumerGroup')]", 93 | "apiVersion": "2017-04-01", 94 | "properties": {}, 95 | "dependsOn": [ 96 | "[variables('taxiFareEventHub')]" 97 | ] 98 | }, 99 | { 100 | "type": "authorizationRules", 101 | "name": "[variables('taxiFareSharedAccessPolicy')]", 102 | "apiVersion": "2017-04-01", 103 | "properties": { 104 | "rights": [ 105 | "Listen", 106 | "Send" 107 | ] 108 | }, 109 | "dependsOn": [ 110 | "[variables('taxiFareEventHub')]" 111 | ] 112 | } 113 | ], 114 | "dependsOn": [ 115 | "[parameters('eventHubNamespace')]" 116 | ] 117 | } 118 | ] 119 | }, 120 | { 121 | "name": "[parameters('outputCosmosDatabaseAccount')]", 122 | "type": "Microsoft.DocumentDB/databaseAccounts", 123 | "apiVersion": "2015-04-08", 124 | "location": "[resourceGroup().location]", 125 | "kind": "GlobalDocumentDB", 126 | "properties": { 127 | "databaseAccountOfferType": "Standard" 128 | } 129 | }, 130 | { 131 | "type": "Microsoft.StreamAnalytics/streamingjobs", 132 | "name": "taxi-asa-job", 133 | "apiVersion": "2016-03-01", 134 | "location": "[resourceGroup().location]", 135 | "properties": { 136 | "sku": { 137 | "name": "Standard" 138 | }, 139 | "eventsOutOfOrderPolicy": 
"Adjust", 140 | "outputErrorPolicy": "Stop", 141 | "eventsOutOfOrderMaxDelayInSeconds": 900, 142 | "eventsLateArrivalMaxDelayInSeconds": 1805, 143 | "dataLocale": "en-US", 144 | "compatibilityLevel": "1.1", 145 | "inputs": [ 146 | { 147 | "name": "TaxiRide", 148 | "properties": { 149 | "type": "Stream", 150 | "datasource": { 151 | "type": "Microsoft.ServiceBus/EventHub", 152 | "properties": { 153 | "eventHubName": "[variables('taxiRideEventHub')]", 154 | "consumerGroupName": "[variables('taxiRideConsumerGroup')]", 155 | "serviceBusNamespace": "[parameters('eventHubNamespace')]", 156 | "sharedAccessPolicyName": "[variables('taxiRideSharedAccessPolicy')]", 157 | "sharedAccessPolicyKey": "[listKeys(variables('taxiRideSharedAccessPolicyResourceId'), '2017-04-01').primaryKey]" 158 | } 159 | }, 160 | "compression": { 161 | "type": "None" 162 | }, 163 | "serialization": { 164 | "type": "Json", 165 | "properties": { 166 | "encoding": "UTF8" 167 | } 168 | } 169 | } 170 | }, 171 | { 172 | "name": "TaxiFare", 173 | "properties": { 174 | "type": "Stream", 175 | "datasource": { 176 | "type": "Microsoft.ServiceBus/EventHub", 177 | "properties": { 178 | "eventHubName": "[variables('taxiFareEventHub')]", 179 | "consumerGroupName": "[variables('taxiFareConsumerGroup')]", 180 | "serviceBusNamespace": "[parameters('eventHubNamespace')]", 181 | "sharedAccessPolicyName": "[variables('taxiFareSharedAccessPolicy')]", 182 | "sharedAccessPolicyKey": "[listKeys(variables('taxiFareSharedAccessPolicyResourceId'), '2017-04-01').primaryKey]" 183 | } 184 | }, 185 | "compression": { 186 | "type": "None" 187 | }, 188 | "serialization": { 189 | "type": "Csv", 190 | "properties": { 191 | "fieldDelimiter": ",", 192 | "encoding": "UTF8" 193 | } 194 | } 195 | } 196 | } 197 | ], 198 | "outputs": [ 199 | { 200 | "name": "TaxiDrain", 201 | "properties": { 202 | "datasource": { 203 | "type": "Microsoft.Storage/DocumentDB", 204 | "properties": { 205 | "accountId": 
"[parameters('outputCosmosDatabaseAccount')]", 206 | "accountKey": "[listKeys(resourceId('Microsoft.DocumentDB/databaseAccounts', parameters('outputCosmosDatabaseAccount')), '2015-04-08').primaryMasterKey]", 207 | "database": "[parameters('outputCosmosDatabase')]", 208 | "collectionNamePattern": "[parameters('outputCosmosDatabaseCollection')]", 209 | "partitionKey": null, 210 | "documentId": "WindowTime" 211 | } 212 | }, 213 | "serialization": { 214 | "type": "Json", 215 | "properties": { 216 | "format": "LineSeparated", 217 | "encoding": "UTF8" 218 | } 219 | } 220 | } 221 | } 222 | ], 223 | "transformation": { 224 | "name": "Transformation", 225 | "properties": { 226 | "streamingUnits": 60, 227 | "query": "[parameters('query')]" } 228 | } 229 | }, 230 | "dependsOn": [ 231 | "[variables('taxiRideSharedAccessPolicyResourceId')]", 232 | "[variables('taxiFareSharedAccessPolicyResourceId')]", 233 | "[variables('taxiRideConsumerGroupResourceId')]", 234 | "[variables('taxiFareConsumerGroupResourceId')]", 235 | "[concat('Microsoft.DocumentDb/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 236 | ] 237 | }, 238 | { 239 | "name": "Taxi-Rides-Dashboard", 240 | "type": "Microsoft.Portal/dashboards", 241 | "location": "[resourceGroup().location]", 242 | "tags": { 243 | "hidden-title": "TaxiRidesDashboard" 244 | }, 245 | "apiVersion": "2015-08-01-preview", 246 | "properties": { 247 | "lenses": { 248 | "0": { 249 | "order": 0, 250 | "parts": { 251 | "0": { 252 | "position": { 253 | "x": 0, 254 | "y": 0, 255 | "colSpan": 6, 256 | "rowSpan": 4 257 | }, 258 | "metadata": { 259 | "inputs": [ 260 | { 261 | "name": "options", 262 | "value": { 263 | "charts": [ 264 | { 265 | "metrics": [ 266 | { 267 | "name": "EHINMSGS", 268 | "resourceMetadata": { 269 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 270 | }, 271 | 
"aggregationType": 4 272 | }, 273 | { 274 | "name": "EHOUTMSGS", 275 | "resourceMetadata": { 276 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 277 | }, 278 | "aggregationType": 4 279 | } 280 | ], 281 | "title": "Event Hubs: Incoming and Outgoing Messages", 282 | "itemDataModel": { 283 | "id": "3CF0CCC0-0F55-45A6-922E-F0F9C5AE0ECB", 284 | "chartHeight": 1, 285 | "metrics": [ 286 | { 287 | "id": { 288 | "resourceDefinition": { 289 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 290 | }, 291 | "name": { 292 | "id": "EHINMSGS", 293 | "displayName": "Incoming Messages" 294 | }, 295 | "dataSource": 1, 296 | "namespace": { 297 | "name": "microsoft.eventhub/namespaces" 298 | } 299 | }, 300 | "metricAggregation": 1, 301 | "color": "#47BDF5" 302 | }, 303 | { 304 | "id": { 305 | "resourceDefinition": { 306 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 307 | }, 308 | "name": { 309 | "id": "EHOUTMSGS", 310 | "displayName": "Outgoing Messages" 311 | }, 312 | "dataSource": 1, 313 | "namespace": { 314 | "name": "microsoft.eventhub/namespaces" 315 | } 316 | }, 317 | "metricAggregation": 1, 318 | "color": "#7E58FF" 319 | } 320 | ], 321 | "priorPeriod": false, 322 | "horizontalBars": true, 323 | "showOther": false, 324 | "aggregation": 1, 325 | "percentage": false, 326 | "palette": "multiColor", 327 | "jsonDefinitionId": "771FFF64-83E4-40E8-A300-8F4C88021D8A", 328 | "filters": { 329 | "filterType": 0, 330 | "id": "59EAC188-32DD-44FD-B254-FFFA46671BD6", 331 | "OperandFilters": [], 332 | "LogicalOperator": 0 333 | }, 334 | "yAxisOptions": { 335 | "options": 1 
336 | }, 337 | "title": "Event Hubs: Incoming and Outgoing Messages", 338 | "titleKind": "Auto" 339 | } 340 | } 341 | ], 342 | "v2charts": true, 343 | "version": 1 344 | } 345 | }, 346 | { 347 | "name": "sharedTimeRange", 348 | "isOptional": true 349 | } 350 | ], 351 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 352 | "settings": {} 353 | } 354 | }, 355 | "1": { 356 | "position": { 357 | "x": 6, 358 | "y": 0, 359 | "colSpan": 6, 360 | "rowSpan": 4 361 | }, 362 | "metadata": { 363 | "inputs": [ 364 | { 365 | "name": "options", 366 | "value": { 367 | "charts": [ 368 | { 369 | "metrics": [ 370 | { 371 | "name": "ThrottledRequests", 372 | "resourceMetadata": { 373 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 374 | }, 375 | "aggregationType": 4 376 | }, 377 | { 378 | "name": "QuotaExceededErrors", 379 | "resourceMetadata": { 380 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 381 | }, 382 | "aggregationType": 4 383 | } 384 | ], 385 | "title": "Event Hubs: Throttled Requests and Quota Exceeded Errors", 386 | "itemDataModel": { 387 | "id": "4B0ADDC7-C6F8-4942-8EF8-46860F6E75B6", 388 | "chartHeight": 1, 389 | "metrics": [ 390 | { 391 | "id": { 392 | "resourceDefinition": { 393 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 394 | }, 395 | "name": { 396 | "id": "ThrottledRequests", 397 | "displayName": "Throttled Requests. 
(Preview)" 398 | }, 399 | "dataSource": 1, 400 | "namespace": { 401 | "name": "microsoft.eventhub/namespaces" 402 | } 403 | }, 404 | "metricAggregation": 1, 405 | "color": "#47BDF5" 406 | }, 407 | { 408 | "id": { 409 | "resourceDefinition": { 410 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.EventHub/namespaces/', parameters('eventHubNamespace'))]" 411 | }, 412 | "name": { 413 | "id": "QuotaExceededErrors", 414 | "displayName": "Quota Exceeded Errors. (Preview)" 415 | }, 416 | "dataSource": 1, 417 | "namespace": { 418 | "name": "microsoft.eventhub/namespaces" 419 | } 420 | }, 421 | "metricAggregation": 1, 422 | "color": "#7E58FF" 423 | } 424 | ], 425 | "priorPeriod": false, 426 | "horizontalBars": true, 427 | "showOther": false, 428 | "aggregation": 1, 429 | "percentage": false, 430 | "palette": "multiColor", 431 | "jsonDefinitionId": "81254AC7-74E6-4D88-9E9D-98712187ED68", 432 | "filters": { 433 | "filterType": 0, 434 | "id": "FA760F3B-8F51-48E6-A39B-ECE400B8B459", 435 | "OperandFilters": [], 436 | "LogicalOperator": 0 437 | }, 438 | "yAxisOptions": { 439 | "options": 1 440 | }, 441 | "title": "Event Hubs: Throttled Requests and Quota Exceeded Errors", 442 | "titleKind": "Auto" 443 | } 444 | } 445 | ], 446 | "v2charts": true, 447 | "version": 1 448 | } 449 | }, 450 | { 451 | "name": "sharedTimeRange", 452 | "isOptional": true 453 | } 454 | ], 455 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 456 | "settings": {} 457 | } 458 | }, 459 | "2": { 460 | "position": { 461 | "x": 0, 462 | "y": 4, 463 | "colSpan": 6, 464 | "rowSpan": 4 465 | }, 466 | "metadata": { 467 | "inputs": [ 468 | { 469 | "name": "options", 470 | "value": { 471 | "charts": [ 472 | { 473 | "metrics": [ 474 | { 475 | "name": "ResourceUtilization", 476 | "resourceMetadata": { 477 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, 
'/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 478 | }, 479 | "aggregationType": 3 480 | } 481 | ], 482 | "title": "Stream Analytics: Max SU % Utilization", 483 | "itemDataModel": { 484 | "id": "C7B5B71D-2729-467B-B7DB-0567A5822281", 485 | "chartHeight": 1, 486 | "metrics": [ 487 | { 488 | "id": { 489 | "resourceDefinition": { 490 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 491 | }, 492 | "name": { 493 | "id": "ResourceUtilization", 494 | "displayName": "SU % Utilization" 495 | }, 496 | "dataSource": 1, 497 | "namespace": { 498 | "name": "microsoft.streamanalytics/streamingjobs" 499 | } 500 | }, 501 | "metricAggregation": 3, 502 | "color": "#47BDF5" 503 | } 504 | ], 505 | "priorPeriod": false, 506 | "horizontalBars": true, 507 | "showOther": false, 508 | "aggregation": 1, 509 | "percentage": false, 510 | "palette": "multiColor", 511 | "jsonDefinitionId": "A3B3621F-727C-4A6B-8606-4AE3A7353BAA", 512 | "filters": { 513 | "filterType": 0, 514 | "id": "480FEAE9-33B5-4DEC-BD7B-34BBCFCBCCC7", 515 | "OperandFilters": [], 516 | "LogicalOperator": 0 517 | }, 518 | "yAxisOptions": { 519 | "options": 1 520 | }, 521 | "title": "Stream Analytics: Max SU % Utilization", 522 | "titleKind": "Auto" 523 | } 524 | } 525 | ], 526 | "v2charts": true, 527 | "version": 1 528 | } 529 | }, 530 | { 531 | "name": "sharedTimeRange", 532 | "isOptional": true 533 | } 534 | ], 535 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 536 | "settings": {} 537 | } 538 | }, 539 | "3": { 540 | "position": { 541 | "x": 6, 542 | "y": 4, 543 | "colSpan": 6, 544 | "rowSpan": 4 545 | }, 546 | "metadata": { 547 | "inputs": [ 548 | { 549 | "name": "options", 550 | "value": { 551 | "charts": [ 552 | { 553 | "metrics": [ 554 | { 555 | "name": "ConversionErrors", 556 | "resourceMetadata": { 557 | "resourceId": "[concat('/subscriptions/', 
subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 558 | }, 559 | "aggregationType": 4 560 | }, 561 | { 562 | "name": "DeserializationError", 563 | "resourceMetadata": { 564 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 565 | }, 566 | "aggregationType": 4 567 | }, 568 | { 569 | "name": "Errors", 570 | "resourceMetadata": { 571 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 572 | }, 573 | "aggregationType": 4 574 | } 575 | ], 576 | "title": "Stream Analytics: Data Conversion Errors, Deserialization Errors, and Runtime Errors", 577 | "itemDataModel": { 578 | "id": "E6DD6E8A-F2EA-438C-8935-F2C5896FBE73", 579 | "chartHeight": 1, 580 | "metrics": [ 581 | { 582 | "id": { 583 | "resourceDefinition": { 584 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 585 | }, 586 | "name": { 587 | "id": "ConversionErrors", 588 | "displayName": "Data Conversion Errors" 589 | }, 590 | "dataSource": 1, 591 | "namespace": { 592 | "name": "microsoft.streamanalytics/streamingjobs" 593 | } 594 | }, 595 | "metricAggregation": 1, 596 | "color": "#47BDF5" 597 | }, 598 | { 599 | "id": { 600 | "resourceDefinition": { 601 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 602 | }, 603 | "name": { 604 | "id": "DeserializationError", 605 | "displayName": "Input Deserialization Errors" 606 | }, 607 | "dataSource": 1, 608 | "namespace": { 609 | "name": 
"microsoft.streamanalytics/streamingjobs" 610 | } 611 | }, 612 | "metricAggregation": 1, 613 | "color": "#7E58FF" 614 | }, 615 | { 616 | "id": { 617 | "resourceDefinition": { 618 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.StreamAnalytics/streamingjobs/taxi-asa-job')]" 619 | }, 620 | "name": { 621 | "id": "Errors", 622 | "displayName": "Runtime Errors" 623 | }, 624 | "dataSource": 1, 625 | "namespace": { 626 | "name": "microsoft.streamanalytics/streamingjobs" 627 | } 628 | }, 629 | "metricAggregation": 1, 630 | "color": "#44F1C8" 631 | } 632 | ], 633 | "priorPeriod": false, 634 | "horizontalBars": true, 635 | "showOther": false, 636 | "aggregation": 1, 637 | "percentage": false, 638 | "palette": "multiColor", 639 | "jsonDefinitionId": "19F54BD0-3513-4B6F-9FDA-5D1D51C9937F", 640 | "filters": { 641 | "filterType": 0, 642 | "id": "063A67A3-3287-495B-8EED-913777517AA0", 643 | "OperandFilters": [], 644 | "LogicalOperator": 0 645 | }, 646 | "yAxisOptions": { 647 | "options": 1 648 | }, 649 | "title": "Stream Analytics: Data Conversion Errors, Deserialization Errors, and Runtime Errors", 650 | "titleKind": "Auto" 651 | } 652 | } 653 | ], 654 | "v2charts": true, 655 | "version": 1 656 | } 657 | }, 658 | { 659 | "name": "sharedTimeRange", 660 | "isOptional": true 661 | } 662 | ], 663 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 664 | "settings": {} 665 | } 666 | }, 667 | "4": { 668 | "position": { 669 | "x": 0, 670 | "y": 8, 671 | "colSpan": 6, 672 | "rowSpan": 4 673 | }, 674 | "metadata": { 675 | "inputs": [ 676 | { 677 | "name": "options", 678 | "value": { 679 | "charts": [ 680 | { 681 | "metrics": [ 682 | { 683 | "name": "TotalRequestUnits", 684 | "resourceMetadata": { 685 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', 
parameters('outputCosmosDatabaseAccount'))]" 686 | }, 687 | "aggregationType": 1 688 | } 689 | ], 690 | "title": "Cosmos DB: Avg Total Request Units", 691 | "itemDataModel": { 692 | "id": "329974C4-5AD4-40C9-ACE4-BD67B3C4C1A4", 693 | "chartHeight": 1, 694 | "metrics": [ 695 | { 696 | "id": { 697 | "resourceDefinition": { 698 | "id": "[concat('/subscriptions/', subscription().subscriptionId, '/resourcegroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 699 | }, 700 | "name": { 701 | "id": "TotalRequestUnits", 702 | "displayName": "Total Request Units" 703 | }, 704 | "dataSource": 1, 705 | "namespace": { 706 | "name": "microsoft.documentdb/databaseaccounts" 707 | } 708 | }, 709 | "metricAggregation": 4, 710 | "color": "#47BDF5" 711 | } 712 | ], 713 | "priorPeriod": false, 714 | "horizontalBars": true, 715 | "showOther": false, 716 | "aggregation": 1, 717 | "percentage": false, 718 | "palette": "multiColor", 719 | "jsonDefinitionId": "F0F77627-9C84-4676-BEF1-8CFE580289AD", 720 | "filters": { 721 | "filterType": 0, 722 | "id": "91D04A1F-A199-4439-B026-A5803D94818F", 723 | "OperandFilters": [], 724 | "LogicalOperator": 0 725 | }, 726 | "yAxisOptions": { 727 | "options": 1 728 | }, 729 | "title": "Cosmos DB: Avg Total Request Units", 730 | "titleKind": "Auto" 731 | } 732 | } 733 | ], 734 | "v2charts": true, 735 | "version": 1 736 | } 737 | }, 738 | { 739 | "name": "sharedTimeRange", 740 | "isOptional": true 741 | } 742 | ], 743 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 744 | "settings": {} 745 | } 746 | }, 747 | "5": { 748 | "position": { 749 | "x": 6, 750 | "y": 8, 751 | "colSpan": 6, 752 | "rowSpan": 4 753 | }, 754 | "metadata": { 755 | "inputs": [{ 756 | "name": "options", 757 | "value": { 758 | "charts": [{ 759 | "metrics": [{ 760 | "resourceMetadata": { 761 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', 
resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 762 | }, 763 | "aggregationType": 4, 764 | "name": "Http 2xx", 765 | "unit": 0 766 | }, 767 | { 768 | "resourceMetadata": { 769 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 770 | }, 771 | "aggregationType": 4, 772 | "name": "Http 3xx", 773 | "unit": 0 774 | }, 775 | { 776 | "resourceMetadata": { 777 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 778 | }, 779 | "aggregationType": 4, 780 | "name": "Http 400", 781 | "unit": 0 782 | }, 783 | { 784 | "resourceMetadata": { 785 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 786 | }, 787 | "aggregationType": 4, 788 | "name": "Http 401", 789 | "unit": 0 790 | } 791 | , 792 | { 793 | "resourceMetadata": { 794 | "resourceId": "[concat('/subscriptions/', subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.DocumentDB/databaseAccounts/', parameters('outputCosmosDatabaseAccount'))]" 795 | }, 796 | "aggregationType": 4, 797 | "name": "Throttled Requests", 798 | "unit": 0 799 | } ], 800 | "chartType": 0, 801 | "timespan": { 802 | "relative": { 803 | "durationMs": 86400000 804 | } 805 | }, 806 | "title": "Cosmos DB: HTTP Responses" 807 | }] 808 | } 809 | }, 810 | { 811 | "name": "sharedTimeRange", 812 | "binding": "timeRange", 813 | "isOptional": true 814 | }], 815 | "type": "Extension/HubsExtension/PartType/MonitorChartPart", 816 | "settings": { 817 | } 818 | } 819 | } 820 | 
} 821 | } 822 | } 823 | } 824 | } 825 | ] 826 | } -------------------------------------------------------------------------------- /onprem/DataLoader/.vscode/launch.json: -------------------------------------------------------------------------------- 1 | { 2 | // Use IntelliSense to find out which attributes exist for C# debugging 3 | // Use hover for the description of the existing attributes 4 | // For further information visit https://github.com/OmniSharp/omnisharp-vscode/blob/master/debugger-launchjson.md 5 | "version": "0.2.0", 6 | "configurations": [ 7 | { 8 | "name": ".NET Core Launch (console)", 9 | "type": "coreclr", 10 | "request": "launch", 11 | "preLaunchTask": "build", 12 | "program": "${workspaceRoot}/bin/Debug/netcoreapp2.0/taxi.dll", 13 | "args": [], 14 | "cwd": "${workspaceRoot}", 15 | "stopAtEntry": false, 16 | "console": "internalConsole", 17 | "env": { 18 | "RIDE_EVENT_HUB": "", 19 | "FARE_EVENT_HUB": "", 20 | "RIDE_DATA_FILE_PATH": "", 21 | "MINUTES_TO_LEAD": "", 22 | "PUSH_RIDE_DATA_FIRST": "" 23 | } 24 | }, 25 | { 26 | "name": ".NET Core Attach", 27 | "type": "coreclr", 28 | "request": "attach", 29 | "processId": "${command:pickProcess}" 30 | } 31 | ] 32 | } 33 | -------------------------------------------------------------------------------- /onprem/DataLoader/.vscode/tasks.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "2.0.0", 3 | "tasks": [ 4 | { 5 | "label": "build", 6 | "command": "dotnet", 7 | "type": "process", 8 | "group": { 9 | "kind": "build", 10 | "isDefault": true 11 | }, 12 | "args": [ 13 | "build", 14 | "${workspaceFolder}/taxi.csproj" 15 | ], 16 | "problemMatcher": "$msCompile" 17 | } 18 | ] 19 | } -------------------------------------------------------------------------------- /onprem/DataLoader/DataFormat.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | public enum DataFormat 4 | { 5 | Csv, 6 | Json 
7 | } 8 | } -------------------------------------------------------------------------------- /onprem/DataLoader/ObjectPool.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | using System; 4 | using System.Collections.Concurrent; 5 | 6 | public class ObjectPool<T> 7 | where T: class 8 | { 9 | private BlockingCollection<ObjectPoolObject> _pool = new BlockingCollection<ObjectPoolObject>(); 10 | private Func<T> _factory; 11 | private int _poolSize; 12 | 13 | public ObjectPool(Func<T> factory, int poolSize = 10) 14 | { 15 | _factory = factory ?? throw new ArgumentNullException(nameof(factory)); 16 | _poolSize = poolSize; 17 | Initialize(); 18 | } 19 | 20 | private void Initialize() 21 | { 22 | for (int i = 0; i < _poolSize; i++) 23 | { 24 | _pool.Add(new ObjectPoolObject(_factory(), this)); 25 | } 26 | } 27 | 28 | public ObjectPoolObject GetObject() 29 | { 30 | return _pool.Take(); 31 | } 32 | 33 | private void Return(ObjectPoolObject obj) 34 | { 35 | if (obj == null) 36 | { 37 | throw new ArgumentNullException(nameof(obj)); 38 | } 39 | 40 | _pool.Add(obj); 41 | } 42 | 43 | public class ObjectPoolObject : IDisposable 44 | { 45 | private T _obj; 46 | private ObjectPool<T> _objectPool; 47 | 48 | internal ObjectPoolObject(T obj, ObjectPool<T> objectPool) 49 | { 50 | _obj = obj ?? throw new ArgumentNullException(nameof(obj)); 51 | _objectPool = objectPool ?? 
throw new ArgumentNullException(nameof(objectPool)); 52 | } 53 | 54 | public void Dispose() 55 | { 56 | _objectPool.Return(this); 57 | } 58 | 59 | public T Value 60 | { 61 | get => _obj; 62 | } 63 | 64 | public static explicit operator T(ObjectPoolObject poolObject) 65 | { 66 | return poolObject._obj; 67 | } 68 | } 69 | } 70 | } -------------------------------------------------------------------------------- /onprem/DataLoader/Program.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | using System; 4 | using System.Collections.Concurrent; 5 | using System.Collections.Generic; 6 | using System.IO; 7 | using System.IO.Compression; 8 | using System.Linq; 9 | using System.Text; 10 | using System.Threading; 11 | using System.Threading.Tasks; 12 | using Microsoft.Azure.EventHubs; 13 | using Newtonsoft.Json; 14 | using System.Threading.Tasks.Dataflow; 15 | 16 | 17 | class Program 18 | { 19 | 20 | private static CancellationTokenSource cts; 21 | private static async Task ReadData<T>(ICollection<string> pathList, Func<string, string, T> factory, 22 | ObjectPool<EventHubClient> pool, int randomSeed, AsyncConsole console, int waittime, DataFormat dataFormat) 23 | where T : TaxiData 24 | { 25 | 26 | 27 | if (pathList == null) 28 | { 29 | throw new ArgumentNullException(nameof(pathList)); 30 | } 31 | 32 | if (factory == null) 33 | { 34 | throw new ArgumentNullException(nameof(factory)); 35 | } 36 | 37 | if (pool == null) 38 | { 39 | throw new ArgumentNullException(nameof(pool)); 40 | } 41 | 42 | if (console == null) 43 | { 44 | throw new ArgumentNullException(nameof(console)); 45 | } 46 | 47 | if (waittime > 0) 48 | { 49 | TimeSpan span = TimeSpan.FromMilliseconds(waittime); 50 | await Task.Delay(span); 51 | } 52 | 53 | string typeName = typeof(T).Name; 54 | Random random = new Random(randomSeed); 55 | 56 | // Buffer block that holds the messages. The consumer fetches records from this block asynchronously. 
57 | BufferBlock<T> buffer = new BufferBlock<T>(new DataflowBlockOptions() 58 | { 59 | BoundedCapacity = 100000 60 | }); 61 | 62 | // Consumer that sends the data to the event hub asynchronously. 63 | var consumer = new ActionBlock<T>( 64 | (t) => 65 | { 66 | using (var client = pool.GetObject()) 67 | { 68 | return client.Value.SendAsync(new EventData(Encoding.UTF8.GetBytes( 69 | t.GetData(dataFormat))), t.PartitionKey).ContinueWith( 70 | async task => 71 | { 72 | cts.Cancel(); 73 | await console.WriteLine(task.Exception.InnerException.Message); 74 | await console.WriteLine($"event hub client failed for {typeName}"); 75 | } 76 | , TaskContinuationOptions.OnlyOnFaulted 77 | ); 78 | } 79 | }, 80 | new ExecutionDataflowBlockOptions 81 | { 82 | BoundedCapacity = 100000, 83 | CancellationToken = cts.Token, 84 | MaxDegreeOfParallelism = 100, 85 | } 86 | ); 87 | 88 | // Link the buffer to the consumer. 89 | buffer.LinkTo(consumer, new DataflowLinkOptions() 90 | { 91 | PropagateCompletion = true 92 | }); 93 | 94 | long messages = 0; 95 | 96 | List<Task> taskList = new List<Task>(); 97 | 98 | var readTask = Task.Factory.StartNew( 99 | async () => 100 | { 101 | // Iterate through the path list and process each file in turn. 102 | foreach (var path in pathList) 103 | { 104 | using (var archive = new ZipArchive(File.OpenRead(path), 105 | ZipArchiveMode.Read)) 106 | { 107 | foreach (var entry in archive.Entries) 108 | { 109 | using (var reader = new StreamReader(entry.Open())) 110 | { 111 | 112 | var header = reader.ReadLines() 113 | .First(); 114 | // Start consumer 115 | var lines = reader.ReadLines() 116 | .Skip(1); 117 | 118 | 119 | // Send each line to the event hub. 120 | foreach (var line in lines) 121 | { 122 | // Proceed only if the previous send operation succeeded. 123 | // Cancellation is requested if a send fails. 
124 | if (cts.IsCancellationRequested) 125 | { 126 | break; 127 | } 128 | await buffer.SendAsync(factory(line, header)).ConfigureAwait(false); 129 | if (++messages % 10000 == 0) 130 | { 131 | // Pause for a random 100-999 ms after every 10000 buffered messages. 132 | await Task.Delay(random.Next(100, 1000)) 133 | .ConfigureAwait(false); 134 | await console.WriteLine($"Created {messages} records for {typeName}").ConfigureAwait(false); 135 | } 136 | 137 | } 138 | } 139 | 140 | if (cts.IsCancellationRequested) 141 | { 142 | break; 143 | } 144 | } 145 | 146 | if (cts.IsCancellationRequested) 147 | { 148 | break; 149 | } 150 | } 151 | 152 | buffer.Complete(); 153 | await Task.WhenAll(buffer.Completion, consumer.Completion); 154 | await console.WriteLine($"Created total {messages} records for {typeName}").ConfigureAwait(false); 155 | } 156 | } 157 | ).Unwrap().ContinueWith( 158 | async task => 159 | { 160 | cts.Cancel(); 161 | await console.WriteLine($"failed to read files for {typeName}").ConfigureAwait(false); 162 | await console.WriteLine(task.Exception.InnerException.Message).ConfigureAwait(false); 163 | } 164 | , TaskContinuationOptions.OnlyOnFaulted 165 | ); 166 | 167 | 168 | // Await consumer completion. If sending fails at any point, 169 | // an exception is thrown and caught. 
This signals the read operation to cancel and aborts all further activity. 170 | 171 | try 172 | { 173 | await Task.WhenAll(consumer.Completion, readTask); 174 | } 175 | catch (Exception ex) 176 | { 177 | cts.Cancel(); 178 | await console.WriteLine(ex.Message).ConfigureAwait(false); 179 | await console.WriteLine($"failed to send files for {typeName}").ConfigureAwait(false); 180 | throw; 181 | } 182 | 183 | } 184 | 185 | 186 | private static (string RideConnectionString, 187 | string FareConnectionString, 188 | ICollection<string> RideDataFiles, 189 | ICollection<string> TripDataFiles, 190 | int MillisecondsToRun, 191 | int MillisecondsToLead, 192 | bool sendRideDataFirst) ParseArguments() 193 | { 194 | 195 | var rideConnectionString = Environment.GetEnvironmentVariable("RIDE_EVENT_HUB"); 196 | var fareConnectionString = Environment.GetEnvironmentVariable("FARE_EVENT_HUB"); 197 | var rideDataFilePath = Environment.GetEnvironmentVariable("RIDE_DATA_FILE_PATH"); 198 | var numberOfMillisecondsToRun = (int.TryParse(Environment.GetEnvironmentVariable("SECONDS_TO_RUN"), out int outputSecondToRun) ? outputSecondToRun : 0) * 1000; 199 | var numberOfMillisecondsToLead = (int.TryParse(Environment.GetEnvironmentVariable("MINUTES_TO_LEAD"), out int outputMinutesToLead) ? outputMinutesToLead : 0) * 60000; 200 | var pushRideDataFirst = bool.TryParse(Environment.GetEnvironmentVariable("PUSH_RIDE_DATA_FIRST"), out Boolean outputPushRideDataFirst) ? 
outputPushRideDataFirst : false; 201 | 202 | if (string.IsNullOrWhiteSpace(rideConnectionString)) 203 | { 204 | throw new ArgumentException("rideConnectionString must be provided"); 205 | } 206 | 207 | if (string.IsNullOrWhiteSpace(fareConnectionString)) 208 | { 209 | throw new ArgumentException("fareConnectionString must be provided"); 210 | } 211 | 212 | if (string.IsNullOrWhiteSpace(rideDataFilePath)) 213 | { 214 | throw new ArgumentException("rideDataFilePath must be provided"); 215 | } 216 | 217 | if (!Directory.Exists(rideDataFilePath)) 218 | { 219 | throw new ArgumentException("ride file path does not exist"); 220 | } 221 | // Get only the ride files, in order: trip_data_1.zip is read before trip_data_2.zip. 222 | var rideDataFiles = Directory.EnumerateFiles(rideDataFilePath) 223 | .Where(p => Path.GetFileNameWithoutExtension(p).Contains("trip_data")) 224 | .OrderBy(p => 225 | { 226 | var filename = Path.GetFileNameWithoutExtension(p); 227 | var indexString = filename.Substring(filename.LastIndexOf('_') + 1); 228 | var index = int.TryParse(indexString, out int i) ? i : throw new ArgumentException("tripdata file must be named in format trip_data_*.zip"); 229 | return index; 230 | }).ToArray(); 231 | 232 | // Get only the fare files, in order. 233 | var fareDataFiles = Directory.EnumerateFiles(rideDataFilePath) 234 | .Where(p => Path.GetFileNameWithoutExtension(p).Contains("trip_fare")) 235 | .OrderBy(p => 236 | { 237 | var filename = Path.GetFileNameWithoutExtension(p); 238 | var indexString = filename.Substring(filename.LastIndexOf('_') + 1); 239 | var index = int.TryParse(indexString, out int i) ? 
i : throw new ArgumentException("tripfare file must be named in format trip_fare_*.zip"); 240 | return index; 241 | }).ToArray(); 242 | 243 | if (rideDataFiles.Length == 0) 244 | { 245 | throw new ArgumentException($"trip data files at {rideDataFilePath} do not exist"); 246 | } 247 | 248 | if (fareDataFiles.Length == 0) 249 | { 250 | throw new ArgumentException($"fare data files at {rideDataFilePath} do not exist"); 251 | } 252 | 253 | return (rideConnectionString, fareConnectionString, rideDataFiles, fareDataFiles, numberOfMillisecondsToRun, numberOfMillisecondsToLead, pushRideDataFirst); 254 | } 255 | 256 | 257 | // Console writer backed by a blocking collection; prints progress messages for reading files and sending them to the event hub. 258 | private class AsyncConsole 259 | { 260 | private BlockingCollection<string> _blockingCollection = new BlockingCollection<string>(); 261 | private CancellationToken _cancellationToken; 262 | private Task _writerTask; 263 | 264 | public AsyncConsole(CancellationToken cancellationToken = default(CancellationToken)) 265 | { 266 | _cancellationToken = cancellationToken; 267 | _writerTask = Task.Factory.StartNew((state) => 268 | { 269 | var token = (CancellationToken)state; 270 | string msg; 271 | while (!token.IsCancellationRequested) 272 | { 273 | if (_blockingCollection.TryTake(out msg, 500)) 274 | { 275 | Console.WriteLine(msg); 276 | } 277 | } 278 | 279 | while (_blockingCollection.TryTake(out msg, 100)) 280 | { 281 | Console.WriteLine(msg); 282 | } 283 | }, _cancellationToken, TaskCreationOptions.LongRunning); 284 | } 285 | 286 | public Task WriteLine(string toWrite) 287 | { 288 | _blockingCollection.Add(toWrite); 289 | return Task.FromResult(0); 290 | } 291 | 292 | public Task WriterTask 293 | { 294 | get { return _writerTask; } 295 | } 296 | } 297 | 298 | // Entry point: starts the ride and fare read tasks. 299 | public static async Task<int> Main(string[] args) 300 | { 301 | try 302 | { 303 | var arguments = ParseArguments(); 304 | var rideClient = 
EventHubClient.CreateFromConnectionString( 305 | arguments.RideConnectionString 306 | ); 307 | var fareClient = EventHubClient.CreateFromConnectionString( 308 | arguments.FareConnectionString 309 | ); 310 | 311 | cts = arguments.MillisecondsToRun == 0 ? new CancellationTokenSource() : new CancellationTokenSource(arguments.MillisecondsToRun); 312 | 313 | Console.CancelKeyPress += (s, e) => 314 | { 315 | //Console.WriteLine("Cancelling data generation"); 316 | cts.Cancel(); 317 | e.Cancel = true; 318 | }; 319 | 320 | 321 | AsyncConsole console = new AsyncConsole(cts.Token); 322 | 323 | var rideClientPool = new ObjectPool<EventHubClient>(() => EventHubClient.CreateFromConnectionString(arguments.RideConnectionString), 100); 324 | var fareClientPool = new ObjectPool<EventHubClient>(() => EventHubClient.CreateFromConnectionString(arguments.FareConnectionString), 100); 325 | 326 | 327 | var numberOfMillisecondsToLead = arguments.MillisecondsToLead; 328 | var pushRideDataFirst = arguments.sendRideDataFirst; 329 | 330 | var rideTaskWaitTime = 0; 331 | var fareTaskWaitTime = 0; 332 | 333 | if (numberOfMillisecondsToLead > 0) 334 | { 335 | if (!pushRideDataFirst) 336 | { 337 | rideTaskWaitTime = numberOfMillisecondsToLead; 338 | } 339 | else 340 | { 341 | fareTaskWaitTime = numberOfMillisecondsToLead; 342 | } 343 | } 344 | 345 | 346 | var rideTask = ReadData(arguments.RideDataFiles, 347 | TaxiRide.FromString, rideClientPool, 100, console, 348 | rideTaskWaitTime, DataFormat.Json); 349 | 350 | var fareTask = ReadData(arguments.TripDataFiles, 351 | TaxiFare.FromString, fareClientPool, 200, console, 352 | fareTaskWaitTime, DataFormat.Csv); 353 | 354 | 355 | await Task.WhenAll(rideTask, fareTask, console.WriterTask); 356 | Console.WriteLine("Data generation complete"); 357 | } 358 | catch (Exception ex) 359 | { 360 | Console.WriteLine(ex.Message); 361 | Console.WriteLine("Data generation failed"); 362 | return 1; 363 | } 364 | 365 | return 0; 366 | } 367 | } 368 | } 
-------------------------------------------------------------------------------- /onprem/DataLoader/StreamReaderExtensions.cs: -------------------------------------------------------------------------------- 1 | using System; 2 | using System.Collections.Generic; 3 | using System.IO; 4 | using System.Linq; 5 | using Microsoft.Azure.EventHubs; 6 | 7 | namespace Taxi 8 | { 9 | public static class StreamReaderExtensions 10 | { 11 | public static IEnumerable<string> ReadLines(this StreamReader reader) 12 | { 13 | if (reader == null) 14 | { 15 | throw new ArgumentNullException(nameof(reader)); 16 | } 17 | 18 | string line = null; 19 | while ((line = reader.ReadLine()) != null) 20 | { 21 | yield return line; 22 | } 23 | } 24 | } 25 | } -------------------------------------------------------------------------------- /onprem/DataLoader/TaxiData.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | using System; 4 | using System.Globalization; 5 | using Newtonsoft.Json; 6 | using Newtonsoft.Json.Serialization; 7 | 8 | [JsonObject(NamingStrategyType = typeof(CamelCaseNamingStrategy))] 9 | public abstract class TaxiData 10 | { 11 | public TaxiData() 12 | { 13 | } 14 | 15 | [JsonProperty] 16 | public long Medallion { get; set; } 17 | 18 | [JsonProperty] 19 | public long HackLicense { get; set; } 20 | 21 | [JsonProperty] 22 | public string VendorId { get; set; } 23 | 24 | [JsonProperty] 25 | public DateTimeOffset PickupTime { get; set; } 26 | 27 | [JsonIgnore] 28 | public string PartitionKey 29 | { 30 | get => $"{Medallion}_{HackLicense}_{VendorId}"; 31 | } 32 | 33 | [JsonIgnore] 34 | protected string CsvHeader { get; set; } 35 | 36 | 37 | [JsonIgnore] 38 | protected string CsvString { get; set; } 39 | 40 | public string GetData(DataFormat dataFormat) 41 | { 42 | if (dataFormat == DataFormat.Csv) 43 | { 44 | return $"{CsvHeader}\r\n{CsvString}"; 45 | } 46 | else if (dataFormat == DataFormat.Json) 47 | { 48 | return 
JsonConvert.SerializeObject(this); 49 | } 50 | else 51 | { 52 | throw new ArgumentException($"Invalid DataFormat: {dataFormat}"); 53 | } 54 | } 55 | } 56 | } -------------------------------------------------------------------------------- /onprem/DataLoader/TaxiFare.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | using System; 4 | using System.Globalization; 5 | using Newtonsoft.Json; 6 | using Newtonsoft.Json.Serialization; 7 | 8 | [JsonObject(NamingStrategyType = typeof(CamelCaseNamingStrategy))] 9 | public class TaxiFare : TaxiData 10 | 11 | { 12 | public TaxiFare() 13 | { 14 | } 15 | 16 | [JsonProperty] 17 | public string PaymentType { get; set; } 18 | 19 | [JsonProperty] 20 | public float FareAmount { get; set; } 21 | 22 | [JsonProperty] 23 | public float Surcharge { get; set; } 24 | 25 | [JsonProperty("mtaTax")] 26 | public float MTATax { get; set; } 27 | 28 | [JsonProperty] 29 | public float TipAmount { get; set; } 30 | 31 | [JsonProperty] 32 | public float TollsAmount { get; set; } 33 | 34 | [JsonProperty] 35 | public float TotalAmount { get; set; } 36 | 37 | public static TaxiFare FromString(string line,string header) 38 | { 39 | if (string.IsNullOrWhiteSpace(line)) 40 | { 41 | throw new ArgumentException($"{nameof(line)} cannot be null, empty, or only whitespace"); 42 | } 43 | 44 | string[] tokens = line.Split(','); 45 | if (tokens.Length != 11) 46 | { 47 | throw new ArgumentException($"Invalid record: {line}"); 48 | } 49 | 50 | var fare = new TaxiFare(); 51 | fare.CsvString = line; 52 | fare.CsvHeader = header; 53 | try 54 | { 55 | fare.Medallion = long.Parse(tokens[0]); 56 | fare.HackLicense = long.Parse(tokens[1]); 57 | fare.VendorId = tokens[2]; 58 | fare.PickupTime = DateTimeOffset.ParseExact( 59 | tokens[3], "yyyy-MM-dd HH:mm:ss", 60 | CultureInfo.InvariantCulture, 61 | DateTimeStyles.AssumeUniversal); 62 | fare.PaymentType = tokens[4]; 63 | fare.FareAmount = float.TryParse(tokens[5], 
out float result) ? result : 0.0f; 64 | fare.Surcharge = float.TryParse(tokens[6], out result) ? result : 0.0f; 65 | fare.MTATax = float.TryParse(tokens[7], out result) ? result : 0.0f; 66 | fare.TipAmount = float.TryParse(tokens[8], out result) ? result : 0.0f; 67 | fare.TollsAmount = float.TryParse(tokens[9], out result) ? result : 0.0f; 68 | fare.TotalAmount = float.TryParse(tokens[10], out result) ? result : 0.0f; 69 | return fare; 70 | } 71 | catch (Exception ex) 72 | { 73 | throw new ArgumentException($"Invalid record: {line}", ex); 74 | } 75 | } 76 | } 77 | } -------------------------------------------------------------------------------- /onprem/DataLoader/TaxiRide.cs: -------------------------------------------------------------------------------- 1 | namespace Taxi 2 | { 3 | using System; 4 | using System.Globalization; 5 | using Newtonsoft.Json; 6 | using Newtonsoft.Json.Serialization; 7 | 8 | [JsonObject(NamingStrategyType = typeof(CamelCaseNamingStrategy))] 9 | public class TaxiRide : TaxiData 10 | 11 | { 12 | public TaxiRide() 13 | { 14 | } 15 | 16 | [JsonProperty] 17 | public int RateCode { get; set; } 18 | 19 | [JsonProperty] 20 | public string StoreAndForwardFlag { get; set; } 21 | 22 | [JsonProperty] 23 | public DateTimeOffset DropoffTime { get; set; } 24 | 25 | [JsonProperty] 26 | public int PassengerCount { get; set; } 27 | 28 | [JsonProperty] 29 | public float TripTimeInSeconds { get; set; } 30 | 31 | [JsonProperty] 32 | public float TripDistanceInMiles { get; set; } 33 | 34 | [JsonProperty] 35 | public float PickupLon { get; set; } 36 | 37 | [JsonProperty] 38 | public float PickupLat { get; set; } 39 | 40 | [JsonProperty] 41 | public float DropoffLon { get; set; } 42 | 43 | [JsonProperty] 44 | public float DropoffLat { get; set; } 45 | 46 | public static TaxiRide FromString(string line,string header) 47 | { 48 | if (string.IsNullOrWhiteSpace(line)) 49 | { 50 | throw new ArgumentException($"{nameof(line)} cannot be null, empty, or only 
whitespace"); 51 | } 52 | 53 | string[] tokens = line.Split(','); 54 | if (tokens.Length != 14) 55 | { 56 | throw new ArgumentException($"Invalid record: {line}"); 57 | } 58 | 59 | var ride = new TaxiRide(); 60 | ride.CsvString = line; 61 | ride.CsvHeader = header; 62 | try 63 | { 64 | ride.Medallion = long.Parse(tokens[0]); 65 | ride.HackLicense = long.Parse(tokens[1]); 66 | ride.VendorId = tokens[2]; 67 | ride.RateCode = int.Parse(tokens[3]); 68 | ride.StoreAndForwardFlag = tokens[4]; 69 | ride.PickupTime = DateTimeOffset.ParseExact( 70 | tokens[5], "yyyy-MM-dd HH:mm:ss", 71 | CultureInfo.InvariantCulture, 72 | DateTimeStyles.AssumeUniversal); 73 | ride.DropoffTime = DateTimeOffset.ParseExact( 74 | tokens[6], "yyyy-MM-dd HH:mm:ss", 75 | CultureInfo.InvariantCulture, 76 | DateTimeStyles.AssumeUniversal); 77 | ride.PassengerCount = int.Parse(tokens[7]); 78 | ride.TripTimeInSeconds = float.Parse(tokens[8]); 79 | ride.TripDistanceInMiles = float.Parse(tokens[9]); 80 | 81 | ride.PickupLon = float.TryParse(tokens[10], out float result) ? result : 0.0f; 82 | ride.PickupLat = float.TryParse(tokens[11], out result) ? result : 0.0f; 83 | ride.DropoffLon = float.TryParse(tokens[12], out result) ? result : 0.0f; 84 | ride.DropoffLat = float.TryParse(tokens[13], out result) ? 
result : 0.0f; 85 | return ride; 86 | } 87 | catch (Exception ex) 88 | { 89 | throw new ArgumentException($"Invalid record: {line}", ex); 90 | } 91 | } 92 | } 93 | } -------------------------------------------------------------------------------- /onprem/DataLoader/taxi.csproj: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Exe 5 | netcoreapp2.0 6 | latest 7 | win10-x64 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | -------------------------------------------------------------------------------- /onprem/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM mcr.microsoft.com/dotnet/core/sdk:2.1 AS build 2 | RUN apt-get update 3 | RUN apt-get install -y git 4 | RUN git clone --recursive https://github.com/mspnp/azure-stream-analytics-data-pipeline.git && cd azure-stream-analytics-data-pipeline && git fetch && git checkout master 5 | WORKDIR azure-stream-analytics-data-pipeline/onprem/DataLoader 6 | RUN dotnet build -c Release 7 | RUN dotnet publish -f netcoreapp2.0 -c Release 8 | 9 | 10 | FROM mcr.microsoft.com/dotnet/core/runtime:2.1 AS runtime 11 | WORKDIR DataLoader 12 | COPY --from=build azure-stream-analytics-data-pipeline/onprem/DataLoader/bin/Release/netcoreapp2.0/publish . 13 | ENTRYPOINT ["dotnet" , "taxi.dll"] 14 | -------------------------------------------------------------------------------- /onprem/main.env: -------------------------------------------------------------------------------- 1 | RIDE_EVENT_HUB= 2 | FARE_EVENT_HUB= 3 | RIDE_DATA_FILE_PATH= 4 | MINUTES_TO_LEAD=
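A minimal sketch, not part of the repository: `ParseArguments` in `Program.cs` throws unless `RIDE_EVENT_HUB`, `FARE_EVENT_HUB`, and `RIDE_DATA_FILE_PATH` are set to non-blank values, so it may help to check `main.env` before starting the container. The helper name `check_env_file` is hypothetical, and the check assumes plain `KEY=value` lines with no quoting.

```bash
# Hypothetical helper (an assumption, not shipped with the loader):
# report which settings required by ParseArguments are missing or blank
# in a KEY=value env file such as onprem/main.env.
check_env_file() {
  local env_file="$1" missing=0 key value
  for key in RIDE_EVENT_HUB FARE_EVENT_HUB RIDE_DATA_FILE_PATH; do
    # Keep everything after the first '=' (connection strings contain '='
    # themselves); if the key is set more than once, the last line wins.
    value="$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1)"
    case "$value" in
      *[![:space:]]*) ;;  # at least one non-whitespace character: treated as set
      *)
        printf 'missing: %s\n' "$key"
        missing=1
        ;;
    esac
  done
  # Non-zero status when any required key is absent or blank.
  return "$missing"
}
```

Calling `check_env_file onprem/main.env` prints one `missing: ...` line per unset key and returns a non-zero status, so it can gate a later `docker run --env-file onprem/main.env ...` invocation. Optional settings such as `MINUTES_TO_LEAD` are deliberately not checked, because the loader falls back to defaults for them.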