├── .github ├── ISSUE_TEMPLATE.md └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CHANGELOG.md ├── CONTRIBUTING.md ├── CleanTrigger1 ├── __init__.py ├── clean.py ├── function.json └── sample.dat ├── CleanTrigger2 ├── __init__.py ├── clean.py ├── function.json └── sample.dat ├── LICENSE.md ├── README.md ├── Reconcile ├── __init__.py ├── clean.py ├── fetch_blob.py ├── function.json └── sample.dat ├── azure-deploy-event-grid-subscription.json ├── azure-deploy-linux-app-plan.json ├── azuredeploy.parameters.json ├── blob_to_smart_contract ├── __init__.py ├── clean.py ├── fetch_blob.py ├── function.json └── sample.dat ├── dataset ├── config.ini ├── randomcsvgenerator.py ├── s1_raw.csv └── s2_raw.csv ├── host.json ├── local.settings.json ├── requirements.txt └── tests ├── host.json ├── subvalidation.json └── test_eventgrid.py /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | 4 | > Please provide us with the following information: 5 | > --------------------------------------------------------------- 6 | 7 | ### This issue is for a: (mark with an `x`) 8 | ``` 9 | - [ ] bug report -> please search issues before submitting 10 | - [ ] feature request 11 | - [ ] documentation issue or request 12 | - [ ] regression (a behavior that used to work and stopped in a new release) 13 | ``` 14 | 15 | ### Minimal steps to reproduce 16 | > 17 | 18 | ### Any log messages given by the failure 19 | > 20 | 21 | ### Expected/desired behavior 22 | > 23 | 24 | ### OS and Version? 25 | > Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?) 26 | 27 | ### Versions 28 | > 29 | 30 | ### Mention any other details that might be useful 31 | 32 | > --------------------------------------------------------------- 33 | > Thanks! We'll be in touch soon. 
34 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | ## Purpose 2 | 3 | * ... 4 | 5 | ## Does this introduce a breaking change? 6 | 7 | ``` 8 | [ ] Yes 9 | [ ] No 10 | ``` 11 | 12 | ## Pull Request Type 13 | What kind of change does this Pull Request introduce? 14 | 15 | 16 | ``` 17 | [ ] Bugfix 18 | [ ] Feature 19 | [ ] Code style update (formatting, local variables) 20 | [ ] Refactoring (no functional changes, no API changes) 21 | [ ] Documentation content changes 22 | [ ] Other... Please describe: 23 | ``` 24 | 25 | ## How to Test 26 | * Get the code 27 | 28 | ``` 29 | git clone [repo-address] 30 | cd [repo-name] 31 | git checkout [branch-name] 32 | pip install -r requirements.txt 33 | ``` 34 | 35 | * Test the code 36 | 37 | ``` 38 | ``` 39 | 40 | ## What to Check 41 | Verify that the following are valid: 42 | * ... 43 | 44 | ## Other Information 45 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | ## [project-title] Changelog 2 | 3 | 4 | # x.y.z (yyyy-mm-dd) 5 | 6 | *Features* 7 | * ... 8 | 9 | *Bug Fixes* 10 | * ... 11 | 12 | *Breaking Changes* 13 | * ... 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to [project-title] 2 | 3 | This project welcomes contributions and suggestions. Most contributions require you to agree to a 4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us 5 | the rights to use your contribution. 
For details, visit https://cla.microsoft.com. 6 | 7 | When you submit a pull request, a CLA-bot will automatically determine whether you need to provide 8 | a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions 9 | provided by the bot. You will only need to do this once across all repos using our CLA. 10 | 11 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 12 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or 13 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 14 | 15 | - [Code of Conduct](#coc) 16 | - [Issues and Bugs](#issue) 17 | - [Feature Requests](#feature) 18 | - [Submission Guidelines](#submit) 19 | 20 | ## Code of Conduct 21 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 22 | 23 | ## Found an Issue? 24 | If you find a bug in the source code or a mistake in the documentation, you can help us by 25 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can 26 | [submit a Pull Request](#submit-pr) with a fix. 27 | 28 | ## Want a Feature? 29 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub 30 | Repository. If you would like to *implement* a new feature, please submit an issue with 31 | a proposal for your work first, to be sure that we can use it. 32 | 33 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr). 34 | 35 | ## Submission Guidelines 36 | 37 | ### Submitting an Issue 38 | Before you submit an issue, search the archive, maybe your question was already answered. 39 | 40 | If your issue appears to be a bug, and hasn't been reported, open a new issue. 
41 | Help us to maximize the effort we can spend fixing issues and adding new 42 | features by not reporting duplicate issues. Providing the following information will increase the 43 | chances of your issue being dealt with quickly: 44 | 45 | * **Overview of the Issue** - if an error is being thrown, a non-minified stack trace helps 46 | * **Version** - what version is affected (e.g. 0.1.2) 47 | * **Motivation for or Use Case** - explain what you are trying to do and why the current behavior is a bug for you 48 | * **Browsers and Operating System** - is this a problem with all browsers? 49 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps 50 | * **Related Issues** - has a similar issue been reported before? 51 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be 52 | causing the problem (line of code or commit) 53 | 54 | You can file new issues by providing the above information at the corresponding repository's issues link: https://github.com/[organization-name]/[repository-name]/issues/new. 55 | 56 | ### Submitting a Pull Request (PR) 57 | Before you submit your Pull Request (PR), consider the following guidelines: 58 | 59 | * Search the repository (https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR 60 | that relates to your submission. You don't want to duplicate effort. 61 | 62 | * Make your changes in a new git fork: 63 | 64 | * Commit your changes using a descriptive commit message 65 | * Push your fork to GitHub: 66 | * In GitHub, create a pull request 67 | * If we suggest changes, then: 68 | * Make the required updates. 69 | * Rebase your fork and force push to your GitHub repository (this will update your Pull Request): 70 | 71 | ```shell 72 | git rebase master -i 73 | git push -f 74 | ``` 75 | 76 | That's it! Thank you for your contribution!
77 | -------------------------------------------------------------------------------- /CleanTrigger1/__init__.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import azure.functions as func 4 | from . import clean as cleaning_service 5 | 6 | def main(req: func.HttpRequest) -> func.HttpResponse: 7 | # Log each request; visible in the function's log stream 8 | logging.info('Python HTTP trigger function processed a request.') 9 | req_body = req.get_json() 10 | 11 | if is_validation_event(req_body): 12 | return func.HttpResponse(validate_eg(req_body)) 13 | 14 | elif is_blob_created_event(req_body): 15 | result = cleaning_service.clean(req_body) 16 | 17 | if result == "Success": 18 | return func.HttpResponse("Successfully cleaned data", status_code=200) 19 | else: 20 | return func.HttpResponse("Bad Request", status_code=400) 21 | 22 | else: # acknowledge but ignore other event types 23 | return func.HttpResponse("Ignored event", status_code=200) 24 | 25 | # Check for validation event from event grid 26 | def is_validation_event(req_body): 27 | return bool(req_body) and req_body[0].get('eventType') == "Microsoft.EventGrid.SubscriptionValidationEvent" 28 | 29 | # If blob created event, then true 30 | def is_blob_created_event(req_body): 31 | return bool(req_body) and req_body[0].get('eventType') == "Microsoft.Storage.BlobCreated" 32 | 33 | # Respond to event grid webhook validation event 34 | def validate_eg(req_body): 35 | result = {} 36 | result['validationResponse'] = req_body[0]['data']['validationCode'] 37 | return json.dumps(result) -------------------------------------------------------------------------------- /CleanTrigger1/clean.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import pandas as pd 4 | from azure.storage.blob import ContentSettings 5 | from azure.storage.blob import BlockBlobService 6 | from io import StringIO 7 | 8 | blob_account_name =
os.getenv("BlobAccountName") 9 | blob_account_key = os.getenv("BlobAccountKey") 10 | block_blob_service = BlockBlobService(account_name=blob_account_name, 11 | account_key=blob_account_key) 12 | out_blob_container_name = os.getenv("C1") 13 | 14 | def clean(req_body): 15 | blob_obj, filename = extract_blob_props(req_body[0]['data']['url']) 16 | df = pd.read_csv(StringIO(blob_obj.content)) 17 | result = clean_blob(df, filename) 18 | return result 19 | 20 | # Extract the container name and blob file name from the blob URL 21 | def extract_blob_props(url): 22 | 23 | blob_file_name = url.rsplit('/',1)[-1] 24 | in_container_name = url.rsplit('/',2)[-2] 25 | 26 | readblob = block_blob_service.get_blob_to_text(in_container_name, blob_file_name) 27 | return readblob, blob_file_name 28 | 29 | def clean_blob(df, blob_file_name): 30 | 31 | # group by names and region and sum the units and price 32 | df1 = df.groupby(["names","region"], as_index=False)[["units","price"]].sum() 33 | 34 | # pick one region based on request 35 | df2 = df1[df1["region"] == 'east'] 36 | outcsv = df2.to_csv(index=False) 37 | 38 | cleaned_blob_file_name = "cleaned_" + blob_file_name 39 | block_blob_service.create_blob_from_text(out_blob_container_name, cleaned_blob_file_name, outcsv) 40 | return "Success" 41 | -------------------------------------------------------------------------------- /CleanTrigger1/function.json: -------------------------------------------------------------------------------- 1 | { 2 | "scriptFile": "__init__.py", 3 | "bindings": [ 4 | { 5 | "authLevel": "anonymous", 6 | "type": "httpTrigger", 7 | "direction": "in", 8 | "name": "req", 9 | "methods": [ 10 | "get", 11 | "post" 12 | ] 13 | }, 14 | { 15 | "type": "http", 16 | "direction": "out", 17 | "name": "$return" 18 | } 19 | ] 20 | } -------------------------------------------------------------------------------- /CleanTrigger1/sample.dat:
-------------------------------------------------------------------------------- 1 | { 2 | "name": "Azure" 3 | } -------------------------------------------------------------------------------- /CleanTrigger2/__init__.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import azure.functions as func 4 | from . import clean as cleaning_service 5 | 6 | def main(req: func.HttpRequest) -> func.HttpResponse: 7 | req_body = req.get_json() 8 | 9 | if is_validation_event(req_body): 10 | return func.HttpResponse(validate_eg(req_body)) 11 | 12 | elif is_blob_created_event(req_body): 13 | result = cleaning_service.clean(req_body) 14 | 15 | if result == "Success": 16 | return func.HttpResponse("Successfully cleaned data", status_code=200) 17 | else: 18 | return func.HttpResponse("Bad Request", status_code=400) 19 | 20 | else: # don't care about other events; acknowledge and ignore 21 | return func.HttpResponse("Ignored event", status_code=200) 22 | 23 | # Check for validation event from event grid 24 | def is_validation_event(req_body): 25 | return bool(req_body) and req_body[0].get('eventType') == "Microsoft.EventGrid.SubscriptionValidationEvent" 26 | 27 | # If blob created event, then true 28 | def is_blob_created_event(req_body): 29 | return bool(req_body) and req_body[0].get('eventType') == "Microsoft.Storage.BlobCreated" 30 | 31 | # Respond to event grid webhook validation event 32 | def validate_eg(req_body): 33 | result = {} 34 | result['validationResponse'] = req_body[0]['data']['validationCode'] 35 | return json.dumps(result) -------------------------------------------------------------------------------- /CleanTrigger2/clean.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import pandas as pd 4 | from azure.storage.blob import ContentSettings 5 | from azure.storage.blob import BlockBlobService 6 | from io import StringIO 7 | 8 |
blob_account_name = os.getenv("BlobAccountName") 9 | blob_account_key = os.getenv("BlobAccountKey") 10 | block_blob_service = BlockBlobService(account_name=blob_account_name, 11 | account_key=blob_account_key) 12 | out_blob_container_name = os.getenv("C2") 13 | 14 | def clean(req_body): 15 | blob_obj, filename = extract_blob_props(req_body[0]['data']['url']) 16 | df = pd.read_csv(StringIO(blob_obj.content)) 17 | result = clean_blob(df, filename) 18 | return result 19 | 20 | def extract_blob_props(url): 21 | blob_file_name = url.rsplit('/',1)[-1] 22 | in_container_name = url.rsplit('/',2)[-2] 23 | readblob = block_blob_service.get_blob_to_text(in_container_name, blob_file_name) 24 | return readblob, blob_file_name 25 | 26 | def clean_blob(df, blob_file_name): 27 | # group by names and item and sum the units and price 28 | df1 = df.groupby(["names","item"], as_index=False)[["units","price"]].sum() 29 | 30 | # pick one item based on request 31 | df2 = df1[df1["item"] == 'binder'] 32 | outcsv = df2.to_csv(index=False) 33 | 34 | cleaned_blob_file_name = "cleaned_" + blob_file_name 35 | block_blob_service.create_blob_from_text(out_blob_container_name, cleaned_blob_file_name, outcsv) 36 | return "Success" 37 | 38 | -------------------------------------------------------------------------------- /CleanTrigger2/function.json: -------------------------------------------------------------------------------- 1 | { 2 | "scriptFile": "__init__.py", 3 | "bindings": [ 4 | { 5 | "authLevel": "anonymous", 6 | "type": "httpTrigger", 7 | "direction": "in", 8 | "name": "req", 9 | "methods": [ 10 | "get", 11 | "post" 12 | ] 13 | }, 14 | { 15 | "type": "http", 16 | "direction": "out", 17 | "name": "$return" 18 | } 19 | ] 20 | } -------------------------------------------------------------------------------- /CleanTrigger2/sample.dat: -------------------------------------------------------------------------------- 1 | { 2 | "name": "Azure" 3 | }
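The cleaning step in both CleanTrigger functions is a plain pandas groupby/filter/serialize pass, so it can be exercised locally without any Azure dependencies. A minimal sketch of that transformation, using the column names from CleanTrigger2's `clean_blob` (the input rows below are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Illustrative CSV with the columns CleanTrigger2 expects
raw_csv = """names,item,units,price
alice,binder,2,4.50
bob,binder,1,2.25
alice,pen,5,1.00
"""

def clean_frame(df: pd.DataFrame) -> str:
    # Same shape as clean_blob: aggregate per (names, item),
    # keep one item, and serialize back to CSV text
    grouped = df.groupby(["names", "item"], as_index=False)[["units", "price"]].sum()
    binders = grouped[grouped["item"] == "binder"]
    return binders.to_csv(index=False)

df = pd.read_csv(StringIO(raw_csv))
print(clean_frame(df))  # only the binder rows survive
```

Running this locally is a quick way to sanity-check the pandas logic before deploying the function.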
-------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Microsoft Corporation. All rights reserved. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | --- 2 | page_type: sample 3 | description: "This sample demonstrates a data cleaning pipeline with Azure Functions written in Python." 
4 | languages: 5 | - python 6 | products: 7 | - azure-functions 8 | - azure-storage 9 | --- 10 | 11 | # Data Cleaning Pipeline 12 | 13 | This sample demonstrates a data cleaning pipeline built with Azure Functions written in Python, triggered by an HTTP event from Event Grid, that uses pandas to clean and reconcile CSV files. 14 | It is modeled on a real use case where such a pipeline performs cleaning tasks. 15 | 16 | ## Getting Started 17 | 18 | ### Deploy to Azure 19 | 20 | #### Prerequisites 21 | 22 | - Install Python 3.6+ 23 | - Install [Functions Core Tools](https://docs.microsoft.com/en-us/azure/azure-functions/functions-run-local#v2) 24 | - Install Docker 25 | - Note: On Windows, use Ubuntu WSL to run the deploy script 26 | 27 | #### Steps 28 | 29 | - Deploy through Azure CLI 30 | - Open the Azure CLI and run `az group create -l [region] -n [resourceGroupName]` to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.) 31 | - Run `az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json` 32 | 33 | - Deploy Function App 34 | - [Create/Activate virtual environment](https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-first-function-python#create-and-activate-a-virtual-environment) 35 | - Run `func azure functionapp publish [functionAppName] --build-native-deps` 36 | 37 | ### Test 38 | 39 | - Upload the s1_raw.csv file into the c1raw container 40 | - Watch Event Grid trigger the CleanTrigger1 function and produce "cleaned_s1_raw.csv" 41 | - Repeat the same for s2_raw.csv into the c2raw container 42 | - Now send the following HTTP request to the Reconcile function to merge 43 | 44 | ``` 45 | { 46 | "file_1_url" : "https://{storagename}.blob.core.windows.net/c1raw/cleaned_s1_raw.csv", 47 | "file_2_url" : "https://{storagename}.blob.core.windows.net/c2raw/cleaned_s2_raw.csv", 48 | "batchId" : "1122" 49 | } 50 | 51 | ``` 52 | - Watch it
produce final.csv file 53 | - Can use a logic app to call the reconcile method with batch id's 54 | 55 | ## References 56 | 57 | - [Create your first Python Function](https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-first-function-python) 58 | -------------------------------------------------------------------------------- /Reconcile/__init__.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import azure.functions as func 4 | from . import clean as cleaning_service 5 | 6 | def main(req: func.HttpRequest) -> func.HttpResponse: 7 | # This will output to postman 8 | logging.info('Python HTTP trigger function processed a request.') 9 | try: 10 | req_body = req.get_json() 11 | f1_url = req_body.get('file_1_url') 12 | f2_url = req_body.get('file_2_url') 13 | batch_id = req_body.get('batchId') 14 | except: 15 | return func.HttpResponse("Bad Request", status_code=400) 16 | 17 | result = cleaning_service.clean(f1_url,f2_url,batch_id) 18 | return func.HttpResponse(result,status_code=200) -------------------------------------------------------------------------------- /Reconcile/clean.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import pandas as pd 4 | from azure.storage.blob import ContentSettings 5 | from azure.storage.blob import BlockBlobService 6 | from io import StringIO 7 | from . 
import fetch_blob as fetching_service 8 | 9 | blob_account_name = os.getenv("BlobAccountName") 10 | blob_account_key = os.getenv("BlobAccountKey") 11 | block_blob_service = BlockBlobService(account_name=blob_account_name, 12 | account_key=blob_account_key) 13 | out_blob_container_name = os.getenv("FINAL") 14 | 15 | # Reconcile flow: called with the two cleaned-file URLs and a batch id; 16 | # this function drives all the other helpers in clean.py 17 | 18 | def clean(file_1_url, file_2_url, batch_id): 19 | f1_container = file_1_url.rsplit('/', 2)[-2] 20 | f2_container = file_2_url.rsplit('/', 2)[-2] 21 | f2_df, f1_df = fetch_blobs(batch_id, f2_container, f1_container) 22 | result = final_reconciliation(f2_df, f1_df, batch_id) 23 | return result 24 | 25 | def fetch_blobs(batch_id, file_2_container_name, file_1_container_name): 26 | # Create container & blob dictionary with helper function 27 | blob_dict = fetching_service.blob_to_dict(batch_id, file_2_container_name, file_1_container_name) 28 | 29 | # Create F1 DF 30 | filter_string = 'c1' 31 | f1_df = fetching_service.blob_dict_to_df(blob_dict, filter_string) 32 | 33 | # Create F2 DF 34 | filter_string = 'c2' 35 | f2_df = fetching_service.blob_dict_to_df(blob_dict, filter_string) 36 | return f2_df, f1_df 37 | 38 | def final_reconciliation(f2_df, f1_df, batch_id): 39 | outcsv = f2_df.to_csv(index=False) 40 | cleaned_blob_file_name = "reconciled_" + batch_id 41 | block_blob_service.create_blob_from_text(out_blob_container_name, cleaned_blob_file_name, outcsv) 42 | return "Success" 43 | -------------------------------------------------------------------------------- /Reconcile/fetch_blob.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import collections 4 | import pandas as pd 5 | from azure.storage.blob import ContentSettings 6 | from azure.storage.blob import BlockBlobService 7 | from io import StringIO 8 | 9 | 10 | blob_account_name =
os.getenv("BlobAccountName") 11 | blob_account_key = os.getenv("BlobAccountKey") 12 | block_blob_service = BlockBlobService(account_name=blob_account_name, 13 | account_key=blob_account_key) 14 | 15 | def blob_dict_to_df(my_ordered_dict, filter_string): 16 | # Pick the first container/blob pair whose container name matches the filter
17 | filtered_dict = {k: v for k, v in my_ordered_dict.items() if filter_string in k} 18 | container_key = list(filtered_dict.keys())[0] 19 | latest_file = list(filtered_dict.values())[0] 20 | blobstring = block_blob_service.get_blob_to_text(container_key, latest_file).content 21 | df = pd.read_csv(StringIO(blobstring), dtype=str) 22 | return df 23 | 24 | def blob_to_dict(batchId, *args): 25 | # add the containers to a list 26 | container_list = list(args) 27 | logging.info(container_list) 28 | # get the blob file names from each container; the Azure SDK returns a 29 | # generator, so materialize it before logging to avoid consuming it 30 | file_names = [] 31 | for container in container_list: 32 | blobs = list(block_blob_service.list_blobs(container)) 33 | logging.info([blob.name for blob in blobs]) 34 | for file in blobs: 35 | if "cleaned" in file.name: 36 | file_names.append(file.name) 37 | # Split the cleaned file names by batch id and container 38 | c1_list = [f for f in file_names if batchId + "_c1" in f] 39 | c2_list = [f for f in file_names if batchId + "_c2" in f] 40 | 41 | for c in container_list: 42 | if "c1" in c: 43 | c1_name = c 44 | else: 45 | c2_name = c 46 | # Merge the two lists into a container -> blob name dictionary 47 | container_file_dict = {} 48 | container_file_dict[c1_name] = c1_list[0] 49 | container_file_dict[c2_name] = c2_list[0] 50 | return container_file_dict 51 | -------------------------------------------------------------------------------- /Reconcile/function.json: -------------------------------------------------------------------------------- 1 | { 2 | "scriptFile": "__init__.py", 3 | "bindings": [ 4 | { 5 | "authLevel": "anonymous", 6 | "type": "httpTrigger", 7 | "direction": "in", 8 | "name": "req", 9 | "methods": [ 10 | "get", 11 | "post" 12 | ] 13 | }, 14 | { 15 | "type": "http", 16 | "direction": "out", 17 | "name": "$return" 18 | } 19 | ] 20 | } 21 | -------------------------------------------------------------------------------- /Reconcile/sample.dat: -------------------------------------------------------------------------------- 1 | { 2 | "name": "Azure" 3 | } -------------------------------------------------------------------------------- /azure-deploy-event-grid-subscription.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", 3 | "contentVersion": "1.0.0.0", 4 | "parameters": {
"eventSubName1": { 6 | "type": "string", 7 | "defaultValue": "subToStorage1", 8 | "metadata": { 9 | "description": "Provide a name for the first Event Grid subscription." 10 | } 11 | }, 12 | "eventSubName2": { 13 | "type": "string", 14 | "defaultValue": "subToStorage2", 15 | "metadata": { 16 | "description": "Provide a name for the second Event Grid subscription." 17 | } 18 | }, 19 | "endpoint1": { 20 | "type": "string", 21 | "metadata": { 22 | "description": "Provide the URL for the WebHook to receive events. Create your own endpoint for events." 23 | } 24 | }, 25 | "storageName": { 26 | "type": "string", 27 | "defaultValue": "203014767teststorage", 28 | "metadata": { 29 | "description": "Provide the name of the storage account whose events are subscribed to." 30 | } 31 | } 32 | }, 33 | "resources": [ 34 | { 35 | "type": "Microsoft.Storage/storageAccounts/providers/eventSubscriptions", 36 | "name": "[concat(parameters('storageName'), '/Microsoft.EventGrid/', parameters('eventSubName1'))]", 37 | "apiVersion": "2018-01-01", 38 | "properties": { 39 | "destination": { 40 | "endpointType": "WebHook", 41 | "properties": { 42 | "endpointUrl": "[parameters('endpoint1')]" 43 | } 44 | }, 45 | "filter": { 46 | "subjectBeginsWith": "", 47 | "subjectEndsWith": "", 48 | "isSubjectCaseSensitive": false, 49 | "includedEventTypes": [ 50 | "All" 51 | ], 52 | "advancedFilters": [ 53 | { 54 | "operatorType": "StringContains", 55 | "key": "Subject", 56 | "values": ["raw"] 57 | } 58 | ] 59 | } 60 | } 61 | } 62 | ] 63 | } -------------------------------------------------------------------------------- /azure-deploy-linux-app-plan.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", 3 | "contentVersion": "1.0.0.0", 4 | "parameters": { 5 | "functionapp1": { 6 | "defaultValue": "customerendpointdh1", 7 | "type": "String" 8 | }, 9 | "functionapp2": { 10 | "defaultValue":
"customerendpointdh2", 11 | "type": "String" 12 | }, 13 | "config_web_name": { 14 | "defaultValue": "web", 15 | "type": "String" 16 | }, 17 | "storageName": { 18 | "defaultValue": "203014767teststorage", 19 | "type": "String" 20 | }, 21 | "linuxConsumptionAppName": { 22 | "defaultValue": "WestUSLinuxDynamicPlan", 23 | "type": "String" 24 | }, 25 | "siteName1": { 26 | "defaultValue": "customerendpointdh1.azurewebsites.net", 27 | "type": "String" 28 | }, 29 | "siteName2": { 30 | "defaultValue": "customerendpointdh2.azurewebsites.net", 31 | "type": "String" 32 | }, 33 | "outputBlobContainerName" : { 34 | "defaultValue": "cleaned", 35 | "type": "String" 36 | } 37 | }, 38 | "variables": { 39 | "storageAccountid": "[concat(resourceGroup().id,'/providers/','Microsoft.Storage/storageAccounts/', parameters('storageName'))]", 40 | "container1" : "raw", 41 | "container2" : "cleaned" 42 | }, 43 | "resources": [ 44 | { 45 | "name": "[parameters('storageName')]", 46 | "type": "Microsoft.Storage/storageAccounts", 47 | "apiVersion": "2017-10-01", 48 | "sku": { 49 | "name": "Standard_LRS" 50 | }, 51 | "kind": "StorageV2", 52 | "location": "West US", 53 | "tags": {}, 54 | "properties": { 55 | "accessTier": "Hot" 56 | }, 57 | "resources": [ 58 | { 59 | "name": "[concat('default/', variables('container1'))]", 60 | "type": "blobServices/containers", 61 | "apiVersion": "2018-03-01-preview", 62 | "dependsOn": [ 63 | "[parameters('storageName')]" 64 | ] 65 | }, 66 | { 67 | "name": "[concat('default/', variables('container2'))]", 68 | "type": "blobServices/containers", 69 | "apiVersion": "2018-03-01-preview", 70 | "dependsOn": [ 71 | "[parameters('storageName')]" 72 | ] 73 | } 74 | ] 75 | }, 76 | { 77 | "type": "Microsoft.Web/serverfarms", 78 | "sku": { 79 | "name": "Y1", 80 | "tier": "Dynamic", 81 | "size": "Y1", 82 | "family": "Y", 83 | "capacity": 0 84 | }, 85 | "kind": "functionapp", 86 | "name": "[parameters('linuxConsumptionAppName')]", 87 | "apiVersion": "2016-09-01", 88 | 
"location": "West US", 89 | "properties": { 90 | "name": "[parameters('linuxConsumptionAppName')]", 91 | "perSiteScaling": false, 92 | "reserved": true 93 | }, 94 | "dependsOn": [] 95 | }, 96 | { 97 | "type": "Microsoft.Web/sites", 98 | "kind": "functionapp,linux", 99 | "name": "[parameters('functionapp1')]", 100 | "apiVersion": "2016-08-01", 101 | "location": "West US", 102 | "properties": { 103 | "enabled": true, 104 | "hostNameSslStates": [ 105 | { 106 | "name": "[concat(parameters('functionapp1'),'.azurewebsites.net')]", 107 | "sslState": "Disabled", 108 | "hostType": "Standard" 109 | } 110 | ], 111 | "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', parameters('linuxConsumptionAppName'))]", 112 | "reserved": true, 113 | "siteConfig": { 114 | "appSettings": [ 115 | { 116 | "name": "AzureWebJobsDashboard", 117 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]" 118 | }, 119 | { 120 | "name": "AzureWebJobsStorage", 121 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]" 122 | }, 123 | { 124 | "name": "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING", 125 | "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', parameters('storageName'), ';AccountKey=', listKeys(variables('storageAccountid'),'2015-05-01-preview').key1)]" 126 | }, 127 | { 128 | "name": "WEBSITE_CONTENTSHARE", 129 | "value": "[parameters('functionapp1')]" 130 | }, 131 | { 132 | "name": "FUNCTIONS_EXTENSION_VERSION", 133 | "value": "~2" 134 | }, 135 | { 136 | "name": "WEBSITE_NODE_DEFAULT_VERSION", 137 | "value": "8.11.1" 138 | }, 139 | { 140 | "name": "FUNCTIONS_WORKER_RUNTIME", 141 | "value": "python" 142 | }, 143 | { 144 | "name" : "BlobAccountName", 145 | "value" : "[parameters('storageName')]" 146 | }, 147 | { 148 | "name": 
"BlobAccountKey", 149 | "value" : "[listKeys(variables('storageAccountid'),'2015-05-01-preview').key1]" 150 | }, 151 | { 152 | "name" : "OutBlobContainerName", 153 | "value" : "[parameters('outputBlobContainerName')]" 154 | } 155 | ] 156 | } 157 | }, 158 | "dependsOn": [ 159 | "[resourceId('Microsoft.Web/serverfarms', parameters('linuxConsumptionAppName'))]" 160 | ] 161 | }, 162 | { 163 | "type": "Microsoft.Web/sites/config", 164 | "name": "[concat(parameters('functionapp1'), '/', parameters('config_web_name'))]", 165 | "apiVersion": "2016-08-01", 166 | "location": "West US", 167 | "properties": { 168 | "netFrameworkVersion": "v4.0", 169 | "scmType": "None", 170 | "use32BitWorkerProcess": true, 171 | "webSocketsEnabled": false, 172 | "alwaysOn": false, 173 | "appCommandLine": "", 174 | "managedPipelineMode": "Integrated", 175 | "virtualApplications": [ 176 | { 177 | "virtualPath": "/", 178 | "physicalPath": "site\\wwwroot", 179 | "preloadEnabled": false } 180 | ], 181 | "customAppPoolIdentityAdminState": false, 182 | "customAppPoolIdentityTenantState": false, 183 | "loadBalancing": "LeastRequests", 184 | "routingRules": [], 185 | "experiments": { 186 | "rampUpRules": [] 187 | }, 188 | "autoHealEnabled": false, 189 | "vnetName": "", 190 | "cors": { 191 | "allowedOrigins": [ 192 | "https://functions.azure.com", 193 | "https://functions-staging.azure.com", 194 | "https://functions-next.azure.com" 195 | ], 196 | "supportCredentials": false 197 | } 198 | }, 199 | "dependsOn": [ 200 | "[resourceId('Microsoft.Web/sites', parameters('functionapp1'))]" 201 | ] 202 | }, 203 | { 204 | "type": "Microsoft.Web/sites/hostNameBindings", 205 | "name": "[concat(parameters('functionapp1'), '/', parameters('siteName1'))]", 206 | "apiVersion": "2016-08-01", 207 | "location": "West US", 208 | "properties": { 209 | "siteName": "customerendpointdh1", 210 | "hostNameType": "Verified" 211 | }, 212 | "dependsOn": [ 213 | "[resourceId('Microsoft.Web/sites', parameters('functionapp1'))]" 214 | 
] 215 |         } 216 |     ] 217 | } -------------------------------------------------------------------------------- /azuredeploy.parameters.json: -------------------------------------------------------------------------------- 1 | { 2 |     "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#", 3 |     "contentVersion": "1.0.0.0", 4 |     "parameters": { 5 |         "functionapp1": { 6 |             "value": "203014767-func-app1" 7 |         }, 8 |         "functionapp2": { 9 |             "value": "203014767-func-app2" 10 |         }, 11 |         "storageName": { 12 |             "value": "203014767teststorage" 13 |         }, 14 |         "outputBlobContainerName": { 15 |             "value": "203014767-test-blob" 16 |         }, 17 |         "eventSubName1": { 18 |             "value": "203014767-event1" 19 |         }, 20 |         "eventSubName2": { 21 |             "value": "203014767-event2" 22 |         }, 23 |         "endpoint1": { 24 |             "value": "203014767-endpoint1" 25 |         } 26 |     } 27 | } 28 | -------------------------------------------------------------------------------- /blob_to_smart_contract/__init__.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import azure.functions as func 4 | from . 
import clean as cleaning_service 5 | 6 | def main(req: func.HttpRequest) -> func.HttpResponse: 7 |     # Visible in the local host output when testing via Postman 8 |     logging.info('Python HTTP trigger function processed a request.') 9 |     req_body = req.get_json() 10 | 11 |     if is_validation_event(req_body): 12 |         return func.HttpResponse(validate_eg(req_body)) 13 | 14 |     elif is_blob_created_event(req_body): 15 |         result = cleaning_service.clean(req_body) 16 | 17 |         if result == "Success": 18 |             return func.HttpResponse("Successfully cleaned data", status_code=200) 19 |         else: 20 |             return func.HttpResponse("Bad Request", status_code=400) 21 | 22 |     else: # acknowledge other event types without processing them 23 |         return func.HttpResponse(status_code=200) 24 | 25 | # Check for a subscription validation event from Event Grid 26 | def is_validation_event(req_body): 27 |     return bool(req_body) and req_body[0].get('eventType') == "Microsoft.EventGrid.SubscriptionValidationEvent" 28 | 29 | # True if this is a blob-created event 30 | def is_blob_created_event(req_body): 31 |     return bool(req_body) and req_body[0].get('eventType') == "Microsoft.Storage.BlobCreated" 32 | 33 | # Respond to the Event Grid webhook validation handshake 34 | def validate_eg(req_body): 35 |     result = {} 36 |     result['validationResponse'] = req_body[0]['data']['validationCode'] 37 |     return json.dumps(result) -------------------------------------------------------------------------------- /blob_to_smart_contract/clean.py: -------------------------------------------------------------------------------- 1 | #%% 2 | import logging 3 | import requests 4 | import json 5 | import numpy as np 6 | import os 7 | import pandas as pd 8 | from azure.storage.blob import ContentSettings 9 | from azure.storage.blob import BlockBlobService 10 | from io import StringIO 11 | from adal import AuthenticationContext 12 | from . import fetch_blob as fetching_service 13 | 14 | # Local development workflow: 15 | #python3.6 -m venv funcenv... 
this creates the funcenv 16 | # source funcenv/bin/activate... this activates the virtual environment created above 17 | # func host start after each change 18 | # pip install -r requirements.txt 19 | #%% 20 | blob_account_name = os.getenv("BlobAccountName") 21 | blob_account_key = os.getenv("BlobAccountKey") 22 | block_blob_service = BlockBlobService(account_name=blob_account_name, 23 |                                       account_key=blob_account_key) 24 | out_blob_final = os.getenv("OutBlobFinal") 25 | #%% 26 | AUTHORITY = 'https://login.microsoftonline.com/gemtudev.onmicrosoft.com' 27 | 28 | # Base URL of the Blockchain Workbench REST API (see its Swagger reference) 29 | WORKBENCH_API_URL = 'https://gemtu-ws5arp-api.azurewebsites.net' 30 | 31 | # This is the application ID of the blockchain workbench web API 32 | # Login to the directory of the tenant -> App registrations -> 'Azure Blockchain Workbench *****-*****' -> 33 | # copy the Application ID 34 | RESOURCE = 'a33cc4fb-e3f2-4c23-a005-b46819f58f07' 35 | 36 | # Service principal app id & secret: read from app settings (setting names here are placeholders); never commit credentials 37 | CLIENT_APP_Id = os.getenv('ClientAppId') 38 | CLIENT_SECRET = os.getenv('ClientSecret') 39 | #%% 40 | auth_context = AuthenticationContext(AUTHORITY) 41 | #%% 42 | def clean(req_body): 43 |     dfCreate = fetch_blobs(out_blob_final) 44 |     json_array = populate_workbench(dfCreate) 45 |     result = create_json_blob(json_array) 46 |     return result 47 | #%% 48 | # Read/process CSV into pandas df 49 | def fetch_blobs(out_blob_final): 50 |     # Create container & blob dictionary with helper function 51 |     
blob_dict = fetching_service.blob_to_dict(out_blob_final) 61 |     # create DF 62 |     filter_string = "final" 63 |     df = fetching_service.blob_dict_to_df(blob_dict, filter_string) 64 |     logging.info(df.dtypes) 65 |     return df 66 | #%% 67 | def populate_workbench(dfCreate): 68 |     json_array = [] 69 |     for index, row in dfCreate.iterrows(): 70 |         try: 71 |             payload = make_create_payload(dfCreate, index) 72 |             json_array.append(payload) 73 |         except Exception: 74 |             logging.warning('contract payload creation failed', exc_info=True) 75 |             continue 76 |     return json_array 77 | #%% 78 | def make_create_payload(df, index): 79 |     # Generates the payload JSON for one row of the pandas df 80 |     # TODO: update this value for the target workflow 81 |     workflowFunctionId = 93 82 |     try: 83 |         payload = { 84 |             "workflowFunctionId": workflowFunctionId, 85 |             "workflowActionParameters": [ 86 |                 { 87 |                     "name": "po", 88 |                     "value": df['po'][index] 89 |                 }, { 90 |                     "name": "itemno", 91 |                     "value": df['itemno'][index] 92 |                 }, { 93 |                     "name": "invno", 94 |                     "value": df['invno'][index] 95 |                 }, { 96 |                     "name": "signedinvval", 97 |                     "value": df['signedinval'][index] 98 |                 }, { 99 |                     "name": "invdate", 100 |                     "value": df['invdate'][index] 101 |                 }, { 102 |                     "name": "poformat", 103 |                     "value": df['poformat'][index] 104 |                 }, { 105 |                     "name": "popricematch", 106 |                     "value": df['popricematch'][index] 107 |                 }, { 108 |                     "name": "poinvpricematch", 109 |                     "value": df['poinvpricematch'][index] 110 |                 }, { 111 |                     "name": "initstate", 112 |                     "value": df['initstate'][index] 113 |                 }, { 114 |                     "name": "finalpo", 115 |                     "value": df['finalpo'][index] 116 |                 }, { 117 |                     "name": "finalresult", 118 |                     "value": df['finalresult'][index] 119 |                 } 120 |             ] 121 |         } 122 |         #payload = 
json.dumps(payload) 131 |         return payload 132 |     except Exception: 133 |         logging.warning('error building payload', exc_info=True) 134 | 135 | def create_json_blob(json_array): 136 |     # Serialize the payload list to a JSON array and upload it as a blob 137 |     myarray = pd.Series(json_array).to_json(orient='values') 138 |     blob_file_name = "df_to_json.json" 139 |     block_blob_service.create_blob_from_text(out_blob_final, blob_file_name, myarray) 140 |     return 'Success' 141 | #%% 142 | def create_contract(workflowId, contractCodeId, connectionId, payload): 143 |     try: 144 |         # Acquire a bearer token for the Workbench API 145 |         token = auth_context.acquire_token_with_client_credentials( 146 |             RESOURCE, CLIENT_APP_Id, CLIENT_SECRET) 147 | 148 |         url = WORKBENCH_API_URL + '/api/v2/contracts' 149 | 150 |         headers = {'Authorization': 'Bearer ' + token['accessToken'], 'Content-Type': 'application/json'} 151 | 152 |         params = {'workflowId': workflowId, 'contractCodeId': contractCodeId, 'connectionId': connectionId} 153 | 154 |         # Making call to Workbench 155 |         response = requests.post(url=url, data=payload, headers=headers, params=params) 156 | 157 |         logging.info('Status code: %s', response.status_code) 158 |         logging.info('Created contractId: %s', response.text) 159 |         return response 160 |     except Exception as error: 161 |         logging.error(error) 162 |         return error 163 | -------------------------------------------------------------------------------- /blob_to_smart_contract/fetch_blob.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import collections 4 | import pandas as pd 5 | import numpy as np 6 | from azure.storage.blob import ContentSettings 7 | from azure.storage.blob import BlockBlobService 
8 | from io import StringIO 9 | # kill $(lsof -t -i :7071) frees the port if a previous host is still running 10 | 11 | blob_account_name = os.getenv("BlobAccountName") 12 | blob_account_key = os.getenv("BlobAccountKey") 13 | block_blob_service = BlockBlobService(account_name=blob_account_name, 14 |                                       account_key=blob_account_key) 15 | 16 | def blob_dict_to_df(my_ordered_dict, filter_string): 17 |     # Keep only the entries whose container name contains filter_string 18 |     filtered_dict = {k: v for k, v in my_ordered_dict.items() if filter_string in k} 19 |     logging.info(filtered_dict) 20 |     container_key = list(filtered_dict.keys())[0] 21 |     latest_file = list(filtered_dict.values())[0] 22 |     blobstring = block_blob_service.get_blob_to_text(container_key, latest_file).content 23 |     df = pd.read_csv(StringIO(blobstring), dtype=str) 24 |     df = df.replace(np.nan, '', regex=True) 25 |     df["initstate"] = df["finalresult"].map(lambda x: "0" if "no" in x else "2") 26 |     return df 27 | 28 | def blob_to_dict(*args): 29 |     # each positional argument is a container name 30 |     container_list = list(args) 31 |     logging.info(container_list) 32 |     # get blob file names from container... 
azure SDK returns a generator object 41 |     file_names = [] 42 |     for container in container_list: 43 |         # materialize the generator once; iterating it a second time would yield nothing 44 |         blobs = list(block_blob_service.list_blobs(container)) 45 |         logging.info(blobs) 46 |         for blob in blobs: 47 |             file_names.append(blob.name) 48 |     # Merge the two lists to create a dictionary (pairs container i with blob name i) 49 |     container_file_dict = collections.OrderedDict(zip(container_list, file_names)) 50 |     logging.info(container_file_dict) 51 |     return container_file_dict 52 | -------------------------------------------------------------------------------- /blob_to_smart_contract/function.json: -------------------------------------------------------------------------------- 1 | { 2 |   "scriptFile": "__init__.py", 3 |   "bindings": [ 4 |     { 5 |       "authLevel": "anonymous", 6 |       "type": "httpTrigger", 7 |       "direction": "in", 8 |       "name": "req", 9 |       "methods": [ 10 |         "get", 11 |         "post" 12 |       ] 13 |     }, 14 |     { 15 |       "type": "http", 16 |       "direction": "out", 17 |       "name": "$return" 18 |     } 19 |   ] 20 | } -------------------------------------------------------------------------------- /blob_to_smart_contract/sample.dat: -------------------------------------------------------------------------------- 1 | { 2 |     "name": "Azure" 3 | } -------------------------------------------------------------------------------- /dataset/config.ini: -------------------------------------------------------------------------------- 1 | ; config.ini 2 | [Columns] 3 | customer=highrandom 4 | order=highrandom 5 | names=Richard,Ben,Nick,Aaron,John 6 | region=east,west,central 7 | item=pens,binder,paper 8 | units=lowrandom 9 | price=lowrandom -------------------------------------------------------------------------------- /dataset/randomcsvgenerator.py: 
-------------------------------------------------------------------------------- 1 | import configparser 2 | import random 3 | 4 | # number of data rows to generate 5 | rows = 100 6 | 7 | # read from config.ini 8 | config = configparser.ConfigParser() 9 | config.read('config.ini') 10 | section = config.sections()[0] 11 | col_names = config.options(section) 12 | 13 | with open('generated.csv', 'w') as f: 14 |     f.write(','.join(col_names) + "\n") 15 |     for i in range(rows): 16 |         line = [] 17 |         for col in col_names: 18 |             item = config.get(section, col) 19 |             # define as many conditions as you like... 20 | 21 |             # a large random number 22 |             if item == "highrandom": 23 |                 line.append(str(random.randrange(1000000, 9999999))) 24 |             # a medium random number 25 |             if item == "medrandom": 26 |                 line.append(str(random.randrange(10000, 99999))) 27 |             # a low random number 28 |             if item == "lowrandom": 29 |                 line.append(str(random.randrange(100, 999))) 30 |             # pick a random choice from a comma-separated set of values 31 |             if "," in item: 32 |                 choice = random.choice(item.split(",")) 33 |                 line.append(str(choice)) 34 |         f.write(','.join(line) + "\n") 35 | # the with-block closes the file automatically 36 | 37 | -------------------------------------------------------------------------------- /dataset/s1_raw.csv: -------------------------------------------------------------------------------- 1 | customer,order,names,region,item,units,price 2 | 7262165,9703508,Aaron,east,paper,747,997 3 | 4616455,8069744,Ben,west,paper,606,185 4 | 5971611,9145486,Ben,west,pens,271,403 5 | 2338105,3958052,Ben,central,pens,119,318 6 | 6058281,5713029,Aaron,east,pens,111,588 7 | 7799747,1935441,John,central,pens,494,541 8 | 4268894,9609672,John,central,pens,569,269 9 | 3904926,4793823,John,east,pens,480,895 10 | 7136420,4103116,John,central,paper,286,671 11 | 7615742,7936821,Aaron,east,paper,826,688 12 | 5579191,4938850,Richard,central,binder,300,498 13 | 8766316,7885362,Richard,west,paper,212,940 14 | 6829476,2759171,Ben,east,pens,116,984 15 | 9356622,6821948,John,west,paper,411,117 
16 | 6120661,1749213,Aaron,central,pens,385,823 17 | 7694333,4818021,Ben,west,paper,239,753 18 | 8973305,3604550,Aaron,west,binder,428,977 19 | 7742689,2042955,Ben,east,pens,280,716 20 | 4876091,2342131,Aaron,east,pens,570,213 21 | 8678810,8595134,Nick,west,paper,666,553 22 | 4761317,2309400,Nick,east,binder,177,374 23 | 8485140,8385257,Ben,east,binder,928,787 24 | 1111334,5531601,Ben,east,paper,155,920 25 | 1587478,6966827,John,west,binder,208,257 26 | 8917514,9208473,John,central,pens,788,610 27 | 6285660,6064145,Nick,east,paper,644,589 28 | 3522700,5650262,Richard,west,pens,599,362 29 | 6392383,4018601,Nick,east,paper,412,186 30 | 7411390,8728047,Richard,west,paper,149,745 31 | 5157191,8207924,Richard,east,paper,949,554 32 | 2381505,2881712,Nick,east,paper,273,231 33 | 3682238,2436487,Ben,west,paper,944,429 34 | 9325361,2922302,Nick,central,binder,649,413 35 | 8921453,2474892,Ben,east,pens,615,826 36 | 9056577,8816645,Nick,west,paper,377,977 37 | 2568631,1440723,Aaron,east,pens,399,129 38 | 5860753,8780527,John,central,binder,791,621 39 | 2079112,5606513,Nick,west,binder,179,422 40 | 5900635,7963477,Aaron,east,binder,773,324 41 | 3195785,6243244,Richard,central,binder,458,611 42 | 5588816,2812958,Nick,east,paper,925,969 43 | 7480228,5361920,Nick,east,paper,493,222 44 | 5240415,2853334,Richard,east,paper,220,108 45 | 2741925,2509574,Aaron,central,binder,950,984 46 | 2889267,2096300,Ben,east,pens,319,215 47 | 2636111,5627275,Nick,central,binder,207,319 48 | 8504974,5010213,Ben,east,paper,578,243 49 | 2840278,3860160,Nick,central,paper,881,489 50 | 3034359,2075511,Ben,west,binder,684,413 51 | 5132194,6559888,Nick,central,pens,502,988 52 | 8058886,5301513,Richard,west,pens,403,356 53 | 4317202,5401933,Ben,west,binder,996,441 54 | 5995744,1889420,Aaron,west,pens,856,393 55 | 7335577,8629612,Aaron,central,paper,658,751 56 | 7574670,8912546,Ben,central,pens,366,530 57 | 5252527,4418895,Ben,central,pens,106,686 58 | 3376538,2151894,Nick,east,binder,336,748 59 | 
5126705,4964040,Nick,west,pens,695,166 60 | 7476692,7811601,John,west,pens,714,831 61 | 6407071,5205813,John,east,pens,247,520 62 | 4590122,4003835,Richard,east,binder,996,481 63 | 8088663,4112730,Nick,east,binder,748,257 64 | 4453747,7728857,Ben,west,binder,971,433 65 | 6013003,2347973,John,east,pens,799,423 66 | 9244644,1181002,Ben,central,paper,293,434 67 | 2497717,9072391,Nick,west,pens,902,180 68 | 3343840,8678453,Richard,east,paper,617,228 69 | 1477311,9194058,John,east,paper,476,735 70 | 5865196,7676539,Ben,central,paper,624,111 71 | 4977880,4045629,Ben,central,binder,382,642 72 | 1149541,3004955,Nick,east,binder,455,136 73 | 9546677,7430616,Ben,central,paper,994,323 74 | 6475847,6794001,Nick,west,pens,211,146 75 | 1123534,9223169,John,central,paper,634,773 76 | 8339182,7183324,John,west,paper,808,461 77 | 9022922,3870153,Aaron,central,pens,416,173 78 | 4277498,8617843,Richard,central,paper,494,152 79 | 7259430,3632115,Nick,east,binder,215,536 80 | 6714318,3847473,John,central,paper,231,835 81 | 4133799,5878001,Richard,east,pens,680,722 82 | 8560303,7350110,John,east,binder,702,245 83 | 7310662,5376060,Richard,west,binder,894,315 84 | 5520029,2769000,Ben,west,paper,199,618 85 | 9296388,5402422,Nick,central,binder,936,532 86 | 2174535,5536311,Ben,west,binder,660,275 87 | 9897466,3653221,Nick,west,binder,858,656 88 | 7310133,8262752,Richard,central,pens,655,677 89 | 6768863,4916288,Richard,west,binder,350,380 90 | 9090185,2833327,John,east,binder,353,216 91 | 8453475,4107163,Ben,west,paper,735,318 92 | 4019264,9935008,Nick,east,pens,358,514 93 | 3638572,9898492,Richard,east,paper,872,390 94 | 9594055,5740416,Aaron,central,binder,932,562 95 | 9749755,3613121,Ben,central,pens,265,589 96 | 2081996,4848737,John,central,paper,113,920 97 | 1122957,8130323,Nick,west,pens,111,557 98 | 4199051,6375160,Ben,central,pens,169,849 99 | 9823036,1553562,John,west,binder,225,839 100 | 2866071,6919487,John,west,paper,714,601 101 | 
-------------------------------------------------------------------------------- /dataset/s2_raw.csv: -------------------------------------------------------------------------------- 1 | customer,order,names,region,item,units,price 2 | 2630120,6615957,John,west,paper,236,820 3 | 1928229,1631195,Nick,west,paper,186,450 4 | 8703733,3332001,Ben,central,binder,789,650 5 | 4508368,9651099,Richard,central,pens,948,551 6 | 6895053,6691904,Richard,east,paper,453,548 7 | 1018357,1437828,Ben,central,pens,954,640 8 | 2697132,9866065,Nick,central,paper,696,353 9 | 3027058,9130952,Aaron,east,binder,138,223 10 | 2073981,9141578,Ben,west,paper,325,422 11 | 6076983,4238099,Nick,central,paper,134,344 12 | 5316851,8121173,John,west,pens,825,954 13 | 9962221,1977268,Aaron,central,paper,557,557 14 | 5398147,6367649,John,east,paper,532,566 15 | 3864861,4066176,Aaron,west,binder,381,199 16 | 9821733,9512218,Aaron,east,pens,466,583 17 | 2940832,3210755,John,central,binder,119,616 18 | 1799654,4468679,John,west,paper,622,300 19 | 6729716,9309020,Ben,west,binder,948,623 20 | 7280784,8332358,Ben,west,paper,279,225 21 | 6674887,1613599,Ben,west,paper,221,427 22 | 7863449,7505176,John,east,pens,218,890 23 | 3609656,4698495,Ben,west,pens,196,563 24 | 7592925,7749241,Richard,central,pens,339,498 25 | 8875502,3067891,Nick,west,binder,927,260 26 | 5286002,2341849,Richard,east,paper,801,965 27 | 5051433,5163955,Richard,east,pens,393,798 28 | 5699284,9868416,Ben,west,paper,280,750 29 | 7043309,2474609,Nick,east,paper,147,353 30 | 7151204,8237679,Nick,west,binder,664,620 31 | 9170699,9080335,John,east,binder,221,626 32 | 6321713,5514052,John,east,pens,306,410 33 | 7448969,6503473,Aaron,central,binder,293,274 34 | 4549509,6654647,Richard,west,paper,918,868 35 | 9453682,1636058,Aaron,central,pens,976,280 36 | 5335378,5838107,Richard,west,binder,699,220 37 | 1392828,7208028,Richard,west,binder,182,786 38 | 9697042,1346679,John,west,pens,238,212 39 | 1451047,4435497,Richard,west,binder,365,718 40 | 
3594849,8554543,Ben,west,paper,945,127 41 | 3867317,2521725,John,east,pens,842,270 42 | 4558641,1050934,Nick,west,paper,605,286 43 | 4619372,7948476,Ben,east,pens,512,682 44 | 2026276,4485732,Ben,east,binder,795,857 45 | 2719065,5068010,Richard,east,pens,289,436 46 | 1391907,2041945,John,west,pens,917,627 47 | 1868539,2325194,Ben,west,binder,579,190 48 | 4108552,8039195,John,west,paper,271,808 49 | 1046194,5168931,Nick,central,pens,513,693 50 | 8301946,2956675,Richard,east,binder,567,761 51 | 9055248,5868755,Richard,west,binder,232,219 52 | 3874847,1563078,Ben,east,paper,706,611 53 | 8616293,5952825,Ben,east,paper,665,683 54 | 4657692,8199620,Nick,west,binder,570,961 55 | 2937477,1920961,Nick,central,paper,121,799 56 | 2902393,7232627,Richard,central,paper,873,145 57 | 2801703,7954307,Nick,central,pens,581,550 58 | 1579315,4808019,John,central,pens,646,138 59 | 5104644,4471392,Aaron,central,paper,839,524 60 | 8117338,8816269,Aaron,east,binder,795,481 61 | 4292715,5144317,Nick,central,binder,639,451 62 | 9574437,7149165,Nick,east,pens,474,510 63 | 1286942,6788174,Ben,west,binder,836,742 64 | 7914658,1253557,John,central,binder,586,662 65 | 3610539,8287938,John,west,paper,170,416 66 | 9691626,7703325,Ben,west,binder,292,794 67 | 7773380,3324706,Ben,central,paper,372,558 68 | 9560196,7923059,Nick,east,pens,727,181 69 | 8331616,6920131,Ben,east,pens,262,530 70 | 2169243,8424174,Ben,east,paper,988,668 71 | 9149901,1420867,Richard,east,pens,310,693 72 | 1952779,4474360,Aaron,east,paper,333,782 73 | 9247917,9273201,Nick,east,pens,790,519 74 | 1899785,2109114,Ben,east,pens,476,668 75 | 7856754,2280721,Ben,east,paper,124,746 76 | 2105839,5509421,Nick,central,pens,379,716 77 | 3515994,1988786,Aaron,central,pens,599,712 78 | 6461676,7340276,John,central,pens,920,190 79 | 7276182,1076975,Aaron,central,binder,927,840 80 | 2152277,2696815,Aaron,east,binder,685,812 81 | 5527535,5810406,Aaron,west,pens,542,337 82 | 8463126,2974927,Richard,central,paper,678,133 83 | 
7173049,8681162,Nick,central,paper,506,847 84 | 8719679,5690117,Nick,central,pens,230,578 85 | 9617614,9591048,Richard,east,paper,913,301 86 | 3377423,3798798,Ben,east,paper,769,947 87 | 8451040,1070835,Richard,east,pens,418,508 88 | 8332099,8158160,Ben,west,binder,657,577 89 | 6570058,4061390,Aaron,west,binder,406,426 90 | 4080314,8616824,Aaron,central,pens,797,221 91 | 4375686,4191217,John,west,binder,734,550 92 | 2386825,6043101,Nick,east,binder,524,491 93 | 3272868,2159803,Ben,central,paper,523,601 94 | 5116646,6224073,Richard,central,binder,531,529 95 | 9699564,8448356,Richard,west,paper,706,241 96 | 8484139,1132006,Aaron,central,binder,241,516 97 | 3612660,7468263,Richard,east,binder,437,107 98 | 5732816,1145162,John,central,paper,289,420 99 | 3484603,6390384,Nick,west,binder,112,242 100 | 2040436,2378152,Richard,west,binder,990,590 101 | -------------------------------------------------------------------------------- /host.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "2.0" 3 | } -------------------------------------------------------------------------------- /local.settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "IsEncrypted": false, 3 | "Values": { 4 | "FUNCTIONS_WORKER_RUNTIME": "python", 5 | "BlobAccountName": "", 6 | "BlobAccountKey": "", 7 | "C1": "c1raw", 8 | "C2": "c2raw", 9 | "FINAL" : "reconciled" 10 | }, 11 | "ConnectionStrings": {} 12 | } 13 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | asn1crypto==0.24.0 2 | astroid==2.1.0 3 | azure-common==1.1.18 4 | azure-functions==1.0.0a5 5 | azure-functions-worker==1.0.0a6 6 | azure-storage-blob==1.4.0 7 | azure-storage-common==1.4.0 8 | certifi==2018.11.29 9 | cffi==1.11.5 10 | chardet==3.0.4 11 | cryptography==2.5 12 | cycler==0.10.0 13 | 
grpcio==1.14.2 14 | grpcio-tools==1.14.2 15 | idna==2.8 16 | isort==4.3.4 17 | kiwisolver==1.0.1 18 | lazy-object-proxy==1.3.1 19 | matplotlib==3.0.2 20 | mccabe==0.6.1 21 | numpy==1.16.1 22 | pandas==0.24.1 23 | protobuf==3.6.1 24 | ptvsd==4.2.3 25 | pycparser==2.19 26 | pylint==2.2.2 27 | pyparsing==2.3.1 28 | python-dateutil==2.7.5 29 | pytz==2018.9 30 | requests==2.21.0 31 | scikit-learn==0.20.2 32 | scipy==1.2.0 33 | six==1.12.0 34 | sklearn==0.0 35 | typed-ast==1.3.0 36 | urllib3==1.24.1 37 | wrapt==1.11.1 38 | -------------------------------------------------------------------------------- /tests/host.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "2.0" 3 | } -------------------------------------------------------------------------------- /tests/subvalidation.json: -------------------------------------------------------------------------------- 1 | [{ 2 | "id": "2d1781af-3a4c-4d7c-bd0c-e34b19da4e66", 3 | "topic": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", 4 | "subject": "", 5 | "data": { 6 | "validationCode": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6", 7 | "validationUrl": "https://rp-eastus2.eventgrid.azure.net:553/eventsubscriptions/estest/validate?id=B2E34264-7D71-453A-B5FB-B62D0FDC85EE&t=2018-04-26T20:30:54.4538837Z&apiVersion=2018-05-01-preview&token=1BNqCxBBSSE9OnNSfZM4%2b5H9zDegKMY6uJ%2fO2DFRkwQ%3d" 8 | }, 9 | "eventType": "Microsoft.EventGrid.SubscriptionValidationEvent", 10 | "eventTime": "2018-01-25T22:12:19.4556811Z", 11 | "metadataVersion": "1", 12 | "dataVersion": "1" 13 | }] -------------------------------------------------------------------------------- /tests/test_eventgrid.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import subprocess 3 | import os 4 | import signal 5 | import requests 6 | import json 7 | import time 8 | import collections 9 | ### Travis: 
http://luisquintanilla.me/2018/02/18/testing-deploying-python-projects-travisci/ 10 | # Run `func host start`, then open a new terminal and cd into the test folder 11 | # pytest -v test_eventgrid.py 12 | ## https://learning.oreilly.com/library/view/python-testing-with/9781680502848/f_0011.xhtml#ch.pytest 13 | 14 | pro = None 15 | 16 | 17 | @pytest.fixture 18 | def init_func(): 19 |     # Placeholder fixture: the Functions host is currently started manually (func host start) 20 |     pass 21 |     #subprocess.Popen("func host start",shell=True) 22 |     # The os.setsid() is passed in the argument preexec_fn so 23 |     # it's run after the fork() and before exec() to run the shell. 
28 |     #pro = subprocess.Popen(['func','host','start'],stdout=subprocess.PIPE, shell=True, preexec_fn=os.setsid) 29 |     #yield 30 |     #print("tearing down functions host...") 31 |     #os.killpg(os.getpgid(pro.pid), signal.SIGTERM) 32 | 33 | 34 | # https://docs.pytest.org/en/latest/fixture.html 35 | # https://docs.pytest.org/en/latest/parametrize.html 36 | # https://learning.oreilly.com/library/view/Python+Testing+with+pytest/9781680502848/f_0026.xhtml#parametrized_testing 37 | # Use @pytest.mark.parametrize(argnames, argvalues) to run the same handshake test against each endpoint: 38 | @pytest.mark.parametrize('web', ['http://localhost:7071/api/GE_Clean_Trigger', 39 |                                  'http://localhost:7071/api/MTU_Clean_Trigger', 40 |                                  'http://localhost:7071/api/PO_Match']) 41 | def test_eg_validation(init_func, web): 42 |     with open('subvalidation.json') as f: 43 |         payload = json.load(f) 44 |     r = requests.post(web, json=payload) 45 |     print(r.status_code, r.json()) 46 |     assert 'validationResponse' in r.json() 47 | --------------------------------------------------------------------------------
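
A note on the Event Grid handshake that test_eventgrid.py exercises: when a subscription is created, Event Grid POSTs a `Microsoft.EventGrid.SubscriptionValidationEvent` (see tests/subvalidation.json) and expects the endpoint to echo back the `validationCode`. The core logic can be sketched without a running Functions host; the function name and the trimmed payload below are illustrative stand-ins, not part of the app code:

```python
import json

def build_validation_response(req_body):
    """Echo Event Grid's validationCode back to complete the webhook handshake."""
    event = req_body[0]
    if event.get("eventType") != "Microsoft.EventGrid.SubscriptionValidationEvent":
        return None  # not a validation event; nothing to echo
    return json.dumps({"validationResponse": event["data"]["validationCode"]})

# Trimmed payload in the shape of tests/subvalidation.json
payload = [{
    "eventType": "Microsoft.EventGrid.SubscriptionValidationEvent",
    "data": {"validationCode": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6"},
}]
print(build_validation_response(payload))
# {"validationResponse": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6"}
```

The test in tests/test_eventgrid.py asserts exactly this shape after POSTing the full payload to each local endpoint.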