├── .gitignore
├── LICENSE
├── README.md
├── gcs-dlp-classification-python
    ├── README.md
    ├── main.py
    └── requirements.txt
├── sample_data
    ├── sample_n01.txt
    ├── sample_n02.txt
    ├── sample_n03.txt
    ├── sample_n04.txt
    ├── sample_n05.txt
    ├── sample_n06.txt
    ├── sample_n07.txt
    ├── sample_n08.txt
    ├── sample_n09.txt
    ├── sample_n10.txt
    ├── sample_n11.csv
    ├── sample_n12.csv
    ├── sample_n13.csv
    ├── sample_n14.csv
    ├── sample_n15.csv
    ├── sample_s01.csv
    ├── sample_s02.csv
    ├── sample_s03.csv
    ├── sample_s04.csv
    ├── sample_s05.csv
    ├── sample_s06.csv
    ├── sample_s07.csv
    ├── sample_s08.csv
    ├── sample_s09.csv
    ├── sample_s10.csv
    ├── sample_s11.csv
    ├── sample_s12.csv
    ├── sample_s13.csv
    ├── sample_s14.csv
    ├── sample_s15.csv
    ├── sample_s16.csv
    ├── sample_s17.csv
    ├── sample_s18.csv
    ├── sample_s19.csv
    ├── sample_s20.csv
    ├── sample_s21.txt
    ├── sample_s22.txt
    ├── sample_s23.txt
    ├── sample_s24.txt
    └── sample_s25.txt
├── terraform
    ├── README.md
    ├── dlp
    │   ├── dlp.tf
    │   ├── terraform.tfvars
    │   └── variables.tf
    ├── dlpfunction
    │   ├── dlpfunction.tf
    │   ├── terraform.tfvars
    │   └── variables.tf
    ├── infra
    │   ├── infra.tf
    │   ├── terraform.tfvars
    │   └── variables.tf
    └── src
    │   └── requirements.txt
└── test
    ├── sample_n01.txt
    ├── sample_s01.csv
    ├── sample_s02.csv
    ├── smoke-test-gcf-dlp.sh
    └── test-gcs.js

/.gitignore:
--------------------------------------------------------------------------------
# ignore DS_Store
.DS_Store

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format.
We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Serverless Data Loss Prevention examples

This repository contains a set of 'serverless' examples to illustrate how to use the Data Loss Prevention API without managing any servers.

## How to use the examples

Use the tutorial to understand how to configure your Google Cloud Platform project to use Cloud Functions and the Data Loss Prevention API.

## Quickstart

Clone this repository

`git clone https://github.com/GoogleCloudPlatform/dlp-cloud-functions-tutorials.git`

Change directory to one of the example directories

Follow the walkthrough in the tutorial associated with the Python example for configuration details of Cloud Platform products (Cloud Storage, Cloud Pub/Sub and Cloud IAM permissions) and adapt accordingly using the accompanying README for each example.

Note: You may wish to reuse the same project to try all the examples. If so:
* Ensure you delete all files from the buckets you configured for the first tutorial in the series before reusing them.
* Delete any existing Cloud Functions you have deployed.

## License

[Apache Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)

--------------------------------------------------------------------------------
/gcs-dlp-classification-python/README.md:
--------------------------------------------------------------------------------

# Cloud Function that uses the Data Loss Prevention API to classify files uploaded to a Cloud Storage bucket

Prerequisites: See the tutorial that accompanies this example:
https://cloud.google.com/solutions/automating-classification-of-data-uploaded-to-cloud-storage

The workflow:

* Grant Cloud IAM permissions to service accounts (refer to the tutorial)
* Create 3 Cloud Storage buckets
* Create a Cloud Pub/Sub topic and subscription for notification of DLP job completion
* Replace variables in the Cloud Function file with your values
* Associate the first Cloud Function with a designated quarantine/staging bucket
* Associate the second Cloud Function with a Pub/Sub topic
* Upload files to the quarantine bucket
* The Cloud Functions are invoked automatically
* The Data Loss Prevention API inspects and classifies the data
* The file is moved to the appropriate bucket

## How to run the example

Update the following user-configurable values in the main.py file

```
[PROJECT_ID_DLP_JOB & TOPIC] Replace with your Project ID
[YOUR_NON_SENSITIVE_DATA_BUCKET] Replace with the name of the bucket where non sensitive data will be moved to
[YOUR_SENSITIVE_DATA_BUCKET] Replace with the name of the bucket where sensitive data will be moved to
[YOUR_QUARANTINE_BUCKET] Replace with the name of the bucket where you will upload your files to
[PUB/SUB_TOPIC] Replace with your Pub/Sub topic name
```
Deploy the first function

`gcloud functions deploy gcs_file_upload_DLP_job --entry-point create_DLP_job --runtime python37 --trigger-resource ${YOUR_QUARANTINE_BUCKET} --trigger-event google.storage.object.finalize`

Deploy the second function

`gcloud functions deploy DLP_pub_classify_file --entry-point resolve_DLP --runtime python37 --trigger-topic ${YOUR_PUB_SUB_TOPIC}`

Change directories to the directory that contains the sample data

Upload the sample files to `${YOUR_QUARANTINE_BUCKET}`

`gsutil -m cp * gs://${YOUR_QUARANTINE_BUCKET}/`

## Refs

https://cloud.google.com/dlp/docs/libraries


## License

[Apache Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)

--------------------------------------------------------------------------------
/gcs-dlp-classification-python/main.py:
--------------------------------------------------------------------------------
""" Copyright 2018, Google, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Authors: Yuhan Guo, Zhaoyuan Sun, Fengyi Huang, Weimu Song.
Date: October 2018

"""

from google.cloud import dlp
from google.cloud import storage
from google.cloud import pubsub
from google.cloud import logging
import os

# ----------------------------
#  User-configurable Constants

PROJECT_ID = os.getenv('DLP_PROJECT_ID', '[PROJECT_ID_DLP_JOB & TOPIC]')
# The bucket the to-be-scanned files are uploaded to.
STAGING_BUCKET = os.getenv('QUARANTINE_BUCKET', '[YOUR_QUARANTINE_BUCKET]')
# The bucket to move "sensitive" files to.
SENSITIVE_BUCKET = os.getenv('SENSITIVE_DATA_BUCKET', '[YOUR_SENSITIVE_DATA_BUCKET]')
# The bucket to move "non sensitive" files to.
NONSENSITIVE_BUCKET = os.getenv('INSENSITIVE_DATA_BUCKET', '[YOUR_NON_SENSITIVE_DATA_BUCKET]')
# Pub/Sub topic to notify once the DLP job completes.
PUB_SUB_TOPIC = os.getenv('PUB_SUB_TOPIC', '[PUB/SUB_TOPIC]')
# The minimum_likelihood (Enum) required before returning a match.
# For more info visit: https://cloud.google.com/dlp/docs/likelihood
MIN_LIKELIHOOD = os.getenv('MIN_LIKELIHOOD', 'POSSIBLE')
# The maximum number of findings to report (0 = server maximum).
MAX_FINDINGS = 0
# The infoTypes of information to match. ALL_BASIC for common infoTypes.
# For more info visit: https://cloud.google.com/dlp/docs/concepts-infotypes
INFO_TYPES = os.getenv('INFO_TYPES', 'FIRST_NAME,PHONE_NUMBER,EMAIL_ADDRESS,US_SOCIAL_SECURITY_NUMBER').split(',')

APP_LOG_NAME = os.getenv('LOG_NAME', 'DLP-classify-gcs-files')

# End of User-configurable Constants
# ----------------------------------

# Initialize the Google Cloud client libraries.
# The DLP client is named dlp_client so that it does not shadow the imported dlp module.
dlp_client = dlp.DlpServiceClient()
storage_client = storage.Client()
publisher = pubsub.PublisherClient()
subscriber = pubsub.SubscriberClient()
# Created once at import time so that log() does not build a new client on every call.
logging_client = logging.Client()

LOG_SEVERITY_DEFAULT = 'DEFAULT'
LOG_SEVERITY_INFO = 'INFO'
LOG_SEVERITY_ERROR = 'ERROR'
LOG_SEVERITY_WARNING = 'WARNING'
LOG_SEVERITY_DEBUG = 'DEBUG'


def log(text, severity=LOG_SEVERITY_DEFAULT, log_name=APP_LOG_NAME):
    """Writes a text entry with the given severity to the named Cloud Logging log."""
    logger = logging_client.logger(log_name)

    return logger.log_text(text, severity=severity)


def create_DLP_job(data, context):
    """This function is triggered by new files uploaded to the designated Cloud Storage quarantine/staging bucket.

    It creates a DLP job for the uploaded file.
    Args:
        data: The Cloud Storage event
        context: The event metadata (unused)
    Returns:
        None. Debug information is printed to the log.
    """
    # Get the targeted file in the quarantine bucket
    file_name = data['name']
    log('Function triggered for file [{}] to start a DLP job of InfoTypes [{}]'.format(file_name, ','.join(INFO_TYPES)),
        severity=LOG_SEVERITY_INFO)

    # Prepare info_types by converting the list of strings (INFO_TYPES) into a list of dictionaries
    info_types = [{'name': info_type} for info_type in INFO_TYPES]

    # Convert the project id into a full resource id.
    parent = f"projects/{PROJECT_ID}"

    # Construct the configuration dictionary.
    inspect_job = {
        'inspect_config': {
            'info_types': info_types,
            'min_likelihood': MIN_LIKELIHOOD,
            'limits': {
                'max_findings_per_request': MAX_FINDINGS
            },
        },
        'storage_config': {
            'cloud_storage_options': {
                'file_set': {
                    'url':
                        'gs://{bucket_name}/{file_name}'.format(
                            bucket_name=STAGING_BUCKET, file_name=file_name)
                }
            }
        },
        'actions': [{
            'pub_sub': {
                'topic':
                    'projects/{project_id}/topics/{topic_id}'.format(
                        project_id=PROJECT_ID, topic_id=PUB_SUB_TOPIC)
            }
        }]
    }

    # Create the DLP job and let the DLP API process it.
    try:
        dlp_client.create_dlp_job(parent=parent, inspect_job=inspect_job)
        log('Job created by create_DLP_job', severity=LOG_SEVERITY_INFO)
    except Exception as e:
        # log_text expects a string, so convert the exception before logging it.
        log(str(e), severity=LOG_SEVERITY_ERROR)


def resolve_DLP(data, context):
    """This function listens to the Pub/Sub notification from the function above.

    As soon as it gets the Pub/Sub notification, it picks up the results from the
    DLP job and moves the file to the sensitive bucket or the non-sensitive bucket
    accordingly.
    Args:
        data: The Cloud Pub/Sub event
        context: The event metadata (unused)

    Returns:
        None. Debug information is printed to the log.
    """
    # Get the name of the DLP job that was created by the create_DLP_job function
    job_name = data['attributes']['DlpJobName']
    log('Received pub/sub notification from DLP job: {}'.format(job_name), severity=LOG_SEVERITY_INFO)

    # Get the DLP job details by the job_name
    job = dlp_client.get_dlp_job(request={'name': job_name})
    log('Job Name:{name}\nStatus:{status}'.format(name=job.name, status=job.state), severity=LOG_SEVERITY_INFO)

    # Fetch the file name in Cloud Storage from the original dlpJob config.
    # See the definition of "JSON Output" in "Limiting Cloud Storage Scans":
    # https://cloud.google.com/dlp/docs/inspecting-storage

    file_path = (
        job.inspect_details.requested_options.job_config.storage_config
        .cloud_storage_options.file_set.url)
    # 'gs://bucket/path/to/file' splits into ['gs:', '', 'bucket', 'path/to/file']
    file_name = file_path.split("/", 3)[3]

    info_type_stats = job.inspect_details.result.info_type_stats
    source_bucket = storage_client.get_bucket(STAGING_BUCKET)
    source_blob = source_bucket.blob(file_name)
    if len(info_type_stats) > 0:
        # Found at least one instance of sensitive data
        for stat in info_type_stats:
            log('Found {stat_cnt} instances of {stat_type_name}.'.format(
                stat_cnt=stat.count, stat_type_name=stat.info_type.name), severity=LOG_SEVERITY_WARNING)
        log('Moving item to sensitive bucket', severity=LOG_SEVERITY_DEBUG)
        destination_bucket = storage_client.get_bucket(SENSITIVE_BUCKET)
        source_bucket.copy_blob(source_blob, destination_bucket,
                                file_name)  # copy the item to the sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket

    else:
        # No sensitive data found
        log('Moving item to non-sensitive bucket', severity=LOG_SEVERITY_DEBUG)
        destination_bucket = storage_client.get_bucket(NONSENSITIVE_BUCKET)
        source_bucket.copy_blob(
            source_blob, destination_bucket,
            file_name)  # copy the item to the non-sensitive bucket
        source_blob.delete()  # delete item from the quarantine bucket
    log('Classifying file [{}] finished'.format(file_name), severity=LOG_SEVERITY_DEBUG)

--------------------------------------------------------------------------------
/gcs-dlp-classification-python/requirements.txt:
--------------------------------------------------------------------------------
google-cloud-dlp
google-cloud-pubsub
google-cloud-storage
google-cloud-logging
--------------------------------------------------------------------------------
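The classification step in `resolve_DLP` comes down to two small decisions: recovering the object name from the `gs://` URL stored in the job configuration, and picking a destination bucket depending on whether any infoType statistics were reported. A minimal sketch of that logic, kept free of Google Cloud clients so it can be run locally, might look like the following. The helper names `object_name_from_gcs_url` and `choose_destination_bucket`, and the bucket names and counts in the usage lines, are hypothetical and are not part of the repository.

```python
from typing import Mapping


def object_name_from_gcs_url(url: str) -> str:
    """Return the object name from a gs://<bucket>/<object> URL.

    Hypothetical helper: for well-formed URLs it gives the same result as
    main.py's `url.split("/", 3)[3]`, but raises ValueError on malformed input.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URL: {url!r}")
    # Split "<bucket>/<object>" at the first slash only, so nested
    # object paths such as "reports/file.csv" are kept intact.
    bucket, _, object_name = url[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URL must include a bucket and an object: {url!r}")
    return object_name


def choose_destination_bucket(
    info_type_counts: Mapping[str, int],
    sensitive_bucket: str,
    nonsensitive_bucket: str,
) -> str:
    """Pick the destination bucket from per-infoType statistics.

    Hypothetical helper: mirrors the `len(info_type_stats) > 0` check in
    main.py, where any reported infoType marks the file as sensitive.
    """
    has_findings = len(info_type_counts) > 0
    return sensitive_bucket if has_findings else nonsensitive_bucket


# Usage with made-up bucket names and counts.
name = object_name_from_gcs_url("gs://my-quarantine/reports/sample_s01.csv")
print(name)
print(choose_destination_bucket({"EMAIL_ADDRESS": 8}, "my-sensitive", "my-clean"))
print(choose_destination_bucket({}, "my-sensitive", "my-clean"))
```

Separating these decisions from the Cloud Storage calls is only one possible design; its main benefit is that the routing rules can be checked without deploying the functions or creating a DLP job.

--------------------------------------------------------------------------------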
/sample_data/sample_n01.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n02.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n03.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n04.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n05.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n06.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n07.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n08.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n09.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n10.txt:
--------------------------------------------------------------------------------
This is a test document that does not contain any sensitive data.

“Organize the world’s information and make it universally accessible and useful.”
Since the beginning, our goal has been to develop services that significantly improve the lives of as many people as possible.

Not just for some. For everyone.

--------------------------------------------------------------------------------
/sample_data/sample_n11.csv:
--------------------------------------------------------------------------------
record id,metric 1,metric 2,metric 3
1,9,91,68
2,88,97,75
3,35,7,80
4,22,67,1
5,15,32,15
6,83,10,53
7,69,48,16
8,88,61,30
--------------------------------------------------------------------------------
/sample_data/sample_n12.csv:
--------------------------------------------------------------------------------
record id,metric 1,metric 2,metric 3
1,13,12,21
2,4,70,89
3,72,66,66
4,53,99,25
5,17,26,83
6,72,82,97
7,56,70,81
8,19,80,98
--------------------------------------------------------------------------------
/sample_data/sample_n13.csv:
--------------------------------------------------------------------------------
record id,metric 1,metric 2,metric 3
1,24,78,93
2,2,40,42
3,52,10,70
4,14,27,91
5,69,64,66
6,52,16,71
7,46,73,68
8,66,8,1
--------------------------------------------------------------------------------
/sample_data/sample_n14.csv:
--------------------------------------------------------------------------------
N,metric 1,metric 2,metric 3
1,53,49,0
2,12,13,30
3,25,94,17
4,13,48,23
5,60,81,37
6,60,89,42
7,100,60,59
8,50,76,8
--------------------------------------------------------------------------------
/sample_data/sample_n15.csv:
--------------------------------------------------------------------------------
record id,metric 1,metric 2,metric 3
1,59,93,100
2,53,13,17
3,59,67,53
4,52,93,34
5,14,22,88
6,18,88,3
7,32,49,5
8,93,46,14
--------------------------------------------------------------------------------
/sample_data/sample_s01.csv:
--------------------------------------------------------------------------------
email,metric 1,metric 2
jane.doe@example.org,4,48
test86551@example.org,86,24
sasdfa90@example.org,100,11
jack.smith@example.org,44,89
sasdfa90@example.org,29,60
jane.doe@example.org,8,36
sasdfa90@example.org,76,41
jack.smith@example.org,9,97
--------------------------------------------------------------------------------
/sample_data/sample_s02.csv:
--------------------------------------------------------------------------------
email,metric 1,metric 2
jack.smith@example.org,83,61
sasdfa90@example.org,74,16
jane.doe@example.org,49,3
jack.smith@example.org,1,33
jane.doe@example.org,2,82
jack.smith@example.org,5,7
sasdfa90@example.org,8,86
ejadd@example.org,73,69
--------------------------------------------------------------------------------
/sample_data/sample_s03.csv:
--------------------------------------------------------------------------------
email,metric 1,metric 2
sasdfa90@example.org,99,87
ejadd@example.org,94,93
ejadd@example.org,95,98
test86551@example.org,48,66
test86551@example.org,85,99
test86551@example.org,44,47
jane.doe@example.org,21,51
test86551@example.org,9,71
--------------------------------------------------------------------------------
/sample_data/sample_s04.csv:
--------------------------------------------------------------------------------
email,metric 1,metric 2
test86551@example.org,44,9
jack.smith@example.org,67,28
ejadd@example.org,18,12
jack.smith@example.org,91,28
jane.doe@example.org,4,71
sasdfa90@example.org,93,91
test86551@example.org,85,47
test86551@example.org,75,29
--------------------------------------------------------------------------------
/sample_data/sample_s05.csv:
--------------------------------------------------------------------------------
email,metric 1,metric 2
jack.smith@example.org,46,29
test86551@example.org,64,75
ejadd@example.org,85,19
test86551@example.org,51,57
test86551@example.org,23,81
test86551@example.org,8,32
jack.smith@example.org,67,61
jane.doe@example.org,51,75
--------------------------------------------------------------------------------
/sample_data/sample_s06.csv:
--------------------------------------------------------------------------------
name,Phone,metric 1,metric 2
Jackie Smith,858-232-1123,5,36
Maria Johnson,650-332-1241,99,26
Maria Johnson,212-451-5511,91,44
Jackie Smith,650-332-1241,37,11
Jackie Smith,212-451-5511,70,9
Tzvika Roberts,212-451-5511,54,21
Jackie Smith,650-332-1241,71,0
Tzvika Roberts,650-332-1241,51,37
--------------------------------------------------------------------------------
/sample_data/sample_s07.csv:
--------------------------------------------------------------------------------
s,Phone,metric 1,metric 2
Ameet Garland,915-332-1123,70,54
Maria Johnson,858-232-1123,99,86
Ameet Garland,212-451-5511,45,89
Tyler Parker,915-332-1123,100,48
Tyler Parker,650-332-1241,35,41
Maria Johnson,858-232-1123,99,82
Jackie Smith,212-451-5511,100,83
Jackie Smith,650-332-1241,61,48
--------------------------------------------------------------------------------
/sample_data/sample_s08.csv:
--------------------------------------------------------------------------------
name,Phone,metric 1,metric 2
Maria Johnson,650-332-1241,71,8
Maria Johnson,915-332-1123,19,35
Ameet Garland,858-232-1123,62,72
Ameet Garland,915-332-1123,96,72
Jackie Smith,619-442-1124,83,8
Tzvika Roberts,915-332-1123,82,62
Tyler Parker,858-232-1123,33,44
Tzvika Roberts,915-332-1123,45,52
--------------------------------------------------------------------------------
/sample_data/sample_s09.csv:
--------------------------------------------------------------------------------
name,Phone,metric 1,metric 2
Ameet Garland,650-332-1241,37,78
Tyler Parker,915-332-1123,31,24
Tyler Parker,619-442-1124,28,36
Maria Johnson,619-442-1124,75,96
Maria Johnson,212-451-5511,40,58
Ameet Garland,619-442-1124,65,89
Ameet Garland,650-332-1241,46,17
Maria Johnson,915-332-1123,97,22
--------------------------------------------------------------------------------
/sample_data/sample_s10.csv:
--------------------------------------------------------------------------------
name,Phone,metric 1,metric 2
Ameet Garland,858-232-1123,98,37
Jackie Smith,650-332-1241,21,73
Ameet Garland,858-232-1123,14,85
Maria Johnson,858-232-1123,66,59
Tzvika Roberts,212-451-5511,17,23
Maria Johnson,212-451-5511,52,23
Tyler Parker,212-451-5511,63,28
Tzvika Roberts,212-451-5511,91,62
--------------------------------------------------------------------------------
/sample_data/sample_s11.csv:
--------------------------------------------------------------------------------
name,Address,metric 1,metric 2
Tyler Parker,"1600 Amphitheater Way, 94043",17,57
Jackie Smith,"747 6th St S, Kirkland, WA 98033",1,28
Jackie Smith,"801 11th Ave, Sunnyvale, CA",36,87
Tyler Parker,"801 11th Ave, Sunnyvale, CA",25,23
Jackie Smith,"345 Spear St, San Francisco, CA 94105",49,2
Tzvika Roberts,"1600 Amphitheater Way, 94043",43,93
Tyler Parker,"1600 Amphitheater Way, 94043",45,93
Tzvika Roberts,"111 8th Ave, New York, NY 10011",88,19
--------------------------------------------------------------------------------
/sample_data/sample_s12.csv:
--------------------------------------------------------------------------------
name,Address,metric 1,metric 2
Tyler Parker,"1600 Amphitheater Way, 94043",93,100
Tyler Parker,"1600 Amphitheater Way, 94043",11,84
Ameet Garland,"111 8th Ave, New York, NY 10011",23,12
Tyler Parker,"111 8th Ave, New York, NY 10011",72,46
Tyler Parker,"1600 Amphitheater Way, 94043",40,7
Ameet Garland,"801 11th Ave, Sunnyvale, CA",57,70
Ameet Garland,"747 6th St S, Kirkland, WA 98033",83,98
Maria
Johnson,"345 Spear St, San Francisco, CA 94105",68,61 -------------------------------------------------------------------------------- /sample_data/sample_s13.csv: -------------------------------------------------------------------------------- 1 | name,Address,metric 1,metric 2 2 | Jackie Smith,"1600 Amphitheater Way, 94043",76,32 3 | Maria Johnson,"345 Spear St, San Francisco, CA 94105",14,89 4 | Tzvika Roberts,"111 8th Ave, New York, NY 10011",36,16 5 | Jackie Smith,"801 11th Ave, Sunnyvale, CA",30,7 6 | Tzvika Roberts,"111 8th Ave, New York, NY 10011",80,25 7 | Ameet Garland,"747 6th St S, Kirkland, WA 98033",68,46 8 | Tyler Parker,"801 11th Ave, Sunnyvale, CA",71,44 9 | Maria Johnson,"111 8th Ave, New York, NY 10011",19,60 -------------------------------------------------------------------------------- /sample_data/sample_s14.csv: -------------------------------------------------------------------------------- 1 | name,Address,metric 1,metric 2 2 | Ameet Garland,"801 11th Ave, Sunnyvale, CA",45,47 3 | Tyler Parker,"345 Spear St, San Francisco, CA 94105",4,52 4 | Jackie Smith,"747 6th St S, Kirkland, WA 98033",77,77 5 | Ameet Garland,"345 Spear St, San Francisco, CA 94105",50,94 6 | Jackie Smith,"345 Spear St, San Francisco, CA 94105",64,28 7 | Maria Johnson,"801 11th Ave, Sunnyvale, CA",6,74 8 | Ameet Garland,"801 11th Ave, Sunnyvale, CA",28,77 9 | Tyler Parker,"801 11th Ave, Sunnyvale, CA",93,27 -------------------------------------------------------------------------------- /sample_data/sample_s15.csv: -------------------------------------------------------------------------------- 1 | S15,Address,metric 1,metric 2 2 | Tzvika Roberts,"345 Spear St, San Francisco, CA 94105",30,3 3 | Jackie Smith,"1600 Amphitheater Way, 94043",73,83 4 | Tyler Parker,"345 Spear St, San Francisco, CA 94105",20,94 5 | Maria Johnson,"747 6th St S, Kirkland, WA 98033",40,18 6 | Tyler Parker,"1600 Amphitheater Way, 94043",80,44 7 | Tzvika Roberts,"1600 Amphitheater Way, 94043",28,26 
8 | Ameet Garland,"801 11th Ave, Sunnyvale, CA",96,7 9 | Maria Johnson,"801 11th Ave, Sunnyvale, CA",50,9 -------------------------------------------------------------------------------- /sample_data/sample_s16.csv: -------------------------------------------------------------------------------- 1 | Name,SSN,metric 1,metric 2 2 | Ameet Garland,719-12-6560,56,20 3 | Jackie Smith,719-12-6560,28,98 4 | Jackie Smith,616-69-3226,19,55 5 | Tyler Parker,616-69-3226,68,90 6 | Tzvika Roberts,245-25-8698,55,100 7 | Jackie Smith,719-12-6560,15,53 8 | Jackie Smith,719-12-6560,5,63 9 | Ameet Garland,719-12-6560,32,36 -------------------------------------------------------------------------------- /sample_data/sample_s17.csv: -------------------------------------------------------------------------------- 1 | Name,SSN,metric 1,metric 2 2 | Tzvika Roberts,245-25-8698,24,6 3 | Tyler Parker,475-15-8499,68,96 4 | Ameet Garland,245-25-8698,76,96 5 | Jackie Smith,616-69-3226,91,28 6 | Tyler Parker,475-15-8499,12,46 7 | Tzvika Roberts,475-15-8499,66,32 8 | Maria Johnson,245-25-8698,88,60 9 | Jackie Smith,719-12-6560,3,78 -------------------------------------------------------------------------------- /sample_data/sample_s18.csv: -------------------------------------------------------------------------------- 1 | Name,SSN,metric 1,metric 2 2 | Ameet Garland,616-69-3226,33,87 3 | Ameet Garland,475-15-8499,98,35 4 | Maria Johnson,284-73-5110,3,15 5 | Tyler Parker,245-25-8698,57,5 6 | Maria Johnson,719-12-6560,70,16 7 | Ameet Garland,719-12-6560,16,77 8 | Tzvika Roberts,284-73-5110,95,12 9 | Jackie Smith,475-15-8499,39,73 -------------------------------------------------------------------------------- /sample_data/sample_s19.csv: -------------------------------------------------------------------------------- 1 | Name,SSN,metric 1,metric 2 2 | Jackie Smith,245-25-8698,72,31 3 | Jackie Smith,284-73-5110,85,16 4 | Tyler Parker,719-12-6560,71,15 5 | Maria Johnson,475-15-8499,28,46 6 | Ameet 
Garland,475-15-8499,77,81 7 | Jackie Smith,719-12-6560,35,5 8 | Ameet Garland,284-73-5110,19,11 9 | Tzvika Roberts,616-69-3226,67,34 -------------------------------------------------------------------------------- /sample_data/sample_s20.csv: -------------------------------------------------------------------------------- 1 | Name,SSN,metric 1,metric 2 2 | Maria Johnson,284-73-5110,5,43 3 | Tyler Parker,284-73-5110,8,17 4 | Maria Johnson,284-73-5110,54,63 5 | Maria Johnson,245-25-8698,53,19 6 | Tyler Parker,475-15-8499,6,67 7 | Maria Johnson,719-12-6560,75,83 8 | Maria Johnson,616-69-3226,91,13 9 | Tzvika Roberts,245-25-8698,94,61 -------------------------------------------------------------------------------- /sample_data/sample_s21.txt: -------------------------------------------------------------------------------- 1 | This is a test document that does contain sensitive data. 2 | 3 | sasdfa90@example.org 4 | 858-232-1123 5 | 801 11th Ave, Sunnyvale, CA 6 | 7 | -------------------------------------------------------------------------------- /sample_data/sample_s22.txt: -------------------------------------------------------------------------------- 1 | This is a test document that does contain sensitive data. 2 | 3 | jane.doe@example.org 4 | 619-442-1124 5 | works at 1600 Amphitheater Way, 94043 6 | 7 | -------------------------------------------------------------------------------- /sample_data/sample_s23.txt: -------------------------------------------------------------------------------- 1 | This is a test document that does contain sensitive data. 2 | 3 | ejadd@example.org 4 | 650-332-1241 5 | 111 8th Ave, New York, NY 10011 6 | Tzvika Roberts 7 | 8 | -------------------------------------------------------------------------------- /sample_data/sample_s24.txt: -------------------------------------------------------------------------------- 1 | This is a test document that does contain sensitive data. 
2 | 3 | test86551@example.org 4 | 212-451-5511 5 | 747 6th St S, Kirkland, WA 98033 6 | Tyler Parker 7 | 8 | -------------------------------------------------------------------------------- /sample_data/sample_s25.txt: -------------------------------------------------------------------------------- 1 | This is a test document that does contain sensitive data. 2 | 3 | jack.smith@example.org 4 | 915-332-1123 5 | 345 Spear St, San Francisco, CA 94105 6 | Ameet Garland 7 | -------------------------------------------------------------------------------- /terraform/README.md: -------------------------------------------------------------------------------- 1 | 2 | ## Introduction 3 | 4 | This folder provides Terraform scripts that automate the creation of the resources and the deployment of the Cloud Function code, as described in [Automating the classification of data uploaded to Cloud Storage](https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage). 5 | 6 | 7 | ## Assumptions 8 | 9 | The instructions assume you are running this from a Cloud Shell instance. 10 | 11 | 12 | ## Contents 13 | 14 | The following folders are available in the terraform folder of the repo: 15 | 16 | 17 | 18 | * **infra** - Terraform script that enables the required APIs and creates the buckets, Pub/Sub topics, and subscription. It also grants IAM roles to the default App Engine account. 19 | * **dlp** - Terraform script that grants the required role to the [DLP service agent](https://cloud.google.com/dlp/docs/iam-permissions#service_account). 20 | * **dlpfunction** - Terraform script that deploys the Cloud Function. 21 | * **src** - empty folder that you will copy the function files to. 22 | 23 | 24 | ## Deployment 25 | 26 | 27 | 28 | 1. Create the target project. If you wish to automate the creation of the project, you can extend the Terraform files here to include creation of the target. 29 | 2. Change to the infra folder. 30 | 3. 
Update the terraform.tfvars file, replacing the following with your own values: 31 | 32 |
| Name | Description | Type |
| --- | --- | --- |
| YOUR_PROJECT_ID | Project to be used for creating resources and deploying cloud function to | string |