├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── images
│   └── diagram.png
└── stackdriverdataflowbigquery.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | We'd love to accept your patches and contributions to this project. There are
4 | just a few small guidelines you need to follow.
5 |
6 | ## Contributor License Agreement
7 |
8 | Contributions to this project must be accompanied by a Contributor License
9 | Agreement (CLA). You (or your employer) retain the copyright to your
10 | contribution; this simply gives us permission to use and redistribute your
11 | contributions as part of the project. Head over to
12 | to see your current agreements on file or
13 | to sign a new one.
14 |
15 | You generally only need to submit a CLA once, so if you've already submitted one
16 | (even if it was for a different project), you probably don't need to do it
17 | again.
18 |
19 | ## Code reviews
20 |
21 | All submissions, including submissions by project members, require review. We
22 | use GitHub pull requests for this purpose. Consult
23 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
24 | information on using pull requests.
25 |
26 | ## Community Guidelines
27 |
28 | This project follows
29 | [Google's Open Source Community Guidelines](https://opensource.google/conduct/).
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Dialogflow Log Parser
2 |
3 | This repository contains an example of how to leverage Cloud Dataflow and BigQuery to view Dialogflow interactions.
4 |
5 | The pipeline steps are as follows:
6 |
7 | 1. Dialogflow interactions are logged to Google Cloud Logging.
8 | 2. A Cloud Logging sink sends the log messages to Cloud Pub/Sub.
9 | 3. Dataflow parses the __textPayload__ and streams the result to BigQuery.
10 | 4. The logged interactions are then available for querying in BigQuery.
11 |
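Steps 2 and 3 can be sketched in plain Python, without Beam. A Cloud Logging sink publishes each log entry to Pub/Sub as JSON, so the pipeline decodes the message bytes and copies a few top-level LogEntry fields into the row. The sketch below is a simplified stand-in for the repository's `iterate_multidimensional` step that only looks at top-level keys; `extract_log_fields` and the sample entry (including its `logName`) are hypothetical.

```python
import json

# Top-level LogEntry fields that the pipeline copies into each BigQuery row.
LOG_ENTRY_FIELDS = ("insertId", "logName", "receiveTimestamp", "textPayload", "timestamp", "trace")


def extract_log_fields(message_data: bytes) -> dict:
    """Decode one Pub/Sub message body and keep selected top-level LogEntry fields."""
    entry = json.loads(message_data.decode("utf-8"))
    # Fields absent from the entry become None, so every row has the same keys.
    return {field: entry.get(field) for field in LOG_ENTRY_FIELDS}


# Hypothetical, abbreviated LogEntry as a Pub/Sub sink might deliver it.
sample = json.dumps({
    "insertId": "abc123",
    "logName": "projects/my-project/logs/example-log",
    "resource": {"type": "global", "labels": {"project_id": "my-project"}},
    "textPayload": 'Dialogflow Response : session_id: "s-1" lang: "en"',
    "timestamp": "2020-06-01T12:00:00Z",
    "receiveTimestamp": "2020-06-01T12:00:01Z",
}).encode("utf-8")

row = extract_log_fields(sample)
print(row["insertId"], row["trace"])  # abc123 None
```

Using `None` for missing fields matches the NULLABLE columns in the BigQuery schema, so an entry without `trace` still produces a valid row.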
12 | 
13 |
14 | __Note:__ Dialogflow interaction logs are sent to Cloud Logging as a text payload (`textPayload`), not as structured JSON. The Dataflow code parses the text payload into a structured record, whose fields are defined in that code, and writes it to BigQuery.
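A minimal, Beam-free sketch of that parsing step, using the same regular expression as `parse_transform_response`: each match is a `key:` token followed by the tokens after it, and each match is split on its first colon. `parse_text_payload` and the sample payload are hypothetical; real Dialogflow log lines are longer and contain nested blocks.

```python
import re

# Same regular expression as parse_transform_response: a "key:" token followed by
# one or more whitespace-separated tokens that do not look like another "key:".
KEY_VALUE = re.compile(r"[\S]+:(?:\s(?!\S+:)\S+)+")


def parse_text_payload(text_payload: str) -> dict:
    """Turn a text payload of `key: value` pairs into a flat {key: value} dict."""
    result = {}
    for match in KEY_VALUE.findall(text_payload):
        # Drop quotes, then split on the first colon only, as the pipeline does.
        key, value = match.replace('"', "").split(":", 1)
        result[key.strip()] = value.strip()
    return result


# Hypothetical, simplified payload.
payload = (
    'Dialogflow Response : session_id: "s-1" lang: "en" '
    'resolved_query: "book a table" code: 200 error_type: "success"\n}'
)
fields = parse_text_payload(payload)

# A value absorbs any trailing tokens that are not keys (here a closing brace),
# which is why the pipeline cleans error_type separately.
error_type = fields["error_type"].replace("\n", "").replace("}", "").strip()
```

Because the regex is greedy across non-key tokens, any block header such as `result {` that follows a value would be absorbed the same way; only `error_type` is cleaned in the current code.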
15 | ## BigQuery Schema
16 | You can change the schema in the Dataflow code to include other key:value pairs extracted from Cloud Logging; add each new key both to the parser's field dictionary and to the BigQuery schema. For reference, the current schema is:
17 |
18 | | Field name | Type |
19 | | ----------- | ----------- |
20 | | session_id | STRING |
21 | | trace | STRING |
22 | | caller_id | STRING |
23 | | email | STRING |
24 | | timestamp | TIMESTAMP |
25 | | receiveTimestamp | TIMESTAMP |
26 | | resolved_query | STRING |
27 | | string_value | STRING |
28 | | speech | STRING |
29 | | is_fallback_intent | STRING |
30 | | webhook_for_slot_filling_used | STRING |
31 | | webhook_used | STRING |
32 | | intent_name | STRING |
33 | | intent_id | STRING |
34 | | score | STRING |
35 | | action | STRING |
36 | | source | STRING |
37 | | error_type | STRING |
38 | | code | STRING |
39 | | insertId | STRING |
40 | | logName | STRING |
41 | | lang | STRING |
42 | | textPayload | STRING |
43 |
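Adding a column currently means editing two places in the Dataflow code: the dictionary of keys the parser keeps and the BigQuery schema passed to `WriteToBigQuery`. One way to keep them in sync is to derive both from a single list. The sketch below assumes that approach; `FIELDS`, `build_schema`, `empty_row` and `my_new_field` are hypothetical names, and the field list is abbreviated.

```python
# Hypothetical single source of truth for the BigQuery columns (abbreviated).
FIELDS = [
    ("session_id", "STRING"),
    ("timestamp", "TIMESTAMP"),
    ("intent_name", "STRING"),
    ("score", "STRING"),
]


def build_schema(fields):
    """Return a schema dict in the same shape the pipeline passes to WriteToBigQuery."""
    return {"fields": [{"mode": "NULLABLE", "name": name, "type": type_}
                       for name, type_ in fields]}


def empty_row(fields):
    """Return a row template with every column set to None, like the parser's dicts."""
    return {name: None for name, _ in fields}


# Adding a (hypothetical) column updates both the schema and the row template.
extended = FIELDS + [("my_new_field", "STRING")]
schema = build_schema(extended)
row = empty_row(extended)
```

Every row key then has a matching schema column, which matters for streaming inserts into BigQuery.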
44 | ## Installation
45 |
46 | 1. Enable the Dataflow API
47 | ```sh
48 | gcloud services enable dataflow.googleapis.com
49 | ```
50 |
51 | 2. Create a Storage Bucket for Dataflow Staging
52 |
53 | ```sh
54 | gsutil mb gs://[BUCKET_NAME]/
55 | ```
56 |
57 | 3. In the Google Cloud Console Storage browser, create a folder named __tmp__ in the new bucket
58 |
59 | 4. Create a Pub/Sub Topic
60 | ```sh
61 | gcloud pubsub topics create [TOPIC_NAME]
62 | ```
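63 | 5. Create a Cloud Logging sink. If the command output asks you to, grant the sink's writer identity the Pub/Sub Publisher role on the topic so the sink can publish log entries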
64 | ```sh
65 | gcloud logging sinks create [SINK_NAME] pubsub.googleapis.com/projects/[PROJECT_ID]/topics/[TOPIC_NAME] --log-filter="resource.type=global"
66 | ```
67 |
68 | 6. Install the Apache Beam GCP Library
69 | ```sh
70 | python3 -m virtualenv tempenv
71 | source tempenv/bin/activate
72 | pip install 'apache-beam[gcp]'
73 | ```
74 |
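75 | 7. Create a BigQuery dataset (for example, `bq mk --dataset [DATASET_NAME]`); the pipeline creates the table if it does not exist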
76 |
77 | 8. Deploy Dataflow Job
78 | ```sh
79 | python3 stackdriverdataflowbigquery.py --project=[YOUR_PROJECT_ID] \
80 | --input_topic=projects/[YOUR_PROJECT_ID]/topics/[YOUR_TOPIC_NAME] \
81 | --runner=DataflowRunner --temp_location=gs://[YOUR_DATAFLOW_STAGING_BUCKET]/tmp \
82 | --output_bigquery=[YOUR_BIGQUERY_DATASET.YOUR_BIGQUERY_TABLE] --region=us-central1
83 | ```
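The deploy command mixes the script's own flags with Dataflow flags such as `--runner` and `--region`. `run()` separates them with `argparse`'s `parse_known_args` and passes the leftovers to `PipelineOptions`. The stdlib-only sketch below mirrors that split without importing Beam; `split_args` and the sample values are hypothetical.

```python
import argparse


def split_args(argv):
    """Separate the script's own flags from the flags forwarded to Beam."""
    parser = argparse.ArgumentParser()
    # Exactly one Pub/Sub source must be given, as in the Dataflow script.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--input_topic")
    group.add_argument("--input_subscription")
    parser.add_argument("--output_bigquery", required=True)
    # Unknown flags (e.g. --runner, --region) are returned instead of rejected.
    return parser.parse_known_args(argv)


known, pipeline_args = split_args([
    "--project=my-project",
    "--input_topic=projects/my-project/topics/my-topic",
    "--runner=DataflowRunner",
    "--output_bigquery=my_dataset.my_table",
    "--region=us-central1",
])
```

Passing both `--input_topic` and `--input_subscription` makes `argparse` exit with a usage error, so a misconfigured deploy fails before any Dataflow job is created.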
84 |
85 | 9. Enable Dialogflow logging to Cloud Logging
86 |
87 | Enable the option to log interactions to Dialogflow and Google Cloud, as described in
88 | https://cloud.google.com/dialogflow/docs/history#access_all_logs
89 |
90 | Once interaction logging is enabled, new Dialogflow interactions will be available in BigQuery.
91 |
92 | **This is not an officially supported Google product**
--------------------------------------------------------------------------------
/images/diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GoogleCloudPlatform/dialogflow-log-parser-dataflow-bigquery/040985ead97d04ee435b8390cb472e28ba34d210/images/diagram.png
--------------------------------------------------------------------------------
/stackdriverdataflowbigquery.py:
--------------------------------------------------------------------------------
1 | # Copyright 2020 Google LLC.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | # [START parsing Stackdriver logs from pubsub_to_bigquery]
16 | import argparse
17 | import logging
18 | import json
19 | import re
20 | import ast
21 |
22 | import apache_beam as beam
23 | from apache_beam.options.pipeline_options import PipelineOptions
24 | from apache_beam.options.pipeline_options import SetupOptions
25 | from apache_beam.options.pipeline_options import StandardOptions
26 |
27 | def iterate_multidimensional(my_dict):
28 |     return_json = {
29 |         "insertId": None,
30 |         "logName": None,
31 |         "receiveTimestamp": None,
32 |         "textPayload": None,
33 |         "timestamp": None,
34 |         "trace": None
35 |     }
36 |     for k, v in my_dict.items():
37 |         if isinstance(v, dict):
38 |             nested = iterate_multidimensional(v)  # recurse; top-level values take precedence
39 |             return_json.update({nk: nv for nk, nv in nested.items() if nv is not None and return_json[nk] is None})
40 |         elif k in return_json:
41 |             return_json[k] = v
42 |     return return_json
43 |
44 | def iterate_textpayload_multidimensional(my_dict):
45 |     return_dict = {
46 |         "error_type": None,
47 |         "session_id": None,
48 |         "caller_id": None,
49 |         "email": None,
50 |         "code": None,
51 |         "string_value": None,
52 |         "lang": None,
53 |         "speech": None,
54 |         "is_fallback_intent": None,
55 |         "webhook_for_slot_filling_used": None,
56 |         "webhook_used": None,
57 |         "intent_name": None,
58 |         "intent_id": None,
59 |         "score": None,
60 |         "action": None,
61 |         "resolved_query": None,
62 |         "source": None
63 |     }
64 |
65 |     for k, v in my_dict.items():
66 |         if isinstance(v, dict):
67 |             nested = iterate_textpayload_multidimensional(v)  # recurse; top-level values take precedence
68 |             return_dict.update({nk: nv for nk, nv in nested.items() if nv is not None and return_dict[nk] is None})
69 |         elif k in return_dict:
70 |             return_dict[k] = v
71 |     return return_dict
72 |
73 | def iterate_textpayload(my_list):
74 |     res = []
75 |     for item in my_list:
76 |         my_list_item = item.replace('"', '')
77 |         if ':' in my_list_item:
78 |             res.append(map(str.strip, my_list_item.split(":", 1)))
79 |     return dict(res)
80 |
81 | # Parse one Pub/Sub message (a Cloud Logging entry as JSON) into a row for the BigQuery load
82 | def parse_transform_response(data):
83 |     logging.info('--- START parse_transform_response Function ---')
84 |     pub_sub_data = json.loads(data)
85 |     fullpayload_dict = iterate_multidimensional(pub_sub_data)
86 |     # Clean textPayload from Cloud Logging (Stackdriver); it is not a valid JSON object
87 |     text_payload = fullpayload_dict['textPayload']
88 |     return_merged_payload = None
89 |
90 |     if text_payload is not None:
91 |         regex = re.compile(r'''[\S]+:(?:\s(?!\S+:)\S+)+''', re.VERBOSE)
92 |         matches = regex.findall(text_payload)
93 |         iterate_textpayload_response = iterate_textpayload(matches)
94 |         textpayload_dict = iterate_textpayload_multidimensional(iterate_textpayload_response)
95 |         if textpayload_dict["error_type"] is not None:
96 |             textpayload_dict["error_type"] = textpayload_dict["error_type"].replace("\n", "").replace("}", "").strip()
97 |         return_merged_payload = {**fullpayload_dict, **textpayload_dict}
98 |     if return_merged_payload is not None:
99 |         logging.info('--- END parse_transform_response Function ---')
100 |         logging.info(return_merged_payload)
101 |         return return_merged_payload
102 |     else:
103 |         logging.info('--- END parse_transform_response Function ---')
104 |         logging.info(fullpayload_dict)
105 |         return fullpayload_dict
106 |
107 | def run(argv=None, save_main_session=True):
108 |     """Build and run the pipeline."""
109 |     parser = argparse.ArgumentParser()
110 |     group = parser.add_mutually_exclusive_group(required=True)
111 |     group.add_argument(
112 |         '--input_topic',
113 |         help=('Input PubSub topic of the form '
114 |               '"projects//topics/".'))
115 |     group.add_argument(
116 |         '--input_subscription',
117 |         help=('Input PubSub subscription of the form '
118 |               '"projects//subscriptions/".'))
119 |     parser.add_argument('--output_bigquery', required=True,
120 |                         help='Output BQ table to write results to '
121 |                              '"PROJECT_ID:DATASET.TABLE"')
122 |     known_args, pipeline_args = parser.parse_known_args(argv)
123 |
124 |     pipeline_options = PipelineOptions(pipeline_args)
125 |     pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
126 |     pipeline_options.view_as(StandardOptions).streaming = True
127 |     p = beam.Pipeline(options=pipeline_options)
128 |
129 |     # Read from PubSub into a PCollection.
130 |     if known_args.input_subscription:
131 |         messages = (p
132 |                     | beam.io.ReadFromPubSub(
133 |                         subscription=known_args.input_subscription)
134 |                     .with_output_types(bytes))
135 |     else:
136 |         messages = (p
137 |                     | beam.io.ReadFromPubSub(topic=known_args.input_topic)
138 |                     .with_output_types(bytes))
139 |
140 |     decode_messages = messages | 'DecodePubSubMessages' >> beam.Map(lambda x: x.decode('utf-8'))
141 |
142 |     # Parse each Cloud Logging entry and build a row for the BigQuery load
143 |     output = decode_messages | 'ParseTransformResponse' >> beam.Map(parse_transform_response)
144 |
145 |     # Write to BigQuery
146 |     bigquery_table_schema = {
147 |         "fields": [
148 |             {
149 |                 "mode": "NULLABLE",
150 |                 "name": "session_id",
151 |                 "type": "STRING"
152 |             },
153 |             {
154 |                 "mode": "NULLABLE",
155 |                 "name": "trace",
156 |                 "type": "STRING"
157 |             },
158 |             {
159 |                 "mode": "NULLABLE",
160 |                 "name": "caller_id",
161 |                 "type": "STRING"
162 |             },
163 |             {
164 |                 "mode": "NULLABLE",
165 |                 "name": "email",
166 |                 "type": "STRING"
167 |             },
168 |             {
169 |                 "mode": "NULLABLE",
170 |                 "name": "timestamp",
171 |                 "type": "TIMESTAMP"
172 |             },
173 |             {
174 |                 "mode": "NULLABLE",
175 |                 "name": "receiveTimestamp",
176 |                 "type": "TIMESTAMP"
177 |             },
178 |             {
179 |                 "mode": "NULLABLE",
180 |                 "name": "resolved_query",
181 |                 "type": "STRING"
182 |             },
183 |             {
184 |                 "mode": "NULLABLE",
185 |                 "name": "string_value",
186 |                 "type": "STRING"
187 |             },
188 |             {
189 |                 "mode": "NULLABLE",
190 |                 "name": "speech",
191 |                 "type": "STRING"
192 |             },
193 |             {
194 |                 "mode": "NULLABLE",
195 |                 "name": "is_fallback_intent",
196 |                 "type": "STRING"
197 |             },
198 |             {
199 |                 "mode": "NULLABLE",
200 |                 "name": "webhook_for_slot_filling_used",
201 |                 "type": "STRING"
202 |             },
203 |             {
204 |                 "mode": "NULLABLE",
205 |                 "name": "webhook_used",
206 |                 "type": "STRING"
207 |             },
208 |             {
209 |                 "mode": "NULLABLE",
210 |                 "name": "intent_name",
211 |                 "type": "STRING"
212 |             },
213 |             {
214 |                 "mode": "NULLABLE",
215 |                 "name": "intent_id",
216 |                 "type": "STRING"
217 |             },
218 |             {
219 |                 "mode": "NULLABLE",
220 |                 "name": "score",
221 |                 "type": "STRING"
222 |             },
223 |             {
224 |                 "mode": "NULLABLE",
225 |                 "name": "action",
226 |                 "type": "STRING"
227 |             },
228 |             {
229 |                 "mode": "NULLABLE",
230 |                 "name": "source",
231 |                 "type": "STRING"
232 |             },
233 |             {
234 |                 "mode": "NULLABLE",
235 |                 "name": "error_type",
236 |                 "type": "STRING"
237 |             },
238 |             {
239 |                 "mode": "NULLABLE",
240 |                 "name": "code",
241 |                 "type": "STRING"
242 |             },
243 |             {
244 |                 "mode": "NULLABLE",
245 |                 "name": "insertId",
246 |                 "type": "STRING"
247 |             },
248 |             {
249 |                 "mode": "NULLABLE",
250 |                 "name": "logName",
251 |                 "type": "STRING"
252 |             },
253 |             {
254 |                 "mode": "NULLABLE",
255 |                 "name": "lang",
256 |                 "type": "STRING"
257 |             },
258 |             {
259 |                 "mode": "NULLABLE",
260 |                 "name": "textPayload",
261 |                 "type": "STRING"
262 |             }
263 |
264 |         ]
265 |     }
266 |     output | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(
267 |         known_args.output_bigquery,
268 |         schema=bigquery_table_schema,
269 |         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
270 |         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
271 |
272 |     p.run()
273 |
274 | if __name__ == '__main__':
275 |     logging.getLogger().setLevel(logging.DEBUG)
276 |     run()
277 | # [END parsing Stackdriver logs from pubsub_to_bigquery]
--------------------------------------------------------------------------------