├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Dockerfile ├── Dockerrun.aws.json ├── LICENSE ├── README.md ├── assets ├── apache2.0lic.txt ├── test_file_01.txt └── usage_examples.sh ├── cmd ├── createTable │ └── main.go ├── uploads3 │ └── main.go └── worker │ ├── config.go │ ├── job_message_parse.go │ ├── job_message_queue.go │ ├── main.go │ ├── result_collector.go │ ├── result_notifier.go │ ├── result_recorder.go │ └── worker.go └── shared_types.go /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | /bin 2 | /worker 3 | /uploads3 4 | /createTable 5 | /cmd/uploads3/uploads3 6 | # Elastic Beanstalk Files 7 | .elasticbeanstalk/* 8 | !.elasticbeanstalk/*.cfg.yml 9 | !.elasticbeanstalk/*.global.yml 10 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. 
Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check [existing open](https://github.com/${GITHUB_ORG}/${GITHUB_REPO}/issues), or [recently closed](https://github.com/${GITHUB_ORG}/${GITHUB_REPO}/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. 
Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/${GITHUB_ORG}/${GITHUB_REPO}/labels/help%20wanted) issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](https://github.com/${GITHUB_ORG}/${GITHUB_REPO}/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # Example Dockerfile if the service was going to be run in a docker 2 | # container instead of a preconfigured Elastic Beanstalk Go platform. 3 | FROM ubuntu:12.04 4 | FROM golang:1.5.1 5 | 6 | ADD . /go/src/github.com/awslabs/aws-go-wordfreq-sample 7 | 8 | RUN go get github.com/awslabs/aws-go-wordfreq-sample/cmd/worker/... 9 | RUN go install github.com/awslabs/aws-go-wordfreq-sample/cmd/worker 10 | 11 | EXPOSE 80 12 | 13 | ENTRYPOINT /go/bin/worker 14 | 15 | CMD ["/go/bin/worker"] -------------------------------------------------------------------------------- /Dockerrun.aws.json: -------------------------------------------------------------------------------- 1 | { 2 | "AWSEBDockerrunVersion": "1" 3 | } 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # aws-go-wordfreq-sample 2 | Word Frequency is a sample service built with the AWS SDK for Go. The service highlights how the SDK can be used within a concurrent application. The service takes advantage of Amazon Simple Storage Service, Amazon Simple Queue Service, Amazon DynamoDB, and AWS Elastic Beanstalk to collect and report the top 10 most common words in a text file, all in the AWS Cloud. 3 | 4 | This sample highlights how the SDK can be used to build an application that reads job messages from an SQS queue when an existing or new file is uploaded to S3. An S3 bucket is configured to notify an SQS queue with information about the uploaded file. This job message will be read by one of potentially many instances of the Word Frequency service application. The service will then JSON decode the message, extracting the bucket and key of the uploaded object.
Once parsed and added to a job channel, a worker goroutine within a pool of workers will read the job from the channel, stream the object's content from S3, and count the words. When complete, the worker will send the results to the results channel so that they can be recorded to DynamoDB and also sent to an SQS result queue for further processing. 5 | 6 | This package is made up of a set of executable commands. 7 | 8 | ### uploads3 9 | CLI application to upload a file from your local system to S3, taking advantage of the S3 Upload Manager's concurrent multipart uploads. 10 | 11 | Command line usage: 12 | ```shell 13 | ./uploads3 my-bucket my-filename 14 | ``` 15 | 16 | An additional environment variable can be set instructing the uploads3 command to wait for the file to be processed, and print out the results to the console when they are available. 17 | 18 | * WORKER_RESULT_QUEUE_URL - The SQS queue URL where the job results will be written to. 19 | 20 | ### worker 21 | Service application which will read job messages from an SQS queue, count the top 10 words, record the results to DynamoDB, and also send the results to an additional SQS queue for further processing. 22 | 23 | Requires the following environment variables to be set. 24 | 25 | * WORKER_QUEUE_URL - The SQS queue URL where the service will read job messages from. Job messages are created when S3 notifies the SQS queue that a file has been uploaded to a particular bucket. 26 | * WORKER_RESULT_QUEUE_URL - The SQS queue URL where the job results will be sent to. 27 | * WORKER_RESULT_TABLENAME - The name of the DynamoDB table result items should be recorded to. 28 | 29 | Optionally, the following environment variables can be provided. 30 | 31 | * AWS_REGION - The AWS region the worker will use for signing and making all requests to. This parameter is only optional if the service is running within an EC2 instance. If not running in an EC2 instance, AWS_REGION is required.
32 | * WORKER_MESSAGE_VISIBILITY - The amount of time messages will be hidden in the SQS job message queue from other services when a service reads that message. Will also be used to extend the visibility timeout for long running jobs. Defaults to 60s. 33 | * WORKER_COUNT - The number of workers in the worker pool. Defaults to the number of virtual CPUs in the system. 34 | 35 | 36 | ### createTable 37 | CLI application to show how the SDK can be used to create a DynamoDB table, which the worker will use to record job results to. 38 | 39 | Command line usage: 40 | ```shell 41 | ./createTable my-tablename 42 | ``` 43 | 44 | 45 | -------------------------------------------------------------------------------- /assets/apache2.0lic.txt: -------------------------------------------------------------------------------- 1 | Apache License 2 | 3 | Version 2.0, January 2004 4 | 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 16 | 17 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 
18 | 19 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 20 | 21 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 22 | 23 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 24 | 25 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 26 | 27 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. 
For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 28 | 29 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 30 | 31 | 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 32 | 33 | 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 34 | 35 | 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: 36 | 37 | You must give any other recipients of the Work or Derivative Works a copy of this License; and 38 | You must cause any modified files to carry prominent notices stating that You changed the files; and 39 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 40 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. 
You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 41 | 42 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 43 | 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 44 | 45 | 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 46 | 47 | 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 48 | 49 | 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 50 | 51 | 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 52 | 53 | END OF TERMS AND CONDITIONS -------------------------------------------------------------------------------- /assets/test_file_01.txt: -------------------------------------------------------------------------------- 1 | Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer egestas finibus urna ut suscipit. Vivamus ullamcorper porta hendrerit. Nam fermentum tristique pharetra. Sed at lorem vel nunc molestie finibus. Mauris lacinia ac mi eget bibendum. Nulla euismod vehicula dui laoreet tincidunt. Phasellus lobortis vitae lacus venenatis maximus. Nulla consequat hendrerit ultricies. Cras id aliquet justo, at lacinia augue. 
Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Donec in turpis sapien. In sodales lorem id arcu vestibulum, non aliquam odio elementum. Phasellus facilisis nisi imperdiet nulla volutpat, a sagittis magna bibendum. Nunc et pharetra risus. Fusce viverra posuere purus, eu egestas velit posuere rutrum. Curabitur mollis, nulla quis vehicula tempus, tortor lacus finibus dolor, id mollis nisl eros non est. 2 | 3 | In ut vehicula ligula. Donec et tellus feugiat, egestas justo quis, sollicitudin neque. Vivamus eget leo laoreet, cursus dolor in, fermentum massa. Nulla facilisi. Pellentesque diam lorem, eleifend ut nisi nec, placerat auctor urna. Curabitur at diam ex. Nunc gravida urna sit amet orci efficitur, at porta ante tincidunt. Mauris quis nulla felis. Pellentesque at pharetra mi, ut mattis lacus. Fusce quis nisl nec nisi aliquam accumsan. Phasellus in ornare eros, ac finibus ante. Sed quis laoreet leo. In feugiat eu sapien in auctor. In dui metus, pulvinar et mauris quis, rutrum porta felis. 4 | 5 | Ut lacinia faucibus efficitur. Nunc nec volutpat nibh, ac tincidunt metus. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Proin nec sem lacinia, finibus tortor sagittis, volutpat quam. Duis viverra sit amet nulla ac lacinia. Quisque sollicitudin risus consectetur, malesuada orci sed, dictum nisi. Pellentesque mattis a enim nec elementum. Aenean euismod purus eu purus pretium, non hendrerit magna sagittis. Aliquam efficitur ac felis commodo rhoncus. Etiam varius lorem id lorem pulvinar fringilla vulputate id sem. Aenean libero odio, hendrerit ut dictum vitae, vestibulum malesuada dui. Pellentesque scelerisque quam tortor, et luctus dolor sollicitudin et. 6 | 7 | Suspendisse vitae lacus vel massa fermentum tincidunt. Cras faucibus ultricies leo, accumsan ornare lacus sagittis eget. In in lacinia lacus. Fusce sed nisl maximus est efficitur placerat vitae at odio. In ut eleifend mauris. 
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Aenean tristique sapien sit amet diam malesuada, in mattis augue rutrum. Curabitur et mattis nibh.

Suspendisse suscipit, odio et faucibus porta, dolor ex maximus tellus, nec pellentesque enim ex in odio. Suspendisse eu varius ligula. Fusce finibus, ligula eu ultrices ultrices, purus lectus ullamcorper nisl, ut mattis lorem lorem sed nibh. Vivamus faucibus sem ex, tempus tincidunt nunc convallis id. Vestibulum egestas erat feugiat nibh tincidunt ultrices. Nam vulputate tempus risus sit amet eleifend. Maecenas ut imperdiet dolor. Fusce quam nunc, tincidunt quis finibus vitae, gravida id felis. Mauris aliquet dictum quam, eu fermentum quam gravida aliquam. In id iaculis ipsum, consequat lobortis tellus.

--------------------------------------------------------------------------------
/assets/usage_examples.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

# Build and run the uploader
AWS_REGION="us-west-2" \
AWS_PROFILE="go-wordfreq" \
WORKER_RESULT_QUEUE_URL="https://sqs.us-west-2.amazonaws.com/762127142917/ResultsQueue" \
go run ./cmd/uploads3/main.go go-wordfreq ./assets/apache2.0lic.txt

# Build and run the worker in the shell
go build -o bin/application ./cmd/worker
AWS_REGION="us-west-2" \
AWS_PROFILE="default" \
WORKER_QUEUE_URL="https://sqs.us-west-2.amazonaws.com/762127142917/UploadedQueue" \
WORKER_RESULT_QUEUE_URL="https://sqs.us-west-2.amazonaws.com/762127142917/ResultsQueue" \
WORKER_RESULT_TABLENAME="wordfreq_results" \
./bin/application

# Build and run the worker in Docker
docker build -t go-wordfreq .
docker run -it \
  -e "AWS_REGION=us-west-2" \
  -e "WORKER_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/762127142917/UploadedQueue" \
  -e "WORKER_RESULT_QUEUE_URL=https://sqs.us-west-2.amazonaws.com/762127142917/ResultsQueue" \
  -e "WORKER_RESULT_TABLENAME=wordfreq_results" \
  --name go-wordfreq-dev \
  --rm go-wordfreq

# Build and archive the worker app for the Elastic Beanstalk Go environment.
env GOOS=linux GOARCH=amd64 go build -o bin/application ./cmd/worker
zip -r -X ../go-wordfreq-bin.zip . -i bin/application
--------------------------------------------------------------------------------
/cmd/createTable/main.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// Creates a table for the Word Frequency worker to write results to. Takes a
// single parameter for the table name to create.
//
// Usage:
//   createTable <table name>
func main() {
	if len(os.Args) != 2 {
		fmt.Printf("usage: %s <table name>\n", filepath.Base(os.Args[0]))
		os.Exit(1)
	}
	tableName := os.Args[1]

	// Create a new instance of the DynamoDB service client. To simplify config
	// and allow the app to work in multiple regions, environment variables
	// provide the AWS_REGION and the credentials.
	svc := dynamodb.New(session.New())

	// Use the CreateTable API operation to create a table on DynamoDB in the
	// AWS_REGION's region. '_' is used for the result variable since it is
	// not used.
	if _, err := svc.CreateTable(&dynamodb.CreateTableInput{
		TableName: aws.String(tableName),
		AttributeDefinitions: []*dynamodb.AttributeDefinition{
			{
				AttributeName: aws.String("Filename"),
				AttributeType: aws.String(dynamodb.ScalarAttributeTypeS),
			},
		},
		KeySchema: []*dynamodb.KeySchemaElement{
			{
				AttributeName: aws.String("Filename"),
				KeyType:       aws.String("HASH"),
			},
		},
		ProvisionedThroughput: &dynamodb.ProvisionedThroughput{
			ReadCapacityUnits:  aws.Int64(1),
			WriteCapacityUnits: aws.Int64(1),
		},
	}); err != nil {
		fmt.Println("failed to create Amazon DynamoDB table,", err)
		os.Exit(1)
	}

	fmt.Println("successfully created", tableName)
}
--------------------------------------------------------------------------------
/cmd/uploads3/main.go:
--------------------------------------------------------------------------------
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
	"github.com/aws/aws-sdk-go/service/sqs"
	"github.com/aws/aws-sdk-go/service/sqs/sqsiface"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// Uploads a file to S3 so it can be processed by the Word Frequency service.
// If a "WORKER_RESULT_QUEUE_URL" environment variable is provided, the upload
// client will wait for the job to be processed, and print the results to the console.
//
// Usage:
//   uploads3 <bucket> <filename>
func main() {
	if len(os.Args) != 3 {
		fmt.Printf("usage: %s <bucket> <filename>\n", filepath.Base(os.Args[0]))
		os.Exit(1)
	}
	bucket := os.Args[1]
	filename := os.Args[2]

	file, err := os.Open(filename)
	if err != nil {
		fmt.Println("Failed to open file", filename, err)
		os.Exit(1)
	}
	defer file.Close()

	// Create a session which contains the default configurations for the SDK.
	// Use the session to create the service clients to make API calls to AWS.
	sess := session.New()

	// Create S3 Uploader manager to concurrently upload the file
	svc := s3manager.NewUploader(sess)

	fmt.Println("Uploading file to S3...")
	result, err := svc.Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(filepath.Base(filename)),
		Body:   file,
	})
	if err != nil {
		fmt.Println("error", err)
		os.Exit(1)
	}

	fmt.Printf("Successfully uploaded %s to %s\n", filename, result.Location)

	if queueURL := os.Getenv("WORKER_RESULT_QUEUE_URL"); queueURL != "" {
		fmt.Println("Waiting for results...")
		waitForResult(sqs.New(sess), bucket, filepath.Base(filename), queueURL)
	}
}

// waitForResult waits for the job to be processed and the job result to be added
// to the job result SQS queue. This will poll the SQS queue for job results until
// a job result matches the file it uploaded. When a match is found the job result
// will also be deleted from the queue, and its status written to the console.
// If the job result doesn't match the file uploaded by this client, the message
// will be ignored, so another client can receive it.
func waitForResult(svc sqsiface.SQSAPI, bucket, filename, resultQueueURL string) {
	for {
		resp, err := svc.ReceiveMessage(&sqs.ReceiveMessageInput{
			QueueUrl:          aws.String(resultQueueURL),
			VisibilityTimeout: aws.Int64(0),
			WaitTimeSeconds:   aws.Int64(20),
		})
		if err != nil {
			log.Println("Failed to receive message", err)
			time.Sleep(30 * time.Second)
			continue
		}

		for _, msg := range resp.Messages {
			result := &wordfreq.JobResult{}
			if err := json.Unmarshal([]byte(aws.StringValue(msg.Body)), result); err != nil {
				log.Println("Failed to unmarshal message", err)
				continue
			}

			if result.Job.Bucket != bucket || result.Job.Key != filename {
				continue
			}

			printResult(result)
			// Report a failed delete instead of silently dropping the error.
			if _, err := svc.DeleteMessage(&sqs.DeleteMessageInput{
				QueueUrl:      aws.String(resultQueueURL),
				ReceiptHandle: msg.ReceiptHandle,
			}); err != nil {
				log.Println("Failed to delete result message", err)
			}
			return
		}
	}
}

// printResult prints the job results to the console.
func printResult(result *wordfreq.JobResult) {
	fmt.Printf("Job Results completed in %s for %s/%s\n",
		printDuration(result.Duration), result.Job.Bucket, result.Job.Key)
	if result.Status == wordfreq.JobCompleteFailure {
		fmt.Println("Failed:", result.StatusMessage)
		return
	}

	fmt.Println("Top Words:")
	for _, w := range result.Words {
		format := "- %s\t%d\n"
		if len(w.Word) <= 5 {
			format = "- %s\t\t%d\n"
		}
		fmt.Printf(format, w.Word, w.Count)
	}
}

// printDuration formats the duration, trimming less significant units based on
// the overall duration provided. Durations over a minute are truncated to
// seconds, over a second to milliseconds, and over a millisecond to microseconds.
func printDuration(dur time.Duration) string {
	nano := dur.Nanoseconds()
	if dur > time.Minute {
		nano = (nano / 1e9) * 1e9
	} else if dur > time.Second {
		nano = (nano / 1e6) * 1e6
	} else if dur > time.Millisecond {
		nano = (nano / 1e3) * 1e3
	}
	return time.Duration(nano).String()
}
--------------------------------------------------------------------------------
/cmd/worker/config.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

const defaultMessageVisibilityTimeout = 60

var defaultWorkerCount = runtime.NumCPU()

// A Config provides a collection of configuration values the service will use
// to setup its components.
type Config struct {
	Session *session.Session

	// SQS queue URL job messages will be available at
	WorkerQueueURL string
	// SQS queue URL job results will be written to
	ResultQueueURL string
	// DynamoDB tablename results will be recorded to
	ResultTableName string
	// Number of workers in the worker pool
	NumWorkers int
	// The amount of time in seconds a read job message from the SQS will be
	// hidden from other readers of the queue.
	MessageVisibilityTimeout int64
}

// getConfig collects the configuration from the environment variables, and
// returns it, or error if it was unable to collect the configuration.
func getConfig() (Config, error) {
	c := Config{
		WorkerQueueURL:  os.Getenv("WORKER_QUEUE_URL"),
		ResultQueueURL:  os.Getenv("WORKER_RESULT_QUEUE_URL"),
		ResultTableName: os.Getenv("WORKER_RESULT_TABLENAME"),
		Session:         session.New(),
	}

	if c.WorkerQueueURL == "" {
		return c, fmt.Errorf("missing WORKER_QUEUE_URL")
	}
	if c.ResultQueueURL == "" {
		return c, fmt.Errorf("missing WORKER_RESULT_QUEUE_URL")
	}
	if c.ResultTableName == "" {
		return c, fmt.Errorf("missing WORKER_RESULT_TABLENAME")
	}

	if aws.StringValue(c.Session.Config.Region) == "" {
		region, err := ec2metadata.New(c.Session).Region()
		if err != nil {
			return c, fmt.Errorf("region not specified, unable to retrieve from EC2 instance %v", err)
		}
		c.Session.Config.Region = aws.String(region)
	}

	if timeoutStr := os.Getenv("WORKER_MESSAGE_VISIBILITY"); timeoutStr != "" {
		timeout, err := strconv.ParseInt(timeoutStr, 10, 64)
		if err != nil {
			return c, err
		}
		if timeout <= 0 {
			return c, fmt.Errorf("invalid message visibility timeout")
		}
		c.MessageVisibilityTimeout = timeout
	} else {
		c.MessageVisibilityTimeout = defaultMessageVisibilityTimeout
	}

	atOnceStr := os.Getenv("WORKER_COUNT")
	if atOnceStr == "" {
		c.NumWorkers = defaultWorkerCount
	} else {
		atOnce, err := strconv.ParseInt(atOnceStr, 10, 64)
		if err != nil {
			return c, err
		}
		if atOnce <= 0 {
			return c, fmt.Errorf("invalid worker number")
		}
		c.NumWorkers = int(atOnce)
	}

	return c, nil
}
--------------------------------------------------------------------------------
/cmd/worker/job_message_parse.go:
--------------------------------------------------------------------------------
package main

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// parseJobMessage unmarshals the JSON job message, and constructs a worker job
// from it. Since S3 messages can include multiple records, each individual job
// is added to the job channel so a worker from the worker pool can read it,
// and process the job.
func parseJobMessage(jobCh chan<- *wordfreq.Job, msg wordfreq.JobMessage, timeout int64) error {
	fmt.Println("Processing message", msg.ID)

	s3msg := s3EventMsg{}
	if err := json.Unmarshal([]byte(msg.Body), &s3msg); err != nil {
		return fmt.Errorf("parse Amazon S3 Event message %v", err)
	}

	if len(s3msg.Records) == 0 {
		return fmt.Errorf("job does not have any records")
	}

	for _, record := range s3msg.Records {
		jobCh <- &wordfreq.Job{
			StartedAt:         time.Now(),
			VisibilityTimeout: timeout,
			OrigMessage:       msg,
			Region:            record.Region,
			Bucket:            record.S3.Bucket.Name,
			Key:               record.S3.Object.Key,
		}
	}

	return nil
}

// An s3EventMsg represents the SQS message provided by S3 Notifications. This
// is an abbreviated form of the message since not all fields are used by this
// service.
type s3EventMsg struct {
	Event   string
	Records []struct {
		Region    string `json:"awsRegion"`
		EventName string
		S3        struct {
			Bucket struct {
				Name string
			}
			Object struct {
				Key string
			}
		}
	}
}
--------------------------------------------------------------------------------
/cmd/worker/job_message_queue.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/sqs"
	"github.com/aws/aws-sdk-go/service/sqs/sqsiface"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// A JobMessageQueue listens to an SQS queue for job messages, and provides
// those job messages as a job channel to workers so the jobs can be
// processed.
type JobMessageQueue struct {
	queueURL        string
	queueVisibility int64
	queueWait       int64

	jobCh  chan *wordfreq.Job
	msgSvc sqsiface.SQSAPI
}

// NewJobMessageQueue creates a new instance of the JobMessageQueue configuring it
// for the SQS service client it will use. The sqsiface.SQSAPI is used so that
// the code can be unit tested in isolation without also testing the SDK.
func NewJobMessageQueue(url string, visibilityTime, waitTime int64, svc sqsiface.SQSAPI) *JobMessageQueue {
	return &JobMessageQueue{
		queueURL:        url,
		queueVisibility: visibilityTime,
		queueWait:       waitTime,
		jobCh:           make(chan *wordfreq.Job, 10),
		msgSvc:          svc,
	}
}

// Listen waits for messages to arrive from the SQS queue, parses the JSON
// message and sends the jobs to the job channel to be processed by the worker pool.
func (m *JobMessageQueue) Listen(doneCh <-chan struct{}) {
	fmt.Println("Job Message queue starting")
	defer close(m.jobCh)
	defer fmt.Println("Job Message queue quitting.")

	for {
		select {
		case <-doneCh:
			return
		default:
			msgs, err := m.receiveMsg()
			if err != nil {
				log.Println("Failed to read from message queue", err)
				time.Sleep(5 * time.Second)
				continue
			}

			// Since SQS ReceiveMessage could return multiple messages at once,
			// we should loop over them instead of assuming only a single
			// message is returned. This is also an easier pattern if we want
			// to bump up the number of messages read from SQS at once; by
			// default only one message is read.
			for _, msg := range msgs {
				parseErr := parseJobMessage(m.jobCh,
					wordfreq.JobMessage{
						ID:            *msg.MessageId,
						ReceiptHandle: *msg.ReceiptHandle,
						Body:          *msg.Body,
					},
					m.queueVisibility,
				)
				if parseErr != nil {
					fmt.Println("Failed to parse", *msg.MessageId, "job message,", parseErr)
					m.DeleteMessage(*msg.ReceiptHandle)
				}
			}
		}
	}
}

// receiveMsg reads a message from the SQS job queue. A visibility timeout is set
// so that no other reader will be able to see the message which this service
// received, preventing duplication of work. The wait time provides long polling
// so the service does not need to micromanage its polling of SQS.
func (m *JobMessageQueue) receiveMsg() ([]*sqs.Message, error) {
	result, err := m.msgSvc.ReceiveMessage(&sqs.ReceiveMessageInput{
		QueueUrl:          aws.String(m.queueURL),
		WaitTimeSeconds:   aws.Int64(m.queueWait),
		VisibilityTimeout: aws.Int64(m.queueVisibility),
	})
	if err != nil {
		return nil, err
	}

	return result.Messages, nil
}

// DeleteMessage deletes a previously received message from the job message queue.
// Once a job is complete it can safely be deleted from the queue so that no
// other service or worker will rerun the job.
func (m *JobMessageQueue) DeleteMessage(receiptHandle string) error {
	_, err := m.msgSvc.DeleteMessage(&sqs.DeleteMessageInput{
		QueueUrl:      aws.String(m.queueURL),
		ReceiptHandle: aws.String(receiptHandle),
	})
	return err
}

// UpdateMessageVisibility extends the amount of time a job message is hidden from
// other readers of the SQS job queue. This allows a worker to keep processing
// a long running job.
func (m *JobMessageQueue) UpdateMessageVisibility(receiptHandle string) (int64, error) {
	_, err := m.msgSvc.ChangeMessageVisibility(&sqs.ChangeMessageVisibilityInput{
		QueueUrl:          aws.String(m.queueURL),
		ReceiptHandle:     aws.String(receiptHandle),
		VisibilityTimeout: aws.Int64(m.queueVisibility),
	})
	return m.queueVisibility, err
}

// GetJobs returns a read only channel to read jobs from. This channel will
// be closed when the JobMessageQueue no longer is listening for further SQS
// job messages.
func (m *JobMessageQueue) GetJobs() <-chan *wordfreq.Job {
	return m.jobCh
}
--------------------------------------------------------------------------------
/cmd/worker/main.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"log"
	"os"
	"os/signal"

	"github.com/aws/aws-sdk-go/service/dynamodb"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/sqs"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// Worker service which reads job messages from an SQS queue, processes
// the jobs, and records the results. The service uses environment variables for
// its configuration.
//
// Requires the following environment variables to be set.
//
// * WORKER_QUEUE_URL - The SQS queue URL where the service will read job messages
// from. Job messages are created when S3 notifies the SQS queue that a file has
// been uploaded to a particular bucket.
//
// * WORKER_RESULT_QUEUE_URL - The SQS queue URL where the job results will be
// sent to.
//
// * WORKER_RESULT_TABLENAME - The name of the DynamoDB table result items should
// be recorded to.
//
// Optionally the following environment variables can be provided.
//
// * AWS_REGION - The AWS region the worker will use for signing and making all
// requests to. This parameter is only optional if the service is running within
// an EC2 instance. If not running in an EC2 instance AWS_REGION is required.
//
// * WORKER_MESSAGE_VISIBILITY - The amount of time messages will be hidden in
// the SQS job message queue from other services when a service reads that message.
// Will also be used to extend the visibility timeout for long running jobs.
// Defaults to 60s.
//
// * WORKER_COUNT - The number of workers in the worker pool.
// Defaults to the number of virtual CPUs in the system.
//
func main() {
	doneCh := listenForSigInterrupt()

	cfg, err := getConfig()
	if err != nil {
		log.Println("Unable to get config", err)
		os.Exit(1)
	}

	sqsSvc := sqs.New(cfg.Session)
	queue := NewJobMessageQueue(cfg.WorkerQueueURL, cfg.MessageVisibilityTimeout, 5, sqsSvc)
	go queue.Listen(doneCh)

	// Job Workers
	resultsCh := make(chan *wordfreq.JobResult, 10)
	workers := NewWorkerPool(cfg.NumWorkers, resultsCh, queue, s3.New(cfg.Session))

	// Notifier to send a message to an Amazon SQS Queue
	notify := NewResultNotifier(sqsSvc, cfg.ResultQueueURL)
	// Recorder to write results to Amazon DynamoDB
	recorder := NewResultRecorder(cfg.ResultTableName, dynamodb.New(cfg.Session))

	// Job Progress Collector
	collector := NewResultCollector(notify, recorder, queue)
	go collector.ProcessJobResult(resultsCh)

	// Wait for the workers to complete before continuing on to exit
	workers.WaitForWorkersDone()
	close(resultsCh)

	// Wait for all results to be completed before continuing
	collector.WaitForResults()
}

// listenForSigInterrupt handles Ctrl+C / SIGINT by closing the returned channel.
func listenForSigInterrupt() <-chan struct{} {
	doneCh := make(chan struct{})
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt)
	go func() {
		closed := false
		for sig := range sigCh {
			if !closed {
				fmt.Printf("Received %s, exiting...\n", sig)
				closed = true
				close(doneCh)
			}
		}
	}()

	return doneCh
}
--------------------------------------------------------------------------------
/cmd/worker/result_collector.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"log"
	"sync"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// A
// ResultCollector provides processing of results from the job result channel
// until it is closed and drained.
type ResultCollector struct {
	recorder *ResultRecorder
	notify   *ResultNotifier
	queue    *JobMessageQueue

	wg sync.WaitGroup
}

// NewResultCollector creates a new instance of the ResultCollector.
func NewResultCollector(notify *ResultNotifier, recorder *ResultRecorder, queue *JobMessageQueue) *ResultCollector {
	return &ResultCollector{
		notify:   notify,
		recorder: recorder,
		queue:    queue,
	}
}

// ProcessJobResult waits for job results to be received from the results channel,
// until the result channel is closed, and drained. Successful results will be
// recorded to DynamoDB, and the original job message deleted from the SQS job
// message queue. Regardless of whether the job was successful, the status will be
// reported to an SQS result queue for further processing.
func (r *ResultCollector) ProcessJobResult(resultCh <-chan *wordfreq.JobResult) {
	r.wg.Add(1)
	fmt.Println("Job Result Collector starting.")
	defer fmt.Println("Job Result Collector quitting.")
	defer r.wg.Done()

	for {
		result, ok := <-resultCh
		if !ok {
			return
		}
		message := result.Job.OrigMessage
		fmt.Println("Received job result", message.ID)

		if result.Status == wordfreq.JobCompleteSuccess {
			fmt.Println("Successfully processed job", message.ID)

			// Record result to DynamoDB, and delete message if successful.
			// If the write to DynamoDB fails, don't delete the message
			// so the job can be retried by another worker later.
			if err := r.recorder.Record(result); err != nil {
				result.Status = wordfreq.JobCompleteFailure
				result.StatusMessage = fmt.Sprintf("record results failed, %v", err)
				log.Println("failed to record result", message.ID, err)
			} else if err := r.queue.DeleteMessage(message.ReceiptHandle); err != nil {
				log.Println("Failed to delete message,", message.ID, err)
			} else {
				// Only report the delete when it actually succeeded.
				fmt.Println("Deleted message,", message.ID)
			}

		} else {
			log.Println("Failed to process job", message.ID)
		}

		if err := r.notify.Send(result); err != nil {
			log.Println("Failed to send result to SQS queue", err)
		}
	}
}

// WaitForResults waits for the results collector to finish processing job
// results before returning.
func (r *ResultCollector) WaitForResults() {
	r.wg.Wait()
}
--------------------------------------------------------------------------------
/cmd/worker/result_notifier.go:
--------------------------------------------------------------------------------
package main

import (
	"encoding/json"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/sqs"
	"github.com/aws/aws-sdk-go/service/sqs/sqsiface"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// A ResultNotifier pushes result messages to the SQS results queue.
type ResultNotifier struct {
	svc      sqsiface.SQSAPI
	queueURL string
}

// NewResultNotifier creates a new instance of the ResultNotifier type with the
// Amazon SQS service client and the queue URL messages will be sent to.
func NewResultNotifier(svc sqsiface.SQSAPI, queueURL string) *ResultNotifier {
	return &ResultNotifier{
		svc: svc, queueURL: queueURL,
	}
}

// Send sends a message to the Amazon SQS queue with the job's result.
func (r *ResultNotifier) Send(result *wordfreq.JobResult) error {
	msg, err := json.Marshal(result)
	if err != nil {
		return err
	}

	_, err = r.svc.SendMessage(&sqs.SendMessageInput{
		QueueUrl:    aws.String(r.queueURL),
		MessageBody: aws.String(string(msg)),
	})
	return err
}
--------------------------------------------------------------------------------
/cmd/worker/result_recorder.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"path"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/dynamodb"
	"github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
	"github.com/aws/aws-sdk-go/service/dynamodb/dynamodbiface"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// A ResultRecorder provides an abstraction to record job results to DynamoDB.
type ResultRecorder struct {
	tableName string
	svc       dynamodbiface.DynamoDBAPI
}

// NewResultRecorder creates a new instance of the ResultRecorder configured
// with a DynamoDB service client.
func NewResultRecorder(tableName string, svc dynamodbiface.DynamoDBAPI) *ResultRecorder {
	return &ResultRecorder{
		tableName: tableName,
		svc:       svc,
	}
}

// Record marshals the job result into a dynamodb.AttributeValue struct, and writes
// the result item to DynamoDB.
func (r *ResultRecorder) Record(result *wordfreq.JobResult) error {
	// Construct a result item representing what data we want to write to DynamoDB.
	recordItem := resultRecord{
		Filename: path.Join(result.Job.Bucket, result.Job.Key),
		Words:    map[string]int{},
	}
	for _, w := range result.Words {
		recordItem.Words[w.Word] = w.Count
	}

	// Use the ConvertToX helpers to marshal a Go struct to a dynamodb.AttributeValue
	// type. This greatly simplifies the code needed to create the attribute
	// value item.
	av, err := dynamodbattribute.ConvertToMap(recordItem)
	if err != nil {
		return fmt.Errorf("unable to serialize result to dynamodb.AttributeValue, %v", err)
	}
	_, err = r.svc.PutItem(&dynamodb.PutItemInput{
		TableName: aws.String(r.tableName),
		Item:      av,
	})
	if err != nil {
		return fmt.Errorf("unable to record result, %v", err)
	}

	return nil
}

// A resultRecord represents the result item in DynamoDB.
type resultRecord struct {
	Filename string // Table hash key
	Words    map[string]int
}
--------------------------------------------------------------------------------
/cmd/worker/worker.go:
--------------------------------------------------------------------------------
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"sort"
	"strings"
	"sync"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"

	"github.com/awslabs/aws-go-wordfreq-sample"
)

// A WorkerPool provides a collection of workers, and access to their lifecycle.
type WorkerPool struct {
	workers []*Worker
	wg      sync.WaitGroup
}

// NewWorkerPool creates a new instance of the worker pool, and creates all the
// workers in the pool. The workers are spun off in their own goroutines and the
// WorkerPool's wait group is used to know when the workers have all completed
// their work and exited.
func NewWorkerPool(size int, resultCh chan<- *wordfreq.JobResult, queue *JobMessageQueue, s3svc s3iface.S3API) *WorkerPool {
	pool := &WorkerPool{
		workers: make([]*Worker, size),
	}

	for i := 0; i < len(pool.workers); i++ {
		pool.wg.Add(1)
		pool.workers[i] = NewWorker(i, resultCh, queue, s3svc)

		go func(worker *Worker) {
			worker.run()
			pool.wg.Done()
		}(pool.workers[i])
	}

	return pool
}

// WaitForWorkersDone waits for all the workers to have completed their work
// and exited.
func (w *WorkerPool) WaitForWorkersDone() {
	w.wg.Wait()
}

// A Worker is an individual processor of jobs from the job channel.
type Worker struct {
	id       int
	resultCh chan<- *wordfreq.JobResult
	queue    *JobMessageQueue
	s3Svc    s3iface.S3API
}

// NewWorker creates and initializes a new worker.
func NewWorker(id int, resultCh chan<- *wordfreq.JobResult, queue *JobMessageQueue, s3Svc s3iface.S3API) *Worker {
	return &Worker{id: id, resultCh: resultCh, queue: queue, s3Svc: s3Svc}
}

// run reads from the job channel until it is closed and drained.
func (w *Worker) run() {
	fmt.Printf("Worker %d starting\n", w.id)
	defer fmt.Printf("Worker %d quitting.\n", w.id)

	for {
		job, ok := <-w.queue.GetJobs()
		if !ok {
			return
		}
		fmt.Printf("Worker %d received job %s\n", w.id, job.OrigMessage.ID)
		result := &wordfreq.JobResult{
			Job: job,
		}

		// Stream the file from S3, counting the words and return the words
		// and error if one occurred. If an error occurred the words will be
		// ignored, and a failed result status is set. Otherwise the success
		// status is set along with the words.
		words, err := w.processJob(job)
		if err != nil {
			result.Status = wordfreq.JobCompleteFailure
			result.StatusMessage = err.Error()
			log.Println("Failed to process job", job.OrigMessage.ID, err)
		} else {
			result.Status = wordfreq.JobCompleteSuccess
			result.Words = words
		}
		// The duration is collected so that the results can report the
		// amount of time a job took to process.
		result.Duration = time.Since(job.StartedAt)
		w.resultCh <- result
	}
}

// processJob gets an io.Reader to the uploaded file from S3 and starts counting
// the words, returning the words counted or an error.
func (w *Worker) processJob(job *wordfreq.Job) (wordfreq.Words, error) {
	result, err := w.s3Svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String(job.Bucket),
		Key:    aws.String(job.Key),
	})
	if err != nil {
		return nil, err
	}
	defer result.Body.Close()

	return w.countTopWords(result.Body, 10, job)
}

// countTopWords counts the top words, returning those words or an error.
func (w *Worker) countTopWords(reader io.Reader, top int, job *wordfreq.Job) ([]wordfreq.Word, error) {
	wordMap, err := w.countWords(reader, job)
	if err != nil {
		return nil, err
	}

	words := collectTopWords(wordMap, top)

	return words, nil
}

// countWords collects the counts of all words received from an io.Reader. Using
// a word scanner unique words are counted. This is a fairly simplistic implementation
// of word counting and only splits words based on whitespace.
// Extra characters such as `.,"'?!` are trimmed from the front and end of
// each word.
func (w *Worker) countWords(reader io.Reader, job *wordfreq.Job) (map[string]int, error) {
	wordMap := map[string]int{}

	scanner := bufio.NewScanner(reader)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		word := strings.ToLower(scanner.Text())
		// Skip words of four bytes or fewer. Note that this check runs
		// before the surrounding punctuation is trimmed.
		if len(word) <= 4 {
			continue
		}
		word = strings.Trim(word, `.,"'?!`)

		wordMap[word]++

		// To make sure another worker doesn't grab long running processes
		// bump up the job message's visibility timeout in the Queue.
		if time.Since(job.StartedAt) > time.Duration(job.VisibilityTimeout/2)*time.Second {
			timeAdded, err := w.queue.UpdateMessageVisibility(job.OrigMessage.ReceiptHandle)
			if err != nil {
				return nil, fmt.Errorf("failed to update job message's visibility timeout, %v", err)
			}
			job.VisibilityTimeout += timeAdded
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, fmt.Errorf("failed to count words, %v", err)
	}

	return wordMap, nil
}

// collectTopWords converts the word map into a slice, sorts it, and returns
// the top words.
func collectTopWords(wordMap map[string]int, top int) wordfreq.Words {
	words := wordfreq.Words{}
	for word, count := range wordMap {
		words = append(words, wordfreq.Word{Word: word, Count: count})
	}
	sort.Sort(words)

	if top >= len(words) {
		return words
	}
	return words[:top]
}

--------------------------------------------------------------------------------
/shared_types.go:
--------------------------------------------------------------------------------
package wordfreq

import "time"

// A Job identifies the S3 object to process and carries the queue message
// it originated from.
type Job struct {
	StartedAt           time.Time
	VisibilityTimeout   int64      `json:"-"`
	OrigMessage         JobMessage `json:"-"`
	Region, Bucket, Key string
}

// A JobMessage holds the ID, receipt handle, and body of a queue message.
type JobMessage struct {
	ID            string
	ReceiptHandle string
	Body          string
}

// A JobResult is the outcome of processing a Job.
type JobResult struct {
	Job           *Job
	Words         Words
	Duration      time.Duration
	Status        JobCompleteStatus
	StatusMessage string
}

// JobCompleteStatus reports whether a job succeeded or failed.
type JobCompleteStatus string

const (
	JobCompleteSuccess JobCompleteStatus = "success"
	JobCompleteFailure JobCompleteStatus = "failure"
)

// A Word pairs a word with the number of times it was counted.
type Word struct {
	Word  string
	Count int
}

// Words implements sort.Interface, ordering by descending Count.
type Words []Word

func (w Words) Len() int {
	return len(w)
}
func (w Words) Less(i, j int) bool {
	return w[i].Count > w[j].Count
}
func (w Words) Swap(i, j int) {
	w[i], w[j] = w[j], w[i]
}

// OrigMessage has the same fields as JobMessage.
type OrigMessage struct {
	ID            string
	ReceiptHandle string
	Body          string
}

--------------------------------------------------------------------------------