├── .github
    └── PULL_REQUEST_TEMPLATE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── tweets-simulated.py
└── tweets-streaming.py


/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
*Issue #, if available:*

*Description of changes:*


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.

Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.


## Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check [existing open](https://github.com/awslabs/aws-blog-dynamodb-analysis/issues), or [recently closed](https://github.com/awslabs/aws-blog-dynamodb-analysis/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:

* A reproducible test case or series of steps
* The version of our code being used
* Any modifications you've made relevant to the bug
* Anything unusual about your environment or deployment


## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

1. You are working against the latest source on the *master* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.

To send us a pull request, please:

1. Fork the repository.
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
3. Ensure local tests pass.
4. Commit to your fork using clear commit messages.
5. Send us a pull request, answering any default questions in the pull request interface.
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).


## Finding contributions to work on
Looking at the existing issues is a great way to find something to work on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/awslabs/aws-blog-dynamodb-analysis/labels/help%20wanted) issues is a great place to start.


## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.


## Licensing

See the [LICENSE](https://github.com/awslabs/aws-blog-dynamodb-analysis/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# aws-blog-dynamodb-analysis
This package can pull random public tweets from Twitter (tweets-streaming.py) or generate simulated tweets (tweets-simulated.py). The results are stored in an Amazon DynamoDB table.

tweets-streaming.py
-------------------

This script pulls random tweets from the Twitter API and stores them in Amazon DynamoDB. There are two modules needed to execute the script:

- boto3: https://aws.amazon.com/sdk-for-python/
- twitter: https://pypi.python.org/pypi/twitter/

A Twitter account is needed to access the Twitter API. Go to https://www.twitter.com/ and sign up for a free account, if you don't already have one. Once your account is up, go to https://apps.twitter.com/ and on the main landing page, click the grey "Create New App" button.
After you give it a name, go to the "Keys and Access Tokens" tab to get your credentials for the Twitter API. You will need to generate a Consumer Key/Secret and an Access Token/Secret. All four values are used to authenticate your requests.

In the script, update the following lines with the real security credentials:

    # Twitter security credentials
    ACCESS_TOKEN    = "...01234..."
    ACCESS_SECRET   = "...i7RkW..."
    CONSUMER_KEY    = "...be4Ma..."
    CONSUMER_SECRET = "...btcar..."

This section can be customized according to your preference. Use your own table name and TTL value (in days) as desired:

    # Global variables.
    dynamodb_table     = "TwitterAnalysis"
    expires_after_days = 30



tweets-simulated.py
-------------------

This script generates simulated tweets and stores them in Amazon DynamoDB. There are three modules needed to execute the script:

- boto3: https://aws.amazon.com/sdk-for-python/
- names: https://pypi.python.org/pypi/names/
- loremipsum: https://pypi.python.org/pypi/loremipsum/

This script does not call the Twitter API, so it needs no Twitter credentials.

This section can be customized according to your preference. Use your own table name and write rate as desired; the script writes about `provisioned_wcu` items per second:

    # Global variables
    dynamodb_table  = "TwitterAnalysis"
    provisioned_wcu = 1

License Summary
---------------

This sample code is made available under the MIT-0 license. See the LICENSE file.

--------------------------------------------------------------------------------
/tweets-simulated.py:
--------------------------------------------------------------------------------
#!/usr/bin/python

# Copyright 2017-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# SPDX-License-Identifier: MIT-0

# Import modules
from loremipsum import get_sentences
import boto3
import names
import random
import string
import signal
import math
import time
import sys

# Global variables
dynamodb_table  = "TwitterAnalysis"
provisioned_wcu = 1

# Initiate DynamoDB client
client = boto3.client('dynamodb')

# Signal handler, Ctrl+c to quit
def signal_handler(signum, frame):
    print("\n")
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Actions
insert_to_ddb   = True
print_to_screen = True

# Start the loop to generate simulated tweets
while True:
    # Generate fake tweet
    user_id    = names.get_first_name()
    tweet_id   = str(random.randint(pow(10,16),pow(10,17)-1))
    created_at = time.strftime("%a %b %d %H:%M:%S +0000 %Y", time.gmtime())
    language   = random.choice(['de', 'en', 'es', 'fr', 'id', 'nl', 'pt', 'sk'])
    text       = str(get_sentences(1)[0])

    # Store tweet in DynamoDB
    if insert_to_ddb:
        res = client.put_item(
            TableName=dynamodb_table,
            Item={
                'user_id'   : { 'S' : user_id },
                'tweet_id'  : { 'N' : tweet_id },
                'created_at': { 'S' : created_at },
                'language'  : { 'S' : language },
                'text'      : { 'S' : text }
            })

    # Print output to screen
    if print_to_screen:
        print("insert_to_ddb: %s" % insert_to_ddb)
        print("user_id      : %s" % user_id)
        print("tweet_id     : %s" % tweet_id)
        print("created_at   : %s" % created_at)
        print("language     : %s" % language)
        print("text         : %s" % (text[:77] + '...' if len(text) > 80 else text))
        print("\n===========================================")

    # Loop control: pace writes to roughly provisioned_wcu items per second
    time.sleep(1.0/provisioned_wcu)

--------------------------------------------------------------------------------
/tweets-streaming.py:
--------------------------------------------------------------------------------
#!/usr/bin/python

# Copyright 2017-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# SPDX-License-Identifier: MIT-0

# Import modules
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream
import boto3
import signal
import time
import sys

# Twitter security credentials
ACCESS_TOKEN    = "...01234..."
ACCESS_SECRET   = "...i7RkW..."
CONSUMER_KEY    = "...be4Ma..."
CONSUMER_SECRET = "...btcar..."

# Global variables.
dynamodb_table     = "TwitterAnalysis"
expires_after_days = 30

# Authenticate and initialize stream
oauth  = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)
stream = TwitterStream(auth=oauth)
tweets = stream.statuses.sample()

# Initiate DynamoDB client
client = boto3.client('dynamodb')

# Signal handler, Ctrl+c to quit
def signal_handler(signum, frame):
    print("\n")
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Routing. Also for easy block commenting.
insert_to_ddb   = True
print_to_screen = True

# Start the loop to get the tweets.
for tweet in tweets:
    try:
        # Get tweet data
        user_id      = tweet["user"]["screen_name"]
        tweet_id     = tweet["id_str"]
        created_at   = tweet["created_at"]
        timestamp_ms = tweet["timestamp_ms"]
        language     = tweet["lang"]
        text         = tweet["text"]
        hts          = tweet["entities"]["hashtags"]

        # Expire items in the future. DynamoDB TTL expects Unix epoch seconds,
        # so convert the millisecond timestamp and add the retention in seconds.
        ttl_value = str((int(timestamp_ms) // 1000) + (expires_after_days * 86400))

        # Process hashtags; 'None' is a placeholder for tweets without hashtags
        hashtags = ['None']
        if len(hts) != 0:
            hashtags.pop()
            for ht in hts:
                hashtags.append(str(ht["text"]))

        # Store tweet in DynamoDB
        if insert_to_ddb:
            res = client.put_item(
                TableName=dynamodb_table,
                Item={
                    'user_id'   : { 'S' : user_id },
                    'tweet_id'  : { 'N' : tweet_id },
                    'created_at': { 'S' : created_at },
                    'ttl_value' : { 'N' : ttl_value },
                    'language'  : { 'S' : language },
                    'text'      : { 'S' : text },
                    'hashtags'  : { 'SS': hashtags }
                })

        # Print output to screen
        if print_to_screen:
            print("insert_to_ddb: %s" % insert_to_ddb)
            print("user_id      : %s" % user_id)
            print("tweet_id     : %s" % tweet_id)
            print("created_at   : %s" % created_at)
            print("timestamp_ms : %s" % timestamp_ms)
            print("language     : %s" % language)
            print("text         : %s" % (text[:77] + '...' if len(text) > 80 else text))
            print("hashtags     : %s" % hashtags)
            print("\n===========================================")

    except Exception:
        # Skip stream messages that lack the expected fields or fail to store
        pass

--------------------------------------------------------------------------------
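
The TTL and hashtag handling in tweets-streaming.py can be pulled into small functions and checked without a Twitter stream or a DynamoDB table. The following is a minimal sketch under that assumption and is not part of the repository: the names `ttl_epoch_seconds`, `hashtag_set`, `build_item`, and `sample_tweet` are hypothetical, and de-duplicating hashtags is an addition that the original script does not perform. It assumes the `ttl_value` attribute should hold Unix epoch seconds, which is the unit DynamoDB's TTL feature reads.

```python
# Hypothetical helpers (not in the repository) that restate the TTL and
# hashtag logic of tweets-streaming.py as small, testable functions.

SECONDS_PER_DAY = 86400


def ttl_epoch_seconds(timestamp_ms, expires_after_days):
    """Return the expiry time as a string of Unix epoch seconds.

    timestamp_ms is the tweet's "timestamp_ms" field (milliseconds, given
    as a string or an int).
    """
    created_s = int(timestamp_ms) // 1000
    return str(created_s + expires_after_days * SECONDS_PER_DAY)


def hashtag_set(hashtag_entities):
    """Return the unique hashtag texts in order, or ['None'] if there are none.

    The placeholder keeps the string-set attribute non-empty; repeated
    hashtags are dropped so the set holds no duplicate values.
    """
    unique = []
    for entity in hashtag_entities:
        text = entity["text"]
        if text not in unique:
            unique.append(text)
    return unique or ["None"]


def build_item(tweet, expires_after_days=30):
    """Build the Item dictionary in the shape the script passes to put_item."""
    return {
        "user_id": {"S": tweet["user"]["screen_name"]},
        "tweet_id": {"N": tweet["id_str"]},
        "created_at": {"S": tweet["created_at"]},
        "ttl_value": {
            "N": ttl_epoch_seconds(tweet["timestamp_ms"], expires_after_days)
        },
        "language": {"S": tweet["lang"]},
        "text": {"S": tweet["text"]},
        "hashtags": {"SS": hashtag_set(tweet["entities"]["hashtags"])},
    }


# Made-up tweet used only to exercise the helpers.
sample_tweet = {
    "user": {"screen_name": "example_user"},
    "id_str": "12345678901234567",
    "created_at": "Fri Jul 14 02:40:00 +0000 2017",
    "timestamp_ms": "1500000000000",
    "lang": "en",
    "text": "Hello #aws #aws #dynamodb",
    "entities": {
        "hashtags": [{"text": "aws"}, {"text": "aws"}, {"text": "dynamodb"}]
    },
}

item = build_item(sample_tweet, expires_after_days=30)
```

With helpers like these, the loop body could reduce to `client.put_item(TableName=dynamodb_table, Item=build_item(tweet, expires_after_days))`, while the surrounding `try`/`except` would still skip stream messages that lack the expected fields.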