├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── dev-tools
│   └── release.py
├── pom.xml
└── src
    ├── main
    │   ├── assemblies
    │   │   └── plugin.xml
    │   ├── java
    │   │   └── org
    │   │       └── elasticsearch
    │   │           ├── plugin
    │   │           │   └── river
    │   │           │       └── twitter
    │   │           │           └── TwitterRiverPlugin.java
    │   │           └── river
    │   │               └── twitter
    │   │                   ├── TwitterRiver.java
    │   │                   └── TwitterRiverModule.java
    │   └── resources
    │       └── es-plugin.properties
    └── test
        └── java
            └── org
                └── elasticsearch
                    └── river
                        └── twitter
                            └── test
                                ├── AbstractTwitterTest.java
                                ├── Twitter4JThreadFilter.java
                                ├── TwitterIntegrationTest.java
                                └── helper
                                    ├── HttpClient.java
                                    └── HttpClientResponse.java

/.gitignore:
--------------------------------------------------------------------------------
 1 | /data
 2 | /work
 3 | /logs
 4 | /.idea
 5 | /target
 6 | .DS_Store
 7 | *.iml
 8 | /.project
 9 | /.settings
10 | /.classpath
11 | /plugin_tools
12 | /.local-execution-hints.log
13 | /.local-*-execution-hints.log
14 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | Contributing to elasticsearch
 2 | =============================
 3 | 
 4 | Elasticsearch is an open source project and we love to receive contributions from our community — you! There are many ways to contribute, from writing tutorials or blog posts, improving the documentation, and submitting bug reports and feature requests, to writing code which can be incorporated into Elasticsearch itself.
 5 | 
 6 | Bug reports
 7 | -----------
 8 | 
 9 | If you think you have found a bug in Elasticsearch, first make sure that you are testing against the [latest version of Elasticsearch](http://www.elasticsearch.org/download/) - your issue may already have been fixed. If not, search our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub in case a similar issue has already been opened.
10 | 
11 | It is very helpful if you can prepare a reproduction of the bug. In other words, provide a small test case which we can run to confirm your bug. It makes it easier to find the problem and to fix it. Test cases should be provided as `curl` commands which we can copy and paste into a terminal and run locally, for example:
12 | 
13 | ```sh
14 | # delete the index
15 | curl -XDELETE localhost:9200/test
16 | 
17 | # insert a document
18 | curl -XPUT localhost:9200/test/test/1 -d '{
19 |   "title": "test document"
20 | }'
21 | 
22 | # this should return XXXX but instead returns YYY
23 | curl ....
24 | ```
25 | 
26 | Provide as much information as you can. You may think that the problem lies with your query, when actually it depends on how your data is indexed. The easier it is for us to recreate your problem, the faster it is likely to be fixed.
27 | 
28 | Feature requests
29 | ----------------
30 | 
31 | If you find yourself wishing for a feature that doesn't exist in Elasticsearch, you are probably not alone. There are bound to be others out there with similar needs. Many of the features that Elasticsearch has today have been added because our users saw the need.
32 | Open an issue on our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub which describes the feature you would like to see, why you need it, and how it should work.
33 | 
34 | Contributing code and documentation changes
35 | -------------------------------------------
36 | 
37 | If you have a bugfix or new feature that you would like to contribute to Elasticsearch, please find or open an issue about it first.
Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change.
38 | 
39 | We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code.
40 | 
41 | The process for contributing to any of the [Elasticsearch repositories](https://github.com/elasticsearch/) is similar. Details for individual projects can be found below.
42 | 
43 | ### Fork and clone the repository
44 | 
45 | You will need to fork the main Elasticsearch code or documentation repository and clone it to your local machine. See the
46 | [github help page](https://help.github.com/articles/fork-a-repo) for help.
47 | 
48 | Further instructions for specific projects are given below.
49 | 
50 | ### Submitting your changes
51 | 
52 | Once your changes and tests are ready to submit for review:
53 | 
54 | 1. Test your changes
55 |    Run the test suite to make sure that nothing is broken.
56 | 
57 | 2. Sign the Contributor License Agreement
58 |    Please make sure you have signed our [Contributor License Agreement](http://www.elasticsearch.org/contributor-agreement/). We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once.
59 | 
60 | 3. Rebase your changes
61 |    Update your local repository with the most recent code from the main Elasticsearch repository, and rebase your branch on top of the latest master branch. We prefer your changes to be squashed into a single commit (see the command sketch after the formatting guidelines below).
62 | 
63 | 4. Submit a pull request
64 |    Push your local changes to your forked copy of the repository and [submit a pull request](https://help.github.com/articles/using-pull-requests). In the pull request, describe what your changes do and mention the number of the issue where discussion has taken place, e.g. "Closes #123".
65 | 
66 | Then sit back and wait. There will probably be discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into Elasticsearch.
67 | 
68 | 
69 | Contributing to the Elasticsearch plugin
70 | ----------------------------------------
71 | 
72 | **Repository:** [https://github.com/elasticsearch/elasticsearch-river-twitter](https://github.com/elasticsearch/elasticsearch-river-twitter)
73 | 
74 | Make sure you have [Maven](http://maven.apache.org) installed, as Elasticsearch uses it as its build system. Integration with IntelliJ and Eclipse should work out of the box. Eclipse users can automatically configure their IDE by running `mvn eclipse:eclipse` and then importing the project into their workspace: `File > Import > Existing project into workspace`.
75 | 
76 | Please follow these formatting guidelines:
77 | 
78 | * Java indent is 4 spaces
79 | * Line width is 140 characters
80 | * The rest is left to Java coding standards
81 | * Disable “auto-format on save”; it generates unnecessary formatting changes, which makes reviews much harder. If your IDE supports formatting only modified chunks, that is fine.
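For reference, here is a minimal sketch of the fork, rebase and submit workflow described above. Repository paths and branch names are placeholders; adjust them to your own fork:

```sh
# clone your fork and track the upstream repository
git clone git@github.com:YOUR_USERNAME/elasticsearch-river-twitter.git
cd elasticsearch-river-twitter
git remote add upstream https://github.com/elasticsearch/elasticsearch-river-twitter.git

# do your work on a feature branch
git checkout -b my_fix

# before submitting: rebase onto the latest master and squash to a single commit
git fetch upstream
git rebase -i upstream/master

# push the branch to your fork, then open the pull request on GitHub
git push --force-with-lease origin my_fix
```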
82 | 83 | To create a distribution from the source, simply run: 84 | 85 | ```sh 86 | cd elasticsearch-river-twitter/ 87 | mvn clean package -DskipTests 88 | ``` 89 | 90 | You will find the newly built packages under: `./target/releases/`. 91 | 92 | Before submitting your changes, run the test suite to make sure that nothing is broken, with: 93 | 94 | ```sh 95 | mvn clean test 96 | ``` 97 | 98 | Source: [Contributing to elasticsearch](http://www.elasticsearch.org/contributing-to-elasticsearch/) 99 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. 
For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | **Important**: This project has been stopped since elasticsearch 2.0. 2 | 3 | ---- 4 | 5 | Twitter River Plugin for Elasticsearch 6 | ================================== 7 | 8 | The Twitter river indexes the public [twitter stream](http://dev.twitter.com/pages/streaming_api), aka the hose, 9 | and makes it searchable. 
10 | 
11 | **Rivers are [deprecated](https://www.elastic.co/blog/deprecating_rivers) and will be removed in the future.**
12 | Have a look at the [logstash twitter input](http://www.elastic.co/guide/en/logstash/current/plugins-inputs-twitter.html) instead.
13 | 
14 | In order to install the plugin, run:
15 | 
16 | ```sh
17 | bin/plugin install elasticsearch/elasticsearch-river-twitter/2.6.0
18 | ```
19 | 
20 | After installing the plugin you need to restart elasticsearch.
21 | 
22 | You need to install a version matching your Elasticsearch version:
23 | 
24 | | Elasticsearch | Twitter River Plugin | Docs                                                                                                                                   |
25 | |---------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------|
26 | | master        | Build from source    | See below                                                                                                                              |
27 | | es-1.x        | Build from source    | [2.7.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/es-1.x/#version-270-snapshot-for-elasticsearch-1x)  |
28 | | es-1.6        | 2.6.0                | [2.6.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.6.0/#version-260-for-elasticsearch-16)                          |
29 | | es-1.5        | 2.5.0                | [2.5.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.5.0/#version-250-for-elasticsearch-15)                          |
30 | | es-1.4        | 2.4.2                | [2.4.2](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.4.2/#version-242-for-elasticsearch-14)                    |
31 | | es-1.3        | 2.3.0                | [2.3.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.3.0/#version-230-for-elasticsearch-13)                    |
32 | | es-1.2        | 2.2.0                | [2.2.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.2.0/#twitter-river-plugin-for-elasticsearch)              |
33 | | es-1.0        | 2.0.0                | [2.0.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.0.0/#twitter-river-plugin-for-elasticsearch)              |
34 | | es-0.90       | 1.5.0                | [1.5.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v1.5.0/#twitter-river-plugin-for-elasticsearch)              |
35 | 
36 | To build a `SNAPSHOT` version, you need to build it with Maven:
37 | 
38 | ```bash
39 | mvn clean install
40 | plugin --install river-twitter \
41 |        --url file:target/releases/elasticsearch-river-twitter-X.X.X-SNAPSHOT.zip
42 | ```
43 | 
44 | Prerequisites
45 | -------------
46 | 
47 | You need to get an OAuth token in order to use the Twitter river.
48 | Please follow the [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom), basically:
49 | 
50 | * Login to: https://dev.twitter.com/apps/
51 | * Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new
     You don't need a callback URL.
52 | * When done, click on `Create my access token`.
53 | * Open the `OAuth tool` tab and note `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.
55 | 56 | 57 | Create river 58 | ------------ 59 | 60 | Creating the twitter river can be done using: 61 | 62 | ``` 63 | PUT _river/my_twitter_river/_meta 64 | { 65 | "type" : "twitter", 66 | "twitter" : { 67 | "oauth" : { 68 | "consumer_key" : "*** YOUR Consumer key HERE ***", 69 | "consumer_secret" : "*** YOUR Consumer secret HERE ***", 70 | "access_token" : "*** YOUR Access token HERE ***", 71 | "access_token_secret" : "*** YOUR Access token secret HERE ***" 72 | } 73 | }, 74 | "index" : { 75 | "index" : "my_twitter_river", 76 | "type" : "status", 77 | "bulk_size" : 100, 78 | "flush_interval" : "5s", 79 | "retry_after" : "10s" 80 | } 81 | } 82 | ``` 83 | 84 | The above lists all the options controlling the creation of a twitter river. 85 | 86 | If you don't define `index.index`, it will use your river name (`my_twitter_river`) as the default index name. 87 | If you don't define `index.type`, default `status` type will be used. 88 | 89 | Note that you can define any or all of your oauth settings in `elasticsearch.yml` file on each node by prefixing 90 | setting with `river.twitter.`: 91 | 92 | ``` 93 | river.twitter.oauth.consumer_key: "*** YOUR Consumer key HERE ***" 94 | river.twitter.oauth.consumer_secret: "*** YOUR Consumer secret HERE ***" 95 | river.twitter.oauth.access_token: "*** YOUR Access token HERE ***" 96 | river.twitter.oauth.access_token_secret: "*** YOUR Access token secret HERE ***" 97 | ``` 98 | 99 | In that case, you can create the river using: 100 | 101 | ``` 102 | PUT _river/my_twitter_river/_meta 103 | { 104 | "type" : "twitter" 105 | } 106 | ``` 107 | 108 | You can also overload any of `elasticsearch.yml` setting. A good practice could be to have `consumer_key` and 109 | `consumer_secret` in `elasticsearch.yml` and provide to the river `access_token` and `access_token_secret` properties. 110 | 111 | By default, the twitter river will read a small random of all public statuses using 112 | [sample API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample). 113 | 114 | But, you can define statuses type you want to read: 115 | 116 | * [sample](https://dev.twitter.com/docs/api/1.1/get/statuses/sample): the default one 117 | * [filter](https://dev.twitter.com/docs/api/1.1/post/statuses/filter): track for text, users and locations. 118 | See [Filtered Stream](#filtered-stream) 119 | * [user](https://dev.twitter.com/docs/streaming-apis/streams/user): listen to tweets in the authenticated user's timeline. 120 | See [User Stream](#user-stream) 121 | * [firehose](https://dev.twitter.com/docs/api/1.1/get/statuses/firehose): all public statuses (restricted access) 122 | 123 | For example: 124 | 125 | ``` 126 | PUT _river/my_twitter_river/_meta 127 | { 128 | "type" : "twitter", 129 | "twitter" : { 130 | "type" : "firehose" 131 | } 132 | } 133 | ``` 134 | 135 | Note that if you define a filter (see [next section](#filtered-stream)), type will be automatically set to `filter`. 136 | 137 | Tweets will be indexed once a `bulk_size` of them have been accumulated (default to `100`) 138 | or every `flush_interval` period (default to `5s`). 139 | 140 | Filtered Stream 141 | =============== 142 | 143 | Filtered stream can also be supported (as per the twitter stream API). Filter stream can be configured to 144 | support `tracks`, `follow`, `locations` and `language`. `user_lists` is a shortcut to follow all members of a public 145 | twitter list identified by the user id and the list slug (last part of uri when open a list in your browser). 
140 | Filtered Stream
141 | ===============
142 | 
143 | A filtered stream is also supported (as per the twitter stream API). The filter stream can be configured to
144 | support `tracks`, `follow`, `locations` and `language`. `user_lists` is a shortcut to follow all members of a public
145 | twitter list, identified by the owner's screen name and the list slug (the last part of the URI when you open a list in your browser).
146 | The configuration is the same as in the twitter API (a single comma-separated string value, or JSON arrays).
147 | Here is an example:
148 | 
149 | ```
150 | PUT _river/my_twitter_river/_meta
151 | {
152 |     "type" : "twitter",
153 |     "twitter" : {
154 |         "filter" : {
155 |             "tracks" : "test,something,please",
156 |             "follow" : "111,222,333",
157 |             "user_lists" : "ownerScreenName1/slug1,ownerScreenName2/slug2",
158 |             "locations" : "-122.75,36.8,-121.75,37.8,-74,40,-73,41",
159 |             "language" : "fr,en"
160 |         }
161 |     }
162 | }
163 | ```
164 | 
165 | Note that locations use the geoJSON order (longitude, latitude).
166 | 
167 | Note that if you want to use language filtering, you also need to define at least one of the `tracks`,
168 | `follow` or `locations` filters.
169 | Supported language identifiers are [BCP 47](http://tools.ietf.org/html/bcp47). You can filter on
170 | any language available in [Twitter Advanced Search](https://twitter.com/search-advanced).
171 | 
172 | Here is an array based configuration example:
173 | 
174 | ```
175 | PUT _river/my_twitter_river/_meta
176 | {
177 |     "type" : "twitter",
178 |     "twitter" : {
179 |         "filter" : {
180 |             "tracks" : ["test", "something"],
181 |             "follow" : [111, 222, 333],
182 |             "locations" : [ [-122.75,36.8], [-121.75,37.8], [-74,40], [-73,41]],
183 |             "language" : [ "fr", "en" ]
184 |         }
185 |     }
186 | }
187 | ```
188 | 
189 | User Stream
190 | ===========
191 | 
192 | A user stream is also supported (as per the twitter stream API). This stream returns tweets from the authenticated user's
193 | timeline. Here is a basic configuration example:
194 | 
195 | ```
196 | PUT _river/my_twitter_river/_meta
197 | {
198 |     "type" : "twitter",
199 |     "twitter" : {
200 |         "type" : "user"
201 |     }
202 | }
203 | ```
204 | 
205 | Indexing RAW Twitter stream
206 | ===========================
207 | 
208 | By default, the elasticsearch twitter river converts tweets to an equivalent representation
209 | in elasticsearch. If you want to index the RAW twitter JSON content without any transformation,
210 | you can set `raw` to `true`:
211 | 
212 | ```
213 | PUT _river/my_twitter_river/_meta
214 | {
215 |     "type" : "twitter",
216 |     "twitter" : {
217 |         "raw" : true
218 |     }
219 | }
220 | ```
221 | 
222 | Note that you should consider creating a mapping for your tweets first. See the Twitter documentation on the
223 | [raw Tweet format](https://dev.twitter.com/docs/platform-objects/tweets):
224 | 
225 | ```
226 | PUT my_twitter_river/status/_mapping
227 | {
228 |     "status" : {
229 |         "properties" : {
230 |             "text" : {"type" : "string", "analyzer" : "standard"}
231 |         }
232 |     }
233 | }
234 | ```
235 | 
236 | Ignoring Retweets
237 | =================
238 | 
239 | If you don't want to index retweets (aka RT), just set `ignore_retweet` to `true` (defaults to `false`):
240 | 
241 | ```
242 | PUT _river/my_twitter_river/_meta
243 | {
244 |     "type" : "twitter",
245 |     "twitter" : {
246 |         "ignore_retweet" : true
247 |     }
248 | }
249 | ```
250 | 
251 | Increase the schedule time to reconnect the river
252 | =================================================
253 | 
254 | It can happen that the river fails, closing the current connection to the Streaming API. A new connection is then scheduled by the river, after 10s by default.
If you want to manage this time, simply use the `retry_after` option, as in:
256 | 
257 | ```
258 | PUT _river/my_twitter_river/_meta
259 | {
260 |     "type" : "twitter",
261 |     "index" : {
262 |         "retry_after" : "30s"
263 |     }
264 | }
265 | ```
266 | 
267 | Geo location points as array
268 | ============================
269 | 
270 | By default, the elasticsearch twitter river indexes the `location` field using the *lat lon as properties* format.
271 | You can set `geo_as_array` to `true` if you prefer having `location` indexed as an array `[lon, lat]`.
272 | 
273 | ```
274 | PUT _river/my_twitter_river/_meta
275 | {
276 |     "type" : "twitter",
277 |     "twitter" : {
278 |         "geo_as_array" : true
279 |     }
280 | }
281 | ```
282 | 
283 | Remove the river
284 | ================
285 | 
286 | If you need to stop the Twitter river, you have to remove it:
287 | 
288 | ```
289 | DELETE _river/my_twitter_river/
290 | ```
291 | 
292 | Using a proxy
293 | =============
294 | 
295 | You can define a proxy if you are using one:
296 | 
297 | ```
298 | PUT _river/my_twitter_river/_meta
299 | {
300 |     "type" : "twitter",
301 |     "twitter" : {
302 |         "proxy" : {
303 |             "host": "host",
304 |             "port": "port",
305 |             "user": "proxy_user_if_any",
306 |             "password": "proxy_password_if_any"
307 |         }
308 |     }
309 | }
310 | ```
311 | 
312 | You can also define proxy settings in the `elasticsearch.yml` file on each node by prefixing each setting with `river.twitter.`:
313 | 
314 | ```yaml
315 | river.twitter.proxy.host: "host"
316 | river.twitter.proxy.port: "port"
317 | river.twitter.proxy.user: "proxy_user_if_any"
318 | river.twitter.proxy.password: "proxy_password_if_any"
319 | ```
320 | 
321 | Sample document
322 | ===============
323 | 
324 | Here is how a document can look when using this river (without the `raw` option):
325 | 
326 | ```js
327 | {
328 |     "text":"This is a text",
329 |     "created_at":"2015-01-26T15:22:35.000Z",
330 |     "source":"Twitter for Windows Phone",
331 |     "truncated":false,
332 |     "language":"en",
333 |     "mention":[
334 | 
335 |     ],
336 |     "retweet_count":0,
337 |     "hashtag":[
338 | 
339 |     ],
340 |     "location":[
341 |         78.418407,
342 |         17.431913
343 |     ],
344 |     "place":{
345 |         "id":"243cc16f6417a167",
346 |         "name":"Hyderabad",
347 |         "type":"city",
348 |         "full_name":"Hyderabad, Andhra Pradesh",
349 |         "street_address":null,
350 |         "country":"India",
351 |         "country_code":"IN",
352 |         "url":"https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json"
353 |     },
354 |     "link":[
355 | 
356 |     ],
357 |     "user":{
358 |         "id":1111111111,
359 |         "name":"User Name",
360 |         "screen_name":"twitter_handle",
361 |         "location":"A full text location description",
362 |         "description":"A description",
363 |         "profile_image_url":"http://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg",
364 |         "profile_image_url_https":"https://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg"
365 |     }
366 | }
367 | ```
368 | 
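The fields of the sample document above can be queried directly. The river applies a default mapping in which, among others, `user.screen_name` is `not_analyzed` (see `TwitterRiver.java`), so exact `term` lookups work. A quick sketch, assuming the default index and type names:

```sh
# full text search on the tweet text
curl -XGET 'localhost:9200/my_twitter_river/status/_search?pretty' -d '{
  "query" : { "match" : { "text" : "elasticsearch" } }
}'

# exact match on the not_analyzed user.screen_name field
curl -XGET 'localhost:9200/my_twitter_river/status/_search?pretty' -d '{
  "query" : { "term" : { "user.screen_name" : "twitter_handle" } }
}'
```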
369 | Tests
370 | =====
371 | 
372 | Integration tests in this plugin require a working Twitter account and are therefore disabled by default.
373 | You need to create your credentials as explained in [Prerequisites](#prerequisites).
374 | 
375 | To enable the tests, prepare a config file `elasticsearch.yml` with the following content:
376 | 
377 | ```
378 | river:
379 |     twitter:
380 |         oauth:
381 |             consumer_key: "your_consumer_key"
382 |             consumer_secret: "your_consumer_secret"
383 |             access_token: "your_access_token"
384 |             access_token_secret: "your_access_token_secret"
385 | ```
386 | 
387 | Replace all occurrences of `your_consumer_key`, `your_consumer_secret`, `your_access_token` and
388 | `your_access_token_secret` with your settings.
389 | 
390 | To run the tests:
391 | 
392 | ```sh
393 | mvn -Dtests.twitter=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
394 | ```
395 | 
396 | Note that if you want to test the User Stream, you need to give your twitter
397 | application write permissions.
398 | 
399 | License
400 | -------
401 | 
402 | This software is licensed under the Apache 2 license, quoted below.
403 | 
404 |     Copyright 2009-2014 Elasticsearch
405 | 
406 |     Licensed under the Apache License, Version 2.0 (the "License"); you may not
407 |     use this file except in compliance with the License. You may obtain a copy of
408 |     the License at
409 | 
410 |         http://www.apache.org/licenses/LICENSE-2.0
411 | 
412 |     Unless required by applicable law or agreed to in writing, software
413 |     distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
414 |     WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
415 |     License for the specific language governing permissions and limitations under
416 |     the License.
417 | 
--------------------------------------------------------------------------------
/dev-tools/release.py:
--------------------------------------------------------------------------------
 1 | # Licensed to Elasticsearch under one or more contributor
 2 | # license agreements. See the NOTICE file distributed with
 3 | # this work for additional information regarding copyright
 4 | # ownership. Elasticsearch licenses this file to you under
 5 | # the Apache License, Version 2.0 (the "License"); you may
 6 | # not use this file except in compliance with the License.
 7 | # You may obtain a copy of the License at
 8 | #
 9 | #    http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing,
12 | # software distributed under the License is distributed on
13 | # an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
14 | # either express or implied. See the License for the specific
15 | # language governing permissions and limitations under the License.
16 | 
17 | import datetime
18 | import os
19 | import shutil
20 | import sys
21 | import time
22 | import urllib.error
23 | import urllib.request
24 | import zipfile
25 | 
26 | from os.path import dirname, abspath
27 | 
28 | """
29 | This tool builds a release from a given elasticsearch plugin branch.
30 | 
31 | It is basically a wrapper on top of launch_release.py which:
32 | 
33 | - tries to get a more recent version of launch_release.py in ...
34 | - download it if needed
35 | - launch it passing all arguments to it, like:
36 | 
37 | $ python3 dev_tools/release.py --branch master --publish --remote origin
38 | 
39 | Important options:
40 | 
41 | # Dry run
42 | $ python3 dev_tools/release.py
43 | 
44 | # Dry run without tests
45 | python3 dev_tools/release.py --skiptests
46 | 
47 | # Release, publish artifacts and announce
48 | $ python3 dev_tools/release.py --publish
49 | 
50 | See full documentation in launch_release.py
51 | """
52 | env = os.environ
53 | 
54 | # Change this if the source repository for your scripts is at a different location
55 | SOURCE_REPO = 'elasticsearch/elasticsearch-plugins-script'
56 | # We download the script again once the local copy is more than 1 day old
57 | SCRIPT_OBSOLETE_DAYS = 1
58 | # Files in master.zip that we ignore
59 | IGNORED_FILES = ['.gitignore', 'README.md']
60 | 
61 | 
62 | ROOT_DIR = abspath(os.path.join(abspath(dirname(__file__)), '../'))
63 | TARGET_TOOLS_DIR = ROOT_DIR + '/plugin_tools'
64 | DEV_TOOLS_DIR = ROOT_DIR + '/dev-tools'
65 | BUILD_RELEASE_FILENAME = 'release.zip'
66 | BUILD_RELEASE_FILE = TARGET_TOOLS_DIR + '/' + BUILD_RELEASE_FILENAME
67 | SOURCE_URL = 'https://github.com/%s/archive/master.zip' % SOURCE_REPO
68 | 
69 | # Download a recent version of the release plugin tool
70 | try:
71 |     os.mkdir(TARGET_TOOLS_DIR)
72 |     print('directory %s created' % TARGET_TOOLS_DIR)
73 | except FileExistsError:
74 |     pass
75 | 
76 | 
77 | try:
78 |     # we check the latest update time; if we ran an update recently,
79 |     # we are not going to check again
80 |     download = True
81 | 
82 |     try:
83 |         last_download_time = datetime.datetime.fromtimestamp(os.path.getmtime(BUILD_RELEASE_FILE))
84 |         if (datetime.datetime.now()-last_download_time).days < SCRIPT_OBSOLETE_DAYS:
85 |             download = False
86 |     except FileNotFoundError:
87 |         pass
88 | 
89 |     if download:
90 |         urllib.request.urlretrieve(SOURCE_URL, BUILD_RELEASE_FILE)
91 |         with zipfile.ZipFile(BUILD_RELEASE_FILE) as myzip:
92 |             for member in myzip.infolist():
93 |                 filename = os.path.basename(member.filename)
94 |                 # skip directories
95 |                 if not filename:
96 |                     continue
97 |                 if filename in IGNORED_FILES:
98 |                     continue
99 | 
100 |                 # copy file (taken from zipfile's extract)
101 |                 source = myzip.open(member.filename)
102 |                 target = open(os.path.join(TARGET_TOOLS_DIR, filename), "wb")
103 |                 with source, target:
104 |                     shutil.copyfileobj(source, target)
105 |                 # We keep the original date
106 |                 date_time = time.mktime(member.date_time + (0, 0, -1))
107 |                 os.utime(os.path.join(TARGET_TOOLS_DIR, filename), (date_time, date_time))
108 |         print('plugin-tools updated from %s' % SOURCE_URL)
109 | except urllib.error.HTTPError:
110 |     pass
111 | 
112 | 
113 | # Let's see if we need to update the release.py script itself
114 | source_time = os.path.getmtime(TARGET_TOOLS_DIR + '/release.py')
115 | repo_time = os.path.getmtime(DEV_TOOLS_DIR + '/release.py')
116 | if source_time > repo_time:
117 |     input('release.py needs an update.
Press a key to update it...')
118 |     shutil.copyfile(TARGET_TOOLS_DIR + '/release.py', DEV_TOOLS_DIR + '/release.py')
119 | 
120 | # We can launch the build process
121 | PYTHON = 'python'
122 | # make sure python3 is used if python3 is available
123 | # some systems use python 2 as default
124 | # os.system returns the command's exit status, so 0 means python3 exists
125 | if os.system('python3 --version > /dev/null 2>&1') == 0:
126 |     PYTHON = 'python3'
127 | 
128 | release_args = ''
129 | for x in range(1, len(sys.argv)):
130 |     release_args += ' ' + sys.argv[x]
131 | 
132 | os.system('%s %s/build_release.py %s' % (PYTHON, TARGET_TOOLS_DIR, release_args))
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 5 |     4.0.0
 6 | 
 7 |     org.elasticsearch
 8 |     elasticsearch-river-twitter
 9 |     3.0.0-SNAPSHOT
10 |     jar
11 |     Elasticsearch Twitter River plugin
12 |     The Twitter river indexes the public twitter stream, aka the hose, and makes it searchable
13 |     https://github.com/elastic/elasticsearch-river-twitter/
14 |     2009
15 | 
16 | 
17 |         The Apache Software License, Version 2.0
18 |         http://www.apache.org/licenses/LICENSE-2.0.txt
19 |         repo
20 | 
21 | 
22 | 
23 |     scm:git:git@github.com:elastic/elasticsearch-river-twitter.git
24 |     scm:git:git@github.com:elastic/elasticsearch-river-twitter.git
25 |     http://github.com/elastic/elasticsearch-river-twitter
26 | 
27 | 
28 | 
29 |     org.elasticsearch
30 |     elasticsearch-plugin
31 |     2.0.0-SNAPSHOT
32 | 
33 | 
34 | 
35 |     4.0.3
36 | 
37 |     warn
38 |     1
39 | 
40 | 
41 | 
42 | 
43 | 
44 |     org.twitter4j
45 |     twitter4j-stream
46 |     ${twitter4j.version}
47 | 
48 | 
49 | 
50 | 
51 | 
52 | 
53 |     org.apache.maven.plugins
54 |     maven-assembly-plugin
55 | 
56 | 
57 | 
58 | 
59 | 
60 | 
61 |     oss-snapshots
62 |     Sonatype OSS Snapshots
63 |     https://oss.sonatype.org/content/repositories/snapshots/
64 | 
65 | 
66 | 
--------------------------------------------------------------------------------
/src/main/assemblies/plugin.xml:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 3 |     plugin
 4 | 
 5 |         zip
 6 | 
 7 |     false
 8 | 
 9 | 
10 |         /
11 |         true
12 |         true
13 | 
14 |             org.elasticsearch:elasticsearch
15 | 
16 | 
17 | 
18 |         /
19 |         true
20 |         true
21 | 
22 |             org.twitter4j:twitter4j-stream
23 | 
24 | 
25 | 
26 | 
--------------------------------------------------------------------------------
/src/main/java/org/elasticsearch/plugin/river/twitter/TwitterRiverPlugin.java:
--------------------------------------------------------------------------------
 1 | /*
 2 |  * Licensed to Elasticsearch under one or more contributor
 3 |  * license agreements. See the NOTICE file distributed with
 4 |  * this work for additional information regarding copyright
 5 |  * ownership. Elasticsearch licenses this file to you under
 6 |  * the Apache License, Version 2.0 (the "License"); you may
 7 |  * not use this file except in compliance with the License.
 8 |  * You may obtain a copy of the License at
 9 |  *
10 |  *    http://www.apache.org/licenses/LICENSE-2.0
11 |  *
12 |  * Unless required by applicable law or agreed to in writing,
13 |  * software distributed under the License is distributed on an
14 |  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 |  * KIND, either express or implied. See the License for the
16 |  * specific language governing permissions and limitations
17 |  * under the License.
18 | */ 19 | 20 | package org.elasticsearch.plugin.river.twitter; 21 | 22 | import org.elasticsearch.common.inject.Inject; 23 | import org.elasticsearch.plugins.AbstractPlugin; 24 | import org.elasticsearch.river.RiversModule; 25 | import org.elasticsearch.river.twitter.TwitterRiverModule; 26 | 27 | /** 28 | * 29 | */ 30 | public class TwitterRiverPlugin extends AbstractPlugin { 31 | 32 | @Inject 33 | public TwitterRiverPlugin() { 34 | } 35 | 36 | @Override 37 | public String name() { 38 | return "river-twitter"; 39 | } 40 | 41 | @Override 42 | public String description() { 43 | return "River Twitter Plugin"; 44 | } 45 | 46 | public void onModule(RiversModule module) { 47 | module.registerRiver("twitter", TwitterRiverModule.class); 48 | } 49 | } 50 | -------------------------------------------------------------------------------- /src/main/java/org/elasticsearch/river/twitter/TwitterRiver.java: -------------------------------------------------------------------------------- 1 | /* 2 | * Licensed to Elasticsearch under one or more contributor 3 | * license agreements. See the NOTICE file distributed with 4 | * this work for additional information regarding copyright 5 | * ownership. Elasticsearch licenses this file to you under 6 | * the Apache License, Version 2.0 (the "License"); you may 7 | * not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, 13 | * software distributed under the License is distributed on an 14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | * KIND, either express or implied. See the License for the 16 | * specific language governing permissions and limitations 17 | * under the License. 
18 | */ 19 | 20 | package org.elasticsearch.river.twitter; 21 | 22 | import org.elasticsearch.ExceptionsHelper; 23 | import org.elasticsearch.action.bulk.BulkItemResponse; 24 | import org.elasticsearch.action.bulk.BulkProcessor; 25 | import org.elasticsearch.action.bulk.BulkRequest; 26 | import org.elasticsearch.action.bulk.BulkResponse; 27 | import org.elasticsearch.client.Client; 28 | import org.elasticsearch.client.Requests; 29 | import org.elasticsearch.cluster.block.ClusterBlockException; 30 | import org.elasticsearch.common.Strings; 31 | import org.elasticsearch.common.inject.Inject; 32 | import org.elasticsearch.common.settings.Settings; 33 | import org.elasticsearch.common.unit.TimeValue; 34 | import org.elasticsearch.common.xcontent.XContentBuilder; 35 | import org.elasticsearch.common.xcontent.XContentFactory; 36 | import org.elasticsearch.common.xcontent.support.XContentMapValues; 37 | import org.elasticsearch.indices.IndexAlreadyExistsException; 38 | import org.elasticsearch.river.AbstractRiverComponent; 39 | import org.elasticsearch.river.River; 40 | import org.elasticsearch.river.RiverName; 41 | import org.elasticsearch.river.RiverSettings; 42 | import org.elasticsearch.threadpool.ThreadPool; 43 | import twitter4j.*; 44 | import twitter4j.conf.Configuration; 45 | import twitter4j.conf.ConfigurationBuilder; 46 | 47 | import java.util.ArrayList; 48 | import java.util.List; 49 | import java.util.Map; 50 | 51 | /** 52 | * 53 | */ 54 | public class TwitterRiver extends AbstractRiverComponent implements River { 55 | 56 | private final ThreadPool threadPool; 57 | 58 | private final Client client; 59 | 60 | private final String oauthConsumerKey; 61 | private final String oauthConsumerSecret; 62 | private final String oauthAccessToken; 63 | private final String oauthAccessTokenSecret; 64 | 65 | private final TimeValue retryAfter; 66 | 67 | private final String proxyHost; 68 | private final String proxyPort; 69 | private final String proxyUser; 70 | private final String proxyPassword; 71 | 72 | private final boolean raw; 73 | private final boolean ignoreRetweet; 74 | private final boolean geoAsArray; 75 | 76 | private final String indexName; 77 | 78 | private final String typeName; 79 | 80 | private final int bulkSize; 81 | private final int maxConcurrentBulk; 82 | private final TimeValue bulkFlushInterval; 83 | 84 | private final FilterQuery filterQuery; 85 | 86 | private final String streamType; 87 | 88 | private RiverStatus riverStatus; 89 | 90 | private volatile TwitterStream stream; 91 | 92 | private volatile BulkProcessor bulkProcessor; 93 | 94 | @SuppressWarnings({"unchecked"}) 95 | @Inject 96 | public TwitterRiver(RiverName riverName, RiverSettings riverSettings, Client client, ThreadPool threadPool, Settings settings) { 97 | super(riverName, riverSettings); 98 | this.riverStatus = RiverStatus.UNKNOWN; 99 | this.client = client; 100 | this.threadPool = threadPool; 101 | 102 | String riverStreamType; 103 | 104 | if (riverSettings.settings().containsKey("twitter")) { 105 | Map twitterSettings = (Map) riverSettings.settings().get("twitter"); 106 | 107 | raw = XContentMapValues.nodeBooleanValue(twitterSettings.get("raw"), false); 108 | ignoreRetweet = XContentMapValues.nodeBooleanValue(twitterSettings.get("ignore_retweet"), false); 109 | geoAsArray = XContentMapValues.nodeBooleanValue(twitterSettings.get("geo_as_array"), false); 110 | 111 | if (twitterSettings.containsKey("oauth")) { 112 | Map oauth = (Map) twitterSettings.get("oauth"); 113 | if (oauth.containsKey("consumer_key")) 
{ 114 | oauthConsumerKey = XContentMapValues.nodeStringValue(oauth.get("consumer_key"), null); 115 | } else { 116 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key"); 117 | } 118 | if (oauth.containsKey("consumer_secret")) { 119 | oauthConsumerSecret = XContentMapValues.nodeStringValue(oauth.get("consumer_secret"), null); 120 | } else { 121 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret"); 122 | } 123 | if (oauth.containsKey("access_token")) { 124 | oauthAccessToken = XContentMapValues.nodeStringValue(oauth.get("access_token"), null); 125 | } else { 126 | oauthAccessToken = settings.get("river.twitter.oauth.access_token"); 127 | } 128 | if (oauth.containsKey("access_token_secret")) { 129 | oauthAccessTokenSecret = XContentMapValues.nodeStringValue(oauth.get("access_token_secret"), null); 130 | } else { 131 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret"); 132 | } 133 | } else { 134 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key"); 135 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret"); 136 | oauthAccessToken = settings.get("river.twitter.oauth.access_token"); 137 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret"); 138 | } 139 | 140 | if (twitterSettings.containsKey("retry_after")) { 141 | retryAfter = XContentMapValues.nodeTimeValue(twitterSettings.get("retry_after"), TimeValue.timeValueSeconds(10)); 142 | } else { 143 | retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10)); 144 | } 145 | 146 | if (twitterSettings.containsKey("proxy")) { 147 | Map proxy = (Map) twitterSettings.get("proxy"); 148 | proxyHost = XContentMapValues.nodeStringValue(proxy.get("host"), null); 149 | proxyPort = XContentMapValues.nodeStringValue(proxy.get("port"), null); 150 | proxyUser = XContentMapValues.nodeStringValue(proxy.get("user"), null); 151 | proxyPassword = XContentMapValues.nodeStringValue(proxy.get("password"), null); 152 | } else { 153 | // Let's see if we have that in node settings 154 | proxyHost = settings.get("river.twitter.proxy.host"); 155 | proxyPort = settings.get("river.twitter.proxy.port"); 156 | proxyUser = settings.get("river.twitter.proxy.user"); 157 | proxyPassword = settings.get("river.twitter.proxy.password"); 158 | } 159 | 160 | riverStreamType = XContentMapValues.nodeStringValue(twitterSettings.get("type"), "sample"); 161 | Map filterSettings = (Map) twitterSettings.get("filter"); 162 | 163 | if (riverStreamType.equals("filter") && filterSettings == null) { 164 | filterQuery = null; 165 | stream = null; 166 | streamType = null; 167 | indexName = null; 168 | typeName = "status"; 169 | bulkSize = 100; 170 | this.maxConcurrentBulk = 1; 171 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5); 172 | logger.warn("no filter defined for type filter. 
Disabling river...");
173 |                 return;
174 |             }
175 | 
176 |             if (filterSettings != null) {
177 |                 riverStreamType = "filter";
178 |                 filterQuery = new FilterQuery();
179 |                 filterQuery.count(XContentMapValues.nodeIntegerValue(filterSettings.get("count"), 0));
180 |                 Object tracks = filterSettings.get("tracks");
181 |                 boolean filterSet = false;
182 |                 if (tracks != null) {
183 |                     if (tracks instanceof List) {
184 |                         List<String> lTracks = (List<String>) tracks;
185 |                         filterQuery.track(lTracks.toArray(new String[lTracks.size()]));
186 |                     } else {
187 |                         filterQuery.track(Strings.commaDelimitedListToStringArray(tracks.toString()));
188 |                     }
189 |                     filterSet = true;
190 |                 }
191 |                 Object follow = filterSettings.get("follow");
192 |                 if (follow != null) {
193 |                     if (follow instanceof List) {
194 |                         List<?> lFollow = (List<?>) follow;
195 |                         long[] followIds = new long[lFollow.size()];
196 |                         for (int i = 0; i < lFollow.size(); i++) {
197 |                             Object o = lFollow.get(i);
198 |                             if (o instanceof Number) {
199 |                                 followIds[i] = ((Number) o).longValue(); // twitter user ids are longs; intValue() would truncate them
200 |                             } else {
201 |                                 followIds[i] = Long.parseLong(o.toString());
202 |                             }
203 |                         }
204 |                         filterQuery.follow(followIds);
205 |                     } else {
206 |                         String[] ids = Strings.commaDelimitedListToStringArray(follow.toString());
207 |                         long[] followIds = new long[ids.length];
208 |                         for (int i = 0; i < ids.length; i++) {
209 |                             followIds[i] = Long.parseLong(ids[i]);
210 |                         }
211 |                         filterQuery.follow(followIds);
212 |                     }
213 |                     filterSet = true;
214 |                 }
215 |                 Object locations = filterSettings.get("locations");
216 |                 if (locations != null) {
217 |                     if (locations instanceof List) {
218 |                         List<?> lLocations = (List<?>) locations;
219 |                         double[][] dLocations = new double[lLocations.size()][];
220 |                         for (int i = 0; i < lLocations.size(); i++) {
221 |                             Object loc = lLocations.get(i);
222 |                             double lat;
223 |                             double lon;
224 |                             if (loc instanceof List) {
225 |                                 List<?> lLoc = (List<?>) loc;
226 |                                 if (lLoc.get(0) instanceof Number) {
227 |                                     lon = ((Number) lLoc.get(0)).doubleValue();
228 |                                 } else {
229 |                                     lon = Double.parseDouble(lLoc.get(0).toString());
230 |                                 }
231 |                                 if (lLoc.get(1) instanceof Number) {
232 |                                     lat = ((Number) lLoc.get(1)).doubleValue();
233 |                                 } else {
234 |                                     lat = Double.parseDouble(lLoc.get(1).toString());
235 |                                 }
236 |                             } else {
237 |                                 String[] sLoc = Strings.commaDelimitedListToStringArray(loc.toString());
238 |                                 lon = Double.parseDouble(sLoc[0]);
239 |                                 lat = Double.parseDouble(sLoc[1]);
240 |                             }
241 |                             dLocations[i] = new double[]{lon, lat};
242 |                         }
243 |                         filterQuery.locations(dLocations);
244 |                     } else {
245 |                         String[] sLocations = Strings.commaDelimitedListToStringArray(locations.toString());
246 |                         double[][] dLocations = new double[sLocations.length / 2][];
247 |                         int dCounter = 0;
248 |                         for (int i = 0; i < sLocations.length; i++) {
249 |                             double lon = Double.parseDouble(sLocations[i]);
250 |                             double lat = Double.parseDouble(sLocations[++i]);
251 |                             dLocations[dCounter++] = new double[]{lon, lat};
252 |                         }
253 |                         filterQuery.locations(dLocations);
254 |                     }
255 |                     filterSet = true;
256 |                 }
257 |                 Object userLists = filterSettings.get("user_lists");
258 |                 if (userLists != null) {
259 |                     if (userLists instanceof List) {
260 |                         List<String> lUserlists = (List<String>) userLists;
261 |                         String[] tUserlists = lUserlists.toArray(new String[lUserlists.size()]);
262 |                         filterQuery.follow(getUsersListMembers(tUserlists));
263 |                     } else {
264 |                         String[] tUserlists = Strings.commaDelimitedListToStringArray(userLists.toString());
265 |                         filterQuery.follow(getUsersListMembers(tUserlists));
266 |                     }
267 |                     filterSet = true;
268 |                 }
269 | 
270 |                 // We should have something to filter
271 |                 if (!filterSet) {
272 |                     streamType = null;
273 |                     indexName = null;
274 |                     typeName = "status";
275 |                     bulkSize = 100;
276 |                     this.maxConcurrentBulk = 1;
277 |                     this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
278 |                     logger.warn("can not set language filter without tracks, follow, locations or user_lists. Disabling river.");
279 |                     return;
280 |                 }
281 | 
282 |                 Object language = filterSettings.get("language");
283 |                 if (language != null) {
284 |                     if (language instanceof List) {
285 |                         List<String> lLanguage = (List<String>) language;
286 |                         filterQuery.language(lLanguage.toArray(new String[lLanguage.size()]));
287 |                     } else {
288 |                         filterQuery.language(Strings.commaDelimitedListToStringArray(language.toString()));
289 |                     }
290 |                 }
291 |             } else {
292 |                 filterQuery = null;
293 |             }
294 |         } else {
295 |             // No specific settings. We need to use some defaults
296 |             riverStreamType = "sample";
297 |             raw = false;
298 |             ignoreRetweet = false;
299 |             geoAsArray = false;
300 |             oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
301 |             oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
302 |             oauthAccessToken = settings.get("river.twitter.oauth.access_token");
303 |             oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
304 |             retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10));
305 |             filterQuery = null;
306 |             proxyHost = null;
307 |             proxyPort = null;
308 |             proxyUser = null;
309 |             proxyPassword = null;
310 |         }
311 | 
312 |         if (oauthAccessToken == null || oauthConsumerKey == null || oauthConsumerSecret == null || oauthAccessTokenSecret == null) {
313 |             stream = null;
314 |             streamType = null;
315 |             indexName = null;
316 |             typeName = "status";
317 |             bulkSize = 100;
318 |             this.maxConcurrentBulk = 1;
319 |             this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
320 |             logger.warn("no oauth specified, disabling river...");
321 |             return;
322 |         }
323 | 
324 |         if (riverSettings.settings().containsKey("index")) {
325 |             Map<String, Object> indexSettings = (Map<String, Object>) riverSettings.settings().get("index");
326 |             indexName = XContentMapValues.nodeStringValue(indexSettings.get("index"), riverName.name());
327 |             typeName = XContentMapValues.nodeStringValue(indexSettings.get("type"), "status");
328 |             this.bulkSize = XContentMapValues.nodeIntegerValue(indexSettings.get("bulk_size"), 100);
329 |             this.bulkFlushInterval = TimeValue.parseTimeValue(XContentMapValues.nodeStringValue(
330 |                     indexSettings.get("flush_interval"), "5s"), TimeValue.timeValueSeconds(5));
331 |             this.maxConcurrentBulk = XContentMapValues.nodeIntegerValue(indexSettings.get("max_concurrent_bulk"), 1);
332 |         } else {
333 |             indexName = riverName.name();
334 |             typeName = "status";
335 |             bulkSize = 100;
336 |             this.maxConcurrentBulk = 1;
337 |             this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
338 |         }
339 | 
340 |         logger.info("creating twitter stream river");
341 |         if (raw && logger.isDebugEnabled()) {
342 |             logger.debug("will index twitter raw content...");
343 |         }
344 | 
345 |         streamType = riverStreamType;
346 |         this.riverStatus = RiverStatus.INITIALIZED;
347 |     }
348 | 
349 |     /**
350 |      * Get the user ids of each list's members, to stream them.
351 |      * @param tUserlists the lists to follow, given as ownerScreenName/slug. Each should be a public list.
352 |      * @return the ids of all members of the given lists
353 |      */
354 |     private long[] getUsersListMembers(String[] tUserlists) {
355 |         logger.debug("Fetching user ids of given lists");
356 |         List<Long> listUserIdToFollow = new ArrayList<Long>();
357 |         Configuration cb = buildTwitterConfiguration();
358 |         Twitter twitterImpl = new TwitterFactory(cb).getInstance();
359 | 
360 |         // For each list given in parameter
361 |         for (String listId : tUserlists) {
362 |             logger.debug("Adding users of list {}", listId);
363 |             String[] splitListId = listId.split("/");
364 |             try {
365 |                 long cursor = -1;
366 |                 PagableResponseList<User> itUserListMembers;
367 |                 do {
368 |                     itUserListMembers = twitterImpl.getUserListMembers(splitListId[0], splitListId[1], cursor);
369 |                     for (User member : itUserListMembers) {
370 |                         long userId = member.getId();
371 |                         listUserIdToFollow.add(userId);
372 |                     }
373 |                 } while ((cursor = itUserListMembers.getNextCursor()) != 0);
374 | 
375 |             } catch (TwitterException te) {
376 |                 logger.error("Failed to get list members for: {}", listId, te);
377 |             }
378 |         }
379 | 
380 | 
381 |         // Just unboxing from Long to long
382 |         long[] ret = new long[listUserIdToFollow.size()];
383 |         int pos = 0;
384 |         for (Long userId : listUserIdToFollow) {
385 |             ret[pos] = userId;
386 |             pos++;
387 |         }
388 |         return ret;
389 |     }
390 | 
391 |     /**
392 |      * Build a twitter4j configuration object with credentials and proxy settings.
393 |      * @return the twitter4j configuration
394 |      */
395 |     private Configuration buildTwitterConfiguration() {
396 |         logger.debug("creating twitter configuration");
397 |         ConfigurationBuilder cb = new ConfigurationBuilder();
398 | 
399 |         cb.setOAuthConsumerKey(oauthConsumerKey)
400 |                 .setOAuthConsumerSecret(oauthConsumerSecret)
401 |                 .setOAuthAccessToken(oauthAccessToken)
402 |                 .setOAuthAccessTokenSecret(oauthAccessTokenSecret);
403 | 
404 |         if (proxyHost != null) cb.setHttpProxyHost(proxyHost);
405 |         if (proxyPort != null) cb.setHttpProxyPort(Integer.parseInt(proxyPort));
406 |         if (proxyUser != null) cb.setHttpProxyUser(proxyUser);
407 |         if (proxyPassword != null) cb.setHttpProxyPassword(proxyPassword);
408 |         if (raw) cb.setJSONStoreEnabled(true);
409 |         logger.debug("twitter configuration created");
410 |         return cb.build();
411 |     }
412 | 
413 |     /**
414 |      * Start the twitter stream.
415 |      */
416 |     private void startTwitterStream() {
417 |         logger.info("starting {} twitter stream", streamType);
418 | 
419 |         if (stream == null) {
420 |             logger.debug("creating twitter stream");
421 | 
422 |             stream = new TwitterStreamFactory(buildTwitterConfiguration()).getInstance();
423 |             if (streamType.equals("user")) {
424 |                 stream.addListener(new UserStreamHandler());
425 |             } else {
426 |                 stream.addListener(new StatusHandler());
427 |             }
428 | 
429 |             logger.debug("twitter stream created");
430 |         }
431 | 
432 |         if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
433 |             if (streamType.equals("filter") || filterQuery != null) {
434 |                 stream.filter(filterQuery);
435 |             } else if (streamType.equals("firehose")) {
436 |                 stream.firehose(0);
437 |             } else if (streamType.equals("user")) {
438 |                 stream.user();
439 |             } else {
440 |                 stream.sample();
441 |             }
442 |         }
443 |         logger.debug("{} twitter stream started!", streamType);
444 |     }
445 | 
446 |     @Override
447 |     public void start() {
448 |         this.riverStatus = RiverStatus.STARTING;
449 |         // Let's start this in another thread so we won't block the start process
450 |         threadPool.generic().execute(new Runnable() {
451 |             @Override
452 |             public void run() {
453 |                 if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
454 |                     // We are first
waiting for a yellow state at least 455 | logger.debug("waiting for yellow status"); 456 | client.admin().cluster().prepareHealth("_river").setWaitForYellowStatus().get(); 457 | logger.debug("yellow or green status received"); 458 | } 459 | 460 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) { 461 | // We push ES mapping only if raw is false 462 | if (!raw) { 463 | try { 464 | logger.debug("Trying to create index [{}]", indexName); 465 | client.admin().indices().prepareCreate(indexName).execute().actionGet(); 466 | logger.debug("index created [{}]", indexName); 467 | } catch (Exception e) { 468 | if (ExceptionsHelper.unwrapCause(e) instanceof IndexAlreadyExistsException) { 469 | // that's fine 470 | logger.debug("Index [{}] already exists, skipping...", indexName); 471 | } else if (ExceptionsHelper.unwrapCause(e) instanceof ClusterBlockException) { 472 | // ok, not recovered yet..., lets start indexing and hope we recover by the first bulk 473 | // TODO: a smarter logic can be to register for cluster event listener here, and only start sampling when the block is removed... 474 | logger.debug("Cluster is blocked for now. Index [{}] can not be created, skipping...", indexName); 475 | } else { 476 | logger.warn("failed to create index [{}], disabling river...", e, indexName); 477 | riverStatus = RiverStatus.STOPPED; 478 | return; 479 | } 480 | } 481 | 482 | if (client.admin().indices().prepareGetMappings(indexName).setTypes(typeName).get().getMappings().isEmpty()) { 483 | try { 484 | String mapping = XContentFactory.jsonBuilder().startObject().startObject(typeName).startObject("properties") 485 | .startObject("location").field("type", "geo_point").endObject() 486 | .startObject("language").field("type", "string").field("index", "not_analyzed").endObject() 487 | .startObject("user").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject() 488 | .startObject("mention").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject() 489 | .startObject("in_reply").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject() 490 | .startObject("retweet").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject() 491 | .endObject().endObject().endObject().string(); 492 | logger.debug("Applying default mapping for [{}]/[{}]: {}", indexName, typeName, mapping); 493 | client.admin().indices().preparePutMapping(indexName).setType(typeName).setSource(mapping).execute().actionGet(); 494 | } catch (Exception e) { 495 | logger.warn("failed to apply default mapping [{}]/[{}], disabling river...", e, indexName, typeName); 496 | return; 497 | } 498 | } else { 499 | logger.debug("Mapping already exists for [{}]/[{}], skipping...", indexName, typeName); 500 | } 501 | } 502 | } 503 | 504 | // Creating bulk processor 505 | logger.debug("creating bulk processor [{}]", indexName); 506 | bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() { 507 | @Override 508 | public void beforeBulk(long executionId, BulkRequest request) { 509 | logger.debug("Going to execute new bulk composed of {} actions", request.numberOfActions()); 510 | } 511 | 512 | @Override 513 | public void afterBulk(long executionId, BulkRequest request, 
BulkResponse response) {
514 | logger.debug("Executed bulk composed of {} actions", request.numberOfActions());
515 | if (response.hasFailures()) {
516 | logger.warn("There were failures while executing bulk: {}", response.buildFailureMessage());
517 | if (logger.isDebugEnabled()) {
518 | for (BulkItemResponse item : response.getItems()) {
519 | if (item.isFailed()) {
520 | logger.debug("Error for {}/{}/{} for {} operation: {}", item.getIndex(),
521 | item.getType(), item.getId(), item.getOpType(), item.getFailureMessage());
522 | }
523 | }
524 | }
525 | }
526 | }
527 |
528 | @Override
529 | public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
530 | logger.warn("Error executing bulk", failure);
531 | }
532 | })
533 | .setBulkActions(bulkSize)
534 | .setConcurrentRequests(maxConcurrentBulk)
535 | .setFlushInterval(bulkFlushInterval)
536 | .build();
537 |
538 | logger.debug("Bulk processor created with bulkSize [{}], bulkFlushInterval [{}]", bulkSize, bulkFlushInterval);
539 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
540 | startTwitterStream();
541 | riverStatus = RiverStatus.RUNNING;
542 | }
543 | }
544 | });
545 | }
546 |
547 | private void reconnect() {
548 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED) {
549 | logger.debug("cannot reconnect twitter on a closed river");
550 | return;
551 | }
552 |
553 | riverStatus = RiverStatus.STARTING;
554 |
555 | if (stream != null) {
556 | try {
557 | logger.debug("cleanup stream");
558 | stream.cleanUp();
559 | } catch (Exception e) {
560 | logger.debug("failed to cleanup after failure", e);
561 | }
562 | try {
563 | logger.debug("shutdown stream");
564 | stream.shutdown();
565 | } catch (Exception e) {
566 | logger.debug("failed to shutdown after failure", e);
567 | }
568 | }
569 |
570 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED) {
571 | logger.debug("cannot reconnect twitter on a closed river");
572 | return;
573 | }
574 |
575 | try {
576 | startTwitterStream();
577 | riverStatus = RiverStatus.RUNNING;
578 | } catch (Exception e) {
579 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED) {
580 | logger.debug("river is closing.
we won't reconnect."); 581 | close(); 582 | return; 583 | } 584 | // TODO, we can update the status of the river to RECONNECT 585 | logger.warn("failed to connect after failure, throttling", e); 586 | threadPool.schedule(retryAfter, ThreadPool.Names.GENERIC, new Runnable() { 587 | @Override 588 | public void run() { 589 | reconnect(); 590 | } 591 | }); 592 | } 593 | } 594 | 595 | @Override 596 | public void close() { 597 | riverStatus = RiverStatus.STOPPING; 598 | 599 | logger.info("closing twitter stream river"); 600 | 601 | if (bulkProcessor != null) { 602 | bulkProcessor.close(); 603 | } 604 | 605 | if (stream != null) { 606 | // No need to call stream.cleanUp(): 607 | // - since it is done by the implementation of shutdown() 608 | // - it will lead to a thread leak (see TwitterStreamImpl.cleanUp() and TwitterStreamImpl.shutdown() ) 609 | stream.shutdown(); 610 | } 611 | 612 | riverStatus = RiverStatus.STOPPED; 613 | } 614 | 615 | private class StatusHandler extends StatusAdapter { 616 | 617 | @Override 618 | public void onStatus(Status status) { 619 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) { 620 | try { 621 | // #24: We want to ignore retweets (default to false) https://github.com/elasticsearch/elasticsearch-river-twitter/issues/24 622 | if (status.isRetweet() && ignoreRetweet) { 623 | if (logger.isTraceEnabled()) { 624 | logger.trace("ignoring status cause retweet {} : {}", status.getUser().getName(), status.getText()); 625 | } 626 | } else { 627 | if (logger.isTraceEnabled()) { 628 | logger.trace("status {} : {}", status.getUser().getName(), status.getText()); 629 | } 630 | 631 | // If we want to index tweets as is, we don't need to convert it to JSon doc 632 | if (raw) { 633 | String rawJSON = TwitterObjectFactory.getRawJSON(status); 634 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) { 635 | bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(rawJSON)); 636 | } 637 | } else { 638 | XContentBuilder builder = XContentFactory.jsonBuilder().startObject(); 639 | builder.field("text", status.getText()); 640 | builder.field("created_at", status.getCreatedAt()); 641 | builder.field("source", status.getSource()); 642 | builder.field("truncated", status.isTruncated()); 643 | builder.field("language", status.getLang()); 644 | 645 | if (status.getUserMentionEntities() != null) { 646 | builder.startArray("mention"); 647 | for (UserMentionEntity user : status.getUserMentionEntities()) { 648 | builder.startObject(); 649 | builder.field("id", user.getId()); 650 | builder.field("name", user.getName()); 651 | builder.field("screen_name", user.getScreenName()); 652 | builder.field("start", user.getStart()); 653 | builder.field("end", user.getEnd()); 654 | builder.endObject(); 655 | } 656 | builder.endArray(); 657 | } 658 | 659 | if (status.getRetweetCount() != -1) { 660 | builder.field("retweet_count", status.getRetweetCount()); 661 | } 662 | 663 | if (status.isRetweet() && status.getRetweetedStatus() != null) { 664 | builder.startObject("retweet"); 665 | builder.field("id", status.getRetweetedStatus().getId()); 666 | if (status.getRetweetedStatus().getUser() != null) { 667 | builder.field("user_id", status.getRetweetedStatus().getUser().getId()); 668 | builder.field("user_screen_name", status.getRetweetedStatus().getUser().getScreenName()); 669 | if (status.getRetweetedStatus().getRetweetCount() != -1) { 670 | builder.field("retweet_count", 
status.getRetweetedStatus().getRetweetCount()); 671 | } 672 | } 673 | builder.endObject(); 674 | } 675 | 676 | if (status.getInReplyToStatusId() != -1) { 677 | builder.startObject("in_reply"); 678 | builder.field("status", status.getInReplyToStatusId()); 679 | if (status.getInReplyToUserId() != -1) { 680 | builder.field("user_id", status.getInReplyToUserId()); 681 | builder.field("user_screen_name", status.getInReplyToScreenName()); 682 | } 683 | builder.endObject(); 684 | } 685 | 686 | if (status.getHashtagEntities() != null) { 687 | builder.startArray("hashtag"); 688 | for (HashtagEntity hashtag : status.getHashtagEntities()) { 689 | builder.startObject(); 690 | builder.field("text", hashtag.getText()); 691 | builder.field("start", hashtag.getStart()); 692 | builder.field("end", hashtag.getEnd()); 693 | builder.endObject(); 694 | } 695 | builder.endArray(); 696 | } 697 | if (status.getContributors() != null && status.getContributors().length > 0) { 698 | builder.array("contributor", status.getContributors()); 699 | } 700 | if (status.getGeoLocation() != null) { 701 | if (geoAsArray) { 702 | builder.startArray("location"); 703 | builder.value(status.getGeoLocation().getLongitude()); 704 | builder.value(status.getGeoLocation().getLatitude()); 705 | builder.endArray(); 706 | } else { 707 | builder.startObject("location"); 708 | builder.field("lat", status.getGeoLocation().getLatitude()); 709 | builder.field("lon", status.getGeoLocation().getLongitude()); 710 | builder.endObject(); 711 | } 712 | } 713 | if (status.getPlace() != null) { 714 | builder.startObject("place"); 715 | builder.field("id", status.getPlace().getId()); 716 | builder.field("name", status.getPlace().getName()); 717 | builder.field("type", status.getPlace().getPlaceType()); 718 | builder.field("full_name", status.getPlace().getFullName()); 719 | builder.field("street_address", status.getPlace().getStreetAddress()); 720 | builder.field("country", status.getPlace().getCountry()); 721 | builder.field("country_code", status.getPlace().getCountryCode()); 722 | builder.field("url", status.getPlace().getURL()); 723 | builder.endObject(); 724 | } 725 | if (status.getURLEntities() != null) { 726 | builder.startArray("link"); 727 | for (URLEntity url : status.getURLEntities()) { 728 | if (url != null) { 729 | builder.startObject(); 730 | if (url.getURL() != null) { 731 | builder.field("url", url.getURL()); 732 | } 733 | if (url.getDisplayURL() != null) { 734 | builder.field("display_url", url.getDisplayURL()); 735 | } 736 | if (url.getExpandedURL() != null) { 737 | builder.field("expand_url", url.getExpandedURL()); 738 | } 739 | builder.field("start", url.getStart()); 740 | builder.field("end", url.getEnd()); 741 | builder.endObject(); 742 | } 743 | } 744 | builder.endArray(); 745 | } 746 | 747 | builder.startObject("user"); 748 | builder.field("id", status.getUser().getId()); 749 | builder.field("name", status.getUser().getName()); 750 | builder.field("screen_name", status.getUser().getScreenName()); 751 | builder.field("location", status.getUser().getLocation()); 752 | builder.field("description", status.getUser().getDescription()); 753 | builder.field("profile_image_url", status.getUser().getProfileImageURL()); 754 | builder.field("profile_image_url_https", status.getUser().getProfileImageURLHttps()); 755 | 756 | builder.endObject(); 757 | 758 | builder.endObject(); 759 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) { 760 | 
bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(builder)); 761 | } 762 | } 763 | } 764 | 765 | } catch (Exception e) { 766 | logger.warn("failed to construct index request", e); 767 | } 768 | } else { 769 | logger.debug("river is closing. ignoring tweet [{}]", status.getId()); 770 | } 771 | } 772 | 773 | @Override 774 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) { 775 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) { 776 | if (statusDeletionNotice.getStatusId() != -1) { 777 | bulkProcessor.add(Requests.deleteRequest(indexName).type(typeName).id(Long.toString(statusDeletionNotice.getStatusId()))); 778 | } 779 | } else { 780 | logger.debug("river is closing. ignoring deletion of tweet [{}]", statusDeletionNotice.getStatusId()); 781 | } 782 | } 783 | 784 | @Override 785 | public void onTrackLimitationNotice(int numberOfLimitedStatuses) { 786 | logger.info("received track limitation notice, number_of_limited_statuses {}", numberOfLimitedStatuses); 787 | } 788 | 789 | @Override 790 | public void onException(Exception ex) { 791 | logger.warn("stream failure, restarting stream...", ex); 792 | threadPool.generic().execute(new Runnable() { 793 | @Override 794 | public void run() { 795 | reconnect(); 796 | } 797 | }); 798 | } 799 | } 800 | 801 | private class UserStreamHandler extends UserStreamAdapter { 802 | 803 | private final StatusHandler statusHandler = new StatusHandler(); 804 | 805 | @Override 806 | public void onException(Exception ex) { 807 | statusHandler.onException(ex); 808 | } 809 | 810 | @Override 811 | public void onStatus(Status status) { 812 | statusHandler.onStatus(status); 813 | } 814 | 815 | @Override 816 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) { 817 | statusHandler.onDeletionNotice(statusDeletionNotice); 818 | } 819 | } 820 | 821 | public enum RiverStatus { 822 | UNKNOWN, 823 | INITIALIZED, 824 | STARTING, 825 | RUNNING, 826 | STOPPING, 827 | STOPPED; 828 | } 829 | } 830 | -------------------------------------------------------------------------------- /src/main/java/org/elasticsearch/river/twitter/TwitterRiverModule.java: -------------------------------------------------------------------------------- 1 | /* 2 | * Licensed to Elasticsearch under one or more contributor 3 | * license agreements. See the NOTICE file distributed with 4 | * this work for additional information regarding copyright 5 | * ownership. Elasticsearch licenses this file to you under 6 | * the Apache License, Version 2.0 (the "License"); you may 7 | * not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, 13 | * software distributed under the License is distributed on an 14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | * KIND, either express or implied. See the License for the 16 | * specific language governing permissions and limitations 17 | * under the License. 
18 | */
19 |
20 | package org.elasticsearch.river.twitter;
21 |
22 | import org.elasticsearch.common.inject.AbstractModule;
23 | import org.elasticsearch.river.River;
24 |
25 | /**
26 | * Binds the {@link River} interface to the {@link TwitterRiver} implementation.
27 | */
28 | public class TwitterRiverModule extends AbstractModule {
29 |
30 | @Override
31 | protected void configure() {
32 | bind(River.class).to(TwitterRiver.class).asEagerSingleton();
33 | }
34 | }
35 |
-------------------------------------------------------------------------------- /src/main/resources/es-plugin.properties: --------------------------------------------------------------------------------
1 | plugin=org.elasticsearch.plugin.river.twitter.TwitterRiverPlugin
2 | version=${project.version}
3 |
-------------------------------------------------------------------------------- /src/test/java/org/elasticsearch/river/twitter/test/AbstractTwitterTest.java: --------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.annotations.TestGroup;
23 | import org.elasticsearch.common.base.Predicate;
24 | import org.elasticsearch.test.ElasticsearchIntegrationTest;
25 | import org.elasticsearch.test.ElasticsearchIntegrationTest.ThirdParty;
26 |
27 | import java.lang.annotation.Documented;
28 | import java.lang.annotation.Inherited;
29 | import java.lang.annotation.Retention;
30 | import java.lang.annotation.RetentionPolicy;
31 | import java.util.concurrent.TimeUnit;
32 |
33 | /**
34 | * Base class for tests that require an internet connection and twitter credentials to run.
35 | * Twitter tests are disabled by default.
36 | * <p>
37 | * To enable tests, add -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
38 | * <p>
39 | * The elasticsearch.yml file should contain the following keys:
40 | * <pre>
41 |   river:
42 |       twitter:
43 |           oauth:
44 |              consumer_key: ""
45 |              consumer_secret: ""
46 |              access_token: ""
47 |              access_token_secret: ""
48 | * </pre>
49 | *
50 | * You need to get an OAuth token in order to use the Twitter river.
51 | * Please follow the [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom), basically:
52 | *
53 | * <ul>
54 | *   <li>Login to: https://dev.twitter.com/apps/</li>
55 | *   <li>Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new.
56 | *       You don't need a callback URL.</li>
57 | *   <li>When done, click on `Create my access token`.</li>
58 | *   <li>Open the `OAuth tool` tab and note `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.</li>
59 | * </ul>
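 * <p>
 * A short usage sketch of the {@link #awaitBusy1Second} helper defined below
 * (the index name is hypothetical):
 * <pre>
 * boolean found = awaitBusy1Second(new Predicate&lt;Object&gt;() {
 *     public boolean apply(Object input) {
 *         return client().prepareCount("my_index").get().getCount() &gt; 0;
 *     }
 * }, 30, TimeUnit.SECONDS);
 * </pre>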
60 | */
61 | @ThirdParty
62 | public abstract class AbstractTwitterTest extends ElasticsearchIntegrationTest {
63 |
64 | /**
65 | * Repeat a task until it returns true or until a given wait time has elapsed.
66 | * There is a 1 second delay between two runs.
67 | * @param breakPredicate test you want to run
68 | * @param maxWaitTime maximum time you want to wait
69 | * @param unit time unit used for maxWaitTime
70 | */
71 | public static boolean awaitBusy1Second(Predicate<Object> breakPredicate, long maxWaitTime, TimeUnit unit) throws InterruptedException {
72 | long maxTimeInMillis = TimeUnit.MILLISECONDS.convert(maxWaitTime, unit);
73 | long sleepTimeInMillis = 1000;
74 | long iterations = maxTimeInMillis / sleepTimeInMillis;
75 | for (int i = 0; i < iterations; i++) {
76 | if (breakPredicate.apply(null)) {
77 | return true;
78 | }
79 | Thread.sleep(sleepTimeInMillis);
80 | }
81 | return breakPredicate.apply(null);
82 | }
83 | }
84 |
-------------------------------------------------------------------------------- /src/test/java/org/elasticsearch/river/twitter/test/Twitter4JThreadFilter.java: --------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.ThreadFilter;
23 |
24 | /**
25 | * We know that Twitter4J can take a while to close.
26 | * This filter tells the test framework to ignore those threads as leaks.
27 | */
28 | public class Twitter4JThreadFilter implements ThreadFilter {
29 |
30 | @Override
31 | public boolean reject(Thread t) {
32 | String threadName = t.getName();
33 |
34 | if (threadName.contains("Twitter4J Async Dispatcher")) {
35 | return true;
36 | }
37 |
38 | if (threadName.contains("Twitter Stream consumer")) {
39 | return true;
40 | }
41 |
42 | if (threadName.contains("riverClusterService#updateTask")) {
43 | return true;
44 | }
45 |
46 | return false;
47 | }
48 | }
49 |
-------------------------------------------------------------------------------- /src/test/java/org/elasticsearch/river/twitter/test/TwitterIntegrationTest.java: --------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, 13 | * software distributed under the License is distributed on an 14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 15 | * KIND, either express or implied. See the License for the 16 | * specific language governing permissions and limitations 17 | * under the License. 18 | */ 19 | 20 | package org.elasticsearch.river.twitter.test; 21 | 22 | import com.carrotsearch.randomizedtesting.annotations.ThreadLeakFilters; 23 | import org.elasticsearch.action.count.CountResponse; 24 | import org.elasticsearch.action.get.GetResponse; 25 | import org.elasticsearch.action.search.SearchPhaseExecutionException; 26 | import org.elasticsearch.action.search.SearchResponse; 27 | import org.elasticsearch.common.Strings; 28 | import org.elasticsearch.common.base.Predicate; 29 | import org.elasticsearch.common.joda.time.DateTime; 30 | import org.elasticsearch.common.settings.Settings; 31 | import org.elasticsearch.common.unit.DistanceUnit; 32 | import org.elasticsearch.common.xcontent.XContentBuilder; 33 | import org.elasticsearch.env.Environment; 34 | import org.elasticsearch.index.query.QueryBuilders; 35 | import org.elasticsearch.indices.IndexAlreadyExistsException; 36 | import org.elasticsearch.indices.IndexMissingException; 37 | import org.elasticsearch.plugins.PluginsService; 38 | import org.elasticsearch.river.twitter.test.helper.HttpClient; 39 | import org.elasticsearch.river.twitter.test.helper.HttpClientResponse; 40 | import org.elasticsearch.search.SearchHit; 41 | import org.elasticsearch.test.ElasticsearchIntegrationTest; 42 | import org.junit.*; 43 | import twitter4j.Status; 44 | import twitter4j.Twitter; 45 | import twitter4j.TwitterException; 46 | import twitter4j.TwitterFactory; 47 | import twitter4j.auth.AccessToken; 48 | 49 | import java.io.IOException; 50 | import java.util.concurrent.TimeUnit; 51 | 52 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_REPLICAS; 53 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS; 54 | import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder; 55 | import static org.hamcrest.CoreMatchers.*; 56 | import static org.hamcrest.Matchers.equalTo; 57 | import static org.hamcrest.Matchers.greaterThan; 58 | 59 | /** 60 | * Integration tests for Twitter river
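 * <p>
 * The tests below register a river by indexing a _meta document into the _river index.
 * For instance, testTracks effectively indexes a document like this (the river name is
 * derived from the test name and is illustrative here):
 * <pre>
 * PUT /_river/test_tracks/_meta
 * { "type" : "twitter", "twitter" : { "filter" : { "tracks" : "obama" } } }
 * </pre>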
61 | * You must have internet access.
62 | *
63 | * Launch it using:
64 | * mvn test -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
65 | *
66 | * where your /path/to/elasticsearch.yml contains:
67 | * <pre>
68 | river:
69 | twitter:
70 | oauth:
71 | consumer_key: ""
72 | consumer_secret: ""
73 | access_token: ""
74 | access_token_secret: ""
75 | * </pre>
76 | */
77 | @ElasticsearchIntegrationTest.ClusterScope(
78 | scope = ElasticsearchIntegrationTest.Scope.SUITE,
79 | transportClientRatio = 0.0)
80 | @ThreadLeakFilters(defaultFilters = true, filters = {Twitter4JThreadFilter.class})
81 | public class TwitterIntegrationTest extends AbstractTwitterTest {
82 |
83 | private final String track = "obama";
84 |
85 | @Override
86 | protected Settings nodeSettings(int nodeOrdinal) {
87 | Settings.Builder settings = Settings.builder()
88 | .put(super.nodeSettings(nodeOrdinal))
89 | .put("path.home", createTempDir())
90 | .put("plugins." + PluginsService.LOAD_PLUGIN_FROM_CLASSPATH, true);
91 |
92 | Environment environment = new Environment(settings.build());
93 |
94 | // if explicit, just load it and don't load from env
95 | if (Strings.hasText(System.getProperty("tests.config"))) {
96 | settings.loadFromUrl(environment.resolveConfig(System.getProperty("tests.config")));
97 | }
98 |
99 | return settings.build();
100 | }
101 |
102 | @Before
103 | public void createEmptyRiverIndex() {
104 | // We want to force the _river index to use 1 shard and 0 replica
105 | client().admin().indices().prepareCreate("_river").setSettings(Settings.builder()
106 | .put(SETTING_NUMBER_OF_SHARDS, 1)
107 | .put(SETTING_NUMBER_OF_REPLICAS, 0)).get();
108 | }
109 |
110 | @After
111 | public void deleteRiverAndWait() throws InterruptedException {
112 | logger.info(" --> delete all");
113 | client().admin().indices().prepareDelete("_all").get();
114 |
115 | assertThat(awaitBusy(new Predicate<Object>() {
116 | public boolean apply(Object obj) {
117 | CountResponse response = client().prepareCount().get();
118 | return response.getCount() == 0;
119 | }
120 | }, 20, TimeUnit.SECONDS), equalTo(true));
121 |
122 | // Let's wait one second between two runs, as it appears that Twitter4J
123 | // does not close immediately, so we might get a Twitter API failure on the next test
124 | // (420: returned by the Search and Trends API when you are being rate limited)
125 | logger.info(" --> wait for Twitter4J to close");
126 | awaitBusy1Second(new Predicate<Object>() {
127 | @Override
128 | public boolean apply(Object o) {
129 | return false;
130 | }
131 | }, 2, TimeUnit.SECONDS);
132 | logger.info(" --> ending test");
133 | }
134 |
135 | private String getDbName() {
136 | return Strings.toUnderscoreCase(getTestName());
137 | }
138 |
139 | private void launchTest(XContentBuilder river, final Integer numDocs, boolean removeRiver)
140 | throws IOException, InterruptedException {
141 | logger.info(" -> Checking internet connection");
142 | HttpClientResponse response = new HttpClient("www.elastic.co", 443).request("/");
143 | Assert.assertThat(response.errorCode(), is(200));
144 |
145 | logger.info(" -> Create river");
146 | try {
147 | createIndex(getDbName());
148 | } catch (IndexAlreadyExistsException e) {
149 | // No worries.
We already created the index before
150 | }
151 | index("_river", getDbName(), "_meta", river);
152 |
153 | logger.info(" -> Wait for some docs");
154 | assertThat(awaitBusy1Second(new Predicate<Object>() {
155 | public boolean apply(Object obj) {
156 | try {
157 | refresh();
158 | CountResponse response = client().prepareCount(getDbName()).get();
159 | logger.info(" -> got {} docs in {} index", response.getCount(), getDbName());
160 | return response.getCount() >= numDocs;
161 | } catch (IndexMissingException e) {
162 | return false;
163 | } catch (SearchPhaseExecutionException e) {
164 | return false;
165 | }
166 | }
167 | }, 5, TimeUnit.MINUTES), equalTo(true));
168 |
169 | if (removeRiver) {
170 | logger.info(" -> Remove river");
171 | client().prepareDelete("_river", getDbName(), "_meta").get();
172 | }
173 | }
174 |
175 | @Test
176 | public void testLanguageFiltering() throws IOException, InterruptedException {
177 | launchTest(jsonBuilder()
178 | .startObject()
179 | .field("type", "twitter")
180 | .startObject("twitter")
181 | .field("type", "filter")
182 | .startObject("filter")
183 | .field("tracks", "le")
184 | .field("language", "fr")
185 | .endObject()
186 | .endObject()
187 | .endObject(), randomIntBetween(5, 50), true);
188 |
189 | // We should have only FR data
190 | SearchResponse response = client().prepareSearch(getDbName())
191 | .addField("language")
192 | .addField("_source")
193 | .get();
194 |
195 | logger.info(" --> Search response: {}", response.toString());
196 |
197 | // All language fields should be fr
198 | for (SearchHit hit : response.getHits().getHits()) {
199 | assertThat(hit.field("language"), notNullValue());
200 | assertThat(hit.field("language").getValue().toString(), is("fr"));
201 | }
202 | }
203 |
204 | @Test
205 | public void testIgnoreRT() throws IOException, InterruptedException {
206 | launchTest(jsonBuilder()
207 | .startObject()
208 | .field("type", "twitter")
209 | .startObject("twitter")
210 | .field("type", "sample")
211 | .field("ignore_retweet", true)
212 | .endObject()
213 | .endObject(), randomIntBetween(5, 50), true);
214 |
215 | // We should not have indexed any retweet
216 | SearchResponse response = client().prepareSearch(getDbName())
217 | .addField("retweet.id")
218 | .get();
219 |
220 | logger.info(" --> Search response: {}", response.toString());
221 |
222 | // We should not have any RT
223 | for (SearchHit hit : response.getHits().getHits()) {
224 | assertThat(hit.field("retweet.id"), nullValue());
225 | }
226 | }
227 |
228 | @Test
229 | public void testRaw() throws IOException, InterruptedException {
230 | launchTest(jsonBuilder()
231 | .startObject()
232 | .field("type", "twitter")
233 | .startObject("twitter")
234 | .field("raw", true)
235 | .startObject("filter")
236 | .field("tracks", track)
237 | .endObject()
238 | .endObject()
239 | .endObject(), randomIntBetween(5, 50), true);
240 |
241 | // We should have data that we would not get without raw set to true (raw stores the exact twitter JSON via twitter4j's JSON store)
242 | SearchResponse response = client().prepareSearch(getDbName())
243 | .addField("user.statuses_count")
244 | .addField("_source")
245 | .get();
246 |
247 | logger.info(" --> Search response: {}", response.toString());
248 |
249 | for (SearchHit hit : response.getHits().getHits()) {
250 | assertThat(hit.field("user.statuses_count"), notNullValue());
251 | }
252 | }
253 |
254 | /**
255 | * Tracking twitter account: 783214
256 | */
257 | @Test
258 | public void testFollow() throws IOException, InterruptedException {
259 | launchTest(jsonBuilder()
260 | .startObject()
261 | .field("type", "twitter")
262 | .startObject("twitter")
263 | .startObject("filter")
264 | .field("follow", "783214")
265 | .endObject()
266 | .endObject()
267 | .startObject("index")
268 | .field("bulk_size", 1)
269 | .endObject()
270 | .endObject(), 1, true);
271 | }
272 |
273 | /**
274 | * Tracking the twitter lists Zonal_Marking/Guardian100FootballBlogs and Zonal_Marking/football-journalists-3
275 | */
276 | @Test
277 | public void testFollowList() throws IOException, InterruptedException {
278 | launchTest(jsonBuilder()
279 | .startObject()
280 | .field("type", "twitter")
281 | .startObject("twitter")
282 | .startObject("filter")
283 | .field("user_lists", "Zonal_Marking/Guardian100FootballBlogs,Zonal_Marking/football-journalists-3")
284 | .endObject()
285 | .endObject()
286 | .startObject("index")
287 | .field("bulk_size", 1)
288 | .endObject()
289 | .endObject(), 1, true);
290 | }
291 | @Test
292 | public void testTracks() throws IOException, InterruptedException {
293 | launchTest(jsonBuilder()
294 | .startObject()
295 | .field("type", "twitter")
296 | .startObject("twitter")
297 | .startObject("filter")
298 | .field("tracks", track)
299 | .endObject()
300 | .endObject()
301 | .endObject(), randomIntBetween(1, 10), true);
302 |
303 | // We should find tweets matching the tracked term
304 | SearchResponse response = client().prepareSearch(getDbName())
305 | .setQuery(QueryBuilders.queryStringQuery(track))
306 | .get();
307 |
308 | logger.info(" --> Search response: {}", response.toString());
309 |
310 | assertThat(response.getHits().getTotalHits(), greaterThan(0L));
311 | }
312 |
313 | @Test
314 | public void testSample() throws IOException, InterruptedException {
315 | launchTest(jsonBuilder()
316 | .startObject()
317 | .field("type", "twitter")
318 | .startObject("twitter")
319 | .field("type", "sample")
320 | .endObject()
321 | .endObject(), randomIntBetween(10, 200), true);
322 | }
323 |
324 | @Test
325 | public void testRetryAfter() throws IOException, InterruptedException {
326 | launchTest(jsonBuilder()
327 | .startObject()
328 | .field("type", "twitter")
329 | .startObject("twitter")
330 | .field("type", "sample")
331 | .field("retry_after", "10s")
332 | .endObject()
333 | .endObject(), randomIntBetween(10, 200), true);
334 | }
335 |
336 | @Test
337 | public void testUserStream() throws IOException, InterruptedException, TwitterException {
338 | launchTest(jsonBuilder()
339 | .startObject()
340 | .field("type", "twitter")
341 | .startObject("twitter")
342 | .field("type", "user")
343 | .endObject()
344 | .endObject(), 0, false);
345 |
346 | // Wait for the river to start
347 | awaitBusy(new Predicate<Object>() {
348 | public boolean apply(Object obj) {
349 | try {
350 | GetResponse response = get("_river", getDbName(), "_status");
351 | return response.isExists();
352 | } catch (IndexMissingException e) {
353 | return false;
354 | }
355 | }
356 | }, 10, TimeUnit.SECONDS);
357 |
358 | // The river may look started, but it actually takes a few seconds to get the twitter
359 | // stream up and running. So we wait 30 seconds more: the always-false predicate below turns awaitBusy1Second into a plain sleep, taken in 1 second slices.
360 | awaitBusy1Second(new Predicate<Object>() {
361 | public boolean apply(Object obj) {
362 | return false;
363 | }
364 | }, 30, TimeUnit.SECONDS);
365 |
366 | // Generate a tweet on your timeline
367 | // We need to read settings from the elasticsearch.yml file
368 | Settings settings = internalCluster().getInstance(Settings.class);
369 | AccessToken accessToken = new AccessToken(
370 | settings.get("river.twitter.oauth.access_token"),
371 | settings.get("river.twitter.oauth.access_token_secret"));
372 |
373 |
374 | Twitter twitter = new TwitterFactory().getInstance();
375 | twitter.setOAuthConsumer(
376 | settings.get("river.twitter.oauth.consumer_key"),
377 | settings.get("river.twitter.oauth.consumer_secret"));
378 | twitter.setOAuthAccessToken(accessToken);
379 |
380 | Status status = twitter.updateStatus("testing twitter river. Please ignore. " +
381 | DateTime.now().toString());
382 | logger.info(" -> tweet [{}] sent: [{}]", status.getId(), status.getText());
383 |
384 | assertThat(awaitBusy1Second(new Predicate<Object>() {
385 | public boolean apply(Object obj) {
386 | try {
387 | refresh();
388 | SearchResponse response = client().prepareSearch(getDbName()).get();
389 | logger.info(" -> got {} docs in {} index", response.getHits().totalHits(), getDbName());
390 | return response.getHits().totalHits() >= 1;
391 | } catch (IndexMissingException e) {
392 | return false;
393 | }
394 | }
395 | }, 1, TimeUnit.MINUTES), is(true));
396 |
397 | logger.info(" -> Remove river");
398 | client().prepareDelete("_river", getDbName(), "_meta").get();
399 | }
400 |
401 | /**
402 | * Test for #51: https://github.com/elasticsearch/elasticsearch-river-twitter/issues/51
403 | */
404 | @Test
405 | public void testgeoAsArray() throws IOException, InterruptedException {
406 | launchTest(jsonBuilder()
407 | .startObject()
408 | .field("type", "twitter")
409 | .startObject("twitter")
410 | .field("type", "sample")
411 | .field("geo_as_array", true)
412 | .endObject()
413 | .endObject(), randomIntBetween(1, 10), false);
414 |
415 | // We wait for geolocated tweets (it could take a looooong time)
416 | if (!awaitBusy1Second(new Predicate<Object>() {
417 | public boolean apply(Object obj) {
418 | try {
419 | refresh();
420 | SearchResponse response = client().prepareSearch(getDbName())
421 | .setPostFilter(
422 | QueryBuilders.geoDistanceQuery("location")
423 | .point(0, 0)
424 | .distance(10000, DistanceUnit.KILOMETERS)
425 | )
426 | .addField("_source")
427 | .addField("location")
428 | .get();
429 |
430 | logger.info(" --> Search response: {}", response.toString());
431 |
432 | for (SearchHit hit : response.getHits().getHits()) {
433 | if (hit.field("location") != null) {
434 | // We have a location field, so it must be an array containing 2 values
435 | assertThat(hit.field("location").getValues().size(), is(2));
436 | return true;
437 | }
438 | }
439 | return false;
440 | } catch (IndexMissingException e) {
441 | return false;
442 | }
443 | }
444 | }, 5, TimeUnit.MINUTES)) {
445 | logger.warn(" -> We did not manage to get a geolocated tweet within 5 minutes.
:(");
446 | }
447 |
448 | logger.info(" -> Remove river");
449 | client().prepareDelete("_river", getDbName(), "_meta").get();
450 | }
451 | }
452 |
-------------------------------------------------------------------------------- /src/test/java/org/elasticsearch/river/twitter/test/helper/HttpClient.java: --------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 | package org.elasticsearch.river.twitter.test.helper;
20 |
21 | import org.elasticsearch.ElasticsearchException;
22 | import org.elasticsearch.common.base.Charsets;
23 | import org.elasticsearch.common.io.Streams;
24 |
25 | import java.io.IOException;
26 | import java.io.InputStream;
27 | import java.io.InputStreamReader;
28 | import java.io.OutputStreamWriter;
29 | import java.net.HttpURLConnection;
30 | import java.net.MalformedURLException;
31 | import java.net.URL;
32 | import java.nio.charset.StandardCharsets;
33 | import java.util.List;
34 | import java.util.Map;
35 |
36 | public class HttpClient {
37 |
38 | private final URL baseUrl;
39 |
40 | public HttpClient(String hostname, Integer port) {
41 | try {
42 | baseUrl = new URL("https", hostname, port, "/");
43 | } catch (MalformedURLException e) {
44 | throw new ElasticsearchException("", e);
45 | }
46 | }
47 |
48 | public HttpClientResponse request(String path) {
49 | return request("GET", path, null, null);
50 | }
51 |
52 | public HttpClientResponse request(String method, String path) {
53 | return request(method, path, null, null);
54 | }
55 |
56 | public HttpClientResponse request(String method, String path, String payload) {
57 | return request(method, path, null, payload);
58 | }
59 |
60 | public HttpClientResponse request(String method, String path, Map<String, String> headers, String payload) {
61 | URL url;
62 | try {
63 | url = new URL(baseUrl, path);
64 | } catch (MalformedURLException e) {
65 | throw new ElasticsearchException("Cannot parse " + path, e);
66 | }
67 |
68 | HttpURLConnection urlConnection;
69 | try {
70 | urlConnection = (HttpURLConnection) url.openConnection();
71 | urlConnection.setRequestMethod(method);
72 | if (headers != null) {
73 | for (Map.Entry<String, String> headerEntry : headers.entrySet()) {
74 | urlConnection.setRequestProperty(headerEntry.getKey(), headerEntry.getValue());
75 | }
76 | }
77 |
78 | if (payload != null) {
79 | urlConnection.setDoOutput(true);
80 | urlConnection.setRequestProperty("Content-Type", "application/json");
81 | urlConnection.setRequestProperty("Accept", "application/json");
82 | OutputStreamWriter osw = new OutputStreamWriter(urlConnection.getOutputStream(), StandardCharsets.UTF_8);
83 | osw.write(payload);
84 | osw.flush();
85 | osw.close();
86 | }
87 |
88 | urlConnection.connect();
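            // Note: connect() only establishes the connection. The status code and
            // body are read in the next try block, so that HTTP error responses can
            // still be surfaced through the connection's error stream.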
89 | } catch (IOException e) {
90 | throw new ElasticsearchException("", e);
91 | }
92 |
93 | int errorCode = -1;
94 | Map<String, List<String>> respHeaders = null;
95 | try {
96 | errorCode = urlConnection.getResponseCode();
97 | respHeaders = urlConnection.getHeaderFields();
98 | InputStream inputStream = urlConnection.getInputStream();
99 | String body = null;
100 | try {
101 | body = Streams.copyToString(new InputStreamReader(inputStream, Charsets.UTF_8));
102 | } catch (IOException e1) {
103 | throw new ElasticsearchException("problem reading response stream", e1);
104 | }
105 | return new HttpClientResponse(body, errorCode, respHeaders, null);
106 | } catch (IOException e) {
107 | InputStream errStream = urlConnection.getErrorStream();
108 | String body = null;
109 | if (errStream != null) {
110 | try {
111 | body = Streams.copyToString(new InputStreamReader(errStream, Charsets.UTF_8));
112 | } catch (IOException e1) {
113 | throw new ElasticsearchException("problem reading error stream", e1);
114 | }
115 | }
116 | return new HttpClientResponse(body, errorCode, respHeaders, e);
117 | } finally {
118 | urlConnection.disconnect();
119 | }
120 | }
121 | }
122 |
-------------------------------------------------------------------------------- /src/test/java/org/elasticsearch/river/twitter/test/helper/HttpClientResponse.java: --------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 | package org.elasticsearch.river.twitter.test.helper;
20 |
21 | import java.util.List;
22 | import java.util.Map;
23 |
24 | public class HttpClientResponse {
25 | private final String response;
26 | private final int errorCode;
27 | private Map<String, List<String>> headers;
28 | private final Throwable e;
29 |
30 | public HttpClientResponse(String response, int errorCode, Map<String, List<String>> headers, Throwable e) {
31 | this.response = response;
32 | this.errorCode = errorCode;
33 | this.headers = headers;
34 | this.e = e;
35 | }
36 |
37 | public String response() {
38 | return response;
39 | }
40 |
41 | public int errorCode() {
42 | return errorCode;
43 | }
44 |
45 | public Throwable cause() {
46 | return e;
47 | }
48 |
49 | public Map<String, List<String>> getHeaders() {
50 | return headers;
51 | }
52 |
53 | public String getHeader(String name) {
54 | if (headers == null) {
55 | return null;
56 | }
57 | List<String> vals = headers.get(name);
58 | if (vals == null || vals.size() == 0) {
59 | return null;
60 | }
61 | return vals.iterator().next();
62 | }
63 | }
64 |
--------------------------------------------------------------------------------
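Taken together, the pieces above show that a river is driven entirely through the `_river` index. As a hedged end-to-end sketch (the class and river name below are hypothetical; the calls mirror what `launchTest()` and `deleteRiverAndWait()` do in the integration tests), registering and removing a twitter river from Java looks like this:

```java
import java.io.IOException;

import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class TwitterRiverRegistrationSketch {

    // Registering a river is just indexing its _meta document into the _river index.
    public static void startRiver(Client client) throws IOException {
        XContentBuilder meta = jsonBuilder()
                .startObject()
                    .field("type", "twitter")
                    .startObject("twitter")
                        .field("type", "sample") // or "filter", "firehose", "user"
                    .endObject()
                .endObject();
        client.prepareIndex("_river", "my_twitter_river", "_meta").setSource(meta).get();
    }

    // Deleting the _meta document stops and removes the river.
    public static void stopRiver(Client client) {
        client.prepareDelete("_river", "my_twitter_river", "_meta").get();
    }
}
```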