├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── dev-tools
│   └── release.py
├── pom.xml
└── src
    ├── main
    │   ├── assemblies
    │   │   └── plugin.xml
    │   ├── java
    │   │   └── org
    │   │       └── elasticsearch
    │   │           ├── plugin
    │   │           │   └── river
    │   │           │       └── twitter
    │   │           │           └── TwitterRiverPlugin.java
    │   │           └── river
    │   │               └── twitter
    │   │                   ├── TwitterRiver.java
    │   │                   └── TwitterRiverModule.java
    │   └── resources
    │       └── es-plugin.properties
    └── test
        └── java
            └── org
                └── elasticsearch
                    └── river
                        └── twitter
                            └── test
                                ├── AbstractTwitterTest.java
                                ├── Twitter4JThreadFilter.java
                                ├── TwitterIntegrationTest.java
                                └── helper
                                    ├── HttpClient.java
                                    └── HttpClientResponse.java
/.gitignore:
--------------------------------------------------------------------------------
1 | /data
2 | /work
3 | /logs
4 | /.idea
5 | /target
6 | .DS_Store
7 | *.iml
8 | /.project
9 | /.settings
10 | /.classpath
11 | /plugin_tools
12 | /.local-execution-hints.log
13 | /.local-*-execution-hints.log
14 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | Contributing to elasticsearch
2 | =============================
3 |
4 | Elasticsearch is an open source project and we love to receive contributions from our community — you! There are many ways to contribute, from writing tutorials or blog posts, improving the documentation, submitting bug reports and feature requests or writing code which can be incorporated into Elasticsearch itself.
5 |
6 | Bug reports
7 | -----------
8 |
9 | If you think you have found a bug in Elasticsearch, first make sure that you are testing against the [latest version of Elasticsearch](http://www.elasticsearch.org/download/) - your issue may already have been fixed. If not, search our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub in case a similar issue has already been opened.
10 |
11 | It is very helpful if you can prepare a reproduction of the bug. In other words, provide a small test case which we can run to confirm your bug. It makes it easier to find the problem and to fix it. Test cases should be provided as `curl` commands which we can copy and paste into a terminal to run locally, for example:
12 |
13 | ```sh
14 | # delete the index
15 | curl -XDELETE localhost:9200/test
16 |
17 | # insert a document
18 | curl -XPUT localhost:9200/test/test/1 -d '{
19 | "title": "test document"
20 | }'
21 |
22 | # this should return XXXX but instead returns YYY
23 | curl ....
24 | ```
25 |
26 | Provide as much information as you can. You may think that the problem lies with your query, when actually it depends on how your data is indexed. The easier it is for us to recreate your problem, the faster it is likely to be fixed.
27 |
28 | Feature requests
29 | ----------------
30 |
31 | If you find yourself wishing for a feature that doesn't exist in Elasticsearch, you are probably not alone. There are bound to be others out there with similar needs. Many of the features that Elasticsearch has today have been added because our users saw the need.
32 | Open an issue on our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub which describes the feature you would like to see, why you need it, and how it should work.
33 |
34 | Contributing code and documentation changes
35 | -------------------------------------------
36 |
37 | If you have a bugfix or new feature that you would like to contribute to Elasticsearch, please find or open an issue about it first. Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change.
38 |
39 | We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code.
40 |
41 | The process for contributing to any of the [Elasticsearch repositories](https://github.com/elasticsearch/) is similar. Details for individual projects can be found below.
42 |
43 | ### Fork and clone the repository
44 |
45 | You will need to fork the main Elasticsearch code or documentation repository and clone it to your local machine. See
46 | [github help page](https://help.github.com/articles/fork-a-repo) for help.
47 |
48 | Further instructions for specific projects are given below.
49 |
50 | ### Submitting your changes
51 |
52 | Once your changes and tests are ready to submit for review:
53 |
54 | 1. Test your changes
55 | Run the test suite to make sure that nothing is broken.
56 |
57 | 2. Sign the Contributor License Agreement
58 | Please make sure you have signed our [Contributor License Agreement](http://www.elasticsearch.org/contributor-agreement/). We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once.
59 |
60 | 3. Rebase your changes
61 | Update your local repository with the most recent code from the main Elasticsearch repository, and rebase your branch on top of the latest master branch. We prefer your changes to be squashed into a single commit.
62 |
63 | 4. Submit a pull request
64 |     Push your local changes to your forked copy of the repository and [submit a pull request](https://help.github.com/articles/using-pull-requests). In the pull request, describe what your changes do and mention the number of the issue where discussion has taken place, e.g. "Closes #123".
65 |
66 | Then sit back and wait. There will probably be discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into Elasticsearch.
67 |
68 |
69 | Contributing to the Elasticsearch plugin
70 | ----------------------------------------
71 |
72 | **Repository:** [https://github.com/elasticsearch/elasticsearch-river-twitter](https://github.com/elasticsearch/elasticsearch-river-twitter)
73 |
74 | Make sure you have [Maven](http://maven.apache.org) installed, as Elasticsearch uses it as its build system. Integration with IntelliJ and Eclipse should work out of the box. Eclipse users can automatically configure their IDE by running `mvn eclipse:eclipse` and then importing the project into their workspace: `File > Import > Existing project into workspace`.
75 |
76 | Please follow these formatting guidelines:
77 |
78 | * Java indent is 4 spaces
79 | * Line width is 140 characters
80 | * The rest is left to Java coding standards
81 | * Disable “auto-format on save”: it generates unnecessary formatting changes, which makes reviews much harder. If your IDE supports formatting only modified chunks, that is fine to do.
82 |
83 | To create a distribution from the source, simply run:
84 |
85 | ```sh
86 | cd elasticsearch-river-twitter/
87 | mvn clean package -DskipTests
88 | ```
89 |
90 | You will find the newly built packages under: `./target/releases/`.
91 |
92 | Before submitting your changes, run the test suite to make sure that nothing is broken, with:
93 |
94 | ```sh
95 | mvn clean test
96 | ```
97 |
98 | Source: [Contributing to elasticsearch](http://www.elasticsearch.org/contributing-to-elasticsearch/)
99 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | **Important**: This project has been discontinued since Elasticsearch 2.0.
2 |
3 | ----
4 |
5 | Twitter River Plugin for Elasticsearch
6 | ==================================
7 |
8 | The Twitter river indexes the public [twitter stream](http://dev.twitter.com/pages/streaming_api), aka the hose,
9 | and makes it searchable.
10 |
11 | **Rivers are [deprecated](https://www.elastic.co/blog/deprecating_rivers) and will be removed in the future.**
12 | Have a look at the [Logstash twitter input](http://www.elastic.co/guide/en/logstash/current/plugins-inputs-twitter.html) instead.
13 |
14 | In order to install the plugin, run:
15 |
16 | ```sh
17 | bin/plugin install elasticsearch/elasticsearch-river-twitter/2.6.0
18 | ```
19 |
20 | After installing the plugin you need to restart elasticsearch.
21 |
22 | You need to install a version matching your Elasticsearch version:
23 |
24 | | Elasticsearch |Twitter River Plugin| Docs |
25 | |------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------|
26 | | master | Build from source | See below |
27 | | es-1.x | Build from source | [2.7.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/es-1.x/#version-270-snapshot-for-elasticsearch-1x)|
28 | | es-1.6 | 2.6.0 | [2.6.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.6.0/#version-260-for-elasticsearch-16) |
29 | | es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.5.0/#version-250-for-elasticsearch-15) |
30 | | es-1.4 | 2.4.2 | [2.4.2](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.4.2/#version-242-for-elasticsearch-14) |
31 | | es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.3.0/#version-230-for-elasticsearch-13) |
32 | | es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.2.0/#twitter-river-plugin-for-elasticsearch) |
33 | | es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.0.0/#twitter-river-plugin-for-elasticsearch) |
34 | | es-0.90 | 1.5.0 | [1.5.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v1.5.0/#twitter-river-plugin-for-elasticsearch) |
35 |
36 | To use a `SNAPSHOT` version, you need to build it yourself with Maven:
37 |
38 | ```bash
39 | mvn clean install
40 | plugin --install river-twitter \
41 | --url file:target/releases/elasticsearch-river-twitter-X.X.X-SNAPSHOT.zip
42 | ```
43 |
44 | Prerequisites
45 | -------------
46 |
47 | You need to get an OAuth token in order to use the Twitter river.
48 | Please follow the [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom); basically:
49 |
50 | * Log in to: https://dev.twitter.com/apps/
51 | * Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new
52 | You don't need a callback URL.
53 | * When done, click on `Create my access token`.
54 | * Open the `OAuth tool` tab and note the `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.
55 |
56 |
57 | Create river
58 | ------------
59 |
60 | Creating the twitter river can be done using:
61 |
62 | ```
63 | PUT _river/my_twitter_river/_meta
64 | {
65 | "type" : "twitter",
66 | "twitter" : {
67 | "oauth" : {
68 | "consumer_key" : "*** YOUR Consumer key HERE ***",
69 | "consumer_secret" : "*** YOUR Consumer secret HERE ***",
70 | "access_token" : "*** YOUR Access token HERE ***",
71 | "access_token_secret" : "*** YOUR Access token secret HERE ***"
72 | }
73 | },
74 | "index" : {
75 | "index" : "my_twitter_river",
76 | "type" : "status",
77 | "bulk_size" : 100,
78 | "flush_interval" : "5s",
79 | "retry_after" : "10s"
80 | }
81 | }
82 | ```
83 |
84 | The above lists all the options controlling the creation of a twitter river.
85 |
86 | If you don't define `index.index`, the river name (`my_twitter_river`) will be used as the default index name.
87 | If you don't define `index.type`, the default `status` type will be used.
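
These defaults can be sketched with a small Python helper (a hypothetical illustration of the documented fallback rules, not the plugin's actual code; the function name is made up):

```python
def resolve_index_settings(river_name, index_settings=None):
    """Apply the documented defaults: the index name falls back to the
    river name and the type falls back to 'status'."""
    index_settings = index_settings or {}
    return {
        "index": index_settings.get("index", river_name),
        "type": index_settings.get("type", "status"),
    }

# With no "index" section at all, both defaults apply:
resolve_index_settings("my_twitter_river")
```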
88 |
89 | Note that you can define any or all of your OAuth settings in the `elasticsearch.yml` file on each node by prefixing
90 | each setting with `river.twitter.`:
91 |
92 | ```
93 | river.twitter.oauth.consumer_key: "*** YOUR Consumer key HERE ***"
94 | river.twitter.oauth.consumer_secret: "*** YOUR Consumer secret HERE ***"
95 | river.twitter.oauth.access_token: "*** YOUR Access token HERE ***"
96 | river.twitter.oauth.access_token_secret: "*** YOUR Access token secret HERE ***"
97 | ```
98 |
99 | In that case, you can create the river using:
100 |
101 | ```
102 | PUT _river/my_twitter_river/_meta
103 | {
104 | "type" : "twitter"
105 | }
106 | ```
107 |
108 | You can also override any `elasticsearch.yml` setting. A good practice is to keep `consumer_key` and
109 | `consumer_secret` in `elasticsearch.yml` and provide the `access_token` and `access_token_secret` properties to the river.
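
The precedence (river-level values win over node-level `river.twitter.oauth.*` settings) can be modelled like this (a simplified sketch under that assumption, not the plugin's actual code):

```python
def resolve_oauth(node_settings, river_oauth):
    """Merge OAuth settings: a value in the river definition takes
    precedence over the node-level river.twitter.oauth.* setting."""
    keys = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")
    merged = {}
    for key in keys:
        node_value = node_settings.get("river.twitter.oauth." + key)
        merged[key] = river_oauth.get(key, node_value)
    return merged

node = {"river.twitter.oauth.consumer_key": "ck",
        "river.twitter.oauth.consumer_secret": "cs"}
river = {"access_token": "at", "access_token_secret": "ats"}
# consumer_* comes from the node config, tokens from the river definition
resolve_oauth(node, river)
```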
110 |
111 | By default, the twitter river reads a small random sample of all public statuses using the
112 | [sample API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample).
113 |
114 | But you can define the type of statuses you want to read:
115 |
116 | * [sample](https://dev.twitter.com/docs/api/1.1/get/statuses/sample): the default one
117 | * [filter](https://dev.twitter.com/docs/api/1.1/post/statuses/filter): track for text, users and locations.
118 | See [Filtered Stream](#filtered-stream)
119 | * [user](https://dev.twitter.com/docs/streaming-apis/streams/user): listen to tweets in the authenticated user's timeline.
120 | See [User Stream](#user-stream)
121 | * [firehose](https://dev.twitter.com/docs/api/1.1/get/statuses/firehose): all public statuses (restricted access)
122 |
123 | For example:
124 |
125 | ```
126 | PUT _river/my_twitter_river/_meta
127 | {
128 | "type" : "twitter",
129 | "twitter" : {
130 | "type" : "firehose"
131 | }
132 | }
133 | ```
134 |
135 | Note that if you define a filter (see [next section](#filtered-stream)), the type will be automatically set to `filter`.
136 |
137 | Tweets will be indexed once `bulk_size` of them have been accumulated (defaults to `100`)
138 | or every `flush_interval` period (defaults to `5s`).
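
This accumulate-or-flush behaviour can be sketched as follows (a simplified model of the documented semantics, not the plugin's actual code; the class name is made up):

```python
import time

class BulkBuffer:
    """Flush queued documents when bulk_size is reached, or when
    flush_interval has elapsed since the last flush (via tick())."""

    def __init__(self, bulk_size=100, flush_interval=5.0, flush_fn=print):
        self.bulk_size = bulk_size
        self.flush_interval = flush_interval
        self.flush_fn = flush_fn
        self.docs = []
        self.last_flush = time.monotonic()

    def add(self, doc):
        self.docs.append(doc)
        if len(self.docs) >= self.bulk_size:
            self.flush()

    def tick(self):
        # Called periodically by a scheduler thread.
        if self.docs and time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        self.flush_fn(self.docs)
        self.docs = []
        self.last_flush = time.monotonic()
```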
139 |
140 | Filtered Stream
141 | ===============
142 |
143 | A filtered stream is also supported (as per the Twitter stream API). The filter stream can be configured to
144 | support `tracks`, `follow`, `locations` and `language`. `user_lists` is a shortcut to follow all members of a public
145 | twitter list, identified by the owner's screen name and the list slug (the last part of the URI when you open a list in your browser). The
146 | configuration is the same as for the Twitter API (a single comma-separated string value, or JSON arrays).
147 | Here is an example:
148 |
149 | ```
150 | PUT _river/my_twitter_river/_meta
151 | {
152 | "type" : "twitter",
153 | "twitter" : {
154 | "filter" : {
155 | "tracks" : "test,something,please",
156 | "follow" : "111,222,333",
157 | "user_lists" : "ownerScreenName1/slug1,ownerScreenName2/slug2",
158 | "locations" : "-122.75,36.8,-121.75,37.8,-74,40,-73,41",
159 | "language" : "fr,en"
160 | }
161 | }
162 | }
163 | ```
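
The `user_lists` format above can be illustrated with a small parsing helper (hypothetical, not the plugin's actual code):

```python
def parse_user_lists(value):
    """Split a user_lists setting like 'owner1/slug1,owner2/slug2'
    into (owner_screen_name, slug) pairs."""
    pairs = []
    for item in value.split(","):
        owner, slug = item.strip().split("/", 1)
        pairs.append((owner, slug))
    return pairs

parse_user_lists("ownerScreenName1/slug1,ownerScreenName2/slug2")
# [('ownerScreenName1', 'slug1'), ('ownerScreenName2', 'slug2')]
```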
164 |
165 | Note that locations use the GeoJSON order (longitude, latitude).
166 |
167 | Note that if you want to use language filtering, you also need to define at least one of the `tracks`,
168 | `follow` or `locations` filters.
169 | Supported language identifiers are [BCP 47](http://tools.ietf.org/html/bcp47) codes. You can filter on
170 | any language listed in [Twitter Advanced Search](https://twitter.com/search-advanced).
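
To make the coordinate ordering concrete, here is how a comma-separated `locations` string maps to (longitude, latitude) pairs (a hypothetical parsing helper, not the plugin's actual code):

```python
def parse_locations(value):
    """Split a Twitter-style locations string into (lon, lat) pairs,
    longitude first, as in GeoJSON."""
    nums = [float(x) for x in value.split(",")]
    if len(nums) % 2:
        raise ValueError("locations must contain an even number of coordinates")
    return list(zip(nums[0::2], nums[1::2]))

parse_locations("-122.75,36.8,-121.75,37.8")
# first pair is (-122.75, 36.8): longitude, then latitude
```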
171 |
172 | Here is an array-based configuration example:
173 |
174 | ```
175 | PUT _river/my_twitter_river/_meta
176 | {
177 | "type" : "twitter",
178 | "twitter" : {
179 | "filter" : {
180 | "tracks" : ["test", "something"],
181 | "follow" : [111, 222, 333],
182 | "locations" : [ [-122.75,36.8], [-121.75,37.8], [-74,40], [-73,41]],
183 | "language" : [ "fr", "en" ]
184 | }
185 | }
186 | }
187 | ```
188 |
189 | User Stream
190 | ===========
191 |
192 | A user stream is also supported (as per the Twitter stream API). This stream returns tweets from the authenticated user's
193 | timeline. Here is a basic configuration example:
194 |
195 | ```
196 | PUT _river/my_twitter_river/_meta
197 | {
198 | "type" : "twitter",
199 | "twitter" : {
200 | "type" : "user"
201 | }
202 | }
203 | ```
204 |
205 | Indexing RAW Twitter stream
206 | ===========================
207 |
208 | By default, the elasticsearch twitter river converts tweets to an equivalent representation
209 | in elasticsearch. If you want to index the raw twitter JSON content without any transformation,
210 | you can set `raw` to `true`:
211 |
212 | ```
213 | PUT _river/my_twitter_river/_meta
214 | {
215 | "type" : "twitter",
216 | "twitter" : {
217 | "raw" : true
218 | }
219 | }
220 | ```
221 |
222 | Note that you should consider creating a mapping for your tweets first. See the Twitter documentation on the
223 | [raw Tweet format](https://dev.twitter.com/docs/platform-objects/tweets):
224 |
225 | ```
226 | PUT my_twitter_river/status/_mapping
227 | {
228 | "status" : {
229 | "properties" : {
230 | "text" : {"type" : "string", "analyzer" : "standard"}
231 | }
232 | }
233 | }
234 | ```
235 |
236 | Ignoring Retweets
237 | =================
238 |
239 | If you don't want to index retweets (aka RT), just set `ignore_retweet` to `true` (defaults to `false`):
240 |
241 | ```
242 | PUT _river/my_twitter_river/_meta
243 | {
244 | "type" : "twitter",
245 | "twitter" : {
246 | "ignore_retweet" : true
247 | }
248 | }
249 | ```
250 |
251 | Increase the schedule time to reconnect the river
252 | =================================================
253 |
254 | It can happen that the river fails, closing the current connection to the Streaming API. A new connection is then scheduled by the river, after 10s by default.
255 | If you want to control this delay, use the `retry_after` option, as in:
256 |
257 | ```
258 | PUT _river/my_twitter_river/_meta
259 | {
260 | "type" : "twitter",
261 | "index" : {
262 | "retry_after" : "30s"
263 | }
264 | }
265 | ```
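
Values like `10s` and `30s` follow the Elasticsearch time-value syntax; a simplified parser sketch (hypothetical, covering only a few units, not the plugin's actual code):

```python
import re

def parse_time_value(value, default_seconds=10.0):
    """Parse a time value such as '10s', '500ms' or '1m' into seconds.
    Falls back to the documented 10s default when unset."""
    if value is None:
        return default_seconds
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    if not match:
        raise ValueError("unsupported time value: %r" % value)
    number, unit = float(match.group(1)), match.group(2)
    factor = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}[unit]
    return number * factor

parse_time_value("30s")  # 30.0
parse_time_value(None)   # 10.0, the default reconnect delay
```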
266 |
267 | Geo location points as array
268 | ============================
269 |
270 | By default, the elasticsearch twitter river indexes the `location` field using the *lat lon as properties* format.
271 | You can set `geo_as_array` to `true` if you prefer having `location` indexed as a `[lon, lat]` array.
272 |
273 | ```
274 | PUT _river/my_twitter_river/_meta
275 | {
276 | "type" : "twitter",
277 | "twitter" : {
278 | "geo_as_array" : true
279 | }
280 | }
281 | ```
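
The two representations differ only in shape; a hypothetical converter illustrating both (not the plugin's actual code):

```python
def format_location(lat, lon, geo_as_array=False):
    """Return the location as a lat/lon properties object (the default),
    or as a GeoJSON-style [lon, lat] array when geo_as_array is set."""
    if geo_as_array:
        return [lon, lat]
    return {"lat": lat, "lon": lon}

format_location(17.431913, 78.418407)
# {'lat': 17.431913, 'lon': 78.418407}
format_location(17.431913, 78.418407, geo_as_array=True)
# [78.418407, 17.431913]
```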
282 |
283 | Remove the river
284 | ================
285 |
286 | If you need to stop the Twitter river, you have to remove it:
287 |
288 | ```
289 | DELETE _river/my_twitter_river/
290 | ```
291 |
292 | Using a proxy
293 | =============
294 |
295 | You can define a proxy if you are using one:
296 |
297 | ```
298 | PUT _river/my_twitter_river/_meta
299 | {
300 | "type" : "twitter",
301 | "twitter" : {
302 | "proxy" : {
303 | "host": "host",
304 | "port": "port",
305 | "user": "proxy_user_if_any",
306 | "password": "proxy_password_if_any"
307 | }
308 | }
309 | }
310 | ```
311 |
312 | You can also define proxy settings in the `elasticsearch.yml` file on each node by prefixing each setting with `river.twitter.`:
313 |
314 | ```yaml
315 | river.twitter.proxy.host: "host"
316 | river.twitter.proxy.port: "port"
317 | river.twitter.proxy.user: "proxy_user_if_any"
318 | river.twitter.proxy.password: "proxy_password_if_any"
319 | ```
320 |
321 | Sample document
322 | ===============
323 |
324 | Here is what a document could look like when using this river (without the `raw` option):
325 |
326 | ```js
327 | {
328 | "text":"This is a text",
329 | "created_at":"2015-01-26T15:22:35.000Z",
330 | "source":"Twitter for Windows Phone",
331 | "truncated":false,
332 | "language":"en",
333 | "mention":[
334 |
335 | ],
336 | "retweet_count":0,
337 | "hashtag":[
338 |
339 | ],
340 | "location":[
341 | 78.418407,
342 | 17.431913
343 | ],
344 | "place":{
345 | "id":"243cc16f6417a167",
346 | "name":"Hyderabad",
347 | "type":"city",
348 | "full_name":"Hyderabad, Andhra Pradesh",
349 | "street_address":null,
350 | "country":"India",
351 | "country_code":"IN",
352 | "url":"https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json"
353 | },
354 | "link":[
355 |
356 | ],
357 | "user":{
358 | "id":1111111111,
359 | "name":"User Name",
360 | "screen_name":"twitter_handle",
361 | "location":"A full text location description",
362 | "description":"A description",
363 | "profile_image_url":"http://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg",
364 | "profile_image_url_https":"https://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg"
365 | }
366 | }
367 | ```
368 |
369 | Tests
370 | =====
371 |
372 | Integration tests in this plugin require a working Twitter account and are therefore disabled by default.
373 | You need to create your credentials as explained in [Prerequisites](#prerequisites).
374 |
375 | To enable tests prepare a config file `elasticsearch.yml` with the following content:
376 |
377 | ```yaml
378 | river:
379 | twitter:
380 | oauth:
381 | consumer_key: "your_consumer_key"
382 | consumer_secret: "your_consumer_secret"
383 | access_token: "your_access_token"
384 | access_token_secret: "your_access_token_secret"
385 | ```
386 |
387 | Replace all occurrences of `your_consumer_key`, `your_consumer_secret`, `your_access_token` and
388 | `your_access_token_secret` with your settings.
389 |
390 | To run the tests:
391 |
392 | ```sh
393 | mvn -Dtests.twitter=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
394 | ```
395 |
396 | Note that if you want to test the User Stream, you need to grant write permissions to your Twitter
397 | application.
398 |
399 | License
400 | -------
401 |
402 | This software is licensed under the Apache 2 license, quoted below.
403 |
404 | Copyright 2009-2014 Elasticsearch
405 |
406 | Licensed under the Apache License, Version 2.0 (the "License"); you may not
407 | use this file except in compliance with the License. You may obtain a copy of
408 | the License at
409 |
410 | http://www.apache.org/licenses/LICENSE-2.0
411 |
412 | Unless required by applicable law or agreed to in writing, software
413 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
414 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
415 | License for the specific language governing permissions and limitations under
416 | the License.
417 |
--------------------------------------------------------------------------------
/dev-tools/release.py:
--------------------------------------------------------------------------------
1 | # Licensed to Elasticsearch under one or more contributor
2 | # license agreements. See the NOTICE file distributed with
3 | # this work for additional information regarding copyright
4 | # ownership. Elasticsearch licenses this file to you under
5 | # the Apache License, Version 2.0 (the "License"); you may
6 | # not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing,
12 | # software distributed under the License is distributed on
13 | # an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
14 | # either express or implied. See the License for the specific
15 | # language governing permissions and limitations under the License.
16 |
17 | import datetime
18 | import os
19 | import shutil
20 | import sys
21 | import time
22 | import urllib
23 | import urllib.request
24 | import zipfile
25 |
26 | from os.path import dirname, abspath
27 |
28 | """
29 | This tool builds a release from a given elasticsearch plugin branch.
30 |
31 | It is basically a wrapper on top of launch_release.py which:
32 |
33 | - tries to get a more recent version of launch_release.py in ...
34 | - download it if needed
35 | - launch it passing all arguments to it, like:
36 |
37 | $ python3 dev_tools/release.py --branch master --publish --remote origin
38 |
39 | Important options:
40 |
41 | # Dry run
42 | $ python3 dev_tools/release.py
43 |
44 | # Dry run without tests
45 | python3 dev_tools/release.py --skiptests
46 |
47 | # Release, publish artifacts and announce
48 | $ python3 dev_tools/release.py --publish
49 |
50 | See full documentation in launch_release.py
51 | """
52 | env = os.environ
53 |
54 | # Change this if the source repository for your scripts is at a different location
55 | SOURCE_REPO = 'elasticsearch/elasticsearch-plugins-script'
56 | # Download the script again if the local copy is more than 1 day old
57 | SCRIPT_OBSOLETE_DAYS = 1
58 | # Files from master.zip that we ignore
59 | IGNORED_FILES = ['.gitignore', 'README.md']
60 |
61 |
62 | ROOT_DIR = abspath(os.path.join(abspath(dirname(__file__)), '../'))
63 | TARGET_TOOLS_DIR = ROOT_DIR + '/plugin_tools'
64 | DEV_TOOLS_DIR = ROOT_DIR + '/dev-tools'
65 | BUILD_RELEASE_FILENAME = 'release.zip'
66 | BUILD_RELEASE_FILE = TARGET_TOOLS_DIR + '/' + BUILD_RELEASE_FILENAME
67 | SOURCE_URL = 'https://github.com/%s/archive/master.zip' % SOURCE_REPO
68 |
69 | # Download a recent version of the release plugin tool
70 | try:
71 | os.mkdir(TARGET_TOOLS_DIR)
72 | print('directory %s created' % TARGET_TOOLS_DIR)
73 | except FileExistsError:
74 | pass
75 |
76 |
77 | try:
78 |     # check the time of the latest update; if we updated recently,
79 |     # we are not going to check again
80 | download = True
81 |
82 | try:
83 | last_download_time = datetime.datetime.fromtimestamp(os.path.getmtime(BUILD_RELEASE_FILE))
84 | if (datetime.datetime.now()-last_download_time).days < SCRIPT_OBSOLETE_DAYS:
85 | download = False
86 | except FileNotFoundError:
87 | pass
88 |
89 | if download:
90 | urllib.request.urlretrieve(SOURCE_URL, BUILD_RELEASE_FILE)
91 | with zipfile.ZipFile(BUILD_RELEASE_FILE) as myzip:
92 | for member in myzip.infolist():
93 | filename = os.path.basename(member.filename)
94 | # skip directories
95 | if not filename:
96 | continue
97 | if filename in IGNORED_FILES:
98 | continue
99 |
100 | # copy file (taken from zipfile's extract)
101 | source = myzip.open(member.filename)
102 | target = open(os.path.join(TARGET_TOOLS_DIR, filename), "wb")
103 | with source, target:
104 | shutil.copyfileobj(source, target)
105 | # We keep the original date
106 | date_time = time.mktime(member.date_time + (0, 0, -1))
107 | os.utime(os.path.join(TARGET_TOOLS_DIR, filename), (date_time, date_time))
108 | print('plugin-tools updated from %s' % SOURCE_URL)
109 | except urllib.error.HTTPError:
110 | pass
111 |
112 |
113 | # Let's see if we need to update the release.py script itself
114 | source_time = os.path.getmtime(TARGET_TOOLS_DIR + '/release.py')
115 | repo_time = os.path.getmtime(DEV_TOOLS_DIR + '/release.py')
116 | if source_time > repo_time:
117 | input('release.py needs an update. Press a key to update it...')
118 | shutil.copyfile(TARGET_TOOLS_DIR + '/release.py', DEV_TOOLS_DIR + '/release.py')
119 |
120 | # We can launch the build process
121 | PYTHON = 'python'
122 | # make sure python3 is used if python3 is available;
123 | # some systems use python 2 as default. os.system returns the
124 | # command's exit status (it does not raise), so test it explicitly.
125 | if os.system('python3 --version > /dev/null 2>&1') == 0:
126 |     PYTHON = 'python3'
129 |
130 | release_args = ''
131 | for x in range(1, len(sys.argv)):
132 | release_args += ' ' + sys.argv[x]
133 |
134 | os.system('%s %s/build_release.py %s' % (PYTHON, TARGET_TOOLS_DIR, release_args))
135 |
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0"
3 |          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
4 |          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
5 |     <modelVersion>4.0.0</modelVersion>
6 |
7 |     <groupId>org.elasticsearch</groupId>
8 |     <artifactId>elasticsearch-river-twitter</artifactId>
9 |     <version>3.0.0-SNAPSHOT</version>
10 |     <packaging>jar</packaging>
11 |     <name>Elasticsearch Twitter River plugin</name>
12 |     <description>The Twitter river indexes the public twitter stream, aka the hose, and makes it searchable</description>
13 |     <url>https://github.com/elastic/elasticsearch-river-twitter/</url>
14 |     <inceptionYear>2009</inceptionYear>
15 |
16 |     <licenses>
17 |         <license>
18 |             <name>The Apache Software License, Version 2.0</name>
19 |             <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
20 |             <distribution>repo</distribution>
21 |         </license>
22 |     </licenses>
23 |
24 |     <scm>
25 |         <connection>scm:git:git@github.com:elastic/elasticsearch-river-twitter.git</connection>
26 |         <developerConnection>scm:git:git@github.com:elastic/elasticsearch-river-twitter.git</developerConnection>
27 |         <url>http://github.com/elastic/elasticsearch-river-twitter</url>
28 |     </scm>
29 |
30 |     <parent>
31 |         <groupId>org.elasticsearch</groupId>
32 |         <artifactId>elasticsearch-plugin</artifactId>
33 |         <version>2.0.0-SNAPSHOT</version>
34 |     </parent>
35 |
36 |     <properties>
37 |         <twitter4j.version>4.0.3</twitter4j.version>
38 |         <tests.output>warn</tests.output>
39 |         <tests.jvms>1</tests.jvms>
40 |     </properties>
41 |
42 |     <dependencies>
43 |         <dependency>
44 |             <groupId>org.twitter4j</groupId>
45 |             <artifactId>twitter4j-stream</artifactId>
46 |             <version>${twitter4j.version}</version>
47 |         </dependency>
48 |     </dependencies>
49 |
50 |     <build>
51 |         <plugins>
52 |             <plugin>
53 |                 <groupId>org.apache.maven.plugins</groupId>
54 |                 <artifactId>maven-assembly-plugin</artifactId>
55 |             </plugin>
56 |         </plugins>
57 |     </build>
58 |
59 |     <repositories>
60 |         <repository>
61 |             <id>oss-snapshots</id>
62 |             <name>Sonatype OSS Snapshots</name>
63 |             <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
64 |         </repository>
65 |     </repositories>
66 | </project>
--------------------------------------------------------------------------------
/src/main/assemblies/plugin.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <assembly>
3 |     <id>plugin</id>
4 |     <formats>
5 |         <format>zip</format>
6 |     </formats>
7 |     <includeBaseDirectory>false</includeBaseDirectory>
8 |     <dependencySets>
9 |         <dependencySet>
10 |             <outputDirectory>/</outputDirectory>
11 |             <useProjectArtifact>true</useProjectArtifact>
12 |             <useTransitiveFiltering>true</useTransitiveFiltering>
13 |             <excludes>
14 |                 <exclude>org.elasticsearch:elasticsearch</exclude>
15 |             </excludes>
16 |         </dependencySet>
17 |         <dependencySet>
18 |             <outputDirectory>/</outputDirectory>
19 |             <useProjectArtifact>true</useProjectArtifact>
20 |             <useTransitiveFiltering>true</useTransitiveFiltering>
21 |             <includes>
22 |                 <include>org.twitter4j:twitter4j-stream</include>
23 |             </includes>
24 |         </dependencySet>
25 |     </dependencySets>
26 | </assembly>
--------------------------------------------------------------------------------
/src/main/java/org/elasticsearch/plugin/river/twitter/TwitterRiverPlugin.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.plugin.river.twitter;
21 |
22 | import org.elasticsearch.common.inject.Inject;
23 | import org.elasticsearch.plugins.AbstractPlugin;
24 | import org.elasticsearch.river.RiversModule;
25 | import org.elasticsearch.river.twitter.TwitterRiverModule;
26 |
27 | /**
28 | *
29 | */
30 | public class TwitterRiverPlugin extends AbstractPlugin {
31 |
32 | @Inject
33 | public TwitterRiverPlugin() {
34 | }
35 |
36 | @Override
37 | public String name() {
38 | return "river-twitter";
39 | }
40 |
41 | @Override
42 | public String description() {
43 | return "River Twitter Plugin";
44 | }
45 |
46 | public void onModule(RiversModule module) {
47 | module.registerRiver("twitter", TwitterRiverModule.class);
48 | }
49 | }
50 |
--------------------------------------------------------------------------------
/src/main/java/org/elasticsearch/river/twitter/TwitterRiver.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter;
21 |
22 | import org.elasticsearch.ExceptionsHelper;
23 | import org.elasticsearch.action.bulk.BulkItemResponse;
24 | import org.elasticsearch.action.bulk.BulkProcessor;
25 | import org.elasticsearch.action.bulk.BulkRequest;
26 | import org.elasticsearch.action.bulk.BulkResponse;
27 | import org.elasticsearch.client.Client;
28 | import org.elasticsearch.client.Requests;
29 | import org.elasticsearch.cluster.block.ClusterBlockException;
30 | import org.elasticsearch.common.Strings;
31 | import org.elasticsearch.common.inject.Inject;
32 | import org.elasticsearch.common.settings.Settings;
33 | import org.elasticsearch.common.unit.TimeValue;
34 | import org.elasticsearch.common.xcontent.XContentBuilder;
35 | import org.elasticsearch.common.xcontent.XContentFactory;
36 | import org.elasticsearch.common.xcontent.support.XContentMapValues;
37 | import org.elasticsearch.indices.IndexAlreadyExistsException;
38 | import org.elasticsearch.river.AbstractRiverComponent;
39 | import org.elasticsearch.river.River;
40 | import org.elasticsearch.river.RiverName;
41 | import org.elasticsearch.river.RiverSettings;
42 | import org.elasticsearch.threadpool.ThreadPool;
43 | import twitter4j.*;
44 | import twitter4j.conf.Configuration;
45 | import twitter4j.conf.ConfigurationBuilder;
46 |
47 | import java.util.ArrayList;
48 | import java.util.List;
49 | import java.util.Map;
50 |
51 | /**
52 | *
53 | */
54 | public class TwitterRiver extends AbstractRiverComponent implements River {
55 |
56 | private final ThreadPool threadPool;
57 |
58 | private final Client client;
59 |
60 | private final String oauthConsumerKey;
61 | private final String oauthConsumerSecret;
62 | private final String oauthAccessToken;
63 | private final String oauthAccessTokenSecret;
64 |
65 | private final TimeValue retryAfter;
66 |
67 | private final String proxyHost;
68 | private final String proxyPort;
69 | private final String proxyUser;
70 | private final String proxyPassword;
71 |
72 | private final boolean raw;
73 | private final boolean ignoreRetweet;
74 | private final boolean geoAsArray;
75 |
76 | private final String indexName;
77 |
78 | private final String typeName;
79 |
80 | private final int bulkSize;
81 | private final int maxConcurrentBulk;
82 | private final TimeValue bulkFlushInterval;
83 |
84 | private final FilterQuery filterQuery;
85 |
86 | private final String streamType;
87 |
88 | private RiverStatus riverStatus;
89 |
90 | private volatile TwitterStream stream;
91 |
92 | private volatile BulkProcessor bulkProcessor;
93 |
94 | @SuppressWarnings({"unchecked"})
95 | @Inject
96 | public TwitterRiver(RiverName riverName, RiverSettings riverSettings, Client client, ThreadPool threadPool, Settings settings) {
97 | super(riverName, riverSettings);
98 | this.riverStatus = RiverStatus.UNKNOWN;
99 | this.client = client;
100 | this.threadPool = threadPool;
101 |
102 | String riverStreamType;
103 |
104 | if (riverSettings.settings().containsKey("twitter")) {
105 | Map twitterSettings = (Map) riverSettings.settings().get("twitter");
106 |
107 | raw = XContentMapValues.nodeBooleanValue(twitterSettings.get("raw"), false);
108 | ignoreRetweet = XContentMapValues.nodeBooleanValue(twitterSettings.get("ignore_retweet"), false);
109 | geoAsArray = XContentMapValues.nodeBooleanValue(twitterSettings.get("geo_as_array"), false);
110 |
111 | if (twitterSettings.containsKey("oauth")) {
112 | Map oauth = (Map) twitterSettings.get("oauth");
113 | if (oauth.containsKey("consumer_key")) {
114 | oauthConsumerKey = XContentMapValues.nodeStringValue(oauth.get("consumer_key"), null);
115 | } else {
116 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
117 | }
118 | if (oauth.containsKey("consumer_secret")) {
119 | oauthConsumerSecret = XContentMapValues.nodeStringValue(oauth.get("consumer_secret"), null);
120 | } else {
121 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
122 | }
123 | if (oauth.containsKey("access_token")) {
124 | oauthAccessToken = XContentMapValues.nodeStringValue(oauth.get("access_token"), null);
125 | } else {
126 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
127 | }
128 | if (oauth.containsKey("access_token_secret")) {
129 | oauthAccessTokenSecret = XContentMapValues.nodeStringValue(oauth.get("access_token_secret"), null);
130 | } else {
131 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
132 | }
133 | } else {
134 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
135 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
136 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
137 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
138 | }
139 |
140 | if (twitterSettings.containsKey("retry_after")) {
141 | retryAfter = XContentMapValues.nodeTimeValue(twitterSettings.get("retry_after"), TimeValue.timeValueSeconds(10));
142 | } else {
143 | retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10));
144 | }
145 |
146 | if (twitterSettings.containsKey("proxy")) {
147 | Map proxy = (Map) twitterSettings.get("proxy");
148 | proxyHost = XContentMapValues.nodeStringValue(proxy.get("host"), null);
149 | proxyPort = XContentMapValues.nodeStringValue(proxy.get("port"), null);
150 | proxyUser = XContentMapValues.nodeStringValue(proxy.get("user"), null);
151 | proxyPassword = XContentMapValues.nodeStringValue(proxy.get("password"), null);
152 | } else {
153 | // Let's see if we have that in node settings
154 | proxyHost = settings.get("river.twitter.proxy.host");
155 | proxyPort = settings.get("river.twitter.proxy.port");
156 | proxyUser = settings.get("river.twitter.proxy.user");
157 | proxyPassword = settings.get("river.twitter.proxy.password");
158 | }
159 |
160 | riverStreamType = XContentMapValues.nodeStringValue(twitterSettings.get("type"), "sample");
161 | Map filterSettings = (Map) twitterSettings.get("filter");
162 |
163 | if (riverStreamType.equals("filter") && filterSettings == null) {
164 | filterQuery = null;
165 | stream = null;
166 | streamType = null;
167 | indexName = null;
168 | typeName = "status";
169 | bulkSize = 100;
170 | this.maxConcurrentBulk = 1;
171 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
172 | logger.warn("no filter defined for type filter. Disabling river...");
173 | return;
174 | }
175 |
176 | if (filterSettings != null) {
177 | riverStreamType = "filter";
178 | filterQuery = new FilterQuery();
179 | filterQuery.count(XContentMapValues.nodeIntegerValue(filterSettings.get("count"), 0));
180 | Object tracks = filterSettings.get("tracks");
181 | boolean filterSet = false;
182 | if (tracks != null) {
183 | if (tracks instanceof List) {
184 | List lTracks = (List) tracks;
185 | filterQuery.track(lTracks.toArray(new String[lTracks.size()]));
186 | } else {
187 | filterQuery.track(Strings.commaDelimitedListToStringArray(tracks.toString()));
188 | }
189 | filterSet = true;
190 | }
191 | Object follow = filterSettings.get("follow");
192 | if (follow != null) {
193 | if (follow instanceof List) {
194 | List lFollow = (List) follow;
195 | long[] followIds = new long[lFollow.size()];
196 | for (int i = 0; i < lFollow.size(); i++) {
197 | Object o = lFollow.get(i);
198 | if (o instanceof Number) {
199 | followIds[i] = ((Number) o).longValue();
200 | } else {
201 | followIds[i] = Long.parseLong(o.toString());
202 | }
203 | }
204 | filterQuery.follow(followIds);
205 | } else {
206 | String[] ids = Strings.commaDelimitedListToStringArray(follow.toString());
207 | long[] followIds = new long[ids.length];
208 | for (int i = 0; i < ids.length; i++) {
209 | followIds[i] = Long.parseLong(ids[i]);
210 | }
211 | filterQuery.follow(followIds);
212 | }
213 | filterSet = true;
214 | }
215 | Object locations = filterSettings.get("locations");
216 | if (locations != null) {
217 | if (locations instanceof List) {
218 | List lLocations = (List) locations;
219 | double[][] dLocations = new double[lLocations.size()][];
220 | for (int i = 0; i < lLocations.size(); i++) {
221 | Object loc = lLocations.get(i);
222 | double lat;
223 | double lon;
224 | if (loc instanceof List) {
225 | List lLoc = (List) loc;
226 | if (lLoc.get(0) instanceof Number) {
227 | lon = ((Number) lLoc.get(0)).doubleValue();
228 | } else {
229 | lon = Double.parseDouble(lLoc.get(0).toString());
230 | }
231 | if (lLoc.get(1) instanceof Number) {
232 | lat = ((Number) lLoc.get(1)).doubleValue();
233 | } else {
234 | lat = Double.parseDouble(lLoc.get(1).toString());
235 | }
236 | } else {
237 | String[] sLoc = Strings.commaDelimitedListToStringArray(loc.toString());
238 | lon = Double.parseDouble(sLoc[0]);
239 | lat = Double.parseDouble(sLoc[1]);
240 | }
241 | dLocations[i] = new double[]{lon, lat};
242 | }
243 | filterQuery.locations(dLocations);
244 | } else {
245 | String[] sLocations = Strings.commaDelimitedListToStringArray(locations.toString());
246 | double[][] dLocations = new double[sLocations.length / 2][];
247 | int dCounter = 0;
248 | for (int i = 0; i < sLocations.length; i++) {
249 | double lon = Double.parseDouble(sLocations[i]);
250 | double lat = Double.parseDouble(sLocations[++i]);
251 | dLocations[dCounter++] = new double[]{lon, lat};
252 | }
253 | filterQuery.locations(dLocations);
254 | }
255 | filterSet = true;
256 | }
257 | Object userLists = filterSettings.get("user_lists");
258 | if (userLists != null) {
259 | if (userLists instanceof List) {
260 | List lUserlists = (List) userLists;
261 | String[] tUserlists = lUserlists.toArray(new String[lUserlists.size()]);
262 | filterQuery.follow(getUsersListMembers(tUserlists));
263 | } else {
264 | String[] tUserlists = Strings.commaDelimitedListToStringArray(userLists.toString());
265 | filterQuery.follow(getUsersListMembers(tUserlists));
266 | }
267 | filterSet = true;
268 | }
269 |
270 | // We should have something to filter
271 | if (!filterSet) {
272 | streamType = null;
273 | indexName = null;
274 | typeName = "status";
275 | bulkSize = 100;
276 | this.maxConcurrentBulk = 1;
277 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
278 | logger.warn("can not set language filter without tracks, follow, locations or user_lists. Disabling river.");
279 | return;
280 | }
281 |
282 | Object language = filterSettings.get("language");
283 | if (language != null) {
284 | if (language instanceof List) {
285 | List lLanguage = (List) language;
286 | filterQuery.language(lLanguage.toArray(new String[lLanguage.size()]));
287 | } else {
288 | filterQuery.language(Strings.commaDelimitedListToStringArray(language.toString()));
289 | }
290 | }
291 | } else {
292 | filterQuery = null;
293 | }
294 | } else {
295 | // No specific settings. We need to use some defaults
296 | riverStreamType = "sample";
297 | raw = false;
298 | ignoreRetweet = false;
299 | geoAsArray = false;
300 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
301 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
302 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
303 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
304 | retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10));
305 | filterQuery = null;
306 | proxyHost = null;
307 | proxyPort = null;
308 | proxyUser = null;
309 | proxyPassword = null;
310 | }
311 |
312 | if (oauthAccessToken == null || oauthConsumerKey == null || oauthConsumerSecret == null || oauthAccessTokenSecret == null) {
313 | stream = null;
314 | streamType = null;
315 | indexName = null;
316 | typeName = "status";
317 | bulkSize = 100;
318 | this.maxConcurrentBulk = 1;
319 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
320 | logger.warn("no oauth specified, disabling river...");
321 | return;
322 | }
323 |
324 | if (riverSettings.settings().containsKey("index")) {
325 | Map indexSettings = (Map) riverSettings.settings().get("index");
326 | indexName = XContentMapValues.nodeStringValue(indexSettings.get("index"), riverName.name());
327 | typeName = XContentMapValues.nodeStringValue(indexSettings.get("type"), "status");
328 | this.bulkSize = XContentMapValues.nodeIntegerValue(indexSettings.get("bulk_size"), 100);
329 | this.bulkFlushInterval = TimeValue.parseTimeValue(XContentMapValues.nodeStringValue(
330 | indexSettings.get("flush_interval"), "5s"), TimeValue.timeValueSeconds(5));
331 | this.maxConcurrentBulk = XContentMapValues.nodeIntegerValue(indexSettings.get("max_concurrent_bulk"), 1);
332 | } else {
333 | indexName = riverName.name();
334 | typeName = "status";
335 | bulkSize = 100;
336 | this.maxConcurrentBulk = 1;
337 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
338 | }
339 |
340 | logger.info("creating twitter stream river");
341 | if (raw && logger.isDebugEnabled()) {
342 | logger.debug("will index twitter raw content...");
343 | }
344 |
345 | streamType = riverStreamType;
346 | this.riverStatus = RiverStatus.INITIALIZED;
347 | }
348 |
349 | /**
350 |  * Get the user ids of the members of each given list so we can stream them.
351 |  * @param tUserlists user lists, given as owner/slug. Each should be a public list.
352 |  * @return the ids of all members of the given lists
353 |  */
354 | private long[] getUsersListMembers(String[] tUserlists) {
355 | logger.debug("Fetching user id of given lists");
356 | List<Long> listUserIdToFollow = new ArrayList<Long>();
357 | Configuration cb = buildTwitterConfiguration();
358 | Twitter twitterImpl = new TwitterFactory(cb).getInstance();
359 |
360 | //For each list given in parameter
361 | for (String listId : tUserlists) {
362 | logger.debug("Adding users of list {} ",listId);
363 | String[] splitListId = listId.split("/");
364 | try {
365 | long cursor = -1;
366 | PagableResponseList<User> itUserListMembers;
367 | do {
368 | itUserListMembers = twitterImpl.getUserListMembers(splitListId[0], splitListId[1], cursor);
369 | for (User member : itUserListMembers) {
370 | long userId = member.getId();
371 | listUserIdToFollow.add(userId);
372 | }
373 | } while ((cursor = itUserListMembers.getNextCursor()) != 0);
374 |
375 | } catch (TwitterException te) {
376 | logger.error("Failed to get list members for : {}", listId, te);
377 | }
378 | }
379 |
380 |
381 | // Just unboxing from Long to long
382 | long[] ret = new long[listUserIdToFollow.size()];
383 | int pos = 0;
384 | for (Long userId : listUserIdToFollow) {
385 | ret[pos] = userId;
386 | pos++;
387 | }
388 | return ret;
389 | }
390 |
391 | /**
392 |  * Build a twitter4j configuration object with credentials and proxy settings.
393 |  * @return the twitter4j Configuration
394 |  */
395 | private Configuration buildTwitterConfiguration() {
396 | logger.debug("creating twitter configuration");
397 | ConfigurationBuilder cb = new ConfigurationBuilder();
398 |
399 | cb.setOAuthConsumerKey(oauthConsumerKey)
400 | .setOAuthConsumerSecret(oauthConsumerSecret)
401 | .setOAuthAccessToken(oauthAccessToken)
402 | .setOAuthAccessTokenSecret(oauthAccessTokenSecret);
403 |
404 | if (proxyHost != null) cb.setHttpProxyHost(proxyHost);
405 | if (proxyPort != null) cb.setHttpProxyPort(Integer.parseInt(proxyPort));
406 | if (proxyUser != null) cb.setHttpProxyUser(proxyUser);
407 | if (proxyPassword != null) cb.setHttpProxyPassword(proxyPassword);
408 | if (raw) cb.setJSONStoreEnabled(true);
409 | logger.debug("twitter configuration created");
410 | return cb.build();
411 | }
412 |
413 | /**
414 | * Start twitter stream
415 | */
416 | private void startTwitterStream() {
417 | logger.info("starting {} twitter stream", streamType);
418 |
419 | if (stream == null) {
420 | logger.debug("creating twitter stream");
421 |
422 | stream = new TwitterStreamFactory(buildTwitterConfiguration()).getInstance();
423 | if (streamType.equals("user")) {
424 | stream.addListener(new UserStreamHandler());
425 | } else {
426 | stream.addListener(new StatusHandler());
427 | }
428 |
429 | logger.debug("twitter stream created");
430 | }
431 |
432 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
433 | if (streamType.equals("filter") || filterQuery != null) {
434 | stream.filter(filterQuery);
435 | } else if (streamType.equals("firehose")) {
436 | stream.firehose(0);
437 | } else if (streamType.equals("user")) {
438 | stream.user();
439 | } else {
440 | stream.sample();
441 | }
442 | }
443 | logger.debug("{} twitter stream started!", streamType);
444 | }
445 |
446 | @Override
447 | public void start() {
448 | this.riverStatus = RiverStatus.STARTING;
449 | // Let's start this in another thread so we won't stop the start process
450 | threadPool.generic().execute(new Runnable() {
451 | @Override
452 | public void run() {
453 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
454 | // We are first waiting for a yellow state at least
455 | logger.debug("waiting for yellow status");
456 | client.admin().cluster().prepareHealth("_river").setWaitForYellowStatus().get();
457 | logger.debug("yellow or green status received");
458 | }
459 |
460 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
461 | // We push ES mapping only if raw is false
462 | if (!raw) {
463 | try {
464 | logger.debug("Trying to create index [{}]", indexName);
465 | client.admin().indices().prepareCreate(indexName).execute().actionGet();
466 | logger.debug("index created [{}]", indexName);
467 | } catch (Exception e) {
468 | if (ExceptionsHelper.unwrapCause(e) instanceof IndexAlreadyExistsException) {
469 | // that's fine
470 | logger.debug("Index [{}] already exists, skipping...", indexName);
471 | } else if (ExceptionsHelper.unwrapCause(e) instanceof ClusterBlockException) {
472 | // ok, not recovered yet..., lets start indexing and hope we recover by the first bulk
473 | // TODO: a smarter logic can be to register for cluster event listener here, and only start sampling when the block is removed...
474 | logger.debug("Cluster is blocked for now. Index [{}] can not be created, skipping...", indexName);
475 | } else {
476 | logger.warn("failed to create index [{}], disabling river...", e, indexName);
477 | riverStatus = RiverStatus.STOPPED;
478 | return;
479 | }
480 | }
481 |
482 | if (client.admin().indices().prepareGetMappings(indexName).setTypes(typeName).get().getMappings().isEmpty()) {
483 | try {
484 | String mapping = XContentFactory.jsonBuilder().startObject().startObject(typeName).startObject("properties")
485 | .startObject("location").field("type", "geo_point").endObject()
486 | .startObject("language").field("type", "string").field("index", "not_analyzed").endObject()
487 | .startObject("user").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
488 | .startObject("mention").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
489 | .startObject("in_reply").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
490 | .startObject("retweet").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
491 | .endObject().endObject().endObject().string();
492 | logger.debug("Applying default mapping for [{}]/[{}]: {}", indexName, typeName, mapping);
493 | client.admin().indices().preparePutMapping(indexName).setType(typeName).setSource(mapping).execute().actionGet();
494 | } catch (Exception e) {
495 | logger.warn("failed to apply default mapping [{}]/[{}], disabling river...", e, indexName, typeName);
496 | return;
497 | }
498 | } else {
499 | logger.debug("Mapping already exists for [{}]/[{}], skipping...", indexName, typeName);
500 | }
501 | }
502 | }
503 |
504 | // Creating bulk processor
505 | logger.debug("creating bulk processor [{}]", indexName);
506 | bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
507 | @Override
508 | public void beforeBulk(long executionId, BulkRequest request) {
509 | logger.debug("Going to execute new bulk composed of {} actions", request.numberOfActions());
510 | }
511 |
512 | @Override
513 | public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
514 | logger.debug("Executed bulk composed of {} actions", request.numberOfActions());
515 | if (response.hasFailures()) {
516 | logger.warn("There were failures while executing bulk: {}", response.buildFailureMessage());
517 | if (logger.isDebugEnabled()) {
518 | for (BulkItemResponse item : response.getItems()) {
519 | if (item.isFailed()) {
520 | logger.debug("Error for {}/{}/{} for {} operation: {}", item.getIndex(),
521 | item.getType(), item.getId(), item.getOpType(), item.getFailureMessage());
522 | }
523 | }
524 | }
525 | }
526 | }
527 |
528 | @Override
529 | public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
530 | logger.warn("Error executing bulk", failure);
531 | }
532 | })
533 | .setBulkActions(bulkSize)
534 | .setConcurrentRequests(maxConcurrentBulk)
535 | .setFlushInterval(bulkFlushInterval)
536 | .build();
537 |
538 | logger.debug("Bulk processor created with bulkSize [{}], bulkFlushInterval [{}]", bulkSize, bulkFlushInterval);
539 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
540 | startTwitterStream();
541 | riverStatus = RiverStatus.RUNNING;
542 | }
543 | }
544 | });
545 | }
546 |
547 | private void reconnect() {
548 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
549 | logger.debug("can not reconnect twitter on a closed river");
550 | return;
551 | }
552 |
553 | riverStatus = RiverStatus.STARTING;
554 |
555 | if (stream != null) {
556 | try {
557 | logger.debug("cleanup stream");
558 | stream.cleanUp();
559 | } catch (Exception e) {
560 | logger.debug("failed to cleanup after failure", e);
561 | }
562 | try {
563 | logger.debug("shutdown stream");
564 | stream.shutdown();
565 | } catch (Exception e) {
566 | logger.debug("failed to shutdown after failure", e);
567 | }
568 | }
569 |
570 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
571 | logger.debug("can not reconnect twitter on a closed river");
572 | return;
573 | }
574 |
575 | try {
576 | startTwitterStream();
577 | riverStatus = RiverStatus.RUNNING;
578 | } catch (Exception e) {
579 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
580 | logger.debug("river is closing. we won't reconnect.");
581 | close();
582 | return;
583 | }
584 | // TODO, we can update the status of the river to RECONNECT
585 | logger.warn("failed to connect after failure, throttling", e);
586 | threadPool.schedule(retryAfter, ThreadPool.Names.GENERIC, new Runnable() {
587 | @Override
588 | public void run() {
589 | reconnect();
590 | }
591 | });
592 | }
593 | }
594 |
595 | @Override
596 | public void close() {
597 | riverStatus = RiverStatus.STOPPING;
598 |
599 | logger.info("closing twitter stream river");
600 |
601 | if (bulkProcessor != null) {
602 | bulkProcessor.close();
603 | }
604 |
605 | if (stream != null) {
606 | // No need to call stream.cleanUp():
607 | // - since it is done by the implementation of shutdown()
608 | // - it will lead to a thread leak (see TwitterStreamImpl.cleanUp() and TwitterStreamImpl.shutdown() )
609 | stream.shutdown();
610 | }
611 |
612 | riverStatus = RiverStatus.STOPPED;
613 | }
614 |
615 | private class StatusHandler extends StatusAdapter {
616 |
617 | @Override
618 | public void onStatus(Status status) {
619 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
620 | try {
621 | // #24: We want to ignore retweets (default to false) https://github.com/elasticsearch/elasticsearch-river-twitter/issues/24
622 | if (status.isRetweet() && ignoreRetweet) {
623 | if (logger.isTraceEnabled()) {
624 |                         logger.trace("ignoring status because it is a retweet {} : {}", status.getUser().getName(), status.getText());
625 | }
626 | } else {
627 | if (logger.isTraceEnabled()) {
628 | logger.trace("status {} : {}", status.getUser().getName(), status.getText());
629 | }
630 |
631 |                     // If we want to index tweets as-is, we don't need to convert them to a JSON doc
632 | if (raw) {
633 | String rawJSON = TwitterObjectFactory.getRawJSON(status);
634 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
635 | bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(rawJSON));
636 | }
637 | } else {
638 | XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
639 | builder.field("text", status.getText());
640 | builder.field("created_at", status.getCreatedAt());
641 | builder.field("source", status.getSource());
642 | builder.field("truncated", status.isTruncated());
643 | builder.field("language", status.getLang());
644 |
645 | if (status.getUserMentionEntities() != null) {
646 | builder.startArray("mention");
647 | for (UserMentionEntity user : status.getUserMentionEntities()) {
648 | builder.startObject();
649 | builder.field("id", user.getId());
650 | builder.field("name", user.getName());
651 | builder.field("screen_name", user.getScreenName());
652 | builder.field("start", user.getStart());
653 | builder.field("end", user.getEnd());
654 | builder.endObject();
655 | }
656 | builder.endArray();
657 | }
658 |
659 | if (status.getRetweetCount() != -1) {
660 | builder.field("retweet_count", status.getRetweetCount());
661 | }
662 |
663 | if (status.isRetweet() && status.getRetweetedStatus() != null) {
664 | builder.startObject("retweet");
665 | builder.field("id", status.getRetweetedStatus().getId());
666 | if (status.getRetweetedStatus().getUser() != null) {
667 | builder.field("user_id", status.getRetweetedStatus().getUser().getId());
668 | builder.field("user_screen_name", status.getRetweetedStatus().getUser().getScreenName());
669 | if (status.getRetweetedStatus().getRetweetCount() != -1) {
670 | builder.field("retweet_count", status.getRetweetedStatus().getRetweetCount());
671 | }
672 | }
673 | builder.endObject();
674 | }
675 |
676 | if (status.getInReplyToStatusId() != -1) {
677 | builder.startObject("in_reply");
678 | builder.field("status", status.getInReplyToStatusId());
679 | if (status.getInReplyToUserId() != -1) {
680 | builder.field("user_id", status.getInReplyToUserId());
681 | builder.field("user_screen_name", status.getInReplyToScreenName());
682 | }
683 | builder.endObject();
684 | }
685 |
686 | if (status.getHashtagEntities() != null) {
687 | builder.startArray("hashtag");
688 | for (HashtagEntity hashtag : status.getHashtagEntities()) {
689 | builder.startObject();
690 | builder.field("text", hashtag.getText());
691 | builder.field("start", hashtag.getStart());
692 | builder.field("end", hashtag.getEnd());
693 | builder.endObject();
694 | }
695 | builder.endArray();
696 | }
697 | if (status.getContributors() != null && status.getContributors().length > 0) {
698 | builder.array("contributor", status.getContributors());
699 | }
700 | if (status.getGeoLocation() != null) {
701 | if (geoAsArray) {
702 | builder.startArray("location");
703 | builder.value(status.getGeoLocation().getLongitude());
704 | builder.value(status.getGeoLocation().getLatitude());
705 | builder.endArray();
706 | } else {
707 | builder.startObject("location");
708 | builder.field("lat", status.getGeoLocation().getLatitude());
709 | builder.field("lon", status.getGeoLocation().getLongitude());
710 | builder.endObject();
711 | }
712 | }
713 | if (status.getPlace() != null) {
714 | builder.startObject("place");
715 | builder.field("id", status.getPlace().getId());
716 | builder.field("name", status.getPlace().getName());
717 | builder.field("type", status.getPlace().getPlaceType());
718 | builder.field("full_name", status.getPlace().getFullName());
719 | builder.field("street_address", status.getPlace().getStreetAddress());
720 | builder.field("country", status.getPlace().getCountry());
721 | builder.field("country_code", status.getPlace().getCountryCode());
722 | builder.field("url", status.getPlace().getURL());
723 | builder.endObject();
724 | }
725 | if (status.getURLEntities() != null) {
726 | builder.startArray("link");
727 | for (URLEntity url : status.getURLEntities()) {
728 | if (url != null) {
729 | builder.startObject();
730 | if (url.getURL() != null) {
731 | builder.field("url", url.getURL());
732 | }
733 | if (url.getDisplayURL() != null) {
734 | builder.field("display_url", url.getDisplayURL());
735 | }
736 | if (url.getExpandedURL() != null) {
737 | builder.field("expand_url", url.getExpandedURL());
738 | }
739 | builder.field("start", url.getStart());
740 | builder.field("end", url.getEnd());
741 | builder.endObject();
742 | }
743 | }
744 | builder.endArray();
745 | }
746 |
747 | builder.startObject("user");
748 | builder.field("id", status.getUser().getId());
749 | builder.field("name", status.getUser().getName());
750 | builder.field("screen_name", status.getUser().getScreenName());
751 | builder.field("location", status.getUser().getLocation());
752 | builder.field("description", status.getUser().getDescription());
753 | builder.field("profile_image_url", status.getUser().getProfileImageURL());
754 | builder.field("profile_image_url_https", status.getUser().getProfileImageURLHttps());
755 |
756 | builder.endObject();
757 |
758 | builder.endObject();
759 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
760 | bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(builder));
761 | }
762 | }
763 | }
764 |
765 | } catch (Exception e) {
766 | logger.warn("failed to construct index request", e);
767 | }
768 | } else {
769 | logger.debug("river is closing. ignoring tweet [{}]", status.getId());
770 | }
771 | }
772 |
773 | @Override
774 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
775 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
776 | if (statusDeletionNotice.getStatusId() != -1) {
777 | bulkProcessor.add(Requests.deleteRequest(indexName).type(typeName).id(Long.toString(statusDeletionNotice.getStatusId())));
778 | }
779 | } else {
780 | logger.debug("river is closing. ignoring deletion of tweet [{}]", statusDeletionNotice.getStatusId());
781 | }
782 | }
783 |
784 | @Override
785 | public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
786 | logger.info("received track limitation notice, number_of_limited_statuses {}", numberOfLimitedStatuses);
787 | }
788 |
789 | @Override
790 | public void onException(Exception ex) {
791 | logger.warn("stream failure, restarting stream...", ex);
792 | threadPool.generic().execute(new Runnable() {
793 | @Override
794 | public void run() {
795 | reconnect();
796 | }
797 | });
798 | }
799 | }
800 |
801 | private class UserStreamHandler extends UserStreamAdapter {
802 |
803 | private final StatusHandler statusHandler = new StatusHandler();
804 |
805 | @Override
806 | public void onException(Exception ex) {
807 | statusHandler.onException(ex);
808 | }
809 |
810 | @Override
811 | public void onStatus(Status status) {
812 | statusHandler.onStatus(status);
813 | }
814 |
815 | @Override
816 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
817 | statusHandler.onDeletionNotice(statusDeletionNotice);
818 | }
819 | }
820 |
821 | public enum RiverStatus {
822 | UNKNOWN,
823 | INITIALIZED,
824 | STARTING,
825 | RUNNING,
826 | STOPPING,
827 | STOPPED;
828 | }
829 | }
830 |
--------------------------------------------------------------------------------
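The `reconnect()` method in `TwitterRiver` retries a failed stream start by rescheduling itself on the thread pool after a delay. A minimal standalone sketch of that same retry-with-delay shape, using a plain `ScheduledExecutorService` instead of the Elasticsearch `ThreadPool` (class, method, and timing values here are illustrative, not part of the plugin):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectDemo {
    static final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    static final AtomicInteger attempts = new AtomicInteger();
    static final CountDownLatch done = new CountDownLatch(1);

    // Stand-in for startTwitterStream(): fails twice, succeeds on the third attempt.
    static void startStream() {
        if (attempts.incrementAndGet() < 3) {
            throw new RuntimeException("connect failed");
        }
    }

    static void reconnect() {
        try {
            startStream();
            done.countDown();
        } catch (Exception e) {
            // Same shape as the river: on failure, schedule another attempt after a delay.
            pool.schedule(ReconnectDemo::reconnect, 10, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        reconnect();
        done.await();
        System.out.println("connected after " + attempts.get() + " attempts");
        pool.shutdown();
    }
}
```

The real river additionally checks `riverStatus` before each retry so a closing river stops rescheduling itself.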
/src/main/java/org/elasticsearch/river/twitter/TwitterRiverModule.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter;
21 |
22 | import org.elasticsearch.common.inject.AbstractModule;
23 | import org.elasticsearch.river.River;
24 |
25 | /**
26 | *
27 | */
28 | public class TwitterRiverModule extends AbstractModule {
29 |
30 | @Override
31 | protected void configure() {
32 | bind(River.class).to(TwitterRiver.class).asEagerSingleton();
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/src/main/resources/es-plugin.properties:
--------------------------------------------------------------------------------
1 | plugin=org.elasticsearch.plugin.river.twitter.TwitterRiverPlugin
2 | version=${project.version}
3 |
--------------------------------------------------------------------------------
/src/test/java/org/elasticsearch/river/twitter/test/AbstractTwitterTest.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.annotations.TestGroup;
23 | import org.elasticsearch.common.base.Predicate;
24 | import org.elasticsearch.test.ElasticsearchIntegrationTest;
25 | import org.elasticsearch.test.ElasticsearchIntegrationTest.ThirdParty;
26 |
27 | import java.lang.annotation.Documented;
28 | import java.lang.annotation.Inherited;
29 | import java.lang.annotation.Retention;
30 | import java.lang.annotation.RetentionPolicy;
31 | import java.util.concurrent.TimeUnit;
32 |
33 | /**
34 |  * Base class for tests that require an internet connection and twitter credentials to run.
35 | * Twitter tests are disabled by default.
36 | *
37 | * To enable test add -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
38 | *
39 | * The elasticsearch.yml file should contain the following keys
40 | *
49 | *
50 | * You need to get an OAuth token in order to use Twitter river.
51 | * Please follow [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom), basically:
52 |  *
53 |  * <ul>
54 |  * <li>Login to: https://dev.twitter.com/apps/</li>
55 |  * <li>Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new
56 |  * You don't need a callback URL.</li>
57 |  * <li>When done, click on `Create my access token`.</li>
58 |  * <li>Open `OAuth tool` tab and note `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.</li>
59 |  * </ul>
60 |  */
61 | @ThirdParty
62 | public abstract class AbstractTwitterTest extends ElasticsearchIntegrationTest {
63 |
64 | /**
65 |      * Repeat a task until it returns true or until a given wait time has elapsed.
66 |      * We use a 1 second delay between two runs.
67 |      * @param breakPredicate test you want to run
68 |      * @param maxWaitTime maximum time you want to wait
69 |      * @param unit time unit used for maxWaitTime
70 |      */
71 |     public static boolean awaitBusy1Second(Predicate<?> breakPredicate, long maxWaitTime, TimeUnit unit) throws InterruptedException {
72 | long maxTimeInMillis = TimeUnit.MILLISECONDS.convert(maxWaitTime, unit);
73 | long sleepTimeInMillis = 1000;
74 | long iterations = maxTimeInMillis / sleepTimeInMillis;
75 | for (int i = 0; i < iterations; i++) {
76 | if (breakPredicate.apply(null)) {
77 | return true;
78 | }
79 | Thread.sleep(sleepTimeInMillis);
80 | }
81 | return breakPredicate.apply(null);
82 | }
83 | }
84 |
--------------------------------------------------------------------------------
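The `awaitBusy1Second` helper above is a simple poll-until-true loop. A standalone sketch of the same loop shape, using `java.util.function.Predicate` in place of the shaded Guava `Predicate` and a configurable sleep so it finishes quickly (class and method names here are illustrative, not part of the plugin):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class AwaitBusyDemo {
    // Same loop shape as awaitBusy1Second: poll until the predicate
    // passes or the total wait time is exhausted.
    static boolean awaitBusy(Predicate<Void> breakPredicate, long maxWaitTime,
                             TimeUnit unit, long sleepMillis) throws InterruptedException {
        long maxTimeInMillis = TimeUnit.MILLISECONDS.convert(maxWaitTime, unit);
        long iterations = maxTimeInMillis / sleepMillis;
        for (long i = 0; i < iterations; i++) {
            if (breakPredicate.test(null)) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        // One final check after the last sleep, as in the original helper.
        return breakPredicate.test(null);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 50ms.
        boolean ok = awaitBusy(v -> System.currentTimeMillis() - start > 50,
                               1, TimeUnit.SECONDS, 10);
        System.out.println(ok);
    }
}
```

The trailing check after the loop matters: without it, a predicate that becomes true during the final sleep interval would be reported as a timeout.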
/src/test/java/org/elasticsearch/river/twitter/test/Twitter4JThreadFilter.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.ThreadFilter;
23 |
24 | /**
25 | * We know that Twitter4J can take a while to close
26 | * This filter will ignore it as a ThreadLeak
27 | */
28 | public class Twitter4JThreadFilter implements ThreadFilter {
29 |
30 | @Override
31 | public boolean reject(Thread t) {
32 | String threadName = t.getName();
33 |
34 | if (threadName.contains("Twitter4J Async Dispatcher")) {
35 | return true;
36 | }
37 |
38 | if (threadName.contains("Twitter Stream consumer")) {
39 | return true;
40 | }
41 |
42 | if (threadName.contains("riverClusterService#updateTask")) {
43 | return true;
44 | }
45 |
46 | return false;
47 | }
48 | }
49 |
--------------------------------------------------------------------------------
/src/test/java/org/elasticsearch/river/twitter/test/TwitterIntegrationTest.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.annotations.ThreadLeakFilters;
23 | import org.elasticsearch.action.count.CountResponse;
24 | import org.elasticsearch.action.get.GetResponse;
25 | import org.elasticsearch.action.search.SearchPhaseExecutionException;
26 | import org.elasticsearch.action.search.SearchResponse;
27 | import org.elasticsearch.common.Strings;
28 | import org.elasticsearch.common.base.Predicate;
29 | import org.elasticsearch.common.joda.time.DateTime;
30 | import org.elasticsearch.common.settings.Settings;
31 | import org.elasticsearch.common.unit.DistanceUnit;
32 | import org.elasticsearch.common.xcontent.XContentBuilder;
33 | import org.elasticsearch.env.Environment;
34 | import org.elasticsearch.index.query.QueryBuilders;
35 | import org.elasticsearch.indices.IndexAlreadyExistsException;
36 | import org.elasticsearch.indices.IndexMissingException;
37 | import org.elasticsearch.plugins.PluginsService;
38 | import org.elasticsearch.river.twitter.test.helper.HttpClient;
39 | import org.elasticsearch.river.twitter.test.helper.HttpClientResponse;
40 | import org.elasticsearch.search.SearchHit;
41 | import org.elasticsearch.test.ElasticsearchIntegrationTest;
42 | import org.junit.*;
43 | import twitter4j.Status;
44 | import twitter4j.Twitter;
45 | import twitter4j.TwitterException;
46 | import twitter4j.TwitterFactory;
47 | import twitter4j.auth.AccessToken;
48 |
49 | import java.io.IOException;
50 | import java.util.concurrent.TimeUnit;
51 |
52 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_REPLICAS;
53 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS;
54 | import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
55 | import static org.hamcrest.CoreMatchers.*;
56 | import static org.hamcrest.Matchers.equalTo;
57 | import static org.hamcrest.Matchers.greaterThan;
58 |
59 | /**
60 | * Integration tests for Twitter river
61 |  * You must have internet access.
62 | *
63 | * Launch it using:
64 | * mvn test -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
65 | *
66 | * where your /path/to/elasticsearch.yml contains:
67 |
68 | river:
69 | twitter:
70 | oauth:
71 | consumer_key: ""
72 | consumer_secret: ""
73 | access_token: ""
74 | access_token_secret: ""
75 |
76 | */
77 | @ElasticsearchIntegrationTest.ClusterScope(
78 | scope = ElasticsearchIntegrationTest.Scope.SUITE,
79 | transportClientRatio = 0.0)
80 | @ThreadLeakFilters(defaultFilters = true, filters = {Twitter4JThreadFilter.class})
81 | public class TwitterIntegrationTest extends AbstractTwitterTest {
82 |
83 | private final String track = "obama";
84 |
85 | @Override
86 | protected Settings nodeSettings(int nodeOrdinal) {
87 | Settings.Builder settings = Settings.builder()
88 | .put(super.nodeSettings(nodeOrdinal))
89 | .put("path.home", createTempDir())
90 | .put("plugins." + PluginsService.LOAD_PLUGIN_FROM_CLASSPATH, true);
91 |
92 | Environment environment = new Environment(settings.build());
93 |
94 | // if explicit, just load it and don't load from env
95 | if (Strings.hasText(System.getProperty("tests.config"))) {
96 | settings.loadFromUrl(environment.resolveConfig(System.getProperty("tests.config")));
97 | }
98 |
99 | return settings.build();
100 | }
101 |
102 | @Before
103 | public void createEmptyRiverIndex() {
104 | // We want to force _river index to use 1 shard 1 replica
105 | client().admin().indices().prepareCreate("_river").setSettings(Settings.builder()
106 | .put(SETTING_NUMBER_OF_SHARDS, 1)
107 | .put(SETTING_NUMBER_OF_REPLICAS, 0)).get();
108 | }
109 |
110 | @After
111 | public void deleteRiverAndWait() throws InterruptedException {
112 | logger.info(" --> delete all");
113 | client().admin().indices().prepareDelete("_all").get();
114 |
115 | assertThat(awaitBusy(new Predicate