├── .gitignore
├── CONTRIBUTING.md
├── LICENSE.txt
├── README.md
├── dev-tools
│   └── release.py
├── pom.xml
└── src
    ├── main
    │   ├── assemblies
    │   │   └── plugin.xml
    │   ├── java
    │   │   └── org
    │   │       └── elasticsearch
    │   │           ├── plugin
    │   │           │   └── river
    │   │           │       └── twitter
    │   │           │           └── TwitterRiverPlugin.java
    │   │           └── river
    │   │               └── twitter
    │   │                   ├── TwitterRiver.java
    │   │                   └── TwitterRiverModule.java
    │   └── resources
    │       └── es-plugin.properties
    └── test
        └── java
            └── org
                └── elasticsearch
                    └── river
                        └── twitter
                            └── test
                                ├── AbstractTwitterTest.java
                                ├── Twitter4JThreadFilter.java
                                ├── TwitterIntegrationTest.java
                                └── helper
                                    ├── HttpClient.java
                                    └── HttpClientResponse.java
/.gitignore:
--------------------------------------------------------------------------------
1 | /data
2 | /work
3 | /logs
4 | /.idea
5 | /target
6 | .DS_Store
7 | *.iml
8 | /.project
9 | /.settings
10 | /.classpath
11 | /plugin_tools
12 | /.local-execution-hints.log
13 | /.local-*-execution-hints.log
14 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | Contributing to elasticsearch
2 | =============================
3 |
4 | Elasticsearch is an open source project and we love to receive contributions from our community — you! There are many ways to contribute, from writing tutorials or blog posts, improving the documentation, submitting bug reports and feature requests or writing code which can be incorporated into Elasticsearch itself.
5 |
6 | Bug reports
7 | -----------
8 |
9 | If you think you have found a bug in Elasticsearch, first make sure that you are testing against the [latest version of Elasticsearch](http://www.elasticsearch.org/download/) - your issue may already have been fixed. If not, search our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub in case a similar issue has already been opened.
10 |
11 | It is very helpful if you can prepare a reproduction of the bug. In other words, provide a small test case which we can run to confirm your bug. It makes it easier to find the problem and to fix it. Test cases should be provided as `curl` commands which we can copy and paste into a terminal to run locally, for example:
12 |
13 | ```sh
14 | # delete the index
15 | curl -XDELETE localhost:9200/test
16 |
17 | # insert a document
18 | curl -XPUT localhost:9200/test/test/1 -d '{
19 | "title": "test document"
20 | }'
21 |
22 | # this should return XXXX but instead returns YYY
23 | curl ....
24 | ```
25 |
26 | Provide as much information as you can. You may think that the problem lies with your query, when actually it depends on how your data is indexed. The easier it is for us to recreate your problem, the faster it is likely to be fixed.
27 |
28 | Feature requests
29 | ----------------
30 |
31 | If you find yourself wishing for a feature that doesn't exist in Elasticsearch, you are probably not alone. There are bound to be others out there with similar needs. Many of the features that Elasticsearch has today have been added because our users saw the need.
32 | Open an issue on our [issues list](https://github.com/elasticsearch/elasticsearch/issues) on GitHub which describes the feature you would like to see, why you need it, and how it should work.
33 |
34 | Contributing code and documentation changes
35 | -------------------------------------------
36 |
37 | If you have a bugfix or new feature that you would like to contribute to Elasticsearch, please find or open an issue about it first. Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change.
38 |
39 | We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code.
40 |
41 | The process for contributing to any of the [Elasticsearch repositories](https://github.com/elasticsearch/) is similar. Details for individual projects can be found below.
42 |
43 | ### Fork and clone the repository
44 |
45 | You will need to fork the main Elasticsearch code or documentation repository and clone it to your local machine. See
46 | [github help page](https://help.github.com/articles/fork-a-repo) for help.
47 |
48 | Further instructions for specific projects are given below.
49 |
50 | ### Submitting your changes
51 |
52 | Once your changes and tests are ready to submit for review:
53 |
54 | 1. Test your changes
55 | Run the test suite to make sure that nothing is broken.
56 |
57 | 2. Sign the Contributor License Agreement
58 | Please make sure you have signed our [Contributor License Agreement](http://www.elasticsearch.org/contributor-agreement/). We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once.
59 |
60 | 3. Rebase your changes
61 | Update your local repository with the most recent code from the main Elasticsearch repository, and rebase your branch on top of the latest master branch. We prefer your changes to be squashed into a single commit.
62 |
63 | 4. Submit a pull request
64 |     Push your local changes to your forked copy of the repository and [submit a pull request](https://help.github.com/articles/using-pull-requests). In the pull request, describe what your changes do and mention the number of the issue where discussion has taken place, e.g. "Closes #123".
65 |
66 | Then sit back and wait. There will probably be discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into Elasticsearch.
67 |
68 |
69 | Contributing to the Elasticsearch plugin
70 | ----------------------------------------
71 |
72 | **Repository:** [https://github.com/elasticsearch/elasticsearch-river-twitter](https://github.com/elasticsearch/elasticsearch-river-twitter)
73 |
74 | Make sure you have [Maven](http://maven.apache.org) installed, as Elasticsearch uses it as its build system. Integration with IntelliJ and Eclipse should work out of the box. Eclipse users can automatically configure their IDE by running `mvn eclipse:eclipse` and then importing the project into their workspace: `File > Import > Existing project into workspace`.
75 |
76 | Please follow these formatting guidelines:
77 |
78 | * Java indent is 4 spaces
79 | * Line width is 140 characters
80 | * The rest is left to Java coding standards
81 | * Disable “auto-format on save”: it generates unnecessary formatting changes, which makes reviews much harder. If your IDE supports formatting only modified chunks, that is fine to do.
82 |
83 | To create a distribution from the source, simply run:
84 |
85 | ```sh
86 | cd elasticsearch-river-twitter/
87 | mvn clean package -DskipTests
88 | ```
89 |
90 | You will find the newly built packages under: `./target/releases/`.
91 |
92 | Before submitting your changes, run the test suite to make sure that nothing is broken, with:
93 |
94 | ```sh
95 | mvn clean test
96 | ```
97 |
98 | Source: [Contributing to elasticsearch](http://www.elasticsearch.org/contributing-to-elasticsearch/)
99 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | **Important**: This project has been discontinued since Elasticsearch 2.0.
2 |
3 | ----
4 |
5 | Twitter River Plugin for Elasticsearch
6 | ==================================
7 |
8 | The Twitter river indexes the public [twitter stream](http://dev.twitter.com/pages/streaming_api), aka the hose,
9 | and makes it searchable.
10 |
11 | **Rivers are [deprecated](https://www.elastic.co/blog/deprecating_rivers) and will be removed in the future.**
12 | Have a look at the [Logstash twitter input](http://www.elastic.co/guide/en/logstash/current/plugins-inputs-twitter.html) instead.
13 |
14 | In order to install the plugin, run:
15 |
16 | ```sh
17 | bin/plugin install elasticsearch/elasticsearch-river-twitter/2.6.0
18 | ```
19 |
20 | After installing the plugin you need to restart elasticsearch.
21 |
22 | You need to install a version matching your Elasticsearch version:
23 |
24 | | Elasticsearch |Twitter River Plugin| Docs |
25 | |------------------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------|
26 | | master | Build from source | See below |
27 | | es-1.x | Build from source | [2.7.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/es-1.x/#version-270-snapshot-for-elasticsearch-1x)|
28 | | es-1.6 | 2.6.0 | [2.6.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.6.0/#version-260-for-elasticsearch-16) |
29 | | es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-river-twitter/tree/v2.5.0/#version-250-for-elasticsearch-15) |
30 | | es-1.4 | 2.4.2 | [2.4.2](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.4.2/#version-242-for-elasticsearch-14) |
31 | | es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.3.0/#version-230-for-elasticsearch-13) |
32 | | es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.2.0/#twitter-river-plugin-for-elasticsearch) |
33 | | es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v2.0.0/#twitter-river-plugin-for-elasticsearch) |
34 | | es-0.90 | 1.5.0 | [1.5.0](https://github.com/elasticsearch/elasticsearch-river-twitter/tree/v1.5.0/#twitter-river-plugin-for-elasticsearch) |
35 |
36 | To use a `SNAPSHOT` version, you need to build it yourself with Maven:
37 |
38 | ```bash
39 | mvn clean install
40 | plugin --install river-twitter \
41 | --url file:target/releases/elasticsearch-river-twitter-X.X.X-SNAPSHOT.zip
42 | ```
43 |
44 | Prerequisites
45 | -------------
46 |
47 | You need to get an OAuth token in order to use the Twitter river.
48 | Please follow the [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom); basically:
49 |
50 | * Log in to: https://dev.twitter.com/apps/
51 | * Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new
52 | You don't need a callback URL.
53 | * When done, click on `Create my access token`.
54 | * Open the `OAuth tool` tab and note the `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.
55 |
56 |
57 | Create river
58 | ------------
59 |
60 | Creating the twitter river can be done using:
61 |
62 | ```
63 | PUT _river/my_twitter_river/_meta
64 | {
65 | "type" : "twitter",
66 | "twitter" : {
67 | "oauth" : {
68 | "consumer_key" : "*** YOUR Consumer key HERE ***",
69 | "consumer_secret" : "*** YOUR Consumer secret HERE ***",
70 | "access_token" : "*** YOUR Access token HERE ***",
71 | "access_token_secret" : "*** YOUR Access token secret HERE ***"
72 | }
73 | },
74 | "index" : {
75 | "index" : "my_twitter_river",
76 | "type" : "status",
77 | "bulk_size" : 100,
78 | "flush_interval" : "5s",
79 | "retry_after" : "10s"
80 | }
81 | }
82 | ```
83 |
84 | The above lists all the options controlling the creation of a twitter river.
85 |
86 | If you don't define `index.index`, the river name (`my_twitter_river`) will be used as the default index name.
87 | If you don't define `index.type`, the default `status` type will be used.
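
These defaults can be sketched with a small Python helper (a hypothetical illustration of the documented fallback rules, not the plugin's actual code; the function name is made up):

```python
def resolve_index_settings(river_name, index_settings=None):
    """Apply the documented defaults: the index name falls back to the
    river name and the type falls back to 'status'."""
    index_settings = index_settings or {}
    return {
        "index": index_settings.get("index", river_name),
        "type": index_settings.get("type", "status"),
    }

# With no "index" section at all, both defaults apply:
resolve_index_settings("my_twitter_river")
```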
88 |
89 | Note that you can define any or all of your OAuth settings in the `elasticsearch.yml` file on each node by prefixing
90 | each setting with `river.twitter.`:
91 |
92 | ```
93 | river.twitter.oauth.consumer_key: "*** YOUR Consumer key HERE ***"
94 | river.twitter.oauth.consumer_secret: "*** YOUR Consumer secret HERE ***"
95 | river.twitter.oauth.access_token: "*** YOUR Access token HERE ***"
96 | river.twitter.oauth.access_token_secret: "*** YOUR Access token secret HERE ***"
97 | ```
98 |
99 | In that case, you can create the river using:
100 |
101 | ```
102 | PUT _river/my_twitter_river/_meta
103 | {
104 | "type" : "twitter"
105 | }
106 | ```
107 |
108 | You can also override any `elasticsearch.yml` setting. A good practice is to keep `consumer_key` and
109 | `consumer_secret` in `elasticsearch.yml` and provide the `access_token` and `access_token_secret` properties to the river.
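
The precedence (river-level values win over node-level `river.twitter.oauth.*` settings) can be modelled like this (a simplified sketch under that assumption, not the plugin's actual code):

```python
def resolve_oauth(node_settings, river_oauth):
    """Merge OAuth settings: a value in the river definition takes
    precedence over the node-level river.twitter.oauth.* setting."""
    keys = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")
    merged = {}
    for key in keys:
        node_value = node_settings.get("river.twitter.oauth." + key)
        merged[key] = river_oauth.get(key, node_value)
    return merged

node = {"river.twitter.oauth.consumer_key": "ck",
        "river.twitter.oauth.consumer_secret": "cs"}
river = {"access_token": "at", "access_token_secret": "ats"}
# consumer_* comes from the node config, tokens from the river definition
resolve_oauth(node, river)
```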
110 |
111 | By default, the twitter river reads a small random sample of all public statuses using the
112 | [sample API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample).
113 |
114 | But you can define the type of statuses you want to read:
115 |
116 | * [sample](https://dev.twitter.com/docs/api/1.1/get/statuses/sample): the default one
117 | * [filter](https://dev.twitter.com/docs/api/1.1/post/statuses/filter): track for text, users and locations.
118 | See [Filtered Stream](#filtered-stream)
119 | * [user](https://dev.twitter.com/docs/streaming-apis/streams/user): listen to tweets in the authenticated user's timeline.
120 | See [User Stream](#user-stream)
121 | * [firehose](https://dev.twitter.com/docs/api/1.1/get/statuses/firehose): all public statuses (restricted access)
122 |
123 | For example:
124 |
125 | ```
126 | PUT _river/my_twitter_river/_meta
127 | {
128 | "type" : "twitter",
129 | "twitter" : {
130 | "type" : "firehose"
131 | }
132 | }
133 | ```
134 |
135 | Note that if you define a filter (see [next section](#filtered-stream)), the type will be automatically set to `filter`.
136 |
137 | Tweets will be indexed once `bulk_size` of them have been accumulated (defaults to `100`)
138 | or every `flush_interval` period (defaults to `5s`).
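
This accumulate-or-flush behaviour can be sketched as follows (a simplified model of the documented semantics, not the plugin's actual code; the class name is made up):

```python
import time

class BulkBuffer:
    """Flush queued documents when bulk_size is reached, or when
    flush_interval has elapsed since the last flush (via tick())."""

    def __init__(self, bulk_size=100, flush_interval=5.0, flush_fn=print):
        self.bulk_size = bulk_size
        self.flush_interval = flush_interval
        self.flush_fn = flush_fn
        self.docs = []
        self.last_flush = time.monotonic()

    def add(self, doc):
        self.docs.append(doc)
        if len(self.docs) >= self.bulk_size:
            self.flush()

    def tick(self):
        # Called periodically by a scheduler thread.
        if self.docs and time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        self.flush_fn(self.docs)
        self.docs = []
        self.last_flush = time.monotonic()
```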
139 |
140 | Filtered Stream
141 | ===============
142 |
143 | A filtered stream is also supported (as per the Twitter stream API). The filter stream can be configured to
144 | support `tracks`, `follow`, `locations` and `language`. `user_lists` is a shortcut to follow all members of a public
145 | twitter list, identified by the owner's screen name and the list slug (the last part of the URI when you open a list in your browser). The
146 | configuration is the same as for the Twitter API (a single comma-separated string value, or JSON arrays).
147 | Here is an example:
148 |
149 | ```
150 | PUT _river/my_twitter_river/_meta
151 | {
152 | "type" : "twitter",
153 | "twitter" : {
154 | "filter" : {
155 | "tracks" : "test,something,please",
156 | "follow" : "111,222,333",
157 | "user_lists" : "ownerScreenName1/slug1,ownerScreenName2/slug2",
158 | "locations" : "-122.75,36.8,-121.75,37.8,-74,40,-73,41",
159 | "language" : "fr,en"
160 | }
161 | }
162 | }
163 | ```
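
The `user_lists` format above can be illustrated with a small parsing helper (hypothetical, not the plugin's actual code):

```python
def parse_user_lists(value):
    """Split a user_lists setting like 'owner1/slug1,owner2/slug2'
    into (owner_screen_name, slug) pairs."""
    pairs = []
    for item in value.split(","):
        owner, slug = item.strip().split("/", 1)
        pairs.append((owner, slug))
    return pairs

parse_user_lists("ownerScreenName1/slug1,ownerScreenName2/slug2")
# [('ownerScreenName1', 'slug1'), ('ownerScreenName2', 'slug2')]
```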
164 |
165 | Note that locations use the GeoJSON order (longitude, latitude).
166 |
167 | Note that if you want to use language filtering, you also need to define at least one of the `tracks`,
168 | `follow` or `locations` filters.
169 | Supported language identifiers are [BCP 47](http://tools.ietf.org/html/bcp47) codes. You can filter on
170 | any language listed in [Twitter Advanced Search](https://twitter.com/search-advanced).
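
To make the coordinate ordering concrete, here is how a comma-separated `locations` string maps to (longitude, latitude) pairs (a hypothetical parsing helper, not the plugin's actual code):

```python
def parse_locations(value):
    """Split a Twitter-style locations string into (lon, lat) pairs,
    longitude first, as in GeoJSON."""
    nums = [float(x) for x in value.split(",")]
    if len(nums) % 2:
        raise ValueError("locations must contain an even number of coordinates")
    return list(zip(nums[0::2], nums[1::2]))

parse_locations("-122.75,36.8,-121.75,37.8")
# first pair is (-122.75, 36.8): longitude, then latitude
```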
171 |
172 | Here is an array-based configuration example:
173 |
174 | ```
175 | PUT _river/my_twitter_river/_meta
176 | {
177 | "type" : "twitter",
178 | "twitter" : {
179 | "filter" : {
180 | "tracks" : ["test", "something"],
181 | "follow" : [111, 222, 333],
182 | "locations" : [ [-122.75,36.8], [-121.75,37.8], [-74,40], [-73,41]],
183 | "language" : [ "fr", "en" ]
184 | }
185 | }
186 | }
187 | ```
188 |
189 | User Stream
190 | ===========
191 |
192 | A user stream is also supported (as per the Twitter stream API). This stream returns tweets from the authenticated user's
193 | timeline. Here is a basic configuration example:
194 |
195 | ```
196 | PUT _river/my_twitter_river/_meta
197 | {
198 | "type" : "twitter",
199 | "twitter" : {
200 | "type" : "user"
201 | }
202 | }
203 | ```
204 |
205 | Indexing RAW Twitter stream
206 | ===========================
207 |
208 | By default, the elasticsearch twitter river converts tweets to an equivalent representation
209 | in elasticsearch. If you want to index the raw twitter JSON content without any transformation,
210 | you can set `raw` to `true`:
211 |
212 | ```
213 | PUT _river/my_twitter_river/_meta
214 | {
215 | "type" : "twitter",
216 | "twitter" : {
217 | "raw" : true
218 | }
219 | }
220 | ```
221 |
222 | Note that you should consider creating a mapping for your tweets first. See the Twitter documentation on the
223 | [raw Tweet format](https://dev.twitter.com/docs/platform-objects/tweets):
224 |
225 | ```
226 | PUT my_twitter_river/status/_mapping
227 | {
228 | "status" : {
229 | "properties" : {
230 | "text" : {"type" : "string", "analyzer" : "standard"}
231 | }
232 | }
233 | }
234 | ```
235 |
236 | Ignoring Retweets
237 | =================
238 |
239 | If you don't want to index retweets (aka RT), just set `ignore_retweet` to `true` (defaults to `false`):
240 |
241 | ```
242 | PUT _river/my_twitter_river/_meta
243 | {
244 | "type" : "twitter",
245 | "twitter" : {
246 | "ignore_retweet" : true
247 | }
248 | }
249 | ```
250 |
251 | Increase the schedule time to reconnect the river
252 | =================================================
253 |
254 | It can happen that the river fails, closing the current connection to the Streaming API. A new connection is then scheduled by the river, after 10s by default.
255 | If you want to control this delay, use the `retry_after` option, as in:
256 |
257 | ```
258 | PUT _river/my_twitter_river/_meta
259 | {
260 | "type" : "twitter",
261 | "index" : {
262 | "retry_after" : "30s"
263 | }
264 | }
265 | ```
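
Values like `10s` and `30s` follow the Elasticsearch time-value syntax; a simplified parser sketch (hypothetical, covering only a few units, not the plugin's actual code):

```python
import re

def parse_time_value(value, default_seconds=10.0):
    """Parse a time value such as '10s', '500ms' or '1m' into seconds.
    Falls back to the documented 10s default when unset."""
    if value is None:
        return default_seconds
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    if not match:
        raise ValueError("unsupported time value: %r" % value)
    number, unit = float(match.group(1)), match.group(2)
    factor = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}[unit]
    return number * factor

parse_time_value("30s")  # 30.0
parse_time_value(None)   # 10.0, the default reconnect delay
```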
266 |
267 | Geo location points as array
268 | ============================
269 |
270 | By default, the elasticsearch twitter river indexes the `location` field using the *lat lon as properties* format.
271 | You can set `geo_as_array` to `true` if you prefer having `location` indexed as a `[lon, lat]` array.
272 |
273 | ```
274 | PUT _river/my_twitter_river/_meta
275 | {
276 | "type" : "twitter",
277 | "twitter" : {
278 | "geo_as_array" : true
279 | }
280 | }
281 | ```
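
The two representations differ only in shape; a hypothetical converter illustrating both (not the plugin's actual code):

```python
def format_location(lat, lon, geo_as_array=False):
    """Return the location as a lat/lon properties object (the default),
    or as a GeoJSON-style [lon, lat] array when geo_as_array is set."""
    if geo_as_array:
        return [lon, lat]
    return {"lat": lat, "lon": lon}

format_location(17.431913, 78.418407)
# {'lat': 17.431913, 'lon': 78.418407}
format_location(17.431913, 78.418407, geo_as_array=True)
# [78.418407, 17.431913]
```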
282 |
283 | Remove the river
284 | ================
285 |
286 | If you need to stop the Twitter river, you have to remove it:
287 |
288 | ```
289 | DELETE _river/my_twitter_river/
290 | ```
291 |
292 | Using a proxy
293 | =============
294 |
295 | You can define a proxy if you are using one:
296 |
297 | ```
298 | PUT _river/my_twitter_river/_meta
299 | {
300 | "type" : "twitter",
301 | "twitter" : {
302 | "proxy" : {
303 | "host": "host",
304 | "port": "port",
305 | "user": "proxy_user_if_any",
306 | "password": "proxy_password_if_any"
307 | }
308 | }
309 | }
310 | ```
311 |
312 | You can also define proxy settings in the `elasticsearch.yml` file on each node by prefixing each setting with `river.twitter.`:
313 |
314 | ```yaml
315 | river.twitter.proxy.host: "host"
316 | river.twitter.proxy.port: "port"
317 | river.twitter.proxy.user: "proxy_user_if_any"
318 | river.twitter.proxy.password: "proxy_password_if_any"
319 | ```
320 |
321 | Sample document
322 | ===============
323 |
324 | Here is what a document could look like when using this river (without the `raw` option):
325 |
326 | ```js
327 | {
328 | "text":"This is a text",
329 | "created_at":"2015-01-26T15:22:35.000Z",
330 | "source":"Twitter for Windows Phone",
331 | "truncated":false,
332 | "language":"en",
333 | "mention":[
334 |
335 | ],
336 | "retweet_count":0,
337 | "hashtag":[
338 |
339 | ],
340 | "location":[
341 | 78.418407,
342 | 17.431913
343 | ],
344 | "place":{
345 | "id":"243cc16f6417a167",
346 | "name":"Hyderabad",
347 | "type":"city",
348 | "full_name":"Hyderabad, Andhra Pradesh",
349 | "street_address":null,
350 | "country":"India",
351 | "country_code":"IN",
352 | "url":"https://api.twitter.com/1.1/geo/id/243cc16f6417a167.json"
353 | },
354 | "link":[
355 |
356 | ],
357 | "user":{
358 | "id":1111111111,
359 | "name":"User Name",
360 | "screen_name":"twitter_handle",
361 | "location":"A full text location description",
362 | "description":"A description",
363 | "profile_image_url":"http://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg",
364 | "profile_image_url_https":"https://pbs.twimg.com/profile_images/1111111111/QATJ00Yp_normal.jpeg"
365 | }
366 | }
367 | ```
368 |
369 | Tests
370 | =====
371 |
372 | Integration tests in this plugin require a working Twitter account and are therefore disabled by default.
373 | You need to create your credentials as explained in [Prerequisites](#prerequisites).
374 |
375 | To enable tests prepare a config file `elasticsearch.yml` with the following content:
376 |
377 | ```yaml
378 | river:
379 | twitter:
380 | oauth:
381 | consumer_key: "your_consumer_key"
382 | consumer_secret: "your_consumer_secret"
383 | access_token: "your_access_token"
384 | access_token_secret: "your_access_token_secret"
385 | ```
386 |
387 | Replace all occurrences of `your_consumer_key`, `your_consumer_secret`, `your_access_token` and
388 | `your_access_token_secret` with your settings.
389 |
390 | To run the tests:
391 |
392 | ```sh
393 | mvn -Dtests.twitter=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test
394 | ```
395 |
396 | Note that if you want to test the User Stream, you need to grant write permissions to your Twitter
397 | application.
398 |
399 | License
400 | -------
401 |
402 | This software is licensed under the Apache 2 license, quoted below.
403 |
404 | Copyright 2009-2014 Elasticsearch
405 |
406 | Licensed under the Apache License, Version 2.0 (the "License"); you may not
407 | use this file except in compliance with the License. You may obtain a copy of
408 | the License at
409 |
410 | http://www.apache.org/licenses/LICENSE-2.0
411 |
412 | Unless required by applicable law or agreed to in writing, software
413 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
414 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
415 | License for the specific language governing permissions and limitations under
416 | the License.
417 |
--------------------------------------------------------------------------------
/dev-tools/release.py:
--------------------------------------------------------------------------------
1 | # Licensed to Elasticsearch under one or more contributor
2 | # license agreements. See the NOTICE file distributed with
3 | # this work for additional information regarding copyright
4 | # ownership. Elasticsearch licenses this file to you under
5 | # the Apache License, Version 2.0 (the "License"); you may
6 | # not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing,
12 | # software distributed under the License is distributed on
13 | # an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
14 | # either express or implied. See the License for the specific
15 | # language governing permissions and limitations under the License.
16 |
17 | import datetime
18 | import os
19 | import shutil
20 | import sys
21 | import time
22 | import urllib
23 | import urllib.request
24 | import zipfile
25 |
26 | from os.path import dirname, abspath
27 |
28 | """
29 | This tool builds a release from a given elasticsearch plugin branch.
30 |
31 | It is basically a wrapper on top of launch_release.py which:
32 |
33 | - tries to get a more recent version of launch_release.py in ...
34 | - download it if needed
35 | - launch it passing all arguments to it, like:
36 |
37 | $ python3 dev_tools/release.py --branch master --publish --remote origin
38 |
39 | Important options:
40 |
41 | # Dry run
42 | $ python3 dev_tools/release.py
43 |
44 | # Dry run without tests
45 | python3 dev_tools/release.py --skiptests
46 |
47 | # Release, publish artifacts and announce
48 | $ python3 dev_tools/release.py --publish
49 |
50 | See full documentation in launch_release.py
51 | """
52 | env = os.environ
53 |
54 | # Change this if the source repository for your scripts is at a different location
55 | SOURCE_REPO = 'elasticsearch/elasticsearch-plugins-script'
56 | # Download the script again if the local copy is more than 1 day old
57 | SCRIPT_OBSOLETE_DAYS = 1
58 | # Files from master.zip that we ignore
59 | IGNORED_FILES = ['.gitignore', 'README.md']
60 |
61 |
62 | ROOT_DIR = abspath(os.path.join(abspath(dirname(__file__)), '../'))
63 | TARGET_TOOLS_DIR = ROOT_DIR + '/plugin_tools'
64 | DEV_TOOLS_DIR = ROOT_DIR + '/dev-tools'
65 | BUILD_RELEASE_FILENAME = 'release.zip'
66 | BUILD_RELEASE_FILE = TARGET_TOOLS_DIR + '/' + BUILD_RELEASE_FILENAME
67 | SOURCE_URL = 'https://github.com/%s/archive/master.zip' % SOURCE_REPO
68 |
69 | # Download a recent version of the release plugin tool
70 | try:
71 | os.mkdir(TARGET_TOOLS_DIR)
72 | print('directory %s created' % TARGET_TOOLS_DIR)
73 | except FileExistsError:
74 | pass
75 |
76 |
77 | try:
78 |     # check the time of the latest update; if we updated recently,
79 |     # we are not going to check again
80 | download = True
81 |
82 | try:
83 | last_download_time = datetime.datetime.fromtimestamp(os.path.getmtime(BUILD_RELEASE_FILE))
84 | if (datetime.datetime.now()-last_download_time).days < SCRIPT_OBSOLETE_DAYS:
85 | download = False
86 | except FileNotFoundError:
87 | pass
88 |
89 | if download:
90 | urllib.request.urlretrieve(SOURCE_URL, BUILD_RELEASE_FILE)
91 | with zipfile.ZipFile(BUILD_RELEASE_FILE) as myzip:
92 | for member in myzip.infolist():
93 | filename = os.path.basename(member.filename)
94 | # skip directories
95 | if not filename:
96 | continue
97 | if filename in IGNORED_FILES:
98 | continue
99 |
100 | # copy file (taken from zipfile's extract)
101 | source = myzip.open(member.filename)
102 | target = open(os.path.join(TARGET_TOOLS_DIR, filename), "wb")
103 | with source, target:
104 | shutil.copyfileobj(source, target)
105 | # We keep the original date
106 | date_time = time.mktime(member.date_time + (0, 0, -1))
107 | os.utime(os.path.join(TARGET_TOOLS_DIR, filename), (date_time, date_time))
108 | print('plugin-tools updated from %s' % SOURCE_URL)
109 | except urllib.error.HTTPError:
110 | pass
111 |
112 |
113 | # Let's see if we need to update the release.py script itself
114 | source_time = os.path.getmtime(TARGET_TOOLS_DIR + '/release.py')
115 | repo_time = os.path.getmtime(DEV_TOOLS_DIR + '/release.py')
116 | if source_time > repo_time:
117 | input('release.py needs an update. Press a key to update it...')
118 | shutil.copyfile(TARGET_TOOLS_DIR + '/release.py', DEV_TOOLS_DIR + '/release.py')
119 |
120 | # We can launch the build process
121 | PYTHON = 'python'
122 | # make sure python3 is used if python3 is available;
123 | # some systems use python 2 as default. os.system returns the
124 | # command's exit status (it does not raise), so test it explicitly.
125 | if os.system('python3 --version > /dev/null 2>&1') == 0:
126 |     PYTHON = 'python3'
129 |
130 | release_args = ''
131 | for x in range(1, len(sys.argv)):
132 | release_args += ' ' + sys.argv[x]
133 |
134 | os.system('%s %s/build_release.py %s' % (PYTHON, TARGET_TOOLS_DIR, release_args))
135 |
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <project xmlns="http://maven.apache.org/POM/4.0.0"
3 |          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
4 |          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
5 |     <modelVersion>4.0.0</modelVersion>
6 |
7 |     <groupId>org.elasticsearch</groupId>
8 |     <artifactId>elasticsearch-river-twitter</artifactId>
9 |     <version>3.0.0-SNAPSHOT</version>
10 |     <packaging>jar</packaging>
11 |     <name>Elasticsearch Twitter River plugin</name>
12 |     <description>The Twitter river indexes the public twitter stream, aka the hose, and makes it searchable</description>
13 |     <url>https://github.com/elastic/elasticsearch-river-twitter/</url>
14 |     <inceptionYear>2009</inceptionYear>
15 |
16 |     <licenses>
17 |         <license>
18 |             <name>The Apache Software License, Version 2.0</name>
19 |             <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
20 |             <distribution>repo</distribution>
21 |         </license>
22 |     </licenses>
23 |
24 |     <scm>
25 |         <connection>scm:git:git@github.com:elastic/elasticsearch-river-twitter.git</connection>
26 |         <developerConnection>scm:git:git@github.com:elastic/elasticsearch-river-twitter.git</developerConnection>
27 |         <url>http://github.com/elastic/elasticsearch-river-twitter</url>
28 |     </scm>
29 |
30 |     <parent>
31 |         <groupId>org.elasticsearch</groupId>
32 |         <artifactId>elasticsearch-plugin</artifactId>
33 |         <version>2.0.0-SNAPSHOT</version>
34 |     </parent>
35 |
36 |     <properties>
37 |         <twitter4j.version>4.0.3</twitter4j.version>
38 |         <tests.output>warn</tests.output>
39 |         <tests.jvms>1</tests.jvms>
40 |     </properties>
41 |
42 |     <dependencies>
43 |         <dependency>
44 |             <groupId>org.twitter4j</groupId>
45 |             <artifactId>twitter4j-stream</artifactId>
46 |             <version>${twitter4j.version}</version>
47 |         </dependency>
48 |     </dependencies>
49 |
50 |     <build>
51 |         <plugins>
52 |             <plugin>
53 |                 <groupId>org.apache.maven.plugins</groupId>
54 |                 <artifactId>maven-assembly-plugin</artifactId>
55 |             </plugin>
56 |         </plugins>
57 |     </build>
58 |
59 |     <repositories>
60 |         <repository>
61 |             <id>oss-snapshots</id>
62 |             <name>Sonatype OSS Snapshots</name>
63 |             <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
64 |         </repository>
65 |     </repositories>
66 | </project>
--------------------------------------------------------------------------------
/src/main/assemblies/plugin.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <assembly>
3 |     <id>plugin</id>
4 |     <formats>
5 |         <format>zip</format>
6 |     </formats>
7 |     <includeBaseDirectory>false</includeBaseDirectory>
8 |     <dependencySets>
9 |         <dependencySet>
10 |             <outputDirectory>/</outputDirectory>
11 |             <useProjectArtifact>true</useProjectArtifact>
12 |             <useTransitiveFiltering>true</useTransitiveFiltering>
13 |             <excludes>
14 |                 <exclude>org.elasticsearch:elasticsearch</exclude>
15 |             </excludes>
16 |         </dependencySet>
17 |         <dependencySet>
18 |             <outputDirectory>/</outputDirectory>
19 |             <useProjectArtifact>true</useProjectArtifact>
20 |             <useTransitiveFiltering>true</useTransitiveFiltering>
21 |             <includes>
22 |                 <include>org.twitter4j:twitter4j-stream</include>
23 |             </includes>
24 |         </dependencySet>
25 |     </dependencySets>
26 | </assembly>
--------------------------------------------------------------------------------
/src/main/java/org/elasticsearch/plugin/river/twitter/TwitterRiverPlugin.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.plugin.river.twitter;
21 |
22 | import org.elasticsearch.common.inject.Inject;
23 | import org.elasticsearch.plugins.AbstractPlugin;
24 | import org.elasticsearch.river.RiversModule;
25 | import org.elasticsearch.river.twitter.TwitterRiverModule;
26 |
27 | /**
28 | *
29 | */
30 | public class TwitterRiverPlugin extends AbstractPlugin {
31 |
32 | @Inject
33 | public TwitterRiverPlugin() {
34 | }
35 |
36 | @Override
37 | public String name() {
38 | return "river-twitter";
39 | }
40 |
41 | @Override
42 | public String description() {
43 | return "River Twitter Plugin";
44 | }
45 |
46 | public void onModule(RiversModule module) {
47 | module.registerRiver("twitter", TwitterRiverModule.class);
48 | }
49 | }
50 |
--------------------------------------------------------------------------------
/src/main/java/org/elasticsearch/river/twitter/TwitterRiver.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter;
21 |
22 | import org.elasticsearch.ExceptionsHelper;
23 | import org.elasticsearch.action.bulk.BulkItemResponse;
24 | import org.elasticsearch.action.bulk.BulkProcessor;
25 | import org.elasticsearch.action.bulk.BulkRequest;
26 | import org.elasticsearch.action.bulk.BulkResponse;
27 | import org.elasticsearch.client.Client;
28 | import org.elasticsearch.client.Requests;
29 | import org.elasticsearch.cluster.block.ClusterBlockException;
30 | import org.elasticsearch.common.Strings;
31 | import org.elasticsearch.common.inject.Inject;
32 | import org.elasticsearch.common.settings.Settings;
33 | import org.elasticsearch.common.unit.TimeValue;
34 | import org.elasticsearch.common.xcontent.XContentBuilder;
35 | import org.elasticsearch.common.xcontent.XContentFactory;
36 | import org.elasticsearch.common.xcontent.support.XContentMapValues;
37 | import org.elasticsearch.indices.IndexAlreadyExistsException;
38 | import org.elasticsearch.river.AbstractRiverComponent;
39 | import org.elasticsearch.river.River;
40 | import org.elasticsearch.river.RiverName;
41 | import org.elasticsearch.river.RiverSettings;
42 | import org.elasticsearch.threadpool.ThreadPool;
43 | import twitter4j.*;
44 | import twitter4j.conf.Configuration;
45 | import twitter4j.conf.ConfigurationBuilder;
46 |
47 | import java.util.ArrayList;
48 | import java.util.List;
49 | import java.util.Map;
50 |
51 | /**
52 | *
53 | */
54 | public class TwitterRiver extends AbstractRiverComponent implements River {
55 |
56 | private final ThreadPool threadPool;
57 |
58 | private final Client client;
59 |
60 | private final String oauthConsumerKey;
61 | private final String oauthConsumerSecret;
62 | private final String oauthAccessToken;
63 | private final String oauthAccessTokenSecret;
64 |
65 | private final TimeValue retryAfter;
66 |
67 | private final String proxyHost;
68 | private final String proxyPort;
69 | private final String proxyUser;
70 | private final String proxyPassword;
71 |
72 | private final boolean raw;
73 | private final boolean ignoreRetweet;
74 | private final boolean geoAsArray;
75 |
76 | private final String indexName;
77 |
78 | private final String typeName;
79 |
80 | private final int bulkSize;
81 | private final int maxConcurrentBulk;
82 | private final TimeValue bulkFlushInterval;
83 |
84 | private final FilterQuery filterQuery;
85 |
86 | private final String streamType;
87 |
88 | private RiverStatus riverStatus;
89 |
90 | private volatile TwitterStream stream;
91 |
92 | private volatile BulkProcessor bulkProcessor;
93 |
94 | @SuppressWarnings({"unchecked"})
95 | @Inject
96 | public TwitterRiver(RiverName riverName, RiverSettings riverSettings, Client client, ThreadPool threadPool, Settings settings) {
97 | super(riverName, riverSettings);
98 | this.riverStatus = RiverStatus.UNKNOWN;
99 | this.client = client;
100 | this.threadPool = threadPool;
101 |
102 | String riverStreamType;
103 |
104 | if (riverSettings.settings().containsKey("twitter")) {
105 | Map twitterSettings = (Map) riverSettings.settings().get("twitter");
106 |
107 | raw = XContentMapValues.nodeBooleanValue(twitterSettings.get("raw"), false);
108 | ignoreRetweet = XContentMapValues.nodeBooleanValue(twitterSettings.get("ignore_retweet"), false);
109 | geoAsArray = XContentMapValues.nodeBooleanValue(twitterSettings.get("geo_as_array"), false);
110 |
111 | if (twitterSettings.containsKey("oauth")) {
112 | Map oauth = (Map) twitterSettings.get("oauth");
113 | if (oauth.containsKey("consumer_key")) {
114 | oauthConsumerKey = XContentMapValues.nodeStringValue(oauth.get("consumer_key"), null);
115 | } else {
116 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
117 | }
118 | if (oauth.containsKey("consumer_secret")) {
119 | oauthConsumerSecret = XContentMapValues.nodeStringValue(oauth.get("consumer_secret"), null);
120 | } else {
121 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
122 | }
123 | if (oauth.containsKey("access_token")) {
124 | oauthAccessToken = XContentMapValues.nodeStringValue(oauth.get("access_token"), null);
125 | } else {
126 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
127 | }
128 | if (oauth.containsKey("access_token_secret")) {
129 | oauthAccessTokenSecret = XContentMapValues.nodeStringValue(oauth.get("access_token_secret"), null);
130 | } else {
131 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
132 | }
133 | } else {
134 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
135 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
136 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
137 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
138 | }
139 |
140 | if (twitterSettings.containsKey("retry_after")) {
141 | retryAfter = XContentMapValues.nodeTimeValue(twitterSettings.get("retry_after"), TimeValue.timeValueSeconds(10));
142 | } else {
143 | retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10));
144 | }
145 |
146 | if (twitterSettings.containsKey("proxy")) {
147 | Map proxy = (Map) twitterSettings.get("proxy");
148 | proxyHost = XContentMapValues.nodeStringValue(proxy.get("host"), null);
149 | proxyPort = XContentMapValues.nodeStringValue(proxy.get("port"), null);
150 | proxyUser = XContentMapValues.nodeStringValue(proxy.get("user"), null);
151 | proxyPassword = XContentMapValues.nodeStringValue(proxy.get("password"), null);
152 | } else {
153 | // Let's see if we have that in node settings
154 | proxyHost = settings.get("river.twitter.proxy.host");
155 | proxyPort = settings.get("river.twitter.proxy.port");
156 | proxyUser = settings.get("river.twitter.proxy.user");
157 | proxyPassword = settings.get("river.twitter.proxy.password");
158 | }
159 |
160 | riverStreamType = XContentMapValues.nodeStringValue(twitterSettings.get("type"), "sample");
161 | Map filterSettings = (Map) twitterSettings.get("filter");
162 |
163 | if (riverStreamType.equals("filter") && filterSettings == null) {
164 | filterQuery = null;
165 | stream = null;
166 | streamType = null;
167 | indexName = null;
168 | typeName = "status";
169 | bulkSize = 100;
170 | this.maxConcurrentBulk = 1;
171 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
172 | logger.warn("no filter defined for type filter. Disabling river...");
173 | return;
174 | }
175 |
176 | if (filterSettings != null) {
177 | riverStreamType = "filter";
178 | filterQuery = new FilterQuery();
179 | filterQuery.count(XContentMapValues.nodeIntegerValue(filterSettings.get("count"), 0));
180 | Object tracks = filterSettings.get("tracks");
181 | boolean filterSet = false;
182 | if (tracks != null) {
183 | if (tracks instanceof List) {
184 | List lTracks = (List) tracks;
185 | filterQuery.track(lTracks.toArray(new String[lTracks.size()]));
186 | } else {
187 | filterQuery.track(Strings.commaDelimitedListToStringArray(tracks.toString()));
188 | }
189 | filterSet = true;
190 | }
191 | Object follow = filterSettings.get("follow");
192 | if (follow != null) {
193 | if (follow instanceof List) {
194 | List lFollow = (List) follow;
195 | long[] followIds = new long[lFollow.size()];
196 | for (int i = 0; i < lFollow.size(); i++) {
197 | Object o = lFollow.get(i);
198 | if (o instanceof Number) {
199 | followIds[i] = ((Number) o).longValue();
200 | } else {
201 | followIds[i] = Long.parseLong(o.toString());
202 | }
203 | }
204 | filterQuery.follow(followIds);
205 | } else {
206 | String[] ids = Strings.commaDelimitedListToStringArray(follow.toString());
207 | long[] followIds = new long[ids.length];
208 | for (int i = 0; i < ids.length; i++) {
209 | followIds[i] = Long.parseLong(ids[i]);
210 | }
211 | filterQuery.follow(followIds);
212 | }
213 | filterSet = true;
214 | }
215 | Object locations = filterSettings.get("locations");
216 | if (locations != null) {
217 | if (locations instanceof List) {
218 | List lLocations = (List) locations;
219 | double[][] dLocations = new double[lLocations.size()][];
220 | for (int i = 0; i < lLocations.size(); i++) {
221 | Object loc = lLocations.get(i);
222 | double lat;
223 | double lon;
224 | if (loc instanceof List) {
225 | List lLoc = (List) loc;
226 | if (lLoc.get(0) instanceof Number) {
227 | lon = ((Number) lLoc.get(0)).doubleValue();
228 | } else {
229 | lon = Double.parseDouble(lLoc.get(0).toString());
230 | }
231 | if (lLoc.get(1) instanceof Number) {
232 | lat = ((Number) lLoc.get(1)).doubleValue();
233 | } else {
234 | lat = Double.parseDouble(lLoc.get(1).toString());
235 | }
236 | } else {
237 | String[] sLoc = Strings.commaDelimitedListToStringArray(loc.toString());
238 | lon = Double.parseDouble(sLoc[0]);
239 | lat = Double.parseDouble(sLoc[1]);
240 | }
241 | dLocations[i] = new double[]{lon, lat};
242 | }
243 | filterQuery.locations(dLocations);
244 | } else {
245 | String[] sLocations = Strings.commaDelimitedListToStringArray(locations.toString());
246 | double[][] dLocations = new double[sLocations.length / 2][];
247 | int dCounter = 0;
248 | for (int i = 0; i < sLocations.length; i++) {
249 | double lon = Double.parseDouble(sLocations[i]);
250 | double lat = Double.parseDouble(sLocations[++i]);
251 | dLocations[dCounter++] = new double[]{lon, lat};
252 | }
253 | filterQuery.locations(dLocations);
254 | }
255 | filterSet = true;
256 | }
257 | Object userLists = filterSettings.get("user_lists");
258 | if (userLists != null) {
259 | if (userLists instanceof List) {
260 | List lUserlists = (List) userLists;
261 | String[] tUserlists = lUserlists.toArray(new String[lUserlists.size()]);
262 | filterQuery.follow(getUsersListMembers(tUserlists));
263 | } else {
264 | String[] tUserlists = Strings.commaDelimitedListToStringArray(userLists.toString());
265 | filterQuery.follow(getUsersListMembers(tUserlists));
266 | }
267 | filterSet = true;
268 | }
269 |
270 | // We should have something to filter
271 | if (!filterSet) {
272 | streamType = null;
273 | indexName = null;
274 | typeName = "status";
275 | bulkSize = 100;
276 | this.maxConcurrentBulk = 1;
277 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
278 | logger.warn("can not set language filter without tracks, follow, locations or user_lists. Disabling river.");
279 | return;
280 | }
281 |
282 | Object language = filterSettings.get("language");
283 | if (language != null) {
284 | if (language instanceof List) {
285 | List lLanguage = (List) language;
286 | filterQuery.language(lLanguage.toArray(new String[lLanguage.size()]));
287 | } else {
288 | filterQuery.language(Strings.commaDelimitedListToStringArray(language.toString()));
289 | }
290 | }
291 | } else {
292 | filterQuery = null;
293 | }
294 | } else {
295 | // No specific settings. We need to use some defaults
296 | riverStreamType = "sample";
297 | raw = false;
298 | ignoreRetweet = false;
299 | geoAsArray = false;
300 | oauthConsumerKey = settings.get("river.twitter.oauth.consumer_key");
301 | oauthConsumerSecret = settings.get("river.twitter.oauth.consumer_secret");
302 | oauthAccessToken = settings.get("river.twitter.oauth.access_token");
303 | oauthAccessTokenSecret = settings.get("river.twitter.oauth.access_token_secret");
304 | retryAfter = XContentMapValues.nodeTimeValue(settings.get("river.twitter.retry_after"), TimeValue.timeValueSeconds(10));
305 | filterQuery = null;
306 | proxyHost = null;
307 | proxyPort = null;
308 | proxyUser = null;
309 | proxyPassword = null;
310 | }
311 |
312 | if (oauthAccessToken == null || oauthConsumerKey == null || oauthConsumerSecret == null || oauthAccessTokenSecret == null) {
313 | stream = null;
314 | streamType = null;
315 | indexName = null;
316 | typeName = "status";
317 | bulkSize = 100;
318 | this.maxConcurrentBulk = 1;
319 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
320 | logger.warn("no oauth specified, disabling river...");
321 | return;
322 | }
323 |
324 | if (riverSettings.settings().containsKey("index")) {
325 | Map indexSettings = (Map) riverSettings.settings().get("index");
326 | indexName = XContentMapValues.nodeStringValue(indexSettings.get("index"), riverName.name());
327 | typeName = XContentMapValues.nodeStringValue(indexSettings.get("type"), "status");
328 | this.bulkSize = XContentMapValues.nodeIntegerValue(indexSettings.get("bulk_size"), 100);
329 | this.bulkFlushInterval = TimeValue.parseTimeValue(XContentMapValues.nodeStringValue(
330 | indexSettings.get("flush_interval"), "5s"), TimeValue.timeValueSeconds(5));
331 | this.maxConcurrentBulk = XContentMapValues.nodeIntegerValue(indexSettings.get("max_concurrent_bulk"), 1);
332 | } else {
333 | indexName = riverName.name();
334 | typeName = "status";
335 | bulkSize = 100;
336 | this.maxConcurrentBulk = 1;
337 | this.bulkFlushInterval = TimeValue.timeValueSeconds(5);
338 | }
339 |
340 | logger.info("creating twitter stream river");
341 | if (raw && logger.isDebugEnabled()) {
342 | logger.debug("will index twitter raw content...");
343 | }
344 |
345 | streamType = riverStreamType;
346 | this.riverStatus = RiverStatus.INITIALIZED;
347 | }
348 |
349 | /**
350 |  * Get the user ids of the members of each given list so we can stream them.
351 |  * @param tUserlists user lists, given as owner/slug. Each should be a public list.
352 |  * @return the ids of all members of the given lists
353 |  */
354 | private long[] getUsersListMembers(String[] tUserlists) {
355 | logger.debug("Fetching user id of given lists");
356 | List<Long> listUserIdToFollow = new ArrayList<Long>();
357 | Configuration cb = buildTwitterConfiguration();
358 | Twitter twitterImpl = new TwitterFactory(cb).getInstance();
359 |
360 | //For each list given in parameter
361 | for (String listId : tUserlists) {
362 | logger.debug("Adding users of list {} ",listId);
363 | String[] splitListId = listId.split("/");
364 | try {
365 | long cursor = -1;
366 | PagableResponseList<User> itUserListMembers;
367 | do {
368 | itUserListMembers = twitterImpl.getUserListMembers(splitListId[0], splitListId[1], cursor);
369 | for (User member : itUserListMembers) {
370 | long userId = member.getId();
371 | listUserIdToFollow.add(userId);
372 | }
373 | } while ((cursor = itUserListMembers.getNextCursor()) != 0);
374 |
375 | } catch (TwitterException te) {
376 | logger.error("Failed to get list members for : {}", listId, te);
377 | }
378 | }
379 |
380 |
381 | // Just unboxing from Long to long
382 | long[] ret = new long[listUserIdToFollow.size()];
383 | int pos = 0;
384 | for (Long userId : listUserIdToFollow) {
385 | ret[pos] = userId;
386 | pos++;
387 | }
388 | return ret;
389 | }
390 |
391 | /**
392 |  * Build a twitter4j configuration object with credentials and proxy settings.
393 |  * @return the twitter4j Configuration
394 |  */
395 | private Configuration buildTwitterConfiguration() {
396 | logger.debug("creating twitter configuration");
397 | ConfigurationBuilder cb = new ConfigurationBuilder();
398 |
399 | cb.setOAuthConsumerKey(oauthConsumerKey)
400 | .setOAuthConsumerSecret(oauthConsumerSecret)
401 | .setOAuthAccessToken(oauthAccessToken)
402 | .setOAuthAccessTokenSecret(oauthAccessTokenSecret);
403 |
404 | if (proxyHost != null) cb.setHttpProxyHost(proxyHost);
405 | if (proxyPort != null) cb.setHttpProxyPort(Integer.parseInt(proxyPort));
406 | if (proxyUser != null) cb.setHttpProxyUser(proxyUser);
407 | if (proxyPassword != null) cb.setHttpProxyPassword(proxyPassword);
408 | if (raw) cb.setJSONStoreEnabled(true);
409 | logger.debug("twitter configuration created");
410 | return cb.build();
411 | }
412 |
413 | /**
414 | * Start twitter stream
415 | */
416 | private void startTwitterStream() {
417 | logger.info("starting {} twitter stream", streamType);
418 |
419 | if (stream == null) {
420 | logger.debug("creating twitter stream");
421 |
422 | stream = new TwitterStreamFactory(buildTwitterConfiguration()).getInstance();
423 | if (streamType.equals("user")) {
424 | stream.addListener(new UserStreamHandler());
425 | } else {
426 | stream.addListener(new StatusHandler());
427 | }
428 |
429 | logger.debug("twitter stream created");
430 | }
431 |
432 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
433 | if (streamType.equals("filter") || filterQuery != null) {
434 | stream.filter(filterQuery);
435 | } else if (streamType.equals("firehose")) {
436 | stream.firehose(0);
437 | } else if (streamType.equals("user")) {
438 | stream.user();
439 | } else {
440 | stream.sample();
441 | }
442 | }
443 | logger.debug("{} twitter stream started!", streamType);
444 | }
445 |
446 | @Override
447 | public void start() {
448 | this.riverStatus = RiverStatus.STARTING;
449 | // Let's start this in another thread so we won't stop the start process
450 | threadPool.generic().execute(new Runnable() {
451 | @Override
452 | public void run() {
453 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
454 | // We are first waiting for a yellow state at least
455 | logger.debug("waiting for yellow status");
456 | client.admin().cluster().prepareHealth("_river").setWaitForYellowStatus().get();
457 | logger.debug("yellow or green status received");
458 | }
459 |
460 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
461 | // We push ES mapping only if raw is false
462 | if (!raw) {
463 | try {
464 | logger.debug("Trying to create index [{}]", indexName);
465 | client.admin().indices().prepareCreate(indexName).execute().actionGet();
466 | logger.debug("index created [{}]", indexName);
467 | } catch (Exception e) {
468 | if (ExceptionsHelper.unwrapCause(e) instanceof IndexAlreadyExistsException) {
469 | // that's fine
470 | logger.debug("Index [{}] already exists, skipping...", indexName);
471 | } else if (ExceptionsHelper.unwrapCause(e) instanceof ClusterBlockException) {
472 | // ok, not recovered yet..., lets start indexing and hope we recover by the first bulk
473 | // TODO: a smarter logic can be to register for cluster event listener here, and only start sampling when the block is removed...
474 | logger.debug("Cluster is blocked for now. Index [{}] can not be created, skipping...", indexName);
475 | } else {
476 | logger.warn("failed to create index [{}], disabling river...", e, indexName);
477 | riverStatus = RiverStatus.STOPPED;
478 | return;
479 | }
480 | }
481 |
482 | if (client.admin().indices().prepareGetMappings(indexName).setTypes(typeName).get().getMappings().isEmpty()) {
483 | try {
484 | String mapping = XContentFactory.jsonBuilder().startObject().startObject(typeName).startObject("properties")
485 | .startObject("location").field("type", "geo_point").endObject()
486 | .startObject("language").field("type", "string").field("index", "not_analyzed").endObject()
487 | .startObject("user").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
488 | .startObject("mention").startObject("properties").startObject("screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
489 | .startObject("in_reply").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
490 | .startObject("retweet").startObject("properties").startObject("user_screen_name").field("type", "string").field("index", "not_analyzed").endObject().endObject().endObject()
491 | .endObject().endObject().endObject().string();
492 | logger.debug("Applying default mapping for [{}]/[{}]: {}", indexName, typeName, mapping);
493 | client.admin().indices().preparePutMapping(indexName).setType(typeName).setSource(mapping).execute().actionGet();
494 | } catch (Exception e) {
495 | logger.warn("failed to apply default mapping [{}]/[{}], disabling river...", e, indexName, typeName);
496 | return;
497 | }
498 | } else {
499 | logger.debug("Mapping already exists for [{}]/[{}], skipping...", indexName, typeName);
500 | }
501 | }
502 | }
503 |
504 | // Creating bulk processor
505 | logger.debug("creating bulk processor [{}]", indexName);
506 | bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
507 | @Override
508 | public void beforeBulk(long executionId, BulkRequest request) {
509 | logger.debug("Going to execute new bulk composed of {} actions", request.numberOfActions());
510 | }
511 |
512 | @Override
513 | public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
514 | logger.debug("Executed bulk composed of {} actions", request.numberOfActions());
515 | if (response.hasFailures()) {
516 | logger.warn("There were failures while executing bulk: {}", response.buildFailureMessage());
517 | if (logger.isDebugEnabled()) {
518 | for (BulkItemResponse item : response.getItems()) {
519 | if (item.isFailed()) {
520 | logger.debug("Error for {}/{}/{} for {} operation: {}", item.getIndex(),
521 | item.getType(), item.getId(), item.getOpType(), item.getFailureMessage());
522 | }
523 | }
524 | }
525 | }
526 | }
527 |
528 | @Override
529 | public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
530 | logger.warn("Error executing bulk", failure);
531 | }
532 | })
533 | .setBulkActions(bulkSize)
534 | .setConcurrentRequests(maxConcurrentBulk)
535 | .setFlushInterval(bulkFlushInterval)
536 | .build();
537 |
538 | logger.debug("Bulk processor created with bulkSize [{}], bulkFlushInterval [{}]", bulkSize, bulkFlushInterval);
539 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
540 | startTwitterStream();
541 | riverStatus = RiverStatus.RUNNING;
542 | }
543 | }
544 | });
545 | }
546 |
547 | private void reconnect() {
548 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
549 | logger.debug("can not reconnect twitter on a closed river");
550 | return;
551 | }
552 |
553 | riverStatus = RiverStatus.STARTING;
554 |
555 | if (stream != null) {
556 | try {
557 | logger.debug("cleanup stream");
558 | stream.cleanUp();
559 | } catch (Exception e) {
560 | logger.debug("failed to cleanup after failure", e);
561 | }
562 | try {
563 | logger.debug("shutdown stream");
564 | stream.shutdown();
565 | } catch (Exception e) {
566 | logger.debug("failed to shutdown after failure", e);
567 | }
568 | }
569 |
570 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
571 | logger.debug("can not reconnect twitter on a closed river");
572 | return;
573 | }
574 |
575 | try {
576 | startTwitterStream();
577 | riverStatus = RiverStatus.RUNNING;
578 | } catch (Exception e) {
579 | if (riverStatus == RiverStatus.STOPPING || riverStatus == RiverStatus.STOPPED ) {
580 | logger.debug("river is closing. we won't reconnect.");
581 | close();
582 | return;
583 | }
584 | // TODO, we can update the status of the river to RECONNECT
585 | logger.warn("failed to connect after failure, throttling", e);
586 | threadPool.schedule(retryAfter, ThreadPool.Names.GENERIC, new Runnable() {
587 | @Override
588 | public void run() {
589 | reconnect();
590 | }
591 | });
592 | }
593 | }
594 |
595 | @Override
596 | public void close() {
597 | riverStatus = RiverStatus.STOPPING;
598 |
599 | logger.info("closing twitter stream river");
600 |
601 | if (bulkProcessor != null) {
602 | bulkProcessor.close();
603 | }
604 |
605 | if (stream != null) {
606 | // No need to call stream.cleanUp():
607 | // - since it is done by the implementation of shutdown()
608 | // - it will lead to a thread leak (see TwitterStreamImpl.cleanUp() and TwitterStreamImpl.shutdown() )
609 | stream.shutdown();
610 | }
611 |
612 | riverStatus = RiverStatus.STOPPED;
613 | }
614 |
615 | private class StatusHandler extends StatusAdapter {
616 |
617 | @Override
618 | public void onStatus(Status status) {
619 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
620 | try {
621 | // #24: We want to ignore retweets (default to false) https://github.com/elasticsearch/elasticsearch-river-twitter/issues/24
622 | if (status.isRetweet() && ignoreRetweet) {
623 | if (logger.isTraceEnabled()) {
624 |                         logger.trace("ignoring status because it is a retweet {} : {}", status.getUser().getName(), status.getText());
625 | }
626 | } else {
627 | if (logger.isTraceEnabled()) {
628 | logger.trace("status {} : {}", status.getUser().getName(), status.getText());
629 | }
630 |
631 |                     // If we want to index tweets as-is, we don't need to convert them to a JSON doc
632 | if (raw) {
633 | String rawJSON = TwitterObjectFactory.getRawJSON(status);
634 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
635 | bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(rawJSON));
636 | }
637 | } else {
638 | XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
639 | builder.field("text", status.getText());
640 | builder.field("created_at", status.getCreatedAt());
641 | builder.field("source", status.getSource());
642 | builder.field("truncated", status.isTruncated());
643 | builder.field("language", status.getLang());
644 |
645 | if (status.getUserMentionEntities() != null) {
646 | builder.startArray("mention");
647 | for (UserMentionEntity user : status.getUserMentionEntities()) {
648 | builder.startObject();
649 | builder.field("id", user.getId());
650 | builder.field("name", user.getName());
651 | builder.field("screen_name", user.getScreenName());
652 | builder.field("start", user.getStart());
653 | builder.field("end", user.getEnd());
654 | builder.endObject();
655 | }
656 | builder.endArray();
657 | }
658 |
659 | if (status.getRetweetCount() != -1) {
660 | builder.field("retweet_count", status.getRetweetCount());
661 | }
662 |
663 | if (status.isRetweet() && status.getRetweetedStatus() != null) {
664 | builder.startObject("retweet");
665 | builder.field("id", status.getRetweetedStatus().getId());
666 | if (status.getRetweetedStatus().getUser() != null) {
667 | builder.field("user_id", status.getRetweetedStatus().getUser().getId());
668 | builder.field("user_screen_name", status.getRetweetedStatus().getUser().getScreenName());
669 | if (status.getRetweetedStatus().getRetweetCount() != -1) {
670 | builder.field("retweet_count", status.getRetweetedStatus().getRetweetCount());
671 | }
672 | }
673 | builder.endObject();
674 | }
675 |
676 | if (status.getInReplyToStatusId() != -1) {
677 | builder.startObject("in_reply");
678 | builder.field("status", status.getInReplyToStatusId());
679 | if (status.getInReplyToUserId() != -1) {
680 | builder.field("user_id", status.getInReplyToUserId());
681 | builder.field("user_screen_name", status.getInReplyToScreenName());
682 | }
683 | builder.endObject();
684 | }
685 |
686 | if (status.getHashtagEntities() != null) {
687 | builder.startArray("hashtag");
688 | for (HashtagEntity hashtag : status.getHashtagEntities()) {
689 | builder.startObject();
690 | builder.field("text", hashtag.getText());
691 | builder.field("start", hashtag.getStart());
692 | builder.field("end", hashtag.getEnd());
693 | builder.endObject();
694 | }
695 | builder.endArray();
696 | }
697 | if (status.getContributors() != null && status.getContributors().length > 0) {
698 | builder.array("contributor", status.getContributors());
699 | }
700 | if (status.getGeoLocation() != null) {
701 | if (geoAsArray) {
702 | builder.startArray("location");
703 | builder.value(status.getGeoLocation().getLongitude());
704 | builder.value(status.getGeoLocation().getLatitude());
705 | builder.endArray();
706 | } else {
707 | builder.startObject("location");
708 | builder.field("lat", status.getGeoLocation().getLatitude());
709 | builder.field("lon", status.getGeoLocation().getLongitude());
710 | builder.endObject();
711 | }
712 | }
713 | if (status.getPlace() != null) {
714 | builder.startObject("place");
715 | builder.field("id", status.getPlace().getId());
716 | builder.field("name", status.getPlace().getName());
717 | builder.field("type", status.getPlace().getPlaceType());
718 | builder.field("full_name", status.getPlace().getFullName());
719 | builder.field("street_address", status.getPlace().getStreetAddress());
720 | builder.field("country", status.getPlace().getCountry());
721 | builder.field("country_code", status.getPlace().getCountryCode());
722 | builder.field("url", status.getPlace().getURL());
723 | builder.endObject();
724 | }
725 | if (status.getURLEntities() != null) {
726 | builder.startArray("link");
727 | for (URLEntity url : status.getURLEntities()) {
728 | if (url != null) {
729 | builder.startObject();
730 | if (url.getURL() != null) {
731 | builder.field("url", url.getURL());
732 | }
733 | if (url.getDisplayURL() != null) {
734 | builder.field("display_url", url.getDisplayURL());
735 | }
736 | if (url.getExpandedURL() != null) {
737 | builder.field("expand_url", url.getExpandedURL());
738 | }
739 | builder.field("start", url.getStart());
740 | builder.field("end", url.getEnd());
741 | builder.endObject();
742 | }
743 | }
744 | builder.endArray();
745 | }
746 |
747 | builder.startObject("user");
748 | builder.field("id", status.getUser().getId());
749 | builder.field("name", status.getUser().getName());
750 | builder.field("screen_name", status.getUser().getScreenName());
751 | builder.field("location", status.getUser().getLocation());
752 | builder.field("description", status.getUser().getDescription());
753 | builder.field("profile_image_url", status.getUser().getProfileImageURL());
754 | builder.field("profile_image_url_https", status.getUser().getProfileImageURLHttps());
755 |
756 | builder.endObject();
757 |
758 | builder.endObject();
759 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
760 | bulkProcessor.add(Requests.indexRequest(indexName).type(typeName).id(Long.toString(status.getId())).source(builder));
761 | }
762 | }
763 | }
764 |
765 | } catch (Exception e) {
766 | logger.warn("failed to construct index request", e);
767 | }
768 | } else {
769 | logger.debug("river is closing. ignoring tweet [{}]", status.getId());
770 | }
771 | }
772 |
773 | @Override
774 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
775 | if (riverStatus != RiverStatus.STOPPED && riverStatus != RiverStatus.STOPPING) {
776 | if (statusDeletionNotice.getStatusId() != -1) {
777 | bulkProcessor.add(Requests.deleteRequest(indexName).type(typeName).id(Long.toString(statusDeletionNotice.getStatusId())));
778 | }
779 | } else {
780 | logger.debug("river is closing. ignoring deletion of tweet [{}]", statusDeletionNotice.getStatusId());
781 | }
782 | }
783 |
784 | @Override
785 | public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
786 | logger.info("received track limitation notice, number_of_limited_statuses {}", numberOfLimitedStatuses);
787 | }
788 |
789 | @Override
790 | public void onException(Exception ex) {
791 | logger.warn("stream failure, restarting stream...", ex);
792 | threadPool.generic().execute(new Runnable() {
793 | @Override
794 | public void run() {
795 | reconnect();
796 | }
797 | });
798 | }
799 | }
800 |
801 | private class UserStreamHandler extends UserStreamAdapter {
802 |
803 | private final StatusHandler statusHandler = new StatusHandler();
804 |
805 | @Override
806 | public void onException(Exception ex) {
807 | statusHandler.onException(ex);
808 | }
809 |
810 | @Override
811 | public void onStatus(Status status) {
812 | statusHandler.onStatus(status);
813 | }
814 |
815 | @Override
816 | public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
817 | statusHandler.onDeletionNotice(statusDeletionNotice);
818 | }
819 | }
820 |
821 | public enum RiverStatus {
822 | UNKNOWN,
823 | INITIALIZED,
824 | STARTING,
825 | RUNNING,
826 | STOPPING,
827 | STOPPED;
828 | }
829 | }
830 |
--------------------------------------------------------------------------------
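The `reconnect()` method in `TwitterRiver` retries a failed stream start by rescheduling itself on the thread pool after a delay. A minimal standalone sketch of that same retry-with-delay shape, using a plain `ScheduledExecutorService` instead of the Elasticsearch `ThreadPool` (class, method, and timing values here are illustrative, not part of the plugin):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectDemo {
    static final ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    static final AtomicInteger attempts = new AtomicInteger();
    static final CountDownLatch done = new CountDownLatch(1);

    // Stand-in for startTwitterStream(): fails twice, succeeds on the third attempt.
    static void startStream() {
        if (attempts.incrementAndGet() < 3) {
            throw new RuntimeException("connect failed");
        }
    }

    static void reconnect() {
        try {
            startStream();
            done.countDown();
        } catch (Exception e) {
            // Same shape as the river: on failure, schedule another attempt after a delay.
            pool.schedule(ReconnectDemo::reconnect, 10, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        reconnect();
        done.await();
        System.out.println("connected after " + attempts.get() + " attempts");
        pool.shutdown();
    }
}
```

The real river additionally checks `riverStatus` before each retry so a closing river stops rescheduling itself.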
/src/main/java/org/elasticsearch/river/twitter/TwitterRiverModule.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter;
21 |
22 | import org.elasticsearch.common.inject.AbstractModule;
23 | import org.elasticsearch.river.River;
24 |
25 | /**
26 | *
27 | */
28 | public class TwitterRiverModule extends AbstractModule {
29 |
30 | @Override
31 | protected void configure() {
32 | bind(River.class).to(TwitterRiver.class).asEagerSingleton();
33 | }
34 | }
35 |
--------------------------------------------------------------------------------
/src/main/resources/es-plugin.properties:
--------------------------------------------------------------------------------
1 | plugin=org.elasticsearch.plugin.river.twitter.TwitterRiverPlugin
2 | version=${project.version}
3 |
--------------------------------------------------------------------------------
/src/test/java/org/elasticsearch/river/twitter/test/AbstractTwitterTest.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.annotations.TestGroup;
23 | import org.elasticsearch.common.base.Predicate;
24 | import org.elasticsearch.test.ElasticsearchIntegrationTest;
25 | import org.elasticsearch.test.ElasticsearchIntegrationTest.ThirdParty;
26 |
27 | import java.lang.annotation.Documented;
28 | import java.lang.annotation.Inherited;
29 | import java.lang.annotation.Retention;
30 | import java.lang.annotation.RetentionPolicy;
31 | import java.util.concurrent.TimeUnit;
32 |
33 | /**
34 |  * Base class for tests that require an internet connection and twitter credentials to run.
35 | * Twitter tests are disabled by default.
36 | *
37 | * To enable test add -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
38 | *
39 | * The elasticsearch.yml file should contain the following keys
40 | *
49 | *
50 | * You need to get an OAuth token in order to use Twitter river.
51 | * Please follow [Twitter documentation](https://dev.twitter.com/docs/auth/tokens-devtwittercom), basically:
52 |  *
53 |  * <ul>
54 |  * <li>Login to: https://dev.twitter.com/apps/</li>
55 |  * <li>Create a new Twitter application (let's say elasticsearch): https://dev.twitter.com/apps/new
56 |  * You don't need a callback URL.</li>
57 |  * <li>When done, click on `Create my access token`.</li>
58 |  * <li>Open `OAuth tool` tab and note `Consumer key`, `Consumer secret`, `Access token` and `Access token secret`.</li>
59 |  * </ul>
60 |  */
61 | @ThirdParty
62 | public abstract class AbstractTwitterTest extends ElasticsearchIntegrationTest {
63 |
64 | /**
65 |      * Repeat a task until it returns true or until a given wait time has elapsed.
66 |      * We use a 1 second delay between two runs.
67 |      * @param breakPredicate test you want to run
68 |      * @param maxWaitTime maximum time you want to wait
69 |      * @param unit time unit used for maxWaitTime
70 |      */
71 |     public static boolean awaitBusy1Second(Predicate<?> breakPredicate, long maxWaitTime, TimeUnit unit) throws InterruptedException {
72 | long maxTimeInMillis = TimeUnit.MILLISECONDS.convert(maxWaitTime, unit);
73 | long sleepTimeInMillis = 1000;
74 | long iterations = maxTimeInMillis / sleepTimeInMillis;
75 | for (int i = 0; i < iterations; i++) {
76 | if (breakPredicate.apply(null)) {
77 | return true;
78 | }
79 | Thread.sleep(sleepTimeInMillis);
80 | }
81 | return breakPredicate.apply(null);
82 | }
83 | }
84 |
--------------------------------------------------------------------------------
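The `awaitBusy1Second` helper above is a simple poll-until-true loop. A standalone sketch of the same loop shape, using `java.util.function.Predicate` in place of the shaded Guava `Predicate` and a configurable sleep so it finishes quickly (class and method names here are illustrative, not part of the plugin):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class AwaitBusyDemo {
    // Same loop shape as awaitBusy1Second: poll until the predicate
    // passes or the total wait time is exhausted.
    static boolean awaitBusy(Predicate<Void> breakPredicate, long maxWaitTime,
                             TimeUnit unit, long sleepMillis) throws InterruptedException {
        long maxTimeInMillis = TimeUnit.MILLISECONDS.convert(maxWaitTime, unit);
        long iterations = maxTimeInMillis / sleepMillis;
        for (long i = 0; i < iterations; i++) {
            if (breakPredicate.test(null)) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        // One final check after the last sleep, as in the original helper.
        return breakPredicate.test(null);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 50ms.
        boolean ok = awaitBusy(v -> System.currentTimeMillis() - start > 50,
                               1, TimeUnit.SECONDS, 10);
        System.out.println(ok);
    }
}
```

The trailing check after the loop matters: without it, a predicate that becomes true during the final sleep interval would be reported as a timeout.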
/src/test/java/org/elasticsearch/river/twitter/test/Twitter4JThreadFilter.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.ThreadFilter;
23 |
24 | /**
25 | * We know that Twitter4J can take a while to close
26 | * This filter will ignore it as a ThreadLeak
27 | */
28 | public class Twitter4JThreadFilter implements ThreadFilter {
29 |
30 | @Override
31 | public boolean reject(Thread t) {
32 | String threadName = t.getName();
33 |
34 | if (threadName.contains("Twitter4J Async Dispatcher")) {
35 | return true;
36 | }
37 |
38 | if (threadName.contains("Twitter Stream consumer")) {
39 | return true;
40 | }
41 |
42 | if (threadName.contains("riverClusterService#updateTask")) {
43 | return true;
44 | }
45 |
46 | return false;
47 | }
48 | }
49 |
--------------------------------------------------------------------------------
/src/test/java/org/elasticsearch/river/twitter/test/TwitterIntegrationTest.java:
--------------------------------------------------------------------------------
1 | /*
2 | * Licensed to Elasticsearch under one or more contributor
3 | * license agreements. See the NOTICE file distributed with
4 | * this work for additional information regarding copyright
5 | * ownership. Elasticsearch licenses this file to you under
6 | * the Apache License, Version 2.0 (the "License"); you may
7 | * not use this file except in compliance with the License.
8 | * You may obtain a copy of the License at
9 | *
10 | * http://www.apache.org/licenses/LICENSE-2.0
11 | *
12 | * Unless required by applicable law or agreed to in writing,
13 | * software distributed under the License is distributed on an
14 | * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15 | * KIND, either express or implied. See the License for the
16 | * specific language governing permissions and limitations
17 | * under the License.
18 | */
19 |
20 | package org.elasticsearch.river.twitter.test;
21 |
22 | import com.carrotsearch.randomizedtesting.annotations.ThreadLeakFilters;
23 | import org.elasticsearch.action.count.CountResponse;
24 | import org.elasticsearch.action.get.GetResponse;
25 | import org.elasticsearch.action.search.SearchPhaseExecutionException;
26 | import org.elasticsearch.action.search.SearchResponse;
27 | import org.elasticsearch.common.Strings;
28 | import org.elasticsearch.common.base.Predicate;
29 | import org.elasticsearch.common.joda.time.DateTime;
30 | import org.elasticsearch.common.settings.Settings;
31 | import org.elasticsearch.common.unit.DistanceUnit;
32 | import org.elasticsearch.common.xcontent.XContentBuilder;
33 | import org.elasticsearch.env.Environment;
34 | import org.elasticsearch.index.query.QueryBuilders;
35 | import org.elasticsearch.indices.IndexAlreadyExistsException;
36 | import org.elasticsearch.indices.IndexMissingException;
37 | import org.elasticsearch.plugins.PluginsService;
38 | import org.elasticsearch.river.twitter.test.helper.HttpClient;
39 | import org.elasticsearch.river.twitter.test.helper.HttpClientResponse;
40 | import org.elasticsearch.search.SearchHit;
41 | import org.elasticsearch.test.ElasticsearchIntegrationTest;
42 | import org.junit.*;
43 | import twitter4j.Status;
44 | import twitter4j.Twitter;
45 | import twitter4j.TwitterException;
46 | import twitter4j.TwitterFactory;
47 | import twitter4j.auth.AccessToken;
48 |
49 | import java.io.IOException;
50 | import java.util.concurrent.TimeUnit;
51 |
52 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_REPLICAS;
53 | import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS;
54 | import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
55 | import static org.hamcrest.CoreMatchers.*;
56 | import static org.hamcrest.Matchers.equalTo;
57 | import static org.hamcrest.Matchers.greaterThan;
58 |
59 | /**
60 | * Integration tests for Twitter river
61 |  * You must have internet access.
62 | *
63 | * Launch it using:
64 | * mvn test -Dtests.thirdparty=true -Dtests.config=/path/to/elasticsearch.yml
65 | *
66 | * where your /path/to/elasticsearch.yml contains:
67 |
68 | river:
69 | twitter:
70 | oauth:
71 | consumer_key: ""
72 | consumer_secret: ""
73 | access_token: ""
74 | access_token_secret: ""
75 |
76 | */
77 | @ElasticsearchIntegrationTest.ClusterScope(
78 | scope = ElasticsearchIntegrationTest.Scope.SUITE,
79 | transportClientRatio = 0.0)
80 | @ThreadLeakFilters(defaultFilters = true, filters = {Twitter4JThreadFilter.class})
81 | public class TwitterIntegrationTest extends AbstractTwitterTest {
82 |
83 | private final String track = "obama";
84 |
85 | @Override
86 | protected Settings nodeSettings(int nodeOrdinal) {
87 | Settings.Builder settings = Settings.builder()
88 | .put(super.nodeSettings(nodeOrdinal))
89 | .put("path.home", createTempDir())
90 | .put("plugins." + PluginsService.LOAD_PLUGIN_FROM_CLASSPATH, true);
91 |
92 | Environment environment = new Environment(settings.build());
93 |
94 | // if explicit, just load it and don't load from env
95 | if (Strings.hasText(System.getProperty("tests.config"))) {
96 | settings.loadFromUrl(environment.resolveConfig(System.getProperty("tests.config")));
97 | }
98 |
99 | return settings.build();
100 | }
101 |
102 | @Before
103 | public void createEmptyRiverIndex() {
104 | // We want to force _river index to use 1 shard 1 replica
105 | client().admin().indices().prepareCreate("_river").setSettings(Settings.builder()
106 | .put(SETTING_NUMBER_OF_SHARDS, 1)
107 | .put(SETTING_NUMBER_OF_REPLICAS, 0)).get();
108 | }
109 |
110 | @After
111 | public void deleteRiverAndWait() throws InterruptedException {
112 | logger.info(" --> delete all");
113 | client().admin().indices().prepareDelete("_all").get();
114 |
115 | assertThat(awaitBusy(new Predicate