├── .gitignore
├── README.md
├── images
│   ├── nexus2.png
│   ├── nexus3.png
│   ├── nexus4.png
│   └── nexus5.png
├── pom.xml
├── run-with-maven.sh
├── run.sh
└── src
    └── main
        ├── java
        │   └── com
        │       └── cloudera
        │           └── example
        │               └── ClouderaImpalaJdbcExample.java
        └── resources
            ├── ClouderaImpalaJdbcExample.conf
            └── log4j.properties
/.gitignore:
--------------------------------------------------------------------------------
1 | target/
2 | .DS_Store
3 | ._.DS_Store
4 | .AppleDouble
5 | .LSOverride
6 | Icon
7 | *.jar
8 |
9 | #Thumbnails
10 | ._*
11 |
12 | .Trashes
13 |
14 | # Eclipse stuff
15 | .classpath
16 | .project
17 | .settings
18 |
19 | # IntelliJ
20 | .idea
21 |
22 | # Vim
23 | *.swp
24 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ### Cloudera Impala JDBC Example
2 |
3 | [Apache Impala (Incubating)](http://www.cloudera.com/products/apache-hadoop/impala.html) is an open source, analytic MPP database for Apache Hadoop.
4 |
5 | This example shows how to build and run a Maven-based project that executes SQL queries on Impala using JDBC.
6 |
7 | This example was tested using Impala 2.3 included with [CDH 5.5.2](http://www.cloudera.com/downloads/cdh/5-5-2.html) and the [Impala JDBC Driver v2.5.30](http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-30.html).
8 |
9 | When you download the Impala JDBC Driver from the link above, it is packaged as a zip file with separate distributions for JDBC3, JDBC4 and JDBC4.1. This example uses the distribution for JDBC4.1 on RHEL6 x86_64. The downloaded zip file contains the following eleven jar files:
10 |
11 | (1) ImpalaJDBC41.jar
12 | (2) TCLIServiceClient.jar
13 | (3) hive_metastore.jar
14 | (4) hive_service.jar
15 | (5) ql.jar
16 | (6) libfb303-0.9.0.jar
17 | (7) libthrift-0.9.0.jar
18 | (8) log4j-1.2.14.jar
19 | (9) slf4j-api-1.5.11.jar
20 | (10) slf4j-log4j12-1.5.11.jar
21 | (11) zookeeper-3.4.6.jar
22 |
23 | The JDBC driver's installation instructions say only that "...you must set the class path to include all the JAR files from the ZIP archive containing the driver that you are using..."
24 |
25 | While this works fine for one-off projects, it's a little loose for shops that would rather manage their dependencies using Maven or other build systems.
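For the one-off approach, the classpath is simply every driver jar joined with the platform's path separator. The following is a minimal sketch of that idea, not part of this project: the class name `ClasspathSketch` and the directory `/opt/impala-jdbc` are hypothetical, and the jar names are the eleven listed above.

```java
import java.io.File;

class ClasspathSketch {

    // Joins a directory and a list of jar file names into one classpath string.
    static String classpath(String driverDir, String[] jarNames, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String jar : jarNames) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(driverDir).append('/').append(jar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The eleven jars from the driver zip, as listed in this README
        String[] jars = {
            "ImpalaJDBC41.jar", "TCLIServiceClient.jar", "hive_metastore.jar",
            "hive_service.jar", "ql.jar", "libfb303-0.9.0.jar", "libthrift-0.9.0.jar",
            "log4j-1.2.14.jar", "slf4j-api-1.5.11.jar", "slf4j-log4j12-1.5.11.jar",
            "zookeeper-3.4.6.jar"
        };
        // "/opt/impala-jdbc" is a hypothetical unzip location
        System.out.println(classpath("/opt/impala-jdbc", jars, File.pathSeparator));
    }
}
```

The printed string could be passed to `java -cp`, which is exactly the manual bookkeeping the Maven setup below is meant to replace.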
26 |
27 | Part of the challenge in building a project using those jars with Maven is that some of the jars are not available in public repos and some of them do not have obvious version numbers. My approach in this example will be to use a local Maven repo to manage the first five jars in the list above and to rely on publicly available Maven repos for jars 6 - 11 (as they have version numbers in their name).
28 | I will use the community version of the [Nexus Repository Manager OSS](http://www.sonatype.org/nexus/go/) as a local Maven repo.
29 |
30 | I downloaded Nexus Repository Manager OSS v2.12 from the link [here](http://www.sonatype.org/nexus/go/) and followed the installation instructions [here](http://books.sonatype.com/nexus-book/reference/installing.html).
31 |
32 | Here is the view of my local Nexus repo available after launching it for the first time. Note there is already a repo named "3rd party" which I will use to manage the first five JDBC driver jars:
33 |
34 | 
35 |
36 | To add jars to the repo, log in to the local Nexus repo, go to the 3rd party repo's "upload artifacts" tab and select the desired jar to upload. I specified a group of "com.cloudera.impala.jdbc" and a version number of "2.5.30" for each of the five jars I uploaded, like this:
37 |
38 | 
39 |
40 | Click on the 3rd party repo's URL link and you can browse the uploaded artifacts:
41 |
42 | 
43 |
44 | Drill into any of the links and you can see the version number has been appended to each jar:
45 |
46 | 
47 |
48 | Now that we have a local repo available hosting the JDBC jars, all we need to do is add that repo to our pom with an entry like this:
49 |
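50 |     <repository>
51 |       <id>YOUR.LOCAL.REPO.ID</id>
52 |       <url>YOUR.LOCAL.REPO.URL</url>
53 |       <name>YOUR.LOCAL.REPO.NAME</name>
54 |       <snapshots>
55 |         <enabled>false</enabled>
56 |       </snapshots>
57 |     </repository>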
58 |
59 | For example, in my case my local repo entry looks like this:
60 |
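61 |     <repository>
62 |       <id>nexus.local</id>
63 |       <url>http://10.10.10.7:8081/nexus/content/repositories/thirdparty</url>
64 |       <name>Nexus Local</name>
65 |       <snapshots>
66 |         <enabled>false</enabled>
67 |       </snapshots>
68 |     </repository>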
69 |
70 | And you can refer to the JDBC artifacts with entries like this:
71 |
72 |     <dependency>
73 |       <groupId>com.cloudera.impala.jdbc</groupId>
74 |       <artifactId>ImpalaJDBC41</artifactId>
75 |       <version>2.5.30</version>
76 |     </dependency>
77 |
78 | Jars 6 - 11 will be retrieved from the Cloudera and Maven Central repos and will have traditional dependency elements like this:
79 |
80 |     <dependency>
81 |       <groupId>org.apache.thrift</groupId>
82 |       <artifactId>libfb303</artifactId>
83 |       <version>0.9.0</version>
84 |     </dependency>
85 |
86 | See pom.xml for details.
87 |
88 |
89 |
90 | #### Dependencies
91 | To build the project you must have Maven 2.x or higher installed. Maven info is [here](http://maven.apache.org).
92 |
93 | To run the project you must have access to a Hadoop cluster running Impala with at least one populated table defined in the Hive Metastore.
94 |
95 |
96 | #### Configuring the example
97 |
98 | Make sure to set your local repo in pom.xml as described above.
99 |
100 | Edit the file src/main/resources/ClouderaImpalaJdbcExample.conf and set an Impala daemon's host and port in the connection.url (Impala's default JDBC port is 21050) and set the appropriate JDBC driver class. I am using JDBC4.1 so my conf file looks like this:
101 |
102 | # ClouderaImpalaJdbcExample.conf
103 | connection.url = jdbc:impala://chicago.onefoursix.com:21050
104 | jdbc.driver.class.name = com.cloudera.impala.jdbc41.Driver
105 |
106 | See the JDBC driver's docs for more details.
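The conf file is a standard Java properties file, so it can be checked before any connection is attempted. The sketch below is illustrative only and is not the project's loader: the class name `ConfSketch` and the fail-fast validation are assumptions added here, while the two keys are the ones used in ClouderaImpalaJdbcExample.conf.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ConfSketch {

    // Keys used in ClouderaImpalaJdbcExample.conf
    static final String URL_KEY = "connection.url";
    static final String DRIVER_KEY = "jdbc.driver.class.name";

    // Parses properties text and fails fast if a required key is missing or blank.
    static Properties parse(String confText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(confText));
        for (String key : new String[] {URL_KEY, DRIVER_KEY}) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing required property: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // IMPALAD_HOST is the placeholder from the shipped conf file
        String conf = "connection.url = jdbc:impala://IMPALAD_HOST:21050\n"
                + "jdbc.driver.class.name = com.cloudera.impala.jdbc41.Driver\n";
        Properties props = parse(conf);
        System.out.println(props.getProperty(URL_KEY));
        System.out.println(props.getProperty(DRIVER_KEY));
    }
}
```

Validating up front turns a missing key into a clear error message instead of a failure deep inside `Class.forName` or `DriverManager.getConnection`.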
107 |
108 |
109 | #### Building the example
110 |
111 | Build the project like this:
112 |
113 | $ mvn clean package
114 |
115 | If this is the first time you are building the project you should see messages like this showing that Maven is retrieving the JDBC jars from your local repo:
116 |
117 | Downloading: http://10.10.10.7:8081/nexus/content/repositories/thirdparty/com/cloudera/impala/jdbc/hive_metastore/2.5.30/hive_metastore-2.5.30.jar
118 | Downloading: http://10.10.10.7:8081/nexus/content/repositories/thirdparty/com/cloudera/impala/jdbc/hive_service/2.5.30/hive_service-2.5.30.jar
119 | Downloading: http://10.10.10.7:8081/nexus/content/repositories/thirdparty/com/cloudera/impala/jdbc/ImpalaJDBC41/2.5.30/ImpalaJDBC41-2.5.30.jar
120 | Downloading: http://10.10.10.7:8081/nexus/content/repositories/thirdparty/com/cloudera/impala/jdbc/ql/2.5.30/ql-2.5.30.jar
121 | Downloading: http://10.10.10.7:8081/nexus/content/repositories/thirdparty/com/cloudera/impala/jdbc/TCLIServiceClient/2.5.30/TCLIServiceClient-2.5.30.jar
122 |
123 | Whereas the other jars (and their dependencies) are downloaded from the public repos:
124 |
125 | Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/thrift/libfb303/0.9.0/libfb303-0.9.0.jar
126 | Downloading: https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/thrift/libthrift/0.9.0/libthrift-0.9.0.jar
127 | ...
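The download URLs above follow the standard Maven repository layout: the dots in the group id become slashes, followed by the artifact id, the version, and a file named artifactId-version.jar. A minimal sketch of that mapping is shown here; the class and method names (`RepoPathSketch`, `jarPath`, `jarUrl`) are hypothetical and exist only for illustration.

```java
class RepoPathSketch {

    // Relative path of a jar inside a Maven repository (standard layout).
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    // Full URL: repository base plus relative path, with exactly one slash between them.
    static String jarUrl(String repoBaseUrl, String groupId, String artifactId, String version) {
        String base = repoBaseUrl.endsWith("/")
                ? repoBaseUrl.substring(0, repoBaseUrl.length() - 1)
                : repoBaseUrl;
        return base + "/" + jarPath(groupId, artifactId, version);
    }

    public static void main(String[] args) {
        // Repository URL and coordinates taken from the examples in this README
        System.out.println(jarUrl(
                "http://10.10.10.7:8081/nexus/content/repositories/thirdparty",
                "com.cloudera.impala.jdbc", "ImpalaJDBC41", "2.5.30"));
    }
}
```

This is why the group and version entered on the Nexus upload form must match the `groupId` and `version` in pom.xml exactly; otherwise Maven requests a path that does not exist in the repo.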
128 |
129 | If your build is successful you should see messages like this:
130 |
131 | [INFO] Building jar: /home/mark/a/Cloudera-Impala-JDBC-Example-impala-cdh-5.5.2/cloudera-impala-jdbc-example-1.0.jar
132 | [INFO]
133 | [INFO] --- maven-shade-plugin:2.2:shade (default) @ cloudera-impala-jdbc-example ---
134 | [INFO] Including com.cloudera.impala.jdbc:hive_metastore:jar:2.5.30 in the shaded jar.
135 | [INFO] Including com.cloudera.impala.jdbc:hive_service:jar:2.5.30 in the shaded jar.
136 | ...
137 | [INFO] ------------------------------------------------------------------------
138 | [INFO] BUILD SUCCESS
139 | [INFO] ------------------------------------------------------------------------
140 | [INFO] Total time: 3.108 s
141 | [INFO] Finished at: 2016-02-21T11:24:56-08:00
142 | [INFO] Final Memory: 32M/476M
143 | [INFO] ------------------------------------------------------------------------
144 |
145 | Note that pom.xml is configured to have Maven build an "uber jar" with all dependencies packaged in a single jar and with the main class set.
146 |
147 | The uber jar will be located at target/cloudera-impala-jdbc-example-uber.jar.
148 |
149 |
150 | #### Running the example using the uber jar
151 |
152 | One can run the example using the uber jar with a "java -jar" command with a SQL statement as an argument like this:
153 |
154 | $ java -jar target/cloudera-impala-jdbc-example-uber.jar "SELECT description FROM sample_07 limit 10"
155 |
156 | =============================================
157 | Cloudera Impala JDBC Example
158 | Using Connection URL: jdbc:impala://chicago.onefoursix.com:21050
159 | Running Query: SELECT description FROM sample_07 limit 10
160 |
161 | == Begin Query Results ======================
162 | All Occupations
163 | Management occupations
164 | Chief executives
165 | General and operations managers
166 | Legislators
167 | Advertising and promotions managers
168 | Marketing managers
169 | Sales managers
170 | Public relations managers
171 | Administrative services managers
172 | == End Query Results =======================
173 |
174 | The provided "run.sh" script contains that command.
175 |
176 | #### Running the example using Maven
177 |
178 | One can also run the example with Maven via the run-with-maven.sh script, which by default passes a SQL statement as an argument:
179 |
180 | mvn exec:java -Dexec.mainClass=com.cloudera.example.ClouderaImpalaJdbcExample -Dexec.arguments="SELECT description FROM sample_07 limit 10"
181 |
182 | Your output should look like this:
183 |
184 | $ ./run-with-maven.sh
185 | [INFO] Scanning for projects...
186 | ...
187 | [INFO] ------------------------------------------------------------------------
188 | [INFO] Building cloudera-impala-jdbc-example 1.0
189 | [INFO] ------------------------------------------------------------------------
190 | [INFO]
191 | [INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @ cloudera-impala-jdbc-example >>>
192 | [INFO]
193 | [INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @ cloudera-impala-jdbc-example <<<
194 | [INFO]
195 | [INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ cloudera-impala-jdbc-example ---
196 |
197 | Cloudera Impala JDBC Example
198 | Using Connection URL: jdbc:impala://chicago.onefoursix.com:21050
199 | Running Query: SELECT description FROM sample_07 limit 10
200 |
201 | == Begin Query Results ======================
202 | All Occupations
203 | Management occupations
204 | Chief executives
205 | General and operations managers
206 | Legislators
207 | Advertising and promotions managers
208 | Marketing managers
209 | Sales managers
210 | Public relations managers
211 | Administrative services managers
212 | == End Query Results =======================
213 |
214 | [INFO] ------------------------------------------------------------------------
215 | [INFO] BUILD SUCCESS
216 | [INFO] ------------------------------------------------------------------------
217 |
--------------------------------------------------------------------------------
/images/nexus2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/onefoursix/Cloudera-Impala-JDBC-Example/6399d6de25b75ff732f655158b8b2f4b375a5b3c/images/nexus2.png
--------------------------------------------------------------------------------
/images/nexus3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/onefoursix/Cloudera-Impala-JDBC-Example/6399d6de25b75ff732f655158b8b2f4b375a5b3c/images/nexus3.png
--------------------------------------------------------------------------------
/images/nexus4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/onefoursix/Cloudera-Impala-JDBC-Example/6399d6de25b75ff732f655158b8b2f4b375a5b3c/images/nexus4.png
--------------------------------------------------------------------------------
/images/nexus5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/onefoursix/Cloudera-Impala-JDBC-Example/6399d6de25b75ff732f655158b8b2f4b375a5b3c/images/nexus5.png
--------------------------------------------------------------------------------
/pom.xml:
--------------------------------------------------------------------------------
1 |
4 |
5 | 4.0.0
6 | com.cloudera.example
7 | cloudera-impala-jdbc-example
8 | 1.0
9 | jar
10 | Cloudera Impala JDBC Example for CDH 5.5.2
11 |
12 |
13 | UTF-8
14 | 2.5.30
15 | cloudera-impala-jdbc-example-uber.jar
16 | com.cloudera.example.ClouderaImpalaJdbcExample
17 |
18 |
19 |
20 |
21 |
22 |
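23 |     <dependency>
24 |       <groupId>com.cloudera.impala.jdbc</groupId>
25 |       <artifactId>hive_metastore</artifactId>
26 |       <version>${impala.jdbc.version}</version>
27 |     </dependency>
28 |     <dependency>
29 |       <groupId>com.cloudera.impala.jdbc</groupId>
30 |       <artifactId>hive_service</artifactId>
31 |       <version>${impala.jdbc.version}</version>
32 |     </dependency>
33 |     <dependency>
34 |       <groupId>com.cloudera.impala.jdbc</groupId>
35 |       <artifactId>ImpalaJDBC41</artifactId>
36 |       <version>${impala.jdbc.version}</version>
37 |     </dependency>
38 |     <dependency>
39 |       <groupId>com.cloudera.impala.jdbc</groupId>
40 |       <artifactId>ql</artifactId>
41 |       <version>${impala.jdbc.version}</version>
42 |     </dependency>
43 |     <dependency>
44 |       <groupId>com.cloudera.impala.jdbc</groupId>
45 |       <artifactId>TCLIServiceClient</artifactId>
46 |       <version>${impala.jdbc.version}</version>
47 |     </dependency>
48 |
49 |
50 |
51 |     <dependency>
52 |       <groupId>org.apache.thrift</groupId>
53 |       <artifactId>libfb303</artifactId>
54 |       <version>0.9.0</version>
55 |     </dependency>
56 |     <dependency>
57 |       <groupId>org.apache.thrift</groupId>
58 |       <artifactId>libthrift</artifactId>
59 |       <version>0.9.0</version>
60 |     </dependency>
61 |     <dependency>
62 |       <groupId>log4j</groupId>
63 |       <artifactId>log4j</artifactId>
64 |       <version>1.2.14</version>
65 |     </dependency>
66 |     <dependency>
67 |       <groupId>org.slf4j</groupId>
68 |       <artifactId>slf4j-api</artifactId>
69 |       <version>1.5.11</version>
70 |     </dependency>
71 |     <dependency>
72 |       <groupId>org.slf4j</groupId>
73 |       <artifactId>slf4j-log4j12</artifactId>
74 |       <version>1.5.11</version>
75 |     </dependency>
76 |     <dependency>
77 |       <groupId>org.apache.zookeeper</groupId>
78 |       <artifactId>zookeeper</artifactId>
79 |       <version>3.4.6</version>
80 |     </dependency>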
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 |
89 | org.codehaus.mojo
90 | exec-maven-plugin
91 | 1.2.1
92 |
93 |
94 |
95 |
96 |
97 |
98 | org.apache.maven.plugins
99 | maven-compiler-plugin
100 | 2.3.2
101 |
102 | 1.6
103 | 1.6
104 |
105 |
106 |
107 | org.apache.maven.plugins
108 | maven-jar-plugin
109 | 2.4
110 |
111 | ${basedir}
112 |
113 |
114 |
115 | maven-clean-plugin
116 | 2.6.1
117 |
118 |
119 |
120 | .
121 |
122 | *.jar
123 |
124 | false
125 |
126 |
127 |
128 |
129 |
130 |
131 | org.apache.maven.plugins
132 | maven-shade-plugin
133 | 2.2
134 |
135 | false
136 | target/${uber.jar.name}
137 |
138 |
139 | *:*
140 |
141 |
142 |
143 |
144 | *:*
145 |
146 | META-INF/*.SF
147 | META-INF/*.DSA
148 | META-INF/*.RSA
149 |
150 |
151 |
152 |
153 |
154 |
155 | package
156 |
157 | shade
158 |
159 |
160 |
161 |
162 |
163 | reference.conf
164 |
165 |
166 | ${uber.jar.main.class}
167 |
168 |
169 |
170 |
171 |
172 |
173 |
174 |
175 |
176 |
177 |
178 |
179 |
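180 |     <repository>
181 |       <id>YOUR.LOCAL.REPO.ID</id>
182 |       <url>YOUR.LOCAL.REPO.URL</url>
183 |       <name>YOUR.LOCAL.REPO.NAME</name>
184 |       <snapshots>
185 |         <enabled>false</enabled>
186 |       </snapshots>
187 |     </repository>
188 |
189 |     <repository>
190 |       <id>cdh.repo</id>
191 |       <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
192 |       <name>Cloudera Repositories</name>
193 |       <snapshots>
194 |         <enabled>false</enabled>
195 |       </snapshots>
196 |     </repository>
197 |
198 |     <repository>
199 |       <id>central</id>
200 |       <url>http://repo1.maven.org/maven2/</url>
201 |       <releases>
202 |         <enabled>true</enabled>
203 |       </releases>
204 |       <snapshots>
205 |         <enabled>false</enabled>
206 |       </snapshots>
207 |     </repository>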
208 |
209 |
210 |
211 |
212 |
--------------------------------------------------------------------------------
/run-with-maven.sh:
--------------------------------------------------------------------------------
1 | mvn exec:java -Dexec.mainClass=com.cloudera.example.ClouderaImpalaJdbcExample -Dexec.arguments="SELECT description FROM sample_07 limit 10"
2 |
--------------------------------------------------------------------------------
/run.sh:
--------------------------------------------------------------------------------
1 | java -jar target/cloudera-impala-jdbc-example-uber.jar "SELECT description FROM sample_07 limit 10"
--------------------------------------------------------------------------------
/src/main/java/com/cloudera/example/ClouderaImpalaJdbcExample.java:
--------------------------------------------------------------------------------
1 | package com.cloudera.example;
2 |
3 | import java.io.IOException;
4 | import java.io.InputStream;
5 | import java.sql.Connection;
6 | import java.sql.DriverManager;
7 | import java.sql.ResultSet;
8 | import java.sql.SQLException;
9 | import java.sql.Statement;
10 | import java.util.Properties;
11 |
12 | public class ClouderaImpalaJdbcExample {
13 |
14 |     private static final String CONNECTION_URL_PROPERTY = "connection.url";
15 |     private static final String JDBC_DRIVER_NAME_PROPERTY = "jdbc.driver.class.name";
16 |
17 |     private static String connectionUrl;
18 |     private static String jdbcDriverName;
19 |
20 |     // Reads the connection URL and driver class name from the .conf file on the classpath
21 |     private static void loadConfiguration() throws IOException {
22 |         InputStream input = null;
23 |         try {
24 |             String filename = ClouderaImpalaJdbcExample.class.getSimpleName() + ".conf";
25 |             input = ClouderaImpalaJdbcExample.class.getClassLoader().getResourceAsStream(filename);
26 |             if (input == null) {
27 |                 throw new IOException("Configuration file not found on classpath: " + filename);
28 |             }
29 |             Properties prop = new Properties();
30 |             prop.load(input);
31 |
32 |             connectionUrl = prop.getProperty(CONNECTION_URL_PROPERTY);
33 |             jdbcDriverName = prop.getProperty(JDBC_DRIVER_NAME_PROPERTY);
34 |         } finally {
35 |             try {
36 |                 if (input != null)
37 |                     input.close();
38 |             } catch (IOException e) {
39 |                 // nothing to do
40 |             }
41 |         }
42 |     }
43 |
44 |     public static void main(String[] args) throws IOException {
45 |
46 |         if (args.length != 1) {
47 |             System.out.println("Syntax: ClouderaImpalaJdbcExample \"\"");
48 |             System.exit(1);
49 |         }
50 |         String sqlStatement = args[0];
51 |
52 |         loadConfiguration();
53 |
54 |         System.out.println("\n=============================================");
55 |         System.out.println("Cloudera Impala JDBC Example");
56 |         System.out.println("Using Connection URL: " + connectionUrl);
57 |         System.out.println("Running Query: " + sqlStatement);
58 |
59 |         Connection con = null;
60 |         Statement stmt = null;
61 |         ResultSet rs = null;
62 |
63 |         try {
64 |             // Load the JDBC driver class named in the conf file
65 |             Class.forName(jdbcDriverName);
66 |
67 |             con = DriverManager.getConnection(connectionUrl);
68 |             stmt = con.createStatement();
69 |             rs = stmt.executeQuery(sqlStatement);
70 |
71 |             System.out.println("\n== Begin Query Results ======================");
72 |
73 |             // print the results to the console
74 |             while (rs.next()) {
75 |                 // the example query returns one String column
76 |                 System.out.println(rs.getString(1));
77 |             }
78 |
79 |             System.out.println("== End Query Results =======================\n\n");
80 |
81 |         } catch (SQLException e) {
82 |             e.printStackTrace();
83 |         } catch (Exception e) {
84 |             e.printStackTrace();
85 |         } finally {
86 |             // Close in reverse order of creation; each may be null if an earlier step failed
87 |             try { if (rs != null) rs.close(); } catch (SQLException e) { /* ignore */ }
88 |             try { if (stmt != null) stmt.close(); } catch (SQLException e) { /* ignore */ }
89 |             try { if (con != null) con.close(); } catch (SQLException e) { /* ignore */ }
90 |         }
91 |     }
92 | }
93 |
--------------------------------------------------------------------------------
/src/main/resources/ClouderaImpalaJdbcExample.conf:
--------------------------------------------------------------------------------
1 | connection.url = jdbc:impala://IMPALAD_HOST:21050
2 | jdbc.driver.class.name = com.cloudera.impala.jdbc41.Driver
3 |
--------------------------------------------------------------------------------
/src/main/resources/log4j.properties:
--------------------------------------------------------------------------------
1 | # Root logger option
2 | log4j.rootLogger=INFO, stdout
3 |
4 | # Direct log messages to stdout
5 | log4j.appender.stdout=org.apache.log4j.ConsoleAppender
6 | log4j.appender.stdout.Target=System.out
7 | log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
8 | log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
--------------------------------------------------------------------------------