├── .gitignore
├── README.md
├── bin
│   ├── cloud-local.sh
│   ├── config.sh
│   └── ports.sh
├── conf
│   └── cloud-local.conf
└── templates
    ├── accumulo
    │   ├── accumulo-env.sh
    │   └── accumulo-site.xml
    ├── hadoop
    │   ├── core-site.xml
    │   ├── hdfs-site.xml
    │   ├── mapred-site.xml
    │   └── yarn-site.xml
    ├── hbase
    │   └── hbase-site.xml
    ├── kafka
    │   └── server.properties
    ├── zeppelin
    │   └── zeppelin-env.sh
    └── zookeeper
        └── zoo.cfg
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | /pkg
3 | data/*
4 | *.tar.gz
5 | /accumulo*
6 | /hbase*
7 | /hadoop*
8 | /zookeeper*
9 | /zeppelin*
10 | derby*
11 | /metastore_db*
12 | /kafka*
13 | /spark*
14 | /scala*
15 | /geomesa*
16 | /scala*
17 | /.idea
18 | /cloud-local.iml
19 | *.iml
20 | .envrc
21 |
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # cloud-local
2 |
3 | Cloud-local is a collection of bash scripts that sets up a single-node cloud on your desktop, laptop, or NUC. Performance should be sufficient for testing things like map-reduce ingest, running converters against real files, and exercising your own geoserver/iterator stack. The setup is preconfigured to run YARN, so you can submit map-reduce jobs (for example from the command-line tools) to it. Currently localhost ssh is NOT required, so it will work on a NUC.
4 |
5 | Cloud Local can be used to run single node versions of the following software:
6 | * Hadoop HDFS
7 | * YARN
8 | * Accumulo
9 | * HBase
10 | * Spark (on yarn)
11 | * Kafka
12 | * GeoMesa
13 |
14 | ## Versions and Branches
15 |
16 | The main branch requires Hadoop 3.x and Accumulo 2.0, while the hadoop2 branch is based on Hadoop 2.x and Accumulo 1.x.
17 |
18 | ## Initial Configuration
19 |
20 | A proxy server can be configured using the standard `http_proxy` env var or the cloud-local-specific `cl_http_proxy` env var. The cloud-local-specific variable takes precedence.
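
For example, to route the package downloads through a (hypothetical) proxy at `proxy.example.com:3128`:

    export cl_http_proxy=http://proxy.example.com:3128
    bin/cloud-local.sh init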
21 |
22 | ## Getting Started
23 |
24 | To prepare for the first run of cloud-local, you may need to `unset` environment variables such as `HADOOP_HOME`, `ACCUMULO_HOME`, `ZOOKEEPER_HOME`, `HBASE_HOME`, and others. If `env | grep -i hadoop` comes back empty, you should be good to go. You should also `kill` any instances of zookeeper or hadoop running locally; find them with `jps -lV`.
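
For example, a minimal pre-flight check might look like:

    unset HADOOP_HOME ACCUMULO_HOME ZOOKEEPER_HOME HBASE_HOME
    env | grep -i hadoop   # should print nothing
    jps -lV                # list any stray zookeeper/hadoop JVMs to kill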
25 |
26 | When using this the first time...
27 |
28 | git clone git@github.com:ccri/cloud-local.git
29 | cd cloud-local
30 | bin/cloud-local.sh init
31 |
32 | By default only HDFS is started. Edit `conf/cloud-local.conf` to enable Accumulo, HBase, Kafka, GeoMesa, Spark, and/or Zeppelin.
33 |
34 | Cloud-local sets a default Accumulo instance name of "local" and password of "secret", which can be modified by editing the `conf/cloud-local.conf` file. If you want to change these after the cloud is initialized, you'll need to stop, clean, and reconfigure.
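
The relevant settings in `conf/cloud-local.conf` are:

    cl_acc_inst_name="local"
    cl_acc_inst_pass="secret"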
35 |
36 | This init script does several things:
37 | * set up the HDFS configuration files
38 | * format the HDFS namenode
39 | * create a user homedir in hdfs
40 | * initialize accumulo/hbase
41 | * start up zookeeper, hadoop, and accumulo/hbase
42 | * start kafka broker
43 | * install and start Zeppelin
44 | * install GeoMesa Accumulo iterators
45 | * install GeoMesa command-line tools
46 |
47 | After running `init`, source the environment variables in your bashrc or other shell:
48 |
49 | source bin/config.sh
50 |
51 | Now you should have the environment vars set:
52 |
53 | env | grep -i hadoop
54 |
55 | Now you can run fun commands like:
56 |
57 | hadoop fs -ls /
58 | accumulo shell -u root
59 |
60 | After installation you should be able to reach the standard cloud URLs:
61 |
62 | * Accumulo monitor: http://localhost:9995
63 | * Hadoop DFS: http://localhost:50070
64 | * Job Tracker: http://localhost:8088
65 | * Zeppelin: http://localhost:5771
66 |
67 | ## Getting Help
68 |
69 | Options for using `cloud-local.sh` can be found by calling:
70 |
71 | bin/cloud-local.sh help
72 |
73 | You can also set the `CL_VERBOSE=1` env variable in `conf/cloud-local.conf` to increase message verbosity.
74 |
75 | ## Stopping and Starting
76 |
77 | You can safely stop the cloud using:
78 |
79 | bin/cloud-local.sh stop
80 |
81 | You should stop the cloud before shutting down the machine or doing maintenance.
82 |
83 | You can start the cloud back up using the analogous `start` option. Be sure that the cloud is not running (hit the cloud urls or `ps aux|grep -i hadoop`).
84 |
85 | bin/cloud-local.sh start
86 |
87 | If other processes are already bound to the ports needed by cloud-local, an error message will be printed and the script will stop.
88 |
89 | ## Changing Ports, Hostname, and Bind Address
90 |
91 | cloud-local allows you to modify the ports, hostname, and bind addresses in configuration or using variables in your env (bashrc). For example:
92 |
93 | # sample .bashrc configuration
94 |
95 | # offset all ports by 10000
96 | export CL_PORT_OFFSET=10000
97 |
98 | # change the bind address
99 | export CL_BIND_ADDRESS=192.168.2.2
100 |
101 | # change the hostname from localhost to something else
102 | export CL_HOSTNAME=mydns.mycompany.com
103 |
104 | Port offsetting moves the entire port space by a given numerical amount in order to allow multiple cloud-local instances to run on a single machine (usually by different users). The bind address and hostname args allow you to reference cloud-local from other machines.
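
For instance, with `CL_PORT_OFFSET=10000` and the default ports used by `bin/ports.sh`, services move as follows:

    zookeeper client port    2181 -> 12181
    hdfs namenode            9000 -> 19000
    yarn resourcemanager ui  8088 -> 18088
    accumulo monitor         9995 -> 19995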
105 |
106 | WARNING - you should stop and clean cloud-local before changing any of these parameters since they will modify the config and may prevent cloud-local from cleanly shutting down. Changing port offsets is supported by XML comments in the accumulo and hadoop config files. Removing or changing these comments (CL_port_default) will likely cause failures.
107 |
108 | ## GeoServer
109 |
110 | If you have the environment variable `GEOSERVER_HOME` set, you can use the `-gs` parameter to start GeoServer at the same time, running as a child process.
111 |
112 | bin/cloud-local.sh start -gs
113 |
114 | Similarly, you can instruct cloud-local to shutdown GeoServer with the cloud using:
115 |
116 | bin/cloud-local.sh stop -gs
117 |
118 | Additionally, if you need to restart GeoServer you may use the command `regeoserver`:
119 |
120 | bin/cloud-local.sh regeoserver
121 |
122 | The GeoServer PID is stored in `$CLOUD_HOME/data/geoserver/pid/geoserver.pid` and GeoServer's stdout is redirected to `$CLOUD_HOME/data/geoserver/log/std.out`.
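
For example, to check on a GeoServer instance started this way:

    cat $CLOUD_HOME/data/geoserver/pid/geoserver.pid
    tail -f $CLOUD_HOME/data/geoserver/log/std.out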
123 |
124 | ## Zeppelin
125 |
126 | Zeppelin is *disabled* by default.
127 |
128 | Currently, we are using the Zeppelin distribution that includes all of the interpreters, and
129 | it is configured to run against Spark only in local mode. If you want to connect to another
130 | (real) cloud, you will have to configure that manually; see:
131 |
132 | [Zeppelin documentation](http://zeppelin.apache.org/docs/0.7.0/install/spark_cluster_mode.html#spark-on-yarn-mode)
133 |
134 | ### GeoMesa Spark-SQL on Zeppelin
135 |
136 | To enable GeoMesa's Spark-SQL within Zeppelin:
137 |
138 | 1. point your browser to your [local Zeppelin interpreter configuration](http://localhost:5771/#/interpreter)
139 | 1. scroll to the bottom where the *Spark* interpreter configuration appears
140 | 1. click on the "edit" button next to the interpreter name (on the right-hand side of the UI)
141 | 1. within the _Dependencies_ section, add this one JAR (either as a full, local file name or as Maven GAV coordinates):
142 | 1. geomesa-accumulo-spark-runtime_2.11-1.3.0.jar
143 | 1. when prompted by the pop-up, click to restart the Spark interpreter
144 |
145 | That's it! There is no need to restart any of the cloud-local services.
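
If you would rather use Maven GAV coordinates than a local file, the coordinates corresponding to that jar should be (assuming the GeoMesa group id used elsewhere in this repo):

    org.locationtech.geomesa:geomesa-accumulo-spark-runtime_2.11:1.3.0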
146 |
147 | ## Maintenance
148 |
149 | The `cloud-local.sh` script provides options for maintenance. It is best to stop the cloud before performing any of these tasks. Pass in the parameter `clean` to remove software (but not the tar.gz's) and data. The parameter `reconfigure` will first `clean` and then `init`.
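
For example:

    bin/cloud-local.sh stop
    bin/cloud-local.sh clean        # removes software and data, keeps the downloaded tar.gz's
    bin/cloud-local.sh reconfigure  # clean followed by a fresh init (re-using the downloads)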
150 |
151 | ### Updating
152 |
153 | When this git repo is updated, follow the steps below. Note that these steps will remove your data.
154 |
155 | cd $CLOUD_HOME
156 | bin/cloud-local.sh stop
157 | bin/cloud-local.sh clean
158 | git pull
159 | bin/cloud-local.sh init
160 |
161 | ### Starting over
162 |
163 | If you foobar your cloud, you can just delete everything and start over. You should do this once a week or so just for good measure.
164 |
165 | cd $CLOUD_HOME
166 | bin/cloud-local.sh stop #if cloud is running
167 | rm -rf *
168 | git pull
169 | git reset --hard
170 | bin/cloud-local.sh init
171 |
172 | ## Virtual Machine Help
173 |
174 | If you are using cloud-local within a virtual machine running on your local box, you may want to set up port forwarding for port 9995 to see the Accumulo monitor. For VirtualBox, go to the VM's Settings->Network->Port Forwarding section (name=accumulo, protocol=TCP, Host IP=127.0.0.1, Host Port=9995, Guest IP=(leave blank), Guest Port=9995).
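
If you prefer the VirtualBox command line, something like the following should work (assuming a VM named `cloud-local-vm`; adjust the name and ports to match your setup):

    VBoxManage modifyvm "cloud-local-vm" --natpf1 "accumulo,tcp,127.0.0.1,9995,,9995"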
175 |
--------------------------------------------------------------------------------
/bin/cloud-local.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | REPO_BASE=https://repo.locationtech.org/content/repositories/geomesa-releases
4 |
5 | # thanks accumulo for these resolutions snippets
6 | # Start: Resolve Script Directory
7 | SOURCE="${BASH_SOURCE[0]}"
8 | while [[ -h "${SOURCE}" ]]; do # resolve $SOURCE until the file is no longer a symlink
9 | bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )"
10 | SOURCE="$(readlink "${SOURCE}")"
11 | [[ "${SOURCE}" != /* ]] && SOURCE="${bin}/${SOURCE}" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
12 | done
13 | bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )"
14 | script=$( basename "${SOURCE}" )
15 | # Stop: Resolve Script Directory
16 |
17 | # Start: config
18 | . "${bin}"/config.sh
19 |
20 | # Check config
21 | if ! validate_config; then
22 | echo "Invalid configuration"
23 | exit 1
24 | fi
25 |
26 | # check java home
27 | if [[ -z "$JAVA_HOME" ]]; then
28 | echo "must set JAVA_HOME..."
29 | exit 1
30 | fi
31 | # Stop: config
32 |
33 | # import port checking
34 | . "${bin}"/ports.sh
35 |
36 | function download_packages {
37 | # Is the pre-download packages variable set?
38 | if [[ ! -z ${pkg_pre_download+x} ]]; then
39 | # Does that folder actually exist?
40 | if [[ -d ${pkg_pre_download} ]] ; then
41 | test -d ${CLOUD_HOME}/pkg && rmdir ${CLOUD_HOME}/pkg # remove an empty pkg dir so the symlink can replace it
42 | test -h ${CLOUD_HOME}/pkg && rm ${CLOUD_HOME}/pkg
43 | ln -s ${pkg_pre_download} ${CLOUD_HOME}/pkg
44 | echo "Skipping downloads... using ${pkg_pre_download}"
45 | return 0
46 | fi
47 | fi
48 |
49 | # get stuff
50 | echo "Downloading packages from internet..."
51 | test -d ${CLOUD_HOME}/pkg || mkdir ${CLOUD_HOME}/pkg
52 |
53 | # check for proxy
54 | if [[ ! -z ${cl_http_proxy+x} ]]; then
55 | export http_proxy="${cl_http_proxy}"
56 | fi
57 |
58 | if [[ ! -z ${http_proxy+x} ]]; then
59 | echo "Using proxy ${http_proxy}"
60 | fi
61 |
62 | # GeoMesa
63 | if [[ "${geomesa_enabled}" -eq "1" ]]; then
64 | gm="geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz"
65 | url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}"
66 | wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; };
67 | gm="geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar"
68 | url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}"
69 | wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; };
70 | fi
71 |
72 | # Scala
73 | if [[ "${scala_enabled}" -eq "1" ]]; then
74 | url="http://downloads.lightbend.com/scala/${pkg_scala_ver}/scala-${pkg_scala_ver}.tgz"
75 | file="${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz"
76 | wget -c -O "${file}" "${url}" \
77 | || { rm -f "${file}"; echo "Error downloading: ${file}"; errorList="${errorList} scala-${pkg_scala_ver}.tgz ${NL}"; };
78 | fi
79 |
80 | local apache_archive_url="http://archive.apache.org/dist"
81 |
82 | local maven=${pkg_src_maven}
83 |
84 | declare -a urls=("${apache_archive_url}/hadoop/common/hadoop-${pkg_hadoop_ver}/hadoop-${pkg_hadoop_ver}.tar.gz"
85 | "${apache_archive_url}/zookeeper/zookeeper-${pkg_zookeeper_ver}/zookeeper-${pkg_zookeeper_ver}.tar.gz")
86 |
87 | if [[ "$spark_enabled" -eq 1 ]]; then
88 | urls=("${urls[@]}" "${apache_archive_url}/spark/spark-${pkg_spark_ver}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz")
89 | fi
90 |
91 | if [[ "$kafka_enabled" -eq 1 ]]; then
92 | urls=("${urls[@]}" "${apache_archive_url}/kafka/${pkg_kafka_ver}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz")
93 | fi
94 |
95 | if [[ "$acc_enabled" -eq 1 ]]; then
96 | urls=("${urls[@]}" "${maven}/org/apache/accumulo/accumulo/${pkg_accumulo_ver}/accumulo-${pkg_accumulo_ver}-bin.tar.gz")
97 | fi
98 |
99 | if [[ "$hbase_enabled" -eq 1 ]]; then
100 | urls=("${urls[@]}" "${apache_archive_url}/hbase/${pkg_hbase_ver}/hbase-${pkg_hbase_ver}-bin.tar.gz")
101 | fi
102 |
103 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
104 | urls=("${urls[@]}" "${apache_archive_url}/zeppelin/zeppelin-${pkg_zeppelin_ver}/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz")
105 | fi
106 |
107 | for x in "${urls[@]}"; do
108 | fname=$(basename "$x");
109 | echo "fetching ${x}";
110 | wget -c -O "${CLOUD_HOME}/pkg/${fname}" "$x" || { rm -f "${CLOUD_HOME}/pkg/${fname}"; echo "Error Downloading: ${fname}"; errorList="${errorList} ${x} ${NL}"; };
111 | done
112 |
113 | if [[ -n "${errorList}" ]]; then
114 | echo "Failed to download: ${NL} ${errorList}";
115 | fi
116 | }
117 |
118 | function unpackage {
119 | local targs
120 | if [[ "${CL_VERBOSE}" == "1" ]]; then
121 | targs="xvf"
122 | else
123 | targs="xf"
124 | fi
125 |
126 | echo "Unpackaging software..."
127 | [[ "${geomesa_enabled}" -eq "1" ]] \
128 | && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz") \
129 | && echo "Unpacked GeoMesa Tools"
130 | [[ "${scala_enabled}" -eq "1" ]] \
131 | && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz") \
132 | && echo "Unpacked Scala ${pkg_scala_ver}"
133 | (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zookeeper-${pkg_zookeeper_ver}.tar.gz") && echo "Unpacked zookeeper"
134 | [[ "$acc_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/accumulo-${pkg_accumulo_ver}-bin.tar.gz") && echo "Unpacked accumulo"
135 | [[ "$hbase_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hbase-${pkg_hbase_ver}-bin.tar.gz") && echo "Unpacked hbase"
136 | [[ "$zeppelin_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz") && echo "Unpacked zeppelin"
137 | [[ "$kafka_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz") && echo "Unpacked kafka"
138 | [[ "$spark_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz") && echo "Unpacked spark"
139 | (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hadoop-${pkg_hadoop_ver}.tar.gz") && echo "Unpacked hadoop"
140 | }
141 |
142 | function configure {
143 | mkdir -p "${CLOUD_HOME}/tmp/staging"
144 | cp -r ${CLOUD_HOME}/templates/* ${CLOUD_HOME}/tmp/staging/
145 |
146 | # accumulo config before substitutions
147 | [[ "$acc_enabled" -eq 1 ]] && cp $ACCUMULO_HOME/conf/examples/3GB/standalone/* $ACCUMULO_HOME/conf/
148 |
149 | ## Substitute env vars
150 | sed -i~orig "s#LOCAL_CLOUD_PREFIX#${CLOUD_HOME}#;s#CLOUD_LOCAL_HOSTNAME#${CL_HOSTNAME}#;s#CLOUD_LOCAL_BIND_ADDRESS#${CL_BIND_ADDRESS}#" ${CLOUD_HOME}/tmp/staging/*/*
151 |
152 | if [[ "$acc_enabled" -eq 1 ]]; then
153 | # accumulo config
154 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/gc
155 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/masters
156 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tservers
157 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/monitor
158 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tracers
159 | fi
160 |
161 | if [[ "$hbase_enabled" -eq 1 ]]; then
162 | sed -i~orig "s/\# export HBASE_MANAGES_ZK=true/export HBASE_MANAGES_ZK=false/" "${HBASE_HOME}/conf/hbase-env.sh"
163 | echo "${CL_HOSTNAME}" > ${HBASE_HOME}/conf/regionservers
164 | fi
165 |
166 | # Zeppelin configuration
167 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
168 | echo "[WARNING] Zeppelin configuration is only template-based for now!"
169 | fi
170 |
171 | # hadoop slaves file
172 | echo "${CL_HOSTNAME}" > ${CLOUD_HOME}/tmp/staging/hadoop/slaves
173 |
174 | # deploy from staging
175 | echo "Deploying config from staging..."
176 | test -d $HADOOP_CONF_DIR || mkdir $HADOOP_CONF_DIR
177 | test -d $ZOOKEEPER_HOME/conf || mkdir $ZOOKEEPER_HOME/conf
178 | [[ "$kafka_enabled" -eq 1 ]] && (test -d $KAFKA_HOME/config || mkdir $KAFKA_HOME/config)
179 | cp ${CLOUD_HOME}/tmp/staging/hadoop/* $HADOOP_CONF_DIR/
180 | cp ${CLOUD_HOME}/tmp/staging/zookeeper/* $ZOOKEEPER_HOME/conf/
181 | [[ "$kafka_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/kafka/* $KAFKA_HOME/config/
182 | [[ "$acc_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/accumulo/* ${ACCUMULO_HOME}/conf/
183 | [[ "$geomesa_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/pkg/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar ${ACCUMULO_HOME}/lib/ext/
184 | [[ "$hbase_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/hbase/* ${HBASE_HOME}/conf/
185 | [[ "$zeppelin_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/zeppelin/* ${ZEPPELIN_HOME}/conf/
186 |
187 | # If Spark doesn't have log4j settings, use the Spark defaults
188 | if [[ "$spark_enabled" -eq 1 ]]; then
189 | test -f $SPARK_HOME/conf/log4j.properties || cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
190 | fi
191 |
192 | # configure port offsets
193 | configure_port_offset
194 |
195 | # As of Accumulo 2, accumulo-site.xml is no longer supported. To avoid rewriting the ports script we just run Accumulo's converter on it.
196 | if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then
197 | rm -f "$ACCUMULO_HOME/conf/accumulo.properties"
198 | "$ACCUMULO_HOME/bin/accumulo" convert-config \
199 | -x "$ACCUMULO_HOME/conf/accumulo-site.xml" \
200 | -p "$ACCUMULO_HOME/conf/accumulo.properties"
201 | rm -f "$ACCUMULO_HOME/conf/accumulo-site.xml"
202 | fi
203 |
204 | # Configure accumulo-client.properties
205 | if [ -f "$ACCUMULO_HOME/conf/accumulo-client.properties" ]; then
206 | sed -i "s/.*instance.name=.*$/instance.name=$cl_acc_inst_name/" "$ACCUMULO_HOME/conf/accumulo-client.properties"
207 | sed -i "s/.*auth.principal=.*$/auth.principal=root/" "$ACCUMULO_HOME/conf/accumulo-client.properties"
208 | sed -i "s/.*auth.token=.*$/auth.token=$cl_acc_inst_pass/" "$ACCUMULO_HOME/conf/accumulo-client.properties"
209 |
210 | fi
211 | rm -rf ${CLOUD_HOME}/tmp/staging
212 | }
213 |
214 | function start_first_time {
215 | # This seems redundant with config.sh, but this is the first point in the normal sequence where it can be set properly
216 | [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath)
217 | # check ports
218 | check_ports
219 |
220 | # start zk
221 | echo "Starting zoo..."
222 | (cd $CLOUD_HOME; $ZOOKEEPER_HOME/bin/zkServer.sh start)
223 |
224 | if [[ "$kafka_enabled" -eq 1 ]]; then
225 | # start kafka
226 | echo "Starting kafka..."
227 | $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
228 | fi
229 |
230 | # format namenode
231 | echo "Formatting namenode..."
232 | $HADOOP_HOME/bin/hdfs namenode -format
233 |
234 | # start hadoop
235 | echo "Starting hadoop..."
236 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start namenode
237 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode
238 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start datanode
239 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager
240 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager
241 |
242 | # Wait for HDFS to exit safemode:
243 | hdfs_wait_safemode
244 |
245 | # create user homedir
246 | echo "Creating hdfs path /user/$USER"
247 | $HADOOP_HOME/bin/hadoop fs -mkdir -p "/user/$USER"
248 |
249 | # sleep
250 | sleep 5
251 |
252 | if [[ "$acc_enabled" -eq 1 ]]; then
253 | # init accumulo
254 | echo "Initializing accumulo"
255 | $ACCUMULO_HOME/bin/accumulo init --instance-name $cl_acc_inst_name --password $cl_acc_inst_pass
256 |
257 | # sleep
258 | sleep 5
259 |
260 | # starting accumulo
261 | echo "Starting accumulo..."
262 | $ACCUMULO_HOME/bin/accumulo-cluster start
263 | fi
264 |
265 | if [[ "$hbase_enabled" -eq 1 ]]; then
266 | # start hbase
267 | echo "Starting hbase..."
268 | ${HBASE_HOME}/bin/start-hbase.sh
269 | fi
270 |
271 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
272 | # start zeppelin
273 | echo "Starting zeppelin..."
274 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start
275 | fi
276 |
277 | if [[ "$geoserver_enabled" -eq 1 ]]; then
278 | echo "Initializing geoserver..."
279 | mkdir -p "${GEOSERVER_DATA_DIR}"
280 | mkdir "${GEOSERVER_PID_DIR}"
281 | mkdir "${GEOSERVER_LOG_DIR}"
282 | touch "${GEOSERVER_PID_DIR}/geoserver.pid"
283 | touch "${GEOSERVER_LOG_DIR}/std.out"
284 | start_geoserver
285 | fi
286 |
287 | }
288 |
289 | function start_cloud {
290 | # Check ports
291 | check_ports
292 |
293 | if [[ "$master_enabled" -eq 1 ]]; then
294 | # start zk
295 | echo "Starting zoo..."
296 | (cd $CLOUD_HOME ; zkServer.sh start)
297 |
298 | if [[ "$kafka_enabled" -eq 1 ]]; then
299 | echo "Starting kafka..."
300 | $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
301 | fi
302 |
303 | # start hadoop
304 | echo "Starting hadoop..."
305 | hdfs --config $HADOOP_CONF_DIR --daemon start namenode
306 | hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode
307 | fi
308 |
309 | if [[ "$worker_enabled" -eq 1 ]]; then
310 | hdfs --config $HADOOP_CONF_DIR --daemon start datanode
311 | fi
312 |
313 | start_yarn
314 |
315 | # Wait for HDFS to exit safemode:
316 | echo "Waiting for HDFS to exit safemode..."
317 | hdfs_wait_safemode
318 | }
319 |
320 | function hdfs_wait_safemode {
321 | safemode_done=1
322 | while [[ "$safemode_done" -ne 0 ]]; do
323 | echo "Waiting for HDFS to exit safemode..."
324 | hdfs dfsadmin -safemode wait
325 | safemode_done=$?
326 | if [[ "$safemode_done" -ne 0 ]]; then
327 | echo "Safe mode not done...sleeping 1"
328 | sleep 1;
329 | fi
330 | done
331 | echo "Safemode exited"
332 | }
333 |
334 |
335 | function start_db {
336 | if [[ "$acc_enabled" -eq 1 ]]; then
337 | # starting accumulo
338 | echo "starting accumulo..."
339 | $ACCUMULO_HOME/bin/accumulo-cluster start
340 | fi
341 |
342 | if [[ "$hbase_enabled" -eq 1 ]]; then
343 | # start hbase
344 | echo "Starting hbase..."
345 | ${HBASE_HOME}/bin/start-hbase.sh
346 | fi
347 |
348 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
349 | # start zeppelin
350 | echo "Starting zeppelin..."
351 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start
352 | fi
353 |
354 | if [[ "$geoserver_enabled" -eq 1 ]]; then
355 | echo "Starting geoserver..."
356 | start_geoserver
357 | fi
358 |
359 | }
360 |
361 | function start_yarn {
362 | if [[ "$master_enabled" -eq 1 ]]; then
363 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager
364 | fi
365 | if [[ "$worker_enabled" -eq 1 ]]; then
366 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager
367 | fi
368 | }
369 |
370 | function start_geoserver {
371 | (${GEOSERVER_HOME}/bin/startup.sh &> ${GEOSERVER_LOG_DIR}/std.out) &
372 | GEOSERVER_PID=$!
373 | echo "${GEOSERVER_PID}" > ${GEOSERVER_PID_DIR}/geoserver.pid
374 | echo "GeoServer Process Started"
375 | echo "PID: ${GEOSERVER_PID}"
376 | echo "GeoServer Out: ${GEOSERVER_LOG_DIR}/std.out"
377 | }
378 |
379 | function stop_db {
380 | verify_stop
381 |
382 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
383 | echo "Stopping zeppelin..."
384 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh stop
385 | fi
386 |
387 | if [[ "$kafka_enabled" -eq 1 ]]; then
388 | echo "Stopping kafka..."
389 | $KAFKA_HOME/bin/kafka-server-stop.sh
390 | fi
391 |
392 | if [[ "$acc_enabled" -eq 1 ]]; then
393 | echo "Stopping accumulo..."
394 | $ACCUMULO_HOME/bin/accumulo-cluster stop
395 | fi
396 |
397 | if [[ "$hbase_enabled" -eq 1 ]]; then
398 | echo "Stopping hbase..."
399 | ${HBASE_HOME}/bin/stop-hbase.sh
400 | fi
401 | }
402 |
403 | function stop_cloud {
404 | echo "Stopping yarn and dfs..."
405 | stop_yarn
406 |
407 | if [[ "$master_enabled" -eq 1 ]]; then
408 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop namenode
409 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop secondarynamenode
410 | fi
411 | if [[ "$worker_enabled" -eq 1 ]]; then
412 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop datanode
413 | fi
414 | echo "Stopping zookeeper..."
415 | $ZOOKEEPER_HOME/bin/zkServer.sh stop
416 |
417 | if [[ "${geoserver_enabled}" -eq "1" ]]; then
418 | echo "Stopping geoserver..."
419 | stop_geoserver
420 | fi
421 |
422 | }
423 |
424 | function psux {
425 | ps ux | grep -i "$1"
426 | }
427 |
428 | function verify_stop {
429 | # Find Processes
430 | local zeppelin=`psux "[z]eppelin"`
431 | local kafka=`psux "[k]afka"`
432 | local accumulo=`psux "[a]ccumulo"`
433 | local hbase=`psux "[h]base"`
434 | local yarn=`psux "[y]arn"`
435 | local zookeeper=`psux "[z]ookeeper"`
436 | local hadoop=`psux "[h]adoop"`
437 | local geoserver=`psux "[g]eoserver"`
438 |
439 | local res="$zeppelin$kafka$accumulo$hbase$yarn$zookeeper$hadoop$geoserver"
440 | if [[ -n "${res}" ]]; then
441 | echo "The following services do not appear to be shutdown:"
442 | if [[ -n "${zeppelin}" ]]; then
443 | echo "${NL}Zeppelin"
444 | psux "[z]eppelin"
445 | fi
446 | if [[ -n "${kafka}" ]]; then
447 | echo "${NL}Kafka"
448 | psux "[k]afka"
449 | fi
450 | if [[ -n "${accumulo}" ]]; then
451 | echo "${NL}Accumulo"
452 | psux "[a]ccumulo"
453 | fi
454 | if [[ -n "${hbase}" ]]; then
455 | echo "${NL}HBase"
456 | psux "[h]base"
457 | fi
458 | if [[ -n "${yarn}" ]]; then
459 | echo "${NL}Yarn"
460 | psux "[y]arn"
461 | fi
462 | if [[ -n "${zookeeper}" ]]; then
463 | echo "${NL}Zookeeper"
464 | psux "[z]ookeeper"
465 | fi
466 | if [[ -n "${hadoop}" ]]; then
467 | echo "${NL}Hadoop"
468 | psux "[h]adoop"
469 | fi
470 | if [[ -n "${geoserver}" ]]; then
471 | echo "${NL}GeoServer"
472 | psux "[g]eoserver"
473 | fi
474 | read -r -p "Would you like to continue? [Y/n] " confirm
475 | confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing
476 | if [[ $confirm =~ ^(yes|y) || $confirm == "" ]]; then
477 | return 0
478 | else
479 | exit 1
480 | fi
481 | fi
482 | }
483 |
484 | function stop_yarn {
485 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop resourcemanager
486 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop nodemanager
487 | }
488 |
489 | function stop_geoserver {
490 | GEOSERVER_PID=`cat ${GEOSERVER_PID_DIR}/geoserver.pid`
491 | if [[ -n "${GEOSERVER_PID}" ]]; then
492 | kill -15 ${GEOSERVER_PID}
493 | echo "TERM signal sent to process PID: ${GEOSERVER_PID}"
494 | else
495 | echo "No GeoServer PID was saved. GeoServer must have been started by this script in order for this script to stop it."
496 | fi
497 | }
498 |
499 | function clear_sw {
500 | [[ "$zeppelin_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all"
501 | [[ "$acc_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}"
502 | [[ "$hbase_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/hbase-${pkg_hbase_ver}"
503 | [[ -d "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}" ]] && rm -rf "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}"
504 | rm -rf "${CLOUD_HOME}/hadoop-${pkg_hadoop_ver}"
505 | rm -rf "${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}"
506 | [[ "$kafka_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}"
507 | rm -rf "${CLOUD_HOME}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}"
508 | rm -rf "${CLOUD_HOME}/scala-${pkg_scala_ver}"
509 | rm -rf "${CLOUD_HOME}/tmp"
510 | if [[ -a "${CLOUD_HOME}/zookeeper.out" ]]; then rm "${CLOUD_HOME}/zookeeper.out"; fi #hahahaha
511 | }
512 |
513 | function clear_data {
514 | read -r -p "Are you sure you want to clear data directories? [y/N] " confirm
515 | confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing
516 | if [[ $confirm =~ ^(yes|y) ]]; then
517 | rm -rf ${CLOUD_HOME}/data/yarn/*
518 | rm -rf ${CLOUD_HOME}/data/zookeeper/*
519 | rm -rf ${CLOUD_HOME}/data/dfs/data/*
520 | rm -rf ${CLOUD_HOME}/data/dfs/name/*
521 | rm -rf ${CLOUD_HOME}/data/hadoop/tmp/*
522 | rm -rf ${CLOUD_HOME}/data/hadoop/pid/*
523 | rm -rf ${CLOUD_HOME}/data/geoserver/pid/*
524 | rm -rf ${CLOUD_HOME}/data/geoserver/log/*
525 | if [[ -d "${CLOUD_HOME}/data/kafka-logs" ]]; then rm -rf ${CLOUD_HOME}/data/kafka-logs; fi # intentionally to clear dot files
526 | fi
527 | }
528 |
529 | function show_help {
530 | echo "Provide 1 command: (init|start|stop|reconfigure|restart|reyarn|regeoserver|clean|download_only|init_skip_download|help)"
531 | echo "If the environment variable GEOSERVER_HOME is set then the parameter '-gs' may be used with 'start' to automatically start/stop GeoServer with the cloud."
532 | }
533 |
534 | if [[ "$2" == "-gs" ]]; then
535 | if [[ -n "${GEOSERVER_HOME}" && -e $GEOSERVER_HOME/bin/startup.sh ]]; then
536 | geoserver_enabled=1
537 | else
538 | echo "The environment variable GEOSERVER_HOME is not set or is not valid."
539 | fi
540 | fi
541 |
542 | if [[ "$#" -ne 1 && "${geoserver_enabled}" -ne "1" ]]; then
543 | show_help
544 | exit 1
545 | fi
546 |
547 | if [[ $1 == 'init' ]]; then
548 | download_packages && unpackage && configure && start_first_time
549 | elif [[ $1 == 'reconfigure' ]]; then
550 | echo "reconfiguring..."
551 | #TODO ensure everything is stopped? prompt to make sure?
552 | stop_cloud && clear_sw && clear_data && unpackage && configure && start_first_time
553 | elif [[ $1 == 'clean' ]]; then
554 | echo "cleaning..."
555 | clear_sw && clear_data
556 | echo "cleaned!"
557 | elif [[ $1 == 'start' ]]; then
558 | echo "Starting cloud..."
559 | start_cloud && start_db
560 | echo "Cloud Started"
561 | elif [[ $1 == 'stop' ]]; then
562 | echo "Stopping Cloud..."
563 | stop_db && stop_cloud
564 | echo "Cloud stopped"
565 | elif [[ $1 == 'start_db' ]]; then
566 | echo "Starting cloud..."
567 | start_db
568 | echo "Database Started"
569 | elif [[ $1 == 'stop_db' ]]; then
570 | echo "Stopping Database..."
571 | stop_db
572 | echo "Cloud stopped"
573 | elif [[ $1 == 'start_hadoop' ]]; then
574 | echo "Starting Hadoop..."
575 | start_cloud
576 | echo "Cloud Hadoop"
577 | elif [[ $1 == 'stop_hadoop' ]]; then
578 | echo "Stopping Hadoop..."
579 | stop_cloud
580 | echo "Hadoop stopped"
581 | elif [[ $1 == 'reyarn' ]]; then
582 | echo "Stopping Yarn..."
583 | stop_yarn
584 | echo "Starting Yarn..."
585 | start_yarn
586 | elif [[ $1 == 'regeoserver' ]]; then
587 | stop_geoserver
588 | start_geoserver
589 | elif [[ $1 == 'restart' ]]; then
590 | stop_geoserver && stop_cloud && start_cloud && start_geoserver
591 | elif [[ $1 == 'download_only' ]]; then
592 | download_packages
593 | elif [[ $1 == 'init_skip_download' ]]; then
594 | unpackage && configure && start_first_time
595 | else
596 | show_help
597 | fi
598 |
599 |
600 |
601 |
--------------------------------------------------------------------------------
/bin/config.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | # thanks accumulo for these resolutions snippets
4 | if [ -z "${CLOUD_HOME}" ] ; then
5 | # Start: Resolve Script Directory
6 | SOURCE="${BASH_SOURCE[0]}"
7 | while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
8 | bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
9 | SOURCE="$(readlink "$SOURCE")"
10 | [[ $SOURCE != /* ]] && SOURCE="$bin/$SOURCE" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
11 | done
12 | bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
13 | script=$( basename "$SOURCE" )
14 | # Stop: Resolve Script Directory
15 |
16 | CLOUD_HOME=$( cd -P ${bin}/.. && pwd )
17 | export CLOUD_HOME
18 | fi
19 |
20 | if [ ! -d "${CLOUD_HOME}" ]; then
21 | echo "CLOUD_HOME=${CLOUD_HOME} is not a valid directory. Please make sure it exists"
22 | return 1
23 | fi
24 |
25 | # [Tab] shell completion because i'm lazy
26 | IFS=$'\n' complete -W "init start stop reconfigure regeoserver reyarn clean help" cloud-local.sh
27 | NL=$'\n'
28 |
29 | function validate_config {
30 | # allowed versions are
31 | local pkg_error=""
32 | # hadoop 3.2 currently required
33 | if [[ -z "$pkg_hadoop_ver" || ! $pkg_hadoop_ver =~ 3[.]2[.].+ ]]; then
34 | pkg_error="${pkg_error}Invalid hadoop version: '${pkg_hadoop_ver}' ${NL}"
35 | fi
36 | # zk 3.4.x
37 | if [[ -z "$pkg_zookeeper_ver" || ! $pkg_zookeeper_ver =~ 3[.]4[.]([56789]|10|11|12|13|14) ]]; then
38 | pkg_error="${pkg_error}Invalid zookeeper version: '${pkg_zookeeper_ver}' ${NL}"
39 | fi
40 | # acc 2.0.0
41 | if [[ -z "$pkg_accumulo_ver" || ! $pkg_accumulo_ver =~ 2[.]0[.]0 ]]; then
42 | pkg_error="${pkg_error}Invalid accumulo version: '${pkg_accumulo_ver}' ${NL}"
43 | fi
44 | # kafka 0.9.x, 0.10.x, 0.11.x, 1.0.x
45 | if [[ -z "$pkg_kafka_ver" || ! $pkg_kafka_ver =~ ((0[.]9[.].+)|(0[.]1[01][.].+)|1[.]0[.].) ]]; then
46 | pkg_error="${pkg_error}Invalid kafka version: '${pkg_kafka_ver}' ${NL}"
47 | fi
48 | # geomesa 3.0.x requires a scala version to be set
49 | if [[ -z "$pkg_geomesa_scala_ver" && $pkg_geomesa_ver =~ 3[.]0[.].+ ]]; then
50 | pkg_error="${pkg_error}Invalid GeoMesa Scala version: '${pkg_geomesa_scala_ver}' ${NL}"
51 | fi
52 |
53 | if [[ ! -z "$pkg_error" ]]; then
54 | echo "ERROR: ${pkg_error}"
55 | return 1
56 | else
57 | return 0
58 | fi
59 | }
60 |
61 | function set_env_vars {
62 | if [[ $zeppelin_enabled -eq "1" ]]; then
63 | export ZEPPELIN_HOME="${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all"
64 | fi
65 |
66 | if [[ $geomesa_enabled -eq "1" ]]; then
67 | unset GEOMESA_HOME
68 | unset GEOMESA_BIN
69 | export GEOMESA_HOME="${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}"
70 | export GEOMESA_BIN="${GEOMESA_HOME}/bin:"
71 | echo "Setting GEOMESA_HOME: ${GEOMESA_HOME}"
72 | fi
73 |
74 | export ZOOKEEPER_HOME="${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}"
75 |
76 | if [[ $kafka_enabled -eq "1" ]]; then
77 | export KAFKA_HOME="${CLOUD_HOME}/kafka_2.11-${pkg_kafka_ver}"
78 | fi
79 |
80 | export HADOOP_HOME="$CLOUD_HOME/hadoop-${pkg_hadoop_ver}"
81 | export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
82 | export HADOOP_COMMON_HOME="${HADOOP_HOME}"
83 | export HADOOP_HDFS_HOME="${HADOOP_HOME}"
84 | export HADOOP_YARN_HOME="${HADOOP_HOME}"
85 | export HADOOP_PID_DIR="${CLOUD_HOME}/data/hadoop/pid"
86 | export HADOOP_IDENT_STRING=$(echo ${CLOUD_HOME} | (md5sum 2>/dev/null || md5) | cut -c1-32)
87 |
88 | export YARN_HOME="${HADOOP_HOME}"
89 |
90 | export SPARK_HOME="$CLOUD_HOME/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}"
91 |
92 | export GEOSERVER_DATA_DIR="${CLOUD_HOME}/data/geoserver"
93 | export GEOSERVER_PID_DIR="${GEOSERVER_DATA_DIR}/pid"
94 | export GEOSERVER_LOG_DIR="${GEOSERVER_DATA_DIR}/log"
95 |
96 | [[ "${acc_enabled}" -eq "1" ]] && export ACCUMULO_HOME="${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}"
97 | [[ "${hbase_enabled}" -eq "1" ]] && export HBASE_HOME="${CLOUD_HOME}/hbase-${pkg_hbase_ver}"
98 |
99 | export PATH="$GEOMESA_BIN"$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH
100 | [[ "${acc_enabled}" -eq "1" ]] && export PATH="${ACCUMULO_HOME}/bin:${PATH}"
101 | [[ "${hbase_enabled}" -eq "1" ]] && export PATH="${HBASE_HOME}/bin:${PATH}"
102 | [[ "${zeppelin_enabled}" -eq "1" ]] && export PATH="${ZEPPELIN_HOME}/bin:${PATH}"
103 |
104 | # This variable requires Hadoop executable, which will fail during certain runs/steps
105 | [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath 2>/dev/null)
106 |
107 | # Export direnv environment file https://direnv.net/
108 | env | grep -v PATH | sort > $CLOUD_HOME/.envrc
109 | echo "PATH=${PATH}" >> $CLOUD_HOME/.envrc
110 | }
111 |
112 | if [[ -z "$JAVA_HOME" ]];then
113 | echo "ERROR: must set JAVA_HOME..."
114 | return 1
115 | fi
116 |
117 | # load configuration scripts
118 | . "${CLOUD_HOME}/conf/cloud-local.conf"
119 | validate_config
120 | set_env_vars
121 |
122 |
123 |
--------------------------------------------------------------------------------
/bin/ports.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | case $(uname) in
4 | "Darwin") SED_REGEXP_EXTENDED='-E' ;;
5 | *) SED_REGEXP_EXTENDED='-r' ;;
6 | esac
7 |
8 | # get the port offset from config variable
9 | function get_port_offset {
10 | local offset=0
11 |
12 | if [ -n "${CL_PORT_OFFSET}" ]; then
13 | offset=${CL_PORT_OFFSET}
14 | fi
15 |
16 | echo ${offset}
17 | }
18 |
19 | function check_port {
20 | local port=$1
21 | local offset=$(get_port_offset)
22 | local tocheck=$((port+offset))
23 | if (: < /dev/tcp/127.0.0.1/${tocheck}) 2>/dev/null; then
24 | echo "Error: port ${tocheck} is already taken (orig port ${port} with offset ${offset})"
25 | exit 1
26 | fi
27 | }
28 |
29 | function check_ports {
30 | check_port 2181 # zookeeper
31 |
32 | check_port 9092 # kafka broker
33 |
34 | # hadoop
35 | check_port 50010 # dfs.datanode.address
36 | check_port 50020 # dfs.datanode.ipc.address
37 | check_port 50075 # dfs.datanode.http.address
38 | check_port 50475 # dfs.datanode.https.address
39 |
40 | check_port 8020 # namenode data
41 | check_port 9000 # namenode data
42 | check_port 50070 # namenode http
43 | check_port 50470 # namenode https
44 |
45 | check_port 50090 # secondary name node
46 |
47 | check_port 8088 # yarn job tracker
48 | check_port 8030 # yarn
49 | check_port 8031 # yarn
50 | check_port 8032 # yarn
51 | check_port 8033 # yarn
52 |
53 | check_port 8090 # yarn
54 | check_port 8040 # yarn
55 | check_port 8042 # yarn
56 |
57 | check_port 13562 # mapreduce shuffle
58 |
59 | # accumulo
60 | check_port 9995 # accumulo monitor
61 | check_port 4560 # accumulo monitor log4j
62 | check_port 9997 # accumulo tserver
63 | check_port 9998 # accumulo gc
64 | check_port 9999 # accumulo master
65 | check_port 12234 # accumulo tracer
66 | check_port 10001 # accumulo master replication coordinator
67 | check_port 10002 # accumulo master replication service
68 |
69 |
70 | # hbase
71 | check_port 16000 # hbase master
72 | check_port 16010 # hbase master info
73 | check_port 16020 # hbase regionserver
74 | check_port 16030 # hbase regionserver info
75 | check_port 16040 # hbase rest
76 | check_port 16100 # hbase multicast
77 |
78 | # spark
79 | check_port 4040 # Spark job monitor
80 |
81 | # Zeppelin
82 | check_port 5771 # Zeppelin embedded web server
83 |
84 | echo "Known ports are OK (using offset $(get_port_offset))"
85 | }
86 |
87 | function configure_port_offset {
88 | local offset=$(get_port_offset)
89 |
90 | local KEY="CL_port_default"
91 | local KEY_CHANGED="CL_offset_port"
92 |
93 | # zookeeper (zoo.cfg)
94 | # do this one by hand, it's fairly straightforward
95 | zkPort=$((2181+offset))
96 | sed -i~orig "s/clientPort=.*/clientPort=$zkPort/" $ZOOKEEPER_HOME/conf/zoo.cfg
97 |
98 | # kafka (server.properties)
99 | if [[ "$kafka_enabled" -eq 1 ]]; then
100 | kafkaPort=$((9092+offset))
101 | sed -i~orig "s/\/\/$CL_HOSTNAME:[0-9].*/\/\/$CL_HOSTNAME:$kafkaPort/" $KAFKA_HOME/config/server.properties
102 | sed -i~orig "s/zookeeper.connect=$CL_HOSTNAME:[0-9].*/zookeeper.connect=$CL_HOSTNAME:$zkPort/" $KAFKA_HOME/config/server.properties
103 | fi
104 |
105 | # Zeppelin
106 | if [[ "$zeppelin_enabled" -eq 1 ]]; then
107 | zeppelinPort=$((5771+offset))
108 | sed -i~orig "s/ZEPPELIN_PORT=[0-9]\{1,5\}\(.*\)/ZEPPELIN_PORT=$zeppelinPort\1/g" "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
109 | fi
110 |
111 | # hadoop and accumulo xml files
112 | # The idea with this block is that the xml files have comments which tag lines which need
113 | # a port replacement, and the comments provide the default values. So to change ports,
114 | # we replace all the instance of the default value, on the line with the comment, with
115 | # the desired (offset) port.
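# For illustration, a tagged line in one of the templates follows this convention
# (comment and value on the same line, so the sed below can rewrite the value):
#   <!-- CL_port_default 9000 --><value>hdfs://CLOUD_LOCAL_HOSTNAME:9000</value>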
116 |
117 | xmlFiles=( $HADOOP_CONF_DIR/core-site.xml \
118 | $HADOOP_CONF_DIR/hdfs-site.xml \
119 | $HADOOP_CONF_DIR/mapred-site.xml \
120 | $HADOOP_CONF_DIR/yarn-site.xml )
121 | if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then
122 | xmlFiles+=($ACCUMULO_HOME/conf/accumulo-site.xml)
123 | fi
124 | if [ -f "$HBASE_HOME/conf/hbase-site.xml" ]; then
125 | xmlFiles+=($HBASE_HOME/conf/hbase-site.xml)
126 | fi
127 | for FILE in "${xmlFiles[@]}"; do
128 | while [[ -n "$(grep $KEY $FILE)" ]]; do # while lines need to be changed
129 | # pull the default port out of the comment
130 | basePort=$(grep -hoE "$KEY [0-9]+" $FILE | head -1 | grep -hoE [0-9]+)
131 | # calculate new port
132 | newPort=$(($basePort+$offset))
133 | # note that any part of the line matching the port line will be replaced...
134 | # the following sed only makes the replacement on a single line, containing the matched comment
135 | #sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#[0-9]+#$newPort#" $FILE
136 | if [[ "${CL_VERBOSE}" == "1" ]]; then echo "Replacing port $basePort with $newPort in file $FILE"; fi
137 | sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#(.*)(${basePort})(.*)#\\1${newPort}\\3#" $FILE
138 | # mark this line done
139 | sed -i~orig ${SED_REGEXP_EXTENDED} "s/$KEY $basePort/$KEY_CHANGED $basePort/" $FILE
140 | done
141 | # re-mark all comment lines, so we can change ports again later if we want
142 | sed -i~orig "s/$KEY_CHANGED/$KEY/g" $FILE
143 | done
144 |
145 | echo "Ports configured to use offset $offset"
146 | }
147 |
--------------------------------------------------------------------------------
/conf/cloud-local.conf:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ################################################################################
4 | # REPOSITORY MANAGEMENT
5 | ################################################################################
6 |
7 | #
8 | # Source for packages (accumulo, hadoop, etc)
9 | # Available options are (local, wget)
10 | #
11 | # Set the variable 'pkg_src_mirror' if you want to specify a mirror
12 | # else it will use https://www.apache.org/dyn/closer.cgi to determine
13 | # a mirror. If you have a caching web proxy you may want to set this as well.
14 | #
15 | # pkg_src_mirror="http://apache.mirrors.tds.net"
16 |
17 | # Specify a maven repository to use
18 | pkg_src_maven="https://repo1.maven.org/maven2"
19 |
20 | # Optionally specify a local shared folder of package downloads
21 | #pkg_pre_download=/net/synds1/volume2/projects2/cloud-local/packages
22 |
23 | ################################################################################
24 | # VERSION MANAGEMENT - Versions of popular software
25 | ################################################################################
26 |
27 | pkg_accumulo_ver="2.0.0"
28 | pkg_hbase_ver="1.3.1"
29 | # Note pkg_spark_hadoop_ver below if modifying
30 | pkg_hadoop_ver="3.2.1"
31 | # Note, just the major+minor from Hadoop, not patch level
32 | hadoop_base_ver=${pkg_hadoop_ver:0:3}
33 |
34 |
35 | pkg_zookeeper_ver="3.4.14"
36 | # Note convention is scala.version_kafka.version
37 | pkg_kafka_scala_ver="2.11"
38 | pkg_kafka_ver="1.0.1"
39 |
40 | pkg_spark_ver="2.2.1"
41 | # Note pkg_hadoop_ver above
42 | # - don't auto-derive this as spark & hadoop major releases aren't lock-step
43 | # - use "without-hadoop" to use version without hadoop deps
44 | pkg_spark_hadoop_ver="without-hadoop"
45 |
46 | pkg_geomesa_ver="3.0.0"
47 | pkg_geomesa_scala_ver="2.11"
48 | pkg_scala_ver="2.11.7"
49 |
50 | # Apache Zeppelin, yet another analyst notebook that knows about Spark
51 | # You must change Spark to a compatible version (e.g. Zep 0.7.2 with Spark 2.1 and Zep 0.7.3 with Spark 2.2)
52 | pkg_zeppelin_ver="0.7.3"
53 |
54 | ################################################################################
55 | # ACCUMULO CONFIGURATION
56 | ################################################################################
57 |
58 | cl_acc_inst_name="local"
59 | cl_acc_inst_pass="secret"
60 |
61 | ################################################################################
62 | # IP/HOSTNAME/PORT CONFIGURATION - How to bind to things
63 | ################################################################################
64 |
65 | # The following options can be overridden in the user environment
66 | # bind address and hostname to use for all service bindings
67 | if [[ -z "${CL_HOSTNAME}" ]]; then
68 | CL_HOSTNAME=$(hostname)
69 | #CL_HOSTNAME=localhost
70 | fi
71 |
72 | if [[ -z "${CL_BIND_ADDRESS}" ]]; then
73 | CL_BIND_ADDRESS="0.0.0.0"
74 | #CL_BIND_ADDRESS="127.0.0.1"
75 | fi
76 |
77 | if [[ -z "${CL_PORT_OFFSET}" ]]; then
78 | CL_PORT_OFFSET=0
79 | fi
80 |
81 | if [[ -z "${CL_VERBOSE}" ]]; then
82 | CL_VERBOSE=0
83 | fi
84 |
85 | ################################################################################
86 | # PACKAGE MANAGEMENT - Enable/Disable software here
87 | ################################################################################
88 |
89 | # 1 = enabled
90 | # 0 = disabled
91 |
92 | master_enabled=1
93 | worker_enabled=1
94 |
95 | # Hadoop HDFS and YARN
96 | hadoop_enabled=1
97 |
98 | # Enable accumulo or hbase - probably best not to run both but it might work
99 | # Requires hadoop_enabled=1
100 | acc_enabled=1
101 | hbase_enabled=0
102 |
103 | # Enable/disable Kafka
104 | kafka_enabled=0
105 |
106 | # Download spark distribution
107 | spark_enabled=0
108 |
109 | # Enable/Disable installation of GeoMesa
110 | geomesa_enabled=0
111 | if [[ -z "${pkg_geomesa_ver}" && -z "${pkg_geomesa_scala_ver}" && "${geomesa_enabled}" -eq "1" ]]; then
112 | echo "Error: GeoMesa is enabled but the version number is missing."
113 | exit 1
114 | fi
115 |
116 | # Enable/Disable scala download
117 | scala_enabled=1
118 | if [[ -z "${pkg_scala_ver}" && "${scala_enabled}" -eq "1" ]]; then
119 | echo "Error: Scala is enabled but the version number is missing."
120 | exit 1
121 | fi
122 |
123 | # Enable/Disable Zeppelin
124 | # Ensure that your Spark+Zeppelin versions are compatible
125 | zeppelin_enabled=0
126 |
--------------------------------------------------------------------------------
/templates/accumulo/accumulo-env.sh:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env bash
2 |
3 | # Licensed to the Apache Software Foundation (ASF) under one or more
4 | # contributor license agreements. See the NOTICE file distributed with
5 | # this work for additional information regarding copyright ownership.
6 | # The ASF licenses this file to You under the Apache License, Version 2.0
7 | # (the "License"); you may not use this file except in compliance with
8 | # the License. You may obtain a copy of the License at
9 | #
10 | # http://www.apache.org/licenses/LICENSE-2.0
11 | #
12 | # Unless required by applicable law or agreed to in writing, software
13 | # distributed under the License is distributed on an "AS IS" BASIS,
14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | # See the License for the specific language governing permissions and
16 | # limitations under the License.
17 |
18 | ## Before accumulo-env.sh is loaded, these environment variables are set and can be used in this file:
19 |
20 | # cmd - Command that is being called such as tserver, master, etc.
21 | # basedir - Root of Accumulo installation
22 | # bin - Directory containing Accumulo scripts
23 | # conf - Directory containing Accumulo configuration
24 | # lib - Directory containing Accumulo libraries
25 |
26 | ############################
27 | # Variables that must be set
28 | ############################
29 |
30 | ## Accumulo logs directory. Referenced by logger config.
31 | export ACCUMULO_LOG_DIR="${ACCUMULO_LOG_DIR:-${basedir}/logs}"
32 | ## Hadoop installation
33 | export HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"
34 | ## Hadoop configuration
35 | export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
36 | ## Zookeeper installation
37 | export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/path/to/zookeeper}"
38 |
39 | ##########################
40 | # Build CLASSPATH variable
41 | ##########################
42 |
43 | ## Verify that Hadoop & Zookeeper installation directories exist
44 | if [[ ! -d "$ZOOKEEPER_HOME" ]]; then
45 | echo "ZOOKEEPER_HOME=$ZOOKEEPER_HOME is not set to a valid directory in accumulo-env.sh"
46 | exit 1
47 | fi
48 | if [[ ! -d "$HADOOP_HOME" ]]; then
49 | echo "HADOOP_HOME=$HADOOP_HOME is not set to a valid directory in accumulo-env.sh"
50 | exit 1
51 | fi
52 |
53 | ## Build using existing CLASSPATH, conf/ directory, dependencies in lib/, and external Hadoop & Zookeeper dependencies
54 | if [[ -n "$CLASSPATH" ]]; then
55 | CLASSPATH="${CLASSPATH}:${conf}"
56 | else
57 | CLASSPATH="${conf}"
58 | fi
59 | CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*"
60 | export CLASSPATH
61 |
62 | ##################################################################
63 | # Build JAVA_OPTS variable. Defaults below work but can be edited.
64 | ##################################################################
65 |
66 | ## JVM options set for all processes. Extra options can be passed in by setting ACCUMULO_JAVA_OPTS to an array of options.
67 | JAVA_OPTS=("${ACCUMULO_JAVA_OPTS[@]}"
68 | '-XX:+UseConcMarkSweepGC'
69 | '-XX:CMSInitiatingOccupancyFraction=75'
70 | '-XX:+CMSClassUnloadingEnabled'
71 | '-XX:OnOutOfMemoryError=kill -9 %p'
72 | '-XX:-OmitStackTraceInFastThrow'
73 | '-Djava.net.preferIPv4Stack=true'
74 | "-Daccumulo.native.lib.path=${lib}/native")
75 |
76 | ## Make sure Accumulo native libraries are built since they are enabled by default
77 | "${bin}"/accumulo-util build-native &> /dev/null
78 |
79 | ## JVM options set for individual applications
80 | case "$cmd" in
81 | master) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx512m' '-Xms512m') ;;
82 | monitor) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;;
83 | gc) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;;
84 | tserver) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx3G' '-Xms3G') ;;
85 | *) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms64m') ;;
86 | esac
87 |
88 | ## JVM options set for logging. Review log4j properties files to see how they are used.
89 | JAVA_OPTS=("${JAVA_OPTS[@]}"
90 | "-Daccumulo.log.dir=${ACCUMULO_LOG_DIR}"
91 | "-Daccumulo.application=${cmd}${ACCUMULO_SERVICE_INSTANCE}_$(hostname)")
92 |
93 | case "$cmd" in
94 | monitor)
95 | JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-monitor.properties")
96 | ;;
97 | gc|master|tserver|tracer)
98 | JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-service.properties")
99 | ;;
100 | *)
101 | # let log4j use its default behavior (log4j.xml, log4j.properties)
102 | true
103 | ;;
104 | esac
105 |
106 | export JAVA_OPTS
107 |
108 | ############################
109 | # Variables set to a default
110 | ############################
111 |
112 | export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-1}
113 | ## Add Hadoop native libraries to shared library paths given operating system
114 | case "$(uname)" in
115 | Darwin) export DYLD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${DYLD_LIBRARY_PATH}" ;;
116 | *) export LD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}" ;;
117 | esac
118 |
119 | ###############################################
120 | # Variables that are optional. Uncomment to set
121 | ###############################################
122 |
123 | ## Specifies command that will be placed before calls to Java in accumulo script
124 | # export ACCUMULO_JAVA_PREFIX=""
125 |
--------------------------------------------------------------------------------
/templates/accumulo/accumulo-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 | <configuration>
4 |   <property>
5 |     <name>instance.zookeeper.host</name>
6 |     <!-- CL_port_default 2181 --><value>CLOUD_LOCAL_HOSTNAME:2181</value>
7 |     <description>comma separated list of zookeeper servers</description>
8 |   </property>
9 |   <property>
10 |     <name>monitor.port.client</name>
11 |     <!-- CL_port_default 9995 --><value>9995</value>
12 |   </property>
13 |   <property>
14 |     <name>monitor.port.log4j</name>
15 |     <!-- CL_port_default 4560 --><value>4560</value>
16 |   </property>
17 |   <property>
18 |     <name>tserver.port.client</name>
19 |     <!-- CL_port_default 9997 --><value>9997</value>
20 |   </property>
21 |   <property>
22 |     <name>gc.port.client</name>
23 |     <!-- CL_port_default 9998 --><value>9998</value>
24 |   </property>
25 |   <property>
26 |     <name>master.port.client</name>
27 |     <!-- CL_port_default 9999 --><value>9999</value>
28 |   </property>
29 |   <property>
30 |     <name>master.replication.coordinator.port</name>
31 |     <!-- CL_port_default 10001 --><value>10001</value>
32 |   </property>
33 |   <property>
34 |     <name>replication.receipt.service.port</name>
35 |     <!-- CL_port_default 10002 --><value>10002</value>
36 |   </property>
37 |   <property>
38 |     <name>trace.port.client</name>
39 |     <!-- CL_port_default 12234 --><value>12234</value>
40 |   </property>
41 |   <property>
42 |     <name>instance.volumes</name>
43 |     <!-- CL_port_default 9000 --><value>hdfs://CLOUD_LOCAL_HOSTNAME:9000/accumulo</value>
44 |   </property>
45 |   <property>
46 |     <name>instance.secret</name>
47 |     <value>DEFAULT</value>
48 |     <description>A secret unique to a given instance that all servers must know in order to communicate with one another.
49 |       Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd],
50 |       and then update this file.
51 |     </description>
52 |   </property>
53 |   <property>
54 |     <name>tserver.memory.maps.max</name>
55 |     <value>1G</value>
56 |   </property>
57 |   <property>
58 |     <name>tserver.memory.maps.native.enabled</name>
59 |     <value>false</value>
60 |   </property>
61 |   <property>
62 |     <name>tserver.cache.data.size</name>
63 |     <value>128M</value>
64 |   </property>
65 |   <property>
66 |     <name>tserver.cache.index.size</name>
67 |     <value>128M</value>
68 |   </property>
69 |   <property>
70 |     <name>trace.token.property.password</name>
71 |     <value>secret</value>
72 |   </property>
73 |   <property>
74 |     <name>trace.user</name>
75 |     <value>root</value>
76 |   </property>
77 |   <property>
78 |     <name>tserver.sort.buffer.size</name>
79 |     <value>200M</value>
80 |   </property>
81 |   <property>
82 |     <name>tserver.walog.max.size</name>
83 |     <value>1G</value>
84 |   </property>
85 | </configuration>
--------------------------------------------------------------------------------
/templates/hadoop/core-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 | <configuration>
4 |   <property>
5 |     <name>fs.defaultFS</name>
6 |     <!-- CL_port_default 9000 --><value>hdfs://CLOUD_LOCAL_HOSTNAME:9000</value>
7 |   </property>
8 |   <property>
9 |     <name>fs.default.name</name>
10 |     <!-- CL_port_default 9000 --><value>hdfs://CLOUD_LOCAL_HOSTNAME:9000</value>
11 |   </property>
12 |   <property>
13 |     <name>hadoop.tmp.dir</name>
14 |     <value>LOCAL_CLOUD_PREFIX/data/hadoop/tmp</value>
15 |   </property>
16 | </configuration>
--------------------------------------------------------------------------------
/templates/hadoop/hdfs-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 | <configuration>
4 |   <property>
5 |     <name>dfs.replication</name>
6 |     <value>1</value>
7 |   </property>
8 |   <property>
9 |     <name>dfs.datanode.synconclose</name>
10 |     <value>true</value>
11 |   </property>
12 |   <property>
13 |     <name>dfs.name.dir</name>
14 |     <value>LOCAL_CLOUD_PREFIX/data/dfs/name</value>
15 |   </property>
16 |   <property>
17 |     <name>dfs.data.dir</name>
18 |     <value>LOCAL_CLOUD_PREFIX/data/dfs/data</value>
19 |   </property>
20 |   <property>
21 |     <name>dfs.datanode.address</name>
22 |     <!-- CL_port_default 50010 --><value>CLOUD_LOCAL_BIND_ADDRESS:50010</value>
23 |   </property>
24 |   <property>
25 |     <name>dfs.datanode.ipc.address</name>
26 |     <!-- CL_port_default 50020 --><value>CLOUD_LOCAL_BIND_ADDRESS:50020</value>
27 |   </property>
28 |   <property>
29 |     <name>dfs.datanode.http.address</name>
30 |     <!-- CL_port_default 50075 --><value>CLOUD_LOCAL_BIND_ADDRESS:50075</value>
31 |   </property>
32 |   <property>
33 |     <name>dfs.datanode.https.address</name>
34 |     <!-- CL_port_default 50475 --><value>CLOUD_LOCAL_BIND_ADDRESS:50475</value>
35 |   </property>
36 |   <property>
37 |     <name>dfs.namenode.http-address</name>
38 |     <!-- CL_port_default 50070 --><value>CLOUD_LOCAL_BIND_ADDRESS:50070</value>
39 |   </property>
40 |   <property>
41 |     <name>dfs.namenode.https-address</name>
42 |     <!-- CL_port_default 50470 --><value>CLOUD_LOCAL_BIND_ADDRESS:50470</value>
43 |   </property>
44 |   <property>
45 |     <name>dfs.namenode.secondary.http-address</name>
46 |     <!-- CL_port_default 50090 --><value>CLOUD_LOCAL_BIND_ADDRESS:50090</value>
47 |   </property>
48 | </configuration>
--------------------------------------------------------------------------------
/templates/hadoop/mapred-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0" encoding="UTF-8"?>
2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 | <configuration>
4 |   <property>
5 |     <name>mapreduce.framework.name</name>
6 |     <value>yarn</value>
7 |   </property>
8 |   <property>
9 |     <name>mapreduce.shuffle.port</name>
10 |     <!-- CL_port_default 13562 --><value>13562</value>
11 |   </property>
12 |   <property>
13 |     <name>mapreduce.jobhistory.address</name>
14 |     <!-- CL_port_default 10020 --><value>CLOUD_LOCAL_BIND_ADDRESS:10020</value>
15 |   </property>
16 |   <property>
17 |     <name>mapreduce.jobhistory.webapp.address</name>
18 |     <!-- CL_port_default 19888 --><value>CLOUD_LOCAL_BIND_ADDRESS:19888</value>
19 |   </property>
20 |   <property>
21 |     <name>mapreduce.jobtracker.http.address</name>
22 |     <!-- CL_port_default 50030 --><value>CLOUD_LOCAL_BIND_ADDRESS:50030</value>
23 |   </property>
24 |   <property>
25 |     <name>mapreduce.tasktracker.http.address</name>
26 |     <!-- CL_port_default 50050 --><value>CLOUD_LOCAL_BIND_ADDRESS:50050</value>
27 |   </property>
28 | </configuration>
--------------------------------------------------------------------------------
/templates/hadoop/yarn-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <!--
3 |   Licensed under the Apache License, Version 2.0 (the "License");
4 |   you may not use this file except in compliance with the License.
5 |   You may obtain a copy of the License at
6 |
7 |     http://www.apache.org/licenses/LICENSE-2.0
8 |
9 |   Unless required by applicable law or agreed to in writing, software
10 |   distributed under the License is distributed on an "AS IS" BASIS,
11 |   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 |   See the License for the specific language governing permissions and
13 |   limitations under the License. See accompanying LICENSE file.
14 | -->
15 | <configuration>
16 |
17 |   <property>
18 |     <name>yarn.nodemanager.aux-services</name>
19 |     <value>mapreduce_shuffle</value>
20 |   </property>
21 |
22 |   <property>
23 |     <name>yarn.nodemanager.local-dirs</name>
24 |     <value>LOCAL_CLOUD_PREFIX/data/yarn</value>
25 |   </property>
26 |
27 |   <property>
28 |     <name>yarn.nodemanager.vmem-check-enabled</name>
29 |     <value>false</value>
30 |   </property>
31 |
32 |   <property>
33 |     <name>yarn.resourcemanager.webapp.address</name>
34 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8088</value>
35 |   </property>
36 |
37 |   <property>
38 |     <name>yarn.resourcemanager.scheduler.address</name>
39 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8030</value>
40 |   </property>
41 |
42 |   <property>
43 |     <name>yarn.resourcemanager.resource-tracker.address</name>
44 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8031</value>
45 |   </property>
46 |
47 |   <property>
48 |     <name>yarn.resourcemanager.address</name>
49 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8032</value>
50 |   </property>
51 |
52 |   <property>
53 |     <name>yarn.resourcemanager.admin.address</name>
54 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8033</value>
55 |   </property>
56 |
57 |   <property>
58 |     <name>yarn.resourcemanager.webapp.https.address</name>
59 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8090</value>
60 |   </property>
61 |
62 |   <property>
63 |     <name>yarn.nodemanager.localizer.address</name>
64 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8040</value>
65 |   </property>
66 |
67 |   <property>
68 |     <name>yarn.nodemanager.webapp.address</name>
69 |     <value>CLOUD_LOCAL_BIND_ADDRESS:8042</value>
70 |   </property>
71 |
72 | </configuration>
73 |
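The template disables YARN's virtual-memory check (a common single-node tweak so small containers are not killed) and pins every ResourceManager and NodeManager port. Once YARN is running, quick checks might be:

    yarn node -list                                # expect a single NodeManager
    curl -s http://localhost:8088/cluster | head   # ResourceManager web UI (yarn.resourcemanager.webapp.address)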
--------------------------------------------------------------------------------
/templates/hbase/hbase-site.xml:
--------------------------------------------------------------------------------
1 | <?xml version="1.0"?>
2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
3 | <!--
4 | /**
5 |  *
6 |  * Licensed to the Apache Software Foundation (ASF) under one
7 |  * or more contributor license agreements.  See the NOTICE file
8 |  * distributed with this work for additional information
9 |  * regarding copyright ownership.  The ASF licenses this file
10 |  * to you under the Apache License, Version 2.0 (the
11 |  * "License"); you may not use this file except in compliance
12 |  * with the License.  You may obtain a copy of the License at
13 |  *
14 |  *     http://www.apache.org/licenses/LICENSE-2.0
15 |  *
16 |  * Unless required by applicable law or agreed to in writing, software
17 |  * distributed under the License is distributed on an "AS IS" BASIS,
18 |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19 |  * See the License for the specific language governing permissions and
20 |  * limitations under the License.
21 |  */
22 | -->
23 | <configuration>
24 |   <property>
25 |     <name>hbase.cluster.distributed</name>
26 |     <value>true</value>
27 |   </property>
28 |   <property>
29 |     <name>hbase.rootdir</name>
30 |     <value>hdfs://CLOUD_LOCAL_HOSTNAME:9000/hbase</value>
31 |   </property>
32 |
33 |
34 |   <property>
35 |     <name>hbase.master.port</name>
36 |     <value>16000</value>
37 |   </property>
38 |   <property>
39 |     <name>hbase.master.info.port</name>
40 |     <value>16010</value>
41 |   </property>
42 |   <property>
43 |     <name>hbase.master.info.bindAddress</name>
44 |     <value>0.0.0.0</value>
45 |   </property>
46 |
47 |
48 |   <property>
49 |     <name>hbase.regionserver.port</name>
50 |     <value>16020</value>
51 |   </property>
52 |   <property>
53 |     <name>hbase.regionserver.info.port</name>
54 |     <value>16030</value>
55 |   </property>
56 |   <property>
57 |     <name>hbase.regionserver.info.bindAddress</name>
58 |     <value>0.0.0.0</value>
59 |   </property>
60 |
61 |
62 |   <property>
63 |     <name>hbase.rest.port</name>
64 |     <value>16040</value>
65 |   </property>
66 |
67 |
68 |   <property>
69 |     <name>hbase.status.multicast.address.port</name>
70 |     <value>16100</value>
71 |   </property>
72 |
73 |   <property>
74 |     <name>hbase.zookeeper.quorum</name>
75 |     <value>CLOUD_LOCAL_HOSTNAME</value>
76 |   </property>
77 |   <property>
78 |     <name>hbase.zookeeper.property.clientPort</name>
79 |     <value>2181</value>
80 |   </property>
81 |
82 | </configuration>
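`hbase.rootdir` reuses the same local HDFS and the ZooKeeper quorum is the single local instance, so HBase runs in "distributed" mode on one node. If HBase is enabled in `conf/cloud-local.conf`, a minimal check from the shell:

    hbase shell      # master UI on 16010, regionserver UI on 16030 per the ports above
    # hbase> status
    # hbase> list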
--------------------------------------------------------------------------------
/templates/kafka/server.properties:
--------------------------------------------------------------------------------
1 | # Licensed to the Apache Software Foundation (ASF) under one or more
2 | # contributor license agreements. See the NOTICE file distributed with
3 | # this work for additional information regarding copyright ownership.
4 | # The ASF licenses this file to You under the Apache License, Version 2.0
5 | # (the "License"); you may not use this file except in compliance with
6 | # the License. You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | # see kafka.server.KafkaConfig for additional details and defaults
16 |
17 | ############################# Server Basics #############################
18 |
19 | # The id of the broker. This must be set to a unique integer for each broker.
20 | broker.id=0
21 |
22 | ############################# Socket Server Settings #############################
23 |
24 | # CL_port_default 9092
25 | listeners=PLAINTEXT://CLOUD_LOCAL_HOSTNAME:9092
26 |
27 | # The port the socket server listens on
28 | #port=9092
29 |
30 | # Hostname the broker will bind to. If not set, the server will bind to all interfaces
31 | #host.name=localhost
32 |
33 | # Hostname the broker will advertise to producers and consumers. If not set, it uses the
34 | # value for "host.name" if configured. Otherwise, it will use the value returned from
35 | # java.net.InetAddress.getCanonicalHostName().
36 | #advertised.host.name=
37 |
38 | # The port to publish to ZooKeeper for clients to use. If this is not set,
39 | # it will publish the same port that the broker binds to.
40 | #advertised.port=
41 |
42 | # The number of threads handling network requests
43 | num.network.threads=3
44 |
45 | # The number of threads doing disk I/O
46 | num.io.threads=8
47 |
48 | # The send buffer (SO_SNDBUF) used by the socket server
49 | socket.send.buffer.bytes=102400
50 |
51 | # The receive buffer (SO_RCVBUF) used by the socket server
52 | socket.receive.buffer.bytes=102400
53 |
54 | # The maximum size of a request that the socket server will accept (protection against OOM)
55 | socket.request.max.bytes=104857600
56 |
57 |
58 | ############################# Log Basics #############################
59 |
60 | # A comma-separated list of directories under which to store log files
61 | log.dirs=LOCAL_CLOUD_PREFIX/data/kafka-logs
62 |
63 | # The default number of log partitions per topic. More partitions allow greater
64 | # parallelism for consumption, but this will also result in more files across
65 | # the brokers.
66 | num.partitions=1
67 |
68 | # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
69 | # This value is recommended to be increased for installations with data dirs located in a RAID array.
70 | num.recovery.threads.per.data.dir=1
71 |
72 | ############################# Log Flush Policy #############################
73 |
74 | # Messages are immediately written to the filesystem but by default we only fsync() to sync
75 | # the OS cache lazily. The following configurations control the flush of data to disk.
76 | # There are a few important trade-offs here:
77 | # 1. Durability: Unflushed data may be lost if you are not using replication.
78 | # 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
79 | # 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
80 | # The settings below allow one to configure the flush policy to flush data after a period of time or
81 | # every N messages (or both). This can be done globally and overridden on a per-topic basis.
82 |
83 | # The number of messages to accept before forcing a flush of data to disk
84 | #log.flush.interval.messages=10000
85 |
86 | # The maximum amount of time a message can sit in a log before we force a flush
87 | #log.flush.interval.ms=1000
88 |
89 | ############################# Log Retention Policy #############################
90 |
91 | # The following configurations control the disposal of log segments. The policy can
92 | # be set to delete segments after a period of time, or after a given size has accumulated.
93 | # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
94 | # from the end of the log.
95 |
96 | # The minimum age of a log file to be eligible for deletion
97 | log.retention.hours=168
98 |
99 | # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
100 | # segments don't drop below log.retention.bytes.
101 | #log.retention.bytes=1073741824
102 |
103 | # The maximum size of a log segment file. When this size is reached a new log segment will be created.
104 | log.segment.bytes=1073741824
105 |
106 | # The interval at which log segments are checked to see if they can be deleted according
107 | # to the retention policies
108 | log.retention.check.interval.ms=300000
109 |
110 | # By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
111 | # If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
112 | log.cleaner.enable=false
113 |
114 | ############################# Zookeeper #############################
115 |
116 | # Zookeeper connection string (see zookeeper docs for details).
117 | # This is a comma-separated list of host:port pairs, each corresponding to a zk
118 | # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
119 | # You can also append an optional chroot string to the urls to specify the
120 | # root directory for all kafka znodes.
121 | # CL_port_default 2181
122 | zookeeper.connect=CLOUD_LOCAL_HOSTNAME:2181
123 |
124 | # Timeout in ms for connecting to zookeeper
125 | zookeeper.connection.timeout.ms=6000
126 |
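The broker listens on port 9092 and registers with the same local ZooKeeper on 2181. A round-trip test with the scripts shipped in the Kafka distribution might look like the following; the exact flags depend on the Kafka release cloud-local downloads (older tools take `--zookeeper`, newer ones `--bootstrap-server`):

    kafka-topics.sh --create --zookeeper "$(hostname):2181" --replication-factor 1 --partitions 1 --topic test
    echo "hello" | kafka-console-producer.sh --broker-list "$(hostname):9092" --topic test
    kafka-console-consumer.sh --bootstrap-server "$(hostname):9092" --topic test --from-beginning --max-messages 1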
--------------------------------------------------------------------------------
/templates/zeppelin/zeppelin-env.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | #
3 | # Licensed to the Apache Software Foundation (ASF) under one or more
4 | # contributor license agreements. See the NOTICE file distributed with
5 | # this work for additional information regarding copyright ownership.
6 | # The ASF licenses this file to You under the Apache License, Version 2.0
7 | # (the "License"); you may not use this file except in compliance with
8 | # the License. You may obtain a copy of the License at
9 | #
10 | # http://www.apache.org/licenses/LICENSE-2.0
11 | #
12 | # Unless required by applicable law or agreed to in writing, software
13 | # distributed under the License is distributed on an "AS IS" BASIS,
14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | # See the License for the specific language governing permissions and
16 | # limitations under the License.
17 | #
18 |
19 | # export JAVA_HOME=
20 | # export MASTER= # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
21 | # export ZEPPELIN_JAVA_OPTS # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
22 | # export ZEPPELIN_MEM # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
23 | # export ZEPPELIN_INTP_MEM # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
24 | # export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
25 | export ZEPPELIN_PORT=5771 # Zeppelin server port. Defaults to 8080.
26 | # export ZEPPELIN_SSL_PORT # ssl port (used when ssl environment variable is set to true)
27 |
28 | # export ZEPPELIN_LOG_DIR # Where log files are stored. PWD by default.
29 | # export ZEPPELIN_PID_DIR # Where the pid files are stored. ${ZEPPELIN_HOME}/run by default.
30 | # export ZEPPELIN_WAR_TEMPDIR # The location of the jetty temporary directory.
31 | # export ZEPPELIN_NOTEBOOK_DIR # Where notebooks are saved
32 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
33 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE # hide homescreen notebook from list when this value set to "true". default "false"
34 | # export ZEPPELIN_NOTEBOOK_S3_BUCKET # Bucket where notebook saved
35 | # export ZEPPELIN_NOTEBOOK_S3_ENDPOINT # Endpoint of the bucket
36 | # export ZEPPELIN_NOTEBOOK_S3_USER # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
37 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID # AWS KMS key ID
38 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION # AWS KMS key region
39 | # export ZEPPELIN_IDENT_STRING # A string representing this instance of zeppelin. $USER by default.
40 | # export ZEPPELIN_NICENESS # The scheduling priority for daemons. Defaults to 0.
41 | # export ZEPPELIN_INTERPRETER_LOCALREPO # Local repository for interpreter's additional dependency loading
42 | # export ZEPPELIN_NOTEBOOK_STORAGE # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
43 | # export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC # If there are multiple notebook storages, should we treat the first one as the only source of truth?
44 | # export ZEPPELIN_NOTEBOOK_PUBLIC # Make notebook public by default when created, private otherwise
45 |
46 | #### Spark interpreter configuration ####
47 |
48 | ## Use provided spark installation ##
49 | ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
50 | ##
51 | # export SPARK_HOME # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
52 | # export SPARK_SUBMIT_OPTIONS # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
53 | # export SPARK_APP_NAME # (optional) The name of spark application.
54 |
55 | ## Use embedded spark binaries ##
56 | ## without SPARK_HOME defined, Zeppelin is still able to run the spark interpreter process using the embedded spark binaries.
57 | ## however, this is not encouraged when you can define SPARK_HOME
58 | ##
59 | # Options read in YARN client mode
60 | # export HADOOP_CONF_DIR # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
61 | # Pyspark (supported with Spark 1.2.1 and above)
62 | # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
63 | # export PYSPARK_PYTHON # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
64 | # export PYTHONPATH
65 |
66 | ## Spark interpreter options ##
67 | ##
68 | # export ZEPPELIN_SPARK_USEHIVECONTEXT # Use HiveContext instead of SQLContext if set true. true by default.
69 | # export ZEPPELIN_SPARK_CONCURRENTSQL # Execute multiple SQL concurrently if set true. false by default.
70 | # export ZEPPELIN_SPARK_IMPORTIMPLICIT # Import implicits, UDF collection, and sql if set true. true by default.
71 | # export ZEPPELIN_SPARK_MAXRESULT # Max number of Spark SQL result to display. 1000 by default.
72 | # export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000
73 |
74 |
75 | #### HBase interpreter configuration ####
76 |
77 | ## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
78 |
79 | # export HBASE_HOME= # (required) The directory under which the HBase scripts and configuration are located
80 | # export HBASE_CONF_DIR= # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml
81 |
82 | #### ZeppelinHub connection configuration ####
83 | # export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use
84 | # export ZEPPELINHUB_API_TOKEN # Refers to the Zeppelin instance token of the user
85 | # export ZEPPELINHUB_USER_KEY # Optional, when using Zeppelin with authentication.
86 |
87 | #### Zeppelin impersonation configuration
88 | # export ZEPPELIN_IMPERSONATE_CMD # Optional, when the user wants to run the interpreter as the end web user. eg) 'sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
89 | # export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER # Optional, true by default; can be set to false if you don't want to use the --proxy-user option with the Spark interpreter when impersonation is enabled
90 |
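The only setting cloud-local changes here is `ZEPPELIN_PORT`, which moves the notebook UI from the default 8080 to 5771, presumably to avoid colliding with the other local web UIs. With Zeppelin enabled and started:

    curl -s http://localhost:5771/ | head    # Zeppelin notebook UI on the overridden port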
--------------------------------------------------------------------------------
/templates/zookeeper/zoo.cfg:
--------------------------------------------------------------------------------
1 | # The number of milliseconds of each tick
2 | tickTime=2000
3 | # The number of ticks that the initial
4 | # synchronization phase can take
5 | initLimit=10
6 | # The number of ticks that can pass between
7 | # sending a request and getting an acknowledgement
8 | syncLimit=5
9 | # the directory where the snapshot is stored.
10 | # do not use /tmp for storage; /tmp here is just
11 | # an example.
12 | dataDir=LOCAL_CLOUD_PREFIX/data/zookeeper
13 | # the port at which the clients will connect
14 | clientPort=2181
15 | # the maximum number of client connections.
16 | # increase this if you need to handle more clients
17 | #maxClientCnxns=60
18 | #
19 | # Be sure to read the maintenance section of the
20 | # administrator guide before turning on autopurge.
21 | #
22 | # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
23 | #
24 | # The number of snapshots to retain in dataDir
25 | #autopurge.snapRetainCount=3
26 | # Purge task interval in hours
27 | # Set to "0" to disable auto purge feature
28 | #autopurge.purgeInterval=1
29 |
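ZooKeeper stores its snapshots under the cloud-local data directory instead of `/tmp` and listens on the standard client port 2181, which the Accumulo, HBase, and Kafka templates above all point at. A quick connectivity check with the CLI bundled with ZooKeeper:

    zkCli.sh -server localhost:2181 ls /     # expect znodes such as /accumulo, /hbase, /brokers for whichever services are enabled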
--------------------------------------------------------------------------------