├── .gitignore
├── README.md
├── bin
    ├── cloud-local.sh
    ├── config.sh
    └── ports.sh
├── conf
    └── cloud-local.conf
└── templates
    ├── accumulo
        ├── accumulo-env.sh
        └── accumulo-site.xml
    ├── hadoop
        ├── core-site.xml
        ├── hdfs-site.xml
        ├── mapred-site.xml
        └── yarn-site.xml
    ├── hbase
        └── hbase-site.xml
    ├── kafka
        └── server.properties
    ├── zeppelin
        └── zeppelin-env.sh
    └── zookeeper
        └── zoo.cfg


/.gitignore:
--------------------------------------------------------------------------------
 1 | *~
 2 | /pkg
 3 | data/*
 4 | *.tar.gz
 5 | /accumulo*
 6 | /hbase*
 7 | /hadoop*
 8 | /zookeeper*
 9 | /zeppelin*
10 | derby*
11 | /metastore_db*
12 | /kafka*
13 | /spark*
14 | /scala*
15 | /geomesa*
16 | /scala*
17 | /.idea
18 | /cloud-local.iml
19 | *.iml
20 | .envrc
21 | 
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # cloud-local
  2 | 
  3 | Cloud-local is a collection of bash scripts to set up a single-node cloud on on your desktop, laptop, or NUC. Performance expectations are to be sufficient for testing things like map-reduce ingest, converters in real life with real files, and your own geoserver/iterator stack. This setup is preconfigured to run YARN so you can submit command line tools mapreduce jobs to it. Currently localhost ssh is NOT required so it will work on a NUC. 
  4 | 
  5 | Cloud Local can be used to run single node versions of the following software:
  6 | * Hadoop HDFS
  7 | * YARN
  8 | * Accumulo
  9 | * HBase
 10 | * Spark (on yarn)
 11 | * Kafka
 12 | * GeoMesa
 13 | 
 14 | ## Versions and Branches
 15 | 
 16 | The main branch requires Hadoop 3.x and Accumulo 2.0 while the hadoop2 branch is based on Hadoop 2.x and Accumulo 1.x
 17 | 
 18 | ## Initial Configuration
 19 | 
 20 | A proxy server can be configured by using the standard `http_proxy` env var or the cloud-local specific `cl_http_proxy` env var. The cloud local specific variable takes precedence.
 21 | 
 22 | ## Getting Started 
 23 | 
 24 | To prepare for the first run of cloud-local, you may need to `unset` environment variables `HADOOP_HOME`, `ACCUMULO_HOME`, `ZOOKEEPER_HOME`, `HBASE_HOME`, and others. If `env |grep -i hadoop` comes back empty, you should be good to go. You should also `kill` any instances of zookeeper or hadoop running locally. Find them with `jps -lV`.
 25 | 
 26 | When using this the first time...
 27 | 
 28 |     git clone git@github.com:ccri/cloud-local.git
 29 |     cd cloud-local
 30 |     bin/cloud-local.sh init
 31 | 
 32 | By default only HDFS is started. Edit `conf/cloud-local.conf` to enable Accumulo, BHase, Kafka, GeoMesa, Spark and/or Zeppelin.
 33 | 
 34 | Cloud local sets a default accumulo instance name of "local" and password of "secret" which can be modified by editing the `conf/cloud-local.conf` file. If you want to change this you'll need to stop, clean, and reconfigure.
 35 | 
 36 | This init script does several things:
 37 | * configure HDFS configuration files
 38 | * format the HDFS namenode
 39 | * create a user homedir in hdfs
 40 | * initialize accumulo/hbase
 41 | * start up zookeeper, hadoop, and accumulo/hbase
 42 | * start kafka broker
 43 | * install and start Zeppelin
 44 | * install GeoMesa Accumulo iterators
 45 | * install GeoMesa command-line tools
 46 | 
 47 | After running `init` source the variables in your bashrc or other shell:
 48 | 
 49 |     source bin/config.sh
 50 | 
 51 | Now you should have the environment vars set:
 52 |     
 53 |     env | grep -i hadoop
 54 | 
 55 | Now you can run fun commands like:
 56 | 
 57 |     hadoop fs -ls /
 58 |     accumulo shell -u root 
 59 | 
 60 | After installing it you should be able to reach your standard cloud urls:
 61 | 
 62 | * Accumulo:    http://localhost:50095
 63 | * Hadoop DFS:  http://localhost:50070
 64 | * Job Tracker: http://localhost:8088
 65 | * Zeppelin:    http://localhost:5771
 66 | 
 67 | ## Getting Help
 68 | 
 69 | Options for using `cloud-local.sh` can be found by calling:
 70 | 
 71 |     bin/cloud-local.sh help
 72 | 
 73 | You can also set `CL_VERBOSE=1` env variable in `conf/cloud-local.conf` to increase messages
 74 | 
 75 | ## Stopping and Starting
 76 | 
 77 | You can safely stop the cloud using: 
 78 | 
 79 |     bin/cloud-local.sh stop
 80 | 
 81 | You should stop the cloud before shutting down the machine or doing maintenance.
 82 | 
 83 | You can start the cloud back up using the analogous `start` option. Be sure that the cloud is not running (hit the cloud urls or `ps aux|grep -i hadoop`).
 84 | 
 85 |     bin/cloud-local.sh start
 86 | 
 87 | If existing ports are bound to the ports needed for cloud-local an error message will be printed and the script will stop.
 88 | 
 89 | ## Changing Ports, Hostname, and Bind Address
 90 | 
 91 | cloud-local allows you to modify the ports, hostname, and bind addresses in configuration or using variables in your env (bashrc). For example:
 92 | 
 93 |     # sample .bashrc configuration
 94 |     
 95 |     # offset all ports by 10000
 96 |     export CL_PORT_OFFSET=10000
 97 |     
 98 |     # change the bind address
 99 |     export CL_BIND_ADDRESS=192.168.2.2
100 |     
101 |     # change the hostname from localhost to something else
102 |     export CL_HOSTNAME=mydns.mycompany.com
103 |     
104 | Port offseting moves the entire port space by a given numerical amount in order to allow multiple cloud-local instances to run on a single machine (usually by different users). The bind address and hostname args allow you to reference cloud local from other machines.
105 | 
106 | WARNING - you should stop and clean cloud-local before changing any of these parameters since they will modify the config and may prevent cloud-local from cleanly shutting down. Changing port offsets is supported by XML comments in the accumulo and hadoop config files. Removing or changing these comments (CL_port_default) will likely cause failures.
107 | 
108 | ## GeoServer
109 | 
110 | If you have the environment variable GEOSERVER_HOME set you can use this parameter to start GeoServer at the same time but running in a child thread.
111 | 
112 |     bin/cloud-local.sh start -gs
113 | 
114 | Similarly, you can instruct cloud-local to shutdown GeoServer with the cloud using:
115 | 
116 |     bin/cloud-local.sh stop -gs
117 |     
118 | Additionally, if you need to restart GeoServer you may use the command `regeoserver`:
119 | 
120 |     bin/cloud-local.sh regeoserver
121 |     
122 | The GeoServer PID is stored in `$CLOUD_HOME/data/geoserver/pid/geoserver.pid` and GeoServer's stdout is redirected to `$CLOUD_HOME/data/geoserver/log/std.out`.
123 | 
124 | ## Zeppelin
125 | 
126 | Zeppelin is *disabled* by default.
127 | 
128 | Currently, we are using the Zeppelin distribution that includes all of the interpreters, and
129 | it is configured to run against Spark only in local mode.  If you want to connect to another
130 | (real) cloud, you will have to configure that manually; see:
131 | 
132 | [Zeppelin documentation](http://zeppelin.apache.org/docs/0.7.0/install/spark_cluster_mode.html#spark-on-yarn-mode)
133 | 
134 | ### GeoMesa Spark-SQL on Zeppelin
135 | 
136 | To enable GeoMesa's Spark-SQL within Zeppelin:
137 | 
138 | 1.  point your browser to your [local Zeppelin interpreter configuration](http://localhost:5771/#/interpreter)
139 | 1.  scroll to the bottom where the *Spark* interpreter configuration appears
140 | 1.  click on the "edit" button next to the interpreter name (on the right-hand side of the UI)
141 | 1.  within the _Dependencies_ section, add this one JAR (either as a full, local file name or as Maven GAV coordinates):
142 |     1.  geomesa-accumulo-spark-runtime_2.11-1.3.0.jar
143 | 1.  when prompted by the pop-up, click to restart the Spark intepreter
144 | 
145 | That's it!  There is no need to restart any of the cloud-local services.
146 | 
147 | ## Maintenance
148 | 
149 | The `cloud-local.sh` script provides options for maintenance. Best to stop the cloud before performing any of these tasks. Pass in the parameter `clean` to remove software (but not the tar.gz's) and data. The parameter `reconfigure` will first `clean` then `init`.
150 | 
151 | ### Updating
152 | 
153 | When this git repo is updated, follow the steps below. The steps below will remove your data. 
154 | 
155 |     cd $CLOUD_HOME
156 |     bin/cloud-local.sh stop
157 |     bin/cloud-local.sh clean
158 |     git pull
159 |     bin/cloud-local.sh init
160 | 
161 | ### Starting over
162 | 
163 | If you foobar your cloud, you can just delete everything and start over. You should do this once a week or so just for good measure.  
164 | 
165 |     cd $CLOUD_HOME
166 |     bin/cloud-local.sh stop  #if cloud is running
167 |     rm * -rf
168 |     git pull
169 |     git reset --hard
170 |     bin/cloud-local.sh init
171 | 
172 | ## Virtual Machine Help
173 | 
174 | If you are using cloud-local within a virtual machine running you your local box you may want to set up port forwarding for port 50095 to see the accumulo monitor. For VirtualBox go to VM's Settings->Network->Port Forwarding section (name=accumulo, protocol=TCP, Host IP=127.0.0.1, Guest IP (leave blank), Guest Port=50095)
175 | 


--------------------------------------------------------------------------------
/bin/cloud-local.sh:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | 
  3 | REPO_BASE=https://repo.locationtech.org/content/repositories/geomesa-releases
  4 | 
  5 | # thanks accumulo for these resolutions snippets
  6 | # Start: Resolve Script Directory
  7 | SOURCE="${BASH_SOURCE[0]}"
  8 | while [[ -h "${SOURCE}" ]]; do # resolve $SOURCE until the file is no longer a symlink
  9 |    bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )"
 10 |    SOURCE="$(readlink "${SOURCE}")"
 11 |    [[ "${SOURCE}" != /* ]] && SOURCE="${bin}/${SOURCE}" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
 12 | done
 13 | bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )"
 14 | script=$( basename "${SOURCE}" )
 15 | # Stop: Resolve Script Directory
 16 | 
 17 | # Start: config
 18 | . "${bin}"/config.sh
 19 | 
 20 | # Check config
 21 | if ! validate_config; then
 22 |   echo "Invalid configuration"
 23 |   exit 1
 24 | fi
 25 | 
 26 | # check java home
 27 | if [[ -z "$JAVA_HOME" ]]; then
 28 |   echo "must set JAVA_HOME..."
 29 |   exit 1
 30 | fi
 31 | # Stop: config
 32 | 
 33 | # import port checking
 34 | . "${bin}"/ports.sh
 35 | 
 36 | function download_packages {
 37 |   # Is the pre-download packages variable set?
 38 |   if [[ ! -z ${pkg_pre_download+x} ]]; then
 39 |     # Does that folder actually exist?
 40 |     if [[ -d ${pkg_pre_download} ]] ; then
 41 |       test -d ${CLOUD_HOME}/pkg || rmdir ${CLOUD_HOME}/pkg
 42 |       test -h ${CLOUD_HOME}/pkg && rm ${CLOUD_HOME}/pkg
 43 |       ln -s ${pkg_pre_download} ${CLOUD_HOME}/pkg
 44 |       echo "Skipping downloads... using ${pkg_pre_download}"
 45 |       return 0
 46 |     fi
 47 |   fi
 48 | 
 49 |   # get stuff
 50 |   echo "Downloading packages from internet..."
 51 |   test -d ${CLOUD_HOME}/pkg || mkdir ${CLOUD_HOME}/pkg
 52 | 
 53 |   # check for proxy
 54 |   if [[ ! -z ${cl_http_proxy+x} ]]; then
 55 |     export http_proxy="${cl_http_proxy}"
 56 |   fi
 57 | 
 58 |   if [[ ! -z ${http_proxy+x} ]]; then
 59 |     echo "Using proxy ${http_proxy}"
 60 |   fi
 61 | 
 62 |   # GeoMesa
 63 |   if [[ "${geomesa_enabled}" -eq "1" ]]; then
 64 |     gm="geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz"
 65 |     url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}"
 66 |     wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; };
 67 |     gm="geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar"
 68 |     url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}"
 69 |     wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; };
 70 |   fi
 71 | 
 72 |   # Scala
 73 |   if [[ "${scala_enabled}" -eq "1" ]]; then
 74 |     url="http://downloads.lightbend.com/scala/${pkg_scala_ver}/scala-${pkg_scala_ver}.tgz"
 75 |     file="${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz"
 76 |     wget -c -O "${file}" "${url}" \
 77 |       || { rm -f "${file}"; echo "Error downloading: ${file}"; errorList="${errorList} scala-${pkg_scala_ver}.tgz ${NL}"; };
 78 |   fi
 79 | 
 80 |   local apache_archive_url="http://archive.apache.org/dist"
 81 | 
 82 |   local maven=${pkg_src_maven}
 83 | 
 84 |   declare -a urls=("${apache_archive_url}/hadoop/common/hadoop-${pkg_hadoop_ver}/hadoop-${pkg_hadoop_ver}.tar.gz"
 85 |                    "${apache_archive_url}/zookeeper/zookeeper-${pkg_zookeeper_ver}/zookeeper-${pkg_zookeeper_ver}.tar.gz")
 86 | 
 87 |   if [[ "$spark_enabled" -eq 1 ]]; then
 88 |     urls=("${urls[@]}" "${apache_archive_url}/spark/spark-${pkg_spark_ver}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz")
 89 |   fi
 90 |   
 91 |   if [[ "$kafka_enabled" -eq 1 ]]; then
 92 |     urls=("${urls[@]}" "${apache_archive_url}/kafka/${pkg_kafka_ver}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz")
 93 |   fi
 94 | 
 95 |   if [[ "$acc_enabled" -eq 1 ]]; then
 96 |     urls=("${urls[@]}" "${maven}/org/apache/accumulo/accumulo/${pkg_accumulo_ver}/accumulo-${pkg_accumulo_ver}-bin.tar.gz")
 97 |   fi
 98 | 
 99 |   if [[ "$hbase_enabled" -eq 1 ]]; then
100 |     urls=("${urls[@]}" "${apache_archive_url}/hbase/${pkg_hbase_ver}/hbase-${pkg_hbase_ver}-bin.tar.gz")
101 |   fi
102 | 
103 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
104 |     urls=("${urls[@]}" "${apache_archive_url}/zeppelin/zeppelin-${pkg_zeppelin_ver}/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz")
105 |   fi
106 | 
107 |   for x in "${urls[@]}"; do
108 |       fname=$(basename "$x");
109 |       echo "fetching ${x}";
110 |       wget -c -O "${CLOUD_HOME}/pkg/${fname}" "$x" || { rm -f "${CLOUD_HOME}/pkg/${fname}"; echo "Error Downloading: ${fname}"; errorList="${errorList} ${x} ${NL}"; };
111 |   done 
112 | 
113 |   if [[ -n "${errorList}" ]]; then
114 |     echo "Failed to download: ${NL} ${errorList}";
115 |   fi
116 | }
117 | 
118 | function unpackage {
119 |   local targs
120 |   if [[ "${CL_VERBOSE}" == "1" ]]; then
121 |     targs="xvf"
122 |   else
123 |     targs="xf"
124 |   fi
125 | 
126 |   echo "Unpackaging software..."
127 |   [[ "${geomesa_enabled}" -eq "1" ]] \
128 |     && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz") \
129 |     && echo "Unpacked GeoMesa Tools"
130 |   [[ "${scala_enabled}" -eq "1" ]] \
131 |     && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz") \
132 |     && echo "Unpacked Scala ${pkg_scala_ver}"
133 |   (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zookeeper-${pkg_zookeeper_ver}.tar.gz") && echo "Unpacked zookeeper"
134 |   [[ "$acc_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/accumulo-${pkg_accumulo_ver}-bin.tar.gz") && echo "Unpacked accumulo"
135 |   [[ "$hbase_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hbase-${pkg_hbase_ver}-bin.tar.gz") && echo "Unpacked hbase"
136 |   [[ "$zeppelin_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz") && echo "Unpacked zeppelin"
137 |   [[ "$kafka_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz") && echo "Unpacked kafka"
138 |   [[ "$spark_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz") && echo "Unpacked spark"
139 |   (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hadoop-${pkg_hadoop_ver}.tar.gz") && echo "Unpacked hadoop"
140 | }
141 | 
142 | function configure {
143 |   mkdir -p "${CLOUD_HOME}/tmp/staging"
144 |   cp -r ${CLOUD_HOME}/templates/* ${CLOUD_HOME}/tmp/staging/
145 | 
146 |   # accumulo config before substitutions
147 |   [[ "$acc_enabled" -eq 1 ]] && cp $ACCUMULO_HOME/conf/examples/3GB/standalone/* $ACCUMULO_HOME/conf/
148 | 
149 |   ## Substitute env vars
150 |   sed -i~orig "s#LOCAL_CLOUD_PREFIX#${CLOUD_HOME}#;s#CLOUD_LOCAL_HOSTNAME#${CL_HOSTNAME}#;s#CLOUD_LOCAL_BIND_ADDRESS#${CL_BIND_ADDRESS}#" ${CLOUD_HOME}/tmp/staging/*/*
151 | 
152 |   if [[ "$acc_enabled" -eq 1 ]]; then
153 |     # accumulo config
154 |     echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/gc
155 |     echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/masters
156 |     echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tservers
157 |     echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/monitor
158 |     echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tracers
159 |   fi
160 | 
161 |   if [[ "$hbase_enabled" -eq 1 ]]; then
162 |     sed -i~orig "s/\# export HBASE_MANAGES_ZK=true/export HBASE_MANAGES_ZK=false/" "${HBASE_HOME}/conf/hbase-env.sh"
163 |     echo "${CL_HOSTNAME}" > ${HBASE_HOME}/conf/regionservers
164 |   fi
165 | 
166 |   # Zeppelin configuration
167 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
168 |     echo "[WARNING]  Zeppelin configuration is only template-based for now!"
169 |   fi
170 | 
171 |   # hadoop slaves file
172 |   echo "${CL_HOSTNAME}" > ${CLOUD_HOME}/tmp/staging/hadoop/slaves
173 | 
174 |   # deploy from staging
175 |   echo "Deploying config from staging..."
176 |   test -d $HADOOP_CONF_DIR ||  mkdir $HADOOP_CONF_DIR
177 |   test -d $ZOOKEEPER_HOME/conf || mkdir $ZOOKEEPER_HOME/conf
178 |   [[ "$kafka_enabled" -eq 1 ]] && (test -d $KAFKA_HOME/config || mkdir $KAFKA_HOME/config)
179 |   cp ${CLOUD_HOME}/tmp/staging/hadoop/* $HADOOP_CONF_DIR/
180 |   cp ${CLOUD_HOME}/tmp/staging/zookeeper/* $ZOOKEEPER_HOME/conf/
181 |   [[ "$kafka_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/kafka/* $KAFKA_HOME/config/
182 |   [[ "$acc_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/accumulo/* ${ACCUMULO_HOME}/conf/
183 |   [[ "$geomesa_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/pkg/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar ${ACCUMULO_HOME}/lib/ext/
184 |   [[ "$hbase_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/hbase/* ${HBASE_HOME}/conf/
185 |   [[ "$zeppelin_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/zeppelin/* ${ZEPPELIN_HOME}/conf/
186 | 
187 |   # If Spark doesn't have log4j settings, use the Spark defaults
188 |   if [[ "$spark_enabled" -eq 1 ]]; then
189 |     test -f $SPARK_HOME/conf/log4j.properties && cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
190 |   fi
191 | 
192 |   # configure port offsets
193 |   configure_port_offset
194 | 
195 |   # As of Accumulo 2 accumulo-site.xml is nolonger allowed. To avoid a lot of work rewriting the ports script we'll just use accumulo's converter.
196 |   if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then
197 |     rm -f "$ACCUMULO_HOME/conf/accumulo.properties"
198 |     "$ACCUMULO_HOME/bin/accumulo" convert-config \
199 |       -x "$ACCUMULO_HOME/conf/accumulo-site.xml" \
200 |       -p "$ACCUMULO_HOME/conf/accumulo.properties"
201 |     rm -f "$ACCUMULO_HOME/conf/accumulo-site.xml"
202 |   fi
203 | 
204 |   # Configure accumulo-client.properties
205 |   if [ -f "$ACCUMULO_HOME/conf/accumulo-client.properties" ]; then
206 |     sed -i "s/.*instance.name=.*$/instance.name=$cl_acc_inst_name/" "$ACCUMULO_HOME/conf/accumulo-client.properties"
207 |     sed -i "s/.*auth.principal=.*$/auth.principal=root/"           "$ACCUMULO_HOME/conf/accumulo-client.properties"
208 |     sed -i "s/.*auth.token=.*$/auth.token=$cl_acc_inst_pass/"       "$ACCUMULO_HOME/conf/accumulo-client.properties"
209 | 
210 |   fi
211 |   rm -rf ${CLOUD_HOME}/tmp/staging
212 | }
213 | 
214 | function start_first_time {
215 |   # This seems redundant to config but this is the first time in normal sequence where it will set properly
216 |   [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath)
217 |   # check ports
218 |   check_ports
219 | 
220 |   # start zk
221 |   echo "Starting zoo..."
222 |   (cd $CLOUD_HOME; $ZOOKEEPER_HOME/bin/zkServer.sh start)
223 | 
224 |   if [[ "$kafka_enabled" -eq 1 ]]; then
225 |     # start kafka
226 |     echo "Starting kafka..." 
227 |     $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
228 |   fi
229 |   
230 |   # format namenode
231 |   echo "Formatting namenode..."
232 |   $HADOOP_HOME/bin/hdfs namenode -format
233 |   
234 |   # start hadoop
235 |   echo "Starting hadoop..."
236 |   $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start namenode
237 |   $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode
238 |   $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start datanode
239 |   $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager
240 |   $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager
241 |   
242 |   # Wait for HDFS to exit safemode:
243 |   hdfs_wait_safemode 
244 | 
245 |   # create user homedir
246 |   echo "Creating hdfs path /user/$USER"
247 |   $HADOOP_HOME/bin/hadoop fs -mkdir -p "/user/$USER"
248 |   
249 |   # sleep 
250 |   sleep 5
251 |   
252 |   if [[ "$acc_enabled" -eq 1 ]]; then
253 |     # init accumulo
254 |     echo "Initializing accumulo"
255 |     $ACCUMULO_HOME/bin/accumulo init --instance-name $cl_acc_inst_name --password $cl_acc_inst_pass
256 | 
257 |     # sleep
258 |     sleep 5
259 | 
260 |     # starting accumulo
261 |     echo "Starting accumulo..."
262 |     $ACCUMULO_HOME/bin/accumulo-cluster start
263 |   fi
264 | 
265 |   if [[ "$hbase_enabled" -eq 1 ]]; then
266 |     # start hbase
267 |     echo "Starting hbase..."
268 |     ${HBASE_HOME}/bin/start-hbase.sh
269 |   fi
270 | 
271 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
272 |     # start zeppelin
273 |     echo "Starting zeppelin..."
274 |     ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start
275 |   fi
276 | 
277 |   if [[ "$geoserver_enable" -eq 1 ]]; then
278 |     echo "Initializing geoserver..."
279 |     mkdir -p "${GEOSERVER_DATA_DIR}"
280 |     mkdir "${GEOSERVER_PID_DIR}"
281 |     mkdir "${GEOSERVER_LOG_DIR}"
282 |     touch "${GEOSERVER_PID_DIR}/geoserver.pid"
283 |     touch "${GEOSERVER_LOG_DIR}/std.out"
284 |     start_geoserver
285 |   fi
286 | 
287 | }
288 | 
289 | function start_cloud {
290 |   # Check ports
291 |   check_ports
292 |  
293 |   if [[ "$master_enabled" -eq 1 ]]; then
294 |   	# start zk
295 |   	echo "Starting zoo..."
296 |   	(cd $CLOUD_HOME ; zkServer.sh start)
297 | 
298 |   	if [[ "$kafka_enabled" -eq 1 ]]; then
299 |     	echo "Starting kafka..."
300 |     	$KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
301 |   	fi
302 |   
303 |   	# start hadoop
304 |   	echo "Starting hadoop..."
305 |   	hdfs --config $HADOOP_CONF_DIR --daemon start namenode
306 |   	hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode
307 |   fi
308 | 
309 |   if [[ "$worker_enabled" -eq 1 ]]; then
310 |   	hdfs --config $HADOOP_CONF_DIR --daemon start datanode
311 |   fi
312 | 
313 |   start_yarn
314 |   
315 |   # Wait for HDFS to exit safemode:
316 |   echo "Waiting for HDFS to exit safemode..."
317 |   hdfs_wait_safemode
318 | }
319 | 
320 | function hdfs_wait_safemode {
321 |   safemode_done=1
322 |   while [[ "$safemode_done" -ne 0 ]]; do
323 |     echo "Waiting for HDFS to exit safemode..."
324 |     hdfs dfsadmin -safemode wait
325 |     safemode_done=$?
326 |     if [[ "$safemode_done" -ne 0 ]]; then
327 |       echo "Safe mode not done...sleeping 1"
328 |       sleep 1;
329 |     fi
330 |   done
331 |   echo "Safemode exited"
332 | }
333 | 
334 | 
335 | function start_db {
336 |   if [[ "$acc_enabled" -eq 1 ]]; then
337 |     # starting accumulo
338 |     echo "starting accumulo..."
339 |     $ACCUMULO_HOME/bin/accumulo-cluster start
340 |   fi
341 | 
342 |   if [[ "$hbase_enabled" -eq 1 ]]; then
343 |     # start hbase
344 |     echo "Starting hbase..."
345 |     ${HBASE_HOME}/bin/start-hbase.sh
346 |   fi
347 | 
348 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
349 |     # start zeppelin
350 |     echo "Starting zeppelin..."
351 |     ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start
352 |   fi
353 | 
354 |   if [[ "$geoserver_enable" -eq 1 ]]; then
355 |     echo "Starting geoserver..."
356 |     start_geoserver
357 |   fi
358 | 
359 | }
360 | 
361 | function start_yarn {
362 |   if [[ "$master_enabled" -eq 1 ]]; then
363 |     $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager
364 |   fi 
365 |   if [[ "$worker_enabled" -eq 1 ]]; then
366 |   	$HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager
367 |   fi 
368 | }
369 | 
370 | function start_geoserver {
371 |   (${GEOSERVER_HOME}/bin/startup.sh &> ${GEOSERVER_LOG_DIR}/std.out) &
372 |   GEOSERVER_PID=$!
373 |   echo "${GEOSERVER_PID}" > ${GEOSERVER_PID_DIR}/geoserver.pid
374 |   echo "GeoServer Process Started"
375 |   echo "PID: ${GEOSERVER_PID}"
376 |   echo "GeoServer Out: ${GEOSERVER_LOG_DIR}/std.out"
377 | }
378 | 
379 | function stop_db {
380 |   verify_stop
381 | 
382 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
383 |     echo "Stopping zeppelin..."
384 |     ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh stop
385 |   fi
386 | 
387 |   if [[ "$kafka_enabled" -eq 1 ]]; then
388 |     echo "Stopping kafka..."
389 |     $KAFKA_HOME/bin/kafka-server-stop.sh
390 |   fi
391 | 
392 |   if [[ "$acc_enabled" -eq 1 ]]; then
393 |     echo "Stopping accumulo..."
394 |     $ACCUMULO_HOME/bin/accumulo-cluster stop
395 |   fi
396 | 
397 |   if [[ "$hbase_enabled" -eq 1 ]]; then
398 |     echo "Stopping hbase..."
399 |     ${HBASE_HOME}/bin/stop-hbase.sh
400 |   fi
401 | }
402 | 
403 | function stop_cloud {
404 |   echo "Stopping yarn and dfs..."
405 |   stop_yarn
406 | 
407 |   if [[ "$master_enabled" -eq 1 ]]; then
408 |   	$HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop namenode
409 |   	$HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop secondarynamenode
410 |   fi 
411 |   if [[ "$worker_enabled" -eq 1 ]]; then
412 |   	$HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop datanode
413 |   fi 
414 |   echo "Stopping zookeeper..."
415 |   $ZOOKEEPER_HOME/bin/zkServer.sh stop
416 | 
417 |   if [[ "${geoserver_enabled}" -eq "1" ]]; then
418 |     echo "Stopping geoserver..."
419 |     stop_geoserver
420 |   fi
421 | 
422 | }
423 | 
424 | function psux {
425 |   ps ux | grep -i "$1"
426 | }
427 | 
428 | function verify_stop {
429 |   # Find Processes
430 |   local zeppelin=`psux "[z]eppelin"`
431 |   local kafka=`psux "[k]afka"`
432 |   local accumulo=`psux "[a]ccumulo"`
433 |   local hbase=`psux "[h]base"`
434 |   local yarn=`psux "[y]arn"`
435 |   local zookeeper=`psux "[z]ookeeper"`
436 |   local hadoop=`psux "[h]adoop"`
437 |   local geoserver=`psux "[g]eoserver"`
438 | 
439 |   local res="$zeppelin$kafka$accumulo$hbase$yarn$zookeeper$geoserver"
440 |   if [[ -n "${res}" ]]; then
441 |     echo "The following services do not appear to be shutdown:"
442 |     if [[ -n "${zeppelin}" ]]; then
443 |       echo "${NL}Zeppelin"
444 |       psux "[z]eppelin"
445 |     fi
446 |     if [[ -n "${kafka}" ]]; then
447 |       echo "${NL}Kafka"
448 |       psux "[k]afka"
449 |     fi
450 |     if [[ -n "${accumulo}" ]]; then
451 |       echo "${NL}Accumulo"
452 |       psux "[a]ccumulo"
453 |     fi
454 |     if [[ -n "${hbase}" ]]; then
455 |       echo "${NL}HBase"
456 |       psux "[h]base"
457 |     fi
458 |     if [[ -n "${yarn}" ]]; then
459 |       echo "${NL}Yarn"
460 |       psux "[y]arn"
461 |     fi
462 |     if [[ -n "${zookeeper}" ]]; then
463 |       echo "${NL}Zookeeper"
464 |       psux "[z]ookeeper"
465 |     fi
466 |     if [[ -n "${hadoop}" ]]; then
467 |       echo "${NL}Hadoop"
468 |       psux "[h]adoop"
469 |     fi
470 |     if [[ -n "${geoserver}" ]]; then
471 |       echo "${NL}GeoServer"
472 |       psux "[g]eoserver"
473 |     fi
474 |     read -r -p "Would you like to continue? [Y/n] " confirm
475 |     confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing
476 |     if [[ $confirm =~ ^(yes|y) || $confirm == "" ]]; then
477 |       return 0
478 |     else
479 |       exit 1
480 |     fi
481 |   fi
482 | }
483 | 
484 | function stop_yarn {
485 |   $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop resourcemanager
486 |   $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop nodemanager
487 | }
488 | 
489 | function stop_geoserver {
490 |   GEOSERVER_PID=`cat ${GEOSERVER_PID_DIR}/geoserver.pid`
491 |   if [[ -n "${GEOSERVER_PID}" ]]; then
492 |     kill -15 ${GEOSERVER_PID}
493 |     echo "TERM signal sent to process PID: ${GEOSERVER_PID}"
494 |   else
495 |     echo "No GeoServer PID was saved. This script must be used to start GeoServer in order for this script to be able to stop it."
496 |   fi
497 | }
498 | 
499 | function clear_sw {
500 |   [[ "$zeppelin_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all"
501 |   [[ "$acc_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}"
502 |   [[ "$hbase_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/hbase-${pkg_hbase_ver}"
503 |   [[ -d "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}"  ]] && rm -rf "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}"
504 |   rm -rf "${CLOUD_HOME}/hadoop-${pkg_hadoop_ver}"
505 |   rm -rf "${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}"
506 |   [[ "$kafka_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}"
507 |   rm -rf "${CLOUD_HOME}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}"
508 |   rm -rf "${CLOUD_HOME}/scala-${pkg_scala_ver}"
509 |   rm -rf "${CLOUD_HOME}/tmp"
510 |   if [[ -a "${CLOUD_HOME}/zookeeper.out" ]]; then rm "${CLOUD_HOME}/zookeeper.out"; fi #hahahaha
511 | }
512 | 
513 | function clear_data {
514 |   read -r -p "Are you sure you want to clear data directories? [y/N] " confirm
515 |   confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing
516 |   if [[ $confirm =~ ^(yes|y) ]]; then
517 |     rm -rf ${CLOUD_HOME}/data/yarn/*
518 |     rm -rf ${CLOUD_HOME}/data/zookeeper/*
519 |     rm -rf ${CLOUD_HOME}/data/dfs/data/*
520 |     rm -rf ${CLOUD_HOME}/data/dfs/name/*
521 |     rm -rf ${CLOUD_HOME}/data/hadoop/tmp/*
522 |     rm -rf ${CLOUD_HOME}/data/hadoop/pid/*
523 |     rm -rf ${CLOUD_HOME}/data/geoserver/pid/*
524 |     rm -rf ${CLOUD_HOME}/data/geoserver/log/*
525 |     if [[ -d "${CLOUD_HOME}/data/kafka-logs" ]]; then rm -rf ${CLOUD_HOME}/data/kafka-logs; fi # intentionally to clear dot files
526 |   fi
527 | }
528 | 
529 | function show_help {
530 |   echo "Provide 1 command: (init|start|stop|reconfigure|restart|reyarn|regeoserver|clean|download_only|init_skip_download|help)"
531 |   echo "If the environment variable GEOSERVER_HOME is set then the parameter '-gs' may be used with 'start' to automatically start/stop GeoServer with the cloud."
532 | }
533 | 
534 | if [[ "$2" == "-gs" ]]; then
535 |   if [[ -n "${GEOSERVER_HOME}" && -e $GEOSERVER_HOME/bin/startup.sh ]]; then
536 |       geoserver_enabled=1
537 |   else
538 |     echo "The environment variable GEOSERVER_HOME is not set or is not valid."
539 |   fi
540 | fi
541 | 
542 | if [[ "$#" -ne 1 && "${geoserver_enabled}" -ne "1" ]]; then
543 |   show_help
544 |   exit 1
545 | fi
546 | 
547 | if [[ $1 == 'init' ]]; then
548 |   download_packages && unpackage && configure && start_first_time
549 | elif [[ $1 == 'reconfigure' ]]; then
550 |   echo "reconfiguring..."
551 |   #TODO ensure everything is stopped? prompt to make sure?
552 |   stop_cloud && clear_sw && clear_data && unpackage && configure && start_first_time
553 | elif [[ $1 == 'clean' ]]; then
554 |   echo "cleaning..."
555 |   clear_sw && clear_data
556 |   echo "cleaned!"
557 | elif [[ $1 == 'start' ]]; then
558 |   echo "Starting cloud..."
559 |   start_cloud && start_db
560 |   echo "Cloud Started"
561 | elif [[ $1 == 'stop' ]]; then
562 |   echo "Stopping Cloud..."
563 |   stop_db && stop_cloud
564 |   echo "Cloud stopped"
565 | elif [[ $1 == 'start_db' ]]; then
566 |   echo "Starting cloud..."
567 |   start_db
568 |   echo "Database Started"
569 | elif [[ $1 == 'stop_db' ]]; then
570 |   echo "Stopping Database..."
571 |   stop_db
572 |   echo "Cloud stopped"
573 | elif [[ $1 == 'start_hadoop' ]]; then
574 |   echo "Starting Hadoop..."
575 |   start_cloud
576 |   echo "Cloud Hadoop"
577 | elif [[ $1 == 'stop_hadoop' ]]; then
578 |   echo "Stopping Hadoop..."
579 |   stop_cloud
580 |   echo "Hadoop stopped"
581 | elif [[ $1 == 'reyarn' ]]; then
582 |   echo "Stopping Yarn..."
583 |   stop_yarn
584 |   echo "Starting Yarn..."
585 |   start_yarn
586 | elif [[ $1 == 'regeoserver' ]]; then
587 |   stop_geoserver
588 |   start_geoserver
589 | elif [[ $1 == 'restart' ]]; then
590 |   stop_geoserver && stop_cloud && start_cloud && start_geoserver
591 | elif [[ $1 == 'download_only' ]]; then
592 |   download_packages
593 | elif [[ $1 == 'init_skip_download' ]]; then
594 |   unpackage && configure && start_first_time
595 | else
596 |   show_help
597 | fi
598 | 
599 | 
600 | 
601 | 


--------------------------------------------------------------------------------
/bin/config.sh:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | 
  3 | # thanks accumulo for these resolutions snippets
  4 | if [ -z "${CLOUD_HOME}" ] ; then
  5 |   # Start: Resolve Script Directory
  6 |   SOURCE="${BASH_SOURCE[0]}"
  7 |   while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink
  8 |      bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
  9 |      SOURCE="$(readlink "$SOURCE")"
 10 |      [[ $SOURCE != /* ]] && SOURCE="$bin/$SOURCE" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located
 11 |   done
 12 |   bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
 13 |   script=$( basename "$SOURCE" )
 14 |   # Stop: Resolve Script Directory
 15 | 
 16 |   CLOUD_HOME=$( cd -P ${bin}/.. && pwd )
 17 |   export CLOUD_HOME
 18 | fi
 19 | 
 20 | if [ ! -d "${CLOUD_HOME}" ]; then
 21 |   echo "CLOUD_HOME=${CLOUD_HOME} is not a valid directory. Please make sure it exists"
 22 |   return 1
 23 | fi
 24 | 
 25 | # [Tab] shell completion because i'm lazy
 26 | IFS=$'\n' complete -W "init start stop reconfigure regeoserver reyarn clean help" cloud-local.sh
 27 | NL=$'\n'
 28 | 
 29 | function validate_config {
 30 |   # allowed versions are
 31 |   local pkg_error=""
 32 |   # hadoop 3.2 currently required
 33 |   if [[ -z "$pkg_hadoop_ver" || ! $pkg_hadoop_ver =~ 3[.]2[.].+ ]]; then
 34 |     pkg_error="${pkg_error}Invalid hadoop version: '${pkg_hadoop_ver}' ${NL}"
 35 |   fi
 36 |   # zk 3.4.x 
 37 |   if [[ -z "$pkg_zookeeper_ver" || ! $pkg_zookeeper_ver =~ 3[.]4[.]([56789]|10|11|12|13|14) ]]; then
 38 |     pkg_error="${pkg_error}Invalid zookeeper version: '${pkg_zookeeper_ver}' ${NL}"
 39 |   fi
 40 |   # acc 2.0.0
 41 |   if [[ -z "$pkg_accumulo_ver" || ! $pkg_accumulo_ver =~ 2[.]0[.]0 ]]; then
 42 |     pkg_error="${pkg_error}Invalid accumulo version: '${pkg_accumulo_ver}' ${NL}"
 43 |   fi
 44 |   # kafka 0.9.x, 0.10.x, 0.11.x, 1.0.x
 45 |   if [[ -z "$pkg_kafka_ver" || ! $pkg_kafka_ver =~ ((0[.]9[.].+)|(0[.]1[01][.].+)|1[.]0[.].) ]]; then
 46 |     pkg_error="${pkg_error}Invalid kafka version: '${pkg_kafka_ver}' ${NL}"
 47 |   fi
 48 |   # geomesa scala 1.3.x
 49 |   if [[ -z "$pkg_geomesa_scala_ver" && $pkg_geomesa_ver =~ 3[.]0[.].+ ]]; then
 50 |     pkg_error="${pkg_error}Invalid GeoMesa Scala version: '${pkg_geomesa_scala_ver}' ${NL}"
 51 |   fi
 52 |   
 53 |   if [[ ! -z "$pkg_error" ]]; then
 54 |     echo "ERROR: ${pkg_error}"
 55 |     return 1
 56 |   else
 57 |     return 0
 58 |   fi
 59 | }
 60 | 
 61 | function set_env_vars {
 62 |   if [[ $zeppelin_enabled -eq "1" ]]; then
 63 |     export ZEPPELIN_HOME="${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all"
 64 |   fi
 65 | 
 66 |   if [[ $geomesa_enabled -eq "1" ]]; then
 67 |     unset GEOMESA_HOME
 68 |     unset GEOMESA_BIN
 69 |     export GEOMESA_HOME="${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}"
 70 |     export GEOMESA_BIN="${GEOMESA_HOME}/bin:"
 71 |     echo "Setting GEOMESA_HOME:  ${GEOMESA_HOME}"
 72 |   fi
 73 | 
 74 |   export ZOOKEEPER_HOME="${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}"
 75 | 
 76 |   if [[ $kafka_enabled -eq "1" ]]; then
 77 |     export KAFKA_HOME="${CLOUD_HOME}/kafka_2.11-${pkg_kafka_ver}"
 78 |   fi
 79 |   
 80 |   export HADOOP_HOME="$CLOUD_HOME/hadoop-${pkg_hadoop_ver}"
 81 |   export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop"
 82 |   export HADOOP_COMMON_HOME="${HADOOP_HOME}"
 83 |   export HADOOP_HDFS_HOME="${HADOOP_HOME}"
 84 |   export HADOOP_YARN_HOME="${HADOOP_HOME}"
 85 |   export HADOOP_PID_DIR="${CLOUD_HOME}/data/hadoop/pid"
 86 |   export HADOOP_IDENT_STRING=$(echo ${CLOUD_HOME} | (md5sum 2>/dev/null || md5) | cut -c1-32)
 87 | 
 88 |   export YARN_HOME="${HADOOP_HOME}" 
 89 | 
 90 |   export SPARK_HOME="$CLOUD_HOME/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}"
 91 | 
 92 |   export GEOSERVER_DATA_DIR="${CLOUD_HOME}/data/geoserver"
 93 |   export GEOSERVER_PID_DIR="${GEOSERVER_DATA_DIR}/pid"
 94 |   export GEOSERVER_LOG_DIR="${GEOSERVER_DATA_DIR}/log"  
 95 | 
 96 |   [[ "${acc_enabled}" -eq "1" ]] && export ACCUMULO_HOME="${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}"
 97 |   [[ "${hbase_enabled}" -eq "1" ]] && export HBASE_HOME="${CLOUD_HOME}/hbase-${pkg_hbase_ver}"
 98 |   
 99 |   export PATH="$GEOMESA_BIN"$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH
100 |   [[ "${acc_enabled}" -eq "1" ]] && export PATH="${ACCUMULO_HOME}/bin:${PATH}"
101 |   [[ "${hbase_enabled}" -eq "1" ]] && export PATH="${HBASE_HOME}/bin:${PATH}"
102 |   [[ "${zeppelin_enabled}" -eq "1" ]] && export PATH="${ZEPPELIN_HOME}/bin:${PATH}"
103 | 
104 |   # This variable requires Hadoop executable, which will fail during certain runs/steps
105 |   [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath 2>/dev/null)
106 | 
107 |   # Export direnv environment file https://direnv.net/
108 |   env | grep -v PATH | sort > $CLOUD_HOME/.envrc
109 |   echo "PATH=${PATH}" >> $CLOUD_HOME/.envrc
110 | }
111 | 
112 | if [[ -z "$JAVA_HOME" ]];then
113 |   echo "ERROR: must set JAVA_HOME..."
114 |   return 1
115 | fi
116 | 
117 | # load configuration scripts
118 | . "${CLOUD_HOME}/conf/cloud-local.conf"
119 | validate_config
120 | set_env_vars
121 | 
122 | 
123 | 


--------------------------------------------------------------------------------
/bin/ports.sh:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | 
  3 | case $(uname) in
  4 |   "Darwin") SED_REGEXP_EXTENDED='-E' ;;
  5 |   *) SED_REGEXP_EXTENDED='-r' ;;
  6 | esac
  7 | 
  8 | # get the port offset from config variable
  9 | function get_port_offset {
 10 |   local offset=0
 11 |  
 12 |   if [ -n "${CL_PORT_OFFSET}" ]; then
 13 |     offset=${CL_PORT_OFFSET}
 14 |   fi
 15 | 
 16 |   echo ${offset}
 17 | }
 18 | 
 19 | function check_port {
 20 |   local port=$1
 21 |   local offset=$(get_port_offset)
 22 |   local tocheck=$((port+offset))
 23 |   if (: < /dev/tcp/127.0.0.1/${tocheck}) 2>/dev/null; then
 24 |     echo "Error: port ${tocheck} is already taken (orig port ${port} with offset ${offset})"
 25 |     exit 1
 26 |   fi
 27 | }
 28 | 
 29 | function check_ports {
 30 |   check_port 2181 # zookeeper
 31 | 
 32 |   check_port 9092 # kafka broker
 33 |   
 34 |   # hadoop
 35 |   check_port 50010 # dfs.datanode.address
 36 |   check_port 50020 # dfs.datanode.ipc.address
 37 |   check_port 50075 # dfs.datanode.http.address
 38 |   check_port 50475 # dfs.datanode.https.address
 39 |   
 40 |   check_port 8020  # namenode data
 41 |   check_port 9000  # namenode data
 42 |   check_port 50070 # namenode http
 43 |   check_port 50470 # namenode https
 44 |   
 45 |   check_port 50090 # secondary name node
 46 |   
 47 |   check_port 8088  # yarn job tracker
 48 |   check_port 8030  # yarn
 49 |   check_port 8031  # yarn
 50 |   check_port 8032  # yarn
 51 |   check_port 8033  # yarn
 52 |   
 53 |   check_port 8090  # yarn
 54 |   check_port 8040  # yarn
 55 |   check_port 8042  # yarn
 56 | 
 57 |   check_port 13562 # mapreduce shuffle 
 58 | 
 59 |   # accumulo
 60 |   check_port 9995  # accumulo monitor
 61 |   check_port 4560  # accumulo monitor log4j
 62 |   check_port 9997  # accumulo tserver
 63 |   check_port 9998  # accumulo gc
 64 |   check_port 9999  # accumulo master
 65 |   check_port 12234 # accumulo tracer
 66 |   check_port 10001 # accumulo master replication coordinator
 67 |   check_port 10002 # accumulo master replication service
 68 | 
 69 | 
 70 |   # hbase
 71 |   check_port 16000 # hbase master
 72 |   check_port 16010 # hbase master info
 73 |   check_port 16020 # hbase regionserver
 74 |   check_port 16030 # hbase regionserver info
 75 |   check_port 16040 # hbase rest
 76 |   check_port 16100 # hbase multicast  
 77 | 
 78 |   # spark
 79 |   check_port 4040 # Spark job monitor
 80 | 
 81 |   # Zeppelin
 82 |   check_port 5771 # Zeppelin embedded web server
 83 | 
 84 |   echo "Known ports are OK (using offset $(get_port_offset))"
 85 | }
 86 | 
 87 | function configure_port_offset {
 88 |   local offset=$(get_port_offset)
 89 | 
 90 |   local KEY="CL_port_default"
 91 |   local KEY_CHANGED="CL_offset_port"
 92 | 
 93 |   # zookeeper (zoo.cfg)
 94 |   # do this one by hand, it's fairly straightforward
 95 |   zkPort=$((2181+offset))
 96 |   sed -i~orig "s/clientPort=.*/clientPort=$zkPort/" $ZOOKEEPER_HOME/conf/zoo.cfg
 97 | 
 98 |   # kafka (server.properties)
 99 |   if [[ "kafka_enabled" -eq 1 ]]; then
100 |     kafkaPort=$((9092+offset))
101 |     sed -i~orig "s/\/\/$CL_HOSTNAME:[0-9].*/\/\/$CL_HOSTNAME:$kafkaPort/" $KAFKA_HOME/config/server.properties
102 |     sed -i~orig "s/zookeeper.connect=$CL_HOSTNAME:[0-9].*/zookeeper.connect=$CL_HOSTNAME:$zkPort/" $KAFKA_HOME/config/server.properties
103 |   fi
104 | 
105 |   # Zeppelin
106 |   if [[ "$zeppelin_enabled" -eq 1 ]]; then
107 |     zeppelinPort=$((5771+offset))
108 |     sed -i~orig "s/ZEPPELIN_PORT=[0-9]\{1,5\}\(.*\)/ZEPPELIN_PORT=$zeppelinPort\1/g" "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
109 |   fi
110 | 
111 |   # hadoop and accumulo xml files
112 |   # The idea with this block is that the xml files have comments which tag lines which need
113 |   # a port replacement, and the comments provide the default values. So to change ports,
114 |   # we replace all the instance of the default value, on the line with the comment, with
115 |   # the desired (offset) port.
116 | 
117 |   xmlFiles=( $HADOOP_CONF_DIR/core-site.xml \
118 |           $HADOOP_CONF_DIR/hdfs-site.xml \
119 |           $HADOOP_CONF_DIR/mapred-site.xml \
120 |           $HADOOP_CONF_DIR/yarn-site.xml )
121 |   if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then
122 |       xmlFiles+=($ACCUMULO_HOME/conf/accumulo-site.xml)
123 |   fi
124 |   if [ -f "$HBASE_HOME/conf/hbase-site.xml" ]; then
125 |       xmlFiles+=($HBASE_HOME/conf/hbase-site.xml)
126 |   fi
127 |   for FILE in "${xmlFiles[@]}"; do
128 |     while [[ -n "$(grep $KEY $FILE)" ]]; do # while lines need to be changed
129 |         # pull the default port out of the comment
130 |         basePort=$(grep -hoE "$KEY [0-9]+" $FILE | head -1 | grep -hoE [0-9]+)
131 |         # calculate new port
132 |         newPort=$(($basePort+$offset))
133 |         # note that any part of the line matching the port line will be replaced...
134 |         # the following sed only makes the replacement on a single line, containing the matched comment
135 |         #sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#[0-9]+</value>#$newPort</value>#" $FILE
136 |         if [[ "${CL_VERBOSE}" == "1" ]]; then echo "Replacing port $basePort with $newPort in file $FILE"; fi
137 |         sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#(<value>.*)(${basePort})(.*</value>)#\\1${newPort}\\3#" $FILE
138 |         # mark this line done
139 |         sed -i~orig ${SED_REGEXP_EXTENDED} "s/$KEY $basePort/$KEY_CHANGED $basePort/" $FILE
140 |     done
141 |     # re-mark all comment lines, so we can change ports again later if we want
142 |     sed -i~orig "s/$KEY_CHANGED/$KEY/g" $FILE
143 |   done
144 | 
145 |   echo "Ports configured to use offset $offset"
146 | }
147 | 


--------------------------------------------------------------------------------
/conf/cloud-local.conf:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env bash
  2 | 
  3 | ################################################################################
  4 | # REPOSITORY MANAGEMENT
  5 | ################################################################################
  6 | 
  7 | #
  8 | # Source for packages (accumulo, hadoop, etc)
  9 | # Available options are (local, wget)
 10 | #
 11 | # Set the variable 'pkg_src_mirror' if you want to specify a mirror
 12 | # else it will use https://www.apache.org/dyn/closer.cgi to determine
 13 | # a mirror. If you have a caching web proxy you may want to set this as well.
 14 | #
 15 | # pkg_src_mirror="http://apache.mirrors.tds.net"
 16 | 
 17 | # Specify a maven repository to use
 18 | pkg_src_maven="https://repo1.maven.org/maven2"
 19 | 
 20 | # Optionally specifcy a local shared folder of package downloads
 21 | #pkg_pre_download=/net/synds1/volume2/projects2/cloud-local/packages
 22 | 
 23 | ################################################################################
 24 | # VERSION MANAGEMENT - Versions of popular software
 25 | ################################################################################
 26 | 
 27 | pkg_accumulo_ver="2.0.0"
 28 | pkg_hbase_ver="1.3.1"
 29 | # Note pkg_spark_hadoop_ver below if modifying
 30 | pkg_hadoop_ver="3.2.1"
 31 | # Note, just the major+minor from Hadoop, not patch level
 32 | hadoop_base_ver=${pkg_hadoop_ver:0:3}
 33 | 
 34 | 
 35 | pkg_zookeeper_ver="3.4.14"
 36 | # Note convention is scala.version_kafka.version
 37 | pkg_kafka_scala_ver="2.11"
 38 | pkg_kafka_ver="1.0.1"
 39 | 
 40 | pkg_spark_ver="2.2.1"
 41 | # Note pkg_hadoop_ver above
 42 | #  - don't auto-derive this as spark & hadoop major releases aren't lock-step
 43 | #  - use "without-hadoop" to use version without hadoop deps
 44 | pkg_spark_hadoop_ver="without-hadoop"
 45 | 
 46 | pkg_geomesa_ver="3.0.0"
 47 | pkg_geomesa_scala_ver="2.11"
 48 | pkg_scala_ver="2.11.7"
 49 | 
 50 | # Apache Zeppelin, yet another analyst notebook that knows about Spark
 51 | # You must change Spark to a compatible version (e.g. Zep 0.7.2 with Spark 2.1 and Zep 0.7.3 with Spark 2.2)
 52 | pkg_zeppelin_ver="0.7.3"
 53 | 
 54 | ################################################################################
 55 | # ACCUMULO CONFIGURATION
 56 | ################################################################################
 57 | 
 58 | cl_acc_inst_name="local"
 59 | cl_acc_inst_pass="secret"
 60 | 
 61 | ################################################################################
 62 | # IP/HOSTNAME/PORT CONFIGURATION - How to bind to things
 63 | ################################################################################
 64 | 
 65 | # The following options can be overriden in the user environment
 66 | # bind address and hostname to use for all service bindings
 67 | if [[ -z "${CL_HOSTNAME}" ]]; then
 68 |   CL_HOSTNAME=$(hostname)
 69 |   #CL_HOSTNAME=localhost
 70 | fi
 71 | 
 72 | if [[ -z "${CL_BIND_ADDRESS}" ]]; then
 73 |   CL_BIND_ADDRESS="0.0.0.0"
 74 |   #CL_BIND_ADDRESS="127.0.0.1"
 75 | fi
 76 | 
 77 | if [[ -z "${CL_PORT_OFFSET}" ]]; then
 78 |   CL_PORT_OFFSET=0
 79 | fi
 80 | 
 81 | if [[ -z "${CL_VERBOSE}" ]]; then
 82 |   CL_VERBOSE=0
 83 | fi
 84 | 
 85 | ################################################################################
 86 | # PACKAGE MANAGEMENT - Enable/Disable software here
 87 | ################################################################################
 88 | 
 89 | # 1 = enabled
 90 | # 0 = disabled
 91 | 
 92 | master_enabled=1
 93 | worker_enabled=1
 94 | 
 95 | # Hadoop HDFS and YARN
 96 | hadoop_enabled=1
 97 | 
 98 | # Enable accumulo or hbase - probably best not to run both but it might work
 99 | # Requires hadoop_enabled=1
100 | acc_enabled=1
101 | hbase_enabled=0
102 | 
103 | # Enable/disable Kafka
104 | kafka_enabled=0
105 | 
106 | # Download spark distribution
107 | spark_enabled=0
108 | 
109 | # Enable/Disable installation of GeoMesa
110 | geomesa_enabled=0
111 | if [[ -z "${pkg_geomesa_ver}" && -z "${pkg_geomesa_scala_ver}" && "${geomesa_enabled}" -eq "1" ]]; then
112 |   echo "Error: GeoMesa is enabled but the version number is missing."
113 | 	exit 1
114 | fi
115 | 
116 | # Enable/Disable scala download
117 | scala_enabled=1
118 | if [[ -z "${pkg_scala_ver}" && "${scala_enabled}" -eq "1" ]]; then
119 |   echo "Error: Scala is enabled but the version number is missing."
120 |   exit 1
121 | fi
122 | 
123 | # Enable/Disable Zepplin
124 | # Ensure that your Spark+Zeppelin versions are compatible
125 | zeppelin_enabled=0
126 | 


--------------------------------------------------------------------------------
/templates/accumulo/accumulo-env.sh:
--------------------------------------------------------------------------------
  1 | #! /usr/bin/env bash
  2 | 
  3 | # Licensed to the Apache Software Foundation (ASF) under one or more
  4 | # contributor license agreements.  See the NOTICE file distributed with
  5 | # this work for additional information regarding copyright ownership.
  6 | # The ASF licenses this file to You under the Apache License, Version 2.0
  7 | # (the "License"); you may not use this file except in compliance with
  8 | # the License.  You may obtain a copy of the License at
  9 | #
 10 | #     http://www.apache.org/licenses/LICENSE-2.0
 11 | #
 12 | # Unless required by applicable law or agreed to in writing, software
 13 | # distributed under the License is distributed on an "AS IS" BASIS,
 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 | # See the License for the specific language governing permissions and
 16 | # limitations under the License.
 17 | 
 18 | ## Before accumulo-env.sh is loaded, these environment variables are set and can be used in this file:
 19 | 
 20 | # cmd - Command that is being called such as tserver, master, etc.
 21 | # basedir - Root of Accumulo installation
 22 | # bin - Directory containing Accumulo scripts
 23 | # conf - Directory containing Accumulo configuration
 24 | # lib - Directory containing Accumulo libraries
 25 | 
 26 | ############################
 27 | # Variables that must be set
 28 | ############################
 29 | 
 30 | ## Accumulo logs directory. Referenced by logger config.
 31 | export ACCUMULO_LOG_DIR="${ACCUMULO_LOG_DIR:-${basedir}/logs}"
 32 | ## Hadoop installation
 33 | export HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}"
 34 | ## Hadoop configuration
 35 | export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
 36 | ## Zookeeper installation
 37 | export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/path/to/zookeeper}"
 38 | 
 39 | ##########################
 40 | # Build CLASSPATH variable
 41 | ##########################
 42 | 
 43 | ## Verify that Hadoop & Zookeeper installation directories exist
 44 | if [[ ! -d "$ZOOKEEPER_HOME" ]]; then
 45 |   echo "ZOOKEEPER_HOME=$ZOOKEEPER_HOME is not set to a valid directory in accumulo-env.sh"
 46 |   exit 1
 47 | fi
 48 | if [[ ! -d "$HADOOP_HOME" ]]; then
 49 |   echo "HADOOP_HOME=$HADOOP_HOME is not set to a valid directory in accumulo-env.sh"
 50 |   exit 1
 51 | fi
 52 | 
 53 | ## Build using existing CLASSPATH, conf/ directory, dependencies in lib/, and external Hadoop & Zookeeper dependencies
 54 | if [[ -n "$CLASSPATH" ]]; then
 55 |   CLASSPATH="${CLASSPATH}:${conf}"
 56 | else
 57 |   CLASSPATH="${conf}"
 58 | fi
 59 | CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*"
 60 | export CLASSPATH
 61 | 
 62 | ##################################################################
 63 | # Build JAVA_OPTS variable. Defaults below work but can be edited.
 64 | ##################################################################
 65 | 
 66 | ## JVM options set for all processes. Extra options can be passed in by setting ACCUMULO_JAVA_OPTS to an array of options.
 67 | JAVA_OPTS=("${ACCUMULO_JAVA_OPTS[@]}"
 68 |   '-XX:+UseConcMarkSweepGC'
 69 |   '-XX:CMSInitiatingOccupancyFraction=75'
 70 |   '-XX:+CMSClassUnloadingEnabled'
 71 |   '-XX:OnOutOfMemoryError=kill -9 %p'
 72 |   '-XX:-OmitStackTraceInFastThrow'
 73 |   '-Djava.net.preferIPv4Stack=true'
 74 |   "-Daccumulo.native.lib.path=${lib}/native")
 75 | 
 76 | ## Make sure Accumulo native libraries are built since they are enabled by default
 77 | "${bin}"/accumulo-util build-native &> /dev/null
 78 | 
 79 | ## JVM options set for individual applications
 80 | case "$cmd" in
 81 |   master)  JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx512m' '-Xms512m') ;;
 82 |   monitor) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;;
 83 |   gc)      JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;;
 84 |   tserver) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx3G'   '-Xms3G') ;;
 85 |   *)       JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms64m') ;;
 86 | esac
 87 | 
 88 | ## JVM options set for logging. Review logj4 properties files to see how they are used.
 89 | JAVA_OPTS=("${JAVA_OPTS[@]}"
 90 |   "-Daccumulo.log.dir=${ACCUMULO_LOG_DIR}"
 91 |   "-Daccumulo.application=${cmd}${ACCUMULO_SERVICE_INSTANCE}_$(hostname)")
 92 | 
 93 | case "$cmd" in
 94 |   monitor)
 95 |     JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-monitor.properties")
 96 |     ;;
 97 |   gc|master|tserver|tracer)
 98 |     JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-service.properties")
 99 |     ;;
100 |   *)
101 |     # let log4j use its default behavior (log4j.xml, log4j.properties)
102 |     true
103 |     ;;
104 | esac
105 | 
106 | export JAVA_OPTS
107 | 
108 | ############################
109 | # Variables set to a default
110 | ############################
111 | 
112 | export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-1}
113 | ## Add Hadoop native libraries to shared library paths given operating system
114 | case "$(uname)" in
115 |   Darwin) export DYLD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${DYLD_LIBRARY_PATH}" ;;
116 |   *)      export LD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}" ;;
117 | esac
118 | 
119 | ###############################################
120 | # Variables that are optional. Uncomment to set
121 | ###############################################
122 | 
123 | ## Specifies command that will be placed before calls to Java in accumulo script
124 | # export ACCUMULO_JAVA_PREFIX=""
125 | 


--------------------------------------------------------------------------------
/templates/accumulo/accumulo-site.xml:
--------------------------------------------------------------------------------
  1 | <?xml version="1.0" encoding="UTF-8"?>
  2 | <!--
  3 |   Licensed to the Apache Software Foundation (ASF) under one or more
  4 |   contributor license agreements.  See the NOTICE file distributed with
  5 |   this work for additional information regarding copyright ownership.
  6 |   The ASF licenses this file to You under the Apache License, Version 2.0
  7 |   (the "License"); you may not use this file except in compliance with
  8 |   the License.  You may obtain a copy of the License at
  9 | 
 10 |       http://www.apache.org/licenses/LICENSE-2.0
 11 | 
 12 |   Unless required by applicable law or agreed to in writing, software
 13 |   distributed under the License is distributed on an "AS IS" BASIS,
 14 |   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 15 |   See the License for the specific language governing permissions and
 16 |   limitations under the License.
 17 | -->
 18 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 19 | 
 20 | <configuration>
 21 |   <!-- Put your site-specific accumulo configurations here. The available configuration values along with their defaults are documented in docs/config.html Unless
 22 |     you are simply testing at your workstation, you will most definitely need to change the three entries below. -->
 23 | 
 24 |   <property>
 25 |     <name>instance.zookeeper.host</name>
 26 |     <value>CLOUD_LOCAL_HOSTNAME:2181</value> <!-- CL_port_default 2181 -->
 27 |     <description>comma separated list of zookeeper servers</description>
 28 |   </property>
 29 | 
 30 |   <property>
 31 |     <name>monitor.port.client</name>
 32 |     <value>9995</value> <!-- CL_port_default 9995 -->
 33 |   </property>
 34 | 
 35 |   <property>
 36 |     <name>monitor.port.log4j</name>
 37 |     <value>4560</value> <!-- CL_port_default 4560 -->
 38 |   </property>
 39 | 
 40 |   <property>
 41 |     <name>tserver.port.client</name>
 42 |     <value>9997</value> <!-- CL_port_default 9997 -->
 43 |   </property>
 44 | 
 45 |   <property>
 46 |     <name>gc.port.client</name>
 47 |     <value>9998</value> <!-- CL_port_default 9998 -->
 48 |   </property>
 49 | 
 50 |   <property>
 51 |     <name>master.port.client</name>
 52 |     <value>9999</value> <!-- CL_port_default 9999 -->
 53 |   </property>
 54 | 
 55 |   <property>
 56 |     <name>master.replication.coordinator.port</name>
 57 |     <value>10001</value> <!-- CL_port_default 10001 -->
 58 |   </property>
 59 | 
 60 |   <property>
 61 |     <name>replication.receipt.service.port</name>
 62 |     <value>10002</value> <!-- CL_port_default 10002 -->
 63 |   </property>
 64 | 
 65 |   <property>
 66 |     <name>trace.port.client</name>
 67 |     <value>12234</value> <!-- CL_port_default 12234 -->
 68 |   </property>
 69 | 
 70 |   <property>
 71 |     <name>instance.volumes</name>
 72 |     <value>hdfs://CLOUD_LOCAL_HOSTNAME:9000/accumulo</value> <!-- CL_port_default 9000 -->
 73 |   </property>
 74 | 
 75 |   <property>
 76 |     <name>instance.secret</name>
 77 |     <value>DEFAULT</value>
 78 |     <description>A secret unique to a given instance that all servers must know in order to communicate with one another.
 79 |       Change it before initialization. To
 80 |       change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd],
 81 |       and then update this file.
 82 |     </description>
 83 |   </property>
 84 | 
 85 |   <property>
 86 |     <name>tserver.memory.maps.max</name>
 87 |     <value>1G</value>
 88 |   </property>
 89 | 
 90 |   <property>
 91 |     <name>tserver.memory.maps.native.enabled</name>
 92 |     <value>false</value>
 93 |   </property>
 94 | 
 95 |   <property>
 96 |     <name>tserver.cache.data.size</name>
 97 |     <value>128M</value>
 98 |   </property>
 99 | 
100 |   <property>
101 |     <name>tserver.cache.index.size</name>
102 |     <value>128M</value>
103 |   </property>
104 | 
105 |   <property>
106 |     <name>trace.token.property.password</name>
107 |     <!-- change this to the root user's password, and/or change the user below -->
108 |     <value>secret</value>
109 |   </property>
110 | 
111 |   <property>
112 |     <name>trace.user</name>
113 |     <value>root</value>
114 |   </property>
115 | 
116 |   <property>
117 |     <name>tserver.sort.buffer.size</name>
118 |     <value>200M</value>
119 |   </property>
120 | 
121 |   <property>
122 |     <name>tserver.walog.max.size</name>
123 |     <value>1G</value>
124 |   </property>
125 | 
126 | </configuration>
127 | 


--------------------------------------------------------------------------------
/templates/hadoop/core-site.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="UTF-8"?>
 2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 3 | <!--
 4 |   Licensed under the Apache License, Version 2.0 (the "License");
 5 |   you may not use this file except in compliance with the License.
 6 |   You may obtain a copy of the License at
 7 | 
 8 |     http://www.apache.org/licenses/LICENSE-2.0
 9 | 
10 |   Unless required by applicable law or agreed to in writing, software
11 |   distributed under the License is distributed on an "AS IS" BASIS,
12 |   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 |   See the License for the specific language governing permissions and
14 |   limitations under the License. See accompanying LICENSE file.
15 | -->
16 | 
17 | <!-- Put site-specific property overrides in this file. -->
18 | <configuration>
19 | 
20 |     <property>
21 |         <name>fs.defaultFS</name>
22 |         <value>hdfs://CLOUD_LOCAL_HOSTNAME:9000</value> <!-- CL_port_default 9000 -->
23 |     </property>
24 | 
25 |     <property>
26 |         <!-- this is technically deprecated and is the same as fs.defaultFS -->
27 |         <name>fs.default.name</name>
28 |         <value>hdfs://CLOUD_LOCAL_HOSTNAME:9000</value> <!-- CL_port_default 9000 -->
29 |     </property>
30 | 
31 |     <property>
32 |         <name>hadoop.tmp.dir</name>
33 |         <value>LOCAL_CLOUD_PREFIX/data/hadoop/tmp</value>
34 |     </property>
35 | 
36 | </configuration>
37 | 
38 | 


--------------------------------------------------------------------------------
/templates/hadoop/hdfs-site.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0" encoding="UTF-8"?>
 2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 3 | <!--
 4 |   Licensed under the Apache License, Version 2.0 (the "License");
 5 |   you may not use this file except in compliance with the License.
 6 |   You may obtain a copy of the License at
 7 | 
 8 |     http://www.apache.org/licenses/LICENSE-2.0
 9 | 
10 |   Unless required by applicable law or agreed to in writing, software
11 |   distributed under the License is distributed on an "AS IS" BASIS,
12 |   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 |   See the License for the specific language governing permissions and
14 |   limitations under the License. See accompanying LICENSE file.
15 | -->
16 | 
17 | <!-- Put site-specific property overrides in this file. -->
18 | <configuration>
19 | 
20 |     <property>
21 |         <name>dfs.replication</name>
22 |         <value>1</value>
23 |     </property>
24 | 
25 |     <property>
26 |         <name>dfs.datanode.synconclose</name>
27 |         <value>true</value>
28 |     </property>
29 | 
30 |     <property>
31 |         <name>dfs.name.dir</name>
32 |         <value>LOCAL_CLOUD_PREFIX/data/dfs/name</value>
33 |     </property>
34 | 
35 |     <property>
36 |         <name>dfs.data.dir</name>
37 |         <value>LOCAL_CLOUD_PREFIX/data/dfs/data</value>
38 |     </property>
39 | 
40 |     <property>
41 |         <name>dfs.datanode.address</name>
42 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50010</value> <!-- CL_port_default 50010 -->
43 |     </property>
44 | 
45 |     <property>
46 |         <name>dfs.datanode.ipc.address</name>
47 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50020</value> <!-- CL_port_default 50020 -->
48 |     </property>
49 | 
50 |     <property>
51 |         <name>dfs.datanode.http.address</name>
52 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50075</value> <!-- CL_port_default 50075 -->
53 |     </property>
54 | 
55 |     <property>
56 |         <name>dfs.datanode.https.address</name>
57 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50475</value> <!-- CL_port_default 50475 -->
58 |     </property>
59 | 
60 |     <property>
61 |         <name>dfs.namenode.http-address</name>
62 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50070</value> <!-- CL_port_default 50070 -->
63 |     </property>
64 | 
65 |     <property>
66 |         <name>dfs.namenode.https-address</name>
67 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50470</value> <!-- CL_port_default 50470 -->
68 |     </property>
69 | 
70 |     <property>
71 |         <name>dfs.namenode.secondary.http-address</name>
72 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50090</value> <!-- CL_port_default 50090 -->
73 |     </property>
74 | 
75 | </configuration>
76 | 


--------------------------------------------------------------------------------
/templates/hadoop/mapred-site.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <configuration>
 3 | 
 4 |     <property>
 5 |         <name>mapreduce.framework.name</name>
 6 |         <value>yarn</value>
 7 |     </property>
 8 | 
 9 |     <property>
10 |         <name>mapreduce.shuffle.port</name>
11 |         <value>13562</value> <!-- CL_port_default 13562 -->
12 |     </property>
13 | 
14 |     <property>
15 |         <name>mapreduce.jobhistory.address</name>
16 |         <value>CLOUD_LOCAL_BIND_ADDRESS:10020</value> <!-- CL_port_default 10020 -->
17 |     </property>
18 | 
19 |     <property>
20 |         <name>mapreduce.jobhistory.webapp.address</name>
21 |         <value>CLOUD_LOCAL_BIND_ADDRESS:19888</value> <!-- CL_port_default 19888 -->
22 |     </property>
23 | 
24 |     <property>
25 |         <name>mapreduce.jobtracker.http.address</name>
26 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50030</value> <!-- CL_port_default 50030 -->
27 |     </property>
28 | 
29 |     <property>
30 |         <name>mapreduce.tasktracker.http.address</name>
31 |         <value>CLOUD_LOCAL_BIND_ADDRESS:50050</value> <!-- CL_port_default 50050 -->
32 |     </property>
33 | 
34 | </configuration>
35 | 


--------------------------------------------------------------------------------
/templates/hadoop/yarn-site.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <!--
 3 |   Licensed under the Apache License, Version 2.0 (the "License");
 4 |   you may not use this file except in compliance with the License.
 5 |   You may obtain a copy of the License at
 6 | 
 7 |     http://www.apache.org/licenses/LICENSE-2.0
 8 | 
 9 |   Unless required by applicable law or agreed to in writing, software
10 |   distributed under the License is distributed on an "AS IS" BASIS,
11 |   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 |   See the License for the specific language governing permissions and
13 |   limitations under the License. See accompanying LICENSE file.
14 | -->
15 | <configuration>
16 | 
17 |     <property>
18 |         <name>yarn.nodemanager.aux-services</name>
19 |         <value>mapreduce_shuffle</value>
20 |     </property>
21 | 
22 |     <property>
23 |         <name>yarn.nodemanager.local-dirs</name>
24 |         <value>LOCAL_CLOUD_PREFIX/data/yarn</value>
25 |     </property>
26 | 
27 |     <property>
28 |         <name>yarn.nodemanager.vmem-check-enabled</name>
29 |         <value>false</value>
30 |     </property>
31 | 
32 |     <property>
33 |         <name>yarn.resourcemanager.webapp.address</name>
34 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8088</value> <!-- CL_port_default 8088 -->
35 |     </property>
36 | 
37 |     <property>
38 |         <name>yarn.resourcemanager.scheduler.address</name>
39 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8030</value> <!-- CL_port_default 8030 -->
40 |     </property>
41 | 
42 |     <property>
43 |         <name>yarn.resourcemanager.resource-tracker.address</name>
44 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8031</value> <!-- CL_port_default 8031 -->
45 |     </property>
46 | 
47 |     <property>
48 |         <name>yarn.resourcemanager.address</name>
49 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8032</value> <!-- CL_port_default 8032 -->
50 |     </property>
51 | 
52 |     <property>
53 |         <name>yarn.resourcemanager.admin.address</name>
54 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8033</value> <!-- CL_port_default 8033 -->
55 |     </property>
56 | 
57 |     <property>
58 |         <name>yarn.resourcemanager.webapp.https.address</name>
59 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8090</value> <!-- CL_port_default 8090 -->
60 |     </property>
61 | 
62 |     <property>
63 |         <name>yarn.nodemanager.localizer.address</name>
64 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8040</value> <!-- CL_port_default 8040 -->
65 |     </property>
66 | 
67 |     <property>
68 |         <name>yarn.nodemanager.webapp.address</name>
69 |         <value>CLOUD_LOCAL_BIND_ADDRESS:8042</value> <!-- CL_port_default 8042 -->
70 |     </property>
71 | 
72 | </configuration>
73 | 


--------------------------------------------------------------------------------
/templates/hbase/hbase-site.xml:
--------------------------------------------------------------------------------
 1 | <?xml version="1.0"?>
 2 | <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 3 | <!--
 4 | /**
 5 |  *
 6 |  * Licensed to the Apache Software Foundation (ASF) under one
 7 |  * or more contributor license agreements.  See the NOTICE file
 8 |  * distributed with this work for additional information
 9 |  * regarding copyright ownership.  The ASF licenses this file
10 |  * to you under the Apache License, Version 2.0 (the
11 |  * "License"); you may not use this file except in compliance
12 |  * with the License.  You may obtain a copy of the License at
13 |  *
14 |  *     http://www.apache.org/licenses/LICENSE-2.0
15 |  *
16 |  * Unless required by applicable law or agreed to in writing, software
17 |  * distributed under the License is distributed on an "AS IS" BASIS,
18 |  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19 |  * See the License for the specific language governing permissions and
20 |  * limitations under the License.
21 |  */
22 | -->
23 | <configuration>
24 |   <property>
25 |     <name>hbase.cluster.distributed</name>
26 |     <value>true</value>
27 |   </property>
28 |   <property>
29 |     <name>hbase.rootdir</name>
30 |     <value>hdfs://CLOUD_LOCAL_HOSTNAME:9000/hbase</value>  <!-- CL_port_default 9000 -->
31 |   </property>
32 | 
33 |   <!-- HBase Master config --> 
34 |   <property>
35 |     <name>hbase.master.port</name>
36 |     <value>16000</value>  <!-- CL_port_default 16000 -->
37 |   </property>
38 |   <property>
39 |     <name>hbase.master.info.port</name>
40 |     <value>16010</value>  <!-- CL_port_default 16010 -->
41 |   </property>
42 |   <property>
43 |     <name>hbase.master.info.bindAddress</name>
44 |     <value>0.0.0.0</value>
45 |   </property>
46 | 
47 |   <!-- HBase RegionServer Config -->
48 |   <property>
49 |     <name>hbase.regionserver.port</name>
50 |     <value>16020</value>  <!-- CL_port_default 16020 -->
51 |   </property>
52 |   <property>
53 |     <name>hbase.regionserver.info.port</name>
54 |     <value>16030</value>  <!-- CL_port_default 16030 -->
55 |   </property>
56 |   <property>
57 |     <name>hbase.regionserver.info.bindAddress</name>
58 |     <value>0.0.0.0</value>
59 |   </property>
60 |   
61 |   <!-- HBase Rest -->
62 |   <property>
63 |     <name>hbase.rest.port</name>
64 |     <value>16040</value>   <!-- CL_port_default 16040 -->
65 |   </property>
66 |   
67 |   <!-- multicast -->
68 |   <property>
69 |     <name>hbase.status.multicast.address.port</name>
70 |     <value>16100</value>   <!-- CL_port_default 16100 -->
71 |   </property>
72 |   
73 |   <property>
74 |     <name>hbase.zookeeper.quorum</name>
75 |     <value>CLOUD_LOCAL_HOSTNAME</value> 
76 |   </property>
77 |   <property>
78 |     <name>hbase.zookeeper.property.clientPort</name>
79 |     <value>2181</value>  <!-- CL_port_default 2181 -->
80 |   </property>
81 | </configuration>
82 | 


--------------------------------------------------------------------------------
/templates/kafka/server.properties:
--------------------------------------------------------------------------------
  1 | # Licensed to the Apache Software Foundation (ASF) under one or more
  2 | # contributor license agreements.  See the NOTICE file distributed with
  3 | # this work for additional information regarding copyright ownership.
  4 | # The ASF licenses this file to You under the Apache License, Version 2.0
  5 | # (the "License"); you may not use this file except in compliance with
  6 | # the License.  You may obtain a copy of the License at
  7 | # 
  8 | #    http://www.apache.org/licenses/LICENSE-2.0
  9 | # 
 10 | # Unless required by applicable law or agreed to in writing, software
 11 | # distributed under the License is distributed on an "AS IS" BASIS,
 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 13 | # See the License for the specific language governing permissions and
 14 | # limitations under the License.
 15 | # see kafka.server.KafkaConfig for additional details and defaults
 16 | 
 17 | ############################# Server Basics #############################
 18 | 
 19 | # The id of the broker. This must be set to a unique integer for each broker.
 20 | broker.id=0
 21 | 
 22 | ############################# Socket Server Settings #############################
 23 | 
 24 | # CL_port_default 9092
 25 | listeners=PLAINTEXT://CLOUD_LOCAL_HOSTNAME:9092
 26 | 
 27 | # The port the socket server listens on
 28 | #port=9092
 29 | 
 30 | # Hostname the broker will bind to. If not set, the server will bind to all interfaces
 31 | #host.name=localhost
 32 | 
 33 | # Hostname the broker will advertise to producers and consumers. If not set, it uses the
 34 | # value for "host.name" if configured.  Otherwise, it will use the value returned from
 35 | # java.net.InetAddress.getCanonicalHostName().
 36 | #advertised.host.name=<hostname routable by clients>
 37 | 
 38 | # The port to publish to ZooKeeper for clients to use. If this is not set,
 39 | # it will publish the same port that the broker binds to.
 40 | #advertised.port=<port accessible by clients>
 41 | 
 42 | # The number of threads handling network requests
 43 | num.network.threads=3
 44 |  
 45 | # The number of threads doing disk I/O
 46 | num.io.threads=8
 47 | 
 48 | # The send buffer (SO_SNDBUF) used by the socket server
 49 | socket.send.buffer.bytes=102400
 50 | 
 51 | # The receive buffer (SO_RCVBUF) used by the socket server
 52 | socket.receive.buffer.bytes=102400
 53 | 
 54 | # The maximum size of a request that the socket server will accept (protection against OOM)
 55 | socket.request.max.bytes=104857600
 56 | 
 57 | 
 58 | ############################# Log Basics #############################
 59 | 
 60 | # A comma seperated list of directories under which to store log files
 61 | log.dirs=LOCAL_CLOUD_PREFIX/data/kafka-logs
 62 | 
 63 | # The default number of log partitions per topic. More partitions allow greater
 64 | # parallelism for consumption, but this will also result in more files across
 65 | # the brokers.
 66 | num.partitions=1
 67 | 
 68 | # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
 69 | # This value is recommended to be increased for installations with data dirs located in RAID array.
 70 | num.recovery.threads.per.data.dir=1
 71 | 
 72 | ############################# Log Flush Policy #############################
 73 | 
 74 | # Messages are immediately written to the filesystem but by default we only fsync() to sync
 75 | # the OS cache lazily. The following configurations control the flush of data to disk. 
 76 | # There are a few important trade-offs here:
 77 | #    1. Durability: Unflushed data may be lost if you are not using replication.
 78 | #    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
 79 | #    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks. 
 80 | # The settings below allow one to configure the flush policy to flush data after a period of time or
 81 | # every N messages (or both). This can be done globally and overridden on a per-topic basis.
 82 | 
 83 | # The number of messages to accept before forcing a flush of data to disk
 84 | #log.flush.interval.messages=10000
 85 | 
 86 | # The maximum amount of time a message can sit in a log before we force a flush
 87 | #log.flush.interval.ms=1000
 88 | 
 89 | ############################# Log Retention Policy #############################
 90 | 
 91 | # The following configurations control the disposal of log segments. The policy can
 92 | # be set to delete segments after a period of time, or after a given size has accumulated.
 93 | # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
 94 | # from the end of the log.
 95 | 
 96 | # The minimum age of a log file to be eligible for deletion
 97 | log.retention.hours=168
 98 | 
 99 | # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
100 | # segments don't drop below log.retention.bytes.
101 | #log.retention.bytes=1073741824
102 | 
103 | # The maximum size of a log segment file. When this size is reached a new log segment will be created.
104 | log.segment.bytes=1073741824
105 | 
106 | # The interval at which log segments are checked to see if they can be deleted according 
107 | # to the retention policies
108 | log.retention.check.interval.ms=300000
109 | 
110 | # By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
111 | # If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
112 | log.cleaner.enable=false
113 | 
114 | ############################# Zookeeper #############################
115 | 
116 | # Zookeeper connection string (see zookeeper docs for details).
117 | # This is a comma separated host:port pairs, each corresponding to a zk
118 | # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
119 | # You can also append an optional chroot string to the urls to specify the
120 | # root directory for all kafka znodes.
121 | # CL_port_default 2181
122 | zookeeper.connect=CLOUD_LOCAL_HOSTNAME:2181
123 | 
124 | # Timeout in ms for connecting to zookeeper
125 | zookeeper.connection.timeout.ms=6000
126 | 


--------------------------------------------------------------------------------
/templates/zeppelin/zeppelin-env.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | #
 3 | # Licensed to the Apache Software Foundation (ASF) under one or more
 4 | # contributor license agreements.  See the NOTICE file distributed with
 5 | # this work for additional information regarding copyright ownership.
 6 | # The ASF licenses this file to You under the Apache License, Version 2.0
 7 | # (the "License"); you may not use this file except in compliance with
 8 | # the License.  You may obtain a copy of the License at
 9 | #
10 | #    http://www.apache.org/licenses/LICENSE-2.0
11 | #
12 | # Unless required by applicable law or agreed to in writing, software
13 | # distributed under the License is distributed on an "AS IS" BASIS,
14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | # See the License for the specific language governing permissions and
16 | # limitations under the License.
17 | #
18 | 
19 | # export JAVA_HOME=
20 | # export MASTER=                 		# Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
21 | # export ZEPPELIN_JAVA_OPTS      		# Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
22 | # export ZEPPELIN_MEM            		# Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
23 | # export ZEPPELIN_INTP_MEM       		# zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m
24 | # export ZEPPELIN_INTP_JAVA_OPTS 		# zeppelin interpreter process jvm options.
25 | export ZEPPELIN_PORT=5771       		# Zeppelin server port.  Defaults to 8080.
26 | # export ZEPPELIN_SSL_PORT       		# ssl port (used when ssl environment variable is set to true)
27 | 
28 | # export ZEPPELIN_LOG_DIR        		# Where log files are stored.  PWD by default.
29 | # export ZEPPELIN_PID_DIR        		# The pid files are stored. ${ZEPPELIN_HOME}/run by default.
30 | # export ZEPPELIN_WAR_TEMPDIR    		# The location of jetty temporary directory.
31 | # export ZEPPELIN_NOTEBOOK_DIR   		# Where notebook saved
32 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN		# Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
33 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE	# hide homescreen notebook from list when this value set to "true". default "false"
34 | # export ZEPPELIN_NOTEBOOK_S3_BUCKET        # Bucket where notebook saved
35 | # export ZEPPELIN_NOTEBOOK_S3_ENDPOINT      # Endpoint of the bucket
36 | # export ZEPPELIN_NOTEBOOK_S3_USER          # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
37 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID    # AWS KMS key ID
38 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION      # AWS KMS key region
39 | # export ZEPPELIN_IDENT_STRING   		# A string representing this instance of zeppelin. $USER by default.
40 | # export ZEPPELIN_NICENESS       		# The scheduling priority for daemons. Defaults to 0.
41 | # export ZEPPELIN_INTERPRETER_LOCALREPO         # Local repository for interpreter's additional dependency loading
42 | # export ZEPPELIN_NOTEBOOK_STORAGE 		# Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
43 | # export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC	# If there are multiple notebook storages, should we treat the first one as the only source of truth?
44 | # export ZEPPELIN_NOTEBOOK_PUBLIC   # Make notebook public by default when created, private otherwise
45 | 
46 | #### Spark interpreter configuration ####
47 | 
48 | ## Use provided spark installation ##
49 | ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
50 | ##
51 | # export SPARK_HOME                             # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
52 | # export SPARK_SUBMIT_OPTIONS                   # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
53 | # export SPARK_APP_NAME                         # (optional) The name of spark application.
54 | 
55 | ## Use embedded spark binaries ##
56 | ## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
57 | ## however, it is not encouraged when you can define SPARK_HOME
58 | ##
59 | # Options read in YARN client mode
60 | # export HADOOP_CONF_DIR         		# yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
61 | # Pyspark (supported with Spark 1.2.1 and above)
62 | # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
63 | # export PYSPARK_PYTHON          		# path to the python command. must be the same path on the driver(Zeppelin) and all workers.
64 | # export PYTHONPATH
65 | 
66 | ## Spark interpreter options ##
67 | ##
68 | # export ZEPPELIN_SPARK_USEHIVECONTEXT  # Use HiveContext instead of SQLContext if set true. true by default.
69 | # export ZEPPELIN_SPARK_CONCURRENTSQL   # Execute multiple SQL concurrently if set true. false by default.
70 | # export ZEPPELIN_SPARK_IMPORTIMPLICIT  # Import implicits, UDF collection, and sql if set true. true by default.
71 | # export ZEPPELIN_SPARK_MAXRESULT       # Max number of Spark SQL result to display. 1000 by default.
72 | # export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE       # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000
73 | 
74 | 
75 | #### HBase interpreter configuration ####
76 | 
77 | ## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set
78 | 
79 | # export HBASE_HOME=                    # (require) Under which HBase scripts and configuration should be
80 | # export HBASE_CONF_DIR=                # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml
81 | 
82 | #### ZeppelinHub connection configuration ####
83 | # export ZEPPELINHUB_API_ADDRESS		# Refers to the address of the ZeppelinHub service in use
84 | # export ZEPPELINHUB_API_TOKEN			# Refers to the Zeppelin instance token of the user
85 | # export ZEPPELINHUB_USER_KEY			# Optional, when using Zeppelin with authentication.
86 | 
87 | #### Zeppelin impersonation configuration
88 | # export ZEPPELIN_IMPERSONATE_CMD       # Optional, when user want to run interpreter as end web user. eg) 'sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
89 | # export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER  #Optional, by default is true; can be set to false if you don't want to use --proxy-user option with Spark interpreter when impersonation enabled
90 | 


--------------------------------------------------------------------------------
/templates/zookeeper/zoo.cfg:
--------------------------------------------------------------------------------
 1 | # The number of milliseconds of each tick
 2 | tickTime=2000
 3 | # The number of ticks that the initial 
 4 | # synchronization phase can take
 5 | initLimit=10
 6 | # The number of ticks that can pass between 
 7 | # sending a request and getting an acknowledgement
 8 | syncLimit=5
 9 | # the directory where the snapshot is stored.
10 | # do not use /tmp for storage, /tmp here is just 
11 | # example sakes.
12 | dataDir=LOCAL_CLOUD_PREFIX/data/zookeeper
13 | # the port at which the clients will connect
14 | clientPort=2181
15 | # the maximum number of client connections.
16 | # increase this if you need to handle more clients
17 | #maxClientCnxns=60
18 | #
19 | # Be sure to read the maintenance section of the 
20 | # administrator guide before turning on autopurge.
21 | #
22 | # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
23 | #
24 | # The number of snapshots to retain in dataDir
25 | #autopurge.snapRetainCount=3
26 | # Purge task interval in hours
27 | # Set to "0" to disable auto purge feature
28 | #autopurge.purgeInterval=1
29 | 


--------------------------------------------------------------------------------