├── .gitignore ├── README.md ├── bin ├── cloud-local.sh ├── config.sh └── ports.sh ├── conf └── cloud-local.conf └── templates ├── accumulo ├── accumulo-env.sh └── accumulo-site.xml ├── hadoop ├── core-site.xml ├── hdfs-site.xml ├── mapred-site.xml └── yarn-site.xml ├── hbase └── hbase-site.xml ├── kafka └── server.properties ├── zeppelin └── zeppelin-env.sh └── zookeeper └── zoo.cfg /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | /pkg 3 | data/* 4 | *.tar.gz 5 | /accumulo* 6 | /hbase* 7 | /hadoop* 8 | /zookeeper* 9 | /zeppelin* 10 | derby* 11 | /metastore_db* 12 | /kafka* 13 | /spark* 14 | /scala* 15 | /geomesa* 16 | /scala* 17 | /.idea 18 | /cloud-local.iml 19 | *.iml 20 | .envrc 21 | 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # cloud-local 2 | 3 | Cloud-local is a collection of bash scripts that sets up a single-node cloud on your desktop, laptop, or NUC. Performance is sufficient for testing things like map-reduce ingest, running converters against real files, and exercising your own geoserver/iterator stack. This setup is preconfigured to run YARN, so you can submit map-reduce jobs from the command-line tools. Currently ssh to localhost is NOT required, so it will work on a NUC. 4 | 5 | Cloud-local can be used to run single-node versions of the following software: 6 | * Hadoop HDFS 7 | * YARN 8 | * Accumulo 9 | * HBase 10 | * Spark (on yarn) 11 | * Kafka 12 | * GeoMesa 13 | 14 | ## Versions and Branches 15 | 16 | The main branch requires Hadoop 3.x and Accumulo 2.0, while the hadoop2 branch is based on Hadoop 2.x and Accumulo 1.x. 17 | 18 | ## Initial Configuration 19 | 20 | A proxy server can be configured by using the standard `http_proxy` env var or the cloud-local specific `cl_http_proxy` env var. The cloud-local specific variable takes precedence. 21 | 22 | ## Getting Started 23 | 24 | To prepare for the first run of cloud-local, you may need to `unset` environment variables `HADOOP_HOME`, `ACCUMULO_HOME`, `ZOOKEEPER_HOME`, `HBASE_HOME`, and others. If `env |grep -i hadoop` comes back empty, you should be good to go. You should also `kill` any instances of zookeeper or hadoop running locally. Find them with `jps -lV`. 25 | 26 | When using this for the first time... 27 | 28 | git clone git@github.com:ccri/cloud-local.git 29 | cd cloud-local 30 | bin/cloud-local.sh init 31 | 32 | By default only HDFS is started. Edit `conf/cloud-local.conf` to enable Accumulo, HBase, Kafka, GeoMesa, Spark and/or Zeppelin. 33 | 34 | Cloud-local sets a default accumulo instance name of "local" and password of "secret", which can be modified by editing the `conf/cloud-local.conf` file. If you want to change these later you'll need to stop, clean, and reconfigure. 
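As a sketch (using the enable flags and Accumulo credential variables that ship in the stock `conf/cloud-local.conf`), enabling extra services and changing the credentials, then rebuilding the local cloud, looks roughly like this:

    # conf/cloud-local.conf (excerpt) -- 1 = enabled, 0 = disabled
    acc_enabled=1
    hbase_enabled=0
    kafka_enabled=1
    spark_enabled=0
    cl_acc_inst_name="local"     # Accumulo instance name
    cl_acc_inst_pass="secret"    # Accumulo root password

    # apply changes to an existing install by rebuilding (this removes existing data)
    bin/cloud-local.sh stop
    bin/cloud-local.sh clean
    bin/cloud-local.sh init
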
35 | 36 | This init script does several things: 37 | * configure HDFS configuration files 38 | * format the HDFS namenode 39 | * create a user homedir in hdfs 40 | * initialize accumulo/hbase 41 | * start up zookeeper, hadoop, and accumulo/hbase 42 | * start kafka broker 43 | * install and start Zeppelin 44 | * install GeoMesa Accumulo iterators 45 | * install GeoMesa command-line tools 46 | 47 | After running `init`, source the variables in your bashrc or other shell: 48 | 49 | source bin/config.sh 50 | 51 | Now you should have the environment vars set: 52 | 53 | env | grep -i hadoop 54 | 55 | Now you can run fun commands like: 56 | 57 | hadoop fs -ls / 58 | accumulo shell -u root 59 | 60 | After installing, you should be able to reach the standard cloud URLs: 61 | 62 | * Accumulo: http://localhost:9995 63 | * Hadoop DFS: http://localhost:50070 64 | * Job Tracker: http://localhost:8088 65 | * Zeppelin: http://localhost:5771 66 | 67 | ## Getting Help 68 | 69 | Options for using `cloud-local.sh` can be found by calling: 70 | 71 | bin/cloud-local.sh help 72 | 73 | You can also set the `CL_VERBOSE=1` env variable in `conf/cloud-local.conf` to increase logging output. 74 | 75 | ## Stopping and Starting 76 | 77 | You can safely stop the cloud using: 78 | 79 | bin/cloud-local.sh stop 80 | 81 | You should stop the cloud before shutting down the machine or doing maintenance. 82 | 83 | You can start the cloud back up using the analogous `start` option. Be sure that the cloud is not already running (hit the cloud URLs or `ps aux|grep -i hadoop`). 84 | 85 | bin/cloud-local.sh start 86 | 87 | If any of the ports needed by cloud-local are already bound, an error message will be printed and the script will stop. 88 | 89 | ## Changing Ports, Hostname, and Bind Address 90 | 91 | cloud-local allows you to modify the ports, hostname, and bind address in configuration or using variables in your env (bashrc). For example: 92 | 93 | # sample .bashrc configuration 94 | 95 | # offset all ports by 10000 96 | export CL_PORT_OFFSET=10000 97 | 98 | # change the bind address 99 | export CL_BIND_ADDRESS=192.168.2.2 100 | 101 | # change the hostname from localhost to something else 102 | export CL_HOSTNAME=mydns.mycompany.com 103 | 104 | Port offsetting shifts the entire port space by a fixed amount so that multiple cloud-local instances can run on a single machine (usually by different users). The bind address and hostname settings allow you to reach cloud-local from other machines. 105 | 106 | WARNING - you should stop and clean cloud-local before changing any of these parameters, since they modify the generated config and may prevent cloud-local from shutting down cleanly. Changing port offsets is supported by XML comments in the accumulo and hadoop config files. Removing or changing these comments (`CL_port_default`) will likely cause failures. 107 | 108 | ## GeoServer 109 | 110 | If you have the environment variable GEOSERVER_HOME set, you can pass the `-gs` flag to start GeoServer at the same time as a child process. 111 | 112 | bin/cloud-local.sh start -gs 113 | 114 | Similarly, you can instruct cloud-local to shut down GeoServer with the cloud using: 115 | 116 | bin/cloud-local.sh stop -gs 117 | 118 | Additionally, if you need to restart GeoServer you may use the command `regeoserver`: 119 | 120 | bin/cloud-local.sh regeoserver 121 | 122 | The GeoServer PID is stored in `$CLOUD_HOME/data/geoserver/pid/geoserver.pid` and GeoServer's stdout is redirected to `$CLOUD_HOME/data/geoserver/log/std.out`. 
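If you want to confirm that the GeoServer child process actually came up, one minimal check is a sketch like the following (it assumes `CLOUD_HOME` points at your checkout and that GeoServer was started via cloud-local, so the PID and log files above exist):

    # check the saved PID and peek at GeoServer's redirected output
    GS_PID=$(cat "$CLOUD_HOME/data/geoserver/pid/geoserver.pid")
    kill -0 "$GS_PID" 2>/dev/null && echo "GeoServer running (pid $GS_PID)" || echo "GeoServer not running"
    tail -n 20 "$CLOUD_HOME/data/geoserver/log/std.out"
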
123 | 124 | ## Zeppelin 125 | 126 | Zeppelin is *disabled* by default. 127 | 128 | Currently, we are using the Zeppelin distribution that includes all of the interpreters, and 129 | it is configured to run against Spark only in local mode. If you want to connect to another 130 | (real) cloud, you will have to configure that manually; see: 131 | 132 | [Zeppelin documentation](http://zeppelin.apache.org/docs/0.7.0/install/spark_cluster_mode.html#spark-on-yarn-mode) 133 | 134 | ### GeoMesa Spark-SQL on Zeppelin 135 | 136 | To enable GeoMesa's Spark-SQL within Zeppelin: 137 | 138 | 1. point your browser to your [local Zeppelin interpreter configuration](http://localhost:5771/#/interpreter) 139 | 1. scroll to the bottom where the *Spark* interpreter configuration appears 140 | 1. click on the "edit" button next to the interpreter name (on the right-hand side of the UI) 141 | 1. within the _Dependencies_ section, add this one JAR (either as a full, local file name or as Maven GAV coordinates): 142 | 1. geomesa-accumulo-spark-runtime_2.11-1.3.0.jar 143 | 1. when prompted by the pop-up, click to restart the Spark interpreter 144 | 145 | That's it! There is no need to restart any of the cloud-local services. 146 | 147 | ## Maintenance 148 | 149 | The `cloud-local.sh` script provides options for maintenance. It is best to stop the cloud before performing any of these tasks. Pass in the parameter `clean` to remove software (but not the tar.gz's) and data. The parameter `reconfigure` will first `clean` then `init`. 150 | 151 | ### Updating 152 | 153 | When this git repo is updated, follow the steps below. Note that these steps will remove your data. 154 | 155 | cd $CLOUD_HOME 156 | bin/cloud-local.sh stop 157 | bin/cloud-local.sh clean 158 | git pull 159 | bin/cloud-local.sh init 160 | 161 | ### Starting over 162 | 163 | If you foobar your cloud, you can just delete everything and start over. You should do this once a week or so just for good measure. 164 | 165 | cd $CLOUD_HOME 166 | bin/cloud-local.sh stop #if cloud is running 167 | rm -rf * 168 | git pull 169 | git reset --hard 170 | bin/cloud-local.sh init 171 | 172 | ## Virtual Machine Help 173 | 174 | If you are using cloud-local within a virtual machine running on your local box, you may want to set up port forwarding for port 9995 to see the accumulo monitor. For VirtualBox, go to the VM's Settings->Network->Port Forwarding section (name=accumulo, protocol=TCP, Host IP=127.0.0.1, Guest IP (leave blank), Guest Port=9995). 175 | -------------------------------------------------------------------------------- /bin/cloud-local.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | REPO_BASE=https://repo.locationtech.org/content/repositories/geomesa-releases 4 | 5 | # thanks accumulo for these resolution snippets 6 | # Start: Resolve Script Directory 7 | SOURCE="${BASH_SOURCE[0]}" 8 | while [[ -h "${SOURCE}" ]]; do # resolve $SOURCE until the file is no longer a symlink 9 | bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )" 10 | SOURCE="$(readlink "${SOURCE}")" 11 | [[ "${SOURCE}" != /* ]] && SOURCE="${bin}/${SOURCE}" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located 12 | done 13 | bin="$( cd -P "$( dirname "${SOURCE}" )" && pwd )" 14 | script=$( basename "${SOURCE}" ) 15 | # Stop: Resolve Script Directory 16 | 17 | # Start: config 18 | . "${bin}"/config.sh 19 | 20 | # Check config 21 | if ! 
validate_config; then 22 | echo "Invalid configuration" 23 | exit 1 24 | fi 25 | 26 | # check java home 27 | if [[ -z "$JAVA_HOME" ]]; then 28 | echo "must set JAVA_HOME..." 29 | exit 1 30 | fi 31 | # Stop: config 32 | 33 | # import port checking 34 | . "${bin}"/ports.sh 35 | 36 | function download_packages { 37 | # Is the pre-download packages variable set? 38 | if [[ ! -z ${pkg_pre_download+x} ]]; then 39 | # Does that folder actually exist? 40 | if [[ -d ${pkg_pre_download} ]] ; then 41 | test -d ${CLOUD_HOME}/pkg || rmdir ${CLOUD_HOME}/pkg 42 | test -h ${CLOUD_HOME}/pkg && rm ${CLOUD_HOME}/pkg 43 | ln -s ${pkg_pre_download} ${CLOUD_HOME}/pkg 44 | echo "Skipping downloads... using ${pkg_pre_download}" 45 | return 0 46 | fi 47 | fi 48 | 49 | # get stuff 50 | echo "Downloading packages from internet..." 51 | test -d ${CLOUD_HOME}/pkg || mkdir ${CLOUD_HOME}/pkg 52 | 53 | # check for proxy 54 | if [[ ! -z ${cl_http_proxy+x} ]]; then 55 | export http_proxy="${cl_http_proxy}" 56 | fi 57 | 58 | if [[ ! -z ${http_proxy+x} ]]; then 59 | echo "Using proxy ${http_proxy}" 60 | fi 61 | 62 | # GeoMesa 63 | if [[ "${geomesa_enabled}" -eq "1" ]]; then 64 | gm="geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz" 65 | url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}" 66 | wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; }; 67 | gm="geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar" 68 | url="${REPO_BASE}/org/locationtech/geomesa/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}/${pkg_geomesa_ver}/${gm}" 69 | wget -c -O "${CLOUD_HOME}/pkg/${gm}" "${url}" || { rm -f "${CLOUD_HOME}/pkg/${gm}"; echo "Error downloading: ${CLOUD_HOME}/pkg/${gm}"; errorList="${errorList} ${gm} ${NL}"; }; 70 | fi 71 | 72 | # Scala 73 | if [[ "${scala_enabled}" -eq "1" ]]; then 74 | url="http://downloads.lightbend.com/scala/${pkg_scala_ver}/scala-${pkg_scala_ver}.tgz" 75 | file="${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz" 76 | wget -c -O "${file}" "${url}" \ 77 | || { rm -f "${file}"; echo "Error downloading: ${file}"; errorList="${errorList} scala-${pkg_scala_ver}.tgz ${NL}"; }; 78 | fi 79 | 80 | local apache_archive_url="http://archive.apache.org/dist" 81 | 82 | local maven=${pkg_src_maven} 83 | 84 | declare -a urls=("${apache_archive_url}/hadoop/common/hadoop-${pkg_hadoop_ver}/hadoop-${pkg_hadoop_ver}.tar.gz" 85 | "${apache_archive_url}/zookeeper/zookeeper-${pkg_zookeeper_ver}/zookeeper-${pkg_zookeeper_ver}.tar.gz") 86 | 87 | if [[ "$spark_enabled" -eq 1 ]]; then 88 | urls=("${urls[@]}" "${apache_archive_url}/spark/spark-${pkg_spark_ver}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz") 89 | fi 90 | 91 | if [[ "$kafka_enabled" -eq 1 ]]; then 92 | urls=("${urls[@]}" "${apache_archive_url}/kafka/${pkg_kafka_ver}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz") 93 | fi 94 | 95 | if [[ "$acc_enabled" -eq 1 ]]; then 96 | urls=("${urls[@]}" "${maven}/org/apache/accumulo/accumulo/${pkg_accumulo_ver}/accumulo-${pkg_accumulo_ver}-bin.tar.gz") 97 | fi 98 | 99 | if [[ "$hbase_enabled" -eq 1 ]]; then 100 | urls=("${urls[@]}" "${apache_archive_url}/hbase/${pkg_hbase_ver}/hbase-${pkg_hbase_ver}-bin.tar.gz") 101 | fi 102 | 103 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 104 | urls=("${urls[@]}" 
"${apache_archive_url}/zeppelin/zeppelin-${pkg_zeppelin_ver}/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz") 105 | fi 106 | 107 | for x in "${urls[@]}"; do 108 | fname=$(basename "$x"); 109 | echo "fetching ${x}"; 110 | wget -c -O "${CLOUD_HOME}/pkg/${fname}" "$x" || { rm -f "${CLOUD_HOME}/pkg/${fname}"; echo "Error Downloading: ${fname}"; errorList="${errorList} ${x} ${NL}"; }; 111 | done 112 | 113 | if [[ -n "${errorList}" ]]; then 114 | echo "Failed to download: ${NL} ${errorList}"; 115 | fi 116 | } 117 | 118 | function unpackage { 119 | local targs 120 | if [[ "${CL_VERBOSE}" == "1" ]]; then 121 | targs="xvf" 122 | else 123 | targs="xf" 124 | fi 125 | 126 | echo "Unpackaging software..." 127 | [[ "${geomesa_enabled}" -eq "1" ]] \ 128 | && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/geomesa-accumulo-dist_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}-bin.tar.gz") \ 129 | && echo "Unpacked GeoMesa Tools" 130 | [[ "${scala_enabled}" -eq "1" ]] \ 131 | && $(cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/scala-${pkg_scala_ver}.tgz") \ 132 | && echo "Unpacked Scala ${pkg_scala_ver}" 133 | (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zookeeper-${pkg_zookeeper_ver}.tar.gz") && echo "Unpacked zookeeper" 134 | [[ "$acc_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/accumulo-${pkg_accumulo_ver}-bin.tar.gz") && echo "Unpacked accumulo" 135 | [[ "$hbase_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hbase-${pkg_hbase_ver}-bin.tar.gz") && echo "Unpacked hbase" 136 | [[ "$zeppelin_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/zeppelin-${pkg_zeppelin_ver}-bin-all.tgz") && echo "Unpacked zeppelin" 137 | [[ "$kafka_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}.tgz") && echo "Unpacked kafka" 138 | [[ "$spark_enabled" -eq 1 ]] && (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}.tgz") && echo "Unpacked spark" 139 | (cd -P "${CLOUD_HOME}" && tar $targs "${CLOUD_HOME}/pkg/hadoop-${pkg_hadoop_ver}.tar.gz") && echo "Unpacked hadoop" 140 | } 141 | 142 | function configure { 143 | mkdir -p "${CLOUD_HOME}/tmp/staging" 144 | cp -r ${CLOUD_HOME}/templates/* ${CLOUD_HOME}/tmp/staging/ 145 | 146 | # accumulo config before substitutions 147 | [[ "$acc_enabled" -eq 1 ]] && cp $ACCUMULO_HOME/conf/examples/3GB/standalone/* $ACCUMULO_HOME/conf/ 148 | 149 | ## Substitute env vars 150 | sed -i~orig "s#LOCAL_CLOUD_PREFIX#${CLOUD_HOME}#;s#CLOUD_LOCAL_HOSTNAME#${CL_HOSTNAME}#;s#CLOUD_LOCAL_BIND_ADDRESS#${CL_BIND_ADDRESS}#" ${CLOUD_HOME}/tmp/staging/*/* 151 | 152 | if [[ "$acc_enabled" -eq 1 ]]; then 153 | # accumulo config 154 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/gc 155 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/masters 156 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tservers 157 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/monitor 158 | echo "${CL_HOSTNAME}" > ${ACCUMULO_HOME}/conf/tracers 159 | fi 160 | 161 | if [[ "$hbase_enabled" -eq 1 ]]; then 162 | sed -i~orig "s/\# export HBASE_MANAGES_ZK=true/export HBASE_MANAGES_ZK=false/" "${HBASE_HOME}/conf/hbase-env.sh" 163 | echo "${CL_HOSTNAME}" > ${HBASE_HOME}/conf/regionservers 164 | fi 165 | 166 | # Zeppelin configuration 167 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 168 | echo "[WARNING] Zeppelin configuration is only template-based for now!" 
169 | fi 170 | 171 | # hadoop slaves file 172 | echo "${CL_HOSTNAME}" > ${CLOUD_HOME}/tmp/staging/hadoop/slaves 173 | 174 | # deploy from staging 175 | echo "Deploying config from staging..." 176 | test -d $HADOOP_CONF_DIR || mkdir $HADOOP_CONF_DIR 177 | test -d $ZOOKEEPER_HOME/conf || mkdir $ZOOKEEPER_HOME/conf 178 | [[ "$kafka_enabled" -eq 1 ]] && (test -d $KAFKA_HOME/config || mkdir $KAFKA_HOME/config) 179 | cp ${CLOUD_HOME}/tmp/staging/hadoop/* $HADOOP_CONF_DIR/ 180 | cp ${CLOUD_HOME}/tmp/staging/zookeeper/* $ZOOKEEPER_HOME/conf/ 181 | [[ "$kafka_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/kafka/* $KAFKA_HOME/config/ 182 | [[ "$acc_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/accumulo/* ${ACCUMULO_HOME}/conf/ 183 | [[ "$geomesa_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/pkg/geomesa-accumulo-distributed-runtime_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}.jar ${ACCUMULO_HOME}/lib/ext/ 184 | [[ "$hbase_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/hbase/* ${HBASE_HOME}/conf/ 185 | [[ "$zeppelin_enabled" -eq 1 ]] && cp ${CLOUD_HOME}/tmp/staging/zeppelin/* ${ZEPPELIN_HOME}/conf/ 186 | 187 | # If Spark doesn't have log4j settings, use the Spark defaults 188 | if [[ "$spark_enabled" -eq 1 ]]; then 189 | test -f $SPARK_HOME/conf/log4j.properties && cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties 190 | fi 191 | 192 | # configure port offsets 193 | configure_port_offset 194 | 195 | # As of Accumulo 2 accumulo-site.xml is nolonger allowed. To avoid a lot of work rewriting the ports script we'll just use accumulo's converter. 196 | if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then 197 | rm -f "$ACCUMULO_HOME/conf/accumulo.properties" 198 | "$ACCUMULO_HOME/bin/accumulo" convert-config \ 199 | -x "$ACCUMULO_HOME/conf/accumulo-site.xml" \ 200 | -p "$ACCUMULO_HOME/conf/accumulo.properties" 201 | rm -f "$ACCUMULO_HOME/conf/accumulo-site.xml" 202 | fi 203 | 204 | # Configure accumulo-client.properties 205 | if [ -f "$ACCUMULO_HOME/conf/accumulo-client.properties" ]; then 206 | sed -i "s/.*instance.name=.*$/instance.name=$cl_acc_inst_name/" "$ACCUMULO_HOME/conf/accumulo-client.properties" 207 | sed -i "s/.*auth.principal=.*$/auth.principal=root/" "$ACCUMULO_HOME/conf/accumulo-client.properties" 208 | sed -i "s/.*auth.token=.*$/auth.token=$cl_acc_inst_pass/" "$ACCUMULO_HOME/conf/accumulo-client.properties" 209 | 210 | fi 211 | rm -rf ${CLOUD_HOME}/tmp/staging 212 | } 213 | 214 | function start_first_time { 215 | # This seems redundant to config but this is the first time in normal sequence where it will set properly 216 | [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath) 217 | # check ports 218 | check_ports 219 | 220 | # start zk 221 | echo "Starting zoo..." 222 | (cd $CLOUD_HOME; $ZOOKEEPER_HOME/bin/zkServer.sh start) 223 | 224 | if [[ "$kafka_enabled" -eq 1 ]]; then 225 | # start kafka 226 | echo "Starting kafka..." 227 | $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties 228 | fi 229 | 230 | # format namenode 231 | echo "Formatting namenode..." 232 | $HADOOP_HOME/bin/hdfs namenode -format 233 | 234 | # start hadoop 235 | echo "Starting hadoop..." 
236 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start namenode 237 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode 238 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon start datanode 239 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager 240 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager 241 | 242 | # Wait for HDFS to exit safemode: 243 | hdfs_wait_safemode 244 | 245 | # create user homedir 246 | echo "Creating hdfs path /user/$USER" 247 | $HADOOP_HOME/bin/hadoop fs -mkdir -p "/user/$USER" 248 | 249 | # sleep 250 | sleep 5 251 | 252 | if [[ "$acc_enabled" -eq 1 ]]; then 253 | # init accumulo 254 | echo "Initializing accumulo" 255 | $ACCUMULO_HOME/bin/accumulo init --instance-name $cl_acc_inst_name --password $cl_acc_inst_pass 256 | 257 | # sleep 258 | sleep 5 259 | 260 | # starting accumulo 261 | echo "Starting accumulo..." 262 | $ACCUMULO_HOME/bin/accumulo-cluster start 263 | fi 264 | 265 | if [[ "$hbase_enabled" -eq 1 ]]; then 266 | # start hbase 267 | echo "Starting hbase..." 268 | ${HBASE_HOME}/bin/start-hbase.sh 269 | fi 270 | 271 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 272 | # start zeppelin 273 | echo "Starting zeppelin..." 274 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start 275 | fi 276 | 277 | if [[ "$geoserver_enable" -eq 1 ]]; then 278 | echo "Initializing geoserver..." 279 | mkdir -p "${GEOSERVER_DATA_DIR}" 280 | mkdir "${GEOSERVER_PID_DIR}" 281 | mkdir "${GEOSERVER_LOG_DIR}" 282 | touch "${GEOSERVER_PID_DIR}/geoserver.pid" 283 | touch "${GEOSERVER_LOG_DIR}/std.out" 284 | start_geoserver 285 | fi 286 | 287 | } 288 | 289 | function start_cloud { 290 | # Check ports 291 | check_ports 292 | 293 | if [[ "$master_enabled" -eq 1 ]]; then 294 | # start zk 295 | echo "Starting zoo..." 296 | (cd $CLOUD_HOME ; zkServer.sh start) 297 | 298 | if [[ "$kafka_enabled" -eq 1 ]]; then 299 | echo "Starting kafka..." 300 | $KAFKA_HOME/bin/kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties 301 | fi 302 | 303 | # start hadoop 304 | echo "Starting hadoop..." 305 | hdfs --config $HADOOP_CONF_DIR --daemon start namenode 306 | hdfs --config $HADOOP_CONF_DIR --daemon start secondarynamenode 307 | fi 308 | 309 | if [[ "$worker_enabled" -eq 1 ]]; then 310 | hdfs --config $HADOOP_CONF_DIR --daemon start datanode 311 | fi 312 | 313 | start_yarn 314 | 315 | # Wait for HDFS to exit safemode: 316 | echo "Waiting for HDFS to exit safemode..." 317 | hdfs_wait_safemode 318 | } 319 | 320 | function hdfs_wait_safemode { 321 | safemode_done=1 322 | while [[ "$safemode_done" -ne 0 ]]; do 323 | echo "Waiting for HDFS to exit safemode..." 324 | hdfs dfsadmin -safemode wait 325 | safemode_done=$? 326 | if [[ "$safemode_done" -ne 0 ]]; then 327 | echo "Safe mode not done...sleeping 1" 328 | sleep 1; 329 | fi 330 | done 331 | echo "Safemode exited" 332 | } 333 | 334 | 335 | function start_db { 336 | if [[ "$acc_enabled" -eq 1 ]]; then 337 | # starting accumulo 338 | echo "starting accumulo..." 339 | $ACCUMULO_HOME/bin/accumulo-cluster start 340 | fi 341 | 342 | if [[ "$hbase_enabled" -eq 1 ]]; then 343 | # start hbase 344 | echo "Starting hbase..." 345 | ${HBASE_HOME}/bin/start-hbase.sh 346 | fi 347 | 348 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 349 | # start zeppelin 350 | echo "Starting zeppelin..." 351 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start 352 | fi 353 | 354 | if [[ "$geoserver_enable" -eq 1 ]]; then 355 | echo "Starting geoserver..." 
356 | start_geoserver 357 | fi 358 | 359 | } 360 | 361 | function start_yarn { 362 | if [[ "$master_enabled" -eq 1 ]]; then 363 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start resourcemanager 364 | fi 365 | if [[ "$worker_enabled" -eq 1 ]]; then 366 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon start nodemanager 367 | fi 368 | } 369 | 370 | function start_geoserver { 371 | (${GEOSERVER_HOME}/bin/startup.sh &> ${GEOSERVER_LOG_DIR}/std.out) & 372 | GEOSERVER_PID=$! 373 | echo "${GEOSERVER_PID}" > ${GEOSERVER_PID_DIR}/geoserver.pid 374 | echo "GeoServer Process Started" 375 | echo "PID: ${GEOSERVER_PID}" 376 | echo "GeoServer Out: ${GEOSERVER_LOG_DIR}/std.out" 377 | } 378 | 379 | function stop_db { 380 | verify_stop 381 | 382 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 383 | echo "Stopping zeppelin..." 384 | ${ZEPPELIN_HOME}/bin/zeppelin-daemon.sh stop 385 | fi 386 | 387 | if [[ "$kafka_enabled" -eq 1 ]]; then 388 | echo "Stopping kafka..." 389 | $KAFKA_HOME/bin/kafka-server-stop.sh 390 | fi 391 | 392 | if [[ "$acc_enabled" -eq 1 ]]; then 393 | echo "Stopping accumulo..." 394 | $ACCUMULO_HOME/bin/accumulo-cluster stop 395 | fi 396 | 397 | if [[ "$hbase_enabled" -eq 1 ]]; then 398 | echo "Stopping hbase..." 399 | ${HBASE_HOME}/bin/stop-hbase.sh 400 | fi 401 | } 402 | 403 | function stop_cloud { 404 | echo "Stopping yarn and dfs..." 405 | stop_yarn 406 | 407 | if [[ "$master_enabled" -eq 1 ]]; then 408 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop namenode 409 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop secondarynamenode 410 | fi 411 | if [[ "$worker_enabled" -eq 1 ]]; then 412 | $HADOOP_HOME/bin/hdfs --config $HADOOP_CONF_DIR --daemon stop datanode 413 | fi 414 | echo "Stopping zookeeper..." 415 | $ZOOKEEPER_HOME/bin/zkServer.sh stop 416 | 417 | if [[ "${geoserver_enabled}" -eq "1" ]]; then 418 | echo "Stopping geoserver..." 419 | stop_geoserver 420 | fi 421 | 422 | } 423 | 424 | function psux { 425 | ps ux | grep -i "$1" 426 | } 427 | 428 | function verify_stop { 429 | # Find Processes 430 | local zeppelin=`psux "[z]eppelin"` 431 | local kafka=`psux "[k]afka"` 432 | local accumulo=`psux "[a]ccumulo"` 433 | local hbase=`psux "[h]base"` 434 | local yarn=`psux "[y]arn"` 435 | local zookeeper=`psux "[z]ookeeper"` 436 | local hadoop=`psux "[h]adoop"` 437 | local geoserver=`psux "[g]eoserver"` 438 | 439 | local res="$zeppelin$kafka$accumulo$hbase$yarn$zookeeper$geoserver" 440 | if [[ -n "${res}" ]]; then 441 | echo "The following services do not appear to be shutdown:" 442 | if [[ -n "${zeppelin}" ]]; then 443 | echo "${NL}Zeppelin" 444 | psux "[z]eppelin" 445 | fi 446 | if [[ -n "${kafka}" ]]; then 447 | echo "${NL}Kafka" 448 | psux "[k]afka" 449 | fi 450 | if [[ -n "${accumulo}" ]]; then 451 | echo "${NL}Accumulo" 452 | psux "[a]ccumulo" 453 | fi 454 | if [[ -n "${hbase}" ]]; then 455 | echo "${NL}HBase" 456 | psux "[h]base" 457 | fi 458 | if [[ -n "${yarn}" ]]; then 459 | echo "${NL}Yarn" 460 | psux "[y]arn" 461 | fi 462 | if [[ -n "${zookeeper}" ]]; then 463 | echo "${NL}Zookeeper" 464 | psux "[z]ookeeper" 465 | fi 466 | if [[ -n "${hadoop}" ]]; then 467 | echo "${NL}Hadoop" 468 | psux "[h]adoop" 469 | fi 470 | if [[ -n "${geoserver}" ]]; then 471 | echo "${NL}GeoServer" 472 | psux "[g]eoserver" 473 | fi 474 | read -r -p "Would you like to continue? 
[Y/n] " confirm 475 | confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing 476 | if [[ $confirm =~ ^(yes|y) || $confirm == "" ]]; then 477 | return 0 478 | else 479 | exit 1 480 | fi 481 | fi 482 | } 483 | 484 | function stop_yarn { 485 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop resourcemanager 486 | $HADOOP_HOME/bin/yarn --config $HADOOP_CONF_DIR --daemon stop nodemanager 487 | } 488 | 489 | function stop_geoserver { 490 | GEOSERVER_PID=`cat ${GEOSERVER_PID_DIR}/geoserver.pid` 491 | if [[ -n "${GEOSERVER_PID}" ]]; then 492 | kill -15 ${GEOSERVER_PID} 493 | echo "TERM signal sent to process PID: ${GEOSERVER_PID}" 494 | else 495 | echo "No GeoServer PID was saved. This script must be used to start GeoServer in order for this script to be able to stop it." 496 | fi 497 | } 498 | 499 | function clear_sw { 500 | [[ "$zeppelin_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all" 501 | [[ "$acc_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}" 502 | [[ "$hbase_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/hbase-${pkg_hbase_ver}" 503 | [[ -d "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}" ]] && rm -rf "${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}" 504 | rm -rf "${CLOUD_HOME}/hadoop-${pkg_hadoop_ver}" 505 | rm -rf "${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}" 506 | [[ "$kafka_enabled" -eq 1 ]] && rm -rf "${CLOUD_HOME}/kafka_${pkg_kafka_scala_ver}-${pkg_kafka_ver}" 507 | rm -rf "${CLOUD_HOME}/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}" 508 | rm -rf "${CLOUD_HOME}/scala-${pkg_scala_ver}" 509 | rm -rf "${CLOUD_HOME}/tmp" 510 | if [[ -a "${CLOUD_HOME}/zookeeper.out" ]]; then rm "${CLOUD_HOME}/zookeeper.out"; fi #hahahaha 511 | } 512 | 513 | function clear_data { 514 | read -r -p "Are you sure you want to clear data directories? [y/N] " confirm 515 | confirm=$(echo $confirm | tr '[:upper:]' '[:lower:]') #lowercasing 516 | if [[ $confirm =~ ^(yes|y) ]]; then 517 | rm -rf ${CLOUD_HOME}/data/yarn/* 518 | rm -rf ${CLOUD_HOME}/data/zookeeper/* 519 | rm -rf ${CLOUD_HOME}/data/dfs/data/* 520 | rm -rf ${CLOUD_HOME}/data/dfs/name/* 521 | rm -rf ${CLOUD_HOME}/data/hadoop/tmp/* 522 | rm -rf ${CLOUD_HOME}/data/hadoop/pid/* 523 | rm -rf ${CLOUD_HOME}/data/geoserver/pid/* 524 | rm -rf ${CLOUD_HOME}/data/geoserver/log/* 525 | if [[ -d "${CLOUD_HOME}/data/kafka-logs" ]]; then rm -rf ${CLOUD_HOME}/data/kafka-logs; fi # intentionally to clear dot files 526 | fi 527 | } 528 | 529 | function show_help { 530 | echo "Provide 1 command: (init|start|stop|reconfigure|restart|reyarn|regeoserver|clean|download_only|init_skip_download|help)" 531 | echo "If the environment variable GEOSERVER_HOME is set then the parameter '-gs' may be used with 'start' to automatically start/stop GeoServer with the cloud." 532 | } 533 | 534 | if [[ "$2" == "-gs" ]]; then 535 | if [[ -n "${GEOSERVER_HOME}" && -e $GEOSERVER_HOME/bin/startup.sh ]]; then 536 | geoserver_enabled=1 537 | else 538 | echo "The environment variable GEOSERVER_HOME is not set or is not valid." 539 | fi 540 | fi 541 | 542 | if [[ "$#" -ne 1 && "${geoserver_enabled}" -ne "1" ]]; then 543 | show_help 544 | exit 1 545 | fi 546 | 547 | if [[ $1 == 'init' ]]; then 548 | download_packages && unpackage && configure && start_first_time 549 | elif [[ $1 == 'reconfigure' ]]; then 550 | echo "reconfiguring..." 551 | #TODO ensure everything is stopped? prompt to make sure? 
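# Note: 'reconfigure' stops the running cloud, removes the unpacked software and data
# directories, re-extracts the tarballs already downloaded to pkg/, regenerates the
# configs from templates, and re-runs first-time startup (namenode format, accumulo init, etc.)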
552 | stop_cloud && clear_sw && clear_data && unpackage && configure && start_first_time 553 | elif [[ $1 == 'clean' ]]; then 554 | echo "cleaning..." 555 | clear_sw && clear_data 556 | echo "cleaned!" 557 | elif [[ $1 == 'start' ]]; then 558 | echo "Starting cloud..." 559 | start_cloud && start_db 560 | echo "Cloud Started" 561 | elif [[ $1 == 'stop' ]]; then 562 | echo "Stopping Cloud..." 563 | stop_db && stop_cloud 564 | echo "Cloud stopped" 565 | elif [[ $1 == 'start_db' ]]; then 566 | echo "Starting cloud..." 567 | start_db 568 | echo "Database Started" 569 | elif [[ $1 == 'stop_db' ]]; then 570 | echo "Stopping Database..." 571 | stop_db 572 | echo "Cloud stopped" 573 | elif [[ $1 == 'start_hadoop' ]]; then 574 | echo "Starting Hadoop..." 575 | start_cloud 576 | echo "Cloud Hadoop" 577 | elif [[ $1 == 'stop_hadoop' ]]; then 578 | echo "Stopping Hadoop..." 579 | stop_cloud 580 | echo "Hadoop stopped" 581 | elif [[ $1 == 'reyarn' ]]; then 582 | echo "Stopping Yarn..." 583 | stop_yarn 584 | echo "Starting Yarn..." 585 | start_yarn 586 | elif [[ $1 == 'regeoserver' ]]; then 587 | stop_geoserver 588 | start_geoserver 589 | elif [[ $1 == 'restart' ]]; then 590 | stop_geoserver && stop_cloud && start_cloud && start_geoserver 591 | elif [[ $1 == 'download_only' ]]; then 592 | download_packages 593 | elif [[ $1 == 'init_skip_download' ]]; then 594 | unpackage && configure && start_first_time 595 | else 596 | show_help 597 | fi 598 | 599 | 600 | 601 | -------------------------------------------------------------------------------- /bin/config.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # thanks accumulo for these resolutions snippets 4 | if [ -z "${CLOUD_HOME}" ] ; then 5 | # Start: Resolve Script Directory 6 | SOURCE="${BASH_SOURCE[0]}" 7 | while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symlink 8 | bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )" 9 | SOURCE="$(readlink "$SOURCE")" 10 | [[ $SOURCE != /* ]] && SOURCE="$bin/$SOURCE" # if $SOURCE was a relative symlink, we need to resolve it relative to the path where the symlink file was located 11 | done 12 | bin="$( cd -P "$( dirname "$SOURCE" )" && pwd )" 13 | script=$( basename "$SOURCE" ) 14 | # Stop: Resolve Script Directory 15 | 16 | CLOUD_HOME=$( cd -P ${bin}/.. && pwd ) 17 | export CLOUD_HOME 18 | fi 19 | 20 | if [ ! -d "${CLOUD_HOME}" ]; then 21 | echo "CLOUD_HOME=${CLOUD_HOME} is not a valid directory. Please make sure it exists" 22 | return 1 23 | fi 24 | 25 | # [Tab] shell completion because i'm lazy 26 | IFS=$'\n' complete -W "init start stop reconfigure regeoserver reyarn clean help" cloud-local.sh 27 | NL=$'\n' 28 | 29 | function validate_config { 30 | # allowed versions are 31 | local pkg_error="" 32 | # hadoop 3.2 currently required 33 | if [[ -z "$pkg_hadoop_ver" || ! $pkg_hadoop_ver =~ 3[.]2[.].+ ]]; then 34 | pkg_error="${pkg_error}Invalid hadoop version: '${pkg_hadoop_ver}' ${NL}" 35 | fi 36 | # zk 3.4.x 37 | if [[ -z "$pkg_zookeeper_ver" || ! $pkg_zookeeper_ver =~ 3[.]4[.]([56789]|10|11|12|13|14) ]]; then 38 | pkg_error="${pkg_error}Invalid zookeeper version: '${pkg_zookeeper_ver}' ${NL}" 39 | fi 40 | # acc 2.0.0 41 | if [[ -z "$pkg_accumulo_ver" || ! $pkg_accumulo_ver =~ 2[.]0[.]0 ]]; then 42 | pkg_error="${pkg_error}Invalid accumulo version: '${pkg_accumulo_ver}' ${NL}" 43 | fi 44 | # kafka 0.9.x, 0.10.x, 0.11.x, 1.0.x 45 | if [[ -z "$pkg_kafka_ver" || ! $pkg_kafka_ver =~ ((0[.]9[.].+)|(0[.]1[01][.].+)|1[.]0[.].) 
]]; then 46 | pkg_error="${pkg_error}Invalid kafka version: '${pkg_kafka_ver}' ${NL}" 47 | fi 48 | # geomesa scala 1.3.x 49 | if [[ -z "$pkg_geomesa_scala_ver" && $pkg_geomesa_ver =~ 3[.]0[.].+ ]]; then 50 | pkg_error="${pkg_error}Invalid GeoMesa Scala version: '${pkg_geomesa_scala_ver}' ${NL}" 51 | fi 52 | 53 | if [[ ! -z "$pkg_error" ]]; then 54 | echo "ERROR: ${pkg_error}" 55 | return 1 56 | else 57 | return 0 58 | fi 59 | } 60 | 61 | function set_env_vars { 62 | if [[ $zeppelin_enabled -eq "1" ]]; then 63 | export ZEPPELIN_HOME="${CLOUD_HOME}/zeppelin-${pkg_zeppelin_ver}-bin-all" 64 | fi 65 | 66 | if [[ $geomesa_enabled -eq "1" ]]; then 67 | unset GEOMESA_HOME 68 | unset GEOMESA_BIN 69 | export GEOMESA_HOME="${CLOUD_HOME}/geomesa-accumulo_${pkg_geomesa_scala_ver}-${pkg_geomesa_ver}" 70 | export GEOMESA_BIN="${GEOMESA_HOME}/bin:" 71 | echo "Setting GEOMESA_HOME: ${GEOMESA_HOME}" 72 | fi 73 | 74 | export ZOOKEEPER_HOME="${CLOUD_HOME}/zookeeper-${pkg_zookeeper_ver}" 75 | 76 | if [[ $kafka_enabled -eq "1" ]]; then 77 | export KAFKA_HOME="${CLOUD_HOME}/kafka_2.11-${pkg_kafka_ver}" 78 | fi 79 | 80 | export HADOOP_HOME="$CLOUD_HOME/hadoop-${pkg_hadoop_ver}" 81 | export HADOOP_CONF_DIR="${HADOOP_HOME}/etc/hadoop" 82 | export HADOOP_COMMON_HOME="${HADOOP_HOME}" 83 | export HADOOP_HDFS_HOME="${HADOOP_HOME}" 84 | export HADOOP_YARN_HOME="${HADOOP_HOME}" 85 | export HADOOP_PID_DIR="${CLOUD_HOME}/data/hadoop/pid" 86 | export HADOOP_IDENT_STRING=$(echo ${CLOUD_HOME} | (md5sum 2>/dev/null || md5) | cut -c1-32) 87 | 88 | export YARN_HOME="${HADOOP_HOME}" 89 | 90 | export SPARK_HOME="$CLOUD_HOME/spark-${pkg_spark_ver}-bin-${pkg_spark_hadoop_ver}" 91 | 92 | export GEOSERVER_DATA_DIR="${CLOUD_HOME}/data/geoserver" 93 | export GEOSERVER_PID_DIR="${GEOSERVER_DATA_DIR}/pid" 94 | export GEOSERVER_LOG_DIR="${GEOSERVER_DATA_DIR}/log" 95 | 96 | [[ "${acc_enabled}" -eq "1" ]] && export ACCUMULO_HOME="${CLOUD_HOME}/accumulo-${pkg_accumulo_ver}" 97 | [[ "${hbase_enabled}" -eq "1" ]] && export HBASE_HOME="${CLOUD_HOME}/hbase-${pkg_hbase_ver}" 98 | 99 | export PATH="$GEOMESA_BIN"$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH 100 | [[ "${acc_enabled}" -eq "1" ]] && export PATH="${ACCUMULO_HOME}/bin:${PATH}" 101 | [[ "${hbase_enabled}" -eq "1" ]] && export PATH="${HBASE_HOME}/bin:${PATH}" 102 | [[ "${zeppelin_enabled}" -eq "1" ]] && export PATH="${ZEPPELIN_HOME}/bin:${PATH}" 103 | 104 | # This variable requires Hadoop executable, which will fail during certain runs/steps 105 | [[ "$pkg_spark_hadoop_ver" = "without-hadoop" ]] && export SPARK_DIST_CLASSPATH=$(hadoop classpath 2>/dev/null) 106 | 107 | # Export direnv environment file https://direnv.net/ 108 | env | grep -v PATH | sort > $CLOUD_HOME/.envrc 109 | echo "PATH=${PATH}" >> $CLOUD_HOME/.envrc 110 | } 111 | 112 | if [[ -z "$JAVA_HOME" ]];then 113 | echo "ERROR: must set JAVA_HOME..." 114 | return 1 115 | fi 116 | 117 | # load configuration scripts 118 | . 
"${CLOUD_HOME}/conf/cloud-local.conf" 119 | validate_config 120 | set_env_vars 121 | 122 | 123 | -------------------------------------------------------------------------------- /bin/ports.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | case $(uname) in 4 | "Darwin") SED_REGEXP_EXTENDED='-E' ;; 5 | *) SED_REGEXP_EXTENDED='-r' ;; 6 | esac 7 | 8 | # get the port offset from config variable 9 | function get_port_offset { 10 | local offset=0 11 | 12 | if [ -n "${CL_PORT_OFFSET}" ]; then 13 | offset=${CL_PORT_OFFSET} 14 | fi 15 | 16 | echo ${offset} 17 | } 18 | 19 | function check_port { 20 | local port=$1 21 | local offset=$(get_port_offset) 22 | local tocheck=$((port+offset)) 23 | if (: < /dev/tcp/127.0.0.1/${tocheck}) 2>/dev/null; then 24 | echo "Error: port ${tocheck} is already taken (orig port ${port} with offset ${offset})" 25 | exit 1 26 | fi 27 | } 28 | 29 | function check_ports { 30 | check_port 2181 # zookeeper 31 | 32 | check_port 9092 # kafka broker 33 | 34 | # hadoop 35 | check_port 50010 # dfs.datanode.address 36 | check_port 50020 # dfs.datanode.ipc.address 37 | check_port 50075 # dfs.datanode.http.address 38 | check_port 50475 # dfs.datanode.https.address 39 | 40 | check_port 8020 # namenode data 41 | check_port 9000 # namenode data 42 | check_port 50070 # namenode http 43 | check_port 50470 # namenode https 44 | 45 | check_port 50090 # secondary name node 46 | 47 | check_port 8088 # yarn job tracker 48 | check_port 8030 # yarn 49 | check_port 8031 # yarn 50 | check_port 8032 # yarn 51 | check_port 8033 # yarn 52 | 53 | check_port 8090 # yarn 54 | check_port 8040 # yarn 55 | check_port 8042 # yarn 56 | 57 | check_port 13562 # mapreduce shuffle 58 | 59 | # accumulo 60 | check_port 9995 # accumulo monitor 61 | check_port 4560 # accumulo monitor log4j 62 | check_port 9997 # accumulo tserver 63 | check_port 9998 # accumulo gc 64 | check_port 9999 # accumulo master 65 | check_port 12234 # accumulo tracer 66 | check_port 10001 # accumulo master replication coordinator 67 | check_port 10002 # accumulo master replication service 68 | 69 | 70 | # hbase 71 | check_port 16000 # hbase master 72 | check_port 16010 # hbase master info 73 | check_port 16020 # hbase regionserver 74 | check_port 16030 # hbase regionserver info 75 | check_port 16040 # hbase rest 76 | check_port 16100 # hbase multicast 77 | 78 | # spark 79 | check_port 4040 # Spark job monitor 80 | 81 | # Zeppelin 82 | check_port 5771 # Zeppelin embedded web server 83 | 84 | echo "Known ports are OK (using offset $(get_port_offset))" 85 | } 86 | 87 | function configure_port_offset { 88 | local offset=$(get_port_offset) 89 | 90 | local KEY="CL_port_default" 91 | local KEY_CHANGED="CL_offset_port" 92 | 93 | # zookeeper (zoo.cfg) 94 | # do this one by hand, it's fairly straightforward 95 | zkPort=$((2181+offset)) 96 | sed -i~orig "s/clientPort=.*/clientPort=$zkPort/" $ZOOKEEPER_HOME/conf/zoo.cfg 97 | 98 | # kafka (server.properties) 99 | if [[ "kafka_enabled" -eq 1 ]]; then 100 | kafkaPort=$((9092+offset)) 101 | sed -i~orig "s/\/\/$CL_HOSTNAME:[0-9].*/\/\/$CL_HOSTNAME:$kafkaPort/" $KAFKA_HOME/config/server.properties 102 | sed -i~orig "s/zookeeper.connect=$CL_HOSTNAME:[0-9].*/zookeeper.connect=$CL_HOSTNAME:$zkPort/" $KAFKA_HOME/config/server.properties 103 | fi 104 | 105 | # Zeppelin 106 | if [[ "$zeppelin_enabled" -eq 1 ]]; then 107 | zeppelinPort=$((5771+offset)) 108 | sed -i~orig "s/ZEPPELIN_PORT=[0-9]\{1,5\}\(.*\)/ZEPPELIN_PORT=$zeppelinPort\1/g" 
"$ZEPPELIN_HOME/conf/zeppelin-env.sh" 109 | fi 110 | 111 | # hadoop and accumulo xml files 112 | # The idea with this block is that the xml files have comments which tag lines which need 113 | # a port replacement, and the comments provide the default values. So to change ports, 114 | # we replace all the instance of the default value, on the line with the comment, with 115 | # the desired (offset) port. 116 | 117 | xmlFiles=( $HADOOP_CONF_DIR/core-site.xml \ 118 | $HADOOP_CONF_DIR/hdfs-site.xml \ 119 | $HADOOP_CONF_DIR/mapred-site.xml \ 120 | $HADOOP_CONF_DIR/yarn-site.xml ) 121 | if [ -f "$ACCUMULO_HOME/conf/accumulo-site.xml" ]; then 122 | xmlFiles+=($ACCUMULO_HOME/conf/accumulo-site.xml) 123 | fi 124 | if [ -f "$HBASE_HOME/conf/hbase-site.xml" ]; then 125 | xmlFiles+=($HBASE_HOME/conf/hbase-site.xml) 126 | fi 127 | for FILE in "${xmlFiles[@]}"; do 128 | while [[ -n "$(grep $KEY $FILE)" ]]; do # while lines need to be changed 129 | # pull the default port out of the comment 130 | basePort=$(grep -hoE "$KEY [0-9]+" $FILE | head -1 | grep -hoE [0-9]+) 131 | # calculate new port 132 | newPort=$(($basePort+$offset)) 133 | # note that any part of the line matching the port line will be replaced... 134 | # the following sed only makes the replacement on a single line, containing the matched comment 135 | #sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#[0-9]+#$newPort#" $FILE 136 | if [[ "${CL_VERBOSE}" == "1" ]]; then echo "Replacing port $basePort with $newPort in file $FILE"; fi 137 | sed -i~orig ${SED_REGEXP_EXTENDED} "/$KEY $basePort/ s#(.*)(${basePort})(.*)#\\1${newPort}\\3#" $FILE 138 | # mark this line done 139 | sed -i~orig ${SED_REGEXP_EXTENDED} "s/$KEY $basePort/$KEY_CHANGED $basePort/" $FILE 140 | done 141 | # re-mark all comment lines, so we can change ports again later if we want 142 | sed -i~orig "s/$KEY_CHANGED/$KEY/g" $FILE 143 | done 144 | 145 | echo "Ports configured to use offset $offset" 146 | } 147 | -------------------------------------------------------------------------------- /conf/cloud-local.conf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | ################################################################################ 4 | # REPOSITORY MANAGEMENT 5 | ################################################################################ 6 | 7 | # 8 | # Source for packages (accumulo, hadoop, etc) 9 | # Available options are (local, wget) 10 | # 11 | # Set the variable 'pkg_src_mirror' if you want to specify a mirror 12 | # else it will use https://www.apache.org/dyn/closer.cgi to determine 13 | # a mirror. If you have a caching web proxy you may want to set this as well. 
14 | # 15 | # pkg_src_mirror="http://apache.mirrors.tds.net" 16 | 17 | # Specify a maven repository to use 18 | pkg_src_maven="https://repo1.maven.org/maven2" 19 | 20 | # Optionally specifcy a local shared folder of package downloads 21 | #pkg_pre_download=/net/synds1/volume2/projects2/cloud-local/packages 22 | 23 | ################################################################################ 24 | # VERSION MANAGEMENT - Versions of popular software 25 | ################################################################################ 26 | 27 | pkg_accumulo_ver="2.0.0" 28 | pkg_hbase_ver="1.3.1" 29 | # Note pkg_spark_hadoop_ver below if modifying 30 | pkg_hadoop_ver="3.2.1" 31 | # Note, just the major+minor from Hadoop, not patch level 32 | hadoop_base_ver=${pkg_hadoop_ver:0:3} 33 | 34 | 35 | pkg_zookeeper_ver="3.4.14" 36 | # Note convention is scala.version_kafka.version 37 | pkg_kafka_scala_ver="2.11" 38 | pkg_kafka_ver="1.0.1" 39 | 40 | pkg_spark_ver="2.2.1" 41 | # Note pkg_hadoop_ver above 42 | # - don't auto-derive this as spark & hadoop major releases aren't lock-step 43 | # - use "without-hadoop" to use version without hadoop deps 44 | pkg_spark_hadoop_ver="without-hadoop" 45 | 46 | pkg_geomesa_ver="3.0.0" 47 | pkg_geomesa_scala_ver="2.11" 48 | pkg_scala_ver="2.11.7" 49 | 50 | # Apache Zeppelin, yet another analyst notebook that knows about Spark 51 | # You must change Spark to a compatible version (e.g. Zep 0.7.2 with Spark 2.1 and Zep 0.7.3 with Spark 2.2) 52 | pkg_zeppelin_ver="0.7.3" 53 | 54 | ################################################################################ 55 | # ACCUMULO CONFIGURATION 56 | ################################################################################ 57 | 58 | cl_acc_inst_name="local" 59 | cl_acc_inst_pass="secret" 60 | 61 | ################################################################################ 62 | # IP/HOSTNAME/PORT CONFIGURATION - How to bind to things 63 | ################################################################################ 64 | 65 | # The following options can be overriden in the user environment 66 | # bind address and hostname to use for all service bindings 67 | if [[ -z "${CL_HOSTNAME}" ]]; then 68 | CL_HOSTNAME=$(hostname) 69 | #CL_HOSTNAME=localhost 70 | fi 71 | 72 | if [[ -z "${CL_BIND_ADDRESS}" ]]; then 73 | CL_BIND_ADDRESS="0.0.0.0" 74 | #CL_BIND_ADDRESS="127.0.0.1" 75 | fi 76 | 77 | if [[ -z "${CL_PORT_OFFSET}" ]]; then 78 | CL_PORT_OFFSET=0 79 | fi 80 | 81 | if [[ -z "${CL_VERBOSE}" ]]; then 82 | CL_VERBOSE=0 83 | fi 84 | 85 | ################################################################################ 86 | # PACKAGE MANAGEMENT - Enable/Disable software here 87 | ################################################################################ 88 | 89 | # 1 = enabled 90 | # 0 = disabled 91 | 92 | master_enabled=1 93 | worker_enabled=1 94 | 95 | # Hadoop HDFS and YARN 96 | hadoop_enabled=1 97 | 98 | # Enable accumulo or hbase - probably best not to run both but it might work 99 | # Requires hadoop_enabled=1 100 | acc_enabled=1 101 | hbase_enabled=0 102 | 103 | # Enable/disable Kafka 104 | kafka_enabled=0 105 | 106 | # Download spark distribution 107 | spark_enabled=0 108 | 109 | # Enable/Disable installation of GeoMesa 110 | geomesa_enabled=0 111 | if [[ -z "${pkg_geomesa_ver}" && -z "${pkg_geomesa_scala_ver}" && "${geomesa_enabled}" -eq "1" ]]; then 112 | echo "Error: GeoMesa is enabled but the version number is missing." 
113 | exit 1 114 | fi 115 | 116 | # Enable/Disable scala download 117 | scala_enabled=1 118 | if [[ -z "${pkg_scala_ver}" && "${scala_enabled}" -eq "1" ]]; then 119 | echo "Error: Scala is enabled but the version number is missing." 120 | exit 1 121 | fi 122 | 123 | # Enable/Disable Zepplin 124 | # Ensure that your Spark+Zeppelin versions are compatible 125 | zeppelin_enabled=0 126 | -------------------------------------------------------------------------------- /templates/accumulo/accumulo-env.sh: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env bash 2 | 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | 18 | ## Before accumulo-env.sh is loaded, these environment variables are set and can be used in this file: 19 | 20 | # cmd - Command that is being called such as tserver, master, etc. 21 | # basedir - Root of Accumulo installation 22 | # bin - Directory containing Accumulo scripts 23 | # conf - Directory containing Accumulo configuration 24 | # lib - Directory containing Accumulo libraries 25 | 26 | ############################ 27 | # Variables that must be set 28 | ############################ 29 | 30 | ## Accumulo logs directory. Referenced by logger config. 31 | export ACCUMULO_LOG_DIR="${ACCUMULO_LOG_DIR:-${basedir}/logs}" 32 | ## Hadoop installation 33 | export HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop}" 34 | ## Hadoop configuration 35 | export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}" 36 | ## Zookeeper installation 37 | export ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/path/to/zookeeper}" 38 | 39 | ########################## 40 | # Build CLASSPATH variable 41 | ########################## 42 | 43 | ## Verify that Hadoop & Zookeeper installation directories exist 44 | if [[ ! -d "$ZOOKEEPER_HOME" ]]; then 45 | echo "ZOOKEEPER_HOME=$ZOOKEEPER_HOME is not set to a valid directory in accumulo-env.sh" 46 | exit 1 47 | fi 48 | if [[ ! -d "$HADOOP_HOME" ]]; then 49 | echo "HADOOP_HOME=$HADOOP_HOME is not set to a valid directory in accumulo-env.sh" 50 | exit 1 51 | fi 52 | 53 | ## Build using existing CLASSPATH, conf/ directory, dependencies in lib/, and external Hadoop & Zookeeper dependencies 54 | if [[ -n "$CLASSPATH" ]]; then 55 | CLASSPATH="${CLASSPATH}:${conf}" 56 | else 57 | CLASSPATH="${conf}" 58 | fi 59 | CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${HADOOP_HOME}/share/hadoop/client/*" 60 | export CLASSPATH 61 | 62 | ################################################################## 63 | # Build JAVA_OPTS variable. Defaults below work but can be edited. 64 | ################################################################## 65 | 66 | ## JVM options set for all processes. 
Extra options can be passed in by setting ACCUMULO_JAVA_OPTS to an array of options. 67 | JAVA_OPTS=("${ACCUMULO_JAVA_OPTS[@]}" 68 | '-XX:+UseConcMarkSweepGC' 69 | '-XX:CMSInitiatingOccupancyFraction=75' 70 | '-XX:+CMSClassUnloadingEnabled' 71 | '-XX:OnOutOfMemoryError=kill -9 %p' 72 | '-XX:-OmitStackTraceInFastThrow' 73 | '-Djava.net.preferIPv4Stack=true' 74 | "-Daccumulo.native.lib.path=${lib}/native") 75 | 76 | ## Make sure Accumulo native libraries are built since they are enabled by default 77 | "${bin}"/accumulo-util build-native &> /dev/null 78 | 79 | ## JVM options set for individual applications 80 | case "$cmd" in 81 | master) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx512m' '-Xms512m') ;; 82 | monitor) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;; 83 | gc) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms256m') ;; 84 | tserver) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx3G' '-Xms3G') ;; 85 | *) JAVA_OPTS=("${JAVA_OPTS[@]}" '-Xmx256m' '-Xms64m') ;; 86 | esac 87 | 88 | ## JVM options set for logging. Review logj4 properties files to see how they are used. 89 | JAVA_OPTS=("${JAVA_OPTS[@]}" 90 | "-Daccumulo.log.dir=${ACCUMULO_LOG_DIR}" 91 | "-Daccumulo.application=${cmd}${ACCUMULO_SERVICE_INSTANCE}_$(hostname)") 92 | 93 | case "$cmd" in 94 | monitor) 95 | JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-monitor.properties") 96 | ;; 97 | gc|master|tserver|tracer) 98 | JAVA_OPTS=("${JAVA_OPTS[@]}" "-Dlog4j.configuration=log4j-service.properties") 99 | ;; 100 | *) 101 | # let log4j use its default behavior (log4j.xml, log4j.properties) 102 | true 103 | ;; 104 | esac 105 | 106 | export JAVA_OPTS 107 | 108 | ############################ 109 | # Variables set to a default 110 | ############################ 111 | 112 | export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-1} 113 | ## Add Hadoop native libraries to shared library paths given operating system 114 | case "$(uname)" in 115 | Darwin) export DYLD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${DYLD_LIBRARY_PATH}" ;; 116 | *) export LD_LIBRARY_PATH="${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}" ;; 117 | esac 118 | 119 | ############################################### 120 | # Variables that are optional. Uncomment to set 121 | ############################################### 122 | 123 | ## Specifies command that will be placed before calls to Java in accumulo script 124 | # export ACCUMULO_JAVA_PREFIX="" 125 | -------------------------------------------------------------------------------- /templates/accumulo/accumulo-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 18 | 19 | 20 | 21 | 23 | 24 | 25 | instance.zookeeper.host 26 | CLOUD_LOCAL_HOSTNAME:2181 27 | comma separated list of zookeeper servers 28 | 29 | 30 | 31 | monitor.port.client 32 | 9995 33 | 34 | 35 | 36 | monitor.port.log4j 37 | 4560 38 | 39 | 40 | 41 | tserver.port.client 42 | 9997 43 | 44 | 45 | 46 | gc.port.client 47 | 9998 48 | 49 | 50 | 51 | master.port.client 52 | 9999 53 | 54 | 55 | 56 | master.replication.coordinator.port 57 | 10001 58 | 59 | 60 | 61 | replication.receipt.service.port 62 | 10002 63 | 64 | 65 | 66 | trace.port.client 67 | 12234 68 | 69 | 70 | 71 | instance.volumes 72 | hdfs://CLOUD_LOCAL_HOSTNAME:9000/accumulo 73 | 74 | 75 | 76 | instance.secret 77 | DEFAULT 78 | A secret unique to a given instance that all servers must know in order to communicate with one another. 79 | Change it before initialization. 
To 80 | change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], 81 | and then update this file. 82 | 83 | 84 | 85 | 86 | tserver.memory.maps.max 87 | 1G 88 | 89 | 90 | 91 | tserver.memory.maps.native.enabled 92 | false 93 | 94 | 95 | 96 | tserver.cache.data.size 97 | 128M 98 | 99 | 100 | 101 | tserver.cache.index.size 102 | 128M 103 | 104 | 105 | 106 | trace.token.property.password 107 | 108 | secret 109 | 110 | 111 | 112 | trace.user 113 | root 114 | 115 | 116 | 117 | tserver.sort.buffer.size 118 | 200M 119 | 120 | 121 | 122 | tserver.walog.max.size 123 | 1G 124 | 125 | 126 | 127 | -------------------------------------------------------------------------------- /templates/hadoop/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | fs.defaultFS 22 | hdfs://CLOUD_LOCAL_HOSTNAME:9000 23 | 24 | 25 | 26 | 27 | fs.default.name 28 | hdfs://CLOUD_LOCAL_HOSTNAME:9000 29 | 30 | 31 | 32 | hadoop.tmp.dir 33 | LOCAL_CLOUD_PREFIX/data/hadoop/tmp 34 | 35 | 36 | 37 | 38 | -------------------------------------------------------------------------------- /templates/hadoop/hdfs-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 16 | 17 | 18 | 19 | 20 | 21 | dfs.replication 22 | 1 23 | 24 | 25 | 26 | dfs.datanode.synconclose 27 | true 28 | 29 | 30 | 31 | dfs.name.dir 32 | LOCAL_CLOUD_PREFIX/data/dfs/name 33 | 34 | 35 | 36 | dfs.data.dir 37 | LOCAL_CLOUD_PREFIX/data/dfs/data 38 | 39 | 40 | 41 | dfs.datanode.address 42 | CLOUD_LOCAL_BIND_ADDRESS:50010 43 | 44 | 45 | 46 | dfs.datanode.ipc.address 47 | CLOUD_LOCAL_BIND_ADDRESS:50020 48 | 49 | 50 | 51 | dfs.datanode.http.address 52 | CLOUD_LOCAL_BIND_ADDRESS:50075 53 | 54 | 55 | 56 | dfs.datanode.https.address 57 | CLOUD_LOCAL_BIND_ADDRESS:50475 58 | 59 | 60 | 61 | dfs.namenode.http-address 62 | CLOUD_LOCAL_BIND_ADDRESS:50070 63 | 64 | 65 | 66 | dfs.namenode.https-address 67 | CLOUD_LOCAL_BIND_ADDRESS:50470 68 | 69 | 70 | 71 | dfs.namenode.secondary.http-address 72 | CLOUD_LOCAL_BIND_ADDRESS:50090 73 | 74 | 75 | 76 | -------------------------------------------------------------------------------- /templates/hadoop/mapred-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | mapreduce.framework.name 6 | yarn 7 | 8 | 9 | 10 | mapreduce.shuffle.port 11 | 13562 12 | 13 | 14 | 15 | mapreduce.jobhistory.address 16 | CLOUD_LOCAL_BIND_ADDRESS:10020 17 | 18 | 19 | 20 | mapreduce.jobhistory.webapp.address 21 | CLOUD_LOCAL_BIND_ADDRESS:19888 22 | 23 | 24 | 25 | mapreduce.jobtracker.http.address 26 | CLOUD_LOCAL_BIND_ADDRESS:50030 27 | 28 | 29 | 30 | mapreduce.tasktracker.http.address 31 | CLOUD_LOCAL_BIND_ADDRESS:50050 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /templates/hadoop/yarn-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 15 | 16 | 17 | 18 | yarn.nodemanager.aux-services 19 | mapreduce_shuffle 20 | 21 | 22 | 23 | yarn.nodemanager.local-dirs 24 | LOCAL_CLOUD_PREFIX/data/yarn 25 | 26 | 27 | 28 | yarn.nodemanager.vmem-check-enabled 29 | false 30 | 31 | 32 | 33 | yarn.resourcemanager.webapp.address 34 | CLOUD_LOCAL_BIND_ADDRESS:8088 35 | 36 | 37 | 38 | yarn.resourcemanager.scheduler.address 39 | CLOUD_LOCAL_BIND_ADDRESS:8030 40 | 41 | 42 | 43 | yarn.resourcemanager.resource-tracker.address 44 | 
CLOUD_LOCAL_BIND_ADDRESS:8031 45 | 46 | 47 | 48 | yarn.resourcemanager.address 49 | CLOUD_LOCAL_BIND_ADDRESS:8032 50 | 51 | 52 | 53 | yarn.resourcemanager.admin.address 54 | CLOUD_LOCAL_BIND_ADDRESS:8033 55 | 56 | 57 | 58 | yarn.resourcemanager.webapp.https.address 59 | CLOUD_LOCAL_BIND_ADDRESS:8090 60 | 61 | 62 | 63 | yarn.nodemanager.localizer.address 64 | CLOUD_LOCAL_BIND_ADDRESS:8040 65 | 66 | 67 | 68 | yarn.nodemanager.webapp.address 69 | CLOUD_LOCAL_BIND_ADDRESS:8042 70 | 71 | 72 | 73 | -------------------------------------------------------------------------------- /templates/hbase/hbase-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 23 | 24 | 25 | hbase.cluster.distributed 26 | true 27 | 28 | 29 | hbase.rootdir 30 | hdfs://CLOUD_LOCAL_HOSTNAME:9000/hbase 31 | 32 | 33 | 34 | 35 | hbase.master.port 36 | 16000 37 | 38 | 39 | hbase.master.info.port 40 | 16010 41 | 42 | 43 | hbase.master.info.bindAddress 44 | 0.0.0.0 45 | 46 | 47 | 48 | 49 | hbase.regionserver.port 50 | 16020 51 | 52 | 53 | hbase.regionserver.info.port 54 | 16030 55 | 56 | 57 | hbase.regionserver.info.bindAddress 58 | 0.0.0.0 59 | 60 | 61 | 62 | 63 | hbase.rest.port 64 | 16040 65 | 66 | 67 | 68 | 69 | hbase.status.multicast.address.port 70 | 16100 71 | 72 | 73 | 74 | hbase.zookeeper.quorum 75 | CLOUD_LOCAL_HOSTNAME 76 | 77 | 78 | hbase.zookeeper.property.clientPort 79 | 2181 80 | 81 | 82 | -------------------------------------------------------------------------------- /templates/kafka/server.properties: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one or more 2 | # contributor license agreements. See the NOTICE file distributed with 3 | # this work for additional information regarding copyright ownership. 4 | # The ASF licenses this file to You under the Apache License, Version 2.0 5 | # (the "License"); you may not use this file except in compliance with 6 | # the License. You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | # see kafka.server.KafkaConfig for additional details and defaults 16 | 17 | ############################# Server Basics ############################# 18 | 19 | # The id of the broker. This must be set to a unique integer for each broker. 20 | broker.id=0 21 | 22 | ############################# Socket Server Settings ############################# 23 | 24 | # CL_port_default 9092 25 | listeners=PLAINTEXT://CLOUD_LOCAL_HOSTNAME:9092 26 | 27 | # The port the socket server listens on 28 | #port=9092 29 | 30 | # Hostname the broker will bind to. If not set, the server will bind to all interfaces 31 | #host.name=localhost 32 | 33 | # Hostname the broker will advertise to producers and consumers. If not set, it uses the 34 | # value for "host.name" if configured. Otherwise, it will use the value returned from 35 | # java.net.InetAddress.getCanonicalHostName(). 36 | #advertised.host.name= 37 | 38 | # The port to publish to ZooKeeper for clients to use. If this is not set, 39 | # it will publish the same port that the broker binds to. 
40 | #advertised.port= 41 | 42 | # The number of threads handling network requests 43 | num.network.threads=3 44 | 45 | # The number of threads doing disk I/O 46 | num.io.threads=8 47 | 48 | # The send buffer (SO_SNDBUF) used by the socket server 49 | socket.send.buffer.bytes=102400 50 | 51 | # The receive buffer (SO_RCVBUF) used by the socket server 52 | socket.receive.buffer.bytes=102400 53 | 54 | # The maximum size of a request that the socket server will accept (protection against OOM) 55 | socket.request.max.bytes=104857600 56 | 57 | 58 | ############################# Log Basics ############################# 59 | 60 | # A comma separated list of directories under which to store log files 61 | log.dirs=LOCAL_CLOUD_PREFIX/data/kafka-logs 62 | 63 | # The default number of log partitions per topic. More partitions allow greater 64 | # parallelism for consumption, but this will also result in more files across 65 | # the brokers. 66 | num.partitions=1 67 | 68 | # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown. 69 | # This value is recommended to be increased for installations with data dirs located in a RAID array. 70 | num.recovery.threads.per.data.dir=1 71 | 72 | ############################# Log Flush Policy ############################# 73 | 74 | # Messages are immediately written to the filesystem but by default we only fsync() to sync 75 | # the OS cache lazily. The following configurations control the flush of data to disk. 76 | # There are a few important trade-offs here: 77 | # 1. Durability: Unflushed data may be lost if you are not using replication. 78 | # 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush. 79 | # 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks. 80 | # The settings below allow one to configure the flush policy to flush data after a period of time or 81 | # every N messages (or both). This can be done globally and overridden on a per-topic basis. 82 | 83 | # The number of messages to accept before forcing a flush of data to disk 84 | #log.flush.interval.messages=10000 85 | 86 | # The maximum amount of time a message can sit in a log before we force a flush 87 | #log.flush.interval.ms=1000 88 | 89 | ############################# Log Retention Policy ############################# 90 | 91 | # The following configurations control the disposal of log segments. The policy can 92 | # be set to delete segments after a period of time, or after a given size has accumulated. 93 | # A segment will be deleted whenever *either* of these criteria is met. Deletion always happens 94 | # from the end of the log. 95 | 96 | # The minimum age of a log file to be eligible for deletion 97 | log.retention.hours=168 98 | 99 | # A size-based retention policy for logs. Segments are pruned from the log as long as the remaining 100 | # segments don't drop below log.retention.bytes. 101 | #log.retention.bytes=1073741824 102 | 103 | # The maximum size of a log segment file. When this size is reached a new log segment will be created.
104 | log.segment.bytes=1073741824 105 | 106 | # The interval at which log segments are checked to see if they can be deleted according 107 | # to the retention policies 108 | log.retention.check.interval.ms=300000 109 | 110 | # By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires. 111 | # If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction. 112 | log.cleaner.enable=false 113 | 114 | ############################# Zookeeper ############################# 115 | 116 | # Zookeeper connection string (see zookeeper docs for details). 117 | # This is a comma separated host:port pairs, each corresponding to a zk 118 | # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002". 119 | # You can also append an optional chroot string to the urls to specify the 120 | # root directory for all kafka znodes. 121 | # CL_port_default 2181 122 | zookeeper.connect=CLOUD_LOCAL_HOSTNAME:2181 123 | 124 | # Timeout in ms for connecting to zookeeper 125 | zookeeper.connection.timeout.ms=6000 126 | -------------------------------------------------------------------------------- /templates/zeppelin/zeppelin-env.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one or more 4 | # contributor license agreements. See the NOTICE file distributed with 5 | # this work for additional information regarding copyright ownership. 6 | # The ASF licenses this file to You under the Apache License, Version 2.0 7 | # (the "License"); you may not use this file except in compliance with 8 | # the License. You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | 19 | # export JAVA_HOME= 20 | # export MASTER= # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode. 21 | # export ZEPPELIN_JAVA_OPTS # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16" 22 | # export ZEPPELIN_MEM # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m 23 | # export ZEPPELIN_INTP_MEM # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxPermSize=512m 24 | # export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options. 25 | export ZEPPELIN_PORT=5771 # Zeppelin server port. Defaults to 8080. 26 | # export ZEPPELIN_SSL_PORT # ssl port (used when ssl environment variable is set to true) 27 | 28 | # export ZEPPELIN_LOG_DIR # Where log files are stored. PWD by default. 29 | # export ZEPPELIN_PID_DIR # The pid files are stored. ${ZEPPELIN_HOME}/run by default. 30 | # export ZEPPELIN_WAR_TEMPDIR # The location of jetty temporary directory. 31 | # export ZEPPELIN_NOTEBOOK_DIR # Where notebook saved 32 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z 33 | # export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE # hide homescreen notebook from list when this value set to "true". 
default "false" 34 | # export ZEPPELIN_NOTEBOOK_S3_BUCKET # Bucket where notebook saved 35 | # export ZEPPELIN_NOTEBOOK_S3_ENDPOINT # Endpoint of the bucket 36 | # export ZEPPELIN_NOTEBOOK_S3_USER # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json 37 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID # AWS KMS key ID 38 | # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION # AWS KMS key region 39 | # export ZEPPELIN_IDENT_STRING # A string representing this instance of zeppelin. $USER by default. 40 | # export ZEPPELIN_NICENESS # The scheduling priority for daemons. Defaults to 0. 41 | # export ZEPPELIN_INTERPRETER_LOCALREPO # Local repository for interpreter's additional dependency loading 42 | # export ZEPPELIN_NOTEBOOK_STORAGE # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote). 43 | # export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC # If there are multiple notebook storages, should we treat the first one as the only source of truth? 44 | # export ZEPPELIN_NOTEBOOK_PUBLIC # Make notebook public by default when created, private otherwise 45 | 46 | #### Spark interpreter configuration #### 47 | 48 | ## Use provided spark installation ## 49 | ## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit 50 | ## 51 | # export SPARK_HOME # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries 52 | # export SPARK_SUBMIT_OPTIONS # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G". 53 | # export SPARK_APP_NAME # (optional) The name of spark application. 54 | 55 | ## Use embedded spark binaries ## 56 | ## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries. 57 | ## however, it is not encouraged when you can define SPARK_HOME 58 | ## 59 | # Options read in YARN client mode 60 | # export HADOOP_CONF_DIR # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR. 61 | # Pyspark (supported with Spark 1.2.1 and above) 62 | # To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI 63 | # export PYSPARK_PYTHON # path to the python command. must be the same path on the driver(Zeppelin) and all workers. 64 | # export PYTHONPATH 65 | 66 | ## Spark interpreter options ## 67 | ## 68 | # export ZEPPELIN_SPARK_USEHIVECONTEXT # Use HiveContext instead of SQLContext if set true. true by default. 69 | # export ZEPPELIN_SPARK_CONCURRENTSQL # Execute multiple SQL concurrently if set true. false by default. 70 | # export ZEPPELIN_SPARK_IMPORTIMPLICIT # Import implicits, UDF collection, and sql if set true. true by default. 71 | # export ZEPPELIN_SPARK_MAXRESULT # Max number of Spark SQL result to display. 1000 by default. 72 | # export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE # Size in characters of the maximum text message to be received by websocket. 
Defaults to 1024000 73 | 74 | 75 | #### HBase interpreter configuration #### 76 | 77 | ## To connect to HBase running on a cluster, either HBASE_HOME or HBASE_CONF_DIR must be set 78 | 79 | # export HBASE_HOME= # (required) Directory under which the HBase scripts and configuration live 80 | # export HBASE_CONF_DIR= # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml 81 | 82 | #### ZeppelinHub connection configuration #### 83 | # export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use 84 | # export ZEPPELINHUB_API_TOKEN # Refers to the Zeppelin instance token of the user 85 | # export ZEPPELINHUB_USER_KEY # Optional, when using Zeppelin with authentication. 86 | 87 | #### Zeppelin impersonation configuration 88 | # export ZEPPELIN_IMPERSONATE_CMD # Optional, when a user wants to run the interpreter as the end web user. eg) 'sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c ' 89 | # export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER # Optional, true by default; can be set to false if you don't want to use the --proxy-user option with the Spark interpreter when impersonation is enabled 90 | -------------------------------------------------------------------------------- /templates/zookeeper/zoo.cfg: -------------------------------------------------------------------------------- 1 | # The number of milliseconds of each tick 2 | tickTime=2000 3 | # The number of ticks that the initial 4 | # synchronization phase can take 5 | initLimit=10 6 | # The number of ticks that can pass between 7 | # sending a request and getting an acknowledgement 8 | syncLimit=5 9 | # the directory where the snapshot is stored. 10 | # do not use /tmp for storage, /tmp here is just 11 | # for example's sake. 12 | dataDir=LOCAL_CLOUD_PREFIX/data/zookeeper 13 | # the port at which the clients will connect 14 | clientPort=2181 15 | # the maximum number of client connections. 16 | # increase this if you need to handle more clients 17 | #maxClientCnxns=60 18 | # 19 | # Be sure to read the maintenance section of the 20 | # administrator guide before turning on autopurge. 21 | # 22 | # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance 23 | # 24 | # The number of snapshots to retain in dataDir 25 | #autopurge.snapRetainCount=3 26 | # Purge task interval in hours 27 | # Set to "0" to disable auto purge feature 28 | #autopurge.purgeInterval=1 29 | --------------------------------------------------------------------------------
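Note on the templates above: the tokens CLOUD_LOCAL_HOSTNAME, CLOUD_LOCAL_BIND_ADDRESS, and LOCAL_CLOUD_PREFIX are placeholders that get replaced with real values when cloud-local stages the configs into the unpacked distributions. The actual substitution and port-offset logic lives in bin/config.sh and bin/cloud-local.sh, which are not reproduced here; the sketch below is only an illustration of the idea. The render_template and offset_port helpers, the default values, and the destination path are assumptions made for the example, not code from this repository.

    #!/usr/bin/env bash
    # Illustrative sketch only -- not the repository's implementation.
    # Helper names, defaults, and destination paths below are assumed.

    CLOUD_HOME="${CLOUD_HOME:-$(pwd)}"              # root of the cloud-local checkout
    CL_HOSTNAME="${CL_HOSTNAME:-localhost}"         # hostname override (default assumed)
    CL_BIND_ADDRESS="${CL_BIND_ADDRESS:-127.0.0.1}" # bind address override (default assumed)
    CL_PORT_OFFSET="${CL_PORT_OFFSET:-0}"           # shift the whole port space

    # Apply the port offset to a service's default port (assumed helper),
    # e.g. to the values flagged by the CL_port_default markers in the templates.
    offset_port() {
      echo $(( $1 + CL_PORT_OFFSET ))
    }

    # Fill in the placeholder tokens of one template file (assumed helper).
    render_template() {
      local src="$1" dest="$2"
      sed -e "s|CLOUD_LOCAL_HOSTNAME|${CL_HOSTNAME}|g" \
          -e "s|CLOUD_LOCAL_BIND_ADDRESS|${CL_BIND_ADDRESS}|g" \
          -e "s|LOCAL_CLOUD_PREFIX|${CLOUD_HOME}|g" \
          "$src" > "$dest"
    }

    # Example usage (destination path assumed):
    render_template "${CLOUD_HOME}/templates/zookeeper/zoo.cfg" \
                    "${CLOUD_HOME}/zookeeper/conf/zoo.cfg"
    echo "zookeeper client port: $(offset_port 2181)"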