├── README.md
├── build-image.sh
├── hadoop-base
│   ├── Dockerfile
│   └── files
│       ├── bashrc
│       ├── hadoop-env.sh
│       └── ssh_config
├── hadoop-dnsmasq
│   ├── Dockerfile
│   ├── dnsmasq
│   │   ├── dnsmasq.conf
│   │   └── resolv.dnsmasq.conf
│   ├── handlers
│   │   ├── member-failed
│   │   ├── member-join
│   │   └── member-leave
│   └── serf
│       ├── event-router.sh
│       ├── serf-config.json
│       └── start-serf-agent.sh
├── hadoop-master
│   ├── Dockerfile
│   └── files
│       ├── hadoop
│       │   ├── core-site.xml
│       │   ├── hdfs-site.xml
│       │   ├── mapred-site.xml
│       │   └── yarn-site.xml
│       └── init
│           ├── configure-members.sh
│           ├── run-wordcount.sh
│           ├── start-hadoop.sh
│           ├── start-ssh-serf.sh
│           └── stop-hadoop.sh
├── hadoop-slave
│   ├── Dockerfile
│   └── files
│       └── hadoop
│           ├── core-site.xml
│           ├── hdfs-site.xml
│           ├── mapred-site.xml
│           ├── start-ssh-serf.sh
│           └── yarn-site.xml
├── resize-cluster.sh
└── start-container.sh

/README.md:
--------------------------------------------------------------------------------
1 | # hadoop-docker
2 | 
3 | Quickly build an arbitrary-size Hadoop cluster based on Docker
4 | ------
5 | The core of this project is based on the [kiwenlau](https://github.com/kiwenlau) and [Serf](https://github.com/jai11/docker-serf) Docker files. The Hadoop version has been upgraded and its configuration partly rewritten; in addition, HBase support has been added. [Debian wheezy minimalistic](https://hub.docker.com/r/philcryer/min-wheezy/) is used as the UNIX base system instead of Ubuntu. Hadoop is set up as a fully distributed cluster with YARN. The size of the Docker images has been reduced, although there is still room for optimization: the [squash utility](https://github.com/jwilder/docker-squash) saved only about 30 MB and is not used because it discards information about the Docker image layers.
6 | 
7 | Tip: see other Hadoop-based projects in the [krejcmat](https://github.com/krejcmat) repositories.
8 | 
9 | ######Product versions
10 | | system          | version       |
11 | | ----------------|:-------------:|
12 | | Hadoop          | 2.7.3         |
13 | | Java            | JDK 8u111-b14 |
14 | | Serf            | 0.7.0         |
15 | 
16 | 
17 | ######Project file structure
18 | ```
19 | $ tree
20 | .
21 | ├── build-image.sh
22 | ├── hadoop-base
23 | │   ├── Dockerfile
24 | │   └── files
25 | │       ├── bashrc
26 | │       ├── hadoop-env.sh
27 | │       └── ssh_config
28 | ├── hadoop-dnsmasq
29 | │   ├── dnsmasq
30 | │   │   ├── dnsmasq.conf
31 | │   │   └── resolv.dnsmasq.conf
32 | │   ├── Dockerfile
33 | │   ├── handlers
34 | │   │   ├── member-failed
35 | │   │   ├── member-join
36 | │   │   └── member-leave
37 | │   └── serf
38 | │       ├── event-router.sh
39 | │       ├── serf-config.json
40 | │       └── start-serf-agent.sh
41 | ├── hadoop-master
42 | │   ├── Dockerfile
43 | │   └── files
44 | │       ├── hadoop
45 | │       │   ├── core-site.xml
46 | │       │   ├── hdfs-site.xml
47 | │       │   ├── mapred-site.xml
48 | │       │   └── yarn-site.xml
49 | │       └── init
50 | │           ├── configure-members.sh
51 | │           ├── run-wordcount.sh
52 | │           ├── start-hadoop.sh
53 | │           ├── start-ssh-serf.sh
54 | │           └── stop-hadoop.sh
55 | ├── hadoop-slave
56 | │   ├── Dockerfile
57 | │   └── files
58 | │       └── hadoop
59 | │           ├── core-site.xml
60 | │           ├── hdfs-site.xml
61 | │           ├── mapred-site.xml
62 | │           ├── start-ssh-serf.sh
63 | │           └── yarn-site.xml
64 | ├── README.md
65 | ├── resize-cluster.sh
66 | └── start-container.sh
67 | ```
68 | 
69 | ###Usage
70 | ####1] Clone the git repository
71 | ```
72 | $ git clone https://github.com/krejcmat/hadoop-docker.git
73 | $ cd hadoop-docker
74 | ```
75 | 
76 | ####2] Get the Docker images
77 | There are two ways to get the images: pull them directly from the official Docker Hub repository, or build them from the Dockerfiles and source files (see the Dockerfile in each hadoop-* directory). Builds on Docker Hub are created automatically by a pull trigger or a GitHub trigger whenever the Dockerfiles are updated; the triggers are set up for the tag `latest`. The stable version is published as krejcmat/hadoop-<image>:0.1, while krejcmat/hadoop-<image>:latest is built on Docker Hub from the master branch on GitHub.
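If you prefer the pinned images over `latest`, a sketch of the equivalent workflow with the 0.1 tag mentioned above (assuming that tag is still published on Docker Hub) is to pull it and later pass the same tag to start-container.sh:
```
$ docker pull krejcmat/hadoop-master:0.1
$ docker pull krejcmat/hadoop-slave:0.1
$ ./start-container.sh 0.1 2
```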
78 | 
79 | ######a) Download from Docker Hub
80 | ```
81 | $ docker pull krejcmat/hadoop-master:latest
82 | $ docker pull krejcmat/hadoop-slave:latest
83 | ```
84 | 
85 | ######b) Build from sources (Dockerfiles)
86 | The first argument of the build script must be the directory that contains the Dockerfile. The tag used for builds from sources is **latest**.
87 | ```
88 | $ ./build-image.sh hadoop-dnsmasq
89 | ```
90 | 
91 | ######Check the images
92 | ```
93 | $ docker images
94 | 
95 | REPOSITORY                TAG      IMAGE ID       CREATED          SIZE
96 | krejcmat/hadoop-slave     latest   81cddf669d42   42 minutes ago   670.9 MB
97 | krejcmat/hadoop-master    latest   ed91c813b86f   42 minutes ago   670.9 MB
98 | krejcmat/hadoop-base      latest   cae006d1c427   50 minutes ago   670.9 MB
99 | krejcmat/hadoop-dnsmasq   latest   89f0052d964c   53 minutes ago   156.9 MB
100 | philcryer/min-wheezy      latest   214c501b67fa   14 months ago    50.74 MB
101 | 
102 | 
103 | ```
104 | Temporary images:
105 | philcryer/min-wheezy, krejcmat/hadoop-dnsmasq and krejcmat/hadoop-base are needed only as intermediate build stages. To remove them, pass their image IDs (as shown by `docker images`) to:
106 | ```
107 | $ docker rmi 214c501b67fa 89f0052d964c cae006d1c427
108 | ```
109 | 
110 | 
111 | ####3] Initialize Hadoop (master and slaves)
112 | ######a) Run the containers
113 | The first parameter of the start-container.sh script is the image version tag, the second is the number of nodes.
114 | ```
115 | $ ./start-container.sh latest 2
116 | 
117 | start master container...
118 | start slave1 container...
119 | ```
120 | 
121 | #####Check status
122 | ######Check the cluster members
123 | ```
124 | $ serf members
125 | 
126 | master.krejcmat.com  172.17.0.2:7946  alive
127 | slave1.krejcmat.com  172.17.0.3:7946  alive
128 | ```
129 | 
130 | 
131 | #####b) Run the Hadoop cluster
132 | ######Create the configuration files for Hadoop and HBase (including ZooKeeper)
133 | ```
134 | $ cd ~
135 | $ ./configure-members.sh
136 | 
137 | Warning: Permanently added 'slave1.krejcmat.com,172.17.0.3' (ECDSA) to the list of known hosts.slaves          100%   40    0.0KB/s   00:00
138 | Warning: Permanently added 'slave1.krejcmat.com,172.17.0.3' (ECDSA) to the list of known hosts.slaves          100%   40    0.0KB/s   00:00
139 | Warning: Permanently added 'slave1.krejcmat.com,172.17.0.3' (ECDSA) to the list of known hosts.hbase-site.xml  100% 1730    1.7KB/s   00:00
140 | Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.slaves          100%   40    0.0KB/s   00:00
141 | Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.slaves          100%   40    0.0KB/s   00:00
142 | Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.hbase-site.xml  100% 1730    1.7KB/s   00:00
143 | ```
144 | 
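If you want to verify what configure-members.sh generated before starting the daemons, print the slaves file it distributes to every member (a sketch — the target path comes from the $HADOOP_CONF_DIR variable used inside the script):
```
$ cat $HADOOP_CONF_DIR/slaves
```
Each member reported as alive by `serf members` should appear on its own line.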
145 | ######Start Hadoop
146 | ```
147 | $ ./start-hadoop.sh
148 | # to stop Hadoop later, run ./stop-hadoop.sh
149 | 
150 | Starting namenodes on [master.krejcmat.com]
151 | master.krejcmat.com: Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.
152 | master.krejcmat.com: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.krejcmat.com.out
153 | slave1.krejcmat.com: Warning: Permanently added 'slave1.krejcmat.com,172.17.0.3' (ECDSA) to the list of known hosts.
154 | master.krejcmat.com: Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.
155 | slave1.krejcmat.com: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-slave1.krejcmat.com.out
156 | master.krejcmat.com: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-master.krejcmat.com.out
157 | Starting secondary namenodes [0.0.0.0]
158 | 0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
159 | 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-master.krejcmat.com.out
160 | 
161 | starting yarn daemons
162 | starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-master.krejcmat.com.out
163 | master.krejcmat.com: Warning: Permanently added 'master.krejcmat.com,172.17.0.2' (ECDSA) to the list of known hosts.
164 | slave1.krejcmat.com: Warning: Permanently added 'slave1.krejcmat.com,172.17.0.3' (ECDSA) to the list of known hosts.
165 | slave1.krejcmat.com: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-slave1.krejcmat.com.out
166 | master.krejcmat.com: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-master.krejcmat.com.out
167 | ```
168 | 
169 | ######Print the Java processes
170 | ```
171 | $ jps
172 | 
173 | 342 NameNode
174 | 460 DataNode
175 | 1156 Jps
176 | 615 SecondaryNameNode
177 | 769 ResourceManager
178 | 862 NodeManager
179 | ```
180 | 
181 | ######Print the status of the Hadoop cluster
182 | ```
183 | $ hdfs dfsadmin -report
184 | 
185 | Name: 172.17.0.2:50010 (master.krejcmat.com)
186 | Hostname: master.krejcmat.com
187 | Decommission Status : Normal
188 | Configured Capacity: 98293264384 (91.54 GB)
189 | DFS Used: 24576 (24 KB)
190 | Non DFS Used: 77983322112 (72.63 GB)
191 | DFS Remaining: 20309917696 (18.92 GB)
192 | DFS Used%: 0.00%
193 | DFS Remaining%: 20.66%
194 | Configured Cache Capacity: 0 (0 B)
195 | Cache Used: 0 (0 B)
196 | Cache Remaining: 0 (0 B)
197 | Cache Used%: 100.00%
198 | Cache Remaining%: 0.00%
199 | Xceivers: 1
200 | Last contact: Wed Feb 03 16:09:14 UTC 2016
201 | 
202 | Name: 172.17.0.3:50010 (slave1.krejcmat.com)
203 | Hostname: slave1.krejcmat.com
204 | Decommission Status : Normal
205 | Configured Capacity: 98293264384 (91.54 GB)
206 | DFS Used: 24576 (24 KB)
207 | Non DFS Used: 77983322112 (72.63 GB)
208 | DFS Remaining: 20309917696 (18.92 GB)
209 | DFS Used%: 0.00%
210 | DFS Remaining%: 20.66%
211 | Configured Cache Capacity: 0 (0 B)
212 | Cache Used: 0 (0 B)
213 | Cache Remaining: 0 (0 B)
214 | Cache Used%: 100.00%
215 | Cache Remaining%: 0.00%
216 | Xceivers: 1
217 | Last contact: Wed Feb 03 16:09:14 UTC 2016
218 | ```
219 | 
220 | ####4] Control the cluster from the web UI
221 | ######Overview of web UI ports
222 | | web UI           | port       |
223 | | ---------------- |:----------:|
224 | | Hadoop namenode  | 50070      |
225 | | Hadoop cluster   | 8088       |
226 | 
227 | Assuming the master container's IP address is 172.17.0.2 (see the `serf members` output above), open a UI in a browser, e.g. the namenode UI:
228 | 
229 | ```
230 | $ xdg-open http://172.17.0.2:50070/
231 | ```
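If you do not remember the master's address, you can ask Docker for it directly — this is the same `docker inspect` call that start-container.sh uses internally to pass the master's IP to the slaves:
```
$ docker inspect --format="{{.NetworkSettings.IPAddress}}" master
```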
232 | ######Direct access from a container (not implemented)
233 | The Linux distribution used here is installed without a graphical UI. The easiest approach is to switch to another distribution by modifying the Dockerfile of hadoop-dnsmasq and rebuilding the images. In that case the start-container.sh script must also be modified: on the line where the master container is created, add the parameters for [X forwarding](http://wiki.ros.org/docker/Tutorials/GUI).
234 | 
235 | ###Documentation
236 | ####hadoop-dnsmasq
237 | The base image for all the others. The dnsmasq Dockerfile builds an image based on Debian wheezy minimalistic and [Serf](https://www.serfdom.io/), a solution for cluster membership. Serf is also a workaround for **/etc/hosts** being read-only inside Docker containers: when a container instance is started, the hostname and DNS server are passed as `docker run -h <hostname> --dns <dns-ip>`. The advantage of using **Serf** is that it handles cluster membership events such as nodes joining, leaving and failing. The configuration scripts are taken from [Docker container Serf/Dnsmasq](https://github.com/jai11/docker-serf).
238 | 
239 | 
240 | ###Sources & references
241 | 
242 | ######Configuration
243 | [Hadoop YARN installation guide](http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/)
244 | 
245 | [HBase reference manual](https://hbase.apache.org/book.html)
246 | 
247 | ######Docker
248 | [Docker cheat sheet](https://github.com/wsargent/docker-cheat-sheet)
249 | 
250 | [How to make Docker images smaller](http://jasonwilder.com/blog/2014/08/19/squashing-docker-images/)
251 | 
252 | ######Serf
253 | [Serf: a tool for cluster membership](https://www.serfdom.io/intro/)
254 | 
255 | [Serf/Docker presentation from Hadoop Summit 2014](http://www.slideshare.net/JanosMatyas/docker-based-hadoop-provisioning)
256 | 
257 | [Docker Serf/Dnsmasq](https://github.com/jai11/docker-serf)
258 | 
259 | 
260 | ###Some notes and answers
261 | ######Region server vs. datanode
262 | DataNodes store data. RegionServers essentially buffer I/O operations; the data is permanently stored on HDFS (that is, on the DataNodes). Putting a RegionServer on the 'master' node is therefore not a good idea.
263 | 
264 | Here is a simplified picture of how regions are managed:
265 | 
266 | You have a cluster running HDFS (NameNode + DataNodes) with a replication factor of 3 (each HDFS block is copied to 3 different DataNodes).
267 | 
268 | You run RegionServers on the same servers as the DataNodes. When a write request reaches a RegionServer, it first writes the change into memory and the commit log; at some point it decides it is time to flush the changes to permanent storage on HDFS. This is where data locality comes into play: since the RegionServer and a DataNode run on the same server, the first HDFS block replica of the file is written to that same server, while the two other replicas go to other DataNodes. As a result, the RegionServer serving the region almost always has access to a local copy of the data.
269 | 
270 | What if the RegionServer crashes, or the HBase Master decides to reassign the region to another RegionServer (to keep the cluster balanced)? The new RegionServer is forced to perform remote reads at first, but as soon as a compaction is performed (merging the change log into the data), a new file is written to HDFS by the new RegionServer and a local copy is created on its server (again, because the DataNode and the RegionServer run on the same machine).
271 | 
272 | Note: in the case of a RegionServer crash, the regions previously assigned to it are reassigned to multiple RegionServers.
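To observe this block placement on the running cluster itself, the standard HDFS fsck tool can list every block together with the DataNodes holding its replicas (run on the master container; `/` can be narrowed down to any HDFS path you are interested in):
```
$ hdfs fsck / -files -blocks -locations
```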
273 | -------------------------------------------------------------------------------- /build-image.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | image=$1 4 | tag='latest' 5 | 6 | 7 | if [ $1 = 0 ] 8 | then 9 | echo "Please use image name as the first argument!" 10 | exit 1 11 | fi 12 | 13 | # founction for delete images 14 | function docker_rmi() 15 | { 16 | echo -e "\n\nsudo docker rmi krejcmat/$1:$tag" 17 | sudo docker rmi krejcmat/$1:$tag 18 | } 19 | 20 | 21 | # founction for build images 22 | function docker_build() 23 | { 24 | cd $1 25 | echo -e "\n\nsudo docker build -t krejcmat/$1:$tag ." 26 | /usr/bin/time -f "real %e" sudo docker build -t krejcmat/$1:$tag . 27 | cd .. 28 | } 29 | 30 | echo -e "\ndocker rm -f slave1 slave2 master" 31 | sudo docker rm -f slave1 slave2 master 32 | 33 | sudo docker images >images.txt 34 | 35 | #all image is based on dnsmasq. master and slaves are based on base image. 36 | if [ $image == "hadoop-dnsmasq" ] 37 | then 38 | docker_rmi hadoop-master 39 | docker_rmi hadoop-slave 40 | docker_rmi hadoop-base 41 | docker_rmi hadoop-dnsmasq 42 | docker_build hadoop-dnsmasq 43 | docker_build hadoop-base 44 | docker_build hadoop-master 45 | docker_build hadoop-slave 46 | elif [ $image == "hadoop-base" ] 47 | then 48 | docker_rmi hadoop-master 49 | docker_rmi hadoop-slave 50 | docker_rmi hadoop-base 51 | docker_build hadoop-base 52 | docker_build hadoop-master 53 | docker_build hadoop-slave 54 | elif [ $image == "hadoop-master" ] 55 | then 56 | docker_rmi hadoop-master 57 | docker_build hadoop-master 58 | elif [ $image == "hadoop-slave" ] 59 | then 60 | docker_rmi hadoop-slave 61 | docker_build hadoop-slave 62 | else 63 | echo "The image name is wrong!" 64 | fi 65 | 66 | #docker_rmi hadoop-base 67 | 68 | echo -e "\nimages before build" 69 | cat images.txt 70 | rm images.txt 71 | 72 | echo -e "\nimages after build" 73 | sudo docker images 74 | -------------------------------------------------------------------------------- /hadoop-base/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM sebge2/hadoop-dnsmasq:latest 2 | MAINTAINER sgerard 3 | 4 | # install openssh-server 5 | RUN apt-get update && \ 6 | apt-get install -y curl openssh-server nano && \ 7 | apt-get clean -y && apt-get autoclean -y && apt-get autoremove -y && \ 8 | rm -rf /var/lib/{apt,dpkg,cache,log}/ 9 | 10 | # Java Version 11 | ENV JAVA_VERSION_MAJOR 8 12 | ENV JAVA_VERSION_MINOR 111 13 | ENV JAVA_VERSION_BUILD 14 14 | ENV JAVA_PACKAGE jdk 15 | 16 | # Download and unarchive Java 17 | RUN mkdir -p /opt &&\ 18 | curl -jksSLH "Cookie: oraclelicense=accept-securebackup-cookie"\ 19 | http://download.oracle.com/otn-pub/java/jdk/${JAVA_VERSION_MAJOR}u${JAVA_VERSION_MINOR}-b${JAVA_VERSION_BUILD}/${JAVA_PACKAGE}-${JAVA_VERSION_MAJOR}u${JAVA_VERSION_MINOR}-linux-x64.tar.gz | gunzip -c - | tar -xf - -C /opt &&\ 20 | ln -s /opt/jdk1.${JAVA_VERSION_MAJOR}.0_${JAVA_VERSION_MINOR} /opt/jdk &&\ 21 | rm -rf /opt/jdk/*src.zip \ 22 | /opt/jdk/lib/missioncontrol \ 23 | /opt/jdk/lib/visualvm \ 24 | /opt/jdk/lib/*javafx* \ 25 | /opt/jdk/jre/lib/plugin.jar \ 26 | /opt/jdk/jre/lib/ext/jfxrt.jar \ 27 | /opt/jdk/jre/bin/javaws \ 28 | /opt/jdk/jre/lib/javaws.jar \ 29 | /opt/jdk/jre/lib/desktop \ 30 | /opt/jdk/jre/plugin \ 31 | /opt/jdk/jre/lib/deploy* \ 32 | /opt/jdk/jre/lib/*javafx* \ 33 | /opt/jdk/jre/lib/*jfx* \ 34 | /opt/jdk/jre/lib/amd64/libdecora_sse.so \ 35 | /opt/jdk/jre/lib/amd64/libprism_*.so \ 36 | 
/opt/jdk/jre/lib/amd64/libfxplugins.so \ 37 | /opt/jdk/jre/lib/amd64/libglass.so \ 38 | /opt/jdk/jre/lib/amd64/libgstreamer-lite.so \ 39 | /opt/jdk/jre/lib/amd64/libjavafx*.so \ 40 | /opt/jdk/jre/lib/amd64/libjfx*.so 41 | 42 | 43 | # Set environment 44 | ENV JAVA_HOME /opt/jdk 45 | ENV PATH ${PATH}:${JAVA_HOME}/bin 46 | 47 | # move all configuration files into container 48 | ADD files/* /usr/local/ 49 | 50 | # set jave environment variable 51 | ENV JAVA_HOME /opt/jdk 52 | ENV PATH $PATH:$JAVA_HOME/bin 53 | 54 | #configure ssh free key access 55 | RUN mkdir /var/run/sshd && \ 56 | ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \ 57 | cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \ 58 | mv /usr/local/ssh_config ~/.ssh/config && \ 59 | sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd 60 | 61 | #install hadoop 2.7.3 62 | RUN wget -q -o out.log -P /tmp http://www.trieuvan.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && \ 63 | tar xzf /tmp/hadoop-2.7.3.tar.gz -C /usr/local && \ 64 | rm /tmp/hadoop-2.7.3.tar.gz && \ 65 | mv /usr/local/hadoop-2.7.3 /usr/local/hadoop && \ 66 | mv /usr/local/bashrc ~/.bashrc && \ 67 | mv /usr/local/hadoop-env.sh /usr/local/hadoop/etc/hadoop/hadoop-env.sh 68 | 69 | -------------------------------------------------------------------------------- /hadoop-base/files/bashrc: -------------------------------------------------------------------------------- 1 | export JAVA_HOME=/opt/jdk 2 | export HADOOP_INSTALL=/usr/local/hadoop 3 | export HADOOP_HOME=$HADOOP_INSTALL 4 | export PATH=$PATH:$HADOOP_INSTALL/bin 5 | export PATH=$PATH:$HADOOP_INSTALL/sbin 6 | export HADOOP_MAPRED_HOME=$HADOOP_INSTALL 7 | export HADOOP_COMMON_HOME=$HADOOP_INSTALL 8 | export HADOOP_HDFS_HOME=$HADOOP_INSTALL 9 | export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop 10 | export YARN_HOME=$HADOOP_INSTALL 11 | export YARN_CONF_DIR=$HADOOP_INSTALL/etc/hadoop -------------------------------------------------------------------------------- /hadoop-base/files/hadoop-env.sh: -------------------------------------------------------------------------------- 1 | # Copyright 2011 The Apache Software Foundation 2 | # 3 | # Licensed to the Apache Software Foundation (ASF) under one 4 | # or more contributor license agreements. See the NOTICE file 5 | # distributed with this work for additional information 6 | # regarding copyright ownership. The ASF licenses this file 7 | # to you under the Apache License, Version 2.0 (the 8 | # "License"); you may not use this file except in compliance 9 | # with the License. You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, software 14 | # distributed under the License is distributed on an "AS IS" BASIS, 15 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 | # See the License for the specific language governing permissions and 17 | # limitations under the License. 18 | 19 | # Set Hadoop-specific environment variables here. 20 | 21 | # The only required environment variable is JAVA_HOME. All others are 22 | # optional. When running a distributed configuration it is best to 23 | # set JAVA_HOME in this file, so that it is correctly defined on 24 | # remote nodes. 25 | 26 | # The java implementation to use. 27 | export JAVA_HOME=/opt/jdk 28 | 29 | # The jsvc implementation to use. Jsvc is required to run secure datanodes. 
30 | #export JSVC_HOME=${JSVC_HOME} 31 | 32 | export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} 33 | 34 | # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler. 35 | for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do 36 | if [ "$HADOOP_CLASSPATH" ]; then 37 | export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f 38 | else 39 | export HADOOP_CLASSPATH=$f 40 | fi 41 | done 42 | 43 | # The maximum amount of heap to use, in MB. Default is 1000. 44 | #export HADOOP_HEAPSIZE= 45 | #export HADOOP_NAMENODE_INIT_HEAPSIZE="" 46 | 47 | # Extra Java runtime options. Empty by default. 48 | export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true" 49 | 50 | # Command specific options appended to HADOOP_OPTS when specified 51 | export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" 52 | export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS" 53 | 54 | export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS" 55 | 56 | # The following applies to multiple commands (fs, dfs, fsck, distcp etc) 57 | export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" 58 | #HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS" 59 | 60 | # On secure datanodes, user to run the datanode as after dropping privileges 61 | export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER} 62 | 63 | # Where log files are stored. $HADOOP_HOME/logs by default. 64 | #export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER 65 | 66 | # Where log files are stored in the secure data environment. 67 | export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} 68 | 69 | # The directory where pid files are stored. /tmp by default. 70 | # NOTE: this should be set to a directory that can only be written to by 71 | # the user that will run the hadoop daemons. Otherwise there is the 72 | # potential for a symlink attack. 73 | export HADOOP_PID_DIR=${HADOOP_PID_DIR} 74 | export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR} 75 | 76 | # A string representing this instance of hadoop. $USER by default. 
77 | export HADOOP_IDENT_STRING=$USER 78 | -------------------------------------------------------------------------------- /hadoop-base/files/ssh_config: -------------------------------------------------------------------------------- 1 | Host localhost 2 | StrictHostKeyChecking no 3 | 4 | Host 0.0.0.0 5 | StrictHostKeyChecking no 6 | 7 | Host *.krejcmat.com 8 | StrictHostKeyChecking no 9 | UserKnownHostsFile=/dev/null -------------------------------------------------------------------------------- /hadoop-dnsmasq/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM philcryer/min-wheezy:latest 2 | MAINTAINER sgerard 3 | 4 | # init wheezy docker 5 | RUN echo 'deb http://ftp.cz.debian.org/debian stable main contrib'>> /etc/apt/sources.list && \ 6 | apt-get clean -y && apt-get autoclean -y && apt-get autoremove -y && \ 7 | cp -R /usr/share/locale/en\@* /tmp/ && rm -rf /usr/share/locale/* && mv /tmp/en\@* /usr/share/locale/ && \ 8 | rm -rf /var/cache/debconf/*-old && rm -rf /var/lib/apt/lists/* && rm -rf /usr/share/doc/* && \ 9 | apt-get update -y && \ 10 | echo "`cat /etc/issue.net` Docker Image - philcryer/min-wheezy - `date +'%Y/%m/%d'`" > /etc/motd 11 | 12 | RUN apt-get install -y unzip dnsmasq wget && \ 13 | apt-get clean -y && apt-get autoclean -y && apt-get autoremove -y && \ 14 | rm -rf /var/lib/{apt,dpkg,cache,log}/ && \ 15 | rm -rf /tmp 16 | 17 | # dnsmasq configuration 18 | ADD dnsmasq/* /etc/ 19 | 20 | # install serf 21 | RUN wget -q -o out.log -P /tmp/ https://releases.hashicorp.com/serf/0.7.0/serf_0.7.0_linux_amd64.zip && \ 22 | rm -rf /bin/serf 23 | 24 | RUN unzip /tmp/serf_0.7.0_linux_amd64.zip -d /bin && \ 25 | rm /tmp/serf_0.7.0_linux_amd64.zip 26 | 27 | # configure serf 28 | ENV SERF_CONFIG_DIR /etc/serf 29 | ADD serf/* $SERF_CONFIG_DIR/ 30 | ADD handlers $SERF_CONFIG_DIR/handlers 31 | RUN chmod +x $SERF_CONFIG_DIR/event-router.sh $SERF_CONFIG_DIR/start-serf-agent.sh 32 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/dnsmasq/dnsmasq.conf: -------------------------------------------------------------------------------- 1 | listen-address=127.0.0.1 2 | resolv-file=/etc/resolv.dnsmasq.conf 3 | conf-dir=/etc/dnsmasq.d 4 | user=root 5 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/dnsmasq/resolv.dnsmasq.conf: -------------------------------------------------------------------------------- 1 | # google dns 2 | nameserver 8.8.8.8 3 | nameserver 8.8.4.4 -------------------------------------------------------------------------------- /hadoop-dnsmasq/handlers/member-failed: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #for var in ${!SERF*}; do echo ${var}=${!var};done 4 | 5 | # ask 6 | serf members -status=alive | while read line ;do 7 | NEXT_HOST=$(echo $line|cut -d' ' -f 1) 8 | NEXT_SHORT=${NEXT_HOST%%.*} 9 | NEXT_ADDR=$(echo $line|cut -d' ' -f 2) 10 | NEXT_IP=${NEXT_ADDR%%:*} 11 | echo address=\"/$NEXT_HOST/$NEXT_SHORT/$NEXT_IP\" 12 | IFS='.' 
read -a NEXT_IP_ARRAY <<< "$NEXT_IP" 13 | echo ptr-record=${NEXT_IP_ARRAY[3]}.${NEXT_IP_ARRAY[2]}.${NEXT_IP_ARRAY[1]}.${NEXT_IP_ARRAY[0]}.in-addr.arpa,$NEXT_HOST 14 | done > /etc/dnsmasq.d/0hosts 15 | 16 | service dnsmasq restart 17 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/handlers/member-join: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #for var in ${!SERF*}; do echo ${var}=${!var};done 4 | 5 | # ask 6 | serf members -status=alive | while read line ;do 7 | NEXT_HOST=$(echo $line|cut -d' ' -f 1) 8 | NEXT_SHORT=${NEXT_HOST%%.*} 9 | NEXT_ADDR=$(echo $line|cut -d' ' -f 2) 10 | NEXT_IP=${NEXT_ADDR%%:*} 11 | echo address=\"/$NEXT_HOST/$NEXT_SHORT/$NEXT_IP\" 12 | IFS='.' read -a NEXT_IP_ARRAY <<< "$NEXT_IP" 13 | echo ptr-record=${NEXT_IP_ARRAY[3]}.${NEXT_IP_ARRAY[2]}.${NEXT_IP_ARRAY[1]}.${NEXT_IP_ARRAY[0]}.in-addr.arpa,$NEXT_HOST 14 | done > /etc/dnsmasq.d/0hosts 15 | 16 | service dnsmasq restart 17 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/handlers/member-leave: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #for var in ${!SERF*}; do echo ${var}=${!var};done 4 | 5 | # ask 6 | serf members -status=alive | while read line ;do 7 | NEXT_HOST=$(echo $line|cut -d' ' -f 1) 8 | NEXT_SHORT=${NEXT_HOST%%.*} 9 | NEXT_ADDR=$(echo $line|cut -d' ' -f 2) 10 | NEXT_IP=${NEXT_ADDR%%:*} 11 | echo address=\"/$NEXT_HOST/$NEXT_SHORT/$NEXT_IP\" 12 | IFS='.' read -a NEXT_IP_ARRAY <<< "$NEXT_IP" 13 | echo ptr-record=${NEXT_IP_ARRAY[3]}.${NEXT_IP_ARRAY[2]}.${NEXT_IP_ARRAY[1]}.${NEXT_IP_ARRAY[0]}.in-addr.arpa,$NEXT_HOST 14 | done > /etc/dnsmasq.d/0hosts 15 | 16 | service dnsmasq restart 17 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/serf/event-router.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | HANDLER_DIR="/etc/serf/handlers" 3 | 4 | if [ "$SERF_EVENT" = "user" ]; then 5 | EVENT="user-$SERF_USER_EVENT" 6 | elif [ "$SERF_EVENT" = "query" ]; then 7 | EVENT="query-$SERF_QUERY_NAME" 8 | else 9 | EVENT=$SERF_EVENT 10 | fi 11 | 12 | HANDLER="$HANDLER_DIR/$EVENT" 13 | [ -f "$HANDLER" -a -x "$HANDLER" ] && exec "$HANDLER" || : 14 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/serf/serf-config.json: -------------------------------------------------------------------------------- 1 | { 2 | "event_handlers": ["/etc/serf/event-router.sh"], 3 | "rpc_addr" : "0.0.0.0:7373" 4 | } 5 | -------------------------------------------------------------------------------- /hadoop-dnsmasq/serf/start-serf-agent.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | service dnsmasq start 4 | 5 | SERF_CONFIG_DIR=/etc/serf 6 | 7 | # if JOIN_IP env variable set generate a config json for serf 8 | [[ -n $JOIN_IP ]] && cat > $SERF_CONFIG_DIR/join.json < $SERF_CONFIG_DIR/node.json < 3 | 4 | # move all confugration files into container 5 | ADD files/hadoop/* /tmp/ 6 | ADD files/init/* /tmp/ 7 | 8 | ENV HADOOP_INSTALL /usr/local/hadoop 9 | 10 | RUN mkdir -p ~/hdfs/namenode && \ 11 | mkdir -p ~/zookeeper && \ 12 | mkdir -p ~/hdfs/datanode 13 | 14 | RUN mv /tmp/hdfs-site.xml $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml && \ 15 | mv /tmp/core-site.xml $HADOOP_INSTALL/etc/hadoop/core-site.xml && \ 16 | mv 
/tmp/mapred-site.xml $HADOOP_INSTALL/etc/hadoop/mapred-site.xml && \ 17 | mv /tmp/yarn-site.xml $HADOOP_INSTALL/etc/hadoop/yarn-site.xml && \ 18 | mv /tmp/stop-hadoop.sh ~/stop-hadoop.sh && \ 19 | mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \ 20 | mv /tmp/run-wordcount.sh ~/run-wordcount.sh && \ 21 | mv /tmp/start-ssh-serf.sh ~/start-ssh-serf.sh && \ 22 | mv /tmp/configure-members.sh ~/configure-members.sh 23 | 24 | RUN chmod +x ~/start-hadoop.sh && \ 25 | chmod +x ~/stop-hadoop.sh && \ 26 | chmod +x ~/run-wordcount.sh && \ 27 | chmod +x ~/start-ssh-serf.sh && \ 28 | chmod +x ~/configure-members.sh && \ 29 | chmod 1777 tmp 30 | 31 | # format namenode 32 | RUN /usr/local/hadoop/bin/hdfs namenode -format 33 | 34 | EXPOSE 22 7373 7946 9000 50010 50020 50070 50075 50090 50475 8030 8031 8032 8033 8040 8042 8060 8088 50060 35 | 36 | CMD '/root/start-ssh-serf.sh'; 'bash' -------------------------------------------------------------------------------- /hadoop-master/files/hadoop/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | fs.defaultFS 6 | hdfs://master.krejcmat.com:9000/ 7 | The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem. 8 | 9 | -------------------------------------------------------------------------------- /hadoop-master/files/hadoop/hdfs-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | dfs.datanode.data.dir 6 | file:///root/hdfs/datanode 7 | DataNode directory 8 | 9 | 10 | dfs.namenode.name.dir 11 | file:///root/hdfs/namenode 12 | NameNode directory for namespace and transaction logs storage. 13 | 14 | 15 | dfs.replication 16 | 3 17 | 18 | 19 | dfs.permissions 20 | false 21 | 22 | 23 | dfs.datanode.use.datanode.hostname 24 | false 25 | 26 | 27 | dfs.webhdfs.enabled 28 | true 29 | 30 | 31 | dfs.namenode.datanode.registration.ip-hostname-check 32 | false 33 | If true (the default), then the namenode requires that a connecting datanode's address must be resolved to a hostname. If necessary, a reverse DNS lookup is performed. All attempts to register a datanode from an unresolvable address are rejected. It is recommended that this setting be left on to prevent accidental registration of datanodes listed by hostname in the excludes file during a DNS outage. Only set this to false in environments where there is no infrastructure to support reverse DNS lookup. 
34 | 35 | -------------------------------------------------------------------------------- /hadoop-master/files/hadoop/mapred-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | mapreduce.framework.name 6 | yarn 7 | 8 | -------------------------------------------------------------------------------- /hadoop-master/files/hadoop/yarn-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | yarn.nodemanager.aux-services 5 | mapreduce_shuffle 6 | 7 | 8 | yarn.nodemanager.aux-services.mapreduce_shuffle.class 9 | org.apache.hadoop.mapred.ShuffleHandler 10 | 11 | 12 | yarn.resourcemanager.resource-tracker.address 13 | master.krejcmat.com:8025 14 | 15 | 16 | yarn.resourcemanager.scheduler.address 17 | master.krejcmat.com:8030 18 | 19 | 20 | yarn.resourcemanager.address 21 | master.krejcmat.com:8040 22 | 23 | 24 | yarn.nodemanager.address 25 | master:8050 26 | 27 | 28 | yarn.nodemanager.localizer.address 29 | master:8060 30 | 31 | -------------------------------------------------------------------------------- /hadoop-master/files/init/configure-members.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | slaves=/tmp/slaves 4 | rm -f $slaves 5 | >$slaves 6 | 7 | 8 | function init_members(){ 9 | members=$(serf members 2>&1| tac) 10 | while read -r line; do 11 | if [[ ($line =~ "slave" || $line =~ "master") && $line =~ "alive" ]] 12 | then 13 | alive_mem=$(echo $line | cut -d " " -f 1 2>&1) #get hosts 14 | echo "$alive_mem">>$slaves 15 | continue 16 | fi 17 | done <<< "$members" 18 | #copy slave file to all slaves and master 19 | #create hbase 20 | members_line=$(paste -d, -s $slaves 2>&1) 21 | memstr='members' #uniq string for replace 22 | 23 | while read -r member 24 | do 25 | scp $slaves $member:$HADOOP_CONF_DIR/slaves #hadoop 26 | done < "$slaves" 27 | } 28 | 29 | 30 | init_members 31 | 32 | -------------------------------------------------------------------------------- /hadoop-master/files/init/run-wordcount.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # test the hadoop cluster by running wordcount 4 | 5 | # create input files 6 | mkdir input 7 | echo "Hello Docker" >input/file2.txt 8 | echo "Hello Hadoop" >input/file1.txt 9 | 10 | # create input directory on HDFS 11 | hadoop fs -mkdir -p input 12 | 13 | # put input files to HDFS 14 | hdfs dfs -put ./input/* input 15 | 16 | # run wordcount 17 | hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.1-sources.jar org.apache.hadoop.examples.WordCount input output 18 | 19 | # print the input files 20 | echo -e "\ninput file1.txt:" 21 | hdfs dfs -cat input/file1.txt 22 | 23 | echo -e "\ninput file2.txt:" 24 | hdfs dfs -cat input/file2.txt 25 | 26 | # print the output of wordcount 27 | echo -e "\nwordcount output:" 28 | hdfs dfs -cat output/part-r-00000 29 | -------------------------------------------------------------------------------- /hadoop-master/files/init/start-hadoop.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | $HADOOP_INSTALL/sbin/start-dfs.sh 4 | 5 | echo -e "\n" 6 | $HADOOP_INSTALL/sbin/start-yarn.sh 7 | -------------------------------------------------------------------------------- /hadoop-master/files/init/start-ssh-serf.sh: -------------------------------------------------------------------------------- 1 | 
#!/bin/bash 2 | 3 | # start sshd 4 | echo "start sshd..." 5 | service ssh start 6 | 7 | # start sef 8 | echo -e "\nstart serf..." 9 | /etc/serf/start-serf-agent.sh > serf_log & 10 | 11 | sleep 5 12 | 13 | serf members 14 | 15 | echo -e "\nhadoop-cluster-docker " -------------------------------------------------------------------------------- /hadoop-master/files/init/stop-hadoop.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | $HADOOP_INSTALL/sbin/stop-dfs.sh 3 | 4 | echo -e "\n" 5 | $HADOOP_INSTALL/sbin/stop-yarn.sh 6 | -------------------------------------------------------------------------------- /hadoop-slave/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM sebge2/hadoop-base:latest 2 | MAINTAINER sgerard 3 | 4 | # move all confugration /tmp into container 5 | ADD files/hadoop/* /tmp/ 6 | 7 | ENV HADOOP_INSTALL /usr/local/hadoop 8 | 9 | #RUN mkdir $HADOOP_INSTALL/logs 10 | 11 | RUN mv /tmp/hdfs-site.xml $HADOOP_INSTALL/etc/hadoop/hdfs-site.xml && \ 12 | mv /tmp/core-site.xml $HADOOP_INSTALL/etc/hadoop/core-site.xml && \ 13 | mv /tmp/mapred-site.xml $HADOOP_INSTALL/etc/hadoop/mapred-site.xml && \ 14 | mv /tmp/yarn-site.xml $HADOOP_INSTALL/etc/hadoop/yarn-site.xml 15 | 16 | RUN mv /tmp/start-ssh-serf.sh ~/start-ssh-serf.sh && \ 17 | chmod +x ~/start-ssh-serf.sh 18 | 19 | EXPOSE 22 7373 7946 9000 50010 50020 50070 50075 50090 50475 8030 8031 8032 8033 8040 8042 8060 8088 50060 20 | 21 | CMD '/root/start-ssh-serf.sh'; 'bash' 22 | 23 | -------------------------------------------------------------------------------- /hadoop-slave/files/hadoop/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | fs.defaultFS 6 | hdfs://master.krejcmat.com:9000/ 7 | NameNode URI 8 | 9 | -------------------------------------------------------------------------------- /hadoop-slave/files/hadoop/hdfs-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | dfs.datanode.data.dir 6 | file:///root/hdfs/datanode 7 | DataNode directory 8 | 9 | 10 | dfs.namenode.name.dir 11 | file:///root/hdfs/namenode 12 | NameNode directory for namespace and transaction logs storage. 13 | 14 | 15 | dfs.replication 16 | 3 17 | 18 | 19 | dfs.permissions 20 | false 21 | 22 | 23 | dfs.datanode.use.datanode.hostname 24 | false 25 | 26 | 27 | dfs.webhdfs.enabled 28 | true 29 | 30 | 31 | dfs.namenode.datanode.registration.ip-hostname-check 32 | false 33 | 34 | -------------------------------------------------------------------------------- /hadoop-slave/files/hadoop/mapred-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | mapreduce.framework.name 6 | yarn 7 | The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn. 8 | 9 | -------------------------------------------------------------------------------- /hadoop-slave/files/hadoop/start-ssh-serf.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # start sshd 4 | echo "start sshd..." 5 | service ssh start 6 | 7 | # start sef 8 | echo -e "\nstart serf..." 
9 | /etc/serf/start-serf-agent.sh > serf_log & 10 | 11 | sleep 5 12 | 13 | serf members 14 | 15 | echo -e "\nhadoop-cluster-docker developed by krejcmat " -------------------------------------------------------------------------------- /hadoop-slave/files/hadoop/yarn-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | yarn.nodemanager.aux-services 5 | mapreduce_shuffle 6 | 7 | 8 | yarn.nodemanager.aux-services.mapreduce_shuffle.class 9 | org.apache.hadoop.mapred.ShuffleHandler 10 | 11 | 12 | yarn.resourcemanager.resource-tracker.address 13 | master.krejcmat.com:8025 14 | 15 | 16 | yarn.resourcemanager.scheduler.address 17 | master.krejcmat.com:8030 18 | 19 | 20 | yarn.resourcemanager.address 21 | master.krejcmat.com:8040 22 | 23 | -------------------------------------------------------------------------------- /resize-cluster.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | tag="latest" 4 | 5 | # N is the node number of the cluster 6 | N=$1 7 | 8 | if [ $# = 0 ] 9 | then 10 | echo "Please use the node number of the cluster as the argument!" 11 | exit 1 12 | fi 13 | 14 | cd hadoop-master 15 | 16 | # change the slaves file 17 | echo "master.krejcmat.com" > files/slaves 18 | i=1 19 | while [ $i -lt $N ] 20 | do 21 | echo "slave$i.krejcmat.com" >> files/slaves 22 | ((i++)) 23 | done 24 | 25 | # delete master container 26 | sudo docker rm -f master 27 | 28 | # delete hadoop-master image 29 | sudo docker rmi krejcmat/hadoop-master:$tag 30 | 31 | # rebuild hadoop-docker image 32 | pwd 33 | sudo docker build -t krejcmat/hadoop-master:$tag . -------------------------------------------------------------------------------- /start-container.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # run N slave containers 4 | tag=$1 5 | N=$2 6 | 7 | if [ $# != 2 ] 8 | then 9 | echo "Set first parametar as image version tag(e.g. 0.1) and second as number of nodes" 10 | exit 1 11 | fi 12 | 13 | # delete old master container and start new master container 14 | sudo docker rm -f master &> /dev/null 15 | echo "start master container..." 16 | sudo docker run -d -t --dns 127.0.0.1 -P --name master -h master.krejcmat.com -w /root krejcmat/hadoop-master:$tag&> /dev/null 17 | 18 | # get the IP address of master container 19 | FIRST_IP=$(docker inspect --format="{{.NetworkSettings.IPAddress}}" master) 20 | 21 | # delete old slave containers and start new slave containers 22 | i=1 23 | while [ $i -lt $N ] 24 | do 25 | sudo docker rm -f slave$i &> /dev/null 26 | echo "start slave$i container..." 27 | sudo docker run -d -t --dns 127.0.0.1 -P --name slave$i -h slave$i.krejcmat.com -e JOIN_IP=$FIRST_IP krejcmat/hadoop-slave:$tag &> /dev/null 28 | ((i++)) 29 | done 30 | 31 | 32 | # create a new Bash session in the master container 33 | sudo docker exec -it master bash 34 | --------------------------------------------------------------------------------