├── LICENSE
├── README.md
├── Spark-Cassandra-Zeppelin
│   ├── README.md
│   ├── Vagrantfile
│   ├── apache-mirror-selector.py
│   └── provision_spark_node.sh
├── Spark-IPython-32bit
│   ├── README.md
│   ├── Vagrantfile
│   ├── apache-mirror-selector.py
│   ├── ipython-pyspark.py
│   └── provision_spark_node.sh
├── Spark-IPython-64bit
│   ├── README.md
│   ├── Vagrantfile
│   ├── apache-mirror-selector.py
│   ├── ipython-pyspark.py
│   └── provision_spark_node.sh
├── Spark-IPython-Zeppelin-Lightning
│   ├── CabinSketch-Bold.ttf
│   ├── Humor-Sans-1.0.ttf
│   ├── README.md
│   ├── Vagrantfile
│   ├── database.js
│   ├── ipython-pyspark.py
│   ├── lightning.tar.gz
│   ├── pg_hba.conf
│   ├── postgresql.conf
│   ├── provision_lgn_app.sh
│   ├── provision_spark_app.sh
│   ├── provision_spark_node.sh
│   └── spark-ml-streaming.tar.gz
└── SparkR-Zeppelin
    ├── README.md
    ├── Vagrantfile
    ├── apache-mirror-selector.py
    ├── provision_spark_app.sh
    └── provision_spark_node.sh

/LICENSE: --------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "{}"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright {yyyy} {name of copyright owner}
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
203 |
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | # vagrant-projects
2 |
3 | ## Spark-IPython-Zeppelin-Lightning
4 |
5 | This folder has the Vagrantfile and provisioning script for the environment used in the [Seattle Spark Meetup](http://www.meetup.com/Seattle-Spark-Meetup/events/208711962/) presentation on Apr 15, 2015.
6 |
7 | The repository also includes a few related projects that run Spark, Zeppelin, or IPython / Jupyter on Vagrant.
8 |
9 |
10 | All rights reserved for their respective owners.
11 |
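Every project folder here follows the same basic workflow (a minimal sketch, assuming Vagrant and VirtualBox are already installed; see each project's README for its ports and service details):

```sh
cd Spark-IPython-64bit   # or any other project folder with a Vagrantfile
vagrant up               # download the box, boot the VM, run the provisioning script
vagrant ssh              # shell into the VM once provisioning finishes
vagrant halt             # stop the VM when done (vagrant destroy removes it)
```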
-------------------------------------------------------------------------------- /Spark-Cassandra-Zeppelin/README.md: --------------------------------------------------------------------------------
1 | # Spark-Cassandra-Zeppelin
2 |
3 | SACZ - Set up a Vagrant node with Spark, HBase, Cassandra, and Zeppelin
4 |
5 |
6 | ### Content
7 |
8 | apache-mirror-selector.py - script to pick an Apache mirror URL
9 | provision_spark_node.sh - provisioning: JDK, Spark, HBase, Cassandra, Zeppelin
10 | spark-1.6.1-bin-hadoop2.6.tgz - Spark 1.6.1 official release
11 | hbase-1.2.1-bin.tar.gz
12 | apache-cassandra-3.5-bin.tar.gz
13 | apache-maven-3.3.9-bin.tar.gz
14 | Vagrantfile
15 |
16 | ### Prereq
17 |
18 | Vagrant, VirtualBox as a provider
19 |
20 | A Spark 1.6.1 build - the provisioning script will download Spark from an Apache mirror if one is not supplied
21 |
22 | ### Start/Stop
23 |
24 | - Go to the local directory and run `vagrant up`
25 | - Zeppelin is started automatically by the provisioning script
26 | - Zeppelin comes configured with working Spark, HBase, and Cassandra interpreters
27 | - Use `vagrant ssh` to connect to the machine
28 |
29 | #### Zeppelin
30 |
31 | `vagrant ssh`
32 |
33 | stop:
34 | `$ sudo /opt/incubator-zeppelin/bin/zeppelin-daemon.sh stop`
35 | start:
36 | `$ sudo SPARK_HOME=/opt/spark-1.6.1-bin-hadoop2.6 /opt/incubator-zeppelin/bin/zeppelin-daemon.sh start`
37 |
38 | Connect to http://localhost:8080
39 |
40 | #### interpreters
41 |
42 | To get started, run this in a notebook paragraph:
43 | ```
44 | %hbase
45 | help
46 | ```
47 |
48 | ```
49 | %cassandra
50 | HELP;
51 | ```
52 |
53 | More information to come!
54 |
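In the meantime, a slightly fuller Cassandra smoke test than `HELP;` could look like this in a Zeppelin paragraph (a sketch only - the keyspace and table names are made up for illustration):

```
%cassandra
CREATE KEYSPACE IF NOT EXISTS demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.kv (key text PRIMARY KEY, value int);
INSERT INTO demo.kv (key, value) VALUES ('answer', 42);
SELECT * FROM demo.kv;
```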
-------------------------------------------------------------------------------- /Spark-Cassandra-Zeppelin/Vagrantfile: --------------------------------------------------------------------------------
1 | # -*- mode: ruby -*-
2 | # vi: set ft=ruby :
3 |
4 | # Configuration parameters
5 | boxRam = 8192 # Ram in MB
6 | boxCpus = 2 # Number of CPU core
7 |
8 |
9 | Vagrant.configure(2) do |config|
10 |   config.vm.define "sczvm" do |master|
11 |     # ubuntu/xenial64 - doesn't work with folder share in Vagrant 1.8.1
12 |     # ubuntu/trusty64 - doesn't have supported openjdk-8
13 |     # ubuntu/vivid64 - box no longer available on Vagrant Atlas
14 |     master.vm.box = "ubuntu/wily64"
15 |     master.vm.network :forwarded_port, host: 4040, guest: 4040 # Spark UI (Driver)
16 |     master.vm.network :forwarded_port, host: 8080, guest: 8080 # Zeppelin default port
17 |     master.vm.hostname = "sczvm"
18 |
19 |     master.vm.provider :virtualbox do |v|
20 |       v.name = master.vm.hostname.to_s
21 |       v.customize ["modifyvm", :id, "--memory", "#{boxRam}"]
22 |       v.customize ["modifyvm", :id, "--cpus", "#{boxCpus}"]
23 |     end
24 |     master.vm.provision :shell, :path => "provision_spark_node.sh"
25 |   end
26 | end
27 |
-------------------------------------------------------------------------------- /Spark-Cassandra-Zeppelin/apache-mirror-selector.py: --------------------------------------------------------------------------------
1 | #! /usr/bin/env python
2 |
3 | # https://github.com/y-higuchi/apache-mirror-selector
4 |
5 | import sys, argparse
6 | from urllib2 import urlopen
7 | from json import loads
8 |
9 | class UsageOnErrorParser(argparse.ArgumentParser):
10 |     def error(self, message):
11 |         sys.stderr.write('argument error: %s\n' % message)
12 |         self.print_help()
13 |         sys.exit(2)
14 |
15 | parser = UsageOnErrorParser(description='Print preferred Apache mirror URL.')
16 | parser.add_argument('url', type=str, help='Apache mirror selector url.')
17 |
18 | args = parser.parse_args()
19 |
20 | jsonurl = args.url + '&asjson=1'
21 |
22 | body = urlopen(jsonurl).read().decode('utf-8')
23 | mirrors = loads(body)
24 | print(mirrors['preferred'] + mirrors['path_info'])
25 |
-------------------------------------------------------------------------------- /Spark-Cassandra-Zeppelin/provision_spark_node.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | echo "== start node $(date +'%Y/%m/%d %H:%M:%S')"
4 | STARTTIME=$(date +%s)
5 |
6 | ISTRUSTY=`lsb_release -a | grep -e "14.04"`
7 | ISWILY=`lsb_release -a | grep -e "15.10"`
8 |
9 | set -xe
10 |
11 | # machine config
12 | sudo sysctl -w vm.swappiness=0
13 | echo never > /sys/kernel/mm/transparent_hugepage/defrag
14 |
15 | sudo apt-get update
16 | # sudo apt-get -y upgrade
17 | # sudo apt-get install wget
18 |
19 | sudo apt-get -y install unzip
20 |
21 | # JDK
22 | if [ -n "${ISTRUSTY// }" ]; then
23 |     # Trusty Tahr does not have JDK 8 support https://bugs.launchpad.net/trusty-backports/+bug/1368094
24 |     sudo apt-get -y install openjdk-7-jdk
25 | else
26 |     sudo apt-get -y install openjdk-8-jdk
27 | fi
28 | java -version
29 | javac -version
30 |
31 | if [ -n "${ISWILY// }" ]; then
32 |     # Fix build error: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty and 'parent.relativePath' points at wrong local POM
33 |     sudo /var/lib/dpkg/info/ca-certificates-java.postinst configure
34 | fi
35 |
36 | # python
37 | sudo apt-get -y install python-pip
38 | sudo apt-get -y install python-matplotlib
39 | sudo apt-get -y install python-dev
40 |
41 | # Spark
42 | # To change Spark version, change SPARKVER to the right distribution package
43 | SPARKVER=1.6.1
44 | SPARKVER_SHORT=1.6
45 | HADOOP_VERSION=2.6.0
46 | HADOOP_VERSION_SHORT=2.6
47 | cp /vagrant/apache-mirror-selector.py ~/
48 | pushd /opt
49 | if [ ! -f /vagrant/spark-$SPARKVER-bin-hadoop$HADOOP_VERSION_SHORT.tgz ]; then
50 |     echo "downloading Spark ${SPARKVER}..."
51 |     sudo wget -q `python ~/apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=spark/spark-$SPARKVER/spark-$SPARKVER-bin-hadoop$HADOOP_VERSION_SHORT.tgz`
52 |     cp ./spark-$SPARKVER-bin-hadoop$HADOOP_VERSION_SHORT.tgz /vagrant/ # save it for the next time
53 | else
54 |     sudo cp /vagrant/spark-$SPARKVER-bin-hadoop$HADOOP_VERSION_SHORT.tgz ./
55 | fi
56 | sudo tar -xzf spark-*
57 | sudo rm -f spark-*.tgz
58 | cd spark-*
59 | SPARKHOME=$(pwd)
60 | echo '# set SPARK_HOME and PATH' >> /etc/profile.d/spark.sh
61 | echo "export SPARK_HOME=${SPARKHOME}" >> /etc/profile.d/spark.sh
62 | echo 'export PATH=$SPARK_HOME/bin:$PATH' >> /etc/profile.d/spark.sh
63 | export SPARK_HOME=$SPARKHOME
64 | export PATH=$SPARK_HOME/bin:$PATH
65 | echo "SPARK_HOME=${SPARK_HOME}"
66 | popd
67 |
68 | # HBase
69 | HBASEVER=1.2.1
70 | pushd /opt
71 | if [ ! -f /vagrant/hbase-$HBASEVER-bin.tar.gz ]; then
72 |     echo "downloading HBase ${HBASEVER}..."
73 |     sudo wget -q `python ~/apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=hbase/$HBASEVER/hbase-$HBASEVER-bin.tar.gz`
74 |     cp ./hbase-$HBASEVER-bin.tar.gz /vagrant/ # save it for the next time
75 | else
76 |     sudo cp /vagrant/hbase-$HBASEVER-bin.tar.gz ./
77 | fi
78 | sudo tar -xzf hbase-*
79 | sudo rm -f hbase-*.gz
80 | popd
81 |
82 | # Cassandra
83 | CVER=3.5
84 | pushd /opt
85 | if [ ! -f /vagrant/apache-cassandra-$CVER-bin.tar.gz ]; then
86 |     echo "downloading Cassandra ${CVER}..."
87 |     sudo wget -q `python ~/apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=cassandra/$CVER/apache-cassandra-$CVER-bin.tar.gz`
88 |     cp ./apache-cassandra-$CVER-bin.tar.gz /vagrant/ # save it for the next time
89 | else
90 |     sudo cp /vagrant/apache-cassandra-$CVER-bin.tar.gz ./
91 | fi
92 | sudo tar -xzf apache-cassandra-*
93 | sudo rm -f apache-cassandra-*.gz
94 | sudo mkdir /opt/apache-cassandra-$CVER/data
95 | sudo chown -R vagrant:vagrant /opt/apache-cassandra-$CVER/data
96 | # Enable UDF
97 | sed -i 's/enable_user_defined_functions: false/enable_user_defined_functions: true/' /opt/apache-cassandra-$CVER/conf/cassandra.yaml
98 | popd
99 |
100 | # Zeppelin - we are going to build from source
101 |
102 | # Install Maven 3
103 | MAVENVER=3.3.9
104 | pushd /opt
105 | if [ ! -f /vagrant/apache-maven-$MAVENVER-bin.tar.gz ]; then
106 |     echo "downloading Maven ${MAVENVER}..."
107 |     sudo wget -q `python ~/apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=maven/maven-3/$MAVENVER/binaries/apache-maven-$MAVENVER-bin.tar.gz`
108 |     cp ./apache-maven-$MAVENVER-bin.tar.gz /vagrant/ # save it for the next time
109 | else
110 |     sudo cp /vagrant/apache-maven-$MAVENVER-bin.tar.gz ./
111 | fi
112 | sudo tar -xzf apache-maven-*
113 | sudo rm -f apache-maven-*.gz
114 | ln -s /opt/apache-maven-$MAVENVER/bin/mvn /usr/bin/mvn
115 | popd
116 |
117 | # Install dependencies
118 | apt-get install -y git vim emacs nodejs npm
119 | ln -s /usr/bin/nodejs /usr/bin/node
120 | npm update -g npm
121 | npm install -g grunt-cli
122 | npm install -g grunt
123 | npm install -g bower
124 |
125 | # Clone and build Zeppelin
126 | pushd /opt
127 | git clone https://github.com/felixcheung/incubator-zeppelin.git --branch abdc16 --depth 1
128 | pushd incubator-zeppelin
129 | mvn clean install -DskipTests "-Dspark.version=$SPARKVER" "-Dhadoop.version=$HADOOP_VERSION" -Pyarn -Phadoop-$HADOOP_VERSION_SHORT -Pspark-$SPARKVER_SHORT -Ppyspark -Dhbase.hbase.version=$HBASEVER -Dhbase.hadoop.version=$HADOOP_VERSION
130 | popd
131 | popd
132 | # Create the conf/interpreter.json file for the first time
133 | /opt/incubator-zeppelin/bin/zeppelin-daemon.sh start
134 | sleep 10s
135 | /opt/incubator-zeppelin/bin/zeppelin-daemon.sh stop
136 | # Change settings, HBASE home directory, restart
137 | sed -i "s#\"hbase.home\": \"/usr/lib/hbase/\"#\"hbase.home\": \"/opt/hbase-${HBASEVER}/\"#" /opt/incubator-zeppelin/conf/interpreter.json
138 |
139 | cat > /lib/systemd/system/zeppelin.service <<EOF
140 | [Unit]
141 | Description=Apache Zeppelin
142 | After=network.target
143 |
144 | [Service]
145 | Type=forking
146 | Environment=SPARK_HOME=$SPARKHOME
147 | ExecStart=/opt/incubator-zeppelin/bin/zeppelin-daemon.sh start
148 | ExecStop=/opt/incubator-zeppelin/bin/zeppelin-daemon.sh stop
149 | [Install]
150 | WantedBy=multi-user.target
151 | EOF
152 | systemctl enable zeppelin
153 | systemctl start zeppelin
154 |
155 | # Start HBase
156 | echo '# set JAVA_HOME' >> /etc/profile.d/java.sh
157 | echo "export JAVA_HOME=/usr" >> /etc/profile.d/java.sh
158 | export JAVA_HOME=/usr
159 | /opt/hbase-$HBASEVER/bin/start-hbase.sh
160 |
161 | # Start Cassandra
162 | JDKVER=`java -version 2>&1 | grep -e "1.7"` || true
163 | if [ -n "${JDKVER// }" ]; then
164 |     echo "Cassandra 3.x requires JDK 8 to run"
165 | else
166 |     sudo -u vagrant sh /opt/apache-cassandra-$CVER/bin/cassandra &
167 | fi
168 |
169 | set +xe
170 |
171 | echo "Ready - open http://localhost:8080"
172 | echo "== end node $(date +'%Y/%m/%d %H:%M:%S')"
173 | echo "== $(($(date +%s) - $STARTTIME)) seconds"
174 |
--------------------------------------------------------------------------------
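For reference, the `apache-mirror-selector.py` calls in the provisioning script above expand to a direct download URL. Run standalone it behaves like this (a sketch - the mirror host shown is hypothetical; whatever mirror closer.cgi prefers for your location is printed):

```sh
$ python apache-mirror-selector.py "http://www.apache.org/dyn/closer.cgi?path=spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz"
http://mirror.example.com/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
```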
+'%Y/%m/%d %H:%M:%S')" 173 | echo "== $(($(date +%s) - $STARTTIME)) seconds" 174 | -------------------------------------------------------------------------------- /Spark-IPython-32bit/README.md: -------------------------------------------------------------------------------- 1 | # Spark-IPython-32bit 2 | 3 | Setup a Vagrant single node with Spark, IPython, and matplotlib, in a ubuntu/trusty32 VM 4 | 5 | ### Content 6 | 7 | Vagrantfile 8 | apache-mirror-selector.py - Script to help select a Apache mirror to download from 9 | ipython-pyspark.py - IPython notebook config and launch script 10 | provision_spark_node.sh - Vagrant provisioning script 11 | 12 | ### Prereq 13 | 14 | Vagrant http://docs.vagrantup.com/v2/installation/index.html 15 | VirtualBox https://www.virtualbox.org/wiki/Downloads as a provider 16 | 17 | ### Preparation 18 | 19 | - Go to local directory and run `vagrant up` 20 | - Vagrant will then prepare the VM - this should take about ~2 min to download the core vm (aka "box") and then ~4 min for other downloads and provisioning - it will require Internet connection to download various content from sources 21 | - Spark distribution is automatically downloaded during the provisioning phase 22 | - IPython notebook is downloaded and configured during provisioning, and it is launched with PySpark as the very last step 23 | - To connect to IPython notebook, use http://localhost:1088. To see [SparkContext web UI](https://spark.apache.org/docs/latest/monitoring.html) use http://localhost:4040. Port forwarding is configured in Vagrant. 24 | - If needed, use `vagrant ssh` to connect to the VM machine 25 | 26 | ### Start/Stop 27 | 28 | #### IPython 29 | 30 | `vagrant ssh` 31 | 32 | stop: Ctrl-C to break 33 | start: 34 | `$ sudo su -` 35 | `$ ./ipython-pyspark.py` 36 | 37 | Connect to [http://localhost:1088](http://localhost:1088) 38 | 39 | #### Spark 40 | 41 | Connect to [http://localhost:4040](http://localhost:4040) for the Spark UI (Driver) 42 | 43 | ### Data transfer 44 | 45 | Vagrant support a "mapped directory". The local directory on the host where Vagrantfile is, is mapped to `/vagrant` in the VM. Any file there can be accessed from within the VM (use `vagrant ssh` to connect) 46 | -------------------------------------------------------------------------------- /Spark-IPython-32bit/Vagrantfile: -------------------------------------------------------------------------------- 1 | # -*- mode: ruby -*- 2 | # vi: set ft=ruby : 3 | 4 | # Configuration parameters 5 | boxRam = 2048 # Ram in MB 6 | boxCpus = 2 # Number of CPU core 7 | 8 | ipythonPort = 1088 # Ipython port to forward (also set in IPython notebook config) 9 | 10 | Vagrant.configure(2) do |config| 11 | config.vm.define "sparkvm" do |master| 12 | master.vm.box = "ubuntu/trusty32" 13 | master.vm.network :forwarded_port, host: ipythonPort, guest: ipythonPort # IPython port (set in notebook config) 14 | master.vm.network :forwarded_port, host: 4040, guest: 4040 # Spark UI (Driver) 15 | master.vm.hostname = "sparkvm" 16 | 17 | master.vm.provider :virtualbox do |v| 18 | v.name = master.vm.hostname.to_s 19 | v.customize ["modifyvm", :id, "--memory", "#{boxRam}"] 20 | v.customize ["modifyvm", :id, "--cpus", "#{boxCpus}"] 21 | end 22 | master.vm.provision :shell, :path => "provision_spark_node.sh" 23 | end 24 | end 25 | -------------------------------------------------------------------------------- /Spark-IPython-32bit/apache-mirror-selector.py: -------------------------------------------------------------------------------- 1 | #! 
-------------------------------------------------------------------------------- /Spark-IPython-32bit/Vagrantfile: --------------------------------------------------------------------------------
1 | # -*- mode: ruby -*-
2 | # vi: set ft=ruby :
3 |
4 | # Configuration parameters
5 | boxRam = 2048 # Ram in MB
6 | boxCpus = 2 # Number of CPU core
7 |
8 | ipythonPort = 1088 # Ipython port to forward (also set in IPython notebook config)
9 |
10 | Vagrant.configure(2) do |config|
11 |   config.vm.define "sparkvm" do |master|
12 |     master.vm.box = "ubuntu/trusty32"
13 |     master.vm.network :forwarded_port, host: ipythonPort, guest: ipythonPort # IPython port (set in notebook config)
14 |     master.vm.network :forwarded_port, host: 4040, guest: 4040 # Spark UI (Driver)
15 |     master.vm.hostname = "sparkvm"
16 |
17 |     master.vm.provider :virtualbox do |v|
18 |       v.name = master.vm.hostname.to_s
19 |       v.customize ["modifyvm", :id, "--memory", "#{boxRam}"]
20 |       v.customize ["modifyvm", :id, "--cpus", "#{boxCpus}"]
21 |     end
22 |     master.vm.provision :shell, :path => "provision_spark_node.sh"
23 |   end
24 | end
25 |
-------------------------------------------------------------------------------- /Spark-IPython-32bit/apache-mirror-selector.py: --------------------------------------------------------------------------------
1 | #! /usr/bin/env python
2 |
3 | # https://github.com/y-higuchi/apache-mirror-selector
4 |
5 | import sys, argparse
6 | from urllib2 import urlopen
7 | from json import loads
8 |
9 | class UsageOnErrorParser(argparse.ArgumentParser):
10 |     def error(self, message):
11 |         sys.stderr.write('argument error: %s\n' % message)
12 |         self.print_help()
13 |         sys.exit(2)
14 |
15 | parser = UsageOnErrorParser(description='Print preferred Apache mirror URL.')
16 | parser.add_argument('url', type=str, help='Apache mirror selector url.')
17 |
18 | args = parser.parse_args()
19 |
20 | jsonurl = args.url + '&asjson=1'
21 |
22 | body = urlopen(jsonurl).read().decode('utf-8')
23 | mirrors = loads(body)
24 | print(mirrors['preferred'] + mirrors['path_info'])
25 |
-------------------------------------------------------------------------------- /Spark-IPython-32bit/ipython-pyspark.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | # https://github.com/felixcheung/vagrant-projects
4 |
5 | import getpass
6 | import glob
7 | import inspect
8 | import os
9 | import platform
10 | import re
11 | import subprocess
12 | import sys
13 | import time
14 |
15 | #-----------------------
16 | # PySpark
17 | #
18 |
19 | master = 'local[*]'
20 |
21 | num_executors = 12 #24
22 | executor_cores = 2
23 | executor_memory = '1g' #10g
24 |
25 | pyspark_submit_args = os.getenv('PYSPARK_SUBMIT_ARGS', None)
26 | if not pyspark_submit_args:
27 |     pyspark_submit_args = '--num-executors %d --executor-cores %d --executor-memory %s' % (num_executors, executor_cores, executor_memory)
28 | pyspark_submit_args = '--master %s %s' % (master, pyspark_submit_args)
29 |
30 | if not os.getenv('PYSPARK_PYTHON', None):
31 |     os.environ['PYSPARK_PYTHON'] = sys.executable
32 | os.environ['PYSPARK_DRIVER_PYTHON']='ipython' # PySpark Driver (ie. IPython)
33 | profile_name = 'pyspark'
34 | os.environ['PYSPARK_DRIVER_PYTHON_OPTS'] = 'notebook --profile=%s' % profile_name
35 |
36 | #-----------------------
37 | # IPython Notebook
38 | #
39 |
40 | ipython_notebook_config_template = '''c = get_config()
41 | c.NotebookApp.ip = '{ip}'
42 | c.NotebookApp.port = {port}
43 | c.NotebookApp.open_browser = False
44 | '''
45 |
46 | pyspark_setup_template = '''import os
47 | if not os.getenv('PYSPARK_SUBMIT_ARGS', None):
48 |     raise ValueError('PYSPARK_SUBMIT_ARGS environment variable is not set')
49 |
50 | spark_home = os.getenv('SPARK_HOME', None)
51 | if not spark_home:
52 |     raise ValueError('SPARK_HOME environment variable is not set')
53 | '''
54 |
55 | ip = '*' # Warning: this is potentially insecure
56 | port = 1088
57 |
58 | #-----------------------
59 | # Create profile and start
60 | #
61 |
62 | try:
63 |     ipython_profile_path = os.popen('ipython locate').read().rstrip('\n') + '/profile_%s' % profile_name
64 |     setup_py_path = ipython_profile_path + '/startup/00-pyspark-setup.py'
65 |     ipython_notebook_config_path = ipython_profile_path + '/ipython_notebook_config.py'
66 |     ipython_kernel_config_path = ipython_profile_path + '/ipython_kernel_config.py'
67 |
68 |     if not os.path.exists(ipython_profile_path):
69 |         print 'Creating IPython Notebook profile\n'
70 |         cmd = 'ipython profile create %s' % profile_name
71 |         os.system(cmd)
72 |         print '\n'
73 |
74 |     if not os.path.exists(setup_py_path):
75 |         print 'Writing PySpark setup\n'
76 |         setup_file = open(setup_py_path, 'w')
77 |         setup_file.write(pyspark_setup_template)
78 |         setup_file.close()
79 |         os.chmod(setup_py_path, 0600)
80 |
81 |     # matplotlib inline
82 |     kernel_config = open(ipython_kernel_config_path).read()
83 |     if "c.IPKernelApp.matplotlib = 'inline'" not in kernel_config:
84 |         print 'Writing IPython kernel config\n'
85 |         new_kernel_config = kernel_config.replace('# c.IPKernelApp.matplotlib = None', "c.IPKernelApp.matplotlib = 'inline'")
86 |         kernel_file = open(ipython_kernel_config_path, 'w')
87 |         kernel_file.write(new_kernel_config)
88 |         kernel_file.close()
89 |         os.chmod(ipython_kernel_config_path, 0600)
90 |
91 |     if not os.path.exists(ipython_notebook_config_path) or 'open_browser = False' not in open(ipython_notebook_config_path).read():
92 |         print 'Writing IPython Notebook config\n'
93 |         config_file = open(ipython_notebook_config_path, 'w')
94 |         config_file.write(ipython_notebook_config_template.format(ip = ip, port = port))
95 |         config_file.close()
96 |         os.chmod(ipython_notebook_config_path, 0600)
97 |
98 |     print 'Launching PySpark with IPython Notebook\n'
99 |     cmd = 'pyspark %s' % pyspark_submit_args
100 |     os.system(cmd)
101 |     sys.exit(0)
102 | except KeyboardInterrupt:
103 |     print 'Aborted\n'
104 |     sys.exit(1)
105 |
-------------------------------------------------------------------------------- /Spark-IPython-32bit/provision_spark_node.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | echo "== start vm provisioning $(date +'%Y/%m/%d %H:%M:%S')"
4 | STARTTIME=$(date +%s)
5 |
6 | sudo apt-get update && sudo apt-get -y upgrade
7 |
8 | # OpenJDK 1.7.0_79
9 | sudo apt-get -y install openjdk-7-jre-headless
10 |
11 | # Set JAVA_HOME
12 | java -version
13 | echo '' >> /etc/profile
14 | echo '# set JAVA_HOME' >> /etc/profile
15 | echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386' >> /etc/profile
16 | export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
17 | echo "JAVA_HOME=${JAVA_HOME}"
18 |
19 | # Spark
20 | pushd ~
21 | echo "Getting Spark..." 22 | cp /vagrant/apache-mirror-selector.py ./ 23 | chmod 700 apache-mirror-selector.py 24 | wget -q `./apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz` 25 | sudo cp ./spark-1.3.1-bin-hadoop2.6.tgz /opt 26 | pushd /opt 27 | sudo tar -xzf spark-* 28 | sudo rm -f spark-*.tgz 29 | cd spark-* 30 | SPARKHOME=$(pwd) 31 | echo '' >> /etc/profile 32 | echo '# set SPARK_HOME and PATH' >> /etc/profile 33 | echo "export SPARK_HOME=${SPARKHOME}" >> /etc/profile 34 | echo 'export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH' >> /etc/profile 35 | export SPARK_HOME=$SPARKHOME 36 | export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH 37 | echo "SPARK_HOME=${SPARK_HOME}" 38 | popd 39 | rm -f apache-mirror-selector.py 40 | popd 41 | 42 | sudo apt-get -y install pkg-config 43 | sudo apt-get -y install python-pip 44 | 45 | # matplotlib 46 | # required to get freetype, png 47 | sudo apt-get -y install python-matplotlib 48 | 49 | # IPython notebook 50 | sudo apt-get -y install libzmq-dev 51 | # required to get pyzmq 52 | sudo apt-get -y install python-dev 53 | sudo python -m pip install "ipython[notebook]" --upgrade 54 | IPYTHONVER=`ipython -V` 55 | echo "IPython version ${IPYTHONVER}" 56 | 57 | # Start IPython notebook 58 | cd ~ 59 | cp /vagrant/ipython-pyspark.py ~/ 60 | ~/ipython-pyspark.py 61 | 62 | echo "== end vm provisioning $(date +'%Y/%m/%d %H:%M:%S')" 63 | echo "== $(($(date +%s) - $STARTTIME)) seconds" 64 | -------------------------------------------------------------------------------- /Spark-IPython-64bit/README.md: -------------------------------------------------------------------------------- 1 | # Spark-IPython-64bit 2 | 3 | Setup a Vagrant single node with Java 1.8.0, Spark 1.5.1, IPython 4.0.0, and matplotlib, in a ubuntu/vivid64 VM 4 | 5 | ### Content 6 | 7 | Vagrantfile 8 | apache-mirror-selector.py - Script to help select a Apache mirror to download from 9 | ipython-pyspark.py - IPython notebook config and launch script 10 | provision_spark_node.sh - Vagrant provisioning script 11 | 12 | ### Prereq 13 | 14 | Vagrant http://docs.vagrantup.com/v2/installation/index.html 15 | VirtualBox https://www.virtualbox.org/wiki/Downloads as a provider 16 | 17 | ### Preparation 18 | 19 | - Go to local directory and run `vagrant up` 20 | - Vagrant will then prepare the VM - this should take about ~2 min to download the core vm (aka "box") and then ~4 min for other downloads and provisioning - it will require Internet connection to download various content from sources 21 | - Spark distribution is automatically downloaded during the provisioning phase 22 | - IPython notebook is downloaded and configured during provisioning, and it is launched with PySpark as the very last step 23 | - To connect to IPython notebook, use http://localhost:1088. To see [SparkContext web UI](https://spark.apache.org/docs/latest/monitoring.html) use http://localhost:4040. Port forwarding is configured in Vagrant. 24 | - If needed, use `vagrant ssh` to connect to the VM machine 25 | 26 | ### Start/Stop 27 | 28 | #### IPython 29 | 30 | `vagrant ssh` 31 | 32 | stop: Ctrl-C to break 33 | start: 34 | `$ sudo su -` 35 | `$ ./ipython-pyspark.py` 36 | 37 | Connect to [http://localhost:1088](http://localhost:1088) 38 | 39 | #### Spark 40 | 41 | Connect to [http://localhost:4040](http://localhost:4040) for the Spark UI (Driver) 42 | 43 | ### Data transfer 44 | 45 | Vagrant support a "mapped directory". 
-------------------------------------------------------------------------------- /Spark-IPython-64bit/README.md: --------------------------------------------------------------------------------
1 | # Spark-IPython-64bit
2 |
3 | Set up a single Vagrant node with Java 1.8.0, Spark 1.5.1, IPython 4.0.0, and matplotlib, in a ubuntu/vivid64 VM
4 |
5 | ### Content
6 |
7 | Vagrantfile
8 | apache-mirror-selector.py - Script to help select an Apache mirror to download from
9 | ipython-pyspark.py - IPython notebook config and launch script
10 | provision_spark_node.sh - Vagrant provisioning script
11 |
12 | ### Prereq
13 |
14 | Vagrant http://docs.vagrantup.com/v2/installation/index.html
15 | VirtualBox https://www.virtualbox.org/wiki/Downloads as a provider
16 |
17 | ### Preparation
18 |
19 | - Go to the local directory and run `vagrant up`
20 | - Vagrant will then prepare the VM - this should take about 2 min to download the core VM (aka "box") and then about 4 min for other downloads and provisioning - it requires an Internet connection to download various content from sources
21 | - The Spark distribution is automatically downloaded during the provisioning phase
22 | - The IPython notebook is downloaded and configured during provisioning, and it is launched with PySpark as the very last step
23 | - To connect to the IPython notebook, use http://localhost:1088. To see the [SparkContext web UI](https://spark.apache.org/docs/latest/monitoring.html) use http://localhost:4040. Port forwarding is configured in Vagrant.
24 | - If needed, use `vagrant ssh` to connect to the VM
25 |
26 | ### Start/Stop
27 |
28 | #### IPython
29 |
30 | `vagrant ssh`
31 |
32 | stop: Ctrl-C to break
33 | start:
34 | `$ sudo su -`
35 | `$ ./ipython-pyspark.py`
36 |
37 | Connect to [http://localhost:1088](http://localhost:1088)
38 |
39 | #### Spark
40 |
41 | Connect to [http://localhost:4040](http://localhost:4040) for the Spark UI (Driver)
42 |
43 | ### Data transfer
44 |
45 | Vagrant supports a "mapped directory": the local host directory containing the Vagrantfile is mapped to `/vagrant` in the VM, and any file there can be accessed from within the VM (use `vagrant ssh` to connect)
46 |
-------------------------------------------------------------------------------- /Spark-IPython-64bit/Vagrantfile: --------------------------------------------------------------------------------
1 | # -*- mode: ruby -*-
2 | # vi: set ft=ruby :
3 |
4 | # Configuration parameters
5 | boxRam = 2048 # Ram in MB
6 | boxCpus = 2 # Number of CPU core
7 |
8 | ipythonPort = 1088 # Ipython port to forward (also set in IPython notebook config)
9 |
10 | Vagrant.configure(2) do |config|
11 |   config.vm.define "sparkvm64" do |master|
12 |     master.vm.box = "ubuntu/vivid64"
13 |     master.vm.network :forwarded_port, host: ipythonPort, guest: ipythonPort # IPython port (set in notebook config)
14 |     master.vm.network :forwarded_port, host: 4040, guest: 4040 # Spark UI (Driver)
15 |     master.vm.hostname = "sparkvm64"
16 |
17 |     master.vm.provider :virtualbox do |v|
18 |       v.name = master.vm.hostname.to_s
19 |       v.customize ["modifyvm", :id, "--memory", "#{boxRam}"]
20 |       v.customize ["modifyvm", :id, "--cpus", "#{boxCpus}"]
21 |     end
22 |     master.vm.provision :shell, :path => "provision_spark_node.sh"
23 |   end
24 | end
25 |
-------------------------------------------------------------------------------- /Spark-IPython-64bit/apache-mirror-selector.py: --------------------------------------------------------------------------------
1 | #! /usr/bin/env python
2 |
3 | # https://github.com/y-higuchi/apache-mirror-selector
4 |
5 | import sys, argparse
6 | from urllib2 import urlopen
7 | from json import loads
8 |
9 | class UsageOnErrorParser(argparse.ArgumentParser):
10 |     def error(self, message):
11 |         sys.stderr.write('argument error: %s\n' % message)
12 |         self.print_help()
13 |         sys.exit(2)
14 |
15 | parser = UsageOnErrorParser(description='Print preferred Apache mirror URL.')
16 | parser.add_argument('url', type=str, help='Apache mirror selector url.')
17 |
18 | args = parser.parse_args()
19 |
20 | jsonurl = args.url + '&asjson=1'
21 |
22 | body = urlopen(jsonurl).read().decode('utf-8')
23 | mirrors = loads(body)
24 | print(mirrors['preferred'] + mirrors['path_info'])
25 |
-------------------------------------------------------------------------------- /Spark-IPython-64bit/ipython-pyspark.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | # https://github.com/felixcheung/vagrant-projects
4 |
5 | import os
6 | import sys
7 |
8 | #-----------------------
9 | # PySpark
10 | #
11 |
12 | master = 'local[*]'
13 |
14 | num_executors = 12 #24
15 | executor_cores = 2
16 | executor_memory = '1g' #10g
17 |
18 | pyspark_submit_args = os.getenv('PYSPARK_SUBMIT_ARGS', None)
19 | if not pyspark_submit_args:
20 |     pyspark_submit_args = '--num-executors %d --executor-cores %d --executor-memory %s' % (num_executors, executor_cores, executor_memory)
21 | pyspark_submit_args = '--master %s %s' % (master, pyspark_submit_args)
22 |
23 | if not os.getenv('PYSPARK_PYTHON', None):
24 |     os.environ['PYSPARK_PYTHON'] = sys.executable
25 | os.environ['PYSPARK_DRIVER_PYTHON']='ipython' # PySpark Driver (ie. IPython)
26 |
27 | ip = '*' # Warning: this is potentially insecure
28 | port = 1088
29 |
30 | os.environ['PYSPARK_DRIVER_PYTHON_OPTS'] = 'notebook --ip=%s --port=%s --no-browser --notebook-dir=/vagrant' % (ip, port)
31 |
32 | try:
33 |     print 'Launching PySpark with IPython Notebook\n'
34 |     cmd = 'pyspark %s' % pyspark_submit_args
35 |     os.system(cmd)
36 |     sys.exit(0)
37 | except KeyboardInterrupt:
38 |     print 'Aborted\n'
39 |     sys.exit(1)
40 |
-------------------------------------------------------------------------------- /Spark-IPython-64bit/provision_spark_node.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | echo "== start vm provisioning $(date +'%Y/%m/%d %H:%M:%S')"
4 | STARTTIME=$(date +%s)
5 |
6 | sudo apt-get update && sudo apt-get -y upgrade
7 |
8 | # OpenJDK 1.8.0_45
9 | sudo apt-get -y install openjdk-8-jre-headless
10 |
11 | # Set JAVA_HOME
12 | java -version
13 | echo '' >> /etc/profile
14 | echo '# set JAVA_HOME' >> /etc/profile
15 | echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64' >> /etc/profile
16 | export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
17 | echo "JAVA_HOME=${JAVA_HOME}"
18 |
19 | # Spark
20 | pushd ~
21 | echo "Getting Spark..."
22 | cp /vagrant/apache-mirror-selector.py ./
23 | chmod 700 apache-mirror-selector.py
24 | wget -q `./apache-mirror-selector.py http://www.apache.org/dyn/closer.cgi?path=spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz`
25 | sudo cp ./spark-1.5.1-bin-hadoop2.6.tgz /opt
26 | pushd /opt
27 | sudo tar -xzf spark-*
28 | sudo rm -f spark-*.tgz
29 | cd spark-*
30 | SPARKHOME=$(pwd)
31 | echo '' >> /etc/profile
32 | echo '# set SPARK_HOME and PATH' >> /etc/profile
33 | echo "export SPARK_HOME=${SPARKHOME}" >> /etc/profile
34 | echo 'export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH' >> /etc/profile
35 | export SPARK_HOME=$SPARKHOME
36 | export PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH
37 | echo "SPARK_HOME=${SPARK_HOME}"
38 | popd
39 | rm -f apache-mirror-selector.py
40 | popd
41 |
42 | sudo apt-get -y install pkg-config
43 | sudo apt-get -y install python-pip
44 |
45 | # matplotlib
46 | # required to get freetype, png
47 | sudo apt-get -y install python-matplotlib
48 |
49 | # IPython notebook
50 | sudo apt-get -y install libzmq-dev
51 | # required to get pyzmq
52 | sudo apt-get -y install python-dev
53 | sudo python -m pip install "ipython[notebook]" --upgrade
54 | IPYTHONVER=`ipython -V`
55 | echo "IPython version ${IPYTHONVER}"
56 |
57 | # Start IPython notebook
58 | cd ~
59 | cp /vagrant/ipython-pyspark.py ~/
60 | ~/ipython-pyspark.py
61 |
62 | echo "== end vm provisioning $(date +'%Y/%m/%d %H:%M:%S')"
63 | echo "== $(($(date +%s) - $STARTTIME)) seconds"
64 |
-------------------------------------------------------------------------------- /Spark-IPython-Zeppelin-Lightning/CabinSketch-Bold.ttf: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/felixcheung/vagrant-projects/0967c95f08a378fa94097c6deaae5d04e18972d9/Spark-IPython-Zeppelin-Lightning/CabinSketch-Bold.ttf
-------------------------------------------------------------------------------- /Spark-IPython-Zeppelin-Lightning/Humor-Sans-1.0.ttf: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/felixcheung/vagrant-projects/0967c95f08a378fa94097c6deaae5d04e18972d9/Spark-IPython-Zeppelin-Lightning/Humor-Sans-1.0.ttf
-------------------------------------------------------------------------------- /Spark-IPython-Zeppelin-Lightning/README.md: --------------------------------------------------------------------------------
1 | # Spark-IPython-Zeppelin-Lightning
2 |
3 | Set up two Vagrant nodes, with Spark, IPython, and Zeppelin on 'node' and Lightning on 'lgn'
4 |
5 | ### Content
6 |
7 | CabinSketch-Bold.ttf - used in the Word2Vec demo, https://github.com/korin/stupid_captcha
8 | Humor-Sans-1.0.ttf - required for xkcd plots, http://antiyawn.com/uploads/Humor-Sans-1.0.ttf
9 | Vagrantfile
10 | database.js - Lightning database config
11 | ipython-pyspark.py - IPython setup/launch script
12 | lightning.tar.gz - Lightning
13 | pg_hba.conf - Postgres config
14 | postgresql.conf - Postgres config
15 | provision_lgn_app.sh - provisioning, Lightning on lgn
16 | provision_spark_app.sh - provisioning, Spark app and demo stuff
17 | provision_spark_node.sh - provisioning, Spark on node
18 | spark-ml-streaming.tar.gz - Streaming k-means project
19 |
20 | ### Prereq
21 |
22 | Vagrant, VirtualBox as a provider
23 |
24 | Lightning and Streaming k-means builds are included for convenience, but building your own is recommended.
25 |
26 | Unfortunately these are too large for GitHub:
27 | spark-1.3.0-bin-hadoop2.4.tgz
28 | - Spark 1.3.0 official release; you can download it from http://spark.apache.org/downloads.html - choose 1.3.0, Hadoop 2.4
29 |
30 | zeppelin-0.5.0-SNAPSHOT.tar.gz
31 | - Zeppelin build, see https://github.com/apache/incubator-zeppelin
32 | - Build with
33 | `mvn clean package -Pspark-1.3 -Phadoop-2.4 -Dhadoop.version=2.5.0 -P build-distr -DskipTests`
34 |
35 | ### Preparation
36 |
37 | - Go to the local directory and run `vagrant up`
38 | - Vagrant will then prepare the VMs - this could take ~1 hour and requires an Internet connection to download various content from sources
39 | - Zeppelin and Lightning are started automatically by the provisioning script; if you see the Lightning logo then the preparation & provisioning steps are complete
40 | - Use `vagrant ssh` to connect to these machines
41 |
42 | ### Start/Stop
43 |
44 | #### IPython
45 |
46 | `vagrant ssh node`
47 |
48 | stop: Ctrl-C to break
49 | start:
50 | `$ MASTER=local[*] ./ipython-pyspark.py`
51 |
52 | Connect to http://localhost:8888
53 |
54 | #### Zeppelin
55 |
56 | `vagrant ssh node`
57 |
58 | stop:
59 | `$ sudo /opt/zeppelin-0.5.0-SNAPSHOT/bin/zeppelin-daemon.sh stop`
60 | start:
61 | `$ sudo SPARK_HOME=/opt/spark-1.3.0-bin-hadoop2.4 PYSPARK_PYTHON=/usr/local/bin/python2.7 LD_LIBRARY_PATH=/usr/local/lib /opt/zeppelin-0.5.0-SNAPSHOT/bin/zeppelin-daemon.sh start`
62 |
63 | Connect to http://localhost:8080
64 |
65 | #### Lightning
66 |
67 | `vagrant ssh lgn`
68 |
69 | stop: Ctrl-C to break
70 | start:
71 | ```
72 | $ cd /opt/lightning
73 | $ sudo npm start
74 | ```
75 |
76 | Connect to http://localhost:3000
77 |
78 | #### Streaming k-means
79 |
80 | `vagrant ssh node`
81 |
82 | Driver script:
83 | ```
84 | $ cd ~/streamingcluster/bin
85 | $ /usr/local/bin/python2.7 streaming-kmeans -nc 4 -nd 2 -hl 10 -nb 100 -tu points
86 | ```
87 |
88 | IPython notebook - Lightning client:
89 | You can get it from https://github.com/felixcheung/spark-ml-streaming
90 |
91 | More information to come!
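Until then, a minimal sketch of driving Lightning from the IPython notebook side (this assumes the `lightning-python` client package is installed; the data values are made up for illustration):

```python
from lightning import Lightning

# Point the client at the Lightning server provisioned on the 'lgn' VM
lgn = Lightning(host='http://localhost:3000')

# Push a simple scatter plot to the server
viz = lgn.scatter([1, 2, 3, 4], [5, 3, 8, 2])
```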
92 |
-------------------------------------------------------------------------------- /Spark-IPython-Zeppelin-Lightning/Vagrantfile: --------------------------------------------------------------------------------
1 | # -*- mode: ruby -*-
2 | # vi: set ft=ruby :
3 |
4 | # Configuration parameters
5 | centosBoxName = "centos65-x86_64-20140116"
6 | centosBoxUrl = "https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box"
7 |
8 | masterRam = 2048 # Ram in MB for the main node
9 | masterCpus = 1 # Number of CPU core for the main node
10 |
11 | # For some reason, VirtualBox provider can only route host->VM at 192.168.128.1
12 | # Also not setting virtualbox__intnet: "networkname" which seems to help
13 | privateNetworkIp = "192.168.128.1" # Starting IP range for the private network between nodes
14 | ipythonPort = 8888 # Ipython port to forward (set in IPython notebook config)
15 |
16 | # Do not edit below this line
17 | # --------------------------------------------------------------
18 | privateSubnet = privateNetworkIp.split(".")[0...3].join(".")
19 | privateStartingIp = privateNetworkIp.split(".")[3].to_i
20 |
21 | # Create hosts data
22 | idNode = 1
23 | hosts = "#{privateSubnet}.#{privateStartingIp + idNode} node node\n"
24 | idLgn = 0
25 | hosts << "#{privateSubnet}.#{privateStartingIp + idLgn} lgn lgn\n"
26 |
27 | $hosts_data = <