├── .gitignore
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | **.html
2 | *.sublime-project
3 | *.sublime-workspace
4 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Julia and R on Amazon EC2
2 |
3 | Thank you to @arnonerba for figuring out how to get Julia compiled with Intel MKL and for the major revisions below.
4 |
5 | ## Purpose
6 |
7 | This guide explains how to set up a simple Amazon Linux 2 instance on EC2 for use with R and Julia. Renting EC2 space can be quite cheap, especially if you use a Spot Instance. Pricing is a continuous uniform-price auction, and if you are outbid, your instance is terminated with only a two-minute warning.
8 |
9 | The guide assumes basic familiarity with UNIX-like systems (e.g., navigating the file structure, copying, moving, ssh, etc.). Note that Windows 10 now includes the "Windows Subsystem for Linux" (WSL), which provides a very nice terminal environment ([MSDN setup guide](https://msdn.microsoft.com/en-us/commandline/wsl/install_guide)). The Git Bash terminal is also a good choice for Windows. macOS users may make use of the built-in macOS terminal.
10 |
11 | If you have suggestions, pull requests & edits are welcome!
12 |
13 | ## Launch an EC2 Instance
14 |
15 | 1. Sign up for an Amazon AWS account. Sign up for a [GitHub education pack](https://education.github.com/pack) if eligible and you may get some free Amazon AWS credits.
16 | 2. Spin up an Amazon Linux 2 instance. The "compute optimized" tier is recommended, and you should not need more than 16 GB of storage.
17 | 3. Create a new SSH key pair when prompted, or choose one already saved in your AWS account. If you create a new pair, you will be asked to download your private key.
18 |    - **Your private key must be kept secure**.
By convention it should be placed in your local `~/.ssh` directory and be protected by either `0400` or `0600` permissions [(note)](https://superuser.com/questions/215504/permissions-on-private-key-in-ssh-folder). Your `~/.ssh` directory should have `0700` permissions.
19 |    - Private keys may take several different forms:
20 |       + `*.pem` - Standard file format for cryptographic keys/certificates. AWS uses this format.
21 |       + `*.key` - Alternate file extension for a PEM file only containing a private key.
22 |       + `*.ppk` - Proprietary PuTTY format for private keys. PuTTY does not support the PEM format.
23 |    - Public keys utilize the `*.pub` extension, but when copied to a server are appended to your remote `~/.ssh/authorized_keys` file. The presence of your public key in this **remote** file grants you access to the server.
24 |       + If you are manually copying a new public key to an instance you already have access to, use the `ssh-copy-id` command [(note)](https://askubuntu.com/questions/4830/easiest-way-to-copy-ssh-keys-to-another-machine). Otherwise, the AWS setup guide handles this process for you.
25 | 4. Connect to your EC2 instance via SSH. You can find the IP address/hostname of your instance in your AWS dashboard.
26 |    - Append the following to your local `~/.ssh/config` file, substituting the appropriate values as necessary:
27 |      ```shell
28 |      Host your_server_name
29 |          HostName your_ip_address_or_hostname
30 |          User ec2-user
31 |          IdentityFile ~/.ssh/your_private_key.pem
32 |      ```
33 |    - Then, SSH into the server with `ssh your_server_name`.
34 |    - Alternatively, you can skip the instructions above and connect directly with:
35 |      ```shell
36 |      ssh ec2-user@your_ip_address_or_hostname -i ~/.ssh/your_private_key.pem
37 |      ```
38 |
39 | ## Install Software
40 |
41 | ### Git
42 | - Install Git:
43 |   ```shell
44 |   sudo yum install git
45 |   ```
46 | - To push and pull from GitHub over SSH, you will need another public/private key pair that is tied to your GitHub account [(note)](https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/). If you do not have a key pair, generate one on your EC2 instance with `ssh-keygen` and add the public key to your GitHub account. If you already have an authorized key pair, copy the private key to your EC2 instance and place it in your remote `~/.ssh` directory:
47 |    + On the local machine, navigate to the directory that holds the relevant keys (usually `~/.ssh` or `%USERPROFILE%/.ssh`).
48 |    + Use `sftp` to put your `github_rsa` private key on the remote server.
49 |    + Exit `sftp`, and then `ssh` back into the server.
50 |    + Move the private key into `~/.ssh`: `mv github_rsa .ssh/`.
51 |    + Check that the permissions are correct: `ls -al .ssh`. The key should have `0400` or `0600` permissions; if it does not, run `chmod 600 .ssh/github_rsa`.
52 |
53 | ### Git LFS
54 | - See [install guide](https://github.com/git-lfs/git-lfs#getting-started). I had to use [PackageCloud](https://packagecloud.io/github/git-lfs/install) to install from the command line.
55 |   ```shell
56 |   curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.rpm.sh | sudo bash
57 |   sudo yum install git-lfs
58 |   git lfs install # only run once for initial install
59 |   ```
60 |
61 | ### LFTP
62 | - Install LFTP to connect to Box accounts.
63 |   ```shell
64 |   sudo yum install lftp
65 |   ```
66 |
67 | ### Intel MKL
68 | - Intel MKL is [available for free](https://software.intel.com/en-us/articles/how-to-get-intel-mklippdaal) from Intel.
The [yum repository](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo) can be easily added on an Amazon Linux 2 system:
69 |   ```shell
70 |   sudo yum-config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
71 |   sudo rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
72 |   ```
73 | - Multiple versions of MKL are available, but the latest can be easily installed:
74 |   ```shell
75 |   sudo yum install intel-mkl
76 |   ```
77 |
78 | ### Julia (Build From Source)
79 |
80 | These instructions are for building Julia from source. Binary files are also available on the [Julia Downloads page](https://julialang.org/downloads/), and installation instructions are available [here](https://julialang.org/downloads/platform.html).
81 |
82 | - First, install the necessary dependencies:
83 |   ```shell
84 |   sudo yum groupinstall 'Development Tools'
85 |   sudo yum install make gcc gcc-c++ libatomic python gcc-gfortran perl wget m4 patch pkgconfig
86 |   sudo yum autoremove cmake # default version is too old
87 |   ```
88 | - Download the Julia source code:
89 |   ```shell
90 |   wget https://github.com/JuliaLang/julia/archive/v1.0.1.tar.gz
91 |   ```
92 | - Extract the Julia source code tarball and move it to `/usr/local`:
93 |   ```shell
94 |   tar -xzvf v1.0.1.tar.gz
95 |   sudo mv julia-1.0.1/ /usr/local/julia-1.0.1/ # writing to /usr/local requires root
96 |   cd /usr/local/julia-1.0.1/
97 |   ```
98 | - (Optional) Enable MKL in Julia:
99 |   ```shell
100 |   source /opt/intel/bin/compilervars.sh intel64
101 |   echo "USE_INTEL_MKL = 1" > Make.user
102 |   ```
103 | - Use `make` to compile Julia:
104 |   ```shell
105 |   ./contrib/download_cmake.sh # force Julia to build an updated version of cmake
106 |   make -j4 # where '4' is the number of available CPU threads
107 |   ```
108 | - Symlink Julia to `/usr/local/bin`:
109 |   ```shell
110 |   sudo ln -s /usr/local/julia-1.0.1/julia /usr/local/bin/julia
111 |   ```
112 | - Test MKL integration in a Julia prompt:
113 |   ```julia
114 |   using LinearAlgebra
115 |   LinearAlgebra.BLAS.vendor()
116 |   ```
117 |
118 | ### Julia Packages
119 | - Open up a Julia prompt and install packages into the default folder:
120 |   ```bash
121 |   ]add AxisAlgorithms BenchmarkTools Calculus CategoricalArrays DataFrames Distributions FileIO Formatting GLM GR Gadfly IndirectArrays Interpolations JLD2 MixedModels NLSolversBase NLopt Optim PkgDev Plots Primes Profile ProgressMeter PyPlot RData Ratios StatsBase StatsFuns StatsModels
122 |   ```
123 | - ~~Initialize package repo with `Pkg.init()` in julia~~ (deprecated in v0.7)
124 | - ~~Bulk install by updating `REQUIRE` in `~/.julia/v0.x/REQUIRE` and running `Pkg.resolve()`. You may need to run julia as `sudo` with elevated privileges, but hopefully not.~~ (deprecated in v0.7)
125 |
126 | ### R
127 |
128 | No guide yet for installing R or RStudio Server. See [Louis Aslett's page](http://www.louisaslett.com/RStudio_AMI/).
129 |
130 | ## Using Amazon EFS
131 |
132 | A great way to make sure log files persist across sessions, even if your spot instance gets killed, is to use Amazon's EFS storage. EFS storage is accessible only by instances in the same security group in the same region. So to access files there, you have to go through a running instance: you cannot SSH directly to the EFS drive.
133 |
134 | - Add an EFS instance (encrypted?)
135 | - Install EFS utilities
136 |    + If running Amazon Linux: `sudo yum install -y amazon-efs-utils`
137 |    + If running Ubuntu:
138 |       * install `amazon-efs-utils` (manually?)
using
139 |       * `sudo apt-get install libssl-dev`
140 |       * Upgrade `stunnel` and symlink it: `sudo ln -s /usr/local/bin/stunnel /bin/stunnel` ~~and/or `sudo ln -s /usr/bin/stunnel /bin/stunnel`~~
141 | - **Make sure that EFS and EC2 instances are in the same security group (SG), and that the SG has an inbound rule allowing NFS traffic from the same SG**
142 | - One-time mounting can be done with `sudo mount -t efs fs-[INSTANCE_ID]:/ [TARGET MOUNT POINT]`
143 | - Permanent mounting can be done by opening `/etc/fstab` with `sudo` privileges and adding a new line: `fs-[INSTANCE_ID]:/ [TARGET MOUNT POINT SUCH AS /mnt/efs] efs defaults,_netdev,nofail 0 0`
144 | - It can be nice to symlink the EFS volume to a directory: `ln -s /mnt/efs efsdir`.
145 |
146 | ## How to get stuff done in the terminal/REPL
147 |
148 | 1. Launch your AMI from the EC2 console: `AMI > Select on your AMI > Under "Actions," select "Spot Request"`. Request a big instance, and set the MAX price you are willing to pay per hour (the price you actually pay is usually much lower than this).
149 | 2. Once your instance is running (can take a bit), SSH into it & get to work or point your browser to the relevant IP address.
150 |
151 | ## Set up notifications when your job dies
152 |
153 | Sometimes things error out. We have not yet figured out how to get the instance to email us when a job errors out. However, to be notified when an instance's CPU utilization falls below a threshold:
154 |
155 | - Create a subscription at [AWS SNS](https://console.aws.amazon.com/sns/v2/home)
156 | - Under EC2 instances > Monitoring, click "create alarm". Set alarm for CPU utilization <= X pct for at least Y min.
157 |
158 | ## Using nohup
159 |
160 | `nohup` runs a script so that it keeps running after your terminal session is killed. Combined with output redirection, it lets the script dump all STDOUT and STDERR to a log file.
For example, we can run
161 |
162 | ```bash
163 | nohup julia --optimize=3 ~/dev-pkgs/ShaleDrillingEstimation/example/run_estimator.jl > ~/efs-ubuntu/JOBNAME\ "$(TZ=America/Los_Angeles date +on\ %Y-%m-%d\ at\ %Hh%Mm)"\ by\ ${MYIP}.out 2>&1 &
164 | ```
165 |
166 | where `JOBNAME` is to be filled in by you, and `MYIP` is assumed to be a shell variable that you have already set (for example, to the instance's IP address).
167 |
168 | ## Using LFTP
169 |
170 | - One can use `lftp` to transfer files between AWS instances and Box.net in lieu of Git and git-lfs. Note that special characters in the password may have to be escaped or translated to HTML.
171 |   ```shell
172 |   lftp -p 990 -u "username@institution,PASSWORD" ftps://ftp.box.com
173 |   mirror [project_dir_on_box] [remote_project_dir]
174 |   ```
175 |
176 | ## X11 and macOS
177 |
178 | If required, X11 can be easily used to run remote GUI applications on macOS.
179 | - Install [XQuartz](https://www.xquartz.org/)
180 | - Log out and log back in, then connect using `ssh -YC user@server` in Terminal to enable X11 forwarding.
181 |
182 | ## Remote Desktop Over SSH
183 |
184 | Sometimes it is nice to use a GUI. This is pretty straightforward to do on AWS or Azure Ubuntu instances. Note that the instructions below are not tested on Amazon Linux.
185 |
186 | - Install Remote Desktop
187 |    + Install `xrdp` and `xfce4` software as per . We'll connect over SSH, so no need to open a special RDP port.
188 |    + On the local machine, create an SSH tunnel to the remote with port forwarding: `ssh -L [LOCALPORT]:localhost:[3389] username@remoteip`. You can do this with a terminal, Bash for Windows, or PuTTY.
189 |    + After connecting to the remote instance, set a password on the remote machine so that RDP can log in: `sudo passwd [yourname]`. I have sometimes found that I need to do this step to set up a password each time I launch an instance... even if the AMI I'm launching had a password.
190 |    + Open Remote Desktop Connection (search for mstsc.exe on Windows) & log in to `localhost:[LOCALPORT]`
191 |    + To be able to reconnect to the same desktop, see and .
Basically, the idea is to edit the xrdp ini file to allow this. Run `sudo [gedit/pico/vim] /etc/xrdp/xrdp.ini` and change section `[xrdp1]` where it says `port=-1` to `port=ask-1`. When logging in for the first time, leave the port as `-1` and note the port number you get (will default to `5910`). Then on subsequent logins, change the port to whatever the previous one was (it *should* default to `5910`). Sessions seem to persist even when the SSH tunnel is closed.
192 | - Install the gnome terminal `sudo apt-get install gnome-terminal`, or something better than the `xfce` terminal. This should swap out automatically if you open a new terminal window.
193 | - Install unzip (at least if on Azure): `sudo apt-get install unzip` so that Julia can build `HttpParser` for Atom
194 | - Fix tab-completion by following
195 | - Install Firefox using `sudo apt-get install firefox`
196 | - Installing Atom
197 |    + Download .deb file from
198 |    + Attempt to install with `sudo dpkg -i atom-amd64.deb`
199 |    + After error, run `sudo apt-get install -f`
200 |    + Then again `sudo dpkg -i atom-amd64.deb`
201 |    + Follow to get Atom to run. You can find the file with
202 |      ```bash
203 |      dpkg -L libxcb1 # to find the file
204 |      cd /usr/share/atom
205 |      sudo cp /usr/lib/x86_64-linux-gnu/libxcb.so.1 . # copy the library into the current directory
206 |      sudo sed -i 's/BIG-REQUESTS/_IG-REQUESTS/' libxcb.so.1 # patch the local copy only
207 |      ```
208 |
--------------------------------------------------------------------------------
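The tunnelling step in the "Remote Desktop Over SSH" section can be wrapped in a small helper. The sketch below is only an illustration and is not part of the original guide: the function name `rdp_tunnel_cmd`, the user `ubuntu`, the address `203.0.113.10`, and the local port `13389` are placeholders, and the `-N` flag (open the tunnel without running a remote command) is an addition to the command shown in that section. The helper only prints the command, so you can inspect it before running it yourself.

```shell
# Hypothetical helper (an assumption, not from the guide): print the ssh
# command that forwards a local port to the default xrdp port (3389) on
# the remote instance.
# Usage: rdp_tunnel_cmd LOCALPORT USER HOST
rdp_tunnel_cmd() {
    # Reject empty or non-numeric port arguments.
    case "$1" in
        ''|*[!0-9]*)
            echo "local port must be a number" >&2
            return 1
            ;;
    esac
    # Reject values outside the valid TCP port range.
    if [ "$1" -lt 1 ] || [ "$1" -gt 65535 ]; then
        echo "local port must be between 1 and 65535" >&2
        return 1
    fi
    # -N: open the tunnel only, without starting a remote shell.
    printf 'ssh -N -L %s:localhost:3389 %s@%s\n' "$1" "$2" "$3"
}

# Example with placeholder values: print the tunnel command.
rdp_tunnel_cmd 13389 ubuntu 203.0.113.10
```

Once the printed command is running in one terminal, Remote Desktop Connection can be pointed at `localhost:13389` (or whichever local port you chose).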