├── .gitignore ├── .gitmodules ├── README.md ├── Vagrantfile ├── dev.md ├── figures ├── arch.dot ├── arch.svg ├── arch_vm.dot └── arch_vm.svg ├── index.html ├── play.sh ├── provision.bash ├── python ├── _version.py ├── gen_cert.sh ├── setup.py └── sqlflow_playground │ ├── __init__.py │ ├── k8s.py │ ├── playground_server_design.md │ └── server.py ├── release.sh └── start.bash /.gitignore: -------------------------------------------------------------------------------- 1 | *.log 2 | *~ 3 | .vagrant 4 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "sqlflow"] 2 | path = sqlflow 3 | url = https://github.com/sql-machine-learning/sqlflow 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Release SQLFlow Desktop Distribution as a VM Image 2 | 3 | This is experimental work to deploy the whole 4 | [SQLFlow](https://sqlflow.org/sqlflow) service mesh on a Windows, Linux, 5 | or macOS desktop. 6 | 7 | The general architecture of SQLFlow is as follows: 8 | 9 |  10 | 11 | In this deployment, we have the Jupyter Notebook server, SQLFlow server, 12 | and MySQL running in a container executing the 13 | `sqlflow/sqlflow:latest` image. Argo runs on a minikube cluster 14 | running on the VM. The deployment is shown in the following figure: 15 | 16 |  17 | 18 | I chose this deployment plan for the following reasons: 19 | 20 | 1. We don't have a well-written local workflow engine, and at the 21 | moment, we need to focus on the Kubernetes-native engine. 22 | So, we use minikube and install Argo on minikube. 23 | 24 | 1. We can install minikube directly on users' desktop computers 25 | running Windows, Linux, or macOS. However, writing a shell script to 26 | do that requires us to consider many edge cases.
To have a clean 27 | deployment environment, I introduced a VM. 28 | 29 | 1. To make the VM manageable in a programmatic way, I used Vagrant. 30 | Please be aware that Vagrant is the only software users need to 31 | install to use SQLFlow on their desktop computer. And Vagrant 32 | provides official support for Windows, Linux, and macOS. 33 | 34 | 1. We could run the SQLFlow server container (`sqlflow/sqlflow:latest`) 35 | on minikube as well, but that would make it harder to expose ports. 36 | Running the container directly in the VM but outside of minikube, we 37 | 38 | 1. expose the in-container port by adding an `EXPOSE` statement to the 39 | Dockerfile, and 40 | 1. expose the Docker port for access from outside the VM by 41 | adding the following code snippet to the Vagrantfile. 42 | 43 | ```ruby 44 | config.vm.network "forwarded_port", guest: 3306, host: 3306 45 | config.vm.network "forwarded_port", guest: 50051, host: 50051 46 | config.vm.network "forwarded_port", guest: 8888, host: 8888 47 | ``` 48 | -------------------------------------------------------------------------------- /Vagrantfile: -------------------------------------------------------------------------------- 1 | # -*- mode: ruby -*- 2 | # vi: set ft=ruby : 3 | 4 | Vagrant.configure("2") do |config| 5 | config.vm.box = "ubuntu/bionic64" 6 | config.vm.provision "shell", path: "provision.bash" 7 | 8 | # Enlarge disk size from the default '10G' to '20G'. 9 | # This needs the vagrant-disksize plugin, which is installed in play.sh. 10 | config.disksize.size = '20GB' 11 | 12 | # Don't forward 22. Even if we do so, the exposed port only binds 13 | # to 127.0.0.1, not 0.0.0.0. Other ports bind to all IPs.
14 | config.vm.network "forwarded_port", guest: 3306, host: 3306, 15 | auto_correct: true 16 | config.vm.network "forwarded_port", guest: 50051, host: 50051, 17 | auto_correct: true 18 | # Jupyter Notebook 19 | config.vm.network "forwarded_port", guest: 8888, host: 8888, 20 | auto_correct: true 21 | # minikube dashboard 22 | config.vm.network "forwarded_port", guest: 9000, host: 9000, 23 | auto_correct: true 24 | # Argo dashboard 25 | config.vm.network "forwarded_port", guest: 9001, host: 9001, 26 | auto_correct: true 27 | 28 | config.vm.provider "virtualbox" do |v| 29 | v.memory = 8192 30 | v.cpus = 4 31 | end 32 | 33 | # Bind the host directory ./ into the VM. 34 | config.vm.synced_folder "./", "/home/vagrant/desktop" 35 | end 36 | -------------------------------------------------------------------------------- /dev.md: -------------------------------------------------------------------------------- 1 | ## Develop, Release, and Use SQLFlow-in-a-VM 2 | 3 | ### For Developers 4 | 5 | 1. Install [VirtualBox 6.1.6](https://www.virtualbox.org/) and [Vagrant 2.2.7](https://www.vagrantup.com/) on a computer with a relatively large amount of memory; a host with 16G of memory and 8 cores is recommended. 6 | 1. Clone the `SQLFlow playground` project and update its submodules. 7 | ```bash 8 | git clone https://github.com/sql-machine-learning/playground.git 9 | cd playground 10 | git submodule update --init 11 | ``` 12 | 1. Run `play.sh` under the playground's root directory. This script will guide you through installing SQLFlow on a VirtualBox VM. If you have a slow Internet connection to Vagrant Cloud, you might want to download the Ubuntu VirtualBox image manually from a mirror site into `~/.cache/sqlflow/` before running the script. We use `wget -c` here to resume the download from the last breakpoint, so if the command fails, just re-run it.
13 | ```bash 14 | # download the Vagrant image manually (optional) 15 | mkdir -p $HOME/.cache/sqlflow 16 | wget -c -O $HOME/.cache/sqlflow/ubuntu-bionic64.box \ 17 | "https://mirrors.ustc.edu.cn/ubuntu-cloud-images/bionic/current/bionic-server-cloudimg-amd64-vagrant.box" 18 | 19 | ./play.sh 20 | ``` 21 | `play.sh` installs some Vagrant plugins, like `vagrant-disksize`, which enlarges the disk size of the VM. The script then calls the `vagrant up` command to boot up the VM. After the VM is up, `provision.bash` runs automatically and installs the dependencies for SQLFlow. Provisioning is a one-shot job; after it is done, we have an environment with SQLFlow, Docker, and minikube installed. 22 | 23 | 1. Log on to the VM and start the SQLFlow playground by running the `start.bash` script; it will pull some Docker images and start the playground minikube cluster. Because pulling the images may be slow, the script might fail sometimes. Feel free to re-run it until you get output like `Access Jupyter Notebook at ...`. 24 | ```bash 25 | vagrant ssh 26 | sudo su 27 | cd desktop 28 | ./start.bash 29 | ``` 30 | 1. After minikube has started, you can access the `Jupyter Notebook` from your desktop, or use the SQLFlow command-line tool [sqlflow](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/run/cli.md) to access the `SQLFlow server`. Just follow the output of `start.bash`; it will give you some hints. 31 | 1. After playing for a while, you may want to stop the SQLFlow playground; just log on to the VM again and stop the minikube cluster. 32 | ```bash 33 | vagrant ssh # optional if you are already logged on 34 | minikube stop 35 | ``` 36 | 1. Finally, if you want to stop the VM, run the `vagrant halt` command. To completely destroy the VM, run the `vagrant destroy` command. 37 | 38 | ### For Releasers 39 | 40 | The releaser, who in most cases is a developer, can export a running VirtualBox VM into a VM image file with the extension `.ova`.
An `ova` file is a tarball of a directory whose content follows the OVF specification. For the concepts, please refer to this [explanation](https://damiankarlson.com/2010/11/01/ovas-and-ovfs-what-are-they-and-whats-the-difference/). 41 | 42 | According to this [tutorial](https://www.techrepublic.com/article/how-to-import-and-export-virtualbox-appliances-from-the-command-line/), releasers can call the VBoxManage command to export a VM. We have written a script to do this. Simply run the script below to export our playground. It will create a file named `SQLFlowPlayground.ova`, which we can import through the VirtualBox GUI. 43 | 44 | ```bash 45 | ./release.sh 46 | ``` 47 | 48 | ### For End-users 49 | 50 | To run SQLFlow on a desktop computer running Windows, Linux, or macOS, follow the steps below: 51 | 1. Install [VirtualBox](https://www.virtualbox.org/) (v6.1.6 is recommended). 52 | 53 | 1. Download the released VirtualBox `.ova` file; you have two choices: 54 | - the minimal image (about 600M): ships with all bootstrap files but no dependency Docker images. When you start the playground, you will wait a while for it to download the latest Docker images, the minikube framework, and other packages. 55 | ```bash 56 | wget -c http://cdn.sqlflow.tech/latest/SQLFlowPlaygroundBare.ova 57 | ``` 58 | - the fully installed image (about 2G): ships with all dependencies, so no extra downloading is needed at startup. Note that in this case the images will not be updated automatically; you will have to update them manually when needed. 59 | ```bash 60 | wget -c http://cdn.sqlflow.tech/latest/SQLFlowPlaygroundFull.ova 61 | ``` 62 | 1. Optionally, download the [sqlflow](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/run/cli.md) command-line tool released by SQLFlow CI. 63 | 64 | After VirtualBox is installed, you can import the `.ova` file and start a VM.
If your host has a lower-end configuration, you can adjust the CPU core count and RAM amount in VirtualBox's settings panel, say, to 2 cores and 4G of RAM. After that, you can log in to the system through the VirtualBox GUI or through an SSH connection like the one below. The default password of `root` is `sqlflow`. 65 | ```bash 66 | ssh -p2222 root@127.0.0.1 67 | root@127.0.0.1's password: sqlflow 68 | ``` 69 | Once logged in to the VM, you will immediately see a script named `start.bash`; just run it to start the SQLFlow playground. It will output some hint messages for you; follow those hints, and after a while you will see something like `Access Jupyter Notebook at: http://127.0.0.1:8888/...`, which means we are all set. Copy the link into your web browser and you will see SQLFlow's Jupyter Notebook user interface. Enjoy! 70 | ```bash 71 | ./start.bash 72 | ``` 73 | 74 | Or, if you have an AWS or Google Cloud account, you can upload the `.ova` file and start the VM on the cloud. AWS users can follow [these steps](https://aws.amazon.com/ec2/vm-import/). 75 | 76 | Given a running VM, the end-user can run the following command to connect to it: 77 | 78 | ```bash 79 | sqlflow --sqlflow-server=my-vm.aws.com:50051 80 | ``` 81 | 82 | ### For End-users with Kubernetes (without a VM) 83 | 84 | The SQLFlow playground now also supports installing directly on Kubernetes. Users can refer to [this doc](https://github.com/sql-machine-learning/sqlflow/blob/develop/doc/run/kubernetes.md) for a fast deployment.
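Across the deployments above, the client-facing endpoints are MySQL (3306), the SQLFlow gRPC server (50051), and Jupyter Notebook (8888). A quick way to sanity-check a running playground from the host is to probe those ports. The sketch below is not part of this repository; it assumes `bash` with `/dev/tcp` redirection support, and `SQLFLOW_HOST` is a hypothetical environment variable for pointing the check at a remote VM instead of the local forwarded ports.

```shell
#!/usr/bin/env bash
# Probe the ports the Vagrantfile forwards to the host, then print the
# sqlflow CLI command for the gRPC endpoint. SQLFLOW_HOST is an assumed
# variable name; it defaults to the locally forwarded ports.

endpoint() {  # compose host:port for the sqlflow --sqlflow-server flag
  echo "${1}:${2}"
}

port_open() {  # succeed if a TCP connection to host $1, port $2 opens
  # /dev/tcp is a bash feature, not a real device file.
  (exec 3<>"/dev/tcp/${1}/${2}") 2>/dev/null
}

HOST="${SQLFLOW_HOST:-127.0.0.1}"
for port in 3306 50051 8888; do
  if port_open "$HOST" "$port"; then
    echo "port ${port}: reachable"
  else
    echo "port ${port}: not reachable -- is the playground running?"
  fi
done

echo "connect with: sqlflow --sqlflow-server=$(endpoint "$HOST" 50051)"
```

If all three ports report reachable, the playground is up; otherwise, re-run `start.bash` inside the VM and check the port forwardings in the Vagrantfile.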
85 | -------------------------------------------------------------------------------- /figures/arch.dot: -------------------------------------------------------------------------------- 1 | digraph G { 2 | node [shape=box]; 3 | 4 | User1 [shape=oval, label="Lily"]; 5 | User2 [shape=oval, label="Bob"]; 6 | User3 [shape=oval, label="Eva"]; 7 | 8 | {rank = same; User1; User2; User3} 9 | 10 | Browser1 [label="Web browser"]; 11 | Browser2 [label="Web browser"]; 12 | 13 | {rank = same; Browser1, Browser2, Client} 14 | 15 | Jupyter [label="Jupyter Notebook server +\n SQLFlow magic command"]; 16 | SQLFlow [label="SQLFlow server"]; 17 | Argo [label="Tekton on Kubernetes\n(each workflow step is a container)"]; 18 | AI [label="AI engine\n(Alibaba PAI, Kubeflow+Kubernetes, etc)"]; 19 | DBMS [label="database system\n(Hive, MySQL, MaxCompute, etc)"]; 20 | 21 | User1 -> Browser1; 22 | User2 -> Browser2; 23 | Browser1 -> Jupyter [label="SQL/Flow program"]; 24 | Browser2 -> Jupyter; 25 | 26 | Jupyter -> SQLFlow [label="SQL/Flow program"]; 27 | SQLFlow -> Argo [label="Argo workflow"]; 28 | Argo -> DBMS [label="submit SQL statement"]; 29 | Argo -> AI [label="submit AI job"]; 30 | Argo -> DBMS [label="verify data schema"]; 31 | 32 | Client [label="sqlflow command-line client"]; 33 | 34 | User3 -> Client; 35 | Client -> SQLFlow [label="SQL/Flow program"]; 36 | } 37 | -------------------------------------------------------------------------------- /figures/arch.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 6 | 7 | 157 | -------------------------------------------------------------------------------- /figures/arch_vm.dot: -------------------------------------------------------------------------------- 1 | digraph G { 2 | node [shape=box]; 3 | 4 | User1 [shape=oval, label="Lily"]; 5 | User2 [shape=oval, label="Bob"]; 6 | User3 [shape=oval, label="Eva"]; 7 | 8 | {rank = same; User1; User2; User3} 9 | 10 | Browser1 [label="Web browser"]; 11
| Browser2 [label="Web browser"]; 12 | 13 | {rank = same; Browser1, Browser2, Client} 14 | 15 | subgraph cluster_vm { 16 | label="VM" 17 | subgraph cluster_container { 18 | label="sqlflow/sqlflow:latest"; 19 | Jupyter [label="Jupyter Notebook server +\n SQLFlow magic command"]; 20 | SQLFlow [label="SQLFlow server"]; 21 | DBMS [label="MySQL"]; 22 | } 23 | subgraph cluster_minikube { 24 | label="minikube"; 25 | Argo [label="Argo"]; 26 | AI [label="AI engine:\ncontainer-local run"]; 27 | } 28 | } 29 | 30 | User1 -> Browser1; 31 | User2 -> Browser2; 32 | Browser1 -> Jupyter [label="SQL/Flow program"]; 33 | Browser2 -> Jupyter; 34 | 35 | Jupyter -> SQLFlow [label="SQL/Flow program"]; 36 | SQLFlow -> Argo [label="Argo workflow"]; 37 | Argo -> DBMS [label="submit SQL statement"]; 38 | Argo -> AI [label="submit AI job"]; 39 | Argo -> DBMS [label="verify data schema"]; 40 | 41 | Client [label="sqlflow command-line client"]; 42 | 43 | User3 -> Client; 44 | Client -> SQLFlow [label="SQL/Flow program"]; 45 | } 46 | -------------------------------------------------------------------------------- /figures/arch_vm.svg: -------------------------------------------------------------------------------- 1 | 2 | 4 | 6 | 7 | 170 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 30 | 31 | 32 |
33 | 34 | 35 | 38 | 39 | 40 |