├── Dockerfile
├── Makefile
├── README.md
├── docs
│   ├── gopher.png
│   ├── index.html
│   ├── main.wasm
│   └── wasm_exec.js
├── img
│   ├── basics.png
│   ├── incomes-predict-murders.png
│   ├── inhabitants-predict-murders.png
│   ├── line-plot.png
│   ├── remove1.png
│   └── unemployment-predict-murders.png
├── main.go
├── runner.go
└── utils.go
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu:18.04
2 | # docker build -t vanessa/regression-wasm .
3 | RUN apt-get update && apt-get install -y git nginx build-essential python autoconf automake libtool bc
4 | WORKDIR /opt
5 | RUN git clone https://github.com/emscripten-core/emsdk.git && \
6 | cd emsdk && \
7 | git pull && \
8 | ./emsdk install latest && \
9 | ./emsdk activate latest
10 |
11 | ENV PATH=/opt/emsdk:/opt/emsdk/fastcomp/emscripten:/opt/emsdk/node/12.9.1_64bit/bin:$PATH
12 |
13 | WORKDIR /var/www/html
14 | COPY . /var/www/html
15 | RUN make
16 | EXPOSE 80
17 | CMD ["nginx", "-g", "daemon off;"]
18 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 |
2 | all:
3 | go get github.com/sajari/regression
4 | GOOS=js GOARCH=wasm go build -o docs/main.wasm
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Regression Wasm
2 |
3 | This repository serves a simple [WebAssembly](https://webassembly.org/) (wasm) application
4 | to perform a regression, using data from a table in the browser, which can be loaded as a delimited file
5 | by the user. We use a simple [regression library](https://github.com/sajari/regression) to do
6 | the work. See the demo [here](https://vsoch.github.io/regression-wasm/) or continue reading.
7 |
8 | ## Summary
9 |
10 | - Run a multiple or single regression using WebAssembly
11 | - Two variables (one predictor, one regressor) will generate a line plot showing X vs. Y and predictions
12 | - More than two variables (one predictor, multiple regressors) performs multiple regression to generate a residual histogram
13 | - Upload your own data file, change the delimiter, the file name to be saved, or the predictor column
14 |
15 | ## Overview
16 |
17 | When you load the page, you are presented with a loaded data frame. The data is a bit dark,
18 | but it's a nice dataset to show how this works. The first column is the number of murders (per
19 | million inhabitants) for some city, and each of the remaining columns is a variable that might
20 | be used to predict it (inhabitants, percent with incomes below $5000, and percent unemployed).
21 | This is what you see:
22 |
23 | 
24 |
25 | ### Formula
26 |
27 | The formula for our regression model is shown below the plot, in human friendly terms.
28 |
29 | ```
30 | Predicted = -36.7649 + Inhabitants*0.0000 + Percent with incomes below $5000*1.1922 + Percent unemployed*4.7198
31 | ```
32 |
33 | ### Residual Plot
34 |
35 | Given that we have more than one regressor variable, we need to run a multiple regression,
36 | and so the plot in the upper right is a histogram of the residuals.
37 |
38 | > The residuals are the differences between the actual values (number of murders per million inhabitants) and the values predicted by our model.
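
As a rough illustration of that definition, the sketch below applies the formula shown above to the first row of the example dataset (11.20 murders, 587000 inhabitants, 16.50 percent low income, 6.20 percent unemployed) and subtracts the prediction from the actual value. The function names `predict` and `residual` are hypothetical and not taken from this repository. The coefficients are the rounded values printed in the formula, and the Inhabitants coefficient is displayed as 0.0000, so the result is only approximate.

```go
package main

import "fmt"

// predict applies the formula printed in the README. The coefficients are the
// rounded, displayed values (assumption: the real model keeps more digits, and
// the Inhabitants coefficient is small but not exactly zero).
func predict(inhabitants, lowIncomePct, unemployedPct float64) float64 {
	return -36.7649 + inhabitants*0.0000 + lowIncomePct*1.1922 + unemployedPct*4.7198
}

// residual is the actual value minus the predicted value.
func residual(actual, predicted float64) float64 {
	return actual - predicted
}

func main() {
	// First row of the example dataset from the README.
	p := predict(587000, 16.5, 6.2)
	fmt.Printf("predicted=%.4f residual=%.4f\n", p, residual(11.2, p))
}
```

The histogram in the interface is built from one such residual per valid row.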
39 |
40 | ### Filtering
41 |
42 | Removing any single value from a row invalidates that row, and it won't be included
43 | in the plot. Removing a column heading is akin to removing the entire column.
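
The row rule can be sketched in Go as follows. This is a minimal illustration, not the code in this repository: the helper name `validRows` is hypothetical, and it assumes a row counts as invalid when any cell is empty or does not parse as a number.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validRows keeps only the rows in which every cell parses as a number.
// A single empty or malformed cell drops the whole row.
func validRows(rows [][]string) [][]float64 {
	var out [][]float64
	for _, row := range rows {
		values := make([]float64, 0, len(row))
		ok := true
		for _, cell := range row {
			v, err := strconv.ParseFloat(strings.TrimSpace(cell), 64)
			if err != nil {
				ok = false
				break
			}
			values = append(values, v)
		}
		if ok && len(values) > 0 {
			out = append(out, values)
		}
	}
	return out
}

func main() {
	rows := [][]string{
		{"11.20", "16.50", "6.20"},
		{"13.40", "", "6.40"}, // a removed value invalidates this row
		{"40.70", "26.30", "9.30"},
	}
	fmt.Println("valid rows:", len(validRows(rows)))
}
```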
44 |
45 | ### Line Plot
46 |
47 | But what if we want to plot the relationship between one of the variables X, and our Y?
48 | This is where the tool gets interesting! By removing a column header, we essentially
49 | remove the column from the dataset. Let's first try removing just one, Inhabitants:
50 |
51 | 
52 |
53 |
54 | We still see a residual plot because two regressors remain, and plotting them against Y would require more than two dimensions.
55 | Let's remove another one, the percent unemployed:
56 |
57 | 
58 |
59 | Now we see a line plot, along with the plotting of the predictions! By simply removing
60 | each column one at a time (and leaving only one Y, and one X) we are actually running
61 | a single regression, and we can do this for each variable:
62 |
63 | #### Inhabitants to predict murders
64 |
65 | 
66 |
67 |
68 |
69 | #### Unemployment to predict murders
70 |
71 | 
72 |
73 |
74 |
75 | #### Low Income Percentage to predict murders
76 |
77 | 
78 |
79 |
80 | As we can see, the number of inhabitants (on its own) is fairly useless. The variables
81 | that are strongest here are unemployment and income.
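
For readers who want to see what a single regression (one X, one Y) computes, here is a minimal ordinary least squares sketch in Go. It is an illustration under stated assumptions, not the implementation used by this application, which relies on the regression library linked above. The function name `fitLine` and the sample points are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine returns the least-squares intercept and slope for
// y = intercept + slope*x.
func fitLine(xs, ys []float64) (intercept, slope float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two paired observations")
	}
	n := float64(len(xs))
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Slope is the covariance of x and y divided by the variance of x.
	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("x values are all identical")
	}
	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return intercept, slope, nil
}

func main() {
	// Synthetic points that lie on the line y = 1 + 2x.
	xs := []float64{1, 2, 3, 4}
	ys := []float64{3, 5, 7, 9}
	intercept, slope, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Predicted = %.4f + X*%.4f\n", intercept, slope)
}
```

The fitted line is what the line plot draws through the X vs. Y points.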
82 |
83 | ## Download Data
84 |
85 | This is of course a very superficial overview; to get more detail, you would want to download the full model data.
86 | The "Download Results" option appears after you generate any kind of plot, and it downloads
87 | a text file with the model output. Here is an example:
88 |
89 | ```
90 | Dinosaur Regression Wasm
91 | Predicted = -36.7649 + Inhabitants*0.0000 + Percent with incomes below $5000*1.1922 + Percent unemployed*4.7198
92 | Murders per annum per one million inhabitants| Inhabitants| Percent with incomes below $5000| Percent unemployed
93 | 11.20| 587000.00| 16.50| 6.20
94 | 13.40| 643000.00| 20.50| 6.40
95 | 40.70| 635000.00| 26.30| 9.30
96 | 5.30| 692000.00| 16.50| 5.30
97 | 24.80| 1248000.00| 19.20| 7.30
98 | 12.70| 643000.00| 16.50| 5.90
99 | 20.90| 1964000.00| 20.20| 6.40
100 | 35.70| 1531000.00| 21.30| 7.60
101 | 8.70| 713000.00| 17.20| 4.90
102 | 9.60| 749000.00| 14.30| 6.40
103 | 14.50| 7895000.00| 18.10| 6.00
104 | 26.90| 762000.00| 23.10| 7.40
105 | 15.70| 2793000.00| 19.10| 5.80
106 | 36.20| 741000.00| 24.70| 8.60
107 | 18.10| 625000.00| 18.60| 6.50
108 | 28.90| 854000.00| 24.90| 8.30
109 | 14.90| 716000.00| 17.90| 6.70
110 | 25.80| 921000.00| 22.40| 8.60
111 | 21.70| 595000.00| 20.20| 8.40
112 | 25.70| 3353000.00| 16.90| 6.70
113 |
114 | N = 20
115 | Variance observed = 92.76010000000001
116 | Variance Predicted = 75.90724706481737
117 | R2 = 0.8183178658153383
118 | ```
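
The summary values at the end of this example appear to be related: dividing the predicted variance by the observed variance reproduces the reported R2. The sketch below checks that relationship in Go. It is an observation drawn from the numbers above, not a statement about how the regression library computes R2 internally, and the function name `rSquared` is hypothetical.

```go
package main

import "fmt"

// rSquared returns the share of the observed variance that the model explains,
// computed as predicted variance divided by observed variance.
// Assumption: varianceObserved is non-zero.
func rSquared(variancePredicted, varianceObserved float64) float64 {
	return variancePredicted / varianceObserved
}

func main() {
	// Values copied from the example model output above.
	r2 := rSquared(75.90724706481737, 92.76010000000001)
	fmt.Printf("R2 = %.6f\n", r2)
}
```

Read this way, an R2 of about 0.82 means the three regressors together account for roughly 82 percent of the variance in the observed values.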
119 |
120 | ## About
121 |
122 | ### Why?
123 |
124 | WebAssembly allows us to interact with compiled code directly in the browser,
125 | doing away with any need for server-side computation. While I don't do a large amount of data analysis
126 | for my role proper, I realize that many researchers do, and so with this in mind,
127 | I wanted to create a starting point for developers to interact with data in the browser.
128 | The minimum conditions for success meant:
129 |
130 | 1. being able to load a delimited file into the browser
131 | 2. having the file render as a table
132 | 3. having the data be processed by a compiled wasm
133 | 4. updating a plot based on output from 3.
134 |
135 | Thus, the application performs a simple regression based on loading data in the table,
136 | and then plotting the result. To make it fun, I added a cute gopher logo and used an xkcd
137 | plotting library for the result.
138 |
139 | ### Customization
140 |
141 | The basics are here for a developer to write Go functions that
142 | perform data analysis on an input file and render the result back to the screen as a plot.
143 | If you need any help, or want to request a custom tool, please don't hesitate to
144 | [open up an issue](https://www.github.com/vsoch/regression-wasm/issues).
145 |
146 | ## Development
147 |
148 | ### Local
149 |
150 | If you are comfortable with GoLang, and have installed [emscripten](https://emscripten.org),
151 | you can clone the repository into your `$GOPATH` under `src/github.com/vsoch`:
152 |
153 | ```bash
154 | $ mkdir -p $GOPATH/src/github.com/vsoch
155 | $ cd $GOPATH/src/github.com/vsoch
156 | $ git clone https://www.github.com/vsoch/regression-wasm
157 | ```
158 |
159 | And then build the wasm.
160 |
161 | ```bash
162 | $ cd regression-wasm
163 | $ make
164 | ```
165 |
166 | Add the `wasm_exec.js` file that matches your Go version:
167 |
168 | ```bash
169 | $ cp "$(go env GOROOT)/misc/wasm/wasm_exec.js" ./docs
170 | ```
171 |
172 | Then change into the `docs` folder and start a server to see the result.
173 |
174 | ```bash
175 | $ cd docs
176 | $ python -m http.server 9999
177 | ```
178 |
179 | Open the browser to http://localhost:9999
180 |
181 |
182 | ### Docker
183 |
184 | If you don't want to install dependencies, just clone the repository, and
185 | build the Docker image:
186 |
187 | ```bash
188 | $ docker build -t vanessa/regression-wasm .
189 | ```
190 |
191 | It will install [emscripten](https://emscripten.org/docs/getting_started/FAQ.html),
192 | copy the source code into the container, and compile to wasm. You can then
193 | run the container and expose port 80 to see the compiled interface:
194 |
195 | ```bash
196 | $ docker run -it --rm -p 80:80 vanessa/regression-wasm
197 | ```
198 |
199 | Then open the browser to http://localhost to use the interface.
200 |
--------------------------------------------------------------------------------
/docs/gopher.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vsoch/regression-wasm/91f1adb3dfe185c92f6aac59916829295a8a5aee/docs/gopher.png
--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |