├── .gitignore
├── index.html
├── test.sh
├── Makefile
├── Dockerfile
├── README.md
├── layout.mustache
└── server.asm
/.gitignore:
--------------------------------------------------------------------------------
1 | server.o
2 | server.html
3 | server
4 | .#*
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | servasm
6 |
7 |
8 | servasm
9 | Your other webserver.
10 |
11 |
12 |
--------------------------------------------------------------------------------
/test.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Simple smoke tests
4 |
5 | set -e
6 |
7 | ./server &
8 | PID=$!
9 |
10 | at_exit(){
11 | kill -9 $PID
12 | }
13 | trap at_exit EXIT
14 |
15 | sleep 1
16 |
17 | diff -u <(curl -s localhost:8080) <(cat index.html)
18 | diff -u <(curl -s localhost:8080/server.asm) <(cat server.asm)
19 | diff -u <(curl -s localhost:8080/foobar) <(echo -ne 'HTTP/1.0 404 File not found\r\n\r')
20 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | default: server
2 |
3 | server: server.html server.asm
4 | 	nasm -g -f elf64 -o server.o server.asm
5 | 	ld -o server server.o
6 |
7 | server.html:
8 | 	rocco -l asm -c ';;' -t layout.mustache server.asm
9 |
10 | .PHONY: build_docker
11 | build_docker:
12 | 	cat Dockerfile | docker build -t servasm -
13 |
14 | .PHONY: test
15 | test:
16 | 	bash ./test.sh
17 |
18 | .PHONY: clean
19 | clean:
20 | 	rm -f server server.o server.html
21 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM debian:sid
2 | MAINTAINER Vladimir Terekhov
3 |
4 | RUN apt-get update && \
5 | apt-get install -y locales && \
6 | dpkg-reconfigure locales && \
7 | locale-gen C.UTF-8 && \
8 | /usr/sbin/update-locale LANG=C.UTF-8
9 |
10 | ENV LC_ALL C.UTF-8
11 |
12 | # Install build tools, nasm, Ruby and curl, then the rocco gem
13 |
14 | RUN apt-get install -y build-essential nasm ruby ruby-dev python-pygments curl && gem install rocco
15 |
16 | CMD /bin/bash
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SERVASM: Your other webserver.
2 |
3 | Minimal x86_64 Linux-only file webserver, written in assembly language.
4 | It doesn't allocate any memory dynamically, using only the stack and static buffers to serve files.
5 |
6 | *Not intended for production use.*
7 |
8 | ## How it works:
9 |
10 | The main process sets up a listening socket on port 8080 with a few system calls:
11 | `socket(2)` -> `bind(2)` -> `listen(2)`.
12 | The main process then blocks on the `accept(2)` system call until a client connects.
13 | It then calls `fork(2)`, handles the request in the child process, and goes back to `accept(2)` in the main process.
14 | The child process sets `alarm(2)` to drop very slow clients and reads the request headers with `recv(2)`.
15 | It runs a couple of checks on the incoming request (only GET requests are supported),
16 | then `open(2)`s the file and gets its size with `fstat(2)`.
17 | It `write(2)`s the headers and lets the kernel send the rest with `sendfile(2)`. Finally it `close(2)`s the socket and the file.
18 |
19 | On error, the process writes a message to stderr and exits with exit code 1.
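
The request flow described above can be sketched in a higher-level language. The Python below only illustrates that sequence and is not a translation of `server.asm`: the names `serve`, `handle_client` and `REQUEST_TIMEOUT` are made up for this sketch, and it leaves out the URL whitelist and the mime-type lookup.

```python
import os
import signal
import socket

REQUEST_TIMEOUT = 15  # seconds; mirrors request_timeout in server.asm


def handle_client(conn, root="."):
    """Answer a single GET request on an already accepted connection."""
    request = conn.recv(1024)
    parts = request.split(b" ", 2)
    if len(parts) < 3 or parts[0] != b"GET" or not parts[1].startswith(b"/"):
        conn.sendall(b"HTTP/1.0 501 Not Implemented\r\n\r\n")
        return
    # Drop the leading "/" and any query string; "/" maps to index.html.
    name = parts[1][1:].split(b"?", 1)[0].decode("ascii", "replace") or "index.html"
    if ".." in name or name.startswith("/"):
        conn.sendall(b"HTTP/1.0 403 Forbidden\r\n\r\n")
        return
    try:
        f = open(os.path.join(root, name), "rb")
    except OSError:
        conn.sendall(b"HTTP/1.0 404 File not found\r\n\r\n")
        return
    with f:
        size = os.fstat(f.fileno()).st_size
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size)
        conn.sendfile(f)  # uses sendfile(2) where the platform allows it


def serve(port=8080):
    """socket -> bind -> listen, then accept and fork once per connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("0.0.0.0", port))
    server.listen(128)
    while True:
        conn, _addr = server.accept()
        # Reap children that have already exited so zombies do not pile up.
        try:
            while os.waitpid(-1, os.WNOHANG)[0] > 0:
                pass
        except ChildProcessError:
            pass
        if os.fork() == 0:
            # Child: serve one request, then exit.
            server.close()
            signal.alarm(REQUEST_TIMEOUT)  # SIGALRM ends the child if the client is too slow
            try:
                handle_client(conn)
            finally:
                conn.close()
                os._exit(0)
        conn.close()  # the parent keeps only the listening socket
```

Calling `serve()` starts the loop on port 8080. `handle_client` is kept separate so the per-request logic can be exercised without forking.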
20 |
21 | ## Running
22 |
23 | Compiling the server requires the `nasm` assembler; the default `make` target also builds `server.html` with `rocco`.
24 |
25 | `make && ./server`
26 |
27 | ## Debugging
28 |
29 | `make && strace -v -s 512 -f ./server`
30 |
31 | ## License
32 |
33 | Copyright (c) 2015 Vladimir Terekhov
34 |
35 | Permission is hereby granted, free of charge, to any person
36 | obtaining a copy of this software and associated documentation
37 | files (the "Software"), to deal in the Software without
38 | restriction, including without limitation the rights to use,
39 | copy, modify, merge, publish, distribute, sublicense, and/or sell
40 | copies of the Software, and to permit persons to whom the
41 | Software is furnished to do so, subject to the following
42 | conditions:
43 |
44 | The above copyright notice and this permission notice shall be
45 | included in all copies or substantial portions of the Software.
46 |
47 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
48 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
49 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
50 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
51 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
52 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
53 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
54 | OTHER DEALINGS IN THE SOFTWARE.
55 |
--------------------------------------------------------------------------------
/layout.mustache:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | {{ title }}
6 |
157 |
158 |
159 |
160 |
161 |
162 | {{#sections}}
163 |
164 | |
165 | {{{ docs }}}
166 | |
167 |
168 |
169 | |
170 |
171 | {{/sections}}
172 |
173 |
174 |
175 |
--------------------------------------------------------------------------------
/server.asm:
--------------------------------------------------------------------------------
1 | ;; # SERVASM: Your other webserver.
2 | ;;
3 | ;; Minimal x86_64 Linux-only file webserver written in assembly language.
4 | ;; This page is a literate program containing the full server source code.
5 | ;; [Project repository and build instructions](https://github.com/zarkzork/servasm).
6 | ;;
7 | ;; *Warning: the server is not intended for production use. It may and will wreck your stuff.*
8 |
9 | ;; ## Overview
10 | ;;
11 | ;; Servasm is a forking server: each request is processed in a separate process.
12 | ;; This is how it was done in the Mesozoic Era (except we use [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2), which wasn't invented then).
13 | ;; This lets us keep things stupidly simple and lean on the kernel as much as possible.
14 | ;; We aim for ~1kloc of assembly, including comments and blank lines.
15 | ;;
16 | ;; The main process sets up a listening socket with a few system calls:
17 | ;;
18 | ;; [`socket(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?socket+2) → [`bind(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?bind+2) → [`listen(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?listen+2)
19 | ;;
20 | ;; Then the main process loops on the [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) system call.
21 | ;; For each connection it [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2)s and handles the request in the child process:
22 | ;;
23 | ;; 1. set [`alarm(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?alarm+2) to drop very slow clients
24 | ;; 2. [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) request headers
25 | ;; 3. check that request is valid (only GET requests are supported)
26 | ;; 4. [`open(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?open+2) requested file
27 | ;; 5. get its size with [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2)
28 | ;; 6. [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) response headers
29 | ;; 7. let kernel send rest with [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2)
30 | ;; 8. [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) socket and file
31 | ;; 9. [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) child
32 | ;;
33 | ;; On error we write a message to `stderr` and exit the process with exit code 1.
34 |
35 | ;; ## Reference material
36 | ;;
37 | ;; - [Assembly x86_64 programming for Linux](http://0xax.blogspot.fr/p/assembly-x8664-programming-for-linux.html): introductory blog posts about asm for x86_64 architecture
38 | ;; - [Beej's Guide to Network Programming](http://beej.us/guide/bgnet/): detailed tutorial about unix networking
39 | ;; - Servasm implementation loosely based on [althttpd.c](https://www.sqlite.org/docsrc/artifact/d53e8146bf7977) from sqlite project
40 | ;; - [Stack frame layout on x86-64](http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/): post about stackframe layout for x86_64
41 | ;; - [Linux System Call Table for x86_64](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64)
42 | ;;
43 |
44 | ;; ## Constants
45 | ;;
46 | ;; Data section keeps all static constants that we might need during server lifetime.
47 | section .data
48 |
49 | ;; We are going to use IPv4 and TCP as our transport.
50 | pf_inet: equ 2
51 | sock_stream: equ 1
52 |
53 | ;; Our server binds to `0.0.0.0:8080`.
54 | ;; `0.0.0.0` is a special IP address that maps to all interfaces on the machine.
55 | sockaddr: db 0x02, 0x00 ;; AF_INET
56 | db 0x1f, 0x90 ;; PORT 8080
57 | db 0x00, 0x00, 0x00, 0x00, 0, 0, 0, 0, 0, 0, 0, 0 ;; IP 0.0.0.0, then 8 bytes of `sin_zero` padding
58 | addr_len: equ 16 ;; size of `struct sockaddr_in`
59 |
60 | ;; Requests time out after 15 seconds.
61 | request_timeout: equ 15
62 |
63 | ;; Backlog is the number of incoming connections that the kernel will queue for us until we [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) them.
64 | ;; We set it to 128.
65 | backlog: equ 128
66 |
67 | ;; And we are going to use the `TCP_CORK` option (more on it later).
68 | sol_tcp: equ 6
69 | tcp_cork: equ 3
70 | on_state: dd 0x01 ;; the option value is a 32-bit int
71 |
72 | ;; We store each string together with its length, which is defined right after the string.
73 | ;; `$` is the current address, so the current address minus the start of the string is its length.
74 | startup_error_msg: db "ERROR: Cannot start server", 10
75 | startup_error_msg_len: equ $ - startup_error_msg
76 |
77 | ;; For incoming requests we restrict the path to alphanumeric characters plus `.` and `/`.
78 | url_whitelist: db "abcdefghijklmnopqrstuvwxyz"
79 | db "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"
80 | url_whitelist_len: equ $ - url_whitelist
81 |
82 | ;; ## Lookup tables.
83 |
84 | ;; Syscall table for x86-64.
85 | ;; For reference look [here](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64).
86 | sys_write: equ 1
87 | sys_open: equ 2
88 | sys_close: equ 3
89 | sys_fstat: equ 5
90 | sys_alarm: equ 37
91 | sys_sendfile: equ 40
92 | sys_socket: equ 41
93 | sys_accept: equ 43
94 | sys_recv: equ 45
95 | sys_bind: equ 49
96 | sys_listen: equ 50
97 | sys_setsockopt: equ 54
98 | sys_fork: equ 57
99 | sys_exit: equ 60
100 | sys_waitid: equ 247
101 |
102 |
103 | ;; We build response headers on the stack.
104 | ;; That means we need to push strings starting from the last one. For example, to build these headers:
105 | ;;
106 | ;; HTTP/1.0 200 OK\r\n
107 | ;; Server: servasm\r\n
108 | ;; Content-type: text/html; charset=UTF-8\r\n
109 | ;; Content-Length: 42\r\n
110 | ;;
111 | ;; we push `\n\r24 :htgneL-tnetnoC\n\r8-FTU=tesrahc...`.
112 | ;; To make this easy we keep pointers to the end of each string instead of its beginning, and use a `0x00` byte to mark the beginning of the string.
113 |
114 |
117 |
118 | ;; `\r\n` string
119 | db 0x00, 13, 10
120 | crnl:
121 |
122 | ;; ### Response codes
123 |
124 | ;; [200 OK](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1)
125 |
126 | db 0x00, "HTTP/1.0 200 OK", 13, 10
127 | result_ok:
128 | ;; [403 Forbidden](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4)
129 | db 0x00, "HTTP/1.0 403 Forbidden", 13, 10
130 | result_forbidden:
131 | ;; [404 Not Found](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5)
132 | db 0x00, "HTTP/1.0 404 File not found", 13, 10
133 | result_not_found:
134 | ;; [500 Internal Server Error](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1)
135 | db 0x00, "HTTP/1.0 500 OOPSIE", 13, 10
136 | result_server_error:
137 | ;; [501 Not Implemented](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.2)
138 | db 0x00, "HTTP/1.0 501 Not Implemented", 13, 10
139 | result_unsupported_method:
140 |
141 | ;; ### Mime Types
142 |
143 | ;; We use a small number of predefined mime-types baked into the source code,
144 | ;; and support only UTF-8 encoding.
145 |
146 | db 0x00, "text/plain; charset=UTF-8", 13, 10
147 | txt:
148 | db 0x00, "text/html; charset=UTF-8", 13, 10
149 | html:
150 | db 0x00, "text/css; charset=UTF-8", 13, 10
151 | css:
152 | db 0x00, "application/javascript; charset=UTF-8", 13, 10
153 | js:
154 | db 0x00, "image/png", 13, 10
155 | png:
156 | db 0x00, "image/jpeg", 13, 10
157 | jpg:
158 | db 0x00, "application/octet-stream", 13, 10
159 | other:
160 |
161 | ;; Mime type lookup table.
162 | ;; Each entry has two quad words.
163 | ;; The first quad word is the product of the extension's ASCII codes.
164 | ;; For example:
165 | ;;
166 | ;; 104 (h) * 116 (t) * 109 (m) * 108 (l) = 142017408 = 0x8770380
167 | ;;
168 | ;; This means that some unknown files can be served with the wrong mime-type in case of a hash collision.
169 | ;; And this is okay. Repeat after me: this is okay.
170 | ;;
171 | ;; The second quad word is a pointer to the end of the matched mime string.
172 | ;; If the file type is unknown we serve it as `application/octet-stream`.
173 |
174 | mime_table: dq 0x18a380, txt
175 | dq 0x8770380, html
176 | dq 0x13fa5b, css
177 | dq 0x2f9e, js
178 | dq 0x135ce0, png
179 | dq 0x12a8a0, jpg
180 | dq 0x0, other
181 |
182 | ;; ### Headers
183 |
184 | ;; [Content-type](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17)
185 | db 0x00, "Content-type: "
186 | content_type_header:
187 |
188 | ;; [Content-Length](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13)
189 | db 0x00, "Content-Length: "
190 | content_length_header:
191 |
192 | ;; [Server](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.38)
193 | db 0x00, "Server: servasm", 13, 10
194 | server_header:
195 |
196 | ;; ## Variables
197 |
198 | ;; `BSS` section stores data that can be changed during application execution.
199 | section .bss
200 |
201 | ;; We will store the incoming request in a buffer limited to 1024 bytes (plus one spare byte for a terminator).
202 | buffer: resb 1025
203 | buffer_len: equ 1024
204 | buffer_read: resb 8
205 |
206 | ;; buffer for result of [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2) system call
207 | statbuf: resb 144
208 |
209 | ;; Main server socket
210 | server_fd: resb 8
211 |
212 | ;; Incoming request socket
213 | client_fd: resb 8
214 |
215 | ;; File descriptor to be served
216 | file_fd: resb 8
217 |
218 | ;; Name of requested file
219 | filename: resb 255
220 | filename_len: resb 8
221 |
222 | ;; Size of a file
223 | file_size: resb 8
224 | ;; Mime type for a file
225 | mime_type: resb 8
226 |
227 | ;; ## Source code
228 |
229 | section .text
230 |
231 | ;; Define the entry point
232 | global _start
233 |
234 | _start:
235 | ;; Our webserver is little more than glue code around a few syscalls; it's actually amazing how much can be done with standard system calls alone.
236 | ;;
237 | ;; Syscalls are made differently on different architectures and operating systems. We restrict ourselves to Linux on the `x86_64` architecture.
238 | ;; To make a syscall on `x86_64` you set the `rax` register to the syscall number and the
239 | ;; `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9` registers to parameters 1-6 respectively.
240 | ;; Then use the `syscall` instruction to pass control to the kernel.
241 | ;; The syscall result is returned in the `rax` register; the instruction itself clobbers `rcx` and `r11`.
242 | ;; Look [here](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64) for reference.
243 |
244 | ;; ### Main socket setup
245 |
246 | ;; Call [`socket(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?socket+2) to create IPv4 TCP socket
247 | mov rax, sys_socket
248 | mov rdi, pf_inet
249 | mov rsi, sock_stream
250 | xor rdx, rdx
251 | syscall
252 | ;; If the socket was not created and the syscall returned an error, jump to `exit_error`.
253 | cmp rax, 0
254 | js .exit_error
255 | ;; If everything is fine, we store result into `server_fd`.
256 | mov [server_fd], rax
257 |
258 | ;; Call [`setsockopt(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?setsockopt+2) to set the `TCP_CORK` flag on the server socket.
259 | ;; The `TCP_CORK` flag prevents the socket from flushing right after we write the headers.
260 | ;; This allows us to reduce the number of packets sent, as the first packet will include the headers and the first chunk of the served file.
261 | ;;
262 | ;; For more info read this [blog post](http://baus.net/on-tcp_cork/) or the [man page](http://linux.die.net/man/7/tcp).
263 | mov rax, sys_setsockopt
264 | mov rdi, [server_fd]
265 | mov rsi, sol_tcp
266 | mov rdx, tcp_cork
267 | mov r10, on_state
268 | mov r8, 4 ;; optlen: the option value is a 4-byte int
269 | syscall
270 | cmp rax, 0
271 | js .exit_error
272 |
273 |
274 | ;; [`bind(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?bind+2) to bind socket to ip, port.
275 | mov rax, sys_bind
276 | mov rdi, [server_fd]
277 | mov rsi, sockaddr
278 | mov rdx, addr_len
279 | syscall
280 | cmp rax, 0
281 | js .exit_error
282 |
283 | ;; And call [`listen(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?listen+2) to start listening for incoming connections.
284 | ;; From now on the kernel will queue up to `backlog` incoming connections.
285 | ;; If `backlog` is exceeded, new connections will be dropped.
286 | mov rax, sys_listen
287 | mov rdi, [server_fd]
288 | mov rsi, backlog
289 | syscall
290 | cmp rax, 0
291 | js .exit_error
292 |
293 | ;; Now socket is initialized and ready to serve clients.
294 |
295 | ;; ### Main loop
296 | .accept_socket:
297 | ;; [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) a new client from the backlog.
298 | ;; The call will block until a client connects.
299 | mov rax, sys_accept
300 | mov rdi, [server_fd]
301 | xor rsi, rsi
302 | xor rdx, rdx
303 | syscall
304 | cmp rax, 0
305 | js .exit_error
306 | ;; accept(2) returns the fd of the incoming connection
307 | mov [client_fd], rax
308 |
309 | ;; We process each request in a child process, and when a child exits it becomes a [zombie process](https://en.wikipedia.org/wiki/Zombie_process).
310 | ;; The kernel keeps its exit code and some other state until the parent process collects it; this is called `reaping`.
311 | ;; We reap all zombie processes before processing each request.
312 | ;; This means that some zombies can linger between requests.
313 | ;; We use `wait4(2)` with `WNOHANG`, so the call returns immediately when no child has exited yet.
314 | .next_process:
315 | mov rax, 61 ;; wait4 syscall number
316 | mov rdi, -1 ;; wait for any child
317 | mov rsi, 0 ;; exit status is not needed
318 | mov rdx, 1 ;; WNOHANG
319 | mov r10, 0 ;; no rusage
321 | syscall
322 | ;; A return value > 0 is the pid of a reaped child, and there may be more,
323 | ;; so we try again. (0 means nothing to reap; errors are ignored here.)
324 | cmp rax, 0
325 | jg .next_process
326 |
327 | ;; The main process accepts connections one at a time, so it needs to return to [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2)ing ASAP.
328 | ;; So we [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2) a new process to handle the client. It will have its own copy of the client fd in the `client_fd` variable.
329 | ;; The main process can overwrite this variable safely, as the child has its own copy.
330 | mov rax, sys_fork
331 | syscall
332 | ;; [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2) returns a negative number in case of error; if that happens we ragequit from the server.
333 | cmp rax, 0
334 | js .exit_error
335 | ;; If `rax` is 0, it means we are inside child, so we jump to serving request
336 | jz .process_socket
337 |
338 | ;; Otherwise we are in the main process, so we close(2) client fd and jmp to accepting new client
339 | mov rax, sys_close
340 | mov rdi, [client_fd]
341 | syscall
342 | cmp rax, 0
343 | js .exit_error
344 | jmp .accept_socket
345 |
346 | ;; ## Processing client
347 | .process_socket:
348 |
349 | ;; In child process we [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) server fd
350 | mov rax, sys_close
351 | mov rdi, [server_fd]
352 | syscall
353 | cmp rax, 0
354 | js .exit_error
355 |
356 | ;; Set alarm(2) to drop slow clients.
357 | ;; The kernel will send the `SIGALRM` signal to the child process after `request_timeout` seconds have elapsed.
358 | ;; On the happy path we serve the request and exit before the alarm is triggered.
359 | ;; Otherwise the default `SIGALRM` action terminates the child process.
360 | mov rax, sys_alarm
361 | mov rdi, request_timeout
362 | syscall
363 | cmp rax, 0
364 | js .exit_error
365 |
366 | ;; ### Parse request
367 |
368 | ;; call [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) to write request to `buffer`.
369 | ;; Our buffer size is limited, but we only need to make few checks and extract filename from it.
370 | mov rax, sys_recv
371 | mov rdi, [client_fd]
372 | mov rsi, buffer
373 | mov rdx, buffer_len
374 | xor r10, r10
375 | xor r8, r8
376 | xor r9, r9
377 | syscall
378 | cmp rax, 0
379 | js .exit_error
380 | ;; Our filename extraction requires that the request in `buffer` ends with `" "`, so we append one right after the received bytes.
381 | mov byte [buffer + rax], " "
382 | ;; Keep bytes read count
383 | mov [buffer_read], rax
384 |
385 | ;; For now we accept only GET requests.
386 | ;; So we will return 501 error to clients if other request method is used in request.
387 | mov rax, result_unsupported_method
388 | cmp byte [buffer], "G"
389 | jnz .return_error
390 | cmp byte [buffer + 1], "E"
391 | jnz .return_error
392 | cmp byte [buffer + 2], "T"
393 | jnz .return_error
394 | cmp byte [buffer + 3], " "
395 | jnz .return_error
396 | cmp byte [buffer + 4], "/"
397 | jnz .return_error
398 |
399 | ;; call `extract_filename` procedure to extract filename to `filename` variable
400 | call extract_filename
401 |
402 | ;; `check_filename` returns 0 if the filename is valid; otherwise we return 403.
403 | call check_filename
404 | cmp rax, 0
405 | mov rax, result_forbidden
406 | jne .return_error
407 |
408 | ;; call `get_mime` to extract mime-type from `filename`.
409 | ;; It will set `mime_type` variable.
410 | call get_mime
411 |
412 | ;; Try to [`open(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?open+2) requested file and put fd to `file_fd` variable.
413 | mov rax, sys_open
414 | mov rdi, filename
415 | xor rsi, rsi ;; flags: O_RDONLY
416 | xor rdx, rdx ;; mode: unused when opening an existing file
417 | syscall
418 | mov [file_fd], rax
419 |
420 | ;; return 404 if open file fails.
421 | cmp rax, 0
422 | mov rax, result_not_found
423 | js .return_error
424 |
425 | ;; call [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2) to get file info structure and extract `file_size` from it
426 | mov rax, sys_fstat
427 | mov rdi, [file_fd]
428 | mov rsi, statbuf
429 | syscall
430 | cmp rax, 0
431 | mov rax, result_server_error
432 | js .return_error
433 | mov rax, [statbuf + 48]
434 | mov [file_size], rax
435 |
436 | ;; ### Write response
437 | ;; after request has been parsed and file found, we start writing response.
438 | .write_response:
439 |
440 | ;; read the rest of the request from the socket
441 | call read_full_request
442 |
443 | ;; Write headers with `write_headers` procedure
444 | call write_headers
445 | cmp rax, 0
446 | js .exit_error
447 |
448 | ;; We use [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2) to make Kernel read data from `file_fd` and write it to `client_fd`.
449 | ;; we expect [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2) to send whole file at once.
450 | mov rax, sys_sendfile
451 | mov rdi, [client_fd]
452 | mov rsi, [file_fd]
453 | xor rdx, rdx
454 | mov r10, [file_size]
455 | syscall ;; ignore errors
456 |
457 | ;; [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) client socket
458 | mov rax, sys_close
459 | mov rdi, [client_fd]
460 | syscall ;; ignore errors
461 |
462 |
463 | ;; and [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) file_fd
464 | mov rax, sys_close
465 | mov rdi, [file_fd]
466 | syscall ;; ignore errors
467 |
468 |
469 | ;; and finally [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) from child process with 0 exit code
470 | xor rax, rax
471 | jmp .exit
472 |
473 | ;; ### Error handling
474 | .return_error:
475 |
476 | ;; Write error response headers and body
477 | ;; to client socket
478 | call write_error_response
479 |
480 | ;; and [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) client socket ignoring errors.
481 | mov rax, sys_close
482 | mov rdi, [client_fd]
483 | syscall
484 |
485 | .exit_error:
486 | ;; [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) error message to `stderr`
487 | mov rax, sys_write
488 | mov rdi, 2 ; stderr
489 | mov rsi, startup_error_msg
490 | mov rdx, startup_error_msg_len
491 | syscall
492 |
493 | ;; set error code to 1
494 | mov rax, 1
495 |
496 | .exit:
497 | ;; call [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) syscall
498 | mov rdi, rax
499 | mov rax, sys_exit
500 | syscall
501 |
502 | ;; ## Procedures
503 |
504 | ;; ### Extract Mime Type
505 |
506 | ;; We use `filename` and `filename_len` to fill `mime_type`
507 | ;; variable. It will point to end of mime string.
508 | get_mime:
509 | mov rax, 1
510 | mov rcx, [filename_len]
511 | dec rcx
512 |
513 | ;; calculate mime_hash using algorithm in [Mime Types section](#section-Mime_Types).
514 | .get_mime_hash:
515 | xor rdx, rdx
516 | mov dl, [filename + rcx]
517 | cmp dl, "."
518 | je .get_mime_hash_done
519 | mul rdx
520 | dec rcx
521 | cmp rcx, 0
522 | je .get_mime_hash_done
523 | jmp .get_mime_hash
524 |
525 | .get_mime_hash_done:
526 | mov rcx, 0
527 |
528 | ;; Find pointer to Mime Type using `mime_table`
529 | .get_mime_get_pointer:
530 | mov r11, [mime_table + rcx]
531 | cmp r11, rax
532 | je .get_mime_pointer_done
533 | cmp r11, 0
534 | je .get_mime_pointer_done
535 | add rcx, 16
536 | jmp .get_mime_get_pointer
537 | .get_mime_pointer_done:
538 | mov rdi, [mime_table + rcx + 8]
539 | ;; and store it to `mime_type` variable
540 | mov [mime_type], rdi
541 | ret
542 |
543 |
544 | ;; ### Write headers
545 | ;; write 200 OK response and some headers to client socket
546 | write_headers:
547 |
548 | ;; We will be using stack as buffer for response headers
549 | ;; instead of making multiple write calls on socket.
550 |
551 | ;; save stack top to temporary register
552 | mov rbp, rsp
553 |
554 | ;; `push_string` uses `rcx` to keep count of free bytes in current
555 | ;; stack top, -1 means no free bytes left and we need to make
556 | ;; room for new value.
557 | mov rcx, -1
558 |
559 | ;; first we push end of headers (`\r\n\r\n`)
560 | mov rsi, crnl
561 | call push_string
562 | mov rsi, crnl
563 | call push_string
564 |
565 | ;; push `Content-Length` header
566 | mov rax, [file_size]
567 | call push_int
568 | mov rsi, content_length_header
569 | call push_string
570 |
571 | ;; push `Content-type` header
572 | mov rsi, [mime_type]
573 | call push_string
574 | mov rsi, content_type_header
575 | call push_string
576 |
577 | ;; push server name (`Server` header)
578 | mov rsi, server_header
579 | call push_string
580 |
581 | ;; Push `200 OK` response header
582 | mov rsi, result_ok
583 | call push_string
584 |
585 | ;; calculate the start address of the headers on the stack
586 | mov rbx, rcx
587 | add rbx, rsp
588 | inc rbx
589 |
590 | ;; restore stack state
591 | mov rsp, rbp
592 |
593 | ;; calculate length of headers
594 | sub rbp, rbx
595 |
596 | ;; write(2) headers
597 | mov rax, sys_write
598 | mov rdi, [client_fd]
599 | mov rsi, rbx
600 | mov rdx, rbp
601 | syscall
602 |
603 | ret
604 |
605 | ;; ### Write error response
606 | ;; write response headers and body to client fd
607 | ;; expects rax to point to the end of the error response code string
608 | write_error_response:
609 | mov r12, rax ;; keep the pointer in `r12`: `syscall` clobbers `r11`, and `read_full_request` may make one
610 |
611 | ;; read the rest of the request from the socket
612 | call read_full_request
613 |
614 | ;; see the `write_headers` procedure for comments on using `push_string`.
615 |
616 | ;; write end of response
617 | mov rbp, rsp
618 | mov rcx, -1
619 | mov rsi, crnl
620 | call push_string
621 |
622 | ;; write response body
623 | mov rsi, r12
624 | call push_string
625 |
626 | ;; write headers | body separator
627 | mov rsi, crnl
628 | call push_string
629 |
630 | ;; write response status line
631 | mov rsi, r12
632 | call push_string
633 |
634 | ;; calculate the start address of the response on the stack
635 | mov rbx, rcx
636 | add rbx, rsp
637 | inc rbx
638 |
639 | ;; restore stack state
640 | mov rsp, rbp
641 |
642 | ;; [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) response from stack to client socket
643 | mov rax, sys_write
644 | mov rdi, [client_fd]
645 | mov rsi, rbx
646 | sub rbp, rbx
647 | dec rbp
648 | mov rdx, rbp
649 | syscall ;; ignore errors
650 |
651 | ret
652 |
653 |
654 | ;; ### push string
655 | ;;
656 | ;; `rsi` should point to the end of the string; the string should begin with a `0x00` byte.
657 | ;; `rcx` stores the byte offset from the stack top (0-7); if `rcx` is -1,
658 | ;; additional stack space is required and the function will grow the stack.
659 | ;;
660 | ;; If `push_string` is called multiple times it forms a contiguous string on the stack.
661 | ;; For example, two calls starting with `rcx` = -1, first with `0x00, "llo"` and then with `0x00, "he"`, will push `"hello"`
662 | ;; to the stack and set `rcx` to 2.
663 | push_string:
664 | ;; remove return address from the stack
665 | ;; and store it to `rdx` register
666 | pop rdx
667 | ;; we use `0x00` to mark the beginning of the passed string.
668 | mov al, 0x00
669 |
670 | .push_string_next:
671 | ;; if we have no free bytes on stack
672 | ;; add 8 bytes and change `rcx` accordingly
673 | cmp rcx, -1
674 | jne .push_string_write
675 | push 0
676 | mov rcx, 7
677 |
678 | .push_string_write:
679 | ;; move string to stack starting from string end until `0x00`
680 | dec rsi
681 | mov bl, [rsi] ;; only one byte is needed
682 | cmp al, bl
683 | je .push_string_ret
684 | mov byte [rsp + rcx], bl
685 | dec rcx
686 | jmp .push_string_next
687 |
688 | .push_string_ret:
689 | ;; restore stack
690 | push rdx
691 | ret
692 |
693 | ;; ### Push int
694 | ;; converts rax to string and calls push_string on it
695 | push_int:
696 | ;; remove return address from the stack
697 | ;; and store it to `rdi` register.
698 | pop rdi
699 |
700 | ;; we convert integer value to sequence of characters with base 10 and push each character with `push_string` procedure.
701 | mov r8, rax
702 | .push_int_next:
703 | mov rax, r8
704 | xor rdx, rdx
705 | mov r11, 10
706 | div r11
707 | mov r8, rax
708 | add dl, 48
709 | mov rsi, rsp
710 | sub rsi, 8
711 | mov byte [rsi - 1], dl
712 | mov byte [rsi - 2], 0x00
713 | call push_string
714 | cmp r8, 0
715 | je .push_int_ret
716 | jmp .push_int_next
717 | .push_int_ret:
718 | ;; restore stack
719 | push rdi
720 | ret
721 |
722 | ;; ### Read rest of request
723 | ;; Spec requires us to read full request with headers before we can send response.
724 | read_full_request:
725 | ;; We kept the number of bytes read from the socket in the `buffer_read` variable.
726 | mov rax, [buffer_read]
727 | ;; We check that the last bytes received from the client were `\r\n\r\n`
728 | .check_buffer:
729 | cmp byte [buffer + rax - 1], 10
730 | jne .read_more_from_client_socket
731 | cmp byte [buffer + rax - 2], 13
732 | jne .read_more_from_client_socket
733 | cmp byte [buffer + rax - 3], 10
734 | jne .read_more_from_client_socket
735 | cmp byte [buffer + rax - 4], 13
736 | jne .read_more_from_client_socket
737 | ret
738 |
739 | ;; if not we [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) more data from socket and check buffer again in a loop.
740 | .read_more_from_client_socket:
741 | mov rax, sys_recv
742 | mov rdi, [client_fd]
743 | mov rsi, buffer
744 | mov rdx, buffer_len
745 | xor r10, r10
746 | xor r8, r8
747 | xor r9, r9
748 | syscall
749 | jmp .check_buffer
750 |
751 | ;; ### Extract filename
752 | ;; fills filename and filename_len variables based on request buffer content.
753 | extract_filename:
754 | ;; We expect only a GET request in the buffer, so the filename should start at the sixth character, right after the `GET /` string.
755 | mov rsi, buffer + 5
756 | mov rdi, filename
757 | xor rcx, rcx
758 |
759 | ;; We copy characters from the buffer until we see a `'?'` or `' '` character.
760 | .extract_filename_next_char:
761 | cld
762 | cmp byte [rsi], " "
763 | jz .extract_filename_check_index
764 | cmp byte [rsi], "?"
765 | jz .extract_filename_check_index
766 | movsb
767 | jmp .extract_filename_next_char
768 |
769 | ;; If filename is empty (client requested `/`), we set `filename` to be `index.html`
770 | .extract_filename_check_index:
771 | mov rcx, rdi
772 | sub rcx, filename
773 | cmp rcx, 0
774 | jnz .extract_filename_done
775 | mov rax, "index.ht"
776 | mov [filename ], rax
777 | mov rax, "ml"
778 | mov [filename + 8], rax
779 | mov rcx, 10
780 |
781 | .extract_filename_done:
782 | mov [filename_len], rcx
783 | ret
784 |
785 | ;; ### Check filename
786 | ;; Checks that filename is safe to [`read(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?read+2) from filesystem.
787 | check_filename:
788 | mov rsi, -1
789 |
790 | ;; First check `filename` characters match whitelist
791 | .check_filename_whitelist:
792 | inc rsi
793 | mov byte al, [filename + rsi]
794 | cmp rsi, [filename_len]
795 | jz .check_filename_whitelist_ok
796 | mov rdi, url_whitelist
797 | mov rcx, url_whitelist_len
798 | repne scasb
799 | je .check_filename_whitelist
800 | jmp .check_filename_return_error
801 |
802 | .check_filename_whitelist_ok:
803 | mov rcx, [filename_len]
804 |
805 | ;; Then check that the filename doesn't contain `".."`.
806 | .check_filename_double_dot:
807 | dec rcx
808 | cmp word [filename + rcx], ".."
809 | je .check_filename_return_error
810 | cmp rcx, 0
811 | je .check_filename_return_success
812 | jmp .check_filename_double_dot
813 |
814 | .check_filename_return_success:
815 | xor rax, rax
816 | ret
817 |
818 | .check_filename_return_error:
819 | mov rax, 1
820 | ret
821 |
822 | ;; ## Known issues
823 | ;;
824 | ;; - We use temporary registers to store some global state between procedure calls.
825 | ;; This makes recursion impossible and can lead to hidden bugs.
826 | ;; The natural way to solve this is to keep state on the stack between procedure calls, but we use the stack to build the response string.
827 | ;; - While simple, forking on each request is not optimal for performance.
828 | ;; Modern webservers use [`epoll(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?epoll+2) to process multiple requests in a single process.
829 |
830 | ;; ## License
831 | ;;
832 | ;; Copyright (c) 2015 Vladimir Terekhov
833 | ;;
834 | ;; Permission is hereby granted, free of charge, to any person
835 | ;; obtaining a copy of this software and associated documentation
836 | ;; files (the "Software"), to deal in the Software without
837 | ;; restriction, including without limitation the rights to use,
838 | ;; copy, modify, merge, publish, distribute, sublicense, and/or sell
839 | ;; copies of the Software, and to permit persons to whom the
840 | ;; Software is furnished to do so, subject to the following
841 | ;; conditions:
842 | ;;
843 | ;; The above copyright notice and this permission notice shall be
844 | ;; included in all copies or substantial portions of the Software.
845 | ;;
846 | ;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
847 | ;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
848 | ;; OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
849 | ;; NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
850 | ;; HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
851 | ;; WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
852 | ;; FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
853 | ;; OTHER DEALINGS IN THE SOFTWARE.
854 |
--------------------------------------------------------------------------------
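
The extension hash used by `mime_table` in `server.asm` can be checked outside the assembler. The Python below is a minimal sketch under a few assumptions: `extension_hash` and `mime_label` are names invented here, the table maps hashes to the assembly label names rather than to the mime strings, and (unlike `get_mime`, which never multiplies in the first character of the filename) it walks every character after the last dot.

```python
MASK64 = (1 << 64) - 1  # `mul` leaves the low 64 bits of the product in rax

# (hash, label) pairs copied from mime_table in server.asm
MIME_TABLE = [
    (0x18A380, "txt"),
    (0x8770380, "html"),
    (0x13FA5B, "css"),
    (0x2F9E, "js"),
    (0x135CE0, "png"),
    (0x12A8A0, "jpg"),
]


def extension_hash(filename):
    """Multiply the character codes after the last '.', walking backwards."""
    product = 1
    for ch in reversed(filename):
        if ch == ".":
            break
        product = (product * ord(ch)) & MASK64
    return product


def mime_label(filename):
    """Return the matching table label, or 'other' when nothing matches."""
    h = extension_hash(filename)
    for key, label in MIME_TABLE:
        if key == h:
            return label
    return "other"
```

Because multiplication is commutative, any reordering of an extension's letters produces the same hash, which is one source of the collisions the comments in `server.asm` accept.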