├── .gitignore
├── index.html
├── test.sh
├── Makefile
├── Dockerfile
├── README.md
├── layout.mustache
└── server.asm

/.gitignore:
--------------------------------------------------------------------------------
server.o
server.html
server
.#*
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | servasm 6 | 7 | 8 |

servasm

9 |

Your other webserver.

10 | 11 | 12 |
--------------------------------------------------------------------------------
/test.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# Simple smoke tests

set -e

./server &
PID=$!

at_exit(){
  kill -9 $PID
}
trap at_exit EXIT

sleep 1

diff -u <(curl -s localhost:8080) <(cat index.html)
diff -u <(curl -s localhost:8080/server.asm) <(cat server.asm)
diff -u <(curl -s localhost:8080/foobar) <(echo -ne 'HTTP/1.0 404 File not found\r\n\r')

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
default: server

server: server.html server.asm
	nasm -g -f elf64 -o server.o server.asm
	ld -o server server.o

server.html:
	rocco -l asm -c ';;' -t layout.mustache server.asm

.PHONY: build_docker
build_docker:
	cat Dockerfile | docker build -t servasm -

.PHONY: test
test:
	bash ./test.sh

.PHONY: clean
clean:
	rm server server.o server.html

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM debian:sid
MAINTAINER Vladimir Terekhov

RUN apt-get update && \
    apt-get install -y locales && \
    dpkg-reconfigure locales && \
    locale-gen C.UTF-8 && \
    /usr/sbin/update-locale LANG=C.UTF-8

ENV LC_ALL C.UTF-8

# Installing ruby

RUN apt-get install -y build-essential nasm ruby ruby-dev python-pygments && gem install rocco curl

CMD /bin/bash
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SERVASM: Your other webserver.

Minimal x86_64 Linux-only file webserver, written in assembly language.
It doesn't allocate any memory, using only the stack to serve files.

*Not intended for production use.*

## How it works:

The main process sets up a listening socket on port 8080 with a few system calls:
`socket(2)` -> `bind(2)` -> `listen(2)`
The main process then blocks on the `accept(2)` system call until a client connects.
Then it calls `fork(2)`, handing the request to the child process while the main process goes back to `accept(2)`'ing.
The child process sets `alarm(2)` to drop very slow clients, and `recv(2)`s the request headers.
We do a couple of checks on the incoming request (only GET requests are supported).
Then we `open(2)` the file and get its size with `fstat(2)`.
We `write(2)` the headers and let the kernel send the rest with `sendfile(2)`. Afterwards we `close(2)` the socket and the file.

In case of an error the process writes a message to stderr and exits with code 1.

## Running

Compiling the server requires the `nasm` assembler.

`make && ./server`

## Debugging

`make && strace -v -s 512 -f ./server`

## License

Copyright (c) 2015 Vladimir Terekhov

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/layout.mustache:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | {{ title }} 6 | 157 | 158 | 159 |
160 | 161 | 162 | {{#sections}} 163 | 164 | 167 | 170 | 171 | {{/sections}} 172 |
165 | {{{ docs }}} 166 | 168 |
{{{ code }}}
169 |
173 |
174 | 175 |
--------------------------------------------------------------------------------
/server.asm:
--------------------------------------------------------------------------------
;; # SERVASM: Your other webserver.
;;
;; Minimal x86_64 Linux-only file webserver written in assembly language.
;; This page is a literate program containing all of the server's source code.
;; [Project repository and build instructions](https://github.com/zarkzork/servasm).
;;
;; *Warning: the server is not intended for production use. It may and will wreck your stuff.*

;; ## Overview
;;
;; Servasm is a forking server: each request is processed in a separate process.
;; This is how it was done in the Mesozoic Era (except we use [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2), which wasn't invented then).
;; And this allows us to make things stupidly simple and take as much leverage from the kernel as possible.
;; We aim for ~1kloc of assembly with comments and spaces.
;;
;; The main process sets up the listening socket with a few system calls:
;;
;; [`socket(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?socket+2) → [`bind(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?bind+2) → [`listen(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?listen+2)
;;
;; Then the main process loops on the [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) system call.
;; For each request it [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2)s and processes the request in the child process:
;;
;; 1. set [`alarm(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?alarm+2) to drop very slow clients
;; 2. [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) request headers
;; 3. check that the request is valid (only GET requests are supported)
;; 4. [`open(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?open+2) the requested file
;; 5. get its size with [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2)
;; 6. [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) response headers
;; 7. let the kernel send the rest with [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2)
;; 8. [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) socket and file
;; 9. [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) child
;;
;; In case of an error the process writes a message to stderr and exits with code 1.

;; ## Reference material
;;
;; - [Assembly x86_64 programming for Linux](http://0xax.blogspot.fr/p/assembly-x8664-programming-for-linux.html): introductory blog posts about asm for the x86_64 architecture
;; - [Beej's Guide to Network Programming](http://beej.us/guide/bgnet/): detailed tutorial about unix networking
;; - Servasm's implementation is loosely based on [althttpd.c](https://www.sqlite.org/docsrc/artifact/d53e8146bf7977) from the sqlite project
;; - [Stack frame layout on x86-64](http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/): post about stack frame layout for x86_64
;; - [Linux System Call Table for x86_64](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64)
;;

;; ## Constants
;;
;; The data section keeps all static constants that we might need during the server's lifetime.
section .data

;; We are going to use IPv4 and TCP as our transport.
pf_inet: equ 2
sock_stream: equ 1

;; Our server binds to `0.0.0.0:8080`.
;; `0.0.0.0` is a special IP address that maps to all interfaces on the machine.
sockaddr: db 0x02, 0x00 ;; AF_INET
          db 0x1f, 0x90 ;; PORT 8080
          db 0x00, 0x00, 0x00, 0x00 ;; IP 0.0.0.0
addr_len: equ 128

;; Requests time out after 15 seconds.
request_timeout: equ 15

;; Backlog is the number of incoming requests that the kernel will buffer for us until we [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) them.
;; We set it to 128.
backlog: equ 128

;; And we are going to use the `TCP_CORK` option (more on it later)
sol_tcp: equ 6
tcp_cork: equ 3
on_state: db 0x01

;; We store strings as a pair: the content, followed right after by its length.
;; `$` points to the current memory address, so the current address minus the start of the string is its length.
startup_error_msg: db "ERROR: Cannot start server", 10
startup_error_msg_len: equ $ - startup_error_msg

;; For incoming requests we restrict the path to alphanumeric characters plus `./`
url_whitelist: db "abcdefghijklmnopqrstuvwxyz"
               db "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./"
url_whitelist_len: equ $ - url_whitelist

;; ## Lookup tables.

;; Syscall table for x86-64.
;; For reference look [here](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64).
sys_write: equ 1
sys_open: equ 2
sys_close: equ 3
sys_fstat: equ 5
sys_alarm: equ 37
sys_sendfile: equ 40
sys_socket: equ 41
sys_accept: equ 43
sys_recv: equ 45
sys_bind: equ 49
sys_listen: equ 50
sys_setsockopt: equ 54
sys_fork: equ 57
sys_exit: equ 60
sys_waitid: equ 247


;; We build response headers on the stack.
;; That means that we need to push strings starting from the last one. For example, to build the headers:
;;
;;     HTTP/1.0 200 OK\r\n
;;     Server: servasm\r\n
;;     Content-type: text/html; charset=UTF-8\r\n
;;     Content-Length: 42\r\n
;;
;; We will push `\n\r24 :htgneL-tnetnoC\n\r8-FTU=tesrahc...`.
;; To make this easy we keep pointers to the end of each string instead of the beginning, and use a `0x00` byte to mark the beginning of the string.


;; We use the stack to build headers, so all header strings are pushed from the last character to the first one.
;; We use

;; `\r\n` string
db 0x00, 13, 10
crnl:

;; ### Response codes

;; [200 OK](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1)

db 0x00, "HTTP/1.0 200 OK", 13, 10
result_ok:
;; [403 Forbidden](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4)
db 0x00, "HTTP/1.0 403 Forbidden", 13, 10
result_forbidden:
;; [404 Not Found](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5)
db 0x00, "HTTP/1.0 404 File not found", 13, 10
result_not_found:
;; [500 Internal Server Error](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1)
db 0x00, "HTTP/1.0 500 OOPSIE", 13, 10
result_server_error:
;; [501 Not Implemented](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.2)
db 0x00, "HTTP/1.0 501 Not Implemented", 13, 10
result_unsupported_method:

;; ### Mime Types

;; We use a small number of predefined mime types baked into the source code,
;; and support only UTF-8 encoding.

db 0x00, "text/plain; charset=UTF-8", 13, 10
txt:
db 0x00, "text/html; charset=UTF-8", 13, 10
html:
db 0x00, "text/css; charset=UTF-8", 13, 10
css:
db 0x00, "text/javascript; charset=UTF-8", 13, 10
js:
db 0x00, "image/png", 13, 10
png:
db 0x00, "image/jpeg", 13, 10
jpg:
db 0x00, "application/octet-stream", 13, 10
other:

;; Mime type hash table.
;; Each entry has two quad words.
;; The first quad word is the product of the ASCII codes of the extension's characters.
;; For example:
;;
;;     104 (h) * 116 (t) * 109 (m) * 108 (l) = 142017408 = 0x8770380
;;
;; This means that some unknown files can be served with the wrong mime type in case of a hash collision.
;; And this is okay. Repeat after me: this is okay.
;;
;; The second quad word is a pointer to the end of the matched mime string.
;; If the file type is unknown we serve it as `application/octet-stream`.

mime_table: dq 0x18a380, txt
            dq 0x8770380, html
            dq 0x13fa5b, css
            dq 0x2f9e, js
            dq 0x135ce0, png
            dq 0x12a8a0, jpg
            dq 0x0, other

;; ### Headers

;; [Content-type](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17)
db 0x00, "Content-type: "
content_type_header:

;; [Content-Length](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13)
db 0x00, "Content-Length: "
content_length_header:

;; [Server](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.38)
db 0x00, "Server: servasm", 13, 10
server_header:

;; ## Variables

;; The `BSS` section stores data that can be changed during application execution.
section .bss

;; We store the incoming request in a buffer limited to 1024 bytes.
buffer: resb 1025
buffer_len: equ 1024
buffer_read: resb 8

;; buffer for the result of the [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2) system call
statbuf: resb 144

;; Main server socket
server_fd: resb 8

;; Incoming request socket
client_fd: resb 8

;; File descriptor to be served
file_fd: resb 8

;; Name of requested file
filename: resb 255
filename_len: resb 8

;; Size of a file
file_size: resb 8
;; Mime type for a file
mime_type: resb 8

;; ## Source code

section .text

;; Define entry point
global _start

_start:
;; Our webserver is little more than glue code around a few syscalls; it's actually amazing how much can be done with standard system calls alone.
;;
;; Syscalls are made differently on different architectures and operating systems. We restrict ourselves to the `x86_64` architecture.
;; To make a syscall on `x86_64` you set the `rax` register to the syscall number and
;; the `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9` registers to parameters 1-6 respectively.
;; Then use the `syscall` instruction to pass control to the kernel.
;; The syscall result will be stored in the `rax` register.
;; Look [here](http://blog.rchapman.org/post/36801038863/linux-system-call-table-for-x86-64) for reference.

;; ### Main socket setup

;; Call [`socket(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?socket+2) to create an IPv4 TCP socket
    mov rax, sys_socket
    mov rdi, pf_inet
    mov rsi, sock_stream
    xor rdx, rdx
    syscall
;; If the socket was not created and the syscall returned an error, jump to exit_error
    cmp rax, 0
    js .exit_error
;; If everything is fine, we store the result in `server_fd`.
    mov [server_fd], rax

;; Call [`setsockopt(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?setsockopt+2) to set the `TCP_CORK` flag on the server socket.
;; The `TCP_CORK` flag prevents sockets from flushing after we write headers.
;; This allows us to reduce the number of packets sent, as the first packet will include the headers and the first chunk of the served file.
;;
;; For more info read the [blog post](http://baus.net/on-tcp_cork/) or the [man page](http://linux.die.net/man/7/tcp).
    mov rax, sys_setsockopt
    mov rdi, [server_fd]
    mov rsi, sol_tcp
    mov rdx, tcp_cork
    mov r10, on_state
    mov r8, 8
    syscall
    cmp rax, 0
    js .exit_error


;; Call [`bind(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?bind+2) to bind the socket to the IP address and port.
    mov rax, sys_bind
    mov rdi, [server_fd]
    mov rsi, sockaddr
    mov rdx, addr_len
    syscall
    cmp rax, 0
    js .exit_error

;; And call [`listen(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?listen+2) to start listening for incoming connections.
;; From now on the kernel will buffer up to `backlog` incoming requests.
;; If `backlog` is exceeded, requests will be dropped.
    mov rax, sys_listen
    mov rdi, [server_fd]
    mov rsi, backlog
    syscall
    cmp rax, 0
    js .exit_error

;; Now the socket is initialized and ready to serve clients.

;; ### Main loop
.accept_socket:
;; [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2) a new client from the backlog.
;; The call blocks until the first client connects.
    mov rax, sys_accept
    mov rdi, [server_fd]
    xor rsi, rsi
    xor rdx, rdx
    syscall
    cmp rax, 0
    js .exit_error
;; accept(2) returns the fd for the incoming connection
    mov [client_fd], rax

;; We handle each client in a child process, and when a child exits it becomes a [zombie process](https://en.wikipedia.org/wiki/Zombie_process).
;; The kernel keeps its exit code and some other state until the parent process collects it; this is called `reaping`.
;; We reap all zombie processes before processing each request.
;; This means that some zombies can linger between requests.
;; We use [`waitid(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?waitid+2) to get the last process exit code.
.next_process:
    mov rax, sys_waitid
    mov rdi, 0
    mov rsi, 0
    mov rdx, 0
    mov r10, 4
    mov r8, 0
    syscall
;; If the returned value is >0 it means that we reaped a process, and maybe there are more.
;; So we try again. (Errors are ignored here)
    cmp rax, 0
    jg .next_process

;; We process incoming requests one by one, so we need to return to [`accept(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?accept+2)ing requests ASAP.
;; So we [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2) a new process to handle the client. It will have its own copy of the client fd in the `client_fd` variable.
;; The main process can overwrite this variable safely, as the child has its own copy.
    mov rax, sys_fork
    syscall
;; [`fork(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fork+2) returns a negative number in case of error; if that happens we ragequit from the server.
    cmp rax, 0
    js .exit_error
;; If `rax` is 0, it means we are inside the child, so we jump to serving the request
    jz .process_socket

;; Otherwise we are in the main process, so we close(2) the client fd and jump back to accepting a new client
    mov rax, sys_close
    mov rdi, [client_fd]
    syscall
    cmp rax, 0
    js .exit_error
    jmp .accept_socket

;; ## Processing client
.process_socket:

;; In the child process we [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) the server fd
    mov rax, sys_close
    mov rdi, [server_fd]
    syscall
    cmp rax, 0
    js .exit_error

;; Set alarm(2) to drop slow clients.
;; The kernel will send a `SIGALRM` signal to the child process after `request_timeout` seconds have elapsed.
;; On the happy path we serve the request and exit before the alarm is triggered.
;; Otherwise the child process is simply terminated.
    mov rax, sys_alarm
    mov rdi, request_timeout
    syscall
    cmp rax, 0
    js .exit_error

;; ### Parse request

;; Call [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) to write the request to `buffer`.
;; Our buffer size is limited, but we only need to make a few checks and extract the filename from it.
    mov rax, sys_recv
    mov rdi, [client_fd]
    mov rsi, buffer
    mov rdx, buffer_len
    xor r10, r10
    xor r8, r8
    xor r9, r9
    syscall
    cmp rax, 0
    js .exit_error
;; Our filename extraction algorithm requires that the buffer ends with `" "`.
    mov byte [buffer + 1 + rax], " "
;; Keep the count of bytes read
    mov [buffer_read], rax

;; For now we accept only GET requests.
;; So we return a 501 error to clients if any other request method is used.
    mov rax, result_unsupported_method
    cmp byte [buffer], "G"
    jnz .return_error
    cmp byte [buffer + 1], "E"
    jnz .return_error
    cmp byte [buffer + 2], "T"
    jnz .return_error
    cmp byte [buffer + 3], " "
    jnz .return_error
    cmp byte [buffer + 4], "/"
    jnz .return_error

;; Call the `extract_filename` procedure to extract the filename into the `filename` variable
    call extract_filename

;; `check_filename` returns 0 if the filename is valid; otherwise we return 403.
    call check_filename
    cmp rax, 0
    mov rax, result_forbidden
    jne .return_error

;; Call `get_mime` to derive the mime type from `filename`.
;; It sets the `mime_type` variable.
    call get_mime

;; Try to [`open(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?open+2) the requested file and put the fd into the `file_fd` variable.
    mov rax, sys_open
    mov rdi, filename
    xor rsi, rsi ;; flags: 0 is O_RDONLY
    xor rdx, rdx ;; mode: unused when not creating a file
    syscall
    mov [file_fd], rax

;; Return 404 if opening the file fails.
    cmp rax, 0
    mov rax, result_not_found
    js .return_error

;; Call [`fstat(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?fstat+2) to get the file info structure and extract `file_size` from it
    mov rax, sys_fstat
    mov rdi, [file_fd]
    mov rsi, statbuf
    syscall
    cmp rax, 0
    mov rax, result_server_error
    js .return_error
    mov rax, [statbuf + 48]
    mov [file_size], rax

;; ### Write response
;; After the request has been parsed and the file found, we start writing the response.
.write_response:

;; Read the rest of the request from the socket
    call read_full_request

;; Write headers with the `write_headers` procedure
    call write_headers
    cmp rax, 0
    js .exit_error

;; We use [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2) to make the kernel read data from `file_fd` and write it to `client_fd`.
;; We expect [`sendfile(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?sendfile+2) to send the whole file at once.
    mov rax, sys_sendfile
    mov rdi, [client_fd]
    mov rsi, [file_fd]
    xor rdx, rdx
    mov r10, [file_size]
    syscall ;; ignore errors

;; [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) the client socket
    mov rax, sys_close
    mov rdi, [client_fd]
    syscall ;; ignore errors


;; and [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) file_fd
    mov rax, sys_close
    mov rdi, [file_fd]
    syscall ;; ignore errors


;; and finally [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) from the child process with exit code 0
    xor rax, rax
    jmp .exit

;; ### Error handling
.return_error:

;; Write error response headers and body
;; to the client socket
    call write_error_response

;; and [`close(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?close+2) the client socket, ignoring errors.
    mov rax, sys_close
    mov rdi, [client_fd]
    syscall

.exit_error:
;; [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) an error message to `stderr`
    mov rax, sys_write
    mov rdi, 2 ; stderr
    mov rsi, startup_error_msg
    mov rdx, startup_error_msg_len
    syscall

;; set error code to 1
    mov rax, 1

.exit:
;; call the [`exit(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?exit+2) syscall
    mov rdi, rax
    mov rax, sys_exit
    syscall

;; ## Procedures

;; ### Extract Mime Type

;; We use `filename` and `filename_len` to fill the `mime_type`
;; variable. It will point to the end of the mime string.
get_mime:
    mov rax, 1
    mov rcx, [filename_len]
    dec rcx

;; Calculate mime_hash using the algorithm in the [Mime Types section](#section-Mime_Types).
.get_mime_hash:
    xor rdx, rdx
    mov dl, [filename + rcx]
    cmp dl, "."
    je .get_mime_hash_done
    mul rdx
    dec rcx
    cmp rcx, 0
    je .get_mime_hash_done
    jmp .get_mime_hash

.get_mime_hash_done:
    mov rcx, 0

;; Find the pointer to the mime type using `mime_table`
.get_mime_get_pointer:
    mov r11, [mime_table + rcx]
    cmp r11, rax
    je .get_mime_pointer_done
    cmp r11, 0
    je .get_mime_pointer_done
    add rcx, 16
    jmp .get_mime_get_pointer
.get_mime_pointer_done:
    mov rdi, [mime_table + rcx + 8]
;; and store it in the `mime_type` variable
    mov [mime_type], rdi
    ret


;; ### Write headers
;; Write the 200 OK response and some headers to the client socket
write_headers:

;; We will be using the stack as a buffer for response headers
;; instead of making multiple write calls on the socket.

;; save the stack top to a temporary register
    mov rbp, rsp

;; `push_string` uses `rcx` to keep count of free bytes at the current
;; stack top; -1 means no free bytes are left and we need to make
;; room for a new value.
    mov rcx, -1

;; first we push the end of headers (`\r\n\r\n`)
    mov rsi, crnl
    call push_string
    mov rsi, crnl
    call push_string

;; push `Content-Length` header
    mov rax, [file_size]
    call push_int
    mov rsi, content_length_header
    call push_string

;; push `Content-type` header
    mov rsi, [mime_type]
    call push_string
    mov rsi, content_type_header
    call push_string

;; push server name (`Server` header)
    mov rsi, server_header
    call push_string

;; Push `200 OK` response header
    mov rsi, result_ok
    call push_string

;; calculate the start address of the headers on the stack
    mov rbx, rcx
    add rbx, rsp
    inc rbx

;; restore stack state
    mov rsp, rbp

;; calculate length of headers
    sub rbp, rbx

;; write(2) headers
    mov rax, sys_write
    mov rdi, [client_fd]
    mov rsi, rbx
    mov rdx, rbp
    syscall

    ret

;; ### Write error response
;; Write response headers and body to the client fd.
;; Expects rax to point to the end of the error response code string.
write_error_response:
    mov r11, rax

;; read the rest of the request from the socket
    call read_full_request

;; see the `write_headers` procedure for comments on using `push_string`.

;; write end of response
    mov rbp, rsp
    mov rcx, -1
    mov rsi, crnl
    call push_string

;; write response body
    mov rsi, r11
    call push_string

;; write body | headers separator
    mov rsi, crnl
    call push_string

;; write response status line
    mov rsi, r11
    call push_string

;; calculate the start address of the response on the stack
    mov rbx, rcx
    add rbx, rsp
    inc rbx

;; restore stack state
    mov rsp, rbp

;; [`write(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?write+2) the response from the stack to the client socket
    mov rax, sys_write
    mov rdi, [client_fd]
    mov rsi, rbx
    sub rbp, rbx
    dec rbp
    mov rdx, rbp
    syscall ;; ignore errors

    ret


;; ### push string
;;
;; `rsi` should point to the end of the string; the string should begin with a `0x00` byte.
;; `rcx` is used to store the byte shift from the stack top (0-7); if `rcx` is -1 it means
;; that additional stack space is required. The function will grow the stack.
;;
;; If `push_string` is called multiple times it will form a contiguous string on the stack.
;; For example, two calls starting with `rcx` = -1, passing `0x00, "llo"` and then `0x00, "he"`, will push `"hello"`
;; to the stack and set `rcx` to 2
push_string:
;; remove the return address from the stack
;; and store it in the `rdx` register
    pop rdx
;; we use `0x00` to mark the beginning of the passed string.
    mov al, 0x00

.push_string_next:
;; if we have no free bytes on the stack
;; add 8 bytes and change `rcx` accordingly
    cmp rcx, -1
    jne .push_string_write
    push 0
    mov rcx, 7

.push_string_write:
;; move the string to the stack, starting from the string end, until `0x00`
    dec rsi
    mov rbx, [rsi]
    cmp al, bl
    je .push_string_ret
    mov byte [rsp + rcx], bl
    dec rcx
    jmp .push_string_next

.push_string_ret:
;; restore stack
    push rdx
    ret

;; ### Push int
;; converts rax to a string and calls push_string on it
push_int:
;; remove the return address from the stack
;; and store it in the `rdi` register.
    pop rdi

;; we convert the integer value to a sequence of base-10 characters and push each character with the `push_string` procedure.
    mov r8, rax
.push_int_next:
    mov rax, r8
    xor rdx, rdx
    mov r11, 10
    div r11
    mov r8, rax
    add dl, 48
    mov rsi, rsp
    sub rsi, 8
    mov byte [rsi - 1], dl
    mov byte [rsi - 2], 0x00
    call push_string
    cmp r8, 0
    je .push_int_ret
    jmp .push_int_next
.push_int_ret:
;; restore stack
    push rdi
    ret

;; ### Read rest of request
;; The spec requires us to read the full request with headers before we can send a response.
read_full_request:
;; We kept the number of bytes read from the socket in the `buffer_read` variable.
    mov rax, [buffer_read]
;; We check that the last bytes received from the client were `\r\n\r\n`
.check_buffer:
    cmp byte [buffer + rax - 1], 10
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 2], 13
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 3], 10
    jne .read_more_from_client_socket
    cmp byte [buffer + rax - 4], 13
    jne .read_more_from_client_socket
    ret

;; If not, we [`recv(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?recv+2) more data from the socket and check the buffer again in a loop.
.read_more_from_client_socket:
    mov rax, sys_recv
    mov rdi, [client_fd]
    mov rsi, buffer
    mov rdx, buffer_len
    xor r10, r10
    xor r8, r8
    xor r9, r9
    syscall
    jmp .check_buffer

;; ### Extract filename
;; Fills the `filename` and `filename_len` variables based on the request buffer content.
extract_filename:
;; We expect only GET requests in the buffer, so the filename should start at offset 5, right after the `GET /` string.
    mov rsi, buffer + 5
    mov rdi, filename
    xor rcx, rcx

;; We copy characters from the buffer until we see a `'?'` or `' '` character.
.extract_filename_next_char:
    cld
    cmp byte [rsi], " "
    jz .extract_filename_check_index
    cmp byte [rsi], "?"
    jz .extract_filename_check_index
    movsb
    jmp .extract_filename_next_char

;; If the filename is empty (client requested `/`), we set `filename` to `index.html`
.extract_filename_check_index:
    mov rcx, rdi
    sub rcx, filename
    cmp rcx, 0
    jnz .extract_filename_done
    mov rax, "index.ht"
    mov [filename], rax
    mov rax, "ml"
    mov [filename + 8], rax
    mov rcx, 10

.extract_filename_done:
    mov [filename_len], rcx
    ret

;; ### Check filename
;; Checks that the filename is safe to [`read(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?read+2) from the filesystem.
check_filename:
    mov rsi, -1

;; First check that the `filename` characters match the whitelist
.check_filename_whitelist:
    inc rsi
    mov byte al, [filename + rsi]
    cmp rsi, [filename_len]
    jz .check_filename_whitelist_ok
    mov rdi, url_whitelist
    mov rcx, url_whitelist_len
    repne scasb
    je .check_filename_whitelist
    jmp .check_filename_return_error

.check_filename_whitelist_ok:
    mov rcx, [filename_len]

;; Then check that the filename doesn't contain `".."`.
.check_filename_double_dot:
    dec rcx
    cmp word [filename + rcx], ".."
    je .check_filename_return_error
    cmp rcx, 0
    je .check_filename_return_success
    jmp .check_filename_double_dot

.check_filename_return_success:
    xor rax, rax
    ret

.check_filename_return_error:
    mov rax, 1
    ret

;; ## Known issues
;;
;; - We use tmp registers to store some global state between procedure calls.
;; This makes recursion impossible and can lead to hidden bugs.
;; The natural way to solve this is to use the stack for keeping state between procedure calls, but we use the stack to build the response string.
;; - While simple, forking on each request is not optimal for performance.
;; Modern webservers use [`epoll(2)`](http://unixhelp.ed.ac.uk/CGI/man-cgi?epoll+2) to process multiple requests in a single process.

;; ## License
;;
;; Copyright (c) 2015 Vladimir Terekhov
;;
;; Permission is hereby granted, free of charge, to any person
;; obtaining a copy of this software and associated documentation
;; files (the "Software"), to deal in the Software without
;; restriction, including without limitation the rights to use,
;; copy, modify, merge, publish, distribute, sublicense, and/or sell
;; copies of the Software, and to permit persons to whom the
;; Software is furnished to do so, subject to the following
;; conditions:
;;
;; The above copyright notice and this permission notice shall be
;; included in all copies or substantial portions of the Software.
;;
;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
;; OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
;; NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
;; HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
;; WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
;; FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
;; OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
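The `mime_table` keys in server.asm are described as the product of the ASCII codes of the extension's characters. When adding an entry for another extension, the key could be computed with a small script. The following is a minimal sketch in POSIX shell; the `mime_hash` helper is hypothetical and not part of the repository, and it assumes short extensions whose product fits in the shell's integer range.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the mime_table
# key for a file extension, i.e. the product of its ASCII codes, in hex.
mime_hash() {
    ext=$1
    hash=1
    while [ -n "$ext" ]; do
        rest=${ext#?}                 # extension without its first character
        ch=${ext%"$rest"}             # the first character on its own
        code=$(printf '%d' "'$ch")    # ASCII code of that character
        hash=$((hash * code))
        ext=$rest
    done
    printf '0x%x\n' "$hash"
}

mime_hash html   # prints 0x8770380, matching the html entry in mime_table
```

Multiplication is commutative, so it does not matter that the assembly walks the extension from its last character backwards while this sketch walks it forwards. It also means that extensions made of the same letters in a different order share a key, which is the collision case the comments in server.asm accept.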