├── .DS_Store
├── book.json
├── index.html
└── main.py


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RivaanRanawat/web-server-python/ed8fd8d87a6f9981857f53302d3ae36ead9154cc/.DS_Store

--------------------------------------------------------------------------------
/book.json:
--------------------------------------------------------------------------------
{
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "year": 1925,
    "genres": ["Fiction", "Classic", "Literary"],
    "available": true,
    "details": {
        "pages": 180,
        "language": "English",
        "isbn": "978-0-7432-7356-5"
    }
}

--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
Document

Greetings! Welcome to my own web server!


This is created from scratch, YEP!


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import socket
# import time

# Define the host and port
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8080

# ---------- CREATE SOCKET ----------
# The first step is to create and initialise our socket. What is a socket?
# A socket is an endpoint in a network communication system. Sockets are
# the most important tool in creating our own server as they allow
# data to be sent and received over the network. They facilitate the
# connection between the client and the server and enable the server
# to handle multiple client connections simultaneously. Without sockets,
# the data communication needed for the operation of a server would not
# be possible.
# You might be thinking, what's HTTP then? HTTP, short for Hypertext Transfer Protocol,
# is the protocol that's being used when data is sent or received through
# the sockets. Keep this in mind as it'll be useful when we write our own HTTP
# response.

# socket.socket() -> initializes a new socket.

# The first argument we want to pass in is the address family. For this,
# we will pass in an IP address family.
# You might ask, what is IP?
# The Internet Protocol (IP) is a set of rules for routing packets of
# data across networks between devices. Basically, when data or information
# travels over the Internet or web, it travels in small packets.
# IP addresses ensure that devices like computers, servers etc.
# route those data packets to the correct place.

# socket.AF_INET specifies Internet Protocol v4 addresses, the first
# widely deployed version of IP and still the most used. You could use v6
# addresses (socket.AF_INET6), the newer version, if you want.
# If you're wondering about the main
# reason why IPv6 came out, it is because IPv4 can provide only
# around 4.3 billion addresses and the number of devices was exceeding that.
# IPv6 supports around three hundred forty undecillion addresses.
# Of course, there are other improvements too, but the number
# of addresses supported was the biggest problem that led to IPv6.

# socket.SOCK_STREAM means it is a TCP socket. TCP establishes
# a connection between the sender and receiver and ensures that the
# data, once it arrives, is complete, in order, and error-free.
# This connection is established using a handshake process in 3 steps (open up the image):
# 1. SYN (Synchronize): The client wants to establish a connection with
#    the server, so it sends a packet with the SYN (synchronize) flag set
#    to the server. This packet includes a sequence number, which is a
#    random number that initiates the sequence numbers for the data
#    packets that the client will send.
# 2. SYN-ACK (Synchronize-Acknowledgment): Upon receiving the SYN packet,
#    the server responds with a SYN-ACK packet. This packet acknowledges
#    the client's SYN packet (using the ACK flag) and includes the
#    server's own sequence number for the data packets it will send
#    to the client.
# 3. ACK (Acknowledgment): The client receives the server's SYN-ACK
#    packet and responds with an ACK packet. This packet acknowledges
#    the server's SYN packet. At this point, the handshake is complete,
#    and both the client and server have established a reliable connection.
#    They can now start exchanging data.
# If you have difficulty understanding this process, think of it this way - someone knocks at your door,
# you open the door, then they say HI and your communication begins.

# Instead of using SOCK_STREAM, you can use SOCK_DGRAM, which tells
# the socket to use UDP.
# UDP stands for User Datagram Protocol. UDP sends packets,
# called datagrams, directly to the recipient without verifying
# whether the recipient is ready to receive or not.
# So, at first glance UDP and TCP might seem like the same thing
# but they're actually quite different.
# TCP ensures reliable transmission through error checking,
# retransmissions when packets are lost or corrupted, and congestion
# control to reduce traffic load, meaning if the network condition
# is bad, it will alter the rate of transfer of data. Because of
# this, TCP is used where reliability and data integrity are
# critical, such as web browsing (HTTP/HTTPS) and email (SMTP, IMAP/POP3).
# UDP doesn't do all of this. It does not establish a connection/handshake before sending data.
# It sends packets without verifying whether the recipient is ready to receive.
# The benefit of this is that UDP is faster than TCP as it skips
# the handshake, acknowledgements and retransmissions. The tradeoff is there's no guarantee that the packets
# will arrive in order, or in fact even arrive. This is why UDP is
# generally used in applications such as live video or audio streaming,
# online games, and broadcasting services.
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# server_socket.setblocking(False)
# After creating the socket, we can add optional settings to change
# the default behaviour of sockets.
# The first thing setsockopt takes in is the level, meaning
# the protocol level at which the configuration is happening.
# For socket-level options, use socket.SOL_SOCKET. For TCP options,
# you can use socket.IPPROTO_TCP.
# Next, we need to pass in the option name, meaning the feature we
# need to enable. We will use SO_REUSEADDR. SO stands for
# socket option and REUSEADDR means reuse address.
# This specific option tells the
# kernel to allow this endpoint (IP address and port) to be
# reused immediately after the socket is closed. Normally, there is
# a delay before an endpoint can be reused, to ensure that any
# delayed packets in the network are not mistakenly delivered to
# the wrong application.
# Finally, we pass in the value, which is usually either 1 or 0; 1
# means on and 0 means off.
# Simply put, this line allows the server socket to reuse a
# local address immediately after the socket is closed,
# instead of waiting for the default timeout.
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Till now, with our 2 lines, great progress btw, we have created
# and initialised a socket along with an option that lets us restart
# the server without waiting. Next, we need to connect, or more specifically bind
# our socket to our computer to tell it where our server should listen
# for incoming network requests. This is done using the IP address
# and port. What is an IP address? Just a bunch of unique numbers
# and letters that identifies and locates a device on a network.
# Every website is ultimately reached through an IP address,
# for example the Google website could be reached by typing 142.250.189.206 (show it)
# DNS or the Domain Name System is used to map a readable name to this
# IP address so that we humans type only google.com to reach the website
# instead of a confusing number.
# Now what is a port? Basically, a port helps differentiate
# between multiple services running on the same computer.
# It allows computers and servers to route the incoming network
# traffic to the correct application or service.
# For example, web servers usually listen on port 80 for HTTP traffic
# and port 443 for HTTPS traffic.
# By using different port numbers,
# a single server can host multiple services, each listening on its
# unique port, ensuring that all the HTTP calls go to port 80 and
# HTTPS calls go to 443. Think of it this way - if an IP address is like a
# company's phone number, then a port number is like an extension.
# Since we want this server to be accessible from any device that can
# reach this computer, we can pass in 0.0.0.0. If you only want this
# computer to access the server, you can pass 127.0.0.1.
# For port, you can pass in any number in the range 0-65535 but remember that ports
# 0-1023 are well-known ports reserved for system services, so don't use those.
# On many systems, binding to one of them without administrator
# privileges fails with a permission error.
# I'll pass 8080 because it is similar to 80 and, as I said
# before, web servers typically listen on port 80 for HTTP requests.
server_socket.bind((SERVER_HOST, SERVER_PORT))

# The final step in socket creation is making the socket listen for
# incoming connection requests using server_socket.listen().
# How this works internally is that the operating system takes
# note that this particular socket is now ready to accept incoming
# connections. It creates a queue for these connections, handling
# the initial handshake required. To specify the size of this
# queue, we can pass the backlog value to the listen function.
# To be clearer, the backlog parameter tells the maximum
# number of fully established connections that can wait in the queue.
# But wait, what if the queue is full, what will happen to the new connection request?
# The new connection will either be refused or ignored (depending on the operating
# system and its configuration), causing the client to retry later.
server_socket.listen(5)

print(f'Listening on port {SERVER_PORT} ...')  # if all goes well, we will print that we have started listening on a specific port

while True:  # so that we continuously keep listening to new client connections

    # Now, remember that our new connections get piled up in the queue?
    # How do I take the front element of that queue? Using server_socket.accept()
    # This will give us a tuple with the socket and address of the client.
    # The socket object is specifically for communication with the newly accepted connection,
    # don't confuse it with the server socket. The listening socket will continue to
    # listen for other incoming connections, while the new socket is used to send
    # and receive data on the connection just accepted. The address we receive
    # is again a tuple of (host, port), where host is the client's IP address
    # and port is the port used by the client's machine for the connection.
    # By the way, what do you think will happen if there are no connections in the queue?
    # accept() will block the code until a connection becomes available (unless the socket
    # is configured to be non-blocking). You can set it to be non-blocking by calling
    # server_socket.setblocking(False) at the top of the file (do it, get an error)
    # The error comes in because we told the socket to never block the execution
    # and when it tried to get a connection request from the queue, the queue was empty.
    # To resolve this, we can use a try..except block and catch the BlockingIOError.
    # An important thing to mention is that BlockingIOError isn't just raised
    # when we call the accept function, it can also be raised when we send or
    # receive data on a non-blocking socket.
    # I'll be using the blocking version for the rest of the tutorial but
    # feel free to follow along with the non-blocking version.

    # try:
    print('ran')  # <----- add print statement to explain the blocking
    client_socket, client_address = server_socket.accept()
    print('ran2')  # <----- add print statement to explain the blocking
    # except BlockingIOError:
    #     time.sleep(1)
    #     continue

    # Next, we need to get the data or in our case, the HTTP request from the client so that we can give
    # an appropriate response. To do that, we use the client_socket and call
    # the recv function on it. We can specify the maximum amount of data
    # we can handle in bytes. The most widely used networks limit packets
    # to approximately 1500 bytes so we can pass in something like that.
    # Keep in mind that TCP is a stream, so a larger request may need
    # more than one recv call.
    # The recv function returns bytes, obviously something we can't understand.
    # So, we can convert it to a string using the decode function.
    request = client_socket.recv(1500).decode()
    print(request)

    # This request is composed of a request line, headers, and
    # an optional message body.

    # The first line of the HTTP request is the request line.
    # It contains 3 elements - the first is the HTTP method, in our case it is
    # GET but it can also be POST, PUT, PATCH, DELETE, HEAD & OPTIONS.
    # GET, as the name suggests, tells/implies that some
    # resource should be fetched. POST suggests that some data is
    # created and pushed to the server. PUT and PATCH update a resource
    # (fully or partially) and DELETE removes it. HEAD is similar to the GET method,
    # except that it requests the server to respond with the headers
    # only and not the actual body of the response. Where is that useful?
    # If a URL produces a large download, a HEAD request could read its
    # Content-Length header to check the file size without actually downloading the
    # file. OPTIONS lists the HTTP methods and other options supported by
    # a web server without performing any action or transferring a
    # resource's data.
    # The next part of the first line is usually
    # the URL but in our case, the path is mentioned, meaning what route of the
    # site is called. For example, if I go to YouTube.com/@RivaanRanawat, /@RivaanRanawat
    # is the path. So, in our case since we just passed localhost:8080, it went
    # to /, the default home path for most websites. I'm not going into
    # much detail for this, you can look at this structure to understand more
    # about URLs if you want to. (insert image)

    # The final part of the first line is the HTTP version that was used to send
    # this request. It is 1.1 but there are various versions of HTTP - 0.9, the
    # first version, 1.0, 1.1, 2 and 3. This versioning might
    # not seem important but it is actually pretty important! The HTTP
    # version used here determines the structure of the rest of this
    # HTTP request. In version 0.9, the request only specified the first
    # two parts of the first line - the HTTP method and the path. What about
    # the version? It was the first version so of course the concept of HTTP
    # versions didn't exist back then. The response was simply an HTML
    # file, no other type of file or message could be sent. In version
    # 1.0, the version was added to the first line and headers were attached
    # in both requests and responses, more about that in some time.
    # Things like the status code and the ability to send files other than
    # just HTML were also added to the response. The biggest problem with
    # version 1.0 was interoperability, meaning different browsers and
    # servers communicating with each other. Why did this issue exist?
    # That's because many people tried to improve the 1.0 version by
    # adding new features, that's good but there wasn't a good way to
    # make sure all browsers and servers understood these new features.
    # Imagine the issue if I talk to you in English and suddenly say
    # some important information in Hindi. All of this was fixed in
    # the 1.1 version, the first standardised protocol. Along with
    # this, multiple new things were added like cache control and pipelining,
    # meaning the ability to send a second request before the response to
    # the first one has arrived. In prior versions, a new TCP connection was created
    # for each HTTP call. As you can imagine, this was inefficient as a
    # web page generally required multiple resources such as images,
    # scripts, and stylesheets. The overhead of establishing and tearing
    # down TCP connections for each resource increased the page load
    # times and put more load on the servers. In 1.1, a connection can
    # be/is reused. We will understand how this connection can be reused
    # in a couple of minutes. HTTP 2 was introduced about 18 years later and
    # focused on improving the performance. HTTP 3 followed and
    # focused on changing TCP to QUIC, originally short for Quick UDP Internet
    # Connections. You can think of QUIC as a protocol built on top of
    # UDP providing the reliability and ordering of TCP but with reduced
    # latency and improved performance. How does this happen? Remember,
    # TCP requires a three-way handshake to establish a connection? This
    # adds latency. Additionally, if encryption is done (as in HTTPS),
    # there is an additional handshake process. QUIC combines the
    # connection and security handshakes, reducing the initial setup time.
    # How is performance improved? We know TCP connections are identified
    # by IP addresses and port numbers, we did that in our code too. But
    # what if a user's IP address changes (like when switching from Wi-Fi
    # to mobile data)? Then the TCP connection must be reestablished.
    # In QUIC, connections are identified by connection IDs rather than
    # IP addresses so it doesn't matter if the network or IP changes.
    # All this information is good but the main question is why are we
    # getting an HTTP 1.1 request, not HTTP 2 or 3? Simple, we have created a
    # basic server that depends on TCP because of which HTTP 3 isn't used.
    # In order to run an HTTP 3 server, we need to depend on QUIC. To use
    # HTTP 2, we would need to implement its binary framing layer, which enables
    # features like multiplexing, the ability to run multiple requests
    # at once, and in practice serve over TLS, since browsers only use
    # HTTP 2 over HTTPS. The interesting thing we notice here is the ability of
    # browsers to use HTTP 1.1 to send a request if the server it is
    # interacting with doesn't use HTTP 2/3. Alright, enough about
    # the first line, I promise we'll go over the next lines quicker.
    # To remind you, the text lines and structure we will see next are
    # what HTTP 1.0 and 1.1 use; HTTP 2 and 3 carry the same
    # information in a binary format.

    # The next line, Host: localhost:8080,
    # specifies the domain name (or IP address) and the port of the
    # server to which the request is being sent.
    # The next line, Connection: keep-alive, tells the server that the
    # client wants to keep the connection open for further requests
    # rather than closing it right after this request is fulfilled.
    # Remember, I had told you that in 1.1, a connection can be/is reused.
    # If this line says keep-alive, the connection is reused.
    # User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
    # AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36:
    # This line provides detailed information about the client's browser,
    # operating system, and rendering engine.
    # sec-ch-ua-platform: "macOS": This line specifies the operating
    # system the browser is running on.
    # Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8:
    # This header specifies the media types that the client can process.
    # The q parameter indicates the quality factor (preference) for a
    # type, from 0 to 1; types without a q value default to 1.
    # Sec-Fetch-Site: same-origin: lets us know if the request is being
    # made from the same origin as the destination.
    # Sec-Fetch-Mode: no-cors: Specifies the mode for how the request
    # should be made regarding CORS (Cross-Origin Resource Sharing)
    # policies. What is CORS? A mechanism implemented by
    # web browsers that controls which other sites are allowed to
    # read a server's responses. What on earth does that mean?
    # Let's use an analogy.
    # Imagine you have a blog hosted at `www.myawesomeblog.com`, and
    # a script on one of your blog posts wants to fetch some data from `www.youtube.com`.
    # The same-origin policy is a security measure implemented in
    # web browsers. It ensures that scripts running on pages from
    # your blog can only read responses from `www.myawesomeblog.com`
    # and not from other sites directly, like `www.youtube.com`.
    # This rule helps protect your blog's visitors from potential
    # malicious scripts and data theft. What do you do in that case,
    # you still want that YouTube data? Enter CORS
    # (Cross-Origin Resource Sharing), a mechanism that allows
    # restricted resources on a web page to be requested from another
    # domain outside the domain from which the first resource was served.
    # So, if YouTube sets up its servers to include specific CORS headers
    # (such as Access-Control-Allow-Origin) in responses, it can allow
    # your blog's scripts to read them.
    # Essentially, YouTube is telling the browser, "It's safe to
    # let `www.myawesomeblog.com` read this response".
    # So,
    # behind the scenes, when a script on your blog requests data from
    # `www.youtube.com`, the browser sends the request along with your blog's origin.
    # YouTube's servers respond with the data along with headers that say,
    # "We permit `www.myawesomeblog.com` to read this content." The
    # browser checks these permissions (the CORS headers) and decides
    # it's okay to hand the response to your blog's script. Understood,
    # but what is the no-cors that's mentioned? The "no-cors" mode is
    # generally used when you don't need to read the content from
    # the other site but just need to include it or refer to it,
    # for example an image.
    # Sec-Fetch-Dest: document - tells what the response will be used
    # for - in our case, a document the browser is navigating to
    # (for an image request, the value would be image).
    # Referer: http://localhost:8080/ - tells the address of the web
    # page that initiated the request.
    # Accept-Encoding: gzip, deflate, br, zstd - Tells the server
    # which encoding algorithms the client can understand for
    # compressing the response. This client supports gzip, deflate,
    # Brotli (br), and Zstandard (zstd).
    # Accept-Language: en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7:
    # Specifies the client's preferred languages, ordered by preference
    # using q values. In my case, English as used in India is
    # preferred, followed by British English, American English, and
    # then any other type of English.
    # All of these were headers.

    # The third thing in this request is an optional message body.
    # You might wonder, where is the message body? And that's
    # right, there isn't one. GET requests normally carry no message body. However,
    # if it was a POST request, we would have a message body.
    # Let me show it to you! (Notice one blank line is left between
    # the headers and the message body)

    # That's all about the HTTP request format!
    # This was very important
    # to understand as based on the request, we can frame our response
    # and this is going to be a piece of cake!

    # Returns HTTP response
    headers = request.split('\n')
    first_header_components = headers[0].split()

    # An empty or malformed request line has fewer than two parts;
    # close the connection and wait for the next client.
    if len(first_header_components) < 2:
        client_socket.close()
        continue

    http_method = first_header_components[0]
    path = first_header_components[1]

    if http_method == 'GET':
        # Map each known path to the file that should be served
        if path == '/':
            filename = 'index.html'
        elif path == '/book':
            filename = 'book.json'
        else:
            # handle the edge case: unknown path
            filename = None

        if filename is None:
            response = 'HTTP/1.1 404 Not Found\r\n\r\nNot Found'
        else:
            with open(filename) as fin:
                content = fin.read()
            response = 'HTTP/1.1 200 OK\r\n\r\n' + content
    else:
        # Allow is a header, so it goes before the blank line
        response = 'HTTP/1.1 405 Method Not Allowed\r\nAllow: GET\r\n\r\n'

    # You have the response created, you only want to send it back to
    # the client. To do that, there are multiple functions on client_socket.
    # We'll be using sendall so that all the data is sent to the client.
    # sendall accepts a bytes-like object but we have a response in string format,
    # what do we do? We will encode the response so that it's converted
    # into bytes.
    # We could have also used the send function but the problem with it
    # is that there's no guarantee send() will send all the data in one
    # call, especially if the message is large or the network is busy.
    # sendall on the other hand continues sending data from the buffer
    # until either all data has been sent or an error occurs.
    # Basically, sendall handles the sending of data that was not
    # successfully sent in one go.
    client_socket.sendall(response.encode())

    # Close connection - show what happens if the client socket
    # is not closed.
    client_socket.close()

# Close socket (reached only if the loop above exits)
server_socket.close()
--------------------------------------------------------------------------------
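
The response-building step in main.py can also be pulled out into a small function that is easy to test without opening a socket. The block below is only a sketch under stated assumptions, not the repository's code: `ROUTES`, `build_response` and `_format` are hypothetical names, the route bodies are in-memory stand-ins for index.html and book.json, and the `Content-Type`, `Content-Length` and `Connection` headers are additions that main.py itself does not send.

```python
# Hypothetical in-memory routes standing in for index.html and book.json,
# so the sketch runs without touching the filesystem.
ROUTES = {
    '/': ('text/html; charset=utf-8', '<h1>Greetings!</h1>'),
    '/book': ('application/json', '{"title": "The Great Gatsby"}'),
}


def _format(status, content_type, body, extra_headers=None):
    """Assemble status line, headers, blank line and body into bytes."""
    body_bytes = body.encode('utf-8')
    headers = {
        'Content-Type': content_type,
        # Content-Length counts bytes, not characters
        'Content-Length': str(len(body_bytes)),
        'Connection': 'close',
    }
    headers.update(extra_headers or {})
    head = f'HTTP/1.1 {status}\r\n'
    head += ''.join(f'{name}: {value}\r\n' for name, value in headers.items())
    # A blank line (CRLF) separates the headers from the body
    return head.encode('ascii') + b'\r\n' + body_bytes


def build_response(request: str) -> bytes:
    """Turn a raw HTTP/1.x request string into response bytes."""
    lines = request.splitlines()
    parts = lines[0].split() if lines else []
    # The request line must be: method, path, version
    if len(parts) != 3:
        return _format('400 Bad Request', 'text/plain', 'Bad Request')

    method, path, _version = parts
    if method != 'GET':
        return _format('405 Method Not Allowed', 'text/plain',
                       'Method Not Allowed', extra_headers={'Allow': 'GET'})
    if path not in ROUTES:
        return _format('404 Not Found', 'text/plain', 'Not Found')

    content_type, body = ROUTES[path]
    return _format('200 OK', content_type, body)
```

Inside the accept loop, this would be used as `client_socket.sendall(build_response(request))`. Keeping the parsing and formatting in plain functions means the 404, 405 and empty-request cases can be checked with simple assertions before any browser is involved.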