└── README.md
/README.md:
--------------------------------------------------------------------------------
1 | # The Backend Engineering Fundamentals
2 |
3 | ### OSI Model
4 | The Open Systems Interconnection (OSI) model was published by the International Organization for Standardization (ISO) in 1984. It provides a framework for creating and implementing networking standards, devices, and internetworking schemes.
5 | | Group | Layer No. | Layer Name | Description |
6 | | ------ | --------- | ----------- | ----------- |
7 | | Top Layers | 7 | Application | Provide a user interface for sending and receiving data |
8 | | - | 6 | Presentation | Encrypt, format, and compress data for transmission |
9 | | - | 5 | Session | Initiate and terminate a session with the remote system |
10 | | Bottom Layers | 4 | Transport | Break the data stream into smaller segments and provide reliable or unreliable data delivery |
11 | | - | 3 | Network | Provide logical addressing |
12 | | - | 2 | Data Link | Prepare data for transmission |
13 | | - | 1 | Physical | Move data between devices |
14 | #### Application Layer
15 | The top layer of the OSI model is the application layer. It provides the protocols and services that network-aware applications need in order to connect to the network. FTP, TFTP, POP3, SMTP, and HTTP are examples of standards and protocols used in this layer.
16 | #### Presentation Layer
17 | Conversion, compression, and encryption are the main functions that the Presentation layer performs on the sending computer while on the receiving computer these functions are reconversion, decompression, and decryption. ASCII, BMP, GIF, JPEG, WAV, AVI, and MPEG are examples of standards and protocols that work in this layer.
18 | #### Session Layer
19 | The session layer is responsible for establishing, managing, and terminating communications between two computers. RPCs and NFS are examples of the session layer.
20 | #### Transport Layer
21 | The main functions of the Transport layer are segmentation, data transportation, and connection multiplexing. For data transportation, it uses the TCP and UDP protocols. TCP is connection-oriented and provides reliable data delivery; UDP is connectionless and does not guarantee delivery.
22 | #### Network Layer
23 | Defining logical addresses and finding the best path to reach the destination address are the main functions of this layer. Routing takes place in this layer, and routers operate here. IP, IPX, and AppleTalk are examples of protocols at this layer.
24 | #### Data Link Layer
25 | Defining physical addresses, finding hosts in the local network, and specifying standards and methods to access the media are the primary functions of this layer. Switching takes place in this layer, and switches and bridges operate here. HDLC, PPP, and Frame Relay are examples of protocols at this layer.
26 |
27 | This layer has two sub-layers: MAC (Media Access Control) and LLC (Logical Link Control).
28 | #### Physical Layer
29 | The Physical Layer mainly defines standards for the media and devices that are used to move data across the network. 10BaseT, 100BaseT, CSU/DSU, DCE, and DTE are examples of the standards used in this layer.
30 |
31 |
32 |
33 | ### TCP/IP
34 | TCP/IP stands for Transmission Control Protocol/Internet Protocol. The TCP/IP stack is designed as a model that offers a highly reliable, end-to-end byte stream over an unreliable internetwork.
35 | 
36 |
37 | 
38 |
39 |
40 | The TCP/IP model was developed for ARPANET (Advanced Research Projects Agency Network).
41 | Some of the most widely used TCP/IP protocols are:
42 |
43 | 
44 |
45 | #### TCP:
46 | The Transmission Control Protocol is part of the Internet protocol suite. It breaks a message up into TCP segments and reassembles them at the receiving side.
47 |
48 | #### IP:
49 | The Internet Protocol identifies devices by IP address, a numerical label assigned to each device connected to a computer network that uses IP for communication. Its routing function enables internetworking and essentially establishes the Internet. Combining IP with TCP allows a virtual connection to be established between a source and a destination.
50 |
51 | #### HTTP:
52 | The Hypertext Transfer Protocol is the foundation of the World Wide Web. It transfers web pages and other resources from an HTTP server (web server) to an HTTP client (web client). A web browser such as Google Chrome or Firefox is a web client: it uses HTTP to fetch the pages you request from remote servers.
53 |
54 | #### SMTP:
55 | SMTP stands for Simple Mail Transfer Protocol. It is the protocol that supports e-mail: it is used to send messages to another e-mail address.
56 |
57 | #### SNMP:
58 | SNMP stands for Simple Network Management Protocol. It is a framework for managing devices on the Internet using the TCP/IP protocol suite.
59 |
60 | #### DNS:
61 | DNS stands for Domain Name System. An IP address uniquely identifies a host's connection to the Internet, but users prefer names to numeric addresses. DNS maps those names to the corresponding addresses.
62 |
63 | #### TELNET:
64 | TELNET stands for Terminal Network. It establishes a connection between a local and a remote computer in such a way that the local terminal behaves as if it were a terminal attached to the remote system.
65 |
66 | #### FTP:
67 | FTP stands for File Transfer Protocol. It is a widely used standard protocol for transferring files from one machine to another.
68 |
69 |
70 | #### [How OSI and TCP/IP work?](https://youtu.be/3kfO61Mensg)
71 |
72 |
73 |
74 | ### IP Address
75 | An Internet Protocol address (IP address) serves two main functions: network interface identification and location addressing.
76 | IPv4 defines an IP address as a 32-bit number. However, because of the growth of the Internet and the depletion of available IPv4 addresses, a new version of IP, IPv6, which uses 128 bits for the address, was standardized in 1998.
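
The difference in address width can be checked with Python's standard `ipaddress` module. This is a small sketch: the IPv4 address is the sample used later in this README, and the IPv6 address is an assumed example from the documentation prefix `2001:db8::/32`.

```python
import ipaddress

# Sample addresses (the IPv6 one is an illustrative assumption).
v4 = ipaddress.ip_address("207.126.144.100")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the address width in bits.
print(v4.version, v4.max_prefixlen)   # IPv4 addresses are 32 bits wide
print(v6.version, v6.max_prefixlen)   # IPv6 addresses are 128 bits wide

# Underneath the dotted or colon notation, an address is just an integer.
as_int = int(v4)
print(as_int, ipaddress.ip_address(as_int))

# Size of each address space.
print(2 ** v4.max_prefixlen)
print(2 ** v6.max_prefixlen)
```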
77 |
78 |
79 |
80 | ### NAT
81 | Network address translation (NAT) is a method of mapping an IP address space into another by modifying network address information in the IP header of packets while they are in transit across a traffic routing device. The technique was originally used to bypass the need to assign a new address to every host when a network was moved, or when the upstream Internet service provider was replaced, but could not route the network's address space. It has become a popular and essential tool in conserving global address space in the face of IPv4 address exhaustion. One Internet-routable IP address of a NAT gateway can be used for an entire private network.
82 | 
83 |
84 | #### Routing
85 | Network address translation can be used to mitigate IP address overlap. Address overlap occurs when hosts in different networks with the same IP address space try to reach the same destination host. This is most often a misconfiguration and may result from the merger of two networks or subnets, especially when using RFC 1918 private network addressing. The destination host experiences traffic apparently arriving from the same network, and intermediate routers have no way to determine where reply traffic should be sent to. The solution is either renumbering to eliminate overlap or network address translation.
86 |
87 | #### Load balancing
88 | In client–server applications, load balancers forward client requests to a set of server computers to manage the workload of each server. Network address translation may be used to map a representative IP address of the server cluster to specific hosts that service the request.
89 |
90 |
91 |
92 | ### Virtual IP Addressing
93 | A virtual IP address (VIP or VIPA) is an IP address that does not correspond to a physical network interface.
94 | 
95 |
96 | 
97 |
98 | Uses for VIPs include network address translation (especially, one-to-many NAT), fault-tolerance, and mobility. It advertises virtual links connected via itself to all of its actual network interfaces.
99 |
100 |
101 |
102 | ### iptables
103 | iptables allows a system administrator to configure the IP packet filter rules of the Linux kernel firewall. The filters are organized in different tables, which contain chains of rules for how to treat network traffic packets. Different kernel modules and programs are used for different protocols: iptables applies to IPv4, ip6tables to IPv6, arptables to ARP, and ebtables to Ethernet frames. On most Linux systems, iptables is installed as /usr/sbin/iptables and documented in its man pages.
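
The idea of a chain of rules can be sketched as a first-match lookup with a default policy. This is a simplified model for illustration only, not how netfilter is implemented; `Rule`, `run_chain`, and the packet dictionary are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    protocol: Optional[str]   # None matches any protocol
    dport: Optional[int]      # None matches any destination port
    target: str               # verdict, e.g. "ACCEPT" or "DROP"

    def matches(self, packet):
        return (self.protocol in (None, packet["protocol"])
                and self.dport in (None, packet["dport"]))


def run_chain(rules, packet, policy="DROP"):
    """Return the target of the first matching rule, else the chain policy."""
    for rule in rules:
        if rule.matches(packet):
            return rule.target
    return policy


# Hypothetical INPUT chain: allow SSH, drop all other TCP traffic.
input_chain = [Rule("tcp", 22, "ACCEPT"), Rule("tcp", None, "DROP")]

print(run_chain(input_chain, {"protocol": "tcp", "dport": 22}))
print(run_chain(input_chain, {"protocol": "tcp", "dport": 80}))
print(run_chain(input_chain, {"protocol": "udp", "dport": 53}, policy="ACCEPT"))
```

Rule order matters in this model: swapping the two rules would drop SSH traffic before the ACCEPT rule is ever reached.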
104 |
105 | 
106 |
107 | 
108 |
109 | 
110 |
111 |
112 |
113 | #### [iptables - I: Advanced](https://youtu.be/NAdJojxENEU)
114 | #### [iptables - II: Advanced](https://youtu.be/-CraNvj48J0)
115 |
116 |
117 | ### TCP & UDP
118 | TCP, which stands for Transmission Control Protocol, and UDP, or User Datagram Protocol, are part of the internet protocol suite, layer 4 (Transport). TCP and UDP are different methods to send information across the internet.
119 |
120 | #### TCP Pros
121 | - acknowledgement
122 | - guaranteed delivery
123 | - connection based
124 | - congestion control
125 | - ordered packets
126 |
127 | #### TCP Cons
128 | - larger packets
129 | - more bandwidth
130 | - slower than UDP
131 | - stateful
132 | - consumes server memory per connection (DoS risk)
133 |
134 | #### UDP Pros
135 | - smaller packets
136 | - less bandwidth
137 | - faster than TCP
138 | - stateless
139 |
140 | #### UDP Cons
141 | - no acknowledgement
142 | - no guaranteed delivery
143 | - connectionless
144 | - no congestion control
145 | - no ordered packets
146 | - no security
147 |
148 | #### [TCP vs UDP](https://youtu.be/qqRYkcta6IE)
149 |
150 |
151 | ### TCP 3-way handshake
152 | #### Connection Establishment Process
153 | 
154 |
155 | #### Closing Connection Process
156 | 
157 |
158 |
159 | ### When to use UDP vs TCP?
160 | When data integrity is your top priority, TCP is usually the better choice: the protocol guarantees complete, in-order delivery and accurate reconstruction of the original data. When streaming video, however, continuity matters more than accuracy. This is why real-time applications like audio and video streaming often use UDP.
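
The difference shows up directly in the socket API. The sketch below runs both protocols over the loopback interface in a single process; the helper names are hypothetical, and loopback hides the packet loss and reordering that a real network would expose.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram: no connection setup, no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, receiver.getsockname())   # fire and forget
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send a byte stream: connect first, then read until the peer closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        client.connect(listener.getsockname())  # the handshake happens here
        conn, _addr = listener.accept()
        conn.settimeout(2.0)
        with conn:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)     # signal end of stream
            chunks = []
            while True:
                chunk = conn.recv(2048)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    finally:
        client.close()
        listener.close()


print(udp_roundtrip(b"ping"))
print(len(tcp_roundtrip(b"hello" * 1000)))
```

Note that the UDP path has no `connect`, `listen`, or `accept` step at all, while the TCP receiver reads a stream of bytes rather than discrete messages.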
161 |
162 |
163 | ### Stateless vs Stateful applications
164 | In a traditional stateful web application, the server does all of the work associated with recreating a web page. A stateless application doesn't save any client session (state) data on the server where the application lives. Building on HTTP with a Representational State Transfer (REST) API is a common way to produce stateless apps.
165 |
166 | When an application is stateful, it relies on saved client session data to process new transactions. A stateful app still uses a database for back-end storage, but it also keeps "state data" (related to client authentication and past requests) on the server where the application runs. This lets the app process each transaction in the context of preceding ones, and it makes stateful apps fast, because each client request needs to carry and process less data.
167 |
168 | Stateful apps bind clients and users to the same server so that it can process subsequent requests in the context of previous ones. Stateful apps work best under predictable workloads that the system can manage. If traffic grows, you can't simply replicate a stateful app and redirect client requests to the new instance, because the session data lives on the original server and users would have to start from scratch. This makes stateful apps more prone to unavailability when client traffic increases.
169 |
170 | A stateless application or stateless process doesn't store any data related to past transactions on its server. It accepts each transaction or user interaction like a blank slate without knowledge of previous interactions. Similar to a Coke machine, the stateless app receives a single short-term request and delivers a single response. Any state that must survive between requests is kept by the client or in a database, which is why many of the apps (clients) on your phone or computer have a cache. A stateless system acts like a nameless agent that clients use to interact with the databases it connects to.
171 |
172 | Since the application is stateless, a load balancer can spread client requests across as many replicated instances as needed. This allows the system to scale out as traffic grows. Each instance simply waits for a client to submit an API request.
173 |
174 | The decision to use stateful versus stateless apps boils down to your scalability requirements and what you need the app to do. If your app needs to store session data to process transactions in-context and if the server can handle the expected processing load, a stateful system is probably best. On the other hand, if you are building an app that processes REST API transactions and answers client requests, and traffic levels can grow quickly, a stateless app is what you'll be working with.
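
The trade-off can be made concrete with a shopping cart. This is a sketch under assumptions: `StatefulCart`, `stateless_add_item`, the token format, and `SECRET` are all hypothetical, and a production system would more likely use an established token format or a shared session store.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"   # hypothetical signing key shared by all instances


class StatefulCart:
    """Keeps every session's cart in this server's memory."""

    def __init__(self):
        self.sessions = {}

    def add_item(self, session_id, item):
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


def _sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def stateless_add_item(token, item):
    """The client sends its cart back as a signed token on every request."""
    cart = []
    if token:
        body_b64, signature = token.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        if not hmac.compare_digest(signature, _sign(body)):
            raise ValueError("tampered token")
        cart = json.loads(body)
    cart.append(item)
    body = json.dumps(cart).encode()
    new_token = base64.urlsafe_b64encode(body).decode() + "." + _sign(body)
    return cart, new_token


# Stateful: a second instance knows nothing about the first one's sessions.
server_a, server_b = StatefulCart(), StatefulCart()
server_a.add_item("s1", "book")
print(server_b.add_item("s1", "pen"))

# Stateless: any instance can continue from the token the client presents.
_cart, token = stateless_add_item(None, "book")
cart, token = stateless_add_item(token, "pen")
print(cart)
```

The stateless version pays for its scalability by sending and verifying more data on every request, which matches the speed trade-off described above.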
175 |
176 |
177 | ### HTTP
178 | HTTP is an application layer protocol that functions as a request–response protocol in the client–server model.
179 | An example HTTP request:
180 | 
181 |
182 | An example response:
183 | 
184 |
185 | #### HTTP 1.0 over TCP
186 | - new TCP connection for each request
187 | - slow
188 | - buffering
189 |
190 | #### HTTP 1.1 over TCP
191 | 
192 |
193 | - persistent TCP connection
194 | - low latency
195 | - streaming with chunked transfer
196 | - pipelining (disabled by default)
197 |
198 | 
199 |
200 | #### HTTP 2 over TCP
201 | - compression
202 | - multiplexing
203 | - server push
204 | - SPDY (Google's precursor to HTTP/2)
205 | - secure by default
206 | - protocol negotiation during TLS (NPN/ALPN)
207 |
208 | 
209 |
210 |
211 | ### Journey of an HTTP request
212 | 1. An HTTP request (in the format shown earlier) is created
213 | 2. Transport Layer
214 | - TCP handshake
215 | - If a message is too large to be sent in one go, it is broken down into several segments that each have their own sequence number (situated in a header field, as well). This way, when they arrive at their destination, they will be processed in the correct order.
216 |     - The HTTP request is encapsulated in a TCP segment, which has a header and a data payload. The Transport Layer is responsible for delivering the segment from the requesting application to the correct process on the server. It does this using ports: integers in the range 0–65535 that act as identifiers for a specific application or process on a host.
217 | 
218 |
219 | 3. Internet Layer
220 |     - Onwards, downwards, to the Internet Layer. This is where TCP segments are turned into IP packets by the Internet Protocol (IP). The Internet Protocol is what enables a packet to travel from your home to a server on the other side of the world. An IP address might look something like this: 207.126.144.100 for IPv4, or this: 2a02:687:1211:7500:c8cb:bcf9:7acb:a966 for IPv6.
221 | 
222 |
223 | 4. Data Link Layer
224 |     - Ethernet wraps the IP packet, containing the TCP segment, into a frame. The frame takes its final form: it becomes a stream of bits travelling as electrical signals (or light signals or radio waves) towards the destination MAC address, which belongs to the router that will act as a gateway to the outside world. It's much like how real packages travel across the world: each courier processes the package and dispatches it to the next courier until it reaches its final destination.
225 | 
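
The layering in the steps above can be sketched with nested data classes. Every field name, the tiny segment size, and all addresses are illustrative assumptions; real headers carry many more fields and are packed as bytes.

```python
from dataclasses import dataclass


@dataclass
class TcpSegment:
    src_port: int
    dst_port: int
    seq: int          # byte offset of this payload within the stream
    payload: bytes


@dataclass
class IpPacket:
    src_ip: str
    dst_ip: str
    payload: TcpSegment


@dataclass
class EthernetFrame:
    src_mac: str
    dst_mac: str      # next hop (e.g. the gateway router), not the final server
    payload: IpPacket


def segment(data, max_size, src_port, dst_port):
    """Split a message into segments, each tagged with a sequence number."""
    return [TcpSegment(src_port, dst_port, offset, data[offset:offset + max_size])
            for offset in range(0, len(data), max_size)]


def reassemble(segments):
    """Order segments by sequence number and join their payloads."""
    return b"".join(s.payload for s in sorted(segments, key=lambda s: s.seq))


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frames = [
    EthernetFrame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb",
                  IpPacket("192.168.0.10", "207.126.144.100", seg))
    for seg in segment(request, 16, 50000, 80)
]

# Simulate out-of-order arrival, then unwrap frame -> packet -> segment.
arrived = [frame.payload.payload for frame in reversed(frames)]
print(reassemble(arrived) == request)
```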
226 |
227 |
228 | ### HTTP GET / through Switches, Routers, Gateways, and Proxies
229 | 
230 |
231 | 
232 |
233 | 
234 |
235 | ### cURL Verbose mode
236 | ```
237 | curl -v https://www.stackoverflow.com
238 | * Trying 151.101.1.69:443...
239 | * TCP_NODELAY set
240 | * Connected to www.stackoverflow.com (151.101.1.69) port 443 (#0)
241 | * ALPN, offering h2
242 | * ALPN, offering http/1.1
243 | * successfully set certificate verify locations:
244 | * CAfile: /etc/ssl/certs/ca-certificates.crt
245 | CApath: /etc/ssl/certs
246 | * TLSv1.3 (OUT), TLS handshake, Client hello (1):
247 | * TLSv1.3 (IN), TLS handshake, Server hello (2):
248 | * TLSv1.2 (IN), TLS handshake, Certificate (11):
249 | * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
250 | * TLSv1.2 (IN), TLS handshake, Server finished (14):
251 | * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
252 | * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
253 | * TLSv1.2 (OUT), TLS handshake, Finished (20):
254 | * TLSv1.2 (IN), TLS handshake, Finished (20):
255 | * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
256 | * ALPN, server accepted to use h2
257 | * Server certificate:
258 | * subject: CN=*.stackexchange.com
259 | * start date: Sep 4 13:06:50 2022 GMT
260 | * expire date: Dec 3 13:06:49 2022 GMT
261 | * subjectAltName: host "www.stackoverflow.com" matched cert's "*.stackoverflow.com"
262 | * issuer: C=US; O=Let's Encrypt; CN=R3
263 | * SSL certificate verify ok.
264 | * Using HTTP2, server supports multi-use
265 | * Connection state changed (HTTP/2 confirmed)
266 | * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
267 | * Using Stream ID: 1 (easy handle 0x5604558be8c0)
268 | > GET / HTTP/2
269 | > Host: www.stackoverflow.com
270 | > user-agent: curl/7.68.0
271 | > accept: */*
272 | >
273 | * Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
274 | < HTTP/2 301
275 | < cache-control: private
276 | < content-type: text/html; charset=utf-8
277 | ```
278 | ### HTTP Code 502 Bad Gateway
279 | The server was acting as a gateway or proxy and received an invalid response from the upstream server.
280 |
281 | The 502 error code is a by-product of the source or origin server being out of order. A range of connectivity issues, a server that is powered down, or spikes in traffic can all lead to this message. Communication issues between online servers or DNS issues such as incorrectly cached IP addresses play a big role in the appearance of this error.
282 |
283 | If your website is generating a 502 status code error, try disabling the firewall or the CDN if you are behind one. Website themes and browser plugins can also cause the error. If disabling the plugins does not help, try updating your website theme. Most hosting providers have customer support teams that can triage the issue with you.
284 |
285 | Error codes used by Cloudflare to give more meaningful errors:
286 |
287 | 
288 |
289 | ### HTTP CONNECT method
290 | #### HTTP Proxy
291 | - Client wants to connect to a server beyond its reach
292 | - Client doesn't know how to speak the target server's protocol
293 | - Proxy has access to the server and knows how to talk to it
294 | - Client connects to the proxy, and the proxy makes the request to the target server on behalf of the client
295 |
296 | 
297 |
298 | #### HTTPS Proxy
299 | - An HTTPS proxy terminates TLS: it serves its own certificate to the client instead of the target server's and decrypts the traffic
300 | - Proxy sees everything (BAD, but good for debugging web traffic)
301 | - We need a way to establish End to End encryption between the client and the target server
302 |
303 | 
304 |
305 | #### HTTP CONNECT
306 | - Creates a tunnel between the client and the target server
307 | - Client sends HTTP CONNECT to the proxy, naming the target server
308 | - Proxy creates a TCP connection to target server
309 | - Once successful, proxy returns success to client
310 | - Any packet the client sends goes as-is to the target server
311 | - This includes TLS handshake, establishing e2e
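
On the wire, the first two steps look roughly like the sketch below. The helper names are hypothetical and the proxy replies are made-up examples; the request line format (`CONNECT host:port HTTP/1.1`) and the rule that any 2xx status means the tunnel is open come from the HTTP specification.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Build the request a client sends to the proxy to open a tunnel."""
    target = f"{host}:{port}"
    return (f"CONNECT {target} HTTP/1.1\r\n"
            f"Host: {target}\r\n"
            "\r\n").encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's status line reports success (any 2xx code)."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    return (len(parts) >= 2
            and parts[0].startswith("HTTP/")
            and parts[1].isdigit()
            and 200 <= int(parts[1]) < 300)


print(build_connect_request("example.com", 443).decode("ascii"))
print(tunnel_established(b"HTTP/1.1 200 Connection established\r\n\r\n"))
print(tunnel_established(b"HTTP/1.1 403 Forbidden\r\n\r\n"))
```

After a successful reply, the client would start the TLS handshake on the same socket, and the proxy would relay the bytes without interpreting them.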
312 |
313 | 
314 |
315 | 
316 |
317 | #### HTTP CONNECT Chaining
318 | 
319 |
320 | **Pros:**
321 | - Connect to secure servers
322 | - Supports protocols that are normally not supported through proxies (WebSockets, WebRTC)
323 | - Proxy can't read encrypted traffic
324 | - CONNECT can be chained through multiple proxies
325 |
326 | **Cons:**
327 | - TCP only; can't proxy UDP traffic (won't work with QUIC or DNS)
328 | - Each CONNECT opens a new TCP connection, which is expensive (no multiplexing)
329 | - A bad implementation would allow tunneling to any port (e.g. port 25, where SMTP access can lead to spam email)
330 |
331 | **MASQUE: Multiplexed Application Substrate over QUIC Encryption**
332 | ***Why MASQUE?***
333 | - Allows UDP tunneling; CONNECT can't proxy UDP traffic (so it won't work with QUIC or DNS)
334 | - Allows multiplexing; with CONNECT, each tunnel opens a new TCP connection, even to the same host, which is expensive
335 | - More secure; a bad CONNECT implementation would allow tunneling to any port (e.g. port 25, where SMTP access can lead to spam email)
336 |
337 | ### HTTP/2
338 | #### HTTP 1.1
339 | - We can't use the same connection for parallel requests
340 | - Browsers open up to 6 connections per host so that more requests can be sent in parallel (to work around the issue above)
341 |
342 | #### HTTP 2
343 | - Multiplexed streams: each stream has an identifier, so requests and responses can be interleaved on one connection, and headers can be compressed too
344 |
345 | 
346 |
347 | - Server push
348 |
349 | 
350 |
351 | **Pros:**
352 | - Multiplexing over Single Connection
353 | - Compression (Header & Data)
354 | - Server Push
355 | - Secure by default
356 | - Protocol Negotiation during TLS (ALPN)
357 |
358 | **Cons:**
359 | - Server push can be abused when configured incorrectly
360 | - Can be slower in mixed mode (the backend speaks HTTP/2 but the load balancer speaks HTTP/1.1, or vice versa)
361 |
362 |
363 | ### WebSockets
364 | 
365 |
366 | 
367 |
368 | #### WebSockets use cases:
369 | - Chatting
370 | - Live Feed
371 | - Multiplayer gaming
372 | - Showing client progress/logging
373 |
374 | **Pros:**
375 | - Full-duplex (no polling)
376 | - HTTP compatible
377 | - Firewall friendly (standard)
378 |
379 | **Cons:**
380 | - Proxying is tricky
381 | - L7 load balancing is challenging (timeouts)
382 | - Stateful, difficult to horizontally scale
383 |
384 | **Do you have to use WebSockets?**
385 | - NO! Rule of thumb - do you absolutely need bidirectional communication?
386 | - Long polling
387 | - EventSource
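
The "HTTP compatible" point above comes from the opening handshake: a WebSocket connection starts as an HTTP request with an `Upgrade` header, and the server proves it understood by answering with a `Sec-WebSocket-Accept` value derived from the client's `Sec-WebSocket-Key`. A small sketch of that derivation as defined in RFC 6455 (the function name is an assumption; the GUID and the sample key are the ones given in the RFC):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```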
388 |
389 | ### HTTP/2 limitation that leads to QUIC or HTTP/3
390 | HTTP/2 over TCP suffers from an inefficiency caused by TCP, known as head-of-line blocking. Consider the following example: suppose you have 3 streams A, B and C. Denote the packets (frames) of each stream by lower case letters (a, b, c) and a sequence number. Let's have a look at what happens with HTTP/2 over TCP when the following sequence is sent:
391 |
392 | server ---> a2, c2, b2, *c1, b1, a1 ---> client
393 |
394 | Here *c1 means that this frame was lost. The receiving end (client) must wait for a retransmission of the lost *c1 frame before it can pass later frames (namely b2, c2 and a2) to the application layer, because the communication is over TCP and TCP guarantees in-order delivery.
395 |
396 | That is in contrast to HTTP/3 and QUIC, which run over UDP and order data per stream rather than per connection. The loss of *c1 holds back only stream C: b2 and a2 are delivered to the application layer without delay, and c2 waits just for the retransmission of c1.
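
The example can be simulated with two small delivery functions. This is a toy model with hypothetical names: it ignores retransmission and only shows which frames reach the application before the lost frame is recovered. In the per-stream model, c2 also waits, because it belongs to the same stream as the lost c1.

```python
def deliver_tcp(arrivals, lost):
    """One ordered byte stream: nothing behind a lost frame is delivered."""
    delivered = []
    for frame in arrivals:
        if frame in lost:
            break                      # everything after the gap must wait
        delivered.append(frame)
    return delivered


def deliver_quic(arrivals, lost):
    """Per-stream ordering: a loss blocks only later frames of that stream."""
    delivered = []
    blocked_streams = set()
    for frame in arrivals:
        stream = frame[0]              # "a", "b" or "c"
        if frame in lost:
            blocked_streams.add(stream)
        elif stream not in blocked_streams:
            delivered.append(frame)
    return delivered


# Send order from the example above (a1 is sent first), with c1 lost.
arrivals = ["a1", "b1", "c1", "b2", "c2", "a2"]
lost = {"c1"}

print(deliver_tcp(arrivals, lost))     # ['a1', 'b1']
print(deliver_quic(arrivals, lost))    # ['a1', 'b1', 'b2', 'a2']
```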
397 |
398 | ### QUIC or HTTP/3
399 |
400 |
401 | ### gRPC over HTTP/3
402 | ### Should RabbitMQ implement QUIC Protocol for their Channels Message Queue?
403 | ### Can QUIC Protocol improve Database Performance in Web Applications?
404 | ### Facebook moves their backend and frontend to QUIC
405 | ### Symmetrical and Asymmetrical Encryption
406 | ### TLS
407 | ### TLS Handshake
408 | ### Web Servers
409 | ### Proxy and Reverse Proxy Server
410 | ### Anatomy of a Proxy Server
411 | ### Layer 4 vs Layer 7 proxying
412 | ### NGINX
413 | ### NGINX Internal Architecture - Workers
414 | ### HAProxy
415 | ### Envoy Proxy
416 | ### Slack migrating millions of WebSockets from HAProxy to Envoy
417 | ### Load balancing Server-Sent Events
418 | ### Load balancing in Layer 4 vs Layer 7
419 | ### Load Balance multiple RTMP Servers to Horizontally Scale Streaming
420 | ### Can you Max-out the connections between load balancer and backend servers?
421 | ### Caching is hard
422 | ### Long Polling, how it differs from push, pull and SSE
423 | ### First port your computer hits
424 | ### ACID Transactions - Relational Database
425 | ### Primary Key and Secondary key
426 | ### B Tree and B+ Tree
427 | ### Database Indexing
428 | ### Leaking Postgres database connections
429 | ### Database Engines
430 | ### Fail-over and High-Availability
431 | ### Active-Active vs Active-Passive Cluster to achieve High Availability
432 | ### Connection Pooling in PostgreSQL
433 | ### Horizontal vs Vertical Database Partitioning
434 | ### Database Partitioning
435 | ### Database Sharding
436 | ### Can you get Eventual Consistency in Relational Database?
437 | ### Avoid Double Booking and Race Conditions
439 |
440 |
--------------------------------------------------------------------------------