├── DRF_Interview_Questions__Answers.pdf ├── LICENSE ├── README.md └── django-celery-redis-interview-questions.md /DRF_Interview_Questions__Answers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PragatiVerma18/DRF-Interview-Prep/392e1fadda1affdc09c9f9b9716ed4dcb152ebd5/DRF_Interview_Questions__Answers.pdf -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2021 Pragati Verma 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DRF Interview Questions & Answers 2 | 3 | > Click :star: if you like the project. 
Pull requests are highly appreciated. Follow me [@pragati_verma18](https://twitter.com/pragati_verma18) for technical updates. 4 | 5 | > I have added a new section on [Django + Celery + Redis + RabbitMQ Interview Questions](https://github.com/PragatiVerma18/DRF-Interview-Prep/blob/master/django-celery-redis-interview-questions.md). Check it out! 6 | 7 | --- 8 |
28 | 29 | 30 | --- 31 | 32 | ### Table of Contents 33 | | No. | Questions | 34 | | --- | --------- | 35 | | | **[Core Concepts](#core-concepts)** | 36 | |1 | [What is an API?](#what-is-an-api) | 37 | |2 | [What is a web API?](#what-is-a-web-api) | 38 | |3 | [What is a REST API?](#what-is-a-rest-api) | 39 | |4 | [What is an endpoint?](#what-is-an-endpoint) | 40 | |5 | [What are HTTP Verbs?](#what-are-http-verbs) | 41 | |6 | [What is the difference between HTTP and HTTPS?](#what-is-the-difference-between-http-and-https) | 42 | |7 | [What are status codes?](#what-are-status-codes) | 43 | |8 | [What is the difference between authentication and authorization?](#what-is-the-difference-between-authentication-and-authorization) | 44 | |9 | [What is a browsable API?](#what-is-a-browsable-api)| 45 | |10 | [What is CORS?](#what-is-cors) | 46 | |11 | [How to fix CORS error in Django?](#how-to-fix-cors-error-in-django) | 47 | |12 | [What is the difference between stateful and stateless?](#what-is-the-difference-between-stateful-and-stateless) | 48 | | | **[Django Rest Framework](#django-rest-framework)** | 49 | |1 | [What is Django Rest Framework?](#what-is-django-rest-framework) | 50 | |2 | [What are benefits of using Django Rest Framework?](#what-are-benefits-of-using-django-rest-framework) | 51 | |3 | [What are serializers?](#what-are-serializers) | 52 | |4 | [What are Permissions in DRF?](#what-are-permissions-in-drf) | 53 | |5 | [How to add login in the browsable API provided by DRF?](#how-to-add-login-in-the-browsable-api-provided-by-drf) | 54 | |6 | [What are Project-Level Permissions?](#what-are-project-level-permissions) | 55 | |7 | [How to make custom permission classes?](#how-to-make-custom-permission-classes)| 56 | |8 | [What is Basic Authentication?](#what-is-basic-authentication) | 57 | |9 | [What are the disadvantages of Basic Authentication?](#what-are-the-disadvantages-of-basic-authentication) | 58 | |10 | [What is session 
authentication?](#what-is-session-authentication) | 59 | |11 | [What are the pros and cons of session authentication?](#what-are-the-pros-and-cons-of-session-authentication) | 60 | |12 | [What is Token Authentication?](#what-is-token-authentication) | 61 | |13 | [What are pros and cons of token authentication?](#what-are-pros-and-cons-of-token-authentication) | 62 | |14 | [What is the difference between cookies vs localStorage?](#what-is-the-difference-between-cookies-vs-localstorage) | 63 | |15 | [Where should token be saved - cookie or localStorage?](#where-should-token-be-saved-cookie-or-localstorage) | 64 | |16 | [What are disadvantages of Django REST Framework's built-in TokenAuthentication?](#what-are-disadvantages-of-django-rest-framework-s-built-in-tokenauthentication) | 65 | |17 | [What are JSON Web Tokens(JWTs)?](#what-are-json-web-tokens-jwts) | 66 | |18 | [What are benefits of JWT?](#what-are-benefits-of-jwt) | 67 | |19 | [What is the difference between a session and cookie?](#what-is-the-difference-between-a-session-and-cookie) | 68 | |20 | [What is the difference between cookie and tokens?](#what-is-the-difference-between-cookie-and-tokens) | 69 | |21 | [What's an access token?](#what-s-an-access-token) | 70 | |22 | [What is meant by a bearer token?](#what-is-meant-by-a-bearer-token) | 71 | |23 | [What is the security threat to access token?](#what-is-the-security-threat-to-access-token) | 72 | |24 | [What is a refresh token?](#what-is-a-refresh-token) | 73 | |25 | [What are the best practices when using token authentication?](#what-are-the-best-practices-when-using-token-authentication) | 74 | |26 | [What is cookie-based authentication?](#what-is-cookie-based-authentication) | 75 | |27 | [What are viewsets in DRF?](#what-are-viewsets-in-drf) | 76 | |28 | [What are routers in DRF?](#what-are-routers-in-drf) | 77 | |29 | [What is the difference between APIViews and Viewsets in DRF?](#what-is-the-difference-between-apiviews-and-viewsets-in-drf) | 78 | 
|30 | [What is the difference between `GenericAPIView` and `GenericViewset`?](#what-is-the-difference-between-genericapiview-and-genericviewset) | 79 | | | [About Author](#about-author) | 80 | | | [Connect with me](#connect-with-me)| 81 | 82 | 83 | --- 84 | 85 | ## Core Concepts 86 | 87 | 88 | 89 | 1. ### What is an API? 90 | An API is a set of definitions and protocols for building and integrating application software. API stands for **Application Programming Interface**. APIs let your product or service communicate with other products and services without having to know how they’re implemented. This can simplify app development, saving time and money. The API is not the database or even the server; it’s the code that governs the access point(s) for the server. 91 | 92 | It’s sometimes referred to as a contract between an information provider and an information user—establishing the content required from the consumer (the call) and the content required by the producer (the response). For example, the API design for a weather service could specify that the user supply a zip code and that the producer reply with a 2-part answer, the first being the high temperature, and the second being the low. 93 | 94 | 95 | **[⬆ Back to Top](#table-of-contents)** 96 | 97 | 2. ### What is a web API? 98 | A web API is a collection of endpoints that expose certain parts of an underlying database. As developers we control the URLs for each endpoint, what underlying data is available, and what actions are possible via HTTP verbs. 99 | 100 | 101 | **[⬆ Back to Top](#table-of-contents)** 102 | 103 | 3. ### What is a REST API? 104 | A **REST(Representational State Transfer) API** (also known as RESTful API) is an API that conforms to the constraints of REST architectural style and allows for interaction with RESTful web services. When a client request is made via a RESTful API, it transfers a representation of the state of the resource to the requester or endpoint. 
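To make "a representation of the state of the resource" concrete, here is a minimal sketch using only the Python standard library. The `Post` resource and its fields are hypothetical and chosen purely for illustration; a real API would usually rely on a framework serializer instead.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Post:
    """Hypothetical resource held on the server."""
    id: int
    title: str


def represent(post: Post) -> str:
    """Return a JSON representation of the resource's current state."""
    return json.dumps(asdict(post), sort_keys=True)


if __name__ == "__main__":
    # The client receives this text, not the server-side object itself.
    print(represent(Post(id=1, title="Hello")))
```

The server keeps the object; only the serialized representation travels over the wire.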
105 | 106 | Every RESTful API: 107 | 108 | - is stateless, like HTTP 109 | - supports common HTTP verbs (GET, POST, PUT, DELETE, etc.) 110 | - returns data in a standard format, most commonly JSON or XML 111 | 112 | Any RESTful API must, at a minimum, follow these three principles. The standard is important because it provides a consistent way to both design and consume web APIs. 113 | 114 | **[⬆ Back to Top](#table-of-contents)** 115 | 116 | 4. ### What is an endpoint? 117 | A web API has endpoints: URLs with a list of available actions (HTTP verbs) that expose data (typically in JSON, which is the most common data format these days and the default for Django REST Framework). 118 | 119 | The type of endpoint which returns multiple data resources is known as a **collection.** 120 | 121 | **[⬆ Back to Top](#table-of-contents)** 122 | 123 | 5. ### What are HTTP Verbs? 124 | HTTP defines a set of request methods to indicate the desired action to be performed for a given resource. Although they can also be nouns, these request methods are sometimes referred to as HTTP verbs. 125 | ![HTTP Verbs](https://user-images.githubusercontent.com/42115530/104155356-08919a00-540d-11eb-94f8-316e8f591177.png) 126 | 127 | **[⬆ Back to Top](#table-of-contents)** 128 | 129 | 6. ### What is the difference between HTTP and HTTPS? 130 | HTTPS stands for Hypertext Transfer Protocol Secure (also referred to as HTTP over TLS or HTTP over SSL). Like HTTP, HTTPS uses TCP (Transmission Control Protocol) to send and receive data packets, but it does so over port 443 by default, within a connection encrypted by Transport Layer Security (TLS). Sites running over HTTPS generally have a redirect in place, so even if you type in `http://` the request is redirected and served over a secured connection. 131 | 132 | **Key Differences:** 133 | 134 | - HTTP is unsecured while HTTPS is secured. 135 | - HTTP sends data over port 80 by default while HTTPS uses port 443. 136 | - Both HTTP and HTTPS are application-layer protocols; HTTPS runs HTTP on top of a TLS-encrypted connection rather than directly over TCP. 
137 | - HTTP requires no SSL/TLS certificate; HTTPS requires one, normally signed by a certificate authority (CA). 138 | - HTTP doesn't require domain validation, whereas HTTPS requires at least domain validation, and certain certificates even require legal document validation. 139 | - HTTP does not encrypt data; with HTTPS the data is encrypted before it is sent. 140 | 141 | **[⬆ Back to Top](#table-of-contents)** 142 | 143 | 7. ### What are status codes? 144 | HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped in five classes: 145 | - **1xx: Informational** – Communicates transfer protocol-level information. 146 | - **2xx: Success** – Indicates that the client’s request was accepted successfully. 147 | - **3xx: Redirection** – Indicates that the client must take some additional action in order to complete their request. 148 | - **4xx: Client Error** – This category of error status codes points the finger at clients. 149 | - **5xx: Server Error** – The server takes responsibility for these error status codes. 150 | 151 | **[⬆ Back to Top](#table-of-contents)** 152 | 153 | 8. ### What is the difference between authentication and authorization? 154 | Authentication confirms that users are who they say they are. Authorization gives those users permission to access a resource. In secure environments, authorization must always follow authentication. Users should first prove that their identities are genuine before an organization’s administrators grant them access to the requested resources. 155 | 156 | Let's use an analogy to outline the differences. 157 | 158 | Consider a person walking up to a locked door to provide care to a pet while the family is away on vacation. That person needs: 159 | 160 | Authentication, in the form of a key. 
The lock on the door only grants access to someone with the correct key in much the same way that a system only grants access to users who have the correct credentials. 161 | Authorization, in the form of permissions. Once inside, the person has the authorization to access the kitchen and open the cupboard that holds the pet food. The person may not have permission to go into the bedroom for a quick nap. 162 | 163 | **[⬆ Back to Top](#table-of-contents)** 164 | 165 | 9. ### What is a browsable API? 166 | Django REST Framework supports generating human-friendly HTML output for each resource when the HTML format is requested. These pages allow for easy browsing of resources, as well as forms for submitting data to the resources using POST, PUT, and DELETE. It facilitates interaction with a RESTful web service through any web browser. To get this output, the client requests the HTML format by sending `text/html` in the `Accept` request header, which web browsers do by default. 167 | 168 | **[⬆ Back to Top](#table-of-contents)** 169 | 170 | 10. ### What is CORS? 171 | Cross-Origin Resource Sharing (CORS) is a protocol that enables scripts running on a browser client to interact with resources from a different origin. This is useful because, under the same-origin policy followed by XMLHttpRequest and fetch, JavaScript can by default only make calls to URLs that live on the same origin as the location where the script is running. 172 | 173 | **[⬆ Back to Top](#table-of-contents)** 174 | 175 | 11. ### How to fix CORS error in Django? 176 | CORS requires the server to include specific HTTP headers that allow the client to determine if and when cross-domain requests should be allowed. 177 | 178 | The easiest way to handle this, and the one recommended by Django REST Framework, is to use middleware that will automatically include the appropriate HTTP headers based on our settings. 
179 | 180 | We use `django-cors-headers`: 181 | 182 | - add `corsheaders` to `INSTALLED_APPS` 183 | - add `CorsMiddleware` above `CommonMiddleware` in `MIDDLEWARE` 184 | - create a `CORS_ALLOWED_ORIGINS` list (named `CORS_ORIGIN_WHITELIST` in older releases of the package) 185 | 186 | 187 | **[⬆ Back to Top](#table-of-contents)** 188 | 189 | 12. ### What is the difference between stateful and stateless? 190 | A stateless process or application can be understood in isolation. There is no stored knowledge of or reference to past transactions. Each transaction is made as if from scratch for the first time. 191 | 192 | Stateful applications and processes, however, are those that can be returned to again and again, like online banking or email. They’re performed with the context of previous transactions and the current transaction may be affected by what happened during previous transactions. For these reasons, stateful apps use the same servers each time they process a request from a user. 193 | 194 | If a stateful transaction is interrupted, the context and history have been stored so you can more or less pick up where you left off. Stateful apps track things like window location, setting preferences, and recent activity. You can think of stateful transactions as an ongoing periodic conversation with the same person. 195 | 196 | In terms of authorization, 197 | - **Stateful** = authorization info is saved on the server side; this is the traditional way 198 | - **Stateless** = authorization info is saved on the client side in the form of tokens, along with a signature to ensure integrity 199 | 200 | **[⬆ Back to Top](#table-of-contents)** 201 | 202 | --- 203 | 204 | ## Django Rest Framework 205 | 206 | 1. ### What is Django Rest Framework? 207 | Django REST Framework is a toolkit built on top of Django that helps to create web APIs, which are collections of URL endpoints that accept HTTP verbs and return JSON. It makes it easy to build model-backed APIs that have authentication policies and are browsable. 
208 | 209 | **[⬆ Back to Top](#table-of-contents)** 210 | 211 | 2. ### What are benefits of using Django Rest Framework? 212 | - Its Web-browsable API is a huge usability win for your developers. 213 | - Authentication policies include packages for OAuth1 and OAuth2. 214 | - Serialization supports both ORM and non-ORM data sources. 215 | - It’s customizable all the way down. Just use regular function-based views if you don’t need the more powerful features. 216 | - It has extensive documentation and great community support. 217 | - It’s used and trusted by internationally recognized companies including Mozilla, Red Hat, Heroku, and Eventbrite. 218 | 219 | **[⬆ Back to Top](#table-of-contents)** 220 | 221 | 3. ### What are serializers? 222 | Serializers allow complex data such as querysets and model instances to be converted to native Python datatypes that can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, allowing parsed data to be converted back into complex types, after first validating the incoming data. 223 | 224 | **[⬆ Back to Top](#table-of-contents)** 225 | 226 | 4. ### What are Permissions in DRF? 227 | Permission checks are always run at the very start of the view, before any other code is allowed to proceed. Permission checks will typically use the authentication information in the request.user and request.auth properties to determine if the incoming request should be permitted. 228 | 229 | Permissions are used to grant or deny access for different classes of users to different parts of the API. 230 | 231 | The simplest style of permission would be to allow access to any authenticated user, and deny access to any unauthenticated user. This corresponds to the `IsAuthenticated` class in REST framework. 232 | 233 | These can be applied at a **project-level**, a **view-level**, or at any **individual model level**. 234 | 235 | **[⬆ Back to Top](#table-of-contents)** 236 | 237 | 5. 
### How to add login in the browsable API provided by DRF? 238 | Within the project-level `urls.py` file, add a new URL route that includes `rest_framework.urls`. 239 | 240 | ```python 241 | # blog_project/urls.py 242 | from django.urls import include, path 243 | 244 | urlpatterns = [ 245 | ... 246 | path('api-auth/', include('rest_framework.urls')), # new 247 | ] 248 | ``` 249 | 250 | **[⬆ Back to Top](#table-of-contents)** 251 | 252 | 6. ### What are Project-Level Permissions? 253 | Django REST Framework ships with a number of built-in project-level permissions settings we can use, including: 254 | 255 | - **AllowAny** - any user, authenticated or not, has full access 256 | - **IsAuthenticated** - only authenticated, registered users have access 257 | - **IsAdminUser** - only admin users (those with `is_staff` set) have access 258 | - **IsAuthenticatedOrReadOnly** - unauthenticated users can view any page, but only authenticated users have write, edit, or delete privileges 259 | 260 | Implementing any of these four settings requires updating the `DEFAULT_PERMISSION_CLASSES` setting: 261 | 262 | ```python 263 | # blog_project/settings.py 264 | 265 | REST_FRAMEWORK = { 266 | 'DEFAULT_PERMISSION_CLASSES': [ 267 | 'rest_framework.permissions.IsAuthenticated', # new 268 | ] 269 | } 270 | ``` 271 | **[⬆ Back to Top](#table-of-contents)** 272 | 273 | 7. ### How to make custom permission classes? 274 | To make a custom permission class, create a file named `permissions.py` that imports `permissions` from `rest_framework` at the top. Then define your custom class, for example `IsAuthorOrReadOnly`, which extends `BasePermission`, and override `has_object_permission` (and/or `has_permission`) to return `True` when access should be granted. 275 | 276 | **[⬆ Back to Top](#table-of-contents)** 277 | 278 | 8. ### What is Basic Authentication? 279 | The most common form of HTTP authentication is known as “Basic” Authentication. When a client makes an HTTP request, it is forced to send an approved authentication credential before access is granted. 
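As a rough illustration of what a Basic credential looks like on the wire, the sketch below builds and decodes the `Authorization` header value with the standard library. Basic credentials are the base64 encoding of `username:password`; the helper names `basic_auth_header` and `decode_basic_auth` are hypothetical, and a real client library would normally do this for you.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Only the first colon separates the two parts; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    print(decode_basic_auth(header))
```

Because base64 is an encoding and not encryption, anyone who sees the header can recover the password, which is why Basic Authentication should only be used over HTTPS.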
280 | 281 | The complete request/response flow looks like this: 282 | 1. Client makes an HTTP request 283 | 2. Server responds with an HTTP response containing a `401 (Unauthorized)` status code and `WWW-Authenticate` HTTP header with details on how to authorize 284 | 3. Client sends credentials back via the `Authorization` HTTP header 285 | 4. Server checks credentials and responds with either `200 OK` or an error status code (`401 Unauthorized` if the credentials are rejected, `403 Forbidden` if they are valid but lack permission). 286 | Once approved, the client sends all future requests with the Authorization HTTP header credentials. 287 | 288 | > **Note:** The authorization credentials sent are the **unencrypted base64 encoded** version of `<username>:<password>`. 289 | 290 | **[⬆ Back to Top](#table-of-contents)** 291 | 292 | 9. ### What are the disadvantages of Basic Authentication? 293 | **Cons of Basic Authentication:** 294 | - On every single request the server must look up and verify the username and password, which is inefficient. 295 | - User credentials are passed in clear text (base64 is an encoding, not encryption), so over an insecure connection they can be easily captured and reused. 296 | 297 | **[⬆ Back to Top](#table-of-contents)** 298 | 299 | 10. ### What is session authentication? 300 | At a high level, the client authenticates with its credentials (username/password) and then receives a _session ID_ from the server, which is stored as a _cookie_. This session ID is then passed in the header of every future HTTP request. When the session ID is passed, the server uses it to look up a session object containing all available information for a given user, including credentials. This approach is **stateful** because a record must be kept and maintained on both the server (the session object) and the client (the session ID). 301 | 302 | Let’s review the basic flow: 303 | 1. A user enters their login credentials (typically username/password) 304 | 2. The server verifies the credentials are correct and generates a session object that is then stored in the database 305 | 3. 
The server sends the client a session ID — not the session object itself—which is stored as a cookie on the browser 306 | 4. On all future requests the session ID is included as an HTTP header and, if verified by the database, the request proceeds 307 | 5. Once a user logs out of an application, the session ID is destroyed by both the client and server 308 | 6. If the user later logs in again, a new session ID is generated and stored as a cookie on the client 309 | 310 | > **Note:** The default setting in Django REST Framework is actually a combination of Basic Authentication and Session Authentication. Django’s traditional session-based authentication system is used and the session ID is passed in the HTTP header on each request via Basic Authentication. 311 | 312 | **[⬆ Back to Top](#table-of-contents)** 313 | 314 | 11. ### What are the pros and cons of session authentication? 315 | **Pros:** 316 | - User credentials are only sent once, not on every request/response cycle as in Basic Authentication. 317 | - It is also more efficient since the server does not have to verify the user’s credentials each time, it just matches the session ID to the session object which is a fast look up. 318 | 319 | **Cons:** 320 | - A session ID is only valid within the browser where log in was performed; it will not work across multiple domains. This is an obvious problem when an API needs to support multiple front-ends such as a website and a mobile app. 321 | - The session object must be kept up-to-date which can be challenging in large sites with multiple servers. 322 | - The cookie is sent out for every single request, even those that don’t require authentication, which is inefficient. 323 | 324 | > **Note:** It is generally not advised to use a session-based authentication scheme for any API that will have multiple front-ends. 325 | 326 | **[⬆ Back to Top](#table-of-contents)** 327 | 328 | 12. ### What is Token Authentication? 
329 | Tokens are pieces of data that carry just enough information to facilitate the process of determining a user's identity or authorizing a user to perform an action. All in all, tokens are artifacts that allow application systems to perform the authorization and authentication process. 330 | 331 | Token-based authentication is **stateless**: once a client sends the initial user credentials to the server, a unique token is generated and then stored by the client as either a cookie or in local storage. This token is then passed in the header of each incoming HTTP request and the server uses it to verify that a user is authenticated. The server itself does not keep a record of the user, just whether a token is valid or not. 332 | 333 | **[⬆ Back to Top](#table-of-contents)** 334 | 335 | 13. ### What are pros and cons of token authentication? 336 | **Pros:** 337 | - Since tokens are stored on the client, scaling the servers to maintain up-to-date session objects is no longer an issue. 338 | - Tokens can be shared amongst multiple front-ends: the same token can represent a user on the website and the same user on a mobile app. 339 | 340 | **Cons:** 341 | - A token contains all user information, not just an id as with a session id/session object set up. Since the token is sent on every request, managing its size can become a performance issue. 342 | 343 | **[⬆ Back to Top](#table-of-contents)** 344 | 345 | 14. ### What is the difference between cookies vs localStorage? 346 | - Cookies are used for reading server-side information. 347 | - They are smaller (4KB) in size and automatically sent with each HTTP request. 348 | 349 | - LocalStorage is designed for client-side information. 350 | - It is much larger (5120KB) and its contents are not sent by default with each HTTP request. 351 | 352 | **[⬆ Back to Top](#table-of-contents)** 353 | 354 | 15. ### Where should token be saved - cookie or localStorage? 
355 | With token-based auth, you are given the choice of where to store the JWT. Commonly, the JWT is placed in the browser's local storage and this works well for most use cases. There are some issues with storing JWTs in local storage to be aware of. Unlike cookies, local storage is sandboxed to a specific domain and its data cannot be accessed by any other domain including sub-domains. 356 | 357 | You can store the token in a cookie instead, but the maximum size of a cookie is only about 4 KB, so that may be problematic if you have many claims attached to the token. Additionally, you can store the token in session storage, which is similar to local storage but is cleared as soon as the user closes the browser tab. 358 | 359 | Tokens in localStorage, and in cookies that JavaScript can read, are vulnerable to XSS attacks. A commonly recommended practice is to store tokens in a cookie with the `HttpOnly` and `Secure` cookie flags. 360 | 361 | **[⬆ Back to Top](#table-of-contents)** 362 | 363 | 16. ### What are disadvantages of Django REST Framework's built-in TokenAuthentication? 364 | - It doesn't support setting tokens to expire 365 | - It only generates one token per user 366 | 367 | **[⬆ Back to Top](#table-of-contents)** 368 | 369 | 17. ### What are JSON Web Tokens(JWTs)? 370 | JSON Web Token (JWT) is an open standard that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. 371 | 372 | Because of its relatively small size, a JWT can be sent through a URL, through a POST parameter, or inside an HTTP header, and it is transmitted quickly. A JWT contains all the required information about an entity to avoid querying a database more than once. The recipient of a JWT also does not need to call a server to validate the token. 373 | 374 | **[⬆ Back to Top](#table-of-contents)** 375 | 376 | 18. ### What are benefits of JWT? 377 | - **More compact:** JSON is less verbose than XML, so when it is encoded, a JWT is smaller than an XML-based token such as a SAML token. 
This makes JWT a good choice to be passed in HTML and HTTP environments. 378 | 379 | - **More secure:** JWTs can use a public/private key pair for signing. A JWT can also be symmetrically signed by a shared secret using the HMAC algorithm. 380 | 381 | - **More common:** JSON parsers are common in most programming languages because they map directly to objects. 382 | 383 | - **Easier to process:** JWT is used at internet scale. This means that it is easier to process on users' devices, especially mobile. 384 | 385 | **[⬆ Back to Top](#table-of-contents)** 386 | 387 | 19. ### What is the difference between a session and cookie? 388 | A cookie is a bit of data stored by the browser and sent to the server with every request. Cookies are used to identify sessions. A session is a collection of data stored on the server and associated with a given user (usually via a cookie containing an id code). 389 | 390 | The main difference between a session and a cookie is that session data is stored on the server, whereas cookies store data in the visitor’s browser. Sessions are more secure than cookies because the data is stored on the server. Cookies can be turned off in the browser. Data in a cookie can be stored for months or years, depending on the life span of the cookie. Session data, by contrast, is often lost when the web browser is closed or the session expires, depending on how the server is configured. 391 | 392 | **[⬆ Back to Top](#table-of-contents)** 393 | 394 | 20. ### What is the difference between cookie and tokens? 395 | Cookie-based authentication is stateful. This means that an authentication record or session must be kept both server and client-side. The server needs to keep track of active sessions in a database, while on the front-end a cookie is created that holds a session identifier, thus the name cookie-based authentication. 396 | 397 | Token-based authentication is stateless. The server does not keep a record of which users are logged in or which JWTs have been issued. 
Instead, every request to the server is accompanied by a token which the server uses to verify the authenticity of the request. The token is generally sent in an additional `Authorization` header in the form `Bearer {JWT}`, but it can also be sent in the body of a POST request or even as a query parameter. 398 | 399 | > Read more [here](https://stackoverflow.com/questions/17000835/token-authentication-vs-cookies) 400 | 401 | **[⬆ Back to Top](#table-of-contents)** 402 | 403 | 21. ### What's an access token? 404 | When a user logs in, the authorization server issues an access token, which is an artifact that client applications can use to make secure calls to an API server. When a client application needs to access protected resources on a server on behalf of a user, the access token lets the client signal to the server that it has received authorization by the user to perform certain tasks or access certain resources. 405 | 406 | **[⬆ Back to Top](#table-of-contents)** 407 | 408 | 22. ### What is meant by a bearer token? 409 | A bearer token is a token that can be used by whoever holds it. The access token thus acts as a credential artifact to access protected resources rather than an identification artifact. 410 | 411 | **[⬆ Back to Top](#table-of-contents)** 412 | 413 | 23. ### What is the security threat to access token? 414 | Malicious users could theoretically compromise a system and steal access tokens, which in turn they could use to access protected resources by presenting those tokens directly to the server. 415 | 416 | As such, it's critical to have security strategies that minimize the risk of compromising access tokens. One mitigation method is to create access tokens that have a short lifespan: they are only valid for a short time defined in terms of hours or days. 417 | 418 | **[⬆ Back to Top](#table-of-contents)** 419 | 420 | 24. ### What is a refresh token? 
421 | For security purposes, access tokens may be valid for a short amount of time. Once they expire, client applications can use a refresh token to "refresh" the access token. That is, a refresh token is a credential artifact that lets a client application get new access tokens without having to ask the user to log in again. 422 | 423 | The client application can get a new access token as long as the refresh token is valid and unexpired. Consequently, a refresh token that has a very long lifespan could theoretically give infinite power to the token bearer to get a new access token to access protected resources anytime. The bearer of the refresh token could be a legitimate user or a malicious user. 424 | 425 | **[⬆ Back to Top](#table-of-contents)** 426 | 427 | 25. ### What are the best practices when using token authentication? 428 | Some basic considerations to keep in mind when using tokens: 429 | 430 | - Keep it secret. Keep it safe. 431 | - Do not add sensitive data to the payload. 432 | - Give tokens an expiration. 433 | - Embrace HTTPS. 434 | - Consider all of your authorization use cases. 435 | - Store and reuse. 436 | 437 | **[⬆ Back to Top](#table-of-contents)** 438 | 439 | 26. ### What is cookie-based authentication? 440 | Every request to the server is authenticated by an authorization cookie. 441 | **Pros:** 442 | - Cookies can be marked as `HttpOnly`, which prevents client-side JavaScript from reading them. This is better for XSS-attack protection. 443 | - Comes out of the box - you don't have to implement any code on the client side. 444 | 445 | **Cons:** 446 | - Bound to a single domain. 447 | - Vulnerable to XSRF. You have to implement extra measures to protect your site against cross-site request forgery. 448 | - Cookies are sent with every single request, even requests that don't require authentication. 449 | 450 | **[⬆ Back to Top](#table-of-contents)** 451 | 452 | 27. ### What are viewsets in DRF? 
453 | A viewset is a way to combine the logic for multiple related views into a single class. In other words, one viewset can replace multiple views. It is simply a type of class-based view that does not provide any method handlers such as `.get()` or `.post()`, and instead provides actions such as `.list()` and `.create()`. 454 | 455 | **[⬆ Back to Top](#table-of-contents)** 456 | 457 | 28. ### What are routers in DRF? 458 | Routers work directly with viewsets to automatically generate URL patterns for us. Django REST Framework has two default routers: SimpleRouter and DefaultRouter. 459 | - **SimpleRouter** - This router includes routes for the standard set of list, create, retrieve, update, partial_update and destroy actions. 460 | - **DefaultRouter** - This router is similar to SimpleRouter, but additionally includes a default API root view that returns a response containing hyperlinks to all the list views. It also generates routes for optional `.json` style format suffixes. 461 | 462 | **[⬆ Back to Top](#table-of-contents)** 463 | 464 | 29. ### What is the difference between APIViews and Viewsets in DRF? 465 | DRF has two main systems for handling views: 466 | 467 | - **`APIView`:** This provides method handlers for HTTP verbs: get, post, put, patch, and delete. 468 | - **`ViewSet`:** This is an abstraction over `APIView`, which provides actions as methods: 469 | - **list:** read only, returns multiple resources (HTTP verb: get). Returns a list of dicts. 470 | - **retrieve:** read only, single resource (HTTP verb: get, but will expect an id in the URL). Returns a single dict. 471 | - **create:** creates a new resource (HTTP verb: post) 472 | - **update/partial_update:** edits a resource (HTTP verbs: put/patch) 473 | - **destroy:** removes a resource (HTTP verb: delete) 474 | 475 | **[⬆ Back to Top](#table-of-contents)** 476 | 477 | 30. ### What is the difference between `GenericAPIView` and `GenericViewset`?
478 | - **`GenericAPIView`:** for APIView, this gives you shortcuts that map closely to your database models. Adds commonly required behavior for standard list and detail views. Gives you some attributes like, the `serializer_class`, also gives `pagination_class`, `filter_backend`, etc 479 | 480 | - **`GenericViewSet`:** There are many GenericViewSet, the most common being `ModelViewSet`. They inherit from `GenericAPIView` and have a full implementation of all of the actions: list, retrieve, destroy, updated, etc. 481 | 482 | **[⬆ Back to Top](#table-of-contents)** 483 | 484 | > I have added a new section on [Django + Celery + Redis + RabbitMQ Interview Questions](https://github.com/PragatiVerma18/DRF-Interview-Prep/blob/master/django-celery-redis-interview-questions.md). Check it out! 485 | 486 | --- 487 | ### About Author 488 | 489 | | | 490 | | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | 491 | | **[Pragati Verma](https://www.linkedin.com/in/PragatiVerma18/)** | 492 | 493 | > **_Need help?_** 494 | > **_Feel free to contact me @ [itispragativerma@gmail.com](mailto:itispragativerma@gmail.com?Subject=SnippetShareProject)_** 495 | 496 | [![GitHub followers](https://img.shields.io/github/followers/pragativerma18.svg?label=Follow%20@pragativerma18&style=social)](https://github.com/PragatiVerma18/) [![Twitter Follow](https://img.shields.io/twitter/follow/pragati_verma18?style=social)](https://twitter.com/pragati_verma18) 497 | 498 | ### Connect with me 499 |
500 | github, twitter, codepen, devto, stackoverflow, linkedin, facebook, instagram, medium 527 |
528 | 529 | --- 530 | -------------------------------------------------------------------------------- /django-celery-redis-interview-questions.md: -------------------------------------------------------------------------------- 1 | # Django + Celery + Redis + RabbitMQ Interview Questions 2 | 3 | Here are some potential interview questions covering Django, Celery, Redis, and RabbitMQ, along with explanations and edge cases: 4 | 5 | ## Django 6 | 7 | 1. **How does Django handle asynchronous tasks?** 8 | 9 | Django itself is primarily synchronous, but it has supported asynchronous views and middleware since Django 3.1 and asynchronous ORM query methods since Django 4.1. However, Django does not provide built-in background task execution. Instead, Celery is commonly used to handle async tasks like sending emails, processing images, or performing background computations. 10 | 11 | - https://dev.to/pragativerma18/unlocking-performance-a-guide-to-async-support-in-django-2jdj 12 | 13 | 2. **What are Django signals, and how do they compare to Celery tasks?** 14 | 15 | Django signals allow decoupled components of a Django application to communicate when certain events occur. For example, the `post_save` signal can be used to trigger an action when a model instance is saved. 16 | 17 | **Comparison with Celery:** 18 | 19 | - **Signals** are dispatched synchronously: receivers run in the same thread as the code that sends the signal, typically inside the request-response cycle. 20 | - **Celery tasks** are asynchronous and executed in the background, preventing delays in user-facing operations. 21 | 22 | **Use cases:** 23 | 24 | - Signals: Logging, cache invalidation, simple notifications. 25 | - Celery: Sending emails, generating reports, and time-consuming operations. 26 | 27 | 3. **How would you scale a Django application handling high traffic?** 28 | 29 | To scale a Django app for high traffic: 30 | 31 | - **Optimize Database Queries**: Use indexing, avoid N+1 queries, leverage caching. 32 | - **Use Load Balancing**: Deploy multiple application servers behind a load balancer.
33 | - **Enable Caching**: Use Redis or Memcached for frequently accessed data. 34 | - **Use a CDN**: Offload static files and media files to a CDN. 35 | - **Use Asynchronous Task Processing**: Offload heavy tasks to Celery workers. 36 | - **Deploy with WSGI or ASGI**: Use Gunicorn for WSGI, or Daphne/Uvicorn for ASGI (async support). 37 | 38 | 4. **How does Django's ORM interact with databases?** 39 | 40 | Django ORM (Object-Relational Mapper) allows interaction with databases using Python objects instead of raw SQL. It converts Python models into SQL queries and provides an abstraction layer. 41 | 42 | - **QuerySet API**: `Model.objects.filter(name="John")` translates to roughly `SELECT * FROM model WHERE name='John'`. 43 | - **Transactions**: Runs in autocommit mode by default; `transaction.atomic` groups several queries into a single transaction. 44 | - **Connections**: Django opens one connection per thread and can keep it open across requests via `CONN_MAX_AGE`. This is connection persistence rather than a shared connection pool. 45 | - **Lazy Execution**: QuerySets are evaluated only when necessary, improving efficiency. 46 | 47 | 5. **What happens when a database transaction fails in Django?** 48 | 49 | If a transaction fails: 50 | 51 | - If **atomic blocks** (`@transaction.atomic`) are used, Django **rolls back** all changes within the block. 52 | - Without atomic transactions, partial changes may persist, leading to data inconsistency. 53 | 54 | Example: 55 | 56 | ```python 57 | from django.db import transaction 58 | 59 | try: 60 | with transaction.atomic(): 61 | user = User.objects.create(username="test_user") 62 | Profile.objects.create(user=user) # If this fails, user creation is also rolled back 63 | except Exception as e: 64 | print("Transaction failed:", e) 65 | ``` 66 | 67 | 6. **How does Django handle caching, and how can Redis improve performance?** 68 | 69 | Django supports multiple caching backends: 70 | 71 | - **Database caching**: Stores cache data in a database table. 72 | - **File system caching**: Stores cache files on disk. 73 | - **Memory-based caching**: Uses Memcached or Redis for high-performance caching.
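
Whichever backend is chosen, application code usually follows the same cache-aside pattern: look in the cache first, fall back to the slow source on a miss, then store the result with a timeout. Below is a minimal sketch of that pattern in plain Python. `TTLCache` and `cached_lookup` are hypothetical names used only for illustration, and the dict-based store is a stand-in for a real backend, not Django's cache API.

```python
import time


class TTLCache:
    """In-process stand-in for a cache backend (hypothetical, not Django's API)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}      # key -> (value, expiry timestamp)
        self._clock = clock   # injectable clock so expiry can be tested

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def cached_lookup(cache, key, loader, timeout=60):
    """Cache-aside: return the cached value, or load, store, and return it."""
    value = cache.get(key)
    if value is None:
        value = loader()                  # slow path, e.g. a database query
        cache.set(key, value, timeout)
    return value
```

In a Django project the same shape is normally written against `django.core.cache.cache`, whose `get`, `set`, and `get_or_set` methods play the roles sketched here.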
74 | 75 | **Redis Benefits:** 76 | 77 | - **In-memory storage**: Faster than disk-based caching. 78 | - **Persistence**: Supports data persistence (RDB, AOF). 79 | - **Distributed Cache**: Works across multiple servers. 80 | 81 | Example (the built-in Redis backend requires Django 4.0+): 82 | 83 | ```python 84 | CACHES = { 85 | 'default': { 86 | 'BACKEND': 'django.core.cache.backends.redis.RedisCache', 87 | 'LOCATION': 'redis://127.0.0.1:6379/1', 88 | } 89 | } 90 | ``` 91 | 92 | 7. **What happens if the Django application loses connection to the database?** 93 | 94 | If the database connection is lost: 95 | 96 | - Django raises an `OperationalError`. 97 | - Default behavior: Django does not retry the failed query. The current request fails, the unusable connection is discarded, and a new connection is attempted on a later request. 98 | - Solution: Use persistent connections via Django’s `CONN_MAX_AGE` setting, ideally with `CONN_HEALTH_CHECKS` (Django 4.1+) so stale connections are detected and re-established. `CONN_MAX_AGE` is not a connection pool; add an external pooler such as PgBouncer if pooling is needed. 99 | 100 | Example: 101 | 102 | ```python 103 | DATABASES = { 104 | 'default': { 105 | 'ENGINE': 'django.db.backends.postgresql', 106 | 'NAME': 'mydb', 107 | 'USER': 'myuser', 108 | 'PASSWORD': 'mypassword', 109 | 'HOST': 'db_host', 110 | 'PORT': '5432', 111 | 'CONN_MAX_AGE': 600, # Persistent connections 112 | } 113 | } 114 | ``` 115 | 116 | 8. **How do you secure Django applications from SQL injection, XSS, and CSRF attacks?** 117 | 118 | - **SQL Injection Prevention**: Django ORM parameterizes queries by default. 119 | ```python 120 | User.objects.get(username=username) # Safe 121 | ``` 122 | - **XSS (Cross-Site Scripting) Prevention**: Django auto-escapes templates. 123 | ```html 124 | {{ user_input }} 125 | 126 | ``` 127 | - **CSRF (Cross-Site Request Forgery) Prevention**: Django includes CSRF middleware. 128 | ```html 129 |
{% csrf_token %}
130 | ``` 131 | 132 | Other best practices: 133 | 134 | - Use `SECURE_SSL_REDIRECT = True` (force HTTPS). 135 | - Use `HttpOnly` and `Secure` attributes on cookies. 136 | 137 | 9. **What is the difference between Django’s session storage in the database vs. in Redis?** 138 | 139 | Django allows session storage in: 140 | 141 | - **Database (`django.contrib.sessions.backends.db`)**: Stores session data in a table, leading to higher latency. 142 | - **Cache (Redis/Memcached)**: Stores session data in-memory, making it faster. 143 | 144 | **Why use Redis?** 145 | 146 | - **Faster retrieval** due to in-memory storage. 147 | - **Automatic expiration** of stale sessions. 148 | - **Scalability** for distributed applications. 149 | 150 | 10. **How do you manage background tasks in Django without Celery?** 151 | 152 | While Celery is the preferred choice for background tasks, alternatives include: 153 | 154 | 1. **Django-cron**: 155 | - Runs periodic tasks based on system cron jobs. 156 | - Example: 157 | ```python 158 | from django_cron import CronJobBase, Schedule 159 | class MyCronJob(CronJobBase): 160 | RUN_EVERY_MINS = 60 # every hour 161 | schedule = Schedule(run_every_mins=RUN_EVERY_MINS) 162 | code = 'my_app.my_cron_job' 163 | def do(self): 164 | print("Running background task") 165 | ``` 166 | 2. **Threading in Django Views** (not recommended for heavy tasks): 167 | 168 | ```python 169 | import threading 170 | def process_data(): 171 | # Expensive operation 172 | pass 173 | 174 | t = threading.Thread(target=process_data) 175 | t.start() 176 | ``` 177 | 178 | 3. **Using Django’s `runserver` with Management Commands**: 179 | - Custom command: 180 | ```python 181 | from django.core.management.base import BaseCommand 182 | class Command(BaseCommand): 183 | def handle(self, *args, **kwargs): 184 | print("Executing background task") 185 | ``` 186 | - Run as: 187 | ```bash 188 | python manage.py my_custom_command 189 | ``` 190 | 4. 
**Database-backed queue**: 191 | - Use Django’s ORM to store tasks and process them with a simple worker script. 192 | 193 | While these alternatives work, Celery provides better reliability, scheduling, and retry mechanisms. 194 | 195 | ## Celery 196 | 197 | 1. **What is Celery, and why is it used?** 198 | 199 | Celery is an asynchronous task queue based on distributed message passing. It is used to offload long-running or background tasks from the main application, improving responsiveness and performance. 200 | 201 | **Common Use Cases:** 202 | 203 | - Sending emails asynchronously. 204 | - Processing large datasets. 205 | - Scheduling periodic tasks (e.g., reports, data backups). 206 | - Handling web scraping jobs. 207 | 208 | 2. **How does Celery execute tasks asynchronously?** 209 | 210 | Celery follows a producer-consumer architecture where: 211 | 212 | 1. A **producer (Django app)** sends a task to a **message broker** (Redis/RabbitMQ). 213 | 2. A **Celery worker** picks up the task from the broker and processes it. 214 | 3. The result is optionally stored in a **results backend** (Redis, database, etc.). 215 | 216 | ![image.png](Django%20+%20Celery%20+%20Redis%20+%20RabbitMQ%20Interview%20Quest%201a0ac7d04d878032b379c840b387c722/image.png) 217 | 218 | 3. **What are the different Celery message brokers, and how do they compare?** 219 | 220 | Celery supports multiple message brokers: 221 | 222 | - **Redis** (fast in-memory store, supports pub/sub, but may lose tasks if not persistent). 223 | - **RabbitMQ** (persistent queues, better message durability). 224 | - **Amazon SQS** (fully managed, highly available, but higher latency). 
225 | 226 | **Comparison:** 227 | 228 | | Feature | Redis | RabbitMQ | Amazon SQS | 229 | | ----------- | ---------------------- | -------------------- | ----------------- | 230 | | Speed | Faster (in-memory) | Slower (disk-based) | Slower | 231 | | Persistence | Optional (RDB/AOF) | Yes (durable queues) | Yes | 232 | | Scalability | Horizontal scaling | Clustering required | Auto-scaled | 233 | | Use Case | Quick, transient tasks | Reliable messaging | Cloud-native apps | 234 | 235 | 4. **What happens if a Celery task fails?** 236 | 237 | By default, failed tasks are logged, but they can be retried using `retry`. 238 | 239 | Example: 240 | 241 | ```python 242 | from celery import shared_task 243 | from celery.exceptions import MaxRetriesExceededError 244 | import requests 245 | 246 | @shared_task(bind=True, max_retries=3) 247 | def fetch_url(self, url): 248 | try: 249 | response = requests.get(url) 250 | return response.text 251 | except requests.RequestException as exc: 252 | raise self.retry(exc=exc, countdown=5) # Retries after 5 seconds 253 | ``` 254 | 255 | - The task retries up to **3 times** before failing permanently. 256 | - `countdown=5` adds a delay before retries. 257 | 258 | 5. **What happens if the Celery worker crashes?** 259 | 260 | - Any task in progress **may be lost** if it was running in memory. 261 | - If using **RabbitMQ** (durable queues), unprocessed tasks remain in the queue. 262 | - If using **Redis**, tasks might be lost unless `acks_late=True` is used. 263 | - Celery can recover tasks with **task acknowledgment** enabled. 264 | 265 | Solution: Enable **task persistence** using: 266 | 267 | ```python 268 | task_acks_late = True # Ensures tasks are acknowledged after execution 269 | worker_prefetch_multiplier = 1 # Ensures fair task distribution 270 | ``` 271 | 272 | ### Is the Task Lost If a Worker Crashes? 
273 | 274 | | **When Worker Crashes?** | **Is Task Lost?** | **Solution** | 275 | | --------------------------------------- | ----------------- | ---------------------------------------------- | 276 | | **Before fetching task** | ❌ No | Task is still in queue | 277 | | **After fetching but before execution** | ✅ Yes (default) | `acks_late=True` | 278 | | **During execution** | ✅ Yes (default) | `acks_late=True`, `autoretry_for=(Exception,)` | 279 | | **After execution (completed task)** | ❌ No | Task is done | 280 | 281 | 6. **How does Celery handle scheduled and periodic tasks?** 282 | 283 | Celery can schedule tasks using **Celery Beat**, a periodic task scheduler. 284 | 285 | Example: 286 | 287 | ```python 288 | from celery.schedules import crontab 289 | from celery import Celery 290 | 291 | app = Celery('tasks') 292 | 293 | app.conf.beat_schedule = { 294 | 'every-day-task': { 295 | 'task': 'tasks.daily_report', 296 | 'schedule': crontab(hour=0, minute=0), 297 | }, 298 | } 299 | ``` 300 | 301 | This runs `daily_report` every day at midnight. 302 | 303 | 7. **What happens if Redis (or RabbitMQ) crashes while a task is in progress?** 304 | 305 | - **Redis as broker**: Tasks might be lost unless they are persistent. 306 | - **RabbitMQ as broker**: Messages remain in the queue due to durability settings. 307 | - **Result backend impact**: If Redis is the result backend, results might be lost. 308 | 309 | Solution: 310 | 311 | - Use **persistent queues** in RabbitMQ (`x-ha-policy: all` for HA queues). 312 | - Enable **AOF persistence** in Redis to reduce data loss. 313 | - Configure **retry policies** in Celery tasks. 314 | 315 | 8. **How do you handle task dependencies in Celery?** 316 | 317 | Celery provides **chaining**, **groups**, and **callbacks** for task dependencies. 
318 | 319 | **Task Chain (Sequential Execution)**: 320 | 321 | ```python 322 | from celery import chain 323 | 324 | chain(task1.s(), task2.s(), task3.s())() 325 | ``` 326 | 327 | **Task Group (Parallel Execution)**: 328 | 329 | ```python 330 | from celery import group 331 | 332 | group(task1.s(), task2.s(), task3.s())() 333 | ``` 334 | 335 | **Chord (Parallel + Callback)**: 336 | 337 | ```python 338 | from celery import chord 339 | 340 | chord([task1.s(), task2.s()])(callback_task.s()) 341 | ``` 342 | 343 | 9. **What are Celery's different states (PENDING, STARTED, SUCCESS, FAILURE, etc.)?** 344 | 345 | Celery tasks have different states: 346 | 347 | - **PENDING**: Task is waiting for execution, or its state is unknown. 348 | - **STARTED**: Task execution has begun (requires `task_track_started=True`). 349 | - **RETRY**: Task has failed but is retrying. 350 | - **SUCCESS**: Task executed successfully. 351 | - **FAILURE**: Task execution failed. 352 | 353 | 10. **What are the differences between Celery Workers, Task Queues, Message Brokers, and Result Backends?** 354 | 355 | | **Component** | **Definition** | **Purpose** | **Examples** | 356 | | ------------------- | --------------------------------------- | ------------------------------------------------------------- | ------------------------------------------ | 357 | | **Celery Workers** | Processes that execute Celery tasks. | Consume tasks from queues and execute them asynchronously. | `celery -A myapp worker` | 358 | | **Task Queues** | Queues that hold pending tasks. | Stores tasks until they are picked up by a worker. | `default`, `high-priority`, `low-priority` | 359 | | **Message Brokers** | Middleware that routes tasks to queues. | Transfers tasks from the producer (Django app) to the worker. | Redis, RabbitMQ, Amazon SQS | 360 | | **Result Backends** | Stores task execution results. | Allows retrieval of task status and results.
| Redis, PostgreSQL, MongoDB, Memcached | 361 | 362 | *** 363 | 364 | ### **Detailed Breakdown** 365 | 366 | ### **1. Celery Workers** 367 | 368 | - Workers are background processes that execute tasks asynchronously. 369 | - They listen for tasks in a queue and process them when available. 370 | - Multiple workers can run on different machines to scale task execution. 371 | 372 | **Example: Starting a worker** 373 | 374 | ```bash 375 | celery -A myapp worker --loglevel=info 376 | ``` 377 | 378 | - `-A myapp`: Specifies the Celery app. 379 | - `worker`: Starts a worker process. 380 | - `--loglevel=info`: Sets log verbosity. 381 | 382 | *** 383 | 384 | ### **2. Task Queues** 385 | 386 | - Task queues hold tasks that are waiting to be processed. 387 | - Workers consume tasks from these queues in a FIFO manner. 388 | - Tasks can be routed to different queues based on priority. 389 | 390 | **Example: Defining a queue in Celery** 391 | 392 | ```python 393 | from celery import Celery 394 | 395 | app = Celery('myapp', broker='redis://localhost:6379/0') 396 | 397 | app.conf.task_routes = { 398 | 'tasks.high_priority': {'queue': 'high_priority'}, 399 | 'tasks.low_priority': {'queue': 'low_priority'}, 400 | } 401 | ``` 402 | 403 | - Tasks are routed based on the task name. 404 | 405 | **Sending a task to a specific queue** 406 | 407 | ```python 408 | add.apply_async(args=[4, 4], queue='high_priority') 409 | ``` 410 | 411 | *** 412 | 413 | ### **3. Message Brokers** 414 | 415 | - A message broker acts as an intermediary between producers (Django app) and consumers (workers). 416 | - It ensures tasks are stored in queues until workers are ready to process them. 417 | 418 | **Popular Message Brokers:** 419 | 420 | | Broker | Pros | Cons | 421 | | -------------- | -------------------------------------------------------- | --------------------------------------------- | 422 | | **Redis** | Fast, simple setup, supports Pub/Sub. | Volatile memory storage, potential data loss.
| 423 | | **RabbitMQ** | Persistent queues, message durability, advanced routing. | More complex setup. | 424 | | **Amazon SQS** | Fully managed, scalable. | Higher latency, additional cost. | 425 | 426 | *** 427 | 428 | ### **4. Result Backends** 429 | 430 | - The result backend stores task statuses and return values. 431 | - If a task returns a result, it can be retrieved later. 432 | - Some backends support expiration policies to delete old results. 433 | 434 | **Example: Storing results in Redis** 435 | 436 | ```python 437 | app.conf.result_backend = 'redis://localhost:6379/0' 438 | ``` 439 | 440 | **Checking task status** 441 | 442 | ```python 443 | result = add.delay(4, 4) 444 | print(result.status) # PENDING, SUCCESS, FAILURE 445 | print(result.get()) # Fetch result 446 | ``` 447 | 448 | **Common result backends:** 449 | 450 | | Backend | Pros | Cons | 451 | | ---------- | -------------------------------- | ---------------------------- | 452 | | Redis | Fast, in-memory storage. | Data loss if not persistent. | 453 | | PostgreSQL | Durable, SQL queries supported. | Slower than Redis. | 454 | | MongoDB | Flexible, supports complex data. | Additional setup required. | 455 | 456 | 11. **Why Do We Need Message Brokers If Queues Already Exist?** 457 | 458 | At first glance, it might seem like **task queues** should be enough to handle background tasks. However, **message brokers** provide essential functionality that simple queues alone cannot offer. Here’s why message brokers are necessary: 459 | 460 | ### **1. Message Brokers Manage Queues Efficiently** 461 | 462 | A **queue** is just a storage structure for tasks, but it does not have the intelligence to: 463 | 464 | - Route messages to the right workers. 465 | - Ensure message delivery reliability. 466 | - Handle multiple producers and consumers efficiently. 467 | 468 | A **message broker** (like Redis or RabbitMQ) **manages** queues by: 469 | 470 | - Storing tasks in memory or disk. 
471 | - Ensuring tasks are delivered to the right queue. 472 | - Handling retries, acknowledgments, and routing. 473 | 474 | Without a broker, we would need to manually implement all of these features. 475 | 476 | ### **2. Message Brokers Ensure Reliability & Durability** 477 | 478 | - If a queue were just a simple data structure (e.g., a Python list or database table), **tasks could be lost** if the system crashes. 479 | - Brokers like **RabbitMQ** provide **persistent queues**, ensuring that tasks survive restarts. 480 | - **Redis** allows for **data persistence** using **Append-Only File (AOF) mode**. 481 | 482 | 📌 **Example:** 483 | 484 | If a worker crashes while processing a task, RabbitMQ **re-delivers** the task to another worker. 485 | 486 | *** 487 | 488 | ### **3. Message Brokers Allow Asynchronous Communication** 489 | 490 | - The **producer (Django app)** doesn’t have to wait for the task to complete—it just **sends** it to the broker and moves on. 491 | - The **worker** picks up the task when it’s available, processes it, and sends the result back. 492 | - This enables **scalability** and **non-blocking execution**. 493 | 494 | 📌 **Example:** 495 | 496 | A Django app sending 1,000 emails should **not** process them synchronously—it should push them to a broker like Redis, and multiple Celery workers can process them in parallel. 497 | 498 | *** 499 | 500 | ### **4. Message Brokers Support Multiple Consumers & Load Balancing** 501 | 502 | - A single queue can be **shared among multiple workers**, distributing tasks efficiently. 503 | - Brokers like RabbitMQ **load-balance** tasks among multiple workers using **round-robin scheduling**. 504 | 505 | 📌 **Example:** 506 | 507 | If 5 workers are listening to a queue, the broker distributes tasks among them dynamically. 508 | 509 | *** 510 | 511 | ### **5. Message Brokers Support Advanced Task Routing** 512 | 513 | - Some tasks may be **high-priority**, while others can wait. 
514 | - Brokers **route tasks to different queues** based on predefined rules. 515 | 516 | 📌 **Example:** 517 | 518 | A web scraping job may go to a **low-priority queue**, while a payment processing job goes to a **high-priority queue**. 519 | 520 | ```python 521 | app.conf.task_routes = { 522 | 'tasks.process_payment': {'queue': 'high_priority'}, 523 | 'tasks.web_scrape': {'queue': 'low_priority'}, 524 | } 525 | ``` 526 | 527 | *** 528 | 529 | ### **6. Message Brokers Handle Retries & Acknowledgments** 530 | 531 | - If a worker crashes while processing a task, a broker ensures the task is **retried**. 532 | - Celery allows enabling **acknowledgments**, meaning the broker considers a task **complete only when the worker confirms it**. 533 | 534 | 📌 **Example:** 535 | 536 | If `task_acks_late=True` is enabled in Celery, a task will only be removed from the queue **after** successful execution. 537 | 538 | ```python 539 | app.conf.task_acks_late = True 540 | ``` 541 | 542 | *** 543 | 544 | ### **7. Message Brokers Scale With the System** 545 | 546 | - As the number of tasks grows, brokers handle **thousands to millions of tasks** efficiently. 547 | - They support **distributed task execution**, meaning multiple workers across different servers can process tasks from a single broker. 548 | 549 | 📌 **Example:** 550 | 551 | In a cloud-based architecture, **multiple workers on different servers** can pull tasks from a **single broker instance**. 
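
The load-balancing and acknowledgment ideas above can be sketched without any broker at all. The following is a rough in-process analogy, not Celery code: a `queue.Queue` stands in for the broker-held queue, threads stand in for workers, and `task_done()` is called only after the work finishes, loosely mirroring late acknowledgment. The function name `run_workers` and the squaring "task" are assumptions made for illustration.

```python
import queue
import threading


def run_workers(tasks, num_workers=3):
    """Process `tasks` with several worker threads sharing one queue."""
    broker = queue.Queue()   # stand-in for a queue held by the broker
    results = []
    lock = threading.Lock()

    def worker(name):
        while True:
            task = broker.get()
            if task is None:           # sentinel: no more work for this worker
                broker.task_done()
                return
            outcome = task * task      # the "work" (hypothetical task body)
            with lock:
                results.append((name, task, outcome))
            broker.task_done()         # acknowledge only after the work is done

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task in tasks:
        broker.put(task)
    for _ in threads:
        broker.put(None)               # one shutdown sentinel per worker
    broker.join()                      # wait until every item is acknowledged
    for t in threads:
        t.join()
    return results
```

Calling `run_workers(range(10))` returns one `(worker, task, outcome)` tuple per task. Which worker handles which task varies between runs, which is the point: the shared queue, not the producer, decides the distribution.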
552 | 553 | *** 554 | 555 | ### **Summary: Why Not Just Use Queues?** 556 | 557 | | Feature | Simple Queue (e.g., Python List, DB Table) | Message Broker (Redis, RabbitMQ) | 558 | | ---------------------------------- | ------------------------------------------ | ------------------------------------------------ | 559 | | **Persistence** | No (tasks lost on crash) | Yes (Redis AOF, RabbitMQ durable queues) | 560 | | **Asynchronous Execution** | No (blocking execution) | Yes (non-blocking) | 561 | | **Load Balancing** | No (one queue = one consumer) | Yes (multiple workers consume tasks dynamically) | 562 | | **Task Retries & Acknowledgments** | No (manual handling) | Yes (automatic retries, acks) | 563 | | **Task Routing & Prioritization** | No | Yes (priority queues, routing rules) | 564 | | **Scalability** | Low (single queue = single system) | High (distributed processing) | 565 | 566 | 12. **Where Do Queues Exist in Celery?** 567 | 568 | The queues in Celery are **not stored inside Celery itself**—they exist **within the message broker** (like Redis or RabbitMQ). Celery **only interacts with these queues** to send and receive tasks. 569 | 570 | ### **How Celery Queues Work?** 571 | 572 | 1. **Django (or any producer) sends a task** → The task is pushed into a queue inside the message broker. 573 | 2. **The broker holds the queue** until a worker picks up the task. 574 | 3. **Celery workers consume tasks** from these queues and execute them. 575 | 576 | ### **Where Do These Queues Physically Exist?** 577 | 578 | It depends on the message broker being used: 579 | 580 | | **Message Broker** | **Where Queues Exist?** | 581 | | ------------------ | ------------------------------------------------------------------------------------------------------------------------- | 582 | | **Redis** | Queues exist as **lists** inside Redis memory (e.g., `LPUSH` for adding tasks, `RPOP` for consuming). 
| 583 | | **RabbitMQ** | Queues exist as **durable message queues** inside RabbitMQ (AMQP-based). Messages are stored temporarily until processed. | 584 | | **Amazon SQS** | Queues exist inside Amazon’s managed Simple Queue Service (SQS), persisting messages until consumed. | 585 | | **Kafka** | Queues exist as **partitions inside Kafka topics**, allowing scalable message streaming. | 586 | 587 | ## Redis 588 | 589 | 1. **How does Redis differ from traditional databases?** 590 | 591 | | Feature | Redis | Traditional Databases (PostgreSQL, MySQL, etc.) | 592 | | -------------------- | -------------------------------------------- | ----------------------------------------------- | 593 | | **Storage** | In-memory | Disk-based | 594 | | **Speed** | Extremely fast | Slower (disk I/O involved) | 595 | | **Data Persistence** | Optional (AOF, RDB) | Persistent by default | 596 | | **Data Model** | Key-value | Relational (tables, joins) | 597 | | **Transactions** | Supports transactions, but limited | Full ACID compliance | 598 | | **Scalability** | Scales horizontally (sharding, clustering) | Can be scaled but needs optimizations | 599 | | **Use Cases** | Caching, message queues, real-time analytics | OLTP, relational data storage | 600 | 601 | Redis is **not a replacement** for traditional databases but is used for caching, real-time operations, and message brokering. 602 | 603 | 2. **What happens if Redis crashes while it is storing Celery task states?** 604 | 605 | - If Redis is being used as a **Celery result backend**, all task states **will be lost** unless persistence is enabled. 606 | - Tasks that are **queued but not yet fetched** may also be lost. 607 | - Running tasks **continue** if they have already been picked up by workers. 608 | 609 | ✅ **Solution:** 610 | 611 | - Enable Redis **AOF (Append-Only File) or RDB snapshots** for persistence. 612 | - Use an **alternative result backend** (e.g., PostgreSQL, S3, or RabbitMQ) to avoid losing task states. 
613 | 614 | 3. **How can Redis persistence be enabled, and what are the trade-offs?** 615 | 616 | Redis provides **two persistence mechanisms**: 617 | 618 | 1. **RDB (Redis Database Backup) – Periodic Snapshots** 619 | 620 | - Saves a snapshot of the dataset **at intervals** (e.g., every 5 minutes). 621 | - Less disk I/O but **risk of data loss** between snapshots. 622 | - ✅ **Good for caching, not for critical data.** 623 | - 🔴 **Trade-off:** Data loss possible if Redis crashes between snapshots. 624 | 625 | **Enable RDB:** 626 | 627 | ``` 628 | save 900 1 # Save every 900 seconds if at least 1 change 629 | save 300 10 # Save every 300 seconds if at least 10 changes 630 | ``` 631 | 632 | 2. **AOF (Append-Only File) – Continuous Logging** 633 | 634 | - Logs every write operation **to disk** for full recovery. 635 | - ✅ **Best for Celery queues and critical tasks.** 636 | - 🔴 **Trade-off:** More disk usage and I/O overhead. 637 | 638 | **Enable AOF:** 639 | 640 | ``` 641 | appendonly yes 642 | appendfsync everysec # Sync to disk every second 643 | ``` 644 | 645 | ✅ **Best Practice:** Use **both AOF and RDB** for better recovery. 646 | 647 | 4. **What happens if Redis runs out of memory?** 648 | 649 | - Redis **stops accepting writes** once it reaches the `maxmemory` limit. 650 | - If `maxmemory-policy` is set, it starts **evicting old keys** (depending on policy). 651 | - If no eviction policy is set, Redis **throws errors** for new writes. 652 | 653 | ✅ **Solutions:** 654 | 655 | - Increase `maxmemory`: 656 | ``` 657 | maxmemory 2gb 658 | ``` 659 | - Set an eviction policy (`allkeys-lru`, `volatile-lru`, etc.): 660 | ``` 661 | maxmemory-policy allkeys-lru 662 | ``` 663 | 664 | 5. **How does Redis handle concurrency?** 665 | 666 | - Redis is **single-threaded** but **uses an event loop** to handle multiple requests. 667 | - It executes commands **one at a time** in sequence (Atomic operations). 
   - **Pipelining** allows sending multiple commands at once without waiting for each reply, which reduces round trips.
   - **Transactions (`MULTI/EXEC`)** run a group of commands as one uninterrupted unit; there is no rollback if one of the commands fails.

   ✅ **Best Practices:**

   - Use **pipelining** for batch operations.
   - Use **Lua scripts** for atomic multi-step operations.
   - Scale Redis with **sharding and clustering**.

6. **What are Redis pub/sub and its use cases?**
   - **Publish/Subscribe (Pub/Sub)** allows message broadcasting.
   - **Publishers** send messages to channels, and **subscribers** receive them.
   - Delivery is fire-and-forget: messages are not stored, so a subscriber that is offline misses them.
   - **Use Cases:**
     - **Real-time notifications** (e.g., chat apps, live updates).
     - **Event-driven architecture** (decoupling microservices).
     - **Streaming data processing** (e.g., logs, monitoring).
7. **How does Redis replication work, and how do you handle failover?**
   - Redis **replicates** data from a **primary (master) node** to **one or more replicas (slaves)**.
   - Replicas sync **asynchronously**, so the most recent writes can be lost if the primary fails before they are replicated.
   - Failover is handled using **Redis Sentinel** or **Redis Cluster**.
8. **What are the differences between Redis and RabbitMQ as message brokers?**

   | Feature           | Redis                                | RabbitMQ                                   |
   | ----------------- | ------------------------------------ | ------------------------------------------ |
   | **Message Model** | Lists, Pub/Sub & Streams             | AMQP queues and exchanges                  |
   | **Persistence**   | Optional (AOF/RDB)                   | Durable queues and persistent messages     |
   | **Reliability**   | Weaker delivery guarantees (default) | Highly reliable (acknowledgments)          |
   | **Ordering**      | FIFO not guaranteed                  | FIFO & priority queues available           |
   | **Scalability**   | Horizontally scalable                | Needs clustering for scale                 |
   | **Best For**      | Fast, real-time data processing      | Reliable task queues, event-driven systems |

   ✅ **When to Use Redis?**

   - Low-latency tasks (real-time updates, analytics).
   - Lightweight pub/sub messaging.

   ✅ **When to Use RabbitMQ?**

   - Reliable, **persistent** message queues.
   - Ensuring **no task loss** in Celery.

9. **What happens if Redis gets overloaded with too many Celery tasks?**

   - High CPU and memory usage.
   - Redis may **reject new tasks** if `maxmemory` is reached.
   - Celery workers may **time out waiting for tasks**.

   ✅ **Solutions:**

   1. **Increase the Redis memory limit.** Enable an eviction policy only on cache instances; on a broker instance, eviction can silently drop queued tasks.
   2. **Use multiple Redis instances (sharding).**
   3. **Switch to RabbitMQ** for a more **scalable** message broker.
   4. **Use Celery rate limiting** to prevent task floods.

10. **How do you secure Redis from unauthorized access and attacks?**

    1. **Bind Redis to localhost (prevent external access)**

       ```
       bind 127.0.0.1
       ```

    2. **Set a strong Redis password (`requirepass`)**

       ```
       requirepass MySecurePass
       ```

    3. **Disable dangerous commands (flushall, config, shutdown)**

       ```
       rename-command FLUSHALL ""
       rename-command CONFIG ""
       rename-command SHUTDOWN ""
       ```

    4. **Enable TLS Encryption** (`tls-port` turns on the TLS listener; `port 0` disables the plaintext one)

       ```
       port 0
       tls-port 6379
       tls-cert-file /etc/ssl/redis.crt
       tls-key-file /etc/ssl/redis.key
       ```

    5. **Use a firewall (`ufw` or `iptables`)** to restrict access.

       ```
       sudo ufw allow from 192.168.1.100 to any port 6379
       ```

    ✅ **Best Practice:** Deploy Redis **behind a VPN** or **inside a private network**.

## RabbitMQ

1. **How does RabbitMQ work as a message broker for Celery?**

   RabbitMQ is a **reliable** message broker that Celery uses to queue tasks and ensure delivery. It supports **durable queues, message acknowledgments, and redelivery**, making it a better choice for **persistent and fault-tolerant** task queues.

   1. **Producer (Django/Celery Task) publishes a task** → Sent to an **Exchange**
   2. **Exchange routes the task** → Pushed into a **Queue** (based on bindings)
   3. **Consumer (Celery Worker) fetches the task** → Acknowledges completion after processing

   ✅ **Why RabbitMQ?**

   - Supports **persistent queues** (tasks survive restarts).
   - Delivers each message to **only one worker at a time**, in FIFO order per queue.
   - Built-in **acknowledgment & redelivery mechanisms** prevent task loss.

2. **What is the difference between Redis and RabbitMQ as Celery brokers?**

   | Feature         | Redis                          | RabbitMQ                               |
   | --------------- | ------------------------------ | -------------------------------------- |
   | **Data Model**  | Key-Value Store                | AMQP-based Message Queue               |
   | **Persistence** | Optional (AOF, RDB)            | Durable queues and persistent messages |
   | **Ordering**    | FIFO not guaranteed            | FIFO per queue (redelivery can reorder) |
   | **Reliability** | Less reliable (may drop tasks) | Highly reliable                        |
   | **Scalability** | Sharding, clustering           | Clustering, high availability          |
   | **Use Case**    | Fast, real-time jobs           | Guaranteed delivery of tasks           |

   **When to use Redis?**

   - **High-speed** tasks (caching, low-priority jobs).
   - If **task loss is acceptable**.

   **When to use RabbitMQ?**

   - If **task persistence is required**.
   - **Ensuring at-least-once task execution**.

3. **How does RabbitMQ handle message acknowledgments and retries?**

   - When a worker **fetches a task**, RabbitMQ marks it as **unacknowledged**.
   - The task is **removed from the queue only after acknowledgment** (`ack`).
   - If the worker's connection **drops before ack**, RabbitMQ **requeues** the task automatically.
   - RabbitMQ only **redelivers unacknowledged messages**; retrying a task that raised an exception is handled by Celery (for example `self.retry()` or `autoretry_for`).
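The acknowledgment flow in the bullets above can be illustrated with a toy model. This is an in-memory sketch, not RabbitMQ's API: the class `ToyQueue` and its methods are hypothetical names chosen for the example, and the only behavior it mimics is "deliver, hold as unacknowledged, then either ack or requeue".

```python
from collections import deque
from itertools import count


class ToyQueue:
    """In-memory stand-in for a single broker queue (illustration only)."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._unacked = {}      # delivery tag -> message held by a consumer
        self._tags = count(1)

    def publish(self, body):
        self._ready.append({"body": body, "redelivered": False})

    def get(self):
        """Deliver the next message and mark it unacknowledged."""
        if not self._ready:
            return None, None
        message = self._ready.popleft()
        tag = next(self._tags)
        self._unacked[tag] = message
        return tag, message

    def ack(self, tag):
        """Consumer confirms completion; only now is the message discarded."""
        del self._unacked[tag]

    def consumer_lost(self):
        """Connection dropped: unacknowledged messages return to the queue head."""
        for tag in sorted(self._unacked, reverse=True):
            message = self._unacked.pop(tag)
            message["redelivered"] = True
            self._ready.appendleft(message)

    def depth(self):
        return len(self._ready)


queue = ToyQueue()
queue.publish("send_email:42")

tag, message = queue.get()     # a worker fetches the task
queue.consumer_lost()          # the worker crashes before acknowledging

tag2, message2 = queue.get()   # another worker receives the same task
queue.ack(tag2)                # processed and acknowledged this time
```

The second delivery carries the `redelivered` flag, which is the hint a consumer gets that the task may already have been partially processed.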

   ✅ **Explicit Acknowledgment in Celery:**

   ```python
   @app.task(acks_late=True)  # Acknowledge only after the task finishes (default: before it runs)
   def process_task():
       ...
   ```

4. **What happens if RabbitMQ crashes while tasks are in the queue?**

   - If queues are **non-durable**, all tasks are **lost**.
   - If queues are **durable** and messages are **persistent**, tasks survive a crash.

   ✅ **Solution:**

   - **Enable durable queues**:
     ```python
     from kombu import Queue
     app.conf.task_queues = (Queue('default', durable=True),)
     ```
   - **Use persistent messages**:
     ```python
     app.conf.task_default_delivery_mode = 'persistent'  # Celery's default; 'transient' skips disk
     app.conf.result_persistent = True  # Only affects result messages (RPC backend)
     ```

5. **What is the purpose of RabbitMQ’s exchange, queue, and binding?**

   RabbitMQ routes messages using an **Exchange**, which determines how tasks reach queues.

   - **Exchange**: Routes messages based on rules (`direct`, `fanout`, `topic`, `headers`).
   - **Queue**: Stores messages until a worker consumes them.
   - **Binding**: Connects exchanges to queues based on rules.

   ✅ **Example:**

   ```python
   from kombu import Exchange, Queue

   # Direct exchange: a message reaches queues whose binding key equals its routing key
   app.conf.task_queues = (
       Queue('tasks', Exchange('default', type='direct'), routing_key='task_queue'),
   )
   ```

6. **How can RabbitMQ ensure message durability?**
   - **Declare durable queues:**
     ```python
     from kombu import Queue
     Queue('default', durable=True)
     ```
   - **Enable persistent messages:**
     ```python
     app.conf.task_default_delivery_mode = 'persistent'  # Celery's default
     app.conf.result_persistent = True  # Only affects result messages (RPC backend)
     ```
   - **Replicate queues across a cluster** (quorum queues, or classic mirrored queues in HA mode on older 3.x releases).
7. **What happens if a consumer crashes after consuming a message?**

   - If the worker **did not acknowledge** the message, RabbitMQ **requeues it**.
   - If **acknowledgment was already sent** (Celery's default is to acknowledge just before the task runs), the message is not redelivered and the task is **lost**; a result backend does not bring it back.
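Because an unacknowledged message is requeued, the same task can reach a worker twice, so task bodies are usually written to be idempotent. The sketch below shows one way to do that; the function `charge_customer`, the task ID format, and the in-memory `processed_ids` set are hypothetical, and a real deployment would need a durable store (for example a database table with a unique constraint) instead of a Python set.

```python
processed_ids = set()  # hypothetical stand-in for a durable "already handled" store
charges = []           # side effects recorded by the task


def charge_customer(task_id, amount):
    """Apply the side effect at most once per task_id, even if delivered twice."""
    if task_id in processed_ids:
        return "skipped"       # redelivery of a task that already completed
    charges.append(amount)     # the actual work
    processed_ids.add(task_id)
    return "charged"


first = charge_customer("task-1", 25)
second = charge_customer("task-1", 25)  # simulated redelivery after a worker crash
```

Only the first call records a charge; the simulated redelivery is detected and skipped.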

   ✅ **Solution:** Use `acks_late=True` so a crash before completion leads to redelivery. The task may then run more than once, so it should be idempotent.

   ```python
   @app.task(acks_late=True)
   def process_task():
       ...
   ```

8. **How do you scale RabbitMQ for high availability?**
   - **Use Clustering**: Deploy multiple RabbitMQ nodes.
   - **Enable High-Availability Queues**: Mirror queues across nodes. The policy below applies to classic mirrored queues on RabbitMQ 3.x; they were removed in RabbitMQ 4.0 in favor of quorum queues:
     ```
     rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'
     ```
   - **Load balancing**: Use **HAProxy** or **NGINX**.
9. **What happens if RabbitMQ queues get overloaded?**
   - When memory or disk alarms trigger, RabbitMQ **blocks publishing connections** (flow control), so new tasks stall until consumers catch up.
   - **Tune the memory threshold** at which publishers are blocked:
     ```
     rabbitmqctl set_vm_memory_high_watermark 0.6
     ```
   - **Tune the prefetch limit** (default multiplier is 4). A higher value improves throughput for short tasks; a value of 1 distributes long tasks more evenly:
     ```python
     app.conf.worker_prefetch_multiplier = 10
     ```
10. **How does RabbitMQ handle delayed message delivery?**

    The core broker has no built-in delayed delivery. A common workaround is a **Dead Letter Exchange (DLX)** combined with a message TTL: messages wait in a holding queue with no consumers, expire, and are then dead-lettered to the queue that workers consume from. The delayed message exchange plugin is an alternative.

    ✅ **Example: Route a Task to a `delayed` Holding Queue**

    The route alone adds no delay. For a 10s delay, the `delayed` queue must also be declared with `x-message-ttl` set to `10000` and an `x-dead-letter-exchange` pointing at the exchange that feeds the worker queue.

    ```python
    app.conf.task_routes = {'tasks.slow_task': {'queue': 'delayed'}}
    ```

---

--------------------------------------------------------------------------------