├── Assignments
    ├── CS559_Programming_Assignment_2024.pdf
    └── Readme.md
├── ML Systems Resources.md
├── Readme.md
├── Resources.md
└── Software.md

/Assignments/CS559_Programming_Assignment_2024.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gagan-iitb/ComputerSysDesign/80be4dbb326813c23a3f50456e0a1f8fbe7192b4/Assignments/CS559_Programming_Assignment_2024.pdf
--------------------------------------------------------------------------------
/Assignments/Readme.md:
--------------------------------------------------------------------------------
1 | [Jan 1, 2024] First assignment is released.
2 | 
--------------------------------------------------------------------------------
/ML Systems Resources.md:
--------------------------------------------------------------------------------
1 | MLOps Software:
2 | 
3 | Book: https://www.manning.com/books/effective-data-science-infrastructure
4 | 
5 | Code from the book chapters can be found here: https://github.com/outerbounds/dsbook
6 | 
7 | 
8 | We will discuss and compare a few tools available for MLOps: Airflow, MLflow, and Metaflow.
9 | 
10 | __Metaflow__: Allows you to define a flow with multiple steps as a DAG. Each step can define the tasks to be done, the artifacts (data) to be generated, metadata, and the next steps.
11 | Your flow is independent of the compute layer and can run on different computing infrastructures with the help of an orchestrator.
12 | You can monitor the metrics, performance, etc. of each step.
13 | You can debug and re-execute the flow from the point where it stopped, without having to re-run the time-consuming steps that already executed successfully.
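
To make the resume behaviour described above concrete, here is a minimal pure-Python sketch. It is not Metaflow's API: `MiniFlow`, `load`, `train`, and `evaluate` are hypothetical names, the flow is linear rather than a general DAG, and artifacts live in an in-memory dictionary instead of a datastore. It only illustrates the idea that steps which already succeeded are skipped when the flow is re-run after a failure.

```python
from typing import Any, Callable, Dict, List

Artifacts = Dict[str, Any]
Step = Callable[[Artifacts], None]


class MiniFlow:
    """Toy linear flow (hypothetical; not Metaflow's API)."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.artifacts: Artifacts = {}    # data produced by the steps
        self.completed: List[str] = []    # names of steps that succeeded
        self.executions: List[str] = []   # every step invocation, in order

    def run(self) -> None:
        """Run steps in order, skipping those that already succeeded."""
        for step in self.steps:
            name = step.__name__
            if name in self.completed:
                continue  # resume: keep stored artifacts, do not re-run
            self.executions.append(name)
            step(self.artifacts)          # may raise; earlier results are kept
            self.completed.append(name)


def load(artifacts: Artifacts) -> None:
    artifacts["data"] = [1, 2, 3, 4]


def train(artifacts: Artifacts) -> None:
    # Stand-in for a time-consuming step: the "model" is just the mean.
    artifacts["model"] = sum(artifacts["data"]) / len(artifacts["data"])


def evaluate(artifacts: Artifacts) -> None:
    # Simulate a failure on the first attempt only.
    artifacts["attempts"] = artifacts.get("attempts", 0) + 1
    if artifacts["attempts"] == 1:
        raise RuntimeError("simulated transient failure")
    artifacts["score"] = artifacts["model"] * 2


if __name__ == "__main__":
    flow = MiniFlow([load, train, evaluate])
    try:
        flow.run()                        # fails inside evaluate
    except RuntimeError as err:
        print("first run failed:", err)
    flow.run()                            # resumes at evaluate only
    print(flow.executions)
```

After the second `run()`, `load` and `train` have each executed once and only `evaluate` has been retried. A real orchestrator would additionally persist artifacts and metadata outside the process so that a resume can happen in a fresh run.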
14 | 
15 | 
16 | Excellent tutorials are available:
17 | * https://docs.metaflow.org/introduction/metaflow-resources
18 | * __Beginner:__ https://outerbounds.com/docs/data-science-welcome/
19 | * __NLP:__ https://outerbounds.com/docs/nlp-tutorial-overview/
20 | * __Computer Vision:__ https://outerbounds.com/docs/cv-tutorial-S2-overview/
21 | * __Recommendation:__ https://outerbounds.com/docs/recsys-tutorial-S2-overview/
22 | 
23 | 
24 | ----------------------------------------------------------------
25 | 
26 | Once you have developed an ML model to solve a business problem, deploying and operating it is much more difficult than one would imagine!
27 | The links below discuss why, and how to overcome such problems.
28 | Notably, these topics are being discussed at leading ML and systems conferences as well.
29 | The best way to stay up to date is to regularly read research papers, systems blogs, and resources such as those below.
30 | 
31 | 
32 | MLOps links:
33 | * https://neptune.ai/blog/recommender-systems-lessons-from-building-and-deployment: This blog covers some of the key lessons from building and deploying recommender systems in the real world. You can learn about online metrics.
34 | * https://eugeneyan.com/speaking/mlops-community-recsys/: Shows the offline and online architecture and how to generate recommendations at a low cost
35 | * https://hktw-resources.awscloud.com/webinar-slides/introduction-to-mlops: AWS tools meant to simplify the deployment of ML models
36 | 
37 | -------------------------------------------------------------------------
38 | 
39 | 
40 | __9 step ML system design approach__
41 | 
42 | https://github.com/alirezadir/Machine-Learning-Interviews/blob/main/src/MLSD/ml-system-design.md
43 | 
44 | 
45 | __Made with ML__: https://madewithml.com/
46 | 
47 | __Ray:__
48 | Tutorial: https://github.com/anyscale/academy/blob/main/ray-serve/e2e/tutorial.ipynb
49 | 
50 | __Stanford MLSys Seminars__: https://www.youtube.com/playlist?list=PLSrTvUm384I9PV10koj_cqit9OfbJXEkq
51 | 
52 | __Visual Search System Design__: https://bytebytego.com/courses/machine-learning-system-design-interview/visual-search-system
53 | 
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Course Page for CS559 (Computer Systems Design)
2 | 
3 | Being taught at IIT Bhilai, India in the Winter Semester of 2024.
4 | 
5 | __Course Instructor:__ [Dr. Gagan Raj Gupta](https://www.iitbhilai.ac.in/index.php?pid=gagan)
6 | 
7 | Real-life applications are complex and involve a variety of components (multiple clients, backend servers, databases, ML modules, all connected by a network). We want our applications to be intelligent, adaptive (data-driven), scalable, reliable, and performant. How do we go from the idea to the design to its implementation and successful operationalization?
8 | 
9 | This course attempts to teach the basic principles underlying system design, implementation, and evaluation of computer systems. It provides an introduction to the fundamentals of analytic modeling techniques that are used in computer system design.
Students will also learn general systems concepts that support design goals of modularity, performance, and security. Students will apply materials learned in lectures and readings to design, build, and evaluate new systems components.
10 | 
11 | __Objectives:__
12 | 
13 | After completing this class, the students will be able to design their own distributed systems to solve real-world problems. The ability to design one's own distributed system includes an ability to argue for one's design choices.
14 | 
15 | The students will be able to evaluate and critique existing systems and their own system designs. As part of that, students will learn to recognize design choices made in existing systems.
16 | 
17 | __Learning Outcomes:__
18 | 
19 | The students will be able to apply the technical material taught in the lecture to new system components. This implies an ability to recognize and describe:
20 | 
21 | • How common design patterns in computer systems, such as abstraction and modularity, are used to limit complexity.
22 | 
23 | • How operating systems use virtualization and abstraction to enforce modularity.
24 | 
25 | • How reliable, usable distributed systems can be built on top of an unreliable network.
26 | 
27 | • How to measure system performance and what can be done to improve performance and scalability.
28 | 
29 | • How to design and deploy ML systems.
30 | 
31 | __Pre-requisites__
32 | 
33 | Undergraduate courses in Computer Networks and Operating Systems. Basic courses in data science and ML (DS250, DS200, CS550).
34 | 
35 | 
36 | __Class Timings and Location__
37 | 
38 | Lecture Room: L102
39 | Lecture Timings: 8:30 a.m. to 9:30 a.m. on Mondays and 9:30 a.m. to 10:30 a.m. on Wednesdays and Fridays
40 | 
41 | __Course Materials__
42 | 
43 | * Google Drive Link with Lecture materials for IIT Bhilai Students: [GDrive](https://drive.google.com/drive/folders/1i0VtvMyIu4FIVmAxFJ81NnOdoT1atasW)
44 | 
45 | * Canvas Link for registered Students (for Assignments and Discussions): [Canvas](https://canvas.instructure.com/courses/4085885)
46 | 
47 | __Textbook/Reference books:__
48 | 
49 | 1. [MB] Mor Harchol-Balter, February 2013, Performance Modeling and Design of Computer Systems: Queueing Theory in Action, Cambridge University Press
50 | 
51 | 2. [UDS] Roberto Vitillo, Understanding Distributed Systems, https://understandingdistributed.systems/ : Simplified and easy-to-follow description of essential concepts on a wide range of topics
52 | 3. [EDSI] Ville Tuulos, Effective Data Science Infrastructure: How to Make Data Scientists Productive, Manning Publications.
53 | 
54 | 
55 | __Grading Plan__
56 | 
57 | * 2 Exams: 45%
58 | * Weekly System Design Exercise (In class): 25% [will mimic system design interviews]
59 | * 1 Assignment: 10%
60 | * 1 Project: 20% [Project will include a mock interview]
61 | 
62 | The assignment is individual. The project will be end-to-end (full stack) and done in a team.
Students are encouraged to build balanced teams [front-end developer, back-end developer, ML engineer, tester, architect roles].
63 | 
64 | __Detailed Schedule__
65 | 
66 | * __Week1__: Introduction to the course, modularity, building a Campus Service Application, Building Blocks of AWS (reading assignment)
67 | * __Week2__: Building Reliable and Secure Communications, Networks, API Design
68 | * __Week3__: Design for Maintainability: testing, tracing, logging, metrics
69 | * __Week4__: System Design Interview Preparation, High-Level Design, Back of Envelope Calculations, Detailed Design
70 | * __Week5__: System Scalability and performance analysis basics
71 | * __Feb5-9__: Rate Limiter (Case Study), Open and Closed Systems
72 | * __Week7__: Distributed Systems Fundamentals: Process Coordination, Concurrency
73 | * Consistent Hashing
74 | * __Exam Week__: No classes
75 | * __Week8__: Reliable storage and File Systems, Key Value Store (Case Study)
76 | * __Research Papers__: After exam 1, we will switch gears and study both classical and recent papers on 3 main topics: Caching, DBs, ML Systems.
77 | 
--------------------------------------------------------------------------------
/Resources.md:
--------------------------------------------------------------------------------
1 | __AWS Essentials__
2 | The following courses may be helpful to you:
3 | * __https://explore.skillbuilder.aws/learn/course/external/view/elearning/11458/aws-cloud-quest-cloud-practitioner__
4 | * __https://www.coursera.org/learn/aws-cloud-practitioner-essentials__
5 | * __https://aws.amazon.com/getting-started/hands-on__
6 | 
7 | __Rate Limiter Implementations__
8 | 
9 | 
10 | * https://blog.bytebytego.com/p/rate-limiter-for-the-real-world
11 | * https://builtin.com/software-engineering-perspectives/rate-limiter
12 | * https://medium.com/@saisandeepmopuri/system-design-rate-limiter-and-data-modelling-9304b0d18250
13 | 
14 | Using Redis:
15 | * https://github.com/redis-developer/basic-rate-limiting-demo-python?tab=readme-ov-file
16 | * https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff574d#request-rate-limiter
17 | * https://engineering.classdojo.com/blog/2015/02/06/rolling-rate-limiter/
18 | 
19 | 
20 | Java-based:
21 | https://github.com/vishalratna-microsoft/JavaSystemDesign/tree/master/src/main/java/org/example/ratelimiter
22 | 
23 | __Queueing Theory Simulator__
24 | 
25 | * MATLAB Simulink: https://in.mathworks.com/discovery/queuing-theory.html
26 | 
27 | * R package: https://ace-ebert.shinyapps.io/queue_simulator_mmk/
28 | 
29 | __Consistent Hashing__
30 | 
31 | * https://systemdesign.one/consistent-hashing-explained/
32 | * https://en.wikipedia.org/wiki/Consistent_hashing
33 | * https://tom-e-white.com/2007/11/consistent-hashing.html
34 | * https://theory.stanford.edu/~tim/s16/l/l1.pdf
35 | * Code: https://arpitbhayani.me/blogs/consistent-hashing/
36 | * Algo: http://highscalability.com/blog/2023/2/22/consistent-hashing-algorithm.html
37 | 
38 | __Reference Books__
39 | 
40 | 1. [SJK] Saltzer, Jerome H. and M. Frans Kaashoek.
2009, Principles of Computer System Design: An Introduction, Part I. Morgan Kaufmann, ISBN: 9780123749574. Part II
41 | of the textbook is available on MIT OpenCourseWare.
42 | 2. Google SRE books: https://sre.google/books/
43 | 3. [BL] Butler Lampson, 2011, Hints and Principles for Computer Systems Design,
44 | (https://arxiv.org/ftp/arxiv/papers/2011/2011.02455.pdf )
45 | 4. [LZGS] E. D. Lazowska, J. Zahorjan, G. S. Graham, K. C. Sevcik, 1984, Quantitative System Performance, Prentice Hall.
46 | 5. [LK] L. Kleinrock, 1975. Queueing Systems Volume I: Theory, Wiley Interscience.
47 | 6. [OSTEP](https://pages.cs.wisc.edu/~remzi/OSTEP/) Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau, Operating Systems: Three Easy Pieces, Arpaci-Dusseau Books, 2018
48 | 7. Alex Xu's books: **https://bytebytego.com/**
49 | 8. [DS] Distributed Systems book: https://www.distributed-systems.net/index.php/books/ds4/ -- This is a complete reference book with lots of accompanying materials.
50 | 
51 | __Interview Preparation Materials__
52 | 
53 | http://www.uoitc.edu.iq/images/documents/informatics-institute/Competitive_exam/Systemanalysisanddesign.pdf
54 | 
55 | https://www.codingninjas.com/codestudio/guided-paths/system-design
56 | 
57 | https://media.geeksforgeeks.org/courses/syllabus/6db09c35adb4bca3998be368743755b5.pdf
58 | 
59 | http://book.mixu.net/distsys/single-page.html
60 | 
61 | https://blog.pragmaticengineer.com/operating-a-high-scale-distributed-system/
62 | 
63 | https://martinfowler.com/articles/patterns-of-distributed-systems/
64 | 
65 | Important Algorithms: https://blog.bytebytego.com/p/algorithms-you-should-know-before
66 | 
67 | ByteByteGo's List of System Design Resources: https://github.com/alex-xu-system/bytebytego/blob/main/system_design_links.md
68 | 
69 | __Similar courses available in other institute(s) / MOOC portals:__
70 | 
71 | * MIT 6.033: Computer Systems Engineering (http://web.mit.edu/6.033/www/ )
72 | 
73 | * Princeton COS 316: Principles of Computer Systems Design
74 | (https://www.cs.princeton.edu/courses/archive/fall19/cos316/ )
75 | 
76 | * University of Wisconsin Madison, CS547: Computer Systems Modeling Fundamentals
77 | (http://pages.cs.wisc.edu/~vernon/cs547/cs547.html )
78 | 
79 | * CMU 15-857: Analytical Performance Modeling & Design of Computer Systems
80 | (https://www.cs.cmu.edu/~harchol/Perfclass/class21fall.html )
81 | 
82 | 
83 | 
--------------------------------------------------------------------------------
/Software.md:
--------------------------------------------------------------------------------
1 | https://www.charlesproxy.com/
2 | 
--------------------------------------------------------------------------------