Programmer Roadmap
===================

My personal roadmap to become a great programmer.

Inspired by [Sijin Joseph's Programmer Competency
Matrix](http://sijinjoseph.com/programmer-competency-matrix/) and
[John Washam's Coding Interview University](https://github.com/jwasham/coding-interview-university).

Some things, like communication and experience, will be trained within the
method: writing blog posts and deliberate practice.

My study notes can be found on my [personal blog](http://tamoios.org/).


Table of Contents
-----------------

- [Data structures](#data-structures)
- [Algorithms](#algorithms)
- [Advanced algorithms](#advanced-algorithms)
- [System architecture](#system-architecture)
- [Computer networks](#computer-networks)
- [Automated testing](#automated-testing)
- [System Design, Scalability, Data Handling](#system-design-scalability-data-handling)
- [Problem decomposition](#problem-decomposition)
- [Systems decomposition](#systems-decomposition)
- [Database](#database)
- [Security](#security)


--------------------


### Data structures

You should have knowledge of advanced data structures like B-trees, binomial
and Fibonacci heaps, AVL/red-black trees, splay trees, skip lists, tries, etc.

- [ ] Arrays
- [ ] Linked Lists
- [ ] Stack
- [ ] Queue
- [ ] Hash table
- [ ] Trees - Notes & Background
- [ ] Binary search trees: BSTs
- [ ] Heap / Priority Queue / Binary Heap
- [ ] AVL trees
- [ ] Splay trees
- [ ] 2-3 search trees
- [ ] 2-3-4 Trees (aka 2-4 trees)
- [ ] B-Trees
- [ ] Red/black trees
- [ ] N-ary (K-ary, M-ary) trees
- [ ] Tries
    - Note there are different kinds of tries. Some have prefixes, some don't, and some use strings instead of bits
      to track the path.
    - I read through code, but will not implement.
    - [ ] [Sedgewick - Tries (3 videos)](https://www.youtube.com/playlist?list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
        - [ ] [1. R Way Tries](https://www.youtube.com/watch?v=buq2bn8x3Vo&index=3&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
        - [ ] [2. Ternary Search Tries](https://www.youtube.com/watch?v=LelV-kkYMIg&index=2&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ)
        - [ ] [3. Character Based Operations](https://www.youtube.com/watch?v=00YaFPcC65g&list=PLe-ggMe31CTe9IyG9MB8vt5xUJeYgOYRQ&index=1)
    - [ ] [Notes on Data Structures and Programming Techniques](http://www.cs.yale.edu/homes/aspnes/classes/223/notes.html#Tries)
    - [ ] Short course videos:
        - [ ] [Introduction To Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/08Xyf/core-introduction-to-tries)
        - [ ] [Performance Of Tries (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/PvlZW/core-performance-of-tries)
        - [ ] [Implementing A Trie (video)](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/DFvd3/core-implementing-a-trie)
    - [ ] [The Trie: A Neglected Data Structure](https://www.toptal.com/java/the-trie-a-neglected-data-structure)
    - [ ] [TopCoder - Using Tries](https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/)
    - [ ] [Stanford Lecture (real world use case) (video)](https://www.youtube.com/watch?v=TJ8SkcUSdbU)
    - [ ] [MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)](https://www.youtube.com/watch?v=NinWEPPrkDQ&index=16&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf)



### Algorithms

You should be able to recognize and code dynamic programming solutions, have
good knowledge of graph algorithms and numerical computation algorithms, and
be able to identify NP problems, etc.

- [ ] Binary search
- [ ] Bitwise operations
- [ ] Sorting Algorithm Stability
- [ ] Bubble Sort
- [ ] Analyzing Bubble Sort
- [ ] Insertion Sort
- [ ] Merge Sort
- [ ] Quicksort
- [ ] Selection Sort
- [ ] Heap sort
- [ ] Radix Sort
- [ ] [Sorting in Linear Time (video)](https://www.youtube.com/watch?v=pOKy3RZbSws&list=PLUl4u3cNGP61hsJNdULdudlRL493b-XZf&index=14)
- [ ] Graphs
    - There are three basic ways to represent a graph in memory:
        - objects and pointers
        - matrix
        - adjacency list
    - Familiarize yourself with each representation and its pros & cons
    - BFS and DFS - know their computational complexity, their tradeoffs, and how to implement them in real code
    - Check for cycle (needed for topological sort, since we'll check for cycle before starting)
    - Topological sort
    - Count connected components in a graph
    - List strongly connected components
    - Check for bipartite graph
- [ ] Dijkstra's algorithm
- [ ] A* algorithm
- [ ] Recursion
    - When it is appropriate to use it
    - How is tail recursion better than not?
- [ ] String searching & manipulations
    - [ ] [Sedgewick - Suffix Arrays (video)](https://www.youtube.com/watch?v=HKPrVm5FWvg)
    - [ ] [Sedgewick - Substring Search (videos)](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
        - [ ] [1. Introduction to Substring Search](https://www.youtube.com/watch?v=2LvvVFCEIv8&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=5)
        - [ ] [2. Brute-Force Substring Search](https://www.youtube.com/watch?v=CcDXwIGEXYU&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=4)
        - [ ] [3. Knuth-Morris Pratt](https://www.youtube.com/watch?v=n-7n-FDEWzc&index=3&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
        - [ ] [4. Boyer-Moore](https://www.youtube.com/watch?v=fI7Ch6pZXfM&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66&index=2)
        - [ ] [5. Rabin-Karp](https://www.youtube.com/watch?v=QzI0p6zDjK4&index=1&list=PLe-ggMe31CTdAdjXB3lIuf2maubzo9t66)
    - [ ] [Search pattern in text (video)](https://www.coursera.org/learn/data-structures/lecture/tAfHI/search-pattern-in-text)

### Advanced algorithms


- [ ] [Compression](https://www.youtube.com/watch?v=Lto-ajuqW3w)
- [ ] [Entropy in Compression](https://www.youtube.com/watch?v=M5c_RFKVkko)
- [ ] [Upside Down Trees (Huffman Trees)](https://www.youtube.com/watch?v=umTbivyJoiI)
- [ ] [EXTRA BITS/TRITS - Huffman Trees](https://www.youtube.com/watch?v=DV8efuB3h2g)
- [ ] [Elegant Compression in Text (The LZ 77 Method)](https://www.youtube.com/watch?v=goOa3DGezUA)
- [ ] [Text Compression Meets Probabilities](https://www.youtube.com/watch?v=cCDCfoHTsaU)
- [ ] [Compressor Head videos](https://www.youtube.com/playlist?list=PLOU2XLYxmsIJGErt5rrCqaSGTMyyqNt2H)
- [ ] [(optional) Google Developers Live: GZIP is not enough!](https://www.youtube.com/watch?v=whGwm0Lky2s)
- [ ] MP3
- [ ] MPEG
- [ ] Linear Programming
    - Simplex


### System architecture

You should understand the entire programming stack: hardware (CPU + Memory +
Cache + Interrupts + microcode), binary code, assembly, static and dynamic
linking, compilation, interpretation, JIT compilation, garbage collection,
heap, stack, memory addressing and others.

- [ ] Compilers
    - [ ] [How a Compiler Works in ~1 minute (video)](https://www.youtube.com/watch?v=IhC7sdYe-Jg)
    - [ ] [Harvard CS50 - Compilers (video)](https://www.youtube.com/watch?v=CSZLNYF4Klo)
    - [ ] [C++ (video)](https://www.youtube.com/watch?v=twodd1KFfGk)
    - [ ] [Understanding Compiler Optimization (C++) (video)](https://www.youtube.com/watch?v=FnGCDLhaxKU)

- [ ] [Write Great Code: Volume 1: Understanding the Machine](https://www.amazon.com/Write-Great-Code-Understanding-Machine/dp/1593270038)
    - Numeric Representation
    - Binary Arithmetic and Bit Operations
    - Floating-Point Representation
    - Character Representation
    - Memory Organization and Access
    - Composite Data Types and Memory Objects
    - CPU Architecture
    - Instruction Set Architecture
    - Memory Architecture and Organization

- [ ] Caches
    - [ ] LRU cache:
        - [ ] [The Magic of LRU Cache (100 Days of Google Dev) (video)](https://www.youtube.com/watch?v=R5ON3iwx78M)
        - [ ] [Implementing LRU (video)](https://www.youtube.com/watch?v=bq6N7Ym81iI)
        - [ ] [LeetCode - 146 LRU Cache (C++) (video)](https://www.youtube.com/watch?v=8-FZRAjR7qU)
    - [ ] CPU cache:
        - [ ] [MIT 6.004 L15: The Memory Hierarchy (video)](https://www.youtube.com/watch?v=vjYF_fAZI5E&list=PLrRW1w6CGAcXbMtDFj205vALOGmiRc82-&index=24)
        - [ ] [MIT 6.004 L16: Cache Issues (video)](https://www.youtube.com/watch?v=ajgC3-pyGlk&index=25&list=PLrRW1w6CGAcXbMtDFj205vALOGmiRc82-)

- [ ] Processes and Threads
    - [ ] Computer Science 162 - Operating Systems (25 videos):
        - for processes and threads see videos 1-11
        - [Operating Systems and System Programming (video)](https://www.youtube.com/playlist?list=PL-XXv-cvA_iBDyz-ba4yDskqMDY6A1w_c)
    - [What Is The Difference Between A Process And A Thread?](https://www.quora.com/What-is-the-difference-between-a-process-and-a-thread)
    - Covers:
        - Processes, Threads, Concurrency issues
            - difference between processes and threads
            - processes
            - threads
            - locks
            - mutexes
            - semaphores
            - monitors
            - how they work
            - deadlock
            - livelock
        - CPU activity, interrupts, context switching
        - Modern concurrency constructs with multicore processors
        - [Paging, segmentation and virtual memory (video)](https://www.youtube.com/watch?v=LKe7xK0bF7o&list=PLCiOXwirraUCBE9i_ukL8_Kfg6XNv7Se8&index=2)
        - [Interrupts (video)](https://www.youtube.com/watch?v=uFKi2-J-6II&list=PLCiOXwirraUCBE9i_ukL8_Kfg6XNv7Se8&index=3)
        - [Scheduling (video)](https://www.youtube.com/watch?v=-Gu5mYdKbu4&index=4&list=PLCiOXwirraUCBE9i_ukL8_Kfg6XNv7Se8)
        - Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, I/O)
        - Thread resource needs (shares the above, minus the stack, with other threads in the same process, but each has its own program counter, stack pointer, registers, and stack)
        - Forking is really copy-on-write: parent and child share the same pages read-only until one of them writes to memory, and only then is the written page copied.
        - Context switching
            - How context switching is initiated by the operating system and underlying hardware
    - [ ] [threads in C++ (series - 10 videos)](https://www.youtube.com/playlist?list=PL5jc9xFGsL8E12so1wlMS0r0hTQoJL74M)
    - [ ] concurrency in Python (videos):
        - [ ] [Short series on threads](https://www.youtube.com/playlist?list=PL1H1sBF1VAKVMONJWJkmUh6_p8g4F2oy1)
        - [ ] [Python Threads](https://www.youtube.com/watch?v=Bs7vPNbB9JM)
        - [ ] [Understanding the Python GIL (2010)](https://www.youtube.com/watch?v=Obt-vMVdM8s)
            - [reference](http://www.dabeaz.com/GIL)
        - [ ] [David Beazley - Python Concurrency From the Ground Up: LIVE! - PyCon 2015](https://www.youtube.com/watch?v=MCs5OvhV9S4)
        - [ ] [Keynote David Beazley - Topics of Interest (Python Asyncio)](https://www.youtube.com/watch?v=ZzfHjytDceU)
        - [ ] [Mutex in Python](https://www.youtube.com/watch?v=0zaPs8OtyKY)

- [ ] Floating Point Numbers
    - [ ] simple 8-bit: [Representation of Floating Point Numbers - 1 (video - there is an error in calculations - see video description)](https://www.youtube.com/watch?v=ji3SfClm8TU)
    - [ ] 32 bit: [IEEE754 32-bit floating point binary (video)](https://www.youtube.com/watch?v=50ZYcZebIec)

- [ ] Unicode
    - [ ] [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets](http://www.joelonsoftware.com/articles/Unicode.html)
    - [ ] [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/)

- [ ] Endianness
    - [ ] [Big And Little Endian](https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endian.html)
    - [ ] [Big Endian Vs Little Endian (video)](https://www.youtube.com/watch?v=JrNF0KRAlyo)
    - [ ] [Big And Little Endian Inside/Out (video)](https://www.youtube.com/watch?v=oBSuXP-1Tc0)
        - Very technical talk for kernel devs. Don't worry if most is over your head.
        - The first half is enough.

- [ ] Scheduling
    - in an OS, how it works
    - can be gleaned from Operating System videos

- [ ] Implement system routines
    - understand what lies beneath the programming APIs you use
    - can you implement them?

- [ ] Unix command line tools
    - I filled in the list below from good tools.
    - bash
    - cat
    - grep
    - sed
    - awk
    - curl or wget
    - sort
    - tr
    - uniq
    - [strace](https://en.wikipedia.org/wiki/Strace)
    - [tcpdump](https://danielmiessler.com/study/tcpdump/)


### Computer networks

- [ ] [Khan Academy](https://www.khanacademy.org/computing/computer-science/internet-intro)
- [ ] [UDP and TCP: Comparison of Transport Protocols](https://www.youtube.com/watch?v=Vdc8TCESIg8)
- [ ] [TCP/IP and the OSI Model Explained!](https://www.youtube.com/watch?v=e5DEVa9eSN0)
- [ ] [Packet Transmission across the Internet. Networking & TCP/IP tutorial.](https://www.youtube.com/watch?v=nomyRJehhnM)
- [ ] [HTTP](https://www.youtube.com/watch?v=WGJrLqtX7As)
- [ ] [SSL and HTTPS](https://www.youtube.com/watch?v=S2iBR2ZlZf0)
- [ ] [SSL/TLS](https://www.youtube.com/watch?v=Rp3iZUvXWlM)
- [ ] [HTTP 2.0](https://www.youtube.com/watch?v=E9FxNzv1Tr8)
- [ ] [Video Series (21 videos)](https://www.youtube.com/playlist?list=PLEbnTDJUr_IegfoqO4iPnPYQui46QqT0j)
- [ ] [Subnetting Demystified - Part 5 CIDR Notation](https://www.youtube.com/watch?v=t5xYI0jzOf4)
- [ ] Sockets:
    - [ ] [Java - Sockets - Introduction (video)](https://www.youtube.com/watch?v=6G_W54zuadg&t=6s)
    - [ ] [Socket Programming (video)](https://www.youtube.com/watch?v=G75vN2mnJeQ)


### Source code version control

Knowledge of distributed version control systems. Has tried out
Bzr/Mercurial/Darcs/Git.


### Build automation

Can set up a script to build the system and also documentation, installers,
generate release notes and tag the code in source control.

### Automated testing

Understands and is able to set up automated functional, load/performance and UI
tests.

- How unit testing works
- What are mock objects
- What is integration testing
- What is dependency injection
- [ ] [Agile Software Testing with James Bach (video)](https://www.youtube.com/watch?v=SAhJf36_u5U)
- [ ] [Open Lecture by James Bach on Software Testing (video)](https://www.youtube.com/watch?v=ILkT_HV9DVU)
- [ ] [Steve Freeman - Test-Driven Development (that’s not what we meant) (video)](https://vimeo.com/83960706)
    - [slides](http://gotocon.com/dl/goto-berlin-2013/slides/SteveFreeman_TestDrivenDevelopmentThatsNotWhatWeMeant.pdf)
- [ ] [TDD is dead. Long live testing.](http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html)
- [ ] [Is TDD dead? (video)](https://www.youtube.com/watch?v=z9quxZsLcfo)
- [ ] [Video series (152 videos) - not all are needed (video)](https://www.youtube.com/watch?v=nzJapzxH_rE&list=PLAwxTw4SYaPkWVHeC_8aSIbSxE_NXI76g)
- [ ] [Test-Driven Web Development with Python](http://www.obeythetestinggoat.com/pages/book.html#toc)
- [ ] Dependency injection:
    - [ ] [video](https://www.youtube.com/watch?v=IKD2-MAkXyQ)
    - [ ] [Tao Of Testing](http://jasonpolites.github.io/tao-of-testing/ch3-1.1.html)
- [ ] [How to write tests](http://jasonpolites.github.io/tao-of-testing/ch4-1.1.html)

### System Design, Scalability, Data Handling

- **You can expect system design questions if you have 4+ years of experience.**
- Scalability and System Design are very large topics with many subtopics and resources, since
  there is a lot to consider when designing a software/hardware system that can scale.
  Expect to spend quite a bit of time on this.
- Considerations:
    - scalability
        - Distill large data sets to single values
        - Transform one data set to another
        - Handling obscenely large amounts of data
    - system design
        - feature sets
        - interfaces
        - class hierarchies
        - designing a system under certain constraints
        - simplicity and robustness
        - tradeoffs
        - performance analysis and optimization
- [ ] **START HERE**: [The System Design Primer](https://github.com/donnemartin/system-design-primer)
- [ ] [System Design from HiredInTech](http://www.hiredintech.com/system-design/)
- [ ] [How Do I Prepare To Answer Design Questions In A Technical Interview?](https://www.quora.com/How-do-I-prepare-to-answer-design-questions-in-a-technical-interview?redirected_qid=1500023)
- [ ] [8 Things You Need to Know Before a System Design Interview](http://blog.gainlo.co/index.php/2015/10/22/8-things-you-need-to-know-before-system-design-interviews/)
- [ ] [Algorithm design](http://www.hiredintech.com/algorithm-design/)
- [ ] [Database Normalization - 1NF, 2NF, 3NF and 4NF (video)](https://www.youtube.com/watch?v=UrYLYV7WSHM)
- [ ] [System Design Interview](https://github.com/checkcheckzz/system-design-interview) - There are a lot of resources in this one. Look through the articles and examples. I put some of them below.
    - [ ] [How to ace a systems design interview](http://www.palantir.com/2011/10/how-to-rock-a-systems-design-interview/)
    - [ ] [Numbers Everyone Should Know](http://everythingisdata.wordpress.com/2009/10/17/numbers-everyone-should-know/)
    - [ ] [How long does it take to make a context switch?](http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html)
- [ ] [Transactions Across Datacenters (video)](https://www.youtube.com/watch?v=srOgpXECblk)
- [ ] [A plain English introduction to CAP Theorem](http://ksat.me/a-plain-english-introduction-to-cap-theorem/)
- [ ] Paxos Consensus algorithm:
    - [short video](https://www.youtube.com/watch?v=s8JqcZtvnsM)
    - [extended video with use case and multi-paxos](https://www.youtube.com/watch?v=JEpsBg0AO6o)
    - [paper](http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf)
- [ ] [Consistent Hashing](http://www.tom-e-white.com/2007/11/consistent-hashing.html)
- [ ] [NoSQL Patterns](http://horicky.blogspot.com/2009/11/nosql-patterns.html)
- [ ] Scalability:
    - [ ] [Great overview (video)](https://www.youtube.com/watch?v=-W9F__D3oY4)
    - [ ] Short series:
        - [Clones](http://www.lecloud.net/post/7295452622/scalability-for-dummies-part-1-clones)
        - [Database](http://www.lecloud.net/post/7994751381/scalability-for-dummies-part-2-database)
        - [Cache](http://www.lecloud.net/post/9246290032/scalability-for-dummies-part-3-cache)
        - [Asynchronism](http://www.lecloud.net/post/9699762917/scalability-for-dummies-part-4-asynchronism)
    - [ ] [Scalable Web Architecture and Distributed Systems](http://www.aosabook.org/en/distsys.html)
    - [ ] [Fallacies of Distributed Computing Explained](https://pages.cs.wisc.edu/~zuyu/files/fallacies.pdf)
    - [ ] [Pragmatic Programming Techniques](http://horicky.blogspot.com/2010/10/scalable-system-design-patterns.html)
        - [extra: Google Pregel Graph Processing](http://horicky.blogspot.com/2010/07/google-pregel-graph-processing.html)
    - [ ] [Jeff Dean - Building Software Systems At Google and Lessons Learned (video)](https://www.youtube.com/watch?v=modXC5IWTJI)
    - [ ] [Introduction to Architecting Systems for Scale](http://lethain.com/introduction-to-architecting-systems-for-scale/)
    - [ ] [Scaling mobile games to a global audience using App Engine and Cloud Datastore (video)](https://www.youtube.com/watch?v=9nWyWwY2Onc)
    - [ ] [How Google Does Planet-Scale Engineering for Planet-Scale Infra (video)](https://www.youtube.com/watch?v=H4vMcD7zKM0)
    - [ ] [The Importance of Algorithms](https://www.topcoder.com/community/data-science/data-science-tutorials/the-importance-of-algorithms/)
    - [ ] [Sharding](http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html)
    - [ ] [Scale at Facebook (2009)](https://www.infoq.com/presentations/Scale-at-Facebook)
    - [ ] [Scale at Facebook (2012), "Building for a Billion Users" (video)](https://www.youtube.com/watch?v=oodS71YtkGU)
    - [ ] [Engineering for the Long Game - Astrid Atkinson Keynote (video)](https://www.youtube.com/watch?v=p0jGmgIrf_M&list=PLRXxvay_m8gqVlExPC5DG3TGWJTaBgqSA&index=4)
    - [ ] [7 Years Of YouTube Scalability Lessons In 30 Minutes](http://highscalability.com/blog/2012/3/26/7-years-of-youtube-scalability-lessons-in-30-minutes.html)
        - [video](https://www.youtube.com/watch?v=G-lGCC4KKok)
    - [ ] [How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs](http://highscalability.com/blog/2016/8/15/how-paypal-scaled-to-billions-of-transactions-daily-using-ju.html)
    - [ ] [How to Remove Duplicates in Large Datasets](https://blog.clevertap.com/how-to-remove-duplicates-in-large-datasets/)
    - [ ] [A look inside Etsy's scale and engineering culture with Jon Cowie (video)](https://www.youtube.com/watch?v=3vV4YiqKm1o)
    - [ ] [What Led Amazon to its Own Microservices Architecture](http://thenewstack.io/led-amazon-microservices-architecture/)
    - [ ] [To Compress Or Not To Compress, That Was Uber's Question](https://eng.uber.com/trip-data-squeeze/)
    - [ ] [Asyncio Tarantool Queue, Get In The Queue](http://highscalability.com/blog/2016/3/3/asyncio-tarantool-queue-get-in-the-queue.html)
    - [ ] [When Should Approximate Query Processing Be Used?](http://highscalability.com/blog/2016/2/25/when-should-approximate-query-processing-be-used.html)
    - [ ] [Google's Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture](http://highscalability.com/blog/2016/2/23/googles-transition-from-single-datacenter-to-failover-to-a-n.html)
    - [ ] [Spanner](http://highscalability.com/blog/2012/9/24/google-spanners-most-surprising-revelation-nosql-is-out-and.html)
    - [ ] [Egnyte Architecture: Lessons Learned In Building And Scaling A Multi Petabyte Distributed System](http://highscalability.com/blog/2016/2/15/egnyte-architecture-lessons-learned-in-building-and-scaling.html)
    - [ ] [Machine Learning Driven Programming: A New Programming For A New World](http://highscalability.com/blog/2016/7/6/machine-learning-driven-programming-a-new-programming-for-a.html)
    - [ ] [The Image Optimization Technology That Serves Millions Of Requests Per Day](http://highscalability.com/blog/2016/6/15/the-image-optimization-technology-that-serves-millions-of-re.html)
    - [ ] [A Patreon Architecture Short](http://highscalability.com/blog/2016/2/1/a-patreon-architecture-short.html)
    - [ ] [Tinder: How Does One Of The Largest Recommendation Engines Decide Who You'll See Next?](http://highscalability.com/blog/2016/1/27/tinder-how-does-one-of-the-largest-recommendation-engines-de.html)
    - [ ] [Design Of A Modern Cache](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html)
    - [ ] [Live Video Streaming At Facebook Scale](http://highscalability.com/blog/2016/1/13/live-video-streaming-at-facebook-scale.html)
    - [ ] [A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS](http://highscalability.com/blog/2016/1/11/a-beginners-guide-to-scaling-to-11-million-users-on-amazons.html)
    - [ ] [How Does The Use Of Docker Effect Latency?](http://highscalability.com/blog/2015/12/16/how-does-the-use-of-docker-effect-latency.html)
    - [ ] [Does AMP Counter An Existential Threat To Google?](http://highscalability.com/blog/2015/12/14/does-amp-counter-an-existential-threat-to-google.html)
    - [ ] [A 360 Degree View Of The Entire Netflix Stack](http://highscalability.com/blog/2015/11/9/a-360-degree-view-of-the-entire-netflix-stack.html)
    - [ ] [Latency Is Everywhere And It Costs You Sales - How To Crush It](http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it)
    - [ ] [Serverless (very long, just need the gist)](http://martinfowler.com/articles/serverless.html)
    - [ ] [What Powers Instagram: Hundreds of Instances, Dozens of Technologies](http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances)
    - [ ] [Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day](http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html)
    - [ ] [Justin.Tv's Live Video Broadcasting Architecture](http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html)
    - [ ] [Playfish's Social Gaming Architecture - 50 Million Monthly Users And Growing](http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html)
    - [ ] [TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data](http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html)
    - [ ] [PlentyOfFish Architecture](http://highscalability.com/plentyoffish-architecture)
    - [ ] [Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day](http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html)
    - [ ] [ESPN's Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second](http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html)
    - [ ] See "Messaging, Serialization, and Queueing Systems" way below for info on some of the technologies that can glue services together
    - [ ] Twitter:
        - [O'Reilly MySQL CE 2011: Jeremy Cole, "Big and Small Data at @Twitter" (video)](https://www.youtube.com/watch?v=5cKTP36HVgI)
        - [Timelines at Scale](https://www.infoq.com/presentations/Twitter-Timeline-Scalability)
    - For even more, see "Mining Massive Datasets" video series in the [Video Series](#video-series) section.
- [ ] Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
    - review: [The System Design Primer](https://github.com/donnemartin/system-design-primer)
    - [System Design from HiredInTech](http://www.hiredintech.com/system-design/)
    - [cheat sheet](https://github.com/jwasham/coding-interview-university/blob/master/extras/cheat%20sheets/system-design.pdf)
    - flow:
        1. Understand the problem and scope:
            - define the use cases, with interviewer's help
            - suggest additional features
            - remove items that interviewer deems out of scope
            - assume high availability is required, add as a use case
        2. Think about constraints:
            - ask how many requests per month
            - ask how many requests per second (they may volunteer it or make you do the math)
            - estimate reads vs. writes percentage
            - keep 80/20 rule in mind when estimating
            - how much data written per second
            - total storage required over 5 years
            - how much data read per second
        3. Abstract design:
            - layers (service, data, caching)
            - infrastructure: load balancing, messaging
            - rough overview of any key algorithm that drives the service
            - consider bottlenecks and determine solutions

### Problem decomposition

Uses appropriate data structures and algorithms and comes up with
generic/object-oriented code that encapsulates the aspects of the problem that
are subject to change.

- [ ] Dynamic Programming
- [ ] NP, NP-Complete and Approximation Algorithms
    - Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem.
    - Know what NP-complete means.
- Greedy algorithms

### Systems decomposition

Able to visualize and design complex systems with multiple product lines and
integrations with external systems. Should also be able to design operations
support systems like monitoring, reporting, failovers, etc.

- [ ] [Design patterns](https://www.martinfowler.com/articles/writingPatterns.html)
    - Describe all 23 patterns in the GoF book and have a working knowledge of the patterns in the POSA books
- [ ] Design principles
    - Know the SOLID principles and have a good understanding of the component principles
- [ ] Methods
    - XP
    - Scrum
    - Lean
    - Kanban
    - Waterfall
    - Structured Analysis
    - Structured Design
- [ ] UML
- [ ] DFDs
- [ ] Structure charts
- [ ] Petri Nets
- [ ] State Transition Diagrams
- [ ] Decision tables
- [ ] Object oriented programming


### Code organization

Code organization at a physical level closely matches the design, and looking
at file names and folder distribution provides insights into the design.

Physical layout of source tree matches logical hierarchy and organization. The
directory names and organization provide insights into the design of the
system.


### Requirements

Able to suggest better alternatives and flows to given requirements based on
experience.


### Database

Can do basic database administration, performance optimization, index
optimization, write advanced select queries, able to replace cursor usage with
relational SQL, understands how data is stored internally, understands how
indexes are stored internally, understands how databases can be mirrored,
replicated etc. Understands how the two-phase commit works.

- [ ] Understand the benefits of relational data, e.g. SQL.
- [ ] Learn about NoSQL databases, e.g. MongoDB.
- [ ] Understand which would be better in certain situations.
- [ ] Know how to connect a database with your chosen back-end language (e.g. Node.js + MongoDB).
- [ ] Understand the benefits of in-memory data stores like Redis or memcached.
- [ ] Web storage to store sessions, cookies, and cached data in the browser.
- [ ] Scaling databases, ACID, and ORM (all optional).


### Security

- [ ] Computer Security
    - [MIT (23 videos)](https://www.youtube.com/playlist?list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Introduction, Threat Models](https://www.youtube.com/watch?v=GqmQg-cszw4&index=1&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Control Hijacking Attacks](https://www.youtube.com/watch?v=6bwzNg5qQ0o&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh&index=2)
        - [ ] [Buffer Overflow Exploits and Defenses](https://www.youtube.com/watch?v=drQyrzRoRiA&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh&index=3)
        - [ ] [Privilege Separation](https://www.youtube.com/watch?v=6SIJmoE9L9g&index=4&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Capabilities](https://www.youtube.com/watch?v=8VqTSY-11F4&index=5&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Sandboxing Native Code](https://www.youtube.com/watch?v=VEV74hwASeU&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh&index=6)
        - [ ] [Web Security Model](https://www.youtube.com/watch?v=chkFBigodIw&index=7&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Securing Web Applications](https://www.youtube.com/watch?v=EBQIGy1ROLY&index=8&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Symbolic Execution](https://www.youtube.com/watch?v=yRVZPvHYHzw&index=9&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Network Security](https://www.youtube.com/watch?v=SIEVvk3NVuk&index=11&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Network Protocols](https://www.youtube.com/watch?v=QOtA76ga_fY&index=12&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)
        - [ ] [Side-Channel Attacks](https://www.youtube.com/watch?v=PuVMkSEcPiI&index=15&list=PLUl4u3cNGP62K2DjQLRxDNRi0z2IRWnNh)


### Mathematics

- [ ] Combinatorics (n choose k) & Probability
- [ ] Information theory (videos)
    - [ ] [Khan Academy](https://www.khanacademy.org/computing/computer-science/informationtheory)
    - [ ] more about Markov processes:
        - [ ] [Core Markov Text Generation](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/waxgx/core-markov-text-generation)
        - [ ] [Core Implementing Markov Text Generation](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/gZhiC/core-implementing-markov-text-generation)
        - [ ] [Project = Markov Text Generation Walk Through](https://www.coursera.org/learn/data-structures-optimizing-performance/lecture/EUjrq/project-markov-text-generation-walk-through)
    - See more in MIT 6.050J Information and Entropy series below.

- [ ] Parity & Hamming Code (videos)
    - [ ] [Intro](https://www.youtube.com/watch?v=q-3BctoUpHE)
    - [ ] [Parity](https://www.youtube.com/watch?v=DdMcAUlxh1M)
    - [ ] Hamming Code:
        - [Error detection](https://www.youtube.com/watch?v=1A_NcXxdoCc)
        - [Error correction](https://www.youtube.com/watch?v=JAMLuxdHH8o)
    - [ ] [Error Checking](https://www.youtube.com/watch?v=wbH2VxzmoZk)

- [ ] Entropy
    - make sure to watch information theory videos first
    - [ ] [Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (video)](https://youtu.be/JnJq3Py0dyM?t=176)

---

Other topics
-----------

To be categorized.
- Cryptography
- Computer Intelligence

Papers
------

- Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
- [Love classic papers?](https://www.cs.cmu.edu/~crary/819-f09/)
- [ ] [1978: Communicating Sequential Processes](http://spinroot.com/courses/summer/Papers/hoare_1978.pdf)
    - [implemented in Go](https://godoc.org/github.com/thomas11/csp)
- [ ] [2003: The Google File System](http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf)
    - replaced by Colossus in 2012
- [ ] [2004: MapReduce: Simplified Data Processing on Large Clusters](http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)
    - mostly replaced by Cloud Dataflow?
- [ ] [2006: Bigtable: A Distributed Storage System for Structured Data](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)
    - [An Inside Look at Google BigQuery](https://cloud.google.com/files/BigQueryTechnicalWP.pdf)
- [ ] [2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems](https://research.google.com/archive/chubby-osdi06.pdf)
- [ ] [2007: Dynamo: Amazon’s Highly Available Key-value Store](http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
    - The Dynamo paper kicked off the NoSQL revolution
- [ ] [2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)](https://www.akkadia.org/drepper/cpumemory.pdf)
- [ ] [2010: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure](https://research.google.com/pubs/archive/36356.pdf)
- [ ] [2010: Dremel: Interactive Analysis of Web-Scale Datasets](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf)
- [ ] [2012: Google's Colossus](https://www.wired.com/2012/07/google-colossus/)
    - paper not available
- [ ] 2012: AddressSanitizer: A Fast Address Sanity Checker:
    - [paper](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf)
    - [video](https://www.usenix.org/conference/atc12/technical-sessions/presentation/serebryany)
- [ ] 2013: Spanner: Google’s Globally-Distributed Database:
    - [paper](http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)
    - [video](https://www.usenix.org/node/170855)
- [ ] [2014: Machine Learning: The High-Interest Credit Card of Technical Debt](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf)
- [ ] [2015: Continuous Pipelines at Google](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43790.pdf)
- [ ] [2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44686.pdf)
- [ ] [2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf)
- [ ] [2015: How Developers Search for Code: A Case Study](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf)
- [ ] [2016: Borg, Omega, and Kubernetes](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf)