├── assets
│   ├── nyaruko.jpg
│   └── courses
│       └── database
│           ├── doc.png
│           ├── schema_network.png
│           ├── relational_DBMS.png
│           ├── hierarchical_DBMS.png
│           └── object_oriented_DBMS.png
├── courses
│   ├── my_courses.md
│   ├── Compilers
│   │   └── notes.md
│   └── advanced-database-systems
│       └── history_of_databases.md
├── reading-list
│   └── backlog.md
├── README.md
└── books
    ├── notes
    │   ├── grokking_algorithms.md
    │   ├── the_ruthless_elimination_of_hurry.md
    │   ├── we.md
    │   └── designing_data_intensive_applications.md
    └── books_list.md

/assets/nyaruko.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/nyaruko.jpg
--------------------------------------------------------------------------------
/assets/courses/database/doc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/courses/database/doc.png
--------------------------------------------------------------------------------
/assets/courses/database/schema_network.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/courses/database/schema_network.png
--------------------------------------------------------------------------------
/assets/courses/database/relational_DBMS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/courses/database/relational_DBMS.png
--------------------------------------------------------------------------------
/assets/courses/database/hierarchical_DBMS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/courses/database/hierarchical_DBMS.png
--------------------------------------------------------------------------------
/assets/courses/database/object_oriented_DBMS.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dellamora/reading-diary/HEAD/assets/courses/database/object_oriented_DBMS.png
--------------------------------------------------------------------------------
/courses/my_courses.md:
--------------------------------------------------------------------------------
1 | ## I Hope to Finish Before I Die
2 |
3 | - [Advanced Database Systems (Spring 2020)](/courses/advanced-database-systems/history_of_databases.md)
4 |
5 | ## Completed Courses
6 |
7 | - [The Last Algorithms Course You Will Need](https://github.com/dellamora/data-structures-and-algorithms)
8 |
--------------------------------------------------------------------------------
/reading-list/backlog.md:
--------------------------------------------------------------------------------
1 | - [Intuitively Understanding the Shannon Entropy](https://www.youtube.com/watch?v=0GCGaw0QOhA)
2 | - [Alan Watts - The Principle Of Not Forcing](https://www.youtube.com/watch?v=ZzaUGhhnlQ8)
3 | - https://shikaan.github.io/assembly/x86/guide/2024/09/16/x86-64-conditionals.html
4 | - [Enter The Arena: Simplifying Memory Management](https://www.youtube.com/watch?v=TZ5a3gCCZYo)
5 | - https://blog.brownplt.org/2021/07/31/behavioral-hof.html
6 | - https://blog.brownplt.org/2022/07/09/plan-comp-hof.html
7 | - https://blog.brownplt.org/2022/08/16/struct-pipe-comp.html
8 | - [A Flock of Functions: Combinators, Lambda Calculus, & Church Encodings in JS - Part II](https://www.youtube.com/watch?v=pAnLQ9jwN-E)
9 | - https://en.m.wikipedia.org/wiki/What_Is_Life%3F
10 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Welcome to My Reading Diary ^^
2 |
3 |
4 |
5 | Here, you'll find a list of books, courses, and other materials I've read or watched, along with my personal notes and summaries. These are my interpretations, so be sure to check out the original sources for complete details.
6 |
7 | ## What's Inside?
8 |
9 | - [Books](/books/books_list.md): A list of books I've read, with notes on some of them.
10 | - [Courses](/courses/my_courses.md): A list of courses I've taken, including my notes, solutions, and resources.
11 |
12 |
13 | ## Usage
14 |
15 | Feel free to use these notes for your personal reference. If you find them helpful, please consider giving this repository a star. If you have any suggestions or corrections, you're welcome to submit a pull request.
16 |
17 | ## Disclaimer
18 |
19 | These notes are personal interpretations and summaries. They may not cover all aspects of the materials. For a complete understanding, please refer to the original sources.
20 |
--------------------------------------------------------------------------------
/courses/Compilers/notes.md:
--------------------------------------------------------------------------------
1 | # Compiler Notes ˆˆ
2 |
3 | So I'm dumping all my notes here from this Stanford compiler course I'm taking (https://online.stanford.edu/courses/soe-ycscs1-compilers). This is basically my brain on paper and might be a bit chaotic, but that's how I learn fast ˆˆ
4 |
5 | Will keep adding my notes as I go deeper into the course. Feel free to learn along with me or point out if something doesn't make sense!
6 |
7 |
8 | ## Day one
9 |
10 | Key Points:
11 | - Prof explained the 2 ways to run programs: interpreters vs compilers
12 |
13 | Interpreters:
14 | - Takes program + data → direct output
15 | - Runs stuff immediately, no preprocessing
16 | - Downside: way slower (like 10-20x slower)
17 |
18 | Compilers:
19 | - Takes just the program → makes an executable
20 | - Can run the executable many times w/ different data
21 | - Offline process (preprocesses first)
22 |
23 | Cool History:
24 | - 1950s: IBM 704 showed software costs > hardware costs (big deal back then)
25 | - Speedcoding (1953):
26 |   * First interpreter, by John Backus
27 |   * Made programming faster, but programs ran slower
28 |   * Used 30% of computer memory (yikes)
29 |
30 | FORTRAN:
31 | - Backus's next big thing (1954-57)
32 | - First successful high-level language
33 | - Took 3 years instead of 1 (classic software project lol)
34 | - By 1958: 50% of all code = FORTRAN (insane adoption rate!)
35 |
36 | Modern Compiler Structure (same as FORTRAN 1!):
37 | 1. Lexical analysis & parsing (syntax stuff)
38 | 2. Semantic analysis (types, scope rules)
39 | 3. Optimization (make code faster/smaller)
40 | 4. Code generation (translate to machine code/bytecode)
41 |
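To make those four stages concrete, here's a toy sketch I wrote (none of this is from the course; the function names and the little instruction set are all made up): a minimal "compiler" for arithmetic expressions targeting an imaginary stack machine.

```python
import re

def lex(src):
    # 1. Lexical analysis: characters -> tokens
    return re.findall(r"\d+|[+*()]", src)

def parse(tokens):
    # 2. Parsing (the semantic step is trivial here): tokens -> AST,
    #    with * binding tighter than +
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            node = ("+", node, rhs)
        return node, i
    def term(i):
        node, i = atom(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = atom(i + 1)
            node = ("*", node, rhs)
        return node, i
    def atom(i):
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1  # skip the closing ")"
        return ("num", int(tokens[i])), i + 1
    return expr(0)[0]

def fold(ast):
    # 3. Optimization: constant folding on the AST
    if ast[0] == "num":
        return ast
    op, lhs, rhs = ast
    lhs, rhs = fold(lhs), fold(rhs)
    if lhs[0] == "num" and rhs[0] == "num":
        return ("num", lhs[1] + rhs[1] if op == "+" else lhs[1] * rhs[1])
    return (op, lhs, rhs)

def codegen(ast, out):
    # 4. Code generation: AST -> instructions for a toy stack machine
    if ast[0] == "num":
        out.append(("PUSH", ast[1]))
    else:
        codegen(ast[1], out)
        codegen(ast[2], out)
        out.append(("ADD",) if ast[0] == "+" else ("MUL",))
    return out

print(codegen(fold(parse(lex("2 * (3 + 4)"))), []))
# [('PUSH', 14)]: the whole expression got constant-folded away
```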
--------------------------------------------------------------------------------
/books/notes/grokking_algorithms.md:
--------------------------------------------------------------------------------
1 | # Grokking Algorithms
2 |
3 | ## Big O Notation
4 |
5 | - `log` always refers to log₂ in computer science.
6 | - Examples:
7 |   - 2³ = 8, so log₂(8) = 3
8 |   - 2⁴ = 16, so log₂(16) = 4
9 |   - 2⁵ = 32, so log₂(32) = 5
10 |
11 | - Big O notation describes an algorithm's worst-case time complexity.
12 | - Common Big O run times (from fastest to slowest):
13 |   1. O(log n): Logarithmic time (e.g., Binary Search)
14 |   2. O(n): Linear time (e.g., Simple Search)
15 |   3. O(n log n): Linearithmic time (e.g., Quicksort)
16 |   4. O(n²): Quadratic time (e.g., Selection Sort)
17 |   5. O(n!): Factorial time (e.g., Traveling Salesperson problem)
18 |
19 | - Algorithm speed is measured by the growth in the number of operations as input size increases.
20 | - O(log n) is faster than O(n), with the difference becoming more significant as the input size grows.
21 | - Binary search:
22 |   - Only works on sorted lists
23 |   - Has logarithmic time complexity O(log n)
24 |   - Significantly faster than simple search
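A quick sketch of it (my own toy version, not the book's listing):

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or None."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2   # check the middle element
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1         # target can only be in the right half
        else:
            high = mid - 1        # target can only be in the left half
    return None

print(binary_search([1, 3, 5, 7, 9], 7))  # 3, found in O(log n) steps
```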
25 |
26 | - The constant factor in Big O notation can matter in practice (e.g., Quicksort vs. Merge Sort).
27 | - For simple vs. binary search, the difference in Big O complexity (O(n) vs. O(log n)) usually outweighs constant factors.
28 |
29 | ## Memory and Data Structures
30 |
31 | | Operation | Arrays | Linked Lists |
32 | |-----------|--------|--------------|
33 | | Reading   | O(1)   | O(n)         |
34 | | Insertion | O(n)   | O(1)         |
35 |
36 | - Computer memory is conceptually similar to a set of numbered drawers.
37 | - Arrays store elements of the same type contiguously in memory.
38 | - Linked lists can store different types and dynamically grow or shrink.
39 | - Stack: Last In, First Out (LIFO)
40 | - Queue: First In, First Out (FIFO)
41 |
42 | ## Recursion
43 |
44 | - Recursion occurs when a function calls itself.
45 | - Every recursive function has:
46 |   1. Base case (termination condition)
47 |   2. Recursive case
48 | - All function calls are added to the call stack.
49 | - Excessive recursion can lead to stack overflow.
50 |
51 | ## Hash Tables
52 |
53 | - Hash functions map strings to numbers.
54 | - Ideal for mapping web addresses to IP addresses.
55 | - Collisions occur when multiple keys hash to the same slot.
56 | - Collision resolution: often implemented using linked lists at each slot.
57 | - Performance depends on:
58 |   1. Low load factor
59 |   2. Good hash function
60 | - Useful for caching and detecting duplicates.
61 |
62 | ## Graphs
63 |
64 | - Breadth-First Search (BFS): Finds shortest path in unweighted graphs (see the sketch below).
65 | - Dijkstra's Algorithm: Finds shortest path in weighted graphs (no negative weights).
66 | - Bellman-Ford Algorithm: Handles graphs with negative weights.
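A minimal BFS sketch (the code is mine, not the book's; I just borrowed names from the book's mango-seller example):

```python
from collections import deque

def shortest_path(graph, start, goal):
    queue = deque([[start]])              # FIFO queue of paths to extend
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path                   # first hit is a shortest path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:   # skip already-seen people
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None                           # goal is unreachable

graph = {"you": ["alice", "bob", "claire"], "bob": ["anuj"], "claire": ["thom"]}
print(shortest_path(graph, "you", "anuj"))  # ['you', 'bob', 'anuj']
```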
67 |
68 | ## Greedy Algorithms
69 |
70 | - Make locally optimal choices at each step.
71 | - Often provide good approximations but not always optimal solutions.
72 |
73 | ## Dynamic Programming
74 |
75 | - Useful for optimization problems with constraints.
76 | - Applicable when problems can be broken into discrete subproblems.
77 | - Typically involves creating a grid of subproblem solutions.
78 | - Grid cells usually contain the values being optimized.
79 |
80 | ## K-Nearest Neighbors (KNN)
81 |
82 | - Used for classification and regression tasks.
83 | - Classification: Categorizing into groups.
84 | - Regression: Predicting a numerical value.
85 | - Requires feature extraction: converting items into comparable numerical lists.
86 | - Selecting relevant features is crucial for KNN's success.
--------------------------------------------------------------------------------
/books/notes/the_ruthless_elimination_of_hurry.md:
--------------------------------------------------------------------------------
1 | # The Ruthless Elimination of Hurry
2 |
3 | The book is divided into two parts: the first addresses the issue of hurry in our daily lives, while the second suggests that instead of trying to get more time, we should manage our time better. The author advocates for living a life similar to that of Jesus, emphasizing the importance of moments of silence and solitude with God. He shares how these habits transformed his life and provides practical tips for incorporating them into daily routines.
4 |
5 | ## Words and snippets that made me feel/think
6 |
7 |
8 | "Corrie ten Boom once said that if the devil can't make you sin, he'll make you busy.
9 |
10 | There's truth in that. Both sin and busyness have the exact same effect: they cut off your connection to God, to other people, and even to your own soul.
11 | The famous psychologist Carl Jung had this little saying:
12 |
13 | Hurry is not of the devil; hurry is the devil."
14 |
15 |
16 | "There's a reason people talk about "walking" with God, not "running" with God. It's because God is love."
17 |
18 |
19 | "Meaning, very little can be done with hurry that can't be done better without it. Especially our lives with God. And even our work for God."
20 |
21 | "
22 | I mean, how do we have any kind of spiritual life at all if we can't pay attention longer than a goldfish? How do you pray, read the Scriptures, sit under a teaching at church, or rest well on the Sabbath when every chance you get, you reach for the dopamine dispenser that is your phone?
23 | "
24 |
25 |
26 | "It's been proven by study after study: there is zero correlation between hurry and productivity. In fact, once you work a certain number of hours in a week, your productivity plummets. Wanna know what the number is? Fifty hours. Ironic: that's about a six-day workweek. One study found that there was zero difference in productivity between workers who logged seventy hours and those who logged fifty-five."
27 |
28 |
29 |
30 |
31 | "Dan Allender, in his book Sabbath, had this to say:
32 |
33 | The Sabbath is an invitation to enter delight. The Sabbath, when experienced as God intended, is the best day of our lives. Without question or thought, it is the best day of the week. It is the day we anticipate on Wednesday, Thursday, and Friday—and the day we remember on Sunday, Monday, and Tuesday. Sabbath is the holy time where we feast, play, dance, have sex, sing, pray, laugh, tell stories, read, paint, walk, and watch creation in its fullness. Few people are willing to enter the Sabbath and sanctify it, to make it holy, because a full day of delight and joy is more than most people can bear in a lifetime, let alone a week.
34 | "
35 |
36 |
37 |
38 | "[...] cited the happiest people on earth. Near the top of the list was a group of Christians called Seventh-day Adventists, who are religious, literally, about the Sabbath. This doctor noted that they lived ten years longer than the average American."
39 |
40 |
41 |
42 | "
43 | The conscious and intelligent manipulation of the organized habits and opinions of the masses is an important element in democratic society.
44 | Those who manipulate this unseen mechanism of society constitute an invisible government which is the true ruling power of our country.
45 | We are governed, our minds are molded, our tastes formed, our ideas suggested, largely by men we have never heard of. ... In almost every act of our daily lives ... we are dominated by the relatively small number of persons ... who pull the wires which control the public
46 | mind."
47 |
48 |
49 |
50 |
51 | "The goal here is to live with a high degree of intentionality around what matters most, which, for those of us who apprentice under Jesus, is Jesus himself and his kingdom."
52 |
53 |
54 |
55 |
56 | "Set a time and a time limit for social media (or just get off it)."
57 |
58 |
--------------------------------------------------------------------------------
/books/notes/we.md:
--------------------------------------------------------------------------------
1 | ## A Book I Almost Didn't Read: We by Yevgeny Zamyatin
2 |
3 | My boyfriend gave me We by Yevgeny Zamyatin, knowing I have a thing for dystopian stories. Still, I let the book sit on my desk for months before opening it. Maybe it was the weight of the genre, or the fear it wouldn't live up to the others I love. But once I finally started reading it, I was pulled in much faster than I expected.
4 |
5 | We is set in a hyper-rational future where people live in glass buildings, wear identical clothing, and have their lives scheduled down to the minute. Emotions are considered a disease, and even imagination is a threat to the perfect balance of the State. Everyone is known by a number, not a name (the protagonist is D-503, a mathematician and builder of a spacecraft called the Integral, designed to export their “perfect” society to other planets).
6 |
7 | Right away, I saw echoes of other dystopias I've read: Brave New World, 1984, Fahrenheit 451. But what struck me most about We was how early it was written, and yet how timeless it feels. I could sense how deeply it influenced those later books (maybe even Interstellar???) And while I'm kind of biased (I love stories about totalitarian regimes and their slow collapse), We felt different. Its language is stranger, more chaotic, more poetic. It's written like a series of journal entries, and D-503's mind slowly unravels in real time as he falls for the mysterious rebel I-330.
8 |
9 | One small detail haunted me long after I closed the book: D-503's obsession with numbers and how he believes the universe is ruled by absolute logic. At one point, I started thinking about how humans can't create truly random numbers. Even when we try, our minds fall into patterns. It's like there's a tiny tyrant inside us, always trying to bring order to chaos. That made the world of We feel even more real, this desperate need to quantify everything, to believe that randomness and freedom are mistakes that can be corrected.
10 |
11 | In the end, D-503 is forced to undergo an "operation" to remove his imagination. The rebellion fails (at least on the surface), and the State reasserts control. But that little image of the grass lingers. It reminds us that even under total control, something soft and stubborn still pushes upward. That maybe freedom is not an idea we build, but a force that grows on its own.
12 |
13 | Reading We was both unsettling and familiar. I waited too long to read it, but I'm glad I finally did.
14 |
15 | ---
16 |
17 | I just remembered something else that's been sitting with me: the book's view on racism and the human obsession with feeling special. There's this thread running through the story, that deep-rooted desire to believe our group, our system, our logic is superior. It reminded me how often that mindset fuels exclusion, hierarchy, and dehumanization in the real world. The One State frames its dominance as scientific and moral progress, but beneath it is the same old instinct to divide, to control, to claim authority over what is "right."
18 |
19 | And then there's that haunting moment with I-330, when she takes D-503's hand for the first time. In that moment, she makes him feel special. Not as a number, not as a function in a machine, but as a person. It's such a small, intimate gesture, but it breaks something open in him.
He carries that feeling all the way to the end, even after the operation, even after he insists he's been cured. That quiet memory lingers, like the grass. It's proof that something human slipped past all the equations.
20 |
21 | And then there's the book's complex relationship with Christianity. It doesn't offer a direct critique, but it definitely echoes and twists religious themes: original sin becomes imagination, salvation is replaced by conformity, and the Benefactor feels like a cold, authoritarian god. I-330, the rebel, even tempts D-503 with forbidden knowledge. The story reframes Eden, but not as a place of harmony; it's a sterile garden of equations. It made me think about how belief systems can be used both to liberate and to dominate, depending on who's in control of the story.
22 |
--------------------------------------------------------------------------------
/courses/advanced-database-systems/history_of_databases.md:
--------------------------------------------------------------------------------
1 | # History of Databases
2 |
3 | Basically, all the old database debates are still with us today. The SQL vs. NoSQL debate is reminiscent of the Relational vs. CODASYL debate from the 1970s.
4 |
5 | **1960s - Integrated Data Store (IDS)**: One of the first DBMSs. It used what is called a network data model, and executing a query basically meant writing a bunch of for loops that operate on a single tuple at a time.
6 |
7 | **CODASYL**: The COBOL people got together and proposed a standard for how programs would access a database, led by [Charles Bachman](https://en.wikipedia.org/wiki/Charles_Bachman). Network data model, with tuple-at-a-time query execution.
8 |
9 | **Network Data Model**: The network data model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
10 |
11 |
12 | 13 | 14 |
15 | 16 | **1960s- IBM IMS**: Early database sustem developed to keeep track of purchase orders for Apollo moon missions 17 | Hierarchical data model 18 | programmer-defined physical storage format 19 | tuple-at-a-time queries 20 | 21 | **Hierarchical Data Model**: The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. This model is very effective in one-to-many relationships. 22 | 23 | 24 | 25 | **Late 1960s and early 1970s Relational Model**:[Ted Codd](https://en.wikipedia.org/wiki/Edgar_F._Codd) **Late 1960s and early 1970s Relational Model**:[Ted Codd](https://en.wikipedia.org/wiki/Edgar_F._Codd) was a mathematician working ar IBM Research. He saw developers spending their time rewriting IMDS and Codasyl programs every time the databsses schema or laryout changed. 26 | 27 | To avoid this maintenance: 28 | 29 | - Database abstraction 30 | - Store databases in simple data structures 31 | - Access data through a high-level language 32 | - Physical storage left up to implementation 33 | 34 | 35 | 36 | **1970s - Relational Mode**: 37 | 38 | - System R - IBM Research 39 | - INGRES -U.C Berkeley (POstgrees stands for post ingress) 40 | - Oracle - Larry Ellison 41 | 42 | **1980s Relation Model**: IBM comes out with DB2 in 1983, SEQUEL becomes the standard (SQL) and many new enterprise DBMSs but Oracle wins marketplace. 43 | 44 | note: [stonebraker](https://en.wikipedia.org/wiki/Michael_Stonebraker) creates Postgres 45 | 46 | **1980s Object-Oriented database**: few of these original DBMSs from the 1980s still exist today but many of the technologies exist in other forms (JSON, XML) 47 | 48 | 49 | 50 | 51 | **1990s - Boring Days**: No major advancements in database systems or application workloads. Microssot forks sybase and creates SQL Server. MySQL is written as a replacement for mSQL. Postgres gets SQL support. SQLite started in early 2000. 52 | 53 | **2000s Internet Boom**: The internet boom created a need for new database systems. The old systems were not designed to handle the scale of the internet. 54 | 55 | **2000s Data Warehouses**: Rise of the special purpose OLAP DBMSs. Distributed/shared-nothing, Relational/SQL, usually closed-source. Significant performance benefits from using columnar data storage model. 56 | 57 | **2000s NoSQL Systems**: Focus on high-availability and high-scalability. Non-relational data models (document, key/value, etc), no ACID transactions, custom APIs instead of SQL, usually open source. 58 | 59 | **for response of NoSQL in 2010s NewSQL**: Provide same perfomance for OLTP Workloads as NoSQL DBMSs without giving up ACID. Distributed/shared-nothing, Relational/SQL, usually closed-source. 60 | 61 | **2010a Hybrid systems**: Hybrid Transactional-Analytical Processing. Execute fast OLTP like NewSQL sytem while also executing comples OLAP queries like data warahouse system. distribute/ shared-nothing, relational/sql, mixed open/ closed-source. 62 | 63 | **2010s Cloud systems**: First database-as-a-service (DBaaS) offereing were "contrainerized" versions of existing DBMSs. There are new DBMSs that are designed from scratch explicitly for running in a cloud environment. 64 | 65 | **Shared-disk engines**:Instead of writing a custom storage manager, the DBMS levarages distributed storage. Scale execution layer independently of storage. Favors log-structured approaches. 
66 |
67 | note: this is what most people think of when they talk about a data lake
68 |
69 | **2010s - Graph Systems**: Systems for storing and querying graph data. Their main advantage over other data models is a graph-centric query API. Recent research suggests it is unclear whether there is any benefit to using a graph-centric execution engine and storage manager.
70 |
71 | **2010s - Timeseries Systems**: Specialized systems designed to store timeseries/event data. The design of these systems makes deep assumptions about the distribution of data and workload query patterns.
72 |
73 | **2010s - Specialized Systems**:
74 |
75 | - Embedded DBMSs
76 | - Multi-model DBMSs
77 | - Blockchain DBMSs
78 | - Hardware acceleration
79 |
80 | ### References
81 |
82 | - Video: [History of Databases (CMU Databases / Spring 2020)](https://www.youtube.com/watch?v=SdW5RKUboKc&list=PLSE8ODhjZXjasmrEd2_Yi1deeE360zv5O&index=1)
83 |
84 | - Paper: [What Goes Around Comes Around](https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/papers/SH05.pdf)
85 |
--------------------------------------------------------------------------------
/books/notes/designing_data_intensive_applications.md:
--------------------------------------------------------------------------------
1 | # Part 1: Foundations of Data Systems
2 |
3 | ### Chapter 1: Reliable, Scalable, and Maintainable Applications
4 |
5 | The first chapter was focused on explaining some fundamental ways of thinking about data-intensive applications, what the book will cover, and an overview of what reliability, scalability, and maintainability are.
6 |
7 | - **Reliability**: systems should tolerate hardware faults, software faults, and human error.
8 |
9 | - **Scalability**: As the system expands, whether in terms of data volume, traffic volume, or complexity, it's essential to have viable strategies in place to manage and accommodate this growth effectively.
10 |
11 | - **Maintainability**: systems should be easy to operate, simple, and adaptable.
12 |
13 | ### Chapter 2: Data Models and Query Languages
14 |
15 | Data models are important because they can determine how we think about the problem we are solving. Applications are built by layering one data model on top of another.
16 |
17 | **SQL**: Based on the relational model proposed by [Edgar F. Codd](https://en.wikipedia.org/wiki/Edgar_F._Codd), SQL is the best-known data model today. Data is organized into `tables`, where each relation is an unordered collection of `rows`.
18 |
19 | **NoSQL**: These systems have gained popularity because they handle massive amounts of data and high-speed writes better than regular databases. They are also free and open source, making them ideal for specialized queries regular databases can't handle, and they offer more freedom in data management.
20 |
21 | **ORM: The Object-Relational Mismatch**: With a document model you can often fetch the relevant data in a single query, instead of writing a lot of queries across many related tables; the book covers some examples.
22 |
23 | (I just got lazy and didn't write more notes about this chapter)
24 |
25 | ### Chapter 3: Storage and Retrieval
26 |
27 | A database needs to do two things: **store data and retrieve data**.
28 |
29 | - Knowing how a database handles storage and retrieval internally is important for selecting a storage engine that is appropriate for the kind of data you have and the access patterns your application requires.
30 |
31 | **Data Structures**: The most common data structures used in databases are `hash indexes`, `search trees`, and `log-structured storage`.
32 |
33 | **Hash Indexes**: A hash index consists of an array of `buckets`, each of which contains a few `keys` and `pointers` to the records that have that key. The hash function determines which bucket a key is assigned to.
34 |
35 | - **Bitcask** is a storage engine that uses hash indexes and is well suited to situations where the value for a key is updated frequently.
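A minimal sketch of the idea (my own toy, loosely Bitcask-flavored; the class name and the CSV-ish record format are made up, and the real on-disk format is different): an append-only data file plus an in-memory hash map from each key to the byte offset of its latest value.

```python
import os

class TinyKV:
    """Toy append-only store with an in-memory hash index (Bitcask-ish)."""

    def __init__(self, path):
        self.path = path
        self.index = {}                    # key -> byte offset of latest value
        open(path, "ab").close()           # create the log file if needed

    def set(self, key, value):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()              # records are only ever appended
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset           # index points at the newest copy

    def get(self, key):
        if key not in self.index:
            return None
        with open(self.path, "rb") as f:
            f.seek(self.index[key])        # one seek, one read: O(1) lookup
            record = f.readline().decode().rstrip("\n")
        return record.split(",", 1)[1]

db = TinyKV("tiny.log")
db.set("42", "san francisco")
db.set("42", "oakland")                    # the old value stays in the log
print(db.get("42"))                        # oakland
```

Old values pile up in the log, which is exactly why real engines need the compaction step mentioned below.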
36 |
37 | **B-Trees and LSM-Trees**: B-Trees are a generalization of binary search trees in which each node can contain more than two children. B-Trees are well suited to storing data on disk because they are optimized for reading and writing large blocks of data. LSM-Trees (log-structured merge-trees) take a different approach: they buffer writes in memory and merge sorted segments on disk, which tends to give them higher write throughput.
38 |
39 | - **Downsides of LSM-Trees**
40 |
41 |   - The process of merging segments is called compaction, and it can interfere with the performance of ongoing reads and writes.
42 |   - Overhead: LSM-Trees have more overhead than B-Trees because they have to maintain more data structures.
43 |
44 |   - In high-write-throughput scenarios, LSM-Tree databases struggle to balance disk bandwidth between initial writes and ongoing compaction. Initially, when the database is empty, all disk bandwidth is used for writes. However, as the database grows, more bandwidth is needed for compaction. Without careful configuration, this can lead to a backlog of unmerged segments, decreased read performance, and potential disk space exhaustion. Effective management and monitoring of compaction are crucial to avoid these issues.
45 |
46 | **Log-Structured Storage**: Log-structured storage engines write new data to the end of a sequentially written log, and periodically merge segments of the log into a new segment.
47 |
48 | **Other Indexing Structures**: The book also covers secondary indexes, multi-column indexes, and full-text search / fuzzy indexes.
49 |
50 | - Secondary indexes: A secondary index is an index that is based on a field that is not the primary key. Secondary indexes are used to speed up queries that search for records based on a field that is not the primary key.
51 |
52 | - Multi-column indexes: A multi-column index is an index that is based on more than one field. Multi-dimensional indexes are a general way of querying several columns at once.
53 |
54 | ```SQL
55 | SELECT * FROM restaurants WHERE latitude > 51.4946 AND latitude < 51.5079
56 | AND longitude > -0.1162 AND longitude < -0.1004;
57 | ```
58 |
59 | - Full-text search / fuzzy indexes: Full-text search is a technique for searching a document or a collection of documents for a word or a phrase. Fuzzy indexes are used to search for records that are similar to a given record. (Elasticsearch is a popular search engine that uses fuzzy indexes.)
60 |
61 | **Keeping everything in memory**: The book also covers the use of in-memory databases.
62 |
63 | **OLAP vs. OLTP**: The difference between OLTP and OLAP is not always clear-cut, but some typical characteristics are listed below.
64 | | Property | Transaction Processing Systems (OLTP) | Analytic Systems (OLAP) |
65 | |-------------------------------|-----------------------------------------------|-------------------------------------------------|
66 | | Main Read Pattern | Small number of records per query, fetched by key | Aggregate over large number of records |
67 | | Main Write Pattern | Random-access, low-latency writes from user input | Bulk import (ETL) or event stream |
68 | | Primarily Used By | End user/customer, via web application | Internal analyst, for decision support |
69 | | Data Representation | Latest state of data (current point in time) | History of events that happened over time |
70 | | Dataset Size | Gigabytes to terabytes | Terabytes to petabytes |
71 |
72 | ```text
73 | "On a high level, we saw that storage engines fall into two broad categories: those optimized for transaction processing (OLTP), and those optimized for analytics (OLAP)."
74 | ```
75 |
76 | **Data Warehousing**: Data warehousing is the process of collecting, storing, and managing data from various sources to provide meaningful business insights. It is a blend of technologies and components which allows the strategic use of data.
77 |
78 | ### Chapter 4: Encoding and Evolution
79 |
80 | **Evolvability**: The ability to make changes to a system in the future, such as adding new features or fixing bugs.
81 |
82 | - For server-side applications, you can perform rolling upgrades, deploying the new version gradually to a few nodes at a time, ensuring smooth operation before moving to others. This minimizes downtime, enabling more frequent releases.
83 |
84 | - Client-side applications rely on users to install updates, which may cause delays in adoption.
85 |
86 | - Backward compatibility is the ability of newer code to read data that was written by older code.
87 |
88 | - Forward compatibility is the ability of older code to read data that was written by newer code.
89 |
90 | **Formats for Encoding Data**: Programs usually work with data in different representations, so we need some kind of translation between them. The book covers some of the most common formats for encoding data.
91 |
92 | **Language-Specific Formats**: Some languages have built-in support for encoding objects into byte sequences. While convenient for in-memory operations, these libraries lack cross-language compatibility. Decoding may require instantiating arbitrary classes, posing security risks. Versioning and efficiency are also challenges, as handling different object versions can be complex.
93 |
94 | (There are a lot of examples in the book, but I didn't write them down)
95 |
96 | **JSON, XML, and Binary Variants**: JSON and XML are popular for encoding data for interchange between systems. They are human-readable, but verbose and slow to parse. Binary variants are more efficient, but lack human readability.
97 |
98 | **Avro**: Avro is a binary encoding format that is smaller and faster to encode and decode than JSON. It is a schema-based format: the encoded data itself carries no field names or types, so a reader needs the writer's schema to decode it (in Avro data files, that schema is included once in the file header), which is what enables forward and backward compatibility.
99 |
100 | (Note: there are a lot of nice things to say about Avro, but I haven't written them down.)
101 |
102 |
103 |
104 | - Define Schema: Avro schemas are defined using JSON. The schema defines the fields and types of the data being encoded.
105 | - Schema Evolution: Avro supports schema evolution, meaning that the schema can change over time. New fields can be added, fields can be removed, and fields can be renamed. Avro also supports default values for fields, allowing for backward compatibility.
106 | - Writer Schema and Reader Schema: Avro supports the concept of a writer schema and a reader schema. The writer schema is the schema used to encode the data, and the reader schema is the schema used to decode the data. The reader schema can be different from the writer schema, allowing for forward and backward compatibility.
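For a picture of what that looks like, here is roughly the book's Person example written as a JSON Avro schema (reconstructed from memory, so treat the details as approximate). Because favoriteNumber is a union with null and has a default, code using this schema can still decode older records that were written before the field existed:

```json
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "userName",       "type": "string"},
    {"name": "favoriteNumber", "type": ["null", "long"], "default": null},
    {"name": "interests",      "type": {"type": "array", "items": "string"}}
  ]
}
```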
107 |
108 | **Dataflow Through Databases**: Basically, in a database, the process that writes to the database encodes the data, and the process that reads from the database decodes it.
109 |
110 | - Different values written at different times may be encoded differently, so the database must be able to handle multiple encodings of the same field. This is called **schema evolution**.
111 |
112 | **Dataflow Through Services**: When you have processes that need to communicate over a **network**, there are a few different ways of arranging that communication.
113 |
114 | **REST and HTTP**: REST is an architectural style for building distributed systems. It is based on the principles of the web, such as URLs, HTTP methods, and hypermedia. RESTful systems are stateless, meaning that each request from the client to the server must contain all the information necessary to understand the request. RESTful systems are also cacheable, meaning that responses can be cached to improve performance.
115 |
116 | **Remote Procedure Calls (RPC)**: RPC is a synchronous request-response protocol. The client sends a request to the server and waits for a response. The server processes the request and sends a response back to the client. The client is blocked until the server responds.
117 |
118 | **Message brokers (also called a message queue or message-oriented middleware)**: A message broker facilitates communication between different applications by receiving messages from one application and delivering them to another. It acts as an intermediary, enabling decoupled communication and allowing applications to interact without direct awareness of each other.
119 |
120 | ```text
121 | "in general, message brokers are used as follows: one process sends a message to a named queue or topic, and the broker ensures that the message is delivered to one or more consumers of or subscribers to that queue or topic. There can be many producers and many consumers on the same topic."
122 | ```
123 |
124 | **Distributed actor frameworks**: A distributed actor framework is a framework for building distributed systems using the actor model. The actor model is a model of concurrent computation that treats actors as the universal primitives of concurrent computation. In response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.
125 |
126 |
127 | **SUMMARY**
128 |
129 | - In databases, data is encoded by the writing process and decoded by the reading process.
130 | - RPC and REST APIs involve encoding a request by the client, decoding it by the server, then encoding a response by the server and decoding it by the client.
131 | - In asynchronous message passing, such as with message brokers or actors, nodes exchange encoded messages, with senders encoding and recipients decoding them for communication.
132 |
--------------------------------------------------------------------------------
/books/books_list.md:
--------------------------------------------------------------------------------
1 | ## Reading
2 | - [Designing Data-Intensive Applications](/books/notes/designing_data_intensive_applications.md)
3 | - [Chip War: The Fight for the World's Most Critical Technology](https://en.wikipedia.org/wiki/Chip_War:_The_Fight_for_the_World%27s_Most_Critical_Technology)
4 |
5 |
6 |
7 | ## Want to Read
8 | - Why We Sleep (Matthew Walker)
9 | - The Circadian Code: Lose Weight, Supercharge Your Energy and Sleep Well Every Night
10 | - The Craft of Research (Wayne C. Booth, Gregory G. Colomb, Joseph M. Williams)
11 | - The Marshmallow Test (Walter Mischel)
12 | - The Complete TurtleTrader: How 23 Novice Investors Became Overnight Millionaires
13 | - [Functional-Light JavaScript](https://github.com/getify/Functional-Light-JS)
14 | - The Design of Everyday Things
15 |
16 | ## Read
17 |
18 | ### 2025
19 |
20 | - O monge e o executivo: Uma história sobre a essência da liderança
21 | - [We (Yevgeny Zamyatin)](./notes/we.md)
22 |
24 |
25 | 2024 26 | 27 | - O ceticismo da fé: Deus: uma dúvida, uma certeza, uma distorção (Rodrigo Silva) 28 | - [The Ruthless Elimination of Hurry: How to Stay Emotionally Healthy and Spiritually Alive in the Chaos of the Modern World (John Mark Comer)](/books/notes/the_ruthless_elimination_of_hurry.md) 29 | - [Grokking algorithms Bhargava ](./notes/grokking_algorithms.md) 30 | - O segredo judaico ade resolução de problemas – Nilton Bonder 31 | 32 |
33 | 34 |
35 | Before 2024 130 | --------------------------------------------------------------------------------