├── .gitignore ├── 1.Elementary level ├── README.md ├── binary-search.py ├── binary_search.js ├── item-recursion.py ├── recursion.hs └── stack │ └── stack.c ├── 2.Intermediate Level └── README.md ├── 3.Advanced level └── README.md ├── 4.Thanos level └── README.md ├── 5.Merlin level └── README.md ├── LICENSE ├── README.md └── my-articles ├── Algorithm ├── Basic Algorithms.md ├── Introduction to Analysis of Algorithms.md ├── SORTING │ ├── Bubble.md │ ├── Insertion.md │ ├── README.md │ ├── Selection Sort.md │ └── Selection algorithm.md ├── Searching │ └── README.md ├── What is the algorithm-1?.md └── What is the algorithm-2?.md ├── Data Structure ├── Introduction of Data Structure.md └── STRING │ └── String Manipulation and Algorithms.md └── utils ├── Introduction to Problem Solving and Flowcharts.md └── What is Pseudocode.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | tags 3 | -------------------------------------------------------------------------------- /1.Elementary level/README.md: -------------------------------------------------------------------------------- 1 | # Elementary Level Algorithms & Data Structures 2 | 3 | ## 1. **Fundamentals of Algorithmic Thinking** 4 | 5 | ### [What is an Algorithm?](../my-articles/Algorithm/What%20is%20the%20algorithm-2?.md) 6 | 7 | - **Definition and Significance** 8 | 9 | - [Historical perspective: Early algorithms (Euclid's algorithm, Al-Khwarizmi's contributions)](https://quantumzeitgeist.com/the-secret-life-of-algorithms-how-ancient-mathematical-ideas-power-modern-computing/) 10 | - [https://en.wikipedia.org/wiki/Euclidean_algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) 11 | - [https://languagelog.ldc.upenn.edu/nll/?p=29074](https://languagelog.ldc.upenn.edu/nll/?p=29074) 12 | 13 | - **Algorithms & Computational Thinking Foundations** 14 | - https://www.learning.com/blog/defining-computational-algorithmic-design-thinking 15 | - https://gasstationwithoutpumps.wordpress.com/2010/08/12/algorithmic-vs-computational-thinking/ 16 | 17 | ## 2. **Algorithm Analysis Basics** 18 | 19 | I built a tool to understand this in depth and better: 20 | [https://github.com/medishen/TideityIQ](https://github.com/medishen/TideityIQ) 21 | 22 | Various resources regarding algorithm analysis: 23 | [https://github.com/m-mdy-m/TechShelf/tree/main/Algorithms/Analysis](https://github.com/m-mdy-m/TechShelf/tree/main/Algorithms/Analysis) 24 | 25 | ## 3. **Fundamental Data Structures** 26 | 27 | ### Why Data Structures Matter 28 | 29 | - **Data Organization Principles** 30 | 31 | - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/14867/Good_dataMan.pdf 32 | - https://people.umass.edu/biostat690c/pdf/1.%20%20Principles%20of%20Data%20Management%202020.pdf 33 | - https://ws1.nbninternational.com/fusion/v2.0/supplement/62d96a37646eb15fd4cb7208.pdf 34 | - https://www.intrac.org/app/uploads/2017/01/Principles-of-data-collection.pdf 35 | 36 | - **Abstract Data Types (ADTs)** 37 | 38 | - https://math.hws.edu/eck/cs327_s04/chapter2.pdf 39 | 40 | ### Basic Data Structures 41 | 42 | I will complete the first chapters of Arliz's book to understand the data structure of arrays and their children, such as stacks, queues, etc., and then move on to the next ones. 43 | 44 | ## 4. 
**Introduction to Recursion** 45 | 46 | - https://unamer34.wordpress.com/wp-content/uploads/2008/06/pr.pdf 47 | - https://cseweb.ucsd.edu/classes/sp05/cse101/JeffEdmondsBook.pdf 48 | - https://home.cs.colorado.edu/~main/supplements/pdf/notes09.pdf 49 | 50 | ## 5. **Pattern Recognition in Algorithms** 51 | 52 | - **Iteration Patterns** 53 | 54 | - **Divide and Conquer** 55 | 56 | - **Greedy Approach** 57 | 58 | - **Dynamic Programming Introduction** 59 | 60 | - **Transformation Patterns** 61 | -------------------------------------------------------------------------------- /1.Elementary level/binary-search.py: -------------------------------------------------------------------------------- 1 | def binary_search(items, item): 2 | low = 0 3 | high = len(items) - 1 4 | while low <= high: 5 | mid = (low + high) // 2 # midpoint of the search window, not the plain sum 6 | guess = items[mid] 7 | if guess == item: 8 | return mid 9 | if guess > item: 10 | high = mid - 1 11 | else: 12 | low = mid + 1 13 | return None 14 | l = [1,3,5,6,7,8,19] 15 | print(binary_search(l,8)) 16 | -------------------------------------------------------------------------------- /1.Elementary level/binary_search.js: -------------------------------------------------------------------------------- 1 | function binarySearch(array,item){ 2 | let low = 0, 3 | high = array.length -1, 4 | mid, 5 | guess; 6 | while(low<=high){ 7 | mid = Math.floor((low+high)/2) // midpoint of the search window, not the plain sum 8 | guess = array[mid] 9 | if (guess===item){ 10 | return mid 11 | }else if(guess<item){ 12 | low = mid+1 13 | }else{ 14 | high = mid-1 15 | } 16 | } 17 | return null 18 | } 19 | -------------------------------------------------------------------------------- /1.Elementary level/recursion.hs: -------------------------------------------------------------------------------- 1 | fact :: Int -> Int 2 | fact 0 = 1 3 | fact n = n * fact(n-1) 4 | main = do 5 | print ( fact 5 ) 6 | -------------------------------------------------------------------------------- /1.Elementary level/stack/stack.c: -------------------------------------------------------------------------------- 1 | #include <stdio.h> 2 | #include <stdlib.h> 3 | #define MAX 3 4 | struct Stack{ 5 | int items[MAX]; 6 | int top; 7 | }; 8 | typedef struct Stack st; 9 | 10 | void init(st *s){ 11 | s->top = -1; 12 | } 13 | 14 | int isFull(st *s){ 15 | if(s->top == MAX-1){ 16 | return 1; 17 | }else{ 18 | return 0; 19 | } 20 | } 21 | int isEmpty(st *s){ 22 | if(s->top == -1){ 23 | return 1; 24 | }else{ 25 | return 0; 26 | } 27 | } 28 | void push(st*s,int i){ 29 | if(isFull(s)){ 30 | printf("Stack is Full\n"); 31 | }else{ 32 | s->top++; 33 | s->items[s->top] = i; 34 | } 35 | } 36 | 37 | void pop(st*s){ 38 | if(isEmpty(s)){ 39 | printf("Stack is empty\n"); 40 | }else{ 41 | s->top--; 42 | } 43 | } 44 | int main(){ 45 | st *s = (st *)malloc(sizeof(st)); 46 | 47 | init(s); 48 | 49 | push(s,1); 50 | push(s,2); 51 | free(s); 52 | return 0; 53 | } 54 | -------------------------------------------------------------------------------- /2.Intermediate Level/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-mdy-m/algorithms-data-structures/7bcd5875ab26768ec17b0daf3cd83cffb05007aa/2.Intermediate Level/README.md -------------------------------------------------------------------------------- /3.Advanced level/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-mdy-m/algorithms-data-structures/7bcd5875ab26768ec17b0daf3cd83cffb05007aa/3.Advanced level/README.md -------------------------------------------------------------------------------- /4.Thanos level/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-mdy-m/algorithms-data-structures/7bcd5875ab26768ec17b0daf3cd83cffb05007aa/4.Thanos level/README.md
-------------------------------------------------------------------------------- /5.Merlin level/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/m-mdy-m/algorithms-data-structures/7bcd5875ab26768ec17b0daf3cd83cffb05007aa/5.Merlin level/README.md -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Mahdi 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # My Algorithms & CS Journey 2 | 3 | This repo is where I drop all the cool stuff I learn while diving into algorithms, data structures, and computer science in general. 4 | 5 | I'm not here to teach anyone. 6 | This is just my personal space to experiment, break things, build things, and collect ideas that I find interesting or useful. 7 | 8 | No tutorials, no fancy guides — just raw learning, exploration, and a bit of chaos. 9 | Think of it as a messy but honest journal of my journey through the CS rabbit hole. 10 | 11 | ## What's Inside? 12 | 13 | I've split things into levels that (kinda) match my own learning curve: 14 | 15 | - [Elementary](./1.Elementary%20level) – The basics, the first steps. 16 | - [Intermediate](./2.Intermediate%20Level) – Starting to get my hands dirty. 17 | - [Advanced](./3.Advanced%20level) – Headaches and "aha" moments. 18 | - [Thanos Level](./4.Thanos%20level) – Stuff that _snaps_ my brain. 19 | - [Merlin Level](./5.Merlin%20level) – Magic? Madness? Who knows. 20 | 21 | ## Notes & Progress 22 | 23 | Sometimes I’ll reflect, write notes, or just dump thoughts here. 24 | No promise of organization, but it’s all real. 25 | 26 | --- 27 | 28 | That’s it. If you’re here, cool. If not, also cool. 29 | -------------------------------------------------------------------------------- /my-articles/Algorithm/Introduction to Analysis of Algorithms.md: -------------------------------------------------------------------------------- 1 | ### Introduction to Analysis of Algorithms 2 | 3 | **Algorithm analysis** is a key area within computational complexity theory, which estimates the theoretical resources required by an algorithm to solve a given computational problem. 
It plays a critical role in determining how efficiently an algorithm performs, particularly in terms of time and space. 4 | 5 | Most algorithms are designed to handle inputs of arbitrary length, meaning the algorithm must perform regardless of the size of the data. Analyzing algorithms helps us understand their performance for different input sizes, providing insights into the scalability and efficiency of an algorithm. The efficiency of an algorithm is commonly expressed in terms of: 6 | 7 | - **Time Complexity**: This measures how the runtime of an algorithm changes as the input size increases. It is often represented by Big-O notation, which provides an upper bound on the time it takes for the algorithm to run based on input size. 8 | - **Space Complexity**: This measures how much memory an algorithm uses relative to the input size. It is crucial for understanding how much additional storage is required when executing the algorithm. 9 | 10 | ### Types of Algorithm Analysis 11 | 12 | There are four main types of algorithm analysis: 13 | 14 | 1. **Worst-Case Analysis**: 15 | 16 | - This refers to the maximum number of steps or resources an algorithm will need for any input of size `n`. Worst-case analysis is important for ensuring that the algorithm will perform efficiently under the most difficult circumstances. 17 | - **Example**: In a linear search algorithm, the worst-case scenario occurs when the target element is at the very end of the list, requiring the algorithm to scan through every element before finding it. 18 | 19 | 2. **Best-Case Analysis**: 20 | 21 | - This calculates the minimum number of steps required by the algorithm for any input of size `n`. While useful, best-case analysis is less significant in real-world applications because it only reflects the most favorable input scenario. 22 | - **Example**: In the same linear search algorithm, the best-case scenario is when the target element is the first element, meaning the search ends after one comparison. 23 | 24 | 3. **Average-Case Analysis**: 25 | 26 | - This computes the average number of steps the algorithm will take for a random input of size `n`. Average-case analysis provides a more realistic expectation of performance compared to best- and worst-case scenarios. 27 | - **Example**: In sorting algorithms like quicksort, the average case might consider random input orders and derive the expected number of comparisons. 28 | 29 | 4. **Amortized Analysis**: 30 | - Amortized analysis looks at a sequence of operations on a data structure and provides an average performance over time. This is particularly useful when some operations can be expensive, but their cost is "amortized" by many cheaper operations. 31 | - **Example**: In dynamic array resizing, while resizing can be expensive, it happens infrequently, so the average cost per insertion is low when considering multiple insertions.
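The linear-search examples above are easy to make concrete by counting comparisons directly. Here is a minimal Python sketch (the function and variable names are illustrative, not from this repo's files) showing how the best, worst, and average cases diverge:

```python
import random

def linear_search(items, target):
    # Returns (index, number of comparisons) for a left-to-right scan.
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return None, comparisons

data = list(range(100))
print(linear_search(data, 0))   # best case: found after 1 comparison
print(linear_search(data, 99))  # worst case: all 100 elements scanned

# Average case over uniformly random targets: about (n + 1) / 2 comparisons.
trials = [linear_search(data, random.randrange(100))[1] for _ in range(10_000)]
print(sum(trials) / len(trials))  # ~ 50.5
```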
32 | 33 | ### Importance of Algorithm Analysis 34 | 35 | Algorithm analysis helps identify the efficiency of an algorithm in terms of **CPU time**, **memory usage**, **disk usage**, and **network usage**. Among these, **CPU time** (time complexity) is typically the most critical factor when evaluating algorithms. 36 | 37 | It’s important to distinguish between **performance** and **complexity**: 38 | 39 | - **Performance**: This measures how much time or resources (memory, disk, etc.) are used when a program is run. Performance is dependent on several factors, including the hardware (machine specifications), software (compiler optimizations), and the algorithm itself. 40 | 41 | - **Example**: A sorting algorithm may take 10 milliseconds on one machine and 20 milliseconds on another, depending on CPU speed. 42 | 43 | - **Complexity**: Complexity examines how the resource requirements of an algorithm scale as the problem size increases. This provides a more general measure of the algorithm’s efficiency, independent of the specific machine or environment. 44 | - **Example**: If an algorithm has a time complexity of O(n^2), its runtime will grow quadratically as the input size increases. 45 | 46 | ### Role of Algorithms in Computing 47 | 48 | Algorithms are at the heart of computing, providing precise sets of instructions that a computer must follow to perform a task or solve a problem. Whether sorting data, processing images, or searching for information, algorithms help computers execute tasks efficiently and accurately. Algorithm efficiency is critical because it directly impacts the performance of computer systems in various industries. 49 | 50 | ### Applications of Algorithms in Various Industries 51 | 52 | Algorithms play a significant role in many industries by optimizing operations, enhancing decision-making, and improving efficiency. Some examples include: 53 | 54 | 1. **Manufacturing**: Algorithms are used to optimize production processes and supply chain management. This includes reducing waste, improving scheduling, and enhancing overall efficiency. 55 | 56 | - **Example**: The use of algorithms in inventory management to determine the most efficient production schedules for minimizing waste. 57 | 58 | 2. **Finance**: Algorithms are employed to analyze financial data, detect patterns, and make predictions, enabling traders and investors to make informed decisions. 59 | 60 | - **Example**: High-frequency trading algorithms that buy and sell assets within milliseconds to capitalize on price fluctuations. 61 | 62 | 3. **Healthcare**: Medical algorithms process and analyze medical images, assist in diagnosing diseases, and help optimize treatment plans. 63 | 64 | - **Example**: Algorithms used in MRI scans to detect abnormalities or in predictive models for diagnosing potential health issues. 65 | 66 | 4. **Retail**: Algorithms are vital for customer relationship management, personalized product recommendations, and pricing optimization, improving sales and customer satisfaction. 67 | 68 | - **Example**: Recommender systems used by e-commerce platforms to suggest products based on user behavior. 69 | 70 | 5. **Transportation**: Algorithms help optimize routes for delivery and transportation, reducing fuel consumption and improving delivery times. 71 | 72 | - **Example**: GPS navigation systems that calculate the most efficient route based on traffic data and road conditions. 73 | 74 | 6. **Energy**: Energy companies use algorithms to optimize energy generation, distribution, and consumption, leading to reduced energy waste and enhanced efficiency. 75 | 76 | - **Example**: Smart grid algorithms for balancing electricity supply and demand across a power network. 77 | 78 | 7. **Security**: In cybersecurity, algorithms are crucial for detecting and preventing threats like hacking, fraud, and cyber-attacks. 79 | - **Example**: Machine learning algorithms that detect anomalies in network traffic to prevent cyber-attacks.
80 | 81 | ### Key Applications of Algorithms in Computing 82 | 83 | Algorithms are fundamental in many aspects of computing, including: 84 | 85 | 1. **Data Processing**: Algorithms are essential for handling large amounts of data, whether for sorting, searching, or organizing data. 86 | 87 | - **Example**: Sorting algorithms like mergesort or quicksort, which organize data in a particular order for faster access. 88 | 89 | 2. **Problem Solving**: Algorithms are used to solve computational problems such as optimization, mathematical problems, and decision-making processes. 90 | 91 | - **Example**: Algorithms like Dijkstra’s algorithm find the shortest path in graphs, solving optimization problems efficiently. 92 | 93 | 3. **Computer Graphics**: Algorithms are used in creating, processing, and compressing images and graphics. 94 | 95 | - **Example**: JPEG compression algorithms that reduce image file sizes while maintaining visual quality. 96 | 97 | 4. **Artificial Intelligence**: AI systems rely on algorithms for machine learning, natural language processing (NLP), and computer vision tasks. 98 | 99 | - **Example**: Neural network algorithms used in image recognition, speech processing, and decision-making systems. 100 | 101 | 5. **Database Management**: Algorithms are critical in managing large databases, such as indexing algorithms and query optimization algorithms that make data retrieval more efficient. 102 | 103 | - **Example**: B-trees are used in databases to manage sorted data and allow efficient insertion, deletion, and searching operations. 104 | 105 | 6. **Network Communication**: Efficient communication and data transfer across networks depend on algorithms like routing and error correction algorithms. 106 | 107 | - **Example**: The TCP/IP protocol stack uses algorithms to ensure reliable data transmission over the internet. 108 | 109 | 7. **Operating Systems**: Operating systems rely on algorithms for process scheduling, memory management, and disk management. 110 | - **Example**: Round-robin scheduling algorithms allocate CPU time fairly among processes in a multitasking operating system. 111 | 112 | ## Algorithm Efficiency 113 | 114 | When analyzing algorithms, two critical factors determine their efficiency: space efficiency (memory usage) and time efficiency (execution time). These factors can impact how well an algorithm performs, especially for large datasets or time-sensitive tasks. Let's go through each concept in detail. 115 | 116 | ### 1. Space Efficiency (Space Complexity) 117 | 118 | Space efficiency refers to the amount of memory an algorithm requires during its execution. It is important when working with large datasets or in environments with limited memory, such as embedded systems. Space complexity is typically measured based on the following: 119 | 120 | #### Components of Memory Use: 121 | 122 | - **Instruction Space**: Memory required to store the program's instructions. This is affected by factors like: 123 | 124 | - The compiler used 125 | - Compiler optimization options 126 | - The architecture of the target machine (such as the CPU) 127 | 128 | - **Data Space**: Memory required for storing variables. This includes: 129 | 130 | - Data size, dynamically allocated memory, and static program variables. 131 | 132 | - **Run-time Stack Space**: Memory used by the program's call stack. This is affected by: 133 | - Function calls and recursion (which lead to more stack frames) 134 | - Local variables and parameters passed during function calls.
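Of these components, the run-time stack is the easiest to observe directly: each pending recursive call keeps a frame alive until it returns. A small illustrative Python sketch (function names are my own, assuming CPython's default recursion limit):

```python
import sys

def fact_recursive(n):
    # Each pending call holds a stack frame: O(n) run-time stack space.
    return 1 if n == 0 else n * fact_recursive(n - 1)

def fact_iterative(n):
    # One frame and two locals regardless of n: O(1) run-time stack space.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(fact_recursive(10) == fact_iterative(10))  # True
print(sys.getrecursionlimit())  # typically 1000: fact_recursive cannot go
# much deeper than this, while fact_iterative(100_000) still runs fine.
```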
135 | 136 | #### Static vs. Dynamic Memory Components: 137 | 138 | - **Fixed/Static Components**: These are determined at compile-time, such as the memory used by machine instructions and static variables. This size does not change during execution. 139 | - **Variable/Dynamic Components**: These are determined at run-time, such as the memory used by recursion and dynamically allocated memory. 140 | 141 | ### 2. Time Efficiency (Time Complexity) 142 | 143 | Time efficiency measures how long an algorithm takes to run. This can be influenced by several factors: 144 | 145 | - **Speed of the computer**: This includes the CPU, I/O operations, memory access speeds, etc. 146 | - **Compiler and compiler options**: Different compilers can optimize code differently, impacting execution time. 147 | - **Size of the input**: Algorithms often run more slowly on larger datasets. 148 | - **Nature of the input**: In certain algorithms (like searching), the structure of the input can affect how long it takes to complete. For instance, in a linear search, if the desired element is at the beginning of the list, the search will be faster compared to when it is at the end. 149 | 150 | ### Recursive Algorithm Analysis 151 | 152 | To analyze recursive algorithms, follow these steps: 153 | 154 | #### Steps for Analysis: 155 | 156 | 1. **Identify a parameter** indicating the size of the input. This will help define the recursive relation. 157 | 2. **Identify the basic operation**: This is the fundamental step repeated in the algorithm (e.g., a multiplication or a disk move in Tower of Hanoi). 158 | 3. **Analyze the number of times the basic operation is executed**: Depending on the input, this might vary, so consider: 159 | 160 | - **Worst-case**: The scenario where the algorithm performs the maximum number of operations. 161 | - **Average-case**: The scenario where the algorithm performs a typical number of operations for random input. 162 | - **Best-case**: The scenario where the algorithm performs the fewest operations. 163 | 164 | 4. **Set up a recurrence relation**: A recurrence relation expresses the number of basic operations executed based on the input size. 165 | 166 | 5. **Solve the recurrence relation**: Either find an exact solution or estimate the order of growth (big-O notation). 167 | 168 | ### Example: Recursive Evaluation of Factorial $n!$ 169 | 170 | The factorial function $n!$ is defined as the product of all positive integers up to $n$. For instance: 171 | 172 | - $4! = 1 \times 2 \times 3 \times 4 = 24$ 173 | 174 | #### Recursive Definition of Factorial 175 | 176 | Factorial can be defined recursively as follows: 177 | 178 | - $F(n) = F(n-1) \times n$ for $n \geq 1$ 179 | - $F(0) = 1$ 180 | 181 | Here is an algorithm to compute $n!$: 182 | 183 | ``` 184 | ALGORITHM F(n) 185 | // Computes n! recursively 186 | // Input: A nonnegative integer n 187 | // Output: The value of n! 188 | if n = 0 return 1 189 | else return F(n - 1) * n 190 | ``` 191 | 192 | #### Time Complexity Analysis for Factorial: 193 | 194 | To compute the factorial of $n$, the algorithm must perform $n$ multiplications. Here's a recurrence relation for the number of multiplications $M(n)$: 195 | 196 | - $M(n) = M(n-1) + 1$ for $n > 0$ 197 | - $M(0) = 0$ (No multiplications when $n = 0$) 198 | 199 | Solving this recurrence gives: 200 | 201 | - $M(n) = M(n-1) + 1$ 202 | - $M(n-1) = M(n-2) + 1$ 203 | - $M(n) = M(n-2) + 2$ 204 | - $M(n) = M(n-3) + 3$ 205 | - ... 
206 | - $M(n) = n$ 207 | 208 | Thus, the time complexity of this algorithm is $O(n)$ because it performs $n$ multiplications. 209 | 210 | ### Example: Tower of Hanoi 211 | 212 | The Tower of Hanoi is a classic recursive problem. The objective is to move a set of disks from one rod to another, following certain rules: 213 | 214 | 1. Only one disk can be moved at a time. 215 | 2. A larger disk cannot be placed on a smaller disk. 216 | 217 | #### Recursive Relation for Tower of Hanoi: 218 | 219 | - If there are $n$ disks, the recursive relation is: 220 | $M(n) = 2M(n-1) + 1$ 221 | where $M(n)$ is the number of moves required to transfer $n$ disks. 222 | - **Initial Condition**: When $n = 1$, only one move is required: $M(1) = 1$. 223 | 224 | #### Solving the Recurrence: 225 | 226 | Using backward substitution to solve the recurrence: 227 | $M(n) = 2M(n-1) + 1$ 228 | $= 2[2M(n-2) + 1] + 1 = 2^2M(n-2) + 2 + 1$ 229 | $= 2^3M(n-3) + 2^2 + 2 + 1$ 230 | $\dots$ 231 | $M(n) = 2^n - 1$ 232 | 233 | Thus, the number of moves grows exponentially with $n$. The time complexity of Tower of Hanoi is $O(2^n)$.
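The closed form $M(n) = 2^n - 1$ is easy to verify empirically by counting the moves an actual recursive solver makes. A short Python sketch (the rod names and function signature are illustrative):

```python
def hanoi(n, source, target, spare, moves):
    # Move n disks from source to target, using spare as scratch space.
    if n == 1:
        moves.append((source, target))
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the top n-1 disks
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

for n in range(1, 11):
    moves = []
    hanoi(n, 'A', 'C', 'B', moves)
    assert len(moves) == 2**n - 1  # matches M(n) = 2^n - 1

print("verified M(n) = 2^n - 1 for n = 1..10")
```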
234 | 235 | ### Analyzing the Time Efficiency of Non-Recursive Algorithms 236 | 237 | The time efficiency of non-recursive algorithms can be analyzed through a systematic approach. This process involves identifying critical factors such as the size of the input, the basic operation of the algorithm, and how the execution of that operation scales with the input size. Here’s how this analysis is conducted step-by-step: 238 | 239 | 1. **Decide on a Parameter (or Parameters) Indicating an Input’s Size** 240 | The input's size is a crucial factor in determining an algorithm's time efficiency. For example, if an algorithm processes a list, the input's size would typically be the number of elements in the list (denoted as $n$). 241 | 242 | 2. **Identify the Algorithm’s Basic Operation** 243 | The basic operation is the fundamental computation or comparison repeated most frequently in the algorithm. It is typically found in the innermost loop or operation. For example, in a sorting algorithm, the basic operation might be comparing two elements. 244 | 245 | 3. **Check Whether the Number of Times the Basic Operation is Executed Depends Only on Input Size** 246 | The execution of the basic operation may depend solely on the input size or on other factors, such as the specific data or structure of the input. For example, in some algorithms, the execution count may differ between the best, average, and worst-case scenarios. When this is the case, separate analyses for worst-case, average-case, and best-case efficiencies are necessary. 247 | 248 | 4. **Set Up a Sum for the Number of Times the Basic Operation is Executed** 249 | A sum is established to express how many times the basic operation is executed as a function of the input size. For example, if a loop executes $n-1$ times, the sum would reflect that repetition. 250 | 251 | 5. **Use Standard Formulas and Rules of Sum Manipulation** 252 | Using mathematical formulas, the sum is simplified to find either a closed-form expression or at least the asymptotic growth rate of the algorithm’s time complexity. The goal is to determine the algorithm's "order of growth," typically expressed using Big-O notation (e.g., $O(n)$, $O(n^2)$). 253 | 254 | --- 255 | 256 | ### Example: Finding the Maximum Element in an Array 257 | 258 | Let’s consider an algorithm that finds the maximum element in an array of $n$ elements. 259 | 260 | **Algorithm: Max Element** 261 | The following pseudocode shows a simple algorithm for finding the largest element in an array: 262 | 263 | ``` 264 | // Input: Array A[0..n-1] of real numbers 265 | // Output: The value of the largest element in A 266 | Max_val ← A[0] 267 | for i ← 1 to n − 1 do 268 | if A[i] > Max_val 269 | Max_val ← A[i] 270 | return Max_val 271 | ``` 272 | 273 | **Algorithm Analysis**: 274 | 275 | - **Input Size**: The input size is the number of elements in the array, denoted as $n$. 276 | - **Basic Operation**: The basic operation is the comparison $A[i] > Max_val$, as it occurs on every iteration of the loop. 277 | - **No Best, Worst, or Average Case Distinction**: In this algorithm, the number of comparisons remains the same regardless of the order of elements in the array. Therefore, the analysis applies to all cases. 278 | - **Sum of Basic Operations**: Since the comparison is executed once per iteration of the loop, and the loop runs from $i = 1$ to $i = n-1$, the number of comparisons $C(n)$ is: 279 | 280 | $C(n) = \sum_{i=1}^{n-1} 1 = n - 1$ 281 | 282 | This indicates that the time complexity is $O(n)$, meaning the algorithm’s execution time grows linearly with the size of the array. 283 | 284 | --- 285 | 286 | ### Empirical Analysis of Algorithms 287 | 288 | Empirical analysis is an evidence-based approach: conclusions about an algorithm rest on observed, measurable behavior rather than on theory alone. It is an essential tool in analyzing algorithms, especially when theoretical analysis alone may not provide a complete picture. Empirical evidence in algorithm analysis often involves measuring an algorithm's performance by running it on actual input data and recording its behavior. 289 | 290 | Steps of empirical analysis include: 291 | 292 | 1. **Observation**: 293 | Initial observations of the algorithm's behavior are made, often by running it on different datasets. These observations spark ideas or lead to hypotheses about the algorithm's performance characteristics. 294 | 295 | **Example**: Observing how the algorithm for finding the maximum element performs on small vs. large arrays can lead to insights about its time efficiency. 296 | 297 | 2. **Induction**: 298 | Based on the observed data, a probable explanation for the algorithm's behavior is proposed. Inductive reasoning is used to generalize the specific results from the observations. 299 | 300 | **Example**: After observing that the algorithm takes longer with larger arrays, one might hypothesize that its time complexity is linear. 301 | 302 | 3. **Deduction**: 303 | A testable hypothesis is formulated, which can be verified by conducting more experiments or using theoretical analysis. Deductive reasoning takes the general explanation and predicts specific outcomes that can be tested. 304 | 305 | **Example**: Based on the hypothesis that the time complexity is $O(n)$, one might predict that doubling the array size will roughly double the running time. 306 | 307 | 4. **Testing**: 308 | Quantitative and qualitative data are gathered through experimentation. This data is often analyzed statistically to confirm or refute the hypothesis. The results can support the hypothesis, refute it, or be neutral. 309 | 310 | **Example**: Running the algorithm on arrays of various sizes and measuring the running time would provide empirical data to support or refute the hypothesis about time complexity. 311 | 312 | 5. **Evaluation**: 313 | After gathering and analyzing the empirical data, conclusions are drawn, and the results are documented. This stage may include discussing any limitations encountered during testing and suggestions for future research or improvements.
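The five steps above can be walked through in a few lines of Python. The sketch below (a rough, illustrative harness rather than a rigorous benchmark) times the Max Element algorithm at doubling input sizes; if the $O(n)$ hypothesis is right, each timing should roughly double:

```python
import random
import time

def find_max(a):
    max_val = a[0]
    for x in a[1:]:          # n - 1 comparisons, regardless of input order
        if x > max_val:
            max_val = x
    return max_val

for n in [10**5, 2 * 10**5, 4 * 10**5, 8 * 10**5]:
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    find_max(data)
    elapsed = time.perf_counter() - start
    print(f"n = {n:>7}: {elapsed * 1000:.2f} ms")
```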
314 | 315 | --- 316 | 317 | ### Example of Empirical Analysis: 318 | 319 | Suppose we are analyzing the `Max Element` algorithm through empirical testing by running it on arrays of increasing size and recording the running time. If we observe that doubling the array size roughly doubles the running time, this supports the conclusion that the time complexity is linear ($O(n)$). 320 | 321 | In summary, analyzing non-recursive algorithms involves understanding how the number of basic operations scales with input size. Empirical analysis complements theoretical analysis by providing real-world performance data. Both methods provide insight into an algorithm's efficiency and help optimize performance based on practical requirements. 322 | 323 | **[Related link for more information about this section](https://github.com/m-mdy-m/TechShelf/tree/main/Algorithms/Analysis)** 324 | -------------------------------------------------------------------------------- /my-articles/Algorithm/SORTING/Bubble.md: -------------------------------------------------------------------------------- 1 | # Bubble 2 | 3 | ## What is Bubble Sort? 4 | 5 | Bubble sort is a sorting algorithm, sometimes referred to as sinking sort. It works by repeatedly stepping through the list, element by element, comparing the current element with the one after it. If the elements are in the wrong order (current element is greater than the next one), they are swapped. These passes through the list are repeated until no swaps are needed in a complete pass, indicating the list is sorted. 6 | 7 | The name "bubble sort" comes from the way larger elements "bubble" up to the top of the list during the sorting process. Imagine bubbles rising to the surface of water – larger elements, like bubbles, eventually rise (or swap their positions) to their correct positions at the end of the list. 8 | 9 | ### Introduction: 10 | 11 | Although considered a simple and intuitive sorting algorithm, bubble sort performs poorly in most practical applications. Its inefficiency makes it primarily a pedagogical tool used to introduce sorting concepts. More efficient algorithms like quicksort, timsort, or merge sort are generally used in real-world sorting tasks within popular programming languages. 12 | 13 | However, it's worth noting that under specific conditions, bubble sort can be advantageous. If parallel processing is allowed, bubble sort can achieve O(n) time complexity, making it potentially faster than parallel implementations of insertion sort or selection sort, which don't parallelize as effectively. 14 | 15 | ## How it Works: 16 | 17 | **1. Initial Setup:** 18 | 19 | - Consider an unsorted list of elements `[element1, element2, ..., elementN]`. 20 | - Initialize a flag `swapped` to `False` to track if any swaps occurred during the current pass. This flag is crucial to determine if the list is already sorted and avoid unnecessary iterations. 21 | 22 | **2. Looping Through the List:** 23 | 24 | - Initiate a loop that iterates through the list from the beginning (index 0) up to the second-last element (index N-2). This is because the largest element will "bubble" to the end in the first pass, so we don't need to compare it further in subsequent passes. Essentially, with each pass, the sorted portion of the list grows by one element from the end. 25 | 26 | **3. 
Comparing Adjacent Elements:** 27 | 28 | - Inside the loop, for each element at index `i`, compare it with its next element at index `i+1`. 29 | 30 | **4. Swapping if Necessary:** 31 | 32 | - If the element at `i` is greater than the element at `i+1`, it means they're in the wrong order. Set the `swapped` flag to `True` and swap their positions in the list. This swap effectively "bubbles" the larger element one position closer to its correct place at the end of the list. 33 | 34 | **5. Repeating the Process:** 35 | 36 | - Continue iterating through the loop, comparing and potentially swapping adjacent elements. 37 | 38 | **6. Checking for Swaps:** 39 | 40 | - After completing a full pass through the list, check the `swapped` flag. 41 | 42 | **7. Termination Condition:** 43 | 44 | - If the `swapped` flag remained `False` throughout the entire pass, it signifies that no swaps were necessary, meaning the list is already sorted. In this case, the loop terminates. 45 | 46 | **8. Multiple Passes:** 47 | 48 | - If the `swapped` flag was set to `True` during a pass, it indicates elements were out of order and need further sorting. This necessitates another pass through the list. Repeat steps 2-7 until a complete pass occurs with no swaps, signifying a sorted list. 49 | 50 | **Illustrative Example:** 51 | 52 | Consider the unsorted list: `[6, 4, 2, 8, 1]`. Here's how bubble sort would work step by step: 53 | 54 | **Pass 1:** 55 | 56 | - Compare `6` at index 0 with `4` at index 1. Swap them (`swapped` becomes `True`). Now the list is `[4, 6, 2, 8, 1]`. 57 | - Compare `6` at index 1 (previously swapped element) with `2` at index 2. Swap them (`swapped` remains `True`). Now the list is `[4, 2, 6, 8, 1]`. 58 | - Continue comparing adjacent pairs to the end: `6` and `8` stay put, then `8` and `1` are swapped. The pass ends with `[4, 2, 6, 1, 8]`, and the largest element (`8`) has "bubbled" to its correct position at the end. 59 | 60 | **Pass 2 (and beyond, if necessary):** 61 | 62 | - Repeat the comparison and swapping process for adjacent elements. This pass yields `[2, 4, 1, 6, 8]`, moving the next-largest unsorted element (`6`) into place. 63 | - Later passes produce `[2, 1, 4, 6, 8]` and then `[1, 2, 4, 6, 8]`; one final pass makes no swaps, so the `swapped` flag stays `False` and the algorithm stops with the sorted list. 64 | 65 | **Key Points:** 66 | 67 | - With each pass, the largest element "bubbles" up to its correct position at the end of the list. The number of passes required depends on the initial level of unsortedness. In the worst case (completely reversed list), it takes N-1 passes (where N is the number of elements) to sort the list. 68 | - Bubble sort is simple to understand and implement but has a time complexity of O(n^2), making it inefficient for large datasets. The nested loop structure that compares all possible adjacent element pairs leads to this quadratic complexity. As the number of elements increases, the number of comparisons grows significantly. 69 | - While not recommended for real-world sorting tasks due to its inefficiency, bubble sort serves as a valuable pedagogical tool for understanding the fundamental concepts of sorting algorithms. 70 | 71 | ## Implementations 72 | 73 | **1. Initialization:** 74 | 75 | - Define a function `bubble_sort(array)` that takes an unsorted array of elements as input. 76 | - Inside the function, initialize a variable `swapped` to `False` to track if any swaps occurred during a pass. 77 | 78 | **2. Looping Through the List:** 79 | 80 | - Initiate a `for` loop that iterates from `0` to `n-2` (where `n` is the length of the array).
This loop controls the number of passes required to sort the list. 81 | 82 | **3. Comparing Adjacent Elements:** 83 | 84 | - Inside the loop, create another `for` loop that iterates from `0` to `n-i-2` (where `i` is the current loop counter for the outer loop), so that the comparison at index `j + 1` stays inside the unsorted region. This inner loop compares adjacent elements within each pass. 85 | 86 | **4. Swapping if Necessary:** 87 | 88 | - Within the inner loop, compare the element at index `j` with the element at index `j + 1`. 89 | - If the element at index `j` is greater than the element at index `j + 1`, it means they're in the wrong order. Set the `swapped` flag to `True` and swap their positions in the array using a temporary variable `temp`. 90 | 91 | **5. Checking for Swaps:** 92 | 93 | - After completing the inner loop (one pass through the list), check the `swapped` flag. 94 | 95 | **6. Termination Condition:** 96 | 97 | - If the `swapped` flag remained `False` throughout the entire pass (meaning no swaps were necessary), it signifies the list is already sorted. In this case, we can optimize by breaking out of the outer loop using a `break` statement. This avoids unnecessary iterations. 98 | 99 | **Pseudocode:** 100 | 101 | ``` 102 | function bubble_sort(array) 103 | swapped = False // Initialize swapped flag 104 | 105 | for i = 0 to n-2 // At most n-1 passes 106 | swapped = False // Reset swapped flag for each pass 107 | for j = 0 to n-i-2 // Loop to compare adjacent elements 108 | if array[j] > array[j + 1] then 109 | temp = array[j] // Temporary variable for swap 110 | array[j] = array[j + 1] 111 | array[j + 1] = temp 112 | swapped = True // Set swapped flag if a swap occurred 113 | end if 114 | end for 115 | if not swapped then // Optimization: stop if a full pass made no swaps 116 | break 117 | end if 118 | end for 119 | end function 120 | ``` 121 | 122 | **Explanation:** 123 | 124 | - The outer loop controls the number of passes required to sort the list. With each pass, the largest element "bubbles" to the end. 125 | - The inner loop compares adjacent elements within each pass. 126 | - The swapping logic ensures elements are placed in ascending order. 127 | - The `swapped` flag is crucial to determine if the list is already sorted and avoid unnecessary iterations; note that it is checked only after the inner loop finishes a complete pass. 128 | - The `break` statement in the outer loop is an optimization to exit early if the list is sorted in a particular pass.
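The pseudocode maps nearly line-for-line onto Python. Here is a minimal runnable sketch (the function name and demo list are illustrative):

```python
def bubble_sort(array):
    n = len(array)
    for i in range(n - 1):             # at most n - 1 passes
        swapped = False
        for j in range(n - i - 1):     # the unsorted region shrinks each pass
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                swapped = True
        if not swapped:                # early exit: list already sorted
            break
    return array

print(bubble_sort([6, 4, 2, 8, 1]))    # [1, 2, 4, 6, 8]
```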
129 | 130 | ## Complexity 131 | 132 | **Time Complexity:** 133 | 134 | - **Worst-case:** O(n^2) 135 | - **Average-case:** O(n^2) 136 | - **Best-case:** O(n) 137 | 138 | - **Worst-case (O(n^2)):** This scenario occurs when the list is initially sorted in descending order. In each pass, only the largest element "bubbles" to its correct position at the end. The remaining elements need to be compared again in subsequent passes. This requires `n-1` comparisons in the first pass, `n-2` comparisons in the second pass, and so on, leading to a total of: 139 | 140 | ``` 141 | (n-1) + (n-2) + ... + 1 = n(n-1)/2 ≈ O(n^2) 142 | ``` 143 | 144 | - **Average-case (O(n^2)):** Even for randomly ordered lists, the average number of comparisons required to sort the list remains O(n^2): the nested loops still examine on the order of all possible adjacent pairs before the list settles into order. 145 | 146 | - **Best-case (O(n)):** The only situation where bubble sort achieves O(n) time complexity is when the list is already sorted in ascending order. In this case, the `swapped` flag will remain `False` throughout the first pass, and the `break` statement will immediately terminate the outer loop, requiring only n − 1 comparisons (one per adjacent pair). 147 | 148 | **Space Complexity:** 149 | 150 | - **Space complexity:** O(1) 151 | 152 | Bubble sort is considered a space-efficient sorting algorithm. It only requires a constant amount of extra space for variables like `swapped`, `i`, and `j` used in the loops. These variables don't depend on the input size (number of elements) and remain constant throughout the sorting process. 153 | 154 | **Key Points:** 155 | 156 | - The nested loops in bubble sort lead to its quadratic time complexity, making it inefficient for large datasets. 157 | - Although bubble sort has a best-case of O(n), this scenario is uncommon in practice. 158 | - For real-world sorting tasks, algorithms like quicksort, timsort, or merge sort with lower average-case time complexities are preferred. 159 | 160 | ## Advantages and Disadvantages 161 | 162 | **Advantages:** 163 | 164 | - **Simplicity:** Bubble sort's most significant advantage lies in its sheer simplicity. The algorithm is incredibly easy to understand and implement, even for beginners with basic programming knowledge. This makes it a valuable educational tool for grasping fundamental sorting concepts. 165 | 166 | - **In-place sorting:** Bubble sort is an in-place sorting algorithm, meaning it sorts the data by modifying the original array without requiring significant additional memory space. This can be beneficial for situations with memory constraints. 167 | 168 | - **Stable sorting:** Bubble sort is a stable sorting algorithm. This means that if two elements have equal keys (values to be sorted by), their original order is preserved in the sorted output. This can be useful in specific use cases where maintaining the relative order of duplicates is important. 169 | 170 | **Disadvantages:** 171 | 172 | - **Time complexity:** The primary drawback of bubble sort is its unfavorable time complexity. In the worst-case scenario, and often in the average case, bubble sort has a time complexity of O(n^2). This translates to significantly slower sorting times as the data size (n) increases. For large datasets, this inefficiency makes bubble sort impractical. 173 | 174 | - **Inefficiency for large datasets:** Due to its quadratic time complexity, bubble sort becomes exceptionally slow when dealing with large datasets. There are significantly faster sorting algorithms available, such as quicksort, merge sort, or timsort, which should be preferred for real-world sorting tasks. 175 | 176 | - **Comparison-based sorting:** Bubble sort, like many sorting algorithms, is comparison-based. It relies on comparing elements to determine their order. In certain cases, this comparison-based approach might be inherently less efficient than non-comparison-based sorting algorithms for specific data types or sorting criteria. 177 | 178 | ## FAQ 179 | 180 | **1. What is the Boundary Case for Bubble Sort?** 181 | 182 | Bubble sort exhibits its best-case time complexity of O(n) when the input list is already sorted in ascending order. In this scenario, the `swapped` flag will remain `False` throughout the first pass itself. Since no swaps are necessary, the loop terminates immediately after the first pass, requiring only n − 1 comparisons (one per adjacent pair). This highlights that bubble sort can be efficient for pre-sorted or nearly sorted data. 183 | 184 | **2. 
Does Sorting Happen In-place in Bubble Sort?** 185 | 186 | Yes, bubble sort is considered an in-place sorting algorithm. It sorts the data by modifying the original array itself, without allocating significant additional memory space. During the sorting process, elements are swapped within the existing array until the desired order is achieved. This in-place nature can be advantageous in situations with memory limitations. 187 | 188 | **3. Is Bubble Sort a Stable Sorting Algorithm?** 189 | 190 | Yes, bubble sort is a stable sorting algorithm. This means that if two elements in the original list have equal values (the key being sorted by), their relative order remains preserved in the sorted output. This stability arises because bubble sort only compares adjacent elements with a strict greater-than test: two elements with the same value are never swapped, so their original order survives. This property can be crucial when maintaining the order of duplicates is essential. 191 | 192 | **4. Where is Bubble Sort Used?** 193 | 194 | Despite its limitations in real-world sorting due to time complexity, bubble sort finds applications in specific scenarios: 195 | 196 | - **Educational Tool:** Due to its simplicity, bubble sort is often used as an introductory sorting algorithm in computer science education. It allows students to grasp the fundamental concepts of sorting and how comparisons and swaps lead to ordered data. 197 | 198 | - **Nearly Sorted Arrays:** For situations where the data is already partially sorted or has only a few elements out of order, bubble sort can be surprisingly efficient. Its linear complexity (O(n)) in this best-case scenario makes it suitable for fine-tuning the order of a nearly sorted list. 199 | 200 | - **Computer Graphics:** In specific computer graphics applications, such as polygon filling algorithms, bubble sort can be used to detect and fix minor errors (like swapping just two elements) in almost-sorted arrays in linear time (roughly 2n operations, i.e., O(n)). This can be beneficial for maintaining order within specific graphical operations. 201 | 202 | ## Example 203 | 204 | - [Ts](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Bubble/Example/Bubble.ts) 205 | - [Js](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Bubble/Example/Bubble.js) 206 | - [Go](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Bubble/Example/Bubble.go) 207 | - [Py](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Bubble/Example/Bubble.py) 208 | - [Java](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Bubble/Example/Bubble.java) 209 | -------------------------------------------------------------------------------- /my-articles/Algorithm/SORTING/Insertion.md: -------------------------------------------------------------------------------- 1 | # Insertion 2 | 3 | ## Introduction: 4 | 5 | Insertion sort is a sorting algorithm that mimics the intuitive process of arranging objects in increasing (or decreasing) order. Imagine you have a stack of cards in your hand, and you want to sort them efficiently. Insertion sort tackles this problem in a step-by-step manner: 6 | 7 | 1. **Start Small:** You begin by considering the first card as already sorted. This forms the initial sorted sub-list. 8 | 2. **Insertion Challenge:** Next, you pick up the second card.
Now comes the key step: imagine inserting this card into its rightful place among the cards you're already holding (the sorted sub-list). This might involve shifting some cards in your hand to make space for the new card while maintaining the sorted order. 9 | 3. **Iterative Growth:** With the second card placed correctly, you repeat this process for every remaining card. Each time, you take an unsorted card and strategically insert it into the sorted sub-list you've built so far. 10 | 4. **The Grand Finale:** By the time you reach the last card, you'll have iterated through the entire deck, inserting each card into its appropriate position. The result? The entire hand of cards is now sorted! 11 | 12 | ## How it Works: 13 | 14 | 1. **Seeding the Sorted Sub-list:** We kick things off by considering the first element of the array as already sorted. This single element acts as the foundation of our sorted sub-list. The remaining elements in the array represent the unsorted portion. 15 | 16 | 2. **Unsheathing the Unsorted:** We enter a loop that iterates through the unsorted portion, starting from the second element (index 1) and progressing towards the end of the array. For each element in this unsorted territory: 17 | 18 | - **Target in Sight:** We focus on the current element, picking it up for inspection. 19 | - **Sifting Through the Sorted:** We embark on a comparison journey, starting from the right end of the sorted sub-list and working our way towards the beginning. We compare the current element with each element in the sorted sub-list, one by one. 20 | - **Making Way:** If the current element is smaller than an element in the sorted sub-list, we encounter an obstacle. To create space for the current element, we shift the larger element in the sorted sub-list one position to the right. This act of shifting effectively pushes the larger elements one step further down the line, creating a temporary vacancy. 21 | - **Finding the Niche:** We continue this compare-and-shift operation within the sorted sub-list until we find a sweet spot. This spot is where the current element is greater than or equal to the element in the sorted sub-list. This marks the rightful position for the current element to be inserted. 22 | 23 | 3. **Securing the Sorted Order:** Having identified the perfect insertion point, we place the current element into that position in the sorted sub-list. This action essentially inserts the element into its correct spot among the already sorted elements, expanding the sorted sub-list by one. 24 | 25 | 4. **Marching Towards Completion:** The loop continues its march, processing each element in the unsorted portion of the array. We repeat steps 2 and 3 for every element. By the time the loop finishes its journey, it will have iterated through all elements. The consequence? The entire array will be transformed into a perfectly sorted list, much like a meticulously arranged hand of cards. 26 | 27 | This step-by-step process mirrors the act of sorting cards one by one. With each insertion, we strategically place an element into the sorted sub-list, ensuring that the final order is meticulously maintained. 28 | 29 | ## Implementations 30 | 31 | Now that we have a solid grasp of insertion sort's step-by-step process, let's explore how to translate those steps into code. We'll use pseudocode, which resembles natural language but is specifically designed to describe algorithms. 32 | 33 | **1. 
Seeding the Sorted Sub-list (Initialization):** 34 | 35 | ``` 36 | for i = 1 to length(array) - 1 // Loop through the unsorted portion (indices 1 through length(array) - 1, i.e., starting from the second element) 37 | ``` 38 | 39 | This loop iterates through the elements in the array, starting from the second element (index 1). We don't need to process the first element because it's considered "already sorted" in our initial sub-list of size 1. 40 | 41 | **2. Unsheathing the Unsorted (Inner Loop):** 42 | 43 | ``` 44 | currentElement = array[i] // Focus on the current element 45 | 46 | j = i // Initialize the index for shifting within the sorted sub-list (starts at current element position) 47 | 48 | while j > 0 and array[j - 1] > currentElement: // Sifting through the sorted sub-list 49 | array[j] = array[j - 1] // Shift larger elements one position to the right (making space) 50 | j = j - 1 // Move the shifting index one step back (towards the beginning of the sorted sub-list) 51 | ``` 52 | 53 | This inner loop represents the "sifting through the sorted" step. Here's a breakdown: 54 | 55 | - `currentElement`: We store the element we're currently processing from the unsorted portion. 56 | - `j`: This variable acts as an index for shifting elements within the sorted sub-list. It starts at the position of the current element (`i`). 57 | - The `while` loop continues as long as two conditions are met: 58 | 1. `j` is greater than 0 (to avoid going out of bounds of the sorted sub-list). 59 | 2. The element at the previous index (`j-1`) in the sorted sub-list is larger than the `currentElement`. This indicates we need to shift elements to make space. 60 | - Inside the loop, we perform the shifting: 61 | - `array[j] = array[j - 1]`: This line essentially moves the larger element from the sorted sub-list one position to the right, creating a temporary vacancy at index `j`. 62 | - `j = j - 1`: We decrement `j` to move the shifting index one step back, focusing on the element to the left in the sorted sub-list for the next comparison. 63 | 64 | **3. Securing the Sorted Order (Insertion):** 65 | 66 | ``` 67 | array[j] = currentElement // Insert the current element at the correct position in the sorted sub-list 68 | ``` 69 | 70 | Once the `while` loop exits, it means we've found the ideal spot for the `currentElement` within the sorted sub-list. This spot is marked by the index `j`. Here, we simply insert the `currentElement` into its rightful position using this line: 71 | 72 | - `array[j] = currentElement`: This places the `currentElement` at the correct index (`j`) in the sorted sub-list, effectively expanding the sorted portion of the array. 73 | 74 | **4. Marching Towards Completion (Loop Continues):** 75 | 76 | The outer loop (`for`) continues its march, processing the next element in the unsorted portion. We repeat steps 2 and 3 for every element in the array until the loop finishes iterating through all elements. By the end, the entire array will be transformed into a sorted list. 77 | 78 | This approach merges the explanation of insertion sort with the corresponding pseudocode, providing a clear understanding of how the algorithm sorts elements and how that translates into code.
79 | 80 | **Full Insertion Sort Pseudocode:** 81 | 82 | ``` 83 | function insertionSort(array) 84 | for i = 1 to length(array) - 1 // Loop through the unsorted portion (starting from the second element) 85 | currentElement = array[i] // Focus on the current element 86 | 87 | j = i // Initialize the index for shifting within the sorted sub-list (starts at current element position) 88 | 89 | while j > 0 and array[j - 1] > currentElement: // Sifting through the sorted sub-list 90 | array[j] = array[j - 1] // Shift larger elements one position to the right (making space) 91 | j = j - 1 // Move the shifting index one step back (towards the beginning of the sorted sub-list) 92 | end while 93 | 94 | array[j] = currentElement // Insert the current element at the correct position in the sorted sub-list 95 | end for 96 | end function 97 | ``` 98 | 99 | ## Complexity 100 | 101 | Insertion sort, while conceptually simple, exhibits a complexity that varies depending on the input it encounters. Here's a breakdown of its time complexity in different scenarios: 102 | 103 | **Best Case: O(n)** 104 | 105 | Imagine a scenario where the input array is already sorted in the desired order. In this delightful case, insertion sort has a linear running time, meaning the time it takes to sort the array grows proportionally to the number of elements (n). Why? Because for each element, the inner loop that performs shifting barely budges. Since the element is already in its rightful place, there's no need for extensive comparisons or shifting within the sorted sub-list. It's a breeze-through! 106 | 107 | **Worst Case: O(n^2)** 108 | 109 | Now, let's consider the not-so-friendly scenario: an array sorted in reverse order (descending for ascending sort and vice versa). This is where insertion sort encounters its biggest hurdle. As it iterates through the elements, each element needs to be compared with all the previously sorted elements (which keeps growing with each iteration) to find its correct position. This comparison and potential shifting process for every element translates to a quadratic running time, meaning the time it takes to sort the array grows proportionally to the square of the number of elements (n^2). 110 | 111 | **Average Case: O(n^2)** 112 | 113 | The average case, unfortunately, also falls into the O(n^2) complexity category. While not always dealing with a perfectly reverse-sorted array, insertion sort on random data still involves a significant amount of shifting within the sorted sub-list for most elements. This shifting operation contributes to the overall time complexity, making it less efficient for large arrays compared to other sorting algorithms like quicksort or merge sort. 114 | 115 | **Sweet Spot for Small Arrays** 116 | 117 | Despite its average-case complexity, insertion sort shines for very small arrays. In fact, it can outperform even quicksort for such inputs. This is because the constant factors involved in insertion sort's operations (like comparisons and assignments) become more dominant for smaller arrays, outweighing the overhead of quicksort's pivot selection and recursion mechanisms. Additionally, for sorting sub-arrays that arise during the divide-and-conquer process in algorithms like quicksort, insertion sort is often used when the sub-array size falls below a certain threshold. This threshold is determined experimentally based on the specific machine and implementation but typically lies around ten elements.
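Before looking at a worked example, here is the pseudocode above rendered as a minimal runnable Python sketch (function and variable names are illustrative; the demo list matches the example that follows):

```python
def insertion_sort(array):
    for i in range(1, len(array)):
        current = array[i]                        # element to place
        j = i
        while j > 0 and array[j - 1] > current:
            array[j] = array[j - 1]               # shift larger elements right
            j -= 1
        array[j] = current                        # insert into its niche
    return array

print(insertion_sort([3, 7, 4, 9, 5, 2, 6, 1]))   # [1, 2, 3, 4, 5, 6, 7, 9]
```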
118 | 119 | **Illustrative Example** 120 | 121 | Here's a table showcasing the insertion sort process for sorting the sequence {3, 7, 4, 9, 5, 2, 6, 1}: 122 | 123 | ``` 124 | Initial State: 3 7 4 9 5 2 6 1 125 | Step 1: 3 7* 4 9 5 2 6 1 (Key: 7, Shifted: none) 126 | Step 2: 3 4* 7 9 5 2 6 1 (Key: 4, Shifted: 7) 127 | ... (Steps omitted for brevity) ... 128 | Step 7: 1* 2 3 4 5 6 7 9 (Key: 1, Shifted: 9 7 6 5 4 3 2) 129 | Sorted Output: 1 2 3 4 5 6 7 9 130 | ``` 131 | 132 | As you can see, a key that is already in order (like the 7 in Step 1) needs no shifting, while a key that belongs near the front (like the 1 in Step 7) forces nearly every sorted element to shift. In a reverse-sorted input every step behaves like Step 7, and in an already-sorted input every step behaves like Step 1; this difference is exactly what produces the contrasting O(n^2) worst-case and O(n) best-case time complexities. 133 | 134 | ## Advantages and Disadvantages 135 | 136 | **Advantages:** 137 | 138 | - **Simplicity Reigns Supreme:** Insertion sort's core concept mirrors the act of arranging cards one by one. This inherent simplicity translates to ease of understanding and implementation. It's a fantastic starting point for grasping sorting algorithms in general. 139 | - **Small Arrays, Big Wins:** When dealing with small lists (less than ten elements or so), insertion sort shines. The constant factors associated with its operations, like comparisons and assignments, dominate the overall time complexity. This makes it surprisingly efficient for tiny datasets, even surpassing more complex algorithms like quicksort in such scenarios. 140 | - **Nearly Sorted Savior:** If you have an array that's already partially sorted (elements mostly in order with a few outliers), insertion sort is your friend. The comparisons and shifting operations required are significantly reduced compared to a completely random array. This can be a valuable asset in specific situations. 141 | - **Space Saver:** Insertion sort is a memory-efficient algorithm. It sorts the array in-place, meaning it doesn't require creating a large additional data structure to hold temporary values during the sorting process. This can be crucial when dealing with limited memory constraints. 142 | - **Stable Sorting:** Insertion sort preserves the original order of equal elements within the array. This means if you have multiple elements with the same value, their relative positions in the sorted output will remain the same as in the original array. While not always essential, this stability can be beneficial for certain sorting tasks. 143 | 144 | **Disadvantages:** 145 | 146 | - **Large Array Lament:** As the size of the array grows, insertion sort's performance takes a significant hit. The worst-case scenario, a reverse-sorted array, leads to a quadratic time complexity (O(n^2)), meaning the sorting time increases proportionally to the square of the number of elements. This makes it impractical for sorting massive datasets where efficiency is paramount. 147 | - **Outshined by the Elite:** Compared to more advanced sorting algorithms like quicksort, merge sort, or heap sort, insertion sort generally falls short in terms of efficiency for most input sizes. These algorithms achieve on average a time complexity of O(n log n), which significantly outperforms insertion sort's O(n^2) in the worst-case and average scenarios. 148 | - **Inner Loop Blues:** The inner loop in insertion sort, responsible for shifting elements within the sorted sub-list, can be a bottleneck for performance.
This loop iterates and potentially shifts elements for each insertion, contributing to the overall time complexity, especially in the worst case. 149 | 150 | ## Relation to other sorting algorithms 151 | 152 | Insertion sort and selection sort, while both fundamental sorting algorithms, tackle the task of arranging elements in order with slightly different approaches. Both share a family resemblance: they iteratively build a sorted sub-list at the beginning of the array. However, their strategies for selecting and placing elements diverge, leading to distinct strengths and weaknesses. 153 | 154 | **Similarities in Simplicity:** 155 | 156 | - **Grasp at First Glance:** Both insertion sort and selection sort boast a straightforward logic, making them excellent starting points for understanding sorting algorithms in general. Their core concepts are relatively easy to grasp, even for beginners. 157 | - **Implementation Ease:** Along with their conceptual simplicity, both algorithms translate well into code. Their clear steps and lack of complex data structures make them ideal for those new to implementing sorting algorithms. 158 | 159 | **Key Differences in Philosophy:** 160 | 161 | - **Selection Strategy:** This is where the cousins differ in style. Insertion sort acts like an organizer meticulously inserting elements into their rightful places within a sorted sub-list. It scans **backwards** from the current element, comparing it with elements in the sorted sub-list until it finds the correct position for insertion. 162 | - **Selection Sort's Scouting Approach:** In contrast, selection sort is more like a scout searching for the minimum element. It scans the **unsorted** portion of the array **forwards**, identifying the element with the minimum value. This minimum element is then swapped with the first element in the unsorted portion, effectively placing it in its sorted position. 163 | 164 | **Efficiency Trade-offs:** 165 | 166 | The choice between insertion sort and selection sort often involves a trade-off between comparisons and swaps (data movement). 167 | 168 | - **Insertion Sort's Efficiency Advantage:** When dealing with **partially sorted arrays** or when the next element is already greater than the elements in the sorted sub-list, insertion sort shines. In these cases, it requires fewer comparisons (potentially just one) because it doesn't need to scan the entire unsorted portion for the minimum element. On average, insertion sort performs about half the number of comparisons as selection sort. 169 | - **Swapping Considerations:** However, insertion sort might require more **swaps** (data movement) during insertion within the sorted sub-list, especially in the worst case (reverse-sorted array). This can be a disadvantage in scenarios where memory writes are expensive, such as with flash memory. 170 | 171 | **Selection Sort's Memory-Conscious Strategy:** Selection sort excels in situations where memory writes (swaps) are a major concern. It requires fewer swaps but performs more comparisons to identify the minimum element. This can be beneficial for memory-constrained environments. 172 | 173 | **Small Arrays: A Realm Where Simplicity Reigns** 174 | 175 | While more advanced algorithms like quicksort and mergesort outperform both for larger datasets, insertion sort and selection sort have a hidden champion's corner: very small arrays. Due to their simplicity, they can be faster for tiny datasets (exact size depends on implementation). 
This is why some complex sorting algorithms often switch to insertion sort when dealing with sub-arrays of a certain size during their divide-and-conquer process. 176 | 177 | ## FAQ 178 | 179 | **Q1. What are the best and worst case time complexities of Insertion Sort?** 180 | 181 | - **Best Case:** O(n). This occurs when the array is already sorted. In this scenario, insertion sort simply iterates through the array without needing to perform any swaps or shifts, resulting in linear time complexity. 182 | - **Worst Case:** O(n^2). This happens when the array is sorted in reverse order. In this case, each element needs to be compared with all the previously sorted elements to find its correct position, leading to quadratic time complexity. 183 | 184 | **Q2. How is Insertion Sort different from Selection Sort?** 185 | 186 | Both Insertion Sort and Selection Sort are simple sorting algorithms with similar time complexity in the average case (O(n^2)). However, they differ in their approach: 187 | 188 | - **Insertion Sort:** Works by iteratively inserting elements into their correct positions within a sorted sub-list. It scans backward from the current element for the right spot. 189 | - **Selection Sort:** Finds the minimum element in the unsorted portion of the array and swaps it with the first element in that portion. It scans forward to find the minimum element. 190 | 191 | Insertion Sort might be slightly more efficient for partially sorted arrays on average, while Selection Sort might be preferable when data swaps are expensive (e.g., flash memory). 192 | 193 | **Q3. Is Insertion Sort an in-place and stable sorting algorithm?** 194 | 195 | - **Yes, Insertion Sort is an in-place sorting algorithm.** It sorts the data by rearranging elements within the original array, without needing additional memory for a separate data structure. 196 | - **Yes, Insertion Sort is also a stable sorting algorithm.** This means it preserves the original order of equal elements in the array. If multiple elements have the same value, their relative positions before and after sorting will remain the same. 197 | 198 | **Q4. When is Insertion Sort a good choice?** 199 | 200 | Insertion Sort is a good choice for several reasons: 201 | 202 | - **Simple to understand and implement:** Great for learning sorting algorithms or for small codebases. 203 | - **Efficient for small arrays:** Often faster than complex algorithms for tiny datasets. 204 | - **Useful for partially sorted arrays:** Takes advantage of the partial order to reduce comparisons. 
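As Q4 and the small-array discussion above suggest, divide-and-conquer sorts often hand small sub-arrays off to insertion sort. Here is a minimal sketch of that hybrid idea in Python; the `CUTOFF` of 10 and the middle-element pivot are illustrative assumptions, not tuned choices:

```python
CUTOFF = 10  # illustrative threshold; real implementations tune this per machine

def insertion_sort_range(a, lo, hi):
    """Insertion-sort the slice a[lo..hi] in place."""
    for i in range(lo + 1, hi + 1):
        cur, j = a[i], i
        while j > lo and a[j - 1] > cur:
            a[j] = a[j - 1]
            j -= 1
        a[j] = cur

def hybrid_quicksort(a, lo=0, hi=None):
    """Quicksort that falls back to insertion sort on small ranges."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:            # small sub-array: insertion sort wins
        insertion_sort_range(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                        # partition around the pivot
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)
```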
205 | 206 | ## Example 207 | 208 | - [Ts](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Insertion/Example/Insertion.ts) 209 | - [Js](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Insertion/Example/Insertion.js) 210 | - [Java](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Insertion/Example/Insertion.java) 211 | - [Go](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Insertion/Example/Insertion.go) 212 | - [Py](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Insertion/Example/Insertion.py) 213 | -------------------------------------------------------------------------------- /my-articles/Algorithm/SORTING/Selection Sort.md: -------------------------------------------------------------------------------- 1 | # Selection Sort 2 | 3 | ## Introduction: 4 | 5 | Selection sort is a sorting algorithm classified as an in-place comparison sort. This means it operates directly on the data it receives (in-place) and determines the order through comparisons between elements (comparison sort). While its time complexity of O(n^2) makes it less efficient for massive datasets, selection sort offers advantages in specific scenarios due to its simplicity. 6 | 7 | **Conceptual Breakdown:** 8 | 9 | Here's a breakdown of how selection sort works: 10 | 11 | 1. **Divide and Conquer (Partially):** The input list is conceptually divided into two sections: 12 | 13 | - **Sorted Sublist:** Initially empty, this sublist grows from left to right as elements are placed in their correct order. 14 | - **Unsorted Sublist:** This sublist comprises the remaining elements that haven't been sorted yet and occupies the right portion of the original list. 15 | 16 | 2. **Iterative Selection:** The algorithm enters a loop that executes the following steps in each iteration: 17 | 18 | - **Finding the Minimum (or Maximum):** It scans the unsorted sublist to identify the element with the minimum value (for ascending order) or maximum value (for descending order). 19 | - **Swapping with the First Unsorted Element:** The identified minimum (or maximum) element is then swapped with the element at the beginning of the unsorted sublist, essentially placing it in its sorted position. 20 | - **Shrinking the Unsorted Sublist:** Since an element is now sorted, the unsorted sublist boundary shrinks by one position towards the right. 21 | 22 | 3. **Loop Termination:** The loop continues iterating until the entire unsorted sublist is empty, signifying that all elements have been placed in their correct positions within the sorted sublist, resulting in a fully sorted list. 23 | 24 | **Key Points:** 25 | 26 | - Selection sort is relatively easy to understand and implement compared to more complex sorting algorithms. 27 | - Its in-place nature makes it memory efficient as it doesn't require significant additional storage for temporary data. 28 | - However, the time complexity of O(n^2) becomes a significant drawback for very large datasets. As the number of elements (n) increases, the number of comparisons required grows quadratically, leading to slower performance compared to algorithms with better time complexity (e.g., Merge Sort, Quick Sort). 29 | 30 | In essence, selection sort repeatedly finds the extreme element (minimum or maximum) within the unsorted data, swaps it into its sorted position, and shrinks the unsorted portion until the entire list is sorted.
While it might not be the most efficient choice for all sorting needs, its simplicity and low memory requirements make it a valuable tool in specific situations. 31 | 32 | ## How it Works: 33 | 34 | Selection sort, despite its O(n^2) complexity, offers a straightforward approach to sorting. Here's a detailed breakdown of the algorithm along with an illustrative example: 35 | 36 | **1. Initialization: Divide and Conquer (Partially)** 37 | 38 | Imagine you're handed a deck of cards (our unsorted list) and are tasked with arranging them in ascending order. Selection sort tackles this by conceptually dividing the deck into two sections: 39 | 40 | - **Sorted Sublist:** Initially empty, this pile will grow steadily as elements are placed in their correct order. 41 | - **Unsorted Sublist:** This pile represents the remaining cards that haven't been sorted yet. 42 | 43 | **2. Finding the Minimum (Each Iteration):** 44 | 45 | The algorithm enters a loop that iteratively processes the unsorted sublist: 46 | 47 | - In each pass, we tentatively assume the first card in the unsorted sublist is the minimum. 48 | - We then examine the remaining cards in the unsorted pile. 49 | - If we encounter a card smaller than our assumed minimum, we update the minimum's position (essentially the index in the list) to point to this new contender. 50 | 51 | **3. Swapping and Shrinking: Putting Things in Order** 52 | 53 | Once we've traversed the entire unsorted sublist, we've identified the true minimum element. Now it's time to make things official: 54 | 55 | - We swap the minimum element with the first card of the unsorted sublist. This essentially places the minimum card in its rightful sorted position at the front of the sorted sublist pile. 56 | - Since one element is now sorted and placed in its correct spot, the unsorted sublist shrinks. We simply move the boundary between the sorted and unsorted sublists one position forward, effectively excluding the swapped element from further consideration. 57 | 58 | **4. Loop Termination and Result:** 59 | 60 | The loop continues its relentless march through the unsorted sublist, performing steps 2 and 3 in each iteration. This process resembles sifting through the unsorted cards, extracting the minimum each time and placing it at the front of the sorted pile. 61 | 62 | - The loop terminates when the unsorted sublist becomes empty, signifying that all elements have been processed and placed in their correct positions. 63 | - At this point, the original deck of cards (our unsorted list) has been transformed into a perfectly ordered deck (our sorted list). 64 | 65 | **Example: Sorting a Hand of Cards** 66 | 67 | Let's sort the hand of cards `[8, 4, 2, 1, 5]`: 68 | 69 | - **Iteration 1:** 70 | - We tentatively assume the first card (8) is the minimum. 71 | - Traversing the rest (4, 2, 1, 5), we discover that 1 is indeed the smallest card. 72 | - We swap 8 and 1, placing the true minimum at the front of the sorted sublist (now containing just the 1). The 8 takes the 1's old spot, so the unsorted sublist becomes `[4, 2, 8, 5]`. 73 | - **Iteration 2:** 74 | - The first card in the unsorted sublist (4) becomes the assumed minimum. 75 | - We find a 2 among the remaining cards, which is smaller than 4. 76 | - We swap 4 and 2, adding the 2 to the sorted sublist, which becomes `[1, 2]`. The unsorted sublist is now `[4, 8, 5]`. 77 | - The loop continues similarly in each iteration, finding the minimum in the unsorted sublist and swapping it to its sorted position.
78 | - After all iterations, the entire hand is sorted: `[1, 2, 4, 5, 8]`. 79 | 80 | By repeatedly finding the minimum element and swapping it into its sorted position, selection sort gradually builds the sorted sublist from the original unsorted data. While it may not be the most efficient for massive datasets, selection sort's simplicity and low memory requirements make it a valuable tool in specific situations. 81 | 82 | **Visualizing the Process:** 83 | 84 | ![](https://codepumpkin.com/wp-content/uploads/2017/10/selectionSort.gif) 85 | 86 | ![](https://codepumpkin.com/wp-content/uploads/2017/10/SelectionSort_Avg_case.gif) 87 | 88 | ![](https://www.codingconnect.net/wp-content/uploads/2016/09/Selection-Sort.gif) 89 | 90 | ## Algorithm: 91 | 92 | Selection sort is a type of **selection algorithm**. Selection algorithms work by repeatedly finding the desired element (minimum or maximum depending on the sorting order) from a collection and placing it in its sorted position. You can find more information about selection algorithms in general [Selection-Algorithm](https://github.com/m-mdy-m/algorithms-data-structures/tree/main/6.5-Selection-Algorithm). 93 | 94 | In selection sort, this "desired element" is the minimum (or maximum) element within the unsorted data. The algorithm iteratively finds this minimum element, swaps it into its sorted position, and shrinks the unsorted portion until the entire list is sorted. 95 | 96 | ## Implementations 97 | 98 | **Initialization: Setting the Stage** 99 | 100 | 1. **Data Acquisition:** We begin with an unsorted list of elements `data` that needs to be arranged in ascending order (you can modify the comparison operator to sort in descending order if needed). 101 | 2. **Sorted Sublist Boundary:** We establish a variable `sorted_end` to keep track of the sorted sublist's end. Initially, it points to the beginning of the list (index 0) since there are no sorted elements yet. This variable essentially marks the dividing line between the sorted and unsorted portions of the list. 102 | 103 | **Looping Through the Unsorted Wilderness** 104 | 105 | 1. **Iterating Through Unsorted Elements:** We enter a loop that systematically iterates through the unsorted sublist. The loop counter `i` ranges from 0 to the second-to-last index (`len(data) - 2`), because the last element will naturally fall into its correct position during the process. 106 | 107 | ``` 108 | for i in range(length(data) - 1): 109 | ``` 110 | 111 | **Finding the Minimum Within the Unsorted Realm** 112 | 113 | 1. **Tentative Minimum:** We initialize a variable `min_index` to hold the index of the assumed minimum element. We start by assuming the first element in the unsorted sublist (at index `i`) is the minimum. 114 | 2. **Scouring for the True Minimum:** We iterate through the remaining elements in the unsorted sublist (from `i + 1` to the end). 115 | 116 | ``` 117 | min_index = i 118 | for j in range(i + 1, length(data)): 119 | ``` 120 | 121 | 3. **Challenging the Assumption:** Within the inner loop, we compare each element (`data[j]`) with the current assumed minimum (`data[min_index]`). If we encounter an element smaller than the assumed minimum, we update `min_index` to point to this new challenger, effectively tracking the true minimum element's index. 122 | 123 | ``` 124 | if data[j] < data[min_index]: 125 | min_index = j 126 | ``` 127 | 128 | **Swapping and Shrinking: Bringing Order to Chaos** 129 | 130 | 1.
**Placing the Minimum in its Rightful Place:** After examining the entire unsorted sublist, `min_index` holds the index of the true minimum element. We swap the element at `min_index` with the element at the current `sorted_end` position (which equals `i` during this pass). This essentially lifts the minimum element out of the unsorted chaos and places it in its rightful sorted position at the front of the sorted sublist. 131 | 132 | ``` 133 | if i != min_index: # Swap only if the minimum isn't already in place 134 | data[i], data[min_index] = data[min_index], data[i] 135 | ``` 136 | 137 | 2. **Shrinking the Unsorted Territory:** Since one element is now sorted and placed in its correct position, the unsorted sublist shrinks. We increment `sorted_end` by 1 to reflect this change. The sorted sublist now encompasses one more element. 138 | 139 | **Loop Termination and the Sorted List** 140 | 141 | The loop continues its relentless march through the unsorted sublist, performing the steps mentioned above in each iteration. With each pass, the sorted sublist steadily grows, and the unsorted sublist correspondingly shrinks. 142 | 143 | - **Loop Completion:** The loop terminates when the counter `i` reaches the second-last element (index `len(data) - 2`). At this point, the last remaining element is automatically placed in its correct position (either the largest or smallest depending on the sorting order). 144 | 145 | **Pseudocode in Plain English:** 146 | 147 | ``` 148 | function selectionSort(data): 149 | sorted_end = 0 # Initialize sorted sublist boundary 150 | for i in range(length(data) - 1): # Loop through unsorted elements 151 | min_index = i # Assume first element in unsorted sublist is minimum 152 | for j in range(i + 1, length(data)): # Find the true minimum in the unsorted sublist 153 | if data[j] < data[min_index]: 154 | min_index = j 155 | if i != min_index: # Swap minimum with element at sorted_end if necessary 156 | data[i], data[min_index] = data[min_index], data[i] 157 | sorted_end += 1 # Increment sorted sublist boundary 158 | return data 159 | ```
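A direct Python transcription of this pseudocode is shown below as a sketch; since `sorted_end` always equals `i` at the top of each pass, the swap can index with `i` directly:

```python
def selection_sort(data):
    """Sort `data` in place by repeatedly selecting the minimum."""
    for i in range(len(data) - 1):            # i marks the sorted/unsorted boundary
        min_index = i                         # tentatively assume data[i] is minimal
        for j in range(i + 1, len(data)):     # scan the rest of the unsorted sublist
            if data[j] < data[min_index]:
                min_index = j                 # found a smaller candidate
        if i != min_index:                    # swap only if the minimum moved
            data[i], data[min_index] = data[min_index], data[i]
    return data

print(selection_sort([8, 4, 2, 1, 5]))  # [1, 2, 4, 5, 8]
```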
160 | 161 | ## Complexity 162 | 163 | **Time Complexity: O(n^2)** 164 | 165 | Selection sort exhibits a time complexity of O(n^2), signifying that the execution time grows quadratically with the input size (n). This analysis hinges on the nested loop structure inherent to the algorithm. 166 | 167 | - **Nested Loops:** The core operation involves two nested loops. The outer loop iterates through the unsorted sublist, progressively shrinking its size as elements are placed in their sorted positions. The inner loop, for each iteration of the outer loop, traverses the remaining unsorted elements to identify the minimum (or maximum, depending on the sorting order). 168 | - **Comparisons in Inner Loop:** In the worst-case scenario, to place an element in its sorted position, the inner loop needs to compare it with all the remaining unsorted elements (n - 1 comparisons) to pinpoint the minimum. 169 | - **Total Comparisons:** As this process repeats for n-1 elements (the outer loop iterates n-1 times), the total number of comparisons in the worst case scenario culminates in: 170 | 171 | ``` 172 | (n - 1) + (n - 2) + ... + 1 173 | ``` 174 | 175 | This summation represents an arithmetic series. By leveraging the formula for the sum of an arithmetic series ((first term + last term) / 2, times the number of terms), we arrive at: 176 | 177 | ``` 178 | Sum = ((n-1) + 1) / 2 * (n-1) 179 | = n(n-1) / 2 180 | = 1/2 * n^2 - 1/2 * n 181 | ``` 182 | 183 | Ignoring the lower-order term (1/2 \* n) and the constant factor, we're left with n^2, which dominates the expression and defines the overall time complexity. Consequently, selection sort exhibits a time complexity of O(n^2). 186 | 187 | > I know it might be a bit complicated, so here's a better and simpler explanation: 188 | > **Simple explanation** 189 | 190 | Here's a simpler explanation of selection sort's time complexity: 191 | 192 | **Imagine a race with n runners.** Selection sort is like picking the slowest runner one by one and placing them at the front of the line (sorted position). 193 | 194 | - **Inner Loop:** To find the slowest runner (minimum element) in each round (iteration), you need to compare them all. This is like the inner loop, which takes n-1 comparisons in the worst case (comparing each runner with the remaining ones). 195 | - **Outer Loop:** You repeat this process (outer loop) n-1 times to find and place n-1 runners (elements) in their sorted positions. 196 | 197 | **The Problem:** As the number of runners (n) increases, the comparisons per round (inner loop) and the number of rounds (outer loop) both grow. This makes selection sort slow for large datasets. 198 | 199 | **The Math:** We can express this growth using a formula (n^2) that reflects the quadratic increase in comparisons with the number of elements (n). This is why selection sort has a time complexity of O(n^2). 200 | 201 | **In simpler terms:** The more runners you have (data elements), the more comparisons it takes to find the slowest one (minimum element) in each round (iteration), and the more rounds it takes to sort everyone (all elements). This makes selection sort slow for big datasets. 202 | 203 | **Space Complexity: O(1)** 204 | 205 | On the bright side, selection sort is a space-efficient algorithm. It performs in-place sorting, meaning it sorts the data within the original list without requiring significant additional memory. The space complexity remains constant (O(1)) irrespective of the input list size. This is because it only utilizes a few variables (like `sorted_end` and `min_index`) to track the sorting process, and these variables' space requirements don't grow with the input size. 206 | 207 | **Complexity Summary:** 208 | 209 | | Complexity | Description | 210 | | ---------------- | ------------------------------------------------------------------------------ | 211 | | Time Complexity | O(n^2). The number of comparisons grows quadratically with the input size (n). | 212 | | Space Complexity | O(1). The algorithm is space-efficient and uses constant extra space. | 213 | 214 | **In essence,** selection sort's simplicity and low memory requirements make it a potential choice for specific scenarios where memory is a constraint. However, for larger datasets, its O(n^2) time complexity becomes a significant drawback, and other sorting algorithms with better time complexity (e.g., Merge Sort, Quick Sort) are preferable. 215 | 216 | ## Advantages and Disadvantages 217 | 218 | **Advantages:** 219 | 220 | - **Simplicity:** Selection sort's core concept is straightforward to understand and implement.
It's a good sorting algorithm to learn as it lays the foundation for grasping more complex sorting techniques. 221 | - **In-place sorting:** Selection sort operates directly on the original data list, modifying it in place. This eliminates the need for extra memory to store a temporary sorted copy, making it memory-efficient for smaller datasets. 222 | - **Favorable for small datasets:** For relatively small lists, selection sort might perform reasonably well due to its simpler structure. In scenarios where memory is a tight constraint and the data size is limited, selection sort can be a suitable choice. 223 | 224 | **Disadvantages:** 225 | 226 | - **O(n^2) time complexity:** The primary drawback of selection sort lies in its time complexity. As the number of elements (n) in the list grows, the number of comparisons required to sort the list increases quadratically (n^2). This makes selection sort inefficient for handling large datasets where other sorting algorithms with better time complexity (e.g., Merge Sort, Quick Sort) are more preferable. 227 | - **Unstable sorting:** Selection sort doesn't necessarily preserve the original order of elements with identical values. This means if you have multiple elements with the same value, their relative positions in the sorted list might differ from their original order. While this might not be a critical concern in all situations, it's an important factor to consider when specific ordering requirements exist. 228 | 229 | ## Comparison to other sorting algorithms 230 | 231 | Among quadratic sorting algorithms (sorting algorithms with a simple average-case of Θ(n^2)), selection sort almost always outperforms bubble sort and gnome sort. Insertion sort is very similar in that after the kth iteration, the first 232 | k elements in the array are in sorted order. Insertion sort's advantage is that it only scans as many elements as it needs in order to place the (k+1)st element, while selection sort must scan all remaining elements to find the (k+1)st element. 233 | 234 | Simple calculation shows that insertion sort will therefore usually perform about half as many comparisons as selection sort, although it can perform just as many or far fewer depending on the order the array was in prior to sorting. It can be seen as an advantage for some real-time applications that selection sort will perform identically regardless of the order of the array, while insertion sort's running time can vary considerably. However, this is more often an advantage for insertion sort in that it runs much more efficiently if the array is already sorted or "close to sorted." 235 | 236 | While selection sort is preferable to insertion sort in terms of number of writes (n-1 swaps versus up to n(n-1)/2 swaps, with each swap being two writes), this is roughly twice the theoretical minimum achieved by cycle sort, which performs at most n writes. This can be important if writes are significantly more expensive than reads, such as with EEPROM or Flash memory, where every write lessens the lifespan of the memory. 237 | 238 | Selection sort can be implemented without unpredictable branches for the benefit of CPU branch predictors, by finding the location of the minimum with branch-free code and then performing the swap unconditionally. 239 | 240 | Finally, selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort or selection sort are both typically faster for small arrays (i.e.
fewer than 10–20 elements). A useful optimization in practice for the recursive algorithms is to switch to insertion sort or selection sort for "small enough" sublists. 241 | 242 | ## FAQ 243 | 244 | > `AI is used` 245 | 246 | **When to Use Selection Sort:** 247 | 248 | - **Learning purposes:** Selection sort's simplicity makes it an excellent choice for beginners to grasp the fundamental concepts of sorting algorithms. Its clear logic and step-by-step nature allow for easy understanding of how sorting works. 249 | - **Small datasets:** For relatively small lists where efficiency isn't a paramount concern, selection sort's memory efficiency can be beneficial. Since it operates directly on the original data, it avoids the need for additional memory allocation, which might be limited in specific scenarios. 250 | 251 | **When to Avoid Selection Sort:** 252 | 253 | - **Large datasets:** Selection sort's Achilles' heel is its O(n^2) time complexity. As the number of elements in the list grows, the sorting time increases quadratically, making it sluggish for handling massive datasets. In such cases, resorting to sorting algorithms with better time complexity (e.g., Merge Sort, Quick Sort) is highly recommended for optimal performance. 254 | - **Stability matters:** If maintaining the original order of elements with equal values is crucial (stable sorting), selection sort is not the ideal choice. It doesn't guarantee that elements with identical keys will preserve their relative positions in the sorted list. 255 | 256 | **Additional Considerations:** 257 | 258 | - **Hybrid approaches:** In some situations, selection sort might be used as a preliminary step for a more complex sorting algorithm. For instance, it could be employed to partially sort a small sub-list before feeding it into a divide-and-conquer sorting algorithm like Merge Sort. 259 | 260 | ## References: 261 | 262 | - [wikipedia](https://en.wikipedia.org/wiki/Selection_sort) 263 | - [geeksforgeeks](https://www.geeksforgeeks.org/selection-sort/) 264 | 265 | ## Example 266 | 267 | - [Ts](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Selection/Example/Selection.ts) 268 | - [Js](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Selection/Example/Selection.js) 269 | - [Go](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Selection/Example/Selection.go) 270 | - [Java](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Selection/Example/Selection.java) 271 | - [Python](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/6.Basic-Sorting/Selection/Example/Selection.py) 272 | 273 | > Visualization link: [Visualization](https://algorithmic-solutions.github.io/Selection-Algorithm/) 274 | -------------------------------------------------------------------------------- /my-articles/Algorithm/SORTING/Selection algorithm.md: -------------------------------------------------------------------------------- 1 | # Selection algorithm 2 | 3 | ## Introduction: 4 | 5 | In computer science, a selection algorithm is an algorithm for finding the **kth smallest (or largest)** value in a collection of **unorganized data**. This data can be a list of numbers, characters, or even objects. The value that it finds is called the **kth order statistic**. Selection algorithms include finding the minimum, median, and maximum element in the collection as special cases.
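As a quick illustration of the definition, every special case falls out of a single sorted view of the data. The sketch below is a naive O(n log n) baseline for intuition only; the dedicated algorithms discussed next do better on unsorted input:

```python
def kth_smallest_naive(values, k):
    """Return the kth order statistic (k = 1 is the minimum) by sorting."""
    return sorted(values)[k - 1]

data = [7, 1, 5, 3, 9]
print(kth_smallest_naive(data, 1))                     # 1 -> minimum
print(kth_smallest_naive(data, len(data)))             # 9 -> maximum
print(kth_smallest_naive(data, (len(data) + 1) // 2))  # 5 -> median (odd n)
```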
6 | 7 | Common selection algorithms include quickselect and the median of medians algorithm. When applied to a collection of **n unsorted elements**, these algorithms typically have a time complexity of **O(n)**, as expressed using big O notation. This means the execution time grows linearly with the number of elements. 8 | 9 | For data that is already structured, faster algorithms may be possible. As an extreme case, selecting the kth element in an already-sorted array takes constant time, **O(1)**. This is because we can directly access the element at the desired index. 10 | 11 | > Summary: A selection algorithm efficiently retrieves the kth smallest (or largest) element from a collection of unordered data. It's particularly useful for finding specific values like minimum, maximum, or median. While selection algorithms have a linear time complexity for unsorted data, they offer a significant advantage over sorting the entire collection when you only need a specific element. 12 | 13 | ## How it Works: 14 | 15 | Selection algorithms efficiently find the kth smallest (or largest) element in a collection of unordered data. Unlike sorting algorithms that arrange the entire collection, selection algorithms focus on finding a specific element. 16 | 17 | Here's a general breakdown of how selection algorithms work: 18 | 19 | 1. **Initialization:** We start by choosing an initial element from the data. This element could be any element in the collection. 20 | 21 | 2. **Comparison Loop:** We enter a loop that iterates through the remaining data. In each iteration, we compare the current element with the initial element (or the current candidate for the kth smallest/largest). 22 | 23 | 3. **Update Candidate:** 24 | 25 | - If the current element is smaller (for smallest) or larger (for largest) than the candidate element, we update the candidate. This means the current element becomes a better candidate for the kth smallest/largest position. 26 | - If the current element is equal to the candidate, the specific behavior depends on the selection algorithm. Some algorithms might treat them equally, while others might have a tie-breaking mechanism. 27 | 28 | 4. **Repeat:** Steps 2 and 3 are repeated until all elements in the collection have been compared with the candidate. 29 | 30 | 5. **Result:** After examining all elements, the final candidate is the smallest (or largest) value in the collection. This basic scan handles k = 1 (or k = n) directly; for a general kth position, the scan is repeated or refined using the more sophisticated strategies described later in this article. 31 | 32 | **Visualizing the Process (Optional):** 33 | 34 | ![](https://codepumpkin.com/wp-content/uploads/2017/10/selectionSort.gif) 35 | 36 | ![](https://codepumpkin.com/wp-content/uploads/2017/10/SelectionSort_Avg_case.gif) 37 | 38 | ![](https://www.codingconnect.net/wp-content/uploads/2016/09/Selection-Sort.gif) 39 | 40 | --- 41 | 42 | ### Problem statement 43 | 44 | Selection algorithms address a fundamental task: efficiently identifying the **kth smallest (or largest)** element within a collection of unordered data. This data can encompass numbers, characters, or even objects with a comparable property like size or date. Unlike sorting algorithms that arrange the entire collection, selection algorithms focus on retrieving a specific element based on its position within the sorted order. 45 | 46 | **Core Characteristics:** 47 | 48 | - **Unsorted Data:** The data is assumed to be in its initial, unordered state. Selection algorithms don't require any prior sorting to function. 49 | - **Input:** The algorithm takes two inputs: 50 | 1. The collection of data values. 51 | 2.
A number **k** representing the desired position (kth smallest or largest element). 52 | - **Output:** The algorithm outputs the **kth smallest element** if configured to find minimum values. Conversely, for finding maximum values, it outputs the **kth largest element**. Some variations might return a collection of the k smallest/largest elements. 53 | - **Comparison-Based Approach:** Selection algorithms typically rely on comparisons between elements to determine their relative order. Imagine comparing numbers to see which is bigger or smaller. This approach avoids complex mathematical operations, making them suitable for various data types. 54 | 55 | **Addressing Challenges:** 56 | 57 | - **Handling Duplicates:** To simplify the analysis of selection algorithms, some theoretical discussions assume the data has **distinct values** (no duplicates). 58 | - **Tie-Breaking Mechanism:** In real-world scenarios, data might contain duplicates. To address this, a consistent **tie-breaking method** can be established to determine the order of elements with the same value. 59 | - **Numbering the kth Element:** There are variations in how the "kth" element is numbered. This explanation follows the convention established by Cormen et al.: 60 | - The **minimum value** is obtained with **k = 1**. 61 | - The **maximum value** is obtained with **k = n** (n being the total number of elements). 62 | - The **median** for an odd number of elements (n) is found with **k = (n + 1) / 2**. 63 | - For even n, there are two medians: 64 | - Lower median with **k = n / 2**. 65 | - Upper median with **k = n / 2 + 1**. 66 | 67 | > Summary: Selection algorithms offer a targeted approach to extracting specific elements from unsorted data. Their strength lies in retrieving the kth smallest/largest element without necessarily sorting the entire collection. This makes them efficient for tasks like finding minimum, maximum, or median values in unsorted datasets. They provide a valuable tool for data analysis and manipulation whenever you need to identify specific elements within unordered data. 68 | 69 | ## Algorithms 70 | 71 | ### Sorting and heapselect 72 | 73 | While specialized selection algorithms often reign supreme in terms of efficiency, a fundamental approach to finding the kth smallest element in a collection relies on sorting. Here's a breakdown of this method: 74 | 75 | - **Step 1: Sorting the Collection:** The initial step involves meticulously arranging the entire data collection in ascending order. This can be achieved using various sorting algorithms, each with its own time complexity. Common sorting algorithms like quicksort or merge sort typically have a time complexity of Θ(n log n), where n represents the number of elements in the collection. 76 | 77 | - **Step 2: Retrieving the kth Element:** Once the data is sorted, retrieving the kth smallest element becomes a straightforward task. If the sorting algorithm outputs the data as an array, you can simply access the element at index k - 1 (since arrays are typically zero-indexed, the first element sits at position 0). In other cases, you might need to scan the sorted sequence to locate the kth element. 78 | 79 | **Time Complexity:** 80 | 81 | The overall time complexity of this selection method is dominated by the sorting step. Since sorting algorithms like quicksort typically require Θ(n log n) time, this becomes the governing factor.
While there might be specialized integer sorting algorithms that are faster, they generally underperform compared to the linear time achievable with dedicated selection algorithms. 82 | 83 | **Advantages and Considerations:** 84 | 85 | - **Simplicity:** This approach offers an appealing level of simplicity, especially when a well-optimized sorting routine is readily available within a programming library. If a dedicated selection algorithm is not included in the library, this method becomes a viable alternative. 86 | - **Favorable for Smaller Inputs:** For datasets with moderate sizes, sorting might even outperform non-randomized selection algorithms due to smaller constant factors associated with sorting's running time. 87 | - **Output and Further Computations:** An additional benefit of this method is that it produces a fully sorted version of the original collection. This sorted data can prove useful for subsequent computations, particularly if selection with different values of k is needed later. 88 | 89 | **Specialized Techniques:** 90 | 91 | There are optimizations that can be applied when the sorting algorithm itself generates elements one at a time, like selection sort. In such scenarios, it's possible to perform the selection process alongside the sorting. The sorting can then be terminated as soon as the kth element is identified. This optimization, when applied to heapsort, results in the heapselect algorithm. Heapselect boasts a time complexity of O(n + k log n), making it efficient when k is significantly smaller than n. However, this advantage diminishes for larger k values (like k = n/2 for median finding), where it can regress to O(n log n). 92 | 93 | > `Summary:` The selection method leveraging sorting offers a clear and straightforward approach for finding the kth smallest element in a collection. While it might not be the most time-efficient solution for all cases, its simplicity and potential benefits for smaller datasets or situations where a pre-existing sorting routine is available make it a valuable option to consider. However, for optimal performance, especially when dealing with larger datasets, exploring specialized selection algorithms is recommended. 94 | 95 | ### Pivoting 96 | 97 | Selection algorithms excel at pinpointing specific elements within unordered data collections. Unlike sorting algorithms that meticulously arrange the entire dataset, selection algorithms focus on retrieving a strategically chosen element based on its rank. At the heart of many efficient selection algorithms lies the concept of pivoting, a technique that partitions the data and guides the search towards the target element. 98 | 99 | **The Pivoting Technique:** 100 | 101 | Imagine a collection of unsorted numbers. A pivoting algorithm strategically selects a specific element from this data, the pivot, and uses it to divide the remaining elements into two distinct subsets: 102 | 103 | - **Less Than (L):** This subset encompasses elements with values **less than** the pivot. 104 | - **Greater Than (R):** This subset contains elements with values **greater than** the pivot. 105 | 106 | By comparing the desired rank (kth smallest) with the sizes of these subsets, the algorithm can determine the location of the target element. Here's a breakdown of the logic: 107 | 108 | 1. **Comparison with k:** If k, representing the desired rank (kth smallest), is less than or equal to the size of the "Less Than" subset (L), then the kth smallest element resides within L. 
In this scenario, we can apply the same selection algorithm recursively to the elements in L to hone in on the target element. 109 | 110 | 2. **Pivot as the Answer:** If k is exactly equal to the size of L plus 1, then the kth smallest element is the pivot itself! We can simply return the pivot in this case. 111 | 112 | 3. **Refining the Search in "Greater Than" Subset:** If neither of the above conditions hold true, it implies the kth smallest element is located within the "Greater Than" subset (R). More precisely, it's the element positioned at (k - |L| - 1) within R. To find this element, we recursively apply the selection algorithm again, but this time focusing on this specific position within the R subset. 113 | 114 | **The Significance of Pivot Choice:** 115 | 116 | The efficiency of pivoting-based selection algorithms hinges critically on the selection of the pivot element. A well-chosen pivot leads to a balanced division of the data, resulting in efficient recursive calls. Conversely, a poor pivot choice can create highly unbalanced subsets, leading to a worst-case time complexity of O(n^2), significantly slower than desired. Here's a closer look at some popular pivoting-based algorithms and their pivot selection strategies: 117 | 118 | - **Quickselect (Randomized):** This algorithm picks a pivot element uniformly at random from the input data. It boasts an expected time complexity of O(n), making it a fast choice in many situations due to its potential for balanced partitions. 119 | 120 | - **Median of Medians (Deterministic):** This algorithm takes a more deterministic approach. It partitions the data into sets of five elements and finds the median of each set. Then, it recursively finds the median of these medians to use as the pivot. This approach guarantees a worst-case time complexity of O(n), but its high constant factors (due to multiple recursive calls) make it less practical than quickselect for most use cases. 121 | 122 | - **Floyd-Rivest Algorithm (Hybrid):** This algorithm is a variation of quickselect that employs a more sophisticated pivot selection strategy. It randomly samples a subset of data values and then recursively selects two elements to use as pivots. This approach aims to "sandwich" the kth smallest element between the pivots, minimizing the size of the data to be searched recursively. It achieves an expected number of comparisons close to O(n) but requires a more complex implementation compared to quickselect. 123 | 124 | - **Hybrid Algorithms:** Techniques like introselect combine the strengths of quickselect's efficiency with the guaranteed worst-case O(n) time bound of algorithms like median of medians. They strategically switch between these approaches based on the situation, offering a balance between speed and guaranteed performance. 125 | 126 | In conclusion, pivoting empowers selection algorithms to efficiently locate the kth smallest element within unordered data. By strategically choosing a pivot and dividing the data based on comparisons, these algorithms can effectively narrow down the search space and locate the target element. While the effectiveness significantly relies on the pivot selection strategy, various algorithms cater to different needs. Quickselect offers a fast, randomized approach, while median of medians provides a deterministic guarantee. Hybrid algorithms like introselect attempt to combine the best of both worlds. 
Understanding these techniques empowers you to select the most appropriate algorithm for your specific data manipulation tasks. 127 | 128 | > `Summary:` Pivoting empowers selection algorithms to efficiently locate the kth smallest element within unordered data. By strategically choosing a pivot and dividing the data based on comparisons, these algorithms can effectively narrow down the search space and locate the target element. While the effectiveness significantly relies on the pivot selection strategy, various algorithms like quickselect and median of medians offer practical solutions for different scenarios. 129 | 130 | ### Factories 131 | 132 | While pivoting techniques offer a powerful approach for selection algorithms, the quest for even greater efficiency has led to the exploration of factories. Introduced in 1976, factories represent a sophisticated concept that pushes the boundaries of deterministic selection algorithms, particularly for k values far from 1 or n (i.e., not the very smallest or largest elements). 133 | 134 | **What are Factories?** 135 | 136 | Imagine a specialized workshop dedicated to meticulously ordering elements. These factories, within the context of selection algorithms, are methods that construct partial orders for small subsets of input values. They leverage comparisons to meticulously combine smaller, pre-established partial orders. 137 | 138 | **A Simplified Example:** 139 | 140 | Consider a basic factory that takes a series of single-element partial orders (think of each element as an individual unit waiting to be compared and positioned). This factory compares pairs of elements from these single-element orders and produces a new output: a sequence containing two totally ordered sets. The elements fed into this factory could be: 141 | 142 | - Uncompared input values (fresh data waiting to be placed in the order). 143 | - "Waste" values produced by other factories during the overall process. 144 | 145 | **The Grand Objective:** 146 | 147 | The ultimate goal of a factory-based selection algorithm is to strategically combine these factories. The outputs from some factories become the inputs for others, ultimately aiming to achieve a comprehensive partial order where: 148 | 149 | - One specific element (the kth smallest) is positioned higher than k-1 other elements. 150 | - The same element is positioned lower than n-k other elements (remember n is the total number of elements). 151 | 152 | **Benefits and Challenges:** 153 | 154 | By meticulously designing these factories and their interactions, researchers have achieved remarkable results. For instance, a factory-based approach specifically designed for median-finding (finding the middle element) can achieve this task using at most 2.942n comparisons. This is significantly more efficient than some standard algorithms. However, employing factories effectively can be quite complex. The design and implementation of these factories require a deep understanding of algorithms and mathematical concepts. 155 | 156 | > **Summary:** Factories represent an advanced approach to selection algorithms, particularly for deterministic algorithms targeting specific k values. While they offer the potential for superior efficiency, their complexity makes them a niche area explored by researchers pushing the boundaries of what's possible. For most practical applications, pivoting-based techniques like quickselect might be a more approachable and well-suited choice. 
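Before moving on, here is a minimal randomized quickselect sketch in Python, implementing the pivoting scheme from the previous section (1-indexed k, matching this article's convention; grouping duplicates with the pivot is a small variation on the strict L/R split described above):

```python
import random

def quickselect(values, k):
    """Return the kth smallest element (k = 1 is the minimum)."""
    assert 1 <= k <= len(values)
    pivot = random.choice(values)                  # random pivot, O(n) expected time
    less = [v for v in values if v < pivot]        # L: elements below the pivot
    equal = [v for v in values if v == pivot]      # the pivot and its duplicates
    greater = [v for v in values if v > pivot]     # R: elements above the pivot
    if k <= len(less):
        return quickselect(less, k)                # target lies in L
    if k <= len(less) + len(equal):
        return pivot                               # the pivot itself is the answer
    return quickselect(greater, k - len(less) - len(equal))  # search within R

print(quickselect([7, 1, 5, 3, 9, 3], 4))  # 5
```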
157 | 158 | ### Parallel algorithms 159 | 160 | The exploration of parallel selection algorithms began in the mid-1970s. Researchers have investigated how to perform selection using a parallel comparison tree model, a theoretical framework for analyzing parallel algorithms. This model revealed an interesting trade-off: 161 | 162 | - **Parallel Comparison Tree Model:** Introduced by Leslie Valiant, this model provides a framework for analyzing parallel selection algorithms. It establishes a lower bound of Ω(log log n) parallel steps, even for simple tasks like finding the minimum or maximum element, when using a linear number of comparisons. 163 | 164 | - **Matching the Lower Bound:** Researchers have devised parallel selection algorithms that achieve the established lower bound of O(log log n) steps, showcasing the potential for efficient parallelization. 165 | 166 | - **Randomized Parallelism:** By incorporating randomization within the parallel comparison tree model, it's possible to perform selection in a fixed number of steps while maintaining a linear number of comparisons. 167 | 168 | - **Parallel RAM Model:** This model offers a more realistic representation of parallel computing with exclusive read/write memory access. In this model, selection can be achieved in O(log n) time using O(n/log n) processors, representing an optimal solution in terms of both time and the number of processors required. 169 | 170 | - **Concurrent Access:** With concurrent memory access (allowing multiple processors to access memory simultaneously), faster parallel times can be achieved. The logarithmic term in the time bound can even be reduced to log k, potentially leading to significant speedups for specific values of k (desired rank). 171 | 172 | **Sublinear Selection with Specialized Structures:** 173 | 174 | When data is already organized within a specific data structure, selection can potentially be performed in sublinear time, meaning the time complexity grows slower than the number of elements (n). Here are some notable examples: 175 | 176 | - **Sorted Arrays:** For data meticulously sorted within an array, selecting the kth element becomes a trivial task. A simple array lookup in constant time (O(1)) suffices to retrieve the desired element. 177 | - **Two-Dimensional Arrays:** Data organized as a two-dimensional array with sorted rows and columns allows for selection in O(m log(2n/m)) time. This can be even faster for scenarios where k is relatively small compared to the array's dimensions. 178 | - **Multiple Sorted Arrays:** A collection of sorted one-dimensional arrays offers selection in O(m + Σ(i=1 to m) log(ki + 1)) time, where m represents the number of arrays and ki denotes the number of elements less than the selected item within the ith array. 179 | - **Binary Heaps:** Data structured as a binary heap enables selection in O(k) time, remarkably independent of the heap's size (n). This proves to be significantly faster than the O(k log n) time bound associated with best-first search approaches. 180 | - **Order Statistic Trees:** For dynamic data undergoing insertions and deletions, order statistic trees provide a data structure that supports selection queries alongside insertions and deletions, all in O(log n) time per operation. This leverages a self-balancing binary search tree augmented with additional information per node. 181 | 182 | **Beyond Traditional Selection:** 183 | 184 | The exploration of selection algorithms extends beyond the comparison model of computation. 
For scenarios involving small integers where binary arithmetic operations are permissible, faster selection times can be achieved. 185 | 186 | **Real-World Considerations:** 187 | 188 | It's important to note that streaming algorithms with sublinear memory limitations (both in terms of n and k) cannot perform exact selection queries on dynamic data. However, techniques like the count-min sketch offer approximate solutions, identifying a value whose position in the ordering (if included) would be within an εn range of k. This approach utilizes a sketch with a size that scales logarithmically with respect to 1/ε. 189 | 190 | > Summary : Selection algorithms can be parallelized to achieve faster computation times. In the parallel RAM model, selection can be achieved in O(log n) time using O(n/log n) processors. When data is already organized in specific data structures, selection can be even faster. For example, selection on a sorted array takes constant time. 191 | 192 | ## Lower bounds 193 | 194 | Selection algorithms, despite their apparent simplicity, face a fundamental challenge: the minimum number of comparisons required to identify the desired element. This section delves into the fascinating realm of lower bounds, exploring the theoretical limitations on selection efficiency and the distinctions between randomized and deterministic approaches. 195 | 196 | **Understanding the Necessity of Comparisons:** 197 | 198 | The linear running time (O(n)) observed in most selection algorithms is not a coincidence. It stems from the inherent nature of the problem. When dealing with unordered data, every element must be considered at least once to ensure the target element isn't overlooked. Skipping a comparison introduces the risk of missing the crucial piece of information that could lead to the correct selection. This inherent requirement for comparisons establishes a baseline level of efficiency that all selection algorithms must strive to achieve. 199 | 200 | **Beyond the Basics: Exploring Specific Cases** 201 | 202 | Researchers haven't stopped at this basic understanding. They've delved deeper, seeking the exact number of comparisons needed for selection in various scenarios. This exploration reveals intriguing differences between randomized and deterministic algorithms: 203 | 204 | - **Minimum and Maximum Selection:** Finding the minimum or maximum element presents a relatively straightforward case. To identify the minimum element, each element in the data set must be compared against the current minimum candidate. This process ensures that every other element is demonstrably "non-minimal." The same logic applies to maximum selection, just with the comparison criteria reversed. This analysis leads to a lower bound of n-1 comparisons for both minimum and maximum selection – every element needs to be compared at least once to definitively establish its position relative to the minimum or maximum value. 205 | 206 | - **Second-Smallest Selection:** Selecting the second-smallest element presents a more intricate challenge. Here, the algorithm must first distinguish the absolute minimum element. The number of comparisons involving the minimum value (p) plays a crucial role. Each element compared to the minimum becomes a candidate for the second-smallest position. However, to eliminate these contenders, p-1 of them must be proven larger than another element in a separate comparison. 
- **Second-Smallest Selection:** Selecting the second-smallest element is more intricate. Here, the algorithm must first distinguish the absolute minimum element, and the number of comparisons involving the minimum value (call it p) plays a crucial role: each element compared against the minimum becomes a candidate for the second-smallest position, and eliminating those contenders requires proving p - 1 of them larger than some other element in a separate comparison. This analysis yields a lower bound of n + ⌈log₂ n⌉ - 2 comparisons for deterministic selection algorithms. Interestingly, this is exactly the number of comparisons made by a single-elimination tournament, where elements compete in pairs and the loser of each match is eliminated. Randomized algorithms, however, can achieve a lower expected number of comparisons here by randomizing which comparisons are made.

**Generalizing the Lower Bound:**

The lower bound for selecting the kth element (out of n) is n + min(k, n - k) - O(1) comparisons, on average. This matches the number of comparisons used by the Floyd-Rivest selection algorithm, disregarding its constant term. The bound applies both to deterministic algorithms (averaged over all possible input permutations) and, by Yao's principle, to the expected number of comparisons made by randomized algorithms on their worst-case inputs. In simpler terms: the number of comparisons required scales linearly with the data size (n) and is influenced by the position (k) of the target element within the data set.

**Deterministic vs. Randomized Approaches:**

The quest for efficiency extends beyond the average case. For deterministic selection algorithms, a stronger lower bound is known, involving the binary entropy function H(x), which captures the inherent uncertainty of a given probability distribution: deterministic selection requires at least (1 + H(k/n))n + Ω(n) comparisons. While this bound may seem abstract, it points to a trade-off deterministic algorithms face between the number of comparisons and the certainty of the outcome: an algorithm that does well on typical inputs can still be forced into many comparisons on adversarial ones.

**Median Finding: A Special Case:**

Finding the median element (k = n/2) carries a slightly stricter lower bound: deterministic algorithms require at least (2 + ε)n comparisons, where ε is a small positive constant. Even for the middle element, then, there is a hard limit on how efficiently deterministic algorithms can operate.

> **Summary:** Selection algorithms, while seemingly straightforward, are subject to inherent limits on their efficiency. Understanding these lower bounds empowers researchers to design algorithms that approach the theoretical optimum and to navigate the trade-offs between deterministic and randomized approaches: randomized algorithms can offer better expected performance by incorporating probabilistic choices, while deterministic algorithms provide guaranteed performance at the cost of more comparisons in the worst case.

## Exact numbers of comparisons

Donald Knuth, a renowned computer scientist, compiled a table known as the "selection cost triangle." It records the exact number of comparisons required by an optimal selection algorithm for various scenarios.
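The tournament idea behind the second-smallest bound translates directly into code. Here is a minimal Python sketch (the function name and bookkeeping are illustrative): it finds the minimum with n - 1 pairwise "matches," then takes the best of the at most ⌈log₂ n⌉ elements that lost directly to the winner, for n + ⌈log₂ n⌉ - 2 comparisons in total. The k = 2 entries of the triangle below match these counts.

```python
def second_smallest(items):
    """Single-elimination tournament: n - 1 matches find the minimum;
    the runner-up must be among the direct losers to the winner."""
    # Each entry pairs a surviving value with the values it has beaten.
    survivors = [(x, []) for x in items]
    while len(survivors) > 1:
        next_round = []
        for i in range(0, len(survivors) - 1, 2):
            (a, beat_a), (b, beat_b) = survivors[i], survivors[i + 1]
            if a <= b:                        # one comparison per match
                next_round.append((a, beat_a + [b]))
            else:
                next_round.append((b, beat_b + [a]))
        if len(survivors) % 2 == 1:           # odd element out gets a bye
            next_round.append(survivors[-1])
        survivors = next_round
    winner, direct_losers = survivors[0]
    # At most ceil(log2 n) direct losers; min() finishes the job.
    return min(direct_losers)

print(second_smallest([31, 41, 59, 26, 41, 58]))  # -> 31
```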
**Understanding the Triangle:**

Imagine a triangular table where rows represent the number of elements (n) in the data set, starting with n = 1 at the top. Within each row, the kth number signifies the minimum comparisons needed to select the kth smallest element from that data size (n). The table is symmetrical because selecting the kth smallest element requires, in the worst case, the same number of comparisons as selecting the kth largest.

**Exploring the Triangle's Content:**

The table is partially populated with the following numbers:

```
0
1  1
2  3  2
3  4  4  3
4  6  6  6  4
5  7  8  8  7  5
6  8 10 10 10  8  6
7  9 11 12 12 11  9  7
8 11 12 14 14 14 12 11  8
9 12 14 15 16 16 15 14 12  9
```

**Formula for Efficiency:**

For most entries on the left half of each row, a formula exists to calculate the optimal number of comparisons. This formula, developed by Abdollah Hadian and Milton Sobel, leverages a method related to heap selection: a single-elimination tournament followed by a series of smaller tournaments that progressively identify the desired kth smallest element.

**Formula:**

n - k + (k - 1)⌈log₂(n + 2 - k)⌉

Here, n represents the data size, k represents the target element position (kth smallest), and ⌈log₂(n + 2 - k)⌉ denotes the ceiling of the base-2 logarithm of (n + 2 - k).

**Beyond the Formula:**

While the formula provides valuable insights, some entries in the triangle, particularly the larger ones, were established as optimal only through extensive computer searches. For certain data sizes and target positions, the exact number of comparisons needed may not have a simple closed-form expression.

> **Summary:** The selection cost triangle serves as a valuable reference for understanding the inherent efficiency limits of selection algorithms. By analyzing the triangle, researchers can see the optimal number of comparisons required in various scenarios and use that to guide the development of more efficient selection algorithms. The formula covering most entries on the left half of each row provides a foundation for theoretical analysis, while computer searches establish optimality in the harder cases. As research in selection algorithms continues to evolve, the triangle may be further refined or extended to cover a wider range of data sizes and target positions.

## Conclusion

True, understanding selection algorithms can involve some complex concepts, especially when delving into lower bounds and advanced techniques like order statistic trees. However, by grasping the core principles and exploring different approaches like sorting, heaps, and pivoting, you gain a solid foundation for appreciating their efficiency and applicability in various scenarios.

**Real-World Example: Finding the Median Salary in a Company**

Imagine a company needs to determine the median employee salary. They have a dataset containing salary information for all employees (n data points). To find the median (the middle value when the data is ordered from least to greatest), a selection algorithm can be employed. Here's how it might work using the quickselect algorithm, a popular randomized approach:
1. **Randomly Choose a Pivot:** The algorithm picks an element from the dataset at random and designates it as the pivot.

2. **Partition the Data:** The remaining data is divided into two subsets based on a comparison with the pivot:

   - Less Than (L): Elements with salaries less than the pivot's value.
   - Greater Than (R): Elements with salaries greater than the pivot's value.

3. **Identify the Median's Location:** The median's position (k) within the sorted data determines where to look next:

   - If |L| (the number of employees earning less than the pivot) is at least k, the median resides within subset L.
   - If |L| is exactly k - 1, then the pivot itself is the median!
   - Otherwise, the median is the (k - |L| - 1)th element of subset R, since everything in L and the pivot itself are smaller.

4. **Recursive Calls (if necessary):** Depending on the pivot's position and the desired location k, the algorithm recursively calls itself on either subset L or R, focusing on the side where the median is guaranteed to reside. This continues until the median element is identified.

5. **Return the Median:** Once the median's position is pinpointed within the data (either the pivot itself or an element within a specific subset), the algorithm returns this value as the median salary of the company.

By strategically selecting pivots and partitioning the data, quickselect efficiently narrows down the search space and locates the median element in the salary dataset. This principle applies to various selection algorithms, each with its own strengths and considerations.
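The walkthrough above condenses to a few lines of Python. This is a minimal sketch rather than production code: the salary figures are invented, ties are handled by grouping values equal to the pivot with the pivot (a small generalization of the three cases in step 3), and the recursion could just as well be a loop.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest value (1-indexed), e.g. the median
    salary when k = (len(values) + 1) // 2."""
    pivot = random.choice(values)               # 1. random pivot
    less = [v for v in values if v < pivot]     # 2. partition into L ...
    greater = [v for v in values if v > pivot]  #    ... and R
    equal = len(values) - len(less) - len(greater)
    if k <= len(less):                          # 3. answer lies in L
        return quickselect(less, k)             # 4. recurse on L
    if k <= len(less) + equal:
        return pivot                            # 3. pivot is the answer
    return quickselect(greater, k - len(less) - equal)  # 4. recurse on R

salaries = [52000, 48000, 75000, 61000, 58000, 49000, 66000]  # made-up data
print(quickselect(salaries, (len(salaries) + 1) // 2))  # 5. median -> 58000
```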
-------------------------------------------------------------------------------- /my-articles/Algorithm/Searching/README.md: --------------------------------------------------------------------------------

# Searching Algorithms:

## What is Searching?

Searching is the essential process of finding a specific element or item within a dataset. This data can be organized in various structures like arrays, lists, trees, or any other formatted representation. The core goal of searching is to determine if the desired element exists in the data and, if it does, to pinpoint its exact location or extract it. Searching is fundamental to many computational tasks and real-world applications, including retrieving information, analyzing data, and informing decision-making processes.

## Introduction

In computer science, a search algorithm is a method for efficiently finding information within a dataset. These algorithms can retrieve data stored in specific structures or calculate it within a defined search space. They are crucial for various applications like information retrieval and data analysis.

**Choosing the right search algorithm depends on the data structure and any prior knowledge about the data itself.** Specialized data structures like search trees and hash tables can significantly improve search efficiency.

### Types of Search Algorithms

There are two main categories of search algorithms based on their mechanisms:

1. **Linear Search:** Examines each record one by one until the target is found.
2. **Binary Search:** Repeatedly divides the search space in half by targeting the center of the data structure, making it much faster than linear search for sorted data.

> **Note:** it's important to clarify that this is a high-level categorization.

**Search algorithms are evaluated based on their computational complexity, which describes the maximum time required to find the target.** Binary search, for example, boasts a logarithmic time complexity (O(log n)), meaning the number of steps needed to find the target grows proportionally to the logarithm of the data size.

**Summary:** Search algorithms are essential tools for efficiently finding information within datasets. The choice of algorithm depends on the data structure and properties, with each type offering varying efficiency trade-offs.

### Simple explanation:

Imagine you're looking for a specific book in a library. Here's how different search algorithms might play out:

* **Linear Search:** This is like checking each shelf in the library, one by one, until you find the book you're looking for. It works but can be slow, especially in a large library.

* **Binary Search:** This is like going to the middle section of the library first. If your book should be alphabetically before the middle section, you only search the left half. If it should be after, you only search the right half. You keep halving the remaining section until you find the book. Much faster than linear search, especially if the books are sorted alphabetically!

## How it Works

Search algorithms come in many varieties, but they all share a core principle: efficiently locating a specific element within a dataset. Here's a breakdown of the general process, considering different factors:

**1. Data Structure and Search Type:**

The first step involves understanding the data structure you're searching within (array, list, tree, etc.) and the type of search being conducted (finding an exact match, a range of values, etc.).

**2. Initialization:**

The search algorithm typically starts by initializing variables. This might involve setting a counter to 0, defining the target element you're looking for, and sometimes establishing boundaries for the search space (depending on the algorithm).

**3. Looping and Comparison:**

Most search algorithms rely on loops to iterate through the data structure. Within the loop, the algorithm compares the current element with the target element.

* **Linear Search:** In a linear search, the comparison happens for each element in the data structure. If there's a match, the search ends, and the element's position is returned.
* **Binary Search:** This algorithm assumes the data is sorted. It starts by comparing the target element with the middle element.
  * If they match, the search ends, and the middle element's position is returned.
  * If the target element is smaller, the search continues only in the left half of the data structure (excluding the middle element).
  * If the target element is larger, the search continues only in the right half of the data structure (excluding the middle element).
  * This process of dividing the search space in half and comparing with the middle element continues until the target is found or the entire data structure is exhausted.
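The binary-search loop just described can be written directly in Python. A minimal sketch, assuming a sorted, 0-indexed list:

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or None if absent."""
    low, high = 0, len(sorted_items) - 1          # initialize the boundaries
    while low <= high:
        mid = (low + high) // 2                   # middle of the search space
        guess = sorted_items[mid]
        if guess == target:
            return mid                            # target found
        if guess < target:
            low = mid + 1                         # keep only the right half
        else:
            high = mid - 1                        # keep only the left half
    return None                                   # search space exhausted

print(binary_search([1, 3, 5, 6, 7, 8, 19], 8))   # -> 5
print(binary_search([1, 3, 5, 6, 7, 8, 19], 4))   # -> None
```

One detail worth noticing is the integer division in `(low + high) // 2`; in fixed-width languages the equivalent is usually written `low + (high - low) / 2` to avoid overflow on very large arrays.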
**4. Termination:**

The loop terminates under different conditions depending on the algorithm:

* **Target Found:** If a match is found during the comparison stage, the search ends, and the position or the element itself is returned.
* **Search Exhausted:** If the loop iterates through the entire data structure without finding a match, the search concludes with a "not found" result.

**5. Notes:**

* **Early Termination:** Some search algorithms might incorporate early termination conditions. For example, a linear search might stop if it encounters a special element indicating the target won't be found beyond that point.
* **Data Modification:** Search algorithms typically don't modify the data structure they're searching within. They access and compare elements but usually leave the data structure unchanged.

## Searching terminologies:

* **Target Element:** In a search operation, the target element represents the specific piece of information you're trying to locate within the data collection. This element can be of various data types depending on the context: a numerical value (e.g., finding a specific ID number), a record containing multiple data points (e.g., searching for a customer with a particular name and address), a key used for indexing purposes (e.g., locating a product by its unique product code), or any other data entity relevant to your search objective.

* **Search Space:** The search space encompasses the entirety of the data you're searching through to find the target element. The size and organization of the search space depend heavily on the chosen data structure. For instance, a search space might be an array of elements, a linked list of nodes, a tree with hierarchical relationships between elements, or a hash table with key-value pairs. Understanding the structure of the search space is crucial for selecting the most efficient search algorithm.

* **Complexity:** Search algorithms exhibit varying degrees of complexity based on two main factors: the data structure and the chosen algorithm itself. Complexity is typically measured in terms of time and space requirements.
  * **Time Complexity:** This refers to the amount of time (number of steps) an algorithm takes, on average or in the worst case, to find the target element. It's often expressed using Big O notation, which considers the growth rate of the execution time as the data size increases. For example, a linear search has a time complexity of O(n), signifying that the search time grows linearly with the number of elements (n) in the data structure. Binary search, on the other hand, boasts a logarithmic time complexity of O(log n), meaning the search time increases much more slowly as the data size grows.
  * **Space Complexity:** This refers to the amount of additional memory an algorithm requires beyond the space occupied by the data itself. Some search algorithms use extra memory for temporary variables or for keeping track of the search progress, which affects their space complexity.

* **Deterministic vs. Non-Deterministic Algorithms:**
  * **Deterministic Search Algorithms:** These algorithms follow a predefined, step-by-step approach to locate the target element. They consistently arrive at the same outcome given the same data and initial conditions. Binary search is a prime example: the algorithm always divides the search space in half based on a clear set of rules until the target is found or eliminated as a possibility.
  * **Non-Deterministic Search Algorithms:** These algorithms don't necessarily follow a fixed path during the search process. The order of element comparisons might vary, and the worst case can involve examining the entire search space. Linear search falls under this category: while it eventually finds the target element if it exists in the data, the number of comparisons it makes depends on the target's position within the data structure.

## Applications of Searching:

* **Image and Video Retrieval:** Search algorithms are used in image and video search engines to find images or videos based on user queries. These algorithms can analyze image content, such as colors, shapes, and textures, to match user queries with relevant results.

* **Natural Language Processing (NLP):** Search algorithms play a crucial role in various NLP tasks, including sentiment analysis, machine translation, and text summarization. They can be used to identify keywords, phrases, and semantic relationships within text data, enabling tasks like finding documents relevant to a specific topic or translating text from one language to another.

* **Machine Learning:** Search algorithms are fundamental for various machine learning applications. They are used in tasks like nearest neighbor classification, where new data points are classified based on their similarity to existing labeled data points identified through search algorithms. Additionally, search algorithms are used in anomaly detection to identify data points that deviate significantly from expected patterns.

* **Bioinformatics:** Search algorithms are crucial for analyzing biological sequences like DNA and protein sequences. They enable tasks like finding specific genes or protein motifs within large datasets, which is essential for research in genetics, drug discovery, and personalized medicine.

* **Recommendation Systems:** Search algorithms are used in recommendation systems to suggest relevant products, movies, music, or other items to users. These algorithms can analyze user behavior and search history to identify patterns and recommend items similar to what the user has shown interest in previously.

* **Network Security:** Search algorithms are used in intrusion detection systems (IDS) to identify suspicious network activity and potential security threats. They can analyze network traffic patterns to detect anomalies that might indicate malware or hacking attempts.

* **Robotics and Navigation:** Search algorithms are used in path planning for robots and autonomous vehicles. They help robots navigate their environment by finding the most efficient route from a starting point to a destination while avoiding obstacles.

-------------------------------------------------------------------------------- /my-articles/Algorithm/What is the algorithm-1?.md: --------------------------------------------------------------------------------

# What is the algorithm?

## Introduction to Algorithms Book:

Informally:
A well-defined computational procedure that takes some value (or set of values) as input and, in a finite amount of time, produces some value (or set of values) as output.

An algorithm is thus a sequence of computational steps that transform the input into the output.

It can also be said that an algorithm is a tool for solving a well-specified computational problem.

> (that is, an algorithm is both a method of computation and a tool for computation)
### Problem

A computational problem can be stated as an input/output relationship, defined over problem instances of arbitrary size.

In general, an algorithm provides a specific computational procedure for establishing that relationship between inputs and outputs on every instance of the problem.

### Example

For example, suppose we have to sort a sequence of numbers into nondecreasing order.

Here is how we formally define the sorting problem:

Input: A sequence of n numbers ⟨a1, a2, ..., an⟩.

Output: A permutation (reordering) ⟨a'1, a'2, ..., a'n⟩ of the input sequence such that a'1 <= a'2 <= ... <= a'n.

Thus, given the input sequence (31, 41, 59, 26, 41, 58), a correct sorting algorithm returns as output the sequence (26, 31, 41, 41, 58, 59). Such an input sequence is called an instance of the sorting problem.

(It might be confusing, but we take the jumbled input and give back the output sorted from smallest to largest.)

### **Choosing the Right Algorithm:**

The best algorithm for a specific task depends on various factors such as the number of items to be sorted, the degree to which they are already sorted, any restrictions on the values of the items, the computer's architecture, and the storage devices being used (like main memory, disks, or tapes).

### **Correctness of Algorithms:**

- An algorithm is considered correct if, for every input provided, it finishes its computation within a finite amount of time and produces the correct solution.
- A correct algorithm solves the given computational problem accurately.

> However, incorrect algorithms may not finish running at all on certain inputs, or they might produce incorrect results. Sometimes, though, incorrect algorithms can still be useful, especially if you can control their error rate.

### What kinds of problems are solved by algorithms?

Practical applications of algorithms are everywhere and include the following:

- The Human Genome Project is working to identify all human genes and decode the sequences in human DNA. This involves using advanced algorithms to store and analyze genetic data efficiently. Techniques like dynamic programming help solve biological problems, especially those related to DNA sequence similarities. These algorithms save time and money, allowing scientists to extract more information from their experiments.

- The internet allows people worldwide to access vast amounts of information quickly. Clever algorithms help websites manage and process this data efficiently. Examples of problems where algorithms play a crucial role include finding optimal data routes and using search engines to locate specific information on web pages.

- Electronic commerce allows for the electronic negotiation and exchange of goods and services. It relies on the privacy and security of personal information like credit card numbers, passwords, and bank statements. The key technologies used in electronic commerce include public-key cryptography and digital signatures, which are based on numerical algorithms and number theory (discussed in Chapter 31). These technologies help ensure the security of transactions and protect sensitive information during online exchanges.

- Manufacturing and commercial enterprises often face the challenge of allocating limited resources effectively.
For example: 63 | - An oil company may need to decide where to place wells to maximize profits. 64 | - A political candidate may want to spend campaign funds strategically to increase their chances of winning an election. 65 | - Airlines need to assign crews to flights efficiently, ensuring coverage while meeting regulatory requirements. 66 | - Internet service providers may need to decide where to allocate resources to better serve their customers. 67 | > These are all examples of problems that require optimization, where algorithms can help find the most beneficial solutions. 68 | 69 | ### We also demonstrate how to solve many specific problems, including the following: 70 | 71 | - You have a road map with distances between intersections marked, and you want to find the shortest route from one intersection to another. With countless possible routes, how do you determine the shortest one? You can start by modeling the road map as a graph, where intersections are vertices and roads are edges. Then, you can use algorithms to find the shortest path from one vertex (intersection) to another. Chapter 22 explains how to efficiently solve this problem. 72 | 73 | #### **These lists of algorithmic problems, though not exhaustive, share two common characteristics:** 74 | 75 | 1. **Many Candidate Solutions:** These problems typically have numerous possible solutions, but the vast majority of them do not actually solve the problem. Finding the best or optimal solution without explicitly examining every possibility can be quite challenging. 76 | 77 | 2. **Practical Applications:** The problems listed often have practical applications in various fields. For example, finding the shortest path in transportation networks has financial implications for companies like trucking or railroad firms, as shorter paths result in lower costs. Similarly, routing nodes on the internet need to find the shortest path to route messages quickly, and individuals use navigation apps to find driving directions efficiently. 78 | 79 | While not every algorithmic problem has a readily identifiable set of candidate solutions, there are still practical applications. For instance, the discrete Fourier transform, which converts signals from the time domain to the frequency domain, plays a crucial role in signal processing, data compression, and other fields. 80 | 81 | --- 82 | 83 | # Grokking Algorithms Book: 84 | 85 | An algorithm is a set of instructions for accomplishing a task 86 | 87 | > Every piece of code could be called an algorithm 88 | 89 | # Exercises: 90 | 91 | 1. **Understanding Algorithms:** 92 | 93 | - Define what an algorithm is in your own words. Explain why algorithms are important in computer science and problem-solving. 94 | 95 | 2. **Real-life Algorithms:** 96 | 97 | - Think of a common task you perform regularly, such as making a sandwich or getting ready for school/work. Describe the step-by-step instructions for completing this task. Discuss whether these instructions can be considered an algorithm and why. 98 | 99 | 3. **Algorithm Identification:** 100 | 101 | - Look around your environment and identify a process or task that involves a series of steps. Write down these steps and analyze whether they constitute an algorithm. Provide reasons for your analysis. 102 | 103 | 4. **Sorting Algorithm Understanding:** 104 | 105 | - Explain the concept of a sorting algorithm using a real-life example. Describe how sorting algorithms work in terms of rearranging items in a specific order. 106 | 107 | 5. 
**Algorithm Complexity:**

   - Discuss the concept of algorithmic complexity using examples from your daily life. Provide scenarios where the complexity of an algorithm impacts its efficiency in solving a problem.

6. **Algorithm Correctness:**

   - Describe a situation where the correctness of an algorithm is crucial. Explain why it is important for algorithms to produce correct results, even if they may take longer to execute.

7. **Algorithm Efficiency:**

   - Consider a problem-solving scenario and compare two different algorithms for solving it. Evaluate the efficiency of each algorithm based on factors like runtime and resource usage.

8. **Algorithm Applications:**

   - Identify a problem in your community or society that could benefit from algorithmic solutions, such as optimizing traffic flow or managing waste disposal. Discuss how algorithms could be applied to address this problem effectively.

9. **Algorithm Optimization:**

   - Take a basic algorithm, such as a simple sorting or searching algorithm, and brainstorm ways to optimize it for better performance. Discuss potential optimizations and their impact on the algorithm's efficiency.

10. **Algorithm Design:**
    - Design an algorithm to solve a specific problem, such as finding the shortest route between two locations in a city or organizing a list of tasks based on priority. Describe the step-by-step process of your algorithm and its expected outcome.

-------------------------------------------------------------------------------- /my-articles/Algorithm/What is the algorithm-2?.md: --------------------------------------------------------------------------------

# What is Algorithm

## INTRODUCTION

The word "Algorithm" is derived from the name of the Persian scholar **[Muhammad ibn Musa al-Khwarizmi](https://en.wikipedia.org/wiki/Al-Khwarizmi)**, a mathematician and astronomer of the ninth century. His work laid the foundation for algebra and the development of algorithmic processes in mathematics, and he is often referred to as the "father of algebra." Al-Khwarizmi's contribution to the definition of an algorithm is profound:

An **Algorithm** is a well-defined computational procedure, composed of a finite set of steps, that takes one or more inputs and produces an output. These steps form a systematic method for solving a problem or performing a calculation, which can be executed manually or by a machine (e.g., a computer).

The modern definition of an algorithm retains this concept while also emphasizing computational efficiency and precision. An algorithm is **finite** and **deterministic**, meaning it must have a definite end and the outcome must be predictable from the inputs.

### Definition of an Algorithm (by Al-Khwarizmi):

An algorithm is a systematic procedure for solving a mathematical problem in a finite number of steps, which includes well-defined instructions to achieve a specific outcome. **An Algorithm** is a precise rule or set of rules to solve a problem, especially by a computer, in a step-by-step manner. This concept is an evolution of Al-Khwarizmi's ideas, where algorithms were first applied to manual arithmetic and later formalized in various fields of computer science and mathematics.
14 | 15 | ### Modern Formal Definition: 16 | 17 | An **Algorithm** can be defined as a structured procedure that: 18 | 19 | - **Takes inputs**: Typically, an algorithm requires one or more inputs to start the computation. 20 | - **Performs computations**: It carries out a sequence of operations or instructions on the inputs, following a defined set of rules. 21 | - **Produces output**: It generates a result or output based on the computations made on the input. 22 | - **Terminates**: The process must end after a finite number of steps. 23 | 24 | An algorithm is used to solve problems by breaking them down into simple, executable steps that can be repeated for different inputs. The concept of an **algorithm** has evolved over time, becoming an essential foundation in computer science. In this context, we define an algorithm as a set of programs that express or implement that algorithm. This definition may seem abstract at first, but it provides a way to group similar programs that perform the same tasks. An algorithm is essentially a computational procedure that, given certain inputs, produces an expected output. Each step of the algorithm is well-defined, meaning that the instructions are clear and unambiguous. While there are many ways to express an algorithm (such as in programming languages), the idea behind it remains consistent. Even if different programmers write different implementations of the same algorithm, the core logic stays the same. 25 | 26 | For example, the **Mergesort algorithm** can be implemented in many different ways, but as long as the process of recursively dividing and merging the list is followed, all implementations would still belong to the category of the Mergesort algorithm. 27 | 28 | ### Programs vs. Algorithms 29 | 30 | One of the key points here is distinguishing between **programs** and **algorithms**. A program is a specific implementation of an algorithm in a particular programming language or system. When multiple programmers write different code to solve the same problem using the same algorithm, these programs are distinct from one another. However, they all express the same **algorithm**. 31 | 32 | We can think of algorithms as abstract concepts or blueprints for solving a problem, while programs are concrete implementations of that blueprint in a specific language or environment. To understand this further, consider the following example: 33 | 34 | - **Sorting Algorithms:** The concept of sorting data can be implemented through different algorithms such as **Mergesort** or **Quicksort**. However, each of these algorithms can also have multiple implementations or programs written by different programmers. 35 | 36 | - **Mergesort:** Imagine two programmers implement Mergesort using different programming languages (one in Python and another in Java). Both programs would be different, but they are considered part of the same **Mergesort algorithm** because they use the same underlying steps. 37 | 38 | - **Quicksort:** Similarly, two different implementations of the Quicksort algorithm could exist, and even though the code may differ, both would still represent the same Quicksort algorithm. 39 | 40 | ### Equivalence of Programs 41 | 42 | The set of all programs that implement a given algorithm can be grouped into **equivalence classes**. Two programs are considered **equivalent** if they are essentially implementing the same algorithm, regardless of differences in their code or language. 
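To keep the running example concrete, here is one member of the Mergesort equivalence class, a minimal Python sketch. Any other correct implementation, in any language or style, would be an equally valid representative of the same class:

```python
def merge_sort(items):
    """Recursively split the list, then merge the sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):   # merge the two sorted halves
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                   # one side may have leftovers
    merged.extend(right[j:])
    return merged

print(merge_sort([31, 41, 59, 26, 41, 58]))  # -> [26, 31, 41, 41, 58, 59]
```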
43 | 44 | For instance, if we take all the different programs that implement Mergesort, they belong to the same equivalence class because they perform the same sorting algorithm. This grouping allows us to define an algorithm as the set of all programs that belong to its equivalence class. 45 | 46 | ### [Category Theory and Algorithms](http://www.cs.man.ac.uk/~david/categories/book/book.pdf) 47 | 48 | In mathematical terms, this way of grouping programs into equivalence classes introduces a structure that can be analyzed using **category theory**. In this theory, the set of all programs is not considered a category, but the set of algorithms forms a category with additional structure. This structure helps in understanding the relationships between different algorithms and their implementations. 49 | 50 | The conditions that determine whether two programs are equivalent can be described as **coherence relations**. These relations define the rules by which different programs are grouped into the same equivalence class, enriching the category of algorithms with extra properties. 51 | 52 | ### Universal Properties of the Category of Algorithms 53 | 54 | **Universal properties** are a key concept in category theory, and they apply to algorithms as well. These properties help us understand how algorithms relate to one another and how they can be combined or transformed. Universal properties provide a formal framework for reasoning about algorithms, especially when we need to compare them or understand their behavior in a more abstract sense. 55 | 56 | ### Questions about the Definition of an Algorithm 57 | 58 | In their book _Introduction to Algorithms_, authors Cormen, Leiserson, Rivest, and Stein define an algorithm informally as a well-defined computational procedure that takes input and produces output. However, some questions arise from this definition: 59 | 60 | 1. **"Informally"?** – Given the technical nature of the book, some might expect a more formal definition of an algorithm. 61 | 2. **What is "well-defined"?** – The term "well-defined" means that the steps of the algorithm are clear and can be understood unambiguously by both humans and machines. 62 | 63 | 3. **What is a "procedure"?** – The word "procedure" refers to the specific series of steps followed to solve a problem, which is essentially what an algorithm is. 64 | 65 | Donald Knuth, a renowned computer scientist, offers a more precise perspective, stating that while algorithms are difficult to define formally, they are nevertheless **real mathematical objects**. We refer to algorithms with statements like "Mergesort runs in O(n log n) time" or "There does not exist an algorithm to solve the halting problem." Algorithms are just as real as numbers or mathematical sets. 66 | 67 | ### Algorithms as Abstract Objects 68 | 69 | Just like the number 42 is not tied to any particular representation (whether written as `42`, `XLII`, or `101010` in binary), an algorithm exists as an abstract concept. Multiple programs can implement the same algorithm, just like multiple sets of objects can represent the number 42. In this sense, an algorithm is defined similarly to how **Gottlob Frege** defined a natural number as the equivalence class of all sets of the same size. 70 | 71 | To summarize: 72 | 73 | - **Algorithms** are abstract entities that represent a specific way to solve a problem. 74 | - **Programs** are concrete implementations of these algorithms in a particular language or system. 
75 | - Different programs that implement the same algorithm are grouped into **equivalence classes**, allowing us to define an algorithm as the set of all equivalent programs. 76 | 77 | ### 1. Defining an Algorithm as a Set of Programs 78 | 79 | To explain this more clearly: an **algorithm** is not simply a single program or piece of code. Instead, it’s the **entire set of programs** that can be written to achieve the same task. For example, imagine a professor teaching a sorting algorithm like MergeSort. Students in the class might all go home and write different programs that implement MergeSort in various ways. Despite differences in code structure or approach, all of these programs are performing the same underlying sorting task. In this sense, all of these programs belong to the same "equivalence class," which defines the algorithm as a whole. 80 | 81 | This leads to the idea that algorithms are essentially collections of programs. Two programs belong to the same equivalence class (or algorithm) if they perform the same function. The algorithm itself is defined by this collection of equivalent programs. 82 | 83 | ### 2. What Does It Mean for Two Programs to be "Essentially" the Same? 84 | 85 | When two programs are considered "essentially" the same, it means they are performing the same task but may have small differences in their implementation. 86 | 87 | - One program performs two unrelated processes, let's call them **Process1** and **Process2**, in a certain order. Another program might perform these processes in the reverse order. Even though the order of operations differs, the overall result is the same. 88 | - One program uses a loop to repeat a certain task **n** times. Another program "unwinds" the loop, meaning it explicitly writes out the steps rather than using a loop structure. Again, these two programs achieve the same result, but in slightly different ways. 89 | - One program might perform two separate tasks within a single loop, while another program might split these tasks into two separate loops. Despite this structural difference, both programs are effectively doing the same thing. 90 | 91 | In these examples, even though the programs may look different at the level of code, they are performing the same underlying task or function. Therefore, they belong to the same algorithm. 92 | 93 | ### 3. Algorithms as Equivalence Classes 94 | 95 | The key idea is that an **algorithm** is the collection of all programs that can achieve the same result. These programs are grouped into what are called **equivalence classes**. The notion of an equivalence class is a common concept in mathematics: two things are considered equivalent if they share some essential property. In this case, two programs are considered equivalent if they implement the same algorithm. 96 | 97 | For example, all the programs that implement MergeSort are considered part of the same equivalence class. Similarly, all the programs that implement QuickSort belong to a different equivalence class. In the language of the text, the "set of programs" is partitioned into these equivalence classes, where each class represents a distinct algorithm. 98 | 99 | ### 4. Subjectivity of Equivalence 100 | 101 | One important point is that the decision of whether two programs are "essentially the same" is somewhat subjective. Different people or contexts might have different criteria for considering two programs equivalent. 
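To make the "essentially the same" relation from section 2 concrete, the two Python functions below perform two tasks in a single loop and in two separate loops, respectively. (The functions are illustrative inventions, not taken from any source.) Most readers would call them the same algorithm, even though the code differs:

```python
def sum_and_max_one_loop(numbers):
    """Compute the total and the maximum in a single pass."""
    total, largest = 0, numbers[0]
    for n in numbers:
        total += n
        largest = max(largest, n)
    return total, largest

def sum_and_max_two_loops(numbers):
    """Same task, with the work split across two separate loops."""
    total = 0
    for n in numbers:
        total += n
    largest = numbers[0]
    for n in numbers:
        largest = max(largest, n)
    return total, largest

# Different code shape, same essential procedure: under the relations
# discussed here, both belong to one equivalence class, i.e. one algorithm.
assert sum_and_max_one_loop([3, 1, 4]) == sum_and_max_two_loops([3, 1, 4]) == (8, 4)
```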
The relations used to decide this equivalence are not set in stone; others might come up with different or additional ways of defining equivalence between programs. Despite this subjectivity, there are some standard relations that most people would agree on. These relations correspond to what are called **categorical coherence rules** in mathematics, and when we "mod out" by them (apply the equivalence relation), the set of programs becomes a more structured object, called a **category**.

### 5. Moving from Programs to Algorithms to Computable Functions

- **Programs**: These are the concrete implementations that software engineers write. They are individual pieces of code that perform tasks.
- **Algorithms**: These are equivalence classes of programs. An algorithm is a collection of programs that perform the same task, even if they differ in implementation.
- **Computable Functions**: At the highest level, we have functions. A computable function is something that can be calculated by a program or algorithm. For example, the function "sort" takes a list of numbers and returns the list in sorted order.

Different algorithms can compute the same function. For example, MergeSort and QuickSort are two different algorithms, but both compute the same **function**: sorting a list of numbers.

### 6. Mapping Between Levels

- There is a mapping (or function) from **programs to algorithms**. This mapping takes each program and assigns it to the equivalence class (algorithm) that it belongs to.
- There is also a mapping from **algorithms to computable functions**. This takes each algorithm and assigns it to the computable function that it implements.

### Visual Representation:

```
Programs → Algorithms → Computable Functions
```

Each level represents a higher level of abstraction:

- **Programs** are specific code implementations.
- **Algorithms** are groups of programs that do the same task.
- **Computable Functions** are abstract mathematical functions that can be computed by various algorithms.

The idea here is that programs can be grouped together if they are "essentially the same," forming an equivalence class. Two programs are considered to be in the same equivalence class if they perform the same essential task, even if the details of how they achieve it differ. For example, one program might execute certain processes in a different order, or it might loop through a task in a slightly different manner. Despite these differences, both programs can be seen as implementations of the same algorithm. An algorithm, in this sense, is the sum of all the programs that implement it.

This leads to a picture where all programs are divided into subsets (equivalence classes), and each subset represents one algorithm. This gives us a set of equivalence classes called "Algorithms." Formally, we can define a function φ that maps a program to its corresponding algorithm: φ takes a program and assigns it to its equivalence class. That algorithm is the essential idea behind all the programs in the class. Additionally, there could be another function ψ that, when applied to an algorithm, produces a specific program implementing it.

It's important to recognize that there are different ways to compare programs for "sameness." Some methods are stricter, while others are more lenient.
In the strictest interpretation, every program would be considered its own unique algorithm, meaning every program is distinct. In the loosest interpretation, two programs would be considered the same whenever they produce the same result, leading to a view where every program that computes the same result is just an expression of the same computable function.

In the middle of these two extremes lies the approach taken in this discussion, where programs that are "essentially" the same are grouped together into equivalence classes, but these groups still distinguish between algorithms that compute the same result in fundamentally different ways (like different methods for sorting data). Other equivalence relations can exist that further fine-tune how programs are grouped into algorithms, and different relations will lead to different structures.

The notion of algorithms forming a set is not just abstract. It forms a mathematical structure called a category, which has some specific properties. For instance, the category of algorithms in this case has a Cartesian product structure, which means that the category supports the idea of combining algorithms, and it has a special way of handling natural numbers, meaning it can express recursive algorithms.

Although related categories have been studied in various forms, the connection between algorithms and this specific categorical structure is relatively new. Just as rational numbers are defined as equivalence classes of integer pairs, algorithms can be thought of as equivalence classes of programs. When we write an algorithm down, we're really just writing one of its programs, which is why algorithms are often presented in pseudo-code: it's a way to avoid being tied to any specific programming language. Pseudo-code captures the essential steps of an algorithm without specifying exactly how the algorithm should be implemented.

A nice analogy here is to think about how rational numbers, such as 3/5, are equivalence classes. Just as there are many ways to express the same rational number (like 3/5, 6/10, or 3000/5000), there are many ways to express an algorithm through different programs. While we often prefer a "canonical" representation of a rational number, like 3/5, we might wish for a similarly preferred or canonical representation of an algorithm. This concept, though, is speculative and explored further in later parts of the paper.

The next question is: which programming language should be used to express algorithms? Instead of picking a specific programming language, one can use the language of descriptions of **primitive recursive functions**. This language is simple, elegant, and familiar to many readers. It focuses on three operations: **Composition** (sequential processing), **Bracket** (parallel processing), and **Recursion** (looping). Primitive recursive functions are an important subset of computable functions, and their descriptions can be seen as programs that calculate functions. Although this framework is limited to primitive recursive functions for now, it still provides interesting insights and results. There is also ongoing work to expand this to cover all recursive functions, which would give a more comprehensive framework.
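As a rough illustration of those three operations, here is a small Python sketch. The encoding is this note's own assumption (the formal theory works with symbolic descriptions, not Python functions), but it shows how Composition, Bracket, and Recursion assemble complex functions from simple ones:

```python
def compose(f, g):
    """Composition: feed g's output into f (sequential processing)."""
    return lambda *xs: f(g(*xs))

def bracket(f, g):
    """Bracket: run f and g on the same input (parallel processing)."""
    return lambda *xs: (f(*xs), g(*xs))

def prim_rec(base, step):
    """Recursion: h(0, x) = base(x); h(n + 1, x) = step(n, h(n, x), x)."""
    def h(n, x):
        acc = base(x)
        for i in range(n):        # a loop realizes the recursion scheme
            acc = step(i, acc, x)
        return acc
    return h

# Addition as a primitive recursive description: add(n, x) = n + x.
add = prim_rec(base=lambda x: x, step=lambda i, acc, x: acc + 1)
print(add(3, 4))  # -> 7
```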
### Key Properties of an Algorithm:

1. **Input**: An algorithm must accept a finite number of inputs. These inputs represent the data to be processed.
2. **Output**: It must produce at least one output or result, which is the solution or a decision made by following the steps of the algorithm.
3. **Definiteness**: Each instruction in the algorithm must be clear, unambiguous, and precisely defined. This ensures that every step is understood without confusion.
4. **Finiteness**: The algorithm must terminate after a finite number of steps. It cannot continue indefinitely.
5. **Effectiveness**: Each step of the algorithm must be basic enough to be carried out manually or executed by a machine.

### Criteria for Evaluating an Algorithm:

When analyzing or designing an algorithm, the following factors are considered:

1. **Data Structures**: What data structures (lists, queues, stacks, heaps, trees, etc.) should be used to implement the algorithm?
2. **Correctness**: Is the algorithm correct? Does it always produce the correct output, or only in some cases?
3. **Efficiency**: How efficient is the algorithm? Efficiency is measured in terms of time complexity and space complexity, and it usually depends on the size of the input.
4. **Complexity**: Does an efficient algorithm exist for the problem? This leads to the famous **P vs NP** problem, a fundamental question in theoretical computer science about the existence of polynomial-time algorithms for NP problems.

### Conclusion

In conclusion, algorithms are foundational concepts in both mathematics and computer science, originating from the work of al-Khwarizmi, a scholar whose contributions laid the groundwork for modern computational thinking. An algorithm is a systematic, well-defined sequence of steps used to solve problems or perform computations, with key characteristics like finiteness, determinism, and clarity of execution. As abstract entities, algorithms transcend the specific programs that implement them, representing the core logic of problem-solving across various languages and platforms.

By viewing algorithms as equivalence classes of programs, we can appreciate how different implementations of the same algorithm belong to the same conceptual framework, even if they vary in code structure or language. This abstraction helps us understand how diverse programs can achieve the same task and how algorithms relate to broader mathematical concepts, such as computable functions.

Furthermore, the application of category theory provides a deeper understanding of the relationships between algorithms, showing how they can be grouped and compared based on coherence and equivalence. In essence, algorithms are the bridge between abstract mathematical functions and concrete computational procedures, enabling efficient and precise problem-solving in both theoretical and practical contexts.

-------------------------------------------------------------------------------- /my-articles/Data Structure/Introduction of Data Structure.md: --------------------------------------------------------------------------------

### Introduction to Analysis of Algorithms

**Algorithm analysis** is a key area within computational complexity theory, which estimates the theoretical resources required by an algorithm to solve a given computational problem. It plays a critical role in determining how efficiently an algorithm performs, particularly in terms of time and space.
Most algorithms are designed to handle inputs of arbitrary length, meaning the algorithm must perform regardless of the size of the data. Analyzing algorithms helps us understand their performance for different input sizes, providing insights into the scalability and efficiency of an algorithm. The efficiency of an algorithm is commonly expressed in terms of:

- **Time Complexity**: This measures how the runtime of an algorithm changes as the input size increases. It is often represented by Big-O notation, which provides an upper bound on the time it takes for the algorithm to run based on input size.
- **Space Complexity**: This measures how much memory an algorithm uses relative to the input size. It is crucial for understanding how much additional storage is required when executing the algorithm.

### Types of Algorithm Analysis

There are four main types of algorithm analysis:

1. **Worst-Case Analysis**:

   - This refers to the maximum number of steps or resources an algorithm will need for any input of size `n`. Worst-case analysis is important for ensuring that the algorithm will perform efficiently under the most difficult circumstances.
   - **Example**: In a linear search algorithm, the worst case occurs when the target element is at the very end of the list, requiring the algorithm to scan every element before finding it.

2. **Best-Case Analysis**:

   - This calculates the minimum number of steps required by the algorithm for any input of size `n`. While useful, best-case analysis is less significant in real-world applications because it only reflects the most favorable input scenario.
   - **Example**: In the same linear search algorithm, the best case is when the target element is the first element, meaning the search ends after one comparison.

3. **Average-Case Analysis**:

   - This computes the average number of steps the algorithm will take for a random input of size `n`. Average-case analysis provides a more realistic expectation of performance compared to best- and worst-case scenarios.
   - **Example**: In sorting algorithms like quicksort, the average case might consider random input orders and derive the expected number of comparisons.

4. **Amortized Analysis**:
   - Amortized analysis looks at a sequence of operations on a data structure and provides an average performance over time. This is particularly useful when some operations can be expensive, but their cost is "amortized" by many cheaper operations.
   - **Example**: In dynamic array resizing, while resizing can be expensive, it happens infrequently, so the average cost per insertion is low when considering multiple insertions.
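A quick way to internalize the first three cases is to count comparisons directly. A minimal sketch (returning the comparison count alongside the index is an illustrative choice, not a standard API):

```python
def linear_search(items, target):
    """Return (index, comparisons made); index is None if absent."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return None, comparisons

data = list(range(1, 101))       # 1..100, so n = 100
print(linear_search(data, 1))    # best case:  (0, 1),   first element
print(linear_search(data, 100))  # worst case: (99, 100), last element
print(linear_search(data, 50))   # (49, 50); a randomly placed present
                                 # target costs about (n + 1) / 2 on average
```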
### Importance of Algorithm Analysis

Algorithm analysis helps identify the efficiency of an algorithm in terms of **CPU time**, **memory usage**, **disk usage**, and **network usage**. Among these, **CPU time** (time complexity) is typically the most critical factor when evaluating algorithms.

It's important to distinguish between **performance** and **complexity**:

- **Performance**: This measures how much time or resources (memory, disk, etc.) are used when a program is run. Performance depends on several factors, including the hardware (machine specifications), software (compiler optimizations), and the algorithm itself.

  - **Example**: A sorting algorithm may take 10 milliseconds on one machine and 20 milliseconds on another, depending on CPU speed.

- **Complexity**: Complexity examines how the resource requirements of an algorithm scale as the problem size increases. This provides a more general measure of the algorithm's efficiency, independent of the specific machine or environment.
  - **Example**: If an algorithm has a time complexity of O(n^2), its runtime will grow quadratically as the input size increases.

### Role of Algorithms in Computing

Algorithms are at the heart of computing, providing precise sets of instructions that a computer must follow to perform a task or solve a problem. Whether sorting data, processing images, or searching for information, algorithms help computers execute tasks efficiently and accurately. Algorithm efficiency is critical because it directly impacts the performance of computer systems in various industries.

### Applications of Algorithms in Various Industries

Algorithms play a significant role in many industries by optimizing operations, enhancing decision-making, and improving efficiency. Some examples include:

1. **Manufacturing**: Algorithms are used to optimize production processes and supply chain management. This includes reducing waste, improving scheduling, and enhancing overall efficiency.

   - **Example**: The use of algorithms in inventory management to determine the most efficient production schedules for minimizing waste.

2. **Finance**: Algorithms are employed to analyze financial data, detect patterns, and make predictions, enabling traders and investors to make informed decisions.

   - **Example**: High-frequency trading algorithms that buy and sell assets within milliseconds to capitalize on price fluctuations.

3. **Healthcare**: Medical algorithms process and analyze medical images, assist in diagnosing diseases, and help optimize treatment plans.

   - **Example**: Algorithms used in MRI scans to detect abnormalities or in predictive models for diagnosing potential health issues.

4. **Retail**: Algorithms are vital for customer relationship management, personalized product recommendations, and pricing optimization, improving sales and customer satisfaction.

   - **Example**: Recommender systems used by e-commerce platforms to suggest products based on user behavior.

5. **Transportation**: Algorithms help optimize routes for delivery and transportation, reducing fuel consumption and improving delivery times.

   - **Example**: GPS navigation systems that calculate the most efficient route based on traffic data and road conditions.

6. **Energy**: Energy companies use algorithms to optimize energy generation, distribution, and consumption, leading to reduced energy waste and enhanced efficiency.

   - **Example**: Smart grid algorithms for balancing electricity supply and demand across a power network.

7. **Security**: In cybersecurity, algorithms are crucial for detecting and preventing threats like hacking, fraud, and cyber-attacks.
   - **Example**: Machine learning algorithms that detect anomalies in network traffic to prevent cyber-attacks.

### Key Applications of Algorithms in Computing

Algorithms are fundamental in many aspects of computing, including:
**Data Processing**: Algorithms are essential for handling large amounts of data, whether sorting, searching, or organizing it.
86 |
87 |    - **Example**: Sorting algorithms like mergesort or quicksort, which organize data in a particular order for faster access.
88 |
89 | 2. **Problem Solving**: Algorithms are used to solve computational problems such as optimization, mathematical problems, and decision-making processes.
90 |
91 |    - **Example**: Algorithms like Dijkstra’s algorithm find the shortest path in graphs, solving optimization problems efficiently.
92 |
93 | 3. **Computer Graphics**: Algorithms are used in creating, processing, and compressing images and graphics.
94 |
95 |    - **Example**: JPEG compression algorithms that reduce image file sizes while maintaining visual quality.
96 |
97 | 4. **Artificial Intelligence**: AI systems rely on algorithms for machine learning, natural language processing (NLP), and computer vision tasks.
98 |
99 |    - **Example**: Neural network algorithms used in image recognition, speech processing, and decision-making systems.
100 |
101 | 5. **Database Management**: Algorithms are critical in managing large databases, such as indexing algorithms and query optimization algorithms that make data retrieval more efficient.
102 |
103 |    - **Example**: B-trees are used in databases to manage sorted data and allow efficient insertion, deletion, and searching operations.
104 |
105 | 6. **Network Communication**: Efficient communication and data transfer across networks depend on algorithms such as routing and error-correction algorithms.
106 |
107 |    - **Example**: The TCP/IP protocol stack uses algorithms to ensure reliable data transmission over the internet.
108 |
109 | 7. **Operating Systems**: Operating systems rely on algorithms for process scheduling, memory management, and disk management.
110 |    - **Example**: Round-robin scheduling algorithms allocate CPU time fairly among processes in a multitasking operating system.
111 |
112 | ## Efficiency of Algorithms
113 |
114 | When analyzing algorithms, two critical factors determine their efficiency: space efficiency (memory usage) and time efficiency (execution time). These factors can impact how well an algorithm performs, especially for large datasets or time-sensitive tasks. Let's go through each concept in detail.
115 |
116 | ### 1. Space Efficiency (Space Complexity)
117 |
118 | Space efficiency refers to the amount of memory an algorithm requires during its execution. It is important when working with large datasets or in environments with limited memory, such as embedded systems. Space complexity is typically measured based on the following:
119 |
120 | #### Components of Memory Use:
121 |
122 | - **Instruction Space**: Memory required to store the program's instructions. This is affected by factors like:
123 |
124 |   - The compiler used
125 |   - Compiler optimization options
126 |   - The architecture of the target machine (such as the CPU)
127 |
128 | - **Data Space**: Memory required for storing variables. This includes:
129 |
130 |   - The size of the data, dynamically allocated memory, and static program variables.
131 |
132 | - **Run-time Stack Space**: Memory used by the program's call stack. This is affected by:
133 |   - Function calls and recursion (which lead to more stack frames)
134 |   - Local variables and parameters passed during function calls.
135 |
136 | #### Static vs.
Dynamic Memory Components: 137 | 138 | - **Fixed/Static Components**: These are determined at compile-time, such as the memory used by machine instructions and static variables. This size does not change during execution. 139 | - **Variable/Dynamic Components**: These are determined at run-time, such as the memory used by recursion and dynamically allocated memory. 140 | 141 | ### 2. Time Efficiency (Time Complexity) 142 | 143 | Time efficiency measures how long an algorithm takes to run. This can be influenced by several factors: 144 | 145 | - **Speed of the computer**: This includes the CPU, I/O operations, memory access speeds, etc. 146 | - **Compiler and compiler options**: Different compilers can optimize code differently, impacting execution time. 147 | - **Size of the input**: Algorithms often run more slowly on larger datasets. 148 | - **Nature of the input**: In certain algorithms (like searching), the structure of the input can affect how long it takes to complete. For instance, in a linear search, if the desired element is at the beginning of the list, the search will be faster compared to when it is at the end. 149 | 150 | ### Recursive Algorithm Analysis 151 | 152 | To analyze recursive algorithms, follow these steps: 153 | 154 | #### Steps for Analysis: 155 | 156 | 1. **Identify a parameter** indicating the size of the input. This will help define the recursive relation. 157 | 2. **Identify the basic operation**: This is the fundamental step repeated in the algorithm (e.g., a multiplication or a disk move in Tower of Hanoi). 158 | 3. **Analyze the number of times the basic operation is executed**: Depending on the input, this might vary, so consider: 159 | 160 | - **Worst-case**: The scenario where the algorithm performs the maximum number of operations. 161 | - **Average-case**: The scenario where the algorithm performs a typical number of operations for random input. 162 | - **Best-case**: The scenario where the algorithm performs the fewest operations. 163 | 164 | 4. **Set up a recurrence relation**: A recurrence relation expresses the number of basic operations executed based on the input size. 165 | 166 | 5. **Solve the recurrence relation**: Either find an exact solution or estimate the order of growth (big-O notation). 167 | 168 | ### Example: Recursive Evaluation of Factorial $n!$ 169 | 170 | The factorial function $n!$ is defined as the product of all positive integers up to $n$. For instance: 171 | 172 | - $4! = 1 \times 2 \times 3 \times 4 = 24$ 173 | 174 | #### Recursive Definition of Factorial 175 | 176 | Factorial can be defined recursively as follows: 177 | 178 | - $F(n) = F(n-1) \times n$ for $n \geq 1$ 179 | - $F(0) = 1$ 180 | 181 | Here is an algorithm to compute $n!$: 182 | 183 | ``` 184 | ALGORITHM F(n) 185 | // Computes n! recursively 186 | // Input: A nonnegative integer n 187 | // Output: The value of n! 188 | if n = 0 return 1 189 | else return F(n - 1) * n 190 | ``` 191 | 192 | #### Time Complexity Analysis for Factorial: 193 | 194 | To compute the factorial of $n$, the algorithm must perform $n$ multiplications. Here's a recurrence relation for the number of multiplications $M(n)$: 195 | 196 | - $M(n) = M(n-1) + 1$ for $n > 0$ 197 | - $M(0) = 0$ (No multiplications when $n = 0$) 198 | 199 | Solving this recurrence gives: 200 | 201 | - $M(n) = M(n-1) + 1$ 202 | - $M(n-1) = M(n-2) + 1$ 203 | - $M(n) = M(n-2) + 2$ 204 | - $M(n) = M(n-3) + 3$ 205 | - ... 
206 | - $M(n) = M(n-i) + i$, and after $i = n$ substitutions:
207 | - $M(n) = M(0) + n = n$
208 |
209 | Thus, the time complexity of this algorithm is $O(n)$ because it performs exactly $n$ multiplications.
210 |
211 | ### Example: Tower of Hanoi
212 |
213 | The Tower of Hanoi is a classic recursive problem. The objective is to move a set of disks from one rod to another, using a third rod as an auxiliary, while following certain rules:
214 |
215 | 1. Only one disk can be moved at a time.
216 | 2. A larger disk cannot be placed on a smaller disk.
217 |
218 | #### Recursive Relation for Tower of Hanoi:
219 |
220 | - If there are $n$ disks, the recursive relation is:
221 |   $M(n) = 2M(n-1) + 1$
222 |   where $M(n)$ is the number of moves required to transfer $n$ disks: move $n-1$ disks to the auxiliary rod, move the largest disk, then move the $n-1$ disks back on top of it.
223 | - **Initial Condition**: When $n = 1$, only one move is required: $M(1) = 1$.
224 |
225 | #### Solving the Recurrence:
226 |
227 | Using backward substitution to solve the recurrence:
228 | $M(n) = 2M(n-1) + 1$
229 | $= 2[2M(n-2) + 1] + 1 = 2^2M(n-2) + 2 + 1$
230 | $= 2^3M(n-3) + 2^2 + 2 + 1$
231 | $= 2^{n-1}M(1) + 2^{n-2} + \dots + 2 + 1$
232 | $M(n) = 2^n - 1$
233 |
234 | Thus, the number of moves grows exponentially with $n$, and the time complexity of Tower of Hanoi is $O(2^n)$. The short recursive sketch below makes this count easy to verify.
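To make the closed form concrete, here is a minimal Python sketch (an illustration for this article, not taken from the repository's examples) that performs the moves recursively and checks the count against $2^n - 1$:

```python
def hanoi(n, source, target, auxiliary, moves):
    """Move n disks from source to target, recording every single-disk move."""
    if n == 1:
        moves.append((source, target))                  # base case: one disk, one move
        return
    hanoi(n - 1, source, auxiliary, target, moves)      # step 1: clear the top n-1 disks
    moves.append((source, target))                      # step 2: move the largest disk
    hanoi(n - 1, auxiliary, target, source, moves)      # step 3: restack the n-1 disks

for n in range(1, 6):
    moves = []
    hanoi(n, "A", "C", "B", moves)
    print(n, len(moves), 2**n - 1)  # the two counts always agree: M(n) = 2^n - 1
```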
235 | ### Analyzing the Time Efficiency of Non-Recursive Algorithms
236 |
237 | The time efficiency of non-recursive algorithms can be analyzed through a systematic approach. This process involves identifying critical factors such as the size of the input, the basic operation of the algorithm, and how the execution of that operation scales with the input size. Here’s how this analysis is conducted step-by-step:
238 |
239 | 1. **Decide on a Parameter (or Parameters) Indicating an Input’s Size**
240 |    The input's size is a crucial factor in determining an algorithm's time efficiency. For example, if an algorithm processes a list, the input's size would typically be the number of elements in the list (denoted as $n$).
241 |
242 | 2. **Identify the Algorithm’s Basic Operation**
243 |    The basic operation is the fundamental computation or comparison repeated most frequently in the algorithm. It is typically found in the innermost loop. For example, in a sorting algorithm, the basic operation might be comparing two elements.
244 |
245 | 3. **Check Whether the Number of Times the Basic Operation is Executed Depends Only on Input Size**
246 |    The execution count of the basic operation may depend solely on the input size or also on other factors, such as the specific data or structure of the input. For example, in some algorithms the execution count may differ between the best, average, and worst-case scenarios. When this is the case, separate analyses for worst-case, average-case, and best-case efficiencies are necessary.
247 |
248 | 4. **Set Up a Sum for the Number of Times the Basic Operation is Executed**
249 |    A sum is established to express how many times the basic operation is executed as a function of the input size. For example, if a loop executes $n-1$ times, the sum would reflect that repetition.
250 |
251 | 5. **Use Standard Formulas and Rules of Sum Manipulation**
252 |    Using mathematical formulas, the sum is simplified to find either a closed-form expression or at least the asymptotic growth rate of the algorithm’s time complexity. The goal is to determine the algorithm's "order of growth," typically expressed using Big-O notation (e.g., $O(n)$, $O(n^2)$).
253 |
254 | ---
255 |
256 | ### Example: Finding the Maximum Element in an Array
257 |
258 | Let’s consider an algorithm that finds the maximum element in an array of $n$ elements.
259 |
260 | **Algorithm: Max Element**
261 | The following pseudocode shows a simple algorithm for finding the largest element in an array:
262 |
263 | ```
264 | // Input: Array A[0..n-1] of real numbers
265 | // Output: The value of the largest element in A
266 | Max_val ← A[0]
267 | for i ← 1 to n − 1 do
268 |     if A[i] > Max_val
269 |         Max_val ← A[i]
270 | return Max_val
271 | ```
272 |
273 | **Algorithm Analysis**:
274 |
275 | - **Input Size**: The input size is the number of elements in the array, denoted as $n$.
276 | - **Basic Operation**: The basic operation is the comparison `A[i] > Max_val`, as it occurs on every iteration of the loop.
277 | - **No Best, Worst, or Average Case Distinction**: In this algorithm, the number of comparisons remains the same regardless of the order of elements in the array. Therefore, the analysis applies to all cases.
278 | - **Sum of Basic Operations**: Since the comparison is executed once per iteration of the loop, and the loop runs from $i = 1$ to $i = n-1$, the number of comparisons $C(n)$ is:
279 |
280 |   $C(n) = \sum_{i=1}^{n-1} 1 = n - 1$
281 |
282 | This indicates that the time complexity is $O(n)$, meaning the algorithm’s execution time grows linearly with the size of the array. A direct translation of the pseudocode into Python follows.
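The sketch below mirrors the pseudocode, with a hypothetical `comparisons` counter added purely to expose the basic operation:

```python
def max_element(A):
    """Return the largest value in A together with the number of comparisons made."""
    max_val = A[0]
    comparisons = 0
    for i in range(1, len(A)):
        comparisons += 1          # the basic operation: A[i] > max_val
        if A[i] > max_val:
            max_val = A[i]
    return max_val, comparisons

print(max_element([3, 9, 2, 7]))  # (9, 3) -- exactly n - 1 comparisons for n = 4
```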
284 |
285 | ---
286 |
287 | ### Empirical Analysis of Algorithms
288 |
289 | Empirical analysis relies on observed, measurable evidence. It is an essential tool in analyzing algorithms, especially when theoretical analysis alone may not provide a complete picture. In practice, it means measuring an algorithm's performance by running it on actual input data and recording its behavior.
290 |
291 | Steps of empirical analysis include:
292 |
293 | 1. **Observation**:
294 |    Initial observations of the algorithm's behavior are made, often by running it on different datasets. These observations spark ideas or lead to hypotheses about the algorithm's performance characteristics.
295 |
296 |    **Example**: Observing how the algorithm for finding the maximum element performs on small vs. large arrays can lead to insights about its time efficiency.
297 |
298 | 2. **Induction**:
299 |    Based on the observed data, a probable explanation for the algorithm's behavior is proposed. Inductive reasoning is used to generalize the specific results from the observations.
300 |
301 |    **Example**: After observing that the algorithm takes longer with larger arrays, one might hypothesize that its time complexity is linear.
302 |
303 | 3. **Deduction**:
304 |    A testable hypothesis is formulated, which can be verified by conducting more experiments or using theoretical analysis. Deductive reasoning takes the general explanation and predicts specific outcomes that can be tested.
305 |
306 |    **Example**: Based on the hypothesis that the time complexity is $O(n)$, one might predict that doubling the array size will roughly double the running time.
307 |
308 | 4. **Testing**:
309 |    Quantitative and qualitative data are gathered through experimentation. This data is often analyzed statistically to confirm or refute the hypothesis. The results can support the hypothesis, refute it, or be neutral.
310 |
311 |    **Example**: Running the algorithm on arrays of various sizes and measuring the running time would provide empirical data to support or refute the hypothesis about time complexity.
312 |
313 | 5. **Evaluation**:
314 |    After gathering and analyzing the empirical data, conclusions are drawn, and the results are documented. This stage may include discussing any limitations encountered during testing and suggestions for future research or improvements.
315 |
316 | ---
317 |
318 | ### Example of Empirical Analysis:
319 |
320 | Suppose we are analyzing the `Max Element` algorithm through empirical testing by running it on arrays of increasing size and recording the running time. If we observe that doubling the array size roughly doubles the running time, this supports the conclusion that the time complexity is linear ($O(n)$). A minimal measurement script below illustrates this.
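The following is a sketch of such a measurement (assuming plain CPython; absolute times will differ across machines, but the ratio between successive rows is what matters):

```python
import random
import time

def max_element(A):
    max_val = A[0]
    for x in A:
        if x > max_val:
            max_val = x
    return max_val

for n in [100_000, 200_000, 400_000, 800_000]:
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    max_element(data)
    elapsed = time.perf_counter() - start
    print(f"n = {n:>7}: {elapsed:.4f} s")
# If doubling n roughly doubles the elapsed time, the O(n) hypothesis is supported.
```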
321 | In summary, analyzing non-recursive algorithms involves understanding how the number of basic operations scales with input size. Empirical analysis complements theoretical analysis by providing real-world performance data. Both methods provide insight into an algorithm's efficiency and help optimize performance based on practical requirements.
322 |
323 | **[Related link for more information about this section](https://github.com/m-mdy-m/TechShelf/tree/main/Algorithms/Analysis)**
324 |
--------------------------------------------------------------------------------
/my-articles/Data Structure/STRING/String Manipulation and Algorithms.md:
--------------------------------------------------------------------------------
1 | # String Manipulation and Algorithms
2 |
3 | ## Introduction
4 |
5 | The digital world thrives on information, and a substantial portion of this information resides in the form of text. String manipulation and searching algorithms serve as the foundation for processing and analyzing this textual data. These algorithms empower us to perform a wide range of operations on strings, from the fundamental act of combining words (concatenation) to the intricate task of identifying specific patterns buried within vast amounts of text.
6 |
7 | - **String Manipulation:** Imagine building with words. String manipulation algorithms act as our tools, allowing us to create new strings by joining existing ones (concatenation), extract specific portions (substrings), or determine the length of our textual building blocks. These fundamental operations are the backbone for tasks like preparing data for analysis, generating text, and extracting valuable information from documents.
8 |
9 | **A Real-World Example: Genomic Research and String Searching**
10 |
11 | In the realm of genomic research, scientists analyze vast amounts of DNA sequence data. String searching algorithms become instrumental here. Imagine a scenario where researchers aim to identify specific genes within a long DNA sequence. These algorithms function as powerful tools, allowing scientists to efficiently locate these gene sequences (specific patterns) within the DNA data. This process enables them to pinpoint genes associated with specific diseases or traits, ultimately furthering our understanding of human biology.
12 |
13 | ## String Searching Algorithms: A Complexity Perspective
14 |
15 | - **Time Complexity: A Measure of Algorithmic Speed**
16 |
17 |   Time complexity quantifies the execution time of an algorithm, typically expressed as a function of the input data size (often denoted by "n"). In the context of string searching, the input data encompasses the text string (length n) and the pattern string to be located. Time complexity analysis facilitates our understanding of how the execution time scales with increasing input size. Ideally, we seek algorithms with time complexity that exhibits slow growth or remains constant as the input size expands. This ensures efficient execution even when dealing with vast amounts of textual data.
18 |
19 | - **Space Complexity: Analyzing Memory Footprint**
20 |
21 |   Space complexity, on the other hand, delves into the amount of additional memory space an algorithm requires beyond the input data itself. This additional space is often utilized for temporary variables or data structures employed during the search process. In the context of string searching algorithms, space complexity analysis sheds light on the memory footprint of the algorithm, allowing us to determine its suitability for scenarios with limited memory resources. Ideally, we prefer algorithms with space complexity that remains constant or grows slowly as the input size increases. This ensures efficient memory utilization and avoids resource constraints, particularly when processing large datasets.
22 |
23 | ## **Brute-Force Search: A Baseline Approach**
24 |
25 | The brute-force search serves as a foundational and intuitive technique for locating patterns within text. However, its simplicity comes at a cost in terms of computational efficiency.
26 |
27 | **Algorithm Description: A Methodical Comparison**
28 |
29 | The brute-force search algorithm adopts a straightforward approach. It systematically iterates through the text string, comparing the pattern string character by character at each potential starting position. If a mismatch occurs between corresponding characters in the text and pattern strings, the algorithm promptly shifts the pattern one position to the right and restarts the comparison process. This continues until either a complete match is identified or the entire text string has been scanned without success.
30 |
31 | **Real-World Example: Searching for Keywords in a Legal Document**
32 |
33 | Imagine a lawyer meticulously searching for a specific legal term within a lengthy contract. This scenario exemplifies the brute-force approach in action. The lawyer methodically reads through the contract (text string), comparing each word (character) to the specific legal term (pattern string). If a mismatch occurs, they simply move on to the next word and repeat the comparison. While effective for small documents, this approach becomes increasingly time-consuming and laborious as the document size (text string length) grows.
34 |
35 | **Complexity Analysis: Unveiling the Efficiency Bottleneck**
36 |
37 | While intuitively straightforward, the brute-force search suffers from significant limitations in terms of efficiency. Its time complexity, denoted by O(n\*m), reveals its Achilles' heel. Here, "n" represents the length of the text string and "m" represents the length of the pattern string. This implies that the execution time grows proportionally to the product of the text and pattern lengths. For large datasets, this translates to a substantial increase in processing time. Additionally, the space complexity of the brute-force algorithm is typically O(1), signifying a constant memory footprint independent of the input size. This is an advantage, but the trade-off lies in the significant time complexity bottleneck.
38 |
39 | **Limitations and the Need for More Sophisticated Approaches**
40 |
41 | The brute-force search, while conceptually simple, exhibits a critical limitation: its inefficiency for large datasets. The multiplicative O(n\*m) growth in execution time with increasing input size renders it unsuitable for practical applications involving vast amounts of textual data. This paves the way for exploring more sophisticated string searching algorithms designed to achieve superior efficiency and handle real-world text processing demands. A minimal reference implementation of the brute-force search follows.
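For reference, here is a short Python version of the brute-force search described above (a sketch for this article; it returns only the first match to keep the code compact):

```python
def brute_force_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):              # every possible starting position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                          # characters match so far
        if j == m:                          # the whole pattern matched
            return i
    return -1

print(brute_force_search("ABABDABACDABABCABAB", "ABABC"))  # 10
```

In the worst case the inner loop runs m times for each of the roughly n starting positions, which is exactly the O(n\*m) bound discussed above.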
42 |
43 | ## Efficient String Searching Algorithms: Beyond Brute Force
44 |
45 | The limitations of the brute-force search algorithm necessitate the exploration of more efficient techniques for string searching. Enter the realm of sophisticated algorithms engineered to locate patterns within text with remarkable speed and minimal resource consumption. Here, we delve into the Z-algorithm, a powerful tool that leverages pre-computed information to achieve superior efficiency.
46 |
47 | ### **A. The Z-Algorithm: Unveiling Prefix Matches**
48 |
49 | The Z-algorithm hinges on a clever concept known as the Z-function. The Z-function, denoted by Z[i], for a given string S, calculates the length of the longest substring starting at index i that also occurs as a prefix of the string S. In essence, the Z-function pre-computes information about potential matches between prefixes and suffixes within the text string.
50 |
51 | **Algorithm Explanation: Exploiting Pre-computed Knowledge:**
52 |
53 | The Z-algorithm leverages the Z-function to efficiently locate occurrences of the pattern P within the text string T. It constructs a Z-array for the concatenated string formed by joining P and T with a special separator character that occurs in neither string (often written P$T). The separator guarantees that no match can cross the boundary between P and T, so no Z-value in the T portion can exceed the length of P. By iterating through the Z-array, the algorithm identifies positions where Z[i] is equal to the length of the pattern P. These positions correspond exactly to starting points of pattern matches within the text string T.
54 |
55 | **Real-World Example: Musical Plagiarism Detection:**
56 |
57 | Imagine a musician analyzing a new melody to check for potential plagiarism. The Z-algorithm can be employed here. The musician's original melody (text string) is concatenated with the suspected plagiarized melody (pattern string) using a unique separator symbol. The Z-function then identifies sections within the combined melody where a significant portion of the suspected melody matches a prefix of the original melody. This expedites the plagiarism detection process, allowing the musician to focus on potential matches flagged by the algorithm.
58 |
59 | **Complexity Analysis: Efficiency Gains:**
60 |
61 | The Z-algorithm boasts a significant advantage over the brute-force search in terms of efficiency. Its time complexity is linear, O(n+m), where n is the length of the text string and m is the length of the pattern string. This implies that the execution time grows proportionally to the sum of the text and pattern lengths, a substantial improvement over the brute-force algorithm's O(n\*m) growth. The space complexity of the Z-algorithm is also linear, O(n+m), since it stores a Z-array for the combined string.
62 |
63 | ---
64 |
65 | ### B. Manacher's Algorithm: A Champion for Palindromes
66 |
67 | While the Z-algorithm excels at general string searching, specific problems demand specialized solutions. Manacher's algorithm emerges as a powerful tool for identifying palindromic substrings within a text string with remarkable efficiency.
68 |
69 | **What is a Palindrome?**
70 |
71 | A palindrome is a word, phrase, number, or other sequence of characters that reads the same backward as forward, such as "madam" or "racecar".
Here are some key points about palindromes:
72 |
73 | - **Direction-independent:** The sequence of characters reads the same regardless of whether you start from the beginning or the end.
74 | - **Examples:** Common examples of palindromes include words like "noon", "level", "rotor", and phrases like "A man, a plan, a canal: Panama" or "Was it a car or a cat I saw?". Numbers can also be palindromes, such as 1111, 1221, or 5885.
75 | - **Variations:** Palindromes can be single characters ("A"), single words ("noon"), or even entire sentences ("Madam, I'm Adam").
76 | - **Case-sensitivity:** Depending on the context, palindromes might be considered case-sensitive (e.g., "Noon" reversed is "nooN", so it fails a strict check) or case-insensitive (e.g., "Noon" is considered the same as "noon").
77 |
78 | - **Etymology:** The word "palindrome" comes from the Greek words "palin" (meaning "back again") and "dromos" (meaning "course").
79 | - **Mathematical Applications:** Palindromes have applications in computer science, linguistics, and even recreational mathematics. For instance, they can be used in data validation (checking if an input is a palindrome) or exploring properties of numbers.
80 | - **Cultural Significance:** Palindromes appear in literature, wordplay, and even historical writings. They can be found in puzzles, riddles, and creative writing as a form of wordplay or artistic expression.
81 |
82 | **Palindromic Substring Problem: Finding Substrings that Read the Same Backwards and Forwards**
83 |
84 | The palindromic substring problem seeks to locate all palindromic substrings, such as "racecar" or "madam", within a given text string.
85 |
86 | **Algorithm Explanation: A Linear Scan with a Twist**
87 |
88 | Manacher's algorithm employs a clever data structure called a P-array to efficiently identify palindromic substrings. The P-array, denoted by P[i], for a given string S and index i, stores the radius of the largest palindrome centered at index i (treating the character at i as the center). The algorithm performs a single linear scan through the text string, cleverly reusing the P-values of previously processed centers to avoid re-comparing characters when searching for palindromes centered at the current position.
89 |
90 | **Real-World Example: DNA Sequence Analysis and Palindrome Detection**
91 |
92 | In the realm of DNA research, scientists often encounter palindromic sequences that play a crucial role in gene regulation. Manacher's algorithm can be employed here. The DNA sequence (text string) is processed, and the P-array identifies all palindromic substrings within the sequence. This expedites the discovery of these potentially significant DNA features, aiding researchers in unraveling the mysteries of the genetic code.
93 |
94 | **Complexity Analysis: Linear Efficiency for Palindrome Hunting**
95 |
96 | Manacher's algorithm exhibits exceptional efficiency for the palindromic substring problem. Its time complexity is linear, O(n), where n is the length of the text string. This implies that the execution time grows proportionally to the text string length, a significant advantage over algorithms that might require repeated substring comparisons. The space complexity of Manacher's algorithm is also linear, O(n), due to the P-array it utilizes.
97 |
98 | **Advantage over Z-Algorithm for Palindromes:**
99 |
100 | While the Z-algorithm can be adapted to find palindromes, Manacher's algorithm is specifically designed for this task.
It leverages the concept of palindromes centered at each index, resulting in a more efficient linear scan compared to the Z-algorithm's approach for general string searching. 101 | 102 | ## **Applications of String Searching Algorithms** 103 | 104 | String searching algorithms transcend theoretical concepts; they serve as the backbone for a multitude of real-world applications that rely on efficiently locating specific patterns within text data. Here, we explore some prominent examples: 105 | 106 | - **Text Editors: The Find Function - A Familiar Friend** 107 | 108 | The ubiquitous "find" functionality in text editors exemplifies the practical application of string searching algorithms. When you search for a specific word or phrase within a document, the underlying algorithm swiftly scans the text, identifying occurrences of the search pattern (your query) with remarkable speed. This empowers you to navigate large documents efficiently and locate relevant information effortlessly. 109 | 110 | - **Bioinformatics: Unveiling the Secrets of Life within DNA Sequences** 111 | 112 | In the realm of bioinformatics, string searching algorithms play a critical role in analyzing DNA sequences. Scientists utilize these algorithms to identify specific patterns within these sequences, such as genes, regulatory elements, or repetitive motifs. By efficiently locating these patterns, researchers gain valuable insights into the genetic code, furthering our understanding of biological processes and paving the way for advancements in medicine and biotechnology. 113 | 114 | - **Plagiarism Detection: Protecting Intellectual Property** 115 | 116 | String searching algorithms serve as the foundation for plagiarism detection software. This software scans submitted text against a vast database of existing works, searching for potential matches or significant overlaps. By efficiently identifying instances of copied content, these algorithms help safeguard intellectual property and ensure the originality of academic and creative works. 117 | 118 | - **Network Intrusion Detection Systems: Guardians of the Digital Realm** 119 | 120 | Network intrusion detection systems (NIDS) rely heavily on string searching algorithms to protect against cyber threats. These systems constantly monitor network traffic, searching for malicious patterns or suspicious strings often embedded within malicious code or attack attempts. By efficiently identifying these patterns, NIDS can trigger alarms and take preventive measures to safeguard computer networks from unauthorized access and data breaches. 121 | 122 | > These are just a few examples of how string searching algorithms have revolutionized various fields. 
123 |
124 | ## Implementation
125 |
126 | ### How to Implement the Z-Algorithm
127 |
128 | ```
129 | Z_algorithm(Text)
130 | Input: Text - String to search within
131 | Output: Z - List containing the Z-function values for each index in Text
132 |
133 | n = length(Text)
134 | Z = list of size n (initialized with zeros)
135 |
136 | l = 0  # Left boundary of the rightmost match window found so far
137 | r = 0  # Right boundary (exclusive) of that window
138 |
139 | for i in range(1, n):
140 |     # If i lies inside the window [l, r), reuse the mirrored value,
141 |     # capped so it never reaches past the window's right edge
142 |     if i < r:
143 |         Z[i] = min(r - i, Z[i - l])
144 |
145 |     # Extend the match beyond what is already known by direct comparison
146 |     while i + Z[i] < n and Text[Z[i]] == Text[i + Z[i]]:
147 |         Z[i] += 1
148 |
149 |     # If the match reaches past r, it becomes the new rightmost window
150 |     if i + Z[i] > r:
151 |         l = i
152 |         r = i + Z[i]
153 |
154 | return Z
155 | ```
156 |
157 | **Explanation:**
158 |
159 | 1. The `Z_algorithm` function takes a text string (`Text`) as input.
160 | 2. It initializes a list `Z` of size `n` (length of the text) with zeros. `Z[0]` is conventionally left as 0, since the whole string trivially matches itself.
161 | 3. Two variables, `l` and `r`, track the match window `[l, r)`: the segment with the rightmost endpoint found so far that matches a prefix of the text.
162 | 4. The loop iterates through the text string starting from index 1 (excluding the first character).
163 | 5. **Inside the window:** If the current index `i` falls inside the window (`i < r`), the segment starting at `i` has already been partially matched: it mirrors the segment starting at `i - l` at the beginning of the string. The algorithm therefore starts from `Z[i] = min(r - i, Z[i - l])`, the mirrored value capped at the window's right edge, instead of comparing from scratch.
164 | 6. **Extending the match:** Whether or not the window helped, the `while` loop compares characters outwards (`Text[Z[i]]` against `Text[i + Z[i]]`) and increments `Z[i]` until a mismatch occurs or the end of the text is reached.
165 | 7. If the match starting at `i` extends past `r`, the window is updated (`l = i`, `r = i + Z[i]`). Finally, the function returns the `Z` list containing the Z-function values for each index in the text string.
166 |
167 | Because every character comparison either extends the window to the right or ends the inner loop, the total work is linear in the length of the text. A runnable version of the same logic follows.
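Here is a direct Python translation of the pseudocode above (a self-contained sketch; to search for a pattern `P` in a text `T`, apply it to `P + "$" + T` and report every index whose Z-value equals `len(P)`):

```python
def z_algorithm(text):
    """Compute the Z-array: Z[i] is the length of the longest substring
    starting at index i that matches a prefix of text."""
    n = len(text)
    z = [0] * n
    l, r = 0, 0                               # [l, r) is the rightmost match window
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])       # reuse the mirrored value, capped at r
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1                         # extend the match by direct comparison
        if i + z[i] > r:
            l, r = i, i + z[i]                # this match becomes the new window
    return z

print(z_algorithm("ABABDABACDABABCABAB"))
# [0, 0, 2, 0, 0, 3, 0, 1, 0, 0, 4, 0, 2, 0, 0, 4, 0, 2, 0]
```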
176 |
177 | > Example Z-algorithm in (Go, Java, Python, JS, TS)
178 |
179 | - [Go](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/golang/Z_algorithm.go)
180 | - [Java](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/java/Z_algorithm.java)
181 | - [TypeScript](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/ts/Z_algorithm.ts)
182 | - [JavaScript](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/js/Z_algorithm.js)
183 | - [Python](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/golang/python.py)
184 |
185 | **Output**->
186 |
187 | ```
188 | Text: ABABDABACDABABCABAB
189 | Z-function: [0, 0, 2, 0, 0, 3, 0, 1, 0, 0, 4, 0, 2, 0, 0, 4, 0, 2, 0]
190 | ```
191 |
192 | Explanations:
193 |
194 | ```
195 |   A B A B D A B A C D A B A B C A B A B
196 | [ 0 0 2 0 0 3 0 1 0 0 4 0 2 0 0 4 0 2 0 ]
197 | ```
198 |
199 | | Index | Text Character | Z-function Value | Explanation |
200 | | ----- | -------------- | ---------------- | ----------- |
201 | | 0 | A | 0 | By convention, `Z[0]` is left as 0: the whole string trivially matches itself. |
202 | | 1 | B | 0 | "B..." does not begin with "A", so there is no prefix match. |
203 | | 2 | A | 2 | "AB" (indices 2-3) matches the prefix "AB"; the next character "D" breaks the match. |
204 | | 3 | B | 0 | "B..." does not begin with "A". |
205 | | 4 | D | 0 | "D..." does not begin with "A". |
206 | | 5 | A | 3 | "ABA" (indices 5-7) matches the prefix "ABA"; the next character "C" differs from "B". |
207 | | 6 | B | 0 | No prefix match. |
208 | | 7 | A | 1 | "A" matches the prefix "A"; the next character "C" differs from "B". |
209 | | 8 | C | 0 | No prefix match. |
210 | | 9 | D | 0 | No prefix match. |
211 | | 10 | A | 4 | "ABAB" (indices 10-13) matches the prefix "ABAB"; the next character "C" differs from "D". |
212 | | 11 | B | 0 | No prefix match. |
213 | | 12 | A | 2 | "AB" (indices 12-13) matches the prefix "AB"; the next character "C" differs from "A". |
214 | | 13 | B | 0 | No prefix match. |
215 | | 14 | C | 0 | No prefix match. |
216 | | 15 | A | 4 | "ABAB" (indices 15-18) matches the prefix "ABAB"; the string ends there. |
217 | | 16 | B | 0 | No prefix match. |
218 | | 17 | A | 2 | "AB" (indices 17-18) matches the prefix "AB"; the string ends there. |
219 | | 18 | B | 0 | No prefix match. |
220 |
221 | ```
222 |   A B A B     D A B A     C D A B      A B C A B      A B
223 | [ 0 0 2(AB) 0 0 3(ABA) 0 1(A) 0 0 4(ABAB) 0 2(AB) 0 0 4(ABAB) 0 2(AB) 0 ]
224 | ```
225 |
226 | ---
227 |
228 | ### How to Implement Manacher's Algorithm
229 |
230 | ```
231 | Manachers_Algorithm(Text)
232 | Input: Text - String to search for palindromes
233 | Output: Palindromes - List containing starting indices and lengths of all palindromes in Text
234 |
235 | # Preprocess the text by adding a special character between each character
236 | Processed_Text = "#" + "#".join(Text) + "#"
237 |
238 | P = list of size len(Processed_Text) (initialized with zeros)  # Radius of the palindrome centered at each index
239 | C_center = 0  # Center of the palindrome that reaches furthest right
240 | R = 0         # Right boundary of that palindrome
241 |
242 | for i in range(1, len(Processed_Text) - 1):
243 |     # Check if the current index is within the previously found palindrome's boundary
244 |     i_mirror = 2 * C_center - i
245 |     if i < R:
246 |         P[i] = min(R - i, P[i_mirror])  # Utilize the mirrored index for efficiency
247 |     else:
248 |         P[i] = 0  # No existing palindrome information for this index
249 |
250 |     # Expand the palindrome centered at the current index
251 |     while i - P[i] - 1 >= 0 and i + P[i] + 1 < len(Processed_Text) and Processed_Text[i - P[i] - 1] == Processed_Text[i + P[i] + 1]:
252 |         P[i] += 1
253 |
254 |     # Update center and right boundary if this palindrome reaches further right
255 |     if i + P[i] > R:
256 |         C_center = i
257 |         R = i + P[i]
258 |
259 | # Extract starting indices and lengths of palindromes from the P array
260 | Palindromes = empty list
261 | for i in range(1, len(Processed_Text) - 1, 2):
262 |     if P[i] > 0:
263 |         start_index = (i - P[i]) // 2
264 |         length = P[i]
265 |         Palindromes.append((start_index, length))
266 |
267 | return Palindromes
268 | ```
269 |
270 | **Explanation:**
271 |
272 | 1. The `Manachers_Algorithm` function takes a text string (`Text`) as input.
273 | 2. It creates a preprocessed version of the text (`Processed_Text`) by inserting a special character "#" between each character of the original text and at both ends. Every palindrome in the processed text then has odd length, which lets a single pass handle both odd- and even-length palindromes of the original text.
274 | 3. A list `P` of size len(Processed_Text) is initialized with zeros. `P[i]` stores the radius of the largest palindrome centered at index `i` of the processed text; this radius equals the palindrome's length in the original text.
275 | 4. Two variables, `C_center` and `R`, track the center and right boundary of the palindrome found so far that reaches furthest to the right.
276 | 5. The loop iterates through each character (excluding the first and last special characters) of `Processed_Text`.
277 | 6. If the current index `i` falls inside that boundary (`i < R`), mirroring is used: the index `i_mirror = 2 * C_center - i` is the reflection of `i` through `C_center`, and the palindrome already computed there constrains the one at `i`. `P[i]` is therefore initialized to the minimum of the remaining length within the boundary (`R - i`) and the mirrored radius (`P[i_mirror]`).
278 | 7. Otherwise, `P[i]` starts at 0, since nothing is yet known about a palindrome centered here.
279 | 8. The `while` loop then expands the palindrome centered at `i`, comparing characters outwards until a mismatch occurs or a boundary of the processed text is reached, updating `P[i]` accordingly.
280 | 9. If the expanded palindrome extends beyond the previous right boundary (`i + P[i] > R`), `C_center` and `R` are updated to the new center and boundary.
281 | 10. After processing all characters, the algorithm extracts the palindromes: it walks the `P` array with a step of 2 (the positions of the original characters, skipping the separators), and for every non-zero radius it computes the starting index in the original text as `(i - P[i]) // 2` with length `P[i]`.
282 |
283 | A compact runnable version of the same procedure follows.
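The pseudocode condenses into a few lines of Python (a sketch mirroring the steps above; it reports, for each character of the input, the longest palindrome centered there):

```python
def manacher(text):
    """Return (start, length) pairs for the longest palindrome centered
    at each character of text."""
    t = "#" + "#".join(text) + "#"        # interleave separator characters
    n = len(t)
    p = [0] * n                           # p[i] = palindrome radius at index i of t
    center, right = 0, 0
    for i in range(1, n - 1):
        if i < right:
            p[i] = min(right - i, p[2 * center - i])   # mirror trick
        while i - p[i] - 1 >= 0 and i + p[i] + 1 < n and t[i - p[i] - 1] == t[i + p[i] + 1]:
            p[i] += 1                     # expand around the current center
        if i + p[i] > right:
            center, right = i, i + p[i]   # new rightmost-reaching palindrome
    # odd indices of t correspond to characters of the original text
    return [((i - p[i]) // 2, p[i]) for i in range(1, n - 1, 2) if p[i] > 0]

print(manacher("ABABDABACDABABCABAB")[:4])  # [(0, 1), (0, 3), (1, 3), (3, 1)]
```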
284 |
285 | > Example Manacher's algorithm in (Go, Java, Python, JS, TS)
286 |
287 | - [Go](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/golang/Manachers_Algorithm.go)
288 | - [Java](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/java/Manachers_Algorithm.go) (written with AI)
289 | - [TypeScript](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/ts/Manachers_Algorithm.go)
290 | - [JavaScript](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/js/Manachers_Algorithm.go)
291 | - [Python](https://github.com/m-mdy-m/algorithms-data-structures/blob/main/5.String-Manipulation-And-Algorithms/Example/python/Manachers_Algorithm.go)
292 |
293 | **Output**->
294 |
295 | ```
296 | Text: ABABDABACDABABCABAB
297 | Manachers-function: [(0, 1), (0, 3), (1, 3), (3, 1), (4, 1), (5, 1), (5, 3), (7, 1), (8, 1), (9, 1), (10, 1), (10, 3), (11, 3), (13, 1), (14, 1), (15, 1), (15, 3), (16, 3), (18, 1)]
298 | processed_text: #A#B#A#B#D#A#B#A#C#D#A#B#A#B#C#A#B#A#B# -> length : 39
299 | ```
300 |
301 | - **Text:** The original input string that is analyzed for palindromes.
302 | - **Manachers-function:** A list of tuples, where each tuple encodes the starting index (in the original text) and the length of the longest palindrome centered at one character of the text.
303 | - **processed_text:** The preprocessed version of the original text. Special characters (`#`) are inserted between all characters and at both ends; this makes every palindrome odd-length and simplifies the identification process.
304 |
305 | | (start, length) | Center (original text) | Palindrome | Reason |
306 | | --------------- | ---------------------- | ---------- | ------ |
307 | | (0, 1) | index 0, 'A' | "A" | A single character is always a palindrome of length 1; nothing longer is centered at the first character. |
308 | | (0, 3) | index 1, 'B' | "ABA" | Text[0..2] = "ABA" reads the same both ways; expanding further runs off the start of the text. |
309 | | (1, 3) | index 2, 'A' | "BAB" | Text[1..3] = "BAB"; expanding further would require Text[0] = Text[4], but "A" differs from "D". |
310 | | (3, 1) | index 3, 'B' | "B" | The neighbors "A" (index 2) and "D" (index 4) differ, so the palindrome stays at length 1. |
311 | | (4, 1) | index 4, 'D' | "D" | The neighbors "B" and "A" differ. |
312 | | (5, 1) | index 5, 'A' | "A" | The neighbors "D" and "B" differ. |
313 | | (5, 3) | index 6, 'B' | "ABA" | Text[5..7] = "ABA"; expanding further fails since "D" (index 4) differs from "C" (index 8). |
314 | | (7, 1) | index 7, 'A' | "A" | The neighbors "B" and "C" differ. |
315 | | ... | ... | ... | ... |
316 | | (18, 1) | index 18, 'B' | "B" | The last character; a single character is always a palindrome of length 1. |
317 |
318 | Manacher's Algorithm leverages two key concepts: a "center" (`C_center`) and a "right" boundary (`R`). It employs a `P` array that stores the "palindrome radius" for each index of the processed text, i.e. the maximum radius of a palindrome centered at that index.
319 |
320 | Consider the palindrome "ABA" (centered at original index 1, which is processed index 3) for illustration:
321 |
322 | ```
323 | processed index:  0 1 2 3 4 5 6 ...
324 | processed text:   # A # B # A # ...
325 |                         ^ C_center = 3
326 | P[3] = 3, so the palindrome spans processed indices 0..6 ("#A#B#A#"),
327 | R = C_center + P[3] = 6, and the original-text palindrome is "ABA" (start 0, length 3).
328 | ```
329 |
330 | ## Conclusion
331 |
332 | String manipulation and efficient string searching algorithms are the unsung heroes of the digital world. They empower us to navigate, analyze, and modify textual data with remarkable efficiency.
333 |
334 | This exploration delved into two powerful algorithms:
335 |
336 | - **Z-Algorithm:** This algorithm pre-computes information about potential prefix matches within the text, enabling swift pattern searching. It excels at general string searching tasks.
337 | - **Manacher's Algorithm:** This algorithm leverages a clever preprocessing step and radius array to identify palindromic substrings within a text string in linear time. It caters specifically to the problem of finding these intriguing words or phrases that read the same backward and forward.
338 |
339 | While the Z-algorithm and Manacher's algorithm provide robust solutions, the realm of string searching extends beyond them. Algorithms such as Knuth-Morris-Pratt offer the same linear-time pattern matching with a different preprocessing strategy, and can be preferable when a fixed pattern is reused across many texts.
340 |
341 | This glimpse into the world of string searching algorithms paves the way for further exploration. As the volume and complexity of textual data continue to surge, efficient string searching algorithms will remain at the forefront, empowering us to unlock the valuable insights hidden within the vast ocean of words.
--------------------------------------------------------------------------------