├── .gitignore ├── Algorithms and Data Structure 2 Mins Review.ipynb ├── ChildTobaccoUseAnalysis.pdf ├── Predicting crashing paptients with ensemble of embeddings.pdf ├── README.md └── dynamicsofamodifiedsirmodel_zhen_li_spring2012.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | -------------------------------------------------------------------------------- /Algorithms and Data Structure 2 Mins Review.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Structures And Algorithms 2 Mins Review\n", 8 | "\n", 9 | "by Stephen Gou\n", 10 | "\n", 11 | "Jan, 2019\n", 12 | "\n", 13 | "This notebook provides brief descriptions and runtimes for common data structures and algorithms. Use this as a quick review for technical interviews or as a checklist for knowledge. I wanted a **comprehensive, concise, and correct** cheat sheet but couldn't find a satisfactory one online, so I created my own. Runtimes are average-case unless noted otherwise.\n", 14 | "\n", 15 | "## Data Structures\n", 16 | "\n", 17 | "### 1 Dynamic Array\n", 18 | "Data stored sequentially in a contiguous chunk of memory. The capacity doubles whenever it is reached. To grow, the array allocates a new chunk of memory and copies the existing values to the new location.\n", 19 | "\n", 20 | "**Runtime**\n", 21 | "\n", 22 | "indexing: $O(1)$\n", 23 | "\n", 24 | "searching: $O(n)$\n", 25 | "\n", 26 | "insertion: $O(n)$ *Note: appending to the end is $O(1)$ amortized*\n", 27 | "\n", 28 | "deletion: $O(n)$ \n", 29 | "\n", 30 | "### 2 Linked List\n", 31 | "Most commonly refers to a singly linked list. Data is stored in nodes, where each node holds a reference to the next node. 
Doubly linked lists and circular linked lists also exist.\n", 32 | "\n", 33 | "**Runtime**\n", 34 | "\n", 35 | "indexing: $O(n)$\n", 36 | "\n", 37 | "searching: $O(n)$\n", 38 | "\n", 39 | "insertion: $O(1)$ *Note: given a reference to the insertion point*\n", 40 | "\n", 41 | "deletion: $O(1)$ \n", 42 | "\n", 43 | "Stacks and queues are often implemented with linked lists because linked lists offer $O(1)$ insertion and deletion, which are the most frequently used operations for stacks and queues.\n", 44 | "\n", 45 | "**Stack** Last in First Out (LIFO)\n", 46 | "\n", 47 | "**Queue** First in First Out (FIFO)\n", 48 | "\n", 49 | "### 3 Hash Table (Hash Map)\n", 50 | "A usually unordered data structure that maps keys to values. A hash (preferably unique) is computed for a given key, and the value is stored in the bin or index that corresponds to the hash. Internally the bins can be an array. A collision happens when multiple keys map to the same hash. A common resolution is to store a list or linked list at each bin/index location (called chaining).\n", 51 | "\n", 52 | "**Runtime**\n", 53 | "\n", 54 | "value lookup: $O(1)$\n", 55 | "\n", 56 | "insertion: $O(1)$\n", 57 | "\n", 58 | "deletion: $O(1)$ \n", 59 | "\n", 60 | "### 4 Binary Search Tree (BST)\n", 61 | "A binary tree with the extra condition that each node is greater than or equal to all nodes in its left sub-tree, and smaller than or equal to all nodes in its right sub-tree.\n", 62 | "\n", 63 | "**Runtime**\n", 64 | "\n", 65 | "searching: $O(log\\ n)$ *Note: $O(n)$ in the worst case, when the tree is unbalanced*\n", 66 | "\n", 67 | "insertion: $O(log\\ n)$\n", 68 | "\n", 69 | "deletion: $O(log\\ n)$ \n", 70 | "### 5 Heap (Max Heap/Min Heap)\n", 71 | "A complete binary tree with the condition that each parent node's value is greater than or equal to (max heap) or less than or equal to (min heap) the values of its children. So the root is the maximum in a max heap and the minimum in a min heap. 
A priority queue is often referred to as a heap because it is usually implemented with one.\n", 72 | "\n", 73 | "**Runtime**\n", 74 | "\n", 75 | "min/max: $O(1)$\n", 76 | "\n", 77 | "insertion: $O(log\\ n)$\n", 78 | "\n", 79 | "deletion: $O(log\\ n)$ " 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## Algorithms\n", 87 | "### 1 Sorting\n", 88 | "**Bubble Sort**\n", 89 | "\n", 90 | "Repeatedly iterate through the list, comparing adjacent pairs and swapping them when they are out of order, until all elements are sorted.\n", 91 | "\n", 92 | "$O(n^2)$, but fast if the list is almost sorted.\n", 93 | "\n", 94 | "\n", 95 | "**Insertion Sort**\n", 96 | "\n", 97 | "Iterates through the unsorted list while building a sorted list. For each value encountered in the unsorted list, find its appropriate place in the sorted list and insert it.\n", 98 | "\n", 99 | "$O(n^2)$\n", 100 | "\n", 101 | "**Merge Sort**\n", 102 | "\n", 103 | "A divide and conquer algorithm: 1) divide the list into two equally sized sub-lists 2) sort each sub-list 3) merge the two sorted sub-lists into the final list.\n", 104 | "\n", 105 | "$O(n\\ log\\ n)$ - the list is divided $log\\ n$ times, and each level of division requires going through all $n$ items to merge, thus $n \\times log\\ n$. \n", 106 | "\n", 107 | "**Heap Sort**\n", 108 | "\n", 109 | "1) Build a heap (min or max) from the unsorted list 2) repeatedly remove the root node from the heap and append it to the sorted list.\n", 110 | "\n", 111 | "$O(n\\ log\\ n)$ - removing the root node is $O(log\\ n)$ and must be repeated for each node, thus $n \\times log\\ n$. 
\n", 112 | "\n", 113 | "**Quick Sort**\n", 114 | "\n", 115 | "A divide and conquer algorithm: 1) pick an item in the unsorted list as the pivot 2) divide the list into 2 sub-lists, one containing elements smaller than the pivot and the other containing elements greater than the pivot 3) sort the sub-lists and combine the results into the final list.\n", 116 | "\n", 117 | "$O(n\\ log\\ n)$ - the list is divided $O(log\\ n)$ times on average, and after each divide the partitioning has to go through all elements, thus the overall runtime is $n \\times log\\ n$. *Note: $O(n^2)$ in the worst case, when the pivots are poorly chosen*\n", 118 | "\n", 119 | "\n", 120 | "### 2 Searching\n", 121 | "**Linear Search** $O(n)$\n", 122 | "\n", 123 | "**Binary Search** $O(log\\ n)$, requires sorted input\n", 124 | "\n", 125 | "**Breadth-First-Search (BFS)** Visits siblings first, then children. Usually uses a queue.\n", 126 | "\n", 127 | "**Depth-First-Search (DFS)** Visits children first, then siblings. Usually uses a stack.\n", 128 | "\n", 129 | "**A\\* Search** The goal is to find the shortest path between 2 nodes in a graph. It's a best-first search. At each iteration it picks the next node to extend the path by minimizing $g(next) + h(next)$, where $g$ is the cost of the path from the starting node to the next node and $h$ is the heuristic (estimated) distance from the next node to the final node. 
It usually uses a heap.\n", 130 | "\n", 131 | "### 3 Tree Traversals\n", 132 | "**Inorder** (Left, Root, Right): useful for getting a sorted list out of a BST\n", 133 | "\n", 134 | "**Preorder** (Root, Left, Right): useful for making copies of binary trees, or for getting the prefix expression of an expression tree.\n", 135 | "\n", 136 | "**Postorder** (Left, Right, Root): useful for deleting trees (because children need to be deleted before their parent)\n" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [] 145 | } 146 | ], 147 | "metadata": { 148 | "kernelspec": { 149 | "display_name": "Python 3", 150 | "language": "python", 151 | "name": "python3" 152 | }, 153 | "language_info": { 154 | "codemirror_mode": { 155 | "name": "ipython", 156 | "version": 3 157 | }, 158 | "file_extension": ".py", 159 | "mimetype": "text/x-python", 160 | "name": "python", 161 | "nbconvert_exporter": "python", 162 | "pygments_lexer": "ipython3", 163 | "version": "3.6.6" 164 | } 165 | }, 166 | "nbformat": 4, 167 | "nbformat_minor": 2 168 | } 169 | -------------------------------------------------------------------------------- /ChildTobaccoUseAnalysis.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stephengou/Data-Science-Projects/53f214ff69fa7cd62adb4a02a2529f89c3a2465a/ChildTobaccoUseAnalysis.pdf -------------------------------------------------------------------------------- /Predicting crashing paptients with ensemble of embeddings.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stephengou/Data-Science-Projects/53f214ff69fa7cd62adb4a02a2529f89c3a2465a/Predicting crashing paptients with ensemble of embeddings.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 
Data-Science-Blog 2 | Data Science Projects for personal blog 3 | -------------------------------------------------------------------------------- /dynamicsofamodifiedsirmodel_zhen_li_spring2012.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stephengou/Data-Science-Projects/53f214ff69fa7cd62adb4a02a2529f89c3a2465a/dynamicsofamodifiedsirmodel_zhen_li_spring2012.pdf --------------------------------------------------------------------------------