├── Interview Preparation- Day 1.ipynb ├── Interview Preparation- Day 2- Linear Regression.ipynb ├── Interview Preparation- Day 3-SVM.ipynb ├── Interview Preparation- Day 4- Decision Trees.ipynb ├── Interview Preparation- Day 5-Logistic Regression.ipynb ├── Interview Preparation-Random Forest-Bagging.ipynb ├── Interview Preparation-Xgboost,GBboost,Adaboost--Boosting.ipynb └── README.md /Interview Preparation- Day 1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Naive Bayes Classifier\n", 10 | "\n", 11 | "Theoretical Understanding:\n", 12 | "\n", 13 | "1. Tutorial 48th : https://www.youtube.com/watch?v=jS1CKhALUBQ\n", 14 | "2. Tutorial 49th: https://www.youtube.com/watch?v=temQ8mHpe3k" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "##### 1. What Are the Basic Assumption?\n", 22 | "Features Are Independent" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "##### 2. Advantages\n", 30 | "1. Work Very well with many number of features\n", 31 | "2. Works Well with Large training Dataset\n", 32 | "3. It converges faster when we are training the model\n", 33 | "4. It also performs well with categorical features" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "##### 3. Disadvantages\n", 41 | "1. Correlated features affects performance" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "##### 4. Whether Feature Scaling is required?\n", 49 | "No\n", 50 | "##### 5. Impact of Missing Values?\n", 51 | "Naive Bayes can handle missing data. Attributes are handled separately by the algorithm at both model construction time and prediction time. 
As such, if a data instance has a missing value for an attribute, it can be ignored while preparing the model, and ignored when a probability is calculated for a class value.\n", 52 | "Tutorial: https://www.youtube.com/watch?v=EqjyLfpv5oA\n", 53 | "##### 6. Impact of outliers?\n", 54 | "It is usually robust to outliers" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "##### Different Problem Statements You Can Solve Using Naive Bayes\n", 62 | "1. Sentiment analysis\n", 63 | "2. Spam classification\n", 64 | "3. Twitter sentiment analysis\n", 65 | "4. Document categorization" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": null, 71 | "metadata": {}, 72 | "outputs": [], 73 | "source": [ 74 | "-" 75 | ] 76 | } 77 | ], 78 | "metadata": { 79 | "kernelspec": { 80 | "display_name": "Python 3", 81 | "language": "python", 82 | "name": "python3" 83 | }, 84 | "language_info": { 85 | "codemirror_mode": { 86 | "name": "ipython", 87 | "version": 3 88 | }, 89 | "file_extension": ".py", 90 | "mimetype": "text/x-python", 91 | "name": "python", 92 | "nbconvert_exporter": "python", 93 | "pygments_lexer": "ipython3", 94 | "version": "3.7.9" 95 | } 96 | }, 97 | "nbformat": 4, 98 | "nbformat_minor": 4 99 | } 100 | -------------------------------------------------------------------------------- /Interview Preparation- Day 2- Linear Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Linear Regression\n", 10 | "\n", 11 | "Theoretical Understanding:\n", 12 | "\n", 13 | "1. https://www.youtube.com/watch?v=1-OGRohmH2s&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe&index=29\n", 14 | "2. https://www.youtube.com/watch?v=5rvnlZWzox8&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe&index=34\n", 15 | "3. 
https://www.youtube.com/watch?v=NAPhUDjgG_s&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe&index=32\n", 16 | "4. https://www.youtube.com/watch?v=WuuyD3Yr-js&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe&index=35\n", 17 | "5. https://www.youtube.com/watch?v=BqzgUnrNhFM&list=PLZoTAELRMXVPBTrWtJkn3wWQxZkmTXGwe&index=33" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "##### Interview Question on Multicollinearity\n", 25 | "\n", 26 | "1. https://www.youtube.com/watch?v=tcaruVHXZwE" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "##### 1. What Are the Basic Assumptions? (favourite)\n", 34 | "There are four assumptions associated with a linear regression model:\n", 35 | "\n", 36 | "1. Linearity: The relationship between X and the mean of Y is linear.\n", 37 | "2. Homoscedasticity: The variance of the residuals is the same for any value of X.\n", 38 | "3. Independence: Observations are independent of each other.\n", 39 | "4. Normality: For any fixed value of X, Y is normally distributed." 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "##### 2. Advantages\n", 47 | "1. Linear regression performs very well when the relationship between the features and the target is linear\n", 48 | "2. Easy to implement and train the model\n", 49 | "3. Overfitting can be controlled using dimensionality reduction techniques, cross validation and regularization \n" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "##### 3. Disadvantages\n", 57 | "1. Sometimes a lot of feature engineering is required\n", 58 | "2. If the independent features are correlated it may affect performance\n", 59 | "3. It is often quite prone to noise and overfitting" 60 | ] 61 | }, 62 | { 63 | "attachments": { 64 | "image.png": { 65 | "image/png": "" 66 | } 67 | }, 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "##### 4. 
Whether Feature Scaling is required?\n", 72 | "Yes\n", 73 | "##### 5. Impact of Missing Values?\n", 74 | "It is sensitive to missing values\n", 75 | "##### 6. Impact of outliers?\n", 76 | "linear regression needs the relationship between the independent and dependent variables to be linear. It is also important to check for outliers since linear regression is sensitive to outlier effects.\n", 77 | "\n", 78 | "![image.png](attachment:image.png)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "##### Types of Problems it can solve(Supervised)\n", 86 | "1. Regression" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "##### Overfitting And Underfitting\n", 94 | "HomeWork?" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "##### Different Problem statement you can solve using Linear Regression\n", 102 | "1. Advance House Price Prediction\n", 103 | "2. Flight Price Prediction" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "#### Practical Implementation\n", 111 | "1. 
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": null, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [] 120 | } 121 | ], 122 | "metadata": { 123 | "kernelspec": { 124 | "display_name": "Python 3", 125 | "language": "python", 126 | "name": "python3" 127 | }, 128 | "language_info": { 129 | "codemirror_mode": { 130 | "name": "ipython", 131 | "version": 3 132 | }, 133 | "file_extension": ".py", 134 | "mimetype": "text/x-python", 135 | "name": "python", 136 | "nbconvert_exporter": "python", 137 | "pygments_lexer": "ipython3", 138 | "version": "3.7.9" 139 | } 140 | }, 141 | "nbformat": 4, 142 | "nbformat_minor": 4 143 | } 144 | -------------------------------------------------------------------------------- /Interview Preparation- Day 3-SVM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### SVM\n", 10 | "\n", 11 | "Theoretical Understanding:\n", 12 | "\n", 13 | "1. https://www.youtube.com/watch?v=H9yACitf-KM\n", 14 | "2. https://www.youtube.com/watch?v=Js3GLb1xPhc" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "##### 1. What Are the Basic Assumption?\n", 22 | "There are no such assumptions" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "##### 2. Advantages\n", 30 | "1. SVM is more effective in high dimensional spaces.\n", 31 | "2. SVM is relatively memory efficient.\n", 32 | "3. SVM’s are very good when we have no idea on the data.\n", 33 | "4. Works well with even unstructured and semi structured data like text, Images and trees.\n", 34 | "5. The kernel trick is real strength of SVM. 
With an appropriate kernel function, we can solve any complex problem.\n", 35 | "6. SVM models have generalization in practice, the risk of over-fitting is less in SVM." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "##### 3. Disadvantages\n", 43 | "1. More Training Time is required for larger dataset\n", 44 | "2. It is difficult to choose a good kernel function\n", 45 | "https://www.youtube.com/watch?v=mTyT-oHoivA\n", 46 | "3. The SVM hyper parameters are Cost -C and gamma. It is not that easy to fine-tune these hyper-parameters. It is hard to visualize their impact" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "##### 4. Whether Feature Scaling is required?\n", 54 | "Yes\n", 55 | "##### 5. Impact of Missing Values?\n", 56 | "Although SVMs are an attractive option when constructing a classifier, SVMs do not easily accommodate missing covariate information. Similar to other prediction and classification methods, in-attention to missing data when constructing an SVM can impact the accuracy and utility of the resulting classifier.\n", 57 | "##### 6. Impact of outliers?\n", 58 | "It is usually sensitive to outliers\n", 59 | "https://arxiv.org/abs/1409.0934#:~:text=Despite%20its%20popularity%2C%20SVM%20has,causes%20the%20sensitivity%20to%20outliers." 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "##### Types of Problems it can solve(Supervised)\n", 67 | "1. Classification\n", 68 | "2. Regression" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "##### Overfitting And Underfitting\n", 76 | "In SVM, to avoid overfitting, we choose a Soft Margin, instead of a Hard one i.e. 
we let some data points enter the margin intentionally (while still penalizing them) so that the classifier doesn't overfit the training sample\n", 77 | "\n", 78 | "https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "##### Different Problem Statements You Can Solve Using SVM\n", 86 | "1. SVM can be applied to many of the use cases where ANNs are used\n", 87 | "2. Intrusion Detection\n", 88 | "3. Handwriting Recognition" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "#### Practical Implementation\n", 96 | "1. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\n", 97 | "2. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "##### Performance Metrics" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "##### Classification\n", 112 | "1. Confusion Matrix \n", 113 | "2. Precision, Recall, F1 score\n", 114 | "\n", 115 | "##### Regression\n", 116 | "1. R2, Adjusted R2\n", 117 | "2. 
MSE,RMSE,MAE" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [] 126 | } 127 | ], 128 | "metadata": { 129 | "kernelspec": { 130 | "display_name": "Python 3", 131 | "language": "python", 132 | "name": "python3" 133 | }, 134 | "language_info": { 135 | "codemirror_mode": { 136 | "name": "ipython", 137 | "version": 3 138 | }, 139 | "file_extension": ".py", 140 | "mimetype": "text/x-python", 141 | "name": "python", 142 | "nbconvert_exporter": "python", 143 | "pygments_lexer": "ipython3", 144 | "version": "3.7.9" 145 | } 146 | }, 147 | "nbformat": 4, 148 | "nbformat_minor": 4 149 | } 150 | -------------------------------------------------------------------------------- /Interview Preparation- Day 4- Decision Trees.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Decision Tree Classifier And Regressor\n", 10 | "Interview Questions:\n", 11 | "1. Decision Tree \n", 12 | "2. Entropy, Information Gain, Gini Impurity\n", 13 | "3. Decision Tree Working For Categorical and Numerical Features\n", 14 | "4. What are the scenarios where Decision Tree works well\n", 15 | "5. Decision Tree Low Bias And High Variance- Overfitting\n", 16 | "6. Hyperparameter Techniques\n", 17 | "7. Library used for constructing decision tree\n", 18 | "8. Impact of Outliers Of Decision Tree\n", 19 | "9. Impact of mising values on Decision Tree\n", 20 | "10. Does Decision Tree require Feature Scaling\n", 21 | " \n", 22 | "\n", 23 | "Theoretical Understanding:\n", 24 | "\n", 25 | "1. Tutorial 37:Entropy In Decision Tree https://www.youtube.com/watch?v=1IQOtJ4NI_0\n", 26 | "2. Tutorial 38:Information Gain https://www.youtube.com/watch?v=FuTRucXB9rA\n", 27 | "3. 
Tutorial 39:Gini Impurity https://www.youtube.com/watch?v=5aIFgrrTqOw\n", 28 | "4. Tutorial 40: Decision Tree For Numerical Features: https://www.youtube.com/watch?v=5O8HvA9pMew \n", 29 | "5. How To Visualize DT: https://www.youtube.com/watch?v=ot75kOmpYjI" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "##### 1. What Are the Basic Assumption?\n", 44 | "There are no such assumptions" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "##### 2. Advantages\n", 52 | "Advantages of Decision Tree\n", 53 | "\n", 54 | "1. Clear Visualization: The algorithm is simple to understand, interpret and visualize as the idea is mostly used in our daily lives. Output of a Decision Tree can be easily interpreted by humans.\n", 55 | "\n", 56 | "2. Simple and easy to understand: Decision Tree looks like simple if-else statements which are very easy to understand.\n", 57 | "\n", 58 | "3. Decision Tree can be used for both classification and regression problems.\n", 59 | "\n", 60 | "4. Decision Tree can handle both continuous and categorical variables.\n", 61 | "\n", 62 | "5. No feature scaling required: No feature scaling (standardization and normalization) required in case of Decision Tree as it uses rule based approach instead of distance calculation.\n", 63 | "\n", 64 | "6. Handles non-linear parameters efficiently: Non linear parameters don't affect the performance of a Decision Tree unlike curve based algorithms. So, if there is high non-linearity between the independent variables, Decision Trees may outperform as compared to other curve based algorithms.\n", 65 | "\n", 66 | "7. Decision Tree can automatically handle missing values.\n", 67 | "\n", 68 | "8. Decision Tree is usually robust to outliers and can handle them automatically.\n", 69 | "\n", 70 | "9. 
Less Training Period: Training period is less as compared to Random Forest because it generates only one tree unlike forest of trees in the Random Forest. " 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "##### 3. Disadvantages\n", 78 | "Disadvantages of Decision Tree\n", 79 | "\n", 80 | "1. Overfitting: This is the main problem of the Decision Tree. It generally leads to overfitting of the data which ultimately leads to wrong predictions. In order to fit the data (even noisy data), it keeps generating new nodes and ultimately the tree becomes too complex to interpret. In this way, it loses its generalization capabilities. It performs very well on the trained data but starts making a lot of mistakes on the unseen data.\n", 81 | "\n", 82 | "\n", 83 | "2. High variance: As mentioned in point 1, Decision Tree generally leads to the overfitting of data. Due to the overfitting, there are very high chances of high variance in the output which leads to many errors in the final estimation and shows high inaccuracy in the results. In order to achieve zero bias (overfitting), it leads to high variance. \n", 84 | "\n", 85 | "3. Unstable: Adding a new data point can lead to re-generation of the overall tree and all nodes need to be recalculated and recreated. \n", 86 | "\n", 87 | "4. Not suitable for large datasets: If data size is large, then one single tree may grow complex and lead to overfitting. So in this case, we should use Random Forest instead of a single Decision Tree." 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "##### 4. Whether Feature Scaling is required?\n", 95 | "No\n", 96 | "\n", 97 | "##### 6. Impact of outliers?\n", 98 | "It is not sensitive to outliers.Since, extreme values or outliers, never cause much reduction in RSS, they are never involved in split. Hence, tree based methods are insensitive to outliers." 
99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "##### Types of Problems it can solve(Supervised)\n", 106 | "1. Classification\n", 107 | "2. Regression" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "##### Overfitting And Underfitting\n", 115 | "How to avoid overfitting\n", 116 | "\n", 117 | "https://www.youtube.com/watch?v=SLOyyFHbiqo" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "\n" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "#### Practical Implementation\n", 132 | "1. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html\n", 133 | "2. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "##### Performance Metrics" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "##### Classification\n", 148 | "1. Confusion Matrix \n", 149 | "2. Precision, Recall, F1 score\n", 150 | "\n", 151 | "##### Regression\n", 152 | "1. R2, Adjusted R2\n", 153 | "2. 
MSE,RMSE,MAE" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": null, 159 | "metadata": {}, 160 | "outputs": [], 161 | "source": [] 162 | } 163 | ], 164 | "metadata": { 165 | "kernelspec": { 166 | "display_name": "Python 3", 167 | "language": "python", 168 | "name": "python3" 169 | }, 170 | "language_info": { 171 | "codemirror_mode": { 172 | "name": "ipython", 173 | "version": 3 174 | }, 175 | "file_extension": ".py", 176 | "mimetype": "text/x-python", 177 | "name": "python", 178 | "nbconvert_exporter": "python", 179 | "pygments_lexer": "ipython3", 180 | "version": "3.7.9" 181 | } 182 | }, 183 | "nbformat": 4, 184 | "nbformat_minor": 4 185 | } 186 | -------------------------------------------------------------------------------- /Interview Preparation- Day 5-Logistic Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Logistics Regression\n", 10 | " \n", 11 | "\n", 12 | "Theoretical Understanding:\n", 13 | "\n", 14 | "1. Tutorial 35:Logitic Regression Part 1 https://www.youtube.com/watch?v=L_xBe7MbPwk\n", 15 | "2. Tutorial 36:Logitic Regression Part 2 https://www.youtube.com/watch?v=uFfsSgQgerw\n", 16 | "3. Tutorial 39:Logitic Regression Part 3 https://www.youtube.com/watch?v=V8fS0T_ktn4\n", 17 | "4. Tutorial 42:How To Find Optimal Threshold for Binary classification: https://www.youtube.com/watch?v=_AjhdXuXEDE\n", 18 | "5. Interview question: https://www.youtube.com/watch?v=tcaruVHXZwE&t=122s" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "##### 1. What Are the Basic Assumption?\n", 33 | "1. 
Linear relation between the independent features and the log odds" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "##### 2. Advantages\n", 41 | "Advantages of Logistic Regression\n", 42 | "\n", 43 | "1. Logistic Regression is very easy to understand\n", 44 | "2. It requires less training time\n", 45 | "3. Good accuracy for many simple data sets and it performs well when the dataset is linearly separable.\n", 46 | "4. It makes no assumptions about distributions of classes in feature space.\n", 47 | "5. Logistic regression is less inclined to overfitting but it can overfit in high dimensional datasets. One may consider Regularization (L1 and L2) techniques to avoid overfitting in these scenarios.\n", 48 | "6. Logistic regression is easier to implement, interpret, and very efficient to train.\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "##### 3. Disadvantages\n", 56 | "1. Sometimes a lot of feature engineering is required\n", 57 | "2. If the independent features are correlated it may affect performance\n", 58 | "3. It is often quite prone to noise and overfitting\n", 59 | "4. If the number of observations is less than the number of features, Logistic Regression should not be used; otherwise, it may lead to overfitting.\n", 60 | "5. Non-linear problems can’t be solved with logistic regression because it has a linear decision surface. Linearly separable data is rarely found in real-world scenarios.\n", 61 | "6. It is tough to capture complex relationships using logistic regression. More powerful and complex algorithms such as Neural Networks can easily outperform this algorithm.\n", 62 | "7. In Linear Regression, independent and dependent variables are related linearly. But Logistic Regression requires that the independent variables are linearly related to the log odds (log(p/(1-p)))." 
63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "##### 4. Whether Feature Scaling is required?\n", 70 | "yes\n", 71 | "\n", 72 | "#### 5. Missing Values\n", 73 | "Sensitive to missing values\n", 74 | "\n", 75 | "##### 6. Impact of outliers?\n", 76 | "Like linear regression, estimates of the logistic regression are sensitive to the unusual observations: outliers, high leverage, and influential observations. Numerical examples and analysis are presented to demonstrate the most recent outlier diagnostic methods using data sets from medical domain" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "##### Types of Problems it can solve(Supervised)\n", 84 | "1. Classification" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "\n" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "#### Practical Implementation\n", 99 | "1. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "##### Performance Metrics" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "##### Classification\n", 114 | "1. Confusion Matrix \n", 115 | "2. Precision,Recall, F1 score\n", 116 | "\n", 117 | "1. Part 1 https://www.youtube.com/watch?v=aWAnNHXIKww\n", 118 | "2. 
Part 2 https://www.youtube.com/watch?v=A_ZKMsZ3f3o" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [] 127 | } 128 | ], 129 | "metadata": { 130 | "kernelspec": { 131 | "display_name": "Python 3", 132 | "language": "python", 133 | "name": "python3" 134 | }, 135 | "language_info": { 136 | "codemirror_mode": { 137 | "name": "ipython", 138 | "version": 3 139 | }, 140 | "file_extension": ".py", 141 | "mimetype": "text/x-python", 142 | "name": "python", 143 | "nbconvert_exporter": "python", 144 | "pygments_lexer": "ipython3", 145 | "version": "3.7.9" 146 | } 147 | }, 148 | "nbformat": 4, 149 | "nbformat_minor": 4 150 | } 151 | -------------------------------------------------------------------------------- /Interview Preparation-Random Forest-Bagging.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Decision Tree Classifier And Regressor\n", 10 | "Interview Questions:\n", 11 | "1. Decision Tree \n", 12 | "2. Entropy, Information Gain, Gini Impurity\n", 13 | "3. Decision Tree Working For Categorical and Numerical Features\n", 14 | "4. What are the scenarios where Decision Tree works well\n", 15 | "5. Decision Tree Low Bias And High Variance- Overfitting\n", 16 | "6. Hyperparameter Techniques\n", 17 | "7. Library used for constructing decision tree\n", 18 | "8. Impact of Outliers Of Decision Tree\n", 19 | "9. Impact of mising values on Decision Tree\n", 20 | "10. Does Decision Tree require Feature Scaling\n", 21 | "\n", 22 | "##### Random Forest Classifier And Regresor\n", 23 | "\n", 24 | "11. Ensemble Techniques(Boosting And Bagging)\n", 25 | "12. Working of Random Forest Classifier\n", 26 | "13. Working of Random Forest Regresor\n", 27 | "14. 
Hyperparameter Tuning(Grid Search And RandomSearch)\n", 28 | " \n", 29 | "\n", 30 | "Theoretical Understanding:\n", 31 | "\n", 32 | "1. Tutorial 37:Entropy In Decision Tree https://www.youtube.com/watch?v=1IQOtJ4NI_0\n", 33 | "2. Tutorial 38:Information Gain https://www.youtube.com/watch?v=FuTRucXB9rA\n", 34 | "3. Tutorial 39:Gini Impurity https://www.youtube.com/watch?v=5aIFgrrTqOw\n", 35 | "4. Tutorial 40: Decision Tree For Numerical Features: https://www.youtube.com/watch?v=5O8HvA9pMew \n", 36 | "5. How To Visualize DT: https://www.youtube.com/watch?v=ot75kOmpYjI\n", 37 | "\n", 38 | "Theoretical Understanding:\n", 39 | "1. Ensemble technique(Bagging): https://www.youtube.com/watch?v=KIOeZ5cFZ50\n", 40 | "2. Random forest Classifier And Regressor\n", 41 | "https://www.youtube.com/watch?v=nxFG5xdpDto\n", 42 | "3. Construct Decision Tree And working in Random Forest: https://www.youtube.com/watch?v=WQ0iJSbnnZA&t=406s" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "#### Important properties of Random Forest Classifiers \n", 50 | "\n", 51 | "1. Decision Tree---Low Bias And High Variance\n", 52 | " \n", 53 | "2. Ensemble Bagging(Random Forest Classifier)--Low Bias And Low Variance\n", 54 | " " 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "##### 1. What Are the Basic Assumption?\n", 62 | "There are no such assumptions" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "##### 2. Advantages\n", 70 | "Advantages of Random Forest\n", 71 | "\n", 72 | "1. Doesn't Overfit\n", 73 | "\n", 74 | "2. Favourite algorithm for Kaggle competition\n", 75 | "\n", 76 | "3. Less Parameter Tuning required\n", 77 | "\n", 78 | "4. Decision Tree can handle both continuous and categorical variables.\n", 79 | "\n", 80 | "5. 
No feature scaling required: No feature scaling (standardization and normalization) is required in case of Random Forest as it uses Decision Trees internally\n", 81 | "\n", 82 | "6. Suitable for a wide range of ML problems (both classification and regression)\n", 83 | " " 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "##### 3. Disadvantages\n", 91 | "Disadvantages of Random Forest\n", 92 | "\n", 93 | "1. Biased towards features having many categories\n", 94 | "\n", 95 | "2. Biased in multiclass classification problems towards more frequent classes." 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "##### 4. Whether Feature Scaling is required?\n", 103 | "No\n", 104 | "\n", 105 | "##### 6. Impact of outliers?\n", 106 | "Robust to Outliers" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "##### Types of Problems it can solve(Supervised)\n", 114 | "1. Classification\n", 115 | "2. Regression" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "\n" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "#### Practical Implementation\n", 130 | "1. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html\n", 131 | "2. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html\n", 132 | "\n", 133 | "1. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html\n", 134 | "2. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "##### Performance Metrics" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "##### Classification\n", 149 | "1. Confusion Matrix \n", 150 | "2. 
Precision,Recall, F1 score\n", 151 | "\n", 152 | "##### Regression\n", 153 | "1. R2,Adjusted R2\n", 154 | "2. MSE,RMSE,MAE" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [] 163 | } 164 | ], 165 | "metadata": { 166 | "kernelspec": { 167 | "display_name": "Python 3", 168 | "language": "python", 169 | "name": "python3" 170 | }, 171 | "language_info": { 172 | "codemirror_mode": { 173 | "name": "ipython", 174 | "version": 3 175 | }, 176 | "file_extension": ".py", 177 | "mimetype": "text/x-python", 178 | "name": "python", 179 | "nbconvert_exporter": "python", 180 | "pygments_lexer": "ipython3", 181 | "version": "3.7.7" 182 | } 183 | }, 184 | "nbformat": 4, 185 | "nbformat_minor": 4 186 | } 187 | -------------------------------------------------------------------------------- /Interview Preparation-Xgboost,GBboost,Adaboost--Boosting.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### How To Learn Machine Learning Algorithms For Interviews\n", 8 | "\n", 9 | "#### Decision Tree Classifier And Regressor\n", 10 | "Interview Questions:\n", 11 | "1. Decision Tree \n", 12 | "2. Entropy, Information Gain, Gini Impurity\n", 13 | "3. Decision Tree Working For Categorical and Numerical Features\n", 14 | "4. What are the scenarios where Decision Tree works well\n", 15 | "5. Decision Tree Low Bias And High Variance- Overfitting\n", 16 | "6. Hyperparameter Techniques\n", 17 | "7. Library used for constructing decision tree\n", 18 | "8. Impact of Outliers Of Decision Tree\n", 19 | "9. Impact of mising values on Decision Tree\n", 20 | "10. 
Does Decision Tree require Feature Scaling\n", 21 | "\n", 22 | "#### Xgboost Classifier And Regressor, GB Algorithm, Adaboost\n", 23 | "\n", 24 | " \n", 25 | "\n", 26 | "Decision Tree Theoretical Understanding:\n", 27 | "\n", 28 | "1. Tutorial 37: Entropy In Decision Tree: https://www.youtube.com/watch?v=1IQOtJ4NI_0\n", 29 | "2. Tutorial 38: Information Gain: https://www.youtube.com/watch?v=FuTRucXB9rA\n", 30 | "3. Tutorial 39: Gini Impurity: https://www.youtube.com/watch?v=5aIFgrrTqOw\n", 31 | "4. Tutorial 40: Decision Tree For Numerical Features: https://www.youtube.com/watch?v=5O8HvA9pMew \n", 32 | "5. How To Visualize DT: https://www.youtube.com/watch?v=ot75kOmpYjI\n", 33 | "\n", 34 | "Theoretical Understanding:\n", 35 | "\n", 36 | "1. Ensemble technique (Bagging): https://www.youtube.com/watch?v=KIOeZ5cFZ50\n", 37 | "2. Adaboost (Boosting Technique): https://www.youtube.com/watch?v=NLRO1-jp5F8\n", 38 | "3. Gradient Boosting In Depth Intuition Part 1: https://www.youtube.com/watch?v=Nol1hVtLOSg\n", 39 | "4. Gradient Boosting In Depth Intuition Part 2: https://www.youtube.com/watch?v=Oo9q6YtGzvc\n", 40 | "5. Xgboost Classifier In Depth Intuition: https://www.youtube.com/watch?v=gPciUPwWJQQ\n", 41 | "6. Xgboost Regression In Depth Intuition: https://www.youtube.com/watch?v=w-_vmVfpssg\n", 42 | "7. Implementation of Xgboost: https://youtu.be/9HomdnM12o4" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "##### 1. What Are the Basic Assumptions?\n", 50 | "There are no such assumptions" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "#### Missing Values\n", 58 | "1. Xgboost handles missing values natively: it learns a default split direction for them\n", 59 | "2. Adaboost and classic Gradient Boosting implementations usually need missing values imputed first" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "##### 2. Advantages\n", 67 | "Advantages of Adaboost\n", 68 | "\n", 69 | "1. Relatively resistant to overfitting when the data has little noise\n", 70 | "\n", 71 | "2. 
It has few parameters to tune\n", 72 | " \n", 73 | "Advantages of Gradient Boost And Xgboost\n", 74 | " \n", 75 | "1. It usually achieves strong predictive performance\n", 76 | "2. It can model complex non-linear functions \n", 77 | "3. It works well across a wide range of ML use cases.\n", 78 | " " 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "##### 3. Disadvantages\n", 86 | "Disadvantages of Gradient Boosting And Xgboost\n", 87 | "\n", 88 | "1. It requires some amount of hyperparameter tuning" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "##### 4. Whether Feature Scaling is required?\n", 96 | "No\n", 97 | "\n", 98 | "##### 6. Impact of outliers?\n", 99 | "Gradient Boosting And Xgboost are fairly robust to outliers,\n", 100 | "Adaboost is sensitive to outliers" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "##### Types of Problems it can solve (Supervised)\n", 108 | "1. Classification\n", 109 | "2. Regression" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "\n" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "##### Performance Metrics" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "##### Classification\n", 131 | "1. Confusion Matrix \n", 132 | "2. Precision, Recall, F1 score\n", 133 | "\n", 134 | "##### Regression\n", 135 | "1. R2, Adjusted R2\n", 136 | "2. 
MSE, RMSE, MAE" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [] 145 | } 146 | ], 147 | "metadata": { 148 | "kernelspec": { 149 | "display_name": "Python 3", 150 | "language": "python", 151 | "name": "python3" 152 | }, 153 | "language_info": { 154 | "codemirror_mode": { 155 | "name": "ipython", 156 | "version": 3 157 | }, 158 | "file_extension": ".py", 159 | "mimetype": "text/x-python", 160 | "name": "python", 161 | "nbconvert_exporter": "python", 162 | "pygments_lexer": "ipython3", 163 | "version": "3.7.7" 164 | } 165 | }, 166 | "nbformat": 4, 167 | "nbformat_minor": 4 168 | } 169 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Interview-Prepartion-Data-Science --------------------------------------------------------------------------------