├── Algorithms & Theory
├── Categories
├── Programming
└── README.md

/Algorithms & Theory:
--------------------------------------------------------------------------------
Q1: What's the trade-off between bias and variance?

A1: Bias: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using.
This can lead the model to underfit your data, making it hard for it to have high predictive accuracy
and for you to generalize your knowledge from the training set to the test set.

Variance: Variance is error due to too much complexity in the learning algorithm you're using.
This leads to the algorithm being highly sensitive to small variations in your training data,
which can cause your model to overfit the data.
You'll be carrying too much noise from your training data for your model to be very useful on your test data.

Bias-Variance Relationship:

The bias-variance decomposition essentially decomposes the expected error of any learning algorithm into three parts:
the bias, the variance, and a bit of irreducible error due to noise in the underlying dataset.
Essentially, if you make the model more complex and add more variables, you'll reduce bias but gain variance —
to get the optimally reduced amount of error, you'll have to trade off bias and variance.
You don't want either high bias or high variance in your model.

Q2: What is the difference between supervised and unsupervised machine learning?

A2:
Supervised learning:
Supervised learning requires labeled training data.
For example, in order to do classification (a supervised learning task),
you'll need to first label the data you'll use to train the model to classify data into your labeled groups.

Unsupervised learning:
In contrast, unsupervised learning does not require explicitly labeled data:
it finds structure in the data on its own, as in clustering.

Q3: How is kNN different from k-means clustering?

A3: kNN: k-Nearest Neighbors is a supervised classification algorithm,
while k-means clustering is an unsupervised clustering algorithm.
Similarity: While the mechanisms may seem similar at first, the two differ fundamentally in their inputs:
in order for k-Nearest Neighbors to work,
you need labeled data into which to classify an unlabeled point (thus the "nearest neighbor" part).
K-means clustering requires only a set of unlabeled points and a chosen number of clusters k:
the algorithm takes the unlabeled points and gradually learns how to cluster them into groups
by assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.
Difference: kNN needs labeled points and is thus supervised learning,
while k-means doesn't — and is thus unsupervised learning.

Q4: Explain how a ROC curve works.
A4: The ROC curve is a graphical representation of the contrast
between the true positive rate and the false positive rate at various classification thresholds.
It's often used as a proxy for the trade-off between the sensitivity of
the model (true positives) vs the fall-out, or the probability it will trigger a false alarm (false positives).

Q5: Define precision and recall.
A5: Recall:
Recall is also known as the true positive rate:
the number of positives your model correctly identifies divided by the actual number of positives throughout the data.

Precision:
Precision is also known as the positive predictive value, and it is a measure of
the number of accurate positives your model claims compared to the total number of positives it claims.

Example:
It can be easier to think of recall and precision in the context of a case
where you've predicted that there were 10 apples and 5 oranges in a case of 10 apples.
You'd have perfect recall (there are actually 10 apples, and you predicted there would be 10)
but 66.7% precision, because out of the 15 events you predicted, only 10 (the apples) are correct.
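
To make the apples example concrete, here is a minimal sketch (an illustration only, assuming scikit-learn is
available; the arrays are made up to mirror the example). Apples are the positive class, the case truly holds
10 apples and 5 non-apples, and the model claims all 15 items as apples:

    from sklearn.metrics import precision_score, recall_score

    # Ground truth: the case contains 10 apples (positive class, 1)
    # and 5 items that are not apples (0).
    y_true = [1] * 10 + [0] * 5
    # The model claims all 15 items as apples.
    y_pred = [1] * 15

    print(recall_score(y_true, y_pred))     # 1.0 -> perfect recall
    print(precision_score(y_true, y_pred))  # ~0.667 -> 10 of the 15 claimed positives are correct
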
Q6: What is Bayes' Theorem? How is it useful in a machine learning context?
A6: Bayes' Theorem gives you the posterior probability of an event given what is known as prior knowledge.
Mathematically, the posterior is the probability of a positive result among true cases, weighted by the prevalence,
divided by the total probability of a positive result across the whole population.

Say a flu test correctly detects the flu 60% of the time in people who actually have it,
gives a false positive 50% of the time in people who don't,
and the overall population only has a 5% chance of having the flu.

Would you actually have a 60% chance of having the flu after a positive test?
Bayes' Theorem says no:

P(flu | positive) = P(positive | flu) * P(flu)
                    / (P(positive | flu) * P(flu) + P(positive | no flu) * P(no flu))
                  = (0.6 * 0.05) / ((0.6 * 0.05) + (0.5 * 0.95))
                  ≈ 0.0594,

so a positive test gives you only about a 5.94% chance of actually having the flu.

Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier.
That's something important to consider when you're faced with machine learning interview questions.
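
A quick numeric check of that calculation in plain Python (the variable names are just illustrative labels
for the quantities above):

    # P(flu | positive) via Bayes' Theorem
    sensitivity = 0.60      # P(positive | flu)
    false_positive = 0.50   # P(positive | no flu)
    prevalence = 0.05       # P(flu)

    posterior = (sensitivity * prevalence) / (
        sensitivity * prevalence + false_positive * (1 - prevalence)
    )
    print(round(posterior, 4))  # 0.0594 -> about a 5.94% chance of actually having the flu
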
Q7: Why is "Naive" Bayes naive?
A7: Despite its practical applications, especially in text mining,
Naive Bayes is considered "naive" because it makes an assumption that is virtually impossible to see in real-life data:
the conditional probability is calculated as the pure product of the individual probabilities of the components.
This implies the absolute independence of features — a condition probably never met in real life.

As a Quora commenter put it whimsically,
a Naive Bayes classifier that figured out that you liked pickles and ice cream
would probably naively recommend you a pickle ice cream.

Q8: Explain the difference between L1 and L2 regularization.
A8: L2 regularization tends to spread error among all the terms,
while L1 is more binary/sparse, driving many weights exactly to zero.
L1 corresponds to setting a Laplace prior on the terms, while L2 corresponds to a Gaussian prior.

Q9: What's your favorite algorithm, and can you explain it to me in less than a minute?
A9: This type of question tests your understanding of how to
communicate complex and technical nuances with poise, and your ability to summarize quickly and efficiently.
Make sure you have a choice, and make sure you can explain different algorithms
so simply and effectively that a five-year-old could grasp the basics!

Q10: What's the difference between Type I and Type II error?
A10: Don't think that this is a trick question!
Many machine learning interview questions are an attempt to lob basic questions at you just to
make sure you're on top of your game and you've covered all of your bases.

Type I error is a false positive, while Type II error is a false negative.
Briefly stated, a Type I error means claiming something has happened when it hasn't,
while a Type II error means claiming nothing is happening when in fact something is.

A clever way to think about this is to think of a Type I error as telling a man he is pregnant,
while a Type II error means you tell a pregnant woman she isn't carrying a baby.

Q11: What's a Fourier transform?
A11: A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions.
Or, as one intuitive tutorial puts it: given a smoothie, it's how we find the recipe.
The Fourier transform finds the set of cycle speeds, amplitudes and phases that match any time signal.
A Fourier transform converts a signal from the time domain to the frequency domain —
it's a very common way to extract features from audio signals or other time series such as sensor data.

Q12: What's the difference between probability and likelihood?
A12: Probability measures how likely a particular outcome is, given fixed model parameters,
while likelihood measures how plausible particular parameter values are, given a fixed set of observed data.
In other words, probability attaches to possible outcomes; likelihood attaches to hypotheses about parameters.
Many models are fit by maximizing the likelihood of the observed data.

Q13: What is deep learning, and how does it contrast with other machine learning algorithms?
A13: Deep learning is a subset of machine learning that is concerned with neural networks:
how to use backpropagation and certain principles from neuroscience to more accurately model large sets of
unlabelled or semi-structured data.
In that sense, deep learning stands out by learning representations of the data itself through the use of
multi-layered neural nets, rather than relying on hand-engineered features,
and it can be applied in supervised, unsupervised, and semi-supervised settings.

Q14: What's the difference between a generative and discriminative model?
A14: A generative model learns how the data in each category is distributed,
while a discriminative model simply learns the decision boundary between different categories of data.
Discriminative models will generally outperform generative models on classification tasks.

Q15: What cross-validation technique would you use on a time series dataset?
A15: Instead of using standard k-fold cross-validation,
you have to pay attention to the fact that a time series is not randomly distributed data —
it is inherently ordered chronologically.
If a pattern emerges in later time periods, for example, your model may still pick up on it
even if that effect doesn't hold in earlier years!

You'll want to do something like forward chaining, where you model on past data and then test on forward-facing data:

fold 1 : training [1], test [2]
fold 2 : training [1 2], test [3]
fold 3 : training [1 2 3], test [4]
fold 4 : training [1 2 3 4], test [5]
fold 5 : training [1 2 3 4 5], test [6]
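
If you happen to be using scikit-learn, its TimeSeriesSplit class implements this forward-chaining scheme.
A minimal sketch (the six toy time periods are an assumption for the example) that reproduces the folds listed above:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    # Six consecutive time periods, kept in chronological order.
    X = np.arange(6).reshape(-1, 1)

    # Each fold trains on all earlier periods and tests on the next one.
    tscv = TimeSeriesSplit(n_splits=5)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
        print(f"fold {fold} : training {train_idx + 1}, test {test_idx + 1}")
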
Q16: How is a decision tree pruned?
A16: Pruning is what happens in decision trees when branches that have weak predictive power are removed
in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model.
Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning.

Reduced error pruning is perhaps the simplest version: replace each node with its most popular class;
if predictive accuracy on a validation set doesn't decrease, keep the node pruned.
While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.

Q17: Which is more important to you: model accuracy or model performance?
A17: This question tests your grasp of the nuances of machine learning model performance!
Machine learning interview questions often look towards the details.
There are models with higher accuracy that can perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance,
and at that, a sometimes misleading one.
For example, if you wanted to detect fraud in a massive dataset with a sample of millions,
a more accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud.
However, this would be useless for a predictive model — a model designed to find fraud
that asserted there was no fraud at all!
Questions like this help you demonstrate that you understand model accuracy isn't the be-all and end-all of model performance.

Q18: What's the F1 score? How would you use it?
A18: The F1 score is a measure of a model's performance.
It is the harmonic mean of the precision and recall of a model, with results tending toward 1 being the best
and those tending toward 0 being the worst.
You would use it in classification tasks where true negatives don't matter much.

Q19: How would you handle an imbalanced dataset?
A19: An imbalanced dataset is when you have, for example,
a classification task in which 90% of the data is in one class.
That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the other category of data!

Here are a few tactics to get over the hump:
1- Collect more data to even out the imbalances in the dataset.
2- Resample the dataset to correct for imbalances (see the sketch after this answer).
3- Try a different algorithm altogether on your dataset.

What's important here is that you have a keen sense of what damage an unbalanced dataset can cause, and how to balance it.
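
As one illustration of tactic 2, here is a sketch of upsampling a minority class with scikit-learn's resample
utility (the 90/10 toy dataset is an assumption made up for the example):

    import numpy as np
    from sklearn.utils import resample

    # Toy imbalanced dataset: 90 samples of class 0, 10 of class 1.
    X = np.random.randn(100, 3)
    y = np.array([0] * 90 + [1] * 10)

    # Upsample the minority class (with replacement) to match the majority.
    X_min, y_min = X[y == 1], y[y == 1]
    X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                                  n_samples=90, random_state=0)

    X_bal = np.vstack([X[y == 0], X_min_up])
    y_bal = np.concatenate([y[y == 0], y_min_up])
    print(np.bincount(y_bal))  # [90 90] -> the classes are now balanced
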
Q20: When should you use classification over regression?
A20: Classification produces discrete values and maps a dataset to strict categories,
while regression gives you continuous results that allow you to better distinguish differences between individual points.
You would use classification over regression
if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories
(e.g., if you wanted to know whether a name was male or female, rather than just how correlated it was with male and female names).

Q21: Name an example where ensemble techniques might be useful.
A21: Ensemble techniques use a combination of learning algorithms to optimize for better predictive performance.
They typically reduce overfitting in models and make the model more robust
(unlikely to be influenced by small changes in the training data).

You could list some examples of ensemble methods,
from bagging to boosting to a "bucket of models" method, and demonstrate how they could increase predictive power.

Q22: How do you ensure you're not overfitting with a model?
A22: This is a simple restatement of a fundamental problem in machine learning:
the possibility of overfitting the training data and carrying the noise of that data through to the test set,
thereby providing inaccurate generalizations.

There are three main methods to avoid overfitting:
1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters,
thereby removing some of the noise in the training data.
2- Use cross-validation techniques such as k-fold cross-validation.
3- Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting.

Q23: What evaluation approaches would you use to gauge the effectiveness of a machine learning model?
A23: You would first split the dataset into training and test sets,
or perhaps use cross-validation techniques to further segment the dataset into composite sets of
training and test sets within the data.

You should then implement a choice selection of performance metrics:
you could use measures such as the F1 score, the accuracy, and the confusion matrix.

What's important here is to demonstrate that you understand the nuances of how a model is measured and
how to choose the right performance measures for the right situations.

Q24: How would you evaluate a logistic regression model?
A24: A subsection of the question above.
You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction, etc.)
and bring up a few examples and use cases.

Q25: What's the "kernel trick" and how is it useful?
A25: The kernel trick involves kernel functions that enable operation in a higher-dimensional, implicit feature space
without explicitly calculating the coordinates of points within that space:
instead, kernel functions compute the inner products between the images of all pairs of data in the feature space.
This gives them the very useful property of working with the geometry of higher dimensions
while being computationally cheaper than the explicit calculation of said coordinates.

Many algorithms can be expressed in terms of inner products.
Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.
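
A small sketch of the idea (assuming NumPy; the degree-2 polynomial kernel and toy vectors are chosen purely for
illustration): the kernel value computed in the original 2-D space matches the inner product in the explicit
3-D feature space, without ever constructing that space for the kernel side.

    import numpy as np

    def phi(v):
        # Explicit feature map for the degree-2 polynomial kernel in 2-D:
        # phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
        return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 4.0])

    explicit = phi(x) @ phi(z)  # inner product in the higher-dimensional space
    kernel = (x @ z) ** 2       # same value, computed in the original space
    print(round(explicit, 6), kernel)  # 121.0 121.0
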
--------------------------------------------------------------------------------
/Categories:
--------------------------------------------------------------------------------
Machine Learning Interview

The first category has to do with the algorithms and theory behind machine learning.
You'll have to show an understanding of how algorithms compare with one another and
how to measure their efficacy and accuracy in the right way.

The second category has to do with your programming skills and
your ability to execute on top of those algorithms and the theory.

The third has to do with your general interest in machine learning:
you'll be asked about what's going on in the industry and how you
keep up with the latest machine learning trends.

Finally, there are company- or industry-specific questions that test
your ability to take your general machine learning knowledge and
turn it into actionable points to drive the bottom line forward.
--------------------------------------------------------------------------------
/Programming:
--------------------------------------------------------------------------------
These machine learning interview questions test your knowledge of the programming principles you need to implement
machine learning principles in practice.
Machine learning interview questions tend to be technical questions that test your logic and programming skills:
this section focuses more on the latter.

Q26: How do you handle missing or corrupted data in a dataset?
A26: You could find missing/corrupted data in a dataset and either drop those rows or columns,
or decide to replace them with another value.

In Pandas, there are two very useful methods, isnull() and dropna(), that
will help you find columns of data with missing or corrupted data and drop those values.
If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
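
A minimal Pandas sketch of those methods (the tiny DataFrame is an assumption made up for the example):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"age": [25, np.nan, 31],
                       "income": [40000, 52000, np.nan]})

    print(df.isnull().sum())  # count missing values per column
    print(df.dropna())        # drop rows containing missing values
    print(df.fillna(0))       # or replace them with a placeholder such as 0
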
Q27: Do you have experience with Spark or big data tools for machine learning?
A27: You'll want to get familiar with what big data means for different companies and the different tools they'll want.
Spark is the big data tool most in demand now, able to handle immense datasets with speed.
Be honest if you don't have experience with the tools demanded,
but also take a look at job descriptions and see what tools pop up: you'll want to invest in familiarizing yourself with them.

Q28: Pick an algorithm. Write the pseudo-code for a parallel implementation.
A28: This kind of question demonstrates your ability to think in parallelism and
how you could handle concurrency in programming implementations dealing with big data.
Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to
help you demonstrate your ability to write code that reflects parallelism.

Q29: What are some differences between a linked list and an array?
A29: An array is an ordered collection of objects.
A linked list is a series of objects with pointers that direct how to process them sequentially.
An array assumes that every element has the same size, unlike the linked list.
A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth.
Shuffling a linked list involves just changing which pointers direct where,
but shuffling an array is more complex and takes more memory.

Q30: Describe a hash table.
A30: A hash table is a data structure that produces an associative array.
A key is mapped to certain values through the use of a hash function.
Hash tables are often used for tasks such as database indexing.

Q31: Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?
A31: What's important here is to define your views on how to properly visualize data
and your personal preferences when it comes to tools.
Popular tools include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Machine-Learning-Interview
A List of Machine Learning Interview Questions.
--------------------------------------------------------------------------------