├── Algorithms & Theory
├── Categories
├── Programming
└── README.md

/Algorithms & Theory:
--------------------------------------------------------------------------------
Q1: What's the trade-off between bias and variance?

A1: Bias: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using.
This can lead the model to underfit your data, making it hard for it to have high predictive accuracy
and for you to generalize your knowledge from the training set to the test set.

Variance: Variance is error due to too much complexity in the learning algorithm you're using.
This leads to the algorithm being highly sensitive to small variations in your training data,
which can cause your model to overfit the data.
You'll be carrying too much noise from your training data for your model to be very useful on your test data.

Bias-Variance Relationship:

The bias-variance decomposition essentially decomposes the expected error of any learning algorithm into three parts:
the bias, the variance, and a bit of irreducible error due to noise in the underlying dataset.
Essentially, if you make the model more complex and add more variables, you'll reduce bias but gain variance —
to get the optimally reduced amount of error, you'll have to trade off bias and variance.
You don't want either high bias or high variance in your model.

Q2: What is the difference between supervised and unsupervised machine learning?

A2:
Supervised learning:
Supervised learning requires labeled training data.
For example, in order to do classification (a supervised learning task),
you'll need to first label the data you'll use to train the model to classify data into your labeled groups.

Unsupervised learning:
In contrast, unsupervised learning does not require explicitly labeled data:
it finds structure in the data on its own, as in clustering.

Q3: How is kNN different from k-means clustering?

A3: kNN: k-Nearest Neighbors is a supervised classification algorithm,
while k-means clustering is an unsupervised clustering algorithm.
Similarity: While the mechanisms may seem similar at first, the two differ fundamentally in their inputs:
in order for k-Nearest Neighbors to work,
you need labeled data into which to classify an unlabeled point (thus the "nearest neighbor" part).
K-means clustering requires only a set of unlabeled points and a chosen number of clusters k:
the algorithm takes the unlabeled points and gradually learns how to cluster them into groups
by assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.
Difference: kNN needs labeled points and is thus supervised learning,
while k-means doesn't — and is thus unsupervised learning.

Q4: Explain how a ROC curve works.
A4: The ROC curve is a graphical representation of the contrast
between the true positive rate and the false positive rate at various classification thresholds.
It's often used as a proxy for the trade-off between the sensitivity of
the model (true positives) vs the fall-out, or the probability it will trigger a false alarm (false positives).

Q5: Define precision and recall.
A5: Recall:
Recall is also known as the true positive rate:
the number of positives your model correctly identifies divided by the actual number of positives throughout the data.

Precision:
Precision is also known as the positive predictive value, and it is a measure of
the number of accurate positives your model claims compared to the total number of positives it claims.

Example:
It can be easier to think of recall and precision in the context of a case
where you've predicted that there were 10 apples and 5 oranges in a case of 10 apples.
You'd have perfect recall (there are actually 10 apples, and you predicted there would be 10)
but 66.7% precision, because out of the 15 events you predicted, only 10 (the apples) are correct.
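
To make the apples example concrete, here is a minimal sketch (an illustration only, assuming scikit-learn is
available; the arrays are made up to mirror the example). Apples are the positive class, the case truly holds
10 apples and 5 non-apples, and the model claims all 15 items as apples:

    from sklearn.metrics import precision_score, recall_score

    # Ground truth: the case contains 10 apples (positive class, 1)
    # and 5 items that are not apples (0).
    y_true = [1] * 10 + [0] * 5
    # The model claims all 15 items as apples.
    y_pred = [1] * 15

    print(recall_score(y_true, y_pred))     # 1.0 -> perfect recall
    print(precision_score(y_true, y_pred))  # ~0.667 -> 10 of the 15 claimed positives are correct
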
Q6: What is Bayes' Theorem? How is it useful in a machine learning context?
A6: Bayes' Theorem gives you the posterior probability of an event given what is known as prior knowledge.
Mathematically, the posterior is the probability of a positive result among true cases, weighted by the prevalence,
divided by the total probability of a positive result across the whole population.

Say a flu test correctly detects the flu 60% of the time in people who actually have it,
gives a false positive 50% of the time in people who don't,
and the overall population only has a 5% chance of having the flu.

Would you actually have a 60% chance of having the flu after a positive test?
Bayes' Theorem says no:

P(flu | positive) = P(positive | flu) * P(flu)
                    / (P(positive | flu) * P(flu) + P(positive | no flu) * P(no flu))
                  = (0.6 * 0.05) / ((0.6 * 0.05) + (0.5 * 0.95))
                  ≈ 0.0594,

so a positive test gives you only about a 5.94% chance of actually having the flu.

Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier.
That's something important to consider when you're faced with machine learning interview questions.
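
A quick numeric check of that calculation in plain Python (the variable names are just illustrative labels
for the quantities above):

    # P(flu | positive) via Bayes' Theorem
    sensitivity = 0.60      # P(positive | flu)
    false_positive = 0.50   # P(positive | no flu)
    prevalence = 0.05       # P(flu)

    posterior = (sensitivity * prevalence) / (
        sensitivity * prevalence + false_positive * (1 - prevalence)
    )
    print(round(posterior, 4))  # 0.0594 -> about a 5.94% chance of actually having the flu
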
Q7: Why is "Naive" Bayes naive?
A7: Despite its practical applications, especially in text mining,
Naive Bayes is considered "naive" because it makes an assumption that is virtually impossible to see in real-life data:
the conditional probability is calculated as the pure product of the individual probabilities of the components.
This implies the absolute independence of features — a condition probably never met in real life.

As a Quora commenter put it whimsically,
a Naive Bayes classifier that figured out that you liked pickles and ice cream
would probably naively recommend you a pickle ice cream.

Q8: Explain the difference between L1 and L2 regularization.
A8: L2 regularization tends to spread error among all the terms,
while L1 is more binary/sparse, driving many weights exactly to zero.
L1 corresponds to setting a Laplace prior on the terms, while L2 corresponds to a Gaussian prior.

Q9: What's your favorite algorithm, and can you explain it to me in less than a minute?
A9: This type of question tests your understanding of how to
communicate complex and technical nuances with poise, and your ability to summarize quickly and efficiently.
Make sure you have a choice, and make sure you can explain different algorithms
so simply and effectively that a five-year-old could grasp the basics!

Q10: What's the difference between Type I and Type II error?
A10: Don't think that this is a trick question!
Many machine learning interview questions are an attempt to lob basic questions at you just to
make sure you're on top of your game and you've covered all of your bases.

Type I error is a false positive, while Type II error is a false negative.
Briefly stated, a Type I error means claiming something has happened when it hasn't,
while a Type II error means claiming nothing is happening when in fact something is.

A clever way to think about this is to think of a Type I error as telling a man he is pregnant,
while a Type II error means you tell a pregnant woman she isn't carrying a baby.

Q11: What's a Fourier transform?
A11: A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions.
Or, as one intuitive tutorial puts it: given a smoothie, it's how we find the recipe.
The Fourier transform finds the set of cycle speeds, amplitudes and phases that match any time signal.
A Fourier transform converts a signal from the time domain to the frequency domain —
it's a very common way to extract features from audio signals or other time series such as sensor data.

Q12: What's the difference between probability and likelihood?
A12: Probability measures how likely a particular outcome is, given fixed model parameters,
while likelihood measures how plausible particular parameter values are, given a fixed set of observed data.
In other words, probability attaches to possible outcomes; likelihood attaches to hypotheses about parameters.
Many models are fit by maximizing the likelihood of the observed data.

Q13: What is deep learning, and how does it contrast with other machine learning algorithms?
A13: Deep learning is a subset of machine learning that is concerned with neural networks:
how to use backpropagation and certain principles from neuroscience to more accurately model large sets of
unlabelled or semi-structured data.
In that sense, deep learning stands out by learning representations of the data itself through the use of
multi-layered neural nets, rather than relying on hand-engineered features,
and it can be applied in supervised, unsupervised, and semi-supervised settings.

Q14: What's the difference between a generative and discriminative model?
A14: A generative model learns how the data in each category is distributed,
while a discriminative model simply learns the decision boundary between different categories of data.
Discriminative models will generally outperform generative models on classification tasks.

Q15: What cross-validation technique would you use on a time series dataset?
A15: Instead of using standard k-fold cross-validation,
you have to pay attention to the fact that a time series is not randomly distributed data —
it is inherently ordered chronologically.
If a pattern emerges in later time periods, for example, your model may still pick up on it
even if that effect doesn't hold in earlier years!

You'll want to do something like forward chaining, where you model on past data and then test on forward-facing data:

fold 1 : training [1], test [2]
fold 2 : training [1 2], test [3]
fold 3 : training [1 2 3], test [4]
fold 4 : training [1 2 3 4], test [5]
fold 5 : training [1 2 3 4 5], test [6]
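
If you happen to be using scikit-learn, its TimeSeriesSplit class implements this forward-chaining scheme.
A minimal sketch (the six toy time periods are an assumption for the example) that reproduces the folds listed above:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    # Six consecutive time periods, kept in chronological order.
    X = np.arange(6).reshape(-1, 1)

    # Each fold trains on all earlier periods and tests on the next one.
    tscv = TimeSeriesSplit(n_splits=5)
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
        print(f"fold {fold} : training {train_idx + 1}, test {test_idx + 1}")
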
Q16: How is a decision tree pruned?
A16: Pruning is what happens in decision trees when branches that have weak predictive power are removed
in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model.
Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning.

Reduced error pruning is perhaps the simplest version: replace each node with its most popular class;
if predictive accuracy on a validation set doesn't decrease, keep the node pruned.
While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.

Q17: Which is more important to you: model accuracy or model performance?
A17: This question tests your grasp of the nuances of machine learning model performance!
Machine learning interview questions often look towards the details.
There are models with higher accuracy that can perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance,
and at that, a sometimes misleading one.
For example, if you wanted to detect fraud in a massive dataset with a sample of millions,
a more accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud.
However, this would be useless for a predictive model — a model designed to find fraud
that asserted there was no fraud at all!
Questions like this help you demonstrate that you understand model accuracy isn't the be-all and end-all of model performance.

Q18: What's the F1 score? How would you use it?
A18: The F1 score is a measure of a model's performance.
It is the harmonic mean of the precision and recall of a model, with results tending toward 1 being the best
and those tending toward 0 being the worst.
You would use it in classification tasks where true negatives don't matter much.

Q19: How would you handle an imbalanced dataset?
A19: An imbalanced dataset is when you have, for example,
a classification task in which 90% of the data is in one class.
That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the other category of data!

Here are a few tactics to get over the hump:
1- Collect more data to even out the imbalances in the dataset.
2- Resample the dataset to correct for imbalances (see the sketch after this answer).
3- Try a different algorithm altogether on your dataset.

What's important here is that you have a keen sense of what damage an unbalanced dataset can cause, and how to balance it.
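
As one illustration of tactic 2, here is a sketch of upsampling a minority class with scikit-learn's resample
utility (the 90/10 toy dataset is an assumption made up for the example):

    import numpy as np
    from sklearn.utils import resample

    # Toy imbalanced dataset: 90 samples of class 0, 10 of class 1.
    X = np.random.randn(100, 3)
    y = np.array([0] * 90 + [1] * 10)

    # Upsample the minority class (with replacement) to match the majority.
    X_min, y_min = X[y == 1], y[y == 1]
    X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                                  n_samples=90, random_state=0)

    X_bal = np.vstack([X[y == 0], X_min_up])
    y_bal = np.concatenate([y[y == 0], y_min_up])
    print(np.bincount(y_bal))  # [90 90] -> the classes are now balanced
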
Q20: When should you use classification over regression?
A20: Classification produces discrete values and maps a dataset to strict categories,
while regression gives you continuous results that allow you to better distinguish differences between individual points.
You would use classification over regression
if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories
(e.g., if you wanted to know whether a name was male or female, rather than just how correlated it was with male and female names).

Q21: Name an example where ensemble techniques might be useful.
A21: Ensemble techniques use a combination of learning algorithms to optimize for better predictive performance.
They typically reduce overfitting in models and make the model more robust
(unlikely to be influenced by small changes in the training data).

You could list some examples of ensemble methods,
from bagging to boosting to a "bucket of models" method, and demonstrate how they could increase predictive power.

Q22: How do you ensure you're not overfitting with a model?
A22: This is a simple restatement of a fundamental problem in machine learning:
the possibility of overfitting the training data and carrying the noise of that data through to the test set,
thereby providing inaccurate generalizations.

There are three main methods to avoid overfitting:
1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters,
thereby removing some of the noise in the training data.
2- Use cross-validation techniques such as k-fold cross-validation.
3- Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting.

Q23: What evaluation approaches would you use to gauge the effectiveness of a machine learning model?
A23: You would first split the dataset into training and test sets,
or perhaps use cross-validation techniques to further segment the dataset into composite sets of
training and test sets within the data.

You should then implement a choice selection of performance metrics:
you could use measures such as the F1 score, the accuracy, and the confusion matrix.

What's important here is to demonstrate that you understand the nuances of how a model is measured and
how to choose the right performance measures for the right situations.

Q24: How would you evaluate a logistic regression model?
A24: A subsection of the question above.
You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction, etc.)
and bring up a few examples and use cases.

Q25: What's the "kernel trick" and how is it useful?
A25: The kernel trick involves kernel functions that enable operation in a higher-dimensional, implicit feature space
without explicitly calculating the coordinates of points within that space:
instead, kernel functions compute the inner products between the images of all pairs of data in the feature space.
This gives them the very useful property of working with the geometry of higher dimensions
while being computationally cheaper than the explicit calculation of said coordinates.

Many algorithms can be expressed in terms of inner products.
Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.
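
A small sketch of the idea (assuming NumPy; the degree-2 polynomial kernel and toy vectors are chosen purely for
illustration): the kernel value computed in the original 2-D space matches the inner product in the explicit
3-D feature space, without ever constructing that space for the kernel side.

    import numpy as np

    def phi(v):
        # Explicit feature map for the degree-2 polynomial kernel in 2-D:
        # phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
        return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 4.0])

    explicit = phi(x) @ phi(z)  # inner product in the higher-dimensional space
    kernel = (x @ z) ** 2       # same value, computed in the original space
    print(round(explicit, 6), kernel)  # 121.0 121.0
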
--------------------------------------------------------------------------------
/Categories:
--------------------------------------------------------------------------------
Machine Learning Interview

The first category has to do with the algorithms and theory behind machine learning.
You'll have to show an understanding of how algorithms compare with one another and
how to measure their efficacy and accuracy in the right way.

The second category has to do with your programming skills and
your ability to execute on top of those algorithms and the theory.

The third has to do with your general interest in machine learning:
you'll be asked about what's going on in the industry and how you
keep up with the latest machine learning trends.

Finally, there are company- or industry-specific questions that test
your ability to take your general machine learning knowledge and
turn it into actionable points to drive the bottom line forward.
--------------------------------------------------------------------------------
/Programming:
--------------------------------------------------------------------------------
These machine learning interview questions test your knowledge of the programming principles you need to implement
machine learning principles in practice.
Machine learning interview questions tend to be technical questions that test your logic and programming skills:
this section focuses more on the latter.

Q26: How do you handle missing or corrupted data in a dataset?
A26: You could find missing/corrupted data in a dataset and either drop those rows or columns,
or decide to replace them with another value.

In Pandas, there are two very useful methods, isnull() and dropna(), that
will help you find columns of data with missing or corrupted data and drop those values.
If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
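
A minimal Pandas sketch of those methods (the tiny DataFrame is an assumption made up for the example):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"age": [25, np.nan, 31],
                       "income": [40000, 52000, np.nan]})

    print(df.isnull().sum())  # count missing values per column
    print(df.dropna())        # drop rows containing missing values
    print(df.fillna(0))       # or replace them with a placeholder such as 0
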
Q27: Do you have experience with Spark or big data tools for machine learning?
A27: You'll want to get familiar with what big data means for different companies and the different tools they'll want.
Spark is the big data tool most in demand now, able to handle immense datasets with speed.
Be honest if you don't have experience with the tools demanded,
but also take a look at job descriptions and see what tools pop up: you'll want to invest in familiarizing yourself with them.

Q28: Pick an algorithm. Write the pseudo-code for a parallel implementation.
A28: This kind of question demonstrates your ability to think in parallelism and
how you could handle concurrency in programming implementations dealing with big data.
Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to
help you demonstrate your ability to write code that reflects parallelism.

Q29: What are some differences between a linked list and an array?
A29: An array is an ordered collection of objects.
A linked list is a series of objects with pointers that direct how to process them sequentially.
An array assumes that every element has the same size, unlike the linked list.
A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth.
Shuffling a linked list involves just changing which pointers direct where,
but shuffling an array is more complex and takes more memory.

Q30: Describe a hash table.
A30: A hash table is a data structure that produces an associative array.
A key is mapped to certain values through the use of a hash function.
Hash tables are often used for tasks such as database indexing.

Q31: Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?
A31: What's important here is to define your views on how to properly visualize data
and your personal preferences when it comes to tools.
Popular tools include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Machine-Learning-Interview
A List of Machine Learning Interview Questions.
--------------------------------------------------------------------------------