Data-Science-Interview-Questions-and-Answers-General (Updating)
====================================================

I hope this article helps beginners gain a better understanding of Data Science and perform better in their first interviews.

I will keep updating this list; please feel free to contact me if you have any questions.

I am mostly a curator: most of these questions and answers are borrowed from other sources.

## Data Science Questions and Answers (General) for beginners
### Editor : Zhiqiang ZHONG

# Content
#### Q1 How would you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this question is to mention that it is good to check with the business owner
and understand their objectives before categorizing the data. Having done this, it is always good to
follow an iterative approach by pulling new data samples and improving the model accordingly, validating
it for accuracy by soliciting feedback from the stakeholders of the business. This helps ensure that your
model is producing actionable results and improving over time.

#### Q2 Python or R – Which one would you prefer for text analytics?

The best possible answer for this would be Python, because it has the Pandas library, which provides easy-to-use
data structures and high-performance data analysis tools.

#### Q3 Which technique is used to predict categorical responses?

Classification is the technique widely used in data mining for predicting categorical responses.

#### Q4 What is logistic regression? Or state an example when you have used logistic regression recently.

Logistic regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear
combination of predictor variables. For example, suppose you want to predict whether a particular political leader
will win an election or not. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (Win/Lose). The
predictor variables here would be the amount of money spent on election campaigning for a particular candidate,
the amount of time spent campaigning, etc.

#### Q5 What are Recommender Systems?

A subclass of information filtering systems that are meant to predict the preferences or ratings that a user
would give to a product. Recommender systems are widely used for movies, news, research articles, products,
social tags, music, etc.

#### Q6 Why does data cleaning play a vital role in analysis?

Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work
with is a cumbersome process because, as the number of data sources increases, the time taken to clean the data
increases exponentially due to the number of sources and the volume of data generated by these sources. It might
take up to 80% of the time just to clean the data, making it a critical part of the analysis task.

#### Q7 Differentiate between univariate, bivariate and multivariate analysis.

These are descriptive statistical analysis techniques which can be differentiated based on the number of
variables involved at a given point of time. For example, a pie chart of sales by territory involves
only one variable and can be referred to as univariate analysis.

If the analysis attempts to understand the relationship between two variables at a time, as in a scatterplot, then it
is referred to as bivariate analysis. For example, analysing the volume of sales against spending can be considered
an example of bivariate analysis.

Analysis that deals with the study of more than two variables to understand the effect of the variables on the
responses is referred to as multivariate analysis.

#### Q8 What do you understand by the term Normal Distribution?

Data is usually distributed in different ways, with a bias to the left or to the right, or it can all be jumbled
up. However, there are chances that data is distributed around a central value without any bias to the left or
right, reaching a normal distribution in the form of a bell-shaped curve. The random variable is distributed
in the form of a symmetrical bell-shaped curve.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Bell+Shaped+Curve+for+Normal+Distribution.jpg)

#### Q9 What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a
second variable X. X is referred to as the predictor variable and Y as the criterion variable.

#### Q10 What is Interpolation and Extrapolation?

Estimating a value between two known values from a list of values is interpolation. Extrapolation is approximating
a value by extending a known set of values or facts.

#### Q11 What is power analysis?

An experimental design technique for determining the sample size required to detect an effect of a given size
with a given level of confidence.

#### Q12 What is K-means? How can you select K for K-means?

K-means is a clustering algorithm used for unsupervised problems. K-means clustering aims to partition
n observations into k clusters in which each observation belongs to the cluster with the nearest mean,
which serves as a prototype of the cluster.

You can choose the number of clusters visually, but there is a lot of ambiguity in doing so. A more systematic way
is to compute the SSE (sum of squared errors) for several values of K and look for the value after which the
decrease flattens out.

![](https://qph.ec.quoracdn.net/main-qimg-678795190794dd4c071366c06bf32115.webp)

In this case, K = 6 is the value.

[More reading](https://www.quora.com/How-can-we-choose-a-good-K-for-K-means-clustering)

#### Q13 What is Collaborative filtering?

The process of filtering used by most recommender systems to find patterns or information by collaborating
viewpoints, various data sources and multiple agents.

#### Q14 What is the difference between Cluster and Systematic Sampling?

Cluster sampling is a technique used when it becomes difficult to study a target population spread across
a wide area and simple random sampling cannot be applied. A cluster sample is a probability sample in which each
sampling unit is a collection, or cluster, of elements. Systematic sampling is a statistical technique where
elements are selected from an ordered sampling frame. In systematic sampling, the list is progressed through in a
circular manner, so once you reach the end of the list it is progressed from the top again. The best example
of systematic sampling is the equal probability method.

#### Q15 Are expected value and mean value different?

They are not different, but the terms are used in different contexts. Mean is generally referred to when talking
about a probability distribution or sample population, whereas expected value is generally referred to in a
random variable context.

***For Sampling Data***

Mean value is the only value that comes from the sampling data.
Expected value is the mean of all the means, i.e. the value that is built from multiple samples. The expected
value is the population mean.

***For Distributions***

Mean value and expected value are the same irrespective of the distribution, under the condition that the
distribution is over the same population.

#### Q16 What does P-value signify about the statistical data?

P-value is used to determine the significance of results after a hypothesis test in statistics. The P-value
helps the reader draw conclusions and is always between 0 and 1.
- P-value > 0.05 denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected.
- P-value <= 0.05 denotes strong evidence against the null hypothesis, which means the null hypothesis can be rejected.
- P-value = 0.05 is the marginal value, indicating it is possible to go either way.

#### Q17 Do gradient descent methods always converge to the same point?

No, they do not, because in some cases they reach a local minimum or a local optimum point. You don't reach
the global optimum point. It depends on the data and the starting conditions.

~~#### Q18 What are categorical variables?~~

#### Q19 A test has a true positive rate of 100% and false positive rate of 5%. There is a population with a 1/1000 rate of having the condition the test identifies. Considering a positive test, what is the probability of having that condition?

Let's suppose you are being tested for a disease. If you have the illness, the test will end up saying you
have the illness. However, if you don't have the illness, 5% of the time the test will end up saying you
have the illness, and 95% of the time the test will give the accurate result that you don't have the illness.
Thus there is a 5% error in case you do not have the illness.

Out of 1000 people, the 1 person who has the disease will get a true positive result.

Out of the remaining 999 people, 5% will get a false positive result.

Close to 50 people will therefore get a false positive result for the disease.

This means that out of 1000 people, 51 people will be tested positive for the disease even though only one
person has the illness. There is only about a 2% probability of you having the disease even if your report says
that you have the disease.

#### Q20 How can you make data normal using the Box-Cox transformation?

The calculation formula of Box-Cox:

![](http://images.cnblogs.com/cnblogs_com/zgw21cn/WindowsLiveWriter/BoxCox_119E9/clip_image002_thumb.gif)

By changing lambda, the transformation moves between the log, square-root and reciprocal operations. A suitable
lambda is found based on the specific data set.

#### Q21 What is the difference between Supervised Learning and Unsupervised Learning?

If an algorithm learns something from the training data so that the knowledge can be applied to the test data,
then it is referred to as Supervised Learning.
Classification is an example of Supervised Learning. If the
algorithm does not learn anything beforehand because there is no response variable or any training data,
then it is referred to as Unsupervised Learning. Clustering is an example of Unsupervised Learning.

#### Q22 Explain the use of Combinatorics in data science.

Combinatorics is used a lot in data science, from feature engineering to algorithms (ensemble algorithms). New features
can be created by combining original features, and several models can be combined into one, as in bagging, boosting and stacking.

#### Q23 Why is vectorization considered a powerful method for optimizing numerical code?

Vectorization replaces explicit loops with whole-array operations that are executed by optimized low-level code,
so the same numerical computation runs much faster and the code becomes more concise.

#### Q24 What is the goal of A/B Testing?

It is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B
testing is to identify any changes to a web page that maximize or increase an outcome of interest. An
example of this could be identifying the click-through rate for a banner ad.

#### Q25 What is an Eigenvalue and Eigenvector?

Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the
eigenvectors of a correlation or covariance matrix. Eigenvectors are the directions along which a particular
linear transformation acts by flipping, compressing or stretching. The eigenvalue can be referred to as the strength
of the transformation in the direction of the eigenvector, or the factor by which the compression occurs.

#### Q26 What is Gradient Descent?

A method to find a local minimum of a function. Starting from a point, it iteratively searches along the direction
of the negative gradient with a certain step length, until the gradient equals zero.

#### Q27 How can outlier values be treated?

Outlier values can be identified by using univariate or any other graphical analysis method. If the number of
outlier values is small then they can be assessed individually, but for a large number of outliers the values can
be substituted with either the 99th or the 1st percentile values. Not all extreme values are outlier values.
The most common ways to treat outlier values are:

1. To change the value and bring it within a range.

2. To just remove the value.

#### Q28 How can you assess a good logistic model?

There are various methods to assess the results of a logistic regression analysis:

- Using a Classification Matrix to look at the true negatives and false positives.
- Concordance, which helps identify the ability of the logistic model to differentiate between the event happening and not happening.
- Lift, which helps assess the logistic model by comparing it with random selection.

#### Q29 What are the various steps involved in an analytics project?

- Understand the business problem.
- Explore the data and become familiar with it.
- Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc.
- After data preparation, start running the model, analyse the result and tweak the approach. This is an iterative step until the best possible outcome is achieved.
- Validate the model using a new data set.
- Start implementing the model and track the results to analyse the performance of the model over a period of time.

#### Q30 How can you iterate over a list and also retrieve element indices at the same time?

This can be done using the enumerate function, which takes every element in a sequence (such as a list)
and pairs it with its index.

#### Q31 During analysis, how do you treat missing values?

Missing values can arise for many reasons, for example:
- The information was not available at the time
- The information was missed during collection
- Some attributes are not available for some items
- Some information was thought to be unimportant
- It was too expensive to collect all of the data

Types of missing values:
- Missing completely at random (MCAR): the missingness has no relationship with the missing value or with other variables, e.g. a missing family address
- Missing at random (MAR): not completely random; the missingness depends on other variables, e.g. missing financial data being related to company size
- Missing not at random (MNAR): the missingness depends on the value of the variable itself, e.g. high-income families being unwilling to disclose their income

Treatment methods (you first need to understand your missing values clearly):
- Deletion: delete tuples that have any missing values
  - List-wise deletion
  - Pair-wise deletion

![](https://www.analyticsvidhya.com/wp-content/uploads/2015/02/Data_Exploration_2_2.png)

- Imputation
  - Filling in manually
  - Treating missing attribute values as special values (mean, mode, median imputation)
  - Hot-deck imputation
  - KNN
  - Assigning all possible values of the attribute
  - Combinatorial completion
  - Regression
  - Expectation maximization (EM)
  - Multiple imputation

[More Reading (In Chinese)](http://blog.csdn.net/lujiandong1/article/details/52654703)

[Python package](https://pypi.python.org/pypi/fancyimpute)

~~#### Q32 Explain about the box cox transformation in regression models.~~

#### Q33 Can you use machine learning for time series analysis?

Yes, it can be used, but it depends on the application.

#### Q34 Write a function that takes in two sorted lists and outputs a sorted list that is their union.

The first solution that will come to mind is to merge the two lists and sort them afterwards.

**Python code-**

    def return_union(list_a, list_b):
        return sorted(list_a + list_b)

**R code-**

    return_union <- function(list_a, list_b)
    {
      list_c <- list(c(unlist(list_a), unlist(list_b)))
      return(list(list_c[[1]][order(list_c[[1]])]))
    }

Generally, the tricky part of the question is not to use any sorting or ordering function. In that
case you will have to write your own logic to answer the question and impress your interviewer.

***Python code-***

    def return_union(list_a, list_b):
        # k indexes list_a, j indexes list_b
        len1 = len(list_a)
        len2 = len(list_b)
        final_sorted_list = []
        j = 0
        k = 0

        for i in range(len1 + len2):
            if k == len1:
                final_sorted_list.extend(list_b[j:])
                break
            elif j == len2:
                final_sorted_list.extend(list_a[k:])
                break
            elif list_a[k] < list_b[j]:
                final_sorted_list.append(list_a[k])
                k += 1
            else:
                final_sorted_list.append(list_b[j])
                j += 1
        return final_sorted_list

A similar function can be written in R by following similar steps.

    return_union <- function(list_a, list_b)
    {
      # Initializing length variables
      len_a <- length(list_a)
      len_b <- length(list_b)
      len <- len_a + len_b

      # Initializing counter variables
      j <- 1
      k <- 1

      # Creating an empty list whose length equals the sum of both lists
      list_c <- list(rep(NA, len))

      # Here goes our for loop
      for(i in 1:len)
      {
        if(j > len_a)
        {
          list_c[i:len] <- list_b[k:len_b]
          break
        }
        else if(k > len_b)
        {
          list_c[i:len] <- list_a[j:len_a]
          break
        }
        else if(list_a[[j]] <= list_b[[k]])
        {
          list_c[[i]] <- list_a[[j]]
          j <- j + 1
        }
        else if(list_a[[j]] > list_b[[k]])
        {
          list_c[[i]] <- list_b[[k]]
          k <- k + 1
        }
      }
      return(list(unlist(list_c)))
    }

#### Q35 What is the difference between Bayesian Inference and Maximum Likelihood Estimation (MLE)?

#### Q36 What is Regularization and what kind of problems does regularization solve?

A central problem in machine learning is how to make an algorithm that will perform well not just on
the training data, but also on new inputs. Many strategies used in machine learning are explicitly
designed to reduce the test error, possibly at the expense of increased training error. These
strategies are known collectively as regularization.
Briefly, regularization is any modification we make to a learning algorithm that is intended to
reduce its generalization error but not its training error.

#### Q37 What is multicollinearity and how can you overcome it?

In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor
variables in a multiple regression model are highly correlated, meaning that one can be linearly
predicted from the others with a substantial degree of accuracy.

Solutions:
- Remove variables that lead to multicollinearity.
- Obtain more data.
- Use ridge regression, principal component regression or partial least squares regression.

[More reading on Wikipedia](https://en.wikipedia.org/wiki/Multicollinearity)

#### Q38 What is the curse of dimensionality?

It refers to various phenomena that arise when analyzing and organizing data in high-dimensional
spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional
settings.

#### Q39 How do you decide whether your linear regression model fits the data?

There are many ways, such as examining a loss function and goodness-of-fit measures (e.g. R-squared and
residual plots), or using held-out test data to verify the model.

~~#### Q40 What is the difference between squared error and absolute error?~~

#### Q41 What is Machine Learning?

The simplest way to answer this question is: we give the data and an equation to the machine, and ask the
machine to look at the data and identify the coefficient values in the equation.

For example, for the linear regression y = mx + c, we give the machine the data for the variables x and y, and the machine
learns the values of m and c from the data.

#### Q42 How are confidence intervals constructed and how will you interpret them?

A confidence interval is a range, computed from sample data, that is expected to contain the true population
parameter at a given confidence level. It is typically constructed as the point estimate plus or minus a
critical value times the standard error. A 95% interval is interpreted as follows: if we repeated the sampling
many times, about 95% of the intervals constructed this way would contain the true parameter.

#### Q43 How will you explain logistic regression to an economist, physical scientist and biologist?

#### Q44 How can you overcome Overfitting?

- Regularization: add a regularizer or a penalty term.
- Cross-validation: simple cross-validation; S-fold cross-validation; leave-one-out cross-validation.

#### Q45 Differentiate between wide and tall data formats?

- Wide: data formats with lots of columns (one column per variable).
- Tall: data formats with lots of rows (examples).

#### Q46 Is Naïve Bayes bad? If yes, under what aspects.

#### Q47 How would you develop a model to identify plagiarism?

#### Q48 How will you define the number of clusters in a clustering algorithm?

Though the clustering algorithm is not specified, this question will mostly be asked in reference to
K-means clustering, where "K" defines the number of clusters. The objective of clustering is to group
similar entities in a way that the entities within a group are similar to each other but the groups
are different from each other.

For example, the following image shows three different groups.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Data+Science+Interview+Questions+K-Means+Clustering.jpg)

K-Means Clustering Machine Learning Algorithm

The within-cluster sum of squares (WSS) is generally used to explain the homogeneity within a cluster. If you plot WSS
for a range of numbers of clusters, you will get the plot shown below. The graph is generally known as the
Elbow Curve.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Data+Science+Interview+Questions+K-Means.png)

The red-circled point in the graph above, i.e. number of clusters = 6, is the point after which you don't see any
significant decrease in WSS. This point is known as the bending (elbow) point and is taken as K in K-means.

This is the most widely used approach, but a few data scientists also use hierarchical clustering first to
create dendrograms and identify the distinct groups from there.

#### Q49 Is it better to have too many false negatives or too many false positives?

It depends on the situation. For example, if we use the model for cancer detection, a FN (false negative)
is more serious than a FP (false positive): a FP can be ruled out by a further check, but
a FN may let a patient be missed and delay the best treatment period.

#### Q50 Is it possible to perform logistic regression with Microsoft Excel?

Yes. Microsoft Excel has become more and more powerful, and many data science tasks can be
carried out in it in a simple way.

#### Q51 What do you understand by Fuzzy merging? Which language will you use to handle it?

#### Q52 What is the difference between a skewed and a uniform distribution?

#### Q53 You created a predictive model of a quantitative outcome variable using multiple regression. What are the steps you would follow to validate the model?

Since the question asked is about the post-model-building exercise, we will assume that you have
already tested the null hypothesis, multicollinearity and the standard errors of the coefficients.

Once you have built the model, you should check for the following:
- Global F-test to see the significance of the group of independent variables on the dependent variable
- R^2
- Adjusted R^2
- RMSE, MAPE

In addition to the above-mentioned quantitative metrics, you should also check for:
- Residual plot
- Assumptions of linear regression

#### Q54 What do you understand by Hypothesis in the context of Machine Learning?

#### Q55 What do you understand by Recall and Precision?

#### Q56 How will you find the right K for K-means?

There is no universal rule: experiment on the dataset, look at the results for different values of K, and pick
the one that works best (for example using the elbow method described in Q48).

#### Q57 Why does L1 regularization cause parameter sparsity whereas L2 regularization does not?

Regularization in statistics or in the field of machine learning is used to include some extra
information in order to solve a problem in a better way. L1 and L2 regularization are generally used
to add constraints to optimization problems.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/L1+L2+Regularizations.png)

In the example shown above, H0 is a hypothesis. If you observe, in L1 there is a high likelihood of
hitting the corners as solutions, while in L2 there is not. So in L1 variables are penalized more compared
to L2, which results in sparsity.
In other words, errors are squared in L2, so the model sees a higher error and tries to minimize that squared
error.

#### Q58 How can you deal with different types of seasonality in time series modelling?

#### Q59 In experimental design, is it necessary to do randomization? If yes, why?

Normally yes, because randomization balances unknown confounding factors across the treatment groups; but never
do it for a time series dataset, where the temporal order must be preserved.

#### Q60 What do you understand by conjugate-prior with respect to Naïve Bayes?

#### Q61 Can you cite some examples where a false positive is more important than a false negative?

Before we start, let us understand what false positives and false negatives are.
False positives are the cases where you wrongly classify a non-event as an event, a.k.a. a Type I error.
False negatives are the cases where you wrongly classify events as non-events, a.k.a. a Type II error.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/False+Positive+False+Negative.png)

In the medical field, assume you have to give chemotherapy to patients. Your lab tests patients for certain
vital information and, based on those results, it is decided whether to give the therapy to a patient.
Assume a patient comes to that hospital and is tested positive for cancer (but he doesn't have cancer)
based on the lab prediction. What will happen to him? (Assuming sensitivity is 1.)

One more example might come from marketing.
Let's say an e-commerce company decides to give a $1000 gift voucher to the customers whom they expect to purchase
at least $5000 worth of items. They send the free voucher mail directly to 100 customers without any minimum
purchase condition, because they assume they will make at least 20% profit on items sold above $5000.

Now what if they have sent it to false positive cases?

#### Q62 Can you cite some examples where a false negative is more important than a false positive?

Assume there is an airport 'A' which has received high security threats, and based on certain
characteristics they identify whether a particular passenger can be a threat or not. Due to a shortage
of staff they decide to scan only the passengers predicted as positive risks by their predictive model.
What will happen if a true threat passenger is flagged as a non-threat by the airport's model?

Another example can be the judicial system. What if the jury or judge decides to let a criminal go free?

What if you refused to marry a very good person based on your predictive model, and you happen to
meet him/her after a few years and realize that you had a false negative?

#### Q63 Can you cite some examples where both false positives and false negatives are equally important?

In the banking industry giving loans is the primary source of making money, but at the same time if
your repayment rate is not good you will not make any profit; rather, you will risk huge losses.

Banks don't want to lose good customers, and at the same point of time they don't want to acquire
bad customers. In this scenario both false positives and false negatives become very important
to measure.

#### Q64 Can you explain the difference between a Test Set and a Validation Set?

The validation set can be considered as a part of the training set, as it is used for parameter selection
and to avoid overfitting of the model being built. On the other hand, the test set is used for testing
or evaluating the performance of a trained machine learning model.

In simple terms, the differences can be summarized as:

- Training Set is to fit the parameters, i.e. weights.
- Test Set is to assess the performance of the model, i.e. evaluating the predictive power and generalization.
- Validation Set is to tune the (hyper)parameters.

#### Q65 What makes a dataset gold standard?


#### Q66 What do you understand by statistical power of sensitivity and how do you calculate it?

Sensitivity is commonly used to validate the accuracy of a classifier (logistic regression, SVM, RF, etc.).
Sensitivity is nothing but "predicted TRUE events / total events". True events here are the events
which were true and which the model also predicted as true.

Calculation of sensitivity is pretty straightforward:

***Sensitivity = True Positives / Positives in Actual Dependent Variable***

where true positives are positive events which are correctly classified as positives.

#### Q67 What is the importance of having a selection bias?

#### Q68 Give some situations where you will use an SVM over a RandomForest Machine Learning algorithm and vice-versa.

SVM and Random Forest are both used in classification problems.

a) If you are sure that your data is outlier-free and clean, then go for SVM.
The opposite also holds: if your data might contain outliers, then Random Forest would be the better choice.

b) Generally, SVM consumes more computational power than Random Forest, so if you are constrained
by memory, go for the Random Forest machine learning algorithm.

c) Random Forest gives you a very good idea of variable importance in your data, so if you want to
have variable importance then choose the Random Forest machine learning algorithm.

d) Random Forest machine learning algorithms are preferred for multiclass problems.

e) SVM is preferred for high-dimensional problem sets, like text classification.

But as a good data scientist, you should experiment with both of them and test for accuracy, or rather
you can use an ensemble of many machine learning techniques.

#### Q69 What do you understand by feature vectors?

~~#### Q70 How do data management procedures like missing data handling make selection bias worse?~~

#### Q71 What are the advantages and disadvantages of using regularization methods like Ridge Regression?

~~#### Q72 What do you understand by long and wide data formats?~~

#### Q73 What do you understand by outliers and inliers? What would you do if you find them in your dataset?

~~#### Q74 Write a program in Python which takes input as the diameter of a coin and weight of the coin and produces output as the money value of the coin.~~

#### Q75 What are the basic assumptions to be made for linear regression?

Normality of the error distribution, statistical independence of errors, linearity and additivity.

#### Q76 Can you write the formula to calculate R-squared?

R-squared can be calculated using the formula below:

    R-squared = 1 - (Residual Sum of Squares / Total Sum of Squares)

#### Q77 What is the advantage of performing dimensionality reduction before fitting an SVM?

The Support Vector Machine learning algorithm performs better in the reduced space. It is beneficial to
perform dimensionality reduction before fitting an SVM if the number of features is large
compared to the number of observations.

#### Q78 How will you assess the statistical significance of an insight, whether it is a real insight or just by chance?

Statistical significance of an insight can be assessed using hypothesis testing.

## Machine Learning Interview Questions: Algorithms/Theory

#### Q79 What’s the trade-off between bias and variance?

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm
you’re using. This can lead to the model underfitting your data, making it hard for it to have
high predictive accuracy and for you to generalize your knowledge from the training set to the
test set.

Variance is error due to too much complexity in the learning algorithm you’re using. This leads
to the algorithm being highly sensitive to high degrees of variation in your training data, which
can lead your model to overfit the data. You’ll be carrying too much noise from your training data
for your model to be very useful for your test data.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by
adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.
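
For squared-error loss, this decomposition is often written out explicitly. The following is the standard textbook form rather than something from the original answer; here the hat denotes the fitted model, the expectation is over training sets, and sigma squared is the irreducible noise variance:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \mathrm{Bias}\big[\hat{f}(x)\big]^2
      + \mathrm{Var}\big[\hat{f}(x)\big]
      + \sigma^2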

Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain
some variance — in order to get the optimally reduced amount of error, you’ll have to trade off bias
and variance. You don’t want either high bias or high variance in your model.

#### Q80 What is the difference between supervised and unsupervised machine learning?

Supervised learning requires labeled training data. For example, in order to do classification
(a supervised learning task), you’ll need to first label the data you’ll use to train the model
to classify data into your labeled groups. Unsupervised learning, in contrast, does not require
labeling data explicitly.

#### Q81 How is KNN different from k-means clustering?

K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an
unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this
really means is that in order for K-Nearest Neighbors to work, you need labeled data you want to
classify an unlabeled point into (thus the nearest neighbor part). K-means clustering requires only
a set of unlabeled points and a pre-specified number of clusters: the algorithm will take unlabeled points and gradually
learn how to cluster them into groups by computing the mean of the distance between different points.

The critical difference here is that KNN needs labeled points and is thus supervised learning, while
k-means doesn’t — and is thus unsupervised learning.

#### Q82 Explain how a ROC curve works.

The ROC curve is a graphical representation of the contrast between the true positive rate and the
false positive rate at various thresholds. It’s often used as a proxy for the trade-off between
the sensitivity of the model (true positives) and the fall-out, or the probability that it will trigger
a false alarm (false positives).

![](https://lh3.googleusercontent.com/zUWYO4VwGpoyu9oygT12F3hgZ30GxVY7sg_ZF46INrNbDutd9mVz9GnYIYGw2r1ZcbPLQXF4HV-uNXvQcVrP7Sg2BDDqRkaY3RAApumdXgH2mQZ8OCSgqqsVl7UDVjqwVFq224Z_)

#### Q83 Define precision and recall.

Recall is also known as the true positive rate: the number of positives your model claims
compared to the actual number of positives there are throughout the data. Precision is also
known as the positive predictive value, and it is a measure of the number of accurate positives
your model claims compared to the number of positives it actually claims. It can be easier to think
of recall and precision in the context of a case where you’ve predicted that there were 10 apples
and 5 oranges in a case of 10 apples. You’d have perfect recall (there are actually 10 apples, and
you predicted there would be 10) but 66.7% precision because out of the 15 events you predicted,
only 10 (the apples) are correct.

#### Q84 What is Bayes’ Theorem? How is it useful in a machine learning context?

Bayes’ Theorem gives you the posterior probability of an event given what is known as prior knowledge.

Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of
the false positive rate of the population and the true positive rate of a condition sample.
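
Written out as a formula, this is just the standard statement of Bayes' Theorem (with A the condition of interest and B the observed evidence, e.g. a positive test; the symbols are generic, not specific to the example below):

    P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
                = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)}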

Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test
will be false 50% of the time, and the overall population only has a 5% chance of having the flu.
Would you actually have a 60% chance of having the flu after having a positive test?

Bayes’ Theorem says no. It says that you have a (0.6 * 0.05) (true positive rate of a condition
sample) / ((0.6 * 0.05) (true positive rate of a condition sample) + (0.5 * 0.95) (false positive rate of
a population)) = 0.0594, or a 5.94% chance of having the flu.

Bayes’ Theorem is the basis behind a branch of machine learning that most notably includes the
Naive Bayes classifier. That’s something important to consider when you’re faced with machine
learning interview questions.

#### Q85 Why is “Naive” Bayes naive?

Despite its practical applications, especially in text mining, Naive Bayes is considered “naive”
because it makes an assumption that is virtually impossible to see in real-life data: the
conditional probability is calculated as the pure product of the individual probabilities of
components. This implies the absolute independence of features — a condition probably never met
in real life.

As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked
pickles and ice cream would probably naively recommend you a pickle ice cream.

#### Q86 Explain the difference between L1 and L2 regularization.

L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse, with
many variables either being assigned a 1 or 0 in weighting. L1 corresponds to setting a Laplace
prior on the terms, while L2 corresponds to a Gaussian prior.

![](https://lh6.googleusercontent.com/vXUSHKE11Qpolek11IPPP6Fs-iU1-LeWtf5EXVdrfOl97ytug_cME-vLF1t4BNvoAppxfRhx4dNzHoKkdl8dfGVix4jc2hhvrtDG_wyuByxpVfeFZQdMH-INzG6RSi_9jkJLERto)

#### Q87 What’s your favorite algorithm, and can you explain it to me in less than a minute?

This type of question tests your understanding of how to communicate complex and technical nuances
with poise and the ability to summarize quickly and efficiently. Make sure you have a choice and
make sure you can explain different algorithms so simply and effectively that a five-year-old could
grasp the basics!

#### Q88 What’s the difference between Type I and Type II error?

Don’t think that this is a trick question! Many machine learning interview questions will be an
attempt to lob basic questions at you just to make sure you’re on top of your game and you’ve
prepared all of your bases.

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I
error means claiming something has happened when it hasn’t, while Type II error means that you claim
nothing is happening when in fact something is.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while
Type II error means you tell a pregnant woman she isn’t carrying a baby.

#### Q89 What’s a Fourier transform?

A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric
functions. Or, as this more intuitive tutorial puts it, given a smoothie, it’s how we find the recipe.
The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier
transform converts a signal from the time domain to the frequency domain — it’s a very common way to extract features from
audio signals or other time series such as sensor data.

#### Q90 What’s the difference between probability and likelihood?

![](https://lh3.googleusercontent.com/Yz2xAzLEEjtk62o9zatSDZJ7yBwgw-a1GtSNfAjJ3tq3OY5UbnxYUpNOqAuuKAUj8kVZaraIsr87kX83ejzg2y8DW9goGJbZuPc1Be_2VmGEEsNZ5JMioUw6Xke-KvYzp-sVrLCL)

#### Q91 What is deep learning, and how does it contrast with other machine learning algorithms?

Deep learning is a subset of machine learning that is concerned with neural networks: how to use
backpropagation and certain principles from neuroscience to more accurately model large sets of
unlabelled or semi-structured data. In that sense, deep learning represents an unsupervised learning
algorithm that learns representations of data through the use of neural nets.

#### Q92 What’s the difference between a generative and discriminative model?

A generative model will learn categories of data while a discriminative model will simply learn the
distinction between different categories of data. Discriminative models will generally outperform
generative models on classification tasks.

#### Q93 What cross-validation technique would you use on a time series dataset?

Instead of using standard k-fold cross-validation, you have to pay attention to the fact that a
time series is not randomly distributed data — it is inherently ordered chronologically. If a
pattern emerges in later time periods, for example, your model may still pick up on it even if that
effect doesn’t hold in earlier years!

You’ll want to do something like forward chaining, where you model on past data and then
look at forward-facing data:

    fold 1 : training [1], test [2]
    fold 2 : training [1 2], test [3]
    fold 3 : training [1 2 3], test [4]
    fold 4 : training [1 2 3 4], test [5]
    fold 5 : training [1 2 3 4 5], test [6]

#### Q94 How is a decision tree pruned?

Pruning is what happens in decision trees when branches that have weak predictive power are removed
in order to reduce the complexity of the model and increase the predictive accuracy of a decision
tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning
and cost complexity pruning.

Reduced error pruning is perhaps the simplest version: replace each node with its most popular class; if
that doesn’t decrease predictive accuracy, keep the node pruned. While simple, this heuristic actually comes
pretty close to an approach that would optimize for maximum accuracy.

#### Q95 Which is more important to you: model accuracy or model performance?

This question tests your grasp of the nuances of machine learning model performance! Machine learning
interview questions often look towards the details. There are models with higher accuracy that can
perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance, and at
that, a sometimes misleading one.
For example, if you wanted to detect fraud in a massive dataset with
a sample of millions, a more accurate model would most likely predict no fraud at all if only a tiny
minority of cases were fraud. However, this would be useless for a predictive model — a model designed
to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that
you understand model accuracy isn’t the be-all and end-all of model performance.

#### Q96 What’s the F1 score? How would you use it?

The F1 score is a measure of a model’s performance. It is the harmonic mean of the precision and recall
of the model, with results tending to 1 being the best and those tending to 0 being the worst. You would
use it in classification tests where true negatives don’t matter much.

#### Q97 How would you handle an imbalanced dataset?

An imbalanced dataset is when you have, for example, a classification test and 90% of the data is in one
class. That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the
other category of data! Here are a few tactics to get over the hump:

1- Collect more data to even out the imbalances in the dataset.

2- Resample the dataset to correct for imbalances.

3- Try a different algorithm altogether on your dataset.

What’s important here is that you have a keen sense of what damage an unbalanced dataset can cause,
and how to balance that.

#### Q98 When should you use classification over regression?

Classification produces discrete values and maps the dataset into strict categories, while regression gives you
continuous results that allow you to better distinguish differences between individual points. You would
use classification over regression if you wanted your results to reflect the belongingness of data points
in your dataset to certain explicit categories (e.g. if you wanted to know whether a name was male or
female rather than just how correlated it was with male and female names).

#### Q99 Name an example where ensemble techniques might be useful.

Ensemble techniques use a combination of learning algorithms to optimize better predictive performance.
They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by
small changes in the training data).

You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method,
and demonstrate how they could increase predictive power.

#### Q100 How do you ensure you’re not overfitting with a model?

This is a simple restatement of a fundamental problem in machine learning: the possibility of
overfitting the training data and carrying the noise of that data through to the test set, thereby
providing inaccurate generalizations.

There are three main methods to avoid overfitting:

1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters,
thereby removing some of the noise in the training data.

2- Use cross-validation techniques such as k-fold cross-validation.

3- Use regularization techniques such as LASSO that penalize certain model parameters if they’re
likely to cause overfitting.
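
As a rough sketch of how points 2 and 3 can be combined in practice, here is a minimal example assuming scikit-learn is available; the synthetic dataset, the choice of Lasso and the alpha value are illustrative assumptions, not part of the original answer:

    # Compare a plain linear model with a LASSO-regularized one using 5-fold
    # cross-validation; a large gap between training and validation R^2 is a
    # quick sign of overfitting.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Lasso
    from sklearn.model_selection import cross_validate

    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=10.0, random_state=0)

    for name, model in [("linear", LinearRegression()), ("lasso", Lasso(alpha=1.0))]:
        scores = cross_validate(model, X, y, cv=5,
                                return_train_score=True, scoring="r2")
        print(name,
              "train R2: %.3f" % np.mean(scores["train_score"]),
              "validation R2: %.3f" % np.mean(scores["test_score"]))

If the cross-validated score is much lower than the training score, the model is likely overfitting; regularization typically narrows that gap.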

#### Q101 What evaluation approaches would you use to gauge the effectiveness of a machine learning model?

You would first split the dataset into training and test sets, or perhaps use cross-validation
techniques to further segment the dataset into composite sets of training and test sets within
the data. You should then implement a choice selection of performance metrics: you could use
measures such as the F1 score, the accuracy, and the confusion
matrix. What’s important here is to demonstrate that you understand the nuances of how a model is
measured and how to choose the right performance measures for the right situations.

#### Q102 How would you evaluate a logistic regression model?

A subsection of the question above. You have to demonstrate an understanding of what the typical goals
of a logistic regression are (classification, prediction, etc.) and bring up a few examples and use cases.

#### Q103 What’s the “kernel trick” and how is it useful?

The kernel trick involves kernel functions that can operate in higher-dimensional spaces without explicitly
calculating the coordinates of points within that space: instead, kernel functions compute the inner
products between the images of all pairs of data in a feature space. This gives them the very useful
attribute of calculating the coordinates of higher dimensions while being computationally cheaper than
the explicit calculation of said coordinates. Many algorithms can be expressed in terms of inner products.
Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.

## Machine Learning Interview Questions: Programming

These machine learning interview questions test your knowledge of the programming principles you need to
implement machine learning principles in practice. Machine learning interview questions tend to be technical
questions that test your logic and programming skills: this section focuses more on the latter.

~~#### Q104 How do you handle missing or corrupted data in a dataset?~~

#### Q105 Do you have experience with Spark or big data tools for machine learning?

You’ll want to get familiar with the meaning of big data for different companies and the different
tools they’ll want. Spark is the big data tool most in demand now, able to handle immense datasets
with speed. Be honest if you don’t have experience with the tools demanded, but also take a look at
job descriptions and see what tools pop up: you’ll want to invest in familiarizing yourself with them.

#### Q106 Pick an algorithm. Write the pseudo-code for a parallel implementation.

This kind of question demonstrates your ability to think in parallelism and how you could handle
concurrency in programming implementations dealing with big data. Take a look at pseudocode frameworks
such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate your
ability to write code that reflects parallelism.

#### Q107 What are some differences between a linked list and an array?

An array is an ordered collection of objects. A linked list is a series of objects with pointers that
direct how to process them sequentially.
An array assumes that every element has the same size, unlike
the linked list. A linked list can more easily grow organically: an array has to be pre-defined or
re-defined for organic growth. Shuffling a linked list involves changing which pointers direct where —
meanwhile, shuffling an array is more complex and takes more memory.

#### Q108 Describe a hash table.

A hash table is a data structure that produces an associative array. A key is mapped to certain values
through the use of a hash function. They are often used for tasks such as database indexing.

#### Q109 Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

What’s important here is to define your views on how to properly visualize data and your personal
preferences when it comes to tools. Popular tools include R’s ggplot, Python’s seaborn and matplotlib,
and tools such as Plot.ly and Tableau.

![](https://lh3.googleusercontent.com/79d5jkZBgpZPQa61A4e9opgfX2-mrxWxfQyswec3YxBouNEvAu8wYxjCXNQl-nRdBVQeuco1h-LZbxVblgS9h6bYLi6peoqSd2N7VW7BSeBgpmclKng6IRYEf9QkTMRJKMyPxrCT)

## Machine Learning Interview Questions: Company/Industry Specific

These machine learning interview questions deal with how to apply your general machine learning knowledge
to a specific company’s requirements. You’ll be asked to create case studies and extend your knowledge of the
company and industry you’re applying to with your machine learning skills.

#### Q110 How would you implement a recommendation system for our company’s users?

A lot of machine learning interview questions of this type will involve the implementation of machine learning
models to a company’s problems. You’ll have to research the company and its industry in depth, especially
the revenue drivers the company has, and the types of users the company takes on in the context of the
industry it’s in.

#### Q111 How can we use your machine learning skills to generate revenue?

This is a tricky question. The ideal answer would demonstrate knowledge of what drives the business and
how your skills could relate. For example, if you were interviewing for the music-streaming startup Spotify,
you could remark that your skills at developing a better recommendation model would increase user retention,
which would then increase revenue in the long run.

The startup metrics Slideshare linked above will help you understand exactly what performance indicators
are important for startups and tech companies as they think about revenue and growth.

#### Q112 What do you think of our current data process?

This kind of question requires you to listen carefully and impart feedback in a manner that is constructive
and insightful. Your interviewer is trying to gauge if you’d be a valuable member of their team and whether
you grasp the nuances of why certain things are set the way they are in the company’s data process based on
company- or industry-specific conditions. They’re trying to see if you can be an intellectual peer. Act
accordingly.

## Machine Learning Interview Questions: General Machine Learning Interest

This series of machine learning interview questions attempts to gauge your passion and interest in machine learning.
The right answers will serve as a testament to your commitment to being a lifelong learner in machine learning.

#### Q113 What are the last machine learning papers you’ve read?

Keeping up with the latest scientific literature on machine learning is a must if you want to demonstrate
interest in a machine learning position. This overview of deep learning in Nature by the scions of deep
learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what’s
happening in deep learning — and the kind of paper you might want to cite.

#### Q114 Do you have research experience in machine learning?

Related to the last point, most organizations hiring for machine learning positions will look for your
formal experience in the field. Research papers, co-authored or supervised by leaders in the field, can make
the difference between you being hired and not. Make sure you have a summary of your research experience
and papers ready — and an explanation for your background and lack of formal research experience if you don’t.

#### Q115 What are your favorite use cases of machine learning models?

The Quora thread above contains some examples, such as decision trees that categorize people into different
tiers of intelligence based on IQ scores. Make sure that you have a few examples in mind and describe what
resonated with you. It’s important that you demonstrate an interest in how machine learning is implemented.

#### Q116 How would you approach the “Netflix Prize” competition?

The Netflix Prize was a famed competition where Netflix offered $1,000,000 for a better collaborative
filtering algorithm. The team that won, called BellKor, achieved a 10% improvement and used an ensemble of different
methods to win. Some familiarity with the case and its solution will help demonstrate you’ve paid attention
to machine learning for a while.

#### Q117 Where do you usually source datasets?

Machine learning interview questions like these try to get at the heart of your machine learning interest.
Somebody who is truly passionate about machine learning will have gone off and done side projects on their own,
and have a good idea of what great datasets are out there. If you’re missing any, check out Quandl for economic
and financial data, and Kaggle’s Datasets collection for another great list.

#### Q118 How do you think Google is training data for self-driving cars?

Machine learning interview questions like this one really test your knowledge of different machine learning
methods, and your inventiveness if you don’t know the answer. Google is currently using reCAPTCHA to source
labelled data on storefronts and traffic signs. They are also building on training data collected by Sebastian
Thrun at Google X — some of which was obtained by his grad students driving buggies on desert dunes!

#### Q119 How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?

AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event
in the history of machine learning and deep learning.
The Nature paper above describes how this was accomplished
with “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning,
from human expert games, and by reinforcement learning from games of self-play.”


[Reference from DeZyre](https://www.dezyre.com/article/100-data-science-interview-questions-and-answers-general-for-2017/184)

[Reference from Springboard](https://www.springboard.com/blog/machine-learning-interview-questions/?from=message&isappinstalled=0)

Reference: Deep Learning (Ian Goodfellow, Yoshua Bengio and Aaron Courville) -- MIT