Data-Science-Interview-Questions-and-Answers-General (Updating)
====================================================

I hope this article helps beginners gain a better understanding of Data Science and perform better in their first interviews.

I will keep updating this list; please feel free to contact me if you have any questions.

I am mostly a curator: most of these questions and answers are borrowed from other sources.

## Data Science Questions and Answers (General) for beginners
### Editor : Zhiqiang ZHONG

# Content
#### Q1 How would you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this question is to mention that it is good to check with the business owner
and understand their objectives before categorizing the data. Having done this, it is always good to
follow an iterative approach by pulling new data samples and improving the model accordingly, validating
it for accuracy by soliciting feedback from the stakeholders of the business. This helps ensure that your
model is producing actionable results and improving over time.

#### Q2 Python or R – Which one would you prefer for text analytics?

The best possible answer for this would be Python, because it has the Pandas library, which provides easy-to-use
data structures and high-performance data analysis tools.

#### Q3 Which technique is used to predict categorical responses?

Classification is the technique widely used in data mining for predicting categorical responses.

#### Q4 What is logistic regression? Or state an example when you have used logistic regression recently.

Logistic regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear
combination of predictor variables. For example, suppose you want to predict whether a particular political leader
will win an election or not. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (Win/Lose). The
predictor variables here would be the amount of money spent on election campaigning for a particular candidate,
the amount of time spent campaigning, etc.

#### Q5 What are Recommender Systems?

A subclass of information filtering systems that are meant to predict the preferences or ratings that a user
would give to a product. Recommender systems are widely used for movies, news, research articles, products,
social tags, music, etc.

#### Q6 Why does data cleaning play a vital role in analysis?

Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work
with is a cumbersome process because, as the number of data sources increases, the time taken to clean the data
increases exponentially due to the number of sources and the volume of data generated by these sources. It might
take up to 80% of the time just to clean the data, making it a critical part of the analysis task.

#### Q7 Differentiate between univariate, bivariate and multivariate analysis.

These are descriptive statistical analysis techniques which can be differentiated based on the number of
variables involved at a given point of time. For example, a pie chart of sales by territory involves
only one variable and can be referred to as univariate analysis.

If the analysis attempts to understand the relationship between two variables at a time, as in a scatterplot, then it
is referred to as bivariate analysis. For example, analysing the volume of sales against spending can be considered
an example of bivariate analysis.

Analysis that deals with the study of more than two variables to understand the effect of the variables on the
responses is referred to as multivariate analysis.

#### Q8 What do you understand by the term Normal Distribution?

Data is usually distributed in different ways, with a bias to the left or to the right, or it can all be jumbled
up. However, there are chances that data is distributed around a central value without any bias to the left or
right, reaching a normal distribution in the form of a bell-shaped curve. The random variable is distributed
in the form of a symmetrical bell-shaped curve.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Bell+Shaped+Curve+for+Normal+Distribution.jpg)

#### Q9 What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a
second variable X. X is referred to as the predictor variable and Y as the criterion variable.

#### Q10 What is Interpolation and Extrapolation?

Estimating a value between two known values from a list of values is interpolation. Extrapolation is approximating
a value by extending a known set of values or facts.

#### Q11 What is power analysis?

An experimental design technique for determining the sample size required to detect an effect of a given size
with a given level of confidence.

#### Q12 What is K-means? How can you select K for K-means?

K-means is a clustering algorithm used for unsupervised problems. K-means clustering aims to partition
n observations into k clusters in which each observation belongs to the cluster with the nearest mean,
which serves as a prototype of the cluster.

You can choose the number of clusters visually, but there is a lot of ambiguity in doing so. A more systematic way
is to compute the SSE (sum of squared errors) for several values of K and look for the value after which the
decrease flattens out.

![](https://qph.ec.quoracdn.net/main-qimg-678795190794dd4c071366c06bf32115.webp)

In this case, K = 6 is the value.

[More reading](https://www.quora.com/How-can-we-choose-a-good-K-for-K-means-clustering)

#### Q13 What is Collaborative filtering?

The process of filtering used by most recommender systems to find patterns or information by collaborating
viewpoints, various data sources and multiple agents.

#### Q14 What is the difference between Cluster and Systematic Sampling?

Cluster sampling is a technique used when it becomes difficult to study a target population spread across
a wide area and simple random sampling cannot be applied. A cluster sample is a probability sample in which each
sampling unit is a collection, or cluster, of elements. Systematic sampling is a statistical technique where
elements are selected from an ordered sampling frame. In systematic sampling, the list is progressed through in a
circular manner, so once you reach the end of the list it is progressed from the top again. The best example
of systematic sampling is the equal probability method.

#### Q15 Are expected value and mean value different?

They are not different, but the terms are used in different contexts. Mean is generally referred to when talking
about a probability distribution or sample population, whereas expected value is generally referred to in a
random variable context.

***For Sampling Data***

Mean value is the only value that comes from the sampling data.
Expected value is the mean of all the means, i.e. the value that is built from multiple samples. The expected
value is the population mean.

***For Distributions***

Mean value and expected value are the same irrespective of the distribution, under the condition that the
distribution is over the same population.

#### Q16 What does P-value signify about the statistical data?

P-value is used to determine the significance of results after a hypothesis test in statistics. The P-value
helps the reader draw conclusions and is always between 0 and 1.
- P-value > 0.05 denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected.
- P-value <= 0.05 denotes strong evidence against the null hypothesis, which means the null hypothesis can be rejected.
- P-value = 0.05 is the marginal value, indicating it is possible to go either way.

#### Q17 Do gradient descent methods always converge to the same point?

No, they do not, because in some cases they reach a local minimum or a local optimum point. You don't reach
the global optimum point. It depends on the data and the starting conditions.

~~#### Q18 What are categorical variables?~~

#### Q19 A test has a true positive rate of 100% and false positive rate of 5%. There is a population with a 1/1000 rate of having the condition the test identifies. Considering a positive test, what is the probability of having that condition?

Let's suppose you are being tested for a disease. If you have the illness, the test will end up saying you
have the illness. However, if you don't have the illness, 5% of the time the test will end up saying you
have the illness, and 95% of the time the test will give the accurate result that you don't have the illness.
Thus there is a 5% error in case you do not have the illness.

Out of 1000 people, the 1 person who has the disease will get a true positive result.

Out of the remaining 999 people, 5% will get a false positive result.

Close to 50 people will therefore get a false positive result for the disease.

This means that out of 1000 people, 51 people will be tested positive for the disease even though only one
person has the illness. There is only about a 2% probability of you having the disease even if your report says
that you have the disease.

#### Q20 How can you make data normal using the Box-Cox transformation?

The calculation formula of Box-Cox:

![](http://images.cnblogs.com/cnblogs_com/zgw21cn/WindowsLiveWriter/BoxCox_119E9/clip_image002_thumb.gif)

By changing lambda, the transformation moves between the log, square-root and reciprocal operations. A suitable
lambda is found based on the specific data set.

#### Q21 What is the difference between Supervised Learning and Unsupervised Learning?

If an algorithm learns something from the training data so that the knowledge can be applied to the test data,
then it is referred to as Supervised Learning.
Classification is an example of Supervised Learning. If the
algorithm does not learn anything beforehand because there is no response variable or any training data,
then it is referred to as Unsupervised Learning. Clustering is an example of Unsupervised Learning.

#### Q22 Explain the use of Combinatorics in data science.

Combinatorics is used a lot in data science, from feature engineering to algorithms (ensemble algorithms). New features
can be created by combining original features, and several models can be combined into one, as in bagging, boosting and stacking.

#### Q23 Why is vectorization considered a powerful method for optimizing numerical code?

Vectorization replaces explicit loops with whole-array operations that are executed by optimized low-level code,
so the same numerical computation runs much faster and the code becomes more concise.

#### Q24 What is the goal of A/B Testing?

It is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B
testing is to identify any changes to a web page that maximize or increase an outcome of interest. An
example of this could be identifying the click-through rate for a banner ad.

#### Q25 What is an Eigenvalue and Eigenvector?

Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the
eigenvectors of a correlation or covariance matrix. Eigenvectors are the directions along which a particular
linear transformation acts by flipping, compressing or stretching. The eigenvalue can be referred to as the strength
of the transformation in the direction of the eigenvector, or the factor by which the compression occurs.

#### Q26 What is Gradient Descent?

A method to find a local minimum of a function. Starting from a point, it iteratively searches along the direction
of the negative gradient with a certain step length, until the gradient equals zero.

#### Q27 How can outlier values be treated?

Outlier values can be identified by using univariate or any other graphical analysis method. If the number of
outlier values is small then they can be assessed individually, but for a large number of outliers the values can
be substituted with either the 99th or the 1st percentile values. Not all extreme values are outlier values.
The most common ways to treat outlier values are:

1. To change the value and bring it within a range.

2. To just remove the value.

#### Q28 How can you assess a good logistic model?

There are various methods to assess the results of a logistic regression analysis:

- Using a Classification Matrix to look at the true negatives and false positives.
- Concordance, which helps identify the ability of the logistic model to differentiate between the event happening and not happening.
- Lift, which helps assess the logistic model by comparing it with random selection.

#### Q29 What are the various steps involved in an analytics project?

- Understand the business problem.
- Explore the data and become familiar with it.
- Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc.
- After data preparation, start running the model, analyse the result and tweak the approach. This is an iterative step until the best possible outcome is achieved.
- Validate the model using a new data set.
- Start implementing the model and track the results to analyse the performance of the model over a period of time.

#### Q30 How can you iterate over a list and also retrieve element indices at the same time?

This can be done using the enumerate function, which takes every element in a sequence (such as a list)
and pairs it with its index.

#### Q31 During analysis, how do you treat missing values?

Missing values can arise for many reasons, for example:
- The information was not available at the time
- The information was missed during collection
- Some attributes are not available for some items
- Some information was thought to be unimportant
- It was too expensive to collect all of the data

Types of missing values:
- Missing completely at random (MCAR): the missingness has no relationship with the missing value or with other variables, e.g. a missing family address
- Missing at random (MAR): not completely random; the missingness depends on other variables, e.g. missing financial data being related to company size
- Missing not at random (MNAR): the missingness depends on the value of the variable itself, e.g. high-income families being unwilling to disclose their income

Treatment methods (you first need to understand your missing values clearly):
- Deletion: delete tuples that have any missing values
  - List-wise deletion
  - Pair-wise deletion

![](https://www.analyticsvidhya.com/wp-content/uploads/2015/02/Data_Exploration_2_2.png)

- Imputation
  - Filling in manually
  - Treating missing attribute values as special values (mean, mode, median imputation)
  - Hot-deck imputation
  - KNN
  - Assigning all possible values of the attribute
  - Combinatorial completion
  - Regression
  - Expectation maximization (EM)
  - Multiple imputation

[More Reading (In Chinese)](http://blog.csdn.net/lujiandong1/article/details/52654703)

[Python package](https://pypi.python.org/pypi/fancyimpute)

~~#### Q32 Explain about the box cox transformation in regression models.~~

#### Q33 Can you use machine learning for time series analysis?

Yes, it can be used, but it depends on the application.

#### Q34 Write a function that takes in two sorted lists and outputs a sorted list that is their union.

The first solution that will come to mind is to merge the two lists and sort them afterwards.

**Python code-**

    def return_union(list_a, list_b):
        return sorted(list_a + list_b)

**R code-**

    return_union <- function(list_a, list_b)
    {
      list_c <- list(c(unlist(list_a), unlist(list_b)))
      return(list(list_c[[1]][order(list_c[[1]])]))
    }

Generally, the tricky part of the question is not to use any sorting or ordering function. In that
case you will have to write your own logic to answer the question and impress your interviewer.

***Python code-***

    def return_union(list_a, list_b):
        # k indexes list_a, j indexes list_b
        len1 = len(list_a)
        len2 = len(list_b)
        final_sorted_list = []
        j = 0
        k = 0

        for i in range(len1 + len2):
            if k == len1:
                final_sorted_list.extend(list_b[j:])
                break
            elif j == len2:
                final_sorted_list.extend(list_a[k:])
                break
            elif list_a[k] < list_b[j]:
                final_sorted_list.append(list_a[k])
                k += 1
            else:
                final_sorted_list.append(list_b[j])
                j += 1
        return final_sorted_list

A similar function can be written in R by following similar steps.

    return_union <- function(list_a, list_b)
    {
      # Initializing length variables
      len_a <- length(list_a)
      len_b <- length(list_b)
      len <- len_a + len_b

      # Initializing counter variables
      j <- 1
      k <- 1

      # Creating an empty list whose length equals the sum of both lists
      list_c <- list(rep(NA, len))

      # Here goes our for loop
      for(i in 1:len)
      {
        if(j > len_a)
        {
          list_c[i:len] <- list_b[k:len_b]
          break
        }
        else if(k > len_b)
        {
          list_c[i:len] <- list_a[j:len_a]
          break
        }
        else if(list_a[[j]] <= list_b[[k]])
        {
          list_c[[i]] <- list_a[[j]]
          j <- j + 1
        }
        else if(list_a[[j]] > list_b[[k]])
        {
          list_c[[i]] <- list_b[[k]]
          k <- k + 1
        }
      }
      return(list(unlist(list_c)))
    }

#### Q35 What is the difference between Bayesian Inference and Maximum Likelihood Estimation (MLE)?

#### Q36 What is Regularization and what kind of problems does regularization solve?

A central problem in machine learning is how to make an algorithm that will perform well not just on
the training data, but also on new inputs. Many strategies used in machine learning are explicitly
designed to reduce the test error, possibly at the expense of increased training error. These
strategies are known collectively as regularization.
Briefly, regularization is any modification we make to a learning algorithm that is intended to
reduce its generalization error but not its training error.

#### Q37 What is multicollinearity and how can you overcome it?

In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor
variables in a multiple regression model are highly correlated, meaning that one can be linearly
predicted from the others with a substantial degree of accuracy.

Solutions:
- Remove variables that lead to multicollinearity.
- Obtain more data.
- Use ridge regression, principal component regression or partial least squares regression.

[More reading on Wikipedia](https://en.wikipedia.org/wiki/Multicollinearity)

#### Q38 What is the curse of dimensionality?

It refers to various phenomena that arise when analyzing and organizing data in high-dimensional
spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional
settings.

#### Q39 How do you decide whether your linear regression model fits the data?

There are many ways, such as examining a loss function and goodness-of-fit measures (e.g. R-squared and
residual plots), or using held-out test data to verify the model.

~~#### Q40 What is the difference between squared error and absolute error?~~

#### Q41 What is Machine Learning?

The simplest way to answer this question is: we give the data and an equation to the machine, and ask the
machine to look at the data and identify the coefficient values in the equation.

For example, for the linear regression y = mx + c, we give the machine the data for the variables x and y, and the machine
learns the values of m and c from the data.

#### Q42 How are confidence intervals constructed and how will you interpret them?

A confidence interval is a range, computed from sample data, that is expected to contain the true population
parameter at a given confidence level. It is typically constructed as the point estimate plus or minus a
critical value times the standard error. A 95% interval is interpreted as follows: if we repeated the sampling
many times, about 95% of the intervals constructed this way would contain the true parameter.

#### Q43 How will you explain logistic regression to an economist, physical scientist and biologist?

#### Q44 How can you overcome Overfitting?

- Regularization: add a regularizer or a penalty term.
- Cross-validation: simple cross-validation; S-fold cross-validation; leave-one-out cross-validation.

#### Q45 Differentiate between wide and tall data formats?

- Wide: data formats with lots of columns (one column per variable).
- Tall: data formats with lots of rows (examples).

#### Q46 Is Naïve Bayes bad? If yes, under what aspects.

#### Q47 How would you develop a model to identify plagiarism?

#### Q48 How will you define the number of clusters in a clustering algorithm?

Though the clustering algorithm is not specified, this question will mostly be asked in reference to
K-means clustering, where "K" defines the number of clusters. The objective of clustering is to group
similar entities in a way that the entities within a group are similar to each other but the groups
are different from each other.

For example, the following image shows three different groups.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Data+Science+Interview+Questions+K-Means+Clustering.jpg)

K-Means Clustering Machine Learning Algorithm

The within-cluster sum of squares (WSS) is generally used to explain the homogeneity within a cluster. If you plot WSS
for a range of numbers of clusters, you will get the plot shown below. The graph is generally known as the
Elbow Curve.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/Data+Science+Interview+Questions+K-Means.png)

The red-circled point in the graph above, i.e. number of clusters = 6, is the point after which you don't see any
significant decrease in WSS. This point is known as the bending (elbow) point and is taken as K in K-means.

This is the most widely used approach, but a few data scientists also use hierarchical clustering first to
create dendrograms and identify the distinct groups from there.

#### Q49 Is it better to have too many false negatives or too many false positives?

It depends on the situation. For example, if we use the model for cancer detection, a FN (false negative)
is more serious than a FP (false positive): a FP can be ruled out by a further check, but
a FN may let a patient be missed and delay the best treatment period.

#### Q50 Is it possible to perform logistic regression with Microsoft Excel?

Yes. Microsoft Excel has become more and more powerful, and many data science tasks can be
carried out in it in a simple way.

#### Q51 What do you understand by Fuzzy merging? Which language will you use to handle it?

#### Q52 What is the difference between a skewed and a uniform distribution?

#### Q53 You created a predictive model of a quantitative outcome variable using multiple regression. What are the steps you would follow to validate the model?

Since the question asked is about the post-model-building exercise, we will assume that you have
already tested the null hypothesis, multicollinearity and the standard errors of the coefficients.

Once you have built the model, you should check for the following:
- Global F-test to see the significance of the group of independent variables on the dependent variable
- R^2
- Adjusted R^2
- RMSE, MAPE

In addition to the above-mentioned quantitative metrics, you should also check for:
- Residual plot
- Assumptions of linear regression

#### Q54 What do you understand by Hypothesis in the context of Machine Learning?

#### Q55 What do you understand by Recall and Precision?

#### Q56 How will you find the right K for K-means?

There is no universal rule: experiment on the dataset, look at the results for different values of K, and pick
the one that works best (for example using the elbow method described in Q48).

#### Q57 Why does L1 regularization cause parameter sparsity whereas L2 regularization does not?

Regularization in statistics or in the field of machine learning is used to include some extra
information in order to solve a problem in a better way. L1 and L2 regularization are generally used
to add constraints to optimization problems.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/L1+L2+Regularizations.png)

In the example shown above, H0 is a hypothesis. If you observe, in L1 there is a high likelihood of
hitting the corners as solutions, while in L2 there is not. So in L1 variables are penalized more compared
to L2, which results in sparsity.
In other words, errors are squared in L2, so the model sees a higher error and tries to minimize that squared
error.

#### Q58 How can you deal with different types of seasonality in time series modelling?

#### Q59 In experimental design, is it necessary to do randomization? If yes, why?

Normally yes, because randomization balances unknown confounding factors across the treatment groups; but never
do it for a time series dataset, where the temporal order must be preserved.

#### Q60 What do you understand by conjugate-prior with respect to Naïve Bayes?

#### Q61 Can you cite some examples where a false positive is more important than a false negative?

Before we start, let us understand what false positives and false negatives are.
False positives are the cases where you wrongly classify a non-event as an event, a.k.a. a Type I error.
False negatives are the cases where you wrongly classify events as non-events, a.k.a. a Type II error.

![](https://s3.amazonaws.com/files.dezyre.com/images/blog/100+Data+Science+Interview+Questions+and+Answers+(General)/False+Positive+False+Negative.png)

In the medical field, assume you have to give chemotherapy to patients. Your lab tests patients for certain
vital information and, based on those results, it is decided whether to give the therapy to a patient.
Assume a patient comes to that hospital and is tested positive for cancer (but he doesn't have cancer)
based on the lab prediction. What will happen to him? (Assuming sensitivity is 1.)

One more example might come from marketing.
Let's say an e-commerce company decides to give a $1000 gift voucher to the customers whom they expect to purchase
at least $5000 worth of items. They send the free voucher mail directly to 100 customers without any minimum
purchase condition, because they assume they will make at least 20% profit on items sold above $5000.

Now what if they have sent it to false positive cases?

#### Q62 Can you cite some examples where a false negative is more important than a false positive?

Assume there is an airport 'A' which has received high security threats, and based on certain
characteristics they identify whether a particular passenger can be a threat or not. Due to a shortage
of staff they decide to scan only the passengers predicted as positive risks by their predictive model.
What will happen if a true threat passenger is flagged as a non-threat by the airport's model?

Another example can be the judicial system. What if the jury or judge decides to let a criminal go free?

What if you refused to marry a very good person based on your predictive model, and you happen to
meet him/her after a few years and realize that you had a false negative?

#### Q63 Can you cite some examples where both false positives and false negatives are equally important?

In the banking industry giving loans is the primary source of making money, but at the same time if
your repayment rate is not good you will not make any profit; rather, you will risk huge losses.

Banks don't want to lose good customers, and at the same point of time they don't want to acquire
bad customers. In this scenario both false positives and false negatives become very important
to measure.

#### Q64 Can you explain the difference between a Test Set and a Validation Set?

The validation set can be considered as a part of the training set, as it is used for parameter selection
and to avoid overfitting of the model being built. On the other hand, the test set is used for testing
or evaluating the performance of a trained machine learning model.

In simple terms, the differences can be summarized as:

- Training Set is to fit the parameters, i.e. weights.
- Test Set is to assess the performance of the model, i.e. evaluating the predictive power and generalization.
- Validation Set is to tune the (hyper)parameters.

#### Q65 What makes a dataset gold standard?


#### Q66 What do you understand by statistical power of sensitivity and how do you calculate it?

Sensitivity is commonly used to validate the accuracy of a classifier (logistic regression, SVM, RF, etc.).
Sensitivity is nothing but "predicted TRUE events / total events". True events here are the events
which were true and which the model also predicted as true.

Calculation of sensitivity is pretty straightforward:

***Sensitivity = True Positives / Positives in Actual Dependent Variable***

where true positives are positive events which are correctly classified as positives.

#### Q67 What is the importance of having a selection bias?

#### Q68 Give some situations where you will use an SVM over a RandomForest Machine Learning algorithm and vice-versa.

SVM and Random Forest are both used in classification problems.

a) If you are sure that your data is outlier-free and clean, then go for SVM.
The opposite also holds: if your data might contain outliers, then Random Forest would be the better choice.

b) Generally, SVM consumes more computational power than Random Forest, so if you are constrained
by memory, go for the Random Forest machine learning algorithm.

c) Random Forest gives you a very good idea of variable importance in your data, so if you want to
have variable importance then choose the Random Forest machine learning algorithm.

d) Random Forest machine learning algorithms are preferred for multiclass problems.

e) SVM is preferred for high-dimensional problem sets, like text classification.

But as a good data scientist, you should experiment with both of them and test for accuracy, or rather
you can use an ensemble of many machine learning techniques.

#### Q69 What do you understand by feature vectors?

~~#### Q70 How do data management procedures like missing data handling make selection bias worse?~~

#### Q71 What are the advantages and disadvantages of using regularization methods like Ridge Regression?

~~#### Q72 What do you understand by long and wide data formats?~~

#### Q73 What do you understand by outliers and inliers? What would you do if you find them in your dataset?

~~#### Q74 Write a program in Python which takes input as the diameter of a coin and weight of the coin and produces output as the money value of the coin.~~

#### Q75 What are the basic assumptions to be made for linear regression?

Normality of the error distribution, statistical independence of errors, linearity and additivity.

#### Q76 Can you write the formula to calculate R-squared?

R-squared can be calculated using the formula below:

    R-squared = 1 - (Residual Sum of Squares / Total Sum of Squares)

#### Q77 What is the advantage of performing dimensionality reduction before fitting an SVM?

The Support Vector Machine learning algorithm performs better in the reduced space. It is beneficial to
perform dimensionality reduction before fitting an SVM if the number of features is large
compared to the number of observations.

#### Q78 How will you assess the statistical significance of an insight, whether it is a real insight or just by chance?

Statistical significance of an insight can be assessed using hypothesis testing.

## Machine Learning Interview Questions: Algorithms/Theory

#### Q79 What’s the trade-off between bias and variance?

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm
you’re using. This can lead to the model underfitting your data, making it hard for it to have
high predictive accuracy and for you to generalize your knowledge from the training set to the
test set.

Variance is error due to too much complexity in the learning algorithm you’re using. This leads
to the algorithm being highly sensitive to high degrees of variation in your training data, which
can lead your model to overfit the data. You’ll be carrying too much noise from your training data
for your model to be very useful for your test data.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by
adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.
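
For squared-error loss, this decomposition is often written out explicitly. The following is the standard textbook form rather than something from the original answer; here the hat denotes the fitted model, the expectation is over training sets, and sigma squared is the irreducible noise variance:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \mathrm{Bias}\big[\hat{f}(x)\big]^2
      + \mathrm{Var}\big[\hat{f}(x)\big]
      + \sigma^2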

Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain
some variance — in order to get the optimally reduced amount of error, you’ll have to trade off bias
and variance. You don’t want either high bias or high variance in your model.

#### Q80 What is the difference between supervised and unsupervised machine learning?

Supervised learning requires labeled training data. For example, in order to do classification
(a supervised learning task), you’ll need to first label the data you’ll use to train the model
to classify data into your labeled groups. Unsupervised learning, in contrast, does not require
labeling data explicitly.

#### Q81 How is KNN different from k-means clustering?

K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an
unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this
really means is that in order for K-Nearest Neighbors to work, you need labeled data you want to
classify an unlabeled point into (thus the nearest neighbor part). K-means clustering requires only
a set of unlabeled points and a pre-specified number of clusters: the algorithm will take unlabeled points and gradually
learn how to cluster them into groups by computing the mean of the distance between different points.

The critical difference here is that KNN needs labeled points and is thus supervised learning, while
k-means doesn’t — and is thus unsupervised learning.

#### Q82 Explain how a ROC curve works.

The ROC curve is a graphical representation of the contrast between the true positive rate and the
false positive rate at various thresholds. It’s often used as a proxy for the trade-off between
the sensitivity of the model (true positives) and the fall-out, or the probability that it will trigger
a false alarm (false positives).

![](https://lh3.googleusercontent.com/zUWYO4VwGpoyu9oygT12F3hgZ30GxVY7sg_ZF46INrNbDutd9mVz9GnYIYGw2r1ZcbPLQXF4HV-uNXvQcVrP7Sg2BDDqRkaY3RAApumdXgH2mQZ8OCSgqqsVl7UDVjqwVFq224Z_)

#### Q83 Define precision and recall.

Recall is also known as the true positive rate: the number of positives your model claims
compared to the actual number of positives there are throughout the data. Precision is also
known as the positive predictive value, and it is a measure of the number of accurate positives
your model claims compared to the number of positives it actually claims. It can be easier to think
of recall and precision in the context of a case where you’ve predicted that there were 10 apples
and 5 oranges in a case of 10 apples. You’d have perfect recall (there are actually 10 apples, and
you predicted there would be 10) but 66.7% precision because out of the 15 events you predicted,
only 10 (the apples) are correct.

#### Q84 What is Bayes’ Theorem? How is it useful in a machine learning context?

Bayes’ Theorem gives you the posterior probability of an event given what is known as prior knowledge.

Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of
the false positive rate of the population and the true positive rate of a condition sample.
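
Written out as a formula, this is just the standard statement of Bayes' Theorem (with A the condition of interest and B the observed evidence, e.g. a positive test; the symbols are generic, not specific to the example below):

    P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
                = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \lnot A)\,P(\lnot A)}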

Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test
will be false 50% of the time, and the overall population only has a 5% chance of having the flu.
Would you actually have a 60% chance of having the flu after having a positive test?

Bayes’ Theorem says no. It says that you have a (0.6 * 0.05) (true positive rate of a condition
sample) / ((0.6 * 0.05) (true positive rate of a condition sample) + (0.5 * 0.95) (false positive rate of
a population)) = 0.0594, or a 5.94% chance of having the flu.

Bayes’ Theorem is the basis behind a branch of machine learning that most notably includes the
Naive Bayes classifier. That’s something important to consider when you’re faced with machine
learning interview questions.

#### Q85 Why is “Naive” Bayes naive?

Despite its practical applications, especially in text mining, Naive Bayes is considered “naive”
because it makes an assumption that is virtually impossible to see in real-life data: the
conditional probability is calculated as the pure product of the individual probabilities of
components. This implies the absolute independence of features — a condition probably never met
in real life.

As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked
pickles and ice cream would probably naively recommend you a pickle ice cream.

#### Q86 Explain the difference between L1 and L2 regularization.

L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse, with
many variables either being assigned a 1 or 0 in weighting. L1 corresponds to setting a Laplace
prior on the terms, while L2 corresponds to a Gaussian prior.

![](https://lh6.googleusercontent.com/vXUSHKE11Qpolek11IPPP6Fs-iU1-LeWtf5EXVdrfOl97ytug_cME-vLF1t4BNvoAppxfRhx4dNzHoKkdl8dfGVix4jc2hhvrtDG_wyuByxpVfeFZQdMH-INzG6RSi_9jkJLERto)

#### Q87 What’s your favorite algorithm, and can you explain it to me in less than a minute?

This type of question tests your understanding of how to communicate complex and technical nuances
with poise and the ability to summarize quickly and efficiently. Make sure you have a choice and
make sure you can explain different algorithms so simply and effectively that a five-year-old could
grasp the basics!

#### Q88 What’s the difference between Type I and Type II error?

Don’t think that this is a trick question! Many machine learning interview questions will be an
attempt to lob basic questions at you just to make sure you’re on top of your game and you’ve
prepared all of your bases.

Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I
error means claiming something has happened when it hasn’t, while Type II error means that you claim
nothing is happening when in fact something is.

A clever way to think about this is to think of Type I error as telling a man he is pregnant, while
Type II error means you tell a pregnant woman she isn’t carrying a baby.

#### Q89 What’s a Fourier transform?

A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric
functions. Or, as this more intuitive tutorial puts it, given a smoothie, it’s how we find the recipe.
The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier
transform converts a signal from the time domain to the frequency domain — it’s a very common way to extract features from
audio signals or other time series such as sensor data.

#### Q90 What’s the difference between probability and likelihood?

![](https://lh3.googleusercontent.com/Yz2xAzLEEjtk62o9zatSDZJ7yBwgw-a1GtSNfAjJ3tq3OY5UbnxYUpNOqAuuKAUj8kVZaraIsr87kX83ejzg2y8DW9goGJbZuPc1Be_2VmGEEsNZ5JMioUw6Xke-KvYzp-sVrLCL)

#### Q91 What is deep learning, and how does it contrast with other machine learning algorithms?

Deep learning is a subset of machine learning that is concerned with neural networks: how to use
backpropagation and certain principles from neuroscience to more accurately model large sets of
unlabelled or semi-structured data. In that sense, deep learning represents an unsupervised learning
algorithm that learns representations of data through the use of neural nets.

#### Q92 What’s the difference between a generative and discriminative model?

A generative model will learn categories of data while a discriminative model will simply learn the
distinction between different categories of data. Discriminative models will generally outperform
generative models on classification tasks.

#### Q93 What cross-validation technique would you use on a time series dataset?

Instead of using standard k-fold cross-validation, you have to pay attention to the fact that a
time series is not randomly distributed data — it is inherently ordered chronologically. If a
pattern emerges in later time periods, for example, your model may still pick up on it even if that
effect doesn’t hold in earlier years!

You’ll want to do something like forward chaining, where you model on past data and then
look at forward-facing data:

    fold 1 : training [1], test [2]
    fold 2 : training [1 2], test [3]
    fold 3 : training [1 2 3], test [4]
    fold 4 : training [1 2 3 4], test [5]
    fold 5 : training [1 2 3 4 5], test [6]

#### Q94 How is a decision tree pruned?

Pruning is what happens in decision trees when branches that have weak predictive power are removed
in order to reduce the complexity of the model and increase the predictive accuracy of a decision
tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning
and cost complexity pruning.

Reduced error pruning is perhaps the simplest version: replace each node with its most popular class; if
that doesn’t decrease predictive accuracy, keep the node pruned. While simple, this heuristic actually comes
pretty close to an approach that would optimize for maximum accuracy.

#### Q95 Which is more important to you: model accuracy or model performance?

This question tests your grasp of the nuances of machine learning model performance! Machine learning
interview questions often look towards the details. There are models with higher accuracy that can
perform worse in predictive power — how does that make sense?

Well, it has everything to do with how model accuracy is only a subset of model performance, and at
that, a sometimes misleading one.
For example, if you wanted to detect fraud in a massive dataset with
a sample of millions, a more accurate model would most likely predict no fraud at all if only a tiny
minority of cases were fraud. However, this would be useless for a predictive model — a model designed
to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that
you understand model accuracy isn’t the be-all and end-all of model performance.

#### Q96 What’s the F1 score? How would you use it?

The F1 score is a measure of a model’s performance. It is the harmonic mean of the precision and recall
of the model, with results tending to 1 being the best and those tending to 0 being the worst. You would
use it in classification tests where true negatives don’t matter much.

#### Q97 How would you handle an imbalanced dataset?

An imbalanced dataset is when you have, for example, a classification test and 90% of the data is in one
class. That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the
other category of data! Here are a few tactics to get over the hump:

1- Collect more data to even out the imbalances in the dataset.

2- Resample the dataset to correct for imbalances.

3- Try a different algorithm altogether on your dataset.

What’s important here is that you have a keen sense of what damage an unbalanced dataset can cause,
and how to balance that.

#### Q98 When should you use classification over regression?

Classification produces discrete values and maps the dataset into strict categories, while regression gives you
continuous results that allow you to better distinguish differences between individual points. You would
use classification over regression if you wanted your results to reflect the belongingness of data points
in your dataset to certain explicit categories (e.g. if you wanted to know whether a name was male or
female rather than just how correlated it was with male and female names).

#### Q99 Name an example where ensemble techniques might be useful.

Ensemble techniques use a combination of learning algorithms to optimize better predictive performance.
They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by
small changes in the training data).

You could list some examples of ensemble methods, from bagging to boosting to a “bucket of models” method,
and demonstrate how they could increase predictive power.

#### Q100 How do you ensure you’re not overfitting with a model?

This is a simple restatement of a fundamental problem in machine learning: the possibility of
overfitting the training data and carrying the noise of that data through to the test set, thereby
providing inaccurate generalizations.

There are three main methods to avoid overfitting:

1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters,
thereby removing some of the noise in the training data.

2- Use cross-validation techniques such as k-fold cross-validation.

3- Use regularization techniques such as LASSO that penalize certain model parameters if they’re
likely to cause overfitting.
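
As a rough sketch of how points 2 and 3 can be combined in practice, here is a minimal example assuming scikit-learn is available; the synthetic dataset, the choice of Lasso and the alpha value are illustrative assumptions, not part of the original answer:

    # Compare a plain linear model with a LASSO-regularized one using 5-fold
    # cross-validation; a large gap between training and validation R^2 is a
    # quick sign of overfitting.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Lasso
    from sklearn.model_selection import cross_validate

    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=10.0, random_state=0)

    for name, model in [("linear", LinearRegression()), ("lasso", Lasso(alpha=1.0))]:
        scores = cross_validate(model, X, y, cv=5,
                                return_train_score=True, scoring="r2")
        print(name,
              "train R2: %.3f" % np.mean(scores["train_score"]),
              "validation R2: %.3f" % np.mean(scores["test_score"]))

If the cross-validated score is much lower than the training score, the model is likely overfitting; regularization typically narrows that gap.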

#### Q101 What evaluation approaches would you use to gauge the effectiveness of a machine learning model?

You would first split the dataset into training and test sets, or perhaps use cross-validation
techniques to further segment the dataset into composite sets of training and test sets within
the data. You should then implement a choice selection of performance metrics: you could use
measures such as the F1 score, the accuracy, and the confusion
matrix. What’s important here is to demonstrate that you understand the nuances of how a model is
measured and how to choose the right performance measures for the right situations.

#### Q102 How would you evaluate a logistic regression model?

A subsection of the question above. You have to demonstrate an understanding of what the typical goals
of a logistic regression are (classification, prediction, etc.) and bring up a few examples and use cases.

#### Q103 What’s the “kernel trick” and how is it useful?

The kernel trick involves kernel functions that can operate in higher-dimensional spaces without explicitly
calculating the coordinates of points within that space: instead, kernel functions compute the inner
products between the images of all pairs of data in a feature space. This gives them the very useful
attribute of calculating the coordinates of higher dimensions while being computationally cheaper than
the explicit calculation of said coordinates. Many algorithms can be expressed in terms of inner products.
Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.

## Machine Learning Interview Questions: Programming

These machine learning interview questions test your knowledge of the programming principles you need to
implement machine learning principles in practice. Machine learning interview questions tend to be technical
questions that test your logic and programming skills: this section focuses more on the latter.

~~#### Q104 How do you handle missing or corrupted data in a dataset?~~

#### Q105 Do you have experience with Spark or big data tools for machine learning?

You’ll want to get familiar with the meaning of big data for different companies and the different
tools they’ll want. Spark is the big data tool most in demand now, able to handle immense datasets
with speed. Be honest if you don’t have experience with the tools demanded, but also take a look at
job descriptions and see what tools pop up: you’ll want to invest in familiarizing yourself with them.

#### Q106 Pick an algorithm. Write the pseudo-code for a parallel implementation.

This kind of question demonstrates your ability to think in parallelism and how you could handle
concurrency in programming implementations dealing with big data. Take a look at pseudocode frameworks
such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate your
ability to write code that reflects parallelism.

#### Q107 What are some differences between a linked list and an array?

An array is an ordered collection of objects. A linked list is a series of objects with pointers that
direct how to process them sequentially.
An array assumes that every element has the same size, unlike
the linked list. A linked list can more easily grow organically: an array has to be pre-defined or
re-defined for organic growth. Shuffling a linked list involves changing which pointers direct where —
meanwhile, shuffling an array is more complex and takes more memory.

#### Q108 Describe a hash table.

A hash table is a data structure that produces an associative array. A key is mapped to certain values
through the use of a hash function. They are often used for tasks such as database indexing.

#### Q109 Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

What’s important here is to define your views on how to properly visualize data and your personal
preferences when it comes to tools. Popular tools include R’s ggplot, Python’s seaborn and matplotlib,
and tools such as Plot.ly and Tableau.

![](https://lh3.googleusercontent.com/79d5jkZBgpZPQa61A4e9opgfX2-mrxWxfQyswec3YxBouNEvAu8wYxjCXNQl-nRdBVQeuco1h-LZbxVblgS9h6bYLi6peoqSd2N7VW7BSeBgpmclKng6IRYEf9QkTMRJKMyPxrCT)

## Machine Learning Interview Questions: Company/Industry Specific

These machine learning interview questions deal with how to apply your general machine learning knowledge
to a specific company’s requirements. You’ll be asked to create case studies and extend your knowledge of the
company and industry you’re applying to with your machine learning skills.

#### Q110 How would you implement a recommendation system for our company’s users?

A lot of machine learning interview questions of this type will involve the implementation of machine learning
models to a company’s problems. You’ll have to research the company and its industry in depth, especially
the revenue drivers the company has, and the types of users the company takes on in the context of the
industry it’s in.

#### Q111 How can we use your machine learning skills to generate revenue?

This is a tricky question. The ideal answer would demonstrate knowledge of what drives the business and
how your skills could relate. For example, if you were interviewing for the music-streaming startup Spotify,
you could remark that your skills at developing a better recommendation model would increase user retention,
which would then increase revenue in the long run.

The startup metrics Slideshare linked above will help you understand exactly what performance indicators
are important for startups and tech companies as they think about revenue and growth.

#### Q112 What do you think of our current data process?

This kind of question requires you to listen carefully and impart feedback in a manner that is constructive
and insightful. Your interviewer is trying to gauge if you’d be a valuable member of their team and whether
you grasp the nuances of why certain things are set the way they are in the company’s data process based on
company- or industry-specific conditions. They’re trying to see if you can be an intellectual peer. Act
accordingly.

## Machine Learning Interview Questions: General Machine Learning Interest

This series of machine learning interview questions attempts to gauge your passion and interest in machine learning.
The right answers will serve as a testament to your commitment to being a lifelong learner in machine learning.

#### Q113 What are the last machine learning papers you’ve read?

Keeping up with the latest scientific literature on machine learning is a must if you want to demonstrate
interest in a machine learning position. This overview of deep learning in Nature by the scions of deep
learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what’s
happening in deep learning — and the kind of paper you might want to cite.

#### Q114 Do you have research experience in machine learning?

Related to the last point, most organizations hiring for machine learning positions will look for your
formal experience in the field. Research papers, co-authored or supervised by leaders in the field, can make
the difference between you being hired and not. Make sure you have a summary of your research experience
and papers ready — and an explanation for your background and lack of formal research experience if you don’t.

#### Q115 What are your favorite use cases of machine learning models?

The Quora thread above contains some examples, such as decision trees that categorize people into different
tiers of intelligence based on IQ scores. Make sure that you have a few examples in mind and describe what
resonated with you. It’s important that you demonstrate an interest in how machine learning is implemented.

#### Q116 How would you approach the “Netflix Prize” competition?

The Netflix Prize was a famed competition where Netflix offered $1,000,000 for a better collaborative
filtering algorithm. The team that won, called BellKor, achieved a 10% improvement and used an ensemble of different
methods to win. Some familiarity with the case and its solution will help demonstrate you’ve paid attention
to machine learning for a while.

#### Q117 Where do you usually source datasets?

Machine learning interview questions like these try to get at the heart of your machine learning interest.
Somebody who is truly passionate about machine learning will have gone off and done side projects on their own,
and have a good idea of what great datasets are out there. If you’re missing any, check out Quandl for economic
and financial data, and Kaggle’s Datasets collection for another great list.

#### Q118 How do you think Google is training data for self-driving cars?

Machine learning interview questions like this one really test your knowledge of different machine learning
methods, and your inventiveness if you don’t know the answer. Google is currently using reCAPTCHA to source
labelled data on storefronts and traffic signs. They are also building on training data collected by Sebastian
Thrun at Google X — some of which was obtained by his grad students driving buggies on desert dunes!

#### Q119 How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?

AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event
in the history of machine learning and deep learning.
The Nature paper above describes how this was accomplished
with “Monte-Carlo tree search with deep neural networks that have been trained by supervised learning,
from human expert games, and by reinforcement learning from games of self-play.”


[Reference from DeZyre](https://www.dezyre.com/article/100-data-science-interview-questions-and-answers-general-for-2017/184)

[Reference from Springboard](https://www.springboard.com/blog/machine-learning-interview-questions/?from=message&isappinstalled=0)

Reference: Deep Learning (Ian Goodfellow, Yoshua Bengio and Aaron Courville) -- MIT