├── 06_Machine Learning Capstone ├── week_5.md ├── week_1.md ├── week_6.md ├── week_4.md ├── week_2.md └── week_3.md ├── 02_Supervised Machine Learning: Regression ├── week_6.md ├── week_2.md ├── week_1.md ├── week_4.md ├── week_5.md └── week_3.md ├── 01_Exploratory Data Analysis for Machine Learning ├── week_5.md ├── week_3.md ├── week_4.md ├── week_2.md └── week_1.md ├── 03_Supervised Machine Learning: Classification ├── final_project.md ├── week_2.md ├── week_5.md ├── week_6.md ├── week_4.md ├── week_1.md └── week_3.md ├── 05_Deep Learning and Reinforcement Learning ├── final_project.md ├── week_7.md ├── week_9.md ├── week_2.md ├── week_8.md ├── week_3.md ├── week_1.md ├── week_5.md ├── week_6.md └── week_4.md ├── 04_Unsupervised Machine Learning ├── week_7.md ├── week_2.md ├── week_5.md ├── week_1.md ├── week_3.md ├── week_4.md └── week_6.md └── README.md /06_Machine Learning Capstone/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 2 | ## There is no task this week; move on to the next week. 3 | HAPPY LEARNING 😊 4 | -------------------------------------------------------------------------------- /06_Machine Learning Capstone/week_1.md: -------------------------------------------------------------------------------- 1 | # WEEK 1 2 | ## 📜 Introduction Part 📜 3 | ## There is no task this week; move on to the next week. 4 | HAPPY LEARNING 😊 5 | 6 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_6.md: -------------------------------------------------------------------------------- 1 | # Week 6 2 | Week 6 is project work, and the ball is in your court (it is optional and up to you 😊) 3 | 4 | 🌟 HAPPY LEARNING! 
😊📚 5 | 6 | -------------------------------------------------------------------------------- /01_Exploratory Data Analysis for Machine Learning/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 2 | Week 5 is project work, and the ball is in your court (it is optional and up to you 😊) 3 | 4 | 🌟 HAPPY LEARNING! 😊📚 5 | 6 | 7 | -------------------------------------------------------------------------------- /03_Supervised Machine Learning: Classification/final_project.md: -------------------------------------------------------------------------------- 1 | ### Click to download the project ➡️ [03-Supervised Machine Learning Classification.pdf](https://github.com/iamvikramkumar/ibm_machine_learning_coursera/files/13214925/03-Supervised.Machine.Learning.Classification.pdf) 2 | # NOTE: 3 | ## Please make some minor changes to the project. My name appears in it, so remove it before uploading. Thank you. 4 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/final_project.md: -------------------------------------------------------------------------------- 1 | ### Click to download the project ➡️ [05-Deep Learning and Reinforcement Learning.pdf](https://github.com/iamvikramkumar/ibm_machine_learning_coursera/files/13214748/05-Deep.Learning.and.Reinforcement.Learning.pdf) 2 | 3 | # NOTE: 4 | ## Please make some minor changes to the project. My name appears in it, so remove it before uploading. Thank you. 5 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_7.md: -------------------------------------------------------------------------------- 1 | # WEEK 7 2 | ## Week 7 is project work, and the ball is in your court; this page may be updated in the future. 3 | 🌟 HAPPY LEARNING! 
😊📚 4 | 5 | ### Click to download the project ➡️ [04-Unsupervised Machine Learning.pdf](https://github.com/iamvikramkumar/ibm_machine_learning_coursera/files/13214858/04-Unsupervised.Machine.Learning.pdf) 6 | 7 | # NOTE: 8 | ## Please make some minor changes to the project. My name appears in it, so remove it before uploading. Thank you. 9 | -------------------------------------------------------------------------------- /06_Machine Learning Capstone/week_6.md: -------------------------------------------------------------------------------- 1 | # WEEK 6 2 | ## Week 6 is project work, and the ball is in your court; this page may be updated in the future. 3 | 🌟 HAPPY LEARNING! 😊📚 4 | 5 | ### Click to download the project ➡️ 6 | [Machine Learning Capstone.pdf](https://github.com/iamvikramkumar/ibm_machine_learning_coursera/files/13214662/06_Machine.Learning.Capstone.pdf) 7 | 8 | # NOTE: 9 | ## Please make some minor changes to the project. My name appears in it, so remove it before uploading. Thank you. 10 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | 3 | ## Q1. What is the other name we can give to the L2 distance? 4 | `Euclidean Distance` 5 | 6 | ## Q2. Which of the following statements is a business case for the use of the Manhattan distance (L1)? 7 | `We use it in business cases where there is very high dimensionality.` 8 | 9 | ## Q3. What is the key feature of the Cosine Distance? 10 | `The Cosine Distance takes into account the angle between 2 points.` 11 | 12 | ## Q4. Which of the following statements is an example of a business case where we can use the Cosine Distance? 13 | 14 | `Cosine is better for data such as text where location of occurrence is less important. ` 15 | 16 | 17 | ## Q5. 
Which distance metric is useful when we have text documents and we want to group similar topics together? 18 | 19 | `Jaccard ` 20 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 QUIZ 2 | 3 | ## Q1. What is the main difference between kernel PCA and linear PCA? 4 | `Kernel PCA tend to uncover non-linearity structure within the dataset by increasing the dimensionality of the space thanks to the kernel trick.` 5 | 6 | ## Q2. (True/False) Multi-Dimensional Scaling (MDS) focuses on maintaining the geometric distances between points 7 | `True` 8 | 9 | ## Q3. Which of the following data types is more suitable for Kernel PCA than PCA? 10 | `Data where the classes are not linearly separable.` 11 | 12 | ## Q4. By applying MDS, you are able to: 13 | `Find embeddings for points so that their distance is the most similar to the original distance.` 14 | 15 | ## Q5. Which one of the following hyperparameters is NOT considered when using GridSearchCV for Kernel PCA? 16 | `n_clusters` 17 | 18 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_7.md: -------------------------------------------------------------------------------- 1 | # WEEK 7 QUIZ 2 | 3 | ## Q1. Select the correct option: 4 | 5 | Statement 1: Autoencoders are a supervised learning technique. 6 | 7 | Statement 2: Autoencoder’s output is exactly the same as the input. 8 | 9 | `Both statements are false.` 10 | 11 | 12 | ## Q2. Select the correct option: 13 | 14 | Statement 1: Autoencoders can be viewed as a generalization of PCA that discovers lower dimensional representation of complex data. 15 | 16 | Statement 2: We can implement overcomplete autoencoder by constraining the number of units present in the hidden layers of the neural network. 
17 | 18 | `Statement 1 is true, statement 2 is false.` 19 | 20 | ## Q3. (True/False) Denoising autoencoders can be used as a tool for feature extraction. 21 | 22 | `True` 23 | 24 | 25 | ## Q4. (True/False) An Autoencoder is a form of unsupervised deep learning. 26 | 27 | `True` 28 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_9.md: -------------------------------------------------------------------------------- 1 | # WEEK 9 QUIZ 2 | 3 | ## Q1. (True/False) Simulation is a common approach for Reinforcement Learning applications that are complex or computing intensive. 4 | `True` 5 | 6 | ## Q2.(True/False) Discounting rewards refers to an agent reducing the value of the reward based on its uncertainty. 7 | `False` 8 | 9 | ## Q3. (True/False) Successful Reinforcement Learning approaches are often limited by extreme sensitivity to hyperparameters. 10 | `True` 11 | 12 | ## Q4. (True/False) Reinforcement Learning approaches are often limited by excessive computation resources and data requirements. 13 | `True` 14 | 15 | ## Q5. Which type of Deep Learning approach is most commonly used for image recognition? 16 | `Convolutional Neural Network` 17 | 18 | 19 | ## Q6. Which type of Deep Learning approach is most commonly used for forecasting problems? 20 | 21 | `Recurrent Neural Network` 22 | 23 | ## Q7. Which type of Deep Learning approach is most commonly used for generating artificial images? 24 | 25 | `Autoencoders` 26 | 27 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | 3 | ## Q1. The backpropagation algorithm updates which of the following? 4 | `The parameters only.` 5 | 6 | ## Q2. What of the following about the activation functions is true? 
7 | 8 | `They add non-linearity into the model, allowing the model to learn complex pattern.` 9 | 10 | ## Q3. What is true regarding the backpropagation rule? 11 | `The actual output is determined by computing the output of neurons in each hidden layer ` 12 | 13 | ## Q4. Which option correctly lists the steps to build a linear regression model using Keras? 14 | 1. Use `fit()` and specify the number of epochs to train the model for. 15 | 16 | 2. Create a Sequential model with the relevant layers. 17 | 18 | 3. Normalize the features with ` layers.Normalization()` and apply `adapt()`. 19 | 20 | 4. Compile using `model.compile()` with specified optimizer and loss. 21 | 22 | ANSWER ➡️ `3, 2, 4, 1` 23 | 24 | ## Q5. (True/False) Keras provides one approach to build a model: by defining a Sequential model. 25 | `False` 26 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_1.md: -------------------------------------------------------------------------------- 1 | # WEEK 1 QUIZ 2 | 3 | ## Q1. What is the implication of a small standard deviation of the clusters? 4 | `The standard deviation of the cluster defines how tightly around each one of the centroids are. With a small standard deviation, the points will be closer to the centroids. ` 5 | 6 | ## Q2. After we plot our elbow and we find the inflection point, what does that point indicate to us? 7 | 8 | `The ideal number of clusters. ` 9 | 10 | ## Q3. What is one of the most suitable ways to choose K when the number of clusters is unclear? 11 | 12 | `By evaluating Clustering performance such as Inertia and Distortion.` 13 | 14 | ## Q4. Which statement describes correctly the use of distortion and inertia? 15 | 16 | `When the similarity of the points in the cluster are more important, you should use distortion, and if you are more concerned that clusters have similar numbers of points, then you should use inertia.` 17 | 18 | 19 | ## Q5. 
Which method is commonly used to select the right number of clusters? 20 | 21 | `The elbow method.` 22 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_3.md: -------------------------------------------------------------------------------- 1 | # WEEK 3 QUIZ 2 | 3 | ## Q1. When using DBSCAN, how does the algorithm determine that a cluster is complete and is time to move to a different point of the data set and potentially start a new cluster? 4 | `When no point is left unvisited by the chain reaction.` 5 | 6 | ## Q2. Which of the following statements correctly defines the strengths of the DBSCAN algorithm? 7 | 8 | `No need to specify the number of clusters (cf. K-means), allows for noise, and can handle arbitrary-shaped clusters.` 9 | 10 | ## Q3. Which of the following statements correctly defines the weaknesses of the DBSCAN algorithm? 11 | 12 | `It needs two parameters as input, finding appropriate values of Ɛ and n_clu can be difficult, and it does not do well with clusters of different density.` 13 | 14 | ## Q4. (True/false) Does complete linkage refers to the maximum pairwise distance between clusters? 15 | 16 | `True` 17 | 18 | ## Q5. Which of the following measure methods computes the inertia and pick the pair that is going to ultimately minimize the inertia value? 19 | `Ward linkage` 20 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_8.md: -------------------------------------------------------------------------------- 1 | # WEEK 8 QUIZ 2 | 3 | ## Q1. Select the right assertion: 4 | 5 | `Autoencoders learn from a compressed representation of the data, while variational autoencoders learn from a probability distribution representing the data.` 6 | 7 | ## Q2. (True/False) Variational autoencoders are generative models. 8 | 9 | `True` 10 | 11 | ## Q3. 
When comparing the results of Autoencoders and Principal Component Analysis, which approach might best improve the results from Autoencoders? 12 | `Add layers and epochs` 13 | 14 | ## Q4. (True/False) KL loss is used in Variational Autoencoders to represent the measure of the difference between two distributions. 15 | 16 | `True` 17 | 18 | ## Q5. (True/False) A good way to compare the inputs and outputs of a Variational Autoencoder is to calculate the mean of a reconstruction function based on binary crossentropy. 19 | 20 | `True` 21 | 22 | 23 | ## Q6. The main parts of a GAN architecture are: 24 | 25 | `generator and discriminator` 26 | 27 | 28 | ## Q7. (True/False) One of the main advantages of GANs over other adversarial networks is that it does not spend any time evaluating whether an input or image is fake or real. It only computes probability of being fake. 29 | `True` 30 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_4.md: -------------------------------------------------------------------------------- 1 | # WEEK 4 QUIZ 2 | 3 | ## Q1. Select the option that best completes the following sentence: For data with many features, principal components analysis 4 | `generates new features that are linear combinations of the original features.` 5 | 6 | ## Q2. Which option correctly lists the steps for implementing PCA in Python? 7 | 8 | 1. Fit PCA to data 9 | 10 | 2. Scale the data 11 | 12 | 3. Determine the desired number of components based on total explained variance 13 | 14 | 4. Define a PCA object 15 | 16 | `2, 4, 1, 3 ` 17 | 18 | ## Q3. Given the following matrix for lengths of singular vectors, how do we rank the vectors in terms of importance? 19 |
20 | [11 0 0 0 21 | 0 3 0 0 22 | 0 0 2 0 23 | 0 0 0 1] 24 | 25 | `v1, v2, v3, v4 ` 26 | 27 | ## Q4. Given two principal components v1, v2, let's say that feature f1 contributed 0.15 to v1 and 0.25 to v2. Feature f2 contributed -0.11 to v1 and 0.4 to v2. Which feature is more important according to their total contribution to the components? 28 | `f2 because |-0.11| + |0.4| > |0.15| + |0.25|` 29 | 30 | 31 | ## Q5. (True/False) In PCA, the first principal component represents the most important feature in the dataset. 32 | `False` 33 | 34 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_3.md: -------------------------------------------------------------------------------- 1 | # WEEK 3 QUIZ 2 | 3 | ## Q1. What is the main function of backpropagation when training a Neural Network? 4 | 5 | `Make adjustments to the weights` 6 | 7 | ## Q2. (True/False) The “vanishing gradient” problem can be solved using a different activation function. 8 | `True` 9 | ## Q3. (True/False) Every node in a neural network has an activation function. 10 | `True` 11 | 12 | ## Q4. These are all activation functions except: 13 | `Leaky hyperbolic tangent` 14 | 15 | ## Q5. Deep Learning uses deep Neural Networks for all these uses, except: 16 | `Cases in which explainability is the main objective` 17 | 18 | ## Q6. These are all activation functions except: 19 | `Pruning` 20 | 21 | ## Q7. (True/False) Optimizer approaches for Deep Learning Regularization use gradient descent: 22 | `False` 23 | 24 | ## Q8. Stochastic gradient descent is this type of batching method: 25 | 26 | `online learning` 27 | 28 | ## Q9. (True/False) The main purpose of data shuffling during the training of a Neural Network is to aid convergence and use the data in a different order each epoch. 29 | 30 | `True` 31 | 32 | ## Q10. 
This is a high-level library that is commonly used to train deep learning models and runs on either TensorFlow or Theano: 33 | `Keras` 34 | -------------------------------------------------------------------------------- /03_Supervised Machine Learning: Classification/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | 3 | ## Q1. Which one of the following statements is true regarding K Nearest Neighbors? 4 | `K Nearest Neighbors (KNN) assumes that points which are close together are similar.` 5 | 6 | 7 | ## Q2. Which one of the following statements is most accurate? 8 | `K nearest neighbors (KNN) needs to remember the entire training dataset in order to classify a new data sample.` 9 | 10 | ## Q3. Which one of the following statements is most accurate about K Nearest Neighbors (KNN)? 11 | `KNN can be used for both classification and regression.` 12 | 13 | ## Q4. (True/False) K Nearest Neighbors with large k tend to be the best classifiers. 14 | `False` 15 | 16 | ## Q5. When building a KNN classifier for a variable with 2 classes, it is advantageous to set the neighbor count k to an odd number. 17 | `True` 18 | 19 | ## Q6. The Euclidean distance between two points will always be shorter than the Manhattan distance: 20 | `True` 21 | 22 | ## Q7. The main purpose of scaling features before fitting a k nearest neighbor model is to: 23 | `Ensure that features have similar influence on the distance calculation` 24 | 25 | ## Q8. These are all pros of the k nearest neighbor algorithm EXCEPT: 26 | ` It is sensitive to the curse of dimensionality` 27 | -------------------------------------------------------------------------------- /04_Unsupervised Machine Learning/week_6.md: -------------------------------------------------------------------------------- 1 | # WEEK 6 QUIZ 2 | 3 | ## Q1. (True/False) In some applications, NMF can make for more human interpretable latent features. 4 | `True` 5 | 6 | ## Q2. 
Which of the following set of features is the least adapted to NMF? 7 | `Monthly returns of a set of stock portfolios.` 8 | 9 | ## Q3. (True/False) The NMF can produce different outputs depending on its initialization. 10 | 11 | `True` 12 | 13 | ## Q4. Which option is the sparse representation of the matrix below? 14 | [(1, 1, 2), (1, 2, 3), (3, 4, 1), (2, 4, 4), (4, 3, 1)] 15 | 16 | 17 | - [x] [[2 0 0 0], 18 | 19 | [0 3 0 0], 20 | 21 | [0 0 0 1], 22 | 23 | [0 4 1 0]] 24 | 25 | 26 | - [ ] [[0 0 0 1], 27 | 28 | [0 2 0 0], 29 | 30 | [0 0 0 3], 31 | 32 | [0 4 1 0]] 33 | 34 | 35 | - [ ] [[1 0 0 0], 36 | 37 | [0 3 0 0], 38 | 39 | [0 2 0 0], WRONG 40 | 41 | [0 0 4 2]] 42 | 43 | 44 | - [ ] [[0 0 0 2], 45 | 46 | [0 3 4 0], 47 | 48 | [0 0 0 0], 49 | 50 | [0 0 1 0]] 51 | 52 | 53 | ## Q5. In Practice lab: Non-Negative Matrix Factorization, why did we use "pairwise_distances" from scikit-learn? 54 | 55 | `To calculate the pairwise distance between NMF encoded version of the original dataset and the encoded query dataset.` 56 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_1.md: -------------------------------------------------------------------------------- 1 | # WEEK 1 QUIZ 2 | 3 | ## Q1. What is another name for the “neuron” on which all neural networks are based? 4 | `perceptron` 5 | 6 | ## Q2. What is an advantage of using a network of neurons? 7 | `A network of neurons can represent a non-linear decision boundary.` 8 | 9 | ## Q3. A dataset with 8 features would have how many nodes in the input layer? 10 | 11 | `8` 12 | 13 | ## Q4. For a single data point, the weights between an input layer with 3 nodes and a hidden layer with 4 nodes can be represented by a: 14 | 15 | `3 x 4 matrix.` 16 | 17 | ## Q5. Use the following image for reference. How many hidden layers are in this Neural Network? 18 | 19 |  20 | 21 | `Two` 22 | 23 | ## Q6. Use the following image for reference. 
How many hidden units are in this Neural Network? 24 |  25 | 26 | `Eight` 27 | 28 | 29 | ## Q7. Which statement is TRUE about the relationship between Neural Networks and Logistic Regression? 30 | `A single-layer Neural Network can be parameterized to generate results equivalent to Linear or Logistic Regression.` 31 | -------------------------------------------------------------------------------- /03_Supervised Machine Learning: Classification/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 QUIZ 2 | 3 | ## Q1. The term Bagging stands for bootstrap aggregating. 4 | `True` 5 | 6 | ## Q2. This is the best way to choose the number of trees to build on a Bagging ensemble. 7 | 8 | `Tune number of trees as a hyperparameter that needs to be optimized` 9 | 10 | ## Q3. Which type of Ensemble modeling approach is NOT a special case of model averaging? 11 | `Boosting methods` 12 | The Pasting method of Bootstrap aggregation 13 | 14 | 15 | ## Q4. What is an ensemble model that needs you to look at out of bag error? 16 | `Random Forest` 17 | 18 | ## Q5. What is the main condition to use stacking as ensemble method? 19 | 20 | `Models need to output predicted probabilities` 21 | 22 | ## Q6. This tree ensemble method only uses a subset of the features for each tree: 23 | 24 | `Random Forest` 25 | 26 | ## Q7. Order these tree ensembles in order of most randomness to least randomness: 27 | `Random Trees, Random Forest, Bagging` 28 | 29 | ## Q8. This is an ensemble model that does not use bootstrapped samples to fit the base trees, takes residuals into account, and fits the base trees iteratively: 30 | `Boosting` 31 | 32 | ## Q9. When comparing the two ensemble methods Bagging and Boosting, what is one characteristic of Boosting? 33 | `Fits entire data set` 34 | 35 | ## Q10. What is the most frequently discussed loss function in boosting algorithms? 
36 | 37 | `0-1 Loss Function` 38 | 39 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 QUIZ 2 | 3 | ## Q1. (True/False) In Keras, the Dropout layer has an argument called rate, which is a probability that represents how often we want to invoke the layer in the training. 4 | `False` 5 | 6 | ## Q2. What is a benefit of applying transfer learning to neural networks? 7 | `Save early layers for generalization before re-training later layers for specific applications. ` 8 | 9 | ## Q3. By setting ` layer.trainable = False` for certain layers in a neural network, we____ 10 | `freeze the layers such that their weights don’t update during training. ` 11 | 12 | ## Q4. Which option correctly orders the steps of implementing transfer learning? 13 | 14 | 1. Freeze the early layers of the pre-trained model. 15 | 16 | 2. Improve the model by fine-tuning. 17 | 18 | 3. Train the model with a new output layer in place. 19 | 20 | 4. Select a pre-trained model as the base of our training. 21 | 22 | 23 | # ANSWER ➡️ `4, 1, 3, 2` 24 | 25 | ## Q5. Given a 100x100 pixels RGB image, there are _____ features. 26 | 27 | `30000` 28 | There is one feature per pixel of each channel, so 100*100*3=30000. 29 | 30 | 31 | ## Q6. Before a CNN is ready for classifying images, what layer must we add as the last? 32 | 33 | `Dense layer with the number of units corresponding to the number of classes ` 34 | 35 | ## Q7. In a CNN, the depth of a layer corresponds to the number of: 36 | `filters applied` 37 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | 3 | ## Q1. 
The main purpose of splitting your data into a training and test sets is: 4 | - To avoid overfitting 5 | 6 | ## Q2. Complete the following sentence: The training data is used to fit the model, while the test data is used to: 7 | - Measure error and performance of the model 8 | 9 | ## Q3. What term is used if your test data leaks into the training data? 10 | - Data leakage 11 | 12 | ## Q4. Which one of the below terms uses a linear combination of features? 13 | - Linear Regression 14 | 15 | ## Q5. When splitting your data, what is the purpose of the training data? 16 | - Fit the actual model and learn the parameters 17 | 18 | ## Q6. Polynomial features capture what effects? 19 | - Non-linear effects. 20 | 21 | ## Q7. Which fundamental problems are being solved by adding non-linear patterns, such as polynomial features, to a standard linear approach? 22 | - Prediction and Interpretation. 23 | 24 | ## Q8. A testing data could be also referred to as: 25 | - Unseen data 26 | 27 | ## Q9. Select the correct syntax to obtain the data split that will result in a train set that is 60% of the size of your available data: 28 | - X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4) 29 | 30 | ## Q10. What is the correct sklearn syntax to add a third-degree polynomial to your model? 31 | - polyFeat = PolynomialFeatures(degree=3) 32 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_6.md: -------------------------------------------------------------------------------- 1 | # WEEK 6 QUIZ 2 | 3 | ## Q1. (True/False) RNN models are mostly used in the fields of natural language processing and speech recognition. 4 | 5 | `True` 6 | 7 | ## Q2. (True/False) GRUs and LSTM are a way to deal with the vanishing gradient problem encountered by RNNs. 8 | `True` 9 | 10 | 11 | ## Q3. 
(True/False) GRUs will generally perform about as well as LSTMs with shorter training time, especially for smaller datasets. 12 | `True` 13 | ## Q4. (True/False) The main idea of Seq2Seq models is to improve accuracy by keeping necessary information in the hidden state from one sequence to the next. 14 | `True` 15 | 16 | ## Q5. (True/False) The main parts of a Seq2Seq model are: an encoder, a hidden state, a sequence state, and a decoder. 17 | `False` 18 | 19 | ## Q6. Select the correct option, in the context of Seq2Seq models: 20 | 21 | The Greedy Search algorithm selects one best candidate as an input sequence for each time step while the Beam Search produces multiple different hypothesis based on conditional probability. 22 | 23 | ## Q7. Which is the gating mechanism for RNNs that include a reset gate and an update gate? 24 | 25 | `GRUs` 26 | 27 | ## Q8. LSTM models are among the most common Deep Learning models used in forecasting. These are other common uses of LSTM models, except: 28 | 29 | Speech Recognition 30 | 31 | 32 | - [ ] Machine Translation 33 | 34 | 35 | - [ ] Image Captioning 36 | 37 | 38 | - [x] Generating Images 39 | 40 | 41 | - [ ] Anomaly Detection 42 | 43 | 44 | - [ ] Robotic Control 45 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_1.md: -------------------------------------------------------------------------------- 1 | 2 | # WEEK 1 QUIZ 3 | ## Q1. You can use supervised machine learning for all of the following examples, EXCEPT: 4 | Segment customers by their demographics. 5 | 6 | ## Q2. The autocorrect on your phone is an example of: 7 | Supervised learning 8 | 9 | ## Q3. Which of the following is the type of Machine Learning that uses only data with outcomes to build a model? 10 | Supervised Machine Learning 11 | 12 | ## Q4. Which among the following options does not conform to the best practice of modelling in Supervised Machine learning? 
13 | Develop multiple models. 14 | 15 | ## Q5. This is the syntax you need to predict new data after you have trained a linear regression model called LR : 16 | LR.predict(X_test) 17 | 18 | ## Q6. All of these options are useful error measures to compare regressions except: 19 | ROC index 20 | 21 | ## Q7. All of the listed below are part of the Machine Learning Framework, except: 22 | 23 | Observations 24 | Features 25 | Parameters 26 | Answer ➡️ None of the above 27 | 28 | ## Q8. Select the option that is the most INACCURATE regarding the definition of Machine Learning: 29 | Machine Learning is automated and requires no programming 30 | 31 | ## Q9. In Linear Regression, which statement is correct about Sum Squared Error? 32 | The Sum Squared Error measures the distance between the truth and predicted values. 33 | 34 | ## Q10. When learning about regression we saw the outcome as a continuous number. Given the below options what is an example of regression? 35 | Housing prices 36 | -------------------------------------------------------------------------------- /01_Exploratory Data Analysis for Machine Learning/week_3.md: -------------------------------------------------------------------------------- 1 | # WEEK 3 QUIZ 2 | ## Q1. Which scaling approach converts features to standard normal variables? 3 | Standard scaling 4 | 5 | ## Q2. Which variable transformation should you use for ordinal data? 6 | 7 | Ordinal encoding 8 | 9 | ## Q3. What are polynomial features? 10 | 11 | They are higher order relationships in the data. 12 | 13 | ## Q4. What does Boxcox transformation do? 14 | It transforms the data distribution into more symmetrical bell curve 15 | 16 | ## Q5. Select three important reasons why EDA is useful. 17 | To determine if the data makes sense, to determine whether further data cleaning is needed, and to help identify patterns and trends in the data 18 | 19 | ## Q6. What assumption does the linear regression model make about data? 
20 | This model assumes a linear relationship between predictor variables and outcome variables. 21 | 22 | ## Q7. What is skewed data? 23 | 24 | Data that is distorted away from normal distribution; may be positively or negatively skewed. 25 | 26 | ## Q8. Select the two primary types of categorical feature encoding. 27 | 28 | One-hot encoding and ordinal encoding 29 | 30 | ## Q9. Which scaling approach puts values between zero and one? 31 | 32 | Min-max scaling 33 | 34 | Correct 35 | Correct. Min-max scaling converts variables to continuous variables in the (0, 1) interval by mapping minimum values to 0 and maximum values to 1. 36 | 37 | ## Q10. Which variable transformation should you use for nominal data with multiple different values within the feature? 38 | One-hot encoding 39 | -------------------------------------------------------------------------------- /03_Supervised Machine Learning: Classification/week_6.md: -------------------------------------------------------------------------------- 1 | # WEEK 6 QUIZ 2 | 3 | ## Q1.Which of the following statements about Downsampling is TRUE? 4 | 5 | `Downsampling is likely to decrease Precision.` 6 | 7 | ## Q2. Which of the following statements about Random Upsampling is TRUE? 8 | 9 | `Random Upsampling results in excessive focus on the more frequently-occurring class. ` 10 | 11 | ## Q3. Which of the following statements about Synthetic Upsampling is TRUE? 12 | `Synthetic Upsampling generates observations that were not part of the original data.` 13 | 14 | ## Q4. What can help humans to interpret the behaviors and methods of Machine Learning models more easily? 15 | `Model Explanations` 16 | 17 | ## Q5. What type of explanation method can be used to explain different types of Machine Learning models no matter the model structures and complexity? 18 | 19 | `Model-Agnostic Explanations` 20 | 21 | 22 | ## Q6. What reason might a Global Surrogate model fail? 
23 | `Large inconsistency between surrogate models and black-box models` 24 | 25 | ## Q7. When working with unbalanced sets, what should be done to the samples so the class balance remains consistent in both the train and test set? 26 | `Stratify the samples` 27 | 28 | ## Q8. What approach are you using when trying to increase the size of a minority class so that it is similar to the size of the majority class? 29 | `Oversampling` 30 | 31 | ## Q9. What approach are you using when you create a new sample of a minority class that does not yet exist? 32 | `Synthetic Oversampling` 33 | 34 | ## Q10. What intuitive technique is used for unbalanced datasets that ensures a continuous downsample for each of the bootstrap samples? 35 | `Blagging` 36 | -------------------------------------------------------------------------------- /05_Deep Learning and Reinforcement Learning/week_4.md: -------------------------------------------------------------------------------- 1 | # WEEK 4 QUIZ 2 | 3 | ## Q1. What is the main function of backpropagation when training a Neural Network? 4 | `Make adjustments to the weights` 5 | 6 | ## Q2. (True/False) The “vanishing gradient” problem can be solved using a different activation function. 7 | 8 | `True` 9 | 10 | ## Q3. (True/False) Every node in a neural network has an activation function. 11 | 12 | 13 | `True` 14 | 15 | ## Q4. These are all activation functions except: 16 | - [ ] Sigmoid 17 | - [ ] Hyperbolic tangent 18 | - [X] Leaky hyperbolic tangent 19 | - [ ] ReLU 20 | 21 | ## Q5. Deep Learning uses deep Neural Networks for all these uses, except: 22 | `Cases in which explainability is the main objective` 23 | 24 | ## Q6. These are all activation functions for CNN, except: 25 | `Pruning` 26 | 27 | ## Q7. 
(True/False) Optimizer approaches for Deep Learning Regularization use gradient descent: 28 | `False` 29 | 30 | ## Q8. Stochastic gradient descent is this type of batching method: 31 | 32 | `online learning` 33 | 34 | ## Q9. The main purpose of data shuffling during the training of a Neural Network is to aid convergence and use the data in a different order each epoch. 35 | `True` 36 | 37 | ## Q10. Which of the following IS NOT a benefit of Transfer Learning? 38 | `Improving the speed at which large models can be trained from scratch` 39 | 40 | ## Q11. Which of the following statements about using a Pooling Layer is TRUE? 41 | 42 | `Pooling can reduce both computational complexity and overfitting.` 43 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_4.md: -------------------------------------------------------------------------------- 1 | # WEEK 4 QUIZ 2 | ## Q1. Which of the following statements about model complexity is TRUE? 3 | - Higher model complexity leads to a higher chance of overfitting. 4 | 5 | ## Q2.Which of the following statements about model errors is TRUE? 6 | - Underfitting is characterized by higher errors in both training and test samples. 7 | 8 | ## Q3. Which of the following statements about regularization is TRUE? 9 | - Regularization decreases the likelihood of overfitting relative to training data. 10 | 11 | ## Q4. Which of the following statements about scaling features prior to regularization is TRUE? 12 | - The larger a feature’s scale, the more likely its estimated impact will be influenced by regularization. 13 | 14 | ## Q5. Which one of the 3 Regularization techniques: Ridge, Lasso, and Elastic Net, performs the fastest under the hood? 15 | - Ridge 16 | 17 | ## Q6. Which of the following statements about Elastic Net regression is TRUE? 18 | - Elastic Net combines L1 and L2 regularization. 19 | 20 | ## Q7. 
BOTH Ridge regression and Lasso regression 21 | - Add a term to the loss function proportional to a regularization parameter. 22 | 23 | 24 | ## Q8. Compared with Lasso regression (assuming similar implementation), Ridge regression is: 25 | - Less likely to set feature coefficients to zero. 26 | 27 | ## Q9. Which of the following about Ridge Regularization is TRUE? 28 | - It enforces the coefficients to be lower, but not 0 29 | - It minimizes irrelevant features 30 | - It penalizes the magnitude of the regression coefficients by adding a squared term 31 | - ANSWER ➡️ `All of the above` 32 | 33 | 34 | ## Q10. Which of the statements below is correct? 35 | - Only LassoCV uses the L1 regularization function. 36 | -------------------------------------------------------------------------------- /03_Supervised Machine Learning: Classification/week_4.md: -------------------------------------------------------------------------------- 1 | # WEEK 4 QUIZ 2 | 3 | ## Q1. These are all characteristics of decision trees, EXCEPT: 4 | `They have well rounded decision boundaries` 5 | 6 | ## Q2. Decision trees used as classifiers compute the value assigned to a leaf by calculating the ratio: number of observations of one class divided by the number of observations in that leaf, e.g., number of customers that are younger than 50 years old divided by the total number of customers. 7 | ## How are leaf values calculated for regression decision trees? 8 | `average value of the predicted variable` 9 | 10 | ## Q3. These are two main advantages of decision trees: 11 | `They are very visual and easy to interpret` 12 | 13 | ## Q4. How can you determine the split for each node of a decision tree? 14 | `Find the split that minimizes the Gini impurity.` 15 | 16 | ## Q5. Which of the following describes a way to regularize a decision tree to address overfitting? 17 | `Decrease the max depth.` 18 | 19 | ## Q6. What is a disadvantage of decision trees? 20 | `They tend to overfit.` 21 | 22 | ## Q7. 
What method can you use to minimize overfitting of a machine learning model? 23 | `Tune the hyperparameters of your model using cross-validation.` 24 | 25 | ## Q8. Concerning Classification algorithms, what is a characteristic of K-Nearest Neighbors? 26 | 27 | `Training data is the model` 28 | 29 | ## Q9. Concerning Classification algorithms, what are the characteristics of Logistic Regression? 30 | `The model is just parameters, fitting can be slow, prediction is fast, and the decision boundary is simple and less flexible` 31 | 32 | ## Q10. When evaluating all possible splits of a decision tree, what can be used to find the best split regardless of what happened in prior or future steps? 33 | `Greedy Search` 34 | -------------------------------------------------------------------------------- /01_Exploratory Data Analysis for Machine Learning/week_4.md: -------------------------------------------------------------------------------- 1 | 2 | ## Q1. Which one of the following is common to both machine learning and statistical inference? ⬇️⬇️ 3 | Using sample data to infer qualities of the underlying population distribution. 4 | 5 | ## Q2. Which one of the following describes an approach to customer churn prediction stated in terms of probability? 6 | Predicting a score for individuals that estimates the probability the customer will leave. 7 | 8 | ## Q3. What is customer lifetime value? 9 | The total purchases over the time during which the person is a customer. 10 | 11 | ## Q4. Which one of the following statements about the normalized histogram of a variable is true? 12 | It provides an estimate of the variable’s probability distribution. 13 | 14 | ## Q5. The outcome of rolling a fair die can be modelled as a _______ distribution. 15 | uniform 16 | 17 | ## Q6. Which one of the following features best distinguishes the Bayesian approach to statistics from the Frequentist approach? 18 | Bayesian statistics incorporate the probability of the hypothesis being true. 
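The answers to Q4 and Q5 above can be checked numerically. The following is a minimal sketch, not part of the course material: it simulates rolls of a fair die and shows that the normalized histogram approximates the uniform distribution (each face near 1/6). The helper name `normalized_histogram`, the seed, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

def normalized_histogram(rolls, faces=6):
    """Return the fraction of rolls that landed on each face (1..faces)."""
    # bincount counts occurrences of 0..faces; face 0 never occurs, so drop it.
    counts = np.bincount(rolls, minlength=faces + 1)[1:]
    return counts / counts.sum()

rng = np.random.default_rng(seed=0)       # fixed seed, an arbitrary choice
rolls = rng.integers(1, 7, size=60_000)   # fair die: integers 1 through 6
estimate = normalized_histogram(rolls)

print(estimate)        # every entry should be close to 1/6
print(estimate.sum())  # a normalized histogram sums to 1
```

With more rolls the estimate tightens around 1/6, which is the sense in which a normalized histogram estimates the variable's probability distribution.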
19 | 20 | ## Q7. Which of the following best describes what a hypothesis is 21 | A hypothesis is a statement about a population. 22 | 23 | ## Q8. A Type 2 error in hypothesis testing is _____________________: 24 | incorrectly accepting the null hypothesis. 25 | 26 | ## Q9. Which statement best describes a consequence of a type II error in the context of a churn prediction example? Assume that the null hypothesis is that customer churn is due to chance, and that the alternative hypothesis is that customers enrolled for greater than two years will not churn over the next year. 27 | You incorrectly conclude that customer churn is by chance 28 | 29 | ## Q10. Which of the following is a statistic used for hypothesis testing? 30 | The likelihood ratio. 31 | 32 | -------------------------------------------------------------------------------- /06_Machine Learning Capstone/week_4.md: -------------------------------------------------------------------------------- 1 | # WEEK 4 QUIZ 2 | 3 | 4 | ## Q1. Which of the following methods can be used to convert a dense matrix saved as a long/vertical format to a sparse matrix? 5 | `pivot()` 6 | The pivot() method can be used to convert a dense matrix to a sparse matrix. 7 | 8 | ## Q2. Which of the following methods from the KNNBasic class can be used to train a KNN-based collaborative filtering model with a training set? 9 | 10 | `fit()` 11 | 12 | 13 | ## Q3. Which of the following is a Python scikit library used for recommender systems? 14 | 15 | `Surprise` 16 | 17 | ## Q4. Say you are given a sparse user-item interaction matrix, A, with dimensions 10000 x 500 and you defined the latent feature vector dimension to be 37. If non-negative matrix factorization is applied to A to decompose it into a user matrix, U, and an item matrix, I, what are the dimensions of U and I? 18 | 19 | `U (10000 x 37) and I (37 x 500)` 20 | 21 | ## Q5. 
If the pre-defined RecommenderNet is provided a user one-hot vector and an item one-hot vector as inputs, what is the expected output? 22 | `A rating estimation` 23 | 24 | ## Q6. In the Neural Networks lab, what is meant by embedding? 25 | 26 | `Embedding the one-hot encoding vector into a latent feature space` 27 | 28 | ## Q7. In the Regression lab, what is the data that is input into the regression model? 29 | 30 | `An interaction feature vector` 31 | 32 | ## Q8. Which of the following method(s) can be used to aggregate two feature vectors? 33 | 34 | - [ ] Element-wise addition 35 | - [ ] Element-wise multiplication 36 | - [ ] Element-wise max/min 37 | - [x] All of the above 38 | 39 | ## Q9. In the Classification lab, which values are used as input to LabelEncoder()? 40 | `Rating mode` 41 | 42 | ## Q10. What does the fit_transform() method in the LabelEncoder class return? 43 | `Encoded labels` 44 | -------------------------------------------------------------------------------- /01_Exploratory Data Analysis for Machine Learning/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | ## Q1. What is a CSV file? 3 | CSV files are rows of data or values separated by commas. 4 | 5 | ## Q2. What are residuals? 6 | Residuals are the difference between the actual values and the values predicted by a given model. 7 | 8 | ## Q3. If removal of rows or columns of data is not an option, why must we ensure that information is assigned for missing data? 9 | Most models will not accept blank values in our data. 10 | Assigning information for missing data improves the accuracy of the dataset. 11 | Information must be assigned to prevent outliers. 12 | Missing data may bias the dataset. (Incorrect answer) 13 | 14 | ## Q4. What are the two main data problems companies face when getting started with artificial intelligence/machine learning? 15 | Lack of relevant data and bad data 16 | 17 | ## Q5. 
What does SQL stand for and what does it represent? 18 | SQL stands for Structured Query Language, and it represents a set of relational databases with fixed schemas. 19 | 20 | ## Q6. What does NoSQL stand for and what does it represent? 21 | NoSQL stands for Not-only SQL, and it represents a set of databases that are not relational, therefore, they vary in structure. 22 | 23 | ## Q7. What is a JSON file? 24 | 25 | JSON stands for JavaScript Object Notation, and it is a standard way to store the data across platforms. 26 | 27 | ## Q8. What is meant by the Messy Data? 28 | 29 | Duplicated or unnecessary data. 30 | Inconsistent text and typos. 31 | Missing data. 32 | ➡️ Answer `All of the above.` 33 | 34 | ## Q9. What is an outlier? 35 | 36 | Outlier is an observation in dataset that is distant from most other observations. 37 | 38 | 39 | ## Q10. How do we identify outliers in our dataset? 40 | We can identify outliers both visually and with statistical calculations. 41 | -------------------------------------------------------------------------------- /01_Exploratory Data Analysis for Machine Learning/week_1.md: -------------------------------------------------------------------------------- 1 | # WEEK 1 QUIZ 2 | ## Q1. What is the goal of supervised learning? 3 | Predict the labels. 4 | 5 | ## Q2. What is deep learning? 6 | 7 | Deep learning is machine learning that involves deep neural networks. 8 | 9 | ## Q3. When is a standard machine learning algorithm usually a better choice than using deep learning to get the job done? 10 | When working with small data sets. 11 | 12 | ## Q4. What is a Turing test? 13 | 14 | It tests a machine's ability to exhibit intelligent behavior. 15 | 16 | ## Q5. What are some of the different milestones in deep learning history? 
17 | Geoffrey Hinton’s work, AlexNet, and TensorFlow 18 | 19 | Correct 20 | In 2006, the previous limitations of deep learning, namely exploding and vanishing gradients were overcome with algorithmic advancements such as Geoffrey Hinton's work on unsupervised pre-training. Neural networks are rebranded as deep learning, as we are able to train much deeper networks, networks with more layers; In 2012, a deep learning model using convolutional neural nets called AlexNet achieved a top five error of 15.3 percent; In 2015, one of the most popular libraries, TensorFlow, was built for deep learning, making it more powerful and accessible. 21 | 22 | ## Q6. What is artificial intelligence? 23 | Any program that can sense, reason, act, and adapt. 24 | 25 | 26 | ## Q7. What are two spaces within AI that are going through drastic growth and innovation? 27 | Computer vision and natural language processing. 28 | 29 | ## Q8. Why did AI flourish so much in the last years? 30 | Faster and inexpensive computers and data storage 31 | 32 | ## Q9. How does Alexa use artificial intelligence? 33 | 34 | Recognizes our voice and answers questions. 35 | 36 | ## Q10. What are the first two steps of a typical machine learning workflow? 37 | Problem statement and data collection. 38 | -------------------------------------------------------------------------------- /06_Machine Learning Capstone/week_2.md: -------------------------------------------------------------------------------- 1 | # WEEK 2 QUIZ 2 | 3 | ## Q1. What is the main benefit of visualizing the course titles in a word cloud? 4 | 5 | `The word cloud provides a quick visualization of the popular learning topics across all the courses.` 6 | 7 | ## Q2. In the Exploratory Data Analysis lab, how can we find the course enrollment counts for each user using Pandas dataframe? 8 | 9 | `Use the groupby() on the user column and use the size() method to count the courses for each user.` 10 | 11 | ## Q3. 
In the Exploratory Data Analysis lab, why do we need to plot a histogram that shows how many courses users are enrolled in (i.e., user enrollment)? 12 | 13 | `To illustrate the distribution of course enrollment` 14 | 15 | 16 | ## Q4. In the Exploratory Data Analysis lab, which percentage range do the 20 highest rated courses fall into when compared to the total number of ratings? 17 | 18 | `50%-74%` 19 | 20 | ## Q5. Which of the following best describes a “Bag of Words” (BoW) feature? 21 | 22 | `An array containing the frequency that words appear in a course’s title and description` 23 | 24 | 25 | ## Q6. In the Extract BoW Features lab, what does the stopwords.words() method do? 26 | 27 | `Retrieves a list of commonly used but unimportant words` 28 | 29 | ## Q7. In the Extract BoW Features lab, what does the method tokens_dict.doc2bow() do? 30 | 31 | `Generates a Bag of Words feature from a tokenized list` 32 | 33 | ## Q8. Which of the following could NOT be a cosine similarity measurement? 34 | 35 | `-0.25` 36 | Note: Cosine similarity between Bag of Words vectors cannot be negative, because word counts are non-negative. In general, cosine similarity ranges from -1 to 1. 37 | 38 | ## Q9. Which format of the Bag of Words feature can be used directly to compute the cosine similarity? 39 | 40 | `Horizontal/sparse` 41 | 42 | 43 | ## Q10. When comparing two courses’ Bag of Words features, you find the cosine similarity to be 0.72. Which of the following is a true statement about this measurement? 44 | 45 | `The two courses can be considered relatively similar to each other.` 46 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_5.md: -------------------------------------------------------------------------------- 1 | # WEEK 5 QUIZ 2 | 3 | ## Q1. When working with regularization, what is the view that illuminates the actual optimization problem and shows why LASSO generally zeros out coefficients? 4 | - Answer: Geometric view 5 | 6 | ## Q2. 
When working with regularization, what is the view that recalibrates our understanding of LASSO and Ridge, as a base problem, where coefficients have particular prior distributions? 7 | - Answer: Probabilistic view 8 | 9 | ## Q3. When working with regularization, what is the logical view of how to achieve the goal of reducing complexity? 10 | - Answer: Analytical view 11 | 12 | ## Q4. All of the following statements about Regularization are TRUE except: 13 | - Answer: Features should rarely or never be scaled prior to implementing regularization. 14 | 15 | ## Q5. When working with regularization and using the geometric formulation, what is found at the intersection of the penalty boundary and a contour of the traditional OLS cost function surface? 16 | - Answer: The cost function minimum 17 | 18 | ## Q6. Which statement under the Probabilistic View is correct? 19 | - Answer: Regularization imposes certain priors on the regression coefficients. 20 | 21 | ## Q7. Increasing L2/L1 penalties force coefficients to be smaller, restricting their plausible range. This statement is part of what View? 22 | - Answer: Analytic View 23 | 24 | ## Q8. What does a higher lambda term mean in Regularization technique? 25 | - Answer: Higher lambda decreases variance, means smaller coefficients. 26 | 27 | 28 | ## Q9. What concept/s under Probabilistic View is/are True? 29 | - Answer: All of the above (We can derive the posterior probability by knowing the probability of the target and the prior distribution, The prior distribution is derived from independent draws of a prior coefficient density function that we choose when regularizing, L2 (ridge) regularization imposes a Gaussian prior on the coefficients, while L1 (lasso) regularization imposes a Laplacian prior.) 30 | 31 | ## Q10. What statement is True? 32 | - Answer: We reduce the complexity of the model by minimizing the error on our training set. 
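The claim in Q7 and Q8 above, that a larger penalty forces smaller coefficients, can be illustrated with a short experiment. This is a minimal sketch under stated assumptions, not the course's lab code: it uses the closed-form ridge solution without an intercept, on synthetic data with made-up coefficients (`true_w`), and the helper name `ridge_coefficients` is hypothetical.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(seed=0)              # fixed seed, an arbitrary choice
X = rng.normal(size=(200, 5))                    # synthetic, roughly unit-scale features
true_w = np.array([3.0, -2.0, 0.0, 0.5, 1.0])    # made-up coefficients for illustration
y = X @ true_w + rng.normal(scale=0.5, size=200)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in lambdas]

# The L2 norm of the coefficient vector shrinks as lambda grows.
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>7}: ||w|| = {norm:.3f}")
```

The features here share a similar scale by construction. On real data, features are usually standardized first, which matches the Q4 answer that saying features should rarely be scaled before regularization is the false statement.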
33 | -------------------------------------------------------------------------------- /06_Machine Learning Capstone/week_3.md: -------------------------------------------------------------------------------- 1 | # WEEK 3 QUIZ 2 | 3 | ## Q1. For the user profile matrix you generated in the User Profile-based Recommender System Lab, what does each row represent? 4 | 5 | `User profile vector` 6 | 7 | ## Q2. For the User Profile-based Recommender System Lab, which of the following best describes a user profile vector? 8 | `A list of recommendation scores ` 9 | 10 | ## Q3. For the User Profile-based Recommender System Lab, how is each value calculated in the user profile matrix you created? 11 | 12 | `The dot product of a user’s course ratings vector by a course’s associated genres vector` 13 | 14 | 15 | ## Q4. For the User Profile-based Recommender System Lab, what does each cell value represent in the recommender score matrix you created? 16 | 17 | `A course recommendation score for a given user` 18 | 19 | ## Q5. For the Content-based Recommender System Lab, why are the values along the diagonal equal to 1 in the course similarity matrix? 20 | 21 | `Because the similarity measurement of a course compared to itself is equal to 1` 22 | 23 | ## Q6. What information does the course similarity matrix discussed in Content-based Recommender System Lab convey? 24 | 25 | `Displays the bag of words similarity measurements between all courses to all other courses` 26 | 27 | 28 | ## Q7. What do the indices in the course similarity matrix from Content-based Recommender System Lab represent? 29 | 30 | `Courses` 31 | 32 | ## Q8. In the following code, sim_df represents a course similarity matrix. 33 | 34 | sim_matrix = sim_df.to_numpy() 35 | 36 | sim = sim_matrix[200][158] 37 | 38 | sim 39 | 40 | What does the output of this code represent? 41 | 42 | `The similarity measurement between the courses with indices 200 and 158` 43 | 44 | 45 | ## Q9. 
In the Clustering-based Course Recommender System Lab, which of the following ranges contains the point that indicates the optimized number of clusters in order to apply the K-means algorithm to generate the cluster label for all users? 46 | 47 | `11-20` 48 | 49 | ## Q10. In the Clustering-based Course Recommender System Lab, which of the following pairs of course genres are the most highly correlated according to the covariance matrix? 50 | 51 | `Python and DataAnalysis` 52 | -------------------------------------------------------------------------------- /02_Supervised Machine Learning: Regression/week_3.md: -------------------------------------------------------------------------------- 1 | # WEEK 3 QUIZ 2 | 3 | 4 | ## Q1. In K-fold cross-validation, how will increasing k affect the variance (across subsamples) of estimated model parameters? 5 | - Increasing k will usually increase the variance of estimated model parameters. 6 | 7 | ## Q2. Which statement about K-fold cross-validation below is TRUE? 8 | - Each of the k subsamples in K-fold cross-validation is used as a test set 9 | 10 | ## Q3. If a low-complexity model is underfitting during estimation, which of the following is MOST LIKELY true (holding the model constant) about K-fold cross-validation? 11 | - K-fold cross-validation will still lead to underfitting, for any k 12 | 13 | ## Q4. Which of the following statements about a high-complexity model in a linear regression setting is TRUE? 14 | - A high variance of parameter estimates across cross-validation subsamples indicates likely overfitting. 15 | 16 | ## Q5. Reviewing the below graph, what is the model considered when associated with the left side of this curve before hitting the plateau? 17 | - Underfitting 18 | 19 | ## Q6. Reviewing the below graph, what is the model considered when associated with the right side of the cross-validation error? 20 | - Overfitting 21 | 22 | ## Q7. 
Which of the following functions perform K-fold cross-validation for us, appropriately fitting and transforming at every step of the way? 23 | - 'cross_val_predict' 24 | 25 | ## Q8. Which of the following statements about cross-validation is/are True? 26 | - Cross-validation is an essential step in hyperparameter tuning. 27 | - We can manually generate folds by using the KFold function. 28 | - ANSWER - `ALL THE ABOVE` 29 | 30 | ## Q9. Which of the following statements about GridSearchCV is/are True? 31 | - GridSearchCV scans over a dictionary of parameters. 32 | - GridSearchCV finds the hyperparameter set that has the best out-of-sample score. 33 | - GridSearchCV retrains on all data with the "best" hyperparameters. 34 | - ANSWER - `ALL THE ABOVE` 35 | 36 | ## Q10. Which of the below functions randomly selects data to be in the train/test folds? 37 | - `KFold` and `StratifiedKFold` 38 | 39 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |