├── Week 1
│   ├── Qn5.JPG
│   └── Week 1 Quiz: Disease detection with computer vision.md
├── README.md
├── Week 3
│   └── Week 3 Quiz: Segmentation on medical images.md
└── Week 2
    └── Week 2 Quiz: Evaluating machine learning models.md

/Week 1/Qn5.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashishpatel26/AI-for-Medical-Diagnosis/master/Week 1/Qn5.JPG
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# AI for Medical Diagnosis on Coursera

Master Deep Learning, and Break into AI

Instructor: Andrew Ng

# Introduction

This repo contains all my work for this specialization. Unless otherwise specified, all of the code, quiz questions, screenshots, and images are taken from AI for Medical Diagnosis on Coursera. I am a biomedical undergraduate and a long-time self-taught learner. Many forums already give detailed descriptions of the various programs, and I understand the hard work that goes into learning new concepts and debugging your code. I have released the assignment solutions here **for reference purposes only**. They can save you some time or help you when you get stuck, but please use them only as an aid to solving the programming assignments and quizzes. This is one of the easiest deep learning courses I have taken: it explains most of the concepts necessary for AI applications in medicine, and it is a treasure from the deeplearning.ai team.

# Programming Assignments and Quiz

- [Week 1](https://github.com/mk-gurucharan/AI-for-Medical-Diagnosis/tree/master/Week%201)
- [Week 2](https://github.com/mk-gurucharan/AI-for-Medical-Diagnosis/tree/master/Week%202)
- [Week 3](https://github.com/mk-gurucharan/AI-for-Medical-Diagnosis/tree/master/Week%203)
--------------------------------------------------------------------------------

/Week 3/Week 3 Quiz: Segmentation on medical images.md:
--------------------------------------------------------------------------------
# Week 3 Quiz: Segmentation on medical images

### Note that I only list the correct options.

1. Which of the following is a segmentation task?

- **Determining which areas of the brain have tumor from an MRI**

Correct! Classification tasks have binary or categorical labels for each image, while segmentation tasks ask you to determine a label for every pixel (or voxel).

2. What is the MAIN disadvantage of processing each MRI slice independently using a 2D segmentation model (as mentioned in the lecture)?
Hint: watch the lecture video "Segmentation" to help you answer this question.

- **You lose some context between slices**

Correct! The main disadvantage is the loss of information between slices. For example, if a tumor is present in a given slice, we would expect a higher probability of a tumor in the same area of neighboring slices.

3. The U-Net consists of...

- **A contracting path followed by an expanding path**

Correct! The U-Net consists of a contracting path followed by an expanding path. This can be interpreted as "squeezing" the input to create a low-dimensional representation and then producing a segmentation based on those low-dimensional features.
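Question 3's contracting/expanding description maps naturally onto code. Below is a minimal, hypothetical tf.keras sketch with one downsampling step, a bottleneck, and one upsampling step with a skip connection; the layer sizes and the 2D setting are arbitrary choices for illustration, not the architecture used in the course assignment.

```python
# Minimal U-Net-style sketch: contracting path -> bottleneck -> expanding path.
# Illustrative only; the sizes and depth are arbitrary.
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 1), n_classes=3):
    inputs = tf.keras.Input(shape=input_shape)

    # Contracting path: convolve, then downsample.
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D(2)(c1)

    # Bottleneck: a low-dimensional representation of the input.
    b = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)

    # Expanding path: upsample and concatenate the skip connection from c1.
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)

    # Per-pixel class probabilities, i.e. a segmentation map.
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(c2)
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
model.summary()
```

A full U-Net repeats the downsampling and upsampling blocks several times, which is what gives the network its "U" shape.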
4. Which of the following data augmentations is most effective for MRI sequences?

- **Rotation**

Correct! The only transformation that preserves the integrity of the data is rotation. If we shuffled the slices, the relationships between the slices would change and the model would not be able to learn them.

5. What is the soft dice loss for the example below?
L(P,G) = 1 - \frac{2\sum_{i=1}^n p_i g_i}{\sum_{i=1}^n p_i^2 + \sum_{i=1}^n g_i^2}

- **0.089**

Correct! Using the formula:
L(P,G) = 1 - \frac{2\sum_{i=1}^n p_i g_i}{\sum_{i=1}^n p_i^2 + \sum_{i=1}^n g_i^2}
Computing the numerator, we get 2 * (3.7) = 7.4, and the denominator is 3.13 + 5.0 = 8.13. Therefore the answer is 1 - (7.4 / 8.13) ≈ 0.089. (A small numpy check of this formula appears at the end of this quiz.)

6. Look at the output of model 1 and model 2: which one will have a lower soft dice loss?
Hint: Notice the prediction scores of P1 and P2 on the pixels where the ground truth is 1. This may help you focus on certain parts of the soft dice loss formula.

- **Model 1 has a lower loss**

Correct! Note that the numerator will not change between the models, since the scores for models 1 and 2 are the same on the pixels whose ground truth is 1.
However, the denominator for model 1 will be smaller, since it has smaller scores on the corner pixels (0.3 for model 1 instead of 0.5 for model 2).
With a smaller denominator, the ratio for model 1 will be larger, and subtracting the larger ratio from 1 leads to a smaller loss for model 1.

7. What is the minimum value of the soft dice loss?
L(P,G) = 1 - \frac{2\sum_{i=1}^n p_i g_i}{\sum_{i=1}^n p_i^2 + \sum_{i=1}^n g_i^2}

- **0**

Correct! The minimum value is 0. To see this, set p_i = g_i: the numerator then equals the denominator, and 1 minus that ratio is 0.
To see that the loss is never below 0, note that 2 p_i g_i \leq p_i^2 + g_i^2 for every pixel (since (p_i - g_i)^2 \geq 0). Summing over pixels, the numerator is at most the denominator, so the ratio is at most 1 and the loss is at least 0.

8. An X-ray classification model is developed on data from US hospitals and is later tested on an external dataset from Latin America. Which of the following do you expect?

- **Performance drops on the new dataset**

Correct! We would expect performance to drop on the new external dataset, since the underlying patient population is different from the population the model was trained on. Additionally, there might be idiosyncrasies of the X-ray scanners used for the new dataset that bias the model. We would not typically expect performance to remain constant or improve, just as we don't expect model performance on the test set to match performance on the validation set after hyper-parameter tuning.

9. Which of the following is an example of a prospective study?

- **A model is deployed for 1 year in an emergency room and its performance over that time is evaluated**

Correct! A prospective study is the application of a model to data that is not historical.
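The soft dice loss used in questions 5 and 7 is easy to check numerically. Here is a minimal numpy sketch; the 2x2 prediction and ground-truth grids are made-up placeholders, not the grids from the quiz image.

```python
import numpy as np

def soft_dice_loss(p, g):
    """L(P, G) = 1 - 2 * sum(p * g) / (sum(p^2) + sum(g^2))."""
    p = np.asarray(p, dtype=float).ravel()
    g = np.asarray(g, dtype=float).ravel()
    return 1.0 - 2.0 * np.sum(p * g) / (np.sum(p ** 2) + np.sum(g ** 2))

# Hypothetical 2x2 example (NOT the grid from the quiz image).
p = [[0.9, 0.1],
     [0.8, 0.2]]
g = [[1, 0],
     [1, 0]]
print(soft_dice_loss(p, g))  # ~0.029: predictions match the mask well, so the loss is small

# Sanity checks from question 7: the loss is 0 when p equals g,
# and it never goes below 0 because 2*p*g <= p^2 + g^2 elementwise.
print(soft_dice_loss(g, g))  # 0.0
```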
--------------------------------------------------------------------------------

/Week 1/Week 1 Quiz: Disease detection with computer vision.md:
--------------------------------------------------------------------------------
# **Week 1 Quiz: Disease detection with computer vision**

### **Note that I only list the correct options.**

1. Which of the following is not one of the key challenges for AI diagnostic algorithms that is discussed in the lecture?

- **Inflexible models**

Correct!
This was not discussed as one of the key challenges; more complex models can be used to fit the data and avoid underfitting.

2. You find that your training set has 70% negative examples and 30% positive. Which of the following techniques will NOT help for training this imbalanced dataset?

- **Oversampling negative examples**

Correct!
Given that the model is already being trained on more negative examples, sampling even more negative examples will bias the model even further towards making a negative prediction.

3. What is the total loss from the normal (non-mass) examples in this example dataset?

Please use the natural logarithm in your calculation. When you use numpy.log, this is using the natural logarithm. Also, to get the total loss, please add up the losses from each 'normal' example.

| Example | P(positive) |
| --- | --- |
| P1 - Normal | 0.6 |
| P3 - Normal | 0.3 |
| P5 - Mass | 0.4 |

- **1.27**

Correct!
Since these are negative examples, the loss for each is -log(1 - P(positive)).
For P1, -log(1 - 0.6) = 0.91.
For P3, -log(1 - 0.3) = 0.36.
The sum is 0.91 + 0.36 = 1.27.

4. What is the typical size of a medical image dataset?

- **~10 thousand to 100 thousand images**

Correct!
Most often datasets will range from 10,000 to 100,000 labeled images. Fewer than 1,000 is typically too few to train, validate, and test a classifier, and very few datasets have millions of images due to the cost of labeling.

5. Which of the following data augmentations would be best to apply?

- [**Click here for the image**](https://github.com/mk-gurucharan/AI-for-Medical-Diagnosis/blob/master/Week%201/Qn5.JPG)

Correct!
This rotation is most likely to help. It is a realistic transformation, and it does not risk changing the label.

6. Which of the following are valid methods for determining ground truth? Choose all that apply.

- **Consensus voting from a board of doctors**

Correct!
Consensus is considered less reliable than biopsy verification. However, the limited availability of biopsy data means that consensus voting may still be the best (or only viable) option.

- **Biopsy**

Correct!
Biopsy is definitely a valid method. Keep in mind that there are likely fewer data examples where patients have both the chest x-ray and an additional diagnostic test for the same disease.

- **Confirmation by CT scan**

Correct!
A CT scan can provide an objective ground truth. Keep in mind that there are likely fewer data examples where patients have both the chest x-ray and an additional diagnostic test for the same disease.

7. In what order should the training, validation, and test sets be sampled?

- **Test, Validation, Training**

Correct!
First the test set should be sampled, then the validation set, then the training set. This is so that you can make sure you adequately sample the test set, and then sample the validation set to match the distribution of labels in the test set.

8. Why is it bad to have patients in both training and test sets?

- **Overly optimistic test performance**

Correct!
Having images from the same patient in both sets is bad because it has been shown that the model may learn patient-specific features that do not generalize to other patients.

9. Let's say you have a relatively small training set (~5 thousand images). Which training strategy makes the most sense?

- **Retraining the last layer of a pre-trained model**

Correct!
By using a pre-trained model, you can make use of its ability to recognize lower-level features, and then fine-tune the last few layers on your dataset.

10. Now let's say you have a very large dataset (~1 million images). Which training strategies will make the most sense?

- **Training a model with randomly initialized weights**

Correct!
Given a very large dataset, you have the option of training a new model instead of using a pre-trained model.

- **Retraining all layers of a pre-trained model**

Correct!
Given the large dataset, you also have the option of retraining all layers of a pre-trained model. Starting from a pre-trained model may be faster than training from randomly initialized weights.
--------------------------------------------------------------------------------

/Week 2/Week 2 Quiz: Evaluating machine learning models.md:
--------------------------------------------------------------------------------
# **Week 2 Quiz: Evaluating machine learning models**

### Note that I only list the correct options.

1. What is the sensitivity and specificity of a pneumonia model that always outputs positive? In other words, the model says that every patient has the disease.

- **sensitivity = 1.0, specificity = 0.0**

Correct! Sensitivity tells us how good the model is at correctly identifying the patients who actually have the disease and labeling them as having the disease.
Specificity tells us how good the model is at correctly identifying the healthy patients as not having the disease.

2. In some studies, you may have to compute the positive predictive value (PPV) from the sensitivity, specificity and prevalence. Given sensitivity = 0.9, specificity = 0.8, and prevalence = 0.2, what is the PPV? HINT: please check the reading item "Calculating PPV in terms of sensitivity, specificity and prevalence".

- **0.52**

Correct!
PPV = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}
The numerator is sensitivity × prevalence = 0.9 × 0.2 = 0.18.
The denominator is 0.18 + (1 − 0.8) × (1 − 0.2) = 0.18 + 0.16 = 0.34.
Therefore the PPV is 0.18 / 0.34 ≈ 0.52.

3. If sensitivity = 0.9, specificity = 0.8, and prevalence = 0.2, then what is the accuracy? Hint: You can watch the video "Sensitivity, Specificity and Prevalence" to find the equation.

- **0.82**

Correct! The equation for accuracy is:
Accuracy = (Sensitivity × Prevalence) + (Specificity × (1 − Prevalence))
So accuracy = (0.9 × 0.2) + (0.8 × 0.8) = 0.18 + 0.64 = 0.82.
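The PPV and accuracy computations in questions 2 and 3 can be double-checked with a few lines of Python; this is only a restatement of the arithmetic above.

```python
# Check of the PPV and accuracy calculations (questions 2 and 3).
sensitivity, specificity, prevalence = 0.9, 0.8, 0.2

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
accuracy = sensitivity * prevalence + specificity * (1 - prevalence)

print(round(ppv, 3))       # 0.529, i.e. the ~0.52 option
print(round(accuracy, 2))  # 0.82
```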
4. What is the sensitivity and specificity of a model that randomly assigns a score between 0 and 1 to each example (with equal probability) if we use a threshold of 0.7?

- **Sensitivity = 0.3, Specificity = 0.7**

Correct!
Sensitivity = \frac{TP}{TP + FN} and Specificity = \frac{TN}{TN + FP}.
Sensitivity = P(predict positive | positive) = P(score > 0.7 | positive).
The score is independent of the input data (it is drawn uniformly at random between 0 and 1), so
P(score > 0.7 | positive) = P(score > 0.7) = 0.3.
Similarly, Specificity = P(predict negative | negative) = P(score < 0.7 | negative) = P(score < 0.7) = 0.7.

5. What is the PPV and sensitivity associated with the following confusion matrix?
Recall that
PPV = \frac{\text{true positives}}{\text{positive predictions}}
Sensitivity = how many of the actual positives are predicted positive?

|                  | Test Positive | Test Negative |
| ---------------- | ------------- | ------------- |
| Disease Positive | 30            | 20            |
| Disease Negative | 70            | 10            |

- **PPV = 0.3, Sensitivity = 0.6**

Correct!
PPV = P(positive | predicted positive) = \frac{TP}{TP + FP} = \frac{30}{30 + 70} = 0.3
Sensitivity = P(predicted positive | actual positive) = \frac{TP}{TP + FN} = \frac{30}{30 + 20} = 0.6
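The arithmetic in question 5 is easy to script. A minimal check using the counts from the confusion matrix above:

```python
# PPV and sensitivity from the question 5 confusion matrix.
TP, FN = 30, 20   # Disease Positive row: test positive / test negative
FP, TN = 70, 10   # Disease Negative row: test positive / test negative

ppv = TP / (TP + FP)          # fraction of positive predictions that are correct
sensitivity = TP / (TP + FN)  # fraction of actual positives that are caught

print(ppv)          # 0.3
print(sensitivity)  # 0.6
```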
6. You have a model such that the lowest score for a positive example is higher than the maximum score for a negative example. What is its ROC?
HINT 1: watch the video "Varying the threshold".
HINT 2: draw a number line and choose a value for the lowest score predicted for any positive example, and another value for the highest score predicted for any negative example. Draw a few circles for the "positive" examples and a few "x"s for the negative examples. What do you notice about the model's ability to identify positive and negative examples?

- **1.0**

Correct! The model perfectly discriminates between positive and negative examples.
Pretend that the score predictions for all positive examples are 0.5 or higher, and the score predictions for all negative examples are less than 0.5. Then the positive and negative examples are perfectly separated by the score.
For any threshold above 0.5, the specificity is 1.0 (the model correctly identifies all the negative examples) while the sensitivity ranges from 0 to 1, so these operating points trace out one full edge of the unit square.
At the threshold 0.5, the sensitivity (the ability to correctly identify positive examples) is 1.0 and the specificity is also 1.0, so this point sits at the corner of perfect performance.
For any threshold below 0.5, the sensitivity is 1.0 while the specificity ranges from 1 to 0, tracing out the adjacent edge of the unit square.
The ROC curve therefore encloses a box of width 1 and height 1, so the area under it is 1.0.

7. For every specificity s, as we vary the threshold, the sensitivity of model 1 is at least as high as model 2. Which of the following must be true?

- **The ROC of model 1 is at least as high as model 2**

Correct! Because specificity determines the x-axis location, and the sensitivity of model 1 is at least as high as the sensitivity of model 2 at every specificity, the ROC curve for model 1 never goes underneath the curve for model 2. Therefore, if we compute the area under the two curves, the area for model 1 must be at least as high as the area for model 2.

8. You want to measure the proportion of people with high blood pressure in a population. You sample 1000 people and find that 55% have high blood pressure, with a 90% confidence interval of (50%, 60%). What is the correct interpretation of this result?
HINT: Please watch the video "Confidence interval" to help you answer this question.

- **If you repeated this sampling, the true proportion would be in the confidence interval about 90% of the time**

Correct! Confidence intervals are constructed so that if you repeat the experiment many times, about 90% of the resulting intervals will contain the true parameter value.

9. One experiment calculates a confidence interval using 1,000 samples, and another computes it using 10,000 samples. Which interval do you expect to be tighter (assume they use the normal approximation)?

- **10,000 samples**

Correct! When we use the normal approximation, the width of the confidence interval depends on the variance of the normal distribution. Recall that the variance of each individual sample is identical, but the variance of the average is divided by n. Since dividing by a larger number gives a smaller quantity, the variance of the average of 10,000 samples is less than that of 1,000 samples, so the second confidence interval should be tighter.
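A quick way to see the question 9 effect is to compute the normal-approximation half-width z * sqrt(p(1 - p) / n) for the two sample sizes. The p = 0.55 and the 90% level (z = 1.645) below are taken from question 8; the quiz's (50%, 60%) interval is a rounded illustration, so these numbers will not match it exactly.

```python
import math

def ci_halfwidth(p_hat, n, z=1.645):
    """Normal-approximation half-width of a confidence interval for a proportion.

    z = 1.645 corresponds to a two-sided 90% interval.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.55
for n in (1000, 10000):
    print(n, round(ci_halfwidth(p_hat, n), 4))
# 1000  0.0259  -> interval roughly (0.524, 0.576)
# 10000 0.0082  -> about sqrt(10) times tighter, as expected from the 1/n variance
```
--------------------------------------------------------------------------------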