├── .travis.yml ├── Anaconda-Guide └── Installation-Guide.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Decision Trees ├── CART_model.R ├── CART_recitation.R ├── Random_Forest.R ├── Random_forest_text.R ├── first.py └── iris.py ├── Error Estimation ├── Area Under ROC Curve.py ├── Classification_Accuracy.py └── Logarithmic Loss.py ├── K Means Clustering ├── .ipynb_checkpoints │ └── K Means Clustering-checkpoint.ipynb └── K Means Clustering.ipynb ├── K Nearest Neighbors ├── .ipynb_checkpoints │ └── KNN-checkpoint.ipynb ├── Classified Data ├── KMeanClustering.R └── KNN.ipynb ├── LICENSE ├── LinearRegression ├── .ipynb_checkpoints │ └── LinearRegression-checkpoint.ipynb ├── LinearRegression.ipynb ├── Supervised+Learning+-++Linear+Regression.ipynb ├── USA_Housing.csv ├── linearregression.R ├── linearregressionRecitation.R └── machine.pdf ├── Naive_Bayes ├── Supervised+Learning+Naive+Bayes.ipynb └── example_naive_bayes.py ├── Neural_Networks └── Hebb_rule.m ├── README.md ├── RESOURCES.MD ├── Random Forests └── intro_to_random_forest.py └── Support Vector Machines ├── .ipynb_checkpoints └── SVM-checkpoint.ipynb └── SVM.ipynb /.travis.yml: -------------------------------------------------------------------------------- 1 | notifications: 2 | webhooks: 3 | urls: 4 | - https://webhooks.gitter.im/e/1e6ea277770f5229049b 5 | on_success: change # options: [always|never|change] default: always 6 | on_failure: always # options: [always|never|change] default: always 7 | on_start: never # options: [always|never|change] default: always 8 | -------------------------------------------------------------------------------- /Anaconda-Guide/Installation-Guide.md: -------------------------------------------------------------------------------- 1 | # Anaconda Installation Guide 2 | 3 | # Installing On Linux: 4 | 5 | 1. Download [Anaconda installer for Linux](https://docs.anaconda.com/anaconda/install/linux) 6 | 2. 
+ Enter the following to install Anaconda for Python 3.6:
7 | ```bash
8 | bash ~/dirname/Anaconda3-5.0.1-Linux-x86_64.sh
9 | ```
10 | + Or enter the following to install Anaconda for Python 2.7:
11 | ```bash
12 | bash ~/dirname/Anaconda2-5.0.1-Linux-x86_64.sh
13 | ```
14 | * Note: replace `~/dirname/` with the path to the file you downloaded.
15 | 3. The installer prompts “In order to continue the installation process, please review the license agreement.” Press Enter to view the license terms, scroll to the bottom of the license terms, and enter “Yes” to agree.
16 | 4. The installer prompts you to press Enter to accept the default install location, CTRL-C to cancel the installation, or specify an alternate installation directory. If you accept the default install location, the installer displays “```PREFIX=/home//anaconda<2 or 3>```” and continues the installation. It may take a few minutes to complete.
17 | 5. The installer prompts “Do you wish the installer to prepend the Anaconda 2 or 3 install location to PATH in your ```/home//.bashrc ?```” Enter “Yes”.
18 | 6. The installer finishes and displays “Thank you for installing Anaconda 2 or 3!”
19 | 7. Close and reopen your terminal window for the installation to take effect, or enter the command ```source ~/.bashrc```.
20 | 8. After your install is complete, verify it by opening Anaconda Navigator, a program included with Anaconda: open a Terminal window and type ```anaconda-navigator```. If Navigator opens, you have successfully installed Anaconda.
21 | 
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Code of Conduct
2 | 
3 | ## 1. Purpose
4 | 
5 | A primary goal of Lectures On Machine Learning is to be inclusive to the largest number of contributors, with the most varied and diverse backgrounds possible. 
As such, we are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, ability, ethnicity, socioeconomic status, and religion (or lack thereof). 6 | 7 | This code of conduct outlines our expectations for all those who participate in our community, as well as the consequences for unacceptable behavior. 8 | 9 | We invite all those who participate in Lectures On Machine Learning to help us create safe and positive experiences for everyone. 10 | 11 | ## 2. Open Source Citizenship 12 | 13 | A supplemental goal of this Code of Conduct is to increase open source citizenship by encouraging participants to recognize and strengthen the relationships between our actions and their effects on our community. 14 | 15 | Communities mirror the societies in which they exist and positive action is essential to counteract the many forms of inequality and abuses of power that exist in society. 16 | 17 | If you see someone who is making an extra effort to ensure our community is welcoming, friendly, and encourages all participants to contribute to the fullest extent, we want to know. 18 | 19 | ## 3. Expected Behavior 20 | 21 | The following behaviors are expected and requested of all community members: 22 | 23 | * Participate in an authentic and active way. In doing so, you contribute to the health and longevity of this community. 24 | * Exercise consideration and respect in your speech and actions. 25 | * Attempt collaboration before conflict. 26 | * Refrain from demeaning, discriminatory, or harassing behavior and speech. 27 | * Be mindful of your surroundings and of your fellow participants. Alert community leaders if you notice a dangerous situation, someone in distress, or violations of this Code of Conduct, even if they seem inconsequential. 28 | * Remember that community event venues may be shared with members of the public; please be respectful to all patrons of these locations. 29 | 30 | ## 4. 
Unacceptable Behavior
31 | 
32 | The following behaviors are considered harassment and are unacceptable within our community:
33 | 
34 | * Violence, threats of violence or violent language directed against another person.
35 | * Sexist, racist, homophobic, transphobic, ableist or otherwise discriminatory jokes and language.
36 | * Posting or displaying sexually explicit or violent material.
37 | * Posting or threatening to post other people’s personally identifying information ("doxing").
38 | * Personal insults, particularly those related to gender, sexual orientation, race, religion, or disability.
39 | * Inappropriate photography or recording.
40 | * Inappropriate physical contact. You should have someone’s consent before touching them.
41 | * Unwelcome sexual attention. This includes sexualized comments or jokes; inappropriate touching, groping, and unwelcome sexual advances.
42 | * Deliberate intimidation, stalking or following (online or in person).
43 | * Advocating for, or encouraging, any of the above behavior.
44 | * Sustained disruption of community events, including talks and presentations.
45 | 
46 | ## 5. Consequences of Unacceptable Behavior
47 | 
48 | Unacceptable behavior from any community member, including sponsors and those with decision-making authority, will not be tolerated.
49 | 
50 | Anyone asked to stop unacceptable behavior is expected to comply immediately.
51 | 
52 | If a community member engages in unacceptable behavior, the community organizers may take any action they deem appropriate, up to and including a temporary ban or permanent expulsion from the community without warning (and without refund in the case of a paid event).
53 | 
54 | ## 6. Reporting Guidelines
55 | 
56 | If you are subject to or witness unacceptable behavior, or have any other concerns, please notify a community organizer as soon as possible at divyanshu.r46956@gmail.com. 
57 | 
58 | [Reporting Guidelines](https://github.com/Cybros/Lectures-On-Machine-Learning/issues)
59 | 
60 | Additionally, community organizers are available to help community members engage with local law enforcement or to otherwise help those experiencing unacceptable behavior feel safe. In the context of in-person events, organizers will also provide escorts as desired by the person experiencing distress.
61 | 
62 | ## 7. Addressing Grievances
63 | 
64 | If you feel you have been falsely or unfairly accused of violating this Code of Conduct, you should notify Cybros with a concise description of your grievance. Your grievance will be handled in accordance with our existing governing policies.
65 | 
66 | [Policy](https://github.com/Cybros/Lectures-On-Machine-Learning/issues)
67 | 
68 | ## 8. Scope
69 | 
70 | We expect all community participants (contributors, paid or otherwise; sponsors; and other guests) to abide by this Code of Conduct in all community venues–online and in-person–as well as in all one-on-one communications pertaining to community business.
71 | 
72 | This code of conduct and its related procedures also apply to unacceptable behavior occurring outside the scope of community activities when such behavior has the potential to adversely affect the safety and well-being of community members.
73 | 
74 | ## 9. Contact info
75 | 
76 | divyanshu.r46956@gmail.com
77 | 
78 | ## 10. License and attribution
79 | 
80 | This Code of Conduct is distributed under a [Creative Commons Attribution-ShareAlike license](http://creativecommons.org/licenses/by-sa/3.0/).
81 | 
82 | Portions of text derived from the [Django Code of Conduct](https://www.djangoproject.com/conduct/) and the [Geek Feminism Anti-Harassment Policy](http://geekfeminism.wikia.com/wiki/Conference_anti-harassment/Policy). 
83 | 
84 | Retrieved on November 22, 2016 from [http://citizencodeofconduct.org/](http://citizencodeofconduct.org/)
85 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | 
2 | We believe - and we hope you do too - that learning how to code, how to think, and how to contribute to open source can empower the next generation of coders and creators.
3 | 
4 | You can do it! Here’s how.
5 | 
6 | If you have never contributed to an open source project before and you’re just getting started, consider exploring these resources.
7 | 
8 | [A Step by Step Guide to Making Your First GitHub Contribution](https://codeburst.io/a-step-by-step-guide-to-making-your-first-github-contribution-5302260a2940)
9 | 
10 | [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/)
11 | 
12 | 
13 | To get some insight into GitHub terminology, consider exploring these resources.
14 | 
15 | [Collaborating With Issues And Pull Requests](https://help.github.com/categories/collaborating-with-issues-and-pull-requests/)
16 | 
17 | ## 
18 | 
19 | All contributors and maintainers of this project are subject to this code of conduct.
20 | 
21 | As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute to reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
22 | 
23 | 
24 | Some guidelines you must follow in order to contribute to this repository: 
25 | 26 | * Create your **branch**: `git checkout -b my-new-feature` 27 | 28 | * **Commit** your changes: `git commit -m 'Add some feature'` 29 | 30 | * **Push** to the branch: `git push origin my-new-feature` 31 | 32 | * Send a **Pull Request** 33 | 34 | * **Enjoy!** 35 | 36 | ## 37 | 38 | If you are still facing some issues while contributing to the repository feel free to reach us on our **Gitter** channel. 39 | 40 | [![Join the chat](https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-brightgreen.svg)](https://gitter.im/LNMIIT-Computer-Club/Lobby) 41 | 42 | **Note** : Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 43 | 44 | -------------------------------------------------------------------------------- /Decision Trees/CART_model.R: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Read in the data 4 | Claims = read.csv("ClaimsData.csv") 5 | 6 | str(Claims) 7 | 8 | # Percentage of patients in each cost bucket 9 | table(Claims$bucket2009)/nrow(Claims) 10 | 11 | # Split the data 12 | library(caTools) 13 | 14 | set.seed(88) 15 | 16 | spl = sample.split(Claims$bucket2009, SplitRatio = 0.6) 17 | 18 | ClaimsTrain = subset(Claims, spl==TRUE) 19 | 20 | ClaimsTest = subset(Claims, spl==FALSE) 21 | 22 | 23 | # VIDEO 7 24 | 25 | # Baseline method 26 | table(ClaimsTest$bucket2009, ClaimsTest$bucket2008) 27 | (110138 + 10721 + 2774 + 1539 + 104)/nrow(ClaimsTest) 28 | 29 | # Penalty Matrix 30 | PenaltyMatrix = matrix(c(0,1,2,3,4,2,0,1,2,3,4,2,0,1,2,6,4,2,0,1,8,6,4,2,0), byrow=TRUE, nrow=5) 31 | 32 | PenaltyMatrix 33 | 34 | # Penalty Error of Baseline Method 35 | as.matrix(table(ClaimsTest$bucket2009, ClaimsTest$bucket2008))*PenaltyMatrix 36 
| 37 | sum(as.matrix(table(ClaimsTest$bucket2009, ClaimsTest$bucket2008))*PenaltyMatrix)/nrow(ClaimsTest) 38 | 39 | 40 | # VIDEO 8 41 | 42 | # Load necessary libraries 43 | library(rpart) 44 | library(rpart.plot) 45 | 46 | # CART model 47 | ClaimsTree = rpart(bucket2009 ~ age + alzheimers + arthritis + cancer + copd + depression + diabetes + heart.failure + ihd + kidney + osteoporosis + stroke + bucket2008 + reimbursement2008, data=ClaimsTrain, method="class", cp=0.00005) 48 | 49 | prp(ClaimsTree) 50 | 51 | 52 | # Make predictions 53 | PredictTest = predict(ClaimsTree, newdata = ClaimsTest, type = "class") 54 | 55 | table(ClaimsTest$bucket2009, PredictTest) 56 | 57 | (114141 + 16102 + 118 + 201 + 0)/nrow(ClaimsTest) 58 | 59 | # Penalty Error 60 | as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix 61 | 62 | sum(as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix)/nrow(ClaimsTest) 63 | 64 | # New CART model with loss matrix 65 | ClaimsTree = rpart(bucket2009 ~ age + alzheimers + arthritis + cancer + copd + depression + diabetes + heart.failure + ihd + kidney + osteoporosis + stroke + bucket2008 + reimbursement2008, data=ClaimsTrain, method="class", cp=0.00005, parms=list(loss=PenaltyMatrix)) 66 | 67 | # Redo predictions and penalty error 68 | PredictTest = predict(ClaimsTree, newdata = ClaimsTest, type = "class") 69 | 70 | table(ClaimsTest$bucket2009, PredictTest) 71 | 72 | (94310 + 18942 + 4692 + 636 + 2)/nrow(ClaimsTest) 73 | 74 | sum(as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix)/nrow(ClaimsTest) 75 | -------------------------------------------------------------------------------- /Decision Trees/CART_recitation.R: -------------------------------------------------------------------------------- 1 | # Unit 4, Recitation 2 | 3 | 4 | # VIDEO 2 5 | 6 | # Read in data 7 | boston = read.csv("boston.csv") 8 | str(boston) 9 | 10 | # Plot observations 11 | plot(boston$LON, boston$LAT) 12 | 13 | # Tracts alongside the 
Charles River
14 | points(boston$LON[boston$CHAS==1], boston$LAT[boston$CHAS==1], col="blue", pch=19)
15 | 
16 | # Plot MIT
17 | points(boston$LON[boston$TRACT==3531],boston$LAT[boston$TRACT==3531],col="red", pch=20)
18 | 
19 | # Plot pollution
20 | summary(boston$NOX)
21 | points(boston$LON[boston$NOX>=0.55], boston$LAT[boston$NOX>=0.55], col="green", pch=20)
22 | 
23 | # Plot prices
24 | plot(boston$LON, boston$LAT)
25 | summary(boston$MEDV)
26 | points(boston$LON[boston$MEDV>=21.2], boston$LAT[boston$MEDV>=21.2], col="red", pch=20)
27 | 
28 | 
29 | 
30 | # VIDEO 3
31 | 
32 | # Linear Regression using LAT and LON
33 | plot(boston$LAT, boston$MEDV)
34 | plot(boston$LON, boston$MEDV)
35 | latlonlm = lm(MEDV ~ LAT + LON, data=boston)
36 | summary(latlonlm)
37 | 
38 | # Visualize regression output
39 | plot(boston$LON, boston$LAT)
40 | points(boston$LON[boston$MEDV>=21.2], boston$LAT[boston$MEDV>=21.2], col="red", pch=20)
41 | 
42 | latlonlm$fitted.values
43 | points(boston$LON[latlonlm$fitted.values >= 21.2], boston$LAT[latlonlm$fitted.values >= 21.2], col="blue", pch="$")
44 | 
45 | 
46 | 
47 | # Video 4
48 | 
49 | # Load CART packages
50 | library(rpart)
51 | library(rpart.plot)
52 | 
53 | # CART model
54 | latlontree = rpart(MEDV ~ LAT + LON, data=boston)
55 | prp(latlontree)
56 | 
57 | # Visualize output
58 | plot(boston$LON, boston$LAT)
59 | points(boston$LON[boston$MEDV>=21.2], boston$LAT[boston$MEDV>=21.2], col="red", pch=20)
60 | 
61 | fittedvalues = predict(latlontree)
62 | points(boston$LON[fittedvalues>=21.2], boston$LAT[fittedvalues>=21.2], col="blue", pch="$")
63 | 
64 | # Simplify tree by increasing minbucket
65 | latlontree = rpart(MEDV ~ LAT + LON, data=boston, minbucket=50)
66 | plot(latlontree)
67 | text(latlontree)
68 | 
69 | # Visualize Output
70 | plot(boston$LON,boston$LAT)
71 | abline(v=-71.07)
72 | abline(h=42.21)
73 | abline(h=42.17)
74 | points(boston$LON[boston$MEDV>=21.2], boston$LAT[boston$MEDV>=21.2], col="red", pch=20)
75 | 
76 | 
77 | 
78 | # VIDEO 5
79 | 
80 | # Let's use all the variables 81 | 82 | # Split the data 83 | library(caTools) 84 | set.seed(123) 85 | split = sample.split(boston$MEDV, SplitRatio = 0.7) 86 | train = subset(boston, split==TRUE) 87 | test = subset(boston, split==FALSE) 88 | 89 | # Create linear regression 90 | linreg = lm(MEDV ~ LAT + LON + CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO, data=train) 91 | summary(linreg) 92 | 93 | # Make predictions 94 | linreg.pred = predict(linreg, newdata=test) 95 | linreg.sse = sum((linreg.pred - test$MEDV)^2) 96 | linreg.sse 97 | 98 | # Create a CART model 99 | tree = rpart(MEDV ~ LAT + LON + CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO, data=train) 100 | prp(tree) 101 | 102 | # Make predictions 103 | tree.pred = predict(tree, newdata=test) 104 | tree.sse = sum((tree.pred - test$MEDV)^2) 105 | tree.sse 106 | 107 | 108 | 109 | # Video 7 110 | 111 | # Load libraries for cross-validation 112 | library(caret) 113 | library(e1071) 114 | 115 | # Number of folds 116 | tr.control = trainControl(method = "cv", number = 10) 117 | 118 | # cp values 119 | cp.grid = expand.grid( .cp = (0:10)*0.001) 120 | 121 | # What did we just do? 
122 | 1*0.001 123 | 10*0.001 124 | 0:10 125 | 0:10 * 0.001 126 | 127 | # Cross-validation 128 | tr = train(MEDV ~ LAT + LON + CRIM + ZN + INDUS + CHAS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO, data = train, method = "rpart", trControl = tr.control, tuneGrid = cp.grid) 129 | 130 | # Extract tree 131 | best.tree = tr$finalModel 132 | prp(best.tree) 133 | 134 | # Make predictions 135 | best.tree.pred = predict(best.tree, newdata=test) 136 | best.tree.sse = sum((best.tree.pred - test$MEDV)^2) 137 | best.tree.sse 138 | 139 | -------------------------------------------------------------------------------- /Decision Trees/Random_Forest.R: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Read in the data 4 | stevens = read.csv("stevens.csv") 5 | str(stevens) 6 | 7 | # Split the data 8 | library(caTools) 9 | set.seed(3000) 10 | spl = sample.split(stevens$Reverse, SplitRatio = 0.7) 11 | Train = subset(stevens, spl==TRUE) 12 | Test = subset(stevens, spl==FALSE) 13 | 14 | # Install rpart library 15 | install.packages("rpart") 16 | library(rpart) 17 | install.packages("rpart.plot") 18 | library(rpart.plot) 19 | 20 | # CART model 21 | StevensTree = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method="class", minbucket=25) 22 | 23 | prp(StevensTree) 24 | 25 | # Make predictions 26 | PredictCART = predict(StevensTree, newdata = Test, type = "class") 27 | table(Test$Reverse, PredictCART) 28 | (41+71)/(41+36+22+71) 29 | 30 | # ROC curve 31 | library(ROCR) 32 | 33 | PredictROC = predict(StevensTree, newdata = Test) 34 | PredictROC 35 | 36 | pred = prediction(PredictROC[,2], Test$Reverse) 37 | perf = performance(pred, "tpr", "fpr") 38 | plot(perf) 39 | 40 | 41 | 42 | 43 | # Install randomForest package 44 | install.packages("randomForest") 45 | library(randomForest) 46 | 47 | # Build random forest model 48 | StevensForest = randomForest(Reverse ~ Circuit + Issue + Petitioner + 
Respondent + LowerCourt + Unconst, data = Train, ntree=200, nodesize=25 )
49 | 
50 | # Convert outcome to factor
51 | Train$Reverse = as.factor(Train$Reverse)
52 | Test$Reverse = as.factor(Test$Reverse)
53 | 
54 | # Try again
55 | StevensForest = randomForest(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, ntree=200, nodesize=25 )
56 | 
57 | # Make predictions
58 | PredictForest = predict(StevensForest, newdata = Test)
59 | table(Test$Reverse, PredictForest)
60 | (40+74)/(40+37+19+74)
61 | 
62 | 
63 | 
64 | # VIDEO 6
65 | 
66 | # Install cross-validation packages
67 | install.packages("caret")
68 | library(caret)
69 | install.packages("e1071")
70 | library(e1071)
71 | 
72 | # Define cross-validation experiment
73 | numFolds = trainControl( method = "cv", number = 10 )
74 | cpGrid = expand.grid( .cp = seq(0.01,0.5,0.01))
75 | 
76 | # Perform the cross validation
77 | train(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method = "rpart", trControl = numFolds, tuneGrid = cpGrid )
78 | 
79 | # Create a new CART model
80 | StevensTreeCV = rpart(Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst, data = Train, method="class", cp = 0.18)
81 | 
82 | # Make predictions
83 | PredictCV = predict(StevensTreeCV, newdata = Test, type = "class")
84 | table(Test$Reverse, PredictCV)
85 | (59+64)/(59+18+29+64)
86 | 
--------------------------------------------------------------------------------
/Decision Trees/Random_forest_text.R:
--------------------------------------------------------------------------------
1 | # Unit 5 - Twitter
2 | 
3 | 
4 | # VIDEO 5
5 | 
6 | # Read in the data
7 | 
8 | tweets = read.csv("tweets.csv", stringsAsFactors=FALSE)
9 | 
10 | str(tweets)
11 | 
12 | 
13 | # Create dependent variable
14 | 
15 | tweets$Negative = as.factor(tweets$Avg <= -1)
16 | 
17 | table(tweets$Negative)
18 | 
19 | 
20 | # Install new packages
21 | 
22 | install.packages("tm")
23 | library(tm)
24 | install.packages("SnowballC") 25 | library(SnowballC) 26 | 27 | 28 | # Create corpus 29 | 30 | corpus = Corpus(VectorSource(tweets$Tweet)) 31 | 32 | # Look at corpus 33 | corpus 34 | 35 | corpus[[1]] 36 | 37 | 38 | # Convert to lower-case 39 | 40 | corpus = tm_map(corpus, tolower) 41 | 42 | corpus[[1]] 43 | 44 | # IMPORTANT NOTE: If you are using the latest version of the tm package, you will need to run the following line before continuing (it converts corpus to a Plain Text Document). This is a recent change having to do with the tolower function that occurred after this video was recorded. 45 | 46 | corpus = tm_map(corpus, PlainTextDocument) 47 | 48 | 49 | # Remove punctuation 50 | 51 | corpus = tm_map(corpus, removePunctuation) 52 | 53 | corpus[[1]] 54 | 55 | # Look at stop words 56 | stopwords("english")[1:10] 57 | 58 | # Remove stopwords and apple 59 | 60 | corpus = tm_map(corpus, removeWords, c("apple", stopwords("english"))) 61 | 62 | corpus[[1]] 63 | 64 | # Stem document 65 | 66 | corpus = tm_map(corpus, stemDocument) 67 | 68 | corpus[[1]] 69 | 70 | 71 | 72 | 73 | # Video 6 74 | 75 | # Create matrix 76 | 77 | frequencies = DocumentTermMatrix(corpus) 78 | 79 | frequencies 80 | 81 | # Look at matrix 82 | 83 | inspect(frequencies[1000:1005,505:515]) 84 | 85 | # Check for sparsity 86 | 87 | findFreqTerms(frequencies, lowfreq=20) 88 | 89 | # Remove sparse terms 90 | 91 | sparse = removeSparseTerms(frequencies, 0.995) 92 | sparse 93 | 94 | # Convert to a data frame 95 | 96 | tweetsSparse = as.data.frame(as.matrix(sparse)) 97 | 98 | # Make all variable names R-friendly 99 | 100 | colnames(tweetsSparse) = make.names(colnames(tweetsSparse)) 101 | 102 | # Add dependent variable 103 | 104 | tweetsSparse$Negative = tweets$Negative 105 | 106 | # Split the data 107 | 108 | library(caTools) 109 | 110 | set.seed(123) 111 | 112 | split = sample.split(tweetsSparse$Negative, SplitRatio = 0.7) 113 | 114 | trainSparse = subset(tweetsSparse, split==TRUE) 115 | testSparse = 
subset(tweetsSparse, split==FALSE) 116 | 117 | 118 | 119 | # Video 7 120 | 121 | # Build a CART model 122 | 123 | library(rpart) 124 | library(rpart.plot) 125 | 126 | tweetCART = rpart(Negative ~ ., data=trainSparse, method="class") 127 | 128 | prp(tweetCART) 129 | 130 | # Evaluate the performance of the model 131 | predictCART = predict(tweetCART, newdata=testSparse, type="class") 132 | 133 | table(testSparse$Negative, predictCART) 134 | 135 | # Compute accuracy 136 | 137 | (294+18)/(294+6+37+18) 138 | 139 | # Baseline accuracy 140 | 141 | table(testSparse$Negative) 142 | 143 | 300/(300+55) 144 | 145 | 146 | # Random forest model 147 | 148 | library(randomForest) 149 | set.seed(123) 150 | 151 | tweetRF = randomForest(Negative ~ ., data=trainSparse) 152 | 153 | # Make predictions: 154 | predictRF = predict(tweetRF, newdata=testSparse) 155 | 156 | table(testSparse$Negative, predictRF) 157 | 158 | # Accuracy: 159 | (293+21)/(293+7+34+21) 160 | 161 | -------------------------------------------------------------------------------- /Decision Trees/first.py: -------------------------------------------------------------------------------- 1 | from sklearn import tree 2 | 3 | features = [[80,5.6],[56,5.5],[70,5.2],[45,5.2]] 4 | label = [1,0,1,0] 5 | 6 | clf = tree.DecisionTreeClassifier() 7 | clf.fit(features,label) 8 | print(clf.predict([[100,5.9]])) 9 | print(clf.predict([[58,5.6]])) 10 | -------------------------------------------------------------------------------- /Decision Trees/iris.py: -------------------------------------------------------------------------------- 1 | from sklearn.datasets import load_iris 2 | import numpy as np 3 | from sklearn import tree 4 | 5 | iris = load_iris() 6 | #print(iris.feature_names) 7 | #print(iris.target_names) 8 | #print(iris.data[0]) 9 | #print(iris.target[0]) 10 | '''for i in range(len(iris.target)): 11 | print("Features: %s Target: %s"%(iris.data[i],iris.target[i]))''' 12 | test_idx = [0,50,100] 13 | 14 | #training data 15 | 
train_data = np.delete(iris.data,test_idx,axis=0)
16 | train_label = np.delete(iris.target,test_idx)
17 | 
18 | #testing data
19 | test_target = iris.target[test_idx]
20 | test_data = iris.data[test_idx]
21 | clf = tree.DecisionTreeClassifier()
22 | clf.fit(train_data,train_label)
23 | print(clf.predict(test_data))
24 | print(test_target)
25 | 
26 | 
27 | import graphviz
28 | dot_data = tree.export_graphviz(clf, out_file=None)
29 | graph = graphviz.Source(dot_data)
30 | graph.render("iris")
31 | 
32 | dot_data = tree.export_graphviz(clf, out_file=None,
33 | feature_names=iris.feature_names,
34 | class_names=iris.target_names,
35 | filled=True, rounded=True,
36 | special_characters=True)
37 | graph = graphviz.Source(dot_data)
38 | graph
39 | 
--------------------------------------------------------------------------------
/Error Estimation/Area Under ROC Curve.py:
--------------------------------------------------------------------------------
1 | # Cross Validation Classification ROC AUC
2 | import pandas
3 | from sklearn import model_selection
4 | from sklearn.linear_model import LogisticRegression
5 | url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
6 | names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
7 | dataframe = pandas.read_csv(url, names=names)
8 | array = dataframe.values
9 | X = array[:,0:8]
10 | Y = array[:,8]
11 | seed = 7
12 | kfold = model_selection.KFold(n_splits=10, random_state=seed)
13 | model = LogisticRegression()
14 | scoring = 'roc_auc'
15 | results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
16 | print("AUC: %.3f (%.3f)" % (results.mean(), results.std()))
--------------------------------------------------------------------------------
/Error Estimation/Classification_Accuracy.py:
--------------------------------------------------------------------------------
1 | # Cross Validation Classification Accuracy
2 | import 
pandas
3 | from sklearn import model_selection
4 | from sklearn.linear_model import LogisticRegression
5 | url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
6 | names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
7 | dataframe = pandas.read_csv(url, names=names)
8 | array = dataframe.values
9 | X = array[:,0:8]
10 | Y = array[:,8]
11 | seed = 7
12 | kfold = model_selection.KFold(n_splits=10, random_state=seed)
13 | model = LogisticRegression()
14 | scoring = 'accuracy'
15 | results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
16 | print("Accuracy: %.3f (%.3f)" % (results.mean(), results.std()))
--------------------------------------------------------------------------------
/Error Estimation/Logarithmic Loss.py:
--------------------------------------------------------------------------------
1 | # Cross Validation Classification LogLoss
2 | import pandas
3 | from sklearn import model_selection
4 | from sklearn.linear_model import LogisticRegression
5 | url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
6 | names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
7 | dataframe = pandas.read_csv(url, names=names)
8 | array = dataframe.values
9 | X = array[:,0:8]
10 | Y = array[:,8]
11 | seed = 7
12 | kfold = model_selection.KFold(n_splits=10, random_state=seed)
13 | model = LogisticRegression()
14 | scoring = 'neg_log_loss'
15 | results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
16 | print("Logloss: %.3f (%.3f)" % (results.mean(), results.std()))
--------------------------------------------------------------------------------
/K Nearest Neighbors/KMeanClustering.R:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | flower = read.csv("flower.csv", header=FALSE)
4 | str(flower)
5 | 
6 | # 
Change the data type to matrix
7 | flowerMatrix = as.matrix(flower)
8 | str(flowerMatrix)
9 | 
10 | # Turn matrix into a vector
11 | flowerVector = as.vector(flowerMatrix)
12 | str(flowerVector)
13 | 
14 | flowerVector2 = as.vector(flower)
15 | str(flowerVector2)
16 | 
17 | # Compute distances
18 | distance = dist(flowerVector, method = "euclidean")
19 | 
20 | 
21 | 
22 | # Video 3
23 | 
24 | # Hierarchical clustering
25 | clusterIntensity = hclust(distance, method="ward.D") # "ward" was renamed to "ward.D" in newer versions of R
26 | 
27 | # Plot the dendrogram
28 | plot(clusterIntensity)
29 | 
30 | # Select 3 clusters
31 | rect.hclust(clusterIntensity, k = 3, border = "red")
32 | flowerClusters = cutree(clusterIntensity, k = 3)
33 | flowerClusters
34 | 
35 | # Find mean intensity values
36 | tapply(flowerVector, flowerClusters, mean)
37 | 
38 | # Plot the image and the clusters
39 | dim(flowerClusters) = c(50,50)
40 | image(flowerClusters, axes = FALSE)
41 | 
42 | # Original image
43 | image(flowerMatrix,axes=FALSE,col=grey(seq(0,1,length=256)))
44 | 
45 | 
46 | 
47 | # Video 4
48 | 
49 | # Let's try this with an MRI image of the brain
50 | 
51 | healthy = read.csv("healthy.csv", header=FALSE)
52 | healthyMatrix = as.matrix(healthy)
53 | str(healthyMatrix)
54 | 
55 | # Plot image
56 | image(healthyMatrix,axes=FALSE,col=grey(seq(0,1,length=256)))
57 | 
58 | # Hierarchical clustering
59 | healthyVector = as.vector(healthyMatrix)
60 | distance = dist(healthyVector, method = "euclidean")
61 | 
62 | # We have an error - why? The vector is so long that dist() would have to compute n*(n-1)/2 pairwise distances, far too many to fit in memory: 
63 | str(healthyVector)
64 | 
65 | 
66 | 
67 | # Video 5
68 | 
69 | # Specify number of clusters
70 | k = 5
71 | 
72 | # Run k-means
73 | set.seed(1)
74 | KMC = kmeans(healthyVector, centers = k, iter.max = 1000)
75 | str(KMC)
76 | 
77 | # Extract clusters
78 | healthyClusters = KMC$cluster
79 | KMC$centers[2]
80 | 
81 | # Plot the image with the clusters
82 | dim(healthyClusters) = c(nrow(healthyMatrix), ncol(healthyMatrix))
83 | 
84 | image(healthyClusters, axes = FALSE, col=rainbow(k))
85 | 
86 | 
87 | 
88 | # Video 6
89 | 
90 | # Apply to a test image
91 | 
92 | tumor = read.csv("tumor.csv", header=FALSE)
93 | tumorMatrix = as.matrix(tumor)
94 | tumorVector = as.vector(tumorMatrix)
95 | 
96 | # Apply clusters from before to new image, using the flexclust package
97 | install.packages("flexclust")
98 | library(flexclust)
99 | 
100 | KMC.kcca = as.kcca(KMC, healthyVector)
101 | tumorClusters = predict(KMC.kcca, newdata = tumorVector)
102 | 
103 | # Visualize the clusters
104 | dim(tumorClusters) = c(nrow(tumorMatrix), ncol(tumorMatrix))
105 | 
106 | image(tumorClusters, axes = FALSE, col=rainbow(k))
107 | 
108 | 
--------------------------------------------------------------------------------
/K Nearest Neighbors/KNN.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# K Nearest Neighbors\n",
8 | "\n",
9 | "For this we will use a classified dataset. The feature column names are hidden but we are given the data and the target classes.\n",
10 | "\n",
11 | "We will use KNN to create a model that directly predicts a class for a new data point based on the features."
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Import Libraries"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 1,
24 | "metadata": {
25 | "collapsed": true
26 | },
27 | "outputs": [],
28 | "source": [
29 | "import 
pandas as pd\n", 30 | "import seaborn as sns\n", 31 | "import matplotlib.pyplot as plt\n", 32 | "import numpy as np\n", 33 | "%matplotlib inline" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Get the data" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": { 47 | "collapsed": true 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | "df = pd.read_csv('Classified Data',index_col=0)\n", 52 | "#set index_col=0 to use the first column as the index" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "data": { 62 | "text/html": [ 63 | "
<HTML rendering of df.head() omitted: the table markup was garbled in extraction; the same values appear in the text/plain output that follows>\n
" 168 | ], 169 | "text/plain": [ 170 | " WTT PTI EQW SBI LQE QWG FDJ \\\n", 171 | "0 0.913917 1.162073 0.567946 0.755464 0.780862 0.352608 0.759697 \n", 172 | "1 0.635632 1.003722 0.535342 0.825645 0.924109 0.648450 0.675334 \n", 173 | "2 0.721360 1.201493 0.921990 0.855595 1.526629 0.720781 1.626351 \n", 174 | "3 1.234204 1.386726 0.653046 0.825624 1.142504 0.875128 1.409708 \n", 175 | "4 1.279491 0.949750 0.627280 0.668976 1.232537 0.703727 1.115596 \n", 176 | "\n", 177 | " PJF HQE NXJ TARGET CLASS \n", 178 | "0 0.643798 0.879422 1.231409 1 \n", 179 | "1 1.013546 0.621552 1.492702 0 \n", 180 | "2 1.154483 0.957877 1.285597 0 \n", 181 | "3 1.380003 1.522692 1.153093 1 \n", 182 | "4 0.646691 1.463812 1.419167 1 " 183 | ] 184 | }, 185 | "execution_count": 3, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "df.head()" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "## Standardize the variables\n", 199 | "\n", 200 | "Because the KNN classifier predicts the class of a given test observation by identifying the observations that are nearest to it, the scale of the variables matters. Any variables that are on a large scale will have a much larger effect on the distance between the observations, and hence on the KNN classifier, than variables that are on a small scale." 
201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 16, 206 | "metadata": { 207 | "collapsed": true 208 | }, 209 | "outputs": [], 210 | "source": [ 211 | "from sklearn.preprocessing import StandardScaler" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 17, 217 | "metadata": { 218 | "collapsed": true 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "scaler = StandardScaler()" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 18, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "StandardScaler(copy=True, with_mean=True, with_std=True)" 234 | ] 235 | }, 236 | "execution_count": 18, 237 | "metadata": {}, 238 | "output_type": "execute_result" 239 | } 240 | ], 241 | "source": [ 242 | "scaler.fit(df.drop('TARGET CLASS',axis=1))" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 19, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 20, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "data": { 261 | "text/html": [ 262 | "
<HTML rendering of the scaled-feature DataFrame (1000 rows x 10 columns) omitted: the table markup was garbled in extraction; the same values appear in the text/plain output that follows>\n
" 1090 | ], 1091 | "text/plain": [ 1092 | " WTT PTI EQW SBI LQE QWG FDJ \\\n", 1093 | "0 -0.123542 0.185907 -0.913431 0.319629 -1.033637 -2.308375 -0.798951 \n", 1094 | "1 -1.084836 -0.430348 -1.025313 0.625388 -0.444847 -1.152706 -1.129797 \n", 1095 | "2 -0.788702 0.339318 0.301511 0.755873 2.031693 -0.870156 2.599818 \n", 1096 | "3 0.982841 1.060193 -0.621399 0.625299 0.452820 -0.267220 1.750208 \n", 1097 | "4 1.139275 -0.640392 -0.709819 -0.057175 0.822886 -0.936773 0.596782 \n", 1098 | "5 -0.399853 1.591707 0.928649 1.477102 0.308440 0.263270 1.239716 \n", 1099 | "6 -0.017189 0.534949 0.826189 -1.723636 -0.147547 -2.010505 -0.206348 \n", 1100 | "7 -0.461182 -0.100835 0.210071 -1.909291 -0.366695 0.396875 0.718122 \n", 1101 | "8 -0.598054 1.360189 -0.172618 -1.502292 -1.192485 0.504787 -0.325981 \n", 1102 | "9 -0.612806 -2.331876 0.197211 1.127356 1.636853 -0.225233 0.948308 \n", 1103 | "10 1.158303 0.843392 -0.738540 -0.109278 0.019954 -0.883576 -0.725209 \n", 1104 | "11 0.396126 1.169777 0.215062 -0.230956 1.707287 -0.592248 -0.163717 \n", 1105 | "12 -0.921372 0.546474 0.565336 -1.239044 -0.085144 -1.467510 1.457694 \n", 1106 | "13 0.012562 0.796571 -0.930673 0.973308 0.887731 -1.038074 0.211314 \n", 1107 | "14 -0.512670 -0.692383 -0.470705 1.264935 -0.744117 1.951558 0.906128 \n", 1108 | "15 0.386919 -0.270127 -0.804363 -0.944600 -0.419448 0.120341 0.980539 \n", 1109 | "16 -0.808444 1.028545 0.561652 -0.030947 0.808052 1.092841 -1.467019 \n", 1110 | "17 -0.172337 0.434208 1.704690 -1.507906 0.991544 0.612621 0.220380 \n", 1111 | "18 -0.227533 0.628856 0.308635 -0.057883 0.972846 1.713172 -0.922135 \n", 1112 | "19 -0.626517 1.089683 -0.037780 -0.305089 0.966241 -0.240596 0.552006 \n", 1113 | "20 1.047297 2.130779 0.302225 0.840850 1.568143 1.511576 -0.259417 \n", 1114 | "21 -1.164782 0.874384 0.818369 0.510145 -1.272834 0.766933 -0.044192 \n", 1115 | "22 -0.994955 -1.987655 0.171669 -0.860776 -0.822289 0.590328 0.755795 \n", 1116 | "23 1.132875 -0.736699 
-2.275858 1.635880 -0.149610 -0.037040 -0.335809 \n", 1117 | "24 0.553633 0.969652 -0.640984 -1.367843 1.268089 0.566298 0.422873 \n", 1118 | "25 -1.287926 0.796890 0.964231 1.959940 1.972225 0.043477 0.542992 \n", 1119 | "26 -0.323861 -0.991221 0.000366 0.974057 0.639803 1.778300 -0.218305 \n", 1120 | "27 -0.128771 -0.695307 -1.052901 -0.054727 -0.218007 -0.679450 -1.682344 \n", 1121 | "28 -0.939247 -0.285971 -0.612030 -0.882996 0.357854 1.106127 0.033049 \n", 1122 | "29 1.531656 0.308442 -0.233201 1.215748 0.792063 0.728028 -1.288932 \n", 1123 | ".. ... ... ... ... ... ... ... \n", 1124 | "970 -1.092590 -0.427321 1.420622 1.643436 1.105013 -0.118534 0.253062 \n", 1125 | "971 -2.049159 0.266015 -0.553350 1.682972 -2.058521 1.215624 1.149331 \n", 1126 | "972 0.664348 0.310275 1.596776 0.236329 1.558145 0.071995 -0.834003 \n", 1127 | "973 -2.119896 0.890052 -0.027560 0.702566 -0.308356 -0.250975 1.322750 \n", 1128 | "974 -0.500890 -1.467881 -1.720976 -0.068558 -0.055904 -0.998115 1.305390 \n", 1129 | "975 2.238760 -0.726363 -0.960185 0.298757 -0.534868 0.161135 1.481551 \n", 1130 | "976 -0.253088 -1.070175 1.183553 -1.860383 1.674244 -0.457098 -0.910469 \n", 1131 | "977 -0.096739 0.962653 1.458516 -0.140186 1.390756 1.806112 0.474573 \n", 1132 | "978 -1.994428 -0.538382 -0.976454 -0.710393 0.508747 0.625113 1.001561 \n", 1133 | "979 1.579881 -0.446266 -1.440703 -0.364799 0.567161 0.225818 -0.518541 \n", 1134 | "980 -0.212821 -1.389102 2.037408 0.693870 0.703562 -0.622200 -0.881123 \n", 1135 | "981 0.777620 -0.373316 -0.473712 -1.799234 0.342740 2.275867 0.155346 \n", 1136 | "982 -0.050027 -1.235786 -0.105953 0.012564 0.201000 -0.464177 -0.748513 \n", 1137 | "983 1.073876 -1.962424 -0.876976 -0.686251 0.364578 -0.805103 0.934637 \n", 1138 | "984 0.041168 0.032588 -2.067900 0.804337 1.450942 -0.226436 -1.015921 \n", 1139 | "985 0.692676 -1.509723 0.085038 0.299556 -1.613375 -1.751818 -0.577131 \n", 1140 | "986 -2.121108 -0.140188 0.436901 -0.265337 1.076150 1.345130 
-0.453666 \n", 1141 | "987 -0.319126 -1.382602 -0.098888 -0.208905 0.023272 -0.466318 -0.416867 \n", 1142 | "988 -1.165931 0.936232 1.075620 0.227828 1.134342 0.410832 -0.224009 \n", 1143 | "989 -1.620069 0.489926 -0.274300 0.533786 1.233058 -0.195697 0.453712 \n", 1144 | "990 -0.254135 -0.668941 0.777185 3.467701 -0.877809 2.070765 1.344923 \n", 1145 | "991 0.528275 -0.416955 -1.026314 -0.212952 -1.214779 -0.308101 0.457688 \n", 1146 | "992 -0.483798 1.900701 0.538140 -0.140141 0.355732 -0.170696 -0.173750 \n", 1147 | "993 -0.746120 -0.251663 -0.360088 0.738086 2.136036 0.042642 -1.937310 \n", 1148 | "994 0.908385 -1.071156 -1.297545 0.397858 0.241989 -0.582658 -0.889448 \n", 1149 | "995 0.211653 -0.312490 0.065163 -0.259834 0.017567 -1.395721 -0.849486 \n", 1150 | "996 -1.292453 -0.616901 0.369613 0.482648 1.569891 1.273495 0.362784 \n", 1151 | "997 0.641777 -0.513083 -0.179205 1.022255 -0.539703 -0.229680 -2.261339 \n", 1152 | "998 0.467072 -0.982786 -1.465194 -0.071465 2.368666 0.001269 -0.422041 \n", 1153 | "999 -0.387654 -0.595894 -1.431398 0.512722 -0.402552 -2.026512 -0.726253 \n", 1154 | "\n", 1155 | " PJF HQE NXJ \n", 1156 | "0 -1.482368 -0.949719 -0.643314 \n", 1157 | "1 -0.202240 -1.828051 0.636759 \n", 1158 | "2 0.285707 -0.682494 -0.377850 \n", 1159 | "3 1.066491 1.241325 -1.026987 \n", 1160 | "4 -1.472352 1.040772 0.276510 \n", 1161 | "5 0.722608 -2.206816 0.809900 \n", 1162 | "6 -1.096313 -0.158215 -1.233974 \n", 1163 | "7 0.934523 0.228458 0.308929 \n", 1164 | "8 0.834346 -0.136536 -0.670199 \n", 1165 | "9 -1.644881 1.309064 -1.865764 \n", 1166 | "10 -1.636366 0.297779 0.386875 \n", 1167 | "11 0.572255 -2.023123 0.298564 \n", 1168 | "12 0.075387 -1.060100 0.059053 \n", 1169 | "13 -1.883052 1.306107 -1.194205 \n", 1170 | "14 1.971821 -0.134587 0.312793 \n", 1171 | "15 0.607923 0.202234 1.275846 \n", 1172 | "16 1.689324 -1.065861 -0.558538 \n", 1173 | "17 0.526007 -0.152626 0.624265 \n", 1174 | "18 -1.195267 -0.621333 1.426053 \n", 1175 | "19 
2.039341 -1.414778 -1.293438 \n", 1176 | "20 0.723631 -1.434843 -0.831651 \n", 1177 | "21 2.319503 -0.467466 -0.801096 \n", 1178 | "22 0.969469 -1.072573 0.619645 \n", 1179 | "23 -0.163679 0.799924 -0.658005 \n", 1180 | "24 0.209496 -1.239697 0.077090 \n", 1181 | "25 -0.153612 0.323974 1.116183 \n", 1182 | "26 -0.710702 1.914874 0.321497 \n", 1183 | "27 -1.296608 -0.586989 0.877280 \n", 1184 | "28 0.052382 -2.701361 -0.957683 \n", 1185 | "29 -1.947836 0.758620 0.213241 \n", 1186 | ".. ... ... ... \n", 1187 | "970 0.111280 -0.723191 2.285684 \n", 1188 | "971 0.485726 -0.554716 -0.730912 \n", 1189 | "972 0.104553 -1.286569 1.657803 \n", 1190 | "973 1.453298 -0.197014 -0.957779 \n", 1191 | "974 0.452292 0.379526 -0.837198 \n", 1192 | "975 -0.385902 0.402761 0.257162 \n", 1193 | "976 0.181316 -1.036693 -0.802952 \n", 1194 | "977 1.219110 -1.221264 0.082680 \n", 1195 | "978 -1.193159 -0.939098 -1.830024 \n", 1196 | "979 0.188968 1.175769 -1.509489 \n", 1197 | "980 0.306874 -2.137317 0.986439 \n", 1198 | "981 -0.230044 -0.521091 -0.656865 \n", 1199 | "982 -0.259253 1.015712 1.752788 \n", 1200 | "983 -1.355439 -0.372666 0.105382 \n", 1201 | "984 0.942501 -0.633783 -2.185136 \n", 1202 | "985 -2.076531 0.358318 -0.636963 \n", 1203 | "986 0.199567 -0.767308 0.378845 \n", 1204 | "987 2.152512 0.150629 0.271522 \n", 1205 | "988 1.397082 -0.365324 0.890242 \n", 1206 | "989 -0.872851 0.609338 0.905628 \n", 1207 | "990 0.780459 -2.164150 -0.373942 \n", 1208 | "991 0.549690 0.075773 1.541641 \n", 1209 | "992 1.858942 -0.611853 -0.426720 \n", 1210 | "993 -0.726449 1.044144 -1.342160 \n", 1211 | "994 0.313037 1.207480 0.256920 \n", 1212 | "995 -2.604264 -0.139347 -0.069602 \n", 1213 | "996 -1.242110 -0.679746 1.473448 \n", 1214 | "997 -2.362494 -0.814261 0.111597 \n", 1215 | "998 -0.036777 0.406025 -0.855670 \n", 1216 | "999 -0.567789 0.336997 0.010350 \n", 1217 | "\n", 1218 | "[1000 rows x 10 columns]" 1219 | ] 1220 | }, 1221 | "execution_count": 20, 1222 | "metadata": {}, 1223 | 
"output_type": "execute_result" 1224 | } 1225 | ], 1226 | "source": [ 1227 | "df_feat = pd.DataFrame(scaled_features,columns=df.columns[:-1])\n", 1228 | "df_feat" 1229 | ] 1230 | }, 1231 | { 1232 | "cell_type": "markdown", 1233 | "metadata": {}, 1234 | "source": [ 1235 | "## Train Test Split" 1236 | ] 1237 | }, 1238 | { 1239 | "cell_type": "code", 1240 | "execution_count": 21, 1241 | "metadata": { 1242 | "collapsed": true 1243 | }, 1244 | "outputs": [], 1245 | "source": [ 1246 | "from sklearn.model_selection import train_test_split" 1247 | ] 1248 | }, 1249 | { 1250 | "cell_type": "code", 1251 | "execution_count": 22, 1252 | "metadata": { 1253 | "collapsed": true 1254 | }, 1255 | "outputs": [], 1256 | "source": [ 1257 | "X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['TARGET CLASS'],\n", 1258 | " test_size=0.30)" 1259 | ] 1260 | }, 1261 | { 1262 | "cell_type": "markdown", 1263 | "metadata": {}, 1264 | "source": [ 1265 | "## Using KNN\n", 1266 | "We'll start with k=1." 
1267 | ] 1268 | }, 1269 | { 1270 | "cell_type": "code", 1271 | "execution_count": 23, 1272 | "metadata": { 1273 | "collapsed": true 1274 | }, 1275 | "outputs": [], 1276 | "source": [ 1277 | "from sklearn.neighbors import KNeighborsClassifier" 1278 | ] 1279 | }, 1280 | { 1281 | "cell_type": "code", 1282 | "execution_count": 24, 1283 | "metadata": { 1284 | "collapsed": true 1285 | }, 1286 | "outputs": [], 1287 | "source": [ 1288 | "knn = KNeighborsClassifier(n_neighbors=1)" 1289 | ] 1290 | }, 1291 | { 1292 | "cell_type": "code", 1293 | "execution_count": 25, 1294 | "metadata": {}, 1295 | "outputs": [ 1296 | { 1297 | "data": { 1298 | "text/plain": [ 1299 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 1300 | " metric_params=None, n_jobs=1, n_neighbors=1, p=2,\n", 1301 | " weights='uniform')" 1302 | ] 1303 | }, 1304 | "execution_count": 25, 1305 | "metadata": {}, 1306 | "output_type": "execute_result" 1307 | } 1308 | ], 1309 | "source": [ 1310 | "knn.fit(X_train,y_train)" 1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "code", 1315 | "execution_count": 27, 1316 | "metadata": {}, 1317 | "outputs": [], 1318 | "source": [ 1319 | "pred = knn.predict(X_test)" 1320 | ] 1321 | }, 1322 | { 1323 | "cell_type": "markdown", 1324 | "metadata": {}, 1325 | "source": [ 1326 | "## Predictions and Evaluations" 1327 | ] 1328 | }, 1329 | { 1330 | "cell_type": "code", 1331 | "execution_count": 28, 1332 | "metadata": { 1333 | "collapsed": true 1334 | }, 1335 | "outputs": [], 1336 | "source": [ 1337 | "from sklearn.metrics import classification_report,confusion_matrix" 1338 | ] 1339 | }, 1340 | { 1341 | "cell_type": "code", 1342 | "execution_count": 29, 1343 | "metadata": {}, 1344 | "outputs": [ 1345 | { 1346 | "name": "stdout", 1347 | "output_type": "stream", 1348 | "text": [ 1349 | "[[134 21]\n", 1350 | " [ 15 130]]\n" 1351 | ] 1352 | } 1353 | ], 1354 | "source": [ 1355 | "print(confusion_matrix(y_test,pred))" 1356 | ] 1357 | }, 1358 | { 1359 | "cell_type": 
"code", 1360 | "execution_count": 30, 1361 | "metadata": {}, 1362 | "outputs": [ 1363 | { 1364 | "name": "stdout", 1365 | "output_type": "stream", 1366 | "text": [ 1367 | " precision recall f1-score support\n", 1368 | "\n", 1369 | " 0 0.90 0.86 0.88 155\n", 1370 | " 1 0.86 0.90 0.88 145\n", 1371 | "\n", 1372 | "avg / total 0.88 0.88 0.88 300\n", 1373 | "\n" 1374 | ] 1375 | } 1376 | ], 1377 | "source": [ 1378 | "print(classification_report(y_test,pred))" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "markdown", 1383 | "metadata": {}, 1384 | "source": [ 1385 | "## Choosing a K Value\n", 1386 | "\n", 1387 | "Using the elbow method to pick a good K value:" 1388 | ] 1389 | }, 1390 | { 1391 | "cell_type": "code", 1392 | "execution_count": 31, 1393 | "metadata": { 1394 | "collapsed": true 1395 | }, 1396 | "outputs": [], 1397 | "source": [ 1398 | "error_rate = []\n", 1399 | "\n", 1400 | "for i in range(1,30):\n", 1401 | " knn = KNeighborsClassifier(n_neighbors=i)\n", 1402 | " knn.fit(X_train,y_train)\n", 1403 | " pred_i = knn.predict(X_test)\n", 1404 | " error_rate.append(np.mean(pred_i != y_test))" 1405 | ] 1406 | }, 1407 | { 1408 | "cell_type": "code", 1409 | "execution_count": 38, 1410 | "metadata": {}, 1411 | "outputs": [ 1412 | { 1413 | "data": { 1414 | "text/plain": [ 1415 | "" 1416 | ] 1417 | }, 1418 | "execution_count": 38, 1419 | "metadata": {}, 1420 | "output_type": "execute_result" 1421 | }, 1422 | { 1423 | "data": { 1424 | "image/png": 
"<base64-encoded PNG of the 'Error Rate vs. K Value' elbow plot omitted for readability>\n", 1425 | "text/plain": [ 1426 | "" 1427 | ] 1428 | }, 1429 | "metadata": {}, 1430 | "output_type": "display_data" 1431 | } 1432 | ], 1433 | "source": [ 1434 | "plt.figure(figsize=(10,6))\n", 1435 | "plt.plot(range(1,30),error_rate,color='blue', linestyle='-', marker='o',\n", 1436 | " markerfacecolor='red', markersize=10)\n", 1437 | "plt.title('Error Rate vs. 
GmDeBxsSN1trzfYsqDxx/fISbb47XlY0b\n15TucERERCTNkknKnmr+Ix7aYw+XffaJ8tJLQWpqoLT9vctFREQkx3U4fGmt/R3wIrAFqAb+2nyb\ndNOJJ0aor3d46SUNYYqIiOS7ZAr9JwB/BcYC5wJ/M8Zc4Hdg+SBRV7Z4sZIyERGRfJdMNjAOGGqt\nrQcwxpQCfwHUW9ZNw4bFqKiI8dxzQaJRCAbTHZGIiIikSzKzLyOJhAzAWltDi4J/6bpAIN5b9sUX\nAZYu1ZJxIiIi+SyZnrJPjTG/AZ5r/vkEYLV/IeWXE06I8PDD8Q3KDz1Uua6IiEi+SqZ75jJgDXAR\ncCGwqvk28cC3vx2lpMRVXZmIiEieSyYTOMta+3PfI8lTJSVw9NERFi8u4KOPHPbc0013SCIiIpIG\nyfSUnW6M6eV7JHnsxBO1QbmIiEi+SyYLKAFWGmMs26/o/23fosozxx0XxXHiG5RPnKjV/UVERPJR\nMknZ7b5Hkef69XMZPjzGq68G2bgRdtop3RGJiIhIqiWTlJ1mrf2h75HkuRNPjPDGG0U891yIs86K\npDscERERSbFkasqixphjjTHFxphA4o/vkeUZ1ZWJiIjkt2SSq+8TX6OsFmgCIs1/i4eGDImxxx4x\nXnghRH19x8eLiIhIbumwW8Zaq5mXKeA48YVk7723kCVLghx7bDTdIYmIiEgKtdlTZoy5utXPB7f4\n9wN+BpWvvvtdbVAuIiKSr9obvjy51c93tvj3YB9iyXuHHBKlvDy+NIarNWRFRETySntJmdPOz0oZ\nfBAKwXHHRfj88wD/+pfmUoiIiOST9n7zt5d4tU7YxCOJWZgawhQREckvnemOcdv4t3jomGMiFBa6\nWhpDREQkz7T3m/8IY8zqFj/3a/7ZAfr6G1b+6tEDjjoqygsvhPj0U4fddlP+KyIikg/aS8pMd09u\njJkOHE68Z22Stfb1FvcdB9wBRIGnrbW3Ny9Key+wP/F9NidYa9/rbhzZ5oQTIrzwQohnnw1xySVa\nEk5ERCQftJmUWWtXdefExpijgSHW2hHGmKHAbGBEi0N+DZwArAFeNMb8kXgi2Mtae4QxZk/gLuCU\n7sSRjU44IcJ118XrypSUiYiI5Ac/p/iNBJ4EsNauAMqNMT0BjDGDgY3W2k+ttTHg6ebjhwCvNT/m\nI2CQMSboY4wZaeBAl298I8qSJUGqq9MdjYiIiKSCn9XkA4ClLX6uar6tuvnvqhb3bQD2BP4OXGWM\n+RWwF/H10PoC69t6kvLyMKFQ23lbRUVZF8NPr9NPh1tvhaVLyxgzJt3RJCdb2zrbqJ1TR22dGmrn\n1FFbp05X2jqVU/zaW0bDAbDW/tkYcyTwEvAvYEUHj2PTpto276uoKKOqakvnI80A3/pWAChl/vwm\njjkm8zfDzOa2ziZq59RRW6eG2jl11Nap015bt5es+ZmUrSXeI5YwEPi8jft2ab4Na+1NiRuNMR8R\n70XLO/vvH2OXXWI8/3yIpiYoKEh3RCIiIuInP2vKngW+B2CMOQhYa63dAmCtXQn0NMbsbowJES/m\nf9YYc6AxZnbzY04E3myuOcs7iQ3KN292ePXVvCurExERyTu+JWXW2iXAUmPMEuIzLa8wxlxojDmt\n+ZDLgXnE68jmW2vfB94FAsaY14Abgcl+xZcNTjghvrq/FpIVERHJfY6b5TtfV1VtafMFZPv4eUMD\nDB3ag512cnn99RqcDN7cKtvbOluonVNHbZ0aaufUUVunTgc1ZW3+Nteu1xmsqAhGjoywenWA997T\nWyUiIpLL9Js+w2kIU0REJD8oKctwI0dGCAa1QbmIiEiuU1KW4crL4fDDoyxdGmT9+gwuKhMREZFu\nUVKWBU48MT6E+eyz6i0TERHJVUrKssDxx6uuTEREJNcpKcsCe+zhss8+UV56KUhNTbqjERERET8o\nKcsSJ5wQob7e4aW
X1FsmIiKSi5SUZYnE0hiLFyspExERyUX6DZ8lDjooxuA+m+i36A8U77Yadh5A\nw6jRuD17pTs0TzjVmylatJDA+nXE+ufOa/PqdSXOw9ZNFPcoz5n2EZH8k6vXey9om6UsEZ4+leDU\naRRHthWVueFSaidNpvaqa9MYWVx32jo8fSrhu6bh1Gbma+sqr15XrrZPpsul60cmUzunTia0db5c\nz7q6zVLw1ltv9SumlKitbby1rftKS4uorW1MYTT+CE+fSumU2wnFmra73WlqovAfL0EoRNOII9MU\nXVxX2zrx2pymzH1tXeHV68rV9skGuXL9yHRq59RJd1vn0/WsvbYuLS36aVuPU09ZhnOqN9PnG/ts\n962iNTdcypfvWtyynimMbHtdaetseW2d5dXrytX2yRa5cP3IBmrn1ElnW+fb9ayrPWWqKctwRYsW\ntvshBnBqayhatJD6c8alKCpv5OprS/Z1/fbEp3l6wEVtHnPyunlcl4PtIyL5J1ev915TUpbhAuvX\neXpcJsnV15ZsvFs+2MDfP2j7v+C32eDp84mIpEuuXu+9pqQsw8X6D/D0uEySq68t2Xh/NK2cH5zd\n9lBCyaPlcLV3zyciki65er33mtYpy3ANo0bjhkvbPSYWLqVh1OgUReSd+lNGUx9q/7W5WfjaknnP\n3HApTaNHEwzS5p+m0bn73otIfmkYNZpIsa5nHVFSluHcnr2onTS53WMe2uVHNJVkX2Hk1FkV3B65\nod1jPjv/6qwr+kzmPaudNLnD15XMeRYPuzbr2kdE8s9HX/TmzsD17R7z7HBdz7QkRhZoGnEkhEIU\nvLl0u6nEsXAp9w+8me9/fAvr1jmccEIUp805Hf7qbFvPnl3AbbcVs6ryKM45H0r//cZ2r62xsJRb\noj/hqqqbGD06Qo8efkTtn8R7Fntl6XZLmbjhUmqvuT7p9Xjaeu+jJaVM63ELF35wC717uwwfHvP8\nNeS7XLl+ZDq1c+qkq63Xr3c47bQwT3xxNMce71D5+etfu579svQWLnz/Fvr2dRk2LPuvZ1oSYwdy\nbaq1s6X6a6sgV9OT004L869/BZk0qYEf/zg9F7fOtPUTT4SYMKGYvn1dFi2qZfBgd4ev7ed39+WX\nvyxi332jLFxYS68sXPD57JOb2P2NJ5h69SpCuzavXN2Fb4KJ9inbuoktzSv6f/JlL045JcyGDQHu\nuaeOM86I+PAK8leuXT8yldo5ddLR1ps3w+jRYZYvD3LNNQ386EeNO7zef1QVv559+aXDrFn1jB6d\n3dezri7Zeb6BAAAgAElEQVSJoaQsB1RVOYwaFebjjwPcdls9EyY0dfwgjyXb1n/9a5DzziuhuBie\nfLKWAw5o+xuR68J11xUxZ04hhx8eYf78OkpKvIzaX64LQ4b0oF+/GEuW1HpyztbtvGxZgNGjw9TW\nwsMP13HssVFPnkfy5/qRbmrn1El1W9fVwZgxJbz6aoiLL25kypSGdkdz3n03wKmnhqmvh0ceqeM7\n38ne61lXkzLVlOWAigqXBQtq6d8/xi23FDN/fmZOql26NMBFF5UQCMBDD9W1m5ABOA5MmdLA6NFN\nvPJKiPHji4lk0Zenzz5zqK522Hdf/7ri99svxsMP1xEKwcUXl/D66/ovLSLpF4nAZZfFE7JTT23i\njjvaT8gADjggxty5dQQCcOGFJbz5Zv5dz/LvFeeoykqXBQvq6NXL5Yc/LOa554LpDmk7778f4Jxz\n4t+A7ruvniOOSO4bUDAIM2bU8+1vR1i8uIDJk4vJls7dZcvi/73228/f+ojDD49y//11NDTAueeG\nee89/bcWkfSJxeCqq4p55pkQRx8dYcaMegJJXpaOPDLKfffVU18P55xTwvvv59f1LL9ebY4bOjTG\nI4/UUlgI3/9+Ca++mhmJ2Zo1DmPGlLBpk8O0afWcdFLnuruKimDOnDqGDYvy2GMF3
HZbkU+RemvZ\nsnj777ef/13wJ5wQZfr0er76yuGss0r49NM0zfgQkbx3221FzJ9fwEEHRXnwwToKCzv3+JNOivDL\nXzawcWOAs84qYc2a/LmeKSnLMYceGuOBB+pobITzzith+fL0vsVffhlPyNauDXDzzQ2cc07Xxh97\n9IBHH61jr72izJxZyIwZBR5H6r1U9ZQljB0b4dZb6/n88wBjxoT54ov8uZCJSGb4zW8KufvuQoYM\nifLII3Vdnjl/7rlN3HRTA2vWxBOzjRu9jTNTKSnLQccdF+XXv65n8+Z4r8mqVen55bx1K5x7bgkf\nfBDk8ssbufLK7s0M7dMnPkQ7cGCM224rZt68zKydS1i2LEjv3i4DB6ZuvHXixCauvLKBjz4KcM45\nJWzdmrKnFpE89+ijIW6/vYiBA2PMn19Hnz7du/b94AeNTJjQyPvvBznnnHBeXM+UlOWoM8+McPvt\n9axfH+812bAhtYlZYyNcdFEJb74Z5Kyzmrj11o6LPJOx664u8+fXUV7uctVVxSxenBlDtK1t3Qor\nVzrst1/q1467+eZGzjmnkbffDnLBBSU0NKT2+UUk/zz9dIjJk4vZaacYCxbUseuu3f8y6jhw660N\njBnTxJtvBrn44hIac3xJOyVlOWz8+CZ++MMGPvkkwNlnl7AlRTOho1G48spiXnwxxPHHR5g+vd7T\nxMSYeO1ccTFcemkJL7+ceYnZe+8FcF0nZUOXLTkO/OIXDZx4YhN//3uIiROLiWbvzHIRyXBLlgQZ\nP76Y4uJ4mcnee3t33QsEYPr0eo4/PsLf/hbiBz8oJpb9a8u2SUlZjrvhhkbGjWvk3XeDnH9+CfX1\n/j6f68KNNxbx5JMFHHZYhPvvjy/X4LWDD44xe3Yd0Wi8du7ddzPro5zKIv8dCYXis1xHjIiwaFEB\n111XlDWzVkUke7z7boBx40qIxeDBB+s46CDvM6aCApg1q47DDovwxBMF3Hhj7l7PMrsoR7rNceDO\nOxvYuNHhT38qYMKEYh54oJ5Qzeavrajs9uz8kvlOdfw8bN1EcY9ypn5yJg8+WMbQoVEeftjfxV6P\nPTbKjBn1XH55MWPHlvDUU7UM7vOVJ6+ru1Jd5L8jJSXx9eBGjw4zd24hffu63DCxKiPaR8RLieuQ\nPtf+at3O9oBTOWvszmzdGv8SeMwx/n0JDYe3Xc9mz45fz669LPeuZ1rRP08k1nz5xz9CLDjwNs74\n4E4CtTX/ud8Nl1I7aXLSezIChKdPJXzXNJwW59lKKff0up7v/uOH9O+fms/WAw8UcMMNxdzZ+2dM\nbvg5wbruvS4vnHxymDffDPDJJ1spLvbuvF35TK9fH9/x4ZyVU/hJ4RQKG9PfPtlA14/U6G477+g6\npM/1jnWnrXfUzjVOKXe4N1A25WouuSQ1O8msW+dwyilhzlud2dezrq7orw3J80QoFF/7Za/5d3LR\nRz/ZbjNYAKepicJ/vAShUHwT7A6Ep0+ldMrtXztPIU0c2fBXSnsndx4vHHRQjGOW/JwLPvgJgUj3\nXpcXYjG4+eYi9tgjxqWXenuh6spnukcPOP/TKZz2xk8IRtPfPtlC14/U6E47t3Ud0ud6x7ra1u1d\n70fyAsMODaSsnf9zPVua2dczbUi+A/qmuz2nejM7HbAPgRY9Sa254VK+fNe2u2m2U72ZPt/YZ7tv\nTF05j1ec6s3s9I19tuv5S2c8n3zicNhhPTj99CbuvdfbIr6ufKYz7f3KFrp+pEZX21mf687LhetH\npsXTFu19KR0qWrSw3YQMwKmt4cdmMbvt1qPNPzfts7jd/xCJ8xQtWuhl+G0qWrSw3YQs1fFsK/LP\njClCRYsWZtT7JeIFfa5TI9PaOdPi8ZoK/fNIYP26pI47sN9a9t+57YTiwM/Xwlrvnq+7kn2eVMWz\nrcg/M9ahyLT2EfGCPtepkWntnGnxeE1JWR6J9
R+Q1HHjftSHM8+pbfP+4kf6wFXePV93Jfs8qYon\nE2ZetpRp7SPiBX2uUyPT2jnT4vGahi/zSMOo0bjh0naPccOlNIwanZLzeCXT4lm+PEjfvjH69cuM\nes1Max8RL+hznRoNo0bTUJA57ZzM+x7L4vddSVkecXv2onbS5HaPqZ00ucPiSK/O45VMiqe6Glav\nDrDvvrGUb6/UlkxqHxGvuD17se6iq9s9Rp/r7nvkqT7c1nRDu8dk2vX+3t7XscXJzvddS2LkmaYR\nR0IoRMGbS7eb3uyGS6m95vqk13fx6jxeyZR43noryLx5BZx0UsSXhRS7+pluq322UsrL//Vjek+9\nxsswc4KuH6nRnXb+6V+PYckbxXyr8NXtlkeoD5XScF3qr0OZrrNt/ec/x7dpe7f3txg7zqXXe29k\n9PU+Fi5lvrmJiz+6hXfeCTJ6dIRgmnbh05IYO6Ap7W1ztlR/fSXkLnzTSZynbOsmtvQo7/J5vJKI\n57FffcnLK3fhhjdOoHdl6uJJLGQ7Y0YdY8ZEPD9/dz/TLd/3zeEBfPNn51LSv4xXXqnxZTusbKbr\nR2p0tZ2rq+Gb3+xBaanL0r9+Ts9nF8Ln67jtt5XM2XImL74VpKIiu3+/ea0zbf3yy0HGjCkhGIQ/\n/rGW4cNjnv3e8MqO4mks7smFF5bw3HMhTjutiXvuqSeQhjHBri6JoctwnnLLelJ/zjjPzlNWUUZ9\nBvwCS8Tz/meFPPiLIka+V8vxlambBbl8eWYV+bfW8n0vAE7+qIg5cwI89VSIU0/1PokU8cvvflfI\n1q0OP/xhI4V9t32u+5YX8MX1xTzwQAPXX6+ezq54990A550X38/yd7+rY/jw+PXMq98bXtlRPAXA\n/ffXMWZMCU88UcBOO7nccUdDxpSTdEQ1ZZKTDj00noi99lpq+66XLQtSUOAyZEhmJmWtTZjQSCDg\nMmNGYc5u8Cu5p6EBZs0qoLTU5YILtk+8xo5tok+fGA8+WEhN+8tZyQ588onD2LElbN0KM2bUc+yx\nmbG0T2eEw/Dww3UMHRrlgQcK+eUvC9MdUtKUlElOGj48SiDg8vrrqUvKolFYsSLA3nvHKMySa8Dg\nwS4nnxzhX/8K8o9/pKn4QqSTHn88xPr1AcaNa6JXq/2nw2G46KImNm1ymDevID0BZqn16x3GjAlT\nVRXgjjsaOO207O09790b5s+vo7Iyxp13FvHgg9nxWVBSJjmprAyGDo3x1ltBGlM0gvHJJw51dU7G\nDl225Yor4g00c2aWZJKS12IxuPvuQkIhl/Hjd/yf++KLmygudrn33kIi2ZtXpNTmzXDWWSWsWhXg\n6qsbUrbBuJ8GDHBZsKCWvn1jXH99EQsXZn7FlpIyyVmHHhqlvt7h3XdT8zHftr1SdnX3H3RQjCOO\niPDCC6H/LHwrkqn+8pcg1gY57bQIu+yy4zH3vn1dzj67idWrAyxalPm/iNOtrg7GjSth+fIgF17Y\nyI9+lDu1eIMHu8yfX0dpKUycWMyLL2b2iICuwJKzDjkknhylaggz01by74xEb9ndd6u3TDJbokd3\n4sT2E4dEveTMmaqXbE8kAuPHF/PKKyFGj25iypTsKYpP1gEHxHjooTocBy64oIS33src1CdzIxPp\nplQX+yd6yvbdN/uSspEjoxgT5YknQqxZk2NXZMkZb74Z4OWXQxx7bKTDLz977KF6yY64LkyeXMzi\nxQV8+9sRZsyoT9u6Xn478sgo991XT309nH12CR98kJnpT2ZGJeKB3XZzGTAgxmuvBVPyTXnZsgD9\n+8fo2zf7vpYHAvGeh0jEYdYs9ZZJZkr0kiV6djty5ZWql2zPbbcV8dhjBQwbFmXOnDqKitIdkb9O\nPjnCL37RwMaNAcaMKWHt2sz7AqqkTHKW48SHMDdsCLB6tb//+TZtgrVrA1k5dJlw+ukRBgyIMXdu\nAZs3pzsak
e19/LHDU0+F+MY3ohx1VHJ1m8OGqV6yLTNmFDBzZiF77RXlkUfq6NEj3RGlxnnnNXHT\nTQ2sWRNPzDZuTHdE29OnVHJaqoYws7XIv6WiIrj00iZqahx+9zv1LEhmuffeQlzX4corGztV86R6\nSXCqN1P8yFz42c8ofmQuf5xdw223FbPzzjEWLKjLyt797vjBDxoZP76R998Pcu65YWo/j7dPeNqd\nFD8yF6c6fd9KfZ2WYoyZDhwOuMAka+3rLe47DrgDiAJPW2tvN8b0AOYC5UAR8FNr7TN+xii5rWVS\nduaZ/s2Nz+Yi/5YuuKCR6dMLuf/+AsaPb8z54QzJDl984fDYYwVUVsY45ZTO/T8eOTLKPvvE6yVv\nvNFpc8ZmrgpPn0r4rmk4tfGVdMuA87iOL4qv5zsLrmLXXfOrPSA+ivLTnzawcaPDkN/fSb+DplAS\n3bbScI8fX0ftpMlp2TvVt54yY8zRwBBr7QjgEuDXrQ75NXAGcCRwvDFmX+BCwFprjwG+B9zlV3yS\nH/bfP0ZJif+LyG7rKcvupKxnTzj//CbWrw/wxz9qKQHJDA88UEB9vcOECY2d3qO1Zb3kffflV29Z\nePpUSqfc/p+ELKEHNdxUfzPDnv5/aYos/QIB+O0et/O/3LRdQgbg1NZQOuV2wtOnpj4uH889EngS\nwFq7Aig3xvQEMMYMBjZaaz+11saAp5uP/wLo0/z48uafRbqsoACGDYuyYkWA6mr/nmfZsgBFRS57\n7pndSRnAZZc1Egq53H13IbHsfzmS5Wpr4cEHCygvj6891hWJesmHHsqfekmnejPhu6a1e0z4rmk4\nW3y8MGYwp3ozZTMyr338TMoGAFUtfq5qvm1H920AdrbWPgZUGmM+BF4CrvExPskThx4axXUd3njD\nn96ySASsDbDPPrFOf4vPRAMHupx+eoT33w/yl7/k6Px4yRrz5hWwcWOAiy5qpLS0a+coLMy/esmi\nRQu/1kPWmlNbQ9GihSmKKLNkavuk8ldIe6WZDoAx5jxgtbX2RGPMgcADwMHtnbS8PEwo1PYvjoqK\nsi6EKl2RqW193HHwq1/B8uVhzjrL+/MvWwaNjTB8eDAlbZCK57jpJliwAO67L8y55/r+dBkrUz/T\nuaatdo5EYNYsKC6G664roqKi60WOV18dvw789rdF3HRTUe7XS27dlNRhZVs3UZaPn/MUtE9Xrh9+\nJmVr2dYzBjAQ+LyN+3Zpvu1I4BkAa+07xpiBxpigtbbNKW2bNtW2GUBFRRlVVVu6Fr10Sia39ZAh\nAGX89a8RrryyzvPz//3vIaCEPfesp6rK3/3iUtXOAwbAyJElPP98iMWLaxg+PP/GMTP5M51L2mvn\nhQtDfPJJCRdc0IjjNFBVtcPDkjZuXBF3313IvffWcc45ub0pZnGPcpJJCbb0KKc+Dz/nfrdPe5/r\n9pI1P4cvnyVerI8x5iBgrbV2C4C1diXQ0xizuzEmBJzSfPyHwGHNjxkEbG0vIRNJRnk5GBNl6dKg\nL5sT58rMy9a0Ubmkk+vCjBmFOI7L5Zd7sxdjPtVLfnXcaGoD7Y/3uuFSGkaNTlFEmaVh1GjccOa1\nj29JmbV2CbDUGLOE+EzLK4wxFxpjTms+5HJgHvB3YL619n3gPmB3Y8yLwKPABL/ik/xyyCFRamsd\nVqzw/iO/bXul3Pr+cOSRUb75zSh/+lOIjz/OvJWvJbf9859B3nknyEknRRg82JtlGwYOdDnjjHi9\n5HPP5W69ZCQCl13bn/+N3dDucbWTJuOW9UxRVJnF7dmL2kmT2z0mHe3ja02Ztfb6Vje90+K+l4AR\nrY7fCozxMybJT4ceGuXhh+PrlR1wgLdfkZctC7DrrjF69/b0tGnnOPHesksvLeGeewqZOrUh3SFJ\nHkn00Ca2SvLKxImNzJ8fX83+hBO8L2dIN9eFq6+O72dZ863r2Hx4Az1nTtu
uqN0Nl6ZtHa5Mknj9\nLddxg/S2Tw7MFRPp2CGHxHuxXn89yCWXeFf3VVXlsGFDgBNOyM36lJNPjlBZGWP+/AJ+9KNGKiry\nb6FJSb0VKwI8/3yIww+PeF7POHRojJEjIzz/fIilSwM5Vy95++2FzJtXwDe/GeV3v6ujsce1fDlh\nPEWLFlK2dRNbepTHh+7ytIestdqrrqXu+/H2CaxfR6z/gLS2j5IyyQuDB7v07RvzfLulRD1Zrg1d\nJoRCcPnljdxwQzGzZxdw3XXe9lqI7EhiS6RkNx7vrCuvbOT550PMnFnI7Nn1vjxHOsycWcCMGUXs\ntVeURx/dtp+lW9aT+nPGUVZRlpdF/R1JtE8m0N6XkhccBw4+OMpnnwVYu9a7+qhcLfJvaezYJnba\nKcbs2YXUtL+sj0i3rV3r8Mc/hth77yj/9V/+fNk54ojcq5d87LEQP/1pfD/L+fPzbz/LXKGkTPLG\nIYfEEycvt1zKhY3IO1JaChdd1MSmTfH9B0X8NGtWIZGIw8SJjQR8+g2VqJd0XYd77sn+2cWLFwe5\n6qpievd2mT+/jt12U0KWrZSUSd5ouTm5V5YtCxAOu+y+e25fBC+5pIniYpd77in0ZVkREYDqapg7\nt4D+/WOccYa/H7SW9ZJVVdnbW/byy0Euu6yEoiJ49NFa9tknd3vt84GSMskbBx4YpbDQ9Swpa2iA\nDz4IMHRojGDuzq4HoG9fl7Fjm1i9OsBTT6kUVfzxu98VsnWrw6WXNvm+4n6iXrK+3uGBB7KzB/jf\n/w4wblwJkQjMnl3HwQcrIct2SsokbxQXwze+EePf/w54Uhv1/vsBIhEnZ4v8W5swoRHHcZk5sxA3\ntzsGJQ0aG+H++wsoLXW54ILUTCg5++x4veSDD2ZfveTKlQ5jx5ZQXe3wm9/Uc+yx+XEdynVKyiSv\nHHpolGjU4a23ut+1tXx57hf5tzR4sMvJJ0d4550g//xnjncNSso9/niIdesCjBvXRK9eqXnOcDg7\n6yXXr3c488wwGzYEuOOOet+HeiV1lJRJXvGyrmxbkX9+JGWwbYmCGTOyvzhaMkcsFl8sNhRyGT8+\ntcuuZFu9ZHU1jB1bwqpVASZPbuD73/d3v11JLRWHSF45+OBti8h217blMPJn2GD48BgjRkR444Ua\nNk59hF2Dn29bbLFn17o3nOrNX1+4sYvn8kIiHrZuojix0GYGxNPd9snU87B1E5+s7sPn9hxOOzPM\nLrukdmw8US85Z04hzyyo4Qz3jxnXRonzfHXcaMZdNoBly4Kcf36j1g3MQY6b5cUhVVVb2nwB7e3S\nLt7KprY+7LBSNm50sHZrl6fcuy7su28pPXrA66+nrhglE9p5zRW/ZK/f/4IedH9bkvD0qRm1xUmu\nxpMN59lKKRsuuZqyKdckfR6vfPyxw6LDf8WNgZ8TjmVuG9UFS/lZ9AbeHfUjZs2q7/QEo0y4fuSL\n9tq6oqKszem+wVtvvdWvmFKitrbx1rbuKy0torZW3yRSIZva+l//CrJ0aZBRoyJd3jZo/XqHadOK\nOPLICKeemroxj3S3c3j6VCrvu41Cth8ycZqaKPzHSxAK0TTiyKTPVTrldpym7p/LC7kaT7acp5Am\nyt96MeXtDLDLnKmc+I9bKXAzu40K3CZG8gInjQb3qM63UbqvH/mkvbYuLS36aVuPU0+ZeCKb2nru\n3AKuuaaYX/yinvPP71o9xvPPBzn77DDXXtvAtdem7iKXznZ2qjfT5xv7bPfNvTU3XMqX79oO943z\n8lxeyNV4cvU8Xsq01+ZnG2XTdTrbdbWnTDVlkndaFvt3NSlLFPnvu2/+FPkXLVrY7i8KAKe2hleu\neYp3hl/Q7nEHvvEUpyRxrqJFC1OyJ12yry3T4umorZNt50w7T6raGbK3rVPZRpI6Ssok7+y9d4xe\nvbq3iGw+FvkH1q9L6riXn/iCO54obve
YH/MFp3j4nN2V7PNkWjwdtXWy7Zxp50lVO3fmufK5jSR1\nlJRJ3gkE4rMwn38+xIYNDv36dX4If9myAD16uFRWZvfwf2fE+g9I6rjvXtSHId+ua/eYIS/1gQe9\ne87uSvZ5Mi2ejto62XbOtPOkqp0781z53EaSOqopE09kW1tPn17IlClFPPhgHSef3LlC/bo62GOP\nHhx8cJSnnmo/+fCaasr8kavx5Op5vJRpr001ZbmhqzVlWjxW8tIhh3R9EVlrA8RiTl4tGgvg9uxF\n7aTJ7R5TO2lyUr8ovDyXF3I1nlw9j5cy7bVlYhtJ6mhJDPFEtrV1nz4uM2YUEos5nHtu54r9n38+\nxDPPhDjvvCYOPDC1iVm627lpxJEQClHw5tLtpuu74VJqr7m+U+swtXWuxsJS6n/UuXN54cv9juQ3\n94Y5JPbadkt+1AVLabw+9fE0jTiSuY+WMHTL9vF0tq29es8y7TxeyrTX5lcbpfv6kU+0JMYOqKs2\ndbKxrY8/Pszy5QE+/HArxe3XpW/nxhuL+O1vC/nzn2sYPjy1SVmmtLOzpfrrK5Z38Zt74ly1H63n\nlnsq+efOZ/CX14KdXhizu2bMKOC224q5/doqJu3ye3ps2cTNd/dn5voz+ctrwZTXD776apBRo8KM\nPmYjD41+rNtt7dV75vV5yrZuYkti54Q09/5kaht58f8MMuf6kQ+6OnyppEw8kY1t/eMfF3H//YX8\n3//Vcvjhyc+iHD26hFdeCfLxx1spLfUxwB3IxnbujKuvLuKhhwr57W/r+O//Tt2ivI2NcPDBpWzZ\n4vD221vp1Sve1jNn1nHllSVcemkj//u/DSmLB+D884tZvLiAJ5+s5YgjcneWb65/pjOJ2jp1VFMm\n0kmJ9co6sw+m68Ly5UEGD3ZTnpDlg4kTG3Ecl5kzC0nl98XHHw+xbl2AceOa6NVii8LTToswcGCM\nRx4pYNOm1MXz4YcOzzwT4qCDoowYkbsJmYhsT0mZ5K1Esf/rryf/32DNGofNm528Wp8slfbc0+W7\n343w1ltBXn45NeOXsRjcfXchoZDL+PHb14AUFMBllzVSW+swZ05hSuIBuOeeQlzX4YorGnHa/E4t\nIrlGSZnkrYEDXXbdNcbrrweT7pXZtmhsfs28TKUrrognRjNnpiYJev75IO+9F+S00yLsssvXPwjj\nxjXRs6fL/fcXUF/vfzzr1zssWFDA7rvHOOmk1A3hikj6KSmTvHbooVG+/DLAxx8n1x2R2F5JPWX+\nOeSQGIcdFuG550K8957/l6hE8jdx4o5nSpWVwQUXNPLFFwEWLCjwPZ4HHiigocHh8ssbUz7ZQUTS\nS0mZ5LXOrlemnrLUSPSW3X23v71lb74ZYMmSEMccE2n3Pb300iYKClzuuaeQmI9v/datMGdOIX36\nxBg7tmv7sopI9lJSJnmt5ebkyVi2LEivXi4DB2b3rOVMd/zxUYYMifLHP4b4/HP/iqoSvWRXXtn+\n2k0DBriceWYTH30UYPFi/3ane/TRAr76yuHii5soKfHtaUQkQykpk7w2dGiM0lI3qRmYNTXwySfx\nIn8VX/srEIDLL2+iqclh1ix/ess++cThT38K8Y1vRDnqqI6HoydOjPdczZjhTzxNTXDvvYWUlLhc\nfLF6yUTykZIyyWuhEAwfHuX994MdLnmwYkUA182/7ZXS5Xvfa6Jfvxhz5xawxYelle69N76jQ7Iz\nHPfeO8bxx0d4440gr77qfbHX//1fiM8+C3D22U306aOeWJF8pKRM8l6iruyNN9r/Rasi/9QqLo7X\ncm3Z4jB3rrcF9l984TBvXgGVlTFGjUp+huO2maHexuO68aHUQMBlwgRtgyOSr5SUSd5Ltq5MRf6p\nd8EFjYTDLrNmFdLoYa4ye3YB9fUOEyY0EupEidjhh0cZPjzK4sUFfPCBd5fPF18M8u9/BznllAi7\n765
eMpF8paRM8t7BB0dxHDeJpCxIMOhijJKyVOndO75O2OefB3j8cW8K7Gtr40lZebnL2Wd3rnbL\ncbYtnXHPPd71liUmHCR64kQkPykpk7xXVhYv+H/rrSBNbfyOjsVg+fIAe+0V69Tm5dJ948c3Egy6\nzavcd/988+YVsHFjgIsuauzSVlknnRRhjz1iLFhQwPr13Z/x8e67AV58McSRR0YYNkwJv0g+U1Im\nQnwIs77e4d13d/xfYtUqh5oaFfmnw667upx6aoQVK4K88EL3Cuyj0XiBf1GRyyWXdG2GYzAIl1/e\nSGOjwwMPdL+3LLEWm3rJRERJmQgdLyK7fHn89n33VVKWDomEpbvLUfzpTyFWrQpw1llNVFR0vdvt\nrLOa6Ns3xoMPFrJ1a9fj+fRThyefDDF0aJSRIzWBRCTfKSkTYVuxf1vrlSWK/PffX78402H//WN8\n5zsR/vnPEG+/3bXLluvGkzrHcdvcUilZJSVw8cVNbN7s8MgjXe8tmzWrkGg0vqWS1r4TESVlIkBl\npX3mAyYAAAw5SURBVEv//jFee23Hm5Nr5mX6dXej8iVLgrz9dpCTTooweHD3i9Muvjg+M/S++wrb\nrEVsz1dfwUMPFbDzzjFOP10bj4uIkjIRID6r7pBDoqxfH2D16q93WSxbFqRv3xj9+mm5gnT59rej\nHHBAlEWLQqxc2fluJa9nOO60E5x9dhOffRZg4cLOzwydM6eQ2lqHyy5rpNDfLT5FJEsoKRNp1tYQ\n5pYtsHp1gH33jWmIKY0cJ55QxWIO997buSxmxYoAf/lLiMMOi3Dwwd71dk6Y0Egg4DJzZudmhtbX\nw/33F1BW5nL++dpSSUTilJSJNGtrEdnESv4q8k+///7vCLvtFmPevAK+/DL5DNmvGY6DBrmMGhVh\n2bIgL76Y/MzQP/yhgKqqAOef30RZmachiUgWU1Im0mz//WMUF399Edlt9WQq8k+3UCjeO1VX5zB7\ndnIF9mvXOjz+eIghQ6Icf7z372FnZ4bGYnD33QUUFLhcdpmWwRCRbZSUiTQrLIRhw6KsWBHYbgPs\n5ctV5J9Jzj67id69XWbPLqC2tuPjZ80qpKnJYeLEJgI+XPG++c0YRx0V4aWXQm2uc9fSM8+E+PDD\nIGecEWHnnVWjKCLbKCkTaeHQQ6O4rrPd5uTLlgUpKHDZe28lZZmgRw+46KJGvvwywPz57feWVVfD\n3LkF9OsX43vf8692qzMzQxObmXd3WQ4RyT1KykRaaL2IbDQaLxIfMiSmGXIZ5JJLmigqim+9FG1n\nRHLu3AK2bnW47LImior8i+fYY6MMHRpl4cIQn37adq3ba68FeO21EMcdF2GffZTki8j2lJSJtJBI\nyhIzMD/5xKGuTtsrZZp+/VzGjGli5coATz+94+UoGhvjQ5elpS4XXOBvr1Rio/Jo1OG++9rO3hM9\naVdeqV4yEfk6JWUiLZSXw957R1m6NEgksm3mpYr8M8/EiY04TtvLUTz+eIh16wKcd14TvXr5H89p\np0XYeecYDz9cwFdfff3+jz5yWLw4xLBhUUaM0OdJRL5OSZlIK4ccEqWmxmHFioBW8s9ge+7pcuKJ\nEd58M8grr2w/YzY+w7GQUMhl/PjU9EoVFsL48Y3U1jrMmfP13rK77y7EdR2uuEJbKonIjikpE2ml\n5Xpl23rKlJRlosQwYOvlKJ5/Psh77wU59dQIu+6auhmO48Y1UVbmcv/9BdTXb7t9wwaHBQsKGDQo\nxskna0slEdkxJWUirbRc2X/58gD9+8fo21dLF2SiQw6JceihEZ57LoS12y5nidqtVM9wLCuDCy5o\npKoqwO9/v21m6AMPFNDQEN94PJj8GrMikmc6v2GbSI4bPNhl9/JN7PrMHxhas57e+/TDqT4Jt2cK\nCpOk0664oonXXgvx4K/q+NVR81n31nr2WTKI8qNOZf/9Uz9l9rLLm
rjvvkLm/qaOS5xHiX22jqZ7\nB7F7+RmMHatLroi0zXE7s2FbBqqq2tLmC6ioKKOqaktbd4uHcqmtw9OnErhzGiXRmv/c5oZLqZ00\nmdqrrk1jZLnVzl6JxeCRfX/FZRv/Hz3Y9p5FiktpuKrr71l32vqF46Zz8r/u3C6ehoJSItek/zOU\nafSZTh21deq019YVFWVtVpX6+rXNGDMdOBxwgUnW2tdb3HcccAcQBZ621t5ujLkEGNfiFAdba3v4\nGaNIS+HpUymdcvvXbndqa/5zu36pZpYed01l8savv2eh+hpCaXjPwtOncta/vh5PUVMNRfoMiUg7\nfKspM8YcDQyx1o4ALgF+3eqQXwNnAEcCxxtj9rXWPmCt/Y619jvAT4Df+RWfSGtO9WbCd01r95jw\nXdNwtlSnKCLpSKa9Z5kWj4hkFz8L/UcCTwJYa1cA5caYngDGmMHARmvtp9baGPB08/Et3QJ8/eum\niE+KFi3Eqa1p9xintoaiRQtTFJF0JNPes0yLR0Syi5/DlwOApS1+rmq+rbr576oW920A9kz8YIw5\nBPjUWruuoycpLw8TCrU9namioqxzUcv/b+9uQuy6yziOf8eGms5k0qYgSRMLQZAHpG6SjQq14wtU\nJTGLVheGUrVFkEYC0UXFjdWFmmCrRndWgy2WUos20VJqu9BFNm2hQUWeqqjQTmp9ie2MoaFtrotz\nAjPTuXmZufee//3P9wMD55x7En7znGdmHs4599wVG/taz5+6qN2m508x3eH3OvZ1HqQhH7NLrvWY\n9FBp7OnRsdajs5Jaj/KtQOd7XOLS124HjlzMf3rq1Om+r3lT4+jUUOv1GzZxMT9Ccxs28WpH32sN\ndR6kYR6zldR6HHqoNPb06Fjr0bnAjf59/90wL1/O0pwRO2crcLLPa9vabefMAMeHmE16kzO799Cb\nnDrvPr3JKc7s3jOiRLqQ0o5ZaXkkjZdhDmWPAzcDRMQOYDYz5wAy82/AxojYHhHrgF3t/kTEVmA+\nM/3EXo1Ub+OVnN5/4Lz7nN5/gN70xhEl0oWUdsxKyyNpvAzt8mVmHo+IZyLiOHAWuCMiPg28nJk/\nBz4PPNDu/mBmPtcuX0Nzj5k0cuceVTD53bsX3bBdynPK9GalHbPS8kgaHz48VgNRW60n5l7hrcce\n4S3/eJGzm7c0l6UKOLtRW50HadDHbLW1LrWHSmNPj461Hp0iHx4rjave9EZe/dQtF95RxSjtmJWW\nR1L5/EBySZKkAjiUSZIkFcChTJIkqQAOZZIkSQVwKJMkSSqAQ5kkSVIBHMokSZIK4FAmSZJUgLF/\nor8kSVINPFMmSZJUAIcySZKkAjiUSZIkFcChTJIkqQAOZZIkSQVwKJMkSSrAuq4DDEtE3AO8B+gB\n+zPzqY4jVSciZoCHgD+0m36XmV/oLlF9IuI64BHgnsz8fkRcC9wHXAacBG7JzDNdZqzFMrU+AuwE\n/t3ucigzf9VVvlpExEHgepq/P98AnsKeHoplav1x7OmBiohJ4AiwGVgPfB04wQp7usozZRFxA/DO\nzHwvcBvwvY4j1ew3mTnTfjmQDVBETAGHgScXbP4a8IPMvB74M/DZLrLVpk+tAb68oL/947VKEfEB\n4Lr2d/NHgO9gTw9Fn1qDPT1ou4GnM/MG4JPA3ayip6scyoAPAb8AyMw/ApsiYmO3kaRLdgb4GDC7\nYNsMcLRdPgZ8eMSZarVcrTV4vwU+0S7/F5jCnh6W5Wp9WXdx6pSZD2bmwXb1WuB5VtHTtV6+3AI8\ns2D9n+22V7qJU7V3RcRR4Grgrsz8ddeBapGZrwOvR8TCzVMLToO/BFwz8mAV6lNrgH0RcYCm1vsy\n818jD1eRzHwD+F+7ehvwKHCjPT14fWr9Bvb0UETEceDtwC7giZX2dK1nypaa6DpApf4E3AXsAW4F\n7o2Iy7uNtKbY18N1H3BnZn4Qe
Bb4ardx6hERe2gGhX1LXrKnB2xJre3pIcnM99Hcs3c/i/v4knq6\n1qFslubM2DlbaW620wBl5gvtqdteZv4FeBHY1nWuys1HxBXt8ja83DY0mflkZj7brh4F3t1lnlpE\nxI3AV4CPZubL2NNDs7TW9vTgRcTO9g1YtLVdB8yttKdrHcoeB24GiIgdwGxmznUbqT4RsTcivtQu\nb6F598kL3aaq3hPATe3yTcBjHWapWkQ8HBHvaFdngN93GKcKEXElcAjYlZn/aTfb00OwXK3t6aF4\nP/BFgIjYDGxgFT090ev1Bh2wCBHxTZpinQXuyMwTHUeqTkRMAz8FrgIup7mn7NFuU9UjInYC3wa2\nA6/RDLx7ad5+vR74O/CZzHyto4jV6FPrw8CdwGlgnqbWL3WVsQYR8TmaS2bPLdh8K/BD7OmB6lPr\nH9NcxrSnB6Q9I3YvzU3+V9Dc0vM08BNW0NPVDmWSJEnjpNbLl5IkSWPFoUySJKkADmWSJEkFcCiT\nJEkqgEOZJElSARzKJKkVEdsj4vkF61dHxImI2N1lLklrg0OZJC0jIiaBXwKHMvNY13kk1c+hTJKW\niIh1wM+ABzLz/q7zSFobHMokabEJ4EfA+sw83HUYSWuHQ5kkLbaF5jMBr4qIvV2HkbR2OJRJ0mIn\nM/MgzQcJfysidnQdSNLa4FAmScvIzL8CtwMPR8Tbus4jqX4OZZLUR2Y+RnN/2UPtzf+SNDQTvV6v\n6wySJElrnmfKJEmSCuBQJkmSVACHMkmSpAI4lEmSJBXAoUySJKkADmWSJEkFcCiTJEkqgEOZJElS\nAf4P2iZ4+aTU60IAAAAASUVORK5CYII=\n", 1425 | "text/plain": [ 1426 | "" 1427 | ] 1428 | }, 1429 | "metadata": {}, 1430 | "output_type": "display_data" 1431 | } 1432 | ], 1433 | "source": [ 1434 | "plt.figure(figsize=(10,6))\n", 1435 | "plt.plot(range(1,30),error_rate,color='blue', linestyle='-', marker='o',\n", 1436 | " markerfacecolor='red', markersize=10)\n", 1437 | "plt.title('Error Rate vs. 
K Value')\n", 1438 | "plt.xlabel('K')\n", 1439 | "plt.ylabel('Error Rate')" 1440 | ] 1441 | }, 1442 | { 1443 | "cell_type": "code", 1444 | "execution_count": 39, 1445 | "metadata": {}, 1446 | "outputs": [ 1447 | { 1448 | "name": "stdout", 1449 | "output_type": "stream", 1450 | "text": [ 1451 | "WITH K=1\n", 1452 | "\n", 1453 | "\n", 1454 | "[[134 21]\n", 1455 | " [ 15 130]]\n", 1456 | "\n", 1457 | "\n", 1458 | " precision recall f1-score support\n", 1459 | "\n", 1460 | " 0 0.90 0.86 0.88 155\n", 1461 | " 1 0.86 0.90 0.88 145\n", 1462 | "\n", 1463 | "avg / total 0.88 0.88 0.88 300\n", 1464 | "\n" 1465 | ] 1466 | } 1467 | ], 1468 | "source": [ 1469 | "# FIRST A QUICK COMPARISON TO OUR ORIGINAL K=1\n", 1470 | "knn = KNeighborsClassifier(n_neighbors=1)\n", 1471 | "\n", 1472 | "knn.fit(X_train,y_train)\n", 1473 | "pred = knn.predict(X_test)\n", 1474 | "\n", 1475 | "print('WITH K=1')\n", 1476 | "print('\\n')\n", 1477 | "print(confusion_matrix(y_test,pred))\n", 1478 | "print('\\n')\n", 1479 | "print(classification_report(y_test,pred))" 1480 | ] 1481 | }, 1482 | { 1483 | "cell_type": "code", 1484 | "execution_count": 40, 1485 | "metadata": {}, 1486 | "outputs": [ 1487 | { 1488 | "name": "stdout", 1489 | "output_type": "stream", 1490 | "text": [ 1491 | "WITH K=16\n", 1492 | "\n", 1493 | "\n", 1494 | "[[139 16]\n", 1495 | " [ 5 140]]\n", 1496 | "\n", 1497 | "\n", 1498 | " precision recall f1-score support\n", 1499 | "\n", 1500 | " 0 0.97 0.90 0.93 155\n", 1501 | " 1 0.90 0.97 0.93 145\n", 1502 | "\n", 1503 | "avg / total 0.93 0.93 0.93 300\n", 1504 | "\n" 1505 | ] 1506 | } 1507 | ], 1508 | "source": [ 1509 | "# NOW WITH K=16\n", 1510 | "knn = KNeighborsClassifier(n_neighbors=16)\n", 1511 | "\n", 1512 | "knn.fit(X_train,y_train)\n", 1513 | "pred = knn.predict(X_test)\n", 1514 | "\n", 1515 | "print('WITH K=16')\n", 1516 | "print('\\n')\n", 1517 | "print(confusion_matrix(y_test,pred))\n", 1518 | "print('\\n')\n", 1519 | "print(classification_report(y_test,pred))" 1520 | ] 1521 
| }, 1522 | { 1523 | "cell_type": "markdown", 1524 | "metadata": {}, 1525 | "source": [ 1526 | "# The End!" 1527 | ] 1528 | }, 1529 | { 1530 | "cell_type": "code", 1531 | "execution_count": null, 1532 | "metadata": { 1533 | "collapsed": true 1534 | }, 1535 | "outputs": [], 1536 | "source": [] 1537 | } 1538 | ], 1539 | "metadata": { 1540 | "kernelspec": { 1541 | "display_name": "Python 2", 1542 | "language": "python", 1543 | "name": "python2" 1544 | }, 1545 | "language_info": { 1546 | "codemirror_mode": { 1547 | "name": "ipython", 1548 | "version": 2 1549 | }, 1550 | "file_extension": ".py", 1551 | "mimetype": "text/x-python", 1552 | "name": "python", 1553 | "nbconvert_exporter": "python", 1554 | "pygments_lexer": "ipython2", 1555 | "version": "2.7.12" 1556 | } 1557 | }, 1558 | "nbformat": 4, 1559 | "nbformat_minor": 2 1560 | } 1561 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Computer Club LNMIIT 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /LinearRegression/linearregression.R: -------------------------------------------------------------------------------- 1 | # VIDEO 2 2 | 3 | # Read in data 4 | baseball = read.csv("baseball.csv") 5 | str(baseball) 6 | 7 | # Subset to only include moneyball years 8 | moneyball = subset(baseball, Year < 2002) 9 | str(moneyball) 10 | 11 | # Compute Run Difference 12 | moneyball$RD = moneyball$RS - moneyball$RA 13 | str(moneyball) 14 | 15 | # Scatterplot to check for linear relationship 16 | plot(moneyball$RD, moneyball$W) 17 | 18 | # Regression model to predict wins 19 | WinsReg = lm(W ~ RD, data=moneyball) 20 | summary(WinsReg) 21 | 22 | 23 | # VIDEO 3 24 | 25 | str(moneyball) 26 | 27 | # Regression model to predict runs scored 28 | RunsReg = lm(RS ~ OBP + SLG + BA, data=moneyball) 29 | summary(RunsReg) 30 | 31 | RunsReg = lm(RS ~ OBP + SLG, data=moneyball) 32 | summary(RunsReg) 33 | -------------------------------------------------------------------------------- /LinearRegression/linearregressionRecitation.R: -------------------------------------------------------------------------------- 1 | # VIDEO 1 2 | 3 | # Read in the data 4 | NBA = read.csv("NBA_train.csv") 5 | str(NBA) 6 | 7 | 8 | # VIDEO 2 9 | 10 | # How many wins to make the playoffs? 
11 | table(NBA$W, NBA$Playoffs) 12 | 13 | # Compute Points Difference 14 | NBA$PTSdiff = NBA$PTS - NBA$oppPTS 15 | 16 | # Check for linear relationship 17 | plot(NBA$PTSdiff, NBA$W) 18 | 19 | # Linear regression model for wins 20 | WinsReg = lm(W ~ PTSdiff, data=NBA) 21 | summary(WinsReg) 22 | 23 | 24 | # VIDEO 3 25 | 26 | # Linear regression model for points scored 27 | PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV + STL + BLK, data=NBA) 28 | summary(PointsReg) 29 | 30 | # Sum of Squared Errors 31 | PointsReg$residuals 32 | SSE = sum(PointsReg$residuals^2) 33 | SSE 34 | 35 | # Root mean squared error 36 | RMSE = sqrt(SSE/nrow(NBA)) 37 | RMSE 38 | 39 | # Average number of points in a season 40 | mean(NBA$PTS) 41 | 42 | # Remove insignificant variables 43 | summary(PointsReg) 44 | 45 | PointsReg2 = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + STL + BLK, data=NBA) 46 | summary(PointsReg2) 47 | 48 | PointsReg3 = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + STL + BLK, data=NBA) 49 | summary(PointsReg3) 50 | 51 | PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + STL, data=NBA) 52 | summary(PointsReg4) 53 | accurateModel = step(PointsReg4)  # stepwise selection by AIC as a cross-check 54 | 55 | # Compute SSE and RMSE for new model 56 | SSE_4 = sum(PointsReg4$residuals^2) 57 | RMSE_4 = sqrt(SSE_4/nrow(NBA)) 58 | SSE_4 59 | RMSE_4 60 | 61 | 62 | 63 | 64 | # VIDEO 4 65 | 66 | # Read in test set 67 | NBA_test = read.csv("NBA_test.csv") 68 | 69 | # Make predictions on test set 70 | PointsPredictions = predict(PointsReg4, newdata=NBA_test) 71 | 72 | # Compute out-of-sample R^2 73 | SSE = sum((PointsPredictions - NBA_test$PTS)^2) 74 | SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2) 75 | R2 = 1 - SSE/SST 76 | R2 77 | 78 | # Compute the RMSE 79 | RMSE = sqrt(SSE/nrow(NBA_test)) 80 | RMSE -------------------------------------------------------------------------------- /LinearRegression/machine.pdf: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/Github-Classroom-Cybros/Lectures-On-Machine-Learning/996e0842c8560c692a1023c200d3e2f209e150f2/LinearRegression/machine.pdf -------------------------------------------------------------------------------- /Naive_Bayes/Supervised+Learning+Naive+Bayes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Naive Bayes Classifiers\n", 8 | "\n", 9 | "In this lecture we will learn how to use the Naive Bayes Classifier to perform multi-class classification on a data set we are already familiar with: the Iris Data Set. \n", 10 | "\n", 11 | " This Lecture will consist of 7 main parts:\n", 12 | "\n", 13 | " Part 1: Note on Notation and Math Terms\n", 14 | " Part 2: Bayes' Theorem\n", 15 | " Part 3: Introduction to Naive Bayes\n", 16 | " Part 4: Naive Bayes Classifier Mathematics Overview\n", 17 | " Part 5: Constructing a classifier from the probability model\n", 18 | " Part 6: Gaussian Naive Bayes\n", 19 | " Part 7: Gaussian Naive Bayes with SciKit Learn\n", 20 | " " 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "#### Part 1: Note on Notation and Math Terms\n", 28 | "\n", 29 | "There are a few more advanced notations and mathematical terms used during the explanation of naive Bayes Classification.\n", 30 | "You should be familiar with the following:\n", 31 | "\n", 32 | "[Product of Sequence](http://en.wikipedia.org/wiki/Product_%28mathematics%29#Product_of_sequences)\n", 33 | "\n", 34 | "The product of a sequence of terms can be written with the product symbol, which derives from the capital letter Π (Pi) in the Greek alphabet. The meaning of this notation is given by:\n", 35 | " $$\prod_{i=1}^4 i = 1\cdot 2\cdot 3\cdot 4, $$\n", 36 | "that is\n", 37 | " $$\prod_{i=1}^4 i = 24.
$$\n", 38 | " \n", 39 | "[Arg Max](http://en.wikipedia.org/wiki/Arg_max)\n", 40 | "\n", 41 | "In mathematics, the argument of the maximum (abbreviated arg max or argmax) is the set of points of the given argument for which the given function attains its maximum value. In contrast to global maxima, which refer to a function's largest outputs, the arg max refers to the inputs which create those maximum outputs.\n", 42 | "\n", 43 | "The arg max is defined by\n", 44 | "\n", 45 | "$$\operatorname*{arg\,max}_x f(x) := \{x \mid \forall y : f(y) \le f(x)\}$$\n", 46 | "\n", 47 | "In other words, it is the set of points x for which f(x) attains its largest value. This set may be empty, have one element, or have multiple elements. For example, if f(x) is 1−|x|, then it attains its maximum value of 1 at x = 0 and only there, so\n", 48 | "\n", 49 | "$$\operatorname*{arg\,max}_x (1-|x|) = \{0\}$$\n" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "\n", 57 | "### Part 2: Bayes' Theorem\n", 58 | "\n", 59 | "First, for a quick introduction to Bayes' Theorem, check out the Bayes' Theorem Lecture in the statistics appendix portion of this course; to fully understand Naive Bayes, you'll need a complete understanding of Bayes' Theorem.\n", 60 | "\n", 61 | "### Part 3: Introduction to Naive Bayes\n", 62 | "\n", 63 | "Naive Bayes is one of the most practical machine learning algorithms. Despite its name, it actually performs very well as a classifier. It proves to be quite robust to irrelevant features, which it ignores. It learns and predicts very fast and it does not require much storage. So, why is it then called naive?\n", 64 | "\n", 65 | "The “naive” refers to one assumption that is required for Bayes to work optimally: all features must be independent of each other.
In reality this is usually not the case; however, Naive Bayes still achieves very good accuracy in practice even when the independence assumption does not hold.\n", 66 | "\n", 67 | "Naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. We will be working with the Iris Flower data set in this lecture.\n", 68 | "\n", 69 | "### Part 4: Naive Bayes Classifier Mathematics Overview\n", 70 | "\n", 71 | "Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of independence between every pair of features. Given a class variable y and a dependent feature vector x1 through xn, Bayes’ theorem states the following relationship:\n", 72 | "\n", 73 | "$$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}\n", 74 | "                                 {P(x_1, \dots, x_n)}$$\n", 75 | " \n", 76 | "Using the naive independence assumption that\n", 77 | "$$P(x_i | y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i | y)$$\n", 78 | "\n", 79 | "for all i, this relationship is simplified to:\n", 80 | "\n", 81 | "$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}\n", 82 | "                                 {P(x_1, \dots, x_n)}$$\n", 83 | " \n", 84 | "We now have a relationship between the target and the features using Bayes' Theorem along with the naive assumption that all features are independent." 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "### Part 5: Constructing a classifier from the probability model\n", 92 | "\n", 93 | "So far we have derived the independent feature model, the Naive Bayes probability model.
The Naive Bayes classifier combines this model with a *decision rule*. The decision rule decides which hypothesis is most probable; in our example, this is which class of flower is most probable.\n", 94 | "\n", 95 | "Picking the hypothesis that is most probable is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function that assigns a class label $\hat{y}$ as follows:" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "Since P(x1, ..., xn) is constant given the input, we can use the following classification rule:\n", 103 | "$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$\n", 104 | "\n", 105 | "$$\Downarrow$$\n", 106 | "\n", 107 | "$$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),$$\n", 108 | "\n", 109 | "and we can use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(xi | y); the former is then the relative frequency of class y in the training set.\n", 110 | "\n", 111 | "There are different naive Bayes classifiers that differ mainly by the assumptions they make regarding the distribution of P(xi | y).\n", 112 | "\n", 113 | "### Part 6: Gaussian Naive Bayes\n", 114 | "\n", 115 | "When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution. Go back to the normal distribution lecture to review the formulas for the Gaussian/Normal Distribution.\n", 116 | "\n", 117 | "As an example of using the Gaussian distribution, suppose the training data contain a continuous attribute, x. We first segment the data by the class, and then compute the mean and variance of x in each class. Let μc be the mean of the values in x associated with class c, and let σ²c be the variance of the values in x associated with class c.
Then, the probability distribution of some value given a class, p(x=v|c), can be computed by plugging v into the equation for a Normal distribution parameterized by μc and σ²c. That is:\n", 118 | "\n", 119 | "$$p(x=v|c)=\frac{1}{\sqrt{2\pi\sigma^2_c}}\,e^{ -\frac{(v-\mu_c)^2}{2\sigma^2_c} }$$\n" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "The key to Naive Bayes is making the (rather large) assumption that the presence (or absence) of\n", 127 | "each data feature is independent of the others, conditional on the data having a certain label." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "### Part 7: Gaussian Naive Bayes with SciKit Learn\n", 135 | "\n", 136 | "Quick note: we will actually only use the SciKit Learn library in this lecture:" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 3, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "import pandas as pd\n", 146 | "from pandas import Series,DataFrame\n", 147 | "\n", 148 | "import matplotlib.pyplot as plt\n", 149 | "import seaborn as sns\n", 150 | "\n", 151 | "# Gaussian Naive Bayes\n", 152 | "from sklearn import datasets\n", 153 | "from sklearn import metrics\n", 154 | "from sklearn.naive_bayes import GaussianNB" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "Now that we have our module imports, let's go ahead and import the Iris Data Set. We have previously worked with this dataset, so go ahead and look at Lectures on MultiClass Classification for a complete breakdown of this data set!"
162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": 10, 167 | "metadata": {}, 168 | "outputs": [ 169 | { 170 | "name": "stdout", 171 | "output_type": "stream", 172 | "text": [ 173 | "Iris Plants Database\n", 174 | "\n", 175 | "Notes\n", 176 | "-----\n", 177 | "Data Set Characteristics:\n", 178 | " :Number of Instances: 150 (50 in each of three classes)\n", 179 | " :Number of Attributes: 4 numeric, predictive attributes and the class\n", 180 | " :Attribute Information:\n", 181 | " - sepal length in cm\n", 182 | " - sepal width in cm\n", 183 | " - petal length in cm\n", 184 | " - petal width in cm\n", 185 | " - class:\n", 186 | " - Iris-Setosa\n", 187 | " - Iris-Versicolour\n", 188 | " - Iris-Virginica\n", 189 | " :Summary Statistics:\n", 190 | " ============== ==== ==== ======= ===== ====================\n", 191 | " Min Max Mean SD Class Correlation\n", 192 | " ============== ==== ==== ======= ===== ====================\n", 193 | " sepal length: 4.3 7.9 5.84 0.83 0.7826\n", 194 | " sepal width: 2.0 4.4 3.05 0.43 -0.4194\n", 195 | " petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n", 196 | " petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\n", 197 | " ============== ==== ==== ======= ===== ====================\n", 198 | " :Missing Attribute Values: None\n", 199 | " :Class Distribution: 33.3% for each of 3 classes.\n", 200 | " :Creator: R.A. Fisher\n", 201 | " :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n", 202 | " :Date: July, 1988\n", 203 | "\n", 204 | "This is a copy of UCI ML iris datasets.\n", 205 | "http://archive.ics.uci.edu/ml/datasets/Iris\n", 206 | "\n", 207 | "The famous Iris database, first used by Sir R.A Fisher\n", 208 | "\n", 209 | "This is perhaps the best known database to be found in the\n", 210 | "pattern recognition literature. Fisher's paper is a classic in the field and\n", 211 | "is referenced frequently to this day. (See Duda & Hart, for example.) 
The\n", 212 | "data set contains 3 classes of 50 instances each, where each class refers to a\n", 213 | "type of iris plant. One class is linearly separable from the other 2; the\n", 214 | "latter are NOT linearly separable from each other.\n", 215 | "\n", 216 | "References\n", 217 | "----------\n", 218 | " - Fisher,R.A. \"The use of multiple measurements in taxonomic problems\"\n", 219 | " Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to\n", 220 | " Mathematical Statistics\" (John Wiley, NY, 1950).\n", 221 | " - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.\n", 222 | " (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.\n", 223 | " - Dasarathy, B.V. (1980) \"Nosing Around the Neighborhood: A New System\n", 224 | " Structure and Classification Rule for Recognition in Partially Exposed\n", 225 | " Environments\". IEEE Transactions on Pattern Analysis and Machine\n", 226 | " Intelligence, Vol. PAMI-2, No. 1, 67-71.\n", 227 | " - Gates, G.W. (1972) \"The Reduced Nearest Neighbor Rule\". IEEE Transactions\n", 228 | " on Information Theory, May 1972, 431-433.\n", 229 | " - See also: 1988 MLC Proceedings, 54-64. 
Cheeseman et al\"s AUTOCLASS II\n", 230 | "     conceptual clustering system finds 3 classes in the data.\n", 231 | "   - Many, many more ...\n", 232 | "\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "# Load the iris dataset\n", 238 | "iris = datasets.load_iris()\n", 239 | "\n", 240 | "# Grab features (X) and the Target (Y)\n", 241 | "X = iris.data\n", 242 | "\n", 243 | "Y = iris.target\n", 244 | "\n", 245 | "# Show the Built-in Data Description\n", 246 | "print iris.DESCR" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "Since we have already done a general analysis of this data in earlier lectures, let's go ahead and move on to using the Naive Bayes Method to separate this data set into multiple classes.\n", 254 | "\n", 255 | "First we create the model" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 14, 261 | "metadata": {}, 262 | "outputs": [], 263 | "source": [ 264 | "# Create a Gaussian Naive Bayes model\n", 265 | "model = GaussianNB()" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "Now that we have our model, we will continue by separating the data into training and testing sets:" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 16, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "from sklearn.cross_validation import train_test_split\n", 282 | "# Split the data into Training and Testing sets\n", 283 | "X_train, X_test, Y_train, Y_test = train_test_split(X, Y)" 284 | ]
"output_type": "execute_result" 307 | } 308 | ], 309 | "source": [ 310 | "# Fit the model on the training data\n", 311 | "model.fit(X_train,Y_train)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "Now we predict the outcomes from the Testing Set:" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 21, 324 | "metadata": { 325 | "collapsed": true 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "# Predicted outcomes\n", 330 | "predicted = model.predict(X_test)\n", 331 | "\n", 332 | "# Actual Expected Outcomes\n", 333 | "expected = Y_test" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "Finally we can see the metrics for performance:" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 24, 346 | "metadata": {}, 347 | "outputs": [ 348 | { 349 | "name": "stdout", 350 | "output_type": "stream", 351 | "text": [ 352 | "0.947368421053\n" 353 | ] 354 | } 355 | ], 356 | "source": [ 357 | "print metrics.accuracy_score(expected, predicted)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "It looks like we have about 94.7% accuracy using the Naive Bayes method!"
365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": { 376 | "collapsed": true 377 | }, 378 | "outputs": [], 379 | "source": [] 380 | } 381 | ], 382 | "metadata": { 383 | "kernelspec": { 384 | "display_name": "Python 2", 385 | "language": "python", 386 | "name": "python2" 387 | }, 388 | "language_info": { 389 | "codemirror_mode": { 390 | "name": "ipython", 391 | "version": 2 392 | }, 393 | "file_extension": ".py", 394 | "mimetype": "text/x-python", 395 | "name": "python", 396 | "nbconvert_exporter": "python", 397 | "pygments_lexer": "ipython2", 398 | "version": "2.7.13" 399 | } 400 | }, 401 | "nbformat": 4, 402 | "nbformat_minor": 1 403 | } 404 | -------------------------------------------------------------------------------- /Naive_Bayes/example_naive_bayes.py: -------------------------------------------------------------------------------- 1 | # Michelle Morales 2 | # Digital Ocean Tutorial - How to Build a Machine Learning Classifier in Python with Scikit-learn 3 | # https://www.digitalocean.com/community/tutorials/how-to-build-a-machine-learning-classifier-in-python-with-scikit-learn 4 | 5 | 6 | from sklearn.datasets import load_breast_cancer 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.naive_bayes import GaussianNB 9 | from sklearn.metrics import accuracy_score 10 | 11 | # Load dataset 12 | data = load_breast_cancer() 13 | 14 | # Organize our data 15 | label_names = data['target_names'] 16 | labels = data['target'] 17 | feature_names = data['feature_names'] 18 | features = data['data'] 19 | 20 | # Look at our data 21 | print (label_names) 22 | print ('Class label = ', labels[0]) 23 | print (feature_names) 24 | print (features[0]) 25 | 26 | # Split our data 27 | train, test, train_labels, test_labels = train_test_split(features, labels, 28 | test_size=0.33, random_state=42) 29 | # Initialize our 
classifier 30 | gnb = GaussianNB() 31 | 32 | # Train our classifier 33 | model = gnb.fit(train, train_labels) 34 | 35 | # Make predictions 36 | preds = gnb.predict(test) 37 | print(preds) 38 | 39 | # Evaluate accuracy 40 | print (accuracy_score(test_labels, preds)) 41 | -------------------------------------------------------------------------------- /Neural_Networks/Hebb_rule.m: -------------------------------------------------------------------------------- 1 | x= [1 1 -1 -1;1 -1 1 -1]; 2 | t= [1 -1 -1 -1]; 3 | w= [0 0]; 4 | b=0; 5 | 6 | for i= 1:4; 7 | for j= 1:2 8 | w(j)=w(j)+t(i)*x(j,i); 9 | end 10 | b= b+t(i); 11 | end 12 | 13 | disp('Final Weight Matrix: '); 14 | disp(w); 15 | disp('Final bias Values'); 16 | disp(b); 17 | 18 | % Plotting by Linear Separability Concept 19 | plot(x(1,1),x(2,1), 'or','MarkerSize',20,'MarkerFaceColor',[0 0 1]);hold on; 20 | plot(x(1,2),x(2,2), 'or','MarkerSize',20,'MarkerFaceColor',[1 0 0]);hold on; 21 | plot(x(1,3),x(2,3), 'or','MarkerSize',20,'MarkerFaceColor',[1 0 0]);hold on; 22 | plot(x(1,4),x(2,4), 'or','MarkerSize',20,'MarkerFaceColor',[1 0 0]);hold on; 23 | hold on; 24 | 25 | m= -(w(1)/w(2)); 26 | c= -b/w(2); 27 | 28 | x1=linspace(-2,2,100); 29 | x2=m*x1+c; 30 | plot(x2,x1,'r'); 31 | axis([-2 2 -2 2]); 32 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Lectures-On-Machine-Learning 2 | 3 | ## 4 | 5 | Repository contains reference material for Cybros Lecture Series on ML. 6 | 7 | ## 8 | 9 | 10 | The repository has been divided into various ML-algorithms and sections. 
11 | Only well-documented and well-commented code in Python, MATLAB, or R will be considered, as this repository is for beginner lectures in Machine Learning 12 | 13 | ![DUB](https://img.shields.io/dub/l/vibe-d.svg?style=flat) 14 | 15 | [![Join the chat](https://img.shields.io/badge/gitter-join%20chat%20%E2%86%92-brightgreen.svg)](https://gitter.im/LNMIIT-Computer-Club/Lobby) 16 | -------------------------------------------------------------------------------- /RESOURCES.MD: -------------------------------------------------------------------------------- 1 | # Resources 2 | 3 | ## Useful Information and Links 4 | 5 | * [TensorFlow](https://www.tensorflow.org/) 6 | * [Coursera ML Course](https://www.coursera.org/learn/machine-learning) -------------------------------------------------------------------------------- /Random Forests/intro_to_random_forest.py: -------------------------------------------------------------------------------- 1 | 2 | # Random Forest Classifier 3 | 4 | from sklearn.ensemble import RandomForestClassifier 5 | 6 | 7 | features = [[0, 0], [1, 1], [80,3], [56,5], [70,5], [45,1]] 8 | labels = [0, 1, 1, 1, 1, 0] 9 | 10 | clf = RandomForestClassifier(n_estimators=10) 11 | # n_estimators sets the number of trees in the forest 12 | 13 | clf = clf.fit(features, labels) # fitting the data to the model (training) 14 | 15 | print(clf.predict([[2., 2.]])) # prediction (testing) 16 | print(clf.predict([[58,5]])) 17 | -------------------------------------------------------------------------------- /Support Vector Machines/.ipynb_checkpoints/SVM-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Support Vector Machines\n", 8 | "\n", 9 | "\n", 10 | "## Import Libraries" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "collapsed": true 18
}, 19 | "outputs": [], 20 | "source": [ 21 | "import pandas as pd\n", 22 | "import numpy as np\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import seaborn as sns\n", 25 | "%matplotlib inline" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Get the Data\n", 33 | "\n", 34 | "Using the built-in breast-cancer dataset of sklearn " 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": { 41 | "collapsed": true 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "from sklearn.datasets import load_breast_cancer" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": { 52 | "collapsed": true 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "cancer = load_breast_cancer()" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "The data set is presented in a dictionary form:" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "['target_names', 'data', 'target', 'DESCR', 'feature_names']" 75 | ] 76 | }, 77 | "execution_count": 4, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "cancer.keys()" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "We can grab information and arrays out of this dictionary to set up our data frame and understanding of the features:" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 5, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "Breast Cancer Wisconsin (Diagnostic) Database\n", 103 | "=============================================\n", 104 | "\n", 105 | "Notes\n", 106 | "-----\n", 107 | "Data Set Characteristics:\n", 108 | " :Number of Instances: 569\n", 109 | "\n", 110 | " :Number of 
Attributes: 30 numeric, predictive attributes and the class\n", 111 | "\n", 112 | " :Attribute Information:\n", 113 | " - radius (mean of distances from center to points on the perimeter)\n", 114 | " - texture (standard deviation of gray-scale values)\n", 115 | " - perimeter\n", 116 | " - area\n", 117 | " - smoothness (local variation in radius lengths)\n", 118 | " - compactness (perimeter^2 / area - 1.0)\n", 119 | " - concavity (severity of concave portions of the contour)\n", 120 | " - concave points (number of concave portions of the contour)\n", 121 | " - symmetry \n", 122 | " - fractal dimension (\"coastline approximation\" - 1)\n", 123 | "\n", 124 | " The mean, standard error, and \"worst\" or largest (mean of the three\n", 125 | " largest values) of these features were computed for each image,\n", 126 | " resulting in 30 features. For instance, field 3 is Mean Radius, field\n", 127 | " 13 is Radius SE, field 23 is Worst Radius.\n", 128 | "\n", 129 | " - class:\n", 130 | " - WDBC-Malignant\n", 131 | " - WDBC-Benign\n", 132 | "\n", 133 | " :Summary Statistics:\n", 134 | "\n", 135 | " ===================================== ====== ======\n", 136 | " Min Max\n", 137 | " ===================================== ====== ======\n", 138 | " radius (mean): 6.981 28.11\n", 139 | " texture (mean): 9.71 39.28\n", 140 | " perimeter (mean): 43.79 188.5\n", 141 | " area (mean): 143.5 2501.0\n", 142 | " smoothness (mean): 0.053 0.163\n", 143 | " compactness (mean): 0.019 0.345\n", 144 | " concavity (mean): 0.0 0.427\n", 145 | " concave points (mean): 0.0 0.201\n", 146 | " symmetry (mean): 0.106 0.304\n", 147 | " fractal dimension (mean): 0.05 0.097\n", 148 | " radius (standard error): 0.112 2.873\n", 149 | " texture (standard error): 0.36 4.885\n", 150 | " perimeter (standard error): 0.757 21.98\n", 151 | " area (standard error): 6.802 542.2\n", 152 | " smoothness (standard error): 0.002 0.031\n", 153 | " compactness (standard error): 0.002 0.135\n", 154 | " concavity (standard 
error): 0.0 0.396\n", 155 | " concave points (standard error): 0.0 0.053\n", 156 | " symmetry (standard error): 0.008 0.079\n", 157 | " fractal dimension (standard error): 0.001 0.03\n", 158 | " radius (worst): 7.93 36.04\n", 159 | " texture (worst): 12.02 49.54\n", 160 | " perimeter (worst): 50.41 251.2\n", 161 | " area (worst): 185.2 4254.0\n", 162 | " smoothness (worst): 0.071 0.223\n", 163 | " compactness (worst): 0.027 1.058\n", 164 | " concavity (worst): 0.0 1.252\n", 165 | " concave points (worst): 0.0 0.291\n", 166 | " symmetry (worst): 0.156 0.664\n", 167 | " fractal dimension (worst): 0.055 0.208\n", 168 | " ===================================== ====== ======\n", 169 | "\n", 170 | " :Missing Attribute Values: None\n", 171 | "\n", 172 | " :Class Distribution: 212 - Malignant, 357 - Benign\n", 173 | "\n", 174 | " :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n", 175 | "\n", 176 | " :Donor: Nick Street\n", 177 | "\n", 178 | " :Date: November, 1995\n", 179 | "\n", 180 | "This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.\n", 181 | "https://goo.gl/U2Uwz2\n", 182 | "\n", 183 | "Features are computed from a digitized image of a fine needle\n", 184 | "aspirate (FNA) of a breast mass. They describe\n", 185 | "characteristics of the cell nuclei present in the image.\n", 186 | "\n", 187 | "Separating plane described above was obtained using\n", 188 | "Multisurface Method-Tree (MSM-T) [K. P. Bennett, \"Decision Tree\n", 189 | "Construction Via Linear Programming.\" Proceedings of the 4th\n", 190 | "Midwest Artificial Intelligence and Cognitive Science Society,\n", 191 | "pp. 97-101, 1992], a classification method which uses linear\n", 192 | "programming to construct a decision tree. 
Relevant features\n", 193 | "were selected using an exhaustive search in the space of 1-4\n", 194 | "features and 1-3 separating planes.\n", 195 | "\n", 196 | "The actual linear program used to obtain the separating plane\n", 197 | "in the 3-dimensional space is that described in:\n", 198 | "[K. P. Bennett and O. L. Mangasarian: \"Robust Linear\n", 199 | "Programming Discrimination of Two Linearly Inseparable Sets\",\n", 200 | "Optimization Methods and Software 1, 1992, 23-34].\n", 201 | "\n", 202 | "This database is also available through the UW CS ftp server:\n", 203 | "\n", 204 | "ftp ftp.cs.wisc.edu\n", 205 | "cd math-prog/cpo-dataset/machine-learn/WDBC/\n", 206 | "\n", 207 | "References\n", 208 | "----------\n", 209 | " - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n", 210 | " for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n", 211 | " Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n", 212 | " San Jose, CA, 1993.\n", 213 | " - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and \n", 214 | " prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n", 215 | " July-August 1995.\n", 216 | " - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n", 217 | " to diagnose breast cancer from fine-needle aspirates. 
Cancer Letters 77 (1994) \n", 218 | " 163-171.\n", 219 | "\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "print(cancer['DESCR'])" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 6, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "data": { 234 | "text/plain": [ 235 | "array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',\n", 236 | " 'mean smoothness', 'mean compactness', 'mean concavity',\n", 237 | " 'mean concave points', 'mean symmetry', 'mean fractal dimension',\n", 238 | " 'radius error', 'texture error', 'perimeter error', 'area error',\n", 239 | " 'smoothness error', 'compactness error', 'concavity error',\n", 240 | " 'concave points error', 'symmetry error', 'fractal dimension error',\n", 241 | " 'worst radius', 'worst texture', 'worst perimeter', 'worst area',\n", 242 | " 'worst smoothness', 'worst compactness', 'worst concavity',\n", 243 | " 'worst concave points', 'worst symmetry', 'worst fractal dimension'], \n", 244 | " dtype='|S23')" 245 | ] 246 | }, 247 | "execution_count": 6, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "cancer['feature_names']" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "## Set up DataFrame" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 7, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | "RangeIndex: 569 entries, 0 to 568\n", 274 | "Data columns (total 30 columns):\n", 275 | "mean radius 569 non-null float64\n", 276 | "mean texture 569 non-null float64\n", 277 | "mean perimeter 569 non-null float64\n", 278 | "mean area 569 non-null float64\n", 279 | "mean smoothness 569 non-null float64\n", 280 | "mean compactness 569 non-null float64\n", 281 | "mean concavity 569 non-null float64\n", 282 | "mean concave points 569 non-null float64\n", 283 | "mean 
symmetry 569 non-null float64\n", 284 | "mean fractal dimension 569 non-null float64\n", 285 | "radius error 569 non-null float64\n", 286 | "texture error 569 non-null float64\n", 287 | "perimeter error 569 non-null float64\n", 288 | "area error 569 non-null float64\n", 289 | "smoothness error 569 non-null float64\n", 290 | "compactness error 569 non-null float64\n", 291 | "concavity error 569 non-null float64\n", 292 | "concave points error 569 non-null float64\n", 293 | "symmetry error 569 non-null float64\n", 294 | "fractal dimension error 569 non-null float64\n", 295 | "worst radius 569 non-null float64\n", 296 | "worst texture 569 non-null float64\n", 297 | "worst perimeter 569 non-null float64\n", 298 | "worst area 569 non-null float64\n", 299 | "worst smoothness 569 non-null float64\n", 300 | "worst compactness 569 non-null float64\n", 301 | "worst concavity 569 non-null float64\n", 302 | "worst concave points 569 non-null float64\n", 303 | "worst symmetry 569 non-null float64\n", 304 | "worst fractal dimension 569 non-null float64\n", 305 | "dtypes: float64(30)\n", 306 | "memory usage: 133.4 KB\n" 307 | ] 308 | } 309 | ], 310 | "source": [ 311 | "df_feat = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])\n", 312 | "df_feat.info()" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 8, 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "data": { 322 | "text/plain": [ 323 | "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,\n", 324 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,\n", 325 | " 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1,\n", 326 | " 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,\n", 327 | " 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1,\n", 328 | " 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,\n", 329 | " 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 
1, 1, 1,\n", 330 | " 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,\n", 331 | " 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,\n", 332 | " 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,\n", 333 | " 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,\n", 334 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,\n", 335 | " 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,\n", 336 | " 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,\n", 337 | " 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1,\n", 338 | " 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,\n", 339 | " 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,\n", 340 | " 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,\n", 341 | " 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1,\n", 342 | " 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,\n", 343 | " 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,\n", 344 | " 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,\n", 345 | " 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,\n", 346 | " 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", 347 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])" 348 | ] 349 | }, 350 | "execution_count": 8, 351 | "metadata": {}, 352 | "output_type": "execute_result" 353 | } 354 | ], 355 | "source": [ 356 | "cancer['target']" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 9, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "df_target = pd.DataFrame(cancer['target'],columns=['Cancer'])" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 29, 371 | "metadata": {}, 372 | "outputs": [ 373 | { 374 | "data": { 375 | "text/html": [ 376 | 
"
\n", 377 | "\n", 390 | "\n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | "
Cancer
00
10
20
30
40
\n", 420 | "
" 421 | ], 422 | "text/plain": [ 423 | " Cancer\n", 424 | "0 0\n", 425 | "1 0\n", 426 | "2 0\n", 427 | "3 0\n", 428 | "4 0" 429 | ] 430 | }, 431 | "execution_count": 29, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "df_target.head()" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "## Train Test Split" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 11, 450 | "metadata": { 451 | "collapsed": true 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "from sklearn.model_selection import train_test_split" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 12, 461 | "metadata": { 462 | "collapsed": true 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target), test_size=0.30, random_state=101)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "# Train the Support Vector Classifier" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 13, 479 | "metadata": { 480 | "collapsed": true 481 | }, 482 | "outputs": [], 483 | "source": [ 484 | "from sklearn.svm import SVC" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 14, 490 | "metadata": { 491 | "collapsed": true 492 | }, 493 | "outputs": [], 494 | "source": [ 495 | "model = SVC()" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 15, 501 | "metadata": {}, 502 | "outputs": [ 503 | { 504 | "data": { 505 | "text/plain": [ 506 | "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 507 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 508 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 509 | " tol=0.001, verbose=False)" 510 | ] 511 | }, 512 | "execution_count": 15, 513 | "metadata": {}, 514 | 
"output_type": "execute_result" 515 | } 516 | ], 517 | "source": [ 518 | "model.fit(X_train,y_train)" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "## Predictions and Evaluations\n" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 16, 531 | "metadata": { 532 | "collapsed": true 533 | }, 534 | "outputs": [], 535 | "source": [ 536 | "predictions = model.predict(X_test)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 17, 542 | "metadata": { 543 | "collapsed": true 544 | }, 545 | "outputs": [], 546 | "source": [ 547 | "from sklearn.metrics import classification_report,confusion_matrix" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 18, 553 | "metadata": {}, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "[[ 0 66]\n", 560 | " [ 0 105]]\n" 561 | ] 562 | } 563 | ], 564 | "source": [ 565 | "print(confusion_matrix(y_test,predictions))" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 19, 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | " precision recall f1-score support\n", 578 | "\n", 579 | " 0 0.00 0.00 0.00 66\n", 580 | " 1 0.61 1.00 0.76 105\n", 581 | "\n", 582 | "avg / total 0.38 0.61 0.47 171\n", 583 | "\n" 584 | ] 585 | }, 586 | { 587 | "name": "stderr", 588 | "output_type": "stream", 589 | "text": [ 590 | "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n", 591 | " 'precision', 'predicted', average, warn_for)\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "print(classification_report(y_test,predictions))" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | 
"Notice that we are classifying everything into a single class! This means our model needs to have it parameters adjusted (it may also help to normalize the data)." 604 | ] 605 | }, 606 | { 607 | "cell_type": "markdown", 608 | "metadata": {}, 609 | "source": [ 610 | "# Gridsearch\n", 611 | "\n", 612 | "Finding the right parameters (like what C or gamma values to use) is a tricky task! This idea of creating a 'grid' of parameters and just trying out all the possible combinations is called a Gridsearch, this method is common enough that Scikit-learn has this functionality built in with GridSearchCV! The CV stands for cross-validation which is the GridSearchCV takes a dictionary that describes the parameters that should be tried and a model to train. The grid of parameters is defined as a dictionary, where the keys are the parameters and the values are the settings to be tested. " 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 20, 618 | "metadata": { 619 | "collapsed": true 620 | }, 621 | "outputs": [], 622 | "source": [ 623 | "param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001], 'kernel': ['rbf']} " 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 21, 629 | "metadata": { 630 | "collapsed": true 631 | }, 632 | "outputs": [], 633 | "source": [ 634 | "from sklearn.model_selection import GridSearchCV" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": 22, 640 | "metadata": { 641 | "collapsed": true 642 | }, 643 | "outputs": [], 644 | "source": [ 645 | "grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "What fit does is a bit more involved then usual. First, it runs the same loop with cross-validation, to find the best parameter combination. 
Once it has the best combination, it runs fit again on all the data passed to fit (without cross-validation), to build a single new model using the best parameter setting." 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": 23, 658 | "metadata": {}, 659 | "outputs": [ 660 | { 661 | "name": "stdout", 662 | "output_type": "stream", 663 | "text": [ 664 | "Fitting 3 folds for each of 25 candidates, totalling 75 fits\n", 665 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 666 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.631579, total= 0.0s\n", 667 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 668 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.631579, total= 0.0s\n", 669 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 670 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.636364, total= 0.0s\n", 671 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 672 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.631579, total= 0.0s\n", 673 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 674 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.631579, total= 0.0s\n", 675 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 676 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.636364, total= 0.0s\n", 677 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 678 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.631579, total= 0.0s\n", 679 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 680 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.631579, total= 0.0s\n", 681 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 682 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.636364, total= 0.0s\n", 683 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n", 684 | "[CV] ... 
kernel=rbf, C=0.1, gamma=0.001, score=0.631579, total= 0.0s\n", 685 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n", 686 | "[CV] ... kernel=rbf, C=0.1, gamma=0.001, score=0.631579, total= 0.0s\n", 687 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n" 688 | ] 689 | }, 690 | { 691 | "name": "stderr", 692 | "output_type": "stream", 693 | "text": [ 694 | "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", 695 | "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" 696 | ] 697 | }, 698 | { 699 | "name": "stdout", 700 | "output_type": "stream", 701 | "text": [ 702 | "[CV] ... kernel=rbf, C=0.1, gamma=0.001, score=0.636364, total= 0.0s\n", 703 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 704 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.902256, total= 0.0s\n", 705 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 706 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.962406, total= 0.0s\n", 707 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 708 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.916667, total= 0.0s\n", 709 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 710 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.631579, total= 0.0s\n", 711 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 712 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.631579, total= 0.0s\n", 713 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 714 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.636364, total= 0.0s\n", 715 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 716 | "[CV] ....... kernel=rbf, C=1, gamma=0.1, score=0.631579, total= 0.0s\n", 717 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 718 | "[CV] ....... 
kernel=rbf, C=1, gamma=0.1, score=0.631579, total= 0.0s\n", 719 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 720 | "[CV] ....... kernel=rbf, C=1, gamma=0.1, score=0.636364, total= 0.0s\n", 721 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 722 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.631579, total= 0.0s\n", 723 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 724 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.631579, total= 0.0s\n", 725 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 726 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.636364, total= 0.0s\n", 727 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 728 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.902256, total= 0.0s\n", 729 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 730 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.939850, total= 0.0s\n", 731 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 732 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.954545, total= 0.0s\n", 733 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 734 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.939850, total= 0.0s\n", 735 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 736 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.969925, total= 0.0s\n", 737 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 738 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.946970, total= 0.0s\n", 739 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 740 | "[CV] ........ kernel=rbf, C=10, gamma=1, score=0.631579, total= 0.0s\n", 741 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 742 | "[CV] ........ 
kernel=rbf, C=10, gamma=1, score=0.631579, total= 0.0s\n", 743 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 744 | "[CV] ........ kernel=rbf, C=10, gamma=1, score=0.636364, total= 0.0s\n", 745 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 746 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.631579, total= 0.0s\n", 747 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 748 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.631579, total= 0.0s\n", 749 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 750 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.636364, total= 0.0s\n", 751 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 752 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.631579, total= 0.0s\n", 753 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 754 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.631579, total= 0.0s\n", 755 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 756 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.636364, total= 0.0s\n", 757 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 758 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.894737, total= 0.0s\n", 759 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 760 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.932331, total= 0.0s\n", 761 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 762 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.916667, total= 0.0s\n", 763 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 764 | "[CV] ... kernel=rbf, C=10, gamma=0.0001, score=0.932331, total= 0.0s\n", 765 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 766 | "[CV] ... 
kernel=rbf, C=10, gamma=0.0001, score=0.969925, total= 0.0s\n", 767 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 768 | "[CV] ... kernel=rbf, C=10, gamma=0.0001, score=0.962121, total= 0.0s\n", 769 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 770 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.631579, total= 0.0s\n", 771 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 772 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.631579, total= 0.0s\n", 773 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 774 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.636364, total= 0.0s\n", 775 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 776 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.631579, total= 0.0s\n", 777 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 778 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.631579, total= 0.0s\n", 779 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 780 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.636364, total= 0.0s\n", 781 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 782 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.631579, total= 0.0s\n", 783 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 784 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.631579, total= 0.0s\n", 785 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 786 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.636364, total= 0.0s\n", 787 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 788 | "[CV] ... kernel=rbf, C=100, gamma=0.001, score=0.894737, total= 0.0s\n", 789 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 790 | "[CV] ... 
kernel=rbf, C=100, gamma=0.001, score=0.932331, total= 0.0s\n", 791 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 792 | "[CV] ... kernel=rbf, C=100, gamma=0.001, score=0.916667, total= 0.0s\n", 793 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 794 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.917293, total= 0.0s\n", 795 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 796 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.977444, total= 0.0s\n", 797 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 798 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.939394, total= 0.0s\n", 799 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 800 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.631579, total= 0.0s\n", 801 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 802 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.631579, total= 0.0s\n", 803 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 804 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.636364, total= 0.0s\n", 805 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 806 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.631579, total= 0.0s\n", 807 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 808 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.631579, total= 0.0s\n", 809 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 810 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.636364, total= 0.0s\n", 811 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 812 | "[CV] ... kernel=rbf, C=1000, gamma=0.01, score=0.631579, total= 0.0s\n", 813 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 814 | "[CV] ... 
kernel=rbf, C=1000, gamma=0.01, score=0.631579, total= 0.0s\n", 815 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 816 | "[CV] ... kernel=rbf, C=1000, gamma=0.01, score=0.636364, total= 0.0s\n", 817 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 818 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.894737, total= 0.0s\n", 819 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 820 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.932331, total= 0.0s\n", 821 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 822 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.916667, total= 0.0s\n", 823 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 824 | "[CV] . kernel=rbf, C=1000, gamma=0.0001, score=0.909774, total= 0.0s\n", 825 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 826 | "[CV] . kernel=rbf, C=1000, gamma=0.0001, score=0.969925, total= 0.0s\n", 827 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 828 | "[CV] . 
kernel=rbf, C=1000, gamma=0.0001, score=0.931818, total= 0.0s\n" 829 | ] 830 | }, 831 | { 832 | "name": "stderr", 833 | "output_type": "stream", 834 | "text": [ 835 | "[Parallel(n_jobs=1)]: Done 75 out of 75 | elapsed: 1.2s finished\n" 836 | ] 837 | }, 838 | { 839 | "data": { 840 | "text/plain": [ 841 | "GridSearchCV(cv=None, error_score='raise',\n", 842 | " estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 843 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 844 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 845 | " tol=0.001, verbose=False),\n", 846 | " fit_params={}, iid=True, n_jobs=1,\n", 847 | " param_grid={'kernel': ['rbf'], 'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},\n", 848 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 849 | " scoring=None, verbose=3)" 850 | ] 851 | }, 852 | "execution_count": 23, 853 | "metadata": {}, 854 | "output_type": "execute_result" 855 | } 856 | ], 857 | "source": [ 858 | "# May take awhile!\n", 859 | "grid.fit(X_train,y_train)" 860 | ] 861 | }, 862 | { 863 | "cell_type": "markdown", 864 | "metadata": {}, 865 | "source": [ 866 | "You can inspect the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best\\_estimator_ attribute:" 867 | ] 868 | }, 869 | { 870 | "cell_type": "code", 871 | "execution_count": 24, 872 | "metadata": {}, 873 | "outputs": [ 874 | { 875 | "data": { 876 | "text/plain": [ 877 | "{'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}" 878 | ] 879 | }, 880 | "execution_count": 24, 881 | "metadata": {}, 882 | "output_type": "execute_result" 883 | } 884 | ], 885 | "source": [ 886 | "grid.best_params_" 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 25, 892 | "metadata": {}, 893 | "outputs": [ 894 | { 895 | "data": { 896 | "text/plain": [ 897 | "SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,\n", 898 | " 
decision_function_shape=None, degree=3, gamma=0.0001, kernel='rbf',\n", 899 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 900 | " tol=0.001, verbose=False)" 901 | ] 902 | }, 903 | "execution_count": 25, 904 | "metadata": {}, 905 | "output_type": "execute_result" 906 | } 907 | ], 908 | "source": [ 909 | "grid.best_estimator_" 910 | ] 911 | }, 912 | { 913 | "cell_type": "markdown", 914 | "metadata": {}, 915 | "source": [ 916 | "Then you can re-run predictions on this grid object just like you would with a normal model." 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 26, 922 | "metadata": { 923 | "collapsed": true 924 | }, 925 | "outputs": [], 926 | "source": [ 927 | "grid_predictions = grid.predict(X_test)" 928 | ] 929 | }, 930 | { 931 | "cell_type": "code", 932 | "execution_count": 27, 933 | "metadata": {}, 934 | "outputs": [ 935 | { 936 | "name": "stdout", 937 | "output_type": "stream", 938 | "text": [ 939 | "[[ 60 6]\n", 940 | " [ 3 102]]\n" 941 | ] 942 | } 943 | ], 944 | "source": [ 945 | "print(confusion_matrix(y_test,grid_predictions))" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": 28, 951 | "metadata": {}, 952 | "outputs": [ 953 | { 954 | "name": "stdout", 955 | "output_type": "stream", 956 | "text": [ 957 | " precision recall f1-score support\n", 958 | "\n", 959 | " 0 0.95 0.91 0.93 66\n", 960 | " 1 0.94 0.97 0.96 105\n", 961 | "\n", 962 | "avg / total 0.95 0.95 0.95 171\n", 963 | "\n" 964 | ] 965 | } 966 | ], 967 | "source": [ 968 | "print(classification_report(y_test,grid_predictions))" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "# The End!" 
976 | ] 977 | } 978 | ], 979 | "metadata": { 980 | "kernelspec": { 981 | "display_name": "Python 3", 982 | "language": "python", 983 | "name": "python3" 984 | }, 985 | "language_info": { 986 | "codemirror_mode": { 987 | "name": "ipython", 988 | "version": 2 989 | }, 990 | "file_extension": ".py", 991 | "mimetype": "text/x-python", 992 | "name": "python", 993 | "nbconvert_exporter": "python", 994 | "pygments_lexer": "ipython2", 995 | "version": "2.7.12" 996 | } 997 | }, 998 | "nbformat": 4, 999 | "nbformat_minor": 1 1000 | } 1001 | -------------------------------------------------------------------------------- /Support Vector Machines/SVM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Support Vector Machines\n", 8 | "\n", 9 | "\n", 10 | "## Import Libraries" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "collapsed": true 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import pandas as pd\n", 22 | "import numpy as np\n", 23 | "import matplotlib.pyplot as plt\n", 24 | "import seaborn as sns\n", 25 | "%matplotlib inline" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Get the Data\n", 33 | "\n", 34 | "Using the built-in breast cancer dataset from scikit-learn." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": { 41 | "collapsed": true 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "from sklearn.datasets import load_breast_cancer" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": { 52 | "collapsed": true 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "cancer = load_breast_cancer()" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "The dataset is presented as a dictionary:" 64 | ] 65 | }, 66 | { 
67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "['target_names', 'data', 'target', 'DESCR', 'feature_names']" 75 | ] 76 | }, 77 | "execution_count": 4, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "cancer.keys()" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "We can grab information and arrays out of this dictionary to set up our data frame and understanding of the features:" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 5, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "Breast Cancer Wisconsin (Diagnostic) Database\n", 103 | "=============================================\n", 104 | "\n", 105 | "Notes\n", 106 | "-----\n", 107 | "Data Set Characteristics:\n", 108 | " :Number of Instances: 569\n", 109 | "\n", 110 | " :Number of Attributes: 30 numeric, predictive attributes and the class\n", 111 | "\n", 112 | " :Attribute Information:\n", 113 | " - radius (mean of distances from center to points on the perimeter)\n", 114 | " - texture (standard deviation of gray-scale values)\n", 115 | " - perimeter\n", 116 | " - area\n", 117 | " - smoothness (local variation in radius lengths)\n", 118 | " - compactness (perimeter^2 / area - 1.0)\n", 119 | " - concavity (severity of concave portions of the contour)\n", 120 | " - concave points (number of concave portions of the contour)\n", 121 | " - symmetry \n", 122 | " - fractal dimension (\"coastline approximation\" - 1)\n", 123 | "\n", 124 | " The mean, standard error, and \"worst\" or largest (mean of the three\n", 125 | " largest values) of these features were computed for each image,\n", 126 | " resulting in 30 features. 
For instance, field 3 is Mean Radius, field\n", 127 | " 13 is Radius SE, field 23 is Worst Radius.\n", 128 | "\n", 129 | " - class:\n", 130 | " - WDBC-Malignant\n", 131 | " - WDBC-Benign\n", 132 | "\n", 133 | " :Summary Statistics:\n", 134 | "\n", 135 | " ===================================== ====== ======\n", 136 | " Min Max\n", 137 | " ===================================== ====== ======\n", 138 | " radius (mean): 6.981 28.11\n", 139 | " texture (mean): 9.71 39.28\n", 140 | " perimeter (mean): 43.79 188.5\n", 141 | " area (mean): 143.5 2501.0\n", 142 | " smoothness (mean): 0.053 0.163\n", 143 | " compactness (mean): 0.019 0.345\n", 144 | " concavity (mean): 0.0 0.427\n", 145 | " concave points (mean): 0.0 0.201\n", 146 | " symmetry (mean): 0.106 0.304\n", 147 | " fractal dimension (mean): 0.05 0.097\n", 148 | " radius (standard error): 0.112 2.873\n", 149 | " texture (standard error): 0.36 4.885\n", 150 | " perimeter (standard error): 0.757 21.98\n", 151 | " area (standard error): 6.802 542.2\n", 152 | " smoothness (standard error): 0.002 0.031\n", 153 | " compactness (standard error): 0.002 0.135\n", 154 | " concavity (standard error): 0.0 0.396\n", 155 | " concave points (standard error): 0.0 0.053\n", 156 | " symmetry (standard error): 0.008 0.079\n", 157 | " fractal dimension (standard error): 0.001 0.03\n", 158 | " radius (worst): 7.93 36.04\n", 159 | " texture (worst): 12.02 49.54\n", 160 | " perimeter (worst): 50.41 251.2\n", 161 | " area (worst): 185.2 4254.0\n", 162 | " smoothness (worst): 0.071 0.223\n", 163 | " compactness (worst): 0.027 1.058\n", 164 | " concavity (worst): 0.0 1.252\n", 165 | " concave points (worst): 0.0 0.291\n", 166 | " symmetry (worst): 0.156 0.664\n", 167 | " fractal dimension (worst): 0.055 0.208\n", 168 | " ===================================== ====== ======\n", 169 | "\n", 170 | " :Missing Attribute Values: None\n", 171 | "\n", 172 | " :Class Distribution: 212 - Malignant, 357 - Benign\n", 173 | "\n", 174 | " :Creator: Dr. 
William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n", 175 | "\n", 176 | " :Donor: Nick Street\n", 177 | "\n", 178 | " :Date: November, 1995\n", 179 | "\n", 180 | "This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.\n", 181 | "https://goo.gl/U2Uwz2\n", 182 | "\n", 183 | "Features are computed from a digitized image of a fine needle\n", 184 | "aspirate (FNA) of a breast mass. They describe\n", 185 | "characteristics of the cell nuclei present in the image.\n", 186 | "\n", 187 | "Separating plane described above was obtained using\n", 188 | "Multisurface Method-Tree (MSM-T) [K. P. Bennett, \"Decision Tree\n", 189 | "Construction Via Linear Programming.\" Proceedings of the 4th\n", 190 | "Midwest Artificial Intelligence and Cognitive Science Society,\n", 191 | "pp. 97-101, 1992], a classification method which uses linear\n", 192 | "programming to construct a decision tree. Relevant features\n", 193 | "were selected using an exhaustive search in the space of 1-4\n", 194 | "features and 1-3 separating planes.\n", 195 | "\n", 196 | "The actual linear program used to obtain the separating plane\n", 197 | "in the 3-dimensional space is that described in:\n", 198 | "[K. P. Bennett and O. L. Mangasarian: \"Robust Linear\n", 199 | "Programming Discrimination of Two Linearly Inseparable Sets\",\n", 200 | "Optimization Methods and Software 1, 1992, 23-34].\n", 201 | "\n", 202 | "This database is also available through the UW CS ftp server:\n", 203 | "\n", 204 | "ftp ftp.cs.wisc.edu\n", 205 | "cd math-prog/cpo-dataset/machine-learn/WDBC/\n", 206 | "\n", 207 | "References\n", 208 | "----------\n", 209 | " - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n", 210 | " for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n", 211 | " Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n", 212 | " San Jose, CA, 1993.\n", 213 | " - O.L. Mangasarian, W.N. Street and W.H. Wolberg. 
Breast cancer diagnosis and \n", 214 | " prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n", 215 | " July-August 1995.\n", 216 | " - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n", 217 | " to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n", 218 | " 163-171.\n", 219 | "\n" 220 | ] 221 | } 222 | ], 223 | "source": [ 224 | "print(cancer['DESCR'])" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 6, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "data": { 234 | "text/plain": [ 235 | "array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',\n", 236 | " 'mean smoothness', 'mean compactness', 'mean concavity',\n", 237 | " 'mean concave points', 'mean symmetry', 'mean fractal dimension',\n", 238 | " 'radius error', 'texture error', 'perimeter error', 'area error',\n", 239 | " 'smoothness error', 'compactness error', 'concavity error',\n", 240 | " 'concave points error', 'symmetry error', 'fractal dimension error',\n", 241 | " 'worst radius', 'worst texture', 'worst perimeter', 'worst area',\n", 242 | " 'worst smoothness', 'worst compactness', 'worst concavity',\n", 243 | " 'worst concave points', 'worst symmetry', 'worst fractal dimension'], \n", 244 | " dtype='|S23')" 245 | ] 246 | }, 247 | "execution_count": 6, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "cancer['feature_names']" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "## Set up DataFrame" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 7, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | "RangeIndex: 569 entries, 0 to 568\n", 274 | "Data columns (total 30 columns):\n", 275 | "mean radius 569 non-null float64\n", 276 | "mean texture 569 non-null 
float64\n", 277 | "mean perimeter 569 non-null float64\n", 278 | "mean area 569 non-null float64\n", 279 | "mean smoothness 569 non-null float64\n", 280 | "mean compactness 569 non-null float64\n", 281 | "mean concavity 569 non-null float64\n", 282 | "mean concave points 569 non-null float64\n", 283 | "mean symmetry 569 non-null float64\n", 284 | "mean fractal dimension 569 non-null float64\n", 285 | "radius error 569 non-null float64\n", 286 | "texture error 569 non-null float64\n", 287 | "perimeter error 569 non-null float64\n", 288 | "area error 569 non-null float64\n", 289 | "smoothness error 569 non-null float64\n", 290 | "compactness error 569 non-null float64\n", 291 | "concavity error 569 non-null float64\n", 292 | "concave points error 569 non-null float64\n", 293 | "symmetry error 569 non-null float64\n", 294 | "fractal dimension error 569 non-null float64\n", 295 | "worst radius 569 non-null float64\n", 296 | "worst texture 569 non-null float64\n", 297 | "worst perimeter 569 non-null float64\n", 298 | "worst area 569 non-null float64\n", 299 | "worst smoothness 569 non-null float64\n", 300 | "worst compactness 569 non-null float64\n", 301 | "worst concavity 569 non-null float64\n", 302 | "worst concave points 569 non-null float64\n", 303 | "worst symmetry 569 non-null float64\n", 304 | "worst fractal dimension 569 non-null float64\n", 305 | "dtypes: float64(30)\n", 306 | "memory usage: 133.4 KB\n" 307 | ] 308 | } 309 | ], 310 | "source": [ 311 | "df_feat = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])\n", 312 | "df_feat.info()" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 8, 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "data": { 322 | "text/plain": [ 323 | "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,\n", 324 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,\n", 325 | " 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1,\n", 326 
| " 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0,\n", 327 | " 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1,\n", 328 | " 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1,\n", 329 | " 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,\n", 330 | " 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1,\n", 331 | " 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1,\n", 332 | " 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,\n", 333 | " 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0,\n", 334 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,\n", 335 | " 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,\n", 336 | " 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0,\n", 337 | " 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1,\n", 338 | " 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,\n", 339 | " 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,\n", 340 | " 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,\n", 341 | " 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1,\n", 342 | " 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,\n", 343 | " 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,\n", 344 | " 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,\n", 345 | " 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,\n", 346 | " 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", 347 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1])" 348 | ] 349 | }, 350 | "execution_count": 8, 351 | "metadata": {}, 352 | "output_type": "execute_result" 353 | } 354 | ], 355 | "source": [ 356 | "cancer['target']" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 
9, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "df_target = pd.DataFrame(cancer['target'],columns=['Cancer'])" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 29, 371 | "metadata": {}, 372 | "outputs": [ 373 | { 374 | "data": { 375 | "text/html": [ 376 | "
\n", 377 | "\n", 390 | "\n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | "
Cancer
00
10
20
30
40
\n", 420 | "
" 421 | ], 422 | "text/plain": [ 423 | " Cancer\n", 424 | "0 0\n", 425 | "1 0\n", 426 | "2 0\n", 427 | "3 0\n", 428 | "4 0" 429 | ] 430 | }, 431 | "execution_count": 29, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "df_target.head()" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "## Train Test Split" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 11, 450 | "metadata": { 451 | "collapsed": true 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "from sklearn.model_selection import train_test_split" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 12, 461 | "metadata": { 462 | "collapsed": true 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "X_train, X_test, y_train, y_test = train_test_split(df_feat, np.ravel(df_target), test_size=0.30, random_state=101)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "# Train the Support Vector Classifier" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 13, 479 | "metadata": { 480 | "collapsed": true 481 | }, 482 | "outputs": [], 483 | "source": [ 484 | "from sklearn.svm import SVC" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 14, 490 | "metadata": { 491 | "collapsed": true 492 | }, 493 | "outputs": [], 494 | "source": [ 495 | "model = SVC()" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": 15, 501 | "metadata": {}, 502 | "outputs": [ 503 | { 504 | "data": { 505 | "text/plain": [ 506 | "SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 507 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 508 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 509 | " tol=0.001, verbose=False)" 510 | ] 511 | }, 512 | "execution_count": 15, 513 | "metadata": {}, 514 | 
"output_type": "execute_result" 515 | } 516 | ], 517 | "source": [ 518 | "model.fit(X_train,y_train)" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "## Predictions and Evaluations\n" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 16, 531 | "metadata": { 532 | "collapsed": true 533 | }, 534 | "outputs": [], 535 | "source": [ 536 | "predictions = model.predict(X_test)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 17, 542 | "metadata": { 543 | "collapsed": true 544 | }, 545 | "outputs": [], 546 | "source": [ 547 | "from sklearn.metrics import classification_report,confusion_matrix" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 18, 553 | "metadata": {}, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "[[ 0 66]\n", 560 | " [ 0 105]]\n" 561 | ] 562 | } 563 | ], 564 | "source": [ 565 | "print(confusion_matrix(y_test,predictions))" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 19, 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | " precision recall f1-score support\n", 578 | "\n", 579 | " 0 0.00 0.00 0.00 66\n", 580 | " 1 0.61 1.00 0.76 105\n", 581 | "\n", 582 | "avg / total 0.38 0.61 0.47 171\n", 583 | "\n" 584 | ] 585 | }, 586 | { 587 | "name": "stderr", 588 | "output_type": "stream", 589 | "text": [ 590 | "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n", 591 | " 'precision', 'predicted', average, warn_for)\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "print(classification_report(y_test,predictions))" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | 
"Notice that we are classifying everything into a single class! This means our model needs to have its parameters adjusted (it may also help to normalize the data)." 604 | ] 605 | }, 606 | { 607 | "cell_type": "markdown", 608 | "metadata": {}, 609 | "source": [ 610 | "# Gridsearch\n", 611 | "\n", 612 | "Finding the right parameters (like which C or gamma values to use) is a tricky task! The idea of creating a 'grid' of parameters and just trying out all the possible combinations is called a Gridsearch. This method is common enough that Scikit-learn has it built in with GridSearchCV, where the CV stands for cross-validation. GridSearchCV takes a dictionary that describes the parameters that should be tried and a model to train. The grid of parameters is defined as a dictionary, where the keys are the parameter names and the values are the settings to be tested. " 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 20, 618 | "metadata": { 619 | "collapsed": true 620 | }, 621 | "outputs": [], 622 | "source": [ 623 | "param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001], 'kernel': ['rbf']} " 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 21, 629 | "metadata": { 630 | "collapsed": true 631 | }, 632 | "outputs": [], 633 | "source": [ 634 | "from sklearn.model_selection import GridSearchCV" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": 22, 640 | "metadata": { 641 | "collapsed": true 642 | }, 643 | "outputs": [], 644 | "source": [ 645 | "grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "What fit does is a bit more involved than usual. First, it runs the same loop with cross-validation, to find the best parameter combination.
Once it has the best combination, it runs fit again on all data passed to fit (without cross-validation), to build a single new model using the best parameter setting." 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": 23, 658 | "metadata": {}, 659 | "outputs": [ 660 | { 661 | "name": "stdout", 662 | "output_type": "stream", 663 | "text": [ 664 | "Fitting 3 folds for each of 25 candidates, totalling 75 fits\n", 665 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 666 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.631579, total= 0.0s\n", 667 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 668 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.631579, total= 0.0s\n", 669 | "[CV] kernel=rbf, C=0.1, gamma=1 ......................................\n", 670 | "[CV] ....... kernel=rbf, C=0.1, gamma=1, score=0.636364, total= 0.0s\n", 671 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 672 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.631579, total= 0.0s\n", 673 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 674 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.631579, total= 0.0s\n", 675 | "[CV] kernel=rbf, C=0.1, gamma=0.1 ....................................\n", 676 | "[CV] ..... kernel=rbf, C=0.1, gamma=0.1, score=0.636364, total= 0.0s\n", 677 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 678 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.631579, total= 0.0s\n", 679 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 680 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.631579, total= 0.0s\n", 681 | "[CV] kernel=rbf, C=0.1, gamma=0.01 ...................................\n", 682 | "[CV] .... kernel=rbf, C=0.1, gamma=0.01, score=0.636364, total= 0.0s\n", 683 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n", 684 | "[CV] ... 
kernel=rbf, C=0.1, gamma=0.001, score=0.631579, total= 0.0s\n", 685 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n", 686 | "[CV] ... kernel=rbf, C=0.1, gamma=0.001, score=0.631579, total= 0.0s\n", 687 | "[CV] kernel=rbf, C=0.1, gamma=0.001 ..................................\n" 688 | ] 689 | }, 690 | { 691 | "name": "stderr", 692 | "output_type": "stream", 693 | "text": [ 694 | "[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s\n", 695 | "[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.0s remaining: 0.0s\n" 696 | ] 697 | }, 698 | { 699 | "name": "stdout", 700 | "output_type": "stream", 701 | "text": [ 702 | "[CV] ... kernel=rbf, C=0.1, gamma=0.001, score=0.636364, total= 0.0s\n", 703 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 704 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.902256, total= 0.0s\n", 705 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 706 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.962406, total= 0.0s\n", 707 | "[CV] kernel=rbf, C=0.1, gamma=0.0001 .................................\n", 708 | "[CV] .. kernel=rbf, C=0.1, gamma=0.0001, score=0.916667, total= 0.0s\n", 709 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 710 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.631579, total= 0.0s\n", 711 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 712 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.631579, total= 0.0s\n", 713 | "[CV] kernel=rbf, C=1, gamma=1 ........................................\n", 714 | "[CV] ......... kernel=rbf, C=1, gamma=1, score=0.636364, total= 0.0s\n", 715 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 716 | "[CV] ....... kernel=rbf, C=1, gamma=0.1, score=0.631579, total= 0.0s\n", 717 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 718 | "[CV] ....... 
kernel=rbf, C=1, gamma=0.1, score=0.631579, total= 0.0s\n", 719 | "[CV] kernel=rbf, C=1, gamma=0.1 ......................................\n", 720 | "[CV] ....... kernel=rbf, C=1, gamma=0.1, score=0.636364, total= 0.0s\n", 721 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 722 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.631579, total= 0.0s\n", 723 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 724 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.631579, total= 0.0s\n", 725 | "[CV] kernel=rbf, C=1, gamma=0.01 .....................................\n", 726 | "[CV] ...... kernel=rbf, C=1, gamma=0.01, score=0.636364, total= 0.0s\n", 727 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 728 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.902256, total= 0.0s\n", 729 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 730 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.939850, total= 0.0s\n", 731 | "[CV] kernel=rbf, C=1, gamma=0.001 ....................................\n", 732 | "[CV] ..... kernel=rbf, C=1, gamma=0.001, score=0.954545, total= 0.0s\n", 733 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 734 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.939850, total= 0.0s\n", 735 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 736 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.969925, total= 0.0s\n", 737 | "[CV] kernel=rbf, C=1, gamma=0.0001 ...................................\n", 738 | "[CV] .... kernel=rbf, C=1, gamma=0.0001, score=0.946970, total= 0.0s\n", 739 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 740 | "[CV] ........ kernel=rbf, C=10, gamma=1, score=0.631579, total= 0.0s\n", 741 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 742 | "[CV] ........ 
kernel=rbf, C=10, gamma=1, score=0.631579, total= 0.0s\n", 743 | "[CV] kernel=rbf, C=10, gamma=1 .......................................\n", 744 | "[CV] ........ kernel=rbf, C=10, gamma=1, score=0.636364, total= 0.0s\n", 745 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 746 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.631579, total= 0.0s\n", 747 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 748 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.631579, total= 0.0s\n", 749 | "[CV] kernel=rbf, C=10, gamma=0.1 .....................................\n", 750 | "[CV] ...... kernel=rbf, C=10, gamma=0.1, score=0.636364, total= 0.0s\n", 751 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 752 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.631579, total= 0.0s\n", 753 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 754 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.631579, total= 0.0s\n", 755 | "[CV] kernel=rbf, C=10, gamma=0.01 ....................................\n", 756 | "[CV] ..... kernel=rbf, C=10, gamma=0.01, score=0.636364, total= 0.0s\n", 757 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 758 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.894737, total= 0.0s\n", 759 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 760 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.932331, total= 0.0s\n", 761 | "[CV] kernel=rbf, C=10, gamma=0.001 ...................................\n", 762 | "[CV] .... kernel=rbf, C=10, gamma=0.001, score=0.916667, total= 0.0s\n", 763 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 764 | "[CV] ... kernel=rbf, C=10, gamma=0.0001, score=0.932331, total= 0.0s\n", 765 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 766 | "[CV] ... 
kernel=rbf, C=10, gamma=0.0001, score=0.969925, total= 0.0s\n", 767 | "[CV] kernel=rbf, C=10, gamma=0.0001 ..................................\n", 768 | "[CV] ... kernel=rbf, C=10, gamma=0.0001, score=0.962121, total= 0.0s\n", 769 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 770 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.631579, total= 0.0s\n", 771 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 772 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.631579, total= 0.0s\n", 773 | "[CV] kernel=rbf, C=100, gamma=1 ......................................\n", 774 | "[CV] ....... kernel=rbf, C=100, gamma=1, score=0.636364, total= 0.0s\n", 775 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 776 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.631579, total= 0.0s\n", 777 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 778 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.631579, total= 0.0s\n", 779 | "[CV] kernel=rbf, C=100, gamma=0.1 ....................................\n", 780 | "[CV] ..... kernel=rbf, C=100, gamma=0.1, score=0.636364, total= 0.0s\n", 781 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 782 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.631579, total= 0.0s\n", 783 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 784 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.631579, total= 0.0s\n", 785 | "[CV] kernel=rbf, C=100, gamma=0.01 ...................................\n", 786 | "[CV] .... kernel=rbf, C=100, gamma=0.01, score=0.636364, total= 0.0s\n", 787 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 788 | "[CV] ... kernel=rbf, C=100, gamma=0.001, score=0.894737, total= 0.0s\n", 789 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 790 | "[CV] ... 
kernel=rbf, C=100, gamma=0.001, score=0.932331, total= 0.0s\n", 791 | "[CV] kernel=rbf, C=100, gamma=0.001 ..................................\n", 792 | "[CV] ... kernel=rbf, C=100, gamma=0.001, score=0.916667, total= 0.0s\n", 793 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 794 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.917293, total= 0.0s\n", 795 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 796 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.977444, total= 0.0s\n", 797 | "[CV] kernel=rbf, C=100, gamma=0.0001 .................................\n", 798 | "[CV] .. kernel=rbf, C=100, gamma=0.0001, score=0.939394, total= 0.0s\n", 799 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 800 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.631579, total= 0.0s\n", 801 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 802 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.631579, total= 0.0s\n", 803 | "[CV] kernel=rbf, C=1000, gamma=1 .....................................\n", 804 | "[CV] ...... kernel=rbf, C=1000, gamma=1, score=0.636364, total= 0.0s\n", 805 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 806 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.631579, total= 0.0s\n", 807 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 808 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.631579, total= 0.0s\n", 809 | "[CV] kernel=rbf, C=1000, gamma=0.1 ...................................\n", 810 | "[CV] .... kernel=rbf, C=1000, gamma=0.1, score=0.636364, total= 0.0s\n", 811 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 812 | "[CV] ... kernel=rbf, C=1000, gamma=0.01, score=0.631579, total= 0.0s\n", 813 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 814 | "[CV] ... 
kernel=rbf, C=1000, gamma=0.01, score=0.631579, total= 0.0s\n", 815 | "[CV] kernel=rbf, C=1000, gamma=0.01 ..................................\n", 816 | "[CV] ... kernel=rbf, C=1000, gamma=0.01, score=0.636364, total= 0.0s\n", 817 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 818 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.894737, total= 0.0s\n", 819 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 820 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.932331, total= 0.0s\n", 821 | "[CV] kernel=rbf, C=1000, gamma=0.001 .................................\n", 822 | "[CV] .. kernel=rbf, C=1000, gamma=0.001, score=0.916667, total= 0.0s\n", 823 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 824 | "[CV] . kernel=rbf, C=1000, gamma=0.0001, score=0.909774, total= 0.0s\n", 825 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 826 | "[CV] . kernel=rbf, C=1000, gamma=0.0001, score=0.969925, total= 0.0s\n", 827 | "[CV] kernel=rbf, C=1000, gamma=0.0001 ................................\n", 828 | "[CV] . 
kernel=rbf, C=1000, gamma=0.0001, score=0.931818, total= 0.0s\n" 829 | ] 830 | }, 831 | { 832 | "name": "stderr", 833 | "output_type": "stream", 834 | "text": [ 835 | "[Parallel(n_jobs=1)]: Done 75 out of 75 | elapsed: 1.2s finished\n" 836 | ] 837 | }, 838 | { 839 | "data": { 840 | "text/plain": [ 841 | "GridSearchCV(cv=None, error_score='raise',\n", 842 | " estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n", 843 | " decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',\n", 844 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 845 | " tol=0.001, verbose=False),\n", 846 | " fit_params={}, iid=True, n_jobs=1,\n", 847 | " param_grid={'kernel': ['rbf'], 'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},\n", 848 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 849 | " scoring=None, verbose=3)" 850 | ] 851 | }, 852 | "execution_count": 23, 853 | "metadata": {}, 854 | "output_type": "execute_result" 855 | } 856 | ], 857 | "source": [ 858 | "# May take awhile!\n", 859 | "grid.fit(X_train,y_train)" 860 | ] 861 | }, 862 | { 863 | "cell_type": "markdown", 864 | "metadata": {}, 865 | "source": [ 866 | "You can inspect the best parameters found by GridSearchCV in the best_params_ attribute, and the best estimator in the best\\_estimator_ attribute:" 867 | ] 868 | }, 869 | { 870 | "cell_type": "code", 871 | "execution_count": 24, 872 | "metadata": {}, 873 | "outputs": [ 874 | { 875 | "data": { 876 | "text/plain": [ 877 | "{'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}" 878 | ] 879 | }, 880 | "execution_count": 24, 881 | "metadata": {}, 882 | "output_type": "execute_result" 883 | } 884 | ], 885 | "source": [ 886 | "grid.best_params_" 887 | ] 888 | }, 889 | { 890 | "cell_type": "code", 891 | "execution_count": 25, 892 | "metadata": {}, 893 | "outputs": [ 894 | { 895 | "data": { 896 | "text/plain": [ 897 | "SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,\n", 898 | " 
decision_function_shape=None, degree=3, gamma=0.0001, kernel='rbf',\n", 899 | " max_iter=-1, probability=False, random_state=None, shrinking=True,\n", 900 | " tol=0.001, verbose=False)" 901 | ] 902 | }, 903 | "execution_count": 25, 904 | "metadata": {}, 905 | "output_type": "execute_result" 906 | } 907 | ], 908 | "source": [ 909 | "grid.best_estimator_" 910 | ] 911 | }, 912 | { 913 | "cell_type": "markdown", 914 | "metadata": {}, 915 | "source": [ 916 | "Then you can re-run predictions on this grid object just like you would with a normal model." 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 26, 922 | "metadata": { 923 | "collapsed": true 924 | }, 925 | "outputs": [], 926 | "source": [ 927 | "grid_predictions = grid.predict(X_test)" 928 | ] 929 | }, 930 | { 931 | "cell_type": "code", 932 | "execution_count": 27, 933 | "metadata": {}, 934 | "outputs": [ 935 | { 936 | "name": "stdout", 937 | "output_type": "stream", 938 | "text": [ 939 | "[[ 60 6]\n", 940 | " [ 3 102]]\n" 941 | ] 942 | } 943 | ], 944 | "source": [ 945 | "print(confusion_matrix(y_test,grid_predictions))" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": 28, 951 | "metadata": {}, 952 | "outputs": [ 953 | { 954 | "name": "stdout", 955 | "output_type": "stream", 956 | "text": [ 957 | " precision recall f1-score support\n", 958 | "\n", 959 | " 0 0.95 0.91 0.93 66\n", 960 | " 1 0.94 0.97 0.96 105\n", 961 | "\n", 962 | "avg / total 0.95 0.95 0.95 171\n", 963 | "\n" 964 | ] 965 | } 966 | ], 967 | "source": [ 968 | "print(classification_report(y_test,grid_predictions))" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "# The End!" 
976 | ] 977 | } 978 | ], 979 | "metadata": { 980 | "kernelspec": { 981 | "display_name": "Python 3", 982 | "language": "python", 983 | "name": "python3" 984 | }, 985 | "language_info": { 986 | "codemirror_mode": { 987 | "name": "ipython", 988 | "version": 2 989 | }, 990 | "file_extension": ".py", 991 | "mimetype": "text/x-python", 992 | "name": "python", 993 | "nbconvert_exporter": "python", 994 | "pygments_lexer": "ipython2", 995 | "version": "2.7.12" 996 | } 997 | }, 998 | "nbformat": 4, 999 | "nbformat_minor": 1 1000 | } 1001 | --------------------------------------------------------------------------------
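A note on the notebook above: the first SVC fit classifies everything into one class because the features sit on wildly different scales (e.g. `worst area` ranges into the thousands while `smoothness` is below 1), which overwhelms the RBF kernel under default parameters. The notebook fixes this with a grid search; the normalization it also suggests works too. Below is a minimal standalone sketch of that alternative, written against the current scikit-learn API (the notebook itself ran on an older Python 2 setup, so its output reprs differ), with the ~0.95 accuracy figure an expectation rather than a guarantee:

```python
# Sketch (assumes scikit-learn is installed): reproduce the notebook's split,
# but standardize features before the RBF SVM instead of grid-searching C/gamma.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.30, random_state=101)

# StandardScaler gives each feature zero mean and unit variance, so no single
# large-range feature dominates the kernel distances; the pipeline applies the
# same scaling (fit on training data only) to both train and test sets.
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # typically around 0.95 or better
```

Scaling and grid search are complementary: a `GridSearchCV` over this pipeline usually needs a much smaller parameter grid than one over the raw features.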