├── ..Rcheck
│   ├── 00check.log
│   └── 00install.out
├── .Rbuildignore
├── .gitignore
├── .idea
│   ├── .gitignore
│   ├── SmartML.iml
│   ├── aws.xml
│   ├── inspectionProfiles
│   │   ├── Project_Default.xml
│   │   └── profiles_settings.xml
│   ├── misc.xml
│   ├── modules.xml
│   ├── rAvailablePackageCache.xml
│   ├── rGraphicsSettings.xml
│   ├── rSettings.xml
│   └── vcs.xml
├── .travis.yml
├── CONTRIBUTE.md
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── NEWS.md
├── R
│   ├── autoRLearn.R
│   ├── autoRLearn_.R
│   ├── bohb.R
│   ├── bohb_utility.R
│   ├── checkInternet.R
│   ├── computeEI.R
│   ├── computeMetaFeatures.R
│   ├── convertCategorical.R
│   ├── datasetReader.R
│   ├── evaluateMet.R
│   ├── evocate.R
│   ├── evocate_utilities.R
│   ├── featurePreProcessing.R
│   ├── fitModel.R
│   ├── getCandidateClassifiers.R
│   ├── hb_utilities.R
│   ├── hyperband.R
│   ├── initialize.R
│   ├── intensify.R
│   ├── intrepretability.R
│   ├── outClassifierConf.R
│   ├── readDataset.R
│   ├── runClassifier.R
│   ├── runClassifier_.R
│   ├── selectConfiguration.R
│   ├── sendToDatabase.R
│   ├── sendToTmp.R
│   ├── successive_halving.R
│   ├── successive_resampling.R
│   └── sysdata.rda
├── README.Rmd
├── README.md
├── SmartML.Rproj
├── SmartML_0.3.0.pdf
├── codecov.yml
├── inst
│   └── extdata
│       ├── anneal_test.csv
│       ├── anneal_train.csv
│       ├── avila_test.csv
│       ├── avila_train.csv
│       ├── batch_test.csv
│       ├── batch_train.csv
│       ├── dota_test.csv
│       ├── dota_train.csv
│       ├── hyperband_jsons.zip
│       ├── hyperband_jsons
│       │   ├── cv_glmnet.json
│       │   ├── glmnet.json
│       │   ├── kknn.json
│       │   ├── lm.json
│       │   ├── naive_bayes.json
│       │   ├── ranger.json
│       │   ├── rpart.json
│       │   ├── svm.json
│       │   └── xgboost.json
│       ├── messidor_test.csv
│       ├── messidor_train.csv
│       ├── mushroom_test.csv
│       ├── mushroom_train.csv
│       ├── schizo.csv
│       ├── ta_test.csv
│       ├── ta_train.csv
│       ├── test_schizo.csv
│       ├── theorem_test.csv
│       ├── theorem_train.csv
│       ├── tictactoe_test.csv
│       ├── tictactoe_train.csv
│       └── train_schizo.csv
├── man
│   ├── autoRLearn.Rd
│   ├── autoRLearn_.Rd
│   ├── datasetReader.Rd
│   ├── metafeatures.pdf
│   ├── runClassifier.Rd
│   └── supportedAlgorithms.pdf
├── manual.pdf
├── save_jsons.R
├── sysdata.rda
├── test_rmarkdown
│   ├── data_tests.Rmd
│   ├── func_tests.Rmd
│   └── new_tests.Rmd
├── testing.R
├── tests
│   ├── testthat.R
│   └── testthat
│       ├── test-autorlearn.R
│       └── test-hyperband_test.R
└── vignettes
    ├── .gitignore
    └── introduction.Rmd
/..Rcheck/00check.log: -------------------------------------------------------------------------------- 1 | * using log directory 'C:/Users/s-moh/0-Labwork/SmartML_From_Scratch/Auto-Machine-Learning/..Rcheck' 2 | * using R version 3.5.3 (2019-03-11) 3 | * using platform: x86_64-w64-mingw32 (64-bit) 4 | * using session charset: ISO8859-1 5 | * checking for file './DESCRIPTION' ... OK 6 | * checking extension type ... Package 7 | * this is package 'SmartML' version '0.1.0' 8 | * package encoding: UTF-8 9 | * checking package namespace information ... OK 10 | * checking package dependencies ... OK 11 | * checking if this is a source package ... OK 12 | * checking if there is a namespace ... OK 13 | * checking for .dll and .exe files ... OK 14 | * checking for hidden files and directories ... NOTE 15 | Found the following hidden files and directories: 16 | .RData 17 | .Rhistory 18 | .gitignore 19 | ..Rcheck 20 | .Rproj.user 21 | .git 22 | These were most likely included in error. See section 'Package 23 | structure' in the 'Writing R Extensions' manual. 24 | * checking for portable file names ... OK 25 | * checking whether package 'SmartML' can be installed ... ERROR 26 | Installation failed. 27 | See 'C:/Users/s-moh/0-Labwork/SmartML_From_Scratch/Auto-Machine-Learning/..Rcheck/00install.out' for details. 
28 | * DONE 29 | Status: 1 ERROR, 1 NOTE 30 | -------------------------------------------------------------------------------- /..Rcheck/00install.out: -------------------------------------------------------------------------------- 1 | * installing *source* package 'SmartML' ... 2 | ** R 3 | ** byte-compile and prepare package for lazy loading 4 | Error : .onLoad failed in loadNamespace() for 'rJava', details: 5 | call: dirname(this$RuntimeLib) 6 | error: a character vector argument expected 7 | ERROR: lazy loading failed for package 'SmartML' 8 | * removing 'C:/Users/s-moh/0-Labwork/SmartML_From_Scratch/Auto-Machine-Learning/..Rcheck/SmartML' 9 | In R CMD INSTALL 10 | -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^CONTRIBUTE\.md$ 2 | ^tmp$ 3 | ^man/supportedAlgorithms\.pdf$ 4 | ^man/metafeatures\.pdf$ 5 | ^codecov\.yml$ 6 | ^\.travis\.yml$ 7 | ^.*\.Rproj$ 8 | ^\.Rproj\.user$ 9 | ^sampleDatasets$ 10 | ^manual.pdf$ 11 | ^LICENSE$ 12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | inst/doc 2 | .Rproj.user 3 | .Rhistory 4 | .RData 5 | .Ruserdata 6 | tmp 7 | -------------------------------------------------------------------------------- /.idea/.gitignore: -------------------------------------------------------------------------------- 1 | # Default ignored files 2 | /shelf/ 3 | /workspace.xml 4 | # Datasource local storage ignored files 5 | /../../../../../../../:\Users\s-moh\0-Labwork\SmartML\SmartML\.idea/dataSources/ 6 | /dataSources.local.xml 7 | # Editor-based HTTP Client requests 8 | /httpRequests/ 9 | -------------------------------------------------------------------------------- /.idea/SmartML.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.idea/aws.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 10 | 11 | -------------------------------------------------------------------------------- /.idea/inspectionProfiles/Project_Default.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 15 | -------------------------------------------------------------------------------- /.idea/inspectionProfiles/profiles_settings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 6 | -------------------------------------------------------------------------------- /.idea/misc.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 30 | 31 | -------------------------------------------------------------------------------- /.idea/modules.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.idea/rAvailablePackageCache.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 10 | 11 | -------------------------------------------------------------------------------- /.idea/rGraphicsSettings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 9 | 
-------------------------------------------------------------------------------- /.idea/rSettings.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 7 | -------------------------------------------------------------------------------- /.idea/vcs.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # R for travis: see documentation at https://docs.travis-ci.com/user/languages/r 2 | 3 | language: r 4 | sudo: false 5 | cache: packages 6 | warnings_are_errors: false 7 | -------------------------------------------------------------------------------- /CONTRIBUTE.md: -------------------------------------------------------------------------------- 1 | # Contributing to `SmartML` development 2 | 3 | I use the same guide for contributing as the `ggplot2` R package, which is restated here: 4 | 5 | The goal of this guide is to help you get up and contributing to `SmartML` as 6 | quickly as possible. The guide is divided into two main pieces: 7 | 8 | * Filing a bug report or feature request in an issue. 9 | * Suggesting a change via a pull request. 10 | 11 | ## Issues 12 | 13 | When filing an issue, the most important thing is to include a minimal 14 | reproducible example so that we can quickly verify the problem, and then figure 15 | out how to fix it. There are three things you need to include to make your 16 | example reproducible: required packages, data, code. 17 | 18 | 1. **Packages** should be loaded at the top of the script, so it's easy to 19 | see which ones the example needs. 20 | 21 | 2. The easiest way to include **data** is to use `dput()` to generate the R 22 | code to recreate it. 23 | 24 | 3. Spend a little bit of time ensuring that your **code** is easy for others to 25 | read: 26 | 27 | * make sure you've used spaces and your variable names are concise, but 28 | informative 29 | 30 | * use comments to indicate where your problem lies 31 | 32 | * do your best to remove everything that is not related to the problem. 33 | The shorter your code is, the easier it is to understand. 34 | 35 | You can check you have actually made a reproducible example by starting up a 36 | fresh R session and pasting your script in. 37 | 38 | (Unless you've been specifically asked for it, please don't include the output 39 | of `sessionInfo()`.) 40 | 41 | ## Pull requests 42 | 43 | To contribute a change to `SmartML`, you follow these steps: 44 | 45 | 1. Create a branch in git and make your changes. 46 | 2. Push the branch to GitHub and issue a pull request (PR). 47 | 3. Discuss the pull request. 48 | 4. Iterate until either we accept the PR or decide that it's not a good fit for 49 | `SmartML`. 50 | 51 | Each of these steps is described in more detail below. This might feel 52 | overwhelming the first time you get set up, but it gets easier with practice. 53 | 54 | If you're not familiar with git or GitHub, please start by reading 55 | 56 | 57 | Pull requests will be evaluated against the following checklist: 58 | 59 | 1. __Motivation__. Your pull request should clearly and concisely motivate the 60 | need for change. Please describe the problem your PR addresses and show 61 | how your pull request solves it as concisely as possible. 
62 | 63 | Also include this motivation in `NEWS` so that when a new release of 64 | `SmartML` comes out it's easy for users to see what's changed. Add your 65 | item at the top of the file and use markdown for formatting. The 66 | news item should end with `(@yourGithubUsername, #the_issue_number)`. 67 | 68 | 2. __Only related changes__. Before you submit your pull request, please 69 | check to make sure that you haven't accidentally included any unrelated 70 | changes. These make it harder to see exactly what's changed, and to 71 | evaluate any unexpected side effects. 72 | 73 | Each PR corresponds to a git branch, so if you expect to submit 74 | multiple changes make sure to create multiple branches. If you have 75 | multiple changes that depend on each other, start with the first one 76 | and don't submit any others until the first one has been processed. 77 | 78 | 3. If you're adding new parameters or a new function, you'll also need 79 | to document them with [roxygen](https://github.com/klutometis/roxygen). 80 | Make sure to re-run `devtools::document()` on the code before submitting. 81 | 82 | This seems like a lot of work but don't worry if your pull request isn't 83 | perfect. It's a learning process. A pull request is a process, and unless 84 | you've submitted a few in the past it's unlikely that your pull request will be 85 | accepted as is. Please don't submit pull requests that change existing 86 | behaviour. Instead, think about how you can add a new feature in a minimally 87 | invasive way. 88 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: SmartML 2 | Version: 0.3.0 3 | Title: Machine Learning Automation 4 | Authors@R: 5 | c(person(given = "Mohamed", 6 | family = "Maher", 7 | email = "s-mohamed.zenhom@zewailcity.edu.eg", 8 | role = c("aut", "cre")), 9 | person(given = "Sherif", 10 | family = "Sakr", 11 | email = "sherif.sakr@ut.ee", 12 | role = "aut"), 13 | person(given = "Bruno Rucy", 14 | family = "Carneiro Alves de Lima", 15 | email = "brurucy@protonmail.ch", 16 | role = "ctb")) 17 | Description: This package is a meta-learning based framework for automated selection and hyper-parameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about statistical meta features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs. 
18 | License: GPL-3 19 | Encoding: UTF-8 20 | LazyData: false 21 | Imports: 22 | devtools, R.utils, stats, tictoc, e1071, BBmisc, kknn, purrr, xgboost, ranger, 23 | KernSmooth, data.table, randomForest, rpart, glmnet, nloptr, bbotk 24 | Suggests: 25 | knitr, 26 | covr, 27 | testthat, 28 | rmarkdown 29 | Depends: 30 | mlr3, 31 | mlr3learners, 32 | mlr3pipelines, 33 | mlr3filters 34 | RoxygenNote: 7.1.1 35 | VignetteBuilder: knitr 36 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(autoRLearn) 4 | export(autoRLearn_) 5 | export(evocate) 6 | export(runClassifier) 7 | import(RWeka) 8 | import(caret) 9 | import(devtools) 10 | import(farff) 11 | import(ggplot2) 12 | import(mice) 13 | import(purrr) 14 | import(rjson) 15 | importFrom(BBmisc,normalize) 16 | importFrom(C50,C5.0) 17 | importFrom(C50,C5.0Control) 18 | importFrom(FNN,knn) 19 | importFrom(KernSmooth,bkde) 20 | importFrom(KernSmooth,dpik) 21 | importFrom(LiblineaR,LiblineaR) 22 | importFrom(MASS,lda) 23 | importFrom(R.utils,withTimeout) 24 | importFrom(RCurl,getURL) 25 | importFrom(RMySQL,MySQL) 26 | importFrom(RMySQL,dbConnect) 27 | importFrom(RMySQL,dbDisconnect) 28 | importFrom(RMySQL,dbSendQuery) 29 | importFrom(RMySQL,fetch) 30 | importFrom(UBL,SmoteClassif) 31 | importFrom(caret,confusionMatrix) 32 | importFrom(caret,plsda) 33 | importFrom(data.table,fcase) 34 | importFrom(deepboost,deepboost) 35 | importFrom(deepboost,deepboost.predict) 36 | importFrom(dplyr,arrange) 37 | importFrom(dplyr,case_when) 38 | importFrom(dplyr,distinct) 39 | importFrom(dplyr,filter) 40 | importFrom(dplyr,group_by) 41 | importFrom(dplyr,mutate) 42 | importFrom(dplyr,mutate_if) 43 | importFrom(dplyr,n) 44 | importFrom(dplyr,select) 45 | importFrom(dplyr,top_frac) 46 | importFrom(e1071,kurtosis) 47 | importFrom(e1071,naiveBayes) 48 | importFrom(e1071,skewness) 49 | importFrom(e1071,svm) 50 | importFrom(fastNaiveBayes,fnb.train) 51 | importFrom(graphics,plot) 52 | importFrom(httr,POST) 53 | importFrom(httr,content) 54 | importFrom(iml,FeatureImp) 55 | importFrom(iml,Interaction) 56 | importFrom(iml,Predictor) 57 | importFrom(imputeMissings,compute) 58 | importFrom(imputeMissings,impute) 59 | importFrom(ipred,bagging) 60 | importFrom(klaR,rda) 61 | importFrom(mda,bruto) 62 | importFrom(mda,fda) 63 | importFrom(mda,gen.ridge) 64 | importFrom(mda,mars) 65 | importFrom(mda,polyreg) 66 | importFrom(nnet,nnet) 67 | importFrom(randomForest,randomForest) 68 | importFrom(ranger,ranger) 69 | importFrom(rjson,fromJSON) 70 | importFrom(rpart,rpart) 71 | importFrom(rpart,rpart.control) 72 | importFrom(stats,complete.cases) 73 | importFrom(stats,dnorm) 74 | importFrom(stats,glm) 75 | importFrom(stats,na.omit) 76 | importFrom(stats,pnorm) 77 | importFrom(stats,predict) 78 | importFrom(stats,rnorm) 79 | importFrom(stats,runif) 80 | importFrom(stats,setNames) 81 | importFrom(stats,var) 82 | importFrom(tictoc,tic) 83 | importFrom(tictoc,toc) 84 | importFrom(tidyr,drop_na) 85 | importFrom(tidyr,gather) 86 | importFrom(tidyr,separate) 87 | importFrom(tidyr,spread) 88 | importFrom(tidyr,unite) 89 | importFrom(truncnorm,dtruncnorm) 90 | importFrom(truncnorm,rtruncnorm) 91 | importFrom(utils,capture.output) 92 | importFrom(utils,head) 93 | importFrom(utils,read.csv) 94 | importFrom(xgboost,xgb.DMatrix) 95 | importFrom(xgboost,xgboost) 96 | 
-------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # SmartML 0.3.0.1 2 | 3 | * Hotfix, fixed some dependency issues relating to dplyr 4 | 5 | # SmartML 0.3.0 6 | 7 | ## Features 8 | 9 | * Added Ranger, XGBoost, fastNaiveBayes and LiblineaR high performing algorithms 10 | * Added the autoRLearn_ function, which assumes that the data is in perfect shape and can be loaded from a dataframe, unlike autoRLearn which can only load from a data file outside R. 11 | * Added Hyperband and Bayesian Optimization Hyperband to the new autoRLearn_ 12 | * Added some extra temporary dependencies which will be removed in the following months (all tidyverse packages other than purrr) 13 | * Fixed some small mistakes in the code and jsons 14 | 15 | ## Current Roadmap 16 | 17 | * fix metalearning, at the moment it doesn't work. There's something wrong with the AWS server we are using. 18 | * change the dplyr back end to use data.table with dtplyr 19 | * merge autoRLearn and autoRLearn_ into a single function, which can both load from a data file and in R. 20 | * Rewrite SMAC, as requested by Sherif. 21 | 22 | ## Extra info 23 | 24 | * brurucy is a new and active maintainer 25 | * Nightly and experimental versions, independent from the Data Systems Lab, are being developed at https://github.com/brurucy/witchcraft 26 | * Updates will be conservative and focused on non-breaking changes, up to release 1.0. 27 | -------------------------------------------------------------------------------- /R/autoRLearn.R: -------------------------------------------------------------------------------- 1 | #' @title Run smartML function for automatic Supervised Machine Learning. 2 | #' 3 | #' @description Run the smartML main function for automatic classifier algorithm selection, and hyper-parameter tuning. 4 | #' 5 | #' @param maxTime Float numeric of the maximum time budget for reading dataset, preprocessing, calculating meta-features, Algorithm Selection & hyper-parameter tuning process only in minutes(Excluding Model Interpretability) - This is applicable in case of Option = 2 only. 6 | #' @param directory String Character of the training dataset directory (SmartML accepts file formats arff/(csv with columns headers) ). 7 | #' @param testDirectory String Character of the testing dataset directory (SmartML accepts file formats arff/(csv with columns headers) ). 8 | #' @param classCol String Character of the name of the class label column in the dataset (default = 'class'). 9 | #' @param vRatio Float numeric of the validation set ratio that should be splitted out of the training set for the evaluation process (default = 0.1 --> 10\%). 
10 | #' @param preProcessF Vector of string characters containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv')): 11 | #' \itemize{ 12 | #' \item "boxcox" - apply a Box–Cox transform; values must be non-zero and positive in all features, 13 | #' \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a Box-Cox, but values can be negative, 14 | #' \item "zv" - remove attributes with a zero variance (all the same value), 15 | #' \item "center" - subtract mean from values, 16 | #' \item "scale" - divide values by standard deviation, 17 | #' \item "standardize" - perform both centering and scaling, 18 | #' \item "normalize" - normalize values, 19 | #' \item "pca" - transform data to the principal components, 20 | #' \item "ica" - transform data to the independent components. 21 | #' } 22 | #' @param featuresToPreProcess Vector of the feature columns to perform the feature preprocessing on - an empty vector means all features in the dataset file are included (default = c()) - This vector should be a subset of \code{selectedFeats}. 23 | #' @param nComp Integer numeric of the number of components needed if either the "pca" or "ica" feature preprocessor is used. 24 | #' @param nModels Integer numeric representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization (default = 5). 25 | #' @param option Integer numeric: 1 means classifier algorithm selection only is needed, while 2 (the default) means algorithm selection together with hyper-parameter tuning. 26 | #' @param featureTypes Vector of either 'numerical' or 'categorical' representing the types of features in the dataset (default = c() --> any factor or character features will be considered as categorical, otherwise numerical). 27 | #' @param interp Boolean representing if model interpretability (Feature Importance and Interaction) is needed or not (default = FALSE). This option will take more of the time budget if set to TRUE. 28 | #' @param missingOpr Boolean variable representing whether to use median/mode imputation for instances with missing values (FALSE) or to apply imputation using the "MICE" library, which imputes missing values with plausible data values drawn from a distribution specifically designed for each missing datapoint (TRUE). 29 | #' @param balance Boolean variable representing whether SMOTE class balancing is required (default = FALSE). 30 | #' @param metric String character of the metric to be used in evaluation: 31 | #' \itemize{ 32 | #' \item "acc" - Accuracy, 33 | #' \item "avg-fscore" - Average of F-Score of each label, 34 | #' \item "avg-recall" - Average of Recall of each label, 35 | #' \item "avg-precision" - Average of Precision of each label, 36 | #' \item "fscore" - Micro-Average of F-Score of each label, 37 | #' \item "recall" - Micro-Average of Recall of each label, 38 | #' \item "precision" - Micro-Average of Precision of each label. 
39 | #' } 40 | #' 41 | #' @return List of Results 42 | #' \itemize{ 43 | #' \item "option=1" - Choosen Classifier Algorithms Names \code{clfs} with their parameters configurations \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData} in case of \code{option=2}, 44 | #' \item "option=2" - Best classifier algorithm name found \code{clfs} with its parameters configuration \code{params}, , Training DataFrame \code{TRData}, Test DataFrame \code{TEData}, model variable \code{model}, predicted values on test set \code{pred}, performance on TestingSet \code{perf}, and Feature Importance \code{interpret$featImp} / Interaction \code{interpret$Interact} plots in case of interpretability \code{interp} = TRUE and chosen model is not knn. 45 | #' } 46 | #' 47 | #' @examples 48 | #' \dontrun{ 49 | #' autoRLearn(1, 'sampleDatasets/car/train.arff', \ 50 | #' 'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize') 51 | #' 52 | #' result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff') 53 | #' } 54 | #' 55 | #' @importFrom tictoc tic toc 56 | #' @importFrom R.utils withTimeout 57 | #' @importFrom graphics plot 58 | #' @import ggplot2 59 | #' 60 | #' @export autoRLearn 61 | 62 | autoRLearn <- function(maxTime, directory, testDirectory, classCol = 'class', metric = 'acc', vRatio = 0.3, preProcessF = c('standardize', 'zv'), featuresToPreProcess = c(), nComp = NA, nModels = 5, option = 2, featureTypes = c(), interp = FALSE, missingOpr = FALSE, balance = FALSE) { 63 | #Set Seed 64 | set.seed(22) 65 | #Read Dataset 66 | datasetReadError <- try( 67 | { 68 | #Read Training Dataset 69 | dataset <- readDataset(directory, testDirectory, classCol = classCol, vRatio = vRatio, preProcessF = preProcessF, featuresToPreProcess = featuresToPreProcess, nComp = nComp, missingOpr = missingOpr, metric = metric, balance = balance) 70 | trainingSet <- dataset$TD 71 | #Read Testing Dataset 72 | testDataset <- dataset$TED 73 | #Read all training Dataset without validation 74 | trainDataset <- dataset$FULLTD 75 | }) 76 | if(inherits(datasetReadError, "try-error")){ 77 | print('Error: Failed Reading Dataset: Make sure that dataset directory is correct and it is a valid csv/arff file.') 78 | return(-1) 79 | } 80 | 81 | #Calculate Meta-Features for the dataset 82 | metaFeaturesError <- try( 83 | { 84 | metaFeatures <- computeMetaFeatures(trainingSet, maxTime, featureTypes) 85 | }) 86 | if(inherits(metaFeaturesError, "try-error")){ 87 | print('Error: Failed Extracting Dataset MetaFeatures.') 88 | return(-1) 89 | } 90 | 91 | splitError <- try( 92 | { 93 | #Convert Categorical Features to Numerical Ones and split the dataset 94 | B <- max(10, as.integer((metaFeatures$nInstances) / 2000)) #Number of folds to work on for the dataset and trees in SMAC forest model 95 | 96 | dataset <- convertCategorical(dataset, trainDataset, testDataset, B = B) 97 | validationSet <- dataset$VD #Validation set 98 | trainingSet <- dataset$TD #Training Set 99 | foldedSet <- dataset$FD #Folded sets of Training Data. 
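# Note: B (derived above from the number of instances) doubles as the number of training folds and the number of trees in the SMAC-style forest; the folds stored in foldedSet are reused by fitModel(), selectConfiguration(), and intensify() below during hyper-parameter tuning.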
100 | #Convert for all TrainingSet 101 | trainDataset <- dataset$FULLTD 102 | #Convert for all TestingSet 103 | testDataset <- dataset$TED 104 | }) 105 | if(inherits(splitError, "try-error")){ 106 | print('Error: Failed Splitting Dataset.') 107 | return(-1) 108 | } 109 | 110 | #Generate candidate classifiers 111 | candidateClfsError <- try( 112 | { 113 | nClassifiers <- 15 114 | output <- getCandidateClassifiers(maxTime, metaFeatures, min(c(nModels, nClassifiers)) ) 115 | algorithms <- output$c #Classifier Algorithm names selected. 116 | tRatio <- output$r #Time ratio between all classifiers. 117 | algorithmsParams <- output$p #Initial Parameter configuration of each classifier. 118 | }) 119 | if(inherits(candidateClfsError, "try-error")){ 120 | print('Error: Can not generate Candidate classifiers.') 121 | return(-1) 122 | } 123 | 124 | tryCatch({ 125 | #Option 1: Only Candidate Classifiers with initial parameters will be resulted (No Hyper-parameter tuning) 126 | if(option == 1 && length(algorithms) == length(algorithmsParams)) 127 | return (list(clfs = algorithms, params = algorithmsParams, TRData = dataset$FULLTD, TEData = dataset$TED)) 128 | else if(option == 1) 129 | return ('Error: Failed to Connect to KnowledgeBase, Option 1 can not be executed') 130 | 131 | #Option 2: Classifier Algorithm Selection + Parameter Tuning 132 | res <- withTimeout({ 133 | #variables to hold best classifiers 134 | bestAlgorithm <- '' #bestClassifierName. 135 | bestAlgorithmPerf <- 0 #bestClassifierPerformance. 136 | bestAlgorithmParams <- list() #Parameters of best Classifier. 137 | 138 | #loop over each classifier 139 | for(i in 1:length(algorithms)){ 140 | classifierAlgorithm <- algorithms[[i]] 141 | if (i <= length(algorithmsParams)) 142 | classifierAlgorithmParams <- algorithmsParams[[i]] 143 | else 144 | classifierAlgorithmParams <- '' #use the default initial parameter configuration 145 | 146 | #Read maxTime for the current classifier algorithm and convert to seconds 147 | maxClfTime <- tRatio[i] * 60 148 | #Read the current classifier default parameter configuration 149 | classifierConf <- getClassifierConf(classifierAlgorithm) 150 | cat('\nStart Tuning Classifier Algorithm: ', classifierAlgorithm, '\n') 151 | #initialize step 152 | R <- initialize(classifierAlgorithm, classifierConf, classifierAlgorithmParams) 153 | cntParams <- R[, -which(names(R) == "performance")] 154 | #start hyperParameter tuning till maximum Time 155 | tic(quiet = TRUE) 156 | timeTillNow <- 0 157 | #Regression Random Forest Trees for training set folds 158 | tree <- data.frame(fold=integer(), parent=integer(), params=character(), rightChild=integer(), leftChild=integer(), performance=double(), rowN = integer()) 159 | bestParams <- cntParams 160 | bestPerf <- c() 161 | classifierFailureCounter <- 0 162 | 163 | repeat{ 164 | gc() 165 | #Fit Model 166 | output <- fitModel(bestParams, bestPerf, trainingSet, validationSet, foldedSet, classifierAlgorithm, tree, B = B) 167 | #Check if this classifer failed for more than 5 times, skip to the next classifier 168 | if((length(bestPerf) > 0 && mean(bestPerf) == 0) || length(bestPerf) == 0){ 169 | classifierFailureCounter <- classifierFailureCounter + 1 170 | if(classifierFailureCounter > 2) break 171 | } 172 | tree <- output$t 173 | bestPerf <- output$p 174 | bestParams <- output$bp 175 | #Select Candidate Classifier Configurations 176 | candidateConfs <- selectConfiguration(R, classifierAlgorithm, tree, bestParams, B = B) 177 | #Intensify 178 | if(nrow(candidateConfs) > 0){ 179 | 
output <- intensify(R, bestParams, bestPerf, candidateConfs, foldedSet, trainingSet, validationSet, classifierAlgorithm, maxClfTime, timeTillNow, B = B, metric = metric) 180 | bestParams <- output$params 181 | bestPerf <- output$perf 182 | timeTillNow <- output$timeTillNow 183 | classifierFailureCounter <- classifierFailureCounter + output$fails 184 | R <- output$r 185 | } 186 | #Check if execution time exceeded the allowed time or not 187 | t <- toc(quiet = TRUE) 188 | timeTillNow <- timeTillNow + t$toc - t$tic 189 | tic(quiet = TRUE) 190 | if(timeTillNow > maxClfTime){ 191 | if(mean(bestPerf) > mean(bestAlgorithmPerf)){ 192 | bestAlgorithmPerf <- bestPerf 193 | bestAlgorithm <- classifierAlgorithm 194 | bestAlgorithmParams <- bestParams 195 | #cat('Best Classifier:', bestAlgorithm, ' --> Performance:', bestAlgorithmPerf, '\n') 196 | } 197 | break 198 | } 199 | 200 | } 201 | } 202 | 203 | },timeout = maxTime * 60, cpu = maxTime * 60) 204 | }, TimeoutException = function(ex) { 205 | message("NOTE: Time Budget allowed has been finished.") 206 | }) 207 | 208 | print("Time Limit for Tuning process has been reached out. Training the best classifier found over whole Training set now.") 209 | if (bestAlgorithm != '') 210 | bestAlgorithmParams <- bestAlgorithmParams[,names(bestAlgorithmParams) != "EI" & names(bestAlgorithmParams) != "performance"] 211 | else{ 212 | bestAlgorithm <- algorithms[[1]] 213 | bestAlgorithmParams <- algorithmsParams[[1]] 214 | } 215 | 216 | trainFinalModelError <- try( 217 | { 218 | #Run Classifier over all training set and check performance on testing set 219 | finalResult <- runClassifier(trainingSet = trainDataset, validationSet = testDataset, params = bestAlgorithmParams, classifierAlgorithm = bestAlgorithm, metric = metric, interp = interp) 220 | finalResult$clfs <- bestAlgorithm 221 | finalResult$params <- bestAlgorithmParams 222 | #save results to Temporary File 223 | query <- sendToTmp(metaFeatures, bestAlgorithm, bestAlgorithmParams, finalResult$perf, nModels, metric) 224 | #check internet connection and send data in tmp file to database if connection exists 225 | if(checkInternet() == TRUE){ 226 | sendToDatabase(query) 227 | } 228 | }) 229 | if(inherits(trainFinalModelError, "try-error")){ 230 | print('Error: No Enough Computational Resources. Can not build a model over the current dataset!') 231 | } 232 | 233 | 234 | finalResult$TRData = dataset$FULLTD 235 | finalResult$TEData = dataset$TED 236 | return(finalResult) 237 | } 238 | -------------------------------------------------------------------------------- /R/autoRLearn_.R: -------------------------------------------------------------------------------- 1 | #' @title Advanced version of autoRLearn. 2 | #' 3 | #' @description Tunes the hyperparameters of the desired algorithm/s using either hyperband or BOHB. 4 | #' 5 | #' @param df_train Dataframe of the training dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class". 6 | #' @param df_test Dataframe of the test dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class". 7 | #' @param maxTime Float representing the maximum time the algorithm should be run (in minutes). 
8 | #' @param models List of strings denoting which algorithms to use for the process: 9 | #' \itemize{ 10 | #' \item "randomForest" - Random forests using the randomForest package 11 | #' \item "ranger" - Random forests using the ranger package (unstable) 12 | #' \item "naiveBayes" - Naive Bayes using the fastNaiveBayes package 13 | #' \item "boosting" - Gradient boosting using xgboost 14 | #' \item "l2-linear-classifier" - Linear (primal) support vector machine from LiblineaR 15 | #' \item "svm" - RBF kernel SVM from e1071 16 | #' } 17 | #' @param optimizationAlgorithm - String of which hyperparameter tuning algorithm to use: 18 | #' \itemize{ 19 | #' \item "hyperband" - Hyperband with uniformly sampled parameters 20 | #' \item "bohb" - Hyperband with Bayesian optimization, as described in the 2018 BOHB paper by Falkner, Klein and Hutter. Has the extra parameters bw and kde_type 21 | #' } 22 | #' @param bw - (only applies to BOHB) Double representing how much the KDE bandwidth should be widened. Higher values allow the algorithm to explore more hyperparameter combinations 23 | #' @param max_iter - (affects both hyperband and BOHB) Integer representing the maximum number of iterations that one successive halving run can have 24 | #' @param kde_type - (only applies to BOHB) String representing whether a model's hyperparameters should be tuned independently of each other or have their probability densities multiplied: 25 | #' \itemize{ 26 | #' \item "single" - each hyperparameter has its own expected improvement calculated 27 | #' \item "mixed" - all hyperparameters' probability densities are multiplied and only one mixed expected improvement is calculated 28 | #' } 29 | #' @param metric String of the evaluation metric to be used in the model performance optimization: 30 | #' \itemize{ 31 | #' \item "acc" - Accuracy, 32 | #' \item "avg-fscore" - Average of F-Score of each label, 33 | #' \item "avg-recall" - Average of Recall of each label, 34 | #' \item "avg-precision" - Average of Precision of each label, 35 | #' \item "fscore" - Micro-Average of F-Score of each label, 36 | #' \item "recall" - Micro-Average of Recall of each label, 37 | #' \item "precision" - Micro-Average of Precision of each label. 38 | #' } 39 | #' @return List of Results 40 | #' \itemize{ 41 | #' \item \code{perf} - Evaluated metric of the best performing model on the test data 42 | #' \item \code{pred} - Prediction on the test data using the best model 43 | #' \item \code{model} - Best model object 44 | #' \item \code{best_models} - Table with the best hyperparameters found for the selected models. 
45 | #' } 46 | 47 | #' @importFrom R.utils withTimeout 48 | #' @importFrom tictoc tic toc 49 | #' @importFrom stats na.omit runif 50 | #' @importFrom utils head 51 | 52 | #' @export autoRLearn_ 53 | autoRLearn_ <- function(df_train, df_test, maxTime = 10, 54 | models = c("randomForest", "naiveBayes", "boosting", "l2-linear-classifier", "svm"), 55 | optimizationAlgorithm = "hyperband", bw = 3, kde_type = "single", 56 | max_iter = 81, metric = "acc") { 57 | 58 | total_time <- maxTime * 60 59 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params)) 60 | times <- (parameters_per_model * total_time) / (sum(parameters_per_model)) 61 | 62 | print("Time distribution:") 63 | print(times) 64 | print("Models selected:") 65 | print(models) 66 | 67 | run_optimization <- function(model, time) { 68 | results <- NULL 69 | priors <- data.frame() 70 | 71 | tic(model, "optimization time:") 72 | 73 | if(optimizationAlgorithm == "hyperband") { 74 | current <- Sys.time() %>% as.integer() 75 | end <- (Sys.time() %>% as.integer()) + time 76 | repeat { 77 | gc(verbose = F) 78 | tic("current hyperband runtime") 79 | print(paste("started", model)) 80 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 81 | print(paste("There are:", time_left, "seconds left for this hyperband run")) 82 | res <- hyperband(df = df_train, model = model, max_iter = max_iter, maxtime = time_left) 83 | if(is_empty(flatten(res)) == F) { 84 | res <- res %>% 85 | map_dfr(.f = ~ .x[["answer"]]) %>% 86 | arrange(desc(acc)) %>% 87 | head(1) 88 | results <- c(list(res), results) 89 | print(paste('Best accuracy from hyperband this round: ', res$acc)) 90 | } 91 | elapsed <- (Sys.time() %>% as.integer()) - current 92 | if(elapsed >= time) { 93 | break 94 | } 95 | } 96 | } 97 | 98 | else if(optimizationAlgorithm == "bohb") { 99 | current <- Sys.time() %>% as.integer() 100 | end <- (Sys.time() %>% as.integer()) + time 101 | repeat { 102 | gc(verbose = F) 103 | tic("current bohb time") 104 | print(paste("started", model)) 105 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 106 | print(paste("There are:", time_left, "seconds left for this bohb run")) 107 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter, maxtime = time_left, 108 | priors = priors, kde_type = kde_type) 109 | if(is_empty(flatten(res)) == F) { 110 | priors <- res %>% 111 | map_dfr(.f = ~ .x[["sh_runs"]]) 112 | res <- res %>% 113 | map_dfr(.f = ~ .x[["answer"]]) %>% 114 | arrange(desc(acc)) %>% 115 | head(1) 116 | results <- c(list(res), results) 117 | print(paste('Best accuracy from hyperband this round: ', res$acc)) 118 | } 119 | elapsed <- (Sys.time() %>% as.integer()) - current 120 | if(elapsed >= time) { 121 | break 122 | } 123 | } 124 | } 125 | 126 | else { 127 | errorCondition(message = "Only hyperband and bohb are valid optimization algorithms at this moment.") 128 | break 129 | } 130 | 131 | toc() 132 | results 133 | } 134 | 135 | print("Finished all optimizations.") 136 | ans <- vector(mode = "list", length = length(models)) 137 | 138 | for(i in 1:length(models)) { 139 | flag <- TRUE 140 | #tryCatch(expr = { 141 | ans[[i]] <- run_optimization(models[[i]], times[[i]]) 142 | #}, error = function(e) { 143 | # print("Error spotted, going to the next model!") 144 | # flag <<- FALSE 145 | #}) 146 | if (!flag) next 147 | } 148 | 149 | print(ans) 150 | ans <- ans %>% 151 | map(.f = ~ map_dfr(.x = .x, .f = ~ .x %>% select(model, params, acc))) %>% 152 | map_dfr(.f = ~ .x %>% arrange(desc(acc)) %>% head(1)) %>% 153 | 
arrange(desc(acc)) 154 | best_model <- ans %>% head(1) 155 | final_evaluation <- eval_loss(model = best_model[["model"]], train_df = df_train, test_df = df_test, 156 | params = best_model[["params"]]) 157 | final_evaluation$best_models <- ans 158 | print(paste("Winner:", best_model$model, "test accuracy:", final_evaluation$perf)) 159 | final_evaluation 160 | 161 | } 162 | 163 | -------------------------------------------------------------------------------- /R/bohb.R: -------------------------------------------------------------------------------- 1 | #' @importFrom dplyr distinct n group_by 2 | 3 | #' @keywords internal 4 | bohb <- function(df, model, max_iter = 81, eta = 3, bw = 3, random_frac = 1/3, 5 | maxtime, priors = data.frame(), kde_type = "single") { 6 | logeta = as_mapper(~ log(.x) / log(eta)) 7 | s_max = trunc(logeta(max_iter)) 8 | B = (s_max + 1) * max_iter 9 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>% 10 | t() %>% 11 | `colnames<-`(value = c("n", "r")) %>% 12 | as_tibble() 13 | nrs$s = s_max:0 14 | length_params <- length(jsons[[model]]$params) 15 | 16 | tryCatch(expr = {withTimeout(expr = { 17 | liszt = vector(mode = "list", 18 | length = max(nrs$s) + 1) 19 | runs_df <- NULL 20 | current_sh_run <- NULL 21 | for (row in 1:nrow(nrs)) { 22 | if(row == 1) { 23 | print(paste("Iteration number", row)) 24 | #print(paste("n = ", nrs[[row, 1]], " r = ", nrs[[row, 2]], " s_max = ", nrs[[row, 3]], sep = "")) 25 | current_sh_run <- successive_halving(df = df, 26 | params_config = sample_n_params(n = nrs[[row, 1]], 27 | model = model), 28 | n = nrs[[row, 1]], 29 | r = nrs[[row, 2]], 30 | s_max = nrs[[row, 3]], 31 | max_iter = max_iter, 32 | eta = eta, 33 | evaluations = priors) 34 | runs_df <- runs_df %>% 35 | bind_rows(current_sh_run$sh_runs) 36 | liszt[[row]] <- current_sh_run 37 | next 38 | } 39 | else if(row > 1){ 40 | bayesian_opt_samples <- successive_resampling(df = runs_df, 41 | model = model, 42 | samples = max_iter, 43 | n = round(max(nrs[[row, 1]] * (1 - random_frac), 1)), 44 | bw = bw, 45 | kde_type = kde_type) 46 | 47 | current_sh_run <- successive_halving(df = df, 48 | params_config = bayesian_opt_samples %>% 49 | bind_rows(sample_n_params(n = round(max(nrs[[row, 1]] * random_frac, 1)), model = model)), 50 | n = nrs[[row, 1]], 51 | r = nrs[[row, 2]], 52 | s_max = nrs[[row, 3]], 53 | max_iter = max_iter, 54 | eta = eta) 55 | } 56 | runs_df <- runs_df %>% 57 | bind_rows(current_sh_run$sh_runs) 58 | liszt[[row]] <- current_sh_run 59 | } 60 | }, timeout = maxtime, cpu = maxtime)}, 61 | 62 | TimeoutException = function(ex) { 63 | print("Budget ended.") 64 | return(liszt) 65 | }, 66 | 67 | finally = function(ex) { 68 | print("BOHB successfully finished.") 69 | return(liszt) 70 | } 71 | , 72 | 73 | error = function(ex) { 74 | print(paste("Error found, replace ", model, sep = "")) 75 | print(geterrmessage()) 76 | break 77 | }) 78 | 79 | return(liszt) 80 | } 81 | -------------------------------------------------------------------------------- /R/bohb_utility.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal 2 | EI <- function(..., lkde, gkde) { predict(lkde, x = c(...)) / predict(gkde, x = c(...)) } 3 | 4 | #' @keywords internal 5 | map_all <- function(df) { 6 | do.call("mapply", c(list, df, SIMPLIFY = FALSE, USE.NAMES=FALSE)) 7 | } 8 | 9 | #' @keywords internal 10 | coalesce_all_columns <- function(df, group_vars = NULL) { 11 | if (is.null(group_vars)) { 12 | group_vars <- 13 | df %>% 14 | 
purrr::keep(~ dplyr::n_distinct(.x) == 1L) %>% 15 | names() 16 | } 17 | 18 | msk <- colnames(df) %in% group_vars 19 | same_df <- df[1L, msk, drop = FALSE] 20 | coal_df <- df[, !msk, drop = FALSE] %>% 21 | purrr::map_dfc(na.omit) 22 | cbind(same_df, coal_df) 23 | } 24 | 25 | #' @keywords internal 26 | sample_n_params <- function(n, model) { 27 | ans <- map_chr(.x = rep(model, n), .f = make_paste_final) %>% 28 | data.frame(model = model, 29 | params = .) %>% 30 | mutate_all(.funs = as.character) 31 | ans 32 | } 33 | 34 | #' @keywords internal 35 | make_paste_final <- function(model) { 36 | params_list <- get_random_hp_config(jsons[[model]]) 37 | 38 | names_list <- names(params_list) %>% 39 | map(~ str_glue(.x, " = ")) %>% 40 | map2(params_list, ~paste(.x, .y, sep = "")) %>% 41 | paste(collapse = ",") 42 | names_list 43 | } 44 | -------------------------------------------------------------------------------- /R/checkInternet.R: -------------------------------------------------------------------------------- 1 | #' @title Check Internet Connectivity. 2 | #' 3 | #' @description Checking if user has Internet connectivity at the moment of execution to send results to the knowledge base / get recommendation from knowledge base. 4 | #' 5 | #' @return Boolean representing the Internet connectivity status. 6 | #' 7 | #' @examples 8 | #' checkInternet(). 9 | #' 10 | #' @importFrom RCurl getURL 11 | #' 12 | #' @noRd 13 | #' 14 | #' @keywords internal 15 | 16 | checkInternet <- function() { 17 | out <- FALSE 18 | tryCatch({ 19 | out <- is.character(getURL("www.yahoo.com")) 20 | }, 21 | error = function(e) { 22 | out <- FALSE 23 | } 24 | ) 25 | out 26 | } 27 | -------------------------------------------------------------------------------- /R/computeEI.R: -------------------------------------------------------------------------------- 1 | #' @title Compute Expected Improvement. 2 | #' 3 | #' @description Compute the expected improvement for the suggested parameter configurations of a specific classifier. 4 | #' 5 | #' @param cmin Minimum error rate achieved till now. 6 | #' @param perf Expected Performance of the current configuration on each tree of the forest of SMAC algorithm. 7 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10). 8 | #' 9 | #' @return Float Number of Expected Improvement value. 10 | #' 11 | #' @examples 12 | #' computeEI(0.9, c(0.91, 0.95, 0.89, 0.88, 0.93), 5). 13 | #' 14 | #' @importFrom stats pnorm dnorm var 15 | #' 16 | #' @noRd 17 | #' 18 | #' @keywords internal 19 | 20 | computeEI <- function(cmin, perf, B = 10){ 21 | for(i in 1:B){ 22 | perf[i] <- 1 - perf[i] 23 | } 24 | perfMean <- mean(perf) 25 | perfStdDev <- sqrt(var(perf)) 26 | u <- (cmin - perfMean)/perfStdDev 27 | cdf <- pnorm(u, mean=0, sd=1) 28 | pdf <- dnorm(u, mean=0, sd=1) 29 | EI <- perfStdDev * (u * cdf + pdf) 30 | return (EI) 31 | } 32 | -------------------------------------------------------------------------------- /R/computeMetaFeatures.R: -------------------------------------------------------------------------------- 1 | #' @title Compute Meta-Features. 2 | #' 3 | #' @description Compute Statistical Meta-Features for a dataset. 4 | #' 5 | #' @param dataset The dataframe containing the dataset to process. 6 | #' @param maxTime The maximum time budget entered by user for the parameter optimization part (in minutes). 7 | #' @param featureTypes Vector of Types of each feature in the dataset either ('numerical', 'categorical'). 
8 | #' 9 | #' @return dataframe with 25 statistical meta-features of \code{dataset}. 10 | #' 11 | #' @examples 12 | #' computeMetaFeatures(data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), 10, c('numerical', 'numerical')) 13 | #' 14 | #' @importFrom e1071 skewness kurtosis 15 | #' @importFrom stats var 16 | #' 17 | #' @noRd 18 | #' 19 | #' @keywords internal 20 | 21 | computeMetaFeatures <- function(dataset, maxTime, featureTypes) { 22 | print('###################START: Preparation of Meta-Features of the Dataset###################') 23 | #1- number of instances 24 | nInstances <- nrow(dataset) 25 | cat(sprintf("1-Number of Instances: %d\n", nInstances)) 26 | #2- log number of instances 27 | lognInstances <- log(nInstances) 28 | cat(sprintf("2-Log number of Instances: %f\n",lognInstances)) 29 | #3- number of features 30 | nFeatures <- ncol(dataset) - 1 31 | cat(sprintf("3-Number of Features: %d\n", nFeatures)) 32 | #4- log number of features 33 | lognFeatures <- log(nFeatures) 34 | cat(sprintf("4-Log number of Features: %f\n", lognFeatures)) 35 | #5- number of classes 36 | classes <- unique(dataset$class) 37 | nClasses <- length(classes) 38 | cat(sprintf("5-Total number of Classes: %d\n", nClasses)) 39 | #6- number of categorical features 40 | nCatFeatures <- 0 41 | nNumFeatures <- 0 42 | skewVector <- c() 43 | kurtosisVector <- c() 44 | symbolsVector <- c() 45 | featsType <- lapply(dataset, class) 46 | if(length(featureTypes) == 0){ 47 | for(i in colnames(dataset)){ 48 | if(i == 'class')next 49 | if(featsType[[i]] != 'factor' && featsType[[i]] != 'character' && length(unique(dataset[[i]])) > lognInstances){ 50 | nNumFeatures <- nNumFeatures + 1 51 | skewVector <- c(skewVector, skewness(dataset[[i]])) 52 | kurtosisVector <- c(kurtosisVector, kurtosis(dataset[[i]])) 53 | } 54 | else{ 55 | nCatFeatures <- nCatFeatures + 1 56 | symbolsVector <- c(symbolsVector, length(unique(dataset[[i]]))) 57 | } 58 | } 59 | } 60 | else{ 61 | counter <- 0 62 | for(i in colnames(dataset)){ 63 | counter <- counter + 1 64 | if(i == 'class')next 65 | if(featureTypes[counter] == 'numerical'){ 66 | nNumFeatures <- nNumFeatures + 1 67 | skewVector <- c(skewVector, skewness(dataset[[i]])) 68 | kurtosisVector <- c(kurtosisVector, kurtosis(dataset[[i]])) 69 | } 70 | else{ 71 | nCatFeatures <- nCatFeatures + 1 72 | symbolsVector <- c(symbolsVector, length(unique(dataset[[i]]))) 73 | } 74 | } 75 | } 76 | cat(sprintf("6-Number of Categorical Features: %d\n", nCatFeatures)) 77 | #7- number of numerical features 78 | cat(sprintf("7-Number of Numerical Features: %d\n", nNumFeatures)) 79 | #8- ratio of categorical to numerical features 80 | if(nNumFeatures > 0){ 81 | ratioNumToCat <- nCatFeatures / nNumFeatures 82 | } 83 | else{ 84 | ratioNumToCat <- 999999 85 | } 86 | cat(sprintf("8-Ratio of Categorical to Numerical Features %f\n", ratioNumToCat)) 87 | #9- class entropy 88 | probClasses <- c() 89 | classEntropy <- 0 90 | for(i in classes){ 91 | prob <- length(which(dataset$class==i))/nInstances 92 | probClasses <- c(probClasses, prob) 93 | classEntropy <- classEntropy - prob * log2(prob) 94 | } 95 | cat(sprintf("9-Class Entropy: %f\n", classEntropy)) 96 | #10- class probability max 97 | classProbMax <- max(probClasses) 98 | cat(sprintf("10-Maximum Class Probability: %f\n", classProbMax)) 99 | #11- class probability min 100 | classProbMin <- min(probClasses) 101 | cat(sprintf("11-Minimum Class Probability: %f\n", classProbMin)) 102 | #12- class probability mean 103 | classProbMean <- 
mean(probClasses) 104 | cat(sprintf("12-Mean Class Probability: %f\n", classProbMean)) 105 | #13- class probability std. dev 106 | classProbStdDev <- sqrt(var(probClasses)) 107 | cat(sprintf("13-Standard Deviation of Class Probability: %f\n", classProbStdDev)) 108 | #14- Symbols Mean 109 | if(length(symbolsVector) > 0) symbolsMean <- mean(symbolsVector) 110 | else symbolsMean <- 'NULL' 111 | cat(sprintf("14-Mean of Number of Symbols: %s\n", symbolsMean)) 112 | #15- Symbols sum 113 | if(length(symbolsVector) > 0) symbolsSum <- sum(symbolsVector) 114 | else symbolsSum <- 'NULL' 115 | cat(sprintf("15-Sum of Number of Symbols: %s\n", symbolsSum)) 116 | #16- Symbols Std. Deviation 117 | if(length(symbolsVector) > 0) symbolsStdDev <- sqrt(var(symbolsVector)) 118 | else symbolsStdDev <- 'NULL' 119 | cat(sprintf("16-Std. Deviation of Number of Symbols: %s\n", symbolsStdDev)) 120 | #17- skewness min 121 | if(length(skewVector) > 0) featuresSkewMin <- try(min(skewVector)) 122 | else featuresSkewMin <- 0 123 | cat(sprintf("17-Features Skewness Minimum: %s\n", featuresSkewMin)) 124 | #18- skewness mean 125 | if(length(skewVector) > 0) featuresSkewMean <- try(mean(skewVector)) 126 | else featuresSkewMean <- 0 127 | cat(sprintf("18-Features Skewness Mean: %s\n", featuresSkewMean)) 128 | #19- skewness max 129 | if(length(skewVector) > 0) featuresSkewMax <- try(max(skewVector)) 130 | else featuresSkewMax <- 0 131 | cat(sprintf("19-Features Skewness Maximum: %s\n", featuresSkewMax)) 132 | #20- skewness std. dev. 133 | if(length(skewVector) > 0) featuresSkewStdDev <- try(sqrt(var(skewVector))) 134 | else featuresSkewStdDev <- 0 135 | cat(sprintf("20-Features Skewness Std. Deviation: %s\n", featuresSkewStdDev)) 136 | #21- Kurtosis min 137 | if(length(kurtosisVector) > 0) featuresKurtMin <- try(min(kurtosisVector)) 138 | else featuresKurtMin <- 0 139 | cat(sprintf("21-Features Kurtosis Min: %s\n", featuresKurtMin)) 140 | #22- Kurtosis max 141 | if(length(kurtosisVector) > 0) featuresKurtMax <- try(max(kurtosisVector)) 142 | else featuresKurtMax <- 0 143 | cat(sprintf("22-Features Kurtosis Max: %s\n", featuresKurtMax)) 144 | #23- Kurtosis mean 145 | if(length(kurtosisVector) > 0) featuresKurtMean <- try(mean(kurtosisVector)) 146 | else featuresKurtMean <- 0 147 | cat(sprintf("23-Features Kurtosis Mean: %s\n", featuresKurtMean)) 148 | #24- Kurtosis std. dev. 149 | if(length(kurtosisVector) > 0) featuresKurtStdDev <- try(sqrt(var(kurtosisVector))) 150 | else featuresKurtStdDev <- 0 151 | cat(sprintf("24-Features Kurtosis Std. 
Deviation: %s\n", featuresKurtStdDev)) 152 | #25- Dataset Ratio (ratio of number of features to number of instances) 153 | datasetRatio <- nFeatures / nInstances 154 | cat(sprintf("25-Dataset Ratio: %f\n", datasetRatio)) 155 | 156 | #Collecting Meta-Features in a dataFrame 157 | df <- data.frame(datasetRatio = datasetRatio, featuresKurtStdDev = featuresKurtStdDev, 158 | featuresKurtMean = featuresKurtMean, featuresKurtMax = featuresKurtMax, 159 | featuresKurtMin = featuresKurtMin, featuresSkewStdDev = featuresSkewStdDev, 160 | featuresSkewMean = featuresSkewMean, featuresSkewMax = featuresSkewMax, 161 | featuresSkewMin = featuresSkewMin, symbolsStdDev = symbolsStdDev, symbolsSum = symbolsSum, 162 | symbolsMean = symbolsMean, classProbStdDev = classProbStdDev, classProbMean = classProbMean, 163 | classProbMax = classProbMax, classProbMin = classProbMin, classEntropy = classEntropy, 164 | ratioNumToCat = ratioNumToCat, nCatFeatures = nCatFeatures, nNumFeatures = nNumFeatures, 165 | nInstances = nInstances, nFeatures = nFeatures, nClasses = nClasses, 166 | lognFeatures = lognFeatures, lognInstances = lognInstances, maxTime = maxTime) 167 | print('###################END: Preparation of Meta-Features of the Dataset###################') 168 | return(df) 169 | } 170 | -------------------------------------------------------------------------------- /R/convertCategorical.R: -------------------------------------------------------------------------------- 1 | #' @title Convert Categorical to Numerical Features. 2 | #' 3 | #' @description Perform One-Hot-Encoding for the categorical features to convert them to numerical ones. 4 | #' 5 | #' @param dataset List of training and validation dataframes containing the dataset to process. 6 | #' @param trainDataset Dataframe of full training set 7 | #' @param testDataset Dataframe of full testing set 8 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10). 9 | #' 10 | #' @return List of data frames for the new dataset after encoding Categorical to numerical (TD = Training Dataset, VD = Validation Dataset, FD = Training Dataset after splitting it into \code{B} folds). 11 | #' 12 | #' @examples 13 | #' convertCategorical(data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), 1) 
14 | #' 15 | #' @import caret 16 | #' 17 | #' @noRd 18 | #' 19 | #' @keywords internal 20 | 21 | convertCategorical <- function(dataset, trainDataset, testDataset, B = 10) { 22 | #Convert Factor/String Features into numeric features 23 | dmy <- caret::dummyVars(" ~ .", data = rbind(trainDataset, testDataset)[,names(trainDataset) != "class"]) 24 | datasetTmp <- data.frame(predict(dmy, newdata = dataset$TD)) 25 | dataset$FULLTD <- data.frame(predict(dmy, newdata = trainDataset)) 26 | dataset$TED <- data.frame(predict(dmy, newdata = testDataset)) 27 | 28 | datasetTmp$class <- dataset$TD$class 29 | dataset$TD <- datasetTmp 30 | dataset$FULLTD$class <- trainDataset$class 31 | dataset$TED$class <- testDataset$class 32 | 33 | if(nrow(dataset$VD) > 1){ 34 | validationSet <- data.frame(predict(dmy, newdata = dataset$VD)) 35 | validationSet$class <- dataset$VD$class 36 | dataset$VD <- validationSet 37 | dataset$FD <- createFolds(dataset$TD$class, k = B, list = TRUE, returnTrain = FALSE) 38 | } 39 | return(dataset) 40 | } 41 | -------------------------------------------------------------------------------- /R/datasetReader.R: -------------------------------------------------------------------------------- 1 | #' @title Read Dataset File into Memory. 2 | #' 3 | #' @description Read the file of the training and testing dataset, and perform preprocessing and data cleaning if necessary. 4 | #' 5 | #' @param directory String of the directory to the file containing the training dataset. 6 | #' @param testDirectory String of the directory to the file containing the testing dataset. 7 | #' @param selectedFeats Vector of numbers of features columns to include from the training set and ignore the rest of columns - In case of empty vector, this means to include all features in the dataset file (default = c()). 8 | #' @param classCol String of the name of the class label column in the dataset (default = 'class'). 9 | #' @param preProcessF string containing the name of the preprocessing algorithm (default = 'N' --> no preprocessing): 10 | #' \itemize{ 11 | #' \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features, 12 | #' \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a BoxCox, but values can be negative, 13 | #' \item "zv" - remove attributes with a zero variance (all the same value), 14 | #' \item "center" - subtract mean from values, 15 | #' \item "scale" - divide values by standard deviation, 16 | #' \item "standardize" - perform both centering and scaling, 17 | #' \item "normalize" - normalize values, 18 | #' \item "pca" - transform data to the principal components, 19 | #' \item "ica" - transform data to the independent components. 20 | #' } 21 | #' @param featuresToPreProcess Vector of number of features to perform the feature preprocessing on - In case of empty vector, this means to include all features in the dataset file (default = c()) - This vector should be a subset of \code{selectedFeats}. 22 | #' @param nComp Integer of Number of components needed if either "pca" or "ica" feature preprocessors are needed. 23 | #' @param missingVal Vector of strings representing the missing values in dataset (default: c('NA', '?', ' ')). 24 | #' @param missingOpr Boolean variable represents either delete instances with missing values or apply imputation using "MICE" library which helps you imputing missing values with plausible data values that are drawn from a distribution specifically designed for each missing datapoint- (default = 0 --> delete instances). 
25 | #' 26 | #' @return List of the TrainingSet \code{Train} and TestingSet \code{Test}. 27 | #' 28 | #' @import RWeka 29 | #' @import farff 30 | #' @import caret 31 | #' @import mice 32 | #' @importFrom utils read.csv 33 | #' @importFrom stats complete.cases 34 | #' 35 | #' @examples 36 | #' \dontrun{ 37 | #' dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv') 38 | #' } 39 | 40 | datasetReader <- function(directory, testDirectory, selectedFeats = c(), classCol = 'class', 41 | preProcessF = 'N', featuresToPreProcess = c(), nComp = NA, 42 | missingVal = c('NA', '?', ' '), missingOpr = 0) { 43 | #check if CSV or arff 44 | ext <- substr(directory, nchar(directory)-2, nchar(directory)) 45 | #Read CSV file of data 46 | if(ext == 'csv'){ 47 | con <- file(directory, "r") 48 | data <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = FALSE) 49 | close(con) 50 | con <- file(testDirectory, "r") 51 | dataTED <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = FALSE) 52 | close(con) 53 | } 54 | else{ 55 | data <- readARFF(directory) 56 | dataTED <- readARFF(testDirectory) 57 | } 58 | 59 | #change column name of classes to be "class" 60 | colnames(data)[which(names(data) == classCol)] <- "class" 61 | colnames(dataTED)[which(names(dataTED) == classCol)] <- "class" 62 | cInd <- grep("class", colnames(data)) #index of class column 63 | 64 | #Convert characters representing missing values to NA 65 | m1 <- as.matrix(data) 66 | m1[m1 %in% missingVal] <- NA 67 | m2 <- as.matrix(dataTED) 68 | m2[m2 %in% missingVal] <- NA 69 | 70 | #check either to delete instance with missing values or perform imputation 71 | if (missingOpr == 0){ 72 | data <- data[complete.cases(m1), ] 73 | dataTED <- dataTED[complete.cases(m2), ] 74 | } 75 | else{ 76 | data <- complete(mice(data, m = 1)) 77 | dataTED <- complete(mice(dataTED, m = 1)) 78 | } 79 | 80 | #select features only upon user request 81 | if(length(selectedFeats) == 0){ 82 | selectedFeats <- c(1:ncol(data)) 83 | } 84 | #perform preprocessing 85 | if(preProcessF != 'N'){ 86 | if(length(featuresToPreProcess ) == 0) 87 | featuresToPreProcess <- selectedFeats 88 | 89 | featuresToPreProcess <- featuresToPreProcess[!featuresToPreProcess %in% cInd] #remove class column from set of features to be preprocessed 90 | dataTmp <- featurePreProcessing(data[,featuresToPreProcess], dataTED[,featuresToPreProcess], preProcessF, nComp) 91 | 92 | #add other features that don't require feature preprocessing to the features obtained after preprocessing 93 | diffTmp <- setdiff(selectedFeats, c(cInd, featuresToPreProcess)) 94 | dataTDTmp <- cbind(dataTmp$TD, data[, diffTmp]) 95 | dataTEDTmp <- cbind(dataTmp$TED, dataTED[, diffTmp]) 96 | #add class column to the dataframe of the dataset 97 | dataTDTmp$class <- data$class 98 | dataTEDTmp$class <- dataTED$class 99 | data <- dataTDTmp 100 | dataTED <- dataTEDTmp 101 | } 102 | else{ 103 | data <- data[, selectedFeats] 104 | dataTED <- dataTED[, selectedFeats] 105 | } 106 | return (list(Train = data, Test = dataTED)) 107 | } 108 | -------------------------------------------------------------------------------- /R/evaluateMet.R: -------------------------------------------------------------------------------- 1 | #' @title Evaluate Fitted Model. 2 | #' 3 | #' @description Evaluate Predictions obtained from a specific model based on true labels, its predictions, and the evaluation metric. 4 | #' 5 | #' @param yTrue Vector of true labels. 6 | #' @param pred Vector of predicted labels. 
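#'
#' A minimal usage sketch (the label vectors below are invented for illustration):
#' \dontrun{
#' yTrue <- c('a', 'a', 'b', 'b')
#' pred  <- c('a', 'b', 'b', 'b')
#' evaluateMet(yTrue, pred, metric = 'avg-recall')
#' }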
7 | #' @param metric Metric to be used in evaluation: 8 | #' \itemize{ 9 | #' \item "acc" - Accuracy, 10 | #' \item "avg-fscore" - Average of F-Score of each label, 11 | #' \item "avg-recall" - Average of Recall of each label, 12 | #' \item "avg-precision" - Average of Precision of each label, 13 | #' \item "fscore" - Micro-Average of F-Score of each label, 14 | #' \item "recall" - Micro-Average of Recall of each label, 15 | #' \item "precision" - Micro-Average of Precision of each label. 16 | #' } 17 | #' 18 | #' @importFrom caret confusionMatrix 19 | #' 20 | #' @return Float number representing the evaluation. 21 | #' 22 | #' @examples 23 | #' \dontrun{ 24 | #' result1 <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff') 25 | #' } 26 | #' 27 | #' @noRd 28 | #' 29 | #' @keywords internal 30 | #' 31 | evaluateMet <- function(yTrue, pred, metric = 'acc'){ 32 | lvls <- union(pred, yTrue) 33 | cm = as.matrix(table(Actual = factor(yTrue, lvls), 34 | Predicted = factor(pred, lvls)) ) # create the confusion matrix 35 | n = sum(cm) # number of instances 36 | nc = nrow(cm) # number of classes 37 | diag = diag(cm) # number of correctly classified instances per class 38 | rowsums = apply(cm, 1, sum) # number of instances per class 39 | colsums = apply(cm, 2, sum) # number of predictions per class 40 | oneVsAll = lapply(1 : nc, 41 | function(i){ 42 | v = c(cm[i,i], 43 | rowsums[i] - cm[i,i], 44 | colsums[i] - cm[i,i], 45 | n-rowsums[i] - colsums[i] + cm[i,i]); 46 | return(matrix(v, nrow = 2, byrow = T))}) 47 | s = matrix(0, nrow = 2, ncol = 2) 48 | for(i in 1 : nc){s = s + oneVsAll[[i]]} 49 | 50 | if (metric == 'acc'){ 51 | perf <- sum(diag) / n 52 | } 53 | else if(metric == 'avg-precision'){ 54 | precision <- diag / colsums 55 | perf <- mean(precision) 56 | } 57 | else if(metric == 'avg-recall'){ 58 | recall <- diag / rowsums 59 | perf <- mean(recall) 60 | } 61 | else if(metric == 'avg-fscore'){ 62 | precision <- diag / colsums 63 | recall <- diag / rowsums 64 | f1 <- 2 * precision * recall / (precision + recall) 65 | perf <- mean(f1) 66 | } 67 | else{ 68 | perf <- (diag(s) / apply(s,1, sum))[1]; 69 | } 70 | 71 | return(perf) 72 | } 73 | -------------------------------------------------------------------------------- /R/evocate.R: -------------------------------------------------------------------------------- 1 | #' @export evocate 2 | evocate <- function(df_train, df_test, maxTime = 1, models = "xgboost", 3 | optimizationAlgorithm = "hyperband", bw = 3, max_iter = 81, kde_type = "single", 4 | problem = "classification", measure = "classif.acc", ensemble_size = 1) { 5 | 6 | total_time <- maxTime * 60 7 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params)) 8 | times <- (parameters_per_model * total_time) / (sum(parameters_per_model)) 9 | 10 | cat("Models selected:", models, '\n', sep = ' ') 11 | cat("Time distribution:", times, '\n', sep = ' ') 12 | 13 | run_optimization <- function(model, time) { 14 | results <- NULL 15 | priors <- data.frame() 16 | tic(model, "optimization time:") 17 | 18 | if(optimizationAlgorithm == 'hyperband') { 19 | current <- Sys.time() %>% as.integer() 20 | end <- (Sys.time() %>% as.integer()) + time 21 | 22 | repeat { 23 | gc(verbose = F) 24 | tic('current hyperband runtime') 25 | print(paste('Started', model, ' model...')) 26 | # Compute the time left for this model 27 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 28 | print(paste("There are:", time_left, "seconds left for this hyperband run")) 29 | 
res <- hyperband(df = df_train, model = model, max_iter = max_iter, 30 | maxtime = time_left, problem = problem, measure = measure) 31 | 32 | if(is_empty(flatten(res)) == F) { 33 | res <- res %>% 34 | map_dfr(.f = ~ .x[["answer"]]) %>% 35 | arrange(desc(acc)) %>% 36 | head(1) 37 | results <- c(list(res), results) 38 | print(paste('Best performance from hyperband this round: ', res$acc)) 39 | } 40 | # Break if the remaining time exceeds the allowed time budget 41 | elapsed <- (Sys.time() %>% as.integer()) - current 42 | if(elapsed >= time) { 43 | break 44 | } 45 | } 46 | } 47 | else if(optimizationAlgorithm == "bohb") { 48 | current <- Sys.time() %>% as.integer() 49 | end <- (Sys.time() %>% as.integer()) + time 50 | repeat { 51 | gc(verbose = F) 52 | tic("current bohb time") 53 | print(paste("started", model)) 54 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 55 | print(paste("There are:", time_left, "seconds left for this bohb run")) 56 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter, 57 | maxtime = time_left, priors = priors, kde_type = kde_type) 58 | 59 | if(is_empty(flatten(res)) == F) { 60 | priors <- res %>% 61 | map_dfr(.f = ~ .x[["sh_runs"]]) 62 | res <- res %>% 63 | map_dfr(.f = ~ .x[["answer"]]) %>% 64 | arrange(desc(acc)) %>% 65 | head(1) 66 | 67 | results <- c(list(res), results) 68 | print(paste('Best accuracy from hyperband this round: ', res$acc)) 69 | } 70 | 71 | elapsed <- (Sys.time() %>% as.integer()) - current 72 | if(elapsed >= time) { 73 | break 74 | } 75 | } 76 | } 77 | else { 78 | errorCondition(message = "Only hyperband and bohb are valid optimization algorithms at this moment.") 79 | break 80 | } 81 | toc() 82 | results 83 | } 84 | 85 | print("Starting to run all optimizations.") 86 | ans <- vector(mode = "list", length = length(models)) 87 | 88 | for(i in 1:length(models)) { 89 | flag <- TRUE 90 | tryCatch({ 91 | ans[[i]] <- run_optimization(models[[i]], times[[i]]) 92 | }, error = function(e) { 93 | cat('Error spotted: ') 94 | message(e) 95 | cat(' In ', models[[i]], ' model, going to the next model!\n') 96 | flag <<- FALSE 97 | }) 98 | if (!flag) next 99 | } 100 | 101 | # Arrange Results according to the best performance 102 | ensemble_size <- min(max(1, length(ans[[1]])), ensemble_size) 103 | print(ensemble_size) 104 | tryCatch({best_model <- ans %>% 105 | map(.f = ~ map_dfr(.x = .x, .f = ~ .x %>% select(model, acc))) %>% 106 | map_dfr(.f = ~ .x %>% arrange(desc(acc)) %>% head(ensemble_size)) %>% 107 | arrange(desc(acc)) 108 | print('----------------------####------------------------') 109 | # Return the best performing model 110 | results <- ensembling(best_model, df_train, df_test, problem = problem, measure = measure) 111 | return (results) 112 | }, error = function(e){ 113 | cat('Error spotted: ') 114 | message(e) 115 | cat('Try increasing the time budget or use a different model.\n') 116 | return (-1) 117 | }) 118 | 119 | } 120 | -------------------------------------------------------------------------------- /R/evocate_utilities.R: -------------------------------------------------------------------------------- 1 | #' @import nloptr 2 | #' @import bbotk 3 | 4 | #' @keywords internal 5 | 6 | ensembling = function(best_models, df_train, df_test, 7 | problem = 'classification', measure = 'classif.acc'){ 8 | lrns = c() 9 | for(i in 1:nrow(best_models)){ 10 | lrns = c(lrns, po('learner_cv', best_models[[1]][[i]], 11 | id = paste('lrn', as.character(i), sep='') )) 12 | } 13 | 14 | level0 = gunion(list( 15 | lrns)) %>>% 16 | 
po("featureunion", id = "union1") 17 | 18 | if(problem == 'classification'){ 19 | problem = 'classif' 20 | ensemble = level0 %>>% LearnerClassifAvg$new(id = "classif.avg") 21 | task = TaskClassif$new(id = 'final_eval', backend = df_train, target = 'class') 22 | } 23 | else{ 24 | problem = 'regr' 25 | ensemble = level0 %>>% LearnerRegrAvg$new(id = "regr.avg") 26 | task = TaskRegr$new(id = 'final_eval', backend = df_train, target = 'class') 27 | } 28 | 29 | ens_lrn = GraphLearner$new(ensemble) 30 | ens_lrn$predict_type = "prob" 31 | ens_lrn$train(task) 32 | perf <- ens_lrn$predict_newdata(df_test)$score(msr(measure)) 33 | return (list("model" = ens_lrn, "performance" = perf)) 34 | } 35 | -------------------------------------------------------------------------------- /R/featurePreProcessing.R: -------------------------------------------------------------------------------- 1 | #' @title Perform Feature Preprocessing if specified by user. 2 | #' 3 | #' @description Perform a preprocessing algorithm on the dataset and return the preprocessed one. 4 | #' 5 | #' @param data Data frame containing the dataset to process. 6 | #' @param dataTED Data frame containing the test dataset to process. 7 | #' @param preProcessF string containing the name of the preprocessing algorithm: 8 | #' "boxcox": apply a Box–Cox transform and values must be non-zero and positive in all features, 9 | #' "yeo-Johnson": apply a Yeo-Johnson transform, like a BoxCox, but values can be negative, 10 | #' "zv": remove attributes with a zero variance (all the same value), 11 | #' "center": subtract mean from values, 12 | #' "scale": divide values by standard deviation, 13 | #' "standardize": perform both centering and scaling, 14 | #' "normalize": normalize values, 15 | #' "pca": transform data to the principal components, 16 | #' "ica": transform data to the independent components. 17 | #' @param nComp Integer of Number of components needed if either "pca" or "ica" feature preprocessors are needed. 18 | #' 19 | #' @return List of two Dataframes of the preprocessed training and testing datasets. 20 | #' 21 | #' @examples featurePreProcessing(\code{data}, \code{dataTED}, "center", 0). 
22 | #' 23 | #' @noRd 24 | #' 25 | #' @keywords internal 26 | 27 | featurePreProcessing <- function(data, dataTED, preProcessF, nComp) { 28 | 29 | if(preProcessF == 'scale'){ 30 | preprocessParams <- preProcess(data, method=c("scale")) 31 | } 32 | else if(preProcessF == 'center'){ 33 | preprocessParams <- preProcess(data, method=c("center")) 34 | } 35 | else if(preProcessF == 'standardize'){ 36 | preprocessParams <- preProcess(data, method=c("center", "scale")) 37 | } 38 | else if(preProcessF == 'normalize'){ 39 | preprocessParams <- preProcess(data, method=c("range")) 40 | } 41 | else if(preProcessF == 'pca'){ 42 | if (is.na(nComp)) 43 | preprocessParams <- preProcess(data, method=c("pca")) 44 | else 45 | preprocessParams <- preProcess(data, method=c("center", "scale", "pca"), pcaComp = nComp) 46 | } 47 | else if(preProcessF == 'ica'){ 48 | preprocessParams <- preProcess(data, method=c("center", "scale", "ica"), n.comp=nComp) 49 | } 50 | else if(preProcessF == 'yeo-Johnson'){ 51 | preprocessParams <- preProcess(data, method=c("YeoJohnson")) 52 | } 53 | else if(preProcessF == 'boxcox'){ 54 | preprocessParams <- preProcess(data, method=c("BoxCox")) 55 | } 56 | else if(preProcessF == 'zv'){ 57 | preprocessParams <- preProcess(data, method=c("zv")) 58 | } 59 | else{ 60 | print('Error: No defined Preprocessing Algorithm...Skip feature preprocessing part!') 61 | return(list(TD = data, TED = dataTED)) 62 | } 63 | data <- predict(preprocessParams, data) 64 | dataTED <- predict(preprocessParams, dataTED) 65 | return(list(TD = data, TED = dataTED)) 66 | } 67 | -------------------------------------------------------------------------------- /R/fitModel.R: -------------------------------------------------------------------------------- 1 | #' @title Fit SMAC Model. 2 | #' 3 | #' @description Fit the trees of the SMAC forest model by adding new nodes to each of the forest trees. 4 | #' 5 | #' @param params A string of parameter configuration values for the current classifier to be tuned (parameters are separated by #). 6 | #' @param bestPerf Vector of performance values of the best parameter configuration on the folds of the SMAC model. 7 | #' @param trainingSet Dataframe of the training set. 8 | #' @param validationSet Dataframe of the validation Set. 9 | #' @param foldedSet List of the folds of the dataset in each tree of the SMAC forest. 10 | #' @param classifierAlgorithm String of the name of classifier algorithm used now. 11 | #' @param tree List of data frames, representing the data structure for the forest of trees of the SMAC model. 12 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10). 13 | #' @param metric Metric to be used in evaluation: 14 | #' \itemize{ 15 | #' \item "acc" - Accuracy, 16 | #' \item "avg-fscore" - Average of F-Score of each label, 17 | #' \item "avg-recall" - Average of Recall of each label, 18 | #' \item "avg-precision" - Average of Precision of each label, 19 | #' \item "fscore" - Micro-Average of F-Score of each label, 20 | #' \item "recall" - Micro-Average of Recall of each label, 21 | #' \item "precision" - Micro-Average of Precision of each label. 22 | #' } 23 | #' 24 | #' @return List of: \code{t} trees of fitted SMAC Model - \code{p} performance of current parameter configuration on whole dataset - \code{bp} Current added parameter configuration. 
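#'
#' Each candidate is keyed by its '#'-joined parameter string and appended as a
#' leaf to every tree of the forest (one tree per fold) after being evaluated on
#' that tree's fold. A minimal sketch of the key used for the lexicographic tree
#' walk:
#' \dontrun{
#' cntParamStr <- paste(unlist(cntParams), collapse = '#')
#' }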
25 | #' 26 | #' @examples fitModel('1', c(0.91, 0.89), data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), data.frame(salary = c(400, 800), class = c (0, 1)), list(c(1,2,4), c(3,5)), 'knn', data.frame(fold = c(), parent = c(), params = c(), leftChild = c(), rightChild = c(), performance = c(), rowN = c()), 2). 27 | #' 28 | #' @noRd 29 | #' 30 | #' @keywords internal 31 | 32 | fitModel <- function(params, bestPerf, trainingSet, validationSet, foldedSet, classifierAlgorithm, tree, B = 10, metric = 'acc') { 33 | #fit SMAC model using the current best parameters 34 | #get current best parameters 35 | cntParams <- params 36 | cntParamStr <- paste( unlist(cntParams), collapse='#') 37 | #initiate a variable to store its performance on each decision tree of the forest 38 | perf <- c() 39 | for(i in 1:B){ 40 | cntNode <- tree[tree$fold==i & is.na(tree$parent), ] 41 | #Get position to add the new node 42 | cParent <- NA 43 | cChild <- NA 44 | if(nrow(cntNode) > 0){ 45 | cParent <- cntNode$rowN 46 | while(!is.na(cntNode[[1]])){ 47 | cParent <- cntNode$rowN 48 | if(cntParamStr > as.character(cntNode$params)){ 49 | cntNode <- tree[as.integer(cntNode$rightChild), ] 50 | cChild <- 5 #pointer position to right node 51 | } 52 | else if(cntParamStr < as.character(cntNode$params)){ 53 | cntNode <- tree[as.integer(cntNode$leftChild), ] 54 | cChild <- 4 #pointer position to left node 55 | } 56 | else{ 57 | return(list(bp = params, t = tree, p=bestPerf)) 58 | } 59 | } 60 | } 61 | 62 | if(length(bestPerf) >= i) 63 | perf <- bestPerf 64 | else 65 | perf <- c(perf, (runClassifier(trainingSet[foldedSet[[i]], ], validationSet, cntParams, classifierAlgorithm, metric = metric))$perf) 66 | 67 | #row number of new node to be added 68 | newRowN <- nrow(tree) + 1 69 | #Update parent's child 70 | if(!is.na(cChild)) 71 | tree[cParent, cChild] <- newRowN 72 | #Add new node with current configuration 73 | df <- data.frame(fold = i, parent = cParent, params = cntParamStr, leftChild = NA, rightChild = NA, performance = perf[i], rowN = newRowN) 74 | tree <- rbind(tree, df) 75 | } 76 | 77 | cntParams$performance <- mean(perf) 78 | return(list(t = tree, p=perf, bp=cntParams)) 79 | } 80 | -------------------------------------------------------------------------------- /R/getCandidateClassifiers.R: -------------------------------------------------------------------------------- 1 | #' @title Get candidate Good Classifier Algorithms. 2 | #' 3 | #' @description Compare Dataset Meta-Features with the Knowledge base to recommend good Classifier Algorithms based on nearest neighbor datasets with outperformaing pipelines. 4 | #' 5 | #' @param maxTime Float of the maximum time budget allowed. 6 | #' @param metaFeatures List of the meta-features collected from the dataset. 7 | #' @param nModels Integer of number of required number of recommendations of classifier algorithms to get. 8 | #' 9 | #' @return List of recommended classifier algorithms, their initial parameter configurations, and time ratio to be spent in tuning each classifier. 
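#'
#' The time budget is split across the recommended classifiers in proportion to
#' fixed per-classifier tuning weights, keeping 10% of \code{maxTime} in reserve.
#' A small worked sketch of that allocation step (the weights shown are the ones
#' used for randomForest, svm and knn):
#' \dontrun{
#' ratio <- c(10, 21, 5)               # tuning weights: randomForest, svm, knn
#' ratio / sum(ratio) * (10 * 0.9)     # time assigned to each, for maxTime = 10
#' }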
10 | #' 11 | #' @examples getCandidateClassifiers(10, \code{metaFeatures}, 3) 12 | #' 13 | #' @importFrom BBmisc normalize 14 | #' @importFrom RMySQL MySQL fetch dbDisconnect dbSendQuery dbConnect 15 | #' @importFrom httr POST content 16 | #' @importFrom stats setNames 17 | #' 18 | #' @noRd 19 | #' 20 | #' @keywords internal 21 | 22 | getCandidateClassifiers <- function(maxTime, metaFeatures, nModels) { 23 | classifiers <- c('randomForest', 'c50', 'j48', 'svm', 'naiveBayes','knn', 'bagging', 'rda', 'neuralnet', 'plsda', 'part', 'deepboost', 'rpart', 'lda', 'lmt') 24 | classifiersWt <- c(10, 20, 11, 21, 10, 5, 25, 5, 5, 6, 11, 21, 6, 5, 10) #weight of each classifier to tune based on number and types of parameters 25 | 26 | #Choosen Classifiers parameters initialization 27 | params <- c() 28 | cclassifiers <- c() #chosen classifiers 29 | ratio <- c() #time ratios for each classifier 30 | KBFlag <- FALSE 31 | for(trial in 1:3){ #TRY to connect to knowledge base 32 | readKnowledgeBase <- try( 33 | { 34 | metaData <- content(POST("https://jncvt2k156.execute-api.eu-west-1.amazonaws.com/default/callKnowledgeBase")) 35 | KBFlag <- TRUE 36 | metaDataFeatures <- data.frame(matrix(unlist(metaData, recursive = FALSE), nrow = length(metaData), byrow = T)) 37 | colnames(metaDataFeatures) <- c('datasetRatio', 'featuresKurtStdDev', 'featuresKurtMean', 'featuresKurtMax', 'featuresKurtMin', 'featuresSkewStdDev', 'featuresSkewMean', 'featuresSkewMax', 'featuresSkewMin', 'symbolsStdDev', 'symbolsSum', 'symbolsMean', 'classProbStdDev', 'classProbMean', 'classProbMax', 'classProbMin', 'classEntropy', 'ratioNumToCat', 'nCatFeatures', 'nNumFeatures', 'nInstances', 'nFeatures', 'nClasses', 'lognFeatures', 'lognInstances', 'classifierAlgorithm', 'parameters', 'maxTime', 'metric', 'performance') 38 | 39 | #Remove useless columns for now 40 | metaDataFeatures$performance <- NULL 41 | metaDataFeatures$metric <- NULL 42 | metaDataFeatures$ipInserted <- NULL 43 | metaDataFeatures$maxTime <- NULL 44 | metaDataFeatures$dateInserted <- NULL 45 | metaDataFeatures$ID <- NULL 46 | metaFeatures$maxTime <- NULL 47 | 48 | #Separate Best Classifier Algorithms and Their Parameters 49 | bestClf <- metaDataFeatures$classifierAlgorithm 50 | nClasses <- metaDataFeatures$nClasses 51 | bestClfParams <- metaDataFeatures$parameters 52 | metaDataFeatures$classifierAlgorithm <- NULL 53 | metaDataFeatures$parameters <- NULL 54 | 55 | #Append new dataset meta features to the metaDataFeatures 56 | metaDataFeatures <- rbind(metaDataFeatures, metaFeatures) 57 | 58 | #Normalize the distance matrix 59 | metaDataFeatures[] <- suppressWarnings(lapply(metaDataFeatures, function(x) as.numeric(as.character(x)))) 60 | metaDataFeatures <- normalize(metaDataFeatures, method = "standardize", range = c(0, 1), margin = 1L, on.constant = "quiet") 61 | 62 | #Construct the distance list to extract the nearest neighbors 63 | cntMeta <- nrow(metaDataFeatures) 64 | distMat <- data.frame() 65 | distMat[['dist']] <- as.numeric() 66 | distMat[['index']] <- as.numeric() 67 | 68 | for(i in 1:(nrow(metaDataFeatures)-1)){ 69 | dist <- 0 70 | for(j in 1:ncol(metaDataFeatures)){ 71 | if(is.na(metaDataFeatures[i,j]) == TRUE && is.na(metaDataFeatures[cntMeta,j]) == TRUE) 72 | dist <- dist + 0 73 | 74 | else if ( (is.na(metaDataFeatures[i,j]) == TRUE && is.na(metaDataFeatures[cntMeta,j]) == FALSE) || (is.na(metaDataFeatures[i,j]) == FALSE && is.na(metaDataFeatures[cntMeta,j]) == TRUE) ) 75 | dist <- dist + 0.5 76 | 77 | else 78 | dist <- dist + 
(suppressWarnings(as.numeric(metaDataFeatures[i,j])) - suppressWarnings(as.numeric(metaDataFeatures[cntMeta, j])) )^2 79 | 80 | } 81 | tmpDist <- list(dist = dist, index = i) 82 | distMat <- rbind(distMat, tmpDist) 83 | } 84 | #Sort Dataframe 85 | orderInd <- order(distMat$dist) 86 | distMat <- distMat[orderInd, ] 87 | 88 | #Get best classifiers with their parameters 89 | for(i in 1:nrow(distMat)){ 90 | ind <- distMat[i,]$index 91 | clf <- bestClf[ind] 92 | if(is.element(clf, cclassifiers) == FALSE){ 93 | #Exception for deep Boost requires binary classes dataset 94 | if((clf == 'deepboost' && nClasses > 2)||clf == 'fda') 95 | next 96 | cclassifiers <- c(cclassifiers, clf) 97 | params <- c(params, bestClfParams[ind]) 98 | 99 | clfInd = which(classifiers == clf) 100 | ratio <- c(ratio, classifiersWt[clfInd]) 101 | } 102 | if(length(cclassifiers) == nModels) 103 | break 104 | } 105 | }) 106 | if(inherits(readKnowledgeBase, "try-error")){ 107 | KBFlag <- FALSE 108 | print('Warning: Can not connect to KnowledgeBase Data! Check your internet connectivity. Trying Again.') 109 | next 110 | } 111 | 112 | if(KBFlag == TRUE) #managed to get information from knowledge base 113 | break 114 | } 115 | 116 | if(KBFlag == FALSE) 117 | print('Assuming Random Classifiers will be used. You should use Large Time Budgets and nModels for better results') 118 | #Assign time ratio for each classifier 119 | if (length(cclassifiers) < nModels){ #failed to make use of meta-learning --> tune over all classifiers 120 | #cclassifiers <- classifiers 121 | for (clf in classifiers){ 122 | if(is.element(clf, cclassifiers) == TRUE) #already inserted this classifier 123 | next 124 | ind = which(classifiers == clf) 125 | ratio <- c(ratio, classifiersWt[ind]) 126 | cclassifiers <- c(cclassifiers, clf) 127 | params <- c(params, '') 128 | if(length(cclassifiers) == nModels) #completed number of required models 129 | break 130 | } 131 | } 132 | ratio <- ratio / sum(ratio) * (maxTime * 0.9) #Only using 90% of the allowed time budget 133 | 134 | return (list(c = cclassifiers, r = ratio, p = params)) 135 | } 136 | -------------------------------------------------------------------------------- /R/hb_utilities.R: -------------------------------------------------------------------------------- 1 | #' @importFrom data.table fcase 2 | #' @import purrr 3 | 4 | #' @keywords internal 5 | 6 | param_sample <- function(model, hparam, columns = NULL) { 7 | param = jsons[[model]][[hparam]] 8 | type <- param$type 9 | type_scale <- param$scale 10 | 11 | if(type == "boolean") { 12 | param_estimation <- paste(base::sample(x = as.list(param$values), size = 1), sep = "") 13 | param_estimation <- ifelse(param_estimation == "FALSE", FALSE, TRUE) 14 | return(param_estimation) 15 | } 16 | else if(type == "discrete") { 17 | param_estimation <- paste(base::sample(x = as.list(param$values), size = 1), sep = "") 18 | return(param_estimation) 19 | } 20 | 21 | else { 22 | int_val <- ifelse(hparam == "mtry", as.numeric(columns) - 1, as.numeric(param$maxVal)) 23 | param_estimation <- fcase(type_scale == "int", rdunif(1, a = as.numeric(param$minVal), 24 | b = int_val), 25 | type_scale == "any", runif(1, min = as.numeric(param$minVal), 26 | max = as.numeric(param$maxVal)), 27 | type_scale == "double", runif(1, min = as.numeric(param$minVal), 28 | max = as.numeric(param$maxVal)), 29 | type_scale == "exp", 2^rdunif(1, a = as.numeric(param$minVal), 30 | b = as.numeric(param$maxVal))) 31 | return(as.numeric(param_estimation)) 32 | } 33 | 34 | } 35 | 36 | #' @keywords 
internal 37 | get_random_hp_config <- function(model, columns = NULL) { 38 | param_db <- jsons[[model]] 39 | params_list <- param_db$params 40 | params_list_mapped <- map(.x = params_list, 41 | .f = as_mapper( ~ param_sample(model = model, 42 | hparam = .x, 43 | columns = columns))) 44 | `names<-`(params_list_mapped, params_list) 45 | } 46 | 47 | #' @keywords internal 48 | calc_n_r = function(max_iter = 81, eta = 3, s = 4, B = 405) { 49 | n = trunc(ceiling(trunc(B/max_iter/(s+1)) * eta**s)) 50 | r = max_iter * eta^(-s) 51 | ans = c(n, r) 52 | ans 53 | } 54 | -------------------------------------------------------------------------------- /R/hyperband.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal hyperband 2 | hyperband <- function(df, model, max_iter = 81, eta = 3, maxtime = 1000, 3 | problem = 'classification', measure = 'classif.acc') { 4 | logeta = as_mapper(~ log(.x) / log(eta)) 5 | s_max = trunc(logeta(max_iter)) 6 | B = (s_max + 1) * max_iter 7 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>% 8 | t() %>% 9 | `colnames<-`(value = c("n", "r")) %>% 10 | as.data.table() 11 | nrs$s = s_max:0 12 | partial_halving <- function(n, r, s) { 13 | successive_halving(df = df, model = model, 14 | params_config = replicate(n, get_random_hp_config(model, columns = ncol(df) - 1), 15 | simplify = FALSE), 16 | n = n, r = r, s_max = s, max_iter = max_iter, eta = eta, 17 | problem = problem, measure = measure) 18 | } 19 | 20 | liszt = vector(mode = "list", length = max(nrs$s) + 1) 21 | if (model != 'ranger'){ 22 | tryCatch({tmp <- withTimeout({ 23 | for (row in 1:nrow(nrs)) { 24 | liszt[[row]] <- partial_halving(nrs[[row, 1]], 25 | nrs[[row, 2]], 26 | nrs[[row, 3]]) 27 | print("Looped once") 28 | } 29 | }, timeout = maxtime, elapsed = maxtime) 30 | }, TimeoutException = function(ex) { 31 | err <- geterrmessage() 32 | if (startsWith(err, 'reached') == FALSE) 33 | print(paste('Error Found, ', err, ' Replace ', model, sep = '')) 34 | else 35 | print("Time Budget ended.") 36 | }, 37 | finally = function(ex) { 38 | print("Hyperband successfully finished.") 39 | }) 40 | } 41 | else{ 42 | current <- Sys.time() %>% as.integer() 43 | for (row in 1:nrow(nrs)) { 44 | tryCatch({liszt[[row]] <- partial_halving(nrs[[row, 1]], 45 | nrs[[row, 2]], 46 | nrs[[row, 3]]) 47 | }, Exception = function(ex) { 48 | err <- geterrmessage() 49 | print(paste('Error Found, ', err, ' Replace ', model, sep = '')) 50 | }) 51 | now <- Sys.time() %>% as.integer() 52 | if ((now - current) > maxtime){ 53 | print("Time Budget ended.") 54 | break 55 | } 56 | print("Looped once") 57 | } 58 | } 59 | return(liszt) 60 | } 61 | -------------------------------------------------------------------------------- /R/initialize.R: -------------------------------------------------------------------------------- 1 | #' @title Initialize the SMAC model. 2 | #' 3 | #' @description Initialize the SMAC model with the classifier default parameter configuration. 4 | #' 5 | #' @param classifierName String of the classifier algorithm name. 6 | #' @param result List of the converted classifier json parameter configuration into set of vectors and lists. 7 | #' @param initParams String of the initial parameter configuration of \code{classifierName} to start the model with. 
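#'
#' A sketch of the intended flow, mirroring how \code{runClassifier_} builds its
#' parameter frame (\code{getClassifierConf} loads the classifier's JSON
#' configuration; the value shown is illustrative):
#' \dontrun{
#' conf <- getClassifierConf('knn')
#' initialize('knn', conf, '7')   # start the SMAC model at k = 7
#' }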
8 | #' 9 | #' @return 10 | #' 11 | #' @examples 12 | #' 13 | #' @noRd 14 | #' 15 | #' @keywords internal 16 | 17 | initialize <- function(classifierName, result, initParams) { 18 | #get list of Classifier Parameters 19 | params <- result$params 20 | #get list of GrandParent parametes 21 | gparams <- result$parents 22 | #Create dataFrame for classifier default parameters 23 | defaultParams <- data.frame(matrix(ncol = length(params)+1, nrow = 1)) 24 | colnames(defaultParams) <- c(params, 'performance') 25 | i <- 1 26 | while(i <= length(gparams)){ 27 | parI <- gparams[i] 28 | defaultParams[[parI]] <- result[[parI]]$'default' 29 | require <- result[[parI]]$'requires'[[result[[parI]]$'default']]$'require' 30 | gparams <- c(gparams, require) 31 | i <- i + 1 32 | } 33 | 34 | if ( initParams != ""){ 35 | initParams <- unlist(strsplit(initParams, "#")) 36 | j <- 1 37 | for(i in colnames(defaultParams)){ 38 | if(i == 'performance' || i == 'nodesize') 39 | next 40 | if(initParams[j] == 'NA') 41 | defaultParams[[i]] <- NA 42 | else 43 | defaultParams[[i]] <- initParams[j] 44 | 45 | j <- j + 1 46 | } 47 | } 48 | defaultParams[["EI"]] <- NA 49 | return (defaultParams) 50 | } 51 | -------------------------------------------------------------------------------- /R/intensify.R: -------------------------------------------------------------------------------- 1 | #' @title Intensify of SMAC model 2 | #' 3 | #' @description Checking if current candidate parameter configuration is better than the current best parameter configuration chosen till now or not. 4 | #' 5 | #' @param R Dataframe of tried out candidate parameter configurations. 6 | #' @param bestParams String of best parameter configuration found till now. 7 | #' @param bestPerf Vector of performance of classifier on all folds of dataset. 8 | #' @param candidateConfs Vector of strings of candidate parameter configurations. 9 | #' @param trainingSet Dataframe of the training set. 10 | #' @param validationSet Dataframe of the validation Set. 11 | #' @param foldedSet List of the folds of the dataset in each tree of the SMAC forest. 12 | #' @param classifierAlgorithm String value of the classifier Name. 13 | #' @param maxTime Float of maximum time budget allowed. 14 | #' @param timeTillNow Float of the time spent till now. 15 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10). 16 | #' @param metric Metric to be used in evaluation: 17 | #' \itemize{ 18 | #' \item "acc" - Accuracy, 19 | #' \item "avg-fscore" - Average of F-Score of each label, 20 | #' \item "avg-recall" - Average of Recall of each label, 21 | #' \item "avg-precision" - Average of Precision of each label, 22 | #' \item "fscore" - Micro-Average of F-Score of each label, 23 | #' \item "recall" - Micro-Average of Recall of each label, 24 | #' \item "precision" - Micro-Average of Precision of each label. 25 | #' } 26 | #' 27 | #' @return List of current best parameter configuration, its performance, dataframe of tried out candidate parameter configurations, and time till now. 
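#'
#' Intensification compares the candidate against the incumbent on an
#' exponentially growing number of folds (1, 2, 4, ...) and stops early once the
#' candidate loses on more folds than it wins, the time budget is exhausted, or
#' it fails more than twice. A schematic sketch of that fold schedule:
#' \dontrun{
#' B <- 10                 # folds, as in the SMAC forest
#' N <- 1; pointer <- 1
#' while (pointer < B) { pointer <- pointer + N; N <- N * 2 }
#' }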
28 | #' 29 | #' @examples intensify(c('1'), '1', c(0.89, 0.91), list(c(1,2,4), c(3,5)), data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), data.frame(salary = c(400, 800), class = c (0, 1)), 'knn', 100, 5, 2) 30 | #' 31 | #' @noRd 32 | #' 33 | #' @keywords internal 34 | 35 | intensify <- function(R, bestParams, bestPerf, candidateConfs, foldedSet, trainingSet, validationSet, classifierAlgorithm, maxTime, timeTillNow , B = 10, metric = metric) { 36 | for(j in 1:nrow(candidateConfs)){ 37 | cntParams <- candidateConfs[j,] 38 | cntPerf <- c() 39 | folds <- sample(1:B) 40 | pointer <- 1 41 | timeFlag <- FALSE 42 | N <- 1 43 | #number of folds with higher performance for candidate configuration 44 | forMe <- 0 45 | #number of folds with lower performance for candidate configuration 46 | againstMe <- 0 47 | fails <- 0 48 | while(pointer < B){ 49 | for(i in pointer:min(pointer+N-1, B)){ 50 | tmpPerf <- runClassifier(trainingSet[foldedSet[[i]], ], validationSet, cntParams, classifierAlgorithm, metric = metric) 51 | if(tmpPerf$perf == 0){ 52 | fails <- fails + 1 53 | } 54 | cntPerf <- c(cntPerf, tmpPerf$perf) 55 | if(i > length(bestPerf)) 56 | tmpPerf <- runClassifier(trainingSet[foldedSet[[i]], ], validationSet, bestParams, classifierAlgorithm, metric = metric) 57 | bestPerf <- c(bestPerf, tmpPerf$perf) 58 | if(cntPerf[i] >= bestPerf[i])forMe <- forMe + 1 59 | else againstMe <- againstMe + 1 60 | 61 | #Check time consumed till now 62 | t <- toc(quiet = TRUE) 63 | timeTillNow <- timeTillNow + t$toc - t$tic 64 | tic(quiet = TRUE) 65 | if(timeTillNow > maxTime || fails > 2){ 66 | timeFlag <- TRUE 67 | break 68 | } 69 | } 70 | if(forMe < againstMe || timeFlag == TRUE) break 71 | pointer <- pointer + N 72 | N <- N * 2 73 | } 74 | #make the current candidate as the best candidate 75 | if(timeFlag == FALSE && forMe > againstMe){ 76 | bestParams <- cntParams 77 | bestPerf <- cntPerf 78 | } 79 | cntParams$performance <- mean(cntPerf) 80 | bestParams$performance <- mean(bestPerf) 81 | R <- rbind(R, cntParams) 82 | } 83 | return(list(params = bestParams, perf = bestPerf, r = R, timeTillNow = timeTillNow, fails = fails)) 84 | } 85 | -------------------------------------------------------------------------------- /R/intrepretability.R: -------------------------------------------------------------------------------- 1 | #' @title Perform Interpretability on Model. 2 | #' 3 | #' @description Perform Model interpretability on the select model by obtaining two plots for feature importance and feature interaction. 4 | #' 5 | #' @param model Fitted Model of any of the chosen classifiers and fitted on the training set. 6 | #' @param x Dataframe of the training set. 7 | #' 8 | #' @return List of two plots of feature importance and feature interaction. 
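#'
#' A usage sketch, assuming \code{model} was fitted on \code{trainDF}, a data
#' frame with a \code{class} column and numeric features:
#' \dontrun{
#' out <- interpret(model, trainDF)
#' plot(out$featImp)
#' plot(out$interact)
#' }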
9 | #' 10 | #' @examples interpret(\code{model}, data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1))) 11 | #' 12 | #' @importFrom iml FeatureImp Interaction Predictor 13 | #' 14 | #' @noRd 15 | #' 16 | #' @keywords internal 17 | 18 | Loss <- function(actual, predicted){ 19 | err <- 0 20 | for(i in 1:length(actual)){ 21 | act <- as.character(actual[i]) 22 | pred <- substring(as.character(predicted[i]), 2) 23 | if (act != pred) 24 | err <- err + 1 25 | } 26 | return(err/length(actual)) 27 | } 28 | 29 | interpret <- function(model, x){ 30 | clas = as.factor(x$class) 31 | X = x[which(names(x) != "class")] 32 | X[] <- lapply(X, function(x) { 33 | as.double(as.character(x)) 34 | }) 35 | predictor = Predictor$new(model, data = X, y = as.factor(clas)) 36 | out <- list() 37 | out$featImp <- FeatureImp$new(predictor, loss = Loss) 38 | out$interact = Interaction$new(predictor) 39 | return(out) 40 | } 41 | -------------------------------------------------------------------------------- /R/outClassifierConf.R: -------------------------------------------------------------------------------- 1 | #' @title Output Classifier Parameter Configuration. 2 | #' 3 | #' @description Get the classifier parameter configuration in a human readable format. 4 | #' 5 | #' @param classifierName String of the name of classifier algorithm used now. 6 | #' @param result List of the converted classifier json parameter configuration into set of vectors and lists. 7 | #' @param initParams String of parameters of \code{classifierName} separated by #. 8 | #' 9 | #' @return String of the human readable output in HTML format. 10 | #' 11 | #' @examples outClassifierConf('knn', list(params = c('k'), parents = c('k'), k = list(default = '7', require = c())), '1') 12 | #' 13 | #' @noRd 14 | #' 15 | #' @keywords internal 16 | 17 | outClassifierConf <- function(classifierName, result, initParams) { 18 | #get list of Classifier Parameters names 19 | params <- result$params 20 | #get list of GrandParent parameters 21 | gparams <- result$parents 22 | #Create dataFrame for classifier default parameters 23 | defaultParams <- data.frame(matrix(ncol = length(params), nrow = 1)) 24 | colnames(defaultParams) <- c(params) 25 | 26 | i <- 1 27 | while(i <= length(gparams)){ 28 | parI <- gparams[i] 29 | defaultParams[[parI]] <- result[[parI]]$'default' 30 | require <- result[[parI]]$'requires'[[result[[parI]]$'default']]$'require' 31 | gparams <- c(gparams, require) 32 | i <- i + 1 33 | } 34 | 35 | return(initParams) 36 | } 37 | -------------------------------------------------------------------------------- /R/readDataset.R: -------------------------------------------------------------------------------- 1 | #' @title Read Dataset File into Memory. 2 | #' 3 | #' @description Read the file of the dataset, and split it into training and validation sets. 4 | #' 5 | #' @param directory String of the directory to the file containing the training dataset. 6 | #' @param testDirectory String of the directory to the file containing the testing dataset. 7 | #' @param vRatio The split ratio of the dataset file into training, and validation sets default(10% Validation - 90% Training). 8 | #' @param classCol String of the class column of the dataset. 9 | #' @param preProcessF Vector of Strings of the preprocessing algorithm to apply. 10 | #' @param featuresToPreProcess Vector of numbers of features columns to perform preprocessing - empty vector means all features. 
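#'
#' An illustrative call combining these options (the paths are placeholders; see
#' the full example further below):
#' \dontrun{
#' splits <- readDataset('train.csv', 'test.csv', vRatio = 0.3, classCol = 'label',
#'                       preProcessF = c('standardize'), featuresToPreProcess = c(),
#'                       nComp = NA, missingOpr = FALSE, metric = 'acc', balance = FALSE)
#' }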
11 | #' @param nComp Number of components needed if either "pca" or "ica" feature preprocessors are needed. 12 | #' @param missingOpr Boolean variable represents either delete instances with missing values or apply imputation using "MICE" library - (default = 0 --> delete instances). 13 | #' @param metric Metric of string character to be used in evaluation: 14 | #' @param balance Boolean variable represents if SMOTE class balancing is required or not (default FALSE). 15 | #' \itemize{ 16 | #' \item "acc" - Accuracy, 17 | #' \item "avg-fscore" - Average of F-Score of each label, 18 | #' \item "avg-recall" - Average of Recall of each label, 19 | #' \item "avg-precision" - Average of Precision of each label, 20 | #' \item "fscore" - Micro-Average of F-Score of each label, 21 | #' \item "recall" - Micro-Average of Recall of each label, 22 | #' \item "precision" - Micro-Average of Precision of each label. 23 | #' } 24 | #' 25 | #' @return List of the Training and Validation Sets splits. 26 | #' 27 | #' @examples readDataset('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv', 0.1, c(), 'class', 'pca', c(), 2) 28 | #' 29 | #' @import RWeka 30 | #' @import farff 31 | #' @import caret 32 | #' @import mice 33 | #' @importFrom UBL SmoteClassif 34 | #' @importFrom imputeMissings compute impute 35 | #' @importFrom utils read.csv 36 | #' @importFrom stats complete.cases 37 | #' 38 | #' @noRd 39 | #' 40 | #' @keywords internal 41 | 42 | readDataset <- function(directory, testDirectory, vRatio = 0.3, classCol, preProcessF, featuresToPreProcess, nComp, missingOpr, metric, balance) { 43 | #check if CSV or arff 44 | ext <- substr(directory, nchar(directory)-2, nchar(directory)) 45 | #Read CSV file of data 46 | if(ext == 'csv'){ 47 | con <- file(directory, "r") 48 | data <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = TRUE) 49 | close(con) 50 | con <- file(testDirectory, "r") 51 | dataTED <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = TRUE) 52 | close(con) 53 | } 54 | else{ 55 | data <- readARFF(directory) 56 | dataTED <- readARFF(testDirectory) 57 | } 58 | 59 | #Sampling from large datasets 60 | maxSample = 20000000 61 | n = as.integer(maxSample / ncol(data)) 62 | if(maxSample < nrow(data) * ncol(data)){ 63 | sampleInds <- createDataPartition(y = data$class, times = 1, p = n/nrow(data), list = FALSE) 64 | data <- data[sampleInds,] 65 | } 66 | 67 | #change column name of classes to be "class" 68 | colnames(data)[which(names(data) == classCol)] <- "class" 69 | colnames(dataTED)[which(names(dataTED) == classCol)] <- "class" 70 | cInd <- grep("class", colnames(data)) #index of class column 71 | #function which returns function which will encode vectors with values of class column labels 72 | label_encoder <- function(vec){ 73 | levels <- sort(unique(vec)) 74 | function(x){ 75 | match(x, levels) 76 | } 77 | } 78 | classEncoder <- label_encoder(data$class) # create class encoder 79 | data$class <- classEncoder(data$class) # encoding class labels of training set 80 | dataTED$class <- classEncoder(dataTED$class) # encoding class labels of testing set 81 | 82 | #check either to delete an instance with missing values or perform imputation 83 | if (missingOpr == FALSE){ 84 | missingVals <- imputeMissings::compute(data, method = "median/mode") 85 | data <- impute(data, object = missingVals) 86 | dataTED <- impute(dataTED, object = missingVals) 87 | } 88 | else{ 89 | data <-complete( mice(data, m = 1, threshold = 1, printFlag = FALSE)) 90 | dataTED <- complete(mice(dataTED, m = 
1, threshold = 1, printFlag = FALSE)) 91 | } 92 | 93 | #remove ID features 94 | numericFlag <- unlist(lapply(data, is.numeric)) 95 | rmvFlag = c() 96 | for(i in 1:ncol(data)){ 97 | len = length(unique(data[,i])) 98 | if(numericFlag[i] == FALSE && ((len / nrow(data) > 0.5) || len == 1) ) 99 | rmvFlag <- c(rmvFlag, i) 100 | } 101 | keepFlag = c(1:ncol(data)) 102 | keepFlag = keepFlag[!keepFlag %in% rmvFlag] 103 | data <- data[, keepFlag] 104 | dataTED <- dataTED[, keepFlag] 105 | 106 | #Select all remaining features 107 | selectedFeats <- c(1:ncol(data)) 108 | 109 | #perform preprocessing 110 | if(length(featuresToPreProcess ) == 0){ 111 | numericFlag <- unlist(lapply(data, is.numeric)) 112 | for(i in 1:ncol(data)){ 113 | if(numericFlag[i] == TRUE && i != cInd) 114 | featuresToPreProcess <- c(featuresToPreProcess, i) 115 | } 116 | } 117 | if(length(preProcessF) != 0 && length(featuresToPreProcess) > 1){ 118 | featuresToPreProcess <- featuresToPreProcess[!featuresToPreProcess %in% cInd] #remove class column from set of features to be preprocessed 119 | dataTmp = list(TD = data[,featuresToPreProcess], TED = dataTED[,featuresToPreProcess]) 120 | #Add PCA if we have more than 100 features 121 | if(length(featuresToPreProcess) > 100 && any('pca' != preProcessF) ) 122 | preProcessF <- c(preProcessF, 'pca') 123 | for(i in 1:length(preProcessF)){ 124 | dataTmp <- featurePreProcessing(dataTmp$TD, dataTmp$TED, preProcessF[i], nComp) 125 | } 126 | 127 | #add other features that don't require feature preprocessing to the features obtained after preprocessing 128 | diffTmp <- setdiff(selectedFeats, c(cInd, featuresToPreProcess)) 129 | dHead = c(colnames(dataTmp$TD), colnames(data)[diffTmp]) 130 | 131 | dataTDTmp <- data.frame(cbind(dataTmp$TD, data[,diffTmp])) 132 | dataTEDTmp <- data.frame(cbind(dataTmp$TED, dataTED[,diffTmp])) 133 | colnames(dataTDTmp) <- dHead 134 | colnames(dataTEDTmp) <- dHead 135 | 136 | #add class column to the dataframe of the dataset 137 | dataTDTmp$class <- data$class 138 | dataTEDTmp$class <- dataTED$class 139 | data <- dataTDTmp 140 | dataTED <- dataTEDTmp 141 | } 142 | 143 | #Class Balancing using Smote for metrics other than accuracy and binary class problems 144 | if( balance == TRUE || (metric != 'acc' && length(unique(data$class)) == 2) ){ 145 | data$class = factor(data$class) 146 | data <- SmoteClassif(class ~., data, dist = 'HEOM') 147 | } 148 | 149 | # Use 70% of the dataset as Training - 30% of the dataset as Validation by default 150 | #smp_size <- floor((1-vRatio) * nrow(data)) 151 | # set the seed to make your partition reproducible 152 | #train_ind <- sample(seq_len(nrow(data)), size = smp_size) 153 | train_ind <- createDataPartition(y = data$class, times = 1, p = (1-vRatio), list = FALSE) 154 | trainingDataset <- data[train_ind, ] 155 | validationDataset <- data[-train_ind, ] 156 | return (list(TD = trainingDataset, VD = validationDataset, FULLTD = data, TED = dataTED)) 157 | } 158 | -------------------------------------------------------------------------------- /R/runClassifier_.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal 2 | runClassifier_ <- function(trainingSet, validationSet, params, classifierAlgorithm, metric = "acc") { 3 | 4 | #training set features and classes 5 | xFeatures <- subset(trainingSet, select = -class) 6 | xClass <- c(subset(trainingSet, select = class)$'class') 7 | 8 | #print(levels(xClass)) 9 | 10 | #validation set features and classes 11 | yFeatures <- subset(validationSet, 
select = -class) 12 | yClass <- c(subset(validationSet, select = class)$'class') 13 | 14 | #print(levels(yClass)) 15 | 16 | #remove not available parameters 17 | if(typeof(params) == 'character'){ 18 | classifierConf <- getClassifierConf(classifierAlgorithm) 19 | params <- initialize(classifierAlgorithm, classifierConf, params) 20 | } 21 | for(i in colnames(params)){ 22 | if(is.na(params[[i]]) || params[[i]] == 'NA' || params[[i]] == 'EI'){ 23 | params <- subset(params, select = -get(i)) 24 | } 25 | } 26 | # build model 27 | if(classifierAlgorithm == 'svm'){ 28 | if(exists('gamma', where=params) && !is.na(params$gamma)) 29 | params$gamma <- (2^ as.double(params$gamma)) 30 | if(exists('cost', where=params) && !is.na(params$cost)) 31 | params$cost <- (2^ as.double(params$cost)) 32 | if(exists('tolerance', where=params) && !is.na(params$tolerance)) 33 | params$tolerance <- (2^ as.double(params$tolerance)) 34 | if(!exists('kernel', where = params)) 35 | params$kernel <- 'radial' 36 | invisible(capture.output(suppressWarnings(model <- do.call(svm,c(list(x = xFeatures, y = xClass, type = 'C-classification', scale = F), params))))) 37 | #check performance 38 | pred <- predict(model, yFeatures) 39 | } 40 | else if(classifierAlgorithm == 'l2-linear-classifier'){ 41 | params$cost <- (2^as.numeric(params$cost)) 42 | params$epsilon <- as.numeric(params$epsilon) 43 | model <- LiblineaR(target = as.factor(xClass), data = xFeatures, cost = params$cost, epsilon = params$epsilon, type = 2) 44 | pred <- predict(model, yFeatures)$predictions 45 | } 46 | else if(classifierAlgorithm == 'naiveBayes'){ 47 | if(!exists('eps', where = params)) { 48 | params$laplace <- as.numeric(params$laplace) 49 | 50 | model <- fnb.train(x = xFeatures, y = as.factor(xClass), laplace = params$laplace) 51 | } 52 | if(exists('eps', where = params)) { 53 | 54 | params$laplace <- as.numeric(params$laplace) 55 | params$eps <- (2 ^ as.numeric(params$eps)) 56 | learn <- cbind(xClass, xFeatures) 57 | model <- naiveBayes(as.factor(xClass) ~., data = learn, laplace = params$laplace, eps = params$eps) 58 | 59 | } 60 | 61 | pred <- predict(model, yFeatures) 62 | 63 | } 64 | else if(classifierAlgorithm == 'boosting'){ 65 | params$eta <- (2^as.numeric(params$eta)) 66 | params$max_depth <- as.numeric(params$max_depth) 67 | params$min_child_weight <- as.numeric(params$min_child_weight) 68 | params$gamma <- as.numeric(params$gamma) 69 | params$colsample_bytree <- as.numeric(params$colsample_bytree) 70 | 71 | xClass_dmat <- xClass %>% as.numeric() %>% map(.f = ~ .x - 1) 72 | xFeatures_dmat <- xFeatures %>% as.matrix() 73 | mode(xFeatures_dmat) = 'double' 74 | yFeatures_dmat <- yFeatures %>% as.matrix() 75 | mode(yFeatures_dmat) = 'double' 76 | 77 | learn <- xgb.DMatrix(data = xFeatures_dmat, label = xClass_dmat) 78 | model <- xgboost(data = learn, 79 | nrounds = 5, 80 | eta = params$eta, 81 | max_depth = params$max_depth, 82 | min_child_weight = params$min_child_weight, 83 | gamma = params$gamma, 84 | colsample_bytree = params$colsample_bytree, 85 | objective = "multi:softprob", 86 | num_class = length(unique(xClass_dmat)), 87 | verbose = 0, 88 | nthread = 1) 89 | 90 | pred_prep <- predict(model, yFeatures_dmat, nthreads = 1) 91 | 92 | pred_mat <- matrix(pred_prep, ncol = length(unique(xClass_dmat)), byrow = T) 93 | 94 | colnames(pred_mat) <- levels(trainingSet$class) 95 | 96 | pred <- apply(pred_mat, 1, function(x) colnames(pred_mat)[which.max(x)]) 97 | 98 | levels(pred) <- levels(trainingSet$class) 99 | 100 | } 101 | else if(classifierAlgorithm 
== 'ranger'){ 102 | params$max.depth <- as.numeric(params$max.depth) 103 | params$num.trees <- as.numeric(params$num.trees) 104 | params$mtry <- min(as.numeric(params$mtry), ncol(xFeatures)) 105 | params$min.node.size <- as.numeric(params$min.node.size) 106 | learn <- cbind(xClass, xFeatures) 107 | model <- ranger(as.factor(xClass) ~ ., 108 | data = learn, 109 | max.depth = params$max.depth, 110 | num.trees = params$num.trees, 111 | mtry = params$mtry, 112 | min.node.size = params$min.node.size, 113 | num.threads = 1) 114 | pred <- predict(model, yFeatures, num.threads = 1)$prediction 115 | } 116 | else if(classifierAlgorithm == 'randomForest'){ 117 | params$mtry <- as.numeric(params$mtry) 118 | params$ntree <- as.numeric(params$ntree) 119 | params$mtry <- min(params$mtry, ncol(xFeatures)) 120 | model <- do.call(randomForest,c(list(x = xFeatures, y = as.factor(xClass)), params)) 121 | pred <- predict(model, yFeatures) 122 | } 123 | if (classifierAlgorithm != 'boosting') { 124 | 125 | perf <- evaluateMet(yClass, pred, metric = metric) 126 | 127 | } 128 | else { 129 | 130 | perf <- evaluateMet(validationSet$class, pred %>% factor(levels = levels(validationSet$class)), metric = metric) 131 | 132 | } 133 | 134 | result <- list() 135 | result$perf <- perf 136 | 137 | result$model <- model 138 | result$pred <- pred 139 | 140 | return(result) 141 | } 142 | -------------------------------------------------------------------------------- /R/selectConfiguration.R: -------------------------------------------------------------------------------- 1 | #' @title Select Candidate Parameter Configuration 2 | #' 3 | #' @description Generate neighbor parameter configurations, sort them according to the expected improvement, and select the top promising ones as candidate configurations. 4 | #' 5 | #' @param R Dataframe of tried out parameter configurations. 6 | #' @param classifierAlgorithm String value of the classifier Name. 7 | #' @param tree List of data frames, representing the data structure for the forest of trees of the SMAC model. 8 | #' @param bestParams String of best parameter configuration found till now. 9 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10). 10 | #' 11 | #' @return Vector of strings of candidate parameter configurations. 
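#'
#' Conceptually, each stored configuration is scored by expected improvement and
#' the top ones are perturbed one parameter at a time; continuous parameters are
#' resampled around their current value on a [0, 1] scale (draws outside that
#' range are rejected). A schematic sketch of the perturbation, with illustrative
#' names:
#' \dontrun{
#' meanU  <- (cntParam - minVal) / (maxVal - minVal)
#' newVal <- rnorm(1, mean = meanU, sd = 0.2) * (maxVal - minVal) + minVal
#' }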
12 | #' 13 | #' @examples selectConfiguration(c('1'), 'knn', data.frame(fold = c(), parent = c(), params = c(), leftChild = c(), rightChild = c(), performance = c(), rowN = c()), '1', 10) 14 | #' 15 | #' @import rjson 16 | #' @importFrom stats rnorm 17 | #' 18 | #' @noRd 19 | #' 20 | #' @keywords internal 21 | 22 | selectConfiguration <- function(R, classifierAlgorithm, tree, bestParams, B = 10) { 23 | #Read Classifier Algorithm Configuration Parameters 24 | #Open the Classifier Parameters Configuration File 25 | classifierConfDir <- system.file("extdata", paste(classifierAlgorithm,'.json',sep=""), package = "SmartML", mustWork = TRUE) 26 | result <- fromJSON(file = classifierConfDir) 27 | 28 | #get list of Classifier Parameters 29 | params <- result$params 30 | 31 | #minimum error rate found till now 32 | cmin <- (1 - bestParams$performance) 33 | 34 | #calculate Expected Improvement for all saved configurations 35 | for(i in 1:nrow(R)){ 36 | cntParams <- R[i,] 37 | cntParamStr <- paste( unlist(cntParams), collapse='#') 38 | cntPerf <- c() 39 | #calculate Expected improvment from SMAC random forest model 40 | for(j in 1:B){ 41 | cntNode <- tree[tree$fold==j & is.na(tree$parent), ] 42 | while(!is.na(cntNode[1])){ 43 | cParent <- cntNode$rowN 44 | cntNode$params 45 | if(cntParamStr > as.character(cntNode$params) && !is.na(cntNode$rightChild)){ 46 | cntNode <- tree[cntNode$rightChild, ] 47 | } 48 | else if(cntParamStr < as.character(cntNode$params) && !is.na(cntNode$leftChild)){ 49 | cntNode <- tree[cntNode$leftChild, ] 50 | } 51 | else{ 52 | cntPerf <- c(cntPerf, cntNode$performance) 53 | cntNode <- NA 54 | } 55 | } 56 | } 57 | cntParams$EI <- computeEI(cmin, cntPerf) 58 | R[i, ] <- cntParams 59 | } 60 | #sort according to Expected Improvement 61 | sortedR <- R[order(-R$EI),] 62 | #choose best promising configurations to suggest candidate configurations 63 | candidates <- R[0,] 64 | for(i in 1:min(10, nrow(R))){ 65 | cntParams <- R[i,] 66 | for(parI in params){ 67 | tmpParams <- cntParams 68 | cntParam <- cntParams[[parI]] 69 | if(is.na(cntParam)) 70 | next 71 | #for continuous Integer parameters 72 | if(result[[parI]]$type == 'continuous' && result[[parI]]$scale == 'int'){ 73 | minVal <- as.double(result[[parI]]$minVal) 74 | maxVal <- as.double(result[[parI]]$maxVal) 75 | cntParam <- as.double(cntParam) 76 | 77 | #generate a candidate 78 | parValues <- c(result[[parI]]$values) 79 | 80 | while(cntParam == cntParams[[parI]]){ 81 | cntParam <- sample(minVal:maxVal, 1, TRUE) 82 | if(result[[parI]]$constraint == 'odd' && (cntParam %% 2) == 0) 83 | cntParam = cntParams[[parI]] 84 | } 85 | tmpParams[[parI]] <- cntParam 86 | gparams <- c(parI) 87 | i <- 1 88 | while(i <= length(gparams)){ 89 | parTmp <- gparams[i] 90 | if(parTmp != parI){ 91 | if(is.na(cntParams[[parTmp]]))tmpParams[[parTmp]] <- result[[parTmp]]$default 92 | else tmpParams[[parTmp]] <- cntParams[[parTmp]] 93 | } 94 | i <- i + 1 95 | } 96 | tmpParams$EI <- NA 97 | tmpParams$performance <- NA 98 | candidates <- rbind(candidates, tmpParams) 99 | } 100 | #for continuous Non-Integer parameters 101 | else if(result[[parI]]$type == 'continuous'){ 102 | minVal <- as.double(result[[parI]]$minVal) 103 | maxVal <- as.double(result[[parI]]$maxVal) 104 | cntParam <- as.double(cntParam) 105 | meanU <- (cntParam - minVal)/(maxVal - minVal) 106 | #generate four candidates 107 | num <- 1 108 | while(num < 5){ 109 | cntParam <- rnorm(1, mean = meanU, sd = 0.2) 110 | if(cntParam <= 1 && cntParam >= 0){ 111 | num <- num + 1 112 | tmpParams[[parI]] <- 
as.character(cntParam * (maxVal - minVal) + minVal) 113 | tmpParams$EI <- NA 114 | tmpParams$performance <- NA 115 | candidates <- rbind(candidates, tmpParams) 116 | } 117 | } 118 | } 119 | #for Categorical (discrete parameters) 120 | else if(result[[parI]]$type == 'discrete'){ 121 | parValues <- c(result[[parI]]$values) 122 | while(cntParam == cntParams[[parI]]) 123 | cntParam <- sample(parValues, 1) 124 | tmpParams[[parI]] <- cntParam 125 | gparams <- c(parI) 126 | i <- 1 127 | while(i <= length(gparams)){ 128 | parTmp <- gparams[i] 129 | if(parTmp != parI){ 130 | if(is.na(cntParams[[parTmp]]))tmpParams[[parTmp]] <- result[[parTmp]]$default 131 | else tmpParams[[parTmp]] <- cntParams[[parTmp]] 132 | } 133 | require <- result[[parTmp]]$'requires'[[cntParam]]$require 134 | gparams <- c(gparams, require) 135 | i <- i + 1 136 | } 137 | tmpParams$EI <- NA 138 | tmpParams$performance <- NA 139 | candidates <- rbind(candidates, tmpParams) 140 | } 141 | } 142 | } 143 | candidates <- unique(candidates) 144 | 145 | #Remove Duplicate Candidate Configurations 146 | duplicates <- c() 147 | for(i in 1:nrow(candidates)){ 148 | for(j in 1:nrow(R)){ 149 | flager <- FALSE 150 | for(k in 1:(ncol(candidates)-2)){ 151 | if((!is.na(candidates[i,k]) && !is.na(candidates[i,k])) || candidates[i,k] != R[j,k]){ 152 | flager <- TRUE 153 | break 154 | } 155 | } 156 | if(flager == FALSE) 157 | duplicates <- c(duplicates, i) 158 | } 159 | } 160 | if(length(duplicates) > 0) 161 | candidates <- candidates[-duplicates,] 162 | #End Remove Candidate Configurations 163 | return(candidates) 164 | } 165 | -------------------------------------------------------------------------------- /R/sendToDatabase.R: -------------------------------------------------------------------------------- 1 | #' @title Send Results to Knowledge Base 2 | #' 3 | #' @description Connect to the cloud knowledge base to store the results obtained to be used in meta-learning of future runs. 4 | #' @param tmp String of characters to be sent to knowledge base 5 | #' @return None 6 | #' 7 | #' @examples sendToDatabase() 8 | #' 9 | #' @noRd 10 | #' 11 | #' @import devtools 12 | #' @importFrom rjson fromJSON 13 | #' @importFrom httr POST 14 | #' 15 | #' @keywords internal 16 | 17 | sendToDatabase <- function(tmp){ 18 | #Get IP 19 | cntIP <- fromJSON(readLines("http://api.hostip.info/get_json.php", warn=F))$ip 20 | 21 | #Update knowledge base 22 | updateKB <- try( 23 | { 24 | #tmp <- paste(readLines(system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE)), collapse="\n") 25 | res <- POST("https://jncvt2k156.execute-api.eu-west-1.amazonaws.com/default/s3-trigger-rautoml", body = list(data = paste(tmp, "&DATA&", sep=""), 26 | fName = paste(cntIP,".csv&FILENAME&", sep=""), 27 | encode = "json")) 28 | #write("", file=system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE),append=TRUE) #Empty the tmp file 29 | }) 30 | if(inherits(updateKB, "try-error")) 31 | print('Failed to update Knowledge base.') 32 | 33 | } 34 | -------------------------------------------------------------------------------- /R/sendToTmp.R: -------------------------------------------------------------------------------- 1 | #' @title Write results. 2 | #' 3 | #' @description Append results to a log file. 4 | #' 5 | #' @param df List of the dataset meta-features 6 | #' @param algorithmName String of the name of selected classifier algorithm. 7 | #' @param bestParams String of the best parameters configuration found. 
8 | #' @param perf String of the performance value obtained using the selected algorithm and parameter configuration. 9 | #' @param nModels Integer representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization. 10 | #' @param metric Metric to be used in evaluation: 11 | #' \itemize{ 12 | #' \item "acc" - Accuracy, 13 | #' \item "fscore" - Micro-Average of F-Score of each label, 14 | #' \item "recall" - Micro-Average of Recall of each label, 15 | #' \item "precision" - Micro-Average of Precision of each label. 16 | #' } 17 | #' 18 | #' @return None 19 | #' 20 | #' @examples sendToTmp(\code{df}, 'knn', '1', '0.9'). 21 | #' 22 | #' @noRd 23 | #' 24 | #' @keywords internal 25 | 26 | sendToTmp <- function(df, algorithmName, bestParams, perf, nModels, metric = 'acc') { 27 | df$params <- sprintf("%s", paste( unlist(bestParams), collapse='#')) 28 | df$performance <- perf 29 | df$classifierAlgorithm <- sprintf("%s", algorithmName) 30 | 31 | query <- sprintf("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s", 32 | df$datasetRatio, df$featuresKurtStdDev, df$featuresKurtMean, df$featuresKurtMax, df$featuresKurtMin, df$featuresSkewStdDev, 33 | df$featuresSkewMean, df$featuresSkewMax, df$featuresSkewMin, df$symbolsStdDev, df$symbolsSum, df$symbolsMean, df$classProbStdDev, 34 | df$classProbMean, df$classProbMax, df$classProbMin, df$classEntropy, df$ratioNumToCat, df$nCatFeatures, df$nNumFeatures, 35 | df$nInstances, df$nFeatures, df$nClasses, df$lognFeatures, df$lognInstances, df$classifierAlgorithm, df$params, df$maxTime, metric, 36 | df$performance, nModels) 37 | return(query) 38 | #write(query, file=system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE),append=TRUE) 39 | } 40 | -------------------------------------------------------------------------------- /R/successive_halving.R: -------------------------------------------------------------------------------- 1 | #' @keywords internal 2 | #' 3 | successive_halving <- function(df, model, params_config, n = 81, r = 1, eta = 3, 4 | max_iter = 81, s_max = 5, evaluations = data.frame(), 5 | problem = 'classification', measure = 'classif.acc') { 6 | 7 | final_df = params_config 8 | print('GOT HERE 0') 9 | if(problem == 'classification'){ 10 | problem = 'classif' 11 | task = TaskClassif$new(id = 'sh', backend = df, target = 'class') 12 | } 13 | else{ 14 | problem = 'regr' 15 | task = TaskRegr$new(id = 'sh', backend = df, target = 'class') 16 | } 17 | param_number = length(params_config) 18 | 19 | for (k in 0:s_max) { 20 | gc() 21 | n_i = n * (eta ** -k) 22 | r_i = r * (eta ** k) 23 | r_p = r_i / max_iter 24 | min_train_datapoints = (length(unique(df$class)) * 3) + 1 25 | min_prob_datapoints = min_train_datapoints / nrow(df$class) 26 | train_idxs <- sample(task$nrow, task$nrow * max(min(r_p, 0.8), min_prob_datapoints)) 27 | test_idxs <- setdiff(seq_len(task$nrow), train_idxs) 28 | if (problem == 'classif') 29 | learners <- replicate(n = n_i, expr = {lrn(paste(problem, sep = '.', model), 30 | predict_type = 'prob')}) 31 | else 32 | learners <- replicate(n = n_i, expr = {lrn(paste(problem, sep = '.', model))}) 33 | 34 | print('GOT HERE 1') 35 | j = 1 36 | for (i in learners) { 37 | cnt_field <- final_df[[j]] 38 | ## Some conditions to filter the parameter values 39 | if (model == 'svm' && final_df[[j]]$kernel != 'polynomial') 40 | cnt_field$degree <- NULL 41 | if ( (model == 'svm' && final_df[[j]]$kernel == 'linear') || 
(model == 'cv_glmnet' && final_df[[j]]$relax == FALSE)) 42 | cnt_field$gamma <- NULL 43 | 44 | i$param_set$values = cnt_field 45 | j = j + 1 46 | } 47 | 48 | print('GOT HERE 2') 49 | for (l in learners) { 50 | l$train(task = task, row_ids = train_idxs) 51 | } 52 | 53 | print('GOT HERE 3') 54 | preds <- map(.x = learners, .f = ~ .x$predict(task, row_ids = test_idxs)$score(msr(measure))) 55 | 56 | 57 | final_df <- final_df %>% 58 | as.data.table() %>% 59 | t() %>% 60 | `colnames<-`(value = jsons[[model]]$params) %>% 61 | as.data.table() 62 | 63 | 64 | final_df[, acc := unlist(preds)] 65 | final_df[, budget := r_i] 66 | final_df[, budget := r_p] 67 | final_df[, model := unlist(learners)] 68 | setorder(final_df, -acc) 69 | evaluations <- rbindlist(list(evaluations, final_df)) 70 | 71 | 72 | final_df <- final_df %>% 73 | head(max(n_i/eta, 1)) 74 | 75 | 76 | if(k == s_max){ 77 | return(list("answer" = final_df, "sh_runs" = evaluations)) 78 | } 79 | 80 | final_df$acc = NULL 81 | final_df$budget = NULL 82 | final_df$model = NULL 83 | final_df <- purrr::transpose(final_df) 84 | 85 | } 86 | } 87 | -------------------------------------------------------------------------------- /R/successive_resampling.R: -------------------------------------------------------------------------------- 1 | #' @importFrom KernSmooth dpik bkde 2 | #' @importFrom tidyr drop_na separate gather spread unite 3 | #' @importFrom dplyr select mutate_if arrange top_frac case_when mutate filter 4 | #' @importFrom truncnorm rtruncnorm dtruncnorm 5 | 6 | #' @keywords internal 7 | dpikSafe <- function(x, ...) 8 | { 9 | result <- try(dpik(x, ...), silent = TRUE) 10 | if (class(result) == "try-error") 11 | { 12 | msg <- geterrmessage() 13 | if (grepl("scale estimate is zero for input data", msg)) 14 | { 15 | warning("Using standard deviation as scale estimate, probably because IQR == 0") 16 | result <- try(dpik(x, scalest = "stdev", ...), silent = TRUE ) 17 | if (class(result) == "try-error") { 18 | msg <- geterrmessage() 19 | if (grepl("scale estimate is zero for input data", msg)) { 20 | warning("0 scale, bandwidth estimation failed. 
using 1e-3") 21 | result <- 1e-3 22 | } 23 | } 24 | } else 25 | { 26 | stop(msg) 27 | } 28 | } 29 | return(result) 30 | } 31 | 32 | #' @keywords internal 33 | successive_resampling <- function(df, model, samples = 64, n = 27, bw = 3, kde_type = "single") { 34 | samples_filtered <- df %>% drop_na() 35 | params_list <- jsons[[model]]$params 36 | length_params <- length(params_list) 37 | biggest_budget_that_satisfies <- samples_filtered %>% 38 | mutate(acc = as.numeric(acc)) %>% 39 | group_by(budget) %>% 40 | mutate(size = n()) %>% 41 | ungroup() %>% 42 | filter(size > ((length_params + 1) * 20/3)) %>% 43 | filter(budget == max(budget)) %>% 44 | arrange(desc(acc)) %>% 45 | select(-size) %>% 46 | separate(col = params, 47 | into = jsons[[model]]$params, 48 | sep = ",") %>% 49 | select(-model, -rp) %>% 50 | mutate_if(is.character, .funs = ~ str_extract(.x, pattern = "(?<==).*$") %>% parse_number) 51 | l_samples <- biggest_budget_that_satisfies %>% 52 | top_frac(0.15, wt = acc) %>% 53 | select(-acc, -budget) 54 | 55 | g_samples <- biggest_budget_that_satisfies %>% 56 | top_frac(-0.85, wt = acc) %>% 57 | select(-acc, -budget) 58 | 59 | l_kde_bws <- suppressWarnings(map_dbl(l_samples, dpikSafe)) 60 | g_kde_bws <- suppressWarnings(map_dbl(g_samples, dpikSafe)) 61 | l_kde_means <- map2_dbl(.x = l_samples, .y = l_kde_bws, .f = ~ mean(bkde(x = .x, bandwidth = .y)$x)) 62 | g_kde_means <- map2_dbl(.x = g_samples, .y = g_kde_bws, .f = ~ mean(bkde(x = .x, bandwidth = .y)$x)) 63 | maxvals <- map_dbl(.x = params_list, .f = ~ readr::parse_number(jsons[[model]][[.x]]$maxVal)) 64 | minvals <- map_dbl(.x = params_list, .f = ~ readr::parse_number(jsons[[model]][[.x]]$minVal)) 65 | types <- map_chr(.x = params_list, .f = ~ jsons[[model]][[.x]]$scale) 66 | partial_rtruncnorm <- function(n, a, b, mu, sigma, type) { 67 | case_when(type == "int" ~ round(rtruncnorm(n = n, a = a, b = b, mean = mu, sd = sigma)), 68 | type == "double" | type == "exp" ~ rtruncnorm(n = n, a = a, b = b, mean = mu, sd = sigma)) 69 | } 70 | 71 | partial_dtruncnorm <- function(x, a, b, mu, sigma) { 72 | dtruncnorm(x = x, a = a, b = b, mean = mu, sd = sigma) 73 | } 74 | 75 | batch_samples <- pmap_dfc(.l = list("a" = minvals, 76 | "b" = maxvals, 77 | "mu" = l_kde_means, 78 | "sigma" = l_kde_bws * bw, 79 | "type" = types), 80 | .f = partial_rtruncnorm, 81 | n = samples) %>% 82 | set_names(nm = params_list) 83 | 84 | batch_samples_densities_l <- pmap_dfc(.l = list("x" = batch_samples, 85 | "a" = minvals, 86 | "b" = maxvals, 87 | "mu" = l_kde_means, 88 | "sigma" = l_kde_bws), 89 | .f = partial_dtruncnorm) 90 | 91 | batch_samples_densities_g <- pmap_dfc(.l = list("x" = batch_samples, 92 | "a" = minvals, 93 | "b" = maxvals, 94 | "mu" = g_kde_means, 95 | "sigma" = g_kde_bws), 96 | .f = partial_dtruncnorm) 97 | 98 | evaluate_batch_convolution <- batch_samples_densities_l / batch_samples_densities_g 99 | 100 | rank_sample_density <- function(samp, kdensity, n) { 101 | samp <- samp %>% as.data.frame() 102 | samp$rank <- kdensity 103 | sorted_samp <- samp %>% arrange(desc(rank)) %>% head(n) 104 | subset(sorted_samp, select = -rank) 105 | } 106 | 107 | if(kde_type == "mixed") { 108 | EI <- evaluate_batch_convolution %>% 109 | reduce(.f = `*`) %>% 110 | map_if(.p = ~ ((is.nan(.x) | is.infinite(.x)) == T), 111 | .f = ~ runif(1, min = 1e-5, max = 1e-3)) %>% 112 | flatten_dbl() 113 | 114 | batch_samples$rank <- EI 115 | 116 | evaluated_batch <- batch_samples %>% 117 | arrange(desc(rank)) %>% 118 | top_n(n = n, wt = rank) 119 | 120 | evaluated_batch_step_two <- 
evaluated_batch %>% 121 | select(-rank) %>% 122 | gather(key, value) %>% 123 | mutate(params = paste(key, value, sep = " = ")) %>% 124 | .[["params"]] 125 | 126 | eval_batch_step_three <- evaluated_batch_step_two %>% 127 | matrix(nrow = n, ncol = length(params_list)) %>% 128 | as.data.frame() %>% 129 | unite(col = "params", sep = ",") %>% 130 | mutate(model = model) %>% 131 | select(model, params) 132 | 133 | return(eval_batch_step_three) 134 | 135 | } else if(kde_type == "single") { 136 | evaluated_batch <- map2_dfc(.x = batch_samples, .y = evaluate_batch_convolution, 137 | .f = rank_sample_density, n = n) 138 | 139 | colnames(evaluated_batch) <- params_list 140 | final_df <- evaluated_batch %>% 141 | gather(key, value) %>% 142 | mutate(params = paste(key, value, sep = " = ")) %>% 143 | .[["params"]] %>% 144 | matrix(nrow = n, ncol = length(params_list)) %>% 145 | as.data.frame() %>% 146 | unite(col = "params", sep = ",") %>% 147 | mutate(model = model) %>% 148 | select(model, params) 149 | 150 | return(final_df) 151 | } 152 | 153 | } 154 | -------------------------------------------------------------------------------- /R/sysdata.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/R/sysdata.rda -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | --- 4 | 5 | 6 | 7 | ```{r setup, include = FALSE} 8 | knitr::opts_chunk$set( 9 | collapse = TRUE, 10 | comment = "#>", 11 | fig.path = "man/figures/README-", 12 | out.width = "100%" 13 | ) 14 | ``` 15 | 16 | # witchcraft 17 | 18 | [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/witchcraft)](https://cran.r-project.org/package=witchcraft) 19 | [![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental) 20 | [![Travis build status](https://travis-ci.org/brurucy/witchcraft.svg?branch=master)](https://travis-ci.org/brurucy/witchcraft) 21 | 22 | 23 | The R package *witchcraft* is an opinionated framework for automated machine learning, with the intent of being frequently updated with the newest state-of-the-art optimization methods. 24 | 25 | At the moment, *witchcraft* uses the [Bayesian-Optimization-Hyperband](https://arxiv.org/pdf/1603.06560.pdf) algorithm. 26 | 27 | Besides *Combined Algorithm Selection and Hyperparameter optimization*, *witchcraft* provides tools to evaluate the results, which are consistent with the mlr3 workflow. 28 | 29 | ## Installation 30 | 31 | Soon, installing the **stable** version from [CRAN](https://cran.r-project.org/package=witchcraft) will be possible: 32 | 33 | ```{r cran-installation, eval = FALSE} 34 | install.packages("witchcraft") 35 | ``` 36 | 37 | You can always install the **development** version from 38 | [GitHub](https://github.com/brurucy/witchcraft) 39 | 40 | ```{r gh-installation, eval = FALSE} 41 | # install.packages("remotes") 42 | remotes::install_github("brurucy/witchcraft") 43 | ``` 44 | 45 | Installing this software requires a compiler. 
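If you are unsure whether a build toolchain is available, a quick pre-flight check along the following lines can help. This is only a sketch: `Sys.which()` is base R, while `pkgbuild` is an optional helper that is *not* a dependency of this package.

```{r toolchain-check, eval = FALSE}
# Hypothetical pre-flight check (not part of this package): confirm that a
# compiler toolchain is visible before installing from source.
Sys.which("make")                        # empty string if no 'make' on the PATH
# install.packages("pkgbuild")           # optional helper, assumed here
pkgbuild::has_build_tools(debug = TRUE)  # TRUE when a working compiler is found
```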
46 | 47 | ## Valid example 48 | 49 | ```{r example, message=FALSE, eval=FALSE} 50 | library(SmartML) 51 | library(readr) 52 | 53 | data_train <- readr::read_csv('inst/extdata/dota_train.csv') %>% 54 | as.data.table() 55 | 56 | data_test <- readr::read_csv('inst/extdata/dota_test.csv') %>% 57 | as.data.table() 58 | 59 | data_train[, class := factor(class, levels = unique(class)) %>% sort()] 60 | data_test[, class := factor(class, levels = unique(class)) %>% sort()] 61 | 62 | params <- SmartML:::get_random_hp_config('kknn', columns = ncol(data_train) - 1) 63 | 64 | print(typeof(params$kernel)) 65 | params 66 | 67 | ``` 68 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | [![DOI](http://joss.theoj.org/papers/10.21105/joss.00786/status.svg)](https://doi.org/10.5441/002/edbt.2019.54) 5 | 6 | 7 | ## SmartML: 8 | Currently, SmartML is an R package that provides a meta-learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of a machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about the meta-features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta-features and searches its knowledge base for the best-performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base and improve its performance and robustness for future runs. 9 | 10 | 11 | 12 | --- 13 | ## SmartML Contribution Points and Goals: 14 | 15 | The goal of SmartML is to automate the process of classifier algorithm selection and hyper-parameter tuning in supervised machine learning, using a modified version of SMAC Bayesian optimization that prefers exploitation over exploration thanks to Meta-Learning. 16 | 1. SmartML is the first R package to tackle supervised machine learning automation, and it is built on top of 16 different classifier algorithms from different R packages.
17 | 2. In addition, we offer different data preprocessing and feature engineering algorithms that can be specified by the user and applied easily to tabular datasets in either CSV or ARFF format. 18 | 3. SmartML has a collaborative knowledge base that grows over time as more users use the tool. 19 | 4. Finally, SmartML can produce model interpretability plots for feature importance and interaction with the help of the ```iml``` package. 20 | 21 | --- 22 | ## Installation 23 | 24 | You can install SmartML from [GitHub](https://github.com/mmaher22/SmartML) with: 25 | 26 | ``` r 27 | devtools::install_github("mmaher22/SmartML") 28 | ``` 29 | 30 | --- 31 | ## User Manual 32 | 33 | The manual for the SmartML R package can be found HERE 34 | 35 | --- 36 | ## Example 37 | 38 | This is a basic example that shows how to run SmartML: 39 | 40 | ```{r} 41 | library(SmartML) 42 | ``` 43 | 44 | ```{r} 45 | #' Option 1 = Classifier Selection Only, apply PCA as a preprocessing step with 4 components and get two candidate models as output only 46 | result1 <- autoRLearn(1, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff', option = 1, preProcessF = 'pca', nComp = 4, nModels = 2) 47 | 48 | #option 1 runs for Classifier Algorithm Selection Only 49 | result1$clfs #Vector of recommended nModels classifiers 50 | result1$params #Vector of initial suggested parameter configurations of nModels recommended classifiers 51 | 52 | #Use recommended model to train over training data and make predictions over test data 53 | resultRun <- runClassifier(result1$TRData, result1$TEData, result1$params[[1]], result1$clfs[[1]]) 54 | resultRun$perf #model performance on test set 55 | ``` 56 | 57 | ```{r} 58 | #' Option 2 = Both Classifier Selection and Parameter Optimization and compute model interpretability plots 59 | result2 <- autoRLearn(2, 'sampleDatasets/car/train.arff', 'sampleDatasets/car/test.arff', interp = TRUE) # Option 2 runs for both classifier algorithm selection and parameter tuning for 2 minutes. 60 | 61 | result2$clfs #best classifier found 62 | result2$params #parameter configuration for best classifier 63 | result2$perf #performance of chosen classifier on testing set after fitting on whole training set 64 | ``` 65 | 66 | ```{r} 67 | plot(result2$interpret$featImp) #Feature Importance Plot 68 | ``` 69 | 70 | ```{r} 71 | #' Option 2 = Both Classifier Selection and Parameter Optimization, use a 20% validation set from the training set, and apply MICE for missing-value imputation 72 | result3 <- autoRLearn(5, 'sampleDatasets/EEGEyeState/train.csv', 'sampleDatasets/EEGEyeState/test.csv', vRatio = 0.2, missingOpr = TRUE) # Option 2 runs for both classifier algorithm selection and parameter tuning for 5 minutes. 73 | 74 | 75 | result3$clfs #best classifier found 76 | result3$params #parameter configuration for best classifier 77 | result3$perf #performance of chosen classifier on testing set 78 | ``` 79 | 80 | --- 81 | ## Contribution Guidelines to SmartML 82 | To contribute to `SmartML`, please follow these guidelines 83 | 84 | --- 85 | ## Publication 86 | 87 | SmartML has been accepted as a demo paper at EDBT 2019 in Lisbon, Portugal [PDF]: 88 | ``` 89 | Mohamed Maher, Sherif Sakr. SMARTML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms (2019).
Advances in Database Technology-EDBT 2019: 22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29. 90 | ``` 91 | 92 | --- 93 | ## Licence: 94 | This work is licensed under the terms of the GNU General Public License, version 3.0 (GPLv3) 95 | -------------------------------------------------------------------------------- /SmartML.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageCheckArgs: –as-cran 22 | PackageRoxygenize: rd,collate,namespace 23 | -------------------------------------------------------------------------------- /SmartML_0.3.0.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/SmartML_0.3.0.pdf -------------------------------------------------------------------------------- /codecov.yml: -------------------------------------------------------------------------------- 1 | comment: false 2 | coverage: 3 | status: 4 | project: 5 | default: 6 | target: auto 7 | threshold: 1% 8 | patch: 9 | default: 10 | target: auto 11 | threshold: 1% 12 | language: R 13 | sudo: false 14 | 15 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/inst/extdata/hyperband_jsons.zip -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/cv_glmnet.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["dfmax", "alpha", "gamma", "relax", "nfolds"], 3 | "parents":["dfmax", "alpha", "gamma", "relax", "nfolds"], 4 | "gamma": 5 | { 6 | "type":"continuous", 7 | "scale":"double", 8 | "minVal":"0", 9 | "maxVal":"1", 10 | "default":"0.5", 11 | "constraint":"any" 12 | }, 13 | "alpha": 14 | { 15 | "type":"continuous", 16 | "scale":"double", 17 | "minVal":"0", 18 | "maxVal":"1", 19 | "default":"0.3", 20 | "constraint":"any" 21 | }, 22 | "dfmax": 23 | { 24 | "type":"continuous", 25 | "scale":"int", 26 | "minVal":"10", 27 | "maxVal":"100", 28 | "default":"50", 29 | "constraint":"any" 30 | }, 31 | "nfolds": 32 | { 33 | "type":"continuous", 34 | "scale":"int", 35 | "minVal":"3", 36 | "maxVal":"3", 37 | "default":"3", 38 | "constraint":"any" 39 | }, 40 | "relax": 41 | { 42 | "type":"boolean", 43 | "values":["TRUE", "FALSE"], 44 | "default":"FALSE" 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/glmnet.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["dfmax", "alpha", "gamma", "relax"], 3 | "parents":["dfmax", "alpha", "gamma", "relax"], 4 | "gamma": 5 | { 6 | "type":"continuous", 7 | "scale":"double", 8 | "minVal":"0", 9 | "maxVal":"1", 10 | "default":"0.5", 11 | 
"constraint":"any" 12 | }, 13 | "alpha": 14 | { 15 | "type":"continuous", 16 | "scale":"double", 17 | "minVal":"0", 18 | "maxVal":"1", 19 | "default":"0.3", 20 | "constraint":"any" 21 | }, 22 | "dfmax": 23 | { 24 | "type":"continuous", 25 | "scale":"int", 26 | "minVal":"10", 27 | "maxVal":"100", 28 | "default":"50", 29 | "constraint":"any" 30 | }, 31 | "relax": 32 | { 33 | "type":"boolean", 34 | "values":["TRUE", "FALSE"], 35 | "default":"FALSE" 36 | } 37 | } 38 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/kknn.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["k", "distance", "kernel"], 3 | "parents":["k", "distance", "kernel"], 4 | "k": 5 | { 6 | "type":"continuous", 7 | "scale":"int", 8 | "minVal":"1", 9 | "maxVal":"20", 10 | "default":"7", 11 | "constraint":"any" 12 | }, 13 | "distance": 14 | { 15 | "type":"continuous", 16 | "scale":"int", 17 | "minVal":"1", 18 | "maxVal":"4", 19 | "default":"2", 20 | "constraint":"any" 21 | }, 22 | "kernel": 23 | { 24 | "type":"discrete", 25 | "values":["rectangular", "epanechnikov", "gaussian", "rank", "optimal"], 26 | "default":"optimal" 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/lm.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["singular.ok"], 3 | "parents":["singular.ok"], 4 | "type": 5 | { 6 | "type":"boolean", 7 | "values":["TRUE"], 8 | "default":"TRUE" 9 | } 10 | } 11 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/naive_bayes.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["laplace"], 3 | "parents":["laplace"], 4 | "laplace": 5 | { 6 | "default":"0", 7 | "type":"continuous", 8 | "scale":"int", 9 | "minVal":"0", 10 | "maxVal":"4", 11 | "constraint":"any" 12 | } 13 | } 14 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/ranger.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["num.trees", "mtry", "max.depth", "min.node.size", "verbose"], 3 | "parents":["num.trees", "mtry", "max.depth", "min.node.size", "verbose"], 4 | "num.trees": 5 | { 6 | "type":"continuous", 7 | "scale":"int", 8 | "minVal":"1", 9 | "maxVal":"500", 10 | "default":"500", 11 | "constraint":"any" 12 | }, 13 | "mtry": 14 | { 15 | "type":"continuous", 16 | "scale":"int", 17 | "minVal":"1", 18 | "maxVal":"30", 19 | "default":"5", 20 | "constraint":"any" 21 | }, 22 | "max.depth": 23 | { 24 | "type":"continuous", 25 | "scale":"int", 26 | "minVal":"0", 27 | "maxVal":"10", 28 | "default":"0", 29 | "constraint":"any" 30 | }, 31 | "min.node.size": 32 | { 33 | "type":"continuous", 34 | "scale":"int", 35 | "minVal":"1", 36 | "maxVal":"10", 37 | "default":"2", 38 | "constraint":"any" 39 | }, 40 | "verbose": 41 | { 42 | "type":"boolean", 43 | "values":["FALSE"], 44 | "default":"FALSE" 45 | } 46 | } 47 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/rpart.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["maxdepth", "minsplit"], 3 | "parents":["maxdepth", "minsplit"], 4 | "maxdepth": 5 | { 6 | "type":"continuous", 7 | "scale":"int", 8 | "minVal":"1", 9 | 
"maxVal":"30", 10 | "default":"6", 11 | "constraint":"any" 12 | }, 13 | "minsplit": 14 | { 15 | "type":"continuous", 16 | "scale":"int", 17 | "minVal":"1", 18 | "maxVal":"30", 19 | "default":"10", 20 | "constraint":"any" 21 | } 22 | } 23 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/svm.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["kernel", "type", "degree", "gamma", "cost"], 3 | "parents":["kernel", "type", "degree", "gamma", "cost"], 4 | "kernel": 5 | { 6 | "type":"discrete", 7 | "values":["linear", "radial", "polynomial"], 8 | "default":"linear" 9 | }, 10 | "type": 11 | { 12 | "type":"discrete", 13 | "values":["C-classification"], 14 | "default":"C-classification" 15 | }, 16 | "gamma": 17 | { 18 | "default":"-4", 19 | "type":"continuous", 20 | "minVal":"-10", 21 | "maxVal":"5", 22 | "scale":"exp", 23 | "constraint":"any" 24 | }, 25 | "degree": 26 | { 27 | "default":"3", 28 | "type":"continuous", 29 | "minVal":"2", 30 | "maxVal":"5", 31 | "scale":"int", 32 | "constraint":"any" 33 | }, 34 | "cost": 35 | { 36 | "default":"-2", 37 | "type":"continuous", 38 | "minVal":"-6", 39 | "maxVal":"12", 40 | "scale":"exp", 41 | "constraint":"any" 42 | } 43 | } 44 | -------------------------------------------------------------------------------- /inst/extdata/hyperband_jsons/xgboost.json: -------------------------------------------------------------------------------- 1 | { 2 | "params":["eta", "max_depth", "nrounds", "verbose", "min_child_weight"], 3 | "parents":["eta", "max_depth", "nrounds", "verbose", "min_child_weight"], 4 | "verbose": 5 | { 6 | "type":"continuous", 7 | "scale":"int", 8 | "minVal":"0", 9 | "maxVal":"0", 10 | "default":"0" 11 | }, 12 | "nrounds": 13 | { 14 | "type":"continuous", 15 | "scale":"int", 16 | "minVal":"10", 17 | "maxVal":"1000", 18 | "default":"10" 19 | }, 20 | "eta": 21 | { 22 | "type":"continuous", 23 | "scale":"double", 24 | "minVal":"0.01", 25 | "maxVal":"0.5", 26 | "default":"0.3" 27 | }, 28 | "max_depth": 29 | { 30 | "type":"continuous", 31 | "scale":"int", 32 | "minVal":"2", 33 | "maxVal":"10", 34 | "default":"6" 35 | }, 36 | "min_child_weight": 37 | { 38 | "type":"continuous", 39 | "scale":"int", 40 | "minVal":"1", 41 | "maxVal":"10", 42 | "default":"1" 43 | } 44 | } 45 | -------------------------------------------------------------------------------- /inst/extdata/ta_test.csv: -------------------------------------------------------------------------------- 1 | X1.1,X1.2,X2.1,X2.2,X2.3,X2.4,X2.5,X2.6,X2.7,X2.8,X2.9,X2.10,X2.11,X2.12,X2.13,X2.14,X2.15,X2.16,X2.17,X2.18,X2.19,X2.20,X2.21,X2.22,X2.23,X2.24,X2.25,X3.1,X3.2,X3.3,X3.4,X3.5,X3.6,X3.7,X3.8,X3.9,X3.10,X3.11,X3.12,X3.13,X3.14,X3.15,X3.16,X3.17,X3.18,X3.19,X3.20,X3.21,X3.22,X3.23,X3.24,X3.25,X3.26,X4.1,X4.2,X5,class 2 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.104308876612462,3 3 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 4 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.4550689930872934,2 5 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,1 6 | 0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1 7 | 
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.6877397085145439,3 8 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3 9 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.1736260149034599,2 10 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.4062967303307103,2 11 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3857518547962953,2 12 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1 13 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.22239827766004297,3 14 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 15 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.610182803372127,2 16 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1 17 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,1 18 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,1 19 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.9574348331790463,1 20 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.8428535187993775,3 21 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.6184225702235457,3 22 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.026751971470045,3 23 | 1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,3 24 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3081949496538785,2 25 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,-0.5326258982297103,2 26 | 0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2 27 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.7735363805083795,2 28 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2 29 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1 30 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.2306380445114617,1 31 | -------------------------------------------------------------------------------- /inst/extdata/ta_train.csv: -------------------------------------------------------------------------------- 1 | 
X1.1,X1.2,X2.1,X2.2,X2.3,X2.4,X2.5,X2.6,X2.7,X2.8,X2.9,X2.10,X2.11,X2.12,X2.13,X2.14,X2.15,X2.16,X2.17,X2.18,X2.19,X2.20,X2.21,X2.22,X2.23,X2.24,X2.25,X3.1,X3.2,X3.3,X3.4,X3.5,X3.6,X3.7,X3.8,X3.9,X3.10,X3.11,X3.12,X3.13,X3.14,X3.15,X3.16,X3.17,X3.18,X3.19,X3.20,X3.21,X3.22,X3.23,X3.24,X3.25,X3.26,X4.1,X4.2,X5,class 2 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.6877397085145439,3 3 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3 4 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.6389674457579608,3 5 | 1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.39805696347929165,3 6 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3 7 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,3 8 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2.3369795920397123,3 9 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 10 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-1.463308759938712,3 11 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,3 12 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0.087829342909624325,3 13 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3 14 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2 15 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.1736260149034599,2 16 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2 17 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.4062967303307103,2 18 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3857518547962953,2 19 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1.096069109761043,2 20 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2 21 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,2 22 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,2 23 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2 24 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.1530811393690448,2 25 | 
0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2 26 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,2 27 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.01027243776720751,1 28 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1 29 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,1 30 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.1530811393690448,1 31 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1 32 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,1 33 | 0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,1 34 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1 35 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1 36 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.6389674457579608,3 37 | 1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.39805696347929165,3 38 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.104308876612462,3 39 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 40 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3 41 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,3 42 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.3369795920397123,3 43 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 44 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-1.463308759938712,3 45 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,3 46 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0.087829342909624325,3 47 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3 48 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2 49 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2 50 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1.096069109761043,2 51 | 
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2 52 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,2 53 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,2 54 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.4550689930872934,2 55 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2 56 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.1530811393690448,2 57 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2 58 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,2 59 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,1 60 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.01027243776720751,1 61 | 0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1 62 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1 63 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,1 64 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.1530811393690448,1 65 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,1 66 | 0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,1 67 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1 68 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1 69 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3 70 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,-1.3081949496538785,3 71 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3 72 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3081949496538785,3 73 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,-0.6877397085145439,3 74 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.3287398251882936,3 75 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 76 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3 77 | 
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,3 78 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-0.8428535187993775,3 79 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3 80 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,3 81 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,-0.22239827766004297,3 82 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,3 83 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,3 84 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.7652966136569607,2 85 | 1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.4550689930872934,2 86 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2 87 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.075524234226628,2 88 | 1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.5531707737641253,2 89 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,2 90 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,2 91 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2 92 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.9979673290842112,2 93 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.22239827766004297,2 94 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2 95 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.075524234226628,1 96 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1 97 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1 98 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,1 99 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.3205000583368748,1 100 | 0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.47561386862170846,1 101 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.2306380445114617,1 102 | 
0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1 103 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1 104 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.928650190793213,1 105 | 1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,3 106 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,3 107 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.3287398251882936,3 108 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.9204104239417944,2 109 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.7652966136569607,2 110 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1.2511829200458766,2 111 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,-0.8428535187993775,2 112 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,-0.610182803372127,2 113 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,1 114 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1 115 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3081949496538785,1 116 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,-0.9979673290842112,1 117 | 0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1 118 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,1 119 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,1 120 | 0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.14484137251762613,1 121 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1 122 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1 123 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,1 124 | -------------------------------------------------------------------------------- /inst/extdata/test_schizo.csv: -------------------------------------------------------------------------------- 1 | target,gain_ratio_1,gain_ratio_2,gain_ratio_3,gain_ratio_4,gain_ratio_5,gain_ratio_6,gain_ratio_7,gain_ratio_8,gain_ratio_9,gain_ratio_10,gain_ratio_11,sex,y 2 | PS,0.879,0.864,0.804,0.65,0.74,0.766,0.866,0.817,0.879,0.733,0.845,female,non-schizophrenic 3 | 
PS,0.919,0.875,0.828,0.915,0.883,0.802,0.802,0.77,0.963,0.932,1.01,female,non-schizophrenic 4 | CS,0.829,0.753,0.774,0.716,0.776,0.793,0.738,0.731,0.76,0.636,0.642,female,non-schizophrenic 5 | PS,0.8425,0.829,0.828,0.741,0.831,0.832,0.665,0.816,0.819,0.73,0.816,female,non-schizophrenic 6 | PS,0.948,0.896,0.872,0.869,0.819,0.852,0.815,0.83,0.799,0.728,0.69,female,non-schizophrenic 7 | PS,0.862,0.881,0.874,0.874,0.835,0.814,0.825,0.772,0.711,0.716,0.726,female,non-schizophrenic 8 | PS,0.8425,0.829,0.952,0.83,0.831,0.98,0.827,0.892,0.962,0.836,0.816,female,non-schizophrenic 9 | PS,0.791,0.834,0.726,0.83,0.831,0.832,0.722,0.816,0.838,0.827,0.916,female,non-schizophrenic 10 | PS,0.872,0.829,0.867,0.919,0.795,0.756,0.854,0.945,0.842,0.82,0.816,female,non-schizophrenic 11 | PS,0.947,0.912,0.94,0.919,0.915,0.889,0.901,0.874,0.837,0.872,0.84,female,non-schizophrenic 12 | PS,0.88,0.829,0.798,0.822,0.77,0.815,0.803,0.816,0.767,0.82,0.797,female,non-schizophrenic 13 | PS,0.799,0.69,0.701,0.738,0.831,0.761,0.696,0.679,0.709,0.65,0.816,female,non-schizophrenic 14 | TR,0.8425,0.829,0.966,0.926,0.831,0.832,0.827,0.916,0.777,0.82,0.816,female,non-schizophrenic 15 | PS,0.8425,0.829,0.828,0.83,0.947,1.2,1.14,1.1,1.12,0.871,0.809,female,non-schizophrenic 16 | PS,0.896,0.874,0.893,0.944,0.933,0.941,0.892,0.893,0.84,0.82,0.829,female,non-schizophrenic 17 | PS,0.914,0.873,0.844,0.925,0.868,0.783,0.701,0.741,0.722,0.828,0.816,female,non-schizophrenic 18 | TR,0.8425,0.829,0.828,0.83,0.831,0.832,0.827,0.816,0.819,0.82,0.816,female,non-schizophrenic 19 | CS,0.807,0.811,0.787,0.728,0.803,0.832,0.827,0.816,0.819,0.82,0.816,female,non-schizophrenic 20 | PS,0.803,0.782,0.623,0.828,0.826,0.793,0.811,0.75,0.816,0.753,0.766,female,non-schizophrenic 21 | CS,0.939,0.841,0.901,0.917,0.896,0.921,0.899,0.804,0.894,0.846,0.902,female,non-schizophrenic 22 | PS,0.813,0.758,0.828,0.83,0.831,0.832,0.827,0.77,0.819,0.82,0.773,female,non-schizophrenic 23 | TR,0.697,0.617,0.759,0.83,0.6,0.604,0.619,0.592,0.819,0.82,0.679,female,non-schizophrenic 24 | TR,0.782,0.88,0.828,0.83,0.709,0.886,0.841,0.816,0.819,0.82,0.843,female,non-schizophrenic 25 | CS,1.03,1.01,1.02,0.964,1.05,1.01,0.985,0.964,1.01,1,1.01,male,non-schizophrenic 26 | PS,0.822,0.843,0.625,0.81,0.702,0.702,0.842,0.865,0.701,0.77,0.801,male,non-schizophrenic 27 | PS,0.863,0.913,0.743,0.86,0.803,0.889,0.924,0.87,0.872,0.859,0.84,male,non-schizophrenic 28 | PS,0.901,0.777,0.743,0.858,0.811,0.751,0.627,0.748,0.808,0.669,0.844,male,non-schizophrenic 29 | PS,0.81,0.735,0.664,0.826,0.767,0.604,0.669,0.87,0.817,0.59,0.835,male,non-schizophrenic 30 | TR,0.674,0.646,0.626,0.639,0.64,0.665,0.655,0.661,0.724,0.7,0.661,male,non-schizophrenic 31 | CS,1,0.958,0.938,1.02,0.956,0.909,1.04,0.902,0.956,0.939,0.954,male,non-schizophrenic 32 | PS,0.8425,0.829,0.828,0.83,0.924,0.924,1,0.986,0.962,1.02,0.991,male,non-schizophrenic 33 | PS,0.94,0.971,0.76,0.983,0.998,0.894,0.856,0.942,0.937,0.965,0.936,male,non-schizophrenic 34 | TR,0.8425,0.829,0.803,0.826,0.764,0.815,0.868,0.791,0.86,0.82,0.839,male,non-schizophrenic 35 | CS,0.894,0.954,0.939,0.938,0.9,0.936,0.944,0.884,0.93,0.885,0.846,male,non-schizophrenic 36 | CS,0.8425,0.829,0.711,0.83,0.78,0.832,0.775,0.68,0.819,0.858,0.662,male,non-schizophrenic 37 | CS,0.962,0.93,0.922,0.858,0.905,0.793,0.867,0.948,0.879,0.916,0.781,male,non-schizophrenic 38 | TR,0.757,0.756,0.811,0.709,0.714,0.743,0.745,0.816,0.819,0.82,0.813,male,non-schizophrenic 39 | PS,0.8425,0.93,0.906,1.01,0.933,0.832,0.862,0.816,0.819,0.82,0.816,male,non-schizophrenic 40 | 
CS,0.868,0.901,0.893,0.864,0.831,0.795,0.905,0.872,0.873,0.872,0.822,male,non-schizophrenic 41 | TR,0.8,0.76,0.815,0.759,0.828,0.77,0.769,0.789,0.73,0.766,0.85,male,non-schizophrenic 42 | PS,0.895,0.771,0.997,0.885,0.948,0.832,0.843,0.66,0.729,0.801,0.893,male,non-schizophrenic 43 | CS,0.643,0.829,0.828,0.83,0.831,0.832,0.626,0.816,0.819,0.82,0.816,male,non-schizophrenic 44 | TR,0.742,0.829,0.7,0.743,0.748,0.827,0.827,0.816,0.819,0.82,0.776,male,non-schizophrenic 45 | CS,0.767,0.822,0.828,0.798,0.806,0.766,0.767,0.816,0.82,0.876,0.756,male,non-schizophrenic 46 | PS,0.876,0.866,0.899,0.923,0.832,0.849,0.827,0.906,0.822,0.885,0.826,female,schizophrenic 47 | CS,0.836,0.944,0.889,0.909,0.863,0.838,0.844,0.784,0.819,0.82,0.816,female,schizophrenic 48 | PS,0.8425,0.857,0.828,0.798,0.831,0.832,0.757,0.742,0.819,0.82,0.816,female,schizophrenic 49 | TR,0.8425,0.829,0.682,0.651,0.672,0.832,0.827,0.604,0.819,0.82,0.816,female,schizophrenic 50 | CS,0.919,0.856,0.825,0.908,0.896,0.886,0.905,0.938,0.875,0.983,0.881,female,schizophrenic 51 | PS,0.911,0.927,0.798,0.938,0.899,0.952,0.925,0.851,0.953,0.761,0.952,female,schizophrenic 52 | PS,0.8425,0.829,0.613,0.44,0.831,0.832,0.827,0.816,0.819,0.82,0.816,female,schizophrenic 53 | PS,0.726,0.734,0.862,0.83,0.972,0.9,0.876,0.83,0.878,0.79,0.868,female,schizophrenic 54 | TR,0.756,0.871,0.712,0.897,0.785,0.789,0.724,0.798,0.581,0.672,0.636,female,schizophrenic 55 | PS,0.782,0.829,0.828,0.83,0.84,0.832,0.837,0.816,0.819,0.797,0.816,female,schizophrenic 56 | PS,0.937,0.776,0.857,0.899,0.955,0.929,0.827,0.89,0.819,0.818,0.945,female,schizophrenic 57 | TR,0.75,0.829,0.744,0.83,0.794,0.732,0.827,0.697,0.819,0.772,0.816,female,schizophrenic 58 | PS,0.8,0.866,0.915,0.911,0.9,0.886,0.837,0.848,0.896,0.755,0.861,female,schizophrenic 59 | TR,0.899,0.768,0.787,0.781,0.735,0.827,0.796,0.793,0.729,0.801,0.838,female,schizophrenic 60 | CS,0.83,0.828,0.697,0.731,0.817,0.687,0.778,0.612,0.668,0.755,0.754,male,schizophrenic 61 | PS,0.63,0.631,0.828,0.664,0.579,0.832,0.801,0.641,0.819,0.82,0.816,male,schizophrenic 62 | PS,0.691,0.709,0.828,0.83,0.831,0.687,0.639,0.667,0.669,0.695,0.545,male,schizophrenic 63 | PS,0.782,0.812,0.828,0.669,0.701,0.726,0.827,0.673,0.708,0.637,0.728,male,schizophrenic 64 | PS,0.932,0.783,0.809,0.837,0.744,0.794,0.767,0.71,0.622,0.569,0.562,male,schizophrenic 65 | PS,0.851,0.828,0.808,0.827,0.873,0.862,0.752,0.668,0.687,0.717,0.696,male,schizophrenic 66 | PS,0.73,0.729,0.828,0.704,0.831,0.692,0.637,0.581,0.819,0.654,0.816,male,schizophrenic 67 | TR,0.564,0.703,0.59,0.58,0.831,0.832,0.667,0.584,0.819,0.688,0.584,male,schizophrenic 68 | CS,0.779,0.707,0.705,0.785,0.58,0.746,0.715,0.551,0.799,0.668,0.779,male,schizophrenic 69 | PS,0.787,0.748,0.764,0.796,0.778,0.758,0.75,0.746,0.763,0.647,0.734,male,schizophrenic 70 | PS,0.8425,0.773,0.635,0.594,0.608,0.832,0.526,0.625,0.623,0.712,0.782,male,schizophrenic 71 | PS,0.927,0.854,0.828,0.83,1.01,0.955,0.916,0.957,0.905,0.855,0.947,male,schizophrenic 72 | CS,0.893,0.702,0.902,0.83,0.831,0.777,0.827,0.816,0.819,0.82,0.816,male,schizophrenic 73 | PS,0.8425,0.829,0.828,0.83,0.909,0.895,0.827,0.816,0.931,0.956,0.97,male,schizophrenic 74 | PS,0.8425,0.994,1.05,0.941,0.98,1.02,0.96,1.03,0.973,0.813,0.909,male,schizophrenic 75 | PS,0.8425,0.857,0.895,0.879,0.831,0.832,0.852,0.894,0.888,0.82,0.816,male,schizophrenic 76 | TR,0.8425,0.728,0.828,0.83,0.777,0.825,0.827,0.816,0.819,0.82,0.686,male,schizophrenic 77 | PS,0.776,0.956,0.944,0.928,0.85,0.925,0.942,0.9,0.945,0.919,0.898,male,schizophrenic 78 | 
CS,0.618,0.829,0.828,0.83,0.737,0.832,0.827,0.816,0.643,0.82,0.62,male,schizophrenic 79 | PS,0.8425,0.829,0.828,0.83,0.831,0.712,0.871,0.832,0.819,0.82,0.816,male,schizophrenic 80 | CS,0.956,0.825,0.953,0.825,0.916,0.92,0.964,0.903,0.868,0.945,0.895,male,schizophrenic 81 | CS,0.66,0.655,0.828,0.58,0.708,0.688,0.646,0.816,0.588,0.82,0.74,male,schizophrenic 82 | CS,0.782,0.779,0.72,0.787,0.763,0.755,0.784,0.764,0.754,0.789,0.753,male,schizophrenic 83 | PS,0.602,0.829,0.641,0.574,0.831,0.832,0.827,0.793,0.819,0.613,0.634,male,schizophrenic 84 | TR,0.684,0.579,0.509,0.496,0.436,0.558,0.564,0.816,0.819,0.82,0.259,male,schizophrenic 85 | CS,0.856,0.835,0.946,0.844,0.907,0.897,0.827,0.816,0.819,0.82,0.816,male,schizophrenic 86 | -------------------------------------------------------------------------------- /inst/extdata/tictactoe_test.csv: -------------------------------------------------------------------------------- 1 | X1b,X1o,X1x,X2b,X2o,X2x,X3b,X3o,X3x,X4b,X4o,X4x,X5b,X5o,X5x,X6b,X6o,X6x,X7b,X7o,X7x,X8b,X8o,X8x,X9b,X9o,X9x,class 2 | 0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,positive 3 | 0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,positive 4 | 0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,positive 5 | 0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,positive 6 | 0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,positive 7 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,positive 8 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,positive 9 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,positive 10 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,positive 11 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,1,0,0,positive 12 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,positive 13 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,positive 14 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,positive 15 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,positive 16 | 0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,positive 17 | 0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,positive 18 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,positive 19 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,positive 20 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,positive 21 | 0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,positive 22 | 0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,positive 23 | 0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,positive 24 | 0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,positive 25 | 0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,positive 26 | 0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,positive 27 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive 28 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,positive 29 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,positive 30 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive 31 | 0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,positive 32 | 0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,0,1,positive 33 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,0,0,1,positive 34 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,positive 35 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,positive 36 | 0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,positive 37 | 0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive 38 | 0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive 
39 | 0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,positive 40 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,positive 41 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,positive 42 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,positive 43 | 0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive 44 | 0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive 45 | 0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive 46 | 0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,positive 47 | 0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,positive 48 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive 49 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,positive 50 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive 51 | 0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,positive 52 | 0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,positive 53 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive 54 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,positive 55 | 0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 56 | 0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,positive 57 | 0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,positive 58 | 0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,positive 59 | 0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,positive 60 | 0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,positive 61 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,positive 62 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,positive 63 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive 64 | 0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive 65 | 0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive 66 | 0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,positive 67 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,positive 68 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,positive 69 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,positive 70 | 0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive 71 | 0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,positive 72 | 0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,positive 73 | 0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,positive 74 | 0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,positive 75 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,positive 76 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 77 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,positive 78 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,positive 79 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,positive 80 | 0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,positive 81 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive 82 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,positive 83 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,positive 84 | 0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,0,0,1,positive 85 | 0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,positive 86 | 0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,positive 87 | 0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive 88 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive 89 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive 90 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,positive 91 | 
0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 92 | 0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive 93 | 0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,positive 94 | 0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,positive 95 | 0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,positive 96 | 0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,positive 97 | 1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,positive 98 | 1,0,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,positive 99 | 1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,positive 100 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive 101 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,positive 102 | 1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive 103 | 1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 104 | 1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,positive 105 | 1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,positive 106 | 1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive 107 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,positive 108 | 1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,positive 109 | 1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive 110 | 1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,positive 111 | 1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,positive 112 | 1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,1,positive 113 | 1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 114 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,positive 115 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,positive 116 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,0,positive 117 | 1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,positive 118 | 1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive 119 | 1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,positive 120 | 1,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive 121 | 1,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,positive 122 | 1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,positive 123 | 1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,positive 124 | 1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,positive 125 | 1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,positive 126 | 1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,positive 127 | 0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,negative 128 | 0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,negative 129 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,negative 130 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,negative 131 | 0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,negative 132 | 0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,negative 133 | 0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative 134 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,negative 135 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,negative 136 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,negative 137 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,negative 138 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,negative 139 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,negative 140 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,negative 141 | 0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,negative 142 | 
0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,negative 143 | 0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,negative 144 | 0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,negative 145 | 0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative 146 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,negative 147 | 0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,negative 148 | 0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,negative 149 | 0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,negative 150 | 0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative 151 | 0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,negative 152 | 0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,negative 153 | 0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,1,0,0,negative 154 | 0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,negative 155 | 0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,negative 156 | 0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,negative 157 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,negative 158 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,negative 159 | 0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,negative 160 | 0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,negative 161 | 0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,negative 162 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,negative 163 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,negative 164 | 0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,negative 165 | 0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,negative 166 | 0,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,negative 167 | 0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,1,0,0,negative 168 | 0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,negative 169 | 0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,negative 170 | 0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,1,negative 171 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,negative 172 | 0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,negative 173 | 0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,negative 174 | 0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,negative 175 | 0,1,0,1,0,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,negative 176 | 0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,negative 177 | 1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,negative 178 | 1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,negative 179 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,negative 180 | 1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,negative 181 | 1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,1,negative 182 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,negative 183 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,negative 184 | 1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,negative 185 | 1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,negative 186 | 1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,negative 187 | 1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,negative 188 | 0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,negative 189 | 0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,negative 190 | 0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,negative 191 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,negative 192 | 0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,negative 193 | 
-------------------------------------------------------------------------------- /man/autoRLearn.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/autoRLearn.R 3 | \name{autoRLearn} 4 | \alias{autoRLearn} 5 | \title{Run smartML function for automatic Supervised Machine Learning.} 6 | \usage{ 7 | autoRLearn( 8 | maxTime, 9 | directory, 10 | testDirectory, 11 | classCol = "class", 12 | metric = "acc", 13 | vRatio = 0.3, 14 | preProcessF = c("standardize", "zv"), 15 | featuresToPreProcess = c(), 16 | nComp = NA, 17 | nModels = 5, 18 | option = 2, 19 | featureTypes = c(), 20 | interp = FALSE, 21 | missingOpr = FALSE, 22 | balance = FALSE 23 | ) 24 | } 25 | \arguments{ 26 | \item{maxTime}{Float numeric of the maximum time budget, in minutes, for reading the dataset, preprocessing, calculating meta-features, algorithm selection, and hyper-parameter tuning only (excluding model interpretability) - This is applicable in case of Option = 2 only.} 27 | 28 | \item{directory}{String character of the training dataset directory (SmartML accepts file formats arff/(csv with column headers)).} 29 | 30 | \item{testDirectory}{String character of the testing dataset directory (SmartML accepts file formats arff/(csv with column headers)).} 31 | 32 | \item{classCol}{String character of the name of the class label column in the dataset (default = 'class').} 33 | 34 | \item{metric}{String character of the metric to be used in evaluation: 35 | \itemize{ 36 | \item "acc" - Accuracy, 37 | \item "avg-fscore" - Average of F-Score of each label, 38 | \item "avg-recall" - Average of Recall of each label, 39 | \item "avg-precision" - Average of Precision of each label, 40 | \item "fscore" - Micro-Average of F-Score of each label, 41 | \item "recall" - Micro-Average of Recall of each label, 42 | \item "precision" - Micro-Average of Precision of each label. 43 | }} 44 | 45 | \item{vRatio}{Float numeric of the validation set ratio that should be split out of the training set for the evaluation process (default = 0.3 --> 30\%).} 46 | 47 | \item{preProcessF}{Vector of string characters containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv')): 48 | \itemize{ 49 | \item "boxcox" - apply a Box–Cox transform; values must be non-zero and positive in all features, 50 | \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a Box–Cox, but values can be negative, 51 | \item "zv" - remove attributes with a zero variance (all the same value), 52 | \item "center" - subtract mean from values, 53 | \item "scale" - divide values by standard deviation, 54 | \item "standardize" - perform both centering and scaling, 55 | \item "normalize" - normalize values, 56 | \item "pca" - transform data to the principal components, 57 | \item "ica" - transform data to the independent components.
58 | }} 59 | 60 | \item{featuresToPreProcess}{Vector of the numbers of the feature columns to perform the feature preprocessing on - In case of an empty vector, this means to include all features in the dataset file (default = c()) - This vector should be a subset of \code{selectedFeats}.} 61 | 62 | \item{nComp}{Integer numeric of the number of components needed if either the "pca" or "ica" feature preprocessor is used.} 63 | 64 | \item{nModels}{Integer numeric representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization (default = 5).} 65 | 66 | \item{option}{Integer numeric representing whether only classifier algorithm selection is needed (= 1) or algorithm selection with hyper-parameter tuning is required (= 2, the default value).} 67 | 68 | \item{featureTypes}{Vector of either 'numerical' or 'categorical' representing the types of features in the dataset (default = c() --> any factor or character features will be considered as categorical, otherwise numerical).} 69 | 70 | \item{interp}{Boolean representing whether model interpretability (Feature Importance and Interaction) is needed or not (default = FALSE). This option will take more of the time budget if set to TRUE.} 71 | 72 | \item{missingOpr}{Boolean variable representing whether to use median/mode imputation for instances with missing values (FALSE) or to apply imputation using the "MICE" library, which helps you impute missing values with plausible data values that are drawn from a distribution specifically designed for each missing datapoint (TRUE).} 73 | 74 | \item{balance}{Boolean variable representing whether SMOTE class balancing is required or not (default FALSE).} 75 | } 76 | \value{ 77 | List of Results 78 | \itemize{ 79 | \item "option=1" - Chosen classifier algorithm names \code{clfs} with their parameter configurations \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData}, 80 | \item "option=2" - Best classifier algorithm name found \code{clfs} with its parameter configuration \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData}, model variable \code{model}, predicted values on test set \code{pred}, performance on TestingSet \code{perf}, and Feature Importance \code{interpret$featImp} / Interaction \code{interpret$Interact} plots in case of interpretability \code{interp} = TRUE and chosen model is not knn. 81 | } 82 | } 83 | \description{ 84 | Run the smartML main function for automatic classifier algorithm selection and hyper-parameter tuning.
85 | } 86 | \examples{ 87 | \dontrun{ 88 | autoRLearn(1, 'sampleDatasets/car/train.arff', 89 | 'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize') 90 | 91 | result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff') 92 | } 93 | 94 | } 95 | -------------------------------------------------------------------------------- /man/autoRLearn_.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/autoRLearn_.R 3 | \name{autoRLearn_} 4 | \alias{autoRLearn_} 5 | \title{Advanced version of autoRLearn.} 6 | \usage{ 7 | autoRLearn_( 8 | df_train, 9 | df_test, 10 | maxTime = 10, 11 | models = c("randomForest", "naiveBayes", "boosting", "l2-linear-classifier", "svm"), 12 | optimizationAlgorithm = "hyperband", 13 | bw = 3, 14 | kde_type = "single", 15 | max_iter = 81, 16 | metric = "acc" 17 | ) 18 | } 19 | \arguments{ 20 | \item{df_train}{Dataframe of the training dataset. Assumes it is in perfect shape with all numeric variables and a factor response variable named "class".} 21 | 22 | \item{df_test}{Dataframe of the test dataset. Assumes it is in perfect shape with all numeric variables and a factor response variable named "class".} 23 | 24 | \item{maxTime}{Float representing the maximum time the algorithm should be run (in minutes).} 25 | 26 | \item{models}{List of strings denoting which algorithms to use for the process: 27 | \itemize{ 28 | \item "randomForest" - Random forests using the randomForest package 29 | \item "ranger" - Random forests using the ranger package (unstable) 30 | \item "naiveBayes" - Naive Bayes using the fastNaiveBayes package 31 | \item "boosting" - Gradient boosting using xgboost 32 | \item "l2-linear-classifier" - Linear primal support vector machine from LibLinear 33 | \item "svm" - RBF kernel SVM from e1071 34 | }} 35 | 36 | \item{optimizationAlgorithm}{- String of which hyperparameter tuning algorithm to use: 37 | \itemize{ 38 | \item "hyperband" - Hyperband with uniformly initialized parameters 39 | \item "bohb" - Hyperband with Bayesian optimization as described in the F. Hutter et al. 2018 BOHB paper. Has extra parameters bw and kde_type 40 | }} 41 | 42 | \item{bw}{- (only applies to BOHB) Double representing how much the KDE bandwidth should be widened.
Higher values allow the algorithm to explore more hyperparameter combinations} 43 | 44 | \item{kde_type}{- (only applies to BOHB) String representing whether a model's hyperparameters should be tuned independently of each other or have their probability densities multiplied: 45 | \itemize{ 46 | \item "single" - each hyperparameter has its own expected improvement calculated 47 | \item "mixed" - all hyperparameters' probability densities are multiplied and only one mixed expected improvement is calculated 48 | }} 49 | 50 | \item{max_iter}{- (affects both hyperband and BOHB) Integer representing the maximum number of iterations that one successive halving run can have} 51 | 52 | \item{metric}{String of the evaluation metric to be used in the model performance optimization: 53 | \itemize{ 54 | \item "acc" - Accuracy, 55 | \item "avg-fscore" - Average of F-Score of each label, 56 | \item "avg-recall" - Average of Recall of each label, 57 | \item "avg-precision" - Average of Precision of each label, 58 | \item "fscore" - Micro-Average of F-Score of each label, 59 | \item "recall" - Micro-Average of Recall of each label, 60 | \item "precision" - Micro-Average of Precision of each label. 61 | }} 62 | } 63 | \value{ 64 | List of Results 65 | \itemize{ 66 | \item \code{perf} - Evaluated metric of the best performing model on the test data 67 | \item \code{pred} - predictions on the test data using the best model 68 | \item \code{model} - best model object 69 | \item \code{best_models} - table with the best hyperparameters found for the selected models. 70 | } 71 | } 72 | \description{ 73 | Tunes the hyperparameters of the desired algorithm(s) using either hyperband or BOHB. 74 | } 75 | -------------------------------------------------------------------------------- /man/datasetReader.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/datasetReader.R 3 | \name{datasetReader} 4 | \alias{datasetReader} 5 | \title{Read Dataset File into Memory.} 6 | \usage{ 7 | datasetReader( 8 | directory, 9 | testDirectory, 10 | selectedFeats = c(), 11 | classCol = "class", 12 | preProcessF = "N", 13 | featuresToPreProcess = c(), 14 | nComp = NA, 15 | missingVal = c("NA", "?", " "), 16 | missingOpr = 0 17 | ) 18 | } 19 | \arguments{ 20 | \item{directory}{String of the directory to the file containing the training dataset.} 21 | 22 | \item{testDirectory}{String of the directory to the file containing the testing dataset.} 23 | 24 | \item{selectedFeats}{Vector of the numbers of the feature columns to include from the training set, ignoring the rest of the columns - In case of an empty vector, this means to include all features in the dataset file (default = c()).} 25 | 26 | \item{classCol}{String of the name of the class label column in the dataset (default = 'class').} 27 | 28 | \item{preProcessF}{String containing the name of the preprocessing algorithm (default = 'N' --> no preprocessing): 29 | \itemize{ 30 | \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features, 31 | \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a Box–Cox, but values can be negative, 32 | \item "zv" - remove attributes with a zero variance (all the same value), 33 | \item "center" - subtract mean from values, 34 | \item "scale" - divide values by standard deviation, 35 | \item "standardize" - perform both centering and scaling, 36 | \item "normalize" - normalize values, 37 | \item "pca" - 
transform data to the principal components, 38 | \item "ica" - transform data to the independent components. 39 | }} 40 | 41 | \item{featuresToPreProcess}{Vector of the numbers of the feature columns to perform the feature preprocessing on - In case of an empty vector, this means to include all features in the dataset file (default = c()) - This vector should be a subset of \code{selectedFeats}.} 42 | 43 | \item{nComp}{Integer of the number of components needed if either the "pca" or "ica" feature preprocessor is used.} 44 | 45 | \item{missingVal}{Vector of strings representing the missing values in the dataset (default: c('NA', '?', ' ')).} 46 | 47 | \item{missingOpr}{Boolean variable representing whether to delete instances with missing values or to apply imputation using the "MICE" library, which helps you impute missing values with plausible data values that are drawn from a distribution specifically designed for each missing datapoint (default = 0 --> delete instances).} 48 | } 49 | \value{ 50 | List of the TrainingSet \code{Train} and TestingSet \code{Test}. 51 | } 52 | \description{ 53 | Read the files of the training and testing datasets, and perform preprocessing and data cleaning if necessary. 54 | } 55 | \examples{ 56 | \dontrun{ 57 | dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv') 58 | } 59 | } 60 | -------------------------------------------------------------------------------- /man/metafeatures.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/man/metafeatures.pdf -------------------------------------------------------------------------------- /man/runClassifier.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/runClassifier.R 3 | \name{runClassifier} 4 | \alias{runClassifier} 5 | \title{Fit a classifier model.} 6 | \usage{ 7 | runClassifier( 8 | trainingSet, 9 | validationSet, 10 | params, 11 | classifierAlgorithm, 12 | metric = "acc", 13 | interp = 0 14 | ) 15 | } 16 | \arguments{ 17 | \item{trainingSet}{Dataframe of the training set.} 18 | 19 | \item{validationSet}{Dataframe of the validation set.} 20 | 21 | \item{params}{A string character of parameter configuration values for the current classifier to be tuned (parameters are separated by #); it can be obtained from \code{params} of the list returned by the \code{autoRLearn} function.} 22 | 23 | \item{classifierAlgorithm}{String character of the name of the classifier algorithm to be used.
24 | \itemize{ 25 | \item "svm" - Support Vector Machines from e1071 package, 26 | \item "naiveBayes" - naiveBayes from e1071 package, 27 | \item "randomForest" - randomForest from randomForest package, 28 | \item "lmt" - LMT Weka classifier trees from RWeka package, 29 | \item "lda" - Linear Discriminant Analysis from MASS package, 30 | \item "j48" - J48 Weka classifier Trees from RWeka package, 31 | \item "bagging" - Bagging Classifier from ipred package, 32 | \item "knn" - K-Nearest Neighbors from FNN package, 33 | \item "nnet" - Simple neural net from nnet package, 34 | \item "C50" - C50 decision tree from C5.0 package, 35 | \item "rpart" - rpart decision tree from rpart package, 36 | \item "rda" - regularized discriminant analysis from klaR package, 37 | \item "plsda" - Partial Least Squares and Sparse Partial Least Squares Discriminant Analysis from caret package, 38 | \item "glm" - Fitting Generalized Linear Models from stats package, 39 | \item "deepboost" - deep boost classifier from deepboost package. 40 | }} 41 | 42 | \item{metric}{String character of the metric to be used in evaluation: 43 | \itemize{ 44 | \item "acc" - Accuracy, 45 | \item "avg-fscore" - Average of F-Score of each label, 46 | \item "avg-recall" - Average of Recall of each label, 47 | \item "avg-precision" - Average of Precision of each label, 48 | \item "fscore" - Micro-Average of F-Score of each label, 49 | \item "recall" - Micro-Average of Recall of each label, 50 | \item "precision" - Micro-Average of Precision of each label 51 | }} 52 | 53 | \item{interp}{Boolean representing whether interpretability is required or not (Default = 0).} 54 | } 55 | \value{ 56 | List of the performance on the validationSet named \code{perf}, the model fitted on the trainingSet named \code{m}, predictions on the test set \code{pred}, and interpretability plots named \code{interpret} in case of interp = 1 57 | } 58 | \description{ 59 | Run the classifier on a training set and measure performance on a validation set.
60 | } 61 | \examples{ 62 | \dontrun{ 63 | result1 <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff') 64 | dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv') 65 | result2 <- runClassifier(dataset$Train, dataset$Test, result1$params, result1$clfs) 66 | } 67 | 68 | } 69 | -------------------------------------------------------------------------------- /man/supportedAlgorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/man/supportedAlgorithms.pdf -------------------------------------------------------------------------------- /manual.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/manual.pdf -------------------------------------------------------------------------------- /save_jsons.R: -------------------------------------------------------------------------------- 1 | library(purrr) 2 | library(stringr) 3 | library(jsonlite) 4 | library(devtools) 5 | 6 | 7 | files <- dir(path <- "inst/extdata/hyperband_jsons", pattern = "*.json") 8 | names_clf <- files %>% 9 | map_chr(~ str_remove(.x, pattern = ".json")) 10 | paths <- file.path(path, files) 11 | jsons <- paths %>% 12 | map(.f = ~ jsonlite::fromJSON(txt = .x, flatten = T)) 13 | names(jsons) <- names_clf 14 | 15 | ## Then: 16 | 17 | save(jsons, file = "R/sysdata.rda") 18 | save(jsons, file = "sysdata.rda") 19 | load("sysdata.rda") 20 | -------------------------------------------------------------------------------- /sysdata.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/sysdata.rda -------------------------------------------------------------------------------- /test_rmarkdown/new_tests.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "new_tests" 3 | author: "rucy" 4 | date: "9/22/2020" 5 | output: html_document 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | ## R Markdown 13 | 14 | ```{r} 15 | 16 | library(R.utils) 17 | library(mlr3) 18 | library(mlr3learners) 19 | library(readr) 20 | library(data.table) 21 | library(purrr) 22 | library(stringr) 23 | library(jsonlite) 24 | library(tictoc) 25 | 26 | ## If you change any of the jsons 27 | 28 | ## Do this: 29 | 30 | files <- dir(path <- "~/school_stuff/schoolwork/witchcraft/inst/extdata/hyperband_jsons", pattern = "*.json") 31 | 32 | names_clf <- files %>% 33 | map_chr(~ str_remove(.x, pattern = ".json")) 34 | 35 | paths <- file.path(path, files) 36 | 37 | jsons <- paths %>% 38 | map(.f = ~ fromJSON(txt = .x, flatten = T)) 39 | 40 | names(jsons) <- names_clf 41 | 42 | ## Then: 43 | 44 | ## save(jsons, file = "~/school_stuff/schoolwork/witchcraft/sysdata.rda") 45 | 46 | # load("~/school_stuff/schoolwork/witchcraft/R/sysdata.rda") 47 | 48 | ## Do this ^^ 49 | 50 | param_sample <- function(model, hparam, columns = NULL) { 51 | 52 | param <- jsons[[model]][[hparam]] 53 | 54 | type <- param$type 55 | 56 | type_scale <- param$scale 57 | 58 | if(type == "discrete") { 59 | 60 | param_estimation <- paste("'", base::sample(x = as.list(param$values), size = 1), "'", sep = "") 61 | 62 | return(param_estimation) 63 | 
64 | } 65 | 66 | else { 67 | 68 | int_val <- ifelse(hparam == "mtry", as.numeric(columns) - 1, as.numeric(param$maxVal)) 69 | 70 | param_estimation <- fcase(type_scale == "int", rdunif(1, a = as.numeric(param$minVal), 71 | b = int_val), 72 | type_scale == "any", runif(1, min = as.numeric(param$minVal), 73 | max = as.numeric(param$maxVal)), 74 | type_scale == "double", runif(1, min = as.numeric(param$minVal), 75 | max = as.numeric(param$maxVal)), 76 | type_scale == "exp", runif(1, min = 2^as.numeric(param$minVal), 77 | max = 2^as.numeric(param$maxVal))) 78 | 79 | return(param_estimation) 80 | 81 | } 82 | 83 | } 84 | 85 | get_random_hp_config <- function(model, columns = NULL) { 86 | 87 | param_db <- jsons[[model]] 88 | 89 | params_list <- param_db$params 90 | 91 | params_list_mapped <- map(.x = params_list, 92 | .f = as_mapper( ~ param_sample(model = model, hparam = .x, columns = columns))) 93 | 94 | `names<-`(params_list_mapped, params_list) 95 | 96 | } 97 | 98 | data_load <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_train.csv") 99 | 100 | data_model <- data_load %>% 101 | as.data.table() 102 | 103 | data_model[, class := factor(class, levels = unique(class)) %>% sort()] 104 | 105 | ``` 106 | 107 | ### New successive halving 108 | 109 | ```{r} 110 | 111 | library(data.table) 112 | 113 | successive_halving <- function(df, model, params_config, n = 81, r = 1, eta = 3, max_iter = 81, s_max = 4, evaluations = data.frame()) { 114 | 115 | final_df <- params_config 116 | 117 | task <- TaskClassif$new(id = "sh", backend = df, target = "class") 118 | 119 | param_number <- length(params_config) 120 | 121 | for (k in 0:s_max) { 122 | 123 | gc() 124 | 125 | n_i = n * (eta ** -k) 126 | 127 | r_i = r * (eta ** k) 128 | 129 | r_p = r_i / max_iter 130 | 131 | min_train_datapoints = (length(unique(df$class)) * 3) + 1 132 | 133 | min_prob_datapoints = min_train_datapoints / nrow(df$class) 134 | 135 | train_idxs <- sample(task$nrow, task$nrow * max(min(r_p, 0.8), min_prob_datapoints)) 136 | test_idxs <- setdiff(seq_len(task$nrow), train_idxs) 137 | 138 | learners <- replicate(n = n_i, expr = {lrn(paste("classif", sep = ".", model))}) 139 | 140 | j = 1 141 | for (i in learners) { 142 | 143 | i$param_set$values = final_df[[j]] 144 | 145 | j = j + 1 146 | 147 | } 148 | 149 | for (l in learners) { 150 | 151 | l$train(task = task, row_ids = train_idxs) 152 | 153 | } 154 | 155 | measure <- msr("classif.acc") 156 | 157 | preds <- map(.x = learners, .f = ~ .x$predict(task, row_ids = test_idxs)$score(measure)) 158 | 159 | final_df <- final_df %>% 160 | as.data.table() %>% 161 | t() %>% 162 | `colnames<-`(value = jsons[[model]]$params) %>% 163 | as.data.table() 164 | 165 | final_df[, acc := unlist(preds)] 166 | 167 | final_df[, budget := r_i] 168 | 169 | final_df[, budget := r_p] 170 | 171 | setorder(final_df, -acc) 172 | 173 | evaluations <- rbindlist(list(evaluations, final_df)) 174 | 175 | final_df <- final_df %>% 176 | head(max(n_i/eta, 1)) 177 | 178 | if(k == s_max){ 179 | 180 | return(list("answer" = final_df, "sh_runs" = evaluations)) 181 | 182 | } 183 | 184 | final_df$acc = NULL 185 | final_df$budget = NULL 186 | 187 | final_df <- purrr::transpose(final_df) 188 | 189 | } 190 | } 191 | 192 | test_param_sampling <- replicate(81, get_random_hp_config("xgboost", columns = ncol(data_model)), simplify = FALSE) 193 | 194 | test_sh <- successive_halving(df = data_model, model = "xgboost", params_config = test_param_sampling) 195 | ``` 196 | 197 | ### New hyperbandito 198 | 199 | ```{r} 200 
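## Quick sanity check of the bracket schedule that hyperband() below derives via
## calc_n_r(): with the max_iter = 81 and eta = 3 defaults, s_max = 4, B = 405, and
## the brackets come out as (n, r) = (81, 1), (27, 3), (9, 9), (6, 27), (5, 81).
## (Standalone arithmetic only, so it can run before calc_n_r() is defined.)
local({
  max_iter <- 81
  eta <- 3
  s_max <- trunc(log(max_iter) / log(eta))   # number of brackets minus one
  B <- (s_max + 1) * max_iter                # budget assigned to each bracket
  for (s in s_max:0) {
    n <- trunc(ceiling(trunc(B / max_iter / (s + 1)) * eta^s))  # configurations sampled in bracket s
    r <- max_iter * eta^(-s)                                    # initial resource per configuration
    cat("s =", s, "| n =", n, "| r =", r, "\n")
  }
})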
| 201 | calc_n_r = function(max_iter = 81, eta = 3, s = 4, B = 405) { 202 | 203 | n = trunc(ceiling(trunc(B/max_iter/(s+1)) * eta**s)) 204 | 205 | r = max_iter * eta^(-s) 206 | 207 | ans = c(n, r) 208 | 209 | ans 210 | 211 | } 212 | 213 | 214 | hyperband <- function(df, model, max_iter = 81, eta = 3, maxtime = 1000) { 215 | 216 | logeta = as_mapper(~ log(.x) / log(eta)) 217 | 218 | s_max = trunc(logeta(max_iter)) 219 | 220 | B = (s_max + 1) * max_iter 221 | 222 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>% 223 | t() %>% 224 | `colnames<-`(value = c("n", "r")) %>% 225 | as.data.table() 226 | 227 | nrs$s = s_max:0 228 | 229 | partial_halving <- function(n, r, s) { 230 | 231 | successive_halving(df = df, 232 | model = model, 233 | params_config = replicate(n, get_random_hp_config(model, columns = ncol(df) - 1), simplify = FALSE), 234 | n = n, 235 | r = r, 236 | s_max = s, 237 | max_iter = max_iter, 238 | eta = eta) 239 | 240 | } 241 | 242 | tryCatch(expr = {withTimeout(expr = { 243 | 244 | liszt = vector(mode = "list", 245 | length = max(nrs$s) + 1) 246 | 247 | for (row in 1:nrow(nrs)) { 248 | 249 | liszt[[row]] <- partial_halving(nrs[[row, 1]], 250 | nrs[[row, 2]], 251 | nrs[[row, 3]]) 252 | 253 | } 254 | }, timeout = maxtime, cpu = maxtime)}, 255 | 256 | TimeoutException = function(ex) { 257 | 258 | print("Budget ended.") 259 | 260 | return(liszt) 261 | 262 | }, 263 | 264 | finally = function(ex) { 265 | 266 | print("Hyperband successfully finished.") 267 | 268 | return(liszt) } 269 | , 270 | 271 | error = function(ex) { 272 | 273 | print(paste("Error found, replace ", model, sep = "")) 274 | 275 | print(geterrmessage()) 276 | 277 | break 278 | 279 | }) 280 | 281 | return(liszt) 282 | 283 | } 284 | 285 | tezt_hyperband = hyperband(df = data_model, model = "xgboost", maxtime = 120) 286 | ``` 287 | 288 | Evocation test 289 | 290 | ```{r} 291 | 292 | evocate <- function(df_train, df_test, maxTime = 10, models = "xgboost", optimizationAlgorithm = "hyperband", bw = 3, max_iter = 81, kde_type = "single") { 293 | 294 | total_time = maxTime * 60 295 | 296 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params)) 297 | 298 | times = (parameters_per_model * total_time) / (sum(parameters_per_model)) 299 | 300 | print("Time distribution:") 301 | print(times) 302 | print("Models selected:") 303 | print(models) 304 | 305 | run_optimization = function(model, time) { 306 | 307 | results = NULL 308 | 309 | priors = data.frame() 310 | 311 | tic(model, "optimization time:") 312 | 313 | if(optimizationAlgorithm == "hyperband") { 314 | 315 | current <- Sys.time() %>% as.integer() 316 | 317 | end <- (Sys.time() %>% as.integer()) + time 318 | 319 | repeat { 320 | 321 | gc(verbose = F) 322 | 323 | tic("current hyperband runtime") 324 | 325 | print(paste("started", model)) 326 | 327 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 328 | 329 | print(paste("There are:", time_left, "seconds left for this hyperband run")) 330 | 331 | res <- hyperband(df = df_train, model = model, max_iter = max_iter, maxtime = time_left) 332 | 333 | if(is_empty(purrr::flatten(res)) == F) { 334 | 335 | res <- res %>% 336 | map_dfr(.f = ~ .x[["answer"]]) %>% 337 | as.data.table() 338 | 339 | setorder(res, -acc) 340 | 341 | res <- res %>% head(1) 342 | 343 | results <- c(list(res), results) 344 | 345 | print(res) 346 | 347 | print(paste('Best accuracy from hyperband this round: ', res$acc)) 348 | 349 | } 350 | 351 | elapsed <- (Sys.time() %>% as.integer()) - current 352 | 353 | if(elapsed 
>= time) { 354 | 355 | break 356 | 357 | } 358 | 359 | } 360 | 361 | } 362 | 363 | else if(optimizationAlgorithm == "bohb") { 364 | 365 | current <- Sys.time() %>% as.integer() 366 | 367 | end <- (Sys.time() %>% as.integer()) + time 368 | 369 | repeat { 370 | 371 | gc(verbose = F) 372 | 373 | tic("current bohb time") 374 | 375 | print(paste("started", model)) 376 | 377 | time_left <- max(end - (Sys.time() %>% as.integer()), 1) 378 | 379 | print(paste("There are:", time_left, "seconds left for this bohb run")) 380 | 381 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter, maxtime = time_left, priors = priors, kde_type = kde_type) 382 | 383 | if(is_empty(flatten(res)) == F) { 384 | 385 | priors <- res %>% 386 | map_dfr(.f = ~ .x[["sh_runs"]]) 387 | 388 | res <- res %>% 389 | map_dfr(.f = ~ .x[["answer"]]) %>% 390 | arrange(desc(acc)) %>% 391 | head(1) 392 | 393 | results <- c(list(res), results) 394 | 395 | print(paste('Best accuracy from hyperband this round: ', res$acc)) 396 | 397 | } 398 | 399 | elapsed <- (Sys.time() %>% as.integer()) - current 400 | 401 | if(elapsed >= time) { 402 | 403 | break 404 | 405 | } 406 | 407 | } 408 | 409 | 410 | } 411 | 412 | else { 413 | 414 | errorCondition(message = "Only hyperband and bohb are valid optimization algorithms at this moment.") 415 | 416 | break 417 | 418 | } 419 | 420 | toc() 421 | 422 | results 423 | 424 | } 425 | 426 | print("Finished all optimizations.") 427 | 428 | ans = vector(mode = "list", length = length(models)) 429 | 430 | for(i in 1:length(models)) { 431 | 432 | flag <- TRUE 433 | 434 | tryCatch(expr = { 435 | 436 | ans[[i]] <- run_optimization(models[[i]], times[[i]]) 437 | 438 | }, error = function(e) { 439 | 440 | print("Error spotted, going to the next model") 441 | 442 | flag <<- FALSE 443 | 444 | }) 445 | 446 | if (!flag) next 447 | 448 | } 449 | 450 | return(ans) 451 | 452 | ### TO DO - add the final model evaluation. 453 | ### with your cross validation ideas and etc. 
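### One possible sketch for that evaluation step (hypothetical, not implemented here):
### take the top row of each element of `ans`, rebuild the matching learner with
### lrn(paste0("classif.", model)), assign its param_set$values from that row,
### train it on a task built from the full df_train, predict on df_test, and
### score the predictions with msr(...) for the chosen metric so the tuned models
### can be compared on equal footing.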
454 | 455 | } 456 | 457 | 458 | ``` 459 | 460 | ```{r} 461 | 462 | data_train <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_train.csv") %>% as.data.table() 463 | data_test <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_test.csv") %>% as.data.table() 464 | 465 | data_train[, class := factor(class, levels = unique(class)) %>% sort()] 466 | data_test[, class := factor(class, levels = unique(class)) %>% sort()] 467 | 468 | tezt <- evocate(data_train, data_test, maxTime = 2, models = "xgboost") 469 | 470 | ``` 471 | -------------------------------------------------------------------------------- /testing.R: -------------------------------------------------------------------------------- 1 | # Title : Testing the Main Package Function 2 | # Objective : Package Testing 3 | # Created by: s-moh 4 | # Created on: 11/12/2020 5 | library(SmartML) 6 | library(tidyverse) 7 | library(R.utils) 8 | library(mlr) 9 | library(mlr3) 10 | library(mlr3learners) 11 | library(mlr3pipelines) 12 | library(mlr3filters) 13 | library(readr) 14 | library(data.table) 15 | library(stringr) 16 | library(jsonlite) 17 | library(tictoc) 18 | 19 | ################################################################################################# 20 | # Classification 21 | 22 | "lrn1 <- lrn('classif.rpart', predict_type = 'prob') 23 | lrn2 <- lrn('classif.ranger', predict_type = 'prob') 24 | lrn3 <- lrn('classif.svm', predict_type = 'prob') 25 | 26 | rpart_cv1 = po('learner_cv', lrn1, id = 'lrn1') 27 | ranger_cv1 = po('learner_cv', lrn2, id = 'lrn2') 28 | svm_cv1 = po('learner_cv', lrn3, id = 'lrn3') 29 | lrns = c(rpart_cv1, ranger_cv1, svm_cv1) 30 | 31 | level0 = gunion(list( 32 | lrns)) %>>% 33 | po('featureunion', id = 'union1') 34 | 35 | ensemble = level0 %>>% LearnerClassifAvg$new(id = 'classif.avg') 36 | ensemble$plot(html = FALSE) 37 | 38 | ens_lrn = GraphLearner$new(ensemble) 39 | ens_lrn$predict_type = 'prob' 40 | 41 | task = mlr_tasks$get('iris') 42 | train.idx = sample(seq_len(task$nrow), 120) 43 | test.idx = setdiff(seq_len(task$nrow), train.idx) 44 | 45 | perf <- ens_lrn$train(task, train.idx)$predict(task, test.idx)$score(msr('classif.acc')) 46 | print(perf)" 47 | 48 | ################################################################################################# 49 | 50 | data_train <- readr::read_csv('inst/extdata/tictactoe_train.csv') %>% 51 | as.data.table() 52 | 53 | data_test <- readr::read_csv('inst/extdata/tictactoe_test.csv') %>% 54 | as.data.table() 55 | 56 | data_train[, class := factor(class, levels = unique(class)) %>% sort()] 57 | data_test[, class := factor(class, levels = unique(class)) %>% sort()] 58 | 59 | opt <- SmartML::evocate(df_train = data_train, 60 | df_test = data_test, 61 | models = c('rpart', 'ranger', 'svm'), 62 | #'svm(done)', 'kknn(done)', 'ranger(done)', 'rpart(done)', 63 | #'xgboost(done)', 'cv_glmnet(done)', 'naive_bayes(done)' 64 | optimizationAlgorithm = 'hyperband', 65 | maxTime = 5, ensemble_size = 3) 66 | 67 | print(opt) 68 | gc() 69 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(SmartML) 3 | 4 | test_check("SmartML") 5 | -------------------------------------------------------------------------------- /tests/testthat/test-autorlearn.R: -------------------------------------------------------------------------------- 1 | context("test-autorlearn") 2 | 3 | 
test_that("option1", { 4 | result1 <- autoRLearn(1, system.file("extdata", "shuttle/train.arff", package = "SmartML"), system.file("extdata", "shuttle/train.arff", package = "SmartML"), option = 1, preProcessF = 'pca', nComp = 3, nModels = 2) 5 | result1$clfs #Vector of recommended nModels classifiers 6 | result1$params #Vector of initial suggested parameter configurations of nModels recommended classifiers 7 | }) 8 | 9 | -------------------------------------------------------------------------------- /tests/testthat/test-hyperband_test.R: -------------------------------------------------------------------------------- 1 | test_that("Parameter sampling works", { 2 | expect_length(param_sample("ranger", "mtry", columns = 11), 1) 3 | }) 4 | -------------------------------------------------------------------------------- /vignettes/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | -------------------------------------------------------------------------------- /vignettes/introduction.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to SmartML: Automatic Supervised Machine Learning in R" 3 | author: "Mohamed Maher - Data Systems Group @ University of Tartu" 4 | output: rmarkdown::html_vignette 5 | fig_width: 10 6 | fig_height: 10 7 | vignette: > 8 | %\VignetteIndexEntry{Introduction to SmartML: Automatic Supervised Machine Learning in R} 9 | %\VignetteEngine{knitr::rmarkdown} 10 | %\VignetteEncoding{UTF-8} 11 | --- 12 | 13 | ```{r setup, include = FALSE} 14 | knitr::opts_chunk$set( 15 | collapse = TRUE, 16 | comment = "#>" 17 | ) 18 | ``` 19 | 20 | 21 | ## SmartML: 22 | Curently, SmartML is an R-Package representing a meta learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about the meta-features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs. 23 | 24 | 25 | 26 | ## SmartML Contribution Points and Goals: 27 | 28 | The goal of SmartML is to automate the process of classifier algorithm selection, and hyper-parameter tuning in supervised machine learning using a modified version of SMAC bayesian optimization that prefers explitation more than exploration thanks to Meta-Learning. 29 | 1. SmartML is the first R package to deal with the sueprvised machine learning automation, and it is built over 16 different classifier algorithms from different R packages.
30 | 2. In addition, we offer different data preprocessing and feature engineering algorithms that can be specified by the user and easily applied to tabular datasets in either CSV or ARFF format. 31 | 3. SmartML has a collaborative knowledge base that grows over time as more users use our tool. 32 | 4. Finally, SmartML can produce model interpretability plots for feature importance and interaction with the help of the ```iml``` package for ML model interpretability. 33 | 5. SmartML has a web service for the tool with a simple R Shiny interface that can be found HERE, and a demonstration of how to use the web service can be found HERE. 34 | 35 | ## Installation 36 | 37 | You can install the released version of SmartML from [Github](https://github.com/DataSystemsGroupUT/SmartML) with: 38 | 39 | ``` r 40 | devtools::install_github("DataSystemsGroupUT/SmartML") 41 | ``` 42 | 43 | --- 44 | ## User Manual 45 | 46 | The manual for the SmartML R package can be found HERE 47 | 48 | --- 49 | ## Example 50 | 51 | --- 52 | ## Contribution GuideLines to SmartML 53 | To contribute to `SmartML`, please follow these GuideLines 54 | 55 | --- 56 | ## Publication 57 | 58 | For more details, you can view our publication about SmartML. 59 | SmartML has been accepted as a DEMO paper at EDBT 2019 in Lisbon, Portugal [PDF]: 60 | ``` 61 | Mohamed Maher, Sherif Sakr. SmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms (2019). Advances in Database Technology - EDBT 2019: 22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29. 62 | ``` 63 | 64 | --- 65 | ## Funding: 66 | This work is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75). 67 | 68 | --- 69 | ## Licence: 70 |

71 | © 2019, Data Systems Group at University of Tartu 72 |

73 | This work is licensed under the terms of the GNU General Public License, version 3.0 (GPLv3) 74 | --------------------------------------------------------------------------------
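The man pages above document the two main entry points: `autoRLearn()` for meta-learning-based algorithm selection on ARFF/CSV files, and `autoRLearn_()` for hyperband/BOHB tuning of in-memory data frames (testing.R exercises the related `evocate()` function in the same way). The sketch below strings the documented arguments together on the bundled tic-tac-toe CSVs from `inst/extdata`; the time budgets, model list, and `nModels` value are illustrative choices under those documented signatures, not recommended settings.

``` r
library(SmartML)
library(readr)
library(data.table)

# Locate the example data shipped with the package (CSV files with a "class" column).
train_csv <- system.file("extdata", "tictactoe_train.csv", package = "SmartML")
test_csv  <- system.file("extdata", "tictactoe_test.csv",  package = "SmartML")

# autoRLearn(): algorithm selection plus hyper-parameter tuning (option = 2),
# with a small time budget in minutes and accuracy as the evaluation metric.
res <- autoRLearn(
  maxTime = 5,
  directory = train_csv,
  testDirectory = test_csv,
  classCol = "class",
  metric = "acc",
  option = 2,
  nModels = 3
)
res$clfs  # name of the best classifier found
res$perf  # performance of the fitted model on the test set

# autoRLearn_(): hyperband tuning on data frames that are already numeric
# with a factor response named "class", mirroring the preparation in testing.R.
df_train <- as.data.table(read_csv(train_csv))
df_test  <- as.data.table(read_csv(test_csv))
df_train[, class := factor(class)]
df_test[, class := factor(class)]

res2 <- autoRLearn_(df_train, df_test,
                    maxTime = 2,
                    models = c("boosting", "svm"),
                    optimizationAlgorithm = "hyperband")
res2$best_models  # best hyperparameters found per selected model
```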