56 |
57 | Pull requests will be evaluated against the following checklist:
58 |
59 | 1. __Motivation__. Your pull request should clearly and concisely motivate the
60 | need for change. Please describe the problem your PR addresses and show
61 | how your pull request solves it as concisely as possible.
62 |
63 | Also include this motivation in `NEWS` so that when a new release of
64 | `SmartML` comes out it's easy for users to see what's changed. Add your
65 | item at the top of the file and use markdown for formatting. The
66 | news item should end with `(@yourGithubUsername, #the_issue_number)`.
67 |
68 | 2. __Only related changes__. Before you submit your pull request, please
69 | check to make sure that you haven't accidentally included any unrelated
70 | changes. These make it harder to see exactly what's changed, and to
71 | evaluate any unexpected side effects.
72 |
73 | Each PR corresponds to a git branch, so if you expect to submit
74 | multiple changes make sure to create multiple branches. If you have
75 | multiple changes that depend on each other, start with the first one
76 | and don't submit any others until the first one has been processed.
77 |
78 | 3. __Documentation__. If you're adding new parameters or a new function, you'll
79 | also need to document them with [roxygen](https://github.com/klutometis/roxygen),
80 | as sketched below. Make sure to re-run `devtools::document()` on the code before submitting.
81 |
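For example, a minimal roxygen sketch for a new exported helper might look like
the following (the function and its parameter are hypothetical and only for
illustration):

```r
#' @title Count class labels
#'
#' @description Return the number of instances per class label in a dataset.
#'
#' @param dataset Dataframe containing a \code{class} column.
#'
#' @return Table of class counts.
#'
#' @export
countClassLabels <- function(dataset) {
  # table() gives one count per observed class label
  table(dataset$class)
}
```

After adding or changing roxygen comments, running `devtools::document()`
regenerates the `NAMESPACE` file and the `.Rd` help pages.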
82 | This seems like a lot of work, but don't worry if your pull request isn't
83 | perfect. Submitting pull requests is a learning process, and unless you've
84 | submitted a few in the past it's unlikely that yours will be accepted as is
85 | on the first attempt. Please don't submit pull requests that change existing
86 | behaviour; instead, think about how you can add a new feature in a minimally
87 | invasive way.
88 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: SmartML
2 | Version: 0.3.0
3 | Title: Machine Learning Automation
4 | Authors@R:
5 | c(person(given = "Mohamed",
6 | family = "Maher",
7 | email = "s-mohamed.zenhom@zewailcity.edu.eg",
8 | role = c("aut", "cre")),
9 | person(given = "Sherif",
10 | family = "Sakr",
11 | email = "sherif.sakr@ut.ee",
12 | role = "aut"),
13 | person(given = "Bruno Rucy",
14 | family = "Carneiro Alves de Lima",
15 | email = "brurucy@protonmail.ch",
16 | role = "ctb"))
17 | Description: This package is a meta-learning based framework for automated selection and hyper-parameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about statistical meta features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs.
18 | License: GPL-3
19 | Encoding: UTF-8
20 | LazyData: false
21 | Imports:
22 | devtools, R.utils, stats, tictoc, e1071, BBmisc, kknn, purrr, xgboost, ranger,
23 | KernSmooth, data.table, randomForest, rpart, glmnet, nloptr, bbotk
24 | Suggests:
25 | knitr,
26 | covr,
27 | testthat,
28 | rmarkdown
29 | Depends:
30 | mlr3,
31 | mlr3learners,
32 | mlr3pipelines,
33 | mlr3filters
34 | RoxygenNote: 7.1.1
35 | VignetteBuilder: knitr
36 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(autoRLearn)
4 | export(autoRLearn_)
5 | export(evocate)
6 | export(runClassifier)
7 | import(RWeka)
8 | import(caret)
9 | import(devtools)
10 | import(farff)
11 | import(ggplot2)
12 | import(mice)
13 | import(purrr)
14 | import(rjson)
15 | importFrom(BBmisc,normalize)
16 | importFrom(C50,C5.0)
17 | importFrom(C50,C5.0Control)
18 | importFrom(FNN,knn)
19 | importFrom(KernSmooth,bkde)
20 | importFrom(KernSmooth,dpik)
21 | importFrom(LiblineaR,LiblineaR)
22 | importFrom(MASS,lda)
23 | importFrom(R.utils,withTimeout)
24 | importFrom(RCurl,getURL)
25 | importFrom(RMySQL,MySQL)
26 | importFrom(RMySQL,dbConnect)
27 | importFrom(RMySQL,dbDisconnect)
28 | importFrom(RMySQL,dbSendQuery)
29 | importFrom(RMySQL,fetch)
30 | importFrom(UBL,SmoteClassif)
31 | importFrom(caret,confusionMatrix)
32 | importFrom(caret,plsda)
33 | importFrom(data.table,fcase)
34 | importFrom(deepboost,deepboost)
35 | importFrom(deepboost,deepboost.predict)
36 | importFrom(dplyr,arrange)
37 | importFrom(dplyr,case_when)
38 | importFrom(dplyr,distinct)
39 | importFrom(dplyr,filter)
40 | importFrom(dplyr,group_by)
41 | importFrom(dplyr,mutate)
42 | importFrom(dplyr,mutate_if)
43 | importFrom(dplyr,n)
44 | importFrom(dplyr,select)
45 | importFrom(dplyr,top_frac)
46 | importFrom(e1071,kurtosis)
47 | importFrom(e1071,naiveBayes)
48 | importFrom(e1071,skewness)
49 | importFrom(e1071,svm)
50 | importFrom(fastNaiveBayes,fnb.train)
51 | importFrom(graphics,plot)
52 | importFrom(httr,POST)
53 | importFrom(httr,content)
54 | importFrom(iml,FeatureImp)
55 | importFrom(iml,Interaction)
56 | importFrom(iml,Predictor)
57 | importFrom(imputeMissings,compute)
58 | importFrom(imputeMissings,impute)
59 | importFrom(ipred,bagging)
60 | importFrom(klaR,rda)
61 | importFrom(mda,bruto)
62 | importFrom(mda,fda)
63 | importFrom(mda,gen.ridge)
64 | importFrom(mda,mars)
65 | importFrom(mda,polyreg)
66 | importFrom(nnet,nnet)
67 | importFrom(randomForest,randomForest)
68 | importFrom(ranger,ranger)
69 | importFrom(rjson,fromJSON)
70 | importFrom(rpart,rpart)
71 | importFrom(rpart,rpart.control)
72 | importFrom(stats,complete.cases)
73 | importFrom(stats,dnorm)
74 | importFrom(stats,glm)
75 | importFrom(stats,na.omit)
76 | importFrom(stats,pnorm)
77 | importFrom(stats,predict)
78 | importFrom(stats,rnorm)
79 | importFrom(stats,runif)
80 | importFrom(stats,setNames)
81 | importFrom(stats,var)
82 | importFrom(tictoc,tic)
83 | importFrom(tictoc,toc)
84 | importFrom(tidyr,drop_na)
85 | importFrom(tidyr,gather)
86 | importFrom(tidyr,separate)
87 | importFrom(tidyr,spread)
88 | importFrom(tidyr,unite)
89 | importFrom(truncnorm,dtruncnorm)
90 | importFrom(truncnorm,rtruncnorm)
91 | importFrom(utils,capture.output)
92 | importFrom(utils,head)
93 | importFrom(utils,read.csv)
94 | importFrom(xgboost,xgb.DMatrix)
95 | importFrom(xgboost,xgboost)
96 |
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # SmartML 0.3.0.1
2 |
3 | * Hotfix: fixed some dependency issues related to dplyr
4 |
5 | # SmartML 0.3.0
6 |
7 | ## Features
8 |
9 | * Added the high-performing Ranger, XGBoost, fastNaiveBayes and LiblineaR algorithms
10 | * Added the autoRLearn_ function, which assumes the data is already in perfect shape and loads it from a dataframe, unlike autoRLearn, which can only load from a data file outside R.
11 | * Added Hyperband and Bayesian Optimization Hyperband (BOHB) to the new autoRLearn_
12 | * Added some extra temporary dependencies which will be removed in the following months (all tidyverse packages other than purrr)
13 | * Fixed some small mistakes in the code and JSONs
14 |
15 | ## Current Roadmap
16 |
17 | * Fix meta-learning; at the moment it doesn't work because of a problem with the AWS server we are using.
18 | * Change the dplyr back end to use data.table with dtplyr
19 | * Merge autoRLearn and autoRLearn_ into a single function that can load both from a data file and from an in-memory dataframe.
20 | * Rewrite SMAC, as requested by Sherif.
21 |
22 | ## Extra info
23 |
24 | * brurucy is a new and active maintainer
25 | * Nightly and experimental versions, independent from the Data Systems Lab, are being developed at https://github.com/brurucy/witchcraft
26 | * Updates will be conservative and focused on non-breaking changes, up to release 1.0.
27 |
--------------------------------------------------------------------------------
/R/autoRLearn.R:
--------------------------------------------------------------------------------
1 | #' @title Run smartML function for automatic Supervised Machine Learning.
2 | #'
3 | #' @description Run the smartML main function for automatic classifier algorithm selection, and hyper-parameter tuning.
4 | #'
5 | #' @param maxTime Float numeric of the maximum time budget, in minutes, for reading the dataset, preprocessing, calculating meta-features, algorithm selection and hyper-parameter tuning (excluding model interpretability) - This is applicable in case of \code{option} = 2 only.
6 | #' @param directory String Character of the training dataset directory (SmartML accepts file formats arff/(csv with columns headers) ).
7 | #' @param testDirectory String Character of the testing dataset directory (SmartML accepts file formats arff/(csv with columns headers) ).
8 | #' @param classCol String Character of the name of the class label column in the dataset (default = 'class').
9 | #' @param vRatio Float numeric of the validation set ratio that should be split out of the training set for the evaluation process (default = 0.3 --> 30\%).
10 | #' @param preProcessF Vector of string Character containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv')):
11 | #' \itemize{
12 | #' \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features,
13 | #' \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a BoxCox, but values can be negative,
14 | #' \item "zv" - remove attributes with a zero variance (all the same value),
15 | #' \item "center" - subtract mean from values,
16 | #' \item "scale" - divide values by standard deviation,
17 | #' \item "standardize" - perform both centering and scaling,
18 | #' \item "normalize" - normalize values,
19 | #' \item "pca" - transform data to the principal components,
20 | #' \item "ica" - transform data to the independent components.
21 | #' }
22 | #' @param featuresToPreProcess Vector of number of features to perform the feature preprocessing on - In case of empty vector, this means to include all features in the dataset file (default = c()) - This vector should be a subset of \code{selectedFeats}.
23 | #' @param nComp Integer numeric of Number of components needed if either "pca" or "ica" feature preprocessors are needed.
24 | #' @param nModels Integer numeric representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization (default = 5).
25 | #' @param option Integer numeric: 1 means only classifier algorithm selection is performed; 2 (the default) means algorithm selection plus hyper-parameter tuning.
26 | #' @param featureTypes Vector of either 'numerical' or 'categorical' representing the types of features in the dataset (default = c() --> any factor or character features will be considered as categorical otherwise numerical).
27 | #' @param interp Boolean representing whether model interpretability (Feature Importance and Interaction) is needed or not (default = FALSE). This option will take more of the time budget if set to TRUE.
28 | #' @param missingOpr Boolean variable: FALSE applies median/mode imputation for instances with missing values; TRUE applies imputation using the "MICE" library, which imputes missing values with plausible data values drawn from a distribution designed for each missing datapoint.
29 | #' @param balance Boolean variable representing whether SMOTE class balancing is required or not (default = FALSE).
30 | #' @param metric Metric of string character to be used in evaluation:
31 | #' \itemize{
32 | #' \item "acc" - Accuracy,
33 | #' \item "avg-fscore" - Average of F-Score of each label,
34 | #' \item "avg-recall" - Average of Recall of each label,
35 | #' \item "avg-precision" - Average of Precision of each label,
36 | #' \item "fscore" - Micro-Average of F-Score of each label,
37 | #' \item "recall" - Micro-Average of Recall of each label,
38 | #' \item "precision" - Micro-Average of Precision of each label.
39 | #' }
40 | #'
41 | #' @return List of Results
42 | #' \itemize{
43 | #' \item "option=1" - Chosen classifier algorithm names \code{clfs} with their parameter configurations \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData},
44 | #' \item "option=2" - Best classifier algorithm name found \code{clfs} with its parameter configuration \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData}, model variable \code{model}, predicted values on test set \code{pred}, performance on TestingSet \code{perf}, and Feature Importance \code{interpret$featImp} / Interaction \code{interpret$Interact} plots in case of interpretability \code{interp} = TRUE and chosen model is not knn.
45 | #' }
46 | #'
47 | #' @examples
48 | #' \dontrun{
49 | #' autoRLearn(1, 'sampleDatasets/car/train.arff',
50 | #'            'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize')
51 | #'
52 | #' result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff')
53 | #' }
54 | #'
55 | #' @importFrom tictoc tic toc
56 | #' @importFrom R.utils withTimeout
57 | #' @importFrom graphics plot
58 | #' @import ggplot2
59 | #'
60 | #' @export autoRLearn
61 |
62 | autoRLearn <- function(maxTime, directory, testDirectory, classCol = 'class', metric = 'acc', vRatio = 0.3, preProcessF = c('standardize', 'zv'), featuresToPreProcess = c(), nComp = NA, nModels = 5, option = 2, featureTypes = c(), interp = FALSE, missingOpr = FALSE, balance = FALSE) {
63 | #Set Seed
64 | set.seed(22)
65 | #Read Dataset
66 | datasetReadError <- try(
67 | {
68 | #Read Training Dataset
69 | dataset <- readDataset(directory, testDirectory, classCol = classCol, vRatio = vRatio, preProcessF = preProcessF, featuresToPreProcess = featuresToPreProcess, nComp = nComp, missingOpr = missingOpr, metric = metric, balance = balance)
70 | trainingSet <- dataset$TD
71 | #Read Testing Dataset
72 | testDataset <- dataset$TED
73 | #Read all training Dataset without validation
74 | trainDataset <- dataset$FULLTD
75 | })
76 | if(inherits(datasetReadError, "try-error")){
77 | print('Error: Failed Reading Dataset: Make sure that dataset directory is correct and it is a valid csv/arff file.')
78 | return(-1)
79 | }
80 |
81 | #Calculate Meta-Features for the dataset
82 | metaFeaturesError <- try(
83 | {
84 | metaFeatures <- computeMetaFeatures(trainingSet, maxTime, featureTypes)
85 | })
86 | if(inherits(metaFeaturesError, "try-error")){
87 | print('Error: Failed Extracting Dataset MetaFeatures.')
88 | return(-1)
89 | }
90 |
91 | splitError <- try(
92 | {
93 | #Convert Categorical Features to Numerical Ones and split the dataset
94 | B <- max(10, as.integer((metaFeatures$nInstances) / 2000)) #Number of folds to work on for the dataset and trees in SMAC forest model
95 |
96 | dataset <- convertCategorical(dataset, trainDataset, testDataset, B = B)
97 | validationSet <- dataset$VD #Validation set
98 | trainingSet <- dataset$TD #Training Set
99 | foldedSet <- dataset$FD #Folded sets of Training Data.
100 | #Convert for all TrainingSet
101 | trainDataset <- dataset$FULLTD
102 | #Convert for all TestingSet
103 | testDataset <- dataset$TED
104 | })
105 | if(inherits(splitError, "try-error")){
106 | print('Error: Failed Splitting Dataset.')
107 | return(-1)
108 | }
109 |
110 | #Generate candidate classifiers
111 | candidateClfsError <- try(
112 | {
113 | nClassifiers <- 15
114 | output <- getCandidateClassifiers(maxTime, metaFeatures, min(c(nModels, nClassifiers)) )
115 | algorithms <- output$c #Classifier Algorithm names selected.
116 | tRatio <- output$r #Time ratio between all classifiers.
117 | algorithmsParams <- output$p #Initial Parameter configuration of each classifier.
118 | })
119 | if(inherits(candidateClfsError, "try-error")){
120 | print('Error: Can not generate Candidate classifiers.')
121 | return(-1)
122 | }
123 |
124 | tryCatch({
125 | #Option 1: Only Candidate Classifiers with initial parameters will be resulted (No Hyper-parameter tuning)
126 | if(option == 1 && length(algorithms) == length(algorithmsParams))
127 | return (list(clfs = algorithms, params = algorithmsParams, TRData = dataset$FULLTD, TEData = dataset$TED))
128 | else if(option == 1)
129 | return ('Error: Failed to Connect to KnowledgeBase, Option 1 can not be executed')
130 |
131 | #Option 2: Classifier Algorithm Selection + Parameter Tuning
132 | res <- withTimeout({
133 | #variables to hold best classifiers
134 | bestAlgorithm <- '' #bestClassifierName.
135 | bestAlgorithmPerf <- 0 #bestClassifierPerformance.
136 | bestAlgorithmParams <- list() #Parameters of best Classifier.
137 |
138 | #loop over each classifier
139 | for(i in 1:length(algorithms)){
140 | classifierAlgorithm <- algorithms[[i]]
141 | if (i <= length(algorithmsParams))
142 | classifierAlgorithmParams <- algorithmsParams[[i]]
143 | else
144 | classifierAlgorithmParams <- '' #use the default initial parameter configuration
145 |
146 | #Read maxTime for the current classifier algorithm and convert to seconds
147 | maxClfTime <- tRatio[i] * 60
148 | #Read the current classifier default parameter configuration
149 | classifierConf <- getClassifierConf(classifierAlgorithm)
150 | cat('\nStart Tuning Classifier Algorithm: ', classifierAlgorithm, '\n')
151 | #initialize step
152 | R <- initialize(classifierAlgorithm, classifierConf, classifierAlgorithmParams)
153 | cntParams <- R[, -which(names(R) == "performance")]
154 | #start hyperParameter tuning till maximum Time
155 | tic(quiet = TRUE)
156 | timeTillNow <- 0
157 | #Regression Random Forest Trees for training set folds
158 | tree <- data.frame(fold=integer(), parent=integer(), params=character(), rightChild=integer(), leftChild=integer(), performance=double(), rowN = integer())
159 | bestParams <- cntParams
160 | bestPerf <- c()
161 | classifierFailureCounter <- 0
162 |
163 | repeat{
164 | gc()
165 | #Fit Model
166 | output <- fitModel(bestParams, bestPerf, trainingSet, validationSet, foldedSet, classifierAlgorithm, tree, B = B)
167 | #If this classifier has failed more than twice, skip to the next classifier
168 | if((length(bestPerf) > 0 && mean(bestPerf) == 0) || length(bestPerf) == 0){
169 | classifierFailureCounter <- classifierFailureCounter + 1
170 | if(classifierFailureCounter > 2) break
171 | }
172 | tree <- output$t
173 | bestPerf <- output$p
174 | bestParams <- output$bp
175 | #Select Candidate Classifier Configurations
176 | candidateConfs <- selectConfiguration(R, classifierAlgorithm, tree, bestParams, B = B)
177 | #Intensify
178 | if(nrow(candidateConfs) > 0){
179 | output <- intensify(R, bestParams, bestPerf, candidateConfs, foldedSet, trainingSet, validationSet, classifierAlgorithm, maxClfTime, timeTillNow, B = B, metric = metric)
180 | bestParams <- output$params
181 | bestPerf <- output$perf
182 | timeTillNow <- output$timeTillNow
183 | classifierFailureCounter <- classifierFailureCounter + output$fails
184 | R <- output$r
185 | }
186 | #Check if execution time exceeded the allowed time or not
187 | t <- toc(quiet = TRUE)
188 | timeTillNow <- timeTillNow + t$toc - t$tic
189 | tic(quiet = TRUE)
190 | if(timeTillNow > maxClfTime){
191 | if(mean(bestPerf) > mean(bestAlgorithmPerf)){
192 | bestAlgorithmPerf <- bestPerf
193 | bestAlgorithm <- classifierAlgorithm
194 | bestAlgorithmParams <- bestParams
195 | #cat('Best Classifier:', bestAlgorithm, ' --> Performance:', bestAlgorithmPerf, '\n')
196 | }
197 | break
198 | }
199 |
200 | }
201 | }
202 |
203 | },timeout = maxTime * 60, cpu = maxTime * 60)
204 | }, TimeoutException = function(ex) {
205 | message("NOTE: Time Budget allowed has been finished.")
206 | })
207 |
208 | print("Time limit for the tuning process has been reached. Training the best classifier found over the whole training set now.")
209 | if (bestAlgorithm != '')
210 | bestAlgorithmParams <- bestAlgorithmParams[,names(bestAlgorithmParams) != "EI" & names(bestAlgorithmParams) != "performance"]
211 | else{
212 | bestAlgorithm <- algorithms[[1]]
213 | bestAlgorithmParams <- algorithmsParams[[1]]
214 | }
215 |
216 | trainFinalModelError <- try(
217 | {
218 | #Run Classifier over all training set and check performance on testing set
219 | finalResult <- runClassifier(trainingSet = trainDataset, validationSet = testDataset, params = bestAlgorithmParams, classifierAlgorithm = bestAlgorithm, metric = metric, interp = interp)
220 | finalResult$clfs <- bestAlgorithm
221 | finalResult$params <- bestAlgorithmParams
222 | #save results to Temporary File
223 | query <- sendToTmp(metaFeatures, bestAlgorithm, bestAlgorithmParams, finalResult$perf, nModels, metric)
224 | #check internet connection and send data in tmp file to database if connection exists
225 | if(checkInternet() == TRUE){
226 | sendToDatabase(query)
227 | }
228 | })
229 | if(inherits(trainFinalModelError, "try-error")){
230 | print('Error: Not enough computational resources. Cannot build a model over the current dataset!')
231 | }
232 |
233 |
234 | finalResult$TRData = dataset$FULLTD
235 | finalResult$TEData = dataset$TED
236 | return(finalResult)
237 | }
238 |
--------------------------------------------------------------------------------
/R/autoRLearn_.R:
--------------------------------------------------------------------------------
1 | #' @title Advanced version of autoRLearn.
2 | #'
3 | #' @description Tunes the hyperparameters of the desired algorithm/s using either hyperband or BOHB.
4 | #'
5 | #' @param df_train Dataframe of the training dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class".
6 | #' @param df_test Dataframe of the test dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class".
7 | #' @param maxTime Float representing the maximum time the algorithm should be run (in minutes).
8 | #' @param models List of strings denoting which algorithms to use for the process:
9 | #' \itemize{
10 | #' \item "randomForest" - Random forests using the randomForest package
11 | #' \item "ranger" - Random forests using the ranger package (unstable)
12 | #' \item "naiveBayes" - Naive Bayes using the fastNaiveBayes package
13 | #' \item "boosting" - Gradient boosting using xgboost
14 | #' \item "l2-linear-classifier" - Linear primal Support vector machine from LibLinear
15 | #' \item "svm" - RBF kernel svm from e1071
16 | #' }
17 | #' @param optimizationAlgorithm - String of which hyperparameter tuning algorithm to use:
18 | #' \itemize{
19 | #' \item "hyperband" - Hyperband with uniformly initiated parameters
20 | #' \item "bohb" - Hyperband with Bayesian optimization as described in the 2018 BOHB paper by Falkner, Klein and Hutter. Has extra parameters bw and kde_type
21 | #' }
22 | #' @param bw - (only applies to BOHB) Double representing how much the KDE bandwidth should be widened. Higher values allow the algorithm to explore more hyperparameter combinations
23 | #' @param max_iter - (affects both hyperband and BOHB) Integer representing the maximum number of iterations that one successive halving run can have
24 | #' @param kde_type - (only applies to BOHB) String representing whether a model's hyperparameters should be tuned individually of each other or have their probability densities multiplied:
25 | #' \itemize{
26 | #' \item "single" - each hyperparameter has its own expected improvement calculated
27 | #' \item "mixed" - all hyperparameters' probability densities are multiplied and only one mixed expected improvement is calculated
28 | #' }
29 | #' @param metric String of the evaluation metric to be used in the model performance optimization:
30 | #' \itemize{
31 | #' \item "acc" - Accuracy,
32 | #' \item "avg-fscore" - Average of F-Score of each label,
33 | #' \item "avg-recall" - Average of Recall of each label,
34 | #' \item "avg-precision" - Average of Precision of each label,
35 | #' \item "fscore" - Micro-Average of F-Score of each label,
36 | #' \item "recall" - Micro-Average of Recall of each label,
37 | #' \item "precision" - Micro-Average of Precision of each label.
38 | #' }
39 | #' @return List of Results
40 | #' \itemize{
41 | #' \item \code{perf} - Evaluated metric of the best performing model on the test data
42 | #' \item \code{pred} - prediction on the test data using the best model
43 | #' \item \code{model} - best model object
44 | #' \item \code{best_models} - table with the best hyperparameters found for the selected models.
45 | #' }
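#'
#' @examples
#' \dontrun{
#' # A minimal sketch; it assumes df_train and df_test are already-prepared
#' # numeric dataframes with a factor column named "class" (e.g. a random
#' # split of iris with Species renamed to class).
#' res <- autoRLearn_(df_train, df_test, maxTime = 2,
#'                    models = c("randomForest", "svm"),
#'                    optimizationAlgorithm = "bohb", kde_type = "single")
#' res$best_models
#' res$perf
#' }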
46 |
47 | #' @importFrom R.utils withTimeout
48 | #' @importFrom tictoc tic toc
49 | #' @importFrom stats na.omit runif
50 | #' @importFrom utils head
51 |
52 | #' @export autoRLearn_
53 | autoRLearn_ <- function(df_train, df_test, maxTime = 10,
54 | models = c("randomForest", "naiveBayes", "boosting", "l2-linear-classifier", "svm"),
55 | optimizationAlgorithm = "hyperband", bw = 3, kde_type = "single",
56 | max_iter = 81, metric = "acc") {
57 |
58 | total_time <- maxTime * 60
59 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params))
60 | times <- (parameters_per_model * total_time) / (sum(parameters_per_model))
61 |
62 | print("Time distribution:")
63 | print(times)
64 | print("Models selected:")
65 | print(models)
66 |
67 | run_optimization <- function(model, time) {
68 | results <- NULL
69 | priors <- data.frame()
70 |
71 | tic(model, "optimization time:")
72 |
73 | if(optimizationAlgorithm == "hyperband") {
74 | current <- Sys.time() %>% as.integer()
75 | end <- (Sys.time() %>% as.integer()) + time
76 | repeat {
77 | gc(verbose = F)
78 | tic("current hyperband runtime")
79 | print(paste("started", model))
80 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
81 | print(paste("There are:", time_left, "seconds left for this hyperband run"))
82 | res <- hyperband(df = df_train, model = model, max_iter = max_iter, maxtime = time_left)
83 | if(is_empty(flatten(res)) == F) {
84 | res <- res %>%
85 | map_dfr(.f = ~ .x[["answer"]]) %>%
86 | arrange(desc(acc)) %>%
87 | head(1)
88 | results <- c(list(res), results)
89 | print(paste('Best accuracy from hyperband this round: ', res$acc))
90 | }
91 | elapsed <- (Sys.time() %>% as.integer()) - current
92 | if(elapsed >= time) {
93 | break
94 | }
95 | }
96 | }
97 |
98 | else if(optimizationAlgorithm == "bohb") {
99 | current <- Sys.time() %>% as.integer()
100 | end <- (Sys.time() %>% as.integer()) + time
101 | repeat {
102 | gc(verbose = F)
103 | tic("current bohb time")
104 | print(paste("started", model))
105 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
106 | print(paste("There are:", time_left, "seconds left for this bohb run"))
107 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter, maxtime = time_left,
108 | priors = priors, kde_type = kde_type)
109 | if(is_empty(flatten(res)) == F) {
110 | priors <- res %>%
111 | map_dfr(.f = ~ .x[["sh_runs"]])
112 | res <- res %>%
113 | map_dfr(.f = ~ .x[["answer"]]) %>%
114 | arrange(desc(acc)) %>%
115 | head(1)
116 | results <- c(list(res), results)
117 | print(paste('Best accuracy from bohb this round: ', res$acc))
118 | }
119 | elapsed <- (Sys.time() %>% as.integer()) - current
120 | if(elapsed >= time) {
121 | break
122 | }
123 | }
124 | }
125 |
126 | else {
127 | stop("Only hyperband and bohb are valid optimization algorithms at this moment.")
129 | }
130 |
131 | toc()
132 | results
133 | }
134 |
135 | print("Starting to run all optimizations.")
136 | ans <- vector(mode = "list", length = length(models))
137 |
138 | for(i in 1:length(models)) {
139 | flag <- TRUE
140 | #tryCatch(expr = {
141 | ans[[i]] <- run_optimization(models[[i]], times[[i]])
142 | #}, error = function(e) {
143 | # print("Error spotted, going to the next model!")
144 | # flag <<- FALSE
145 | #})
146 | if (!flag) next
147 | }
148 |
149 | print(ans)
150 | ans <- ans %>%
151 | map(.f = ~ map_dfr(.x = .x, .f = ~ .x %>% select(model, params, acc))) %>%
152 | map_dfr(.f = ~ .x %>% arrange(desc(acc)) %>% head(1)) %>%
153 | arrange(desc(acc))
154 | best_model <- ans %>% head(1)
155 | final_evaluation <- eval_loss(model = best_model[["model"]], train_df = df_train, test_df = df_test,
156 | params = best_model[["params"]])
157 | final_evaluation$best_models <- ans
158 | print(paste("Winner:", best_model$model, "test accuracy:", final_evaluation$perf))
159 | final_evaluation
160 |
161 | }
162 |
163 |
--------------------------------------------------------------------------------
/R/bohb.R:
--------------------------------------------------------------------------------
1 | #' @importFrom dplyr distinct n group_by
2 |
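#' @title Internal BOHB driver.
#' @description A sketch of the control flow below: the successive-halving
#'   brackets of one Hyperband iteration are run under a time budget
#'   (\code{withTimeout}). The first bracket draws configurations uniformly at
#'   random; later brackets mix KDE-based resampling of earlier evaluations
#'   (\code{successive_resampling}) with a \code{random_frac} share of fresh
#'   random configurations, accumulating all runs in \code{runs_df}.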
3 | #' @keywords internal
4 | bohb <- function(df, model, max_iter = 81, eta = 3, bw = 3, random_frac = 1/3,
5 | maxtime, priors = data.frame(), kde_type = "single") {
6 | logeta = as_mapper(~ log(.x) / log(eta))
7 | s_max = trunc(logeta(max_iter))
8 | B = (s_max + 1) * max_iter
9 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>%
10 | t() %>%
11 | `colnames<-`(value = c("n", "r")) %>%
12 | as_tibble()
13 | nrs$s = s_max:0
14 | length_params <- length(jsons[[model]]$params)
15 |
16 | tryCatch(expr = {withTimeout(expr = {
17 | liszt = vector(mode = "list",
18 | length = max(nrs$s) + 1)
19 | runs_df <- NULL
20 | current_sh_run <- NULL
21 | for (row in 1:nrow(nrs)) {
22 | if(row == 1) {
23 | print(paste("Iteration number", row))
24 | #print(paste("n = ", nrs[[row, 1]], " r = ", nrs[[row, 2]], " s_max = ", nrs[[row, 3]], sep = ""))
25 | current_sh_run <- successive_halving(df = df,
26 | params_config = sample_n_params(n = nrs[[row, 1]],
27 | model = model),
28 | n = nrs[[row, 1]],
29 | r = nrs[[row, 2]],
30 | s_max = nrs[[row, 3]],
31 | max_iter = max_iter,
32 | eta = eta,
33 | evaluations = priors)
34 | runs_df <- runs_df %>%
35 | bind_rows(current_sh_run$sh_runs)
36 | liszt[[row]] <- current_sh_run
37 | next
38 | }
39 | else if(row > 1){
40 | bayesian_opt_samples <- successive_resampling(df = runs_df,
41 | model = model,
42 | samples = max_iter,
43 | n = round(max(nrs[[row, 1]] * (1 - random_frac), 1)),
44 | bw = bw,
45 | kde_type = kde_type)
46 |
47 | current_sh_run <- successive_halving(df = df,
48 | params_config = bayesian_opt_samples %>%
49 | bind_rows(sample_n_params(n = round(max(nrs[[row, 1]] * random_frac, 1)), model = model)),
50 | n = nrs[[row, 1]],
51 | r = nrs[[row, 2]],
52 | s_max = nrs[[row, 3]],
53 | max_iter = max_iter,
54 | eta = eta)
55 | }
56 | runs_df <- runs_df %>%
57 | bind_rows(current_sh_run$sh_runs)
58 | liszt[[row]] <- current_sh_run
59 | }
60 | }, timeout = maxtime, cpu = maxtime)},
61 |
62 | TimeoutException = function(ex) {
63 | print("Budget ended.")
64 | return(liszt)
65 | },
66 |
67 | finally = {
68 | print("BOHB successfully finished.")
69 | },
70 |
71 | error = function(ex) {
72 | print(paste("Error found in model ", model, sep = ""))
73 | print(geterrmessage())
74 | })
78 |
79 | return(liszt)
80 | }
81 |
--------------------------------------------------------------------------------
/R/bohb_utility.R:
--------------------------------------------------------------------------------
1 | #' @keywords internal
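#' @description Density-ratio acquisition score used by BOHB: the predicted
#'   density of the "good" KDE (\code{lkde}) divided by that of the "bad" KDE
#'   (\code{gkde}) at the candidate configuration; larger values are better.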
2 | EI <- function(..., lkde, gkde) { predict(lkde, x = c(...)) / predict(gkde, x = c(...)) }
3 |
4 | #' @keywords internal
5 | map_all <- function(df) {
6 | do.call("mapply", c(list, df, SIMPLIFY = FALSE, USE.NAMES=FALSE))
7 | }
8 |
9 | #' @keywords internal
10 | coalesce_all_columns <- function(df, group_vars = NULL) {
11 | if (is.null(group_vars)) {
12 | group_vars <-
13 | df %>%
14 | purrr::keep(~ dplyr::n_distinct(.x) == 1L) %>%
15 | names()
16 | }
17 |
18 | msk <- colnames(df) %in% group_vars
19 | same_df <- df[1L, msk, drop = FALSE]
20 | coal_df <- df[, !msk, drop = FALSE] %>%
21 | purrr::map_dfc(na.omit)
22 | cbind(same_df, coal_df)
23 | }
24 |
25 | #' @keywords internal
26 | sample_n_params <- function(n, model) {
27 | ans <- map_chr(.x = rep(model, n), .f = make_paste_final) %>%
28 | data.frame(model = model,
29 | params = .) %>%
30 | mutate_all(.funs = as.character)
31 | ans
32 | }
33 |
34 | #' @keywords internal
35 | make_paste_final <- function(model) {
36 | params_list <- get_random_hp_config(jsons[[model]])
37 |
38 | names_list <- names(params_list) %>%
39 | map(~ str_glue(.x, " = ")) %>%
40 | map2(params_list, ~paste(.x, .y, sep = "")) %>%
41 | paste(collapse = ",")
42 | names_list
43 | }
44 |
--------------------------------------------------------------------------------
/R/checkInternet.R:
--------------------------------------------------------------------------------
1 | #' @title Check Internet Connectivity.
2 | #'
3 | #' @description Checking if user has Internet connectivity at the moment of execution to send results to the knowledge base / get recommendation from knowledge base.
4 | #'
5 | #' @return Boolean representing the Internet connectivity status.
6 | #'
7 | #' @examples
8 | #' checkInternet()
9 | #'
10 | #' @importFrom RCurl getURL
11 | #'
12 | #' @noRd
13 | #'
14 | #' @keywords internal
15 |
16 | checkInternet <- function() {
17 | out <- FALSE
18 | tryCatch({
19 | out <- is.character(getURL("www.yahoo.com"))
20 | },
21 | error = function(e) {
22 | out <- FALSE
23 | }
24 | )
25 | out
26 | }
27 |
--------------------------------------------------------------------------------
/R/computeEI.R:
--------------------------------------------------------------------------------
1 | #' @title Compute Expected Improvement.
2 | #'
3 | #' @description Compute the expected improvement for the suggested parameter configurations of a specific classifier.
4 | #'
5 | #' @param cmin Minimum error rate achieved till now.
6 | #' @param perf Expected Performance of the current configuration on each tree of the forest of SMAC algorithm.
7 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10).
8 | #'
9 | #' @return Float Number of Expected Improvement value.
10 | #'
11 | #' @examples
12 | #' computeEI(0.9, c(0.91, 0.95, 0.89, 0.88, 0.93), 5).
13 | #'
14 | #' @importFrom stats pnorm dnorm var
15 | #'
16 | #' @noRd
17 | #'
18 | #' @keywords internal
19 |
20 | computeEI <- function(cmin, perf, B = 10){
21 | for(i in 1:B){
22 | perf[i] <- 1 - perf[i]
23 | }
24 | perfMean <- mean(perf)
25 | perfStdDev <- sqrt(var(perf))
26 | u <- (cmin - perfMean)/perfStdDev
27 | cdf <- pnorm(u, mean=0, sd=1)
28 | pdf <- dnorm(u, mean=0, sd=1)
29 | EI <- perfStdDev * (u * cdf + pdf)
30 | return (EI)
31 | }
32 |
--------------------------------------------------------------------------------
/R/computeMetaFeatures.R:
--------------------------------------------------------------------------------
1 | #' @title Compute Meta-Features.
2 | #'
3 | #' @description Compute Statistical Meta-Features for a dataset.
4 | #'
5 | #' @param dataset The dataframe containing the dataset to process.
6 | #' @param maxTime The maximum time budget entered by user for the parameter optimization part (in minutes).
7 | #' @param featureTypes Vector of Types of each feature in the dataset either ('numerical', 'categorical').
8 | #'
9 | #' @return dataframe with 25 statistical meta-feature of \code{dataset}.
10 | #'
11 | #' @examples
12 | #' computeMetaFeatures(data.frame(salary = c(623, 515, 611, 729, 843), class = c(0, 0, 0, 1, 1)), 10, c('numerical', 'numerical'))
13 | #'
14 | #' @importFrom e1071 skewness kurtosis
15 | #' @importFrom stats var
16 | #'
17 | #' @noRd
18 | #'
19 | #' @keywords internal
20 |
21 | computeMetaFeatures <- function(dataset, maxTime, featureTypes) {
22 | print('###################START: Preparation of Meta-Features of the Dataset###################')
23 | #1- number of instances
24 | nInstances <- nrow(dataset)
25 | cat(sprintf("1-Number of Instances: %d\n", nInstances))
26 | #2- log number of instances
27 | lognInstances <- log(nInstances)
28 | cat(sprintf("2-Log number of Instances: %f\n",lognInstances))
29 | #3- number of features
30 | nFeatures <- ncol(dataset) - 1
31 | cat(sprintf("3-Number of Features: %d\n", nFeatures))
32 | #4- log number of features
33 | lognFeatures <- log(nFeatures)
34 | cat(sprintf("4-Log number of Features: %f\n", lognFeatures))
35 | #5- number of classes
36 | classes <- unique(dataset$class)
37 | nClasses <- length(classes)
38 | cat(sprintf("5-Total number of Classes: %d\n", nClasses))
39 | #6- number of categorical features
40 | nCatFeatures <- 0
41 | nNumFeatures <- 0
42 | skewVector <- c()
43 | kurtosisVector <- c()
44 | symbolsVector <- c()
45 | featsType <- lapply(dataset, class)
46 | if(length(featureTypes) == 0){
47 | for(i in colnames(dataset)){
48 | if(i == 'class')next
49 | if(featsType[[i]] != 'factor' && featsType[[i]] != 'character' && length(unique(dataset[[i]])) > lognInstances){
50 | nNumFeatures <- nNumFeatures + 1
51 | skewVector <- c(skewVector, skewness(dataset[[i]]))
52 | kurtosisVector <- c(kurtosisVector, kurtosis(dataset[[i]]))
53 | }
54 | else{
55 | nCatFeatures <- nCatFeatures + 1
56 | symbolsVector <- c(symbolsVector, length(unique(dataset[[i]])))
57 | }
58 | }
59 | }
60 | else{
61 | counter <- 0
62 | for(i in colnames(dataset)){
63 | counter <- counter + 1
64 | if(i == 'class')next
65 | if(featureTypes[counter] == 'numerical'){
66 | nNumFeatures <- nNumFeatures + 1
67 | skewVector <- c(skewVector, skewness(dataset[[i]]))
68 | kurtosisVector <- c(kurtosisVector, kurtosis(dataset[[i]]))
69 | }
70 | else{
71 | nCatFeatures <- nCatFeatures + 1
72 | symbolsVector <- c(symbolsVector, length(unique(dataset[[i]])))
73 | }
74 | }
75 | }
76 | cat(sprintf("6-Number of Categorical Features: %d\n", nCatFeatures))
77 | #7- number of numerical features
78 | cat(sprintf("7-Number of Numerical Features: %d\n", nNumFeatures))
79 | #8- ratio of categorical to numerical features
80 | if(nNumFeatures > 0){
81 | ratioNumToCat <- nCatFeatures / nNumFeatures
82 | }
83 | else{
84 | ratioNumToCat <- 999999
85 | }
86 | cat(sprintf("8-Ratio of Categorical to Numerical Features %f\n", ratioNumToCat))
87 | #9- class entropy
88 | probClasses <- c()
89 | classEntropy <- 0
90 | for(i in classes){
91 | prob <- length(which(dataset$class==i))/nInstances
92 | probClasses <- c(probClasses, prob)
93 | classEntropy <- classEntropy - prob * log2(prob)
94 | }
95 | cat(sprintf("9-Class Entropy: %f\n", classEntropy))
96 | #10- class probability max
97 | classProbMax <- max(probClasses)
98 | cat(sprintf("10-Maximum Class Probability: %f\n", classProbMax))
99 | #11- class probability min
100 | classProbMin <- min(probClasses)
101 | cat(sprintf("11-Minimum Class Probability: %f\n", classProbMin))
102 | #12- class probability mean
103 | classProbMean <- mean(probClasses)
104 | cat(sprintf("12-Mean Class Probability: %f\n", classProbMean))
105 | #13- class probability std. dev
106 | classProbStdDev <- sqrt(var(probClasses))
107 | cat(sprintf("13-Standard Deviation of Class Probability: %f\n", classProbStdDev))
108 | #14- Symbols Mean
109 | if(length(symbolsVector) > 0) symbolsMean <- mean(symbolsVector)
110 | else symbolsMean <- 'NULL'
111 | cat(sprintf("14-Mean of Number of Symbols: %s\n", symbolsMean))
112 | #15- Symbols sum
113 | if(length(symbolsVector) > 0) symbolsSum <- sum(symbolsVector)
114 | else symbolsSum <- 'NULL'
115 | cat(sprintf("15-Sum of Number of Symbols: %s\n", symbolsSum))
116 | #16- Symbols Std. Deviation
117 | if(length(symbolsVector) > 0) symbolsStdDev <- sqrt(var(symbolsVector))
118 | else symbolsStdDev <- 'NULL'
119 | cat(sprintf("16-Std. Deviation of Number of Symbols: %s\n", symbolsStdDev))
120 | #17- skewness min
121 | if(length(skewVector) > 0) featuresSkewMin <- try(min(skewVector))
122 | else featuresSkewMin <- 0
123 | cat(sprintf("17-Features Skewness Minimum: %s\n", featuresSkewMin))
124 | #18- skewness mean
125 | if(length(skewVector) > 0) featuresSkewMean <- try(mean(skewVector))
126 | else featuresSkewMean <- 0
127 | cat(sprintf("18-Features Skewness Mean: %s\n", featuresSkewMean))
128 | #19- skewness max
129 | if(length(skewVector) > 0) featuresSkewMax <- try(max(skewVector))
130 | else featuresSkewMax <- 0
131 | cat(sprintf("19-Features Skewness Maximum: %s\n", featuresSkewMax))
132 | #20- skewness std. dev.
133 | if(length(skewVector) > 0) featuresSkewStdDev <- try(sqrt(var(skewVector)))
134 | else featuresSkewStdDev <- 0
135 | cat(sprintf("20-Features Skewness Std. Deviation: %s\n", featuresSkewStdDev))
136 | #21- Kurtosis min
137 | if(length(kurtosisVector) > 0) featuresKurtMin <- try(min(kurtosisVector))
138 | else featuresKurtMin <- 0
139 | cat(sprintf("21-Features Kurtosis Min: %s\n", featuresKurtMin))
140 | #22- Kurtosis max
141 | if(length(kurtosisVector) > 0) featuresKurtMax <- try(max(kurtosisVector))
142 | else featuresKurtMax <- 0
143 | cat(sprintf("22-Features Kurtosis Max: %s\n", featuresKurtMax))
144 | #23- Kurtosis mean
145 | if(length(kurtosisVector) > 0) featuresKurtMean <- try(mean(kurtosisVector))
146 | else featuresKurtMean <- 0
147 | cat(sprintf("23-Features Kurtosis Mean: %s\n", featuresKurtMean))
148 | #24- Kurtosis std. dev.
149 | if(length(kurtosisVector) > 0) featuresKurtStdDev <- try(sqrt(var(kurtosisVector)))
150 | else featuresKurtStdDev <- 0
151 | cat(sprintf("24-Features Kurtosis Std. Deviation: %s\n", featuresKurtStdDev))
152 | #25- Dataset Ratio (ratio of number features: number of instances)
153 | datasetRatio <- nFeatures / nInstances
154 | cat(sprintf("25-Dataset Ratio: %f\n", datasetRatio))
155 |
156 | #Collecting Meta-Features in a dataFrame
157 | df <- data.frame(datasetRatio = datasetRatio, featuresKurtStdDev = featuresKurtStdDev,
158 | featuresKurtMean = featuresKurtMean, featuresKurtMax = featuresKurtMax,
159 | featuresKurtMin = featuresKurtMin, featuresSkewStdDev = featuresSkewStdDev,
160 | featuresSkewMean = featuresSkewMean, featuresSkewMax = featuresSkewMax,
161 | featuresSkewMin = featuresSkewMin, symbolsStdDev = symbolsStdDev, symbolsSum = symbolsSum,
162 | symbolsMean = symbolsMean, classProbStdDev = classProbStdDev, classProbMean = classProbMean,
163 | classProbMax = classProbMax, classProbMin = classProbMin, classEntropy = classEntropy,
164 | ratioNumToCat = ratioNumToCat, nCatFeatures = nCatFeatures, nNumFeatures = nNumFeatures,
165 | nInstances = nInstances, nFeatures = nFeatures, nClasses = nClasses,
166 | lognFeatures = lognFeatures, lognInstances = lognInstances, maxTime = maxTime)
167 | print('###################END: Preparation of Meta-Features of the Dataset###################')
168 | return(df)
169 | }
170 |
--------------------------------------------------------------------------------
/R/convertCategorical.R:
--------------------------------------------------------------------------------
1 | #' @title Convert Categorical to Numerical Features.
2 | #'
3 | #' @description Perform One-Hot-Encoding for the categorical features to convert them to numerical ones.
4 | #'
5 | #' @param dataset List of training and validation dataframes containing the dataset to process.
6 | #' @param trainDataset Dataframe of full training set
7 | #' @param testDataset Dataframe of full testing set
8 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10).
9 | #'
10 | #' @return List of data frames for the new dataset after encoding categorical to numerical features (TD = Training Dataset, VD = Validation Dataset, FD = Training Dataset after splitting it into \code{B} folds).
11 | #'
12 | #' @examples
13 | #' convertCategorical(dataset, trainDataset, testDataset, B = 10)
14 | #'
15 | #' @import caret
16 | #'
17 | #' @noRd
18 | #'
19 | #' @keywords internal
20 |
21 | convertCategorical <- function(dataset, trainDataset, testDataset, B = 10) {
22 | #Convert Factor/String Features into numeric features
23 | dmy <- caret::dummyVars(" ~ .", data = rbind(trainDataset, testDataset)[,names(trainDataset) != "class"])
24 | datasetTmp <- data.frame(predict(dmy, newdata = dataset$TD))
25 | dataset$FULLTD <- data.frame(predict(dmy, newdata = trainDataset))
26 | dataset$TED <- data.frame(predict(dmy, newdata = testDataset))
27 |
28 | datasetTmp$class <- dataset$TD$class
29 | dataset$TD <- datasetTmp
30 | dataset$FULLTD$class <- trainDataset$class
31 | dataset$TED$class <- testDataset$class
32 |
33 | if(nrow(dataset$VD) > 1){
34 | validationSet <- data.frame(predict(dmy, newdata = dataset$VD))
35 | validationSet$class <- dataset$VD$class
36 | dataset$VD <- validationSet
37 | dataset$FD <- createFolds(dataset$TD$class, k = B, list = TRUE, returnTrain = FALSE)
38 | }
39 | return(dataset)
40 | }
41 |
--------------------------------------------------------------------------------
/R/datasetReader.R:
--------------------------------------------------------------------------------
1 | #' @title Read Dataset File into Memory.
2 | #'
3 | #' @description Read the file of the training and testing dataset, and perform preprocessing and data cleaning if necessary.
4 | #'
5 | #' @param directory String of the directory to the file containing the training dataset.
6 | #' @param testDirectory String of the directory to the file containing the testing dataset.
7 | #' @param selectedFeats Vector of numbers of features columns to include from the training set and ignore the rest of columns - In case of empty vector, this means to include all features in the dataset file (default = c()).
8 | #' @param classCol String of the name of the class label column in the dataset (default = 'class').
9 | #' @param preProcessF string containing the name of the preprocessing algorithm (default = 'N' --> no preprocessing):
10 | #' \itemize{
11 | #' \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features,
12 | #' \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a BoxCox, but values can be negative,
13 | #' \item "zv" - remove attributes with a zero variance (all the same value),
14 | #' \item "center" - subtract mean from values,
15 | #' \item "scale" - divide values by standard deviation,
16 | #' \item "standardize" - perform both centering and scaling,
17 | #' \item "normalize" - normalize values,
18 | #' \item "pca" - transform data to the principal components,
19 | #' \item "ica" - transform data to the independent components.
20 | #' }
21 | #' @param featuresToPreProcess Vector of number of features to perform the feature preprocessing on - In case of empty vector, this means to include all features in the dataset file (default = c()) - This vector should be a subset of \code{selectedFeats}.
22 | #' @param nComp Integer of Number of components needed if either "pca" or "ica" feature preprocessors are needed.
23 | #' @param missingVal Vector of strings representing the missing values in dataset (default: c('NA', '?', ' ')).
24 | #' @param missingOpr Boolean variable represents either delete instances with missing values or apply imputation using "MICE" library which helps you imputing missing values with plausible data values that are drawn from a distribution specifically designed for each missing datapoint- (default = 0 --> delete instances).
25 | #'
26 | #' @return List of the TrainingSet \code{Train} and TestingSet \code{Test}.
27 | #'
28 | #' @import RWeka
29 | #' @import farff
30 | #' @import caret
31 | #' @import mice
32 | #' @importFrom utils read.csv
33 | #' @importFrom stats complete.cases
34 | #'
35 | #' @examples
36 | #' \dontrun{
37 | #' dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv')
38 | #' }
39 |
40 | datasetReader <- function(directory, testDirectory, selectedFeats = c(), classCol = 'class',
41 | preProcessF = 'N', featuresToPreProcess = c(), nComp = NA,
42 | missingVal = c('NA', '?', ' '), missingOpr = 0) {
43 | #check if CSV or arff
44 | ext <- substr(directory, nchar(directory)-2, nchar(directory))
45 | #Read CSV file of data
46 | if(ext == 'csv'){
47 | con <- file(directory, "r")
48 | data <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = FALSE)
49 | close(con)
50 | con <- file(testDirectory, "r")
51 | dataTED <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = FALSE)
52 | close(con)
53 | }
54 | else{
55 | data <- readARFF(directory)
56 | dataTED <- readARFF(testDirectory)
57 | }
58 |
59 | #change column name of classes to be "class"
60 | colnames(data)[which(names(data) == classCol)] <- "class"
61 | colnames(dataTED)[which(names(dataTED) == classCol)] <- "class"
62 | cInd <- grep("class", colnames(data)) #index of class column
63 |
64 | #Convert characters representing missing values to NA
65 | m1 <- as.matrix(data)
66 | m1[m1 %in% missingVal] <- NA
67 | m2 <- as.matrix(dataTED)
68 | m2[m2 %in% missingVal] <- NA
69 |
70 | #check either to delete instance with missing values or perform imputation
71 | if (missingOpr == 0){
72 | data <- data[complete.cases(m1), ]
73 | dataTED <- dataTED[complete.cases(m2), ]
74 | }
75 | else{
76 | data <- complete(mice(data, m = 1))
77 | dataTED <- complete(mice(dataTED, m = 1))
78 | }
79 |
80 | #select features only upon user request
81 | if(length(selectedFeats) == 0){
82 | selectedFeats <- c(1:ncol(data))
83 | }
84 | #perform preprocessing
85 | if(preProcessF != 'N'){
86 | if(length(featuresToPreProcess ) == 0)
87 | featuresToPreProcess <- selectedFeats
88 |
89 | featuresToPreProcess <- featuresToPreProcess[!featuresToPreProcess %in% cInd] #remove class column from set of features to be preprocessed
90 | dataTmp <- featurePreProcessing(data[,featuresToPreProcess], dataTED[,featuresToPreProcess], preProcessF, nComp)
91 |
92 | #add other features that don't require feature preprocessing to the features obtained after preprocessing
93 | diffTmp <- setdiff(selectedFeats, c(cInd, featuresToPreProcess))
94 | dataTDTmp <- cbind(dataTmp$TD, data[, diffTmp])
95 | dataTEDTmp <- cbind(dataTmp$TED, dataTED[, diffTmp])
96 | #add class column to the dataframe of the dataset
97 | dataTDTmp$class <- data$class
98 | dataTEDTmp$class <- dataTED$class
99 | data <- dataTDTmp
100 | dataTED <- dataTEDTmp
101 | }
102 | else{
103 | data <- data[, selectedFeats]
104 | dataTED <- dataTED[, selectedFeats]
105 | }
106 | return (list(Train = data, Test = dataTED))
107 | }
108 |
--------------------------------------------------------------------------------
/R/evaluateMet.R:
--------------------------------------------------------------------------------
1 | #' @title Evaluate Fitted Model.
2 | #'
3 | #' @description Evaluate Predictions obtained from a specific model based on true labels, its predictions, and the evaluation metric.
4 | #'
5 | #' @param yTrue Vector of true labels.
6 | #' @param pred Vector of predicted labels.
7 | #' @param metric Metric to be used in evaluation:
8 | #' \itemize{
9 | #' \item "acc" - Accuracy,
10 | #' \item "avg-fscore" - Average of F-Score of each label,
11 | #' \item "avg-recall" - Average of Recall of each label,
12 | #' \item "avg-precision" - Average of Precision of each label,
13 | #' \item "fscore" - Micro-Average of F-Score of each label,
14 | #' \item "recall" - Micro-Average of Recall of each label,
15 | #' \item "precision" - Micro-Average of Precision of each label.
16 | #' }
17 | #'
18 | #' @importFrom caret confusionMatrix
19 | #'
20 | #' @return Float number representing the evaluation.
21 | #'
22 | #' @examples
23 | #' \dontrun{
24 | #' result1 <- evaluateMet(c('a', 'b', 'a', 'a'), c('a', 'b', 'b', 'a'), metric = 'acc')
25 | #' }
26 | #'
27 | #' @noRd
28 | #'
29 | #' @keywords internal
30 | #'
31 | evaluateMet <- function(yTrue, pred, metric = 'acc'){
32 | lvls <- union(pred, yTrue)
33 | cm = as.matrix(table(Actual = factor(yTrue, lvls),
34 | Predicted = factor(pred, lvls)) ) # create the confusion matrix
35 | n = sum(cm) # number of instances
36 | nc = nrow(cm) # number of classes
37 | diag = diag(cm) # number of correctly classified instances per class
38 | rowsums = apply(cm, 1, sum) # number of instances per class
39 | colsums = apply(cm, 2, sum) # number of predictions per class
40 | oneVsAll = lapply(1 : nc,
41 | function(i){
42 | v = c(cm[i,i],
43 | rowsums[i] - cm[i,i],
44 | colsums[i] - cm[i,i],
45 | n-rowsums[i] - colsums[i] + cm[i,i]);
46 | return(matrix(v, nrow = 2, byrow = T))})
47 | s = matrix(0, nrow = 2, ncol = 2)
48 | for(i in 1 : nc){s = s + oneVsAll[[i]]}
49 |
50 | if (metric == 'acc'){
51 | perf <- sum(diag) / n
52 | }
53 | else if(metric == 'avg-precision'){
54 | precision <- diag / colsums
55 | perf <- mean(precision)
56 | }
57 | else if(metric == 'avg-recall'){
58 | recall <- diag / rowsums
59 | perf <- mean(recall)
60 | }
61 | else if(metric == 'avg-fscore'){
62 | precision <- diag / colsums
63 | recall <- diag / rowsums
64 | f1 <- 2 * precision * recall / (precision + recall)
65 | perf <- mean(f1)
66 | }
67 | else{
68 | perf <- (diag(s) / apply(s,1, sum))[1];
69 | }
70 |
71 | return(perf)
72 | }
73 |
--------------------------------------------------------------------------------
/R/evocate.R:
--------------------------------------------------------------------------------
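#' @title Ensemble-oriented variant of autoRLearn_ built on mlr3.
#'
#' @description A brief sketch of what this function does: like
#'   \code{autoRLearn_}, it splits the time budget across the requested models
#'   and tunes each with Hyperband or BOHB; it then stacks up to
#'   \code{ensemble_size} of the best candidates into an mlr3pipelines
#'   ensemble (see \code{ensembling}) and reports its performance on
#'   \code{df_test} using \code{measure}.
#'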
1 | #' @export evocate
2 | evocate <- function(df_train, df_test, maxTime = 1, models = "xgboost",
3 | optimizationAlgorithm = "hyperband", bw = 3, max_iter = 81, kde_type = "single",
4 | problem = "classification", measure = "classif.acc", ensemble_size = 1) {
5 |
6 | total_time <- maxTime * 60
7 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params))
8 | times <- (parameters_per_model * total_time) / (sum(parameters_per_model))
9 |
10 | cat("Models selected:", models, '\n', sep = ' ')
11 | cat("Time distribution:", times, '\n', sep = ' ')
12 |
13 | run_optimization <- function(model, time) {
14 | results <- NULL
15 | priors <- data.frame()
16 | tic(model, "optimization time:")
17 |
18 | if(optimizationAlgorithm == 'hyperband') {
19 | current <- Sys.time() %>% as.integer()
20 | end <- (Sys.time() %>% as.integer()) + time
21 |
22 | repeat {
23 | gc(verbose = F)
24 | tic('current hyperband runtime')
25 | print(paste('Started', model, ' model...'))
26 | # Compute the time left for this model
27 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
28 | print(paste("There are:", time_left, "seconds left for this hyperband run"))
29 | res <- hyperband(df = df_train, model = model, max_iter = max_iter,
30 | maxtime = time_left, problem = problem, measure = measure)
31 |
32 | if(is_empty(flatten(res)) == F) {
33 | res <- res %>%
34 | map_dfr(.f = ~ .x[["answer"]]) %>%
35 | arrange(desc(acc)) %>%
36 | head(1)
37 | results <- c(list(res), results)
38 | print(paste('Best performance from hyperband this round: ', res$acc))
39 | }
40 | # Break if the elapsed time exceeds the allowed time budget
41 | elapsed <- (Sys.time() %>% as.integer()) - current
42 | if(elapsed >= time) {
43 | break
44 | }
45 | }
46 | }
47 | else if(optimizationAlgorithm == "bohb") {
48 | current <- Sys.time() %>% as.integer()
49 | end <- (Sys.time() %>% as.integer()) + time
50 | repeat {
51 | gc(verbose = F)
52 | tic("current bohb time")
53 | print(paste("started", model))
54 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
55 | print(paste("There are:", time_left, "seconds left for this bohb run"))
56 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter,
57 | maxtime = time_left, priors = priors, kde_type = kde_type)
58 |
59 | if(is_empty(flatten(res)) == F) {
60 | priors <- res %>%
61 | map_dfr(.f = ~ .x[["sh_runs"]])
62 | res <- res %>%
63 | map_dfr(.f = ~ .x[["answer"]]) %>%
64 | arrange(desc(acc)) %>%
65 | head(1)
66 |
67 | results <- c(list(res), results)
68 | print(paste('Best accuracy from bohb this round: ', res$acc))
69 | }
70 |
71 | elapsed <- (Sys.time() %>% as.integer()) - current
72 | if(elapsed >= time) {
73 | break
74 | }
75 | }
76 | }
77 | else {
78 | stop("Only hyperband and bohb are valid optimization algorithms at this moment.")
80 | }
81 | toc()
82 | results
83 | }
84 |
85 | print("Starting to run all optimizations.")
86 | ans <- vector(mode = "list", length = length(models))
87 |
88 | for(i in 1:length(models)) {
89 | flag <- TRUE
90 | tryCatch({
91 | ans[[i]] <- run_optimization(models[[i]], times[[i]])
92 | }, error = function(e) {
93 | cat('Error spotted: ')
94 | message(e)
95 | cat(' In ', models[[i]], ' model, going to the next model!\n')
96 | flag <<- FALSE
97 | })
98 | if (!flag) next
99 | }
100 |
101 | # Arrange Results according to the best performance
102 | ensemble_size <- min(max(1, length(ans[[1]])), ensemble_size)
103 | print(ensemble_size)
104 | tryCatch({best_model <- ans %>%
105 | map(.f = ~ map_dfr(.x = .x, .f = ~ .x %>% select(model, acc))) %>%
106 | map_dfr(.f = ~ .x %>% arrange(desc(acc)) %>% head(ensemble_size)) %>%
107 | arrange(desc(acc))
108 | print('----------------------####------------------------')
109 | # Return the best performing model
110 | results <- ensembling(best_model, df_train, df_test, problem = problem, measure = measure)
111 | return (results)
112 | }, error = function(e){
113 | cat('Error spotted: ')
114 | message(e)
115 | cat('Try increasing the time budget or use a different model.\n')
116 | return (-1)
117 | })
118 |
119 | }
120 |
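121 | # A minimal usage sketch (not run; a hypothetical illustration). It assumes `train` and
122 | # `test` are data frames whose target column is named "class", as required by the task
123 | # construction in ensembling(), and that the requested model key exists in the internal
124 | # `jsons` parameter database:
125 | #   res <- evocate(df_train = train, df_test = test, maxTime = 2,
126 | #                  models = "xgboost", optimizationAlgorithm = "hyperband")
127 | #   res$model        # fitted GraphLearner ensemble
128 | #   res$performance  # score on df_test for the chosen `measure`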
--------------------------------------------------------------------------------
/R/evocate_utilities.R:
--------------------------------------------------------------------------------
1 | #' @import nloptr
2 | #' @import bbotk
3 |
4 | #' @keywords internal
5 |
6 | ensembling = function(best_models, df_train, df_test,
7 | problem = 'classification', measure = 'classif.acc'){
8 | lrns = c()
9 | for(i in 1:nrow(best_models)){
10 | lrns = c(lrns, po('learner_cv', best_models[[1]][[i]],
11 | id = paste('lrn', as.character(i), sep='') ))
12 | }
13 |
14 | level0 = gunion(list(
15 | lrns)) %>>%
16 | po("featureunion", id = "union1")
17 |
18 | if(problem == 'classification'){
19 | problem = 'classif'
20 | ensemble = level0 %>>% LearnerClassifAvg$new(id = "classif.avg")
21 | task = TaskClassif$new(id = 'final_eval', backend = df_train, target = 'class')
22 | }
23 | else{
24 | problem = 'regr'
25 | ensemble = level0 %>>% LearnerRegrAvg$new(id = "regr.avg")
26 | task = TaskRegr$new(id = 'final_eval', backend = df_train, target = 'class')
27 | }
28 |
29 | ens_lrn = GraphLearner$new(ensemble)
30 | if (problem == 'classif') ens_lrn$predict_type = "prob"
31 | ens_lrn$train(task)
32 | perf <- ens_lrn$predict_newdata(df_test)$score(msr(measure))
33 | return (list("model" = ens_lrn, "performance" = perf))
34 | }
35 |
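36 | # Note on the expected input: `best_models` is the table assembled in evocate(), whose
37 | # first column (`model`) holds already-constructed mlr3 learner objects and whose `acc`
38 | # column holds their scores. Each learner is wrapped in po('learner_cv') and stacked
39 | # under a classif.avg / regr.avg combiner before being refit on df_train.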
--------------------------------------------------------------------------------
/R/featurePreProcessing.R:
--------------------------------------------------------------------------------
1 | #' @title Perform Feature Preprocessing if specified by user.
2 | #'
3 | #' @description Perform a preprocessing algorithm on the dataset and return the preprocessed one.
4 | #'
5 | #' @param data Data frame containing the dataset to process.
6 | #' @param dataTED Data frame containing the test dataset to process.
7 | #' @param preProcessF string containing the name of the preprocessing algorithm:
8 | #' "boxcox": apply a Box–Cox transform and values must be non-zero and positive in all features,
9 | #' "yeo-Johnson": apply a Yeo-Johnson transform, like a BoxCox, but values can be negative,
10 | #' "zv": remove attributes with a zero variance (all the same value),
11 | #' "center": subtract mean from values,
12 | #' "scale": divide values by standard deviation,
13 | #' "standardize": perform both centering and scaling,
14 | #' "normalize": normalize values,
15 | #' "pca": transform data to the principal components,
16 | #' "ica": transform data to the independent components.
17 | #' @param nComp Integer of Number of components needed if either "pca" or "ica" feature preprocessors are needed.
18 | #'
19 | #' @return List of two Dataframes of the preprocessed training and testing datasets.
20 | #'
21 | #' @examples featurePreProcessing(\code{data}, \code{dataTED}, "center", 0).
22 | #'
23 | #' @noRd
24 | #'
25 | #' @keywords internal
26 |
27 | featurePreProcessing <- function(data, dataTED, preProcessF, nComp) {
28 |
29 | if(preProcessF == 'scale'){
30 | preprocessParams <- preProcess(data, method=c("scale"))
31 | }
32 | else if(preProcessF == 'center'){
33 | preprocessParams <- preProcess(data, method=c("center"))
34 | }
35 | else if(preProcessF == 'standardize'){
36 | preprocessParams <- preProcess(data, method=c("center", "scale"))
37 | }
38 | else if(preProcessF == 'normalize'){
39 | preprocessParams <- preProcess(data, method=c("range"))
40 | }
41 | else if(preProcessF == 'pca'){
42 | if (is.na(nComp))
43 | preprocessParams <- preProcess(data, method=c("pca"))
44 | else
45 | preprocessParams <- preProcess(data, method=c("center", "scale", "pca"), pcaComp = nComp)
46 | }
47 | else if(preProcessF == 'ica'){
48 | preprocessParams <- preProcess(data, method=c("center", "scale", "ica"), n.comp=nComp)
49 | }
50 | else if(preProcessF == 'yeo-Johnson'){
51 | preprocessParams <- preProcess(data, method=c("YeoJohnson"))
52 | }
53 | else if(preProcessF == 'boxcox'){
54 | preprocessParams <- preProcess(data, method=c("BoxCox"))
55 | }
56 | else if(preProcessF == 'zv'){
57 | preprocessParams <- preProcess(data, method=c("zv"))
58 | }
59 | else{
60 | print('Error: Unknown preprocessing algorithm... skipping feature preprocessing!')
61 | return(list(TD = data, TED = dataTED))
62 | }
63 | data <- predict(preprocessParams, data)
64 | dataTED <- predict(preprocessParams, dataTED)
65 | return(list(TD = data, TED = dataTED))
66 | }
67 |
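68 | # A minimal usage sketch (not run; `trainDF`/`testDF` are placeholder data frames).
69 | # `preProcess` comes from the caret package, which readDataset() imports:
70 | #   prep <- featurePreProcessing(trainDF, testDF, preProcessF = "standardize", nComp = NA)
71 | #   trainDF <- prep$TD
72 | #   testDF  <- prep$TED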
--------------------------------------------------------------------------------
/R/fitModel.R:
--------------------------------------------------------------------------------
1 | #' @title Fit SMAC Model.
2 | #'
3 | #' @description Fit the trees of the SMAC forest model by adding new nodes to each of the forest trees.
4 | #'
5 | #' @param params A string of parameter configuration values for the current classifier to be tuned (parameters are separated by #).
6 | #' @param bestPerf Vector of performance values of the best parameter configuration on the folds of the SMAC model.
7 | #' @param trainingSet Dataframe of the training set.
8 | #' @param validationSet Dataframe of the validation Set.
9 | #' @param foldedSet List of the folds of the dataset in each tree of the SMAC forest.
10 | #' @param classifierAlgorithm String of the name of classifier algorithm used now.
11 | #' @param tree List of data frames, representing the data structure for the forest of trees of the SMAC model.
12 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10).
13 | #' @param metric Metric to be used in evaluation:
14 | #' \itemize{
15 | #' \item "acc" - Accuracy,
16 | #' \item "avg-fscore" - Average of F-Score of each label,
17 | #' \item "avg-recall" - Average of Recall of each label,
18 | #' \item "avg-precision" - Average of Precision of each label,
19 | #' \item "fscore" - Micro-Average of F-Score of each label,
20 | #' \item "recall" - Micro-Average of Recall of each label,
21 | #' \item "precision" - Micro-Average of Precision of each label.
22 | #' }
23 | #'
24 | #' @return List of: \code{t} trees of fitted SMAC Model - \code{p} performance of current parameter configuration on whole dataset - \code{bp} Current added parameter configuration.
25 | #'
26 | #' @examples fitModel('1', c(0.91, 0.89), data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), data.frame(salary = c(400, 800), class = c (0, 1)), list(c(1,2,4), c(3,5)), 'knn', data.frame(fold = c(), parent = c(), params = c(), leftChild = c(), rightChild = c(), performance = c(), rowN = c()), 2).
27 | #'
28 | #' @noRd
29 | #'
30 | #' @keywords internal
31 |
32 | fitModel <- function(params, bestPerf, trainingSet, validationSet, foldedSet, classifierAlgorithm, tree, B = 10, metric = 'acc') {
33 | #fit SMAC model using the current best parameters
34 | #get current best parameters
35 | cntParams <- params
36 | cntParamStr <- paste( unlist(cntParams), collapse='#')
37 | #initiate a variable to store its performance on each decision tree of the forest
38 | perf <- c()
39 | for(i in 1:B){
40 | cntNode <- tree[tree$fold==i & is.na(tree$parent), ]
41 | #Get position to add the new node
42 | cParent <- NA
43 | cChild <- NA
44 | if(nrow(cntNode) > 0){
45 | cParent <- cntNode$rowN
46 | while(!is.na(cntNode[[1]])){
47 | cParent <- cntNode$rowN
48 | if(cntParamStr > as.character(cntNode$params)){
49 | cntNode <- tree[as.integer(cntNode$rightChild), ]
50 | cChild <- 5 #pointer position to right node
51 | }
52 | else if(cntParamStr < as.character(cntNode$params)){
53 | cntNode <- tree[as.integer(cntNode$leftChild), ]
54 | cChild <- 4 #pointer position to left node
55 | }
56 | else{
57 | return(list(bp = params, t = tree, p=bestPerf))
58 | }
59 | }
60 | }
61 |
62 | if(length(bestPerf) >= i)
63 | perf <- bestPerf
64 | else
65 | perf <- c(perf, (runClassifier(trainingSet[foldedSet[[i]], ], validationSet, cntParams, classifierAlgorithm, metric = metric))$perf)
66 |
67 | #row number of new node to be added
68 | newRowN <- nrow(tree) + 1
69 | #Update parent's child
70 | if(!is.na(cChild))
71 | tree[cParent, cChild] <- newRowN
72 | #Add new node with current configuration
73 | df <- data.frame(fold = i, parent = cParent, params = cntParamStr, leftChild = NA, rightChild = NA, performance = perf[i], rowN = newRowN)
74 | tree <- rbind(tree, df)
75 | }
76 |
77 | cntParams$performance <- mean(perf)
78 | return(list(t = tree, p=perf, bp=cntParams))
79 | }
80 |
--------------------------------------------------------------------------------
/R/getCandidateClassifiers.R:
--------------------------------------------------------------------------------
1 | #' @title Get candidate Good Classifier Algorithms.
2 | #'
3 | #' @description Compare the dataset meta-features with the knowledge base to recommend good classifier algorithms, based on nearest-neighbour datasets with outperforming pipelines.
4 | #'
5 | #' @param maxTime Float of the maximum time budget allowed.
6 | #' @param metaFeatures List of the meta-features collected from the dataset.
7 | #' @param nModels Integer of the required number of classifier algorithm recommendations to return.
8 | #'
9 | #' @return List of recommended classifier algorithms, their initial parameter configurations, and time ratio to be spent in tuning each classifier.
10 | #'
11 | #' @examples getCandidateClassifiers(10, \code{metaFeatures}, 3)
12 | #'
13 | #' @importFrom BBmisc normalize
14 | #' @importFrom RMySQL MySQL fetch dbDisconnect dbSendQuery dbConnect
15 | #' @importFrom httr POST content
16 | #' @importFrom stats setNames
17 | #'
18 | #' @noRd
19 | #'
20 | #' @keywords internal
21 |
22 | getCandidateClassifiers <- function(maxTime, metaFeatures, nModels) {
23 | classifiers <- c('randomForest', 'c50', 'j48', 'svm', 'naiveBayes','knn', 'bagging', 'rda', 'neuralnet', 'plsda', 'part', 'deepboost', 'rpart', 'lda', 'lmt')
24 | classifiersWt <- c(10, 20, 11, 21, 10, 5, 25, 5, 5, 6, 11, 21, 6, 5, 10) #weight of each classifier to tune based on number and types of parameters
25 |
26 | #Chosen classifiers parameters initialization
27 | params <- c()
28 | cclassifiers <- c() #chosen classifiers
29 | ratio <- c() #time ratios for each classifier
30 | KBFlag <- FALSE
31 | for(trial in 1:3){ #TRY to connect to knowledge base
32 | readKnowledgeBase <- try(
33 | {
34 | metaData <- content(POST("https://jncvt2k156.execute-api.eu-west-1.amazonaws.com/default/callKnowledgeBase"))
35 | KBFlag <- TRUE
36 | metaDataFeatures <- data.frame(matrix(unlist(metaData, recursive = FALSE), nrow = length(metaData), byrow = T))
37 | colnames(metaDataFeatures) <- c('datasetRatio', 'featuresKurtStdDev', 'featuresKurtMean', 'featuresKurtMax', 'featuresKurtMin', 'featuresSkewStdDev', 'featuresSkewMean', 'featuresSkewMax', 'featuresSkewMin', 'symbolsStdDev', 'symbolsSum', 'symbolsMean', 'classProbStdDev', 'classProbMean', 'classProbMax', 'classProbMin', 'classEntropy', 'ratioNumToCat', 'nCatFeatures', 'nNumFeatures', 'nInstances', 'nFeatures', 'nClasses', 'lognFeatures', 'lognInstances', 'classifierAlgorithm', 'parameters', 'maxTime', 'metric', 'performance')
38 |
39 | #Remove useless columns for now
40 | metaDataFeatures$performance <- NULL
41 | metaDataFeatures$metric <- NULL
42 | metaDataFeatures$ipInserted <- NULL
43 | metaDataFeatures$maxTime <- NULL
44 | metaDataFeatures$dateInserted <- NULL
45 | metaDataFeatures$ID <- NULL
46 | metaFeatures$maxTime <- NULL
47 |
48 | #Separate Best Classifier Algorithms and Their Parameters
49 | bestClf <- metaDataFeatures$classifierAlgorithm
50 | nClasses <- metaDataFeatures$nClasses
51 | bestClfParams <- metaDataFeatures$parameters
52 | metaDataFeatures$classifierAlgorithm <- NULL
53 | metaDataFeatures$parameters <- NULL
54 |
55 | #Append new dataset meta features to the metaDataFeatures
56 | metaDataFeatures <- rbind(metaDataFeatures, metaFeatures)
57 |
58 | #Normalize the distance matrix
59 | metaDataFeatures[] <- suppressWarnings(lapply(metaDataFeatures, function(x) as.numeric(as.character(x))))
60 | metaDataFeatures <- normalize(metaDataFeatures, method = "standardize", range = c(0, 1), margin = 1L, on.constant = "quiet")
61 |
62 | #Construct the distance list to extract the nearest neighbors
63 | cntMeta <- nrow(metaDataFeatures)
64 | distMat <- data.frame()
65 | distMat[['dist']] <- as.numeric()
66 | distMat[['index']] <- as.numeric()
67 |
68 | for(i in 1:(nrow(metaDataFeatures)-1)){
69 | dist <- 0
70 | for(j in 1:ncol(metaDataFeatures)){
71 | if(is.na(metaDataFeatures[i,j]) == TRUE && is.na(metaDataFeatures[cntMeta,j]) == TRUE)
72 | dist <- dist + 0
73 |
74 | else if ( (is.na(metaDataFeatures[i,j]) == TRUE && is.na(metaDataFeatures[cntMeta,j]) == FALSE) || (is.na(metaDataFeatures[i,j]) == FALSE && is.na(metaDataFeatures[cntMeta,j]) == TRUE) )
75 | dist <- dist + 0.5
76 |
77 | else
78 | dist <- dist + (suppressWarnings(as.numeric(metaDataFeatures[i,j])) - suppressWarnings(as.numeric(metaDataFeatures[cntMeta, j])) )^2
79 |
80 | }
81 | tmpDist <- list(dist = dist, index = i)
82 | distMat <- rbind(distMat, tmpDist)
83 | }
84 | #Sort Dataframe
85 | orderInd <- order(distMat$dist)
86 | distMat <- distMat[orderInd, ]
87 |
88 | #Get best classifiers with their parameters
89 | for(i in 1:nrow(distMat)){
90 | ind <- distMat[i,]$index
91 | clf <- bestClf[ind]
92 | if(is.element(clf, cclassifiers) == FALSE){
93 | #Exception for deep Boost requires binary classes dataset
94 | if((clf == 'deepboost' && nClasses > 2)||clf == 'fda')
95 | next
96 | cclassifiers <- c(cclassifiers, clf)
97 | params <- c(params, bestClfParams[ind])
98 |
99 | clfInd = which(classifiers == clf)
100 | ratio <- c(ratio, classifiersWt[clfInd])
101 | }
102 | if(length(cclassifiers) == nModels)
103 | break
104 | }
105 | })
106 | if(inherits(readKnowledgeBase, "try-error")){
107 | KBFlag <- FALSE
108 | print('Warning: Cannot connect to the knowledge base! Check your internet connectivity. Trying again.')
109 | next
110 | }
111 |
112 | if(KBFlag == TRUE) #managed to get information from knowledge base
113 | break
114 | }
115 |
116 | if(KBFlag == FALSE)
117 | print('Random classifiers will be used. Use a larger time budget and nModels for better results.')
118 | #Assign time ratio for each classifier
119 | if (length(cclassifiers) < nModels){ #failed to make use of meta-learning --> tune over all classifiers
120 | #cclassifiers <- classifiers
121 | for (clf in classifiers){
122 | if(is.element(clf, cclassifiers) == TRUE) #already inserted this classifier
123 | next
124 | ind = which(classifiers == clf)
125 | ratio <- c(ratio, classifiersWt[ind])
126 | cclassifiers <- c(cclassifiers, clf)
127 | params <- c(params, '')
128 | if(length(cclassifiers) == nModels) #completed number of required models
129 | break
130 | }
131 | }
132 | ratio <- ratio / sum(ratio) * (maxTime * 0.9) #Only using 90% of the allowed time budget
133 |
134 | return (list(c = cclassifiers, r = ratio, p = params))
135 | }
136 |
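137 | # Distance rule used above for each standardized meta-feature j of the new dataset x and
138 | # a knowledge-base dataset y: both NA contributes 0, exactly one NA contributes 0.5, and
139 | # otherwise (x_j - y_j)^2 is added. Neighbours are then ranked by the summed distance and
140 | # the best classifier of each nearest neighbour is recommended until nModels are collected.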
--------------------------------------------------------------------------------
/R/hb_utilities.R:
--------------------------------------------------------------------------------
1 | #' @importFrom data.table fcase
2 | #' @import purrr
3 |
4 | #' @keywords internal
5 |
6 | param_sample <- function(model, hparam, columns = NULL) {
7 | param = jsons[[model]][[hparam]]
8 | type <- param$type
9 | type_scale <- param$scale
10 |
11 | if(type == "boolean") {
12 | param_estimation <- paste(base::sample(x = as.list(param$values), size = 1), sep = "")
13 | param_estimation <- ifelse(param_estimation == "FALSE", FALSE, TRUE)
14 | return(param_estimation)
15 | }
16 | else if(type == "discrete") {
17 | param_estimation <- paste(base::sample(x = as.list(param$values), size = 1), sep = "")
18 | return(param_estimation)
19 | }
20 |
21 | else {
22 | int_val <- ifelse(hparam == "mtry", as.numeric(columns) - 1, as.numeric(param$maxVal))
23 | param_estimation <- fcase(type_scale == "int", rdunif(1, a = as.numeric(param$minVal),
24 | b = int_val),
25 | type_scale == "any", runif(1, min = as.numeric(param$minVal),
26 | max = as.numeric(param$maxVal)),
27 | type_scale == "double", runif(1, min = as.numeric(param$minVal),
28 | max = as.numeric(param$maxVal)),
29 | type_scale == "exp", 2^rdunif(1, a = as.numeric(param$minVal),
30 | b = as.numeric(param$maxVal)))
31 | return(as.numeric(param_estimation))
32 | }
33 |
34 | }
35 |
36 | #' @keywords internal
37 | get_random_hp_config <- function(model, columns = NULL) {
38 | param_db <- jsons[[model]]
39 | params_list <- param_db$params
40 | params_list_mapped <- map(.x = params_list,
41 | .f = as_mapper( ~ param_sample(model = model,
42 | hparam = .x,
43 | columns = columns)))
44 | `names<-`(params_list_mapped, params_list)
45 | }
46 |
47 | #' @keywords internal
48 | calc_n_r = function(max_iter = 81, eta = 3, s = 4, B = 405) {
49 | n = trunc(ceiling(trunc(B/max_iter/(s+1)) * eta**s))
50 | r = max_iter * eta^(-s)
51 | ans = c(n, r)
52 | ans
53 | }
54 |
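55 | # Worked example with the defaults (max_iter = 81, eta = 3, s = 4, B = 405):
56 | #   n = trunc(ceiling(trunc(405/81/5) * 3^4)) = trunc(ceiling(1 * 81)) = 81
57 | #   r = 81 * 3^-4 = 1
58 | # i.e. the most exploratory bracket starts 81 configurations at the minimum budget of 1.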
--------------------------------------------------------------------------------
/R/hyperband.R:
--------------------------------------------------------------------------------
1 | #' @keywords internal hyperband
2 | hyperband <- function(df, model, max_iter = 81, eta = 3, maxtime = 1000,
3 | problem = 'classification', measure = 'classif.acc') {
4 | logeta = as_mapper(~ log(.x) / log(eta))
5 | s_max = trunc(logeta(max_iter))
6 | B = (s_max + 1) * max_iter
7 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>%
8 | t() %>%
9 | `colnames<-`(value = c("n", "r")) %>%
10 | as.data.table()
11 | nrs$s = s_max:0
12 | partial_halving <- function(n, r, s) {
13 | successive_halving(df = df, model = model,
14 | params_config = replicate(n, get_random_hp_config(model, columns = ncol(df) - 1),
15 | simplify = FALSE),
16 | n = n, r = r, s_max = s, max_iter = max_iter, eta = eta,
17 | problem = problem, measure = measure)
18 | }
19 |
20 | liszt = vector(mode = "list", length = max(nrs$s) + 1)
21 | if (model != 'ranger'){
22 | tryCatch({tmp <- withTimeout({
23 | for (row in 1:nrow(nrs)) {
24 | liszt[[row]] <- partial_halving(nrs[[row, 1]],
25 | nrs[[row, 2]],
26 | nrs[[row, 3]])
27 | print("Looped once")
28 | }
29 | }, timeout = maxtime, elapsed = maxtime)
30 | }, TimeoutException = function(ex) {
31 | err <- geterrmessage()
32 | if (startsWith(err, 'reached') == FALSE)
33 | print(paste('Error Found, ', err, ' Replace ', model, sep = ''))
34 | else
35 | print("Time Budget ended.")
36 | },
37 | finally = {
38 | print("Hyperband run finished.")
39 | })
40 | }
41 | else{
42 | current <- Sys.time() %>% as.integer()
43 | for (row in 1:nrow(nrs)) {
44 | tryCatch({liszt[[row]] <- partial_halving(nrs[[row, 1]],
45 | nrs[[row, 2]],
46 | nrs[[row, 3]])
47 | }, error = function(ex) {
48 | err <- geterrmessage()
49 | print(paste('Error Found, ', err, ' Replace ', model, sep = ''))
50 | })
51 | now <- Sys.time() %>% as.integer()
52 | if ((now - current) > maxtime){
53 | print("Time Budget ended.")
54 | break
55 | }
56 | print("Looped once")
57 | }
58 | }
59 | return(liszt)
60 | }
61 |
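62 | # With the defaults (max_iter = 81, eta = 3) this gives s_max = 4, B = 405 and the usual
63 | # Hyperband brackets, as (number of configurations, initial budget) pairs from calc_n_r():
64 | #   s = 4: (81, 1)   s = 3: (27, 3)   s = 2: (9, 9)   s = 1: (6, 27)   s = 0: (5, 81)
65 | # Each bracket is then run through successive_halving() via partial_halving() above.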
--------------------------------------------------------------------------------
/R/initialize.R:
--------------------------------------------------------------------------------
1 | #' @title Initialize the SMAC model.
2 | #'
3 | #' @description Initialize the SMAC model with the classifier default parameter configuration.
4 | #'
5 | #' @param classifierName String of the classifier algorithm name.
6 | #' @param result List of the converted classifier json parameter configuration into set of vectors and lists.
7 | #' @param initParams String of the initial parameter configuration of \code{classifierName} to start the model with.
8 | #'
9 | #' @return Dataframe of one row holding the initial parameter configuration, plus \code{performance} and \code{EI} columns.
10 | #'
11 | #' @examples
12 | #'
13 | #' @noRd
14 | #'
15 | #' @keywords internal
16 |
17 | initialize <- function(classifierName, result, initParams) {
18 | #get list of Classifier Parameters
19 | params <- result$params
20 | #get list of GrandParent parametes
21 | gparams <- result$parents
22 | #Create dataFrame for classifier default parameters
23 | defaultParams <- data.frame(matrix(ncol = length(params)+1, nrow = 1))
24 | colnames(defaultParams) <- c(params, 'performance')
25 | i <- 1
26 | while(i <= length(gparams)){
27 | parI <- gparams[i]
28 | defaultParams[[parI]] <- result[[parI]]$'default'
29 | require <- result[[parI]]$'requires'[[result[[parI]]$'default']]$'require'
30 | gparams <- c(gparams, require)
31 | i <- i + 1
32 | }
33 |
34 | if ( initParams != ""){
35 | initParams <- unlist(strsplit(initParams, "#"))
36 | j <- 1
37 | for(i in colnames(defaultParams)){
38 | if(i == 'performance' || i == 'nodesize')
39 | next
40 | if(initParams[j] == 'NA')
41 | defaultParams[[i]] <- NA
42 | else
43 | defaultParams[[i]] <- initParams[j]
44 |
45 | j <- j + 1
46 | }
47 | }
48 | defaultParams[["EI"]] <- NA
49 | return (defaultParams)
50 | }
51 |
--------------------------------------------------------------------------------
/R/intensify.R:
--------------------------------------------------------------------------------
1 | #' @title Intensify of SMAC model
2 | #'
3 | #' @description Checking if current candidate parameter configuration is better than the current best parameter configuration chosen till now or not.
4 | #'
5 | #' @param R Dataframe of tried out candidate parameter configurations.
6 | #' @param bestParams String of best parameter configuration found till now.
7 | #' @param bestPerf Vector of performance of classifier on all folds of dataset.
8 | #' @param candidateConfs Vector of strings of candidate parameter configurations.
9 | #' @param trainingSet Dataframe of the training set.
10 | #' @param validationSet Dataframe of the validation Set.
11 | #' @param foldedSet List of the folds of the dataset in each tree of the SMAC forest.
12 | #' @param classifierAlgorithm String value of the classifier Name.
13 | #' @param maxTime Float of maximum time budget allowed.
14 | #' @param timeTillNow Float of the time spent till now.
15 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10).
16 | #' @param metric Metric to be used in evaluation:
17 | #' \itemize{
18 | #' \item "acc" - Accuracy,
19 | #' \item "avg-fscore" - Average of F-Score of each label,
20 | #' \item "avg-recall" - Average of Recall of each label,
21 | #' \item "avg-precision" - Average of Precision of each label,
22 | #' \item "fscore" - Micro-Average of F-Score of each label,
23 | #' \item "recall" - Micro-Average of Recall of each label,
24 | #' \item "precision" - Micro-Average of Precision of each label.
25 | #' }
26 | #'
27 | #' @return List of current best parameter configuration, its performance, dataframe of tried out candidate parameter configurations, and time till now.
28 | #'
29 | #' @examples intensify(c('1'), '1', c(0.89, 0.91), list(c(1,2,4), c(3,5)), data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)), data.frame(salary = c(400, 800), class = c (0, 1)), 'knn', 100, 5, 2)
30 | #'
31 | #' @noRd
32 | #'
33 | #' @keywords internal
34 |
35 | intensify <- function(R, bestParams, bestPerf, candidateConfs, foldedSet, trainingSet, validationSet, classifierAlgorithm, maxTime, timeTillNow , B = 10, metric = metric) {
36 | for(j in 1:nrow(candidateConfs)){
37 | cntParams <- candidateConfs[j,]
38 | cntPerf <- c()
39 | folds <- sample(1:B)
40 | pointer <- 1
41 | timeFlag <- FALSE
42 | N <- 1
43 | #number of folds with higher performance for candidate configuration
44 | forMe <- 0
45 | #number of folds with lower performance for candidate configuration
46 | againstMe <- 0
47 | fails <- 0
48 | while(pointer < B){
49 | for(i in pointer:min(pointer+N-1, B)){
50 | tmpPerf <- runClassifier(trainingSet[foldedSet[[i]], ], validationSet, cntParams, classifierAlgorithm, metric = metric)
51 | if(tmpPerf$perf == 0){
52 | fails <- fails + 1
53 | }
54 | cntPerf <- c(cntPerf, tmpPerf$perf)
55 | if(i > length(bestPerf)){
56 | tmpPerf <- runClassifier(trainingSet[foldedSet[[i]], ], validationSet, bestParams, classifierAlgorithm, metric = metric)
57 | bestPerf <- c(bestPerf, tmpPerf$perf) }
58 | if(cntPerf[i] >= bestPerf[i])forMe <- forMe + 1
59 | else againstMe <- againstMe + 1
60 |
61 | #Check time consumed till now
62 | t <- toc(quiet = TRUE)
63 | timeTillNow <- timeTillNow + t$toc - t$tic
64 | tic(quiet = TRUE)
65 | if(timeTillNow > maxTime || fails > 2){
66 | timeFlag <- TRUE
67 | break
68 | }
69 | }
70 | if(forMe < againstMe || timeFlag == TRUE) break
71 | pointer <- pointer + N
72 | N <- N * 2
73 | }
74 | #make the current candidate as the best candidate
75 | if(timeFlag == FALSE && forMe > againstMe){
76 | bestParams <- cntParams
77 | bestPerf <- cntPerf
78 | }
79 | cntParams$performance <- mean(cntPerf)
80 | bestParams$performance <- mean(bestPerf)
81 | R <- rbind(R, cntParams)
82 | }
83 | return(list(params = bestParams, perf = bestPerf, r = R, timeTillNow = timeTillNow, fails = fails))
84 | }
85 |
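86 | # Intensification sketch: each candidate races the incumbent fold by fold, starting with
87 | # N = 1 fold and doubling N after every surviving round. The race stops early when the
88 | # candidate falls behind (forMe < againstMe), the time budget is exhausted, or the
89 | # classifier fails more than twice; the candidate replaces the incumbent only if it
90 | # finishes ahead within the budget.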
--------------------------------------------------------------------------------
/R/intrepretability.R:
--------------------------------------------------------------------------------
1 | #' @title Perform Interpretability on Model.
2 | #'
3 | #' @description Perform model interpretability analysis on the selected model by obtaining two plots: feature importance and feature interaction.
4 | #'
5 | #' @param model Fitted Model of any of the chosen classifiers and fitted on the training set.
6 | #' @param x Dataframe of the training set.
7 | #'
8 | #' @return List of two plots of feature importance and feature interaction.
9 | #'
10 | #' @examples interpret(\code{model}, data.frame(salary = c(623, 515, 611, 729, 843), class = c (0, 0, 0, 1, 1)))
11 | #'
12 | #' @importFrom iml FeatureImp Interaction Predictor
13 | #'
14 | #' @noRd
15 | #'
16 | #' @keywords internal
17 |
18 | Loss <- function(actual, predicted){
19 | err <- 0
20 | for(i in 1:length(actual)){
21 | act <- as.character(actual[i])
22 | pred <- substring(as.character(predicted[i]), 2)
23 | if (act != pred)
24 | err <- err + 1
25 | }
26 | return(err/length(actual))
27 | }
28 |
29 | interpret <- function(model, x){
30 | clas = as.factor(x$class)
31 | X = x[which(names(x) != "class")]
32 | X[] <- lapply(X, function(x) {
33 | as.double(as.character(x))
34 | })
35 | predictor = Predictor$new(model, data = X, y = as.factor(clas))
36 | out <- list()
37 | out$featImp <- FeatureImp$new(predictor, loss = Loss)
38 | out$interact = Interaction$new(predictor)
39 | return(out)
40 | }
41 |
--------------------------------------------------------------------------------
/R/outClassifierConf.R:
--------------------------------------------------------------------------------
1 | #' @title Output Classifier Parameter Configuration.
2 | #'
3 | #' @description Get the classifier parameter configuration in a human readable format.
4 | #'
5 | #' @param classifierName String of the name of classifier algorithm used now.
6 | #' @param result List of the converted classifier json parameter configuration into set of vectors and lists.
7 | #' @param initParams String of parameters of \code{classifierName} separated by #.
8 | #'
9 | #' @return String of the human readable output in HTML format.
10 | #'
11 | #' @examples outClassifierConf('knn', list(params = c('k'), parents = c('k'), k = list(default = '7', require = c())), '1')
12 | #'
13 | #' @noRd
14 | #'
15 | #' @keywords internal
16 |
17 | outClassifierConf <- function(classifierName, result, initParams) {
18 | #get list of Classifier Parameters names
19 | params <- result$params
20 | #get list of GrandParent parameters
21 | gparams <- result$parents
22 | #Create dataFrame for classifier default parameters
23 | defaultParams <- data.frame(matrix(ncol = length(params), nrow = 1))
24 | colnames(defaultParams) <- c(params)
25 |
26 | i <- 1
27 | while(i <= length(gparams)){
28 | parI <- gparams[i]
29 | defaultParams[[parI]] <- result[[parI]]$'default'
30 | require <- result[[parI]]$'requires'[[result[[parI]]$'default']]$'require'
31 | gparams <- c(gparams, require)
32 | i <- i + 1
33 | }
34 |
35 | return(initParams)
36 | }
37 |
--------------------------------------------------------------------------------
/R/readDataset.R:
--------------------------------------------------------------------------------
1 | #' @title Read Dataset File into Memory.
2 | #'
3 | #' @description Read the file of the dataset, and split it into training and validation sets.
4 | #'
5 | #' @param directory String of the directory to the file containing the training dataset.
6 | #' @param testDirectory String of the directory to the file containing the testing dataset.
7 | #' @param vRatio The validation split ratio of the dataset (default = 0.3 --> 30% validation, 70% training).
8 | #' @param classCol String of the class column of the dataset.
9 | #' @param preProcessF Vector of Strings of the preprocessing algorithm to apply.
10 | #' @param featuresToPreProcess Vector of indices of the feature columns to preprocess - an empty vector means all numeric features.
11 | #' @param nComp Number of components needed if either "pca" or "ica" feature preprocessors are needed.
12 | #' @param missingOpr Boolean controlling missing-value handling: FALSE (default) imputes with the median/mode via "imputeMissings", TRUE imputes using the "mice" library.
13 | #' @param metric String of the metric to be used in evaluation:
14 | #' \itemize{
15 | #' \item "acc" - Accuracy,
16 | #' \item "avg-fscore" - Average of F-Score of each label,
17 | #' \item "avg-recall" - Average of Recall of each label,
18 | #' \item "avg-precision" - Average of Precision of each label,
19 | #' \item "fscore" - Micro-Average of F-Score of each label,
20 | #' \item "recall" - Micro-Average of Recall of each label,
21 | #' \item "precision" - Micro-Average of Precision of each label.
22 | #' }
23 | #' @param balance Boolean indicating whether SMOTE class balancing is required (default FALSE).
24 | #'
25 | #' @return List of the Training and Validation Sets splits.
26 | #'
27 | #' @examples readDataset('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv', 0.1, 'class', 'pca', c(), 2, FALSE, 'acc', FALSE)
28 | #'
29 | #' @import RWeka
30 | #' @import farff
31 | #' @import caret
32 | #' @import mice
33 | #' @importFrom UBL SmoteClassif
34 | #' @importFrom imputeMissings compute impute
35 | #' @importFrom utils read.csv
36 | #' @importFrom stats complete.cases
37 | #'
38 | #' @noRd
39 | #'
40 | #' @keywords internal
41 |
42 | readDataset <- function(directory, testDirectory, vRatio = 0.3, classCol, preProcessF, featuresToPreProcess, nComp, missingOpr, metric, balance) {
43 | #check if CSV or arff
44 | ext <- substr(directory, nchar(directory)-2, nchar(directory))
45 | #Read CSV file of data
46 | if(ext == 'csv'){
47 | con <- file(directory, "r")
48 | data <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = TRUE)
49 | close(con)
50 | con <- file(testDirectory, "r")
51 | dataTED <- read.csv(file = con, header = TRUE, sep = ",", stringsAsFactors = TRUE)
52 | close(con)
53 | }
54 | else{
55 | data <- readARFF(directory)
56 | dataTED <- readARFF(testDirectory)
57 | }
58 |
59 | #Sampling from large datasets
60 | maxSample = 20000000
61 | n = as.integer(maxSample / ncol(data))
62 | if(maxSample < nrow(data) * ncol(data)){
63 | sampleInds <- createDataPartition(y = data$class, times = 1, p = n/nrow(data), list = FALSE)
64 | data <- data[sampleInds,]
65 | }
66 |
67 | #change column name of classes to be "class"
68 | colnames(data)[which(names(data) == classCol)] <- "class"
69 | colnames(dataTED)[which(names(dataTED) == classCol)] <- "class"
70 | cInd <- grep("class", colnames(data)) #index of class column
71 | #function which returns function which will encode vectors with values of class column labels
72 | label_encoder <- function(vec){
73 | levels <- sort(unique(vec))
74 | function(x){
75 | match(x, levels)
76 | }
77 | }
78 | classEncoder <- label_encoder(data$class) # create class encoder
79 | data$class <- classEncoder(data$class) # encoding class labels of training set
80 | dataTED$class <- classEncoder(dataTED$class) # encoding class labels of testing set
81 |
82 | #check either to delete an instance with missing values or perform imputation
83 | if (missingOpr == FALSE){
84 | missingVals <- imputeMissings::compute(data, method = "median/mode")
85 | data <- impute(data, object = missingVals)
86 | dataTED <- impute(dataTED, object = missingVals)
87 | }
88 | else{
89 | data <- complete(mice(data, m = 1, threshold = 1, printFlag = FALSE))
90 | dataTED <- complete(mice(dataTED, m = 1, threshold = 1, printFlag = FALSE))
91 | }
92 |
93 | #remove ID features
94 | numericFlag <- unlist(lapply(data, is.numeric))
95 | rmvFlag = c()
96 | for(i in 1:ncol(data)){
97 | len = length(unique(data[,i]))
98 | if(numericFlag[i] == FALSE && ((len / nrow(data) > 0.5) || len == 1) )
99 | rmvFlag <- c(rmvFlag, i)
100 | }
101 | keepFlag = c(1:ncol(data))
102 | keepFlag = keepFlag[!keepFlag %in% rmvFlag]
103 | data <- data[, keepFlag]
104 | dataTED <- dataTED[, keepFlag]
105 |
106 | #Select all remaining features
107 | selectedFeats <- c(1:ncol(data))
108 |
109 | #perform preprocessing
110 | if(length(featuresToPreProcess ) == 0){
111 | numericFlag <- unlist(lapply(data, is.numeric))
112 | for(i in 1:ncol(data)){
113 | if(numericFlag[i] == TRUE && i != cInd)
114 | featuresToPreProcess <- c(featuresToPreProcess, i)
115 | }
116 | }
117 | if(length(preProcessF) != 0 && length(featuresToPreProcess) > 1){
118 | featuresToPreProcess <- featuresToPreProcess[!featuresToPreProcess %in% cInd] #remove class column from set of features to be preprocessed
119 | dataTmp = list(TD = data[,featuresToPreProcess], TED = dataTED[,featuresToPreProcess])
120 | #Add PCA if we have more than 100 features
121 | if(length(featuresToPreProcess) > 100 && !('pca' %in% preProcessF) )
122 | preProcessF <- c(preProcessF, 'pca')
123 | for(i in 1:length(preProcessF)){
124 | dataTmp <- featurePreProcessing(dataTmp$TD, dataTmp$TED, preProcessF[i], nComp)
125 | }
126 |
127 | #add other features that don't require feature preprocessing to the features obtained after preprocessing
128 | diffTmp <- setdiff(selectedFeats, c(cInd, featuresToPreProcess))
129 | dHead = c(colnames(dataTmp$TD), colnames(data)[diffTmp])
130 |
131 | dataTDTmp <- data.frame(cbind(dataTmp$TD, data[,diffTmp]))
132 | dataTEDTmp <- data.frame(cbind(dataTmp$TED, dataTED[,diffTmp]))
133 | colnames(dataTDTmp) <- dHead
134 | colnames(dataTEDTmp) <- dHead
135 |
136 | #add class column to the dataframe of the dataset
137 | dataTDTmp$class <- data$class
138 | dataTEDTmp$class <- dataTED$class
139 | data <- dataTDTmp
140 | dataTED <- dataTEDTmp
141 | }
142 |
143 | #Class Balancing using Smote for metrics other than accuracy and binary class problems
144 | if( balance == TRUE || (metric != 'acc' && length(unique(data$class)) == 2) ){
145 | data$class = factor(data$class)
146 | data <- SmoteClassif(class ~., data, dist = 'HEOM')
147 | }
148 |
149 | # Use 70% of the dataset as Training - 30% of the dataset as Validation by default
150 | #smp_size <- floor((1-vRatio) * nrow(data))
151 | # set the seed to make your partition reproducible
152 | #train_ind <- sample(seq_len(nrow(data)), size = smp_size)
153 | train_ind <- createDataPartition(y = data$class, times = 1, p = (1-vRatio), list = FALSE)
154 | trainingDataset <- data[train_ind, ]
155 | validationDataset <- data[-train_ind, ]
156 | return (list(TD = trainingDataset, VD = validationDataset, FULLTD = data, TED = dataTED))
157 | }
158 |
--------------------------------------------------------------------------------
/R/runClassifier_.R:
--------------------------------------------------------------------------------
1 | #' @keywords internal
2 | runClassifier_ <- function(trainingSet, validationSet, params, classifierAlgorithm, metric = "acc") {
3 |
4 | #training set features and classes
5 | xFeatures <- subset(trainingSet, select = -class)
6 | xClass <- c(subset(trainingSet, select = class)$'class')
7 |
8 | #print(levels(xClass))
9 |
10 | #validation set features and classes
11 | yFeatures <- subset(validationSet, select = -class)
12 | yClass <- c(subset(validationSet, select = class)$'class')
13 |
14 | #print(levels(yClass))
15 |
16 | #remove not available parameters
17 | if(typeof(params) == 'character'){
18 | classifierConf <- getClassifierConf(classifierAlgorithm)
19 | params <- initialize(classifierAlgorithm, classifierConf, params)
20 | }
21 | for(i in colnames(params)){
22 | if(is.na(params[[i]]) || params[[i]] == 'NA' || params[[i]] == 'EI'){
23 | params <- subset(params, select = -get(i))
24 | }
25 | }
26 | # build model
27 | if(classifierAlgorithm == 'svm'){
28 | if(exists('gamma', where=params) && !is.na(params$gamma))
29 | params$gamma <- (2^ as.double(params$gamma))
30 | if(exists('cost', where=params) && !is.na(params$cost))
31 | params$cost <- (2^ as.double(params$cost))
32 | if(exists('tolerance', where=params) && !is.na(params$tolerance))
33 | params$tolerance <- (2^ as.double(params$tolerance))
34 | if(!exists('kernel', where = params))
35 | params$kernel <- 'radial'
36 | invisible(capture.output(suppressWarnings(model <- do.call(svm,c(list(x = xFeatures, y = xClass, type = 'C-classification', scale = F), params)))))
37 | #check performance
38 | pred <- predict(model, yFeatures)
39 | }
40 | else if(classifierAlgorithm == 'l2-linear-classifier'){
41 | params$cost <- (2^as.numeric(params$cost))
42 | params$epsilon <- as.numeric(params$epsilon)
43 | model <- LiblineaR(target = as.factor(xClass), data = xFeatures, cost = params$cost, epsilon = params$epsilon, type = 2)
44 | pred <- predict(model, yFeatures)$predictions
45 | }
46 | else if(classifierAlgorithm == 'naiveBayes'){
47 | if(!exists('eps', where = params)) {
48 | params$laplace <- as.numeric(params$laplace)
49 |
50 | model <- fnb.train(x = xFeatures, y = as.factor(xClass), laplace = params$laplace)
51 | }
52 | if(exists('eps', where = params)) {
53 |
54 | params$laplace <- as.numeric(params$laplace)
55 | params$eps <- (2 ^ as.numeric(params$eps))
56 | learn <- cbind(xClass, xFeatures)
57 | model <- naiveBayes(as.factor(xClass) ~., data = learn, laplace = params$laplace, eps = params$eps)
58 |
59 | }
60 |
61 | pred <- predict(model, yFeatures)
62 |
63 | }
64 | else if(classifierAlgorithm == 'boosting'){
65 | params$eta <- (2^as.numeric(params$eta))
66 | params$max_depth <- as.numeric(params$max_depth)
67 | params$min_child_weight <- as.numeric(params$min_child_weight)
68 | params$gamma <- as.numeric(params$gamma)
69 | params$colsample_bytree <- as.numeric(params$colsample_bytree)
70 |
71 | xClass_dmat <- xClass %>% as.numeric() %>% map(.f = ~ .x - 1)
72 | xFeatures_dmat <- xFeatures %>% as.matrix()
73 | mode(xFeatures_dmat) = 'double'
74 | yFeatures_dmat <- yFeatures %>% as.matrix()
75 | mode(yFeatures_dmat) = 'double'
76 |
77 | learn <- xgb.DMatrix(data = xFeatures_dmat, label = xClass_dmat)
78 | model <- xgboost(data = learn,
79 | nrounds = 5,
80 | eta = params$eta,
81 | max_depth = params$max_depth,
82 | min_child_weight = params$min_child_weight,
83 | gamma = params$gamma,
84 | colsample_bytree = params$colsample_bytree,
85 | objective = "multi:softprob",
86 | num_class = length(unique(xClass_dmat)),
87 | verbose = 0,
88 | nthread = 1)
89 |
90 | pred_prep <- predict(model, yFeatures_dmat, nthreads = 1)
91 |
92 | pred_mat <- matrix(pred_prep, ncol = length(unique(xClass_dmat)), byrow = T)
93 |
94 | colnames(pred_mat) <- levels(trainingSet$class)
95 |
96 | pred <- apply(pred_mat, 1, function(x) colnames(pred_mat)[which.max(x)])
97 |
98 | levels(pred) <- levels(trainingSet$class)
99 |
100 | }
101 | else if(classifierAlgorithm == 'ranger'){
102 | params$max.depth <- as.numeric(params$max.depth)
103 | params$num.trees <- as.numeric(params$num.trees)
104 | params$mtry <- min(as.numeric(params$mtry), ncol(xFeatures))
105 | params$min.node.size <- as.numeric(params$min.node.size)
106 | learn <- cbind(xClass, xFeatures)
107 | model <- ranger(as.factor(xClass) ~ .,
108 | data = learn,
109 | max.depth = params$max.depth,
110 | num.trees = params$num.trees,
111 | mtry = params$mtry,
112 | min.node.size = params$min.node.size,
113 | num.threads = 1)
114 | pred <- predict(model, yFeatures, num.threads = 1)$prediction
115 | }
116 | else if(classifierAlgorithm == 'randomForest'){
117 | params$mtry <- as.numeric(params$mtry)
118 | params$ntree <- as.numeric(params$ntree)
119 | params$mtry <- min(params$mtry, ncol(xFeatures))
120 | model <- do.call(randomForest,c(list(x = xFeatures, y = as.factor(xClass)), params))
121 | pred <- predict(model, yFeatures)
122 | }
123 | if (classifierAlgorithm != 'boosting') {
124 |
125 | perf <- evaluateMet(yClass, pred, metric = metric)
126 |
127 | }
128 | else {
129 |
130 | perf <- evaluateMet(validationSet$class, pred %>% factor(levels = levels(validationSet$class)), metric = metric)
131 |
132 | }
133 |
134 | result <- list()
135 | result$perf <- perf
136 |
137 | result$model <- model
138 | result$pred <- pred
139 |
140 | return(result)
141 | }
142 |
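143 | # A minimal usage sketch (not run; `trainDF`/`validDF` are placeholder data frames with a
144 | # `class` column). `params` may be either a '#'-separated string, which is expanded into a
145 | # one-row configuration via initialize(), or a one-row data frame of parameter values:
146 | #   out <- runClassifier_(trainDF, validDF, params = '', classifierAlgorithm = 'randomForest')
147 | #   out$perf   # evaluation metric on the validation set
148 | #   out$model  # fitted model object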
--------------------------------------------------------------------------------
/R/selectConfiguration.R:
--------------------------------------------------------------------------------
1 | #' @title Select Candidate Parameter Configuration
2 | #'
3 | #' @description Generate neighbor parameter configurations, sort them according to the expected improvement, and select the top promising ones as candidate configurations.
4 | #'
5 | #' @param R Dataframe of tried out parameter configurations.
6 | #' @param classifierAlgorithm String value of the classifier Name.
7 | #' @param tree List of data frames, representing the data structure for the forest of trees of the SMAC model.
8 | #' @param bestParams String of best parameter configuration found till now.
9 | #' @param B number of trees in the forest of trees of SMAC optimization algorithm (default = 10).
10 | #'
11 | #' @return Vector of strings of candidate parameter configurations.
12 | #'
13 | #' @examples selectConfiguration(c('1'), 'knn', data.frame(fold = c(), parent = c(), params = c(), leftChild = c(), rightChild = c(), performance = c(), rowN = c()), '1', 10)
14 | #'
15 | #' @import rjson
16 | #' @importFrom stats rnorm
17 | #'
18 | #' @noRd
19 | #'
20 | #' @keywords internal
21 |
22 | selectConfiguration <- function(R, classifierAlgorithm, tree, bestParams, B = 10) {
23 | #Read Classifier Algorithm Configuration Parameters
24 | #Open the Classifier Parameters Configuration File
25 | classifierConfDir <- system.file("extdata", paste(classifierAlgorithm,'.json',sep=""), package = "SmartML", mustWork = TRUE)
26 | result <- fromJSON(file = classifierConfDir)
27 |
28 | #get list of Classifier Parameters
29 | params <- result$params
30 |
31 | #minimum error rate found till now
32 | cmin <- (1 - bestParams$performance)
33 |
34 | #calculate Expected Improvement for all saved configurations
35 | for(i in 1:nrow(R)){
36 | cntParams <- R[i,]
37 | cntParamStr <- paste( unlist(cntParams), collapse='#')
38 | cntPerf <- c()
39 | #calculate Expected improvment from SMAC random forest model
40 | for(j in 1:B){
41 | cntNode <- tree[tree$fold==j & is.na(tree$parent), ]
42 | while(!is.na(cntNode[1])){
43 | cParent <- cntNode$rowN
44 | cntNode$params
45 | if(cntParamStr > as.character(cntNode$params) && !is.na(cntNode$rightChild)){
46 | cntNode <- tree[cntNode$rightChild, ]
47 | }
48 | else if(cntParamStr < as.character(cntNode$params) && !is.na(cntNode$leftChild)){
49 | cntNode <- tree[cntNode$leftChild, ]
50 | }
51 | else{
52 | cntPerf <- c(cntPerf, cntNode$performance)
53 | cntNode <- NA
54 | }
55 | }
56 | }
57 | cntParams$EI <- computeEI(cmin, cntPerf)
58 | R[i, ] <- cntParams
59 | }
60 | #sort according to Expected Improvement
61 | sortedR <- R[order(-R$EI),]
62 | #choose best promising configurations to suggest candidate configurations
63 | candidates <- R[0,]
64 | for(i in 1:min(10, nrow(R))){
65 | cntParams <- sortedR[i,]
66 | for(parI in params){
67 | tmpParams <- cntParams
68 | cntParam <- cntParams[[parI]]
69 | if(is.na(cntParam))
70 | next
71 | #for continuous Integer parameters
72 | if(result[[parI]]$type == 'continuous' && result[[parI]]$scale == 'int'){
73 | minVal <- as.double(result[[parI]]$minVal)
74 | maxVal <- as.double(result[[parI]]$maxVal)
75 | cntParam <- as.double(cntParam)
76 |
77 | #generate a candidate
78 | parValues <- c(result[[parI]]$values)
79 |
80 | while(cntParam == cntParams[[parI]]){
81 | cntParam <- sample(minVal:maxVal, 1, TRUE)
82 | if(result[[parI]]$constraint == 'odd' && (cntParam %% 2) == 0)
83 | cntParam = cntParams[[parI]]
84 | }
85 | tmpParams[[parI]] <- cntParam
86 | gparams <- c(parI)
87 | i <- 1
88 | while(i <= length(gparams)){
89 | parTmp <- gparams[i]
90 | if(parTmp != parI){
91 | if(is.na(cntParams[[parTmp]]))tmpParams[[parTmp]] <- result[[parTmp]]$default
92 | else tmpParams[[parTmp]] <- cntParams[[parTmp]]
93 | }
94 | i <- i + 1
95 | }
96 | tmpParams$EI <- NA
97 | tmpParams$performance <- NA
98 | candidates <- rbind(candidates, tmpParams)
99 | }
100 | #for continuous Non-Integer parameters
101 | else if(result[[parI]]$type == 'continuous'){
102 | minVal <- as.double(result[[parI]]$minVal)
103 | maxVal <- as.double(result[[parI]]$maxVal)
104 | cntParam <- as.double(cntParam)
105 | meanU <- (cntParam - minVal)/(maxVal - minVal)
106 | #generate four candidates
107 | num <- 1
108 | while(num < 5){
109 | cntParam <- rnorm(1, mean = meanU, sd = 0.2)
110 | if(cntParam <= 1 && cntParam >= 0){
111 | num <- num + 1
112 | tmpParams[[parI]] <- as.character(cntParam * (maxVal - minVal) + minVal)
113 | tmpParams$EI <- NA
114 | tmpParams$performance <- NA
115 | candidates <- rbind(candidates, tmpParams)
116 | }
117 | }
118 | }
119 | #for Categorical (discrete parameters)
120 | else if(result[[parI]]$type == 'discrete'){
121 | parValues <- c(result[[parI]]$values)
122 | while(cntParam == cntParams[[parI]])
123 | cntParam <- sample(parValues, 1)
124 | tmpParams[[parI]] <- cntParam
125 | gparams <- c(parI)
126 | i <- 1
127 | while(i <= length(gparams)){
128 | parTmp <- gparams[i]
129 | if(parTmp != parI){
130 | if(is.na(cntParams[[parTmp]]))tmpParams[[parTmp]] <- result[[parTmp]]$default
131 | else tmpParams[[parTmp]] <- cntParams[[parTmp]]
132 | }
133 | require <- result[[parTmp]]$'requires'[[cntParam]]$require
134 | gparams <- c(gparams, require)
135 | i <- i + 1
136 | }
137 | tmpParams$EI <- NA
138 | tmpParams$performance <- NA
139 | candidates <- rbind(candidates, tmpParams)
140 | }
141 | }
142 | }
143 | candidates <- unique(candidates)
144 |
145 | #Remove Duplicate Candidate Configurations
146 | duplicates <- c()
147 | for(i in 1:nrow(candidates)){
148 | for(j in 1:nrow(R)){
149 | flager <- FALSE
150 | for(k in 1:(ncol(candidates)-2)){
151 | if((is.na(candidates[i,k]) != is.na(R[j,k])) || (!is.na(candidates[i,k]) && !is.na(R[j,k]) && candidates[i,k] != R[j,k])){
152 | flager <- TRUE
153 | break
154 | }
155 | }
156 | if(flager == FALSE)
157 | duplicates <- c(duplicates, i)
158 | }
159 | }
160 | if(length(duplicates) > 0)
161 | candidates <- candidates[-duplicates,]
162 | #End Remove Candidate Configurations
163 | return(candidates)
164 | }
165 |
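166 | # Candidate generation sketch: for each of the top configurations (ranked by expected
167 | # improvement over the SMAC forest), integer parameters get one resampled neighbour,
168 | # continuous parameters get four Gaussian perturbations (sd = 0.2 in [0, 1]-normalized
169 | # space), and discrete parameters get one alternative value; duplicates of already tried
170 | # configurations are dropped before the candidates are returned.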
--------------------------------------------------------------------------------
/R/sendToDatabase.R:
--------------------------------------------------------------------------------
1 | #' @title Send Results to Knowledge Base
2 | #'
3 | #' @description Connect to the cloud knowledge base to store the results obtained to be used in meta-learning of future runs.
4 | #' @param tmp String of characters to be sent to knowledge base
5 | #' @return None
6 | #'
7 | #' @examples sendToDatabase()
8 | #'
9 | #' @noRd
10 | #'
11 | #' @import devtools
12 | #' @importFrom rjson fromJSON
13 | #' @importFrom httr POST
14 | #'
15 | #' @keywords internal
16 |
17 | sendToDatabase <- function(tmp){
18 | #Get IP
19 | cntIP <- fromJSON(readLines("http://api.hostip.info/get_json.php", warn=F))$ip
20 |
21 | #Update knowledge base
22 | updateKB <- try(
23 | {
24 | #tmp <- paste(readLines(system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE)), collapse="\n")
25 | res <- POST("https://jncvt2k156.execute-api.eu-west-1.amazonaws.com/default/s3-trigger-rautoml", body = list(data = paste(tmp, "&DATA&", sep=""),
26 | fName = paste(cntIP,".csv&FILENAME&", sep=""),
27 | encode = "json"))
28 | #write("", file=system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE),append=TRUE) #Empty the tmp file
29 | })
30 | if(inherits(updateKB, "try-error"))
31 | print('Failed to update Knowledge base.')
32 |
33 | }
34 |
--------------------------------------------------------------------------------
/R/sendToTmp.R:
--------------------------------------------------------------------------------
1 | #' @title Write results.
2 | #'
3 | #' @description Append results to a log file.
4 | #'
5 | #' @param df List of the dataset meta-features
6 | #' @param algorithmName String of the name of selected classifier algorithm.
7 | #' @param bestParams String of the best parameters configuration found.
8 | #' @param perf String of the performance value obtained using the selected algorithm and parameter configuration.
9 | #' @param nModels Integer representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization.
10 | #' @param metric Metric to be used in evaluation:
11 | #' \itemize{
12 | #' \item "acc" - Accuracy,
13 | #' \item "fscore" - Micro-Average of F-Score of each label,
14 | #' \item "recall" - Micro-Average of Recall of each label,
15 | #' \item "precision" - Micro-Average of Precision of each label.
16 | #' }
17 | #'
18 | #' @return None
19 | #'
20 | #' @examples sendToTmp(\code{df}, 'knn', '1', '0.9').
21 | #'
22 | #' @noRd
23 | #'
24 | #' @keywords internal
25 |
26 | sendToTmp <- function(df, algorithmName, bestParams, perf, nModels, metric = 'acc') {
27 | df$params <- sprintf("%s", paste( unlist(bestParams), collapse='#'))
28 | df$performance <- perf
29 | df$classifierAlgorithm <- sprintf("%s", algorithmName)
30 |
31 | query <- sprintf("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s",
32 | df$datasetRatio, df$featuresKurtStdDev, df$featuresKurtMean, df$featuresKurtMax, df$featuresKurtMin, df$featuresSkewStdDev,
33 | df$featuresSkewMean, df$featuresSkewMax, df$featuresSkewMin, df$symbolsStdDev, df$symbolsSum, df$symbolsMean, df$classProbStdDev,
34 | df$classProbMean, df$classProbMax, df$classProbMin, df$classEntropy, df$ratioNumToCat, df$nCatFeatures, df$nNumFeatures,
35 | df$nInstances, df$nFeatures, df$nClasses, df$lognFeatures, df$lognInstances, df$classifierAlgorithm, df$params, df$maxTime, metric,
36 | df$performance, nModels)
37 | return(query)
38 | #write(query, file=system.file("extdata", "tmp", package = "SmartML", mustWork = TRUE),append=TRUE)
39 | }
40 |
--------------------------------------------------------------------------------
/R/successive_halving.R:
--------------------------------------------------------------------------------
1 | #' @keywords internal
2 | #'
3 | successive_halving <- function(df, model, params_config, n = 81, r = 1, eta = 3,
4 | max_iter = 81, s_max = 5, evaluations = data.frame(),
5 | problem = 'classification', measure = 'classif.acc') {
6 |
7 | final_df = params_config
8 | print('GOT HERE 0')
9 | if(problem == 'classification'){
10 | problem = 'classif'
11 | task = TaskClassif$new(id = 'sh', backend = df, target = 'class')
12 | }
13 | else{
14 | problem = 'regr'
15 | task = TaskRegr$new(id = 'sh', backend = df, target = 'class')
16 | }
17 | param_number = length(params_config)
18 |
19 | for (k in 0:s_max) {
20 | gc()
21 | n_i = n * (eta ** -k)
22 | r_i = r * (eta ** k)
23 | r_p = r_i / max_iter
24 | min_train_datapoints = (length(unique(df$class)) * 3) + 1
25 | min_prob_datapoints = min_train_datapoints / nrow(df)
26 | train_idxs <- sample(task$nrow, task$nrow * max(min(r_p, 0.8), min_prob_datapoints))
27 | test_idxs <- setdiff(seq_len(task$nrow), train_idxs)
28 | if (problem == 'classif')
29 | learners <- replicate(n = n_i, expr = {lrn(paste(problem, sep = '.', model),
30 | predict_type = 'prob')})
31 | else
32 | learners <- replicate(n = n_i, expr = {lrn(paste(problem, sep = '.', model))})
33 |
34 | print('GOT HERE 1')
35 | j = 1
36 | for (i in learners) {
37 | cnt_field <- final_df[[j]]
38 | ## Some conditions to filter the parameter values
39 | if (model == 'svm' && final_df[[j]]$kernel != 'polynomial')
40 | cnt_field$degree <- NULL
41 | if ( (model == 'svm' && final_df[[j]]$kernel == 'linear') || (model == 'cv_glmnet' && final_df[[j]]$relax == FALSE))
42 | cnt_field$gamma <- NULL
43 |
44 | i$param_set$values = cnt_field
45 | j = j + 1
46 | }
47 |
48 | print('GOT HERE 2')
49 | for (l in learners) {
50 | l$train(task = task, row_ids = train_idxs)
51 | }
52 |
53 | print('GOT HERE 3')
54 | preds <- map(.x = learners, .f = ~ .x$predict(task, row_ids = test_idxs)$score(msr(measure)))
55 |
56 |
57 | final_df <- final_df %>%
58 | as.data.table() %>%
59 | t() %>%
60 | `colnames<-`(value = jsons[[model]]$params) %>%
61 | as.data.table()
62 |
63 |
64 | final_df[, acc := unlist(preds)]
65 | final_df[, budget := r_i]
66 | final_df[, rp := r_p]
67 | final_df[, model := unlist(learners)]
68 | setorder(final_df, -acc)
69 | evaluations <- rbindlist(list(evaluations, final_df))
70 |
71 |
72 | final_df <- final_df %>%
73 | head(max(n_i/eta, 1))
74 |
75 |
76 | if(k == s_max){
77 | return(list("answer" = final_df, "sh_runs" = evaluations))
78 | }
79 |
80 | final_df$acc = NULL
81 | final_df$budget = NULL
82 | final_df$model = NULL
83 | final_df <- purrr::transpose(final_df)
84 |
85 | }
86 | }
87 |
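88 | # Worked example for the most exploratory bracket (n = 81, r = 1, eta = 3, s_max = 4,
89 | # max_iter = 81): round k trains n_i = n * eta^-k configurations on a fraction
90 | # r_i / max_iter of the task and promotes the top n_i / eta of them:
91 | #   k = 0: 81 configs on 1/81 of the rows,  k = 1: 27 on 3/81,  k = 2: 9 on 9/81,
92 | #   k = 3: 3 on 27/81,  k = 4: 1 on the full budget (the train split is capped at 80%
93 | #   of the rows and floored at min_prob_datapoints above).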
--------------------------------------------------------------------------------
/R/successive_resampling.R:
--------------------------------------------------------------------------------
1 | #' @importFrom KernSmooth dpik bkde
2 | #' @importFrom tidyr drop_na separate gather spread unite
3 | #' @importFrom dplyr select mutate_if arrange top_frac case_when mutate filter group_by ungroup top_n
4 | #' @importFrom truncnorm rtruncnorm dtruncnorm
5 |
6 | #' @keywords internal
7 | dpikSafe <- function(x, ...)
8 | {
9 | result <- try(dpik(x, ...), silent = TRUE)
10 | if (inherits(result, "try-error"))
11 | {
12 | msg <- geterrmessage()
13 | if (grepl("scale estimate is zero for input data", msg))
14 | {
15 | warning("Using standard deviation as scale estimate, probably because IQR == 0")
16 | result <- try(dpik(x, scalest = "stdev", ...), silent = TRUE )
17 | if (inherits(result, "try-error")) {
18 | msg <- geterrmessage()
19 | if (grepl("scale estimate is zero for input data", msg)) {
20 | warning("0 scale, bandwidth estimation failed. using 1e-3")
21 | result <- 1e-3
22 | }
23 | }
24 | } else
25 | {
26 | stop(msg)
27 | }
28 | }
29 | return(result)
30 | }
31 |
32 | #' @keywords internal
33 | successive_resampling <- function(df, model, samples = 64, n = 27, bw = 3, kde_type = "single") {
34 | samples_filtered <- df %>% drop_na()
35 | params_list <- jsons[[model]]$params
36 | length_params <- length(params_list)
37 | biggest_budget_that_satisfies <- samples_filtered %>%
38 | mutate(acc = as.numeric(acc)) %>%
39 | group_by(budget) %>%
40 | mutate(size = n()) %>%
41 | ungroup() %>%
42 | filter(size > ((length_params + 1) * 20/3)) %>%
43 | filter(budget == max(budget)) %>%
44 | arrange(desc(acc)) %>%
45 | select(-size) %>%
46 | separate(col = params,
47 | into = jsons[[model]]$params,
48 | sep = ",") %>%
49 | select(-model, -rp) %>%
50 | mutate_if(is.character, .funs = ~ str_extract(.x, pattern = "(?<==).*$") %>% parse_number)
51 | l_samples <- biggest_budget_that_satisfies %>%
52 | top_frac(0.15, wt = acc) %>%
53 | select(-acc, -budget)
54 |
55 | g_samples <- biggest_budget_that_satisfies %>%
56 | top_frac(-0.85, wt = acc) %>%
57 | select(-acc, -budget)
58 |
59 | l_kde_bws <- suppressWarnings(map_dbl(l_samples, dpikSafe))
60 | g_kde_bws <- suppressWarnings(map_dbl(g_samples, dpikSafe))
61 | l_kde_means <- map2_dbl(.x = l_samples, .y = l_kde_bws, .f = ~ mean(bkde(x = .x, bandwidth = .y)$x))
62 | g_kde_means <- map2_dbl(.x = g_samples, .y = g_kde_bws, .f = ~ mean(bkde(x = .x, bandwidth = .y)$x))
63 | maxvals <- map_dbl(.x = params_list, .f = ~ readr::parse_number(jsons[[model]][[.x]]$maxVal))
64 | minvals <- map_dbl(.x = params_list, .f = ~ readr::parse_number(jsons[[model]][[.x]]$minVal))
65 | types <- map_chr(.x = params_list, .f = ~ jsons[[model]][[.x]]$scale)
66 | partial_rtruncnorm <- function(n, a, b, mu, sigma, type) {
67 | case_when(type == "int" ~ round(rtruncnorm(n = n, a = a, b = b, mean = mu, sd = sigma)),
68 | type == "double" | type == "exp" ~ rtruncnorm(n = n, a = a, b = b, mean = mu, sd = sigma))
69 | }
70 |
71 | partial_dtruncnorm <- function(x, a, b, mu, sigma) {
72 | dtruncnorm(x = x, a = a, b = b, mean = mu, sd = sigma)
73 | }
74 |
75 | batch_samples <- pmap_dfc(.l = list("a" = minvals,
76 | "b" = maxvals,
77 | "mu" = l_kde_means,
78 | "sigma" = l_kde_bws * bw,
79 | "type" = types),
80 | .f = partial_rtruncnorm,
81 | n = samples) %>%
82 | set_names(nm = params_list)
83 |
84 | batch_samples_densities_l <- pmap_dfc(.l = list("x" = batch_samples,
85 | "a" = minvals,
86 | "b" = maxvals,
87 | "mu" = l_kde_means,
88 | "sigma" = l_kde_bws),
89 | .f = partial_dtruncnorm)
90 |
91 | batch_samples_densities_g <- pmap_dfc(.l = list("x" = batch_samples,
92 | "a" = minvals,
93 | "b" = maxvals,
94 | "mu" = g_kde_means,
95 | "sigma" = g_kde_bws),
96 | .f = partial_dtruncnorm)
97 |
98 | evaluate_batch_convolution <- batch_samples_densities_l / batch_samples_densities_g
99 |
100 | rank_sample_density <- function(samp, kdensity, n) {
101 | samp <- samp %>% as.data.frame()
102 | samp$rank <- kdensity
103 | sorted_samp <- samp %>% arrange(desc(rank)) %>% head(n)
104 | subset(sorted_samp, select = -rank)
105 | }
106 |
107 | if(kde_type == "mixed") {
108 | EI <- evaluate_batch_convolution %>%
109 | reduce(.f = `*`) %>%
110 | map_if(.p = ~ ((is.nan(.x) | is.infinite(.x)) == T),
111 | .f = ~ runif(1, min = 1e-5, max = 1e-3)) %>%
112 | flatten_dbl()
113 |
114 | batch_samples$rank <- EI
115 |
116 | evaluated_batch <- batch_samples %>%
117 | arrange(desc(rank)) %>%
118 | top_n(n = n, wt = rank)
119 |
120 | evaluated_batch_step_two <- evaluated_batch %>%
121 | select(-rank) %>%
122 | gather(key, value) %>%
123 | mutate(params = paste(key, value, sep = " = ")) %>%
124 | .[["params"]]
125 |
126 | eval_batch_step_three <- evaluated_batch_step_two %>%
127 | matrix(nrow = n, ncol = length(params_list)) %>%
128 | as.data.frame() %>%
129 | unite(col = "params", sep = ",") %>%
130 | mutate(model = model) %>%
131 | select(model, params)
132 |
133 | return(eval_batch_step_three)
134 |
135 | } else if(kde_type == "single") {
136 | evaluated_batch <- map2_dfc(.x = batch_samples, .y = evaluate_batch_convolution,
137 | .f = rank_sample_density, n = n)
138 |
139 | colnames(evaluated_batch) <- params_list
140 | final_df <- evaluated_batch %>%
141 | gather(key, value) %>%
142 | mutate(params = paste(key, value, sep = " = ")) %>%
143 | .[["params"]] %>%
144 | matrix(nrow = n, ncol = length(params_list)) %>%
145 | as.data.frame() %>%
146 | unite(col = "params", sep = ",") %>%
147 | mutate(model = model) %>%
148 | select(model, params)
149 |
150 | return(final_df)
151 | }
152 |
153 | }
154 |
--------------------------------------------------------------------------------
/R/sysdata.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/R/sysdata.rda
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | ---
4 |
5 |
6 |
7 | ```{r setup, include = FALSE}
8 | knitr::opts_chunk$set(
9 | collapse = TRUE,
10 | comment = "#>",
11 | fig.path = "man/figures/README-",
12 | out.width = "100%"
13 | )
14 | ```
15 |
16 | # witchcraft
17 |
18 | [](https://cran.r-project.org/package=witchcraft)
19 | [](https://www.tidyverse.org/lifecycle/#experimental)
20 | [](https://travis-ci.org/brurucy/witchcraft)
21 |
22 |
23 | The R package *witchcraft* is an opinionated framework for automated machine learning, with the intent of being frequently updated with the newest state-of-the-art optimization methods.
24 |
25 | At the moment, *witchcraft* uses the [Bayesian-Optimization-Hyperband](https://arxiv.org/pdf/1603.06560.pdf) algorithm.
26 |
27 | Besides *Combined Algorithm Selection and Hyperparameter Optimization* (CASH), *witchcraft* provides tools to evaluate the results, consistent with the mlr3 workflow.
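
For intuition, the bracket schedule that Hyperband iterates over can be sketched in a few lines of base R. This is an illustrative simplification of the algorithm from the paper linked above, not the package's internal implementation; `R_max` and `eta` are assumed names for the maximum budget and the halving rate:

```{r hyperband-schedule, eval = FALSE}
# Simplified sketch of the Hyperband bracket schedule (Li et al., 2016).
R_max <- 81   # maximum budget per configuration
eta   <- 3    # fraction of configurations kept at each rung is 1/eta
s_max <- floor(log(R_max, base = eta) + 1e-9)  # epsilon guards against floating-point error

for (s in s_max:0) {
  n <- ceiling(((s_max + 1) / (s + 1)) * eta^s)  # configurations to start the bracket with
  r <- R_max * eta^(-s)                          # smallest budget used in this bracket
  cat("bracket s =", s, "| start with", n, "configs at budget", r, "\n")
}
```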
28 |
29 | ## Installation
30 |
31 | Soon, installing the **stable** version from [CRAN](https://cran.r-project.org/package=witchcraft) will be possible:
32 |
33 | ```{r cran-installation, eval = FALSE}
34 | install.packages("witchcraft")
35 | ```
36 |
37 | You can always install the **development** version from
38 | [GitHub](https://github.com/brurucy/witchcraft):
39 |
40 | ```{r gh-installation, eval = FALSE}
41 | # install.packages("remotes")
42 | remotes::install_github("brurucy/witchcraft")
43 | ```
44 |
45 | Installing this software requires a compiler.
46 |
47 | ## Valid example
48 |
49 | ```{r example, message=FALSE, eval=FALSE}
50 | library(SmartML)
51 | library(readr)
52 |
53 | data_train <- readr::read_csv('inst/extdata/dota_train.csv') %>%
54 | as.data.table()
55 |
56 | data_test <- readr::read_csv('inst/extdata/dota_test.csv') %>%
57 | as.data.table()
58 |
59 | data_train[, class := factor(class, levels = sort(unique(class)))]
60 | data_test[, class := factor(class, levels = sort(unique(class)))]
61 |
62 | params <- SmartML:::get_random_hp_config('kknn', columns = ncol(data_train) - 1)
63 |
64 | print(typeof(params$kernel))
65 | params
66 |
67 | ```
68 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | [](https://doi.org/10.5441/002/edbt.2019.54)
5 |
6 |
7 | ## SmartML:
8 | Currently, SmartML is an R package representing a meta-learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about the meta-features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta-features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs.
9 |
10 |
11 |
12 | ---
13 | ## SmartML Contribution Points and Goals:
14 |
15 | The goal of SmartML is to automate the process of classifier algorithm selection and hyper-parameter tuning in supervised machine learning, using a modified version of SMAC Bayesian optimization that favours exploitation over exploration thanks to meta-learning.
16 | 1. SmartML is the first R package to tackle supervised machine learning automation, and it is built on 16 different classifier algorithms from different R packages.
17 | 2. In addition, we offer different data preprocessing and feature engineering algorithms that can be specified by the user and easily applied to tabular datasets in either CSV or ARFF format.
18 | 3. SmartML has a collaborative knowledge base that grows over time as more users use the tool.
19 | 4. Finally, SmartML can produce model interpretability plots for feature importance and interaction with the help of the ```iml``` package.
20 |
21 | ---
22 | ## Installation
23 |
24 | You can install the released version of SmartML from [GitHub](https://github.com/mmaher22/SmartML) with:
25 |
26 | ``` r
27 | devtools::install_github("mmaher22/SmartML") # requires the devtools (or remotes) package
28 | ```
29 |
30 | ---
31 | ## User Manual
32 |
33 | The manual for the SmartML R package can be found HERE
34 |
35 | ---
36 | ## Example
37 |
38 | This is a basic example which shows how to run SmartML:
39 |
40 | ```{r}
41 | library(SmartML)
42 | ```
43 |
44 | ```{r}
45 | #' Option 1 = Classifier Selection Only, apply PCA as a preprocessing step with 4 components and get two candidate models as output only
46 | result1 <- autoRLearn(1, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff', option = 1, preProcessF = 'pca', nComp = 4, nModels = 2)
47 |
48 | #option 1 runs for Classifier Algorithm Selection Only
49 | result1$clfs #Vector of recommended nModels classifiers
50 | result1$params #Vector of initial suggested parameter configurations of nModels recommended classifiers
51 |
52 | #Use recommended model to train over training data and make predictions over test data
53 | resultRun <- runClassifier(result1$TRData, result1$TEData, result1$params[[1]], result1$clfs[[1]])
54 | resultRun$perf #model performance on test set
55 | ```
56 |
57 | ```{r}
58 | #' Option 2 = Both Classifier Selection and Parameter Optimization and compute model interpretability plots
59 | result2 <- autoRLearn(2, 'sampleDatasets/car/train.arff', 'sampleDatasets/car/test.arff', interp = TRUE) # Option 2 runs for both classifier algorithm selection and parameter tuning for 2 minutes.
60 |
61 | result2$clfs #best classifier found
62 | result2$params #parameter configuration for best classifier
63 | result2$perf #performance of chosen classifier on testing set after fitting on whole training set
64 | ```
65 |
66 | ```{r}
67 | plot(result2$interpret$featImp) #Feature Importance Plot
68 | ```
69 |
70 | ```{r}
71 | #' Option 2 = Both Classifier Selection and Parameter Optimization, use 20% validation set from training set, and apply MICE for missing values imputation
72 | result3 <- autoRLearn(5, 'sampleDatasets/EEGEyeState/train.csv', 'sampleDatasets/EEGEyeState/test.csv', vRatio = 0.2, missingOpr = TRUE) # Option 2 runs for both classifier algorithm selection and parameter tuning for 5 minutes.
73 |
74 |
75 | result3$clfs #best classifier found
76 | result3$params #parameter configuration for best classifier
77 | result3$perf #performance of chosen classifier on testing set
78 | ```
79 |
80 | ---
81 | ## Contribution Guidelines
82 | To contribute to `SmartML`, please follow these guidelines.
83 |
84 | ---
85 | ## Publication
86 |
87 | SmartML has been accepted as a demo paper at EDBT 2019 in Lisbon, Portugal [PDF]:
88 | ```
89 | Mohamed Maher and Sherif Sakr. SmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms. Advances in Database Technology - EDBT 2019: 22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29, 2019.
90 | ```
91 |
92 | ---
93 | ## License
94 | This work is licensed under the terms of the GNU General Public License, version 3.0 (GPLv3).
95 |
--------------------------------------------------------------------------------
/SmartML.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 |
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageCheckArgs: --as-cran
22 | PackageRoxygenize: rd,collate,namespace
23 |
--------------------------------------------------------------------------------
/SmartML_0.3.0.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/SmartML_0.3.0.pdf
--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
1 | comment: false
2 | coverage:
3 | status:
4 | project:
5 | default:
6 | target: auto
7 | threshold: 1%
8 | patch:
9 | default:
10 | target: auto
11 | threshold: 1%
12 | language: R
13 | sudo: false
14 |
15 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/inst/extdata/hyperband_jsons.zip
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/cv_glmnet.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["dfmax", "alpha", "gamma", "relax", "nfolds"],
3 | "parents":["dfmax", "alpha", "gamma", "relax", "nfolds"],
4 | "gamma":
5 | {
6 | "type":"continuous",
7 | "scale":"double",
8 | "minVal":"0",
9 | "maxVal":"1",
10 | "default":"0.5",
11 | "constraint":"any"
12 | },
13 | "alpha":
14 | {
15 | "type":"continuous",
16 | "scale":"double",
17 | "minVal":"0",
18 | "maxVal":"1",
19 | "default":"0.3",
20 | "constraint":"any"
21 | },
22 | "dfmax":
23 | {
24 | "type":"continuous",
25 | "scale":"int",
26 | "minVal":"10",
27 | "maxVal":"100",
28 | "default":"50",
29 | "constraint":"any"
30 | },
31 | "nfolds":
32 | {
33 | "type":"continuous",
34 | "scale":"int",
35 | "minVal":"3",
36 | "maxVal":"3",
37 | "default":"3",
38 | "constraint":"any"
39 | },
40 | "relax":
41 | {
42 | "type":"boolean",
43 | "values":["TRUE", "FALSE"],
44 | "default":"FALSE"
45 | }
46 | }
47 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/glmnet.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["dfmax", "alpha", "gamma", "relax"],
3 | "parents":["dfmax", "alpha", "gamma", "relax"],
4 | "gamma":
5 | {
6 | "type":"continuous",
7 | "scale":"double",
8 | "minVal":"0",
9 | "maxVal":"1",
10 | "default":"0.5",
11 | "constraint":"any"
12 | },
13 | "alpha":
14 | {
15 | "type":"continuous",
16 | "scale":"double",
17 | "minVal":"0",
18 | "maxVal":"1",
19 | "default":"0.3",
20 | "constraint":"any"
21 | },
22 | "dfmax":
23 | {
24 | "type":"continuous",
25 | "scale":"int",
26 | "minVal":"10",
27 | "maxVal":"100",
28 | "default":"50",
29 | "constraint":"any"
30 | },
31 | "relax":
32 | {
33 | "type":"boolean",
34 | "values":["TRUE", "FALSE"],
35 | "default":"FALSE"
36 | }
37 | }
38 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/kknn.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["k", "distance", "kernel"],
3 | "parents":["k", "distance", "kernel"],
4 | "k":
5 | {
6 | "type":"continuous",
7 | "scale":"int",
8 | "minVal":"1",
9 | "maxVal":"20",
10 | "default":"7",
11 | "constraint":"any"
12 | },
13 | "distance":
14 | {
15 | "type":"continuous",
16 | "scale":"int",
17 | "minVal":"1",
18 | "maxVal":"4",
19 | "default":"2",
20 | "constraint":"any"
21 | },
22 | "kernel":
23 | {
24 | "type":"discrete",
25 | "values":["rectangular", "epanechnikov", "gaussian", "rank", "optimal"],
26 | "default":"optimal"
27 | }
28 | }
29 |
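As a rough illustration of how a spec like this can drive random sampling, the sketch below reads the file with `jsonlite` and draws one configuration. This is only a sketch under the assumption that `jsonlite` is available; the package's own `get_random_hp_config()` (referenced in `README.Rmd`) may handle the ranges and scales (e.g. `exp`) differently.

``` r
library(jsonlite)

# Read the spec shown above and draw one random configuration from it.
spec <- fromJSON("inst/extdata/hyperband_jsons/kknn.json")

draw_param <- function(p) {
  if (!is.null(p$values)) {                 # discrete / boolean parameters
    sample(p$values, 1)
  } else if (identical(p$scale, "int")) {   # integer-valued range
    vals <- seq(as.numeric(p$minVal), as.numeric(p$maxVal))
    vals[sample.int(length(vals), 1)]       # avoids sample()'s 1:n behaviour on length-1 ranges
  } else {                                  # double (or exp) range
    runif(1, as.numeric(p$minVal), as.numeric(p$maxVal))
  }
}

config <- lapply(setNames(spec$params, spec$params), function(nm) draw_param(spec[[nm]]))
str(config)
```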
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/lm.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["singular.ok"],
3 | "parents":["singular.ok"],
4 | "type":
5 | {
6 | "type":"boolean",
7 | "values":["TRUE"],
8 | "default":"TRUE"
9 | }
10 | }
11 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/naive_bayes.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["laplace"],
3 | "parents":["laplace"],
4 | "laplace":
5 | {
6 | "default":"0",
7 | "type":"continuous",
8 | "scale":"int",
9 | "minVal":"0",
10 | "maxVal":"4",
11 | "constraint":"any"
12 | }
13 | }
14 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/ranger.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["num.trees", "mtry", "max.depth", "min.node.size", "verbose"],
3 | "parents":["num.trees", "mtry", "max.depth", "min.node.size", "verbose"],
4 | "num.trees":
5 | {
6 | "type":"continuous",
7 | "scale":"int",
8 | "minVal":"1",
9 | "maxVal":"500",
10 | "default":"500",
11 | "constraint":"any"
12 | },
13 | "mtry":
14 | {
15 | "type":"continuous",
16 | "scale":"int",
17 | "minVal":"1",
18 | "maxVal":"30",
19 | "default":"5",
20 | "constraint":"any"
21 | },
22 | "max.depth":
23 | {
24 | "type":"continuous",
25 | "scale":"int",
26 | "minVal":"0",
27 | "maxVal":"10",
28 | "default":"0",
29 | "constraint":"any"
30 | },
31 | "min.node.size":
32 | {
33 | "type":"continuous",
34 | "scale":"int",
35 | "minVal":"1",
36 | "maxVal":"10",
37 | "default":"2",
38 | "constraint":"any"
39 | },
40 | "verbose":
41 | {
42 | "type":"boolean",
43 | "values":["FALSE"],
44 | "default":"FALSE"
45 | }
46 | }
47 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/rpart.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["maxdepth", "minsplit"],
3 | "parents":["maxdepth", "minsplit"],
4 | "maxdepth":
5 | {
6 | "type":"continuous",
7 | "scale":"int",
8 | "minVal":"1",
9 | "maxVal":"30",
10 | "default":"6",
11 | "constraint":"any"
12 | },
13 | "minsplit":
14 | {
15 | "type":"continuous",
16 | "scale":"int",
17 | "minVal":"1",
18 | "maxVal":"30",
19 | "default":"10",
20 | "constraint":"any"
21 | }
22 | }
23 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/svm.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["kernel", "type", "degree", "gamma", "cost"],
3 | "parents":["kernel", "type", "degree", "gamma", "cost"],
4 | "kernel":
5 | {
6 | "type":"discrete",
7 | "values":["linear", "radial", "polynomial"],
8 | "default":"linear"
9 | },
10 | "type":
11 | {
12 | "type":"discrete",
13 | "values":["C-classification"],
14 | "default":"C-classification"
15 | },
16 | "gamma":
17 | {
18 | "default":"-4",
19 | "type":"continuous",
20 | "minVal":"-10",
21 | "maxVal":"5",
22 | "scale":"exp",
23 | "constraint":"any"
24 | },
25 | "degree":
26 | {
27 | "default":"3",
28 | "type":"continuous",
29 | "minVal":"2",
30 | "maxVal":"5",
31 | "scale":"int",
32 | "constraint":"any"
33 | },
34 | "cost":
35 | {
36 | "default":"-2",
37 | "type":"continuous",
38 | "minVal":"-6",
39 | "maxVal":"12",
40 | "scale":"exp",
41 | "constraint":"any"
42 | }
43 | }
44 |
--------------------------------------------------------------------------------
/inst/extdata/hyperband_jsons/xgboost.json:
--------------------------------------------------------------------------------
1 | {
2 | "params":["eta", "max_depth", "nrounds", "verbose", "min_child_weight"],
3 | "parents":["eta", "max_depth", "nrounds", "verbose", "min_child_weight"],
4 | "verbose":
5 | {
6 | "type":"continuous",
7 | "scale":"int",
8 | "minVal":"0",
9 | "maxVal":"0",
10 | "default":"0"
11 | },
12 | "nrounds":
13 | {
14 | "type":"continuous",
15 | "scale":"int",
16 | "minVal":"10",
17 | "maxVal":"1000",
18 | "default":"10"
19 | },
20 | "eta":
21 | {
22 | "type":"continuous",
23 | "scale":"double",
24 | "minVal":"0.01",
25 | "maxVal":"0.5",
26 | "default":"0.3"
27 | },
28 | "max_depth":
29 | {
30 | "type":"continuous",
31 | "scale":"int",
32 | "minVal":"2",
33 | "maxVal":"10",
34 | "default":"6"
35 | },
36 | "min_child_weight":
37 | {
38 | "type":"continuous",
39 | "scale":"int",
40 | "minVal":"1",
41 | "maxVal":"10",
42 | "default":"1"
43 | }
44 | }
45 |
--------------------------------------------------------------------------------
/inst/extdata/ta_test.csv:
--------------------------------------------------------------------------------
1 | X1.1,X1.2,X2.1,X2.2,X2.3,X2.4,X2.5,X2.6,X2.7,X2.8,X2.9,X2.10,X2.11,X2.12,X2.13,X2.14,X2.15,X2.16,X2.17,X2.18,X2.19,X2.20,X2.21,X2.22,X2.23,X2.24,X2.25,X3.1,X3.2,X3.3,X3.4,X3.5,X3.6,X3.7,X3.8,X3.9,X3.10,X3.11,X3.12,X3.13,X3.14,X3.15,X3.16,X3.17,X3.18,X3.19,X3.20,X3.21,X3.22,X3.23,X3.24,X3.25,X3.26,X4.1,X4.2,X5,class
2 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.104308876612462,3
3 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
4 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.4550689930872934,2
5 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,1
6 | 0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1
7 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.6877397085145439,3
8 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3
9 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.1736260149034599,2
10 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.4062967303307103,2
11 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3857518547962953,2
12 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1
13 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.22239827766004297,3
14 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
15 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.610182803372127,2
16 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1
17 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,1
18 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,1
19 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.9574348331790463,1
20 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.8428535187993775,3
21 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.6184225702235457,3
22 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.026751971470045,3
23 | 1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,3
24 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3081949496538785,2
25 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,-0.5326258982297103,2
26 | 0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2
27 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.7735363805083795,2
28 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2
29 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1
30 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.2306380445114617,1
31 |
--------------------------------------------------------------------------------
/inst/extdata/ta_train.csv:
--------------------------------------------------------------------------------
1 | X1.1,X1.2,X2.1,X2.2,X2.3,X2.4,X2.5,X2.6,X2.7,X2.8,X2.9,X2.10,X2.11,X2.12,X2.13,X2.14,X2.15,X2.16,X2.17,X2.18,X2.19,X2.20,X2.21,X2.22,X2.23,X2.24,X2.25,X3.1,X3.2,X3.3,X3.4,X3.5,X3.6,X3.7,X3.8,X3.9,X3.10,X3.11,X3.12,X3.13,X3.14,X3.15,X3.16,X3.17,X3.18,X3.19,X3.20,X3.21,X3.22,X3.23,X3.24,X3.25,X3.26,X4.1,X4.2,X5,class
2 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.6877397085145439,3
3 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3
4 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.6389674457579608,3
5 | 1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.39805696347929165,3
6 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3
7 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,3
8 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,2.3369795920397123,3
9 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
10 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-1.463308759938712,3
11 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,3
12 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0.087829342909624325,3
13 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3
14 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2
15 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.1736260149034599,2
16 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2
17 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.4062967303307103,2
18 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3857518547962953,2
19 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1.096069109761043,2
20 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2
21 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,2
22 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,2
23 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2
24 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.1530811393690448,2
25 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2
26 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,2
27 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.01027243776720751,1
28 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1
29 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,1
30 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.1530811393690448,1
31 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1
32 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,1
33 | 0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,1
34 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1
35 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1
36 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.6389674457579608,3
37 | 1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.39805696347929165,3
38 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.104308876612462,3
39 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
40 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3
41 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,3
42 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.3369795920397123,3
43 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
44 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-1.463308759938712,3
45 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,3
46 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0.087829342909624325,3
47 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3
48 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,2
49 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2
50 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1.096069109761043,2
51 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2
52 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.3775120879448766,2
53 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,2
54 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.4550689930872934,2
55 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2
56 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.1530811393690448,2
57 | 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,2
58 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,2
59 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.096069109761043,1
60 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.01027243776720751,1
61 | 0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1
62 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1
63 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,1
64 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.1530811393690448,1
65 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,1
66 | 0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,1
67 | 0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1
68 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1
69 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.8428535187993775,3
70 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,-1.3081949496538785,3
71 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.8633983943337925,3
72 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-1.3081949496538785,3
73 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,-0.6877397085145439,3
74 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.3287398251882936,3
75 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
76 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,3
77 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,3
78 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,-0.8428535187993775,3
79 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,3
80 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,3
81 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,-0.22239827766004297,3
82 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,3
83 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.24294315319445797,3
84 | 0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.7652966136569607,2
85 | 1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,-0.4550689930872934,2
86 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,2
87 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.075524234226628,2
88 | 1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.5531707737641253,2
89 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,2
90 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,-0.610182803372127,2
91 | 0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,2
92 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.9979673290842112,2
93 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.22239827766004297,2
94 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,2
95 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.075524234226628,1
96 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7858414891913758,1
97 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1
98 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,1
99 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.3205000583368748,1
100 | 0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.47561386862170846,1
101 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.2306380445114617,1
102 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.087829342909624325,1
103 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1
104 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.928650190793213,1
105 | 1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.5326258982297103,3
106 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0.6307276789065421,3
107 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.3287398251882936,3
108 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.9204104239417944,2
109 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.7652966136569607,2
110 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1.2511829200458766,2
111 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,-0.8428535187993775,2
112 | 0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,-0.610182803372127,2
113 | 0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.16538624805204116,1
114 | 0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.6877397085145439,1
115 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3081949496538785,1
116 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,-0.9979673290842112,1
117 | 0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0.7082845840489589,1
118 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-1.3857518547962953,1
119 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.29995518280245975,1
120 | 0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.14484137251762613,1
121 | 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.5614105406155439,1
122 | 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.7940812560427943,1
123 | 0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,-0.06728446737520931,1
124 |
--------------------------------------------------------------------------------
/inst/extdata/test_schizo.csv:
--------------------------------------------------------------------------------
1 | target,gain_ratio_1,gain_ratio_2,gain_ratio_3,gain_ratio_4,gain_ratio_5,gain_ratio_6,gain_ratio_7,gain_ratio_8,gain_ratio_9,gain_ratio_10,gain_ratio_11,sex,y
2 | PS,0.879,0.864,0.804,0.65,0.74,0.766,0.866,0.817,0.879,0.733,0.845,female,non-schizophrenic
3 | PS,0.919,0.875,0.828,0.915,0.883,0.802,0.802,0.77,0.963,0.932,1.01,female,non-schizophrenic
4 | CS,0.829,0.753,0.774,0.716,0.776,0.793,0.738,0.731,0.76,0.636,0.642,female,non-schizophrenic
5 | PS,0.8425,0.829,0.828,0.741,0.831,0.832,0.665,0.816,0.819,0.73,0.816,female,non-schizophrenic
6 | PS,0.948,0.896,0.872,0.869,0.819,0.852,0.815,0.83,0.799,0.728,0.69,female,non-schizophrenic
7 | PS,0.862,0.881,0.874,0.874,0.835,0.814,0.825,0.772,0.711,0.716,0.726,female,non-schizophrenic
8 | PS,0.8425,0.829,0.952,0.83,0.831,0.98,0.827,0.892,0.962,0.836,0.816,female,non-schizophrenic
9 | PS,0.791,0.834,0.726,0.83,0.831,0.832,0.722,0.816,0.838,0.827,0.916,female,non-schizophrenic
10 | PS,0.872,0.829,0.867,0.919,0.795,0.756,0.854,0.945,0.842,0.82,0.816,female,non-schizophrenic
11 | PS,0.947,0.912,0.94,0.919,0.915,0.889,0.901,0.874,0.837,0.872,0.84,female,non-schizophrenic
12 | PS,0.88,0.829,0.798,0.822,0.77,0.815,0.803,0.816,0.767,0.82,0.797,female,non-schizophrenic
13 | PS,0.799,0.69,0.701,0.738,0.831,0.761,0.696,0.679,0.709,0.65,0.816,female,non-schizophrenic
14 | TR,0.8425,0.829,0.966,0.926,0.831,0.832,0.827,0.916,0.777,0.82,0.816,female,non-schizophrenic
15 | PS,0.8425,0.829,0.828,0.83,0.947,1.2,1.14,1.1,1.12,0.871,0.809,female,non-schizophrenic
16 | PS,0.896,0.874,0.893,0.944,0.933,0.941,0.892,0.893,0.84,0.82,0.829,female,non-schizophrenic
17 | PS,0.914,0.873,0.844,0.925,0.868,0.783,0.701,0.741,0.722,0.828,0.816,female,non-schizophrenic
18 | TR,0.8425,0.829,0.828,0.83,0.831,0.832,0.827,0.816,0.819,0.82,0.816,female,non-schizophrenic
19 | CS,0.807,0.811,0.787,0.728,0.803,0.832,0.827,0.816,0.819,0.82,0.816,female,non-schizophrenic
20 | PS,0.803,0.782,0.623,0.828,0.826,0.793,0.811,0.75,0.816,0.753,0.766,female,non-schizophrenic
21 | CS,0.939,0.841,0.901,0.917,0.896,0.921,0.899,0.804,0.894,0.846,0.902,female,non-schizophrenic
22 | PS,0.813,0.758,0.828,0.83,0.831,0.832,0.827,0.77,0.819,0.82,0.773,female,non-schizophrenic
23 | TR,0.697,0.617,0.759,0.83,0.6,0.604,0.619,0.592,0.819,0.82,0.679,female,non-schizophrenic
24 | TR,0.782,0.88,0.828,0.83,0.709,0.886,0.841,0.816,0.819,0.82,0.843,female,non-schizophrenic
25 | CS,1.03,1.01,1.02,0.964,1.05,1.01,0.985,0.964,1.01,1,1.01,male,non-schizophrenic
26 | PS,0.822,0.843,0.625,0.81,0.702,0.702,0.842,0.865,0.701,0.77,0.801,male,non-schizophrenic
27 | PS,0.863,0.913,0.743,0.86,0.803,0.889,0.924,0.87,0.872,0.859,0.84,male,non-schizophrenic
28 | PS,0.901,0.777,0.743,0.858,0.811,0.751,0.627,0.748,0.808,0.669,0.844,male,non-schizophrenic
29 | PS,0.81,0.735,0.664,0.826,0.767,0.604,0.669,0.87,0.817,0.59,0.835,male,non-schizophrenic
30 | TR,0.674,0.646,0.626,0.639,0.64,0.665,0.655,0.661,0.724,0.7,0.661,male,non-schizophrenic
31 | CS,1,0.958,0.938,1.02,0.956,0.909,1.04,0.902,0.956,0.939,0.954,male,non-schizophrenic
32 | PS,0.8425,0.829,0.828,0.83,0.924,0.924,1,0.986,0.962,1.02,0.991,male,non-schizophrenic
33 | PS,0.94,0.971,0.76,0.983,0.998,0.894,0.856,0.942,0.937,0.965,0.936,male,non-schizophrenic
34 | TR,0.8425,0.829,0.803,0.826,0.764,0.815,0.868,0.791,0.86,0.82,0.839,male,non-schizophrenic
35 | CS,0.894,0.954,0.939,0.938,0.9,0.936,0.944,0.884,0.93,0.885,0.846,male,non-schizophrenic
36 | CS,0.8425,0.829,0.711,0.83,0.78,0.832,0.775,0.68,0.819,0.858,0.662,male,non-schizophrenic
37 | CS,0.962,0.93,0.922,0.858,0.905,0.793,0.867,0.948,0.879,0.916,0.781,male,non-schizophrenic
38 | TR,0.757,0.756,0.811,0.709,0.714,0.743,0.745,0.816,0.819,0.82,0.813,male,non-schizophrenic
39 | PS,0.8425,0.93,0.906,1.01,0.933,0.832,0.862,0.816,0.819,0.82,0.816,male,non-schizophrenic
40 | CS,0.868,0.901,0.893,0.864,0.831,0.795,0.905,0.872,0.873,0.872,0.822,male,non-schizophrenic
41 | TR,0.8,0.76,0.815,0.759,0.828,0.77,0.769,0.789,0.73,0.766,0.85,male,non-schizophrenic
42 | PS,0.895,0.771,0.997,0.885,0.948,0.832,0.843,0.66,0.729,0.801,0.893,male,non-schizophrenic
43 | CS,0.643,0.829,0.828,0.83,0.831,0.832,0.626,0.816,0.819,0.82,0.816,male,non-schizophrenic
44 | TR,0.742,0.829,0.7,0.743,0.748,0.827,0.827,0.816,0.819,0.82,0.776,male,non-schizophrenic
45 | CS,0.767,0.822,0.828,0.798,0.806,0.766,0.767,0.816,0.82,0.876,0.756,male,non-schizophrenic
46 | PS,0.876,0.866,0.899,0.923,0.832,0.849,0.827,0.906,0.822,0.885,0.826,female,schizophrenic
47 | CS,0.836,0.944,0.889,0.909,0.863,0.838,0.844,0.784,0.819,0.82,0.816,female,schizophrenic
48 | PS,0.8425,0.857,0.828,0.798,0.831,0.832,0.757,0.742,0.819,0.82,0.816,female,schizophrenic
49 | TR,0.8425,0.829,0.682,0.651,0.672,0.832,0.827,0.604,0.819,0.82,0.816,female,schizophrenic
50 | CS,0.919,0.856,0.825,0.908,0.896,0.886,0.905,0.938,0.875,0.983,0.881,female,schizophrenic
51 | PS,0.911,0.927,0.798,0.938,0.899,0.952,0.925,0.851,0.953,0.761,0.952,female,schizophrenic
52 | PS,0.8425,0.829,0.613,0.44,0.831,0.832,0.827,0.816,0.819,0.82,0.816,female,schizophrenic
53 | PS,0.726,0.734,0.862,0.83,0.972,0.9,0.876,0.83,0.878,0.79,0.868,female,schizophrenic
54 | TR,0.756,0.871,0.712,0.897,0.785,0.789,0.724,0.798,0.581,0.672,0.636,female,schizophrenic
55 | PS,0.782,0.829,0.828,0.83,0.84,0.832,0.837,0.816,0.819,0.797,0.816,female,schizophrenic
56 | PS,0.937,0.776,0.857,0.899,0.955,0.929,0.827,0.89,0.819,0.818,0.945,female,schizophrenic
57 | TR,0.75,0.829,0.744,0.83,0.794,0.732,0.827,0.697,0.819,0.772,0.816,female,schizophrenic
58 | PS,0.8,0.866,0.915,0.911,0.9,0.886,0.837,0.848,0.896,0.755,0.861,female,schizophrenic
59 | TR,0.899,0.768,0.787,0.781,0.735,0.827,0.796,0.793,0.729,0.801,0.838,female,schizophrenic
60 | CS,0.83,0.828,0.697,0.731,0.817,0.687,0.778,0.612,0.668,0.755,0.754,male,schizophrenic
61 | PS,0.63,0.631,0.828,0.664,0.579,0.832,0.801,0.641,0.819,0.82,0.816,male,schizophrenic
62 | PS,0.691,0.709,0.828,0.83,0.831,0.687,0.639,0.667,0.669,0.695,0.545,male,schizophrenic
63 | PS,0.782,0.812,0.828,0.669,0.701,0.726,0.827,0.673,0.708,0.637,0.728,male,schizophrenic
64 | PS,0.932,0.783,0.809,0.837,0.744,0.794,0.767,0.71,0.622,0.569,0.562,male,schizophrenic
65 | PS,0.851,0.828,0.808,0.827,0.873,0.862,0.752,0.668,0.687,0.717,0.696,male,schizophrenic
66 | PS,0.73,0.729,0.828,0.704,0.831,0.692,0.637,0.581,0.819,0.654,0.816,male,schizophrenic
67 | TR,0.564,0.703,0.59,0.58,0.831,0.832,0.667,0.584,0.819,0.688,0.584,male,schizophrenic
68 | CS,0.779,0.707,0.705,0.785,0.58,0.746,0.715,0.551,0.799,0.668,0.779,male,schizophrenic
69 | PS,0.787,0.748,0.764,0.796,0.778,0.758,0.75,0.746,0.763,0.647,0.734,male,schizophrenic
70 | PS,0.8425,0.773,0.635,0.594,0.608,0.832,0.526,0.625,0.623,0.712,0.782,male,schizophrenic
71 | PS,0.927,0.854,0.828,0.83,1.01,0.955,0.916,0.957,0.905,0.855,0.947,male,schizophrenic
72 | CS,0.893,0.702,0.902,0.83,0.831,0.777,0.827,0.816,0.819,0.82,0.816,male,schizophrenic
73 | PS,0.8425,0.829,0.828,0.83,0.909,0.895,0.827,0.816,0.931,0.956,0.97,male,schizophrenic
74 | PS,0.8425,0.994,1.05,0.941,0.98,1.02,0.96,1.03,0.973,0.813,0.909,male,schizophrenic
75 | PS,0.8425,0.857,0.895,0.879,0.831,0.832,0.852,0.894,0.888,0.82,0.816,male,schizophrenic
76 | TR,0.8425,0.728,0.828,0.83,0.777,0.825,0.827,0.816,0.819,0.82,0.686,male,schizophrenic
77 | PS,0.776,0.956,0.944,0.928,0.85,0.925,0.942,0.9,0.945,0.919,0.898,male,schizophrenic
78 | CS,0.618,0.829,0.828,0.83,0.737,0.832,0.827,0.816,0.643,0.82,0.62,male,schizophrenic
79 | PS,0.8425,0.829,0.828,0.83,0.831,0.712,0.871,0.832,0.819,0.82,0.816,male,schizophrenic
80 | CS,0.956,0.825,0.953,0.825,0.916,0.92,0.964,0.903,0.868,0.945,0.895,male,schizophrenic
81 | CS,0.66,0.655,0.828,0.58,0.708,0.688,0.646,0.816,0.588,0.82,0.74,male,schizophrenic
82 | CS,0.782,0.779,0.72,0.787,0.763,0.755,0.784,0.764,0.754,0.789,0.753,male,schizophrenic
83 | PS,0.602,0.829,0.641,0.574,0.831,0.832,0.827,0.793,0.819,0.613,0.634,male,schizophrenic
84 | TR,0.684,0.579,0.509,0.496,0.436,0.558,0.564,0.816,0.819,0.82,0.259,male,schizophrenic
85 | CS,0.856,0.835,0.946,0.844,0.907,0.897,0.827,0.816,0.819,0.82,0.816,male,schizophrenic
86 |
--------------------------------------------------------------------------------
/inst/extdata/tictactoe_test.csv:
--------------------------------------------------------------------------------
1 | X1b,X1o,X1x,X2b,X2o,X2x,X3b,X3o,X3x,X4b,X4o,X4x,X5b,X5o,X5x,X6b,X6o,X6x,X7b,X7o,X7x,X8b,X8o,X8x,X9b,X9o,X9x,class
2 | 0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,positive
3 | 0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,positive
4 | 0,0,1,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,positive
5 | 0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,positive
6 | 0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,positive
7 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,positive
8 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,positive
9 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,positive
10 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,positive
11 | 0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,1,0,0,positive
12 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,positive
13 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,positive
14 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,positive
15 | 0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,positive
16 | 0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,positive
17 | 0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,positive
18 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,positive
19 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,positive
20 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,positive
21 | 0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,positive
22 | 0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,positive
23 | 0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,positive
24 | 0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,positive
25 | 0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,positive
26 | 0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,positive
27 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive
28 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,positive
29 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,positive
30 | 0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive
31 | 0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,positive
32 | 0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,0,1,positive
33 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,0,0,1,positive
34 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,positive
35 | 0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,positive
36 | 0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,positive
37 | 0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive
38 | 0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive
39 | 0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,positive
40 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,positive
41 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,positive
42 | 0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,positive
43 | 0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive
44 | 0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive
45 | 0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive
46 | 0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,positive
47 | 0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,positive
48 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive
49 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,positive
50 | 0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive
51 | 0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,positive
52 | 0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,positive
53 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,positive
54 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,positive
55 | 0,0,1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
56 | 0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,positive
57 | 0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,positive
58 | 0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,positive
59 | 0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,positive
60 | 0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,1,0,positive
61 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,positive
62 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,positive
63 | 0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,positive
64 | 0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive
65 | 0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive
66 | 0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,positive
67 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,positive
68 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,1,0,positive
69 | 0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,positive
70 | 0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive
71 | 0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,positive
72 | 0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,positive
73 | 0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,positive
74 | 0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,positive
75 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,positive
76 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
77 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,positive
78 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,positive
79 | 0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,positive
80 | 0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,positive
81 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive
82 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,positive
83 | 0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,positive
84 | 0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,0,0,1,positive
85 | 0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,positive
86 | 0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,positive
87 | 0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,positive
88 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive
89 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,positive
90 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,positive
91 | 0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
92 | 0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive
93 | 0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,positive
94 | 0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,positive
95 | 0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,1,positive
96 | 0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,positive
97 | 1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,positive
98 | 1,0,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,positive
99 | 1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,positive
100 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive
101 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,positive
102 | 1,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive
103 | 1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
104 | 1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,positive
105 | 1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,positive
106 | 1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,positive
107 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,positive
108 | 1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,positive
109 | 1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,positive
110 | 1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,positive
111 | 1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,positive
112 | 1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,1,positive
113 | 1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
114 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,positive
115 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,positive
116 | 1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,0,positive
117 | 1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,positive
118 | 1,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,positive
119 | 1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,positive
120 | 1,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,positive
121 | 1,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,positive
122 | 1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,positive
123 | 1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,positive
124 | 1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,positive
125 | 1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,positive
126 | 1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,1,positive
127 | 0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,negative
128 | 0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,negative
129 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,negative
130 | 0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,negative
131 | 0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,negative
132 | 0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,negative
133 | 0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative
134 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,negative
135 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,negative
136 | 0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1,negative
137 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,negative
138 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,negative
139 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,negative
140 | 0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,negative
141 | 0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,1,0,0,negative
142 | 0,0,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,negative
143 | 0,0,1,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,negative
144 | 0,0,1,1,0,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,negative
145 | 0,0,1,1,0,0,0,0,1,1,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative
146 | 0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,negative
147 | 0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,negative
148 | 0,0,1,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,negative
149 | 0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,negative
150 | 0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1,0,0,1,0,0,1,0,negative
151 | 0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,negative
152 | 0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,negative
153 | 0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,1,0,0,negative
154 | 0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,negative
155 | 0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,negative
156 | 0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,negative
157 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,negative
158 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,negative
159 | 0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,negative
160 | 0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,negative
161 | 0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,negative
162 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,negative
163 | 0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,negative
164 | 0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,1,0,0,0,1,0,0,1,negative
165 | 0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,negative
166 | 0,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,negative
167 | 0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,1,0,0,negative
168 | 0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,negative
169 | 0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,0,1,negative
170 | 0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,1,negative
171 | 0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,negative
172 | 0,1,0,1,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0,1,0,negative
173 | 0,1,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,negative
174 | 0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,negative
175 | 0,1,0,1,0,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0,1,0,1,0,negative
176 | 0,1,0,1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,negative
177 | 1,0,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,negative
178 | 1,0,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,negative
179 | 1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,negative
180 | 1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,negative
181 | 1,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,1,negative
182 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,1,negative
183 | 1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,1,0,1,0,1,0,0,negative
184 | 1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,negative
185 | 1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,1,0,0,negative
186 | 1,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1,0,0,0,1,0,negative
187 | 1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,1,0,0,negative
188 | 0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,negative
189 | 0,0,1,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,negative
190 | 0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,0,1,0,negative
191 | 0,1,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,0,0,0,1,negative
192 | 0,1,0,0,0,1,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,1,negative
193 |
--------------------------------------------------------------------------------
/man/autoRLearn.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/autoRLearn.R
3 | \name{autoRLearn}
4 | \alias{autoRLearn}
5 | \title{Run smartML function for automatic Supervised Machine Learning.}
6 | \usage{
7 | autoRLearn(
8 | maxTime,
9 | directory,
10 | testDirectory,
11 | classCol = "class",
12 | metric = "acc",
13 | vRatio = 0.3,
14 | preProcessF = c("standardize", "zv"),
15 | featuresToPreProcess = c(),
16 | nComp = NA,
17 | nModels = 5,
18 | option = 2,
19 | featureTypes = c(),
20 | interp = FALSE,
21 | missingOpr = FALSE,
22 | balance = FALSE
23 | )
24 | }
25 | \arguments{
26 | \item{maxTime}{Float numeric of the maximum time budget, in minutes, for reading the dataset, preprocessing, calculating meta-features, algorithm selection, and hyper-parameter tuning (excluding model interpretability) - applicable in case of option = 2 only.}
27 |
28 | \item{directory}{String character of the training dataset directory (SmartML accepts arff files, or csv files with column headers).}
29 |
30 | \item{testDirectory}{String character of the testing dataset directory (SmartML accepts arff files, or csv files with column headers).}
31 |
32 | \item{classCol}{String Character of the name of the class label column in the dataset (default = 'class').}
33 |
34 | \item{metric}{Metric string character to be used in evaluation:
35 | \itemize{
36 | \item "acc" - Accuracy,
37 | \item "avg-fscore" - Average of F-Score of each label,
38 | \item "avg-recall" - Average of Recall of each label,
39 | \item "avg-precision" - Average of Precision of each label,
40 | \item "fscore" - Micro-Average of F-Score of each label,
41 | \item "recall" - Micro-Average of Recall of each label,
42 | \item "precision" - Micro-Average of Precision of each label.
43 | }}
44 |
45 | \item{vRatio}{Float numeric of the validation set ratio that should be split out of the training set for the evaluation process (default = 0.3 --> 30\%).}
46 |
47 | \item{preProcessF}{Vector of string characters containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv')):
48 | \itemize{
49 | \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features,
50 | \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a BoxCox, but values can be negative,
51 | \item "zv" - remove attributes with a zero variance (all the same value),
52 | \item "center" - subtract mean from values,
53 | \item "scale" - divide values by standard deviation,
54 | \item "standardize" - perform both centering and scaling,
55 | \item "normalize" - normalize values,
56 | \item "pca" - transform data to the principal components,
57 | \item "ica" - transform data to the independent components.
58 | }}
59 |
60 | \item{featuresToPreProcess}{Vector of the column numbers of the features to apply the feature preprocessing on - an empty vector means that all features in the dataset file are included (default = c()). This vector should be a subset of \code{selectedFeats}.}
61 |
62 | \item{nComp}{Integer numeric of the number of components needed if either the "pca" or "ica" feature preprocessor is used.}
63 |
64 | \item{nModels}{Integer numeric representing the number of classifier algorithms that you want to select based on Meta-Learning and start to tune using Bayesian Optimization (default = 5).}
65 |
66 | \item{option}{Integer numeric: 1 = classifier algorithm selection only, 2 = algorithm selection plus hyper-parameter tuning (default = 2).}
67 |
68 | \item{featureTypes}{Vector of either 'numerical' or 'categorical' representing the types of features in the dataset (default = c() --> any factor or character features will be considered as categorical otherwise numerical).}
69 |
70 | \item{interp}{Boolean representing whether model interpretability (feature importance and interaction) is needed or not (default = FALSE). This option consumes more of the time budget if set to TRUE.}
71 |
72 | \item{missingOpr}{Boolean variable: FALSE applies median/mode imputation for instances with missing values; TRUE applies imputation using the "MICE" library, which imputes missing values with plausible values drawn from a distribution designed for each missing data point.}
73 |
74 | \item{balance}{Boolean variable representing whether SMOTE class balancing is required or not (default = FALSE).}
75 | }
76 | \value{
77 | List of Results
78 | \itemize{
79 | \item "option=1" - Choosen Classifier Algorithms Names \code{clfs} with their parameters configurations \code{params}, Training DataFrame \code{TRData}, Test DataFrame \code{TEData} in case of \code{option=2},
80 | \item "option=2" - Best classifier algorithm name found \code{clfs} with its parameters configuration \code{params}, , Training DataFrame \code{TRData}, Test DataFrame \code{TEData}, model variable \code{model}, predicted values on test set \code{pred}, performance on TestingSet \code{perf}, and Feature Importance \code{interpret$featImp} / Interaction \code{interpret$Interact} plots in case of interpretability \code{interp} = TRUE and chosen model is not knn.
81 | }
82 | }
83 | \description{
84 | Run the smartML main function for automatic classifier algorithm selection, and hyper-parameter tuning.
85 | }
86 | \examples{
87 | \dontrun{
88 | autoRLearn(1, 'sampleDatasets/car/train.arff',
89 |   'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize')
90 |
91 | result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff')
92 | }
93 |
94 | }
95 |
--------------------------------------------------------------------------------
/man/autoRLearn_.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/autoRLearn_.R
3 | \name{autoRLearn_}
4 | \alias{autoRLearn_}
5 | \title{Advanced version of autoRLearn.}
6 | \usage{
7 | autoRLearn_(
8 | df_train,
9 | df_test,
10 | maxTime = 10,
11 | models = c("randomForest", "naiveBayes", "boosting", "l2-linear-classifier", "svm"),
12 | optimizationAlgorithm = "hyperband",
13 | bw = 3,
14 | kde_type = "single",
15 | max_iter = 81,
16 | metric = "acc"
17 | )
18 | }
19 | \arguments{
20 | \item{df_train}{Dataframe of the training dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class".}
21 |
22 | \item{df_test}{Dataframe of the test dataset. Assumes it is in perfect shape with all numeric variables and factor response variable named "class".}
23 |
24 | \item{maxTime}{Float representing the maximum time the algorithm should be run (in minutes).}
25 |
26 | \item{models}{List of strings denoting which algorithms to use for the process:
27 | \itemize{
28 | \item "randomForest" - Random forests using the randomForest package
29 | \item "ranger - Random forests using the ranger package (unstable)
30 | \item "naiveBayes" - Naive bayes using the fastNaiveBayes package
31 | \item "boosting" - Gradient boosting using xgboost
32 | \item "l2-linear-classifier" - Linear primal Support vector machine from LibLinear
33 | \item "svm" - RBF kernel svm from e1071
34 | }}
35 |
36 | \item{optimizationAlgorithm}{- String of which hyperparameter tuning algorithm to use:
37 | \itemize{
38 | \item "hyperband" - Hyperband with uniformly initiated parameters
39 | \item "bohb" - Hyperband with bayesian optimization as described on F. Hutter et al 2018 paper BOHB. Has extra parameters bw and kde_type
40 | }}
41 |
42 | \item{bw}{- (only applies to BOHB) Double representing how much the KDE bandwidth should be widened. Higher values allow the algorithm to explore more hyperparameter combinations}
43 |
44 | \item{kde_type}{- (only applies to BOHB) String representing whether a model's hyperparameters should be tuned independently of each other or have their probability densities multiplied:
45 | \itemize{
46 | \item "single" - each hyperparameter has its own expected improvement calculated
47 | \item "mixed" - all hyperparameters' probabilty densities are multiplied and only one mixed expected improvement is calculated
48 | }}
49 |
50 | \item{max_iter}{- (affects both hyperband and BOHB) Integer representing the maximum number of iterations that one successive halving run can have}
51 |
52 | \item{metric}{String of the evaluation metric to be used in the model performance optimization:
53 | \itemize{
54 | \item "acc" - Accuracy,
55 | \item "avg-fscore" - Average of F-Score of each label,
56 | \item "avg-recall" - Average of Recall of each label,
57 | \item "avg-precision" - Average of Precision of each label,
58 | \item "fscore" - Micro-Average of F-Score of each label,
59 | \item "recall" - Micro-Average of Recall of each label,
60 | \item "precision" - Micro-Average of Precision of each label.
61 | }}
62 | }
63 | \value{
64 | List of Results
65 | \itemize{
66 | \item \code{perf} - Evaluated metric of the best performing model on the test data
67 | \item \code{pred} - prediction on the test data using the best model
68 | \item \code{model} - best model object
69 | \item \code{best_models} - table with the best hyperparameters found for the selected models.
70 | }
71 | }
72 | \description{
73 | Tunes the hyperparameters of the desired algorithm/s using either hyperband or BOHB.
74 | }
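% A minimal usage sketch, added for illustration only (the rest of this file is roxygen-generated).
% The CSV paths below are hypothetical; the data must already be numeric with a factor column named "class".
\examples{
\dontrun{
train <- read.csv('path/to/train.csv')  # hypothetical training file
test  <- read.csv('path/to/test.csv')   # hypothetical test file
train$class <- as.factor(train$class)
test$class  <- as.factor(test$class)
result <- autoRLearn_(train, test, maxTime = 10,
                      models = c("randomForest", "svm"),
                      optimizationAlgorithm = "hyperband")
result$perf        # evaluated metric of the best model on the test data
result$best_models # best hyperparameters found for each selected model
}
}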
75 |
--------------------------------------------------------------------------------
/man/datasetReader.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/datasetReader.R
3 | \name{datasetReader}
4 | \alias{datasetReader}
5 | \title{Read Dataset File into Memory.}
6 | \usage{
7 | datasetReader(
8 | directory,
9 | testDirectory,
10 | selectedFeats = c(),
11 | classCol = "class",
12 | preProcessF = "N",
13 | featuresToPreProcess = c(),
14 | nComp = NA,
15 | missingVal = c("NA", "?", " "),
16 | missingOpr = 0
17 | )
18 | }
19 | \arguments{
20 | \item{directory}{String of the directory to the file containing the training dataset.}
21 |
22 | \item{testDirectory}{String of the directory to the file containing the testing dataset.}
23 |
24 | \item{selectedFeats}{Vector of the column numbers of the features to include from the training set, ignoring the rest of the columns - an empty vector means that all features in the dataset file are included (default = c()).}
25 |
26 | \item{classCol}{String of the name of the class label column in the dataset (default = 'class').}
27 |
28 | \item{preProcessF}{string containing the name of the preprocessing algorithm (default = 'N' --> no preprocessing):
29 | \itemize{
30 | \item "boxcox" - apply a Box–Cox transform and values must be non-zero and positive in all features,
31 | \item "yeo-Johnson" - apply a Yeo-Johnson transform, like a BoxCox, but values can be negative,
32 | \item "zv" - remove attributes with a zero variance (all the same value),
33 | \item "center" - subtract mean from values,
34 | \item "scale" - divide values by standard deviation,
35 | \item "standardize" - perform both centering and scaling,
36 | \item "normalize" - normalize values,
37 | \item "pca" - transform data to the principal components,
38 | \item "ica" - transform data to the independent components.
39 | }}
40 |
41 | \item{featuresToPreProcess}{Vector of the column numbers of the features to apply the feature preprocessing on - an empty vector means that all features in the dataset file are included (default = c()). This vector should be a subset of \code{selectedFeats}.}
42 |
43 | \item{nComp}{Integer of the number of components needed if either the "pca" or "ica" feature preprocessor is used.}
44 |
45 | \item{missingVal}{Vector of strings representing the missing values in dataset (default: c('NA', '?', ' ')).}
46 |
47 | \item{missingOpr}{Boolean variable: 0 deletes instances with missing values (default); 1 applies imputation using the "MICE" library, which imputes missing values with plausible values drawn from a distribution designed for each missing data point.}
48 | }
49 | \value{
50 | List of the TrainingSet \code{Train} and TestingSet \code{Test}.
51 | }
52 | \description{
53 | Read the file of the training and testing dataset, and perform preprocessing and data cleaning if necessary.
54 | }
55 | \examples{
56 | \dontrun{
57 | dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv')
58 | }
59 | }
60 |
--------------------------------------------------------------------------------
/man/metafeatures.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/man/metafeatures.pdf
--------------------------------------------------------------------------------
/man/runClassifier.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/runClassifier.R
3 | \name{runClassifier}
4 | \alias{runClassifier}
5 | \title{Fit a classifier model.}
6 | \usage{
7 | runClassifier(
8 | trainingSet,
9 | validationSet,
10 | params,
11 | classifierAlgorithm,
12 | metric = "acc",
13 | interp = 0
14 | )
15 | }
16 | \arguments{
17 | \item{trainingSet}{Dataframe of the training set.}
18 |
19 | \item{validationSet}{Dataframe of the validation Set.}
20 |
21 | \item{params}{A string character of parameter configuration values for the current classifier to be tuned (parameters are separated by #) and can be obtained from \code{params} out of resulted list after running \code{autoRLearn} function.}
22 |
23 | \item{classifierAlgorithm}{String character of the name of the classifier algorithm to use:
24 | \itemize{
25 | \item "svm" - Support Vector Machines from e1071 package,
26 | \item "naiveBayes" - naiveBayes from e1071 package,
27 | \item "randomForest" - randomForest from randomForest package,
28 | \item "lmt" - LMT Weka classifier trees from RWeka package,
29 | \item "lda" - Linear Discriminant Analysis from MASS package,
30 | \item "j48" - J48 Weka classifier Trees from RWeka package,
31 | \item "bagging" - Bagging Classfier from ipred package,
32 | \item "knn" - K nearest Neighbors from FNN package,
33 | \item "nnet" - Simple neural net from nnet package,
34 | \item "C50" - C50 decision tree from C5.0 pacakge,
35 | \item "rpart" - rpart decision tree from rpart package,
36 | \item "rda" - regularized discriminant analysis from klaR package,
37 | \item "plsda" - Partial Least Squares And Sparse Partial Least Squares Discriminant Analysis from caret package,
38 | \item "glm" - Fitting Generalized Linear Models from stats package,
39 | \item "deepboost" - deep boost classifier from deepboost package.
40 | }}
41 |
42 | \item{metric}{Metric string character to be used in evaluation:
43 | \itemize{
44 | \item "acc" - Accuracy,
45 | \item "avg-fscore" - Average of F-Score of each label,
46 | \item "avg-recall" - Average of Recall of each label,
47 | \item "avg-precision" - Average of Precision of each label,
48 | \item "fscore" - Micro-Average of F-Score of each label,
49 | \item "recall" - Micro-Average of Recall of each label,
50 | \item "precision" - Micro-Average of Precision of each label
51 | }}
52 |
53 | \item{interp}{Boolean representing if interpretability is required or not (Default = 0).}
54 | }
55 | \value{
56 | List of performance on validationSet named \code{perf}, model fitted on trainingSet named \code{m}, predictions on test set \code{pred}, and interpretability plots named \code{interpret} in case of interp = 1
57 | }
58 | \description{
59 | Run the classifier on a training set and measure performance on a validation set.
60 | }
61 | \examples{
62 | \dontrun{
63 | result1 <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff')
64 | dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv')
65 | result2 <- runClassifier(dataset$Train, dataset$Test, result1$params, result1$clfs)
66 | }
67 |
68 | }
69 |
--------------------------------------------------------------------------------
/man/supportedAlgorithms.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/man/supportedAlgorithms.pdf
--------------------------------------------------------------------------------
/manual.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/manual.pdf
--------------------------------------------------------------------------------
/save_jsons.R:
--------------------------------------------------------------------------------
1 | library(purrr)
2 | library(stringr)
3 | library(jsonlite)
4 | library(devtools)
5 |
6 |
7 | files <- dir(path <- "inst/extdata/hyperband_jsons", pattern = "*.json")
8 | names_clf <- files %>%
9 | map_chr(~ str_remove(.x, pattern = ".json"))
10 | paths <- file.path(path, files)
11 | jsons <- paths %>%
12 | map(.f = ~ jsonlite::fromJSON(txt = .x, flatten = T))
13 | names(jsons) <- names_clf
14 |
15 | ## Then:
16 |
17 | save(jsons, file = "R/sysdata.rda")
18 | save(jsons, file = "sysdata.rda")
19 | load("sysdata.rda")
20 |
--------------------------------------------------------------------------------
/sysdata.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DataSystemsGroupUT/SmartML/e58b5bddb0fbf741e16f31651a282146143e78fe/sysdata.rda
--------------------------------------------------------------------------------
/test_rmarkdown/new_tests.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "new_tests"
3 | author: "rucy"
4 | date: "9/22/2020"
5 | output: html_document
6 | ---
7 |
8 | ```{r setup, include=FALSE}
9 | knitr::opts_chunk$set(echo = TRUE)
10 | ```
11 |
12 | ## R Markdown
13 |
14 | ```{r}
15 |
16 | library(R.utils)
17 | library(mlr3)
18 | library(mlr3learners)
19 | library(readr)
20 | library(data.table)
21 | library(purrr)
22 | library(stringr)
23 | library(jsonlite)
24 | library(tictoc)
25 |
26 | ## If you change any of the jsons
27 |
28 | ## Do this:
29 |
30 | files <- dir(path <- "~/school_stuff/schoolwork/witchcraft/inst/extdata/hyperband_jsons", pattern = "*.json")
31 |
32 | names_clf <- files %>%
33 | map_chr(~ str_remove(.x, pattern = ".json"))
34 |
35 | paths <- file.path(path, files)
36 |
37 | jsons <- paths %>%
38 | map(.f = ~ fromJSON(txt = .x, flatten = T))
39 |
40 | names(jsons) <- names_clf
41 |
42 | ## Then:
43 |
44 | ## save(jsons, file = "~/school_stuff/schoolwork/witchcraft/sysdata.rda")
45 |
46 | # load("~/school_stuff/schoolwork/witchcraft/R/sysdata.rda")
47 |
48 | ## Do this ^^
49 |
50 | param_sample <- function(model, hparam, columns = NULL) {
51 |
52 | param <- jsons[[model]][[hparam]]
53 |
54 | type <- param$type
55 |
56 | type_scale <- param$scale
57 |
58 | if(type == "discrete") {
59 |
60 | param_estimation <- paste("'", base::sample(x = as.list(param$values), size = 1), "'", sep = "")
61 |
62 | return(param_estimation)
63 |
64 | }
65 |
66 | else {
67 |
68 | int_val <- ifelse(hparam == "mtry", as.numeric(columns) - 1, as.numeric(param$maxVal))
69 |
70 | param_estimation <- fcase(type_scale == "int", rdunif(1, a = as.numeric(param$minVal),
71 | b = int_val),
72 | type_scale == "any", runif(1, min = as.numeric(param$minVal),
73 | max = as.numeric(param$maxVal)),
74 | type_scale == "double", runif(1, min = as.numeric(param$minVal),
75 | max = as.numeric(param$maxVal)),
76 | type_scale == "exp", runif(1, min = 2^as.numeric(param$minVal),
77 | max = 2^as.numeric(param$maxVal)))
78 |
79 | return(param_estimation)
80 |
81 | }
82 |
83 | }
84 |
85 | get_random_hp_config <- function(model, columns = NULL) {
86 |
87 | param_db <- jsons[[model]]
88 |
89 | params_list <- param_db$params
90 |
91 | params_list_mapped <- map(.x = params_list,
92 | .f = as_mapper( ~ param_sample(model = model, hparam = .x, columns = columns)))
93 |
94 | `names<-`(params_list_mapped, params_list)
95 |
96 | }
97 |
98 | data_load <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_train.csv")
99 |
100 | data_model <- data_load %>%
101 | as.data.table()
102 |
103 | data_model[, class := factor(class, levels = unique(class)) %>% sort()]
104 |
105 | ```
106 |
107 | ### New successive halving
108 |
109 | ```{r}
110 |
111 | library(data.table)
112 |
113 | successive_halving <- function(df, model, params_config, n = 81, r = 1, eta = 3, max_iter = 81, s_max = 4, evaluations = data.frame()) {
114 |
115 | final_df <- params_config
116 |
117 | task <- TaskClassif$new(id = "sh", backend = df, target = "class")
118 |
119 | param_number <- length(params_config)
120 |
121 | for (k in 0:s_max) {
122 |
123 | gc()
124 |
125 | n_i = n * (eta ** -k)
126 |
127 | r_i = r * (eta ** k)
128 |
129 | r_p = r_i / max_iter
130 |
131 | min_train_datapoints = (length(unique(df$class)) * 3) + 1
132 |
133 | min_prob_datapoints = min_train_datapoints / nrow(df) # share of rows needed for a minimal training split (nrow of the data, not of a single column)
134 |
135 | train_idxs <- sample(task$nrow, task$nrow * max(min(r_p, 0.8), min_prob_datapoints))
136 | test_idxs <- setdiff(seq_len(task$nrow), train_idxs)
137 |
138 | learners <- replicate(n = n_i, expr = {lrn(paste("classif", sep = ".", model))})
139 |
140 | j = 1
141 | for (i in learners) {
142 |
143 | i$param_set$values = final_df[[j]]
144 |
145 | j = j + 1
146 |
147 | }
148 |
149 | for (l in learners) {
150 |
151 | l$train(task = task, row_ids = train_idxs)
152 |
153 | }
154 |
155 | measure <- msr("classif.acc")
156 |
157 | preds <- map(.x = learners, .f = ~ .x$predict(task, row_ids = test_idxs)$score(measure))
158 |
159 | final_df <- final_df %>%
160 | as.data.table() %>%
161 | t() %>%
162 | `colnames<-`(value = jsons[[model]]$params) %>%
163 | as.data.table()
164 |
165 | final_df[, acc := unlist(preds)]
166 |
167 | final_df[, budget := r_i]
168 |
169 | final_df[, budget := r_p]
170 |
171 | setorder(final_df, -acc)
172 |
173 | evaluations <- rbindlist(list(evaluations, final_df))
174 |
175 | final_df <- final_df %>%
176 | head(max(n_i/eta, 1))
177 |
178 | if(k == s_max){
179 |
180 | return(list("answer" = final_df, "sh_runs" = evaluations))
181 |
182 | }
183 |
184 | final_df$acc = NULL
185 | final_df$budget = NULL
186 |
187 | final_df <- purrr::transpose(final_df)
188 |
189 | }
190 | }
191 |
192 | test_param_sampling <- replicate(81, get_random_hp_config("xgboost", columns = ncol(data_model)), simplify = FALSE)
193 |
194 | test_sh <- successive_halving(df = data_model, model = "xgboost", params_config = test_param_sampling)
195 | ```
196 |
197 | ### New hyperbandito
198 |
199 | ```{r}
200 |
201 | calc_n_r = function(max_iter = 81, eta = 3, s = 4, B = 405) {
202 |
203 | n = trunc(ceiling(trunc(B/max_iter/(s+1)) * eta**s))
204 |
205 | r = max_iter * eta^(-s)
206 |
207 | ans = c(n, r)
208 |
209 | ans
210 |
211 | }
212 |
213 |
214 | hyperband <- function(df, model, max_iter = 81, eta = 3, maxtime = 1000) {
215 |
216 | logeta = as_mapper(~ log(.x) / log(eta))
217 |
218 | s_max = trunc(logeta(max_iter))
219 |
220 | B = (s_max + 1) * max_iter
221 |
222 | nrs = map_dfc(s_max:0, .f = ~ calc_n_r(max_iter, eta, .x, B)) %>%
223 | t() %>%
224 | `colnames<-`(value = c("n", "r")) %>%
225 | as.data.table()
226 |
227 | nrs$s = s_max:0
228 |
229 | partial_halving <- function(n, r, s) {
230 |
231 | successive_halving(df = df,
232 | model = model,
233 | params_config = replicate(n, get_random_hp_config(model, columns = ncol(df) - 1), simplify = FALSE),
234 | n = n,
235 | r = r,
236 | s_max = s,
237 | max_iter = max_iter,
238 | eta = eta)
239 |
240 | }
241 |
242 | tryCatch(expr = {withTimeout(expr = {
243 |
244 | liszt = vector(mode = "list",
245 | length = max(nrs$s) + 1)
246 |
247 | for (row in 1:nrow(nrs)) {
248 |
249 | liszt[[row]] <- partial_halving(nrs[[row, 1]],
250 | nrs[[row, 2]],
251 | nrs[[row, 3]])
252 |
253 | }
254 | }, timeout = maxtime, cpu = maxtime)},
255 |
256 | TimeoutException = function(ex) {
257 |
258 | print("Budget ended.")
259 |
260 | return(liszt)
261 |
262 | },
263 |
264 | # 'finally' takes an expression (not a handler function); it runs after the tryCatch either way
265 | finally = {
266 | 
267 | print("Hyperband successfully finished.")
268 | 
269 | },
270 | 
271 | error = function(ex) {
272 | 
273 | print(paste("Error found, replace ", model, sep = ""))
274 | 
275 | print(conditionMessage(ex)) # message of the condition actually caught
276 | 
277 | liszt # 'break' is invalid outside a loop; return whatever results were collected so far
278 | 
279 | })
280 |
281 | return(liszt)
282 |
283 | }
284 |
285 | tezt_hyperband = hyperband(df = data_model, model = "xgboost", maxtime = 120)
286 | ```
287 |
288 | Evocation test
289 |
290 | ```{r}
291 |
292 | evocate <- function(df_train, df_test, maxTime = 10, models = "xgboost", optimizationAlgorithm = "hyperband", bw = 3, max_iter = 81, kde_type = "single") {
293 |
294 | total_time = maxTime * 60
295 |
296 | parameters_per_model <- map_int(models, .f = ~ length(jsons[[.x]]$params))
297 |
298 | times = (parameters_per_model * total_time) / (sum(parameters_per_model))
299 |
300 | print("Time distribution:")
301 | print(times)
302 | print("Models selected:")
303 | print(models)
304 |
305 | run_optimization = function(model, time) {
306 |
307 | results = NULL
308 |
309 | priors = data.frame()
310 |
311 | tic(model, "optimization time:")
312 |
313 | if(optimizationAlgorithm == "hyperband") {
314 |
315 | current <- Sys.time() %>% as.integer()
316 |
317 | end <- (Sys.time() %>% as.integer()) + time
318 |
319 | repeat {
320 |
321 | gc(verbose = F)
322 |
323 | tic("current hyperband runtime")
324 |
325 | print(paste("started", model))
326 |
327 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
328 |
329 | print(paste("There are:", time_left, "seconds left for this hyperband run"))
330 |
331 | res <- hyperband(df = df_train, model = model, max_iter = max_iter, maxtime = time_left)
332 |
333 | if(is_empty(purrr::flatten(res)) == F) {
334 |
335 | res <- res %>%
336 | map_dfr(.f = ~ .x[["answer"]]) %>%
337 | as.data.table()
338 |
339 | setorder(res, -acc)
340 |
341 | res <- res %>% head(1)
342 |
343 | results <- c(list(res), results)
344 |
345 | print(res)
346 |
347 | print(paste('Best accuracy from hyperband this round: ', res$acc))
348 |
349 | }
350 |
351 | elapsed <- (Sys.time() %>% as.integer()) - current
352 |
353 | if(elapsed >= time) {
354 |
355 | break
356 |
357 | }
358 |
359 | }
360 |
361 | }
362 |
363 | else if(optimizationAlgorithm == "bohb") {
364 |
365 | current <- Sys.time() %>% as.integer()
366 |
367 | end <- (Sys.time() %>% as.integer()) + time
368 |
369 | repeat {
370 |
371 | gc(verbose = F)
372 |
373 | tic("current bohb time")
374 |
375 | print(paste("started", model))
376 |
377 | time_left <- max(end - (Sys.time() %>% as.integer()), 1)
378 |
379 | print(paste("There are:", time_left, "seconds left for this bohb run"))
380 |
381 | res <- bohb(df = df_train, model = model, bw = bw, max_iter = max_iter, maxtime = time_left, priors = priors, kde_type = kde_type)
382 |
383 | if(is_empty(flatten(res)) == F) {
384 |
385 | priors <- res %>%
386 | map_dfr(.f = ~ .x[["sh_runs"]])
387 |
388 | res <- res %>%
389 | map_dfr(.f = ~ .x[["answer"]]) %>%
390 | arrange(desc(acc)) %>%
391 | head(1)
392 |
393 | results <- c(list(res), results)
394 |
395 | print(paste('Best accuracy from bohb this round: ', res$acc))
396 |
397 | }
398 |
399 | elapsed <- (Sys.time() %>% as.integer()) - current
400 |
401 | if(elapsed >= time) {
402 |
403 | break
404 |
405 | }
406 |
407 | }
408 |
409 |
410 | }
411 |
412 | else {
413 |
414 | # errorCondition() only creates a condition object without signalling it, and 'break' is invalid outside a loop; stop() raises the error directly
415 | stop("Only hyperband and bohb are valid optimization algorithms at this moment.")
416 | 
417 |
418 | }
419 |
420 | toc()
421 |
422 | results
423 |
424 | }
425 |
426 | print("Finished all optimizations.")
427 |
428 | ans = vector(mode = "list", length = length(models))
429 |
430 | for(i in 1:length(models)) {
431 |
432 | flag <- TRUE
433 |
434 | tryCatch(expr = {
435 |
436 | ans[[i]] <- run_optimization(models[[i]], times[[i]])
437 |
438 | }, error = function(e) {
439 |
440 | print("Error spotted, going to the next model")
441 |
442 | flag <<- FALSE
443 |
444 | })
445 |
446 | if (!flag) next
447 |
448 | }
449 |
450 | return(ans)
451 |
452 | ### TO DO - add the final model evaluation.
453 | ### with your cross validation ideas and etc.
454 |
455 | }
456 |
457 |
458 | ```
459 |
460 | ```{r}
461 |
462 | data_train <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_train.csv") %>% as.data.table()
463 | data_test <- read_csv(file = "~/school_stuff/schoolwork/witchcraft/inst/extdata/ta_test.csv") %>% as.data.table()
464 |
465 | data_train[, class := factor(class, levels = unique(class)) %>% sort()]
466 | data_test[, class := factor(class, levels = unique(class)) %>% sort()]
467 |
468 | tezt <- evocate(data_train, data_test, maxTime = 2, models = "xgboost")
469 |
470 | ```
471 |
--------------------------------------------------------------------------------
/testing.R:
--------------------------------------------------------------------------------
1 | # Title : Testing the Main Package Function
2 | # Objective : Package Testing
3 | # Created by: s-moh
4 | # Created on: 11/12/2020
5 | library(SmartML)
6 | library(tidyverse)
7 | library(R.utils)
8 | library(mlr)
9 | library(mlr3)
10 | library(mlr3learners)
11 | library(mlr3pipelines)
12 | library(mlr3filters)
13 | library(readr)
14 | library(data.table)
15 | library(stringr)
16 | library(jsonlite)
17 | library(tictoc)
18 |
19 | #################################################################################################
20 | # Classification
21 |
22 | "lrn1 <- lrn('classif.rpart', predict_type = 'prob')
23 | lrn2 <- lrn('classif.ranger', predict_type = 'prob')
24 | lrn3 <- lrn('classif.svm', predict_type = 'prob')
25 |
26 | rpart_cv1 = po('learner_cv', lrn1, id = 'lrn1')
27 | ranger_cv1 = po('learner_cv', lrn2, id = 'lrn2')
28 | svm_cv1 = po('learner_cv', lrn3, id = 'lrn3')
29 | lrns = c(rpart_cv1, ranger_cv1, svm_cv1)
30 |
31 | level0 = gunion(list(
32 | lrns)) %>>%
33 | po('featureunion', id = 'union1')
34 |
35 | ensemble = level0 %>>% LearnerClassifAvg$new(id = 'classif.avg')
36 | ensemble$plot(html = FALSE)
37 |
38 | ens_lrn = GraphLearner$new(ensemble)
39 | ens_lrn$predict_type = 'prob'
40 |
41 | task = mlr_tasks$get('iris')
42 | train.idx = sample(seq_len(task$nrow), 120)
43 | test.idx = setdiff(seq_len(task$nrow), train.idx)
44 |
45 | perf <- ens_lrn$train(task, train.idx)$predict(task, test.idx)$score(msr('classif.acc'))
46 | print(perf)"
47 |
48 | #################################################################################################
49 |
50 | data_train <- readr::read_csv('inst/extdata/tictactoe_train.csv') %>%
51 | as.data.table()
52 |
53 | data_test <- readr::read_csv('inst/extdata/tictactoe_test.csv') %>%
54 | as.data.table()
55 |
56 | data_train[, class := factor(class, levels = unique(class)) %>% sort()]
57 | data_test[, class := factor(class, levels = unique(class)) %>% sort()]
58 |
59 | opt <- SmartML::evocate(df_train = data_train,
60 | df_test = data_test,
61 | models = c('rpart', 'ranger', 'svm'),
62 | #'svm(done)', 'kknn(done)', 'ranger(done)', 'rpart(done)',
63 | #'xgboost(done)', 'cv_glmnet(done)', 'naive_bayes(done)'
64 | optimizationAlgorithm = 'hyperband',
65 | maxTime = 5, ensemble_size = 3)
66 |
67 | print(opt)
68 | gc()
69 |
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(SmartML)
3 |
4 | test_check("SmartML")
5 |
--------------------------------------------------------------------------------
/tests/testthat/test-autorlearn.R:
--------------------------------------------------------------------------------
1 | context("test-autorlearn")
2 |
3 | test_that("option1", {
4 | result1 <- autoRLearn(1, system.file("extdata", "shuttle/train.arff", package = "SmartML"), system.file("extdata", "shuttle/train.arff", package = "SmartML"), option = 1, preProcessF = 'pca', nComp = 3, nModels = 2)
5 | result1$clfs #Vector of recommended nModels classifiers
6 | result1$params #Vector of initial suggested parameter configurations of nModels recommended classifiers
7 | })
8 |
9 |
--------------------------------------------------------------------------------
/tests/testthat/test-hyperband_test.R:
--------------------------------------------------------------------------------
1 | test_that("Parameter sampling works", {
2 | expect_length(param_sample("ranger", "mtry", columns = 11), 1)
3 | })
4 |
--------------------------------------------------------------------------------
/vignettes/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 |
--------------------------------------------------------------------------------
/vignettes/introduction.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Introduction to SmartML: Automatic Supervised Machine Learning in R"
3 | author: "Mohamed Maher - Data Systems Group @ University of Tartu"
4 | output: rmarkdown::html_vignette
5 | fig_width: 10
6 | fig_height: 10
7 | vignette: >
8 | %\VignetteIndexEntry{Introduction to SmartML: Automatic Supervised Machine Learning in R}
9 | %\VignetteEngine{knitr::rmarkdown}
10 | %\VignetteEncoding{UTF-8}
11 | ---
12 |
13 | ```{r setup, include = FALSE}
14 | knitr::opts_chunk$set(
15 | collapse = TRUE,
16 | comment = "#>"
17 | )
18 | ```
19 |
20 |
21 | ## SmartML:
22 | Currently, SmartML is an R-Package representing a meta learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. Being meta-learning based, the framework is able to simulate the role of the machine learning expert. In particular, the framework is equipped with a continuously updated knowledge base that stores information about the meta-features of all processed datasets along with the associated performance of the different classifiers and their tuned parameters. Thus, for any new dataset, SmartML automatically extracts its meta features and searches its knowledge base for the best performing algorithm to start its optimization process. In addition, SmartML makes use of the new runs to continuously enrich its knowledge base to improve its performance and robustness for future runs.
23 |
24 |
25 |
26 | ## SmartML Contribution Points and Goals:
27 |
28 | The goal of SmartML is to automate the process of classifier algorithm selection and hyper-parameter tuning in supervised machine learning, using a modified version of SMAC Bayesian optimization that prefers exploitation over exploration thanks to meta-learning.
29 | 1. SmartML is the first R package to deal with supervised machine learning automation, and it is built over 16 different classifier algorithms from different R packages.
30 | 2. In addition, we offer different data preprocessing and feature engineering algorithms that can be specified by the user and applied easily to tabular datasets with either CSV or ARFF extensions.
31 | 3. SmartML has a collaborative knowledge base that grows over time as more users use the tool.
32 | 4. Finally, SmartML can produce model interpretability plots for feature importance and interaction with the help of the ```iml``` package for ML model interpretability.
33 | 5. SmartML has a web service for the tool with a simple R Shiny interface that can be found HERE, and a demonstration of how to use the web service can be found HERE.
34 |
35 | ## Installation
36 |
37 | You can install the development version of SmartML from [Github](https://github.com/DataSystemsGroupUT/SmartML) with:
38 |
39 | ``` r
40 | devtools::install_github("DataSystemsGroupUT/SmartML")
41 | ```
42 |
43 | ---
44 | ## User Manual
45 |
46 | The manual for the SmartML R package can be found HERE.
47 |
48 | ---
49 | ## Example
50 |
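Below is a minimal usage sketch based on the `autoRLearn` examples shipped with the package documentation; the dataset paths are illustrative and assume the `sampleDatasets` folder is available locally.

``` r
library(SmartML)

# Run algorithm selection and hyper-parameter tuning for 10 minutes
# on a training/testing pair of ARFF files.
result <- autoRLearn(10,
                     'sampleDatasets/shuttle/train.arff',
                     'sampleDatasets/shuttle/test.arff',
                     option = 2)

result$clfs   # best classifier algorithm found
result$params # its tuned parameter configuration
result$perf   # performance on the test set
```
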
51 | ---
52 | ## Contribution Guidelines to SmartML
53 | To contribute to `SmartML`, please follow these guidelines.
54 |
55 | ---
56 | ## Publication
57 |
58 | For more details, you can view our publication about SmartML.
59 | SmartML has been accepted as a demo paper at EDBT 2019 in Lisbon, Portugal [PDF]:
60 | ```
61 | Mohamed Maher, Sherif Sakr. SmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Machine Learning Algorithms (2019). Advances in Database Technology - EDBT 2019: 22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29.
62 | ```
63 |
64 | ---
65 | ## Funding:
66 | This work is funded by the European Regional Development Funds via the Mobilitas Plus programme (grant MOBTT75).
67 |
68 | ---
69 | ## Licence:
70 |
71 | © 2019, Data Systems Group at University of Tartu
72 |
73 | This work is licensed under the terms of the GNU General Public License, version 3.0 (GPLv3)
74 |
--------------------------------------------------------------------------------