trollR.Rmd
Vignettes are long-form documentation commonly included in packages. Because they are part of the package distribution, they need to be as compact as possible. The html_vignette output type provides a custom style sheet (and tweaks some options) to ensure that the resulting HTML is as small as possible.
Note the various macros within the vignette section of the metadata block above. These are required in order to instruct R how to build the vignette. Note that you should change the title field and the \VignetteIndexEntry to match the title of your vignette.
The html_vignette template includes a basic CSS theme. To override this theme you can specify your own CSS in the document metadata as follows:
output:
  rmarkdown::html_vignette:
    css: mystyles.css

The figure sizes have been customised so that you can easily put two images side-by-side.

plot(1:10)
plot(10:1)
You can enable figure captions by fig_caption: yes in YAML:

output:
  rmarkdown::html_vignette:
    fig_caption: yes

Then you can use the chunk option fig.cap = "Your figure caption." in knitr.
You can write math expressions, e.g. \(Y = X\beta + \epsilon\), footnotes [1], and tables, e.g. using knitr::kable().
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
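A table like the one above can be produced with knitr::kable(). The following is a minimal sketch, assuming the table shows the first ten rows of the built-in mtcars data set (the values match) and that the knitr package is installed; the variable name first_ten is only illustrative.

```r
# Assumption: the table above is the first ten rows of the built-in mtcars data.
first_ten <- head(mtcars, 10)

# knitr::kable() renders a data frame as a Markdown table.
# The call is skipped when knitr is not installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(first_ten))
}
```

Inside a vignette chunk the print() call is not needed; the kable object is rendered automatically.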
Also a quote using >:

> “He who gives up [code] safety for [code] speed deserves neither.” (via)

[1] A footnote here.
vignette2.Rmd
You can use your own smartphone to track your movements. So far, the import via SportsTracker is implemented. I am working on providing you with other import options in the future. If you find other apps that work with similar file formats, feel free to create a pull request or shoot me a tweet or email.
SportsTracker is an application …

So far, you need to do this manually and download/export every single log file as described. I am thinking about writing a scraper to automate this process, but I would definitely have to check with SportsTracker first to see whether they allow it.
LSE Hackathon Challenge: Detecting Online Trolling Behaviour

Click here to try out our Shiny app.

Data source: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/

Data description

A large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

To install the package use
# install.packages("devtools")
devtools::install_github("schliebs/trollR",
                         auth_token = "6957b42653250daa253173f2b5e0f8e384a8f961")
library(trollR)
library(xgboost)
predict_troll("Hello World - this is an example of trollR - Identifying trolling comments using R")
#> [1] 0.0722369

library(dplyr)  # provides tibble(), arrange(), desc() and %>%

# take some text
text <- c(
  "I would like to point out that your comment was substandard!",
  "YOU SHOULD DIE!!!!",
  "YOU SHOULD DIE",
  "you should die!!!!",
  "you should die",
  "Go rot in hell",
  "I can also write something non-toxic -- really",
  "COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK",
  "bloody hell, i forgot my purse at the pub yesterday"
)

# and find out how likely each text is to be trolling
tibble(text = text, troll = predict_troll(text)) %>% arrange(desc(troll))
#> # A tibble: 9 x 2
#>   text                                                          troll
#>   <chr>                                                         <dbl>
#> 1 COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK                  0.972
#> 2 bloody hell, i forgot my purse at the pub yesterday           0.958
#> 3 Go rot in hell                                                0.796
#> 4 you should die!!!!                                            0.729
#> 5 YOU SHOULD DIE!!!!                                            0.714
#> 6 YOU SHOULD DIE                                                0.667
#> 7 you should die                                                0.543
#> 8 I would like to point out that your comment was substandard!  0.0739
#> 9 I can also write something non-toxic -- really                0.0281
Of course not
run_api()
Or from a terminal
curl "http://localhost:8000/trollR?text=You%20suck%20you%20cocksucker"
{"text":["You suck you cocksucker"],"troll_certainty":[0.9746]}
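Spaces and other special characters in the text parameter generally need to be percent-encoded before they are placed in a URL. A minimal sketch of building such a request URL from R follows; build_troll_url is a hypothetical helper and not part of trollR, and the host and path are taken from the curl call above.

```r
# Hypothetical helper (not part of trollR): builds a request URL for the
# /trollR endpoint shown above.
build_troll_url <- function(text, host = "http://localhost:8000") {
  # reserved = TRUE also escapes characters such as "&" and "?" inside the text
  paste0(host, "/trollR?text=", utils::URLencode(text, reserved = TRUE))
}

url <- build_troll_url("This may be a troll comment")
print(url)
```

The resulting string can be passed to curl or pasted into a browser while the API is running.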
But wait, there is more
library(dplyr)
library(ggplot2)

# load the model
# model <- xgb.load(system.file("xgboost_model.buffer", package = "trollR"))
model <- xgb.load("inst/xgboost_model.buffer")

# feature importance, using the column names of the model matrix
# (mdl_data is assumed to be the model data available in the session)
df <- xgb.importance(colnames(mdl_data$model_matrix), model) %>% as_tibble()

# hand-crafted features, as opposed to word features
vars <- c("length", "ncap", "ncap_len", "nsen", "nexcl", "nquest", "npunct",
          "nword", "nsymb", "nsmile", "nslur")

df %>%
  arrange(desc(Gain)) %>%
  top_n(20, Gain) %>%
  mutate(Feature = reorder(Feature, Gain),
         Vartype = Feature %in% vars) %>%
  ggplot(aes(x = Feature, y = Gain, fill = Vartype)) +
  geom_col() +
  coord_flip() +
  labs(y = "Feature Importance in the XGBoost Model", x = "", title = "") +
  theme(axis.text.y = element_text(size = 15, face = "bold")) +
  scale_fill_brewer(palette = "Set1", guide = "none")
build_features.Rd
Builds the feature-matrix from a text-vector
build_features(x, term_count_min = 1, mdl = NULL, parallel = TRUE,
  quiet = FALSE)
Argument | Description |
---|---|
x | a vector of text |
term_count_min | a number passed to |
mdl | a list of existing model data (containing the vectorizer, the tfidf, and the lsa object); defaults to NULL, in which case it is rebuilt |
parallel | TRUE/FALSE, whether the task should be executed in parallel; defaults to TRUE |
quiet | TRUE/FALSE, whether the function remains silent; defaults to FALSE |
a list of two: a dgCMatrix that contains the features (columns) for each text (row) and, as a second element, a list of the model that can be passed as mdl
text <- c(
  "This is a first text that describes something",
  "A second Text That USES A LOT of CAPITALS",
  "Lastly MANY!!!! (like, really a lot!) punctuations!!!"
)

build_features(text)
#> Calculating Features...
#> Create DTM...
#> Finished in 5.12 seconds
#> $model_matrix
#> 3 x 21 sparse Matrix of class "dgCMatrix"
#>
#> 1 45  1 0.02222222 . . .  . 8 . . 1 1 1 . . . . . . . .
#> 2 41 19 0.46341463 . . .  . 9 . . . . . 1 1 1 . . . . .
#> 3 53  5 0.09433962 . 8 . 11 7 . . . . . . . . 1 1 1 1 1
#>
#> $mdl
#> $mdl$vectorizer
#> function (iterator, grow_dtm, skip_grams_window_context, window_size,
#>     weights)
#> {
#>     vocab_corpus_ptr = cpp_vocabulary_corpus_create(vocabulary$term,
#>         attr(vocabulary, "ngram")[[1]], attr(vocabulary, "ngram")[[2]],
#>         attr(vocabulary, "stopwords"), attr(vocabulary, "sep_ngram"))
#>     setattr(vocab_corpus_ptr, "ids", character(0))
#>     setattr(vocab_corpus_ptr, "class", "VocabCorpus")
#>     corpus_insert(vocab_corpus_ptr, iterator, grow_dtm, skip_grams_window_context,
#>         window_size, weights)
#> }
#> <environment: 0x000000000652caf0>
#>
#>

# a second example
train <- c("Banking is finance", "flowers are not houses", "finance is power", "houses are build")
test <- c("finance is greed", "flowers belong in the garbage", "houses are build")

a1 <- build_features(test)
#> Calculating Features...
#> Create DTM...
#> Finished in 3.38 seconds
a12 <- build_features(test, mdl = a1$mdl)
#> Calculating Features...
#> Create DTM...
#> Finished in 2.99 seconds
a2 <- build_features(train, mdl = a1$mdl)
#> Calculating Features...
#> Create DTM...
#> Finished in 3.07 seconds
a2$model_matrix %>% as.matrix()
#>   length ncap   ncap_len nsen nexcl nquest npunct nword nsymb nsmile greed
#> 1     18    1 0.05555556    0     0      0      0     3     0      0     0
#> 2     22    0 0.00000000    0     0      0      0     4     0      0     0
#> 3     16    0 0.00000000    0     0      0      0     3     0      0     0
#> 4     16    0 0.00000000    0     0      0      0     3     0      0     0
#>   financ belong garbag flower build hous
#> 1      1      0      0      0     0    0
#> 2      0      0      0      1     0    1
#> 3      1      0      0      0     0    0
#> 4      0      0      0      0     1    1
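Some of the leading columns in the output above (length, ncap, ncap_len, nexcl, npunct, nword) look like simple character and word counts. The following base R sketch shows one way such counts could be computed. It is an illustrative re-implementation under that assumption, not the package's own code; count_matches and simple_features are hypothetical names.

```r
# Illustrative sketch only, NOT the implementation used by build_features().

# Count regex matches per element of a character vector.
count_matches <- function(x, pattern) {
  hits <- gregexpr(pattern, x)
  # gregexpr() returns -1 when there is no match, so count positive positions only
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Compute a few hand-crafted text features.
simple_features <- function(x) {
  len  <- nchar(x)
  ncap <- count_matches(x, "[[:upper:]]")
  data.frame(
    length   = len,                                # number of characters
    ncap     = ncap,                               # number of capital letters
    ncap_len = ncap / len,                         # share of capital letters
    nexcl    = count_matches(x, "!"),              # exclamation marks
    npunct   = count_matches(x, "[[:punct:]]"),    # punctuation characters
    nword    = count_matches(x, "[^[:space:]]+")   # whitespace-separated tokens
  )
}

text <- c(
  "This is a first text that describes something",
  "A second Text That USES A LOT of CAPITALS",
  "Lastly MANY!!!! (like, really a lot!) punctuations!!!"
)

feats <- simple_features(text)
print(feats)
```

For these three texts the counts agree with the corresponding columns printed by build_features(text) above, which suggests (but does not prove) that the package computes them in a similar way.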
predict_troll.Rd
Detect if given texts are trolls
predict_troll(x, model_ = NULL, mdl_data_ = NULL)
Argument | Description |
---|---|
x | a vector of text |
model_ | a model that is passed to predict, defaults to the |
mdl_data_ | a model as returned by |
a vector with the same length as x that holds the predicted probabilities that the given text is trolling
text <- c("You suck, die!", "What a nice world we have today", "I like you", "I hate you")
(pred <- predict_troll(text))
#> [1] 0.99461335 0.06459010 0.01856389 0.19459596
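Because the function returns probabilities rather than labels, a caller who needs a yes/no decision has to pick a cut-off. The sketch below reuses the probabilities printed above so that it runs without the package. The threshold of 0.5 is an arbitrary assumption for illustration, not a value recommended by trollR.

```r
# Probabilities copied from the example output above
text <- c("You suck, die!", "What a nice world we have today", "I like you", "I hate you")
pred <- c(0.99461335, 0.06459010, 0.01856389, 0.19459596)

# Assumption: 0.5 is an arbitrary illustrative cut-off
threshold <- 0.5
flagged <- pred > threshold

result <- data.frame(text = text, troll = pred, flagged = flagged)
print(result)
```

A lower threshold flags more comments at the cost of more false alarms; which trade-off is appropriate depends on the application.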
run_api.Rd
Run the Plumber API
run_api(port = 8000)
Argument | Description |
---|---|
... | parameters passed to |
invisible NULL
# NOT RUN {
run_api()
# try to go to: http://127.0.0.1:8000/trollR
# or use http://127.0.0.1:8000/trollR?text=This may be a troll comment
# }
run_shiny.Rd
Shiny App Launcher
run_shiny(example = "trollR")
Argument | Description |
---|---|
example | name of the app, defaults to trollR |
Nothing
# NOT RUN {
run_shiny()
# }
shitwordlist.Rd
A list of English curse words scraped from https://www.noswearing.com/dictionary

shitwordlist

A character vector containing 349 words.

This is only to illustrate how to document data.
hello whatup
test_function.Rd
What it does.
test_function(x = c("hello"), y = 2, z = c(2, 7, 9))
Argument | Description |
---|---|
x | A string vector. |
y | An integer. |
z | A numeric vector. |
What the function returns (e.g. a data frame with n rows of v variables).
Do not operate heavy machinery within 8 hours of using this function.
test_function(x = c("hello"),
              y = c(2),
              z = c(2, 7, 9))
#> [1] "hello"
#> [1] "this gives you the result of y * sum(z)"
#> [1] 36