├── .DS_Store ├── .Rbuildignore ├── .github ├── ISSUE_TEMPLATE.md └── ISSUE_TEMPLATE │ └── bug_report.md ├── .gitignore ├── .travis.yml ├── CONDUCT.md ├── CONTRIBUTING.md ├── DESCRIPTION ├── LICENSE ├── NAMESPACE ├── NEWS.md ├── R ├── hello.R └── utils.R ├── README.Rmd ├── README.md ├── codecov.yml ├── inst ├── logo.png ├── rstudio │ └── addins.dcf └── wordcountaddin.svg ├── man ├── text_stats.Rd └── wordcountaddin.Rd ├── tests ├── testthat.R └── testthat │ ├── test_wordcountaddin.R │ ├── test_wordcountaddin.Rmd │ └── test_wordcountaddin.docx └── wordcountaddin.Rproj /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/.DS_Store -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^\.travis\.yml$ 4 | ^README\.Rmd$ 5 | ^README-.*\.png$ 6 | ^CONDUCT\.md$ 7 | ^CONTRIBUTING.md$ 8 | ^codecov\.yml$ 9 | ^wordcountaddin\.Rcheck$ 10 | ^wordcountaddin.*\.tar\.gz$ 11 | ^wordcountaddin.*\.tgz$ 12 | .github/ 13 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Please wait for some discussion of your report before making a Pull Request.** 11 | 12 | **Describe the bug** 13 | A clear and concise description of what the bug is. 14 | 15 | **To Reproduce** 16 | 17 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading . 18 | 19 | Describe the steps to reproduce the behavior: 20 | 1. Go to '...' 21 | 2. 
Click on '....' 22 | 3. Scroll down to '....' 23 | 4. See error 24 | 25 | **Expected behavior** 26 | A clear and concise description of what you expected to happen. 27 | 28 | **Session Info** 29 | Output of `sessionInfo()` on your device so we can see what packages and version numbers you have 30 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: bug 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading . 15 | 16 | **Expected behavior** 17 | A clear and concise description of what you expected to happen. 
18 | 19 | **Session Info** 20 | Output of `sessionInfo()` on your device so we can see what packages and version numbers you have 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | wordcountaddin.Rcheck/ 5 | wordcountaddin*.tar.gz 6 | wordcountaddin*.tgz 7 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # Sample .travis.yml for R projects 2 | 3 | language: r 4 | warnings_are_errors: false 5 | sudo: required 6 | 7 | r_github_packages: 8 | - jimhester/covr 9 | 10 | after_success: 11 | - Rscript -e 'covr::codecov()' 12 | 13 | 14 | -------------------------------------------------------------------------------- /CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Code of Conduct 2 | 3 | As contributors and maintainers of this project, we pledge to respect all people who 4 | contribute through reporting issues, posting feature requests, updating documentation, 5 | submitting pull requests or patches, and other activities. 6 | 7 | We are committed to making participation in this project a harassment-free experience for 8 | everyone, regardless of level of experience, gender, gender identity and expression, 9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. 10 | 11 | Examples of unacceptable behavior by participants include the use of sexual language or 12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment, 13 | insults, or other unprofessional conduct. 
14 | 15 | Project maintainers have the right and responsibility to remove, edit, or reject comments, 16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this 17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 18 | from the project team. 19 | 20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 21 | opening an issue or contacting one or more of the project maintainers. 22 | 23 | This Code of Conduct is adapted from the Contributor Covenant 24 | (http://contributor-covenant.org), version 1.0.0, available at 25 | http://contributor-covenant.org/version/1/0/0/ 26 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | ## Pull requests 4 | 5 | Requirements for making a pull request: 6 | 7 | * Some knowledge of [git]() 8 | * Some knowledge of [GitHub]() 9 | 10 | Read more about pull requests on GitHub at [https://help.github.com/articles/using-pull-requests/](https://help.github.com/articles/using-pull-requests/). If you haven't done this before, Hadley Wickham provides a nice overview of git (), as well as best practices for submitting pull requests (). 11 | 12 | Then: 13 | 14 | * Fork the repo to your GitHub account 15 | * Clone your fork from your account down to your machine, e.g., `git clone git@github.com:benmarwick/.git` 16 | * Make sure to track progress upstream (i.e., on our version of the package at `benmarwick/`) by doing `git remote add upstream git@github.com:benmarwick/.git`.
Each time you go to make changes on your machine, be sure to pull changes in from upstream (i.e., the `benmarwick` version) by doing either `git fetch upstream` and merging later, or `git pull upstream` to fetch and merge in one step 17 | * Make your changes (we prefer if you make changes on a new branch) 18 | * Ideally included in your contributions: 19 | * Well documented code in roxygen docs 20 | * If you add new functions or change functionality, add one or more tests. 21 | * Make sure the package passes `R CMD check` on your machine without errors/warnings 22 | * Push up to your account 23 | * Submit a pull request and participate in the discussion. 24 | 25 | ## Documentation contributions 26 | 27 | Documentation contributions are always welcome, since every project can use better instructions. If you are editing any files in the repo, follow the above instructions for pull requests to add contributions. However, if you are editing the wiki, you can edit it directly; there is no need to use git, pull requests, etc. 28 | 29 | All of the function documentation is generated automatically. Please do not edit any of the documentation files in man/ or the NAMESPACE. Instead, construct the appropriate roxygen2 documentation in the function files in R/ themselves. The documentation is then generated by running the document() function from the devtools package. Please consult the Advanced R programming guide if this workflow is unfamiliar to you. Note that functions should include examples in the documentation. Please use \dontrun for examples that take more than a few seconds to execute or require an internet connection. 30 | 31 | Likewise, the README.md file in the base directory should not be edited directly. This file is created automatically from code that runs the examples shown, helping to ensure that they are functioning as advertised and consistent with the package README vignette.
Instead, edit the README.Rmd source file in manuscripts and run make to build the README. 32 | 33 | ## Repository structure 34 | 35 | This repository is structured as a standard R package following the conventions outlined in the Writing R Extensions manual. A few additional files are provided that are not part of the built R package and are listed in .Rbuildignore, such as .travis.yml, which is used for continuous testing and integration. 36 | 37 | ## Code 38 | 39 | All code for this package is found in R/ (except compiled source code, if used, which is in /src). All functions should be thoroughly documented with roxygen2 notation; see Documentation. 40 | 41 | Bug reports _must_ have a [reproducible example](http://adv-r.had.co.nz/Reproducibility.html) and include the output of `devtools::session_info()` (instead of `sessionInfo()`). We recommend using Hadley Wickham's style guide when writing code (). 42 | 43 | ## Testing 44 | 45 | Any new feature or bug fix should include a unit test demonstrating the change. Unit tests follow the testthat framework with files in tests/testthat. Please make sure that the testing suite passes before issuing a pull request. This can be done by running check() from the devtools package, which will also check for consistent documentation, etc. 46 | 47 | This package uses the Travis CI continuous testing mechanism for R to ensure that the test suite is run on each push to GitHub. An icon at the top of the README.md indicates whether or not the tests are currently passing. 48 | 49 | ## Questions or comments? 50 | 51 | Do not hesitate to open an issue in the issues tracker to raise any questions or comments about the package or these guidelines.
52 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: wordcountaddin 2 | Type: Package 3 | Title: Word counts and readability statistics in R markdown documents 4 | Version: 0.3.0.9000 5 | Authors@R: c(person("Ben", "Marwick", 6 | email = "benmarwick@gmail.com", 7 | role = c("aut", "cre")), 8 | person("JooYoung", "Seo", 9 | email = "jooyoung@psu.edu", 10 | role = "ctb", comment = c(ORCID = "0000-0002-4064-6012")), 11 | person("Henrik", "Bengtsson", 12 | email = "henrik.bengtsson@gmail.com", 13 | role = "ctb"), 14 | person("Florian S.", "Schaffner", 15 | email = "florian.schaffner@outlook.com", 16 | role = "ctb"), 17 | person("Matthew T.", "Warkentin", 18 | email = "warkentin@lunenfeld.ca", 19 | role = "ctb"), 20 | person("Luke A.", "McGuinness", 21 | email = "luke.a.mcguinness@gmail.com", 22 | role = "ctb", 23 | comment = c(ORCID = "0000-0001-8730-9761"))) 24 | Maintainer: Ben Marwick 25 | Description: An addin for RStudio that will count the words and characters 26 | in a plain text document. It is designed for use with RMarkdown 27 | documents and will exclude YAML header content, code chunks and inline 28 | code from the counts. It also computes readability statistics so you can 29 | get an idea of how easy or difficult your text is to read. 
30 | License: MIT + file LICENSE 31 | LazyData: TRUE 32 | Imports: 33 | fs, 34 | knitr, 35 | koRpus, 36 | koRpus.lang.en, 37 | miniUI (>= 0.1.1), 38 | purrr, 39 | rstudioapi (>= 0.5), 40 | shiny (>= 0.13), 41 | stringi, 42 | sylly, 43 | sylly.en 44 | Encoding: UTF-8 45 | RoxygenNote: 7.1.1 46 | Suggests: 47 | covr, 48 | testthat 49 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2017 2 | COPYRIGHT HOLDER: Ben Marwick 3 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(readability) 4 | export(readability_chr) 5 | export(text_stats) 6 | export(text_stats_chr) 7 | export(text_stats_fn_) 8 | export(word_count) 9 | import(koRpus) 10 | import(purrr) 11 | import(stringi) 12 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # wordcountaddin 0.3.0 2 | 3 | NEW FEATURES 4 | 5 | * Count words from Rmd filename and get scalar as output (#20) 6 | 7 | MINOR IMPROVEMENTS 8 | 9 | * make the functions more DRY by adding some unexported fns 10 | * Expanded readme slightly 11 | * Added more tests 12 | 13 | # wordcountaddin 0.2.0 14 | 15 | NEW FEATURES 16 | 17 | * Count words from Rmd filename without using RStudio (#3) 18 | * Count words in active Rmd in RStudio without making text selection (#3) 19 | * Count words in character string from command line (without Rmd or RStudio) (#2) 20 | 21 | MINOR IMPROVEMENTS 22 | 23 | * Added a `NEWS.md` file to track changes to the package. 24 | * Expanded readme 25 | * Added more tests 26 | 27 | BUG FIXES 28 | 29 | * Fixed inaccurate count when
present (#1) 30 | 31 | DEPRECATED AND DEFUNCT 32 | 33 | NA 34 | 35 | # wordcountaddin 0.1.0 36 | 37 | Initial release 38 | 39 | 40 | -------------------------------------------------------------------------------- /R/hello.R: -------------------------------------------------------------------------------- 1 | #' wordcountaddin 2 | #' 3 | #' This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read. 4 | #' 5 | #' @name wordcountaddin 6 | #' @docType package 7 | #' @import purrr stringi koRpus 8 | NULL 9 | 10 | # global things 11 | 12 | md_file_ext_regex <- paste( 13 | "\\.markdown$", 14 | "\\.mdown$", 15 | "\\.mkdn$", 16 | "\\.md$", 17 | "\\.mkd$", 18 | "\\.mdwn$", 19 | "\\.mdtxt$", 20 | "\\.mdtext$", 21 | "\\.rmd$", 22 | "\\.Rmd$", 23 | "\\.RMD$", 24 | "\\.Rmarkdown$", 25 | "\\.qmd$", 26 | sep = "|") 27 | 28 | 29 | #------------------------------------------------------------------- 30 | # fns for working with selected text in an active Rmd 31 | 32 | #' Get text stats for selected text (excluding code chunks and inline code) 33 | #' 34 | #' Call this addin to get a word count and some other stats about the text 35 | #' @param filename Path to the file on which to compute text stats. 36 | #' Default is the current file (when working in RStudio) or the file being 37 | #' knit (when compiling with \code{knitr}).
38 | #' 39 | #' @export 40 | #' @examples 41 | #' md <- system.file(package = "wordcountaddin", "NEWS.md") 42 | #' text_stats(md) 43 | #' word_count(md) 44 | #' \dontrun{ 45 | #' readability(md) 46 | #' } 47 | text_stats <- function(filename = this_filename()) { 48 | 49 | text_to_count_output <- text_to_count(filename) 50 | 51 | text_stats_fn(text_to_count_output) 52 | } 53 | 54 | 55 | #' @rdname text_stats 56 | #' @description Get a word count as a single integer 57 | #' @export 58 | word_count <- function(filename = this_filename()){ 59 | 60 | text_to_count_output <- text_to_count(filename) 61 | 62 | word_count_output <- text_stats_fn_(text_to_count_output) 63 | 64 | word_count_output$n_words_korp 65 | } 66 | 67 | 68 | 69 | 70 | 71 | 72 | #' @rdname text_stats 73 | #' @description Get readability stats for selected text (excluding code chunks) 74 | #' @param quiet Logical. Should task be performed quietly? 75 | #' 76 | #' @details Call this addin to get readability stats about the text 77 | #' 78 | #' @export 79 | readability <- function(filename = this_filename(), quiet = TRUE) { 80 | 81 | 82 | text_to_count_output <- text_to_count(filename) 83 | 84 | readability_fn(text_to_count_output, quiet = quiet) 85 | } 86 | 87 | #--------------------------------------------------------------- 88 | # directly work on a character string in the console 89 | 90 | 91 | #' @rdname text_stats 92 | #' @description Get text stats for selected text (excluding code chunks and inline code) 93 | #' 94 | #' @details Use this function with a character string as input 95 | #' 96 | #' @export 97 | text_stats_chr <- function(text) { 98 | 99 | text <- paste(text, collapse="\n") 100 | 101 | text_stats_fn(text) 102 | 103 | } 104 | 105 | 106 | #' @rdname text_stats 107 | #' @description Get readability stats for selected text (excluding code chunks) 108 | #' 109 | #' @details Use this function with a character string as input 110 | #' 111 | #' @param text a character string of text, length of
one 112 | #' 113 | #' @export 114 | readability_chr <- function(text, quiet = TRUE) { 115 | 116 | text <- paste(text, collapse = "\n") 117 | 118 | readability_fn(text, quiet = quiet) 119 | 120 | } 121 | #----------------------------------------------------------- 122 | # helper fns, not exported 123 | 124 | text_to_count <- function(filename){ 125 | # selected text takes precedence over the filename argument: 126 | # if text is selected, it is used. Otherwise, the text in filename is used 127 | if (rstudioapi::isAvailable()) { 128 | context <- rstudioapi::getActiveDocumentContext() 129 | selection_text <- unname(unlist(context$selection)["text"]) 130 | text_is_selected <- nchar(selection_text) > 0 131 | } else { 132 | # if not running in RStudio, assume no text is selected 133 | text_is_selected <- FALSE 134 | } 135 | 136 | if (text_is_selected) { 137 | text <- selection_text 138 | } else { 139 | # if no text is selected, read text from "filename" as character vector 140 | is_extension_invalid <- !grepl(md_file_ext_regex, filename) 141 | if (is_extension_invalid) { 142 | stop(paste("The supplied file has an extension which is not associated with markdown.", 143 | "This function only works with markdown or R markdown files.", sep = "\n ")) 144 | } 145 | text <- paste(scan(filename, 'character', quiet = TRUE), collapse = " ") 146 | } 147 | text 148 | } 149 | 150 | prep_text <- function(text){ 151 | 152 | # remove all line breaks, http://stackoverflow.com/a/21781150/1036500 153 | text <- gsub("[\r\n]", " ", text) 154 | 155 | # don't include yaml front matter 156 | three_dashes <- unlist(gregexpr('---', text)) 157 | if (three_dashes[1]==1L) { 158 | yaml_end <- three_dashes[2] + 2L 159 | text <- substr(text, yaml_end + 1L, nchar(text)) 160 | } else { 161 | text 162 | } 163 | 164 | # don't include text in code chunks: https://regex101.com/#python 165 | text <- gsub("```\\{.+?\\}.+?```", "", text) 166 | 167 | # don't include text in in-line R code 168 | text <-
gsub("`r.+?`", "", text) 169 | 170 | # don't include HTML comments 171 | text <- gsub("", "", text) 172 | 173 | # don't include LaTeX comments 174 | # how to do this? %% 175 | 176 | # don't include images with captions 177 | text <- gsub("!\\[.+?\\]\\(.+?\\)\\{.+?\\}", "", text) 178 | text <- gsub("!\\[.+?\\]\\(.+?\\)", "", text) 179 | 180 | # don't include inline markdown URLs 181 | text <- gsub("\\(http.+?\\)", "", text) 182 | 183 | # don't include # for headings 184 | text <- gsub("#*", "", text) 185 | 186 | # don't include opening html tags 187 | # (source: https://www.w3schools.com/TAGS/default.ASP) 188 | 189 | tags <- paste0("!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|", 190 | "audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|", 191 | "canvas|caption|center|cite|code|col|colgroup|data|datalist|", 192 | "dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|", 193 | "figcaption|figure|font|footer|form|frame|frameset|h1 to h6|", 194 | "head|header|hr|html|i|iframe|img|input|ins|kbd|label|legend|", 195 | "li|link|main|map|mark|meta|meter|nav|noframes|noscript|", 196 | "object|ol|optgroup|option|output|p|param|picture|pre|", 197 | "progress|q|rp|rt|ruby|s|samp|script|section|select|small|", 198 | "source|span|strike|strong|style|sub|summary|sup|svg|table|", 199 | "tbody|td|template|textarea|tfoot|th|thead|time|title|tr|", 200 | "track|tt|u|ul|var|video|wbr") 201 | 202 | text <- gsub(paste0("<\\s*(",tags,")[^>]*>"),"", text) 203 | 204 | # don't include closing html tags 205 | text <- gsub("", "", text) 206 | 207 | # don't include greater/less than signs because they trip up koRpus 208 | text <- gsub("<|>", "", text) 209 | 210 | # don't include percent signs because they trip up stringi 211 | text <- gsub("%", "", text) 212 | 213 | # don't include figures and tables inserted using plain LaTeX code 214 | text <- gsub("\\\\begin\\{figure\\}(.*?)\\\\end\\{figure\\}", "", text) 215 | text <- 
gsub("\\\\begin\\{table\\}(.*?)\\\\end\\{table\\}", "", text) 216 | 217 | # don't count abbreviations as multiple words, but leave 218 | # the period at the end in case it's the end of a sentence 219 | text <- gsub("\\.(?=[a-z]+)", "", text, perl = TRUE) 220 | 221 | # don't include LaTeX \eggs{ham} 222 | # how to do? problem with capturing \x 223 | 224 | # remove lines starting with ::: 225 | text <- gsub(":::.*", "", text) 226 | 227 | 228 | 229 | if(nchar(text) == 0){ 230 | stop("You have not selected any text. Please select some text with the mouse and try again") 231 | } 232 | 233 | return(text) 234 | 235 | } 236 | 237 | prep_text_korpus <- function(text){ 238 | lengths <- unlist(strsplit(text, " ")) 239 | no_long_one <- paste0(ifelse(nchar(lengths) > 30, substr(lengths, 1, 10), lengths), collapse = " ") 240 | tokenize_safe <- purrr::safely(koRpus::tokenize) 241 | k1 <- tokenize_safe(no_long_one, lang = 'en', format = 'obj') 242 | k1 <- k1$result 243 | return(k1) 244 | } 245 | 246 | 247 | # These functions do the actual work 248 | 249 | #' @rdname text_stats 250 | #' @export 251 | text_stats_fn_ <- function(text){ 252 | # suppress warnings 253 | oldw <- getOption("warn") 254 | options(warn = -1) 255 | 256 | text <- prep_text(text) 257 | 258 | require("koRpus.lang.en", quietly = TRUE) 259 | 260 | # stringi methods 261 | n_char_tot <- sum(stri_stats_latex(text)[c(1,3)]) 262 | n_words_stri <- unname(stri_stats_latex(text)[4]) 263 | 264 | #korpus methods 265 | k1 <- prep_text_korpus(text) 266 | korpus_stats <- sylly::describe(k1) 267 | k_nchr <- korpus_stats$all.chars 268 | k_wc <- korpus_stats$words 269 | k_sent <- korpus_stats$sentences 270 | k_wps <- k_wc / k_sent 271 | 272 | # reading time 273 | # https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_comprehension 274 | # assume 200 words per min 275 | wpm <- 200 276 | reading_time_korp <- paste0(round(k_wc / wpm, 1), " minutes") 277 | reading_time_stri <- paste0(round(n_words_stri / wpm, 1), " minutes") 
278 | 279 | stats_list <- list( 280 | # make the names more useful 281 | n_char_tot_stri = n_char_tot, 282 | n_char_tot_korp = k_nchr, 283 | n_words_korp = k_wc, 284 | n_words_stri = n_words_stri, 285 | n_sentences_korp = k_sent, 286 | words_per_sentence_korp = k_wps, 287 | reading_time_korp = reading_time_korp, 288 | reading_time_stri = reading_time_stri 289 | ) 290 | 291 | # resume warnings (this has to run before returning, otherwise it is never reached) 292 | options(warn = oldw) 293 | return(stats_list) 294 | } 295 | 296 | 297 | 298 | text_stats_fn <- function(text){ 299 | 300 | l <- text_stats_fn_(text) 301 | 302 | results_df <- data.frame(Method = c("Word count", "Character count", "Sentence count", "Reading time"), 303 | koRpus = c(l$n_words_korp, l$n_char_tot_korp, l$n_sentences_korp, l$reading_time_korp), 304 | stringi = c(l$n_words_stri, l$n_char_tot_stri, "Not available", l$reading_time_stri) 305 | ) 306 | 307 | results_df_tab <- knitr::kable(results_df) 308 | return(results_df_tab) 309 | 310 | } 311 | 312 | 313 | readability_fn_ <- function(text, quiet = TRUE){ 314 | 315 | text <- prep_text(text) 316 | 317 | oldw <- getOption("warn") 318 | options(warn = -1) 319 | 320 | require("koRpus.lang.en", quietly = TRUE) 321 | 322 | # korpus methods 323 | k1 <- prep_text_korpus(text) 324 | k_readability <- koRpus::readability(k1, quiet = quiet) 325 | 326 | # resume warnings (this has to run before returning, otherwise it is never reached) 327 | options(warn = oldw) 328 | 329 | return(k_readability) 330 | } 331 | 332 | 333 | readability_fn <- function(text, quiet = TRUE){ 334 | # a more condensed overview of the results 335 | k_readability <- readability_fn_(text, quiet = quiet) 336 | readability_summary_table <- knitr::kable(summary(k_readability)) 337 | return(readability_summary_table) 338 | 339 | } 340 | -------------------------------------------------------------------------------- /R/utils.R: -------------------------------------------------------------------------------- 1 | # Get the filename of the current file, or 2 | # the file being rendered 3 | 4 | this_filename <- function() { 5 | if (interactive()) { 6
| filename <- rstudioapi::getSourceEditorContext()$path 7 | } else { 8 | filename <- knitr::current_input() 9 | } 10 | return(fs::path(filename)) 11 | } 12 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: 3 | md_document: 4 | variant: markdown_github 5 | --- 6 | 7 | 8 | 9 | 10 | 11 | ```{r, echo = FALSE} 12 | knitr::opts_chunk$set( 13 | collapse = TRUE, 14 | comment = "#>", 15 | fig.path = "README-" 16 | ) 17 | ``` 18 | 19 | 20 | # wordcountaddin 21 | 22 | [![Last-changedate](https://img.shields.io/badge/last%20change-`r gsub('-', '--', Sys.Date())`-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master) 23 | [![minimal R version](https://img.shields.io/badge/R%3E%3D-`r as.character(getRversion())`-brightgreen.svg)](https://cran.r-project.org/) 24 | [![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/) 25 | [![Travis-CI Build Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin) 26 | [![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master) [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531) 27 | 28 | 29 | 30 | 31 | This R package is an [RStudio addin](https://rstudio.github.io/rstudioaddins/) to count words and characters in text in an [R markdown](http://rmarkdown.rstudio.com/) document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read. 
32 | 33 | You can count words in your Rmd file in three ways: 34 | 35 | - In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin 36 | - All the words in your active Rmd in RStudio, by using the Wordcount Addin with no text selected 37 | - All the words in an Rmd file, directly using the `word_count` function from the console or command line (RStudio not required), and specifying the filename as an argument to the function (e.g. `wordcountaddin::word_count("my_file.Rmd")`). This will give you a single integer result, rather than the Markdown table that the other functions return. 38 | 39 | Independent of an Rmd file, you can also count words in a character vector from the console using the `text_stats_chr` function (and there is `readability_chr` for readability). 40 | 41 | ## Word count 42 | 43 | When counting words in the text of your Rmd document, these things will be ignored: 44 | 45 | - YAML front matter 46 | - code chunks and inline code 47 | - text in HTML comment tags: `
`, `
` 49 | - inline URLs in this format: `[text of link](url)` 50 | - images with captions in this format: `![this is the caption](/path/to/image.png)` 51 | - header level indicators such as `#` and `##`, etc. 52 | 53 | And because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things. 54 | 55 | The word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX and raw HTML. 56 | 57 | In general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about the word counting methods. 58 | 59 | For this addin I've included two methods, mostly out of curiosity to see how they differ from each other. I use functions from the [stringi](https://cran.r-project.org/web/packages/stringi/index.html) and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) packages. If you're curious, you can compare the results you get with this addin to an online tool such as . 60 | 61 | The output of the `Word count` function is a markdown table in your R console that might look like this: 62 | 63 | ``` 64 | |Method |koRpus |stringi | 65 | |:---------------|:-----------|:-------------| 66 | |Word count |107 |104 | 67 | |Character count |604 |603 | 68 | |Sentence count |10 |Not available | 69 | |Reading time |0.5 minutes |0.5 minutes | 70 | ``` 71 | 72 | If you want to reuse these results in other R functions, you can call the underlying function directly, like this: `wordcountaddin::text_stats_fn_(text)` (it is listed as an export in `NAMESPACE`), where `text` is a character vector of your text (with length one, i.e. all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.
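As a rough sketch of that kind of reuse, the snippet below runs the function on a short string and pulls individual values out of the returned list. It assumes wordcountaddin and its dependencies (including koRpus.lang.en) are installed; the sample text and the `wc_stats` variable name are made up for illustration, and the element names are the ones defined in `R/hello.R`.

```r
# Sketch only: assumes wordcountaddin and its dependencies are installed.
# The sample text and the variable name wc_stats are illustrative.
sample_text <- "This is the first sentence. Here is a second, slightly longer sentence."

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  # ::: works whether or not the function is exported
  wc_stats <- wordcountaddin:::text_stats_fn_(sample_text)

  # The result is a plain list, so individual values can be reused directly
  print(names(wc_stats))
  print(wc_stats$n_words_korp)       # word count according to koRpus
  print(wc_stats$n_words_stri)       # word count according to stringi
  print(wc_stats$reading_time_korp)  # estimated reading time, as a string
}
```

Because the values are ordinary list elements, they can be passed on to other R code, for example to report a word count in inline R code in an Rmd.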
73 | 74 | ## Readability 75 | 76 | The readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of [readability statistics](https://en.wikipedia.org/wiki/Readability_test). 77 | 78 | Most of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don't analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) package. 79 | 80 | There are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of readability measurements that are returned by the readability function, run `?koRpus::readability`. That help page also shows the formulae and citations for each statistic (and an additional 20-odd other readability statistics not used here). 81 | 82 | Readability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read [_Style: Toward Clarity and Grace_](http://www.amazon.com/dp/0226899152).
83 | 84 | 85 | The output of the `readability` function is a markdown table in your R console that might look like this: 86 | 87 | ``` 88 | 89 | |index |flavour |raw |grade |age | 90 | |:---------------------|:-----------|:-----|:-----|:----| 91 | |ARI | | |2.31 | | 92 | |Coleman-Liau | |66 |4.91 | | 93 | |Danielson-Bryan DB1 | |6.46 | | | 94 | |Danielson-Bryan DB2 | |60.39 |6 | | 95 | |Dickes-Steiwer | |53.07 | | | 96 | |ELF | |1.83 | | | 97 | |Farr-Jenkins-Paterson | |66.81 |8-9 | | 98 | |Flesch |en (Flesch) |69.57 |8-9 | | 99 | |Flesch-Kincaid | | |4.85 |9.8 | 100 | |FOG | | |7.84 | | 101 | |FORCAST | | |10.28 |15.3 | 102 | |Fucks | |23.38 |4.83 | | 103 | |Linsear-Write | | |2.35 | | 104 | |LIX | |32.41 |< 5 | | 105 | |nWS1 | | |4.19 | | 106 | |nWS2 | | |4.72 | | 107 | |nWS3 | | |4.14 | | 108 | |nWS4 | | |3.64 | | 109 | |RIX | |1.42 |5 | | 110 | |SMOG | | |8.08 |13.1 | 111 | |Strain | |2.44 | | | 112 | |TRI | |-94 | | | 113 | |Tuldava | |2.57 | | | 114 | |Wheeler-Smith | |18.33 |2 | | 115 | ``` 116 | 117 | Similar to the `word count` function, if you want to reuse these results in other R functions, you can use an unexported function like this `wordcountaddin:::readability_fn_(text)`, where `text` is a character vector of your text (with length one, ie. all your text in a single character string). The output will be a list object with slightly more detail than the summary table above. 118 | 119 | Inspiration for this addin came from [jadd](https://github.com/jennybc/jadd) and [WrapRmd](https://github.com/tjmahr/WrapRmd). 120 | 121 | ## How to install 122 | 123 | Install with `devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)` 124 | 125 | Go to `Tools > Addins` in RStudio to select and configure addins. 126 | 127 | ## How to use 128 | 129 | 1. Open a Rmd file in RStudio. 130 | 2. Select some text, it can include YAML, code chunks and inline code 131 | 3. 
Go to `Tools > Addins` in RStudio and click on `Word count` or `Readability`. Computing `Readability` may take a few moments on longer documents because it has to count syllables for some of the stats. 132 | 4. Look in the console for the output 133 | 134 | 135 | ## Feedback, contributing, etc. 136 | 137 | Please [open an issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you find something that doesn't work as expected. Note that this project is released with a [Guide to Contributing](CONTRIBUTING.md) and a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms. 138 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | wordcountaddin 3 | ===================================================================== 4 | 5 | [![Last-changedate](https://img.shields.io/badge/last%20change-2019--01--09-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master) 6 | [![minimal R 7 | version](https://img.shields.io/badge/R%3E%3D-3.5.2-brightgreen.svg)](https://cran.r-project.org/) 8 | [![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/) 9 | [![Travis-CI Build 10 | Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin) 11 | [![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master) 12 | [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531) 13 | 14 | This R package is an [RStudio 15 | addin](https://rstudio.github.io/rstudioaddins/) to count words and 16 | characters in text in an [R markdown](http://rmarkdown.rstudio.com/) 17 | document. 
It also has a function to compute readability statistics so 18 | you can get an indication of how easy or difficult your document is to 19 | read. 20 | 21 | You can count words in your Rmd file in three ways: 22 | 23 | - In a selection of text in your active Rmd, by selecting some text 24 | with your mouse in RStudio and using the Wordcount Addin 25 | - All the words in your active Rmd in RStudio, by using the Wordcount 26 | Addin with no text selected 27 | - All the words in an Rmd file, directly using the `word_count` 28 | function from the console or command line (RStudio not required), 29 | and specifying the filename as an argument to the function (e.g. 30 | `wordcountaddin::word_count("my_file.Rmd")`). This will give you a 31 | single integer result, rather than the Markdown table that the other 32 | functions return. 33 | 34 | Independent of an Rmd file, you can also count words in a character 35 | vector from the console using the `text_stats_chr` function (and there 36 | is `readability_chr` for readability). 37 | 38 | Word count 39 | ---------- 40 | 41 | When counting words in the text of your Rmd document, these things will 42 | be ignored: 43 | 44 | - YAML front matter 45 | - code chunks and inline code 46 | - text in HTML comment tags: `` 47 | - HTML tags in the text: `
`, `
` 48 | - inline URLs in this format: `[text of link](url)` 49 | - images with captions in this format: 50 | `![this is the caption](/path/to/image.png)` 51 | - header level indicators such as `#` and `##`, etc. 52 | 53 | And because my regex is quite simple, the word count function may also 54 | ignore parts of your actual text that resemble these things. 55 | 56 | The word count will include text in headers, block quotations, verbatim 57 | code blocks, tables, raw LaTeX and raw HTML. 58 | 59 | In general, there are numerous ways to count words, with no widely 60 | accepted standard method. The variety of methods is due to differences 61 | in the definitions of a word and a sentence. Run 62 | `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about 63 | the word counting methods. 64 | 65 | For this addin I’ve included two methods, mostly out of curiosity to see 66 | how they differ from each other. I use functions from the 67 | [stringi](https://cran.r-project.org/web/packages/stringi/index.html) 68 | and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) 69 | packages. If you’re curious, you can compare the results you get with 70 | this addin to an online tool such as 71 | http://wordcounttools.com/. 72 | 73 | The output of the `Word count` function is a markdown table in your R 74 | console that might look like this: 75 | 76 | |Method |koRpus |stringi | 77 | |:---------------|:-----------|:-------------| 78 | |Word count |107 |104 | 79 | |Character count |604 |603 | 80 | |Sentence count |10 |Not available | 81 | |Reading time |0.5 minutes |0.5 minutes | 82 | 83 | If you want to reuse these results in other R functions, you can use an 84 | unexported function like this `wordcountaddin:::text_stats_fn_(text)`, 85 | where `text` is a character vector of your text (with length one, ie. 86 | all your text in a single character string). 
The output will be a list 87 | object, and will include several other items not shown in the markdown 88 | table. 89 | 90 | Readability 91 | ----------- 92 | 93 | The readability function ignores all the same parts of the text as the 94 | word count function, and then computes the values of a bunch of 95 | [readability 96 | statistics](https://en.wikipedia.org/wiki/Readability_test). 97 | 98 | Most of these readability measurements aim to approximate the years of 99 | education required to understand your text. They look at the number of 100 | characters and syllables per word, the number of words per sentence, and 101 | so on. They don’t analyse the meaning of the words. A score of around 102 | 10-12 is roughly the reading level on completion of high school in the 103 | US. These stats are computed by the 104 | [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) 105 | package. 106 | 107 | There are about 27 measurements that this readability function returns 108 | (depending on how long your text is), including the Automated 109 | Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, 110 | and the Simple Measure of Gobbledygook (SMOG). For the full list of 111 | readability measurements that are returned by the readability function, 112 | run `?koRpus::readability`. That help page also shows the formulae and 113 | citations for each statistic (and an additional 20-odd other readability 114 | statistics not used here). 115 | 116 | Readability stats are, of course, no substitute for critical 117 | self-reflection on the effectiveness of your writing at communicating 118 | ideas and information. To help with that, read [*Style: Toward Clarity 119 | and Grace*](http://www.amazon.com/dp/0226899152). 
120 | 121 | The output of the `readability` function is a markdown table in your R 122 | console that might look like this: 123 | 124 | 125 | |index |flavour |raw |grade |age | 126 | |:---------------------|:-----------|:-----|:-----|:----| 127 | |ARI | | |2.31 | | 128 | |Coleman-Liau | |66 |4.91 | | 129 | |Danielson-Bryan DB1 | |6.46 | | | 130 | |Danielson-Bryan DB2 | |60.39 |6 | | 131 | |Dickes-Steiwer | |53.07 | | | 132 | |ELF | |1.83 | | | 133 | |Farr-Jenkins-Paterson | |66.81 |8-9 | | 134 | |Flesch |en (Flesch) |69.57 |8-9 | | 135 | |Flesch-Kincaid | | |4.85 |9.8 | 136 | |FOG | | |7.84 | | 137 | |FORCAST | | |10.28 |15.3 | 138 | |Fucks | |23.38 |4.83 | | 139 | |Linsear-Write | | |2.35 | | 140 | |LIX | |32.41 |< 5 | | 141 | |nWS1 | | |4.19 | | 142 | |nWS2 | | |4.72 | | 143 | |nWS3 | | |4.14 | | 144 | |nWS4 | | |3.64 | | 145 | |RIX | |1.42 |5 | | 146 | |SMOG | | |8.08 |13.1 | 147 | |Strain | |2.44 | | | 148 | |TRI | |-94 | | | 149 | |Tuldava | |2.57 | | | 150 | |Wheeler-Smith | |18.33 |2 | | 151 | 152 | Similar to the `word count` function, if you want to reuse these results 153 | in other R functions, you can use an unexported function like this 154 | `wordcountaddin:::readability_fn_(text)`, where `text` is a character 155 | vector of your text (with length one, ie. all your text in a single 156 | character string). The output will be a list object with slightly more 157 | detail than the summary table above. 158 | 159 | Inspiration for this addin came from 160 | [jadd](https://github.com/jennybc/jadd) and 161 | [WrapRmd](https://github.com/tjmahr/WrapRmd). 162 | 163 | How to install 164 | -------------- 165 | 166 | Install with 167 | `devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)` 168 | 169 | Go to `Tools > Addins` in RStudio to select and configure addins. 170 | 171 | How to use 172 | ---------- 173 | 174 | 1. Open a Rmd file in RStudio. 175 | 2. 
Select some text, it can include YAML, code chunks and inline code 176 | 3. Go to `Tools > Addins` in RStudio and click on `Word count` or 177 | `Readability`. Computing `Readability` may take a few moments on 178 | longer documents because it has to count syllables for some of the 179 | stats. 180 | 4. Look in the console for the output 181 | 182 | Feedback, contributing, etc. 183 | ---------------------------- 184 | 185 | Please [open an 186 | issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you 187 | find something that doesn’t work as expected. Note that this project is 188 | released with a [Guide to Contributing](CONTRIBUTING.md) and a 189 | [Contributor Code of Conduct](CONDUCT.md). By participating in this 190 | project you agree to abide by its terms. 191 | -------------------------------------------------------------------------------- /codecov.yml: -------------------------------------------------------------------------------- 1 | comment: false 2 | -------------------------------------------------------------------------------- /inst/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/inst/logo.png -------------------------------------------------------------------------------- /inst/rstudio/addins.dcf: -------------------------------------------------------------------------------- 1 | Name: Word count 2 | Description: Counts words and characters (excluding code chunks, inline code, etc.) 3 | Binding: text_stats 4 | Interactive: true 5 | 6 | Name: Readability 7 | Description: Computes readability statistics (excluding code chunks, inline code, etc.) 
8 | Binding: readability 9 | Interactive: true 10 | -------------------------------------------------------------------------------- /man/text_stats.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/hello.R 3 | \name{text_stats} 4 | \alias{text_stats} 5 | \alias{word_count} 6 | \alias{readability} 7 | \alias{text_stats_chr} 8 | \alias{readability_chr} 9 | \alias{text_stats_fn_} 10 | \title{Get text stats for selected text (excluding code chunks and inline code)} 11 | \usage{ 12 | text_stats(filename = this_filename()) 13 | 14 | word_count(filename = this_filename()) 15 | 16 | readability(filename = this_filename(), quiet = TRUE) 17 | 18 | text_stats_chr(text) 19 | 20 | readability_chr(text, quiet = TRUE) 21 | 22 | text_stats_fn_(text) 23 | } 24 | \arguments{ 25 | \item{filename}{Path to the file on which to compute text stats. 26 | Default is the current file (when working in RStudio) or the file being 27 | knit (when compiling with \code{knitr}).} 28 | 29 | \item{quiet}{Logical. 
Should task be performed quietly?} 30 | 31 | \item{text}{a character string of text, length of one} 32 | } 33 | \description{ 34 | Call this addin to get a word count and some other stats about the text 35 | 36 | Get a word count as a single integer 37 | 38 | Get readability stats for selected text (excluding code chunks) 39 | 40 | Get text stats for selected text (excluding code chunks and inline code) 41 | 42 | Get readability stats for selected text (excluding code chunks) 43 | } 44 | \details{ 45 | Call this addin to get readability stats about the text 46 | 47 | Use this function with a character string as input 48 | 49 | Use this function with a character string as input 50 | } 51 | \examples{ 52 | md <- system.file(package = "wordcountaddin", "NEWS.md") 53 | text_stats(md) 54 | word_count(md) 55 | \dontrun{ 56 | readability(md) 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /man/wordcountaddin.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/hello.R 3 | \docType{package} 4 | \name{wordcountaddin} 5 | \alias{wordcountaddin} 6 | \title{wordcountaddin} 7 | \description{ 8 | This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read. 
9 | } 10 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(wordcountaddin) 3 | 4 | test_check("wordcountaddin") 5 | -------------------------------------------------------------------------------- /tests/testthat/test_wordcountaddin.R: -------------------------------------------------------------------------------- 1 | library(wordcountaddin) 2 | 3 | context("Word count") 4 | 5 | test_that("Word count is correct for short simple sentence", { 6 | # short sentence 7 | eleven_words <- "here are exactly eleven words of fairly boring and unpunctuated text" 8 | 9 | short_stats <- text_stats_fn_(eleven_words) 10 | # qdap cannot manage without final punct. 11 | n_words_stri_11 <-short_stats$n_words_stri 12 | n_words_korp_11 <- short_stats$n_words_korp 13 | 14 | n_char_tot_stri_11 <- short_stats$n_char_tot_stri 15 | n_char_tot_korp_11 <- short_stats$n_char_tot_korp 16 | 17 | expect_equal(n_words_stri_11, 11) 18 | expect_equal(n_words_korp_11, 11) 19 | expect_equal(n_char_tot_stri_11, 68) 20 | expect_equal(n_char_tot_korp_11, 69) 21 | }) 22 | 23 | test_that("Word count is correct for moderately complex sentences", { 24 | # Moderate: Harvard sentences, https://en.wikipedia.org/wiki/Harvard_sentences 25 | moderately_complex <- "The birch canoe slid on the smooth planks. Glue the sheet to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of steady work faced us. Large size in stockings is hard to sell." 
26 | 27 | moderately_complex_stats <- text_stats_fn_(moderately_complex) 28 | 29 | n_char_tot_stri_mc <- moderately_complex_stats$n_char_tot_stri 30 | n_char_tot_korp_mc <- moderately_complex_stats$n_char_tot_korp 31 | 32 | n_words_stri_mc <- moderately_complex_stats$n_words_stri 33 | n_words_korp_mc <- moderately_complex_stats$n_words_korp 34 | 35 | n_sentences_korp_mc <- moderately_complex_stats$n_sentences_korp 36 | 37 | expect_equal(n_char_tot_stri_mc, 406) 38 | expect_equal(n_char_tot_korp_mc, 407) 39 | expect_equal(n_words_stri_mc, 80) # MS Word says 79 40 | expect_equal(n_words_korp_mc, 80) 41 | expect_equal(n_sentences_korp_mc, 10) 42 | }) 43 | 44 | 45 | 46 | test_that("Word count is correct for complex sentences in filler text", { 47 | # Filler text with various punctuation 48 | filler <- "Lorem ipsum dolor sit amet, ea debet error sensibus vix, at esse decore vivendo vim, rebum aliquip an cum? His ea agam novum dissentiet! At mel audire liberavisse, mundi audiam quaeque sea ne. In eam error habemus delectus, audiam ocurreret ne sit, sit ei salutandi liberavisse! Ut vix case corpora. 49 | 50 | Posse malorum ponderum in qui, et eum dicam disputando, an vix quaestio scripserit. Falli veniam tamquam id mei. Modo sumo appetere cu mea, mutat possim rationibus ius id. Sed nominati antiopam cu, cu prima mandamus vim. Eos cu exerci consul! 51 | 52 | Nam case atomorum suavitate cu? No quo inermis necessitatibus, eos ne essent scripta vivendum, ea euismod quaestio qui? Per minim tation accusamus eu, audire dolores nam an. Vel vocent inimicus ut, eu porro libris argumentum quo. 53 | 54 | Vim no solet tempor, aperiam habemus assueverit ea usu: sea ut quodsi gloriatur! Eum te laudem aliquid inciderint, mollis prodesset mea ad. Dico definiebas efficiendi id usu. No bonorum suavitate adolescens per, ius oratio pericula ut, at mel porro vocibus scriptorem. Sea incorrupte definitiones necessitatibus in, cu ancillae conclusionemque duo. 
Ex vix dolore propriae principes, ius in augue ludus? 55 | 56 | Solet copiosae ea sed, at assum - dolore delenit has, ex aperiam honestatis mei. No legere nemore nonumes mel. Eu ullum accusata nam, an sea wisi rebum. Ei homero equidem sea! Sed erat augue eripuit et, ea vim altera eirmod labores, ad noster veritus nec. 57 | 58 | Ut porro sententiae vis, debet affert eligendi id eam! In, nominati, pertinacia has, sea admodum dissentiunt eu! Volumus appellantur ex eos. Ei duo movet scripta aliquid, ea blandit explicari consectetuer eos. 59 | 60 | Ne cibo ornatus vituperata pri. Soleat populo fierent ne sed, vel congue consequat temporibus in. Pro eu nostro inermis sadipscing, ne pri possim lobortis! Sea sonet nihil accusata no. Mei virtute noluisse pericula ex, aliquid mandamus inimicus quo ex. 61 | 62 | Esse patrioque at qui, cum sanctus; consequuntur conclusionemque cu? Ut summo oportere appellantur mel, ex per tale semper appellantur. Usu ea alia insolens sadipscing, eu aeterno persius vix. Agam prodesset interpretaris at ius, ne est malis signiferumque, illum soluta albucius mei an. Ex error tollit recusabo est, ut prompta consectetuer per. Dicam numquam eum id, brute mollis nam cu! 63 | 64 | Ei vis discere interesset! Mutat 'option' qualisque ius te, sea deserunt lobortis voluptatum at. Qui et impedit accumsan atomorum, nam dicat possit ornatus an? Eu mei aperiri discere, sea veri homero ad, stet dolore putant mei in. Eu pri debet populo luptatum, eos te nominati concludaturque. 65 | 66 | Tota veritus similique ne per, eam fastidii voluptatum eu. Sea tale mandamus suscipiantur ex. Ullum ullamcorper consequuntur et cum, aeque fuisset ut sea! Mea graecis pertinax explicari ne, pri tale hinc no? Eu vidisse nominati eum, et eam hendrerit voluptatum assueverit, qui ne munere recusabo democritum." 
67 | 68 | filler_stats <- text_stats_fn_(filler) 69 | 70 | n_char_tot_stri_f <- filler_stats$n_char_tot_stri 71 | n_char_tot_korp_f <- filler_stats$n_char_tot_korp 72 | 73 | n_words_stri_f <- filler_stats$n_words_stri 74 | n_words_korp_f <- filler_stats$n_words_korp 75 | 76 | n_sentences_korp_f <- filler_stats$n_sentences_korp 77 | 78 | expect_equal(n_char_tot_stri_f, 2896) 79 | expect_equal(n_char_tot_korp_f, 2897) 80 | expect_equal(n_words_stri_f, 450) 81 | expect_equal(n_words_korp_f, 450) # MS Word says 442 82 | expect_equal(n_sentences_korp_f, 52) 83 | }) 84 | 85 | 86 | 87 | test_that("Word count is correct for rmd text", { 88 | # text with code chunks, etc. 89 | rmd_text <- " 90 | 91 | --- 92 | title: 'Untitled' 93 | output: html_document 94 | --- 95 | 96 | ```{r setup, include=FALSE} 97 | knitr::opts_chunk$set(echo = TRUE) 98 | ``` 99 | 100 | 101 | 102 | ## Heading 103 | 104 | This is an [R markdown](http://rmarkdown.rstudio.com/) document. 105 | 106 | ```{r cars} 107 | summary(cars) 108 | # Lines line this have caused problems ----------------------------------------- 109 | ``` 110 | 111 | `r 2+2` 112 | 113 | `r nrow(cars)` 114 | 115 | ## Plots 116 | 117 | You can also embed plots, for example: 118 | 119 | ```{r pressure, echo=FALSE} 120 | plot(pressure) 121 | ``` 122 | 123 | ![this is the caption](/path/to/image.png) 124 | 125 | " 126 | 127 | rmd_stats <- text_stats_fn_(rmd_text) 128 | 129 | n_char_tot_stri_r <- rmd_stats$n_char_tot_stri 130 | n_char_tot_korp_r <- rmd_stats$n_char_tot_korp 131 | 132 | n_words_stri_r <- rmd_stats$n_words_stri 133 | n_words_korp_r <- rmd_stats$n_words_korp 134 | 135 | n_sentences_korp_r <- rmd_stats$n_sentences_korp 136 | 137 | expect_equal(n_char_tot_stri_r, 159) 138 | expect_equal(n_char_tot_korp_r, 159) 139 | expect_equal(n_words_stri_r, 20) 140 | expect_equal(n_words_korp_r, 20) 141 | expect_equal(n_sentences_korp_r, 4) 142 | }) 143 | 144 | test_that("we can ignore
and
", { 145 | # test for
146 | string_with_br <- "Hi, I have
in the
string" 147 | 148 | string_with_br_stats <- text_stats_fn_(string_with_br) 149 | 150 | n_char_tot_stri_r <- string_with_br_stats$n_char_tot_stri 151 | n_char_tot_korp_r <- string_with_br_stats$n_char_tot_korp 152 | 153 | n_words_stri_r <- string_with_br_stats$n_words_stri 154 | n_words_korp_r <- string_with_br_stats$n_words_korp 155 | 156 | n_sentences_korp_r <- string_with_br_stats$n_sentences_korp 157 | 158 | expect_equal(n_char_tot_stri_r, 26) 159 | expect_equal(n_char_tot_korp_r, 27) 160 | expect_equal(n_words_stri_r, 6) 161 | expect_equal(n_words_korp_r, 6) 162 | expect_equal(n_sentences_korp_r, 0) 163 | }) 164 | 165 | test_that("we can ignore HTML tags but keep greater/less", { 166 | string_gr_ls <- "Hi,
I am <20 but >10 years old" 167 | 168 | expect_equal(prep_text(string_gr_ls), 169 | "Hi, I am 20 but 10 years old") 170 | }) 171 | 172 | test_that("Word count is correct for rmd file", { 173 | # test that we can word count on a file 174 | the_rmd_file_stats <- text_stats(filename = test_path("test_wordcountaddin.Rmd")) 175 | 176 | expect_equal(the_rmd_file_stats[3], 177 | "|Word count |108 |107 |") 178 | expect_equal(the_rmd_file_stats[4], 179 | "|Character count |628 |628 |") 180 | expect_equal(the_rmd_file_stats[5], 181 | "|Sentence count |9 |Not available |") 182 | expect_equal(the_rmd_file_stats[6], 183 | "|Reading time |0.5 minutes |0.5 minutes |") 184 | }) 185 | 186 | 187 | test_that("Word count is correct for cmd line", { 188 | # command line fns 189 | text_on_the_command_line <- "here is some text" 190 | text_stats_chr_out <- text_stats_chr(text_on_the_command_line) 191 | 192 | expect_equal(text_stats_chr_out[3], 193 | "|Word count |4 |4 |") 194 | expect_equal(text_stats_chr_out[4], 195 | "|Character count |18 |17 |") 196 | expect_equal(text_stats_chr_out[5], 197 | "|Sentence count |0 |Not available |") 198 | expect_equal(text_stats_chr_out[6], 199 | "|Reading time |0 minutes |0 minutes |") 200 | }) 201 | 202 | 203 | test_that("readability is correct for cmd line", { 204 | text_on_the_command_line <- "here is some text" 205 | expect_output( 206 | expect_warning( 207 | readability_chr_out <- readability_chr(text_on_the_command_line) 208 | ) 209 | ) 210 | expect_length(readability_chr_out, 26) 211 | }) 212 | 213 | test_that("Word count is correct for text with % sign", { 214 | # test for escaping the percent sign in plain text 215 | text_with_percent_sign <- "Here is some % text with percent % signs in it." 
216 | 217 | text_stats_percent_chr_out <- text_stats_chr(text_with_percent_sign) 218 | expect_equal(text_stats_percent_chr_out[3], 219 | "|Word count |9 |9 |") 220 | }) 221 | 222 | 223 | test_that("Word count is correct for text with figures included using LaTeX code", { 224 | # test for escaping the percent sign in plain text 225 | text_with_figures <- "One \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Two \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Three" 226 | 227 | text_stats_percent_chr_out <- text_stats_chr(text_with_figures) 228 | expect_equal(text_stats_percent_chr_out[3], 229 | "|Word count |3 |3 |") 230 | }) 231 | 232 | 233 | test_that("Word count is a single integer for a Rmd file when using word_count", { 234 | # test that we can word count on a file 235 | the_rmd_word_count <- word_count(filename = test_path("test_wordcountaddin.Rmd")) 236 | 237 | expect_equal(the_rmd_word_count, 238 | 108L) 239 | }) 240 | 241 | test_that("We can handle very long strings, like citation keys", { 242 | 243 | expect_output( 244 | expect_warning( 245 | # test that we can word count on a file 246 | long_string_read <- readability_chr("it's a long string right at the end here because a tiny 247 | fraction of the refreneces have crazy long keys. Why do they do 248 | that? It's autogenerated. Why does this give so many warnings when 249 | testing. It's a puzzle. [@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa].", 250 | quiet = TRUE))) 251 | 252 | expect_equal( attr(long_string_read, 'format'), "markdown") 253 | 254 | }) 255 | 256 | test_that("don't count abbreviations as multiple words", { 257 | 258 | 259 | # test that we can word count on a file 260 | words_with_abbv <- "zero .o.n.e .t.wo." 
261 | abbrev_count <- text_stats_chr(words_with_abbv) 262 | 263 | expect_equal( abbrev_count[3], "|Word count |3 |3 |") 264 | 265 | }) 266 | 267 | test_that("text_to_count reads file contents as character vector", { 268 | contents <- text_to_count(test_path("test_wordcountaddin.Rmd")) 269 | 270 | expect_type(contents, "character") 271 | expect_length(contents, 1) 272 | }) 273 | 274 | test_that("text_to_count raises an error for invalid file types", { 275 | expect_error(text_to_count("invalid.tif"), regexp = "works with markdown") 276 | }) 277 | 278 | -------------------------------------------------------------------------------- /tests/testthat/test_wordcountaddin.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "rmd_test_file.rmd" 3 | output: 4 | word_document: default 5 | html_document: default 6 | --- 7 | 8 | ```{r setup, include=FALSE} 9 | knitr::opts_chunk$set(echo = TRUE) 10 | ``` 11 | 12 | ## R Markdown 13 | 14 | This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see . 15 | 16 | When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this: 17 | 18 | ```{r cars} 19 | summary(cars) 20 | 21 | # Lines line this have caused problems ----------------------------------------- 22 | ``` 23 | 24 | ## Including Plots 25 | 26 | You can also embed plots, for example: 27 | 28 | ```{r pressure, echo=FALSE} 29 | plot(pressure) 30 | ``` 31 | 32 | Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot. 
33 | 34 | ```{r} 35 | # context <- rstudioapi::getActiveDocumentContext() 36 | ``` 37 | 38 | This Markdown file contains `r wordcountaddin::word_count()` words: 39 | 40 | ```{r, message=FALSE, echo=FALSE, error=TRUE} 41 | wordcountaddin::text_stats() 42 | ``` 43 | 44 | 45 | ::: {.cell layout-align="center"} 46 | 47 | ::: 48 | 49 | ::: {.cell layout-align="center"} 50 | ::: {.cell-output-display} 51 | ::: 52 | ::: 53 | -------------------------------------------------------------------------------- /tests/testthat/test_wordcountaddin.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/tests/testthat/test_wordcountaddin.docx -------------------------------------------------------------------------------- /wordcountaddin.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: XeLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace 22 | --------------------------------------------------------------------------------