├── .DS_Store
├── .Rbuildignore
├── .github
    ├── ISSUE_TEMPLATE.md
    └── ISSUE_TEMPLATE
    │   └── bug_report.md
├── .gitignore
├── .travis.yml
├── CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── NEWS.md
├── R
    ├── hello.R
    └── utils.R
├── README.Rmd
├── README.md
├── codecov.yml
├── inst
    ├── logo.png
    ├── rstudio
    │   └── addins.dcf
    └── wordcountaddin.svg
├── man
    ├── text_stats.Rd
    └── wordcountaddin.Rd
├── tests
    ├── testthat.R
    └── testthat
    │   ├── test_wordcountaddin.R
    │   ├── test_wordcountaddin.Rmd
    │   └── test_wordcountaddin.docx
└── wordcountaddin.Rproj


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/.DS_Store


--------------------------------------------------------------------------------
/.Rbuildignore:
--------------------------------------------------------------------------------
 1 | ^.*\.Rproj$
 2 | ^\.Rproj\.user$
 3 | ^\.travis\.yml$
 4 | ^README\.Rmd$
 5 | ^README-.*\.png$
 6 | ^CONDUCT\.md$
 7 | ^CONTRIBUTING\.md$
 8 | ^codecov\.yml$
 9 | ^wordcountaddin\.Rcheck$
10 | ^wordcountaddin.*\.tar\.gz$
11 | ^wordcountaddin.*\.tgz$
12 | ^\.github$
13 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Bug report
 3 | about: Create a report to help us improve
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Please wait for some discussion of your report before making a Pull Request.**
11 | 
12 | **Describe the bug**
13 | A clear and concise description of what the bug is.
14 | 
15 | **To Reproduce**
16 | 
17 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.
18 | 
19 | Describe the steps to reproduce the behavior:
20 | 1. Go to '...'
21 | 2. Click on '....'
22 | 3. Scroll down to '....'
23 | 4. See error
24 | 
25 | **Expected behavior**
26 | A clear and concise description of what you expected to happen.
27 | 
28 | **Session Info**
29 | Please paste the output of `sessionInfo()` from your machine so we can see which packages and versions you have.
30 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Bug report
 3 | about: Create a report to help us improve
 4 | title: ''
 5 | labels: bug
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 | 
13 | **To Reproduce**
14 | Please include a minimal reproducible example (AKA a reprex). If you've never heard of a [reprex](http://reprex.tidyverse.org/) before, start by reading <https://www.tidyverse.org/help/#reprex>.
15 | 
16 | **Expected behavior**
17 | A clear and concise description of what you expected to happen.
18 | 
19 | **Session Info**
20 | Please paste the output of `sessionInfo()` from your machine so we can see which packages and versions you have.
21 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | wordcountaddin.Rcheck/
5 | wordcountaddin*.tar.gz
6 | wordcountaddin*.tgz
7 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | # Sample .travis.yml for R projects
 2 | 
 3 | language: r
 4 | warnings_are_errors: false
 5 | sudo: required
 6 | 
 7 | r_github_packages:
 8 |   - jimhester/covr
 9 | 
10 | after_success:
11 |   - Rscript -e 'covr::codecov()'
12 | 
13 | 
14 | 


--------------------------------------------------------------------------------
/CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Contributor Code of Conduct
 2 | 
 3 | As contributors and maintainers of this project, we pledge to respect all people who 
 4 | contribute through reporting issues, posting feature requests, updating documentation,
 5 | submitting pull requests or patches, and other activities.
 6 | 
 7 | We are committed to making participation in this project a harassment-free experience for
 8 | everyone, regardless of level of experience, gender, gender identity and expression,
 9 | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
10 | 
11 | Examples of unacceptable behavior by participants include the use of sexual language or
12 | imagery, derogatory comments or personal attacks, trolling, public or private harassment,
13 | insults, or other unprofessional conduct.
14 | 
15 | Project maintainers have the right and responsibility to remove, edit, or reject comments,
16 | commits, code, wiki edits, issues, and other contributions that are not aligned to this 
17 | Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 
18 | from the project team.
19 | 
20 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 
21 | opening an issue or contacting one or more of the project maintainers.
22 | 
23 | This Code of Conduct is adapted from the Contributor Covenant 
24 | (http://contributor-covenant.org), version 1.0.0, available at 
25 | http://contributor-covenant.org/version/1/0/0/
26 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing Guidelines
 2 | 
 3 | ## Pull requests
 4 | 
 5 | Requirements for making a pull request:
 6 | 
 7 | * Some knowledge of [git](https://git-scm.com/)
 8 | * Some knowledge of [GitHub](https://github.com/)
 9 | 
10 | Read more about pull requests on GitHub at [https://help.github.com/articles/using-pull-requests/](https://help.github.com/articles/using-pull-requests/). If you haven't done this before, Hadley Wickham provides a nice overview of git (<http://r-pkgs.had.co.nz/git.html>), as well as best practices for submitting pull requests (<http://r-pkgs.had.co.nz/git.html#pr-make>).
11 | 
12 | Then:
13 | 
14 | * Fork the repo to your GitHub account
15 | * Clone the fork on your account down to your machine, e.g., `git clone git@github.com:<your-username>/<package name>.git`
16 | * Make sure to track progress upstream (i.e., on our version of the package at `benmarwick/<package name>`) by running `git remote add upstream git@github.com:benmarwick/<package name>.git`. Each time you start making changes on your machine, pull in changes from upstream (i.e., the benmarwick version), either with `git fetch upstream` followed by a merge, or with `git pull upstream master` to fetch and merge in one step.
17 | * Make your changes (we prefer if you make changes on a new branch)
18 | * Ideally, your contribution should include:
19 |     * Well-documented code with roxygen2 comments
20 |     * One or more tests, if you add new functions or change functionality
21 | * Make sure the package passes `R CMD check` on your machine without errors or warnings
22 | * Push up to your account
23 | * Submit a pull request and participate in the discussion.
24 | 
25 | ## Documentation contributions
26 | 
27 | Documentation contributions are always welcome, since every project can use clearer instructions. If you are editing any files in the repo, follow the pull request instructions above. If you are editing the wiki, you can edit it directly, without git or pull requests.
28 | 
29 | All of the function documentation is generated automatically. Please do not edit any of the documentation files in man/ or the NAMESPACE. Instead, construct the appropriate roxygen2 documentation in the function files in R/ themselves. The documentation is then generated by running the document() function from the devtools package. Please consult the Advanced R programming guide if this workflow is unfamiliar to you. Note that functions should include examples in the documentation. Please use \dontrun for examples that take more than a few seconds to execute or require an internet connection.
30 | 
31 | Likewise, the README.md file in the base directory should not be edited directly. It is generated automatically from README.Rmd, which runs the examples shown, helping to ensure that they work as advertised. Instead, edit the README.Rmd source file in the base directory and knit it to rebuild README.md.
32 | 
33 | ## Repository structure
34 | 
35 | This repository is structured as a standard R package following the conventions outlined in the *Writing R Extensions* manual. A few additional files that are not part of the built R package are listed in .Rbuildignore, such as .travis.yml, which is used for continuous integration and testing.
36 | 
37 | ## Code
38 | 
39 | All code for this package is found in R/ (except compiled source code, if used, which is in src/). All functions should be thoroughly documented with roxygen2 notation; see Documentation contributions above.
40 | 
41 | Bug reports _must_ have a [reproducible example](http://adv-r.had.co.nz/Reproducibility.html) and include the output of `devtools::session_info()` (instead of `sessionInfo()`). We recommend using Hadley Wickham's style guide when writing code (<http://adv-r.had.co.nz/Style.html>).
42 | 
43 | ## Testing
44 | 
45 | Any new feature or bug fix should include a unit test demonstrating the change. Unit tests follow the testthat framework with files in tests/testthat. Please make sure that the test suite passes before opening a pull request. You can do this by running `check()` from the devtools package, which will also check for consistent documentation, etc.
46 | 
47 | This package uses the Travis CI continuous integration service to ensure that the test suite is run on each push to GitHub. An icon at the top of the README.md indicates whether or not the tests are currently passing.
48 | 
49 | ## Questions or comments?
50 | 
51 | Do not hesitate to open an issue in the issues tracker to raise any questions or comments about the package or these guidelines.
52 | 


--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: wordcountaddin
 2 | Type: Package
 3 | Title: Word Counts and Readability Statistics in R Markdown Documents
 4 | Version: 0.3.0.9000
 5 | Authors@R: c(person("Ben", "Marwick",
 6 |                   email = "benmarwick@gmail.com",
 7 |                   role = c("aut", "cre")),
 8 |             person("JooYoung", "Seo",
 9 |                   email = "jooyoung@psu.edu",
10 |                   role = "ctb", comment = c(ORCID = "0000-0002-4064-6012")),
11 |             person("Henrik", "Bengtsson",
12 |                   email = "henrik.bengtsson@gmail.com",
13 |                   role = "ctb"),
14 |             person("Florian S.", "Schaffner",
15 |                   email = "florian.schaffner@outlook.com",
16 |                   role = "ctb"),
17 |             person("Matthew T.", "Warkentin",
18 |                    email = "warkentin@lunenfeld.ca",
19 |                    role = "ctb"),
20 |             person("Luke A.", "McGuinness",
21 |                   email = "luke.a.mcguinness@gmail.com",
22 |                   role = "ctb",
23 |                   comment = c(ORCID = "0000-0001-8730-9761")))
24 | Maintainer: Ben Marwick <benmarwick@gmail.com>
25 | Description: An addin for RStudio that will count the words and characters
26 |     in a plain text document. It is designed for use with R Markdown
27 |     documents and will exclude YAML header content, code chunks and inline
28 |     code from the counts. It also computes readability statistics so you can
29 |     get an idea of how easy or difficult your text is to read.
30 | License: MIT + file LICENSE
31 | LazyData: TRUE
32 | Imports:
33 |     fs,
34 |     knitr,
35 |     koRpus,
36 |     koRpus.lang.en,
37 |     miniUI (>= 0.1.1),
38 |     purrr,
39 |     rstudioapi (>= 0.5),
40 |     shiny (>= 0.13),
41 |     stringi,
42 |     sylly,
43 |     sylly.en
44 | Encoding: UTF-8
45 | RoxygenNote: 7.1.1
46 | Suggests:
47 |     covr,
48 |     testthat
49 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | YEAR: 2017
2 | COPYRIGHT HOLDER: Ben Marwick
3 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(readability)
 4 | export(readability_chr)
 5 | export(text_stats)
 6 | export(text_stats_chr)
 7 | export(text_stats_fn_)
 8 | export(word_count)
 9 | import(koRpus)
10 | import(purrr)
11 | import(stringi)
12 | 


--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
 1 | # wordcountaddin 0.3.0
 2 | 
 3 | NEW FEATURES
 4 | 
 5 | * Count words from Rmd filename and get scalar as output (#20)
 6 | 
 7 | MINOR IMPROVEMENTS
 8 | 
 9 | * Made the functions more DRY by adding some unexported functions
10 | * Expanded README slightly
11 | * Added more tests
12 | 
13 | # wordcountaddin 0.2.0
14 | 
15 | NEW FEATURES
16 | 
17 | * Count words from Rmd filename without using RStudio (#3)
18 | * Count words in active Rmd in RStudio without making text selection (#3)
19 | * Count words in character string from command line (without Rmd or RStudio) (#2)
20 | 
21 | MINOR IMPROVEMENTS
22 | 
23 | * Added a `NEWS.md` file to track changes to the package.
24 | * Expanded README
25 | * Added more tests
26 | 
27 | BUG FIXES
28 | 
29 | * Fixed inaccurate count when `<br>` is present (#1)
30 | 
31 | DEPRECATED AND DEFUNCT
32 | 
33 | None
34 | 
35 | # wordcountaddin 0.1.0
36 | 
37 | Initial release
38 | 
39 | 
40 | 


--------------------------------------------------------------------------------
/R/hello.R:
--------------------------------------------------------------------------------
  1 | #' wordcountaddin
  2 | #'
  3 | #' This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.
  4 | #'
  5 | #' @name wordcountaddin
  6 | #' @docType package
  7 | #' @import purrr stringi koRpus
  8 | NULL
  9 | 
 10 | # global things
 11 | 
 12 |  md_file_ext_regex <- paste(
 13 |     "\\.markdown$",
 14 |     "\\.mdown$",
 15 |     "\\.mkdn$",
 16 |     "\\.md$",
 17 |     "\\.mkd$",
 18 |     "\\.mdwn$",
 19 |     "\\.mdtxt$",
 20 |     "\\.mdtext$",
 21 |     "\\.rmd$",
 22 |     "\\.Rmd$",
 23 |     "\\.RMD$",
 24 |     "\\.Rmarkdown$",
 25 |     "\\.qmd$",
 26 |   sep = "|")
 27 | 
 28 | 
 29 | #-------------------------------------------------------------------
 30 | # fns for working with selected text in an active Rmd
 31 | 
 32 | #' Get text stats for selected text (excluding code chunks and inline code)
 33 | #'
 34 | #' Call this addin to get a word count and some other stats about the text.
 35 | #' @param filename Path to the file on which to compute text stats.
 36 | #' Default is the current file (when working in RStudio) or the file being
 37 | #' knit (when compiling with \code{knitr}).
 38 | #'
 39 | #' @export
 40 | #' @examples
 41 | #' md <- system.file(package = "wordcountaddin", "NEWS.md")
 42 | #' text_stats(md)
 43 | #' word_count(md)
 44 | #' \dontrun{
 45 | #' readability(md)
 46 | #' }
 47 | text_stats <- function(filename = this_filename()) {
 48 | 
 49 |   text_to_count_output <- text_to_count(filename)
 50 | 
 51 |   text_stats_fn(text_to_count_output)
 52 | }
 53 | 
 54 | 
 55 | #' @rdname text_stats
 56 | #' @description Get a word count as a single integer
 57 | #' @export
 58 | word_count <- function(filename = this_filename()){
 59 | 
 60 |   text_to_count_output <- text_to_count(filename)
 61 | 
 62 |   word_count_output <- text_stats_fn_(text_to_count_output)
 63 | 
 64 |   word_count_output$n_words_korp
 65 | }
 66 | 
 67 | 
 68 | 
 69 | 
 70 | 
 71 | 
 72 | #' @rdname text_stats
 73 | #' @description Get readability stats for selected text (excluding code chunks)
 74 | #' @param quiet Logical. Should task be performed quietly?
 75 | #'
 76 | #' @details Call this addin to get readability stats about the text
 77 | #'
 78 | #' @export
 79 | readability <- function(filename = this_filename(), quiet = TRUE) {
 80 | 
 81 | 
 82 |   text_to_count_output <- text_to_count(filename)
 83 | 
 84 |   readability_fn(text_to_count_output, quiet = quiet)
 85 | }
 86 | 
 87 | #---------------------------------------------------------------
 88 | # directly work on a character string in the console
 89 | 
 90 | 
 91 | #' @rdname text_stats
 92 | #' @description Get text stats for selected text (excluding code chunks and inline code)
 93 | #'
 94 | #' @details Use this function with a character string as input
 95 | #'
 96 | #' @export
 97 | text_stats_chr <- function(text) {
 98 | 
 99 |   text <- paste(text, collapse="\n")
100 | 
101 |   text_stats_fn(text)
102 | 
103 | }
104 | 
105 | 
106 | #' @rdname text_stats
107 | #' @description Get readability stats for selected text (excluding code chunks)
108 | #'
109 | #' @details Use this function with a character string as input
110 | #'
111 | #' @param text A character vector of text; its elements are collapsed into a single string (separated by newlines) before counting
112 | #'
113 | #' @export
114 | readability_chr <- function(text, quiet = TRUE) {
115 | 
116 |   text <- paste(text, collapse = "\n")
117 | 
118 |   readability_fn(text, quiet = quiet)
119 | 
120 | }
121 | #-----------------------------------------------------------
122 | # helper fns, not exported
123 | 
124 | text_to_count <- function(filename){
125 |   # selected text takes precedence over the filename argument:
126 |   # if text is selected, it is used. Otherwise, the text in filename is used
127 |   if (rstudioapi::isAvailable()) {
128 |     context <- rstudioapi::getActiveDocumentContext()
129 |     selection_text <- unname(unlist(context$selection)["text"])
130 |     text_is_selected <- nchar(selection_text) > 0
131 |   } else {
132 |     # if not running in RStudio, assume no text is selected
133 |     text_is_selected <- FALSE
134 |   }
135 | 
136 |   if (text_is_selected) {
137 |     text <- selection_text
138 |   } else {
139 |     # if no text is selected, read text from "filename" as character vector
140 |     is_extension_invalid <- !grepl(md_file_ext_regex, filename)
141 |     if (is_extension_invalid) {
142 |       stop(paste("The supplied file has an extension which is not associated with markdown.",
143 |                  "This function only works with markdown or R markdown files.", sep = "\n  "))
144 |     }
145 |     text <- paste(scan(filename, 'character', quiet = TRUE), collapse = " ")
146 |   }
147 |   text
148 | }
149 | 
150 | prep_text <- function(text){
151 | 
152 |   # remove all line breaks, http://stackoverflow.com/a/21781150/1036500
153 |   text <- gsub("[\r\n]", " ", text)
154 | 
155 |   # don't include yaml front matter
156 |   three_dashes <- unlist(gregexpr('---', text))
157 |   if (three_dashes[1]==1L) {
158 |     yaml_end <- three_dashes[2] + 2L
159 |     text <- substr(text, yaml_end + 1L, nchar(text))
160 |   } else {
161 |     text
162 |   }
163 | 
164 |   # don't include text in code chunks: https://regex101.com/#python
165 |   text <- gsub("```\\{.+?\\}.+?```", "", text)
166 | 
167 |   # don't include text in in-line R code
168 |   text <- gsub("`r.+?`", "", text)
169 | 
170 |   # don't include HTML comments
171 |   text <- gsub("<!--.+?-->", "", text)
172 | 
173 |   # don't include LaTeX comments
174 |   # how to do this? %%
175 | 
176 |   # don't include images with captions
177 |   text <- gsub("!\\[.+?\\]\\(.+?\\)\\{.+?\\}", "", text)
178 |   text <- gsub("!\\[.+?\\]\\(.+?\\)", "", text)
179 | 
180 |   # don't include inline markdown URLs
181 |   text <- gsub("\\(http.+?\\)", "", text)
182 | 
183 |   # don't include # for headings
184 |   text <- gsub("#*", "", text)
185 | 
186 |   # don't include opening html tags
187 |   # (source: https://www.w3schools.com/TAGS/default.ASP)
188 | 
189 |   tags <- paste0("!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|",
190 |                  "audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|",
191 |                  "canvas|caption|center|cite|code|col|colgroup|data|datalist|",
192 |                  "dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|",
193 |                  "figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|",
194 |                  "head|header|hr|html|i|iframe|img|input|ins|kbd|label|legend|",
195 |                  "li|link|main|map|mark|meta|meter|nav|noframes|noscript|",
196 |                  "object|ol|optgroup|option|output|p|param|picture|pre|",
197 |                  "progress|q|rp|rt|ruby|s|samp|script|section|select|small|",
198 |                  "source|span|strike|strong|style|sub|summary|sup|svg|table|",
199 |                  "tbody|td|template|textarea|tfoot|th|thead|time|title|tr|",
200 |                  "track|tt|u|ul|var|video|wbr")
201 | 
202 |   text <- gsub(paste0("<\\s*(",tags,")[^>]*>"),"", text)
203 | 
204 |   # don't include closing html tags
205 |   text <- gsub("</.+?>", "", text)
206 | 
207 |   # don't include greater/less than signs because they trip up koRpus
208 |   text <- gsub("<|>", "", text)
209 | 
210 |   # don't include percent signs because they trip up stringi
211 |   text <- gsub("%", "", text)
212 | 
213 |   # don't include figures and tables inserted using plain LaTeX code
214 |   text <- gsub("\\\\begin\\{figure\\}(.*?)\\\\end\\{figure\\}", "", text)
215 |   text <- gsub("\\\\begin\\{table\\}(.*?)\\\\end\\{table\\}", "", text)
216 | 
217 |   # don't count abbreviations as multiple words, but leave
218 |   # the period at the end in case it's the end of a sentence
219 |   text <- gsub("\\.(?=[a-z]+)", "", text, perl = TRUE)
220 | 
221 |   # don't include LaTeX \eggs{ham}
222 |   # how to do? problem with capturing \x
223 | 
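224 |   # remove Pandoc fenced-div markers (::: plus any {attributes}); line breaks were removed above, so a pattern like :::.* would delete the rest of the document
225 |   text <- gsub(":{3,}\\s*(\\{[^}]*\\})?", "", text)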
226 |  
227 | 
228 | 
229 |   if(nchar(text) == 0){
230 |     stop("You have not selected any text. Please select some text with the mouse and try again")
231 |   }
232 | 
233 |   return(text)
234 | 
235 | }
236 | 
237 | prep_text_korpus <- function(text){
238 |   lengths <- unlist(strsplit(text, " "))
239 |   no_long_one <- paste0(ifelse(nchar(lengths) > 30, substr(lengths, 1, 10), lengths), collapse = " ")
240 |   tokenize_safe <- purrr::safely(koRpus::tokenize)
241 |   k1 <- tokenize_safe(no_long_one, lang = 'en', format = 'obj')
242 |   k1 <- k1$result
243 |   return(k1)
244 | }
245 | 
246 | 
247 | # These functions do the actual work
248 | 
249 | #' @rdname text_stats
250 | #' @export
251 | text_stats_fn_ <- function(text){
252 |   # suppress warnings; on.exit() restores the previous setting, even on error
253 |   oldw <- options(warn = -1)
254 |   on.exit(options(oldw), add = TRUE)
255 | 
256 |   text <- prep_text(text)
257 | 
258 |   require("koRpus.lang.en", quietly = TRUE)
259 | 
260 |   # stringi methods
261 |   n_char_tot <- sum(stri_stats_latex(text)[c(1,3)])
262 |   n_words_stri <- unname(stri_stats_latex(text)[4])
263 | 
264 |   # koRpus methods
265 |   k1 <- prep_text_korpus(text)
266 |   korpus_stats <- sylly::describe(k1)
267 |   k_nchr <- korpus_stats$all.chars
268 |   k_wc <- korpus_stats$words
269 |   k_sent <- korpus_stats$sentences
270 |   k_wps <- k_wc / k_sent
271 | 
272 |   # reading time
273 |   # https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_comprehension
274 |   # assume 200 words per min
275 |   wpm <- 200
276 |   reading_time_korp <- paste0(round(k_wc / wpm, 1), " minutes")
277 |   reading_time_stri <- paste0(round(n_words_stri / wpm, 1), " minutes")
278 | 
279 |   stats <- list(
280 |     # make the names more useful
281 |     n_char_tot_stri = n_char_tot,
282 |     n_char_tot_korp = k_nchr,
283 |     n_words_korp = k_wc,
284 |     n_words_stri = n_words_stri,
285 |     n_sentences_korp = k_sent,
286 |     words_per_sentence_korp = k_wps,
287 |     reading_time_korp = reading_time_korp,
288 |     reading_time_stri = reading_time_stri
289 |   )
290 | 
291 |   # the warning level is restored by on.exit() once this value is returned
292 |   stats
293 | 
294 | }
295 | 
296 | 
297 | 
298 | text_stats_fn <- function(text){
299 | 
300 |   l <- text_stats_fn_(text)
301 | 
302 |   results_df <- data.frame(Method = c("Word count", "Character count", "Sentence count", "Reading time"),
303 |                            koRpus  = c(l$n_words_korp, l$n_char_tot_korp, l$n_sentences_korp, l$reading_time_korp),
304 |                            stringi = c(l$n_words_stri, l$n_char_tot_stri, "Not available", l$reading_time_stri)
305 |                            )
306 | 
307 |   results_df_tab <- knitr::kable(results_df)
308 |   return(results_df_tab)
309 | 
310 | }
311 | 
312 | 
313 | readability_fn_ <- function(text, quiet = TRUE){
314 | 
315 |   text <- prep_text(text)
316 | 
317 |   # suppress warnings; on.exit() restores the previous setting, even on error
318 |   oldw <- options(warn = -1)
319 |   on.exit(options(oldw), add = TRUE)
320 | 
321 |   require("koRpus.lang.en", quietly = TRUE)
322 | 
323 |   # koRpus methods
324 |   k1 <- prep_text_korpus(text)
325 |   k_readability <- koRpus::readability(k1, quiet = quiet)
326 | 
327 |   # the warning level is restored by on.exit() once this value
328 |   # is returned
329 |   k_readability
330 | }
331 | 
332 | 
333 | readability_fn <- function(text, quiet = TRUE){
334 |   # a more condensed overview of the results
335 |   k_readability <- readability_fn_(text, quiet = quiet)
336 |   readability_summary_table <- knitr::kable(summary(k_readability))
337 |   return(readability_summary_table)
338 | 
339 | }
340 | 
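The cleaning that `prep_text()` performs before any counting can be tried in isolation with a small, self-contained sketch. It reproduces only a few of the patterns above (YAML front matter, `{r}` code chunks, inline R code and HTML comments) and uses a hypothetical helper, `strip_non_prose()`, so it runs without wordcountaddin, koRpus or RStudio.

```r
# A minimal sketch of some of the cleaning steps in prep_text().
# strip_non_prose() is a hypothetical helper for illustration,
# not a function exported by wordcountaddin.
strip_non_prose <- function(text) {
  # join lines so each pattern can match across the whole document
  text <- gsub("[\r\n]", " ", text)

  # drop YAML front matter: everything up to and including the second "---"
  # (the length check is an extra guard not present in prep_text())
  dashes <- unlist(gregexpr("---", text, fixed = TRUE))
  if (dashes[1] == 1L && length(dashes) >= 2L) {
    text <- substr(text, dashes[2] + 3L, nchar(text))
  }

  text <- gsub("```\\{.+?\\}.+?```", "", text)  # code chunks with {r} headers
  text <- gsub("`r.+?`", "", text)              # inline R code
  text <- gsub("<!--.+?-->", "", text)          # HTML comments

  # collapse the runs of spaces left behind by the deletions
  trimws(gsub("\\s+", " ", text))
}

rmd <- "---\ntitle: Test\n---\nHello world.\n```{r}\nx <- 1\n```\nValue is `r 1+1` here. <!-- hidden -->"
strip_non_prose(rmd)
#> [1] "Hello world. Value is here."
```

In the package itself, the cleaned string is then handed to `stringi::stri_stats_latex()` and to koRpus, which produce the two columns of the word count table.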


--------------------------------------------------------------------------------
/R/utils.R:
--------------------------------------------------------------------------------
 1 | # Get the filename of the current file, or
 2 | # the file being rendered
 3 | 
 4 | this_filename <- function() {
 5 |   if (interactive() && rstudioapi::isAvailable()) {
 6 |     filename <- rstudioapi::getSourceEditorContext()$path
 7 |   } else {
 8 |     filename <- knitr::current_input()
 9 |   }
10 |   return(fs::path(filename))
11 | }
12 | 


--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | output:
  3 |   md_document:
  4 |     variant: markdown_github
  5 | ---
  6 | 
  7 | 
  8 | 
  9 | <!-- README.md is generated from README.Rmd. Please edit that file -->
 10 | 
 11 | ```{r, echo = FALSE}
 12 | knitr::opts_chunk$set(
 13 |   collapse = TRUE,
 14 |   comment = "#>",
 15 |   fig.path = "README-"
 16 | )
 17 | ```
 18 | 
 19 | 
 20 | # wordcountaddin <img src="inst/logo.png" align="right" height="130" />
 21 | 
 22 | [![Last-changedate](https://img.shields.io/badge/last%20change-`r gsub('-', '--', Sys.Date())`-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master) 
 23 | [![minimal R version](https://img.shields.io/badge/R%3E%3D-`r as.character(getRversion())`-brightgreen.svg)](https://cran.r-project.org/)
 24 | [![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/) 
 25 | [![Travis-CI Build Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin) 
 26 | [![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master) [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531) 
 27 | 
 28 | 
 29 | 
 30 | 
 31 | This R package is an [RStudio addin](https://rstudio.github.io/rstudioaddins/) to count words and characters in the text of an [R markdown](http://rmarkdown.rstudio.com/) document. It also has a function to compute readability statistics so you can get an indication of how easy or difficult your document is to read.
 32 | 
 33 | You can count words in your Rmd file in three ways:
 34 | 
 35 | - In a selection of text in your active Rmd, by selecting some text with your mouse in RStudio and using the Wordcount Addin   
 36 | - All the words in your active Rmd in RStudio, by using the Wordcount Addin with no text selected
 37 | - All the words in an Rmd file, directly using the `word_count` function from the console or command line (RStudio not required), and specifying the filename as an argument to the function (e.g. `wordcountaddin::word_count("my_file.Rmd")`). This will give you a single integer result, rather than the Markdown table that the other functions return.
 38 | 
 39 | Independent of an Rmd file, you can also count words in a character vector from the console using the `text_stats_chr` function (and there is `readability_chr` for readability). 
 40 | 
 41 | ## Word count
 42 | 
 43 | When counting words in the text of your Rmd document, these things will be ignored:
 44 | 
 45 | - YAML front matter    
 46 | - code chunks and inline code
 47 | - text in HTML comment tags: `<!-- text -->` 
 48 | - HTML tags in the text: `<br>`,  `</br>`
 49 | - the URLs of inline links in this format: `[text of link](http://...)` (the link text itself is still counted)
 50 | - images with captions in this format: `![this is the caption](/path/to/image.png)`
 51 | - header level indicators such as `#` and `##`, etc.
 52 | 
 53 | And because my regex is quite simple, the word count function may also ignore parts of your actual text that resemble these things. 
 54 | 
 55 | The word count will include text in headers, block quotations, verbatim code blocks, tables, raw LaTeX (except `figure` and `table` environments) and raw HTML.
 56 | 
 57 | In general, there are numerous ways to count words, with no widely accepted standard method. The variety of methods is due to differences in the definitions of a word and a sentence. Run `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about the word counting methods.
 58 | 
 59 | For this addin I've included two methods, mostly out of curiosity to see how they differ from each other. I use functions from the [stringi](https://cran.r-project.org/web/packages/stringi/index.html) and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) packages. If you're curious, you can compare the results you get with this addin to an online tool such as <http://wordcounttools.com/>.
 60 | 
 61 | The output of the `Word count` function is a markdown table in your R console that might look like this:
 62 | 
 63 | ```
 64 | |Method          |koRpus      |stringi       |
 65 | |:---------------|:-----------|:-------------|
 66 | |Word count      |107         |104           |
 67 | |Character count |604         |603           |
 68 | |Sentence count  |10          |Not available |
 69 | |Reading time    |0.5 minutes |0.5 minutes   |
 70 | ```
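The reading time rows follow directly from the word counts: `text_stats_fn_()` divides each count by an assumed reading speed of 200 words per minute. A minimal sketch of that arithmetic, using a hypothetical `reading_time()` helper that is not part of the package:

```r
# Word count divided by an assumed 200 words per minute,
# rounded to one decimal place, as in text_stats_fn_().
# reading_time() is a hypothetical helper for illustration.
reading_time <- function(n_words, wpm = 200) {
  paste0(round(n_words / wpm, 1), " minutes")
}

reading_time(107)  # koRpus word count in the example table
#> [1] "0.5 minutes"
reading_time(104)  # stringi word count in the example table
#> [1] "0.5 minutes"
```

Because of the rounding, small differences between the koRpus and stringi word counts usually disappear in the reading time.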
 71 | 
 72 | If you want to reuse these results in other R functions, you can use the exported function `wordcountaddin::text_stats_fn_(text)`, where `text` is a character vector of your text (with length one, i.e., all your text in a single character string). The output will be a list object, and will include several other items not shown in the markdown table.
 73 | 
 74 | ## Readability 
 75 | 
 76 | The readability function ignores all the same parts of the text as the word count function, and then computes the values of a bunch of [readability statistics](https://en.wikipedia.org/wiki/Readability_test).
 77 | 
 78 | Most of these readability measurements aim to approximate the years of education required to understand your text. They look at the number of characters and syllables per word, the number of words per sentence, and so on. They don't analyse the meaning of the words. A score of around 10-12 is roughly the reading level on completion of high school in the US. These stats are computed by the [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html) package. 
 79 | 
 80 | There are about 27 measurements that this readability function returns (depending on how long your text is), including the Automated Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG). For the full list of measurements returned by the readability function, run `?koRpus::readability`. That help page also shows the formulae and citations for each statistic (and 20-odd additional readability statistics not used here). 
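To show how these formulas combine counts of characters, syllables, words and sentences, here are the standard published forms of two of them, the Flesch-Kincaid Grade Level and the ARI, written as small R functions. This is only a sketch. koRpus does its own counting (including syllables) and may apply different parameters or "flavours", so its results can differ.

```r
# Flesch-Kincaid Grade Level (standard form)
fk_grade <- function(words, sentences, syllables) {
  0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
}

# Automated Readability Index (standard form); characters = letters and digits
ari <- function(characters, words, sentences) {
  4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
}

# Hypothetical counts for a short passage
fk_grade(words = 100, sentences = 5, syllables = 140)  # 8.73
ari(characters = 450, words = 100, sentences = 5)      # 9.765
```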
 81 | 
 82 | Readability stats are, of course, no substitute for critical self-reflection on the effectiveness of your writing at communicating ideas and information. To help with that, read [_Style: Toward Clarity and Grace_](http://www.amazon.com/dp/0226899152).
 83 | 
 84 | 
 85 | The output of the `readability` function is a markdown table in your R console that might look like this:
 86 | 
 87 | ```
 88 | 
 89 | |index                 |flavour     |raw   |grade |age  |
 90 | |:---------------------|:-----------|:-----|:-----|:----|
 91 | |ARI                   |            |      |2.31  |     |
 92 | |Coleman-Liau          |            |66    |4.91  |     |
 93 | |Danielson-Bryan DB1   |            |6.46  |      |     |
 94 | |Danielson-Bryan DB2   |            |60.39 |6     |     |
 95 | |Dickes-Steiwer        |            |53.07 |      |     |
 96 | |ELF                   |            |1.83  |      |     |
 97 | |Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
 98 | |Flesch                |en (Flesch) |69.57 |8-9   |     |
 99 | |Flesch-Kincaid        |            |      |4.85  |9.8  |
100 | |FOG                   |            |      |7.84  |     |
101 | |FORCAST               |            |      |10.28 |15.3 |
102 | |Fucks                 |            |23.38 |4.83  |     |
103 | |Linsear-Write         |            |      |2.35  |     |
104 | |LIX                   |            |32.41 |< 5   |     |
105 | |nWS1                  |            |      |4.19  |     |
106 | |nWS2                  |            |      |4.72  |     |
107 | |nWS3                  |            |      |4.14  |     |
108 | |nWS4                  |            |      |3.64  |     |
109 | |RIX                   |            |1.42  |5     |     |
110 | |SMOG                  |            |      |8.08  |13.1 |
111 | |Strain                |            |2.44  |      |     |
112 | |TRI                   |            |-94   |      |     |
113 | |Tuldava               |            |2.57  |      |     |
114 | |Wheeler-Smith         |            |18.33 |2     |     |
115 | ```
116 | 
117 | As with the `Word count` addin, if you want to reuse these results in other R functions, you can use an unexported function such as `wordcountaddin:::readability_fn_(text)`, where `text` is a character vector of length one (i.e. all your text in a single character string). The output is a list object with slightly more detail than the summary table above. 
118 | 
119 | Inspiration for this addin came from [jadd](https://github.com/jennybc/jadd) and [WrapRmd](https://github.com/tjmahr/WrapRmd). 
120 | 
121 | ## How to install
122 | 
123 | Install with `devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)`.
124 | 
125 | Go to `Tools > Addins` in RStudio to select and configure addins. 
126 | 
127 | ## How to use
128 | 
129 | 1. Open an Rmd file in RStudio.  
130 | 2. Select some text (or nothing, to use the whole document); the selection can include YAML, code chunks and inline code.  
131 | 3. Go to `Tools > Addins` in RStudio and click on `Word count` or `Readability`. Computing `Readability` may take a few moments on longer documents because it has to count syllables for some of the stats.
132 | 4. Look in the console for the output.  
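The same counts are also available outside the addin. As a sketch, assuming wordcountaddin is installed: `word_count()` takes a file name and returns a single integer, and `text_stats()` returns the Markdown table shown above. The temporary file stands in for your own Rmd.

```r
# Stand-in for your own Rmd file
my_rmd <- tempfile(fileext = ".Rmd")
writeLines(c("---", "title: 'Draft'", "---", "", "Just a few words here."), my_rmd)

if (requireNamespace("wordcountaddin", quietly = TRUE)) {
  # A single integer
  print(wordcountaddin::word_count(my_rmd))
  # The Markdown table that the Word count addin prints
  print(wordcountaddin::text_stats(my_rmd))
}
```

According to the function documentation, the default `filename` is the file being knit. This means inline R code that calls `wordcountaddin::word_count()` with no arguments can report a document's length inside the document itself.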
133 | 
134 | 
135 | ## Feedback, contributing, etc.
136 | 
137 | Please [open an issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you find something that doesn't work as expected. Note that this project is released with a [Guide to Contributing](CONTRIBUTING.md) and a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.
138 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <!-- README.md is generated from README.Rmd. Please edit that file -->
  2 | wordcountaddin <img src="inst/logo.png" align="right" height="130" />
  3 | =====================================================================
  4 | 
  5 | [![Last-changedate](https://img.shields.io/badge/last%20change-2019--01--09-brightgreen.svg)](https://github.com/benmarwick/wordcountaddin/commits/master)
  6 | [![minimal R
  7 | version](https://img.shields.io/badge/R%3E%3D-3.5.2-brightgreen.svg)](https://cran.r-project.org/)
  8 | [![Licence](https://img.shields.io/github/license/mashape/apistatus.svg)](http://choosealicense.com/licenses/mit/)
  9 | [![Travis-CI Build
 10 | Status](https://travis-ci.org/benmarwick/wordcountaddin.png?branch=master)](https://travis-ci.org/benmarwick/wordcountaddin)
 11 | [![codecov.io](https://codecov.io/github/benmarwick/wordcountaddin/coverage.svg?branch=master)](https://codecov.io/github/benmarwick/wordcountaddin?branch=master)
 12 | [![ORCiD](https://img.shields.io/badge/ORCiD-0000--0001--7879--4531-green.svg)](http://orcid.org/0000-0001-7879-4531)
 13 | 
 14 | This R package is an [RStudio
 15 | addin](https://rstudio.github.io/rstudioaddins/) to count words and
 16 | characters in text in an [R markdown](http://rmarkdown.rstudio.com/)
 17 | document. It also has a function to compute readability statistics so
 18 | you can get an indication of how easy or difficult your document is to
 19 | read.
 20 | 
 21 | You can count words in your Rmd file in three ways:
 22 | 
 23 | -   In a selection of text in your active Rmd, by selecting some text
 24 |     with your mouse in RStudio and using the Wordcount Addin  
 25 | -   All the words in your active Rmd in RStudio, by using the Wordcount
 26 |     Addin with no text selected
 27 | -   All the words in an Rmd file, directly using the `word_count`
 28 |     function from the console or command line (RStudio not required),
 29 |     and specifying the filename as an argument to the function (e.g.
 30 |     `wordcountaddin::word_count("my_file.Rmd")`). This will give you a
 31 |     single integer result, rather than the Markdown table that the other
 32 |     functions return.
 33 | 
 34 | Independent of an Rmd file, you can also count words in a character
 35 | vector from the console using the `text_stats_chr` function (and there
 36 | is `readability_chr` for readability).
 37 | 
 38 | Word count
 39 | ----------
 40 | 
 41 | When counting words in the text of your Rmd document, these things will
 42 | be ignored:
 43 | 
 44 | -   YAML front matter  
 45 | -   code chunks and inline code
 46 | -   text in HTML comment tags: `<!-- text -->`
 47 | -   HTML tags in the text: `<br>`, `</br>`
 48 | -   inline URLs in this format: `[text of link](url)`
 49 | -   images with captions in this format:
 50 |     `![this is the caption](/path/to/image.png)`
 51 | -   header level indicators such as `#` and `##`, etc.
 52 | 
 53 | And because my regex is quite simple, the word count function may also
 54 | ignore parts of your actual text that resemble these things.
 55 | 
 56 | The word count will include text in headers, block quotations, verbatim
 57 | code blocks, tables, raw LaTeX and raw HTML.
 58 | 
 59 | In general, there are numerous ways to count words, with no widely
 60 | accepted standard method. The variety of methods is due to differences
 61 | in the definitions of a word and a sentence. Run
 62 | `?stringi::stri_stats_latex` and `?koRpus::describe` to learn more about
 63 | the word counting methods.
 64 | 
 65 | For this addin I’ve included two methods, mostly out of curiosity to see
 66 | how they differ from each other. I use functions from the
 67 | [stringi](https://cran.r-project.org/web/packages/stringi/index.html)
 68 | and [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)
 69 | packages. If you’re curious, you can compare the results you get with
 70 | this addin to an online tool such as
 71 | <http://wordcounttools.com/>.
 72 | 
 73 | The output of the `Word count` function is a markdown table in your R
 74 | console that might look like this:
 75 | 
 76 |     |Method          |koRpus      |stringi       |
 77 |     |:---------------|:-----------|:-------------|
 78 |     |Word count      |107         |104           |
 79 |     |Character count |604         |603           |
 80 |     |Sentence count  |10          |Not available |
 81 |     |Reading time    |0.5 minutes |0.5 minutes   |
 82 | 
 83 | If you want to reuse these results in other R functions, you can use an
 84 | unexported function such as `wordcountaddin:::text_stats_fn_(text)`,
 85 | where `text` is a character vector of length one (i.e. all your
 86 | text in a single character string). The output is a list object,
 87 | and it includes several other items not shown in the Markdown
 88 | table.
 89 | 
 90 | Readability
 91 | -----------
 92 | 
 93 | The readability function ignores all the same parts of the text as the
 94 | word count function, and then computes the values of a bunch of
 95 | [readability
 96 | statistics](https://en.wikipedia.org/wiki/Readability_test).
 97 | 
 98 | Most of these readability measurements aim to approximate the years of
 99 | education required to understand your text. They look at the number of
100 | characters and syllables per word, the number of words per sentence, and
101 | so on. They don’t analyse the meaning of the words. A score of around
102 | 10-12 is roughly the reading level on completion of high school in the
103 | US. These stats are computed by the
104 | [koRpus](https://cran.r-project.org/web/packages/koRpus/index.html)
105 | package.
106 | 
107 | There are about 27 measurements that this readability function returns
108 | (depending on how long your text is), including the Automated
109 | Readability Index (ARI), Coleman-Liau, the Flesch-Kincaid Grade Level,
110 | and the Simple Measure of Gobbledygook (SMOG). For the full list of
111 | readability measurements that are returned by the readability function,
112 | run `?koRpus::readability`. That help page also shows the formulae and
113 | citations for each statistic (and an additional 20-odd other readability
114 | statistics not used here).
115 | 
116 | Readability stats are, of course, no substitute for critical
117 | self-reflection on the effectiveness of your writing at communicating
118 | ideas and information. To help with that, read [*Style: Toward Clarity
119 | and Grace*](http://www.amazon.com/dp/0226899152).
120 | 
121 | The output of the `readability` function is a markdown table in your R
122 | console that might look like this:
123 | 
124 | 
125 |     |index                 |flavour     |raw   |grade |age  |
126 |     |:---------------------|:-----------|:-----|:-----|:----|
127 |     |ARI                   |            |      |2.31  |     |
128 |     |Coleman-Liau          |            |66    |4.91  |     |
129 |     |Danielson-Bryan DB1   |            |6.46  |      |     |
130 |     |Danielson-Bryan DB2   |            |60.39 |6     |     |
131 |     |Dickes-Steiwer        |            |53.07 |      |     |
132 |     |ELF                   |            |1.83  |      |     |
133 |     |Farr-Jenkins-Paterson |            |66.81 |8-9   |     |
134 |     |Flesch                |en (Flesch) |69.57 |8-9   |     |
135 |     |Flesch-Kincaid        |            |      |4.85  |9.8  |
136 |     |FOG                   |            |      |7.84  |     |
137 |     |FORCAST               |            |      |10.28 |15.3 |
138 |     |Fucks                 |            |23.38 |4.83  |     |
139 |     |Linsear-Write         |            |      |2.35  |     |
140 |     |LIX                   |            |32.41 |< 5   |     |
141 |     |nWS1                  |            |      |4.19  |     |
142 |     |nWS2                  |            |      |4.72  |     |
143 |     |nWS3                  |            |      |4.14  |     |
144 |     |nWS4                  |            |      |3.64  |     |
145 |     |RIX                   |            |1.42  |5     |     |
146 |     |SMOG                  |            |      |8.08  |13.1 |
147 |     |Strain                |            |2.44  |      |     |
148 |     |TRI                   |            |-94   |      |     |
149 |     |Tuldava               |            |2.57  |      |     |
150 |     |Wheeler-Smith         |            |18.33 |2     |     |
151 | 
152 | As with the `Word count` addin, if you want to reuse these results
153 | in other R functions, you can use an unexported function such as
154 | `wordcountaddin:::readability_fn_(text)`, where `text` is a character
155 | vector of length one (i.e. all your text in a single character
156 | string). The output is a list object with slightly more detail than
157 | the summary table above.
158 | 
159 | Inspiration for this addin came from
160 | [jadd](https://github.com/jennybc/jadd) and
161 | [WrapRmd](https://github.com/tjmahr/WrapRmd).
162 | 
163 | How to install
164 | --------------
165 | 
166 | Install with
167 | `devtools::install_github("benmarwick/wordcountaddin", type = "source", dependencies = TRUE)`.
168 | 
169 | Go to `Tools > Addins` in RStudio to select and configure addins.
170 | 
171 | How to use
172 | ----------
173 | 
174 | 1.  Open an Rmd file in RStudio.  
175 | 2.  Select some text (or nothing, to use the whole document); the selection can include YAML, code chunks and inline code.  
176 | 3.  Go to `Tools > Addins` in RStudio and click on `Word count` or
177 |     `Readability`. Computing `Readability` may take a few moments on
178 |     longer documents because it has to count syllables for some of the
179 |     stats.
180 | 4.  Look in the console for the output.
181 | 
182 | Feedback, contributing, etc.
183 | ----------------------------
184 | 
185 | Please [open an
186 | issue](https://github.com/benmarwick/wordcountaddin/issues/new) if you
187 | find something that doesn’t work as expected. Note that this project is
188 | released with a [Guide to Contributing](CONTRIBUTING.md) and a
189 | [Contributor Code of Conduct](CONDUCT.md). By participating in this
190 | project you agree to abide by its terms.
191 | 


--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
1 | comment: false
2 | 


--------------------------------------------------------------------------------
/inst/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/inst/logo.png


--------------------------------------------------------------------------------
/inst/rstudio/addins.dcf:
--------------------------------------------------------------------------------
 1 | Name: Word count
 2 | Description: Counts words and characters (excluding code chunks, inline code, etc.)
 3 | Binding: text_stats
 4 | Interactive: true
 5 | 
 6 | Name: Readability
 7 | Description: Computes readability statistics (excluding code chunks, inline code, etc.)
 8 | Binding: readability
 9 | Interactive: true
10 | 


--------------------------------------------------------------------------------
/man/text_stats.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/hello.R
 3 | \name{text_stats}
 4 | \alias{text_stats}
 5 | \alias{word_count}
 6 | \alias{readability}
 7 | \alias{text_stats_chr}
 8 | \alias{readability_chr}
 9 | \alias{text_stats_fn_}
10 | \title{Get text stats for selected text (excluding code chunks and inline code)}
11 | \usage{
12 | text_stats(filename = this_filename())
13 | 
14 | word_count(filename = this_filename())
15 | 
16 | readability(filename = this_filename(), quiet = TRUE)
17 | 
18 | text_stats_chr(text)
19 | 
20 | readability_chr(text, quiet = TRUE)
21 | 
22 | text_stats_fn_(text)
23 | }
24 | \arguments{
25 | \item{filename}{Path to the file on which to compute text stats.
26 | Default is the current file (when working in RStudio) or the file being
27 | knit (when compiling with \code{knitr}).}
28 | 
29 | \item{quiet}{Logical. Should task be performed quietly?}
30 | 
31 | \item{text}{a character string of text, length of one}
32 | }
33 | \description{
34 | Call this addin to get a word count and some other stats about the text
35 | 
36 | Get a word count as a single integer
37 | 
38 | Get readability stats for selected text (excluding code chunks)
39 | 
40 | Get text stats for selected text (excluding code chunks and inline code)
41 | 
42 | Get readability stats for selected text (excluding code chunks)
43 | }
44 | \details{
45 | Call this addin to get readbility stats about the text
46 | 
47 | Use this function with a character string as input
48 | 
49 | Use this function with a character string as input
50 | }
51 | \examples{
52 | md <- system.file(package = "wordcountaddin", "NEWS.md")
53 | text_stats(md)
54 | word_count(md)
55 | \dontrun{
56 | readability(md)
57 | }
58 | }
59 | 


--------------------------------------------------------------------------------
/man/wordcountaddin.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/hello.R
 3 | \docType{package}
 4 | \name{wordcountaddin}
 5 | \alias{wordcountaddin}
 6 | \title{wordcountaddin}
 7 | \description{
 8 | This package is an addin for RStudio that will count the words and characters in a plain text document. It is designed for use with R markdown documents and will exclude YAML header content, code chunks and inline code from the counts. It also computes readability statistics so you can get an idea of how easy or difficult your text is to read.
 9 | }
10 | 


--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
1 | library(testthat)
2 | library(wordcountaddin)
3 | 
4 | test_check("wordcountaddin")
5 | 


--------------------------------------------------------------------------------
/tests/testthat/test_wordcountaddin.R:
--------------------------------------------------------------------------------
  1 | library(wordcountaddin)
  2 | 
  3 | context("Word count")
  4 | 
  5 | test_that("Word count is correct for short simple sentence", {
  6 |   # short sentence
  7 |   eleven_words <- "here are exactly eleven words of fairly boring and unpunctuated text"
  8 | 
  9 |   short_stats <-  text_stats_fn_(eleven_words)
 10 |   # qdap cannot manage without final punct.
 11 |   n_words_stri_11 <-short_stats$n_words_stri
 12 |   n_words_korp_11 <- short_stats$n_words_korp
 13 | 
 14 |   n_char_tot_stri_11 <-  short_stats$n_char_tot_stri
 15 |   n_char_tot_korp_11 <- short_stats$n_char_tot_korp
 16 | 
 17 |   expect_equal(n_words_stri_11, 11)
 18 |   expect_equal(n_words_korp_11, 11)
 19 |   expect_equal(n_char_tot_stri_11, 68)
 20 |   expect_equal(n_char_tot_korp_11, 69)
 21 | })
 22 | 
 23 | test_that("Word count is correct for moderately complex sentences", {
 24 |   # Moderate: Harvard sentences, https://en.wikipedia.org/wiki/Harvard_sentences
 25 |   moderately_complex <- "The birch canoe slid on the smooth planks. Glue the sheet to the dark blue background. It's easy to tell the depth of a well. These days a chicken leg is a rare dish. Rice is often served in round bowls. The juice of lemons makes fine punch. The box was thrown beside the parked truck. The hogs were fed chopped corn and garbage. Four hours of steady work faced us. Large size in stockings is hard to sell."
 26 | 
 27 |   moderately_complex_stats <- text_stats_fn_(moderately_complex)
 28 | 
 29 |   n_char_tot_stri_mc <-  moderately_complex_stats$n_char_tot_stri
 30 |   n_char_tot_korp_mc <- moderately_complex_stats$n_char_tot_korp
 31 | 
 32 |   n_words_stri_mc <- moderately_complex_stats$n_words_stri
 33 |   n_words_korp_mc <- moderately_complex_stats$n_words_korp
 34 | 
 35 |   n_sentences_korp_mc <- moderately_complex_stats$n_sentences_korp
 36 | 
 37 |   expect_equal(n_char_tot_stri_mc, 406)
 38 |   expect_equal(n_char_tot_korp_mc, 407)
 39 |   expect_equal(n_words_stri_mc, 80)  # MS Word says 79
 40 |   expect_equal(n_words_korp_mc, 80)
 41 |   expect_equal(n_sentences_korp_mc, 10)
 42 | })
 43 | 
 44 | 
 45 | 
 46 | test_that("Word count is correct for complex sentences in filler text", {
 47 |   # Filler text with various punctuation
 48 |   filler <- "Lorem ipsum dolor sit amet, ea debet error sensibus vix, at esse decore vivendo vim, rebum aliquip an cum? His ea agam novum dissentiet! At mel audire liberavisse, mundi audiam quaeque sea ne. In eam error habemus delectus, audiam ocurreret ne sit, sit ei salutandi liberavisse! Ut vix case corpora.
 49 | 
 50 | Posse malorum ponderum in qui, et eum dicam disputando, an vix quaestio scripserit. Falli veniam tamquam id mei. Modo sumo appetere cu mea, mutat possim rationibus ius id. Sed nominati antiopam cu, cu prima mandamus vim. Eos cu exerci consul!
 51 | 
 52 | Nam case atomorum suavitate cu? No quo inermis necessitatibus, eos ne essent scripta vivendum, ea euismod quaestio qui? Per minim tation accusamus eu, audire dolores nam an. Vel vocent inimicus ut, eu porro libris argumentum quo.
 53 | 
 54 | Vim no solet tempor, aperiam habemus assueverit ea usu: sea ut quodsi gloriatur! Eum te laudem aliquid inciderint, mollis prodesset mea ad. Dico definiebas efficiendi id usu. No bonorum suavitate adolescens per, ius oratio pericula ut, at mel porro vocibus scriptorem. Sea incorrupte definitiones necessitatibus in, cu ancillae conclusionemque duo. Ex vix dolore propriae principes, ius in augue ludus?
 55 | 
 56 | Solet copiosae ea sed, at assum  - dolore delenit has, ex aperiam honestatis mei. No legere nemore nonumes mel. Eu ullum accusata nam, an sea wisi rebum. Ei homero equidem sea! Sed erat augue eripuit et, ea vim altera eirmod labores, ad noster veritus nec.
 57 | 
 58 | Ut porro sententiae vis, debet affert eligendi id eam! In, nominati, pertinacia has, sea admodum dissentiunt eu! Volumus appellantur ex eos. Ei duo movet scripta aliquid, ea blandit explicari consectetuer eos.
 59 | 
 60 | Ne cibo ornatus vituperata pri. Soleat populo fierent ne sed, vel congue consequat temporibus in. Pro eu nostro inermis sadipscing, ne pri possim lobortis! Sea sonet nihil accusata no. Mei virtute noluisse pericula ex, aliquid mandamus inimicus quo ex.
 61 | 
 62 | Esse patrioque at qui, cum sanctus; consequuntur conclusionemque cu? Ut summo oportere  appellantur mel, ex per tale semper appellantur. Usu ea alia insolens sadipscing, eu aeterno persius vix. Agam prodesset interpretaris at ius, ne est malis signiferumque, illum soluta albucius mei an. Ex error tollit recusabo est, ut prompta consectetuer per. Dicam numquam eum id, brute mollis nam cu!
 63 | 
 64 | Ei vis discere interesset! Mutat 'option' qualisque ius te, sea deserunt lobortis voluptatum at. Qui et impedit accumsan atomorum, nam dicat possit ornatus an? Eu mei aperiri discere, sea veri homero ad, stet dolore putant mei in. Eu pri debet populo luptatum, eos te nominati concludaturque.
 65 | 
 66 | Tota veritus similique ne per, eam fastidii voluptatum eu. Sea tale mandamus suscipiantur ex. Ullum ullamcorper consequuntur et cum, aeque fuisset ut sea! Mea graecis pertinax explicari ne, pri tale hinc no? Eu vidisse nominati eum, et eam hendrerit voluptatum assueverit, qui ne munere recusabo democritum."
 67 | 
 68 |   filler_stats <- text_stats_fn_(filler)
 69 | 
 70 |   n_char_tot_stri_f <-  filler_stats$n_char_tot_stri
 71 |   n_char_tot_korp_f <- filler_stats$n_char_tot_korp
 72 | 
 73 |   n_words_stri_f <- filler_stats$n_words_stri
 74 |   n_words_korp_f <- filler_stats$n_words_korp
 75 | 
 76 |   n_sentences_korp_f <- filler_stats$n_sentences_korp
 77 | 
 78 |   expect_equal(n_char_tot_stri_f, 2896)
 79 |   expect_equal(n_char_tot_korp_f, 2897)
 80 |   expect_equal(n_words_stri_f, 450)
 81 |   expect_equal(n_words_korp_f, 450) # MS Word says 442
 82 |   expect_equal(n_sentences_korp_f, 52)
 83 | })
 84 | 
 85 | 
 86 | 
 87 | test_that("Word count is correct for rmd text", {
 88 |   # text with code chunks, etc.
 89 |   rmd_text <- "
 90 | 
 91 | ---
 92 | title: 'Untitled'
 93 | output: html_document
 94 | ---
 95 | 
 96 | ```{r setup, include=FALSE}
 97 | knitr::opts_chunk$set(echo = TRUE)
 98 | ```
 99 | 
100 | <!-- this is an HTML comment -->
101 | 
102 | ## Heading
103 | 
104 | This is an [R markdown](http://rmarkdown.rstudio.com/) document.
105 | 
106 | ```{r cars}
107 | summary(cars)
108 | # Lines line this have caused problems -----------------------------------------
109 | ```
110 | 
111 | `r 2+2`
112 | 
113 | `r nrow(cars)`
114 | 
115 | ##  Plots
116 | 
117 | You can also embed plots, for example:
118 | 
119 | ```{r pressure, echo=FALSE}
120 | plot(pressure)
121 | ```
122 | 
123 | ![this is the caption](/path/to/image.png)
124 | 
125 | "
126 | 
127 |   rmd_stats <- text_stats_fn_(rmd_text)
128 | 
129 |   n_char_tot_stri_r <-  rmd_stats$n_char_tot_stri
130 |   n_char_tot_korp_r <- rmd_stats$n_char_tot_korp
131 | 
132 |   n_words_stri_r <- rmd_stats$n_words_stri
133 |   n_words_korp_r <- rmd_stats$n_words_korp
134 | 
135 |   n_sentences_korp_r <- rmd_stats$n_sentences_korp
136 | 
137 |   expect_equal(n_char_tot_stri_r, 159)
138 |   expect_equal(n_char_tot_korp_r, 159)
139 |   expect_equal(n_words_stri_r, 20)
140 |   expect_equal(n_words_korp_r, 20)
141 |   expect_equal(n_sentences_korp_r, 4)
142 | })
143 | 
144 | test_that("we can ignore <br> and </br>", {
145 |   #  test for <br>
146 |   string_with_br <- "Hi, I have <br> in the </br> string"
147 | 
148 |   string_with_br_stats <- text_stats_fn_(string_with_br)
149 | 
150 |   n_char_tot_stri_r <-  string_with_br_stats$n_char_tot_stri
151 |   n_char_tot_korp_r <- string_with_br_stats$n_char_tot_korp
152 | 
153 |   n_words_stri_r <- string_with_br_stats$n_words_stri
154 |   n_words_korp_r <- string_with_br_stats$n_words_korp
155 | 
156 |   n_sentences_korp_r <- string_with_br_stats$n_sentences_korp
157 | 
158 |   expect_equal(n_char_tot_stri_r, 26)
159 |   expect_equal(n_char_tot_korp_r, 27)
160 |   expect_equal(n_words_stri_r, 6)
161 |   expect_equal(n_words_korp_r, 6)
162 |   expect_equal(n_sentences_korp_r, 0)
163 | })
164 | 
165 | test_that("we can ignore HTML tags but keep greater/less", {
166 |   string_gr_ls <- "Hi, <br> I am <20 but >10 years old"
167 | 
168 |   expect_equal(prep_text(string_gr_ls),
169 |                "Hi,  I am 20 but 10 years old")
170 | })
171 | 
172 | test_that("Word count is correct for rmd file", {
173 |   # test that we can word count on a file
174 |   the_rmd_file_stats <- text_stats(filename = test_path("test_wordcountaddin.Rmd"))
175 | 
176 |   expect_equal(the_rmd_file_stats[3],
177 |                "|Word count      |108         |107           |")
178 |   expect_equal(the_rmd_file_stats[4],
179 |                "|Character count |628         |628           |")
180 |   expect_equal(the_rmd_file_stats[5],
181 |                "|Sentence count  |9           |Not available |")
182 |   expect_equal(the_rmd_file_stats[6],
183 |                "|Reading time    |0.5 minutes |0.5 minutes   |")
184 | })
185 | 
186 | 
187 | test_that("Word count is correct for cmd line", {
188 |   # command line fns
189 |   text_on_the_command_line <- "here is some text"
190 |   text_stats_chr_out <- text_stats_chr(text_on_the_command_line)
191 | 
192 |   expect_equal(text_stats_chr_out[3],
193 |                "|Word count      |4         |4             |")
194 |   expect_equal(text_stats_chr_out[4],
195 |                "|Character count |18        |17            |")
196 |   expect_equal(text_stats_chr_out[5],
197 |                "|Sentence count  |0         |Not available |")
198 |   expect_equal(text_stats_chr_out[6],
199 |                "|Reading time    |0 minutes |0 minutes     |")
200 | })
201 | 
202 | 
203 | test_that("readability is correct for cmd line", {
204 |   text_on_the_command_line <- "here is some text"
205 |   expect_output(
206 |     expect_warning(
207 |       readability_chr_out <- readability_chr(text_on_the_command_line)
208 |     )
209 |   )
210 |   expect_length(readability_chr_out, 26)
211 | })
212 | 
213 | test_that("Word count is correct for text with % sign", {
214 |   # test for escaping the percent sign in plain text
215 |   text_with_percent_sign <- "Here is some % text with percent % signs in it."
216 | 
217 |   text_stats_percent_chr_out <- text_stats_chr(text_with_percent_sign)
218 |   expect_equal(text_stats_percent_chr_out[3],
219 |                "|Word count      |9         |9             |")
220 | })
221 | 
222 | 
223 | test_that("Word count is correct for text with figures included using LaTeX code", {
224 |   # test that LaTeX figure environments are removed before counting
225 |   text_with_figures <- "One \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Two \\begin{figure} \\caption{text} \\label{text} \\includegraphics[width=\\textwidth]{figure.png} \\end{figure} Three"
226 | 
227 |   text_stats_percent_chr_out <- text_stats_chr(text_with_figures)
228 |   expect_equal(text_stats_percent_chr_out[3],
229 |                "|Word count      |3         |3             |")
230 | })
231 | 
232 | 
233 | test_that("Word count is a single integer for a Rmd file when using word_count", {
234 |   # test that we can word count on a file
235 |   the_rmd_word_count <- word_count(filename = test_path("test_wordcountaddin.Rmd"))
236 | 
237 |   expect_equal(the_rmd_word_count,
238 |                108L)
239 | })
240 | 
241 | test_that("We can handle very long strings, like citation keys", {
242 | 
243 |   expect_output(
244 |     expect_warning(
245 |   # test that a very long token (e.g. a citation key) is handled
246 |  long_string_read <- readability_chr("it's a long string right at the end here because a tiny
247 |                                      fraction of the refreneces have crazy long keys. Why do they do
248 |                                      that? It's autogenerated. Why does this give so many warnings when
249 |                                      testing. It's a puzzle. [@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa].",
250 |                                      quiet = TRUE)))
251 | 
252 |  expect_equal( attr(long_string_read, 'format'), "markdown")
253 | 
254 | })
255 | 
256 | test_that("don't count abbreviations as multiple words", {
257 | 
258 | 
259 |   # test that dotted abbreviations are not split into multiple words
260 |   words_with_abbv <- "zero .o.n.e .t.wo."
261 |   abbrev_count <- text_stats_chr(words_with_abbv)
262 | 
263 |   expect_equal(abbrev_count[3], "|Word count      |3         |3             |")
264 | 
265 | })
266 | 
267 | test_that("text_to_count reads file contents as character vector", {
268 |   contents <- text_to_count(test_path("test_wordcountaddin.Rmd"))
269 | 
270 |   expect_type(contents, "character")
271 |   expect_length(contents, 1)
272 | })
273 | 
274 | test_that("text_to_count raises an error for invalid file types", {
275 |   expect_error(text_to_count("invalid.tif"), regexp = "works with markdown")
276 | })
277 | 
278 | 


--------------------------------------------------------------------------------
/tests/testthat/test_wordcountaddin.Rmd:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: "rmd_test_file.rmd"
 3 | output:
 4 |   word_document: default
 5 |   html_document: default
 6 | ---
 7 | 
 8 | ```{r setup, include=FALSE}
 9 | knitr::opts_chunk$set(echo = TRUE)
10 | ```
11 | 
12 | ## R Markdown
13 | 
14 | This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
15 | 
16 | When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
17 | 
18 | ```{r cars}
19 | summary(cars)
20 | 
21 | # Lines line this have caused problems -----------------------------------------
22 | ```
23 | 
24 | ## Including Plots
25 | 
26 | You can also embed plots, for example:
27 | 
28 | ```{r pressure, echo=FALSE}
29 | plot(pressure)
30 | ```
31 | 
32 | Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
33 | 
34 | ```{r}
35 | # context <- rstudioapi::getActiveDocumentContext()
36 | ```
37 | 
38 | This Markdown file contains `r wordcountaddin::word_count()` words:
39 | 
40 | ```{r, message=FALSE, echo=FALSE, error=TRUE}
41 | wordcountaddin::text_stats()
42 | ```
43 | 
44 | 
45 | ::: {.cell layout-align="center"}
46 | 
47 | :::
48 | 
49 | ::: {.cell layout-align="center"}
50 | ::: {.cell-output-display}
51 | :::
52 | :::
53 | 


--------------------------------------------------------------------------------
/tests/testthat/test_wordcountaddin.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/wordcountaddin/13bf891f11322c73919be59ed797cf201e725cac/tests/testthat/test_wordcountaddin.docx


--------------------------------------------------------------------------------
/wordcountaddin.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: knitr
13 | LaTeX: XeLaTeX
14 | 
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 | 
18 | BuildType: Package
19 | PackageUseDevtools: Yes
20 | PackageInstallArgs: --no-multiarch --with-keep.source
21 | PackageRoxygenize: rd,collate,namespace
22 | 


--------------------------------------------------------------------------------