vignettes/youtubecaption_vignette.Rmd
youtubecaption_vignette.Rmd
Although some R packages are tailored to the YouTube API (e.g., ‘tuber’), downloading a YouTube video's subtitles (i.e., captions) in a tidy form has never been easy. Using the ‘youtube-transcript-api’ Python package under the hood, this R package gives users a convenient way to parse a YouTube caption and convert it into a handy tibble (data frame) object. Users can also save the caption data as a tidy Excel file without an advanced programming background.
youtubecaption requires an Anaconda Python environment on your system PATH. If you have not installed a Conda environment on your system, please download and install Anaconda (Python 3.6 or later is recommended).
For this package, I have brought the youtube-transcript-api Python module into R using reticulate.
You can install the latest development version as follows:
    if (!require(remotes)) {
      install.packages("remotes")
    }

    remotes::install_github("jooyoungseo/youtubecaption")
You can install the released version of youtubecaption from CRAN with:
    install.packages('youtubecaption')
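Because the package depends on a Conda environment and a Python module, it can help to check those prerequisites before the first call. The following is only a sketch: `check_python_setup()` is a hypothetical helper that is not part of youtubecaption, and it assumes that the Python import name of the youtube-transcript-api package is `youtube_transcript_api`.

```r
# Hypothetical helper (not part of youtubecaption): report whether the
# prerequisites described above appear to be in place.
check_python_setup <- function(module = "youtube_transcript_api") {
  # Is a conda executable visible on the system PATH?
  has_conda <- unname(nzchar(Sys.which("conda")))

  # Is the reticulate R package installed?
  has_reticulate <- requireNamespace("reticulate", quietly = TRUE)

  # Only ask Python for the module when conda and reticulate are present.
  has_module <- FALSE
  if (has_conda && has_reticulate) {
    has_module <- isTRUE(tryCatch(
      reticulate::py_module_available(module),
      error = function(e) FALSE
    ))
  }

  c(conda = has_conda, reticulate = has_reticulate, module = has_module)
}

print(check_python_setup())
```

If any element is `FALSE`, installing Anaconda or the missing component is probably the first thing to try.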
After loading the youtubecaption package, use the get_caption() function as shown below:
    library(youtubecaption)

    # Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
    url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
    caption <- get_caption(url)
    caption

    #> # A tibble: 1,420 x 5
    #>    segment_id text                                start duration vid
    #>         <int> <chr>                               <dbl>    <dbl> <chr>
    #>  1          1 thank you for coming to a meeting ~  7.13     8.32 cpbtcsGE0~
    #>  2          2 in regards to data science GUI with 10.7      8.44 cpbtcsGE0~
    #>  3          3 happy with chief data scientist in~ 15.4      7.11 cpbtcsGE0~
    #>  4          4 studio as well as the member of th~ 19.1      7.23 cpbtcsGE0~
    #>  5          5 Foundation and an attempt professo~ 22.6      6    cpbtcsGE0~
    #>  6          6 Stanford and at the University of   26.4      6.48 cpbtcsGE0~
    #>  7          7 Auckland he builds both computatio~ 28.6      7.17 cpbtcsGE0~
    #>  8          8 and cognitive tools to make data s~ 32.8      7.5  cpbtcsGE0~
    #>  9          9 easier faster and more times his w~ 35.7      7.01 cpbtcsGE0~
    #> 10         10 includes various packages as well ~ 40.4      6.21 cpbtcsGE0~
    #> # ... with 1,410 more rows

    # Save the caption as an Excel file and open it right away:
    get_caption(url = url, savexl = TRUE, openxl = TRUE)
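Once the caption is a data frame, ordinary R tools apply. The sketch below does not call get_caption(); it builds a small stand-in data frame with the same five columns, using the first three rows of the printed output (the full `vid` value is assumed to be the ID from the URL), then derives an end time per segment and a single transcript string.

```r
# Stand-in for the tibble returned by get_caption(); the start values are
# the rounded ones shown in the printed output, so this is illustrative only.
caption <- data.frame(
  segment_id = 1:3,
  text = c(
    "thank you for coming to a meeting today",
    "in regards to data science GUI with",
    "happy with chief data scientist in our"
  ),
  start = c(7.13, 10.7, 15.4),
  duration = c(8.32, 8.44, 7.11),
  vid = "cpbtcsGE0OA", # assumed: the video ID taken from the URL
  stringsAsFactors = FALSE
)

# End time of each segment, in the same unit as start and duration.
caption$end <- caption$start + caption$duration

# Collapse all segments into one transcript string.
transcript <- paste(caption$text, collapse = " ")

print(caption[, c("segment_id", "start", "end")])
print(transcript)
```

The same two steps should work unchanged on the real tibble, since a tibble is also a data frame.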
NEWS.md
Added the language argument for the get_caption() function (#1). You can now pass a two-character language code; it is set to “en” (English) by default.
A CITATION file has been added.
An issue where the openxl argument of get_caption() did not work properly has been resolved.
R/get_caption.R
get_caption.Rd
Use this function to download a YouTube video caption as a tidy tibble and, optionally, save it as an Excel file in your current working directory.
    get_caption(
      url = NULL,
      language = "en",
      savexl = FALSE,
      openxl = FALSE,
      path = getwd()
    )
Argument | Description |
---|---|
url | A string value for a single YouTube video link URL. A typical form should start with "https://www.youtube.com/watch?v=" followed by a unique video ID. |
language | A two-character language code for the video caption. Set to "en" (English) by default. You can change this to fit your needs (e.g., "ko" for Korean, "de" for German, etc.). |
savexl | A logical value that determines whether to save the obtained tidy YouTube caption data as an Excel file. The default is FALSE, which does not save a file. If set to TRUE, a file named "YouTube_caption_ |
openxl | A logical value that determines whether to open the saved YouTube_caption Excel file, if any, in your working directory. The default is FALSE. TRUE works only when the preceding argument (i.e., savexl) is set to TRUE. |
path | A character vector of full path names; the default corresponds to the working directory, getwd. Tilde expansion (see path.expand) is performed. Missing values will be ignored. |
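The url argument expects the "https://www.youtube.com/watch?v=" prefix followed by a video ID. As a minimal sketch of what that shape implies, the hypothetical helper below (`extract_video_id()` is an illustration, not a function exported by youtubecaption and not necessarily how the package parses URLs) checks the prefix and pulls out the ID.

```r
# Hypothetical helper: validate the documented URL shape and return the ID.
extract_video_id <- function(url) {
  prefix <- "https://www.youtube.com/watch?v="
  stopifnot(is.character(url), length(url) == 1L)

  if (!startsWith(url, prefix)) {
    stop("URL should start with ", prefix)
  }

  # Everything after the prefix, minus any further query parameters
  # such as "&t=30s".
  id <- substring(url, nchar(prefix) + 1L)
  sub("&.*$", "", id)
}

print(extract_video_id("https://www.youtube.com/watch?v=cpbtcsGE0OA"))
```

A check like this fails early with a readable message, before any request is made for a malformed link.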
A tibble (advanced data.frame) object containing the YouTube video caption is returned.
get_caption
See example below.
https://pypi.org/project/youtube-transcript-api/
    # \donttest{
    library(youtubecaption)
    # Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
    url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
    caption <- get_caption(url)
    #> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
    #> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
    caption
    #> # A tibble: 1,420 x 5
    #>    segment_id text                                     start duration vid
    #>         <int> <chr>                                    <dbl>    <dbl> <chr>
    #>  1          1 thank you for coming to a meeting today   7.13     8.32 cpbtcsGE0~
    #>  2          2 in regards to data science GUI with      10.7      8.44 cpbtcsGE0~
    #>  3          3 happy with chief data scientist in our   15.4      7.11 cpbtcsGE0~
    #>  4          4 studio as well as the member of the our  19.1      7.23 cpbtcsGE0~
    #>  5          5 Foundation and an attempt professor at   22.6      6    cpbtcsGE0~
    #>  6          6 Stanford and at the University of        26.4      6.48 cpbtcsGE0~
    #>  7          7 Auckland he builds both computational    28.6      7.17 cpbtcsGE0~
    #>  8          8 and cognitive tools to make data science 32.8      7.5  cpbtcsGE0~
    #>  9          9 easier faster and more times his work    35.7      7.01 cpbtcsGE0~
    #> 10         10 includes various packages as well as     40.4      6.21 cpbtcsGE0~
    #> # ... with 1,410 more rows
    # Save the caption as an Excel file and open it right away
    ## Changing path to temp for the demonstration purpose only:
    get_caption(url = url, savexl = TRUE, openxl = TRUE, path = tempdir())
    #> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
    #> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
    #> # A tibble: 1,420 x 5
    #>    segment_id text                                     start duration vid
    #>         <int> <chr>                                    <dbl>    <dbl> <chr>
    #>  1          1 thank you for coming to a meeting today   7.13     8.32 cpbtcsGE0~
    #>  2          2 in regards to data science GUI with      10.7      8.44 cpbtcsGE0~
    #>  3          3 happy with chief data scientist in our   15.4      7.11 cpbtcsGE0~
    #>  4          4 studio as well as the member of the our  19.1      7.23 cpbtcsGE0~
    #>  5          5 Foundation and an attempt professor at   22.6      6    cpbtcsGE0~
    #>  6          6 Stanford and at the University of        26.4      6.48 cpbtcsGE0~
    #>  7          7 Auckland he builds both computational    28.6      7.17 cpbtcsGE0~
    #>  8          8 and cognitive tools to make data science 32.8      7.5  cpbtcsGE0~
    #>  9          9 easier faster and more times his work    35.7      7.01 cpbtcsGE0~
    #> 10         10 includes various packages as well as     40.4      6.21 cpbtcsGE0~
    #> # ... with 1,410 more rows
    # }
See magrittr::%>% for details.
    lhs %>% rhs
R/youtubecaption-package.R
youtubecaption-package.Rd
Although some R packages are tailored to the YouTube API (e.g., 'tuber'), downloading a YouTube video's subtitles (i.e., captions) in a tidy form has never been easy. Using the 'youtube-transcript-api' Python package under the hood, this R package gives users a convenient way to parse a YouTube caption and convert it into a handy 'tibble' (data frame) object. Users can also save the caption data as a tidy Excel file without an advanced programming background.
Useful links:
Report bugs at https://github.com/jooyoungseo/youtubecaption/issues