├── vignettes ├── .gitignore └── genius_basics.Rmd ├── R ├── globals.R ├── utils-pipe.R ├── gen_album_url.R ├── gen_song_url.R ├── genius_lyrics.R ├── genius_album.R ├── genius_tracklist.R ├── calc_self_sim.R ├── add_genius.R ├── utils.R └── genius_url.R ├── .gitattributes ├── LICENSE ├── .gitignore ├── cran-comments.md ├── .Rbuildignore ├── man ├── possible_url.Rd ├── possible_album.Rd ├── possible_lyrics.Rd ├── prep_info.Rd ├── pipe.Rd ├── cleaning.Rd ├── gen_album_url.Rd ├── genius_tracklist.Rd ├── gen_song_url.Rd ├── genius_lyrics.Rd ├── genius_url.Rd ├── genius_album.Rd ├── calc_self_sim.Rd └── add_genius.Rd ├── geniusR.Rproj ├── DESCRIPTION ├── LICENSE.md ├── paper ├── inthewild.md ├── paper.md ├── notes.md └── index.bib ├── NAMESPACE ├── CODE_OF_CONDUCT.md ├── NEWS.md ├── inst └── tutorials │ └── genius_tutorial │ ├── genius_tutorial.Rmd │ └── genius_tutorial.html └── README.md /vignettes/.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.R 3 | -------------------------------------------------------------------------------- /R/globals.R: -------------------------------------------------------------------------------- 1 | utils::globalVariables(c(".")) 2 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | inst/tutorials/* linguist-documentation 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | YEAR: 2020 2 | COPYRIGHT HOLDER: Josiah D. 
Parry 3 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj.user 2 | .Rhistory 3 | .RData 4 | .Ruserdata 5 | inst/doc 6 | prep_info.R 7 | data 8 | data-raw 9 | .DS_Store 10 | CRAN-RELEASE 11 | genius-similarity 12 | README.Rmd 13 | dev 14 | 15 | docs 16 | tests 17 | -------------------------------------------------------------------------------- /cran-comments.md: -------------------------------------------------------------------------------- 1 | This is a patch update that removes a dependency on readr, and updates superseded functions. 2 | 3 | ## Test environments 4 | * local R installation, R 4.0.3 5 | * ubuntu 16.04 (on travis-ci), R 4.0.3 6 | * win-builder (devel) 7 | 8 | ## R CMD check results 9 | 10 | 0 errors | 0 warnings | 1 note 11 | 12 | * This is a new release. 13 | -------------------------------------------------------------------------------- /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^cran-comments\.md$ 2 | ^README\.md$ 3 | ^CRAN-RELEASE$ 4 | ^Readme\.md$ 5 | ^R/prep_info\.R$ 6 | ^\.Rhistory$ 7 | ^data$ 8 | ^dev$ 9 | ^.*\.Rproj$ 10 | ^\.Rproj\.user$ 11 | ^data-raw$ 12 | ^docs$ 13 | ^tests$ 14 | ^vignettes$ 15 | ^vignettes/genius_basics\.Rmd$ 16 | ^genius-similarity$ 17 | ^genius-similarity/app\.R$ 18 | ^README\.Rmd$ 19 | ^CODE_OF_CONDUCT\.md$ 20 | ^LICENSE\.md$ 21 | ^paper$ 22 | -------------------------------------------------------------------------------- /man/possible_url.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{possible_url} 4 | \alias{possible_url} 5 | \title{Form of genius_url that can handle errors} 6 | \usage{ 7 | possible_url(...) 
8 | } 9 | \arguments{ 10 | \item{...}{arguments that would be passed to `genius_url()`} 11 | } 12 | \description{ 13 | Form of genius_url that can handle errors 14 | } 15 | -------------------------------------------------------------------------------- /man/possible_album.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{possible_album} 4 | \alias{possible_album} 5 | \title{Form of genius_album that can handle errors} 6 | \usage{ 7 | possible_album(...) 8 | } 9 | \arguments{ 10 | \item{...}{arguments that would be passed to `genius_album()`} 11 | } 12 | \description{ 13 | Form of genius_album that can handle errors 14 | } 15 | -------------------------------------------------------------------------------- /R/utils-pipe.R: -------------------------------------------------------------------------------- 1 | #' Pipe operator 2 | #' 3 | #' See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. 4 | #' 5 | #' @name %>% 6 | #' @rdname pipe 7 | #' @keywords internal 8 | #' @export 9 | #' @importFrom magrittr %>% 10 | #' @usage lhs \%>\% rhs 11 | #' @param lhs A value or the magrittr placeholder. 12 | #' @param rhs A function call using the magrittr semantics. 13 | #' @return The result of calling `rhs(lhs)`. 14 | NULL 15 | -------------------------------------------------------------------------------- /man/possible_lyrics.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{possible_lyrics} 4 | \alias{possible_lyrics} 5 | \title{Form of genius_lyrics that can handle errors} 6 | \usage{ 7 | possible_lyrics(...) 
8 | } 9 | \arguments{ 10 | \item{...}{arguments that would be passed to `genius_lyrics()`} 11 | } 12 | \description{ 13 | Form of genius_lyrics that can handle errors 14 | } 15 | -------------------------------------------------------------------------------- /man/prep_info.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{prep_info} 4 | \alias{prep_info} 5 | \title{Prepares input strings for `gen_song_url()`} 6 | \usage{ 7 | prep_info(input) 8 | } 9 | \arguments{ 10 | \item{input}{Either artist, song, or album, function input.} 11 | } 12 | \description{ 13 | Applies a number of regular expressions to prepare the input to match Genius url format 14 | } 15 | -------------------------------------------------------------------------------- /geniusR.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: No 4 | SaveWorkspace: No 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | 18 | BuildType: Package 19 | PackageUseDevtools: Yes 20 | PackageInstallArgs: --no-multiarch --with-keep.source 21 | PackageRoxygenize: rd,collate,namespace 22 | -------------------------------------------------------------------------------- /man/pipe.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils-pipe.R 3 | \name{\%>\%} 4 | \alias{\%>\%} 5 | \title{Pipe operator} 6 | \usage{ 7 | lhs \%>\% rhs 8 | } 9 | \arguments{ 10 | \item{lhs}{A value or the magrittr placeholder.} 11 | 12 | \item{rhs}{A function call using the magrittr semantics.} 13 | } 14 | \value{ 15 | 
The result of calling `rhs(lhs)`. 16 | } 17 | \description{ 18 | See \code{magrittr::\link[magrittr:pipe]{\%>\%}} for details. 19 | } 20 | \keyword{internal} 21 | -------------------------------------------------------------------------------- /man/cleaning.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/utils.R 3 | \name{cleaning} 4 | \alias{cleaning} 5 | \title{Produce a vector used to clean scraped strings} 6 | \usage{ 7 | cleaning() 8 | } 9 | \description{ 10 | Produces a vector to be used in string cleaning after scraping. Many of its values are hard-coded and will need to be adapted as new edge cases appear. 11 | } 12 | -------------------------------------------------------------------------------- /man/gen_album_url.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/gen_album_url.R 3 | \name{gen_album_url} 4 | \alias{gen_album_url} 5 | \title{Create Genius Album url} 6 | \usage{ 7 | gen_album_url(artist = NULL, album = NULL) 8 | } 9 | \arguments{ 10 | \item{artist}{The quoted name of the artist. Spelling matters, capitalization does not.} 11 | 12 | \item{album}{The quoted name of the album. Spelling matters, capitalization does not.} 13 | } 14 | \description{ 15 | Creates a string containing the url to an album tracklist on Genius.com. The function is used internally by `genius_tracklist()`.
16 | } 17 | \examples{ 18 | 19 | gen_album_url(artist = "Pinegrove", album = "Cardinal") 20 | 21 | } 22 | -------------------------------------------------------------------------------- /man/genius_tracklist.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/genius_tracklist.R 3 | \name{genius_tracklist} 4 | \alias{genius_tracklist} 5 | \title{Create a tracklist of an album} 6 | \usage{ 7 | genius_tracklist(artist = NULL, album = NULL) 8 | } 9 | \arguments{ 10 | \item{artist}{The quoted name of the artist. Spelling matters, capitalization does not.} 11 | 12 | \item{album}{The quoted name of the album. Spelling matters, capitalization does not.} 13 | } 14 | \description{ 15 | Creates a `tibble` containing all track titles for a given artist and album. This function is used internally in `genius_album()`. 16 | } 17 | \examples{ 18 | 19 | # genius_tracklist(artist = "Andrew Bird", album = "Noble Beast") 20 | 21 | } 22 | -------------------------------------------------------------------------------- /man/gen_song_url.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/gen_song_url.R 3 | \name{gen_song_url} 4 | \alias{gen_song_url} 5 | \title{Create Genius url} 6 | \usage{ 7 | gen_song_url(artist = NULL, song = NULL) 8 | } 9 | \arguments{ 10 | \item{artist}{The quoted name of the artist. Spelling matters, capitalization does not.} 11 | 12 | \item{song}{The quoted name of the song. Spelling matters, capitalization does not.} 13 | } 14 | \description{ 15 | Generates the url for a song given an artist and a song title. This function is used internally within the `genius_lyrics()` function.
16 | } 17 | \examples{ 18 | gen_song_url(artist = "Kendrick Lamar", song = "HUMBLE") 19 | gen_song_url("Margaret glaspy", "Memory Street") 20 | 21 | } 22 | -------------------------------------------------------------------------------- /R/gen_album_url.R: -------------------------------------------------------------------------------- 1 | #' Create Genius Album url 2 | #' 3 | #' Creates a string containing the url to an album tracklist on Genius.com. The function is used internally by `genius_tracklist()`. 4 | #' 5 | #' @param artist The quoted name of the artist. Spelling matters, capitalization does not. 6 | #' @param album The quoted name of the album. Spelling matters, capitalization does not. 7 | #' 8 | #' @examples 9 | #' 10 | #' gen_album_url(artist = "Pinegrove", album = "Cardinal") 11 | #' 12 | #' @export 13 | #' @import dplyr 14 | #' @importFrom stringr str_replace_all 15 | 16 | gen_album_url <- function(artist = NULL, album = NULL) { 17 | artist <- prep_info(artist) 18 | album <- prep_info(album) 19 | base_url <- "https://genius.com/albums/" 20 | query <- paste(artist,"/", album, sep = "") %>% 21 | str_replace_all(" ", "-") 22 | 23 | url <- paste0(base_url, query) 24 | return(url) 25 | } 26 | -------------------------------------------------------------------------------- /R/gen_song_url.R: -------------------------------------------------------------------------------- 1 | #' Create Genius url 2 | #' 3 | #' Generates the url for a song given an artist and a song title. This function is used internally within the `genius_lyrics()` function. 4 | #' 5 | #' @param artist The quoted name of the artist. Spelling matters, capitalization does not. 6 | #' @param song The quoted name of the song. Spelling matters, capitalization does not.
7 | #' 8 | #' @examples 9 | #' gen_song_url(artist = "Kendrick Lamar", song = "HUMBLE") 10 | #' gen_song_url("Margaret glaspy", "Memory Street") 11 | #' 12 | #' @export 13 | #' @importFrom stringr str_replace_all 14 | #' @import dplyr 15 | #' 16 | gen_song_url <- function(artist = NULL, song = NULL) { 17 | artist <- prep_info(artist) 18 | song <- prep_info(song) 19 | base_url <- "https://genius.com/" 20 | query <- paste(artist, song, "lyrics", sep = "-") %>% 21 | str_replace_all(" ", "-") 22 | url <- paste0(base_url, query) 23 | return(url) 24 | } 25 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: genius 2 | Title: Easily Access Song Lyrics from Genius.com 3 | Version: 2.2.3 4 | Authors@R: c( 5 | person("Josiah", "Parry", email = "josiah.parry@gmail.com", role = c("aut", "cre")), 6 | person("Nathan", "Barr", email = "nab222@cornell.edu", role = c("aut", "ctb")), 7 | person("Chris", "Billingham", email = "chris.billingham@gmail.com", role = "ctb"), 8 | person("Evan", "Oppenheimer", email = "eoppe1022@gmail.com", role = "ctb") 9 | ) 10 | Description: Easily access song lyrics in a tidy way. 11 | URL: https://github.com/josiahparry/genius 12 | BugReports: https://github.com/josiahparry/genius/issues 13 | Depends: 14 | R (>= 3.1.2) 15 | Imports: 16 | dplyr (>= 0.7.0), 17 | rvest, 18 | stringr, 19 | tidyr, 20 | purrr, 21 | tibble, 22 | tidytext, 23 | reshape2, 24 | rlang, 25 | magrittr 26 | License: MIT + file LICENSE 27 | Encoding: UTF-8 28 | RoxygenNote: 7.1.1 29 | Suggests: 30 | knitr, 31 | rmarkdown, 32 | testthat 33 | -------------------------------------------------------------------------------- /R/genius_lyrics.R: -------------------------------------------------------------------------------- 1 | #' Retrieve song lyrics from Genius.com 2 | #' 3 | #' Retrieve the lyrics of a song with supplied artist and song name. 
4 | #' @param artist The quoted name of the artist. Spelling matters, capitalization does not. 5 | #' @param song The quoted name of the song. Spelling matters, capitalization does not. 6 | #' @param info Default \code{"title"}, returns the track title. Set to \code{"simple"} for only lyrics, \code{"artist"} for the lyrics and artist, \code{"features"} for song element and the artist of that element, \code{"all"} to return artist, track, line, lyric, element, and element artist. 7 | #' 8 | #' 9 | #' @examples 10 | #' \donttest{ 11 | #' # genius_lyrics(artist = "Margaret Glaspy", song = "Memory Street") 12 | #' # genius_lyrics(artist = "Kendrick Lamar", song = "Money Trees") 13 | #' # genius_lyrics("JMSN", "Drinkin'") 14 | #'} 15 | #' @export 16 | #' @import dplyr 17 | 18 | genius_lyrics <- function(artist = NULL, song = NULL, info = "title") { 19 | song_url <- gen_song_url(artist, song) 20 | 21 | genius_url(song_url, info) 22 | 23 | } 24 | -------------------------------------------------------------------------------- /man/genius_lyrics.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/genius_lyrics.R 3 | \name{genius_lyrics} 4 | \alias{genius_lyrics} 5 | \title{Retrieve song lyrics from Genius.com} 6 | \usage{ 7 | genius_lyrics(artist = NULL, song = NULL, info = "title") 8 | } 9 | \arguments{ 10 | \item{artist}{The quoted name of the artist. Spelling matters, capitalization does not.} 11 | 12 | \item{song}{The quoted name of the song. Spelling matters, capitalization does not.} 13 | 14 | \item{info}{Default \code{"title"}, returns the track title. 
Set to \code{"simple"} for only lyrics, \code{"artist"} for the lyrics and artist, \code{"features"} for song element and the artist of that element, \code{"all"} to return artist, track, line, lyric, element, and element artist.} 15 | } 16 | \description{ 17 | Retrieve the lyrics of a song with supplied artist and song name. 18 | } 19 | \examples{ 20 | \donttest{ 21 | # genius_lyrics(artist = "Margaret Glaspy", song = "Memory Street") 22 | # genius_lyrics(artist = "Kendrick Lamar", song = "Money Trees") 23 | # genius_lyrics("JMSN", "Drinkin'") 24 | } 25 | } 26 | -------------------------------------------------------------------------------- /man/genius_url.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/genius_url.R 3 | \name{genius_url} 4 | \alias{genius_url} 5 | \title{Use Genius url to retrieve lyrics} 6 | \usage{ 7 | genius_url(url, info = "title") 8 | } 9 | \arguments{ 10 | \item{url}{The url of song lyrics on Genius} 11 | 12 | \item{info}{Default \code{"title"}, returns the track title. Set to \code{"simple"} for only lyrics, \code{"artist"} for the lyrics and artist, \code{"features"} for song element and the artist of that element, \code{"all"} to return artist, track, line, lyric, element, and element artist.} 13 | } 14 | \description{ 15 | This function is used inside of the `genius_lyrics()` function. Given a url to a song on Genius, this function returns a tibble where each row is one line. Pair this function with `gen_song_url()` for easier access to song lyrics. 
16 | } 17 | \examples{ 18 | \donttest{ 19 | # genius_url("https://genius.com/Head-north-in-the-water-lyrics", info = "all") 20 | 21 | # url <- gen_song_url(artist = "Kendrick Lamar", song = "HUMBLE") 22 | 23 | # genius_url(url) 24 | 25 | } 26 | } 27 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | # MIT License 2 | 3 | Copyright (c) 2020 Josiah D. Parry 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE.
22 | -------------------------------------------------------------------------------- /man/genius_album.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/genius_album.R 3 | \name{genius_album} 4 | \alias{genius_album} 5 | \title{Retrieve song lyrics for an album} 6 | \usage{ 7 | genius_album(artist = NULL, album = NULL, info = "simple") 8 | } 9 | \arguments{ 10 | \item{artist}{The quoted name of the artist. Spelling matters, capitalization does not.} 11 | 12 | \item{album}{The quoted name of the album. Spelling matters, capitalization does not.} 13 | 14 | \item{info}{Return track level metadata. See details.} 15 | } 16 | \description{ 17 | Obtain the lyrics to an album in a tidy format. 18 | } 19 | \details{ 20 | The `info` argument controls which columns are added to the returned tibble: 21 | `"simple"` returns only the song lyrics. 22 | `"title"` returns the track title and lyrics. 23 | `"artist"` returns the lyrics and artist. 24 | `"features"` returns the lyrics, song elements, and element artists. 25 | `"all"` returns all of the above and appends the album name.
26 | } 27 | \examples{ 28 | 29 | \donttest{ 30 | # genius_album(artist = "Petal", album = "Comfort EP") 31 | # genius_album(artist = "Fit For A King", album = "Deathgrip", info = "all") 32 | } 33 | 34 | } 35 | -------------------------------------------------------------------------------- /man/calc_self_sim.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/calc_self_sim.R 3 | \name{calc_self_sim} 4 | \alias{calc_self_sim} 5 | \title{Calculate a self-similarity matrix} 6 | \usage{ 7 | calc_self_sim( 8 | df, 9 | lyric_col, 10 | output = "tidy", 11 | remove_stop_words = FALSE, 12 | language = "en", 13 | source = "snowball" 14 | ) 15 | } 16 | \arguments{ 17 | \item{df}{The data frame containing song lyrics. Usually from the output of \code{`genius_lyrics()`}.} 18 | 19 | \item{lyric_col}{The unquoted name of the column containing lyrics} 20 | 21 | \item{output}{Determine the type of output. Default is \code{"tidy"}. Set to \code{"matrix"} for the raw matrix.} 22 | 23 | \item{remove_stop_words}{Optional argument to remove stop words from self-similarity matrix.} 24 | 25 | \item{language}{Language of stop words. See \code{tidytext::get_stopwords()}.} 26 | 27 | \item{source}{Stop words source. See \code{tidytext::get_stopwords()}.} 28 | } 29 | \description{ 30 | Calculate the self-similarity matrix for song lyrics. 
31 | } 32 | \examples{ 33 | 34 | \donttest{ 35 | # bad_habits <- genius_lyrics("Alix", "Bad Habits") 36 | # self_sim <- calc_self_sim(bad_habits, lyric) 37 | } 38 | 39 | } 40 | -------------------------------------------------------------------------------- /paper/inthewild.md: -------------------------------------------------------------------------------- 1 | Hardwired for tidytext: https://johnmackintosh.com/2018-01-30-hardwired-for-tidy-text/ 2 | 3 | Christian Music analysis: https://deadset.press/2019/11/02/jesus-is-king-the-numerical-analysis/ 4 | tweet that went viral-ish for the above https://twitter.com/deadsetpress/status/1174072168277303296?s=20 5 | 6 | Taylor Swift Alcohol: https://www.reddit.com/r/dataisbeautiful/comments/8bi05k/taylor_swifts_newfound_infatuation_with_alcohol_oc/ 7 | 8 | In Significance Magazine: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2019.01277.x 9 | 10 | Soundtracks of mobile listening: youth’s daily life music listening on mobile phones : https://www.researchgate.net/publication/336848309_Soundtracks_of_mobile_listening_youth's_daily_life_music_listening_on_mobile_phones 11 | 12 | Songs perceived as relaxing: Musical features, lyrics, and contributing mechanisms : https://www.researchgate.net/publication/336848388_Songs_perceived_as_relaxing_Musical_features_lyrics_and_contributing_mechanisms 13 | 14 | Oma Musa (Personalised music listening strategies to support emotional health in adolescents): https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/projects2/personalisedmusic 15 | 16 | Used in data science education: https://github.com/rstudio-education/datascience-box/blob/eaa59a8855884669846884407a4d5faabc206e9a/slides/u3_d04-tidytext/u3_d04-tidytext.Rmd 17 | 18 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export("%>%") 
4 | export(add_genius) 5 | export(calc_self_sim) 6 | export(gen_album_url) 7 | export(gen_song_url) 8 | export(genius_album) 9 | export(genius_lyrics) 10 | export(genius_tracklist) 11 | export(genius_url) 12 | export(possible_album) 13 | export(possible_lyrics) 14 | export(possible_url) 15 | export(prep_info) 16 | import(dplyr) 17 | importFrom(dplyr,bind_rows) 18 | importFrom(dplyr,case_when) 19 | importFrom(dplyr,filter) 20 | importFrom(dplyr,group_by) 21 | importFrom(dplyr,inner_join) 22 | importFrom(dplyr,mutate) 23 | importFrom(dplyr,n) 24 | importFrom(dplyr,row_number) 25 | importFrom(dplyr,ungroup) 26 | importFrom(magrittr,"%>%") 27 | importFrom(purrr,map) 28 | importFrom(purrr,map2) 29 | importFrom(purrr,pluck) 30 | importFrom(purrr,possibly) 31 | importFrom(reshape2,melt) 32 | importFrom(rlang,enquo) 33 | importFrom(rvest,html_attr) 34 | importFrom(rvest,html_node) 35 | importFrom(rvest,html_nodes) 36 | importFrom(rvest,html_session) 37 | importFrom(rvest,html_text) 38 | importFrom(rvest,session) 39 | importFrom(stringr,str_detect) 40 | importFrom(stringr,str_extract) 41 | importFrom(stringr,str_replace_all) 42 | importFrom(stringr,str_trim) 43 | importFrom(tibble,as_tibble) 44 | importFrom(tibble,tibble) 45 | importFrom(tidyr,fill) 46 | importFrom(tidyr,pivot_wider) 47 | importFrom(tidyr,replace_na) 48 | importFrom(tidyr,separate) 49 | importFrom(tidyr,unnest) 50 | importFrom(tidytext,get_stopwords) 51 | importFrom(tidytext,unnest_tokens) 52 | -------------------------------------------------------------------------------- /R/genius_album.R: -------------------------------------------------------------------------------- 1 | if(getRversion() >= "2.15.1") utils::globalVariables(c("track_url", "lyrics")) 2 | 3 | #' Retrieve song lyrics for an album 4 | #' 5 | #' Obtain the lyrics to an album in a tidy format. 6 | #' 7 | #' @param artist The quoted name of the artist. Spelling matters, capitalization does not. 
8 | #' @param album The quoted name of the album. Spelling matters, capitalization does not. 9 | #' @param info Return track level metadata. See details. 10 | #' 11 | #' @details 12 | #' The `info` argument controls which columns are added to the returned tibble: 13 | #' `"simple"` returns only the song lyrics. 14 | #' `"title"` returns the track title and lyrics. 15 | #' `"artist"` returns the lyrics and artist. 16 | #' `"features"` returns the lyrics, song elements, and element artists. 17 | #' `"all"` returns all of the above and appends the album name. 18 | #' 19 | #' @examples 20 | #' 21 | #'\donttest{ 22 | #' # genius_album(artist = "Petal", album = "Comfort EP") 23 | #' # genius_album(artist = "Fit For A King", album = "Deathgrip", info = "all") 24 | #'} 25 | #' 26 | #' @export 27 | #' @import dplyr 28 | #' @importFrom purrr map 29 | #' @importFrom stringr str_replace_all 30 | #' @importFrom tidyr unnest 31 | 32 | genius_album <- function(artist = NULL, album = NULL, info = "simple") { 33 | 34 | tracks <- genius_tracklist(artist, album) 35 | 36 | album <- tracks %>% 37 | mutate(lyrics = map(track_url, possible_url, info)) %>% 38 | select(-track_title) %>% 39 | unnest(lyrics) %>% 40 | right_join(tracks) %>% 41 | select(-track_url) 42 | 43 | if(info != "all"){album <- album %>% select(-.data$album_name)} 44 | 45 | album 46 | } 47 | -------------------------------------------------------------------------------- /man/add_genius.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/add_genius.R 3 | \name{add_genius} 4 | \alias{add_genius} 5 | \title{Add lyrics to a data frame} 6 | \usage{ 7 | add_genius(data, artist, title, type = c("album", "track", "lyrics")) 8 | } 9 | \arguments{ 10 | \item{data}{This is a dataframe with one column for the artist name, and the other column being either the track title or the album title.} 11 | 12 |
\item{artist}{This is the column that contains the artist name.} 13 | 14 | \item{title}{This is the column that has either album titles, track titles, or both.} 15 | 16 | \item{type}{This is a single value character string of either "album" or "track". This tells the function what kind of lyrics to pull. Alternatively, this can be a column with the value of "album" or "track" associated with each row. "lyrics" can be used for backward compatibility.} 17 | } 18 | \description{ 19 | This function builds on a data frame with artist and album/track information. To use the function with a data frame of mixed type (albums and tracks), create another column that specifies type. The type values are `"album"` and `"track"`; `"lyrics"` is accepted for backward compatibility. 20 | } 21 | \examples{ 22 | \donttest{ 23 | # # Albums only 24 | # 25 | # artist_albums <- tibble::tribble( 26 | # ~artist, ~album, 27 | # "J. Cole", "KOD", 28 | # "Sampha", "Process" 29 | # ) 30 | # 31 | # add_genius(artist_albums, artist, album, type = "album") 32 | # 33 | # # Individual Tracks only 34 | # 35 | # artist_songs <- tibble::tribble( 36 | # ~artist, ~track, 37 | # "J. Cole", "Motiv8", 38 | # "Andrew Bird", "Anonanimal" 39 | # ) 40 | # 41 | # # Tracks and Albums 42 | # mixed_type <- tibble::tribble( 43 | # ~artist, ~album, ~type, 44 | # "J. Cole", "KOD", "album", 45 | # "Andrew Bird", "Proxy War", "track" 46 | # ) 47 | # 48 | # add_genius(mixed_type, artist, album, type) 49 | # add_genius(artist_songs, artist, track, type = "track") 50 | } 51 | 52 | 53 | } 54 | -------------------------------------------------------------------------------- /R/genius_tracklist.R: -------------------------------------------------------------------------------- 1 | if(getRversion() >= "2.15.1") utils::globalVariables(c("track_n")) 2 | 3 | 4 | #' Create a tracklist of an album 5 | #' 6 | #' Creates a `tibble` containing all track titles for a given artist and album. This function is used internally in `genius_album()`.
7 | #' 8 | #' @param artist The quoted name of the artist. Spelling matters, capitalization does not. 9 | #' @param album The quoted name of the album. Spelling matters, capitalization does not. 10 | #' 11 | #' @examples 12 | #' 13 | #' # genius_tracklist(artist = "Andrew Bird", album = "Noble Beast") 14 | #' 15 | #' @export 16 | #' @import dplyr 17 | #' @importFrom rvest session html_nodes html_text html_attr 18 | #' @importFrom stringr str_replace_all str_trim 19 | 20 | genius_tracklist <- function(artist = NULL, album = NULL) { 21 | url <- gen_album_url(artist, album) 22 | genius_session <- session(url) 23 | 24 | # Get the album name 25 | album_name <- html_nodes(genius_session, ".header_with_cover_art-primary_info-title") %>% 26 | html_text() 27 | 28 | # Get track numbers 29 | # Where there are no track numbers, it isn't a song 30 | track_numbers <- html_nodes(genius_session, ".chart_row-number_container-number") %>% 31 | html_text() %>% 32 | str_replace_all("\n", "") %>% 33 | str_trim() 34 | 35 | # Get all titles 36 | # Where there is a title but no track number, it isn't an actual song 37 | track_titles <- html_nodes(genius_session, ".chart_row-content-title") %>% 38 | html_text() %>% 39 | str_replace_all("\n","") %>% 40 | str_replace_all("Lyrics", "") %>% 41 | str_trim() 42 | 43 | # Get all song urls 44 | track_url <- html_nodes(genius_session, ".u-display_block") %>% 45 | html_attr('href') %>% 46 | str_replace_all("\n", "") %>% 47 | str_trim() 48 | 49 | # Create df for easy filtering 50 | # Filter to keep only the actual tracks; rows without a track number are credits, booklets, etc. 51 | tibble( 52 | album_name = album_name, 53 | track_title = track_titles, 54 | track_n = as.integer(track_numbers), 55 | track_url = track_url 56 | ) %>% 57 | filter(track_n > 0) 58 | 59 | 60 | } 61 | -------------------------------------------------------------------------------- /paper/paper.md:
-------------------------------------------------------------------------------- 1 | --- 2 | title: 'genius: analysis-ready song lyric retrieval' 3 | tags: 4 | - R 5 | - music information retrieval 6 | - natural language processing 7 | - tidy text 8 | - tidy data 9 | - lyrics 10 | authors: 11 | - name: Josiah Parry 12 | orcid: 0000-0001-9910-865X 13 | affiliation: 1 14 | affiliations: 15 | - name: Northeastern University, School of Public Policy and Urban Affairs 16 | index: 1 17 | date: 16 December 2019 18 | bibliography: index.bib 19 | --- 20 | 21 | 22 | # Summary 23 | 24 | Music is a rather unique phenomenon. One song can be considered a dataset with myriad bits of information, which can be represented by musical notation, audio signals, and lyrics, among many other forms of representation. The extraction of information from music is referred to as music information retrieval (MIR). MIR has been largely focused on audio signal processing and has not put the same effort into the analysis of music text data [@Mayer2010]. Concomitantly, the field of music psychology has been studying the psychological implications of lyrics [@wellbeing; @violence; @soundtracks; @relaxing]. However, there seems to be no consistent MIR system in place for the acquisition and analysis of song lyrics [@napier]. 25 | 26 | `genius` is an R package that provides a consistent interface for acquiring song lyrics in R. The package was developed with tidy data principles in mind [@tidydata; @tidytext]. As such, the toolkit that `genius` provides integrates with the tidy text framework. `genius` functionality returns one-token-per-document-per-row where each token is a line of a song. This adherence to tidy text principles allows users to transition between tidy and alternative text analysis frameworks. 27 | 28 | As song lyrics are only one aspect of music, it is often useful to acquire audio features and metadata as well.
To this end, `genius` has been integrated with the `spotifyr` package [@spotifyr]. This integration, for example, enables researchers to fetch a single artist's discography with audio features and lyrics in a matter of seconds. 29 | 30 | By providing a robust and consistent framework for acquiring song lyrics, `genius` aids researchers by drastically reducing the amount of time and effort spent in collecting, cleaning, and preparing song lyrics for analysis. 31 | 32 | # References 33 | 34 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Code of Conduct 2 | 3 | As contributors and maintainers of this project, we pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. 4 | 5 | We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy environment. 
6 | 7 | Examples of behavior that contributes to a positive environment for our community include: 8 | 9 | * Demonstrating empathy and kindness toward other people 10 | * Being respectful of differing opinions, viewpoints, and experiences 11 | * Giving and gracefully accepting constructive feedback 12 | * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience 13 | * Focusing on what is best not just for us as individuals, but for the overall community 14 | 15 | Examples of unacceptable behavior include: 16 | 17 | * The use of sexualized language or imagery, and sexual attention or 18 | advances of any kind 19 | * Trolling, insulting or derogatory comments, and personal or political attacks 20 | * Public or private harassment 21 | * Publishing others' private information, such as a physical or email 22 | address, without their explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a 24 | professional setting 25 | 26 | Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. 27 | 28 | Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. 29 | 30 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers. 31 | 32 | 33 | ### Attribution 34 | 35 | This Code of Conduct is adapted from the [Contributor Covenant version 2.0](https://www.contributor-covenant.org/version/2/0/code_of_conduct.html) and the [dplyr Contributor Code of Conduct](https://dplyr.tidyverse.org/CODE_OF_CONDUCT). 
36 | -------------------------------------------------------------------------------- /R/calc_self_sim.R: -------------------------------------------------------------------------------- 1 | if (getRversion() >= "2.15.1") { 2 | utils::globalVariables(c("Var1", "Var2", "value", "word", "x_id", "y_id")) 3 | } 4 | 5 | #' Calculate a self-similarity matrix 6 | #' 7 | #' Calculate the self-similarity matrix for song lyrics. 8 | #' 9 | #' @param df The data frame containing song lyrics. Usually from the output of \code{`genius_lyrics()`}. 10 | #' @param lyric_col The unquoted name of the column containing lyrics 11 | #' @param output Determine the type of output. Default is \code{"tidy"}. Set to \code{"matrix"} for the raw matrix. 12 | #' @param remove_stop_words Optional argument to remove stop words from self-similarity matrix. 13 | #' @param language Language of stop words. See \code{tidytext::get_stopwords()}. 14 | #' @param source Stop words source. See \code{tidytext::get_stopwords()}. 15 | #' 16 | #' @examples 17 | #' 18 | #'\donttest{ 19 | #' # bad_habits <- genius_lyrics("Alix", "Bad Habits") 20 | #' # self_sim <- calc_self_sim(bad_habits, lyric) 21 | #'} 22 | #' 23 | #' @export 24 | #' @import dplyr 25 | #' @importFrom tidytext unnest_tokens get_stopwords 26 | #' @importFrom reshape2 melt 27 | #' @importFrom tibble as_tibble 28 | 29 | calc_self_sim <- function(df, lyric_col, output = "tidy", remove_stop_words = FALSE, language = "en", source = "snowball") { 30 | lyric_col <- enquo(lyric_col) 31 | lyric_vec <- if (remove_stop_words) { 32 | df %>% 33 | unnest_tokens(word, !!lyric_col) %>% 34 | anti_join(get_stopwords(language = language, source = source)) %>% 35 | pull(word) 36 | } else { 37 | df %>% 38 | unnest_tokens(word, !!lyric_col) %>% 39 | pull(word) 40 | } 41 | 42 | 43 | # calculate matrix dimensions 44 | mat_size <- length(lyric_vec) 45 | 46 | # create matrix of the words 47 | lyric_mat <- matrix(lyric_vec, nrow = mat_size, ncol = mat_size) 48 | 49 | # 
initialize empty self-sim matrix 50 | self_sim <- matrix(nrow = mat_size, ncol = mat_size) 51 | 52 | # iterate through matrix and evaluate similarity 53 | # seq_len() avoids iterating over c(1, 0) when there are no words 54 | for (col in seq_len(mat_size)) { 55 | for (row in seq_len(mat_size)) { 56 | self_sim[row, col] <- lyric_mat[row, col] == lyric_mat[col, col] 57 | } 58 | } 59 | 60 | 61 | switch(output, 62 | matrix = {return(self_sim)}, 63 | tidy = { 64 | reshape2::melt(self_sim) %>% 65 | as_tibble() %>% 66 | rename(x_id = Var1, y_id = Var2, identical = value) %>% 67 | mutate(word_x = lyric_vec[x_id], 68 | word_y = lyric_vec[y_id]) %>% 69 | return() 70 | }) 71 | 72 | } 73 | -------------------------------------------------------------------------------- /paper/notes.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ------- 4 | 5 | Notes 6 | 7 | 8 | https://www.apa.org/monitor/julaug03/violent 9 | https://www.psychologytoday.com/us/blog/the-athletes-way/201901/anger-and-sadness-are-the-rise-in-popular-music-lyrics (billboard reference) 10 | https://jpms.ucpress.edu/content/30/4/161 (billboard) - scraped from a-z lyrics 11 | http://repository.upenn.edu/cgi/viewcontent.cgi?article=1094&context=mapp_capstone 12 | https://ieeexplore.ieee.org/document/5197270 13 | 14 | Genius extends tidy text analysis Silge 15 | https://www.researchgate.net/publication/305219559_tidytext_Text_Mining_and_Analysis_Using_Tidy_Data_Principles_in_R 16 | 17 | Used internally in spotifyr Thompson et al 18 | 19 | http://oa.upm.es/54212/1/TFG_LUIS_SERRANO_ESQUINAS.pdf 20 | 21 | https://scholar.google.com/scholar?start=10&q=spotifyr&hl=en&as_sdt=0,22#d=gs_qabs&u=%23p%3DTm9tg7_WPbQJ 22 | 23 | 24 | https://pure.tue.nl/ws/portalfiles/portal/136755280/Master_thesis_project_Janissen.pdf 25 | 26 | Music 21 is an alternate robust tool.
Works only with existing corpora 27 | 28 | 29 | ----- 30 | Genius in the wild 31 | Hardwired for tidytext: https://johnmackintosh.com/2018-01-30-hardwired-for-tidy-text/ 32 | 33 | Christian Music analysis: https://deadset.press/2019/11/02/jesus-is-king-the-numerical-analysis/ 34 | tweet that went viral-ish for the above https://twitter.com/deadsetpress/status/1174072168277303296?s=20 35 | 36 | Taylor Swift Alcohol: https://www.reddit.com/r/dataisbeautiful/comments/8bi05k/taylor_swifts_newfound_infatuation_with_alcohol_oc/ 37 | 38 | In Significance Magazine: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2019.01277.x 39 | 40 | Soundtracks of mobile listening: youth’s daily life music listening on mobile phones : https://www.researchgate.net/publication/336848309_Soundtracks_of_mobile_listening_youth's_daily_life_music_listening_on_mobile_phones 41 | 42 | Songs perceived as relaxing: Musical features, lyrics, and contributing mechanisms : https://www.researchgate.net/publication/336848388_Songs_perceived_as_relaxing_Musical_features_lyrics_and_contributing_mechanisms 43 | 44 | Oma Musa (Personalised music listening strategies to support emotional health in adolescents): https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/projects2/personalisedmusic 45 | 46 | Used in data science education: https://github.com/rstudio-education/datascience-box/blob/eaa59a8855884669846884407a4d5faabc206e9a/slides/u3_d04-tidytext/u3_d04-tidytext.Rmd 47 | 48 | 49 | 50 | song lyrics are used frequently. 51 | there is no uniform way of retrieving song lyrics 52 | 53 | -------------------------------------------------------------------------------- /R/add_genius.R: -------------------------------------------------------------------------------- 1 | #' Add lyrics to a data frame 2 | #' 3 | #' This function is to be used to build on a data frame with artist and album/track information. 
To use the function with a data frame of mixed type (albums and tracks), create another column that specifies type. The type values are `"album"` and `"track"` (`"lyrics"` is still accepted in place of `"track"`). 4 | #' 5 | #' @param data This is a dataframe with one column for the artist name, and the other column being either the track title or the album title. 6 | #' @param artist This is the column that contains artist names 7 | #' @param title This is the column that has either album titles, track titles, or both. 8 | #' @param type This is a single value character string of either "album" or "track". This tells the function what kind of lyrics to pull. Alternatively, this can be a column with the value of "album" or "track" associated with each row. "lyrics" can be used for backward compatibility. 9 | #' 10 | #' @examples 11 | #' \donttest{ 12 | #' # # Albums only 13 | #' # 14 | #' # artist_albums <- tibble::tribble( 15 | #' # ~artist, ~album, 16 | #' # "J. Cole", "KOD", 17 | #' # "Sampha", "Process" 18 | #' # ) 19 | #' # 20 | #' # add_genius(artist_albums, artist, album, type = "album") 21 | #' # 22 | #' # # Individual Tracks only 23 | #' # 24 | #' # artist_songs <- tibble::tribble( 25 | #' # ~artist, ~track, 26 | #' # "J. Cole", "Motiv8", 27 | #' # "Andrew Bird", "Anonanimal" 28 | #' # ) 29 | #' # 30 | #' # # Tracks and Albums 31 | #' # mixed_type <- tibble::tribble( 32 | #' # ~artist, ~album, ~type, 33 | #' # "J. 
Cole", "KOD", "album", 34 | #' # "Andrew Bird", "Proxy War", "track" 35 | #' # ) 36 | #' # 37 | #' # add_genius(mixed_type, artist, album, type) 38 | #' # add_genius(artist_songs, artist, track, type = "track") 39 | #'} 40 | #' 41 | #' 42 | #' @export 43 | #' @importFrom dplyr filter mutate bind_rows inner_join 44 | #' @importFrom tibble as_tibble 45 | #' @importFrom tidyr unnest 46 | #' @importFrom rlang enquo 47 | #' @importFrom purrr map2 48 | 49 | add_genius <- function(data, artist, title, type = c("album", "track", "lyrics")) { 50 | genius_funcs <- list(album = possible_album, lyrics = possible_lyrics) 51 | artist <- enquo(artist) 52 | title <- enquo(title) 53 | type <- enquo(type) 54 | 55 | songs <- filter(data, !!type %in% c("lyrics", "track")) 56 | albums <- filter(data, !!type == "album") 57 | 58 | song_lyrics <- mutate(songs, lyrics = map2(.x = !!artist, .y = !!title, genius_funcs[["lyrics"]])) 59 | album_lyrics <- mutate(albums, lyrics = map2(.x = !!artist, .y = !!title, genius_funcs[["album"]])) 60 | 61 | 62 | bind_rows( 63 | album_lyrics %>% 64 | unnest(lyrics), 65 | song_lyrics %>% 66 | unnest(lyrics) 67 | ) %>% 68 | inner_join(data) %>% 69 | as_tibble() 70 | 71 | 72 | } 73 | -------------------------------------------------------------------------------- /R/utils.R: -------------------------------------------------------------------------------- 1 | #' Form of genius_album that can handle errors 2 | #' @param ... arguments that would be passed to `genius_album()` 3 | #' @importFrom purrr possibly 4 | #' @importFrom tibble as_tibble 5 | #' @export 6 | possible_album <- possibly(genius_album, otherwise = as_tibble()) 7 | 8 | #' Form of genius_lyrics that can handle errors 9 | #' @param ... 
arguments that would be passed to `genius_lyrics()` 10 | #' @importFrom purrr possibly 11 | #' @importFrom tibble as_tibble 12 | #' @export 13 | possible_lyrics <- possibly(genius_lyrics, otherwise = as_tibble()) 14 | 15 | 16 | #' Form of genius_url that can handle errors 17 | #' @param ... arguments that would be passed to `genius_url()` 18 | #' @importFrom purrr possibly 19 | #' @importFrom tibble as_tibble 20 | #' @export 21 | possible_url <- possibly(genius_url, otherwise = as_tibble()) 22 | 23 | 24 | #' Prepares input strings for `gen_song_url()` 25 | #' 26 | #' Applies a number of regular expressions to prepare the input to match Genius url format 27 | #' 28 | #' @param input Either artist, song, or album, function input. 29 | #' @export 30 | prep_info <- function(input) { 31 | str_replace_all(input, 32 | c("\\s*\\(Ft.[^\\)]+\\)" = "", 33 | "&" = "and", 34 | #"-" = " ", 35 | #"\\+" = " ", 36 | "\\$" = " ", 37 | #"/" = " ", 38 | #":" = " ", 39 | "'" = "", 40 | #"," = "", 41 | "\u00E9" = "e", 42 | "\u00F6" = "o", 43 | "\u00F8" = "", 44 | "[[:punct:]]" = " ", 45 | "[[:blank:]]+" = " ")) %>% 46 | str_trim() 47 | } 48 | 49 | 50 | #' Function which produces a vector to be used in string cleaning from scraping there are a lot of hard coded values in here and will need to be adapted for the weird nuances. 
51 | cleaning <- function() { 52 | # putting randomblackdude in here because I can't figure out a regex for him and he's throwing me off 53 | clean_vec <- c("([^RandomBlackDude][a-z0-9]{2,})([[:upper:]])" = "\\1\n\\2", # turn camel case into new lines 54 | "(\\]|\\))([[:upper:]])" = "\\1\n\\2", # letters immediately after closing brackets new lines 55 | # brackets with producer info into new lines 56 | "(\\[.{2,100}\\])" ="\n\\1\n", 57 | # rip smart quotes 58 | "\u2019" = "'", 59 | # if quotes follow or precede brackets fix lines 60 | "(\\])(\")" = "\\1\n\\2", 61 | "(\")(\\[)" = "\\1\n\\2", 62 | # if a question mark directly touches a word or number make new lines 63 | "(\\?)([[:alpha:]])" = "\\1\n\\2", 64 | # roger waters, you're a pain: comfortably numb, issue # 4 65 | # https://github.com/JosiahParry/genius/issues/4 66 | "(\\])(\\[)" = "\\1\n\\2") 67 | 68 | return(clean_vec) 69 | } 70 | 71 | 72 | 73 | unnest <- tidyr::unnest_legacy 74 | 75 | -------------------------------------------------------------------------------- /vignettes/genius_basics.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "genius Basics" 3 | author: "Josiah Parry" 4 | date: "`r Sys.Date()`" 5 | output: rmarkdown::html_vignette 6 | vignette: > 7 | %\VignetteIndexEntry{Vignette Title} 8 | %\VignetteEngine{knitr::rmarkdown} 9 | %\VignetteEncoding{UTF-8} 10 | --- 11 | 12 | # genius 13 | 14 | genius enables quick and easy download of song lyrics. The intent behind the package is to be able to perform text based analyses on songs in a tidy[text] format. Song lyrics come from [Genius](https://genius.com) (formerly Rap Genius), the most widely accessible platform for lyrics. 15 | 16 | The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole albums. 17 | 18 | ## Individual songs `genius_lyrics()` 19 | 20 | Getting lyrics to a single song is pretty easy. 
Let's get in our **ELEMENT.** and check out **DNA.**. But first, note that the `genius_lyrics()` function takes two arguments, `artist` and `song`. Be sure to spell the name of the artist and the song correctly, but don't worry about capitalization. 21 | 22 | First, let's set up our working environment. 23 | 24 | ```{r echo=FALSE, message=FALSE, warning=FALSE} 25 | # Load needed libraries 26 | library(genius) 27 | library(dplyr) 28 | ``` 29 | 30 | `genius_lyrics()` returns only the barebones. Utilizing `dplyr` we can also create a new variable with the line number to help in future [tidytext](https://github.com/juliasilge/tidytext) analysis. This will be covered in a later vignette / post. 31 | 32 | ```{r} 33 | DNA <- genius_lyrics(artist = "Kendrick Lamar", song = "DNA.") 34 | DNA 35 | 36 | ``` 37 | 38 | This function also enables you to get verse level information. This tends to be more popular in hip-hop, where songs may have multiple featured artists (thanks to [\@natebarr64](https://github.com/natebarr64) for the help!). `genius_lyrics()` has an argument called `info`. It defaults to `"title"`. If you provide the argument `info = "features"`, two new columns appear: `element` (the song section) and `element_artist` (the artist performing that section). 39 | 40 | ```{r} 41 | genius_lyrics(artist = "Kendrick Lamar", song = "DNA.", info = "features") 42 | ``` 43 | 44 | 45 | ## Albums 46 | 47 | More often than not you will want to get the lyrics for an entire album. This is done easily with `genius_album()`. Just provide the `artist` and `album` name. 48 | 49 | ```{r} 50 | DAMN <- genius_album(artist = "Kendrick Lamar", album = "DAMN.") 51 | 52 | head(DAMN) 53 | ``` 54 | 55 | Bam. Easy peasy. Now you have a sweet data frame ready for a tidy text analysis! 56 | 57 | 58 | ## Multiple albums or songs 59 | 60 | Being able to create a dataframe with multiple artists and albums is extremely useful for tidytext analysis. Instead of having to iterate over your data, `add_genius()` is here to assist you.
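To make the "iterate over your data" step concrete, here is a rough sketch of the row-wise pattern that `add_genius()` wraps: map a fetcher over each artist/title pair, then unnest the results. It assumes a hypothetical `fake_lyrics()` helper that stands in for `genius_lyrics()` so the example runs without contacting Genius; it is an illustration of the idea, not the package's exact implementation.

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for a lyric fetcher such as genius_lyrics();
# it returns a two-line tibble instead of scraping Genius.
fake_lyrics <- function(artist, title) {
  tibble(
    line  = 1:2,
    lyric = paste(title, "by", artist, "- line", 1:2)
  )
}

artist_songs <- tribble(
  ~artist,       ~track,
  "J. Cole",     "Motiv8",
  "Andrew Bird", "Anonanimal"
)

# Fetch one nested tibble per row, then expand to one lyric line per row
manual <- artist_songs %>%
  mutate(lyrics = map2(artist, track, fake_lyrics)) %>%
  unnest(cols = lyrics)

manual
```

With a real fetcher, any row that fails to download would stop the whole pipeline, which is presumably why the package maps over error-tolerant wrappers (`possible_lyrics()`, `possible_album()`) instead.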
61 | 62 | Pipe a dataframe with a column for the album artists and album/track information. The argument `type` is used to indicate if the dataframe contains songs or albums 63 | 64 | ```{r} 65 | # Example with 2 different artists and albums 66 | artist_albums <- tribble( 67 | ~artist, ~album, 68 | "J. Cole", "KOD", 69 | "Sampha", "Process" 70 | ) 71 | 72 | 73 | artist_albums %>% 74 | add_genius(artist, album) 75 | 76 | 77 | # Example with 2 different artists and songs 78 | artist_songs <- tribble( 79 | ~artist, ~track, 80 | "J. Cole", "Motiv8", 81 | "Andrew Bird", "Anonanimal" 82 | ) 83 | 84 | artist_songs %>% 85 | add_genius(artist, track, type = "lyrics") 86 | ``` 87 | 88 | 89 | 90 | ## Tracklists 91 | 92 | I often only know an album name and none of the track titles or I only know the position in the tracklist. For this reason, I created a tool to provide an album tracklist. This function, `genius_tracklist()` takes the arguments `artist` and `album`. Simple enough, right? 93 | 94 | Let's get the tracklist for the original release of **DAMN.**. However, real Kendrick fans know that the album was intended to be listened to in chronological *and* reverse order—as is on the collector's release. 95 | 96 | 97 | ```{r} 98 | damn_tracks <- genius_tracklist(artist = "Kendrick Lamar", album = "DAMN.") 99 | 100 | # Collector's reverse order 101 | damn_tracks %>% 102 | arrange(-track_n) 103 | ``` 104 | 105 | 106 | -------------------------------------------------------------------------------- /R/genius_url.R: -------------------------------------------------------------------------------- 1 | if(getRversion() >= "2.15.1") { 2 | 3 | utils::globalVariables(c("type", "lyric", "line", "meta", 4 | "element_artist", "element", "track_title")) 5 | } 6 | 7 | #' Use Genius url to retrieve lyrics 8 | #' 9 | #' This function is used inside of the `genius_lyrics()` function. Given a url to a song on Genius, this function returns a tibble where each row is one line. 
Pair this function with `gen_song_url()` for easier access to song lyrics. 10 | #' 11 | #' @param url The url of song lyrics on Genius 12 | #' @param info Default \code{"title"}, returns the track title. Set to \code{"simple"} for only lyrics, \code{"artist"} for the lyrics and artist, \code{"features"} for song element and the artist of that element, \code{"all"} to return artist, track, line, lyric, element, and element artist. 13 | #' 14 | #' @examples 15 | #' \donttest{ 16 | #' #' genius_url("https://genius.com/Head-north-in-the-water-lyrics", info = "all") 17 | #' 18 | #' # url <- gen_song_url(artist = "Kendrick Lamar", song = "HUMBLE") 19 | #' 20 | #' # genius_url(url) 21 | #' 22 | #'} 23 | #' @export 24 | #' @importFrom rvest session html_nodes html_node html_text 25 | #' @importFrom tidyr pivot_wider fill separate replace_na 26 | #' @importFrom stringr str_detect str_extract str_replace_all str_trim 27 | #' @importFrom tibble tibble 28 | #' @importFrom dplyr mutate bind_rows case_when filter group_by ungroup n row_number 29 | #' @importFrom purrr pluck 30 | 31 | genius_url <- function(url, info = "title") { 32 | # create a new session for scraping lyrics 33 | # create a new session for scraping lyrics 34 | genius_session <- session(url) 35 | 36 | 37 | # Container classes are frequently changing 38 | # need to id class based on partial name matching 39 | # get the classes of all children of divs to pattern match properly 40 | class_names <- genius_session %>% 41 | rvest::html_elements("div") %>% 42 | rvest::html_children() %>% 43 | rvest::html_attr("class") %>% 44 | unique() %>% 45 | stats::na.omit() %>% 46 | stringr::str_split("[:space:]") %>% 47 | unlist() 48 | 49 | # fetch class names for song title artist and lyrics 50 | # will need to add `.` for all of them 51 | title_class <- class_names[stringr::str_detect(class_names, "SongHeader__Title")] 52 | artist_class <- class_names[stringr::str_detect(class_names, "SongHeader__Artist")] 53 | lyrics_class <- 
class_names[stringr::str_detect(class_names, "Lyrics__Container")] 54 | 55 | 56 | 57 | 58 | # Get Artist name 59 | artist <- html_nodes(genius_session, paste0(".", artist_class)) %>% 60 | html_text() %>% 61 | str_replace_all("\n", "") %>% 62 | str_trim() 63 | 64 | # Get Song title 65 | song_title <- html_nodes(genius_session, paste0(".", title_class)) %>% 66 | html_text() %>% 67 | str_replace_all("\n", "") %>% 68 | str_trim() 69 | 70 | # scrape the lyrics 71 | lyrics <- # read the text from the lyrics class 72 | # read the text from the lyrics class 73 | html_node(genius_session, paste0(".", lyrics_class)) %>% 74 | # trim white space 75 | html_text(trim = TRUE) %>% 76 | # use named vector for cleaning purposes 77 | str_replace_all(cleaning()) %>% 78 | strsplit(split = "\n") %>% 79 | purrr::pluck(1) %>% 80 | # filter to only rows with content 81 | .[str_detect(., "[[:alnum:]]")] %>% 82 | 83 | # trim whitespace 84 | str_trim() %>% 85 | 86 | # Convert to tibble 87 | tibble(artist = artist, 88 | track_title = song_title, 89 | lyric = .) %>% 90 | mutate(line = row_number()) %>% 91 | bind_rows(tibble(lyric = c("", "[]"))) %>% 92 | mutate(type = 93 | case_when( 94 | str_detect(lyric, "\\[|\\]") ~ "meta", 95 | TRUE ~ "lyric")) %>% 96 | pivot_wider(names_from = type, values_from = lyric) %>% 97 | 98 | #spread(key = type, value = lyric) 99 | dplyr::filter(!is.na(line)) %>% 100 | fill(meta, .direction = "down") %>% 101 | 102 | #remove producer info 103 | #filter(!str_detect(lyric, "[Pp]roducer")) %>% 104 | 105 | #remove brackets 106 | mutate(meta = str_extract(meta, "[^\\[].*[^\\]]")) %>% 107 | 108 | #make "element" and "artist" columns 109 | # sections of a song are called an element. 
Artists are responsible for each element 110 | separate(meta, into = c("element", "element_artist"), sep = ": ", fill = "right") %>% 111 | 112 | #if song has no features 113 | mutate(element_artist = replace_na(element_artist, artist[1])) %>% 114 | 115 | # filter out NA's from spreading meta 116 | # this will keep the meta if there are no following lyrics 117 | # this is helpful to keep track of instrumentals 118 | group_by(element) %>% 119 | 120 | # if there is only one line (meaning only element info) keep the NA, else drop 121 | filter(ifelse(is.na(lyric) & n() > 1, FALSE, TRUE)) %>% 122 | ungroup() %>% 123 | 124 | # create new line numbers in case they have been messed up 125 | mutate(line = row_number()) 126 | 127 | 128 | switch(info, 129 | simple = {return(select(lyrics, -artist, -track_title, -element, -element_artist))}, 130 | artist = {return(select(lyrics, -track_title, -element, -element_artist))}, 131 | title = {return(select(lyrics, -artist, -element, -element_artist))}, 132 | features = {return(select(lyrics, -artist, -track_title))}, 133 | all = return(lyrics) 134 | ) 135 | 136 | } 137 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | ## `genius` v2.2.3 2 | 3 | As illustrated in [PR #55](https://github.com/JosiahParry/genius/pull/55), all functionality of genius had been broken due to changes to the Genius website. 4 | 5 | This patch release 6 | 7 | - brings back full functionality of the package; 8 | - utilizes `pivot_wider()` instead of `spread()`; 9 | - removes `readr` as a dependency; 10 | - utilizes `session()` instead of `html_session()` from `{rvest}` since the latter has been superseded. 11 | 12 | ## `genius` v2.2.2 13 | 14 | May 9th, 2020 15 | 16 | * genius was taken off of CRAN due to my failure to address failing examples. Thank you to the CRAN team for providing thorough explanation and forewarning.
Thank you [\@Nicolas-Gallo](https://github.com/Nicolas-Gallo) for bringing this to my attention. This release closes [issue #48](https://github.com/JosiahParry/genius/issues/48). 17 | * The removal of genius from CRAN has had downstream consequences. Failure to update has led to the removal of spotifyr from CRAN as well due to the inclusion of genius in its `Imports`. This release will make it possible to publish spotifyr to CRAN again, addressing spotifyr [issue #112](https://github.com/charlie86/spotifyr/issues/112). 18 | * `add_genius()`'s `type` argument took two values, `"album"` and `"lyrics"`. This is logically inconsistent as we are specifying track titles, not the lyrics. This is an attempt to create more continuity between the data that is returned from `genius_lyrics()` and `genius_album()`. I've introduced the argument value `"track"`, which is to be preferred over `"lyrics"`. `"lyrics"` will remain a valid option to `type` for backward compatibility. 19 | * Version bumped to 2.2.2. 20 | 21 | ## `genius` v2.2.1 22 | 23 | Dec 16th, 2019 24 | 25 | * Created a Code of Conduct 26 | * Changed license from GPL-2 to MIT 27 | 28 | 29 | Dec 15th, 2019 30 | 31 | * `add_genius()` now unnests lyrics prior to joining. This ought to reduce the number of errors. 32 | * updates to `tidyr::unnest()` broke functions in many situations. Thank you to [\@eoppe1022](https://github.com/eoppe1022) for pointing out `tidyr::unnest_legacy()`. The legacy version will be used internally for now. 33 | * Thank you to [\@chris-billingham](https://github.com/chris-billingham) for noting issues with the `info = "title"` and `info = "all"` arguments of `genius_album()` and fixing these whilst also adding album_name into the output from `genius_album(..., info = "all")`. 34 | * Belated thank you to [\@eoppe1022](https://github.com/eoppe1022) for help with `prep_info()` in earlier releases. Evan has been added as a contributor.
35 | 36 | 37 | Nov 27th, 2019 38 | 39 | * Thank you to [\@mine-cetinkaya-rundel](https://github.com/mine-cetinkaya-rundel) for noting changes to `tidyr::unnest()` producing unwanted warnings in `add_genius()`. This has been fixed. 40 | 41 | 42 | ## `genius` v2.2.0 43 | 44 | May 5th, 2019 45 | 46 | * New package version was submitted to CRAN. 47 | * Interactive tutorial is now available in the package. 48 | * Run `learnr::run_tutorial("genius_tutorial", "genius")` to access the tutorial. 49 | 50 | May 4th, 2019 51 | 52 | * New functionality was added to `genius`. The function `calc_self_sim()` enables the user to create a self-similarity matrix. The default output is a _tidy_ data-frame that is ready for plotting. Additional arguments can be used to remove stop words. Stop word functionality is accessed via the [`tidytext`](https://github.com/juliasilge/tidytext) package. 53 | * `add_genius()` has been modified to be able to accept a column for the `type`. This will enable you to mix both single songs with entire albums. The `type_group` column has been renamed to `title` to be more coherent. This is a potentially breaking change to existing code. 54 | * Changes to `add_genius()` were checked for `spotifyr` reverse dependencies. Results returned 0 errors. All should be good. 55 | 56 | ---------- 57 | 58 | ## `genius` v2.1.0 59 | 60 | April 28th, 2019 61 | 62 | * Issue [27](https://github.com/JosiahParry/genius/issues/27) has been fixed. Thank you to [\@manadamoth](https://github.com/manandamoth) for bringing the issue to attention. `genius_album()` would fail when it encountered a url with missing lyrics. The solution was to create a safe version of the `genius_url()` function that is called from `genius_album()` instead of `genius_url()` directly. This means that the function will continue to work if a single track url doesn't. Those tracks will have an empty tibble (so after unnesting within `genius_album()` it returns a row of `NA`s).
63 | * I am still trying to figure out how to customize the warning message from `purrr::possibly()`. If anyone knows how to do this, please let me know via issue or twitter ([\@josiahparry](https://twitter.com/josiahparry)). 64 | 65 | 66 | ------------- 67 | 68 | ## `genius` v2.0.0 69 | April 10th, 2019 70 | 71 | The name of this package has been changed from `geniusR` to `genius` due to a name conflict on **CRAN**. 72 | 73 | This update makes some drastic changes to the base `genius_url()` function. It closes two long-standing issues, [#4](https://github.com/JosiahParry/genius/issues/4) and [#12](https://github.com/JosiahParry/genius/issues/12). 74 | 75 | Big thanks to [\@natebarr64](https://github.com/natebarr64) for his pull request [#20](https://github.com/JosiahParry/genius/pull/20) that fixed issue #4. This PR also created a new feature. 76 | 77 | @natebarr64 created the argument `info = "features"`, which will identify the song element and artist for that element if they are available. You can use this for `genius_url()`, `genius_lyrics()`, and `genius_album()`. 78 | 79 | -------------------------------------------------------------------------------- /paper/index.bib: -------------------------------------------------------------------------------- 1 | @Manual{R-base, 2 | title = {R: A Language and Environment for Statistical Computing}, 3 | author = {{R Core Team}}, 4 | organization = {R Foundation for Statistical Computing}, 5 | address = {Vienna, Austria}, 6 | year = {2019}, 7 | url = {https://www.R-project.org/}, 8 | } 9 | 10 | 11 | # MIR is mainly audio - https://link.springer.com/chapter/10.1007/978-3-642-11674-2_15 12 | @Inbook{Mayer2010, 13 | author="Mayer, Rudolf 14 | and Rauber, Andreas", 15 | editor="Ra{\'{s}}, Zbigniew W.
16 | and Wieczorkowska, Alicja A.", 17 | title="Multimodal Aspects of Music Retrieval: Audio, Song Lyrics -- and Beyond?", 18 | bookTitle="Advances in Music Information Retrieval", 19 | year="2010", 20 | publisher="Springer Berlin Heidelberg", 21 | address="Berlin, Heidelberg", 22 | pages="333--363", 23 | abstract="Music retrieval is predominantly seen as a problem to be tackled in the acoustic domain. With the exception of symbolic music retrieval and score-based systems, which form rather separate sub-disciplines on their own, most approaches to retrieve recordings of music by content rely on different features extracted from the audio signal. Music is subsequently retrieved by similarity matching, or classified into genre, instrumentation, artist or other categories. Yet, music is an inherently multimodal type of data. Apart from purely instrumental pieces, the lyrics associated with the music are as essential to the reception and the message of a song as is the audio. Albumcovers are carefully designed by artists to convey a message that is consistent with the message sent by the music on the album as well as by the image of a band in general. Music videos, fan sites and other sources of information add to that in a usually coherent manner. This paper takes a look at recent developments in multimodal analysis of music. It discusses different types of information sources available, stressing the multimodal character of music. It then reviews some features that may be extracted from those sources, focussing particularly on audio and lyrics as sources of information. Experimental results on different collections and categorisation tasks will round off the chapter. 
It shows the merits and open issues to be addressed to fully benefit from the rich and complex information space that music creates.", 24 | isbn="978-3-642-11674-2", 25 | doi="10.1007/978-3-642-11674-2_15", 26 | url="https://doi.org/10.1007/978-3-642-11674-2_15" 27 | } 28 | 29 | 30 | #https://www.tandfonline.com/doi/full/10.1080/02699931.2019.1700482?scroll=top&needAccess=true 31 | 32 | @article{vidas2019development, 33 | title={Development of emotion recognition in popular music and vocal bursts}, 34 | author={Vidas, Dianna and Calligeros, Renee and Nelson, Nicole L and Dingle, Genevieve A}, 35 | journal={Cognition and Emotion}, 36 | pages={1--14}, 37 | year={2019}, 38 | doi="10.1080/02699931.2019.1700482", 39 | publisher={Taylor \& Francis} 40 | } 41 | 42 | #https://www.researchgate.net/publication/336848309_Soundtracks_of_mobile_listening_youth's_daily_life_music_listening_on_mobile_phones 43 | 44 | @misc{soundtracks, 45 | author = {Baltazar, Margarida and Randall, Will M and Saarikallio, Suvi}, 46 | year = {2019}, 47 | month = {09}, 48 | pages = {}, 49 | title = {Soundtracks of mobile listening: youth's daily life music listening on mobile phones} 50 | } 51 | 52 | #https://www.researchgate.net/publication/336848388_Songs_perceived_as_relaxing_Musical_features_lyrics_and_contributing_mechanisms 53 | 54 | @misc{relaxing, 55 | author = {Baltazar, Margarida and Västfjäll, Daniel}, 56 | year = {2019}, 57 | month = {10}, 58 | pages = {}, 59 | title = {Songs perceived as relaxing: Musical features, lyrics, and contributing mechanisms} 60 | } 61 | 62 | @article{violence, 63 | title={Exposure to violent media: The effects of songs with violent lyrics on aggressive thoughts and feelings.}, 64 | author={Anderson, Craig A and Carnagey, Nicholas L and Eubanks, Janie}, 65 | journal={Journal of personality and social psychology}, 66 | volume={84}, 67 | number={5}, 68 | pages={960}, 69 | year={2003}, 70 | publisher={American Psychological Association} 71 | } 72 | 73 | 
@article{wellbeing, 74 | title={Message in the Music: Do Lyrics Influence Well-Being?}, 75 | author={Ransom, Patricia Fox}, 76 | year={2015} 77 | } 78 | 79 | #sentiment analysis of pop music - scraping from multiple places 80 | @article{napier, 81 | title={Quantitative Sentiment Analysis of Lyrics in Popular Music}, 82 | author={Napier, Kathleen and Shamir, Lior}, 83 | journal={Journal of Popular Music Studies}, 84 | volume={30}, 85 | number={4}, 86 | pages={161--176}, 87 | year={2018}, 88 | doi="10.1525/jpms.2018.300411", 89 | publisher={University of California Press Journals} 90 | } 91 | 92 | @Manual{spotifyr, 93 | title = {spotifyr: R Wrapper for the 'Spotify' Web API}, 94 | author = {Charlie Thompson and Josiah Parry and Donal Phipps and Tom Wolff}, 95 | year = {2019}, 96 | note = {R package version 2.1.1}, 97 | url = {https://CRAN.R-project.org/package=spotifyr}, 98 | } 99 | 100 | @article{tidytext, 101 | title={tidytext: Text Mining and Analysis Using Tidy Data Principles in R.}, 102 | author={Silge, Julia and Robinson, David}, 103 | journal={J. 
Open Source Software}, 104 | volume={1}, 105 | number={3}, 106 | pages={37}, 107 | year={2016}, 108 | doi="10.21105/joss.00037" 109 | } 110 | 111 | @article{tidydata, 112 | title={Tidy data}, 113 | author={Wickham, Hadley and others}, 114 | journal={Journal of Statistical Software}, 115 | volume={59}, 116 | number={10}, 117 | pages={1--23}, 118 | year={2014}, 119 | doi="10.18637/jss.v059.i10", 120 | publisher={Foundation for Open Access Statistics} 121 | } 122 | -------------------------------------------------------------------------------- /inst/tutorials/genius_tutorial/genius_tutorial.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "genius tutorial" 3 | output: learnr::tutorial 4 | runtime: shiny_prerendered 5 | --- 6 | 7 | ```{r setup, include=FALSE} 8 | library(learnr) 9 | library(genius) 10 | library(tidyverse) 11 | knitr::opts_chunk$set(echo = FALSE) 12 | ``` 13 | 14 | 15 | ## Introducing genius 16 | 17 | You want to start analysing song lyrics, where do you go? There have been music information retrieval papers written on the topic of programmatically extracting lyrics from the web. Dozens of people have gone through the laborious task of scraping song lyrics from websites. Even a recent winner of the Shiny competition scraped lyrics from Genius.com. 18 | 19 | I too have been there. Scraping websites is not always the best use of your time. `genius` is an R package that will enable you to programmatically download song lyrics in a tidy format ready for analysis. To begin using the package, it must first be installed and loaded. In addition to `genius`, we will need our standard data manipulation tools from the `tidyverse`.
20 | 21 | 22 | ```{r installing-genius, exercise=TRUE, exercise.lines = 5, eval=FALSE} 23 | install.packages("genius") 24 | ``` 25 | 26 | ```{r loading-genius, exercise=TRUE, exercise.lines = 3} 27 | library(genius) 28 | library(tidyverse) 29 | ``` 30 | 31 | 32 | ## Single song lyrics 33 | 34 | The simplest method of extracting song lyrics is to get just a single song at a time. This is done with the `genius_lyrics()` function. It takes two main arguments: `artist` and `song`. These are the quoted names of the artist and the song. Additionally there is a third argument `info` which determines what extra metadata you can get. The possible values are `title`, `simple`, `artist`, `features`, and `all`. I recommend trying them all to see how they work. 35 | 36 | In this example we will retrieve the song lyrics for the up-and-coming musician [Renny Conti](https://rennyconti.bandcamp.com). 37 | 38 | 39 | ```{r single-song, exercise=TRUE, exercise.lines = 5} 40 | floating <- genius_lyrics("renny conti", "people floating") 41 | floating 42 | ``` 43 | 44 | ## Album Lyrics 45 | 46 | Now that you have the intuition for obtaining lyrics for a single song, we can create a larger dataset for the lyrics of an entire album using `genius_album()`. Similar to `genius_lyrics()`, the arguments are `artist`, `album`, and `info`. 47 | 48 | The exercise below retrieves the lyrics for [Snail Mail's](https://www.snailmail.band/) album Lush. Try retrieving the lyrics for an album of your own choosing. 49 | 50 | ```{r genius_album, exercise=TRUE, exercise.lines = 5} 51 | lush <- genius_album("Snail Mail", "Lush") 52 | lush 53 | ``` 54 | 55 | ## Adding Lyrics to a data frame 56 | 57 | 58 | ### Multiple songs 59 | 60 | A common use for lyric analysis is to compare the lyrics of one artist to another. In order to do that, you could potentially retrieve the lyrics for multiple songs and albums and then join them together.
This has one major issue in my mind: it makes you create multiple objects that take up precious memory. For this reason, the function `add_genius()` was developed. It lets you create a tibble with a column for the artist's name and a column for the album or song title. `add_genius()` will then go through the entire tibble and add song lyrics for the tracks and albums that are available. 61 | 62 | Let's try this with a tibble of three songs. 63 | 64 | ```{r songs-add-genius, exercise = TRUE} 65 | three_songs <- tribble( 66 | ~ artist, ~ title, 67 | "Big Thief", "UFOF", 68 | "Andrew Bird", "Imitosis", 69 | "Sylvan Esso", "Slack Jaw" 70 | ) 71 | 72 | song_lyrics <- three_songs %>% 73 | add_genius(artist, title, type = "lyrics") 74 | 75 | song_lyrics %>% 76 | count(artist) 77 | 78 | ``` 79 | 80 | 81 | ### Multiple albums 82 | 83 | `add_genius()` also extends this functionality to albums. 84 | 85 | ```{r albums-add-genius, exercise = TRUE} 86 | albums <- tribble( 87 | ~ artist, ~ title, 88 | "Andrew Bird", "Armchair Apocrypha", 89 | "Andrew Bird", "Things are really great here sort of" 90 | ) 91 | 92 | album_lyrics <- albums %>% 93 | add_genius(artist, title, type = "album") 94 | 95 | album_lyrics 96 | ``` 97 | 98 | It is important to note that the warnings for this function are somewhat informative. When a 404 error occurs, it may be because the song does not exist on Genius, or because the song is actually an instrumental, which is the case here with Andrew Bird. 99 | 100 | 101 | ### Albums and Songs 102 | 103 | If you want to mix single songs and albums, you can supply a column with the type value of each row. The example below illustrates this. First, a tibble with artist, track or album title, and type columns is created. Next, the tibble is piped to `add_genius()` with the unquoted column names for the artist, title, and type columns. `add_genius()` then iterates over each row and fetches the appropriate song lyrics.
104 | 105 | ```{r mixed-add-genius, exercise=TRUE} 106 | song_album <- tribble( 107 | ~ artist, ~ title, ~ type, 108 | "Big Thief", "UFOF", "lyrics", 109 | "Andrew Bird", "Imitosis", "lyrics", 110 | "Sylvan Esso", "Slack Jaw", "lyrics", 111 | "Movements", "Feel Something", "album" 112 | ) 113 | 114 | mixed_lyrics <- song_album %>% 115 | add_genius(artist, title, type) 116 | ``` 117 | 118 | 119 | ## Self-similarity 120 | 121 | Another feature of `genius` is the ability to create self-similarity matrices to visualize lyrical patterns within a song. This idea was taken from Colin Morris' wonderful javascript based [Song Sim](https://colinmorris.github.io/SongSim/#/gallery) project. Colin explains the interpretation of a self-similarity matrix in their [TEDx talk](https://www.youtube.com/watch?v=_tjFwcmHy5M). An even better description of the interpretation is available in [this post](https://colinmorris.github.io/blog/weird-pop-songs). 122 | 123 | To use Colin's example we will look at the structure of Ke$ha's Tik Tok. 124 | 125 | The function `calc_self_sim()` will create a self-similarity matrix of a given song. The main arguments for this function are the tibble (`df`), and the column containing the lyrics (`lyric_col`). Ideally this is one line per observation as is default from the output of `genius_*()`. The tidy output compares every ith word with every word in the song. This measures repetition of words and will show us the structure of the lyrics. 
126 | 127 | ```{r song-self-sim, exercise=TRUE} 128 | tik_tok <- genius_lyrics("Ke$ha", "Tik Tok") 129 | 130 | tt_self_sim <- calc_self_sim(tik_tok, lyric, output = "tidy") 131 | 132 | tt_self_sim 133 | 134 | tt_self_sim %>% 135 | ggplot(aes(x = x_id, y = y_id, fill = identical)) + 136 | geom_tile() + 137 | scale_fill_manual(values = c("white", "black")) + 138 | theme_minimal() + 139 | theme(legend.position = "none", 140 | axis.text = element_blank()) + 141 | scale_y_continuous(trans = "reverse") + 142 | labs(title = "Tik Tok", subtitle = "Self-similarity matrix", x = "", y = "", 143 | caption = "The matrix displays that there are three choruses with a bridge between the last two. The bridge displays internal repetition.") 144 | ``` 145 | 146 | 147 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | ## NOTE: genius no longer under development 3 | 4 | 2021-10-31 5 | 6 | After quite some time I have decided to no longer maintain or support the `genius` package. While this package serves a very important purpose from the perspective of music information retrieval, it lies in a grey legal area—web scraping. 7 | 8 | Over the years genius.com has changed their web practices in such a way that makes it increasingly unreliable and difficult to scrape. 9 | 10 | Why not use the API (as many have asked)? Because song lyrics are owned by the musicians themselves and as such, they cannot be provided via their API. 11 | 12 | 13 | I will be removing this package from CRAN. 
14 | 15 | 16 | ------ 17 | 18 | 19 | 20 | [![CRAN status](https://www.r-pkg.org/badges/version/genius)](https://cran.r-project.org/package=genius) 21 | [![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/genius?color=d3a167)](https://r-pkg.org/pkg/genius) 22 | [![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing) 23 | 24 | This package was created to provide an easy method to access lyrics as text data using [Genius](https://genius.com). 25 | 26 | Access `genius` as an API or via Python [here](https://github.com/JosiahParry/genius-api). 27 | 28 | Installation 29 | ------------ 30 | 31 | This package is available from CRAN. 32 | 33 | ``` r 34 | install.packages("genius") 35 | ``` 36 | 37 | Load the package: 38 | 39 | ``` r 40 | library(genius) 41 | library(tidyverse) 42 | ``` 43 | 44 | Get started with a tutorial! 45 | 46 | ```r 47 | learnr::run_tutorial("genius_tutorial", "genius") 48 | ``` 49 | 50 | Getting Lyrics 51 | ============== 52 | 53 | Whole Albums 54 | ------------ 55 | 56 | `genius_album()` allows you to download the lyrics for an entire album in a `tidy` format. There are two arguments: `artist` and `album`. Supply the quoted names of the artist and the album. If this gives you issues, check that the artist and album names match those on [Genius](https://genius.com).
57 | 58 | This returns a tidy data frame with four columns: 59 | 60 | - `track_title`: track name 61 | - `track_n`: track number 62 | - `line` and `lyric`: line number within the track and the lyric text 63 | 64 | ``` r 65 | emotions_math <- genius_album(artist = "Margaret Glaspy", album = "Emotions and Math") 66 | ``` 67 | 68 | ## Joining, by = c("track_title", "track_n", "track_url") 69 | 70 | ``` r 71 | emotions_math 72 | ``` 73 | 74 | ## # A tibble: 370 x 4 75 | ## track_title track_n line lyric 76 | ## 77 | ## 1 Emotions And Math 1 1 Oh when I got you by my side 78 | ## 2 Emotions And Math 1 2 Everything's alright 79 | ## 3 Emotions And Math 1 3 Its just when your gone 80 | ## 4 Emotions And Math 1 4 I start to snooze the alarm 81 | ## 5 Emotions And Math 1 5 Cause I stay up until 4 in the morning 82 | ## 6 Emotions And Math 1 6 Counting all the days 'til you're back 83 | ## 7 Emotions And Math 1 7 Shivering in an ice cold bath 84 | ## 8 Emotions And Math 1 8 Of emotions and math 85 | ## 9 Emotions And Math 1 9 Oh it's a shame 86 | ## 10 Emotions And Math 1 10 And I'm to blame 87 | ## # … with 360 more rows 88 | 89 | Multiple Albums / Songs 90 | ----------------------- 91 | 92 | If you wish to download multiple albums from multiple artists, try to keep it tidy and avoid binding rows if you can. We can achieve this in a tidy workflow by creating a tibble with two columns: `artist` and `album` where each row is an artist and their album. We can then iterate over those columns with `add_genius()`. 93 | 94 | Pipe a data frame with a column for the artists and a column for the album or track titles into `add_genius()`. The argument `type` is used to indicate whether the data frame contains songs or albums. 95 | 96 | ``` r 97 | # Example with 2 different artists and albums 98 | artist_albums <- tribble( 99 | ~artist, ~album, 100 | "J.
Cole", "KOD", 101 | "Sampha", "Process" 102 | ) 103 | 104 | 105 | artist_albums %>% 106 | add_genius(artist, album) 107 | ``` 108 | 109 | ## Joining, by = c("track_title", "track_n", "track_url") 110 | ## Joining, by = c("track_title", "track_n", "track_url") 111 | 112 | ## Joining, by = c("artist", "album") 113 | 114 | ## # A tibble: 1,319 x 6 115 | ## artist album track_title track_n line lyric 116 | ## 117 | ## 1 J. Cole KOD Intro (KOD) 1 1 Can someone please turn off my … 118 | ## 2 J. Cole KOD Intro (KOD) 1 2 My thoughts are racing all the … 119 | ## 3 J. Cole KOD Intro (KOD) 1 3 There is no reason or no rhyme 120 | ## 4 J. Cole KOD Intro (KOD) 1 4 I'm trapped inside myself 121 | ## 5 J. Cole KOD Intro (KOD) 1 5 A newborn baby has two primary … 122 | ## 6 J. Cole KOD Intro (KOD) 1 6 "Laughter, which says, \"I love… 123 | ## 7 J. Cole KOD Intro (KOD) 1 7 "Or crying, which says, \"This … 124 | ## 8 J. Cole KOD Intro (KOD) 1 8 There are many ways to deal wit… 125 | ## 9 J. Cole KOD Intro (KOD) 1 9 Choose wisely 126 | ## 10 J. Cole KOD Intro (KOD) 1 10 At the bottom of the hourglass 127 | ## # … with 1,309 more rows 128 | 129 | This can be easily replicated with multiple songs as well. 130 | 131 | ``` r 132 | # Example with 2 different artists and songs 133 | artist_songs <- tribble( 134 | ~artist, ~track, 135 | "J. Cole", "Motiv8", 136 | "Andrew Bird", "Anonanimal" 137 | ) 138 | 139 | artist_songs %>% 140 | add_genius(artist, track, type = "lyrics") 141 | ``` 142 | 143 | ## Joining, by = c("artist", "track") 144 | 145 | ## # A tibble: 102 x 5 146 | ## artist track track_title line lyric 147 | ## 148 | ## 1 J. Cole Motiv8 Motiv8 1 You really wanna know who Superman is? 149 | ## 2 J. Cole Motiv8 Motiv8 2 Watch this, pow! 150 | ## 3 J. Cole Motiv8 Motiv8 3 I like him 151 | ## 4 J. Cole Motiv8 Motiv8 4 I think he's pretty cool 152 | ## 5 J. Cole Motiv8 Motiv8 5 He's my idol 153 | ## 6 J. Cole Motiv8 Motiv8 6 I can't have no sympathy for fuck… 154 | ## 7 J. 
Cole Motiv8 Motiv8 7 All this shit I've seen done made my b… 155 | ## 8 J. Cole Motiv8 Motiv8 8 Spill promethazine inside a double cup 156 | ## 9 J. Cole Motiv8 Motiv8 9 Double up my cream, now that's a Doubl… 157 | ## 10 J. Cole Motiv8 Motiv8 10 Please don't hit my phone if it ain't … 158 | ## # … with 92 more rows 159 | 160 | Song Lyrics 161 | ----------- 162 | 163 | ### `genius_lyrics()` 164 | 165 | If you want only a single song, you can use `genius_lyrics()`. Supply an artist and a song title as character strings, and voila. 166 | 167 | ``` r 168 | memory_street <- genius_lyrics(artist = "Margaret Glaspy", song = "Memory Street") 169 | 170 | memory_street 171 | ``` 172 | 173 | ## # A tibble: 27 x 3 174 | ## track_title line lyric 175 | ## 176 | ## 1 Memory Street 1 Ring the alarm 177 | ## 2 Memory Street 2 I'm on memory street 178 | ## 3 Memory Street 3 With him on my arm 179 | ## 4 Memory Street 4 And my feet on the dash of that car 180 | ## 5 Memory Street 5 I don't dare 181 | ## 6 Memory Street 6 Walk down memory street 182 | ## 7 Memory Street 7 Why remember 183 | ## 8 Memory Street 8 All the times I took forever to forget? 184 | ## 9 Memory Street 9 Call the guards 185 | ## 10 Memory Street 10 I'm at the gates 186 | ## # … with 17 more rows 187 | 188 | This returns a `tibble` with three columns: `track_title`, `line`, and `lyric`. You can control the amount of information returned with the `info` argument. 189 | 190 | - `info = "title"` (default): Return the lyrics, line number, and song title. 191 | - `info = "simple"`: Return just the lyrics and line number. 192 | - `info = "artist"`: Return the lyrics, line number, and artist. 193 | - `info = "features"`: Return the lyrics, line number, artist, verse, and vocalist if available. 194 | - `info = "all"`: Return lyrics, line number, song title, artist.
195 | 196 | Tracklists 197 | ---------- 198 | 199 | `genius_tracklist()`, given an `artist` and an `album`, will return a barebones `tibble` with the track title, track number, and the url to the lyrics. 200 | 201 | ``` r 202 | genius_tracklist(artist = "Basement", album = "Colourmeinkindness") 203 | ``` 204 | 205 | ## # A tibble: 10 x 3 206 | ## track_title track_n track_url 207 | ## 208 | ## 1 Whole 1 https://genius.com/Basement-whole-lyrics 209 | ## 2 Covet 2 https://genius.com/Basement-covet-lyrics 210 | ## 3 Spoiled 3 https://genius.com/Basement-spoiled-lyrics 211 | ## 4 Pine 4 https://genius.com/Basement-pine-lyrics 212 | ## 5 Bad Apple 5 https://genius.com/Basement-bad-apple-lyrics 213 | ## 6 Breathe 6 https://genius.com/Basement-breathe-lyrics 214 | ## 7 Control 7 https://genius.com/Basement-control-lyrics 215 | ## 8 Black 8 https://genius.com/Basement-black-lyrics 216 | ## 9 Comfort 9 https://genius.com/Basement-comfort-lyrics 217 | ## 10 Wish 10 https://genius.com/Basement-wish-lyrics 218 | 219 | ------------------------------------------------------------------------ 220 | 221 | Nitty Gritty 222 | ------------ 223 | 224 | `genius_lyrics()` generates a url via `gen_song_url()` to Genius which is fed to `genius_url()`, the function that does the heavy lifting of actually fetching lyrics. 225 | 226 | I have not figured out all of the patterns that are used for generating the Genius.com urls, so errors are bound to happen. If `genius_lyrics()` returns an error, try using `genius_tracklist()` and `genius_url()` together to get the song lyrics. 227 | 228 | For example, say "(No One Knows Me) Like the Piano" by *Sampha* wasn't working in a standard `genius_lyrics()` call. 229 | 230 | ``` r 231 | piano <- genius_lyrics("Sampha", "(No One Knows Me) Like the Piano") 232 | ``` 233 | 234 | We could grab the tracklist for the album *Process* which the song is from.
We could then isolate the url for *(No One Knows Me) Like the Piano* and feed that into `genius_url()`. 235 | 236 | ``` r 237 | # Get the tracklist for the album 238 | process <- genius_tracklist("Sampha", "Process") 239 | 240 | # Filter down to find the individual song 241 | piano_info <- process %>% 242 | filter(track_title == "(No One Knows Me) Like the Piano") 243 | 244 | # Filter song using string detection 245 | # process %>% 246 | # filter(stringr::str_detect(track_title, stringr::coll("Like the piano", ignore_case = TRUE))) 247 | 248 | piano_url <- piano_info$track_url 249 | ``` 250 | 251 | Now that we have the url, feed it into `genius_url()`. 252 | 253 | ``` r 254 | genius_url(piano_url, info = "simple") 255 | ``` 256 | 257 | ## # A tibble: 12 x 2 258 | ## line lyric 259 | ## 260 | ## 1 1 No one knows me like the piano in my mother's home 261 | ## 2 2 You would show me I had something some people call a soul 262 | ## 3 3 And you dropped out the sky, oh you arrived when I was three year… 263 | ## 4 4 No one knows me like the piano in my mother's home 264 | ## 5 5 You know I left, I flew the nest 265 | ## 6 6 And you know I won't be long 266 | ## 7 7 And in my chest you know me best 267 | ## 8 8 And you know I'll be back home 268 | ## 9 9 An angel by her side, all of the times I knew we couldn't cope 269 | ## 10 10 They said that it's her time, no tears in sight, I kept the feeli… 270 | ## 11 11 And you took hold of me and never, never, never let me go'Cause n… 271 | ## 12 12 In my mother's home 272 | 273 | ------------------------------------------------------------------------ 274 | 275 | On the Internals 276 | ================ 277 | 278 | Generative functions 279 | -------------------- 280 | 281 | This package works almost entirely on pattern detection. The urls from *Genius* are (mostly) easily reproducible (shout out to [Angela Li](https://twitter.com/CivicAngela) for pointing this out).
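
As a rough illustration of that kind of pattern, here is a minimal base R sketch. It is not the package's implementation: `sketch_song_url()` is a hypothetical helper, and the rule it applies (drop punctuation, join words with hyphens, append `lyrics`) is an assumption that only approximates how Genius builds song urls.

```r
# Hypothetical helper: an assumed approximation of the Genius url pattern,
# not the code behind gen_song_url()
sketch_song_url <- function(artist, song) {
  slug <- paste(artist, song, "lyrics")
  slug <- gsub("[^[:alnum:] ]", "", slug)  # drop punctuation (assumed rule)
  slug <- gsub(" +", "-", trimws(slug))    # join the words with hyphens
  paste0("https://genius.com/", slug)
}

sketch_song_url("Laura Marling", "Soothing")
#> [1] "https://genius.com/Laura-Marling-Soothing-lyrics"
```

Titles with unusual punctuation or characters are the cases where a simple rule like this is most likely to produce a url that does not exist.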
282 | 283 | The two functions that generate urls are `gen_song_url()` and `gen_album_url()`. To see how the functions work, try feeding an artist and song title to `gen_song_url()` and an artist and album title to `gen_album_url()`. 284 | 285 | ``` r 286 | gen_song_url("Laura Marling", "Soothing") 287 | ``` 288 | 289 | ## [1] "https://genius.com/Laura-Marling-Soothing-lyrics" 290 | 291 | ``` r 292 | gen_album_url("Daniel Caesar", "Freudian") 293 | ``` 294 | 295 | ## [1] "https://genius.com/albums/Daniel-Caesar/Freudian" 296 | 297 | `genius_lyrics()` calls `gen_song_url()` and feeds the output to `genius_url()` which performs the scraping. 298 | 299 | Getting lyrics for albums is slightly more involved. It first calls `genius_tracklist()`, which calls `gen_album_url()` and then uses the handy package `rvest` to scrape the song titles, track numbers, and song lyric urls. Next, the song urls from the output are iterated over and fed to `genius_url()`. 300 | 301 | ### Notes: 302 | 303 | As this is my first *"package"* there will be many issues. Please submit an issue and I will do my best to attend to it. 304 | 305 | There are already issues of which I am aware (the lack of error handling). If you would like to take those on, please go ahead and make a pull request. Please contact me on [Twitter](https://twitter.com/josiahparry). 306 | -------------------------------------------------------------------------------- /inst/tutorials/genius_tutorial/genius_tutorial.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | genius tutorial 17 | 18 | 19 | 20 | 21 | 26 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
43 |
44 | 45 |
46 | 47 |
48 |

Introducing genius

49 |

You want to start analysing song lyrics, where do you go? There have been music information retrieval papers written on the topic of programmatically extracting lyrics from the web. Dozens of people have gone through the laborious task of scraping song lyrics from websites. Even a recent winner of the Shiny competition scraped lyrics from Genius.com.

50 |

I too have been there. Scraping websites is not always the best use of your time. genius is an R package that will enable you to programmatically download song lyrics in a tidy format ready for analysis. To begin using the package, it must first be installed and loaded. In addition to genius, we will need our standard data manipulation tools from the tidyverse.

51 |
52 |
install.packages("genius")
53 | 54 |
55 |
56 |
library(genius)
 57 | library(tidyverse)
58 | 59 |
60 |
61 |
62 |

Single song lyrics

63 |

The simplest method of extracting song lyrics is to get just a single song at a time. This is done with the genius_lyrics() function. It takes two main arguments: artist and song. These are the quoted names of the artist and the song. Additionally there is a third argument info which determines what extra metadata you can get. The possible values are title, simple, artist, features, and all. I recommend trying them all to see how they work.

64 |

In this example we will retrieve the song lyrics for the up-and-coming musician Renny Conti.

65 |
66 |
floating <- genius_lyrics("renny conti", "people floating")
 67 | floating
68 | 69 |
70 |
71 |
72 |

Album Lyrics

73 |

Now that you have the intuition for obtaining lyrics for a single song, we can create a larger dataset for the lyrics of an entire album using genius_album(). Similar to genius_lyrics(), the arguments are artist, album, and info.

74 |

The exercise below retrieves the lyrics for Snail Mail’s album Lush. Try retrieving the lyrics for an album of your own choosing.

75 |
76 |
lush <- genius_album("Snail Mail", "Lush")
 77 | lush
78 | 79 |
80 |
81 |
82 |

Adding Lyrics to a data frame

83 |
84 |

Multiple songs

85 |

A common use for lyric analysis is to compare the lyrics of one artist to another. To do that, you could retrieve the lyrics for multiple songs and albums and then join them together. In my mind this has one major issue: it makes you create multiple objects that take up precious memory. For this reason, the function add_genius() was developed. It lets you create a tibble with a column for the artist’s name and a column for the album or song title. add_genius() will then go through the entire tibble and add song lyrics for the tracks and albums that are available.

86 |

Let’s try this with a tibble of three songs.

87 |
88 |
three_songs <- tribble(
 89 |   ~ artist, ~ title,
 90 |   "Big Thief", "UFOF",
 91 |   "Andrew Bird", "Imitosis",
 92 |   "Sylvan Esso", "Slack Jaw"
 93 | )
 94 | 
 95 | song_lyrics <- three_songs %>% 
 96 |   add_genius(artist, title, type = "lyrics")
 97 | 
 98 | song_lyrics %>% 
 99 |   count(artist)
100 | 101 |
102 |
103 |
104 |

Multiple albums

105 |

add_genius() also extends this functionality to albums.

106 |
107 |
albums <- tribble(
108 |   ~ artist, ~ title,
109 |   "Andrew Bird", "Armchair Apocrypha",
110 |   "Andrew Bird", "Things are really great here sort of"
111 | )
112 | 
113 | album_lyrics <- albums %>% 
114 |   add_genius(artist, title, type = "album")
115 | 
116 | album_lyrics
117 | 118 |
119 |

It is important to note that the warnings for this function are somewhat informative. When a 404 error occurs, it may be because the song does not exist on Genius, or because the song is actually an instrumental, which is the case here with Andrew Bird.
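
To illustrate how a single failing track can be tolerated while the rest are still collected, here is a minimal base R sketch. It is not how `add_genius()` is implemented: `fake_fetch()` and `safe_fetch()` are hypothetical helpers, the lyric lines are made up, and no request is sent to Genius.

```r
# Hypothetical stand-in for a lyric fetcher. It fails for titles it does not
# know, much like a request that ends in a 404. The lyric lines are made up.
fake_fetch <- function(title) {
  known <- list("Imitosis" = c("first made-up line", "second made-up line"))
  if (!title %in% names(known)) stop("HTTP 404 for ", title)
  data.frame(
    title = title,
    line = seq_along(known[[title]]),
    lyric = known[[title]],
    stringsAsFactors = FALSE
  )
}

# Wrap the fetcher so an error becomes a warning plus a single row of NAs
safe_fetch <- function(title) {
  tryCatch(
    fake_fetch(title),
    error = function(e) {
      warning("Skipping '", title, "': ", conditionMessage(e), call. = FALSE)
      data.frame(
        title = title,
        line = NA_integer_,
        lyric = NA_character_,
        stringsAsFactors = FALSE
      )
    }
  )
}

titles <- c("Imitosis", "Some Instrumental")
results <- suppressWarnings(lapply(titles, safe_fetch))
lyrics <- do.call(rbind, results)
lyrics
```

The failed title stays in the result as a row of `NA`s, so it remains visible in later counts or joins instead of silently disappearing.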

120 |
121 |
122 |

Albums and Songs

123 |

If you want to mix single songs and albums, you can supply a column with the type value of each row. The example below illustrates this. First, a tibble with artist, track or album title, and type columns is created. Next, the tibble is piped to add_genius() with the unquoted column names for the artist, title, and type columns. add_genius() then iterates over each row and fetches the appropriate song lyrics.

124 |
125 |
song_album <- tribble(
126 |   ~ artist, ~ title, ~ type,
127 |   "Big Thief", "UFOF", "lyrics",
128 |   "Andrew Bird", "Imitosis", "lyrics",
129 |   "Sylvan Esso", "Slack Jaw", "lyrics",
130 |   "Movements", "Feel Something", "album"
131 | )
132 | 
133 | mixed_lyrics <- song_album %>% 
134 |   add_genius(artist, title, type)
135 | 136 |
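
The row-by-row behaviour described above can be pictured with a small base R sketch. This is an assumption about the general idea rather than the code inside `add_genius()`: `fetch_song()`, `fetch_album()`, and `dispatch_row()` are hypothetical helpers that only report what they would fetch.

```r
# Two rows from the tutorial's table, one of each type
song_album <- data.frame(
  artist = c("Big Thief", "Movements"),
  title = c("UFOF", "Feel Something"),
  type = c("lyrics", "album"),
  stringsAsFactors = FALSE
)

# Hypothetical handlers: they describe the request instead of making it
fetch_song <- function(artist, title) paste("song:", artist, "-", title)
fetch_album <- function(artist, title) paste("album:", artist, "-", title)

# Choose a handler from the value in the type column
dispatch_row <- function(artist, title, type) {
  switch(type,
    lyrics = fetch_song(artist, title),
    album = fetch_album(artist, title),
    stop("unknown type: ", type)
  )
}

# Map() walks the three columns in parallel, one row at a time
plan <- unlist(
  Map(dispatch_row, song_album$artist, song_album$title, song_album$type),
  use.names = FALSE
)
plan
```

Keeping the type in a column means each row carries its own instruction, so songs and albums can sit in the same table.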
137 |
138 |
139 |
140 |

Self-similarity

141 |

Another feature of genius is the ability to create self-similarity matrices to visualize lyrical patterns within a song. This idea was taken from Colin Morris’ wonderful javascript based Song Sim project. Colin explains the interpretation of a self-similarity matrix in their TEDx talk. An even better description of the interpretation is available in this post.

142 |

To use Colin’s example we will look at the structure of Ke$ha’s Tik Tok.

143 |

The function calc_self_sim() will create a self-similarity matrix of a given song. The main arguments for this function are the tibble (df), and the column containing the lyrics (lyric_col). Ideally this is one line per observation as is default from the output of genius_*(). The tidy output compares every ith word with every word in the song. This measures repetition of words and will show us the structure of the lyrics.

144 |
145 |
tik_tok <- genius_lyrics("Ke$ha", "Tik Tok")
146 | 
147 | tt_self_sim <- calc_self_sim(tik_tok, lyric, output = "tidy")
148 | 
149 | tt_self_sim
150 | 
151 | tt_self_sim %>% 
152 |   ggplot(aes(x = x_id, y = y_id, fill = identical)) +
153 |   geom_tile() +
154 |   scale_fill_manual(values = c("white", "black")) +
155 |   theme_minimal() +
156 |   theme(legend.position = "none",
157 |         axis.text = element_blank()) +
158 |   scale_y_continuous(trans = "reverse") +
159 |   labs(title = "Tik Tok", subtitle = "Self-similarity matrix", x = "", y = "", 
160 |        caption = "The matrix displays that there are three choruses with a bridge between the last two. The bridge displays internal repetition.")
161 | 162 |
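
The word-by-word comparison described above can be illustrated without the package. The following is a minimal base R sketch, not the implementation of `calc_self_sim()`: the two-line lyric is made up, the tokenizer is a simplifying assumption, and the column names `x_id`, `y_id`, and `identical` are borrowed from the plotting code above only for familiarity.

```r
# Made-up lyric lines; nothing is fetched from Genius
lyric <- c("tick tock on the clock", "the party do not stop")

# Split into lower-case words (assumed, simplified tokenizer)
words <- unlist(strsplit(tolower(lyric), "[^a-z']+"))
words <- words[nzchar(words)]
n <- length(words)

# n x n logical matrix: TRUE where word i and word j are the same
sim <- outer(words, words, "==")

# Tidy form: one row per (i, j) pair. as.vector() reads the matrix column by
# column, so the row index changes fastest.
tidy_sim <- data.frame(
  x_id = rep(seq_len(n), times = n),
  y_id = rep(seq_len(n), each = n),
  identical = as.vector(sim)
)

head(tidy_sim)
```

The diagonal is always `TRUE` because every word matches itself. Repeated words show up as off-diagonal `TRUE` cells, which is what produces the block patterns in the plot.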
--------------------------------------------------------------------------------