├── .github
    ├── .gitignore
    └── workflows
    │   ├── pkgdown.yaml
    │   └── R-CMD-check.yaml
├── vignettes
    ├── .gitignore
    ├── figs
    │   └── tweet.png
    ├── bibliography.bib
    ├── cricinfo.Rmd
    ├── cricketdata_R_pkg.Rmd
    └── cricsheet.Rmd
├── R
    ├── sysdata.rda
    ├── cricketdata.R
    ├── data.R
    ├── fetch_cricinfo.R
    ├── clean_fielding_data.R
    ├── find_player_id.R
    ├── fetch_player_meta.R
    ├── fetch_cricket_data.R
    ├── update_player_meta.R
    ├── clean_batting_data.R
    ├── clean_bowling_data.R
    ├── countries.R
    ├── fetch_player.R
    └── fetch_cricsheet.R
├── data
    ├── player_meta.rda
    └── cricsheet_codes.rda
├── inst
    └── extdata
    │   ├── wt20.rds
    │   ├── ipl_bbb.rds
    │   ├── menODI.rds
    │   ├── MegLanning.rds
    │   ├── aus_women.rds
    │   ├── usmenODI.rds
    │   ├── wbbl_bbb.rds
    │   ├── Indfielding.rds
    │   ├── meg_lanning_id.rds
    │   └── wbbl_match_info.rds
├── .gitignore
├── man
    ├── figures
    │   ├── cricketdata.png
    │   ├── README-menodi-1.png
    │   ├── README-woment20-1.png
    │   ├── README-meglanning-1.png
    │   ├── README-indiafielding-1.png
    │   ├── README-unnamed-chunk-2-1.png
    │   ├── README-unnamed-chunk-4-1.png
    │   ├── README-unnamed-chunk-5-1.png
    │   ├── README-unnamed-chunk-6-1.png
    │   └── README-unnamed-chunk-7-1.png
    ├── cricsheet_codes.Rd
    ├── player_meta.Rd
    ├── find_player_id.Rd
    ├── update_player_meta.Rd
    ├── fetch_cricinfo.Rd
    ├── cricketdata-package.Rd
    ├── fetch_player_meta.Rd
    ├── fetch_cricsheet.Rd
    └── fetch_player_data.Rd
├── .Rbuildignore
├── pkgdown
    └── extra.css
├── cran-comments.md
├── cricketdata.Rproj
├── _pkgdown.yml
├── NAMESPACE
├── NEWS.md
├── DESCRIPTION
├── README.md
├── README.Rmd
└── data-raw
    └── make_player_meta.R


/.github/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | 
--------------------------------------------------------------------------------
/vignettes/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 | *_files/*
4 | *_cache/*
5 | 
--------------------------------------------------------------------------------
/R/sysdata.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/R/sysdata.rda
--------------------------------------------------------------------------------
/data/player_meta.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/data/player_meta.rda
--------------------------------------------------------------------------------
/inst/extdata/wt20.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wt20.rds
--------------------------------------------------------------------------------
/data/cricsheet_codes.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/data/cricsheet_codes.rda
--------------------------------------------------------------------------------
/inst/extdata/ipl_bbb.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/ipl_bbb.rds
--------------------------------------------------------------------------------
/inst/extdata/menODI.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/menODI.rds
--------------------------------------------------------------------------------
/vignettes/figs/tweet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/vignettes/figs/tweet.png
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | README_cache
6 | README_files
7 | docs
8 | 
--------------------------------------------------------------------------------
/inst/extdata/MegLanning.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/MegLanning.rds
--------------------------------------------------------------------------------
/inst/extdata/aus_women.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/aus_women.rds
--------------------------------------------------------------------------------
/inst/extdata/usmenODI.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/usmenODI.rds
--------------------------------------------------------------------------------
/inst/extdata/wbbl_bbb.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wbbl_bbb.rds
--------------------------------------------------------------------------------
/man/figures/cricketdata.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/cricketdata.png
--------------------------------------------------------------------------------
/inst/extdata/Indfielding.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/Indfielding.rds
--------------------------------------------------------------------------------
/inst/extdata/meg_lanning_id.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/meg_lanning_id.rds
--------------------------------------------------------------------------------
/inst/extdata/wbbl_match_info.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wbbl_match_info.rds
--------------------------------------------------------------------------------
/man/figures/README-menodi-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-menodi-1.png
--------------------------------------------------------------------------------
/man/figures/README-woment20-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-woment20-1.png
--------------------------------------------------------------------------------
/man/figures/README-meglanning-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-meglanning-1.png
--------------------------------------------------------------------------------
/man/figures/README-indiafielding-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-indiafielding-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-2-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-2-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-4-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-4-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-5-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-5-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-6-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-6-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-7-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-7-1.png
--------------------------------------------------------------------------------
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^.*\.Rproj$
2 | ^\.Rproj\.user$
3 | ^README\.Rmd$
4 | ^README_*
5 | ^_pkgdown\.yml$
6 | ^docs$
7 | ^pkgdown$
8 | ^\.github$
9 | ^cran-comments\.md$
10 | ^CRAN-SUBMISSION$
11 | ^data-raw$
12 | ^.*_cache$
13 | ^pkgdown$
14 | 
--------------------------------------------------------------------------------
/R/cricketdata.R:
--------------------------------------------------------------------------------
1 | #' @importFrom dplyr pull
2 | #' @importFrom readr read_csv
3 | #' @importFrom tidyr separate pivot_wider
4 | #' @importFrom utils download.file read.csv unzip
5 | #' @importFrom tibble tibble as_tibble
6 | #'
7 | 
8 | #' @keywords internal
9 | "_PACKAGE"
10 | 
11 | ## usethis namespace: start
12 | ## usethis namespace: end
13 | NULL
14 | 
--------------------------------------------------------------------------------
/pkgdown/extra.css:
--------------------------------------------------------------------------------
1 | h1, .h1 {
2 |   font-size: 2rem;
3 |   font-weight: 700;
4 | }
5 | 
6 | h2, .h2 {
7 |   font-size: 1.5rem;
8 |   font-weight: 700;
9 | }
10 | 
11 | .bg-primary .navbar-nav .show>.nav-link, .bg-primary .navbar-nav .nav-link.active, .bg-primary .navbar-nav .nav-link:hover, .bg-primary .navbar-nav .nav-link:focus {
12 |   color: #ffb81c !important;
13 | }
14 | 
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Test environments
2 | 
3 | * ubuntu 24.04 (local): R 4.4.3
4 | * macOS (on GitHub Actions): release
5 | * windows (on GitHub Actions): release
6 | * ubuntu 22.04.3 (on GitHub Actions): devel, release, oldrel
7 | * win-builder: devel, release, oldrelease
8 | 
9 | ## R CMD check results
10 | 
11 | I'm getting 403 errors on several URLs. They are correct and work when accessed in a browser.
12 | 
--------------------------------------------------------------------------------
/cricketdata.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 | 
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 | 
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
15 | BuildType: Package
16 | PackageUseDevtools: Yes
17 | PackageInstallArgs: --no-multiarch --with-keep.source
18 | PackageRoxygenize: rd,collate,namespace
19 | 
--------------------------------------------------------------------------------
/_pkgdown.yml:
--------------------------------------------------------------------------------
1 | url: https://pkg.robjhyndman.com/cricketdata/
2 | template:
3 |   bootstrap: 5
4 |   theme: tango
5 |   bootswatch: flatly
6 |   bslib:
7 |     base_font: {google: "Fira Sans"}
8 |     heading_font: {google: "Fira Sans"}
9 |     code_font: "Hack, mono"
10 |     primary: "#234460"
11 |     link-color: "#234460"
12 |   includes:
13 |     in_header:
14 | 
15 | navbar:
16 |   type: light
17 | 
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 | 
3 | export(fetch_cricinfo)
4 | export(fetch_cricsheet)
5 | export(fetch_player_data)
6 | export(fetch_player_meta)
7 | export(find_player_id)
8 | export(update_player_meta)
9 | importFrom(dplyr,pull)
10 | importFrom(readr,read_csv)
11 | importFrom(tibble,as_tibble)
12 | importFrom(tibble,tibble)
13 | importFrom(tidyr,pivot_wider)
14 | importFrom(tidyr,separate)
15 | importFrom(utils,download.file)
16 | importFrom(utils,read.csv)
17 | importFrom(utils,unzip)
18 | 
--------------------------------------------------------------------------------
/man/cricsheet_codes.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.R
3 | \docType{data}
4 | \name{cricsheet_codes}
5 | \alias{cricsheet_codes}
6 | \title{Codes used for competitions on Cricsheet}
7 | \format{
8 | A data frame with 44 rows and 2 variables.
9 | }
10 | \source{
11 | \url{https://cricsheet.org/downloads/#experimental}
12 | }
13 | \usage{
14 | cricsheet_codes
15 | }
16 | \description{
17 | A dataset containing the names and codes used by
18 | \href{https://cricsheet.org}{cricsheet}, as at 24 March 2025.
19 | }
20 | \keyword{datasets}
21 | 
--------------------------------------------------------------------------------
/man/player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.R
3 | \docType{data}
4 | \name{player_meta}
5 | \alias{player_meta}
6 | \title{Meta data on players listed at ESPNCricinfo}
7 | \format{
8 | A data frame with 16101 rows and 11 variables.
9 | }
10 | \source{
11 | \url{https://www.espncricinfo.com}
12 | }
13 | \usage{
14 | player_meta
15 | }
16 | \description{
17 | A dataset containing the names and other attributes of players who appear
18 | on both \href{https://cricsheet.org}{cricsheet} and
19 | \href{https://www.espncricinfo.com}{ESPNCricinfo}, as at 24 March 2025.
20 | }
21 | \keyword{datasets}
22 | 
--------------------------------------------------------------------------------
/R/data.R:
--------------------------------------------------------------------------------
1 | #' Meta data on players listed at ESPNCricinfo
2 | #'
3 | #' A dataset containing the names and other attributes of players who appear
4 | #' on both [cricsheet](https://cricsheet.org) and
5 | #' [ESPNCricinfo](https://www.espncricinfo.com), as at 24 March 2025.
6 | #'
7 | #' @format A data frame with 16101 rows and 11 variables.
8 | #' @source \url{https://www.espncricinfo.com}
9 | "player_meta"
10 | 
11 | #' Codes used for competitions on Cricsheet
12 | #'
13 | #' A dataset containing the names and codes used by
14 | #' [cricsheet](https://cricsheet.org), as at 24 March 2025.
15 | #'
16 | #' @format A data frame with 44 rows and 2 variables.
17 | #' @source \url{https://cricsheet.org/downloads/#experimental}
18 | "cricsheet_codes"
19 | 
--------------------------------------------------------------------------------
/man/find_player_id.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/find_player_id.R
3 | \name{find_player_id}
4 | \alias{find_player_id}
5 | \title{Find a player id from cricinfo.com}
6 | \usage{
7 | find_player_id(searchstring)
8 | }
9 | \arguments{
10 | \item{searchstring}{Part of a player name(s) to search for. Can be a character vector.}
11 | }
12 | \value{
13 | A table of matching players, their ids, and teams they played for.
14 | }
15 | \description{
16 | Find a player id from cricinfo.com
17 | }
18 | \examples{
19 | \dontrun{
20 | (perry <- find_player_id("Perry"))
21 | EllysePerry <- fetch_player_data(perry[2, "ID"], "test")
22 | }
23 | }
24 | \seealso{
25 | \code{\link[=fetch_player_data]{fetch_player_data()}} to download playing statistics for
26 | a player, and \code{\link[=fetch_player_meta]{fetch_player_meta()}} to download meta data on players.
27 | }
28 | \author{
29 | Rob J Hyndman
30 | }
31 | 
--------------------------------------------------------------------------------
/man/update_player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/update_player_meta.R
3 | \name{update_player_meta}
4 | \alias{update_player_meta}
5 | \title{Update player_meta}
6 | \usage{
7 | update_player_meta(start_again = FALSE)
8 | }
9 | \arguments{
10 | \item{start_again}{If TRUE, downloads all data from ESPNCricinfo without
11 | using player_meta as a starting point. This can take a long time.}
12 | }
13 | \value{
14 | A tibble containing meta data on cricket players.
15 | }
16 | \description{
17 | The \link{player_meta} data set contains the names and other
18 | attributes of players who appear on both \href{https://cricsheet.org}{cricsheet}
19 | and \href{https://www.espncricinfo.com}{ESPNCricinfo} as at 24 March 2025.
20 | This function returns an updated version of the data set based on information
21 | currently available online.
22 | }
23 | \examples{
24 | \dontrun{
25 | # Update data to current
26 | new_player_meta <- update_player_meta()
27 | }
28 | }
29 | \seealso{
30 | \link{player_meta}, \code{\link[=fetch_player_meta]{fetch_player_meta()}}.
31 | }
32 | \author{
33 | Hassan Rafique and Rob J Hyndman
34 | }
35 | 
--------------------------------------------------------------------------------
/man/fetch_cricinfo.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_cricinfo.R
3 | \name{fetch_cricinfo}
4 | \alias{fetch_cricinfo}
5 | \title{Fetch Data from Cricinfo}
6 | \usage{
7 | fetch_cricinfo(
8 |   matchtype = c("test", "odi", "t20"),
9 |   sex = c("men", "women"),
10 |   activity = c("batting", "bowling", "fielding"),
11 |   type = c("career", "innings"),
12 |   country = NULL
13 | )
14 | }
15 | \arguments{
16 | \item{matchtype}{Character indicating test (default), odi, or t20.}
17 | 
18 | \item{sex}{Character indicating men (default) or women.}
19 | 
20 | \item{activity}{Character indicating batting (default), bowling or fielding.}
21 | 
22 | \item{type}{Character indicating innings-by-innings or career (default) data}
23 | 
24 | \item{country}{Character indicating country. The default is to fetch data for all countries.}
25 | }
26 | \value{
27 | A \code{tibble} object, similar to a \code{data.frame}.
28 | }
29 | \description{
30 | Fetch data from ESPNCricinfo and return a tibble.
31 | All arguments are case-insensitive and partially matched.
32 | }
33 | \examples{
34 | \dontrun{
35 | auswt20 <- fetch_cricinfo("T20", "Women", country = "Aust")
36 | IndiaODIBowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india")
37 | }
38 | 
39 | }
40 | \author{
41 | Rob J Hyndman, Timothy Hyndman, Charles Gray
42 | }
43 | 
--------------------------------------------------------------------------------
/man/cricketdata-package.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cricketdata.R
3 | \docType{package}
4 | \name{cricketdata-package}
5 | \alias{cricketdata}
6 | \alias{cricketdata-package}
7 | \title{cricketdata: International Cricket Data}
8 | \description{
9 | Data on international and other major cricket matches from ESPNCricinfo \url{https://www.espncricinfo.com} and Cricsheet \url{https://cricsheet.org}. This package provides some functions to download the data into tibbles ready for analysis.
10 | }
11 | \seealso{
12 | Useful links:
13 | \itemize{
14 |   \item \url{https://pkg.robjhyndman.com/cricketdata/}
15 |   \item \url{https://github.com/robjhyndman/cricketdata}
16 |   \item Report bugs at \url{https://github.com/robjhyndman/cricketdata/issues}
17 | }
18 | 
19 | }
20 | \author{
21 | \strong{Maintainer}: Rob Hyndman \email{Rob.Hyndman@monash.edu}
22 | 
23 | Authors:
24 | \itemize{
25 |   \item Charles Gray \email{C.Gray@latrobe.edu.au}
26 |   \item Sayani Gupta \email{Sayani.Gupta@monash.edu}
27 |   \item Timothy Hyndman \email{Timothy.Hyndman@gmail.com}
28 |   \item Hassan Rafique \email{dazzalytics@protonmail.com}
29 |   \item Jacquie Tran \email{jac@jacquietran.com}
30 | }
31 | 
32 | Other contributors:
33 | \itemize{
34 |   \item Puwasala Gamakumara \email{Puwasala.Gamakumara@monash.edu} [contributor]
35 |   \item Alex Whan \email{alexwhan@gmail.com} [contributor]
36 | }
37 | 
38 | }
39 | \keyword{internal}
40 | 
--------------------------------------------------------------------------------
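
The `fetch_cricinfo()` help page above says that all arguments are case-insensitive and partially matched, which is why calls such as `fetch_cricinfo("T20", "Women", ...)` work. The package source that does this is not part of this listing, so the following is only a minimal base-R sketch of one way to get that behaviour: lower-case the input, then let `match.arg()` do prefix matching against the allowed values. The helper name `normalise_option()` is hypothetical and is not part of cricketdata.

```r
# Hypothetical helper (not exported by cricketdata): make an option
# case-insensitive by lower-casing it, then partially match it against
# the allowed values with base R's match.arg().
normalise_option <- function(value, choices) {
  match.arg(tolower(value), choices)
}

normalise_option("T20", c("test", "odi", "t20"))
#> [1] "t20"
normalise_option("Wom", c("men", "women"))
#> [1] "women"
```

`match.arg()` matches prefixes. An ambiguous abbreviation such as `"b"`, which could mean batting or bowling, therefore raises an error instead of silently picking one value.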
/man/fetch_player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_player_meta.R
3 | \name{fetch_player_meta}
4 | \alias{fetch_player_meta}
5 | \title{Fetch Player Meta Data}
6 | \usage{
7 | fetch_player_meta(playerid)
8 | }
9 | \arguments{
10 | \item{playerid}{A vector of player IDs as given in Cricinfo profiles. Integer or character.}
11 | }
12 | \value{
13 | A tibble containing meta data on the selected players, with one row for
14 | each player.
15 | }
16 | \description{
17 | Fetch player meta data from ESPNCricinfo and return a tibble with one line
18 | per player. To identify the players, use their Cricinfo player IDs.
19 | The simplest way to find this is to look up their Cricinfo Profile page. The number
20 | at the end of the URL is the ID. For example, Meg Lanning's profile page is
21 | https://www.espncricinfo.com/cricketers/meg-lanning-329336,
22 | so her ID is 329336.
23 | }
24 | \examples{
25 | \dontrun{
26 | # Download meta data on Meg Lanning and Ellyse Perry
27 | aus_women <- fetch_player_meta(c(329336, 275487))
28 | }
29 | }
30 | \seealso{
31 | It is usually simpler to just use the saved data set \link{player_meta}
32 | which contains the meta data for all players on ESPNCricinfo as at 24 March 2025.
33 | To find a player ID, use \code{\link[=find_player_id]{find_player_id()}}.
34 | Use \code{\link[=fetch_player_data]{fetch_player_data()}} to download playing statistics for a player.
35 | }
36 | \author{
37 | Hassan Rafique and Rob J Hyndman
38 | }
39 | 
--------------------------------------------------------------------------------
/.github/workflows/pkgdown.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |     branches: [main, master]
8 |   release:
9 |     types: [published]
10 |   workflow_dispatch:
11 | 
12 | name: pkgdown
13 | 
14 | jobs:
15 |   pkgdown:
16 |     runs-on: ubuntu-latest
17 |     # Only restrict concurrency for non-PR jobs
18 |     concurrency:
19 |       group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
20 |     env:
21 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
22 |     permissions:
23 |       contents: write
24 |     steps:
25 |       - uses: actions/checkout@v3
26 | 
27 |       - uses: r-lib/actions/setup-pandoc@v2
28 | 
29 |       - uses: r-lib/actions/setup-r@v2
30 |         with:
31 |           use-public-rspm: true
32 | 
33 |       - uses: r-lib/actions/setup-r-dependencies@v2
34 |         with:
35 |           extra-packages: any::pkgdown, local::.
36 |           needs: website
37 | 
38 |       - name: Build site
39 |         run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
40 |         shell: Rscript {0}
41 | 
42 |       - name: Deploy to GitHub pages 🚀
43 |         if: github.event_name != 'pull_request'
44 |         uses: JamesIves/github-pages-deploy-action@v4.4.1
45 |         with:
46 |           clean: false
47 |           branch: gh-pages
48 |           folder: docs
49 | 
--------------------------------------------------------------------------------
/man/fetch_cricsheet.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_cricsheet.R
3 | \name{fetch_cricsheet}
4 | \alias{fetch_cricsheet}
5 | \title{Fetch ball-by-ball, match and player data from Cricsheet and return a tibble.}
6 | \usage{
7 | fetch_cricsheet(
8 |   type = c("bbb", "match", "player"),
9 |   gender = c("female", "male"),
10 |   competition = "tests"
11 | )
12 | }
13 | \arguments{
14 | \item{type}{Character string giving type of data: ball-by-ball, match info or player info.}
15 | 
16 | \item{gender}{Character string giving player gender: female or male.}
17 | 
18 | \item{competition}{Character string giving code corresponding to competition. See \code{\link{cricsheet_codes}} for the
19 | competitions and codes available.}
20 | }
21 | \value{
22 | A \code{tibble} object, similar to a \code{data.frame}.
23 | }
24 | \description{
25 | Download csv data from Cricsheet \url{https://cricsheet.org/downloads/}.
26 | Data must be specified by three factors:
27 | (a) type of data: \code{bbb} (ball-by-ball), \code{match} or \code{player}.
28 | (b) gender;
29 | (c) competition specified as a Cricsheet code. See \code{\link{cricsheet_codes}} for the
30 | competitions and codes available.
31 | }
32 | \examples{
33 | \dontrun{
34 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", type = "bbb")
35 | wbbl_match <- fetch_cricsheet(competition = "wbbl", type = "match")
36 | wbbl_player <- fetch_cricsheet(competition = "wbbl", type = "player")
37 | }
38 | }
39 | \author{
40 | Jacquie Tran, Hassan Rafique and Rob J Hyndman
41 | }
42 | 
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # cricketdata (development version)
2 | 
3 | # cricketdata 0.3.0
4 | * Updated data
5 | * Bug fixes due to changes in cricinfo URLs
6 | 
7 | # cricketdata 0.2.3
8 | * Updated data
9 | * Bug fixes
10 | 
11 | # cricketdata 0.2.2
12 | * Added hex sticker, thanks to @jacquietran
13 | * fetch_cricsheet() can now handle all competitions with csv files.
14 | * Reduced dependencies
15 | * Bug fixes
16 | 
17 | # cricketdata 0.2.1
18 | * Added new vignette introducing package
19 | * Bug fix due to changes in gghighlight
20 | 
21 | # cricketdata 0.2.0
22 | * Added fetch_player_meta(), update_player_meta() and player_meta.
23 | * Improved data cleaning for cricinfo data and cricsheet data
24 | * Improved cricinfo vignette.
25 | * Bug fixes and improvements to documentation
26 | 
27 | # cricketdata 0.1.1
28 | * Fixes to vignettes
29 | * Don't run slow example
30 | 
31 | # cricketdata 0.1.0
32 | * Added fetch_cricsheet() functions, thanks to @jacquietran.
33 | * Added vignettes for fetch_cricsheet() and fetch_cricinfo().
34 | 
35 | # cricketdata 0.0.3 (9 January 2019)
36 | * Updated to handle change in format for female data on cricinfo.
37 | 
38 | # cricketdata 0.0.2 (February 2018)
39 | * Development at Numbat Hackathon, February 2018
40 | * Changed name to cricketdata as cricinfo is not the only source.
41 | * Added find_player_id()
42 | * Extended fetch_player_data()
43 | * Bug fixes
44 | 
45 | # cricinfo 0.0.1 (October 2017)
46 | * Package started at Melbourne ozunconf in October 2017
47 | * Functions added: fetch_player and fetch_cricinfo
48 | 
--------------------------------------------------------------------------------
/.github/workflows/R-CMD-check.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 | 
8 | name: R-CMD-check.yaml
9 | 
10 | permissions: read-all
11 | 
12 | jobs:
13 |   R-CMD-check:
14 |     runs-on: ${{ matrix.config.os }}
15 | 
16 |     name: ${{ matrix.config.os }} (${{ matrix.config.r }})
17 | 
18 |     strategy:
19 |       fail-fast: false
20 |       matrix:
21 |         config:
22 |           - {os: macos-latest, r: 'release'}
23 |           - {os: windows-latest, r: 'release'}
24 |           - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
25 |           - {os: ubuntu-latest, r: 'release'}
26 |           - {os: ubuntu-latest, r: 'oldrel-1'}
27 | 
28 |     env:
29 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
30 |       R_KEEP_PKG_SOURCE: yes
31 | 
32 |     steps:
33 |       - uses: actions/checkout@v4
34 | 
35 |       - uses: r-lib/actions/setup-pandoc@v2
36 | 
37 |       - uses: r-lib/actions/setup-r@v2
38 |         with:
39 |           r-version: ${{ matrix.config.r }}
40 |           http-user-agent: ${{ matrix.config.http-user-agent }}
41 |           use-public-rspm: true
42 | 
43 |       - uses: r-lib/actions/setup-r-dependencies@v2
44 |         with:
45 |           extra-packages: any::rcmdcheck
46 |           needs: check
47 | 
48 |       - uses: r-lib/actions/check-r-package@v2
49 |         with:
50 |           upload-snapshots: true
51 |           build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
52 | 
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: cricketdata
2 | Version: 0.3.0.9000
3 | Title: International Cricket Data
4 | Description: Data on international and other major cricket matches from
5 |     ESPNCricinfo <https://www.espncricinfo.com> and Cricsheet <https://cricsheet.org>.
6 |     This package provides some functions to download the data into tibbles ready
7 |     for analysis.
8 | Authors@R: c(
9 |     person("Rob", "Hyndman", email="Rob.Hyndman@monash.edu", role=c("aut", "cre")),
10 |     person("Charles", "Gray", email="C.Gray@latrobe.edu.au", role="aut"),
11 |     person("Sayani", "Gupta", email="Sayani.Gupta@monash.edu", role="aut"),
12 |     person("Timothy", "Hyndman", email="Timothy.Hyndman@gmail.com", role="aut"),
13 |     person("Hassan","Rafique", email="dazzalytics@protonmail.com", role="aut"),
14 |     person("Jacquie","Tran", email="jac@jacquietran.com", role="aut"),
15 |     person("Puwasala", "Gamakumara", email="Puwasala.Gamakumara@monash.edu", role="ctb"),
16 |     person("Alex", "Whan", email="alexwhan@gmail.com", role="ctb")
17 |     )
18 | Depends: R (>= 4.1.0)
19 | Imports:
20 |     cli,
21 |     dplyr (>= 1.1.0),
22 |     jsonlite,
23 |     lubridate,
24 |     readr,
25 |     rvest,
26 |     stringr,
27 |     tibble,
28 |     tidyr,
29 |     xml2
30 | Suggests:
31 |     codetools,
32 |     gghighlight,
33 |     ggplot2,
34 |     ggtext,
35 |     glue,
36 |     here,
37 |     knitr,
38 |     paletteer,
39 |     patchwork,
40 |     rmarkdown,
41 |     R.rsp,
42 |     showtext
43 | License: GPL-3
44 | Encoding: UTF-8
45 | ByteCompile: true
46 | URL: https://pkg.robjhyndman.com/cricketdata/,
47 |     https://github.com/robjhyndman/cricketdata
48 | LazyData: true
49 | VignetteBuilder:
50 |     knitr,
51 |     R.rsp
52 | Roxygen: list(markdown = TRUE)
53 | RoxygenNote: 7.3.2
54 | BugReports: https://github.com/robjhyndman/cricketdata/issues
55 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
4 | # cricketdata
5 | 
6 | 
7 | 
8 | [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/cricketdata)](https://cran.r-project.org/package=cricketdata)
9 | [![Downloads](https://cranlogs.r-pkg.org/badges/cricketdata)](https://cran.r-project.org/package=cricketdata)
10 | [![Licence](https://img.shields.io/badge/licence-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)
11 | [![R-CMD-check](https://github.com/robjhyndman/cricketdata/workflows/R-CMD-check/badge.svg)](https://github.com/robjhyndman/cricketdata/actions)
12 | [![R-CMD-check](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml)
13 | 
14 | 
15 | Functions for downloading data on international and other major cricket
16 | matches from [ESPNCricinfo](https://www.espncricinfo.com) and
17 | [Cricsheet](https://cricsheet.org). This package provides some functions
18 | to download the data into tibbles ready for analysis.
19 | 
20 | Please respect the terms of use for each website:
21 | [ESPNCricinfo](https://www.espncricinfo.com/ci/content/site/company/terms_use.html),
22 | [Cricsheet](https://cricsheet.org/register/).
23 | 
24 | ## Installation
25 | 
26 | You can install the **stable** version from
27 | [CRAN](https://cran.r-project.org/package=cricketdata).
28 | 
29 | ``` r
30 | # install.packages("pak")
31 | pak::pak("cricketdata")
32 | ```
33 | 
34 | You can install the **development** version from
35 | [GitHub](https://github.com/robjhyndman/cricketdata):
36 | 
37 | ``` r
38 | pak::pak("robjhyndman/cricketdata")
39 | ```
40 | 
41 | ## License
42 | 
43 | This package is free and open source software, licensed under GPL-3.
44 | -------------------------------------------------------------------------------- /vignettes/bibliography.bib: -------------------------------------------------------------------------------- 1 | @misc{nflfastR, 2 | author = "Carl, Sebastian and Baldwin, Ben", 3 | title = "{nflfastR: Functions to Efficiently Access NFL Play by Play Data}", 4 | journal = "R package", 5 | url = {https://CRAN.R-project.org/package=nflfastR} 6 | } 7 | 8 | @misc{dataverse, 9 | author = "Gilani, Saiem", 10 | title = "{Sports Dataverse}", 11 | url = {https://sportsdataverse.org/} 12 | } 13 | 14 | @article{einstein, 15 | author = "Einstein, Albert", 16 | title = "{Zur Elektrodynamik bewegter K{\"o}rper}. ({German}) 17 | [{On} the electrodynamics of moving bodies]", 18 | journal = "Annalen der Physik", 19 | volume = "322", 20 | number = "10", 21 | pages = "891--921", 22 | year = "{1905}", 23 | DOI = "http://dx.doi.org/10.1002/andp.19053221004" 24 | } 25 | 26 | @book{latexcompanion, 27 | author = "Michel Goossens and Frank Mittelbach and Alexander Samarin", 28 | title = "The \LaTeX\ Companion", 29 | year = "1993", 30 | publisher = "Addison-Wesley", 31 | address = "Reading, Massachusetts" 32 | } 33 | 34 | @misc{knuthwebsite, 35 | author = "Donald Knuth", 36 | title = "Knuth: Computers and Typesetting", 37 | url = "http://www-cs-faculty.stanford.edu/\~{}uno/abcde.html" 38 | } 39 | 40 | @misc{hassan, 41 | author = "Hassan Rafique", 42 | title = "Knuth: Computers and Typesetting", 43 | url = "http://www-cs-faculty.stanford.edu/\~{}uno/abcde.html" 44 | } 45 | @article{knuth1984, 46 | title = {Literate Programming}, 47 | author = {Knuth, D. 
E.}, 48 | year = {1984}, 49 | month = {02}, 50 | date = {1984-02-01}, 51 | journal = {The Computer Journal}, 52 | pages = {97--111}, 53 | volume = {27}, 54 | number = {2}, 55 | doi = {10.1093/comjnl/27.2.97}, 56 | url = {http://dx.doi.org/10.1093/comjnl/27.2.97}, 57 | langid = {en} 58 | } 59 | -------------------------------------------------------------------------------- /R/fetch_cricinfo.R: -------------------------------------------------------------------------------- 1 | #' Fetch Data from Cricinfo 2 | #' 3 | #' Fetch data from ESPNCricinfo and return a tibble. 4 | #' All arguments are case-insensitive and partially matched. 5 | #' 6 | #' @param matchtype Character indicating test (default), odi, or t20. 7 | #' @param sex Character indicating men (default) or women. 8 | #' @param activity Character indicating batting (default), bowling or fielding. 9 | #' @param type Character indicating career (default) or innings (innings-by-innings) data. 10 | #' @param country Character indicating country. The default is to fetch data for all countries. 11 | #' 12 | #' @author Rob J Hyndman, Timothy Hyndman, Charles Gray 13 | #' @return A \code{tibble} object, similar to a \code{data.frame}.
14 | #' @examples 15 | #' \dontrun{ 16 | #' auswt20 <- fetch_cricinfo("T20", "Women", country = "Aust") 17 | #' IndiaODIBowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india") 18 | #' } 19 | #' 20 | #' @export 21 | 22 | fetch_cricinfo <- function(matchtype = c("test", "odi", "t20"), 23 | sex = c("men", "women"), 24 | activity = c("batting", "bowling", "fielding"), 25 | type = c("career", "innings"), 26 | country = NULL) { 27 | matchtype <- tolower(matchtype) 28 | sex <- tolower(sex) 29 | type <- tolower(type) 30 | activity <- tolower(activity) 31 | if (!is.null(country)) { 32 | country <- tolower(country) 33 | } 34 | 35 | matchtype <- match.arg(matchtype) 36 | sex <- match.arg(sex) 37 | activity <- match.arg(activity) 38 | type <- match.arg(type) 39 | 40 | # Get the raw data 41 | this_data <- fetch_cricket_data(matchtype, sex, country, activity, type) 42 | 43 | # Clean it up 44 | if (activity == "batting") { 45 | this_data <- clean_batting_data(this_data) 46 | } else if (activity == "bowling") { 47 | this_data <- clean_bowling_data(this_data) 48 | } else { 49 | this_data <- clean_fielding_data(this_data) 50 | } 51 | 52 | return(this_data) 53 | } 54 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | --- 4 | 5 | 6 | 7 | ```{r setup, include = FALSE} 8 | knitr::opts_chunk$set( 9 | echo = TRUE, 10 | collapse = TRUE, 11 | comment = "#>", 12 | cache = TRUE, 13 | fig.path = "man/figures/README-" 14 | ) 15 | library(cricketdata) 16 | library(tidyverse) 17 | ``` 18 | 19 | # cricketdata 20 | 21 | 22 | [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/cricketdata)](https://cran.r-project.org/package=cricketdata) 23 | [![Downloads](https://cranlogs.r-pkg.org/badges/cricketdata)](https://cran.r-project.org/package=cricketdata) 24 | 
[![Licence](https://img.shields.io/badge/licence-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html) 25 | [![R-CMD-check](https://github.com/robjhyndman/cricketdata/workflows/R-CMD-check/badge.svg)](https://github.com/robjhyndman/cricketdata/actions) 26 | [![R-CMD-check](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml) 27 | 28 | 29 | Functions for downloading data on international and other major cricket matches from [ESPNCricinfo](https://www.espncricinfo.com) and [Cricsheet](https://cricsheet.org). This package provides some functions to download the data into tibbles ready for analysis. 30 | 31 | Please respect the terms of use for each website: [ESPNCricinfo](https://www.espncricinfo.com/ci/content/site/company/terms_use.html), [Cricsheet](https://cricsheet.org/register/). 32 | 33 | ## Installation 34 | 35 | You can install the **stable** version from [CRAN](https://cran.r-project.org/package=cricketdata). 36 | 37 | ``` r 38 | # install.packages("pak") 39 | pak::pak("cricketdata") 40 | ``` 41 | 42 | You can install the **development** version from [GitHub](https://github.com/robjhyndman/cricketdata): 43 | 44 | ``` r 45 | pak::pak("robjhyndman/cricketdata") 46 | ``` 47 | 48 | ## License 49 | 50 | This package is free and open source software, licensed under GPL-3. 
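
## Usage

A minimal sketch of typical use, adapted from the examples in the package documentation. The bundled `player_meta` data set can be queried offline, whereas `fetch_cricinfo()` and `fetch_player_data()` scrape ESPNCricinfo and therefore need an internet connection. The object names `lanning`, `aus_women_t20` and `meg_lanning` are illustrative only.

``` r
library(cricketdata)
library(dplyr)

# player_meta ships with the package, so this step needs no download
lanning <- player_meta |>
  filter(grepl("Lanning", full_name)) |>
  select(cricinfo_id, full_name, country, dob)

# The calls below scrape ESPNCricinfo and need an internet connection.
# Career T20 batting records for Australian women
aus_women_t20 <- fetch_cricinfo("T20", "Women", country = "Aust")
# Innings-by-innings T20 batting for Meg Lanning (Cricinfo ID 329336)
meg_lanning <- fetch_player_data(329336, "T20", "batting")
```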
51 | -------------------------------------------------------------------------------- /man/fetch_player_data.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/fetch_player.R 3 | \name{fetch_player_data} 4 | \alias{fetch_player_data} 5 | \title{Fetch Player Data} 6 | \usage{ 7 | fetch_player_data( 8 | playerid, 9 | matchtype = c("test", "odi", "t20"), 10 | activity = c("batting", "bowling", "fielding") 11 | ) 12 | } 13 | \arguments{ 14 | \item{playerid}{The player ID as given in the Cricinfo profile. Integer or character.} 15 | 16 | \item{matchtype}{Which type of cricket matches do you want? Tests, ODIs or T20s? Not case-sensitive.} 17 | 18 | \item{activity}{Which type of activities do you want? Batting, Bowling or Fielding? Not case-sensitive.} 19 | } 20 | \value{ 21 | A tibble containing data on the selected player, with one row for every innings 22 | of every match in which they have played. 23 | } 24 | \description{ 25 | Fetch individual player data from all matches played. The function will scrape 26 | the data from ESPNCricinfo and return a tibble with one line per innings for all 27 | games a player has played. To identify a player, use their Cricinfo player ID. 28 | The simplest way to find this is to look up their Cricinfo Profile page. The number 29 | at the end of the URL is the ID. For example, Meg Lanning's profile page is 30 | http://www.espncricinfo.com/australia/content/player/329336.html, 31 | so her ID is 329336. 
32 | } 33 | \examples{ 34 | \dontrun{ 35 | # Download data on some players 36 | EllysePerry <- fetch_player_data(275487, "T20", "batting") 37 | RahulDravid <- fetch_player_data(28114, "ODI", "fielding") 38 | LasithMalinga <- fetch_player_data(49758, "Test", "bowling") 39 | 40 | # Create a plot for Ellyse Perry's T20 scores 41 | library(dplyr) 42 | library(ggplot2) 43 | EllysePerry |> 44 | filter(!is.na(Runs)) |> 45 | ggplot(aes(x = Start_Date, y = Runs, col = Dismissal)) + 46 | geom_point() + 47 | ggtitle("Ellyse Perry's T20 Scores") 48 | } 49 | } 50 | \seealso{ 51 | \code{\link[=find_player_id]{find_player_id()}} to find a player ID by searching on their name, 52 | and \code{\link[=fetch_player_meta]{fetch_player_meta()}} to download meta data for players. 53 | } 54 | \author{ 55 | Rob J Hyndman and Sayani Gupta 56 | } 57 | -------------------------------------------------------------------------------- /R/clean_fielding_data.R: -------------------------------------------------------------------------------- 1 | # Function to clean fielding data.
2 | # Works with career or innings data 3 | 4 | clean_fielding_data <- function(x) { 5 | # Make names easier to interpret 6 | vars <- colnames(x) 7 | vars[vars == "Mat"] <- "Matches" 8 | vars[vars == "Inns"] <- "Innings" 9 | vars[vars == "Start Date"] <- "Date" 10 | vars[vars == "Dis"] <- "Dismissals" 11 | vars[vars == "Ct"] <- "Caught" 12 | vars[vars == "St"] <- "Stumped" 13 | vars[vars == "Ct Wk"] <- "CaughtBehind" 14 | vars[vars == "Ct Fi"] <- "CaughtFielder" 15 | vars[vars == "MD"] <- "MaxDismissalsInnings" 16 | 17 | colnames(x) <- vars 18 | 19 | # Fix classes for all variables 20 | x$Innings <- as.integer(x$Innings) 21 | x$Dismissals <- as.integer(x$Dismissals) 22 | x$Caught <- as.integer(x$Caught) 23 | x$Stumped <- as.integer(x$Stumped) 24 | x$CaughtBehind <- as.integer(x$CaughtBehind) 25 | x$CaughtFielder <- as.integer(x$CaughtFielder) 26 | 27 | career <- ("Matches" %in% vars) 28 | if (career) { 29 | x$Matches <- as.integer(x$Matches) 30 | if ("Span" %in% vars) { 31 | x$Start <- as.integer(substr(x$Span, 1, 4)) 32 | x$End <- as.integer(substr(x$Span, 6, 9)) 33 | } 34 | x$MaxDismissalsInnings <- unlist(lapply( 35 | strsplit(x$MaxDismissalsInnings, "\\("), function(x) { 36 | as.numeric(x[1]) 37 | } 38 | )) 39 | } else { 40 | x$Date <- lubridate::dmy(x$Date) 41 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "") 42 | x$Opposition <- rename_countries(x$Opposition) 43 | } 44 | 45 | # Extract country information if it is present 46 | # This should only be required when multiple countries are included 47 | country <- (length(grep("\\(", x[1, 1])) > 0) 48 | if (country) { 49 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z \\-extends]+\\)") 50 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "") 51 | x$Country <- rename_countries(x$Country) 52 | x$Player <- stringr::str_replace(x$Player, "\\([a-zA-Z \\-]+\\)", "") 53 | } 54 | 55 | # Re-order and select columns 56 | vars <- colnames(x) 57 | if (career) { 58 | 
varorder <- c( 59 | "Player", "Country", "Start", "End", "Matches", "Innings", 60 | "Dismissals", "Caught", "CaughtFielder", "CaughtBehind", "Stumped", "MaxDismissalsInnings" 61 | ) 62 | } else { 63 | varorder <- c( 64 | "Date", "Player", "Country", 65 | "Dismissals", "Caught", "CaughtFielder", "CaughtBehind", "Stumped", "Innings", "Opposition", "Ground" 66 | ) 67 | } 68 | varorder <- varorder[varorder %in% vars] 69 | 70 | return(x[, varorder]) 71 | } 72 | -------------------------------------------------------------------------------- /R/find_player_id.R: -------------------------------------------------------------------------------- 1 | #' Find a player id from cricinfo.com 2 | #' 3 | #' @param searchstring Part of a player name(s) to search for. Can be a character vector. 4 | #' @return A table of matching players, their ids, and teams they played for. 5 | #' @seealso [fetch_player_data()] to download playing statistics for 6 | #' a player, and [fetch_player_meta()] to download meta data on players. 
7 | #' @examples 8 | #' \dontrun{ 9 | #' (perry <- find_player_id("Perry")) 10 | #' EllysePerry <- fetch_player_data(perry$ID[2], "test") 11 | #' } 12 | #' @author Rob J Hyndman 13 | #' @export 14 | 15 | find_player_id <- function(searchstring) { 16 | url <- paste0( 17 | "http://stats.espncricinfo.com/ci/engine/stats/analysis.html?search=", 18 | searchstring, ";template=analysis" 19 | ) 20 | url <- gsub(" ", "%20", url) 21 | raw <- lapply(url, function(x) try(xml2::read_html(x), silent = TRUE)) 22 | if ("try-error" %in% lapply(raw, class)) { 23 | stop("Unable to download player search results from ESPNCricinfo") 24 | } 25 | # Extract table of names 26 | tab <- lapply(raw, function(x) rvest::html_table(x)[[1]]) 27 | # Convert each table to a tibble and drop blank rows 28 | tab <- lapply(tab, function(x) { 29 | x <- tibble::as_tibble(x, .name_repair = "check_unique") 30 | x <- x[x$X1 != "", ] 31 | }) 32 | 33 | for (i in seq_along(searchstring)) { 34 | x <- tab[[i]] 35 | # Check whether any players were returned 36 | if (x[1, 1] == "No matching players found") { 37 | warning(paste("No matching players found for search:", searchstring[i])) 38 | } 39 | # Check whether the results were truncated at 100 40 | if (grepl("Search restricted", x[nrow(x), 1])) { 41 | warning(paste0( 42 | "Only the first 100 results returned for search: '", searchstring[i], 43 | "'.
Try a more specific search" 44 | )) 45 | tab[[i]] <- x[1:100, ] 46 | } 47 | } 48 | 49 | tab <- dplyr::bind_rows(tab) 50 | 51 | # Name columns 52 | colnames(tab) <- c("Name", "Country", "Played") 53 | # Extract the player IDs from the profile links 54 | ids <- lapply(raw, rvest::html_nodes, css = "a") 55 | ids <- lapply(ids, function(x) as.character(x[grep("/ci/engine/player/", x)])) 56 | ids <- lapply(ids, function(x) gsub("([a-zA-Z= \\\"/<>]*)", "", x)) 57 | ids <- lapply(ids, function(x) { 58 | unlist(lapply(strsplit(x, ".", fixed = TRUE), function(x) { 59 | x[1] 60 | })) 61 | }) 62 | # Insert NA for those without matches 63 | ids <- lapply(ids, unique) 64 | ids <- lapply(ids, function(x) if (is.null(x)) NA else as.numeric(x)) 65 | tab$ID <- unlist(ids) 66 | tab$searchstring <- rep(searchstring, times = unlist(lapply(ids, length))) 67 | return(tab[, c("ID", "Name", "searchstring", "Country", "Played")]) 68 | } 69 | -------------------------------------------------------------------------------- /data-raw/make_player_meta.R: -------------------------------------------------------------------------------- 1 | library(cricketdata) 2 | library(rvest) 3 | library(stringr) 4 | library(dplyr) 5 | 6 | # Look up the names of the ESPN teams with IDs 1 to 1000 7 | get_espn_countries <- function(n = 1000) { 8 | countries <- tibble(id = seq(n), country = character(n)) 9 | for(i in countries$id) { 10 | cat(i," ") 11 | countries$country[i] <- cricketdata:::get_espn_country(i) 12 | } 13 | return(countries) 14 | } 15 | 16 | espn_countries <- get_espn_countries() 17 | usethis::use_data(espn_countries, overwrite = TRUE, internal = TRUE) 18 | 19 | 20 | # Create tibble of cricinfo meta data for all players who are on both cricsheet and cricinfo. 21 | # Set start_again = TRUE to re-download everything in case some data has been corrected online.
22 | # It is much more efficient to keep start_again = FALSE. 23 | new_player_meta <- update_player_meta(start_again = FALSE) 24 | new_player_meta <- new_player_meta |> 25 | as_tibble() |> 26 | mutate( 27 | country = if_else(str_detect(country, "P.N.G."), "Papua New Guinea", country), 28 | country = if_else(str_detect(country, "U.A.E."), "United Arab Emirates", country), 29 | country = if_else(str_detect(country, "U.S.A."), "United States of America", country), 30 | country = if_else(str_detect(country, "Czech Rep."), "Czech Republic", country), 31 | country = if_else(str_detect(country, "Cayman Is"), "Cayman Islands", country), 32 | ) 33 | 34 | # Transliterate all character columns to plain ASCII 35 | for (j in seq_len(NCOL(new_player_meta))) { 36 | if (is.character(new_player_meta[[j]])) { 37 | new_player_meta[[j]] <- iconv(new_player_meta[[j]], from = "UTF-8", to = "ASCII//TRANSLIT") 38 | } 39 | } 40 | player_meta <- new_player_meta 41 | usethis::use_data(player_meta, overwrite = TRUE) 42 | 43 | # Need to update date and size of object in following files 44 | # fetch_player_meta.R 45 | # update_player_meta.R 46 | # data.R 47 | 48 | # Find all csv2 files on cricsheet 49 | downloads <- read_html("https://cricsheet.org/downloads/") 50 | names <- downloads |> 51 | html_elements("td") |> 52 | html_text2() |> 53 | str_trim() 54 | names <- names[names != ""] 55 | names <- names[!str_detect(names, "JSON")] 56 | names <- names[!grepl("^[0-9,]*$", names)] 57 | names <- names[!str_detect(names, "(Added|Played) in the previous")] 58 | first <- which(str_detect(names, "^All matches"))[2] 59 | names <- names[first:length(names)] 60 | names <- names[!str_detect(names, "Original\nNew")] 61 | names <- str_remove(names, " [0-9]* matches withheld \\(\\?\\)") 62 | 63 | acronyms <- downloads |> 64 | html_elements("a") |> 65 | as.character() 66 | acronyms <- acronyms[grepl("csv2", acronyms)] 67 | acronyms <- acronyms[!grepl("_male_csv2", acronyms)] 68 | acronyms <- acronyms[!grepl("_female_csv2", acronyms)] 69 |
acronyms <- tail(acronyms, length(names)) 70 | acronyms <- str_extract(acronyms, "[a-zA-Z0-9]*_csv2") 71 | acronyms <- str_remove(acronyms, "_csv2") 72 | 73 | cricsheet_codes <- tibble::tibble( 74 | competition = names, 75 | code = acronyms 76 | ) 77 | usethis::use_data(cricsheet_codes, overwrite = TRUE) 78 | -------------------------------------------------------------------------------- /R/fetch_player_meta.R: -------------------------------------------------------------------------------- 1 | #' Fetch Player Meta Data 2 | #' 3 | #' Fetch player meta data from ESPNCricinfo and return a tibble with one line 4 | #' per player. To identify the players, use their Cricinfo player IDs. 5 | #' The simplest way to find an ID is to look up the player's Cricinfo profile page. The number 6 | #' at the end of the URL is the ID. For example, Meg Lanning's profile page is 7 | #' https://www.espncricinfo.com/cricketers/meg-lanning-329336, 8 | #' so her ID is 329336. 9 | #' 10 | #' @param playerid A vector of player IDs as given in Cricinfo profiles. Integer or character. 11 | #' 12 | #' @return A tibble containing meta data on the selected players, with one row for 13 | #' each player. 14 | #' @author Hassan Rafique and Rob J Hyndman 15 | #' @seealso It is usually simpler to just use the saved data set [player_meta] 16 | #' which contains the meta data for all players on both Cricsheet and ESPNCricinfo as at 24 March 2025. 17 | #' To find a player ID, use [find_player_id()]. 18 | #' Use [fetch_player_data()] to download playing statistics for a player.
19 | #' @examples 20 | #' \dontrun{ 21 | #' # Download meta data on Meg Lanning and Ellyse Perry 22 | #' aus_women <- fetch_player_meta(c(329336, 275487)) 23 | #' } 24 | #' @export 25 | fetch_player_meta <- function(playerid) { 26 | output <- NULL 27 | pb <- cli::cli_progress_bar(total = length(playerid)) 28 | for (j in seq_along(playerid)) { 29 | cli::cli_progress_update() 30 | output <- rbind(output, fetch_player_meta_individual(playerid[j])) 31 | } 32 | cli::cli_progress_done() 33 | return(tibble::as_tibble(output)) 34 | } 35 | 36 | fetch_player_meta_individual <- function(playerid) { 37 | # Set up empty output 38 | output <- data.frame( 39 | cricinfo_id = playerid, 40 | name = NA_character_, 41 | full_name = NA_character_, 42 | country = NA_character_, 43 | dob = as.Date(NA), 44 | batting_style = NA_character_, 45 | bowling_style = NA_character_ 46 | ) 47 | # Read JSON file with player meta data 48 | url <- paste0("http://core.espnuk.org/v2/sports/cricket/athletes/", playerid) 49 | json <- try(jsonlite::read_json(url), silent = TRUE) 50 | if ("try-error" %in% class(json)) { 51 | warning(paste( 52 | "Cannot read player information from ESPNCricinfo for ID", 53 | playerid 54 | )) 55 | } else { 56 | output$full_name <- json$fullName 57 | output$name <- json$name 58 | output$country <- get_espn_country(json$country) 59 | if(is.na(output$country)) { 60 | stop(paste("Country not found.
Player",playerid,output$full_name)) 61 | } 62 | output$dob <- as.Date(json$dateOfBirth) 63 | if(length(json$style) > 0) { 64 | output$batting_style <- json$style[[1]]$description 65 | } 66 | if(length(json$style) > 1) { 67 | output$bowling_style <- json$style[[2]]$description 68 | } 69 | } 70 | return(output) 71 | } 72 | 73 | # Find country from ESPN team id 74 | get_espn_country <- function(i) { 75 | if(i %in% espn_countries$id) { 76 | return(espn_countries$country[espn_countries$id == i]) 77 | } 78 | json <- try(jsonlite::read_json(paste0("http://core.espnuk.org/v2/sports/cricket/teams/", i)), silent = TRUE) 79 | if(!("try-error" %in% class(json))) { 80 | # Note: this only updates a local copy of espn_countries, so the result is not cached between calls 81 | espn_countries <- rbind(espn_countries, data.frame(id = i, country = json$name)) 82 | return(json$name) 83 | } else { 84 | return(NA_character_) 85 | } 86 | } 87 | 88 | utils::globalVariables(c( 89 | "cricinfo_id", "full_name", "country", "dob", "birthplace", 90 | "batting_style", "bowling_style", "playing_role", 91 | "title", "values" 92 | )) 93 | -------------------------------------------------------------------------------- /R/fetch_cricket_data.R: -------------------------------------------------------------------------------- 1 | # Main function to scrape the data from cricinfo 2 | # Not user-visible. Called by fetch_cricinfo. 3 | 4 | fetch_cricket_data <- function(matchtype = c("test", "odi", "t20"), 5 | sex = c("men", "women"), 6 | country = NULL, 7 | activity = c("batting", "bowling", "fielding"), 8 | view = c("innings", "career")) { 9 | # Convert matchtype and sex to lower case, then check that each argument 10 | # matches one of its allowed values (match.arg() also accepts partial matches). 11 | matchtype <- tolower(matchtype) 12 | sex <- tolower(sex) 13 | matchtype <- match.arg(matchtype) 14 | sex <- match.arg(sex) 15 | activity <- match.arg(activity) 16 | view <- match.arg(view) 17 | 18 | # Set view text.
19 | view_text <- if (view == "innings") { 20 | ";view=innings" 21 | } else { 22 | NULL 23 | } 24 | 25 | # Define url signifier for match type. 26 | matchclass <- 27 | match(matchtype, c("test", "odi", "t20")) + 7 * (sex == "women") 28 | 29 | # Find country code 30 | if (!is.null(country)) { 31 | if (sex == "men") { 32 | team <- men$team[pmatch(country, tolower(men$name))] 33 | } else { 34 | team <- women$team[pmatch(country, tolower(women$name))] 35 | } 36 | if (is.na(team)) { 37 | stop("Country not found") 38 | } 39 | } 40 | 41 | # Set starting page to read from. 42 | page <- 1L 43 | alldata <- NULL 44 | 45 | # Read each page in turn and bind the rows. 46 | theend <- FALSE # Initialise. 47 | while (!theend) { 48 | # Create url string. 49 | url <- 50 | paste0( 51 | "http://stats.espncricinfo.com/ci/engine/stats/index.html?class=", 52 | matchclass, 53 | ifelse(is.null(country), "", paste0(";team=", team)), 54 | ";page=", 55 | format(page, scientific = FALSE), 56 | ";template=results;type=", 57 | activity, 58 | view_text, 59 | ";size=200;wrappertype=print" 60 | ) 61 | 62 | # Read the raw HTML of this page with xml2::read_html(). 63 | raw <- try(xml2::read_html(url), silent = TRUE) 64 | if ("try-error" %in% class(raw)) { 65 | stop("Unable to download data from ESPNCricinfo") 66 | } 67 | 68 | # Extract the results table from the page with rvest::html_table(). 69 | tables <- rvest::html_table(raw) 70 | tab <- tables[[3]] 71 | # Check to see if the dataset extracted is empty 72 | if (identical(dim(tab), c(1L, 1L))) { 73 | theend <- TRUE 74 | } 75 | if (page == 1L) { 76 | if (theend) { 77 | stop("No data available") 78 | } 79 | maxpage <- as.numeric(strsplit(dplyr::pull(tables[[2]][1, 1]), "Page 1 of ")[[1]][2]) 80 | cli::cli_progress_bar("Downloading", total = maxpage) 81 | #cli::cli_progress_update() 82 | #Sys.sleep(1 / 1000) 83 | } 84 | if (!theend) { 85 | # Convert all columns to character for now.
86 | tab <- suppressMessages(tibble::as_tibble(apply(tab, 2, as.character), .name_repair = "unique")) 87 | 88 | # Bind the data extracted from this page to all data collected so far. 89 | alldata <- dplyr::bind_rows(alldata, tab) 90 | 91 | # Update progress bar 92 | cli::cli_progress_update() 93 | Sys.sleep(1 / 1000) 94 | 95 | # Increment page counter. 96 | page <- page + 1L 97 | } 98 | } 99 | 100 | cli::cli_progress_done() 101 | 102 | # Remove columns that are entirely missing. 103 | alldata <- 104 | suppressMessages(tibble::as_tibble(alldata[, colSums(is.na(alldata)) != NROW(alldata)], .name_repair = "check_unique")) 105 | # Convert "-" to NA 106 | alldata[alldata == "-"] <- NA 107 | 108 | return(alldata) 109 | } 110 | -------------------------------------------------------------------------------- /R/update_player_meta.R: -------------------------------------------------------------------------------- 1 | #' Update player_meta 2 | #' 3 | #' The [player_meta] data set contains the names and other 4 | #' attributes of players who appear on both [cricsheet](https://cricsheet.org) 5 | #' and [ESPNCricinfo](https://www.espncricinfo.com) as at 24 March 2025. 6 | #' This function returns an updated version of the data set based on information 7 | #' currently available online. 8 | #' 9 | #' @param start_again If TRUE, downloads all data from ESPNCricinfo without 10 | #' using player_meta as a starting point. This can take a long time. 11 | #' @return A tibble containing meta data on cricket players. 12 | #' @author Hassan Rafique and Rob J Hyndman 13 | #' @seealso [player_meta], [fetch_player_meta()].
14 | #' @examples 15 | #' \dontrun{ 16 | #' # Update data to current 17 | #' new_player_meta <- update_player_meta() 18 | #' } 19 | #' @export 20 | update_player_meta <- function(start_again = FALSE) { 21 | store_warning <- options(warn = -1)$warn 22 | # Remove people with no country from existing player_meta 23 | player_meta <- player_meta |> 24 | dplyr::filter(!is.na(country)) 25 | # Read people file from cricsheet 26 | people <- readr::read_csv("https://cricsheet.org/register/people.csv", 27 | col_types = "ccccccccccccccc", lazy = FALSE 28 | ) |> 29 | dplyr::select( 30 | cricsheet_id = identifier, 31 | cricinfo_id = key_cricinfo, 32 | cricinfo_id2 = key_cricinfo_2, 33 | name, unique_name 34 | ) |> 35 | # Remove people not on Cricinfo 36 | dplyr::filter(!is.na(cricinfo_id)) 37 | 38 | # Compare existing version of player_meta and find missing players 39 | if (start_again) { 40 | missing_df <- people 41 | } else { 42 | missing_df <- people |> 43 | dplyr::anti_join(player_meta, by = "cricinfo_id") |> 44 | dplyr::anti_join(player_meta, by = c("cricinfo_id2" = "cricinfo_id")) 45 | } 46 | 47 | # Now download meta data for new players 48 | new_player_meta <- fetch_player_meta(missing_df$cricinfo_id) 49 | 50 | # For people missing on cricinfo, try the other cricinfo id 51 | cricinfo2 <- new_player_meta |> 52 | dplyr::left_join(people |> dplyr::select(-name), by = "cricinfo_id") |> 53 | dplyr::filter(is.na(full_name) & !is.na(cricinfo_id2)) |> 54 | dplyr::pull(cricinfo_id2) 55 | if (length(cricinfo2) > 0) { 56 | new_player_meta <- new_player_meta |> 57 | dplyr::bind_rows(fetch_player_meta(cricinfo2)) |> 58 | dplyr::filter(!is.na(full_name)) 59 | } 60 | 61 | # Add in cricsheet id 62 | new_player_meta <- dplyr::bind_rows( 63 | new_player_meta |> 64 | dplyr::left_join(people |> dplyr::select(-name), 65 | by = "cricinfo_id" 66 | ) |> 67 | dplyr::select(-cricinfo_id2) |> 68 | dplyr::filter(!is.na(cricsheet_id)), 69 | new_player_meta |> 70 | dplyr::left_join(people |> 
dplyr::select(-name, -cricinfo_id), 71 | by = c("cricinfo_id" = "cricinfo_id2") 72 | ) |> 73 | dplyr::filter(!is.na(cricsheet_id)) 74 | ) |> 75 | # Organize by column and row 76 | dplyr::select( 77 | cricinfo_id, cricsheet_id, unique_name, full_name, 78 | dplyr::everything() 79 | ) |> 80 | # Remove missing people 81 | dplyr::filter(!is.na(full_name)) 82 | 83 | # Add to existing player_meta 84 | if (!start_again) { 85 | new_player_meta <- new_player_meta |> 86 | dplyr::mutate(dob = as.Date(dob)) |> 87 | dplyr::bind_rows(player_meta) 88 | } 89 | 90 | # Clean up and arrange 91 | new_player_meta <- new_player_meta |> 92 | # Fix country names 93 | dplyr::mutate(country = stringr::str_remove(country, " Wmn")) |> 94 | # Arrange in alphabetic order 95 | dplyr::arrange(full_name) |> 96 | # Remove duplicates 97 | dplyr::distinct() 98 | 99 | options(warn = store_warning) 100 | return(new_player_meta) 101 | } 102 | 103 | utils::globalVariables(c( 104 | "identifier", "key_cricinfo", "key_cricinfo_2", "name", "unique_name", 105 | "player_meta", "cricinfo_id", "cricinfo_id2", "cricsheet_id" 106 | )) 107 | -------------------------------------------------------------------------------- /R/clean_batting_data.R: -------------------------------------------------------------------------------- 1 | # Function to clean batting data. 
2 | # Works with career or innings data 3 | 4 | clean_batting_data <- function(x) { 5 | # Make names easier to interpret 6 | vars <- colnames(x) 7 | vars[vars == "Mat"] <- "Matches" 8 | vars[vars == "Inns"] <- "Innings" 9 | vars[vars == "NO"] <- "NotOuts" 10 | vars[vars == "HS"] <- "HighScore" 11 | vars[vars == "Ave"] <- "Average" 12 | vars[vars == "100"] <- "Hundreds" 13 | vars[vars == "50"] <- "Fifties" 14 | vars[vars == "0"] <- "Ducks" 15 | vars[vars == "SR"] <- "StrikeRate" 16 | vars[vars == "BF"] <- "BallsFaced" 17 | vars[vars == "4s"] <- "Fours" 18 | vars[vars == "6s"] <- "Sixes" 19 | vars[vars == "Mins"] <- "Minutes" 20 | vars[vars == "Start Date"] <- "Date" 21 | colnames(x) <- vars 22 | 23 | career <- ("Matches" %in% vars) 24 | if (career) { 25 | x$Matches <- as.integer(x$Matches) 26 | x$NotOuts <- as.integer(x$NotOuts) 27 | x$Hundreds <- as.integer(x$Hundreds) 28 | x$Fifties <- as.integer(x$Fifties) 29 | x$Ducks <- as.integer(x$Ducks) 30 | # Add not out column and remove annotations from highscore 31 | x$HighScoreNotOut <- seq(NROW(x)) %in% grep("*", x$HighScore, fixed = TRUE) 32 | x$HighScore <- as.numeric(gsub("*", "", x$HighScore, fixed = TRUE)) 33 | if ("Span" %in% vars) { 34 | x$Start <- as.integer(substr(x$Span, 1, 4)) 35 | x$End <- as.integer(substr(x$Span, 6, 9)) 36 | } 37 | x$Runs <- as.integer(x$Runs) 38 | x$Innings <- as.integer(x$Innings) 39 | x$Average <- x$Runs / (x$Innings - x$NotOuts) 40 | } else { 41 | x$Innings <- as.integer(x$Innings) 42 | x$Minutes <- as.numeric(x$Minutes) 43 | x$Date <- lubridate::dmy(x$Date) 44 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "") 45 | x$Opposition <- rename_countries(x$Opposition) 46 | # Add not out and participation column and remove annotations from runs 47 | x$NotOut <- seq(NROW(x)) %in% grep("*", x$Runs, fixed = TRUE) 48 | x$Runs <- gsub("*", "", x$Runs, fixed = TRUE) 49 | x$Participation <- participation_status(x$Runs) 50 | x$Runs[x$Participation != "B"] <- NA 51 | 
x$Runs <- as.integer(x$Runs) 52 | } 53 | 54 | if ("BallsFaced" %in% vars) { 55 | x$BallsFaced <- as.integer(x$BallsFaced) 56 | x$StrikeRate <- x$Runs / x$BallsFaced * 100 57 | } 58 | if ("Fours" %in% vars) { 59 | x$Fours <- as.integer(x$Fours) 60 | x$Sixes <- as.integer(x$Sixes) 61 | } 62 | 63 | # Extract country information if it is present 64 | # This should only be required when multiple countries are included 65 | country <- length(grep("\\(", x$Player)) > 0 66 | if (country) { 67 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z /\\-]+\\)") 68 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "") 69 | x$Country <- rename_countries(x$Country) 70 | x$Player <- stringr::str_replace(x$Player, "\\([a-zA-Z /\\-]+\\)", "") 71 | } 72 | 73 | # Re-order and select columns 74 | vars <- colnames(x) 75 | if (career) { 76 | varorder <- c( 77 | "Player", "Country", "Start", "End", "Matches", "Innings", "NotOuts", "Runs", "HighScore", "HighScoreNotOut", 78 | "Average", "BallsFaced", "StrikeRate", "Hundreds", "Fifties", "Ducks", "Fours", "Sixes" 79 | ) 80 | } else { 81 | varorder <- c( 82 | "Date", "Player", "Country", "Runs", "NotOut", "Minutes", "BallsFaced", "Fours", "Sixes", 83 | "StrikeRate", "Innings", "Participation", "Opposition", "Ground" 84 | ) 85 | } 86 | varorder <- varorder[varorder %in% vars] 87 | 88 | return(x[, varorder]) 89 | } 90 | 91 | 92 | # Classify batting/bowling entries by participation status (B, Absent, DNB, TDNB or Sub).
93 | 94 | participation_status <- function(status) { 95 | absent <- grep("absent", status) 96 | dnb <- grep("^DNB", status) 97 | tdnb <- grep("TDNB", status) 98 | sub <- grep("sub", status) 99 | 100 | status[seq(NROW(status))] <- "B" 101 | status[absent] <- "Absent" 102 | status[dnb] <- "DNB" 103 | status[tdnb] <- "TDNB" 104 | status[sub] <- "Sub" 105 | 106 | return(status) 107 | } 108 | -------------------------------------------------------------------------------- /R/clean_bowling_data.R: -------------------------------------------------------------------------------- 1 | # Function to clean bowling data. 2 | # Works with career or innings data 3 | 4 | clean_bowling_data <- function(x) { 5 | # Make names easier to interpret 6 | vars <- colnames(x) 7 | vars[vars == "Mat"] <- "Matches" 8 | vars[vars == "Inns"] <- "Innings" 9 | vars[vars == "Mdns"] <- "Maidens" 10 | vars[vars == "Wkts"] <- "Wickets" 11 | vars[vars == "BBI"] <- "BestBowlingInnings" 12 | vars[vars == "BBM"] <- "BestBowlingMatch" 13 | vars[vars == "Ave"] <- "Average" 14 | vars[vars == "Econ"] <- "Economy" 15 | vars[vars == "SR"] <- "StrikeRate" 16 | vars[vars == "4"] <- "FourWickets" 17 | vars[vars == "5"] <- "FiveWickets" 18 | vars[vars == "10"] <- "TenWickets" 19 | vars[vars == "Start Date"] <- "Date" 20 | colnames(x) <- vars 21 | 22 | # Fix classes for all variables 23 | if ("Maidens" %in% vars) { 24 | x$Maidens <- as.integer(x$Maidens) 25 | } 26 | if ("Balls" %in% vars) { 27 | x$Balls <- as.integer(x$Balls) 28 | } 29 | x$Runs <- as.integer(x$Runs) 30 | x$Wickets <- as.integer(x$Wickets) 31 | x$Innings <- as.integer(x$Innings) 32 | 33 | career <- ("Matches" %in% vars) 34 | 35 | if (career) { 36 | x$Matches <- as.integer(x$Matches) 37 | if ("Span" %in% vars) { 38 | x$Start <- as.integer(substr(x$Span, 1, 4)) 39 | x$End <- as.integer(substr(x$Span, 6, 9)) 40 | } 41 | if ("FourWickets" %in% vars) { 42 | x$FourWickets <- as.integer(x$FourWickets) 43 | } 44 | if ("FiveWickets" %in% vars) { 45 | 
x$FiveWickets <- as.integer(x$FiveWickets) 46 | } 47 | if ("TenWickets" %in% vars) { 48 | x$TenWickets <- as.integer(x$TenWickets) 49 | } 50 | } else { 51 | # Add participation column 52 | if ("Overs" %in% vars) { 53 | x$Participation <- participation_status(x$Overs) 54 | x$Overs[x$Participation != "B"] <- NA 55 | } 56 | x$Date <- lubridate::dmy(x$Date) 57 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "") 58 | x$Opposition <- rename_countries(x$Opposition) 59 | } 60 | if ("Overs" %in% vars) { 61 | x$Overs <- as.numeric(x$Overs) 62 | } 63 | 64 | # Recompute average to avoid rounding errors 65 | if ("Average" %in% vars) { 66 | x$Average <- x$Runs / x$Wickets 67 | } 68 | 69 | # Recompute economy rate to avoid rounding errors 70 | if ("Balls" %in% vars) { 71 | balls <- x$Balls 72 | } else { 73 | balls <- trunc(x$Overs) * 6 + round((x$Overs %% 1) * 10) 74 | } 75 | if ("Economy" %in% vars) { 76 | x$Economy <- as.numeric(x$Economy) 77 | ER <- x$Runs / (balls / 6) 78 | # Don't modify values if they differ by more than 0.05 79 | differ <- abs(round(ER, 2) - x$Economy) > 0.05 80 | differ[is.na(differ)] <- FALSE 81 | x$Economy[!differ] <- ER[!differ] 82 | } 83 | 84 | # Recompute strike rate 85 | if ("StrikeRate" %in% vars) { 86 | x$StrikeRate <- balls / x$Wickets 87 | } 88 | 89 | # Extract country information if it is present 90 | # This should only be required when multiple countries are included 91 | country <- (length(grep("\\(", x[1, 1])) > 0) 92 | if (country) { 93 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z \\-]+\\)") 94 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "") 95 | x$Country <- rename_countries(x$Country) 96 | x$Player <- stringr::str_replace(x$Player, " \\([a-zA-Z \\-]+\\)", "") 97 | } 98 | 99 | # Re-order and select columns 100 | vars <- colnames(x) 101 | if (career) { 102 | varorder <- c( 103 | "Player", "Country", "Start", "End", "Matches", "Innings", "Overs", "Balls", "Maidens", "Runs",
"Wickets", 104 | "Average", "Economy", "StrikeRate", "BestBowlingInnings", "BestBowlingMatch", "FourWickets", "FiveWickets", "TenWickets" 105 | ) 106 | } else { 107 | varorder <- c( 108 | "Date", "Player", "Country", "Overs", "Balls", "Maidens", "Runs", "Wickets", 109 | "Economy", "Innings", "Participation", "Opposition", "Ground" 110 | ) 111 | } 112 | varorder <- varorder[varorder %in% vars] 113 | 114 | return(x[, varorder]) 115 | } 116 | -------------------------------------------------------------------------------- /R/countries.R: -------------------------------------------------------------------------------- 1 | men <- data.frame( 2 | team = c(1:9, 11, 12, 14, 15, 17, 19, 20, 25, 26, 27, 28, 29, 30, 32, 37, 40), 3 | name = c( 4 | "England", 5 | "Australia", 6 | "South Africa", 7 | "West Indies", 8 | "New Zealand", 9 | "India", 10 | "Pakistan", 11 | "Sri Lanka", 12 | "Zimbabwe", 13 | "United States of America", 14 | "Bermuda", 15 | "East Africa", 16 | "Netherlands", 17 | "Canada", 18 | "Hong Kong", 19 | "Papua New Guinea", 20 | "Bangladesh", 21 | "Kenya", 22 | "United Arab Emirates", 23 | "Namibia", 24 | "Ireland", 25 | "Scotland", 26 | "Nepal", 27 | "Oman", 28 | "Afghanistan" 29 | ) 30 | ) 31 | 32 | women <- data.frame( 33 | team = c(289, 1026, 1863, 4240, 3379, 3672, 2614, 2285, 3022, 2461, 3867, 825, 3808, 2331, 3505, 3843), 34 | name = c( 35 | "Australia", 36 | "England", 37 | "India", 38 | "Bangladesh", 39 | "South Africa", 40 | "Sri Lanka", 41 | "New Zealand", 42 | "Ireland", 43 | "Pakistan", 44 | "Netherlands", 45 | "West Indies", 46 | "Denmark", 47 | "Jamaica", 48 | "Japan", 49 | "Scotland", 50 | "Trinidad & Tobago" 51 | ) 52 | ) 53 | 54 | men <- tibble::as_tibble(men[order(men$name), ], .name_repair = "check_unique") |> 55 | dplyr::mutate( 56 | team = as.integer(team), 57 | name = as.character(name) 58 | ) 59 | women <- tibble::as_tibble(women[order(women$name), ], .name_repair = "check_unique") |> 60 | dplyr::mutate( 61 | team = as.integer(team), 62 
| name = as.character(name) 63 | ) 64 | 65 | 66 | ## Men 67 | # 1 England 68 | # 2 Australia 69 | # 3 South Africa 70 | # 4 West Indies 71 | # 5 New Zealand 72 | # 6 India 73 | # 7 Pakistan 74 | # 8 Sri Lanka 75 | # 9 Zimbabwe 76 | # 11 United States of America 77 | # 12 Bermuda 78 | # 14 East Africa 79 | # 15 Netherlands 80 | # 17 Canada 81 | # 19 Hong Kong 82 | # 20 Papua New Guinea 83 | # 25 Bangladesh 84 | # 26 Kenya 85 | # 27 United Arab Emirates 86 | # 28 Namibia 87 | # 29 Ireland 88 | # 30 Scotland 89 | # 32 Nepal 90 | # 37 Oman 91 | # 40 Afghanistan 92 | 93 | ## Women 94 | 95 | # 289 Australia Women 96 | # 1026 England Women 97 | # 1863 India Women 98 | # 4240 Bangladesh Women 99 | # 3379 South Africa Women 100 | # 3672 Sri Lanka Women 101 | # 2614 New Zealand Women 102 | # 2285 Ireland Women 103 | # 3022 Pakistan Women 104 | # 2461 Netherlands Women 105 | # 3867 West Indies Women 106 | # 825 Denmark Women 107 | # 3808 Jamaica Women 108 | # 2331 Japan Women 109 | # 3505 Scotland Women 110 | # 3843 Trinidad & Tobago Women 111 | 112 | 113 | 114 | rename_countries <- function(countries) { 115 | countries |> 116 | stringr::str_replace("AFG", "Afghanistan") |> 117 | stringr::str_replace("Afr$", "Africa XI") |> 118 | stringr::str_replace("AUS", "Australia") |> 119 | stringr::str_replace("Bdesh|BDESH|BD", "Bangladesh") |> 120 | stringr::str_replace("BMUDA", "Bermuda") |> 121 | stringr::str_replace("CAN", "Canada") |> 122 | stringr::str_replace("DnWmn|Denmk", "Denmark") |> 123 | stringr::str_replace("EAf", "East (and Central) Africa") |> 124 | stringr::str_replace("ENG", "England") |> 125 | stringr::str_replace("HKG", "Hong Kong") |> 126 | stringr::str_replace("ICC$", "ICC World XI") |> 127 | stringr::str_replace("INDIA|IND", "India") |> 128 | stringr::str_replace("IntWn|Int XI", "International XI") |> 129 | stringr::str_replace("Ire$|IRELAND|IRE", "Ireland") |> 130 | stringr::str_replace("JamWn", "Jamaica") |> 131 | stringr::str_replace("JPN", "Japan") |> 132 |
stringr::str_replace("KENYA", "Kenya") |> 133 | stringr::str_replace("NAM", "Namibia") |> 134 | stringr::str_replace("NEPAL", "Nepal") |> 135 | stringr::str_replace("Neth$|NL", "Netherlands") |> 136 | stringr::str_replace("NZ", "New Zealand") |> 137 | stringr::str_replace("OMAN", "Oman") |> 138 | stringr::str_replace("PAK", "Pakistan") |> 139 | stringr::str_replace("PNG|P\\.N\\.G\\.", "Papua New Guinea") |> 140 | stringr::str_replace("^SA", "South Africa") |> 141 | stringr::str_replace("SCOT|SCO|Scot$", "Scotland") |> 142 | stringr::str_replace("SL", "Sri Lanka") |> 143 | stringr::str_replace("TTWmn|T \\& T", "Trinidad and Tobago") |> 144 | stringr::str_replace("UAE|U\\.A\\.E\\.", "United Arab Emirates") |> 145 | stringr::str_replace("USA|U\\.S\\.A\\.", "United States of America") |> 146 | stringr::str_replace("World$|World-XI", "World XI") |> 147 | stringr::str_replace("WI", "West Indies") |> 148 | stringr::str_replace("YEWmn|Y\\. Eng", "Young England") |> 149 | stringr::str_replace("ZIM", "Zimbabwe") 150 | } 151 | -------------------------------------------------------------------------------- /R/fetch_player.R: -------------------------------------------------------------------------------- 1 | #' Fetch Player Data 2 | #' 3 | #' Fetch individual player data from all matches played. The function will scrape 4 | #' the data from ESPNCricinfo and return a tibble with one line per innings for all 5 | #' games a player has played. To identify a player, use their Cricinfo player ID. 6 | #' The simplest way to find this is to look up their Cricinfo profile page. The number 7 | #' at the end of the URL is the ID. For example, Meg Lanning's profile page is 8 | #' http://www.espncricinfo.com/australia/content/player/329336.html, 9 | #' so her ID is 329336. 10 | #' 11 | #' @param playerid The player ID as given in the Cricinfo profile. Integer or character. 12 | #' @param matchtype Which type of cricket matches do you want? Tests, ODIs or T20s? Not case-sensitive.
13 | #' @param activity Which type of activity do you want? Batting, Bowling or Fielding? Not case-sensitive. 14 | #' 15 | #' @return A tibble containing data on the selected player, with one row for every innings 16 | #' of every match in which they have played. 17 | #' @author Rob J Hyndman and Sayani Gupta 18 | #' @seealso [find_player_id()] to find a player ID by searching on their name, 19 | #' and [fetch_player_meta()] to download meta data for players. 20 | #' @examples 21 | #' \dontrun{ 22 | #' # Download data on some players 23 | #' EllysePerry <- fetch_player_data(275487, "T20", "batting") 24 | #' RahulDravid <- fetch_player_data(28114, "ODI", "fielding") 25 | #' LasithMalinga <- fetch_player_data(49758, "Test", "bowling") 26 | #' 27 | #' # Create a plot for Ellyse Perry's T20 scores 28 | #' library(dplyr) 29 | #' library(ggplot2) 30 | #' EllysePerry |> 31 | #' filter(!is.na(Runs)) |> 32 | #' ggplot(aes(x = Date, y = Runs, col = Dismissal)) + 33 | #' geom_point() + 34 | #' ggtitle("Ellyse Perry's T20 Scores") 35 | #' } 36 | #' @export 37 | fetch_player_data <- function( 38 | playerid, 39 | matchtype = c("test", "odi", "t20"), 40 | activity = c("batting", "bowling", "fielding") 41 | ) { 42 | matchtype <- tolower(matchtype) 43 | matchtype <- match.arg(matchtype) 44 | 45 | activity <- tolower(activity) 46 | activity <- match.arg(activity) 47 | 48 | matchclass <- match(matchtype, c("test", "odi", "t20")) 49 | 50 | # Try male URL 51 | output <- get_cricinfo_data(playerid, matchclass, matchtype, activity) 52 | if (inherits(output, "character")) { 53 | if (output %in% c("No records")) { 54 | # Player exists.
So try female URL 55 | output <- get_cricinfo_data(playerid, matchclass + 7L, matchtype, activity) 56 | } 57 | } 58 | if (inherits(output, "character")) { 59 | if (output == "No player") { 60 | stop("Player not found") 61 | } else if (output == "No records") { 62 | stop("Player did not play this format") 63 | } 64 | } 65 | 66 | # Remove columns that are entirely missing 67 | tab <- tibble::as_tibble( 68 | output[, colSums(is.na(output)) != NROW(output)], 69 | .name_repair = "check_unique" 70 | ) 71 | 72 | # Convert "-" to NA 73 | tab[tab == "-"] <- NA 74 | 75 | # Convert some columns to numeric or Date 76 | tab$Innings <- as.integer(tab$Inns) 77 | tab$Date <- lubridate::dmy(tab$`Start Date`) 78 | tab$`Start Date` <- NULL 79 | tab$Opposition <- substring(tab$Opposition, 3) 80 | tab$Ground <- as.character(tab$Ground) 81 | if ("Mins" %in% colnames(tab)) { 82 | tab$Mins <- as.numeric(tab$Mins) 83 | } 84 | 85 | # Make tidy column names 86 | tidy.col <- make.names(colnames(tab), unique = TRUE) 87 | colnames(tab) <- gsub(".", "_", tidy.col, fixed = TRUE) 88 | tidy.col <- colnames(tab) 89 | 90 | ## Common leading columns, the same for all activities 91 | com_col <- c("Date", "Innings", "Opposition", "Ground") 92 | 93 | ## Removing "*" in the column `Runs` and converting it to numeric 94 | if ("Runs" %in% colnames(tab)) { 95 | tab$Runs <- suppressWarnings(as.numeric(gsub( 96 | "*", 97 | "", 98 | x = tab$Runs, 99 | fixed = TRUE 100 | ))) 101 | } 102 | 103 | # Reorder columns 104 | return( 105 | tab[, c(com_col, tidy.col[!tidy.col %in% com_col])] 106 | ) 107 | } 108 | 109 | get_cricinfo_data <- function(playerid, matchclass, matchtype, activity) { 110 | url <- paste( 111 | "http://stats.espncricinfo.com/ci/engine/player/", 112 | playerid, 113 | ".html?class=", 114 | matchclass, 115 | ";template=results;type=", 116 | activity, 117 | ";view=innings;wrappertype=print", 118 | sep = "" 119 | ) 120 | raw <- try(xml2::read_html(url), silent = TRUE) 121 | if (!("try-error"
%in% class(raw))) { 122 | output <- rvest::html_table(raw) 123 | if ("No records available to match this query" %in% unlist(output)) { 124 | return("No records") 125 | } else { 126 | # Grab relevant table 127 | return(output[[4]]) 128 | } 129 | } else { 130 | return("No player") 131 | } 132 | } 133 | -------------------------------------------------------------------------------- /vignettes/cricinfo.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "ESPNCricinfo data" 3 | author: Rob J Hyndman 4 | output: 5 | rmarkdown::html_vignette: 6 | fig_width: 10 7 | fig_height: 5 8 | vignette: > 9 | %\VignetteIndexEntry{ESPNCricinfo data} 10 | %\VignetteEngine{knitr::rmarkdown} 11 | %\VignetteEncoding{UTF-8} 12 | --- 13 | 14 | ```{r, include = FALSE} 15 | knitr::opts_chunk$set( 16 | collapse = TRUE, 17 | comment = "#>", 18 | echo = TRUE 19 | ) 20 | # Okabe-Ito colours 21 | options( 22 | ggplot2.discrete.colour = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442") 23 | ) 24 | ``` 25 | 26 | ```{r setup} 27 | library(cricketdata) 28 | library(dplyr) 29 | library(ggplot2) 30 | ``` 31 | 32 | The `fetch_cricinfo()` function will fetch data on all international cricket matches provided by ESPNCricinfo. Please respect the [ESPNCricinfo terms of use](https://www.espncricinfo.com/ci/content/site/company/terms_use.html) when using this function. 33 | 34 | Here are some examples of its use. 35 | 36 | ```{r getdata, eval=FALSE, echo=FALSE} 37 | # Avoid downloading the data when the package is checked by CRAN.
38 | # This only needs to be run once to store the data locally 39 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling") 40 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia") 41 | Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India") 42 | meg_lanning_id <- find_player_id("Meg Lanning")$ID 43 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |> 44 | mutate(NotOut = (Dismissal == "not out")) |> 45 | mutate(NotOut = tidyr::replace_na(NotOut, FALSE)) 46 | 47 | saveRDS(wt20, here::here("inst/extdata/wt20.rds")) 48 | saveRDS(menODI, here::here("inst/extdata/menODI.rds")) 49 | saveRDS(Indfielding, here::here("inst/extdata/Indfielding.rds")) 50 | saveRDS(MegLanning, here::here("inst/extdata/MegLanning.rds")) 51 | ``` 52 | 53 | ```{r loaddata, include=FALSE} 54 | wt20 <- readRDS("../inst/extdata/wt20.rds") 55 | menODI <- readRDS("../inst/extdata/menODI.rds") 56 | Indfielding <- readRDS("../inst/extdata/Indfielding.rds") 57 | MegLanning <- readRDS("../inst/extdata/MegLanning.rds") 58 | ``` 59 | 60 | ## Women's T20 bowling data 61 | 62 | ```r 63 | # Fetch all Women's T20 data 64 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling") 65 | ``` 66 | 67 | ```{r woment20, message=FALSE, echo = FALSE} 68 | wt20 |> 69 | head() |> 70 | knitr::kable(digits = 2) 71 | ``` 72 | 73 | ```{r woment20graph, fig.width=10, fig.height=8} 74 | wt20 |> 75 | filter(Wickets > 20, !is.na(Country)) |> 76 | ggplot(aes(y = StrikeRate, x = Country)) + 77 | geom_boxplot() + 78 | geom_point(alpha = 0.3, col = "blue") + 79 | ggtitle("Women T20: Strike Rates") + 80 | ylab("Balls per wicket") + 81 | coord_flip() 82 | ``` 83 | 84 | ## Australian men's ODI data by innings 85 | 86 | ```r 87 | # Fetch all Australian Men's ODI data by innings 88 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia") 89 | ``` 90 | 91 | ```{r menodi, message=FALSE, echo=FALSE} 92 | menODI |> 93 | head() |> 94 | knitr::kable() 95 
| ``` 96 | 97 | ```{r menodigraph, warning=FALSE, message=FALSE} 98 | menODI |> 99 | ggplot(aes(y = Runs, x = Date)) + 100 | geom_point(alpha = 0.2, col = "#D55E00") + 101 | geom_smooth() + 102 | ggtitle("Australia Men ODI: Runs per Innings") 103 | ``` 104 | 105 | ## Indian test fielding data 106 | 107 | ```r 108 | Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India") 109 | ``` 110 | 111 | ```{r indiafielding, echo=FALSE} 112 | Indfielding |> 113 | head() |> 114 | knitr::kable() 115 | ``` 116 | 117 | ```{r indiafieldinggraph} 118 | Indfielding |> 119 | mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) |> 120 | ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) + 121 | geom_point() + 122 | ggtitle("Indian Men Test Fielding") 123 | ``` 124 | 125 | ## Meg Lanning's ODI batting 126 | 127 | ```r 128 | meg_lanning_id <- find_player_id("Meg Lanning")$ID 129 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |> 130 | mutate(NotOut = (Dismissal == "not out")) |> 131 | mutate(NotOut = tidyr::replace_na(NotOut, FALSE)) 132 | ``` 133 | 134 | ```{r meglanning, echo=FALSE} 135 | MegLanning |> 136 | head() |> 137 | knitr::kable() 138 | ``` 139 | 140 | ```{r meglanninggraph} 141 | # Compute batting average 142 | MLave <- MegLanning |> 143 | summarise( 144 | Innings = sum(!is.na(Runs)), 145 | Average = sum(Runs, na.rm = TRUE) / (Innings - sum(NotOut, na.rm=TRUE)) 146 | ) |> 147 | pull(Average) 148 | names(MLave) <- paste("Average =", round(MLave, 2)) 149 | # Plot ODI scores 150 | ggplot(MegLanning) + 151 | geom_hline(aes(yintercept = MLave), col = "gray") + 152 | geom_point(aes(x = Date, y = Runs, col = NotOut)) + 153 | ggtitle("Meg Lanning ODI Scores") + 154 | scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave)) 155 | ``` 156 | -------------------------------------------------------------------------------- /R/fetch_cricsheet.R: -------------------------------------------------------------------------------- 1 | # For retrieving 
ball-by-ball data --------------------------------------------- 2 | 3 | #' Fetch ball-by-ball, match and player data from Cricsheet and return a tibble. 4 | #' 5 | #' Download csv data from Cricsheet \url{https://cricsheet.org/downloads/}. 6 | #' Data must be specified by three factors: 7 | #' (a) type of data: `bbb` (ball-by-ball), `match` or `player`. 8 | #' (b) gender; 9 | #' (c) competition specified as a Cricsheet code. See \code{\link{cricsheet_codes}} for the 10 | #' competitions and codes available. 11 | #' 12 | #' @param type Character string giving type of data: ball-by-ball, match info or player info. 13 | #' @param gender Character string giving player gender: female or male. 14 | #' @param competition Character string giving code corresponding to competition. See \code{\link{cricsheet_codes}} for the 15 | #' competitions and codes available. 16 | #' @author Jacquie Tran, Hassan Rafique and Rob J Hyndman 17 | #' @return A \code{tibble} object, similar to a \code{data.frame}. 
18 | #' @examples 19 | #' \dontrun{ 20 | #' wbbl_bbb <- fetch_cricsheet(competition = "wbbl", type = "bbb") 21 | #' wbbl_match <- fetch_cricsheet(competition = "wbbl", type = "match") 22 | #' wbbl_player <- fetch_cricsheet(competition = "wbbl", type = "player") 23 | #' } 24 | #' @export 25 | 26 | fetch_cricsheet <- function( 27 | type = c("bbb", "match", "player"), 28 | gender = c("female", "male"), 29 | competition = "tests") { 30 | # Match arguments 31 | type <- match.arg(type) 32 | gender <- match.arg(gender) 33 | 34 | # Convert code for backwards compatibility 35 | competition <- dplyr::recode(competition, 36 | county = "cch", 37 | edwards_cup = "cec", 38 | heyhoe_flint_trophy = "rhf", 39 | multi_day = "mdms", 40 | sheffield_shield = "ssh", 41 | super_smash = "ssm", 42 | the_hundred = "hnd", 43 | t20_blast = "ntb", 44 | t20is = "t20s", 45 | t20is_unofficial = "it20s", 46 | wbbl = "wbb", 47 | wt20c = "wtc" 48 | ) 49 | # Construct file names and url 50 | destfile <- paste0(competition, "_", gender, "_csv2.zip") 51 | url <- paste0("https://cricsheet.org/downloads/", destfile) 52 | subdir <- paste0(sub("_csv2.zip", "", destfile), "_bbb") 53 | destfile <- file.path(tempdir(), destfile) 54 | 55 | # Download zip file from Cricsheet.org if it hasn't already been downloaded 56 | if (!file.exists(destfile)) { 57 | download.file(url, destfile) 58 | } 59 | 60 | # List all files in zip 61 | check_files <- unzip(destfile, exdir = tempdir(), list = TRUE)$Name 62 | check_files <- data.frame(check_files = check_files) 63 | check_files$check_files <- as.character(check_files$check_files) 64 | check_files$file_type <- dplyr::case_when( 65 | stringr::str_detect(check_files$check_files, "txt") ~ "txt", 66 | stringr::str_detect(check_files$check_files, "_info") ~ "info", 67 | stringr::str_detect(check_files$check_files, "all_matches") ~ "allbbb", 68 | TRUE ~ "bbb" 69 | ) 70 | 71 | # Identify the required files 72 | if (type == "bbb") { 73 | if ("all_matches.csv" %in% 
check_files$check_files) { 74 | match_files <- "all_matches.csv" 75 | } else { 76 | match_files <- check_files$check_files[check_files$file_type == "bbb"] 77 | } 78 | } else { 79 | match_files <- check_files$check_files[check_files$file_type == "info"] 80 | } 81 | # Unzip files into sub directory 82 | unzip(destfile, match_files, exdir = file.path(tempdir(), subdir)) 83 | 84 | # List match files with full file paths 85 | match_filepaths <- file.path(tempdir(), subdir, match_files) 86 | 87 | if (type == "bbb") { 88 | # Read data from CSVs stored in the temp directory 89 | all_matches <- do.call( 90 | "rbind", 91 | lapply(match_filepaths, FUN = function(files) { 92 | read.csv(files) 93 | }) 94 | ) 95 | } else { 96 | all_matches <- suppressWarnings( 97 | readr::read_csv( 98 | match_filepaths, 99 | id = "path", guess_max = 100, 100 | col_names = c("col_to_delete", "key", "value"), 101 | skip = 1, show_col_types = FALSE, 102 | col_types = readr::cols(.default = readr::col_character()) 103 | ) 104 | ) 105 | # Note: Warning suppressed because the source data 106 | # changes format slightly when displaying player metadata compared to match data 107 | # Match metadata is in key-value pairs, 108 | # while player metadata contains additional value columns 109 | # We can safely suppress the warning(s) here and deal with the different 110 | # formats below. 111 | 112 | # Tidy up and subset to match metadata only 113 | # (i.e., excluding player / people metadata) 114 | # Note: Warning suppressed again as per note above. 
115 | all_matches$col_to_delete <- NULL 116 | all_matches$match_id <- stringr::str_extract(all_matches$path, "[a-zA-Z0-9_\\-\\.]*$") 117 | all_matches$match_id <- sub("_info.csv", "", all_matches$match_id) 118 | all_matches$path <- NULL 119 | if (type == "match") { 120 | all_matches <- all_matches[!(all_matches$key %in% c("player", "players", "registry")), ] 121 | # Find columns with multiple values per key/match_id 122 | # Rows with second teams named 123 | j <- which(all_matches$key == "team") 124 | j <- j[seq(2, length(j), by = 2)] 125 | all_matches$key[j] <- "team2" 126 | all_matches$key[all_matches$key == "team"] <- "team1" 127 | # Rows with second umpires named 128 | j <- which(all_matches$key == "umpire") 129 | j <- j[seq(2, length(j), by = 2)] 130 | all_matches$key[j] <- "umpire2" 131 | all_matches$key[all_matches$key == "umpire"] <- "umpire1" 132 | # Make into wide form 133 | all_matches <- tidyr::pivot_wider(all_matches, 134 | id_cols = "match_id", 135 | names_from = "key", 136 | values_from = "value", 137 | values_fill = NA, 138 | values_fn = ~ head(.x, 1) # To remove duplicated values such as date 139 | ) 140 | all_matches <- dplyr::mutate_all(all_matches, ~ replace(., . == "NULL", NA_character_)) 141 | } else { 142 | all_matches <- all_matches[all_matches$key %in% c("player", "players"), ] 143 | all_matches$key <- NULL 144 | all_matches <- tidyr::separate(all_matches, value, sep = ",", c("team", "player")) 145 | } 146 | } 147 | output <- tibble::as_tibble(all_matches) 148 | 149 | # Clean data 150 | # Was it a T20 match? 
151 | if (!("ball" %in% colnames(output))) { 152 | t20 <- FALSE 153 | } else { 154 | t20 <- max(output$ball, na.rm = TRUE) <= 21 155 | } 156 | if (type == "bbb" && t20) { 157 | output <- cleaning_bbb_t20_cricsheet(output) 158 | } 159 | 160 | return(output) 161 | } 162 | 163 | # Function to clean raw t20 bbb data from cricsheet 164 | # Provided by 165 | cleaning_bbb_t20_cricsheet <- function(df) { 166 | df <- df |> 167 | dplyr::mutate( 168 | # Wicket lost 169 | wicket = !(wicket_type %in% c("", "retired hurt")), 170 | # Over number 171 | over = ceiling(ball), 172 | # Extra ball to follow 173 | extra_ball = (!is.na(wides) | !is.na(noballs)) 174 | ) |> 175 | dplyr::group_by(match_id, innings, over) |> 176 | # Renumber balls within each over by row order, because the numeric ball value cannot 177 | # distinguish 1.1 from 1.10 (the first and tenth deliveries of an over, respectively) 178 | dplyr::mutate(ball = dplyr::row_number()) |> 179 | dplyr::ungroup() 180 | 181 | # Evaluating and joining runs scored, wickets lost at each stage of an innings 182 | df <- df |> 183 | dplyr::inner_join( 184 | df |> 185 | dplyr::group_by(match_id, innings) |> 186 | dplyr::reframe( 187 | runs_scored_yet = cumsum(runs_off_bat + extras), 188 | wickets_lost_yet = cumsum(wicket), 189 | ball = ball, 190 | over = over 191 | ), 192 | by = c("match_id", "innings", "over", "ball") 193 | ) 194 | 195 | # Evaluating the balls in over after adjusting for extra balls and balls remaining in an innings 196 | remaining_balls <- df |> 197 | dplyr::group_by(match_id, innings, over) |> 198 | dplyr::reframe(ball = ball, extra_ball = cumsum(extra_ball)) |> 199 | dplyr::mutate( 200 | ball_in_over = ball - extra_ball, 201 | balls_remaining = ifelse(innings %in% c(1, 2), 120 - ((over - 1) * 6 + ball_in_over), 6 - ball_in_over) 202 | ) |> 203 | dplyr::select(-extra_ball) 204 | 205 | # Evaluating innings totals using ball-by-ball data 206 | innings_total <- df |> 207 | dplyr::group_by(match_id, innings) |> 208 |
dplyr::reframe(total_score = sum(runs_off_bat + extras)) |> 209 | tidyr::pivot_wider( 210 | names_from = "innings", 211 | values_from = c("total_score") 212 | ) |> 213 | dplyr::rename(innings1_total = "1", innings2_total = "2") |> 214 | dplyr::select(match_id, innings1_total, innings2_total) 215 | 216 | # Joining all the dfs 217 | df <- df |> 218 | dplyr::inner_join(remaining_balls, by = c("match_id", "innings", "over", "ball")) |> 219 | dplyr::inner_join(innings_total, by = "match_id") |> 220 | dplyr::mutate(target = innings1_total + 1) |> 221 | dplyr::mutate(start_date = as.Date(start_date)) 222 | 223 | # Re-ordering the columns in the df 224 | df <- df |> 225 | dplyr::select( 226 | match_id, season, start_date, venue, innings, over, ball, batting_team, 227 | bowling_team, striker, non_striker, bowler, runs_off_bat, extras, 228 | ball_in_over, extra_ball, balls_remaining, runs_scored_yet, 229 | wicket, wickets_lost_yet, innings1_total, innings2_total, target, 230 | wides, noballs, byes, legbyes, penalty, wicket_type, player_dismissed, 231 | other_wicket_type, other_player_dismissed, dplyr::everything() 232 | ) 233 | 234 | return(df) 235 | } 236 | 237 | utils::globalVariables(c( 238 | "ball_in_over", 239 | "ball", 240 | "balls_remaining", 241 | "batting_team", 242 | "bowler", 243 | "bowling_team", 244 | "byes", 245 | "col_to_delete", 246 | "competition_type", 247 | "exclude_flag", 248 | "extra_ball", 249 | "extras", 250 | "glued_url", 251 | "innings", 252 | "innings1_total", 253 | "innings2_total", 254 | "key", 255 | "legbyes", 256 | "match_id", 257 | "noballs", 258 | "non_striker", 259 | "other_player_dismissed", 260 | "other_wicket_type", 261 | "over", 262 | "path_start", 263 | "path", 264 | "penalty", 265 | "player_dismissed", 266 | "runs_off_bat", 267 | "runs_scored_yet", 268 | "season", 269 | "start_date", 270 | "striker", 271 | "target", 272 | "value", 273 | "venue", 274 | "wicket_type", 275 | "wicket", 276 | "wickets_lost_yet", 277 | "wides" 278 | )) 279 | 
-------------------------------------------------------------------------------- /vignettes/cricketdata_R_pkg.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "cricketdata: An Open Source R package" 3 | author: Hassan Rafique and Jacquie Tran 4 | date: "18 October 2022" 5 | abstract: "Open and accessible data streams are crucial for reproducible research and further development. Cricket data sources are limited and are usually not in a format ready for analysis. The [`cricketdata`](http://pkg.robjhyndman.com/cricketdata/) R package lets users download data, as tibbles ready for analysis, from two primary sources: ESPNCricinfo and Cricsheet. The [fetch_cricinfo()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_cricinfo.html) and [fetch_player_data()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_player_data.html) functions download ESPNCricinfo data for the different formats of international cricket (Tests, ODIs, T20Is), for batting, bowling or fielding, and at either the career or the innings level. Cricsheet is another data source, primarily for ball-by-ball data. The [fetch_cricsheet()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_cricsheet.html) function downloads ball-by-ball, match and player data for different competitions and formats (Tests, ODIs, T20 internationals, T20 leagues). The T20 ball-by-ball data is further processed by deriving additional features (columns) from the raw data. Other [functions](http://pkg.robjhyndman.com/cricketdata/reference/fetch_player_meta.html) provide access to individual players' career data and to information about their playing style, country of origin, etc. The package provides almost all publicly available cricket data in a form ready for analysis, saving users significant time that would otherwise be spent building a data pipeline.
Here's an example of a project built using `cricketdata`: " 6 | bibliography: bibliography.bib 7 | bibengine: biblatex 8 | output: 9 | rmarkdown::html_vignette: 10 | fig_width: 8 11 | fig_height: 5 12 | vignette: > 13 | %\VignetteIndexEntry{cricketdata: An Open Source R package} 14 | %\VignetteEngine{knitr::rmarkdown} 15 | %\VignetteEncoding{UTF-8} 16 | --- 17 | 18 | ```{r, include = FALSE} 19 | knitr::opts_chunk$set( 20 | collapse = TRUE, 21 | comment = "#>", 22 | echo = TRUE, 23 | cache = TRUE, 24 | warning = FALSE 25 | ) 26 | ``` 27 | 28 | ```{r} 29 | library(cricketdata) 30 | library(dplyr) 31 | library(ggplot2) 32 | ``` 33 | 34 | ```{r getdata, eval=FALSE, echo=FALSE} 35 | # Avoid downloading the data when the package is checked by CRAN. 36 | # This only needs to be run once to store the data locally 37 | ipl_bbb <- fetch_cricsheet("bbb", "male", "ipl") 38 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling") 39 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", 40 | type = "innings", 41 | country = "United States of America" 42 | ) 43 | meg_lanning_id <- find_player_id("Meg Lanning")$ID 44 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |> 45 | mutate(NotOut = (Dismissal == "not out")) 46 | aus_women <- fetch_player_meta(c(329336, 275487)) 47 | 48 | saveRDS(wt20, here::here("inst/extdata/wt20.rds")) 49 | saveRDS(menODI, here::here("inst/extdata/usmenODI.rds")) 50 | saveRDS(MegLanning, here::here("inst/extdata/MegLanning.rds")) 51 | saveRDS(meg_lanning_id, here::here("inst/extdata/meg_lanning_id.rds")) 52 | saveRDS(ipl_bbb, here::here("inst/extdata/ipl_bbb.rds")) 53 | saveRDS(aus_women, here::here("inst/extdata/aus_women.rds")) 54 | ``` 55 | 56 | ```{r loaddata, include=FALSE} 57 | ipl_bbb <- readRDS("../inst/extdata/ipl_bbb.rds") 58 | wt20 <- readRDS("../inst/extdata/wt20.rds") 59 | menODI <- readRDS("../inst/extdata/usmenODI.rds") 60 | MegLanning <- readRDS("../inst/extdata/MegLanning.rds") 61 | meg_lanning_id <-
readRDS("../inst/extdata/meg_lanning_id.rds") 62 | aus_women <- readRDS("../inst/extdata/aus_women.rds") 63 | ``` 64 | 65 | 66 | # Introduction 67 | 68 | The coverage of cricket as a sport has been limited compared to other global sports. [ESPN Cricinfo](https://www.espncricinfo.com) is the major online platform dedicated to cricket coverage, and one of the few. It started as [Cricinfo](https://en.wikipedia.org/wiki/ESPNcricinfo#/media/File:Cricinfo_in_1995.jpg) in the late 90s and was maintained by students and cricket fans who had immigrated to North America but were eager to keep tabs on cricket activity around the globe. [ESPN acquired Cricinfo](https://www.espncricinfo.com/story/espn-acquires-cricinfo-297655) in 2007, and it became ESPN Cricinfo. It is the most extensive repository of open cricket data, with the caveat that the data is not in a format that can be downloaded easily. You would have to copy and paste tables or write scripts to get the data into a format suitable for analysis. Recently they have added a search tool, [Statsguru](https://stats.espncricinfo.com/ci/engine/stats/index.html), that lets you query their database and usually presents the results as tables. 69 | 70 | [Cricsheet](https://cricsheet.org/) is another open data source for ball-by-ball data, maintained by a great fan of the game, [Stephen Rushe](https://twitter.com/srushe). Cricsheet provides raw ball-by-ball data for all formats (Test, ODI, T20) for both men's and women's games. Producing ball-by-ball data is an extensive project, and we hugely appreciate Stephen Rushe's work. The data is available in several formats, such as JSON, YAML, and CSV. 71 | 72 | ## Why `cricketdata` 73 | 74 | The open-source `cricketdata` package aims to be a one-stop shop for most cricket data from all primary sources, in an accessible form that is ready for analysis. 
Functions in the package download data from Cricinfo and Cricsheet as data frames (tibbles) in R. Users can access data from different formats of the game, e.g., Tests, ODIs, international T20s, league T20s, etc. In particular, the package provides: 75 | 76 | - ball-by-ball data, 77 | - individual player data by innings, 78 | - player data by team, either career or innings-by-innings, 79 | - player ID, date of birth, batting/bowling hand, and bowling type. 80 | 81 | [cricWAR](https://dazzalytics.shinyapps.io/cricwar/) is an example of a sports analytics project based on `cricketdata` resources. 82 | 83 | As an open-source project, `cricketdata` is inspired primarily by the open-source work of the `#rstats` community and sports analytics projects such as [`nflfastR`](https://www.nflfastr.com/) [@nflfastR] and [`sportsdataverse`](https://sportsdataverse.org/) [@dataverse]. 84 | 85 | In the following sections, we show how to install the package and take full advantage of its functionality with numerous examples. 86 | 87 | # Installation 88 | 89 | `cricketdata` is available on CRAN, and the *stable* version can be installed with: 90 | 91 | ```{r, eval=FALSE} 92 | install.packages("cricketdata", dependencies = TRUE) 93 | ``` 94 | 95 | You may also install the *development* version from [GitHub](https://github.com/robjhyndman/cricketdata): 96 | 97 | ```{r, eval=FALSE} 98 | install.packages("devtools") 99 | devtools::install_github("robjhyndman/cricketdata") 100 | ``` 101 | 102 | # Functions 103 | 104 | There are six main functions: 105 | 106 | - `fetch_cricinfo()` 107 | - `find_player_id()` 108 | - `fetch_player_data()` 109 | - `fetch_cricsheet()` 110 | - `fetch_player_meta()` 111 | - `update_player_meta()` 112 | 113 | and a data set containing player meta data: 114 | 115 | - `player_meta` 116 | 117 | We show the use of each function with examples below. 118 | 119 | ## `fetch_cricinfo()` 120 | 121 | Fetch team data on international cricket matches provided by ESPNCricinfo. 
It downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players. 122 | 123 | *Arguments* 124 | 125 | - matchtype: Character indicating test (default), odi, or t20. 126 | - sex: Character indicating men (default) or women. 127 | - activity: Character indicating batting (default), bowling or fielding. 128 | - type: Character indicating innings-by-innings or career (default) data. 129 | - country: Character indicating country. The default is to fetch data for all countries. 130 | 131 | **Women's T20 Bowling Data** 132 | 133 | ```{r, eval=FALSE, echo=TRUE} 134 | # Fetch all women's bowling data for the T20 format 135 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling") 136 | ``` 137 | 138 | ```{r tbl-wt20} 139 | # Look at the data 140 | wt20 |> 141 | glimpse() 142 | 143 | # Table showing selected columns of the data 144 | wt20 |> 145 | select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate) |> 146 | head() |> 147 | knitr::kable( 148 | digits = 2, align = "c", 149 | caption = "Career profiles of women players in international T20s" 150 | ) 151 | ``` 152 | 153 | ```{r fig-wt20SRvER, fig.cap="Strike rate (balls bowled per wicket) vs. average (runs conceded per wicket) for women's international T20 bowlers. 
Each observation represents one player who has taken at least 50 international wickets."} 154 | # Plotting Data 155 | wt20 |> 156 | filter(Wickets >= 50) |> 157 | ggplot(aes(y = StrikeRate, x = Average)) + 158 | geom_point(alpha = 0.3, col = "blue") + 159 | ggtitle("Women International T20 Bowlers") + 160 | ylab("Balls bowled per wicket") + 161 | xlab("Runs conceded per wicket") 162 | ``` 163 | 164 | **USA men's ODI data by innings** 165 | 166 | 167 | ```{r, echo=TRUE, eval=FALSE} 168 | # Fetch all USA Men's ODI data by innings 169 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", 170 | type = "innings", 171 | country = "United States of America" 172 | ) 173 | ``` 174 | 175 | ```{r tbl-USA100s} 176 | #| tbl-cap: Centuries (100 runs or more in a single innings) scored by USA batters 177 | # Table of USA players who have scored a century 178 | menODI |> 179 | filter(Runs >= 100) |> 180 | select(Player, Runs, BallsFaced, Fours, Sixes, Opposition) |> 181 | knitr::kable(digits = 2) 182 | ``` 183 | 184 | ```{r, echo=FALSE} 185 | # menODI |> 186 | # filter(Runs >= 50) |> 187 | # ggplot(aes(y = Runs, x = BallsFaced) ) + 188 | # geom_point(size = 2) + 189 | # geom_text(aes(label= Player), vjust=-0.5, color="#013369", 190 | # position = position_dodge(0.9), size=2) + 191 | # ylab("Runs Scored") + xlab("Balls Faced") 192 | ``` 193 | 194 | ## `find_player_id()` 195 | 196 | Each player has a player ID on ESPNCricinfo, which is needed to access an individual player's data. Given a string containing a player's name, or part of it, this function returns the name(s) of the matching player(s), their Cricinfo ID(s), and some other information. 
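The idea of a partial-name search can be illustrated offline. The base-R sketch below is not how `find_player_id()` works: the package function searches ESPNCricinfo rather than a local table. The `lookup` table and the `match_players()` helper are hypothetical, and the two IDs are the Cricinfo IDs for Meg Lanning and Ellyse Perry used later in this vignette.

```r
# Hypothetical lookup table (IDs as used for Lanning and Perry in this vignette)
lookup <- data.frame(
  name = c("Meg Lanning", "Ellyse Perry"),
  id = c(329336, 275487)
)

# Hypothetical helper: case-insensitive substring match on player names
match_players <- function(searchstring, lookup) {
  hits <- grepl(tolower(searchstring), tolower(lookup$name), fixed = TRUE)
  lookup[hits, , drop = FALSE]
}

match_players("lanning", lookup)
```

A very short search string such as `"e"` matches both rows, so a more specific string usually narrows the result to a single player.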
197 | 198 | *Argument* 199 | 200 | - searchstring: A string containing a player's name or part of the name. 201 | 202 | ```r 203 | # Fetch Meg Lanning's player ID 204 | meg_lanning_id <- find_player_id("Meg Lanning")$ID 205 | ``` 206 | 207 | ```{r} 208 | meg_lanning_id 209 | ``` 210 | 211 | ## `fetch_player_data()` 212 | 213 | Fetch individual player data from all matches played. The function scrapes the data from ESPNCricinfo and returns a tibble with one row per innings for all games a player has played. To identify a player, use their Cricinfo player ID. The simplest way to find this is to look up their Cricinfo profile page. The number at the end of the URL is the ID. For example, Meg Lanning's profile page is , so her ID is 329336. Alternatively, you may use the `find_player_id()` function. 214 | 215 | *Arguments* 216 | 217 | - playerid: The player's Cricinfo ID, as given in their Cricinfo profile. 218 | - matchtype: Character indicating test (default), odi, or t20. 219 | - activity: Character indicating batting (default), bowling or fielding. 220 | 221 | ```{r echo=TRUE, eval=FALSE} 222 | # Fetch Meg Lanning's playing data 223 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |> 224 | mutate(NotOut = (Dismissal == "not out")) 225 | ``` 226 | 227 | ```{r fig-meglanning, fig.cap="Meg Lanning, the Australian captain, has shown remarkable consistency, scoring centuries in every year of her career except 2021, when her highest score from 6 matches was 53."} 228 | dim(MegLanning) 229 | names(MegLanning) 230 | 231 | # Compute batting average 232 | MLave <- MegLanning |> 233 | filter(!is.na(Runs)) |> 234 | summarise(Average = sum(Runs) / (n() - sum(NotOut))) |> 235 | pull(Average) 236 | names(MLave) <- paste("Average =", round(MLave, 2)) 237 | 238 | # Plot ODI scores 239 | ggplot(MegLanning) + 240 | geom_hline(aes(yintercept = MLave), col = "gray") + 241 | geom_point(aes(x = Date, y = Runs, col = NotOut)) + 242 | ggtitle("Meg Lanning ODI Scores") + 243 | scale_y_continuous(sec.axis = sec_axis(~., breaks = 
MLave)) 244 | ``` 245 | 246 | ## `fetch_cricsheet()` 247 | 248 | [Cricsheet](https://cricsheet.org/) is the only openly accessible source of ball-by-ball cricket data. `fetch_cricsheet()` downloads CSV data from Cricsheet. Data must be specified by three factors: (a) type of data: bbb (ball-by-ball), match or player; (b) gender; (c) competition. See for what the competition character codes mean. 249 | 250 | The raw T20 data from Cricsheet is further processed to add more columns (features) to facilitate analysis. 251 | 252 | *Arguments* 253 | 254 | - type: Character string giving type of data: ball-by-ball, match info or player info. 255 | 256 | - gender: Character string giving player gender: female or male. 257 | 258 | - competition: Character string giving the name of the competition, e.g. ipl for the Indian Premier League, psl for the Pakistan Super League, tests for international Test matches, etc. 259 | 260 | **Indian Premier League (IPL) Ball-by-Ball Data** 261 | 262 | ```{r echo=TRUE, eval=FALSE} 263 | # Fetch all IPL ball-by-ball data 264 | ipl_bbb <- fetch_cricsheet("bbb", "male", "ipl") 265 | ``` 266 | 267 | ```{r} 268 | ipl_bbb |> 269 | glimpse() 270 | ``` 271 | 272 | ```{r fig-iplbatter, fig.cap="The 20 highest run-scorers in IPL 2022. We show the percentage of balls faced that each batter hit for a boundary (4 or 6) against the percentage of balls they did not score off (dot percent). 
Ideally, a batter wants to be in the top-left quadrant: high boundary % and low dot %."} 273 | # Boundary % vs dot % for the top 20 run-scorers in the IPL 2022 season 274 | ipl_bbb |> 275 | filter(season == "2022") |> 276 | group_by(striker) |> 277 | summarize( 278 | Runs = sum(runs_off_bat), BallsFaced = n() - sum(!is.na(wides)), 279 | StrikeRate = Runs / BallsFaced, DotPercent = sum(runs_off_bat == 0) * 100 / BallsFaced, 280 | BoundaryPercent = sum(runs_off_bat %in% c(4, 6)) * 100 / BallsFaced 281 | ) |> 282 | arrange(desc(Runs)) |> 283 | rename(Batter = striker) |> 284 | slice(1:20) |> 285 | ggplot(aes(y = BoundaryPercent, x = DotPercent, size = BallsFaced)) + 286 | geom_point(color = "red", alpha = 0.3) + 287 | geom_text(aes(label = Batter), 288 | vjust = -0.5, hjust = 0.5, color = "#013369", 289 | position = position_dodge(0.9), size = 3 290 | ) + 291 | ylab("Boundary Percent") + 292 | xlab("Dot Percent") + 293 | ggtitle("IPL 2022: Top 20 Batters") 294 | ``` 295 | 296 | ```{r tbl-IPL2022Batters} 297 | #| tbl-cap: The 10 highest run-scorers of the IPL 2022 season. JC Buttler scored the most runs in total and at the highest strike rate (runs per ball). His boundary percent (percentage of balls faced hit for 4s or 6s) is also the highest, while his dot percent (percentage of balls not scored off) is also among the highest. 298 | # The 10 highest run-scorers in the IPL 2022 season
299 | ipl_bbb |> 300 | filter(season == "2022") |> 301 | group_by(striker) |> 302 | summarize( 303 | Runs = sum(runs_off_bat), BallsFaced = n() - sum(!is.na(wides)), 304 | StrikeRate = Runs / BallsFaced, 305 | DotPercent = sum(runs_off_bat == 0) * 100 / BallsFaced, 306 | BoundaryPercent = sum(runs_off_bat %in% c(4, 6)) * 100 / BallsFaced 307 | ) |> 308 | arrange(desc(Runs)) |> 309 | rename(Batter = striker) |> 310 | slice(1:10) |> 311 | knitr::kable(digits = 1, align = "c") 312 | ``` 313 | 314 | ## `player_meta` 315 | 316 | This is a data set containing meta data on players and cricket officials, such as full name, country of representation, date of birth, bowling and batting hand, bowling style, and playing role. Data on more than 11,000 players and officials is available. The data was scraped from the ESPNCricinfo website. 317 | 318 | ```{r tbl-playermetadata} 319 | #| tbl-cap: Player and official meta data. 320 | player_meta |> 321 | filter(!is.na(playing_role)) |> 322 | select(-cricinfo_id, -unique_name, -name) |> 323 | head() |> 324 | knitr::kable( 325 | digits = 1, align = "c", format = "pipe", 326 | col.names = c( 327 | "ID", "FullName", "Country", "DOB", "BirthPlace", 328 | "BattingStyle", "BowlingStyle", "PlayingRole" 329 | ) 330 | ) 331 | ``` 332 | 333 | ## `fetch_player_meta()` 334 | 335 | Fetch players' meta data, such as full name, country of representation, date of birth, bowling and batting hand, bowling style, and playing role. This meta data is useful for advanced modeling, e.g., age curves, batter profiles against bowling types, etc. 336 | 337 | *Argument* 338 | 339 | - playerid: A vector of player IDs as given in Cricinfo profiles. Integer or character. 340 | 341 | Cricinfo player IDs can be obtained in several ways: use the `find_player_id()` function, take the ID from the player's Cricinfo profile page, or consult the `player_meta` data frame, which has meta data on more than 11,000 players. 
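For the age-curve use case, a player's age on a given match date can be derived from their date of birth. The base-R sketch below is a minimal illustration: the helper `age_on()` and the example dates are hypothetical. In practice the date of birth would come from `player_meta` or `fetch_player_meta()`, assuming that column can be parsed with `as.Date()`.

```r
# Hypothetical helper: age in completed years on `date` for someone born on `dob`
age_on <- function(dob, date) {
  dob <- as.Date(dob)
  date <- as.Date(date)
  years <- as.integer(format(date, "%Y")) - as.integer(format(dob, "%Y"))
  # Subtract a year if the birthday has not yet been reached in that year
  before_birthday <- format(date, "%m-%d") < format(dob, "%m-%d")
  years - before_birthday
}

# Example with made-up dates: the day before, and the day of, a 30th birthday
age_on("1990-06-15", c("2020-06-14", "2020-06-15"))
```

Counting completed years this way avoids the off-by-one errors that can arise from dividing a day count by 365.25.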
342 | 343 | 344 | ```{r echo=TRUE, eval=FALSE} 345 | # Download meta data on Meg Lanning and Ellyse Perry 346 | aus_women <- fetch_player_meta(c(329336, 275487)) 347 | ``` 348 | 349 | ```{r tbl-ausplayermetadata} 350 | #| tbl-cap: Australian women players' meta data. 351 | aus_women |> 352 | select(-name) |> 353 | knitr::kable( 354 | digits = 1, align = "c", format = "pipe", 355 | col.names = c( 356 | "ID", "FullName", "Country", "DOB", "BattingStyle", "BowlingStyle" 357 | ) 358 | ) 359 | ``` 360 | 361 | ## `update_player_meta()` 362 | 363 | This function is intended to check the directory of all players available on the Cricsheet website and add the meta data of any new players to the `player_meta` data frame. The data for new players is scraped from ESPNCricinfo. 364 | 365 | # References 366 | -------------------------------------------------------------------------------- /vignettes/cricsheet.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Cricsheet data" 3 | author: Jacquie Tran 4 | date: 7 November 2021 5 | output: 6 | rmarkdown::html_vignette: 7 | fig_width: 8 8 | fig_height: 5 9 | vignette: > 10 | %\VignetteIndexEntry{Cricsheet data} 11 | %\VignetteEngine{knitr::rmarkdown} 12 | %\VignetteEncoding{UTF-8} 13 | --- 14 | 15 | ```{r, include = FALSE} 16 | knitr::opts_chunk$set( 17 | collapse = TRUE, 18 | comment = "#>", 19 | echo = TRUE 20 | ) 21 | ``` 22 | 23 | ```{r setup} 24 | library(cricketdata) 25 | library(readr) 26 | library(dplyr) 27 | library(stringr) 28 | library(showtext) 29 | library(ggplot2) 30 | library(gghighlight) 31 | library(ggtext) 32 | library(patchwork) 33 | ``` 34 | 35 | The `fetch_cricsheet()` function downloads CSV data from [Cricsheet](https://cricsheet.org/downloads/). Data must be specified by three factors: (a) `type` of data: `bbb` (ball-by-ball), `match` or `player`; (b) `gender`; (c) `competition`. 
36 | 37 | 38 | Here are some examples of its use with some WBBL data. An earlier version of this article was published at https://underthehood.jacquietran.com/post/wbbl-visualisations/. 39 | 40 | ```{r getdata, eval=FALSE, echo=FALSE} 41 | # Avoid downloading the data when the package is checked by CRAN. 42 | # This only needs to be run once to store the data locally 43 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", gender = "female") 44 | wbbl_match_info <- fetch_cricsheet(competition = "wbbl", type = "match", gender = "female") 45 | saveRDS(wbbl_bbb, "inst/extdata/wbbl_bbb.rds") 46 | saveRDS(wbbl_match_info, "inst/extdata/wbbl_match_info.rds") 47 | ``` 48 | 49 | ```{r loaddata, include=FALSE} 50 | wbbl_bbb <- readRDS("../inst/extdata/wbbl_bbb.rds") 51 | wbbl_match_info <- readRDS("../inst/extdata/wbbl_match_info.rds") 52 | ``` 53 | 54 | ```r 55 | # Fetch ball-by-ball data 56 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", gender = "female") 57 | 58 | # Fetch match metadata 59 | wbbl_match_info <- fetch_cricsheet(competition = "wbbl", type = "match", gender = "female") 60 | ``` 61 | 62 | ## Alyssa Healy's WBBL batting record 63 | 64 | What is there to say about Alyssa Healy that hasn’t already been written by a far better writer than I? She’s a dangerous batter at every level she plays at, so I wanted to visualise her production across seasons, compared to other batters in the WBBL. 65 | 66 | ### Tidying the data 67 | 68 | On exploring the Cricsheet data, I noticed that there are only 10 matches from WBBL01 with ball-by-ball data, but there were 59 matches played in the 1st season of WBBL (per [Wikipedia](https://en.wikipedia.org/wiki/2015%E2%80%9316_Women%27s_Big_Bash_League_season)). 69 | 70 | Additionally, when I originally worked on this plot of Healy vs. the world, it was mid-October 2021 and the first ball of #WBBL07 had yet to be bowled. 
Now, we are reproducing the chart part-way through the current season, so the code below also omits data from any matches played so far in this 7th season. 71 | 72 | ```{r} 73 | # Data from 2015/16 (WBBL01) excluded due to only having 10 matches worth of data 74 | # in the Cricsheet spreadsheet. 75 | # Data from 2021/22 and later (WBBL07) excluded as incomplete at time of article 76 | wbbl_bbb_tidy <- wbbl_bbb |> 77 | filter(!season %in% c("2015/16", "2021/22")) |> 78 | filter(start_date < "2021-11-07") 79 | ``` 80 | 81 | The ball-by-ball data hosted by Cricsheet provides a great starting point, but a little more tidying and wrangling is needed for the purposes of understanding batting performances across WBBL seasons. 82 | 83 | In cricket broadcasts, batting average is among the most common statistics that commentators reference to highlight how well a player performs with willow in hand. Batting average is calculated by taking the runs scored by a batter over a defined period of time and dividing them by the number of times they have been dismissed over that same period. 84 | 85 | With this formula, the batting average metric can produce a somewhat inflated measure of a batter’s performance over time, because the denominator is dismissals and not total innings batted. On top of that, the WBBL season is relatively short - in season 7, there are 14 fixtured rounds, plus semi-finals and finals. So at maximum, a batter could play 16 innings in the current season. Others will bat far fewer innings than that hypothetical maximum, so 2 or 3 not-out innings can have a sizeable influence on batting average. 86 | 87 | All things considered, including the T20 format of the WBBL, I was more interested in Healy’s “production” in the sense of average runs scored per innings. 88 | 89 | ```{r} 90 | # Alyssa Healy compared to all players who have batted in 3+ innings in a season.
91 | batting_per_season <- wbbl_bbb_tidy |> 92 | group_by(season, striker) |> 93 | summarise( 94 | innings_total = length(unique(match_id)), 95 | runs_off_bat_total = sum(runs_off_bat), 96 | balls_faced_total = length(ball), 97 | .groups = "keep" 98 | ) |> 99 | mutate( 100 | runs_per_innings_avg = round(runs_off_bat_total / innings_total, 1), 101 | strike_rate = round(runs_off_bat_total / balls_faced_total * 100, 1) 102 | ) |> 103 | filter(innings_total > 2) |> 104 | mutate(is_healy = (striker == "AJ Healy")) |> 105 | ungroup() 106 | ``` 107 | 108 | ### Making a plot 109 | 110 | ```{r warning=FALSE, out.width="100%"} 111 | # Import fonts from Google Fonts 112 | font_add_google("Roboto Condensed", "roboto_con") 113 | font_add_google("Staatliches", "staat") 114 | showtext_auto() 115 | 116 | # Build plot 117 | batting_per_season |> 118 | ggplot(aes( 119 | x = season, y = runs_per_innings_avg, 120 | group = striker, colour = is_healy 121 | )) + 122 | geom_line(linewidth = 2, colour = "#F80F61FF") + 123 | gghighlight(is_healy, 124 | label_key = striker, 125 | label_params = aes( 126 | size = 6, force_pull = 0.1, nudge_y = 10, label.size = 1, 127 | family = "roboto_con", label.padding = 0.5, 128 | fill = "#19232FFF", 129 | colour = "#F80F61FF" 130 | ), 131 | unhighlighted_params = list(size = 1, color = "#187999FF") 132 | ) + 133 | labs( 134 | title = "WBBL: Average runs scored per innings (3+ innings)", 135 | x = NULL, y = NULL, 136 | caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran" 137 | ) + 138 | theme_minimal() + 139 | theme( 140 | text = element_text(size = 18, family = "roboto_con", colour = "#FFFFFF"), 141 | plot.title = element_text(family = "staat", margin = margin(0, 0, 15, 0)), 142 | plot.caption = element_markdown(size = NULL, margin = margin(15, 0, 0, 0)), 143 | axis.text = element_text(colour = "#FFFFFF"), 144 | legend.position = "none", 145 | panel.grid.major = element_line(linetype = "dashed"), 146 | panel.grid.minor = element_blank(), 147 
| plot.background = element_rect( 148 | fill = "#171F2AFF", colour = NA 149 | ), 150 | panel.spacing = unit(2, "lines"), 151 | plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm") 152 | ) 153 | ``` 154 | 155 | When plotting Healy’s WBBL production against the rest of the comp, what I see is a “run machine” - consistent, high-end output over the last 5 WBBL seasons. 156 | 157 | No hyperbole: there are few better than Alyssa Healy at the crease. 158 | 159 | ## Dismissals by ball number 160 | 161 | Aside from scoring runs, another important aspect of batting is not losing your wicket. So how hard is it really to get the best batters out? 162 | 163 | I would love to dive into this question using high-resolution data that reflects the biomechanics battle between bowlers and batters. Enter: Hawkeye…except that Hawkeye is only sporadically available for women’s matches, even at the highest level. As far as I know, there are no publicly available Hawkeye data sets from WBBL matches. If I’m wrong, please tell me and point me towards the good goods! 164 | 165 | *([Cody Atkinson](https://twitter.com/CapitalCityCody/status/1448476292743983105) alerted me to the bounceR package by Richard Little, which enables access to Hawkeye data that does exist on the ICC and BCCI websites.)* 166 | 167 | ### Tidying the data 168 | 169 | In the absence of fancy Hawkeye data from the WBBL, I took a simpler route through the Cricsheet ball-by-ball data to visualise how hard it is to take the wicket of the likes of Healy, Beth Mooney, Meg Lanning, Ellyse Perry, Sophie Devine, and Heather Knight. 
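The core calculation in the next chunk can be sketched on a tiny, made-up set of deliveries for a single batter. First, take the ball number within the over from the part of `ball` after the decimal point. Then divide dismissals by balls faced at each ball number. In this base-R sketch, the `deliveries` data frame is hypothetical, and a logical `dismissed` column stands in for `wicket_type`. `ball` is deliberately kept as character: if it is stored as a number, a tenth delivery recorded as `0.10` would be converted to the string `"0.1"` and counted as ball 1.

```r
# Made-up deliveries faced by one batter, in over.ball notation
deliveries <- data.frame(
  ball = c("0.1", "0.2", "0.3", "1.1", "1.2", "1.3", "2.1", "2.2"),
  dismissed = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Ball number within the over: everything after the decimal point
deliveries$ball_num_in_over <- sub(".*\\.", "", deliveries$ball)

# Balls faced and dismissals at each ball number
balls_faced <- tapply(deliveries$dismissed, deliveries$ball_num_in_over, length)
dismissals <- tapply(deliveries$dismissed, deliveries$ball_num_in_over, sum)

# Dismissal percentage at each ball number
dismissals_pct <- round(dismissals / balls_faced * 100, 2)
dismissals_pct
```

Here, ball 1 was faced three times with one dismissal (33.33%), ball 2 three times with none (0%), and ball 3 twice with one dismissal (50%).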
170 | 171 | ```{r} 172 | # Create new variable for ball number in each over 173 | ball_number_faced <- wbbl_bbb_tidy |> 174 | mutate(ball_num_in_over = sub(".*\\.", "", ball)) 175 | 176 | # Summarise number of balls faced of each ball number, per batter 177 | ball_number_faced_summary <- ball_number_faced |> 178 | group_by(ball_num_in_over, striker) |> 179 | summarise(balls_faced = n(), .groups = "drop") 180 | 181 | # Dismissals by ball number 182 | dismissals_by_ball_number <- ball_number_faced |> 183 | select(ball_num_in_over, striker, wicket_type) |> 184 | filter(wicket_type != "") |> 185 | group_by(ball_num_in_over, striker) |> 186 | summarise(dismissals_n = n(), .groups = "drop") 187 | ``` 188 | 189 | The code above shapes up the ball-by-ball data to record: 190 | 191 | * How many balls a batter has faced at each ball number in an over (1-6), 192 | * How many times a batter has been dismissed at each ball number. 193 | 194 | I made some editorial judgments too: 195 | 196 | * Excluded data from WBBL01 due to limited games from that season in the Cricsheet-hosted data 197 | * Excluded balls numbered 7+ 198 | * League-wide data only includes batters who have faced 200+ balls total (across WBBL02-06, inclusive) 199 | 200 | With the ball-by-ball data prepared for the analysis question, we can then calculate league-wide and player-specific summary statistics for dismissals by ball number: 201 | 202 | ```{r} 203 | # Merge data and summarise to league-wide dismissals rate by ball number 204 | dismissals_by_ball_number_summary <- left_join( 205 | ball_number_faced_summary, dismissals_by_ball_number, 206 | by = c("ball_num_in_over", "striker") 207 | ) |> 208 | tidyr::replace_na(list(dismissals_n = 0)) |> 209 | group_by(striker) |> 210 | mutate(total_balls_faced = sum(balls_faced)) |> 211 | ungroup() |> 212 | mutate(dismissals_pct = round(dismissals_n / balls_faced * 100, 2)) |> 213 | # Include those who have faced more than 200 balls total 214 | filter(total_balls_faced 
>= 200) |> 215 | # Exclude balls beyond 6 - infrequent occurrences 216 | filter(ball_num_in_over < 7) 217 | 218 | # Extract data for specific players 219 | # Healy 220 | dismissals_by_ball_number_summary_healy <- dismissals_by_ball_number_summary |> 221 | filter(striker == "AJ Healy") 222 | # Mooney 223 | dismissals_by_ball_number_summary_mooney <- dismissals_by_ball_number_summary |> 224 | filter(striker == "BL Mooney") 225 | # Lanning 226 | dismissals_by_ball_number_summary_lanning <- dismissals_by_ball_number_summary |> 227 | filter(striker == "MM Lanning") 228 | # Perry 229 | dismissals_by_ball_number_summary_perry <- dismissals_by_ball_number_summary |> 230 | filter(striker == "EA Perry") 231 | # Devine 232 | dismissals_by_ball_number_summary_devine <- dismissals_by_ball_number_summary |> 233 | filter(striker == "SFM Devine") 234 | # Knight 235 | dismissals_by_ball_number_summary_knight <- dismissals_by_ball_number_summary |> 236 | filter(striker == "HC Knight") 237 | ``` 238 | 239 | ### Making plots 240 | 241 | Here’s the code for building the plot I tweeted out. I decided to build one plot per player of interest and then “quilt” the plots together using the `patchwork` package (https://github.com/thomasp85/patchwork). I’m sure you could achieve the same / similar result using `ggplot2::facet_wrap()` but I just love using `patchwork` ... 
242 | 243 | ```{r, out.width="100%"} 244 | # Define consistent plot features ---------------------------------------------- 245 | plot_features <- list( 246 | coord_cartesian(ylim = c(0, 10)), 247 | theme_minimal(), 248 | theme( 249 | text = element_text(family = "roboto_con", colour = "#FFFFFF"), 250 | plot.title = element_text( 251 | size = 11, family = "staat", margin = margin(0, 0, 15, 0) 252 | ), 253 | plot.subtitle = element_text( 254 | size = 12, family = "staat", margin = margin(0, 0, 15, 0) 255 | ), 256 | plot.caption = element_markdown( 257 | size = 10, margin = margin(15, 0, 0, 0) 258 | ), 259 | axis.text = element_text(size = 9, colour = "#FFFFFF"), 260 | legend.position = "none", 261 | panel.grid.major.y = element_line(linetype = "dashed"), 262 | panel.grid.major.x = element_blank(), 263 | panel.grid.minor = element_blank(), 264 | plot.background = element_rect( 265 | fill = "#171F2AFF", colour = NA 266 | ), 267 | panel.spacing = unit(2, "lines"), 268 | plot.margin = unit(c(0.25, 0.25, 0.25, 0.25), "cm") 269 | ), 270 | labs(x = NULL, y = NULL) 271 | ) 272 | 273 | # Build plots ------------------------------------------------------------------ 274 | showtext_auto() 275 | 276 | # Healy 277 | p1 <- dismissals_by_ball_number_summary |> 278 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 279 | geom_boxplot( 280 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 281 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 282 | ) + 283 | geom_point( 284 | data = dismissals_by_ball_number_summary_healy, 285 | colour = "#F80F61FF", size = 3 286 | ) + 287 | labs( 288 | title = "WBBL: % dismissals by ball number", 289 | subtitle = "AJ Healy" 290 | ) + 291 | plot_features 292 | 293 | # Mooney 294 | p2 <- dismissals_by_ball_number_summary |> 295 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 296 | geom_boxplot( 297 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 298 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 299 | ) + 300 
| geom_point( 301 | data = dismissals_by_ball_number_summary_mooney, 302 | colour = "#FA6900FF", size = 3 303 | ) + 304 | labs(subtitle = "BL Mooney") + 305 | plot_features 306 | 307 | # Lanning 308 | p3 <- dismissals_by_ball_number_summary |> 309 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 310 | geom_boxplot( 311 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 312 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 313 | ) + 314 | geom_point( 315 | data = dismissals_by_ball_number_summary_lanning, 316 | colour = "#018821FF", size = 3 317 | ) + 318 | labs(subtitle = "MM Lanning") + 319 | plot_features 320 | 321 | # Perry 322 | p4 <- dismissals_by_ball_number_summary |> 323 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 324 | geom_boxplot( 325 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 326 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 327 | ) + 328 | geom_point( 329 | data = dismissals_by_ball_number_summary_perry, 330 | colour = "#F80F61FF", size = 3 331 | ) + 332 | labs(subtitle = "EA Perry") + 333 | plot_features 334 | 335 | # Devine 336 | p5 <- dismissals_by_ball_number_summary |> 337 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 338 | geom_boxplot( 339 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 340 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 341 | ) + 342 | geom_point( 343 | data = dismissals_by_ball_number_summary_devine, 344 | colour = "#FA6900FF", size = 3 345 | ) + 346 | labs(subtitle = "SFM Devine") + 347 | plot_features 348 | 349 | # Knight 350 | p6 <- dismissals_by_ball_number_summary |> 351 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) + 352 | geom_boxplot( 353 | fill = "#FFFFFF", colour = "#FFFFFF", size = .5, 354 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0 355 | ) + 356 | geom_point( 357 | data = dismissals_by_ball_number_summary_knight, 358 | colour = "#95C65CFF", size = 3 359 | ) + 360 | labs( 361 | subtitle = "HC Knight", 362 | 
caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran" 363 | ) + 364 | plot_features 365 | 366 | # Quilt the plots -------------------------------------------------------------- 367 | (p1 + p2 + p3) / (p4 + p5 + p6) 368 | ``` 369 | 370 | My key observations: 371 | 372 | 1. Lanning doesn’t give much away on any ball number in an over: of balls faced at each ball number (1-6), her dismissal percentages range from 2.7% to 3.4%, well under the league medians of 4.3% to 5.1%. 373 | 2. Perry is a stalwart too, but relative to her own standards, you might have a better shot bowling to her on balls 5 (2.7%) and 6 (3.4%) than earlier in an over (1.4-2.4%). 374 | 3. Healy is obviously dangerous with how quickly she can accelerate her run-scoring, but is she susceptible on ball 5? (She was dismissed on 7.1% of balls faced vs. 4.7% as the league median for ball number 5.) 375 | 4. The opening partnership of Mooney / Devine is a scary prospect, reason #4182: they’re both hard to shift from the crease. Devine’s “worst” ball is ball number 3 (4.3%) and Mooney’s is ball number 5 (4.5%); both are still below the league medians (4.8% and 4.7%, respectively). 376 | 5. Knight is stubborn for balls 1-3 (1.7-2.8%), looser on balls 4 and 5 (7.7% and 5.5%, respectively, both above the league medians), then clamps down again for ball 6 (3.2%). 377 | 378 | ## “Wiggly bois”: Visualising team strike rates 379 | 380 | Player and league summaries across seasons are great and all, but what really piques my interest with ball-by-ball / play-by-play data is using it to understand how matches unfold from moment to moment. 381 | 382 | The very nature of cricket’s shortest format, T20, lends itself to batting aggression - recruiting for power hitters, aiming to demoralise the opposition by setting big targets, and building teams that bat deep. 
That last point is particularly important in T20 because aggressive batting means taking risks, and as long as the batters are taking risks, the bowling side is in with a chance. 383 | 384 | Commentators generally focus on player strike rates in T20; that is, how many runs a batter scores per 100 balls faced. For instance, power hitters like Sophie Devine routinely achieve strike rates over 120 (i.e., more than 120 runs scored per 100 balls), so we can think of strike rate as a measure of scoring efficiency. 385 | 386 | A common perception is that snagging a wicket will slow down the batting side’s strike rate, but I wondered whether this is really true in the T20 context, where the imperative to accelerate is paramount. Teams will also expect to lose some wickets in every innings, so I’d imagine they would recruit and train accordingly for a potent middle order and a tail that wags. 387 | 388 | Instead of looking at player strike rates, what can we learn by analysing *team* strike rates? 389 | 390 | ### Tidying the data 391 | 392 | For a more granular focus, I created a subset from the Cricsheet data that includes matches played in season 7 of the WBBL (WBBL07). At the time I produced my original visualisation, the ball-by-ball data included all games played up to 7 November 2021.
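Before subsetting the real data, it may help to see the team strike rate arithmetic on a toy example. As a minimal sketch in base R: `team_rolling_sr()` is a hypothetical helper for illustration only (not part of cricketdata), the six-ball over is invented, and, like the tidying code that follows, it counts every recorded delivery as a ball and includes extras in the runs.

```r
# Hypothetical helper (not part of cricketdata): rolling team strike rate,
# i.e. cumulative runs / cumulative deliveries * 100, rounded to 1 decimal.
# Assumption: every recorded delivery counts as a ball and extras count as runs,
# mirroring runs_cumulative / balls_cumulative in the vignette's tidying code.
team_rolling_sr <- function(runs_scored) {
  balls_cumulative <- seq_along(runs_scored)
  runs_cumulative <- cumsum(runs_scored)
  round(runs_cumulative / balls_cumulative * 100, 1)
}

# Invented toy over: runs from each delivery, with a wicket on ball 2
runs <- c(1, 0, 4, 0, 6, 1)
wicket_fell <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

sr <- team_rolling_sr(runs)
sr               # 100.0 50.0 166.7 125.0 220.0 200.0
sr[wicket_fell]  # strike rate at the moment the wicket fell: 50
```

The dip to 50 at the wicket, followed by a climb back to 200 by the end of the over, is the same kind of "wiggle" the Renegades plots later in this post trace across a full innings.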
393 | 394 | ```{r} 395 | # Subset match metadata for WBBL07 games 396 | wbbl07_match_info_tidy <- wbbl_match_info |> 397 | filter(season == "2021/22", date <= "2021/11/07") |> 398 | select( 399 | match_id, winner, winner_runs, winner_wickets, method, outcome, 400 | eliminator 401 | ) |> 402 | mutate(match_id = factor(match_id)) 403 | 404 | # Subset ball-by-ball data for WBBL07 games 405 | wbbl07_bbb_tidy <- wbbl_bbb |> 406 | filter(match_id %in% wbbl07_match_info_tidy$match_id) |> 407 | mutate( 408 | match_id = factor(match_id), 409 | runs_scored = runs_off_bat + extras, 410 | wicket_type = case_when( 411 | wicket_type == "" ~ NA_character_, 412 | TRUE ~ wicket_type 413 | ) 414 | ) |> 415 | group_by(match_id, innings) |> 416 | mutate( 417 | temp_var = 1, 418 | balls_cumulative = cumsum(temp_var), 419 | runs_cumulative = cumsum(runs_scored), 420 | runs_total = max(runs_cumulative) 421 | ) |> 422 | ungroup() |> 423 | select(-temp_var) |> 424 | # Merge match metadata and ball-by-ball data 425 | left_join(wbbl07_match_info_tidy, by = "match_id") |> 426 | mutate( 427 | outcome_batting_team = case_when( 428 | outcome %in% c("no result", "tie") ~ as.character(outcome), 429 | winner == batting_team ~ "won", 430 | TRUE ~ "lost" 431 | ), 432 | outcome_bowling_team = case_when( 433 | outcome %in% c("no result", "tie") ~ as.character(outcome), 434 | winner == bowling_team ~ "won", 435 | TRUE ~ "lost" 436 | ) 437 | ) 438 | ``` 439 | 440 | Using the WBBL07 data subset, I did some further tidying to calculate team strike rates per innings: 441 | 442 | ```{r} 443 | team_strike_rate <- wbbl07_bbb_tidy |> 444 | # Exclude matches that ended with a Super Over ("tie") 445 | # and matches that were called off ("no result") 446 | filter(!outcome_batting_team %in% c("tie", "no result")) |> 447 | group_by(match_id, innings) |> 448 | mutate( 449 | rolling_strike_rate = round( 450 | runs_cumulative / balls_cumulative * 100, 1 451 | ), 452 | wicket_ball_num = case_when( 453 |
!is.na(wicket_type) ~ balls_cumulative, 454 | TRUE ~ NA_real_ 455 | ), 456 | wicket_strike_rate = case_when( 457 | !is.na(wicket_type) ~ rolling_strike_rate, 458 | TRUE ~ NA_real_ 459 | ), 460 | innings_description = case_when( 461 | innings == 1 ~ "Batting 1st", 462 | innings == 2 ~ "Batting 2nd" 463 | ), 464 | bowling_team_short = word(bowling_team, -1), 465 | start_date_day = lubridate::day(start_date), 466 | start_date_month = lubridate::month(start_date), 467 | match_details = glue::glue( 468 | "{innings_description} vs. {bowling_team_short} ({start_date_day}/{start_date_month})" 469 | ) 470 | ) |> 471 | arrange(match_id, innings, balls_cumulative) |> 472 | mutate( 473 | match_details = factor( 474 | match_details, 475 | levels = unique(match_details) 476 | ), 477 | outcome_batting_team = factor( 478 | outcome_batting_team, 479 | levels = c("won", "lost") 480 | ) 481 | ) 482 | ``` 483 | 484 | ### Making plots 485 | 486 | There’s plenty that can be done with the `team_strike_rate` data, which could warrant a dedicated exploration in itself. But when I’m exploring a new analytical idea, I usually find it easier to get a feel for what the data does or doesn’t highlight by going with a “small batch” approach. 487 | 488 | With the Renegades at the top of the standings (as of 7 Nov), I focused on their team strike rates, a.k.a.
*“wiggly bois”*: 489 | 490 | ```{r echo=FALSE, out.width="75%", fig.align="center"} 491 | knitr::include_graphics("figs/tweet.png") 492 | ``` 493 | 494 | ```{r, fig.width=9, fig.height=12, out.width="100%", warning=FALSE, message=FALSE} 495 | # Filter to Renegades' innings only -------------------------------------------- 496 | team_strike_rate_renegades <- team_strike_rate |> 497 | filter(str_detect(batting_team, "Renegades")) 498 | 499 | # Build plot ------------------------------------------------------------------- 500 | showtext_auto() 501 | team_strike_rate_renegades |> 502 | ggplot(aes(x = balls_cumulative, y = rolling_strike_rate)) + 503 | facet_wrap(~match_details, ncol = 3) + 504 | geom_hline(yintercept = 100, linetype = "dashed", colour = "#CCCCCC") + 505 | geom_line(aes(colour = outcome_batting_team), linewidth = 1.5) + 506 | geom_point( 507 | aes( 508 | x = wicket_ball_num, 509 | y = wicket_strike_rate 510 | ), 511 | colour = "red", size = 3, alpha = 0.75 512 | ) + 513 | labs( 514 | title = "WBBL07: Melbourne Renegades - Team strike rate (games up to 7 Nov 2021)", 515 | x = "Ball number in an innings", y = NULL, 516 | caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran" 517 | ) + 518 | scale_x_continuous(breaks = seq(0, 120, by = 30)) + 519 | scale_color_manual( 520 | values = c("won" = "#4a8bad", "lost" = "#AD4A8B"), 521 | labels = c("Renegades won", "Renegades lost") 522 | ) + 523 | coord_cartesian(ylim = c(0, 200)) + 524 | theme_minimal() + 525 | theme( 526 | text = element_text(size = 18, family = "roboto_con", colour = "#FFFFFF"), 527 | legend.position = "top", 528 | legend.title = element_blank(), 529 | legend.key.size = unit(1.5, "cm"), 530 | legend.margin = margin(0, 0, 0, 0), 531 | legend.spacing.x = unit(0, "cm"), 532 | legend.spacing.y = unit(0, "cm"), 533 | plot.title = element_text(family = "staat", margin = margin(0, 0, 15, 0)), 534 | plot.caption = element_markdown(size =
NULL, margin = margin(15, 0, 0, 0)), 535 | strip.text = element_text(colour = "#FFFFFF", size = 12), 536 | axis.text = element_text(colour = "#FFFFFF"), 537 | axis.title.x = element_text(margin = margin(15, 0, 0, 0)), 538 | panel.grid.major.y = element_blank(), 539 | panel.grid.minor.y = element_blank(), 540 | panel.grid.major.x = element_line(colour = "#203b60", linetype = "dotted"), 541 | panel.grid.minor.x = element_blank(), 542 | plot.background = element_rect( 543 | fill = "#171F2AFF", 544 | colour = NA 545 | ), 546 | panel.spacing = unit(2, "lines"), 547 | plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm") 548 | ) 549 | ``` 550 | 551 | The plots show the Renegades’ team strike rate across each innings they’ve batted, with wickets overlaid as red points. From a visual assessment, it looks like wickets have not dampened the Renegades batters’ enthusiasm for striking the ball: broadly speaking, when a new batter comes in, they are often able to maintain the team’s strike rate. In some games, they’ve even managed to accelerate after losing a wicket. 552 | 553 | Importantly, the Renegades have not lost many wickets in Powerplay overs this season, which puts them in a better position to push the scoring pace as an innings wears on, with most of their wickets in hand. 554 | --------------------------------------------------------------------------------