├── .github
│   ├── .gitignore
│   └── workflows
│       ├── pkgdown.yaml
│       └── R-CMD-check.yaml
├── vignettes
│   ├── .gitignore
│   ├── figs
│   │   └── tweet.png
│   ├── bibliography.bib
│   ├── cricinfo.Rmd
│   ├── cricketdata_R_pkg.Rmd
│   └── cricsheet.Rmd
├── R
│   ├── sysdata.rda
│   ├── cricketdata.R
│   ├── data.R
│   ├── fetch_cricinfo.R
│   ├── clean_fielding_data.R
│   ├── find_player_id.R
│   ├── fetch_player_meta.R
│   ├── fetch_cricket_data.R
│   ├── update_player_meta.R
│   ├── clean_batting_data.R
│   ├── clean_bowling_data.R
│   ├── countries.R
│   ├── fetch_player.R
│   └── fetch_cricsheet.R
├── data
│   ├── player_meta.rda
│   └── cricsheet_codes.rda
├── inst
│   └── extdata
│       ├── wt20.rds
│       ├── ipl_bbb.rds
│       ├── menODI.rds
│       ├── MegLanning.rds
│       ├── aus_women.rds
│       ├── usmenODI.rds
│       ├── wbbl_bbb.rds
│       ├── Indfielding.rds
│       ├── meg_lanning_id.rds
│       └── wbbl_match_info.rds
├── .gitignore
├── man
│   ├── figures
│   │   ├── cricketdata.png
│   │   ├── README-menodi-1.png
│   │   ├── README-woment20-1.png
│   │   ├── README-meglanning-1.png
│   │   ├── README-indiafielding-1.png
│   │   ├── README-unnamed-chunk-2-1.png
│   │   ├── README-unnamed-chunk-4-1.png
│   │   ├── README-unnamed-chunk-5-1.png
│   │   ├── README-unnamed-chunk-6-1.png
│   │   └── README-unnamed-chunk-7-1.png
│   ├── cricsheet_codes.Rd
│   ├── player_meta.Rd
│   ├── find_player_id.Rd
│   ├── update_player_meta.Rd
│   ├── fetch_cricinfo.Rd
│   ├── cricketdata-package.Rd
│   ├── fetch_player_meta.Rd
│   ├── fetch_cricsheet.Rd
│   └── fetch_player_data.Rd
├── .Rbuildignore
├── pkgdown
│   └── extra.css
├── cran-comments.md
├── cricketdata.Rproj
├── _pkgdown.yml
├── NAMESPACE
├── NEWS.md
├── DESCRIPTION
├── README.md
├── README.Rmd
└── data-raw
    └── make_player_meta.R
/.github/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 |
--------------------------------------------------------------------------------
/vignettes/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.R
3 | *_files/*
4 | *_cache/*
5 |
--------------------------------------------------------------------------------
/R/sysdata.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/R/sysdata.rda
--------------------------------------------------------------------------------
/data/player_meta.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/data/player_meta.rda
--------------------------------------------------------------------------------
/inst/extdata/wt20.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wt20.rds
--------------------------------------------------------------------------------
/data/cricsheet_codes.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/data/cricsheet_codes.rda
--------------------------------------------------------------------------------
/inst/extdata/ipl_bbb.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/ipl_bbb.rds
--------------------------------------------------------------------------------
/inst/extdata/menODI.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/menODI.rds
--------------------------------------------------------------------------------
/vignettes/figs/tweet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/vignettes/figs/tweet.png
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 | README_cache
6 | README_files
7 | docs
8 |
--------------------------------------------------------------------------------
/inst/extdata/MegLanning.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/MegLanning.rds
--------------------------------------------------------------------------------
/inst/extdata/aus_women.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/aus_women.rds
--------------------------------------------------------------------------------
/inst/extdata/usmenODI.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/usmenODI.rds
--------------------------------------------------------------------------------
/inst/extdata/wbbl_bbb.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wbbl_bbb.rds
--------------------------------------------------------------------------------
/man/figures/cricketdata.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/cricketdata.png
--------------------------------------------------------------------------------
/inst/extdata/Indfielding.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/Indfielding.rds
--------------------------------------------------------------------------------
/inst/extdata/meg_lanning_id.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/meg_lanning_id.rds
--------------------------------------------------------------------------------
/inst/extdata/wbbl_match_info.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/inst/extdata/wbbl_match_info.rds
--------------------------------------------------------------------------------
/man/figures/README-menodi-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-menodi-1.png
--------------------------------------------------------------------------------
/man/figures/README-woment20-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-woment20-1.png
--------------------------------------------------------------------------------
/man/figures/README-meglanning-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-meglanning-1.png
--------------------------------------------------------------------------------
/man/figures/README-indiafielding-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-indiafielding-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-2-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-2-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-4-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-4-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-5-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-5-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-6-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-6-1.png
--------------------------------------------------------------------------------
/man/figures/README-unnamed-chunk-7-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robjhyndman/cricketdata/HEAD/man/figures/README-unnamed-chunk-7-1.png
--------------------------------------------------------------------------------
/.Rbuildignore:
--------------------------------------------------------------------------------
1 | ^.*\.Rproj$
2 | ^\.Rproj\.user$
3 | ^README\.Rmd$
4 | ^README_.*$
5 | ^_pkgdown\.yml$
6 | ^docs$
7 | ^pkgdown$
8 | ^\.github$
9 | ^cran-comments\.md$
10 | ^CRAN-SUBMISSION$
11 | ^data-raw$
12 | ^.*_cache$
13 |
--------------------------------------------------------------------------------
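According to *Writing R Extensions*, `R CMD build` treats each line of `.Rbuildignore` as a Perl-like regular expression, matched case-insensitively against file and directory names relative to the package root. The minimal sketch below approximates that check with base R's `grepl()` on a subset of the patterns above; `is_build_ignored()` is a hypothetical helper, not part of R or cricketdata, and it is not the code `R CMD build` itself runs.

```r
# A subset of the patterns listed in .Rbuildignore above
patterns <- c(
  "^.*\\.Rproj$", "^\\.Rproj\\.user$", "^README\\.Rmd$", "^_pkgdown\\.yml$",
  "^docs$", "^pkgdown$", "^\\.github$", "^cran-comments\\.md$",
  "^data-raw$", "^.*_cache$"
)

# Hypothetical helper: TRUE if `path` matches any of the patterns.
# Approximates R CMD build: Perl-like regexes, case-insensitive matching.
is_build_ignored <- function(path, patterns) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

paths <- c("cricketdata.Rproj", "README.Rmd", "README.md", "README_cache",
           "DESCRIPTION", "data-raw", "R")
ignored <- vapply(paths, is_build_ignored, logical(1), patterns = patterns)
print(ignored)
```

Because the patterns are unanchored regular expressions unless `^`/`$` are written explicitly, a check like this is a quick way to confirm that a new pattern does not accidentally exclude files such as `README.md` or `DESCRIPTION`.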
/R/cricketdata.R:
--------------------------------------------------------------------------------
1 | #' @importFrom dplyr pull
2 | #' @importFrom readr read_csv
3 | #' @importFrom tidyr separate pivot_wider
4 | #' @importFrom utils download.file read.csv unzip
5 | #' @importFrom tibble tibble as_tibble
6 | #'
7 |
8 | #' @keywords internal
9 | "_PACKAGE"
10 |
11 | ## usethis namespace: start
12 | ## usethis namespace: end
13 | NULL
14 |
--------------------------------------------------------------------------------
/pkgdown/extra.css:
--------------------------------------------------------------------------------
1 | h1, .h1 {
2 |   font-size: 2rem;
3 |   font-weight: 700;
4 | }
5 |
6 | h2, .h2 {
7 |   font-size: 1.5rem;
8 |   font-weight: 700;
9 | }
10 |
11 | .bg-primary .navbar-nav .show>.nav-link, .bg-primary .navbar-nav .nav-link.active, .bg-primary .navbar-nav .nav-link:hover, .bg-primary .navbar-nav .nav-link:focus {
12 |   color: #ffb81c !important;
13 | }
14 |
--------------------------------------------------------------------------------
/cran-comments.md:
--------------------------------------------------------------------------------
1 | ## Test environments
2 |
3 | * ubuntu 24.04 (local): R 4.4.3
4 | * macOS (on GitHub Actions): release
5 | * windows (on GitHub Actions): release
6 | * ubuntu 22.04.3 (on GitHub Actions): devel, release, oldrel
7 | * win-builder: devel, release, oldrelease
8 |
9 | ## R CMD check results
10 |
11 | I'm getting 403 errors on several URLs. They are correct and work when accessed in a browser.
12 |
--------------------------------------------------------------------------------
/cricketdata.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
15 | BuildType: Package
16 | PackageUseDevtools: Yes
17 | PackageInstallArgs: --no-multiarch --with-keep.source
18 | PackageRoxygenize: rd,collate,namespace
19 |
--------------------------------------------------------------------------------
/_pkgdown.yml:
--------------------------------------------------------------------------------
1 | url: http://pkg.robjhyndman.com/cricketdata/
2 | template:
3 |   bootstrap: 5
4 |   theme: tango
5 |   bootswatch: flatly
6 |   bslib:
7 |     base_font: {google: "Fira Sans"}
8 |     heading_font: {google: "Fira Sans"}
9 |     code_font: "Hack, mono"
10 |     primary: "#234460"
11 |     link-color: "#234460"
12 |   includes:
13 |     in_header:
14 |
15 | navbar:
16 |   type: light
17 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(fetch_cricinfo)
4 | export(fetch_cricsheet)
5 | export(fetch_player_data)
6 | export(fetch_player_meta)
7 | export(find_player_id)
8 | export(update_player_meta)
9 | importFrom(dplyr,pull)
10 | importFrom(readr,read_csv)
11 | importFrom(tibble,as_tibble)
12 | importFrom(tibble,tibble)
13 | importFrom(tidyr,pivot_wider)
14 | importFrom(tidyr,separate)
15 | importFrom(utils,download.file)
16 | importFrom(utils,read.csv)
17 | importFrom(utils,unzip)
18 |
--------------------------------------------------------------------------------
/man/cricsheet_codes.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.R
3 | \docType{data}
4 | \name{cricsheet_codes}
5 | \alias{cricsheet_codes}
6 | \title{Codes used for competitions on Cricsheet}
7 | \format{
8 | A data frame with 44 rows and 2 variables.
9 | }
10 | \source{
11 | \url{https://cricsheet.org/downloads/#experimental}
12 | }
13 | \usage{
14 | cricsheet_codes
15 | }
16 | \description{
17 | A dataset containing the names and codes used by
18 | \href{https://cricsheet.org}{cricsheet}, as at 24 March 2025.
19 | }
20 | \keyword{datasets}
21 |
--------------------------------------------------------------------------------
/man/player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/data.R
3 | \docType{data}
4 | \name{player_meta}
5 | \alias{player_meta}
6 | \title{Meta data on players listed at ESPNCricinfo}
7 | \format{
8 | A data frame with 16101 rows and 11 variables.
9 | }
10 | \source{
11 | \url{https://www.espncricinfo.com}
12 | }
13 | \usage{
14 | player_meta
15 | }
16 | \description{
17 | A dataset containing the names and other attributes of players who appear
18 | on both \href{https://cricsheet.org}{cricsheet} and
19 | \href{https://www.espncricinfo.com}{ESPNCricinfo}, as at 24 March 2025.
20 | }
21 | \keyword{datasets}
22 |
--------------------------------------------------------------------------------
/R/data.R:
--------------------------------------------------------------------------------
1 | #' Meta data on players listed at ESPNCricinfo
2 | #'
3 | #' A dataset containing the names and other attributes of players who appear
4 | #' on both [cricsheet](https://cricsheet.org) and
5 | #' [ESPNCricinfo](https://www.espncricinfo.com), as at 24 March 2025.
6 | #'
7 | #' @format A data frame with 16101 rows and 11 variables.
8 | #' @source \url{https://www.espncricinfo.com}
9 | "player_meta"
10 |
11 | #' Codes used for competitions on Cricsheet
12 | #'
13 | #' A dataset containing the names and codes used by
14 | #' [cricsheet](https://cricsheet.org), as at 24 March 2025.
15 | #'
16 | #' @format A data frame with 44 rows and 2 variables.
17 | #' @source \url{https://cricsheet.org/downloads/#experimental}
18 | "cricsheet_codes"
19 |
--------------------------------------------------------------------------------
/man/find_player_id.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/find_player_id.R
3 | \name{find_player_id}
4 | \alias{find_player_id}
5 | \title{Find a player id from cricinfo.com}
6 | \usage{
7 | find_player_id(searchstring)
8 | }
9 | \arguments{
10 | \item{searchstring}{Part of one or more player names to search for. Can be a character vector.}
11 | }
12 | \value{
13 | A table of matching players, their ids, and teams they played for.
14 | }
15 | \description{
16 | Find a player id from cricinfo.com
17 | }
18 | \examples{
19 | \dontrun{
20 | (perry <- find_player_id("Perry"))
21 | EllysePerry <- fetch_player_data(perry[2, "ID"], "test")
22 | }
23 | }
24 | \seealso{
25 | \code{\link[=fetch_player_data]{fetch_player_data()}} to download playing statistics for
26 | a player, and \code{\link[=fetch_player_meta]{fetch_player_meta()}} to download meta data on players.
27 | }
28 | \author{
29 | Rob J Hyndman
30 | }
31 |
--------------------------------------------------------------------------------
/man/update_player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/update_player_meta.R
3 | \name{update_player_meta}
4 | \alias{update_player_meta}
5 | \title{Update player_meta}
6 | \usage{
7 | update_player_meta(start_again = FALSE)
8 | }
9 | \arguments{
10 | \item{start_again}{If TRUE, downloads all data from ESPNCricinfo without
11 | using player_meta as a starting point. This can take a long time.}
12 | }
13 | \value{
14 | A tibble containing meta data on cricket players.
15 | }
16 | \description{
17 | The \link{player_meta} data set contains the names and other
18 | attributes of players who appear on both \href{https://cricsheet.org}{cricsheet}
19 | and \href{https://www.espncricinfo.com}{ESPNCricinfo} as at 24 March 2025.
20 | This function returns an updated version of the data set based on information
21 | currently available online.
22 | }
23 | \examples{
24 | \dontrun{
25 | # Update data to current
26 | new_player_meta <- update_player_meta()
27 | }
28 | }
29 | \seealso{
30 | \link{player_meta}, \code{\link[=fetch_player_meta]{fetch_player_meta()}}.
31 | }
32 | \author{
33 | Hassan Rafique and Rob J Hyndman
34 | }
35 |
--------------------------------------------------------------------------------
/man/fetch_cricinfo.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_cricinfo.R
3 | \name{fetch_cricinfo}
4 | \alias{fetch_cricinfo}
5 | \title{Fetch Data from Cricinfo}
6 | \usage{
7 | fetch_cricinfo(
8 |   matchtype = c("test", "odi", "t20"),
9 |   sex = c("men", "women"),
10 |   activity = c("batting", "bowling", "fielding"),
11 |   type = c("career", "innings"),
12 |   country = NULL
13 | )
14 | }
15 | \arguments{
16 | \item{matchtype}{Character indicating test (default), odi, or t20.}
17 |
18 | \item{sex}{Character indicating men (default) or women.}
19 |
20 | \item{activity}{Character indicating batting (default), bowling or fielding.}
21 |
22 | \item{type}{Character indicating innings-by-innings or career (default) data.}
23 |
24 | \item{country}{Character indicating country. The default is to fetch data for all countries.}
25 | }
26 | \value{
27 | A \code{tibble} object, similar to a \code{data.frame}.
28 | }
29 | \description{
30 | Fetch data from ESPNCricinfo and return a tibble.
31 | All arguments are case-insensitive and partially matched.
32 | }
33 | \examples{
34 | \dontrun{
35 | auswt20 <- fetch_cricinfo("T20", "Women", country = "Aust")
36 | IndiaODIBowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india")
37 | }
38 |
39 | }
40 | \author{
41 | Rob J Hyndman, Timothy Hyndman, Charles Gray
42 | }
43 |
--------------------------------------------------------------------------------
/man/cricketdata-package.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/cricketdata.R
3 | \docType{package}
4 | \name{cricketdata-package}
5 | \alias{cricketdata}
6 | \alias{cricketdata-package}
7 | \title{cricketdata: International Cricket Data}
8 | \description{
9 | Data on international and other major cricket matches from ESPNCricinfo \url{https://www.espncricinfo.com} and Cricsheet \url{https://cricsheet.org}. This package provides some functions to download the data into tibbles ready for analysis.
10 | }
11 | \seealso{
12 | Useful links:
13 | \itemize{
14 | \item \url{https://pkg.robjhyndman.com/cricketdata/}
15 | \item \url{https://github.com/robjhyndman/cricketdata}
16 | \item Report bugs at \url{https://github.com/robjhyndman/cricketdata/issues}
17 | }
18 |
19 | }
20 | \author{
21 | \strong{Maintainer}: Rob Hyndman \email{Rob.Hyndman@monash.edu}
22 |
23 | Authors:
24 | \itemize{
25 | \item Charles Gray \email{C.Gray@latrobe.edu.au}
26 | \item Sayani Gupta \email{Sayani.Gupta@monash.edu}
27 | \item Timothy Hyndman \email{Timothy.Hyndman@gmail.com}
28 | \item Hassan Rafique \email{dazzalytics@protonmail.com}
29 | \item Jacquie Tran \email{jac@jacquietran.com}
30 | }
31 |
32 | Other contributors:
33 | \itemize{
34 | \item Puwasala Gamakumara \email{Puwasala.Gamakumara@monash.edu} [contributor]
35 | \item Alex Whan \email{alexwhan@gmail.com} [contributor]
36 | }
37 |
38 | }
39 | \keyword{internal}
40 |
--------------------------------------------------------------------------------
/man/fetch_player_meta.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_player_meta.R
3 | \name{fetch_player_meta}
4 | \alias{fetch_player_meta}
5 | \title{Fetch Player Meta Data}
6 | \usage{
7 | fetch_player_meta(playerid)
8 | }
9 | \arguments{
10 | \item{playerid}{A vector of player IDs as given in Cricinfo profiles. Integer or character.}
11 | }
12 | \value{
13 | A tibble containing meta data on the selected players, with one row for
14 | each player.
15 | }
16 | \description{
17 | Fetch player meta data from ESPNCricinfo and return a tibble with one row
18 | per player. To identify the players, use their Cricinfo player IDs.
19 | The simplest way to find this is to look up their Cricinfo Profile page. The number
20 | at the end of the URL is the ID. For example, Meg Lanning's profile page is
21 | https://www.espncricinfo.com/cricketers/meg-lanning-329336,
22 | so her ID is 329336.
23 | }
24 | \examples{
25 | \dontrun{
26 | # Download meta data on Meg Lanning and Ellyse Perry
27 | aus_women <- fetch_player_meta(c(329336, 275487))
28 | }
29 | }
30 | \seealso{
31 | It is usually simpler to just use the saved data set \link{player_meta}
32 | which contains the meta data for all players who appear on both Cricsheet and ESPNCricinfo as at 24 March 2025.
33 | To find a player ID, use \code{\link[=find_player_id]{find_player_id()}}.
34 | Use \code{\link[=fetch_player_data]{fetch_player_data()}} to download playing statistics for a player.
35 | }
36 | \author{
37 | Hassan Rafique and Rob J Hyndman
38 | }
39 |
--------------------------------------------------------------------------------
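The description above explains that a player's Cricinfo ID is the number at the end of their profile URL. A minimal base-R sketch of that rule, assuming profile URLs end in `-<digits>` with an optional trailing slash; `profile_url_to_id()` is a hypothetical helper, not part of cricketdata.

```r
# Hypothetical helper: extract the trailing numeric ID from Cricinfo profile URLs.
# Returns NA for URLs that do not end in "-<digits>" (optionally followed by "/").
profile_url_to_id <- function(url) {
  m <- regmatches(url, regexec("-([0-9]+)/?$", url))
  vapply(m, function(x) if (length(x) == 2) as.integer(x[2]) else NA_integer_,
         integer(1))
}

ids <- profile_url_to_id(c(
  "https://www.espncricinfo.com/cricketers/meg-lanning-329336",
  "https://www.espncricinfo.com/cricketers/meg-lanning-329336/",
  "https://www.espncricinfo.com/cricketers"
))
ids  # 329336 329336 NA
# The non-missing IDs could then be passed on, e.g. fetch_player_meta(ids[!is.na(ids)])
```

When the URL is not at hand, `find_player_id()` is the documented way to look up an ID by name.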
/.github/workflows/pkgdown.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |     branches: [main, master]
8 |   release:
9 |     types: [published]
10 |   workflow_dispatch:
11 |
12 | name: pkgdown
13 |
14 | jobs:
15 |   pkgdown:
16 |     runs-on: ubuntu-latest
17 |     # Only restrict concurrency for non-PR jobs
18 |     concurrency:
19 |       group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
20 |     env:
21 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
22 |     permissions:
23 |       contents: write
24 |     steps:
25 |       - uses: actions/checkout@v3
26 |
27 |       - uses: r-lib/actions/setup-pandoc@v2
28 |
29 |       - uses: r-lib/actions/setup-r@v2
30 |         with:
31 |           use-public-rspm: true
32 |
33 |       - uses: r-lib/actions/setup-r-dependencies@v2
34 |         with:
35 |           extra-packages: any::pkgdown, local::.
36 |           needs: website
37 |
38 |       - name: Build site
39 |         run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
40 |         shell: Rscript {0}
41 |
42 |       - name: Deploy to GitHub pages 🚀
43 |         if: github.event_name != 'pull_request'
44 |         uses: JamesIves/github-pages-deploy-action@v4.4.1
45 |         with:
46 |           clean: false
47 |           branch: gh-pages
48 |           folder: docs
49 |
--------------------------------------------------------------------------------
/man/fetch_cricsheet.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_cricsheet.R
3 | \name{fetch_cricsheet}
4 | \alias{fetch_cricsheet}
5 | \title{Fetch ball-by-ball, match and player data from Cricsheet and return a tibble.}
6 | \usage{
7 | fetch_cricsheet(
8 |   type = c("bbb", "match", "player"),
9 |   gender = c("female", "male"),
10 |   competition = "tests"
11 | )
12 | }
13 | \arguments{
14 | \item{type}{Character string giving type of data: ball-by-ball, match info or player info.}
15 |
16 | \item{gender}{Character string giving player gender: female or male.}
17 |
18 | \item{competition}{Character string giving code corresponding to competition. See \code{\link{cricsheet_codes}} for the
19 | competitions and codes available.}
20 | }
21 | \value{
22 | A \code{tibble} object, similar to a \code{data.frame}.
23 | }
24 | \description{
25 | Download csv data from Cricsheet \url{https://cricsheet.org/downloads/}.
26 | Data must be specified by three factors:
27 | (a) type of data: \code{bbb} (ball-by-ball), \code{match} or \code{player};
28 | (b) gender;
29 | (c) competition specified as a Cricsheet code. See \code{\link{cricsheet_codes}} for the
30 | competitions and codes available.
31 | }
32 | \examples{
33 | \dontrun{
34 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", type = "bbb")
35 | wbbl_match <- fetch_cricsheet(competition = "wbbl", type = "match")
36 | wbbl_player <- fetch_cricsheet(competition = "wbbl", type = "player")
37 | }
38 | }
39 | \author{
40 | Jacquie Tran, Hassan Rafique and Rob J Hyndman
41 | }
42 |
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | # cricketdata (development version)
2 |
3 | # cricketdata 0.3.0
4 | * Updated data
5 | * Bug fixes due to changes in cricinfo URLs
6 |
7 | # cricketdata 0.2.3
8 | * Updated data
9 | * Bug fixes
10 |
11 | # cricketdata 0.2.2
12 | * Added hex sticker, thanks to @jacquietran
13 | * fetch_cricsheet() can now handle all competitions with csv files.
14 | * Reduced dependencies
15 | * Bug fixes
16 |
17 | # cricketdata 0.2.1
18 | * Added new vignette introducing package
19 | * Bug fix due to changes in gghighlight
20 |
21 | # cricketdata 0.2.0
22 | * Added fetch_player_meta(), update_player_meta() and player_meta.
23 | * Improved data cleaning for cricinfo data and cricsheet data
24 | * Improved cricinfo vignette.
25 | * Bug fixes and improvements to documentation
26 |
27 | # cricketdata 0.1.1
28 | * Fixes to vignettes
29 | * Don't run slow example
30 |
31 | # cricketdata 0.1.0
32 | * Added fetch_cricsheet() functions, thanks to @jacquietran.
33 | * Added vignettes for fetch_cricsheet() and fetch_cricinfo().
34 |
35 | # cricketdata 0.0.3 (9 January 2019)
36 | * Updated to handle change in format for female data on cricinfo.
37 |
38 | # cricketdata 0.0.2 (February 2018)
39 | * Development at Numbat Hackathon, February 2018
40 | * Changed name to cricketdata as cricinfo is not the only source.
41 | * Added find_player_id()
42 | * Extended fetch_player_data()
43 | * Bug fixes
44 |
45 | # cricinfo 0.0.1 (October 2017)
46 | * Package started at Melbourne ozunconf in October 2017
47 | * Functions added: fetch_player and fetch_cricinfo
48 |
--------------------------------------------------------------------------------
/.github/workflows/R-CMD-check.yaml:
--------------------------------------------------------------------------------
1 | # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
2 | # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
3 | on:
4 |   push:
5 |     branches: [main, master]
6 |   pull_request:
7 |
8 | name: R-CMD-check.yaml
9 |
10 | permissions: read-all
11 |
12 | jobs:
13 |   R-CMD-check:
14 |     runs-on: ${{ matrix.config.os }}
15 |
16 |     name: ${{ matrix.config.os }} (${{ matrix.config.r }})
17 |
18 |     strategy:
19 |       fail-fast: false
20 |       matrix:
21 |         config:
22 |           - {os: macos-latest, r: 'release'}
23 |           - {os: windows-latest, r: 'release'}
24 |           - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
25 |           - {os: ubuntu-latest, r: 'release'}
26 |           - {os: ubuntu-latest, r: 'oldrel-1'}
27 |
28 |     env:
29 |       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
30 |       R_KEEP_PKG_SOURCE: yes
31 |
32 |     steps:
33 |       - uses: actions/checkout@v4
34 |
35 |       - uses: r-lib/actions/setup-pandoc@v2
36 |
37 |       - uses: r-lib/actions/setup-r@v2
38 |         with:
39 |           r-version: ${{ matrix.config.r }}
40 |           http-user-agent: ${{ matrix.config.http-user-agent }}
41 |           use-public-rspm: true
42 |
43 |       - uses: r-lib/actions/setup-r-dependencies@v2
44 |         with:
45 |           extra-packages: any::rcmdcheck
46 |           needs: check
47 |
48 |       - uses: r-lib/actions/check-r-package@v2
49 |         with:
50 |           upload-snapshots: true
51 |           build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
52 |
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: cricketdata
2 | Version: 0.3.0.9000
3 | Title: International Cricket Data
4 | Description: Data on international and other major cricket matches from
5 |     ESPNCricinfo <https://www.espncricinfo.com> and Cricsheet <https://cricsheet.org>.
6 |     This package provides some functions to download the data into tibbles ready
7 |     for analysis.
8 | Authors@R: c(
9 |     person("Rob", "Hyndman", email="Rob.Hyndman@monash.edu", role=c("aut", "cre")),
10 |     person("Charles", "Gray", email="C.Gray@latrobe.edu.au", role="aut"),
11 |     person("Sayani", "Gupta", email="Sayani.Gupta@monash.edu", role="aut"),
12 |     person("Timothy", "Hyndman", email="Timothy.Hyndman@gmail.com", role="aut"),
13 |     person("Hassan","Rafique", email="dazzalytics@protonmail.com", role="aut"),
14 |     person("Jacquie","Tran", email="jac@jacquietran.com", role="aut"),
15 |     person("Puwasala", "Gamakumara", email="Puwasala.Gamakumara@monash.edu", role="ctb"),
16 |     person("Alex", "Whan", email="alexwhan@gmail.com", role="ctb")
17 |     )
18 | Depends: R (>= 4.1.0)
19 | Imports:
20 |     cli,
21 |     dplyr (>= 1.1.0),
22 |     jsonlite,
23 |     lubridate,
24 |     readr,
25 |     rvest,
26 |     stringr,
27 |     tibble,
28 |     tidyr,
29 |     xml2
30 | Suggests:
31 |     codetools,
32 |     gghighlight,
33 |     ggplot2,
34 |     ggtext,
35 |     glue,
36 |     here,
37 |     knitr,
38 |     paletteer,
39 |     patchwork,
40 |     rmarkdown,
41 |     R.rsp,
42 |     showtext
43 | License: GPL-3
44 | Encoding: UTF-8
45 | ByteCompile: true
46 | URL: https://pkg.robjhyndman.com/cricketdata/,
47 |     https://github.com/robjhyndman/cricketdata
48 | LazyData: true
49 | VignetteBuilder:
50 |     knitr,
51 |     R.rsp
52 | Roxygen: list(markdown = TRUE)
53 | RoxygenNote: 7.3.2
54 | BugReports: https://github.com/robjhyndman/cricketdata/issues
55 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | # cricketdata
5 |
6 |
7 |
8 | [](https://cran.r-project.org/package=cricketdata)
9 | [](https://cran.r-project.org/package=cricketdata)
10 | [](https://www.gnu.org/licenses/gpl-3.0.en.html)
11 | [](https://github.com/robjhyndman/cricketdata/actions)
12 | [](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml)
13 |
14 |
15 | This package provides functions for downloading data on international
16 | and other major cricket matches from [ESPNCricinfo](https://www.espncricinfo.com)
17 | and [Cricsheet](https://cricsheet.org), returning the data as tibbles
18 | ready for analysis.
19 |
20 | Please respect the terms of use for each website:
21 | [ESPNCricinfo](https://www.espncricinfo.com/ci/content/site/company/terms_use.html),
22 | [Cricsheet](https://cricsheet.org/register/).
23 |
24 | ## Installation
25 |
26 | You can install the **stable** version from
27 | [CRAN](https://cran.r-project.org/package=cricketdata).
28 |
29 | ``` r
30 | # install.packages("pak")
31 | pak::pak("cricketdata")
32 | ```
33 |
34 | You can install the **development** version from
35 | [GitHub](https://github.com/robjhyndman/cricketdata):
36 |
37 | ``` r
38 | pak::pak("robjhyndman/cricketdata")
39 | ```
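
Once the package is installed, a typical session might look like the sketch below. The calls are adapted from the package's own help-page examples, and the object names are arbitrary. Every call downloads data from ESPNCricinfo, so you need an internet connection, and large queries can take a while.

``` r
library(cricketdata)

# Career T20 batting records for Australian women.
# Arguments are case-insensitive and partially matched.
aus_wt20 <- fetch_cricinfo("T20", "Women", country = "Aust")

# Career ODI bowling records for Indian men
india_odi_bowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india")

# Innings-by-innings T20 batting for a single player, identified by
# their Cricinfo ID (275487 is Ellyse Perry)
ellyse_perry <- fetch_player_data(275487, "T20", "batting")
```

If you do not know a player's Cricinfo ID, `find_player_id()` can look it up by name.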
40 |
41 | ## License
42 |
43 | This package is free and open source software, licensed under GPL-3.
44 |
--------------------------------------------------------------------------------
/vignettes/bibliography.bib:
--------------------------------------------------------------------------------
1 | @misc{nflfastR,
2 | author = "Carl, Sebastian and Baldwin, Ben",
3 | title = "{nflfastR: Functions to Efficiently Access NFL Play by Play Data}",
4 | journal = "R package",
5 | url = {https://CRAN.R-project.org/package=nflfastR}
6 | }
7 |
8 | @misc{dataverse,
9 | author = "Gilani, Saiem",
10 | title = "{Sports Dataverse}",
11 | url = {https://sportsdataverse.org/}
12 | }
13 |
14 | @article{einstein,
15 | author = "Einstein, Albert",
16 | title = "{Zur Elektrodynamik bewegter K{\"o}rper}. ({German})
17 | [{On} the electrodynamics of moving bodies]",
18 | journal = "Annalen der Physik",
19 | volume = "322",
20 | number = "10",
21 | pages = "891--921",
22 | year = "{1905}",
23 | DOI = "http://dx.doi.org/10.1002/andp.19053221004"
24 | }
25 |
26 | @book{latexcompanion,
27 | author = "Michel Goossens and Frank Mittelbach and Alexander Samarin",
28 | title = "The \LaTeX\ Companion",
29 | year = "1993",
30 | publisher = "Addison-Wesley",
31 | address = "Reading, Massachusetts"
32 | }
33 |
34 | @misc{knuthwebsite,
35 | author = "Donald Knuth",
36 | title = "Knuth: Computers and Typesetting",
37 | url = "http://www-cs-faculty.stanford.edu/\~{}uno/abcde.html"
38 | }
39 |
40 | @misc{hassan,
41 | author = "Hassan Rafique",
42 | title = "Knuth: Computers and Typesetting",
43 | url = "http://www-cs-faculty.stanford.edu/\~{}uno/abcde.html"
44 | }
45 | @article{knuth1984,
46 | title = {Literate Programming},
47 | author = {Knuth, D. E.},
48 | year = {1984},
49 | month = {02},
50 | date = {1984-02-01},
51 | journal = {The Computer Journal},
52 | pages = {97--111},
53 | volume = {27},
54 | number = {2},
55 | doi = {10.1093/comjnl/27.2.97},
56 | url = {http://dx.doi.org/10.1093/comjnl/27.2.97},
57 | langid = {en}
58 | }
59 |
--------------------------------------------------------------------------------
/R/fetch_cricinfo.R:
--------------------------------------------------------------------------------
1 | #' Fetch Data from Cricinfo
2 | #'
3 | #' Fetch data from ESPNCricinfo and return a tibble.
4 | #' All arguments are case-insensitive and partially matched.
5 | #'
6 | #' @param matchtype Character indicating test (default), odi, or t20.
7 | #' @param sex Character indicating men (default) or women.
8 | #' @param activity Character indicating batting (default), bowling or fielding.
9 | #' @param type Character indicating career (default) or innings-by-innings data.
10 | #' @param country Character indicating country. The default is to fetch data for all countries.
11 | #'
12 | #' @author Rob J Hyndman, Timothy Hyndman, Charles Gray
13 | #' @return A \code{tibble} object, similar to a \code{data.frame}.
14 | #' @examples
15 | #' \dontrun{
16 | #' auswt20 <- fetch_cricinfo("T20", "Women", country = "Aust")
17 | #' IndiaODIBowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india")
18 | #' }
19 | #'
20 | #' @export
21 |
22 | fetch_cricinfo <- function(matchtype = c("test", "odi", "t20"),
23 | sex = c("men", "women"),
24 | activity = c("batting", "bowling", "fielding"),
25 | type = c("career", "innings"),
26 | country = NULL) {
27 | matchtype <- tolower(matchtype)
28 | sex <- tolower(sex)
29 | type <- tolower(type)
30 | activity <- tolower(activity)
31 | if (!is.null(country)) {
32 | country <- tolower(country)
33 | }
34 |
35 | matchtype <- match.arg(matchtype)
36 | sex <- match.arg(sex)
37 | activity <- match.arg(activity)
38 | type <- match.arg(type)
39 |
40 | # Get the raw data
41 | this_data <- fetch_cricket_data(matchtype, sex, country, activity, type)
42 |
43 | # Clean it up
44 | if (activity == "batting") {
45 | this_data <- clean_batting_data(this_data)
46 | } else if (activity == "bowling") {
47 | this_data <- clean_bowling_data(this_data)
48 | } else {
49 | this_data <- clean_fielding_data(this_data)
50 | }
51 |
52 | return(this_data)
53 | }
54 |
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | output: github_document
3 | ---
4 |
5 |
6 |
7 | ```{r setup, include = FALSE}
8 | knitr::opts_chunk$set(
9 | echo = TRUE,
10 | collapse = TRUE,
11 | comment = "#>",
12 | cache = TRUE,
13 | fig.path = "man/figures/README-"
14 | )
15 | library(cricketdata)
16 | library(tidyverse)
17 | ```
18 |
19 | # cricketdata
20 |
21 |
22 | [](https://cran.r-project.org/package=cricketdata)
23 | [](https://cran.r-project.org/package=cricketdata)
24 | [](https://www.gnu.org/licenses/gpl-3.0.en.html)
25 | [](https://github.com/robjhyndman/cricketdata/actions)
26 | [](https://github.com/robjhyndman/cricketdata/actions/workflows/R-CMD-check.yaml)
27 |
28 |
29 | This package provides functions for downloading data on international and other major cricket matches from [ESPNCricinfo](https://www.espncricinfo.com) and [Cricsheet](https://cricsheet.org), returning the data as tibbles ready for analysis.
30 |
31 | Please respect the terms of use for each website: [ESPNCricinfo](https://www.espncricinfo.com/ci/content/site/company/terms_use.html), [Cricsheet](https://cricsheet.org/register/).
32 |
33 | ## Installation
34 |
35 | You can install the **stable** version from [CRAN](https://cran.r-project.org/package=cricketdata).
36 |
37 | ``` r
38 | # install.packages("pak")
39 | pak::pak("cricketdata")
40 | ```
41 |
42 | You can install the **development** version from [GitHub](https://github.com/robjhyndman/cricketdata):
43 |
44 | ``` r
45 | pak::pak("robjhyndman/cricketdata")
46 | ```
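
Once the package is installed, a typical session might look like the sketch below. The calls are adapted from the package's own help-page examples, and the object names are arbitrary. Every call downloads data from ESPNCricinfo, so you need an internet connection, and large queries can take a while.

``` r
library(cricketdata)

# Career T20 batting records for Australian women.
# Arguments are case-insensitive and partially matched.
aus_wt20 <- fetch_cricinfo("T20", "Women", country = "Aust")

# Career ODI bowling records for Indian men
india_odi_bowling <- fetch_cricinfo("ODI", "men", "bowling", country = "india")

# Innings-by-innings T20 batting for a single player, identified by
# their Cricinfo ID (275487 is Ellyse Perry)
ellyse_perry <- fetch_player_data(275487, "T20", "batting")
```

If you do not know a player's Cricinfo ID, `find_player_id()` can look it up by name.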
47 |
48 | ## License
49 |
50 | This package is free and open source software, licensed under GPL-3.
51 |
--------------------------------------------------------------------------------
/man/fetch_player_data.Rd:
--------------------------------------------------------------------------------
1 | % Generated by roxygen2: do not edit by hand
2 | % Please edit documentation in R/fetch_player.R
3 | \name{fetch_player_data}
4 | \alias{fetch_player_data}
5 | \title{Fetch Player Data}
6 | \usage{
7 | fetch_player_data(
8 | playerid,
9 | matchtype = c("test", "odi", "t20"),
10 | activity = c("batting", "bowling", "fielding")
11 | )
12 | }
13 | \arguments{
14 | \item{playerid}{The player ID as given in the Cricinfo profile. Integer or character.}
15 |
16 | \item{matchtype}{Which type of cricket matches do you want? Tests, ODIs or T20s? Not case-sensitive.}
17 |
18 | \item{activity}{Which type of activities do you want? Batting, Bowling or Fielding? Not case-sensitive.}
19 | }
20 | \value{
21 | A tibble containing data on the selected player, with one row for every innings
22 | of every match in which they have played.
23 | }
24 | \description{
25 | Fetch individual player data from all matches played. The function will scrape
26 | the data from ESPNCricinfo and return a tibble with one line per innings for all
27 | games a player has played. To identify a player, use their Cricinfo player ID.
28 | The simplest way to find this is to look up their Cricinfo Profile page. The number
29 | at the end of the URL is the ID. For example, Meg Lanning's profile page is
30 | http://www.espncricinfo.com/australia/content/player/329336.html,
31 | so her ID is 329336.
32 | }
33 | \examples{
34 | \dontrun{
35 | # Download data on some players
36 | EllysePerry <- fetch_player_data(275487, "T20", "batting")
37 | RahulDravid <- fetch_player_data(28114, "ODI", "fielding")
38 | LasithMalinga <- fetch_player_data(49758, "Test", "bowling")
39 |
40 | # Create a plot for Ellyse Perry's T20 scores
41 | library(dplyr)
42 | library(ggplot2)
43 | EllysePerry |>
44 | filter(!is.na(Runs)) |>
45 | ggplot(aes(x = Start_Date, y = Runs, col = Dismissal)) +
46 | geom_point() +
47 | ggtitle("Ellyse Perry's T20 Scores")
48 | }
49 | }
50 | \seealso{
51 | \code{\link[=find_player_id]{find_player_id()}} to find a player ID by searching on their name,
52 | and \code{\link[=fetch_player_meta]{fetch_player_meta()}} to download meta data for players.
53 | }
54 | \author{
55 | Rob J Hyndman and Sayani Gupta
56 | }
57 |
--------------------------------------------------------------------------------
/R/clean_fielding_data.R:
--------------------------------------------------------------------------------
1 | # Function to clean fielding data.
2 | # Works with career or innings data
3 |
4 | clean_fielding_data <- function(x) {
5 | # Make names easier to interpret
6 | vars <- colnames(x)
7 | vars[vars == "Mat"] <- "Matches"
8 | vars[vars == "Inns"] <- "Innings"
9 | vars[vars == "Start Date"] <- "Date"
10 | vars[vars == "Dis"] <- "Dismissals"
11 | vars[vars == "Ct"] <- "Caught"
12 | vars[vars == "St"] <- "Stumped"
13 | vars[vars == "Ct Wk"] <- "CaughtBehind"
14 | vars[vars == "Ct Fi"] <- "CaughtFielder"
15 | vars[vars == "MD"] <- "MaxDismissalsInnings"
16 |
17 | colnames(x) <- vars
18 |
19 | # Fix classes for all variables
20 | x$Innings <- as.integer(x$Innings)
21 | x$Dismissals <- as.integer(x$Dismissals)
22 | x$Caught <- as.integer(x$Caught)
23 | x$Stumped <- as.integer(x$Stumped)
24 | x$CaughtBehind <- as.integer(x$CaughtBehind)
25 | x$CaughtFielder <- as.integer(x$CaughtFielder)
26 |
27 | career <- ("Matches" %in% vars)
28 | if (career) {
29 | x$Matches <- as.integer(x$Matches)
30 | if ("Span" %in% vars) {
31 | x$Start <- as.integer(substr(x$Span, 1, 4))
32 | x$End <- as.integer(substr(x$Span, 6, 9))
33 | }
34 | x$MaxDismissalsInnings <- unlist(lapply(
35 | strsplit(x$MaxDismissalsInnings, "\\("), function(x) {
36 | as.numeric(x[1])
37 | }
38 | ))
39 | } else {
40 | x$Date <- lubridate::dmy(x$Date)
41 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "")
42 | x$Opposition <- rename_countries(x$Opposition)
43 | }
44 |
45 | # Extract country information if it is present
46 | # This should only be required when multiple countries are included
47 | country <- (length(grep("\\(", x[1, 1])) > 0)
48 | if (country) {
49 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z \\-]+\\)")
50 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "")
51 | x$Country <- rename_countries(x$Country)
52 | x$Player <- stringr::str_replace(x$Player, "\\([a-zA-Z \\-]+\\)", "")
53 | }
54 |
55 | # Re-order and select columns
56 | vars <- colnames(x)
57 | if (career) {
58 | varorder <- c(
59 | "Player", "Country", "Start", "End", "Matches", "Innings",
60 | "Dismissals", "Caught", "CaughtFielder", "CaughtBehind", "Stumped", "MaxDismissalsInnings"
61 | )
62 | } else {
63 | varorder <- c(
64 | "Date", "Player", "Country",
65 | "Dismissals", "Caught", "CaughtFielder", "CaughtBehind", "Stumped", "Innings", "Opposition", "Ground"
66 | )
67 | }
68 | varorder <- varorder[varorder %in% vars]
69 |
70 | return(x[, varorder])
71 | }
72 |
--------------------------------------------------------------------------------
/R/find_player_id.R:
--------------------------------------------------------------------------------
1 | #' Find a player id from cricinfo.com
2 | #'
3 | #' @param searchstring Part of a player name(s) to search for. Can be a character vector.
4 | #' @return A table of matching players, their ids, and teams they played for.
5 | #' @seealso [fetch_player_data()] to download playing statistics for
6 | #' a player, and [fetch_player_meta()] to download meta data on players.
7 | #' @examples
8 | #' \dontrun{
9 | #' (perry <- find_player_id("Perry"))
10 | #' EllysePerry <- fetch_player_data(perry$ID[2], "test")
11 | #' }
12 | #' @author Rob J Hyndman
13 | #' @export
14 |
15 | find_player_id <- function(searchstring) {
16 | url <- paste0(
17 | "http://stats.espncricinfo.com/ci/engine/stats/analysis.html?search=",
18 | searchstring, ";template=analysis"
19 | )
20 | url <- gsub(" ", "%20", url)
21 | raw <- lapply(url, function(x) try(xml2::read_html(x), silent = TRUE))
22 | if (any(vapply(raw, inherits, logical(1), what = "try-error"))) {
23 | stop("Unable to read search results from ESPNCricinfo")
24 | }
25 | # Extract table of names
26 | tab <- lapply(raw, function(x) rvest::html_table(x)[[1]])
27 | # Make into a table
28 | tab <- lapply(tab, function(x) {
29 | x <- tibble::as_tibble(x, .name_repair = "check_unique")
30 | x <- x[x$X1 != "", ]
31 | })
32 |
33 | for (i in seq_along(searchstring)) {
34 | x <- tab[[i]]
35 | # Check if any players returned
36 | if (x[1, 1] == "No matching players found") {
37 | warning(paste("No matching players found for search:", searchstring[i]))
38 | }
39 | # Check if return exceeds 100
40 | if (grepl("Search restricted", x[nrow(x), 1])) {
41 | warning(paste0(
42 | "Only the first 100 results returned for search: '", searchstring[i],
43 | "'. Try a more specific search"
44 | ))
45 | tab[[i]] <- x[1:100, ]
46 | }
47 | }
48 |
49 | tab <- dplyr::bind_rows(tab)
50 |
51 | # Name columns
52 | colnames(tab) <- c("Name", "Country", "Played")
53 | # Now to find the ids
54 | ids <- lapply(raw, rvest::html_nodes, css = "a")
55 | ids <- lapply(ids, function(x) as.character(x[grep("/ci/engine/player/", x)]))
56 | ids <- lapply(ids, function(x) gsub("([a-zA-Z= \\\"/<>]*)", "", x))
57 | ids <- lapply(ids, function(x) {
58 | unlist(lapply(strsplit(x, ".", fixed = TRUE), function(x) {
59 | x[1]
60 | }))
61 | })
62 | # Insert NA for those without matches
63 | ids <- lapply(ids, unique)
64 | ids <- lapply(ids, function(x) if (is.null(x)) NA else as.numeric(x))
65 | tab$ID <- unlist(ids)
66 | tab$searchstring <- rep(searchstring, times = unlist(lapply(ids, length)))
67 | return(tab[, c("ID", "Name", "searchstring", "Country", "Played")])
68 | }
69 |
--------------------------------------------------------------------------------
/data-raw/make_player_meta.R:
--------------------------------------------------------------------------------
1 | library(cricketdata)
2 | library(rvest)
3 | library(stringr)
4 | library(dplyr)
5 |
6 | # List of the first 1000 ESPN teams
7 | get_espn_countries <- function(n = 1000) {
8 | countries <- tibble(id = seq(n), country = character(n))
9 | for(i in countries$id) {
10 | cat(i," ")
11 | countries$country[i] <- cricketdata:::get_espn_country(i)
12 | }
13 | return(countries)
14 | }
15 |
16 | espn_countries <- get_espn_countries()
17 | usethis::use_data(espn_countries, overwrite = TRUE, internal = TRUE)
18 |
19 |
20 | # Create tibble of cricinfo meta data for all players who are on both cricsheet and cricinfo.
21 | # Setting start_again = TRUE re-downloads everything, in case some data has been corrected online,
22 | # but start_again = FALSE is much more efficient.
23 | new_player_meta <- update_player_meta(start_again = FALSE)
24 | new_player_meta <- new_player_meta |>
25 | as_tibble() |>
26 | mutate(
27 | country = if_else(str_detect(country, "P.N.G."), "Papua New Guinea", country),
28 | country = if_else(str_detect(country, "U.A.E."), "United Arab Emirates", country),
29 | country = if_else(str_detect(country, "U.S.A."), "United States of America", country),
30 | country = if_else(str_detect(country, "Czech Rep."), "Czech Republic", country),
31 | country = if_else(str_detect(country, "Cayman Is"), "Cayman Islands", country),
32 | )
33 |
34 | # Convert all character fields to ASCII, transliterating accented characters
35 | for (j in seq_len(NCOL(new_player_meta))) {
36 | if (is.character(new_player_meta[[j]])) {
37 | new_player_meta[[j]] <- iconv(new_player_meta[[j]], from = "UTF-8", to = "ASCII//TRANSLIT")
38 | }
39 | }
40 | player_meta <- new_player_meta
41 | usethis::use_data(player_meta, overwrite = TRUE)
42 |
43 | # Need to update date and size of object in following files
44 | # fetch_player_meta.R
45 | # update_player_meta.R
46 | # data.R
47 |
48 | # Find all csv2 files on cricsheet
49 | downloads <- read_html("https://cricsheet.org/downloads/")
50 | names <- downloads |>
51 | html_elements("td") |>
52 | html_text2() |>
53 | str_trim()
54 | names <- names[names != ""]
55 | names <- names[!str_detect(names, "JSON")]
56 | names <- names[!grepl("^[0-9,]*$", names)]
57 | names <- names[!str_detect(names, "(Added|Played) in the previous")]
58 | first <- which(str_detect(names, "^All matches"))[2]
59 | names <- names[first:length(names)]
60 | names <- names[!str_detect(names, "Original\nNew")]
61 | names <- str_remove(names, " [0-9]* matches withheld \\(\\?\\)")
62 |
63 | acronyms <- downloads |>
64 | html_elements("a") |>
65 | as.character()
66 | acronyms <- acronyms[grepl("csv2", acronyms)]
67 | acronyms <- acronyms[!grepl("_male_csv2", acronyms)]
68 | acronyms <- acronyms[!grepl("_female_csv2", acronyms)]
69 | acronyms <- tail(acronyms, length(names))
70 | acronyms <- str_extract(acronyms, "[a-zA-Z0-9]*_csv2")
71 | acronyms <- str_remove(acronyms, "_csv2")
72 |
73 | cricsheet_codes <- tibble::tibble(
74 | competition = names,
75 | code = acronyms
76 | )
77 | usethis::use_data(cricsheet_codes, overwrite = TRUE)
78 |
--------------------------------------------------------------------------------
/R/fetch_player_meta.R:
--------------------------------------------------------------------------------
1 | #' Fetch Player Meta Data
2 | #'
3 | #' Fetch player meta data from ESPNCricinfo and return a tibble with one line
4 | #' per player. To identify the players, use their Cricinfo player IDs.
5 | #' The simplest way to find this is to look up their Cricinfo Profile page. The number
6 | #' at the end of the URL is the ID. For example, Meg Lanning's profile page is
7 | #' https://www.espncricinfo.com/cricketers/meg-lanning-329336,
8 | #' so her ID is 329336.
9 | #'
10 | #' @param playerid A vector of player IDs as given in Cricinfo profiles. Integer or character.
11 | #'
12 | #' @return A tibble containing meta data on the selected players, with one row for
13 | #' each player.
14 | #' @author Hassan Rafique and Rob J Hyndman
15 | #' @seealso It is usually simpler to just use the saved data set [player_meta]
16 | #' which contains the meta data for all players who appear on both Cricsheet and ESPNCricinfo, as at 24 March 2025.
17 | #' To find a player ID, use [find_player_id()].
18 | #' Use [fetch_player_data()] to download playing statistics for a player.
19 | #' @examples
20 | #' \dontrun{
21 | #' # Download meta data on Meg Lanning and Ellyse Perry
22 | #' aus_women <- fetch_player_meta(c(329336, 275487))
23 | #' }
24 | #' @export
25 | fetch_player_meta <- function(playerid) {
26 | output <- NULL
27 | pb <- cli::cli_progress_bar(total = length(playerid))
28 | for (j in seq_along(playerid)) {
29 | cli::cli_progress_update()
30 | output <- rbind(output, fetch_player_meta_individual(playerid[j]))
31 | }
32 | cli::cli_progress_done()
33 | return(output)
34 | }
35 |
36 | fetch_player_meta_individual <- function(playerid) {
37 | # Set up empty output
38 | output <- data.frame(
39 | cricinfo_id = playerid,
40 | name = NA_character_,
41 | full_name = NA_character_,
42 | country = NA_character_,
43 | dob = as.Date(NA),
44 | batting_style = NA_character_,
45 | bowling_style = NA_character_
46 | )
47 | # Read JSON file with player meta data
48 | url <- paste0("http://core.espnuk.org/v2/sports/cricket/athletes/", playerid)
49 | json <- try(jsonlite::read_json(url), silent = TRUE)
50 | if ("try-error" %in% class(json)) {
51 | warning(paste(
52 | "Cannot read player information from ESPNCricinfo for ID",
53 | playerid
54 | ))
55 | } else {
56 | output$full_name <- json$fullName
57 | output$name <- json$name
58 | output$country <- get_espn_country(json$country)
59 | if(is.na(output$country)) {
60 | stop(paste("Country not found. Player",playerid,output$full_name))
61 | }
62 | output$dob <- as.Date(json$dateOfBirth)
63 | if(length(json$style) > 0) {
64 | output$batting_style <- json$style[[1]]$description
65 | }
66 | if(length(json$style) > 1) {
67 | output$bowling_style <- json$style[[2]]$description
68 | }
69 | }
70 | return(output)
71 | }
72 |
73 | # Find country from ESPN team id
74 | get_espn_country <- function(i) {
75 | if(i %in% espn_countries$id) {
76 | return(espn_countries$country[espn_countries$id == i])
77 | }
78 | json <- try(jsonlite::read_json(paste0("http://core.espnuk.org/v2/sports/cricket/teams/", i)), silent = TRUE)
79 | if(!("try-error" %in% class(json))) {
80 | # Note: this only modifies a local copy of espn_countries, so the result is not cached between calls
81 | espn_countries <- rbind(espn_countries, data.frame(id = i, country = json$name))
82 | return(json$name)
83 | } else {
84 | return(NA_character_)
85 | }
86 | }
87 |
88 | utils::globalVariables(c(
89 | "cricinfo_id", "full_name", "country", "dob", "birthplace",
90 | "batting_style", "bowling_style", "playing_role",
91 | "title", "values"
92 | ))
93 |
--------------------------------------------------------------------------------
/R/fetch_cricket_data.R:
--------------------------------------------------------------------------------
1 | # Main function to scrape the data from cricinfo
2 | # Not user-visible. Called by fetch_cricinfo.
3 |
4 | fetch_cricket_data <- function(matchtype = c("test", "odi", "t20"),
5 | sex = c("men", "women"),
6 | country = NULL,
7 | activity = c("batting", "bowling", "fielding"),
8 | view = c("innings", "career")) {
9 | # Convert arguments to lower case and check that each one matches
10 | # one of the allowed values given in the function defaults.
11 | matchtype <- tolower(matchtype)
12 | sex <- tolower(sex)
13 | matchtype <- match.arg(matchtype)
14 | sex <- match.arg(sex)
15 | activity <- match.arg(activity)
16 | view <- match.arg(view)
17 |
18 | # Set view text.
19 | view_text <- if (view == "innings") {
20 | ";view=innings"
21 | } else {
22 | NULL
23 | }
24 |
25 | # Define url signifier for match type.
26 | matchclass <-
27 | match(matchtype, c("test", "odi", "t20")) + 7 * (sex == "women")
28 |
29 | # Find country code
30 | if (!is.null(country)) {
31 | if (sex == "men") {
32 | team <- men$team[pmatch(country, tolower(men$name))]
33 | } else {
34 | team <- women$team[pmatch(country, tolower(women$name))]
35 | }
36 | if (is.na(team)) {
37 | stop("Country not found")
38 | }
39 | }
40 |
41 | # Set starting page to read from.
42 | page <- 1L
43 | alldata <- NULL
44 |
45 | # Read each page in turn and bind the rows.
46 | theend <- FALSE # Initialise.
47 | while (!theend) {
48 | # Create url string.
49 | url <-
50 | paste0(
51 | "http://stats.espncricinfo.com/ci/engine/stats/index.html?class=",
52 | matchclass,
53 | ifelse(is.null(country), "", paste0(";team=", team)),
54 | ";page=",
55 | format(page, scientific = FALSE),
56 | ";template=results;type=",
57 | activity,
58 | view_text,
59 | ";size=200;wrappertype=print"
60 | )
61 |
62 | # Get raw page data from page using xml2::read_html() with url string.
63 | raw <- try(xml2::read_html(url), silent = TRUE)
64 | if ("try-error" %in% class(raw)) {
65 | stop("Error in URL")
66 | }
67 |
68 | # Grab relevant table using rvest::html_table() on the raw page data.
69 | tables <- rvest::html_table(raw)
70 | tab <- tables[[3]]
71 | # Check to see if the dataset extracted is empty
72 | if (identical(dim(tab), c(1L, 1L))) {
73 | theend <- TRUE
74 | }
75 | if (page == 1L) {
76 | if (theend) {
77 | stop("No data available")
78 | }
79 | maxpage <- as.numeric(strsplit(dplyr::pull(tables[[2]][1, 1]), "Page 1 of ")[[1]][2])
80 | cli::cli_progress_bar("Downloading", total = maxpage)
81 | #cli::cli_progress_update()
82 | #Sys.sleep(1 / 1000)
83 | }
84 | if (!theend) {
85 | # Make all columns character for now.
86 | tab <- suppressMessages(tibble::as_tibble(apply(tab, 2, as.character), .name_repair = "unique"))
87 |
88 | # Bind the data extracted from this page to all data collected so far.
89 | alldata <- dplyr::bind_rows(alldata, tab)
90 |
91 | # Update progress bar
92 | cli::cli_progress_update()
93 | Sys.sleep(1 / 1000)
94 |
95 | # Increment page counter.
96 | page <- page + 1L
97 | }
98 | }
99 |
100 | cli::cli_progress_done()
101 |
102 | # Remove redundant missings columns.
103 | alldata <-
104 | suppressMessages(tibble::as_tibble(alldata[, colSums(is.na(alldata)) != NROW(alldata)], .name_repair = "check_unique"))
105 | # Convert "-" to NA
106 | alldata[alldata == "-"] <- NA
107 |
108 | return(alldata)
109 | }
110 |
--------------------------------------------------------------------------------
/R/update_player_meta.R:
--------------------------------------------------------------------------------
1 | #' Update player_meta
2 | #'
3 | #' The [player_meta] data set contains the names and other
4 | #' attributes of players who appear on both [cricsheet](https://cricsheet.org)
5 | #' and [ESPNCricinfo](https://www.espncricinfo.com) as at 24 March 2025.
6 | #' This function returns an updated version of the data set based on information
7 | #' currently available online.
8 | #'
9 | #' @param start_again If TRUE, downloads all data from ESPNCricinfo without
10 | #' using player_meta as a starting point. This can take a long time.
11 | #' @return A tibble containing meta data on cricket players.
12 | #' @author Hassan Rafique and Rob J Hyndman
13 | #' @seealso [player_meta], [fetch_player_meta()].
14 | #' @examples
15 | #' \dontrun{
16 | #' # Update data to current
17 | #' new_player_meta <- update_player_meta()
18 | #' }
19 | #' @export
20 | update_player_meta <- function(start_again = FALSE) {
21 | store_warning <- options(warn = -1)$warn
22 | # Remove people with no country from existing player_meta
23 | player_meta <- player_meta |>
24 | dplyr::filter(!is.na(country))
25 | # Read people file from cricsheet
26 | people <- readr::read_csv("https://cricsheet.org/register/people.csv",
27 | col_types = "ccccccccccccccc", lazy = FALSE
28 | ) |>
29 | dplyr::select(
30 | cricsheet_id = identifier,
31 | cricinfo_id = key_cricinfo,
32 | cricinfo_id2 = key_cricinfo_2,
33 | name, unique_name
34 | ) |>
35 | # Remove people not on Cricinfo
36 | dplyr::filter(!is.na(cricinfo_id))
37 |
38 | # Compare existing version of player_meta and find missing players
39 | if (start_again) {
40 | missing_df <- people
41 | } else {
42 | missing_df <- people |>
43 | dplyr::anti_join(player_meta, by = "cricinfo_id") |>
44 | dplyr::anti_join(player_meta, by = c("cricinfo_id2" = "cricinfo_id"))
45 | }
46 |
47 | # Now download meta data for new players
48 | new_player_meta <- fetch_player_meta(missing_df$cricinfo_id)
49 |
50 | # For people missing on cricinfo, try the other cricinfo id
51 | cricinfo2 <- new_player_meta |>
52 | dplyr::left_join(people |> dplyr::select(-name), by = "cricinfo_id") |>
53 | dplyr::filter(is.na(full_name) & !is.na(cricinfo_id2)) |>
54 | dplyr::pull(cricinfo_id2)
55 | if (length(cricinfo2) > 0) {
56 | new_player_meta <- new_player_meta |>
57 | dplyr::bind_rows(fetch_player_meta(cricinfo2)) |>
58 | dplyr::filter(!is.na(full_name))
59 | }
60 |
61 | # Add in cricsheet id
62 | new_player_meta <- dplyr::bind_rows(
63 | new_player_meta |>
64 | dplyr::left_join(people |> dplyr::select(-name),
65 | by = "cricinfo_id"
66 | ) |>
67 | dplyr::select(-cricinfo_id2) |>
68 | dplyr::filter(!is.na(cricsheet_id)),
69 | new_player_meta |>
70 | dplyr::left_join(people |> dplyr::select(-name, -cricinfo_id),
71 | by = c("cricinfo_id" = "cricinfo_id2")
72 | ) |>
73 | dplyr::filter(!is.na(cricsheet_id))
74 | ) |>
75 | # Organize by column and row
76 | dplyr::select(
77 | cricinfo_id, cricsheet_id, unique_name, full_name,
78 | dplyr::everything()
79 | ) |>
80 | # Remove missing people
81 | dplyr::filter(!is.na(full_name))
82 |
83 | # Add to existing player_meta
84 | if (!start_again) {
85 | new_player_meta <- new_player_meta |>
86 | dplyr::mutate(dob = as.Date(dob)) |>
87 | dplyr::bind_rows(player_meta)
88 | }
89 |
90 | # Clean up and arrange
91 | new_player_meta <- new_player_meta |>
92 | # Fix country names
93 | dplyr::mutate(country = stringr::str_remove(country, " Wmn")) |>
94 | # Arrange in alphabetic order
95 | dplyr::arrange(full_name) |>
96 | # Remove duplicates
97 | dplyr::distinct()
98 |
99 | options(warn = store_warning)
100 | return(new_player_meta)
101 | }
102 |
103 | utils::globalVariables(c(
104 | "identifier", "key_cricinfo", "key_cricinfo_2", "name", "unique_name",
105 | "player_meta", "cricinfo_id", "cricinfo_id2", "cricsheet_id"
106 | ))
107 |
--------------------------------------------------------------------------------
/R/clean_batting_data.R:
--------------------------------------------------------------------------------
1 | # Function to clean batting data.
2 | # Works with career or innings data
3 |
4 | clean_batting_data <- function(x) {
5 | # Make names easier to interpret
6 | vars <- colnames(x)
7 | vars[vars == "Mat"] <- "Matches"
8 | vars[vars == "Inns"] <- "Innings"
9 | vars[vars == "NO"] <- "NotOuts"
10 | vars[vars == "HS"] <- "HighScore"
11 | vars[vars == "Ave"] <- "Average"
12 | vars[vars == "100"] <- "Hundreds"
13 | vars[vars == "50"] <- "Fifties"
14 | vars[vars == "0"] <- "Ducks"
15 | vars[vars == "SR"] <- "StrikeRate"
16 | vars[vars == "BF"] <- "BallsFaced"
17 | vars[vars == "4s"] <- "Fours"
18 | vars[vars == "6s"] <- "Sixes"
19 | vars[vars == "Mins"] <- "Minutes"
20 | vars[vars == "Start Date"] <- "Date"
21 | colnames(x) <- vars
22 |
23 | career <- ("Matches" %in% vars)
24 | if (career) {
25 | x$Matches <- as.integer(x$Matches)
26 | x$NotOuts <- as.integer(x$NotOuts)
27 | x$Hundreds <- as.integer(x$Hundreds)
28 | x$Fifties <- as.integer(x$Fifties)
29 | x$Ducks <- as.integer(x$Ducks)
30 | # Add not out column and remove annotations from highscore
31 | x$HighScoreNotOut <- seq(NROW(x)) %in% grep("*", x$HighScore, fixed = TRUE)
32 | x$HighScore <- as.numeric(gsub("*", "", x$HighScore, fixed = TRUE))
33 | if ("Span" %in% vars) {
34 | x$Start <- as.integer(substr(x$Span, 1, 4))
35 | x$End <- as.integer(substr(x$Span, 6, 9))
36 | }
37 | x$Runs <- as.integer(x$Runs)
38 | x$Innings <- as.integer(x$Innings)
39 | x$Average <- x$Runs / (x$Innings - x$NotOuts)
40 | } else {
41 | x$Innings <- as.integer(x$Innings)
42 | x$Minutes <- as.numeric(x$Minutes)
43 | x$Date <- lubridate::dmy(x$Date)
44 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "")
45 | x$Opposition <- rename_countries(x$Opposition)
46 | # Add not out and participation column and remove annotations from runs
47 | x$NotOut <- seq(NROW(x)) %in% grep("*", x$Runs, fixed = TRUE)
48 | x$Runs <- gsub("*", "", x$Runs, fixed = TRUE)
49 | x$Participation <- participation_status(x$Runs)
50 | x$Runs[x$Participation != "B"] <- NA
51 | x$Runs <- as.integer(x$Runs)
52 | }
53 |
54 | if ("BallsFaced" %in% vars) {
55 | x$BallsFaced <- as.integer(x$BallsFaced)
56 | x$StrikeRate <- x$Runs / x$BallsFaced * 100
57 | }
58 | if ("Fours" %in% vars) {
59 | x$Fours <- as.integer(x$Fours)
60 | x$Sixes <- as.integer(x$Sixes)
61 | }
62 |
63 | # Extract country information if it is present
64 | # This should only be required when multiple countries are included
65 | country <- (length(grep("\\(", x$Player)) > 0)
66 | if (country) {
67 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z /\\-]+\\)")
68 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "")
69 | x$Country <- rename_countries(x$Country)
70 | x$Player <- stringr::str_replace(x$Player, "\\([a-zA-Z /\\-]+\\)", "")
71 | }
72 |
73 | # Re-order and select columns
74 | vars <- colnames(x)
75 | if (career) {
76 | varorder <- c(
77 | "Player", "Country", "Start", "End", "Matches", "Innings", "NotOuts", "Runs", "HighScore", "HighScoreNotOut",
78 | "Average", "BallsFaced", "StrikeRate", "Hundreds", "Fifties", "Ducks", "Fours", "Sixes"
79 | )
80 | } else {
81 | varorder <- c(
82 | "Date", "Player", "Country", "Runs", "NotOut", "Minutes", "BallsFaced", "Fours", "Sixes",
83 | "StrikeRate", "Innings", "Participation", "Opposition", "Ground"
84 | )
85 | }
86 | varorder <- varorder[varorder %in% vars]
87 |
88 | return(x[, varorder])
89 | }
90 |
91 |
92 | # Convert batting/bowling entries to a participation status: "B" (batted/bowled), "Absent", "DNB", "TDNB" or "Sub".
93 |
94 | participation_status <- function(status) {
95 | absent <- grep("absent", status)
96 | dnb <- grep("^DNB", status)
97 | tdnb <- grep("TDNB", status)
98 | sub <- grep("sub", status)
99 |
100 | status[seq(NROW(status))] <- "B"
101 | status[absent] <- "Absent"
102 | status[dnb] <- "DNB"
103 | status[tdnb] <- "TDNB"
104 | status[sub] <- "Sub"
105 |
106 | return(status)
107 | }
108 |
--------------------------------------------------------------------------------
/R/clean_bowling_data.R:
--------------------------------------------------------------------------------
1 | # Function to clean bowling data.
2 | # Works with career or innings data
3 |
4 | clean_bowling_data <- function(x) {
5 | # Make names easier to interpret
6 | vars <- colnames(x)
7 | vars[vars == "Mat"] <- "Matches"
8 | vars[vars == "Inns"] <- "Innings"
9 | vars[vars == "Mdns"] <- "Maidens"
10 | vars[vars == "Wkts"] <- "Wickets"
11 | vars[vars == "BBI"] <- "BestBowlingInnings"
12 | vars[vars == "BBM"] <- "BestBowlingMatch"
13 | vars[vars == "Ave"] <- "Average"
14 | vars[vars == "Econ"] <- "Economy"
15 | vars[vars == "SR"] <- "StrikeRate"
16 | vars[vars == "4"] <- "FourWickets"
17 | vars[vars == "5"] <- "FiveWickets"
18 | vars[vars == "10"] <- "TenWickets"
19 | vars[vars == "Start Date"] <- "Date"
20 | colnames(x) <- vars
21 |
22 | # Fix classes for all variables
23 | if ("Maidens" %in% vars) {
24 | x$Maidens <- as.integer(x$Maidens)
25 | }
26 | if ("Balls" %in% vars) {
27 | x$Balls <- as.integer(x$Balls)
28 | }
29 | x$Runs <- as.integer(x$Runs)
30 | x$Wickets <- as.integer(x$Wickets)
31 | x$Innings <- as.integer(x$Innings)
32 |
33 | career <- ("Matches" %in% vars)
34 |
35 | if (career) {
36 | x$Matches <- as.integer(x$Matches)
37 | if ("Span" %in% vars) {
38 | x$Start <- as.integer(substr(x$Span, 1, 4))
39 | x$End <- as.integer(substr(x$Span, 6, 9))
40 | }
41 | if ("FourWickets" %in% vars) {
42 | x$FourWickets <- as.integer(x$FourWickets)
43 | }
44 | if ("FiveWickets" %in% vars) {
45 | x$FiveWickets <- as.integer(x$FiveWickets)
46 | }
47 | if ("TenWickets" %in% vars) {
48 | x$TenWickets <- as.integer(x$TenWickets)
49 | }
50 | } else {
51 | # Add participation column
52 | if ("Overs" %in% vars) {
53 | x$Participation <- participation_status(x$Overs)
54 | x$Overs[x$Participation != "B"] <- NA
55 | }
56 | x$Date <- lubridate::dmy(x$Date)
57 | x$Opposition <- stringr::str_replace_all(x$Opposition, "v | Women| Wmn", "")
58 | x$Opposition <- rename_countries(x$Opposition)
59 | }
60 | if ("Overs" %in% vars) {
61 | x$Overs <- as.numeric(x$Overs)
62 | }
63 |
64 | # Recompute average to avoid rounding errors
65 | if ("Average" %in% vars) {
66 | x$Average <- x$Runs / x$Wickets
67 | }
68 |
69 | # Recompute economy rate to avoid rounding errors
70 | if ("Balls" %in% vars) {
71 | balls <- x$Balls
72 | } else {
73 | balls <- trunc(x$Overs) * 6 + round((x$Overs %% 1) * 10)
74 | }
75 | if ("Economy" %in% vars) {
76 | x$Economy <- as.numeric(x$Economy)
77 | ER <- x$Runs / (balls / 6)
78 | # Don't modify values if they differ by more than 0.05
79 | differ <- abs(round(ER, 2) - x$Economy) > 0.05
80 | differ[is.na(differ)] <- FALSE
81 | x$Economy[!differ] <- ER[!differ]
82 | }
83 |
84 | # Recompute strike rate
85 | if ("StrikeRate" %in% vars) {
86 | x$StrikeRate <- balls / x$Wickets
87 | }
88 |
89 | # Extract country information if it is present
90 | # This should only be required when multiple countries are included
91 | country <- (length(grep("\\(", x[1, 1])) > 0)
92 | if (country) {
93 | x$Country <- stringr::str_extract(x$Player, "\\([a-zA-Z \\-]+\\)")
94 | x$Country <- stringr::str_replace_all(x$Country, "\\(|\\)|-W", "")
95 | x$Country <- rename_countries(x$Country)
96 | x$Player <- stringr::str_replace(x$Player, " \\([a-zA-Z \\-]+\\)", "")
97 | }
98 |
99 | # Re-order and select columns
100 | vars <- colnames(x)
101 | if (career) {
102 | varorder <- c(
103 | "Player", "Country", "Start", "End", "Matches", "Innings", "Overs", "Balls", "Maidens", "Runs", "Wickets",
104 | "Average", "Economy", "StrikeRate", "BestBowlingInnings", "BestBowlingMatch", "FourWickets", "FiveWickets", "TenWickets"
105 | )
106 | } else {
107 | varorder <- c(
108 | "Date", "Player", "Country", "Overs", "Balls", "Maidens", "Runs", "Wickets",
109 | "Economy", "Innings", "Participation", "Opposition", "Ground"
110 | )
111 | }
112 | varorder <- varorder[varorder %in% vars]
113 |
114 | return(x[, varorder])
115 | }
116 |
--------------------------------------------------------------------------------
/R/countries.R:
--------------------------------------------------------------------------------
1 | men <- data.frame(
2 | team = c(1:9, 11, 12, 14, 15, 17, 19, 20, 25, 26, 27, 28, 29, 30, 32, 37, 40),
3 | name = c(
4 | "England",
5 | "Australia",
6 | "South Africa",
7 | "West Indies",
8 | "New Zealand",
9 | "India",
10 | "Pakistan",
11 | "Sri Lanka",
12 | "Zimbabwe",
13 | "United States of America",
14 | "Bermuda",
15 | "East Africa",
16 | "Netherlands",
17 | "Canada",
18 | "Hong Kong",
19 | "Papua New Guinea",
20 | "Bangladesh",
21 | "Kenya",
22 | "United Arab Emirates",
23 | "Namibia",
24 | "Ireland",
25 | "Scotland",
26 | "Nepal",
27 | "Oman",
28 | "Afghanistan"
29 | )
30 | )
31 |
32 | women <- data.frame(
33 | team = c(289, 1026, 1863, 4240, 3379, 3672, 2614, 2285, 3022, 2461, 3867, 825, 3808, 2331, 3505, 3843),
34 | name = c(
35 | "Australia",
36 | "England",
37 | "India",
38 | "Bangladesh",
39 | "South Africa",
40 | "Sri Lanka",
41 | "New Zealand",
42 | "Ireland",
43 | "Pakistan",
44 | "Netherlands",
45 | "West Indies",
46 | "Denmark",
47 | "Jamaica",
48 | "Japan",
49 | "Scotland",
50 | "Trinidad & Tobago"
51 | )
52 | )
53 |
54 | men <- tibble::as_tibble(men[order(men$name), ], .name_repair = "check_unique") |>
55 | dplyr::mutate(
56 | team = as.integer(team),
57 | name = as.character(name)
58 | )
59 | women <- tibble::as_tibble(women[order(women$name), ], .name_repair = "check_unique") |>
60 | dplyr::mutate(
61 | team = as.integer(team),
62 | name = as.character(name)
63 | )
64 |
65 |
66 | ## Men
67 | # 1 England
68 | # 2 Australia
69 | # 3 South Africa
70 | # 4 West Indies
71 | # 5 New Zealand
72 | # 6 India
73 | # 7 Pakistan
74 | # 8 Sri Lanka
75 | # 9 Zimbabwe
76 | # 11 United States of America
77 | # 12 Bermuda
78 | # 14 East Africa
79 | # 15 Netherlands
80 | # 17 Canada
81 | # 19 Hong Kong
82 | # 20 Papua New Guinea
83 | # 25 Bangladesh
84 | # 26 Kenya
85 | # 27 United Arab Emirates
86 | # 28 Namibia
87 | # 29 Ireland
88 | # 30 Scotland
89 | # 32 Nepal
90 | # 37 Oman
91 | # 40 Afghanistan
92 |
93 | ## WOMEN
94 |
95 | # 289 Australia Women
96 | # 1026 England Women
97 | # 1863 India Women
98 | # 4240 Bangladesh Women
99 | # 3379 South Africa Women
100 | # 3672 Sri Lanka Women
101 | # 2614 New Zealand Women
102 | # 2285 Ireland Women
103 | # 3022 Pakistan Women
104 | # 2461 Netherlands Women
105 | # 3867 West Indies Women
106 | # 825 Denmark Women
107 | # 3808 Jamaica Women
108 | # 2331 Japan Women
109 | # 3505 Scotland Women
110 | # 3843 Trinidad & Tobago Women
111 |
112 |
113 |
114 | rename_countries <- function(countries) {
115 | countries |>
116 | stringr::str_replace("AFG", "Afghanistan") |>
117 | stringr::str_replace("Afr$", "Africa XI") |>
118 | stringr::str_replace("AUS", "Australia") |>
119 | stringr::str_replace("Bdesh|BDESH|BD", "Bangladesh") |>
120 | stringr::str_replace("BMUDA", "Bermuda") |>
121 | stringr::str_replace("CAN", "Canada") |>
122 | stringr::str_replace("DnWmn|Denmk", "Denmark") |>
123 | stringr::str_replace("EAf", "East (and Central) Africa") |>
124 | stringr::str_replace("ENG", "England") |>
125 | stringr::str_replace("HKG", "Hong Kong") |>
126 | stringr::str_replace("ICC$", "ICC World XI") |>
127 | stringr::str_replace("INDIA|IND", "India") |>
128 | stringr::str_replace("IntWn|Int XI", "International XI") |>
129 | stringr::str_replace("Ire$|IRELAND|IRE", "Ireland") |>
130 | stringr::str_replace("JamWn", "Jamaica") |>
131 | stringr::str_replace("JPN", "Japan") |>
132 | stringr::str_replace("KENYA", "Kenya") |>
133 | stringr::str_replace("NAM", "Namibia") |>
134 | stringr::str_replace("NEPAL", "Nepal") |>
135 | stringr::str_replace("Neth$|NL", "Netherlands") |>
136 | stringr::str_replace("NZ", "New Zealand") |>
137 | stringr::str_replace("OMAN", "Oman") |>
138 | stringr::str_replace("PAK", "Pakistan") |>
139 | stringr::str_replace("PNG|P\\.N\\.G\\.", "Papua New Guinea") |>
140 | stringr::str_replace("^SA", "South Africa") |>
141 | stringr::str_replace("SCOT|SCO|Scot$", "Scotland") |>
142 | stringr::str_replace("SL", "Sri Lanka") |>
143 | stringr::str_replace("TTWmn|T \\& T", "Trinidad and Tobago") |>
144 | stringr::str_replace("UAE|U\\.A\\.E\\.", "United Arab Emirates") |>
145 | stringr::str_replace("USA|U\\.S\\.A\\.", "United States of America") |>
146 | stringr::str_replace("World$|World-XI", "World XI") |>
147 | stringr::str_replace("WI", "West Indies") |>
148 | stringr::str_replace("YEWmn|Y\\. Eng", "Young England") |>
149 | stringr::str_replace("ZIM", "Zimbabwe")
150 | }
151 |
--------------------------------------------------------------------------------
/R/fetch_player.R:
--------------------------------------------------------------------------------
1 | #' Fetch Player Data
2 | #'
3 | #' Fetch individual player data from all matches played. The function will scrape
4 | #' the data from ESPNCricinfo and return a tibble with one row per innings for all
5 | #' games a player has played. To identify a player, use their Cricinfo player ID.
6 | #' The simplest way to find this is to look up their Cricinfo profile page. The number
7 | #' at the end of the URL is the ID. For example, Meg Lanning's profile page is
8 | #' http://www.espncricinfo.com/australia/content/player/329336.html,
9 | #' so her ID is 329336.
10 | #'
11 | #' @param playerid The player ID as given in the Cricinfo profile. Integer or character.
12 | #' @param matchtype Which type of cricket matches do you want? Tests, ODIs or T20s? Not case-sensitive.
13 | #' @param activity Which type of activities do you want? Batting, Bowling or Fielding? Not case-sensitive.
14 | #'
15 | #' @return A tibble containing data on the selected player, with one row for every innings
16 | #' of every match in which they have played.
17 | #' @author Rob J Hyndman and Sayani Gupta
18 | #' @seealso [find_player_id()] to find a player ID by searching on their name,
19 | #' and [fetch_player_meta()] to download metadata for players.
20 | #' @examples
21 | #' \dontrun{
22 | #' # Download data on some players
23 | #' EllysePerry <- fetch_player_data(275487, "T20", "batting")
24 | #' RahulDravid <- fetch_player_data(28114, "ODI", "fielding")
25 | #' LasithMalinga <- fetch_player_data(49758, "Test", "bowling")
26 | #'
27 | #' # Create a plot for Ellyse Perry's T20 scores
28 | #' library(dplyr)
29 | #' library(ggplot2)
30 | #' EllysePerry |>
31 | #' filter(!is.na(Runs)) |>
32 | #' ggplot(aes(x = Date, y = Runs, col = Dismissal)) +
33 | #' geom_point() +
34 | #' ggtitle("Ellyse Perry's T20 Scores")
35 | #' }
36 | #' @export
37 | fetch_player_data <- function(
38 | playerid,
39 | matchtype = c("test", "odi", "t20"),
40 | activity = c("batting", "bowling", "fielding")
41 | ) {
42 | matchtype <- tolower(matchtype)
43 | matchtype <- match.arg(matchtype)
44 |
45 | activity <- tolower(activity)
46 | activity <- match.arg(activity)
47 |
48 | matchclass <- match(matchtype, c("test", "odi", "t20"))
49 |
50 | # Try male URL
51 | output <- get_cricinfo_data(playerid, matchclass, matchtype, activity)
52 | if (inherits(output, "character")) {
53 | if (output %in% c("No records")) {
54 | # Player exists. So try female URL
55 | output <- get_cricinfo_data(playerid, matchclass + 7L, matchtype, activity)
56 | }
57 | }
58 | if (inherits(output, "character")) {
59 | if (output == "No player") {
60 | stop("Player not found")
61 | } else if (output == "No records") {
62 | stop("Player did not play this format")
63 | }
64 | }
65 |
66 | # Remove columns that are entirely missing
67 | tab <- tibble::as_tibble(
68 | output[, colSums(is.na(output)) != NROW(output)],
69 | .name_repair = "check_unique"
70 | )
71 |
72 | # Convert "-" to NA
73 | tab[tab == "-"] <- NA
74 |
75 | # Convert some columns to numeric or Date
76 | tab$Innings <- as.integer(tab$Inns)
77 | tab$Date <- lubridate::dmy(tab$`Start Date`)
78 | tab$`Start Date` <- NULL
79 | tab$Opposition <- substring(tab$Opposition, 3)
80 | tab$Ground <- as.character(tab$Ground)
81 | if ("Mins" %in% colnames(tab)) {
82 | tab$Mins <- as.numeric(tab$Mins)
83 | }
84 |
85 | # Make syntactically valid column names, using "_" instead of "."
86 | tidy.col <- make.names(colnames(tab), unique = TRUE)
87 | colnames(tab) <- gsub(".", "_", tidy.col, fixed = TRUE)
88 | tidy.col <- colnames(tab)
89 |
90 | ## Columns common to all activities come first
91 | com_col <- c("Date", "Innings", "Opposition", "Ground")
92 |
93 | ## Removing "*" in the column `Runs` and converting it to numeric
94 | if ("Runs" %in% colnames(tab)) {
95 | tab$Runs <- suppressWarnings(as.numeric(gsub(
96 | "*",
97 | "",
98 | x = tab$Runs,
99 | fixed = TRUE
100 | )))
101 | }
102 |
103 | # Reorder columns
104 | return(
105 | tab[, c(com_col, tidy.col[!tidy.col %in% com_col])]
106 | )
107 | }
108 |
109 | get_cricinfo_data <- function(playerid, matchclass, matchtype, activity) {
110 | url <- paste(
111 | "http://stats.espncricinfo.com/ci/engine/player/",
112 | playerid,
113 | ".html?class=",
114 | matchclass,
115 | ";template=results;type=",
116 | activity,
117 | ";view=innings;wrappertype=print",
118 | sep = ""
119 | )
120 | raw <- try(xml2::read_html(url), silent = TRUE)
121 | if (!inherits(raw, "try-error")) {
122 | output <- rvest::html_table(raw)
123 | if ("No records available to match this query" %in% unlist(output)) {
124 | return("No records")
125 | } else {
126 | # Grab relevant table
127 | return(output[[4]])
128 | }
129 | } else {
130 | return("No player")
131 | }
132 | }
133 |
--------------------------------------------------------------------------------
/vignettes/cricinfo.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "ESPNCricinfo data"
3 | author: Rob J Hyndman
4 | output:
5 | rmarkdown::html_vignette:
6 | fig_width: 10
7 | fig_height: 5
8 | vignette: >
9 | %\VignetteIndexEntry{ESPNCricinfo data}
10 | %\VignetteEngine{knitr::rmarkdown}
11 | %\VignetteEncoding{UTF-8}
12 | ---
13 |
14 | ```{r, include = FALSE}
15 | knitr::opts_chunk$set(
16 | collapse = TRUE,
17 | comment = "#>",
18 | echo = TRUE
19 | )
20 | # Okabe-Ito colours
21 | options(
22 | ggplot2.discrete.colour = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442")
23 | )
24 | ```
25 |
26 | ```{r setup}
27 | library(cricketdata)
28 | library(dplyr)
29 | library(ggplot2)
30 | ```
31 |
32 | The `fetch_cricinfo()` function will fetch data on all international cricket matches provided by ESPNCricinfo. Please respect the [ESPNCricinfo terms of use](https://www.espncricinfo.com/ci/content/site/company/terms_use.html) when using this function.
33 |
34 | Here are some examples of its use.
35 |
36 | ```{r getdata, eval=FALSE, echo=FALSE}
37 | # Avoid downloading the data when the package is checked by CRAN.
38 | # This only needs to be run once to store the data locally
39 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
40 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")
41 | Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")
42 | meg_lanning_id <- find_player_id("Meg Lanning")$ID
43 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |>
44 | mutate(NotOut = (Dismissal == "not out")) |>
45 | mutate(NotOut = tidyr::replace_na(NotOut, FALSE))
46 |
47 | saveRDS(wt20, here::here("inst/extdata/wt20.rds"))
48 | saveRDS(menODI, here::here("inst/extdata/menODI.rds"))
49 | saveRDS(Indfielding, here::here("inst/extdata/Indfielding.rds"))
50 | saveRDS(MegLanning, here::here("inst/extdata/MegLanning.rds"))
51 | ```
52 |
53 | ```{r loaddata, include=FALSE}
54 | wt20 <- readRDS("../inst/extdata/wt20.rds")
55 | menODI <- readRDS("../inst/extdata/menODI.rds")
56 | Indfielding <- readRDS("../inst/extdata/Indfielding.rds")
57 | MegLanning <- readRDS("../inst/extdata/MegLanning.rds")
58 | ```
59 |
60 | ## Women's T20 bowling data
61 |
62 | ```r
63 | # Fetch all Women's T20 data
64 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
65 | ```
66 |
67 | ```{r woment20, message=FALSE, echo = FALSE}
68 | wt20 |>
69 | head() |>
70 | knitr::kable(digits = 2)
71 | ```
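When a query covers several countries, Cricinfo appends a team code in parentheses to each player's name. `clean_bowling_data()` and `clean_batting_data()` split that code into a separate `Country` column, and the internal `rename_countries()` helper then expands it. The sketch below shows only the splitting step. It applies the same regular expressions to hypothetical player strings, and the `(AUS-W)` suffix format is an assumption about how Cricinfo labels women's teams.

```r
library(stringr)

# Hypothetical scraped player labels (the "-W" women's suffix is assumed)
player <- c("Meg Lanning (AUS-W)", "Ellyse Perry (AUS-W)")

# Pull out the parenthesised team code, then strip the brackets and "-W"
code <- str_extract(player, "\\([a-zA-Z /\\-]+\\)")
code <- str_replace_all(code, "\\(|\\)|-W", "")

# Remove the code (and the preceding space) from the player name
name <- str_replace(player, " \\([a-zA-Z /\\-]+\\)", "")
```

In the package, the extracted codes (here `"AUS"`) would then be mapped to full country names.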
72 |
73 | ```{r woment20graph, fig.width=10, fig.height=8}
74 | wt20 |>
75 | filter(Wickets > 20, !is.na(Country)) |>
76 | ggplot(aes(y = StrikeRate, x = Country)) +
77 | geom_boxplot() +
78 | geom_point(alpha = 0.3, col = "blue") +
79 | ggtitle("Women T20: Strike Rates") +
80 | ylab("Balls per wicket") +
81 | coord_flip()
82 | ```
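The y-axis reads "Balls per wicket" because a bowling strike rate is balls bowled divided by wickets taken. Cricinfo records overs in `overs.balls` notation, so `clean_bowling_data()` converts overs to balls before it recomputes economy and strike rate. Below is a minimal sketch of that arithmetic on hypothetical figures, assuming six-ball overs. The `round()` call is included here to guard against floating-point error, because `3.4 %% 1` is stored as slightly less than 0.4.

```r
# Hypothetical bowling figures: 3.4 overs, 27 runs conceded, 2 wickets
overs <- 3.4
runs <- 27
wickets <- 2

# "3.4" means 3 completed overs plus 4 balls, not 3.4 overs
balls <- trunc(overs) * 6 + round((overs %% 1) * 10)

economy <- runs / (balls / 6)   # runs conceded per six-ball over
strike_rate <- balls / wickets  # balls bowled per wicket
average <- runs / wickets       # runs conceded per wicket
```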
83 |
84 | ## Australian men's ODI data by innings
85 |
86 | ```r
87 | # Fetch all Australian Men's ODI data by innings
88 | menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")
89 | ```
90 |
91 | ```{r menodi, message=FALSE, echo=FALSE}
92 | menODI |>
93 | head() |>
94 | knitr::kable()
95 | ```
96 |
97 | ```{r menodigraph, warning=FALSE, message=FALSE}
98 | menODI |>
99 | ggplot(aes(y = Runs, x = Date)) +
100 | geom_point(alpha = 0.2, col = "#D55E00") +
101 | geom_smooth() +
102 | ggtitle("Australia Men ODI: Runs per Innings")
103 | ```
104 |
105 | ## Indian men's Test fielding data
106 |
107 | ```r
108 | Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")
109 | ```
110 |
111 | ```{r indiafielding, echo=FALSE}
112 | Indfielding |>
113 | head() |>
114 | knitr::kable()
115 | ```
116 |
117 | ```{r indiafieldinggraph}
118 | Indfielding |>
119 | mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) |>
120 | ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) +
121 | geom_point() +
122 | ggtitle("Indian Men Test Fielding")
123 | ```
124 |
125 | ## Meg Lanning's ODI batting
126 |
127 | ```r
128 | meg_lanning_id <- find_player_id("Meg Lanning")$ID
129 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |>
130 | mutate(NotOut = (Dismissal == "not out")) |>
131 | mutate(NotOut = tidyr::replace_na(NotOut, FALSE))
132 | ```
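Behind the scenes, `fetch_player_data()` requests the player's innings list from Statsguru. The match class is 1, 2 or 3 for men's Tests, ODIs and T20Is. If no men's records are found, the function retries with the class number plus 7, which gives the women's equivalent. The sketch below reproduces the URL built by the internal helper `get_cricinfo_data()` for Meg Lanning's ODI batting, i.e. the women's class on the retry. It is only for illustration and does not download anything.

```r
# Meg Lanning's Cricinfo ID, as given in the fetch_player_data() documentation
playerid <- 329336
# Women's ODI: men's class code (2) plus 7
matchclass <- match("odi", c("test", "odi", "t20")) + 7L

url <- paste0(
  "http://stats.espncricinfo.com/ci/engine/player/", playerid,
  ".html?class=", matchclass,
  ";template=results;type=batting;view=innings;wrappertype=print"
)
```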
133 |
134 | ```{r meglanning, echo=FALSE}
135 | MegLanning |>
136 | head() |>
137 | knitr::kable()
138 | ```
139 |
140 | ```{r meglanninggraph}
141 | # Compute batting average
142 | MLave <- MegLanning |>
143 | summarise(
144 | Innings = sum(!is.na(Runs)),
145 | Average = sum(Runs, na.rm = TRUE) / (Innings - sum(NotOut, na.rm = TRUE))
146 | ) |>
147 | pull(Average)
148 | names(MLave) <- paste("Average =", round(MLave, 2))
149 | # Plot ODI scores
150 | ggplot(MegLanning) +
151 | geom_hline(aes(yintercept = MLave), col = "gray") +
152 | geom_point(aes(x = Date, y = Runs, col = NotOut)) +
153 | ggtitle("Meg Lanning ODI Scores") +
154 | scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))
155 | ```
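A batting average divides total runs by the number of dismissals, which is innings batted minus not-out innings. In Cricinfo's innings lists a trailing `*` marks a not-out score, and `clean_batting_data()` detects it with a fixed-string match before stripping it. The sketch below applies the same logic to hypothetical scraped scores, with `"DNB"` (did not bat) treated as no innings.

```r
# Hypothetical scraped scores: "*" marks not out, "DNB" means did not bat
scores <- c("104", "23*", "0", "DNB", "57")

batted <- scores != "DNB"
not_out <- grepl("*", scores, fixed = TRUE)
# "DNB" becomes NA when converted to an integer
runs <- suppressWarnings(as.integer(gsub("*", "", scores, fixed = TRUE)))

innings <- sum(batted)               # innings actually batted
dismissals <- innings - sum(not_out) # innings in which the batter was out
average <- sum(runs, na.rm = TRUE) / dismissals
```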
156 |
--------------------------------------------------------------------------------
/R/fetch_cricsheet.R:
--------------------------------------------------------------------------------
1 | # For retrieving ball-by-ball data ---------------------------------------------
2 |
3 | #' Fetch ball-by-ball, match and player data from Cricsheet and return a tibble.
4 | #'
5 | #' Download CSV data from Cricsheet \url{https://cricsheet.org/downloads/}.
6 | #' Data must be specified by three factors:
7 | #' (a) type of data: `bbb` (ball-by-ball), `match` or `player`;
8 | #' (b) gender: `female` or `male`;
9 | #' (c) competition, specified as a Cricsheet code. See \code{\link{cricsheet_codes}} for the
10 | #' competitions and codes available.
11 | #'
12 | #' @param type Character string giving type of data: ball-by-ball, match info or player info.
13 | #' @param gender Character string giving player gender: female or male.
14 | #' @param competition Character string giving code corresponding to competition. See \code{\link{cricsheet_codes}} for the
15 | #' competitions and codes available.
16 | #' @author Jacquie Tran, Hassan Rafique and Rob J Hyndman
17 | #' @return A \code{tibble} object, similar to a \code{data.frame}.
18 | #' @examples
19 | #' \dontrun{
20 | #' wbbl_bbb <- fetch_cricsheet(competition = "wbbl", type = "bbb")
21 | #' wbbl_match <- fetch_cricsheet(competition = "wbbl", type = "match")
22 | #' wbbl_player <- fetch_cricsheet(competition = "wbbl", type = "player")
23 | #' }
24 | #' @export
25 |
26 | fetch_cricsheet <- function(
27 | type = c("bbb", "match", "player"),
28 | gender = c("female", "male"),
29 | competition = "tests") {
30 | # Match arguments
31 | type <- match.arg(type)
32 | gender <- match.arg(gender)
33 |
34 | # Convert code for backwards compatibility
35 | competition <- dplyr::recode(competition,
36 | county = "cch",
37 | edwards_cup = "cec",
38 | heyhoe_flint_trophy = "rhf",
39 | multi_day = "mdms",
40 | sheffield_shield = "ssh",
41 | super_smash = "ssm",
42 | the_hundred = "hnd",
43 | t20_blast = "ntb",
44 | t20is = "t20s",
45 | t20is_unofficial = "it20s",
46 | wbbl = "wbb",
47 | wt20c = "wtc"
48 | )
49 | # Construct file names and url
50 | destfile <- paste0(competition, "_", gender, "_csv2.zip")
51 | url <- paste0("https://cricsheet.org/downloads/", destfile)
52 | subdir <- paste0(sub("_csv2.zip", "", destfile), "_bbb")
53 | destfile <- file.path(tempdir(), destfile)
54 |
55 | # Download zip file from Cricsheet.org if it hasn't already been downloaded
56 | if (!file.exists(destfile)) {
57 | download.file(url, destfile)
58 | }
59 |
60 | # List all files in zip
61 | check_files <- unzip(destfile, exdir = tempdir(), list = TRUE)$Name
62 | check_files <- data.frame(check_files = check_files)
63 | check_files$check_files <- as.character(check_files$check_files)
64 | check_files$file_type <- dplyr::case_when(
65 | stringr::str_detect(check_files$check_files, "txt") ~ "txt",
66 | stringr::str_detect(check_files$check_files, "_info") ~ "info",
67 | stringr::str_detect(check_files$check_files, "all_matches") ~ "allbbb",
68 | TRUE ~ "bbb"
69 | )
70 |
71 | # Identify the required files
72 | if (type == "bbb") {
73 | if ("all_matches.csv" %in% check_files$check_files) {
74 | match_files <- "all_matches.csv"
75 | } else {
76 | match_files <- check_files$check_files[check_files$file_type == "bbb"]
77 | }
78 | } else {
79 | match_files <- check_files$check_files[check_files$file_type == "info"]
80 | }
81 | # Unzip files into sub directory
82 | unzip(destfile, match_files, exdir = file.path(tempdir(), subdir))
83 |
84 | # List match files with full file paths
85 | match_filepaths <- file.path(tempdir(), subdir, match_files)
86 |
87 | if (type == "bbb") {
88 | # Read data from CSVs stored in the temp directory
89 | all_matches <- do.call(
90 | "rbind",
91 | lapply(match_filepaths, FUN = function(files) {
92 | read.csv(files)
93 | })
94 | )
95 | } else {
96 | all_matches <- suppressWarnings(
97 | readr::read_csv(
98 | match_filepaths,
99 | id = "path", guess_max = 100,
100 | col_names = c("col_to_delete", "key", "value"),
101 | skip = 1, show_col_types = FALSE,
102 | col_types = readr::cols(.default = readr::col_character())
103 | )
104 | )
105 | # Note: Warning suppressed because the source data
106 | # changes format slightly when displaying player metadata compared to match data
107 | # Match metadata is in key-value pairs,
108 | # while player metadata contains additional value columns
109 | # We can safely suppress the warning(s) here and deal with the different
110 | # formats below.
111 |
112 | # Tidy up and subset to match metadata only
113 | # (i.e., excluding player / people metadata)
114 | # Drop the unused first column and derive match_id from the file path.
115 | all_matches$col_to_delete <- NULL
116 | all_matches$match_id <- stringr::str_extract(all_matches$path, "[a-zA-Z0-9_\\-\\.]*$")
117 | all_matches$match_id <- sub("_info.csv", "", all_matches$match_id)
118 | all_matches$path <- NULL
119 | if (type == "match") {
120 | all_matches <- all_matches[!(all_matches$key %in% c("player", "players", "registry")), ]
121 | # Find columns with multiple values per key/match_id
122 | # Rows with second teams named
123 | j <- which(all_matches$key == "team")
124 | j <- j[seq(2, length(j), by = 2)]
125 | all_matches$key[j] <- "team2"
126 | all_matches$key[all_matches$key == "team"] <- "team1"
127 | # Rows with second umpires named
128 | j <- which(all_matches$key == "umpire")
129 | j <- j[seq(2, length(j), by = 2)]
130 | all_matches$key[j] <- "umpire2"
131 | all_matches$key[all_matches$key == "umpire"] <- "umpire1"
132 | # Make into wide form
133 | all_matches <- tidyr::pivot_wider(all_matches,
134 | id_cols = "match_id",
135 | names_from = "key",
136 | values_from = "value",
137 | values_fill = NA,
138 | values_fn = ~ head(.x, 1) # To remove duplicated values such as date
139 | )
140 | all_matches <- dplyr::mutate_all(all_matches, ~ replace(., . == "NULL", NA_character_))
141 | } else {
142 | all_matches <- all_matches[all_matches$key %in% c("player", "players"), ]
143 | all_matches$key <- NULL
144 | all_matches <- tidyr::separate(all_matches, value, sep = ",", c("team", "player"))
145 | }
146 | }
147 | output <- tibble::as_tibble(all_matches)
148 |
149 | # Clean data
150 | # Was it a T20 match?
151 | if (!("ball" %in% colnames(output))) {
152 | t20 <- FALSE
153 | } else {
154 | t20 <- max(output$ball, na.rm = TRUE) <= 21
155 | }
156 | if (type == "bbb" && t20) {
157 | output <- cleaning_bbb_t20_cricsheet(output)
158 | }
159 |
160 | return(output)
161 | }
162 |
163 | # Function to clean raw t20 bbb data from cricsheet
164 | # Provided by
165 | cleaning_bbb_t20_cricsheet <- function(df) {
166 | df <- df |>
167 | dplyr::mutate(
168 | # Wicket lost
169 | wicket = !(wicket_type %in% c("", "retired hurt")),
170 | # Over number
171 | over = ceiling(ball),
172 | # Extra ball to follow
173 | extra_ball = (!is.na(wides) | !is.na(noballs))
174 | ) |>
175 | dplyr::group_by(match_id, innings, over) |>
176 | # Renumber balls within each over, so that 1.1 and 1.10
177 | # are correctly distinguished as the first and tenth deliveries
178 | dplyr::mutate(ball = dplyr::row_number()) |>
179 | dplyr::ungroup()
180 |
181 | # Evaluating and joining runs scored, wickets lost at each stage of an innings
182 | df <- df |>
183 | dplyr::inner_join(
184 | df |>
185 | dplyr::group_by(match_id, innings) |>
186 | dplyr::reframe(
187 | runs_scored_yet = cumsum(runs_off_bat + extras),
188 | wickets_lost_yet = cumsum(wicket),
189 | ball = ball, over = over
190 | # reframe() always returns an ungrouped tibble, so no .groups argument is needed
191 | ),
192 | by = c("match_id", "innings", "over", "ball")
193 | )
194 |
195 | # Evaluating the balls in over after adjusting for extra balls and balls remaining in an innings
196 | remaining_balls <- df |>
197 | dplyr::group_by(match_id, innings, over) |>
198 | dplyr::reframe(ball = ball, extra_ball = cumsum(extra_ball)) |>
199 | dplyr::mutate(
200 | ball_in_over = ball - extra_ball,
201 | balls_remaining = ifelse(innings %in% c(1, 2), 120 - ((over - 1) * 6 + ball_in_over), 6 - ball_in_over)
202 | ) |>
203 | dplyr::select(-extra_ball)
204 |
205 | # Evaluating innings totals using ball-by-ball data
206 | innings_total <- df |>
207 | dplyr::group_by(match_id, innings) |>
208 | dplyr::reframe(total_score = sum(runs_off_bat + extras)) |>
209 | tidyr::pivot_wider(
210 | names_from = "innings",
211 | values_from = c("total_score")
212 | ) |>
213 | dplyr::rename(innings1_total = "1", innings2_total = "2") |>
214 | dplyr::select(match_id, innings1_total, innings2_total)
215 |
216 | # Joining all the dfs
217 | df <- df |>
218 | dplyr::inner_join(remaining_balls, by = c("match_id", "innings", "over", "ball")) |>
219 | dplyr::inner_join(innings_total, by = "match_id") |>
220 | dplyr::mutate(target = innings1_total + 1) |>
221 | dplyr::mutate(start_date = as.Date(start_date))
222 |
223 | # Re-ordering the columns in the df
224 | df <- df |>
225 | dplyr::select(
226 | match_id, season, start_date, venue, innings, over, ball, batting_team,
227 | bowling_team, striker, non_striker, bowler, runs_off_bat, extras,
228 | ball_in_over, extra_ball, balls_remaining, runs_scored_yet,
229 | wicket, wickets_lost_yet, innings1_total, innings2_total, target,
230 | wides, noballs, byes, legbyes, penalty, wicket_type, player_dismissed,
231 | other_wicket_type, other_player_dismissed, dplyr::everything()
232 | )
233 |
234 | return(df)
235 | }
236 |
237 | utils::globalVariables(c(
238 | "ball_in_over",
239 | "ball",
240 | "balls_remaining",
241 | "batting_team",
242 | "bowler",
243 | "bowling_team",
244 | "byes",
245 | "col_to_delete",
246 | "competition_type",
247 | "exclude_flag",
248 | "extra_ball",
249 | "extras",
250 | "glued_url",
251 | "innings",
252 | "innings1_total",
253 | "innings2_total",
254 | "key",
255 | "legbyes",
256 | "match_id",
257 | "noballs",
258 | "non_striker",
259 | "other_player_dismissed",
260 | "other_wicket_type",
261 | "over",
262 | "path_start",
263 | "path",
264 | "penalty",
265 | "player_dismissed",
266 | "runs_off_bat",
267 | "runs_scored_yet",
268 | "season",
269 | "start_date",
270 | "striker",
271 | "target",
272 | "value",
273 | "venue",
274 | "wicket_type",
275 | "wicket",
276 | "wickets_lost_yet",
277 | "wides"
278 | ))
279 |
--------------------------------------------------------------------------------
/vignettes/cricketdata_R_pkg.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "cricketdata: An Open Source R package"
3 | author: Hassan Rafique and Jacquie Tran
4 | date: "18 October 2022"
5 | abstract: "Open and accessible data streams are crucial for reproducible research and further development. Cricket data sources are limited and are usually not in a format ready for analysis. The [`cricketdata`](http://pkg.robjhyndman.com/cricketdata/) R package lets users download data, as tibbles ready for analysis, from two primary sources: ESPNCricinfo and Cricsheet. The [fetch_cricinfo()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_cricinfo.html) and [fetch_player_data()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_player_data.html) functions download ESPNCricinfo data for the different formats of international cricket (Tests, ODIs, T20s), for each activity (batting, bowling, fielding), and either for whole careers or innings by innings. Cricsheet is another data source, primarily for ball-by-ball data. The [fetch_cricsheet()](http://pkg.robjhyndman.com/cricketdata/reference/fetch_cricsheet.html) function downloads ball-by-ball, match, and player data for different competitions and formats (Tests, ODIs, T20 internationals, T20 leagues). The T20 ball-by-ball data are further processed by deriving additional features (columns) from the raw data. Other [functions](http://pkg.robjhyndman.com/cricketdata/reference/fetch_player_meta.html) provide access to individual players' career data and to information about their playing style, country of origin, etc. The package provides (almost) all publicly available cricket data ready for analysis, saving users significant time in building a data pipeline. Here is an example of a project built using `cricketdata`: "
6 | bibliography: bibliography.bib
7 | bibengine: biblatex
8 | output:
9 | rmarkdown::html_vignette:
10 | fig_width: 8
11 | fig_height: 5
12 | vignette: >
13 | %\VignetteIndexEntry{cricketdata: An Open Source R package}
14 | %\VignetteEngine{knitr::rmarkdown}
15 | %\VignetteEncoding{UTF-8}
16 | ---
17 |
18 | ```{r, include = FALSE}
19 | knitr::opts_chunk$set(
20 | collapse = TRUE,
21 | comment = "#>",
22 | echo = TRUE,
23 | cache = TRUE,
24 | warning = FALSE
25 | )
26 | ```
27 |
28 | ```{r}
29 | library(cricketdata)
30 | library(dplyr)
31 | library(ggplot2)
32 | ```
33 |
34 | ```{r getdata, eval=FALSE, echo=FALSE}
35 | # Avoid downloading the data when the package is checked by CRAN.
36 | # This only needs to be run once to store the data locally
37 | ipl_bbb <- fetch_cricsheet("bbb", "male", "ipl")
38 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
39 | menODI <- fetch_cricinfo("ODI", "Men", "Batting",
40 | type = "innings",
41 | country = "United States of America"
42 | )
43 | meg_lanning_id <- find_player_id("Meg Lanning")$ID
44 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |>
45 | mutate(NotOut = (Dismissal == "not out"))
46 | aus_women <- fetch_player_meta(c(329336, 275487))
47 |
48 | saveRDS(wt20, here::here("inst/extdata/wt20.rds"))
49 | saveRDS(menODI, here::here("inst/extdata/usmenODI.rds"))
50 | saveRDS(MegLanning, here::here("inst/extdata/MegLanning.rds"))
51 | saveRDS(meg_lanning_id, here::here("inst/extdata/meg_lanning_id.rds"))
52 | saveRDS(ipl_bbb, here::here("inst/extdata/ipl_bbb.rds"))
53 | saveRDS(aus_women, here::here("inst/extdata/aus_women.rds"))
54 | ```
55 |
56 | ```{r loaddata, include=FALSE}
57 | ipl_bbb <- readRDS("../inst/extdata/ipl_bbb.rds")
58 | wt20 <- readRDS("../inst/extdata/wt20.rds")
59 | menODI <- readRDS("../inst/extdata/usmenODI.rds")
60 | MegLanning <- readRDS("../inst/extdata/MegLanning.rds")
61 | meg_lanning_id <- readRDS("../inst/extdata/meg_lanning_id.rds")
62 | aus_women <- readRDS("../inst/extdata/aus_women.rds")
63 | ```
64 |
65 |
66 | # Introduction
67 |
68 | Coverage of cricket has been limited compared with other global sports. [ESPN Cricinfo](https://www.espncricinfo.com) is the largest, and one of the few, online platforms dedicated to cricket. It started as [Cricinfo](https://en.wikipedia.org/wiki/ESPNcricinfo#/media/File:Cricinfo_in_1995.jpg) in the 1990s and was maintained by students and cricket fans who had immigrated to North America and wanted to keep tabs on cricket around the globe. [ESPN acquired Cricinfo](https://www.espncricinfo.com/story/espn-acquires-cricinfo-297655) in 2007, and it became ESPN Cricinfo. It is the most extensive repository of open cricket data, with the caveat that the data are not in a format that can be easily downloaded: you would have to copy and paste tables or write scripts to get the data into a form suitable for analysis. It also offers a search tool, [Statsguru](https://stats.espncricinfo.com/ci/engine/stats/index.html), that lets you query its database, usually presenting the results as tables.
69 |
70 | [Cricsheet](https://cricsheet.org/) is another open data source for ball-by-ball data, maintained by a great fan of the game, [Stephen Rushe](https://twitter.com/srushe). Cricsheet provides raw ball-by-ball data for all formats (Tests, ODIs, T20s) for both men's and women's games. Producing ball-by-ball data is an extensive project, and we hugely appreciate Stephen Rushe's work. The data are available in several formats, including JSON, YAML, and CSV.
71 |
72 | ## Why `cricketdata`
73 |
74 | The open-source `cricketdata` package aims to be a one-stop shop for cricket data from the main public sources, available in an accessible form and ready for analysis. Functions in the package download data from Cricinfo and Cricsheet as data frames (tibbles) in R. Users can access data from different formats of the game, e.g., Tests, ODIs, international T20s, and T20 leagues. In particular, the package provides:
75 |
76 | - ball-by-ball data,
77 | - individual players' innings-by-innings data,
78 | - player data by team, at career or innings level,
79 | - player metadata: Cricinfo ID, date of birth, batting/bowling hand, and bowling type.
80 |
81 | [cricWAR](https://dazzalytics.shinyapps.io/cricwar/) is an example of a sports analytics project built on `cricketdata`.
82 |
83 | As an open-source project, `cricketdata` is inspired primarily by the open-source work of the R (`Rstats`) community and by sports analytics projects such as [`nflfastR`](https://www.nflfastr.com/) [@nflfastR] and [`sportsdataverse`](https://sportsdataverse.org/) [@dataverse].
84 |
85 | The following sections show how to install the package and make full use of its functionality, with numerous examples.
86 |
87 | # Installation
88 |
89 | `cricketdata` is available on CRAN, and the *stable* version can be installed with:
90 |
91 | ```{r, eval=FALSE}
92 | install.packages("cricketdata", dependencies = TRUE)
93 | ```
94 |
95 | You may also install the *development* version from [GitHub](https://github.com/robjhyndman/cricketdata):
96 |
97 | ```{r, eval=FALSE}
98 | install.packages("devtools")
99 | devtools::install_github("robjhyndman/cricketdata")
100 | ```
101 |
102 | # Functions
103 |
104 | There are six main functions,
105 |
106 | - `fetch_cricinfo()`
107 | - `find_player_id()`
108 | - `fetch_player_data()`
109 | - `fetch_cricsheet()`
110 | - `fetch_player_meta()`
111 | - `update_player_meta()`
112 |
113 | and a data set containing player metadata:
114 |
115 | - `player_meta`
116 |
117 | We show the use of each function with examples below.
118 |
119 | ## `fetch_cricinfo()`
120 |
121 | Fetch team data on international cricket matches provided by ESPNCricinfo. It downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players.
122 |
123 | *Arguments*
124 |
125 | - matchtype: Character indicating test (default), odi, or t20.
126 | - sex: Character indicating men (default) or women.
127 | - activity: Character indicating batting (default), bowling or fielding.
128 | - type: Character indicating innings-by-innings or career (default) data.
129 | - country: Character indicating country. The default is to fetch data for all countries.
130 |
131 | **Women's T20 Bowling Data**
132 |
133 | ```{r, eval=FALSE, echo=TRUE}
134 | # Fetch all Women's Bowling data for T20 format
135 | wt20 <- fetch_cricinfo("T20", "Women", "Bowling")
136 | ```
137 |
138 | ```{r tbl-wt20}
139 | # Looking at data
140 | wt20 |>
141 | glimpse()
142 |
143 | # Table showing certain features of the data
144 | wt20 |>
145 | select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate) |>
146 | head() |>
147 | knitr::kable(
148 | digits = 2, align = "c",
149 | caption = "Career profiles of women international T20 bowlers"
150 | )
151 | ```
152 |
153 | ```{r fig-wt20SRvER, fig.cap="Strike rate (balls bowled per wicket) vs average (runs conceded per wicket) for women's international T20 bowlers. Each point represents one player who has taken at least 50 international wickets."}
154 | # Plotting Data
155 | wt20 |>
156 | filter(Wickets >= 50) |>
157 | ggplot(aes(y = StrikeRate, x = Average)) +
158 | geom_point(alpha = 0.3, col = "blue") +
159 | ggtitle("Women International T20 Bowlers") +
160 | ylab("Balls bowled per wicket") +
161 | xlab("Runs conceded per wicket")
162 | ```
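The bowling statistics used here follow standard cricket definitions: average is runs conceded per wicket, strike rate is balls bowled per wicket, and economy is runs conceded per six-ball over. As a minimal sketch, the base R code below computes them from made-up career totals. The `bowlers` data frame and its numbers are hypothetical; `fetch_cricinfo()` already returns `Average`, `StrikeRate` and `Economy` columns, so this is only meant to make the definitions concrete.

```r
# Hypothetical career totals for three bowlers (illustrative numbers only)
bowlers <- data.frame(
  Player  = c("Bowler A", "Bowler B", "Bowler C"),
  Balls   = c(1200, 960, 600),
  Runs    = c(1100, 1020, 540),
  Wickets = c(60, 51, 20)
)

# Average: runs conceded per wicket
bowlers$Average <- bowlers$Runs / bowlers$Wickets
# Strike rate: balls bowled per wicket
bowlers$StrikeRate <- bowlers$Balls / bowlers$Wickets
# Economy: runs conceded per six-ball over
bowlers$Economy <- bowlers$Runs / (bowlers$Balls / 6)

# Keep bowlers with at least 50 wickets, the same threshold as the plot above
subset(bowlers, Wickets >= 50)
```

A lower value is better for all three measures, which is why the best bowlers in the plot sit towards the bottom left.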
163 |
164 | **USA men's ODI data by innings**
165 |
166 |
167 | ```{r, echo=TRUE, eval=FALSE}
168 | # Fetch all USA Men's ODI data by innings
169 | menODI <- fetch_cricinfo("ODI", "Men", "Batting",
170 | type = "innings",
171 | country = "United States of America"
172 | )
173 | ```
174 |
175 | ```{r tbl-USA100s}
176 | #| tbl-cap: Centuries (100 or more runs in a single innings) scored by USA batters
177 | # Table of USA players who have scored a century
178 | menODI |>
179 | filter(Runs >= 100) |>
180 | select(Player, Runs, BallsFaced, Fours, Sixes, Opposition) |>
181 | knitr::kable(digits = 2)
182 | ```
183 |
184 | ```{r, echo=FALSE}
185 | # menODI |>
186 | # filter(Runs >= 50) |>
187 | # ggplot(aes(y = Runs, x = BallsFaced) ) +
188 | # geom_point(size = 2) +
189 | # geom_text(aes(label= Player), vjust=-0.5, color="#013369",
190 | # position = position_dodge(0.9), size=2) +
191 | # ylab("Runs Scored") + xlab("Balls Faced")
192 | ```
193 |
194 | ## `find_player_id()`
195 |
196 | Each player has an ID on ESPNCricinfo, which is needed to access an individual player's data. Given a string containing a player's name, or part of it, this function returns the name(s) of the matching player(s), their Cricinfo ID(s), and some other information.
197 |
198 | *Argument*
199 |
200 | - searchstring: Character string containing a player's name or part of the name.
201 |
202 | ```r
203 | # Fetch Meg Lanning's Cricinfo ID
204 | meg_lanning_id <- find_player_id("Meg Lanning")$ID
205 | ```
206 |
207 | ```{r}
208 | meg_lanning_id
209 | ```
210 |
211 | ## `fetch_player_data()`
212 |
213 | Fetch individual player data from all matches played. The function scrapes the data from ESPNCricinfo and returns a tibble with one row per innings for all games a player has played. To identify a player, use their Cricinfo player ID. The simplest way to find this is to look up their Cricinfo profile page: the number at the end of the URL is the ID. For example, Meg Lanning's profile page URL ends in 329336, so her ID is 329336. Alternatively, use the `find_player_id()` function.
214 |
215 | *Arguments*
216 |
217 | - playerid: The player's Cricinfo ID (integer or character).
218 | - matchtype: Character indicating test (default), odi, or t20.
219 | - activity: Character indicating batting (default), bowling or fielding.
220 |
221 | ```{r echo=TRUE, eval=FALSE}
222 | # Fetching the player Meg Lanning's playing data
223 | MegLanning <- fetch_player_data(meg_lanning_id, "ODI") |>
224 | mutate(NotOut = (Dismissal == "not out"))
225 | ```
226 |
227 | ```{r fig-meglanning, fig.cap="Meg Lanning, the Australian captain, has shown remarkable consistency, scoring ODI centuries in every year of her career except 2021, when her highest score from 6 matches was 53."}
228 | dim(MegLanning)
229 | names(MegLanning)
230 |
231 | # Compute batting average
232 | MLave <- MegLanning |>
233 | filter(!is.na(Runs)) |>
234 | summarise(Average = sum(Runs) / (n() - sum(NotOut))) |>
235 | pull(Average)
236 | names(MLave) <- paste("Average =", round(MLave, 2))
237 |
238 | # Plot ODI scores
239 | ggplot(MegLanning) +
240 | geom_hline(aes(yintercept = MLave), col = "gray") +
241 | geom_point(aes(x = Date, y = Runs, col = NotOut)) +
242 | ggtitle("Meg Lanning ODI Scores") +
243 | scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))
244 | ```
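The caption's claim about centuries per year is the kind of summary that is easy to check from innings-level data. The sketch below assumes a data frame with `Date` and `Runs` columns, as used in the plot above, with `Runs` set to `NA` when the player did not bat (mirroring the `filter(!is.na(Runs))` step). The `innings` data frame and its scores are hypothetical, not Meg Lanning's actual record.

```r
# Hypothetical innings-level data: one row per innings, with Runs = NA
# when the player did not bat (not real Meg Lanning scores)
innings <- data.frame(
  Date = as.Date(c("2019-02-01", "2019-03-10", "2020-01-15",
                   "2020-02-20", "2021-04-04", "2021-04-07")),
  Runs = c(104, 45, NA, 121, 53, 12)
)

# Keep only innings in which the player batted, then tag each with its year
batted <- innings[!is.na(innings$Runs), ]
batted$Year <- format(batted$Date, "%Y")

# Per-year summary; tapply() groups by sorted year, matching sort(unique())
yearly <- data.frame(
  Year      = sort(unique(batted$Year)),
  Innings   = as.vector(tapply(batted$Runs, batted$Year, length)),
  Highest   = as.vector(tapply(batted$Runs, batted$Year, max)),
  Centuries = as.vector(tapply(batted$Runs >= 100, batted$Year, sum))
)
yearly
```

In this toy example, 2021 is the only year without a century; its highest score is 53.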
245 |
246 | ## `fetch_cricsheet()`
247 |
248 | [Cricsheet](https://cricsheet.org/) is the main openly accessible source of ball-by-ball cricket data. `fetch_cricsheet()` downloads CSV data from Cricsheet. Data must be specified by three factors: (a) type of data: bbb (ball-by-ball), match, or player; (b) gender; (c) competition. See for what the competition character codes mean.
249 |
250 | The raw T20 data from Cricsheet are further processed to add more columns (features) to facilitate analysis.
251 |
252 | *Arguments*
253 |
254 | - type: Character string giving type of data: ball-by-ball, match info or player info.
255 |
256 | - gender: Character string giving player gender: female or male.
257 |
258 | - competition: Character string giving the name of the competition, e.g. `ipl` for the Indian Premier League, `psl` for the Pakistan Super League, `tests` for international Test matches, etc.
259 |
260 | **Indian Premier League (IPL) Ball-by-Ball Data**
261 |
262 | ```{r echo=TRUE, eval=FALSE}
263 | # Fetch all IPL ball-by-ball data
264 | ipl_bbb <- fetch_cricsheet("bbb", "male", "ipl")
265 | ```
266 |
267 | ```{r}
268 | ipl_bbb |>
269 | glimpse()
270 | ```
271 |
272 | ```{r fig-iplbatter, fig.cap="The 20 most prolific batters (by runs) in IPL 2022. The plot shows the percentage of balls faced that each batter hit for a boundary (4 or 6) against the percentage of balls they did not score off (dot percent). Ideally, a batter sits in the top-left quadrant: high boundary % and low dot %."}
273 | # Top 20 run-scorers in the IPL 2022 season: boundary % vs dot %
274 | ipl_bbb |>
275 | filter(season == "2022") |>
276 | group_by(striker) |>
277 | summarize(
278 | Runs = sum(runs_off_bat), BallsFaced = n() - sum(!is.na(wides)),
279 | StrikeRate = Runs / BallsFaced, DotPercent = sum(runs_off_bat == 0 & is.na(wides)) * 100 / BallsFaced, # exclude wides from dot balls
280 | BoundaryPercent = sum(runs_off_bat %in% c(4, 6)) * 100 / BallsFaced
281 | ) |>
282 | arrange(desc(Runs)) |>
283 | rename(Batter = striker) |>
284 | slice(1:20) |>
285 | ggplot(aes(y = BoundaryPercent, x = DotPercent, size = BallsFaced)) +
286 | geom_point(color = "red", alpha = 0.3) +
287 | geom_text(aes(label = Batter),
288 | vjust = -0.5, hjust = 0.5, color = "#013369",
289 | position = position_dodge(0.9), size = 3
290 | ) +
291 | ylab("Boundary Percent") +
292 | xlab("Dot Percent") +
293 | ggtitle("IPL 2022: Top 20 Batters")
294 | ```
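The batter metrics in the code above can be illustrated on a handful of deliveries. In this minimal sketch the `deliveries` data frame is hypothetical, with `wides` set to `NA` on legal deliveries, matching how the chunk above treats that column. Because a wide has no runs off the bat, the sketch counts only legal deliveries as dot balls; otherwise wides would inflate the dot percentage.

```r
# Hypothetical deliveries faced by one batter; wides is NA on legal deliveries
deliveries <- data.frame(
  runs_off_bat = c(0, 4, 1, 0, 6, 0, 2, 4),
  wides        = c(NA, NA, NA, 1, NA, NA, NA, NA)
)

legal <- is.na(deliveries$wides)          # wides do not count as balls faced
balls_faced <- sum(legal)
runs <- sum(deliveries$runs_off_bat)

strike_rate <- runs / balls_faced         # runs per ball, as in the code above
dot_pct <- 100 * sum(deliveries$runs_off_bat == 0 & legal) / balls_faced
boundary_pct <- 100 * sum(deliveries$runs_off_bat %in% c(4, 6)) / balls_faced

round(c(StrikeRate = strike_rate, DotPercent = dot_pct,
        BoundaryPercent = boundary_pct), 2)
```

Multiplying `strike_rate` by 100 gives the more familiar "runs per 100 balls" form used in broadcasts.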
295 |
296 | ```{r tbl-IPL2022Batters}
297 | #| tbl-cap: The 10 most prolific batters of the IPL 2022 season. JC Buttler scored the most runs in total and at the highest strike rate (runs per ball). His boundary percent (percentage of balls faced hit for 4s or 6s) is also the highest, while his dot percent (percentage of balls not scored off) is also among the highest.
298 | # 10 most prolific batters in the IPL 2022 season
299 | ipl_bbb |>
300 | filter(season == "2022") |>
301 | group_by(striker) |>
302 | summarize(
303 | Runs = sum(runs_off_bat), BallsFaced = n() - sum(!is.na(wides)),
304 | StrikeRate = Runs / BallsFaced,
305 | DotPercent = sum(runs_off_bat == 0 & is.na(wides)) * 100 / BallsFaced, # exclude wides from dot balls
306 | BoundaryPercent = sum(runs_off_bat %in% c(4, 6)) * 100 / BallsFaced
307 | ) |>
308 | arrange(desc(Runs)) |>
309 | rename(Batter = striker) |>
310 | slice(1:10) |>
311 | knitr::kable(digits = 1, align = "c")
312 | ```
313 |
314 | ## `player_meta`
315 |
316 | `player_meta` is a data set containing metadata on players and cricket officials, such as full name, country of representation, date of birth, batting and bowling hand, bowling style, and playing role. Data on more than 11,000 players and officials are available. The data were scraped from the ESPNCricinfo website.
317 |
318 | ```{r tbl-playermetadata}
319 | #| tbl-cap: Player and officials meta data.
320 | player_meta |>
321 | filter(!is.na(playing_role)) |>
322 | select(-cricinfo_id, -unique_name, -name) |>
323 | head() |>
324 | knitr::kable(
325 | digits = 1, align = "c", format = "pipe",
326 | col.names = c(
327 | "ID", "FullName", "Country", "DOB", "BirthPlace",
328 | "BattingStyle", "BowlingStyle", "PlayingRole"
329 | )
330 | )
331 | ```
332 |
333 | ## `fetch_player_meta()`
334 |
335 | Fetch players' metadata, such as full name, country of representation, date of birth, batting and bowling hand, bowling style, and playing role. This metadata is useful for advanced modeling, e.g., age curves or batter profiles against different bowling types.
336 |
337 | *Argument*
338 |
339 | - playerid: A vector of player IDs as given in Cricinfo profiles. Integer or character.
340 |
341 | Cricinfo player IDs can be found in several ways: use the `find_player_id()` function, take the ID from the player's Cricinfo profile page, or look it up in the `player_meta` data frame, which holds metadata for more than 11,000 players.
342 |
343 |
344 | ```{r echo=TRUE, eval=FALSE}
345 | # Download meta data on Meg Lanning and Ellyse Perry
346 | aus_women <- fetch_player_meta(c(329336, 275487))
347 | ```
348 |
349 | ```{r tbl-ausplayermetadata}
350 | #| tbl-cap: Australian Women player meta data.
351 | aus_women |>
352 | select(-name) |>
353 | knitr::kable(
354 | digits = 1, align = "c", format = "pipe",
355 | col.names = c(
356 | "ID", "FullName", "Country", "DOB", "BattingStyle", "BowlingStyle"
357 | )
358 | )
359 | ```
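Age curves need each player's age at the time of each match, which can be derived from a date of birth. A minimal sketch, assuming the date of birth has already been parsed into a `Date`. The `dob` and `match_dates` values are hypothetical, and dividing by 365.25 days is one simple approximation that allows for leap years.

```r
# Hypothetical date of birth and match dates (illustrative values only)
dob <- as.Date("1990-06-15")
match_dates <- as.Date(c("2010-06-15", "2015-01-01", "2020-06-15"))

# Age in years at each match: days since birth / 365.25 (leap-year allowance)
age_at_match <- as.numeric(difftime(match_dates, dob, units = "days")) / 365.25
round(age_at_match, 1)
```

With one row per player-match, the same calculation can be applied column-wise after joining match dates to the metadata.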
360 |
361 | ## `update_player_meta()`
362 |
363 | This function is designed to consult the directory of all players on the Cricsheet website and add the metadata of new players to the `player_meta` data frame. The data for new players are scraped from ESPNCricinfo.
364 |
365 | # References
366 |
--------------------------------------------------------------------------------
/vignettes/cricsheet.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Cricsheet data"
3 | author: Jacquie Tran
4 | date: 7 November 2021
5 | output:
6 | rmarkdown::html_vignette:
7 | fig_width: 8
8 | fig_height: 5
9 | vignette: >
10 | %\VignetteIndexEntry{Cricsheet data}
11 | %\VignetteEngine{knitr::rmarkdown}
12 | %\VignetteEncoding{UTF-8}
13 | ---
14 |
15 | ```{r, include = FALSE}
16 | knitr::opts_chunk$set(
17 | collapse = TRUE,
18 | comment = "#>",
19 | echo = TRUE
20 | )
21 | ```
22 |
23 | ```{r setup}
24 | library(cricketdata)
25 | library(readr)
26 | library(dplyr)
27 | library(stringr)
28 | library(showtext)
29 | library(ggplot2)
30 | library(gghighlight)
31 | library(ggtext)
32 | library(patchwork)
33 | ```
34 |
35 | The `fetch_cricsheet()` function will download csv data from [Cricsheet](https://cricsheet.org/downloads/). Data must be specified by three factors: (a) `type` of data: `bbb` (ball-by-ball), `match` or `player`; (b) `gender`; (c) `competition`.
36 |
37 |
38 | Here are some examples of its use with Women's Big Bash League (WBBL) data. An earlier version of this article was published at https://underthehood.jacquietran.com/post/wbbl-visualisations/.
39 |
40 | ```{r getdata, eval=FALSE, echo=FALSE}
41 | # Avoid downloading the data when the package is checked by CRAN.
42 | # This only needs to be run once to store the data locally
43 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", gender = "female")
44 | wbbl_match_info <- fetch_cricsheet(competition = "wbbl", type = "match", gender = "female")
45 | saveRDS(wbbl_bbb, "inst/extdata/wbbl_bbb.rds")
46 | saveRDS(wbbl_match_info, "inst/extdata/wbbl_match_info.rds")
47 | ```
48 |
49 | ```{r loaddata, include=FALSE}
50 | wbbl_bbb <- readRDS("../inst/extdata/wbbl_bbb.rds")
51 | wbbl_match_info <- readRDS("../inst/extdata/wbbl_match_info.rds")
52 | ```
53 |
54 | ```r
55 | # Fetch ball-by-ball data
56 | wbbl_bbb <- fetch_cricsheet(competition = "wbbl", gender = "female")
57 |
58 | # Fetch match metadata
59 | wbbl_match_info <- fetch_cricsheet(competition = "wbbl", type = "match", gender = "female")
60 | ```
61 |
62 | ## Alyssa Healy's WBBL batting record
63 |
64 | What is there to say about Alyssa Healy that hasn’t already been written by a far better writer than I? She’s a dangerous batter at every level she plays at, so I wanted to visualise her production across seasons, compared to other batters in the WBBL.
65 |
66 | ### Tidying the data
67 |
68 | On exploring the Cricsheet data, I noticed that there are only 10 matches from WBBL01 with ball-by-ball data, but there were 59 matches played in the first season of the WBBL (per [Wikipedia](https://en.wikipedia.org/wiki/2015%E2%80%9316_Women%27s_Big_Bash_League_season)).
69 |
70 | Additionally, when I originally worked on this plot of Healy vs. the world, it was mid-October 2021 and the first ball of #WBBL07 had yet to be bowled. Now, we are reproducing the chart as we are part-way through the current season, so the code below also omits data from any matches played so far in this 7th season.
71 |
72 | ```{r}
73 | # Data from 2015/16 (WBBL01) excluded due to only having 10 matches worth of data
74 | # in the Cricsheet spreadsheet.
75 | # Data from 2021/22 and later (WBBL07) excluded as incomplete at time of article
76 | wbbl_bbb_tidy <- wbbl_bbb |>
77 | filter(!season %in% c("2015/16", "2021/22")) |>
78 | filter(start_date < "2021-11-07")
79 | ```
80 |
81 | The ball-by-ball data hosted by Cricsheet provides a great starting point, but a little more tidying and wrangling is needed for the purposes of understanding batting performances across WBBL seasons.
82 |
83 | In cricket broadcasts, batting average is among the most common statistics that commentators reference to highlight how well a player performs with willow in hand. Batting average is calculated by taking the runs scored by a batter over a defined period and dividing by the number of times they have been dismissed over that same period.
84 |
85 | With this formula, the batting average metric can produce a somewhat inflated measure of a batter’s performance over time, because the denominator is dismissals and not total innings batted. On top of that, the WBBL season is relatively short - in season 7, there are 14 fixtured rounds, plus semi-finals and finals. So at maximum, a batter could play 16 innings in the current season. Others will bat far fewer innings than that hypothetical maximum, so 2 or 3 not-out innings can have a sizeable influence on batting average.
86 |
87 | All things considered, including the T20 format of the WBBL, I was more interested in Healy’s “production” in the sense of average runs scored per innings.
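A small worked example shows how not-out innings separate the two measures. The scores below are hypothetical: seven innings, three of them not out.

```r
# Hypothetical season for one batter: runs per innings and whether she was out
runs   <- c(45, 12, 60, 8, 33, 71, 5)
is_out <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Batting average: runs divided by dismissals (not-out innings add runs only)
batting_average <- sum(runs) / sum(is_out)
# Runs per innings: runs divided by innings batted
runs_per_innings <- sum(runs) / length(runs)

round(c(average = batting_average, per_innings = runs_per_innings), 1)
```

Here 234 runs from 4 dismissals gives an average of 58.5, while the same runs spread over 7 innings is about 33.4 per innings: three not-outs are enough to push the average well above what the batter typically produced each time she walked out.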
88 |
89 | ```{r}
90 | # Alyssa Healy compared to all players who have batted in 3+ innings in a season.
91 | batting_per_season <- wbbl_bbb_tidy |>
92 | group_by(season, striker) |>
93 | summarise(
94 | innings_total = length(unique(match_id)),
95 | runs_off_bat_total = sum(runs_off_bat),
96 | balls_faced_total = sum(is.na(wides)), # wides do not count as balls faced
97 | .groups = "keep"
98 | ) |>
99 | mutate(
100 | runs_per_innings_avg = round(runs_off_bat_total / innings_total, 1),
101 | strike_rate = round(runs_off_bat_total / balls_faced_total * 100, 1)
102 | ) |>
103 | filter(innings_total > 2) |>
104 | mutate(is_healy = (striker == "AJ Healy")) |>
105 | ungroup()
106 | ```
107 |
108 | ### Making a plot
109 |
110 | ```{r warning=FALSE, out.width="100%"}
111 | # Import fonts from Google Fonts
112 | font_add_google("Roboto Condensed", "roboto_con")
113 | font_add_google("Staatliches", "staat")
114 | showtext_auto()
115 |
116 | # Build plot
117 | batting_per_season |>
118 | ggplot(aes(
119 | x = season, y = runs_per_innings_avg,
120 | group = striker, colour = is_healy
121 | )) +
122 | geom_line(linewidth = 2, colour = "#F80F61FF") +
123 | gghighlight(is_healy,
124 | label_key = striker,
125 | label_params = aes(
126 | size = 6, force_pull = 0.1, nudge_y = 10, label.size = 1,
127 | family = "roboto_con", label.padding = 0.5,
128 | fill = "#19232FFF",
129 | colour = "#F80F61FF"
130 | ),
131 | unhighlighted_params = list(linewidth = 1, color = "#187999FF")
132 | ) +
133 | labs(
134 | title = "WBBL: Average runs scored per innings (3+ innings)",
135 | x = NULL, y = NULL,
136 | caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran"
137 | ) +
138 | theme_minimal() +
139 | theme(
140 | text = element_text(size = 18, family = "roboto_con", colour = "#FFFFFF"),
141 | plot.title = element_text(family = "staat", margin = margin(0, 0, 15, 0)),
142 | plot.caption = element_markdown(size = NULL, margin = margin(15, 0, 0, 0)),
143 | axis.text = element_text(colour = "#FFFFFF"),
144 | legend.position = "none",
145 | panel.grid.major = element_line(linetype = "dashed"),
146 | panel.grid.minor = element_blank(),
147 | plot.background = element_rect(
148 | fill = "#171F2AFF", colour = NA
149 | ),
150 | panel.spacing = unit(2, "lines"),
151 | plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")
152 | )
153 | ```
154 |
155 | When plotting Healy’s WBBL production against the rest of the comp, what I see is a “run machine”: consistent, high-end output over the last 5 WBBL seasons.
156 |
157 | No hyperbole: there are few better than Alyssa Healy at the crease.
158 |
159 | ## Dismissals by ball number
160 |
161 | Aside from scoring runs, another important aspect of batting is not losing your wicket. So how hard is it really to get the best batters out?
162 |
163 | I would love to dive into this question using high-resolution data that reflects the biomechanical battle between bowlers and batters. Enter: Hawkeye…except that Hawkeye is only sporadically available for women’s matches, even at the highest level. As far as I know, there are no publicly available Hawkeye data sets from WBBL matches. If I’m wrong, please tell me and point me towards the good goods!
164 |
165 | *([Cody Atkinson](https://twitter.com/CapitalCityCody/status/1448476292743983105) alerted me to the bounceR package by Richard Little, which enables access to Hawkeye data that does exist on the ICC and BCCI websites.)*
166 |
167 | ### Tidying the data
168 |
169 | In the absence of fancy Hawkeye data from the WBBL, I took a simpler route through the Cricsheet ball-by-ball data to visualise how hard it is to take the wicket of the likes of Healy, Beth Mooney, Meg Lanning, Ellyse Perry, Sophie Devine, and Heather Knight.
170 |
171 | ```{r}
172 | # Create new variable for ball number in each over
173 | ball_number_faced <- wbbl_bbb_tidy |>
174 | mutate(ball_num_in_over = sub(".*\\.", "", ball))
175 |
176 | # Summarise number of balls faced of each ball number, per batter
177 | ball_number_faced_summary <- ball_number_faced |>
178 | group_by(ball_num_in_over, striker) |>
179 | summarise(balls_faced = n(), .groups = "drop")
180 |
181 | # Dismissals by ball number
182 | dismissals_by_ball_number <- ball_number_faced |>
183 | select(ball_num_in_over, striker, wicket_type) |>
184 | filter(wicket_type != "") |>
185 | group_by(ball_num_in_over, striker) |>
186 | summarise(dismissals_n = n(), .groups = "drop")
187 | ```
188 |
189 | The code above shapes up the ball-by-ball data to record:
190 |
191 | * How many balls a batter has faced at each ball number in an over (1-6),
192 | * How many times a batter has been dismissed at each ball number.
193 |
194 | I made some editorial judgments too:
195 |
196 | * Excluded data from WBBL01 due to limited games from that season in the Cricsheet-hosted data
197 | * Excluded balls numbered 7+
198 | * League-wide data only includes batters who have faced 200+ balls total (across WBBL02-06, inclusive)
199 |
200 | With the ball-by-ball data prepared for the analysis question, we can then calculate league-wide and player-specific summary statistics for dismissals by ball number:
201 |
202 | ```{r}
203 | # Merge data and summarise to league-wide dismissals rate by ball number
204 | dismissals_by_ball_number_summary <- left_join(
205 | ball_number_faced_summary, dismissals_by_ball_number,
206 | by = c("ball_num_in_over", "striker")
207 | ) |>
208 | tidyr::replace_na(list(dismissals_n = 0)) |>
209 | group_by(striker) |>
210 | mutate(total_balls_faced = sum(balls_faced)) |>
211 | ungroup() |>
212 | mutate(dismissals_pct = round(dismissals_n / balls_faced * 100, 2)) |>
213 | # Include only batters who have faced at least 200 balls in total
214 | filter(total_balls_faced >= 200) |>
215 | # Exclude balls beyond 6 - infrequent occurrences
216 | filter(ball_num_in_over < 7)
217 |
218 | # Extract data for specific players
219 | # Healy
220 | dismissals_by_ball_number_summary_healy <- dismissals_by_ball_number_summary |>
221 | filter(striker == "AJ Healy")
222 | # Mooney
223 | dismissals_by_ball_number_summary_mooney <- dismissals_by_ball_number_summary |>
224 | filter(striker == "BL Mooney")
225 | # Lanning
226 | dismissals_by_ball_number_summary_lanning <- dismissals_by_ball_number_summary |>
227 | filter(striker == "MM Lanning")
228 | # Perry
229 | dismissals_by_ball_number_summary_perry <- dismissals_by_ball_number_summary |>
230 | filter(striker == "EA Perry")
231 | # Devine
232 | dismissals_by_ball_number_summary_devine <- dismissals_by_ball_number_summary |>
233 | filter(striker == "SFM Devine")
234 | # Knight
235 | dismissals_by_ball_number_summary_knight <- dismissals_by_ball_number_summary |>
236 | filter(striker == "HC Knight")
237 | ```
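The dismissal percentage is dismissals divided by balls faced at that ball number, times 100, and the centre line of each boxplot in the plots that follow is the median of those percentages across qualifying batters. A minimal sketch in base R, using a hypothetical `counts` data frame (three batters, ball numbers 1 and 2 only); in the vignette the same summary is taken over all batters with 200+ balls faced.

```r
# Hypothetical per-batter counts by ball number (only balls 1 and 2 shown)
counts <- data.frame(
  ball_num_in_over = rep(c("1", "2"), each = 3),
  striker          = rep(c("Batter A", "Batter B", "Batter C"), times = 2),
  balls_faced      = c(100, 80, 120, 90, 110, 100),
  dismissals_n     = c(4, 2, 6, 3, 5, 7)
)

# Percentage of balls faced at each ball number that ended in a dismissal
counts$dismissals_pct <- round(counts$dismissals_n / counts$balls_faced * 100, 2)

# League median per ball number: the centre line of each boxplot
league_median <- tapply(counts$dismissals_pct, counts$ball_num_in_over, median)
league_median
```

Comparing one batter's percentage with `league_median` at the same ball number is the comparison each highlighted point in the plots makes visually.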
238 |
239 | ### Making plots
240 |
241 | Here’s the code for building the plot I tweeted out. I decided to build one plot per player of interest and then “quilt” the plots together using the `patchwork` package (https://github.com/thomasp85/patchwork). I’m sure you could achieve the same / similar result using `ggplot2::facet_wrap()` but I just love using `patchwork` ...
242 |
243 | ```{r, out.width="100%"}
244 | # Define consistent plot features ----------------------------------------------
245 | plot_features <- list(
246 | coord_cartesian(ylim = c(0, 10)),
247 | theme_minimal(),
248 | theme(
249 | text = element_text(family = "roboto_con", colour = "#FFFFFF"),
250 | plot.title = element_text(
251 | size = 11, family = "staat", margin = margin(0, 0, 15, 0)
252 | ),
253 | plot.subtitle = element_text(
254 | size = 12, family = "staat", margin = margin(0, 0, 15, 0)
255 | ),
256 | plot.caption = element_markdown(
257 | size = 10, margin = margin(15, 0, 0, 0)
258 | ),
259 | axis.text = element_text(size = 9, colour = "#FFFFFF"),
260 | legend.position = "none",
261 | panel.grid.major.y = element_line(linetype = "dashed"),
262 | panel.grid.major.x = element_blank(),
263 | panel.grid.minor = element_blank(),
264 | plot.background = element_rect(
265 | fill = "#171F2AFF", colour = NA
266 | ),
267 | panel.spacing = unit(2, "lines"),
268 | plot.margin = unit(c(0.25, 0.25, 0.25, 0.25), "cm")
269 | ),
270 | labs(x = NULL, y = NULL)
271 | )
272 |
273 | # Build plots ------------------------------------------------------------------
274 | showtext_auto()
275 |
276 | # Healy
277 | p1 <- dismissals_by_ball_number_summary |>
278 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
279 | geom_boxplot(
280 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
281 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
282 | ) +
283 | geom_point(
284 | data = dismissals_by_ball_number_summary_healy,
285 | colour = "#F80F61FF", size = 3
286 | ) +
287 | labs(
288 | title = "WBBL: % dismissals by ball number",
289 | subtitle = "AJ Healy"
290 | ) +
291 | plot_features
292 |
293 | # Mooney
294 | p2 <- dismissals_by_ball_number_summary |>
295 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
296 | geom_boxplot(
297 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
298 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
299 | ) +
300 | geom_point(
301 | data = dismissals_by_ball_number_summary_mooney,
302 | colour = "#FA6900FF", size = 3
303 | ) +
304 | labs(subtitle = "BL Mooney") +
305 | plot_features
306 |
307 | # Lanning
308 | p3 <- dismissals_by_ball_number_summary |>
309 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
310 | geom_boxplot(
311 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
312 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
313 | ) +
314 | geom_point(
315 | data = dismissals_by_ball_number_summary_lanning,
316 | colour = "#018821FF", size = 3
317 | ) +
318 | labs(subtitle = "MM Lanning") +
319 | plot_features
320 |
321 | # Perry
322 | p4 <- dismissals_by_ball_number_summary |>
323 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
324 | geom_boxplot(
325 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
326 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
327 | ) +
328 | geom_point(
329 | data = dismissals_by_ball_number_summary_perry,
330 | colour = "#F80F61FF", size = 3
331 | ) +
332 | labs(subtitle = "EA Perry") +
333 | plot_features
334 |
335 | # Devine
336 | p5 <- dismissals_by_ball_number_summary |>
337 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
338 | geom_boxplot(
339 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
340 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
341 | ) +
342 | geom_point(
343 | data = dismissals_by_ball_number_summary_devine,
344 | colour = "#FA6900FF", size = 3
345 | ) +
346 | labs(subtitle = "SFM Devine") +
347 | plot_features
348 |
349 | # Knight
350 | p6 <- dismissals_by_ball_number_summary |>
351 | ggplot(aes(x = ball_num_in_over, y = dismissals_pct)) +
352 | geom_boxplot(
353 | fill = "#FFFFFF", colour = "#FFFFFF", linewidth = .5,
354 | alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
355 | ) +
356 | geom_point(
357 | data = dismissals_by_ball_number_summary_knight,
358 | colour = "#95C65CFF", size = 3
359 | ) +
360 | labs(
361 | subtitle = "HC Knight",
362 | caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran"
363 | ) +
364 | plot_features
365 |
366 | # Quilt the plots --------------------------------------------------------------
367 | (p1 + p2 + p3) / (p4 + p5 + p6)
368 | ```
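The five plots above differ only in the batter's summary data, point colour, subtitle and (for the last panel) caption, so the repetition could be folded into a small function. This is a minimal sketch: `plot_batter_dismissals()` is a hypothetical helper, not part of cricketdata. It assumes `ball_num_in_over` is a factor so that each ball number gets its own box, and it uses `linewidth` for the box outlines, which ggplot2 3.4.0 and later use in place of `size` for lines.

```r
library(ggplot2)

# Hypothetical helper: league-wide boxplots of dismissal percentage by ball
# number, with one batter's values overlaid as points
plot_batter_dismissals <- function(league_summary, batter_summary,
                                   batter_label, point_colour, ...) {
  ggplot(league_summary, aes(x = ball_num_in_over, y = dismissals_pct)) +
    geom_boxplot(
      fill = "#FFFFFF", colour = "#FFFFFF", linewidth = 0.5,
      alpha = 0.25, notch = TRUE, outlier.shape = NA, coef = 0
    ) +
    # The batter's points inherit the x and y mappings from ggplot()
    geom_point(data = batter_summary, colour = point_colour, size = 3) +
    # Extra arguments (e.g. caption) are passed through to labs()
    labs(subtitle = batter_label, ...)
}
```

With this helper, `p3` could be written as `plot_batter_dismissals(dismissals_by_ball_number_summary, dismissals_by_ball_number_summary_lanning, "MM Lanning", "#018821FF") + plot_features`, and `p6` would pass its `caption` through the `...` argument.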
369 |
370 | My key observations:
371 |
372 | 1. Lanning doesn’t give much away on any ball number in an over: as a share of balls faced at each ball number (1-6), her dismissal percentages range from 2.7% to 3.4%, well under the league medians of 4.3% to 5.1%.
373 | 2. Perry is a stalwart too, but relative to her own standards, you might have a better shot bowling to her on balls 5 (2.7%) and 6 (3.4%) than earlier in an over (1.4-2.4%).
374 | 3. Healy is obviously dangerous with how quickly she can accelerate her run-scoring, but is she susceptible on ball 5? She was dismissed on 7.1% of the balls she faced there, vs. a league median of 4.7% for ball number 5.
375 | 4. The opening partnership of Mooney and Devine is a scary prospect, reason #4182: they’re both hard to shift from the crease. Devine’s “worst” ball is ball number 3 (4.3%) and Mooney’s is ball number 5 (4.5%), but both are still below the league medians for those balls (4.8% and 4.7%, respectively).
376 | 5. Knight is stubborn on balls 1-3 (1.7-2.8%) and looser on balls 4 (7.7%) and 5 (5.5%), where she sits above the league medians, then clamps down again on ball 6 (3.2%).
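The percentages in these observations are dismissals as a share of balls faced at each ball number, compared against the median across batters at that ball number (the middle line of each box, assuming the league summary holds one value per batter). A minimal base-R sketch of that calculation, using a small made-up data frame; the `striker` and `dismissed` columns are assumptions for illustration, not the exact columns used earlier in this vignette:

```r
# Made-up deliveries for two batters (illustration only);
# `dismissed` is 1 if the batter was out on that ball, else 0
bbb <- data.frame(
  striker = rep(c("A", "B"), each = 12),
  ball_num_in_over = rep(1:6, times = 4),
  dismissed = c(0, 0, 0, 0, 1, 0,  0, 0, 0, 0, 0, 0,
                1, 0, 0, 0, 0, 0,  0, 0, 1, 0, 0, 0)
)

# Dismissals as a percentage of balls faced, per batter and ball number
by_batter <- aggregate(
  dismissed ~ striker + ball_num_in_over, data = bbb,
  FUN = function(x) 100 * mean(x)
)
names(by_batter)[names(by_batter) == "dismissed"] <- "dismissals_pct"

# League median across batters at each ball number
league_median <- aggregate(
  dismissals_pct ~ ball_num_in_over, data = by_batter, FUN = median
)
```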
377 |
378 | ## “Wiggly bois”: Visualising team strike rates
379 |
380 | Player and league summaries across seasons are great and all, but what really piques my interest with ball-by-ball / play-by-play data is using it to understand how matches unfold from moment to moment.
381 |
382 | The very nature of cricket’s shortest format, T20, lends itself to batting aggression: recruiting power hitters, aiming to demoralise the opposition by setting big targets, and building teams that bat deep. That last point is particularly important in T20 because aggressive batting means taking risks, and as long as the batters are taking risks, the bowling side is in with a chance.
383 |
384 | Commentators will generally focus on player strike rates in T20; that is, how many runs a batter scores per 100 balls faced. For instance, power hitters like Sophie Devine routinely achieve strike rates over 120 (i.e., more than 120 runs per 100 balls), so we can think of strike rate as a measure of scoring efficiency.
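Strike rate is simply runs divided by balls faced, scaled to 100 balls. A minimal sketch as a vectorised R function (`strike_rate()` is a hypothetical name, not a cricketdata function):

```r
# Runs scored per 100 balls faced, rounded to one decimal place
strike_rate <- function(runs, balls) {
  stopifnot(all(balls > 0))
  round(runs / balls * 100, 1)
}

# 36 runs off 30 balls is a strike rate of 120
strike_rate(runs = 36, balls = 30)
```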
385 |
386 | A common perception is that snagging a wicket will slow down strike rates, but I wondered whether this is really true in the T20 context where the imperative to accelerate is paramount. Teams will also expect to lose some wickets in every innings, so I’d imagine they would recruit and train accordingly for a potent middle order and a tail that wags.
387 |
388 | Instead of looking at player strike rates, what can we learn by analysing *team* strike rates?
389 |
390 | ### Tidying the data
391 |
392 | For a more granular focus, I created a subset of the Cricsheet data that includes matches played in season 7 of the WBBL (WBBL07). At the time I produced my original visualisation, the ball-by-ball data included all games played up to 7 November 2021.
393 |
394 | ```{r}
395 | # Subset match metadata for WBBL07 games
396 | wbbl07_match_info_tidy <- wbbl_match_info |>
397 | filter(season == "2021/22", date <= "2021/11/07") |>
398 | select(
399 | match_id, winner, winner_runs, winner_wickets, method, outcome,
400 | eliminator
401 | ) |>
402 | mutate(match_id = factor(match_id))
403 |
404 | # Subset ball-by-ball data for WBBL07 games
405 | wbbl07_bbb_tidy <- wbbl_bbb |>
406 | filter(match_id %in% wbbl07_match_info_tidy$match_id) |>
407 | mutate(
408 | match_id = factor(match_id),
409 | runs_scored = runs_off_bat + extras,
410 | wicket_type = case_when(
411 | wicket_type == "" ~ NA_character_,
412 | TRUE ~ wicket_type
413 | )
414 | ) |>
415 | group_by(match_id, innings) |>
416 | mutate(
417 | temp_var = 1,
418 | balls_cumulative = cumsum(temp_var),
419 | runs_cumulative = cumsum(runs_scored),
420 | runs_total = max(runs_cumulative)
421 | ) |>
422 | ungroup() |>
423 | select(-temp_var) |>
424 | # Merge match metadata and ball-by-ball data
425 | left_join(wbbl07_match_info_tidy, by = "match_id") |>
426 | mutate(
427 | outcome_batting_team = case_when(
428 | outcome %in% c("no result", "tie") ~ as.character(outcome),
429 | winner == batting_team ~ "won",
430 | TRUE ~ "lost"
431 | ),
432 | outcome_bowling_team = case_when(
433 | outcome %in% c("no result", "tie") ~ as.character(outcome),
434 | winner == bowling_team ~ "won",
435 | TRUE ~ "lost"
436 | )
437 | )
438 | ```
439 |
440 | Using the WBBL07 data subset, I did some further tidying to calculate a rolling team strike rate through each innings:
441 |
442 | ```{r}
443 | team_strike_rate <- wbbl07_bbb_tidy |>
444 | # Exclude matches that ended with a Super Over ("tie")
445 | # and matches that were called off ("no result")
446 | filter(!outcome_batting_team %in% c("tie", "no result")) |>
447 | group_by(match_id, innings) |>
448 | mutate(
449 | rolling_strike_rate = round(
450 | runs_cumulative / balls_cumulative * 100, 1
451 | ),
452 | wicket_ball_num = case_when(
453 | !is.na(wicket_type) ~ balls_cumulative,
454 | TRUE ~ NA_real_
455 | ),
456 | wicket_strike_rate = case_when(
457 | !is.na(wicket_type) ~ rolling_strike_rate,
458 | TRUE ~ NA_real_
459 | ),
460 | innings_description = case_when(
461 | innings == 1 ~ "Batting 1st",
462 | innings == 2 ~ "Batting 2nd"
463 | ),
464 | bowling_team_short = word(bowling_team, -1),
465 | start_date_day = lubridate::day(start_date),
466 | start_date_month = lubridate::month(start_date),
467 | match_details = glue::glue(
468 | "{innings_description} vs. {bowling_team_short} ({start_date_day}/{start_date_month})"
469 | )
470 | ) |>
471 | arrange(match_id, innings, balls_cumulative) |>
472 | mutate(
473 | match_details = factor(
474 | match_details,
475 | levels = unique(match_details)
476 | ),
477 | outcome_batting_team = factor(
478 | outcome_batting_team,
479 | levels = c("won", "lost")
480 | )
481 | )
482 | ```
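At its core, the rolling team strike rate above is cumulative runs divided by cumulative deliveries, restarting at the start of each innings. A minimal base-R sketch of the same idea using `ave()`, on a made-up two-innings example (column names mirror the code above, but the data are invented for illustration):

```r
# Made-up deliveries for two innings (illustration only)
deliveries <- data.frame(
  innings = c(1, 1, 1, 1, 2, 2, 2),
  runs_scored = c(1, 4, 0, 6, 2, 0, 1),
  wicket_type = c(NA, NA, "caught", NA, NA, "bowled", NA)
)

# Cumulative deliveries and runs, restarting at the start of each innings
deliveries$balls_cumulative <- ave(deliveries$runs_scored, deliveries$innings,
                                   FUN = seq_along)
deliveries$runs_cumulative <- ave(deliveries$runs_scored, deliveries$innings,
                                  FUN = cumsum)

# Rolling team strike rate: runs per 100 deliveries so far in the innings
deliveries$rolling_strike_rate <- round(
  deliveries$runs_cumulative / deliveries$balls_cumulative * 100, 1
)

# Keep the strike rate only on balls where a wicket fell (NA otherwise),
# which is what the wicket markers are drawn from
deliveries$wicket_strike_rate <- ifelse(
  !is.na(deliveries$wicket_type), deliveries$rolling_strike_rate, NA
)
```

As in the `dplyr` code above, every delivery counts towards `balls_cumulative`, including wides, which are not counted as balls faced in a batter's conventional strike rate.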
483 |
484 | ### Making plots
485 |
486 | There’s plenty that can be done with the `team_strike_rate` data, which could warrant a dedicated exploration in itself. But when I’m exploring a new analytical idea, I usually find it easier to get a feel for what the data does or doesn’t highlight by going with a “small batch” approach.
487 |
488 | With the Renegades at the top of the standings (as of 7 Nov), I focused on their team strike rates, a.k.a. *“wiggly bois”*:
489 |
490 | ```{r echo=FALSE, out.width="75%", fig.align="center"}
491 | knitr::include_graphics("figs/tweet.png")
492 | ```
493 |
494 | ```{r, fig.width=9, fig.height=12, out.width="100%", warning=FALSE, message=FALSE}
495 | # Filter to Renegades' innings only --------------------------------------------
496 | team_strike_rate_renegades <- team_strike_rate |>
497 | filter(str_detect(batting_team, "Renegades"))
498 |
499 | # Build plot -------------------------------------------------------------------
500 | showtext_auto()
501 | team_strike_rate_renegades |>
502 | ggplot(aes(x = balls_cumulative, y = rolling_strike_rate)) +
503 | facet_wrap(~match_details, ncol = 3) +
504 | geom_hline(yintercept = 100, linetype = "dashed", colour = "#CCCCCC") +
505 | geom_line(aes(colour = outcome_batting_team), linewidth = 1.5) +
506 | geom_point(
507 | aes(
508 | x = wicket_ball_num,
509 | y = wicket_strike_rate
510 | ),
511 | colour = "red", size = 3, alpha = 0.75
512 | ) +
513 | labs(
514 | title = "WBBL07: Melbourne Renegades - Team strike rate (games up to 7 Nov 2021)",
515 | x = "Ball number in an innings", y = NULL,
516 | caption = "**Source:** Cricsheet.org // **Plot:** @jacquietran"
517 | ) +
518 | scale_x_continuous(breaks = seq(0, 120, by = 30)) +
519 | scale_color_manual(
520 | values = c("won" = "#4a8bad", "lost" = "#AD4A8B"),
521 | labels = c("Renegades won", "Renegades lost")
522 | ) +
523 | coord_cartesian(ylim = c(0, 200)) +
524 | theme_minimal() +
525 | theme(
526 | text = element_text(size = 18, family = "roboto_con", colour = "#FFFFFF"),
527 | legend.position = "top",
528 | legend.title = element_blank(),
529 | legend.key.size = unit(1.5, "cm"),
530 | legend.margin = margin(0, 0, 0, 0),
531 | legend.spacing.x = unit(0, "cm"),
532 | legend.spacing.y = unit(0, "cm"),
533 | plot.title = element_text(family = "staat", margin = margin(0, 0, 15, 0)),
534 | plot.caption = element_markdown(size = NULL, margin = margin(15, 0, 0, 0)),
535 | strip.text = element_text(colour = "#FFFFFF", size = 12),
536 | axis.text = element_text(colour = "#FFFFFF"),
537 | axis.title.x = element_text(margin = margin(15, 0, 0, 0)),
538 | panel.grid.major.y = element_blank(),
539 | panel.grid.minor.y = element_blank(),
540 | panel.grid.major.x = element_line(colour = "#203b60", linetype = "dotted"),
541 | panel.grid.minor.x = element_blank(),
542 | plot.background = element_rect(
543 | fill = "#171F2AFF",
544 | colour = NA
545 | ),
546 | panel.spacing = unit(2, "lines"),
547 | plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")
548 | )
549 | ```
550 |
551 | The plots show the Renegades’ team strike rate across each innings they’ve batted, with wickets overlaid as red points. From a visual assessment, wickets do not seem to have dampened the Renegades’ batters’ enthusiasm for striking the ball: broadly speaking, when a new batter comes in, they are often able to maintain the team’s strike rate. In some games, they’ve even managed to accelerate after losing a wicket.
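One way to put a number on this visual impression is to compare the rolling strike rate on the ball a wicket fell with the rate a fixed number of deliveries later. A minimal sketch for a single innings; the function name and the default 12-delivery window are arbitrary choices for illustration, not part of the analysis above:

```r
# Change in rolling team strike rate `window` deliveries after each wicket,
# for one innings; wickets too close to the end of the innings are skipped
strike_rate_change_after_wicket <- function(rolling_strike_rate, is_wicket,
                                            window = 12) {
  wicket_idx <- which(is_wicket)
  later_idx <- wicket_idx + window
  keep <- later_idx <= length(rolling_strike_rate)
  rolling_strike_rate[later_idx[keep]] - rolling_strike_rate[wicket_idx[keep]]
}

# Made-up innings: the second wicket falls too late to be measured
strike_rate_change_after_wicket(
  rolling_strike_rate = c(100, 120, 110, 130, 140),
  is_wicket = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  window = 2
)
```

A positive value means the cumulative rate rose after the wicket, i.e. the team scored faster over those deliveries than its average up to the wicket. With the tidied data, `is_wicket` would be `!is.na(wicket_type)`.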
552 |
553 | Importantly, the Renegades have not lost many wickets in Powerplay overs this season, which puts them in a better position to push the scoring pace as an innings wears on with most of their wickets in hand.
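Counting Powerplay wickets per innings comes down to a filter on over number. A minimal base-R sketch; the 1-indexed `over` column is a hypothetical stand-in for however the over is recorded in your data, the number of Powerplay overs is left as a parameter rather than assumed, and `wicket_type` is `NA` on non-wicket balls, as after the tidying step earlier:

```r
# Count wickets that fell within the first `powerplay_overs` overs,
# separately for each innings. `over` is a hypothetical 1-indexed column.
powerplay_wickets <- function(deliveries, powerplay_overs) {
  in_powerplay <- deliveries$over <= powerplay_overs
  is_wicket <- !is.na(deliveries$wicket_type)
  tapply(in_powerplay & is_wicket, deliveries$innings, sum)
}

# Made-up deliveries (illustration only)
deliveries <- data.frame(
  innings = c(1, 1, 1, 2, 2, 2),
  over = c(1, 4, 12, 2, 3, 5),
  wicket_type = c(NA, "caught", "bowled", NA, "caught", "run out")
)
powerplay_wickets(deliveries, powerplay_overs = 6)
```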
554 |
--------------------------------------------------------------------------------