├── .Rbuildignore
├── .github
    └── workflows
    │   └── rcmdcheck.yml
├── .gitignore
├── DESCRIPTION
├── NAMESPACE
├── NEWS.md
├── R
    ├── chrome_read_html.R
    ├── fetch_page.R
    ├── get_code_lang.R
    ├── get_generator.R
    ├── get_headers.R
    ├── get_html.R
    ├── get_imgs.R
    ├── get_language.R
    ├── get_links.R
    ├── get_rss.R
    ├── get_social.R
    ├── get_tables.R
    ├── get_time.R
    ├── get_title.R
    ├── html_df.R
    ├── progress.R
    ├── string_cleaners.R
    ├── sysdata.rda
    └── zzz.R
├── README.Rmd
├── README.md
├── code_classifier.R
├── man
    ├── figures
    │   └── hex.png
    └── html_df.Rd
├── page_inference
    └── code_classification
    │   ├── code_classifier.R
    │   ├── code_training_data.R
    │   ├── ddf.RData
    │   └── helper_functions.R
└── tests
    ├── testthat.R
    └── testthat
        ├── test_code_inference.R
        ├── test_date_detection.R
        ├── test_language.R
        ├── test_page_titles.R
        ├── test_rss.R
        ├── test_social_profiles.R
        ├── test_twitter_handle_inferring.R
        └── testdata
            ├── add_test_pages.R
            ├── adv_ml.html
            ├── alastair.html
            ├── analytics_vidya.html
            ├── ar_tdf.html
            ├── arxiv.html
            ├── bbc_1.html
            ├── bbc_2.html
            ├── burnsstat.html
            ├── businesssci.html
            ├── colinfay.html
            ├── cullen.html
            ├── databrain.html
            ├── distill.html
            ├── dvc.html
            ├── ebay.html
            ├── ethereum.html
            ├── etsy.html
            ├── gelman.html
            ├── giminez.html
            ├── github1.html
            ├── github2.html
            ├── github3.html
            ├── gradient.html
            ├── guardian.html
            ├── hogan.html
            ├── hogervorst.html
            ├── hvitfeldt.html
            ├── inspect_df.html
            ├── kadena.html
            ├── landau.html
            ├── mcdonnell.html
            ├── meissner.html
            ├── mlplus.html
            ├── page_attrs.R
            ├── pipingdata.html
            ├── rbloggers.html
            ├── rbloggers2.html
            ├── reddit.html
            ├── revo.html
            ├── robinson.html
            ├── rolkra.html
            ├── ropensci.html
            ├── rushworth.html
            ├── rweekly.html
            ├── salmon.html
            ├── shakirm.html
            ├── silge.html
            ├── sitstand.html
            ├── spencer.html
            ├── tflow.html
            ├── towardsds.html
            ├── towardsds2.html
            ├── vdp.html
            ├── wiki_tdf.html
            └── wikipedia.html

/.Rbuildignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/.Rbuildignore
--------------------------------------------------------------------------------
/.github/workflows/rcmdcheck.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/.github/workflows/rcmdcheck.yml
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/.gitignore
--------------------------------------------------------------------------------
/DESCRIPTION:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/DESCRIPTION
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/NAMESPACE
--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/NEWS.md
--------------------------------------------------------------------------------
/R/chrome_read_html.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/chrome_read_html.R
--------------------------------------------------------------------------------
/R/fetch_page.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/fetch_page.R
--------------------------------------------------------------------------------
/R/get_code_lang.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_code_lang.R
--------------------------------------------------------------------------------
/R/get_generator.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_generator.R
--------------------------------------------------------------------------------
/R/get_headers.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_headers.R
--------------------------------------------------------------------------------
/R/get_html.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_html.R
--------------------------------------------------------------------------------
/R/get_imgs.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_imgs.R
--------------------------------------------------------------------------------
/R/get_language.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_language.R
--------------------------------------------------------------------------------
/R/get_links.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_links.R
--------------------------------------------------------------------------------
/R/get_rss.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_rss.R
--------------------------------------------------------------------------------
/R/get_social.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_social.R
--------------------------------------------------------------------------------
/R/get_tables.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_tables.R
--------------------------------------------------------------------------------
/R/get_time.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_time.R
--------------------------------------------------------------------------------
/R/get_title.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/get_title.R
--------------------------------------------------------------------------------
/R/html_df.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/html_df.R
--------------------------------------------------------------------------------
/R/progress.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/progress.R
--------------------------------------------------------------------------------
/R/string_cleaners.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/string_cleaners.R
--------------------------------------------------------------------------------
/R/sysdata.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/sysdata.rda
--------------------------------------------------------------------------------
/R/zzz.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/R/zzz.R
--------------------------------------------------------------------------------
/README.Rmd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/README.Rmd
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/README.md
--------------------------------------------------------------------------------
/code_classifier.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/code_classifier.R
--------------------------------------------------------------------------------
/man/figures/hex.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/man/figures/hex.png
--------------------------------------------------------------------------------
/man/html_df.Rd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/man/html_df.Rd
--------------------------------------------------------------------------------
/page_inference/code_classification/code_classifier.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/page_inference/code_classification/code_classifier.R
--------------------------------------------------------------------------------
/page_inference/code_classification/code_training_data.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/page_inference/code_classification/code_training_data.R
--------------------------------------------------------------------------------
/page_inference/code_classification/ddf.RData:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/page_inference/code_classification/ddf.RData
--------------------------------------------------------------------------------
/page_inference/code_classification/helper_functions.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/page_inference/code_classification/helper_functions.R
--------------------------------------------------------------------------------
/tests/testthat.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat.R
--------------------------------------------------------------------------------
/tests/testthat/test_code_inference.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_code_inference.R
--------------------------------------------------------------------------------
/tests/testthat/test_date_detection.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_date_detection.R
--------------------------------------------------------------------------------
/tests/testthat/test_language.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_language.R
--------------------------------------------------------------------------------
/tests/testthat/test_page_titles.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_page_titles.R
--------------------------------------------------------------------------------
/tests/testthat/test_rss.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_rss.R
--------------------------------------------------------------------------------
/tests/testthat/test_social_profiles.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_social_profiles.R
--------------------------------------------------------------------------------
/tests/testthat/test_twitter_handle_inferring.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/test_twitter_handle_inferring.R
--------------------------------------------------------------------------------
/tests/testthat/testdata/add_test_pages.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/add_test_pages.R
--------------------------------------------------------------------------------
/tests/testthat/testdata/adv_ml.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/adv_ml.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/alastair.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/alastair.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/analytics_vidya.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/analytics_vidya.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/ar_tdf.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/ar_tdf.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/arxiv.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/arxiv.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/bbc_1.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/bbc_1.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/bbc_2.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/bbc_2.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/burnsstat.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/burnsstat.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/businesssci.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/businesssci.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/colinfay.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/colinfay.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/cullen.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/cullen.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/databrain.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/databrain.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/distill.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/distill.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/dvc.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/dvc.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/ebay.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/ebay.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/ethereum.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/ethereum.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/etsy.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/etsy.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/gelman.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/gelman.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/giminez.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/giminez.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/github1.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/github1.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/github2.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/github2.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/github3.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/github3.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/gradient.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/gradient.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/guardian.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/guardian.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/hogan.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/hogan.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/hogervorst.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/hogervorst.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/hvitfeldt.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/hvitfeldt.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/inspect_df.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/inspect_df.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/kadena.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/kadena.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/landau.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/landau.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/mcdonnell.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/mcdonnell.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/meissner.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/meissner.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/mlplus.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/mlplus.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/page_attrs.R:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/page_attrs.R
--------------------------------------------------------------------------------
/tests/testthat/testdata/pipingdata.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/pipingdata.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/rbloggers.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/rbloggers.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/rbloggers2.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/rbloggers2.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/reddit.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/reddit.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/revo.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/revo.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/robinson.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/robinson.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/rolkra.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/rolkra.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/ropensci.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/ropensci.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/rushworth.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/rushworth.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/rweekly.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/rweekly.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/salmon.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/salmon.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/shakirm.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/shakirm.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/silge.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/silge.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/sitstand.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/sitstand.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/spencer.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/spencer.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/tflow.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/tflow.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/towardsds.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/towardsds.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/towardsds2.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/towardsds2.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/vdp.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/vdp.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/wiki_tdf.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/wiki_tdf.html
--------------------------------------------------------------------------------
/tests/testthat/testdata/wikipedia.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alastairrushworth/htmldf/HEAD/tests/testthat/testdata/wikipedia.html
--------------------------------------------------------------------------------