├── .gitignore ├── Databases.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | .Rproj.user 3 | .Rhistory 4 | .RData 5 | .Ruserdata 6 | -------------------------------------------------------------------------------- /Databases.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Databases 3 | topic: Databases with R 4 | maintainer: Yuan Tang, James Joseph Balamuta 5 | email: terrytangyuan@gmail.com 6 | version: 2023-02-23 7 | source: https://github.com/cran-task-views/Databases/ 8 | --- 9 | 10 | This CRAN task view contains a list of packages related to accessing 11 | different databases. This does not include data import/export or data 12 | management. Moreover, the task views on `r view("HighPerformanceComputing")` 13 | and `r view("MachineLearning")` might provide useful information. 14 | 15 | As datasets become larger and larger, it becomes impractical to 16 | save them in traditional file formats such as spreadsheets, raw text 17 | files, etc., which may not fit on devices with limited storage and 18 | cannot be easily shared among collaborators. Instead, people 19 | nowadays tend to store data in databases for more scalable and reliable 20 | data management. 21 | 22 | Database systems are often classified based on the 23 | [database models](https://en.wikipedia.org/wiki/Database_model) 24 | that they support. 25 | [Relational databases](https://en.wikipedia.org/wiki/Relational_database) 26 | became dominant in the 1980s. The data in relational databases is modeled as 27 | rows and columns in a series of tables with the use of 28 | [SQL](https://en.wikipedia.org/wiki/SQL) to express the logic for 29 | writing and querying data. The tables are relational, e.g. a 30 | user uses your software, and that software has creators and 31 | contributors. 
Non-relational databases became popular in recent years 32 | due to huge demand for storing unstructured data; they are commonly referred to as 33 | [NoSQL](https://en.wikipedia.org/wiki/NoSQL) databases. 34 | Users generally don't need to define the data schema up front. If there 35 | are changing requirements in the applications, non-relational databases 36 | can be much easier to use and manage. 37 | 38 | The content presented in this task view is undergoing rapid 39 | changes in industry and academia. Please send any suggestions to the 40 | maintainer via e-mail or submit an issue or pull request in the GitHub 41 | repository linked above. All suggestions and corrections by others are 42 | gratefully acknowledged. 43 | 44 | 45 | ### Relational databases 46 | 47 | This section includes packages that provide access to relational 48 | databases within R. 49 | 50 | - The `r pkg("DBI", priority = "core")` package provides a 51 | database interface definition for communication between R and 52 | relational database management systems. It's worth noting that some 53 | packages try to follow this interface definition (DBI-compliant) but 54 | many existing packages don't. 55 | - The `r pkg("RODBC", priority = "core")` package provides access to 56 | databases through an ODBC interface. This package is maintained by the 57 | R Core Team and depends only on base R. See the alternative odbc package 58 | below. 59 | - The `r pkg("odbc", priority = "core")` package provides a 60 | DBI-compliant interface to ODBC drivers. This package is maintained by 61 | RStudio and has a number of package dependencies. See the alternative 62 | RODBC package above. 63 | - The `r pkg("RMariaDB")` package provides a DBI-compliant 64 | interface to [MariaDB](https://mariadb.org/) and 65 | [MySQL](https://www.mysql.com/). 66 | - The `r pkg("RMySQL")` package provides an interface to 67 | MySQL. Note that this is the legacy DBI interface to MySQL and 68 | MariaDB based on old code ported from S-PLUS. 
A modern MySQL client 69 | based on Rcpp is available from the RMariaDB package listed 70 | above. 71 | - Packages for [PostgreSQL](https://www.postgresql.org/), an 72 | open-source relational database: 73 | - The `r pkg("RPostgreSQL")` package and 74 | `r pkg("RPostgres")` package both provide fully 75 | DBI-compliant Rcpp-backed interfaces to PostgreSQL. 76 | - The `r pkg("rpostgis")` package provides an 77 | interface to PostgreSQL's spatial extension 78 | [PostGIS](http://postgis.net/). 79 | - The `r pkg("RGreenplum")` package provides a fully 80 | DBI-compliant interface to [Greenplum](https://greenplum.org/), 81 | an open-source parallel database on top of PostgreSQL. 82 | - The `r pkg("ROracle")` package is a DBI-compliant 83 | [Oracle database](https://www.oracle.com/database/) driver 84 | based on the OCI. 85 | - Packages for [SQLite](http://www.sqlite.org/), a self-contained, 86 | high-reliability, embedded, full-featured, public-domain, SQL 87 | database engine: 88 | - The `r pkg("RSQLite")` package embeds the SQLite 89 | database engine in R and provides an interface compliant with 90 | the DBI package. 91 | - The `r pkg("filehashSQLite")` package is a simple 92 | key-value database using SQLite as the backend. 93 | - The `r pkg("liteq")` package provides temporary and 94 | permanent message queues for R, built on top of SQLite. 95 | - The `r pkg("duckdb")` package provides a DBI interface to [DuckDB](https://duckdb.org/), 96 | an in-process SQL OLAP database management system. 97 | - The `r pkg("bigrquery")` package provides an interface 98 | to [Google BigQuery](https://developers.google.com/bigquery/), 99 | Google's fully managed, petabyte-scale, low-cost analytics data 100 | warehouse. 101 | - The `r github("druid-io/RDruid")` package on GitHub 102 | provides an interface to [Apache Druid](https://druid.apache.org/), 103 | a high-performance analytics data store for event-driven data. 
104 | - The `r pkg("RH2")` package provides an interface to [H2 105 | Database Engine](http://www.h2database.com/), a Java SQL 106 | database. 107 | - The `r pkg("influxdbr")` package provides an interface 108 | to [InfluxDB](https://docs.influxdata.com/influxdb), a time series 109 | database designed to handle high write and query loads. 110 | - The `r pkg("RPresto")` package implements a 111 | DBI-compliant interface to [Presto](https://prestodb.io/), an open 112 | source distributed SQL query engine for running interactive analytic 113 | queries against data sources of all sizes ranging from gigabytes to 114 | petabytes. 115 | - The `r pkg("RJDBC")` package is an implementation of 116 | R's DBI interface using JDBC as a back-end. This allows R to 117 | connect to any DBMS that has a JDBC driver. 118 | - The `r pkg("implyr")` package provides a back-end for 119 | [Apache Impala](https://impala.apache.org), which enables 120 | low-latency SQL queries on data stored in the Hadoop Distributed 121 | File System (HDFS), Apache HBase, Apache Kudu, Amazon Simple Storage 122 | Service (S3), Microsoft Azure Data Lake Store (ADLS), and Dell EMC 123 | Isilon. 124 | - The `r pkg("dbx")` package provides intuitive functions 125 | for high-performance batch operations and safe 126 | inserts/updates/deletes without writing SQL on top of 127 | `r pkg("DBI")`. It is designed for both research and 128 | production environments and supports multiple database backends such 129 | as Postgres, MySQL, MariaDB, and SQLite. 130 | - The `r pkg("sparklyr")` package provides a 131 | `r pkg("dplyr")` interface to [Apache 132 | Spark](https://spark.apache.org/) DataFrames as well as an R 133 | interface to Spark's distributed machine learning pipelines. 134 | - The `r pkg("Hmisc")` package provides a wrapper function `Hmisc::mdb.get()` 135 | that uses the [mdbtools](https://github.com/mdbtools/mdbtools) utility 136 | to read from Microsoft Access databases on Unix-alike systems. 
137 | - The `r pkg("DatabaseConnector")` package provides a DBI-compatible interface 138 | to various database platforms using either JDBC or DBI drivers. 139 | 140 | ### Non-relational databases 141 | 142 | This section includes packages that provide access to non-relational 143 | databases within R. 144 | 145 | - Packages for [Redis](https://redis.io/), an open-source, in-memory 146 | data structure store that can be used as a database, cache and 147 | message broker: 148 | - The `r pkg("RcppRedis")` package provides an interface 149 | to Redis using `r github("redis/hiredis")`. 150 | - The `r pkg("redux")` package provides a low-level 151 | interface to Redis, allowing execution of arbitrary Redis 152 | commands with almost no interface, and a high-level generated 153 | interface to more than 200 Redis commands. 154 | - Packages for [Elasticsearch](http://elasticsearch.org/), an 155 | open-source, RESTful, distributed search and analytics engine: 156 | - The `r pkg("elastic")` package provides a general 157 | purpose interface to Elasticsearch. 158 | - The `r pkg("uptasticsearch")` package is an 159 | Elasticsearch client tailored to data science workflows. 160 | - The `r pkg("mongolite")` package provides a high-level, 161 | high-performance [MongoDB](https://www.mongodb.com/) client based on 162 | `r github("mongodb/mongo-c-driver")`, including support 163 | for aggregation, indexing, map-reduce, streaming, SSL encryption and 164 | SASL authentication. 165 | - The `r pkg("R4CouchDB")` package provides a collection 166 | of functions for basic database and document management operations 167 | in [CouchDB](http://couchdb.apache.org/). 168 | - Packages for [Amazon 169 | DynamoDB](https://aws.amazon.com/dynamodb/), a fast, flexible NoSQL database: 170 | - The `r github("cloudyr/aws.dynamodb")` package on GitHub, from 171 | the `cloudyr` development team, provides access to DynamoDB. 
172 | - The `r pkg("paws.database")` package provides an interface using the `r pkg("paws")` suite of tools. 173 | - The `r github("mrcsparker/rrocksdb")` package on GitHub 174 | provides access to [RocksDB](http://rocksdb.org). 175 | 176 | 177 | ### Database tools 178 | 179 | This section includes packages that provide tools for working with and 180 | testing databases, database table manipulations, etc. 181 | 182 | - The `r pkg("MSSQL")` package extends the functionality of the RODBC 183 | package to work with Microsoft SQL Server databases. It makes it easier 184 | to browse the database and examine individual tables and views. 185 | - The `r pkg("pool")` package enables the creation of 186 | object pools, which make it less computationally expensive to fetch 187 | a new object. 188 | - The `r pkg("DBItest")` package is a helper that tests 189 | DBI back ends for conformity to the interface. 190 | - The `r pkg("dbplyr")` package is a 191 | `r pkg("dplyr")` back-end for databases that allows you 192 | to work with remote database tables as if they were in-memory data 193 | frames. Basic features work with any database that has a DBI 194 | back-end; more advanced features require SQL translation to be 195 | provided by the package author. 196 | - The `r pkg("sqldf")` package provides functionality to 197 | manipulate R data frames using SQL. 198 | - The `r pkg("pointblank")` package provides tools to 199 | validate data tables in databases such as PostgreSQL and MySQL. 200 | - The `r pkg("dittodb")` package provides functionality to 201 | test database interactions with any `r pkg("DBI")` 202 | compliant database backend. It includes functionality to use 203 | fixtures instead of direct database calls during testing as well as 204 | functionality to record those fixtures when interacting with a real 205 | database for later use in tests. 
206 | - The `r pkg("tfio")` package provides the ability to use 207 | [Apache Ignite](https://ignite.apache.org/), which handles 208 | distributed database management for high-performance computing 209 | with in-memory speed. 210 | - The `r github("daroczig/dbr")` package on GitHub 211 | provides convenient database connections and queries from R 212 | using YAML configuration files and templates. 213 | - The `r pkg("rocker")` package provides an `r pkg("R6")` class interface 214 | for handling relational database connections using `r pkg("DBI")` as backend. 215 | The purpose is to provide an intuitive object allowing straightforward 216 | handling of SQL databases. 217 | - The `r pkg("SQRL")` package streamlines exploratory and interactive sessions 218 | on ODBC databases, and allows R code within SQL scripts. 219 | - The `r pkg("octopus")` package provides an interactive shiny application for 220 | database management to view tables and schemas, upload files, send queries, 221 | and more. 222 | 223 | 224 | ### Links 225 | 226 | * [DBI package web page](https://dbi.r-dbi.org/) 227 | * [RStudio: Databases using R](https://db.rstudio.com/) 228 | * [Open Database Connectivity 229 | (ODBC)](https://docs.microsoft.com/en-us/sql/odbc/) 230 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## CRAN Task View: Databases with R 2 | 3 | **URL:** 4 | 5 | **Source file:** [Databases.md](Databases.md) 6 | 7 | **Contributions:** Suggestions and improvements for this task view are very 8 | welcome and can be made through issues or pull requests here on GitHub or 9 | via e-mail to the maintainer address. For further details see the 10 | [Contributing](https://github.com/cran-task-views/ctv/blob/main/Contributing.md) 11 | guide. All contributions must adhere to the 12 | [code of conduct](https://github.com/cran-task-views/ctv/blob/main/CodeOfConduct.md). 
13 | --------------------------------------------------------------------------------