├── README.md
└── _config.yml


/README.md:
--------------------------------------------------------------------------------

## A non-exhaustive, personal list of "modern" Data Tools and Projects

[![Suggest a Data Tool!](https://img.shields.io/badge/Suggest-a%20Data%20Tool-green)](https://github.com/victorcouste/data-tools/issues/new)

- [Data Architecture articles](#architecture)
- [Data Ingestion / Data Onboarding / ETL / ELT](#ingestion)
- [Reverse ETL](#reverse)
- [Data Collection / Product Analytics / Customer Data](#collection)
- [Transformation / Preparation / Cleaning / Wrangling](#transformation)
- [SQL Tools / Editors](#sqltools)
- [SQL Engines](#sql)
- [BI / Reporting / Data Visualization](#bi)
- [Data Quality / Profiling / Observability](#quality)
- [Data Management / Lineage / Catalog / Governance](#management)
- [DataOps / Data Fabric](#ops)
- [Orchestration / Workflow](#orchestration)
- [Storage / Database](#storage)
- [Data Privacy / Security / Identity](#privacy)
- [Others](#others)

This list leaves out file-system storage and traditional databases and, for now, data science, virtualization, and streaming tools. Nor does it cover all the embedded tools and services offered by the three main public Cloud providers (Google Cloud, Microsoft Azure and AWS).

**Data Architecture**
- [Emerging Architectures for Modern Data Infrastructure](https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure)
- [The Modern Data Stack: Past, Present, and Future](https://blog.getdbt.com/future-of-the-modern-data-stack)
- [Data Mesh Principles and Logical Architecture](https://martinfowler.com/articles/data-mesh-principles.html) and a [Data Warehouse comparison](https://blog.starburstdata.com/data-mesh-the-answer-to-the-data-warehouse-hypocrisy)
- [The Building Blocks of a Modern Data Platform](https://towardsdatascience.com/the-building-blocks-of-a-modern-data-platform-92e46061165)
- [Two steps towards a modern data platform](https://medium.com/bigdatarepublic/two-steps-towards-a-modern-data-platform-37c74e7c104b)
- [What your data team is using: The analytics stack](https://technically.dev/posts/what-your-data-team-is-using)
- [The Top 20 Most Commonly Used Data Engineering Tools](https://www.secoda.co/blog/the-top-20-most-commonly-used-data-engineering-tools)
- [Data Stacks For Fun & Nonprofit](https://towardsdatascience.com/data-stacks-for-fun-nonprofit-part-ii-d375d824abf3)
- [The Future of Business Intelligence is Open Source](https://maximebeauchemin.medium.com/the-future-of-business-intelligence-is-open-source-9b654595773a)
- [What is Data Observability?](https://towardsdatascience.com/what-is-data-observability-40b337971e3e)

**Data Ingestion / Data Onboarding / ETL / ELT**
- [Flatfile](https://flatfile.io) Data Onboarding platform
- [Fivetran](https://fivetran.com) Cloud data integration platform
- [Matillion](https://www.matillion.com) Cloud data integration platform
- [Apache Gobblin](https://gobblin.apache.org) Open Source distributed data integration framework
- [Singer](https://www.singer.io) "Open Source standard for writing scripts that move data"
- [Meltano](https://meltano.com) Open Source ELT for DataOps
- [Airbyte](https://airbyte.io) Open Source data integration platform
- [Stitch](https://www.stitchdata.com) Simple, extensible Cloud ETL platform (Talend)
- [Hevo](https://hevodata.com) No-code data pipeline as a service
- [Apache Hop](http://hop.incubator.apache.org) Open Source data integration platform project
- [Meroxa](https://meroxa.com) Real-time data ingestion infrastructure
- [Portable](https://portable.io) Cloud Hosted ELT Platform
- Talend, StreamSets, Alooma (Google), Xplenty, Striim, Panoply, Stambia, HVR

**Reverse ETL**
- [Census](https://www.getcensus.com) Operational analytics platform, move data from data warehouse to apps
- [Hightouch](https://www.hightouch.io) Sync customer data to SaaS business platforms
- [Grouparoo](https://www.grouparoo.com) Open Source framework to move data between database and Cloud apps

**Data Collection / Product Analytics / Customer Data**
- [Segment](https://segment.com) Customer data platform (CDP) (Twilio)
- [RudderStack](https://rudderstack.com) Customer data pipeline, event tracking
- [Snowplow](https://snowplowanalytics.com) Data collection platform
- [Freshpaint](https://www.freshpaint.io) Collect, control, and deliver customer data
- [PostHog](https://posthog.com) Open Source Product Analytics platform
- [Amplitude](https://amplitude.com) Product Analytics platform
- [Iteratively](https://iterative.ly) Product Analytics platform "Capture customer data you trust"
- [Avo](https://www.avo.app) Product Analytics platform
- [Mixpanel](https://mixpanel.com) Product analytics platform
- [Indicative](https://www.indicative.com) Product analytics platform
- [Heap](https://heap.io) Product analytics platform
- [Supermetrics](https://supermetrics.com) Get marketing data for reporting, analytics and storage

**Transformation / Preparation / Cleaning / Wrangling**
- [Trifacta](https://www.trifacta.com) Data Wrangling for Cloud (or Hadoop) platforms and storage
- [dbt](https://www.getdbt.com) Transform with SQL from command line ([Open Source](https://github.com/fishtown-analytics/dbt)) or Cloud
- [Dataform](https://dataform.co) Collaboration on SQL pipelines in Cloud data warehouses (Google)
- [Pano](https://www.pano.dev) Open Source data preparation for Cloud data warehouses
- [Rasgo](https://www.rasgoml.com) Data preparation for Data Scientists
- [Mito](https://www.trymito.io) JupyterLab extension to generate pandas Python code from a spreadsheet
- [DataPrep](https://dataprep.ai/) Prepare data in Python
- [OpenRefine](https://openrefine.org) "A free, open source, powerful tool for working with messy data"

**SQL Tools / Editors**
- [Count](https://count.co) "The BI notebook built for analysts"
- [PopSQL](https://popsql.com) "Modern SQL editor"
- [DataGrip](https://www.jetbrains.com/datagrip) IDE for SQL (JetBrains)
- [DBeaver](https://dbeaver.io) Free (or Enterprise and Cloud editions) universal database tool
- [sq](https://sq.io) "swiss-army knife for data", SQL in command line for relational data
- [SqlDBM](https://sqldbm.com) Develop Database Models
- [Querybook](https://www.querybook.org) Open Source SQL query and Big Data IDE via a notebook interface
- [Soda SQL](https://github.com/sodadata/soda-sql) Data testing, monitoring, and profiling for SQL-accessible data
- [SQLFluff](https://github.com/sqlfluff/sqlfluff) SQL Linting and Auto-formatting for Humans

**SQL Engines**
- [Trino](https://trino.io) Open Source high-performance distributed SQL query engine (formerly PrestoSQL)
- [Starburst](https://www.starburst.io) Cloud or On-premises SQL engine (based on [Trino](https://trino.io))
- [AWS Athena](https://aws.amazon.com/athena) Interactive SQL query service for Amazon S3 (based on Presto)
- [DataFusion](https://github.com/apache/arrow-datafusion) Query execution engine using Apache Arrow as its in-memory format

**BI / Reporting / Data Visualization**
- [Metabase](https://www.metabase.com) Open Source business intelligence tool
- [Apache Superset](https://superset.apache.org) Open Source modern data exploration and visualization platform
- [Apache ECharts](https://echarts.apache.org) Open Source JavaScript Visualization Library
- [Cube.js](https://cube.dev) Open Source Analytical API platform
- [Grafana](https://grafana.com) Open Source analytics & monitoring solution
- [Looker](https://looker.com) BI and Analytics Platform (Google)
- [Redash](https://redash.io) Data visualisation and Dashboarding with SQL (Databricks)
- [Mode](https://mode.com) Collaborative data platform that combines SQL, R, Python, and visual analytics
- [Sigma](https://www.sigmacomputing.com) Cloud analytics solution
- [Hex](https://hex.tech) Collaborative SQL + Python-based notebooks
- [Lux](https://github.com/lux-org/lux) Python library and API for Intelligent Visual Discovery
- [y42](https://www.y42.com) "No-Code Business Intelligence" platform
- [Knowage](https://www.knowage-suite.com) Open Source Business Analytics Suite
- [Rakam](https://rakam.io) Data platform for building analytics interface (dbt integration)
- [Datawrapper](https://www.datawrapper.de) Enrich stories and articles with data visualization
- [D3](https://d3js.org) JavaScript library for visualizing data with HTML, SVG, and CSS
- [Lightdash](https://www.lightdash.com) Open source BI tool fully integrated with dbt projects
- Tableau, PowerBI, Sisense, Qlik, Spotfire, ThoughtSpot, Chartio (Atlassian), Domo, Toucan Toco

**Data Quality / Profiling / Observability**
- [Monte Carlo](https://www.montecarlodata.com) "Data Reliability Delivered"
- [Datafold](https://www.datafold.com) Data Observability platform
- [Great Expectations](https://greatexpectations.io) Open Source data quality, profiling & validation
- [Bigeye](https://www.bigeye.com) Automatic data quality monitoring
- [Anomalo](https://www.anomalo.com) Validate and document your data warehouse
- [Trackplan](https://trackplan.io) "Schema Management for Behavioural Data Tracking"
- [lightup](https://www.lightup.ai) Cloud data quality indicators provider

**Data Management / Lineage / Catalog / Governance**
- [Datakin](https://datakin.com) DataOps solution, Data Lineage
- [Marquez](https://marquezproject.github.io/marquez) Open Source metadata and data governance project
- [DataHub](https://datahubproject.io) Open Source metadata search & discovery tool
- [Amundsen](https://www.amundsen.io) Open Source data discovery and metadata engine
- [Data Galaxy](https://www.datagalaxy.com/en) Data Governance platform with Data Catalog and Data Lineage
- [Zeenea](https://zeenea.com) Cloud-native Data Catalog
- [Alation](https://www.alation.com) Data Governance and Data Catalog platform
- [Collibra](https://www.collibra.com) Data Governance and Data Catalog platform
- [Secoda](https://www.secoda.co) Data Discovery and Data Catalog
- [MANTA](https://getmanta.com) Data Lineage platform
- [data.world](https://data.world) Cloud-native Data Catalog
- [Stemma](https://www.stemma.ai/) SaaS managed version of Amundsen
- [Egeria](https://egeria.odpi.org/) Open Metadata and Governance

**DataOps / Data Fabric**
- [Atlan](https://atlan.com) "the modern data workspace", Data Management & DataOps
- [Nessie](https://projectnessie.org) DataOps for Data Lakes, a "Git-Like Experience for your Data Lake"
- [Nexla](https://www.nexla.com) DataOps platform "to delivery data for Analytics, AI and Operations"
- [Keboola](https://www.keboola.com) DataOps platform
- [Saagie](https://www.saagie.com) DataOps platform
- [DataKitchen](https://datakitchen.io) DataOps platform
- [DAGsHub](https://dagshub.com) GitHub for data
- [Unravel](https://www.unraveldata.com) DataOps platform
- [Upsolver](https://www.upsolver.com) "Compute and pipeline layer between your data lake and the analytics tools"
- [Cinchy](https://www.cinchy.com) "Autonomous Data Fabric" and Data Management platform

**Orchestration / Workflow**
- [Apache Airflow](https://airflow.apache.org) Open Source workflow scheduler platform
- [Dagster](https://dagster.io) Open Source "Data orchestrator for machine learning, analytics, and ETL"
- [Prefect](https://www.prefect.io) Workflow management system and platform for dataflow automation
- [Apache DolphinScheduler](https://dolphinscheduler.apache.org) Distributed and visual workflow scheduler system
- [Luigi](https://github.com/spotify) Python package to build complex pipelines of batch jobs

**Storage / Database**
- [DuckDB](https://duckdb.org) In-process SQL OLAP database (SQLite-like, column-oriented)
- [ClickHouse](https://clickhouse.tech/) Open-source OLAP database management system
- [DoltHub](https://www.dolthub.com) "the true Git for data experience in a SQL database"
- [DVC](https://dvc.org) Data Version Control
- [Materialize](https://materialize.com) Event Streaming Database
- [Warp 10](https://www.warp10.io) Advanced Time Series Platform
- Snowflake, Firebolt, BigQuery, Redshift, Apache Cassandra, MongoDB, InfluxDB, QuestDB, Neo4j, SingleStore (MemSQL)

**Data Privacy / Security / Identity**
- [Immuta](https://www.immuta.com) "Self-Service Data Access with Automated Privacy Control"
- [Okera](https://www.okera.com) Cloud data security, "Universal Data Authorization"
- [Privacera](https://privacera.com) SaaS Access Governance Solution
- [Apache Ranger](https://ranger.apache.org) Framework to enable, monitor and manage comprehensive data security
- [Baffle](https://baffle.io) Cloud security with a "transparent data security mesh"
- [Privitar](https://www.privitar.com) Enterprise Data Privacy Software
- [ReachFive](https://www.reachfive.com) Identity & Access Management
- [Okta](https://www.okta.com) Trusted platform to secure identities, from customers to workforce

**Others**
- [Opendatasoft](https://www.opendatasoft.com) Data sharing platform
- [Streamlit](https://streamlit.io) Turns data scripts into shareable data web apps
- [Transform Data](https://transformdata.io) Shared data interface and metrics repository
- [White Label Data](https://docs.whitelabeldata.com) Platform for building and deploying custom data applications
- [Flat Data](https://octo.github.com/projects/flat-data) Bring working datasets into your GitHub repositories and version them

**And finally don't hesitate to:**
- [Star](https://github.com/victorcouste/data-tools/stargazers) this GitHub repository
- Suggest an interesting new data tool with a [pull request](https://github.com/victorcouste/data-tools/pulls), an [issue](https://github.com/victorcouste/data-tools/issues/new) or a [message](https://github.com/victorcouste)
- Share [this list](https://victorcouste.github.io/data-tools) in your network
- Enjoy and have fun!

Victor

--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
theme: jekyll-theme-slate
--------------------------------------------------------------------------------