├── DataOpsSoftware2019.md
├── README.md
└── books
    ├── DataKitchen_dataops_cookbook.pdf
    ├── Importance of metadata in data warehousing.pdf
    └── managing-data-in-motion.pdf


/DataOpsSoftware2019.md:
--------------------------------------------------------------------------------
 1 | ## Data Pipeline Orchestration
 2 | 
 3 | - [Airflow](https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8)
 4 | an open-source platform to programmatically author, schedule and monitor data pipelines.
 5 | - [Apache Oozie](http://oozie.apache.org/)
 6 | an open-source workflow scheduler system to manage Apache Hadoop jobs.
 7 | - [DBT (Data Build Tool)](https://www.getdbt.com/)
 8 | is a command line tool that enables data analysts and engineers to transform data in their warehouse more effectively.
 9 | - [BMC Control-M](http://www.bmc.com/it-solutions/control-m.html)
10 | a digital business automation solution that simplifies and automates diverse batch application workloads.
11 | - [DataKitchen](https://www.datakitchen.io/)
12 | a DataOps Platform that reduces analytics cycle time by monitoring data quality and providing automated support for the deployment of data and new analytics.
13 | - [Reflow](https://github.com/grailbio/reflow)
14 | Reflow is a system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs.
15 | - [ElementL](https://github.com/elementl)
16 | A company, currently in stealth mode, founded by former Facebook director and GraphQL co-creator Nick Schrock. Maintains the open-source Dagster project.
17 | - [Astronomer.io](https://www.astronomer.io/)
18 | Astronomer recently re-focused on Airflow support. They make it easy to deploy and manage your own Apache Airflow webserver, so you can get straight to writing workflows.
19 | - [Piperr.io](http://piperr.io/) 
20 | Use Piperr’s pre-built data pipelines across enterprise stakeholders: From IT to Analytics, From Tech, Data Science to LoBs.
21 | - [Prefect Technologies](https://www.prefect.io/)
22 | Open-source data engineering platform that builds, tests, and runs data workflows.
23 | - [Genie](https://netflix.github.io/genie/)
24 | Distributed Big Data Orchestration Service by Netflix
25 | 
26 | ## Testing and Production Quality
27 | - [ICEDQ](https://icedq.com/)
28 | software used to automate the testing of ETL/Data Warehouse and Data Migration.
29 | - [Naveego](http://www.naveego.com/)
30 | A simple, cloud-based platform that allows you to deliver accurate dashboards by taking a bottom-up approach to data quality and exception management.
31 | - [DataKitchen](https://www.datakitchen.io/)
32 | a DataOps Platform that improves data quality by providing lean manufacturing controls to test and monitor data.
33 | - [FirstEigen](http://firsteigen.com/)
34 | Automatic Data Quality Rule Discovery and Continuous Data Monitoring
35 | - [Great Expectations](https://github.com/great-expectations/great_expectations)
36 | Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time).
37 | - [Enterprise Data Foundation](https://enterprise-data.org/)
38 | Open-source enterprise data toolkit providing efficient unit testing, automated refreshes, and automated deployment.
39 | 
40 | ## Deployment Automation and Development Sandbox Creation
41 | - [Jenkins](https://jenkins-ci.org/)
42 | a ‘CI/CD’ tool used by software development teams to deploy code from development into production
43 | - [DataKitchen](https://www.datakitchen.io/)
44 | a DataOps Platform that supports the deployment of all data analytics code and configuration.
45 | - [Amaterasu](http://shinto.io/index.html)
46 | is a deployment tool for data pipelines. Amaterasu allows developers to write and easily deploy data pipelines, and clusters manage their configuration and dependencies.
47 | - [Meltano](https://about.gitlab.com/2018/08/01/hey-data-teams-we-are-working-on-a-tool-just-for-you/)
48 | aims to be a complete solution for data teams — the name stands for model, extract, load, transform, analyze, notebook, orchestrate — in other words, the data science lifecycle.
49 | 
50 | ## Data Science Model Deployment
51 | - [Domino](https://www.dominodatalab.com/)
52 | accelerates the development and delivery of models with infrastructure automation, seamless collaboration, and automated reproducibility.
53 | - [Hydrosphere.io](https://hydrosphere.io/)
54 | deploys batch Spark functions and machine-learning models, and assures the quality of end-to-end pipelines.
55 | - [Open Data Group](https://www.opendatagroup.com/)
56 | a software solution that facilitates the deployment of analytics using models.
57 | - [ParallelM](http://www.parallelm.com/)
58 | moves machine learning into production, automates orchestration, and manages the ML pipeline.
59 | - [Seldon](https://www.seldon.io/)
60 | streamlines the data science workflow, with audit trails, advanced experiments, continuous integration, and deployment.
61 | - [Metis Machine](https://metismachine.com/)
62 | Enterprise-scale Machine Learning and Deep Learning deployment and automation platform for rapid deployment of models into existing infrastructure and applications.
63 | - [Datatron](http://www.datatron.com/)
64 | Automate deployment and monitoring of AI Models.
65 | - [DSFlow](http://dsflow.io/) Go from data extraction to business value in days, not months. Build on top of open source tech, using Silicon Valley’s best practices.
66 | - [DataMo-Datmo](https://datmo.com/)
67 | tools help you seamlessly deploy and manage models in a scalable, reliable, and cost-optimized way.
68 | - [MLflow](https://www.mlflow.org/)
69 | An open-source platform for the complete machine learning lifecycle, from Databricks.
70 | - [Studio.ML](https://www.studio.ml/)
71 | Studio is a model management framework written in Python to help simplify and expedite your model building experience.
72 | - [Comet.ML](https://www.comet.ml/)
73 | Comet.ml allows data science teams and individuals to automagically track their datasets, code changes, experimentation history and production models creating efficiency, transparency, and reproducibility.
74 | - [Polyaxon](https://polyaxon.com/)
75 | An open source platform for reproducible machine learning at scale.
76 | - [Missinglink.ai](https://missinglink.ai/)
77 | MissingLink helps data engineers streamline and automate the entire deep learning lifecycle.
78 | - [kubeflow](https://www.kubeflow.org/)
79 | The Machine Learning Toolkit for Kubernetes
80 | - [Verta.ai](https://www.verta.ai/)
81 | Models are the new code!
82 | 
83 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # My Awesome Data Ops Resources [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
  2 | 
  3 | > A curated list of data operations resources, focused on use by cultural heritage organizations.
  4 | 
  5 | 
  6 | ## Books
  7 | 
  8 | - [The DataOps Cookbook](https://www.datakitchen.io/dataops-cookbook-main.html) A 135-page book that describes the step-by-step implementation of DataOps.
  9 | 
 10 | - [Blogs](#papers-and-blogs)
 11 | 
 12 | 
 13 | ## Papers and Blogs
 14 | ### ETL
 15 |  - [Managing Data in Motion](https://www.progress.com/docs/default-source/default-document-library/Progress/Documents/book-club/Managing-Data-in-Motion.p)
 16 | 
 17 | 
 18 | ### Data Quality
 19 | - [A Deep Dive Into Data Quality](https://towardsdatascience.com/a-deep-dive-into-data-quality-c1d1ee576046)
 20 | 
 21 | ### Metadata
 22 |  - [Importance of Metadata in Data Warehousing](http://sdsu-dspace.calstate.edu/bitstream/handle/10211.10/2354/Dhiman_Abhinav.pdf;sequence=1)
 23 |  
 24 | ### Pipeline Engineering
 25 | - [Smart pipelining — reactive approach to computation scheduling](https://medium.com/casumotech/smart-pipelining-reactive-approach-to-computation-scheduling-5a7e39658df5)
 26 | 
 27 | 
 28 | 
 29 | ## Data Ops Software
 30 | 
 31 | ### Data Pipeline Orchestration
 32 | 
 33 | - [Airflow](https://medium.com/airbnb-engineering/airflow-a-workflow-management-platform-46318b977fd8)
 34 | an open-source platform to programmatically author, schedule and monitor data pipelines.
 35 | - [Apache Oozie](http://oozie.apache.org/)
 36 | an open-source workflow scheduler system to manage Apache Hadoop jobs.
 37 | - [DBT (Data Build Tool)](https://www.getdbt.com/)
 38 | is a command line tool that enables data analysts and engineers to transform data in their warehouse more effectively.
 39 | - [BMC Control-M](http://www.bmc.com/it-solutions/control-m.html)
 40 | a digital business automation solution that simplifies and automates diverse batch application workloads.
 41 | - [DataKitchen](https://www.datakitchen.io/)
 42 | a DataOps Platform that reduces analytics cycle time by monitoring data quality and providing automated support for the deployment of data and new analytics.
 43 | - [Reflow](https://github.com/grailbio/reflow)
 44 | Reflow is a system for incremental data processing in the cloud. Reflow enables scientists and engineers to compose existing tools (packaged in Docker images) using ordinary programming constructs.
 45 | - [ElementL](https://github.com/elementl)
 46 | A company, currently in stealth mode, founded by former Facebook director and GraphQL co-creator Nick Schrock. Maintains the open-source Dagster project.
 47 | - [Astronomer.io](https://www.astronomer.io/)
 48 | Astronomer recently re-focused on Airflow support. They make it easy to deploy and manage your own Apache Airflow webserver, so you can get straight to writing workflows.
 49 | - [Piperr.io](http://piperr.io/) 
 50 | Use Piperr’s pre-built data pipelines across enterprise stakeholders: From IT to Analytics, From Tech, Data Science to LoBs.
 51 | - [Prefect Technologies](https://www.prefect.io/)
 52 | Open-source data engineering platform that builds, tests, and runs data workflows.
 53 | - [Genie](https://netflix.github.io/genie/)
 54 | Distributed Big Data Orchestration Service by Netflix
 55 | 
 56 | ### Testing and Production Quality
 57 | - [ICEDQ](https://icedq.com/)
 58 | software used to automate the testing of ETL/Data Warehouse and Data Migration.
 59 | - [Naveego](http://www.naveego.com/)
 60 | A simple, cloud-based platform that allows you to deliver accurate dashboards by taking a bottom-up approach to data quality and exception management.
 61 | - [DataKitchen](https://www.datakitchen.io/)
 62 | a DataOps Platform that improves data quality by providing lean manufacturing controls to test and monitor data.
 63 | - [FirstEigen](http://firsteigen.com/)
 64 | Automatic Data Quality Rule Discovery and Continuous Data Monitoring
 65 | - [Great Expectations](https://github.com/great-expectations/great_expectations)
 66 | Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time).
 67 | - [Enterprise Data Foundation](https://enterprise-data.org/)
 68 | Open-source enterprise data toolkit providing efficient unit testing, automated refreshes, and automated deployment.
 69 | 
 70 | ### Deployment Automation and Development Sandbox Creation
 71 | - [Jenkins](https://jenkins-ci.org/)
 72 | a ‘CI/CD’ tool used by software development teams to deploy code from development into production
 73 | - [DataKitchen](https://www.datakitchen.io/)
 74 | a DataOps Platform that supports the deployment of all data analytics code and configuration.
 75 | - [Amaterasu](http://shinto.io/index.html)
 76 | is a deployment tool for data pipelines. Amaterasu allows developers to write and easily deploy data pipelines, and clusters manage their configuration and dependencies.
 77 | - [Meltano](https://about.gitlab.com/2018/08/01/hey-data-teams-we-are-working-on-a-tool-just-for-you/)
 78 | aims to be a complete solution for data teams — the name stands for model, extract, load, transform, analyze, notebook, orchestrate — in other words, the data science lifecycle.
 79 | 
 80 | ### Data Science Model Deployment
 81 | - [Domino](https://www.dominodatalab.com/)
 82 | accelerates the development and delivery of models with infrastructure automation, seamless collaboration, and automated reproducibility.
 83 | - [Hydrosphere.io](https://hydrosphere.io/)
 84 | deploys batch Spark functions and machine-learning models, and assures the quality of end-to-end pipelines.
 85 | - [Open Data Group](https://www.opendatagroup.com/)
 86 | a software solution that facilitates the deployment of analytics using models.
 87 | - [ParallelM](http://www.parallelm.com/)
 88 | moves machine learning into production, automates orchestration, and manages the ML pipeline.
 89 | - [Seldon](https://www.seldon.io/)
 90 | streamlines the data science workflow, with audit trails, advanced experiments, continuous integration, and deployment.
 91 | - [Metis Machine](https://metismachine.com/)
 92 | Enterprise-scale Machine Learning and Deep Learning deployment and automation platform for rapid deployment of models into existing infrastructure and applications.
 93 | - [Datatron](http://www.datatron.com/)
 94 | Automate deployment and monitoring of AI Models.
 95 | - [DSFlow](http://dsflow.io/) Go from data extraction to business value in days, not months. Build on top of open source tech, using Silicon Valley’s best practices.
 96 | - [DataMo-Datmo](https://datmo.com/)
 97 | tools help you seamlessly deploy and manage models in a scalable, reliable, and cost-optimized way.
 98 | - [MLflow](https://www.mlflow.org/)
 99 | An open-source platform for the complete machine learning lifecycle, from Databricks.
100 | - [Studio.ML](https://www.studio.ml/)
101 | Studio is a model management framework written in Python to help simplify and expedite your model building experience.
102 | - [Comet.ML](https://www.comet.ml/)
103 | Comet.ml allows data science teams and individuals to automagically track their datasets, code changes, experimentation history and production models creating efficiency, transparency, and reproducibility.
104 | - [Polyaxon](https://polyaxon.com/)
105 | An open source platform for reproducible machine learning at scale.
106 | - [Missinglink.ai](https://missinglink.ai/)
107 | MissingLink helps data engineers streamline and automate the entire deep learning lifecycle.
108 | - [kubeflow](https://www.kubeflow.org/)
109 | The Machine Learning Toolkit for Kubernetes
110 | - [Verta.ai](https://www.verta.ai/)
111 | Models are the new code!
112 | 
113 | 
114 | 
115 | 
116 | ## License
117 | 
118 | [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](http://creativecommons.org/publicdomain/zero/1.0)
119 | 


--------------------------------------------------------------------------------
/books/DataKitchen_dataops_cookbook.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen1649chenli/dataOpsResource/7c36f251243475b89cf5546beed64946543daf67/books/DataKitchen_dataops_cookbook.pdf


--------------------------------------------------------------------------------
/books/Importance of metadata in data warehousing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen1649chenli/dataOpsResource/7c36f251243475b89cf5546beed64946543daf67/books/Importance of metadata in data warehousing.pdf


--------------------------------------------------------------------------------
/books/managing-data-in-motion.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chen1649chenli/dataOpsResource/7c36f251243475b89cf5546beed64946543daf67/books/managing-data-in-motion.pdf


--------------------------------------------------------------------------------