├── README.md
└── ja
    └── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # Comparison of Python pipeline packages: Airflow, Luigi, Gokart, Metaflow, Kedro, PipelineX
  2 | 
  3 | 
  4 | This article compares open-source Python packages for pipeline/workflow development: Airflow, Luigi, Gokart, Metaflow, Kedro, PipelineX.
  5 | 
  6 | In this article, the terms "pipeline", "workflow", and "DAG" are used almost interchangeably.
  7 | 
  8 | ## Summary
  9 | 
 10 | - 👍: good
 11 | - 👍👍: better
 12 | 
 13 | | Package                                                          | Airflow | Luigi&nbsp;&nbsp;&nbsp; | Gokart | Metaflow | Kedro&nbsp;&nbsp;&nbsp; | PipelineX |
 14 | |-------------------------------------------------------------------------|---------|-------|--------|----------|-------|-----------------|
 15 | | Developer, Maintainer                                             |  Airbnb, Apache | Spotify | M3 | Netflix | QuantumBlack (McKinsey) | Yusuke Minami |
 16 | | Wrapped packages                                                        |         |       | Luigi  |          |       | Kedro, MLflow   |
 17 | | Ease and flexibility of defining DAGs                                   |         |       | 👍      | 👍        | 👍     | 👍👍             |
 18 | | Modularity of DAG definition                                            | 👍👍       |       |        |          | 👍👍     | 👍👍               |
 19 | | Unstructured data can be passed between tasks                           |         | 👍👍     | 👍👍      | 👍👍        | 👍👍     | 👍👍               |
 20 | | Built\-in existence check wrappers for various data (files/databases)   |         | 👍👍   | 👍👍    |          | 👍👍   | 👍👍             |
 21 | | Built\-in read/write wrappers for various data (files/databases)        |         |       | 👍      |          | 👍👍   | 👍👍             |
 22 | | Modularity, reusability, and testability of data operations             |         |       | 👍      |          | 👍👍   | 👍👍             |
 23 | | Automatic resuming by detecting intermediate data                       |         | 👍👍     | 👍👍      |          |       | 👍👍               |
 24 | | Forced rerun of tasks upon detecting parameter changes                  |         |       | 👍👍      |          |       |                 |
 25 | | Save parameters for experiments                                         |         |       | 👍👍      |          |       | 👍👍               |
 26 | | Parallel execution                                                      | 👍       | 👍     | 👍      | 👍        | 👍     | 👍               |
 27 | | Distributed parallel execution with Celery                              | 👍👍       |       |        |          |       |                 |
 28 | | Visualization of DAG                                                    | 👍👍       | 👍     | 👍      |          | 👍     | 👍               |
 29 | | Execution status monitoring in GUI                                             | 👍👍     | 👍     | 👍      |          |       |                 |
 30 | | Scheduling, Triggering in GUI                                           | 👍       |       |        |          |       |                 |
 31 | | Notification to Slack                                                   | 👍       |       | 👍      |          |       |                 |
 32 | 
 33 | 
 34 | ## Airflow 
 35 | 
 36 | https://github.com/apache/airflow
 37 | 
 38 | Released in 2015 by Airbnb.
 39 | 
 40 | Airflow enables you to define your DAG (workflow) of tasks in Python code (an independent Python module).
 41 | 
 42 | (Optionally, unofficial plugins such as [dag-factory](https://github.com/ajbosco/dag-factory) enable you to define DAGs in YAML.)
 43 | 
 44 | ### Pros:
 45 | 
 46 | - Provides a rich GUI with features including DAG visualization, execution progress monitoring, scheduling, and triggering.
 47 | - Provides a distributed computing option (using Celery).
 48 | - DAG definition is modular and independent of processing functions.
 49 | - Workflows can be nested using `SubDagOperator`.
 50 | - Supports Slack notification.
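
To make "DAG definition is independent of processing functions" concrete, here is a dependency-free sketch. It does not use Airflow's API (real Airflow code declares a `DAG` object and operators); the `TASKS`/`UPSTREAM` dictionaries and the `run_dag` helper are hypothetical and only mimic the idea that dependencies are declared apart from the task code. The ordering comes from Python's standard `graphlib` module (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Processing functions: plain Python, unaware of the DAG they belong to.
def extract():
    return "extracted"


def transform():
    return "transformed"


def load():
    return "loaded"


# Hypothetical DAG definition, kept apart from the functions above.
TASKS = {"extract": extract, "transform": transform, "load": load}
# Each task name maps to the set of upstream task names it depends on.
UPSTREAM = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_dag(tasks, upstream):
    """Run each task once, upstream tasks first, and return the execution order."""
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()  # return value is discarded: no data is handed downstream
    return order


print(run_dag(TASKS, UPSTREAM))  # ['extract', 'transform', 'load']
```

The discarded return values mirror the first item under Cons: in this structure nothing carries data from one task to the next.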
 51 | 
 52 | ### Cons:
 53 | 
 54 | - Not designed to pass data between dependent tasks without using a database.
 55 | There is no good way to pass unstructured data (e.g. images, videos, pickles) between dependent tasks in Airflow.
 56 | - You need to write file access (read/write) code.
 57 | - Does not support automatically resuming a pipeline from intermediate data files or databases.
 58 | 
 59 | 
 60 | ## Luigi
 61 | 
 62 | https://github.com/spotify/luigi
 63 | 
 64 | Released in 2012 by Spotify.
 65 | 
 66 | Luigi enables you to define your pipeline in Python code as subclasses of `Task`, each implementing three methods (`requires`, `output`, `run`).
 67 | 
 68 | ### Pros:
 69 | 
 70 | - Supports automatically resuming a pipeline from intermediate data files in local or cloud storage (AWS, GCP, Azure) or databases, as defined in the `Task.output` method using the `Target` class.
 71 | - You can write code so that any data can be passed between dependent tasks.
 72 | - Provides a GUI with features including DAG visualization and execution progress monitoring.
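
The resuming behavior follows from the `requires`/`output`/`run` structure: a task whose output already exists is treated as complete and skipped. The toy below imitates that with plain file paths instead of Luigi's `luigi.Task` and `Target` classes; `Task`, `build`, `MakeNumbers`, and `SumNumbers` are hypothetical names for illustration, not Luigi's API.

```python
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # scratch directory for the intermediate files


class Task:
    """Toy stand-in for a Luigi-style task (hypothetical, not luigi.Task)."""

    def requires(self):
        return []  # upstream tasks

    def output(self):
        raise NotImplementedError  # path of the file this task writes

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done when its output file already exists.
        return os.path.exists(self.output())


def build(task, executed):
    """Run `task` and its missing upstream tasks; skip anything already complete."""
    if task.complete():
        return executed
    for upstream in task.requires():
        build(upstream, executed)
    task.run()
    executed.append(type(task).__name__)
    return executed


class MakeNumbers(Task):
    def output(self):
        return os.path.join(WORKDIR, "numbers.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")


class SumNumbers(Task):
    def requires(self):
        return [MakeNumbers()]

    def output(self):
        return os.path.join(WORKDIR, "sum.txt")

    def run(self):
        # Data arrives through the upstream task's output file.
        with open(self.requires()[0].output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))


print(build(SumNumbers(), []))  # ['MakeNumbers', 'SumNumbers']
os.remove(SumNumbers().output())  # simulate losing only the final result
print(build(SumNumbers(), []))  # ['SumNumbers'] (MakeNumbers is skipped)
```

The same structure also shows the coupling listed under Cons: each task class hard-codes its own file paths and read/write code.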
 73 | 
 74 | ### Cons:
 75 | 
 76 | - You need to write file/database access (read/write) code.
 77 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are tightly coupled and not modular. You need to modify the task classes to reuse them in future projects.
 78 | 
 79 | 
 80 | ## Gokart
 81 | 
 82 | https://github.com/m3dev/gokart
 83 | 
 84 | Released in Dec 2018 by M3.
 85 | 
 86 | Gokart works on top of Luigi. 
 87 | 
 88 | ### Pros: 
 89 | 
 90 | In addition to Luigi's advantages:
 91 | - Can separate task processing (Transform of ETL) from pipeline definition using `TaskInstanceParameter`, so you can easily reuse tasks
 92 | in future projects.
 93 | - Provides built-in file access (read/write) wrappers as `FileProcessor` classes for pickle, npz, gz, txt, csv, tsv, json, and xml.
 94 | - Saves parameters for each experiment to ensure reproducibility. A viewer called [thunderbolt](https://github.com/m3dev/thunderbolt) can be used.
 95 | - Reruns tasks upon parameter change, based on a hash string unique to each parameter set that is included in each intermediate file name.
 96 | This feature is useful for experimenting with various parameter sets.
 97 | - Provides syntactic sugar for Luigi's `requires` method using a class decorator.
 98 | - Supports Slack notification.
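
The parameter-hash rerun logic can be sketched as follows. This is a simplified illustration, not Gokart's implementation: the SHA-256-of-sorted-JSON hashing, the file naming, and the `cached_run`/`train` names are assumptions chosen for the example.

```python
import hashlib
import json
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # scratch directory for cached outputs


def param_hash(params):
    """Short, stable hash of a parameter dict (independent of key order)."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:10]


def cached_run(task_name, params, func, runs):
    """Call func(**params) only if no output exists for this exact parameter set."""
    path = os.path.join(WORKDIR, f"{task_name}_{param_hash(params)}.json")
    if not os.path.exists(path):
        runs.append(params)  # record that a real run happened
        with open(path, "w") as f:
            json.dump(func(**params), f)
    with open(path) as f:
        return json.load(f)


def train(learning_rate, epochs):
    # Hypothetical stand-in for an expensive experiment.
    return {"score": learning_rate * epochs}


runs = []
cached_run("train", {"learning_rate": 0.1, "epochs": 10}, train, runs)
cached_run("train", {"epochs": 10, "learning_rate": 0.1}, train, runs)  # same set: reused
cached_run("train", {"learning_rate": 0.2, "epochs": 10}, train, runs)  # changed: rerun
print(len(runs))  # 2
```

Because the hash is part of the file name, results for every parameter set tried so far stay on disk side by side, which is what makes this convenient for experimentation.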
 99 | 
100 | ### Cons:
101 | 
102 | - Supported data formats for file access wrappers are limited. You need to write file/database access (read/write) code to use unsupported formats.
103 | 
104 | 
105 | ## Metaflow
106 | 
107 | https://github.com/Netflix/metaflow
108 | 
109 | Released in Dec 2019 by Netflix.
110 | 
111 | Metaflow enables you to define your pipeline in Python code as a subclass of `FlowSpec` whose methods are decorated with `step`.
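
A toy imitation of this structure, without Metaflow itself, shows how data moves between steps as instance attributes while `self.next(...)` chains the steps. `ToyFlow`, its `run` method, and the `step` marker below are hypothetical stand-ins; real Metaflow flows subclass `metaflow.FlowSpec`, use `metaflow.step`, and are launched from the command line.

```python
def step(func):
    """Mark a method as a flow step (toy stand-in for metaflow.step)."""
    func.is_step = True
    return func


class ToyFlow:
    """Toy stand-in for metaflow.FlowSpec: runs steps linked by self.next()."""

    def next(self, step_method):
        self._next = step_method

    def run(self):
        visited = []
        current = self.start  # every flow begins at `start`
        while current is not None:
            if not getattr(current, "is_step", False):
                raise ValueError(f"{current.__name__} is not decorated with @step")
            self._next = None
            current()
            visited.append(current.__name__)
            current = self._next  # the step chose its successor via self.next()
        return visited


class SquareFlow(ToyFlow):
    @step
    def start(self):
        self.numbers = [1, 2, 3]  # instance attributes carry data between steps
        self.next(self.square)

    @step
    def square(self):
        self.squares = [n * n for n in self.numbers]
        self.next(self.end)

    @step
    def end(self):
        self.total = sum(self.squares)


flow = SquareFlow()
print(flow.run(), flow.total)  # ['start', 'square', 'end'] 14
```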
112 | 
113 | ### Pros:
114 | 
115 | - Integration with AWS services (especially AWS Batch).
116 | 
117 | ### Cons:
118 | 
119 | - You need to write file/database access (read/write) code.
120 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are tightly coupled and not modular. You need to modify the flow classes to reuse them in future projects.
121 | - Does not provide a GUI.
122 | - Limited support for GCP and Azure.
123 | - Does not support automatically resuming a pipeline from intermediate data files or databases.
124 | 
125 | 
126 | ## Kedro
127 | 
128 | https://github.com/quantumblacklabs/kedro
129 | 
130 | Released in May 2019 by QuantumBlack, part of McKinsey & Company.
131 | 
132 | Kedro enables you to define pipelines as a list of nodes, each created by the `node` function with three arguments
133 | (`func`: task processing function, `inputs`: input data name (list or dict if multiple), `outputs`: output data name (list or dict if multiple)),
134 | in Python code (an independent Python module).
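
Because each node only names its inputs and outputs, the execution order can be derived from the data names rather than from the order of the list. The sketch below illustrates that idea without Kedro; `node` and `run_pipeline` here are simplified hypothetical stand-ins for `kedro.pipeline.node` and Kedro's runners, and they accept only a string or a list of names (not dicts).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def node(func, inputs, outputs):
    """Toy stand-in for kedro.pipeline.node: pair a function with data names."""

    def as_list(names):
        return [names] if isinstance(names, str) else list(names)

    return {"func": func, "inputs": as_list(inputs), "outputs": as_list(outputs)}


def run_pipeline(nodes, catalog):
    """Run nodes in data-dependency order; data moves only through `catalog`."""
    producer = {name: i for i, n in enumerate(nodes) for name in n["outputs"]}
    graph = {
        i: {producer[name] for name in n["inputs"] if name in producer}
        for i, n in enumerate(nodes)
    }
    for i in TopologicalSorter(graph).static_order():
        n = nodes[i]
        result = n["func"](*(catalog[name] for name in n["inputs"]))
        # A single output name receives the whole return value.
        values = [result] if len(n["outputs"]) == 1 else result
        catalog.update(zip(n["outputs"], values))
    return catalog


# Processing functions: no knowledge of file paths or of the pipeline.
def clean(raw):
    return [x for x in raw if x is not None]


def summarize(values):
    return sum(values), len(values)


pipeline = [
    node(summarize, inputs="clean_data", outputs=["total", "count"]),  # list order is irrelevant
    node(clean, inputs="raw_data", outputs="clean_data"),
]
catalog = run_pipeline(pipeline, {"raw_data": [3, None, 4, 5]})
print(catalog["total"], catalog["count"])  # 12 3
```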
135 | 
136 | ### Pros:
137 | 
138 | - Provides built-in file/database access (read/write) wrappers as `DataSet` classes for
139 | CSV, Pickle, YAML, JSON, Parquet, Excel, and text files in local or cloud storage (S3 in AWS, GCS in GCP), as well as SQL, Spark, etc.
140 | - Users can add support for any other data format.
141 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are independent and modular.
142 | You can easily reuse them in future projects.
143 | - Pipelines can be nested. (A pipeline can be used as a sub-pipeline of another pipeline.)
144 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) provides DAG visualization.
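
The separation of data access from processing can be sketched as a small dataset class with `load`/`save` methods plus a pure processing function. This is a stdlib-only illustration, not Kedro code: in Kedro itself a custom format is typically added by subclassing `AbstractDataSet` and implementing `_load`, `_save`, and `_describe`. The `CSVRowsDataSet` name and the dict-based catalog are hypothetical.

```python
import csv
import os
import tempfile


class CSVRowsDataSet:
    """Hypothetical dataset: all file access for one CSV file lives here."""

    def __init__(self, filepath):
        self._filepath = filepath

    def load(self):
        with open(self._filepath, newline="") as f:
            return list(csv.DictReader(f))

    def save(self, rows):
        with open(self._filepath, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def exists(self):
        return os.path.exists(self._filepath)


def add_total(rows):
    # Pure processing function: no file I/O, so it is easy to unit-test.
    return [{**row, "total": str(int(row["a"]) + int(row["b"]))} for row in rows]


tmp = tempfile.mkdtemp()
catalog = {  # data names -> dataset objects, kept apart from the processing code
    "raw": CSVRowsDataSet(os.path.join(tmp, "raw.csv")),
    "with_total": CSVRowsDataSet(os.path.join(tmp, "with_total.csv")),
}
catalog["raw"].save([{"a": "1", "b": "2"}, {"a": "3", "b": "4"}])
catalog["with_total"].save(add_total(catalog["raw"].load()))
print(catalog["with_total"].load())
# [{'a': '1', 'b': '2', 'total': '3'}, {'a': '3', 'b': '4', 'total': '7'}]
```

Swapping CSV for another storage format only touches the dataset class; `add_total` and the pipeline wiring stay unchanged.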
145 | 
146 | ### Cons:
147 | - Does not support automatically resuming a pipeline from intermediate data files or databases.
148 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) does not provide execution progress monitoring.
149 | - The `requirements.txt` includes package dependencies that many projects do not use (e.g. pyarrow).
150 | 
151 | 
152 | ## PipelineX
153 | 
154 | https://github.com/Minyus/pipelinex
155 | 
156 | Released in Nov 2019 by a Kedro user (me).
157 | 
158 | PipelineX works on top of Kedro and MLflow.
159 | 
160 | PipelineX enables you to define your pipeline in YAML (an independent YAML file).
161 | 
162 | ### Pros:
163 | 
164 | In addition to Kedro's advantages:
165 | - Supports automatically resuming a pipeline from intermediate data files or databases.
166 | - Optional syntactic sugar for Kedro Pipeline (e.g. a Sequential API similar to PyTorch (`torch.nn.Sequential`) and Keras (`tf.keras.Sequential`)).
167 | - Optional syntactic sugar for the Kedro `DataSet` catalog (e.g. using the file name in the file path as the dataset instance name).
168 | - Backward-compatible with pure Kedro.
169 | - Integration with MLflow to save parameters, metrics, and other output artifacts such as models for each experiment.
170 | - Integration with common packages for data science: PyTorch, Ignite, pandas, OpenCV.
171 | - Additional `DataSet` classes, including an image set (a folder containing images), useful for computer vision applications.
172 | - A leaner project template than pure Kedro's.
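
One way a Sequential-style shorthand can work is to generate the intermediate data names automatically, so that only the first input and the last output need names. The `sequential` and `run_sequential` helpers and the `name__index` naming scheme below are hypothetical, for illustration only; PipelineX's actual syntax is written in YAML and may differ.

```python
def sequential(funcs, inputs, outputs):
    """Hypothetical helper: chain one-argument functions like torch.nn.Sequential.

    Returns Kedro-style (func, input_name, output_name) triples, generating
    names for the intermediate data so only the ends need to be named.
    """
    nodes = []
    current = inputs
    for i, func in enumerate(funcs):
        is_last = i == len(funcs) - 1
        produced = outputs if is_last else f"{outputs}__{i}"
        nodes.append((func, current, produced))
        current = produced
    return nodes


def run_sequential(nodes, catalog):
    """Execute the triples in order, storing every result in the catalog dict."""
    for func, input_name, output_name in nodes:
        catalog[output_name] = func(catalog[input_name])
    return catalog


nodes = sequential([str.strip, str.lower, str.split], inputs="raw_text", outputs="tokens")
for func, input_name, output_name in nodes:
    print(func.__name__, input_name, "->", output_name)
# strip raw_text -> tokens__0
# lower tokens__0 -> tokens__1
# split tokens__1 -> tokens
print(run_sequential(nodes, {"raw_text": "  Hello Pipeline World "})["tokens"])
# ['hello', 'pipeline', 'world']
```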
173 | 
174 | ### Cons:
175 | 
176 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) does not provide execution progress monitoring.
177 | - Kedro's `requirements.txt` includes package dependencies that many projects do not use (e.g. pyarrow).
178 | - PipelineX is currently developed and maintained by an individual (me).
179 | 
180 | 
181 | 
182 | 
183 | ## Platform-specific options
184 | 
185 | ### Argo
186 | 
187 | https://github.com/argoproj/argo
188 | 
189 | Uses Kubernetes to run pipelines.
190 | 
191 | ### Kubeflow Pipelines
192 | 
193 | https://github.com/kubeflow/pipelines
194 | 
195 | Works on top of Argo.
196 | 
197 | ### Oozie
198 | 
199 | https://github.com/apache/oozie
200 | 
201 | Manages Hadoop jobs.
202 | 
203 | 
204 | ### Azkaban
205 | 
206 | https://github.com/azkaban/azkaban
207 | 
208 | Manages Hadoop jobs.
209 | 
210 | ### GitLab CI/CD
211 | 
212 | https://docs.gitlab.com/ee/ci/
213 | 
214 | - Runs pipelines defined in YAML.
215 | - Supports triggering by git push, cron-style scheduling, and manual runs.
216 | - Supports Docker containers.
217 | 
218 | 
219 | ## References
220 | 
221 | Airflow
222 | - https://github.com/apache/airflow
223 | - https://airflow.apache.org/docs/stable/howto/initialize-database.html
224 | - https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105
225 | 
226 | Luigi
227 | - https://github.com/spotify/luigi
228 | - https://luigi.readthedocs.io/en/stable/api/luigi.contrib.html
229 | - https://www.m3tech.blog/entry/2018/11/12/110000
230 | 
231 | Gokart
232 | - https://github.com/m3dev/gokart
233 | - https://www.m3tech.blog/entry/2019/09/30/120229
234 | - https://qiita.com/Hase8388/items/8cf0e5c77f00b555748f
235 | 
236 | Metaflow
237 | - https://github.com/Netflix/metaflow
238 | - https://docs.metaflow.org/metaflow/basics
239 | - https://docs.metaflow.org/metaflow/scaling
240 | - https://medium.com/bigdatarepublic/a-review-of-netflixs-metaflow-65c6956e168d
241 | 
242 | Kedro
243 | - https://github.com/quantumblacklabs/kedro
244 | - https://kedro.readthedocs.io/en/latest/03_tutorial/04_create_pipelines.html
245 | - https://kedro.readthedocs.io/en/latest/kedro.io.html#data-sets
246 | - https://medium.com/mhiro2/building-pipeline-with-kedro-for-ml-competition-63e1db42d179
247 |     
248 | PipelineX
249 | - https://github.com/Minyus/pipelinex
250 |     
251 | Airflow vs Luigi
252 | - https://towardsdatascience.com/data-pipelines-luigi-airflow-everything-you-need-to-know-18dc741449b7
253 | - https://medium.com/better-programming/airbnbs-airflow-versus-spotify-s-luigi-bd4c7c2c0791
254 | - https://www.quora.com/Which-is-a-better-data-pipeline-scheduling-platform-Airflow-or-Luigi
255 |  
256 | ## Inaccuracies
257 | 
258 | Please kindly let me know if you find anything inaccurate. 
259 | 
260 | Pull requests for https://github.com/Minyus/Python_Packages_for_Pipeline_Workflow/blob/master/README.md are welcome.
261 | 


--------------------------------------------------------------------------------
/ja/README.md:
--------------------------------------------------------------------------------
  1 | # Comparison of Python pipeline packages: Airflow, Luigi, Gokart, Metaflow, Kedro, PipelineX
  2 |
  3 | This article compares open-source Python packages for pipeline/workflow development: Airflow, Luigi, Gokart, Metaflow, Kedro, PipelineX.
  4 |
  5 | In this article, the terms "pipeline", "workflow", and "DAG" are used almost interchangeably.
  6 |
  7 | ## Summary
  8 |
  9 | - 👍: good
 10 | - 👍👍: better
 11 | 
 12 | | Package                                                                 | Airflow | Luigi&nbsp;&nbsp;&nbsp; | Gokart | Metaflow | Kedro&nbsp;&nbsp;&nbsp; | PipelineX       |
 13 | |-------------------------------------------------------------------------|---------|-------|--------|----------|-------|-----------------|
 14 | | Wrapped packages                                                        |         |       | Luigi  |          |       | Kedro, MLflow   |
 15 | | Ease and flexibility of defining DAGs                                   |         |       | 👍      | 👍        | 👍     | 👍👍             |
 16 | | Modularity of DAG definition                                            | 👍👍       |       |        |          | 👍👍     | 👍👍               |
 17 | | Unstructured data can be passed between tasks                           |         | 👍👍     | 👍👍      | 👍👍        | 👍👍     | 👍👍               |
 18 | | Built\-in existence checks for various data (files/databases)           |         | 👍👍   | 👍👍    |          | 👍👍   | 👍👍             |
 19 | | Built\-in read/write wrappers for various data (files/databases)        |         |       | 👍      |          | 👍👍   | 👍👍             |
 20 | | Modularity, reusability, and testability of data (file/database) operations |         |       | 👍      |          | 👍👍   | 👍👍             |
 21 | | Automatic resuming of pipelines by detecting intermediate data (files/databases) |         | 👍👍     | 👍👍      |          |       | 👍👍               |
 22 | | Forced rerun of tasks upon detecting parameter changes                  |         |       | 👍👍      |          |       |                 |
 23 | | Saving experiment parameters                                            |         |       | 👍👍      |          |       | 👍👍               |
 24 | | Parallel execution                                                      | 👍       | 👍     | 👍      | 👍        | 👍     | 👍               |
 25 | | Distributed parallel execution with Celery                              | 👍👍       |       |        |          |       |                 |
 26 | | Visualization of DAG                                                    | 👍👍       | 👍     | 👍      |          | 👍     | 👍               |
 27 | | Execution status monitoring in GUI                                      | 👍👍     | 👍     | 👍      |          |       |                 |
 28 | | Scheduling, triggering in GUI                                           | 👍       |       |        |          |       |                 |
 29 | | Notification to Slack                                                   | 👍       |       | 👍      |          |       |                 |
 30 | 
 31 | 
 32 | ## Airflow 
 33 | 
 34 | https://github.com/apache/airflow
 35 | 
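 36 | Released in 2015 by Airbnb.
 37 |
 38 | Airflow enables you to define your DAG in Python code (an independent Python module).
 39 | (Optionally, unofficial plugins such as [dag-factory](https://github.com/ajbosco/dag-factory) enable you to define DAGs in YAML.)
 40 |
 41 | ### Pros:
 42 |
 43 | - Provides a GUI with DAG visualization, execution progress monitoring, scheduling, and triggering features.
 44 | - Provides a distributed computing option (using Celery).
 45 | - DAG definition is modular and independent of processing functions.
 46 | - Workflows can be nested using `SubDagOperator`.
 47 | - Supports Slack notification.
 48 |
 49 | ### Cons:
 50 |
 51 | - Not designed to pass data between dependent tasks without going through a database.
 52 | There is no good way to pass unstructured data (e.g. images, videos, pickles) between dependent tasks in Airflow.
 53 | - You need to write separate code for file access (read/write).
 54 | - Does not support automatically resuming a pipeline from intermediate data files or databases.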
 55 | 
 56 | ## Luigi
 57 | 
 58 | https://github.com/spotify/luigi
 59 | 
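 60 | Released in 2012 by Spotify.
 61 |
 62 | Luigi enables you to define your pipeline in Python code as subclasses of `Task`, each implementing three methods (`requires`, `output`, `run`).
 63 |
 64 | ### Pros:
 65 |
 66 | - Supports automatically resuming a pipeline from intermediate data files in local or cloud storage (AWS, GCP, Azure) or databases,
 67 | as defined in the `Task.output` method using the `Target` class.
 68 | - You can write code so that any data can be passed between dependent tasks.
 69 | - Provides a GUI with DAG visualization and execution progress monitoring features.
 70 |
 71 |
 72 | ### Cons:
 73 |
 74 | - You need to write file/database access (read/write) code.
 75 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are tightly coupled and not modular.
 76 | You need to modify the task classes to reuse them in future projects.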
 77 | 
 78 | 
 79 | ## Gokart
 80 | 
 81 | https://github.com/m3dev/gokart
 82 | 
 83 | Released in Dec 2018 by M3.
 84 |
 85 | Gokart works on top of Luigi.
 86 |
 87 | ### Pros:
 88 |
 89 | In addition to Luigi's advantages:
 90 | - Can separate task processing from pipeline definition using `TaskInstanceParameter`, so you can easily reuse tasks in future projects.
 91 | - Provides built-in file access (read/write) wrappers as `FileProcessor` classes for pickle, npz, gz, txt, csv, tsv, json, and xml.
 92 | - Provides a mechanism to save parameters for each experiment. A viewer called [thunderbolt](https://github.com/m3dev/thunderbolt) is provided.
 93 | - Reruns tasks upon parameter change, based on a hash string unique to each parameter set that is included in each intermediate file name.
 94 | This feature is useful for experimenting with various parameter sets.
 95 | - Provides syntactic sugar for writing Luigi's `requires` method concisely using a class decorator.
 96 | - Supports Slack notification.
 97 |
 98 | ### Cons:
 99 |
100 | - Supported data file formats are limited.
101 | You need to write file/database access (read/write) code to use unsupported formats.
102 | 
103 | 
104 | ## Metaflow
105 | 
106 | https://github.com/Netflix/metaflow
107 | 
108 | Released in Dec 2019 by Netflix.
109 |
110 | Metaflow enables you to define your pipeline in Python code as a subclass of `FlowSpec` whose methods are decorated with `step`.
111 |
112 | ### Pros:
113 |
114 | - Integration with AWS services (especially AWS Batch).
115 |
116 | ### Cons:
117 |
118 | - You need to write file/database access (read/write) code.
119 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are tightly coupled and not modular.
120 | You need to modify the flow classes to reuse them in future projects.
121 | - Does not provide a GUI.
122 | - Limited support for GCP and Azure.
123 | - Does not support automatically resuming a pipeline from intermediate data files or databases.
124 | 
125 | 
126 | ## Kedro
127 | 
128 | https://github.com/quantumblacklabs/kedro
129 | 
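130 | Released in May 2019 by QuantumBlack, part of McKinsey & Company.
131 |
132 | Kedro enables you to define pipelines as a list of nodes, each created by the `node` function with three arguments
133 | (`func`: task processing function, `inputs`: input data name (list or dict if multiple), `outputs`: output data name (list or dict if multiple)),
134 | in Python code (an independent Python module).
135 |
136 | ### Pros:
137 |
138 | - Provides built-in file/database access (read/write) wrappers as `DataSet` classes for CSV, Pickle, YAML, JSON, Parquet, Excel, and text files
139 | in local or cloud storage (S3 in AWS, GCS in GCP), as well as SQL, Spark, etc.
140 | - Users can add support for data formats that are not supported out of the box.
141 | - Pipeline definition, task processing (Transform of ETL), and data access (Extract & Load of ETL) are independent and modular.
142 | You can easily reuse them in future projects.
143 | - Pipelines can be nested. (A pipeline can be used as part of another pipeline.)
144 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) provides DAG visualization.
145 |
146 |
147 | ### Cons:
148 | - Does not support automatically resuming a pipeline from intermediate data files or databases.
149 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) does not provide execution progress monitoring.
150 | - The `requirements.txt` includes packages that are not needed in most cases (e.g. pyarrow).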
151 | 
152 | 
153 | ## PipelineX
154 |
155 | https://github.com/Minyus/pipelinex
156 |
157 | Released in Nov 2019 by a Kedro user (me).
158 | PipelineX works on top of Kedro and MLflow.
159 | PipelineX enables you to define your pipeline in YAML (an independent YAML file).
160 |
161 | ### Pros:
162 |
163 | In addition to Kedro's advantages:
164 | - Supports automatically resuming a pipeline from intermediate data files or databases.
165 | - Optional syntactic sugar for Kedro Pipeline.
166 | (e.g. a Sequential API similar to PyTorch (`torch.nn.Sequential`) and Keras (`tf.keras.Sequential`))
167 | - Optional syntactic sugar for the Kedro `DataSet` catalog.
168 | (e.g. using the file name in the file path as the dataset instance name)
169 | - Backward-compatible with Kedro.
170 | - Integration with MLflow to save parameters, metrics, models, and other output files for each experiment.
171 | - Integration with common packages for data science: PyTorch, Ignite, pandas, OpenCV.
172 | - `DataSet` classes for image sets (folders containing images), useful for computer vision applications.
173 | - A leaner project template than the one provided by the original Kedro.
174 |
175 | ### Cons:
176 |
177 | - The GUI ([kedro-viz](https://github.com/quantumblacklabs/kedro-viz)) does not provide execution progress monitoring.
178 | - Kedro's `requirements.txt` includes packages that are not needed in most cases (e.g. pyarrow).
179 | - PipelineX is currently developed and maintained by an individual (me).
180 | 
181 | 
182 | 
183 | 
184 | ## Platform-specific options
185 | 
186 | ### Argo
187 | 
188 | https://github.com/argoproj/argo
189 | 
190 | Argo uses Kubernetes to run pipelines.
191 | 
192 | ### Kubeflow Pipelines
193 | 
194 | https://github.com/kubeflow/pipelines
195 | 
196 | Kubeflow Pipelines works on top of Argo.
197 | 
198 | ### Oozie
199 | 
200 | https://github.com/apache/oozie
201 | 
202 | Manages Hadoop jobs.
203 | 
204 | 
205 | ### Azkaban
206 | 
207 | https://github.com/azkaban/azkaban
208 | 
209 | Hadoopジョブを管理できます。
210 | 
211 | 
212 | ## References
213 | 
214 | Airflow
215 | - https://github.com/apache/airflow
216 | - https://airflow.apache.org/docs/stable/howto/initialize-database.html
217 | - https://medium.com/datareply/integrating-slack-alerts-in-airflow-c9dcd155105
218 | 
219 | Luigi
220 | - https://github.com/spotify/luigi
221 | - https://luigi.readthedocs.io/en/stable/api/luigi.contrib.html
222 | - https://www.m3tech.blog/entry/2018/11/12/110000
223 | 
224 | Gokart
225 | - https://github.com/m3dev/gokart
226 | - https://www.m3tech.blog/entry/2019/09/30/120229
227 | - https://qiita.com/Hase8388/items/8cf0e5c77f00b555748f
228 | 
229 | Metaflow
230 | - https://github.com/Netflix/metaflow
231 | - https://docs.metaflow.org/metaflow/basics
232 | - https://docs.metaflow.org/metaflow/scaling
233 | - https://medium.com/bigdatarepublic/a-review-of-netflixs-metaflow-65c6956e168d
234 | 
235 | Kedro
236 | - https://github.com/quantumblacklabs/kedro
237 | - https://kedro.readthedocs.io/en/latest/03_tutorial/04_create_pipelines.html
238 | - https://kedro.readthedocs.io/en/latest/kedro.io.html#data-sets
239 | - https://medium.com/mhiro2/building-pipeline-with-kedro-for-ml-competition-63e1db42d179
240 |     
241 | PipelineX
242 | - https://github.com/Minyus/pipelinex
243 |     
244 | Airflow vs Luigi
245 | - https://towardsdatascience.com/data-pipelines-luigi-airflow-everything-you-need-to-know-18dc741449b7
246 | - https://medium.com/better-programming/airbnbs-airflow-versus-spotify-s-luigi-bd4c7c2c0791
247 | - https://www.quora.com/Which-is-a-better-data-pipeline-scheduling-platform-Airflow-or-Luigi
248 |  
249 | ## Inaccuracies
250 |
251 | Please let me know if you find anything inaccurate.
252 |
253 | Pull requests for https://github.com/Minyus/Python_Packages_for_Pipeline_Workflow/blob/master/ja/README.md are welcome.
254 | 


--------------------------------------------------------------------------------