├── .gitignore
├── README.md
├── assets
│   ├── README.md
│   └── project_logo.png
├── inputs
│   └── README.md
├── models
│   ├── README.md
│   └── __pipeline__.ipynb
├── outputs
│   └── README.md
├── requirements.txt
├── setup.ipynb
├── tests
│   └── README.md
└── utils
    ├── README.md
    ├── demo.ipynb
    └── naas.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | # Jupyter Notebook
2 | .ipynb_checkpoints
3 |
4 | # Data
5 | *.csv
6 | *.xlsx
7 | *.json
8 | *.parquet
9 |
10 | # Models
11 | pipeline_executions
12 |
13 | # Outputs
14 | outputs
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ℹ️ This project is currently deprecated; check the [abi repository](https://github.com/jupyter-naas/abi) for the latest version of data product packaging.
2 | ------
3 |
4 |
5 | 
6 |
7 | # Naas Data Product Framework
8 |
9 | Naas is a low-code open source data platform that enables anyone working with data, including business analysts, scientists, and engineers, to easily create powerful data products combining automation, analytics, and AI from the comfort of their Jupyter notebooks. With its open source distribution model, Naas ensures visible source code and versioning, and allows you to create custom logic.
10 |
11 | The platform is structured around three low-code layers:
12 |
13 | - **Templates** enable users to create automated data jobs in minutes, and are the building blocks of data products.
14 | - **Drivers** act as connectors, allowing you to push and pull data across databases, APIs, machine learning algorithms, and more.
15 | - **Features** transform Jupyter notebooks into a production-ready environment, with capabilities such as scheduling, asset sharing, and notifications.
16 |
17 | You can try Naas for free using Naas Cloud, a stable environment that runs in your browser.
18 |
19 | ## **How Does It Work?**
20 |
21 | This repository is a boilerplate for anyone who wishes to develop a data product using Naas. It is structured as follows:
22 |
23 | - The **`/assets`** folder stores any PNG, JPG, GIF, CSV, diagrams, or slides related to the documentation of the product.
24 | - The **`/inputs`** folder stores the parameters and any other files needed (data, referential) to run the files in the **`/models`** folder.
25 | - The **`/models`** folder stores any files that transform inputs into outputs (notebook, Python, or SQL files).
26 | - The **`__pipeline__.ipynb`** file orchestrates the execution of the files in the **`/models`** folder as a pipeline.
27 |
28 | - The **`/outputs`** folder stores all the files that would be exposed outside of the Naas server.
29 | - The **`/tests`** folder stores all tests to be performed before production.
30 | - The **`/utils`** folder stores all common functions used across files.
31 | - The **`requirements.txt`** file lists all the packages and dependencies.
32 | - The **`setup.ipynb`** file runs the product on a Naas server.
33 |
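The `__pipeline__.ipynb` file chains steps with Python's `>>` operator (see the notebook later in this repository). As a rough, self-contained sketch of that chaining mechanism — using a toy `Step` class as a stand-in, not the real naas `NotebookStep` — the idea looks like this:

```python
# Minimal sketch of ">>"-style pipeline chaining. Step is a simplified
# stand-in for naas's NotebookStep (which executes a notebook); this toy
# version just records the order in which steps run.

class Step:
    def __init__(self, name):
        self.name = name
        self.next_step = None

    def __rshift__(self, other):
        # "a >> b" links b as the step that runs after a, and returns b
        # so chains like "a >> b >> c" read left to right.
        self.next_step = other
        return other

    def run(self, executed):
        executed.append(self.name)
        if self.next_step is not None:
            self.next_step.run(executed)


collection = Step("Collection")
cleaning = Step("Cleaning")
distribution = Step("Distribution")

collection >> cleaning >> distribution

order = []
collection.run(order)
print(order)  # ['Collection', 'Cleaning', 'Distribution']
```

The real naas pipeline also supports parallel branches (written as a list of steps, e.g. `[step1, step2]`), which this linear sketch omits for brevity.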
34 | ## What Are The Benefits?
35 |
36 | Some benefits of the Naas Data Product Framework are:
37 |
38 | - **Low-code approach**: The low-code nature of the Naas platform makes it easy for anyone, regardless of their technical background, to create powerful data products.
39 | - **Open source**: The open source distribution model of Naas ensures visible source code and versioning, and allows you to create custom logic.
40 | - **Jupyter integration**: Naas integrates seamlessly with Jupyter notebooks, allowing you to create data products from the comfort of your familiar environment.
41 | - **Versatility**: With its templates, drivers, and features, Naas is highly versatile and enables you to build almost anything.
42 | - **Cloud-based**: Naas Cloud, the stable environment provided by Naas, allows you to access the platform from anywhere with an internet connection.
43 |
44 | Overall, the Naas Data Product Framework is a powerful tool for anyone looking to create data products that combine automation, analytics, and AI.
45 |
46 | ## Why Is a Data Product Development Framework Like Naas Necessary?
47 |
48 | Just as web development frameworks like React.js help developers create web applications more efficiently by providing a set of standardized tools and components, data product development frameworks like Naas help data scientists and engineers create data products more efficiently by providing a set of standardized tools and components specifically designed for data processing, analytics, and AI.
49 |
50 | Some specific benefits of using a data product development framework like Naas include:
51 |
52 | - **Standardized structure**: A data product development framework provides a standardized structure for organizing and developing data products, which can make it easier to develop, maintain, and scale data products.
53 | - **Pre-built components**: A data product development framework includes a set of pre-built components, such as data connectors and data transformation tools, which can save time and effort compared to building these components from scratch.
54 | - **Integration with other tools**: A data product development framework typically integrates with other tools and technologies commonly used in the data world, such as Jupyter notebooks and machine learning libraries, which can make it easier to build and deploy data products.
55 | - **Collaboration and sharing**: A data product development framework can make it easier for multiple people to collaborate and share data products within an organization, as it provides a consistent framework for development and documentation.
56 |
57 | Overall, a data product development framework like Naas can provide a number of benefits to data scientists and engineers, including improved efficiency, integration with other tools, and the ability to collaborate and share data products within an organization.
58 |
59 | ## How Can Data Products and Associated Contracts Create More Trust with End Users?
60 |
61 | A data product framework can help with defining data contracts and creating trust with end users in several ways:
62 |
63 | - **Standardized structure**: A data product framework provides a standardized structure for organizing and developing data products, which can make it easier to define clear and consistent data contracts. For example, if a data product is built using a framework that specifies how input and output data should be structured and documented, it can be easier for end users to understand how the data product works and what they can expect from it.
64 | - **Transparency**: Many data product frameworks are open source, which means that the source code is visible and can be reviewed by anyone. This transparency can help build trust with end users, as they can see exactly how the data product works and how it processes their data.
65 | - **Auditability**: A data product framework can also provide tools and processes for auditing and reviewing data products, which can help ensure that they are reliable and accurate. This can be especially important for data products that are used in mission-critical applications, as end users need to be confident that the data products are reliable and trustworthy.
66 |
67 | Overall, a data product framework can help create trust with end users by providing a standardized and transparent structure for developing data products, and by providing tools and processes for auditing and reviewing the products to ensure their reliability.
68 |
69 | ## **About This Repository**
70 |
71 | This Data Product Framework repository is a boilerplate to create powerful Data Products in your company. To get started:
72 |
73 | 1. Create an organization on GitHub.
74 | 2. Use this template to kickstart your Data Product.
75 | 3. Start bringing value to your company.
76 |
77 | ## **Built With**
78 |
79 | - Jupyter Notebooks
80 | - Naas
81 |
82 | ## **Documentation**
83 |
84 | ### **Prerequisites**
85 |
86 | - Create an account on naas.ai
87 |
88 | ### **Installation**
89 |
90 | Follow the steps in the **`setup.ipynb`** notebook.
91 |
92 | ## **Roadmap**
93 |
94 | - V0: Simple boilerplate with the Naas pipeline feature
95 | - V1: Add the Naas space feature to create powerful dashboards
96 |
97 | ## **Support**
98 |
99 | If you have problems or questions, please open an issue and we will try to help you as soon as possible.
100 |
101 | ## **Contributing**
102 |
103 | Contributions are welcome. If you have a suggestion that would make this better, please fork the repository and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star.
104 |
105 | To contribute:
106 |
107 | 1. Create an account on naas.ai.
108 | 2. Clone the repository on your machine.
109 | 3. Create a feature branch.
110 | 4. Commit your changes.
111 | 5. Push to the branch.
112 | 6. Open a pull request.
113 |
114 |
115 | ## Product Owners
116 |
117 | * [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/) - florent@naas.ai
118 | * [Jeremy Ravenel](https://www.linkedin.com/in/ACoAAAJHE7sB5OxuKHuzguZ9L6lfDHqw--cdnJg/) - jeremy@naas.ai
119 | * [Maxime Jublou](https://www.linkedin.com/in/maximejublou/) - maxime@naas.ai
120 |
121 |
122 | ## Acknowledgments
123 |
124 | * [Awesome Notebooks](https://github.com/jupyter-naas/awesome-notebooks)
125 | * [Naas Drivers](https://github.com/jupyter-naas/drivers)
126 | * [Naas](https://github.com/jupyter-naas/naas)
127 | * [Naas Data Product](https://github.com/jupyter-naas/naas-data-product)
128 |
129 |
130 | ## Legal
131 |
132 | This project is licensed under AGPL-3.0
133 |
--------------------------------------------------------------------------------
/assets/README.md:
--------------------------------------------------------------------------------
1 | # Assets
2 |
3 | ## Description
4 | The /assets folder stores any PNG, JPG, GIF, CSV, diagrams, or slides related to the documentation of the product.
--------------------------------------------------------------------------------
/assets/project_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jupyter-naas/data-product-framework/bfd89550c2053b75501d9fdc11e04e0d5d7ac25a/assets/project_logo.png
--------------------------------------------------------------------------------
/inputs/README.md:
--------------------------------------------------------------------------------
1 | # Inputs
2 |
3 | ## Description
4 | The /inputs folder stores the parameters and any other files needed (data, referential) to run the files in the /models folder.
--------------------------------------------------------------------------------
/models/README.md:
--------------------------------------------------------------------------------
1 | # Models
2 |
3 | ## Overview
4 | The /models folder is designed to hold notebooks that can be used to quickly start working on your data product. To get started, you can navigate to the __templates__ folder and copy and paste any of the notebooks you want to use.
--------------------------------------------------------------------------------
/models/__pipeline__.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "deluxe-force",
6 | "metadata": {
7 | "papermill": {},
8 | "tags": []
9 | },
10 | "source": [
11 | "\n"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "id": "unlimited-internship",
17 | "metadata": {
18 | "papermill": {},
19 | "tags": []
20 | },
21 | "source": [
22 | "# Naas - Create Pipeline\n",
23 | "Template request | Bug report | Generate Data Product"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "id": "3097faf3-b2b9-41fd-8589-f7718b5f919a",
29 | "metadata": {
30 | "papermill": {},
31 | "tags": []
32 | },
33 | "source": [
34 | "**Tags:** #naas #pipeline #jupyter #notebook #dataanalysis #workflow #streamline"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "id": "opposite-guatemala",
40 | "metadata": {
41 | "papermill": {},
42 | "tags": []
43 | },
44 | "source": [
45 | "**Author:** [Maxime Jublou](https://www.linkedin.com/in/maximejublou)"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "id": "bf2e0f26-7fdf-4351-9209-8bb54c5ef7e9",
51 | "metadata": {
52 | "papermill": {},
53 | "tags": []
54 | },
55 | "source": [
56 | "**Description:** This notebook is a guide that teaches you how to create a notebook pipeline using naas."
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "id": "input_cell",
62 | "metadata": {
63 | "papermill": {},
64 | "tags": []
65 | },
66 | "source": [
67 | "## Input"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "id": "import_cell",
73 | "metadata": {
74 | "papermill": {},
75 | "tags": []
76 | },
77 | "source": [
78 | "### Import libraries"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": null,
84 | "id": "funny-neighbor",
85 | "metadata": {
86 | "papermill": {},
87 | "tags": []
88 | },
89 | "outputs": [],
90 | "source": [
91 | "from naas.pipeline.pipeline import (\n",
92 | " Pipeline,\n",
93 | " DummyStep,\n",
94 | " DummyErrorStep,\n",
95 | " NotebookStep,\n",
96 | " End,\n",
97 | " ParallelStep,\n",
98 | ")"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "id": "0b579d95-8da2-46aa-b8dd-843ae0d46964",
104 | "metadata": {},
105 | "source": [
106 | "### Setup variables\n",
107 | "- `pipeline_outputs_path`: Path of your pipeline executions. When the pipeline is run, a \"pipeline_executions\" folder will be created in your file system. Inside this folder, you will be able to access each pipeline execution. If you use NotebookStep, executed notebooks will be stored in this folder. This allows you to easily review and analyze the results of the pipeline, and to troubleshoot any issues that may have occurred."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "id": "f67191c1-aaa8-4ad6-8793-eb563f72822c",
114 | "metadata": {},
115 | "outputs": [],
116 | "source": [
117 | "pipeline_outputs_path = \"../outputs/pipeline_executions\""
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "id": "model_cell",
123 | "metadata": {
124 | "papermill": {},
125 | "tags": []
126 | },
127 | "source": [
128 | "## Model"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "id": "e15d093c-0189-42c3-959f-64eed5530fd4",
134 | "metadata": {
135 | "papermill": {},
136 | "tags": []
137 | },
138 | "source": [
139 | "### Setup NotebookStep\n",
140 | "For demonstration purposes, this notebook uses `DummyStep` to illustrate the pipeline's functionality.\n",
141 | "\n",
142 | "To create a pipeline, you should use `NotebookStep()`, which takes three parameters:\n",
143 | "- the name (string) of the step\n",
144 | "- the notebook path (string) for execution\n",
145 | "- the parameters (dictionary) that are injected through papermill after the cell tagged \"parameters\" (or into the first cell if none is tagged)\n",
146 | "\n",
147 | "`NotebookStep(\"My Notebook\", \"my_notebook.ipynb\")`"
148 | ]
149 | },
150 | {
151 | "cell_type": "code",
152 | "execution_count": null,
153 | "id": "fa346868-d375-4547-b11b-79311e3f1fc3",
154 | "metadata": {
155 | "papermill": {},
156 | "tags": []
157 | },
158 | "outputs": [],
159 | "source": [
160 | "collection = DummyStep(\n",
161 | " \"Collection\"\n",
162 | ") # In this step, data can be collected from various sources such as databases, APIs, or file systems.\n",
163 | "cleaning = DummyStep(\n",
164 | " \"Cleaning\"\n",
165 | ") # Once the data is collected, it is often necessary to clean and preprocess it to remove any irrelevant or duplicate information. This step may involve tasks such as removing null values, correcting data formats, and standardizing column names.\n",
166 | "transformation1 = DummyStep(\n",
167 | " \"Transformation 1\"\n",
168 | ") # In this step, the data is transformed into the desired format, such as a flat file or a specific data model. This may involve tasks such as aggregating data, joining multiple tables, or calculating new fields.\n",
169 | "transformation2 = DummyStep(\n",
170 | " \"Transformation 2\"\n",
171 | ") # In this step, the data is transformed into the desired format, such as a flat file or a specific data model. This may involve tasks such as aggregating data, joining multiple tables, or calculating new fields.\n",
172 | "distribution = DummyStep(\n",
173 | " \"Distribution\"\n",
174 | ") # In this step, the data is loaded into its final destination, such as a data warehouse, a data lake, or a specific application."
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "id": "output_cell",
180 | "metadata": {
181 | "papermill": {},
182 | "tags": []
183 | },
184 | "source": [
185 | "## Output"
186 | ]
187 | },
188 | {
189 | "cell_type": "markdown",
190 | "id": "display_cell",
191 | "metadata": {
192 | "papermill": {},
193 | "tags": []
194 | },
195 | "source": [
196 | "### Create Basic Pipeline\n",
197 | "- Link your steps using this syntax: `>>`\n",
198 | "- Create a ParallelStep using this syntax: `[step1, step2]`"
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": null,
204 | "id": "c14e4a38-75b1-475b-bc28-45db7effe45f",
205 | "metadata": {
206 | "papermill": {},
207 | "tags": []
208 | },
209 | "outputs": [],
210 | "source": [
211 | "pipeline = Pipeline()\n",
212 | "\n",
213 | "(\n",
214 | " pipeline\n",
215 | " >> collection\n",
216 | " >> cleaning\n",
217 | " >> [transformation1, transformation2]\n",
218 | " >> distribution\n",
219 | " >> End()\n",
220 | ")\n",
221 | "\n",
222 | "pipeline.run(outputs_path=pipeline_outputs_path)"
223 | ]
224 | }
225 | ],
226 | "metadata": {
227 | "kernelspec": {
228 | "display_name": "Python 3",
229 | "language": "python",
230 | "name": "python3"
231 | },
232 | "language_info": {
233 | "codemirror_mode": {
234 | "name": "ipython",
235 | "version": 3
236 | },
237 | "file_extension": ".py",
238 | "mimetype": "text/x-python",
239 | "name": "python",
240 | "nbconvert_exporter": "python",
241 | "pygments_lexer": "ipython3",
242 | "version": "3.9.6"
243 | },
244 | "naas": {
245 | "notebook_id": "92ddbcf7c74813cc4c906ca6b7d04cc2590230b5fb16082b396de5b9872be0cf",
246 | "notebook_path": "Naas/Naas_Create_Pipeline.ipynb"
247 | },
248 | "papermill": {
249 | "default_parameters": {},
250 | "environment_variables": {},
251 | "parameters": {},
252 | "version": "2.3.3"
253 | }
254 | },
255 | "nbformat": 4,
256 | "nbformat_minor": 5
257 | }
258 |
--------------------------------------------------------------------------------
/outputs/README.md:
--------------------------------------------------------------------------------
1 | # Outputs
2 |
3 | ## Description
4 | The /outputs folder stores all the files that would be exposed outside of the Naas server.
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | naas_data_product
2 | pyvis==0.2.1
--------------------------------------------------------------------------------
/setup.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "injured-evolution",
6 | "metadata": {
7 | "papermill": {},
8 | "tags": []
9 | },
10 | "source": [
11 | ""
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "id": "substantial-decline",
17 | "metadata": {
18 | "papermill": {},
19 | "tags": []
20 | },
21 | "source": [
22 | "# Settings"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "id": "d288008a-adf4-47e8-acf5-55a0d2f9ae4e",
28 | "metadata": {},
29 | "source": [
30 | "Note: this data product framework is developed by the Naas open source community. You can sponsor us if you find it useful.\n",