├── logs ├── .gitkeep └── journals │ └── .gitkeep ├── notebooks ├── .gitkeep └── Data GE Ejemplos.ipynb ├── conf ├── local │ └── .gitkeep ├── base │ ├── parameters.yml │ ├── catalog.yml │ └── logging.yml └── README.md ├── data ├── 06_models │ └── .gitkeep ├── 01_raw │ ├── .gitkeep │ └── iris.csv ├── 03_primary │ └── .gitkeep ├── 04_feature │ └── .gitkeep ├── 05_model_input │ └── .gitkeep ├── 08_reporting │ └── .gitkeep ├── 02_intermediate │ └── .gitkeep └── 07_model_output │ └── .gitkeep ├── src ├── tests │ ├── __init__.py │ ├── pipelines │ │ └── __init__.py │ └── test_run.py ├── minco │ ├── pipelines │ │ ├── __init__.py │ │ ├── data_engineering │ │ │ ├── README.md │ │ │ ├── __init__.py │ │ │ ├── pipeline.py │ │ │ └── nodes.py │ │ └── data_science │ │ │ ├── __init__.py │ │ │ ├── README.md │ │ │ ├── pipeline.py │ │ │ └── nodes.py │ ├── __init__.py │ ├── run.py │ └── hooks.py ├── requirements.txt └── setup.py ├── .gitpod.yml ├── pngs ├── ge1.png └── ge2.png ├── requirements.txt ├── .coveragerc ├── .isort.cfg ├── .kedro.yml ├── shiftleft.yml ├── setup.cfg ├── Dockerfile ├── docs └── source │ ├── index.rst │ └── conf.py ├── .circleci └── config.yml ├── .github └── workflows │ └── shiftleft.yml ├── .gitignore ├── .ipython └── profile_default │ └── startup │ └── 00-kedro-init.py ├── README.md └── kedro_cli.py /logs/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /notebooks/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /conf/local/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/06_models/.gitkeep: -------------------------------------------------------------------------------- 1 | 
-------------------------------------------------------------------------------- /logs/journals/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/tests/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/01_raw/.gitkeep: -------------------------------------------------------------------------------- 1 | *.csv -------------------------------------------------------------------------------- /data/03_primary/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/04_feature/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/05_model_input/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/08_reporting/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/02_intermediate/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /data/07_model_output/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/minco/pipelines/__init__.py: -------------------------------------------------------------------------------- 1 | 
-------------------------------------------------------------------------------- /src/tests/pipelines/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitpod.yml: -------------------------------------------------------------------------------- 1 | tasks: 2 | - init: pip install -r ./requirements.txt 3 | -------------------------------------------------------------------------------- /pngs/ge1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sdk/kedro_ge/main/pngs/ge1.png -------------------------------------------------------------------------------- /pngs/ge2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sdk/kedro_ge/main/pngs/ge2.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | kedro==0.16.6 2 | kedro[parquet]==0.16.6 3 | great_expectations 4 | black 5 | -------------------------------------------------------------------------------- /.coveragerc: -------------------------------------------------------------------------------- 1 | [report] 2 | fail_under=0 3 | show_missing=True 4 | exclude_lines = 5 | pragma: no cover 6 | raise NotImplementedError 7 | -------------------------------------------------------------------------------- /.isort.cfg: -------------------------------------------------------------------------------- 1 | [settings] 2 | multi_line_output=3 3 | include_trailing_comma=True 4 | force_grid_wrap=0 5 | use_parentheses=True 6 | line_length=88 7 | known_third_party=kedro 8 | -------------------------------------------------------------------------------- /.kedro.yml: 
-------------------------------------------------------------------------------- 1 | context_path: minco.run.ProjectContext 2 | project_name: "minco" 3 | project_version: "0.16.6" 4 | package_name: "minco" 5 | hooks: 6 | - minco.hooks.project_hooks 7 | -------------------------------------------------------------------------------- /conf/base/parameters.yml: -------------------------------------------------------------------------------- 1 | # Parameters for the example pipeline. Feel free to delete these once you 2 | # remove the example pipeline from hooks.py and the example nodes in 3 | # `src/pipelines/` 4 | example_test_data_ratio: 0.2 5 | example_num_train_iter: 10000 6 | example_learning_rate: 0.01 7 | -------------------------------------------------------------------------------- /shiftleft.yml: -------------------------------------------------------------------------------- 1 | build_rules: 2 | - id: allow-zero-findings 3 | finding_types: 4 | - vuln 5 | - secret 6 | - insight 7 | - "*" 8 | severity: 9 | - SEVERITY_MEDIUM_IMPACT 10 | - SEVERITY_HIGH_IMPACT 11 | - SEVERITY_LOW_IMPACT 12 | threshold: 0 -------------------------------------------------------------------------------- /src/requirements.txt: -------------------------------------------------------------------------------- 1 | black==v19.10b0 2 | flake8>=3.7.9, <4.0 3 | ipython~=7.0 4 | isort>=4.3.21, <5.0 5 | jupyter~=1.0 6 | jupyter_client>=5.1, <7.0 7 | jupyterlab==0.31.1 8 | kedro[pandas.CSVDataSet]==0.16.6 9 | nbstripout==0.3.3 10 | pytest-cov~=2.5 11 | pytest-mock>=1.7.1, <2.0 12 | pytest~=5.0 13 | wheel==0.32.2 14 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [flake8] 2 | max-line-length=88 3 | extend-ignore=E203 4 | 5 | [isort] 6 | multi_line_output=3 7 | include_trailing_comma=True 8 | force_grid_wrap=0 9 | use_parentheses=True 10 | line_length=88 11 | 
known_third_party=kedro 12 | 13 | [tool:pytest] 14 | addopts=--cov-report term-missing 15 | --cov src/minco -ra 16 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3.8.8-slim-buster 2 | USER root 3 | RUN \ 4 | apt-get update &&\ 5 | apt-get install -y git 6 | COPY ./requirements.txt /app/requirements.txt 7 | RUN pip3 install --use-feature=2020-resolver -r /app/requirements.txt --no-deps 8 | RUN pip3 install --use-feature=2020-resolver -r /app/requirements.txt 9 | WORKDIR /app/ 10 | COPY . . 11 | CMD ["kedro","run"] 12 | 13 | -------------------------------------------------------------------------------- /docs/source/index.rst: -------------------------------------------------------------------------------- 1 | .. minco documentation master file, created by sphinx-quickstart. 2 | You can adapt this file completely to your liking, but it should at least 3 | contain the root `toctree` directive. 4 | 5 | Welcome to project's minco API docs! 6 | ============================================= 7 | 8 | .. toctree:: 9 | :maxdepth: 4 10 | 11 | modules 12 | 13 | 14 | Indices and tables 15 | ================== 16 | 17 | * :ref:`genindex` 18 | * :ref:`modindex` 19 | * :ref:`search` 20 | -------------------------------------------------------------------------------- /conf/README.md: -------------------------------------------------------------------------------- 1 | # What is this for? 2 | 3 | This folder should be used to store configuration files used by Kedro or by separate tools. 4 | 5 | This file can be used to provide users with instructions for how to reproduce local configuration with their own credentials. You can edit the file however you like, but you may wish to retain the information below and add your own section in the [Instructions](#Instructions) section. 
6 | 7 | ## Local configuration 8 | 9 | The `local` folder should be used for configuration that is either user-specific (e.g. IDE configuration) or protected (e.g. security keys). 10 | 11 | > *Note:* Please do not check in any local configuration to version control. 12 | 13 | ## Base configuration 14 | 15 | The `base` folder is for shared configuration, such as non-sensitive and project-related configuration that may be shared across team members. 16 | 17 | WARNING: Please do not put access credentials in the base configuration folder. 18 | 19 | ## Instructions 20 | 21 | 22 | 23 | 24 | 25 | ## Find out more 26 | You can find out more about configuration from the [user guide documentation](https://kedro.readthedocs.io/en/stable/04_user_guide/03_configuration.html). 27 | -------------------------------------------------------------------------------- /.circleci/config.yml: -------------------------------------------------------------------------------- 1 | version: 2.1 2 | 3 | orbs: 4 | python: circleci/python@0.2.1 5 | shiftleft: shiftleft/shiftleft@1.0 6 | 7 | jobs: 8 | build-and-test: 9 | docker: 10 | - image: cimg/python:3.8.9 11 | steps: 12 | - checkout 13 | - python/load-cache 14 | - run: 15 | name: Install Python deps in a venv 16 | command: | 17 | python3 -m venv venv 18 | . venv/bin/activate 19 | pip install -r requirements.txt 20 | kedro install 21 | - python/save-cache 22 | - run: 23 | command: | 24 | . venv/bin/activate 25 | kedro test \ 26 | --junitxml="docs/coverage/test.xml" \ 27 | --cov-report=html:"docs/html/" 28 | name: Test 29 | - run: 30 | command: | 31 | . 
venv/bin/activate 32 | kedro package 33 | name: Build egg & wheel 34 | - store_artifacts: 35 | path: src/dist/ 36 | destination: tr1 37 | - store_artifacts: 38 | path: docs/html/ 39 | destination: coverage 40 | - store_test_results: 41 | path: docs/coverage/ 42 | 43 | workflows: 44 | main: 45 | jobs: 46 | - build-and-test 47 | workflow: 48 | jobs: 49 | - shiftleft/analyze: 50 | target: src 51 | app: kedro_ge 52 | language: python 53 | access-token: SHIFTLEFT_ACCESS_TOKEN 54 | org-id: SHIFTLEFT_ORG_ID 55 | pre-analyze: 56 | - steps: 57 | 58 | -------------------------------------------------------------------------------- /src/minco/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. 
You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | """minco 29 | """ 30 | 31 | __version__ = "0.1" 32 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_engineering/README.md: -------------------------------------------------------------------------------- 1 | # Data Engineering pipeline 2 | 3 | > *Note:* This `README.md` was generated using `Kedro 0.16.6` for illustration purposes. Please modify it according to your pipeline structure and contents. 4 | 5 | ## Overview 6 | 7 | This modular pipeline splits the incoming data into the train and test subsets (`split_data` node). 8 | 9 | ## Pipeline inputs 10 | 11 | ### `example_iris_data` 12 | 13 | | | | 14 | | ---- | ------------------ | 15 | | Type | `pandas.DataFrame` | 16 | | Description | Input data to split into train and test sets | 17 | 18 | ### `params:example_test_data_ratio` 19 | 20 | | | | 21 | | ---- | ------------------ | 22 | | Type | `float` | 23 | | Description | The split ratio parameter that identifies what percentage of rows goes to the test set | 24 | 25 | ## Pipeline outputs 26 | 27 | ### `example_train_x` 28 | 29 | | | | 30 | | ---- | ------------------ | 31 | | Type | `pandas.DataFrame` | 32 | | Description | DataFrame containing train set features | 33 | 34 | ### `example_train_y` 35 | 36 | | | | 37 | | ---- | ------------------ | 38 | | Type | `pandas.DataFrame` | 39 | | Description | DataFrame containing train set one-hot encoded target variable | 40 | 41 | ### `example_test_x` 42 | 43 | | | | 44 | | ---- | ------------------ | 45 | | Type | 
`pandas.DataFrame` | 46 | | Description | DataFrame containing test set features | 47 | 48 | ### `example_test_y` 49 | 50 | | | | 51 | | ---- | ------------------ | 52 | | Type | `pandas.DataFrame` | 53 | | Description | DataFrame containing test set one-hot encoded target variable | 54 | -------------------------------------------------------------------------------- /conf/base/catalog.yml: -------------------------------------------------------------------------------- 1 | # Here you can define all your data sets by using simple YAML syntax. 2 | # 3 | # Documentation for this file format can be found in "The Data Catalog" 4 | # Link: https://kedro.readthedocs.io/en/stable/05_data/01_data_catalog.html 5 | # 6 | # We support interacting with a variety of data stores including local file systems, cloud, network and HDFS 7 | # 8 | # An example data set definition can look as follows: 9 | # 10 | #bikes: 11 | # type: pandas.CSVDataSet 12 | # filepath: "data/01_raw/bikes.csv" 13 | # 14 | #weather: 15 | # type: spark.SparkDataSet 16 | # filepath: s3a://your_bucket/data/01_raw/weather* 17 | # file_format: csv 18 | # credentials: dev_s3 19 | # load_args: 20 | # header: True 21 | # inferSchema: True 22 | # save_args: 23 | # sep: '|' 24 | # header: True 25 | # 26 | #scooters: 27 | # type: pandas.SQLTableDataSet 28 | # credentials: scooters_credentials 29 | # table_name: scooters 30 | # load_args: 31 | # index_col: ['name'] 32 | # columns: ['name', 'gear'] 33 | # save_args: 34 | # if_exists: 'replace' 35 | # # if_exists: 'fail' 36 | # # if_exists: 'append' 37 | # 38 | # The Data Catalog supports being able to reference the same file using two different DataSet implementations 39 | # (transcoding), templating and a way to reuse arguments that are frequently repeated. See more here: 40 | # https://kedro.readthedocs.io/en/stable/05_data/01_data_catalog.html 41 | # 42 | # This is a data set used by the "Hello World" example pipeline provided with the project 43 | # template. 
Please feel free to remove it once you remove the example pipeline. 44 | 45 | example_iris_data: 46 | type: pandas.CSVDataSet 47 | filepath: data/01_raw/iris.csv 48 | 49 | netflix_titles: 50 | type: pandas.CSVDataSet 51 | filepath: data/01_raw/netflix_titles.csv 52 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_engineering/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 
25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | """Example code for the nodes in the example pipeline. This code is meant 29 | just for illustrating basic Kedro features. 30 | 31 | PLEASE DELETE THIS FILE ONCE YOU START WORKING ON YOUR OWN PROJECT! 32 | """ 33 | 34 | from .pipeline import create_pipeline # NOQA 35 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_science/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. 
You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | """Example code for the nodes in the example pipeline. This code is meant 29 | just for illustrating basic Kedro features. 30 | 31 | PLEASE DELETE THIS FILE ONCE YOU START WORKING ON YOUR OWN PROJECT! 32 | """ 33 | 34 | from .pipeline import create_pipeline # NOQA 35 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_science/README.md: -------------------------------------------------------------------------------- 1 | # Data Science pipeline 2 | 3 | > *Note:* This `README.md` was generated using `Kedro 0.16.6` for illustration purposes. Please modify it according to your pipeline structure and contents. 4 | 5 | ## Overview 6 | 7 | This modular pipeline: 8 | 1. trains a simple multi-class logistic regression model (`train_model` node) 9 | 2. makes predictions given a trained model from (1) and a test set (`predict` node) 10 | 3. 
reports the model accuracy on a test set (`report_accuracy` node) 11 | 12 | 13 | ## Pipeline inputs 14 | 15 | ### `example_train_x` 16 | 17 | | | | 18 | | ---- | ------------------ | 19 | | Type | `pandas.DataFrame` | 20 | | Description | DataFrame containing train set features | 21 | 22 | ### `example_train_y` 23 | 24 | | | | 25 | | ---- | ------------------ | 26 | | Type | `pandas.DataFrame` | 27 | | Description | DataFrame containing train set one-hot encoded target variable | 28 | 29 | ### `example_test_x` 30 | 31 | | | | 32 | | ---- | ------------------ | 33 | | Type | `pandas.DataFrame` | 34 | | Description | DataFrame containing test set features | 35 | 36 | ### `example_test_y` 37 | 38 | | | | 39 | | ---- | ------------------ | 40 | | Type | `pandas.DataFrame` | 41 | | Description | DataFrame containing test set one-hot encoded target variable | 42 | 43 | ### `parameters` 44 | 45 | | | | 46 | | ---- | ------------------ | 47 | | Type | `dict` | 48 | | Description | Project parameter dictionary that must contain the following keys: `example_num_train_iter` (number of model training iterations), `example_learning_rate` (learning rate for gradient descent) | 49 | 50 | 51 | ## Pipeline outputs 52 | 53 | ### `example_model` 54 | 55 | | | | 56 | | ---- | ------------------ | 57 | | Type | `numpy.ndarray` | 58 | | Description | Example logistic regression model | 59 | -------------------------------------------------------------------------------- /conf/base/logging.yml: -------------------------------------------------------------------------------- 1 | version: 1 2 | disable_existing_loggers: False 3 | formatters: 4 | simple: 5 | format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 6 | json_formatter: 7 | format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 8 | class: pythonjsonlogger.jsonlogger.JsonFormatter 9 | 10 | handlers: 11 | console: 12 | class: logging.StreamHandler 13 | level: INFO 14 | formatter: simple 15 | stream: 
ext://sys.stdout 16 | 17 | info_file_handler: 18 | class: logging.handlers.RotatingFileHandler 19 | level: INFO 20 | formatter: simple 21 | filename: logs/info.log 22 | maxBytes: 10485760 # 10MB 23 | backupCount: 20 24 | encoding: utf8 25 | delay: True 26 | 27 | error_file_handler: 28 | class: logging.handlers.RotatingFileHandler 29 | level: ERROR 30 | formatter: simple 31 | filename: logs/errors.log 32 | maxBytes: 10485760 # 10MB 33 | backupCount: 20 34 | encoding: utf8 35 | delay: True 36 | 37 | journal_file_handler: 38 | class: kedro.versioning.journal.JournalFileHandler 39 | level: INFO 40 | base_dir: logs/journals 41 | formatter: json_formatter 42 | 43 | loggers: 44 | anyconfig: 45 | level: WARNING 46 | handlers: [console, info_file_handler, error_file_handler] 47 | propagate: no 48 | 49 | kedro.io: 50 | level: INFO 51 | handlers: [console, info_file_handler, error_file_handler] 52 | propagate: no 53 | 54 | kedro.pipeline: 55 | level: INFO 56 | handlers: [console, info_file_handler, error_file_handler] 57 | propagate: no 58 | 59 | kedro.journal: 60 | level: INFO 61 | handlers: [journal_file_handler] 62 | propagate: no 63 | 64 | root: 65 | level: INFO 66 | handlers: [console, info_file_handler, error_file_handler] 67 | -------------------------------------------------------------------------------- /src/minco/run.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. 
IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | 29 | """Application entry point.""" 30 | from pathlib import Path 31 | 32 | from kedro.framework.context import KedroContext, load_package_context 33 | 34 | 35 | class ProjectContext(KedroContext): 36 | """Users can override the remaining methods from the parent class here, 37 | or create new ones (e.g. as required by plugins) 38 | """ 39 | 40 | 41 | def run_package(): 42 | # Entry point for running a Kedro project packaged with `kedro package` 43 | # using `python -m minco.run` command. 
44 | project_context = load_package_context( 45 | project_path=Path.cwd(), package_name=Path(__file__).resolve().parent.name 46 | ) 47 | project_context.run() 48 | 49 | 50 | if __name__ == "__main__": 51 | run_package() 52 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_engineering/pipeline.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 
28 | 29 | """Example code for the nodes in the example pipeline. This code is meant 30 | just for illustrating basic Kedro features. 31 | 32 | Delete this when you start working on your own Kedro project. 33 | """ 34 | 35 | from kedro.pipeline import Pipeline, node 36 | 37 | from .nodes import split_data 38 | 39 | 40 | def create_pipeline(**kwargs): 41 | return Pipeline( 42 | [ 43 | node( 44 | split_data, 45 | ["example_iris_data", "params:example_test_data_ratio"], 46 | dict( 47 | train_x="example_train_x", 48 | train_y="example_train_y", 49 | test_x="example_test_x", 50 | test_y="example_test_y", 51 | ), 52 | ) 53 | ] 54 | ) 55 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_science/pipeline.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. 
You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | 29 | """Example code for the nodes in the example pipeline. This code is meant 30 | just for illustrating basic Kedro features. 31 | 32 | Delete this when you start working on your own Kedro project. 33 | """ 34 | 35 | from kedro.pipeline import Pipeline, node 36 | 37 | from .nodes import predict, report_accuracy, train_model 38 | 39 | 40 | def create_pipeline(**kwargs): 41 | return Pipeline( 42 | [ 43 | node( 44 | train_model, 45 | ["example_train_x", "example_train_y", "parameters"], 46 | "example_model", 47 | ), 48 | node( 49 | predict, 50 | dict(model="example_model", test_x="example_test_x"), 51 | "example_predictions", 52 | ), 53 | node(report_accuracy, ["example_predictions", "example_test_y"], None), 54 | ] 55 | ) 56 | -------------------------------------------------------------------------------- /src/tests/test_run.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. 
IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | 29 | """ 30 | This module contains an example test. 31 | 32 | Tests should be placed in ``src/tests``, in modules that mirror your 33 | project's structure, and in files named test_*.py. They are simply functions 34 | named ``test_*`` which test a unit of logic. 35 | 36 | To run the tests, run ``kedro test``. 37 | """ 38 | from pathlib import Path 39 | 40 | import pytest 41 | 42 | from minco.run import ProjectContext 43 | 44 | 45 | @pytest.fixture 46 | def project_context(mocker): 47 | # Don't configure the logging module. If it's configured, tests that 48 | # check logs using the ``caplog`` fixture depend on execution order. 
49 | mocker.patch.object(ProjectContext, "_setup_logging") 50 | 51 | return ProjectContext(str(Path.cwd())) 52 | 53 | 54 | class TestProjectContext: 55 | def test_project_name(self, project_context): 56 | assert project_context.project_name == "minco" 57 | 58 | def test_project_version(self, project_context): 59 | assert project_context.project_version == "0.16.6" 60 | -------------------------------------------------------------------------------- /.github/workflows/shiftleft.yml: -------------------------------------------------------------------------------- 1 | --- 2 | # This workflow integrates ShiftLeft NG SAST with GitHub 3 | # Visit https://docs.shiftleft.io for help 4 | name: ShiftLeft 5 | 6 | on: 7 | pull_request: 8 | workflow_dispatch: 9 | 10 | jobs: 11 | NextGen-Static-Analysis: 12 | runs-on: ubuntu-20.04 13 | steps: 14 | - uses: actions/checkout@v2 15 | - name: Download ShiftLeft CLI 16 | run: | 17 | curl https://cdn.shiftleft.io/download/sl > ${GITHUB_WORKSPACE}/sl && chmod a+rx ${GITHUB_WORKSPACE}/sl 18 | - name: Extract branch name 19 | shell: bash 20 | run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})" 21 | id: extract_branch 22 | - name: NextGen Static Analysis 23 | run: | 24 | pip install --upgrade setuptools wheel 25 | pip install -r requirements.txt 26 | ${GITHUB_WORKSPACE}/sl analyze --wait --app kedro_ge --tag branch=${{ github.head_ref || steps.extract_branch.outputs.branch }} --python $(pwd) 27 | env: 28 | SHIFTLEFT_ACCESS_TOKEN: ${{ secrets.SHIFTLEFT_ACCESS_TOKEN }} 29 | 30 | if: 31 | ${{ hashFiles('requirements.txt') != '' }} 32 | - name: Legacy Static Analysis 33 | run: | 34 | echo 'Please update your shiftleft-python-demo fork!'
35 | ${GITHUB_WORKSPACE}/sl analyze --wait --no-cpg --app kedro_ge --tag branch=${{ github.head_ref || steps.extract_branch.outputs.branch }} --python $(pwd) 36 | env: 37 | SHIFTLEFT_ACCESS_TOKEN: ${{ secrets.SHIFTLEFT_ACCESS_TOKEN }} 38 | 39 | if: 40 | ${{ hashFiles('requirements.txt') == '' }} 41 | 42 | ## Uncomment the following section to enable build rule checking and enforcing. 43 | #Build-Rules: 44 | #runs-on: ubuntu-latest 45 | #needs: NextGen-Static-Analysis 46 | #steps: 47 | #- uses: actions/checkout@v2 48 | #- name: Download ShiftLeft CLI 49 | # run: | 50 | # curl https://cdn.shiftleft.io/download/sl > ${GITHUB_WORKSPACE}/sl && chmod a+rx ${GITHUB_WORKSPACE}/sl 51 | #- name: Validate Build Rules 52 | # run: | 53 | # ${GITHUB_WORKSPACE}/sl check-analysis --app kedro_ge \ 54 | # --source 'tag.branch=${{ github.event.pull_request.base.ref }}' \ 55 | # --target "tag.branch=${{ github.head_ref || steps.extract_branch.outputs.branch }}" \ 56 | # --report \ 57 | # --github-pr-number=${{github.event.number}} \ 58 | # --github-pr-user=${{ github.repository_owner }} \ 59 | # --github-pr-repo=${{ github.event.repository.name }} \ 60 | # --github-token=${{ secrets.GITHUB_TOKEN }} 61 | # env: 62 | #SHIFTLEFT_ACCESS_TOKEN: ${{ secrets.SHIFTLEFT_ACCESS_TOKEN }} 63 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ########################## 2 | # KEDRO PROJECT 3 | 4 | # ignore all local configuration 5 | conf/local/** 6 | !conf/local/.gitkeep 7 | 8 | # ignore potentially sensitive credentials files 9 | conf/**/*credentials* 10 | 11 | # ignore everything in the following folders 12 | data/** 13 | logs/** 14 | 15 | # except their sub-folders 16 | !data/**/ 17 | !logs/**/ 18 | 19 | # also keep all .gitkeep files 20 | !.gitkeep 21 | 22 | # also keep the example dataset 23 | !data/01_raw/iris.csv 24 | !data/01_raw/netflix_titles.csv 25 | 26 | 27 
| ########################## 28 | # Common files 29 | 30 | # IntelliJ 31 | .idea/ 32 | *.iml 33 | out/ 34 | .idea_modules/ 35 | 36 | ### macOS 37 | *.DS_Store 38 | .AppleDouble 39 | .LSOverride 40 | .Trashes 41 | 42 | # Vim 43 | *~ 44 | .*.swo 45 | .*.swp 46 | 47 | # emacs 48 | *~ 49 | \#*\# 50 | /.emacs.desktop 51 | /.emacs.desktop.lock 52 | *.elc 53 | 54 | # JIRA plugin 55 | atlassian-ide-plugin.xml 56 | 57 | # C extensions 58 | *.so 59 | 60 | ### Python template 61 | # Byte-compiled / optimized / DLL files 62 | __pycache__/ 63 | *.py[cod] 64 | *$py.class 65 | 66 | # Distribution / packaging 67 | .Python 68 | build/ 69 | develop-eggs/ 70 | dist/ 71 | downloads/ 72 | eggs/ 73 | .eggs/ 74 | lib/ 75 | lib64/ 76 | parts/ 77 | sdist/ 78 | var/ 79 | wheels/ 80 | *.egg-info/ 81 | .installed.cfg 82 | *.egg 83 | MANIFEST 84 | 85 | # PyInstaller 86 | # Usually these files are written by a python script from a template 87 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
88 | *.manifest 89 | *.spec 90 | 91 | # Installer logs 92 | pip-log.txt 93 | pip-delete-this-directory.txt 94 | 95 | # Unit test / coverage reports 96 | htmlcov/ 97 | .tox/ 98 | .coverage 99 | .coverage.* 100 | .cache 101 | nosetests.xml 102 | coverage.xml 103 | *.cover 104 | .hypothesis/ 105 | 106 | # Translations 107 | *.mo 108 | *.pot 109 | 110 | # Django stuff: 111 | *.log 112 | .static_storage/ 113 | .media/ 114 | local_settings.py 115 | 116 | # Flask stuff: 117 | instance/ 118 | .webassets-cache 119 | 120 | # Scrapy stuff: 121 | .scrapy 122 | 123 | # Sphinx documentation 124 | docs/_build/ 125 | 126 | # PyBuilder 127 | target/ 128 | 129 | # Jupyter Notebook 130 | .ipynb_checkpoints 131 | 132 | # IPython 133 | .ipython/profile_default/history.sqlite 134 | .ipython/profile_default/startup/README 135 | 136 | # pyenv 137 | .python-version 138 | 139 | # celery beat schedule file 140 | celerybeat-schedule 141 | 142 | # SageMath parsed files 143 | *.sage.py 144 | 145 | # Environments 146 | .env 147 | .venv 148 | env/ 149 | venv/ 150 | ENV/ 151 | env.bak/ 152 | venv.bak/ 153 | 154 | # mkdocs documentation 155 | /site 156 | 157 | # mypy 158 | .mypy_cache/ 159 | -------------------------------------------------------------------------------- /src/setup.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. 
IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 
28 | 29 | from setuptools import find_packages, setup 30 | 31 | entry_point = ( 32 | "minco = minco.run:run_package" 33 | ) 34 | 35 | 36 | # get the dependencies and installs 37 | with open("requirements.txt", "r", encoding="utf-8") as f: 38 | # Make sure we strip all comments and options (e.g "--extra-index-url") 39 | # that arise from a modified pip.conf file that configure global options 40 | # when running kedro build-reqs 41 | requires = [] 42 | for line in f: 43 | req = line.split("#", 1)[0].strip() 44 | if req and not req.startswith("--"): 45 | requires.append(req) 46 | 47 | setup( 48 | name="minco", 49 | version="0.1", 50 | packages=find_packages(exclude=["tests"]), 51 | entry_points={"console_scripts": [entry_point]}, 52 | install_requires=requires, 53 | extras_require={ 54 | "docs": [ 55 | "sphinx>=1.6.3, <2.0", 56 | "sphinx_rtd_theme==0.4.1", 57 | "nbsphinx==0.3.4", 58 | "nbstripout==0.3.3", 59 | "recommonmark==0.5.0", 60 | "sphinx-autodoc-typehints==1.6.0", 61 | "sphinx_copybutton==0.2.5", 62 | "jupyter_client>=5.1.0, <7.0", 63 | "tornado>=4.2, <6.0", 64 | "ipykernel>=4.8.1, <5.0", 65 | ] 66 | }, 67 | ) 68 | -------------------------------------------------------------------------------- /.ipython/profile_default/startup/00-kedro-init.py: -------------------------------------------------------------------------------- 1 | import logging.config 2 | import sys 3 | from pathlib import Path 4 | 5 | from IPython.core.magic import needs_local_scope, register_line_magic 6 | from kedro.framework.hooks import get_hook_manager 7 | 8 | # Find the project root (./../../../) 9 | startup_error = None 10 | project_path = Path(__file__).parents[3].resolve() 11 | 12 | 13 | @register_line_magic 14 | def reload_kedro(path, line=None): 15 | """Line magic which reloads all Kedro default variables.""" 16 | global startup_error 17 | global context 18 | global catalog 19 | 20 | try: 21 | import kedro.config.default_logger 22 | from kedro.framework.context import load_context 
23 | from kedro.framework.cli.jupyter import collect_line_magic 24 | except ImportError: 25 | logging.error( 26 | "Kedro appears not to be installed in your current environment " 27 | "or your current IPython session was not started in a valid Kedro project." 28 | ) 29 | raise 30 | 31 | try: 32 | path = path or project_path 33 | 34 | # remove cached user modules 35 | context = load_context(path) 36 | to_remove = [mod for mod in sys.modules if mod.startswith(context.package_name)] 37 | # `del` is used instead of `reload()` because: If the new version of a module does not 38 | # define a name that was defined by the old version, the old definition remains. 39 | for module in to_remove: 40 | del sys.modules[module] 41 | 42 | # clear hook manager; hook implementations will be re-registered when the 43 | # context is instantiated again in `load_context()` below 44 | hook_manager = get_hook_manager() 45 | name_plugin_pairs = hook_manager.list_name_plugin() 46 | for name, plugin in name_plugin_pairs: 47 | hook_manager.unregister(name=name, plugin=plugin) 48 | 49 | logging.debug("Loading the context from %s", str(path)) 50 | # Reload context to fix `pickle` related error (it is unable to serialize reloaded objects) 51 | # Some details can be found here: 52 | # https://modwsgi.readthedocs.io/en/develop/user-guides/issues-with-pickle-module.html#packing-and-script-reloading 53 | context = load_context(path) 54 | catalog = context.catalog 55 | 56 | logging.info("** Kedro project %s", str(context.project_name)) 57 | logging.info("Defined global variable `context` and `catalog`") 58 | 59 | for line_magic in collect_line_magic(): 60 | register_line_magic(needs_local_scope(line_magic)) 61 | logging.info("Registered line magic `%s`", line_magic.__name__) 62 | except Exception as err: 63 | startup_error = err 64 | logging.exception( 65 | "Kedro's ipython session startup script failed:\n%s", str(err) 66 | ) 67 | raise err 68 | 69 | 70 | reload_kedro(project_path) 71 | 
-------------------------------------------------------------------------------- /src/minco/hooks.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 
28 | 29 | """Project hooks.""" 30 | from typing import Any, Dict, Iterable, Optional 31 | 32 | from kedro.config import ConfigLoader 33 | from kedro.framework.hooks import hook_impl 34 | from kedro.io import DataCatalog 35 | from kedro.pipeline import Pipeline 36 | from kedro.versioning import Journal 37 | 38 | from minco.pipelines import data_engineering as de 39 | from minco.pipelines import data_science as ds 40 | 41 | 42 | class ProjectHooks: 43 | @hook_impl 44 | def register_pipelines(self) -> Dict[str, Pipeline]: 45 | """Register the project's pipeline. 46 | 47 | Returns: 48 | A mapping from a pipeline name to a ``Pipeline`` object. 49 | 50 | """ 51 | data_engineering_pipeline = de.create_pipeline() 52 | data_science_pipeline = ds.create_pipeline() 53 | 54 | return { 55 | "de": data_engineering_pipeline, 56 | "ds": data_science_pipeline, 57 | "__default__": data_engineering_pipeline + data_science_pipeline, 58 | } 59 | 60 | @hook_impl 61 | def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader: 62 | return ConfigLoader(conf_paths) 63 | 64 | @hook_impl 65 | def register_catalog( 66 | self, 67 | catalog: Optional[Dict[str, Dict[str, Any]]], 68 | credentials: Dict[str, Dict[str, Any]], 69 | load_versions: Dict[str, str], 70 | save_version: str, 71 | journal: Journal, 72 | ) -> DataCatalog: 73 | return DataCatalog.from_config( 74 | catalog, credentials, load_versions, save_version, journal 75 | ) 76 | 77 | 78 | project_hooks = ProjectHooks() 79 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_engineering/nodes.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | """Example code for the nodes in the example pipeline. This code is meant 29 | just for illustrating basic Kedro features. 30 | 31 | PLEASE DELETE THIS FILE ONCE YOU START WORKING ON YOUR OWN PROJECT! 32 | """ 33 | 34 | from typing import Any, Dict 35 | 36 | import pandas as pd 37 | 38 | 39 | def split_data(data: pd.DataFrame, example_test_data_ratio: float) -> Dict[str, Any]: 40 | """Node for splitting the classical Iris data set into training and test 41 | sets, each split into features and labels. 42 | The split ratio parameter is taken from conf/project/parameters.yml. 
43 | The data and the parameters will be loaded and provided to your function 44 | automatically when the pipeline is executed and it is time to run this node. 45 | """ 46 | data.columns = [ 47 | "sepal_length", 48 | "sepal_width", 49 | "petal_length", 50 | "petal_width", 51 | "target", 52 | ] 53 | classes = sorted(data["target"].unique()) 54 | # One-hot encoding for the target variable 55 | data = pd.get_dummies(data, columns=["target"], prefix="", prefix_sep="") 56 | 57 | # Shuffle all the data 58 | data = data.sample(frac=1).reset_index(drop=True) 59 | 60 | # Split to training and testing data 61 | n = data.shape[0] 62 | n_test = int(n * example_test_data_ratio) 63 | training_data = data.iloc[n_test:, :].reset_index(drop=True) 64 | test_data = data.iloc[:n_test, :].reset_index(drop=True) 65 | 66 | # Split the data to features and labels 67 | train_data_x = training_data.loc[:, "sepal_length":"petal_width"] 68 | train_data_y = training_data[classes] 69 | test_data_x = test_data.loc[:, "sepal_length":"petal_width"] 70 | test_data_y = test_data[classes] 71 | 72 | # When returning many variables, it is a good practice to give them names: 73 | return dict( 74 | train_x=train_data_x, 75 | train_y=train_data_y, 76 | test_x=test_data_x, 77 | test_y=test_data_y, 78 | ) 79 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Gitpod ready-to-code](https://img.shields.io/badge/Gitpod-ready--to--code-blue?logo=gitpod)](https://gitpod.io/#https://github.com/SDK/kedro_ge) 2 | [![CircleCI](https://circleci.com/gh/SDK/kedro_ge.svg?style=shield)](https://circleci.com/gh/SDK/kedro_ge) 3 | 4 | # Example: 5 | 6 | # Data Quality Lab 7 | 8 | ## Intro 9 | 10 | This lab is designed to serve as a learning resource for Kedro + Great Expectations. 11 | 12 | It contains 2 datasets: 13 | * Iris data 14 | * Netflix Titles 15 | 16 | 17 | ## Initializing
GE 18 | 19 | Let's initialize the environment with: 20 | 21 | ``` 22 | great_expectations init 23 | ``` 24 | ![It should look something like this](./pngs/ge1.png) 25 | 26 | 27 | Next, we confirm the dataset type, the path, and the name we want to give it. 28 | 29 | ![It should look something like this](./pngs/ge2.png) 30 | 31 | Confirm to run the profiling: 32 | ``` 33 | ================================================================================ 34 | 35 | Would you like to profile new Expectations for a single data asset within your new Datasource? [Y/n]: y 36 | 37 | Would you like to: 38 | 1. choose from a list of data assets in this datasource 39 | 2. enter the path of a data file 40 | : 1 41 | 42 | Which data would you like to use? 43 | 1. iris (file) 44 | 2. netflix_titles (file) 45 | Don't see the name of the data asset in the list above? Just type it 46 | : 1 47 | 48 | Name the new Expectation Suite [iris.warning]: 49 | 50 | Great Expectations will choose a couple of columns and generate expectations about them 51 | to demonstrate some examples of assertions you can make about your data. 52 | 53 | Great Expectations will store these expectations in a new Expectation Suite 'iris.warning' here: 54 | 55 | file:///workspace/kedro_ge/great_expectations/expectations/iris/warning.json 56 | 57 | Would you like to proceed? [Y/n]: 58 | 59 | Generating example Expectation Suite... 60 | 61 | Done generating example Expectation Suite 62 | 63 | ================================================================================ 64 | 65 | Would you like to build Data Docs? [Y/n]: 66 | 67 | The following Data Docs sites will be built: 68 | 69 | - local_site: file:///workspace/kedro_ge/great_expectations/uncommitted/data_docs/local_site/index.html 70 | 71 | Would you like to proceed? [Y/n]: 72 | 73 | Building Data Docs... 74 | 75 | Done building Data Docs 76 | 77 | Would you like to view your new Expectations in Data Docs? This will open a new browser window.
[Y/n]: 78 | ``` 79 | 80 | ## Adding a Datasource to GE 81 | 82 | To add a new dataset, we just need to run: 83 | ``` 84 | great_expectations datasource new 85 | ``` 86 | and we will go through the menu again to add a dataset. 87 | 88 | ## Running a new profile 89 | ``` 90 | great_expectations datasource profile 91 | ``` 92 | 93 | ## Basic commands: 94 | 95 | * `great_expectations suite edit` 96 | * `great_expectations suite new` 97 | * `great_expectations suite list` 98 | * `great_expectations suite delete` 99 | * `great_expectations docs build` 100 | * `great_expectations docs clean` 101 | * `great_expectations checkpoint new` 102 | * `great_expectations checkpoint list` 103 | * `great_expectations checkpoint run` 104 | * `great_expectations checkpoint script` 105 | * `great_expectations datasource list` 106 | * `great_expectations datasource profile` 107 | * `great_expectations datasource delete` 108 | * `great_expectations validation-operator run` 109 | * `great_expectations init` 110 | 111 | ## Key ideas 112 | 113 | https://docs.greatexpectations.io/en/latest/reference/core_concepts.html#key-ideas 114 | 115 | ## Glossary of Expectations 116 | https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html 117 | 118 | 119 | ## Known bugs for this environment on Gitpod 120 | 121 | There is a kernel error when editing the Suite from 122 | ``` 123 | great_expectations suite edit 124 | ``` 125 | To work around this, it is better to open a Kedro session with a parameter that opens the port: 126 | 127 | ``` 128 | kedro jupyter notebook --NotebookApp.allow_origin=\'$(gp url 8888)\' 129 | ``` 130 | -------------------------------------------------------------------------------- /data/01_raw/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length,sepal_width,petal_length,petal_width,species 2 | 5.1,3.5,1.4,0.2,setosa 3 | 4.9,3.0,1.4,0.2,setosa 4 | 4.7,3.2,1.3,0.2,setosa 5 | 
4.6,3.1,1.5,0.2,setosa 6 | 5.0,3.6,1.4,0.2,setosa 7 | 5.4,3.9,1.7,0.4,setosa 8 | 4.6,3.4,1.4,0.3,setosa 9 | 5.0,3.4,1.5,0.2,setosa 10 | 4.4,2.9,1.4,0.2,setosa 11 | 4.9,3.1,1.5,0.1,setosa 12 | 5.4,3.7,1.5,0.2,setosa 13 | 4.8,3.4,1.6,0.2,setosa 14 | 4.8,3.0,1.4,0.1,setosa 15 | 4.3,3.0,1.1,0.1,setosa 16 | 5.8,4.0,1.2,0.2,setosa 17 | 5.7,4.4,1.5,0.4,setosa 18 | 5.4,3.9,1.3,0.4,setosa 19 | 5.1,3.5,1.4,0.3,setosa 20 | 5.7,3.8,1.7,0.3,setosa 21 | 5.1,3.8,1.5,0.3,setosa 22 | 5.4,3.4,1.7,0.2,setosa 23 | 5.1,3.7,1.5,0.4,setosa 24 | 4.6,3.6,1.0,0.2,setosa 25 | 5.1,3.3,1.7,0.5,setosa 26 | 4.8,3.4,1.9,0.2,setosa 27 | 5.0,3.0,1.6,0.2,setosa 28 | 5.0,3.4,1.6,0.4,setosa 29 | 5.2,3.5,1.5,0.2,setosa 30 | 5.2,3.4,1.4,0.2,setosa 31 | 4.7,3.2,1.6,0.2,setosa 32 | 4.8,3.1,1.6,0.2,setosa 33 | 5.4,3.4,1.5,0.4,setosa 34 | 5.2,4.1,1.5,0.1,setosa 35 | 5.5,4.2,1.4,0.2,setosa 36 | 4.9,3.1,1.5,0.1,setosa 37 | 5.0,3.2,1.2,0.2,setosa 38 | 5.5,3.5,1.3,0.2,setosa 39 | 4.9,3.1,1.5,0.1,setosa 40 | 4.4,3.0,1.3,0.2,setosa 41 | 5.1,3.4,1.5,0.2,setosa 42 | 5.0,3.5,1.3,0.3,setosa 43 | 4.5,2.3,1.3,0.3,setosa 44 | 4.4,3.2,1.3,0.2,setosa 45 | 5.0,3.5,1.6,0.6,setosa 46 | 5.1,3.8,1.9,0.4,setosa 47 | 4.8,3.0,1.4,0.3,setosa 48 | 5.1,3.8,1.6,0.2,setosa 49 | 4.6,3.2,1.4,0.2,setosa 50 | 5.3,3.7,1.5,0.2,setosa 51 | 5.0,3.3,1.4,0.2,setosa 52 | 7.0,3.2,4.7,1.4,versicolor 53 | 6.4,3.2,4.5,1.5,versicolor 54 | 6.9,3.1,4.9,1.5,versicolor 55 | 5.5,2.3,4.0,1.3,versicolor 56 | 6.5,2.8,4.6,1.5,versicolor 57 | 5.7,2.8,4.5,1.3,versicolor 58 | 6.3,3.3,4.7,1.6,versicolor 59 | 4.9,2.4,3.3,1.0,versicolor 60 | 6.6,2.9,4.6,1.3,versicolor 61 | 5.2,2.7,3.9,1.4,versicolor 62 | 5.0,2.0,3.5,1.0,versicolor 63 | 5.9,3.0,4.2,1.5,versicolor 64 | 6.0,2.2,4.0,1.0,versicolor 65 | 6.1,2.9,4.7,1.4,versicolor 66 | 5.6,2.9,3.6,1.3,versicolor 67 | 6.7,3.1,4.4,1.4,versicolor 68 | 5.6,3.0,4.5,1.5,versicolor 69 | 5.8,2.7,4.1,1.0,versicolor 70 | 6.2,2.2,4.5,1.5,versicolor 71 | 5.6,2.5,3.9,1.1,versicolor 72 | 5.9,3.2,4.8,1.8,versicolor 73 | 
6.1,2.8,4.0,1.3,versicolor 74 | 6.3,2.5,4.9,1.5,versicolor 75 | 6.1,2.8,4.7,1.2,versicolor 76 | 6.4,2.9,4.3,1.3,versicolor 77 | 6.6,3.0,4.4,1.4,versicolor 78 | 6.8,2.8,4.8,1.4,versicolor 79 | 6.7,3.0,5.0,1.7,versicolor 80 | 6.0,2.9,4.5,1.5,versicolor 81 | 5.7,2.6,3.5,1.0,versicolor 82 | 5.5,2.4,3.8,1.1,versicolor 83 | 5.5,2.4,3.7,1.0,versicolor 84 | 5.8,2.7,3.9,1.2,versicolor 85 | 6.0,2.7,5.1,1.6,versicolor 86 | 5.4,3.0,4.5,1.5,versicolor 87 | 6.0,3.4,4.5,1.6,versicolor 88 | 6.7,3.1,4.7,1.5,versicolor 89 | 6.3,2.3,4.4,1.3,versicolor 90 | 5.6,3.0,4.1,1.3,versicolor 91 | 5.5,2.5,4.0,1.3,versicolor 92 | 5.5,2.6,4.4,1.2,versicolor 93 | 6.1,3.0,4.6,1.4,versicolor 94 | 5.8,2.6,4.0,1.2,versicolor 95 | 5.0,2.3,3.3,1.0,versicolor 96 | 5.6,2.7,4.2,1.3,versicolor 97 | 5.7,3.0,4.2,1.2,versicolor 98 | 5.7,2.9,4.2,1.3,versicolor 99 | 6.2,2.9,4.3,1.3,versicolor 100 | 5.1,2.5,3.0,1.1,versicolor 101 | 5.7,2.8,4.1,1.3,versicolor 102 | 6.3,3.3,6.0,2.5,virginica 103 | 5.8,2.7,5.1,1.9,virginica 104 | 7.1,3.0,5.9,2.1,virginica 105 | 6.3,2.9,5.6,1.8,virginica 106 | 6.5,3.0,5.8,2.2,virginica 107 | 7.6,3.0,6.6,2.1,virginica 108 | 4.9,2.5,4.5,1.7,virginica 109 | 7.3,2.9,6.3,1.8,virginica 110 | 6.7,2.5,5.8,1.8,virginica 111 | 7.2,3.6,6.1,2.5,virginica 112 | 6.5,3.2,5.1,2.0,virginica 113 | 6.4,2.7,5.3,1.9,virginica 114 | 6.8,3.0,5.5,2.1,virginica 115 | 5.7,2.5,5.0,2.0,virginica 116 | 5.8,2.8,5.1,2.4,virginica 117 | 6.4,3.2,5.3,2.3,virginica 118 | 6.5,3.0,5.5,1.8,virginica 119 | 7.7,3.8,6.7,2.2,virginica 120 | 7.7,2.6,6.9,2.3,virginica 121 | 6.0,2.2,5.0,1.5,virginica 122 | 6.9,3.2,5.7,2.3,virginica 123 | 5.6,2.8,4.9,2.0,virginica 124 | 7.7,2.8,6.7,2.0,virginica 125 | 6.3,2.7,4.9,1.8,virginica 126 | 6.7,3.3,5.7,2.1,virginica 127 | 7.2,3.2,6.0,1.8,virginica 128 | 6.2,2.8,4.8,1.8,virginica 129 | 6.1,3.0,4.9,1.8,virginica 130 | 6.4,2.8,5.6,2.1,virginica 131 | 7.2,3.0,5.8,1.6,virginica 132 | 7.4,2.8,6.1,1.9,virginica 133 | 7.9,3.8,6.4,2.0,virginica 134 | 6.4,2.8,5.6,2.2,virginica 135 | 
6.3,2.8,5.1,1.5,virginica 136 | 6.1,2.6,5.6,1.4,virginica 137 | 7.7,3.0,6.1,2.3,virginica 138 | 6.3,3.4,5.6,2.4,virginica 139 | 6.4,3.1,5.5,1.8,virginica 140 | 6.0,3.0,4.8,1.8,virginica 141 | 6.9,3.1,5.4,2.1,virginica 142 | 6.7,3.1,5.6,2.4,virginica 143 | 6.9,3.1,5.1,2.3,virginica 144 | 5.8,2.7,5.1,1.9,virginica 145 | 6.8,3.2,5.9,2.3,virginica 146 | 6.7,3.3,5.7,2.5,virginica 147 | 6.7,3.0,5.2,2.3,virginica 148 | 6.3,2.5,5.0,1.9,virginica 149 | 6.5,3.0,5.2,2.0,virginica 150 | 6.2,3.4,5.4,2.3,virginica 151 | 5.9,3.0,5.1,1.8,virginica 152 | -------------------------------------------------------------------------------- /src/minco/pipelines/data_science/nodes.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. 
You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | 29 | """Example code for the nodes in the example pipeline. This code is meant 30 | just for illustrating basic Kedro features. 31 | 32 | Delete this when you start working on your own Kedro project. 33 | """ 34 | # pylint: disable=invalid-name 35 | 36 | import logging 37 | from typing import Any, Dict 38 | 39 | import numpy as np 40 | import pandas as pd 41 | 42 | 43 | def train_model( 44 | train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any] 45 | ) -> np.ndarray: 46 | """Node for training a simple multi-class logistic regression model. The 47 | number of training iterations as well as the learning rate are taken from 48 | conf/project/parameters.yml. All of the data as well as the parameters 49 | will be provided to this function at the time of execution. 
50 | """ 51 | num_iter = parameters["example_num_train_iter"] 52 | lr = parameters["example_learning_rate"] 53 | X = train_x.to_numpy() 54 | Y = train_y.to_numpy() 55 | 56 | # Add bias to the features 57 | bias = np.ones((X.shape[0], 1)) 58 | X = np.concatenate((bias, X), axis=1) 59 | 60 | weights = [] 61 | # Train one model for each class in Y 62 | for k in range(Y.shape[1]): 63 | # Initialise weights 64 | theta = np.zeros(X.shape[1]) 65 | y = Y[:, k] 66 | for _ in range(num_iter): 67 | z = np.dot(X, theta) 68 | h = _sigmoid(z) 69 | gradient = np.dot(X.T, (h - y)) / y.size 70 | theta -= lr * gradient 71 | # Save the weights for each model 72 | weights.append(theta) 73 | 74 | # Return a joint multi-class model with weights for all classes 75 | return np.vstack(weights).transpose() 76 | 77 | 78 | def predict(model: np.ndarray, test_x: pd.DataFrame) -> np.ndarray: 79 | """Node for making predictions given a pre-trained model and a test set. 80 | """ 81 | X = test_x.to_numpy() 82 | 83 | # Add bias to the features 84 | bias = np.ones((X.shape[0], 1)) 85 | X = np.concatenate((bias, X), axis=1) 86 | 87 | # Predict "probabilities" for each class 88 | result = _sigmoid(np.dot(X, model)) 89 | 90 | # Return the index of the class with max probability for all samples 91 | return np.argmax(result, axis=1) 92 | 93 | 94 | def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame) -> None: 95 | """Node for reporting the accuracy of the predictions performed by the 96 | previous node. Notice that this function has no outputs, except logging. 
97 | """ 98 | # Get true class index 99 | target = np.argmax(test_y.to_numpy(), axis=1) 100 | # Calculate accuracy of predictions 101 | accuracy = np.sum(predictions == target) / target.shape[0] 102 | # Log the accuracy of the model 103 | log = logging.getLogger(__name__) 104 | log.info("Model accuracy on test set: %0.2f%%", accuracy * 100) 105 | 106 | 107 | def _sigmoid(z): 108 | """A helper sigmoid function used by the training and the scoring nodes.""" 109 | return 1 / (1 + np.exp(-z)) 110 | -------------------------------------------------------------------------------- /docs/source/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | 4 | # Copyright 2020 QuantumBlack Visual Analytics Limited 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # http://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 13 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 14 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 15 | # NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 16 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 17 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 18 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 19 | # 20 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 21 | # (either separately or in combination, "QuantumBlack Trademarks") are 22 | # trademarks of QuantumBlack. The License does not grant you any right or 23 | # license to the QuantumBlack Trademarks. 
You may not use the QuantumBlack 24 | # Trademarks or any confusingly similar mark as a trademark for your product, 25 | # or use the QuantumBlack Trademarks in any other manner that might cause 26 | # confusion in the marketplace, including but not limited to in advertising, 27 | # on websites, or on software. 28 | # 29 | # See the License for the specific language governing permissions and 30 | # limitations under the License. 31 | 32 | # minco documentation build 33 | # configuration file, created by sphinx-quickstart. 34 | # 35 | # This file is execfile()d with the current directory set to its 36 | # containing dir. 37 | # 38 | # Note that not all possible configuration values are present in this 39 | # autogenerated file. 40 | # 41 | # All configuration values have a default; values that are commented out 42 | # serve to show the default. 43 | 44 | # If extensions (or modules to document with autodoc) are in another directory, 45 | # add these directories to sys.path here. If the directory is relative to the 46 | # documentation root, use os.path.abspath to make it absolute, like shown here. 47 | # 48 | import re 49 | 50 | from kedro.framework.cli.utils import find_stylesheets 51 | 52 | from recommonmark.transform import AutoStructify 53 | from minco import __version__ as release 54 | 55 | # -- Project information ----------------------------------------------------- 56 | 57 | project = "minco" 58 | copyright = "2020, QuantumBlack Visual Analytics Limited" 59 | author = "QuantumBlack" 60 | 61 | # The short X.Y version. 62 | version = re.match(r"^([0-9]+\.[0-9]+).*", release).group(1) 63 | 64 | # -- General configuration --------------------------------------------------- 65 | 66 | # If your documentation needs a minimal Sphinx version, state it here. 67 | # 68 | # needs_sphinx = '1.0' 69 | 70 | # Add any Sphinx extension module names here, as strings. They can be 71 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 72 | # ones. 
73 | extensions = [ 74 | "sphinx.ext.autodoc", 75 | "sphinx.ext.napoleon", 76 | "sphinx_autodoc_typehints", 77 | "sphinx.ext.doctest", 78 | "sphinx.ext.todo", 79 | "sphinx.ext.coverage", 80 | "sphinx.ext.mathjax", 81 | "sphinx.ext.ifconfig", 82 | "sphinx.ext.viewcode", 83 | "sphinx.ext.mathjax", 84 | "nbsphinx", 85 | "recommonmark", 86 | "sphinx_copybutton", 87 | ] 88 | 89 | # enable autosummary plugin (table of contents for modules/classes/class 90 | # methods) 91 | autosummary_generate = True 92 | 93 | # Add any paths that contain templates here, relative to this directory. 94 | templates_path = ["_templates"] 95 | 96 | # The suffix(es) of source filenames. 97 | # You can specify multiple suffix as a list of string: 98 | # 99 | source_suffix = {".rst": "restructuredtext", ".md": "markdown"} 100 | 101 | # The master toctree document. 102 | master_doc = "index" 103 | 104 | # The language for content autogenerated by Sphinx. Refer to documentation 105 | # for a list of supported languages. 106 | # 107 | # This is also used if you do content translation via gettext catalogs. 108 | # Usually you set "language" from the command line for these cases. 109 | language = None 110 | 111 | # List of patterns, relative to source directory, that match files and 112 | # directories to ignore when looking for source files. 113 | # This pattern also affects html_static_path and html_extra_path . 114 | exclude_patterns = ["_build", "**.ipynb_checkpoints"] 115 | 116 | # The name of the Pygments (syntax highlighting) style to use. 117 | pygments_style = "sphinx" 118 | 119 | # -- Options for HTML output ------------------------------------------------- 120 | 121 | # The theme to use for HTML and HTML Help pages. See the documentation for 122 | # a list of builtin themes. 123 | # 124 | html_theme = "sphinx_rtd_theme" 125 | 126 | # Theme options are theme-specific and customize the look and feel of a theme 127 | # further. 
For a list of options available for each theme, see the 128 | # documentation. 129 | # 130 | html_theme_options = {"collapse_navigation": False, "style_external_links": True} 131 | 132 | # Add any paths that contain custom static files (such as style sheets) here, 133 | # relative to this directory. They are copied after the builtin static files, 134 | # so a file named "default.css" will overwrite the builtin "default.css". 135 | html_static_path = ["_static"] 136 | 137 | # Custom sidebar templates, must be a dictionary that maps document names 138 | # to template names. 139 | # 140 | # The default sidebars (for documents that don't match any pattern) are 141 | # defined by theme itself. Builtin themes are using these templates by 142 | # default: ``['localtoc.html', 'relations.html', 'sourcelink.html', 143 | # 'searchbox.html']``. 144 | # 145 | # html_sidebars = {} 146 | 147 | html_show_sourcelink = False 148 | 149 | # -- Options for HTMLHelp output --------------------------------------------- 150 | 151 | # Output file base name for HTML help builder. 152 | htmlhelp_basename = "mincodoc" 153 | 154 | # -- Options for LaTeX output ------------------------------------------------ 155 | 156 | latex_elements = { 157 | # The paper size ('letterpaper' or 'a4paper'). 158 | # 159 | # 'papersize': 'letterpaper', 160 | # 161 | # The font size ('10pt', '11pt' or '12pt'). 162 | # 163 | # 'pointsize': '10pt', 164 | # 165 | # Additional stuff for the LaTeX preamble. 166 | # 167 | # 'preamble': '', 168 | # 169 | # Latex figure (float) alignment 170 | # 171 | # 'figure_align': 'htbp', 172 | } 173 | 174 | # Grouping the document tree into LaTeX files. List of tuples 175 | # (source start file, target name, title, 176 | # author, documentclass [howto, manual, or own class]). 
177 | latex_documents = [ 178 | ( 179 | master_doc, 180 | "minco.tex", 181 | "minco Documentation", 182 | "QuantumBlack", 183 | "manual", 184 | ) 185 | ] 186 | 187 | # -- Options for manual page output ------------------------------------------ 188 | 189 | # One entry per manual page. List of tuples 190 | # (source start file, name, description, authors, manual section). 191 | man_pages = [ 192 | ( 193 | master_doc, 194 | "minco", 195 | "minco Documentation", 196 | [author], 197 | 1, 198 | ) 199 | ] 200 | 201 | # -- Options for Texinfo output ---------------------------------------------- 202 | 203 | # Grouping the document tree into Texinfo files. List of tuples 204 | # (source start file, target name, title, author, 205 | # dir menu entry, description, category) 206 | texinfo_documents = [ 207 | ( 208 | master_doc, 209 | "minco", 210 | "minco Documentation", 211 | author, 212 | "minco", 213 | "Project minco codebase.", 214 | "Data-Science", 215 | ) 216 | ] 217 | 218 | # -- Options for todo extension ---------------------------------------------- 219 | 220 | # If true, `todo` and `todoList` produce output, else they produce nothing. 
221 | todo_include_todos = False 222 | 223 | # -- Extension configuration ------------------------------------------------- 224 | 225 | # nbsphinx_prolog = """ 226 | # see here for prolog/epilog details: 227 | # https://nbsphinx.readthedocs.io/en/0.3.1/prolog-and-epilog.html 228 | # """ 229 | 230 | # -- NBconvert kernel config ------------------------------------------------- 231 | nbsphinx_kernel_name = "python3" 232 | 233 | 234 | def remove_arrows_in_examples(lines): 235 | for i, line in enumerate(lines): 236 | lines[i] = line.replace(">>>", "") 237 | 238 | 239 | def autodoc_process_docstring(app, what, name, obj, options, lines): 240 | remove_arrows_in_examples(lines) 241 | 242 | 243 | def skip(app, what, name, obj, skip, options): 244 | if name == "__init__": 245 | return False 246 | return skip 247 | 248 | 249 | def setup(app): 250 | app.connect("autodoc-process-docstring", autodoc_process_docstring) 251 | app.connect("autodoc-skip-member", skip) 252 | # add Kedro stylesheets 253 | for stylesheet in find_stylesheets(): 254 | app.add_stylesheet(stylesheet) 255 | # enable rendering RST tables in Markdown 256 | app.add_config_value("recommonmark_config", {"enable_eval_rst": True}, True) 257 | app.add_transform(AutoStructify) 258 | -------------------------------------------------------------------------------- /kedro_cli.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 QuantumBlack Visual Analytics Limited 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 10 | # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 11 | # OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND 12 | # NONINFRINGEMENT. 
IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS 13 | # BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN 14 | # ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN 15 | # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # 17 | # The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo 18 | # (either separately or in combination, "QuantumBlack Trademarks") are 19 | # trademarks of QuantumBlack. The License does not grant you any right or 20 | # license to the QuantumBlack Trademarks. You may not use the QuantumBlack 21 | # Trademarks or any confusingly similar mark as a trademark for your product, 22 | # or use the QuantumBlack Trademarks in any other manner that might cause 23 | # confusion in the marketplace, including but not limited to in advertising, 24 | # on websites, or on software. 25 | # 26 | # See the License for the specific language governing permissions and 27 | # limitations under the License. 28 | 29 | """Command line tools for manipulating a Kedro project. 
30 | Intended to be invoked via `kedro`.""" 31 | import os 32 | from itertools import chain 33 | from pathlib import Path 34 | from typing import Dict, Iterable, Tuple 35 | 36 | import click 37 | from kedro.framework.cli import main as kedro_main 38 | from kedro.framework.cli.catalog import catalog as catalog_group 39 | from kedro.framework.cli.jupyter import jupyter as jupyter_group 40 | from kedro.framework.cli.pipeline import pipeline as pipeline_group 41 | from kedro.framework.cli.project import project_group 42 | from kedro.framework.cli.utils import KedroCliError, env_option, split_string 43 | from kedro.framework.context import load_context 44 | from kedro.utils import load_obj 45 | 46 | CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"]) 47 | 48 | # get our package onto the python path 49 | PROJ_PATH = Path(__file__).resolve().parent 50 | 51 | ENV_ARG_HELP = """Run the pipeline in a configured environment. If not specified, 52 | pipeline will run using environment `local`.""" 53 | FROM_INPUTS_HELP = ( 54 | """A list of dataset names which should be used as a starting point.""" 55 | ) 56 | FROM_NODES_HELP = """A list of node names which should be used as a starting point.""" 57 | TO_NODES_HELP = """A list of node names which should be used as an end point.""" 58 | NODE_ARG_HELP = """Run only nodes with specified names.""" 59 | RUNNER_ARG_HELP = """Specify a runner that you want to run the pipeline with. 60 | Available runners: `SequentialRunner`, `ParallelRunner` and `ThreadRunner`. 61 | This option cannot be used together with --parallel.""" 62 | PARALLEL_ARG_HELP = """Run the pipeline using the `ParallelRunner`. 63 | If not specified, use the `SequentialRunner`. This flag cannot be used together 64 | with --runner.""" 65 | ASYNC_ARG_HELP = """Load and save node inputs and outputs asynchronously 66 | with threads. 
If not specified, load and save datasets synchronously.""" 67 | TAG_ARG_HELP = """Construct the pipeline using only nodes which have this tag 68 | attached. The option can be used multiple times, which results in a 69 | pipeline constructed from nodes having any of those tags.""" 70 | LOAD_VERSION_HELP = """Specify a particular dataset version (timestamp) for loading.""" 71 | CONFIG_FILE_HELP = """Specify a YAML configuration file to load the run 72 | command arguments from. If command line arguments are provided, they will 73 | override the loaded ones.""" 74 | PIPELINE_ARG_HELP = """Name of the modular pipeline to run. 75 | If not set, the project pipeline is run by default.""" 76 | PARAMS_ARG_HELP = """Specify extra parameters that you want to pass 77 | to the context initializer. Items must be separated by comma, keys from values by colon, 78 | example: param1:value1,param2:value2. Each parameter is split by the first colon, 79 | so parameter values are allowed to contain colons, but parameter keys are not.""" 80 | 81 | 82 | def _config_file_callback(ctx, param, value): # pylint: disable=unused-argument 83 | """Config file callback that replaces command line options with config file 84 | values. If command line options are passed, they override config file values. 85 | """ 86 | # for performance reasons 87 | import anyconfig # pylint: disable=import-outside-toplevel 88 | 89 | ctx.default_map = ctx.default_map or {} 90 | section = ctx.info_name 91 | 92 | if value: 93 | config = anyconfig.load(value)[section] 94 | ctx.default_map.update(config) 95 | 96 | return value 97 | 98 | 99 | def _get_values_as_tuple(values: Iterable[str]) -> Tuple[str, ...]: 100 | return tuple(chain.from_iterable(value.split(",") for value in values)) 101 | 102 | 103 | def _reformat_load_versions( # pylint: disable=unused-argument 104 | ctx, param, value 105 | ) -> Dict[str, str]: 106 | """Reformat data structure from tuple to dictionary for `load-version`. 
107 | E.g. ('dataset1:time1', 'dataset2:time2') -> {"dataset1": "time1", "dataset2": "time2"}. 108 | """ 109 | load_versions_dict = {} 110 | 111 | for load_version in value: 112 | load_version_list = load_version.split(":", 1) 113 | if len(load_version_list) != 2: 114 | raise KedroCliError( 115 | f"Expected `load_version` to be in the form " 116 | f"`dataset_name:YYYY-MM-DDThh.mm.ss.sssZ`, " 117 | f"found {load_version} instead" 118 | ) 119 | load_versions_dict[load_version_list[0]] = load_version_list[1] 120 | 121 | return load_versions_dict 122 | 123 | 124 | def _split_params(ctx, param, value): 125 | if isinstance(value, dict): 126 | return value 127 | result = {} 128 | for item in split_string(ctx, param, value): 129 | item = item.split(":", 1) 130 | if len(item) != 2: 131 | ctx.fail( 132 | f"Invalid format of `{param.name}` option: Item `{item[0]}` must contain " 133 | f"a key and a value separated by `:`." 134 | ) 135 | key = item[0].strip() 136 | if not key: 137 | ctx.fail( 138 | f"Invalid format of `{param.name}` option: Parameter key " 139 | f"cannot be an empty string."
140 | ) 141 | value = item[1].strip() 142 | result[key] = _try_convert_to_numeric(value) 143 | return result 144 | 145 | 146 | def _try_convert_to_numeric(value): 147 | try: 148 | value = float(value) 149 | except ValueError: 150 | return value 151 | return int(value) if value.is_integer() else value 152 | 153 | 154 | @click.group(context_settings=CONTEXT_SETTINGS, name=__file__) 155 | def cli(): 156 | """Command line tools for manipulating a Kedro project.""" 157 | 158 | 159 | @cli.command() 160 | @click.option( 161 | "--from-inputs", type=str, default="", help=FROM_INPUTS_HELP, callback=split_string 162 | ) 163 | @click.option( 164 | "--from-nodes", type=str, default="", help=FROM_NODES_HELP, callback=split_string 165 | ) 166 | @click.option( 167 | "--to-nodes", type=str, default="", help=TO_NODES_HELP, callback=split_string 168 | ) 169 | @click.option("--node", "-n", "node_names", type=str, multiple=True, help=NODE_ARG_HELP) 170 | @click.option( 171 | "--runner", "-r", type=str, default=None, multiple=False, help=RUNNER_ARG_HELP 172 | ) 173 | @click.option("--parallel", "-p", is_flag=True, multiple=False, help=PARALLEL_ARG_HELP) 174 | @click.option("--async", "is_async", is_flag=True, multiple=False, help=ASYNC_ARG_HELP) 175 | @env_option 176 | @click.option("--tag", "-t", type=str, multiple=True, help=TAG_ARG_HELP) 177 | @click.option( 178 | "--load-version", 179 | "-lv", 180 | type=str, 181 | multiple=True, 182 | help=LOAD_VERSION_HELP, 183 | callback=_reformat_load_versions, 184 | ) 185 | @click.option("--pipeline", type=str, default=None, help=PIPELINE_ARG_HELP) 186 | @click.option( 187 | "--config", 188 | "-c", 189 | type=click.Path(exists=True, dir_okay=False, resolve_path=True), 190 | help=CONFIG_FILE_HELP, 191 | callback=_config_file_callback, 192 | ) 193 | @click.option( 194 | "--params", type=str, default="", help=PARAMS_ARG_HELP, callback=_split_params 195 | ) 196 | def run( 197 | tag, 198 | env, 199 | parallel, 200 | runner, 201 | is_async, 202 | 
node_names, 203 | to_nodes, 204 | from_nodes, 205 | from_inputs, 206 | load_version, 207 | pipeline, 208 | config, 209 | params, 210 | ): 211 | """Run the pipeline.""" 212 | if parallel and runner: 213 | raise KedroCliError( 214 | "Both --parallel and --runner options cannot be used together. " 215 | "Please use either --parallel or --runner." 216 | ) 217 | runner = runner or "SequentialRunner" 218 | if parallel: 219 | runner = "ParallelRunner" 220 | runner_class = load_obj(runner, "kedro.runner") 221 | 222 | tag = _get_values_as_tuple(tag) if tag else tag 223 | node_names = _get_values_as_tuple(node_names) if node_names else node_names 224 | 225 | context = load_context(Path.cwd(), env=env, extra_params=params) 226 | context.run( 227 | tags=tag, 228 | runner=runner_class(is_async=is_async), 229 | node_names=node_names, 230 | from_nodes=from_nodes, 231 | to_nodes=to_nodes, 232 | from_inputs=from_inputs, 233 | load_versions=load_version, 234 | pipeline_name=pipeline, 235 | ) 236 | 237 | 238 | cli.add_command(pipeline_group) 239 | cli.add_command(catalog_group) 240 | cli.add_command(jupyter_group) 241 | 242 | for command in project_group.commands.values(): 243 | cli.add_command(command) 244 | 245 | 246 | if __name__ == "__main__": 247 | os.chdir(str(PROJ_PATH)) 248 | kedro_main() 249 | -------------------------------------------------------------------------------- /notebooks/Data GE Ejemplos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 3, 6 | "id": "77524b29", 7 | "metadata": {}, 8 | "outputs": [], 9 | "source": [ 10 | "import warnings\n", 11 | "warnings.filterwarnings('ignore')" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 2, 17 | "id": "f040bff5", 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import great_expectations as ge" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 4, 27 | 
"id": "ef9f576d", 28 | "metadata": {}, 29 | "outputs": [ 30 | { 31 | "name": "stdout", 32 | "output_type": "stream", 33 | "text": [ 34 | "2021-04-23 05:46:13,872 - kedro.io.data_catalog - INFO - Loading data from `netflix_titles` (CSVDataSet)...\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "df = catalog.load('netflix_titles')" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 5, 45 | "id": "dd039278", 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "data": { 50 | "text/html": [ 51 | "
[HTML table rendering stripped in export; see the text/plain output below]
" 162 | ], 163 | "text/plain": [ 164 | " show_id type title director \\\n", 165 | "0 s1 TV Show 3% NaN \n", 166 | "1 s2 Movie 7:19 Jorge Michel Grau \n", 167 | "2 s3 Movie 23:59 Gilbert Chan \n", 168 | "3 s4 Movie 9 Shane Acker \n", 169 | "4 s5 Movie 21 Robert Luketic \n", 170 | "\n", 171 | " cast country \\\n", 172 | "0 João Miguel, Bianca Comparato, Michel Gomes, R... Brazil \n", 173 | "1 Demián Bichir, Héctor Bonilla, Oscar Serrano, ... Mexico \n", 174 | "2 Tedd Chan, Stella Chung, Henley Hii, Lawrence ... Singapore \n", 175 | "3 Elijah Wood, John C. Reilly, Jennifer Connelly... United States \n", 176 | "4 Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar... United States \n", 177 | "\n", 178 | " date_added release_year rating duration \\\n", 179 | "0 August 14, 2020 2020 TV-MA 4 Seasons \n", 180 | "1 December 23, 2016 2016 TV-MA 93 min \n", 181 | "2 December 20, 2018 2011 R 78 min \n", 182 | "3 November 16, 2017 2009 PG-13 80 min \n", 183 | "4 January 1, 2020 2008 PG-13 123 min \n", 184 | "\n", 185 | " listed_in \\\n", 186 | "0 International TV Shows, TV Dramas, TV Sci-Fi &... \n", 187 | "1 Dramas, International Movies \n", 188 | "2 Horror Movies, International Movies \n", 189 | "3 Action & Adventure, Independent Movies, Sci-Fi... \n", 190 | "4 Dramas \n", 191 | "\n", 192 | " description \n", 193 | "0 In a future where the elite inhabit an island ... \n", 194 | "1 After a devastating earthquake hits Mexico Cit... \n", 195 | "2 When an army recruit is found dead, his fellow... \n", 196 | "3 In a postapocalyptic world, rag-doll robots hi... \n", 197 | "4 A brilliant group of students become card-coun... 
" 198 | ] 199 | }, 200 | "execution_count": 5, 201 | "metadata": {}, 202 | "output_type": "execute_result" 203 | } 204 | ], 205 | "source": [ 206 | "df.head()" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 6, 212 | "id": "05646361", 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "# gdf = ge.read_csv(\"my_data_directory/example.csv\")\n", 217 | "# gdf = ge.read_parquet ...\n", 218 | "# gdf = ge.read_excel ...\n", 219 | "# gdf = ge.from_pandas\n", 220 | "\n", 221 | "gdf = ge.from_pandas(df)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 12, 227 | "id": "9a44db99", 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "data": { 232 | "text/plain": [ 233 | "{\n", 234 | " \"result\": {\n", 235 | " \"element_count\": 150,\n", 236 | " \"unexpected_count\": 1,\n", 237 | " \"unexpected_percent\": 0.6666666666666667,\n", 238 | " \"unexpected_percent_total\": 0.6666666666666667,\n", 239 | " \"partial_unexpected_list\": []\n", 240 | " },\n", 241 | " \"meta\": {},\n", 242 | " \"exception_info\": {\n", 243 | " \"raised_exception\": false,\n", 244 | " \"exception_traceback\": null,\n", 245 | " \"exception_message\": null\n", 246 | " },\n", 247 | " \"success\": true\n", 248 | "}" 249 | ] 250 | }, 251 | "execution_count": 12, 252 | "metadata": {}, 253 | "output_type": "execute_result" 254 | } 255 | ], 256 | "source": [ 257 | "gdf.expect_column_values_to_not_be_null(['sepal_length'])" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": null, 263 | "id": "52a1a126", 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [] 267 | } 268 | ], 269 | "metadata": { 270 | "kernelspec": { 271 | "display_name": "minco", 272 | "language": "python", 273 | "name": "python3" 274 | }, 275 | "language_info": { 276 | "codemirror_mode": { 277 | "name": "ipython", 278 | "version": 3 279 | }, 280 | "file_extension": ".py", 281 | "mimetype": "text/x-python", 282 | "name": "python", 283 | 
"nbconvert_exporter": "python", 284 | "pygments_lexer": "ipython3", 285 | "version": "3.8.9" 286 | } 287 | }, 288 | "nbformat": 4, 289 | "nbformat_minor": 5 290 | } 291 | --------------------------------------------------------------------------------
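For reference, the one-vs-rest logistic regression implemented in `src/minco/pipelines/data_science/nodes.py` can be exercised standalone. The sketch below restates the training and prediction math from that file (bias column, per-class gradient descent, argmax over class scores); the toy dataset and the hyperparameter values are illustrative assumptions, not taken from `conf/base/parameters.yml`.

```python
import numpy as np


def _sigmoid(z):
    """Logistic function, as in nodes.py."""
    return 1 / (1 + np.exp(-z))


def train_model(X, Y, num_iter=2000, lr=0.5):
    """One-vs-rest logistic regression via plain gradient descent.

    X: (n_samples, n_features); Y: one-hot targets (n_samples, n_classes).
    Returns weights of shape (n_features + 1, n_classes), bias row first.
    """
    X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)  # add bias column
    weights = []
    for k in range(Y.shape[1]):  # one binary model per class
        theta = np.zeros(X.shape[1])
        y = Y[:, k]
        for _ in range(num_iter):
            h = _sigmoid(X @ theta)
            theta -= lr * (X.T @ (h - y)) / y.size  # gradient of the log-loss
        weights.append(theta)
    return np.vstack(weights).T


def predict(model, X):
    """Index of the highest-scoring class for each sample."""
    X = np.concatenate((np.ones((X.shape[0], 1)), X), axis=1)
    return np.argmax(_sigmoid(X @ model), axis=1)


# Toy, linearly separable data: small x -> class 0, large x -> class 1.
X = np.array([[0.0], [0.1], [0.9], [1.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

model = train_model(X, Y)
preds = predict(model, X)
print(preds.tolist())  # the training points separate cleanly: [0, 0, 1, 1]
```

In the actual pipeline these functions receive `pd.DataFrame` inputs plus the parameters dict from the config loader; `nodes.py` converts the frames with `.to_numpy()` before performing the same computation.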