├── PyPi Package
│   ├── src
│   │   ├── denmune
│   │   │   ├── .idea
│   │   │   │   ├── .name
│   │   │   │   ├── .gitignore
│   │   │   │   ├── misc.xml
│   │   │   │   ├── inspectionProfiles
│   │   │   │   │   └── profiles_settings.xml
│   │   │   │   ├── modules.xml
│   │   │   │   └── denmune.iml
│   │   │   ├── __init__.py
│   │   │   └── denmune.py
│   │   └── denmune.egg-info
│   │       ├── dependency_links.txt
│   │       ├── top_level.txt
│   │       ├── requires.txt
│   │       ├── SOURCES.txt
│   │       └── PKG-INFO
│   ├── MANIFEST.in
│   ├── .vscode
│   │   └── settings.json
│   ├── pyproject.toml
│   ├── setup.py
│   ├── setup.cfg
│   ├── LICENSE
│   └── README.md
├── src
│   ├── __init__.py
│   └── tests
│       └── test_denmune.py
├── images
│   ├── denmune-illustration.png
│   └── denmune_propagation.png
├── requirements.txt
├── codecov.yml
├── .github
│   └── workflows
│       ├── python-package
│       └── codeql.yml
├── LICENSE
├── .circleci
│   └── config.yml
├── colab
│   └── iris_dataset.ipynb
├── README.md
└── kaggle
    └── the-beauty-of-clusters-propagation.ipynb
/PyPi Package/src/denmune/.idea/.name:
--------------------------------------------------------------------------------
1 | denmune.py
--------------------------------------------------------------------------------
/PyPi Package/src/denmune.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | denmune
2 |
--------------------------------------------------------------------------------
/PyPi Package/MANIFEST.in:
--------------------------------------------------------------------------------
1 | recursive-include data *.ipynb *.py *.txt *.csv
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/__init__.py:
--------------------------------------------------------------------------------
1 |
2 | from .denmune import DenMune
--------------------------------------------------------------------------------
/PyPi Package/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |     "restructuredtext.confPath": ""
3 | }
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | from .denmune import DenMune
2 |
3 | __all__ = ["DenMune"]
4 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/images/denmune-illustration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-contrib/denmune-clustering-algorithm/main/images/denmune-illustration.png
--------------------------------------------------------------------------------
/images/denmune_propagation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/scikit-learn-contrib/denmune-clustering-algorithm/main/images/denmune_propagation.png
--------------------------------------------------------------------------------
/PyPi Package/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = [
3 |     "setuptools>=42",
4 |     "wheel"
5 | ]
6 | build-backend = "setuptools.build_meta"
--------------------------------------------------------------------------------
/PyPi Package/src/denmune.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.18.5
2 | pandas>=1.0.3
3 | matplotlib>=3.2.1
4 | scikit-learn>=0.22.1
5 | seaborn>=0.10.1
6 | ngt>=1.11.6
7 | anytree>=2.8.0
8 | treelib>=1.6.1
9 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/.idea/misc.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | ngt>=2.0.4
2 | numpy>=1.23.5
3 | pandas>=1.5.3
4 | matplotlib>=3.7.2
5 | scikit-learn>=1.2.2
6 | seaborn>=0.12.2
7 | anytree>=2.8
8 | treelib>=1.6.1
9 | pytest>=6.2.5
10 | coverage>=6.3.1
11 | treon
12 | testbook
13 | notebook
14 |
15 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/.idea/modules.xml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/PyPi Package/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 | 
3 | setup(
4 |     install_requires=[
5 |         'numpy==1.23.5',
6 |         'pandas==1.5.3',
7 |         'matplotlib==3.7.2',
8 |         'scikit-learn==1.2.2',
9 |         'seaborn==0.12.2',
10 |         'ngt==2.0.4',
11 |         'anytree==2.8',
12 |         'treelib==1.6.1',
13 |     ]
14 | )
15 | 
--------------------------------------------------------------------------------
/PyPi Package/src/denmune/.idea/denmune.iml:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
1 | LICENSE
2 | MANIFEST.in
3 | README.md
4 | pyproject.toml
5 | setup.cfg
6 | setup.py
7 | src/denmune/__init__.py
8 | src/denmune/denmune.py
9 | src/denmune.egg-info/PKG-INFO
10 | src/denmune.egg-info/SOURCES.txt
11 | src/denmune.egg-info/dependency_links.txt
12 | src/denmune.egg-info/requires.txt
13 | src/denmune.egg-info/top_level.txt
--------------------------------------------------------------------------------
/PyPi Package/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = denmune
3 | version = 0.0.96
4 | author = Mohamed Ali Abbas
5 | author_email = mohamed.alyabbas@outlook.com
6 | description = This is the package for the DenMune clustering algorithm, published in the paper https://doi.org/10.1016/j.patcog.2020.107589
7 | long_description = file: README.md
8 | long_description_content_type = text/markdown
9 | url = https://github.com/egy1st/denmune-clustering-algorithm
10 | project_urls =
11 |     Bug Tracker = https://github.com/pypa/sampleproject/issues
12 | classifiers =
13 |     Programming Language :: Python :: 3
14 |     License :: OSI Approved :: BSD License
15 |     Operating System :: OS Independent
16 | 
17 | [options]
18 | package_dir =
19 |     = src
20 | packages = find:
21 | python_requires = >=3.6
22 | 
23 | [options.packages.find]
24 | where = src
25 | 
--------------------------------------------------------------------------------
/codecov.yml:
--------------------------------------------------------------------------------
1 | codecov:
2 |   require_ci_to_pass: yes
3 | 
4 | coverage:
5 |   precision: 2
6 |   round: down
7 |   range: "70...100"
8 | 
9 |   status:
10 |     project:
11 |       default: false  # disable the default status that measures entire project
12 |       tests:  # declare a new status context "tests"
13 |         target: 100%  # we always want 100% coverage here
14 |         paths: "tests/"  # only include coverage in "tests/" folder
15 |       jupyter:  # declare a new status context "app"
16 |         paths: "!tests/"  # remove all files in "tests/"
17 | 
18 |         if_ci_failed: error  # success, failure, error, ignore
19 |         informational: true
20 | 
21 | parsers:
22 |   gcov:
23 |     branch_detection:
24 |       conditional: yes
25 |       loop: yes
26 |       method: no
27 |       macro: no
28 | 
29 | comment:
30 |   layout: "reach,diff,flags,files,footer"
31 |   behavior: default
32 |   require_changes: no
33 | 
--------------------------------------------------------------------------------
/.github/workflows/python-package:
--------------------------------------------------------------------------------
1 | name: workflow for codecov
2 | on: [push]
3 | jobs:
4 |   run:
5 |     runs-on: ${{ matrix.os }}
6 |     strategy:
7 |       matrix:
8 |         os: [ubuntu-latest]
9 |         python: ['3.6', '3.7', '3.8', '3.9']
10 |     env:
11 |       OS: ${{ matrix.os }}
12 |       PYTHON: ${{ matrix.python }}
13 |     steps:
14 |       - uses: actions/checkout@master
15 |       - name: Setup Python
16 |         uses: actions/setup-python@master
17 |         with:
18 |           python-version: 3.7
19 |       - name: Generate coverage report
20 |         run: |
21 |           pip install pytest
22 |           pip install pytest-cov
23 |           pip install numpy
24 |           pip install -U scikit-learn
25 |           pip install denmune
26 |           pytest --cov=./ --cov-report=xml
27 |       - name: Upload coverage to Codecov
28 |         uses: codecov/codecov-action@v2
29 |         with:
30 |           token: 'fce1be95-36c5-4c80-83c1-fe9fa8539dae'
31 |           files: ./coverage.xml
32 |           env_vars: OS,PYTHON
33 |           fail_ci_if_error: true
34 |           flags: unittests
35 |           name: codecov-umbrella
36 |           verbose: true
37 | 
--------------------------------------------------------------------------------
/PyPi Package/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2021, Mohamed Ali Abbas
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5 |
6 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
7 |
8 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
9 |
10 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
11 |
12 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2021, Mohamed Ali Abbas
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 |
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 | this list of conditions and the following disclaimer in the documentation
14 | and/or other materials provided with the distribution.
15 |
16 | 3. Neither the name of the copyright holder nor the names of its
17 | contributors may be used to endorse or promote products derived from
18 | this software without specific prior written permission.
19 |
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 |
--------------------------------------------------------------------------------
/.github/workflows/codeql.yml:
--------------------------------------------------------------------------------
1 | # For most projects, this workflow file will not need changing; you simply need
2 | # to commit it to your repository.
3 | #
4 | # You may wish to alter this file to override the set of languages analyzed,
5 | # or to provide custom queries or build logic.
6 | #
7 | # ******** NOTE ********
8 | # We have attempted to detect the languages in your repository. Please check
9 | # the `language` matrix defined below to confirm you have the correct set of
10 | # supported CodeQL languages.
11 | #
12 | name: "CodeQL"
13 | 
14 | on:
15 |   push:
16 |     branches: [ "main" ]
17 |   pull_request:
18 |     # The branches below must be a subset of the branches above
19 |     branches: [ "main" ]
20 |   schedule:
21 |     - cron: '45 0 * * 6'
22 | 
23 | jobs:
24 |   analyze:
25 |     name: Analyze
26 |     runs-on: ubuntu-latest
27 |     permissions:
28 |       actions: read
29 |       contents: read
30 |       security-events: write
31 | 
32 |     strategy:
33 |       fail-fast: false
34 |       matrix:
35 |         language: [ 'python' ]
36 |         # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
37 |         # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
38 | 
39 |     steps:
40 |       - name: Checkout repository
41 |         uses: actions/checkout@v3
42 | 
43 |       # Initializes the CodeQL tools for scanning.
44 |       - name: Initialize CodeQL
45 |         uses: github/codeql-action/init@v2
46 |         with:
47 |           languages: ${{ matrix.language }}
48 |           # If you wish to specify custom queries, you can do so here or in a config file.
49 |           # By default, queries listed here will override any specified in a config file.
50 |           # Prefix the list here with "+" to use these queries and those in the config file.
51 | 
52 |           # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
53 |           # queries: security-extended,security-and-quality
54 | 
55 | 
56 |       # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
57 |       # If this step fails, then you should remove it and run the build manually (see below)
58 |       - name: Autobuild
59 |         uses: github/codeql-action/autobuild@v2
60 | 
61 |       # ℹ️ Command-line programs to run using the OS shell.
62 |       # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
63 | 
64 |       # If the Autobuild fails above, remove it and uncomment the following three lines.
65 |       # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
66 | 
67 |       # - run: |
68 |       #     echo "Run, Build Application using script"
69 |       #     ./location_of_script_within_repo/buildscript.sh
70 | 
71 |       - name: Perform CodeQL Analysis
72 |         uses: github/codeql-action/analyze@v2
73 |         with:
74 |           category: "/language:${{matrix.language}}"
75 | 
--------------------------------------------------------------------------------
/.circleci/config.yml:
--------------------------------------------------------------------------------
1 | # Use the latest 2.1 version of CircleCI pipeline process engine.
2 | # See: https://circleci.com/docs/2.0/configuration-reference
3 | version: 2.1
4 | 
5 | # Orbs are reusable packages of CircleCI configuration that you may share across projects, enabling you to create encapsulated, parameterized commands, jobs, and executors that can be used across multiple projects.
6 | # See: https://circleci.com/docs/2.0/orb-intro/
7 | orbs:
8 |   # The python orb contains a set of prepackaged CircleCI configuration you can use repeatedly in your configuration files.
9 |   # Orb commands and jobs help you with common scripting around a language/tool
10 |   # so you don't have to copy and paste it everywhere.
11 |   # See the orb documentation here: https://circleci.com/developer/orbs/orb/circleci/python
12 |   codecov: codecov/codecov@3.0.0
13 |   slack: circleci/slack@4.4.4
14 |   python: circleci/python@2.1.1
15 | 
16 | 
17 | # Define a job to be invoked later in a workflow.
18 | # See: https://circleci.com/docs/2.0/configuration-reference/#jobs
19 | jobs:
20 |   build-and-test: # This is the name of the job, feel free to change it to better match what you're trying to do!
21 |     # These next lines define a Docker executor: https://circleci.com/docs/2.0/executor-types/
22 |     # You can specify an image from Dockerhub or use one of the convenience images from CircleCI's Developer Hub.
23 |     # A list of available CircleCI Docker convenience images is available here: https://circleci.com/developer/images/image/cimg/python
24 |     # The executor is the environment in which the steps below will be executed - below will use a Python 3.10 container.
25 |     # Change the version below to your required version of Python.
26 |     docker:
27 |       - image: cimg/python:3.10
28 | 
29 | 
30 |     # Checkout the code as the first step. This is a dedicated CircleCI step.
31 |     # The python orb's install-packages step will install the dependencies from a Pipfile via Pipenv by default.
32 |     # Here we're making sure we just use the system-wide pip. By default it uses the project root's requirements.txt.
33 |     # Then run your tests!
34 |     # CircleCI will report the results back to your VCS provider.
35 |     steps:
36 |       - checkout
37 |       - python/install-packages:
38 |           pkg-manager: pip
39 |           # app-dir: ~/project/package-directory/ # If your requirements.txt isn't in the root directory.
40 |           # pip-dependency-file: test-requirements.txt # If you have a different name for your requirements file, maybe one that combines your runtime and test requirements.
41 | 
42 |       - run:
43 |           name: Treon Test
44 |           command: |
45 |             cd colab
46 |             # git clone https://github.com/egy1st/datasets
47 |             # treon --threads=2
48 | 
49 |       - run:
50 |           name: CodeCov pyTest
51 |           command: |
52 |             coverage run -m pytest
53 |             coverage report
54 |             coverage html
55 |             coverage xml
56 |             cp coverage.xml htmlcov/coverage.xml
57 | 
58 |       - codecov/upload
59 | 
60 |       - store_artifacts:
61 |           path: htmlcov
62 | 
63 |       - slack/notify:
64 |           template: basic_success_1
65 |           channel: C0326UK1VFY
66 | # Invoke jobs via workflows
67 | # See: https://circleci.com/docs/2.0/configuration-reference/#workflows
68 | workflows:
69 |   Python-3.10: # This is the name of the workflow, feel free to change it to better match your workflow.
70 |     # Inside the workflow, you define the jobs you want to run.
71 |     jobs:
72 |       - build-and-test:
73 |           context: Slack
74 | 
--------------------------------------------------------------------------------
/src/tests/test_denmune.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from itertools import chain
3 | import pandas as pd
4 | import pytest
5 | from sklearn.datasets import make_blobs
6 | from sklearn.datasets import load_iris
7 | from src.denmune import DenMune
8 | 
9 | 
10 | # test DenMune's results
11 | X_cc, y_cc = make_blobs(
12 |     n_samples=1000,
13 |     centers=np.array([[-1, -1], [1, 1]]),
14 |     random_state=0,
15 |     shuffle=False,
16 |     cluster_std=0.5,
17 | )
18 | 
19 | knn = 10
20 | 
21 | def test_DenMune_results():
22 |     dm = DenMune(train_data=X_cc, train_truth=y_cc, k_nearest=knn)
23 |     labels, validity = dm.fit_predict(show_analyzer=False)
24 |     # This test uses data that are not perfectly separable, so the
25 |     # accuracy is not 1. Accuracy is around 0.90.
26 |     assert (np.mean(dm.labels_pred == y_cc) > 0.90) or (1 - np.mean(dm.labels_pred == y_cc) > 0.90)
27 | 
28 | 
29 | @pytest.mark.parametrize("train_data", [None, X_cc[:800]])
30 | @pytest.mark.parametrize("train_truth", [None, y_cc[:800]])
31 | @pytest.mark.parametrize("test_data", [None, X_cc[800:]])
32 | @pytest.mark.parametrize("test_truth", [None, y_cc[800:]])
33 | @pytest.mark.parametrize("validate", [True, False])
34 | @pytest.mark.parametrize("show_plots", [True, False])
35 | @pytest.mark.parametrize("show_noise", [True, False])
36 | @pytest.mark.parametrize("show_analyzer", [True, False])
37 | @pytest.mark.parametrize("prop_step", [0, 600])
38 | # all possible combinations of these parameters are tested; in total, 257 test cases are covered
39 | def test_parameters(train_data, train_truth, test_data, test_truth, validate, show_plots, show_noise, show_analyzer, prop_step):
40 |     if not (train_data is None):
41 |         if not (train_data is not None and train_truth is None and test_truth is not None):
42 |             if not (train_data is not None and test_data is not None and train_truth is None):
43 |                 if not (train_data is not None and train_truth is not None and test_truth is not None and test_data is None):
44 |                     dm = DenMune(train_data=train_data, train_truth=train_truth, test_data=test_data, test_truth=test_truth, k_nearest=10, prop_step=prop_step)
45 |                     labels, validity = dm.fit_predict(validate=validate, show_plots=show_plots, show_noise=show_noise, show_analyzer=show_analyzer)
46 |                     # This test uses data that are not perfectly separable, so the
47 |                     # accuracy is not 1. Accuracy is around 0.70.
48 |                     assert (np.mean(labels == y_cc) > 0.70 or (1 - np.mean(labels == y_cc) > 0.70))
49 | 
50 | 
51 | def test_DenMune_propagation():
52 |     snapshots = chain([0], range(2, 5), range(5, 50, 5), range(50, 100, 10), range(100, 500, 50), range(500, 1100, 100))
53 |     for snapshot in snapshots:
54 |         dm = DenMune(train_data=X_cc, k_nearest=knn, prop_step=snapshot)
55 |         labels, validity = dm.fit_predict(show_analyzer=False, show_plots=False)
56 |     # if the last snapshot iteration is 1000, this means we could propagate to the end properly
57 |     assert (snapshot == 1000)
58 | 
59 | # we are going to do some tests using iris data
60 | X_iris = load_iris()["data"]
61 | y_iris = load_iris()["target"]
62 | 
63 | # we test t-SNE reduction by applying it to the Iris dataset, which has 4 dimensions
64 | @pytest.mark.parametrize("file_2d", [None, 'iris_2d.csv'])
65 | @pytest.mark.parametrize("rgn_tsne", [True, False])
66 | def test_t_SNE(rgn_tsne, file_2d):
67 |     dm = DenMune(train_data=X_iris, train_truth=y_iris, k_nearest=knn, rgn_tsne=rgn_tsne, file_2d=file_2d)
68 |     labels, validity = dm.fit_predict(show_analyzer=False, show_plots=False)
69 |     assert (dm.data.shape[1] == 2)  # this means it was reduced properly to 2-D using t-SNE
70 | 
71 | def test_knn():
72 |     for k in range(5, 55, 5):
73 |         dm = DenMune(train_data=X_iris, train_truth=y_iris, k_nearest=k, rgn_tsne=False)
74 |         labels, validity = dm.fit_predict(show_analyzer=False, show_plots=False)
75 |     # assert (k == 50)  # this means we tested that the algorithm works fine with several knn inputs
76 | 
77 | 
78 | data_file = 'https://raw.githubusercontent.com/egy1st/datasets/dd90854f92cb5ef73b4146606c1c158c32e69b94/denmune/shapes/aggr_rand.csv'
79 | data = pd.read_csv(data_file, sep=',', header=None)
80 | labels = data.iloc[:, -1]
81 | data = data.drop(data.columns[-1], axis=1)
82 | train_data = data[:555]
83 | test_data = data[555:]
84 | train_labels = labels[:555]
85 | test_labels = labels[555:]
86 | 
87 | # check that data is treated correctly when it comes as a dataframe
88 | def test_dataframe():
89 |     knn = 11  # k-nearest neighbor, the only parameter required by the algorithm
90 |     dm = DenMune(train_data=train_data, train_truth=train_labels, test_data=test_data, test_truth=test_labels, k_nearest=knn, rgn_tsne=True)
91 |     labels, validity = dm.fit_predict(validate=True, show_noise=True, show_analyzer=True)
92 |     assert (np.mean(dm.labels_pred == labels) > 0.97 or (1 - np.mean(dm.labels_pred == labels) > 0.97))
93 | 
94 | 
95 | def test_exceptions():
96 | 
97 |     with pytest.raises(Exception) as execinfo:
98 |         dm = DenMune(train_data=None, k_nearest=10)
99 |         # labels, validity = dm.fit_predict()
100 |         # raise Exception('train data is None')
101 | 
102 |     with pytest.raises(Exception) as execinfo:
103 |         dm = DenMune(train_data=train_data, test_truth=test_labels, k_nearest=10)
104 |         # labels, validity = dm.fit_predict()
105 |         # raise Exception('train_data is not None and train_truth is None and test_truth is not None')
106 | 
107 |     with pytest.raises(Exception) as execinfo:
108 |         dm = DenMune(train_data=train_data, test_data=test_data, k_nearest=10)
109 |         # labels, validity = dm.fit_predict()
110 |         # raise Exception('train_data is not None and test_data is not None and train_truth is None')
111 | 
112 |     with pytest.raises(Exception) as execinfo:
113 |         dm = DenMune(train_data=train_data, train_truth=train_labels, test_truth=test_labels, test_data=None, k_nearest=10)
114 |         # labels, validity = dm.fit_predict()
115 |         # raise Exception('train_data is not None and train_truth is not None and test_truth is not None and test_data is None')
116 | 
117 |     with pytest.raises(Exception) as execinfo:
118 |         dm = DenMune(train_data=train_data, train_truth=train_labels, k_nearest=0)  # default value for k_nearest is 1, which is valid
119 |         # labels, validity = dm.fit_predict()
120 |         # raise Exception('k-nearest neighbor should be at least 1')
121 | 
--------------------------------------------------------------------------------
/PyPi Package/README.md:
--------------------------------------------------------------------------------
1 | DenMune: A density-peak clustering algorithm
2 | =============================================
3 |
4 | DenMune is a clustering algorithm that can find clusters of arbitrary size, shape, and density in two dimensions. Higher-dimensional data are first reduced to 2-D using t-SNE. The algorithm relies on a single parameter, K (the number of nearest neighbors). The results show the superiority of the algorithm. Enjoy the simplicity and the power of DenMune.
5 |
6 |
7 | [PyPI](https://pypi.org/project/denmune/)
8 | [Binder](https://mybinder.org/v2/gh/egy1st/denmune-clustering-algorithm/HEAD)
9 | [Documentation](https://denmune.readthedocs.io/en/latest/?badge=latest)
10 | [Colab](#colab)
11 | [Kaggle](https://www.kaggle.com/egyfirst/denmune-clustering-iris-dataset?scriptVersionId=84775816)
12 | [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0031320320303927)
13 | [Mendeley Data](https://data.mendeley.com/datasets/b73cw5n43r/4)
14 | [BSD 3-Clause License](https://choosealicense.com/licenses/bsd-3-clause/)
15 | [CircleCI](https://circleci.com/gh/egy1st/denmune-clustering-algorithm/tree/main)
16 | [Codecov](https://codecov.io/gh/egy1st/denmune-clustering-algorithm)
17 |
18 | Based on the paper
19 | -------------------
20 |
21 | | Paper | Journal |
22 | |-------|---------|
23 | | Mohamed Abbas, Adel El-Zoghabi, Amin Shoukry, *DenMune: Density peak based clustering using mutual nearest neighbors*, volume 109, number 107589, January 2021. DOI: https://doi.org/10.1016/j.patcog.2020.107589 | Pattern Recognition, Elsevier |
28 |
29 | Documentation:
30 | ---------------
31 | Documentation, including tutorials, is available at https://denmune.readthedocs.io
32 | 
33 | [Read the Docs](https://denmune.readthedocs.io/en/latest/?badge=latest)
34 |
35 |
36 | Watch it in action
37 | -------------------
38 | These 30 seconds will show you how DenMune, a density-based algorithm, propagates:
39 | 
40 | [Watch it on Colab](https://colab.research.google.com/drive/1o-tP3uvDGjxBOGYkir1lnbr74sZ06e0U?usp=sharing)
43 |
44 |
45 |
46 | When less means more
47 | --------------------
48 | Most classic clustering algorithms fail to detect complex clusters that differ in size, shape, and density, or that exist in noisy data.
49 | Recently, a density-based algorithm named DenMune showed a great ability to detect complex shapes even in noisy data. It can detect the number of clusters automatically, and it detects and removes both pre-identified and post-identified noise automatically.
50 | 
51 | It can achieve accuracy reaching 100% in most classic pattern problems, and 97% on the MNIST dataset. A great advantage of this algorithm is that it is a single-parameter algorithm: all you need is to set the number of k-nearest neighbors, and the algorithm takes care of the rest. Being non-sensitive to changes in k makes it robust and stable.
52 | 
53 | Keep in mind that the algorithm initially reduces any N-D dataset to a 2-D dataset, so a good benefit is that you can always plot your data and explore it, which makes this algorithm a good candidate for data exploration. Finally, the algorithm comes with a neat package for visualizing data, validating results, and analyzing the whole clustering process.
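
Since higher-dimensional data is reduced to 2-D before clustering, the effect can be illustrated with scikit-learn's t-SNE alone. This is a sketch for illustration only; DenMune runs its own t-SNE step internally (see the `rgn_tsne` and `file_2d` parameters below).

```python
# Illustrative sketch: reduce the 4-D Iris dataset to 2-D with t-SNE,
# the same kind of reduction DenMune performs internally before clustering.
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X = load_iris()["data"]  # shape (150, 4)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_2d.shape)  # (150, 2) -- now plottable and ready for 2-D clustering
```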
54 |
55 | How to install DenMune
56 | ------------------------
57 | Simply install the DenMune clustering algorithm using the pip command from the official Python repository:
58 |
59 | []( https://pypi.org/project/denmune/)
60 |
61 | From the shell run the command
62 |
63 | ```shell
64 | pip install denmune
65 | ```
66 |
67 | From a Jupyter notebook cell, run the command
68 |
69 | ```ipython3
70 | !pip install denmune
71 | ```
72 |
73 | How to use DenMune
74 | --------------------
75 | Once DenMune is installed, you just need to import it
76 |
77 | ```python
78 | from denmune import DenMune
79 | ```
80 | ###### Please note that the first `denmune` (the package) is in lower case, while the second `DenMune` (the class itself) has D and M in upper case.
81 |
82 |
83 | Read data
84 | -----------
85 |
86 | There are four possible cases of data:
87 | - only train data without labels
88 | - only labeled train data
89 | - labeled train data in addition to test data without labels
90 | - labeled train data in addition to labeled test data
91 |
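
The constructor enforces which of these combinations are valid, as the repository's `test_exceptions` suite implies. A minimal sketch of those rules, using a hypothetical helper `valid_input` that is not part of the DenMune API:

```python
# Hypothetical helper (not part of DenMune) summarizing which argument
# combinations the constructor accepts, as implied by test_exceptions
# in src/tests/test_denmune.py.
def valid_input(train_data, train_truth=None, test_data=None, test_truth=None, k_nearest=1):
    if train_data is None:
        return False  # train data is mandatory
    if k_nearest < 1:
        return False  # k-nearest neighbor should be at least 1
    if test_data is not None and train_truth is None:
        return False  # test data requires labeled train data
    if test_truth is not None and (train_truth is None or test_data is None):
        return False  # test labels require both train labels and test data
    return True

# the four supported cases
assert valid_input([[0, 0]])                                               # train only
assert valid_input([[0, 0]], train_truth=[0])                              # labeled train
assert valid_input([[0, 0]], train_truth=[0], test_data=[[1, 1]])          # + test data
assert valid_input([[0, 0]], train_truth=[0], test_data=[[1, 1]], test_truth=[1])
```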
92 |
93 | ```python
94 | #=============================================
95 | # First scenario: train data without labels
96 | # ============================================
97 |
98 | data_path = 'datasets/denmune/chameleon/'
99 | dataset = "t7.10k.csv"
100 | data_file = data_path + dataset
101 |
102 | # train data without labels
103 | X_train = pd.read_csv(data_file, sep=',', header=None)
104 |
105 | knn = 39 # k-nearest neighbor, the only parameter required by the algorithm
106 |
107 | dm = DenMune(train_data=X_train, k_nearest=knn)
108 | labels, validity = dm.fit_predict(show_analyzer=False, show_noise=True)
109 |
110 | ```
111 | This is an intuitive dataset for which no ground truth is provided
112 |
113 | 
114 |
115 | ```python
116 | #=============================================
117 | # Second scenario: train data with labels
118 | # ============================================
119 |
120 | data_path = 'datasets/denmune/shapes/'
121 | dataset = "aggregation.csv"
122 | data_file = data_path + dataset
123 |
124 | # train data with labels
125 | X_train = pd.read_csv(data_file, sep=',', header=None)
126 | y_train = X_train.iloc[:, -1]
127 | X_train = X_train.drop(X_train.columns[-1], axis=1)
128 |
129 | knn = 6 # k-nearest neighbor, the only parameter required by the algorithm
130 |
131 | dm = DenMune(train_data=X_train, train_truth= y_train, k_nearest=knn)
132 | labels, validity = dm.fit_predict(show_analyzer=False, show_noise=True)
133 | ```
134 | Dataset ground truth
135 |
136 | 
137 |
138 | Dataset as detected by DenMune at k=6
139 |
140 | 
141 |
142 |
143 | ```python
144 | #=================================================================
145 | # Third scenario: train data with labels in addition to test data
146 | # ================================================================
147 |
148 | data_path = 'datasets/denmune/pendigits/'
149 | file_2d = data_path + 'pendigits-2d.csv'
150 |
151 | # train data with labels
152 | X_train = pd.read_csv(data_path + 'train.csv', sep=',', header=None)
153 | y_train = X_train.iloc[:, -1]
154 | X_train = X_train.drop(X_train.columns[-1], axis=1)
155 |
156 | # test data without labels
157 | X_test = pd.read_csv(data_path + 'test.csv', sep=',', header=None)
158 | X_test = X_test.drop(X_test.columns[-1], axis=1)
159 |
160 | knn = 50 # k-nearest neighbor, the only parameter required by the algorithm
161 |
162 | dm = DenMune(train_data=X_train, train_truth= y_train,
163 | test_data= X_test,
164 | k_nearest=knn)
165 | labels, validity = dm.fit_predict(show_analyzer=True, show_noise=True)
166 | ```
167 | Dataset ground truth
168 |
169 | 
170 |
171 |
172 | Dataset as detected by DenMune at k=50
173 |
174 | 
175 |
176 | Test data as predicted by DenMune after training on the dataset at k=50
177 |
178 | 
179 |
180 |
181 | Algorithm's Parameters
182 | -----------------------
183 | 1. Parameters used within the initialization of the DenMune class
184 |
185 | ```python
186 | def __init__ (self,
187 | train_data=None, test_data=None,
188 | train_truth=None, test_truth=None,
189 | file_2d =None, k_nearest=None,
190 | rgn_tsne=False, prop_step=0,
191 | ):
192 | ```
193 |
194 | - train_data:
195 | - data used for training the algorithm
196 | - default: None. It must be provided by the user; otherwise an error will be raised.
197 |
198 | - train_truth:
199 | - labels of training data
200 | - default: None
201 |
202 | - test_data:
203 | - data used for testing the algorithm
204 |
205 | - test_truth:
206 | - labels of testing data
207 | - default: None
208 |
209 | - k_nearest:
210 | - number of nearest neighbors
211 | - default: None. The default is invalid; a value must be provided, and k_nearest should be at least 1.
212 |
213 | - rgn_tsne:
214 | - when set to True: the reduced 2-D version of the N-D dataset is regenerated each time the algorithm runs.
215 | - when set to False: the reduced 2-D version is generated on the first run only; afterwards the saved file is reused.
216 | - default: False
217 |
218 | - file_2d: name (including location) of the file used to save/load the reduced 2-D version
219 | - if empty: the algorithm will create a temporary file named '_temp_2d'
220 | - default: None
221 |
222 | - prop_step:
223 | - size of the increment used when showing the clustering propagation.
224 | - leave this parameter at 0, the default, unless you intentionally want to enter propagation mode.
225 | - default: 0
226 |
227 |
228 | 2. Parameters used within the fit_predict function:
229 |
230 | ```python
231 | def fit_predict(self,
232 | validate=True,
233 | show_plots=True,
234 | show_noise=True,
235 | show_analyzer=True
236 | ):
237 | ```
238 |
239 | - validate:
240 | - turn validation on/off; five measures are integrated with DenMune (Accuracy, F1-score, NMI index, AMI index, ARI index)
241 | - default: True
242 |
243 | - show_plots:
244 | - show/hide plotting of data
245 | - default: True
246 |
247 | - show_noise:
248 | - show/hide noise and outliers
249 | - default: True
250 |
251 | - show_analyzer:
252 | - show/hide the analyzer
253 | - default: True
254 |
255 | The Analyzer
256 | -------------
257 |
258 | The algorithm provides an intuitive tool called the analyzer; once invoked, it gives you an in-depth analysis of how your clustering performed.
259 |
260 | 
261 |
262 | Noise Detection
263 | ----------------
264 |
265 | DenMune detects noise and outliers automatically; no further work is needed on your side.
266 |
267 | - It plots pre-identified noise in black
268 | - It plots post-identified noise in light grey
269 |
270 | You can hide noise by setting the show_noise parameter to False.
271 |
272 |
273 | ```python
274 |
275 | # let us show noise
276 |
277 | dm = DenMune(train_data=X_train, k_nearest=knn)
278 | labels, validity = dm.fit_predict(show_noise=True)
279 | ```
280 |
281 | ```python
282 |
283 | # let us show clean data by removing noise
284 |
285 | dm = DenMune(train_data=X_train, k_nearest=knn)
286 | labels, validity = dm.fit_predict(show_noise=False)
287 | ```
288 |
289 | | noisy data | clean data |
290 | ----------| ---------------------------------------------------------------------------------------------------|
291 | |  |  |
292 |
293 |
294 | Validation
295 | --------------
296 | You can get your validation results in three ways:
297 |
298 | - by showing the Analyzer
299 | - by extracting values from the validity list returned by the fit_predict function
300 | - by extracting values from the Analyzer dictionary
301 |
302 | There are five validity measures built into the algorithm:
303 |
304 | - ACC, Accuracy
305 | - F1 score
306 | - NMI index (Normalized Mutual Information)
307 | - AMI index (Adjusted Mutual Information)
308 | - ARI index (Adjusted Rand Index)
309 |
310 | 
311 |
312 | K-nearest Evolution
313 | -------------------
314 | The following chart shows how both pre- and post-identified noise evolve as the number of nearest neighbors (k) increases. The number of detected clusters is analyzed in the same chart in relation to both types of identified noise.
315 |
316 | 
317 |
318 |
319 | The Scalability
320 | ----------------
321 | | data size | time (seconds) |
322 | |-----------|----------------|
323 | | 5000 | 2.3139 |
324 | | 10000 | 5.8752 |
325 | | 15000 | 12.4535 |
326 | | 20000 | 18.8466 |
327 | | 25000 | 28.992 |
328 | | 30000 | 39.3166 |
329 | | 35000 | 39.4842 |
330 | | 40000 | 63.7649 |
331 | | 45000 | 73.6828 |
332 | | 50000 | 86.9194 |
333 | | 55000 | 90.1077 |
334 | | 60000 | 125.0228 |
335 | | 65000 | 149.1858 |
336 | | 70000 | 177.4184 |
337 | | 75000 | 220.502 |
338 | | 80000 | 220.502 |
339 | | 85000 | 251.7625 |
340 | | 100000 | 257.563 |
341 |
342 | 
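Timings like these can be reproduced with a small harness. The sketch below times an arbitrary clustering callable on growing synthetic datasets; the grid_labels stand-in is only a placeholder to keep the example self-contained, and in practice you would time a DenMune fit_predict call instead.

```python
import time
import numpy as np

def time_clusterer(cluster_fn, sizes, dim=2, seed=0):
    """Return (size, seconds) pairs for running cluster_fn on random data."""
    rng = np.random.default_rng(seed)
    timings = []
    for n in sizes:
        X = rng.random((n, dim))
        start = time.perf_counter()
        cluster_fn(X)
        timings.append((n, time.perf_counter() - start))
    return timings

def grid_labels(X):
    """Placeholder clusterer: bucket points by their first coordinate."""
    return (X[:, 0] * 10).astype(int)

for n, secs in time_clusterer(grid_labels, [1000, 2000, 4000]):
    print(f"data size: {n:5d}  time: {secs:.4f} seconds")
```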
343 |
344 | The Stability
345 | --------------
346 |
347 | The algorithm takes only a single parameter, and it is not even sensitive to changes in that parameter, k, as you can see from the following chart. This is of great benefit to you as a data exploration analyst: you can simply explore the dataset using an arbitrary k. Being insensitive to changes in k makes the algorithm robust and stable.
348 |
349 | 
350 |
351 |
352 | Reveal the propagation
353 | -----------------------
354 |
355 | One of the most compelling features of this algorithm is that it lets you watch how your clusters propagate to construct the final output clusters.
356 | Just use the 'prop_step' parameter, as in the following example:
357 |
358 | ```python
359 | dataset = "t7.10k"
360 | data_path = 'datasets/denmune/chameleon/'
361 |
362 | # train file
363 | data_file = data_path + dataset +'.csv'
364 | X_train = pd.read_csv(data_file, sep=',', header=None)
365 |
366 |
367 | from itertools import chain
368 |
369 | # DenMune's parameters
370 | knn = 39 # number of k-nearest neighbor, the only parameter required by the algorithm
371 |
372 | # create a list of different snapshots of the propagation
373 | snapshots = chain(range(2,5), range(5,50,10), range(50, 100, 25), range(100,500,100), range(500,2000, 250), range(1000,5500, 500))
374 |
375 | from IPython.display import clear_output
376 | for snapshot in snapshots:
377 |     print("iteration", snapshot)
378 |     clear_output(wait=True)
379 |     dm = DenMune(train_data=X_train, k_nearest=knn, rgn_tsne=False, prop_step=snapshot)
380 |     labels, validity = dm.fit_predict(show_analyzer=False, show_noise=False)
381 | ```
382 |
383 | []()
384 |
385 | Interact with the algorithm
386 | ---------------------------
387 | [](https://colab.research.google.com/drive/1EUROd6TRwxW3A_XD3KTxL8miL2ias4Ue?usp=sharing)
388 |
389 | This notebook allows you to interact with the algorithm in many aspects:
390 | - you can choose which dataset to cluster (among 4 chameleon datasets)
391 | - you can decide which number of k-nearest neighbor to use
392 | - show noise on/off, so you can investigate the noise detected by the algorithm
393 | - show analyzer on/off
394 |
395 | How to run and test
396 | --------------------
397 |
398 | 1. Launch Examples in Repo2Docker Binder
399 |
400 | Simply use our repo2docker image offered by mybinder.org, which encapsulates the algorithm and all required data in one virtual machine instance. All Jupyter notebook examples found in this repository are also available for you to practice with in this repo2docker. Thanks, mybinder.org, for making this possible!
401 |
402 | [](https://mybinder.org/v2/gh/egy1st/denmune-clustering-algorithm/HEAD)
403 |
404 | 2. Launch each Example in Kaggle workspace
405 |
406 | If you are a Kaggler like me, then Kaggle, the workspace where data scientists meet, is a great place to test the algorithm.
407 |
408 | | Dataset | Kaggle URL |
409 | ----------| ---------------------------------------------------------------------------------------------------|
410 | |When less means more - kaggle |[]( https://www.kaggle.com/egyfirst/when-less-means-more) |
411 | |Non-groundtruth datasets - kaggle|[](https://www.kaggle.com/egyfirst/detecting-non-groundtruth-datasets) |
412 | |2D Shape datasets - kaggle|[](https://www.kaggle.com/egyfirst/detection-of-2d-shape-datasets) |
413 | |MNIST dataset kaggle|[](https://www.kaggle.com/egyfirst/get-97-using-simple-yet-one-parameter-algorithm) |
414 | |Iris dataset kaggle| [](https://www.kaggle.com/egyfirst/denmune-clustering-iris-dataset) |
415 | |Training MNIST to get 97%| []( https://www.kaggle.com/egyfirst/training-mnist-dataset-to-get-97) |
416 | |Noise detection - kaggle| []( https://www.kaggle.com/egyfirst/noise-detection) |
417 | |Validation - kaggle| [](https://www.kaggle.com/egyfirst/validate-in-5-built-in-validity-insexes) |
418 | |The beauty of propagation - kaggle| [](https://www.kaggle.com/egyfirst/the-beauty-of-clusters-propagation) |
419 | |The beauty of propagation part2 - kaggle | [](https://www.kaggle.com/egyfirst/the-beauty-of-propagation-part2) |
420 | |Snapshots of propagation -kaggle| [](https://www.kaggle.com/egyfirst/beauty-of-propagation-part3) |
421 | |Scalability kaggle| [](https://www.kaggle.com/egyfirst/scalability-vs-speed) |
422 | |Stability - kaggle| [](https://www.kaggle.com/egyfirst/stability-vs-number-of-nearest-neighbor) |
423 | |k-nearest-evolution - kaggle| [](https://www.kaggle.com/egyfirst/k-nearest-evolution) |
424 |
425 | 3. Launch each Example in Google Research, CoLab
426 |
427 | Need to test the examples one by one? Here is another option: use Colab, offered by Google Research, to run each example individually.
428 |
429 |
430 |
431 | Here is a list of Google CoLab URL to use the algorithm interactively
432 | ----------------------------------------------------------------------
433 |
434 |
435 | | Dataset | CoLab URL |
436 | ----------| ---------------------------------------------------------------------------------------------------|
437 | |How to use it - colab|[]( https://colab.research.google.com/drive/1J_uKdhZ3z1KeY0-wJ7Ruw2PZSY1orKQm)|
438 | |Chameleon datasets - colab|[](https://colab.research.google.com/drive/1EUROd6TRwxW3A_XD3KTxL8miL2ias4Ue?usp=sharing) |
439 | |2D Shape datasets - colab|[]( https://colab.research.google.com/drive/1EaqTPCRHSuTKB-qEbnWHpGKFj6XytMIk?usp=sharing) |
440 | |MNIST dataset - colab|[](https://colab.research.google.com/drive/1a9FGHRA6IPc5jhLOV46iEbpUeQXptSJp?usp=sharing) |
441 | |iris dataset - colab|[](https://colab.research.google.com/drive/1nKql57Xh7xVVu6NpTbg3vRdRg42R7hjm?usp=sharing) |
442 | |Get 97% by training MNIST dataset - colab|[]( https://colab.research.google.com/drive/1NeOtXEQY94oD98Ufbh3IhTHnnYwIA659) |
443 | |Non-groundtruth datasets - colab|[]( https://colab.research.google.com/drive/1d17ejQ83aUy0CZIeQ7bHTugSC9AjJ2mU?usp=sharing) |
444 | |Noise detection - colab|[]( https://colab.research.google.com/drive/1Bp3c-cJfjLWxupmrBJ_6Q4-nqIfZcII4) |
445 | |Validation - colab|[]( https://colab.research.google.com/drive/13_EVaQOv_QiNmQiMWJAcFFHPJHGCrQLe) |
446 | |How it propagates - colab|[](https://colab.research.google.com/drive/1o-tP3uvDGjxBOGYkir1lnbr74sZ06e0U?usp=sharing)|
447 | |Snapshots of propagation - colab|[](https://colab.research.google.com/drive/1vPXNKa8Rf3TnqDHSD3YSWl3g1iNSqjl2?usp=sharing)|
448 | |Scalability - colab|[](https://colab.research.google.com/drive/1d55wkBndLLapO7Yx1ePHhE8mL61j9-TH?usp=sharing)|
449 | |Stability vs number of nearest neighbors - colab|[](https://colab.research.google.com/drive/17VgVRMFBWvkSIH1yA3tMl6UQ7Eu68K2l?usp=sharing)|
450 | |k-nearest-evolution - colab|[]( https://colab.research.google.com/drive/1DZ-CQPV3WwJSiaV3-rjwPwmXw4RUh8Qj)|
451 |
452 |
453 |
454 | How to cite
455 | =====
456 | If you have used this codebase in a scientific publication and wish to cite it, please use the [Pattern Recognition article](https://www.sciencedirect.com/science/article/abs/pii/S0031320320303927)
457 |
458 | Mohamed Abbas, Adel El-Zoghabi, Amin Shoukry, *DenMune: Density peak based clustering using mutual nearest neighbors*
459 | In: Pattern Recognition, Elsevier, volume 109, article 107589.
460 | January 2021
461 |
462 |
463 | ```bib
464 | @article{ABBAS2021107589,
465 | title = {DenMune: Density peak based clustering using mutual nearest neighbors},
466 | journal = {Pattern Recognition},
467 | volume = {109},
468 | pages = {107589},
469 | year = {2021},
470 | issn = {0031-3203},
471 | doi = {https://doi.org/10.1016/j.patcog.2020.107589},
472 | url = {https://www.sciencedirect.com/science/article/pii/S0031320320303927},
473 | author = {Mohamed Abbas and Adel El-Zoghabi and Amin Shoukry},
474 | keywords = {Clustering, Mutual neighbors, Dimensionality reduction, Arbitrary shapes, Pattern recognition, Nearest neighbors, Density peak},
475 | abstract = {Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm “DenMune” is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, besides obeying the mutual nearest neighbor consistency principle. The algorithm is stable for a wide range of values of K. Moreover, it is able to automatically detect and remove noise from the clustering process as well as detecting the target clusters. It produces robust results on various low and high dimensional datasets relative to several known state of the art clustering algorithms.}
476 | }
477 | ```
478 |
479 | Licensing
480 | ------------
481 |
482 | The DenMune algorithm is 3-clause BSD licensed. Enjoy.
483 |
484 | [](https://choosealicense.com/licenses/bsd-3-clause/)
485 |
486 |
487 | Task List
488 | ------------
489 |
490 | - [x] Update GitHub with the DenMune source code
491 | - [x] Create repo2docker repository
492 | - [x] Create pip package
493 | - [x] Create CoLab shared examples
494 | - [x] Create documentation
495 | - [x] Create Kaggle shared examples
496 | - [x] PEP8 compliant
497 | - [x] Continuous integration
498 | - [x] scikit-learn compatible
499 | - [x] Unit tests (coverage: 100%)
500 | - [ ] Create conda package
501 |
502 |
--------------------------------------------------------------------------------
/PyPi Package/src/denmune.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
1 | Metadata-Version: 2.1
2 | Name: denmune
3 | Version: 0.0.96
4 | Summary: This is the package for DenMune Clustering Algorithm published in paper https://doi.org/10.1016/j.patcog.2020.107589
5 | Home-page: https://github.com/egy1st/denmune-clustering-algorithm
6 | Author: Mohamed Ali Abbas
7 | Author-email: mohamed.alyabbas@outlook.com
8 | License: UNKNOWN
9 | Project-URL: Bug Tracker, https://github.com/pypa/sampleproject/issues
10 | Platform: UNKNOWN
11 | Classifier: Programming Language :: Python :: 3
12 | Classifier: License :: OSI Approved :: BSD License
13 | Classifier: Operating System :: OS Independent
14 | Requires-Python: >=3.6
15 | Description-Content-Type: text/markdown
16 | License-File: LICENSE
17 |
18 | DenMune: A density-peak clustering algorithm
19 | =============================================
20 |
21 | DenMune is a clustering algorithm that can find clusters of arbitrary size, shape and density in two dimensions. Higher-dimensional data are first reduced to 2-D using t-SNE. The algorithm relies on a single parameter K (the number of nearest neighbors). The results show the superiority of the algorithm. Enjoy the simplicity and the power of DenMune.
22 |
23 |
24 | []( https://pypi.org/project/denmune/)
25 | [](https://mybinder.org/v2/gh/egy1st/denmune-clustering-algorithm/HEAD)
26 | [](https://denmune.readthedocs.io/en/latest/?badge=latest)
27 | [](#colab)
28 | [](https://www.kaggle.com/egyfirst/denmune-clustering-iris-dataset?scriptVersionId=84775816)
29 | [](https://www.sciencedirect.com/science/article/abs/pii/S0031320320303927)
30 | [](https://data.mendeley.com/datasets/b73cw5n43r/4)
31 | [](https://choosealicense.com/licenses/bsd-3-clause/)
32 | [](https://circleci.com/gh/egy1st/denmune-clustering-algorithm/tree/main)
33 | [](https://codecov.io/gh/egy1st/denmune-clustering-algorithm)
34 | [](https://github.com/egy1st/denmune-clustering-algorithm/actions/workflows/python-package.yml)
35 |
36 | Based on the paper
37 | -------------------
38 |
39 | |Paper|Journal|
40 | |-------------------------------------------------------------------------------------------|-----------------------------|
41 | |Mohamed Abbas, Adel El-Zoghabi, Amin Shoukry,