├── .gitignore
├── .travis.yml
├── AUTHORS
├── CHANGELOG
├── INSTALL
├── LICENSE
├── MANIFEST.in
├── README.rst
├── cluster.bmp
├── cluster
    ├── __init__.py
    ├── cluster.py
    ├── linkage.py
    ├── matrix.py
    ├── method
    │   ├── __init__.py
    │   ├── base.py
    │   ├── hierarchical.py
    │   └── kmeans.py
    ├── test
    │   ├── test_hierarchical.py
    │   ├── test_kmeans.py
    │   ├── test_linkage.py
    │   └── test_numpy.py
    ├── util.py
    └── version.txt
├── dev-requirements.txt
├── docs
    ├── Makefile
    ├── _static
    │   └── .gitkeep
    ├── apidoc
    │   ├── cluster.matrix.rst
    │   ├── cluster.method.base.rst
    │   ├── cluster.method.hierarchical.rst
    │   ├── cluster.method.kmeans.rst
    │   ├── cluster.rst
    │   └── cluster.util.rst
    ├── changelog.rst
    ├── conf.py
    └── index.rst
├── fabfile.py
├── makedist.sh
├── pytest.ini
├── setup.cfg
├── setup.py
└── tox.ini


/.gitignore:
--------------------------------------------------------------------------------
 1 | *.pyc
 2 | /*.egg-info
 3 | /.cache
 4 | /.pytest_cache
 5 | /.tox
 6 | /MANIFEST
 7 | /build
 8 | /dist
 9 | /docs/_build
10 | /env
11 | /env3
12 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | arch:
 2 |   - amd64
 3 |   - ppc64le
 4 | language: python
 5 | python:
 6 |   - "2.7"
 7 |   - "3.5"
 8 |   - "3.6"
 9 |   - "3.7"
10 |   - "3.8"
11 |   - "3.9"
12 |   - "nightly"
13 | install: pip install tox-travis
14 | script: "tox"
15 | 


--------------------------------------------------------------------------------
/AUTHORS:
--------------------------------------------------------------------------------
1 | Michel Albert (exhuma@users.sourceforge.net)
2 | Sam Sandberg (@LoisaidaSam)


--------------------------------------------------------------------------------
/CHANGELOG:
--------------------------------------------------------------------------------
  1 | Release 1.4.1.post3
  2 | ===================
  3 | 
  4 | This is a "house-keeping" commit. No new features or fixes are introduced.
  5 | 
  6 | * Update CI test rules to include amd64 and ppc (santosh653)
  7 | 
  8 | 
  9 | Release 1.4.1.post2
 10 | ===================
 11 | 
 12 | This is a "house-keeping" commit. No new features or fixes are introduced.
 13 | 
 14 | * Update changelog.
 15 | * Removed the ``Pipfile`` which was introduced in ``1.4.1.post1``. The file
 16 |   caused false positives on security checks. Additionally, having a ``Pipfile``
 17 |   is mainly useful in applications, and not in libraries like this one.
 18 | 
 19 | Release 1.4.1.post1
 20 | ===================
 21 | 
 22 | This is a "house-keeping" commit. No new features or fixes are introduced.
 23 | 
 24 | * Update changelog.
 25 | * Switch doc-building to use ``pipenv`` & update ``Pipfile`` accordingly.
 26 | 
 27 | Release 1.4.1
 28 | =============
 29 | 
 30 | * Fix clustering of dictionaries. See GitHub issue #28 (Tim Littlefair).
 31 | 
 32 | Release 1.4.0
 33 | =============
 34 | 
 35 | * Added a "display" method to hierarchical clusters (by 1kastner).
 36 | 
 37 | Release 1.3.2 & 1.3.3
 38 | =====================
 39 | 
 40 | * Fix regression introduced in 1.3.1 related to package version metadata.
 41 | 
 42 | Release 1.3.1
 43 | =============
 44 | 
 45 | * Don't break if the cluster is initiated with iterable elements (GitHub Issue
 46 |   #20).
 47 | * Fix package version metadata in setup.py
 48 | 
 49 | Release 1.3.0
 50 | =============
 51 | 
 52 | * Performance improvements for hierarchical clustering (at the cost of memory)
 53 | * Cluster instances are now iterable. It will iterate over each element,
 54 |   resulting in a flat list of items.
 55 | * New option to specify a progress callback for hierarchical clustering. This
 56 |   method will be called on each iteration for hierarchical clusters. It gets
 57 |   two numeric values as arguments: The total count of elements, and the number
 58 |   of processed elements. It gives users a way to present progress on screen.
 59 | * The library now also has a ``__version__`` member.
 60 | 
 61 | 
 62 | Release 1.2.2
 63 | =============
 64 | 
 65 | * Package metadata fixed.
 66 | 
 67 | Release 1.2.1
 68 | =============
 69 | 
 70 | * Fixed an issue in multiprocessing code.
 71 | 
 72 | Release 1.2.0
 73 | =============
 74 | 
 75 | * Multiprocessing (by loisaidasam)
 76 | * Python 3 support
 77 | * Split up one big file into smaller more logical sub-modules
 78 | * Fixed https://github.com/exhuma/python-cluster/issues/11
 79 | * Documentation update.
 80 | * Migrated to GitHub
 81 | 
 82 | Release 1.1.1b3
 83 | ===============
 84 | 
 85 | * Fixed bug #1727558
 86 | * Some more unit-tests
 87 | * ValueError changed to ClusteringError where appropriate
 88 | 
 89 | Release 1.1.1b2
 90 | ===============
 91 | 
 92 | * Fixed bug #1604859 (thanks to Willi Richert for reporting it)
 93 | 
 94 | Release 1.1.1b1
 95 | ===============
 96 | 
 97 | * Applied SVN patch [1535137] (thanks ajaksu)
 98 | 
 99 |   * Topology output supported
100 |   * ``data`` and ``raw_data`` are now properties.
101 | 
102 | Release 1.1.0b1
103 | ===============
104 | 
105 | * KMeans Clustering implemented for simple numeric tuples.
106 | 
107 |   Data in the form ``[(1,1), (2,1), (5,3), ...]`` can be clustered.
108 | 
109 |   Usage::
110 | 
111 |     >>> from cluster import KMeansClustering
112 |     >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
113 |     >>> clusters = cl.getclusters(2)
114 | 
115 |   The method ``getclusters`` takes the number of clusters you would like to
116 |   have as a parameter.
117 | 
118 |   Only numeric values are supported in the tuples. The reason for this is
119 |   that the "centroid" method which I use essentially returns a tuple of
120 |   floats. So you will lose any other kind of metadata. Once I figure out a
121 |   way to recode that method, other types should be possible.
122 | 
123 | Release 1.0.1b2
124 | ===============
125 | 
126 | * Optimized calculation of the hierarchical clustering by using the fact that
127 |   the generated matrix is symmetrical.
128 | 
129 | Release 1.0.1b1
130 | ===============
131 | 
132 | * Implemented complete-, average-, and uclus-linkage methods. You can select
133 |   one by specifying it in the constructor, for example::
134 | 
135 |       cl = HierarchicalClustering(data, distfunc, linkage='uclus')
136 | 
137 |   or by setting it before starting the clustering process::
138 | 
139 |       cl = HierarchicalClustering(data, distfunc)
140 |       cl.setLinkageMethod('uclus')
141 |       cl.cluster()
142 | 
143 | * Clustering is not executed on object creation, but on the first call of
144 |   ``getlevel``. You can force the creation of the clusters by calling the
145 |   ``cluster`` method as shown above.
146 | 
147 | .. vim: filetype=rst :
148 | 


--------------------------------------------------------------------------------
/INSTALL:
--------------------------------------------------------------------------------
 1 | INSTALLATION
 2 | ============
 3 | 
 4 | Simply run::
 5 | 
 6 |     pip install cluster
 7 | 
 8 | Or, if you run it in a virtualenv::
 9 | 
10 |     /path/to/your/env/bin/pip install cluster
11 | 
12 | 
13 | Source installation
14 | ~~~~~~~~~~~~~~~~~~~
15 | 
16 | Untar the archive::
17 | 
18 |    tar xf <filename.tar.gz>
19 | 
20 | Next, go to the folder just created, which has the same name as the package
21 | (for example "cluster-1.2.2"), and run::
22 | 
23 |     python setup.py install
24 | 
25 | This will require superuser privileges unless you install it in a virtual environment::
26 | 
27 |     /path/to/your/env/bin/python setup.py install
28 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |         GNU LESSER GENERAL PUBLIC LICENSE
  2 |              Version 2.1, February 1999
  3 | 
  4 |  Copyright (C) 1991, 1999 Free Software Foundation, Inc.
  5 |  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
  6 |  Everyone is permitted to copy and distribute verbatim copies
  7 |  of this license document, but changing it is not allowed.
  8 | 
  9 | [This is the first released version of the Lesser GPL.  It also counts
 10 |  as the successor of the GNU Library Public License, version 2, hence
 11 |  the version number 2.1.]
 12 | 
 13 |              Preamble
 14 | 
 15 |   The licenses for most software are designed to take away your
 16 | freedom to share and change it.  By contrast, the GNU General Public
 17 | Licenses are intended to guarantee your freedom to share and change
 18 | free software--to make sure the software is free for all its users.
 19 | 
 20 |   This license, the Lesser General Public License, applies to some
 21 | specially designated software packages--typically libraries--of the
 22 | Free Software Foundation and other authors who decide to use it.  You
 23 | can use it too, but we suggest you first think carefully about whether
 24 | this license or the ordinary General Public License is the better
 25 | strategy to use in any particular case, based on the explanations below.
 26 | 
 27 |   When we speak of free software, we are referring to freedom of use,
 28 | not price.  Our General Public Licenses are designed to make sure that
 29 | you have the freedom to distribute copies of free software (and charge
 30 | for this service if you wish); that you receive source code or can get
 31 | it if you want it; that you can change the software and use pieces of
 32 | it in new free programs; and that you are informed that you can do
 33 | these things.
 34 | 
 35 |   To protect your rights, we need to make restrictions that forbid
 36 | distributors to deny you these rights or to ask you to surrender these
 37 | rights.  These restrictions translate to certain responsibilities for
 38 | you if you distribute copies of the library or if you modify it.
 39 | 
 40 |   For example, if you distribute copies of the library, whether gratis
 41 | or for a fee, you must give the recipients all the rights that we gave
 42 | you.  You must make sure that they, too, receive or can get the source
 43 | code.  If you link other code with the library, you must provide
 44 | complete object files to the recipients, so that they can relink them
 45 | with the library after making changes to the library and recompiling
 46 | it.  And you must show them these terms so they know their rights.
 47 | 
 48 |   We protect your rights with a two-step method: (1) we copyright the
 49 | library, and (2) we offer you this license, which gives you legal
 50 | permission to copy, distribute and/or modify the library.
 51 | 
 52 |   To protect each distributor, we want to make it very clear that
 53 | there is no warranty for the free library.  Also, if the library is
 54 | modified by someone else and passed on, the recipients should know
 55 | that what they have is not the original version, so that the original
 56 | author's reputation will not be affected by problems that might be
 57 | introduced by others.
 58 | 
 59 |   Finally, software patents pose a constant threat to the existence of
 60 | any free program.  We wish to make sure that a company cannot
 61 | effectively restrict the users of a free program by obtaining a
 62 | restrictive license from a patent holder.  Therefore, we insist that
 63 | any patent license obtained for a version of the library must be
 64 | consistent with the full freedom of use specified in this license.
 65 | 
 66 |   Most GNU software, including some libraries, is covered by the
 67 | ordinary GNU General Public License.  This license, the GNU Lesser
 68 | General Public License, applies to certain designated libraries, and
 69 | is quite different from the ordinary General Public License.  We use
 70 | this license for certain libraries in order to permit linking those
 71 | libraries into non-free programs.
 72 | 
 73 |   When a program is linked with a library, whether statically or using
 74 | a shared library, the combination of the two is legally speaking a
 75 | combined work, a derivative of the original library.  The ordinary
 76 | General Public License therefore permits such linking only if the
 77 | entire combination fits its criteria of freedom.  The Lesser General
 78 | Public License permits more lax criteria for linking other code with
 79 | the library.
 80 | 
 81 |   We call this license the "Lesser" General Public License because it
 82 | does Less to protect the user's freedom than the ordinary General
 83 | Public License.  It also provides other free software developers Less
 84 | of an advantage over competing non-free programs.  These disadvantages
 85 | are the reason we use the ordinary General Public License for many
 86 | libraries.  However, the Lesser license provides advantages in certain
 87 | special circumstances.
 88 | 
 89 |   For example, on rare occasions, there may be a special need to
 90 | encourage the widest possible use of a certain library, so that it becomes
 91 | a de-facto standard.  To achieve this, non-free programs must be
 92 | allowed to use the library.  A more frequent case is that a free
 93 | library does the same job as widely used non-free libraries.  In this
 94 | case, there is little to gain by limiting the free library to free
 95 | software only, so we use the Lesser General Public License.
 96 | 
 97 |   In other cases, permission to use a particular library in non-free
 98 | programs enables a greater number of people to use a large body of
 99 | free software.  For example, permission to use the GNU C Library in
100 | non-free programs enables many more people to use the whole GNU
101 | operating system, as well as its variant, the GNU/Linux operating
102 | system.
103 | 
104 |   Although the Lesser General Public License is Less protective of the
105 | users' freedom, it does ensure that the user of a program that is
106 | linked with the Library has the freedom and the wherewithal to run
107 | that program using a modified version of the Library.
108 | 
109 |   The precise terms and conditions for copying, distribution and
110 | modification follow.  Pay close attention to the difference between a
111 | "work based on the library" and a "work that uses the library".  The
112 | former contains code derived from the library, whereas the latter must
113 | be combined with the library in order to run.
114 | 
115 |         GNU LESSER GENERAL PUBLIC LICENSE
116 |    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
117 | 
118 |   0. This License Agreement applies to any software library or other
119 | program which contains a notice placed by the copyright holder or
120 | other authorized party saying it may be distributed under the terms of
121 | this Lesser General Public License (also called "this License").
122 | Each licensee is addressed as "you".
123 | 
124 |   A "library" means a collection of software functions and/or data
125 | prepared so as to be conveniently linked with application programs
126 | (which use some of those functions and data) to form executables.
127 | 
128 |   The "Library", below, refers to any such software library or work
129 | which has been distributed under these terms.  A "work based on the
130 | Library" means either the Library or any derivative work under
131 | copyright law: that is to say, a work containing the Library or a
132 | portion of it, either verbatim or with modifications and/or translated
133 | straightforwardly into another language.  (Hereinafter, translation is
134 | included without limitation in the term "modification".)
135 | 
136 |   "Source code" for a work means the preferred form of the work for
137 | making modifications to it.  For a library, complete source code means
138 | all the source code for all modules it contains, plus any associated
139 | interface definition files, plus the scripts used to control compilation
140 | and installation of the library.
141 | 
142 |   Activities other than copying, distribution and modification are not
143 | covered by this License; they are outside its scope.  The act of
144 | running a program using the Library is not restricted, and output from
145 | such a program is covered only if its contents constitute a work based
146 | on the Library (independent of the use of the Library in a tool for
147 | writing it).  Whether that is true depends on what the Library does
148 | and what the program that uses the Library does.
149 |   
150 |   1. You may copy and distribute verbatim copies of the Library's
151 | complete source code as you receive it, in any medium, provided that
152 | you conspicuously and appropriately publish on each copy an
153 | appropriate copyright notice and disclaimer of warranty; keep intact
154 | all the notices that refer to this License and to the absence of any
155 | warranty; and distribute a copy of this License along with the
156 | Library.
157 | 
158 |   You may charge a fee for the physical act of transferring a copy,
159 | and you may at your option offer warranty protection in exchange for a
160 | fee.
161 | 
162 |   2. You may modify your copy or copies of the Library or any portion
163 | of it, thus forming a work based on the Library, and copy and
164 | distribute such modifications or work under the terms of Section 1
165 | above, provided that you also meet all of these conditions:
166 | 
167 |     a) The modified work must itself be a software library.
168 | 
169 |     b) You must cause the files modified to carry prominent notices
170 |     stating that you changed the files and the date of any change.
171 | 
172 |     c) You must cause the whole of the work to be licensed at no
173 |     charge to all third parties under the terms of this License.
174 | 
175 |     d) If a facility in the modified Library refers to a function or a
176 |     table of data to be supplied by an application program that uses
177 |     the facility, other than as an argument passed when the facility
178 |     is invoked, then you must make a good faith effort to ensure that,
179 |     in the event an application does not supply such function or
180 |     table, the facility still operates, and performs whatever part of
181 |     its purpose remains meaningful.
182 | 
183 |     (For example, a function in a library to compute square roots has
184 |     a purpose that is entirely well-defined independent of the
185 |     application.  Therefore, Subsection 2d requires that any
186 |     application-supplied function or table used by this function must
187 |     be optional: if the application does not supply it, the square
188 |     root function must still compute square roots.)
189 | 
190 | These requirements apply to the modified work as a whole.  If
191 | identifiable sections of that work are not derived from the Library,
192 | and can be reasonably considered independent and separate works in
193 | themselves, then this License, and its terms, do not apply to those
194 | sections when you distribute them as separate works.  But when you
195 | distribute the same sections as part of a whole which is a work based
196 | on the Library, the distribution of the whole must be on the terms of
197 | this License, whose permissions for other licensees extend to the
198 | entire whole, and thus to each and every part regardless of who wrote
199 | it.
200 | 
201 | Thus, it is not the intent of this section to claim rights or contest
202 | your rights to work written entirely by you; rather, the intent is to
203 | exercise the right to control the distribution of derivative or
204 | collective works based on the Library.
205 | 
206 | In addition, mere aggregation of another work not based on the Library
207 | with the Library (or with a work based on the Library) on a volume of
208 | a storage or distribution medium does not bring the other work under
209 | the scope of this License.
210 | 
211 |   3. You may opt to apply the terms of the ordinary GNU General Public
212 | License instead of this License to a given copy of the Library.  To do
213 | this, you must alter all the notices that refer to this License, so
214 | that they refer to the ordinary GNU General Public License, version 2,
215 | instead of to this License.  (If a newer version than version 2 of the
216 | ordinary GNU General Public License has appeared, then you can specify
217 | that version instead if you wish.)  Do not make any other change in
218 | these notices.
219 | 
220 |   Once this change is made in a given copy, it is irreversible for
221 | that copy, so the ordinary GNU General Public License applies to all
222 | subsequent copies and derivative works made from that copy.
223 | 
224 |   This option is useful when you wish to copy part of the code of
225 | the Library into a program that is not a library.
226 | 
227 |   4. You may copy and distribute the Library (or a portion or
228 | derivative of it, under Section 2) in object code or executable form
229 | under the terms of Sections 1 and 2 above provided that you accompany
230 | it with the complete corresponding machine-readable source code, which
231 | must be distributed under the terms of Sections 1 and 2 above on a
232 | medium customarily used for software interchange.
233 | 
234 |   If distribution of object code is made by offering access to copy
235 | from a designated place, then offering equivalent access to copy the
236 | source code from the same place satisfies the requirement to
237 | distribute the source code, even though third parties are not
238 | compelled to copy the source along with the object code.
239 | 
240 |   5. A program that contains no derivative of any portion of the
241 | Library, but is designed to work with the Library by being compiled or
242 | linked with it, is called a "work that uses the Library".  Such a
243 | work, in isolation, is not a derivative work of the Library, and
244 | therefore falls outside the scope of this License.
245 | 
246 |   However, linking a "work that uses the Library" with the Library
247 | creates an executable that is a derivative of the Library (because it
248 | contains portions of the Library), rather than a "work that uses the
249 | library".  The executable is therefore covered by this License.
250 | Section 6 states terms for distribution of such executables.
251 | 
252 |   When a "work that uses the Library" uses material from a header file
253 | that is part of the Library, the object code for the work may be a
254 | derivative work of the Library even though the source code is not.
255 | Whether this is true is especially significant if the work can be
256 | linked without the Library, or if the work is itself a library.  The
257 | threshold for this to be true is not precisely defined by law.
258 | 
259 |   If such an object file uses only numerical parameters, data
260 | structure layouts and accessors, and small macros and small inline
261 | functions (ten lines or less in length), then the use of the object
262 | file is unrestricted, regardless of whether it is legally a derivative
263 | work.  (Executables containing this object code plus portions of the
264 | Library will still fall under Section 6.)
265 | 
266 |   Otherwise, if the work is a derivative of the Library, you may
267 | distribute the object code for the work under the terms of Section 6.
268 | Any executables containing that work also fall under Section 6,
269 | whether or not they are linked directly with the Library itself.
270 | 
271 |   6. As an exception to the Sections above, you may also combine or
272 | link a "work that uses the Library" with the Library to produce a
273 | work containing portions of the Library, and distribute that work
274 | under terms of your choice, provided that the terms permit
275 | modification of the work for the customer's own use and reverse
276 | engineering for debugging such modifications.
277 | 
278 |   You must give prominent notice with each copy of the work that the
279 | Library is used in it and that the Library and its use are covered by
280 | this License.  You must supply a copy of this License.  If the work
281 | during execution displays copyright notices, you must include the
282 | copyright notice for the Library among them, as well as a reference
283 | directing the user to the copy of this License.  Also, you must do one
284 | of these things:
285 | 
286 |     a) Accompany the work with the complete corresponding
287 |     machine-readable source code for the Library including whatever
288 |     changes were used in the work (which must be distributed under
289 |     Sections 1 and 2 above); and, if the work is an executable linked
290 |     with the Library, with the complete machine-readable "work that
291 |     uses the Library", as object code and/or source code, so that the
292 |     user can modify the Library and then relink to produce a modified
293 |     executable containing the modified Library.  (It is understood
294 |     that the user who changes the contents of definitions files in the
295 |     Library will not necessarily be able to recompile the application
296 |     to use the modified definitions.)
297 | 
298 |     b) Use a suitable shared library mechanism for linking with the
299 |     Library.  A suitable mechanism is one that (1) uses at run time a
300 |     copy of the library already present on the user's computer system,
301 |     rather than copying library functions into the executable, and (2)
302 |     will operate properly with a modified version of the library, if
303 |     the user installs one, as long as the modified version is
304 |     interface-compatible with the version that the work was made with.
305 | 
306 |     c) Accompany the work with a written offer, valid for at
307 |     least three years, to give the same user the materials
308 |     specified in Subsection 6a, above, for a charge no more
309 |     than the cost of performing this distribution.
310 | 
311 |     d) If distribution of the work is made by offering access to copy
312 |     from a designated place, offer equivalent access to copy the above
313 |     specified materials from the same place.
314 | 
315 |     e) Verify that the user has already received a copy of these
316 |     materials or that you have already sent this user a copy.
317 | 
318 |   For an executable, the required form of the "work that uses the
319 | Library" must include any data and utility programs needed for
320 | reproducing the executable from it.  However, as a special exception,
321 | the materials to be distributed need not include anything that is
322 | normally distributed (in either source or binary form) with the major
323 | components (compiler, kernel, and so on) of the operating system on
324 | which the executable runs, unless that component itself accompanies
325 | the executable.
326 | 
327 |   It may happen that this requirement contradicts the license
328 | restrictions of other proprietary libraries that do not normally
329 | accompany the operating system.  Such a contradiction means you cannot
330 | use both them and the Library together in an executable that you
331 | distribute.
332 | 
333 |   7. You may place library facilities that are a work based on the
334 | Library side-by-side in a single library together with other library
335 | facilities not covered by this License, and distribute such a combined
336 | library, provided that the separate distribution of the work based on
337 | the Library and of the other library facilities is otherwise
338 | permitted, and provided that you do these two things:
339 | 
340 |     a) Accompany the combined library with a copy of the same work
341 |     based on the Library, uncombined with any other library
342 |     facilities.  This must be distributed under the terms of the
343 |     Sections above.
344 | 
345 |     b) Give prominent notice with the combined library of the fact
346 |     that part of it is a work based on the Library, and explaining
347 |     where to find the accompanying uncombined form of the same work.
348 | 
349 |   8. You may not copy, modify, sublicense, link with, or distribute
350 | the Library except as expressly provided under this License.  Any
351 | attempt otherwise to copy, modify, sublicense, link with, or
352 | distribute the Library is void, and will automatically terminate your
353 | rights under this License.  However, parties who have received copies,
354 | or rights, from you under this License will not have their licenses
355 | terminated so long as such parties remain in full compliance.
356 | 
357 |   9. You are not required to accept this License, since you have not
358 | signed it.  However, nothing else grants you permission to modify or
359 | distribute the Library or its derivative works.  These actions are
360 | prohibited by law if you do not accept this License.  Therefore, by
361 | modifying or distributing the Library (or any work based on the
362 | Library), you indicate your acceptance of this License to do so, and
363 | all its terms and conditions for copying, distributing or modifying
364 | the Library or works based on it.
365 | 
366 |   10. Each time you redistribute the Library (or any work based on the
367 | Library), the recipient automatically receives a license from the
368 | original licensor to copy, distribute, link with or modify the Library
369 | subject to these terms and conditions.  You may not impose any further
370 | restrictions on the recipients' exercise of the rights granted herein.
371 | You are not responsible for enforcing compliance by third parties with
372 | this License.
373 | 
374 |   11. If, as a consequence of a court judgment or allegation of patent
375 | infringement or for any other reason (not limited to patent issues),
376 | conditions are imposed on you (whether by court order, agreement or
377 | otherwise) that contradict the conditions of this License, they do not
378 | excuse you from the conditions of this License.  If you cannot
379 | distribute so as to satisfy simultaneously your obligations under this
380 | License and any other pertinent obligations, then as a consequence you
381 | may not distribute the Library at all.  For example, if a patent
382 | license would not permit royalty-free redistribution of the Library by
383 | all those who receive copies directly or indirectly through you, then
384 | the only way you could satisfy both it and this License would be to
385 | refrain entirely from distribution of the Library.
386 | 
387 | If any portion of this section is held invalid or unenforceable under any
388 | particular circumstance, the balance of the section is intended to apply,
389 | and the section as a whole is intended to apply in other circumstances.
390 | 
391 | It is not the purpose of this section to induce you to infringe any
392 | patents or other property right claims or to contest validity of any
393 | such claims; this section has the sole purpose of protecting the
394 | integrity of the free software distribution system which is
395 | implemented by public license practices.  Many people have made
396 | generous contributions to the wide range of software distributed
397 | through that system in reliance on consistent application of that
398 | system; it is up to the author/donor to decide if he or she is willing
399 | to distribute software through any other system and a licensee cannot
400 | impose that choice.
401 | 
402 | This section is intended to make thoroughly clear what is believed to
403 | be a consequence of the rest of this License.
404 | 
405 |   12. If the distribution and/or use of the Library is restricted in
406 | certain countries either by patents or by copyrighted interfaces, the
407 | original copyright holder who places the Library under this License may add
408 | an explicit geographical distribution limitation excluding those countries,
409 | so that distribution is permitted only in or among countries not thus
410 | excluded.  In such case, this License incorporates the limitation as if
411 | written in the body of this License.
412 | 
413 |   13. The Free Software Foundation may publish revised and/or new
414 | versions of the Lesser General Public License from time to time.
415 | Such new versions will be similar in spirit to the present version,
416 | but may differ in detail to address new problems or concerns.
417 | 
418 | Each version is given a distinguishing version number.  If the Library
419 | specifies a version number of this License which applies to it and
420 | "any later version", you have the option of following the terms and
421 | conditions either of that version or of any later version published by
422 | the Free Software Foundation.  If the Library does not specify a
423 | license version number, you may choose any version ever published by
424 | the Free Software Foundation.
425 | 
426 |   14. If you wish to incorporate parts of the Library into other free
427 | programs whose distribution conditions are incompatible with these,
428 | write to the author to ask for permission.  For software which is
429 | copyrighted by the Free Software Foundation, write to the Free
430 | Software Foundation; we sometimes make exceptions for this.  Our
431 | decision will be guided by the two goals of preserving the free status
432 | of all derivatives of our free software and of promoting the sharing
433 | and reuse of software generally.
434 | 
435 |              NO WARRANTY
436 | 
437 |   15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
438 | WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
439 | EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
440 | OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
441 | KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
442 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
443 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
444 | LIBRARY IS WITH YOU.  SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
445 | THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
446 | 
447 |   16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
448 | WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
449 | AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
450 | FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
451 | CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
452 | LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
453 | RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
454 | FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
455 | SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
456 | DAMAGES.
457 | 
458 |            END OF TERMS AND CONDITIONS
459 | 
460 |            How to Apply These Terms to Your New Libraries
461 | 
462 |   If you develop a new library, and you want it to be of the greatest
463 | possible use to the public, we recommend making it free software that
464 | everyone can redistribute and change.  You can do so by permitting
465 | redistribution under these terms (or, alternatively, under the terms of the
466 | ordinary General Public License).
467 | 
468 |   To apply these terms, attach the following notices to the library.  It is
469 | safest to attach them to the start of each source file to most effectively
470 | convey the exclusion of warranty; and each file should have at least the
471 | "copyright" line and a pointer to where the full notice is found.
472 | 
473 |     <one line to give the library's name and a brief idea of what it does.>
474 |     Copyright (C) <year>  <name of author>
475 | 
476 |     This library is free software; you can redistribute it and/or
477 |     modify it under the terms of the GNU Lesser General Public
478 |     License as published by the Free Software Foundation; either
479 |     version 2.1 of the License, or (at your option) any later version.
480 | 
481 |     This library is distributed in the hope that it will be useful,
482 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
483 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
484 |     Lesser General Public License for more details.
485 | 
486 |     You should have received a copy of the GNU Lesser General Public
487 |     License along with this library; if not, write to the Free Software
488 |     Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
489 | 
490 | Also add information on how to contact you by electronic and paper mail.
491 | 
492 | You should also get your employer (if you work as a programmer) or your
493 | school, if any, to sign a "copyright disclaimer" for the library, if
494 | necessary.  Here is a sample; alter the names:
495 | 
496 |   Yoyodyne, Inc., hereby disclaims all copyright interest in the
497 |   library `Frob' (a library for tweaking knobs) written by James Random Hacker.
498 | 
499 |   <signature of Ty Coon>, 1 April 1990
500 |   Ty Coon, President of Vice
501 | 
502 | That's all there is to it!
503 | 
504 | 
505 | 
506 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.rst LICENSE CHANGELOG
2 | include cluster.bmp
3 | include cluster/version.txt
4 | 


--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
 1 | DESCRIPTION
 2 | ===========
 3 | 
 4 | .. image:: https://readthedocs.org/projects/python-cluster/badge/?version=latest
 5 |     :target: http://python-cluster.readthedocs.org
 6 |     :alt: Documentation Status
 7 | 
 8 | python-cluster is a "simple" package that lets you create several groups
 9 | (clusters) of objects from a list. It's meant to be flexible and able to
10 | cluster any object. To ensure this kind of flexibility, you need not only to
11 | supply the list of objects, but also a function that calculates the similarity
12 | between two of those objects. For simple datatypes, like integers, this can be
13 | as simple as a subtraction, but more complex calculations are possible. Right
14 | now, it is possible to generate the clusters using a hierarchical clustering
15 | and the popular K-Means algorithm. For the hierarchical algorithm there are
16 | different "linkage" (single, complete, average and uclus) methods available.
17 | 
18 | Algorithms are based on the document found at
19 | http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/
20 | 
21 | .. note::
22 |     The above site is no longer available, but you can still view it in the
23 |     internet archive at:
24 |     https://web.archive.org/web/20070912040206/http://home.dei.polimi.it//matteucc/Clustering/tutorial_html/
25 | 
26 | 
27 | USAGE
28 | =====
29 | 
30 | A simple python program could look like this::
31 | 
32 |    >>> from cluster import HierarchicalClustering
33 |    >>> data = [12,34,23,32,46,96,13]
34 |    >>> cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
35 |    >>> cl.getlevel(10)     # get clusters of items closer than 10
36 |    [96, 46, [12, 13, 23, 34, 32]]
37 |    >>> cl.getlevel(5)      # get clusters of items closer than 5
38 |    [96, 46, [12, 13], 23, [34, 32]]
39 | 
40 | Note that when you retrieve a set of clusters, it immediately starts the
41 | clustering process, which is quite complex. If you intend to create clusters
42 | from a large dataset, consider doing that in a separate thread.
43 | 
44 | For K-Means clustering it would look like this::
45 | 
46 |     >>> from cluster import KMeansClustering
47 |     >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
48 |     >>> clusters = cl.getclusters(2)
49 | 
50 | The parameter passed to getclusters is the count of clusters generated.
51 | 
52 | 
56 | 
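The README's idea of "a list of objects plus a distance function" can be illustrated without the library. The following is a minimal standalone sketch, not the package's implementation: ``group_by_threshold`` is a hypothetical helper that merges groups for as long as the two closest groups are no further apart than the threshold, and it assumes single linkage (the distance between two groups is the distance between their closest members).

```python
def group_by_threshold(items, distance, threshold):
    """Agglomerate items until the closest two groups exceed the threshold.

    Hypothetical helper for illustration only. It assumes single linkage.
    """
    groups = [[item] for item in items]
    while len(groups) > 1:
        best = None  # (distance, index of first group, index of second group)
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(distance(x, y) for x in groups[i] for y in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # nothing left that is close enough to merge
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


def distance(x, y):
    # For plain integers a subtraction is enough.
    return abs(x - y)


data = [12, 34, 23, 32, 46, 96, 13]
for group in group_by_threshold(data, distance, 10):
    print(sorted(group))
```

Any object type works as long as ``distance`` accepts two of them and returns a number; only the distance function needs to know what the items are.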


--------------------------------------------------------------------------------
/cluster.bmp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/exhuma/python-cluster/2739ff420ef5bf8fba53f67788453b8239f16c9a/cluster.bmp


--------------------------------------------------------------------------------
/cluster/__init__.py:
--------------------------------------------------------------------------------
 1 | #
 2 | # This is part of "python-cluster". A library to group similar items together.
 3 | # Copyright (C) 2006    Michel Albert
 4 | #
 5 | # This library is free software; you can redistribute it and/or modify it
 6 | # under the terms of the GNU Lesser General Public License as published by the
 7 | # Free Software Foundation; either version 2.1 of the License, or (at your
 8 | # option) any later version.
 9 | # This library is distributed in the hope that it will be useful, but WITHOUT
10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
12 | # for more details.
13 | # You should have received a copy of the GNU Lesser General Public License
14 | # along with this library; if not, write to the Free Software Foundation,
15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16 | #
17 | 
18 | 
19 | from pkg_resources import resource_string
20 | 
21 | from .method.hierarchical import HierarchicalClustering
22 | from .method.kmeans import KMeansClustering
23 | from .util import ClusteringError
24 | 
25 | __version__ = resource_string('cluster', 'version.txt').decode('ascii').strip()
26 | 


--------------------------------------------------------------------------------
/cluster/cluster.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it
  6 | # under the terms of the GNU Lesser General Public License as published by the
  7 | # Free Software Foundation; either version 2.1 of the License, or (at your
  8 | # option) any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 12 | # for more details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation,
 15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | from __future__ import print_function
 19 | 
 20 | from .util import fullyflatten
 21 | 
 22 | 
 23 | class Cluster(object):
 24 |     """
 25 |     A collection of items. This is internally used to detect clustered items
 26 |     in the data so we could distinguish other collection types (lists, dicts,
 27 |     ...) from the actual clusters. This means that you could also create
 28 |     clusters of lists with this class.
 29 |     """
 30 | 
 31 |     def __repr__(self):
 32 |         return "<Cluster@%s(%s)>" % (self.level, self.items)
 33 | 
 34 |     def __str__(self):
 35 |         return self.__repr__()
 36 | 
 37 |     def __init__(self, level, *args):
 38 |         """
 39 |         Constructor
 40 | 
 41 |         :param level: The level of this cluster. This is used in hierarchical
 42 |             clustering to retrieve a specific set of clusters. The higher the
 43 |             level, the smaller the count of clusters returned. The level depends
 44 |             on the difference function used.
 45 |         :param *args: every additional argument passed following the level value
 46 |             will get added as item to the cluster. You could also pass a list as
 47 |             second parameter to initialise the cluster with that list as content
 48 |         """
 49 |         self.level = level
 50 |         if len(args) == 0:
 51 |             self.items = []
 52 |         else:
 53 |             self.items = args
 54 | 
 55 |     def __iter__(self):
 56 |         for item in self.items:
 57 |             if isinstance(item, Cluster):
 58 |                 for recursed_item in item:
 59 |                     yield recursed_item
 60 |             else:
 61 |                 yield item
 62 | 
 63 |     def display(self, depth=0):
 64 |         """
 65 |         Pretty-prints this cluster. Useful for debugging.
 66 |         """
 67 |         print(depth * "    " + "[level %s]" % self.level)
 68 |         for item in self.items:
 69 |             if isinstance(item, Cluster):
 70 |                 item.display(depth + 1)
 71 |             else:
 72 |                 print(depth * "    " + "%s" % item)
 73 | 
 74 |     def topology(self):
 75 |         """
 76 |         Returns the structure (topology) of the cluster as tuples.
 77 | 
 78 |         Output from cl.data::
 79 | 
 80 |                 [<Cluster@0.833333333333(['CVS',
 81 |                  <Cluster@0.818181818182(['34.xls',
 82 |                  <Cluster@0.789473684211([<Cluster@0.555555555556(['0.txt',
 83 |                  <Cluster@0.181818181818(['ChangeLog', 'ChangeLog.txt'])>])>,
 84 |                  <Cluster@0.684210526316(['20060730.py',
 85 |                  <Cluster@0.684210526316(['.cvsignore',
 86 |                  <Cluster@0.647058823529(['About.py', <Cluster@0.625(['.idlerc',
 87 |                  '.pylint.d'])>])>])>])>])>])>])>]
 88 | 
 89 |         Corresponding output from cl.topo()::
 90 | 
 91 |                 ('CVS', ('34.xls', (('0.txt', ('ChangeLog', 'ChangeLog.txt')),
 92 |                 ('20060730.py', ('.cvsignore', ('About.py',
 93 |                 ('.idlerc', '.pylint.d')))))))
 94 |         """
 95 | 
 96 |         left = self.items[0]
 97 |         right = self.items[1]
 98 | 
 99 |         if isinstance(left, Cluster):
100 |             first = left.topology()
101 |         else:
102 |             first = left
103 | 
104 |         if isinstance(right, Cluster):
105 |             second = right.topology()
106 |         else:
107 |             second = right
108 | 
109 |         return first, second
110 | 
111 |     def getlevel(self, threshold):
112 |         """
113 |         Retrieve all clusters up to a specific level threshold. This
114 |         level-threshold represents the maximum distance between two clusters.
115 |         So the lower you set this threshold, the more clusters you will
116 |         receive; the higher you set it, the fewer but bigger clusters you
117 |         will receive.
118 | 
119 |         :param threshold: The level threshold.
120 | 
121 |         .. note::
122 |             It is debatable whether the value passed into this method should
123 |             really be as strongly linked to the real cluster-levels as it is
124 |             right now. The end-user will not know the range of this value
125 |             unless s/he first inspects the top-level cluster. So instead you
126 |             might argue that a value ranging from 0 to 1 might be a more
127 |             useful approach.
128 |         """
129 | 
130 |         left = self.items[0]
131 |         right = self.items[1]
132 | 
133 |         # if this object itself is below the threshold value we only need to
134 |         # return its contents as a list
135 |         if self.level <= threshold:
136 |             return [fullyflatten(self.items)]
137 | 
138 |         # if this cluster's level is higher than the threshold we will
139 |         # investigate its left and right parts. Their level could be below the
140 |         # threshold
141 |         if isinstance(left, Cluster) and left.level <= threshold:
142 |             if isinstance(right, Cluster):
143 |                 return [fullyflatten(left.items)] + right.getlevel(threshold)
144 |             else:
145 |                 return [fullyflatten(left.items)] + [[right]]
146 |         elif isinstance(right, Cluster) and right.level <= threshold:
147 |             if isinstance(left, Cluster):
148 |                 return left.getlevel(threshold) + [fullyflatten(right.items)]
149 |             else:
150 |                 return [[left]] + [fullyflatten(right.items)]
151 | 
152 |         # Alright. We covered the cases where one of the clusters was below
153 |         # the threshold value. Now we'll deal with the clusters that are above
154 |         # by recursively applying the previous cases.
155 |         if isinstance(left, Cluster) and isinstance(right, Cluster):
156 |             return left.getlevel(threshold) + right.getlevel(threshold)
157 |         elif isinstance(left, Cluster):
158 |             return left.getlevel(threshold) + [[right]]
159 |         elif isinstance(right, Cluster):
160 |             return [[left]] + right.getlevel(threshold)
161 |         else:
162 |             return [[left], [right]]
163 | 


--------------------------------------------------------------------------------
/cluster/linkage.py:
--------------------------------------------------------------------------------
  1 | from __future__ import division
  2 | from functools import wraps
  3 | 
  4 | 
  5 | def cached(fun):
  6 |     """
  7 |     memoizing decorator for linkage functions.
  8 | 
  9 |     Parameters have been hardcoded (no ``*args``, ``**kwargs`` magic), because,
 10 |     the way this is coded (interchangeably using sets and frozensets) is true
 11 |     for this specific case. For other cases that is not necessarily guaranteed.
 12 |     """
 13 | 
 14 |     _cache = {}
 15 | 
 16 |     @wraps(fun)
 17 |     def newfun(a, b, distance_function):
 18 |         frozen_a = frozenset(a)
 19 |         frozen_b = frozenset(b)
 20 |         if (frozen_a, frozen_b) not in _cache:
 21 |             result = fun(a, b, distance_function)
 22 |             _cache[(frozen_a, frozen_b)] = result
 23 |         return _cache[(frozen_a, frozen_b)]
 24 |     return newfun
 25 | 
 26 | 
 27 | @cached
 28 | def single(a, b, distance_function):
 29 |     """
 30 |     Given two collections ``a`` and ``b``, this will return the distance of the
 31 |     points which are closest together.  ``distance_function`` is used to
 32 |     determine the distance between two elements.
 33 | 
 34 |     Example::
 35 | 
 36 |         >>> single([1, 2], [3, 4], lambda x, y: abs(x-y))
 37 |         1  # (distance between 2 and 3)
 38 |     """
 39 |     left_a, right_a = min(a), max(a)
 40 |     left_b, right_b = min(b), max(b)
 41 |     result = min(distance_function(left_a, right_b),
 42 |                  distance_function(left_b, right_a))
 43 |     return result
 44 | 
 45 | 
 46 | @cached
 47 | def complete(a, b, distance_function):
 48 |     """
 49 |     Given two collections ``a`` and ``b``, this will return the distance of the
 50 |     points which are farthest apart.  ``distance_function`` is used to determine
 51 |     the distance between two elements.
 52 | 
 53 |     Example::
 54 | 
 55 |         >>> complete([1, 2], [3, 4], lambda x, y: abs(x-y))
 56 |         3  # (distance between 1 and 4)
 57 |     """
 58 |     left_a, right_a = min(a), max(a)
 59 |     left_b, right_b = min(b), max(b)
 60 |     result = max(distance_function(left_a, right_b),
 61 |                  distance_function(left_b, right_a))
 62 |     return result
 63 | 
 64 | 
 65 | @cached
 66 | def average(a, b, distance_function):
 67 |     """
 68 |     Given two collections ``a`` and ``b``, this will return the mean of all
 69 |     distances. ``distance_function`` is used to determine the distance between
 70 |     two elements.
 71 | 
 72 |     Example::
 73 | 
 74 |         >>> average([1, 2], [3, 100], lambda x, y: abs(x-y))
 75 |         50.0
 76 |     """
 77 |     distances = [distance_function(x, y)
 78 |                  for x in a for y in b]
 79 |     return sum(distances) / len(distances)
 80 | 
 81 | 
 82 | @cached
 83 | def uclus(a, b, distance_function):
 84 |     """
 85 |     Given two collections ``a`` and ``b``, this will return the *median* of all
 86 |     distances. ``distance_function`` is used to determine the distance between
 87 |     two elements.
 88 | 
 89 |     Example::
 90 | 
 91 |         >>> uclus([1, 2], [3, 100], lambda x, y: abs(x-y))
 92 |         50.0
 93 |     """
 94 |     distances = sorted([distance_function(x, y)
 95 |                         for x in a for y in b])
 96 |     midpoint, rest = len(distances) // 2, len(distances) % 2
 97 |     if not rest:
 98 |         return sum(distances[midpoint-1:midpoint+1]) / 2
 99 |     else:
100 |         return distances[midpoint]
101 | 
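The four linkage criteria described in the docstrings above can be cross-checked with a brute-force calculation over every pair of elements. This is an independent sketch, not the module's code: ``pairwise`` is a hypothetical helper, and unlike ``single`` and ``complete`` above it does not rely on ``min()`` and ``max()`` of the collections, so it does not need the items to be orderable.

```python
from statistics import median


def pairwise(a, b, distance_function):
    """All distances between an element of ``a`` and an element of ``b``."""
    return [distance_function(x, y) for x in a for y in b]


def dist(x, y):
    return abs(x - y)


a, b = [1, 2], [3, 100]
distances = pairwise(a, b, dist)                  # [2, 99, 1, 98]

single_link = min(distances)                      # closest pair
complete_link = max(distances)                    # farthest pair
average_link = sum(distances) / len(distances)    # mean of all pairs
median_link = median(distances)                   # median of all pairs (uclus)

print(single_link, complete_link, average_link, median_link)
```

The brute-force version costs ``len(a) * len(b)`` distance evaluations per call, which is the price of not assuming anything about the ordering of the items.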


--------------------------------------------------------------------------------
/cluster/matrix.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it
  6 | # under the terms of the GNU Lesser General Public License as published by the
  7 | # Free Software Foundation; either version 2.1 of the License, or (at your
  8 | # option) any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 12 | # for more details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation,
 15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | 
 19 | import logging
 20 | from multiprocessing import Process, Queue, current_process
 21 | 
 22 | 
 23 | logger = logging.getLogger(__name__)
 24 | 
 25 | def _encapsulate_item_for_combinfunc(item):
 26 |     """
 27 |     This function has been extracted in order to 
 28 |     make Github issue #28 easier to investigate.
 29 |     It replaces the following two lines of code, 
 30 |     which occur twice in method genmatrix, just
 31 |     before the invocation of combinfunc.
 32 |         if not hasattr(item, '__iter__') or isinstance(item, tuple):
 33 |             item = [item]
 34 |     Logging was added to the original two lines
 35 |     and shows that the outcome of this snippet
 36 |     has changed between Python2.7 and Python3.5.
 37 |     This logging showed that the difference in 
 38 |     outcome consisted of the handling of the builtin
 39 |     str class, which was encapsulated into a list in
 40 |     Python2.7 but returned naked in Python3.5.
 41 |     Adding a test for this specific class to the 
 42 |     set of conditions appears to give correct behaviour
 43 |     under both versions.
 44 |     """
 45 |     encapsulated_item = None
 46 |     if (
 47 |         not hasattr(item, '__iter__') or
 48 |         isinstance(item, tuple) or
 49 |         isinstance(item, str)
 50 |     ):
 51 |         encapsulated_item = [item]
 52 |     else:
 53 |         encapsulated_item = item
 54 |     logger.debug(
 55 |         "item class:%s encapsulated as:%s ",
 56 |         item.__class__.__name__, 
 57 |         encapsulated_item.__class__.__name__
 58 |     )
 59 |     return encapsulated_item
 60 | 
 61 | 
 62 | class Matrix(object):
 63 |     """
 64 |     Object representation of the item-item matrix.
 65 |     """
 66 | 
 67 |     def __init__(self, data, combinfunc, symmetric=False, diagonal=None):
 68 |         """
 69 |         Takes a list of data and generates a 2D-matrix using the supplied
 70 |         combination function to calculate the values.
 71 | 
 72 |         :param data: the list of items.
 73 |         :param combinfunc: the function that is used to calculate the value in a
 74 |             cell. It has to cope with two arguments.
 75 |         :param symmetric: Whether it will be a symmetric matrix along the
 76 |             diagonal.  For example, if the list contains integers, and the
 77 |             combination function is ``abs(x-y)``, then the matrix will be
 78 |             symmetric.
 79 |         :param diagonal: The value to be put into the diagonal. For some
 80 |             functions, the diagonal will stay constant. An example could be the
 81 |             function ``x-y``. Then each diagonal cell will be ``0``.  If this
 82 |             value is set to None, then the diagonal will be calculated.
 83 |         """
 84 |         self.data = data
 85 |         self.combinfunc = combinfunc
 86 |         self.symmetric = symmetric
 87 |         self.diagonal = diagonal
 88 | 
 89 |     def worker(self):
 90 |         """
 91 |         Multiprocessing task function run by worker processes
 92 |         """
 93 |         tasks_completed = 0
 94 |         for task in iter(self.task_queue.get, 'STOP'):
 95 |             col_index, item, item2 = task
 96 |             # Wrap scalars, tuples and strings the same way as the in-line
 97 |             # path in genmatrix (see _encapsulate_item_for_combinfunc).
 98 |             item = _encapsulate_item_for_combinfunc(item)
 99 |             item2 = _encapsulate_item_for_combinfunc(item2)
100 |             result = (col_index, self.combinfunc(item, item2))
101 |             self.done_queue.put(result)
102 |             tasks_completed += 1
103 |         logger.info("Worker %s performed %s tasks",
104 |                     current_process().name,
105 |                     tasks_completed)
106 | 
107 |     def genmatrix(self, num_processes=1):
108 |         """
109 |         Actually generate the matrix
110 | 
111 |         :param num_processes: If you want to use multiprocessing to split up the
112 |             work and run ``combinfunc()`` in parallel, specify
113 |             ``num_processes > 1``. That many workers will be spun up and the
114 |             work will be split evenly among them.
115 |         """
116 |         use_multiprocessing = num_processes > 1
117 |         if use_multiprocessing:
118 |             self.task_queue = Queue()
119 |             self.done_queue = Queue()
120 | 
121 |         self.matrix = []
122 |         logger.info("Generating matrix for %s items - O(n^2)", len(self.data))
123 |         if use_multiprocessing:
124 |             logger.info("Using multiprocessing on %s processes!", num_processes)
125 | 
126 |         if use_multiprocessing:
127 |             logger.info("Spinning up %s workers", num_processes)
128 |             processes = [Process(target=self.worker) for i in range(num_processes)]
129 |             [process.start() for process in processes]
130 | 
131 |         for row_index, item in enumerate(self.data):
132 |             logger.debug("Generating row %s/%s (%0.2f%%)",
133 |                          row_index,
134 |                          len(self.data),
135 |                          100.0 * row_index / len(self.data))
136 |             row = {}
137 |             if use_multiprocessing:
138 |                 num_tasks_queued = num_tasks_completed = 0
139 |             for col_index, item2 in enumerate(self.data):
140 |                 if self.diagonal is not None and col_index == row_index:
141 |                     # This is a cell on the diagonal
142 |                     row[col_index] = self.diagonal
143 |                 elif self.symmetric and col_index < row_index:
144 |                     # The matrix is symmetric and we are "in the lower left
145 |                     # triangle" - fill this in after (in case of multiprocessing)
146 |                     pass
147 |                 # Otherwise, this cell is not on the diagonal and we do indeed
148 |                 # need to call combinfunc()
149 |                 elif use_multiprocessing:
150 |                     # Add that thing to the task queue!
151 |                     self.task_queue.put((col_index, item, item2))
152 |                     num_tasks_queued += 1
153 |                     # Start grabbing the results as we go, so as not to stuff all of
154 |                     # the worker args into memory at once (as Queue.get() is a
155 |                     # blocking operation)
156 |                     if num_tasks_queued > num_processes:
157 |                         col_index, result = self.done_queue.get()
158 |                         row[col_index] = result
159 |                         num_tasks_completed += 1
160 |                 else:
161 |                     # Otherwise do it here, in line
162 |                     """
163 |                     if not hasattr(item, '__iter__') or isinstance(item, tuple):
164 |                         item = [item]
165 |                     if not hasattr(item2, '__iter__') or isinstance(item2, tuple):
166 |                         item2 = [item2]
167 |                     """
168 |                     # See the comment in function _encapsulate_item_for_combinfunc
169 |                     # for details of why the lines above have been replaced
170 |                     # by function invocations
171 |                     item = _encapsulate_item_for_combinfunc(item)
172 |                     item2 = _encapsulate_item_for_combinfunc(item2)
173 |                     row[col_index] = self.combinfunc(item, item2)
174 | 
175 |             if self.symmetric:
176 |                 # One more iteration to get symmetric lower left triangle
177 |                 for col_index, item2 in enumerate(self.data):
178 |                     if col_index >= row_index:
179 |                         break
180 |                     # post-process symmetric "lower left triangle"
181 |                     row[col_index] = self.matrix[col_index][row_index]
182 | 
183 |             if use_multiprocessing:
184 |                 # Grab the remaining worker task results
185 |                 while num_tasks_completed < num_tasks_queued:
186 |                     col_index, result = self.done_queue.get()
187 |                     row[col_index] = result
188 |                     num_tasks_completed += 1
189 | 
190 |             row_indexed = [row[index] for index in range(len(self.data))]
191 |             self.matrix.append(row_indexed)
192 | 
193 |         if use_multiprocessing:
194 |             logger.info("Stopping/joining %s workers", num_processes)
195 |             [self.task_queue.put('STOP') for i in range(num_processes)]
196 |             [process.join() for process in processes]
197 | 
198 |         logger.info("Matrix generated")
199 | 
200 |     def __str__(self):
201 |         """
202 |         Returns a 2-dimensional list of data as text-string which can be
203 |         displayed to the user.
204 |         """
205 |         # determine maximum length
206 |         maxlen = 0
207 |         colcount = len(self.data[0])
208 |         for col in self.data:
209 |             for cell in col:
210 |                 maxlen = max(len(str(cell)), maxlen)
211 |         format = " %%%is |" % maxlen
212 |         format = "|" + format * colcount
213 |         rows = [format % tuple(row) for row in self.data]
214 |         return "\n".join(rows)
215 | 
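The effect of the ``symmetric`` and ``diagonal`` options can be seen in a single-process sketch. This is a simplified illustration, not ``Matrix.genmatrix``: ``symmetric_matrix`` is a hypothetical function, it always treats the matrix as symmetric, and it passes items to ``combinfunc`` directly instead of wrapping them in lists first. Only the upper triangle is computed and each value is mirrored into the lower triangle.

```python
def symmetric_matrix(data, combinfunc, diagonal=None):
    """Build an item-item matrix, calling combinfunc once per unordered pair."""
    n = len(data)
    matrix = [[None] * n for _ in range(n)]
    for row in range(n):
        for col in range(row, n):
            if row == col and diagonal is not None:
                value = diagonal   # known constant, no call needed
            else:
                value = combinfunc(data[row], data[col])
            matrix[row][col] = value
            matrix[col][row] = value   # mirror into the lower triangle
    return matrix


calls = []


def distance(x, y):
    calls.append((x, y))   # record each evaluation to count the work done
    return abs(x - y)


matrix = symmetric_matrix([1, 4, 9], distance, diagonal=0)
for row in matrix:
    print(row)
print(len(calls), "calls to the distance function")
```

With both shortcuts active, ``n`` items need ``n * (n - 1) / 2`` evaluations instead of ``n * n``, which matters when the combination function is expensive.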


--------------------------------------------------------------------------------
/cluster/method/__init__.py:
--------------------------------------------------------------------------------
 1 | #
 2 | # This is part of "python-cluster". A library to group similar items together.
 3 | # Copyright (C) 2006    Michel Albert
 4 | #
 5 | # This library is free software; you can redistribute it and/or modify it
 6 | # under the terms of the GNU Lesser General Public License as published by the
 7 | # Free Software Foundation; either version 2.1 of the License, or (at your
 8 | # option) any later version.
 9 | # This library is distributed in the hope that it will be useful, but WITHOUT
10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
12 | # for more details.
13 | # You should have received a copy of the GNU Lesser General Public License
14 | # along with this library; if not, write to the Free Software Foundation,
15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16 | #
17 | 
18 | 


--------------------------------------------------------------------------------
/cluster/method/base.py:
--------------------------------------------------------------------------------
 1 | #
 2 | # This is part of "python-cluster". A library to group similar items together.
 3 | # Copyright (C) 2006    Michel Albert
 4 | #
 5 | # This library is free software; you can redistribute it and/or modify it
 6 | # under the terms of the GNU Lesser General Public License as published by the
 7 | # Free Software Foundation; either version 2.1 of the License, or (at your
 8 | # option) any later version.
 9 | # This library is distributed in the hope that it will be useful, but WITHOUT
10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
12 | # for more details.
13 | # You should have received a copy of the GNU Lesser General Public License
14 | # along with this library; if not, write to the Free Software Foundation,
15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16 | #
17 | 
18 | 
19 | class BaseClusterMethod(object):
20 |     """
21 |     The base class of all clustering methods.
22 | 
23 |     :param input: a list of objects
24 |     :param distance_function: a function returning the distance - or the
25 |         opposite of similarity ``(distance = -similarity)`` - of two items
26 |         from the input. The closer the two items are related, the smaller
27 |         this value needs to be, with 0 meaning they are exactly the same.
28 | 
29 |     .. note::
30 |         The distance function should always return the absolute distance between
31 |         two given items of the list. Say::
32 | 
33 |             distance(input[1], input[4]) = distance(input[4], input[1])
34 | 
35 |         This is very important for the clustering algorithm to work!  Naturally,
36 |         the data returned by the distance function MUST be a comparable
37 |         datatype, so you can perform arithmetic comparisons on them (``<`` or
38 |         ``>``)! The simplest examples would be floats or ints. But as long as
39 |         they are comparable, it's ok.
40 |     """
41 | 
42 |     def __init__(self, input, distance_function, progress_callback=None):
43 |         self.distance = distance_function
44 |         self._input = input    # the original input
45 |         self._data = input[:]  # clone the input so we can work with it
46 |                                # without destroying the original data.
47 |         self.progress_callback = progress_callback
48 | 
49 |     def topo(self):
50 |         """
51 |         Returns the structure (topology) of the cluster.
52 | 
53 |         See :py:meth:`~cluster.cluster.Cluster.topology` for more information.
54 |         """
55 |         return self.data[0].topology()
56 | 
57 |     @property
58 |     def data(self):
59 |         """
60 |         Returns the data that is currently in process.
61 |         """
62 |         return self._data
63 | 
64 |     @property
65 |     def raw_data(self):
66 |         """
67 |         Returns the raw data (data without being clustered).
68 |         """
69 |         return self._input
70 | 
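The note in the `BaseClusterMethod` docstring requires the distance function to be symmetric and to return 0 for identical items. A small checker along the following lines could be used to sanity-test a candidate function before clustering. This is only a sketch: `check_distance_function` is a hypothetical helper and is not part of python-cluster.

```python
from itertools import combinations


def check_distance_function(items, distance):
    """Return a list of contract violations found on ``items`` (empty if none)."""
    problems = []
    for item in items:
        # an item compared with itself should be at distance 0
        if distance(item, item) != 0:
            problems.append(("non-zero self-distance", item, item))
    for a, b in combinations(items, 2):
        # distance(a, b) must equal distance(b, a)
        if distance(a, b) != distance(b, a):
            problems.append(("asymmetric", a, b))
    return problems


good = check_distance_function([1, 5, 9], lambda x, y: abs(x - y))
bad = check_distance_function([1, 5, 9], lambda x, y: x - y)
```

Here `good` is empty, while `bad` reports one asymmetric pair for each of the three item combinations, because a signed difference changes sign when the arguments are swapped.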


--------------------------------------------------------------------------------
/cluster/method/hierarchical.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it
  6 | # under the terms of the GNU Lesser General Public License as published by the
  7 | # Free Software Foundation; either version 2.1 of the License, or (at your
  8 | # option) any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 12 | # for more details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation,
 15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | from functools import partial
 19 | import logging
 20 | 
 21 | from cluster.cluster import Cluster
 22 | from cluster.matrix import Matrix
 23 | from cluster.method.base import BaseClusterMethod
 24 | from cluster.linkage import single, complete, average, uclus
 25 | 
 26 | 
 27 | logger = logging.getLogger(__name__)
 28 | 
 29 | 
 30 | class HierarchicalClustering(BaseClusterMethod):
 31 |     """
 32 |     Implementation of the hierarchical clustering method as explained in a
 33 |     tutorial_ by *matteucc*.
 34 | 
 35 |     Object prerequisites:
 36 | 
 37 |     * Items must be sortable (See `issue #11`_)
 38 |     * Items must be hashable.
 39 | 
 40 |     .. _issue #11: https://github.com/exhuma/python-cluster/issues/11
 41 |     .. _tutorial: http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/hierarchical.html
 42 | 
 43 |     Example:
 44 | 
 45 |         >>> from cluster import HierarchicalClustering
 46 |         >>> # or: from cluster import *
 47 |         >>> cl = HierarchicalClustering([123,334,345,242,234,1,3],
 48 |         ...     lambda x,y: float(abs(x-y)))
 49 |         >>> cl.getlevel(90)
 50 |         [[345, 334], [234, 242], [123], [3, 1]]
 51 | 
 52 |     Note that all of the returned clusters are more than 90 (``getlevel(90)``)
 53 |     apart.
 54 | 
 55 |     See :py:class:`~cluster.method.base.BaseClusterMethod` for more details.
 56 | 
 57 |     :param data: The collection of items to be clustered.
 58 |     :param distance_function: A function which takes two elements of ``data``
 59 |         and returns a distance between both elements (note that the distance
 60 |         should not be returned as a negative value!)
 61 |     :param linkage: The method used to determine the distance between two
 62 |         clusters. See :py:meth:`~.HierarchicalClustering.set_linkage_method` for
 63 |         possible values.
 64 |     :param num_processes: If you want to use multiprocessing to split up the
 65 |         work and run ``genmatrix()`` in parallel, specify num_processes > 1 and
 66 |         this number of workers will be spun up, the work split up amongst them
 67 |         evenly.
 68 |     :param progress_callback: A function to be called on each iteration to
 69 |         publish the progress. The function is called with two integer arguments
 70 |         which represent the total number of elements in the cluster, and the
 71 |         remaining elements to be clustered.
 72 |     """
 73 | 
 74 |     def __init__(self, data, distance_function, linkage=None, num_processes=1,
 75 |                  progress_callback=None):
 76 |         if not linkage:
 77 |             linkage = single
 78 |         logger.info("Initializing HierarchicalClustering object with linkage "
 79 |                     "method %s", linkage)
 80 |         BaseClusterMethod.__init__(self, sorted(data), distance_function)
 81 |         self.set_linkage_method(linkage)
 82 |         self.num_processes = num_processes
 83 |         self.progress_callback = progress_callback
 84 |         self.__cluster_created = False
 85 | 
 86 |     def publish_progress(self, total, current):
 87 |         """
 88 |         If a progress function was supplied, this will call that function with
 89 |         the total number of elements, and the remaining number of elements.
 90 | 
 91 |         :param total: The total number of elements.
 92 |         :param current: The remaining number of elements.
 93 |         """
 94 |         if self.progress_callback:
 95 |             self.progress_callback(total, current)
 96 | 
 97 |     def set_linkage_method(self, method):
 98 |         """
 99 |         Sets the method to determine the distance between two clusters.
100 | 
101 |         :param method: The method to use. It can be one of ``'single'``,
102 |             ``'complete'``, ``'average'`` or ``'uclus'``, or a callable. The
103 |             callable should take two collections as parameters and return a
104 |             distance value between both collections.
105 |         """
106 |         if method == 'single':
107 |             self.linkage = single
108 |         elif method == 'complete':
109 |             self.linkage = complete
110 |         elif method == 'average':
111 |             self.linkage = average
112 |         elif method == 'uclus':
113 |             self.linkage = uclus
114 |         elif hasattr(method, '__call__'):
115 |             self.linkage = method
116 |         else:
117 |             raise ValueError('distance method must be one of single, '
118 |                              'complete, average or uclus')
119 | 
120 |     def cluster(self, matrix=None, level=None, sequence=None):
121 |         """
122 |         Perform hierarchical clustering.
123 | 
124 |         :param matrix: The 2D list that is currently under processing. The
125 |             matrix contains the distances of each item with each other
126 |         :param level: The current level of clustering
127 |         :param sequence: The sequence number of the clustering
128 |         """
129 |         logger.info("Performing cluster()")
130 | 
131 |         if matrix is None:
132 |             # create level 0, first iteration (sequence)
133 |             level = 0
134 |             sequence = 0
135 |             matrix = []
136 | 
137 |         # if the matrix only has two rows left, we are done
138 |         linkage = partial(self.linkage, distance_function=self.distance)
139 |         initial_element_count = len(self._data)
140 |         while len(matrix) > 2 or matrix == []:
141 | 
142 |             item_item_matrix = Matrix(self._data,
143 |                                       linkage,
144 |                                       True,
145 |                                       0)
146 |             item_item_matrix.genmatrix(self.num_processes)
147 |             matrix = item_item_matrix.matrix
148 | 
149 |             smallestpair = None
150 |             mindistance = None
151 |             rowindex = 0  # keep track of where we are in the matrix
152 |             # find the minimum distance
153 |             for row in matrix:
154 |                 cellindex = 0  # keep track of where we are in the matrix
155 |                 for cell in row:
156 |                     # if we are not on the diagonal (which is always 0)
157 |                     # and if this cell represents a new minimum...
158 |                     cell_lt_mdist = cell < mindistance if mindistance else False
159 |                     if ((rowindex != cellindex) and
160 |                             (cell_lt_mdist or smallestpair is None)):
161 |                         smallestpair = (rowindex, cellindex)
162 |                         mindistance = cell
163 |                     cellindex += 1
164 |                 rowindex += 1
165 | 
166 |             sequence += 1
167 |             level = matrix[smallestpair[1]][smallestpair[0]]
168 |             cluster = Cluster(level, self._data[smallestpair[0]],
169 |                               self._data[smallestpair[1]])
170 | 
171 |             # maintain the data, by combining the two most similar items
172 |             # in the list we use the min and max functions to ensure the
173 |             # integrity of the data.  imagine: if we first remove the item
174 |             # with the smaller index, all the rest of the items shift down by
175 |             # one. So the next index will be wrong. We could simply adjust the
176 |             # value of the second "remove" call, but we don't know the order
177 |             # in which they come. The max and min approach clarifies that
178 |             self._data.remove(self._data[max(smallestpair[0],
179 |                                              smallestpair[1])])  # remove item 1
180 |             self._data.remove(self._data[min(smallestpair[0],
181 |                                              smallestpair[1])])  # remove item 2
182 |             self._data.append(cluster)  # append item 1 and 2 combined
183 | 
184 |             self.publish_progress(initial_element_count, len(self._data))
185 | 
186 |         # all the data is in one single cluster. We return that and stop
187 |         self.__cluster_created = True
188 |         logger.info("Call to cluster() is complete")
189 |         return
190 | 
191 |     def getlevel(self, threshold):
192 |         """
193 |         Returns all clusters with a maximum distance of *threshold* in between
194 |         each other
195 | 
196 |         :param threshold: the maximum distance between clusters.
197 | 
198 |         See :py:meth:`~cluster.cluster.Cluster.getlevel`
199 |         """
200 | 
201 |         # if it's not worth clustering, just return the data
202 |         if len(self._input) <= 1:
203 |             return self._input
204 | 
205 |         # initialize the cluster if not yet done
206 |         if not self.__cluster_created:
207 |             self.cluster()
208 | 
209 |         return self._data[0].getlevel(threshold)
210 | 
211 |     def display(self):
212 |         """
213 |         Prints a simple dendrogram-like representation of the full cluster
214 |         to the console.
215 |         """
216 |         # initialize the cluster if not yet done
217 |         if not self.__cluster_created:
218 |             self.cluster()
219 | 
220 |         self._data[0].display()
221 | 
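The `cluster()` loop above repeatedly finds the closest pair, merges it, and removes the item with the larger index first so that the smaller index stays valid. The following standalone sketch shows the same merge loop in a simplified form. It is an assumption-laden illustration rather than the library's algorithm: it works on plain lists instead of `Cluster` objects, it hard-codes single linkage, and it stops merging once the closest pair is further apart than `threshold` instead of building the full tree and cutting it afterwards. The names `single_linkage` and `agglomerate` are hypothetical.

```python
def single_linkage(cluster_a, cluster_b, distance):
    """Smallest pairwise distance between the members of two clusters."""
    return min(distance(a, b) for a in cluster_a for b in cluster_b)


def agglomerate(items, distance, threshold):
    """Merge the closest clusters until the closest pair exceeds ``threshold``."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j], distance)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        # j > i always holds here: delete the larger index first so that
        # index i still points at the intended cluster
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters


data = [123, 334, 345, 242, 234, 1, 3]
result = agglomerate(data, lambda x, y: float(abs(x - y)), 90)
normalised = sorted(sorted(c) for c in result)
```

On the data from the class docstring this yields the same four groups as the documented `getlevel(90)` output, up to ordering.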


--------------------------------------------------------------------------------
/cluster/method/kmeans.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it
  6 | # under the terms of the GNU Lesser General Public License as published by the
  7 | # Free Software Foundation; either version 2.1 of the License, or (at your
  8 | # option) any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 12 | # for more details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation,
 15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | 
 19 | from cluster.util import ClusteringError, centroid, minkowski_distance
 20 | 
 21 | 
 22 | class KMeansClustering(object):
 23 |     """
 24 |     Implementation of the kmeans clustering method as explained in a tutorial_
 25 |     by *matteucc*.
 26 | 
 27 |     .. _tutorial: http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/kmeans.html
 28 | 
 29 |     Example:
 30 | 
 31 |       >>> from cluster import KMeansClustering
 32 |       >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
 33 |       >>> clusters = cl.getclusters(2)
 34 | 
 35 |     :param data: A list of tuples or integers.
 36 |     :param distance: A function determining the distance between two items.
 37 |         Default (if ``None`` is passed): It assumes the tuples contain numeric
 38 |         values and applies a generalised form of the Euclidean-distance
 39 |         algorithm on them.
 40 |     :param equality: A function to test equality of items. By default the
 41 |         standard python equality operator (``==``) is applied.
 42 |     :raises ValueError: if the list contains heterogeneous items or if the
 43 |         distance between items cannot be determined.
 44 |     """
 45 | 
 46 |     def __init__(self, data, distance=None, equality=None):
 47 |         self.__clusters = []
 48 |         self.__data = data
 49 |         self.distance = distance
 50 |         self.__initial_length = len(data)
 51 |         self.equality = equality
 52 | 
 53 |         # test if each item is of same dimensions
 54 |         if len(data) > 1 and isinstance(data[0], tuple):
 55 |             control_length = len(data[0])
 56 |             for item in data[1:]:
 57 |                 if len(item) != control_length:
 58 |                     raise ValueError("Each item in the data list must have "
 59 |                                      "the same amount of dimensions. Item "
 60 |                                      "%r was out of line!" % (item,))
 61 |         # now check if we need and have a distance function
 62 |         if (len(data) > 1 and not isinstance(data[0], tuple) and
 63 |                 distance is None):
 64 |             raise ValueError("You supplied non-standard items but no "
 65 |                              "distance function! We cannot continue!")
 66 |         # we now know that we have tuples, and assume therefore that its
 67 |         # items are numeric
 68 |         elif distance is None:
 69 |             self.distance = minkowski_distance
 70 | 
 71 |     def getclusters(self, count):
 72 |         """
 73 |         Generates *count* clusters.
 74 | 
 75 |         :param count: The amount of clusters that should be generated.  count
 76 |             must be greater than ``1``.
 77 |         :raises ClusteringError: if *count* is out of bounds.
 78 |         """
 79 | 
 80 |         # only proceed if we got sensible input
 81 |         if count <= 1:
 82 |             raise ClusteringError("When clustering, you need to ask for at "
 83 |                                   "least two clusters! "
 84 |                                   "You asked for %d" % count)
 85 | 
 86 |         # return the data straight away if there is nothing to cluster
 87 |         if (self.__data == [] or len(self.__data) == 1 or
 88 |                 count == self.__initial_length):
 89 |             return self.__data
 90 | 
 91 |         # It makes no sense to ask for more clusters than data-items available
 92 |         if count > self.__initial_length:
 93 |             raise ClusteringError(
 94 |                 "Unable to generate more clusters than "
 95 |                 "items available. You supplied %d items, and asked for "
 96 |                 "%d clusters." % (self.__initial_length, count))
 97 | 
 98 |         self.initialise_clusters(self.__data, count)
 99 | 
100 |         items_moved = True  # tells us if any item moved between the clusters,
101 |                             # as we initialised the clusters, we assume that
102 |                             # is the case
103 | 
104 |         while items_moved is True:
105 |             items_moved = False
106 |             for cluster in self.__clusters:
107 |                 for item in cluster:
108 |                     res = self.assign_item(item, cluster)
109 |                     if items_moved is False:
110 |                         items_moved = res
111 |         return self.__clusters
112 | 
113 |     def assign_item(self, item, origin):
114 |         """
115 |         Assigns an item from a given cluster to the closest located cluster.
116 | 
117 |         :param item: the item to be moved.
118 |         :param origin: the originating cluster.
119 |         """
120 |         closest_cluster = origin
121 |         for cluster in self.__clusters:
122 |             if self.distance(item, centroid(cluster)) < self.distance(
123 |                     item, centroid(closest_cluster)):
124 |                 closest_cluster = cluster
125 | 
126 |         if id(closest_cluster) != id(origin):
127 |             self.move_item(item, origin, closest_cluster)
128 |             return True
129 |         else:
130 |             return False
131 | 
132 |     def move_item(self, item, origin, destination):
133 |         """
134 |         Moves an item from one cluster to another cluster.
135 | 
136 |         :param item: the item to be moved.
137 |         :param origin: the originating cluster.
138 |         :param destination: the target cluster.
139 |         """
140 |         if self.equality:
141 |             item_index = 0
142 |             for i, element in enumerate(origin):
143 |                 if self.equality(element, item):
144 |                     item_index = i
145 |                     break
146 |         else:
147 |             item_index = origin.index(item)
148 | 
149 |         destination.append(origin.pop(item_index))
150 | 
151 |     def initialise_clusters(self, input_, clustercount):
152 |         """
153 |         Initialises the clusters by distributing the items from the data
154 |         evenly across n clusters.
155 | 
156 |         :param input_: the data set (a list of tuples).
157 |         :param clustercount: the amount of clusters (n).
158 |         """
159 |         # initialise the clusters with empty lists
160 |         self.__clusters = []
161 |         for _ in range(clustercount):
162 |             self.__clusters.append([])
163 | 
164 |         # distribute the items into the clusters
165 |         count = 0
166 |         for item in input_:
167 |             self.__clusters[count % clustercount].append(item)
168 |             count += 1
169 | 
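`KMeansClustering` seeds its clusters round-robin (`initialise_clusters`) and then keeps moving items to the cluster with the nearest centroid until nothing moves. The sketch below illustrates that overall flow as a standalone script. It is not a copy of the library's logic: it reassigns all items in batches per pass (the classic Lloyd iteration) rather than moving items one at a time, it skips empty clusters when choosing the nearest centroid, and `centroid`, `euclidean`, `round_robin` and `kmeans` are local helpers written for this example, not the functions from `cluster.util`.

```python
from math import sqrt


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / float(len(points))
                 for d in range(dims))


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def round_robin(items, k):
    """Distribute items over k clusters in turn, like initialise_clusters."""
    clusters = [[] for _ in range(k)]
    for index, item in enumerate(items):
        clusters[index % k].append(item)
    return clusters


def kmeans(items, k):
    clusters = round_robin(items, k)
    while True:
        # empty clusters have no centroid and are skipped as candidates
        centers = [centroid(c) if c else None for c in clusters]
        candidates = [i for i in range(k) if centers[i] is not None]
        new_clusters = [[] for _ in range(k)]
        for item in items:
            nearest = min(candidates,
                          key=lambda i: euclidean(item, centers[i]))
            new_clusters[nearest].append(item)
        if new_clusters == clusters:  # nothing moved: converged
            return clusters
        clusters = new_clusters


points = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 9), (9, 10)]
groups = sorted(sorted(c) for c in kmeans(points, 2))
```

With these six points the round-robin seed mixes both groups, and one reassignment pass is enough to separate the points near the origin from those near `(9, 9)`.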


--------------------------------------------------------------------------------
/cluster/test/test_hierarchical.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it under
  6 | # the terms of the GNU Lesser General Public License as published by the Free
  7 | # Software Foundation; either version 2.1 of the License, or (at your option)
  8 | # any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 11 | # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 12 | # details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation, Inc.,
 15 | # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | """
 19 | Tests for hierarchical clustering.
 20 | 
 21 | .. note::
 22 | 
 23 |     Even though the results are lists, the order of items in the resulting
 24 |     clusters is non-deterministic. This should be taken into consideration when
 25 |     writing "expected" values!
 26 | """
 27 | 
 28 | from difflib import SequenceMatcher
 29 | from math import sqrt
 30 | from sys import hexversion
 31 | import unittest
 32 | 
 33 | from cluster import HierarchicalClustering
 34 | 
 35 | 
 36 | class Py23TestCase(unittest.TestCase):
 37 | 
 38 |     def __init__(self, *args, **kwargs):
 39 |         super(Py23TestCase, self).__init__(*args, **kwargs)
 40 |         if hexversion < 0x030000f0:
 41 |             self.assertCItemsEqual = self.assertItemsEqual
 42 |         else:
 43 |             self.assertCItemsEqual = self.assertCountEqual
 44 | 
 45 | 
 46 | class HClusterSmallListTestCase(Py23TestCase):
 47 |     """
 48 |     Test for Bug #1516204
 49 |     """
 50 | 
 51 |     def testClusterLen1(self):
 52 |         """
 53 |         Testing if hierarchical clustering a set of length 1 returns a set of
 54 |         length 1
 55 |         """
 56 |         cl = HierarchicalClustering([876], lambda x, y: abs(x - y))
 57 |         self.assertCItemsEqual([876], cl.getlevel(40))
 58 | 
 59 |     def testClusterLen0(self):
 60 |         """
 61 |         Testing if hierarchical clustering an empty list returns an empty list
 62 |         """
 63 |         cl = HierarchicalClustering([], lambda x, y: abs(x - y))
 64 |         self.assertEqual([], cl.getlevel(40))
 65 | 
 66 | 
 67 | class HClusterIntegerTestCase(Py23TestCase):
 68 | 
 69 |     def setUp(self):
 70 |         self.__data = [791, 956, 676, 124, 564, 84, 24, 365, 594, 940, 398,
 71 |                        971, 131, 365, 542, 336, 518, 835, 134, 391]
 72 | 
 73 |     def testSingleLinkage(self):
 74 |         "Basic Hierarchical Clustering test with integers"
 75 |         cl = HierarchicalClustering(self.__data, lambda x, y: abs(x - y))
 76 |         result = cl.getlevel(40)
 77 | 
 78 |         # sort the values to make the tests less prone to algorithm changes
 79 |         result = [sorted(_) for _ in result]
 80 |         self.assertCItemsEqual([
 81 |             [24],
 82 |             [336, 365, 365, 391, 398],
 83 |             [518, 542, 564, 594],
 84 |             [676],
 85 |             [791],
 86 |             [835],
 87 |             [84, 124, 131, 134],
 88 |             [940, 956, 971],
 89 |         ], result)
 90 | 
 91 |     def testCompleteLinkage(self):
 92 |         "Basic Hierarchical Clustering test with integers"
 93 |         cl = HierarchicalClustering(self.__data,
 94 |                                     lambda x, y: abs(x - y),
 95 |                                     linkage='complete')
 96 |         result = cl.getlevel(40)
 97 | 
 98 |         # sort the values to make the tests less prone to algorithm changes
 99 |         result = sorted([sorted(_) for _ in result])
100 | 
101 |         expected = [
102 |             [24],
103 |             [84],
104 |             [124, 131, 134],
105 |             [336, 365, 365],
106 |             [391, 398],
107 |             [518],
108 |             [542, 564],
109 |             [594],
110 |             [676],
111 |             [791],
112 |             [835],
113 |             [940, 956, 971],
114 |         ]
115 |         self.assertEqual(result, expected)
116 | 
117 |     def testUCLUS(self):
118 |         "Basic Hierarchical Clustering test with integers"
119 |         cl = HierarchicalClustering(self.__data,
120 |                                     lambda x, y: abs(x - y),
121 |                                     linkage='uclus')
122 |         expected = [
123 |             [24],
124 |             [84],
125 |             [124, 131, 134],
126 |             [336, 365, 365, 391, 398],
127 |             [518, 542, 564],
128 |             [594],
129 |             [676],
130 |             [791],
131 |             [835],
132 |             [940, 956, 971],
133 |         ]
134 |         result = sorted([sorted(_) for _ in cl.getlevel(40)])
135 |         self.assertEqual(result, expected)
136 | 
137 |     def testAverageLinkage(self):
138 |         cl = HierarchicalClustering(self.__data,
139 |                                     lambda x, y: abs(x - y),
140 |                                     linkage='average')
141 |         # TODO: The current test-data does not really trigger a difference
142 |         # between UCLUS and "average" linkage.
143 |         expected = [
144 |             [24],
145 |             [84],
146 |             [124, 131, 134],
147 |             [336, 365, 365, 391, 398],
148 |             [518, 542, 564],
149 |             [594],
150 |             [676],
151 |             [791],
152 |             [835],
153 |             [940, 956, 971],
154 |         ]
155 |         result = sorted([sorted(_) for _ in cl.getlevel(40)])
156 |         self.assertEqual(result, expected)
157 | 
158 |     def testUnmodifiedData(self):
159 |         cl = HierarchicalClustering(self.__data, lambda x, y: abs(x - y))
160 |         new_data = []
161 |         [new_data.extend(_) for _ in cl.getlevel(40)]
162 |         self.assertEqual(sorted(new_data), sorted(self.__data))
163 | 
164 |     def testMultiprocessing(self):
165 |         cl = HierarchicalClustering(self.__data, lambda x, y: abs(x - y),
166 |                                     num_processes=4)
167 |         new_data = []
168 |         [new_data.extend(_) for _ in cl.getlevel(40)]
169 |         self.assertEqual(sorted(new_data), sorted(self.__data))
170 | 
171 | 
172 | class HClusterStringTestCase(Py23TestCase):
173 | 
174 |     def sim(self, x, y):
175 |         sm = SequenceMatcher(lambda x: x in ". -", x, y)
176 |         return 1 - sm.ratio()
177 | 
178 |     def setUp(self):
179 |         self.__data = ("Lorem ipsum dolor sit amet consectetuer adipiscing "
180 |                        "elit Ut elit Phasellus consequat ultricies mi Sed "
181 |                        "congue leo at neque Nullam").split()
182 | 
183 |     def testDataTypes(self):
184 |         "Test for bug #?"
185 |         cl = HierarchicalClustering(self.__data, self.sim)
186 |         for item in cl.getlevel(0.5):
187 |             self.assertEqual(
188 |                 type(item), type([]),
189 |                 "Every item should be a list!")
190 | 
191 |     def testCluster(self):
192 |         "Basic Hierarchical clustering test with strings"
193 |         self.skipTest('These values lead to non-deterministic results. '
194 |                       'This makes it untestable!')
195 |         cl = HierarchicalClustering(self.__data, self.sim)
196 |         self.assertEqual([
197 |             ['ultricies'],
198 |             ['Sed'],
199 |             ['Phasellus'],
200 |             ['mi'],
201 |             ['Nullam'],
202 |             ['sit', 'elit', 'elit', 'Ut', 'amet', 'at'],
203 |             ['leo', 'Lorem', 'dolor'],
204 |             ['congue', 'neque', 'consectetuer', 'consequat'],
205 |             ['adipiscing'],
206 |             ['ipsum'],
207 |         ], cl.getlevel(0.5))
208 | 
209 |     def testUnmodifiedData(self):
210 |         cl = HierarchicalClustering(self.__data, self.sim)
211 |         new_data = []
212 |         [new_data.extend(_) for _ in cl.getlevel(0.5)]
213 |         self.assertEqual(sorted(new_data), sorted(self.__data))
214 | 
215 | 
216 | class HClusterTuplesTestCase(Py23TestCase):
217 |     '''
218 |     Test case to cover the case where the data contains tuple-items
219 | 
220 |     See Github issue #20
221 |     '''
222 | 
223 |     def testSingleLinkage(self):
224 |         "Basic Hierarchical Clustering test with tuples"
225 | 
226 |         def euclidian_distance(a, b):
227 |             return sqrt(sum([pow(z[0] - z[1], 2) for z in zip(a, b)]))
228 | 
229 |         self.__data = [(1, 1), (1, 2), (1, 3)]
230 |         cl = HierarchicalClustering(self.__data, euclidian_distance)
231 |         result = cl.getlevel(40)
232 |         self.assertIsNotNone(result)
233 | 
234 | class Issue28TestCase(Py23TestCase):
235 |     '''
236 |     Test case to cover the case where the data consist
237 |     of dictionary keys, and the distance function executes 
238 |     on the values these keys are associated with in the
239 |     dictionary, rather than the keys themselves.
240 | 
241 |     Behaviour for this test case differs between Python2.7
242 |     and Python3.5: on 2.7 the test behaves as expected, 
243 | 
244 |     See Github issue #28.
245 |     '''
246 | 
247 |     def testIssue28(self):
248 |         "Issue28 (Hierarchical Clustering)"
249 | 
250 |         points1D = {
251 |             'p4' : 5, 'p2' : 6, 'p7' : 10,
252 |             'p9' : 120, 'p10' : 121, 'p11' : 119,
253 |         }
254 | 
255 |         distance_func = lambda a,b : abs(points1D[a]-points1D[b])
256 |         cl = HierarchicalClustering(list(points1D.keys()), distance_func)
257 |         result = cl.getlevel(20)
258 |         self.assertIsNotNone(result)
259 |     
260 | if __name__ == '__main__':
261 | 
262 |     import logging
263 | 
264 |     suite = unittest.TestSuite((
265 |         unittest.makeSuite(HClusterIntegerTestCase),
266 |         unittest.makeSuite(HClusterSmallListTestCase),
267 |         unittest.makeSuite(HClusterStringTestCase),
268 |         unittest.makeSuite(Issue28TestCase),
269 |     ))
270 | 
271 |     logging.basicConfig(level=logging.DEBUG)
272 |     unittest.TextTestRunner(verbosity=2).run(suite)
273 | 


--------------------------------------------------------------------------------
/cluster/test/test_kmeans.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it under
  6 | # the terms of the GNU Lesser General Public License as published by the Free
  7 | # Software Foundation; either version 2.1 of the License, or (at your option)
  8 | # any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 11 | # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
 12 | # details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation, Inc.,
 15 | # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | from cluster import (KMeansClustering, ClusteringError)
 19 | import unittest
 20 | 
 21 | 
 22 | def compare_list(x, y):
 23 |     """
 24 |     Compare lists by content. Ordering does not matter.
 25 |     Returns True if both lists contain the same items (and are of identical
 26 |     length)
 27 |     """
 28 | 
 29 |     cmpx = [set(cluster) for cluster in x]
 30 |     cmpy = [set(cluster) for cluster in y]
 31 | 
 32 |     all_ok = True
 33 | 
 34 |     for cset in cmpx:
 35 |         all_ok &= cset in cmpy
 36 | 
 37 |     for cset in cmpy:
 38 |         all_ok &= cset in cmpx
 39 | 
 40 |     return all_ok
 41 | 
 42 | 
 43 | class KClusterSmallListTestCase(unittest.TestCase):
 44 | 
 45 |     def testClusterLen1(self):
 46 |         "Testing that a search space of length 1 returns only one cluster"
 47 |         cl = KMeansClustering([876])
 48 |         self.assertEqual([876], cl.getclusters(2))
 49 |         self.assertEqual([876], cl.getclusters(5))
 50 | 
 51 |     def testClusterLen0(self):
 52 |         "Testing that clustering an empty set returns an empty set"
 53 |         cl = KMeansClustering([])
 54 |         self.assertEqual([], cl.getclusters(2))
 55 |         self.assertEqual([], cl.getclusters(7))
 56 | 
 57 | 
 58 | class KCluster2DTestCase(unittest.TestCase):
 59 | 
 60 |     def testClusterCount(self):
 61 |         "Test that asking for fewer than 2 clusters raises an error"
 62 |         cl = KMeansClustering([876, 123, 344, 676],
 63 |                               distance=lambda x, y: abs(x - y))
 64 |         self.assertRaises(ClusteringError, cl.getclusters, 0)
 65 |         self.assertRaises(ClusteringError, cl.getclusters, 1)
 66 | 
 67 |     def testNonsenseCluster(self):
 68 |         """
 69 |         Test that asking for more clusters than data-items available raises an
 70 |         error
 71 |         """
 72 |         cl = KMeansClustering([876, 123], distance=lambda x, y: abs(x - y))
 73 |         self.assertRaises(ClusteringError, cl.getclusters, 5)
 74 | 
 75 |     def testUniformLength(self):
 76 |         """
 77 |         Test if there is an item in the cluster that has a different
 78 |         cardinality
 79 |         """
 80 |         data = [(1, 5), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6, 7), (7, 3),
 81 |                 (8, 1), (8, 2), (8,), (9, 2), (9, 3)]
 82 |         self.assertRaises(ValueError, KMeansClustering, data)
 83 | 
 84 |     def testPointDoubling(self):
 85 |         "test for bug #1604868"
 86 |         data = [(18, 13), (15, 12), (17, 12), (18, 12), (19, 12), (16, 11),
 87 |                 (18, 11), (19, 10), (0, 0), (1, 4), (1, 2), (2, 3), (4, 1),
 88 |                 (4, 3), (5, 2), (6, 1)]
 89 |         cl = KMeansClustering(data)
 90 |         clusters = cl.getclusters(2)
 91 |         expected = [[(18, 13), (15, 12), (17, 12), (18, 12), (19, 12),
 92 |                      (16, 11), (18, 11), (19, 10)],
 93 |                     [(0, 0), (1, 4), (1, 2), (2, 3), (4, 1),
 94 |                      (5, 2), (6, 1), (4, 3)]]
 95 |         self.assertTrue(compare_list(
 96 |             clusters,
 97 |             expected),
 98 |             "Elements differ!\n%s\n%s" % (clusters, expected))
 99 | 
100 |     def testClustering(self):
101 |         "Basic clustering test"
102 |         data = [(8, 2), (7, 3), (2, 6), (3, 5), (3, 6), (1, 5), (8, 1),
103 |                 (3, 4), (8, 3), (9, 2), (2, 5), (9, 3)]
104 |         cl = KMeansClustering(data)
105 |         self.assertEqual(
106 |             cl.getclusters(2),
107 |             [[(8, 2), (8, 1), (8, 3), (7, 3), (9, 2), (9, 3)],
108 |              [(3, 5), (1, 5), (3, 4), (2, 6), (2, 5), (3, 6)]])
109 | 
110 |     def testUnmodifiedData(self):
111 |         "Test that clustering neither drops nor duplicates input items"
112 |         data = [(8, 2), (7, 3), (2, 6), (3, 5), (3, 6), (1, 5), (8, 1),
113 |                 (3, 4), (8, 3), (9, 2), (2, 5), (9, 3)]
114 |         cl = KMeansClustering(data)
115 | 
116 |         new_data = []
117 |         [new_data.extend(_) for _ in cl.getclusters(2)]
118 |         self.assertEqual(sorted(new_data), sorted(data))
119 | 
120 | 
121 | class KClusterSFBugs(unittest.TestCase):
122 | 
123 |     def testLostFunctionReference(self):
124 |         "test for bug #1727558"
125 |         cl = KMeansClustering([(1, 1), (20, 40), (20, 41)],
126 |                               lambda x, y: x + y)
127 |         clusters = cl.getclusters(3)
128 |         expected = [(1, 1), (20, 40), (20, 41)]
129 |         self.assertTrue(compare_list(
130 |             clusters,
131 |             expected),
132 |             "Elements differ!\n%s\n%s" % (clusters, expected))
133 | 
134 |     def testMultidimArray(self):
135 |         from random import random
136 |         data = []
137 |         for _ in range(200):
138 |             data.append([random(), random()])
139 |         cl = KMeansClustering(data, lambda p0, p1: (
140 |             p0[0] - p1[0]) ** 2 + (p0[1] - p1[1]) ** 2)
141 |         cl.getclusters(10)
142 | 
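
Note on comparing cluster lists: the compare_list helper above compares clusters by content and ignores ordering. A minimal standalone sketch of the same idea follows. The name same_clusters and the sample data are hypothetical and not part of the library. Because it uses sets, it also ignores duplicate items and duplicate clusters.

```python
def same_clusters(x, y):
    """Return True if x and y hold the same clusters, ignoring all ordering.

    Each cluster is turned into a frozenset so that clusters can themselves
    be collected into a set. Duplicates are therefore ignored.
    """
    return {frozenset(c) for c in x} == {frozenset(c) for c in y}


left = [[(1, 1), (2, 2)], [(9, 9)]]
right = [[(9, 9)], [(2, 2), (1, 1)]]      # same clusters, different order
other = [[(1, 1)], [(2, 2), (9, 9)]]      # same items, different grouping

print(same_clusters(left, right))
print(same_clusters(left, other))
```

Items must be hashable (tuples are, lists are not) for this sketch to work.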


--------------------------------------------------------------------------------
/cluster/test/test_linkage.py:
--------------------------------------------------------------------------------
 1 | import unittest
 2 | 
 3 | from cluster.linkage import single, complete, uclus, average
 4 | 
 5 | 
 6 | class LinkageMethods(unittest.TestCase):
 7 | 
 8 |     def setUp(self):
 9 |         self.set_a = [1, 2, 3, 4]
10 |         self.set_b = [10, 11, 12, 13, 14, 15, 100]
11 |         self.dist = lambda x, y: abs(x-y)  # NOQA
12 | 
13 |     def test_single_distance(self):
14 |         result = single(self.set_a, self.set_b, self.dist)
15 |         expected = 6
16 |         self.assertEqual(result, expected)
17 | 
18 |     def test_complete_distance(self):
19 |         result = complete(self.set_a, self.set_b, self.dist)
20 |         expected = 99
21 |         self.assertEqual(result, expected)
22 | 
23 |     def test_uclus_distance(self):
24 |         result = uclus(self.set_a, self.set_b, self.dist)
25 |         expected = 10.5
26 |         self.assertEqual(result, expected)
27 | 
28 |     def test_average_distance(self):
29 |         result = average(self.set_a, self.set_b, self.dist)
30 |         expected = 22.5
31 |         self.assertEqual(result, expected)
32 | 
33 | if __name__ == '__main__':
34 | 
35 |     import logging
36 | 
37 |     suite = unittest.TestSuite((
38 |         unittest.makeSuite(LinkageMethods),
39 |     ))
40 | 
41 |     logging.basicConfig(level=logging.DEBUG)
42 |     unittest.TextTestRunner(verbosity=2).run(suite)
43 | 
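
Note on the expected values: the four numbers asserted above can be reproduced by hand. The sketch below assumes that single, complete, uclus and average reduce the list of all pairwise distances with min, max, median and mean respectively. That assumption is consistent with the expected values in these tests, but it is not taken from cluster/linkage.py. The helper names pairwise and absdiff are hypothetical.

```python
from itertools import product
from statistics import mean, median


def pairwise(a, b, dist):
    """Return every distance between an item of a and an item of b."""
    return [dist(x, y) for x, y in product(a, b)]


def absdiff(x, y):
    """Absolute difference, the distance function used in the tests."""
    return abs(x - y)


set_a = [1, 2, 3, 4]
set_b = [10, 11, 12, 13, 14, 15, 100]
distances = pairwise(set_a, set_b, absdiff)   # 4 * 7 = 28 distances

single_linkage = min(distances)       # closest pair: 4 and 10
complete_linkage = max(distances)     # farthest pair: 1 and 100
uclus_linkage = median(distances)     # median of all pairwise distances
average_linkage = mean(distances)     # mean of all pairwise distances

print(single_linkage, complete_linkage, uclus_linkage, average_linkage)
# prints: 6 99 10.5 22.5
```

The outlier 100 in set_b dominates the complete and average values, while the median-based value barely moves.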


--------------------------------------------------------------------------------
/cluster/test/test_numpy.py:
--------------------------------------------------------------------------------
 1 | #
 2 | # This is part of "python-cluster". A library to group similar items together.
 3 | # Copyright (C) 2006    Michel Albert
 4 | #
 5 | # This library is free software; you can redistribute it and/or modify it under
 6 | # the terms of the GNU Lesser General Public License as published by the Free
 7 | # Software Foundation; either version 2.1 of the License, or (at your option)
 8 | # any later version.
 9 | # This library is distributed in the hope that it will be useful, but WITHOUT
10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
11 | # FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more
12 | # details.
13 | # You should have received a copy of the GNU Lesser General Public License
14 | # along with this library; if not, write to the Free Software Foundation, Inc.,
15 | # 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
16 | #
17 | 
18 | import unittest
19 | 
20 | from cluster import KMeansClustering
21 | try:
22 |     import numpy
23 |     NUMPY_AVAILABLE = True
24 | except ImportError:
25 |     NUMPY_AVAILABLE = False
26 | 
27 | 
28 | @unittest.skipUnless(NUMPY_AVAILABLE,
29 |                      'numpy not available. Associated test will not be loaded!')
30 | class NumpyTests(unittest.TestCase):
31 | 
32 |     def testNumpyRandom(self):
33 |         data = numpy.random.rand(500, 2)
34 |         cl = KMeansClustering(data, lambda p0, p1: (
35 |             p0[0] - p1[0]) ** 2 + (p0[1] - p1[1]) ** 2, numpy.array_equal)
36 |         cl.getclusters(10)
37 | 


--------------------------------------------------------------------------------
/cluster/util.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # This is part of "python-cluster". A library to group similar items together.
  3 | # Copyright (C) 2006    Michel Albert
  4 | #
  5 | # This library is free software; you can redistribute it and/or modify it
  6 | # under the terms of the GNU Lesser General Public License as published by the
  7 | # Free Software Foundation; either version 2.1 of the License, or (at your
  8 | # option) any later version.
  9 | # This library is distributed in the hope that it will be useful, but WITHOUT
 10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
 12 | # for more details.
 13 | # You should have received a copy of the GNU Lesser General Public License
 14 | # along with this library; if not, write to the Free Software Foundation,
 15 | # Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 16 | #
 17 | 
 18 | from __future__ import print_function
 19 | import logging
 20 | 
 21 | 
 22 | logger = logging.getLogger(__name__)
 23 | 
 24 | 
 25 | class ClusteringError(Exception):
 26 |     pass
 27 | 
 28 | 
 29 | def flatten(L):
 30 |     """
 31 |     Flattens a list.
 32 | 
 33 |     Example:
 34 | 
 35 |     >>> flatten([1, 2, [3, 4, [5, 6]]])
 36 |     [1, 2, 3, 4, 5, 6]
 37 |     """
 38 |     if not isinstance(L, list):
 39 |         return [L]
 40 | 
 41 |     if L == []:
 42 |         return L
 43 | 
 44 |     return flatten(L[0]) + flatten(L[1:])
 45 | 
 46 | 
 47 | def fullyflatten(container):
 48 |     """
 49 |     Completely flattens out a cluster and returns a one-dimensional list
 50 |     containing the cluster's items. This is useful in cases where some items of
 51 |     the cluster are clusters in their own right and you only want the items.
 52 | 
 53 |     :param container: the container to flatten.
 54 |     """
 55 |     flattened_items = []
 56 | 
 57 |     for item in container:
 58 |         if hasattr(item, 'items'):
 59 |             flattened_items = flattened_items + fullyflatten(item.items)
 60 |         else:
 61 |             flattened_items.append(item)
 62 | 
 63 |     return flattened_items
 64 | 
 65 | 
 66 | def median(numbers):
 67 |     """
 68 |     Return the median of the list of numbers.
 69 |     see: http://mail.python.org/pipermail/python-list/2004-December/294990.html
 70 |     """
 71 | 
 72 |     # Sort the list and take the middle element.
 73 |     n = len(numbers)
 74 |     copy = sorted(numbers)
 75 |     if n & 1:  # There is an odd number of elements
 76 |         return copy[n // 2]
 77 |     else:
 78 |         return (copy[n // 2 - 1] + copy[n // 2]) / 2.0
 79 | 
 80 | 
 81 | def mean(numbers):
 82 |     """
 83 |     Returns the arithmetic mean of a numeric list.
 84 |     see: http://mail.python.org/pipermail/python-list/2004-December/294990.html
 85 |     """
 86 |     return float(sum(numbers)) / float(len(numbers))
 87 | 
 88 | 
 89 | def minkowski_distance(x, y, p=2):
 90 |     """
 91 |     Calculates the Minkowski distance between two points.
 92 | 
 93 |     :param x: the first point
 94 |     :param y: the second point
 95 |     :param p: the order of the Minkowski distance. If *p=1* it is equal
 96 |         to the Manhattan distance, if *p=2* it is equal to the Euclidean
 97 |         distance. The higher the order, the closer it converges to the
 98 |         Chebyshev distance, which has *p=infinity*.
 99 |     """
100 |     from math import pow
101 |     assert len(y) == len(x)
102 |     assert len(x) >= 1
103 |     sum = 0
104 |     for i in range(len(x)):
105 |         sum += abs(x[i] - y[i]) ** p
106 |     return pow(sum, 1.0 / float(p))
107 | 
108 | 
109 | def magnitude(a):
110 |     "Calculates the magnitude of a vector"
111 |     from math import sqrt
112 |     sum = 0
113 |     for coord in a:
114 |         sum += coord ** 2
115 |     return sqrt(sum)
116 | 
117 | 
118 | def dotproduct(a, b):
119 |     "Calculates the dot product between two vectors"
120 |     assert(len(a) == len(b))
121 |     out = 0
122 |     for i in range(len(a)):
123 |         out += a[i] * b[i]
124 |     return out
125 | 
126 | 
127 | def centroid(data, method=median):
128 |     "Returns the central vector, applying ``method`` to each coordinate"
129 |     out = []
130 |     for i in range(len(data[0])):
131 |         out.append(method([x[i] for x in data]))
132 |     return tuple(out)
133 | 
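
Note on the Minkowski order: the docstring of minkowski_distance says that p=1 gives the Manhattan distance, p=2 the Euclidean distance, and that large p approaches the Chebyshev distance. The standalone sketch below illustrates this on one pair of points. The function minkowski is a hypothetical re-implementation written for this note, not the library function itself.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length points."""
    if len(x) != len(y) or len(x) < 1:
        raise ValueError("points must be non-empty and of equal length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b = (0, 0), (3, 4)

manhattan = minkowski(a, b, p=1)     # |3| + |4|
euclidean = minkowski(a, b, p=2)     # sqrt(3**2 + 4**2)
high_order = minkowski(a, b, p=50)   # already very close to the limit

# Chebyshev distance: the largest per-coordinate difference.
chebyshev = max(abs(i - j) for i, j in zip(a, b))

print(manhattan, euclidean, chebyshev)
# prints: 7.0 5.0 4
```

With p=50 the result differs from the Chebyshev value 4 by less than one millionth, since the largest coordinate difference dominates the sum.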


--------------------------------------------------------------------------------
/cluster/version.txt:
--------------------------------------------------------------------------------
1 | 1.4.1.post3
2 | 


--------------------------------------------------------------------------------
/dev-requirements.txt:
--------------------------------------------------------------------------------
1 | sphinx
2 | 


--------------------------------------------------------------------------------
/docs/Makefile:
--------------------------------------------------------------------------------
  1 | # Makefile for Sphinx documentation
  2 | #
  3 | 
  4 | # You can set these variables from the command line.
  5 | SPHINXOPTS    =
  6 | SPHINXBUILD   = sphinx-build
  7 | PAPER         =
  8 | BUILDDIR      = _build
  9 | 
 10 | # User-friendly check for sphinx-build
 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
 13 | endif
 14 | 
 15 | # Internal variables.
 16 | PAPEROPT_a4     = -D latex_paper_size=a4
 17 | PAPEROPT_letter = -D latex_paper_size=letter
 18 | ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 19 | # the i18n builder cannot share the environment and doctrees with the others
 20 | I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 21 | 
 22 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
 23 | 
 24 | help:
 25 | 	@echo "Please use \`make <target>' where <target> is one of"
 26 | 	@echo "  html       to make standalone HTML files"
 27 | 	@echo "  dirhtml    to make HTML files named index.html in directories"
 28 | 	@echo "  singlehtml to make a single large HTML file"
 29 | 	@echo "  pickle     to make pickle files"
 30 | 	@echo "  json       to make JSON files"
 31 | 	@echo "  htmlhelp   to make HTML files and a HTML help project"
 32 | 	@echo "  qthelp     to make HTML files and a qthelp project"
 33 | 	@echo "  devhelp    to make HTML files and a Devhelp project"
 34 | 	@echo "  epub       to make an epub"
 35 | 	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
 36 | 	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
 37 | 	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
 38 | 	@echo "  text       to make text files"
 39 | 	@echo "  man        to make manual pages"
 40 | 	@echo "  texinfo    to make Texinfo files"
 41 | 	@echo "  info       to make Texinfo files and run them through makeinfo"
 42 | 	@echo "  gettext    to make PO message catalogs"
 43 | 	@echo "  changes    to make an overview of all changed/added/deprecated items"
 44 | 	@echo "  xml        to make Docutils-native XML files"
 45 | 	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
 46 | 	@echo "  linkcheck  to check all external links for integrity"
 47 | 	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
 48 | 
 49 | clean:
 50 | 	rm -rf $(BUILDDIR)/*
 51 | 
 52 | html:
 53 | 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 54 | 	@echo
 55 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 56 | 
 57 | dirhtml:
 58 | 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
 59 | 	@echo
 60 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
 61 | 
 62 | singlehtml:
 63 | 	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
 64 | 	@echo
 65 | 	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
 66 | 
 67 | pickle:
 68 | 	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
 69 | 	@echo
 70 | 	@echo "Build finished; now you can process the pickle files."
 71 | 
 72 | json:
 73 | 	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
 74 | 	@echo
 75 | 	@echo "Build finished; now you can process the JSON files."
 76 | 
 77 | htmlhelp:
 78 | 	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
 79 | 	@echo
 80 | 	@echo "Build finished; now you can run HTML Help Workshop with the" \
 81 | 	      ".hhp project file in $(BUILDDIR)/htmlhelp."
 82 | 
 83 | qthelp:
 84 | 	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
 85 | 	@echo
 86 | 	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
 87 | 	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
 88 | 	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/python-cluster.qhcp"
 89 | 	@echo "To view the help file:"
 90 | 	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/python-cluster.qhc"
 91 | 
 92 | devhelp:
 93 | 	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
 94 | 	@echo
 95 | 	@echo "Build finished."
 96 | 	@echo "To view the help file:"
 97 | 	@echo "# mkdir -p $$HOME/.local/share/devhelp/python-cluster"
 98 | 	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/python-cluster"
 99 | 	@echo "# devhelp"
100 | 
101 | epub:
102 | 	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
103 | 	@echo
104 | 	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
105 | 
106 | latex:
107 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
108 | 	@echo
109 | 	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
110 | 	@echo "Run \`make' in that directory to run these through (pdf)latex" \
111 | 	      "(use \`make latexpdf' here to do that automatically)."
112 | 
113 | latexpdf:
114 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
115 | 	@echo "Running LaTeX files through pdflatex..."
116 | 	$(MAKE) -C $(BUILDDIR)/latex all-pdf
117 | 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
118 | 
119 | latexpdfja:
120 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
121 | 	@echo "Running LaTeX files through platex and dvipdfmx..."
122 | 	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
123 | 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
124 | 
125 | text:
126 | 	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
127 | 	@echo
128 | 	@echo "Build finished. The text files are in $(BUILDDIR)/text."
129 | 
130 | man:
131 | 	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
132 | 	@echo
133 | 	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
134 | 
135 | texinfo:
136 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
137 | 	@echo
138 | 	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
139 | 	@echo "Run \`make' in that directory to run these through makeinfo" \
140 | 	      "(use \`make info' here to do that automatically)."
141 | 
142 | info:
143 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
144 | 	@echo "Running Texinfo files through makeinfo..."
145 | 	$(MAKE) -C $(BUILDDIR)/texinfo info
146 | 	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
147 | 
148 | gettext:
149 | 	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
150 | 	@echo
151 | 	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
152 | 
153 | changes:
154 | 	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
155 | 	@echo
156 | 	@echo "The overview file is in $(BUILDDIR)/changes."
157 | 
158 | linkcheck:
159 | 	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
160 | 	@echo
161 | 	@echo "Link check complete; look for any errors in the above output " \
162 | 	      "or in $(BUILDDIR)/linkcheck/output.txt."
163 | 
164 | doctest:
165 | 	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
166 | 	@echo "Testing of doctests in the sources finished, look at the " \
167 | 	      "results in $(BUILDDIR)/doctest/output.txt."
168 | 
169 | xml:
170 | 	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
171 | 	@echo
172 | 	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
173 | 
174 | pseudoxml:
175 | 	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
176 | 	@echo
177 | 	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
178 | 


--------------------------------------------------------------------------------
/docs/_static/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/exhuma/python-cluster/2739ff420ef5bf8fba53f67788453b8239f16c9a/docs/_static/.gitkeep


--------------------------------------------------------------------------------
/docs/apidoc/cluster.matrix.rst:
--------------------------------------------------------------------------------
1 | cluster.matrix
2 | ==============
3 | 
4 | .. automodule:: cluster.matrix
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/apidoc/cluster.method.base.rst:
--------------------------------------------------------------------------------
1 | cluster.method.base
2 | ===================
3 | 
4 | .. automodule:: cluster.method.base
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/apidoc/cluster.method.hierarchical.rst:
--------------------------------------------------------------------------------
1 | cluster.method.hierarchical
2 | ===========================
3 | 
4 | .. automodule:: cluster.method.hierarchical
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/apidoc/cluster.method.kmeans.rst:
--------------------------------------------------------------------------------
1 | cluster.method.kmeans
2 | =====================
3 | 
4 | .. automodule:: cluster.method.kmeans
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/apidoc/cluster.rst:
--------------------------------------------------------------------------------
1 | cluster
2 | =======
3 | 
4 | .. automodule:: cluster.cluster
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/apidoc/cluster.util.rst:
--------------------------------------------------------------------------------
1 | cluster.util
2 | ============
3 | 
4 | .. automodule:: cluster.util
5 |     :members:
6 |     :undoc-members:
7 |     :show-inheritance:
8 | 


--------------------------------------------------------------------------------
/docs/changelog.rst:
--------------------------------------------------------------------------------
1 | Changelog
2 | #########
3 | 
4 | .. include:: ../CHANGELOG
5 | 


--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | #
  3 | # python-cluster documentation build configuration file, created by
  4 | # sphinx-quickstart on Wed Aug 27 07:50:52 2014.
  5 | #
  6 | # This file is execfile()d with the current directory set to its
  7 | # containing dir.
  8 | #
  9 | # Note that not all possible configuration values are present in this
 10 | # autogenerated file.
 11 | #
 12 | # All configuration values have a default; values that are commented out
 13 | # serve to show the default.
 14 | 
 15 | import sys
 16 | import os
 17 | from os.path import dirname, join
 18 | 
 19 | # If extensions (or modules to document with autodoc) are in another directory,
 20 | # add these directories to sys.path here. If the directory is relative to the
 21 | # documentation root, use os.path.abspath to make it absolute, like shown here.
 22 | #sys.path.insert(0, os.path.abspath('.'))
 23 | 
 24 | # -- General configuration ------------------------------------------------
 25 | 
 26 | # If your documentation needs a minimal Sphinx version, state it here.
 27 | #needs_sphinx = '1.0'
 28 | 
 29 | # Add any Sphinx extension module names here, as strings. They can be
 30 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 31 | # ones.
 32 | extensions = [
 33 |     'sphinx.ext.autodoc',
 34 | ]
 35 | 
 36 | # Add any paths that contain templates here, relative to this directory.
 37 | templates_path = ['_templates']
 38 | 
 39 | # The suffix of source filenames.
 40 | source_suffix = '.rst'
 41 | 
 42 | # The encoding of source files.
 43 | #source_encoding = 'utf-8-sig'
 44 | 
 45 | # The master toctree document.
 46 | master_doc = 'index'
 47 | 
 48 | # General information about the project.
 49 | project = u'python-cluster'
 50 | copyright = u'2014, Michel Albert'
 51 | 
 52 | # The version info for the project you're documenting, acts as replacement for
 53 | # |version| and |release|, also used in various other places throughout the
 54 | # built documents.
 55 | #
 56 | version_file = join(dirname(__file__), '..', 'cluster', 'version.txt')
 57 | with open(version_file) as fptr:
 58 |     # The full version, including alpha/beta/rc tags.
 59 |     release = fptr.read().strip()
 60 |     versioninfo = release.split('.')
 61 | 
 62 | # The short X.Y version.
 63 | version = '%s.%s' % (versioninfo[0], versioninfo[1])
 64 | 
 65 | # The language for content autogenerated by Sphinx. Refer to documentation
 66 | # for a list of supported languages.
 67 | #language = None
 68 | 
 69 | # There are two options for replacing |today|: either, you set today to some
 70 | # non-false value, then it is used:
 71 | #today = ''
 72 | # Else, today_fmt is used as the format for a strftime call.
 73 | #today_fmt = '%B %d, %Y'
 74 | 
 75 | # List of patterns, relative to source directory, that match files and
 76 | # directories to ignore when looking for source files.
 77 | exclude_patterns = ['_build']
 78 | 
 79 | # The reST default role (used for this markup: `text`) to use for all
 80 | # documents.
 81 | #default_role = None
 82 | 
 83 | # If true, '()' will be appended to :func: etc. cross-reference text.
 84 | #add_function_parentheses = True
 85 | 
 86 | # If true, the current module name will be prepended to all description
 87 | # unit titles (such as .. function::).
 88 | #add_module_names = True
 89 | 
 90 | # If true, sectionauthor and moduleauthor directives will be shown in the
 91 | # output. They are ignored by default.
 92 | #show_authors = False
 93 | 
 94 | # The name of the Pygments (syntax highlighting) style to use.
 95 | pygments_style = 'sphinx'
 96 | 
 97 | # A list of ignored prefixes for module index sorting.
 98 | #modindex_common_prefix = []
 99 | 
100 | # If true, keep warnings as "system message" paragraphs in the built documents.
101 | #keep_warnings = False
102 | 
103 | 
104 | # -- Options for HTML output ----------------------------------------------
105 | 
106 | # The theme to use for HTML and HTML Help pages.  See the documentation for
107 | # a list of builtin themes.
108 | html_theme = 'alabaster'
109 | 
110 | # Theme options are theme-specific and customize the look and feel of a theme
111 | # further.  For a list of options available for each theme, see the
112 | # documentation.
113 | #html_theme_options = {}
114 | 
115 | # Add any paths that contain custom themes here, relative to this directory.
116 | #html_theme_path = []
117 | 
118 | # The name for this set of Sphinx documents.  If None, it defaults to
119 | # "<project> v<release> documentation".
120 | #html_title = None
121 | 
122 | # A shorter title for the navigation bar.  Default is the same as html_title.
123 | #html_short_title = None
124 | 
125 | # The name of an image file (relative to this directory) to place at the top
126 | # of the sidebar.
127 | #html_logo = None
128 | 
129 | # The name of an image file (within the static path) to use as favicon of the
130 | # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
131 | # pixels large.
132 | #html_favicon = None
133 | 
134 | # Add any paths that contain custom static files (such as style sheets) here,
135 | # relative to this directory. They are copied after the builtin static files,
136 | # so a file named "default.css" will overwrite the builtin "default.css".
137 | html_static_path = ['_static']
138 | 
139 | # Add any extra paths that contain custom files (such as robots.txt or
140 | # .htaccess) here, relative to this directory. These files are copied
141 | # directly to the root of the documentation.
142 | #html_extra_path = []
143 | 
144 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
145 | # using the given strftime format.
146 | #html_last_updated_fmt = '%b %d, %Y'
147 | 
148 | # If true, SmartyPants will be used to convert quotes and dashes to
149 | # typographically correct entities.
150 | #html_use_smartypants = True
151 | 
152 | # Custom sidebar templates, maps document names to template names.
153 | #html_sidebars = {}
154 | 
155 | # Additional templates that should be rendered to pages, maps page names to
156 | # template names.
157 | #html_additional_pages = {}
158 | 
159 | # If false, no module index is generated.
160 | #html_domain_indices = True
161 | 
162 | # If false, no index is generated.
163 | #html_use_index = True
164 | 
165 | # If true, the index is split into individual pages for each letter.
166 | #html_split_index = False
167 | 
168 | # If true, links to the reST sources are added to the pages.
169 | #html_show_sourcelink = True
170 | 
171 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
172 | #html_show_sphinx = True
173 | 
174 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
175 | #html_show_copyright = True
176 | 
177 | # If true, an OpenSearch description file will be output, and all pages will
178 | # contain a <link> tag referring to it.  The value of this option must be the
179 | # base URL from which the finished HTML is served.
180 | #html_use_opensearch = ''
181 | 
182 | # This is the file name suffix for HTML files (e.g. ".xhtml").
183 | #html_file_suffix = None
184 | 
185 | # Output file base name for HTML help builder.
186 | htmlhelp_basename = 'python-clusterdoc'
187 | 
188 | 
189 | # -- Options for LaTeX output ---------------------------------------------
190 | 
191 | latex_elements = {
192 | # The paper size ('letterpaper' or 'a4paper').
193 | #'papersize': 'letterpaper',
194 | 
195 | # The font size ('10pt', '11pt' or '12pt').
196 | #'pointsize': '10pt',
197 | 
198 | # Additional stuff for the LaTeX preamble.
199 | #'preamble': '',
200 | }
201 | 
202 | # Grouping the document tree into LaTeX files. List of tuples
203 | # (source start file, target name, title,
204 | #  author, documentclass [howto, manual, or own class]).
205 | latex_documents = [
206 |   ('index', 'python-cluster.tex', u'python-cluster Documentation',
207 |    u'Michel Albert', 'manual'),
208 | ]
209 | 
210 | # The name of an image file (relative to this directory) to place at the top of
211 | # the title page.
212 | #latex_logo = None
213 | 
214 | # For "manual" documents, if this is true, then toplevel headings are parts,
215 | # not chapters.
216 | #latex_use_parts = False
217 | 
218 | # If true, show page references after internal links.
219 | #latex_show_pagerefs = False
220 | 
221 | # If true, show URL addresses after external links.
222 | #latex_show_urls = False
223 | 
224 | # Documents to append as an appendix to all manuals.
225 | #latex_appendices = []
226 | 
227 | # If false, no module index is generated.
228 | #latex_domain_indices = True
229 | 
230 | 
231 | # -- Options for manual page output ---------------------------------------
232 | 
233 | # One entry per manual page. List of tuples
234 | # (source start file, name, description, authors, manual section).
235 | man_pages = [
236 |     ('index', 'python-cluster', u'python-cluster Documentation',
237 |      [u'Michel Albert'], 1)
238 | ]
239 | 
240 | # If true, show URL addresses after external links.
241 | #man_show_urls = False
242 | 
243 | 
244 | # -- Options for Texinfo output -------------------------------------------
245 | 
246 | # Grouping the document tree into Texinfo files. List of tuples
247 | # (source start file, target name, title, author,
248 | #  dir menu entry, description, category)
249 | texinfo_documents = [
250 |   ('index', 'python-cluster', u'python-cluster Documentation',
 251 |    u'Michel Albert', 'python-cluster', 'Cluster algorithms in pure Python.',
252 |    'Miscellaneous'),
253 | ]
254 | 
255 | # Documents to append as an appendix to all manuals.
256 | #texinfo_appendices = []
257 | 
258 | # If false, no module index is generated.
259 | #texinfo_domain_indices = True
260 | 
261 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
262 | #texinfo_show_urls = 'footnote'
263 | 
264 | # If true, do not generate a @detailmenu in the "Top" node's menu.
265 | #texinfo_no_detailmenu = False
266 | 


--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
  1 | Welcome to python-cluster's documentation!
  2 | ==========================================
  3 | 
  4 | Index
  5 | -----
  6 | 
  7 | .. toctree::
  8 |    :maxdepth: 1
  9 | 
 10 |    changelog
 11 | 
 12 | 
 13 | Introduction
 14 | ------------
 15 | 
 16 | Implementation of cluster algorithms in pure Python.
 17 | 
  18 | As this is executed in the Python runtime, the code runs slower than similar
  19 | implementations in compiled languages. In return, it can cluster pretty much
  20 | any Python object. Each clustering method has its own prerequisites, however,
  21 | which are documented with the respective implementation.
 22 | 
 23 | 
 24 | 
 25 | Example for K-Means Clustering
 26 | ------------------------------
 27 | 
 28 | ::
 29 | 
 30 |     from cluster import KMeansClustering
 31 |     data = [
 32 |         (8, 2),
 33 |         (7, 3),
 34 |         (2, 6),
 35 |         (3, 5),
 36 |         (3, 6),
 37 |         (1, 5),
 38 |         (8, 1),
 39 |         (3, 4),
 40 |         (8, 3),
 41 |         (9, 2),
 42 |         (2, 5),
 43 |         (9, 3)
 44 |     ]
 45 |     cl = KMeansClustering(data)
 46 |     cl.getclusters(2)
 47 | 
 48 | The above code would give the following result::
 49 | 
 50 |     [
 51 |         [(8, 2), (8, 1), (8, 3), (7, 3), (9, 2), (9, 3)],
 52 |         [(3, 5), (1, 5), (3, 4), (2, 6), (2, 5), (3, 6)]
 53 |     ]
 54 | 
 55 | 
 56 | Example for Hierarchical Clustering
 57 | -----------------------------------
 58 | 
 59 | ::
 60 | 
 61 |     from cluster import HierarchicalClustering
 62 |     data = [791, 956, 676, 124, 564, 84, 24, 365, 594, 940, 398,
 63 |             971, 131, 365, 542, 336, 518, 835, 134, 391]
 64 |     cl = HierarchicalClustering(data)
 65 |     cl.getlevel(40)
 66 | 
 67 | The above code would give the following result::
 68 | 
 69 |     [
 70 |         [24],
 71 |         [84, 124, 131, 134],
 72 |         [336, 365, 365, 391, 398],
 73 |         [676],
 74 |         [594, 518, 542, 564],
 75 |         [940, 956, 971],
 76 |         [791],
 77 |         [835],
 78 |     ]
 79 | 
 80 | 
  81 | :py:meth:`~cluster.method.hierarchical.HierarchicalClustering.getlevel()`
  82 | returns clusters where the distance between any two clusters is no less than
  83 | *level*.
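
For one-dimensional numeric data, the effect of the *level* threshold can be illustrated without the library. The following is only a sketch, not the library's implementation: it assumes the distance is the absolute difference between numbers and that clusters are joined whenever their closest members are at most *level* apart (single linkage). The helper name ``split_by_gap`` is hypothetical.

```python
def split_by_gap(values, level):
    """Group numbers so that sorted neighbours inside a group differ by at most `level`.

    Hypothetical helper for illustration only; it is not part of python-cluster.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous <= level:
            # Close enough to the previous value: stay in the same group.
            groups[-1].append(current)
        else:
            # Gap larger than the threshold: start a new group.
            groups.append([current])
    return groups


data = [791, 956, 676, 124, 564, 84, 24, 365, 594, 940, 398,
        971, 131, 365, 542, 336, 518, 835, 134, 391]
groups = split_by_gap(data, 40)
```

Under these assumptions, ``groups`` contains the same eight clusters as the result shown above, only ordered by value.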
 84 | 
 85 | .. note::
 86 | 
 87 |     Due to a bug_ in earlier releases, the elements of the input data *must be*
 88 |     sortable!
 89 | 
 90 |     .. _bug: https://github.com/exhuma/python-cluster/issues/11
 91 | 
 92 | 
 93 | API
 94 | ---
 95 | 
 96 | .. toctree::
 97 |    :maxdepth: 1
 98 | 
 99 |    apidoc/cluster
100 |    apidoc/cluster.matrix
101 |    apidoc/cluster.method.base
102 |    apidoc/cluster.method.hierarchical
103 |    apidoc/cluster.method.kmeans
104 |    apidoc/cluster.util
105 | 
106 | Indices and tables
107 | ==================
108 | 
109 | * :ref:`genindex`
110 | * :ref:`modindex`
111 | * :ref:`search`
112 | 
113 | 


--------------------------------------------------------------------------------
/fabfile.py:
--------------------------------------------------------------------------------
 1 | import fabric.api as fab
 2 | 
 3 | 
 4 | @fab.task
 5 | def doc():
 6 |     with fab.lcd('docs'):
 7 |         fab.local('../env/bin/sphinx-build '
 8 |                   '-b html '
 9 |                   '-d _build/doctrees . '
10 |                   '_build/html')
11 | 


--------------------------------------------------------------------------------
/makedist.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | python setup.py sdist
4 | python setup.py bdist
5 | python setup.py bdist_rpm
6 | python setup.py bdist_wininst
7 | 
8 | 
9 | 


--------------------------------------------------------------------------------
/pytest.ini:
--------------------------------------------------------------------------------
1 | [pytest]
2 | looponfailroots = cluster
3 | norecursedirs = env env3 env3_nonumpy .git
4 | 


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [bdist_wheel]
2 | universal=1
3 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup
 2 | 
 3 | readme_contents = open("README.rst").read()
 4 | 
 5 | # index where the first paragraph starts
 6 | parastart = readme_contents.find('=\n') + 3
 7 | 
 8 | # first sentence of first paragraph
 9 | sentence_end = readme_contents.find('.', parastart)
10 | 
11 | setup(
12 |     name='cluster',
13 |     version=open('cluster/version.txt').read().strip(),
14 |     author='Michel Albert',
15 |     author_email='michel@albert.lu',
16 |     url='https://github.com/exhuma/python-cluster',
17 |     packages=['cluster', 'cluster.method'],
18 |     license='LGPL',
19 |     description=readme_contents[parastart:sentence_end],
20 |     long_description=readme_contents,
21 |     include_package_data=True,
22 |     classifiers=[
23 |         'Development Status :: 5 - Production/Stable',
24 |         'Intended Audience :: Developers',
25 |         'Intended Audience :: Education',
26 |         'Intended Audience :: Other Audience',
27 |         'Intended Audience :: Science/Research',
28 |         'License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)',
29 |         'Operating System :: OS Independent',
30 |         'Programming Language :: Python',
31 |         'Programming Language :: Python :: 2',
32 |         'Programming Language :: Python :: 3',
33 |         'Topic :: Scientific/Engineering :: Information Analysis',
34 |     ]
35 | )
36 | 


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [tox]
2 | envlist = py27, py35, py36
3 | 
4 | [testenv]
5 | deps = pytest
6 | commands = pytest
7 | 


--------------------------------------------------------------------------------