├── docs
│   ├── Global.md
│   ├── _config.yml
│   └── index.md
├── prinpy
│   ├── __init__.py
│   ├── glob.py
│   └── local.py
├── .gitignore
├── setup.py
├── LICENSE
├── README.md
└── prinPy quickstart.ipynb
/docs/Global.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/prinpy/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-minimal
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | #checkpoints
2 | .ipynb_checkpoints/
3 | __pycache__/
4 | build/
5 | dist/
6 | prinpy.egg-info/
7 |
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # prinPy Documentation
2 | Welcome to the prinPy documentation!
3 |
4 | This site is under construction.
5 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup
2 |
3 | with open("README.md", 'r') as f:
4 |     long_description = f.read()
5 |
6 | setup(
7 |     name = 'prinpy',
8 |     version = '0.0.3.1',
9 |     license = "MIT",
10 |     description = "A package for fitting principal curves in Python",
11 |     author = "https://github.com/artusoma/",
12 |     author_email = 'artusoma1@gmail.com',
13 |     url = 'https://github.com/artusoma/prinPy',
14 |     packages = ["prinpy"],
15 |     long_description = long_description,
16 |     long_description_content_type='text/markdown',
17 |     install_requires = ['numpy',
18 |                         'matplotlib',
19 |                         'scipy',
20 |                         'scikit-learn',  # prinpy/glob.py imports sklearn.preprocessing
21 |                         'keras',
22 |                         'tensorflow',
23 |                         ],
24 | )
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 artusoma
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
2 | # prinPy
3 | `pip install prinpy`
4 |
5 | Inspired by [this R package](https://github.com/rcannood/princurve), prinPy brings principal curves to Python.
6 |
7 | ## What prinPy does
8 | prinPy provides local and global algorithms for computing principal curves.
9 |
10 | ## What is a Principal Curve?
11 | A principal curve is a smooth n-dimensional curve that passes through the middle of a dataset. Principal curves are a dimensionality reduction tool analogous to a nonlinear principal component. They have applications in GPS data, image recognition, bioinformatics, and other fields.
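
To make the dimensionality reduction idea concrete, here is a minimal sketch of a projection index: each point is reduced to a single number, its normalized arc-length position on the nearest part of the curve. The curve is approximated by a polyline with distinct vertices, and `projection_index` is a hypothetical helper written for this illustration, not part of prinPy.

```python
import numpy as np

def projection_index(points, curve):
    """Map each point to its position (0 to 1) along a polyline.

    `curve` is a (k, n) array of distinct vertices; `points` is (m, n).
    Illustrative helper only, not prinPy's API.
    """
    seg_start = curve[:-1]
    seg_vec = curve[1:] - curve[:-1]
    seg_len = np.linalg.norm(seg_vec, axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])

    indices = np.empty(len(points))
    for i, p in enumerate(points):
        # Fraction along each segment of the closest point, clipped to the segment
        t = np.einsum('ij,ij->i', p - seg_start, seg_vec) / seg_len**2
        t = np.clip(t, 0.0, 1.0)
        closest = seg_start + t[:, None] * seg_vec
        j = np.argmin(np.linalg.norm(closest - p, axis=1))
        # Arc length up to the closest point, divided by the total length
        indices[i] = (cum_len[j] + t[j] * seg_len[j]) / cum_len[-1]
    return indices

curve = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])   # L-shaped polyline
points = np.array([[0.5, 0.2], [1.1, 0.5], [0.9, 1.0]])
idx = projection_index(points, curve)
print(idx)  # [0.25 0.75 1.  ]
```

Each 2-D point is replaced by one coordinate along the curve, which is the sense in which a principal curve acts like a nonlinear principal component.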
12 |
13 | ### Local Algorithms
14 | Local algorithms work step by step. Starting at one end of the curve, the algorithm builds segments that meet an acceptable error threshold as it moves toward the other end. Once it can connect the current point to the end point within that threshold, the algorithm terminates and a curve is interpolated through the segment vertices. prinPy currently has two local algorithms:
15 |
16 | 1. CLPC-g (Greedy Constraint Local Principal Curve) \[1\]
17 | 2. CLPC-s (One-Dimensional Search Constraint Local Principal Curve) \[1\]
18 |
19 | CLPC-g is faster and is fine for simpler curves. CLPC-s has the potential to be much more accurate on more difficult curves, at the expense of speed. After fitting a curve, prinPy can project points onto it.
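
The "acceptable error threshold" can be sketched as follows. This mirrors the error measure computed by `distg` in `prinpy/local.py`: for each nearby point, take the smaller of its distance to the candidate vertex and its distance to the line through the two vertices, then average. The name `segment_error` and the sample values are assumptions made for this illustration.

```python
import numpy as np

def segment_error(pts, v1, v2):
    """Mean over pts of min(distance to v2, distance to the line through v1 and v2)."""
    d_vertex = np.linalg.norm(pts - v2, axis=1)
    direction = v2 - v1
    rel = pts - v1
    # The 2-D cross product magnitude equals |direction| * perpendicular distance
    cross = direction[0] * rel[:, 1] - direction[1] * rel[:, 0]
    d_line = np.abs(cross) / np.linalg.norm(direction)
    return np.minimum(d_vertex, d_line).mean()

pts = np.array([[0.5, 0.1], [1.0, -0.2], [2.0, 0.3]])
v1 = np.array([0.0, 0.0])   # current vertex
v2 = np.array([2.0, 0.0])   # candidate next vertex
e_max = 0.25                # illustrative threshold

err = segment_error(pts, v1, v2)
accept = err <= e_max
print(err, accept)
```

A segment is kept when its error is at most `e_max`; otherwise the search radius shrinks and a closer candidate vertex is tried.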
20 |
21 | ### Global Algorithms
22 | Global algorithms, unlike local algorithms, are more like minimization problems. Given a dataset, a global algorithm might make an initial guess at a principal curve and adjust it from there.
23 |
24 | The sole global algorithm as of now performs nonlinear principal component analysis. Called NLPCA in this package, it is a neural network implementation \[2\]. It works by creating an autoassociative neural network with a "bottleneck" layer, which forces the network to learn the most important features of the data.
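
The bottleneck idea can be sketched in plain NumPy. The layer sizes follow `create_model` in `prinpy/glob.py` (a sigmoid mapping layer, a one-node sigmoid bottleneck, a sigmoid demapping layer, and a linear output), but the weights here are random and untrained, and the helper names `encode` and `decode` are assumptions for this illustration. The point is the shape of the data at each stage, not the quality of the fit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
num_dim, nodes = 2, 25

# Untrained random weights, used only to show the layer shapes
W1, b1 = rng.normal(size=(num_dim, nodes)), np.zeros(nodes)
W2, b2 = rng.normal(size=(nodes, 1)), np.zeros(1)
W3, b3 = rng.normal(size=(1, nodes)), np.zeros(nodes)
W4, b4 = rng.normal(size=(nodes, num_dim)), np.zeros(num_dim)

def encode(x):
    """Mapping layer then bottleneck: (m, num_dim) -> (m, 1)."""
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

def decode(t):
    """Demapping layer then linear output: (m, 1) -> (m, num_dim)."""
    return sigmoid(t @ W3 + b3) @ W4 + b4

data = rng.normal(size=(100, num_dim))
proj = encode(data)    # one projection index per point
recon = decode(proj)   # points on the one-parameter curve
loss = np.sum((data - recon) ** 2)   # sum of squared reconstruction errors
print(proj.shape, recon.shape)
```

Because every point must pass through a single bottleneck value, the decoder can only trace out a one-parameter curve, and training pushes that curve toward the middle of the data.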
25 |
26 | **Which one should I use?**
27 | The local algorithms are better for tightly bunched data, such as digit recognition or GPS data. The global algorithm is better suited to "clouds" of data or sparsely represented data.
28 |
29 | ## Quick-Start
30 | View the quickstart notebook [here](https://github.com/artusoma/prinPy/blob/master/prinPy%20quickstart.ipynb). Docs will be coming soon!
31 |
32 | ```python
33 | # Example of local PC fitting
34 | import numpy as np
35 | import scipy.interpolate
36 | from prinpy.local import CLPCG
37 |
38 | cl = CLPCG()  # Create solver
39 |
40 | # CLPCG.fit() fits the principal curve. It takes x_data, y_data,
41 | # and the max allowed error for each step. e_max is found
42 | # through trial and error, but 1/4 to 1/2 of the data error is what
43 | # the authors recommend.
44 | cl.fit(xdata, ydata, e_max = .1)
45 | cl.plot()  # plots curve, optional axes can be passed
46 |
47 | # Reconstruct curve
48 | tcks = cl.spline_ticks  # get spline ticks
49 | xy = scipy.interpolate.splev(np.linspace(0,1,100), tcks)
50 | ```
51 |
52 | ## References
53 | \[1\] Dewang Chen, Jiateng Yin, Shiying Yang, Lingxi Li, Peter Pudney,
54 | Constraint local principal curve: Concept, algorithms and applications,
55 | Journal of Computational and Applied Mathematics,
56 | Volume 298,
57 | 2016,
58 | Pages 222-235,
59 | ISSN 0377-0427,
60 | https://doi.org/10.1016/j.cam.2015.11.041.
61 |
62 | \[2\] Mark Kramer, Nonlinear Principal Component Analysis Using
63 | Autoassociative Neural Networks,
64 | AIChE Journal, Volume 37, Issue 2, 1991, Pages 233-243.
65 |
--------------------------------------------------------------------------------
/prinpy/glob.py:
--------------------------------------------------------------------------------
1 | '''
2 | This is the global module that contains principal curve and nonlinear
3 | principal component analysis algorithms that work to optimize a line
4 | over an entire dataset. Additionally, these algorithms should work
5 | in space greater than 2-dimensions.
6 | '''
7 |
8 | # General libraries
9 | import numpy as np
10 |
11 | # ML libraries
12 | import tensorflow as tf
13 | import keras
14 | from keras.models import Model
15 | from keras.layers import Dense, Input, LeakyReLU
16 | from keras import optimizers
17 | import keras.backend as k
18 |
19 | # Preprocessing
20 | from sklearn.preprocessing import MinMaxScaler, StandardScaler
21 |
22 |
23 | def orth_dist(y_true, y_pred):
24 |     '''
25 |     Loss function for the NLPCA NN. Returns the sum of the squared
26 |     distances from the output tensor to the real tensor.
27 |     '''
28 |     loss = tf.math.reduce_sum((y_true - y_pred)**2)
29 |     return loss
30 |
31 |
32 | class NLPCA(object):
33 |     '''This is a global solver for principal curves that uses neural networks.
34 |
35 |     Attributes:
36 |         fit_points, model, intermediate_layer_model: set by fit() and project()
37 |     '''
38 |     def __init__(self):
39 |         self.fit_points = None
40 |         self.model = None
41 |         self.intermediate_layer_model = None
42 |
43 |     def fit(self, data, epochs = 500, nodes = 25, lr = .01, verbose = 0):
44 |         '''This method creates a model and will fit it to the given m x n
45 |         dimensional data.
46 |
47 |         Args:
48 |             data (np array): A numpy array of shape (m,n), where m is the
49 |                 number of points and n is the number of dimensions.
50 |             epochs (int): Number of epochs to train neural network, defaults
51 |                 to 500.
52 |             nodes (int): Number of nodes for the construction layers. Defaults
53 |                 to 25. The more complex the curve, the higher this number
54 |                 should be.
55 |             lr (float): Learning rate for backprop. Defaults to .01
56 |             verbose (0 or 1): Verbose = 0 mutes the training text from Keras.
57 |                 Defaults to 0.
58 |
59 |         Returns:
60 |             None
61 |         '''
62 |         num_dim = data.shape[1]  # get number of dimensions for pts
63 |
64 |         # create models, base and intermediate
65 |         model = self.create_model(num_dim, nodes = nodes, lr = lr)
66 |         bname = model.layers[2].name  # bottle-neck layer name
67 |
68 |         # The intermediate model gets the output of the bottleneck layer,
69 |         # which acts as the projection layer.
70 |         self.intermediate_layer_model = Model(inputs=model.input,
71 |                                               outputs=model.get_layer(bname).output)
72 |
73 |         # Fit the model and store it on the instance as self.model
74 |         model.fit(data, data, epochs = epochs, verbose = verbose)
75 |         self.model = model
76 |
77 |         return
78 |
79 |     def project(self, data):
80 |         '''The project function will project the points to the curve generated
81 |         by the fit function. Given back is the projection index of the original
82 |         data and the fitted curve points sorted by that index.
83 |
84 |         Args:
85 |             data (np array): m x n array to project to the curve
86 |
87 |         Returns:
88 |             proj (array): An m x 1 array that contains the projection
89 |                 index for each point in data.
90 |             all_sorted (array): An m x (n+1) array that contains the fitted
91 |                 curve points sorted by projection index, along with the index.
92 |         '''
93 |         pts = self.model.predict(data)
94 |         proj = self.intermediate_layer_model.predict(data)
95 |
96 |         self.fit_points = pts
97 |
98 |         combined = np.concatenate([pts, proj], axis = 1)
99 |         all_sorted = combined[combined[:,-1].argsort()]  # sort by the index column
100 |
101 |         return proj, all_sorted
102 |
103 |     def create_model(self, num_dim, nodes, lr):
104 |         '''Creates a tf model.
105 |
106 |         Args:
107 |             num_dim (int): How many dimensions the input space is
108 |             nodes (int): How many nodes for the construction layers
109 |             lr (float): Learning rate of backpropagation
110 |
111 |         Returns:
112 |             model (object): Keras Model
113 |         '''
114 |         # Create layers:
115 |         # Function G
116 |         inputs = Input(shape = (num_dim,))  #input layer
117 |         mapping = Dense(nodes, activation = 'sigmoid')(inputs)  #mapping layer
118 |         bottle = Dense(1, activation = 'sigmoid')(mapping)  #bottle-neck layer
119 |
120 |         # Function H
121 |         demapping = Dense(nodes, activation = 'sigmoid')(bottle)  #demapping layer
122 |         output = Dense(num_dim)(demapping)  #output layer
123 |
124 |         # Connect and compile model:
125 |         model = Model(inputs = inputs, outputs = output)
126 |         gradient_descent = optimizers.Adam(learning_rate=lr)
127 |         model.compile(loss = orth_dist, optimizer = gradient_descent)
128 |
129 |         return model
130 |
131 |     def preprocess(self, data):
132 |         '''Converts individual arrays into a singular m x n array, where
133 |         m is the number of observations and n is the number of dimensions.
134 |         Normalizes the data for faster training.
135 |
136 |         Args:
137 |             data (list): List of arrays of points. For example, if you have
138 |                 data for x, y, and z stored in arrays x_, y_, and z_, pass
139 |                 in [x_, y_, z_]
140 |
141 |         Returns:
142 |             data_comb (array): A single m x n array, where each column
143 |                 is standardized and then MinMaxScaled to (-1, 1).
144 |         '''
145 |         data_lists = []
146 |
147 |         scale = MinMaxScaler(feature_range=(-1,1))
148 |         norm = StandardScaler()
149 |         for arr in data:
150 |             normed = norm.fit_transform(arr.reshape(-1,1))
151 |             scaled = scale.fit_transform(normed.reshape(-1,1))
152 |             data_lists.append(scaled)
153 |
154 |         return np.concatenate(data_lists, axis = 1)
--------------------------------------------------------------------------------
/prinpy/local.py:
--------------------------------------------------------------------------------
1 | '''
2 | These are local algorithms. These work on a per-step basis. Starting
3 | at one point, the algorithm attempts to choose the next best point,
4 | and so on until the end is reached.
5 | '''
6 |
7 | # Import some modules
8 | import numpy as np
9 | import matplotlib.pyplot as plt
10 | import scipy.optimize as op
11 | import scipy.interpolate
12 |
13 |
14 | def distg(pts, v1, v2):
15 |     '''
16 |     Returns the mean, over pts, of the smaller of each point's distance
17 |     to the line through v1 and v2 and its distance to vertex v2
18 |     v1: vertex j
19 |     v2: vertex j+1
20 |     '''
21 |     D1 = np.linalg.norm(v2 - pts, axis = 1)
22 |     D2 = np.abs(np.cross(v2-v1, v1-pts)) / np.linalg.norm(v2-v1)
23 |
24 |     error_t = [np.min([i,j]) for i,j in zip(D1, D2)]
25 |
26 |     return sum(error_t)/len(error_t)
27 |
28 | def points_in(pts, r1, p):
29 |     '''
30 |     Gets points within radius r1 of p
31 |     '''
32 |     distances = np.linalg.norm(p - pts, axis = 1)
33 |     return pts[(distances < (r1))]
34 |
35 | def points_out(pts, r1, p):
36 |     '''
37 |     Gets points farther than r1 from p
38 |     '''
39 |     distances = np.linalg.norm(p - pts, axis = 1)
40 |     return pts[(distances > r1)]
41 |
42 | def points_btw(pts, r1, r2, p):
43 |     '''
44 |     Gets points whose distance from p is between r1 and r2 (r1 < r2)
45 |     '''
46 |     distances = np.linalg.norm(p - pts, axis = 1)
47 |     return pts[(distances < r2) & (distances > r1)]
48 |
49 | def reset(x,y):
50 |     dat = np.concatenate([x.reshape(-1,1), y.reshape(-1,1)], axis = 1)
51 |     return dat, [dat[0,:]]
52 |
53 | def proj_min(X, tck, pt):
54 |     '''
55 |     Objective for projection: distance from pt to the point at
56 |     position X along the PC. Minimized over X by project().
57 |     '''
58 |     loc_ = np.ravel(scipy.interpolate.splev(X, tck))  # flatten so it matches pt
59 |     return np.linalg.norm(loc_ - pt)
60 |
61 | class CLPCG:
62 |     def __init__(self):
63 |         self.fit_points = []
64 |         self.spline_ticks = None
65 |
66 |     def points(self, x, y, e_max = .2, fmin_error = False):
67 |         '''Implements CLPC-greedy algorithm.
68 |
69 |         Args:
70 |             x (array): x-data to fit
71 |             y (array): y-data to fit
72 |             e_max (float): Max allowed error. If not met, another point P will
73 |                 be added to the curve. Authors suggest 1/4 to 1/2 of
74 |                 measurement error. Defaults to .2
75 |
76 |         Returns:
77 |             points (array): collection of points that construct the straight
78 |                 line segments.
79 |         '''
80 |         # Combine x,y; the first and last rows are used as the curve ends
81 |         data = np.concatenate([x.reshape(-1,1), y.reshape(-1,1)], axis = 1)
82 |
83 |         points = []  # points of principal curve
84 |         points.append(data[0,:])  # Append first point
85 |         pe = data[-1,:]  # end point
86 |
87 |         while 1:
88 |             pt_found = False
89 |
90 |             # Start drawing circle
91 |             rl = 0  # lower radius bound
92 |             rt = 2 * np.linalg.norm(pe - points[-1])  # upper bound
93 |
94 |             # First, attempt to connect to end point.
95 |             # Connecting to the end point with acceptable ei is the
96 |             # termination condition.
97 |             rend = np.linalg.norm(pe - points[-1])
98 |             in_c = points_in(data, rend, points[-1])  #get pts inside circle
99 |             try:
100 |                 e_end = distg(in_c, points[-1], pe)  # calculate error to end pt
101 |             except ZeroDivisionError:  # successfully terminates, weird case
102 |                 break
103 |
104 |             if e_end <= e_max:
105 |                 points.append(pe)
106 |                 break
107 |
108 |             while not pt_found:  # point with acceptable error not found
109 |                 # begin draw circle with radius ri
110 |                 ri = rl + (rt - rl)/2
111 |
112 |                 in_c = points_in(data, ri, points[-1])  #get pts inside circle
113 |                 rj = ri * .9  # Construct inner radius
114 |                 btw_c = points_btw(data, rj, ri, points[-1])  #get pts btw circle
115 |
116 |                 if btw_c.shape[0] == 0:  # No points s.t. rj < ||p|| < ri
117 |                     raise ValueError("e_max = %f is too small. Choose a " \
118 |                                      "larger e_max." % e_max)
119 |                 else:
120 |                     # candidate point is mean of points in circle sector
121 |                     p2 = np.array([np.mean(btw_c[:,0]), np.mean(btw_c[:,1])])
122 |                     e_i = distg(in_c, points[-1], p2)  # calculate error
123 |
124 |                     if e_i > e_max:  # if error not acceptable, reduce size of rt
125 |                         rt = ri
126 |
127 |                     # If error is acceptable, add p2 to points
128 |                     else:
129 |                         data = points_out(data, ri, points[-1])
130 |                         points.append(p2)
131 |                         pt_found = True
132 |
133 |         # transform points into an array
134 |         res_x = np.array([p[0] for p in points])
135 |         res_y = np.array([p[1] for p in points])
136 |         res = np.concatenate([res_x.reshape(-1,1), res_y.reshape(-1,1)], axis = 1)
137 |
138 |         if res.shape[0] <= 3:
139 |             raise ValueError("Not enough points generated: Spline degree 3 with" \
140 |                              " %d points generated. Try reducing e_max" \
141 |                              % (res.shape[0]))
142 |
143 |         self.fit_points = res
144 |         return res
145 |
146 |     def fit(self, x, y, e_max = .2, fmin_error = False):
147 |         '''
148 |         Calculates principal curve ticks
149 |
150 |         Args: same as points
151 |
152 |         Returns:
153 |             None
154 |         '''
155 |         res = self.points(x, y, e_max, fmin_error)
156 |         tck, u = scipy.interpolate.splprep(res.T, s = 0)
157 |         self.spline_ticks = tck
158 |
159 |         return
160 |
161 |     def plot(self, ax = None):
162 |         '''
163 |         Plots the curve to a MPL axes object.
164 |
165 |         Args:
166 |             ax (object): Optional set of ax to plot to. If None, a set of ax
167 |                 will be created.
168 |         '''
169 |         if ax is None:
170 |             fig, ax = plt.subplots()
171 |         xy = scipy.interpolate.splev(np.linspace(0,1,100), self.spline_ticks)
172 |         ax.plot(xy[0], xy[1], c = 'black')
173 |         return
174 |
175 |     def project(self, x, y):
176 |         '''
177 |         Projects points x,y to the principal curve calculated by fit
178 |         Args:
179 |             x (array): x-data to project
180 |             y (array): y-data to project
181 |         Returns:
182 |             proj (list): Projection index of each point onto the curve, between (0,1)
183 |         '''
184 |         # for each point min distance to curve
185 |         data = np.concatenate([x.reshape(-1,1), y.reshape(-1,1)], axis = 1)
186 |
187 |         proj = []
188 |         for p in data:
189 |             proj_dist = op.minimize(
190 |                 proj_min,
191 |                 x0 = [.5],
192 |                 args = (self.spline_ticks, p),
193 |                 method = 'Powell'
194 |             ).x
195 |             proj.append(proj_dist)
196 |         return proj
197 |
198 | # Functions specific to the search alg, namely finding the
199 | # error line and the function we search with/minimize
200 | def to_min_error(theta, pts, v1, r1):
201 |     v2 = np.array([v1[0]+r1*np.cos(theta[0]), v1[1]+r1*np.sin(theta[0])])
202 |     D1 = np.linalg.norm(v2 - pts, axis = 1)
203 |     D2 = np.abs(np.cross(v2-v1, v1-pts)) / np.linalg.norm(v2-v1)
204 |
205 |     error_t = [np.min([i,j]) for i,j in zip(D1, D2)]
206 |
207 |     return sum(error_t)/len(error_t)
208 |
209 | def error_line(theta, c, r1):
210 |     p2 = np.array([c[0]+r1*np.cos(theta), c[1]+r1*np.sin(theta)])
211 |     return p2
212 |
213 | def point_dist(pts, v1, v2):
214 |     '''
215 |     Tells CLPCS if it should invert direction of best fit line
216 |     v1: vertex j
217 |     v2: vertex j+1
218 |     '''
219 |     D1 = np.linalg.norm(v2 - pts, axis = 1)
220 |     return sum(D1)/len(D1)
221 |
222 | class CLPCS:
223 |     def __init__(self):
224 |         self.fit_points = []
225 |         self.spline_ticks = None
226 |
227 |     def points(self, x, y, e_max = .2, rl = 0):
228 |         '''Implements CLPC one dimensional search algorithm
229 |
230 |         Args:
231 |             x (array): x-data to fit
232 |             y (array): y-data to fit
233 |             e_max (float): Max allowed error. If not met, another point P will
234 |                 be added to the curve. Authors suggest 1/4 to 1/2 of
235 |                 measurement error. Defaults to .2
236 |             rl (float): Lower bound for the search radius. Defaults to 0
237 |
238 |         Returns:
239 |             points (array): collection of points that construct the straight
240 |                 line segments.
241 |         '''
242 |         # Combine x,y; the first and last rows are used as the curve ends
243 |         data = np.concatenate([x.reshape(-1,1), y.reshape(-1,1)], axis = 1)
244 |
245 |         points = []  # points of principal curve
246 |         points.append(data[0,:])  # Append first point
247 |         pe = data[-1,:]  # end point
248 |
249 |         while 1:
250 |             pt_found = False
251 |
252 |             # Start drawing circle
253 |             rt = 2 * np.linalg.norm(pe - points[-1])  # upper bound
254 |
255 |             # First, attempt to connect to end point.
256 |             # Connecting to the end point with acceptable ei is the
257 |             # termination condition.
258 |             rend = np.linalg.norm(pe - points[-1])
259 |             in_c = points_in(data, rend, points[-1])  #get pts inside circle
260 |             try:
261 |                 e_end = distg(in_c, points[-1], pe)  # calculate error to end pt
262 |             except ZeroDivisionError:  # successfully terminates, weird case
263 |                 break
264 |
265 |             if e_end <= e_max:
266 |                 points.append(pe)
267 |                 break
268 |
269 |             while not pt_found:
270 |                 ri = rl + (rt - rl)/2
271 |
272 |                 # Get points inside circle
273 |                 in_c = points_in(data, ri, points[-1])
274 |
275 |                 if in_c.shape[0] == 0:  # No points are in circle
276 |                     raise ValueError("e_max = %f is too small. Choose a " \
277 |                                      "larger e_max." % e_max)
278 |
279 |                 else:
280 |                     # find min error
281 |                     theta = op.minimize(
282 |                         to_min_error,
283 |                         x0 = [0],
284 |                         args = (in_c, points[-1], ri),
285 |                         method = 'Powell'
286 |                     ).x.item()  # scalar angle, whether .x is 0-d or shape (1,)
287 |                     p2 = error_line(theta, points[-1], ri)
288 |                     e_i = distg(in_c, points[-1], p2)  # calculate error
289 |                     p_error = point_dist(in_c, points[-1], p2)
290 |
291 |                     # Try to invert p2 and check error. This is because
292 |                     # the optimization alg can fail to account for the fact
293 |                     # that p2 could be closer to points than drawing other dir
294 |                     theta_inv = np.pi + theta
295 |                     p2_inv = error_line(theta_inv, points[-1], ri)
296 |                     p_error_inv = point_dist(in_c, points[-1], p2_inv)
297 |
298 |                     if p_error_inv < p_error: p2 = p2_inv
299 |
300 |                     if e_i >= e_max:
301 |                         rt = ri
302 |                     else:
303 |                         data = points_out(data, ri, points[-1])
304 |                         points.append(p2)
305 |
306 |                         pt_found = True
307 |
308 |         res = np.array(points)  # (k, 2) array of curve vertices
309 |         self.fit_points = res
310 |         return res
311 |
312 |     def fit(self, x, y, e_max = .2, rl = 0):
313 |         '''
314 |         Calculates principal curve ticks
315 |         Args: same as points
316 |
317 |         Returns:
318 |             None
319 |         '''
320 |         res = self.points(x, y, e_max, rl)
321 |         tck, u = scipy.interpolate.splprep(res.T, s = 0)
322 |         self.spline_ticks = tck
323 |         return
324 |
325 |     def plot(self, ax = None, **kwargs):
326 |         '''
327 |         Plots the curve to a MPL axes object.
328 |         Args:
329 |             ax (object): Optional set of ax to plot to. If None, a
330 |                 set of ax will be created.
331 |         '''
332 |         if ax is None:
333 |             fig, ax = plt.subplots()
334 |         xy = scipy.interpolate.splev(np.linspace(0,1,100), self.spline_ticks)
335 |         ax.plot(xy[0], xy[1], c = 'black', **kwargs)
336 |         return
337 |
338 |     def project(self, x, y):
339 |         '''
340 |         Projects points x,y to the principal curve calculated by fit
341 |         Args:
342 |             x (array): x-data to project
343 |             y (array): y-data to project
344 |         Returns:
345 |             proj (list): projection index of each point onto the curve, between (0,1)
346 |         '''
347 |         # for each point min distance to curve
348 |         data = np.concatenate([x.reshape(-1,1), y.reshape(-1,1)], axis = 1)
349 |
350 |         proj = []
351 |         for p in data:
352 |             proj_dist = op.minimize(
353 |                 proj_min,
354 |                 x0 = [.5],
355 |                 args = (self.spline_ticks, p),
356 |                 method = 'Powell'
357 |             ).x
358 |             proj.append(proj_dist)
359 |         return proj
360 |
--------------------------------------------------------------------------------
/prinPy quickstart.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# prinPy Quickstart Guide"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "prinPy has global and local algorithms. "
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "## 1. Local Algorithms"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 75,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "from prinpy.local import CLPCG\n",
31 | "\n",
32 | "# Some other modules\n",
33 | "import numpy as np\n",
34 | "import matplotlib.pyplot as plt\n",
35 | "import seaborn as sns; sns.set()\n",
36 | "import timeit"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "### Generate Test Data"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": 2,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "theta = np.linspace(0,np.pi*3, 1000)\n",
53 | "r = np.linspace(0,1,1000) ** .5\n",
54 | "\n",
55 | "x_data = r * np.cos(theta) + np.random.normal(scale = .02, size = 1000)\n",
56 | "y_data = r * np.sin(theta) + np.random.normal(scale = .02, size = 1000)"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "### Plot"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 3,
69 | "metadata": {},
70 | "outputs": [
71 | {
72 | "data": {
73 | "image/png": "\n",
74 | "text/plain": [
75 | ""
76 | ]
77 | },
78 | "metadata": {
79 | "needs_background": "light"
80 | },
81 | "output_type": "display_data"
82 | }
83 | ],
84 | "source": [
85 | "plt.scatter(x_data, y_data, s = 1)\n",
86 | "plt.show()"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "### Fit Principal Curve with Local Algorithms"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 4,
99 | "metadata": {},
100 | "outputs": [
101 | {
102 | "name": "stdout",
103 | "output_type": "stream",
104 | "text": [
105 | "Took 0.214514 seconds\n"
106 | ]
107 | }
108 | ],
109 | "source": [
110 | "cl = CLPCG() # Create CLPCG object\n",
111 | "\n",
112 | "# the fit() method calculates the principal curve\n",
113 | "# e_max is determined through trial and error as of\n",
114 | "# now, but aim for about 1/2 data error and adjust from\n",
115 | "# there. \n",
116 | "start = timeit.default_timer()\n",
117 | "\n",
118 | "cl.fit(x_data, y_data, e_max = .03) # CLPCG.fit() to fit PC\n",
119 | "\n",
120 | "stop = timeit.default_timer()\n",
121 | "\n",
122 | "print(\"Took %f seconds\" % (stop - start))"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 76,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "data": {
132 | "text/plain": [
133 | ""
134 | ]
135 | },
136 | "execution_count": 76,
137 | "metadata": {},
138 | "output_type": "execute_result"
139 | },
140 | {
141 | "data": {
142 | "image/png": "\n",
143 | "text/plain": [
144 | ""
145 | ]
146 | },
147 | "metadata": {},
148 | "output_type": "display_data"
149 | }
150 | ],
151 | "source": [
152 | "fig, ax = plt.subplots()\n",
153 | "ax.scatter(x_data, y_data, s = 3, alpha = .7)\n",
154 | "cl.plot(ax) # .plot will display the fit curve.\n",
155 | " # you can optionally pass in a matplotlib ax\n",
156 | "pts = cl.fit_points # fitted points with PC that spline is passed through\n",
157 | "ax.scatter(pts[:,0], pts[:,1], s = 40, c = 'green')"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": 6,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "# .project() returns a projection index for each point\n",
167 | "proj = cl.project(x_data, y_data) \n",
168 | "print(proj[:5])"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": 10,
174 | "metadata": {},
175 | "outputs": [
176 | {
177 | "name": "stdout",
178 | "output_type": "stream",
179 | "text": [
180 | "[0. 0. 0. 0. 0.02700805 0.04754636\n",
181 | " 0.09041594 0.12945787 0.18449046 0.22115265 0.2495283 0.30794099\n",
182 | " 0.36219516 0.4034585 0.45169821 0.50868772 0.57010873 0.63433081\n",
183 | " 0.69801242 0.75620144 0.80715538 0.89094181 1. 1.\n",
184 | " 1. 1. ]\n",
185 | "[[ 0.01447173 0.01714681]\n",
186 | " [ 0.06537318 -0.00911538]\n",
187 | " [ 0.17175084 0.04210127]\n",
188 | " [ 0.21407512 0.16854487]\n",
189 | " [ 0.05238058 0.39507542]]\n"
190 | ]
191 | }
192 | ],
193 | "source": [
194 | "# additionally, you can get spline ticks or fit points:\n",
195 | "tck = cl.spline_ticks\n",
196 | "fit_pts = cl.fit_points\n",
197 | "\n",
198 | "print(tck[0])\n",
199 | "print(fit_pts[:5])"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "## 2. Global"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": 13,
212 | "metadata": {},
213 | "outputs": [],
214 | "source": [
215 | "# NLPCA is the global alg\n",
216 | "from prinpy.glob import NLPCA"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": 54,
222 | "metadata": {},
223 | "outputs": [],
224 | "source": [
225 | "# Generate some test data\n",
226 | "t = np.linspace(0, 1, 1000) + np.random.normal(scale = .1, size = 1000)\n",
227 | "x = 5*np.cos(t) + np.random.normal(scale = .1, size = 1000)\n",
228 | "y = np.sin(t) + np.random.normal(scale = .1, size = 1000)"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": 77,
234 | "metadata": {},
235 | "outputs": [
236 | {
237 | "data": {
238 | "text/plain": [
239 | ""
240 | ]
241 | },
242 | "execution_count": 77,
243 | "metadata": {},
244 | "output_type": "execute_result"
245 | },
246 | {
247 | "data": {
248 | "image/png": "\n",
249 | "text/plain": [
250 | ""
251 | ]
252 | },
253 | "metadata": {},
254 | "output_type": "display_data"
255 | }
256 | ],
257 | "source": [
258 | "plt.scatter(x, y, s = 1)"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": 65,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": [
267 | "# create solver\n",
268 | "pca = NLPCA()\n",
269 | "\n",
270 | "# transform data for better training with the \n",
271 | "# neural net using built in preprocessor\n",
272 | "data_new = pca.preprocess( [x,y] )\n",
273 | "\n",
274 | "# fit the data\n",
275 | "pca.fit(data_new, epochs = 150, nodes = 15, lr = .01, verbose = 0)\n",
276 | "\n",
277 | "# project the current data. This returns a projection\n",
278 | "# index for each point and points to plot the curve\n",
279 | "proj, curve_pts = pca.project(data_new)"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": 81,
285 | "metadata": {},
286 | "outputs": [
287 | {
288 | "data": {
289 | "text/plain": [
290 | "[]"
291 | ]
292 | },
293 | "execution_count": 81,
294 | "metadata": {},
295 | "output_type": "execute_result"
296 | },
297 | {
298 | "data": {
299 | "image/png": "\n",
300 | "text/plain": [
301 | ""
302 | ]
303 | },
304 | "metadata": {},
305 | "output_type": "display_data"
306 | }
307 | ],
308 | "source": [
309 | "plt.scatter(data_new[:,0], \n",
310 | " data_new[:,1], \n",
311 | " s = 5, \n",
312 | " c = proj.reshape(-1), \n",
313 | " cmap = 'viridis')\n",
314 | "plt.plot(curve_pts[:,0], \n",
315 | " curve_pts[:,1], \n",
316 | " color = 'black',\n",
317 | " linewidth = '2.5')"
318 | ]
319 | }
320 | ],
321 | "metadata": {
322 | "kernelspec": {
323 | "display_name": "Python 3",
324 | "language": "python",
325 | "name": "python3"
326 | },
327 | "language_info": {
328 | "codemirror_mode": {
329 | "name": "ipython",
330 | "version": 3
331 | },
332 | "file_extension": ".py",
333 | "mimetype": "text/x-python",
334 | "name": "python",
335 | "nbconvert_exporter": "python",
336 | "pygments_lexer": "ipython3",
337 | "version": "3.7.6"
338 | }
339 | },
340 | "nbformat": 4,
341 | "nbformat_minor": 4
342 | }
343 |
--------------------------------------------------------------------------------