├── .gitignore
├── README.md
├── regression
│   ├── README.md
│   ├── ml-housing.sql
│   ├── ohms_law.sql
│   ├── pg-tf.sql
│   ├── pythagoras.sql
│   ├── random1.sql
│   └── tf-housing.sql
├── text_prediction
│   ├── README.md
│   ├── html-train.py
│   └── test-model.py
└── time_series
    ├── README.md
    ├── activity.csv
    ├── arima-tsp.py
    ├── cnn-demo.csv
    ├── cnn-demo.py
    ├── cnn-workload.py
    ├── data.csv
    ├── prophet-tsp.py
    ├── prophet.sql
    └── sts-tsp.py
/.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | *.h5 3 | *.json 4 | *.pkl --------------------------------------------------------------------------------
/README.md: -------------------------------------------------------------------------------- 1 | # Experiments in Machine Learning 2 | 3 | This Git repository contains various scripts written as I explored topics around 4 | machine learning and how we might make use of these technologies in and with 5 | PostgreSQL. 6 | 7 | Nothing you find here is production-quality code! --------------------------------------------------------------------------------
/regression/README.md: -------------------------------------------------------------------------------- 1 | # Regression 2 | 3 | We can use Tensorflow from within PostgreSQL to perform regression tasks for 4 | modelling and prediction. Essentially we're using a neural network to learn the 5 | relationship between inputs and outputs; for example, given: 6 | 7 | x = y *[some operation]* z 8 | 9 | we're modelling *[some operation]*, essentially approximating the formula. 10 | 11 | Note that whilst in some cases there may be a specific, defined relationship 12 | between the inputs and outputs (e.g. x² = y² + 13 | z² - Pythagoras' theorem), in other cases there may not be. These 14 | are typically the cases that interest us as they allow us to analyse data for 15 | business intelligence purposes. A good example is predicting the price of a 16 | house based on factors such as location, number of bedrooms and reception rooms, 17 | type of build etc. 18 | 19 | ## PostgreSQL 20 | 21 | ### Tensorflow 22 | 23 | We need to configure PostgreSQL in order to run Tensorflow. This consists of a 24 | couple of steps: 25 | 26 | 1. Install pl/python3 in your PostgreSQL database, e.g.: 27 | 28 | ```postgresql 29 | CREATE EXTENSION plpython3u; 30 | ``` 31 | 32 | 2. Install Tensorflow (and any other required modules) in the Python environment 33 | used by the PostgreSQL server. In my case, that's the EDB LanguagePack on 34 | macOS: 35 | 36 | ```shell script 37 | % sudo /Library/edb/languagepack/v1/Python-3.7/bin/pip3 install tensorflow numpy 38 | ``` 39 | 40 | It should then be possible to create pl/python3 functions in PostgreSQL. 41 | 42 | ### Apache MADlib 43 | 44 | Some examples use Apache MADlib instead of Tensorflow. See the documentation on 45 | the [MADlib Confluence page](https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide). 46 | 47 | 48 | ## Scripts 49 | 50 | The SQL scripts in this directory represent the various 51 | experiments I've worked on. Most create: 52 | 53 | - A table to store training inputs and outputs. 54 | - A function to generate data for the training table. 55 | - A function to train a model and make a prediction based on that model. 56 | 57 | Obviously in real-world applications the model creation and prediction functions 58 | would probably be separated, and training data would likely come from existing 59 | tables/views.
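Before trying any of the scripts below, it can be worth confirming that the Python
environment used by the PostgreSQL server can actually import Tensorflow. A minimal
sanity check might look like this (the function name is purely illustrative):

```postgresql
-- Hypothetical helper: report the Tensorflow version visible to plpython3u
CREATE OR REPLACE FUNCTION public.tf_version()
    RETURNS text
    LANGUAGE 'plpython3u'
AS $BODY$
import tensorflow as tf
return tf.__version__
$BODY$;

SELECT tf_version();
```

If this raises an ImportError, Tensorflow was most likely installed into a different
Python environment than the one the server is using.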
60 | 61 | __Note:__ All files written by the scripts will be owned by the user account 62 | under which PostgreSQL is running, and unless a full path is given, will be 63 | written relative to the data directory. I've used */Users/Shared/tf* as the 64 | working directory; you may want to change that. 65 | 66 | ### ohms_law.sql 67 | 68 | This attempts to teach a network a basic operation based on Ohm's Law: 69 | 70 | v (voltage) = i (current) * r (resistance) 71 | 72 | It's worth noting that the results of this model are *terrible*, so don't try 73 | to use it as the basis for anything else. At the time of writing I haven't yet 74 | figured out why this is the case, though I have some hunches. 75 | 76 | ### pythagoras.sql 77 | 78 | This attempts to teach a network Pythagoras' Theorem: 79 | 80 | x² = y² + z² 81 | 82 | The square of the length of the hypotenuse of a right-angled triangle is the 83 | sum of the squares of the other two sides. 84 | 85 | ### random1.sql 86 | 87 | This attempts to teach a network a completely fictitious equation with five 88 | input variables (in pl/pgsql): 89 | 90 | z := cbrt((a * b) / (sin(c) * sqrt(d)) + (e * e * e)); 91 | 92 | ### tf-housing.sql 93 | 94 | This is based on the well-known [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html). 95 | The SQL file contains the definition for a table to hold the data (loading it 96 | is an exercise for the reader) and a function for training, testing and using 97 | a model. This function differs from the others in that Pandas data frames are used 98 | in place of NumPy arrays, and an attempt is made to remove rows containing outliers 99 | from the dataset before training, in order to increase the accuracy of the results. 100 | 101 | ### ml-housing.sql 102 | 103 | This script implements the same regression analysis as *tf-housing.sql*, except 104 | that it uses Apache MADlib instead of Tensorflow. --------------------------------------------------------------------------------
/regression/ml-housing.sql: -------------------------------------------------------------------------------- 1 | -- NOTE: Requires Apache MADlib to be installed in the database 2 | -- Table to hold the training inputs and output 3 | CREATE TABLE public.housing 4 | ( 5 | crim double precision NOT NULL, 6 | zn double precision NOT NULL, 7 | indus double precision NOT NULL, 8 | chas double precision NOT NULL, 9 | nox double precision NOT NULL, 10 | rm double precision NOT NULL, 11 | age double precision NOT NULL, 12 | dis double precision NOT NULL, 13 | rad double precision NOT NULL, 14 | tax double precision NOT NULL, 15 | ptratio double precision NOT NULL, 16 | b double precision NOT NULL, 17 | lstat double precision NOT NULL, 18 | medv double precision NOT NULL 19 | ) 20 | 21 | TABLESPACE pg_default; 22 | 23 | ALTER TABLE public.housing 24 | OWNER to postgres; 25 | 26 | -- Create, train and test the model 27 | DROP TABLE IF EXISTS housing_linregr, housing_linregr_summary; 28 | SELECT madlib.linregr_train( 'housing', 29 | 'housing_linregr', 30 | 'medv', 31 | 'ARRAY[1, crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat]' 32 | ); 33 | 34 | -- Get the predictions, along with the difference in value and root mean squared 35 | -- error for the entire set. Thanks to Vik Fearing for helping me fix a couple 36 | -- of issues with this query!
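-- The query below calls madlib.linregr_predict() once per housing row using the
-- trained coefficients, and the window aggregate computes a single RMSE over the
-- whole result set.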
37 | SELECT housing.*, 38 | predict, 39 | medv - predict AS residual, 40 | sqrt(avg(power(abs(medv - predict), 2)) OVER ()) AS rmse 41 | FROM housing, 42 | housing_linregr, 43 | madlib.linregr_predict(coef, 44 | ARRAY[1, crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat] 45 | ) predict; -------------------------------------------------------------------------------- /regression/ohms_law.sql: -------------------------------------------------------------------------------- 1 | -- Table to hold the training inputs and output 2 | CREATE TABLE public.ohms_law 3 | ( 4 | r double precision NOT NULL, 5 | i double precision NOT NULL, 6 | v double precision NOT NULL, 7 | CONSTRAINT ohms_law_pkey PRIMARY KEY (r, i) 8 | ) 9 | 10 | TABLESPACE pg_default; 11 | 12 | ALTER TABLE public.ohms_law 13 | OWNER to postgres; 14 | 15 | -- Function to populate the training data 16 | CREATE OR REPLACE FUNCTION public.ohms_law_generate(rows integer) 17 | RETURNS void 18 | LANGUAGE 'plpgsql' 19 | COST 100 20 | VOLATILE PARALLEL UNSAFE 21 | AS $BODY$ 22 | DECLARE 23 | i float; 24 | r float; 25 | v float; 26 | BEGIN 27 | FOR l IN 1..rows LOOP 28 | SELECT round(random() * 1000 + 1) INTO i; 29 | SELECT round(random() * 1000 + 1) INTO r; 30 | v := i * r; 31 | 32 | RAISE NOTICE 'i: %, r: %, v: %', i, r, v; 33 | BEGIN 34 | INSERT INTO ohms_law (r, i, v) VALUES (r, i, v); 35 | EXCEPTION WHEN unique_violation THEN 36 | l := l - 1; 37 | END; 38 | END LOOP; 39 | END 40 | $BODY$; 41 | 42 | ALTER FUNCTION public.ohms_law_generate(integer) 43 | OWNER TO postgres; 44 | 45 | SELECT ohms_law_generate(1000); 46 | 47 | -- Create, train and test the model 48 | CREATE OR REPLACE FUNCTION public.ohms_law_v( 49 | r double precision, 50 | i double precision) 51 | RETURNS double precision 52 | LANGUAGE 'plpython3u' 53 | COST 100 54 | VOLATILE PARALLEL UNSAFE 55 | AS $BODY$ 56 | import tensorflow as tf 57 | import numpy as np 58 | import matplotlib.pyplot as plt 59 | import math 60 | 61 | from tensorflow.python.keras.callbacks import ModelCheckpoint, LambdaCallback, EarlyStopping, LearningRateScheduler 62 | 63 | tf.keras.backend.clear_session() 64 | tf.random.set_seed(42) 65 | np.random.seed(42) 66 | 67 | total_rows = 1000 68 | validation_pct = 25 69 | test_pct = 10 70 | epochs = 1000 71 | 72 | # Create the data sets 73 | rows = plpy.execute('SELECT r, i, v FROM ohms_law ORDER BY random() LIMIT {}'.format(total_rows)) 74 | actual_rows = len(rows) 75 | 76 | if actual_rows < 5: 77 | plpy.error('At least 5 data rows must be available for training. 
{} rows retrieved.'.format(actual_rows)) 78 | 79 | test_rows = int((actual_rows/100) * test_pct) 80 | validation_rows = int(((actual_rows)/100) * validation_pct) 81 | training_rows = actual_rows - test_rows - validation_rows 82 | 83 | data = [] 84 | results = [] 85 | 86 | for row in rows: 87 | data.append([row['r'], row['i']]) 88 | results.append(row['v']) 89 | 90 | max_v = max(results) 91 | training_data = np.array(data[:training_rows], dtype=float) 92 | training_results = np.array(results[:training_rows], dtype=float) 93 | validation_data = np.array(data[training_rows:training_rows+validation_rows], dtype=float) 94 | validation_results = np.array(results[training_rows:training_rows+validation_rows], dtype=float) 95 | test_data = np.array(data[training_rows+validation_rows:], dtype=float) 96 | test_results = np.array(results[training_rows+validation_rows:], dtype=float) 97 | 98 | plpy.notice('Total rows: {}, training rows: {}, validation rows: {}, test rows: {}.'.format(actual_rows, len(training_data), len(validation_data), len(test_data))) 99 | 100 | # Define the model 101 | l0 = tf.keras.layers.Input(shape=(2)) 102 | l1 = tf.keras.layers.Dense(units=16, activation = 'relu') 103 | l2 = tf.keras.layers.Dense(units=1) 104 | 105 | model = tf.keras.Sequential([l0, l1, l2]) 106 | 107 | # Compile it 108 | model.compile(loss=tf.keras.losses.MeanSquaredError(), 109 | optimizer='adam') 110 | 111 | summary = [] 112 | model.summary(print_fn=lambda x: summary.append(x)) 113 | plpy.notice('Model architecture:\n{}'.format('\n'.join(summary))) 114 | 115 | # Save a checkpoint each time our loss metric improves. 116 | checkpoint = ModelCheckpoint('/Users/Shared/tf/ohms_law.h5', 117 | monitor='loss', 118 | save_best_only=True, 119 | mode='min') 120 | 121 | # Use early stopping 122 | early_stopping = EarlyStopping(patience=50) 123 | 124 | # Display output 125 | logger = LambdaCallback( 126 | on_epoch_end=lambda epoch, 127 | logs: plpy.notice( 128 | 'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format( 129 | epoch, 130 | math.sqrt(logs['loss']), 131 | round(100 / max_v * math.sqrt(logs['loss']), 3), 132 | math.sqrt(logs['val_loss']), 133 | round(100 / max_v * math.sqrt(logs['val_loss']), 3)))) 134 | 135 | # Train it! 136 | history = model.fit(training_data, 137 | training_results, 138 | validation_data=(validation_data, validation_results), 139 | epochs=epochs, 140 | verbose=False, 141 | batch_size=50, 142 | callbacks=[logger, checkpoint, early_stopping]) 143 | 144 | # Graph the results 145 | training_loss = history.history['loss'] 146 | validation_loss = history.history['val_loss'] 147 | 148 | epochs_range = range(len(history.history['loss'])) 149 | 150 | plt.figure(figsize=(12, 8)) 151 | plt.plot(epochs_range, np.sqrt(training_loss), label='Training RMSE') 152 | plt.plot(epochs_range, np.sqrt(validation_loss), label='Validation RMSE') 153 | plt.legend(loc='upper right') 154 | plt.title('Training and Validation RMSE') 155 | 156 | plt.savefig('/Users/Shared/tf/ohms_law.png') 157 | 158 | # Load the best model from the checkpoint 159 | model = tf.keras.models.load_model('/Users/Shared/tf/ohms_law.h5') 160 | 161 | # How good is it looking? 
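# model.evaluate() returns the mean squared error (the only compiled loss), so
# math.sqrt() converts it to an RMSE, reported both in absolute terms and as a
# percentage of the largest voltage seen in the sampled data.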
162 | evaluation = model.evaluate(np.array(training_data), np.array(training_results)) 163 | plpy.notice('Training RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_v * math.sqrt(evaluation), 3))) 164 | 165 | if len(validation_data) > 0: 166 | evaluation = model.evaluate(np.array(validation_data), np.array(validation_results)) 167 | plpy.notice('Validation RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_v * math.sqrt(evaluation), 3))) 168 | 169 | if len(test_data) > 0: 170 | evaluation = model.evaluate(np.array(test_data), np.array(test_results)) 171 | plpy.notice('Test RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_v * math.sqrt(evaluation), 3))) 172 | 173 | # Get the result 174 | result = model.predict(np.array([[r, i]])) 175 | 176 | return result[0][0] 177 | $BODY$; 178 | 179 | ALTER FUNCTION public.ohms_law_v(double precision, double precision) 180 | OWNER TO postgres; 181 | 182 | SELECT ohms_law_v(3, 4); -- Correct answer is 12 -------------------------------------------------------------------------------- /regression/pg-tf.sql: -------------------------------------------------------------------------------- 1 | CREATE OR REPLACE FUNCTION public.tf_analyse( 2 | data_source_sql text, 3 | output_name text, 4 | output_path text) 5 | RETURNS void 6 | LANGUAGE 'plpython3u' 7 | COST 100 8 | VOLATILE PARALLEL UNSAFE 9 | AS $BODY$ 10 | import pandas as pd 11 | import matplotlib.pyplot as plt 12 | import seaborn as sns 13 | from math import ceil 14 | 15 | # Pandas print options 16 | pd.set_option('display.max_rows', None) 17 | pd.set_option('display.max_columns', None) 18 | pd.set_option('display.width', 1000) 19 | 20 | # Create the data sets 21 | rows = plpy.execute(data_source_sql) 22 | 23 | # Check we have enough rows 24 | if len(rows) < 2: 25 | plpy.error('At least 2 data rows must be available for analysis. {} rows retrieved.'.format(len(rows))) 26 | 27 | columns = list(rows[0].keys()) 28 | 29 | # Check we have enough columns 30 | if len(columns) < 2: 31 | plpy.error('At least 2 data columns must be available for analysis. 
{} columns retrieved.'.format(len(columns))) 32 | 33 | # Create the dataframe 34 | data = pd.DataFrame.from_records(rows, columns = columns) 35 | 36 | # Setup the plot layout 37 | plot_columns = 5 38 | plot_rows = ceil(len(columns) / plot_columns) 39 | 40 | # High level info 41 | plpy.notice('{} Analysis\n {}=========\n'.format(output_name.capitalize(), '=' * len(output_name))) 42 | plpy.notice('Data\n ----\n') 43 | plpy.notice('Data shape: {}'.format(data.shape)) 44 | plpy.notice('Data sample:\n{}\n'.format(data.head())) 45 | 46 | # Outliers 47 | plpy.notice('Outliers\n --------\n') 48 | 49 | Q1 = data.quantile(0.25) 50 | Q3 = data.quantile(0.75) 51 | IQR = Q3 - Q1 52 | plpy.notice('Interquartile Range (IQR):\n{}\n'.format(IQR)) 53 | plpy.notice('Outliers detected using IQR:\n{}\n'.format((data < (Q1 - 1.5 * IQR)) |(data > (Q3 + 1.5 * IQR)))) 54 | 55 | plt.cla() 56 | fig, axs = plt.subplots(ncols=plot_columns, nrows=plot_rows, figsize=(20, 5 * plot_rows)) 57 | index = 0 58 | axs = axs.flatten() 59 | for k,v in data.items(): 60 | sns.boxplot(y=k, data=data, ax=axs[index]) 61 | index += 1 62 | plt.tight_layout(pad=5, w_pad=0.5, h_pad=5.0) 63 | plt.suptitle('{} Outliers'.format(output_name.capitalize())) 64 | plt.savefig('{}/{}_outliers.png'.format(output_path, output_name)) 65 | plpy.notice('Created: {}/{}_outliers.png\n'.format(output_path, output_name)) 66 | 67 | # Distributions 68 | plpy.notice('Distributions\n -------------\n') 69 | plpy.notice('Summary:\n{}\n'.format(data.describe())) 70 | 71 | plt.cla() 72 | fig, axs = plt.subplots(ncols=plot_columns, nrows=plot_rows, figsize=(20, 5 * plot_rows)) 73 | index = 0 74 | axs = axs.flatten() 75 | for k,v in data.items(): 76 | sns.distplot(v, ax=axs[index]) 77 | index += 1 78 | plt.tight_layout(pad=5, w_pad=0.5, h_pad=5.0) 79 | plt.suptitle('{} Distributions'.format(output_name.capitalize())) 80 | plt.savefig('{}/{}_distributions.png'.format(output_path, output_name)) 81 | plpy.notice('Created: {}/{}_distributions.png\n'.format(output_path, output_name)) 82 | 83 | # Correlations 84 | plpy.notice('Correlations\n ------------\n') 85 | 86 | corr = data.corr() 87 | plpy.notice('Correlation data:\n{}\n'.format(corr)) 88 | 89 | plt.cla() 90 | plt.figure(figsize=(20,20)) 91 | sns.heatmap(data.corr().abs(), annot=True, cmap='Blues') 92 | plt.tight_layout(pad=5, w_pad=0.5, h_pad=5.0) 93 | plt.suptitle('{} Correlations'.format(output_name.capitalize())) 94 | plt.savefig('{}/{}_correlations.png'.format(output_path, output_name)) 95 | plpy.notice('Created: {}/{}_correlations.png\n'.format(output_path, output_name)) 96 | $BODY$; 97 | 98 | ALTER FUNCTION public.tf_analyse(text, text, text) 99 | OWNER TO postgres; 100 | 101 | COMMENT ON FUNCTION public.tf_analyse(text, text, text) 102 | IS 'Function to perform statistical analysis on an arbitrary data set. 103 | 104 | Parameters: 105 | * data_source_sql: An SQL query returning at least 2 rows and 2 columns of numeric data to analyse. 106 | * output_name: The name of the output to use in titles etc. 107 | * output_path: The path of a directory under which to save generated graphs. 
Must be writeable by the database server''s service account (usually postgres).'; 108 | 109 | -- Function to build and train a model 110 | CREATE OR REPLACE FUNCTION public.tf_model( 111 | data_source_sql text, 112 | structure integer[], 113 | output_name text, 114 | output_path text, 115 | epochs integer DEFAULT 5000, 116 | validation_pct integer DEFAULT 10, 117 | test_pct integer DEFAULT 10) 118 | RETURNS double precision 119 | LANGUAGE 'plpython3u' 120 | COST 100000 121 | VOLATILE PARALLEL UNSAFE 122 | AS $BODY$ 123 | import tensorflow as tf 124 | import pandas as pd 125 | import matplotlib.pyplot as plt 126 | from math import sqrt 127 | from tensorflow.python.keras.callbacks import ModelCheckpoint, LambdaCallback, EarlyStopping 128 | 129 | # Pandas print options 130 | pd.set_option('display.max_rows', None) 131 | pd.set_option('display.max_columns', None) 132 | pd.set_option('display.width', 1000) 133 | 134 | # Reset everything 135 | tf.keras.backend.clear_session() 136 | tf.random.set_seed(42) 137 | 138 | # Create the data sets 139 | rows = plpy.execute(data_source_sql) 140 | 141 | # Check we have enough rows 142 | if len(rows) < 2: 143 | plpy.error('At least 5 data rows must be available for training. {} rows retrieved.'.format(len(rows))) 144 | 145 | # Get a list of columns 146 | columns = list(rows[0].keys()) 147 | 148 | # Check we have enough columns 149 | if len(columns) < 2: 150 | plpy.error('At least 5 data columns must be available for training. {} columns retrieved.'.format(len(columns))) 151 | 152 | plpy.notice('Total rows: {}'.format(len(rows))) 153 | 154 | # Create the dataframe 155 | data = pd.DataFrame.from_records(rows, columns = columns) 156 | 157 | # Remove any rows with outliers 158 | Q1 = data.quantile(0.25) 159 | Q3 = data.quantile(0.75) 160 | IQR = Q3 - Q1 161 | plpy.notice('Removing outliers...') 162 | data = data[~((data < (Q1 - 1.5 * IQR)) |(data > (Q3 + 1.5 * IQR))).any(axis=1)] 163 | 164 | # So how many rows remain? 
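# Outlier removal can discard a significant number of rows, so recount before
# computing the training/validation/test split sizes below.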
165 | actual_rows = len(data) 166 | 167 | # Figure out how many rows to use for training, validation and test 168 | test_rows = int((actual_rows/100) * test_pct) 169 | validation_rows = int(((actual_rows)/100) * validation_pct) 170 | training_rows = actual_rows - test_rows - validation_rows 171 | 172 | # Split the data into input and output 173 | input = data[columns[:-1]] 174 | output = data[columns[-1:]] 175 | 176 | # Split the input and output into training, validation and test sets 177 | max_z = max(output[output.columns[0]]) 178 | training_input = input[:training_rows] 179 | training_output = output[:training_rows] 180 | validation_input = input[training_rows:training_rows+validation_rows] 181 | validation_output = output[training_rows:training_rows+validation_rows] 182 | test_input = input[training_rows+validation_rows:] 183 | test_output = output[training_rows+validation_rows:] 184 | 185 | plpy.notice('Rows: {}, training rows: {}, validation rows: {}, test rows: {}.'.format(actual_rows, len(training_input), len(validation_input), len(test_input))) 186 | 187 | # Define the model 188 | model = tf.keras.Sequential() 189 | for units in structure: 190 | if len(model.layers) == 0: 191 | model.add(tf.keras.layers.Dense(units=units, input_shape=(len(columns) - 1,), activation = 'relu')) 192 | else: 193 | model.add(tf.keras.layers.Dense(units=units, activation = 'relu')) 194 | 195 | model.add(tf.keras.layers.Dense(units=1, activation='linear')) 196 | 197 | # Compile it 198 | model.compile(loss=tf.keras.losses.MeanSquaredError(), 199 | optimizer='adam') 200 | 201 | summary = [] 202 | model.summary(print_fn=lambda x: summary.append(x)) 203 | plpy.notice('Model architecture:\n{}'.format('\n'.join(summary))) 204 | 205 | # Save a checkpoint each time our loss metric improves. 206 | checkpoint = ModelCheckpoint('{}/{}.h5'.format(output_path, output_name), 207 | monitor='loss', 208 | save_best_only=True, 209 | mode='min') 210 | 211 | # Use early stopping 212 | early_stopping = EarlyStopping(patience=50) 213 | 214 | # Display output 215 | logger = LambdaCallback( 216 | on_epoch_end=lambda epoch, 217 | logs: plpy.notice( 218 | 'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format( 219 | epoch, 220 | sqrt(logs['loss']), 221 | round(100 / max_z * sqrt(logs['loss']), 5), 222 | sqrt(logs['val_loss']), 223 | round(100 / max_z * sqrt(logs['val_loss']), 5)))) 224 | 225 | # Train it! 
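# Params:
#   validation_data: held-out rows used to compute val_loss after each epoch
#   batch_size: number of rows per gradient update
#   callbacks: log progress, checkpoint the best model so far, and stop early
#              once val_loss hasn't improved for 50 consecutive epochs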
226 | history = model.fit(training_input, 227 | training_output, 228 | validation_data=(validation_input, validation_output), 229 | epochs=epochs, 230 | verbose=False, 231 | batch_size=50, 232 | callbacks=[logger, checkpoint, early_stopping]) 233 | 234 | # Graph the results 235 | training_loss = history.history['loss'] 236 | validation_loss = history.history['val_loss'] 237 | 238 | epochs_range = range(len(history.history['loss'])) 239 | 240 | plt.figure(figsize=(12, 8)) 241 | plt.grid(True) 242 | plt.plot(epochs_range, [x ** 0.5 for x in training_loss], label='Training') 243 | plt.plot(epochs_range, [x ** 0.5 for x in validation_loss], label='Validation') 244 | plt.xlabel('Epoch') 245 | plt.ylabel('Root Mean Squared Error') 246 | plt.legend(loc='upper right') 247 | plt.title('Training and Validation Root Mean Squared Error') 248 | plt.savefig('{}/{}_rmse.png'.format(output_path, output_name)) 249 | plpy.notice('Created: {}/{}_rmse.png\n'.format(output_path, output_name)) 250 | 251 | # Load the best model from the checkpoint 252 | model = tf.keras.models.load_model('{}/{}.h5'.format(output_path, output_name)) 253 | 254 | # Dump the original test data, and test results for comparison 255 | test_dump = test_input.copy() 256 | test_dump['actual'] = test_output 257 | test_dump['predicted'] = model.predict(test_input)[:,0] 258 | test_dump['diff'] = abs(test_dump['predicted'] - test_dump['actual']) 259 | test_dump['pc_diff'] = test_dump['diff'] / (test_dump['predicted'] + 1e-10) * 100 260 | 261 | plpy.notice('Test data: \n{}\n'.format(test_dump)) 262 | 263 | # Test the model on the training and validation data to get the RMSE 264 | evaluation = model.evaluate(training_input, training_output) 265 | plpy.notice('Training RMSE: {}'.format(round(sqrt(evaluation), 5))) 266 | if len(validation_input) > 0: 267 | evaluation = model.evaluate(validation_input, validation_output) 268 | plpy.notice('Validation RMSE: {}'.format(round(sqrt(evaluation), 5))) 269 | 270 | # Summarise the results from the test data set 271 | plpy.notice('Test data mean absolute diff: {}'.format(round(float(sum(test_dump['diff']) / len(test_dump)), 5))) 272 | plpy.notice('Test data mean percentage diff: {}%'.format(round(float(sum(test_dump['pc_diff']) / len(test_dump)), 5))) 273 | 274 | rmse = float(sqrt(abs(sum((test_dump['actual'] - test_dump['predicted']) ** 2) / len(test_dump)))) 275 | plpy.notice('Test data RMSE: {}'.format(round(rmse, 5))) 276 | 277 | rmspe = float(sqrt(abs(sum((test_dump['actual'] - test_dump['predicted']) / (test_dump['actual']) + 1e-10))) / len(test_dump)) * 100 278 | plpy.notice('Test data RMSPE: {}%\n'.format(round(rmspe, 5))) 279 | 280 | plpy.notice('Model saved to: {}/{}.h5'.format(output_path, output_name)) 281 | 282 | return rmspe 283 | $BODY$; 284 | 285 | ALTER FUNCTION public.tf_model(text, integer[], text, text, integer, integer, integer) 286 | OWNER TO postgres; 287 | 288 | COMMENT ON FUNCTION public.tf_model(text, integer[], text, text, integer, integer, integer) 289 | IS 'Function to build and train a model to analyse an abitrary data set. 290 | 291 | Parameters: 292 | * data_source_sql: An SQL query returning at least 5 rows and 3 columns of numeric data to analyse. 293 | * structure: An array of integers indicating the number of neurons in each of an arbitrary number of layer. A final output layer will be added with a single neuron. 294 | * output_name: The name of the output to use in titles etc. 
295 | * output_path: The path of a directory under which to save generated graphs and the model. Must be writeable by the database server''s service account (usually postgres). 296 | * epochs: The maximum number of training epochs to run (default: 5000) 297 | * validation_pct: The percentage of the rows returned by the query specified in data_source_sql to use for model validation (default: 10). 298 | * test_pct: The percentage of the rows returned by the query specified in data_source_sql to use for model testing (default: 10). 299 | 300 | Returns: The Root Mean Square Percentage Error calculated from the evaluation of the test data set.'; 301 | 302 | -- Function to make a prediction based on a model 303 | CREATE OR REPLACE FUNCTION public.tf_predict( 304 | input_values double precision[], 305 | model_path text) 306 | RETURNS double precision[] 307 | LANGUAGE 'plpython3u' 308 | COST 100 309 | VOLATILE PARALLEL UNSAFE 310 | AS $BODY$ 311 | import tensorflow as tf 312 | 313 | # Reset everything 314 | tf.keras.backend.clear_session() 315 | tf.random.set_seed(42) 316 | 317 | # Load the model 318 | model = tf.keras.models.load_model(model_path) 319 | 320 | # Are we dealing with a single prediction, or a list of them? 321 | if not any(isinstance(sub, list) for sub in input_values): 322 | data = [input_values] 323 | else: 324 | data = input_values 325 | 326 | # Make the prediction(s) 327 | result = model.predict([data])[0] 328 | result = [ item for elem in result for item in elem] 329 | 330 | return result 331 | $BODY$; 332 | 333 | ALTER FUNCTION public.tf_predict(double precision[], text) 334 | OWNER TO postgres; 335 | 336 | COMMENT ON FUNCTION public.tf_predict(double precision[], text) 337 | IS 'Function to make predictions based on input values and a Tensorflow model. 338 | 339 | Parameters: 340 | * input_values: An array of input values, or an array of arrays of input values, e.g. ''{2, 3}'' or ''{{2, 3}, {3, 4}}''. 341 | * model_path: The full path to a Tensorflow model saved in .h5 format. Must be writeable by the database server''s service account (usually postgres). 
342 | 343 | Returns: An array of predicted values.'; 344 | -------------------------------------------------------------------------------- /regression/pythagoras.sql: -------------------------------------------------------------------------------- 1 | -- Table to hold the training inputs and output 2 | CREATE TABLE public.pythagoras 3 | ( 4 | x double precision NOT NULL, 5 | y double precision NOT NULL, 6 | z double precision NOT NULL, 7 | CONSTRAINT pythagoras_pkey PRIMARY KEY (x, y) 8 | ) 9 | 10 | TABLESPACE pg_default; 11 | 12 | ALTER TABLE public.pythagoras 13 | OWNER to postgres; 14 | 15 | -- Function to populate the training data 16 | CREATE OR REPLACE FUNCTION public.pythagoras_generate(rows integer) 17 | RETURNS void 18 | LANGUAGE 'plpgsql' 19 | COST 100 20 | VOLATILE PARALLEL UNSAFE 21 | AS $BODY$ 22 | DECLARE 23 | x float; 24 | y float; 25 | z float; 26 | BEGIN 27 | FOR l IN 1..rows LOOP 28 | SELECT round(random() * 100 + 1) INTO x; 29 | SELECT round(random() * 100 + 1) INTO y; 30 | z := sqrt(x*x + y*y); 31 | 32 | RAISE NOTICE 'x: %, y: %, z: %', x, y, z; 33 | BEGIN 34 | INSERT INTO pythagoras (x, y, z) VALUES (x, y, z); 35 | EXCEPTION WHEN unique_violation THEN 36 | l := l - 1; 37 | END; 38 | END LOOP; 39 | END 40 | $BODY$; 41 | 42 | ALTER FUNCTION public.pythagoras_generate(integer) 43 | OWNER TO postgres; 44 | 45 | SELECT pythagoras_generate(1000); 46 | 47 | -- Create, train and test the model 48 | CREATE OR REPLACE FUNCTION public.pythagoras_v( 49 | x double precision, 50 | y double precision) 51 | RETURNS double precision 52 | LANGUAGE 'plpython3u' 53 | COST 100 54 | VOLATILE PARALLEL UNSAFE 55 | AS $BODY$ 56 | import tensorflow as tf 57 | import numpy as np 58 | import matplotlib.pyplot as plt 59 | import math 60 | 61 | from tensorflow.python.keras.callbacks import ModelCheckpoint, LambdaCallback, EarlyStopping, LearningRateScheduler 62 | 63 | tf.keras.backend.clear_session() 64 | tf.random.set_seed(42) 65 | np.random.seed(42) 66 | 67 | total_rows = 100 68 | validation_pct = 10 69 | test_pct = 10 70 | epochs = 1000 71 | 72 | # Create the data sets 73 | rows = plpy.execute('SELECT x, y, z FROM pythagoras ORDER BY random() LIMIT {}'.format(total_rows)) 74 | actual_rows = len(rows) 75 | 76 | if actual_rows < 5: 77 | plpy.error('At least 5 data rows must be available for training. 
{} rows retrieved.'.format(actual_rows)) 78 | 79 | test_rows = int((actual_rows/100) * test_pct) 80 | validation_rows = int(((actual_rows)/100) * validation_pct) 81 | training_rows = actual_rows - test_rows - validation_rows 82 | 83 | data = [] 84 | results = [] 85 | 86 | for row in rows: 87 | data.append([row['x'], row['y']]) 88 | results.append(row['z']) 89 | 90 | max_z = max(results) 91 | training_data = np.array(data[:training_rows], dtype=float) 92 | training_results = np.array(results[:training_rows], dtype=float) 93 | validation_data = np.array(data[training_rows:training_rows+validation_rows], dtype=float) 94 | validation_results = np.array(results[training_rows:training_rows+validation_rows], dtype=float) 95 | test_data = np.array(data[training_rows+validation_rows:], dtype=float) 96 | test_results = np.array(results[training_rows+validation_rows:], dtype=float) 97 | 98 | plpy.notice('Total rows: {}, training rows: {}, validation rows: {}, test rows: {}.'.format(actual_rows, len(training_data), len(validation_data), len(test_data))) 99 | 100 | # Define the model 101 | l1 = tf.keras.layers.Dense(units=16, input_shape=(2,), activation = 'relu') 102 | l2 = tf.keras.layers.Dense(units=16, activation = 'relu') 103 | l3 = tf.keras.layers.Dense(units=1) # , activation='linear') 104 | 105 | model = tf.keras.Sequential([l1, l2, l3]) 106 | 107 | # Compile it 108 | model.compile(loss=tf.keras.losses.MeanSquaredError(), 109 | optimizer='adam') 110 | 111 | summary = [] 112 | model.summary(print_fn=lambda x: summary.append(x)) 113 | plpy.notice('Model architecture:\n{}'.format('\n'.join(summary))) 114 | 115 | # Save a checkpoint each time our loss metric improves. 116 | checkpoint = ModelCheckpoint('/Users/Shared/tf/pythagoras.h5', 117 | monitor='loss', 118 | save_best_only=True, 119 | mode='min') 120 | 121 | # Use early stopping 122 | early_stopping = EarlyStopping(patience=50) 123 | 124 | # Display output 125 | logger = LambdaCallback( 126 | on_epoch_end=lambda epoch, 127 | logs: plpy.notice( 128 | 'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format( 129 | epoch, 130 | math.sqrt(logs['loss']), 131 | round(100 / max_z * math.sqrt(logs['loss']), 3), 132 | math.sqrt(logs['val_loss']), 133 | round(100 / max_z * math.sqrt(logs['val_loss']), 3)))) 134 | 135 | # Train it! 136 | history = model.fit(training_data, 137 | training_results, 138 | validation_data=(validation_data, validation_results), 139 | epochs=epochs, 140 | verbose=False, 141 | batch_size=50, 142 | callbacks=[logger, checkpoint, early_stopping]) 143 | 144 | # Graph the results 145 | training_loss = history.history['loss'] 146 | validation_loss = history.history['val_loss'] 147 | 148 | epochs_range = range(len(history.history['loss'])) 149 | 150 | plt.figure(figsize=(12, 8)) 151 | plt.plot(epochs_range, np.sqrt(training_loss), label='Training RMSE') 152 | plt.plot(epochs_range, np.sqrt(validation_loss), label='Validation RMSE') 153 | plt.legend(loc='upper right') 154 | plt.title('Training and Validation RMSE') 155 | 156 | plt.savefig('/Users/Shared/tf/pythagoras.png') 157 | 158 | # Load the best model from the checkpoint 159 | model = tf.keras.models.load_model('/Users/Shared/tf/pythagoras.h5') 160 | 161 | # How good is it looking? 
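# Evaluate the reloaded best model on each split; the training figure is
# optimistic because the model has already seen that data, so the validation
# and test RMSEs give a better idea of real-world accuracy.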
162 | evaluation = model.evaluate(np.array(training_data), np.array(training_results)) 163 | plpy.notice('Training RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 164 | 165 | if len(validation_data) > 0: 166 | evaluation = model.evaluate(np.array(validation_data), np.array(validation_results)) 167 | plpy.notice('Validation RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 168 | 169 | if len(test_data) > 0: 170 | evaluation = model.evaluate(np.array(test_data), np.array(test_results)) 171 | plpy.notice('Test RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 172 | 173 | # Get the result 174 | result = model.predict(np.array([[x, y]])) 175 | 176 | return result[0][0] 177 | $BODY$; 178 | 179 | ALTER FUNCTION public.pythagoras_v(double precision, double precision) 180 | OWNER TO postgres; 181 | 182 | SELECT pythagoras_v(3, 4); -- Correct answer is 5 183 | -------------------------------------------------------------------------------- /regression/random1.sql: -------------------------------------------------------------------------------- 1 | -- Table to hold the training inputs and output 2 | CREATE TABLE public.random1 3 | ( 4 | a double precision NOT NULL, 5 | b double precision NOT NULL, 6 | c double precision NOT NULL, 7 | d double precision NOT NULL, 8 | e double precision NOT NULL, 9 | z double precision NOT NULL, 10 | CONSTRAINT random1_pkey PRIMARY KEY (a, b, c, d, e) 11 | ) 12 | 13 | TABLESPACE pg_default; 14 | 15 | ALTER TABLE public.random1 16 | OWNER to postgres; 17 | 18 | -- Function to populate the training data 19 | CREATE OR REPLACE FUNCTION public.random1_generate(rows integer) 20 | RETURNS void 21 | LANGUAGE 'plpgsql' 22 | COST 100 23 | VOLATILE PARALLEL UNSAFE 24 | AS $BODY$ 25 | DECLARE 26 | a float; 27 | b float; 28 | c float; 29 | d float; 30 | e float; 31 | z float; 32 | BEGIN 33 | FOR l IN 1..rows LOOP 34 | SELECT round(random() * 10000 + 1) INTO a; 35 | SELECT round(random() * 10000 + 1) INTO b; 36 | SELECT round(random() * 10000 + 1) INTO c; 37 | SELECT round(random() * 10000 + 1) INTO d; 38 | SELECT round(random() * 10000 + 1) INTO e; 39 | z := cbrt((a * b) / (sin(c) * sqrt(d)) + (e * e * e)); 40 | 41 | RAISE NOTICE 'a: %, b: %, c: %, d: %, e: %, z: %', a, b, c, d, e, z; 42 | BEGIN 43 | INSERT INTO random1 (a, b, c, d, e, z) VALUES (a, b, c, d, e, z); 44 | EXCEPTION WHEN unique_violation THEN 45 | l := l - 1; 46 | END; 47 | END LOOP; 48 | END 49 | $BODY$; 50 | 51 | ALTER FUNCTION public.random1_generate(integer) 52 | OWNER TO postgres; 53 | 54 | SELECT random1_generate(1000); 55 | 56 | -- Create, train and test the model 57 | CREATE OR REPLACE FUNCTION public.random1_v( 58 | a double precision, 59 | b double precision, 60 | c double precision, 61 | d double precision, 62 | e double precision) 63 | RETURNS double precision 64 | LANGUAGE 'plpython3u' 65 | COST 100 66 | VOLATILE PARALLEL UNSAFE 67 | AS $BODY$ 68 | import tensorflow as tf 69 | import numpy as np 70 | import matplotlib.pyplot as plt 71 | import math 72 | 73 | from tensorflow.python.keras.callbacks import ModelCheckpoint, LambdaCallback, EarlyStopping, LearningRateScheduler 74 | 75 | tf.keras.backend.clear_session() 76 | tf.random.set_seed(42) 77 | np.random.seed(42) 78 | 79 | total_rows = 1000 80 | validation_pct = 10 81 | test_pct = 1 82 | epochs = 1000 83 | 84 | # Create the data sets 85 | rows = plpy.execute('SELECT a, b, c, d, e, z FROM random1 ORDER BY random() 
LIMIT {}'.format(total_rows)) 86 | actual_rows = len(rows) 87 | 88 | if actual_rows < 5: 89 | plpy.error('At least 5 data rows must be available for training. {} rows retrieved.'.format(actual_rows)) 90 | 91 | test_rows = int((actual_rows/100) * test_pct) 92 | validation_rows = int(((actual_rows)/100) * validation_pct) 93 | training_rows = actual_rows - test_rows - validation_rows 94 | 95 | data = [] 96 | results = [] 97 | 98 | for row in rows: 99 | data.append([row['a'], row['b'], row['c'], row['d'], row['e']]) 100 | results.append(row['z']) 101 | 102 | max_z = max(results) 103 | training_data = np.array(data[:training_rows], dtype=float) 104 | training_results = np.array(results[:training_rows], dtype=float) 105 | validation_data = np.array(data[training_rows:training_rows+validation_rows], dtype=float) 106 | validation_results = np.array(results[training_rows:training_rows+validation_rows], dtype=float) 107 | test_data = np.array(data[training_rows+validation_rows:], dtype=float) 108 | test_results = np.array(results[training_rows+validation_rows:], dtype=float) 109 | 110 | plpy.notice('Total rows: {}, training rows: {}, validation rows: {}, test rows: {}.'.format(actual_rows, len(training_data), len(validation_data), len(test_data))) 111 | 112 | # Define the model 113 | l1 = tf.keras.layers.Dense(units=16, input_shape=(5,), activation = 'relu') 114 | l2 = tf.keras.layers.Dense(units=16, activation = 'relu') 115 | l3 = tf.keras.layers.Dense(units=1, activation='linear') 116 | 117 | model = tf.keras.Sequential([l1, l2, l3]) 118 | 119 | # Compile it 120 | model.compile(loss=tf.keras.losses.MeanSquaredError(), 121 | optimizer='adam') 122 | 123 | summary = [] 124 | model.summary(print_fn=lambda x: summary.append(x)) 125 | plpy.notice('Model architecture:\n{}'.format('\n'.join(summary))) 126 | 127 | # Save a checkpoint each time our loss metric improves. 128 | checkpoint = ModelCheckpoint('/Users/Shared/tf/random1.h5', 129 | monitor='loss', 130 | save_best_only=True, 131 | mode='min') 132 | 133 | # Use early stopping 134 | early_stopping = EarlyStopping(patience=50) 135 | 136 | # Display output 137 | logger = LambdaCallback( 138 | on_epoch_end=lambda epoch, 139 | logs: plpy.notice( 140 | 'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format( 141 | epoch, 142 | math.sqrt(logs['loss']), 143 | round(100 / max_z * math.sqrt(logs['loss']), 3), 144 | math.sqrt(logs['val_loss']), 145 | round(100 / max_z * math.sqrt(logs['val_loss']), 3)))) 146 | 147 | # Train it! 148 | history = model.fit(training_data, 149 | training_results, 150 | validation_data=(validation_data, validation_results), 151 | epochs=epochs, 152 | verbose=False, 153 | batch_size=50, 154 | callbacks=[logger, checkpoint, early_stopping]) 155 | 156 | # Graph the results 157 | training_loss = history.history['loss'] 158 | validation_loss = history.history['val_loss'] 159 | 160 | epochs_range = range(len(history.history['loss'])) 161 | 162 | plt.figure(figsize=(12, 8)) 163 | plt.plot(epochs_range, np.sqrt(training_loss), label='Training RMSE') 164 | plt.plot(epochs_range, np.sqrt(validation_loss), label='Validation RMSE') 165 | plt.legend(loc='upper right') 166 | plt.title('Training and Validation RMSE') 167 | 168 | plt.savefig('/Users/Shared/tf/random1.png') 169 | 170 | # Load the best model from the checkpoint 171 | model = tf.keras.models.load_model('/Users/Shared/tf/random1.h5') 172 | 173 | # How good is it looking? 
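# Note that with test_pct = 1, the test split holds only around 10 of the 1000
# sampled rows, so the test RMSE reported below is based on very few samples
# and can be noisy.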
174 | evaluation = model.evaluate(np.array(training_data), np.array(training_results)) 175 | plpy.notice('Training RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 176 | 177 | if len(validation_data) > 0: 178 | evaluation = model.evaluate(np.array(validation_data), np.array(validation_results)) 179 | plpy.notice('Validation RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 180 | 181 | if len(test_data) > 0: 182 | evaluation = model.evaluate(np.array(test_data), np.array(test_results)) 183 | plpy.notice('Test RMSE: {} ({}%).'.format(math.sqrt(evaluation), round(100 / max_z * math.sqrt(evaluation), 3))) 184 | 185 | # Get the result 186 | result = model.predict(np.array([[a, b, c, d, e]])) 187 | 188 | return result[0][0] 189 | $BODY$; 190 | 191 | SELECT random1_v(5, 25, 67, 2, 29); -- Correct answer is 28.958992634781293 -------------------------------------------------------------------------------- /regression/tf-housing.sql: -------------------------------------------------------------------------------- 1 | -- Table to hold the training inputs and output 2 | CREATE TABLE public.housing 3 | ( 4 | crim double precision NOT NULL, 5 | zn double precision NOT NULL, 6 | indus double precision NOT NULL, 7 | chas double precision NOT NULL, 8 | nox double precision NOT NULL, 9 | rm double precision NOT NULL, 10 | age double precision NOT NULL, 11 | dis double precision NOT NULL, 12 | rad double precision NOT NULL, 13 | tax double precision NOT NULL, 14 | ptratio double precision NOT NULL, 15 | b double precision NOT NULL, 16 | lstat double precision NOT NULL, 17 | medv double precision NOT NULL 18 | ) 19 | 20 | TABLESPACE pg_default; 21 | 22 | ALTER TABLE public.housing 23 | OWNER to postgres; 24 | 25 | -- Create, train and test the model 26 | CREATE OR REPLACE FUNCTION public.housing_v( 27 | crim double precision, 28 | zn double precision, 29 | indus double precision, 30 | chas double precision, 31 | nox double precision, 32 | rm double precision, 33 | age double precision, 34 | dis double precision, 35 | rad double precision, 36 | tax double precision, 37 | ptratio double precision, 38 | b double precision, 39 | lstat double precision) 40 | RETURNS double precision 41 | LANGUAGE 'plpython3u' 42 | COST 100 43 | VOLATILE PARALLEL UNSAFE 44 | AS $BODY$ 45 | import tensorflow as tf 46 | import pandas as pd 47 | import matplotlib.pyplot as plt 48 | import seaborn as sns 49 | from math import sqrt 50 | 51 | from tensorflow.python.keras.callbacks import ModelCheckpoint, LambdaCallback, EarlyStopping, LearningRateScheduler 52 | 53 | # Configurables 54 | DEBUG = True 55 | total_rows = 1000 56 | validation_pct = 10 57 | test_pct = 5 58 | epochs = 5000 59 | 60 | # Pandas print options 61 | pd.set_option('display.max_rows', None) 62 | pd.set_option('display.max_columns', 20) 63 | pd.set_option('display.width', 1000) 64 | 65 | # Create the data sets 66 | rows = plpy.execute('SELECT crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat, medv FROM housing ORDER BY random() LIMIT {}'.format(total_rows)) 67 | data = pd.DataFrame.from_records(rows, columns = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat', 'medv']) 68 | 69 | # Remove any rows with outliers 70 | Q1 = data.quantile(0.25) 71 | Q3 = data.quantile(0.75) 72 | IQR = Q3 - Q1 73 | plpy.notice('IQR:\n{}'.format(IQR)) 74 | plpy.notice('Outliers detected:\n{}'.format((data < (Q1 - 1.5 * IQR)) |(data > (Q3 
+ 1.5 * IQR)))) 75 | 76 | if DEBUG: 77 | plt.cla() 78 | fig, axs = plt.subplots(ncols=7, nrows=2, figsize=(20, 10)) 79 | index = 0 80 | axs = axs.flatten() 81 | for k,v in data.items(): 82 | sns.boxplot(y=k, data=data, ax=axs[index]) 83 | index += 1 84 | plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0) 85 | plt.savefig("/Users/Shared/tf/housing_outliers.png") 86 | 87 | plt.cla() 88 | fig, axs = plt.subplots(ncols=7, nrows=2, figsize=(20, 10)) 89 | index = 0 90 | axs = axs.flatten() 91 | for k,v in data.items(): 92 | sns.distplot(v, ax=axs[index]) 93 | index += 1 94 | plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0) 95 | plt.savefig("/Users/Shared/tf/housing_distributions.png") 96 | 97 | data = data[~((data < (Q1 - 1.5 * IQR)) |(data > (Q3 + 1.5 * IQR))).any(axis=1)] 98 | 99 | # Look for data correlations 100 | if DEBUG: 101 | corr = data.corr() 102 | plt.cla() 103 | plt.figure(figsize=(20,20)) 104 | sns.heatmap(data.corr().abs(), annot=True).get_figure().savefig("/Users/Shared/tf/housing_correlation.png") 105 | 106 | # So how many rows remain? 107 | actual_rows = len(data) 108 | 109 | # Check we have enough rows left 110 | if actual_rows < 5: 111 | plpy.error('At least 5 data rows must be available for training. {} rows retrieved.'.format(actual_rows)) 112 | 113 | # Figure out how many rows to use for training, validation and test 114 | test_rows = int((actual_rows/100) * test_pct) 115 | validation_rows = int(((actual_rows)/100) * validation_pct) 116 | training_rows = actual_rows - test_rows - validation_rows 117 | 118 | # Split the data into input and output 119 | input = data[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'b', 'lstat']] 120 | output = data[['medv']] 121 | 122 | max_z = max(output['medv']) 123 | training_input = input[:training_rows] 124 | training_output = output[:training_rows] 125 | validation_input = input[training_rows:training_rows+validation_rows] 126 | validation_output = output[training_rows:training_rows+validation_rows] 127 | test_input = input[training_rows+validation_rows:] 128 | test_output = output[training_rows+validation_rows:] 129 | 130 | 131 | plpy.notice('Total rows: {}, training rows: {}, validation rows: {}, test rows: {}.'.format(actual_rows, len(training_input), len(validation_input), len(test_input))) 132 | 133 | # Define the model 134 | l1 = tf.keras.layers.Dense(units=32, input_shape=(13,), activation = 'relu') 135 | l2 = tf.keras.layers.Dense(units=32, activation = 'relu') 136 | l3 = tf.keras.layers.Dense(units=1, activation='linear') 137 | 138 | model = tf.keras.Sequential([l1, l2, l3]) 139 | 140 | # Compile it 141 | model.compile(loss=tf.keras.losses.MeanSquaredError(), 142 | optimizer='adam') 143 | 144 | if DEBUG: 145 | summary = [] 146 | model.summary(print_fn=lambda x: summary.append(x)) 147 | plpy.notice('Model architecture:\n{}'.format('\n'.join(summary))) 148 | 149 | # Save a checkpoint each time our loss metric improves. 
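# save_best_only with mode='min' means the file on disk is only overwritten
# when the monitored training loss reaches a new minimum, so the best weights
# survive even if later epochs overfit.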
150 | checkpoint = ModelCheckpoint('/Users/Shared/tf/housing.h5', 151 | monitor='loss', 152 | save_best_only=True, 153 | mode='min') 154 | 155 | # Use early stopping 156 | early_stopping = EarlyStopping(patience=50) 157 | 158 | # Display output 159 | logger = LambdaCallback( 160 | on_epoch_end=lambda epoch, 161 | logs: plpy.notice( 162 | 'epoch: {}, training RMSE: {} ({}%), validation RMSE: {} ({}%)'.format( 163 | epoch, 164 | sqrt(logs['loss']), 165 | round(100 / max_z * sqrt(logs['loss']), 5), 166 | sqrt(logs['val_loss']), 167 | round(100 / max_z * sqrt(logs['val_loss']), 5)))) 168 | 169 | # Train it! 170 | history = model.fit(training_input, 171 | training_output, 172 | validation_data=(validation_input, validation_output), 173 | epochs=epochs, 174 | verbose=False, 175 | batch_size=50, 176 | callbacks=[logger, checkpoint, early_stopping]) 177 | 178 | # Graph the results 179 | if DEBUG: 180 | training_loss = history.history['loss'] 181 | validation_loss = history.history['val_loss'] 182 | 183 | epochs_range = range(len(history.history['loss'])) 184 | 185 | plt.figure(figsize=(12, 8)) 186 | plt.grid(True) 187 | plt.plot(epochs_range, [x ** 0.5 for x in training_loss], label='Training') 188 | plt.plot(epochs_range, [x ** 0.5 for x in validation_loss], label='Validation') 189 | plt.xlabel('Epoch') 190 | plt.ylabel('Root Mean Squared Error') 191 | plt.legend(loc='upper right') 192 | plt.title('Training and Validation Root Mean Squared Error') 193 | 194 | plt.savefig('/Users/Shared/tf/housing.png') 195 | 196 | # Load the best model from the checkpoint 197 | model = tf.keras.models.load_model('/Users/Shared/tf/housing.h5') 198 | 199 | if DEBUG: 200 | # Dump the original test data, and test results for comparison 201 | test_dump = test_input.copy() 202 | test_dump['actual'] = test_output 203 | test_dump['predicted'] = model.predict(test_input)[:,0] 204 | test_dump['diff'] = abs(test_dump['predicted'] - test_dump['actual']) 205 | test_dump['pc_diff'] = test_dump['diff'] / test_dump['predicted'] * 100 206 | 207 | plpy.notice('Test data: \n{}\n'.format(test_dump)) 208 | 209 | plpy.notice('Test data mean absolute diff: {}'.format(round(float((sum(test_dump['diff']) / len(test_dump))), 5))) 210 | plpy.notice('Test data mean percentage diff: {}%'.format(round(float((sum(test_dump['pc_diff']) / len(test_dump))), 5))) 211 | plpy.notice('Test data RMSE: {}\n'.format(round(float((sqrt(sum((test_dump['actual'] - test_dump['predicted']) ** 2) / len(test_dump)))), 5))) 212 | 213 | # How good is it looking? 
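# These RMSE figures come straight from the compiled MSE loss via sqrt(), as
# opposed to the per-row test statistics computed above in DEBUG mode.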
214 | evaluation = model.evaluate(training_input, training_output) 215 | plpy.notice('Training RMSE: {}.'.format(round(sqrt(evaluation), 5))) 216 | 217 | if len(validation_input) > 0: 218 | evaluation = model.evaluate(validation_input, validation_output) 219 | plpy.notice('Validation RMSE: {}.'.format(round(sqrt(evaluation), 5))) 220 | 221 | if len(test_input) > 0: 222 | evaluation = model.evaluate(test_input, test_output) 223 | plpy.notice('Test RMSE: {}.'.format(round(sqrt(evaluation), 5))) 224 | 225 | # Get the result 226 | result = model.predict([[crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat]]) 227 | 228 | return result[0][0] 229 | $BODY$; 230 | 231 | SELECT housing_v(0.00632, 18.00, 2.310, 0, 0.5380, 6.5750, 65.20, 4.0900, 1, 296.0, 15.30, 396.90, 4.98); -- Correct answer is 24.00 232 | -- SELECT housing_v(0.26363, 0.00000, 8.56000, 0.00000, 0.52000, 6.22900, 91.20000, 2.54510, 5.00000, 384.00000, 20.90000, 391.23000, 15.55000); -- Correct answer is 19.4 -------------------------------------------------------------------------------- /text_prediction/README.md: -------------------------------------------------------------------------------- 1 | # Text Prediction 2 | 3 | I wanted to explore text prediction using a neural network as a way to generate 4 | search suggestions for users of the pgAdmin and PostgreSQL websites. There are 5 | two scripts in this directory: 6 | 7 | ## html-train.py 8 | 9 | This script will take a directory of HTML files and train a model based on the 10 | text within those files. It will save the model and the tokenizer that contains 11 | the data. 12 | 13 | ```shell script 14 | (ml) dpage@hal:~/git/machine_learning/text_prediction$ python3 html-train.py --help 15 | usage: html-train.py [-h] [--debug] -d DATA -i INPUT -m MODEL 16 | 17 | Create a Tensorflow model based on a directory of HTML 18 | 19 | optional arguments: 20 | -h, --help show this help message and exit 21 | --debug enable debug mode 22 | -d DATA, --data DATA the file to save data to 23 | -i INPUT, --input INPUT 24 | the input directory containing the HTML files 25 | -m MODEL, --model MODEL 26 | the file to save the model to 27 | ``` 28 | 29 | ## test-model.py 30 | 31 | This script will load the model and tokenizer created during training, and 32 | allow you to enter words and then select the number of additional words to 33 | predict. 
34 | 35 | ```shell script 36 | (ml) dpage@hal:~/git/machine_learning/text_prediction$ python test-model.py --help 37 | usage: test-model.py [-h] -d DATA -m MODEL 38 | 39 | Test a pre-trained Tensorflow model with text data 40 | 41 | optional arguments: 42 | -h, --help show this help message and exit 43 | -d DATA, --data DATA the file to load data from 44 | -m MODEL, --model MODEL 45 | the file to load the model from 46 | ``` 47 | 48 | For example: 49 | 50 | ```shell script 51 | (ml) dpage@hal:~/git/machine_learning/text_prediction$ python test-model.py -d pgadmin-docs.json -m pgadmin-docs.h5 52 | Enter text (blank to quit): trigger 53 | Number of words to generate (default: 1): 54 | Results: trigger date 55 | Enter text (blank to quit): table 56 | Number of words to generate (default: 1): 3 57 | Results: table you can be 58 | Enter text (blank to quit): 59 | ``` -------------------------------------------------------------------------------- /text_prediction/html-train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | import glob 4 | import string 5 | 6 | import matplotlib.pyplot as plt 7 | import numpy as np 8 | import tensorflow as tf 9 | from bs4 import BeautifulSoup 10 | from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional 11 | from tensorflow.keras.models import Sequential, load_model 12 | from tensorflow.keras.preprocessing.sequence import pad_sequences 13 | from tensorflow.keras.preprocessing.text import Tokenizer 14 | from tensorflow.python.keras.callbacks import ModelCheckpoint 15 | from tensorflow.python.keras.layers import Dropout 16 | 17 | 18 | # Helper to plot graphs 19 | def plot_graphs(history, string): 20 | plt.plot(history.history[string]) 21 | plt.xlabel("Epochs") 22 | plt.ylabel(string) 23 | plt.show() 24 | 25 | 26 | # Get command line options 27 | parser = argparse.ArgumentParser(description='Create a Tensorflow model based on a directory of HTML') 28 | parser.add_argument("--debug", action='store_true', help="enable debug mode") 29 | parser.add_argument("-d", "--data", required=True, help="the file to save data to") 30 | parser.add_argument("-i", "--input", required=True, help="the input directory containing the HTML files") 31 | parser.add_argument("-m", "--model", required=True, help="the file to save the model to") 32 | 33 | args = parser.parse_args() 34 | DEBUG = args.debug 35 | 36 | if DEBUG: 37 | print('Building corpus from {}:'.format(args.input + '/*.html')) 38 | 39 | # Create the corpus 40 | corpus = [] 41 | for path in glob.iglob(args.input + '/*.html'): 42 | if DEBUG: 43 | print('Loading text from {}...'.format(path)) 44 | 45 | file = open(path, "r") 46 | html = file.read() 47 | soup = BeautifulSoup(html, features="html.parser") 48 | 49 | # kill all script and style elements 50 | for script in soup(["script", "style"]): 51 | script.extract() 52 | 53 | # Get the text 54 | for p in soup.find_all('p'): 55 | # Extract the

tags, and convert to lower case. 56 | para = p.getText().lower() 57 | 58 | # Remove any line feeds 59 | para = para.replace('\n', ' ') 60 | 61 | # Break up into one line per sentence 62 | para = para.replace('. ', '.\n') 63 | 64 | # Replace punctuation with a space 65 | para = para.translate(str.maketrans(string.punctuation + '“' + '”', ' ' * len(string.punctuation + '“' + '”'))) 66 | 67 | # Ensure the data is split into clean lines 68 | lines = para.split('\n') 69 | 70 | # Add the lines to the corpus... but only if there's more than one word 71 | for line in lines: 72 | if line.find(' ') > -1: 73 | corpus.append(line.strip()) 74 | 75 | 76 | # Check that we loaded something 77 | if len(corpus) == 0: 78 | print("No corpus could be loaded from {}. Exiting.".format(args.directory + '/*.html')) 79 | exit(1) 80 | 81 | # Tokenize the corpus. This will create the tokenizer, and extract and index 82 | # all the words in the corpus, creating a dictionary of words and their IDs 83 | tokenizer = Tokenizer(num_words=3000) 84 | tokenizer.fit_on_texts(corpus) 85 | total_words = tokenizer.num_words 86 | 87 | print("Created a corpus of {} lines with {} words.".format( 88 | len(corpus), 89 | len(tokenizer.index_word))) 90 | 91 | # Create a list of sequences; that is, a list of lists of word IDs, 92 | # representing each sequential substring, up to the full line (sentence) in 93 | # the corpus. In other words, if the sentence is represented by [1, 2, 3, 4], 94 | # we create: 95 | # [[1, 2], 96 | # [1, 2, 3], 97 | # [1, 2, 3, 4]] 98 | sequences = [] 99 | for line in corpus: 100 | token_list = tokenizer.texts_to_sequences([line])[0] 101 | for i in range(1, len(token_list)): 102 | n_gram_sequence = token_list[:i + 1] 103 | sequences.append(n_gram_sequence) 104 | 105 | # Pre-pad all the sequences so they're the same length, as the input to the 106 | # model requires that, creating: 107 | # [[0, 0, 1, 2], 108 | # [0, 1, 2, 3], 109 | # [1, 2, 3, 4]] 110 | max_sequence_len = max([len(seq) for seq in sequences]) 111 | sequences = np.array( 112 | pad_sequences(sequences, maxlen=max_sequence_len, padding='pre')) 113 | 114 | # Split sequences between the "input" sequence (the first N elements of each 115 | # sequence), and "output" label or predicted word (the last element of each 116 | # sequence, creating: 117 | # [[0, 0, 1], 118 | # [0, 1, 2], 119 | # [1, 2, 3]] 120 | # and: 121 | # [2, 3, 4] 122 | input_sequences, labels = sequences[:, :-1], sequences[:, -1] 123 | 124 | # One-hot encode the labels. This creates a matrix with one row for each 125 | # sequence, and a 1 in the column corresponding to the word ID of the next word 126 | # Given the example sequences above, this would create: 127 | # [[0, 1, 0, 0], 128 | # [0, 0, 1, 0], 129 | # [0, 0, 0, 1]] 130 | # Representing word 2 in sequence 1, word 3 in sequence 2, word 4 in sequence 3 131 | # and so on. 132 | one_hot_labels = tf.keras.utils.to_categorical(labels, num_classes=total_words) 133 | 134 | epochs = 100 135 | if DEBUG: 136 | epochs = 5 137 | 138 | # Build the model 139 | model = Sequential() 140 | 141 | # Embedding layer - Turns positive integers (indexes) into dense vectors of fixed size. 142 | # Params: 143 | # input dimension: words in the training set 144 | # output dimension: size of the embedding vectors (i.e the number of cells in the next layer) 145 | # input_length: the size of the sequences (i.e. 
the longest sentence) 146 | model.add(Embedding(total_words, 256, input_length=max_sequence_len - 1)) 147 | 148 | # Add a bi-directional LTSM layer, of the appropriate size. LSTMs are Long Term 149 | # Short Memory cells, which have the ability to remember (and forget) values to 150 | # carry forward in the sequence. 151 | # Params: 152 | # dimension: the size of the layer 153 | # return_sequences: return the entire sequence, not just the last value, so 154 | # it can be fed into another LSTM layer. 155 | model.add(Bidirectional(LSTM(256, return_sequences=True))) 156 | 157 | # More of the same, except the LSTM layer doesn't return sequences this time. 158 | model.add(Bidirectional(LSTM(256))) 159 | 160 | # Randomly set some input units to zero, to help prevent overfitting. Note that 161 | # we don't do this before any of the LTSM layers because it may cause them to 162 | # forget things that should not be forgotten. 163 | # Params: 164 | # rate: frequency of the dropouts 165 | model.add(Dropout(0.2)) 166 | 167 | # Finish up with a dense (fully connected) layer. The softmax activation 168 | # gives us a vector of probabilities for each word in the index. 169 | model.add(Dense(total_words, activation='softmax')) 170 | 171 | # Compile the mode. 172 | # Params: 173 | # loss: the function used to calculate loss 174 | # optimizer: the optimiser function, for adjusting the learning rate. Adam 175 | # is generally a good choice and performs well. 176 | # metrics: the metric(s) to monitor during training. 177 | model.compile(loss='categorical_crossentropy', 178 | optimizer='adam', 179 | metrics=['accuracy']) 180 | 181 | if DEBUG: 182 | print(model.summary()) 183 | 184 | # We're going to save a checkpoint each time our loss metric improves. 185 | checkpoint = ModelCheckpoint('checkpoint.h5', 186 | monitor='loss', 187 | save_best_only=True, 188 | mode='min') 189 | 190 | # Train the model. 191 | # Params: 192 | # input data: the data to learn from 193 | # target data: the expected output (e.g. a 1 in the row corresponding to the 194 | # input sequence, indicating the next word. 195 | # epochs: the number of epochs to train for 196 | # callbacks: a list of callbacks to execute 197 | history = model.fit(input_sequences, 198 | one_hot_labels, 199 | epochs=epochs, 200 | callbacks=[checkpoint]) 201 | 202 | # We're done training, so load the checkpoint which contains the best model 203 | # Training is done, so load the best model from the last checkpoint 204 | model = load_model("checkpoint.h5") 205 | 206 | # Save the model to the final file 207 | model.save(args.model) 208 | print('Model saved to {}.'.format(args.model)) 209 | 210 | # Save the tokenizer and max_sequence_length 211 | data = {'max_sequence_len': max_sequence_len, 212 | 'tokenizer': tokenizer.to_json()} 213 | with open(args.data, 'w', encoding='utf-8') as f: 214 | f.write(json.dumps(data, ensure_ascii=False)) 215 | print('Data saved to {}.'.format(args.data)) 216 | 217 | # All done, but if we're in debug mode, dump some interesting info. 
218 | if DEBUG: 219 | plot_graphs(history, 'accuracy') 220 | 221 | # Dump some basic test info 222 | text = "trigger dialog" 223 | next_words = 5 224 | 225 | for _ in range(next_words): 226 | token_list = tokenizer.texts_to_sequences([text])[0] 227 | token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, 228 | padding='pre') 229 | 230 | # Get a list of the predicted probabilities corresponding to each word 231 | # in the index 232 | predictions = model.predict(token_list) 233 | 234 | # Get the top 5 results 235 | indices = np.argpartition(predictions, -5)[0][-5:] 236 | 237 | # Create a dict of the results and their probabilities 238 | results = {} 239 | for index in indices: 240 | key = [k for (k, v) in tokenizer.word_index.items() if v == index] 241 | results.update({key[0]: predictions[0, index]}) 242 | 243 | results = {k: v for k, v in sorted(results.items(), key=lambda item: item[1], reverse=True)} 244 | 245 | # Add the top result to the string, if it's not there already 246 | for result in results: 247 | if result not in text: 248 | text = text + " " + result 249 | break 250 | 251 | print("{}".format(results)) 252 | 253 | print(text) -------------------------------------------------------------------------------- /text_prediction/test-model.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | from tensorflow.keras.preprocessing.sequence import pad_sequences 7 | from tensorflow.keras.preprocessing.text import Tokenizer, tokenizer_from_json 8 | 9 | DEBUG = False 10 | 11 | # Get command line options 12 | parser = argparse.ArgumentParser(description='Test a pre-trained Tensorflow model with text data') 13 | parser.add_argument("-d", "--data", required=True, help="the file to load data from") 14 | parser.add_argument("-m", "--model", required=True, help="the file to load the model from") 15 | 16 | args = parser.parse_args() 17 | 18 | # Load the model 19 | model = tf.keras.models.load_model(args.model) 20 | 21 | # Load the data 22 | f = open(args.data, "r") 23 | data = json.load(f) 24 | max_sequence_len = data['max_sequence_len'] 25 | tokenizer = tokenizer_from_json(data['tokenizer']) 26 | 27 | text = input("Enter text (blank to quit): ") 28 | while text != '': 29 | words = input("Number of words to generate (default: 1): ") 30 | if words == '': 31 | words = 1 32 | else: 33 | words = int(words) 34 | 35 | for _ in range(words): 36 | token_list = tokenizer.texts_to_sequences([text])[0] 37 | token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, 38 | padding='pre') 39 | 40 | # Get a list of the predicted probabilities corresponding to each word 41 | # in the index 42 | predictions = model.predict(token_list) 43 | 44 | # Get the top 5 results 45 | indices = np.argpartition(predictions, -5)[0][-5:] 46 | 47 | # Create a dict of the results and their probabilities 48 | results = {} 49 | for index in indices: 50 | key = [k for (k, v) in tokenizer.word_index.items() if v == index] 51 | results.update({key[0]: predictions[0, index]}) 52 | 53 | results = {k: v for k, v in sorted(results.items(), key=lambda item: item[1], reverse=True)} 54 | 55 | # Add the top result to the string, if it's not there already 56 | for result in results: 57 | if result not in text: 58 | text = text + " " + result 59 | break 60 | 61 | if DEBUG: 62 | print(" {}".format(results)) 63 | 64 | print("Results: {}".format(text)) 65 | text = input("Enter text (blank to quit): 
") -------------------------------------------------------------------------------- /time_series/README.md: -------------------------------------------------------------------------------- 1 | # Time Series Prediction 2 | 3 | Prediction of time series data is extremely useful in capacity management tools. 4 | Linear Trend Analysis is a common option, but this only allows you to predict 5 | the general trend of the data. Using a WaveNet model architecture we can predict 6 | data with trends and seasonality traits. 7 | 8 | There are various scripts in this directory: 9 | 10 | ## cnn-demo.py 11 | 12 | This script will generate a series of data with a general upward trend, noise, 13 | and seasonality. It is largely based on the example at: 14 | 15 | https://github.com/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l08c09_forecasting_with_cnn.ipynb 16 | 17 | ## cnn-workload.py 18 | 19 | This is a minor variation to the *cnn-demo.py* script. Instead of generating its 20 | own test data, it will read a data set from *activity.csv* and learn and predict 21 | based on that data. *activity.csv* was created by logging the number of rows in 22 | *pg_stat_activity* on a PostgreSQL server every five minutes, whilst a workload 23 | generator was running against the system, simulating user activity. 24 | 25 | ## arima-tsp.py 26 | 27 | This tests predictions using (S)ARIMA functionality provided by the pmdarima 28 | Python library. 29 | 30 | ## prophet-tsp.py 31 | 32 | This tests predictions using Facebook's Prophet Python library. -------------------------------------------------------------------------------- /time_series/arima-tsp.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import matplotlib.pyplot as plt 3 | import dateutil.parser 4 | import pmdarima as pm 5 | from pmdarima.arima.stationarity import ADFTest 6 | 7 | import pickle 8 | 9 | DATAFILE = 'activity.csv' 10 | 11 | try: 12 | data = pd.read_csv(DATAFILE, 13 | index_col=0, 14 | parse_dates=[0], 15 | date_parser=dateutil.parser.isoparse) 16 | is_dated = True 17 | except: 18 | data = pd.read_csv(DATAFILE, index_col=0) 19 | is_dated = False 20 | 21 | data = data[-4032:] 22 | data = data.resample('60T').mean() 23 | 24 | # Plot 25 | fig, axes = plt.subplots(2, 1, figsize=(10, 5), dpi=100, sharex=True) 26 | 27 | # Usual Differencing 28 | axes[0].plot(data[:], label='Original Series') 29 | axes[0].plot(data[:].diff(1), label='Usual Differencing') 30 | axes[0].set_title('Usual Differencing') 31 | axes[0].legend(loc='upper left', fontsize=10) 32 | 33 | # Seasonal Differencing 34 | axes[1].plot(data[:], label='Original Series') 35 | axes[1].plot(data[:].diff(12), label='Seasonal Differencing', color='green') 36 | axes[1].set_title('Seasonal Differencing') 37 | plt.legend(loc='upper left', fontsize=10) 38 | 39 | plt.show() 40 | 41 | pm.plot_acf(data) 42 | 43 | try: 44 | # Try to load an existing model first 45 | with open(DATAFILE + '.pkl', 'rb') as pkl: 46 | smodel = pickle.load(pkl) 47 | except: 48 | # Seasonal - fit stepwise auto-ARIMA 49 | smodel = pm.auto_arima(data, start_p=1, start_q=1, 50 | test='adf', 51 | max_p=3, max_q=3, m=168, 52 | start_P=0, seasonal=True, 53 | d=None, D=1, trace=True, 54 | error_action='ignore', 55 | suppress_warnings=True, 56 | stepwise=True) 57 | 58 | # Serialize with Pickle 59 | with open(DATAFILE + '.pkl', 'wb') as pkl: 60 | pickle.dump(smodel, pkl) 61 | 62 | print(smodel.summary()) 63 | 64 | # Forecast 65 | 
n_periods = int(len(data) * 0.25) 66 | fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True) 67 | 68 | if is_dated: 69 | index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='60min') 70 | else: 71 | index_of_fc = pd.RangeIndex(data.index[-1], data.index[-1] + n_periods).to_series() 72 | 73 | # make series for plotting purpose 74 | fitted_series = pd.Series(fitted, index=index_of_fc) 75 | lower_series = pd.Series(confint[:, 0], index=index_of_fc) 76 | upper_series = pd.Series(confint[:, 1], index=index_of_fc) 77 | 78 | # Plot 79 | fig, axes = plt.subplots( figsize=(10, 5), dpi=100, tight_layout=True) 80 | plt.plot(data, label='Historic Data') 81 | plt.plot(fitted_series, color='darkgreen', label='Forecast Data') 82 | plt.fill_between(lower_series.index, 83 | lower_series, 84 | upper_series, 85 | color='k', alpha=.15, label='Confidence') 86 | plt.legend(loc='upper left', fontsize=10) 87 | 88 | plt.show() 89 | -------------------------------------------------------------------------------- /time_series/cnn-demo.csv: -------------------------------------------------------------------------------- 1 | 1,52.483570765056164 2 | 2,49.35275206250873 3 | 3,53.314738720091654 4 | 4,57.71182193357917 5 | 5,48.934445338646746 6 | 6,48.93124221263841 7 | 7,57.982896873997795 8 | 8,53.89712254407415 9 | 9,47.673926155614154 10 | 10,52.683706844477726 11 | 11,47.59171580618012 12 | 12,47.50637546686874 13 | 13,50.95941455650678 14 | 14,40.0861800669988 15 | 15,40.919412912752854 16 | 16,46.61247489800624 17 | 17,44.22820749649383 18 | 18,50.72064419717794 19 | 19,44.45498265160984 20 | 20,41.76799161636655 21 | 21,55.98093654512165 22 | 22,47.335836470093255 23 | 23,48.60329747956659 24 | 24,40.93184102255222 25 | 25,45.11265790806558 26 | 26,48.15731844212945 27 | 27,41.605099478163844 28 | 28,48.98523395249864 29 | 29,43.83963141177149 30 | 30,45.10993463906914 31 | 31,43.275042512717455 32 | 32,55.249831558104475 33 | 33,45.61561472036271 34 | 34,40.07910684941181 35 | 35,49.154955434058735 36 | 36,38.602701980447065 37 | 37,45.40616171535249 38 | 38,34.20876785391398 39 | 39,37.00193411359374 40 | 40,44.253510326428135 41 | 41,46.578596279943625 42 | 42,43.351011752380835 43 | 43,41.514814636241546 44 | 44,40.17753568217782 45 | 45,33.87169103890548 46 | 46,37.237714399744895 47 | 47,38.09790511196735 48 | 48,45.24254666081345 49 | 49,41.22268300970987 50 | 50,30.22901514175779 51 | 51,40.196379682122476 52 | 52,36.174565479979506 53 | 53,34.23181291788006 54 | 54,40.183837494953316 55 | 55,41.78223418871641 56 | 56,40.778327540715736 57 | 57,31.413604091558295 58 | 58,33.54463558702571 59 | 59,36.221429983257075 60 | 60,38.91083398130668 61 | 61,31.0989852959105 62 | 62,32.02223667296815 63 | 63,26.8686348004186 64 | 64,25.863335565916735 65 | 65,35.34551726926316 66 | 66,37.497249421096676 67 | 67,29.783984223941914 68 | 68,34.58469362386002 69 | 69,30.79339810646969 70 | 70,25.17318981859314 71 | 71,29.614907580867097 72 | 72,34.90301359904407 73 | 73,26.434553192929457 74 | 74,33.83389805406793 75 | 75,12.30528790488522 76 | 76,28.903391445774883 77 | 77,24.615707223982852 78 | 78,22.068952131508095 79 | 79,23.403431303009043 80 | 80,12.384741265517482 81 | 81,20.599703839645635 82 | 82,22.856822474415864 83 | 83,27.831845627832962 84 | 84,17.220260717313494 85 | 85,15.136702541005633 86 | 86,16.03646864864384 87 | 87,22.487074989752017 88 | 88,18.917560744221525 89 | 89,13.987878585188938 90 | 90,18.56522840125376 91 | 91,15.846031644189091 92 | 
92,19.565366484514218 93 | 93,10.573322687612876 94 | 94,11.806877073134833 95 | 95,10.846603701441937 96 | 96,4.852087281366758 97 | 97,13.013545572105983 98 | 98,12.202470811584591 99 | 99,10.288183299392287 100 | 100,8.45647607175643 101 | 101,1.9209303559618247 102 | 102,6.2647083660884455 103 | 103,6.026493351058951 104 | 104,3.1029914486664936 105 | 105,5.684642225808682 106 | 106,7.890598701256305 107 | 107,14.683326950659888 108 | 108,5.510316857672229 109 | 109,5.313384462373677 110 | 110,3.04497811574246 111 | 111,-6.781507620059443 112 | 112,2.0786794660264443 113 | 113,1.9152528396559976 114 | 114,13.337309622535663 115 | 115,-0.5293736345719712 116 | 116,1.356024140365832 117 | 117,-0.9047031598423845 118 | 118,-7.149067704596997 119 | 119,3.8389869483905743 120 | 120,1.320355585349711 121 | 121,0.9571168525332379 122 | 122,-8.098083931136062 123 | 123,2.915529365819684 124 | 124,-11.64900780018202 125 | 125,-2.240616862237794 126 | 126,5.248559797661139 127 | 127,-11.178710596650182 128 | 128,-9.573153310082947 129 | 129,-6.752200969104917 130 | 130,-10.269621056534946 131 | 131,-16.000173285830783 132 | 132,-8.391321573463806 133 | 133,-14.525443097812527 134 | 134,-7.31810123474604 135 | 135,-14.747519473870266 136 | 136,-2.8571048398967136 137 | 137,-14.971315485556072 138 | 138,-13.105374711606759 139 | 139,-7.859099895306094 140 | 140,-18.50408444671292 141 | 141,-11.626857921833766 142 | 142,-6.63401879149507 143 | 143,-21.603768944812494 144 | 144,-13.030716586283548 145 | 145,-13.032788261978894 146 | 146,-10.792060556568035 147 | 147,23.163014922097673 148 | 148,22.69686848666786 149 | 149,31.861049683918157 150 | 150,30.689256146326997 151 | 151,30.410582023326647 152 | 152,30.844931377796392 153 | 153,25.66792040178492 154 | 154,30.185440901642245 155 | 155,30.446431573763746 156 | 156,25.366971004232603 157 | 157,38.22601539471462 158 | 158,31.225471858641953 159 | 159,22.85969771720536 160 | 160,32.05962864337407 161 | 161,23.86482905753115 162 | 162,32.63576266035585 163 | 163,34.45613925621218 164 | 164,24.52328508615176 165 | 165,33.40782039692574 166 | 166,30.61978942090686 167 | 167,32.63182683853249 168 | 168,37.97182272210303 169 | 169,27.227933864800022 170 | 170,24.653889518752415 171 | 171,23.943367787806242 172 | 172,24.28092629160526 173 | 173,27.944169665540645 174 | 174,30.005795873898652 175 | 175,29.654499694444283 176 | 176,32.37861821551645 177 | 177,28.280008984299865 178 | 178,35.45560349809318 179 | 179,26.838213360093153 180 | 180,41.73653343616088 181 | 181,31.238834960561466 182 | 182,23.800136520333595 183 | 183,22.707498317081804 184 | 184,30.450964816245648 185 | 185,26.898531217988875 186 | 186,31.563685576454137 187 | 187,30.338299872098197 188 | 188,27.586981676021914 189 | 189,23.696753190433867 190 | 190,20.336657494017892 191 | 191,25.659062271724792 192 | 192,32.15494131758409 193 | 193,28.925288639468512 194 | 194,21.60855625468561 195 | 195,28.686138098812826 196 | 196,29.730352254066016 197 | 197,23.368554134563336 198 | 198,28.541082426209844 199 | 199,28.048651273329433 200 | 200,22.028437799361416 201 | 201,29.518434175422623 202 | 202,30.520150258960683 203 | 203,33.118731990501104 204 | 204,32.96024783715309 205 | 205,20.79116198987747 206 | 206,22.97916016300004 207 | 207,30.232739375792292 208 | 208,30.216267513631156 209 | 209,30.212843883534234 210 | 210,46.8920195365922 211 | 211,30.474056204278057 212 | 212,33.28915438594617 213 | 213,32.373534566527624 214 | 214,30.853154694672522 215 | 215,26.012994156325345 216 | 
216,31.377793834617123 217 | 217,23.712890613706378 218 | 218,26.387450430120555 219 | 219,25.139706577775907 220 | 220,27.971326225795078 221 | 221,39.13112624660511 222 | 222,18.21782835085095 223 | 223,30.98221555614641 224 | 224,19.484531332058886 225 | 225,25.186079668879678 226 | 226,32.98854898058617 227 | 227,27.86367826822849 228 | 228,22.152458158885448 229 | 229,23.963985624979507 230 | 230,30.938229862109058 231 | 231,23.888556332881166 232 | 232,28.623238873542764 233 | 233,27.769766279279487 234 | 234,24.285267902987943 235 | 235,38.264750770314116 236 | 236,30.7167809805418 237 | 237,17.424020111376773 238 | 238,28.48494021898553 239 | 239,24.247057167082385 240 | 240,31.82185910258095 241 | 241,23.601170638390897 242 | 242,26.9945498874441 243 | 243,30.09799903955673 244 | 244,31.907038899598213 245 | 245,21.582347885322356 246 | 246,25.917254439326793 247 | 247,25.22132548962018 248 | 248,24.336055085668555 249 | 249,36.43697650069875 250 | 250,29.6419698478953 251 | 251,21.320346577426342 252 | 252,32.222127345354 253 | 253,38.25199322723099 254 | 254,32.812273737458625 255 | 255,20.062170570260093 256 | 256,25.247258008981543 257 | 257,34.012724358278675 258 | 258,24.149891069275547 259 | 259,29.917732171619228 260 | 260,31.582526104632095 261 | 261,23.085745805754527 262 | 262,27.434132597261566 263 | 263,11.537100168663343 264 | 264,22.63348984240056 265 | 265,26.504889593098635 266 | 266,21.54142531548354 267 | 267,35.95531466862788 268 | 268,20.655771726089842 269 | 269,25.619777707941523 270 | 270,28.487523147687345 271 | 271,35.05430297657338 272 | 272,20.683035771351975 273 | 273,33.69286660575851 274 | 274,27.943203404932397 275 | 275,22.99977167146893 276 | 276,30.23339335118944 277 | 277,28.934017410662715 278 | 278,24.953757092193054 279 | 279,28.320251751797606 280 | 280,26.061348225200785 281 | 281,28.572450600942226 282 | 282,31.332735462122596 283 | 283,35.96965272014816 284 | 284,21.868243833809007 285 | 285,38.74050484988847 286 | 286,18.333177427156635 287 | 287,27.353229054293408 288 | 288,31.072536197719778 289 | 289,29.5549605629018 290 | 290,25.05580801639357 291 | 291,27.14825001367012 292 | 292,25.743661444136098 293 | 293,25.28189434685544 294 | 294,32.49702576985611 295 | 295,30.054633000195157 296 | 296,24.825789021971104 297 | 297,32.809356971762135 298 | 298,29.869113006006064 299 | 299,32.41841907186402 300 | 300,31.523979106559665 301 | 301,24.25281769480044 302 | 302,25.619074939697168 303 | 303,32.1788632174833 304 | 304,31.516887366624406 305 | 305,28.38339286537669 306 | 306,29.097624662505904 307 | 307,34.922619432571935 308 | 308,25.59996371078298 309 | 309,31.317049967330767 310 | 310,27.594557137176913 311 | 311,27.541284893104404 312 | 312,34.147957106180755 313 | 313,32.80574622466429 314 | 314,32.7710122732127 315 | 315,35.25586402005366 316 | 316,28.8586996726217 317 | 317,32.188858715549344 318 | 318,27.25337472420986 319 | 319,30.451354523848952 320 | 320,28.20581971605037 321 | 321,29.367723391086383 322 | 322,31.88493201605743 323 | 323,24.844639945891764 324 | 324,39.42446778307735 325 | 325,23.959422541057084 326 | 326,22.9457328745827 327 | 327,34.83458369952376 328 | 328,33.02988156897888 329 | 329,32.21988980782986 330 | 330,32.26892324360529 331 | 331,29.094047666438698 332 | 332,24.697274884890646 333 | 333,29.591012654892655 334 | 334,25.85480088164466 335 | 335,34.14500269089181 336 | 336,28.5630852776693 337 | 337,25.200026518561973 338 | 338,27.749894358390236 339 | 339,31.450961251303504 340 | 
340,26.597329592109006 341 | 341,25.334665342574127 342 | 342,30.69418356661164 343 | 343,30.730724425466146 344 | 344,27.0014822844887 345 | 345,27.211474424932398 346 | 346,30.75754331224612 347 | 347,22.387658162456724 348 | 348,22.62170451660704 349 | 349,26.097901787469436 350 | 350,28.654141350281513 351 | 351,31.307322595430705 352 | 352,37.161125698268556 353 | 353,34.104353517361645 354 | 354,29.04822324558314 355 | 355,29.784843830734353 356 | 356,24.89943428731653 357 | 357,29.851817730771675 358 | 358,28.533537380758386 359 | 359,31.623014220285878 360 | 360,25.906000046600795 361 | 361,32.671762017412426 362 | 362,37.77173887223531 363 | 363,29.597397457690068 364 | 364,32.18304859691377 365 | 365,33.65863849638424 366 | 366,66.24389764057082 367 | 367,69.41453597741673 368 | 368,68.38925803349717 369 | 369,68.8350531442832 370 | 370,64.4901632929861 371 | 371,68.47447786867903 372 | 372,70.82682425268808 373 | 373,75.56566693728482 374 | 374,73.06765221571496 375 | 375,78.98681891410568 376 | 376,64.32206645580219 377 | 377,72.44662741832342 378 | 378,68.91631322736836 379 | 379,78.85159595637616 380 | 380,63.75251064854226 381 | 381,63.475303333307224 382 | 382,64.54539987094584 383 | 383,56.77992891265254 384 | 384,64.6163279208121 385 | 385,63.283846815274515 386 | 386,67.65466163289491 387 | 387,68.42349785141172 388 | 388,75.8965106522064 389 | 389,71.05770114454975 390 | 390,63.200053252379476 391 | 391,61.360632136838326 392 | 392,68.06966322280789 393 | 393,58.75557782566707 394 | 394,74.25011869063728 395 | 395,70.71560399164197 396 | 396,62.18769731334091 397 | 397,55.6727679901054 398 | 398,70.70246271487945 399 | 399,63.04496226792822 400 | 400,69.48131243341011 401 | 401,54.984781936330336 402 | 402,59.61496862555985 403 | 403,62.28333697190377 404 | 404,62.12776732690961 405 | 405,59.2688767896863 406 | 406,64.25051304170407 407 | 407,55.40606819951801 408 | 408,59.63115862307628 409 | 409,60.53453231868376 410 | 410,62.08649516103637 411 | 411,62.645009832162884 412 | 412,53.02788850757694 413 | 413,50.53636467604076 414 | 414,64.1429756713601 415 | 415,58.95578597846943 416 | 416,53.08352715236573 417 | 417,64.1057367596737 418 | 418,56.44479609087415 419 | 419,61.27194197106811 420 | 420,55.2148289832872 421 | 421,64.67566656954469 422 | 422,62.636395919887505 423 | 423,52.09587672388641 424 | 424,57.67296758101104 425 | 425,55.50998809362063 426 | 426,58.588014272798695 427 | 427,46.37591425338671 428 | 428,54.08056697044094 429 | 429,55.38649112056803 430 | 430,40.739190725176606 431 | 431,43.04975671491369 432 | 432,38.19787394304308 433 | 433,46.469994962177104 434 | 434,50.82292926021133 435 | 435,54.160573852098906 436 | 436,46.4284034554239 437 | 437,53.605908494570684 438 | 438,37.96317609740476 439 | 439,35.74376757822212 440 | 440,43.37627493085085 441 | 441,44.9642061685953 442 | 442,42.266998142321526 443 | 443,31.47677838363805 444 | 444,40.749027222767594 445 | 445,34.050238335997676 446 | 446,43.29642602298339 447 | 447,41.154250847341366 448 | 448,33.99297647248861 449 | 449,35.492277221998265 450 | 450,32.133102946026824 451 | 451,36.48185838020067 452 | 452,40.93577600374783 453 | 453,30.595174964245384 454 | 454,37.40691218161336 455 | 455,31.597603143824777 456 | 456,29.646279736137167 457 | 457,32.43698973207698 458 | 458,27.157376544902768 459 | 459,28.92694127938777 460 | 460,25.067755004158485 461 | 461,40.24328768660929 462 | 462,29.959261946641746 463 | 463,25.648566910722217 464 | 464,29.5825155698511 465 | 465,27.317771490178018 
466 | 466,26.14293606854778 467 | 467,29.688768480132364 468 | 468,29.777604483929323 469 | 469,22.71187205672196 470 | 470,21.86197958091533 471 | 471,22.745085931425393 472 | 472,11.992791620929308 473 | 473,15.311472482520273 474 | 474,29.110003845982487 475 | 475,29.892046262079717 476 | 476,19.817168258653872 477 | 477,23.344033658551357 478 | 478,21.420352562668548 479 | 479,34.66550310238543 480 | 480,24.280305746506528 481 | 481,17.458699471293937 482 | 482,12.7411534856817 483 | 483,8.912090882212802 484 | 484,17.392191055151592 485 | 485,12.028936695494327 486 | 486,8.140688569329633 487 | 487,11.465988921624998 488 | 488,8.743817793067214 489 | 489,22.045955689142215 490 | 490,17.48329645350811 491 | 491,12.50641846202816 492 | 492,19.423691723453388 493 | 493,11.895176876169003 494 | 494,6.693121198815565 495 | 495,18.11337760039434 496 | 496,12.697694087923173 497 | 497,4.329632780873773 498 | 498,8.084382080399957 499 | 499,4.185845345154274 500 | 500,1.1856030424770712 501 | 501,12.274110872673795 502 | 502,16.742034178475762 503 | 503,-0.23790499967408696 504 | 504,9.138160201298419 505 | 505,2.647024289847724 506 | 506,3.050215486322106 507 | 507,2.1182978158984405 508 | 508,0.3636933795945625 509 | 509,4.538722260779073 510 | 510,-0.23695281527619638 511 | 511,4.901109213444608 512 | 512,47.34657792924239 513 | 513,46.35441131775719 514 | 514,42.963523545625684 515 | 515,44.57047612731941 516 | 516,51.18507390072604 517 | 517,49.867276268433415 518 | 518,42.43026778568462 519 | 519,47.770833942983366 520 | 520,50.98800482412929 521 | 521,38.84170168875758 522 | 522,49.86394380089051 523 | 523,43.79318845935378 524 | 524,49.91920854618441 525 | 525,43.21056481749234 526 | 526,37.96382690534516 527 | 527,38.81262745222777 528 | 528,47.15358609448207 529 | 529,48.17530918651773 530 | 530,42.31935662518209 531 | 531,49.99884708011124 532 | 532,38.46392572721523 533 | 533,46.407458815596705 534 | 534,40.64979344600209 535 | 535,43.413389801529064 536 | 536,46.87793329251593 537 | 537,42.30791089001269 538 | 538,44.656900491462046 539 | 539,51.58150004588764 540 | 540,43.63658635017844 541 | 541,50.67116253066204 542 | 542,40.81646525162223 543 | 543,49.08695400188301 544 | 544,53.61934062957245 545 | 545,34.02746510257634 546 | 546,42.37602194438405 547 | 547,49.22128493831771 548 | 548,45.2967338771724 549 | 549,48.144332106886274 550 | 550,43.24591921103902 551 | 551,46.67663204335818 552 | 552,45.443725572270075 553 | 553,52.04003654760531 554 | 554,47.45282599728196 555 | 555,47.84890692782322 556 | 556,44.08225220144756 557 | 557,43.684916225604255 558 | 558,43.942028979719396 559 | 559,48.059510860137046 560 | 560,43.965311065455595 561 | 561,47.5026396399039 562 | 562,56.41484530879619 563 | 563,50.37808041364067 564 | 564,44.37749002026023 565 | 565,51.999358899334254 566 | 566,43.939120508573446 567 | 567,35.775604951230235 568 | 568,40.9130442200377 569 | 569,36.5872779718493 570 | 570,44.17194140945639 571 | 571,46.01037725852341 572 | 572,54.2897496011254 573 | 573,47.531974627890925 574 | 574,44.79210280795968 575 | 575,50.02538998923605 576 | 576,34.8139271057728 577 | 577,47.039398975586 578 | 578,49.70785171849644 579 | 579,38.45326720924433 580 | 580,51.558110595561715 581 | 581,47.52542976962285 582 | 582,43.75057711689023 583 | 583,48.98545279435203 584 | 584,57.16998860594341 585 | 585,46.72128680415594 586 | 586,47.04893634473925 587 | 587,43.50734981610847 588 | 588,41.55169275694988 589 | 589,49.94978977072838 590 | 590,41.51531986828251 591 | 
591,46.151627181846294 592 | 592,43.40399093892558 593 | 593,48.18608117726394 594 | 594,47.4588146977141 595 | 595,50.977940838725175 596 | 596,43.240307497193115 597 | 597,44.441571249166024 598 | 598,40.898088500848885 599 | 599,43.57180334063648 600 | 600,47.681532788911746 601 | 601,49.5821289521785 602 | 602,41.18890642377668 603 | 603,50.15069824566669 604 | 604,52.58417878494909 605 | 605,47.87686694471834 606 | 606,55.197753393344726 607 | 607,41.94928609926073 608 | 608,39.599789128097356 609 | 609,36.93466168422622 610 | 610,53.314051478047155 611 | 611,49.111588900301825 612 | 612,45.56812869087268 613 | 613,47.252544380136214 614 | 614,40.232260062801394 615 | 615,58.09582119117465 616 | 616,46.520872258977704 617 | 617,46.429791583104574 618 | 618,49.52004536166128 619 | 619,48.30499359338646 620 | 620,47.028440521425814 621 | 621,41.96605609608624 622 | 622,48.28551039802553 623 | 623,55.34836087974462 624 | 624,52.675735261662986 625 | 625,53.925288970682395 626 | 626,43.41431978148901 627 | 627,41.03373527627766 628 | 628,45.36450226851047 629 | 629,46.28405261051549 630 | 630,51.488687942419176 631 | 631,37.56801807673369 632 | 632,53.69100974627277 633 | 633,45.26643912299806 634 | 634,43.93559479168935 635 | 635,41.02329838495605 636 | 636,37.82365317191395 637 | 637,50.22819944705874 638 | 638,46.49363768092574 639 | 639,39.692233601129196 640 | 640,39.68192106639061 641 | 641,44.49395248342244 642 | 642,54.53382655924233 643 | 643,44.90688472116901 644 | 644,38.70552656125698 645 | 645,45.009200889079864 646 | 646,44.89124602594763 647 | 647,32.787648874809506 648 | 648,46.018094306832346 649 | 649,45.152648676899865 650 | 650,49.80636980067423 651 | 651,55.58839689949587 652 | 652,51.99497967721012 653 | 653,45.036506712522744 654 | 654,40.86737168051828 655 | 655,59.28610463174584 656 | 656,46.734953435528936 657 | 657,46.52831257699499 658 | 658,46.35809269602648 659 | 659,47.48943908859026 660 | 660,45.797753510750205 661 | 661,43.67202696387319 662 | 662,43.827062888393925 663 | 663,46.418849050543024 664 | 664,43.886984622000405 665 | 665,43.06160598355578 666 | 666,47.17994388787028 667 | 667,45.395094053577736 668 | 668,54.212360134780425 669 | 669,33.46018699749202 670 | 670,52.19543509480974 671 | 671,52.99141370845014 672 | 672,36.41734379158907 673 | 673,45.09438268456039 674 | 674,44.97435873150162 675 | 675,39.817961925757274 676 | 676,42.990607471286125 677 | 677,41.35119361891566 678 | 678,55.690016696842264 679 | 679,51.63185605894681 680 | 680,53.3362454592528 681 | 681,50.61204078467408 682 | 682,41.383835002988384 683 | 683,44.43210717577827 684 | 684,49.52739556754633 685 | 685,40.99589594342906 686 | 686,50.69773571698473 687 | 687,45.95751989808219 688 | 688,45.31163932431114 689 | 689,50.7673312456675 690 | 690,49.4608260059858 691 | 691,45.46184510942596 692 | 692,53.09067834884467 693 | 693,41.91625146116472 694 | 694,50.4289687572904 695 | 695,50.342701987125814 696 | 696,45.85754933410332 697 | 697,49.064211853530395 698 | 698,41.20642198199751 699 | 699,52.11074453523972 700 | 700,46.59489334158811 701 | 701,44.934757082584795 702 | 702,52.82255863170898 703 | 703,44.0851051139414 704 | 704,40.59399749810758 705 | 705,39.88280648850937 706 | 706,50.72581707713448 707 | 707,41.323600746670664 708 | 708,56.529862479844354 709 | 709,37.376551121925736 710 | 710,56.2989477944742 711 | 711,48.90238096147103 712 | 712,47.39451431059139 713 | 713,45.18442795444506 714 | 714,49.935803465491965 715 | 715,47.783203596716504 716 | 
716,53.519294177523285 717 | 717,48.605482856831 718 | 718,48.81756420866093 719 | 719,46.279854834331 720 | 720,47.84519675164247 721 | 721,49.70108995496587 722 | 722,39.6435414474505 723 | 723,41.48590346483051 724 | 724,51.975741888708995 725 | 725,49.14648195500212 726 | 726,47.405112764443174 727 | 727,48.45021397254939 728 | 728,50.12910672678271 729 | 729,45.725691584872244 730 | 730,44.56639491081712 731 | 731,87.4792262754884 732 | 732,81.65220968028949 733 | 733,88.61755980816055 734 | 734,78.08375463034983 735 | 735,91.75099039889166 736 | 736,88.96491440944952 737 | 737,87.86698146803022 738 | 738,91.47340381803718 739 | 739,94.84867030760181 740 | 740,91.54275695163855 741 | 741,77.2044331135842 742 | 742,79.93713940104153 743 | 743,83.12551031019822 744 | 744,86.28303654134197 745 | 745,88.63229717766363 746 | 746,82.29519347844378 747 | 747,86.72619692055135 748 | 748,81.87249287243496 749 | 749,82.43751401425615 750 | 750,78.2962046388019 751 | 751,80.53652646995936 752 | 752,78.20629494444427 753 | 753,79.8862901912403 754 | 754,89.82379093665871 755 | 755,79.58757708653178 756 | 756,97.26461581776707 757 | 757,86.32665686967982 758 | 758,84.53092447924465 759 | 759,79.05103596045944 760 | 760,86.56995278508049 761 | 761,79.90538644267556 762 | 762,83.09848970878659 763 | 763,94.98352353539634 764 | 764,81.38736199556806 765 | 765,87.28859750497116 766 | 766,77.69103810467274 767 | 767,80.6869012878439 768 | 768,89.36112165149041 769 | 769,77.00802906914707 770 | 770,88.83144693706717 771 | 771,82.92502305724396 772 | 772,76.18183646695965 773 | 773,81.75509474346035 774 | 774,83.04582640825792 775 | 775,80.87335080182861 776 | 776,69.48581184226616 777 | 777,73.26541308764195 778 | 778,75.21934235196136 779 | 779,75.63242441636734 780 | 780,78.64757640632486 781 | 781,75.96446483981477 782 | 782,67.92325508851059 783 | 783,76.01741217443967 784 | 784,76.6783847771681 785 | 785,75.92618881589185 786 | 786,78.02583057290785 787 | 787,76.279302480416 788 | 788,73.88659786242397 789 | 789,70.71428426894602 790 | 790,62.22830367811494 791 | 791,72.14294758079988 792 | 792,70.48896999210278 793 | 793,70.2582038564256 794 | 794,61.96062580721855 795 | 795,62.377605455250865 796 | 796,72.48181354490708 797 | 797,66.44625906256032 798 | 798,69.47453262126301 799 | 799,65.62680986188383 800 | 800,65.0475692890975 801 | 801,68.999348583205 802 | 802,61.13260712562736 803 | 803,63.594287273184136 804 | 804,60.19930333147269 805 | 805,59.73153228817237 806 | 806,59.748018306555565 807 | 807,61.791140740960564 808 | 808,57.670245775520044 809 | 809,65.72340804819913 810 | 810,54.34954932742443 811 | 811,57.2637050580264 812 | 812,55.37260432548626 813 | 813,64.17726482589406 814 | 814,57.294385691239604 815 | 815,60.838393252814754 816 | 816,47.617452001381665 817 | 817,55.74531573058794 818 | 818,58.221959174040286 819 | 819,53.54809955040126 820 | 820,57.82629311101374 821 | 821,49.27420164694703 822 | 822,58.26887873277768 823 | 823,62.07807877509589 824 | 824,48.13099500392539 825 | 825,47.07963186009887 826 | 826,55.936584407615854 827 | 827,55.93080491543667 828 | 828,44.7828943149188 829 | 829,44.6616819307007 830 | 830,44.72048869432963 831 | 831,38.77553151104251 832 | 832,40.27467524749425 833 | 833,39.21936209993246 834 | 834,39.77538996925396 835 | 835,42.81764634720482 836 | 836,43.54141807986622 837 | 837,49.50489990867762 838 | 838,36.14565758981908 839 | 839,45.447244501142784 840 | 840,38.84726347344576 841 | 841,39.0650299081736 842 | 842,42.08534630410555 843 | 
843,32.50049168213937 844 | 844,39.43314779102949 845 | 845,37.76469223039892 846 | 846,38.81054374873852 847 | 847,37.21469890822293 848 | 848,47.47082318305513 849 | 849,31.4361729545589 850 | 850,31.405705646906757 851 | 851,30.38625448519418 852 | 852,30.171467747036363 853 | 853,29.214622174606596 854 | 854,37.80533016931715 855 | 855,38.42761890871013 856 | 856,27.942550199863867 857 | 857,26.112193163482118 858 | 858,32.115413119862936 859 | 859,26.488426984052026 860 | 860,31.91241630282361 861 | 861,29.26775897375638 862 | 862,20.1871429775199 863 | 863,35.023601477468304 864 | 864,35.79332497755567 865 | 865,23.28565824487682 866 | 866,23.954715335335667 867 | 867,26.874277929749642 868 | 868,26.677216819356744 869 | 869,27.866035381209716 870 | 870,34.2012598292886 871 | 871,22.85110626767534 872 | 872,19.338781214400523 873 | 873,16.037051087921007 874 | 874,18.89146392134534 875 | 875,22.002162902410006 876 | 876,30.771614402134354 877 | 877,63.25971198130723 878 | 878,66.91809131028373 879 | 879,65.66922737546992 880 | 880,71.64629914740149 881 | 881,78.29277990096537 882 | 882,62.95834646570992 883 | 883,63.120846797086266 884 | 884,70.74497680118259 885 | 885,68.8905266554019 886 | 886,74.67226472304462 887 | 887,68.31678376562067 888 | 888,63.559846800147724 889 | 889,68.26948935668015 890 | 890,70.82037850311947 891 | 891,69.34064831465457 892 | 892,67.73670979718008 893 | 893,70.49653480912093 894 | 894,70.97317463013869 895 | 895,72.00173470589178 896 | 896,68.29943422417259 897 | 897,64.18593563697577 898 | 898,65.72142624099992 899 | 899,70.98741927735617 900 | 900,60.83789198560368 901 | 901,66.73430648029836 902 | 902,62.89328365479565 903 | 903,64.97340232728526 904 | 904,71.19229531285193 905 | 905,65.7265410378937 906 | 906,64.97488471111707 907 | 907,57.91571882001073 908 | 908,68.4192009424427 909 | 909,67.88891843198832 910 | 910,75.45196121974008 911 | 911,63.07160704697143 912 | 912,65.6816759407347 913 | 913,65.80887922594114 914 | 914,72.4258691388471 915 | 915,64.03936748268336 916 | 916,65.88879073484537 917 | 917,67.51159429781318 918 | 918,65.38417185508807 919 | 919,62.198553708023255 920 | 920,65.38134358193875 921 | 921,69.75979578135875 922 | 922,59.24037084891143 923 | 923,65.01966828955187 924 | 924,60.836646073549694 925 | 925,70.29546661418112 926 | 926,56.68783083150296 927 | 927,61.49323207921107 928 | 928,66.1585162718048 929 | 929,72.08522782727114 930 | 930,63.91453798314959 931 | 931,61.45349974021521 932 | 932,73.62201297432246 933 | 933,56.9634062725435 934 | 934,53.19720779387817 935 | 935,66.37958107992958 936 | 936,61.65801424081259 937 | 937,59.051398954095426 938 | 938,67.68911999556688 939 | 939,65.35660902085999 940 | 940,61.30796892963495 941 | 941,57.71808165745775 942 | 942,68.47361282644391 943 | 943,67.35453163885492 944 | 944,63.600319119246635 945 | 945,73.32252535976541 946 | 946,58.73252390052002 947 | 947,56.44939083177055 948 | 948,60.61200311441438 949 | 949,63.838594235143944 950 | 950,65.27865277547693 951 | 951,62.85165312395593 952 | 952,65.81443129638119 953 | 953,57.79321748332162 954 | 954,71.26693370837344 955 | 955,63.634983105864 956 | 956,69.63027515368998 957 | 957,65.7559049046397 958 | 958,66.3249481443009 959 | 959,66.88934057244037 960 | 960,66.27878391752225 961 | 961,67.25400329080458 962 | 962,70.68670857629505 963 | 963,65.02451292826777 964 | 964,67.5882884289596 965 | 965,63.59635185225135 966 | 966,71.24777194619911 967 | 967,60.66777153436847 968 | 968,73.05737080969247 969 | 
969,63.85519973770261 970 | 970,56.90581691800959 971 | 971,64.70429640510834 972 | 972,60.66297380740421 973 | 973,68.27628038959806 974 | 974,60.81514303223565 975 | 975,61.85291275452733 976 | 976,54.64205696380389 977 | 977,61.834520449179585 978 | 978,51.983304615392335 979 | 979,56.19019118186466 980 | 980,67.91913457381202 981 | 981,68.05376714235574 982 | 982,66.26010541900494 983 | 983,59.30633152652176 984 | 984,63.911390653992996 985 | 985,64.14100770457732 986 | 986,58.376604927350094 987 | 987,71.69516012118132 988 | 988,68.57504985024782 989 | 989,63.09381416173978 990 | 990,64.34378503245806 991 | 991,65.2623122033827 992 | 992,54.023085035449114 993 | 993,63.00754995640572 994 | 994,60.845506809083034 995 | 995,59.25963030058986 996 | 996,62.874839760878004 997 | 997,73.28169078321733 998 | 998,67.51069292222805 999 | 999,61.46410519251246 1000 | 1000,67.19673416803768 1001 | 1001,71.3447137141728 1002 | 1002,68.98551494181302 1003 | 1003,64.67519969458458 1004 | 1004,61.1573542113066 1005 | 1005,67.89843149477664 1006 | 1006,66.39030290698183 1007 | 1007,68.91468503293403 1008 | 1008,67.63070048639688 1009 | 1009,69.71900490344419 1010 | 1010,61.811740151706736 1011 | 1011,71.09183420285761 1012 | 1012,65.51008011297937 1013 | 1013,74.91587300254773 1014 | 1014,61.11138223749542 1015 | 1015,73.25515699243337 1016 | 1016,65.58317034208238 1017 | 1017,61.355064511399085 1018 | 1018,62.21152099502529 1019 | 1019,63.04826468325504 1020 | 1020,66.79013534750612 1021 | 1021,67.303038705634 1022 | 1022,61.840166097737374 1023 | 1023,64.60694517027159 1024 | 1024,75.46036707781032 1025 | 1025,73.40727142087347 1026 | 1026,66.97195534664453 1027 | 1027,65.00137498543688 1028 | 1028,65.43277203521433 1029 | 1029,67.92169834132132 1030 | 1030,59.761872070949586 1031 | 1031,63.610910062294074 1032 | 1032,56.57705977129407 1033 | 1033,66.93851080489337 1034 | 1034,68.20101573797079 1035 | 1035,62.571968523700555 1036 | 1036,72.88092156241217 1037 | 1037,58.90546663861811 1038 | 1038,57.73594625391303 1039 | 1039,66.20382215428045 1040 | 1040,70.34101191240717 1041 | 1041,73.54932936652985 1042 | 1042,62.8596515319001 1043 | 1043,70.57206864688062 1044 | 1044,65.01092174570599 1045 | 1045,64.36533348457792 1046 | 1046,69.67198015189126 1047 | 1047,68.54070825114164 1048 | 1048,57.42274772225589 1049 | 1049,72.71322451003621 1050 | 1050,72.25699175862604 1051 | 1051,62.25493005857515 1052 | 1052,67.38816455605992 1053 | 1053,67.90589429347283 1054 | 1054,66.76590023376959 1055 | 1055,62.737983679328494 1056 | 1056,62.158559098831105 1057 | 1057,65.4162589770662 1058 | 1058,71.43521319546103 1059 | 1059,68.31729149554124 1060 | 1060,63.774124036853365 1061 | 1061,69.513775083915 1062 | 1062,51.440833639169426 1063 | 1063,71.45581836578454 1064 | 1064,57.04204054531472 1065 | 1065,63.95719931678739 1066 | 1066,60.20002271204928 1067 | 1067,59.354105123920014 1068 | 1068,71.6609575035478 1069 | 1069,63.54779797336593 1070 | 1070,67.64847176478246 1071 | 1071,65.71116442601036 1072 | 1072,68.36095164526769 1073 | 1073,66.39000102522381 1074 | 1074,59.62123704040151 1075 | 1075,71.04800005025972 1076 | 1076,63.62851070965137 1077 | 1077,58.345170376644596 1078 | 1078,64.01844758366025 1079 | 1079,73.69392184690317 1080 | 1080,70.47248581940822 1081 | 1081,64.50952409540139 1082 | 1082,64.53805609192865 1083 | 1083,64.70788014526462 1084 | 1084,76.73165581320448 1085 | 1085,68.28960213140557 1086 | 1086,68.56228934646093 1087 | 1087,71.59580068089285 1088 | 1088,67.67077637049164 1089 | 
1089,65.21421068949192 1090 | 1090,65.56040551893751 1091 | 1091,66.21702314913625 1092 | 1092,66.42193312467322 1093 | 1093,70.27934591815925 1094 | 1094,66.93421941545556 1095 | 1095,70.37111892390658 1096 | 1096,104.34641709945708 1097 | 1097,105.18724951994511 1098 | 1098,94.83529260692764 1099 | 1099,109.42831102505146 1100 | 1100,106.5876545917534 1101 | 1101,109.84197754668257 1102 | 1102,90.35555590549241 1103 | 1103,115.25182242221324 1104 | 1104,104.07334994451305 1105 | 1105,110.26182071023548 1106 | 1106,99.45927463461477 1107 | 1107,107.64889376003289 1108 | 1108,99.23252538280832 1109 | 1109,101.28373648680878 1110 | 1110,113.86415884465195 1111 | 1111,103.22050054039437 1112 | 1112,105.12952746406458 1113 | 1113,108.24974618764534 1114 | 1114,106.22351246907893 1115 | 1115,104.33160464876089 1116 | 1116,105.22749771284501 1117 | 1117,115.23179589871731 1118 | 1118,102.72756247095904 1119 | 1119,103.81107718719434 1120 | 1120,107.83784351072956 1121 | 1121,107.88033515837003 1122 | 1122,108.04521889307728 1123 | 1123,105.0503949709162 1124 | 1124,95.87780029771311 1125 | 1125,109.48556104954102 1126 | 1126,95.301848608771 1127 | 1127,102.25161796175495 1128 | 1128,96.66172159395559 1129 | 1129,99.79696976122635 1130 | 1130,101.43604292506019 1131 | 1131,101.06370630755355 1132 | 1132,101.22144751150515 1133 | 1133,106.82567481860609 1134 | 1134,100.6605358656558 1135 | 1135,96.79842097062928 1136 | 1136,102.45669922140837 1137 | 1137,103.1915227910542 1138 | 1138,90.70501697249779 1139 | 1139,99.42005450937356 1140 | 1140,99.52016470229712 1141 | 1141,94.09911792491847 1142 | 1142,102.02963303332439 1143 | 1143,93.95665759455623 1144 | 1144,94.88247382922528 1145 | 1145,92.92885679715212 1146 | 1146,93.40385507305584 1147 | 1147,87.36860144758509 1148 | 1148,85.16616850643837 1149 | 1149,99.84798136687385 1150 | 1150,87.14242983464501 1151 | 1151,85.9149651978532 1152 | 1152,79.59274113965293 1153 | 1153,86.64588872783426 1154 | 1154,82.69966385920176 1155 | 1155,96.99318414637675 1156 | 1156,93.29394193185013 1157 | 1157,84.25977983039863 1158 | 1158,98.41248869717028 1159 | 1159,91.50319612127807 1160 | 1160,84.40873123974906 1161 | 1161,72.9690207038747 1162 | 1162,96.34874769479731 1163 | 1163,77.36916680058191 1164 | 1164,75.50822424644707 1165 | 1165,88.26164075166697 1166 | 1166,94.75669158502139 1167 | 1167,88.88419485926683 1168 | 1168,84.183228988741 1169 | 1169,83.73445148332432 1170 | 1170,84.42109121923548 1171 | 1171,83.33852187283259 1172 | 1172,80.33642900263888 1173 | 1173,78.83499440359681 1174 | 1174,77.38166178083394 1175 | 1175,73.30276289418856 1176 | 1176,75.04468789449933 1177 | 1177,67.3564755458802 1178 | 1178,74.70067727015162 1179 | 1179,69.61865627292714 1180 | 1180,68.41122396939272 1181 | 1181,74.19472462233392 1182 | 1182,79.62007583296466 1183 | 1183,76.61538822593467 1184 | 1184,63.534176585908256 1185 | 1185,65.8007505528353 1186 | 1186,74.81449983738997 1187 | 1187,64.55970456415824 1188 | 1188,67.71042240701452 1189 | 1189,70.94544830124644 1190 | 1190,62.71542219278456 1191 | 1191,67.44653955275716 1192 | 1192,59.61281671219209 1193 | 1193,62.640356230929065 1194 | 1194,66.61152568689268 1195 | 1195,56.414443070973824 1196 | 1196,65.9501577560159 1197 | 1197,63.019745985281844 1198 | 1198,65.25251570577853 1199 | 1199,62.9839484650979 1200 | 1200,68.06177293400795 1201 | 1201,61.24646693042315 1202 | 1202,57.85536973898098 1203 | 1203,59.99891530867445 1204 | 1204,61.492122653941806 1205 | 1205,58.411508046201824 1206 | 
1206,57.76530691245517 1207 | 1207,53.45129040423877 1208 | 1208,53.04959720219612 1209 | 1209,48.75807270111077 1210 | 1210,63.93031490526834 1211 | 1211,48.37897125174135 1212 | 1212,50.554329699704304 1213 | 1213,49.85228615565467 1214 | 1214,57.34949476053768 1215 | 1215,50.835942030455705 1216 | 1216,57.99066748068667 1217 | 1217,47.8314002216183 1218 | 1218,52.046528622258194 1219 | 1219,45.93351224748061 1220 | 1220,60.3008433076278 1221 | 1221,43.10828957346368 1222 | 1222,50.07307458013841 1223 | 1223,51.1772197425914 1224 | 1224,49.56853769271437 1225 | 1225,46.071318919657514 1226 | 1226,45.85404037083275 1227 | 1227,46.23492091240162 1228 | 1228,44.801065462675545 1229 | 1229,49.883332196052606 1230 | 1230,55.65221670525551 1231 | 1231,41.35576420997642 1232 | 1232,36.84593607822974 1233 | 1233,42.813522627665336 1234 | 1234,55.721860706117155 1235 | 1235,38.38186429560821 1236 | 1236,50.181426432365924 1237 | 1237,49.968771507475005 1238 | 1238,38.41570601945366 1239 | 1239,43.64102951698132 1240 | 1240,48.55978088242168 1241 | 1241,38.15318637960798 1242 | 1242,83.07986667597628 1243 | 1243,81.140746982346 1244 | 1244,78.92755820236934 1245 | 1245,80.70794505483103 1246 | 1246,77.78841643778598 1247 | 1247,84.03310767160765 1248 | 1248,79.96817785438415 1249 | 1249,84.94310197479096 1250 | 1250,75.9515909762522 1251 | 1251,85.34312925543674 1252 | 1252,87.81478764703547 1253 | 1253,73.63762904122572 1254 | 1254,85.43649805245647 1255 | 1255,89.66520556257139 1256 | 1256,77.44003231521172 1257 | 1257,91.81320157583212 1258 | 1258,85.50825640800538 1259 | 1259,79.85163739911827 1260 | 1260,83.0620943624156 1261 | 1261,86.09751934883298 1262 | 1262,83.65155299562211 1263 | 1263,85.93163777106513 1264 | 1264,78.60150647941755 1265 | 1265,84.01937446072026 1266 | 1266,76.07236745140895 1267 | 1267,82.55384740968897 1268 | 1268,78.56014000528721 1269 | 1269,79.3723862868666 1270 | 1270,89.20151157384636 1271 | 1271,88.44925257330767 1272 | 1272,86.01069012944623 1273 | 1273,77.4763692889437 1274 | 1274,81.32945527780893 1275 | 1275,88.95117610310871 1276 | 1276,83.56908278993596 1277 | 1277,94.4325720023235 1278 | 1278,84.77855000435922 1279 | 1279,83.74884832247989 1280 | 1280,81.22026282360562 1281 | 1281,83.4113876299051 1282 | 1282,81.95976256732774 1283 | 1283,86.24166962274924 1284 | 1284,87.46423336474882 1285 | 1285,78.73094631490747 1286 | 1286,75.985472264271 1287 | 1287,73.44192048045024 1288 | 1288,85.14477655338021 1289 | 1289,77.0704171173762 1290 | 1290,71.8057805047945 1291 | 1291,84.49665837807665 1292 | 1292,95.00283890273295 1293 | 1293,82.49210233878274 1294 | 1294,86.70006155335766 1295 | 1295,82.90243608125236 1296 | 1296,81.98504910772549 1297 | 1297,87.06161004205835 1298 | 1298,81.00210305368148 1299 | 1299,83.7781991473381 1300 | 1300,84.03799786064299 1301 | 1301,79.0778330943604 1302 | 1302,87.36777478743161 1303 | 1303,81.52253897559306 1304 | 1304,78.60887965453978 1305 | 1305,85.0609113011706 1306 | 1306,77.87726374185704 1307 | 1307,82.5022319718888 1308 | 1308,82.3079307658406 1309 | 1309,87.77567626730144 1310 | 1310,84.71283154392684 1311 | 1311,82.20781293842506 1312 | 1312,86.41584818239566 1313 | 1313,89.27258120890717 1314 | 1314,85.10557584868074 1315 | 1315,82.36371862808036 1316 | 1316,75.74865225148677 1317 | 1317,76.97858599843283 1318 | 1318,80.77479108250051 1319 | 1319,79.25054967721073 1320 | 1320,81.36088248289961 1321 | 1321,82.57704562023487 1322 | 1322,84.94074194453482 1323 | 1323,81.93868815823652 1324 | 1324,84.72301239041036 
1325 | 1325,82.61261319148628 1326 | 1326,72.4130566596975 1327 | 1327,77.59426895653911 1328 | 1328,81.57146930115235 1329 | 1329,76.24479592681149 1330 | 1330,85.29467397366851 1331 | 1331,89.95094003067022 1332 | 1332,88.39354230315139 1333 | 1333,81.23545429048735 1334 | 1334,89.75962017517409 1335 | 1335,83.05302971461039 1336 | 1336,80.6283444749642 1337 | 1337,79.2512187770739 1338 | 1338,80.81071421221381 1339 | 1339,80.38737883604146 1340 | 1340,83.18591103247867 1341 | 1341,83.14263052378087 1342 | 1342,82.36128214619202 1343 | 1343,84.53739209726888 1344 | 1344,88.31293667322498 1345 | 1345,87.11483197029715 1346 | 1346,74.95027650685206 1347 | 1347,69.61321193557033 1348 | 1348,87.06281179836421 1349 | 1349,75.56555394262578 1350 | 1350,81.28519339056675 1351 | 1351,76.567863243016 1352 | 1352,73.41826642444546 1353 | 1353,85.1455520383934 1354 | 1354,86.24441083240133 1355 | 1355,79.57680383385181 1356 | 1356,69.5151867029204 1357 | 1357,79.75053713630814 1358 | 1358,84.45245689602145 1359 | 1359,75.11087020520138 1360 | 1360,83.43452994896022 1361 | 1361,82.4537919794428 1362 | 1362,85.43971564697515 1363 | 1363,83.15438045796378 1364 | 1364,77.70465543701867 1365 | 1365,88.56667776957934 1366 | 1366,81.80528866610794 1367 | 1367,82.47582382969951 1368 | 1368,77.96070804961774 1369 | 1369,80.42562684387691 1370 | 1370,78.23330135657419 1371 | 1371,81.80814568025596 1372 | 1372,91.24726133350708 1373 | 1373,75.84533576333025 1374 | 1374,74.6534343374638 1375 | 1375,90.09376784697761 1376 | 1376,81.70824548892665 1377 | 1377,79.42671839621777 1378 | 1378,87.98909207640179 1379 | 1379,79.77924362925893 1380 | 1380,91.9553865443418 1381 | 1381,86.23324578146548 1382 | 1382,80.42259748874281 1383 | 1383,93.66749123162442 1384 | 1384,79.87142660920995 1385 | 1385,86.62978247554308 1386 | 1386,84.43532416748896 1387 | 1387,89.46737256442424 1388 | 1388,90.78627411521077 1389 | 1389,83.15903603007983 1390 | 1390,79.252466218639 1391 | 1391,85.34019771296995 1392 | 1392,79.67278074583534 1393 | 1393,93.14955163925615 1394 | 1394,83.78678513308292 1395 | 1395,81.29922714038106 1396 | 1396,84.07119427873533 1397 | 1397,76.43434869303353 1398 | 1398,78.33432499931374 1399 | 1399,89.21710557917922 1400 | 1400,79.95342944034044 1401 | 1401,78.0264328325523 1402 | 1402,85.96755871626624 1403 | 1403,89.23634142827711 1404 | 1404,86.92632961480591 1405 | 1405,88.33575882837548 1406 | 1406,79.5957154663932 1407 | 1407,76.29501951253559 1408 | 1408,90.93533273950158 1409 | 1409,81.84006489899885 1410 | 1410,82.22430490200678 1411 | 1411,90.1446511737032 1412 | 1412,86.31024390607782 1413 | 1413,85.83414736758517 1414 | 1414,94.40553448591216 1415 | 1415,80.38894383598807 1416 | 1416,88.27194420652899 1417 | 1417,83.94421251335828 1418 | 1418,85.02870475637542 1419 | 1419,91.35487353671272 1420 | 1420,86.27868823000371 1421 | 1421,86.45815633376952 1422 | 1422,89.15656601857243 1423 | 1423,81.99680445144979 1424 | 1424,79.65324238587405 1425 | 1425,78.65314972717873 1426 | 1426,74.07349858379325 1427 | 1427,94.21458239202141 1428 | 1428,78.44594803531561 1429 | 1429,82.88434132127635 1430 | 1430,82.63533752435845 1431 | 1431,85.5854056742255 1432 | 1432,88.15619856522102 1433 | 1433,88.40919100845393 1434 | 1434,81.2209167870807 1435 | 1435,83.33034378719682 1436 | 1436,85.60866707284778 1437 | 1437,82.98229187823434 1438 | 1438,92.2926194500225 1439 | 1439,86.74107291953561 1440 | 1440,87.99105488327311 1441 | 1441,87.661699968795 1442 | 1442,90.24544915736998 1443 | 1443,85.31413118200835 1444 
| 1444,77.95596315395238 1445 | 1445,86.46981686760397 1446 | 1446,81.24600029848155 1447 | 1447,81.89126120477202 1448 | 1448,87.49787549512702 1449 | 1449,90.78933125256846 1450 | 1450,84.73628275392811 1451 | 1451,86.2062461735004 1452 | 1452,93.20545813390981 1453 | 1453,85.93059647280192 1454 | 1454,97.76783698949741 1455 | 1455,87.61970299252013 1456 | 1456,76.02121565042748 1457 | 1457,88.62475241274508 1458 | 1458,86.79699012438826 1459 | 1459,91.3732537568328 1460 | 1460,88.32382529417848 1461 | 1461,122.30772008008113 1462 | -------------------------------------------------------------------------------- /time_series/cnn-demo.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.pyplot as plt 3 | import tensorflow as tf 4 | 5 | DEBUG = False 6 | keras = tf.keras 7 | 8 | 9 | # Plot a simple graph 10 | def plot_series(time, series, format="-", start=0, end=None, label=None): 11 | plt.plot(time[start:end], series[start:end], format, label=label) 12 | plt.xlabel("Time") 13 | plt.ylabel("Value") 14 | if label: 15 | plt.legend(fontsize=14) 16 | plt.grid(True) 17 | 18 | 19 | # Generate a trend para. This will affect every element in an numpy array 20 | def trend(time, slope=0): 21 | return slope * time 22 | 23 | 24 | # Generate a seasonal pattern 25 | def seasonal_pattern(season_time): 26 | """Just an arbitrary pattern, you can change it if you wish""" 27 | return np.where(season_time < 0.4, 28 | np.cos(season_time * 2 * np.pi), 29 | 1 / np.exp(3 * season_time)) 30 | 31 | 32 | # Create a series following the seasonal pattern 33 | def seasonality(time, period, amplitude=1, phase=0): 34 | """Repeats the same pattern at each period""" 35 | season_time = ((time + phase) % period) / period 36 | return amplitude * seasonal_pattern(season_time) 37 | 38 | 39 | # Generate some white noise 40 | def white_noise(time, noise_level=1, seed=None): 41 | rnd = np.random.RandomState(seed) 42 | return rnd.randn(len(time)) * noise_level 43 | 44 | 45 | # Create the dataset 46 | def seq2seq_window_dataset(series, window_size, batch_size=32, 47 | shuffle_buffer=1000): 48 | series = tf.expand_dims(series, axis=-1) 49 | ds = tf.data.Dataset.from_tensor_slices(series) 50 | ds = ds.window(window_size + 1, shift=1, drop_remainder=True) 51 | ds = ds.flat_map(lambda w: w.batch(window_size + 1)) 52 | ds = ds.shuffle(shuffle_buffer) 53 | ds = ds.map(lambda w: (w[:-1], w[1:])) 54 | return ds.batch(batch_size).prefetch(1) 55 | 56 | 57 | # Use the model to perform a prediction 58 | def model_forecast(model, series, window_size): 59 | ds = tf.data.Dataset.from_tensor_slices(series) 60 | ds = ds.window(window_size, shift=1, drop_remainder=True) 61 | ds = ds.flat_map(lambda w: w.batch(window_size)) 62 | ds = ds.batch(32).prefetch(1) 63 | forecast = model.predict(ds) 64 | return forecast 65 | 66 | 67 | # Generate a time series of 4 span_years + 1 day 68 | time = np.arange(4 * 365 + 1) 69 | 70 | slope = 0.05 71 | baseline = 10 72 | amplitude = 40 73 | 74 | # Generate the test data, adding together the baseline, trend and seasonality 75 | series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude) 76 | 77 | # Now add some random noise 78 | noise_level = 5 79 | noise = white_noise(time, noise_level, seed=42) 80 | series += noise 81 | 82 | # Display the test data 83 | if DEBUG: 84 | plt.figure(figsize=(10, 6)) 85 | plot_series(time, series, label="Test data") 86 | plt.show() 87 | 88 | # Split the data into training and 
validation data and plot both 89 | split_time = 1000 90 | time_train = time[:split_time] 91 | x_train = series[:split_time] 92 | 93 | if DEBUG: 94 | plt.figure(figsize=(10, 6)) 95 | plot_series(time_train, x_train, label="Training data") 96 | plt.show() 97 | 98 | time_valid = time[split_time:] 99 | x_valid = series[split_time:] 100 | 101 | if DEBUG: 102 | plt.figure(figsize=(10, 6)) 103 | plot_series(time_valid, x_valid, label="Validation data") 104 | plt.show() 105 | 106 | 107 | # Setup the keras session 108 | keras.backend.clear_session() 109 | tf.random.set_seed(42) 110 | np.random.seed(42) 111 | 112 | # Create the training and validation data sets 113 | window_size = 64 114 | train_set = seq2seq_window_dataset(x_train, window_size, 115 | batch_size=128) 116 | valid_set = seq2seq_window_dataset(x_valid, window_size, 117 | batch_size=128) 118 | 119 | # Create a sequential model 120 | model = keras.models.Sequential() 121 | 122 | # We're using the WaveNet architecture, so... 123 | # Input layer 124 | model.add(keras.layers.InputLayer(input_shape=[None, 1])) 125 | 126 | # Add multiple 1D convolutional layers with increasing dilation rates to 127 | # allow each layer to detect patterns over longer time frequencies 128 | for dilation_rate in (1, 2, 4, 8, 16, 32): 129 | model.add( 130 | keras.layers.Conv1D(filters=32, 131 | kernel_size=2, 132 | strides=1, 133 | dilation_rate=dilation_rate, 134 | padding="causal", 135 | activation="relu") 136 | ) 137 | 138 | # Add one output layer, with 1 filter to give us one output per time step 139 | model.add(keras.layers.Conv1D(filters=1, kernel_size=1)) 140 | 141 | # Setup the optimiser, with the learning rate cribbed from the tutorial for now 142 | optimizer = keras.optimizers.Adam(lr=3e-4) 143 | 144 | # Compile the model 145 | model.compile(loss=keras.losses.Huber(), 146 | optimizer=optimizer, 147 | metrics=["mae"]) 148 | 149 | # Save checkpoints when we get the best model 150 | model_checkpoint = keras.callbacks.ModelCheckpoint( 151 | "checkpoint.h5", save_best_only=True) 152 | 153 | # Use early stopping to prevent over fitting 154 | epochs = 500 155 | if DEBUG: 156 | epochs = 10 157 | 158 | early_stopping = keras.callbacks.EarlyStopping(patience=50) 159 | history = model.fit(train_set, epochs=epochs, 160 | validation_data=valid_set, 161 | callbacks=[early_stopping, model_checkpoint]) 162 | 163 | 164 | # Training is done, so load the best model from the last checkpoint 165 | model = keras.models.load_model("checkpoint.h5") 166 | 167 | 168 | cnn_forecast = model_forecast(model, series[..., np.newaxis], window_size) 169 | cnn_forecast = cnn_forecast[split_time - window_size:-1, -1, 0] 170 | 171 | plt.figure(figsize=(10, 6)) 172 | plot_series(time, np.concatenate([series[:1000], np.full(461, None, dtype=float)]), label="Training Data") 173 | plot_series(time, np.concatenate([np.full(1000, None, dtype=float), series[1000:]]), label="Validation Data") 174 | plot_series(time, np.concatenate([np.full(1000, None, dtype=float), cnn_forecast]), label="Forecast Data") 175 | plt.show() 176 | 177 | 178 | mae = keras.metrics.mean_absolute_error(x_valid, cnn_forecast).numpy() 179 | 180 | print("MAE: {}".format(mae)) -------------------------------------------------------------------------------- /time_series/cnn-workload.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | import tensorflow as tf 5 | 6 | DEBUG = True 7 | keras = tf.keras 8 
| 9 | 10 | # Plot a simple graph 11 | def plot_series(time, series, format="-", start=0, end=None, label=None): 12 | plt.plot(time[start:end], series[start:end], format, label=label) 13 | plt.xlabel("Time") 14 | plt.ylabel("Value") 15 | if label: 16 | plt.legend(fontsize=14) 17 | plt.grid(True) 18 | 19 | 20 | # Create the dataset 21 | def seq2seq_window_dataset(series, window_size, batch_size=32, 22 | shuffle_buffer=1000): 23 | series = tf.expand_dims(series, axis=-1) 24 | ds = tf.data.Dataset.from_tensor_slices(series) 25 | ds = ds.window(window_size + 1, shift=1, drop_remainder=True) 26 | ds = ds.flat_map(lambda w: w.batch(window_size + 1)) 27 | ds = ds.shuffle(shuffle_buffer) 28 | ds = ds.map(lambda w: (w[:-1], w[1:])) 29 | return ds.batch(batch_size).prefetch(1) 30 | 31 | 32 | # Use the model to perform a prediction 33 | def model_forecast(model, series, window_size): 34 | ds = tf.data.Dataset.from_tensor_slices(series) 35 | ds = ds.window(window_size, shift=1, drop_remainder=True) 36 | ds = ds.flat_map(lambda w: w.batch(window_size)) 37 | ds = ds.batch(32).prefetch(1) 38 | forecast = model.predict(ds) 39 | return forecast 40 | 41 | 42 | # Load the data from activity.csv 43 | 44 | # Create an array of timestamps, and an array of data 45 | dates_in = [] 46 | series_in = [] 47 | 48 | csv = open("activity.csv", "r") 49 | pm = False 50 | 51 | for line in csv: 52 | values = line.strip().split(',') 53 | dates_in.append(np.datetime64(datetime.strptime(values[0], '%Y-%m-%d %H:%M:%S'))) 54 | series_in.append(int(values[1])) 55 | 56 | csv.close() 57 | 58 | samples = len(dates_in) 59 | dates = np.array(dates_in) 60 | series = np.array(series_in) 61 | 62 | # Split the data into training and validation data and plot both 63 | train_samples = int(samples * 0.75) 64 | valid_samples = samples - train_samples 65 | dates_train = dates[:train_samples] 66 | x_train = series[:train_samples] 67 | 68 | if DEBUG: 69 | plt.figure(figsize=(10, 6)) 70 | plot_series(dates_train, x_train, label="Training data") 71 | plt.show() 72 | 73 | dates_valid = dates[train_samples:] 74 | x_valid = series[train_samples:] 75 | 76 | if DEBUG: 77 | plt.figure(figsize=(10, 6)) 78 | plot_series(dates_valid, x_valid, label="Validation data") 79 | plt.show() 80 | 81 | 82 | # Setup the keras session 83 | keras.backend.clear_session() 84 | tf.random.set_seed(42) 85 | np.random.seed(42) 86 | 87 | # Create the training and validation data sets 88 | window_size = 64 89 | train_set = seq2seq_window_dataset(x_train, window_size, batch_size=128) 90 | valid_set = seq2seq_window_dataset(x_valid, window_size, batch_size=128) 91 | 92 | # Create a sequential model 93 | model = keras.models.Sequential() 94 | 95 | # We're using the WaveNet architecture, so... 
96 | # Input layer 97 | model.add(keras.layers.InputLayer(input_shape=[None, 1])) 98 | 99 | # Add multiple 1D convolutional layers with increasing dilation rates to 100 | # allow each layer to detect patterns over longer time frequencies 101 | for dilation_rate in (1, 2, 4, 8, 16, 32): 102 | model.add( 103 | keras.layers.Conv1D(filters=32, 104 | kernel_size=2, 105 | strides=1, 106 | dilation_rate=dilation_rate, 107 | padding="causal", 108 | activation="relu") 109 | ) 110 | 111 | # Add one output layer, with 1 filter to give us one output per time step 112 | model.add(keras.layers.Conv1D(filters=1, kernel_size=1)) 113 | 114 | # Setup the optimiser, with the learning rate cribbed from the tutorial for now 115 | optimizer = keras.optimizers.Adam(lr=3e-4) 116 | 117 | # Compile the model 118 | model.compile(loss=keras.losses.Huber(), 119 | optimizer=optimizer, 120 | metrics=["mae"]) 121 | if DEBUG: 122 | print(model.summary()) 123 | 124 | # Save checkpoints when we get the best model 125 | model_checkpoint = keras.callbacks.ModelCheckpoint( 126 | "checkpoint.h5", save_best_only=True) 127 | 128 | # Use early stopping to prevent overfitting 129 | epochs = 500 130 | if DEBUG: 131 | epochs = 10 132 | 133 | early_stopping = keras.callbacks.EarlyStopping(patience=50) 134 | history = model.fit(train_set, epochs=epochs, 135 | validation_data=valid_set, 136 | callbacks=[early_stopping, model_checkpoint]) 137 | 138 | 139 | # Training is done, so load the best model from the last checkpoint 140 | model = keras.models.load_model("checkpoint.h5") 141 | 142 | 143 | cnn_forecast = model_forecast(model, series[..., np.newaxis], window_size) 144 | cnn_forecast = cnn_forecast[train_samples - window_size:-1, -1, 0] 145 | 146 | plt.figure(figsize=(10, 6)) 147 | plot_series(dates, np.concatenate([series[:train_samples], np.full(valid_samples, None, dtype=float)]), label="Training Data") 148 | plot_series(dates, np.concatenate([np.full(train_samples, None, dtype=float), series[train_samples:]]), label="Validation Data") 149 | plot_series(dates, np.concatenate([np.full(train_samples, None, dtype=float), cnn_forecast]), label="Forecast Data") 150 | plt.show() 151 | 152 | 153 | mae = keras.metrics.mean_absolute_error(x_valid, cnn_forecast).numpy() 154 | 155 | print("MAE: {}".format(mae)) -------------------------------------------------------------------------------- /time_series/prophet-tsp.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import matplotlib.pyplot as plt 3 | import dateutil.parser 4 | from prophet import Prophet 5 | from prophet.serialize import model_to_json, model_from_json 6 | 7 | DATAFILE = 'data.csv' 8 | 9 | data = pd.read_csv(DATAFILE, 10 | parse_dates=[0], 11 | date_parser=dateutil.parser.isoparse, 12 | header=0, 13 | names=['ds', 'y']) 14 | 15 | try: 16 | # Try to load an existing model first 17 | with open(DATAFILE + '.json', 'r') as f: 18 | m = model_from_json(f.read()) 19 | except Exception: 20 | m = Prophet() 21 | m.fit(data) 22 | 23 | with open(DATAFILE + '.json', 'w') as f: 24 | f.write(model_to_json(m)) 25 | 26 | # Forecast 27 | n_periods = 288 28 | future = m.make_future_dataframe(periods=n_periods, 29 | freq='5T', 30 | include_history=False) 31 | 32 | forecast = m.predict(future) 33 | 34 | # Plot 35 | fig, axes = plt.subplots(figsize=(10, 5), dpi=100, tight_layout=True) 36 | plt.plot(data['ds'][100000:], data['y'][100000:], label='Historic Data') 37 | plt.plot(forecast['ds'], forecast['yhat'], label='Forecast Data') 38 | 
plt.fill_between(forecast['ds'], 39 | forecast['yhat_lower'], 40 | forecast['yhat_upper'], 41 | color='k', alpha=.25, label='Confidence') 42 | axes.legend(loc='upper left', fontsize=10) 43 | plt.show() 44 | # plt.savefig('prophet.png') 45 | -------------------------------------------------------------------------------- /time_series/prophet.sql: -------------------------------------------------------------------------------- 1 | -- This SQL script will create a set of objects for creating, updating Prophet 2 | -- models in PostgreSQL, along with an additional function for running 3 | -- predictions 4 | 5 | CREATE OR REPLACE FUNCTION public.activate_python_venv( 6 | venv text) 7 | RETURNS void 8 | LANGUAGE 'plpython3u' 9 | COST 100 10 | VOLATILE PARALLEL UNSAFE 11 | AS $BODY$ 12 | import os 13 | import sys 14 | 15 | if sys.platform in ('win32', 'win64', 'cygwin'): 16 | activate_this = os.path.join(venv, 'Scripts', 'activate_this.py') 17 | else: 18 | activate_this = os.path.join(venv, 'bin', 'activate_this.py') 19 | 20 | exec(open(activate_this).read(), dict(__file__=activate_this)) 21 | $BODY$; 22 | 23 | ALTER FUNCTION public.activate_python_venv(text) 24 | OWNER TO postgres; 25 | 26 | COMMENT ON FUNCTION public.activate_python_venv(text) 27 | IS 'Activate a Python virtual environment in this database session. 28 | 29 | Arguments: 30 | venv: The path to the virtual environment. 31 | Returns: 32 | void'; 33 | 34 | 35 | CREATE OR REPLACE FUNCTION public.create_model( 36 | relation text, 37 | ts_column name, 38 | y_column name, 39 | model_name text, 40 | overwrite boolean DEFAULT false) 41 | RETURNS text 42 | LANGUAGE 'plpython3u' 43 | COST 100 44 | VOLATILE PARALLEL UNSAFE 45 | AS $BODY$ 46 | import json 47 | import os 48 | import pathlib 49 | import sys 50 | import pandas as pd 51 | from prophet import Prophet 52 | from prophet.serialize import model_to_json 53 | 54 | # Make sure we do not try to write outside of our directory 55 | try: 56 | pathlib.Path( 57 | os.path.abspath( 58 | os.path.join('models', model_name + '.json') 59 | ) 60 | ).relative_to(os.path.abspath('models')) 61 | except ValueError: 62 | plpy.error('Invalid model name: {}'.format(model_name)) 63 | 64 | # Check for an existing model 65 | model_file = os.path.abspath(os.path.join('models', model_name + '.json')) 66 | 67 | if not overwrite: 68 | if os.path.exists(model_file): 69 | plpy.error('Model {} already exists. Set the overwrite parameter to true to replace.'.format(model_file)) 70 | 71 | # Create the data set 72 | rows = plpy.execute('SELECT {}::timestamp AS ds, {} AS y FROM {} ORDER BY {} ASC'.format(ts_column, y_column, relation, ts_column)) 73 | 74 | # Check we have enough rows 75 | if len(rows) < 5: 76 | plpy.error('At least 5 data rows must be available for analysis. 
{} rows retrieved.'.format(len(rows))) 77 | 78 | # Create the dataframe 79 | columns = list(rows[0].keys()) 80 | data = pd.DataFrame.from_records(rows, columns = columns) 81 | 82 | # Create the model 83 | m = Prophet() 84 | m.fit(data) 85 | 86 | # Save the model 87 | if not os.path.exists('models'): 88 | os.makedirs('models') 89 | 90 | json_model = json.loads(model_to_json(m)) 91 | full_model = {'relation': relation, 92 | 'ts_column': ts_column, 93 | 'y_column': y_column, 94 | 'model': json_model} 95 | 96 | with open(model_file, 'w') as f: 97 | f.write(json.dumps(full_model)) 98 | 99 | return model_file 100 | $BODY$; 101 | 102 | ALTER FUNCTION public.create_model(text, name, name, text, boolean) 103 | OWNER TO postgres; 104 | 105 | COMMENT ON FUNCTION public.create_model(text, name, name, text, boolean) 106 | IS 'Create a Prophet model for making predictions. 107 | 108 | Arguments: 109 | relation: The name of the table from which observations should be loaded. 110 | ts_column: The name of the column containing the observation timestamp. 111 | y_column: The name of the column containing the observed value. 112 | model_name: The name for the model. 113 | overwrite: Overwrite an existing model of the same name if present (default: false) 114 | Returns: 115 | text: The full path of the model file.'; 116 | 117 | 118 | CREATE OR REPLACE FUNCTION public.delete_model( 119 | model_name text) 120 | RETURNS void 121 | LANGUAGE 'plpython3u' 122 | COST 100 123 | VOLATILE PARALLEL UNSAFE 124 | AS $BODY$ 125 | import os 126 | import pathlib 127 | import sys 128 | 129 | # Make sure we do not try to write outside of our directory 130 | try: 131 | pathlib.Path( 132 | os.path.abspath( 133 | os.path.join('models', model_name + '.json') 134 | ) 135 | ).relative_to(os.path.abspath('models')) 136 | except ValueError: 137 | plpy.error('Invalid model name: {}'.format(model_name)) 138 | 139 | # Check for an existing model 140 | model_file = os.path.abspath(os.path.join('models', model_name + '.json')) 141 | 142 | if not os.path.exists(model_file): 143 | plpy.error('Model {} does not exist.'.format(model_file)) 144 | 145 | os.remove(model_file) 146 | 147 | return 148 | $BODY$; 149 | 150 | ALTER FUNCTION public.delete_model(text) 151 | OWNER TO postgres; 152 | 153 | COMMENT ON FUNCTION public.delete_model(text) 154 | IS 'Delete an existing model. 155 | 156 | Arguments: 157 | model_name: The name of the model to delete. 
158 | Returns: 159 | void'; 160 | 161 | 162 | CREATE OR REPLACE FUNCTION public.predict( 163 | model_name text, 164 | periods integer, 165 | frequency text, 166 | include_history boolean DEFAULT false, 167 | OUT ts timestamp without time zone, 168 | OUT y numeric, 169 | OUT y_lower numeric, 170 | OUT y_upper numeric) 171 | RETURNS SETOF record 172 | LANGUAGE 'plpython3u' 173 | COST 100 174 | VOLATILE PARALLEL UNSAFE 175 | ROWS 1000 176 | 177 | AS $BODY$ 178 | import json 179 | import os 180 | import pathlib 181 | import sys 182 | import pandas as pd 183 | from prophet import Prophet 184 | from prophet.serialize import model_from_json 185 | 186 | # Make sure we do not try to write outside of our directory 187 | try: 188 | pathlib.Path( 189 | os.path.abspath( 190 | os.path.join('models', model_name + '.json') 191 | ) 192 | ).relative_to(os.path.abspath('models')) 193 | except ValueError: 194 | plpy.error('Invalid model name: {}'.format(model_name)) 195 | 196 | # Check for an existing model 197 | model_file = os.path.abspath(os.path.join('models', model_name + '.json')) 198 | 199 | if not os.path.exists(model_file): 200 | plpy.error('Model {} does not exist.'.format(model_file)) 201 | 202 | with open(model_file, 'r') as f: 203 | json_model = json.load(f) 204 | 205 | m = model_from_json(json.dumps(json_model['model'])) 206 | 207 | # Forecast 208 | future = m.make_future_dataframe(periods=periods, 209 | freq=frequency, 210 | include_history=include_history) 211 | 212 | forecast = m.predict(future) 213 | 214 | # Convert to the output 215 | output = [] 216 | for d in range(0, len(forecast)): 217 | output.append((forecast['ds'][d], forecast['yhat'][d], forecast['yhat_lower'][d], forecast['yhat_upper'][d])) 218 | 219 | return output 220 | $BODY$; 221 | 222 | ALTER FUNCTION public.predict(text, integer, text, boolean) 223 | OWNER TO postgres; 224 | 225 | COMMENT ON FUNCTION public.predict(text, integer, text, boolean) 226 | IS 'Make a prediction using a previously created model. 227 | 228 | Arguments: 229 | model_name: The name of the model to use. 230 | periods: The number of periods to predict. 231 | frequency: The period length, expressed as a Pandas frequency string, e.g. "5T" for 5 minutes. 232 | include_history: Include historic predictions for existing data in the output (default: false). 233 | Returns set of records: 234 | ts: The timestamp of the prediction. 235 | y: The predicted value. 236 | y_lower: The lower confidence bound of the predicted value. 237 | y_upper: The upper confidence bound of the predicted value. 
238 | '; 239 | 240 | 241 | CREATE OR REPLACE FUNCTION public.update_model( 242 | model_name text, 243 | warm_start boolean DEFAULT true) 244 | RETURNS text 245 | LANGUAGE 'plpython3u' 246 | COST 100 247 | VOLATILE PARALLEL UNSAFE 248 | AS $BODY$ 249 | import json 250 | import os 251 | import pathlib 252 | import sys 253 | import pandas as pd 254 | from prophet import Prophet 255 | from prophet.serialize import model_from_json, model_to_json 256 | 257 | def init_stan(m): 258 | res = {} 259 | for pname in ['k', 'm', 'sigma_obs']: 260 | res[pname] = m.params[pname][0][0] 261 | for pname in ['delta', 'beta']: 262 | res[pname] = m.params[pname][0] 263 | return res 264 | 265 | 266 | # Make sure we do not try to write outside of our directory 267 | try: 268 | pathlib.Path( 269 | os.path.abspath( 270 | os.path.join('models', model_name + '.json') 271 | ) 272 | ).relative_to(os.path.abspath('models')) 273 | except ValueError: 274 | plpy.error('Invalid model name: {}'.format(model_name)) 275 | 276 | # Check for an existing model 277 | model_file = os.path.abspath(os.path.join('models', model_name + '.json')) 278 | 279 | if not os.path.exists(model_file): 280 | plpy.error('Model {} does not exist.'.format(model_file)) 281 | 282 | with open(model_file, 'r') as f: 283 | json_model = json.load(f) 284 | 285 | # Get the meta data 286 | relation = json_model['relation'] 287 | ts_column = json_model['ts_column'] 288 | y_column = json_model['y_column'] 289 | 290 | # Create the data set 291 | rows = plpy.execute('SELECT {}::timestamp AS ds, {} AS y FROM {} ORDER BY {} ASC'.format(ts_column, y_column, relation, ts_column)) 292 | 293 | # Check we have enough rows 294 | if len(rows) < 5: 295 | plpy.error('At least 5 data rows must be available for analysis. {} rows retrieved.'.format(len(rows))) 296 | 297 | # Create the dataframe 298 | columns = list(rows[0].keys()) 299 | data = pd.DataFrame.from_records(rows, columns = columns) 300 | 301 | if warm_start: 302 | m = model_from_json(json.dumps(json_model['model'])) 303 | m = Prophet().fit(data, init=init_stan(m)) 304 | else: 305 | m = Prophet() 306 | m.fit(data) 307 | 308 | json_model = json.loads(model_to_json(m)) 309 | full_model = {'relation': relation, 310 | 'ts_column': ts_column, 311 | 'y_column': y_column, 312 | 'model': json_model} 313 | 314 | with open(model_file, 'w') as f: 315 | f.write(json.dumps(full_model)) 316 | 317 | return model_file 318 | $BODY$; 319 | 320 | ALTER FUNCTION public.update_model(text, boolean) 321 | OWNER TO postgres; 322 | 323 | COMMENT ON FUNCTION public.update_model(text, boolean) 324 | IS 'Update an existing Prophet model. 325 | 326 | Arguments: 327 | model_name: The name of the model to update. 328 | warm_start: Use the existing model to bootstrap the new one (default: true). 329 | Returns: 330 | text: The full path of the model file. 331 | 332 | Note that whilst enabling warm_start will significantly speed up refitting of the model, it should only be used after adding a small number of additional records; a cold start update should be performed when more significant numbers of records are added to ensure that changepoints in the data are properly recalculated. 
One strategy (dependent on observation frequency) might be to make a daily update using warm start, and a full update once a week.'; 333 | -------------------------------------------------------------------------------- /time_series/sts-tsp.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import matplotlib.pyplot as plt 3 | import dateutil.parser 4 | import tensorflow.compat.v2 as tf 5 | import tensorflow_probability as tfp 6 | 7 | from tensorflow_probability import distributions as tfd 8 | from tensorflow_probability import sts 9 | 10 | tf.enable_v2_behavior() 11 | 12 | DATAFILE = 'activity.csv' 13 | 14 | data = pd.read_csv(DATAFILE, 15 | index_col=0, 16 | parse_dates=[0], 17 | date_parser=dateutil.parser.isoparse, 18 | header=0, 19 | names=['ds', 'y']) 20 | 21 | data = tfp.sts.regularize_series(data[-4032:]) 22 | 23 | trend = sts.LocalLinearTrend(observed_time_series=data) 24 | seasonal = tfp.sts.Seasonal(num_seasons=12, observed_time_series=data) 25 | model = sts.Sum([trend, seasonal], observed_time_series=data) 26 | 27 | variational_posteriors = tfp.sts.build_factored_surrogate_posterior(model=model) 28 | 29 | num_variational_steps = 5 30 | 31 | # Build and optimize the variational loss function. 32 | elbo_loss_curve = tfp.vi.fit_surrogate_posterior( 33 | target_log_prob_fn=model.joint_distribution( 34 | observed_time_series=data).log_prob, 35 | surrogate_posterior=variational_posteriors, 36 | optimizer=tf.optimizers.Adam(learning_rate=0.1), 37 | num_steps=num_variational_steps, 38 | jit_compile=True) 39 | 40 | plt.plot(elbo_loss_curve) 41 | plt.show() 42 | 43 | # Draw samples from the variational posterior. 44 | q_samples_ = variational_posteriors.sample(50) 45 | 46 | # Forecast 47 | n_periods = int(len(data) * 0.25) 48 | forecast = tfp.sts.forecast(model, 49 | observed_time_series=data, 50 | parameter_samples=q_samples_, 51 | num_steps_forecast=n_periods) 52 | 53 | # Plot - FIXME!! 54 | fig, axes = plt.subplots(figsize=(10, 5), dpi=100, tight_layout=True) 55 | plt.plot(data) 56 | plt.plot(forecast.sample(10).numpy()[..., 0]) 57 | plt.show() 58 | --------------------------------------------------------------------------------
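A quick usage sketch for the functions defined in *prophet.sql* above (not part of the repository itself): the relation *public.activity*, its *recorded_at*/*value* columns, and the virtual environment path are hypothetical placeholders, so substitute the table you actually want to model.

```postgresql
-- If Prophet is installed in a virtual environment, activate it for this session first
SELECT public.activate_python_venv('/path/to/venv');

-- Create and save a model named "activity" from a hypothetical observations table
SELECT public.create_model('public.activity', 'recorded_at', 'value', 'activity');

-- Forecast the next 288 five-minute periods (one day ahead)
SELECT * FROM public.predict('activity', 288, '5T');

-- Warm-start refit after a handful of new observations have been added...
SELECT public.update_model('activity');

-- ...and a full (cold start) refit after a bulk load, so changepoints are recalculated
SELECT public.update_model('activity', warm_start => false);

-- Drop the model when it is no longer needed
SELECT public.delete_model('activity');
```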