├── .gitignore
├── 1_problem_motivation.md
├── 2_api_dataset.md
├── 3_api_learning_model.md
├── 4_api_training_the_model.md
├── 5_api_predicting_the_results.md
├── 6_api_full_example.md
├── 7_examples_of_training.md
├── 8_api_saving_and_restoring_trained_model.md
├── README.md
├── bb499e6458958e2f0e204d5a7ac6488450ea1e3365254c6bb10f02b14e5a25f4
├── docs
│   ├── 1_problem_motivation
│   │   └── index.html
│   ├── 2_api_dataset
│   │   └── index.html
│   ├── 3_api_learning_model
│   │   └── index.html
│   ├── 4_api_training_the_model
│   │   └── index.html
│   ├── 5_api_predicting_the_results
│   │   └── index.html
│   ├── 6_api_full_example
│   │   └── index.html
│   ├── 7_examples_of_training
│   │   └── index.html
│   ├── 8_api_saving_and_restoring_trained_model
│   │   └── index.html
│   ├── css
│   │   ├── highlight.css
│   │   ├── theme.css
│   │   └── theme_extra.css
│   ├── examples
│   │   ├── 1_sample_output.txt
│   │   ├── 1_train_and_predict.php
│   │   ├── 2_save.php
│   │   └── 3_load_and_predict.php
│   ├── fonts
│   │   ├── fontawesome-webfont.eot
│   │   ├── fontawesome-webfont.svg
│   │   ├── fontawesome-webfont.ttf
│   │   └── fontawesome-webfont.woff
│   ├── img
│   │   ├── favicon.ico
│   │   ├── training_example_1.png
│   │   ├── training_example_2.png
│   │   ├── training_example_3.png
│   │   └── training_example_4.png
│   ├── index.html
│   ├── js
│   │   ├── highlight.pack.js
│   │   ├── jquery-2.1.1.min.js
│   │   ├── modernizr-2.8.3.min.js
│   │   └── theme.js
│   ├── mkdocs
│   │   ├── js
│   │   │   ├── lunr.min.js
│   │   │   ├── mustache.min.js
│   │   │   ├── require.js
│   │   │   ├── search-results-template.mustache
│   │   │   ├── search.js
│   │   │   └── text.js
│   │   └── search_index.json
│   ├── search.html
│   └── sitemap.xml
├── examples
│   ├── 1_sample_output.txt
│   ├── 1_train_and_predict.php
│   ├── 2_save.php
│   └── 3_load_and_predict.php
└── img
    ├── training_example_1.png
    ├── training_example_2.png
    ├── training_example_3.png
    └── training_example_4.png
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 |
3 |
--------------------------------------------------------------------------------
/1_problem_motivation.md:
--------------------------------------------------------------------------------
1 | # Problem motivation
2 |
3 | ### Do I need Impulse-ML: Recommender, the Recommender System?
4 | If you are a PHP developer who maintains a PHP social application and you
5 | want to predict the "rating" or "preference" that a user would give to an item,
6 | Impulse-ML: Recommender is a library that you might consider using.
7 | In the following chapters I will show you how to use Impulse-ML: Recommender
8 | and give you hints on how to choose parameters that make the predictions
9 | more accurate.
10 |
11 | ### Problem definition
12 |
13 | Consider the following data:
14 |
15 | ```
16 | +---------------------------------+------+---------+---------+---------+
17 | | Movie \ User | Anna | Barbara | Charlie | Dave |
18 | +---------------------------------+------+---------+---------+---------+
19 | | The Dark Knight | 0 | 0 | 5 | 5 |
20 | +---------------------------------+------+---------+---------+---------+
21 | | Guardians of the Galaxy | 0 | ? | ? | 5 |
22 | +---------------------------------+------+---------+---------+---------+
23 | | Logan | ? | 0 | 4 | ? |
24 | +---------------------------------+------+---------+---------+---------+
25 | | Forrest Gump | 4 | 5 | 0 | 0 |
26 | +---------------------------------+------+---------+---------+---------+
27 | | The Kid | 5 | 5 | 0 | 0 |
28 | +---------------------------------+------+---------+---------+---------+
29 | ```
30 |
31 | In this particular example we can notice:
32 |
33 | - we have 5 items - 5 movies
34 | - we have 4 categories - 4 users
35 | - we can notice 2 types of items: action movies and comedy movies
36 | - it seems that Anna and Barbara hate action movies but love comedy movies
37 | - it seems that Charlie and Dave love action movies but hate comedy movies
38 | - the table is incomplete because every user has left at least one movie unrated
39 |
40 | Using this data you might want to:
41 |
42 | - predict a user's rating of a movie that the user has not rated yet, e.g. to recommend a movie the user would probably like
43 | - get movies similar to a given movie
44 | - get a predicted rating of a movie for a user who has not rated any movie yet
45 |
46 | Using Impulse-ML: Recommender you might end up with predictions such as:
47 |
48 | ```
49 | +---------------------------------+------+---------+---------+---------+
50 | | Movie \ User | Anna | Barbara | Charlie | Dave |
51 | +---------------------------------+------+---------+---------+---------+
52 | | The Dark Knight | - | - | - | - |
53 | +---------------------------------+------+---------+---------+---------+
54 | | Guardians of the Galaxy | - | 0 | 5 | - |
55 | +---------------------------------+------+---------+---------+---------+
56 | | Logan | 0 | - | - | 4 |
57 | +---------------------------------+------+---------+---------+---------+
58 | | Forrest Gump | - | - | - | - |
59 | +---------------------------------+------+---------+---------+---------+
60 | | The Kid | - | - | - | - |
61 | +---------------------------------+------+---------+---------+---------+
62 | ```
63 |
64 | We might notice:
65 |
66 | - Anna hates action movies, so the prediction for "Logan" will be 0
67 | - Barbara also hates action movies, so the prediction for "Guardians of the Galaxy" will be 0
68 | - Charlie loves action movies, so the prediction for "Guardians of the Galaxy" will be 5
69 | - Dave also loves action movies, so the prediction for "Logan" will be 4 (not 5, since the highest rating given to this movie is 4)
70 |
71 | That's how Collaborative Filtering works.
72 |
73 | ### Training and training parameters
74 |
75 | As with every machine learning problem, once the data is loaded the model must be trained
76 | (on the dataset) before it can give correct predictions.
77 |
78 | There is only one parameter for a Learning Model created from a dataset:
79 |
80 | - number of features.
81 |
82 | Think of
83 | a feature as
84 | a type or real category of the item. The number of features can be set to the number of item types in your
85 | application. You don't need to name them; you only need to know how many there are.
86 |
87 | There are two training parameters:
88 |
89 | - learning rate
90 | - number of iterations
91 |
92 | The **learning rate** is the parameter that describes how large a step gradient descent
93 | (which minimizes the error) takes in each iteration. You might consider increasing or decreasing
94 | this parameter; it is closely
95 | tied to the number of iterations.
96 |
97 | The **number of iterations** is the parameter that describes how many gradient descent steps
98 | will be applied. It is closely tied to the learning rate.
99 |
100 | Predictions may differ from the desired results if these parameters are chosen poorly.
101 |
102 | However, there are some rules of thumb for choosing these parameters in order to get
103 | better predictions:
104 |
105 | - if you set a small learning rate, consider increasing the number of iterations
106 | - if you set a large learning rate, consider decreasing the number of iterations
107 | - you can expect a very low error - in this example a reasonable error would be less than 0.0001
108 | - setting the learning rate too high may cause a numerical computation error, and the predictions become
109 | useless
110 |
111 | For this particular example I have set:
112 |
113 | - learning rate === 0.01
114 | - number of iterations === 20000
115 | - number of features === 2 (since I noticed two types of movies or two user preferences)
116 |
117 | The key to a well-trained model is choosing the right combination of learning rate and number of iterations.
118 |
119 | You might consider trying a different number of features, depending on your application and its dataset.
120 |
121 | The above example is fully implemented in [examples/1_train_and_predict.php](examples/1_train_and_predict.php).
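122 |
123 | The interplay between learning rate and number of iterations can be tried out on a toy problem.
124 | The sketch below is not part of Impulse-ML: Recommender and does not show its implementation; it is plain
125 | gradient descent on the function f(x) = x^2, and the function name `descend` is made up for this illustration.
126 |
127 | ```php
128 | <?php
129 | // Toy illustration only: plain gradient descent on f(x) = x^2, starting from x = 1.
130 | // The gradient of f is 2x, so every step moves x by -learningRate * 2x.
131 | function descend($learningRate, $iterations)
132 | {
133 |     $x = 1.0;
134 |     for ($step = 0; $step < $iterations; $step++) {
135 |         $x -= $learningRate * 2 * $x;
136 |     }
137 |     return $x * $x; // remaining error; the minimum of f is 0
138 | }
139 |
140 | // Small learning rate, few iterations: the error is still clearly above 0.
141 | echo descend(0.01, 100), "\n";
142 | // Same learning rate, more iterations: the error gets much closer to 0.
143 | echo descend(0.01, 2000), "\n";
144 | // Too large a learning rate: every step overshoots and the error grows.
145 | echo descend(1.5, 100), "\n";
146 | ```
147 |
148 | Each step multiplies x by (1 - 2 * learningRate), so the error shrinks only while that factor stays between -1 and 1.
149 | This is the same trade-off as in the rules listed in this chapter, shown on a much simpler function.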
--------------------------------------------------------------------------------
/2_api_dataset.md:
--------------------------------------------------------------------------------
1 | # API - Dataset
2 |
3 | ### Passing data to Impulse-ML: Recommender Dataset
4 |
5 | Every algorithm, not only the machine learning algorithm in Impulse-ML: Recommender, needs to know about
6 | your data. Since PHP applications use different storage systems, Impulse-ML: Recommender has no database
7 | data fetcher - you pass data directly to a Dataset class instance.
8 |
9 | Consider the following code:
10 |
11 | ```php
12 | include_once __DIR__ . '/../src/Impulse/Recommender/Dataset.php';
13 |
14 | $dataset = new Impulse\Recommender\Dataset();
15 |
16 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('The Dark Knight'));
17 | $dataset->addCategory(Impulse\Recommender\Dataset\Category::create('Anna'));
18 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Dark Knight', 'Anna', 0));
19 | ```
20 |
21 | This is a minimal example that adds 1 item (The Dark Knight) and 1 category (Anna), and sets the item's rating
22 | by user Anna to 0.
23 |
24 | On its own the example is not very useful, because we want multiple items, categories and ratings - but
25 | it shows how to pass data to the recommender system.
26 |
27 | In this example I used strings as identifiers, but the ```Impulse\Recommender\Dataset\Item::create```
28 | and ```Impulse\Recommender\Dataset\Category::create``` methods also accept an integer instead
29 | of a string. You might consider passing integers to the dataset, as long as their values correspond to database
30 | primary keys; this saves a lot of memory compared to using strings.
31 |
32 | Also, each of those 2 ```create``` methods accepts a second parameter with no defined or
33 | required data type. You might
34 | consider passing it an array with your database model data for future use if database primary keys alone are not
35 | useful enough.
36 |
37 | ```Impulse\Recommender\Dataset\Rating::create``` requires 3 parameters: the first 2 - the item and
38 | the category - should already be added to the dataset, and the third one should be a numeric value.
39 | There is no minimum or
40 | maximum value, but different rating ranges can require different learning parameters. You can
41 | pass NULL if an item is not rated, but this is not required.
42 |
43 | For a real-life example of creating a dataset, check [examples/1_train_and_predict.php](examples/1_train_and_predict.php).
44 |
45 |
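46 | The points above about integer identifiers, the optional second parameter and NULL ratings can be sketched as follows.
47 | The `$movies`, `$users` and `$ratings` arrays are hypothetical stand-ins for rows fetched from your own database;
48 | only the `addItem`, `addCategory`, `addRating` and `create` calls are taken from this chapter.
49 |
50 | ```php
51 | include_once __DIR__ . '/../src/Impulse/Recommender/Dataset.php';
52 |
53 | // Hypothetical rows, e.g. fetched from your own database (primary key => row data).
54 | $movies = [1 => ['title' => 'The Dark Knight'], 2 => ['title' => 'Logan']];
55 | $users = [10 => ['name' => 'Anna'], 11 => ['name' => 'Charlie']];
56 | // Each entry is [movie id, user id, rating]; NULL marks a movie the user has not rated.
57 | $ratings = [[1, 10, 0], [1, 11, 5], [2, 10, NULL], [2, 11, 4]];
58 |
59 | $dataset = new Impulse\Recommender\Dataset();
60 |
61 | foreach ($movies as $id => $row) {
62 |     // Integer primary key as the identifier, the row itself as optional extra data.
63 |     $dataset->addItem(Impulse\Recommender\Dataset\Item::create($id, $row));
64 | }
65 | foreach ($users as $id => $row) {
66 |     $dataset->addCategory(Impulse\Recommender\Dataset\Category::create($id, $row));
67 | }
68 | foreach ($ratings as $rating) {
69 |     list($movieId, $userId, $value) = $rating;
70 |     $dataset->addRating(Impulse\Recommender\Dataset\Rating::create($movieId, $userId, $value));
71 | }
72 | ```
73 |
74 | Items and categories are added before the ratings because a rating refers to an item and a category that should already be in the dataset.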
--------------------------------------------------------------------------------
/3_api_learning_model.md:
--------------------------------------------------------------------------------
1 | # API - Learning Model
2 |
3 | ### Learning from dataset
4 |
5 | Assuming that we have data stored in ```Impulse\Recommender\Dataset``` we are ready to create
6 | a Learning Model.
7 |
8 | We can do this by using:
9 |
10 | ```php
11 | $model = new Impulse\Recommender\LearningModel($dataset, [
12 | 'numFeatures' => 2
13 | ]);
14 | ```
15 |
16 | "numFeatures" is a required parameter. It may correspond directly to the number of categories of database
17 | items or the number of defined user preferences.
18 | Notice that you don't need to define how much each item belongs to
19 | a given category or how much a user belongs to a given preference. You just need to know how many there are.
--------------------------------------------------------------------------------
/4_api_training_the_model.md:
--------------------------------------------------------------------------------
1 | # API - Training the model
2 |
3 | ### Training the model
4 |
5 | You can get this done by using:
6 |
7 | ```php
8 | $trainer = new Impulse\Recommender\Trainer($model, [
9 |     'learningRate' => 0.01,
10 |     'iterations' => 20000,
11 |     'verbose' => TRUE, // print debug messages
12 |     'verboseStep' => 1000 // number of steps between debug messages
13 | ]);
14 |
15 | $trainer->train();
16 | ```
17 |
18 | Note that training may take a very long time when your dataset is large. It can be shortened
19 | to some extent by
20 | choosing more suitable "learningRate" and "iterations" parameters.
21 |
22 |
--------------------------------------------------------------------------------
/5_api_predicting_the_results.md:
--------------------------------------------------------------------------------
1 | # API - Predicting the results
2 |
3 | There are 3 ways to get predictions:
4 |
5 | ### Predict rating for user
6 | ```php
7 | $model->predict('Logan', 'Anna'); // float(9.9920072216264E-14)
8 | ```
9 |
10 | which predicts the rating of the unrated movie "Logan" for user "Anna" and returns a number.
11 |
12 | Results may differ from the desired ones if the Learning Model is untrained or improperly trained.
13 |
14 |
15 | ### Find similar items
16 | ```php
17 | $model->findRelated('The Dark Knight', [
18 | 'limit' => 1
19 | ]);
20 | ```
21 | finds the items ordered by similarity and returns them as an array (here limited to 1 result):
22 |
23 | ```text
24 | array(1) {
25 | [0]=>
26 | array(2) {
27 | ["similarity"]=>
28 | float(2.2657653531155E-11)
29 | ["model"]=>
30 | array(2) {
31 | ["_id"]=>
32 | string(23) "Guardians of the Galaxy"
33 | ["data"]=>
34 | NULL
35 | }
36 | }
37 | }
38 | ```
39 |
40 | ### Predict rate for user which has not rated any movie
41 | ```php
42 | $model->predict("Forrest Gump"); // int(2)
43 | ```
44 |
45 | which can be useful when the user has not rated any movie yet, so no preferences have been computed for that user.
46 |
47 |
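48 | ### Picking a recommendation
49 |
50 | The per-user prediction can be turned into a simple recommendation step. The sketch below assumes a trained `$model`
51 | as in the previous chapters; the `$unrated` list is hypothetical and would come from your own application
52 | (in the sample dataset Anna has only one unrated movie, in a real application the list would be longer).
53 |
54 | ```php
55 | // Hypothetical list of movies that Anna has not rated yet.
56 | $unrated = ['Logan'];
57 |
58 | $best = NULL;
59 | $bestRating = NULL;
60 | foreach ($unrated as $movie) {
61 |     // predict() with an item and a category returns a number, as shown above.
62 |     $rating = $model->predict($movie, 'Anna');
63 |     if ($bestRating === NULL || $rating > $bestRating) {
64 |         $best = $movie;
65 |         $bestRating = $rating;
66 |     }
67 | }
68 |
69 | echo "Recommend '{$best}' to Anna (predicted rating: {$bestRating})\n";
70 | ```
71 |
72 | Whether the best candidate is actually worth recommending is up to the application, for example by requiring a minimum predicted rating.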
--------------------------------------------------------------------------------
/6_api_full_example.md:
--------------------------------------------------------------------------------
1 | # API - Full example
2 |
3 | ```php
4 | include_once __DIR__ . '/../src/Impulse/Recommender/Dataset.php';
5 | include_once __DIR__ . '/../src/Impulse/Recommender/LearningModel.php';
6 | include_once __DIR__ . '/../src/Impulse/Recommender/Trainer.php';
7 |
8 | $dataset = new Impulse\Recommender\Dataset();
9 |
10 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('The Dark Knight'));
11 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('Guardians of the Galaxy'));
12 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('Logan'));
13 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('Forrest Gump'));
14 | $dataset->addItem(Impulse\Recommender\Dataset\Item::create('The Kid'));
15 |
16 | $dataset->addCategory(Impulse\Recommender\Dataset\Category::create('Anna'));
17 | $dataset->addCategory(Impulse\Recommender\Dataset\Category::create('Barbara'));
18 | $dataset->addCategory(Impulse\Recommender\Dataset\Category::create('Charlie'));
19 | $dataset->addCategory(Impulse\Recommender\Dataset\Category::create('Dave'));
20 |
21 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Dark Knight', 'Anna', 0));
22 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Dark Knight', 'Barbara', 0));
23 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Dark Knight', 'Charlie', 5));
24 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Dark Knight', 'Dave', 5));
25 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Guardians of the Galaxy', 'Anna', 0));
26 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Guardians of the Galaxy', 'Barbara', NULL));
27 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Guardians of the Galaxy', 'Charlie', NULL));
28 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Guardians of the Galaxy', 'Dave', 5));
29 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Logan', 'Anna', NULL));
30 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Logan', 'Barbara', 0));
31 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Logan', 'Charlie', 4));
32 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Logan', 'Dave', NULL));
33 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Forrest Gump', 'Anna', 4));
34 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Forrest Gump', 'Barbara', 5));
35 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Forrest Gump', 'Charlie', 0));
36 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('Forrest Gump', 'Dave', 0));
37 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Kid', 'Anna', 5));
38 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Kid', 'Barbara', 5));
39 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Kid', 'Charlie', 0));
40 | $dataset->addRating(Impulse\Recommender\Dataset\Rating::create('The Kid', 'Dave', 0));
41 |
42 | $model = new Impulse\Recommender\LearningModel($dataset, [
43 | 'numFeatures' => 2
44 | ]);
45 |
46 | $trainer = new Impulse\Recommender\Trainer($model, [
47 | 'learningRate' => 0.01,
48 | 'iterations' => 20000,
49 | 'verbose' => TRUE,
50 | 'verboseStep' => 1000
51 | ]);
52 |
53 | $trainer->train();
54 |
55 | echo "Prediction for 'Guardians of the Galaxy' for user 'Barbara': {$model->predict('Guardians of the Galaxy', 'Barbara')}\n";
56 | echo "Prediction for 'Guardians of the Galaxy' for user 'Charlie': {$model->predict('Guardians of the Galaxy', 'Charlie')}\n";
57 | echo "Prediction for 'Logan' for user 'Anna': {$model->predict('Logan', 'Anna')}\n";
58 | echo "Prediction for 'Logan' for user 'Dave': {$model->predict('Logan', 'Dave')}\n";
59 |
60 | echo "Prediction for 'Logan' for user with has not rated any movie: {$model->predict('Logan')}\n";
61 |
62 | echo "Related movies dump:\n";
63 |
64 | var_dump($model->findRelated('The Dark Knight', [
65 | 'limit' => 5
66 | ]));
67 | ```
68 |
69 | This may produce the following output:
70 |
71 | ```text
72 | Starting train with 20000 steps.
73 | Step 0 with error 45.480538085596
74 | Step 1000 with error 0.14185749105855
75 | Step 2000 with error 0.00012367547481656
76 | Step 3000 with error 3.619659641736E-6
77 | Step 4000 with error 1.2231304078042E-7
78 | Step 5000 with error 5.7888427762046E-9
79 | Step 6000 with error 4.3159398580704E-10
80 | Step 7000 with error 4.2963851024733E-11
81 | Step 8000 with error 4.7458579962238E-12
82 | Step 9000 with error 5.3950928017295E-13
83 | Step 10000 with error 6.1779842297497E-14
84 | Step 11000 with error 7.0872758334659E-15
85 | Step 12000 with error 8.1340228680908E-16
86 | Step 13000 with error 9.3363924793255E-17
87 | Step 14000 with error 1.0716785358187E-17
88 | Step 15000 with error 1.2301363738974E-18
89 | Step 16000 with error 1.4120237226256E-19
90 | Step 17000 with error 1.6208264249521E-20
91 | Step 18000 with error 1.8605120483173E-21
92 | Step 19000 with error 2.1359434707995E-22
93 | Training ended with error 2.4586853842564E-23 after 20000 steps.
94 | Prediction for 'Guardians of the Galaxy' for user 'Barbara': 1.3472778448431E-11
95 | Prediction for 'Guardians of the Galaxy' for user 'Charlie': 4.9999999999974
96 | Prediction for 'Logan' for user 'Anna': 1.3994139180795E-11
97 | Prediction for 'Logan' for user 'Dave': 3.9999999999967
98 | Prediction for 'Logan' for user with has not rated any movie: 2
99 | Related movies dump:
100 | array(4) {
101 | [0]=>
102 | array(2) {
103 | ["similarity"]=>
104 | float(1.1086798146209E-11)
105 | ["model"]=>
106 | array(2) {
107 | ["_id"]=>
108 | string(23) "Guardians of the Galaxy"
109 | ["data"]=>
110 | NULL
111 | }
112 | }
113 | [1]=>
114 | array(2) {
115 | ["similarity"]=>
116 | float(0.17881301819823)
117 | ["model"]=>
118 | array(2) {
119 | ["_id"]=>
120 | string(5) "Logan"
121 | ["data"]=>
122 | NULL
123 | }
124 | }
125 | [2]=>
126 | array(2) {
127 | ["similarity"]=>
128 | float(0.92344428953759)
129 | ["model"]=>
130 | array(2) {
131 | ["_id"]=>
132 | string(12) "Forrest Gump"
133 | ["data"]=>
134 | NULL
135 | }
136 | }
137 | [3]=>
138 | array(2) {
139 | ["similarity"]=>
140 | float(1.7881301818354)
141 | ["model"]=>
142 | array(2) {
143 | ["_id"]=>
144 | string(7) "The Kid"
145 | ["data"]=>
146 | NULL
147 | }
148 | }
149 | }
150 | ```
151 |
152 | Check [examples/1_train_and_predict.php](examples/1_train_and_predict.php) for details.
--------------------------------------------------------------------------------
/7_examples_of_training.md:
--------------------------------------------------------------------------------
1 | # Examples of training
2 |
3 | Using the data table from chapter 1_problem_motivation.md, consider the following learning parameters
4 | for this dataset:
5 |
6 | - 1: learningRate = 0.0001, iterations = 1000
7 | - 2: learningRate = 0.1, iterations = 10000
8 | - 3: learningRate = 0.0001, iterations = 100000
9 | - 4: learningRate = 0.01, iterations = 100000
10 |
11 | You might end up with the following debug messages:
12 |
13 | ### Ex. 1
14 |
15 | learningRate = 0.0001, iterations = 1000
16 |
17 | 
18 |
19 |
20 | Both the learningRate and the number of iterations are too low - we end up with an untrained model with a high error.
21 |
22 | ### Ex. 2
23 |
24 | learningRate = 0.1, iterations = 10000
25 |
26 | 
27 |
28 | The learningRate is too high, because after some steps we get a numerical computation error.
29 |
30 | ### Ex. 3
31 |
32 | learningRate = 0.0001, iterations = 100000
33 |
34 | 
35 |
36 | The error is quite good, but you might consider setting the number of iterations to a higher value or
37 | increasing the learning rate.
38 |
39 | ### Ex. 4
40 |
41 | learningRate = 0.01, iterations = 100000
42 |
43 | 
44 |
45 | After some steps the error, which is already very close to 0, no longer decreases,
46 | so you might consider decreasing the number of iterations.
47 |
48 | ### Note
49 |
50 | The examples above with too many iterations do not have a big impact on
51 | training time, because the dataset used in the previous examples is small.
52 | For larger datasets you might consider tuning the parameters more carefully.
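53 |
54 | The four parameter sets above can be tried in one script. This is only a sketch: it assumes that `$dataset` has been
55 | built and the library files included as in the full example, and it uses only the `LearningModel` and `Trainer`
56 | calls shown in the API chapters. Creating a new model for every run is an assumption made here to keep the runs separate.
57 |
58 | ```php
59 | // Parameter sets from the examples above.
60 | $runs = [
61 |     ['learningRate' => 0.0001, 'iterations' => 1000],
62 |     ['learningRate' => 0.1, 'iterations' => 10000],
63 |     ['learningRate' => 0.0001, 'iterations' => 100000],
64 |     ['learningRate' => 0.01, 'iterations' => 100000]
65 | ];
66 |
67 | foreach ($runs as $index => $params) {
68 |     echo "Ex. " . ($index + 1) . "\n";
69 |
70 |     // A new model for every run, so that earlier training does not carry over.
71 |     $model = new Impulse\Recommender\LearningModel($dataset, [
72 |         'numFeatures' => 2
73 |     ]);
74 |
75 |     $trainer = new Impulse\Recommender\Trainer($model, [
76 |         'learningRate' => $params['learningRate'],
77 |         'iterations' => $params['iterations'],
78 |         'verbose' => TRUE,
79 |         'verboseStep' => $params['iterations'] / 10 // about ten debug messages per run
80 |     ]);
81 |
82 |     $trainer->train();
83 | }
84 | ```
85 |
86 | Comparing the printed errors of the four runs shows which combination of learning rate and iterations suits the dataset.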
--------------------------------------------------------------------------------
/8_api_saving_and_restoring_trained_model.md:
--------------------------------------------------------------------------------
1 | # API - Saving and restoring trained model
2 |
3 | You probably don't want to retrain your model after every single rating given by a user, and you will almost certainly want
4 | to run the training outside your website, because training can take a very long time.
5 |
6 | To make this possible we have implemented saving and restoring of your trained Learning Model.
7 |
8 | ### Save
9 |
10 | ```php
11 | include_once __DIR__ . '/../src/Impulse/Recommender/Builder.php';
12 |
13 | $builder = new Impulse\Recommender\Builder($model);
14 | $builder->save(__DIR__, 'save1');
15 | ```
16 |
17 | ### Restore
18 |
19 | ```php
20 | include_once __DIR__ . '/../src/Impulse/Recommender/Builder.php';
21 |
22 | $model = Impulse\Recommender\Builder::load(__DIR__, 'save1');
23 | ```
24 |
25 | Each of those methods takes 2 parameters: the first one is the location of the directory in which to save the data,
26 | and the second one is the name of the directory created for the data files.
27 |
28 | Check [examples/2_save.php](examples/2_save.php) and
29 | [examples/3_load_and_predict.php](examples/3_load_and_predict.php) for an example implementation.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Impulse-ML: Recommender, the Recommender Engine
2 |
3 | ### About
4 | Impulse-ML: Recommender is a PHP library which can be used to serve
5 | personalized content to users of your website. It is written in PHP
6 | and requires no additional dependencies. With its OOP API you can achieve good
7 | prediction results and quickly add a recommender system to any PHP
8 | application, e.g. WordPress, Drupal or any application based on a PHP framework.
9 |
10 | ### Machine Learning
11 | A recommender system solves a machine learning problem. Given items (e.g. movies)
12 | can be rated by users (e.g. with a 0 - 5 star rating). With the given
13 | rating data the Recommender System can:
14 |
15 | - predict ratings of movies that a user has not rated yet
16 | - find similar movies
17 | - get a prediction for a user who has not rated any movie.
18 |
19 | Impulse-ML: Recommender uses a **Collaborative Filtering** algorithm,
20 | so it is not required to provide item features, which can be
21 | understood as real item categories (e.g. comedy or action movie and their values), and it is
22 | not required to provide category features, which can be understood as user preferences.
23 | The system learns
24 | by itself from only the given items, categories and defined ratings.
25 |
26 | As long as you choose the Learning Model parameters and Training parameters
27 | well, you might end up with a pretty good prediction of the rating of a movie
28 | which the user has not rated yet - the more ratings you provide, the more accurate
29 | the predictions you will get.
30 |
31 | Impulse-ML: Recommender uses the gradient descent learning algorithm.
32 |
33 | For general details about Recommender Systems you might consider visiting
34 | [Wikipedia - Recommender System](https://en.wikipedia.org/wiki/Recommender_system) to get
35 | an intuition for what is going on under the hood.
36 |
37 | ### Requirements
38 |
39 | - PHP >= 5.4
40 |
41 | ### Table of contents
42 |
43 | - [1. Problem motivation](1_problem_motivation.md)
44 |     - [Do I need Impulse-ML: Recommender, the Recommender System?](1_problem_motivation.md#do-i-need-impulse-ml-recommender-the-recommender-system)
45 |     - [Problem definition](1_problem_motivation.md#problem-definition)
46 |     - [Training and training parameters](1_problem_motivation.md#training-and-training-parameters)
47 | - [2. API - Dataset](2_api_dataset.md)
48 |     - [Passing data to Impulse-ML: Recommender Dataset](2_api_dataset.md#passing-data-to-impulse-ml-recommender-dataset)
49 | - [3. API - Learning Model](3_api_learning_model.md)
50 |     - [Learning from dataset](3_api_learning_model.md#learning-from-dataset)
51 | - [4. API - Training the Learning Model](4_api_training_the_model.md)
52 |     - [Training the model](4_api_training_the_model.md#training-the-model)
53 | - [5. API - Predicting the results](5_api_predicting_the_results.md)
54 |     - [Predict rating for user](5_api_predicting_the_results.md#predict-rating-for-user)
55 |     - [Find similar items](5_api_predicting_the_results.md#find-similar-items)
56 |     - [Predict rate for user which has not rated any movie](5_api_predicting_the_results.md#predict-rate-for-user-which-has-not-rated-any-movie)
57 | - [6. API - Full Example](6_api_full_example.md)
58 | - [7. Examples of training](7_examples_of_training.md)
59 | - [8. API - Saving and restoring trained model](8_api_saving_and_restoring_trained_model.md)
60 |     - [Save](8_api_saving_and_restoring_trained_model.md#save)
61 |     - [Restore](8_api_saving_and_restoring_trained_model.md#restore)
--------------------------------------------------------------------------------
/bb499e6458958e2f0e204d5a7ac6488450ea1e3365254c6bb10f02b14e5a25f4:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/docs/1_problem_motivation/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
Do I need Impulse-ML: Recommender, the Recommender System?
155 |
If you are a PHP developer who maintains any PHP social application and you
156 | want to predict the "rating" or "preference" that a user would give to an item
157 | the Impulse-ML: Recommender is library that you might consider to use!
158 | In further readings I will show you how to use Impulse-ML: Recommender
159 | and give you a hint on how to choose parameters which makes the predictions
160 | more accurate.
161 |
Problem definition
162 |
Consider the following data:
163 |
+---------------------------------+------+---------+---------+---------+
164 | | Movie \ User | Anna | Barbara | Charlie | Dave |
165 | +---------------------------------+------+---------+---------+---------+
166 | | The Dark Knight | 0 | 0 | 5 | 5 |
167 | +---------------------------------+------+---------+---------+---------+
168 | | Guardians of the Galaxy | 0 | ? | ? | 5 |
169 | +---------------------------------+------+---------+---------+---------+
170 | | Logan | ? | 0 | 4 | ? |
171 | +---------------------------------+------+---------+---------+---------+
172 | | Forrest Gump | 4 | 5 | 0 | 0 |
173 | +---------------------------------+------+---------+---------+---------+
174 | | The Kid | 5 | 5 | 0 | 0 |
175 | +---------------------------------+------+---------+---------+---------+
176 |
177 |
178 |
In this particular example we can notice:
179 |
180 |
we have 5 items - 5 movies
181 |
we have 4 categories - 4 users
182 |
we can notice 2 types of items: action movie and comedy movie
183 |
it seems that Anna and Barbara hate the action movies but love the comedy movies
184 |
it seems that Charlie and Dave love the action movies but hate the comedy movies
185 |
the table is incomplete because every user has not rated at least one movie
186 |
187 |
Using this data you might want to:
188 |
189 |
predict user rating of movie that is unrated by user i.e. to send user the movie which he would like but he does not rated that movie yet
190 |
get movies similar to given movie
191 |
get the prediction of the movie for user that has no rated any movie and use this data
192 |
193 |
Using Impulse-ML: Recommender you might end up with such predictions:
194 |
+---------------------------------+------+---------+---------+---------+
195 | | Movie \ User | Anna | Barbara | Charlie | Dave |
196 | +---------------------------------+------+---------+---------+---------+
197 | | The Dark Knight | - | - | - | - |
198 | +---------------------------------+------+---------+---------+---------+
199 | | Guardians of the Galaxy | - | 0 | 5 | - |
200 | +---------------------------------+------+---------+---------+---------+
201 | | Logan | 0 | - | - | 4 |
202 | +---------------------------------+------+---------+---------+---------+
203 | | Forrest Gump | - | - | - | - |
204 | +---------------------------------+------+---------+---------+---------+
205 | | The Kid | - | - | - | - |
206 | +---------------------------------+------+---------+---------+---------+
207 |
208 |
209 |
We might notice:
210 |
211 |
Anna hates action movies so the prediction of "Logan" will be 0
212 |
Barbara also hates action movies so the prediction of "Guardians of the Galaxy" will be 0
213 |
Charlie loves the action movies so the prediction of "Guardians of the Galaxy" will be 5
214 |
Dave also loves the action movies so prediction of "Logan" will be 4 (not 5 since the maximum rating of this movie is equal 4)
215 |
216 |
That's how Collaborative Filtering works.
217 |
Training and training parameters
218 |
As each machine learning problem after filling with data in order to get correct prediction the training
219 | (based on the dataset) is required.
220 |
There is only one parameter for a Learning Model created from a dataset:
221 |
222 |
number of features.
223 |
224 |
Understand
225 | it like
226 | type or real category of the item. It's value can be set equals number of item types in your
227 | application. You don't need to name them, you have to know number of them.
228 |
There are two training parameters:
229 |
230 |
learning rate
231 |
number of iterations
232 |
233 |
The learningrate is parameter which describes how much gradient descent
234 | (which minimizes the error) will perform. You might to consider to increase or decrease
235 | this parameter and it has strong
236 | correlation with number of iterations.
237 |
The numberofiterations is parameter which describes how much steps gradient descent minimize function
238 | will be applied. It's highly correlated with learning rate.

If these parameters are chosen poorly, the predictions may differ from the desired results.

However, there are some rules for choosing these parameters well in order to get better predictions:

- if you set a small learning rate, consider increasing the number of iterations
- if you set a large learning rate, consider decreasing the number of iterations
- you should expect a very low error; in this example a reasonable error would be less than 0.0001
- setting the learning rate too high may cause a numerical computation error in the algorithm, and the predictions become useless
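
These rules can be seen on a much simpler problem than recommendation. The sketch below minimizes f(w) = w^2 by gradient descent; the function name `descendSketch` and the chosen values are assumptions made for illustration only and have nothing to do with the library's internals.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration: minimizes f(w) = w^2 by gradient descent.
 * The gradient of f is 2w, so each step computes w := w - learningRate * 2w.
 */
function descendSketch(float $learningRate, int $iterations, float $start = 1.0): float
{
    $w = $start;
    for ($step = 0; $step < $iterations; $step++) {
        $w -= $learningRate * 2.0 * $w;
    }
    return $w;
}

$small   = descendSketch(0.01, 2000); // small learning rate, many iterations
$large   = descendSketch(0.4, 50);    // larger learning rate, fewer iterations
$tooHigh = descendSketch(1.5, 2000);  // every step overshoots further than the last

var_dump(abs($small) < 1e-6);  // bool(true): converged
var_dump(abs($large) < 1e-6);  // bool(true): converged in far fewer steps
var_dump(is_finite($tooHigh)); // bool(false): the value overflowed
```

With a learning rate of 1.5 each step doubles the distance from the minimum, so the value eventually overflows the floating point range. This is the kind of computation error the last rule warns about.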

For this particular example I have set:

- learning rate = 0.01
- number of iterations = 20000
- number of features = 2 (since I noticed two types of movies, or two user preferences)

The key to a well-trained model is choosing the right combination of learning rate and number of iterations.

You might also consider trying a different number of features, depending on your application and its dataset.
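
To see how the three parameters work together, here is a minimal sketch of collaborative filtering trained by batch gradient descent in plain PHP. It is not the library's implementation: the function name `trainSketch`, the starting values, the absence of regularization and the sample ratings are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration only; NOT part of Impulse-ML: Recommender.
 * Learns item features and user preferences by batch gradient descent.
 *
 * @param array $ratings $ratings[$item][$user] holds a number, or null when unrated
 * @return array [item features, user preferences, sum of squared errors]
 */
function trainSketch(array $ratings, int $numFeatures, float $learningRate, int $iterations): array
{
    $numItems = count($ratings);
    $numUsers = count($ratings[0]);

    // Small, distinct, deterministic starting values (an assumption of this sketch).
    $x = [];
    for ($i = 0; $i < $numItems; $i++) {
        for ($k = 0; $k < $numFeatures; $k++) {
            $x[$i][$k] = 0.01 * (1 + ($i * 3 + $k * 7) % 10);
        }
    }
    $theta = [];
    for ($j = 0; $j < $numUsers; $j++) {
        for ($k = 0; $k < $numFeatures; $k++) {
            $theta[$j][$k] = 0.01 * (1 + ($j * 5 + $k * 3) % 10);
        }
    }

    $error = 0.0; // stays 0.0 if no iteration is run
    for ($step = 0; $step < $iterations; $step++) {
        $gradX = array_fill(0, $numItems, array_fill(0, $numFeatures, 0.0));
        $gradTheta = array_fill(0, $numUsers, array_fill(0, $numFeatures, 0.0));
        $error = 0.0;
        foreach ($ratings as $i => $row) {
            foreach ($row as $j => $rating) {
                if ($rating === null) {
                    continue; // unrated pairs do not contribute to the error
                }
                // Difference between the predicted and the given rating.
                $diff = -$rating;
                for ($k = 0; $k < $numFeatures; $k++) {
                    $diff += $x[$i][$k] * $theta[$j][$k];
                }
                $error += $diff * $diff;
                for ($k = 0; $k < $numFeatures; $k++) {
                    $gradX[$i][$k] += $diff * $theta[$j][$k];
                    $gradTheta[$j][$k] += $diff * $x[$i][$k];
                }
            }
        }
        // One gradient descent step, scaled by the learning rate.
        for ($k = 0; $k < $numFeatures; $k++) {
            for ($i = 0; $i < $numItems; $i++) {
                $x[$i][$k] -= $learningRate * $gradX[$i][$k];
            }
            for ($j = 0; $j < $numUsers; $j++) {
                $theta[$j][$k] -= $learningRate * $gradTheta[$j][$k];
            }
        }
    }

    // The error is the one measured at the start of the last iteration.
    return [$x, $theta, $error];
}

// Made-up ratings: rows are items, columns are users, null means "not rated".
$ratings = [
    [0.0, 0.0, 5.0, null],
    [0.0, null, 5.0, 4.0],
    [5.0, 5.0, null, 0.0],
    [null, 5.0, 0.0, 0.0],
];

[, , $errorShort] = trainSketch($ratings, 2, 0.01, 10);
[$x, $theta, $errorLong] = trainSketch($ratings, 2, 0.01, 2000);
printf("error after 10 steps: %.6f, after 2000 steps: %.6f\n", $errorShort, $errorLong);
```

A predicted rating for an unrated pair is the sum of products of the item's features and the user's preferences, for example `$x[0][0] * $theta[3][0] + $x[0][1] * $theta[3][1]` for the first item and the last user.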

Every algorithm, and not only a machine learning algorithm such as Impulse-ML: Recommender, must know about your data. Since PHP applications use different storage systems, Impulse-ML: Recommender has no database data fetcher; you pass the data directly to a Dataset class instance.

This is a minimal example: it sets 1 item (The Dark Knight) and 1 category (Anna), and sets the rating given to the item by user Anna to 0.

The example is not meaningful on its own, because in practice we want multiple items, categories and ratings, but it shows how to pass data to the recommender system.

In this example I used strings as my items, but the Impulse\Recommender\Dataset\Item::create and Impulse\Recommender\Dataset\Category::create methods also accept an integer instead of a string. You might consider passing integers to the dataset as long as their values correspond to database primary keys; this saves a lot of memory compared to strings.

Each of those 2 create methods also accepts a second parameter, which has no defined or required data type. If database primary keys alone are not useful enough, you might consider passing an array with your database model data for future use.
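
The shape of this data can be pictured with a small stand-in class. The sketch below is hypothetical: `DatasetSketch` and its methods are not the library's classes, and the keys 17 and 3 are made-up primary keys. It only illustrates integer keys, the optional payload, and ratings that refer to an item and a category added beforehand and hold either a number or NULL.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for illustration; NOT the library's Dataset class.
 */
final class DatasetSketch
{
    private array $items = [];      // item key => optional payload
    private array $categories = []; // category key => optional payload
    private array $ratings = [];    // item key => [category key => float|null]

    public function addItem($key, $payload = null): void
    {
        $this->items[$key] = $payload;
    }

    public function addCategory($key, $payload = null): void
    {
        $this->categories[$key] = $payload;
    }

    public function addRating($itemKey, $categoryKey, $value): void
    {
        // array_key_exists() also finds keys whose payload is null.
        if (!array_key_exists($itemKey, $this->items)
            || !array_key_exists($categoryKey, $this->categories)) {
            throw new InvalidArgumentException('Add the item and the category before rating them.');
        }
        if ($value !== null && !is_numeric($value)) {
            throw new InvalidArgumentException('A rating must be numeric or null.');
        }
        $this->ratings[$itemKey][$categoryKey] = $value === null ? null : (float) $value;
    }

    public function rating($itemKey, $categoryKey): ?float
    {
        return $this->ratings[$itemKey][$categoryKey] ?? null;
    }
}

$dataset = new DatasetSketch();
$dataset->addItem(17, ['title' => 'The Dark Knight']); // 17: made-up primary key
$dataset->addCategory(3, ['name' => 'Anna']);          // 3: made-up primary key
$dataset->addRating(17, 3, 0);

var_dump($dataset->rating(17, 3)); // float(0)
```

Keeping the payload next to the key means the rest of the application can look up a title or a name without another database query.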

Impulse\Recommender\Dataset\Rating::create requires 3 parameters: the first 2, the item and the category, should already be added to the dataset, and the third should be a numeric value. There is no minimum or maximum value, but different rating ranges can require different learning parameters. You may pass NULL if the item is not rated, but this is not required.

"numFeatures" is a required parameter. It may correspond directly to the number of categories of database items or to the number of defined user preferences. Notice that you don't need to define how strongly each item belongs to a given category or how strongly each user holds a given preference. You only need to know how many of them there are.

Note that training may take a very long time when your dataset is really large. Training time can be reduced to some extent by choosing more suitable "learningRate" and "iterations" parameters.

Using the data table from lecture 1_problem_motivation.md, consider the following learning parameters for this dataset:

1. learningRate = 0.0001, iterations = 1000
2. learningRate = 0.1, iterations = 10000
3. learningRate = 0.0001, iterations = 100000
4. learningRate = 0.01, iterations = 100000

You might end up with the following debug messages:

Ex. 1

learningRate = 0.0001, iterations = 1000

The learningRate is too low and the number of iterations is too low: we get an untrained model with a high error.

Ex. 2

learningRate = 0.1, iterations = 10000

The learningRate is too high, because after some steps we get a numerical computation error.

Ex. 3

learningRate = 0.0001, iterations = 100000

The error is quite good, but you might consider setting the number of iterations to a higher value or increasing the learning rate.

Ex. 4

learningRate = 0.01, iterations = 100000

After some steps the error is already very close to 0 and is no longer being reduced, so you might consider decreasing the number of iterations.

Note

In the examples above, an excessive number of iterations should not have a big impact on training time, because the dataset is small. With larger datasets you might consider tuning the parameters more carefully.

You probably don't want to retrain your model after every single rating given by a user, and you will certainly want to run the training outside your website, because training can take a very long time.

For this purpose we have implemented saving and restoring of your trained Learning Model.

Each of those methods takes 2 parameters: the first is the location of the directory in which to save the data, and the second is the name of the directory created for the data files.
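
The two-parameter convention can be sketched with plain PHP file functions. The helper names `saveSketch` and `loadSketch`, the `model.json` file and the JSON layout are assumptions made for illustration; the files the library itself writes are not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helpers for illustration; NOT the library's save and restore methods.
 * Both take the location of a parent directory and the name of the model directory.
 */
function modelFileSketch(string $location, string $name): string
{
    return rtrim($location, '/\\') . DIRECTORY_SEPARATOR . $name
        . DIRECTORY_SEPARATOR . 'model.json';
}

function saveSketch(string $location, string $name, array $model): void
{
    $file = modelFileSketch($location, $name);
    $directory = dirname($file);
    // Create the named directory inside the given location if it is missing.
    if (!is_dir($directory) && !mkdir($directory, 0775, true)) {
        throw new RuntimeException("Cannot create directory: $directory");
    }
    $json = json_encode($model, JSON_THROW_ON_ERROR);
    if (file_put_contents($file, $json) === false) {
        throw new RuntimeException("Cannot write model file: $file");
    }
}

function loadSketch(string $location, string $name): array
{
    $file = modelFileSketch($location, $name);
    if (!is_file($file)) {
        throw new RuntimeException("Model file not found: $file");
    }
    $json = file_get_contents($file);
    if ($json === false) {
        throw new RuntimeException("Cannot read model file: $file");
    }
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

// A made-up trained model: parameters only, no training data.
$model = ['numFeatures' => 2, 'x' => [[0.5, 1.25]], 'theta' => [[1.5, -0.75]]];

saveSketch(sys_get_temp_dir(), 'recommender_sketch', $model);
$restored = loadSketch(sys_get_temp_dir(), 'recommender_sketch');

var_dump($restored === $model);
```

With this split, a scheduled job can train and save the model, while the website only restores it and predicts.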

Impulse-ML: Recommender is a PHP library which can be used to serve personalized content to users on your website. It is written in PHP and requires no additional dependencies. With its OOP API you can achieve good prediction results and quickly apply a recommender system in any PHP application, e.g. in WordPress, Drupal or any application based on a PHP framework.

Machine Learning

A recommender system solves a machine learning problem. Given items (e.g. movies) can be rated by users (e.g. with a 0 - 5 star rating). From the given rating data the Recommender System can:

- predict ratings of movies which the user has not rated yet
- find similar movies
- produce a prediction for a user who has not rated any movie.

Impulse-ML: Recommender uses the Collaborative Filtering algorithm, so it is not required to provide item features, which can be understood as real item categories (e.g. comedy or action movie, and their values), nor category features, which can be understood as user preferences. The system learns them by itself from only the given items, categories and defined ratings.

As long as you choose the Learning Model parameters and Training parameters well, you can get a pretty good prediction of the rating of a movie which the user has not rated yet, assuming that the more ratings you provide, the more accurate the predictions you will get.

Impulse-ML: Recommender uses the gradient descent learning algorithm.

For general details about Recommender Systems you might consider visiting Wikipedia - Recommender System to get an intuition of what is going on under the hood.