├── .DS_Store ├── Deep Learning.md ├── ML Project Checklist.md ├── Machine Learning.md ├── More Resources.md ├── README.md ├── The ML Landscape.md ├── img ├── model1.png ├── model2.png ├── model3.png ├── precision-recall.png └── reinforcement-learning.png ├── numpy-pandas ├── .DS_Store ├── 01-numpy.ipynb ├── 02-example.ipynb ├── 02-pandas.ipynb ├── 03-plt.ipynb ├── README.md ├── data │ ├── state-abbrevs.csv │ ├── state-areas.csv │ └── state-population.csv ├── img │ ├── .DS_Store │ ├── axis=1.jpg │ └── groupby.png ├── plt1.py ├── tools_matplotlib.ipynb ├── tools_numpy.ipynb ├── tools_pandas.ipynb └── very-basics │ ├── Readme.md │ ├── img │ └── plt.png │ └── nb │ └── 01_plt.ipynb ├── scikit-learn ├── Readme.md ├── car_evaluation.csv ├── img │ ├── process.png │ ├── process1.png │ └── scikit-learn.png ├── k-means-clustering.ipynb ├── knn.ipynb ├── linear_regression.ipynb ├── logistic_regression.ipynb ├── svm.ipynb └── train_test_split.ipynb └── tensorflow-in-practice ├── .DS_Store ├── Exercises ├── Course_3_Week_1_Exercise_(Tokenizer_BBC_Text).ipynb ├── Course_3_Week_2_Exercise_(BBC_Text_Model_Building).ipynb ├── Course_3_Week_3_Exercise_Twitter_Fake_News.ipynb ├── Exercise_1_House_Prices.ipynb ├── Exercise_2_Handwriting_Recognition_DNN.ipynb ├── Exercise_3_CNN.ipynb ├── Exercise_4_Complex_Images_flow_from_directory.ipynb ├── Exercise_5_Cat_vs_Dog_Kaggle.ipynb ├── Exercise_6_Cats_vs_Dogs_with_Augmentation.ipynb └── Exercise_7_Transfer_learning.ipynb ├── MNIST ├── my_model.h5 ├── test.py └── train.py ├── README.md ├── convolutional-neural-networks-tensorflow.md ├── img ├── 0.jpg ├── 1.png ├── 2.jpg ├── 3.jpg ├── fibonacci.png ├── fp.png ├── fp2.png ├── lstm.png ├── lstm2.png ├── metrics.png ├── ml_architecture.png ├── rfp.png ├── rnn.png ├── rnn2.png ├── seasonality.png ├── tf_datasets.png ├── trend.png ├── ts.png ├── tsn.png └── word_embeddings.png ├── introduction-to-tensorflow-for-ai.md ├── natural-language-processing-tensorflow.md ├── notebooks ├── .DS_Store ├── Course_1_Part_2_Lesson_2_Notebook.ipynb ├── Course_1_Part_4_Lesson_2_Notebook.ipynb ├── Course_1_Part_6_Lesson_2_Notebook.ipynb ├── Course_1_Part_6_Lesson_3_Notebook.ipynb ├── Course_1_Part_8_Lesson_2_Notebook.ipynb ├── Course_2_Part_2_Lesson_2_Notebook.ipynb ├── Course_2_Part_4_Lesson_2_Notebook_(Cats_v_Dogs_Augmentation).ipynb ├── Course_2_Part_6_Lesson_3_Notebook_(Transfer_Learning).ipynb ├── Course_2_Part_8_Lesson_2_Notebook_(RockPaperScissors).ipynb ├── Course_3_Week_1(Tokenizer-Sarcasm-Dataset).ipynb ├── Course_3_Week_2(Model_Training_IMDB_Reviews).ipynb ├── Course_3_Week_2(Sarcasm-Classifier).ipynb ├── Course_3_Week_2(Subwords).ipynb ├── Course_3_Week_3(IMDB).ipynb ├── Course_3_Week_4_Lesson_1_(Sheckspire_Text_Generation).ipynb ├── Course_3_Week_4_Lesson_2_Notebook.ipynb ├── README.md ├── irish-lyrics-eof.txt ├── meta.tsv ├── sarcasm.json └── vecs.tsv └── sequences-time-series-and-prediction.md /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/.DS_Store -------------------------------------------------------------------------------- /Deep Learning.md: -------------------------------------------------------------------------------- 1 | # Neural Networks and Deep Learning 2 | 3 | - Building and training with TensorFlow and Keras 4 | - Architectures: feedforward for tabular data, CNN for computer vision, RNN and LSTM for sequence processing 5 | - Encoder / decoder and Transformers for 
NLP
- Autoencoders and Generative Adversarial Networks (GANs) for generative learning
- Techniques for training deep neural networks (DNNs)
- Reinforcement learning - building an agent to play a game
- Loading and preprocessing large amounts of data
- Training and deploying at scale

## Contents
- Introduction to ANN with Keras
- [Sequential API](#Sequential-API), classification & regression
- [Functional API](#Functional-API)
- Subclassing API for dynamic models
- [Using Callbacks](#Using-Callbacks), EarlyStopping, ModelCheckpoint
- [TensorBoard](#TensorBoard)
- [Fine-Tuning Neural Network Hyperparameters](#Fine-Tuning-Neural-Network-Hyperparameters)

### Sequential API
```py
"""Classification MLP"""
# "sparse_categorical_crossentropy": integer class labels (0 to 9)
# "categorical_crossentropy": one-hot labels, e.g. [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]
# For binary classification, use a "sigmoid" (i.e., logistic) activation function in the output layer instead of "softmax", and the "binary_crossentropy" loss.

# assumes `model` has already been defined (e.g., a Sequential MLP ending in a 10-unit softmax layer)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
model.evaluate(X_test, y_test)
y_proba = model.predict(X_new)
y_pred = model.predict_classes(X_new)  # deprecated in recent Keras; use np.argmax(model.predict(X_new), axis=1)

# History
import pandas as pd
import matplotlib.pyplot as plt
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]
plt.show()
```
- If you want to convert sparse labels (i.e., class indices) to one-hot vector labels, use the `keras.utils.to_categorical()` function. To go the other way round, use the `np.argmax()` function with `axis=1`.

- You must **compile** the model, **train** it, **evaluate** it, and use it to **make predictions**.

- `.fit()` also accepts `validation_split=0.1`, `class_weight`, and `sample_weight`.

```py
"""Regression MLP"""
from tensorflow import keras  # assumed import, not shown in the original snippet
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    keras.layers.Dense(1)
])

model.compile(loss="mean_squared_error", optimizer="sgd")
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
mse_test = model.evaluate(X_test, y_test)
X_new = X_test[:3] # pretend these are new instances
y_pred = model.predict(X_new)
```
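To illustrate the label-conversion note above, a tiny sketch (the label values here are made up):

```py
import numpy as np
from tensorflow import keras

y_sparse = np.array([3, 0, 9])                   # sparse labels (class indices)
y_onehot = keras.utils.to_categorical(y_sparse)  # one-hot vector labels
y_back = np.argmax(y_onehot, axis=1)             # back to class indices: array([3, 0, 9])
```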
### Functional API
- A simple model where the input is concatenated with the output of the hidden layers (a "wide & deep" network with a single input):
```py
input_ = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(30, activation="relu")(input_)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.Concatenate()([input_, hidden2])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[input_], outputs=[output])
```
- Handling multiple inputs (a wide path and a deep path):
```py
input_A = keras.layers.Input(shape=[5], name="wide_input")
input_B = keras.layers.Input(shape=[6], name="deep_input")
hidden1 = keras.layers.Dense(30, activation="relu")(input_B)
hidden2 = keras.layers.Dense(30, activation="relu")(hidden1)
concat = keras.layers.concatenate([input_A, hidden2])
output = keras.layers.Dense(1, name="output")(concat)
model = keras.Model(inputs=[input_A, input_B], outputs=[output])

# As we have two inputs, we must pass a pair of input matrices to .fit(), .evaluate(), .predict(), and so on
model.compile(loss="mse", optimizer=keras.optimizers.SGD(lr=1e-3))
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_valid_A, X_valid_B = X_valid[:, :5], X_valid[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]

history = model.fit((X_train_A, X_train_B), y_train, epochs=20, validation_data=((X_valid_A, X_valid_B), y_valid))
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))
```
- Handling multiple outputs, e.g. adding an auxiliary output for regularization:
```py
[...] # Same as above, up to the main output layer
output = keras.layers.Dense(1, name="main_output")(concat)
aux_output = keras.layers.Dense(1, name="aux_output")(hidden2)
model = keras.Model(inputs=[input_A, input_B], outputs=[output, aux_output])

model.compile(loss=["mse", "mse"], loss_weights=[0.9, 0.1], optimizer="sgd")
history = model.fit(
    [X_train_A, X_train_B], [y_train, y_train], epochs=20,
    validation_data=([X_valid_A, X_valid_B], [y_valid, y_valid]))

total_loss, main_loss, aux_loss = model.evaluate(
    [X_test_A, X_test_B], [y_test, y_test])
y_pred_main, y_pred_aux = model.predict([X_new_A, X_new_B])
```
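The Contents list also mentions the Subclassing API for dynamic models, which is not covered by the snippets above. A minimal sketch (an assumed example in the spirit of the wide & deep models above, not from the original notes):

```py
class WideAndDeepModel(keras.Model):
    def __init__(self, units=30, activation="relu", **kwargs):
        super().__init__(**kwargs)  # handles standard args (e.g., name)
        self.hidden1 = keras.layers.Dense(units, activation=activation)
        self.hidden2 = keras.layers.Dense(units, activation=activation)
        self.main_output = keras.layers.Dense(1)

    def call(self, inputs):
        # the forward pass is ordinary Python, so it can contain loops, conditionals, etc.
        hidden1 = self.hidden1(inputs)
        hidden2 = self.hidden2(hidden1)
        concat = keras.layers.concatenate([inputs, hidden2])
        return self.main_output(concat)

model = WideAndDeepModel()  # compile / fit / evaluate exactly as with the other APIs
```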
### Using Callbacks
```py
"""ModelCheckpoint will only save your model when its performance on the validation set is the best so far"""
checkpoint_cb = keras.callbacks.ModelCheckpoint("my_keras_model.h5", save_best_only=True)

history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb])
model = keras.models.load_model("my_keras_model.h5") # roll back to best model
```
```py
"""EarlyStopping will interrupt training when it measures no progress on the validation set for a number of epochs (defined by the patience argument)"""
early_stopping_cb = keras.callbacks.EarlyStopping(patience=10,
                                                  restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=100,
                    validation_data=(X_valid, y_valid),
                    callbacks=[checkpoint_cb, early_stopping_cb])
```
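You can also define your own callback by subclassing `keras.callbacks.Callback`. A minimal sketch (assuming the same model and data as above) that prints the validation/training loss ratio at the end of each epoch:

```py
class PrintValTrainRatioCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # logs holds the metrics for this epoch, e.g. "loss" and "val_loss"
        print("\nval/train loss ratio: {:.2f}".format(logs["val_loss"] / logs["loss"]))

history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_valid, y_valid),
                    callbacks=[PrintValTrainRatioCallback()])
```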
### TensorBoard
```py
import os

root_logdir = os.path.join(os.curdir, "my_logs")

def get_run_logdir():
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir() # e.g., './my_logs/run_2019_06_07-15_15_22'

[...] # Build and compile your model
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid),
                    callbacks=[tensorboard_cb])
```

### Fine-Tuning Neural Network Hyperparameters
```py
def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[8]):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    for layer in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    model.add(keras.layers.Dense(1))
    optimizer = keras.optimizers.SGD(lr=learning_rate)
    model.compile(loss="mse", optimizer=optimizer)
    return model

# Wrap the Keras model in a scikit-learn regressor object
keras_reg = keras.wrappers.scikit_learn.KerasRegressor(build_model)

keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

mse_test = keras_reg.score(X_test, y_test)
y_pred = keras_reg.predict(X_new)

import numpy as np  # assumed import for np.arange below
from scipy.stats import reciprocal
from sklearn.model_selection import RandomizedSearchCV

param_distribs = {
    "n_hidden": [0, 1, 2, 3],
    "n_neurons": np.arange(1, 100),
    "learning_rate": reciprocal(3e-4, 3e-2),
}
rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=10, cv=3)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])

rnd_search_cv.best_params_
rnd_search_cv.best_score_

model = rnd_search_cv.best_estimator_.model
```

--------------------------------------------------------------------------------
/ML Project Checklist.md:
--------------------------------------------------------------------------------
This checklist can guide you through your Machine Learning projects. There are eight main steps:

1. Frame the problem and look at the big picture.
2. Get the data.
3. Explore the data to gain insights.
4. Prepare the data to better expose the underlying data patterns to Machine Learning algorithms.
5. Explore many different models and short-list the best ones.
6. Fine-tune your models and combine them into a great solution.
7. Present your solution.
8. Launch, monitor, and maintain your system.

Obviously, you should feel free to adapt this checklist to your needs.

# Frame the problem and look at the big picture
1. Define the objective in business terms.
2. How will your solution be used?
3. What are the current solutions/workarounds (if any)?
4. How should you frame this problem (supervised/unsupervised, online/offline, etc.)?
5. How should performance be measured?
6. Is the performance measure aligned with the business objective?
7. What would be the minimum performance needed to reach the business objective?
8. What are comparable problems? Can you reuse experience or tools?
9. Is human expertise available?
10. How would you solve the problem manually?
11. List the assumptions you or others have made so far.
12. Verify assumptions if possible.

# Get the data
Note: automate as much as possible so you can easily get fresh data.

1. List the data you need and how much you need.
2. Find and document where you can get that data.
3. Check how much space it will take.
4. Check legal obligations, and get authorization if necessary.
5. Get access authorizations.
6. Create a workspace (with enough storage space).
7. Get the data.
8. Convert the data to a format you can easily manipulate (without changing the data itself).
9. Ensure sensitive information is deleted or protected (e.g., anonymized).
10. Check the size and type of data (time series, sample, geographical, etc.).
11. Sample a test set, put it aside, and never look at it (no data snooping!).

# Explore the data
Note: try to get insights from a field expert for these steps.

1. Create a copy of the data for exploration (sampling it down to a manageable size if necessary).
2. Create a Jupyter notebook to keep a record of your data exploration.
3. Study each attribute and its characteristics:
    - Name
    - Type (categorical, int/float, bounded/unbounded, text, structured, etc.)
    - % of missing values
    - Noisiness and type of noise (stochastic, outliers, rounding errors, etc.)
    - Possibly useful for the task?
    - Type of distribution (Gaussian, uniform, logarithmic, etc.)
4. For supervised learning tasks, identify the target attribute(s).
5. Visualize the data.
6. Study the correlations between attributes.
7. Study how you would solve the problem manually.
8. Identify the promising transformations you may want to apply.
9. Identify extra data that would be useful (go back to "Get the Data" on page 502).
10. Document what you have learned.

# Prepare the data
Notes:
- Work on copies of the data (keep the original dataset intact).
- Write functions for all data transformations you apply, for five reasons:
    - So you can easily prepare the data the next time you get a fresh dataset
    - So you can apply these transformations in future projects
    - To clean and prepare the test set
    - To clean and prepare new data instances
    - To make it easy to treat your preparation choices as hyperparameters

1. Data cleaning:
    - Fix or remove outliers (optional).
    - Fill in missing values (e.g., with zero, mean, median...) or drop their rows (or columns).
2. Feature selection (optional):
    - Drop the attributes that provide no useful information for the task.
3. Feature engineering, where appropriate:
    - Discretize continuous features.
    - Decompose features (e.g., categorical, date/time, etc.).
    - Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
    - Aggregate features into promising new features.
4. Feature scaling: standardize or normalize features.

# Short-list promising models
Notes:
- If the data is huge, you may want to sample smaller training sets so you can train many different models in a reasonable time (be aware that this penalizes complex models such as large neural nets or Random Forests).
- Once again, try to automate these steps as much as possible.

1. Train many quick and dirty models from different categories (e.g., linear, naive Bayes, SVM, Random Forests, neural net, etc.) using standard parameters.
2. Measure and compare their performance.
    - For each model, use N-fold cross-validation and compute the mean and standard deviation of their performance.
3. Analyze the most significant variables for each algorithm.
4. Analyze the types of errors the models make.
    - What data would a human have used to avoid these errors?
5. Have a quick round of feature selection and engineering.
6. Have one or two more quick iterations of the five previous steps.
7. Short-list the top three to five most promising models, preferring models that make different types of errors.

# Fine-Tune the System
Notes:
- You will want to use as much data as possible for this step, especially as you move toward the end of fine-tuning.
- As always, automate what you can.

1. Fine-tune the hyperparameters using cross-validation.
    - Treat your data transformation choices as hyperparameters, especially when you are not sure about them (e.g., should I replace missing values with zero or the median value? Or just drop the rows?).
    - Unless there are very few hyperparameter values to explore, prefer random search over grid search. If training is very long, you may prefer a Bayesian optimization approach (e.g., using Gaussian process priors, as described by Jasper Snoek, Hugo Larochelle, and Ryan Adams ([https://goo.gl/PEFfGr](https://goo.gl/PEFfGr)))
2. Try Ensemble methods. Combining your best models will often perform better than running them individually.
3. Once you are confident about your final model, measure its performance on the test set to estimate the generalization error.

> Don't tweak your model after measuring the generalization error: you would just start overfitting the test set.

# Present your solution
1. Document what you have done.
2. Create a nice presentation.
    - Make sure you highlight the big picture first.
3. Explain why your solution achieves the business objective.
4. Don't forget to present interesting points you noticed along the way.
    - Describe what worked and what did not.
    - List your assumptions and your system's limitations.
5. Ensure your key findings are communicated through beautiful visualizations or easy-to-remember statements (e.g., "the median income is the number-one predictor of housing prices").

# Launch!
1. Get your solution ready for production (plug into production data inputs, write unit tests, etc.).
2. Write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.
    - Beware of slow degradation too: models tend to "rot" as data evolves.
    - Measuring performance may require a human pipeline (e.g., via a crowdsourcing service).
    - Also monitor your inputs' quality (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale). This is particularly important for online learning systems.
3. Retrain your models on a regular basis on fresh data (automate as much as possible).

--------------------------------------------------------------------------------
/Machine Learning.md:
--------------------------------------------------------------------------------
# Machine Learning

- Handling, cleaning, and preparing data.
- Selecting and engineering features.
- Learning by fitting a model to data.
- Optimizing a cost function.
- Selecting a model and tuning hyperparameters using cross-validation.
- Underfitting and overfitting (the bias/variance tradeoff).
- Unsupervised learning techniques: clustering, density estimation and anomaly detection.
10 | - Algorithms: Linear and Polynomial Regression, Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, and Ensemble methods. 11 | 20 | 21 | ## End-to-End Machine Learning Project 22 | - Frame the problem and look at the big picture 23 | - Goal and Performance measure 24 | - Get the data 25 | - [Create test set](#Create-test-set) 26 | - Explore the data to gain insights (EDA) 27 | - [Looking for correlations](#Looking-for-Correlations) 28 | - Experimenting with attribute combinations 29 | - Prepare data for ML algorithms 30 | - [Data cleaning](#Data-cleaning) 31 | - [Handling text and categorical attributes](#Handling-text-and-categorical-attributes) 32 | - [Feature Scaling](#Feature-Scaling) 33 | - [Transformation Pipelines](#Transformation-Pipelines) 34 | - [Explore many different models and short-list the best ones](#Select-and-Train-a-Model) 35 | - [Cross-Validation](#Cross-Validation) 36 | - Fine-tune models and combine them into a great solution 37 | - [Grid Search](#Grid-Search) 38 | - [Randomized Search](#Randomized-Search) 39 | - Ensemble models 40 | - Evaluate on test set 41 | - Launch and monitor 42 | 43 | ## Get the data 44 | ### Create test set 45 | ```py 46 | '''Create test set''' 47 | from sklearn.model_selection import train_test_split 48 | train_set, test_set = train_test_split(data, test_size=0.2, random_state=42) 49 | ``` 50 | ```py 51 | X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) 52 | ``` 53 | ```py 54 | from sklearn.model_selection import StratifiedShuffleSplit 55 | import numpy as np 56 | X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]]) 57 | y = np.array([0, 0, 0, 1, 1, 1]) 58 | sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0) 59 | sss.get_n_splits(X, y) 60 | 61 | for train_index, test_index in sss.split(X, y): 62 | print("TRAIN:", train_index, "TEST:", test_index) 63 | X_train, X_test = X[train_index], X[test_index] 64 | y_train, y_test = y[train_index], y[test_index] 65 | ``` 66 | 67 | ## Explore the data to gain insights 68 | ```py 69 | '''Visualizing Geographical Data''' 70 | data.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1) 71 | ``` 72 | ```py 73 | housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4, 74 | s=housing["population"]/100, label="population", figsize=(10,7), 75 | c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True, 76 | ) 77 | plt.legend() 78 | ``` 79 | ### Looking for Correlations 80 | ```py 81 | '''Looking for Correlations''' 82 | corr_matrix = data.corr() 83 | corr_matrix["any_column"].sort_values(ascending=False) 84 | 85 | from pandas.plotting import scatter_matrix 86 | attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"] 87 | scatter_matrix(housing[attributes], figsize=(12, 8)) 88 | ``` 89 | ```py 90 | # Correlations between features 91 | all_data_corr = all_data.corr().abs().unstack().sort_values(kind="quicksort", ascending=False).reset_index() 92 | all_data_corr.rename(columns={"level_0": "Feature 1", "level_1": "Feature 2", 0: 'Correlation Coefficient'}, inplace=True) 93 | all_data_corr.drop(all_data_corr.iloc[1::2].index, inplace=True) 94 | all_data_corr_nd = all_data_corr.drop(all_data_corr[all_data_corr['Correlation Coefficient'] == 1.0].index) 95 | 96 | corr = all_data_corr_nd['Correlation Coefficient'] > 0.1 97 | all_data_corr_nd[corr] 98 | ``` 99 | ```py 100 | # pivot_table() vs groupby(), the below lines are the same 101 | 
pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum) 102 | df.groupby(['a','b'])['c'].sum() 103 | ``` 104 | ```py 105 | # Aggregate using one or more operations over the specified axis 106 | # agg()-can be applied to multiple groups together 107 | df.agg(['sum', 'min']) 108 | df_all.groupby(['Sex', 'Pclass']).agg(lambda x:x.value_counts().index[0])['Embarked'] 109 | 110 | # Apply a function along an axis of the DataFrame 111 | # apply()-cannot be applied to multiple groups together 112 | df.apply(np.sqrt) 113 | df_all['Deck'] = df_all['Cabin'].apply(lambda s: s[0] if pd.notnull(s) else 'M') 114 | ``` 115 | 116 | ## Prepare data for ML algorithms 117 | - https://stackoverflow.com/questions/48673402/how-can-i-standardize-only-numeric-variables-in-an-sklearn-pipeline 118 | - https://scikit-learn.org/stable/modules/preprocessing.html 119 | 120 | ### Data Cleaning 121 | ```py 122 | housing.dropna(subset=["total_bedrooms"]) # Get rid of the corresponding districts 123 | housing.drop("total_bedrooms", axis=1) # Get rid of the whole attribute 124 | median = housing["total_bedrooms"].median() # Set the values to some value (zero, mean, median) 125 | housing["total_bedrooms"].fillna(median, inplace=True) 126 | ``` 127 | ```py 128 | '''SimpleImputer, filling with the missing numerical attributes with the "median"''' 129 | from sklearn.impute import SimpleImputer 130 | imputer = SimpleImputer(strategy="median") 131 | housing_num = housing.select_dtypes(include=[np.number]) # just numerical attributes 132 | imputer.fit(housing_num) # "trained" inputer, now it is ready to transform the training set by replacing missing values with the learned medians 133 | imputer.statistics_ # same as "housing_num.median().values" 134 | X = imputer.transform(housing_num) 135 | housing_tr = pd.DataFrame(X, columns=housing_num.columns, 136 | index=housing.index) # new dataframe 137 | ``` 138 | 139 | ### Handling Text and Categorical Attributes 140 | - [select_dtypes](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html) 141 | ```py 142 | '''Transforming continuous numerical attributes to categorical''' 143 | housing["income_cat"] = pd.cut(housing["median_income"], 144 | bins=[0., 1.5, 3.0, 4.5, 6., np.inf], 145 | labels=[1, 2, 3, 4, 5]) 146 | ``` 147 | ```py 148 | '''Categorical Attributes''' 149 | from sklearn.preprocessing import OrdinalEncoder 150 | from sklearn.preprocessing import OneHotEncoder 151 | 152 | housing_cat = housing[["ocean_proximity"]] 153 | 154 | ordinal_encoder = OrdinalEncoder() 155 | housing_cat_encoded = ordinal_encoder.fit_transform(housing_cat) 156 | 157 | housing_cat_encoded[:10] 158 | # array([[0.], 159 | # [0.], 160 | # [4.], 161 | # [1.], 162 | # [0.], 163 | # [1.], 164 | # [0.], 165 | # [1.], 166 | # [0.], 167 | # [0.]]) 168 | 169 | ordinal_encoder.categories_ # [array(['<1H OCEAN', 'INLAND', 'ISLAND', 'NEAR BAY', 'NEAR OCEAN'], dtype=object)] 170 | 171 | cat_encoder = OneHotEncoder(sparse=False) 172 | housing_cat_1hot = cat_encoder.fit_transform(housing_cat) 173 | housing_cat_1hot 174 | # array([[1., 0., 0., 0., 0.], 175 | # [1., 0., 0., 0., 0.], 176 | # [0., 0., 0., 0., 1.], 177 | # ..., 178 | # [0., 1., 0., 0., 0.], 179 | # [1., 0., 0., 0., 0.], 180 | # [0., 0., 0., 1., 0.]]) 181 | ``` 182 | ### Feature Scaling 183 | ```py 184 | '''StandardScaler''' 185 | from sklearn.preprocessing import StandardScaler 186 | import numpy as np 187 | 188 | X_train = np.array([[ 1., -1., 2.], 189 | [ 2., 0., 0.], 190 | [ 0., 1., -1.]]) 191 | scaler = 
StandardScaler().fit(X_train)

scaler.mean_
scaler.scale_

X_scaled = scaler.transform(X_train)
X_scaled
```
```py
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[ 1., -1., 2.],
                    [ 2., 0., 0.],
                    [ 0., 1., -1.]])

min_max_scaler = MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)
X_train_minmax
# array([[0.5       , 0.        , 1.        ],
#        [1.        , 0.5       , 0.33333333],
#        [0.        , 1.        , 0.        ]])

# For the test data, we just need to use .transform()
X_test = np.array([[-3., -1., 4.]])
X_test_minmax = min_max_scaler.transform(X_test)
X_test_minmax
# array([[-1.5       ,  0.        ,  1.66666667]])
```

### Custom Transformer
```py
from sklearn.base import BaseEstimator, TransformerMixin

# column index
rooms_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True): # no *args or **kargs
        self.add_bedrooms_per_room = add_bedrooms_per_room
    def fit(self, X, y=None):
        return self # nothing else to do
    def transform(self, X):
        rooms_per_household = X[:, rooms_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
```

### Transformation Pipelines
```py
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

# housing_num_tr = num_pipeline.fit_transform(housing_num)

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared # to get access to the new dataset
```
## Select and Train a Model
- Before using `.predict()` you have to use `full_pipeline.transform(some_data)`

### Cross-Validation
```py
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, data, labels, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard deviation:", scores.std())

display_scores(rmse_scores)
```
```py
'''Save the model'''
import joblib
joblib.dump(my_model, "my_model.pkl") # to save model
my_model_loaded = joblib.load("my_model.pkl") # to load model
```


## Fine-tune Models
### Grid Search
```py
from sklearn.model_selection import
GridSearchCV 300 | 301 | param_grid = [ 302 | # try 12 (3×4) combinations of hyperparameters 303 | {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]}, 304 | # then try 6 (2×3) combinations with bootstrap set as False 305 | {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]}, 306 | ] 307 | 308 | forest_reg = RandomForestRegressor(random_state=42) 309 | # train across 5 folds, that's a total of (12+6)*5=90 rounds of training 310 | grid_search = GridSearchCV(forest_reg, param_grid, cv=5, 311 | scoring='neg_mean_squared_error', 312 | return_train_score=True) 313 | grid_search.fit(housing_prepared, housing_labels) 314 | 315 | grid_search.best_params_ # the best hyperparameters 316 | grid_search.best_estimator_ 317 | 318 | # look at the score of each hyperparameter combination tested during the grid search: 319 | cvres = grid_search.cv_results_ 320 | for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]): 321 | print(np.sqrt(-mean_score), params) 322 | ``` 323 | 324 | ### Randomized Search 325 | ```py 326 | from sklearn.model_selection import RandomizedSearchCV 327 | from scipy.stats import randint 328 | 329 | param_distribs = { 330 | 'n_estimators': randint(low=1, high=200), 331 | 'max_features': randint(low=1, high=8), 332 | } 333 | 334 | forest_reg = RandomForestRegressor(random_state=42) 335 | rnd_search = RandomizedSearchCV(forest_reg, param_distributions=param_distribs, 336 | n_iter=10, cv=5, scoring='neg_mean_squared_error', random_state=42) 337 | rnd_search.fit(housing_prepared, housing_labels) 338 | 339 | # looking at the scores during training 340 | cvres = rnd_search.cv_results_ 341 | for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]): 342 | print(np.sqrt(-mean_score), params) 343 | 344 | feature_importances = grid_search.best_estimator_.feature_importances_ 345 | ``` -------------------------------------------------------------------------------- /More Resources.md: -------------------------------------------------------------------------------- 1 | ## What is Data Science? 2 | - [What really is Data Science? 
](https://youtu.be/xC-c7E5PK0Y) 3 | - https://telegra.ph/What-REALLY-is-Data-Science-09-21 4 | 5 | ### Just leaving it here 6 | - [Data Science Interview at Facebook](https://tproger.ru/translations/preparing-for-data-science-interview/) 7 | 8 | ### Advice 9 | - [12 Things I Learned During My First Year as a Machine Learning Engineer](https://proglib.io/w/464d1326) 10 | - [How to Learn Machine Learning, The Self-Starter Way](https://elitedatascience.com/learn-machine-learning) 11 | - [Andrew Ng: Advice on Getting Started in Deep Learning](https://youtu.be/1k37OcjH7BM) 12 | - [Andrew Ng Machine Learning Career](https://youtu.be/hkagmGAu74Y) 13 | 14 | ### Technical Articles / Videos 15 | - [Cheat sheat Stanford](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks) 16 | - [Loss function for neural networks CNN](https://towardsdatascience.com/understanding-different-loss-functions-for-neural-networks-dd1ed0274718) 17 | - [Backprop in CNN](https://medium.com/@pavisj/convolutions-and-backpropagations-46026a8f5d2c) 18 | - [Backprop in NN](https://youtu.be/0e0z28wAWfg) 19 | - [Introduction to Backpropagation and Optimization](https://ai.plainenglish.io/approach-complex-functions-with-backpropagation-how-i-was-applying-to-yandex-c5f68d50f2da) 20 | 21 | ### Courses 22 | - http://introtodeeplearning.com 23 | - https://openlearninglibrary.mit.edu/courses/course-v1:MITx+6.036+1T2019/about 24 | - [Fast AI DL](https://www.fast.ai) 25 | - [Coursera Data Science](https://www.coursera.org/specializations/data-science-python?ranMID=40328&ranEAID=EBOQAYvGY4A&ranSiteID=EBOQAYvGY4A-xBZ6HIoQD.6tLROsD7db4g&siteID=EBOQAYvGY4A-xBZ6HIoQD.6tLROsD7db4g&utm_content=2&utm_medium=partners&utm_source=linkshare&utm_campaign=EBOQAYvGY4A) 26 | - [Fast AI ML course](https://course18.fast.ai/ml) 27 | 28 | ### Bookshelf 29 | - [Data Science books](https://proglib.io/w/0232ad78) 30 | 31 | ### YouTube 32 | - [DeepLizard](https://www.youtube.com/c/deeplizard/playlists) 33 | 34 | ### QA 35 | - Validation accuracy is higher than training accuracy. 36 | - https://www.quora.com/Can-validation-accuracy-be-higher-than-training-accuracy 37 | 38 | ### Resources where you can find the latest publications from leading laboratories 39 | - https://openai.com/blog/tags/research/ 40 | - https://deepmind.com/research 41 | - https://www.microsoft.com/en-us/research/research-area/artificial-intelligence 42 | - https://www.research.ibm.com/artificial-intelligence/#publications 43 | - https://ai.stanford.edu 44 | - https://www.csail.mit.edu 45 | - https://ai.google/research/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Machine Learning Area 2 | 3 | [Rustam-Z🚀](https://t.me/rz_zokirov) • [Find more here](https://t.me/rz_zokirov_ml) 4 | 5 | > 1% better every day = 3700% better at the end of the year 6 | 7 | > The goal is to solve problems and help society with the help of AI. 8 | 9 | ## Why should you learn Machine Learning? 10 | 11 | [First of all, understand the difference between AI / Data Science / Machine Learning](https://telegra.ph/AI--Data-Science--Machine-Learning--Deep-Learning--Data-Analysis--Data-Engineering--Big-Data-09-09) 12 | 13 | I found two good answers on why you should care. Firstly, **Machine Learning (ML)** is making computers do things that we’ve never made computers do before. 
If you want to do something new, not just new to you, but to the world, you can do it with ML. 14 | 15 | Secondly, if you don’t influence the world, the world will influence you. 16 | 17 | If you focus on results, you will never change. 18 | If you focus on change, you will get results. 19 | 20 | ## How to study? 21 | - **First, learn to learn.** 22 | - [Thinking of Self-Studying Machine Learning? Remind yourself of these 6 things](https://towardsdatascience.com/thinking-of-self-studying-machine-learning-remind-yourself-of-these-6-things-b55a5f2b6c7d) 23 | - [How to Learn Machine Learning](https://elitedatascience.com/learn-machine-learning) 24 | 25 | ## Roadmap 26 | - **Math (Calculus, Linear Algebra, Propability & Statistics)** 27 | - [Calculus](https://www.youtube.com/playlist?list=PLmdFyQYShrjd4Qn42rcBeFvF6Qs-b6e-L), *Don't Memorize* 28 | - [Caclulus](https://youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr), *3Blue1Brown* 29 | - [Linear Algebra](https://youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab), *3Blue1Brown* 30 | - [Statistics & Probability](https://www.khanacademy.org/math/statistics-probability) 31 | - **Python** 32 | - [My Python learning roadmap](https://github.com/Rustam-Z/learning-area#1-start-learning-python) 33 | - [NumPy](https://www.w3schools.com/python/numpy/default.asp), [Pandas](https://www.w3schools.com/python/pandas/default.asp), [Matplotlib](https://www.w3schools.com/python/matplotlib_intro.asp) 34 | - [10 minutes to Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) 35 | - **Machine Learning** 36 | - "Deep learning with Python", book, 1st part 37 | - Machine Learning Course, Andrew Ng, coursera.org 38 | - **Scikit-Learn** 39 | - [freeCodeCamp.org](https://youtu.be/0B5eIE_1vpU) 40 | - https://inria.github.io/scikit-learn-mooc/ 41 | - https://scikit-learn.org/stable/tutorial/index.html 42 | - **Deep Learning** - Start solving [Kaggle](https://github.com/Rustam-Z/kaggle-problem-solving) 43 | - TensorFlow Developer Specialization, deeplearning.ai, coursera.org 44 | - OR "AI and Machine Learning for Coders", book 45 | - "Deep learning with Python", book, 2nd part 46 | - ["Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow"](https://github.com/ageron/handson-ml2), book 47 | - **fast.ai** 48 | - "Deep learning", MIT press, book 49 | - Deep Learning Specialization, Andrew Ng, coursera.org 50 | - TensorFlow Advanced Techniques, deeplearning.ai, coursera.org 51 | - **Data Science** 52 | - "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython", book 53 | - "Python Data Science Handbook", book 54 | - **More** 55 | - Applied Machine Learning: https://machinelearningmastery.com/start-here 56 | 57 | ## ML Cheatsheets 58 | * [Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data](https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463) `numpy`, `pandas`, `sklearn`, `ml`, `dl` 59 | * [Machine Learning](https://stanford.edu/~shervine/teaching/cs-229/) 60 | 61 | *Please, consider this repository for contributing too!* 62 | 63 | 83 | -------------------------------------------------------------------------------- /The ML Landscape.md: -------------------------------------------------------------------------------- 1 | ## The Machine Learning Landscape 2 | ### What is Machine Learning? 3 | - Machine learning (ML) is field of study that gives computers the ability to learn without being explicitly programmed. 
- A computer program is said to learn from *experience E* with respect to some *task T* and some *performance measure P*, if its performance on T, as measured by P, improves with experience E.
- **Example:** T = flag spam for new emails, E = the training data, P = accuracy, the ratio of correctly classified emails.

### Why use ML?
- Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better. (spam classifier)
- Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution. (speech recognition)
- Fluctuating environments: a Machine Learning system can adapt to new data.
- Getting insights about complex problems and large amounts of data. (data mining)

### Types of ML Systems
- Whether or not they are trained with human supervision `supervised, unsupervised, semisupervised, and Reinforcement Learning`
- Whether or not they can learn incrementally on the fly `online vs batch learning`
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do `instance-based vs model-based learning`

- **Supervised learning** - training data with labels (expected outputs).
    - Tasks: classification, regression (univariate / multivariate).
    - Class / sample / label / feature (predictors: age, brand, ...) / attribute
    - **Algorithms**
        - k-Nearest Neighbors
        - Linear Regression
        - Logistic Regression
        - Support Vector Machines (SVMs)
        - Decision Trees and Random Forests
        - Neural networks

- **Unsupervised learning** - training data is unlabeled.
    - Tasks: clustering, anomaly detection, visualization & dimensionality reduction.
    - Clustering (find similar visitors)
        - K-Means
        - DBSCAN
        - Hierarchical Cluster Analysis (HCA)
    - Anomaly detection & novelty detection (detect unusual things)
        - One-class SVM
        - Isolation Forest
    - Visualization and dimensionality reduction (a kind of feature extraction)
        - Principal Component Analysis (PCA)
        - Kernel PCA
        - Locally-Linear Embedding (LLE)
        - t-distributed Stochastic Neighbor Embedding (t-SNE)
    - Association rule learning
        - Apriori
        - Eclat

- `TIP!` Use a dimensionality reduction algorithm before feeding the data to a supervised learning algorithm.
- `TIP!` Automatically removing outliers from a dataset before feeding it to another learning algorithm.

- **Semisupervised learning** - a lot of unlabeled data and a little bit of labeled data.
    - Example: like in Google Photos, it recognizes the same person in many pictures. We need the supervised part because we need to separate similar clusters (like similar people).

- **Reinforcement Learning** - an *agent* can observe the environment, perform some actions, and get *rewards* and *penalties*. Then it must teach itself the best strategy (*policy*) to get the maximum reward. A policy defines what action the agent should choose when it is in a given situation.
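To make the agent / reward / policy loop concrete, here is a tiny, self-contained sketch (the toy "corridor" environment and all constants are made-up assumptions, not from the original notes):

```py
import numpy as np

# Toy environment: 5 states in a corridor; reaching the right end gives a reward.
n_states, n_actions = 5, 2           # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))  # action-value table; the greedy policy reads from it
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    for step in range(20):
        # epsilon-greedy policy: mostly exploit the current Q-values, sometimes explore
        action = np.random.randint(n_actions) if np.random.rand() < epsilon else int(np.argmax(Q[state]))
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

policy = Q.argmax(axis=1)  # best action per state (here it learns "go right" everywhere)
```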
- **Batch learning** - or *offline learning*: when you have a new type of data, you need to retrain on the whole dataset every time.
- **Online learning** - you train the system incrementally on new data or mini-batches of data.
    - You must set the *learning rate* parameter: if you set a high rate, your system rapidly adapts to new data, but it will also tend to forget the old data.
    - A big challenge: if bad data is fed to the system, the system's performance will gradually decline.
    - `TIP!` Monitor your latest input data using an anomaly detection algorithm.

- **Instance-based learning** - the system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples using a *similarity measure*.
- **Model-based learning** - build the model, then use it to make *predictions*.

### Main Challenges of ML
- "Bad algorithm" and "bad data"
- **Bad data**
    - If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it, and so on.
    - **Feature engineering**, involves:
        - *Feature selection*: selecting the most useful features to train on among existing features.
        - *Feature extraction*: combining existing features to produce a more useful one (dimensionality reduction algorithms can help).
        - Creating new features by gathering new data.

- **Bad algorithm**
    - **Overfitting** means that the model performs well on the training data, but it does not generalize well. How to overcome it?
        - Simplify the model by selecting one with fewer parameters (a linear model rather than a high-degree polynomial model), by reducing the number of features in the training data, or by constraining the model (with regularization).
        - Gather more training data.
        - Reduce the noise in the training data (fix data errors and remove outliers).
    - **Underfitting** occurs when your model is too simple to learn the underlying structure of the data. The options to fix it:
        - Select a more powerful model, with more parameters.
        - Feed better features to the learning algorithm (feature engineering).
        - Reduce the constraints on the model (reduce the regularization hyperparameter).

- The system will not perform well if your training set is too small, or if the data is not representative of production-level data, noisy, or polluted with irrelevant features (garbage in, garbage out). Lastly, your model needs to be neither too simple nor too complex.

### Testing and Validating
- 80% training and 20% testing. If you have 10 million samples, 1% for testing is enough.
- **Hyperparameter Tuning and Model Selection** `page 32`
    - Example: you are hesitating between two models, linear and polynomial. You must try both and see which one generalizes better on the test set. You also want to apply regularization to decrease overfitting, but you don't know how to choose the regularization hyperparameter. Try 100 different values, and pick the one that produces the smallest error.
    - However, after you deploy your model you see 15% error. This is probably because you chose the hyperparameter for this particular test set. Then you should use **holdout validation "with a validation / dev set"**.
You train multiple models with various hyperparameters on the reduced training set (training - validation set). Select model performing best on val-on set. And train again on full dataset. 94 | - [**Cross validation**](https://machinelearningmastery.com/repeated-k-fold-cross-validation-with-python/) 95 | - **Data Mismatch** `page 33` 96 | - Example: You want to developer flowers species classifier. You downloaded pictures from web. And you have 10K pictures taken with the app. **TIP! Remember, your validation and test set must be as representitive as possible you expect to use in production.** In this case divide 50 / 50 to dev & test sets (pics must not be duplicated in both, even near-duplicate). 97 | - After training you see that model on validation set is very poor. Is it overfitting or mismatch between web and phone pics? 98 | - One solution, is to take the part of training (web pics) into **train-dev set**. After training a model, you see that model on train-dev set is good. Then the problem is data mismatch. Use preprocessing, and make web pics look like phone pics. 99 | - But if model is bad on train-dev set, then you have overfitting. You should try to simplify or regularize the model, get more training data and clean up the training data. 100 | 101 | ### Extra 102 | - **Hyper-parameters** are those which we supply to the model, for example: number of hidden Nodes and Layers, input features, Learning Rate, Activation Function etc in Neural Network, while **Parameters** are those which would be learnt during training by the machine like Weights and Biases. 103 | 104 | -------------------------------------------------------------------------------- /img/model1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/img/model1.png -------------------------------------------------------------------------------- /img/model2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/img/model2.png -------------------------------------------------------------------------------- /img/model3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/img/model3.png -------------------------------------------------------------------------------- /img/precision-recall.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/img/precision-recall.png -------------------------------------------------------------------------------- /img/reinforcement-learning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/img/reinforcement-learning.png -------------------------------------------------------------------------------- /numpy-pandas/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/numpy-pandas/.DS_Store -------------------------------------------------------------------------------- /numpy-pandas/02-example.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 2, 16 | "kernelspec": { 17 | "name": "python386jvsc74a57bd04ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "metadata": { 21 | "interpreter": { 22 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 23 | } 24 | } 25 | }, 26 | "nbformat": 4, 27 | "nbformat_minor": 2, 28 | "cells": [ 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "import pandas as pd" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 3, 41 | "metadata": {}, 42 | "outputs": [ 43 | { 44 | "output_type": "stream", 45 | "name": "stdout", 46 | "text": [ 47 | " state/region ages year population\n0 AL under18 2012 1117489.0\n1 AL total 2012 4817528.0\n2 AL under18 2010 1130966.0\n3 AL total 2010 4785570.0\n4 AL under18 2011 1125763.0\n state area (sq. mi)\n0 Alabama 52423\n1 Alaska 656425\n2 Arizona 114006\n3 Arkansas 53182\n4 California 163707\n state abbreviation\n0 Alabama AL\n1 Alaska AK\n2 Arizona AZ\n3 Arkansas AR\n4 California CA\n" 48 | ] 49 | } 50 | ], 51 | "source": [ 52 | "pop = pd.read_csv('data/state-population.csv')\n", 53 | "areas = pd.read_csv('data/state-areas.csv')\n", 54 | "abbrevs = pd.read_csv('data/state-abbrevs.csv')\n", 55 | "\n", 56 | "print(pop.head()); print(areas.head()); print(abbrevs.head())" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 22, 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "output_type": "execute_result", 66 | "data": { 67 | "text/plain": [ 68 | " state/region ages year population state\n", 69 | "0 AL under18 2012 1117489.0 Alabama\n", 70 | "1 AL total 2012 4817528.0 Alabama\n", 71 | "2 AL under18 2010 1130966.0 Alabama\n", 72 | "3 AL total 2010 4785570.0 Alabama\n", 73 | "4 AL under18 2011 1125763.0 Alabama" 74 | ], 75 | "text/html": "
" 76 | }, 77 | "metadata": {}, 78 | "execution_count": 22 79 | } 80 | ], 81 | "source": [ 82 | "merged = pd.merge(pop, abbrevs, how='outer', left_on='state/region', right_on='abbreviation') # if you do not specify left_on/right_on then no common coumns error\n", 83 | "merged = merged.drop('abbreviation', 1) # drop duplicate info \n", 84 | "merged.head()" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 23, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "output_type": "execute_result", 94 | "data": { 95 | "text/plain": [ 96 | "state/region False\n", 97 | "ages False\n", 98 | "year False\n", 99 | "population True\n", 100 | "state True\n", 101 | "dtype: bool" 102 | ] 103 | }, 104 | "metadata": {}, 105 | "execution_count": 23 106 | } 107 | ], 108 | "source": [ 109 | "merged.isnull().any()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 30, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "output_type": "execute_result", 119 | "data": { 120 | "text/plain": [ 121 | " state/region ages year population state\n", 122 | "2448 PR under18 1990 NaN NaN\n", 123 | "2449 PR total 1990 NaN NaN\n", 124 | "2450 PR total 1991 NaN NaN\n", 125 | "2451 PR under18 1991 NaN NaN\n", 126 | "2452 PR total 1993 NaN NaN" 127 | ], 128 | "text/html": "
" 129 | }, 130 | "metadata": {}, 131 | "execution_count": 30 132 | } 133 | ], 134 | "source": [ 135 | "merged[merged['population'].isnull()].head()" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 31, 141 | "metadata": {}, 142 | "outputs": [ 143 | { 144 | "output_type": "execute_result", 145 | "data": { 146 | "text/plain": [ 147 | "state/region False\n", 148 | "ages False\n", 149 | "year False\n", 150 | "population True\n", 151 | "state False\n", 152 | "dtype: bool" 153 | ] 154 | }, 155 | "metadata": {}, 156 | "execution_count": 31 157 | } 158 | ], 159 | "source": [ 160 | "merged.loc[merged['state/region'] == 'PR', 'state'] = 'Puerto Rico'\n", 161 | "merged.loc[merged['state/region'] == 'USA', 'state'] = 'United States'\n", 162 | "merged.isnull().any()" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 36, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "output_type": "execute_result", 172 | "data": { 173 | "text/plain": [ 174 | " state/region ages year population state area (sq. mi)\n", 175 | "0 AL under18 2012 1117489.0 Alabama 52423.0\n", 176 | "1 AL total 2012 4817528.0 Alabama 52423.0\n", 177 | "2 AL under18 2010 1130966.0 Alabama 52423.0\n", 178 | "3 AL total 2010 4785570.0 Alabama 52423.0\n", 179 | "4 AL under18 2011 1125763.0 Alabama 52423.0" 180 | ], 181 | "text/html": "
" 182 | }, 183 | "metadata": {}, 184 | "execution_count": 36 185 | } 186 | ], 187 | "source": [ 188 | "final = pd.merge(merged, areas, on='state', how='left')\n", 189 | "final.head()" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 37, 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "output_type": "execute_result", 199 | "data": { 200 | "text/plain": [ 201 | "(2544, 6)" 202 | ] 203 | }, 204 | "metadata": {}, 205 | "execution_count": 37 206 | } 207 | ], 208 | "source": [ 209 | "final.shape" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 38, 215 | "metadata": {}, 216 | "outputs": [ 217 | { 218 | "output_type": "execute_result", 219 | "data": { 220 | "text/plain": [ 221 | "state/region False\n", 222 | "ages False\n", 223 | "year False\n", 224 | "population True\n", 225 | "state False\n", 226 | "area (sq. mi) True\n", 227 | "dtype: bool" 228 | ] 229 | }, 230 | "metadata": {}, 231 | "execution_count": 38 232 | } 233 | ], 234 | "source": [ 235 | "final.isnull().any()" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 39, 241 | "metadata": {}, 242 | "outputs": [ 243 | { 244 | "output_type": "execute_result", 245 | "data": { 246 | "text/plain": [ 247 | "array(['United States'], dtype=object)" 248 | ] 249 | }, 250 | "metadata": {}, 251 | "execution_count": 39 252 | } 253 | ], 254 | "source": [ 255 | "final['state'][final['area (sq. mi)'].isnull()].unique()" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 40, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "output_type": "execute_result", 265 | "data": { 266 | "text/plain": [ 267 | " state/region ages year population state area (sq. mi)\n", 268 | "0 AL under18 2012 1117489.0 Alabama 52423.0\n", 269 | "1 AL total 2012 4817528.0 Alabama 52423.0\n", 270 | "2 AL under18 2010 1130966.0 Alabama 52423.0\n", 271 | "3 AL total 2010 4785570.0 Alabama 52423.0\n", 272 | "4 AL under18 2011 1125763.0 Alabama 52423.0" 273 | ], 274 | "text/html": "
" 275 | }, 276 | "metadata": {}, 277 | "execution_count": 40 278 | } 279 | ], 280 | "source": [ 281 | "final.dropna(inplace=True)\n", 282 | "final.head()" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 42, 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "output_type": "execute_result", 292 | "data": { 293 | "text/plain": [ 294 | "(2476, 6)" 295 | ] 296 | }, 297 | "metadata": {}, 298 | "execution_count": 42 299 | } 300 | ], 301 | "source": [ 302 | "final.shape" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 43, 308 | "metadata": {}, 309 | "outputs": [ 310 | { 311 | "output_type": "execute_result", 312 | "data": { 313 | "text/plain": [ 314 | " state/region ages year population state area (sq. mi)\n", 315 | "3 AL total 2010 4785570.0 Alabama 52423.0\n", 316 | "91 AK total 2010 713868.0 Alaska 656425.0\n", 317 | "101 AZ total 2010 6408790.0 Arizona 114006.0\n", 318 | "189 AR total 2010 2922280.0 Arkansas 53182.0\n", 319 | "197 CA total 2010 37333601.0 California 163707.0" 320 | ], 321 | "text/html": "
" 322 | }, 323 | "metadata": {}, 324 | "execution_count": 43 325 | } 326 | ], 327 | "source": [ 328 | "data2010 = final.query(\"year == 2010 & ages == 'total'\")\n", 329 | "data2010.head()" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 44, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "data2010.set_index('state', inplace=True)\n", 339 | "density = data2010['population'] / data2010['area (sq. mi)']" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 45, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "output_type": "execute_result", 349 | "data": { 350 | "text/plain": [ 351 | "state\n", 352 | "District of Columbia 8898.897059\n", 353 | "Puerto Rico 1058.665149\n", 354 | "New Jersey 1009.253268\n", 355 | "Rhode Island 681.339159\n", 356 | "Connecticut 645.600649\n", 357 | "dtype: float64" 358 | ] 359 | }, 360 | "metadata": {}, 361 | "execution_count": 45 362 | } 363 | ], 364 | "source": [ 365 | "density.sort_values(ascending=False, inplace=True)\n", 366 | "density.head()" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 46, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "output_type": "execute_result", 376 | "data": { 377 | "text/plain": [ 378 | "state\n", 379 | "South Dakota 10.583512\n", 380 | "North Dakota 9.537565\n", 381 | "Montana 6.736171\n", 382 | "Wyoming 5.768079\n", 383 | "Alaska 1.087509\n", 384 | "dtype: float64" 385 | ] 386 | }, 387 | "metadata": {}, 388 | "execution_count": 46 389 | } 390 | ], 391 | "source": [ 392 | "density.tail()" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 48, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "output_type": "execute_result", 402 | "data": { 403 | "text/plain": [ 404 | "state\n", 405 | "District of Columbia 8898.897059\n", 406 | "Puerto Rico 1058.665149\n", 407 | "New Jersey 1009.253268\n", 408 | "Rhode Island 681.339159\n", 409 | "Connecticut 645.600649\n", 410 | "Massachusetts 621.815538\n", 411 | "Maryland 466.445797\n", 412 | "Delaware 460.445752\n", 413 | "New York 356.094135\n", 414 | "Florida 286.597129\n", 415 | "Pennsylvania 275.966651\n", 416 | "Ohio 257.549634\n", 417 | "California 228.051342\n", 418 | "Illinois 221.687472\n", 419 | "Virginia 187.622273\n", 420 | "Indiana 178.197831\n", 421 | "North Carolina 177.617157\n", 422 | "Georgia 163.409902\n", 423 | "Tennessee 150.825298\n", 424 | "South Carolina 144.854594\n", 425 | "New Hampshire 140.799273\n", 426 | "Hawaii 124.746707\n", 427 | "Kentucky 107.586994\n", 428 | "Michigan 102.015794\n", 429 | "Washington 94.557817\n", 430 | "Texas 93.987655\n", 431 | "Alabama 91.287603\n", 432 | "Louisiana 87.676099\n", 433 | "Wisconsin 86.851900\n", 434 | "Missouri 86.015622\n", 435 | "West Virginia 76.519582\n", 436 | "Vermont 65.085075\n", 437 | "Mississippi 61.321530\n", 438 | "Minnesota 61.078373\n", 439 | "Arizona 56.214497\n", 440 | "Arkansas 54.948667\n", 441 | "Iowa 54.202751\n", 442 | "Oklahoma 53.778278\n", 443 | "Colorado 48.493718\n", 444 | "Oregon 39.001565\n", 445 | "Maine 37.509990\n", 446 | "Kansas 34.745266\n", 447 | "Utah 32.677188\n", 448 | "Nevada 24.448796\n", 449 | "Nebraska 23.654153\n", 450 | "Idaho 18.794338\n", 451 | "New Mexico 16.982737\n", 452 | "South Dakota 10.583512\n", 453 | "North Dakota 9.537565\n", 454 | "Montana 6.736171\n", 455 | "Wyoming 5.768079\n", 456 | "Alaska 1.087509\n", 457 | "dtype: float64" 458 | ] 459 | }, 460 | "metadata": {}, 461 | "execution_count": 48 462 | } 463 | ], 464 | 
"source": [ 465 | "density" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [] 474 | } 475 | ] 476 | } -------------------------------------------------------------------------------- /numpy-pandas/README.md: -------------------------------------------------------------------------------- 1 | # Python Data Science Handbook 2 | 3 | Rustam-Z🚀 • 1 June 2021 4 | 5 | My notes on **NumPy: ndarray**, **Pandas: DataFrame**, **Matplotlib**, and **Scikit-Learn** 6 | 7 | ## Contents 8 | 1. IPython: Beyond Normal Python - *All features of Jupyter Notebook* 9 | 2. [Introduction to NumPy: Math operations with NumPy](#CHAPTER-2:-Introduction-to-NumPy) 10 | - Creating Arrays 11 | - The Basics of NumPy Arrays 12 | - Computation on NumPy Arrays 13 | - Fancy indexing 14 | - Structured Arrays 15 | 3. [Data Manipulation with Pandas](#CHAPTER-3:-Data-Manipulation-with-Pandas) 16 | - The Pandas Series / DataFrame / Index Objects 17 | - Data Selection in Series / DataFrame 18 | - Missing Data in Pandas / Operating on NULL values 19 | - Combining Datasets: Concat and Append 20 | - [GroupBy: Split, Apply, Combine](#GroupBy:-Split,-Apply,-Combine) 21 | 4. [Visualization with Matplotlib](#CHAPTER-4:-Visualization-with-Matplotlib) 22 | 5. [Machine Learning](#Machine-Learning) 23 | 24 | ## CHAPTER 2: Introduction to NumPy 25 | - `axis=0 is column`, `axis=1 is row` 26 | 27 | ### Creating Arrays 28 | ```python 29 | np.zeros(10, dtype=int) # Create a length-10 integer array filled with zeros 30 | np.ones((3, 5), dtype=float) # Create a 3x5 floating-point array filled with 1s 31 | np.full((3, 5), 3.14) # Create a 3x5 array filled with 3.14 32 | np.arange(0, 20, 2) # As python's range() 33 | np.linspace(0, 1, 5) # Create an array of five values evenly spaced between 0 and 1 34 | np.random.random((3, 3)) # 3x3 array, random values between 0 and 1 35 | np.random.normal(0, 1, (3, 3)) # normal distribution, with mean 0 and standard deviation 1 36 | np.random.randint(0, 10, (3, 3)) # random integers between 0 and 10 37 | np.eye(3) # Create a 3x3 identity matrix 38 | np.empty(3) # Create an uninitialized array of three integers 39 | 40 | np.zeros(10, dtype='int16') # same as 41 | np.zeros(10, dtype=np.int16) 42 | ``` 43 | 44 | ### The Basics of NumPy Arrays 45 | - *Attributes of arrays* 46 | - Determining the size, shape, memory consumption, and data types of arrays 47 | - *Indexing of arrays* 48 | - Getting and setting the value of individual array elements 49 | - *Slicing of arrays* 50 | - Getting and setting smaller subarrays within a larger array 51 | - *Reshaping of arrays* 52 | - Changing the shape of a given array 53 | - *Array Concatenation and Splitting* 54 | - Combining multiple arrays into one, and splitting one array into many 55 | 56 | - indices `(e.g., arr[0])`, slices `(e.g., arr[:5])`, and boolean masks `(e.g., arr[arr > 0])` 57 | - [np.newaxis()](https://stackoverflow.com/questions/46334014/np-reshapex-1-1-vs-x-np-newaxis) 58 | 59 | ```python 60 | """ Attributes of arrays """ 61 | x = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array 62 | x.ndim # 3 63 | x.shape # (3, 4, 5) 64 | x.size # 60 = 3*4*5 65 | x.dtype # dtype: int64 66 | x.nbytes # total size of array in bytes 67 | ``` 68 | ```python 69 | """ Indexing of arrays """ 70 | # Same as in python lists, but beware if you insert float into int, the result will be int 71 | x[0][0][1] or x[0, 0, 1] 72 | ``` 73 | ```python 74 | """ Slicing of arrays """ 75 | 
# Same as python lists 76 | # NOTE! Multidimensional slices work in the same way, with multiple slices separated by commas. 77 | x[start:stop:step] 78 | ``` 79 | - NOTE! NumPy arrays return the *view* of original array after slicing. So, when we modify our sliced array it will affect to original array. Use **copy()** method when you don't want it. `x_copy = x[:2, :2].copy()` 80 | 81 | ```python 82 | """ Reshaping of Arrays """ 83 | # reshape() method 84 | np.arange(1, 10).reshape((3, 3)) 85 | ``` 86 | ```python 87 | """ Array Concatenation """ 88 | x = np.array([1, 2, 3]) 89 | y = np.array([3, 2, 1]) 90 | 91 | grid = np.array([[9, 8, 7],[6, 5, 4]]) 92 | 93 | np.concatenate([x, y]) # axis=1 same as x axis, then it will concatenated horizontally 94 | 95 | # If working with different dimensions 96 | np.vstack([x, grid]) 97 | np.hstack([grid, y]) 98 | # np.dstack will stack arrays along the third axis 99 | 100 | """ Splitting of arrays """ 101 | # np.split, np.hsplit, np.vsplit 102 | x = [1, 2, 3, 99, 99, 3, 2, 1] 103 | x1, x2, x3 = np.split(x, [3, 5]) # we give splitting points 104 | print(x1, x2, x3) # [1 2 3] [99 99] [3 2 1] # N --> N+1 subarray 105 | ``` 106 | 107 | ### Computation on NumPy Arrays 108 | - *unary ufuncs*, operate on a single input, and *binary ufuncs*, operate on two inputs 109 | ``` 110 | + np.add Addition (e.g., 1 + 1 = 2) 111 | - np.subtract Subtraction (e.g., 3 - 2 = 1) 112 | - np.negative Unary negation (e.g., -2) 113 | * np.multiply Multiplication (e.g., 2 * 3 = 6) 114 | / np.divide Division (e.g., 3 / 2 = 1.5) 115 | // np.floor_divide Floor division (e.g., 3 // 2 = 1) 116 | ** np.power Exponentiation (e.g., 2 ** 3 = 8) 117 | % np.mod Modulus/remainder (e.g., 9 % 4 = 1) 118 | 119 | np.abs(x) 120 | np.sin(x), np.cos(x), np.tan(x) 121 | np.log(x), np.log2(x), np.log10(x) 122 | np.exp(x) e^x 123 | np.exp2(x) 2^x 124 | np.power(3, x) 3^x 125 | np.expm1(x) exp(x) - 1 126 | np.log1p(x) log(1 + x) 127 | ``` 128 | ```python 129 | x = np.arange(1, 6) 130 | np.add.reduce(x) # 15, sum of all elements 131 | np.multiply.reduce(x) # 120, mulitplication of all elements 132 | 133 | np.add.accumulate(x) # array([ 1, 3, 6, 10, 15]), intermediate result 134 | np.multiply.accumulate(x) # array([ 1, 2, 6, 24, 120]) 135 | 136 | np.multiply.outer(x, x) # N+1 dimension multiplication 137 | 138 | np.sum Compute sum of elements 139 | np.prod Compute product of elements 140 | np.mean Compute median of elements 141 | np.std Compute standard deviation 142 | np.var Compute variance 143 | np.min Find minimum value 144 | np.max Find maximum value 145 | np.argmin Find index of minimum value 146 | np.argmax Find index of maximum value 147 | np.median Compute median of elements 148 | np.percentile Compute rank-based statistics of elements np.percentile(arr, 25)) 149 | np.any Evaluate whether any elements are true 150 | np.all Evaluate whether all elements are true 151 | ``` 152 | ```python 153 | """Comparison Operators""" 154 | == np.equal 155 | != np.not_equal 156 | < np.less np.less(x, 3) is x < 3 157 | <= np.less_equal 158 | > np.greater 159 | >= np.greater_equal 160 | 161 | # Example 162 | x = np.array([1, 2, 3, 4, 5]) 163 | x < 3 # array([ True, True, False, False, False], dtype=bool) 164 | (2 * x) == (x ** 2) # array([False, True, False, False, False], dtype=bool) 165 | ``` 166 | ```python 167 | """Working with Boolean Arrays""" 168 | print(x) # [[5 0 3 3][7 9 3 5][2 4 7 6]] 169 | 170 | # Counting entries 171 | np.count_nonzero(x < 6) # 8, how many values less than 6? 
172 | np.sum(x < 6) # 8, counts elements less than 6 173 | np.sum(x < 6, axis=1) # how many values less than 6 in each row? 174 | np.any(x > 8) # are there any values greater than 8? 175 | np.all(x < 10) # are all values less than 10? 176 | np.all(x < 8, axis=1) # are all values in each row less than 8? 177 | 178 | # Boolean operators 179 | & np.bitwise_and 180 | | np.bitwise_or 181 | ^ np.bitwise_xor 182 | ~ np.bitwise_not 183 | np.sum((inches > 0.5) & (inches < 1)) # that's counts the number of elements 184 | np.sum(~( (inches <= 0.5) | (inches >= 1) )) 185 | 186 | x[x < 5] # [0 3 3 3 2 4] 187 | 188 | # Fancy indexing 189 | x = rand.randint(100, size=10) 190 | y = np.array([1, 2]) 191 | x[y] # array([92, 14]) 192 | ``` 193 | - `np.sort(x)`, `np.argsort(x)` , `np.sort(X, axis=0)` = sort each column of X 194 | - Partial Sorts: `np.partition(x, 3)` - returns 2 smallest elements to the left 195 | 196 | ```python 197 | """NumPy’s Structured Arrays: Compound data types""" 198 | name = ['Alice', 'Bob', 'Cathy', 'Doug'] 199 | age = [25, 45, 37, 19] 200 | weight = [55.0, 85.5, 68.0, 61.5] 201 | 202 | # We need to combine them 203 | x = np.zeros(4, dtype=int) 204 | data = np.zeros(4, dtype={'names':('name', 'age', 'weight'), 'formats':('U10', 'i4', 'f8')}) 205 | data['name'] = name 206 | data['age'] = age 207 | data['weight'] = weight 208 | 209 | print(data) # [('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0) ('Doug', 19, 61.5)] 210 | 211 | # Get all names 212 | data['name'] # array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype=' Int64Index([3, 5, 7], dtype='int64') 268 | indA | indB # union => Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64') 269 | indA ^ indB # symmetric difference => Int64Index([1, 2, 9, 11], dtype='int64') 270 | ``` 271 | ```python 272 | """Data Selection in Series""" 273 | data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd']) 274 | 275 | data['b'] # 0.5 276 | 'a' in data # True 277 | data.keys() 278 | data.items() # key: value 279 | data['e'] = 1.25 # We can add new item 280 | 281 | # slicing explicit, 'c' will be included 282 | data['a':'c'] 283 | 284 | # slicing implicit 285 | data[0:2] 286 | 287 | # masking 288 | data[(data > 0.3) & (data < 0.8)] 289 | 290 | # fancy indexing 291 | data[['a', 'e']] 292 | 293 | """Indexers: loc, iloc, and ix 294 | loc = allows indexing and slicing that always references the explicit index (own indexing) 295 | iloc = allows indexing and slicing that always references the implicit Python-style index (from 0) 296 | 297 | `TIP!` “Explicit is better than implicit" 298 | """ 299 | ``` 300 | ```python 301 | """Data Selection in DataFrame""" 302 | # DataFrame as a dictionary 303 | data = pd.DataFrame({'area':area, 'pop':pop}) 304 | data['area'] 305 | data.area # if name == str method then not working 306 | # Add new column 307 | data['density'] = data['pop'] / data['area'] 308 | # Access samples 309 | data.loc['Texas'] 310 | 311 | # DataFrame as two-dimensional array 312 | data.values 313 | data.T # Transpose 314 | data.iloc[:3, :2] # Chooses both row and column respectively 315 | data.loc[:'New York', :'pop'] # same as previous 316 | data.loc[data.density > 100, ['pop', 'density']] # fancy indexing 317 | # Change like this 318 | data.iloc[0, 2] = 90 319 | data[data.density > 100] 320 | ``` 321 | - Until page 114 322 | - We can perform NumPy operations over Pandas Series and Dataframe (adding, division) 323 | ```py 324 | A = pd.Series([2, 4, 6], index=[0, 1, 2]) 325 | B = pd.Series([1, 3, 5], index=[1, 2, 3]) 326 | print(A + B) 
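# Added note (not in the original): with the indexes above, A + B aligns on the
# union of index labels, and labels present in only one Series come out as NaN:
# 0    NaN
# 1    5.0
# 2    9.0
# 3    NaN
# dtype: float64
# The next line uses fill_value=0, so the missing side is treated as 0 instead,
# giving 2.0, 5.0, 9.0, 5.0.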
327 | print(A.add(B, fill_value=0)) # the set which doesn't include that index will be replaces with 0 328 | 329 | ## A.add(B) 330 | + add() 331 | - sub(), subtract() 332 | * xmul(), multiply() 333 | / truediv(), div(), divide() 334 | // floordiv() 335 | % mod() 336 | ** pow() 337 | ``` 338 | ```py 339 | """Missing Data in Pandas""" 340 | vals2 = np.array([1, np.nan, 3, 4]) 341 | np.nansum(vals2), np.nanmin(vals2), np.nanmax(vals2) 342 | 343 | # NaN and None in Pandas 344 | x = pd.Series(range(2), dtype=int) 345 | x[0] = None # Then it will be represented as NaN in DataFrame 346 | 347 | """Operating on Null Values""" 348 | isnull() # True / False for each element 349 | notnull() # opposite of isnull() 350 | dropna() # Return a filtered version of the data 351 | fillna() 352 | 353 | # Detecting null values 354 | df.isnull() 355 | data[data.notnull()] 356 | 357 | # Dropping null values 358 | data.dropna() 359 | df.dropna(axis='columns', how='all') # df.dropna(axis=1) | how='all', by default how='any' | thresh=3 360 | 361 | # Filling null values 362 | data.fillna(0) 363 | data.fillna(method='ffill') # propagate the previous value forward 364 | data.fillna(method='bfill') 365 | df.fillna(method='ffill', axis=1) # we can specify an axis along which the fills take place 366 | 367 | # NOTE 368 | df.isnull().any() 369 | df[df['SMTH'].isnull()].head() 370 | ``` 371 | ```py 372 | """Combining Datasets: Concat and Append""" 373 | np.concatenate([x, y]) # with numpy 374 | pd.concat([x, y]) # with pandas 375 | pd.concat([x, y], ignore_index=True) # ignoring the index 376 | df1.append(df2) # same as pd.concat([df1, df2]), NOT good practice 377 | 378 | """Combining Datasets: Merge and Join""" 379 | df3 = pd.merge(df1, df2) # can use when df1 and df2 have common columns PK = primary key 380 | # check 02-pandas.ipynb 381 | ``` 382 | ### GroupBy: Split, Apply, Combine 383 | - Split, apply, combine 384 | - **Functions: aggregate, filter, transform, and apply.** 385 | - The **split** step involves breaking up and grouping a DataFrame depending on the value of the specified key. 386 | - The **apply** step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups. 387 | - The **combine** step merges the results of these operations into an output array. 388 |
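To make the split, apply, combine idea concrete, here is a minimal sketch (added for illustration; the `key`/`data` column names are made up, not taken from the notes above):

```py
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'],
                   'data': [1, 2, 3, 4]})

df.groupby('key').sum()   # split on 'key', apply sum() per group, combine the results
#      data
# key
# A       4
# B       6
```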
389 | 390 | - We need to apply any *Aggregation* funcs from Pandas and NumPy, like `df.groupby('key').sum()` 391 | - `n_by_state = df.groupby("state")["last_name"].count()` You call `.groupby()` and pass the name of the column you want to group on, which is ``"state"``. Then, you use `["last_name"]` to specify the columns on which you want to perform the actual aggregation. 392 | 393 | ```py 394 | # Column indexing 395 | # https://realpython.com/pandas-groupby/ 396 | n_by_state = df.groupby("state")["last_name"].count() 397 | df.groupby(["state", "gender"])["last_name"].count() # for multiple, as_index=False 398 | 399 | # Dispatch methods 400 | planets.groupby('method')['year'].describe().unstack() 401 | 402 | # Aggregation 403 | df.groupby('key').aggregate(['min', np.median, max]) 404 | df.groupby('key').aggregate({'data1': 'min', 'data2': 'max'}) # even we can specify 405 | 406 | # Filtering 407 | def filter_func(x): 408 | return x['data2'].std() > 4 409 | 410 | print(df) 411 | print(df.groupby('key').std()) 412 | print(df.groupby('key').filter(filter_func)) 413 | 414 | # Transformation 415 | df.groupby('key').transform(lambda x: x - x.mean()) 416 | 417 | # The apply() method - we can app;y arbitary function 418 | def norm_by_data2(x): 419 | # x is a DataFrame of group values 420 | x['data1'] /= x['data2'].sum() 421 | return x 422 | print(df); print(df.groupby('key').apply(norm_by_data2)) 423 | ``` 424 | ```py 425 | """High-Performance Pandas: eval() and query()""" 426 | """eval()""" 427 | # Operators 428 | result1 = -df1 * df2 / (df3 + df4) - df5 429 | result2 = pd.eval('-df1 * df2 / (df3 + df4) - df5') 430 | 431 | # With dataframe 432 | result1 = (df['A'] + df['B']) / (df['C'] - 1) 433 | result2 = pd.eval("(df.A + df.B) / (df.C - 1)") 434 | df.eval('D = (A + B) / C', inplace=True) # We can even perform on DF object 435 | 436 | column_mean = df.mean(1) 437 | result1 = df['A'] + column_mean 438 | result2 = df.eval('A + @column_mean') 439 | 440 | """query()""" 441 | result1 = df[(df.A < 0.5) & (df.B < 0.5)] 442 | result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]') 443 | result3 = df.eval('A < 0.5 and B < 0.5') # do not work with DF, so we need query 444 | result4 = df.query('A < 0.5 and B < 0.5') 445 | ``` 446 | 447 | ## CHAPTER 4: Visualization with Matplotlib 448 | ```py 449 | """Line""" 450 | plt.plot(x, np.sin(x), linestyle='-g') # -, --, -., :, -g = solid green 451 | plt.axis([-1, 11, -1.5, 1.5]) # [xmin, xmax, ymin, ymax] 452 | plt.title("A Sine Curve") 453 | plt.xlabel("x") 454 | plt.ylabel("sin(x)") 455 | 456 | # When multiple lines 457 | plt.plot(x, np.sin(x), '-g', label='sin(x)') 458 | plt.plot(x, np.cos(x), ':b', label='cos(x)') 459 | plt.axis('equal') 460 | plt.legend() 461 | 462 | """Scatter""" 463 | plt.scatter(x, y) # marker='o' 464 | 465 | """Histogram""" 466 | data = np.random.randn(1000) 467 | plt.hist(data) 468 | ``` 469 | 470 | ## Machine Learning 471 | - **Classification: Predicting discrete labels** 472 | - Some important classification algorithms 473 | - Naive Bayes 474 | - Support Vector Machines 475 | - Decision Trees and Random Forests 476 | - **Regression: Predicting continuous labels** 477 | - Some important regression algorithms 478 | - Linear Regression 479 | - Support Vector Machines 480 | - Decision Trees and Random Forests 481 | - **Clustering: Inferring labels on unlabeled data** 482 | - k-Means Clustering 483 | - Gaussian Mixture Models 484 | - **Dimensionality reduction: Inferring structure of unlabeled data** 485 | - Principal Component Analysis 
(PCA) 486 | - Manifold Learning -------------------------------------------------------------------------------- /numpy-pandas/data/state-abbrevs.csv: -------------------------------------------------------------------------------- 1 | "state","abbreviation" 2 | "Alabama","AL" 3 | "Alaska","AK" 4 | "Arizona","AZ" 5 | "Arkansas","AR" 6 | "California","CA" 7 | "Colorado","CO" 8 | "Connecticut","CT" 9 | "Delaware","DE" 10 | "District of Columbia","DC" 11 | "Florida","FL" 12 | "Georgia","GA" 13 | "Hawaii","HI" 14 | "Idaho","ID" 15 | "Illinois","IL" 16 | "Indiana","IN" 17 | "Iowa","IA" 18 | "Kansas","KS" 19 | "Kentucky","KY" 20 | "Louisiana","LA" 21 | "Maine","ME" 22 | "Montana","MT" 23 | "Nebraska","NE" 24 | "Nevada","NV" 25 | "New Hampshire","NH" 26 | "New Jersey","NJ" 27 | "New Mexico","NM" 28 | "New York","NY" 29 | "North Carolina","NC" 30 | "North Dakota","ND" 31 | "Ohio","OH" 32 | "Oklahoma","OK" 33 | "Oregon","OR" 34 | "Maryland","MD" 35 | "Massachusetts","MA" 36 | "Michigan","MI" 37 | "Minnesota","MN" 38 | "Mississippi","MS" 39 | "Missouri","MO" 40 | "Pennsylvania","PA" 41 | "Rhode Island","RI" 42 | "South Carolina","SC" 43 | "South Dakota","SD" 44 | "Tennessee","TN" 45 | "Texas","TX" 46 | "Utah","UT" 47 | "Vermont","VT" 48 | "Virginia","VA" 49 | "Washington","WA" 50 | "West Virginia","WV" 51 | "Wisconsin","WI" 52 | "Wyoming","WY" -------------------------------------------------------------------------------- /numpy-pandas/data/state-areas.csv: -------------------------------------------------------------------------------- 1 | state,area (sq. mi) 2 | Alabama,52423 3 | Alaska,656425 4 | Arizona,114006 5 | Arkansas,53182 6 | California,163707 7 | Colorado,104100 8 | Connecticut,5544 9 | Delaware,1954 10 | Florida,65758 11 | Georgia,59441 12 | Hawaii,10932 13 | Idaho,83574 14 | Illinois,57918 15 | Indiana,36420 16 | Iowa,56276 17 | Kansas,82282 18 | Kentucky,40411 19 | Louisiana,51843 20 | Maine,35387 21 | Maryland,12407 22 | Massachusetts,10555 23 | Michigan,96810 24 | Minnesota,86943 25 | Mississippi,48434 26 | Missouri,69709 27 | Montana,147046 28 | Nebraska,77358 29 | Nevada,110567 30 | New Hampshire,9351 31 | New Jersey,8722 32 | New Mexico,121593 33 | New York,54475 34 | North Carolina,53821 35 | North Dakota,70704 36 | Ohio,44828 37 | Oklahoma,69903 38 | Oregon,98386 39 | Pennsylvania,46058 40 | Rhode Island,1545 41 | South Carolina,32007 42 | South Dakota,77121 43 | Tennessee,42146 44 | Texas,268601 45 | Utah,84904 46 | Vermont,9615 47 | Virginia,42769 48 | Washington,71303 49 | West Virginia,24231 50 | Wisconsin,65503 51 | Wyoming,97818 52 | District of Columbia,68 53 | Puerto Rico,3515 54 | -------------------------------------------------------------------------------- /numpy-pandas/img/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/numpy-pandas/img/.DS_Store -------------------------------------------------------------------------------- /numpy-pandas/img/axis=1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/numpy-pandas/img/axis=1.jpg -------------------------------------------------------------------------------- /numpy-pandas/img/groupby.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/numpy-pandas/img/groupby.png -------------------------------------------------------------------------------- /numpy-pandas/plt1.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | import numpy as np 3 | 4 | x = np.linspace(0, 10, 100) 5 | 6 | plt.plot(x, np.sin(x)) 7 | plt.plot(x, np.cos(x)) 8 | 9 | plt.show() -------------------------------------------------------------------------------- /numpy-pandas/very-basics/Readme.md: -------------------------------------------------------------------------------- 1 | # [Python for Data Science Very Basics](https://www.sololearn.com/learning/1161) 2 | 3 | > Math Operations with NumPy 4 | > Data Manipulation with Pandas 5 | > Visualization with Matplotlib 6 | 7 | ## Statistics 8 | - **mean:** the average of the values. 9 | - **median:** the middle value. 10 | - **standard deviation:** the measure of spread, the square root of **variance**. 11 | - **variance:** average of the squared differences from the mean. 12 | - One standard deviation from the mean - is the values `from (mean-std) to (mean+std)` 13 | 14 | ## Math Operations with NumPy 15 | ```python 16 | # We can use Python Lists to create NumPy arrays 17 | x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) 18 | 19 | # Size, dimentionality, shape of array 20 | print(x[1][2]) # 6 21 | print(x.ndim) # 2 22 | print(x.size) # 9 23 | print(x.shape) # (3, 3) 24 | 25 | x = np.array([2, 1, 3]) 26 | x = np.append(x, 4) # [2, 1, 3, 4] 27 | x = np.delete(x, 0) # Takes index 28 | x = np.sort(x) 29 | 30 | # Similar to python range() 31 | x = np.arange(2, 10, 3) # [2, 5, 8] 32 | 33 | # Reshaping the array 34 | x = np.reshape(3, 1) # [[2], [5], [8]] 35 | 36 | # Indexing and slicing 37 | # Same as python lists [-1], [0:4] 38 | 39 | # Conditions 40 | y = x[x<4] # Select element that are less than 4 41 | y = x[(x>5) & (x%2==0)] # & (and), | (or) 42 | 43 | # Operations 44 | y = x.sum() 45 | y = x.min() 46 | y = x.max() 47 | y = x*2 # Broadcasting used 48 | 49 | # Statistics 50 | np.mean(x) 51 | np.median(x) 52 | np.var(x) 53 | np.std(x) 54 | ``` 55 | ```python 56 | # https://www.sololearn.com/learning/eom-project/1161/1156 57 | # One standart devisation from the mean 58 | import numpy as np 59 | 60 | data = np.array([150000, 125000, 320000, 540000, 200000, 120000, 160000, 230000, 280000, 290000, 300000, 500000, 420000, 100000, 150000, 280000]) 61 | 62 | mean_h = np.mean(data) 63 | std_h = np.std(data) 64 | 65 | low, high = mean_h - std_h, mean_h + std_h 66 | 67 | count = len([v for v in data if low < v < high]) 68 | res = count * 100 / len(data) 69 | print(res) 70 | ``` 71 | 72 | ## Data Manipulation with Pandas 73 | - Built on top of **NumPy** = "numerical python", **Pandas** = "panel data" 74 | - Used to read and extract data from files, transform and analyze it, calculate statistics and correlations. 75 | - **Series** and **DataFrame**. A **Series** is essentially a column, and a **DataFrame** is a multi-dimensional table made up of a collection of Series. 76 | - `loc` explicit indexing (own indexing), `iloc` implicit indexing (0, 1, 2, 3) 77 | ```python 78 | # Dictionary used to create DataFrame (DF) 79 | data = { 80 | 'ages': [14, 18, 24, 42], 81 | 'heights': [165, 180, 176, 184] 82 | } 83 | 84 | df = pd.DataFrame(data, index=['James', 'Bob', 'Amy', 'Dave']) # You can specify `index` if you want 85 | 86 | # How to access row? 
87 | y = df.loc["Bob"] # df.loc[1] 88 | 89 | # Indexing 90 | z = df["ages"] # Series 91 | z = df[["ages", "heights"]] # DataFrame, pay attention to brackets 92 | 93 | # Slicing 94 | # iloc[], same as in python lists 95 | print(df.iloc[2]) # third row 96 | print(df.iloc[:3]) # first 3 rows 97 | print(df.iloc[1:3]) # rows 2 to 3 98 | print(df.iloc[-3:]) # accessing last three rows 99 | 100 | # Conditons 101 | z = df[(df['ages']>18) & (df['heights']>180)] 102 | ``` 103 | ```python 104 | # Reading data 105 | df = pd.read_csv("test.csv") 106 | 107 | df.head() # First five rows 108 | df.tail() # Last five rows 109 | 110 | df.info() 111 | df.describe() # Statistics: mean, min, max, percentiles. We can get for a single column too df['cases'].describe() 112 | 113 | df.set_index("date", inplace=True) # Set as the index the "data" column 114 | # inplace=True used to change the currect dataframe without assigning to new 115 | ``` 116 | ```python 117 | # Creating a column 118 | df['area'] = df['height'] * df['width'] 119 | df['month'] = pd.to_datetime(df['date'], format="%d.%m.%y").dt.month_name() 120 | 121 | # Droping a column 122 | df.drop("state", axis=1, inplace=True) 123 | # axis=1 specifies that we want to drop a column. 124 | # axis=0 will drop a row. 125 | ``` 126 | ```python 127 | # Grouping 128 | z = df['month'].value_counts() 129 | 130 | z = df.groupby('month')['cases'].sum() 131 | 132 | z = df['cases'].sum() # max(), min(), mean() 133 | ``` 134 | ```python 135 | """COVID Data Analysis""" 136 | import pandas as pd 137 | 138 | df = pd.read_csv("https://www.sololearn.com/uploads/ca-covid.csv") 139 | 140 | df.drop('state', axis=1, inplace=True) 141 | df.set_index('date', inplace=True) 142 | 143 | df['ratio'] = df['deaths'] / df['cases'] 144 | 145 | largest = df.loc[df['ratio'] == df['ratio'].max()] # df.loc[df['ratio'].max()] we cannot do that 146 | print(largest) 147 | ``` 148 | 149 | ## Visualization with Matplotlib 150 | - https://www.w3schools.com/python/matplotlib_intro.asp 151 | - **Matplotlib** is a library used to create graphs, charts, and figures. It also provides functions to customize your figures by changing the colors, labels, etc. 152 | - **Matplotlib** works really well with **Pandas**! **Pandas** works well with **NumPy**. 153 | ```py 154 | import matplotlib.pyplot as plt 155 | import pandas as pd 156 | 157 | s = pd.Series([18, 42, 9, 32, 81, 64, 3]) 158 | s.plot(kind='bar') 159 | plt.savefig('plot.png') 160 | ``` 161 | - Data = Y axis, index = X axis. 162 | ```py 163 | """Line Plot""" 164 | import pandas as pd 165 | import matplotlib.pyplot as plt 166 | 167 | df = pd.read_csv("https://www.sololearn.com/uploads/ca-covid.csv") 168 | df.rdop('state', axis=1, inplace=True) 169 | df['date'] = pd.to_datetime(df['date'], format="%d.%m.%y") 170 | df['month'] = df['date'].dt.month 171 | df.set_index('date', inplace=True) 172 | 173 | df[df['month']==12]['cases'].plot() 174 | # Multiple lines 175 | # (df[df['month']==12])[['cases', 'deaths']].plot() 176 | ``` 177 | ```py 178 | """Bar Plot""" 179 | (df.groupby('month')['cases'].sum()).plot(kind="bar") # barh = horizontal bar 180 | # OR 181 | # df = df.groupby('month') 182 | # df['cases'].sum().plot(kind="bar") 183 | ``` 184 | ```py 185 | """Box Plot""" 186 | df[df["month"]==6]["cases"].plot(kind="box") 187 | ``` 188 | ```py 189 | """Histogram""" 190 | df[df["month"]==6]["cases"].plot(kind="hist") 191 | ``` 192 | - A **histogram** is a graph showing *frequency* distributions. 
Similar to box plots, **histograms** show the distribution of data. 193 | Visually histograms are similar to bar charts, however, histograms display frequencies for a group of data rather than an individual data point; therefore, no spaces are present between the bars. 194 | ```py 195 | """Area Plot""" 196 | df[df["month"]==6][["cases", "deaths"]].plot(kind="area", stacked=False) 197 | ``` 198 | ```py 199 | """Scatter Plot""" 200 | df[df["month"]==6][["cases", "deaths"]].plot(kind="scatter", x='cases', y='deaths') 201 | ``` 202 | ```py 203 | """Pie Chart""" 204 | df.groupby('month')['cases'].sum().plot(kind="pie") 205 | ``` 206 | ```py 207 | """Plot formatting""" 208 | df[['cases', 'deaths']].plot(kind="area", legend=True, stacked=False, color=['#1970E7', '#E73E19']) 209 | plt.xlabel('Days in June') 210 | plt.ylabel('Number') 211 | plt.suptitle("COVID-19 in June") 212 | ``` -------------------------------------------------------------------------------- /numpy-pandas/very-basics/img/plt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/numpy-pandas/very-basics/img/plt.png -------------------------------------------------------------------------------- /scikit-learn/Readme.md: -------------------------------------------------------------------------------- 1 | # Scikit-Learn 2 | 3 | - [freeCodeCamp.org](https://youtu.be/0B5eIE_1vpU) 4 | - https://inria.github.io/scikit-learn-mooc/ 5 | - https://scikit-learn.org/stable/tutorial/index.html 6 | - https://machinelearningmastery.com/start-here/ 7 | 8 | 9 | 10 | 11 | 12 | ### How to save / upload model 13 | ```py 14 | import joblib 15 | 16 | model = joblib.load('model.sav') # Load the model 17 | joblib.dump(model, 'model.sav') # Save the model 18 | ``` 19 | 20 | ### K-Nearest Neighbors (KNN) 21 | > [Notebook](knn.ipynb) 22 | - Measured with Euclidean or Manhattan [distance](https://www.analyticsvidhya.com/blog/2020/02/4-types-of-distance-metrics-in-machine-learning/) 23 | - For **KNN regressor** you take the average of `n_neighbors=23` nearest neighbours 24 | - For **KNN classifier** you take the mood of `n_neighbors=23` nearest neighbours 25 | 26 | ### SVM 27 | > [Notebook](svm.ipynb) 28 | - `support vectors`, `hyperplane`, `margin`, `linear seperable`, `non-linear seperable` 29 | - Our goal is to **maximize** the **margin** (distance between marginal hyperplanes) 30 | - **SVM kernels** - transforms from low-dimension to high-dimension 31 | 32 | ### K-Means Clustering 33 | 1. Select **K** value - centroid 34 | 2. Initialize centroids randomly 35 | 3. Calculate **Euclidean distance** between two points 36 | 4. Select the group and find the **mean** 37 | 5. Move controid to that mean 38 | 39 | - How to select **K**? 
40 | - Elbow method -------------------------------------------------------------------------------- /scikit-learn/img/process.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/scikit-learn/img/process.png -------------------------------------------------------------------------------- /scikit-learn/img/process1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/scikit-learn/img/process1.png -------------------------------------------------------------------------------- /scikit-learn/img/scikit-learn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/scikit-learn/img/scikit-learn.png -------------------------------------------------------------------------------- /scikit-learn/k-means-clustering.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 4, 16 | "kernelspec": { 17 | "name": "python3", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "interpreter": { 21 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 22 | } 23 | }, 24 | "nbformat": 4, 25 | "nbformat_minor": 2, 26 | "cells": [ 27 | { 28 | "cell_type": "code", 29 | "execution_count": 1, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "from sklearn.datasets import load_breast_cancer\n", 34 | "from sklearn.cluster import KMeans\n", 35 | "from sklearn.model_selection import train_test_split\n", 36 | "from sklearn.metrics import accuracy_score\n", 37 | "from sklearn.preprocessing import scale\n", 38 | "\n", 39 | "import numpy as np \n", 40 | "import pandas as pd \n" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 3, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "output_type": "execute_result", 50 | "data": { 51 | "text/plain": [ 52 | "{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,\n", 53 | " 1.189e-01],\n", 54 | " [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,\n", 55 | " 8.902e-02],\n", 56 | " [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,\n", 57 | " 8.758e-02],\n", 58 | " ...,\n", 59 | " [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,\n", 60 | " 7.820e-02],\n", 61 | " [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,\n", 62 | " 1.240e-01],\n", 63 | " [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,\n", 64 | " 7.039e-02]]),\n", 65 | " 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,\n", 66 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,\n", 67 | " 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,\n", 68 | " 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,\n", 69 | " 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,\n", 70 | " 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 
0,\n", 71 | " 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,\n", 72 | " 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,\n", 73 | " 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,\n", 74 | " 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,\n", 75 | " 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,\n", 76 | " 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", 77 | " 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,\n", 78 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,\n", 79 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,\n", 80 | " 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,\n", 81 | " 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,\n", 82 | " 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,\n", 83 | " 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,\n", 84 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,\n", 85 | " 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,\n", 86 | " 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,\n", 87 | " 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,\n", 88 | " 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,\n", 89 | " 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n", 90 | " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]),\n", 91 | " 'frame': None,\n", 92 | " 'target_names': array(['malignant', 'benign'], dtype='\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n" 322 | }, 323 | "metadata": {}, 324 | "execution_count": 21 325 | } 326 | ], 327 | "source": [ 328 | "# SOMETIMES IT MAY FLIP THE CLUSTERS, THEN WE MUST USE\n", 329 | "\n", 330 | "pd.crosstab(y_train, labels)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [] 339 | } 340 | ] 341 | } -------------------------------------------------------------------------------- /scikit-learn/knn.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 4, 16 | "kernelspec": { 17 | "name": "python3", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "interpreter": { 21 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 22 | } 23 | }, 24 | "nbformat": 4, 25 | "nbformat_minor": 2, 26 | "cells": [ 27 | { 28 | "cell_type": "code", 29 | "execution_count": 14, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import numpy as np \n", 34 | "import pandas as pd \n", 35 | "from sklearn import neighbors, metrics\n", 36 | "from sklearn.model_selection import train_test_split\n", 37 | "from sklearn.preprocessing import LabelEncoder" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 15, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "output_type": "execute_result", 47 | "data": { 48 | "text/plain": [ 49 | " buying maint doors persons lug_boot safety class\n", 50 | "0 vhigh vhigh 2 2 small low unacc\n", 51 | "1 vhigh vhigh 2 2 small med unacc\n", 52 | "2 vhigh vhigh 2 2 small high unacc\n", 53 | "3 vhigh vhigh 2 2 med low unacc\n", 54 | "4 vhigh vhigh 2 2 med med unacc" 55 | ], 56 | "text/html": "
\n
" 57 | }, 58 | "metadata": {}, 59 | "execution_count": 15 60 | } 61 | ], 62 | "source": [ 63 | "data = pd.read_csv(\"car_evaluation.csv\")\n", 64 | "data.head()" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 16, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "output_type": "execute_result", 74 | "data": { 75 | "text/plain": [ 76 | " buying maint safety\n", 77 | "0 vhigh vhigh low\n", 78 | "1 vhigh vhigh med\n", 79 | "2 vhigh vhigh high\n", 80 | "3 vhigh vhigh low\n", 81 | "4 vhigh vhigh med" 82 | ], 83 | "text/html": "
\n
" 84 | }, 85 | "metadata": {}, 86 | "execution_count": 16 87 | } 88 | ], 89 | "source": [ 90 | "# Select features\n", 91 | "X = data[['buying', 'maint', 'safety']]\n", 92 | "X.head()" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 17, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "output_type": "execute_result", 102 | "data": { 103 | "text/plain": [ 104 | "0 unacc\n", 105 | "1 unacc\n", 106 | "2 unacc\n", 107 | "3 unacc\n", 108 | "4 unacc\n", 109 | "Name: class, dtype: object" 110 | ] 111 | }, 112 | "metadata": {}, 113 | "execution_count": 17 114 | } 115 | ], 116 | "source": [ 117 | "# Select the label\n", 118 | "y = data['class']\n", 119 | "y.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 18, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "output_type": "execute_result", 129 | "data": { 130 | "text/plain": [ 131 | "array([['vhigh', 'vhigh', 'low'],\n", 132 | " ['vhigh', 'vhigh', 'med'],\n", 133 | " ['vhigh', 'vhigh', 'high'],\n", 134 | " ...,\n", 135 | " ['low', 'low', 'low'],\n", 136 | " ['low', 'low', 'med'],\n", 137 | " ['low', 'low', 'high']], dtype=object)" 138 | ] 139 | }, 140 | "metadata": {}, 141 | "execution_count": 18 142 | } 143 | ], 144 | "source": [ 145 | "X = X.values # NumPy array\n", 146 | "X" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 19, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "output_type": "stream", 156 | "name": "stdout", 157 | "text": [ 158 | "(1728, 3)\n['vhigh' 'vhigh' 'vhigh' ... 'low' 'low' 'low']\n['vhigh' 'vhigh' 'vhigh' ... 'low' 'low' 'low']\n['low' 'med' 'high' ... 'low' 'med' 'high']\n" 159 | ] 160 | }, 161 | { 162 | "output_type": "execute_result", 163 | "data": { 164 | "text/plain": [ 165 | "array([[3, 3, 1],\n", 166 | " [3, 3, 2],\n", 167 | " [3, 3, 0],\n", 168 | " [3, 3, 1],\n", 169 | " [3, 3, 2]], dtype=object)" 170 | ] 171 | }, 172 | "metadata": {}, 173 | "execution_count": 19 174 | } 175 | ], 176 | "source": [ 177 | "\"\"\" \n", 178 | "Now we have the problem: our data consists of strings, we need to convert into nums with LabelEncoder\n", 179 | "\"\"\"\n", 180 | "# X conversion\n", 181 | "print(X.shape)\n", 182 | "\n", 183 | "for i in range(X.shape[1]): # 3\n", 184 | " print(X[:, i]) # Selects the first element for 3 columns\n", 185 | "\n", 186 | "LE = LabelEncoder()\n", 187 | "for i in range(len(X[0])):\n", 188 | " X[:, i] = LE.fit_transform(X[:, i])\n", 189 | "\n", 190 | "X[:5] # vhigh=3, med=2, low=1, high=0" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 21, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "output_type": "execute_result", 200 | "data": { 201 | "text/plain": [ 202 | "array([0, 0, 0, ..., 0, 2, 3])" 203 | ] 204 | }, 205 | "metadata": {}, 206 | "execution_count": 21 207 | } 208 | ], 209 | "source": [ 210 | "# y conversion\n", 211 | "label_mapping = {\n", 212 | " 'unacc':0,\n", 213 | " 'acc':1,\n", 214 | " 'good':2,\n", 215 | " 'vgood':3,\n", 216 | "}\n", 217 | "\n", 218 | "y = y.map(label_mapping)\n", 219 | "y = np.array(y)\n", 220 | "y" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 29, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "output_type": "execute_result", 230 | "data": { 231 | "text/plain": [ 232 | "KNeighborsClassifier(n_neighbors=23)" 233 | ] 234 | }, 235 | "metadata": {}, 236 | "execution_count": 29 237 | } 238 | ], 239 | "source": [ 240 | "# KNN Model\n", 241 | "X_train, X_test, y_train, y_test = train_test_split(X, y, 
test_size=0.2) # 20% to test set\n", 242 | "\n", 243 | "knn = neighbors.KNeighborsClassifier(n_neighbors=23, weights='uniform')\n", 244 | "knn.fit(X_train, y_train)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 30, 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "output_type": "execute_result", 254 | "data": { 255 | "text/plain": [ 256 | "0.7485549132947977" 257 | ] 258 | }, 259 | "metadata": {}, 260 | "execution_count": 30 261 | } 262 | ], 263 | "source": [ 264 | "predictions = knn.predict(X_test)\n", 265 | "accuracy = metrics.accuracy_score(y_test, predictions)\n", 266 | "accuracy" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 31, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "output_type": "execute_result", 276 | "data": { 277 | "text/plain": [ 278 | "array([0, 1, 1, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,\n", 279 | " 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 1, 2, 0, 1, 2, 2, 0, 1, 0, 0, 0,\n", 280 | " 0, 0, 1, 0, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 2, 0, 0, 2, 1,\n", 281 | " 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 1,\n", 282 | " 1, 2, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,\n", 283 | " 0, 2, 0, 0, 0, 0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0,\n", 284 | " 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 1, 1, 0, 1, 0, 0, 2,\n", 285 | " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,\n", 286 | " 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,\n", 287 | " 2, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,\n", 288 | " 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,\n", 289 | " 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0,\n", 290 | " 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", 291 | " 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,\n", 292 | " 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 2, 2, 0, 0, 0, 1, 1, 1, 1,\n", 293 | " 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1])" 294 | ] 295 | }, 296 | "metadata": {}, 297 | "execution_count": 31 298 | } 299 | ], 300 | "source": [ 301 | "predictions" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "# For KNN regressor you take the average of n_neighbors = 23 nearest neighbours\n", 311 | "# For KNN classifier you take the mood of n_neighbors = 23 nearest neighbours" 312 | ] 313 | } 314 | ] 315 | } -------------------------------------------------------------------------------- /scikit-learn/logistic_regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": 3 14 | }, 15 | "orig_nbformat": 4 16 | }, 17 | "nbformat": 4, 18 | "nbformat_minor": 2, 19 | "cells": [ 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "\"\"\"Logistic regression\"\"\"\n", 27 | "\n" 28 | ] 29 | } 30 | ] 31 | } -------------------------------------------------------------------------------- /scikit-learn/svm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": 
{ 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 4, 16 | "kernelspec": { 17 | "name": "python3", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "interpreter": { 21 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 22 | } 23 | }, 24 | "nbformat": 4, 25 | "nbformat_minor": 2, 26 | "cells": [ 27 | { 28 | "cell_type": "code", 29 | "execution_count": 15, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "from sklearn import datasets\n", 34 | "from sklearn.model_selection import train_test_split\n", 35 | "from sklearn.metrics import accuracy_score\n", 36 | "from sklearn import svm\n", 37 | "\n", 38 | "import numpy as np" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 9, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "iris = datasets.load_iris()\n", 48 | "classes = ['Iris Setosa', 'Iris Versicolour', 'Iris Virginica']\n", 49 | "\n", 50 | "# Split into features and labels\n", 51 | "X = iris.data\n", 52 | "y = iris.target" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 10, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "output_type": "stream", 62 | "name": "stdout", 63 | "text": [ 64 | "[[5.1 3.5 1.4 0.2]\n [4.9 3. 1.4 0.2]\n [4.7 3.2 1.3 0.2]\n [4.6 3.1 1.5 0.2]\n [5. 3.6 1.4 0.2]]\n[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2\n 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\n 2 2]\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "print(X[:5]) # NumPy array\n", 70 | "print(y) # So we have 3 labels" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 11, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "output_type": "stream", 80 | "name": "stdout", 81 | "text": [ 82 | "(150, 4)\n150\n" 83 | ] 84 | } 85 | ], 86 | "source": [ 87 | "print(X.shape) \n", 88 | "print(len(y))" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 12, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 20% to test set" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 13, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "output_type": "execute_result", 107 | "data": { 108 | "text/plain": [ 109 | "SVC()" 110 | ] 111 | }, 112 | "metadata": {}, 113 | "execution_count": 13 114 | } 115 | ], 116 | "source": [ 117 | "model = svm.SVC() # Classifier\n", 118 | "model.fit(X_train, y_train)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 18, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "output_type": "execute_result", 128 | "data": { 129 | "text/plain": [ 130 | "0.9333333333333333" 131 | ] 132 | }, 133 | "metadata": {}, 134 | "execution_count": 18 135 | } 136 | ], 137 | "source": [ 138 | "predictions = model.predict(X_test)\n", 139 | "acc = accuracy_score(predictions, y_test)\n", 140 | "acc" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 19, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "output_type": 
"execute_result", 150 | "data": { 151 | "text/plain": [ 152 | "array([0, 2, 0, 1, 1, 2, 2, 1, 2, 1, 0, 2, 0, 0, 1, 1, 2, 2, 2, 2, 1, 0,\n", 153 | " 1, 0, 0, 2, 1, 2, 1, 1])" 154 | ] 155 | }, 156 | "metadata": {}, 157 | "execution_count": 19 158 | } 159 | ], 160 | "source": [ 161 | "predictions" 162 | ] 163 | } 164 | ] 165 | } -------------------------------------------------------------------------------- /scikit-learn/train_test_split.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 4, 16 | "kernelspec": { 17 | "name": "python3", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "interpreter": { 21 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 22 | } 23 | }, 24 | "nbformat": 4, 25 | "nbformat_minor": 2, 26 | "cells": [ 27 | { 28 | "cell_type": "code", 29 | "execution_count": 15, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "from sklearn import datasets\n", 34 | "from sklearn.model_selection import train_test_split\n", 35 | "import numpy as np" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "iris = datasets.load_iris()\n", 45 | "\n", 46 | "# Split into features and labels\n", 47 | "X = iris.data\n", 48 | "y = iris.target" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 13, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "output_type": "stream", 58 | "name": "stdout", 59 | "text": [ 60 | "[[5.1 3.5 1.4 0.2]\n [4.9 3. 1.4 0.2]\n [4.7 3.2 1.3 0.2]\n [4.6 3.1 1.5 0.2]\n [5. 
3.6 1.4 0.2]]\n[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2\n 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\n 2 2]\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "print(X[:5]) # NumPy array\n", 66 | "print(y) # So we have 3 labels" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 12, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "output_type": "stream", 76 | "name": "stdout", 77 | "text": [ 78 | "(150, 4)\n150\n" 79 | ] 80 | } 81 | ], 82 | "source": [ 83 | "print(X.shape) \n", 84 | "print(len(y))" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 19, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "output_type": "stream", 94 | "name": "stdout", 95 | "text": [ 96 | "30.0\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 20% to test set\n", 102 | "print(150 * 0.2) # 120 / 30" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 18, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "output_type": "execute_result", 112 | "data": { 113 | "text/plain": [ 114 | "(120, 4)" 115 | ] 116 | }, 117 | "metadata": {}, 118 | "execution_count": 18 119 | } 120 | ], 121 | "source": [ 122 | "X_train.shape" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 20, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "output_type": "execute_result", 132 | "data": { 133 | "text/plain": [ 134 | "120" 135 | ] 136 | }, 137 | "metadata": {}, 138 | "execution_count": 20 139 | } 140 | ], 141 | "source": [ 142 | "len(y_train)" 143 | ] 144 | } 145 | ] 146 | } -------------------------------------------------------------------------------- /tensorflow-in-practice/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/.DS_Store -------------------------------------------------------------------------------- /tensorflow-in-practice/Exercises/Exercise_2_Handwriting_Recognition_DNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Exercise2-Question.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [], 9 | "toc_visible": true 10 | }, 11 | "kernelspec": { 12 | "name": "python386jvsc74a57bd04ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a", 13 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 14 | }, 15 | "metadata": { 16 | "interpreter": { 17 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 18 | } 19 | } 20 | }, 21 | "cells": [ 22 | { 23 | "cell_type": "code", 24 | "execution_count": null, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "# Rustam-Z" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "metadata": { 34 | "colab": { 35 | "base_uri": "https://localhost:8080/" 36 | }, 37 | "id": "9rvXQGAA0ssC", 38 | "outputId": "60861935-7551-475e-e8c4-507b43cc6de7" 39 | }, 40 | "source": [ 41 | "import tensorflow as tf\n", 42 | "\n", 43 | "class myCallback(tf.keras.callbacks.Callback):\n", 44 | " def on_epoch_end(self, epoch, logs={}):\n", 45 | " if(logs.get('accuracy')>0.99):\n", 46 | " 
print( \"Reached 99% accuracy so cancelling training!\")\n", 47 | " self.model.stop_training = True\n", 48 | "\n", 49 | "\n", 50 | "mnist = tf.keras.datasets.mnist\n", 51 | "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n", 52 | "x_train, x_test = x_train / 255.0, x_test / 255.0\n", 53 | "\n", 54 | "callbacks = myCallback()\n", 55 | "\n", 56 | "model = tf.keras.models.Sequential([\n", 57 | " tf.keras.layers.Flatten(input_shape=(28, 28)),\n", 58 | " tf.keras.layers.Dense(512, activation=\"relu\"),\n", 59 | " tf.keras.layers.Dense(10, activation=\"softmax\")\n", 60 | "])\n", 61 | "\n", 62 | "model.compile(optimizer='adam',\n", 63 | " loss='sparse_categorical_crossentropy',\n", 64 | " metrics=['accuracy'])\n", 65 | "\n", 66 | "model.fit(x_train, y_train, epochs=5, callbacks=[callbacks])" 67 | ], 68 | "execution_count": 1, 69 | "outputs": [ 70 | { 71 | "output_type": "stream", 72 | "name": "stdout", 73 | "text": [ 74 | "Epoch 1/5\n", 75 | "1875/1875 [==============================] - 1s 656us/step - loss: 0.3419 - accuracy: 0.9011\n", 76 | "Epoch 2/5\n", 77 | "1875/1875 [==============================] - 1s 649us/step - loss: 0.0835 - accuracy: 0.9749\n", 78 | "Epoch 3/5\n", 79 | "1875/1875 [==============================] - 1s 653us/step - loss: 0.0527 - accuracy: 0.9835\n", 80 | "Epoch 4/5\n", 81 | "1875/1875 [==============================] - 1s 655us/step - loss: 0.0366 - accuracy: 0.9877\n", 82 | "Epoch 5/5\n", 83 | "1875/1875 [==============================] - 1s 653us/step - loss: 0.0248 - accuracy: 0.9925\n", 84 | "Reached 99% accuracy so cancelling training!\n" 85 | ] 86 | }, 87 | { 88 | "output_type": "execute_result", 89 | "data": { 90 | "text/plain": [ 91 | "" 92 | ] 93 | }, 94 | "metadata": {}, 95 | "execution_count": 1 96 | } 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "metadata": { 102 | "id": "qErwFEW0mz0H", 103 | "outputId": "3d8ba790-8c5e-4a55-c824-ecd92a00352c", 104 | "colab": { 105 | "base_uri": "https://localhost:8080/" 106 | } 107 | }, 108 | "source": [ 109 | "import tensorflow as tf\n", 110 | "\n", 111 | "print(tf.nn.relu)" 112 | ], 113 | "execution_count": 2, 114 | "outputs": [ 115 | { 116 | "output_type": "stream", 117 | "name": "stdout", 118 | "text": [ 119 | "\n" 120 | ] 121 | } 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "metadata": { 127 | "id": "18I-y7X-q84V", 128 | "outputId": "e9478496-2bc1-4ce9-ee62-82af2df4a8df", 129 | "colab": { 130 | "base_uri": "https://localhost:8080/" 131 | } 132 | }, 133 | "source": [ 134 | "model.evaluate(x_test, y_test)" 135 | ], 136 | "execution_count": 3, 137 | "outputs": [ 138 | { 139 | "output_type": "stream", 140 | "name": "stdout", 141 | "text": [ 142 | "313/313 [==============================] - 0s 293us/step - loss: 0.0637 - accuracy: 0.9809\n" 143 | ] 144 | }, 145 | { 146 | "output_type": "execute_result", 147 | "data": { 148 | "text/plain": [ 149 | "[0.06370978057384491, 0.98089998960495]" 150 | ] 151 | }, 152 | "metadata": {}, 153 | "execution_count": 3 154 | } 155 | ] 156 | } 157 | ] 158 | } -------------------------------------------------------------------------------- /tensorflow-in-practice/Exercises/Exercise_3_CNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Exercise 3 - Question.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": 
"python386jvsc74a57bd04ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a", 12 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 13 | }, 14 | "metadata": { 15 | "interpreter": { 16 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 17 | } 18 | } 19 | }, 20 | "cells": [ 21 | { 22 | "cell_type": "code", 23 | "metadata": { 24 | "id": "yl3yB8J_PCZM" 25 | }, 26 | "source": [ 27 | "# Rustam-Z" 28 | ], 29 | "execution_count": null, 30 | "outputs": [] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "metadata": { 35 | "colab": { 36 | "base_uri": "https://localhost:8080/" 37 | }, 38 | "id": "KtixUwmvSD0A", 39 | "outputId": "34a18be4-67c7-4147-a4e7-62efa5fe3124" 40 | }, 41 | "source": [ 42 | "import tensorflow as tf\n", 43 | "\n", 44 | "mnist = tf.keras.datasets.mnist\n", 45 | "(training_images, training_labels), (test_images, test_labels) = mnist.load_data()" 46 | ], 47 | "execution_count": 1, 48 | "outputs": [] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "metadata": { 53 | "id": "EiLuNPb-TnyF" 54 | }, 55 | "source": [ 56 | "training_images=training_images.reshape(60000, 28, 28, 1)\n", 57 | "training_images=training_images / 255.0\n", 58 | "test_images = test_images.reshape(10000, 28, 28, 1)\n", 59 | "test_images=test_images/255.0" 60 | ], 61 | "execution_count": 2, 62 | "outputs": [] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "metadata": { 67 | "id": "I-3_hM1mSImZ" 68 | }, 69 | "source": [ 70 | "class myCallback(tf.keras.callbacks.Callback):\n", 71 | " def on_epoch_end(self, epoch, logs={}):\n", 72 | " if(logs.get('accuracy')>0.998):\n", 73 | " print(\"\\nReached 99.8% accuracy so cancelling training!\")\n", 74 | " self.model.stop_training = True" 75 | ], 76 | "execution_count": 3, 77 | "outputs": [] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "metadata": { 82 | "id": "sfQRyaJWAIdg" 83 | }, 84 | "source": [ 85 | "callbacks = myCallback()\n", 86 | "\n", 87 | "model = tf.keras.models.Sequential([\n", 88 | " tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),\n", 89 | " tf.keras.layers.MaxPooling2D(2,2),\n", 90 | " tf.keras.layers.Flatten(),\n", 91 | " tf.keras.layers.Dense(128, activation='relu'),\n", 92 | " tf.keras.layers.Dense(10, activation='softmax')\n", 93 | "])\n", 94 | "\n", 95 | "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])" 96 | ], 97 | "execution_count": 4, 98 | "outputs": [] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "metadata": { 103 | "colab": { 104 | "base_uri": "https://localhost:8080/" 105 | }, 106 | "id": "i10RG-0ySDGY", 107 | "outputId": "104e7f06-b70e-4fbb-c283-c1b4a55e8b67" 108 | }, 109 | "source": [ 110 | "model.fit(training_images, training_labels, epochs=20, callbacks=[callbacks])" 111 | ], 112 | "execution_count": 5, 113 | "outputs": [ 114 | { 115 | "output_type": "stream", 116 | "name": "stdout", 117 | "text": [ 118 | "Epoch 1/20\n", 119 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.2992 - accuracy: 0.9104\n", 120 | "Epoch 2/20\n", 121 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0521 - accuracy: 0.9840\n", 122 | "Epoch 3/20\n", 123 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0298 - accuracy: 0.9906\n", 124 | "Epoch 4/20\n", 125 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0208 - accuracy: 0.9932\n", 126 | "Epoch 5/20\n", 127 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0133 - accuracy: 0.9960\n", 128 | "Epoch 6/20\n", 129 
| "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0091 - accuracy: 0.9972\n", 130 | "Epoch 7/20\n", 131 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0065 - accuracy: 0.9978\n", 132 | "Epoch 8/20\n", 133 | "1875/1875 [==============================] - 7s 4ms/step - loss: 0.0050 - accuracy: 0.9985\n", 134 | "\n", 135 | "Reached 99.8% accuracy so cancelling training!\n" 136 | ] 137 | }, 138 | { 139 | "output_type": "execute_result", 140 | "data": { 141 | "text/plain": [ 142 | "" 143 | ] 144 | }, 145 | "metadata": {}, 146 | "execution_count": 5 147 | } 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 9, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "output_type": "stream", 157 | "name": "stdout", 158 | "text": [ 159 | "\n[]\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "import tensorflow as tf\n", 165 | "print(tf.test.gpu_device_name())\n", 166 | "print(tf.config.list_physical_devices('GPU'))" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 6, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "output_type": "execute_result", 176 | "data": { 177 | "text/plain": [ 178 | "[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]" 179 | ] 180 | }, 181 | "metadata": {}, 182 | "execution_count": 6 183 | } 184 | ], 185 | "source": [ 186 | "tf.config.list_physical_devices()" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 9, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "output_type": "execute_result", 196 | "data": { 197 | "text/plain": [ 198 | "True" 199 | ] 200 | }, 201 | "metadata": {}, 202 | "execution_count": 9 203 | } 204 | ], 205 | "source": [ 206 | "from tensorflow.python.compiler.mlcompute import mlcompute\n", 207 | "mlcompute.is_apple_mlc_enabled()" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 10, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "output_type": "execute_result", 217 | "data": { 218 | "text/plain": [ 219 | "True" 220 | ] 221 | }, 222 | "metadata": {}, 223 | "execution_count": 10 224 | } 225 | ], 226 | "source": [ 227 | "mlcompute.is_tf_compiled_with_apple_mlc()" 228 | ] 229 | } 230 | ] 231 | } -------------------------------------------------------------------------------- /tensorflow-in-practice/Exercises/Exercise_4_Complex_Images_flow_from_directory.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Exercise 4-Question.ipynb", 7 | "provenance": [] 8 | }, 9 | "kernelspec": { 10 | "display_name": "Python 3", 11 | "name": "python3" 12 | } 13 | }, 14 | "cells": [ 15 | { 16 | "cell_type": "code", 17 | "metadata": { 18 | "colab": { 19 | "base_uri": "https://localhost:8080/" 20 | }, 21 | "id": "7Vti6p3PxmpS", 22 | "outputId": "99f9f945-5bd1-41e0-c274-77966a56d7aa" 23 | }, 24 | "source": [ 25 | "import tensorflow as tf\n", 26 | "import os\n", 27 | "import zipfile\n", 28 | "\n", 29 | "DESIRED_ACCURACY = 0.999\n", 30 | "\n", 31 | "!wget --no-check-certificate \\\n", 32 | " \"https://storage.googleapis.com/laurencemoroney-blog.appspot.com/happy-or-sad.zip\" \\\n", 33 | " -O \"/tmp/happy-or-sad.zip\"\n", 34 | "\n", 35 | "zip_ref = zipfile.ZipFile(\"/tmp/happy-or-sad.zip\", 'r')\n", 36 | "zip_ref.extractall(\"/tmp/h-or-s\")\n", 37 | "zip_ref.close()\n", 38 | "\n", 39 | "class myCallback(tf.keras.callbacks.Callback):\n", 40 | " def on_epoch_end(self, 
epoch, logs={}):\n", 41 | " if(logs.get('accuracy')>DESIRED_ACCURACY):\n", 42 | " print(\"\\nReached 99.9% accuracy so cancelling training!\")\n", 43 | " self.model.stop_training = True\n", 44 | "\n", 45 | "callbacks = myCallback()" 46 | ], 47 | "execution_count": 17, 48 | "outputs": [ 49 | { 50 | "output_type": "stream", 51 | "text": [ 52 | "--2021-04-08 03:16:56-- https://storage.googleapis.com/laurencemoroney-blog.appspot.com/happy-or-sad.zip\n", 53 | "Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.215.128, 173.194.216.128, 173.194.217.128, ...\n", 54 | "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.215.128|:443... connected.\n", 55 | "HTTP request sent, awaiting response... 200 OK\n", 56 | "Length: 2670333 (2.5M) [application/zip]\n", 57 | "Saving to: ‘/tmp/happy-or-sad.zip’\n", 58 | "\n", 59 | "\r/tmp/happy-or-sad.z 0%[ ] 0 --.-KB/s \r/tmp/happy-or-sad.z 100%[===================>] 2.55M --.-KB/s in 0.01s \n", 60 | "\n", 61 | "2021-04-08 03:16:56 (217 MB/s) - ‘/tmp/happy-or-sad.zip’ saved [2670333/2670333]\n", 62 | "\n" 63 | ], 64 | "name": "stdout" 65 | } 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "metadata": { 71 | "id": "6DLGbXXI1j_V" 72 | }, 73 | "source": [ 74 | "# This Code Block should Define and Compile the Model\n", 75 | "model = tf.keras.models.Sequential([\n", 76 | " tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),\n", 77 | " tf.keras.layers.MaxPooling2D(2, 2),\n", 78 | " tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),\n", 79 | " tf.keras.layers.MaxPooling2D(2, 2),\n", 80 | " tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),\n", 81 | " tf.keras.layers.MaxPooling2D(2, 2),\n", 82 | " tf.keras.layers.Flatten(), # Flatten the results to feed into a DNN\n", 83 | " tf.keras.layers.Dense(512, activation='relu'), # 512 neuron hidden layer\n", 84 | " tf.keras.layers.Dense(1, activation='sigmoid'),\n", 85 | "])\n", 86 | "\n", 87 | "\n", 88 | "from tensorflow.keras.optimizers import RMSprop\n", 89 | "\n", 90 | "model.compile(loss=\"binary_crossentropy\",\n", 91 | " optimizer=RMSprop(lr=0.001),\n", 92 | " metrics=['accuracy'])" 93 | ], 94 | "execution_count": 18, 95 | "outputs": [] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "metadata": { 100 | "colab": { 101 | "base_uri": "https://localhost:8080/" 102 | }, 103 | "id": "4Ap9fUJE1vVu", 104 | "outputId": "cc7b7369-a630-478d-b57e-62c015cf127a" 105 | }, 106 | "source": [ 107 | "# This code block should create an instance of an ImageDataGenerator called train_datagen \n", 108 | "# And a train_generator by calling train_datagen.flow_from_directory\n", 109 | "\n", 110 | "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", 111 | "\n", 112 | "train_datagen = ImageDataGenerator(rescale=1./255)\n", 113 | "\n", 114 | "train_generator = train_datagen.flow_from_directory(\n", 115 | " '/tmp/h-or-s/',\n", 116 | " target_size=(300, 300),\n", 117 | " batch_size=8,\n", 118 | " class_mode='binary')\n", 119 | "\n", 120 | "# Expected output: 'Found 80 images belonging to 2 classes'" 121 | ], 122 | "execution_count": 13, 123 | "outputs": [ 124 | { 125 | "output_type": "stream", 126 | "text": [ 127 | "Found 80 images belonging to 2 classes.\n" 128 | ], 129 | "name": "stdout" 130 | } 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "metadata": { 136 | "colab": { 137 | "base_uri": "https://localhost:8080/" 138 | }, 139 | "id": "48dLm13U1-Le", 140 | "outputId": "8c82e79e-fed0-4b0a-be86-089d17f1cd66" 141 | }, 142 | "source": [ 143 | 
"# This code block should call model.fit and train for\n", 144 | "# a number of epochs. \n", 145 | "history = model.fit(\n", 146 | " train_generator,\n", 147 | " steps_per_epoch=10,\n", 148 | " epochs=20,\n", 149 | " callbacks=[callbacks])\n", 150 | " \n", 151 | "# Expected output: \"Reached 99.9% accuracy so cancelling training!\"\"" 152 | ], 153 | "execution_count": 19, 154 | "outputs": [ 155 | { 156 | "output_type": "stream", 157 | "text": [ 158 | "Epoch 1/20\n", 159 | "10/10 [==============================] - 11s 1s/step - loss: 4.0546 - accuracy: 0.5853\n", 160 | "Epoch 2/20\n", 161 | "10/10 [==============================] - 9s 936ms/step - loss: 0.8477 - accuracy: 0.6622\n", 162 | "Epoch 3/20\n", 163 | "10/10 [==============================] - 10s 1s/step - loss: 0.2766 - accuracy: 0.9474\n", 164 | "Epoch 4/20\n", 165 | "10/10 [==============================] - 10s 1s/step - loss: 0.1236 - accuracy: 0.9822\n", 166 | "Epoch 5/20\n", 167 | "10/10 [==============================] - 9s 921ms/step - loss: 0.0683 - accuracy: 0.9704\n", 168 | "Epoch 6/20\n", 169 | "10/10 [==============================] - 10s 977ms/step - loss: 0.0221 - accuracy: 1.0000\n", 170 | "\n", 171 | "Reached 99.9% accuracy so cancelling training!\n" 172 | ], 173 | "name": "stdout" 174 | } 175 | ] 176 | } 177 | ] 178 | } -------------------------------------------------------------------------------- /tensorflow-in-practice/MNIST/my_model.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/MNIST/my_model.h5 -------------------------------------------------------------------------------- /tensorflow-in-practice/MNIST/test.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | from PIL import Image 4 | import cv2 5 | import matplotlib.pyplot as plt 6 | 7 | model = tf.keras.models.load_model('tensorflow-in-practice/notebooks/MNIST/my_model.h5') 8 | 9 | image = cv2.imread('tensorflow-in-practice/img/0.jpg') 10 | image = cv2.resize(image,(28,28)) 11 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) 12 | data = np.vstack([gray]) 13 | data=data/255.0 14 | 15 | plt.imshow(gray, cmap='gray') 16 | plt.show() 17 | 18 | indices_one = data == 1 19 | data[indices_one] = 0 # replacing 1s with 0s 20 | print(data) 21 | 22 | predictions = model.predict(np.expand_dims(data, 0)) 23 | print("\nAnswer:") 24 | print(predictions) 25 | -------------------------------------------------------------------------------- /tensorflow-in-practice/MNIST/train.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | from PIL import Image 4 | import cv2 5 | import matplotlib.pyplot as plt 6 | 7 | class myCallback(tf.keras.callbacks.Callback): 8 | def on_epoch_end(self, epoch, logs={}): 9 | if(logs.get('accuracy')>0.90): 10 | print("\nReached 99% accuracy so cancelling training!") 11 | self.model.stop_training = True 12 | 13 | mnist = tf.keras.datasets.mnist 14 | 15 | (x_train, y_train),(x_test, y_test) = mnist.load_data() 16 | x_train, x_test = x_train / 255.0, x_test / 255.0 17 | 18 | callbacks = myCallback() 19 | 20 | model = tf.keras.models.Sequential([ 21 | tf.keras.layers.Flatten(input_shape=(28, 28)), 22 | tf.keras.layers.Dense(512, activation=tf.nn.relu), 23 | tf.keras.layers.Dense(256, activation=tf.nn.relu), 24 | 
tf.keras.layers.Dense(128, activation=tf.nn.relu), 25 | tf.keras.layers.Dense(10, activation=tf.nn.softmax) 26 | ]) 27 | model.compile(optimizer=tf.optimizers.Adam(), 28 | loss='sparse_categorical_crossentropy', 29 | metrics=['accuracy']) 30 | 31 | model.fit(x_train, y_train, epochs=10, callbacks=[callbacks]) 32 | 33 | image = cv2.imread('3.png') 34 | image = cv2.resize(image,(28,28)) 35 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) 36 | data = np.vstack([gray]) 37 | data=data/255.0 38 | 39 | plt.imshow(gray, cmap='gray') 40 | plt.show() 41 | 42 | indices_one = data == 1 43 | data[indices_one] = 0 # replacing 1s with 0s 44 | print(data) 45 | 46 | predictions = model.predict(np.expand_dims(data, 0)) 47 | print("\nAnswer:") 48 | print(predictions) 49 | 50 | model.save('my_model.h5') -------------------------------------------------------------------------------- /tensorflow-in-practice/README.md: -------------------------------------------------------------------------------- 1 | # [TensorFlow in Practice](https://www.coursera.org/professional-certificates/tensorflow-in-practice) by DeepLearning.AI 2 | 3 | Rustam-Z🚀, 16 April 2021 4 | 5 | Hi there👋, it is the next level. 6 | 7 | Here in this specialization you will learn TensorFlow and Keras. 8 | 9 | We will cover the basics of Keras model building structure, Computer Vision with CNN, etc. 10 | 11 | ## How to study? 12 | Go to the [specialization website](https://www.coursera.org/professional-certificates/tensorflow-in-practice), and enroll the courses (you can audit). 13 | - Course notebooks: https://github.com/lmoroney/dlaicourse 14 | 15 | ## What's next? 16 | - Start Kaggle competitions 17 | - Start reading **Hands-on Machine learning** book 18 | - **TensorFlow Advanced Techniques**: https://www.coursera.org/specializations/tensorflow-advanced-techniques -------------------------------------------------------------------------------- /tensorflow-in-practice/convolutional-neural-networks-tensorflow.md: -------------------------------------------------------------------------------- 1 | # [Convolutional Neural Networks in TensorFlow](https://www.coursera.org/learn/convolutional-neural-networks-tensorflow) 2 | 3 | - How to work with real-world images in different shapes and sizes. 4 | - Visualize the journey of an image through convolutions to understand how a computer “sees” information 5 | - Plot loss and accuracy, and explore strategies to prevent overfitting, including augmentation and dropout. 6 | - Finally, Course 2 will introduce you to transfer learning and how learned features can be extracted from models. 
7 | 8 | ## Contents: 9 | - Week 1 - [Exploring a Larger Dataset](#Exploring-a-Larger-Dataset) 10 | - Week 2 - [Augmentation](#Augmentation) 11 | - Week 3 - [Transfer Learning](#Transfer-Learning) 12 | - Week 4 - [Multiclass Classifications](#Multiclass-Classifications) 13 | 14 | ## Exploring a Larger Dataset 15 | > [Notebook](notebooks/Course_2_Part_2_Lesson_2_Notebook.ipynb) 16 | 17 | > https://www.kaggle.com/c/dogs-vs-cats 25K pictures of cats and dogs 18 | 19 | ```python 20 | # Download ZIP file and extract it with python 21 | !wget --no-check-certificate \ 22 | https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \ 23 | -O /tmp/cats_and_dogs_filtered.zip 24 | _____________________________________________ 25 | import os 26 | import zipfile 27 | 28 | local_zip = '/tmp/cats_and_dogs_filtered.zip' 29 | 30 | zip_ref = zipfile.ZipFile(local_zip, 'r') 31 | 32 | zip_ref.extractall('/tmp') 33 | zip_ref.close() 34 | ``` 35 | ```py 36 | from tensorflow.keras.preprocessing.image import ImageDataGenerator 37 | 38 | # All images will be rescaled by 1./255. 39 | train_datagen = ImageDataGenerator(rescale = 1.0/255.) 40 | 41 | train_generator = train_datagen.flow_from_directory(train_dir, 42 | batch_size=20, 43 | class_mode='binary', 44 | target_size=(150, 150)) 45 | ``` 46 | 47 | ## Augmentation 48 | > [Notebook](notebooks/Course_2_Part_4_Lesson_2_Notebook_(Cats_v_Dogs_Augmentation).ipynb) 49 | 50 | > `image-augmentation` • `data-augmentation` • `ImageDataGenerator` 51 | 52 | - All processes will happen in the main memory, from_from_directory() will generate the images on the fly. It doesn't require you to edit your raw images, nor does it amend them for you on-disk. It does it in-memory as it's performing the training, allowing you to experiment without impacting your dataset. 53 | - `ImageDataGenerator()` -> `flow_from_directory()` -> `fit_generator()` 54 | - **ImageDataGenerator** will NOT add **new images** to your data set in a sense that it will not make your epochs bigger. Instead, in each epoch it will provide slightly altered images (depending on your configuration). It will always generate new images, no matter how many epochs you have. 55 | 56 | ```python 57 | train_datagen = ImageDataGenerator( 58 | rescale=1./255, 59 | rotation_range=40, # Randomly rotate image between 0 and 40° 60 | width_shift_range=0.2, # Move picture inside its frame 61 | height_shitt_range=0.2, 62 | shear_range=0.2, # Shear up to 20% 63 | zoom_range=0.2, 64 | horizontal_flip=True, 65 | fill_mode='nearest') # It attempts to recreate lost information after a transformation like a shear 66 | 67 | train_generator = train_datagen.flow_from_directory( 68 | train_dir, # This is the source directory for training images 69 | target_size=(150, 150), # All images will be resized to 150x150 70 | batch_size=20, # Size of the batches of data, (? 
a number of samples per gradient update) 71 | class_mode='binary') 72 | 73 | history = model.fit_generator( 74 | train_generator, 75 | steps_per_epoch=100, # 2000 images = batch_size * steps, total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch 76 | epochs=100, 77 | # validation_data=validation_generator, 78 | # validation_steps=50, # 1000 images = batch_size * steps 79 | verbose=2) 80 | ``` 81 | - https://keras.io/api/preprocessing/image/ 82 | - https://fairyonice.github.io/Learn-about-ImageDataGenerator.html 83 | - https://keras.io/api/models/model_training_apis/#fit-method 84 | - https://stackoverflow.com/questions/38340311/what-is-the-difference-between-steps-and-epochs-in-tensorflow 85 | - https://stackoverflow.com/questions/51748514/does-imagedatagenerator-add-more-images-to-my-dataset 86 | 87 | ## Transfer Learning 88 | > `inception` 89 | 90 | > https://www.tensorflow.org/tutorials/images/transfer_learning 91 | 92 | ```python 93 | import os 94 | from tensorflow.keras import layers 95 | from tensorflow.keras import Model 96 | from tensorflow.keras.applications.inception_v3 import InceptionV3 97 | from tensorflow.keras.optimizers import RMSprop 98 | 99 | # Donwload InceptionV3 weights 100 | !wget --no-check-certificate \ 101 | https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \ 102 | -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 103 | 104 | local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5' 105 | pre_trained_model = Inceptionv3(input_shape=(150, 150, 3), 106 | include_top=False, # Do not include top FC (fully connected) layer 107 | weights=None) 108 | pre_trained_model.load_weights(local_weights_file) # Use own weights 109 | 110 | # Do not retrain layers, i.e freeze them 111 | for layer in pre_trained_model.layers: 112 | layer.trainable = False 113 | 114 | # pre_trained_model.summary() 115 | 116 | # Grab the mixed7 layer from inception, and take its output 117 | last_layer = pre_trained_model.get_layer('mixed7') 118 | print('last layer output shape: ', last_layer.output_shape) 119 | last_output = last_layer.output 120 | 121 | # Now, you'll need to add your own DNN at the bottom of these, which you can retrain to your data 122 | x = layers.Flatten()(last_output) 123 | x = layers.Dense(1024, activation='relu')(x) 124 | x = layers.Dropout(0.2)(x) # Drop out 20% of neurons 125 | x = layers.Dense(1, activation='sigmoid')(x) 126 | 127 | # Create model using 'Model' abstract class 128 | model = Model(pre_trained_model.input, x) 129 | model.compile(optimizer=RMSprop(lr=.0001), 130 | loss='binary_crossentropy', 131 | metrics=['accuracy']) 132 | 133 | train_datagen = ImageDataGenerator(...) 134 | train_generator = train_datagen.flow_from_directory(...) 135 | history = model.fit_generator(...) 136 | ``` 137 | > The idea behind **Dropouts** is that they **remove a random number of neurons** in your neural network. This works very well for two reasons: The first is that neighboring neurons often end up with similar weights, which can lead to overfitting, so dropping some out at random can remove this. The second is that often a neuron can over-weigh the input from a neuron in the previous layer, and can over specialize as a result. Thus, dropping out can break the neural network out of this potential bad habit! 138 | 139 | ## Multiclass Classifications 140 | - Computer generated images (CGI) will help you to create a dataset. 
Imagine you are creating a project for detecting rock, paper, scissors (💎, 📄, ✂️) during the game. So, you need lots of images of different races for both male and female, big and little hands. 141 | - http://www.laurencemoroney.com/rock-paper-scissors-dataset/ 142 | 146 | - Change to `class_mode='categorical'` in flow_from_firectory(), and output Dense layer `activation='softmax'`, and loss function in model.compile `loss='categorical_crossentropy'` 147 | - flow_from_directory() uses the alphabetical order. For example, is we test for rock the output should be [1, 0, 0] because of [rock, paper, scissors]. 148 | 149 | ## Notes 150 | - Can you use Image augmentation with Transfer Learning? 151 | > Yes. It's pre-trained layers that are frozen. So you can augment your images as you train the bottom layers of the DNN with them -------------------------------------------------------------------------------- /tensorflow-in-practice/img/0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/0.jpg -------------------------------------------------------------------------------- /tensorflow-in-practice/img/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/1.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/2.jpg -------------------------------------------------------------------------------- /tensorflow-in-practice/img/3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/3.jpg -------------------------------------------------------------------------------- /tensorflow-in-practice/img/fibonacci.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/fibonacci.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/fp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/fp.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/fp2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/fp2.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/lstm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/lstm.png 
-------------------------------------------------------------------------------- /tensorflow-in-practice/img/lstm2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/lstm2.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/metrics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/metrics.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/ml_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/ml_architecture.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/rfp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/rfp.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/rnn.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/rnn2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/rnn2.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/seasonality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/seasonality.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/tf_datasets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/tf_datasets.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/trend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/trend.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/ts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/ts.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/tsn.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/tsn.png -------------------------------------------------------------------------------- /tensorflow-in-practice/img/word_embeddings.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/img/word_embeddings.png -------------------------------------------------------------------------------- /tensorflow-in-practice/introduction-to-tensorflow-for-ai.md: -------------------------------------------------------------------------------- 1 | # [Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning](https://www.coursera.org/learn/introduction-tensorflow/home/welcome) 2 | 3 | ## Contents: 4 | - Week 1 - [A new programming paradigm](#A-new-programming-paradigm) 5 | - Week 2 - [Introduction to Computer Vision](#Introduction-to-Computer-Vision) 6 | - Week 3 - [Convolutional Neural Networks](#Convolutional-Neural-Networks) 7 | - Week 4 - [Using real-world images](#Using-Real-world-Images) 8 | 9 | > `!pip install tensorflow==2.0.0-alpha0` run it to use TensorFlow 2.x in Google Colab 10 | 11 | > The notebooks you can work with: https://drive.google.com/drive/folders/1R4bIjns1qRcTNkltbO9NOi7jgnrM-VLg?usp=sharing 12 | 13 | ## A new programming paradigm 14 | > [Notebook](notebooks/Course_1_Part_2_Lesson_2_Notebook.ipynb) 15 | 16 | ### A primer in machine learning 17 | 18 | 19 | ### The ‘Hello World’ of neural networks 20 | ```python 21 | from keras import models 22 | from keras import layers 23 | import numpy as np 24 | 25 | model = keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])]) 26 | model.compile(optimizer='sgd', loss='mean_squared_error') # Guess the pattern and measure how badly or good the algorithm works 27 | 28 | # Just imagine you have lots of Xs and Ys, the computer doesn't know the correlation between them. Your algorithm tries to connect Xs to Ys (makes guesses). The loss functions looks at the predicted outputs and actial outputs and *measures how good or badly the guess was. Then it gives its value to optimizer which figures out the next guess (update its parameters). So the optimizer thinks about how good or how badly the guess was done using the data from the loss function. 
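# With this training data the network should converge towards weight ≈ 2.0 and bias ≈ -1.0 (the underlying rule is y = 2x - 1);
# after fitting you can inspect the learned values with model.get_weights() to see how close the guesses got.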
29 | 30 | xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float) 31 | ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float) 32 | 33 | model.fit(xs, ys, epochs=500) # Training 34 | 35 | print(model.predict([10.0])) # You can expect 19 because y = 2x - 1, but it will be very close to ≈19 36 | ``` 37 | 38 | ## Introduction to Computer Vision 39 | > [Notebook](notebooks/Course_1_Part_4_Lesson_2_Notebook.ipynb) 40 | 41 | > https://github.com/zalandoresearch/fashion-mnist 70K images 42 | 43 | ```python 44 | import tensorflow as tf 45 | import numpy as np 46 | import matplotlib.pyplot as plt # plt.imshow(training_images[0]) 47 | print(tf.__version__) 48 | 49 | # Loading the dataset 50 | mnist = tf.keras.datasets.fashion_mnist 51 | (training_images, training_labels), (test_images, test_labels) = mnist.load_data() 52 | print(training_images.shape) 53 | print(test_images.shape) 54 | 55 | # Normalizing 56 | training_images = training_images / 255.0 57 | test_images = test_images / 255.0 58 | 59 | # Building the model 60 | model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 61 | tf.keras.layers.Dense(1024, activation=tf.nn.relu), 62 | tf.keras.layers.Dense(10, activation=tf.nn.softmax)]) 63 | 64 | # Defining the model, optimizer=tf.optimizers.Adam() 65 | model.compile(optimizer='adam', 66 | loss='sparse_categorical_crossentropy', 67 | metrics=['accuracy']) 68 | 69 | model.fit(training_images, training_labels, epochs=5) # Training the model, i.e. fitting training data to training labels 70 | 71 | model.evaluate(test_images, test_labels) 72 | 73 | classifications = model.predict(test_images) # Predict for new values 74 | 75 | print(">> Predicted label:", classifications[0]) 76 | print(">> Actual label:", test_labels[0]) 77 | 78 | ``` 79 | - Notes: 80 | - **Sequential**: That defines a SEQUENCE of layers in the neural network 81 | - **Flatten**: Flatten just takes the input and turns it into a 1 dimensional set. Via ROWS 82 | - **Dense**: Adds a layer of neuron. Each layer of neurons need an 'activation function' to tell them what to do. There's lots of options, but just use these for now. 83 | - **Relu** effectively means "If X>0 return X, else return 0" -- so what it does it it only passes values 0 or greater to the next layer in the network. 84 | - **Softmax** takes a set of values, and effectively picks the biggest one, so, for example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], it saves you from fishing through it looking for the biggest value, and turns it into [0,0,0,0,1,0,0,0,0] -- The goal is to save a lot of coding! 
85 | - https://stackoverflow.com/questions/44176982/how-does-the-flatten-layer-work-in-keras 86 | 87 | ```python 88 | # What if you want to stop training when you reached the accuracy needed 89 | import tensorflow as tf 90 | 91 | class myCallback(tf.keras.callbacks.Callback): 92 | def on_epoch_end(self, epoch, logs={}): 93 | if(logs.get('accuracy')>0.6): 94 | print("\nReached 60% accuracy so cancelling training!") 95 | self.model.stop_training = True 96 | 97 | mnist = tf.keras.datasets.fashion_mnist 98 | 99 | (x_train, y_train),(x_test, y_test) = mnist.load_data() 100 | x_train, x_test = x_train / 255.0, x_test / 255.0 101 | 102 | callbacks = myCallback() # Creating the callback 103 | 104 | model = tf.keras.models.Sequential([ 105 | tf.keras.layers.Flatten(input_shape=(28, 28)), 106 | tf.keras.layers.Dense(512, activation=tf.nn.relu), 107 | tf.keras.layers.Dense(10, activation=tf.nn.softmax) 108 | ]) 109 | model.compile(optimizer=tf.optimizers.Adam(), 110 | loss='sparse_categorical_crossentropy', 111 | metrics=['accuracy']) 112 | 113 | model.fit(x_train, y_train, epochs=10, callbacks=[callbacks]) # You need to add callbacks argument 114 | ``` 115 | 116 | ## Convolutional Neural Networks 117 | > [Notebook](notebooks/Course_1_Part_6_Lesson_2_Notebook.ipynb) 118 | 119 | > https://github.com/Rustam-Z/deep-learning-notes/tree/main/Course%204%20Convolutional%20Neural%20Networks 120 | 121 | **Types of layers in a convolutional network:** 122 | - Convolution (CONV) - A technique to isolate features in images 123 | - We need to know the filter size, padding (borders - valid, same), striding (jumps) 124 | - Pooling (POOL) - A technique to reduce the information in an image while maintaining features 125 | - Max pooling, average pooling 126 | - Fully connected (FC) 127 | 128 | - Formula to calculate the shape of convolution: [(n + 2p - f) / s] + 1 129 | - Formula to calculate the number of parameters in convolution: (f * f * PREVIOUS_ACTIVATION_SHAPE + 1) * ACTIVATION_SHAPE 130 | 131 | - https://lodev.org/cgtutor/filtering.html • https://colab.research.google.com/drive/1EiNdAW4gtrObrBSAuuxIt_AqO_Eft491#scrollTo=kDHjf-ehaBqm 132 | 133 | ```python 134 | # Model architecture 135 | model = tf.keras.models.Sequential([ 136 | tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)), 137 | tf.keras.layers.MaxPooling2D(2, 2), 138 | tf.keras.layers.Conv2D(64, (3, 3), activation='relu'), 139 | tf.keras.layers.MaxPooling2D(2, 2), 140 | tf.keras.layers.Flatten(), 141 | tf.keras.layers.Dense(128, activation='relu'), 142 | tf.keras.layers.Dense(10, activation='softmax') 143 | ]) 144 | 145 | model.summary() # To have a look to the architecture of model 146 | ``` 147 | 148 | ```python 149 | import tensorflow as tf 150 | print(tf.__version__) 151 | 152 | mnist = tf.keras.datasets.fashion_mnist 153 | (training_images, training_labels), (test_images, test_labels) = mnist.load_data() 154 | 155 | training_images=training_images.reshape(60000, 28, 28, 1) 156 | training_images=training_images / 255.0 157 | test_images = test_images.reshape(10000, 28, 28, 1) 158 | test_images=test_images/255.0 159 | 160 | model = tf.keras.models.Sequential([ 161 | tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)), 162 | tf.keras.layers.MaxPooling2D(2, 2), 163 | tf.keras.layers.Conv2D(64, (3,3), activation='relu'), 164 | tf.keras.layers.MaxPooling2D(2,2), 165 | l 166 | ]) 167 | model.compile(optimizer='adam', loss='ms', metrics=['accuracy']) 168 | model.summary() 169 | 
model.fit(training_images, training_labels, epochs=10) 170 | test_loss = model.evaluate(test_images, test_labels) 171 | ``` 172 | 173 | ```python 174 | # This code will show us the convolutions graphically 175 | 176 | import matplotlib.pyplot as plt 177 | from tensorflow.keras import models 178 | 179 | f, axarr = plt.subplots(3,4) 180 | FIRST_IMAGE=0 181 | SECOND_IMAGE=23 182 | THIRD_IMAGE=28 183 | CONVOLUTION_NUMBER = 3 184 | 185 | layer_outputs = [layer.output for layer in model.layers] 186 | activation_model = tf.keras.models.Model(inputs = model.input, outputs = layer_outputs) 187 | 188 | for x in range(0,4): 189 | f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x] 190 | axarr[0,x].imshow(f1[0, : , :, CONVOLUTION_NUMBER], cmap='inferno') 191 | axarr[0,x].grid(False) 192 | f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x] 193 | axarr[1,x].imshow(f2[0, : , :, CONVOLUTION_NUMBER], cmap='inferno') 194 | axarr[1,x].grid(False) 195 | f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x] 196 | axarr[2,x].imshow(f3[0, : , :, CONVOLUTION_NUMBER], cmap='inferno') 197 | axarr[2,x].grid(False) 198 | ``` 199 | 200 | ## Using Real-world Images 201 | > [Nobebook](notebooks/Course_1_Part_8_Lesson_2_Notebook.ipynb) 202 | 203 | ```python 204 | # An ImageGenerator can flow images from a directory and perform operations such as resizing them on the fly 205 | import tensorflow as tf 206 | from tensorflow.keras.preprocessing.image import ImageDataGenerator 207 | from tensorflow.keras.optimizers import RMSprop 208 | 209 | # All images will be rescaled by 1./255 210 | train_datagen = ImageDataGenerator(rescale=1/255) 211 | 212 | # Flow training images in batches of 128 using train_datagen generator 213 | train_generator = train_datagen.flow_from_directory( 214 | '/tmp/horse-or-human/', # This is the source directory for training images 215 | target_size=(300, 300), # All images will be resized to 150x150 216 | batch_size=128, 217 | # Since we use binary_crossentropy loss, we need binary labels 218 | class_mode='binary') 219 | 220 | validation_generator = train_datagen.flow_from_directory( 221 | validation_dir, 222 | target_size=(300, 300), 223 | batch_size=32, 224 | class_mode='binary', 225 | ) 226 | 227 | model.compile(loss='binary_crossentropy', 228 | optimizer=RMSprop(lr=0.001), 229 | metrics=['accuracy']) 230 | 231 | history = model.fit_generator( 232 | train_generator, # streames images from directory 233 | steps_per_epoch=8, # 1024 images overall, so 128*8=1024, 128 is the batch size of train_generator 234 | epochs=15, 235 | validation_data=validation_generator, 236 | validation_steps=8, # 256 images, so 32*8=256, 32 is the batch size of validation_generator 237 | verbose=2 # for info 238 | ) 239 | ``` 240 | ```python 241 | import numpy as np 242 | from google.colab import files 243 | from keras.preprocessing import image 244 | 245 | uploaded = files.upload() 246 | 247 | for fn in uploaded.keys(): 248 | # Predicting images 249 | path = "/content/" + fn 250 | img = image.load_img(path, target_size=(300, 300)) 251 | x = image.img_to_array(img) 252 | x = np.expand_dims(x, axis=0) 253 | 254 | images = np.vstack([x]) 255 | classes = model.predict(images, batch_size=10) 256 | print(classes[0]) 257 | 258 | if classes[0] > 0.5: 259 | print(fn + "is a human") 260 | else: 261 | print(fn + "is a horse") 262 | ``` -------------------------------------------------------------------------------- 
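A quick way to sanity-check the convolution shape and parameter-count formulas given in the CNN section above is to build the first few layers of the same model and compare `model.summary()` against the hand calculation. This is a minimal sketch that only assumes TensorFlow 2.x is installed:

```python
import tensorflow as tf

# Shape formula: (n + 2p - f) / s + 1, parameter formula: (f*f*prev_channels + 1) * filters
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # (28 - 3)/1 + 1 = 26 -> (26, 26, 64), params = (3*3*1 + 1)*64 = 640
    tf.keras.layers.MaxPooling2D(2, 2),                                              # 2x2 max pooling halves the spatial dims -> (13, 13, 64)
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),                           # (13 - 3)/1 + 1 = 11 -> (11, 11, 64), params = (3*3*64 + 1)*64 = 36,928
])

model.summary()  # the printed output shapes and parameter counts should match the comments above
```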
/tensorflow-in-practice/natural-language-processing-tensorflow.md: -------------------------------------------------------------------------------- 1 | # [Natural Language Processing in TensorFlow](https://www.coursera.org/learn/natural-language-processing-tensorflow/home/welcome) 2 | 3 | - Week 1: How to convert the text into number representation, Tokenizer, fit_on_texts, texts_to_sequences, pad_sequences 4 | - Week 2: Word Embeddings - Classification problems 5 | - Week 3: Sequence models - RNN, LSTM, classification problems 6 | - Week 4: Sequence models and literature - text generation 7 | 8 | - Week 1 - [Sentiment in text](#Sentiment-in-text) 9 | - Week 2 - [Word Embeddings](#Word-Embeddings) 10 | - Week 3 - [Sequence models](#Sequence-models) 11 | - Week 4 - [Sequence models and literature](#Sequence-models-and-literature) 12 | 13 | ## Sentiment in text 14 | > [Week 1 Notebook](notebooks/Course_3_Week_1(Tokenizer-Sarcasm-Dataset).ipynb) 15 | 16 | - How to load in the texts, pre-process it and set up your data so it can be fed to a neural network. 17 | - https://rishabhmisra.github.io/publications/ 18 | - `Tokenizer` is used to tokenize the sentences, `oov_token=`can be used to encode unknown words 19 | - `fit_on_texts(sentences)` is used to tokenize the list of sentences 20 | - Output: `{'': 1, 'my': 2, 'love': 3, 'dog': 4, 'i': 5, 'you': 6, 'cat': 7, 'do': 8, 'think': 9, 'is': 10, 'amazing': 11}` 21 | - `texts_to_sequences(sentences)` - the method to encode a list of sentences to use those tokens 22 | - Output: `[[5, 3, 2, 4], [5, 3, 2, 7], [6, 3, 2, 4], [8, 6, 9, 2, 4, 10, 11]]` 23 | 24 | ```py 25 | tokenizer = Tokenizer(oov_token="") 26 | tokenizer.fit_on_texts(sentences) 27 | word_index = tokenizer.word_index 28 | sequences = tokenizer.texts_to_sequences(sentences) 29 | padded = pad_sequences(sequences, padding='post') 30 | ``` 31 | 32 | ## Word Embeddings 33 | > [Week 2 Model Training IMDB Reviews](notebooks/Course_3_Week_2(Model_Training_IMDB_Reviews).ipynb) 34 | 35 | > [Week 2, beautiful code, Sarcasm Classifier](notebooks/Course_3_Week_2(Sarcasm-Classifier).ipynb) 36 | 37 | > [Week 2, subwords](notebooks/Course_3_Week_2(Subwords).ipynb) - shows that Embeddings do not work with sequence of words 38 | 39 |
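Putting the Week 1 Tokenizer pipeline together with an `Embedding` layer gives a minimal end-to-end classifier sketch like the one below. The sentences and labels are made-up toy data, the `vocab_size` / `embedding_dim` / `max_length` values are only illustrative, and it assumes TensorFlow 2.x:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ['I loved this movie', 'what a great film', 'utterly boring', 'I hated every minute']
labels = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative

vocab_size, embedding_dim, max_length = 100, 16, 8

tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=max_length, padding='post')

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),  # input_dim, output_dim, input_length
    tf.keras.layers.GlobalAveragePooling1D(),  # average the word vectors of each sentence
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(padded, labels, epochs=10, verbose=0)

print(model.predict(pad_sequences(tokenizer.texts_to_sequences(['a great movie']), maxlen=max_length, padding='post')))
```

Averaging the embeddings keeps the classifier small and ignores word order entirely, which is exactly the limitation the Sequence models week addresses.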
40 | 41 | - In the second week, we learn to prepare the data with Tokenizer API, and then teach our model 42 | - TensorFlow Datasets: https://www.tensorflow.org/datasets 43 |
44 | - https://github.com/tensorflow/datasets/tree/master/docs/catalog 45 | - https://projector.tensorflow.org - to visualize the data 46 | 47 | - **What is the purpose of the embedding dimension?** 48 | > It is the number of dimensions for the **vector representing** the word encoding 49 | 50 | - When tokenizing a corpus, what does the num_words=n parameter do? 51 | > It specifies the maximum number of words to be tokenized, and picks the most common ‘n’ words 52 | 53 | - NOTE: Sequence becomes much more important when dealing with subwords, but we’re ignoring word positions. 54 | 55 | - It must specify 3 arguments, [reference](https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/): 56 | 57 | - **input_dim**: This is the size of the vocabulary in the text data. For example, if your data is integer encoded to values between 0-999, then the size of the vocabulary would be 1000 words. (all words) 58 | - **output_dim**: This is the size of the vector space in which words will be embedded. It defines the size of the output vectors from this layer for each word. For example, it could be 32 or 100 or even larger. Test different values for your problem. 59 | - **input_length**: This is the length of input sequences, as you would define for any input layer of a Keras model. For example, if all of your input documents are comprised of 100 words, this would be 100. (words in a sentence) 60 | 61 | ```py 62 | def plot_graphs(history, string): 63 | plt.plot(history.history[string]) 64 | plt.plot(history.history['val_'+string]) 65 | 66 | plt.xlabel("Epochs") 67 | plt.ylabel(string) 68 | plt.legend([string, 'val_'+string]) 69 | plt.show() 70 | 71 | plot_graphs(history, "accuracy") 72 | plot_graphs(history, "loss") 73 | ``` 74 | 75 | ## Sequence models 76 | > [Week 3 IMDB](notebooks/Course_3_Week_3(IMDB).ipynb) - RNN, Embedding, Conv 1D experimenting 77 | 78 | > We looked first at Tokenizing words to get numeric values from them, and then using Embeddings to group words of similar meaning depending on how they were labelled. This gave you a good, but rough, sentiment analysis -- words such as 'fun' and 'entertaining' might show up in a positive movie review, and 'boring' and 'dull' might show up in a negative one. But sentiment can also be determined by the sequence in which words appear. For example, you could have 'not fun', which of course is the opposite of 'fun'. This week you'll start digging into a variety of model formats that are used in training models to understand context in sequence! 79 | 80 | - We used **word embeddings** to sentiment words. But what if we can use RNN and LSTM to predict the group of words. We can analyse in which relative ordering the words are coming. 81 | 82 | -
That's the classical ML, it doesn't take into account the sequences. For example, like **Fibonacci series**, we must know previous result to fit it into next input. 83 | 84 | -
So, that's the idea behind RNN (recurrent neural network). The output of previous is the input to the next. 85 | 86 | -
**LSTMs** have an additional pipeline of contexts called cell state. They can be bidirectional too. 87 | 88 | - RNN, LSTM [video](https://www.youtube.com/watch?v=WCUNPb-5EYI) 89 | - GRU - Gated recurrent union `tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64)` 90 | 91 | ```python 92 | """LSTM in code""" 93 | model = tf.keras.Sequential([ 94 | tf.keras.layers.Embedding(tokenizer.vocab_size, 64), 95 | tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64), return_sequences=True), # You need to define `return_sequences=True` when stacking two LSTMs 96 | # tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64) 97 | tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)), 98 | tf.keras.layers.Dense(64, activation='relu'), 99 | tf.keras.layers.Dense(1, activation='sigmoid'), 100 | ]) 101 | ``` 102 | ```python 103 | """Using a convolutional network 1D""" 104 | model = tf.keras.Sequential([ 105 | tf.keras.layers.Embedding(tokenizer.vocab_size, 64), 106 | tf.keras.layers.Conv1D(128, 5, activation='relu'), 107 | tf.keras.layers.GlobalAveragePooling1D(), 108 | tf.keras.layers.Dense(64, activation='relu'), 109 | tf.keras.layers.Dense(1, activation='sigmoid') 110 | ]) 111 | ``` 112 | ```python 113 | model = tf.keras.Sequential([ 114 | tf.keras.layers.Embedding(vocab_size+1, embedding_dim, input_length=max_length), # weights=[embeddings_matrix], trainable=False 115 | tf.keras.layers.Dropout(0.2), 116 | tf.keras.layers.Conv1D(64, 5, activation='relu'), 117 | tf.keras.layers.MaxPooling1D(pool_size=4), 118 | tf.keras.layers.LSTM(64), 119 | tf.keras.layers.Dense(1, activation='sigmoid') 120 | ]) 121 | model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy']) 122 | model.summary() 123 | 124 | num_epochs = 50 125 | ``` 126 | 127 | ## Sequence models and literature 128 | > **Text generation** 129 | 130 | > [Week 4 Sheckspire Text Generation](notebooks/Course_3_Week_4_Lesson_1_(Sheckspire_Text_Generation).ipynb) 131 | 132 | > Wrap up from course: You’ve been experimenting with NLP for text classification over the last few weeks. Next week you’ll switch gears -- and take a look at using the tools that you’ve learned to predict text, which ultimately means you can create text. By learning sequences of words you can predict the most common word that comes next in the sequence, and thus, when starting from a new sequence of words you can create a model that builds on them. You’ll take different training sets -- like traditional Irish songs, or Shakespeare poetry, and learn how to create new sets of words using their embeddings! 
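The notebook linked above builds the full version of this; the sketch below only shows, on a toy two-line corpus, the shape the next-word training data usually takes (every line becomes a set of n-gram prefixes and the last token of each prefix is the label) plus a small model that learns to predict it. Layer sizes and the epoch count are arbitrary.

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

corpus = ['in the town of athy one jeremy lanigan',
          'battered away til he hadnt a pound']  # toy stand-in for a real lyrics/poetry corpus

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Turn each line into n-gram prefixes: [w1 w2], [w1 w2 w3], ...
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        input_sequences.append(token_list[:i + 1])

max_sequence_len = max(len(seq) for seq in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre')

xs = input_sequences[:, :-1]     # everything except the last token is the input
labels = input_sequences[:, -1]  # the last token is the word to predict
ys = tf.keras.utils.to_categorical(labels, num_classes=total_words)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, 64, input_length=max_sequence_len - 1),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(20)),
    tf.keras.layers.Dense(total_words, activation='softmax')  # one probability per word in the vocabulary
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(xs, ys, epochs=100, verbose=0)
```

To generate text, repeatedly tokenize and pad the seed phrase, take the argmax of `model.predict(...)`, append the predicted word, and feed the longer phrase back in.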
133 | 134 | - **Finding what the next word should be** 135 | 136 | -------------------------------------------------------------------------------- /tensorflow-in-practice/notebooks/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Rustam-Z/machine-learning/5001d7d103642a61f82492df3a968aa6f4836601/tensorflow-in-practice/notebooks/.DS_Store -------------------------------------------------------------------------------- /tensorflow-in-practice/notebooks/Course_3_Week_2(Model_Training_IMDB_Reviews).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.8.6" 14 | }, 15 | "orig_nbformat": 2, 16 | "kernelspec": { 17 | "name": "python3", 18 | "display_name": "Python 3.8.6 64-bit ('tf': conda)" 19 | }, 20 | "metadata": { 21 | "interpreter": { 22 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 23 | } 24 | }, 25 | "interpreter": { 26 | "hash": "4ea0e157563bacde0b7fd8dc93db6051c9678d5eadbd4117abf1a4cecbc8cd1a" 27 | } 28 | }, 29 | "nbformat": 4, 30 | "nbformat_minor": 2, 31 | "cells": [ 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "output_type": "stream", 39 | "name": "stdout", 40 | "text": [ 41 | "2.4.0-rc0\n" 42 | ] 43 | } 44 | ], 45 | "source": [ 46 | "import tensorflow as tf \n", 47 | "import tensorflow_datasets as tfds\n", 48 | "from tensorflow.keras.preprocessing.text import Tokenizer\n", 49 | "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", 50 | "import numpy as np \n", 51 | "import io\n", 52 | "\n", 53 | "print(tf.__version__)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "output_type": "execute_result", 63 | "data": { 64 | "text/plain": [ 65 | "True" 66 | ] 67 | }, 68 | "metadata": {}, 69 | "execution_count": 2 70 | } 71 | ], 72 | "source": [ 73 | "tf.executing_eagerly() # if 1.x use `tf.enable_eager_execution()`" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": { 80 | "tags": [] 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "imdb, info = tfds.load(\"imdb_reviews\", with_info=True, as_supervised=True) # loading the data" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 5, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "output_type": "execute_result", 94 | "data": { 95 | "text/plain": [ 96 | "['abstract_reasoning', 'accentdb', 'aeslc', 'aflw2k3d', 'ag_news_subset']" 97 | ] 98 | }, 99 | "metadata": {}, 100 | "execution_count": 5 101 | } 102 | ], 103 | "source": [ 104 | "tfds.list_builders()[:5] # the list of all datasets" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 7, 110 | "metadata": {}, 111 | "outputs": [], 112 | "source": [ 113 | "train_data, test_data = imdb['train'], imdb['test'] # 25k train and 25k testing\n", 114 | "\n", 115 | "training_sentences = []\n", 116 | "training_labels = []\n", 117 | "testing_sentences = []\n", 118 | "testing_labels = []\n", 119 | "\n", 120 | "for sample, label in train_data:\n", 121 | " training_sentences.append(sample.numpy().decode('utf8'))\n", 122 | " 
training_labels.append(label.numpy())\n", 123 | "\n", 124 | "for sample, label in test_data:\n", 125 | " testing_sentences.append(sample.numpy().decode('utf8'))\n", 126 | " testing_labels.append(label.numpy())" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 8, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "output_type": "stream", 136 | "name": "stdout", 137 | "text": [ 138 | "I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.\n>> label 0\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "print(training_sentences[1]) \n", 144 | "print(\">> label\", training_labels[1]) # 0 negative, 1 pos" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 9, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "output_type": "stream", 154 | "name": "stdout", 155 | "text": [ 156 | "25000\n25000\n25000\n25000\n" 157 | ] 158 | } 159 | ], 160 | "source": [ 161 | "print(len(training_sentences))\n", 162 | "print(len(training_labels))\n", 163 | "print(len(testing_sentences))\n", 164 | "print(len(testing_labels))" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 10, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "# converting to numpy arrays\n", 174 | "training_labels_final = np.array(training_labels) \n", 175 | "testing_labels_final = np.array(testing_labels)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 11, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "output_type": "execute_result", 185 | "data": { 186 | "text/plain": [ 187 | "(25000,)" 188 | ] 189 | }, 190 | "metadata": {}, 191 | "execution_count": 11 192 | } 193 | ], 194 | "source": [ 195 | "training_labels_final.shape" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 12, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "output_type": "execute_result", 205 | "data": { 206 | "text/plain": [ 207 | "(25000, 120)" 208 | ] 209 | }, 210 | "metadata": {}, 211 | "execution_count": 12 212 | } 213 | ], 214 | "source": [ 215 | "# Preparing data for training by tokenizing\n", 216 | "\n", 217 | "vocab_size = 10000\n", 218 | "embedding_dim = 16\n", 219 | "max_length = 120\n", 220 | "trunc_type='post' # [4, 4, 5, 6, ..... 
0, 0, 0] - zeros at the end \n", 221 | "oov_tok = \"\" # out of vocabulary\n", 222 | "\n", 223 | "tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)\n", 224 | "tokenizer.fit_on_texts(training_sentences)\n", 225 | "word_index = tokenizer.word_index # all 10000 words with tokens in a dictionary \n", 226 | "sequences = tokenizer.texts_to_sequences(training_sentences) # all sentences represented only with tokens\n", 227 | "padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type) # make all sentences the same size\n", 228 | "\n", 229 | "# the same for testing set\n", 230 | "testing_sequences = tokenizer.texts_to_sequences(testing_sentences)\n", 231 | "testing_padded = pad_sequences(testing_sequences, maxlen=max_length)\n", 232 | "\n", 233 | "padded.shape" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 89, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "output_type": "stream", 250 | "name": "stdout", 251 | "text": [ 252 | "? ? ? ? ? ? ? ? i have been known to fall asleep during films but this is usually due to a combination of things including really tired being warm and comfortable on the and having just eaten a lot however on this occasion i fell asleep because the film was rubbish the plot development was constant constantly slow and boring things seemed to happen but with no explanation of what was causing them or why i admit i may have missed part of the film but i watched the majority of it and everything just seemed to happen of its own without any real concern for anything else i cant recommend this film at all\n\nI have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])\n", 258 | "\n", 259 | "def decode_review(text):\n", 260 | " return ' '.join([reverse_word_index.get(i, '?') for i in text])\n", 261 | "\n", 262 | "print(decode_review(padded[1]))\n", 263 | "print()\n", 264 | "print(training_sentences[1])" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 64, 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "output_type": "stream", 274 | "name": "stdout", 275 | "text": [ 276 | "I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. 
I cant recommend this film at all.\n>> original length 617\n>> label 0\n\n[11, 26, 75, 571, 6, 805, 2354, 313, 106, 19, 12, 7, 629, 686, 6, 4, 2219, 5, 181, 584, 64, 1454, 110, 2263, 3, 3951, 21, 2, 1, 3, 258, 41, 4677, 4, 174, 188, 21, 12, 4078, 11, 1578, 2354, 86, 2, 20, 14, 1907, 2, 112, 940, 14, 1811, 1340, 548, 3, 355, 181, 466, 6, 591, 19, 17, 55, 1817, 5, 49, 14, 4044, 96, 40, 136, 11, 972, 11, 201, 26, 1046, 171, 5, 2, 20, 19, 11, 294, 2, 2155, 5, 10, 3, 283, 41, 466, 6, 591, 5, 92, 203, 1, 207, 99, 145, 4382, 16, 230, 332, 11, 2486, 384, 12, 20, 31, 30]\n>> sequence lenght 112\n\n[ 0 0 0 0 0 0 0 0 11 26 75 571 6 805\n 2354 313 106 19 12 7 629 686 6 4 2219 5 181 584\n 64 1454 110 2263 3 3951 21 2 1 3 258 41 4677 4\n 174 188 21 12 4078 11 1578 2354 86 2 20 14 1907 2\n 112 940 14 1811 1340 548 3 355 181 466 6 591 19 17\n 55 1817 5 49 14 4044 96 40 136 11 972 11 201 26\n 1046 171 5 2 20 19 11 294 2 2155 5 10 3 283\n 41 466 6 591 5 92 203 1 207 99 145 4382 16 230\n 332 11 2486 384 12 20 31 30]\n" 277 | ] 278 | }, 279 | { 280 | "output_type": "execute_result", 281 | "data": { 282 | "text/plain": [ 283 | "(120,)" 284 | ] 285 | }, 286 | "metadata": {}, 287 | "execution_count": 64 288 | } 289 | ], 290 | "source": [ 291 | "print(training_sentences[1]) \n", 292 | "print(\">> original length\", len(training_sentences[1]))\n", 293 | "print(\">> label\", training_labels[1])\n", 294 | "\n", 295 | "print()\n", 296 | "print(sequences[1])\n", 297 | "print(\">> sequence lenght\", len(sequences[1]))\n", 298 | "print()\n", 299 | "print(padded[1])\n", 300 | "padded[1].shape" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 56, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "output_type": "execute_result", 310 | "data": { 311 | "text/plain": [ 312 | "'bintang'" 313 | ] 314 | }, 315 | "metadata": {}, 316 | "execution_count": 56 317 | } 318 | ], 319 | "source": [ 320 | "# len(list(word_index)) # 90000 appr\n", 321 | "list(word_index)[57565] # even we defined vocab_size = 10000, tensorflow tokenizes all words, but in backed end it will work with 10000 words, \n", 322 | "# num_words=n parameter specifies the maximum number of words to be tokenized, and picks the most common ‘n’ words" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 71, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "output_type": "stream", 332 | "name": "stdout", 333 | "text": [ 334 | "Model: \"sequential_2\"\n_________________________________________________________________\nLayer (type) Output Shape Param # \n=================================================================\nembedding_2 (Embedding) (None, 120, 16) 160000 \n_________________________________________________________________\nflatten_2 (Flatten) (None, 1920) 0 \n_________________________________________________________________\ndense_4 (Dense) (None, 6) 11526 \n_________________________________________________________________\ndense_5 (Dense) (None, 1) 7 \n=================================================================\nTotal params: 171,533\nTrainable params: 171,533\nNon-trainable params: 0\n_________________________________________________________________\n" 335 | ] 336 | } 337 | ], 338 | "source": [ 339 | "model = tf.keras.Sequential([\n", 340 | " tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),\n", 341 | " tf.keras.layers.Flatten(), # GlobalAveragePooling1D()\n", 342 | " tf.keras.layers.Dense(6, activation='relu'),\n", 343 | " tf.keras.layers.Dense(1, 
activation='sigmoid')\n", 344 | "])\n", 345 | "model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])\n", 346 | "model.summary()" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 106, 352 | "metadata": {}, 353 | "outputs": [ 354 | { 355 | "output_type": "stream", 356 | "name": "stdout", 357 | "text": [ 358 | "Epoch 1/10\n", 359 | "782/782 [==============================] - 1s 1ms/step - loss: 9.7313e-05 - accuracy: 1.0000 - val_loss: 0.8354 - val_accuracy: 0.8314\n", 360 | "Epoch 2/10\n", 361 | "782/782 [==============================] - 1s 1ms/step - loss: 6.0164e-05 - accuracy: 1.0000 - val_loss: 0.8735 - val_accuracy: 0.8308\n", 362 | "Epoch 3/10\n", 363 | "782/782 [==============================] - 1s 1ms/step - loss: 3.7304e-05 - accuracy: 1.0000 - val_loss: 0.9050 - val_accuracy: 0.8318\n", 364 | "Epoch 4/10\n", 365 | "782/782 [==============================] - 1s 1ms/step - loss: 2.3330e-05 - accuracy: 1.0000 - val_loss: 0.9406 - val_accuracy: 0.8309\n", 366 | "Epoch 5/10\n", 367 | "782/782 [==============================] - 1s 1ms/step - loss: 1.5115e-05 - accuracy: 1.0000 - val_loss: 0.9730 - val_accuracy: 0.8313\n", 368 | "Epoch 6/10\n", 369 | "782/782 [==============================] - 1s 1ms/step - loss: 9.3207e-06 - accuracy: 1.0000 - val_loss: 1.0077 - val_accuracy: 0.8312\n", 370 | "Epoch 7/10\n", 371 | "782/782 [==============================] - 1s 1ms/step - loss: 6.1326e-06 - accuracy: 1.0000 - val_loss: 1.0429 - val_accuracy: 0.8307\n", 372 | "Epoch 8/10\n", 373 | "782/782 [==============================] - 1s 1ms/step - loss: 3.8306e-06 - accuracy: 1.0000 - val_loss: 1.0734 - val_accuracy: 0.8310\n", 374 | "Epoch 9/10\n", 375 | "782/782 [==============================] - 1s 1ms/step - loss: 2.4845e-06 - accuracy: 1.0000 - val_loss: 1.1086 - val_accuracy: 0.8311\n", 376 | "Epoch 10/10\n", 377 | "782/782 [==============================] - 1s 1ms/step - loss: 1.6163e-06 - accuracy: 1.0000 - val_loss: 1.1410 - val_accuracy: 0.8310\n" 378 | ] 379 | }, 380 | { 381 | "output_type": "execute_result", 382 | "data": { 383 | "text/plain": [ 384 | "" 385 | ] 386 | }, 387 | "metadata": {}, 388 | "execution_count": 106 389 | } 390 | ], 391 | "source": [ 392 | "# Training own modelg\n", 393 | "\n", 394 | "num_epochs = 10\n", 395 | "model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": 76, 401 | "metadata": {}, 402 | "outputs": [ 403 | { 404 | "output_type": "execute_result", 405 | "data": { 406 | "text/plain": [ 407 | "[,\n", 408 | " ,\n", 409 | " ,\n", 410 | " ]" 411 | ] 412 | }, 413 | "metadata": {}, 414 | "execution_count": 76 415 | } 416 | ], 417 | "source": [ 418 | "e = model.layers\n", 419 | "e" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 79, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "output_type": "stream", 429 | "name": "stdout", 430 | "text": [ 431 | "(10000, 16)\n" 432 | ] 433 | } 434 | ], 435 | "source": [ 436 | "e = model.layers[0]\n", 437 | "weights = e.get_weights()[0]\n", 438 | "print(weights.shape) # shape: (vocab_size, embedding_dim)" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": 87, 444 | "metadata": {}, 445 | "outputs": [ 446 | { 447 | "output_type": "execute_result", 448 | "data": { 449 | "text/plain": [ 450 | "array([-0.08942658, 0.00486923, -0.05935808, -0.06226563, -0.04867279,\n", 451 | " 
0.04237117, 0.04769849, 0.03356505, -0.03730453, 0.00785854,\n", 452 | " 0.03105144, 0.0776749 , 0.05284716, 0.025134 , -0.03554538,\n", 453 | " -0.04298926], dtype=float32)" 454 | ] 455 | }, 456 | "metadata": {}, 457 | "execution_count": 87 458 | } 459 | ], 460 | "source": [ 461 | "weights[1] # each word has its own weight" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 86, 467 | "metadata": { 468 | "tags": [] 469 | }, 470 | "outputs": [ 471 | { 472 | "output_type": "stream", 473 | "name": "stdout", 474 | "text": [ 475 | ">> word 1 \n>> embeddings [-0.08942658 0.00486923 -0.05935808 -0.06226563 -0.04867279 0.04237117\n 0.04769849 0.03356505 -0.03730453 0.00785854 0.03105144 0.0776749\n 0.05284716 0.025134 -0.03554538 -0.04298926]\n>> word 2 the\n>> embeddings [-0.08670148 0.01641071 -0.02393427 -0.07146466 0.01603186 0.06126428\n 0.06148115 0.00766911 0.04187395 0.05556076 0.01930173 0.0744463\n 0.01907398 0.01339489 0.00941497 -0.0138381 ]\n>> word 3 and\n>> embeddings [ 0.01113727 -0.03538265 -0.05725451 -0.01636735 -0.00596739 -0.00635358\n 0.03053617 0.05559737 0.0871934 0.04494542 0.02274616 0.07229666\n 0.01994341 0.01223046 -0.05789011 -0.04256919]\n>> word 4 a\n>> embeddings [-0.05104827 -0.01813413 -0.04630557 -0.02343593 -0.03323779 0.06510878\n -0.00737528 0.02424134 0.0825871 0.00570629 -0.01472468 0.12047923\n 0.01702527 -0.04734353 -0.05681538 -0.06954415]\n" 476 | ] 477 | } 478 | ], 479 | "source": [ 480 | "out_v = io.open('vecs.tsv', 'w', encoding='utf-8')\n", 481 | "out_m = io.open('meta.tsv', 'w', encoding='utf-8')\n", 482 | "\n", 483 | "for word_num in range(1, vocab_size):\n", 484 | " word = reverse_word_index[word_num]\n", 485 | " embeddings = weights[word_num]\n", 486 | " \n", 487 | " if word_num < 5:\n", 488 | " print(f\">> word {word_num}\", word)\n", 489 | " print(\">> embeddings\", embeddings)\n", 490 | "\n", 491 | " out_m.write(word + \"\\n\")\n", 492 | " out_v.write('\\t'.join([str(x) for x in embeddings]) + \"\\n\")\n", 493 | "out_v.close()\n", 494 | "out_m.close()" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 99, 500 | "metadata": {}, 501 | "outputs": [ 502 | { 503 | "output_type": "stream", 504 | "name": "stdout", 505 | "text": [ 506 | "Please install GPU version of TF\n" 507 | ] 508 | } 509 | ], 510 | "source": [ 511 | "if tf.test.gpu_device_name(): \n", 512 | " print('Default GPU Device:'.format(tf.test.gpu_device_name()))\n", 513 | "else:\n", 514 | " print(\"Please install GPU version of TF\")" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 102, 520 | "metadata": {}, 521 | "outputs": [ 522 | { 523 | "output_type": "stream", 524 | "name": "stdout", 525 | "text": [ 526 | "[]\n" 527 | ] 528 | }, 529 | { 530 | "output_type": "execute_result", 531 | "data": { 532 | "text/plain": [ 533 | "[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]" 534 | ] 535 | }, 536 | "metadata": {}, 537 | "execution_count": 102 538 | } 539 | ], 540 | "source": [ 541 | "print(tf.config.list_physical_devices('GPU'))\n", 542 | "tf.config.list_physical_devices()" 543 | ] 544 | } 545 | ] 546 | } -------------------------------------------------------------------------------- /tensorflow-in-practice/notebooks/README.md: -------------------------------------------------------------------------------- 1 | # Highlighted Notebooks 2 | 3 | ### Course 1 4 | > [Fashion MNIST with CNN](Course_1_Part_6_Lesson_2_Notebook.ipynb) 5 | 6 | > [Human vs Horse, 
flow_from_directory()](Course_1_Part_8_Lesson_2_Notebook.ipynb) 7 | 8 | ### Course 2 9 | > [** Cat vs Dog, flow_from_directory(), drawings of loss and accuracy, predict on new image](Course_2_Part_2_Lesson_2_Notebook.ipynb) 10 | 11 | > [** With augmentation, good code collected in one cell, plots of accuracy and loss](Course_2_Part_4_Lesson_2_Notebook_(Cats_v_Dogs_Augmentation).ipynb) 12 | 13 | > [** Transfer Learning, Dropout](Course_2_Part_6_Lesson_3_Notebook_(Transfer_Learning).ipynb) 14 | 15 | ### Course 3 16 | > [** Word Embeddings with Tokenizer](Course_3_Week_2(Model_Training_IMDB_Reviews).ipynb) - classifying the reviews in IMDB 17 | > [** Beautiful code, classifying sarcastic news](Course_3_Week_2(Sarcasm-Classifier).ipynb) 18 | -------------------------------------------------------------------------------- /tensorflow-in-practice/sequences-time-series-and-prediction.md: -------------------------------------------------------------------------------- 1 | # [Sequences, Time Series and Prediction](https://www.coursera.org/learn/tensorflow-sequences-time-series-and-prediction) 2 | 3 | - Sequences and Prediction 4 | - Deep Neural Networks for Time Series 5 | - Recurrent Neural Networks for Time Series 6 | - Real-world time series data 7 | 8 | 9 | ## Sequences and Prediction 10 | > Handling sequential time series data -- where values change over time, like the temperature on a particular day, stock prices, or the number of visitors to your web site. 11 | 12 | > Predicting future values in these time series. We need to find the pattern in order to predict new values. 13 | 14 | - Time series can be used in speech recognition 15 | 16 | - Types: 17 | - **Trend** - e.g. an upward-facing slope
18 | - **Seasonality**
19 | - Autocorrelation 20 | - Noise 21 | - Non-stationary time series
22 | 23 | - **Train, validation and test sets** 24 | - **Trend + Seasonality + Noise**
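- As a quick illustration of those ingredients, here is a small sketch (my own toy constants, not the course notebook) that builds a synthetic series by adding a trend, a seasonal pattern and noise:

```python
# Synthetic series = baseline + trend + seasonality + noise (all constants are illustrative)
import numpy as np
import matplotlib.pyplot as plt

time = np.arange(4 * 365)                                   # four "years" of daily steps

trend = 0.05 * time                                         # slow upward drift
seasonality = 20 * np.sin(2 * np.pi * (time % 365) / 365)   # repeats every 365 steps
noise = np.random.default_rng(42).normal(scale=2.0, size=len(time))

series = 10 + trend + seasonality + noise

plt.plot(time, series)
plt.xlabel("Time step")
plt.ylabel("Value")
plt.show()
```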
25 | - **Naive forecasting** - take the last value and assume that the next value will be the same 26 | - **Fixed partitioning (fixed forecasting)** - split the series into fixed training, validation and test periods; if the data is seasonal, each period should contain a whole number of seasons (e.g. 1, 2 or 3 years). Train on the training period and tune hyperparameters on the validation period, then retrain on training + validation and evaluate on the test period; finally, retrain once more including the test data before forecasting the future (see the sketch below).
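- A minimal sketch of a fixed split plus a naive forecast (the toy series and the split point are assumptions):

```python
# Fixed partitioning + naive forecast on a toy series
import numpy as np

time = np.arange(4 * 365)
series = 10 + 0.05 * time + 20 * np.sin(2 * np.pi * (time % 365) / 365)   # toy trend + seasonality

split_time = 3 * 365                                   # first three "years" -> training period
x_train, x_valid = series[:split_time], series[split_time:]

# Naive forecast: each prediction is simply the previous observed value
naive_forecast = series[split_time - 1:-1]
print(x_valid[:3])
print(naive_forecast[:3])
```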
27 | - **Roll-forward partitioning** - we start with a short training period and gradually increase it, say by one day or one week at a time. At each iteration we train the model on the training period and use it to forecast the following day, or the following week, in the validation period (sketched below).
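- Roll-forward partitioning can be sketched as a loop that keeps growing the training window; here a naive forecast stands in for whatever model you would actually retrain at each step (window sizes are assumptions):

```python
# Roll-forward partitioning: extend the training period step by step and forecast the next week
import numpy as np

time = np.arange(4 * 365)
series = 10 + 0.05 * time + 20 * np.sin(2 * np.pi * (time % 365) / 365)

start, horizon = 2 * 365, 7                      # initial training window, forecast one week at a time
window_errors = []
for end in range(start, len(series) - horizon, horizon):
    train = series[:end]                         # training period grows every iteration
    forecast = np.repeat(train[-1], horizon)     # stand-in "model": naive forecast
    actual = series[end:end + horizon]
    window_errors.append(np.abs(forecast - actual).mean())

print("mean MAE across roll-forward windows:", np.mean(window_errors))
```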
28 | 29 | - **Metrics for evaluating performance** 30 | - Common choices are MSE / RMSE, MAE and MAPE, all computed from the errors (forecasts minus the actual values); see the sketch below.
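- A sketch of how these metrics are computed with NumPy (the toy numbers are made up; `tf.keras.metrics.mean_absolute_error` gives the same MAE if you prefer Keras):

```python
# Forecast-quality metrics: MSE, RMSE, MAE, MAPE (toy numbers)
import numpy as np

x_valid = np.array([10.0, 12.0, 14.0, 16.0])      # actual values
forecast = np.array([11.0, 11.5, 14.5, 15.0])     # model predictions

errors = forecast - x_valid
mse = np.square(errors).mean()          # penalises large errors more heavily
rmse = np.sqrt(mse)                     # back in the same units as the series
mae = np.abs(errors).mean()             # treats all error sizes proportionally
mape = np.abs(errors / x_valid).mean()  # error as a fraction of the actual values

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.3%}")
```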
31 | 32 | ## Deep Neural Networks for Time Series 33 | 34 | ## Recurrent Neural Networks for Time Series 35 | 36 | ## Real-world time series data 37 | --------------------------------------------------------------------------------