├── Chapter02
│   └── Data_Structures_Used_in_Algorithms.ipynb
├── Chapter03
│   └── Sorting_and_Searching_Algorithms.ipynb
├── Chapter04
│   ├── Divide_and_conquer.ipynb
│   ├── Linear_Programming.ipynb
│   ├── PageRank_Algorithm.ipynb
│   └── Travelling_Salesman_Problem.ipynb
├── Chapter05
│   └── GraphAlgorithms.ipynb
├── Chapter06
│   └── Unsupervised_Machine_Learning_Algorithms.ipynb
├── Chapter07
│   ├── Bagging_Algorithms.ipynb
│   ├── DecisionTree.ipynb
│   ├── GradientBoostRegression.ipynb
│   ├── LinearRegression.ipynb
│   ├── Logistic_Regression.ipynb
│   ├── NaiveBayes.ipynb
│   ├── RegressionTree.ipynb
│   ├── SVM.ipynb
│   ├── WeatherPrediction.ipynb
│   └── XGBboost.ipynb
├── Chapter08
│   ├── Deep_Learning_Algorithms.ipynb
│   └── Siamese_working.ipynb
├── Chapter09
│   └── Natural_Language_Processing.ipynb
├── Chapter10
│   └── Understanding_Sequential_Models.ipynb
├── Chapter11
│   └── Advanced_Sequential_Algorithms.ipynb
├── Chapter12
│   └── Movies_recommendation.ipynb
├── Chapter13
│   └── Algorithmic_Strategies_for_Data_Handling.ipynb
├── Chapter14
│   └── Cryptography.ipynb
├── Chapter16
│   └── Model_Explanability.ipynb
├── LICENSE
└── README.md
/Chapter03/Sorting_and_Searching_Algorithms.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"markdown","source":["# Chapter 3\n","## Sorting and Searching Algorithms"],"metadata":{"id":"ealMRHlaUZKz"}},{"cell_type":"markdown","metadata":{"id":"owy_6b3bB18G"},"source":["### Swap Function in Python\n","When implementing sorting and searching algorithms, we need to swap the values of two variables. In Python, there is a standard way to swap two variables, which is as follows:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"m427i7L7B18I"},"outputs":[],"source":["var_1 = 1\n","var_2 = 2\n","var_1,var_2 = var_2,var_1\n"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ozYmokGgB18J","executionInfo":{"status":"ok","timestamp":1694582887506,"user_tz":-330,"elapsed":55,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"a1530d98-0964-4041-f658-d057892eb60a"},"outputs":[{"output_type":"stream","name":"stdout","text":["2 1\n"]}],"source":["print(var_1,var_2)"]},{"cell_type":"markdown","metadata":{"id":"g2_9Q-4KB18K"},"source":["## Sorting Algorithms\n","The ability to efficiently sort and search items in a complex data structure is important as it is needed by many modern algorithms."]},{"cell_type":"markdown","metadata":{"id":"ztA86JCQB18K"},"source":["### Pass 1 of Bubble Sort\n","Let's now see how bubble sort can be implemented using Python If we implement pass one of bubble sort in Python, it will look as follows:"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"N3iTyjVbB18K","executionInfo":{"status":"ok","timestamp":1694582887507,"user_tz":-330,"elapsed":48,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"5be41ec5-2715-4f85-d4ab-0ae073f551f3"},"outputs":[{"output_type":"stream","name":"stdout","text":["0 [25, 21, 22, 24, 23, 27, 26]\n","1 [21, 25, 22, 24, 23, 27, 26]\n","2 [21, 22, 25, 24, 23, 27, 26]\n","3 [21, 22, 24, 25, 23, 27, 26]\n","4 [21, 22, 24, 23, 25, 27, 26]\n","5 [21, 22, 24, 23, 25, 27, 26]\n","6 [21, 22, 24, 23, 25, 26, 27]\n"]}],"source":["list = [25,21,22,24,23,27,26]\n","\n","last_element_index = len(list) - 1\n","print(0, list)\n","for idx in range(last_element_index):\n"," if list[idx] > list[idx + 1]:\n"," list[idx], list[idx + 1] = list[idx + 1], list[idx]\n"," print(idx + 1, list)\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"TXDq67J2B18L","executionInfo":{"status":"ok","timestamp":1694582887507,"user_tz":-330,"elapsed":44,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"56f02239-b5d2-4253-824e-8edecf4647ad","colab":{"base_uri":"https://localhost:8080/"}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 24, 23, 25, 26, 27]"]},"metadata":{},"execution_count":4}],"source":["list"]},{"cell_type":"markdown","metadata":{"id":"qgl3TwGdB18L"},"source":["### Bubble Sort Algorithm\n","Bubble sort is one of the simplest and slowest algorithms used for sorting. 
It is designed in a way that the highest value in the list bubbles its way to the top as the algorithm loops through iterations."]},{"cell_type":"code","source":["def bubble_sort(list):\n","# Exchange the elements to arrange in order\n"," last_element_index = len(list)-1\n"," for pass_no in range(last_element_index,0,-1):\n"," for idx in range(pass_no):\n"," if list[idx]>list[idx+1]:\n"," list[idx],list[idx+1]=list[idx+1],list[idx]\n"," return list\n"],"metadata":{"id":"2KG9zbUjXetz"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"o1k0-__CB18M"},"outputs":[],"source":["list = [25,21,22,24,23,27,26]"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"WfttaODOB18M","executionInfo":{"status":"ok","timestamp":1694582887509,"user_tz":-330,"elapsed":42,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"40961207-1589-4eba-d4ab-6f5a54153f63"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":7}],"source":["bubble_sort(list)"]},{"cell_type":"markdown","source":["### Optimizing bubble sort\n","The above implementation of bubble sort, implemented by the bubble_sort function, is a straightforward sorting method where adjacent elements are repeatedly compared and swapped if they are out of order. The algorithm always requires O(N^2) comparisons, where N is the number of elements in the list. This is because, for a list of N elements, the algorithm invariably goes through N−1 passes, regardless of the initial order of the list.\n","Following is an optimized version of bubble sort.\n"],"metadata":{"id":"pJ_yaPWt1RYS"}},{"cell_type":"code","source":["def optimized_bubble_sort(list):\n"," last_element_index = len(list)-1\n"," for pass_no in range(last_element_index, 0, -1):\n"," swapped = False\n"," for idx in range(pass_no):\n"," if list[idx] > list[idx+1]:\n"," list[idx], list[idx+1] = list[idx+1], list[idx]\n"," swapped = True\n"," if not swapped:\n"," break\n"," return list\n"],"metadata":{"id":"L-a4XRwnYCin"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["list = [25,21,22,24,23,27,26]"],"metadata":{"id":"eQTuaYeSYGfm"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["optimized_bubble_sort(list)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0Nu4prCbYHUz","executionInfo":{"status":"ok","timestamp":1694582887511,"user_tz":-330,"elapsed":37,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"7c39969f-6248-478b-aaa9-f155c029d1ee"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":10}]},{"cell_type":"markdown","metadata":{"id":"i7yH5eU9B18M"},"source":["## Insertion Sort\n","The basic idea of insertion sort is that in each iteration, we remove a data point from the data structure we have and then insert it into its right position. That is why we call this the insertion sort algorithm.\n","In the first iteration, we select two data points and sort them. Then, we expand our selection and select the third data point and find its correct position, based on its value. 
The algorithm progresses until all the data points are moved to their correct positions.\n"]},{"cell_type":"code","source":["def insertion_sort(elements):\n"," for i in range(1, len(elements)):\n"," j = i - 1\n"," next_element = elements[i]\n","\n"," # Iterate backward through the sorted portion,\n"," # looking for the appropriate position for 'next_element'\n"," while j >= 0 and elements[j] > next_element:\n"," elements[j + 1] = elements[j]\n"," j -= 1\n","\n"," elements[j + 1] = next_element\n"," return elements\n"],"metadata":{"id":"Qg0uWkTDfs1M"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"fxR1iHRZB18M","executionInfo":{"status":"ok","timestamp":1694582887513,"user_tz":-330,"elapsed":31,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"27db19af-9ba0-4e3e-da45-6b5631175187","colab":{"base_uri":"https://localhost:8080/"}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":12}],"source":["insertion_sort(list)"]},{"cell_type":"markdown","metadata":{"id":"_TwEtgTRB18N"},"source":["## Merge Sort\n","Merge sort stands apart from sorting algorithms such as bubble sort and insertion sort because of its distinctive approach: it recursively splits the list in half, sorts each half, and merges the results."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Jp2NdGxYB18N"},"outputs":[],"source":["def merge_sort(elements):\n"," # Base condition to break the recursion\n"," if len(elements) <= 1:\n"," return elements\n","\n"," mid = len(elements) // 2 # Split the list in half\n"," left = elements[:mid]\n"," right = elements[mid:]\n","\n"," merge_sort(left) # Sort the left half\n"," merge_sort(right) # Sort the right half\n","\n"," a, b, c = 0, 0, 0\n"," # Merge the two halves\n"," while a < len(left) and b < len(right):\n"," if left[a] < right[b]:\n"," elements[c] = left[a]\n"," a += 1\n"," else:\n"," elements[c] = right[b]\n"," b += 1\n"," c += 1\n","\n"," # If there are remaining elements in the left half\n"," while a < len(left):\n"," elements[c] = left[a]\n"," a += 1\n"," c += 1\n"," # If there are remaining elements in the right half\n"," while b < len(right):\n"," elements[c] = right[b]\n"," b += 1\n"," c += 1\n"," return elements\n"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"MxpeLw07B18N","executionInfo":{"status":"ok","timestamp":1694582887515,"user_tz":-330,"elapsed":29,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"1bcf6fb3-8de4-4093-a193-4b4761e6a9bd"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":14}],"source":["list = [21, 22, 23, 24, 25, 26, 27]\n","merge_sort(list)"]},{"cell_type":"markdown","source":["## Shell Sort\n","The bubble sort algorithm compares immediate neighbors and exchanges them if they are out of order. On the other hand, insertion sort creates the sorted list by transferring one element at a time. 
If we have a partially sorted list, insertion sort should give reasonable performance."],"metadata":{"id":"RjfYlvzpgr_y"}},{"cell_type":"code","source":["def shell_sort(elements):\n"," distance = len(elements) // 2\n"," while distance > 0:\n"," for i in range(distance, len(elements)):\n"," temp = elements[i]\n"," j = i\n","# Sort the sublist for this distance\n"," while j >= distance and elements[j - distance] > temp:\n"," elements[j] = elements[j - distance]\n"," j = j-distance\n"," elements[j] = temp\n","# Reduce the distance for the next pass\n"," distance = distance//2\n"," return elements\n"],"metadata":{"id":"ySN2dRS6utYL"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["list = [21, 22, 23, 24, 25, 26, 27]\n","shell_sort(list)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eSh40zeFu02h","executionInfo":{"status":"ok","timestamp":1694582887518,"user_tz":-330,"elapsed":28,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"e4689cde-8762-4d32-c343-0d90cdecd5d7"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":16}]},{"cell_type":"markdown","metadata":{"id":"iqZdpBTxB18O"},"source":["## Selection Sort\n","As we saw earlier in this chapter, bubble sort is one of the simplest sorting algorithms. Selection sort is an improvement on bubble sort, where we try to minimize the total number of swaps required with the algorithm. It is designed to make at most one swap in each pass, compared to the up to N-1 swaps per pass of the bubble sort algorithm. Instead of bubbling the largest value toward the top in baby steps (as done in bubble sort, resulting in N-1 swaps), we look for the largest value in each pass and move it toward the top. So, after the first pass, the largest value will be at the top. After the second pass, the second largest value will be next to the top value. As the algorithm progresses, the subsequent values will move to their correct place based on their values. The last value will be moved after the (N-1)th pass. So, selection sort takes N-1 passes to sort N items."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"4tbnTOlTB18O"},"outputs":[],"source":["def selection_sort(list):\n"," for fill_slot in range(len(list) - 1, 0, -1):\n"," max_index = 0\n"," for location in range(1, fill_slot + 1):\n"," if list[location] > list[max_index]:\n"," max_index = location\n"," list[fill_slot],list[max_index] = list[max_index],list[fill_slot]\n"," return list"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"a8g-rVqxB18O","executionInfo":{"status":"ok","timestamp":1694582888765,"user_tz":-330,"elapsed":1272,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"202339de-276e-4162-c584-986397df9cd0"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["[21, 22, 23, 24, 25, 26, 27]"]},"metadata":{},"execution_count":18}],"source":["list = [21, 22, 23, 24, 25, 26, 27]\n","selection_sort(list)"]},{"cell_type":"markdown","metadata":{"id":"i5E9RC9ZB18O"},"source":["## Searching Algorithms\n","The following searching algorithms are presented in this section:\n","- Linear search\n","- Binary search\n","- Interpolation search\n"]},{"cell_type":"markdown","metadata":{"id":"BC2maEdnB18O"},"source":["### Linear Search\n","One of the simplest strategies for searching data is to simply loop through each element looking for the target. 
Each data point is checked for a match; when a match is found, the result is returned and the algorithm exits the loop. Otherwise, the algorithm keeps on searching until it reaches the end of the data. The obvious disadvantage of linear search is that it is very slow due to the inherent exhaustive search. The advantage is that the data does not need to be sorted, as required by the other algorithms presented in this chapter.\n","Let's look at the code for linear search:\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"-R4m7HtzB18O"},"outputs":[],"source":["def linear_search(list, item):\n"," index = 0\n"," found = False\n","\n","# Match the value with each data element\n"," while index < len(list) and found is False:\n"," if list[index] == item:\n"," found = True\n"," else:\n"," index = index + 1\n"," return found"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"LReEA1ILB18O","executionInfo":{"status":"ok","timestamp":1694582888766,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"aaae3c06-f988-4922-80c9-8c6638631da0"},"outputs":[{"output_type":"stream","name":"stdout","text":["True\n","False\n"]}],"source":["list = [12, 33, 11, 99, 22, 55, 90]\n","print(linear_search(list, 12))\n","print(linear_search(list, 91))"]},{"cell_type":"markdown","metadata":{"id":"Qwcku2tlB18O"},"source":["### Binary Search\n","The prerequisite of the binary search algorithm is sorted data. The algorithm iteratively divides a list into two parts and keeps track of the lowest and highest indices until it finds the value it is looking for:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Aa8ZK40WB18O"},"outputs":[],"source":["def binary_search(elements, item):\n"," first = 0\n"," last = len(elements) - 1\n","\n"," while first <= last:\n"," midpoint = (first + last) // 2\n"," if elements[midpoint] == item:\n"," return True\n"," else:\n"," if item < elements[midpoint]:\n"," last = midpoint - 1\n"," else:\n"," first = midpoint + 1\n"," return False\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Wr2c6DR4B18O","outputId":"14f3166c-4113-42c1-ee1e-522f47d29e8d","executionInfo":{"status":"ok","timestamp":1694582888766,"user_tz":-330,"elapsed":15,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"colab":{"base_uri":"https://localhost:8080/"}},"outputs":[{"output_type":"stream","name":"stdout","text":["True\n","False\n"]}],"source":["list = [12, 33, 11, 99, 22, 55, 90]\n","sorted_list = bubble_sort(list)\n","print(binary_search(list, 12))\n","print(binary_search(list, 91))"]},{"cell_type":"markdown","metadata":{"id":"t2dln9KJB18O"},"source":["## Interpolation Search\n","Binary search is based on the logic that it focuses on the middle section of the data. Interpolation search is more sophisticated. It uses the target value to estimate the position of the element in the sorted array. Let's try to understand it by using an example. Let's assume we want to search for a word in an English dictionary, such as the word river. We will use this information to interpolate and start searching for words starting with r. 
A more generalized interpolation search can be programmed as follows:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gqH5E6xcB18O"},"outputs":[],"source":["def int_polsearch(list, x):\n"," idx0 = 0\n"," idxn = (len(list) - 1)\n"," while idx0 <= idxn and x >= list[idx0] and x <= list[idxn]:\n","\n","# Find the mid point\n"," mid = idx0 + int((float(idxn - idx0) / (list[idxn] - list[idx0])) * (x - list[idx0]))\n","\n","# Compare the value at mid point with search value\n"," if list[mid] == x:\n"," return True\n"," if list[mid] < x:\n"," idx0 = mid + 1\n"," else:\n","# Without narrowing the upper bound here, the loop could repeat forever\n"," idxn = mid - 1\n"," return False\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"jpHedryQB18P","outputId":"eecc99af-bbba-486f-d1c0-8651998d678e","executionInfo":{"status":"ok","timestamp":1694582888767,"user_tz":-330,"elapsed":14,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"colab":{"base_uri":"https://localhost:8080/"}},"outputs":[{"output_type":"stream","name":"stdout","text":["True\n","False\n"]}],"source":["list = [12, 33, 11, 99, 22, 55, 90]\n","sorted_list = bubble_sort(list)\n","print(int_polsearch(list, 12))\n","print(int_polsearch(list, 91))"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
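
Note on Sorting_and_Searching_Algorithms.ipynb: the "Optimizing bubble sort" cell claims that the plain bubble_sort always performs its N-1 passes, while the early-exit version stops as soon as a pass makes no swaps. A standalone Python sketch (not a cell from the notebook; the 2,000-element nearly sorted list and the timeit harness are illustrative choices) that makes the difference measurable:

import timeit

def bubble_sort(items):
    # Plain version: always runs len(items) - 1 passes.
    for pass_no in range(len(items) - 1, 0, -1):
        for idx in range(pass_no):
            if items[idx] > items[idx + 1]:
                items[idx], items[idx + 1] = items[idx + 1], items[idx]
    return items

def optimized_bubble_sort(items):
    # Early-exit version: a pass with no swaps proves the list is sorted.
    for pass_no in range(len(items) - 1, 0, -1):
        swapped = False
        for idx in range(pass_no):
            if items[idx] > items[idx + 1]:
                items[idx], items[idx + 1] = items[idx + 1], items[idx]
                swapped = True
        if not swapped:
            break
    return items

nearly_sorted = list(range(2000))
nearly_sorted[0], nearly_sorted[1] = nearly_sorted[1], nearly_sorted[0]
for fn in (bubble_sort, optimized_bubble_sort):
    secs = timeit.timeit(lambda: fn(nearly_sorted.copy()), number=5)
    print(fn.__name__, f"{secs:.3f}s for 5 runs")

On this input the optimized version finishes after two passes instead of 1,999, so its timing should be smaller by orders of magnitude.
--------------------------------------------------------------------------------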
/Chapter04/Divide_and_conquer.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"code","source":["!apt-get install openjdk-11-jdk-headless -qq > /dev/null\n","!wget -q https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz\n","!tar xf spark-3.1.2-bin-hadoop3.2.tgz\n","!pip install -q findspark\n","\n","import os\n","os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-11-openjdk-amd64\"\n","os.environ[\"SPARK_HOME\"] = \"/content/spark-3.1.2-bin-hadoop3.2\"\n","\n","import findspark\n","findspark.init()\n","\n"],"metadata":{"id":"PYg0wk0q2m2l","executionInfo":{"status":"ok","timestamp":1694668548353,"user_tz":-330,"elapsed":34730,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":1,"outputs":[]},{"cell_type":"code","execution_count":2,"metadata":{"id":"K4ZBYPju0rnh","executionInfo":{"status":"ok","timestamp":1694668562450,"user_tz":-330,"elapsed":14102,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[],"source":["from pyspark.sql import SparkSession\n","spark = SparkSession.builder.master(\"local[*]\").getOrCreate()\n","sc = spark.sparkContext\n"]},{"cell_type":"code","execution_count":3,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"8OHRiDEX0rnh","executionInfo":{"status":"ok","timestamp":1694668565120,"user_tz":-330,"elapsed":2674,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"91146831-cd19-45eb-f04e-77fa2bb89149"},"outputs":[{"output_type":"stream","name":"stdout","text":["['python', 'java', 'ottawa', 'ottawa', 'java', 'news']\n"]}],"source":["wordsList = ['python', 'java', 'ottawa', 'ottawa', 'java','news']\n","wordsRDD = sc.parallelize(wordsList, 4)\n","# Print out the type of wordsRDD\n","print (wordsRDD.collect())"]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OUCodRiA0rnh","executionInfo":{"status":"ok","timestamp":1694668566738,"user_tz":-330,"elapsed":1623,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"30d397ab-ce38-4372-ed42-3c3396309c95"},"outputs":[{"output_type":"stream","name":"stdout","text":["[('python', 1), ('java', 1), ('ottawa', 1), ('ottawa', 1), ('java', 1), ('news', 1)]\n"]}],"source":["wordPairs = wordsRDD.map(lambda w: (w, 1))\n","print (wordPairs.collect())"]},{"cell_type":"code","execution_count":5,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"76evGJgA0rni","executionInfo":{"status":"ok","timestamp":1694668568667,"user_tz":-330,"elapsed":1936,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"f7689315-e0a6-4d40-b983-e5a39e2d9668"},"outputs":[{"output_type":"stream","name":"stdout","text":["[('python', 1), ('java', 2), ('ottawa', 2), ('news', 1)]\n"]}],"source":["wordCountsCollected = wordPairs.reduceByKey(lambda x,y: x+y)\n","print(wordCountsCollected.collect())"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.5"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
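
Note on Divide_and_conquer.ipynb: the word count is a split-apply-combine pipeline. parallelize divides the list into 4 partitions, map emits a (word, 1) pair per word, and reduceByKey adds the counts per key. A plain-Python sketch of the same logic for readers without a Spark runtime (the slice-based partitioning and the use of Counter and reduce are choices of this sketch, not of the notebook):

from collections import Counter
from functools import reduce

words_list = ['python', 'java', 'ottawa', 'ottawa', 'java', 'news']

# Divide: split the input into 4 partitions, as sc.parallelize(wordsList, 4) does.
partitions = [words_list[i::4] for i in range(4)]

# Conquer: count each partition independently (the per-partition reduce work).
partial_counts = [Counter(part) for part in partitions]

# Combine: merge the partial counts into the final result.
word_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(dict(word_counts))  # same counts as the notebook: java 2, ottawa 2, python 1, news 1
--------------------------------------------------------------------------------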
/Chapter04/Linear_Programming.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"markdown","metadata":{"id":"ZD0y-c58zvId"},"source":["# CHAPTER 4: Designing Algorithms\n","## Capacity Planning with Linear Programming\n","@Copyright Imran Ahmad\n","\n","Let us look into a practical use case where Linear Programming can be used to solve a real world problem. Let us assume that we want to maximize the profit of a state-of-the-art factory manufacturing robots. The factory can manufacture two different types of robot:\n","- Advanced Model (A)\n","- Basic Model (B)\n"]},{"cell_type":"code","execution_count":3,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7eza0zRFzvIg","executionInfo":{"status":"ok","timestamp":1694668700162,"user_tz":240,"elapsed":6691,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}},"outputId":"204f8c77-c766-410e-85ac-51a936ce3f2b"},"outputs":[{"output_type":"stream","name":"stdout","text":["Collecting pulp\n"," Downloading PuLP-2.7.0-py3-none-any.whl (14.3 MB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.3/14.3 MB\u001b[0m \u001b[31m52.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n","\u001b[?25hInstalling collected packages: pulp\n","Successfully installed pulp-2.7.0\n"]}],"source":["#If pulp is not install then please uncomment the following line of code and run it once\n","!pip install pulp"]},{"cell_type":"code","execution_count":4,"metadata":{"id":"z_k56vx0zvIh","executionInfo":{"status":"ok","timestamp":1694668700162,"user_tz":240,"elapsed":5,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":["import pulp"]},{"cell_type":"markdown","metadata":{"id":"jnUkOY2BzvIh"},"source":["We call LpProblem function in this package to instantiate the problem class. We name the instance as \"Profit Maximising Problem\""]},{"cell_type":"code","execution_count":5,"metadata":{"id":"WMI7ETPYzvIh","executionInfo":{"status":"ok","timestamp":1694668700163,"user_tz":240,"elapsed":5,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":["model = pulp.LpProblem(\"Profit_maximising_problem\", pulp.LpMaximize)"]},{"cell_type":"markdown","metadata":{"id":"DgDig7QnzvIi"},"source":["Then we define two linear variables, A and B. 
Variable A represents the number of advanced robots that are produced and variable B represents the number of basic robots that are produced."]},{"cell_type":"code","execution_count":6,"metadata":{"id":"VCyOz85XzvIi","executionInfo":{"status":"ok","timestamp":1694668700164,"user_tz":240,"elapsed":5,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":["A = pulp.LpVariable('A', lowBound=0, cat='Integer')\n","B = pulp.LpVariable('B', lowBound=0, cat='Integer')"]},{"cell_type":"markdown","metadata":{"id":"HZcIw2zJzvIi"},"source":["We define the objective function and constraints as follows:"]},{"cell_type":"code","execution_count":7,"metadata":{"id":"akx_jhZizvIj","executionInfo":{"status":"ok","timestamp":1694668700354,"user_tz":240,"elapsed":195,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":["# Objective function\n","model += 5000 * A + 2500 * B, \"Profit\"\n","\n","# Constraints\n","model += 3 * A + 2 * B <= 20\n","model += 4 * A + 3 * B <= 30\n","model += 4 * A + 3 * B <= 44"]},{"cell_type":"markdown","metadata":{"id":"7x1LKdmPzvIj"},"source":["We use the solve function to generate a solution."]},{"cell_type":"code","execution_count":8,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":35},"id":"gZQUNgTOzvIj","executionInfo":{"status":"ok","timestamp":1694668700355,"user_tz":240,"elapsed":5,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}},"outputId":"9151b917-b82d-4cd0-871a-68239b5ef489"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["'Optimal'"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"}},"metadata":{},"execution_count":8}],"source":["# Solve our problem\n","model.solve()\n","pulp.LpStatus[model.status]"]},{"cell_type":"markdown","metadata":{"id":"M_347ABizvIk"},"source":["Then we print the value of A, B and the value of objective function."]},{"cell_type":"code","execution_count":9,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"vk6vZt2MzvIk","executionInfo":{"status":"ok","timestamp":1694668700355,"user_tz":240,"elapsed":4,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}},"outputId":"ce41b652-d706-4f97-fc04-5bab254a335b"},"outputs":[{"output_type":"stream","name":"stdout","text":["6.0\n","1.0\n"]}],"source":["# Print our decision variable values\n","print (A.varValue)\n","print (B.varValue)\n"]},{"cell_type":"markdown","metadata":{"id":"7_JgotIIzvIk"},"source":[]},{"cell_type":"code","execution_count":10,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"YSk-wzK1zvIk","executionInfo":{"status":"ok","timestamp":1694668700355,"user_tz":240,"elapsed":4,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}},"outputId":"c69ffdd5-66fc-4bc4-83d9-ebd212e5d601"},"outputs":[{"output_type":"stream","name":"stdout","text":["32500.0\n"]}],"source":["# Print our objective function value\n","print (pulp.value(model.objective))"]},{"cell_type":"code","execution_count":10,"metadata":{"id":"Iq0j2UvCzvIk","executionInfo":{"status":"ok","timestamp":1694668700355,"user_tz":240,"elapsed":3,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":[]},{"cell_type":"code","execution_count":10,"metadata":{"id":"1VlbX-B-zvIk","executionInfo":{"status":"ok","timestamp":1694668700355,"user_tz":240,"elapsed":3,"user":{"displayName":"Imran 
Ahmad","userId":"08683678182734146579"}}},"outputs":[],"source":[]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.5"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
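
Note on Linear_Programming.ipynb: the solver reports A = 6 and B = 1 with objective value 32500.0. A quick arithmetic cross-check (not a notebook cell) that this point is feasible under all three constraints and reproduces the printed profit:

A, B = 6, 1
assert 3 * A + 2 * B <= 20   # 20 <= 20: the first constraint is binding
assert 4 * A + 3 * B <= 30   # 27 <= 30
assert 4 * A + 3 * B <= 44   # 27 <= 44: redundant here, since 30 is the tighter bound
print(5000 * A + 2500 * B)   # 32500, matching pulp.value(model.objective)
--------------------------------------------------------------------------------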
/Chapter04/Travelling_Salesman_Problem.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"private_outputs":true,"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Chapter 4\n","## Travelling Saleman Problem\n","Let's first look at the problem statement for the TSP, which is a well-known problem that was coined as a challenge in the 1930s. The TSP is an NP-hard problem. To start with, we can randomly generate a tour that meets the condition of visiting all of the cities without caring about the optimal solution. Then, we can work to improve the solution with each iteration. Each tour generated in an iteration is called a candidate solution (also called a certificate). Proving that a certificate is optimal requires an exponentially increasing amount of time. Instead, different heuristics-based solutions are used that generate tours that are near to optimal but are not optimal.\n","## 1- Brute-force strategy\n","The first solution that comes to mind to solve the TSP is using brute force to come up with the shortest path in which the salesperson visits every city exactly once and returns to the initial city. So, the brute-force strategy works as follows:\n","Evaluate all possible tours.\n"],"metadata":{"id":"ypGNpdQs_UUD"}},{"cell_type":"code","source":["import random\n","from itertools import permutations\n","import matplotlib.pyplot as plt\n","from collections import Counter\n","from time import time"],"metadata":{"id":"5RWOAt7K_wt2"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["\n","Distance and Tour calculations"],"metadata":{"id":"k1fbAnk7_0KF"}},{"cell_type":"code","source":["aCity = complex\n","\n","def distance_points(first, second):\n"," return abs(first - second)\n","\n","def distance_tour(aTour):\n"," return sum(distance_points(aTour[i - 1], aTour[i])\n"," for i in range(len(aTour))\n"," )\n","\n","def generate_cities(number_of_cities):\n"," seed = 111\n"," width = 500\n"," height = 300\n"," random.seed((number_of_cities, seed))\n"," return frozenset(aCity(random.randint(1, width), random.randint(1, height))\n"," for c in range(number_of_cities))"],"metadata":{"id":"hA2Rg3YYAFEL"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Brute Force Algorithm"],"metadata":{"id":"9GwXO5EfAIJw"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"-hVJHvca-jZn"},"outputs":[],"source":["def brute_force(cities):\n"," return shortest_tour(permutations(cities))\n","\n","def shortest_tour(tours):\n"," return min(tours, key=distance_tour)"]},{"cell_type":"markdown","source":["Visualization"],"metadata":{"id":"ysgI1xweAy9E"}},{"cell_type":"code","source":["def visualize_tour(tour, style='bo-'):\n"," if len(tour) > 1000:\n"," plt.figure(figsize=(15, 10))\n"," start = tour[0:1]\n"," visualize_segment(tour + start, style)\n"," visualize_segment(start, 'rD')\n","\n","def visualize_segment(segment, style='bo-'):\n"," plt.plot([X(c) for c in segment], [Y(c) for c in segment], style, clip_on=False)\n"," plt.axis('scaled')\n"," plt.axis('off')\n","\n","def X(city):\n"," return city.real\n","\n","def Y(city):\n"," return city.imag"],"metadata":{"id":"tWRI4lDnAyro"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["TSP function"],"metadata":{"id":"-YjY2ZMmAcPS"}},{"cell_type":"code","source":["def tsp(algorithm, cities):\n"," t0 = time()\n"," tour = algorithm(cities)\n"," t1 = time()\n"," # Every city appears exactly once in tour\n"," assert Counter(tour) == 
Counter(cities)\n"," visualize_tour(tour)\n"," print(\"{}: {} cities => tour length {:.0f} (in {:.3f} sec)\".format(\n"," name(algorithm), len(tour), distance_tour(tour), t1-t0))\n","\n","def name(algorithm):\n"," return algorithm.__name__.replace('_tsp', '')"],"metadata":{"id":"9ys9aSpMAcz0"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Let's run it"],"metadata":{"id":"GnLs84GpAlCC"}},{"cell_type":"code","source":["tsp(brute_force, generate_cities(10))"],"metadata":{"id":"o7o7fFN-AiZJ"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## 2- Greedy Algorithm\n","If we use a greedy algorithm to solve the TSP, then, at each step, we can choose a city that seems reasonable, instead of finding a city to visit that will result in the best overall path. So, whenever we need to select a city, we just select the nearest city without bothering to verify that this choice will result in the globally optimal path.\n","The approach of the greedy algorithm is simple:\n","1.\tStart from any city.\n","2.\tAt each step, keep building the tour by moving to the nearest neighboring city that has not been visited before.\n","3.\tRepeat step 2.\n","Let's define a function named greedy_algorithm that can implement this logic:\n"],"metadata":{"id":"CU5kiamlEbbn"}},{"cell_type":"code","source":["# Greedy Algorithm for TSP\n","def greedy_algorithm(cities, start=None):\n"," city_ = start or first(cities)\n"," tour = [city_]\n"," unvisited = set(cities - {city_})\n"," while unvisited:\n"," city_ = nearest_neighbor(city_, unvisited)\n"," tour.append(city_)\n"," unvisited.remove(city_)\n"," return tour\n","\n","def first(collection):\n"," return next(iter(collection))\n","\n","def nearest_neighbor(city_a, cities):\n"," return min(cities, key=lambda city_: distance_points(city_, city_a))\n","\n","# Now, let's use greedy_algorithm to create a tour for 2,000 cities\n","tsp(greedy_algorithm, generate_cities(2000))\n"],"metadata":{"id":"8e6LCvldArnB"},"execution_count":null,"outputs":[]}]}
--------------------------------------------------------------------------------
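
Note on Travelling_Salesman_Problem.ipynb: the notebook runs brute_force on only 10 cities while greedy_algorithm handles 2,000, because the number of candidate tours grows factorially. A small sketch of that growth (fixing the start city and ignoring tour direction leaves (n - 1)!/2 distinct tours to evaluate):

from math import factorial

for n in (5, 10, 15, 20):
    print(f"{n} cities -> {factorial(n - 1) // 2:,} distinct tours")
# 10 cities -> 181,440 tours (feasible); 20 cities -> about 6.1e16 (out of reach)
--------------------------------------------------------------------------------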
/Chapter07/Bagging_Algorithms.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"markdown","source":["## Chapter 7\n","### Random Forest"],"metadata":{"id":"tXFR-Rt7Ce-J"}},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"om9qMYIOCQ0N","executionInfo":{"status":"ok","timestamp":1694583840408,"user_tz":-330,"elapsed":963,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"1decf44f-4e52-4650-f04a-0246001ce211"},"outputs":[{"output_type":"stream","name":"stdout","text":["1.2.2\n"]}],"source":["# Importing the libraries\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import pandas as pd\n","import sklearn\n","import sklearn.metrics as metrics\n","print(sklearn.__version__)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"cCimLkjCCQ0P"},"outputs":[],"source":["# Importing the dataset\n","dataset = pd.read_csv('https://storage.googleapis.com/neurals/data/data/Social_Network_Ads.csv')\n","dataset = dataset.drop(columns=['User ID'])"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"u6iSzLAECQ0P","executionInfo":{"status":"ok","timestamp":1694583840965,"user_tz":-330,"elapsed":19,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"900ba233-9146-48bf-8436-4020cd0fc208"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Gender Age EstimatedSalary Purchased\n","0 Male 19 19000 0\n","1 Male 35 20000 0\n","2 Female 26 43000 0\n","3 Female 27 57000 0\n","4 Male 19 76000 0"],"text/html":["\n","
\n","
\n","\n","
\n"," \n"," \n"," | \n"," Gender | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Male | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," Male | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," Female | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," Female | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," Male | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":3}],"source":["dataset.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"DqNew69rCQ0P","executionInfo":{"status":"ok","timestamp":1694583840966,"user_tz":-330,"elapsed":17,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"1422a9b2-629b-4044-d2df-14e03fcff7fd"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Female Male Age EstimatedSalary Purchased\n","0 0.0 1.0 19 19000 0\n","1 0.0 1.0 35 20000 0\n","2 1.0 0.0 26 43000 0\n","3 1.0 0.0 27 57000 0\n","4 0.0 1.0 19 76000 0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Female | \n"," Male | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," 0.0 | \n"," 1.0 | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," 1.0 | \n"," 0.0 | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," 1.0 | \n"," 0.0 | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":4}],"source":["enc = sklearn.preprocessing.OneHotEncoder()\n","enc.fit(dataset.iloc[:,[0]])\n","onehotlabels = enc.transform(dataset.iloc[:,[0]]).toarray()\n","genders = pd.DataFrame({'Female': onehotlabels[:, 0], 'Male': onehotlabels[:, 1]})\n","result = pd.concat([genders,dataset.iloc[:,1:]], axis=1, sort=False)\n","\n","result.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dICamYggCQ0Q"},"outputs":[],"source":["y=result['Purchased']\n","X=result.drop(columns=['Purchased'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ByZ3dte0CQ0Q"},"outputs":[],"source":["# Splitting the dataset into the Training set and Test set\n","from sklearn.model_selection import train_test_split\n","#from sklearn.cross_validation import train_test_split\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"wLJTwPwzCQ0Q"},"outputs":[],"source":["# Feature Scaling\n","from sklearn.preprocessing import StandardScaler\n","sc = StandardScaler()\n","X_train = sc.fit_transform(X_train)\n","X_test = sc.transform(X_test)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":92},"id":"GQEJc7dCCQ0Q","executionInfo":{"status":"ok","timestamp":1694583841514,"user_tz":-330,"elapsed":558,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"f78c82ff-5a35-4814-e919-b94044437e3d"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["RandomForestClassifier(criterion='entropy', max_depth=4, n_estimators=10,\n"," random_state=0)"],"text/html":["RandomForestClassifier(criterion='entropy', max_depth=4, n_estimators=10,\n"," random_state=0)
"]},"metadata":{},"execution_count":8}],"source":["# Fitting Random Forest Classification to the Training set\n","from sklearn.ensemble import RandomForestClassifier\n","classifier = RandomForestClassifier(n_estimators = 10, max_depth = 4,criterion = 'entropy', random_state = 0)\n","classifier.fit(X_train, y_train)"]},{"cell_type":"code","execution_count":null,"metadata":{"scrolled":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"F_DUyMzBCQ0Q","executionInfo":{"status":"ok","timestamp":1694583841515,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"17e73427-d995-4e61-8144-1d167062df6b"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[64, 4],\n"," [ 3, 29]])"]},"metadata":{},"execution_count":9}],"source":["# Predicting the Test set results\n","y_pred = classifier.predict(X_test)\n","cm = metrics.confusion_matrix(y_test, y_pred)\n","cm"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0gR_UecGCQ0R","executionInfo":{"status":"ok","timestamp":1694583841517,"user_tz":-330,"elapsed":17,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"832f65a9-dc69-4050-f9c0-9d299f819c8c"},"outputs":[{"output_type":"stream","name":"stdout","text":["0.93 0.90625 0.8787878787878788\n"]}],"source":["accuracy= metrics.accuracy_score(y_test,y_pred)\n","recall = metrics.recall_score(y_test,y_pred)\n","precision = metrics.precision_score(y_test,y_pred)\n","print(accuracy,recall,precision)"]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
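
Note on Bagging_Algorithms.ipynb: the final printout 0.93 0.90625 0.8787878787878788 follows directly from the confusion matrix array([[64, 4], [3, 29]]) shown above it. A short derivation (sklearn's confusion_matrix puts actual classes in rows and predicted classes in columns):

tn, fp, fn, tp = 64, 4, 3, 29
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 93 / 100 = 0.93
recall = tp / (tp + fn)                     # 29 / 32  = 0.90625
precision = tp / (tp + fp)                  # 29 / 33  = 0.8787878787878788
print(accuracy, recall, precision)
--------------------------------------------------------------------------------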
/Chapter07/NaiveBayes.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"markdown","source":["## Chapter 7\n","### Naive Bayes Algorithm"],"metadata":{"id":"ECONriooFBG5"}},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"PQhzbBRZEbwh","executionInfo":{"status":"ok","timestamp":1694584130945,"user_tz":-330,"elapsed":1887,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"8b428c48-2831-4622-bbfe-469c24dba3e4"},"outputs":[{"output_type":"stream","name":"stdout","text":["1.2.2\n"]}],"source":["# Importing the libraries\n","import numpy as np\n","import sklearn,sklearn.tree\n","import matplotlib.pyplot as plt\n","import pandas as pd\n","import sklearn.metrics as metrics\n","print(sklearn.__version__)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"rAcSlawrEbwj"},"outputs":[],"source":["# Importing the dataset\n","dataset = pd.read_csv('https://storage.googleapis.com/neurals/data/data/Social_Network_Ads.csv')\n","dataset = dataset.drop(columns=['User ID'])"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"s_KJWXWeEbwj","executionInfo":{"status":"ok","timestamp":1694584131699,"user_tz":-330,"elapsed":28,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"dc36bb91-40a9-4b93-d5f6-74564a3706e2"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Gender Age EstimatedSalary Purchased\n","0 Male 19 19000 0\n","1 Male 35 20000 0\n","2 Female 26 43000 0\n","3 Female 27 57000 0\n","4 Male 19 76000 0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Gender | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Male | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," Male | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," Female | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," Female | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," Male | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":3}],"source":["dataset.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"c4-CdPslEbwk","executionInfo":{"status":"ok","timestamp":1694584131700,"user_tz":-330,"elapsed":27,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"ef335e77-2590-4e12-8b5b-13388ceff4fd"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Female Male Age EstimatedSalary Purchased\n","0 0.0 1.0 19 19000 0\n","1 0.0 1.0 35 20000 0\n","2 1.0 0.0 26 43000 0\n","3 1.0 0.0 27 57000 0\n","4 0.0 1.0 19 76000 0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Female | \n"," Male | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," 0.0 | \n"," 1.0 | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," 1.0 | \n"," 0.0 | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," 1.0 | \n"," 0.0 | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":4}],"source":["enc = sklearn.preprocessing.OneHotEncoder()\n","enc.fit(dataset.iloc[:,[0]])\n","onehotlabels = enc.transform(dataset.iloc[:,[0]]).toarray()\n","genders = pd.DataFrame({'Female': onehotlabels[:, 0], 'Male': onehotlabels[:, 1]})\n","result = pd.concat([genders,dataset.iloc[:,1:]], axis=1, sort=False)\n","result.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qWDZfiCGEbwk"},"outputs":[],"source":["y=result['Purchased']\n","X=result.drop(columns=['Purchased'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dtw6hr98Ebwk"},"outputs":[],"source":["# Splitting the dataset into the Training set and Test set\n","from sklearn.model_selection import train_test_split\n","#from sklearn.cross_validation import train_test_split\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pPb04yp-Ebwk"},"outputs":[],"source":["# Feature Scaling\n","from sklearn.preprocessing import StandardScaler\n","sc = StandardScaler()\n","X_train = sc.fit_transform(X_train)\n","X_test = sc.transform(X_test)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":75},"id":"2oxbJaNUEbwk","executionInfo":{"status":"ok","timestamp":1694584131701,"user_tz":-330,"elapsed":25,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"cc77a6d3-be75-452e-d53b-5c52892486a8"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["GaussianNB()"],"text/html":["GaussianNB()
"]},"metadata":{},"execution_count":8}],"source":["# Fitting Naive Bayes Classification to the Training set\n","from sklearn.naive_bayes import GaussianNB\n","classifier = GaussianNB()\n","classifier.fit(X_train, y_train)"]},{"cell_type":"code","execution_count":null,"metadata":{"scrolled":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"cAISXjL8Ebwl","executionInfo":{"status":"ok","timestamp":1694584131701,"user_tz":-330,"elapsed":24,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"53a67646-4ee4-4aa5-b2c2-a3eb07aacd91"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[66, 2],\n"," [ 6, 26]])"]},"metadata":{},"execution_count":9}],"source":["# Predicting the Test set results\n","y_pred = classifier.predict(X_test)\n","cm = metrics.confusion_matrix(y_test, y_pred)\n","cm"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"k7fLgEUUEbwl","executionInfo":{"status":"ok","timestamp":1694584131702,"user_tz":-330,"elapsed":22,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"d1f5e07f-4eab-4ba4-bbda-b89efd8e9de8"},"outputs":[{"output_type":"stream","name":"stdout","text":["0.92 0.8125 0.9285714285714286\n"]}],"source":["accuracy= metrics.accuracy_score(y_test,y_pred)\n","recall = metrics.recall_score(y_test,y_pred)\n","precision = metrics.precision_score(y_test,y_pred)\n","print(accuracy,recall,precision)"]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
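
Note on NaiveBayes.ipynb: GaussianNB scores each class as its prior multiplied by per-feature Gaussian likelihoods and predicts the class with the higher posterior. A toy sketch of that rule for a single feature (the means, standard deviations, priors, and test value below are invented for illustration, not estimated from the Social_Network_Ads data):

from scipy.stats import norm

stats = {0: (-0.4, 0.8), 1: (0.9, 0.7)}  # hypothetical per-class (mean, std) of one scaled feature
priors = {0: 0.65, 1: 0.35}              # hypothetical class priors

x = 0.5  # one scaled test value
scores = {c: priors[c] * norm.pdf(x, loc=m, scale=s) for c, (m, s) in stats.items()}
total = sum(scores.values())
for c, score in scores.items():
    print(c, score / total)  # normalized posteriors; the class with the larger value wins
--------------------------------------------------------------------------------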
/Chapter07/RegressionTree.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"code","execution_count":null,"metadata":{"id":"wZ2ClF0E_Gf1"},"outputs":[],"source":["# Importing the libraries\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import pandas as pd"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"ZawAHYk4_Gf4","executionInfo":{"status":"ok","timestamp":1694583640968,"user_tz":-330,"elapsed":15,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"ce1d2bf3-3803-4d76-ac3d-b685f532c436"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" NAME CYLINDERS DISPLACEMENT HORSEPOWER WEIGHT \\\n","0 chevrolet chevelle malibu 8 307.0 130 3504 \n","1 buick skylark 320 8 350.0 165 3693 \n","2 plymouth satellite 8 318.0 150 3436 \n","3 amc rebel sst 8 304.0 150 3433 \n","4 ford torino 8 302.0 140 3449 \n","\n"," ACCELERATION MPG \n","0 12.0 18.0 \n","1 11.5 15.0 \n","2 11.0 18.0 \n","3 12.0 16.0 \n","4 10.5 17.0 "],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," NAME | \n"," CYLINDERS | \n"," DISPLACEMENT | \n"," HORSEPOWER | \n"," WEIGHT | \n"," ACCELERATION | \n"," MPG | \n","
\n"," \n"," \n"," \n"," 0 | \n"," chevrolet chevelle malibu | \n"," 8 | \n"," 307.0 | \n"," 130 | \n"," 3504 | \n"," 12.0 | \n"," 18.0 | \n","
\n"," \n"," 1 | \n"," buick skylark 320 | \n"," 8 | \n"," 350.0 | \n"," 165 | \n"," 3693 | \n"," 11.5 | \n"," 15.0 | \n","
\n"," \n"," 2 | \n"," plymouth satellite | \n"," 8 | \n"," 318.0 | \n"," 150 | \n"," 3436 | \n"," 11.0 | \n"," 18.0 | \n","
\n"," \n"," 3 | \n"," amc rebel sst | \n"," 8 | \n"," 304.0 | \n"," 150 | \n"," 3433 | \n"," 12.0 | \n"," 16.0 | \n","
\n"," \n"," 4 | \n"," ford torino | \n"," 8 | \n"," 302.0 | \n"," 140 | \n"," 3449 | \n"," 10.5 | \n"," 17.0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":2}],"source":["# Importing the dataset\n","dataset = pd.read_csv('https://storage.googleapis.com/neurals/data/data/auto.csv')\n","dataset.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"RWSbdg1e_Gf5","executionInfo":{"status":"ok","timestamp":1694583640969,"user_tz":-330,"elapsed":12,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"0164001c-a574-4723-bb75-d9a5874e9846"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" CYLINDERS DISPLACEMENT HORSEPOWER WEIGHT ACCELERATION MPG\n","0 8 307.0 130.0 3504 12.0 18.0\n","1 8 350.0 165.0 3693 11.5 15.0\n","2 8 318.0 150.0 3436 11.0 18.0\n","3 8 304.0 150.0 3433 12.0 16.0\n","4 8 302.0 140.0 3449 10.5 17.0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," CYLINDERS | \n"," DISPLACEMENT | \n"," HORSEPOWER | \n"," WEIGHT | \n"," ACCELERATION | \n"," MPG | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 8 | \n"," 307.0 | \n"," 130.0 | \n"," 3504 | \n"," 12.0 | \n"," 18.0 | \n","
\n"," \n"," 1 | \n"," 8 | \n"," 350.0 | \n"," 165.0 | \n"," 3693 | \n"," 11.5 | \n"," 15.0 | \n","
\n"," \n"," 2 | \n"," 8 | \n"," 318.0 | \n"," 150.0 | \n"," 3436 | \n"," 11.0 | \n"," 18.0 | \n","
\n"," \n"," 3 | \n"," 8 | \n"," 304.0 | \n"," 150.0 | \n"," 3433 | \n"," 12.0 | \n"," 16.0 | \n","
\n"," \n"," 4 | \n"," 8 | \n"," 302.0 | \n"," 140.0 | \n"," 3449 | \n"," 10.5 | \n"," 17.0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":3}],"source":["dataset=dataset.drop(columns=['NAME'])\n","dataset= dataset.apply(pd.to_numeric, errors='coerce')\n","dataset.fillna(0, inplace=True)\n","dataset.head(5)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"d1IKAzOC_Gf5"},"outputs":[],"source":["y=dataset['MPG']\n","X=dataset.drop(columns=['MPG'])\n","# Splitting the dataset into the Training set and Test set\n","from sklearn.model_selection import train_test_split\n","#from sklearn.cross_validation import train_test_split\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":75},"id":"uaWAQlAw_Gf6","executionInfo":{"status":"ok","timestamp":1694583642058,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"e22e3674-bea5-4158-fcd9-e4783f7a06cc"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["DecisionTreeRegressor(max_depth=3)"],"text/html":["DecisionTreeRegressor(max_depth=3)
"]},"metadata":{},"execution_count":5}],"source":["from sklearn.tree import DecisionTreeRegressor\n","regressor = DecisionTreeRegressor(max_depth=3)\n","regressor.fit(X_train, y_train)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BsDnlm-d_Gf6"},"outputs":[],"source":["# Predicting the Test set results\n","y_pred = regressor.predict(X_test)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"FPyObKSd_Gf6","executionInfo":{"status":"ok","timestamp":1694583642068,"user_tz":-330,"elapsed":24,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"4f107529-7276-408b-c3f8-351d66c1c6eb"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["4.464255966462035"]},"metadata":{},"execution_count":7}],"source":["from sklearn.metrics import mean_squared_error\n","from math import sqrt\n","sqrt(mean_squared_error(y_test, y_pred))"]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
--------------------------------------------------------------------------------
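
Note on RegressionTree.ipynb: a DecisionTreeRegressor with max_depth=3 predicts the mean target of the training rows that land in each of its (at most eight) leaves. One way to inspect the learned split rules is sklearn.tree.export_text; the sketch below fits on synthetic data as a stand-in for auto.csv (the single WEIGHT feature and its relationship to the target are invented for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(1500, 4500, size=(200, 1))              # stand-in for WEIGHT
y = 50 - 0.01 * X[:, 0] + rng.normal(0, 2.0, size=200)  # heavier car, lower target

regressor = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(regressor, feature_names=['WEIGHT']))  # each leaf shows one predicted value
--------------------------------------------------------------------------------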
/Chapter07/SVM.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"code","execution_count":1,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"7QChXvWOGH2L","executionInfo":{"status":"ok","timestamp":1695362299930,"user_tz":-330,"elapsed":1795,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"fb8bda60-0341-45c3-c298-7a9422e7d862"},"outputs":[{"output_type":"stream","name":"stdout","text":["1.2.2\n"]}],"source":["# Importing the libraries\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import pandas as pd\n","import sklearn\n","import sklearn.metrics as metrics\n","print(sklearn.__version__)"]},{"cell_type":"code","execution_count":2,"metadata":{"id":"KXDVzwnvGH2N","executionInfo":{"status":"ok","timestamp":1695362299932,"user_tz":-330,"elapsed":25,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[],"source":["# Importing the dataset\n","dataset = pd.read_csv('https://storage.googleapis.com/neurals/data/data/Social_Network_Ads.csv')\n","dataset = dataset.drop(columns=['User ID'])"]},{"cell_type":"code","execution_count":3,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"INy4fzD5GH2N","executionInfo":{"status":"ok","timestamp":1695362299932,"user_tz":-330,"elapsed":22,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"b6335b00-9327-4e43-9d06-69ed1c4ea742"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Gender Age EstimatedSalary Purchased\n","0 Male 19 19000 0\n","1 Male 35 20000 0\n","2 Female 26 43000 0\n","3 Female 27 57000 0\n","4 Male 19 76000 0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Gender | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Male | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," Male | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," Female | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," Female | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," Male | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":3}],"source":["dataset.head(5)"]},{"cell_type":"code","execution_count":4,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"4bbo7tEoGH2N","executionInfo":{"status":"ok","timestamp":1695362299933,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"c323f897-f441-41f1-e610-af68222e8b35"},"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Female Male Age EstimatedSalary Purchased\n","0 0.0 1.0 19 19000 0\n","1 0.0 1.0 35 20000 0\n","2 1.0 0.0 26 43000 0\n","3 1.0 0.0 27 57000 0\n","4 0.0 1.0 19 76000 0"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Female | \n"," Male | \n"," Age | \n"," EstimatedSalary | \n"," Purchased | \n","
\n"," \n"," \n"," \n"," 0 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 19000 | \n"," 0 | \n","
\n"," \n"," 1 | \n"," 0.0 | \n"," 1.0 | \n"," 35 | \n"," 20000 | \n"," 0 | \n","
\n"," \n"," 2 | \n"," 1.0 | \n"," 0.0 | \n"," 26 | \n"," 43000 | \n"," 0 | \n","
\n"," \n"," 3 | \n"," 1.0 | \n"," 0.0 | \n"," 27 | \n"," 57000 | \n"," 0 | \n","
\n"," \n"," 4 | \n"," 0.0 | \n"," 1.0 | \n"," 19 | \n"," 76000 | \n"," 0 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":4}],"source":["enc = sklearn.preprocessing.OneHotEncoder()\n","enc.fit(dataset.iloc[:,[0]])\n","onehotlabels = enc.transform(dataset.iloc[:,[0]]).toarray()\n","genders = pd.DataFrame({'Female': onehotlabels[:, 0], 'Male': onehotlabels[:, 1]})\n","result = pd.concat([genders,dataset.iloc[:,1:]], axis=1, sort=False)\n","result.head(5)"]},{"cell_type":"code","execution_count":5,"metadata":{"id":"M3a0V7OhGH2O","executionInfo":{"status":"ok","timestamp":1695362299933,"user_tz":-330,"elapsed":15,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[],"source":["y=result['Purchased']\n","X=result.drop(columns=['Purchased'])"]},{"cell_type":"code","execution_count":6,"metadata":{"id":"ASA8j7-jGH2O","executionInfo":{"status":"ok","timestamp":1695362299934,"user_tz":-330,"elapsed":15,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[],"source":["# Splitting the dataset into the Training set and Test set\n","from sklearn.model_selection import train_test_split\n","#from sklearn.cross_validation import train_test_split\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)"]},{"cell_type":"code","execution_count":7,"metadata":{"id":"Q79WV5sMGH2O","executionInfo":{"status":"ok","timestamp":1695362300648,"user_tz":-330,"elapsed":729,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[],"source":["# Feature Scaling\n","from sklearn.preprocessing import StandardScaler\n","sc = StandardScaler()\n","X_train = sc.fit_transform(X_train)\n","X_test = sc.transform(X_test)"]},{"cell_type":"code","execution_count":8,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":75},"id":"L2hTPvRJGH2O","executionInfo":{"status":"ok","timestamp":1695362300649,"user_tz":-330,"elapsed":14,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"ed070b03-0ad5-4529-fead-f2aa60112a40"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["SVC(kernel='linear', random_state=0)"],"text/html":["SVC(kernel='linear', random_state=0)
"]},"metadata":{},"execution_count":8}],"source":["# Fitting SVM to the Training set\n","from sklearn.svm import SVC\n","classifier = SVC(kernel = 'linear', random_state = 0)\n","classifier.fit(X_train, y_train)\n"]},{"cell_type":"code","execution_count":9,"metadata":{"scrolled":true,"colab":{"base_uri":"https://localhost:8080/"},"id":"UsnzHa2rGH2O","executionInfo":{"status":"ok","timestamp":1695362300650,"user_tz":-330,"elapsed":14,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"4a93ee69-5ccd-4c16-c3d4-58f069408124"},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[66, 2],\n"," [ 9, 23]])"]},"metadata":{},"execution_count":9}],"source":["# Predicting the Test set results\n","y_pred = classifier.predict(X_test)\n","cm = metrics.confusion_matrix(y_test, y_pred)\n","cm"]},{"cell_type":"code","execution_count":10,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"MfkoOrQQGH2O","executionInfo":{"status":"ok","timestamp":1695362300651,"user_tz":-330,"elapsed":12,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"64c758d9-e5c2-4cc8-ee94-028e8f312b09"},"outputs":[{"output_type":"stream","name":"stdout","text":["0.89 0.71875 0.92\n"]}],"source":["accuracy= metrics.accuracy_score(y_test,y_pred)\n","recall = metrics.recall_score(y_test,y_pred)\n","precision = metrics.precision_score(y_test,y_pred)\n","print(accuracy,recall,precision)"]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.3"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}
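As a sanity check, here is a small sketch (using only the confusion matrix printed above) that recomputes the three reported metrics directly from the 2x2 matrix:

import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = np.array([[66, 2],
               [9, 23]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (23 + 66) / 100 = 0.89
recall = tp / (tp + fn)           # 23 / 32 = 0.71875
precision = tp / (tp + fp)        # 23 / 25 = 0.92
print(accuracy, recall, precision)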
--------------------------------------------------------------------------------
/Chapter08/Deep_Learning_Algorithms.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["Defining Gradient Descent"],"metadata":{"id":"JvQUdYyGglny"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"vP_2UL_1geDs"},"outputs":[],"source":["def adjust_position(gradient):\n"," while gradient != 0:\n"," if gradient < 0:\n"," print(\"Move right\")\n"," # here would be your logic to move right\n"," elif gradient > 0:\n"," print(\"Move left\")\n"," # here would be your logic to move left\n"]},{"cell_type":"markdown","source":["Activation functions"],"metadata":{"id":"TI0LNGJQgvCu"}},{"cell_type":"code","source":["def sigmoidFunction(z):\n"," return 1/ (1+np.exp(-z))\n"],"metadata":{"id":"YR-qsVUmgsTd"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["ReLu"],"metadata":{"id":"AQpSf_eZg0O6"}},{"cell_type":"code","source":["def relu(x):\n"," if x < 0:\n"," return 0\n"," else:\n"," return x"],"metadata":{"id":"PROCMVCfg23T"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["def leaky_relu(x, beta=0.01):\n"," if x < 0:\n"," return beta * x\n"," else:\n"," return x"],"metadata":{"id":"gGUyHzt4g5fR"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Hyperbolic tangent"],"metadata":{"id":"Sja74OuqhJ-R"}},{"cell_type":"code","source":["import numpy as np\n","\n","def tanh(x):\n"," numerator = 1 - np.exp(-2 * x)\n"," denominator = 1 + np.exp(-2 * x)\n"," return numerator / denominator\n"],"metadata":{"id":"dduMfMP8hGcs"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Softmax"],"metadata":{"id":"76ozR2LfhMyi"}},{"cell_type":"code","source":["import numpy as np\n","\n","def softmax(x):\n"," return np.exp(x) / np.sum(np.exp(x), axis=0)\n"],"metadata":{"id":"iYcgHftbhOek"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Defining a Keras model"],"metadata":{"id":"o5pek9fzhblq"}},{"cell_type":"code","source":["import tensorflow as tf"],"metadata":{"id":"l1YOraqphcKb"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["mnist = tf.keras.datasets.mnist"],"metadata":{"id":"PLfwctolhgIP"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["(train_images, train_labels), (test_images, test_labels) = mnist.load_data()"],"metadata":{"id":"8Hrr06LPhilq"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["train_images, test_images = train_images / 255.0, test_images / 255.0"],"metadata":{"id":"gJ9tC_s4hqlO"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["model = tf.keras.models.Sequential([\n"," tf.keras.layers.Flatten(input_shape=(28, 28)),\n"," tf.keras.layers.Dense(128, activation='relu'),\n"," tf.keras.layers.Dropout(0.15),\n"," tf.keras.layers.Dense(128, activation='relu'),\n"," tf.keras.layers.Dropout(0.15),\n"," tf.keras.layers.Dense(10, activation='softmax'),\n","])"],"metadata":{"id":"0cvdHDXQhs3K"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Functional API way of defining a Keras model"],"metadata":{"id":"nz51VYbsiDfm"}},{"cell_type":"code","source":["# Ensure TensorFlow 2.x is being used\n","%tensorflow_version 2.x\n","import tensorflow as tf\n","from tensorflow.keras.datasets import mnist\n","\n","# Load MNIST dataset\n","(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n","\n","# Normalize the pixel values to be between 0 and 
1\n","train_images, test_images = train_images / 255.0, test_images / 255.0\n","\n","# Using the Functional API\n","inputs = tf.keras.Input(shape=(28, 28)) # Adjusted for MNIST\n","x = tf.keras.layers.Flatten()(inputs)\n","x = tf.keras.layers.Dense(512, activation='relu', name='d1')(x)\n","x = tf.keras.layers.Dropout(0.2)(x)\n","predictions = tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='d2')(x) # 10 classes for 10 digits\n","model = tf.keras.Model(inputs=inputs, outputs=predictions)\n","\n","# One-hot encode the labels\n","train_labels_one_hot = tf.keras.utils.to_categorical(train_labels, 10)\n","test_labels_one_hot = tf.keras.utils.to_categorical(test_labels, 10)\n","\n","# Define the learning process\n","optimizer = tf.keras.optimizers.RMSprop()\n","loss = 'categorical_crossentropy'\n","metrics = ['accuracy']\n","\n","model.compile(optimizer=optimizer, loss=loss, metrics=metrics)\n","\n","# Train the model\n","history = model.fit(train_images, train_labels_one_hot, epochs=10, validation_data=(test_images, test_labels_one_hot))\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zakEQAEMZZaz","executionInfo":{"status":"ok","timestamp":1695364571716,"user_tz":-330,"elapsed":204746,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"e3596a46-71a5-4f0d-bfbc-770c6b6a1cd4"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.\n","Epoch 1/10\n","1875/1875 [==============================] - 26s 13ms/step - loss: 0.2204 - accuracy: 0.9339 - val_loss: 0.1068 - val_accuracy: 0.9681\n","Epoch 2/10\n","1875/1875 [==============================] - 18s 10ms/step - loss: 0.1030 - accuracy: 0.9691 - val_loss: 0.0907 - val_accuracy: 0.9742\n","Epoch 3/10\n","1875/1875 [==============================] - 18s 9ms/step - loss: 0.0788 - accuracy: 0.9771 - val_loss: 0.0838 - val_accuracy: 0.9770\n","Epoch 4/10\n","1875/1875 [==============================] - 19s 10ms/step - loss: 0.0639 - accuracy: 0.9808 - val_loss: 0.0730 - val_accuracy: 0.9801\n","Epoch 5/10\n","1875/1875 [==============================] - 18s 10ms/step - loss: 0.0537 - accuracy: 0.9841 - val_loss: 0.0719 - val_accuracy: 0.9813\n","Epoch 6/10\n","1875/1875 [==============================] - 19s 10ms/step - loss: 0.0456 - accuracy: 0.9866 - val_loss: 0.0745 - val_accuracy: 0.9814\n","Epoch 7/10\n","1875/1875 [==============================] - 18s 10ms/step - loss: 0.0408 - accuracy: 0.9881 - val_loss: 0.0654 - val_accuracy: 0.9835\n","Epoch 8/10\n","1875/1875 [==============================] - 20s 11ms/step - loss: 0.0375 - accuracy: 0.9894 - val_loss: 0.0646 - val_accuracy: 0.9836\n","Epoch 9/10\n","1875/1875 [==============================] - 18s 10ms/step - loss: 0.0301 - accuracy: 0.9911 - val_loss: 0.0735 - val_accuracy: 0.9828\n","Epoch 10/10\n","1875/1875 [==============================] - 21s 11ms/step - loss: 0.0274 - accuracy: 0.9921 - val_loss: 0.0734 - val_accuracy: 0.9831\n"]}]},{"cell_type":"markdown","source":["## Understanding Tensor Mathematics"],"metadata":{"id":"GRufaV-2i89s"}},{"cell_type":"code","source":["print(\"Define constant tensors\")\n","a = tf.constant(2)\n","print(\"a = %i\" % a)\n","b = tf.constant(3)\n","print(\"b = %i\" % 
b)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"KjZuVDbkjACq","outputId":"f823db3c-3954-404f-cd32-3e1ae5de0f4c","executionInfo":{"status":"ok","timestamp":1695364571717,"user_tz":-330,"elapsed":20,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Define constant tensors\n","a = 2\n","b = 3\n"]}]},{"cell_type":"code","source":["print(\"Running operations, without tf.Session\")\n","c = a + b\n","print(\"a + b = %i\" % c)\n","d = a * b\n","print(\"a * b = %i\" % d)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Q7tZs9rCjCe1","outputId":"0acb2df9-626c-4433-8469-70ab0fe154df","executionInfo":{"status":"ok","timestamp":1695364571718,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Running operations, without tf.Session\n","a + b = 5\n","a * b = 6\n"]}]},{"cell_type":"code","source":["c = a + b\n","print(\"a + b = %s\" % c)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"zj2ZKuxcjJ4X","outputId":"4427d528-4bbb-4759-c1a7-b1da90aa7640","executionInfo":{"status":"ok","timestamp":1695364571718,"user_tz":-330,"elapsed":14,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["a + b = tf.Tensor(5, shape=(), dtype=int32)\n"]}]},{"cell_type":"code","source":["d = a*b\n","print(\"a * b = %s\" % d)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Hf6Trp_YjMEu","outputId":"fda35bdb-37ba-499e-ffee-54633a31d707","executionInfo":{"status":"ok","timestamp":1695364571718,"user_tz":-330,"elapsed":11,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["a * b = tf.Tensor(6, shape=(), dtype=int32)\n"]}]}]}
--------------------------------------------------------------------------------
/Chapter08/Siamese_working.ipynb:
--------------------------------------------------------------------------------
1 | {"cells":[{"cell_type":"markdown","metadata":{"id":"2rupy38kqrlo"},"source":["# Siamese Networks\n"," Keras to implement a simple example of Siamese networks, which will verify whether two MNIST images are from the same class or not"]},{"cell_type":"markdown","metadata":{"id":"WzbZT_SjqbdB"},"source":["Start with the import statements:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BtHauCbhQSIe"},"outputs":[],"source":["import random\n","import numpy as np\n","import tensorflow as tf"]},{"cell_type":"markdown","metadata":{"id":"9EBK8PigwEEq"},"source":["Next, we'll implement the prepareData function to create the train/test dataset (both for training and testing):"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"mY8AogCiQXeM"},"outputs":[],"source":["def prepareData(inputs: np.ndarray, labels: np.ndarray):\n"," classesNumbers = 10\n"," digitalIdx = [np.where(labels == i)[0] for i in range(classesNumbers)]\n"," pairs = list()\n"," labels = list()\n"," n = min([len(digitalIdx[d]) for d in range(classesNumbers)]) - 1\n"," for d in range(classesNumbers):\n"," for i in range(n):\n"," z1, z2 = digitalIdx[d][i], digitalIdx[d][i + 1]\n"," pairs += [[inputs[z1], inputs[z2]]]\n"," inc = random.randrange(1, classesNumbers)\n"," dn = (d + inc) % classesNumbers\n"," z1, z2 = digitalIdx[d][i], digitalIdx[dn][i]\n"," pairs += [[inputs[z1], inputs[z2]]]\n"," labels += [1, 0]\n"," return np.array(pairs), np.array(labels, dtype=np.float32)"]},{"cell_type":"markdown","metadata":{"id":"4PuTgE6DwRfR"},"source":["Next, let's implement the createTemplate function, which defines one branch of the Siamese network:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"wehEGEkWRp7P"},"outputs":[],"source":["def createTemplate():\n"," return tf.keras.models.Sequential([\n"," tf.keras.layers.Flatten(),\n"," tf.keras.layers.Dense(128, activation='relu'),\n"," tf.keras.layers.Dropout(0.15),\n"," tf.keras.layers.Dense(128, activation='relu'),\n"," tf.keras.layers.Dropout(0.15),\n"," tf.keras.layers.Dense(64, activation='relu'),\n"," ])"]},{"cell_type":"markdown","metadata":{"id":"UMw0fneFwcdl"},"source":["Next, let's build the whole training system, starting from the MNIST dataset:"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"n6IVR4pLSJ8Z","executionInfo":{"status":"ok","timestamp":1695364607640,"user_tz":-330,"elapsed":1464,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"1cfc7365-0f42-4ce4-dc0c-ffb1ffec15b8"},"outputs":[{"output_type":"stream","name":"stdout","text":["Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n","11490434/11490434 [==============================] - 1s 0us/step\n"]}],"source":["(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()\n","x_train = x_train.astype(np.float32)\n","x_test = x_test.astype(np.float32)\n","x_train /= 255\n","x_test /= 255\n","input_shape = x_train.shape[1:]"]},{"cell_type":"markdown","metadata":{"id":"6roaL5NwwkQO"},"source":["We'll use the raw dataset to create the actual train and test verification datasets:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3d3DxhI2Uk3b"},"outputs":[],"source":["train_pairs, tr_labels = prepareData(x_train, y_train)\n","test_pairs, test_labels = prepareData(x_test, y_test)"]},{"cell_type":"markdown","metadata":{"id":"1cUXgdhJwpoE"},"source":["Then, we'll build the base portion of the Siamese 
network:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"INMfl1Q7UpGg"},"outputs":[],"source":["base_network = createTemplate()"]},{"cell_type":"markdown","metadata":{"id":"w14amWufwvwu"},"source":["Next, let's create the two branches:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gLy_fpIFUsOg"},"outputs":[],"source":["# Create first half of the siamese system\n","input_a = tf.keras.layers.Input(shape=input_shape)\n","# Note how we reuse the base_network in both halfs\n","enconder1 = base_network(input_a)\n","# Create the second half of the siamese system\n","input_b = tf.keras.layers.Input(shape=input_shape)\n","enconder2 = base_network(input_b)"]},{"cell_type":"markdown","metadata":{"id":"EC6E8AUZw1fj"},"source":["Next, we'll create the measure of similarity, which uses the outputs of enconder1 and enconder2. It is implemented as a tf.keras.layers.Lambda layer:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pysaqMInU7CV"},"outputs":[],"source":["distance = tf.keras.layers.Lambda(\n"," lambda embeddings: tf.keras.backend.abs(embeddings[0] - embeddings[1])) \\\n"," ([enconder1, enconder2])"]},{"cell_type":"markdown","metadata":{"id":"kAryOk8Dw8My"},"source":["Then, we'll create the final fully connected layer, which takes the output of the distance and compresses it to a single sigmoid output:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"XGPNBafkVFR2"},"outputs":[],"source":["measureOfSimilarity = tf.keras.layers.Dense(1, activation='sigmoid') (distance)"]},{"cell_type":"markdown","metadata":{"id":"WENeMx81xCTH"},"source":["Finally, we can build the model and initiate the training for 10 epochs:"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"HdeDKt_bVLOf","outputId":"67ff7f3c-4ec1-4049-9a09-dcd08c22ab91","executionInfo":{"status":"ok","timestamp":1695364759741,"user_tz":-330,"elapsed":147827,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["Epoch 1/10\n","847/847 [==============================] - 17s 16ms/step - loss: 0.3535 - accuracy: 0.8475 - val_loss: 0.2727 - val_accuracy: 0.9097\n","Epoch 2/10\n","847/847 [==============================] - 9s 11ms/step - loss: 0.1845 - accuracy: 0.9314 - val_loss: 0.1970 - val_accuracy: 0.9374\n","Epoch 3/10\n","847/847 [==============================] - 9s 10ms/step - loss: 0.1251 - accuracy: 0.9552 - val_loss: 0.1395 - val_accuracy: 0.9570\n","Epoch 4/10\n","847/847 [==============================] - 8s 9ms/step - loss: 0.0951 - accuracy: 0.9661 - val_loss: 0.1184 - val_accuracy: 0.9644\n","Epoch 5/10\n","847/847 [==============================] - 9s 11ms/step - loss: 0.0781 - accuracy: 0.9725 - val_loss: 0.1033 - val_accuracy: 0.9698\n","Epoch 6/10\n","847/847 [==============================] - 9s 10ms/step - loss: 0.0656 - accuracy: 0.9772 - val_loss: 0.0954 - val_accuracy: 0.9718\n","Epoch 7/10\n","847/847 [==============================] - 8s 9ms/step - loss: 0.0563 - accuracy: 0.9802 - val_loss: 0.0884 - val_accuracy: 0.9730\n","Epoch 8/10\n","847/847 [==============================] - 9s 10ms/step - loss: 0.0505 - accuracy: 0.9824 - val_loss: 0.0862 - val_accuracy: 0.9741\n","Epoch 9/10\n","847/847 [==============================] - 9s 10ms/step - loss: 0.0443 - accuracy: 0.9846 - val_loss: 0.0911 - val_accuracy: 0.9723\n","Epoch 10/10\n","847/847 [==============================] - 8s 9ms/step - loss: 0.0416 - accuracy: 0.9855 
- val_loss: 0.0793 - val_accuracy: 0.9769\n"]},{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{},"execution_count":10}],"source":["# Build the model\n","model = tf.keras.models.Model([input_a, input_b], measureOfSimilarity)\n","# Train\n","model.compile(loss='binary_crossentropy',optimizer=tf.keras.optimizers.Adam(),metrics=['accuracy'])\n","\n","model.fit([train_pairs[:, 0], train_pairs[:, 1]], tr_labels,\n"," batch_size=128,epochs=10,validation_data=([test_pairs[:, 0], test_pairs[:, 1]], test_labels))"]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.1"}},"nbformat":4,"nbformat_minor":0}
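To make the pairing logic of prepareData concrete, here is an illustrative sketch (assumed, not in the notebook) that runs the same algorithm on a toy two-class array, labelling same-class pairs 1 and mixed pairs 0:

import random
import numpy as np

labels = np.array([0, 1, 0, 1, 0, 1])   # toy labels with two classes
inputs = np.arange(len(labels))         # stand-in "images": just their indices

classes = 2
idx = [np.where(labels == c)[0] for c in range(classes)]
pairs, pair_labels = [], []
n = min(len(i) for i in idx) - 1
for c in range(classes):
    for i in range(n):
        # positive pair: two consecutive samples of the same class
        pairs.append((inputs[idx[c][i]], inputs[idx[c][i + 1]]))
        # negative pair: one sample of class c, one of another class
        other = (c + random.randrange(1, classes)) % classes
        pairs.append((inputs[idx[c][i]], inputs[idx[other][i]]))
        pair_labels += [1, 0]

print(pairs)        # e.g. [(0, 2), (0, 1), (2, 4), (2, 3), ...]
print(pair_labels)  # [1, 0, 1, 0, ...]

Every positive pair is immediately followed by a negative pair, which is why the labels alternate and the two classes stay balanced during training.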
--------------------------------------------------------------------------------
/Chapter09/Natural_Language_Processing.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Chapter 09\n","## Natural Language Proessing\n","### Tokenization\n","This code snippet is tokenizing the given text using the Natural Language Toolkit (nltk) library in Python. The Natural Language Toolkit (nltk) is a widely-used library in Python, specifically designed for working with human language data.\n","- Let us start by importing relevant functions and using it."],"metadata":{"id":"y4S9fedO9RxZ"}},{"cell_type":"code","source":["import nltk\n","nltk.download('punkt')\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Vy6avzcoBQxG","outputId":"730625bb-435e-4ce3-afc8-682d0a351a43","executionInfo":{"status":"ok","timestamp":1695187104597,"user_tz":240,"elapsed":1736,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":1,"outputs":[{"output_type":"stream","name":"stderr","text":["[nltk_data] Downloading package punkt to /root/nltk_data...\n","[nltk_data] Package punkt is already up-to-date!\n"]},{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{},"execution_count":1}]},{"cell_type":"code","execution_count":2,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"oNmtO3Di70TV","outputId":"74d1f540-759e-48c7-cf56-c9bc39d3b764","executionInfo":{"status":"ok","timestamp":1695187104597,"user_tz":240,"elapsed":5,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["['This', 'is', 'a', 'book', 'about', 'algorithms', '.']\n"]}],"source":["from nltk.tokenize import word_tokenize\n","corpus = 'This is a book about algorithms.'\n","\n","tokens = word_tokenize(corpus)\n","print(tokens)"]},{"cell_type":"markdown","source":["To tokenize text based on sentences, you can use the sent_tokenize function from the nltk.tokenize module."],"metadata":{"id":"6QkSCiSu8QoT"}},{"cell_type":"code","source":["from nltk.tokenize import sent_tokenize\n","corpus = 'This is a book about algorithms. It covers various topics in depth.'"],"metadata":{"id":"lfgB79N09WV5","executionInfo":{"status":"ok","timestamp":1695187104597,"user_tz":240,"elapsed":3,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":3,"outputs":[]},{"cell_type":"markdown","source":["\n","In this example, the corpus variable contains two sentences. The sent_tokenize function takes the corpus as input and returns a list of sentences. When you run the modified code, you will get the following output:"],"metadata":{"id":"ChBYmVoG8ZX3"}},{"cell_type":"code","source":["sentences = sent_tokenize(corpus)\n","print(sentences)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"-S9tCXfM9XIP","outputId":"640b51d8-0688-4d8f-dbff-f337093ab1df","executionInfo":{"status":"ok","timestamp":1695187104895,"user_tz":240,"elapsed":301,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["['This is a book about algorithms.', 'It covers various topics in depth.']\n"]}]},{"cell_type":"markdown","source":["Sometimes we may need to break down large texts into paragraph-level chunks, NLTK can help with that task. 
It's a feature that could be particularly useful in applications such as document summarization, where understanding the structure at the paragraph level may be crucial. Tokenizing text into paragraphs might seem straightforward, but it can be complex depending on the structure and format of the text. A simple approach is to split the text by two newline characters, which often separate paragraphs in plain text documents."],"metadata":{"id":"GIiv2iLt8oVh"}},{"cell_type":"code","source":["def tokenize_paragraphs(text):\n"," # Split by two newline characters\n"," paragraphs = text.split('\\n\\n')\n"," return [p.strip() for p in paragraphs if p]\n"],"metadata":{"id":"DlSPxNCVADW3","executionInfo":{"status":"ok","timestamp":1695187104895,"user_tz":240,"elapsed":7,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":5,"outputs":[]},{"cell_type":"markdown","source":["## Cleaning data using Python\n","Let us study some techniques used to clean data and prepare it for machine learning tasks:"],"metadata":{"id":"NMj8JHokAH06"}},{"cell_type":"code","source":["import string\n","import re\n","import nltk\n","from nltk.corpus import stopwords\n","from nltk.stem import PorterStemmer\n","\n","# Make sure to download the NLTK resources\n","nltk.download('punkt')\n","nltk.download('stopwords')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"-b6Ovj1SAIfy","outputId":"b953a167-476d-47e2-983d-f9d27efffb20","executionInfo":{"status":"ok","timestamp":1695187104895,"user_tz":240,"elapsed":7,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":6,"outputs":[{"output_type":"stream","name":"stderr","text":["[nltk_data] Downloading package punkt to /root/nltk_data...\n","[nltk_data] Package punkt is already up-to-date!\n","[nltk_data] Downloading package stopwords to /root/nltk_data...\n","[nltk_data] Package stopwords is already up-to-date!\n"]},{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{},"execution_count":6}]},{"cell_type":"markdown","source":["Let us look into how we can clean text using Python."],"metadata":{"id":"Vi4c-jSh9Kgr"}},{"cell_type":"code","source":["def clean_text(text):\n"," \"\"\"\n"," Cleans input text by converting case, removing punctuation, numbers, white spaces, stop words and stemming\n"," \"\"\"\n"," # Convert to lowercase\n"," text = text.lower()\n","\n"," # Remove punctuation\n"," text = text.translate(str.maketrans('', '', string.punctuation))\n","\n"," # Remove numbers\n"," text = re.sub(r'\\d+', '', text)\n","\n"," # Remove white spaces\n"," text = text.strip()\n","\n"," # Remove stop words\n"," stop_words = set(stopwords.words('english'))\n"," tokens = nltk.word_tokenize(text)\n"," filtered_text = [word for word in tokens if word not in stop_words]\n"," text = ' '.join(filtered_text)\n","\n"," # Stemming\n"," ps = PorterStemmer()\n"," tokens = nltk.word_tokenize(text)\n"," stemmed_text = [ps.stem(word) for word in tokens]\n"," text = ' '.join(stemmed_text)\n","\n"," return text\n"],"metadata":{"id":"a0IMfUMfAKe6","executionInfo":{"status":"ok","timestamp":1695187104896,"user_tz":240,"elapsed":7,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":["Let us test this function clean_text()"],"metadata":{"id":"OUrtAexf9Vw7"}},{"cell_type":"code","source":["corpus=\"7- Today, Ottawa is becoming cold again 
\"\n","clean_text(corpus)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":35},"id":"YIxvQZ12ANb5","outputId":"1dd72477-2519-4257-b8b0-a71f43ab9852","executionInfo":{"status":"ok","timestamp":1695187104896,"user_tz":240,"elapsed":7,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":8,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'today ottawa becom cold'"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"}},"metadata":{},"execution_count":8}]},{"cell_type":"markdown","source":["### Understanding the term \"Document Matrix\"\n","This matrix structure allows efficient storage, organization, and analysis of large text datasets In Python, the CountVectorizer module from the sklearn library can be used to create TDM as follows:"],"metadata":{"id":"_v4pQjjNAR7e"}},{"cell_type":"code","source":["from sklearn.feature_extraction.text import CountVectorizer\n","\n","# Define a list of documents\n","documents = [\"Machine Learning is useful\", \"Machine Learning is fun\", \"Machine Learning is AI\"]\n","\n","# Create an instance of CountVectorizer\n","vectorizer = CountVectorizer()\n","\n","# Fit and transform the documents into a TDM\n","tdm = vectorizer.fit_transform(documents)\n","\n","# Print the TDM\n","print(tdm.toarray())"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"g-_zkqHzATOr","outputId":"31726e46-dd1e-424b-a1e4-e58d9df71c74","executionInfo":{"status":"ok","timestamp":1695187104896,"user_tz":240,"elapsed":6,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":9,"outputs":[{"output_type":"stream","name":"stdout","text":["[[0 0 1 1 1 1]\n"," [0 1 1 1 1 0]\n"," [1 0 1 1 1 0]]\n"]}]},{"cell_type":"code","source":["from sklearn.feature_extraction.text import TfidfVectorizer\n","\n","# Define a list of documents\n","documents = [\"Machine Learning enables learning\", \"Machine Learning is fun\", \"Machine Learning is useful\"]\n","\n","# Create an instance of TfidfVectorizer\n","vectorizer = TfidfVectorizer()\n","\n","# Fit and transform the documents into a TF-IDF matrix\n","tfidf_matrix = vectorizer.fit_transform(documents)\n","\n","# Get the feature names\n","feature_names = vectorizer.get_feature_names_out()\n","\n","# Loop over the feature names and print the TF-IDF score for each term\n","for i, term in enumerate(feature_names):\n"," tfidf = tfidf_matrix[:, i].toarray().flatten()\n"," print(f\"{term}: {tfidf}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"suAKN24RAYIP","outputId":"22bb6bcc-f8c1-4e91-c5fd-80f0798c0165","executionInfo":{"status":"ok","timestamp":1695187105147,"user_tz":240,"elapsed":2,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":10,"outputs":[{"output_type":"stream","name":"stdout","text":["enables: [0.60366655 0. 0. ]\n","fun: [0. 0.66283998 0. ]\n","is: [0. 0.50410689 0.50410689]\n","learning: [0.71307037 0.39148397 0.39148397]\n","machine: [0.35653519 0.39148397 0.39148397]\n","useful: [0. 0. 0.66283998]\n"]}]},{"cell_type":"markdown","source":["### Implementing word embedding with Word2Vec\n","Word2Vec is a prominent method used for obtaining vector representations of words, commonly referred to as word embeddings. 
Rather than \"generating words,\" this algorithm creates numerical vectors that represent the semantic meaning of each word in the language."],"metadata":{"id":"QVKp4m-PAc79"}},{"cell_type":"code","source":["import gensim\n","\n","# Define a text corpus\n","corpus = [['apple', 'banana', 'orange', 'pear'],\n"," ['car', 'bus', 'train', 'plane'],\n"," ['dog', 'cat', 'fox', 'fish']]\n","\n","# Train a word2vec model on the corpus\n","model = gensim.models.Word2Vec(corpus, window=5, min_count=1, workers=4)"],"metadata":{"id":"GUwtJbxuAda-","executionInfo":{"status":"ok","timestamp":1695187105652,"user_tz":240,"elapsed":506,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":11,"outputs":[]},{"cell_type":"code","source":["print(model.wv.similarity('car', 'train'))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"kR6j8XW_AhRW","outputId":"0d24e947-84c2-49fa-82bb-2e364a9bfe99","executionInfo":{"status":"ok","timestamp":1695187105653,"user_tz":240,"elapsed":4,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":12,"outputs":[{"output_type":"stream","name":"stdout","text":["-0.057745814\n"]}]},{"cell_type":"code","source":["print(model.wv.similarity('car', 'apple'))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"4ig_Ys0UAi91","outputId":"f3d36147-8c72-4ea1-cbfd-99eb26d05126","executionInfo":{"status":"ok","timestamp":1695187105876,"user_tz":240,"elapsed":225,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":13,"outputs":[{"output_type":"stream","name":"stdout","text":["0.11117952\n"]}]},{"cell_type":"markdown","source":["## Case study: Restaurant review sentiment analysis\n","We will use the Yelp Reviews dataset which contains labelled reviews as positive(5 stars) or negative(1start). 
We will train a model that can classify the reviews of a restaurant as negative or positive"],"metadata":{"id":"RVF8vOrQAmYV"}},{"cell_type":"code","source":["import nltk\n","nltk.download('stopwords')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"fW_jI1crBIyG","executionInfo":{"status":"ok","timestamp":1695187105876,"user_tz":240,"elapsed":3,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}},"outputId":"91ee6b6a-9dc4-4baf-f848-1206e017eba3"},"execution_count":14,"outputs":[{"output_type":"stream","name":"stderr","text":["[nltk_data] Downloading package stopwords to /root/nltk_data...\n","[nltk_data] Package stopwords is already up-to-date!\n"]},{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{},"execution_count":14}]},{"cell_type":"code","source":["import numpy as np\n","import pandas as pd\n","import re\n","from nltk.stem import PorterStemmer\n","from nltk.corpus import stopwords"],"metadata":{"id":"UQXrVibsAnLl","executionInfo":{"status":"ok","timestamp":1695187107092,"user_tz":240,"elapsed":1218,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":15,"outputs":[]},{"cell_type":"code","source":["url = 'https://storage.googleapis.com/neurals/data/2023/Restaurant_Reviews.tsv'\n","dataset = pd.read_csv(url, delimiter='\\t', quoting=3)\n","dataset.head()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"d6j_jh-nAwJO","outputId":"aa8cfb4f-5397-424a-fd22-7bcad574673e","executionInfo":{"status":"ok","timestamp":1695187107332,"user_tz":240,"elapsed":244,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":16,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Review Liked\n","0 Wow... Loved this place. 1\n","1 Crust is not good. 0\n","2 Not tasty and the texture was just nasty. 0\n","3 Stopped by during the late May bank holiday of... 1\n","4 The selection on the menu was great and so wer... 1"],"text/html":["\n"," \n","
\n","\n","
\n"," \n"," \n"," | \n"," Review | \n"," Liked | \n","
\n"," \n"," \n"," \n"," 0 | \n"," Wow... Loved this place. | \n"," 1 | \n","
\n"," \n"," 1 | \n"," Crust is not good. | \n"," 0 | \n","
\n"," \n"," 2 | \n"," Not tasty and the texture was just nasty. | \n"," 0 | \n","
\n"," \n"," 3 | \n"," Stopped by during the late May bank holiday of... | \n"," 1 | \n","
\n"," \n"," 4 | \n"," The selection on the menu was great and so wer... | \n"," 1 | \n","
\n"," \n","
\n","
\n","
\n","
\n"]},"metadata":{},"execution_count":16}]},{"cell_type":"code","source":["def clean_text(text):\n"," text = re.sub('[^a-zA-Z]', ' ', text)\n"," text = text.lower()\n"," text = text.split()\n"," ps = PorterStemmer()\n"," text = [\n"," ps.stem(word) for word in text\n"," if not word in set(stopwords.words('english'))]\n"," text = ' '.join(text)\n"," return text\n","\n","corpus = [clean_text(review) for review in dataset['Review']]"],"metadata":{"id":"lNsNj5SpAyY8","executionInfo":{"status":"ok","timestamp":1695187112577,"user_tz":240,"elapsed":5247,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":17,"outputs":[]},{"cell_type":"code","source":["from sklearn.feature_extraction.text import CountVectorizer\n","from sklearn.model_selection import train_test_split\n","from sklearn.naive_bayes import GaussianNB\n","from sklearn.metrics import confusion_matrix\n","\n","# Initialize the CountVectorizer and transform the corpus\n","vectorizer = CountVectorizer(max_features=1500)\n","X = vectorizer.fit_transform(corpus).toarray()\n","\n","# Get the target labels\n","y = dataset.iloc[:, 1].values\n","\n","# Split the data into training and test sets\n","X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)\n","\n","# Initialize and train the Gaussian Naive Bayes classifier\n","classifier = GaussianNB()\n","classifier.fit(X_train, y_train)\n","\n","# Make predictions on the test set\n","y_pred = classifier.predict(X_test)\n","\n","# Compute the confusion matrix\n","cm = confusion_matrix(y_test, y_pred)\n","print(cm)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"yo-qXa5D_cLU","outputId":"5a155bfa-5214-4c01-ecb8-9d2e409faac0","executionInfo":{"status":"ok","timestamp":1695187112577,"user_tz":240,"elapsed":3,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":18,"outputs":[{"output_type":"stream","name":"stdout","text":["[[55 42]\n"," [12 91]]\n"]}]},{"cell_type":"code","source":[],"metadata":{"id":"xG8OdNk3-Mu4","executionInfo":{"status":"ok","timestamp":1695187112578,"user_tz":240,"elapsed":2,"user":{"displayName":"Imran Ahmad","userId":"08683678182734146579"}}},"execution_count":18,"outputs":[]}]}
--------------------------------------------------------------------------------
/Chapter11/Advanced_Sequential_Algorithms.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"private_outputs":true,"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["## Chapter 11\n","#### Advanced Sequential Modeling Algorithms"],"metadata":{"id":"tB4M5w2NtkUe"}},{"cell_type":"markdown","source":["# Part1- Coding Autoencoders\n","we'll employ an autoencoder to reproduce these handwritten digits. The unique feature of autoencoders is their training mechanism: the input and the target output are the same image. Let's break this down.\n","First, there is the training phase, where the following steps occur:\n","1.\tThe MNIST images are provided to the autoencoder.\n","2.\tThe encoder segment compresses these images into a condensed latent representation.\n","3.\tThe decoder segment then tries to restore the original image from this representation. By iterating over this process, the autoencoder acquires the nuances of compressing and reconstructing, capturing the core patterns of the handwritten digits.\n","\n","Second, there is the reconstruction phase:\n","1.\tWith the model trained, when we feed it new images of handwritten digits, the autoencoder will first encode them into its internal representation.\n","2.\tThen, decoding this representation will yield a reconstructed image, which, if the training was successful, should closely match the original.\n","With the autoencoder effectively trained on the MNIST dataset, it becomes a powerful tool to process and reconstruct handwritten digit images.\n"],"metadata":{"id":"ZDIgwQN4vusg"}},{"cell_type":"markdown","source":["1. Import Necessary Libraries\n","Firstly, we need to ensure all the required libraries are imported."],"metadata":{"id":"eIUi8cybsRDq"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"qmz50ztkr5b7"},"outputs":[],"source":["import tensorflow as tf\n","import numpy as np\n","import matplotlib.pyplot as plt\n"]},{"cell_type":"markdown","source":["2. Load the MNIST Data\n","We will load the MNIST dataset directly from TensorFlow's datasets module."],"metadata":{"id":"FYpRZ65HscSP"}},{"cell_type":"code","source":["# Load dataset\n","(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()\n","\n","# Normalize data to range [0, 1]\n","x_train, x_test = x_train / 255.0, x_test / 255.0\n"],"metadata":{"id":"Tb7WB-Pxscym"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["3. Define the Model\n","This section remains mostly unchanged."],"metadata":{"id":"lpBE_11ksplc"}},{"cell_type":"code","source":["# Define the autoencoder model\n","model = tf.keras.Sequential([\n"," tf.keras.layers.Flatten(input_shape=(28, 28)),\n"," tf.keras.layers.Dense(32, activation='relu'),\n"," tf.keras.layers.Dense(784, activation='sigmoid'),\n"," tf.keras.layers.Reshape((28, 28))\n","])\n"],"metadata":{"id":"HSMKabJmsqEr"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["4. Compile the Model\n","The model compilation stage."],"metadata":{"id":"LKKF_iJnsyu6"}},{"cell_type":"code","source":["model.compile(loss='binary_crossentropy', optimizer='adam')\n"],"metadata":{"id":"tPjf0Li9szkA"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["5. 
Train the Model\n","Training the autoencoder on the MNIST dataset."],"metadata":{"id":"BvXFAoWOs8KX"}},{"cell_type":"code","source":["model.fit(x_train, x_train, epochs=10, batch_size=128, validation_data=(x_test, x_test))\n"],"metadata":{"id":"Q5AahDurs8jg"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["6. Prediction\n","Obtain the reconstructed images."],"metadata":{"id":"zaE07GqBtHuw"}},{"cell_type":"code","source":["# The full autoencoder maps each input image straight to its reconstruction.\n","# Extracting the latent codes would require a separate encoder sub-model.\n","decoded_data = model.predict(x_test)\n"],"metadata":{"id":"zwiUMRARtIQK"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["7. Visualization\n","Visualize the original and reconstructed images."],"metadata":{"id":"sz5v0TyrtYIJ"}},{"cell_type":"code","source":["# Display original and reconstructed images\n","n = 10\n","plt.figure(figsize=(20, 4))\n","for i in range(n):\n"," # Original images\n"," ax = plt.subplot(2, n, i + 1)\n"," plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n"," ax.get_xaxis().set_visible(False)\n"," ax.get_yaxis().set_visible(False)\n","\n"," # Reconstructed images\n"," ax = plt.subplot(2, n, i + 1 + n)\n"," plt.imshow(decoded_data[i].reshape(28, 28), cmap='gray')\n"," ax.get_xaxis().set_visible(False)\n"," ax.get_yaxis().set_visible(False)\n","\n","plt.show()\n"],"metadata":{"id":"vXgTE3UutYju"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["# Part 2 - Self Attention\n","Here's a simplified version of how the self-attention mechanism can be implemented:"],"metadata":{"id":"oIjzgv0curhp"}},{"cell_type":"markdown","source":["Importing necessary libraries"],"metadata":{"id":"A4hr7JBTxj9k"}},{"cell_type":"code","source":["import numpy as np"],"metadata":{"id":"ti2qHavTuqom"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Defining the self-attention function"],"metadata":{"id":"Gk9YgoMIwafk"}},{"cell_type":"code","source":["def self_attention(Q, K, V):\n"," \"\"\"\n"," Q: Query matrix\n"," K: Key matrix\n"," V: Value matrix\n"," \"\"\"\n","\n"," # Calculate the attention weights\n"," attention_weights = np.matmul(Q, K.T)\n","\n"," # Apply the softmax to get probabilities\n"," attention_probs = np.exp(attention_weights) / np.sum(np.exp(attention_weights), axis=1, keepdims=True)\n","\n"," # Multiply the probabilities with the value matrix to get the output\n"," output = np.matmul(attention_probs, V)\n","\n"," return output\n"],"metadata":{"id":"nIgKGE-qwa7M"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Example Usage:\n","Initialize matrices"],"metadata":{"id":"SaAHiLslw0DQ"}},{"cell_type":"code","source":["Q = np.array([[1, 0, 1], [0, 2, 0], [1, 1, 0]]) # Example Query\n","K = np.array([[1, 0, 1], [0, 2, 0], [1, 1, 0]]) # Key matrix\n","V = np.array([[0, 2, 0], [1, 0, 1], [0, 1, 2]]) # Value matrix\n"],"metadata":{"id":"QEDephIDw0V1"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Compute the output using the self_attention function"],"metadata":{"id":"mkDQ4UpQw6I5"}},{"cell_type":"code","source":["output = self_attention(Q, K, V)\n"],"metadata":{"id":"WC6Ir-mcw6im"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Display the 
result"],"metadata":{"id":"cgsEYNa2w8jP"}},{"cell_type":"code","source":["print(output)"],"metadata":{"id":"ygaNvOYAw89w"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":[],"metadata":{"id":"3kH1ha9vxpT4"},"execution_count":null,"outputs":[]}]}
--------------------------------------------------------------------------------
/Chapter13/Algorithmic_Strategies_for_Data_Handling.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Chapter 13\n","## Algorithmic Strategies for Data Handling"],"metadata":{"id":"d2dRJcAl6c14"}},{"cell_type":"markdown","source":["### Implementing Huffman coding in Python\n","We start by creating a node for each character, where the node contains the character and its frequency. These nodes are then added to a priority queue, with the least frequent elements having the highest priority.\n","- For this, we create a Node class to represent each character in the Huffman tree. Each Node object contains the character, its frequency, and pointers to its left and right children.\n","\n","- The __lt__ method is defined to compare two Node objects based on their frequencies.\n"],"metadata":{"id":"n4_X3Gfv0Gk1"}},{"cell_type":"code","source":["import heapq\n","import functools\n","\n","@functools.total_ordering\n","class Node:\n"," def __init__(self, char, freq):\n"," self.char = char\n"," self.freq = freq\n"," self.left = None\n"," self.right = None\n","\n"," def __lt__(self, other):\n"," return self.freq < other.freq\n","\n"," def __eq__(self, other):\n"," return self.freq == other.freq"],"metadata":{"id":"vkieEJ9G57Gx","executionInfo":{"status":"ok","timestamp":1695362486783,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":1,"outputs":[]},{"cell_type":"markdown","source":["Next, we build the Huffman tree. The construction of a Huffman tree involves a series of insertions and deletions in a priority queue, typically implemented as a binary heap.\n","- To build the Huffman tree, we create a min-heap of Node objects. A min-heap is a specialized tree-based structure that satisfies a simple but important condition: the parent node has a value less than or equal to its children. This property ensures that the smallest element is always at the root, making it efficient for priority operations.\n","- We repeatedly pop the two nodes with the lowest frequencies, merge them, and push the merged node back into the heap.\n","- This process continues until there is only one node left, which becomes the root of the Huffman tree. 
The tree can be built by the build_tree function, which is defined as follows:"],"metadata":{"id":"BABMPuyf6RpN"}},{"cell_type":"code","source":["def build_tree(frequencies):\n"," heap = [Node(char, freq) for char, freq in frequencies.items()]\n"," heapq.heapify(heap)\n"," while len(heap) > 1:\n"," node1 = heapq.heappop(heap)\n"," node2 = heapq.heappop(heap)\n"," merged = Node(None, node1.freq + node2.freq)\n"," merged.left = node1\n"," merged.right = node2\n"," heapq.heappush(heap, merged)\n"," return heap[0] # the root node"],"metadata":{"id":"JMz6qTMn6IDa","executionInfo":{"status":"ok","timestamp":1695362486784,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":["#### Example usage:"],"metadata":{"id":"9yAFkKnJ6JU9"}},{"cell_type":"code","source":["frequencies = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}\n","root = build_tree(frequencies)\n","print(root.freq)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"1y4vMNgC0y-Q","executionInfo":{"status":"ok","timestamp":1695362486784,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"f40e0dd4-44f5-4463-db9d-56d28c7a783f"},"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["100\n"]}]},{"cell_type":"markdown","source":["### Generate the Huffman codes by traversing the tree"],"metadata":{"id":"L5BgyLsN7T3O"}},{"cell_type":"code","source":["def generate_codes(node, code='', codes=None):\n"," if codes is None:\n"," codes = {}\n"," if node is None:\n"," return codes\n"," if node.char is not None:\n"," codes[node.char] = code\n"," return codes\n"," generate_codes(node.left, code + '0', codes)\n"," generate_codes(node.right, code + '1', codes)\n"," return codes"],"metadata":{"id":"c0u_avxe7qW2","executionInfo":{"status":"ok","timestamp":1695362486785,"user_tz":-330,"elapsed":13,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":6,"outputs":[]},{"cell_type":"markdown","source":["Sample data for Huffman's encoding"],"metadata":{"id":"clXVgQ3L7x1R"}},{"cell_type":"code","source":["data = {\n"," 'L': 0.45,\n"," 'M': 0.13,\n"," 'N': 0.12,\n"," 'X': 0.16,\n"," 
'Y': 0.09,\n"," 'Z': 0.05\n","}"],"metadata":{"id":"LsY-o8oA7vZn","executionInfo":{"status":"ok","timestamp":1695362486786,"user_tz":-330,"elapsed":14,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":["Build the Huffman tree and generate the Huffman codes"],"metadata":{"id":"qG5M_sGQ78Ua"}},{"cell_type":"code","source":["root = build_tree(data)\n","codes = generate_codes(root)"],"metadata":{"id":"YozqWMTr76S8","executionInfo":{"status":"ok","timestamp":1695362486786,"user_tz":-330,"elapsed":13,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":["Print the Huffman code"],"metadata":{"id":"c6bJMSIw8jBM"}},{"cell_type":"code","source":["# Print the root of the Huffman tree\n","print(f'Root of the Huffman tree: {root}')\n","# Print out the Huffman codes\n","for char, code in codes.items():\n"," print(f'{char}: {code}')\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"aOdxyU72ymlm","executionInfo":{"status":"ok","timestamp":1695362486786,"user_tz":-330,"elapsed":13,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}},"outputId":"6a1b729e-2030-434c-c184-7ad2294a1eb0"},"execution_count":9,"outputs":[{"output_type":"stream","name":"stdout","text":["Root of the Huffman tree: <__main__.Node object at 0x7b28f2a01570>\n","L: 0\n","N: 100\n","M: 101\n","Z: 1100\n","Y: 1101\n","X: 111\n"]}]}]}
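As an illustrative use of the output above (the codes are copied from the printed result; the message itself is made up), the code table can encode a string and give the expected code length per symbol:

codes = {'L': '0', 'M': '101', 'N': '100', 'X': '111', 'Y': '1101', 'Z': '1100'}
probs = {'L': 0.45, 'M': 0.13, 'N': 0.12, 'X': 0.16, 'Y': 0.09, 'Z': 0.05}

message = 'LXMZ'
encoded = ''.join(codes[ch] for ch in message)
print(encoded)  # 01111011100

# Expected bits per symbol: sum of p(c) * len(code(c)) -> about 2.24 here,
# versus 3 bits for a fixed-length code over 6 symbols.
expected_bits = sum(probs[c] * len(codes[c]) for c in codes)
print(round(expected_bits, 2))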
--------------------------------------------------------------------------------
/Chapter14/Cryptography.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Chapter 14\n","## Cryptography"],"metadata":{"id":"tvg4chEUID9M"}},{"cell_type":"markdown","source":["Caesar cipher"],"metadata":{"id":"uqyFeEzP3CAE"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"RbfMB_B63Bcu"},"outputs":[],"source":["rotation = 3\n","P = 'CALM'; C=''\n","for letter in P:\n"," C = C+ (chr(ord(letter) + rotation))\n"]},{"cell_type":"code","source":["print(C)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"r5AxzIMQ3Ifs","outputId":"1ece83cf-b071-4bc8-f2b0-5d7662e751cb","executionInfo":{"status":"ok","timestamp":1695362572803,"user_tz":-330,"elapsed":23,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["FDOP\n"]}]},{"cell_type":"markdown","source":["ROT 13"],"metadata":{"id":"3GbDTsPP3eQd"}},{"cell_type":"code","source":["rotation = 13\n","P = 'CALM'; C=''\n","for letter in P:\n"," C = C+ (chr(ord(letter) + rotation))"],"metadata":{"id":"r3xElhkM3ffy"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["print(C)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"iDrjCAGV3gkT","outputId":"e63f3a34-6f38-41d7-b108-d3cf07a89f29","executionInfo":{"status":"ok","timestamp":1695362572808,"user_tz":-330,"elapsed":26,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["PNYZ\n"]}]},{"cell_type":"markdown","source":["Understanding MD5-tolerated"],"metadata":{"id":"aiXyOW6D4Qx4"}},{"cell_type":"code","source":["import hashlib\n","\n","def generate_md5_hash(input_string):\n"," # Create a new md5 hash object\n"," md5_hash = hashlib.md5()\n","\n"," # Encode the input string to bytes and hash it\n"," md5_hash.update(input_string.encode())\n","\n"," # Return the hexadecimal representation of the hash\n"," return md5_hash.hexdigest()\n","\n","def verify_md5_hash(input_string, correct_hash):\n"," # Generate md5 hash for the input_string\n"," computed_hash = generate_md5_hash(input_string)\n","\n"," # Compare the computed hash with the provided hash\n"," return computed_hash == correct_hash\n","\n","# Test\n","input_string = \"Hello, World!\"\n","hash_value = generate_md5_hash(input_string)\n","print(f\"Generated hash: {hash_value}\")\n","\n","correct_hash = hash_value\n","print(verify_md5_hash(input_string, correct_hash)) # This should return True\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"i5kxSqneHHAT","outputId":"40bc0017-9ab7-4c70-c7a4-9f4bbd860d19","executionInfo":{"status":"ok","timestamp":1695362572808,"user_tz":-330,"elapsed":25,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Generated hash: 65a8e27d8879283831b664bd8b7f0ad4\n","True\n"]}]},{"cell_type":"markdown","source":["Understanding Secure Hashing Algorithm"],"metadata":{"id":"ZMKDxCvG4pmN"}},{"cell_type":"code","source":["import hashlib"],"metadata":{"id":"dsYQx3GA4qJU"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["salt = \"qIo0foX5\"\n","password = \"myPassword\""],"metadata":{"id":"YX1rHHQ24sc1"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["salted_password = salt + 
password"],"metadata":{"id":"crIdWU584usu"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["sha512_hash = hashlib.sha512()\n","sha512_hash.update(salted_password.encode())\n","myHash = sha512_hash.hexdigest()"],"metadata":{"id":"4LF_Oa0k4wbe"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["myHash"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":36},"id":"sDTu_slW40Sk","outputId":"3277f0eb-47fa-4300-994b-246d1f9d9650","executionInfo":{"status":"ok","timestamp":1695362572810,"user_tz":-330,"elapsed":18,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["'2e367911b87b12f73b135b1a4af9fac193a8064d3c0a52e34b3a52a5422beed2b6276eabf95abe728f91ba61ef93175e5bac9a643b54967363ffab0b35133563'"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"}},"metadata":{},"execution_count":10}]},{"cell_type":"markdown","source":["Coding symmetric encryption"],"metadata":{"id":"2rFU072b5A8w"}},{"cell_type":"code","source":["import hashlib"],"metadata":{"id":"2o9COqXy5BuU"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["sha256_hash = hashlib.sha256()"],"metadata":{"id":"55ZYMvI95DOF"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["message = \"Ottawa is really cold\".encode()\n","sha256_hash.update(message)"],"metadata":{"id":"KqaipWu85EUo"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["print(sha256_hash.hexdigest())"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"wl-fLpaO5IVN","outputId":"90203090-124f-4ff7-aad0-817ae4d0f19e","executionInfo":{"status":"ok","timestamp":1695362573559,"user_tz":-330,"elapsed":12,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["b6ee63a201c4505f1f50ff92b7fe9d9e881b57292c00a3244008b76d0e026161\n"]}]},{"cell_type":"markdown","source":["How to prevent MITM attacks"],"metadata":{"id":"5a5H4r2f5QXc"}},{"cell_type":"code","source":["from xmlrpc.client import SafeTransport, ServerProxy\n","import ssl"],"metadata":{"id":"MpjOG2lG5SwI"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Data and Model encryption"],"metadata":{"id":"6akH-kvA5ZCZ"}},{"cell_type":"code","source":["# 1. Install and import required libraries\n","!pip install cryptography\n","\n","import pickle\n","from joblib import dump, load\n","from sklearn.linear_model import LogisticRegression\n","from sklearn.model_selection import train_test_split\n","from sklearn.datasets import load_iris\n","from cryptography.fernet import Fernet\n","\n","# 2. Train a simple model using the Iris dataset\n","iris = load_iris()\n","X = iris.data\n","y = iris.target\n","X_train, X_test, y_train, y_test = train_test_split(X, y)\n","model = LogisticRegression(max_iter=1000) # Increase max_iter for convergence\n","model.fit(X_train, y_train)\n","\n","# 3. Define the names of the files that will store the model\n","filename_source = \"unencrypted_model.pkl\"\n","filename_destination = \"decrypted_model.pkl\"\n","filename_sec = \"encrypted_model.pkl\"\n","\n","# 4. Store the trained model in a file\n","dump(model, filename_source)\n","\n","# 5. 
{"cell_type":"markdown","source":["How to prevent MITM attacks"],"metadata":{"id":"5a5H4r2f5QXc"}},{"cell_type":"code","source":["# SafeTransport can be subclassed to verify the server's TLS certificate\n","# before any XML-RPC call is made, so a ServerProxy only talks to an\n","# endpoint whose identity has been checked.\n","from xmlrpc.client import SafeTransport, ServerProxy\n","import ssl"],"metadata":{"id":"MpjOG2lG5SwI"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["## Data and Model encryption"],"metadata":{"id":"6akH-kvA5ZCZ"}},{"cell_type":"code","source":["# 1. Install and import required libraries\n","!pip install cryptography\n","\n","import pickle\n","from joblib import dump, load\n","from sklearn.linear_model import LogisticRegression\n","from sklearn.model_selection import train_test_split\n","from sklearn.datasets import load_iris\n","from cryptography.fernet import Fernet\n","\n","# 2. Train a simple model using the Iris dataset\n","iris = load_iris()\n","X = iris.data\n","y = iris.target\n","X_train, X_test, y_train, y_test = train_test_split(X, y)\n","model = LogisticRegression(max_iter=1000) # Increase max_iter for convergence\n","model.fit(X_train, y_train)\n","\n","# 3. Define the names of the files that will store the model\n","filename_source = \"unencrypted_model.pkl\"\n","filename_destination = \"decrypted_model.pkl\"\n","filename_sec = \"encrypted_model.pkl\"\n","\n","# 4. Store the trained model in a file\n","dump(model, filename_source)\n","\n","# 5. Define functions for encryption and decryption\n","# (encrypt writes to filename_sec; decrypt writes to filename_destination)\n","def write_key():\n"," key = Fernet.generate_key()\n"," with open(\"key.key\", \"wb\") as key_file:\n"," key_file.write(key)\n","\n","def load_key():\n"," return open(\"key.key\", \"rb\").read()\n","\n","def encrypt(filename, key):\n"," f = Fernet(key)\n"," with open(filename, \"rb\") as file:\n"," file_data = file.read()\n"," encrypted_data = f.encrypt(file_data)\n"," with open(filename_sec, \"wb\") as file:\n"," file.write(encrypted_data)\n","\n","def decrypt(filename, key):\n"," f = Fernet(key)\n"," with open(filename, \"rb\") as file:\n"," encrypted_data = file.read()\n"," decrypted_data = f.decrypt(encrypted_data)\n"," with open(filename_destination, \"wb\") as file:\n"," file.write(decrypted_data)\n","\n","# 6. Use the functions to encrypt the model, then decrypt it\n","write_key()\n","key = load_key()\n","encrypt(filename_source, key)\n","decrypt(filename_sec, key)\n","\n","# 7. Load the decrypted model and make predictions\n","loaded_model = load(filename_destination)\n","result = loaded_model.score(X_test, y_test)\n","print(result)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"cUg8638WE2Pb","outputId":"ad27506c-6daa-4a35-a090-eace113535e5","executionInfo":{"status":"ok","timestamp":1695362580278,"user_tz":-330,"elapsed":6729,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Requirement already satisfied: cryptography in /usr/local/lib/python3.10/dist-packages (41.0.3)\n","Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.10/dist-packages (from cryptography) (1.15.1)\n","Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.12->cryptography) (2.21)\n","1.0\n"]}]},{"cell_type":"code","source":["import os\n","\n","# Get the current working directory\n","current_directory = os.getcwd()\n","\n","# Print the current working directory\n","print(\"Current Working Directory:\", current_directory)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"btaFCZ31FYYT","outputId":"3bf16b4d-055c-4b38-c9ca-57c47bfb2fab","executionInfo":{"status":"ok","timestamp":1695362580278,"user_tz":-330,"elapsed":21,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Current Working Directory: /content\n"]}]},{"cell_type":"code","source":["!ls"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"1efiUoS8Fnq4","outputId":"66275a61-cf2e-4cbe-9a12-bc3383f21446","executionInfo":{"status":"ok","timestamp":1695362580279,"user_tz":-330,"elapsed":17,"user":{"displayName":"Karan Sonawane","userId":"05479461208077736330"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["decrypted_model.pkl key.key\t unencrypted_model.pkl\n","encrypted_model.pkl sample_data\n"]}]}]}
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------