├── example.txt
├── LICENSE
├── README.md
└── Apriori.py


/example.txt:
--------------------------------------------------------------------------------
a c d f g i m p
a b c f l m o
b f h j o
b c k s p
a c e f l m n p

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Ethon Wu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Apriori Algorithm in Python
## Introduction
Apriori is a classic algorithm for mining association rules.

This implementation follows the paper

> *R. Agrawal and R. Srikant.
“Fast Algorithms for Mining Association Rules.” Proc. 1994 Int’l Conf. Very Large Data Bases (VLDB ’94), pp. 487-499, Sept. 1994*

Frequent itemset mining relies on two properties:

> Property1: *if an itemset is infrequent, all its supersets must be infrequent and they need not be examined further.*

> Property2: *if an itemset is frequent, all its subsets must be frequent and they need not be examined further.*

### Prune Stage

The Apriori algorithm uses *Property1* to *"prune"* infrequent itemsets, so that their supersets are not generated from them.

This step is implemented in the **Apriori_prune** function in Apriori.py, which drops every candidate whose support count is below the minimum support.

### Join Stage

The Apriori algorithm uses **Apriori_gen** to create the (k+1)-candidates.

It combines two frequent k-itemsets (here k=3) that share the same (k-1)-prefix to generate a new (k+1)-itemset.

Example:

k=3 Ck=(a,b,c),(a,b,e)

Both have the same (k-1)-prefix (a,b)

So they can be combined into a (k+1)-itemset (k+1=4)

Ck+1=(a,b,c,e)

## Usage
python Apriori.py

--------------------------------------------------------------------------------
/Apriori.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# A dict in Python maps keys to values, e.g. my_information = {'name': 'Pusheen the Cat', 'country': 'USA', 'favorite_numbers': [42, 105]}
# my_information['name'] -> 'Pusheen the Cat' (the key is 'name', the value is 'Pusheen the Cat')
# Itemsets are stored as strings of single-character items, e.g. 'acf' stands for {a, c, f}.
from __future__ import print_function  # keeps print() working on both Python 2 and Python 3


def Apriori_gen(Itemset, length):
    """Generate new (k+1)-itemsets from frequent k-itemsets (see README, Join Stage)."""
    candidate = []
    for i in range(0, length):
        element = str(Itemset[i])
        for j in range(i + 1, length):
            element1 = str(Itemset[j])
            # Join only itemsets that share the same (k-1)-prefix
            if element[:-1] == element1[:-1]:
                unionset = element[:-1] + element1[-1] + element[-1]  # Combine two k-itemsets into a (k+1)-itemset
                unionset = ''.join(sorted(unionset))  # Sort the items alphabetically
                candidate.append(unionset)
    return candidate


def Apriori_prune(Ck, MinSupport):
    """Keep only the candidates whose support count is at least MinSupport."""
    L = []
    for i in Ck:
        if Ck[i] >= MinSupport:
            L.append(i)
    return sorted(L)


def Apriori_count_subset(Candidate, Candidate_len):
    """Count the transactions that contain each candidate; a bool flag records the subset test."""
    Lk = dict()
    file = open('example.txt')
    for l in file:
        l = l.split()  # items of the current transaction
        for i in range(0, Candidate_len):
            key = str(Candidate[i])
            if key not in Lk:
                Lk[key] = 0
            flag = True
            for k in key:
                if k not in l:
                    flag = False
            if flag:
                Lk[key] += 1
    file.close()
    return Lk


minsupport = 3
C1 = {}
file = open('example.txt')
# Count the support of every 1-candidate
for line in file:
    for item in line.split():
        if item in C1:
            C1[item] += 1
        else:
            C1[item] = 1
file.close()
L1 = Apriori_prune(C1, minsupport)
L = Apriori_gen(L1, len(L1))
print('====================================')
print('Frequent 1-itemset is', L1)
print('====================================')
k = 2
while L != []:
    C = Apriori_count_subset(L, len(L))
    frequent_itemset = Apriori_prune(C, minsupport)
    print('====================================')
    print('Frequent', k, '-itemset is', frequent_itemset)
    print('====================================')
    L = Apriori_gen(frequent_itemset, len(frequent_itemset))
    k += 1

--------------------------------------------------------------------------------
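
The join stage described in the README can also be sketched on tuples instead of strings, which lifts the restriction to single-character items. The sketch below is not part of the repository: `join_candidates` is a hypothetical name, and it assumes every itemset is a tuple whose items are already sorted. It additionally drops any candidate that has an infrequent k-subset (Property1), a check that the cited paper's candidate generation includes and that the string-based `Apriori_gen` leaves to the later support count.

```python
from itertools import combinations


def join_candidates(frequent_k):
    """Join sorted k-itemsets (tuples) that share the same (k-1)-prefix.

    Hypothetical helper for illustration; assumes each tuple is sorted.
    """
    frequent_k = sorted(frequent_k)
    known = set(frequent_k)
    candidates = []
    for i, first in enumerate(frequent_k):
        for second in frequent_k[i + 1:]:
            if first[:-1] != second[:-1]:
                break  # the list is sorted, so no later itemset shares this prefix
            candidate = first + (second[-1],)
            # Property1: skip the candidate if any of its k-subsets is infrequent
            if all(sub in known for sub in combinations(candidate, len(first))):
                candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    frequent_3 = [("a", "b", "c"), ("a", "b", "e"), ("a", "c", "e"), ("b", "c", "e")]
    print(join_candidates(frequent_3))
```

With only (a,b,c) and (a,b,e) as input, this sketch returns no candidate, because (a,c,e) and (b,c,e) are not known to be frequent; the README example keeps (a,b,c,e) at this point and relies on the support count to reject it.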