├── example.txt
├── LICENSE
├── README.md
└── Apriori.py


/example.txt:
--------------------------------------------------------------------------------
1 | a c d f g i m p
2 | a b c f l m o
3 | b f h j o
4 | b c k s p
5 | a c e f l m n p
6 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 Ethon Wu
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Apriori Algorithm in Python
 2 | ## Introduction
 3 |  Apriori is a classic algorithm for mining frequent itemsets and association rules.
 4 | 
 5 |  This implementation follows the paper:
 6 |  	
 7 | > *R. Agrawal and R. Srikant. “Fast Algorithms for Mining Association Rules.” Proc. 1994 Int’l Conf. Very Large Data Bases (VLDB ’94), pp. 487-499, Sept. 1994.*
 8 | 
 9 |  Frequent itemset mining relies on two properties:
10 | 
11 | > Property1: *if an itemset is infrequent, all its supersets must be infrequent and they need not be examined further.*
12 | 
13 | > Property2: *if an itemset is frequent, all its subsets must be frequent and they need not be examined further.*
14 | 
15 | ### Prune Stage
16 | 
17 |  Apriori uses *Property1* to *prune* the search: itemsets below the minimum support are discarded and are never extended into larger candidates.
18 | 
19 |  This step is implemented in the **Apriori_prune** function in Apriori.py.
20 | 
21 | ### Join Stage
22 | 
23 |  Apriori uses **Apriori_gen** to create the candidate (k+1)-itemsets.
24 | 
25 |  It combines two frequent k-itemsets that share the same (k-1)-item prefix into a new (k+1)-itemset.
26 | 
27 |  Example:
28 |  
29 |  	k=3, frequent 3-itemsets: (a,b,c) and (a,b,e)
30 | 
31 | 	Both share the same (k-1)-prefix (a,b),
32 | 
33 | 	so they are joined into one (k+1)-itemset (here k+1=4):
34 | 
35 | 	candidate 4-itemset: (a,b,c,e)
36 | 
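37 | ## Usage
38 | 	python Apriori.py
39 | 
40 |  The script reads transactions from `example.txt` in the current directory (one transaction per line, single-character items separated by spaces) and reports every itemset whose support count is at least 3.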
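Both properties above follow from one observation: every transaction that contains an itemset also contains each of its subsets, so adding items to an itemset can never raise its support count. The sketch below checks this on the five transactions from `example.txt`, copied inline so the snippet runs on its own. The helper `support` is hypothetical and is not part of Apriori.py.

```python
from itertools import combinations

# The five transactions from example.txt, one set of items per line
TRANSACTIONS = [set(line.split()) for line in [
    "a c d f g i m p",
    "a b c f l m o",
    "b f h j o",
    "b c k s p",
    "a c e f l m n p",
]]


def support(itemset, transactions=TRANSACTIONS):
    """Hypothetical helper: number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


# A superset is never more frequent than any of its subsets
for itemset in [("a", "c", "f"), ("c", "f", "p")]:
    for subset in combinations(itemset, len(itemset) - 1):
        assert support(itemset) <= support(subset)

print(support(("a", "c", "f")))  # 3
print(support(("f", "p")), support(("c", "f", "p")))  # 2 2
```

With the minimum support of 3 used in Apriori.py, {f, p} is infrequent, so by Property1 its superset {c, f, p} is infrequent as well.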

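The paper's candidate generation also has an explicit prune step: a (k+1)-candidate is dropped if any of its k-subsets is not frequent, so it never has to be counted. Apriori.py does not perform this check; `Apriori_prune` only filters by support count, so candidates such as (c,f,p) are still counted against the file. A minimal sketch of the subset-based prune, assuming itemsets are stored as sorted tuples; `has_infrequent_subset` is a hypothetical name.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_k):
    """Return True if some k-subset of the (k+1)-itemset candidate is not frequent.

    candidate  -- sorted tuple of k+1 items
    frequent_k -- set of sorted k-item tuples (the frequent k-itemsets)
    """
    # combinations() of a sorted tuple yields sorted tuples, so lookups match
    return any(subset not in frequent_k
               for subset in combinations(candidate, len(candidate) - 1))


# Frequent 2-itemsets of example.txt at minimum support 3
L2 = {("a", "c"), ("a", "f"), ("a", "m"), ("c", "f"), ("c", "m"), ("c", "p"), ("f", "m")}
candidates = [("a", "c", "f"), ("c", "f", "p"), ("c", "m", "p")]
pruned = [c for c in candidates if not has_infrequent_subset(c, L2)]
print(pruned)  # [('a', 'c', 'f')]
```

Here (c,f,p) is removed because (f,p) is infrequent, and (c,m,p) because (m,p) is infrequent, without scanning any transactions.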

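The Join Stage can also be written with sorted tuples instead of strings. This avoids the single-character restriction that comes from the string slicing in `Apriori_gen`. A minimal sketch for illustration only; `join_step` is a hypothetical name and this is not the code in Apriori.py.

```python
def join_step(frequent_k):
    """Join frequent k-itemsets (sorted tuples) that share the same (k-1)-prefix."""
    frequent_k = sorted(frequent_k)
    candidates = []
    for i in range(len(frequent_k)):
        for j in range(i + 1, len(frequent_k)):
            p, q = frequent_k[i], frequent_k[j]
            # Same first k-1 items: extend p with the last item of q
            if p[:-1] == q[:-1]:
                candidates.append(p + (q[-1],))
    return candidates


print(join_step([("a", "b", "c"), ("a", "b", "e")]))  # [('a', 'b', 'c', 'e')]
```

Because the input is sorted, two itemsets with the same prefix always satisfy p[-1] < q[-1], so each joined candidate is already sorted and no re-sorting is needed.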
--------------------------------------------------------------------------------
/Apriori.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | """Apriori frequent-itemset mining (Agrawal & Srikant, VLDB 1994).
 3 | 
 4 | Itemsets are stored as sorted strings, so every item must be a single
 5 | character, as in example.txt.
 6 | """
 7 | 
 8 | 
 9 | def Apriori_gen(Itemset, length):
10 |     """Join step: build candidate (k+1)-itemsets (see README, Join Stage)."""
11 |     candidate = []
12 |     for i in range(0, length):
13 |         element = str(Itemset[i])
14 |         for j in range(i + 1, length):
15 |             element1 = str(Itemset[j])
16 |             # Two k-itemsets can be joined if they share the same (k-1)-prefix
17 |             if element[:-1] == element1[:-1]:
18 |                 # Prefix plus both last items gives a (k+1)-itemset
19 |                 unionset = element[:-1] + element[-1] + element1[-1]
20 |                 unionset = ''.join(sorted(unionset))  # keep items in dictionary order
21 |                 candidate.append(unionset)
22 |     return candidate
23 | 
24 | 
25 | def Apriori_prune(Ck, MinSupport):
26 |     """Keep the itemsets whose support count is at least MinSupport."""
27 |     L = []
28 |     for itemset in Ck:
29 |         if Ck[itemset] >= MinSupport:
30 |             L.append(itemset)
31 |     return sorted(L)
32 | 
33 | 
34 | def Apriori_count_subset(Candidate, Candidate_len, filename='example.txt'):
35 |     """Count the transactions in filename that contain each candidate itemset."""
36 |     Lk = {str(Candidate[i]): 0 for i in range(0, Candidate_len)}
37 |     with open(filename) as file:
38 |         for line in file:
39 |             transaction = set(line.split())
40 |             for key in Lk:
41 |                 # The candidate is a subset if every one of its items occurs in the transaction
42 |                 if all(item in transaction for item in key):
43 |                     Lk[key] += 1
44 |     return Lk
45 | 
46 | 
47 | if __name__ == '__main__':
48 |     minsupport = 3
49 | 
50 |     # Count the support of every 1-itemset
51 |     C1 = {}
52 |     with open('example.txt') as file:
53 |         for line in file:
54 |             for item in line.split():
55 |                 C1[item] = C1.get(item, 0) + 1
56 | 
57 |     L1 = Apriori_prune(C1, minsupport)
58 |     L = Apriori_gen(L1, len(L1))
59 |     print('====================================')
60 |     print('Frequent 1-itemset is', L1)
61 |     print('====================================')
62 |     k = 2
63 |     while L != []:
64 |         C = Apriori_count_subset(L, len(L))
65 |         frequent_itemset = Apriori_prune(C, minsupport)
66 |         print('====================================')
67 |         print(f'Frequent {k}-itemset is', frequent_itemset)
68 |         print('====================================')
69 |         L = Apriori_gen(frequent_itemset, len(frequent_itemset))
70 |         k += 1
71 | 
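For comparison, a compact end-to-end sketch that combines support counting, the join step, and the paper's subset-based prune, using sorted tuples instead of strings so that items may have names longer than one character. It takes transactions from an in-memory list rather than from `example.txt`; the function name `apriori` and its return shape (a dict mapping each frequent itemset to its support count) are assumptions of this sketch, not an API of Apriori.py.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset (itemsets are sorted tuples)."""
    transactions = [set(t) for t in transactions]

    def count(candidates):
        # Support count of each candidate, keeping only the frequent ones
        counts = {c: sum(1 for t in transactions if set(c) <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = sorted({item for t in transactions for item in t})
    frequent = count([(item,) for item in items])
    result = dict(frequent)
    while frequent:
        level = sorted(frequent)
        candidates = []
        for i in range(len(level)):
            for j in range(i + 1, len(level)):
                p, q = level[i], level[j]
                if p[:-1] != q[:-1]:
                    continue  # join only itemsets with the same (k-1)-prefix
                candidate = p + (q[-1],)
                # Prune: every k-subset of the candidate must itself be frequent
                if all(s in frequent for s in combinations(candidate, len(candidate) - 1)):
                    candidates.append(candidate)
        frequent = count(candidates)
        result.update(frequent)
    return result


example = [line.split() for line in [
    "a c d f g i m p",
    "a b c f l m o",
    "b f h j o",
    "b c k s p",
    "a c e f l m n p",
]]
found = apriori(example, 3)
for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

On the example.txt data with a minimum support of 3, the largest frequent itemset is ('a', 'c', 'f', 'm') with support 3, matching the 4-itemset `acfm` that Apriori.py reports.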

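Association-rule mining is usually split into two steps: finding the frequent itemsets, which is what Apriori.py does, and then deriving rules X → Y whose confidence, support(X ∪ Y) / support(X), reaches a threshold. Apriori.py stops after the first step. A minimal sketch of the second step for a single frequent itemset, assuming support is recounted from the example.txt transactions; `support` and `rules_from_itemset` are hypothetical names.

```python
from itertools import combinations

# The five transactions from example.txt
TRANSACTIONS = [set(line.split()) for line in [
    "a c d f g i m p",
    "a b c f l m o",
    "b f h j o",
    "b c k s p",
    "a c e f l m n p",
]]


def support(itemset):
    """Hypothetical helper: number of transactions containing every item of itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rules_from_itemset(itemset, min_confidence):
    """Return (antecedent, consequent, confidence) for rules X -> Y with X ∪ Y = itemset."""
    itemset = tuple(sorted(itemset))
    total = support(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            consequent = tuple(i for i in itemset if i not in antecedent)
            confidence = total / support(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


for x, y, conf in rules_from_itemset(("c", "p"), 0.5):
    print(x, "->", y, conf)
```

For the frequent 2-itemset (c, p), c → p has confidence 3/4 = 0.75 and p → c has confidence 3/3 = 1.0.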

--------------------------------------------------------------------------------