├── .travis.yml
├── LICENSE
├── Makefile
├── README.md
├── closestmatch.go
├── closestmatch_test.go
├── cmclient
│   ├── client.go
│   └── client_test.go
├── cmserver
│   └── server.go
├── levenshtein
│   ├── levenshtein.go
│   └── levenshtein_test.go
└── test
    ├── books.list
    ├── catcher.txt
    ├── data.go
    ├── popular.txt
    └── potter.txt
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: go
2 |
3 | go:
4 | - 1.8
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Zack
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | .PHONY: test
2 | test:
3 | go test -cover -run=.
4 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # closestmatch :page_with_curl:
3 |
4 |
5 |
6 |
7 |
8 |
9 | *closestmatch* is a simple and fast Go library for fuzzy matching an input string to a list of target strings. *closestmatch* is useful for handling input from a user where the input (which could be misspelled or out of order) needs to match a key in a database. *closestmatch* uses a [bag-of-words approach](https://en.wikipedia.org/wiki/Bag-of-words_model) to precompute character n-grams to represent each possible target string. The closest matches are the targets whose n-gram sets overlap most with the input's. The precomputation scales well and is much faster and more accurate than Levenshtein for long strings.
10 |
11 |
12 | Getting Started
13 | ===============
14 |
15 | ## Install
16 |
17 | ```
18 | go get -u -v github.com/schollz/closestmatch
19 | ```
20 |
21 | ## Use
22 |
23 | #### Create a *closestmatch* object from a list of words
24 |
25 | ```golang
26 | // Take a slice of keys, say band names that are similar
27 | // http://www.tonedeaf.com.au/412720/38-bands-annoyingly-similar-names.htm
28 | wordsToTest := []string{"King Gizzard", "The Lizard Wizard", "Lizzard Wizzard"}
29 |
30 | // Choose a set of bag sizes; more sizes are more accurate but slower
31 | bagSizes := []int{2}
32 |
33 | // Create a closestmatch object
34 | cm := closestmatch.New(wordsToTest, bagSizes)
35 | ```
36 |
37 | #### Find the closest match, or find the *N* closest matches
38 |
39 | ```golang
40 | fmt.Println(cm.Closest("kind gizard"))
41 | // returns 'King Gizzard'
42 |
43 | fmt.Println(cm.ClosestN("kind gizard", 3))
44 | // returns [King Gizzard Lizzard Wizzard The Lizard Wizard]
45 | ```
46 |
47 | #### Calculate the accuracy
48 |
49 | ```golang
50 | // Calculate accuracy
51 | fmt.Println(cm.AccuracyMutatingWords())
52 | // ~ 66 % (still way better than Levenshtein which hits 0% with this particular set)
53 |
54 | // Improve accuracy by adding more bags
55 | bagSizes = []int{2, 3, 4}
56 | cm = closestmatch.New(wordsToTest, bagSizes)
57 | fmt.Println(cm.AccuracyMutatingWords())
58 | // accuracy improves to ~ 76 %
59 | ```
60 |
61 | #### Save/Load
62 |
63 | ```golang
64 | // Save the precomputed bags (stored as gzipped JSON)
65 | cm.Save("closestmatches.json.gz")
66 |
67 | // Load them again later without recomputing
68 | cm2, _ := closestmatch.Load("closestmatches.json.gz")
69 | fmt.Println(cm2.Closest("lizard wizard"))
70 | // prints "The Lizard Wizard"
71 | ```
72 |
73 | ### Advantages
74 |
75 | *closestmatch* is more accurate than Levenshtein for long strings (like in the test corpus).
76 |
77 | *closestmatch* is ~20x faster than [a fast implementation of Levenshtein](https://groups.google.com/forum/#!topic/golang-nuts/YyH1f_qCZVc). Try it yourself with the benchmarks:
78 |
79 | ```bash
80 | cd $GOPATH/src/github.com/schollz/closestmatch && go test -run=None -bench=. > closestmatch.bench
81 | cd $GOPATH/src/github.com/schollz/closestmatch/levenshtein && go test -run=None -bench=. > levenshtein.bench
82 | benchcmp levenshtein.bench ../closestmatch.bench
83 | ```
84 |
85 | which gives the following benchmark (on Intel i7-3770 CPU @ 3.40GHz w/ 8 processors):
86 |
87 | ```bash
88 | benchmark                 old ns/op     new ns/op     delta
89 | BenchmarkNew-8            1.47          1933870       +131555682.31%
90 | BenchmarkClosestOne-8     104603530     4855916       -95.36%
91 | ```
92 |
93 | The `New()` function in *closestmatch* is much slower than in *levenshtein* because it precomputes the n-gram bags for every target string up front; that cost is paid once and is what makes each lookup fast.
94 |
95 | ### Disadvantages
96 |
97 | *closestmatch* does worse for matching lists of single words, like a dictionary. For comparison:
98 |
99 |
100 | ```
101 | $ cd $GOPATH/src/github.com/schollz/closestmatch && go test
102 | Accuracy with mutating words in book list: 90.0%
103 | Accuracy with mutating letters in book list: 100.0%
104 | Accuracy with mutating letters in dictionary: 38.9%
105 | ```
106 |
107 | while Levenshtein performs noticeably better for a single-word dictionary (but worse for longer names, like book titles):
108 |
109 | ```
110 | $ cd $GOPATH/src/github.com/schollz/closestmatch/levenshtein && go test
111 | Accuracy with mutating words in book list: 40.0%
112 | Accuracy with mutating letters in book list: 100.0%
113 | Accuracy with mutating letters in dictionary: 64.8%
114 | ```
115 |
116 | ## License
117 |
118 | MIT
119 |
--------------------------------------------------------------------------------
/closestmatch.go:
--------------------------------------------------------------------------------
1 | package closestmatch
2 |
3 | import (
4 | "compress/gzip"
5 | "encoding/json"
6 | "math/rand"
7 | "os"
8 | "sort"
9 | "strings"
10 | "sync"
11 | )
12 |
13 | // ClosestMatch is the structure that contains the
14 | // substring sizes and carries a map of the substrings for
15 | // easy lookup
16 | type ClosestMatch struct {
17 | SubstringSizes []int
18 | SubstringToID map[string]map[uint32]struct{}
19 | ID map[uint32]IDInfo
20 | mux sync.Mutex
21 | }
22 |
23 | // IDInfo carries the information about the keys
24 | type IDInfo struct {
25 | Key string
26 | NumSubstrings int
27 | }
28 |
29 | // New returns a new structure for performing closest matches
30 | func New(possible []string, subsetSize []int) *ClosestMatch {
31 | cm := new(ClosestMatch)
32 | cm.SubstringSizes = subsetSize
33 | cm.SubstringToID = make(map[string]map[uint32]struct{})
34 | cm.ID = make(map[uint32]IDInfo)
35 | for i, s := range possible {
36 | substrings := cm.splitWord(strings.ToLower(s))
37 | cm.ID[uint32(i)] = IDInfo{Key: s, NumSubstrings: len(substrings)}
38 | for substring := range substrings {
39 | if _, ok := cm.SubstringToID[substring]; !ok {
40 | cm.SubstringToID[substring] = make(map[uint32]struct{})
41 | }
42 | cm.SubstringToID[substring][uint32(i)] = struct{}{}
43 | }
44 | }
45 |
46 | return cm
47 | }
48 |
49 | // Load can load a previously saved ClosestMatch object from disk
50 | func Load(filename string) (*ClosestMatch, error) {
51 | cm := new(ClosestMatch)
52 |
53 | f, err := os.Open(filename)
54 | if err != nil {
55 | return cm, err
56 | }
57 | defer f.Close() // only close a file that was actually opened
58 |
59 | w, err := gzip.NewReader(f)
60 | if err != nil {
61 | return cm, err
62 | }
63 | defer w.Close()
64 | err = json.NewDecoder(w).Decode(cm)
65 | return cm, err
66 | }
67 |
68 | // Add more words to ClosestMatch structure
69 | func (cm *ClosestMatch) Add(possible []string) {
70 | cm.mux.Lock()
71 | defer cm.mux.Unlock()
72 | offset := uint32(len(cm.ID)) // New assigns IDs 0..n-1, so new keys start after them instead of overwriting
73 | for i, s := range possible {
74 | substrings := cm.splitWord(strings.ToLower(s))
75 | cm.ID[offset+uint32(i)] = IDInfo{Key: s, NumSubstrings: len(substrings)}
76 | for substring := range substrings {
77 | if _, ok := cm.SubstringToID[substring]; !ok {
78 | cm.SubstringToID[substring] = make(map[uint32]struct{})
79 | }
80 | cm.SubstringToID[substring][offset+uint32(i)] = struct{}{}
81 | }
82 | }
83 | }
84 | // Save writes the current ClosestMatch object as a gzipped JSON file
85 | func (cm *ClosestMatch) Save(filename string) error {
86 | f, err := os.Create(filename)
87 | if err != nil {
88 | return err
89 | }
90 | defer f.Close()
91 | w := gzip.NewWriter(f)
92 | defer w.Close()
93 | enc := json.NewEncoder(w)
94 | // enc.SetIndent("", " ")
95 | return enc.Encode(cm)
96 | }
97 |
98 | func (cm *ClosestMatch) worker(id int, jobs <-chan job, results chan<- result) {
99 | for j := range jobs {
100 | m := make(map[string]int)
101 | cm.mux.Lock()
102 | if ids, ok := cm.SubstringToID[j.substring]; ok {
103 | weight := 1000 / len(ids)
104 | for id := range ids {
105 | if len(cm.ID[id].Key) == 0 {
106 | continue // an empty key (e.g. from a trailing newline in the list) would divide by zero below
107 | }
108 | m[cm.ID[id].Key] += 1 + 1000/len(cm.ID[id].Key) + weight // missing map entries start at 0
109 | }
110 | }
111 | cm.mux.Unlock()
112 | results <- result{m: m}
113 | }
114 | }
115 |
116 | type job struct {
117 | substring string
118 | }
119 |
120 | type result struct {
121 | m map[string]int
122 | }
123 |
124 | func (cm *ClosestMatch) match(searchWord string) map[string]int {
125 | searchSubstrings := cm.splitWord(strings.ToLower(searchWord)) // keys were lowercased in New
126 | searchSubstringsLen := len(searchSubstrings)
127 |
128 | jobs := make(chan job, searchSubstringsLen)
129 | results := make(chan result, searchSubstringsLen)
130 | workers := 8
131 |
132 | for w := 1; w <= workers; w++ {
133 | go cm.worker(w, jobs, results)
134 | }
135 |
136 | for substring := range searchSubstrings {
137 | jobs <- job{substring: substring}
138 | }
139 | close(jobs)
140 |
141 | m := make(map[string]int)
142 | for a := 1; a <= searchSubstringsLen; a++ {
143 | r := <-results
144 | for key := range r.m {
145 | if _, ok := m[key]; ok {
146 | m[key] += r.m[key]
147 | } else {
148 | m[key] = r.m[key]
149 | }
150 | }
151 | }
152 |
153 | return m
154 | }
155 |
156 | // Closest searches for the `searchWord` and returns the closest match
157 | func (cm *ClosestMatch) Closest(searchWord string) string {
158 | if ranked := rankByWordCount(cm.match(searchWord)); len(ranked) > 0 {
159 | return ranked[0].Key
160 | }
161 | return "" // no key shares an n-gram with searchWord
162 | }
163 |
164 | // ClosestN searches for the `searchWord` and returns up to `max` closest matches
165 | func (cm *ClosestMatch) ClosestN(searchWord string, max int) []string {
166 | matches := make([]string, 0, max)
167 | for i, pair := range rankByWordCount(cm.match(searchWord)) {
168 | if i >= max {
169 | break
170 | }
171 | matches = append(matches, pair.Key)
172 | }
173 | return matches
174 | }
175 |
176 | func rankByWordCount(wordFrequencies map[string]int) PairList {
177 | pl := make(PairList, len(wordFrequencies))
178 | i := 0
179 | for k, v := range wordFrequencies {
180 | pl[i] = Pair{k, v}
181 | i++
182 | }
183 | sort.Sort(sort.Reverse(pl))
184 | return pl
185 | }
186 |
187 | type Pair struct {
188 | Key string
189 | Value int
190 | }
191 |
192 | type PairList []Pair
193 |
194 | func (p PairList) Len() int { return len(p) }
195 | func (p PairList) Less(i, j int) bool { return p[i].Value < p[j].Value }
196 | func (p PairList) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
197 |
198 | func (cm *ClosestMatch) splitWord(word string) map[string]struct{} {
199 | wordHash := make(map[string]struct{})
200 | for _, j := range cm.SubstringSizes {
201 | for i := 0; i < len(word)-j+1; i++ {
202 | substring := word[i : i+j] // byte-based slice; multi-byte UTF-8 runes may be split
203 | if len(strings.TrimSpace(substring)) > 0 {
204 | wordHash[substring] = struct{}{}
205 | }
206 | }
207 | }
208 | if len(wordHash) == 0 {
209 | wordHash[word] = struct{}{}
210 | }
211 | return wordHash
212 | }
213 |
214 | // AccuracyMutatingWords runs some basic tests against the wordlist to
215 | // see how accurate this bag-of-characters method is against
216 | // the target dataset
217 | func (cm *ClosestMatch) AccuracyMutatingWords() float64 {
218 | rand.Seed(1)
219 | percentCorrect := 0.0
220 | numTrials := 0.0
221 |
222 | for wordTrials := 0; wordTrials < 200; wordTrials++ {
223 |
224 | var testString, originalTestString string
225 | cm.mux.Lock()
226 | testStringNum := rand.Intn(len(cm.ID))
227 | i := 0
228 | for id := range cm.ID {
229 | if i != testStringNum {
230 | i++
231 | continue
232 | }
233 | originalTestString = cm.ID[id].Key
234 | break
235 | }
236 | cm.mux.Unlock()
237 |
238 | var words []string
239 | choice := rand.Intn(3)
240 | if choice == 0 {
241 | // remove a random word
242 | words = strings.Split(originalTestString, " ")
243 | if len(words) < 3 {
244 | continue
245 | }
246 | deleteWordI := rand.Intn(len(words))
247 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
248 | testString = strings.Join(words, " ")
249 | } else if choice == 1 {
250 | // remove a random word and reverse
251 | words = strings.Split(originalTestString, " ")
252 | if len(words) > 1 {
253 | deleteWordI := rand.Intn(len(words))
254 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
255 | for left, right := 0, len(words)-1; left < right; left, right = left+1, right-1 {
256 | words[left], words[right] = words[right], words[left]
257 | }
258 | } else {
259 | continue
260 | }
261 | testString = strings.Join(words, " ")
262 | } else {
263 | // remove a random word and shuffle and replace 2 random letters
264 | words = strings.Split(originalTestString, " ")
265 | if len(words) > 1 {
266 | deleteWordI := rand.Intn(len(words))
267 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
268 | for i := range words {
269 | j := rand.Intn(i + 1)
270 | words[i], words[j] = words[j], words[i]
271 | }
272 | }
273 | testString = strings.Join(words, " ")
274 | letters := "abcdefghijklmnopqrstuvwxyz"
275 | if len(testString) == 0 {
276 | continue
277 | }
278 | ii := rand.Intn(len(testString))
279 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
280 | ii = rand.Intn(len(testString))
281 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
282 | }
283 | closest := cm.Closest(testString)
284 | if closest == originalTestString {
285 | percentCorrect += 1.0
286 | } else {
287 | //fmt.Printf("Original: %s, Mutilated: %s, Match: %s\n", originalTestString, testString, closest)
288 | }
289 | numTrials += 1.0
290 | }
291 | return 100.0 * percentCorrect / numTrials
292 | }
293 |
294 | // AccuracyMutatingLetters runs some basic tests against the wordlist to
295 | // see how accurate this bag-of-characters method is against
296 | // the target dataset when mutating individual letters (adding, removing, changing)
297 | func (cm *ClosestMatch) AccuracyMutatingLetters() float64 {
298 | rand.Seed(1)
299 | percentCorrect := 0.0
300 | numTrials := 0.0
301 |
302 | for wordTrials := 0; wordTrials < 200; wordTrials++ {
303 |
304 | var testString, originalTestString string
305 | cm.mux.Lock()
306 | testStringNum := rand.Intn(len(cm.ID))
307 | i := 0
308 | for id := range cm.ID {
309 | if i != testStringNum {
310 | i++
311 | continue
312 | }
313 | originalTestString = cm.ID[id].Key
314 | break
315 | }
316 | cm.mux.Unlock()
317 | if len(originalTestString) == 0 {
318 | continue // rand.Intn(0) below would panic, e.g. on a blank line in the list
319 | }
320 | testString = originalTestString
321 | letters := "abcdefghijklmnopqrstuvwxyz" // letters to replace with
322 | choice := rand.Intn(3)
323 | if choice == 0 {
324 | // replace random letter
325 | ii := rand.Intn(len(testString))
326 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
327 | } else if choice == 1 {
328 | // delete random letter
329 | ii := rand.Intn(len(testString))
330 | testString = testString[:ii] + testString[ii+1:]
331 | } else {
332 | // add random letter
333 | ii := rand.Intn(len(testString))
334 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii:]
335 | }
336 | closest := cm.Closest(testString)
337 | if closest == originalTestString {
338 | percentCorrect += 1.0
339 | } else {
340 | //fmt.Printf("Original: %s, Mutilated: %s, Match: %s\n", originalTestString, testString, closest)
341 | }
342 | numTrials += 1.0
343 | }
344 |
345 | return 100.0 * percentCorrect / numTrials
346 | }
347 |
--------------------------------------------------------------------------------
/closestmatch_test.go:
--------------------------------------------------------------------------------
1 | package closestmatch
2 |
3 | import (
4 | "fmt"
5 | "io/ioutil"
6 | "strings"
7 | "testing"
8 |
9 | "github.com/schollz/closestmatch/test"
10 | )
11 |
12 | func BenchmarkNew(b *testing.B) {
13 | for i := 0; i < b.N; i++ {
14 | New(test.WordsToTest, []int{3})
15 | }
16 | }
17 |
18 | func BenchmarkSplitOne(b *testing.B) {
19 | cm := New(test.WordsToTest, []int{3})
20 | searchWord := test.SearchWords[0]
21 | b.ResetTimer()
22 | for i := 0; i < b.N; i++ {
23 | cm.splitWord(searchWord)
24 | }
25 | }
26 |
27 | func BenchmarkClosestOne(b *testing.B) {
28 | bText, _ := ioutil.ReadFile("test/books.list")
29 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
30 | cm := New(wordsToTest, []int{3})
31 | searchWord := test.SearchWords[0]
32 | b.ResetTimer()
33 | for i := 0; i < b.N; i++ {
34 | cm.Closest(searchWord)
35 | }
36 | }
37 |
38 | func BenchmarkClosest3(b *testing.B) {
39 | bText, _ := ioutil.ReadFile("test/books.list")
40 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
41 | cm := New(wordsToTest, []int{3})
42 | searchWord := test.SearchWords[0]
43 | b.ResetTimer()
44 | for i := 0; i < b.N; i++ {
45 | cm.ClosestN(searchWord, 3)
46 | }
47 | }
48 |
49 | func BenchmarkClosest30(b *testing.B) {
50 | bText, _ := ioutil.ReadFile("test/books.list")
51 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
52 | cm := New(wordsToTest, []int{3})
53 | searchWord := test.SearchWords[0]
54 | b.ResetTimer()
55 | for i := 0; i < b.N; i++ {
56 | cm.ClosestN(searchWord, 30)
57 | }
58 | }
59 |
60 | func BenchmarkFileLoad(b *testing.B) {
61 | bText, _ := ioutil.ReadFile("test/books.list")
62 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
63 | cm := New(wordsToTest, []int{3, 4})
64 | cm.Save("test/books.list.cm.gz")
65 | b.ResetTimer()
66 | for i := 0; i < b.N; i++ {
67 | Load("test/books.list.cm.gz")
68 | }
69 | }
70 |
71 | func BenchmarkFileSave(b *testing.B) {
72 | bText, _ := ioutil.ReadFile("test/books.list")
73 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
74 | cm := New(wordsToTest, []int{3, 4})
75 | b.ResetTimer()
76 | for i := 0; i < b.N; i++ {
77 | cm.Save("test/books.list.cm.gz")
78 | }
79 | }
80 |
81 | func ExampleMatchingSmall() {
82 | cm := New([]string{"love", "loving", "cat", "kit", "cats"}, []int{4})
83 | fmt.Println(cm.splitWord("love"))
84 | fmt.Println(cm.splitWord("kit"))
85 | fmt.Println(cm.Closest("kit"))
86 | // Output:
87 | // map[love:{}]
88 | // map[kit:{}]
89 | // kit
90 |
91 | }
92 |
93 | func ExampleMatchingSimple() {
94 | cm := New(test.WordsToTest, []int{3})
95 | for _, searchWord := range test.SearchWords {
96 | fmt.Printf("'%s' matched '%s'\n", searchWord, cm.Closest(searchWord))
97 | }
98 | // Output:
99 | // 'cervantes don quixote' matched 'don quixote by miguel de cervantes saavedra'
100 | // 'mysterious afur at styles by christie' matched 'the mysterious affair at styles by agatha christie'
101 | // 'hard times by charles dickens' matched 'hard times by charles dickens'
102 | // 'complete william shakespeare' matched 'the complete works of william shakespeare by william shakespeare'
103 | // 'war by hg wells' matched 'the war of the worlds by h. g. wells'
104 |
105 | }
106 |
107 | func ExampleMatchingN() {
108 | cm := New(test.WordsToTest, []int{4})
109 | fmt.Println(cm.ClosestN("war h.g. wells", 3))
110 | // Output:
111 | // [the war of the worlds by h. g. wells the time machine by h. g. wells war and peace by graf leo tolstoy]
112 | }
113 |
114 | func ExampleMatchingBigList() {
115 | bText, _ := ioutil.ReadFile("test/books.list")
116 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
117 | cm := New(wordsToTest, []int{3})
118 | searchWord := "island of a thod mirrors"
119 | fmt.Println(cm.Closest(searchWord))
120 | // Output:
121 | // island of a thousand mirrors by nayomi munaweera
122 | }
123 |
124 | func ExampleMatchingCatcher() {
125 | bText, _ := ioutil.ReadFile("test/catcher.txt")
126 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
127 | cm := New(wordsToTest, []int{5})
128 | searchWord := "catcher in the rye by jd salinger"
129 | for i, match := range cm.ClosestN(searchWord, 3) {
130 | if i == 2 {
131 | fmt.Println(match)
132 | }
133 | }
134 | // Output:
135 | // the catcher in the rye by j.d. salinger
136 | }
137 |
138 | func ExampleMatchingPotter() {
139 | bText, _ := ioutil.ReadFile("test/potter.txt")
140 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
141 | cm := New(wordsToTest, []int{5})
142 | searchWord := "harry potter and the half blood prince by j.k. rowling"
143 | for i, match := range cm.ClosestN(searchWord, 3) {
144 | if i == 1 {
145 | fmt.Println(match)
146 | }
147 | }
148 | // Output:
149 | // harry potter and the order of the phoenix (harry potter, #5, part 1) by j.k. rowling
150 | }
151 |
152 | func TestAccuracyBookWords(t *testing.T) {
153 | bText, _ := ioutil.ReadFile("test/books.list")
154 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
155 | cm := New(wordsToTest, []int{4, 5})
156 | accuracy := cm.AccuracyMutatingWords()
157 | fmt.Printf("Accuracy with mutating words in book list:\t%2.1f%%\n", accuracy)
158 | }
159 |
160 | func TestAccuracyBookLetters(t *testing.T) {
161 | bText, _ := ioutil.ReadFile("test/books.list")
162 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
163 | cm := New(wordsToTest, []int{5})
164 | accuracy := cm.AccuracyMutatingLetters()
165 | fmt.Printf("Accuracy with mutating letters in book list:\t%2.1f%%\n", accuracy)
166 | }
167 |
168 | func TestAccuracyDictionaryLetters(t *testing.T) {
169 | bText, _ := ioutil.ReadFile("test/popular.txt")
170 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
171 | cm := New(wordsToTest, []int{2, 3, 4})
172 | accuracy := cm.AccuracyMutatingWords()
173 | fmt.Printf("Accuracy with mutating letters in dictionary:\t%2.1f%%\n", accuracy)
174 | }
175 |
176 | func TestSaveLoad(t *testing.T) {
177 | bText, _ := ioutil.ReadFile("test/books.list")
178 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
179 | type TestStruct struct {
180 | cm *ClosestMatch
181 | }
182 | tst := new(TestStruct)
183 | tst.cm = New(wordsToTest, []int{5})
184 | err := tst.cm.Save("test.gob")
185 | if err != nil {
186 | t.Error(err)
187 | }
188 |
189 | tst2 := new(TestStruct)
190 | tst2.cm, err = Load("test.gob")
191 | if err != nil {
192 | t.Error(err)
193 | }
194 | answer2 := tst2.cm.Closest("war of the worlds by hg wells")
195 | answer1 := tst.cm.Closest("war of the worlds by hg wells")
196 | if answer1 != answer2 {
197 | t.Errorf("Differing answers: '%s' '%s'", answer1, answer2)
198 | }
199 | }
200 |
--------------------------------------------------------------------------------
/cmclient/client.go:
--------------------------------------------------------------------------------
1 | package cmclient
2 |
3 | import (
4 | "bytes"
5 | "encoding/json"
6 | "fmt"
7 | "net/http"
8 | )
9 |
10 | // Connection holds the address of a running cmserver instance
11 | type Connection struct {
12 | Address string
13 | }
14 |
15 | // Open checks that a cmserver answers GET /uptime at address and returns a Connection to it
16 | func Open(address string) (*Connection, error) {
17 | c := new(Connection)
18 | c.Address = address
19 | resp, err := http.Get(c.Address + "/uptime")
20 | if err != nil {
21 | return c, err
22 | }
23 | defer resp.Body.Close()
24 | return c, nil
25 | }
26 |
27 | func (c *Connection) Closest(searchString string) (match string, err error) {
28 | type QueryJSON struct {
29 | SearchString string `json:"s"`
30 | }
31 |
32 | payloadJSON := new(QueryJSON)
33 | payloadJSON.SearchString = searchString
34 |
35 | payloadBytes, err := json.Marshal(payloadJSON)
36 | if err != nil {
37 | return
38 | }
39 | body := bytes.NewReader(payloadBytes)
40 |
41 | req, err := http.NewRequest("POST", fmt.Sprintf("%s/match", c.Address), body)
42 | if err != nil {
43 | return
44 | }
45 | req.Header.Set("Content-Type", "application/json")
46 |
47 | resp, err := http.DefaultClient.Do(req)
48 | if err != nil {
49 | return
50 | }
51 | defer resp.Body.Close()
52 |
53 | type ResultJSON struct {
54 | Result string `json:"r"`
55 | }
56 | var r ResultJSON
57 | err = json.NewDecoder(resp.Body).Decode(&r)
58 | match = r.Result
59 | return
60 | }
61 |
62 | func (c *Connection) ClosestN(searchString string, n int) (matches []string, err error) {
63 | matches = []string{}
64 | type QueryJSON struct {
65 | SearchString string `json:"s"`
66 | N int `json:"n"`
67 | }
68 |
69 | payloadJSON := new(QueryJSON)
70 | payloadJSON.SearchString = searchString
71 | payloadJSON.N = n
72 |
73 | payloadBytes, err := json.Marshal(payloadJSON)
74 | if err != nil {
75 | return
76 | }
77 | body := bytes.NewReader(payloadBytes)
78 |
79 | req, err := http.NewRequest("POST", fmt.Sprintf("%s/match", c.Address), body)
80 | if err != nil {
81 | return
82 | }
83 | req.Header.Set("Content-Type", "application/json")
84 |
85 | resp, err := http.DefaultClient.Do(req)
86 | if err != nil {
87 | return
88 | }
89 | defer resp.Body.Close()
90 |
91 | type ResultJSON struct {
92 | Results []string `json:"r"`
93 | }
94 | var r ResultJSON
95 | err = json.NewDecoder(resp.Body).Decode(&r)
96 | matches = r.Results
97 | return
98 | }
99 |
--------------------------------------------------------------------------------
/cmclient/client_test.go:
--------------------------------------------------------------------------------
1 | package cmclient
2 |
3 | import (
4 | "fmt"
5 | "testing"
6 | )
7 |
8 | var testingServer = "http://localhost:8051"
9 |
10 | func TestClosest(t *testing.T) {
11 | conn, _ := Open(testingServer)
12 | match, err := conn.Closest("The War of the Worlds by H.G. Wells")
13 | if err != nil {
14 | t.Error(err)
15 | }
16 | if match != "The Time Machine/The War of the Worlds by H.G. Wells" {
17 | t.Error(match)
18 | }
19 | }
20 |
21 | func TestClosestN(t *testing.T) {
22 | conn, _ := Open(testingServer)
23 | matches, err := conn.ClosestN("The War of the Worlds by H.G. Wells", 10)
24 | if err != nil {
25 | t.Error(err)
26 | }
27 | if len(matches) != 10 {
28 | t.Errorf("Got %d", len(matches))
29 | }
30 | fmt.Println(matches)
31 | }
32 |
--------------------------------------------------------------------------------
/cmserver/server.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "fmt"
5 | "io/ioutil"
6 | "net/http"
7 | "os"
8 | "strings"
9 | "time"
10 |
11 | "strconv"
12 |
13 | "github.com/gin-gonic/gin"
14 | "github.com/jcelliott/lumber"
15 | "github.com/schollz/closestmatch"
16 | "gopkg.in/urfave/cli.v1"
17 | )
18 |
19 | var version string
20 | var log *lumber.ConsoleLogger
21 | var cm *closestmatch.ClosestMatch
22 |
23 | func main() {
24 |
25 | app := cli.NewApp()
26 | app.Name = "cmserver"
27 | app.Usage = "fancy server for connecting to a closestmatch db"
28 | app.Version = version
29 | app.Compiled = time.Now()
30 | app.Action = func(c *cli.Context) error {
31 | listfile := c.GlobalString("list")
32 | verbose := c.GlobalBool("debug")
33 | port := c.GlobalString("port")
34 |
35 | if verbose {
36 | log = lumber.NewConsoleLogger(lumber.TRACE)
37 | } else {
38 | log = lumber.NewConsoleLogger(lumber.WARN)
39 | }
40 |
41 | log.Info("Loading closestmatch...")
42 | var errcm error
43 | cm, errcm = closestmatch.Load(listfile + ".cm")
44 | if errcm != nil {
45 | log.Warn(errcm.Error())
46 | log.Info("...loading data file...")
47 | var intArray []int
48 | for _, intStr := range strings.Split(c.GlobalString("bags"), ",") {
49 | intInt, _ := strconv.Atoi(intStr)
50 | intArray = append(intArray, intInt)
51 | }
52 | keys, err := ioutil.ReadFile(listfile)
53 | if err != nil {
54 | log.Error(err.Error())
55 | return err
56 | }
57 | log.Info("...computing cm...")
58 | cm = closestmatch.New(strings.Split(string(keys), "\n"), intArray)
59 | log.Info("...computed.")
60 | //log.Info("Saving...")
61 | //cm.Save(listfile + ".cm")
62 | //log.Info("...saving.")
63 | }
64 |
65 | startTime := time.Now()
66 |
67 | gin.SetMode(gin.ReleaseMode)
68 | r := gin.Default()
69 | r.GET("/v1/api", func(c *gin.Context) {
70 | c.String(200, `
71 |
72 | // Get the server uptime
73 | GET /uptime
74 | `)
75 | })
76 | r.GET("/uptime", func(c *gin.Context) {
77 | c.JSON(200, gin.H{
78 | "uptime": time.Since(startTime).String(),
79 | })
80 | })
81 | r.POST("/match", handleMatch)
82 |
83 | fmt.Printf("cmserver (v.%s) running on :%s\n", version, port)
84 | // listen and serve on 0.0.0.0:<port>; Run blocks and only returns on error
85 | return r.Run(":" + port)
86 | }
87 | app.Flags = []cli.Flag{
88 | cli.StringFlag{
89 | Name: "port, p",
90 | Value: "8051",
91 | Usage: "port to use to listen",
92 | },
93 | cli.StringFlag{
94 | Name: "list,l",
95 | Value: "",
96 | Usage: "list of phrases to load into closestmatch",
97 | },
98 | cli.StringFlag{
99 | Name: "bags,b",
100 | Value: "2,3",
101 | Usage: "comma separated bags",
102 | },
103 | cli.BoolFlag{
104 | Name: "debug,d",
105 | Usage: "turn on debug mode",
106 | },
107 | }
108 | app.Run(os.Args)
109 |
110 | }
111 |
112 | // test with
113 | // http POST localhost:8051/match s='The War of the Worlds by HG Wells'
114 | func handleMatch(c *gin.Context) {
115 | type QueryJSON struct {
116 | SearchString string `json:"s"`
117 | N int `json:"n"`
118 | }
119 | var json QueryJSON
120 | if c.BindJSON(&json) != nil {
121 | log.Trace("Got %v", json)
122 | c.String(http.StatusBadRequest, `Must provide a JSON body such as {"s": "search string", "n": 3}`)
123 | return
124 | }
125 | log.Trace("Got %v", json)
126 | if json.N == 0 {
127 | c.JSON(http.StatusOK, gin.H{"r": cm.Closest(json.SearchString)})
128 | } else {
129 | c.JSON(http.StatusOK, gin.H{"r": cm.ClosestN(json.SearchString, json.N)})
130 | }
131 | }
132 |
--------------------------------------------------------------------------------
/levenshtein/levenshtein.go:
--------------------------------------------------------------------------------
1 | package levenshtein
2 |
3 | import (
4 | "math/rand"
5 | "strings"
6 | )
7 |
8 | // LevenshteinDistance returns the byte-wise edit distance between *a and *b, adapted
9 | // from https://groups.google.com/forum/#!topic/golang-nuts/YyH1f_qCZVc
10 | // (no min, compute lengths once, pointers, 2 rows array)
11 | // fastest profiled
12 | func LevenshteinDistance(a, b *string) int {
13 | la := len(*a)
14 | lb := len(*b)
15 | d := make([]int, la+1)
16 | var lastdiag, olddiag, temp int
17 |
18 | for i := 1; i <= la; i++ {
19 | d[i] = i
20 | }
21 | for i := 1; i <= lb; i++ {
22 | d[0] = i
23 | lastdiag = i - 1
24 | for j := 1; j <= la; j++ {
25 | olddiag = d[j]
26 | min := d[j] + 1
27 | if (d[j-1] + 1) < min {
28 | min = d[j-1] + 1
29 | }
30 | if (*a)[j-1] == (*b)[i-1] {
31 | temp = 0
32 | } else {
33 | temp = 1
34 | }
35 | if (lastdiag + temp) < min {
36 | min = lastdiag + temp
37 | }
38 | d[j] = min
39 | lastdiag = olddiag
40 | }
41 | }
42 | return d[la]
43 | }
44 |
45 | type ClosestMatch struct {
46 | WordsToTest []string
47 | }
48 |
49 | func New(wordsToTest []string) *ClosestMatch {
50 | cm := new(ClosestMatch)
51 | cm.WordsToTest = wordsToTest
52 | return cm
53 | }
54 |
55 | func (cm *ClosestMatch) Closest(searchWord string) string {
56 | bestVal := 10000
57 | bestWord := ""
58 | for _, word := range cm.WordsToTest {
59 | newVal := LevenshteinDistance(&searchWord, &word)
60 | if newVal < bestVal {
61 | bestVal = newVal
62 | bestWord = word
63 | }
64 | }
65 | return bestWord
66 | }
67 |
68 | func (cm *ClosestMatch) Accuracy() float64 {
69 | rand.Seed(1)
70 | percentCorrect := 0.0
71 | numTrials := 0.0
72 |
73 | for wordTrials := 0; wordTrials < 100; wordTrials++ {
74 |
75 | var testString, originalTestString string
76 | testStringNum := rand.Intn(len(cm.WordsToTest))
77 | // walk to the randomly chosen zero-based index so that every entry,
78 | // including the first one, can be selected
79 | for i, s := range cm.WordsToTest {
80 | if i != testStringNum {
81 | continue
82 | }
83 | originalTestString = s
84 | break
85 | }
86 |
87 | // remove a random word
88 | for trial := 0; trial < 4; trial++ {
89 | words := strings.Split(originalTestString, " ")
90 | if len(words) < 3 {
91 | continue
92 | }
93 | deleteWordI := rand.Intn(len(words))
94 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
95 | testString = strings.Join(words, " ")
96 | if cm.Closest(testString) == originalTestString {
97 | percentCorrect += 1.0
98 | }
99 | numTrials += 1.0
100 | }
101 |
102 | // remove a random word and reverse
103 | for trial := 0; trial < 4; trial++ {
104 | words := strings.Split(originalTestString, " ")
105 | if len(words) > 1 {
106 | deleteWordI := rand.Intn(len(words))
107 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
108 | for left, right := 0, len(words)-1; left < right; left, right = left+1, right-1 {
109 | words[left], words[right] = words[right], words[left]
110 | }
111 | } else {
112 | continue
113 | }
114 | testString = strings.Join(words, " ")
115 | if cm.Closest(testString) == originalTestString {
116 | percentCorrect += 1.0
117 | }
118 | numTrials += 1.0
119 | }
120 |
121 | // remove a random word, shuffle the rest, and replace two random letters
122 | for trial := 0; trial < 4; trial++ {
123 | words := strings.Split(originalTestString, " ")
124 | if len(words) > 1 {
125 | deleteWordI := rand.Intn(len(words))
126 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
127 | for i := range words {
128 | j := rand.Intn(i + 1)
129 | words[i], words[j] = words[j], words[i]
130 | }
131 | }
132 | testString = strings.Join(words, " ")
133 | letters := "abcdefghijklmnopqrstuvwxyz"
134 | if len(testString) == 0 {
135 | continue
136 | }
137 | ii := rand.Intn(len(testString))
138 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
139 | ii = rand.Intn(len(testString))
140 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
141 | if cm.Closest(testString) == originalTestString {
142 | percentCorrect += 1.0
143 | }
144 | numTrials += 1.0
145 | }
146 |
147 | if cm.Closest(testString) == originalTestString {
148 | percentCorrect += 1.0
149 | }
150 | numTrials += 1.0
151 |
152 | }
153 |
154 | return 100.0 * percentCorrect / numTrials
155 | }
156 |
157 | func (cm *ClosestMatch) AccuracySimple() float64 {
158 | rand.Seed(1)
159 | percentCorrect := 0.0
160 | numTrials := 0.0
161 |
162 | for wordTrials := 0; wordTrials < 500; wordTrials++ {
163 |
164 | var testString, originalTestString string
165 | testStringNum := rand.Intn(len(cm.WordsToTest))
166 |
167 | originalTestString = cm.WordsToTest[testStringNum]
168 |
169 | testString = originalTestString
170 |
171 | // letters to replace with
172 | letters := "abcdefghijklmnopqrstuvwxyz"
173 |
174 | choice := rand.Intn(3)
175 | if choice == 0 {
176 | // replace random letter
177 | ii := rand.Intn(len(testString))
178 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
179 | } else if choice == 1 {
180 | // delete random letter
181 | ii := rand.Intn(len(testString))
182 | testString = testString[:ii] + testString[ii+1:]
183 | } else {
184 | // add random letter
185 | ii := rand.Intn(len(testString))
186 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii:]
187 | }
188 | closest := cm.Closest(testString)
189 | if closest == originalTestString {
190 | percentCorrect += 1.0
191 | } else {
192 | //fmt.Printf("Original: %s, Mutilated: %s, Match: %s\n", originalTestString, testString, closest)
193 | }
194 | numTrials += 1.0
195 | }
196 |
197 | return 100.0 * percentCorrect / numTrials
198 | }
199 |
200 | // AccuracyMutatingWords runs some basic tests against the word list to
201 | // estimate how accurately the Levenshtein-distance matcher recovers the
202 | // original entry after words are dropped, reordered, or have letters replaced
203 | func (cm *ClosestMatch) AccuracyMutatingWords() float64 {
204 | rand.Seed(1)
205 | percentCorrect := 0.0
206 | numTrials := 0.0
207 |
208 | for wordTrials := 0; wordTrials < 200; wordTrials++ {
209 |
210 | var testString, originalTestString string
211 | testStringNum := rand.Intn(len(cm.WordsToTest))
212 | originalTestString = cm.WordsToTest[testStringNum]
213 | testString = originalTestString
214 |
215 | var words []string
216 | choice := rand.Intn(3)
217 | if choice == 0 {
218 | // remove a random word
219 | words = strings.Split(originalTestString, " ")
220 | if len(words) < 3 {
221 | continue
222 | }
223 | deleteWordI := rand.Intn(len(words))
224 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
225 | testString = strings.Join(words, " ")
226 | } else if choice == 1 {
227 | // remove a random word and reverse
228 | words = strings.Split(originalTestString, " ")
229 | if len(words) > 1 {
230 | deleteWordI := rand.Intn(len(words))
231 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
232 | for left, right := 0, len(words)-1; left < right; left, right = left+1, right-1 {
233 | words[left], words[right] = words[right], words[left]
234 | }
235 | } else {
236 | continue
237 | }
238 | testString = strings.Join(words, " ")
239 | } else {
240 | // remove a random word and shuffle and replace 2 random letters
241 | words = strings.Split(originalTestString, " ")
242 | if len(words) > 1 {
243 | deleteWordI := rand.Intn(len(words))
244 | words = append(words[:deleteWordI], words[deleteWordI+1:]...)
245 | for i := range words {
246 | j := rand.Intn(i + 1)
247 | words[i], words[j] = words[j], words[i]
248 | }
249 | }
250 | testString = strings.Join(words, " ")
251 | letters := "abcdefghijklmnopqrstuvwxyz"
252 | if len(testString) == 0 {
253 | continue
254 | }
255 | ii := rand.Intn(len(testString))
256 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
257 | ii = rand.Intn(len(testString))
258 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
259 | }
260 | closest := cm.Closest(testString)
261 | if closest == originalTestString {
262 | percentCorrect += 1.0
263 | } else {
264 | //fmt.Printf("Original: %s, Mutilated: %s, Match: %s\n", originalTestString, testString, closest)
265 | }
266 | numTrials += 1.0
267 | }
268 | return 100.0 * percentCorrect / numTrials
269 | }
270 |
271 | // AccuracyMutatingLetters runs some basic tests against the word list to
272 | // estimate how accurately the Levenshtein-distance matcher recovers the
273 | // original entry after a single letter is added, removed, or changed
274 | func (cm *ClosestMatch) AccuracyMutatingLetters() float64 {
275 | rand.Seed(1)
276 | percentCorrect := 0.0
277 | numTrials := 0.0
278 |
279 | for wordTrials := 0; wordTrials < 200; wordTrials++ {
280 |
281 | var testString, originalTestString string
282 | testStringNum := rand.Intn(len(cm.WordsToTest) - 1)
283 | originalTestString = cm.WordsToTest[testStringNum]
284 | testString = originalTestString
285 |
286 | // letters to replace with
287 | letters := "abcdefghijklmnopqrstuvwxyz"
288 |
289 | choice := rand.Intn(3)
290 | if choice == 0 {
291 | // replace random letter
292 | ii := rand.Intn(len(testString))
293 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii+1:]
294 | } else if choice == 1 {
295 | // delete random letter
296 | ii := rand.Intn(len(testString))
297 | testString = testString[:ii] + testString[ii+1:]
298 | } else {
299 | // add random letter
300 | ii := rand.Intn(len(testString))
301 | testString = testString[:ii] + string(letters[rand.Intn(len(letters))]) + testString[ii:]
302 | }
303 | closest := cm.Closest(testString)
304 | if closest == originalTestString {
305 | percentCorrect += 1.0
306 | } else {
307 | //fmt.Printf("Original: %s, Mutilated: %s, Match: %s\n", originalTestString, testString, closest)
308 | }
309 | numTrials += 1.0
310 | }
311 |
312 | return 100.0 * percentCorrect / numTrials
313 | }
314 |
--------------------------------------------------------------------------------
/levenshtein/levenshtein_test.go:
--------------------------------------------------------------------------------
1 | package levenshtein
2 |
3 | import (
4 | "fmt"
5 | "io/ioutil"
6 | "strings"
7 | "testing"
8 |
9 | "github.com/schollz/closestmatch/test"
10 | )
11 |
12 | func BenchmarkNew(b *testing.B) {
13 | for i := 0; i < b.N; i++ {
14 | New(test.WordsToTest)
15 | }
16 | }
17 |
18 | func BenchmarkClosestOne(b *testing.B) {
19 | bText, _ := ioutil.ReadFile("../test/books.list")
20 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
21 | cm := New(wordsToTest)
22 | searchWord := test.SearchWords[0]
23 | b.ResetTimer()
24 | for i := 0; i < b.N; i++ {
25 | cm.Closest(searchWord)
26 | }
27 | }
28 |
29 | func ExampleClosestMatch_Closest() {
30 | cm := New(test.WordsToTest)
31 | for _, searchWord := range test.SearchWords {
32 | fmt.Printf("'%s' matched '%s'\n", searchWord, cm.Closest(searchWord))
33 | }
34 | // Output:
35 | // 'cervantes don quixote' matched 'emma by jane austen'
36 | // 'mysterious afur at styles by christie' matched 'the mysterious affair at styles by agatha christie'
37 | // 'hard times by charles dickens' matched 'hard times by charles dickens'
38 | // 'complete william shakespeare' matched 'the iliad by homer'
39 | // 'war by hg wells' matched 'beowulf'
40 |
41 | }
42 |
43 | func TestAccuracyBookWords(t *testing.T) {
44 | bText, _ := ioutil.ReadFile("../test/books.list")
45 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
46 | cm := New(wordsToTest)
47 | accuracy := cm.AccuracyMutatingWords()
48 | fmt.Printf("Accuracy with mutating words in book list:\t%2.1f%%\n", accuracy)
49 | }
50 |
51 | func TestAccuracyBookLetters(t *testing.T) {
52 | bText, _ := ioutil.ReadFile("../test/books.list")
53 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
54 | cm := New(wordsToTest)
55 | accuracy := cm.AccuracyMutatingLetters()
56 | fmt.Printf("Accuracy with mutating letters in book list:\t%2.1f%%\n", accuracy)
57 | }
58 |
59 | func TestAccuracyDictionaryLetters(t *testing.T) {
60 | bText, _ := ioutil.ReadFile("../test/popular.txt")
61 | wordsToTest := strings.Split(strings.ToLower(string(bText)), "\n")
62 | cm := New(wordsToTest)
63 | accuracy := cm.AccuracyMutatingLetters()
64 | fmt.Printf("Accuracy with mutating letters in dictionary:\t%2.1f%%\n", accuracy)
65 | }
66 |
--------------------------------------------------------------------------------
/test/catcher.txt:
--------------------------------------------------------------------------------
1 | The Catcher in the Rye by J.D. Salinger Student Packet Grades 9-12 (Novel Units Guides) by Gloria Levine
2 | The Catcher in the Rye by J.D. Salinger: A Study Guide by Ray Moore
3 | A Reader's Companion to J.D. Salinger's the Catcher in the Rye by Peter Beidler
4 | A Reader's Companion To J.D. Salinger's The Catcher In The Rye by Peter G. Beidler
5 | Depression in J.D. Salinger's The Catcher in the Rye by Dedria Bryfonski
6 | Critica; Insights: The Catcher in the Rye, by J.D. Salinger by Joseph Dewey
7 | The Catcher in the Rye/Franny and Zooey/Nine Stories/Raise High the Roof Beam, Carpenters by J.D. Salinger
8 | The Catcher in the Rye by J. D. Salinger Summary & Study Guide by BookRags
9 | The Catcher in the Rye and J.D. Salinger by Jonathan Coupland
10 | Monarch Notes: J. D. Salinger's The Catcher in the Rye by Laurie E. Rozakis
11 | J. D. Salinger: The Catcher In The Rye by Brian Donnelly
12 | A Reader's Companion to J. D. Salinger's The Catcher in the Rye by Peter Beidler
13 | Jerome D. Salinger, The Catcher In The Rye by Hans-Otto Jahnke
14 | Robert Cormier, I Am The Cheese, J. D. Salinger, The Catcher In The Rye by Peter Jone
15 | The Catcher in the Rye by Jerome Salinger
16 | J.D. Salinger: The Catcher In The Rye (Barron's Studies in American Literature) by Richard Lettis
17 | J.D. Salinger: The Catcher in the Rye and Other Works by Raychel Haugrud Reiff
18 | The Catcher in the Rye: A Reader's Guide to the J.D. Salinger Novel by Robert Crayola
19 | The Catcher In The Rye, De J.D. Salinger by Claire Bernas-Martel
20 | J.D. Salinger's The Catcher in the Rye by Harold Bloom (Bloom's Modern Critical Interpretations)
21 | J.D. Salingers 'The Catcher in the Rye.' Materialien. (Lernmaterialien) by Herbert Rühl
22 | Cliffs Notes on Salinger's the Catcher in the Rye by Robert B. Kaplan
23 | Cliffs Notes on Salinger's The Catcher in the Rye by Stanley P. Baldwin
24 | J. D. Salinger's the Catcher in the Rye: A Routledge Guide by Sarah Graham
25 | The Catcher in the Rye - and Salinger by Jerome Smith
26 | The Catcher in the Rye and Salinger by Jonathan Coupland
27 | Catcher In The Rye, J.D. Salinger by Nigel Tookey
28 | The Catcher in the Rye and JD Salinger by Andrew Hastings
29 | The Catcher in the Rye Guide and Other Works of JD Salinge by Peter Baxter
30 | The Catcher in the Rye by Joy Leavitt
31 | New Essays on the Catcher in the Rye by Jack Salzman
32 | Salinger's The Catcher in the Rye (Reader's Guides) by Sarah Graham
33 | The Catcher in the Rye by Shmoop
34 | The Catcher in the Rye - A - Z by Jecks Stapley
35 | The Candidate in the Rye: A Parody of The Catcher in the Rye Starring Donald J. Trump by John Marquane
36 | Readings on the Catcher in the Rye (Literary Companion Series) by Steven Engel
37 | The Catcher In The Rye; Owlsgate 35s Study Guide by David Neilson
38 | The Catcher in the Rye (A BookHacker Summary) by BookHacker
39 | The Catcher in the Rye - Barron's Book Notes by Barron's Book Notes
40 | The Catcher in the Rye (SparkNotes Literature Guide) by SparkNotes
41 | The Catcher in the Rye (Study Guide) by Minute Help Guides
42 | The Catcher in the Rye (York Notes) by Nigel Tookey
43 | Masterwork Studies Series: The Catcher in the Rye (Paperback) by Sanford Pinsker
44 | The Catcher in the Rye by J.D. Salinger
--------------------------------------------------------------------------------
/test/data.go:
--------------------------------------------------------------------------------
1 | package test
2 |
3 | import (
4 | "strings"
5 | )
6 |
7 | var books = `Pride and Prejudice by Jane Austen
8 | Alice's Adventures in Wonderland by Lewis Carroll
9 | The Importance of Being Earnest: A Trivial Comedy for Serious People by Oscar Wilde
10 | A Tale of Two Cities by Charles Dickens
11 | A Doll's House : a play by Henrik Ibsen
12 | Frankenstein; Or, The Modern Prometheus by Mary Wollstonecraft Shelley
13 | The Yellow Wallpaper by Charlotte Perkins Gilman
14 | The Adventures of Tom Sawyer by Mark Twain
15 | Metamorphosis by Franz Kafka
16 | Adventures of Huckleberry Finn by Mark Twain
17 | Light Science for Leisure Hours by Richard A. Proctor
18 | Grimms' Fairy Tales by Jacob Grimm and Wilhelm Grimm
19 | Jane Eyre: An Autobiography by Charlotte Brontë
20 | Dracula by Bram Stoker
21 | Moby Dick; Or, The Whale by Herman Melville
22 | The Adventures of Sherlock Holmes by Arthur Conan Doyle
23 | Il Principe. English by Niccolò Machiavelli
24 | Emma by Jane Austen
25 | Great Expectations by Charles Dickens
26 | The Picture of Dorian Gray by Oscar Wilde
27 | Beyond the Hills of Dream by W. Wilfred Campbell
28 | The Hospital Murders by Means Davis and Augusta Tucker Townsend
29 | Dirty Dustbins and Sloppy Streets by H. Percy Boulnois
30 | Leviathan by Thomas Hobbes
31 | The Count of Monte Cristo, Illustrated by Alexandre Dumas
32 | Heart of Darkness by Joseph Conrad
33 | Ulysses by James Joyce
34 | War and Peace by graf Leo Tolstoy
35 | Narrative of the Life of Frederick Douglass, an American Slave by Frederick Douglass
36 | The Radio Boys Seek the Lost Atlantis by Gerald Breckenridge
37 | The Bab Ballads by W. S. Gilbert
38 | Wuthering Heights by Emily Brontë
39 | The Awakening, and Selected Short Stories by Kate Chopin
40 | The Romance of Lust: A Classic Victorian erotic novel by Anonymous
41 | Beowulf
42 | Les Misérables by Victor Hugo
43 | Siddhartha by Hermann Hesse
44 | The Kama Sutra of Vatsyayana by Vatsyayana
45 | Treasure Island by Robert Louis Stevenson
46 | Dubliners by James Joyce
47 | Reminiscences of Western Travels by Shao Xiang Lin
48 | The Souls of Black Folk by W. E. B. Du Bois
49 | Leaves of Grass by Walt Whitman
50 | A Christmas Carol in Prose; Being a Ghost Story of Christmas by Charles Dickens
51 | Tractatus Logico-Philosophicus by Ludwig Wittgenstein
52 | A Modest Proposal by Jonathan Swift
53 | Essays of Michel de Montaigne — Complete by Michel de Montaigne
54 | Prestuplenie i nakazanie. English by Fyodor Dostoyevsky
55 | Practical Grammar and Composition by Thomas Wood
56 | A Study in Scarlet by Arthur Conan Doyle
57 | Sense and Sensibility by Jane Austen
58 | Don Quixote by Miguel de Cervantes Saavedra
59 | Peter Pan by J. M. Barrie
60 | The Republic by Plato
61 | The Life and Adventures of Robinson Crusoe by Daniel Defoe
62 | The Strange Case of Dr. Jekyll and Mr. Hyde by Robert Louis Stevenson
63 | Gulliver's Travels into Several Remote Nations of the World by Jonathan Swift
64 | My Secret Life, Volumes I. to III. by Anonymous
65 | Beyond Good and Evil by Friedrich Wilhelm Nietzsche
66 | The Brothers Karamazov by Fyodor Dostoyevsky
67 | The Time Machine by H. G. Wells
68 | Also sprach Zarathustra. English by Friedrich Wilhelm Nietzsche
69 | The Federalist Papers by Alexander Hamilton and John Jay and James Madison
70 | Songs of Innocence, and Songs of Experience by William Blake
71 | The Iliad by Homer
72 | Hastings & Environs; A Sketch-Book by H. G. Hampton
73 | The Hound of the Baskervilles by Arthur Conan Doyle
74 | The Children of Odin: The Book of Northern Myths by Padraic Colum
75 | Autobiography of Benjamin Franklin by Benjamin Franklin
76 | The Divine Comedy by Dante, Illustrated by Dante Alighieri
77 | Hedda Gabler by Henrik Ibsen
78 | Hard Times by Charles Dickens
79 | The Jungle Book by Rudyard Kipling
80 | The Real Captain Kidd by Cornelius Neale Dalton
81 | On Liberty by John Stuart Mill
82 | The Complete Works of William Shakespeare by William Shakespeare
83 | The Tragical History of Doctor Faustus by Christopher Marlowe
84 | Anne of Green Gables by L. M. Montgomery
85 | The Jungle by Upton Sinclair
86 | The Tragedy of Romeo and Juliet by William Shakespeare
87 | De l'amour by Charles Baudelaire and Félix-François Gautier
88 | Ethan Frome by Edith Wharton
89 | Oliver Twist by Charles Dickens
90 | The Turn of the Screw by Henry James
91 | The Wonderful Wizard of Oz by L. Frank Baum
92 | The Legend of Sleepy Hollow by Washington Irving
93 | The Ship of Coral by H. De Vere Stacpoole
94 | Democracy and Education: An Introduction to the Philosophy of Education by John Dewey
95 | Candide by Voltaire
96 | Pygmalion by Bernard Shaw
97 | Walden, and On The Duty Of Civil Disobedience by Henry David Thoreau
98 | Three Men in a Boat by Jerome K. Jerome
99 | A Portrait of the Artist as a Young Man by James Joyce
100 | Manifest der Kommunistischen Partei. English by Friedrich Engels and Karl Marx
101 | Through the Looking-Glass by Lewis Carroll
102 | Le Morte d'Arthur: Volume 1 by Sir Thomas Malory
103 | The Mysterious Affair at Styles by Agatha Christie
104 | Korean—English Dictionary by Leon Kuperman
105 | The War of the Worlds by H. G. Wells
106 | A Concise Dictionary of Middle English from A.D. 1150 to 1580 by A. L. Mayhew and Walter W. Skeat
107 | Armageddon in Retrospect by Kurt Vonnegut
108 | Red Riding Hood by Sarah Blakley-Cartwright
109 | The Kingdom of This World by Alejo Carpentier
110 | Hitty, Her First Hundred Years by Rachel Field`
111 |
112 | var WordsToTest []string
113 | var SearchWords = []string{"cervantes don quixote", "mysterious afur at styles by christie", "hard times by charles dickens", "complete william shakespeare", "War by HG Wells"}
114 |
115 | func init() {
116 | WordsToTest = strings.Split(strings.ToLower(books), "\n")
117 | for i := range SearchWords {
118 | SearchWords[i] = strings.ToLower(SearchWords[i])
119 | }
120 | }
121 |
--------------------------------------------------------------------------------
/test/potter.txt:
--------------------------------------------------------------------------------
1 | Harry Potter And The Half Blood Prince Deluxe Gift Book by BBC
2 | Harry Potter and the Half Blood Prince: The Interactive Quiz Book (The Harry Potter Series.) by Julia Reed
3 | Harry Potter And The Half Blood Prince: Poster Annual 2010 by BBC
4 | Harry Potter And The Half Blood Prince: (Piano Solo) by Nicholas Hooper
5 | Harry Potter and the Half-Blood Prince by Shmoop
6 | Garri Potter i Princ Polukrovka / Harry Potter and the Half-Blood Prince [IN RUSSIAN] by Rouling Dzh.
7 | Mark Reads Harry Potter and the Half-Blood Prince by Mark Oshiro (Mark Reads Harry Potter #6)
8 | Harry Potter Films (Film Guide): Harry Potter and the Order of the Phoenix, List of Harry Potter Cast Members, Harry Potter and the Half-Blood Prince, by Books Group
9 | Selections from Harry Potter and the Half-Blood Prince: Piano Solos by Songbook
10 | Harry Potter and the Half-Blood Prince: Movie Poster Book by Scholastic
11 | The Ultimate Unofficial Harry Potter® Trivia Book: Secrets, Mysteries And Fun Facts Including Half Blood Prince Book 6 by Daniel Lawrence
12 | Unauthorized Half-Blood Prince Update: News and Speculation about Harry Potter Book Six by J. K. Rowling by W. Frederick Zimmerman
13 | Harry Potter and the Sorcerer's Stone: Book 1 - Novel by J.K Rowling -- Summary & More! by Ez- Summary
14 | Harry Potter and the Goblet of Fire by J. K. Rowling | Chapter Outlines by BookRags
15 | Harry Potter and the Order of the Court: The J.K. Rowling Copyright Case and the Question of Fair Use by Robert S. Want
16 | Harry Potter And The Order Of Phoenix: A Summary About This Novel Of J.K Rowling!! (Harry Potter And The Order Of Phoenix: A Detailed Summary-- Book 5, Box Set, Novel, Rowling) by The Summary Guy
17 | Harry Potter and the Charming Prince by slashpervert (The Bound Prince #7)
18 | Myths and Symbols in J.K. Rowling's Harry Potter and the Philosopher's Stone by Volker Geyer
19 | Harry Potter and the Order of the Phoenix (Harry Potter, #5, Part 1) by J.K. Rowling
20 | Harry Potter and the Order of the Phoenix by J. K. Rowling | Chapter Outlines by BookRags
21 | Buchspicker: Übersetzungshilfe Zu "Harry Potter And The Deathly Hallows" (Harry Potter 7): Ausgewählte Vokabeln Für Jede Seite Des Romans Von J.K. Rowling by Thorsten Hinrichsen
22 | Buchspicker: Übersetzungshilfe zu "Harry Potter and the philosopher's stone" und "Harry Potter and the chamber of secrets" (Harry Potter 1 + 2) ausgewählte Vokabeln für jede Seite der Romane von J. K. Rowling by Thorsten Hinrichsen
--------------------------------------------------------------------------------