├── markov
│   ├── __init__.py
│   ├── markov.py
│   └── tests.py
├── .travis.yml
├── LICENSE
├── setup.py
└── README.md
--------------------------------------------------------------------------------
/markov/__init__.py:
--------------------------------------------------------------------------------
from markov import Markov
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: python
python:
  - "2.6"
  - "2.7"
  # - "3.2"
install:
  - "python setup.py test"
  - "easy_install *.egg"
script: "PATH=$PATH:/tmp/bin nosetests"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright 2016 Wieden+Kennedy

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
import os
from setuptools import setup, find_packages

ROOT_DIR = os.path.dirname(__file__)
SOURCE_DIR = os.path.join(ROOT_DIR)

setup(
    name="python-markov",
    description="Utility methods for generating and storing Markov chains with Python and Redis",
    author="Grant Thomas",
    author_email="grant.thomas@wk.com",
    url="https://github.com/wieden-kennedy/python-markov",
    version="0.0.1",
    install_requires=["redis"],
    packages=find_packages(),
    zip_safe=False,
    classifiers=[
        "Programming Language :: Python",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: OS Independent",
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Topic :: Internet :: WWW/HTTP",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Python-Markov

Python-Markov is a Python library for storing Markov chains in a Redis database.
You can use it to score lines for "good fit" or generate random texts based on your collected data.
The library is optimized for storing and scoring short pieces of text (sentences, tweets, etc.).

## Markov Chains
*Q: What is a Markov chain and why would I use this library?*

A: In mathematical terms, a Markov chain is a sequence of values where the next value depends only on the current value (and not on past values). It's
basically a really simple state machine: given the present state, the future state is conditionally independent of the past.

Markov chains have many real-world applications. For example, [Google's PageRank](http://ilpubs.stanford.edu:8090/422/) algorithm is essentially a
Markov chain over a graph of the web.
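To make that concrete, here is a tiny self-contained sketch (just an illustration, not part of this library) of a two-state Markov chain written as a transition table. Each step depends only on the current state:

```python
# Illustration only, not part of python-markov: a two-state Markov chain
# expressed as a transition table.
import random

TRANSITIONS = {
    'sunny': [('sunny', 0.8), ('rainy', 0.2)],
    'rainy': [('sunny', 0.4), ('rainy', 0.6)],
}

def step(state):
    """Pick the next state using only the current state's transition probabilities."""
    roll = random.random()
    for next_state, probability in TRANSITIONS[state]:
        roll -= probability
        if roll < 0:
            break
    return next_state

state = 'sunny'
walk = [state]
for _ in range(10):
    state = step(state)
    walk.append(state)
print(walk)  # e.g. ['sunny', 'sunny', 'rainy', 'rainy', 'sunny', ...]
```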
One of the simplest and most well-known applications of Markov chains is generating "realistic"-looking texts based on some set
of input texts.

In the case of text, a Markov chain could be used to answer the question, "Given the present word (or set of words), which words might possibly follow?"
You could also use Markov chains to answer the question, "Given the present word, how likely is it that this word I've chosen would be the next?"

The goal of Python-Markov is to store Markov chains that model your choice of text. You can use the included functions to generate new pieces of
text that resemble your input values. You can also score a given piece of text for "good fit" with your data set.

When you add a piece of text to Python-Markov, it breaks it down into keys and possible completions, each with a frequency.
For example, let's say you had two sentences:
```
"I ate a pizza."
and
"I ate a sandwich."
```

If you use 2-word keys and 1-word completions, when you add these sentences to your model you'd end up with something like this:
```
key:"I ate"      completions: [ (text: "a", frequency: 2) ]
key:"ate a"      completions: [ (text: "pizza", frequency: 1), (text: "sandwich", frequency: 1) ]
key:"a sandwich" completions: [ (text: EOL, frequency: 1) ]

and so on....
```
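The bookkeeping behind that listing looks roughly like the sketch below, which uses plain in-memory dicts instead of Redis purely for illustration (python-markov itself keeps the counts in Redis sorted sets):

```python
# Sketch of the key/completion decomposition, using plain dicts instead of
# Redis, purely for illustration.
from collections import defaultdict

EOL = '<EOL>'  # stand-in for python-markov's STOP token

model = defaultdict(lambda: defaultdict(int))

def add_line(words, key_length=2):
    # slide a key_length-sized window over the words; whatever follows the
    # window (or EOL at the end of the line) is the completion
    for i in range(len(words) - key_length + 1):
        key = tuple(words[i:i + key_length])
        completion = words[i + key_length] if i + key_length < len(words) else EOL
        model[key][completion] += 1

add_line(["I", "ate", "a", "pizza."])
add_line(["I", "ate", "a", "sandwich."])

print(dict(model[("I", "ate")]))   # {'a': 2}
print(dict(model[("ate", "a")]))   # {'pizza.': 1, 'sandwich.': 1}
```

The library's add_line_to_index() performs the same sliding-window walk (recursively) and increments a counter in a sorted set per key.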
76 | 77 | ```python 78 | from markov import Markov 79 | 80 | twitter_data = Markov(prefix="twitter") 81 | twitter_data.add_line_to_index(["eating", "sushi", "with", "my", "cat"]) 82 | ``` 83 | 84 | It's recommended that you use the Markov class to add texts to your model, score texts and generate new text. 85 | 86 | For example, let's say you've collected a lot of Oprah transcripts and stored them in your model. Scoring a text 87 | would look something like this: 88 | 89 | ```python 90 | from markov import Markov 91 | 92 | #oprah_data is a Markov model filled with Oprah quotes 93 | oprah_data = Markov(prefix="oprah") 94 | 95 | #sentence is our guess at something Oprah might say 96 | sentence = ["you", "get", "a", "car"] 97 | 98 | #let's ask oprah_data how we did 99 | score = oprah_data.score_for_line(sentence) 100 | # at this point, score is probably something like 100 101 | 102 | other_sentence = ["you", "get", "rusty", "razor", "blades"] 103 | 104 | score = oprah_data.score_for_line(other_sentence) 105 | #since oprah probably never said this the score is probably much lower, like 30 or 50 106 | ``` 107 | 108 | You can also generate text from your Markov model. Let's say you put a bunch of tweets in your model and you wanted 109 | to generate a representative sample: 110 | 111 | ```python 112 | from markov import Markov 113 | 114 | tweet_data = Markov(prefix="tweets") 115 | 116 | new_tweet = tweet_data.generate(max_words=6) 117 | #new_tweet will be something like ["omg", "i", "love", "snax"] 118 | ``` 119 | 120 | If you want your text to start with a certain key, you can seed it like so: 121 | 122 | ```python 123 | new_tweet = tweet_data.generate(seed=['i','love'], max_words=6) 124 | #new_tweet will be something like ['i', 'love', 'to', 'eat', 'snax'] 125 | ``` 126 | 127 | you can use the max_words argument to determine the maximum number of tokens to include in the generated sequence 128 | ```python 129 | new_tweet = tweet_data.generate(max_words=100) 130 | #new tweet could be 2-100 words long 131 | ``` 132 | 133 | if not, Markov.generate() will continue to generate texts up to 1000 words long by default until it choses a STOP character 134 | at which point it will stop. 135 | 136 | ```python 137 | new_tweet = tweet_data.generate() 138 | #new tweet could be really long, or not! 139 | ``` 140 | 141 | *Note: the functions in python-markov use recursion, so don't add a sequence more than 1000 items long to your index or 142 | you'll get an error because Python is like that* 143 | -------------------------------------------------------------------------------- /markov/markov.py: -------------------------------------------------------------------------------- 1 | """ 2 | Functions for generating simple markov chains for sequences of words. Allows for scoring of 3 | sentences based on completion frequency. 
4 | """ 5 | import random 6 | import redis 7 | 8 | PREFIX = 'markov' 9 | SEPARATOR=':' 10 | STOP='\x02' 11 | 12 | PUNCTUATION = [",", ".", ";", "!","?","(",")", "...", "....", "....."] 13 | 14 | class Markov(object): 15 | """ 16 | Simple wrapper for markov functions 17 | """ 18 | def __init__(self, prefix=None, key_length=2, completion_length=1, db=0, host='localhost', port=6379, password=None): 19 | self.client = redis.Redis(db=db, host=host, port=port, password=password) 20 | self.prefix = prefix or PREFIX 21 | self.key_length = key_length 22 | self.completion_length = completion_length 23 | 24 | def add_line_to_index(self, line): 25 | add_line_to_index(line, self.client, self.key_length, self.completion_length, self.prefix) 26 | 27 | def score_for_line(self, line): 28 | return score_for_line(line, self.client, self.key_length, self.completion_length, self.prefix) 29 | 30 | def generate(self, seed=None, max_words=1000, count_punctuation=True, relevant_terms=None): 31 | return generate(self.client, seed=seed, prefix=self.prefix, max_words=max_words, key_length=self.key_length, count_punctuation=count_punctuation, relevant_terms=relevant_terms) 32 | 33 | def flush(self, prefix=None): 34 | if prefix is not None: 35 | keys = self.client.keys("%s*" % self.prefix) 36 | for key in keys: 37 | self.client.delete(key) 38 | 39 | 40 | def add_line_to_index(line, client, key_length=2, completion_length=1, prefix=PREFIX): 41 | """ 42 | Add a line to our index of markov chains 43 | 44 | @param line: a list of words 45 | @param key_length: the desired length for our keys 46 | @param completion_length: the desired completion length 47 | """ 48 | key, completion = get_key_and_completion(line, key_length, completion_length, prefix) 49 | if key and completion: 50 | completion = make_key(completion) 51 | client.zincrby(key, completion) 52 | add_line_to_index(line[1:], client, key_length, completion_length, prefix) 53 | else: 54 | return 55 | 56 | def make_key(key, prefix=None): 57 | """ 58 | Construct a Redis-friendly key from the list or tuple provided 59 | """ 60 | if type(key) not in [str, unicode]: 61 | key = SEPARATOR.join(key) 62 | if prefix: 63 | key = SEPARATOR.join((prefix, key)) 64 | return key 65 | 66 | def max_for_key(key, client): 67 | """ 68 | Get the maximum score for a completion on this key 69 | """ 70 | maximum = client.zrevrange(key, 0, 0, withscores=True) 71 | if maximum: 72 | return maximum[0][1] 73 | else: 74 | return 0 75 | 76 | def min_for_key(key, client): 77 | """ 78 | Get the minimum score for a completion on this key 79 | """ 80 | minimum = client.zrange(key, 0, 0, withscores=True) 81 | if minimum: 82 | return minimum[0][1] 83 | else: 84 | return 0 85 | 86 | def score_for_completion(key, completion, client, normalize_to=100): 87 | """ 88 | Get the normalized score for a completion 89 | """ 90 | raw_score = client.zscore(key, make_key(completion)) or 0 91 | maximum = max_for_key(key, client) or 1 92 | return (raw_score/maximum) * normalize_to 93 | 94 | def _score_for_line(line, client, key_length, completion_length, prefix, count=0): 95 | """ 96 | Recursive function for iterating over all possible key/completion sets in a line 97 | and scoring them 98 | """ 99 | score = 0 100 | key, completion = get_key_and_completion(line, key_length, completion_length, prefix) 101 | if key and completion: 102 | score = score_for_completion(key, completion, client) 103 | new_score, count = _score_for_line(line[1:], client, key_length, completion_length, prefix, count+1) 104 | score += new_score 
def _score_for_line(line, client, key_length, completion_length, prefix, count=0):
    """
    Recursive helper that iterates over all possible key/completion pairs in a line,
    scoring each pair and counting how many pairs were scored
    """
    score = 0
    key, completion = get_key_and_completion(line, key_length, completion_length, prefix)
    if key and completion:
        score = score_for_completion(key, completion, client)
        new_score, count = _score_for_line(line[1:], client, key_length, completion_length, prefix, count + 1)
        score += new_score
    else:
        score = 0
    return score, count

def score_for_line(line, client, key_length=2, completion_length=1, prefix=PREFIX):
    """
    Score a line of text for fit with our markov model: the average of the
    normalized scores of its key/completion pairs
    """
    score, count = _score_for_line(line, client, key_length, completion_length, prefix)
    if count > 0:
        return score / count
    else:
        return 0

def generate(client, seed=None, prefix=None, max_words=1000, key_length=2, count_punctuation=True, relevant_terms=None):
    """
    Generate a sequence of tokens via a random walk over our model,
    stopping at max_words or when a STOP token is chosen
    """
    if seed is None:
        key, seed = get_key_and_seed(client, prefix, relevant_terms=relevant_terms)
    else:
        key = make_key(seed[-1 * key_length:], prefix=prefix)

    completion = get_completion(client, key, relevant_terms=relevant_terms)
    # if we've found a completion, continue
    if completion:
        completion = completion.split(SEPARATOR)
        if count_tokens(seed, count_punctuation) + count_tokens(completion, count_punctuation) < max_words:
            seed += completion
            return generate(client, seed=seed, prefix=prefix, max_words=max_words, key_length=key_length, count_punctuation=count_punctuation, relevant_terms=relevant_terms)
        elif count_tokens(seed, count_punctuation) + count_tokens(completion, count_punctuation) == max_words:
            if STOP in completion:
                completion.remove(STOP)
            return seed + completion
        else:
            # the completion would push us past max_words, so stop here
            # (without this branch, multi-token completions could overshoot
            # the limit and fall through, returning None)
            if STOP in seed:
                seed.remove(STOP)
            return seed
    else:
        if STOP in seed:
            seed.remove(STOP)
        return seed

def count_tokens(seed, count_punctuation=True):
    """
    Count the tokens in the given seed, optionally ignoring punctuation
    """
    if count_punctuation:
        return len(seed)
    else:
        return len([item for item in seed if item not in PUNCTUATION])


def get_key_and_seed(client, prefix=None, relevant_terms=None):
    """
    Wraps get_random_key_and_seed and get_relevant_key_and_seed
    """
    if relevant_terms and len(relevant_terms) > 0:
        return get_relevant_key_and_seed(client, relevant_terms, prefix)
    else:
        return get_random_key_and_seed(client, prefix)

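# Note: the two seed helpers below use the Redis KEYS command, which scans the
# entire keyspace and can block the server while it runs. That's fine for small
# data sets; for large ones, consider keeping an index of keys or using
# SCAN-based iteration instead.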
185 | """ 186 | tried = 0 187 | key = None 188 | seed = [] 189 | while (len(seed) == 0 or (len([item for item in seed if item in PUNCTUATION]) > 0)) and tried < tries: 190 | keys = [] 191 | for term in relevant_terms: 192 | if prefix: 193 | keys += client.keys("%s%s*%s*" % (prefix, SEPARATOR, term)) 194 | else: 195 | keys += client.keys("*%s*" % term) 196 | try: 197 | key = random.choice(list(set(keys))) 198 | seed = key.split(SEPARATOR) 199 | except IndexError: 200 | # there were no matching keys 201 | break 202 | tried += 1 203 | if prefix in seed: 204 | seed.remove(prefix) 205 | return key, seed 206 | 207 | 208 | def get_completion(client, key, exclude=[], relevant_terms=None): 209 | """ 210 | Get a possible completion for some key 211 | """ 212 | completion = None 213 | completions = client.zrevrange(key, 0, -1) 214 | completions = [item for item in completions if item not in exclude] 215 | if len(completions) > 0: 216 | if relevant_terms: 217 | #make an attempt to use one of the relevant terms 218 | try: 219 | completion = random.choice([item for item in completions if item in relevant_terms]) 220 | except IndexError: 221 | pass 222 | if completion is None: 223 | completion = random.choice(completions) 224 | return completion 225 | 226 | 227 | def get_key_and_completion(line, key_length, completion_length, prefix): 228 | """ 229 | Get a key and completion from the given list of words 230 | """ 231 | if len(line) >= key_length and STOP not in line[0:key_length]: 232 | key = make_key(line[0:key_length], prefix=prefix) 233 | if completion_length > 1: 234 | completion = line[key_length:key_length+completion_length] 235 | else: 236 | try: 237 | completion = line[key_length] 238 | except IndexError: 239 | completion = STOP 240 | completion = make_key(completion) 241 | return (key, completion) 242 | else: 243 | return (False,False) 244 | -------------------------------------------------------------------------------- /markov/tests.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tests for the markov package 3 | """ 4 | import redis 5 | import unittest 6 | import markov 7 | from markov import Markov, add_line_to_index, make_key, max_for_key, min_for_key,\ 8 | score_for_completion, score_for_line, get_key_and_completion, generate, get_relevant_key_and_seed, \ 9 | get_random_key_and_seed, get_completion, STOP 10 | 11 | class TestMarkovFunctions(unittest.TestCase): 12 | """ 13 | Test our markov chain construction and phrase scoring functions 14 | """ 15 | def setUp(self): 16 | """ 17 | Create a redis client and define a prefix space for this test 18 | """ 19 | self.client = redis.Redis(db=11) 20 | self.prefix = 'test' 21 | 22 | def test_make_key(self): 23 | """ 24 | Test that the make_key function behaves as expected 25 | """ 26 | key = make_key(('foo','bar'),self.prefix) 27 | self.assertEqual(key, 'test:foo:bar') 28 | key = make_key(('foo','bar')) 29 | self.assertEqual(key, 'foo:bar') 30 | 31 | def test_score_for_completion(self): 32 | """ 33 | Test that score_for_completion scores completions according to our model 34 | """ 35 | self.test_add_line_to_index() 36 | self.assertEqual(score_for_completion('test:i:ate', 'a', self.client), 100) 37 | self.assertEqual(score_for_completion('test:i:ate', 'one', self.client), 50) 38 | 39 | def test_max_for_key(self): 40 | """ 41 | Test that max_for_key correctly finds the frequency of the most common completion 42 | """ 43 | self.test_add_line_to_index() 44 | self.assertEqual(max_for_key('test:i:ate', 
--------------------------------------------------------------------------------
/markov/tests.py:
--------------------------------------------------------------------------------
"""
Tests for the markov package
"""
import redis
import unittest
import markov
from markov import Markov, add_line_to_index, make_key, max_for_key, min_for_key,\
    score_for_completion, score_for_line, get_key_and_completion, generate, get_relevant_key_and_seed, \
    get_random_key_and_seed, get_completion, STOP

class TestMarkovFunctions(unittest.TestCase):
    """
    Test our markov chain construction and phrase scoring functions
    """
    def setUp(self):
        """
        Create a redis client and define a prefix space for this test
        """
        self.client = redis.Redis(db=11)
        self.prefix = 'test'

    def test_make_key(self):
        """
        Test that the make_key function behaves as expected
        """
        key = make_key(('foo', 'bar'), self.prefix)
        self.assertEqual(key, 'test:foo:bar')
        key = make_key(('foo', 'bar'))
        self.assertEqual(key, 'foo:bar')

    def test_score_for_completion(self):
        """
        Test that score_for_completion scores completions according to our model
        """
        self.test_add_line_to_index()
        self.assertEqual(score_for_completion('test:i:ate', 'a', self.client), 100)
        self.assertEqual(score_for_completion('test:i:ate', 'one', self.client), 50)

    def test_max_for_key(self):
        """
        Test that max_for_key correctly finds the frequency of the most common completion
        """
        self.test_add_line_to_index()
        self.assertEqual(max_for_key('test:i:ate', self.client), 2)
        self.assertEqual(max_for_key('test:stupidkey', self.client), 0)

    def test_min_for_key(self):
        """
        Test that min_for_key correctly finds the frequency of the least common completion
        """
        self.test_add_line_to_index()
        self.assertEqual(min_for_key('test:i:ate', self.client), 1)
        self.assertEqual(min_for_key('test:stupidkey', self.client), 0)

    def test_add_line_to_index(self):
        """
        Test that adding lines behaves as expected
        """
        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

        add_line_to_index(line, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:a", "peach"), 1.0)

        add_line_to_index(line1, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:a", "peach"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:one", "peach"), 1.0)
        self.assertEqual(self.client.zscore("test:i:ate", "one"), 1.0)

        add_line_to_index(line2, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 2)
        self.assertEqual(self.client.zscore("test:ate:a", "sandwich"), 1.0)

    def test_score_for_line(self):
        """
        Ensure that score_for_line rates lines according to our model
        """
        self.test_add_line_to_index()
        line = ['i', 'ate', 'a', 'peach']
        line2 = ['i', 'ate', 'a', 'pizza']
        line3 = ['i', 'ate', 'one', 'sandwich']

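        # Expected values, per score_for_completion's normalization:
        #   line:  every pair was seen with maximal frequency  -> (100 + 100 + 100) / 3 == 100
        #   line2: only 'i ate' -> 'a' was seen                -> (100 + 0 + 0) / 3 == 100 / 3
        #   line3: 'i ate' -> 'one' has half the max frequency -> (50 + 0 + 0) / 3 == 50 / 3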
        self.assertEqual(score_for_line(line, self.client, prefix=self.prefix), 100)
        self.assertEqual(score_for_line(line2, self.client, prefix=self.prefix), 100.0/3)
        self.assertEqual(score_for_line(line3, self.client, prefix=self.prefix), 50.0/3)

    def test_get_key_and_completion(self):
        """
        Ensure that get_key_and_completion finds the expected keys and completions based
        on key_length and completion_length
        """
        line = ['i', 'ate', 'a', 'peach']

        key, completion = get_key_and_completion(line, 2, 1, self.prefix)
        self.assertEqual(key, 'test:i:ate')
        self.assertEqual(completion, 'a')

        key, completion = get_key_and_completion(line, 2, 2, self.prefix)
        self.assertEqual(key, 'test:i:ate')
        self.assertEqual(completion, 'a:peach')

        key, completion = get_key_and_completion(line, 3, 2, self.prefix)
        self.assertEqual(key, 'test:i:ate:a')
        self.assertEqual(completion, 'peach')

        key, completion = get_key_and_completion(line, 4, 1, self.prefix)
        self.assertEqual(key, 'test:i:ate:a:peach')
        self.assertEqual(completion, markov.STOP)


    def test_generate(self):
        """
        Test the generate function
        """
        self.test_add_line_to_index()
        generated = generate(self.client, prefix=self.prefix, max_words=3)
        assert len(generated) >= 2
        assert len(generated) <= 3
        generated = generate(self.client, seed=['ate', 'one'], prefix=self.prefix, max_words=3)
        assert generated[2] == 'peach'
        assert 'sandwich' not in generated

        # test that relevant terms will be chosen when the relevant_terms argument is passed in
        generated = generate(self.client, relevant_terms=["peach",], prefix=self.prefix)
        assert 'peach' in generated
        generated = generate(self.client, relevant_terms=["sandwich",], prefix=self.prefix)
        assert 'sandwich' in generated

        # there are no pizza keys!
        generated = generate(self.client, relevant_terms=["pizza",], prefix=self.prefix)
        assert len(generated) == 0

    def test_get_relevant_key_and_seed(self):
        """
        Test that get_relevant_key_and_seed functions as expected
        """
        # we get a key with sandwich in it
        self.test_add_line_to_index()
        key, seed = get_relevant_key_and_seed(self.client, relevant_terms=["sandwich",], prefix=self.prefix)
        assert "sandwich" in seed

        # pizza is not in our data set, so we get nothing
        key, seed = get_relevant_key_and_seed(self.client, relevant_terms=["pizza",], prefix=self.prefix)
        assert seed == []
        assert key is None

    def test_get_random_key_and_seed(self):
        """
        Test that get_random_key_and_seed functions as expected
        """
        self.test_add_line_to_index()
        key, seed = get_random_key_and_seed(self.client, prefix=self.prefix)
        assert len(seed) == 2
        assert self.prefix not in seed
        assert self.prefix in key

    def test_get_completion(self):
        """
        Test the get_completion method
        """
        self.test_add_line_to_index()
        key, seed = get_random_key_and_seed(self.client, prefix=self.prefix)
        if STOP not in seed:
            assert get_completion(self.client, key) is not None
        else:
            assert get_completion(self.client, key) is None
        key = "test:i:ate"
        assert get_completion(self.client, key) in ["a", "one"]
        # ensure that exclude works as expected
        assert get_completion(self.client, key, exclude=["a",]) == "one"
        assert get_completion(self.client, key, exclude=["a", "one"]) is None

        # ensure that relevant_terms works as well
        assert get_completion(self.client, key, relevant_terms=["one",]) == "one"


    def tearDown(self):
        """
        clean up our redis keys
        """
        keys = self.client.keys(self.prefix + "*")
        for key in keys:
            self.client.delete(key)


class TestMarkovClass(unittest.TestCase):
    """
    Test that the Markov wrapper class behaves as expected
    """
    def setUp(self):
        self.markov = Markov(prefix="testclass", db=11)

    def test_add_line_to_index(self):
        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

        self.markov.add_line_to_index(line)
        self.markov.add_line_to_index(line1)
        self.markov.add_line_to_index(line2)
        self.assertEqual(self.markov.client.zscore("testclass:i:ate", "a"), 2.0)
        self.assertEqual(self.markov.client.zscore("testclass:ate:a", "peach"), 1.0)

    def test_score_for_line(self):
        self.test_add_line_to_index()
        line = ['i', 'ate', 'a', 'peach']
        self.assertEqual(self.markov.score_for_line(line), 100)


    def test_generate(self):
        self.test_add_line_to_index()
        generated = self.markov.generate(max_words=3)
        assert len(generated) >= 2
        assert len(generated) <= 3
        generated = self.markov.generate(seed=['ate', 'one'], max_words=3)
        assert 'peach' in generated
        assert 'sandwich' not in generated

    def test_flush(self):
        m1 = Markov(prefix="one", db=5)
        m2 = Markov(prefix="two", db=5)

        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

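        # Each 4-word line yields three 2-word keys, so m1's three lines below
        # produce 6 distinct keys under "one:" (i:ate, ate:a, a:peach, ate:one,
        # one:peach, a:sandwich), and important_line produces 3 under "two:"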
        m1.add_line_to_index(line)
        m1.add_line_to_index(line1)
        m1.add_line_to_index(line2)

        important_line = ['we', 'all', 'have', 'phones']

        m2.add_line_to_index(important_line)

        r = redis.Redis(db=5)
        assert len(r.keys("one:*")) == 6
        assert len(r.keys("two:*")) == 3

        m1.flush(prefix="one")

        assert len(r.keys("one:*")) == 0
        assert len(r.keys("two:*")) == 3

        m2.flush(prefix="two")

        assert len(r.keys("one:*")) == 0
        assert len(r.keys("two:*")) == 0

    def tearDown(self):
        """
        clean up our redis keys
        """
        keys = self.markov.client.keys(self.markov.prefix + "*")
        for key in keys:
            self.markov.client.delete(key)
--------------------------------------------------------------------------------