├── markov
│   ├── __init__.py
│   ├── markov.py
│   └── tests.py
├── .travis.yml
├── LICENSE
├── setup.py
└── README.md
--------------------------------------------------------------------------------
/markov/__init__.py:
--------------------------------------------------------------------------------
from markov import Markov
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: python
python:
  - "2.6"
  - "2.7"
  # - "3.2"
install:
  - "python setup.py test"
  - "easy_install *.egg"
script: "PATH=$PATH:/tmp/bin nosetests"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright 2016 Wieden+Kennedy

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
import os
from setuptools import setup, find_packages

ROOT_DIR = os.path.dirname(__file__)
SOURCE_DIR = os.path.join(ROOT_DIR)

setup(
    name="python-markov",
    description="Utility methods for generating and storing Markov chains with Python and Redis",
    author="Grant Thomas",
    author_email="grant.thomas@wk.com",
    url="https://github.com/wieden-kennedy/python-markov",
    version="0.0.1",
    install_requires=["redis"],
    packages=find_packages(),
    zip_safe=False,
    classifiers=[
        "Programming Language :: Python",
        "License :: OSI Approved :: Apache Software License",
        "Operating System :: OS Independent",
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Topic :: Internet :: WWW/HTTP",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
)
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Python-Markov

Python-Markov is a Python library for storing Markov chains in a Redis database.
You can use it to score lines for "good fit" or generate random texts based on your collected data.
The library is optimized for storing and scoring short pieces of text (sentences, tweets, etc.).

## Markov Chains
*Q: What is a Markov chain and why would I use this library?*

A: In mathematical terms, a Markov chain is a sequence of values where the next value depends only on the current value (and not on past values). It's
basically a really simple state machine: given the present state, the future state is conditionally independent of the past.

Markov chains have many real-world applications. For example, [Google's PageRank](http://ilpubs.stanford.edu:8090/422/) algorithm is essentially a
Markov chain over a graph of the web.
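To make that concrete, here is a tiny self-contained sketch (just an illustration, not part of this library) of a two-state Markov chain written as a transition table. Each step depends only on the current state:

```python
# Illustration only, not part of python-markov: a two-state Markov chain
# expressed as a transition table.
import random

TRANSITIONS = {
    'sunny': [('sunny', 0.8), ('rainy', 0.2)],
    'rainy': [('sunny', 0.4), ('rainy', 0.6)],
}

def step(state):
    """Pick the next state using only the current state's transition probabilities."""
    roll = random.random()
    for next_state, probability in TRANSITIONS[state]:
        roll -= probability
        if roll < 0:
            break
    return next_state

state = 'sunny'
walk = [state]
for _ in range(10):
    state = step(state)
    walk.append(state)
print(walk)  # e.g. ['sunny', 'sunny', 'rainy', 'rainy', 'sunny', ...]
```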
One of the simplest and most well-known applications of Markov chains is generating "realistic"-looking texts based on some set
of input texts.

In the case of text, a Markov chain could be used to answer the question, "Given the present word (or set of words), which words might possibly follow?"
You could also use Markov chains to answer the question, "Given the present word, how likely is it that this word I've chosen would be the next?"

The goal of Python-Markov is to store Markov chains that model your choice of text. You can use the included functions to generate new pieces of
text that resemble your input values. You can also score a given piece of text for "good fit" with your data set.

When you add a piece of text to Python-Markov, it breaks it down into keys and possible completions, each with a frequency.
For example, let's say you had two sentences:
```
"I ate a pizza."
and
"I ate a sandwich."
```

If you use 2-word keys and 1-word completions, when you add these sentences to your model you'd end up with something like this:
```
key:"I ate"      completions: [ (text: "a", frequency: 2) ]
key:"ate a"      completions: [ (text: "pizza", frequency: 1), (text: "sandwich", frequency: 1) ]
key:"a sandwich" completions: [ (text: EOL, frequency: 1) ]

and so on....
```
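The bookkeeping behind that listing looks roughly like the sketch below, which uses plain in-memory dicts instead of Redis purely for illustration (python-markov itself keeps the counts in Redis sorted sets):

```python
# Sketch of the key/completion decomposition, using plain dicts instead of
# Redis, purely for illustration.
from collections import defaultdict

EOL = '<EOL>'  # stand-in for python-markov's STOP token

model = defaultdict(lambda: defaultdict(int))

def add_line(words, key_length=2):
    # slide a key_length-sized window over the words; whatever follows the
    # window (or EOL at the end of the line) is the completion
    for i in range(len(words) - key_length + 1):
        key = tuple(words[i:i + key_length])
        completion = words[i + key_length] if i + key_length < len(words) else EOL
        model[key][completion] += 1

add_line(["I", "ate", "a", "pizza."])
add_line(["I", "ate", "a", "sandwich."])

print(dict(model[("I", "ate")]))   # {'a': 2}
print(dict(model[("ate", "a")]))   # {'pizza.': 1, 'sandwich.': 1}
```

The library's add_line_to_index() performs the same sliding-window walk (recursively) and increments a counter in a sorted set per key.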
76 | 77 | ```python 78 | from markov import Markov 79 | 80 | twitter_data = Markov(prefix="twitter") 81 | twitter_data.add_line_to_index(["eating", "sushi", "with", "my", "cat"]) 82 | ``` 83 | 84 | It's recommended that you use the Markov class to add texts to your model, score texts and generate new text. 85 | 86 | For example, let's say you've collected a lot of Oprah transcripts and stored them in your model. Scoring a text 87 | would look something like this: 88 | 89 | ```python 90 | from markov import Markov 91 | 92 | #oprah_data is a Markov model filled with Oprah quotes 93 | oprah_data = Markov(prefix="oprah") 94 | 95 | #sentence is our guess at something Oprah might say 96 | sentence = ["you", "get", "a", "car"] 97 | 98 | #let's ask oprah_data how we did 99 | score = oprah_data.score_for_line(sentence) 100 | # at this point, score is probably something like 100 101 | 102 | other_sentence = ["you", "get", "rusty", "razor", "blades"] 103 | 104 | score = oprah_data.score_for_line(other_sentence) 105 | #since oprah probably never said this the score is probably much lower, like 30 or 50 106 | ``` 107 | 108 | You can also generate text from your Markov model. Let's say you put a bunch of tweets in your model and you wanted 109 | to generate a representative sample: 110 | 111 | ```python 112 | from markov import Markov 113 | 114 | tweet_data = Markov(prefix="tweets") 115 | 116 | new_tweet = tweet_data.generate(max_words=6) 117 | #new_tweet will be something like ["omg", "i", "love", "snax"] 118 | ``` 119 | 120 | If you want your text to start with a certain key, you can seed it like so: 121 | 122 | ```python 123 | new_tweet = tweet_data.generate(seed=['i','love'], max_words=6) 124 | #new_tweet will be something like ['i', 'love', 'to', 'eat', 'snax'] 125 | ``` 126 | 127 | you can use the max_words argument to determine the maximum number of tokens to include in the generated sequence 128 | ```python 129 | new_tweet = tweet_data.generate(max_words=100) 130 | #new tweet could be 2-100 words long 131 | ``` 132 | 133 | if not, Markov.generate() will continue to generate texts up to 1000 words long by default until it choses a STOP character 134 | at which point it will stop. 135 | 136 | ```python 137 | new_tweet = tweet_data.generate() 138 | #new tweet could be really long, or not! 139 | ``` 140 | 141 | *Note: the functions in python-markov use recursion, so don't add a sequence more than 1000 items long to your index or 142 | you'll get an error because Python is like that* 143 | -------------------------------------------------------------------------------- /markov/markov.py: -------------------------------------------------------------------------------- 1 | """ 2 | Functions for generating simple markov chains for sequences of words. Allows for scoring of 3 | sentences based on completion frequency. 
4 | """ 5 | import random 6 | import redis 7 | 8 | PREFIX = 'markov' 9 | SEPARATOR=':' 10 | STOP='\x02' 11 | 12 | PUNCTUATION = [",", ".", ";", "!","?","(",")", "...", "....", "....."] 13 | 14 | class Markov(object): 15 | """ 16 | Simple wrapper for markov functions 17 | """ 18 | def __init__(self, prefix=None, key_length=2, completion_length=1, db=0, host='localhost', port=6379, password=None): 19 | self.client = redis.Redis(db=db, host=host, port=port, password=password) 20 | self.prefix = prefix or PREFIX 21 | self.key_length = key_length 22 | self.completion_length = completion_length 23 | 24 | def add_line_to_index(self, line): 25 | add_line_to_index(line, self.client, self.key_length, self.completion_length, self.prefix) 26 | 27 | def score_for_line(self, line): 28 | return score_for_line(line, self.client, self.key_length, self.completion_length, self.prefix) 29 | 30 | def generate(self, seed=None, max_words=1000, count_punctuation=True, relevant_terms=None): 31 | return generate(self.client, seed=seed, prefix=self.prefix, max_words=max_words, key_length=self.key_length, count_punctuation=count_punctuation, relevant_terms=relevant_terms) 32 | 33 | def flush(self, prefix=None): 34 | if prefix is not None: 35 | keys = self.client.keys("%s*" % self.prefix) 36 | for key in keys: 37 | self.client.delete(key) 38 | 39 | 40 | def add_line_to_index(line, client, key_length=2, completion_length=1, prefix=PREFIX): 41 | """ 42 | Add a line to our index of markov chains 43 | 44 | @param line: a list of words 45 | @param key_length: the desired length for our keys 46 | @param completion_length: the desired completion length 47 | """ 48 | key, completion = get_key_and_completion(line, key_length, completion_length, prefix) 49 | if key and completion: 50 | completion = make_key(completion) 51 | client.zincrby(key, completion) 52 | add_line_to_index(line[1:], client, key_length, completion_length, prefix) 53 | else: 54 | return 55 | 56 | def make_key(key, prefix=None): 57 | """ 58 | Construct a Redis-friendly key from the list or tuple provided 59 | """ 60 | if type(key) not in [str, unicode]: 61 | key = SEPARATOR.join(key) 62 | if prefix: 63 | key = SEPARATOR.join((prefix, key)) 64 | return key 65 | 66 | def max_for_key(key, client): 67 | """ 68 | Get the maximum score for a completion on this key 69 | """ 70 | maximum = client.zrevrange(key, 0, 0, withscores=True) 71 | if maximum: 72 | return maximum[0][1] 73 | else: 74 | return 0 75 | 76 | def min_for_key(key, client): 77 | """ 78 | Get the minimum score for a completion on this key 79 | """ 80 | minimum = client.zrange(key, 0, 0, withscores=True) 81 | if minimum: 82 | return minimum[0][1] 83 | else: 84 | return 0 85 | 86 | def score_for_completion(key, completion, client, normalize_to=100): 87 | """ 88 | Get the normalized score for a completion 89 | """ 90 | raw_score = client.zscore(key, make_key(completion)) or 0 91 | maximum = max_for_key(key, client) or 1 92 | return (raw_score/maximum) * normalize_to 93 | 94 | def _score_for_line(line, client, key_length, completion_length, prefix, count=0): 95 | """ 96 | Recursive function for iterating over all possible key/completion sets in a line 97 | and scoring them 98 | """ 99 | score = 0 100 | key, completion = get_key_and_completion(line, key_length, completion_length, prefix) 101 | if key and completion: 102 | score = score_for_completion(key, completion, client) 103 | new_score, count = _score_for_line(line[1:], client, key_length, completion_length, prefix, count+1) 104 | score += new_score 
def _score_for_line(line, client, key_length, completion_length, prefix, count=0):
    """
    Recursive helper that iterates over all possible key/completion pairs in a line,
    scoring each pair and counting how many pairs were scored
    """
    score = 0
    key, completion = get_key_and_completion(line, key_length, completion_length, prefix)
    if key and completion:
        score = score_for_completion(key, completion, client)
        new_score, count = _score_for_line(line[1:], client, key_length, completion_length, prefix, count + 1)
        score += new_score
    else:
        score = 0
    return score, count

def score_for_line(line, client, key_length=2, completion_length=1, prefix=PREFIX):
    """
    Score a line of text for fit with our markov model: the average of the
    normalized scores of its key/completion pairs
    """
    score, count = _score_for_line(line, client, key_length, completion_length, prefix)
    if count > 0:
        return score / count
    else:
        return 0

def generate(client, seed=None, prefix=None, max_words=1000, key_length=2, count_punctuation=True, relevant_terms=None):
    """
    Generate a sequence of tokens via a random walk over our model,
    stopping at max_words or when a STOP token is chosen
    """
    if seed is None:
        key, seed = get_key_and_seed(client, prefix, relevant_terms=relevant_terms)
    else:
        key = make_key(seed[-1 * key_length:], prefix=prefix)

    completion = get_completion(client, key, relevant_terms=relevant_terms)
    # if we've found a completion, continue
    if completion:
        completion = completion.split(SEPARATOR)
        if count_tokens(seed, count_punctuation) + count_tokens(completion, count_punctuation) < max_words:
            seed += completion
            return generate(client, seed=seed, prefix=prefix, max_words=max_words, key_length=key_length, count_punctuation=count_punctuation, relevant_terms=relevant_terms)
        elif count_tokens(seed, count_punctuation) + count_tokens(completion, count_punctuation) == max_words:
            if STOP in completion:
                completion.remove(STOP)
            return seed + completion
        else:
            # the completion would push us past max_words, so stop here
            # (without this branch, multi-token completions could overshoot
            # the limit and fall through, returning None)
            if STOP in seed:
                seed.remove(STOP)
            return seed
    else:
        if STOP in seed:
            seed.remove(STOP)
        return seed

def count_tokens(seed, count_punctuation=True):
    """
    Count the tokens in the given seed, optionally ignoring punctuation
    """
    if count_punctuation:
        return len(seed)
    else:
        return len([item for item in seed if item not in PUNCTUATION])


def get_key_and_seed(client, prefix=None, relevant_terms=None):
    """
    Wraps get_random_key_and_seed and get_relevant_key_and_seed
    """
    if relevant_terms and len(relevant_terms) > 0:
        return get_relevant_key_and_seed(client, relevant_terms, prefix)
    else:
        return get_random_key_and_seed(client, prefix)

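# Note: the two seed helpers below use the Redis KEYS command, which scans the
# entire keyspace and can block the server while it runs. That's fine for small
# data sets; for large ones, consider keeping an index of keys or using
# SCAN-based iteration instead.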
185 | """ 186 | tried = 0 187 | key = None 188 | seed = [] 189 | while (len(seed) == 0 or (len([item for item in seed if item in PUNCTUATION]) > 0)) and tried < tries: 190 | keys = [] 191 | for term in relevant_terms: 192 | if prefix: 193 | keys += client.keys("%s%s*%s*" % (prefix, SEPARATOR, term)) 194 | else: 195 | keys += client.keys("*%s*" % term) 196 | try: 197 | key = random.choice(list(set(keys))) 198 | seed = key.split(SEPARATOR) 199 | except IndexError: 200 | # there were no matching keys 201 | break 202 | tried += 1 203 | if prefix in seed: 204 | seed.remove(prefix) 205 | return key, seed 206 | 207 | 208 | def get_completion(client, key, exclude=[], relevant_terms=None): 209 | """ 210 | Get a possible completion for some key 211 | """ 212 | completion = None 213 | completions = client.zrevrange(key, 0, -1) 214 | completions = [item for item in completions if item not in exclude] 215 | if len(completions) > 0: 216 | if relevant_terms: 217 | #make an attempt to use one of the relevant terms 218 | try: 219 | completion = random.choice([item for item in completions if item in relevant_terms]) 220 | except IndexError: 221 | pass 222 | if completion is None: 223 | completion = random.choice(completions) 224 | return completion 225 | 226 | 227 | def get_key_and_completion(line, key_length, completion_length, prefix): 228 | """ 229 | Get a key and completion from the given list of words 230 | """ 231 | if len(line) >= key_length and STOP not in line[0:key_length]: 232 | key = make_key(line[0:key_length], prefix=prefix) 233 | if completion_length > 1: 234 | completion = line[key_length:key_length+completion_length] 235 | else: 236 | try: 237 | completion = line[key_length] 238 | except IndexError: 239 | completion = STOP 240 | completion = make_key(completion) 241 | return (key, completion) 242 | else: 243 | return (False,False) 244 | -------------------------------------------------------------------------------- /markov/tests.py: -------------------------------------------------------------------------------- 1 | """ 2 | Tests for the markov package 3 | """ 4 | import redis 5 | import unittest 6 | import markov 7 | from markov import Markov, add_line_to_index, make_key, max_for_key, min_for_key,\ 8 | score_for_completion, score_for_line, get_key_and_completion, generate, get_relevant_key_and_seed, \ 9 | get_random_key_and_seed, get_completion, STOP 10 | 11 | class TestMarkovFunctions(unittest.TestCase): 12 | """ 13 | Test our markov chain construction and phrase scoring functions 14 | """ 15 | def setUp(self): 16 | """ 17 | Create a redis client and define a prefix space for this test 18 | """ 19 | self.client = redis.Redis(db=11) 20 | self.prefix = 'test' 21 | 22 | def test_make_key(self): 23 | """ 24 | Test that the make_key function behaves as expected 25 | """ 26 | key = make_key(('foo','bar'),self.prefix) 27 | self.assertEqual(key, 'test:foo:bar') 28 | key = make_key(('foo','bar')) 29 | self.assertEqual(key, 'foo:bar') 30 | 31 | def test_score_for_completion(self): 32 | """ 33 | Test that score_for_completion scores completions according to our model 34 | """ 35 | self.test_add_line_to_index() 36 | self.assertEqual(score_for_completion('test:i:ate', 'a', self.client), 100) 37 | self.assertEqual(score_for_completion('test:i:ate', 'one', self.client), 50) 38 | 39 | def test_max_for_key(self): 40 | """ 41 | Test that max_for_key correctly finds the frequency of the most common completion 42 | """ 43 | self.test_add_line_to_index() 44 | self.assertEqual(max_for_key('test:i:ate', 
--------------------------------------------------------------------------------
/markov/tests.py:
--------------------------------------------------------------------------------
"""
Tests for the markov package
"""
import redis
import unittest
import markov
from markov import Markov, add_line_to_index, make_key, max_for_key, min_for_key,\
    score_for_completion, score_for_line, get_key_and_completion, generate, get_relevant_key_and_seed, \
    get_random_key_and_seed, get_completion, STOP

class TestMarkovFunctions(unittest.TestCase):
    """
    Test our markov chain construction and phrase scoring functions
    """
    def setUp(self):
        """
        Create a redis client and define a prefix space for this test
        """
        self.client = redis.Redis(db=11)
        self.prefix = 'test'

    def test_make_key(self):
        """
        Test that the make_key function behaves as expected
        """
        key = make_key(('foo', 'bar'), self.prefix)
        self.assertEqual(key, 'test:foo:bar')
        key = make_key(('foo', 'bar'))
        self.assertEqual(key, 'foo:bar')

    def test_score_for_completion(self):
        """
        Test that score_for_completion scores completions according to our model
        """
        self.test_add_line_to_index()
        self.assertEqual(score_for_completion('test:i:ate', 'a', self.client), 100)
        self.assertEqual(score_for_completion('test:i:ate', 'one', self.client), 50)

    def test_max_for_key(self):
        """
        Test that max_for_key correctly finds the frequency of the most common completion
        """
        self.test_add_line_to_index()
        self.assertEqual(max_for_key('test:i:ate', self.client), 2)
        self.assertEqual(max_for_key('test:stupidkey', self.client), 0)

    def test_min_for_key(self):
        """
        Test that min_for_key correctly finds the frequency of the least common completion
        """
        self.test_add_line_to_index()
        self.assertEqual(min_for_key('test:i:ate', self.client), 1)
        self.assertEqual(min_for_key('test:stupidkey', self.client), 0)

    def test_add_line_to_index(self):
        """
        Test that adding lines behaves as expected
        """
        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

        add_line_to_index(line, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:a", "peach"), 1.0)

        add_line_to_index(line1, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:a", "peach"), 1.0)
        self.assertEqual(self.client.zscore("test:ate:one", "peach"), 1.0)
        self.assertEqual(self.client.zscore("test:i:ate", "one"), 1.0)

        add_line_to_index(line2, self.client, prefix="test")
        self.assertEqual(self.client.zscore("test:i:ate", "a"), 2)
        self.assertEqual(self.client.zscore("test:ate:a", "sandwich"), 1.0)

    def test_score_for_line(self):
        """
        Ensure that score_for_line rates lines according to our model
        """
        self.test_add_line_to_index()
        line = ['i', 'ate', 'a', 'peach']
        line2 = ['i', 'ate', 'a', 'pizza']
        line3 = ['i', 'ate', 'one', 'sandwich']

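        # Expected values, per score_for_completion's normalization:
        #   line:  every pair was seen with maximal frequency  -> (100 + 100 + 100) / 3 == 100
        #   line2: only 'i ate' -> 'a' was seen                -> (100 + 0 + 0) / 3 == 100 / 3
        #   line3: 'i ate' -> 'one' has half the max frequency -> (50 + 0 + 0) / 3 == 50 / 3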
        self.assertEqual(score_for_line(line, self.client, prefix=self.prefix), 100)
        self.assertEqual(score_for_line(line2, self.client, prefix=self.prefix), 100.0/3)
        self.assertEqual(score_for_line(line3, self.client, prefix=self.prefix), 50.0/3)

    def test_get_key_and_completion(self):
        """
        Ensure that get_key_and_completion finds the expected keys and completions based
        on key_length and completion_length
        """
        line = ['i', 'ate', 'a', 'peach']

        key, completion = get_key_and_completion(line, 2, 1, self.prefix)
        self.assertEqual(key, 'test:i:ate')
        self.assertEqual(completion, 'a')

        key, completion = get_key_and_completion(line, 2, 2, self.prefix)
        self.assertEqual(key, 'test:i:ate')
        self.assertEqual(completion, 'a:peach')

        key, completion = get_key_and_completion(line, 3, 2, self.prefix)
        self.assertEqual(key, 'test:i:ate:a')
        self.assertEqual(completion, 'peach')

        key, completion = get_key_and_completion(line, 4, 1, self.prefix)
        self.assertEqual(key, 'test:i:ate:a:peach')
        self.assertEqual(completion, markov.STOP)


    def test_generate(self):
        """
        Test the generate function
        """
        self.test_add_line_to_index()
        generated = generate(self.client, prefix=self.prefix, max_words=3)
        assert len(generated) >= 2
        assert len(generated) <= 3
        generated = generate(self.client, seed=['ate', 'one'], prefix=self.prefix, max_words=3)
        assert generated[2] == 'peach'
        assert 'sandwich' not in generated

        # test that relevant terms will be chosen when the relevant_terms argument is passed in
        generated = generate(self.client, relevant_terms=["peach",], prefix=self.prefix)
        assert 'peach' in generated
        generated = generate(self.client, relevant_terms=["sandwich",], prefix=self.prefix)
        assert 'sandwich' in generated

        # there are no pizza keys!
        generated = generate(self.client, relevant_terms=["pizza",], prefix=self.prefix)
        assert len(generated) == 0

    def test_get_relevant_key_and_seed(self):
        """
        Test that get_relevant_key_and_seed functions as expected
        """
        # we get a key with sandwich in it
        self.test_add_line_to_index()
        key, seed = get_relevant_key_and_seed(self.client, relevant_terms=["sandwich",], prefix=self.prefix)
        assert "sandwich" in seed

        # pizza is not in our data set, so we get nothing
        key, seed = get_relevant_key_and_seed(self.client, relevant_terms=["pizza",], prefix=self.prefix)
        assert seed == []
        assert key is None

    def test_get_random_key_and_seed(self):
        """
        Test that get_random_key_and_seed functions as expected
        """
        self.test_add_line_to_index()
        key, seed = get_random_key_and_seed(self.client, prefix=self.prefix)
        assert len(seed) == 2
        assert self.prefix not in seed
        assert self.prefix in key

    def test_get_completion(self):
        """
        Test the get_completion method
        """
        self.test_add_line_to_index()
        key, seed = get_random_key_and_seed(self.client, prefix=self.prefix)
        if STOP not in seed:
            assert get_completion(self.client, key) is not None
        else:
            assert get_completion(self.client, key) is None
        key = "test:i:ate"
        assert get_completion(self.client, key) in ["a", "one"]
        # ensure that exclude works as expected
        assert get_completion(self.client, key, exclude=["a",]) == "one"
        assert get_completion(self.client, key, exclude=["a", "one"]) is None

        # ensure that relevant_terms works as well
        assert get_completion(self.client, key, relevant_terms=["one",]) == "one"


    def tearDown(self):
        """
        clean up our redis keys
        """
        keys = self.client.keys(self.prefix + "*")
        for key in keys:
            self.client.delete(key)


class TestMarkovClass(unittest.TestCase):
    """
    Test that the Markov wrapper class behaves as expected
    """
    def setUp(self):
        self.markov = Markov(prefix="testclass", db=11)

    def test_add_line_to_index(self):
        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

        self.markov.add_line_to_index(line)
        self.markov.add_line_to_index(line1)
        self.markov.add_line_to_index(line2)
        self.assertEqual(self.markov.client.zscore("testclass:i:ate", "a"), 2.0)
        self.assertEqual(self.markov.client.zscore("testclass:ate:a", "peach"), 1.0)

    def test_score_for_line(self):
        self.test_add_line_to_index()
        line = ['i', 'ate', 'a', 'peach']
        self.assertEqual(self.markov.score_for_line(line), 100)


    def test_generate(self):
        self.test_add_line_to_index()
        generated = self.markov.generate(max_words=3)
        assert len(generated) >= 2
        assert len(generated) <= 3
        generated = self.markov.generate(seed=['ate', 'one'], max_words=3)
        assert 'peach' in generated
        assert 'sandwich' not in generated

    def test_flush(self):
        m1 = Markov(prefix="one", db=5)
        m2 = Markov(prefix="two", db=5)

        line = ['i', 'ate', 'a', 'peach']
        line1 = ['i', 'ate', 'one', 'peach']
        line2 = ['i', 'ate', 'a', 'sandwich']

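        # Each 4-word line yields three 2-word keys, so m1's three lines below
        # produce 6 distinct keys under "one:" (i:ate, ate:a, a:peach, ate:one,
        # one:peach, a:sandwich), and important_line produces 3 under "two:"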
        m1.add_line_to_index(line)
        m1.add_line_to_index(line1)
        m1.add_line_to_index(line2)

        important_line = ['we', 'all', 'have', 'phones']

        m2.add_line_to_index(important_line)

        r = redis.Redis(db=5)
        assert len(r.keys("one:*")) == 6
        assert len(r.keys("two:*")) == 3

        m1.flush(prefix="one")

        assert len(r.keys("one:*")) == 0
        assert len(r.keys("two:*")) == 3

        m2.flush(prefix="two")

        assert len(r.keys("one:*")) == 0
        assert len(r.keys("two:*")) == 0

    def tearDown(self):
        """
        clean up our redis keys
        """
        keys = self.markov.client.keys(self.markov.prefix + "*")
        for key in keys:
            self.markov.client.delete(key)
--------------------------------------------------------------------------------