├── README.md
└── chain.py

/README.md:
--------------------------------------------------------------------------------
chain-py
========

Fluent sequence operations in Python

*chain-py* is a Python library that allows you to fluently chain together sequence operations. It is heavily inspired by the `chain` functionality in [underscore.js](http://underscorejs.org/), and also by the [Seq module](http://msdn.microsoft.com/en-us/library/ee353635.aspx) in F#. (Before building this I found the similar [Moka](http://www.phzbox.com/moka/index.html), which didn't really do what I needed, hence chain.)

To use chain, you wrap a collection in a `Chain()` object, call sequence operations on it, and end with a call to `value()`. It is helpful to wrap the entire call in parentheses so you can abuse whitespace.

# Examples

Imagine the following dataset representing an unpopular blog:

    from collections import namedtuple

    Post = namedtuple("Post", "subject num_comments tags")
    blog = [ Post("First post!", 0, ["blogging", "about me"]),
             Post("Why is no one commenting?", 1, ["blogging", "gripes", "complaints"]),
             Post("Hello?", 0, ["blogging", "hello"]),
             Post("I quit!", 10, ["not blogging", "goodbye"]) ]

Start with

    from chain import Chain

Then you can count the posts with no comments:

    (Chain(blog)
        .filter(lambda post: post.num_comments == 0)
        .length()
        .value())

Of course you could do this easily with a list comprehension:

    len([post for post in blog if post.num_comments == 0])

But the fluent way is (for me) easier to read, and it allows much more complicated operations.
For instance, you could compute a histogram of comment counts for posts tagged "blogging":

    (Chain(blog)
        .filter(lambda post: "blogging" in post.tags)
        .count_by(lambda post: post.num_comments)
        .value())

Or the total number of comments per tag:

    (Chain(blog)
        .collect(lambda post: [(tag, post.num_comments) for tag in post.tags])
        .group_by(lambda (tag, num_comments): tag)
        .map(lambda (tag, pairs): (tag, sum([num_comments for t, num_comments in pairs])))
        .value())

(The `lambda (tag, pairs): ...` form is Python 2 tuple-parameter unpacking. Like the rest of *chain-py*, these examples assume Python 2.)

Internally, `dict`s are always treated as sequences of key-value pairs. However, you can return a dictionary by calling `to_dict()` right before `value()`:

    (Chain(blog)
        .collect(lambda post: [(tag, post.num_comments) for tag in post.tags])
        .group_by(lambda (tag, num_comments): tag)
        .map(lambda (tag, pairs): (tag, sum([num_comments for t, num_comments in pairs])))
        .to_dict()
        .value())

# Functionality

The code is pretty self-explanatory. In general, operations are lazy wherever possible, so that (e.g.)

    (Chain(collection)
        .map(map_function)
        .take(10)
        .value())

will only ever look at the first 10 elements of `collection`.

Most of the chain operators accept functions of one, two, or three arguments. A function of one argument gets each element in turn. A function of two arguments gets each element and its index. A function of three arguments also gets the entire collection.

In almost every case, if the function `fn` is omitted, it defaults to the identity function. This is not particularly interesting for (e.g.) `map`, but is often useful in (e.g.) `sort_by`.
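The laziness and arity rules above can be illustrated with a short, standalone Python 3 sketch. This is not the library's own code. *chain-py* targets Python 2 and counts arguments with `inspect.getargspec`, whereas this sketch assumes `inspect.signature` and counts only positional parameters. The names `wrap_by_arity`, `lazy_map` and `take` are hypothetical.

```python
import inspect
from itertools import count


def wrap_by_arity(fn):
    """Return a function of (item, idx, coll) that passes fn only as many
    leading arguments as fn declares positional parameters (0 to 3)."""
    positional_kinds = (inspect.Parameter.POSITIONAL_ONLY,
                        inspect.Parameter.POSITIONAL_OR_KEYWORD)
    n = sum(1 for p in inspect.signature(fn).parameters.values()
            if p.kind in positional_kinds)
    if n > 3:
        raise TypeError("fn can take at most 3 arguments")
    # slice the full argument tuple down to the declared arity
    return lambda item, idx, coll: fn(*(item, idx, coll)[:n])


def lazy_map(coll, fn=lambda x: x):
    """Lazily apply fn (of 0 to 3 arguments) to each element of coll."""
    call = wrap_by_arity(fn)
    for idx, item in enumerate(coll):
        yield call(item, idx, coll)


def take(iterable, n):
    """Yield at most n items without pulling an (n+1)-th one from the source."""
    if n <= 0:
        return
    for idx, item in enumerate(iterable):
        yield item
        if idx + 1 >= n:
            break


# Record which elements the map function is actually applied to.
seen = []

def square(x):
    seen.append(x)
    return x * x

first_three = list(take(lazy_map(count(), square), 3))
print(first_three)  # [0, 1, 4]
print(seen)         # [0, 1, 2]: the infinite source is read only three times

print(list(lazy_map(["a", "b"], lambda item, idx: (idx, item))))
# [(0, 'a'), (1, 'b')]
```

Slicing the argument tuple `(item, idx, coll)` to the declared arity keeps the 1/2/3-argument rule in one place. Note that `take` checks its count *after* yielding. That way it never asks the source for an extra element, which is what lets a lazy pipeline run over an infinite `count()`.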

Most of the chain operations will implicitly convert `dict`s to sequences of key-value pairs, so that you could do something like

    (Chain(my_dict)
        .map(lambda (k, v): (k, v + 1))
        .to_dict()
        .value())

The operations that assume an order generally don't, however, so

    Chain(my_dict).first(1).value()

might not do what you expect. (It's not clear what you'd expect it to do anyway.)


# Operators

In all of the below, "returns..." really means "replaces the collection stored in the Chain with...".

### `length()` (alternatively `size()`)

returns the length of the collection. You'd probably only ever use this as your last call before `value()`.

### `each(fn)` (alternatively `for_each(fn)`)

invokes `fn` on each element of the collection, leaving the collection unchanged

### `map(fn)`

applies `fn` to each element and returns the resulting collection

### `append(second_collection)`

appends `second_collection` to the end of the currently stored collection

### `reduce(fn, memo)`

sets `memo` to `fn(memo, first_element)`, then to `fn(memo, second_element)`, and so on, returning the final value of `memo`

### `reduce_right(fn, memo)`

same as `reduce`, but works right-to-left

### `rev()`

reverses the collection

### `find(fn)`

returns the first element for which `fn` produces a truthy value. Raises an error if no such element exists.

### `collect(fn)`

returns the concatenation of `fn(first_element)`, `fn(second_element)`, and so on (so `fn` should return a sequence for each element)

### `filter(fn)`

returns the elements for which `fn` produces a truthy value

### `reject(fn)`

returns the elements for which `fn` produces a falsy value

### `sort_by(fn)`

sorts the collection by the values of `fn`
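To make the folding and sorting semantics above concrete, here is a standalone Python 3 sketch that uses only the standard library. It mirrors the documented behaviour rather than importing `chain.py`. The `reduce_right` and `sort_by` here are local stand-ins, and the `descending` flag is a hypothetical substitute for a separate `sort_by_descending`. Left folds come straight from `functools.reduce`.

```python
from functools import reduce


def reduce_right(coll, fn, memo):
    """Set memo = fn(memo, last), then fn(memo, second_to_last), and so on."""
    # reversed() needs a real sequence, so materialize iterators first
    for item in reversed(list(coll)):
        memo = fn(memo, item)
    return memo


def sort_by(coll, fn=lambda x: x, descending=False):
    """Sort on fn(item) alone. Python's sort is stable, even with
    reverse=True, so items with equal keys keep their original order."""
    return sorted(coll, key=fn, reverse=descending)


words = ["b", "a", "c"]
print(reduce(lambda memo, w: memo + w, words, ""))        # bac
print(reduce_right(words, lambda memo, w: memo + w, ""))  # cab

pairs = [(1, "x"), (0, "y"), (1, "z")]
print(sort_by(pairs, lambda p: p[0]))                   # [(0, 'y'), (1, 'x'), (1, 'z')]
print(sort_by(pairs, lambda p: p[0], descending=True))  # [(1, 'x'), (1, 'z'), (0, 'y')]

# dicts can't be compared with < in Python 3, but sorting by key never compares them
print(sort_by([{"n": 2}, {"n": 1}], lambda d: d["n"]))  # [{'n': 1}, {'n': 2}]
```

Sorting on the key alone, instead of on `(key, item)` tuples, means ties never fall through to comparing the items themselves.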

### `sort_by_descending(fn)`

sorts the collection by the values of `fn`, in descending order

### `group_by(fn)`

returns pairs (in no particular order) where the first element is a value achieved by `fn` and the second element is the list of elements in the collection that `fn` maps to that value

### `count_by(fn)`

returns pairs (in no particular order) where the first element is a value achieved by `fn` and the second element is the number of elements in the collection that `fn` maps to that value

### `distinct_by(fn)` (also `distinct`)

returns the subset of the collection consisting of the first element for each distinct value of `fn`

### `max(fn)`

returns the *element* for which `fn` achieves its largest value (not the value itself); ties go to the first such element

### `min(fn)`

returns the *element* for which `fn` achieves its smallest value (not the value itself); ties go to the first such element

### `sum_by(fn)` (also `sum`)

returns the sum of `fn(item)` for each item in the collection

### `shuffle()`

randomly shuffles the elements in the collection

### `first(n)` (also `take(n)`)

returns only the first `n` items (`n` defaults to 1)

### `rest(n)` (also `skip(n)`)

returns all but the first `n` items (`n` defaults to 1)

### `head()`

returns the first item itself (whereas `first()` would return a collection containing just the first item)

### `to_list()`

by default, `value()` already turns a generator (or a dict iterator) into a list, while `value(generator_to_list=False)` hands back the lazy generator itself. Calling `to_list()` immediately before `value()` forces a list either way.

### `to_dict()`

as mentioned above, dicts are treated as collections of key-value pairs. `to_dict()` coerces the collection to an actual Python `dict`.

### `value()`

you must always call `value()` last to return the Chain's value rather than the Chain itself.

# License

Do whatever you want with this code.

# Feedback

I'm sure there are all sorts of stupid design decisions I've made. Let me know about them! Or, if you find this useful, let me know that too!

--------------------------------------------------------------------------------
/chain.py:
--------------------------------------------------------------------------------
# chain.py

import inspect
import random
from collections import defaultdict

# functions intended for internal use only

def _identity(value):
    """not very interesting, except used as default everywhere"""
    return value

def _arity(f):
    """how many arguments does f take?"""
    return len(inspect.getargspec(f).args)

def _seq(iterable):
    """want dicts to iterate as key-value pairs, so have to convert"""
    if isinstance(iterable, dict):
        return iterable.iteritems()
    else:
        return iterable

def _iterator_fn(fn):
    """returns a function of (item,idx,lst) that takes into account _arity of fn
    if 0: new function returns fn()
    if 1: ... returns fn(item)
    if 2: ... returns fn(item,idx)
    if 3: ... returns fn(item,idx,lst)"""
    num_args = _arity(fn)
    if num_args == 0: return (lambda item,idx,lst: fn())
    elif num_args == 1: return (lambda item,idx,lst: fn(item))
    elif num_args == 2: return (lambda item,idx,lst: fn(item,idx))
    elif num_args == 3: return (lambda item,idx,lst: fn(item,idx,lst))
    else: raise TypeError("fn can take at most 3 arguments")

def length(lst):
    """for chaining"""
    return len(list(lst))

size = length

def each(lst,fn):
    """call fn on each item in lst, then return lst"""
    if iter(lst) is lst:
        # lst is a one-shot iterator (e.g. a generator): materialize it so the
        # collection handed back hasn't been used up by the loop below
        lst = list(lst)
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        iterator_fn(item,idx,lst)
    return lst

for_each = each

def map(lst,fn=_identity):
    """call fn on each item in lst, lazy return results"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        yield iterator_fn(item,idx,lst)

def append(lst,lst2):
    """lazy-return the items of lst followed by the items of lst2"""
    for item in _seq(lst):
        yield item
    for item in _seq(lst2):
        yield item

def reduce(lst,fn,memo):
    """set memo = fn(memo,lst[0]), then fn(memo,lst[1]), and so on ...
    returns final value of memo"""
    for item in _seq(lst):
        memo = fn(memo,item)
    return memo

def reduce_right(lst,fn,memo):
    """set memo = fn(memo,lst[-1]), then fn(memo,lst[-2]), and so on ...
    returns final value of memo"""
    # reversed() needs a sequence, so materialize generators and iterators first
    for item in reversed(list(_seq(lst))):
        memo = fn(memo,item)
    return memo

def rev(lst):
    """return lst in reverse order"""
    return list(_seq(lst))[::-1]

def find(lst,fn=_identity):
    """returns the first item in lst satisfying fn
    raises an error if no item satisfies"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        if iterator_fn(item,idx,lst):
            return item
    raise ValueError("unable to find")

def collect(lst,fn=_identity):
    """lazy-return the concatenation of fn(lst[0]), fn(lst[1]), ..."""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        for sub_item in iterator_fn(item,idx,lst):
            yield sub_item

def filter(lst,fn=_identity):
    """lazy-return the values in lst that satisfy fn"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        if iterator_fn(item,idx,lst):
            yield item

def reject(lst,fn=_identity):
    """lazy-return the values in lst that don't satisfy fn"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        if not iterator_fn(item,idx,lst):
            yield item

def all(lst,fn=_identity):
    """True iff all elements of lst satisfy fn"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        if not iterator_fn(item,idx,lst): return False
    return True

def any(lst,fn=_identity):
    """True iff at least one element of lst satisfies fn"""
    iterator_fn = _iterator_fn(fn)
    for idx,item in enumerate(_seq(lst)):
        if iterator_fn(item,idx,lst): return True
    return False

def sort_by(lst,fn=_identity):
    """returns the elements of lst sorted by fn
    (stable: items with equal keys keep their original order)"""
    iterator_fn = _iterator_fn(fn)
    keyed = [(iterator_fn(item,idx,lst),item) for idx,item in enumerate(_seq(lst))]
    # sort on the key alone, so the items themselves are never compared
    return [item for key,item in sorted(keyed,key=lambda pair: pair[0])]

def sort_by_descending(lst,fn=_identity):
    """returns the elements of lst sorted by fn in descending order"""
    iterator_fn = _iterator_fn(fn)
    keyed = [(iterator_fn(item,idx,lst),item) for idx,item in enumerate(_seq(lst))]
    return [item for key,item in sorted(keyed,key=lambda pair: pair[0],reverse=True)]


def group_by(lst,fn=_identity):
    """returns an iterator of key-value pairs
    where the key is fn applied to each item
    and the value is a list of the corresponding items"""
    iterator_fn = _iterator_fn(fn)
    group_dict = defaultdict(list)
    for idx,item in enumerate(_seq(lst)):
        key = iterator_fn(item,idx,lst)
        group_dict[key].append(item)
    return group_dict.iteritems()

def count_by(lst,fn=_identity):
    """returns an iterator of key-value pairs
    where the key is fn applied to each item
    and the value is the number of items corresponding to the key"""
    iterator_fn = _iterator_fn(fn)
    count_dict = defaultdict(int)
    for idx,item in enumerate(_seq(lst)):
        key = iterator_fn(item,idx,lst)
        count_dict[key] += 1
    return count_dict.iteritems()

def distinct_by(lst,fn=_identity):
    """returns a generator where each fn(item) can only appear once"""
    iterator_fn = _iterator_fn(fn)
    seen = set()
    for idx,item in enumerate(_seq(lst)):
        key = iterator_fn(item,idx,lst)
        if key not in seen:
            seen.add(key)
            yield item

distinct = distinct_by

def max(lst,fn=_identity):
    """returns the item in lst for which fn is largest
    in case of a tie, returns the first such item"""
    iterator_fn = _iterator_fn(fn)
    found,max_value,max_item = False,None,None
    for idx,item in enumerate(_seq(lst)):
        cur_value = iterator_fn(item,idx,lst)
        # track "found" explicitly: testing "not max_value" would wrongly
        # restart whenever the best value so far is falsy (0, "", ...)
        if not found or cur_value > max_value:
            found,max_value,max_item = True,cur_value,item
    return max_item

def min(lst,fn=_identity):
    """returns the item in lst for which fn is smallest
    in case of a tie, returns the first such item"""
    iterator_fn = _iterator_fn(fn)
    found,min_value,min_item = False,None,None
    for idx,item in enumerate(_seq(lst)):
        cur_value = iterator_fn(item,idx,lst)
        # same explicit flag as in max(), so falsy values are handled correctly
        if not found or cur_value < min_value:
            found,min_value,min_item = True,cur_value,item
    return min_item

def sum_by(lst,fn=_identity):
    """returns the sum of fn applied to each item in lst"""
    # the builtin sum() is unusable here: "sum = sum_by" below rebinds the
    # module-level name, so calling sum() would recurse forever
    total = 0
    for value in map(lst,fn):
        total += value
    return total

sum = sum_by

def shuffle(lst):
    """returns the items in lst in random order"""
    items = list(_seq(lst))
    random.shuffle(items)  # shuffles in place and returns None
    return items

def to_list(lst):
    return list(lst)

def to_dict(kvps):
    return dict(kvps)

# array functions

def first(array,n=1):
    """lazy returns the first n items in array"""
    if n <= 0:
        return
    for idx,item in enumerate(array):
        yield item
        # stop right after the n-th item, without pulling one more from array
        if idx + 1 >= n:
            break

take = first

def rest(array,n=1):
    """skips the first n items in array
    lazy returns the rest"""
    for idx,item in enumerate(array):
        if idx >= n:
            yield item

skip = rest

def head(lst):
    """returns the first item in lst (raises StopIteration if lst is empty)"""
    return next(iter(_seq(lst)))

# chaining

class Chain(object):
    def __init__(self,obj):
        self.obj = obj
    def __repr__(self):
        return "Chain(%r)" % (self.obj,)
    def value(self,generator_to_list=True):
        if isinstance(self.obj, dict):
            return self.obj
        elif generator_to_list and (inspect.isgenerator(self.obj) or
                                    type(self.obj) == type({}.iteritems())):
            return list(self.obj)
        else:
            return self.obj
    def __getattr__(self,name):
        # called only for names Chain doesn't define itself: dispatch to the
        # module-level function of that name, if there is one
        if name.startswith('_') or not inspect.isfunction(globals().get(name)):
            raise AttributeError(name)
        def method_missing(*args):
            self.obj = globals()[name](self.obj,*args)
            return self
        return method_missing
--------------------------------------------------------------------------------