├── README.md
└── chain.py


/README.md:
--------------------------------------------------------------------------------
  1 | chain-py
  2 | 
  3 | Fluent sequence operations in Python
  4 | 
  5 | ========
  6 | 
  7 | *chain-py* is a Python library that allows you to fluently chain together sequence operations.  It is heavily inspired by the `chain` functionality in [underscore.js](http://underscorejs.org/), and also by the [Seq module](http://msdn.microsoft.com/en-us/library/ee353635.aspx) in F#.  (Before building this I found the similar [Moka](http://www.phzbox.com/moka/index.html), which didn't really do what I needed, hence chain.)
  8 | 
  9 | To use chain, you wrap a collection in a `Chain()` object, call sequence operations on it, and end with a call to `value()`.  It is helpful to wrap the entire call in parentheses so you can abuse whitespace.
 10 | 
 11 | # Examples
 12 | 
 13 | Imagine the following dataset representing an unpopular blog:
 14 | 
 15 |     Post = namedtuple("Post","subject num_comments tags")
 16 |     blog = [ Post("First post!",               0,  ["blogging","about me"]),
 17 |              Post("Why is no one commenting?", 1,  ["blogging","gripes","complaints"]),
 18 |              Post("Hello?",                    0,  ["blogging","hello"]),
 19 |              Post("I quit!",                   10, ["not blogging","goodbye"]) ]
 20 | 
 21 | Start with
 22 | 
 23 |     from chain import Chain
 24 |              
 25 | Then you can count the number of posts with no comments with
 26 | 
 27 |     (Chain(blog)
 28 |         .filter(lambda post: post.num_comments == 0)
 29 |         .length()
 30 |         .value())
 31 |         
 32 | Of course you could do this easily with list comprehensions:
 33 | 
 34 |     len([post for post in blog if post.num_comments == 0])
 35 |     
 36 | But the fluent way is (for me) easier to read.  And it allows much more complicated operations.  For instance, you could compute a histogram of comment counts for posts tagged "blogging":
 37 | 
 38 |     (Chain(blog)
 39 |         .filter(lambda post: "blogging" in post.tags)
 40 |         .count_by(lambda post: post.num_comments)
 41 |         .value())
 42 |         
 43 | Or the total number of comments per tag:
 44 | 
 45 |     (Chain(blog)
 46 |         .collect(lambda post: [(tag, post.num_comments) for tag in post.tags])
 47 |         .group_by(lambda (tag, num_comments): tag)
 48 |         .map(lambda (tag,pairs): (tag,sum([num_comments for t,num_comments in pairs])))
 49 |         .value())
 50 |         
 51 | Internally, `dict`s are always treated as sequences of key-value pairs.  However, you can return a dictionary by calling `to_dict()` right before `value()`:        
 52 | 
 53 |     (Chain(blog)
 54 |         .collect(lambda post: [(tag, post.num_comments) for tag in post.tags])
 55 |         .group_by(lambda (tag, num_comments): tag)
 56 |         .map(lambda (tag,pairs): (tag,sum([num_comments for t,num_comments in pairs])))
 57 |         .to_dict()
 58 |         .value())
 59 |         
 60 | # Functionality
 61 | 
 62 | The code is pretty self-explanatory.  In general, operations are lazy wherever possible, so that (e.g.)
 63 | 
 64 |     (Chain(collection)
 65 |         .map(map_function)
 66 |         .take(10)
 67 |         .value())
 68 |         
 69 | will only ever look at the first 10 elements of `collection`.
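
A rough standalone illustration of this laziness, written for Python 3 with only the standard library rather than chain-py itself: a generator expression stands in for `map` and `itertools.islice` stands in for `take`.  The names `noisy_square` and `seen` are made up for the example.

```python
from itertools import islice

seen = []  # records every element the mapping function actually touches

def noisy_square(x):
    """Hypothetical mapping function that logs each call."""
    seen.append(x)
    return x * x

# a generator expression plays the role of a lazy map
mapped = (noisy_square(x) for x in range(1000))

# islice plays the role of take(10): it stops pulling after 10 items
first_ten = list(islice(mapped, 10))
```

Only ten calls end up recorded in `seen`, even though the source has 1000 elements.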
 70 | 
 71 | Most of the chain operators accept functions of either one, two, or three arguments.  A function of one argument gets each element in turn.  A function of two arguments gets each element and its index.  And a function of three arguments also gets the entire collection.
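
The dispatch on argument count might be sketched as follows.  This is a Python 3 approximation, not the library's own code; `call_with_arity` is a hypothetical helper name, and it uses `inspect.signature` to count the parameters.

```python
import inspect

def call_with_arity(fn, item, idx, collection):
    """Call fn with as many of (item, idx, collection) as it declares."""
    num_args = len(inspect.signature(fn).parameters)
    if num_args > 3:
        raise TypeError("fn can take at most 3 arguments")
    return fn(*(item, idx, collection)[:num_args])

letters = ["a", "b", "c"]

# one argument: just the element
one = [call_with_arity(lambda x: x.upper(), x, i, letters)
       for i, x in enumerate(letters)]

# two arguments: the element and its index
two = [call_with_arity(lambda x, i: x * (i + 1), x, i, letters)
       for i, x in enumerate(letters)]

# three arguments: the element, its index, and the whole collection
three = [call_with_arity(lambda x, i, xs: len(xs) - i, x, i, letters)
         for i, x in enumerate(letters)]
```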
 72 | 
 73 | In almost every case, if the function `fn` is omitted, it defaults to the identity function.  This is not particularly interesting for (e.g.) `map`, but is often useful in (e.g.) `sort_by`.
 74 | 
 75 | Most of the chain operations will implicitly convert `dict`s to sequences of key-value pairs, so that you could do something like
 76 | 
 77 |     (Chain(my_dict)
 78 |         .map(lambda (k,v): (k, v + 1))
 79 |         .to_dict()
 80 |         .value())
 81 | 
 82 | The operations that assume an order (such as `first`) generally skip this conversion, so that
 83 | 
 84 |     Chain(my_dict).first(1).value()
 85 |     
 86 | might not do what you expect.  (It's not clear what you'd expect it to do anyway.)
 87 |         
 88 | 
 89 | # Operators
 90 | 
 91 | In all of the below, "returns..." really means "replaces the collection stored in Chain with..."
 92 | 
 93 | ### `length()` (alternatively `size`)
 94 | 
 95 | returns the length of the collection.  You'd probably only ever use this as your last call before `value()`.
 96 | 
 97 | ### `each(fn)` (alternatively `for_each`)
 98 | 
 99 | invokes `fn` on each element of the collection, leaving the collection unchanged
100 | 
101 | ### `map(fn)`
102 | 
103 | applies `fn` to each element and returns the resulting collection
104 | 
105 | ### `append(second_collection)`
106 | 
107 | appends the `second_collection` to the end of the currently stored collection
108 | 
109 | ### `reduce(fn,memo)`
110 | 
111 | sets memo to `fn(memo,first_element)` then `fn(memo,second_element)`, and so on, returning the final value of `memo`.
112 | 
113 | ### `reduce_right(fn,memo)`
114 | 
115 | same as above, but works right-to-left
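
To make the direction of the two folds concrete, here is a small Python 3 sketch using the standard library's `functools.reduce` instead of chain-py.  Note that `functools.reduce` takes its arguments as `(fn, iterable, initial)`, a different order from the chain operator; `fn` and `digits` are example names only.

```python
from functools import reduce

digits = [1, 2, 3, 4]

def fn(memo, item):
    """Append the item to the running string so the visiting order is visible."""
    return memo + str(item)

# left-to-right fold: memo is "", then "1", then "12", ...
left = reduce(fn, digits, "")

# right-to-left fold: the same function applied to the reversed sequence
right = reduce(fn, reversed(digits), "")
```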
116 | 
117 | ### `rev()`
118 | 
119 | reverses the collection
120 | 
121 | ### `find(fn)`
122 | 
123 | returns the first element for which `fn` produces a truthy value.  Raises a `NameError` if no such element exists.
124 | 
125 | ### `collect(fn)`
126 | 
127 | returns the concatenation of `fn(first_element)`, `fn(second_element)`, etc...
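
In other words, `collect` behaves like a "flat map".  A minimal Python 3 sketch of the same idea with `itertools.chain.from_iterable` (standard library, not chain-py); the `posts` data and `tags_of` helper are invented for the example:

```python
from itertools import chain

posts = [("First post!", ["blogging", "about me"]),
         ("Hello?", ["blogging", "hello"])]

def tags_of(post):
    """Return the list of tags for a (subject, tags) pair."""
    subject, tags = post
    return tags

# apply tags_of to each post, then concatenate the resulting lists
all_tags = list(chain.from_iterable(tags_of(post) for post in posts))
```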
128 | 
129 | ### `filter(fn)`
130 | 
131 | returns the elements for which `fn` produces a truthy value
132 | 
133 | ### `reject(fn)`
134 | 
135 | returns the elements for which `fn` produces a falsy value
136 | 
137 | ### `sort_by(fn)`
138 | 
139 | sorts the collection by the values of `fn`.
140 | 
141 | ### `sort_by_descending(fn)`
142 | 
143 | sorts the collection by the values of `fn` descending
144 | 
145 | ### `group_by(fn)`
146 | 
147 | returns pairs where the first element is a value achieved by `fn` and the second element is the elements in the collection that `fn` maps to that value.  
148 | 
149 | ### `count_by(fn)`
150 | 
151 | returns pairs where the first element is a value achieved by `fn` and the second element is the number of elements in the collection that `fn` maps to that value
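
The shape of the pairs produced by `group_by` and `count_by` can be seen in this standalone Python 3 sketch.  The two functions below are simplified stand-ins written for illustration (single-argument `fn` only), not the library's implementation.

```python
from collections import defaultdict

def group_by(items, fn):
    """Return (key, members) pairs, where key = fn(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[fn(item)].append(item)
    return list(groups.items())

def count_by(items, fn):
    """Return (key, count) pairs, where key = fn(item)."""
    return [(key, len(members)) for key, members in group_by(items, fn)]

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
grouped = group_by(words, lambda w: w[0])   # group by first letter
counted = count_by(words, lambda w: w[0])   # count by first letter
```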
152 | 
153 | ### `distinct_by(fn)`
154 | 
155 | returns the subset of collection consisting of the first time each value of `fn` appears.  
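
A simplified Python 3 sketch of that behaviour, assuming a single-argument `fn` whose results are hashable (a stand-in for illustration, not the library's code):

```python
def distinct_by(items, fn):
    """Yield each item whose fn(item) has not been seen before."""
    seen = set()
    for item in items:
        key = fn(item)
        if key not in seen:
            seen.add(key)
            yield item

words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]

# keep only the first word for each starting letter
firsts = list(distinct_by(words, lambda w: w[0]))
```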
156 | 
157 | ### `max(fn)`
158 | 
159 | returns the *element* for which `fn` achieves its largest value (not the value itself)
160 | 
161 | ### `min(fn)`
162 | 
163 | returns the *element* for which `fn` achieves its smallest value (not the value itself)
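
The distinction between the element and the value is the same one Python's builtin `max` and `min` make with their `key` argument.  A short Python 3 sketch using the builtins rather than chain-py (the `words` list is example data):

```python
words = ["fig", "banana", "kiwi"]

# the element for which len is largest
longest = max(words, key=len)

# the largest value of len itself
longest_len = max(len(w) for w in words)

# the element for which len is smallest
shortest = min(words, key=len)
```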
164 | 
165 | ### `sum_by(fn)` (also `sum`)
166 | 
167 | returns the sum of `fn(item)` for each item in the collection
168 | 
169 | ### `shuffle()`
170 | 
171 | randomly shuffles the elements in the collection
172 | 
173 | ### `first(n)` (also `take(n)`)
174 | 
175 | returns only the first `n` items. 
176 | 
177 | ### `rest(n)` (also `skip(n)`)
178 | 
179 | returns all but the first `n` items
180 | 
181 | ### `head()`
182 | 
183 | returns the first item itself (whereas `first()` would return a collection containing just the first item)
184 | 
185 | ### `to_list()`
186 | 
187 | by default, `value()` turns a lazy generator into a list; pass `generator_to_list=False` to get the generator itself.  Calling `to_list()` forces the collection into a list at that point in the chain.
188 | 
189 | ### `to_dict()`
190 | 
191 | as mentioned above, dicts are treated as collections of key-value pairs.  `to_dict()` coerces the collection to an actual Python `dict`.
192 | 
193 | ### `value()`
194 | 
195 | you must always call `value()` last to return the Chain's value rather than the Chain itself.
196 | 
197 | # License
198 | 
199 | Do whatever you want with this code.
200 | 
201 | # Feedback
202 | 
203 | I'm sure there's all sorts of stupid design decisions I've made.  Let me know about them!  Or, if you find this useful, let me know that too!
204 | 
205 | 


--------------------------------------------------------------------------------
/chain.py:
--------------------------------------------------------------------------------
  1 | #chain.py
  2 | 
  3 | import inspect
  4 | import random
  5 | from collections import defaultdict
  6 | 
  7 | # functions intended for internal use only
  8 | 
  9 | def _identity(value):
 10 |     """not very interesting, except used as default everywhere"""
 11 |     return value
 12 | 
 13 | def _arity(f):
 14 |     """how many arguments does f take?"""
 15 |     return len(inspect.getargspec(f).args)
 16 | 
 17 | def _seq(iterable):
 18 |     """want dicts to iterate as key-value pairs, so have to convert"""
 19 |     if type(iterable) == dict:
 20 |         return iterable.iteritems()
 21 |     else:
 22 |         return iterable
 23 | 
 24 | def _iterator_fn(fn):
 25 |     """returns a function of (item,idx,lst) that takes into account _arity of f
 26 |     if 0: new function returns f()
 27 |     if 1: ... returns f(item)
 28 |     if 2: ... returns f(item,idx)
 29 |     if 3: ... returns f(item,idx,lst)"""
 30 |     num_args = _arity(fn)
 31 |     if num_args == 0: return (lambda item,idx,lst: fn())
 32 |     elif num_args == 1: return (lambda item,idx,lst: fn(item))
 33 |     elif num_args == 2: return (lambda item,idx,lst: fn(item,idx))
 34 |     elif num_args == 3: return (lambda item,idx,lst: fn(item,idx,lst))
 35 |     else: raise NameError("fn can take at most 3 arguments")
 36 | 
 37 | def length(lst):
 38 |     """for chaining"""
 39 |     return len(list(lst))
 40 | 
 41 | size = length
 42 | 
 43 | def each(lst,fn):
 44 |     """call fn on each item in lst, then return lst"""
 45 |     iterator_fn = _iterator_fn(fn)
 46 |     for idx,item in enumerate(_seq(lst)):
 47 |         iterator_fn(item,idx,lst)
 48 |     return lst
 49 | 
 50 | for_each = each
 51 | 
 52 | def map(lst,fn=_identity):
 53 |     """call fn on each item in lst, lazy return results"""
 54 |     iterator_fn = _iterator_fn(fn)
 55 |     for idx,item in enumerate(_seq(lst)):
 56 |         yield iterator_fn(item,idx,lst)
 57 | 
 58 | def append(lst,lst2):
 59 |     for item in lst:
 60 |         yield item
 61 |     for item in lst2:
 62 |         yield item
 63 | 
 64 | def reduce(lst,fn,memo):
 65 |     """set memo = fn(memo,lst[0]), then fn(memo,lst[1]), and so on ...
 66 |     returns final value of memo"""
 67 |     for item in _seq(lst):
 68 |         memo = fn(memo,item)
 69 |     return memo
 70 | 
 71 | def reduce_right(lst,fn,memo):
 72 |     """set memo = fn(memo,lst[-1]), then fn(memo,lst[-2]), and so on ...
 73 |     returns final value of memo"""
 74 |     for item in reversed(list(_seq(lst))):  # reversed() needs a sequence, not an iterator
 75 |         memo = fn(memo,item)
 76 |     return memo
 77 | 
 78 | def rev(lst):
 79 |     """return lst in reverse order"""
 80 |     return list(_seq(lst))[::-1]  # materialize first: generators and dict iterators cannot be reversed
 81 | 
 82 | def find(lst,fn=_identity):
 83 |     """returns the first item in lst satisfying fn
 84 |     raises an error if no item satisfies"""
 85 |     iterator_fn = _iterator_fn(fn)
 86 |     for idx,item in enumerate(_seq(lst)):
 87 |         if iterator_fn(item,idx,lst):
 88 |             return item
 89 |     raise NameError("unable to find")
 90 | 
 91 | def collect(lst,fn=_identity):
 92 |     """lazy-return the concatenation of fn(lst[0]), fn(lst[1]), ..."""
 93 |     iterator_fn = _iterator_fn(fn)
 94 |     for idx,item in enumerate(_seq(lst)):
 95 |         for sub_item in iterator_fn(item,idx,lst):
 96 |             yield sub_item
 97 |     
 98 | def filter(lst,fn=_identity):
 99 |     """lazy-return the values in lst that satisfy fn"""
100 |     iterator_fn = _iterator_fn(fn)
101 |     for idx,item in enumerate(_seq(lst)):
102 |         if iterator_fn(item,idx,lst):
103 |             yield item
104 | 
105 | def reject(lst,fn=_identity):
106 |     """lazy-return the values in lst that don't satisfy fn"""
107 |     iterator_fn = _iterator_fn(fn)
108 |     for idx,item in enumerate(_seq(lst)):
109 |         if not iterator_fn(item,idx,lst):
110 |             yield item
111 | 
112 | def all(lst,fn=_identity):
113 |     """True iff all elements of lst satisfy fn"""
114 |     iterator_fn = _iterator_fn(fn)
115 |     for idx,item in enumerate(_seq(lst)):
116 |         if not iterator_fn(item,idx,lst): return False
117 |     return True
118 | 
119 | def any(lst,fn=_identity):
120 |     """True iff at least one element of lst satisfies fn"""
121 |     iterator_fn = _iterator_fn(fn)
122 |     for idx,item in enumerate(_seq(lst)):
123 |         if iterator_fn(item,idx,lst): return True
124 |     return False
125 | 
126 | def sort_by(lst,fn=_identity):
127 |     """returns the elements of lst sorted by fn"""
128 |     value_pairs = [(fn(item),item) for item in _seq(lst)]
129 |     return [z[1] for z in sorted(value_pairs)]
130 | 
131 | def sort_by_descending(lst,fn=_identity):
132 |     """returns the elements of lst sorted by fn in descending order"""
133 |     value_pairs = [(fn(item),item) for item in _seq(lst)]
134 |     return [z[1] for z in sorted(value_pairs,reverse=True)]
135 | 
136 | 
137 | def group_by(lst,fn=_identity):
138 |     """returns a generator of key-value pairs
139 |     where the key is fn applied to each item
140 |     and the value is a list of the corresponding items"""
141 |     iterator_fn = _iterator_fn(fn)
142 |     group_dict = defaultdict(list)
143 |     for idx,item in enumerate(_seq(lst)):
144 |         key = iterator_fn(item,idx,lst)
145 |         group_dict[key].append(item)
146 |     return group_dict.iteritems()
147 | 
148 | def count_by(lst,fn=_identity):
149 |     """returns a generator of key-value pairs
150 |     where the key is fn applied to each item
151 |     and the value is the number of items corresponding to the key"""
152 |     iterator_fn = _iterator_fn(fn)
153 |     count_dict = defaultdict(int)
154 |     for idx,item in enumerate(_seq(lst)):
155 |         key = iterator_fn(item,idx,lst)
156 |         count_dict[key] += 1
157 |     return count_dict.iteritems()
158 | 
159 | def distinct_by(lst,fn=_identity):
160 |     """returns a generator where each fn(item) can only appear once"""
161 |     iterator_fn = _iterator_fn(fn)
162 |     seen = set()
163 |     for idx,item in enumerate(_seq(lst)):
164 |         key = iterator_fn(item,idx,lst)
165 |         if not key in seen:
166 |             seen.add(key)
167 |             yield item
168 | 
169 | distinct = distinct_by
170 | 
171 | def max(lst,fn=_identity):
172 |     """returns the item in lst for which fn is largest
173 |     in case of a tie, returns the first such item"""
174 |     iterator_fn = _iterator_fn(fn)
175 |     max_value,max_item = None,None
176 |     for idx,item in enumerate(_seq(lst)):
177 |         cur_value = iterator_fn(item,idx,lst)
178 |         if idx == 0 or cur_value > max_value:  # idx test avoids treating a falsy maximum (e.g. 0) as unset
179 |             max_value,max_item = cur_value,item
180 |     return max_item
181 | 
182 | def min(lst,fn=_identity):
183 |     """returns the item in lst for which fn is smallest
184 |     in case of a tie, returns the first such item"""
185 |     iterator_fn = _iterator_fn(fn)
186 |     min_value,min_item = None,None
187 |     for idx,item in enumerate(_seq(lst)):
188 |         cur_value = iterator_fn(item,idx,lst)
189 |         if idx == 0 or cur_value < min_value:  # idx test avoids treating a falsy minimum (e.g. 0) as unset
190 |             min_value,min_item = cur_value,item
191 |     return min_item  
192 | 
193 | def sum_by(lst,fn=_identity):
194 |     """returns the sum of fn applied to each item in lst"""
195 |     return reduce(map(lst,fn), lambda memo,value: memo + value, 0)  # the alias below shadows the builtin sum
196 | 
197 | sum = sum_by
198 | 
199 | def shuffle(lst):
200 |     """returns the items in lst in random order"""
201 |     return sorted(_seq(lst), key=lambda item: random.random())  # random.shuffle works in place and returns None
202 | 
203 | def to_list(lst):
204 |     return list(lst)
205 | 
206 | def to_dict(kvps):
207 |     return dict(kvps)
208 | 
209 | # array functions
210 | 
211 | def first(array,n=1):
212 |     """lazy returns the first n items in array"""
213 |     for idx,item in enumerate(array):
214 |         if idx < n:
215 |             yield item
216 |         else:
217 |             break
218 | 
219 | take = first
220 | 
221 | def rest(array,n=1):
222 |     """skips the first n items in array
223 |     lazy returns the rest"""
224 |     for idx,item in enumerate(array):
225 |         if idx >= n:
226 |             yield item
227 | 
228 | skip = rest
229 | 
230 | def head(lst):
231 |     if inspect.isgenerator(lst):
232 |         return lst.next()
233 |     else:
234 |         return lst[0]
235 | 
236 | # chaining
237 | 
238 | class Chain:
239 |     def __init__(self,obj):
240 |         self.obj = obj
241 |     def __repr__(self):
242 |         return "Chain(%s)" % self.obj.__repr__()
243 |     def value(self,generator_to_list=True):
244 |         if type(self.obj) == dict:
245 |             return self.obj
246 |         elif generator_to_list and (inspect.isgenerator(self.obj) or
247 |                                   type(self.obj) == type({}.iteritems())):
248 |             return list(self.obj)
249 |         else:
250 |             return self.obj
251 |     def __getattr__(self,name):
252 |         def method_missing(*args):
253 |             self.obj = globals()[name](self.obj,*args)
254 |             return self
255 |         return method_missing
256 | 


--------------------------------------------------------------------------------