├── .gitignore ├── README.md ├── checkpoints ├── __init__.py └── checkpoints.py ├── setup.py └── tests ├── __init__.py └── test_checkpoints.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | .ipynb_checkpoints/ 3 | __pycache__/ 4 | checkpoints.egg-info/ 5 | dist/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # checkpoints [![PyPi version](https://img.shields.io/pypi/v/checkpoints.svg)](https://pypi.python.org/pypi/checkpoints/) ![t](https://img.shields.io/badge/status-stable-green.svg) 2 | 3 | ![demo](http://i.imgur.com/paxQ51Y.gif) 4 | 5 | `checkpoints` is a tiny module which adds new `pandas.DataFrame.safe_apply` and `pandas.Series.safe_map` 6 | methods: stop-and-start versions of the `pandas.DataFrame.apply` and `pandas.Series.map` operations which cache 7 | partial results between runs in case an exception is thrown. 8 | 9 | This means that the next time these functions are called, the operation will pick back up where it failed, instead 10 | of restarting all the way back at the beginning of the map. After all, there's nothing more aggravating than waiting ages for a 11 | process to complete, only to lose all of your data on the last iteration! 12 | 13 | Just `pip install checkpoints` to get started. 14 | 15 | ## Why? 16 | 17 | For a writeup with a practical example of what `checkpoints` can do for you see [this post on my personal blog](http://www.residentmar.io/2016/10/29/saving-progress-pandas.html). 18 | 19 | ## Quickstart 20 | 21 | To start, import `checkpoints` and enable it: 22 | 23 | >>> from checkpoints import checkpoints 24 | >>> checkpoints.enable() 25 | 26 | This will augment your environment with `pandas.Series.safe_map` and `pandas.DataFrame.safe_apply` methods.
Now 27 | suppose we create a `Series` of floats, except for one invalid entry smack in the middle: 28 | 29 | >>> import pandas as pd; import numpy as np 30 | >>> rand = pd.Series(np.random.random(100)) 31 | >>> rand[50] = "____" 32 | 33 | Suppose we want to remean this data. If we apply a naive `map`: 34 | 35 | >>> rand.map(lambda v: v - 0.5) 36 | 37 | TypeError: unsupported operand type(s) for -: 'str' and 'float' 38 | 39 | Not only are the results up to that point lost, but we're also not actually told where the failure occurs! Using 40 | `safe_map` instead: 41 | 42 | >>> rand.safe_map(lambda v: v - 0.5) 43 | 44 | /checkpoint/checkpoints/checkpoints.py:96: UserWarning: Failure on index 50 45 | TypeError: unsupported operand type(s) for -: 'str' and 'float' 46 | 47 | All of the prior results are cached, and we can retrieve them at will with `checkpoints.results`: 48 | 49 | >>> checkpoints.results 50 | 51 | 0 -0.189003 52 | 1 0.337332 53 | 2 -0.143698 54 | 3 -0.312296 55 | ... 56 | 47 -0.188995 57 | 48 -0.286550 58 | 49 -0.258107 59 | dtype: float64 60 | 61 | `checkpoints` will store the partial results until either the process fully completes or it is explicitly told to get 62 | rid of them using `checkpoints.flush()`: 63 | 64 | >>> checkpoints.flush() 65 | >>> checkpoints.results 66 | None 67 | 68 | You can also induce this by passing a `flush=True` argument to `safe_map`. 
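The caching life cycle above (partial results persist until a run completes or they are explicitly flushed) can be sketched as a standalone toy class. The `PartialCache` name is hypothetical, a minimal illustration rather than the library's actual code:

```python
class PartialCache:
    """Toy model of the flush semantics described above (illustration only)."""

    def __init__(self):
        self._results = []  # outputs cached so far

    @property
    def results(self):
        # Mirrors `checkpoints.results`: None when nothing is cached.
        return list(self._results) if self._results else None

    def flush(self):
        # Explicitly discard any cached partial results.
        self._results = []
```

In the real module, passing `flush=True` to `safe_map` performs this same discard step before mapping begins.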
69 | 70 | `pd.DataFrame.safe_apply` is similar: 71 | 72 | >>> rand = pd.DataFrame(np.random.random(100).reshape((20,5))) 73 | >>> rand[2][10] = "____" 74 | >>> rand.apply(lambda srs: srs.sum()) 75 | 76 | TypeError: unsupported operand type(s) for +: 'float' and 'str' 77 | 78 | >>> rand.safe_apply(lambda srs: srs.sum()) 79 | 80 | /checkpoint/checkpoints/checkpoints.py:49: UserWarning: Failure on index 2 81 | TypeError: unsupported operand type(s) for +: 'float' and 'str' 82 | 83 | >>> checkpoints.results 84 | 85 | 0 9.273607 86 | 1 8.259637 87 | 2 8.359239 88 | 3 7.873243 89 | dtype: float64 90 | 91 | Finally, disable `checkpoints`: 92 | 93 | >>> checkpoints.disable() 94 | 95 | ## Performance 96 | 97 | Maintaining checkpoints introduces some overhead, but really not that much. `DataFrame` performance differs by a 98 | reasonably small constant factor, while `Series` performance is one-to-one: 99 | 100 | ![Performance charts](http://i.imgur.com/jFIgXOG.png) 101 | 102 | ## Technicals 103 | 104 | Under the hood, `checkpoints` implements a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine), 105 | `CheckpointStateMachine`, which uses a simple list to keep track of which entries have and haven't been mapped yet. 106 | The function fed to `safe_*` is placed in a `wrapper` which redirects its output to a `results` list. When a map is 107 | interrupted midway, then rerun, `safe_*` partitions the input, using the length of `results` to skip ahead to the first 108 | entry that has not yet produced an output, and continues to run the `wrapper` on that slice. 109 | 110 | An actual `pandas` object isn't generated until **all** entries have been mapped. At that point `results` is 111 | repackaged into a `Series` or `DataFrame` and returned, and the cache is wiped, leaving `CheckpointStateMachine` 112 | ready to handle the next set of inputs.
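The resume mechanism described above can be sketched in a few lines of standalone Python. The names (`results`, `safe_run`) are illustrative and not the actual `CheckpointStateMachine` implementation:

```python
# Standalone sketch of the resume mechanism described above (illustrative
# names; not the actual CheckpointStateMachine code).
results = []  # cache of outputs from entries mapped so far

def safe_run(values, func):
    # Skip entries whose outputs are already cached from a failed run.
    remaining = values[len(results):]
    for v in remaining:
        results.append(func(v))  # an exception here leaves the cache intact
    out = list(results)
    results.clear()  # wiped only once every entry has been mapped
    return out
```

On the first failing call the exception propagates but the cache survives; after fixing the bad entry, calling `safe_run` again resumes from the first uncached entry rather than from the beginning.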
113 | 114 | ## Limitations 115 | 116 | * Progress bars are another feature useful for long-running jobs, but as of now there is no way to integrate 117 | `checkpoints` with e.g. [`tqdm`](https://github.com/tqdm/tqdm). The workaround is to estimate the time cost of your 118 | process beforehand. 119 | * `pandas.DataFrame.safe_apply` jobs on functions returning a `DataFrame` are not currently implemented, and will 120 | simply return `None`. This means that e.g. the following will silently fail: 121 | 122 | `>>> pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]}))` 123 | 124 | 125 | * The `Series.map` `na_action` parameter is not implemented; nor are any of `broadcast`, `raw`, or `reduce` for 126 | `DataFrame.apply`. 127 | 128 | ## See also 129 | 130 | `checkpoints` provides a form of [defensive programming](https://en.wikipedia.org/wiki/Defensive_programming). If 131 | you're a fan of this sort of thing, you should also check out [`engarde`](https://github.com/TomAugspurger/engarde). 132 | 133 | ## Contributing 134 | 135 | Bugs? Thoughts? Feature requests? [Throw them at the bug tracker and I'll take a look](https://github.com/ResidentMario/checkpoints/issues). 136 | 137 | As always I'm very interested in hearing feedback—reach out to me at `aleksey@residentmar.io`.
138 | -------------------------------------------------------------------------------- /checkpoints/__init__.py: -------------------------------------------------------------------------------- 1 | from .checkpoints import checkpoints 2 | -------------------------------------------------------------------------------- /checkpoints/checkpoints.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from pandas import DataFrame, Series 3 | import warnings 4 | 5 | 6 | class CheckpointStateMachine: 7 | 8 | def __init__(self, **kwargs): 9 | self._results = [] 10 | self._index = None 11 | self._caller = None 12 | self._axis = 0 13 | # cf. self.results, virtualized using `__getattr__()`. 14 | 15 | def disable(self): 16 | """ 17 | Core runtime method, disables all of the `checkpoints` safe mappers. 18 | """ 19 | self.flush() 20 | del DataFrame.safe_apply 21 | del Series.safe_map 22 | 23 | def enable(self): 24 | """ 25 | Core runtime method, enables all of the `checkpoints` safe mappers. 26 | """ 27 | 28 | def safe_apply(df, func, **kwargs): 29 | """ 30 | Core method, implements a cached version of `pandas.DataFrame.apply`. 31 | """ 32 | 33 | # If flushed, restart from scratch. 34 | if 'flush' in kwargs and kwargs['flush'] == True: 35 | kwargs.pop('flush') 36 | self.flush() 37 | 38 | # If index is not defined, define it. Note that this is not the index of the original DataFrame, 39 | # but the index of concatenation. If we are slicing and then concatenating on the columns (0), 40 | # the index that we must append at the end is of the column headers; if we are slicing and concatenating on 41 | # the rows (1), it's the row headers, e.g. the original index. 42 | # 43 | # See `__getattr__` for more shenanigans around this.
44 | if self._index is None: 45 | if 'axis' in kwargs and (kwargs['axis'] == 1 or kwargs['axis'] == 'columns'): 46 | self._axis = 1 47 | self._index = df.index 48 | else: 49 | self._axis = 0 50 | self._index = df.columns 51 | 52 | # If caller is not defined, define it. 53 | if self._caller is None: 54 | self._caller = "safe_apply" 55 | 56 | # Prune **kwargs of unimplemented pd.DataFrame.apply features---as of pandas 0.19, `broadcast`, `raw`, 57 | # and `reduce`. 58 | if 'broadcast' in kwargs or 'raw' in kwargs or 'reduce' in kwargs: 59 | raise NotImplementedError 60 | 61 | # Shorten the jobs list to the DataFrame elements remaining. 62 | df_remaining = df.iloc[len(self._results):] 63 | 64 | # Replace the original applied `func` with a stateful wrapper. 65 | def wrapper(srs, **kwargs): 66 | try: 67 | self._results.append(func(srs, **kwargs)) 68 | except (KeyboardInterrupt, SystemExit): 69 | raise 70 | except: 71 | warnings.warn("Failure on index {0}".format(len(self._results)), UserWarning) 72 | raise 73 | 74 | # Populate `self.results`. 75 | if 'axis' in kwargs and (kwargs['axis'] == 1 or kwargs['axis'] == 'columns'): 76 | kwargs.pop('axis') 77 | for _, srs in df_remaining.iterrows(): 78 | wrapper(srs, **kwargs) 79 | else: 80 | if 'axis' in kwargs: 81 | kwargs.pop('axis') 82 | for _, srs in df_remaining.iteritems(): 83 | wrapper(srs, **kwargs) 84 | 85 | # If we got here, then we didn't exit out due to an exception, and we can finish the method successfully. 86 | # Repackage the cached results into a `pandas` object via the `results` property. 87 | out = self.results 88 | 89 | # Reset the results set for the next iteration. 90 | self.flush() 91 | 92 | # Return the completed result. 93 | return out 94 | 95 | DataFrame.safe_apply = safe_apply 96 | 97 | def safe_map(srs, func, *args, **kwargs): 98 | """ 99 | Core method, implements a cached version of `pandas.Series.map`. 100 | """ 101 | 102 | # If flushed, restart from scratch.
103 | if kwargs.pop('flush', False): 104 | self.flush() 105 | 106 | # If index is not defined, define it. Same with caller. 107 | if self._index is None: 108 | self._index = srs.index 109 | if self._caller is None: 110 | self._caller = "safe_map" 111 | 112 | # Prune **kwargs of unimplemented pd.Series.map features---only one as of pandas 0.19, `na_action`. 113 | if 'na_action' in kwargs: 114 | raise NotImplementedError 115 | 116 | # Shorten the jobs list to the Series elements remaining. 117 | srs_remaining = srs.iloc[len(self._results):] 118 | 119 | # Replace the original applied `func` with a stateful wrapper. 120 | def wrapper(val): 121 | try: 122 | self._results.append(func(val)) 123 | except (KeyboardInterrupt, SystemExit): 124 | raise 125 | except: 126 | warnings.warn("Failure on index {0}".format(len(self._results)), UserWarning) 127 | raise 128 | 129 | # Populate `self.results`. 130 | for _, val in srs_remaining.iteritems(): 131 | wrapper(val) 132 | 133 | # If we got here, then we didn't exit out due to an exception, and we can finish the method successfully. 134 | # Repackage the cached results into a `pandas` object via the `results` property. 135 | out = self.results 136 | 137 | # Reset the results set for the next iteration. 138 | self.flush() 139 | 140 | # Return the completed result. 141 | return out 142 | 143 | Series.safe_map = safe_map 144 | 145 | def __getattr__(self, item): 146 | """ 147 | Implements a lazy getter for fetching partial results using `checkpoints.results`.
148 | """ 149 | if item != "results": 150 | raise AttributeError 151 | elif self._caller == "safe_map": 152 | if len(self._results) == 0: 153 | return None 154 | else: 155 | out = pd.Series(self._results) 156 | out.index = self._index[:len(out)] 157 | return out 158 | elif self._caller == "safe_apply": 159 | # import pdb; pdb.set_trace() 160 | if len(self._results) == 0: 161 | return None 162 | elif isinstance(self._results[0], Series): 163 | # Note that `self._index` is not the index of the DataFrame, but the index of concatenation. If we 164 | # are slicing and then concatenating on the columns (0), the index that we must append at the end is 165 | # of the column headers; if we are slicing and concatenating on the rows (1), it's the row headers, 166 | # e.g. the original index. 167 | # 168 | # Which of these two is the case totally changes which strategy we follow for gluing the data back 169 | # into a DataFrame matching our original design, requiring all of the tedium of keeping track of it 170 | # both here and above, when we bind `_axis` and `_index` variables. 171 | # import pdb; pdb.set_trace() 172 | if self._axis == 1: 173 | out = pd.DataFrame(self._results, index=self._index) 174 | else: 175 | out = pd.concat(self._results, axis=1) 176 | out.columns = self._index 177 | return out 178 | elif isinstance(self._results[0], DataFrame): 179 | pass 180 | else: 181 | out = pd.Series(self._results) 182 | out.index = self._index[:len(out)] 183 | return out 184 | 185 | def flush(self): 186 | """ 187 | Flushes the contents of the `checkpoints` state machine. 
188 | """ 189 | self._results = [] 190 | self._index = None 191 | self._caller = None 192 | 193 | 194 | checkpoints = CheckpointStateMachine() 195 | disable = checkpoints.disable 196 | enable = checkpoints.enable 197 | flush = checkpoints.flush 198 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | setup( 3 | name = 'checkpoints', 4 | packages = ['checkpoints'], # this must be the same as the name above 5 | py_modules=['checkpoints'], 6 | version = '0.0.1', 7 | description = 'Partial result caching for pandas in Python.', 8 | author = 'Aleksey Bilogur', 9 | author_email = 'aleksey.bilogur@residentmar.io', 10 | url = 'https://github.com/ResidentMario/checkpoints', 11 | download_url = 'https://github.com/ResidentMario/checkpoints/tarball/0.0.1', 12 | keywords = ['data', 'data analysis', 'exceptions', 'error handling', 'defensive programming,' 'data science', 13 | 'pandas', 'python', 'jupyter'], 14 | classifiers = [], 15 | ) -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ResidentMario/checkpoints/ecf5c97224c4aa551b16cb8171e48af43d4c781c/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_checkpoints.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from checkpoints import checkpoints 3 | import pandas as pd 4 | import numpy as np 5 | 6 | 7 | class TestInteractions(unittest.TestCase): 8 | 9 | def setUp(self): 10 | checkpoints.enable() 11 | 12 | def testEnable(self): 13 | pd.Series.safe_map 14 | pd.DataFrame.safe_apply 15 | 16 | def testFlush(self): 17 | checkpoints._results = ['foo', 'bar'] 18 | checkpoints._index = 
pd.Index([1,2]) 19 | checkpoints.flush() 20 | self.assertEqual(checkpoints._results, []) 21 | self.assertEqual(checkpoints._index, None) 22 | 23 | 24 | def testResultsEmpty(self): 25 | checkpoints.flush() 26 | self.assertEqual(checkpoints.results, None) 27 | 28 | def testResultsScalars(self): 29 | checkpoints._results = ['foo', 'bar'] 30 | checkpoints._index = pd.Index([1,2]) 31 | checkpoints._caller = "safe_apply" 32 | self.assertTrue(np.array_equal(checkpoints.results.values, np.array(['foo', 'bar']))) 33 | 34 | def testInvalidAttr(self): 35 | with self.assertRaises(AttributeError): 36 | checkpoints.loremipsum 37 | 38 | def testDisable(self): 39 | checkpoints.disable() 40 | with self.assertRaises(AttributeError): 41 | pd.Series.safe_map 42 | pd.DataFrame.safe_apply 43 | checkpoints.enable() 44 | 45 | def tearDown(self): 46 | checkpoints.disable() 47 | 48 | 49 | class TestSeriesMethods(unittest.TestCase): 50 | 51 | def setUp(self): 52 | checkpoints.enable() 53 | 54 | def testPartialScalarFunctionalInput(self): 55 | checkpoints.flush() 56 | srs = pd.Series(np.random.random(100)) 57 | srs[50] = 0 58 | 59 | def breaker(val): 60 | if val == 0: 61 | raise IOError 62 | else: 63 | return val 64 | 65 | import warnings 66 | warnings.filterwarnings('ignore') 67 | with self.assertRaises(IOError): 68 | srs.safe_map(breaker) 69 | self.assertEqual(len(checkpoints._results), 50) 70 | self.assertIsNot(checkpoints._index, None) 71 | srs[50] = 1 72 | result = srs.safe_map(breaker) 73 | self.assertIsNot(result, None) 74 | self.assertIsInstance(result, pd.Series) 75 | self.assertEqual(len(result), 100) 76 | 77 | def testCompleteScalarFunctionalInput(self): 78 | checkpoints.flush() 79 | srs = pd.Series(np.random.random(100)) 80 | m1 = np.average(srs.map(lambda val: val - 0.5)) 81 | m2 = np.average(srs.safe_map(lambda val: val - 0.5)) 82 | self.assertAlmostEqual(m1, m2) 83 | 84 | def testCompleteListFunctionalInput(self): 85 | checkpoints.flush() 86 | self.assertTrue( 87 | 
np.array_equal(pd.Series([1,2,3]).safe_map(lambda v: [1,2]), 88 | pd.Series([1, 2, 3]).map(lambda v: [1, 2])) 89 | ) 90 | 91 | def testCompleteSeriesFunctionalInput(self): 92 | checkpoints.flush() 93 | s1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.Series([1,2])) 94 | s2 = pd.Series([1, 2, 3]).map(lambda v: pd.Series([1,2])) 95 | self.assertTrue(isinstance(s1[0], pd.Series) and isinstance(s2[0], pd.Series)) 96 | self.assertTrue(np.array_equal(s1[0].values, s2[0].values)) 97 | self.assertTrue(np.array_equal(s1[1].values, s2[1].values)) 98 | 99 | def testCompleteDictFunctionalInput(self): 100 | checkpoints.flush() 101 | d1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.Series({'a': 1, 'b': 2})) 102 | d2 = pd.Series([1, 2, 3]).map(lambda v: pd.Series({'a': 1, 'b': 2})) 103 | self.assertTrue(isinstance(d1[0], pd.Series) and isinstance(d2[0], pd.Series)) 104 | self.assertTrue(np.array_equal(d1[0].values, d2[0].values)) 105 | self.assertTrue(np.array_equal(d1[1].values, d2[1].values)) 106 | 107 | def testCompleteDataFrameFunctionalInput(self): 108 | checkpoints.flush() 109 | d1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 110 | d2 = pd.Series([1, 2, 3]).map(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 111 | self.assertTrue(isinstance(d1[0], pd.DataFrame) and isinstance(d2[0], pd.DataFrame)) 112 | self.assertTrue(np.array_equal(d1[0].values, d2[0].values)) 113 | self.assertTrue(np.array_equal(d1[1].values, d2[1].values)) 114 | 115 | def testNotImplemented(self): 116 | with self.assertRaises(NotImplementedError): 117 | pd.Series([1]).safe_map(lambda v: v, na_action='ignore') 118 | 119 | def tearDown(self): 120 | checkpoints.disable() 121 | 122 | 123 | class TestDataFrameMethods(unittest.TestCase): 124 | 125 | def setUp(self): 126 | checkpoints.enable() 127 | 128 | def testPartialScalarFunctionalInput(self): 129 | checkpoints.flush() 130 | df = pd.DataFrame(np.random.random(100).reshape((20, 5))) 131 | df[2][4] = 0 132 | 133 | 
def breaker(srs): 134 | if 0 in srs.values: 135 | raise IOError 136 | else: 137 | return srs.sum() 138 | 139 | import warnings 140 | warnings.filterwarnings('ignore') 141 | with self.assertRaises(IOError): 142 | df.safe_apply(breaker, axis='columns') 143 | self.assertEqual(len(checkpoints._results), 4) 144 | self.assertIsNot(checkpoints._index, None) 145 | 146 | def testCompleteScalarFunctionalInput(self): 147 | checkpoints.flush() 148 | df = pd.DataFrame(np.random.random(100).reshape((20, 5))) 149 | m1 = np.average(df.apply(sum)) 150 | m2 = np.average(df.safe_apply(sum)) 151 | self.assertAlmostEqual(m1, m2) 152 | 153 | def testCompleteListFunctionalInput(self): 154 | checkpoints.flush() 155 | self.assertTrue( 156 | np.array_equal(pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: [1, 2]), 157 | pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: [1, 2])) 158 | ) 159 | 160 | def testCompleteSeriesFunctionalInput(self): 161 | checkpoints.flush() 162 | s1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series(['A','B'])) 163 | s2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series(['A','B'])) 164 | self.assertTrue(s1.equals(s2)) 165 | 166 | def testCompleteSeriesFunctionalInputColumnar(self): 167 | checkpoints.flush() 168 | s1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series(['A','B']), axis=1) 169 | s2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series(['A','B']), axis=1) 170 | self.assertTrue(s1.equals(s2)) 171 | 172 | def testCompleteDictFunctionalInput(self): 173 | checkpoints.flush() 174 | d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series({'a': 1, 'b': 2})) 175 | d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series({'a': 1, 'b': 2})) 176 | self.assertTrue(d1.equals(d2)) 177 | 178 | def testCompleteDictFunctionalInputColumnar(self): 179 | checkpoints.flush() 180 | d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 
4]}).safe_apply(lambda v: pd.Series({'a': 1, 'b': 2}), axis=1) 181 | d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series({'a': 1, 'b': 2}), axis=1) 182 | self.assertTrue(d1.equals(d2)) 183 | 184 | # This cannot be done. It should raise a NotImplementedError, except that there's no natural break point in the 185 | # code for doing so. It's documented as a limitation in the README. 186 | # def testCompleteDataFrameFunctionalInput(self): 187 | # checkpoints.flush() 188 | # d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 189 | # d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 190 | # self.assertTrue(d1.equals(d2)) 191 | 192 | def testFunctionWithKwargs(self): 193 | checkpoints.flush() 194 | 195 | def f(srs, **kwargs): 196 | return 1 + sum(kwargs.values()) 197 | 198 | r = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(f, somearg=5) 199 | self.assertTrue(np.array_equal(r.values, [6, 6])) 200 | 201 | def testNotImplemented(self): 202 | with self.assertRaises(NotImplementedError): 203 | pd.DataFrame({'a':[1], 'b': [2]}).safe_apply(lambda v: v, broadcast=True) 204 | with self.assertRaises(NotImplementedError): 205 | pd.DataFrame({'a': [1], 'b': [2]}).safe_apply(lambda v: v, raw=True) 206 | with self.assertRaises(NotImplementedError): 207 | pd.DataFrame({'a': [1], 'b': [2]}).safe_apply(lambda v: v, reduce=True) 208 | 209 | def tearDown(self): 210 | checkpoints.disable() 211 | --------------------------------------------------------------------------------