├── .gitignore ├── README.md ├── checkpoints ├── __init__.py └── checkpoints.py ├── setup.py └── tests ├── __init__.py └── test_checkpoints.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/ 2 | .ipynb_checkpoints/ 3 | __pycache__/ 4 | checkpoints.egg-info/ 5 | dist/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # checkpoints [![PyPi version](https://img.shields.io/pypi/v/checkpoints.svg)](https://pypi.python.org/pypi/checkpoints/) ![t](https://img.shields.io/badge/status-stable-green.svg) 2 | 3 | ![demo](http://i.imgur.com/paxQ51Y.gif) 4 | 5 | `checkpoints` is a tiny module which adds new `pandas.DataFrame.safe_apply` and `pandas.Series.safe_map` 6 | methods: stop-and-start versions of the `pandas.DataFrame.apply` and `pandas.Series.map` operations which cache 7 | partial results between runs in case an exception is thrown. 8 | 9 | This means that the next time these functions are called, the operation will pick back up where it failed, instead 10 | of restarting all the way back at the beginning of the map. After all, there's nothing more aggravating than waiting ages for a 11 | process to complete, only to lose all of your data on the last iteration! 12 | 13 | Just `pip install checkpoints` to get started. 14 | 15 | ## Why? 16 | 17 | For a writeup with a practical example of what `checkpoints` can do for you see [this post on my personal blog](http://www.residentmar.io/2016/10/29/saving-progress-pandas.html). 18 | 19 | ## Quickstart 20 | 21 | To start, import `checkpoints` and enable it: 22 | 23 | >>> from checkpoints import checkpoints 24 | >>> checkpoints.enable() 25 | 26 | This will augment your environment with `pandas.Series.safe_map` and `pandas.DataFrame.safe_apply` methods.
Now 27 | suppose we create a `Series` of floats, except for one invalid entry smack in the middle: 28 | 29 | >>> import pandas as pd; import numpy as np 30 | >>> rand = pd.Series(np.random.random(100)) 31 | >>> rand[50] = "____" 32 | 33 | Suppose we want to remean this data. If we apply a naive `map`: 34 | 35 | >>> rand.map(lambda v: v - 0.5) 36 | 37 | TypeError: unsupported operand type(s) for -: 'str' and 'float' 38 | 39 | Not only are the results up to that point lost, but we're also not actually told where the failure occurs! Using 40 | `safe_map` instead: 41 | 42 | >>> rand.safe_map(lambda v: v - 0.5) 43 | 44 | /checkpoint/checkpoints/checkpoints.py:96: UserWarning: Failure on index 50 45 | TypeError: unsupported operand type(s) for -: 'str' and 'float' 46 | 47 | All of the prior results are cached, and we can retrieve them at will with `checkpoints.results`: 48 | 49 | >>> checkpoints.results 50 | 51 | 0 -0.189003 52 | 1 0.337332 53 | 2 -0.143698 54 | 3 -0.312296 55 | ... 56 | 47 -0.188995 57 | 48 -0.286550 58 | 49 -0.258107 59 | dtype: float64 60 | 61 | `checkpoints` will store the partial results until either the process fully completes or it is explicitly told to get 62 | rid of them using `checkpoints.flush()`: 63 | 64 | >>> checkpoints.flush() 65 | >>> checkpoints.results 66 | None 67 | 68 | You can also induce this by passing a `flush=True` argument to `safe_map`. 
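The caching life cycle above (partial results persist until a run completes or they are explicitly flushed) can be sketched as a standalone toy class. The `PartialCache` name is hypothetical, a minimal illustration rather than the library's actual code:

```python
class PartialCache:
    """Toy model of the flush semantics described above (illustration only)."""

    def __init__(self):
        self._results = []  # outputs cached so far

    @property
    def results(self):
        # Mirrors `checkpoints.results`: None when nothing is cached.
        return list(self._results) if self._results else None

    def flush(self):
        # Explicitly discard any cached partial results.
        self._results = []
```

In the real module, passing `flush=True` to `safe_map` performs this same discard step before mapping begins.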
69 | 70 | `pd.DataFrame.safe_apply` is similar: 71 | 72 | >>> rand = pd.DataFrame(np.random.random(100).reshape((20,5))) 73 | >>> rand[2][10] = "____" 74 | >>> rand.apply(lambda srs: srs.sum()) 75 | 76 | TypeError: unsupported operand type(s) for +: 'float' and 'str' 77 | 78 | >>> rand.safe_apply(lambda srs: srs.sum()) 79 | 80 | /checkpoint/checkpoints/checkpoints.py:49: UserWarning: Failure on index 2 81 | TypeError: unsupported operand type(s) for +: 'float' and 'str' 82 | 83 | >>> checkpoints.results 84 | 85 | 0 9.273607 86 | 1 8.259637 87 | 2 8.359239 88 | 3 7.873243 89 | dtype: float64 90 | 91 | Finally, disable `checkpoints`: 92 | 93 | >>> checkpoints.disable() 94 | 95 | ## Performance 96 | 97 | Maintaining checkpoints introduces some overhead, but really not that much. `DataFrame` performance differs by a 98 | reasonably small constant factor, while `Series` performance is one-to-one: 99 | 100 | ![Performance charts](http://i.imgur.com/jFIgXOG.png) 101 | 102 | ## Technicals 103 | 104 | Under the hood, `checkpoints` implements a [state machine](https://en.wikipedia.org/wiki/Finite-state_machine), 105 | `CheckpointStateMachine`, which uses a simple list to keep track of which entries have and haven't been mapped yet. 106 | The function fed to `safe_*` is placed in a `wrapper` which redirects its output to a `results` list. When a map is 107 | interrupted midway, then rerun, `safe_*` partitions the input, using the length of `results` to skip ahead to the first 108 | entry that has not yet produced an output, and continues to run the `wrapper` on that slice. 109 | 110 | An actual `pandas` object isn't generated until **all** entries have been mapped. At that point `results` is 111 | repackaged into a `Series` or `DataFrame` and returned, and the cache is wiped, leaving `CheckpointStateMachine` 112 | ready to handle the next set of inputs.
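The resume mechanism described above can be sketched in a few lines of standalone Python. The names (`results`, `safe_run`) are illustrative and not the actual `CheckpointStateMachine` implementation:

```python
# Standalone sketch of the resume mechanism described above (illustrative
# names; not the actual CheckpointStateMachine code).
results = []  # cache of outputs from entries mapped so far

def safe_run(values, func):
    # Skip entries whose outputs are already cached from a failed run.
    remaining = values[len(results):]
    for v in remaining:
        results.append(func(v))  # an exception here leaves the cache intact
    out = list(results)
    results.clear()  # wiped only once every entry has been mapped
    return out
```

On the first failing call the exception propagates but the cache survives; after fixing the bad entry, calling `safe_run` again resumes from the first uncached entry rather than from the beginning.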
113 | 114 | ## Limitations 115 | 116 | * Progress bars are another feature useful for long-running jobs, but as of now there is no way to integrate 117 | `checkpoints` with e.g. [`tqdm`](https://github.com/tqdm/tqdm). The workaround is to estimate the time cost of your 118 | process beforehand. 119 | * `pandas.DataFrame.safe_apply` jobs on functions returning a `DataFrame` are not currently implemented, and will 120 | simply return `None`. This means that e.g. the following will silently fail: 121 | 122 | `>>> pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]}))` 123 | 124 | 125 | * The `Series.map` `na_action` parameter is not implemented; nor are any of `broadcast`, `raw`, or `reduce` for 126 | `DataFrame.apply`. 127 | 128 | ## See also 129 | 130 | `checkpoints` provides a form of [defensive programming](https://en.wikipedia.org/wiki/Defensive_programming). If 131 | you're a fan of this sort of thing, you should also check out [`engarde`](https://github.com/TomAugspurger/engarde). 132 | 133 | ## Contributing 134 | 135 | Bugs? Thoughts? Feature requests? [Throw them at the bug tracker and I'll take a look](https://github.com/ResidentMario/checkpoints/issues). 136 | 137 | As always I'm very interested in hearing feedback—reach out to me at `aleksey@residentmar.io`.
138 | -------------------------------------------------------------------------------- /checkpoints/__init__.py: -------------------------------------------------------------------------------- 1 | from .checkpoints import checkpoints 2 | -------------------------------------------------------------------------------- /checkpoints/checkpoints.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from pandas import DataFrame, Series 3 | import warnings 4 | 5 | 6 | class CheckpointStateMachine: 7 | 8 | def __init__(self, **kwargs): 9 | self._results = [] 10 | self._index = None 11 | self._caller = None 12 | self._axis = 0 13 | # cf. self.results, virtualized using `__getattr__()`. 14 | 15 | def disable(self): 16 | """ 17 | Core runtime method, disables all of the `checkpoints` safe mappers. 18 | """ 19 | self.flush() 20 | del DataFrame.safe_apply 21 | del Series.safe_map 22 | 23 | def enable(self): 24 | """ 25 | Core runtime method, enables all of the `checkpoints` safe mappers. 26 | """ 27 | 28 | def safe_apply(df, func, **kwargs): 29 | """ 30 | Core method, implements a cached version of `pandas.DataFrame.apply`. 31 | """ 32 | 33 | # If flushed, restart from scratch. 34 | if 'flush' in kwargs and kwargs['flush'] == True: 35 | kwargs.pop('flush') 36 | self.flush() 37 | 38 | # If index is not defined, define it. Note that this is not the index of the original DataFrame, 39 | # but the index of concatenation. If we are slicing and then concatenating on the columns (0), 40 | # the index that we must append at the end is of the column headers; if we are slicing and concatenating on 41 | # the rows (1), it's the row headers, e.g. the original index. 42 | # 43 | # See `__getattr__` for more shenanigans around this.
44 | if self._index is None: 45 | if 'axis' in kwargs and (kwargs['axis'] == 1 or kwargs['axis'] == 'columns'): 46 | self._axis = 1 47 | self._index = df.index 48 | else: 49 | self._axis = 0 50 | self._index = df.columns 51 | 52 | # If caller is not defined, define it. 53 | if self._caller is None: 54 | self._caller = "safe_apply" 55 | 56 | # Prune **kwargs of unimplemented pd.DataFrame.apply features---as of pandas 0.19, `broadcast`, `raw`, 57 | # and `reduce`. 58 | if 'broadcast' in kwargs or 'raw' in kwargs or 'reduce' in kwargs: 59 | raise NotImplementedError 60 | 61 | # Shorten the jobs list to the DataFrame elements remaining. 62 | df_remaining = df.iloc[len(self._results):] 63 | 64 | # Replace the original applied `func` with a stateful wrapper. 65 | def wrapper(srs, **kwargs): 66 | try: 67 | self._results.append(func(srs, **kwargs)) 68 | except (KeyboardInterrupt, SystemExit): 69 | raise 70 | except: 71 | warnings.warn("Failure on index {0}".format(len(self._results)), UserWarning) 72 | raise 73 | 74 | # Populate `self.results`. 75 | if 'axis' in kwargs and (kwargs['axis'] == 1 or kwargs['axis'] == 'columns'): 76 | kwargs.pop('axis') 77 | for _, srs in df_remaining.iterrows(): 78 | wrapper(srs, **kwargs) 79 | else: 80 | if 'axis' in kwargs: 81 | kwargs.pop('axis') 82 | for _, srs in df_remaining.iteritems(): 83 | wrapper(srs, **kwargs) 84 | 85 | # If we got here, then we didn't exit out due to an exception, and we can finish the method successfully. 86 | # Repackage the cached results into a `pandas` object via the `results` property. 87 | out = self.results 88 | 89 | # Reset the results set for the next iteration. 90 | self.flush() 91 | 92 | # Return the completed result. 93 | return out 94 | 95 | DataFrame.safe_apply = safe_apply 96 | 97 | def safe_map(srs, func, *args, **kwargs): 98 | """ 99 | Core method, implements a cached version of `pandas.Series.map`. 100 | """ 101 | 102 | # If flushed, restart from scratch.
103 | if kwargs.pop('flush', False): 104 | self.flush() 105 | 106 | # If index is not defined, define it. Same with caller. 107 | if self._index is None: 108 | self._index = srs.index 109 | if self._caller is None: 110 | self._caller = "safe_map" 111 | 112 | # Prune **kwargs of unimplemented pd.Series.map features---only one as of pandas 0.19, `na_action`. 113 | if 'na_action' in kwargs: 114 | raise NotImplementedError 115 | 116 | # Shorten the jobs list to the Series elements remaining. 117 | srs_remaining = srs.iloc[len(self._results):] 118 | 119 | # Replace the original applied `func` with a stateful wrapper. 120 | def wrapper(val): 121 | try: 122 | self._results.append(func(val)) 123 | except (KeyboardInterrupt, SystemExit): 124 | raise 125 | except: 126 | warnings.warn("Failure on index {0}".format(len(self._results)), UserWarning) 127 | raise 128 | 129 | # Populate `self.results`. 130 | for _, val in srs_remaining.iteritems(): 131 | wrapper(val) 132 | 133 | # If we got here, then we didn't exit out due to an exception, and we can finish the method successfully. 134 | # Repackage the cached results into a `pandas` object via the `results` property. 135 | out = self.results 136 | 137 | # Reset the results set for the next iteration. 138 | self.flush() 139 | 140 | # Return the completed result. 141 | return out 142 | 143 | Series.safe_map = safe_map 144 | 145 | def __getattr__(self, item): 146 | """ 147 | Implements a lazy getter for fetching partial results using `checkpoints.results`.
148 | """ 149 | if item != "results": 150 | raise AttributeError 151 | elif self._caller == "safe_map": 152 | if len(self._results) == 0: 153 | return None 154 | else: 155 | out = pd.Series(self._results) 156 | out.index = self._index[:len(out)] 157 | return out 158 | elif self._caller == "safe_apply": 159 | # import pdb; pdb.set_trace() 160 | if len(self._results) == 0: 161 | return None 162 | elif isinstance(self._results[0], Series): 163 | # Note that `self._index` is not the index of the DataFrame, but the index of concatenation. If we 164 | # are slicing and then concatenating on the columns (0), the index that we must append at the end is 165 | # of the column headers; if we are slicing and concatenating on the rows (1), it's the row headers, 166 | # e.g. the original index. 167 | # 168 | # Which of these two is the case totally changes which strategy we follow for gluing the data back 169 | # into a DataFrame matching our original design, requiring all of the tedium of keeping track of it 170 | # both here and above, when we bind `_axis` and `_index` variables. 171 | # import pdb; pdb.set_trace() 172 | if self._axis == 1: 173 | out = pd.DataFrame(self._results, index=self._index) 174 | else: 175 | out = pd.concat(self._results, axis=1) 176 | out.columns = self._index 177 | return out 178 | elif isinstance(self._results[0], DataFrame): 179 | pass 180 | else: 181 | out = pd.Series(self._results) 182 | out.index = self._index[:len(out)] 183 | return out 184 | 185 | def flush(self): 186 | """ 187 | Flushes the contents of the `checkpoints` state machine. 
188 | """ 189 | self._results = [] 190 | self._index = None 191 | self._caller = None 192 | 193 | 194 | checkpoints = CheckpointStateMachine() 195 | disable = checkpoints.disable 196 | enable = checkpoints.enable 197 | flush = checkpoints.flush 198 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | setup( 3 | name = 'checkpoints', 4 | packages = ['checkpoints'], # this must be the same as the name above 5 | py_modules=['checkpoints'], 6 | version = '0.0.1', 7 | description = 'Partial result caching for pandas in Python.', 8 | author = 'Aleksey Bilogur', 9 | author_email = 'aleksey.bilogur@residentmar.io', 10 | url = 'https://github.com/ResidentMario/checkpoints', 11 | download_url = 'https://github.com/ResidentMario/checkpoints/tarball/0.0.1', 12 | keywords = ['data', 'data analysis', 'exceptions', 'error handling', 'defensive programming,' 'data science', 13 | 'pandas', 'python', 'jupyter'], 14 | classifiers = [], 15 | ) -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ResidentMario/checkpoints/ecf5c97224c4aa551b16cb8171e48af43d4c781c/tests/__init__.py -------------------------------------------------------------------------------- /tests/test_checkpoints.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | from checkpoints import checkpoints 3 | import pandas as pd 4 | import numpy as np 5 | 6 | 7 | class TestInteractions(unittest.TestCase): 8 | 9 | def setUp(self): 10 | checkpoints.enable() 11 | 12 | def testEnable(self): 13 | pd.Series.safe_map 14 | pd.DataFrame.safe_apply 15 | 16 | def testFlush(self): 17 | checkpoints._results = ['foo', 'bar'] 18 | checkpoints._index = 
pd.Index([1,2]) 19 | checkpoints.flush() 20 | self.assertEqual(checkpoints._results, []) 21 | self.assertEqual(checkpoints._index, None) 22 | 23 | 24 | def testResultsEmpty(self): 25 | checkpoints.flush() 26 | self.assertEqual(checkpoints.results, None) 27 | 28 | def testResultsScalars(self): 29 | checkpoints._results = ['foo', 'bar'] 30 | checkpoints._index = pd.Index([1,2]) 31 | checkpoints._caller = "safe_apply" 32 | self.assertTrue(np.array_equal(checkpoints.results.values, np.array(['foo', 'bar']))) 33 | 34 | def testInvalidAttr(self): 35 | with self.assertRaises(AttributeError): 36 | checkpoints.loremipsum 37 | 38 | def testDisable(self): 39 | checkpoints.disable() 40 | with self.assertRaises(AttributeError): 41 | pd.Series.safe_map 42 | pd.DataFrame.safe_apply 43 | checkpoints.enable() 44 | 45 | def tearDown(self): 46 | checkpoints.disable() 47 | 48 | 49 | class TestSeriesMethods(unittest.TestCase): 50 | 51 | def setUp(self): 52 | checkpoints.enable() 53 | 54 | def testPartialScalarFunctionalInput(self): 55 | checkpoints.flush() 56 | srs = pd.Series(np.random.random(100)) 57 | srs[50] = 0 58 | 59 | def breaker(val): 60 | if val == 0: 61 | raise IOError 62 | else: 63 | return val 64 | 65 | import warnings 66 | warnings.filterwarnings('ignore') 67 | with self.assertRaises(IOError): 68 | srs.safe_map(breaker) 69 | self.assertEqual(len(checkpoints._results), 50) 70 | self.assertIsNot(checkpoints._index, None) 71 | srs[50] = 1 72 | result = srs.safe_map(breaker) 73 | self.assertIsNot(result, None) 74 | self.assertIsInstance(result, pd.Series) 75 | self.assertEqual(len(result), 100) 76 | 77 | def testCompleteScalarFunctionalInput(self): 78 | checkpoints.flush() 79 | srs = pd.Series(np.random.random(100)) 80 | m1 = np.average(srs.map(lambda val: val - 0.5)) 81 | m2 = np.average(srs.safe_map(lambda val: val - 0.5)) 82 | self.assertAlmostEqual(m1, m2) 83 | 84 | def testCompleteListFunctionalInput(self): 85 | checkpoints.flush() 86 | self.assertTrue( 87 | 
np.array_equal(pd.Series([1,2,3]).safe_map(lambda v: [1,2]), 88 | pd.Series([1, 2, 3]).map(lambda v: [1, 2])) 89 | ) 90 | 91 | def testCompleteSeriesFunctionalInput(self): 92 | checkpoints.flush() 93 | s1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.Series([1,2])) 94 | s2 = pd.Series([1, 2, 3]).map(lambda v: pd.Series([1,2])) 95 | self.assertTrue(isinstance(s1[0], pd.Series) and isinstance(s2[0], pd.Series)) 96 | self.assertTrue(np.array_equal(s1[0].values, s2[0].values)) 97 | self.assertTrue(np.array_equal(s1[1].values, s2[1].values)) 98 | 99 | def testCompleteDictFunctionalInput(self): 100 | checkpoints.flush() 101 | d1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.Series({'a': 1, 'b': 2})) 102 | d2 = pd.Series([1, 2, 3]).map(lambda v: pd.Series({'a': 1, 'b': 2})) 103 | self.assertTrue(isinstance(d1[0], pd.Series) and isinstance(d2[0], pd.Series)) 104 | self.assertTrue(np.array_equal(d1[0].values, d2[0].values)) 105 | self.assertTrue(np.array_equal(d1[1].values, d2[1].values)) 106 | 107 | def testCompleteDataFrameFunctionalInput(self): 108 | checkpoints.flush() 109 | d1 = pd.Series([1, 2, 3]).safe_map(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 110 | d2 = pd.Series([1, 2, 3]).map(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 111 | self.assertTrue(isinstance(d1[0], pd.DataFrame) and isinstance(d2[0], pd.DataFrame)) 112 | self.assertTrue(np.array_equal(d1[0].values, d2[0].values)) 113 | self.assertTrue(np.array_equal(d1[1].values, d2[1].values)) 114 | 115 | def testNotImplemented(self): 116 | with self.assertRaises(NotImplementedError): 117 | pd.Series([1]).safe_map(lambda v: v, na_action='ignore') 118 | 119 | def tearDown(self): 120 | checkpoints.disable() 121 | 122 | 123 | class TestDataFrameMethods(unittest.TestCase): 124 | 125 | def setUp(self): 126 | checkpoints.enable() 127 | 128 | def testPartialScalarFunctionalInput(self): 129 | checkpoints.flush() 130 | df = pd.DataFrame(np.random.random(100).reshape((20, 5))) 131 | df[2][4] = 0 132 | 133 | 
def breaker(srs): 134 | if 0 in srs.values: 135 | raise IOError 136 | else: 137 | return srs.sum() 138 | 139 | import warnings 140 | warnings.filterwarnings('ignore') 141 | with self.assertRaises(IOError): 142 | df.safe_apply(breaker, axis='columns') 143 | self.assertEqual(len(checkpoints._results), 4) 144 | self.assertIsNot(checkpoints._index, None) 145 | 146 | def testCompleteScalarFunctionalInput(self): 147 | checkpoints.flush() 148 | df = pd.DataFrame(np.random.random(100).reshape((20, 5))) 149 | m1 = np.average(df.apply(sum)) 150 | m2 = np.average(df.safe_apply(sum)) 151 | self.assertAlmostEqual(m1, m2) 152 | 153 | def testCompleteListFunctionalInput(self): 154 | checkpoints.flush() 155 | self.assertTrue( 156 | np.array_equal(pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: [1, 2]), 157 | pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: [1, 2])) 158 | ) 159 | 160 | def testCompleteSeriesFunctionalInput(self): 161 | checkpoints.flush() 162 | s1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series(['A','B'])) 163 | s2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series(['A','B'])) 164 | self.assertTrue(s1.equals(s2)) 165 | 166 | def testCompleteSeriesFunctionalInputColumnar(self): 167 | checkpoints.flush() 168 | s1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series(['A','B']), axis=1) 169 | s2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series(['A','B']), axis=1) 170 | self.assertTrue(s1.equals(s2)) 171 | 172 | def testCompleteDictFunctionalInput(self): 173 | checkpoints.flush() 174 | d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.Series({'a': 1, 'b': 2})) 175 | d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series({'a': 1, 'b': 2})) 176 | self.assertTrue(d1.equals(d2)) 177 | 178 | def testCompleteDictFunctionalInputColumnar(self): 179 | checkpoints.flush() 180 | d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 
4]}).safe_apply(lambda v: pd.Series({'a': 1, 'b': 2}), axis=1) 181 | d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.Series({'a': 1, 'b': 2}), axis=1) 182 | self.assertTrue(d1.equals(d2)) 183 | 184 | # This cannot be done. It should raise a NotImplementedError, except that there's no natural break point in the 185 | # code for doing so. It's documented as a limitation in the README. 186 | # def testCompleteDataFrameFunctionalInput(self): 187 | # checkpoints.flush() 188 | # d1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 189 | # d2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).apply(lambda v: pd.DataFrame({'a': [1, 2], 'b': [2, 3]})) 190 | # self.assertTrue(d1.equals(d2)) 191 | 192 | def testFunctionWithKwargs(self): 193 | checkpoints.flush() 194 | 195 | def f(srs, **kwargs): 196 | return 1 + sum(kwargs.values()) 197 | 198 | r = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).safe_apply(f, somearg=5) 199 | self.assertTrue(np.array_equal(r.values, [6, 6])) 200 | 201 | def testNotImplemented(self): 202 | with self.assertRaises(NotImplementedError): 203 | pd.DataFrame({'a':[1], 'b': [2]}).safe_apply(lambda v: v, broadcast=True) 204 | with self.assertRaises(NotImplementedError): 205 | pd.DataFrame({'a': [1], 'b': [2]}).safe_apply(lambda v: v, raw=True) 206 | with self.assertRaises(NotImplementedError): 207 | pd.DataFrame({'a': [1], 'b': [2]}).safe_apply(lambda v: v, reduce=True) 208 | 209 | def tearDown(self): 210 | checkpoints.disable() 211 | --------------------------------------------------------------------------------