├── .gitignore ├── LICENSE.txt ├── MANIFEST.in ├── README.md ├── doc ├── RELEASE_NOTES.md ├── code_overview.rst └── writing_codelets.rst ├── examples ├── Water diffusion.ipynb ├── group_data_items.py ├── import_modules.py ├── internal_files.py ├── internal_modules.py ├── plotting.py ├── ref_to_library.py ├── ref_to_simple.py ├── simple.py ├── snapshot.py └── test_internal_files.py ├── lib └── activepapers │ ├── __init__.py │ ├── builtins2.py │ ├── builtins3.py │ ├── cli.py │ ├── contents.py │ ├── execution.py │ ├── exploration.py │ ├── library.py │ ├── standardlib.py │ ├── standardlib2.py │ ├── standardlib3.py │ ├── storage.py │ ├── url.py │ ├── url2.py │ ├── url3.py │ ├── utility.py │ ├── utility2.py │ ├── utility3.py │ └── version.py ├── scripts └── aptool ├── setup.py └── tests ├── foo ├── __init__.py └── bar.py ├── run_all_tests.sh ├── test_basics.py ├── test_exploration.py ├── test_features.py ├── test_library.py ├── test_python_modules.py └── test_references.py /.gitignore: -------------------------------------------------------------------------------- 1 | MANIFEST 2 | build/ 3 | dist/ 4 | doc/build/ 5 | *.pyc 6 | *pycache* 7 | examples/*.ap 8 | tests/*.ap 9 | lib/*.egg-info 10 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | ================================== 2 | The ActivePapers licensing terms 3 | ================================== 4 | 5 | ActivePapers is licensed under the terms of the Modified BSD License 6 | (also known as New or Revised BSD), as follows: 7 | 8 | Copyright (c) 2013, ActivePapers Development Team 9 | 10 | All rights reserved. 
11 | 12 | Redistribution and use in source and binary forms, with or without 13 | modification, are permitted provided that the following conditions are met: 14 | 15 | Redistributions of source code must retain the above copyright notice, this 16 | list of conditions and the following disclaimer. 17 | 18 | Redistributions in binary form must reproduce the above copyright notice, this 19 | list of conditions and the following disclaimer in the documentation and/or 20 | other materials provided with the distribution. 21 | 22 | Neither the name of the ActivePapers Development Team nor the names of its 23 | contributors may be used to endorse or promote products derived from this 24 | software without specific prior written permission. 25 | 26 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 27 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 28 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 29 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE 30 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 31 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 32 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 33 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 34 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 35 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 36 | 37 | 38 | About the ActivePapers Development Team 39 | --------------------------------------- 40 | 41 | The ActivePapers project was started by Konrad Hinsen (CNRS, France). 
42 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.md LICENSE.txt 2 | recursive-include tests *.py 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | As of 2024, this project is archived and unmaintained. While it has achieved its 2 | mission of demonstrating that unifying computational reproducibility and 3 | provenance tracking is doable and useful, it has also demonstrated that Python 4 | is not a suitable platform to build on for reproducible research. Breaking 5 | changes at all layers of the software stack are too frequent. The ActivePapers 6 | framework itself (this project) uses an API that was removed in Python 3.9, 7 | and while it can be updated with reasonable effort, there is little point 8 | in doing so: Published ActivePapers cannot be expected to work with a current 9 | Python stack for more than a year. 10 | 11 | If you came here because you wish to re-run a published ActivePaper, the best 12 | advice I can give is to use [Guix](https://guix.gnu.org/) with its 13 | [time-machine](https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html) 14 | feature to re-create a Python stack close in time to the paper you are working with. 15 | The ActivePapers infrastructure is packaged in Guix as `python-activepapers`. 16 | 17 | If you came here to learn about reproducible research practices, the best advice 18 | I can give is not to use Python. 19 | 20 | The following text is the README from 2018. 21 | 22 |
23 | 24 | ActivePapers is a tool for working with executable papers, which 25 | combine data, code, and documentation in single-file packages, 26 | suitable for publication as supplementary material or on sites such as 27 | [figshare](http://figshare.com). 28 | 29 | The ActivePapers Python edition requires Python 2.7 or Python 3.3 to 3.5. 30 | It also relies on the following libraries: 31 | 32 | - NumPy 1.6 or later (http://numpy.scipy.org/) 33 | - HDF5 1.8.7 or later (http://www.hdfgroup.org/HDF5/) 34 | - h5py 2.2 or later (http://www.h5py.org/) 35 | - tempdir 0.6 or later (http://pypi.python.org/pypi/tempdir/) 36 | 37 | Installation of ActivePapers.Py: 38 | 39 | python setup.py install 40 | 41 | This installs the ActivePapers Python library and the command-line 42 | tool "aptool" for managing ActivePapers. 43 | 44 | For documentation, see the 45 | [ActivePapers Web site](http://www.activepapers.org/python-edition/). 46 | 47 | ActivePapers development takes place 48 | [on Github](http://github.com/activepapers/activepapers-python). 49 | 50 | Running the tests also requires the [tempdir](https://pypi.python.org/pypi/tempdir/) library and either the 51 | [nose](http://pypi.python.org/pypi/nose/) or the [pytest](http://pytest.org) testing framework. The recommended way to run the tests is 52 | 53 | ``` 54 | cd tests 55 | ./run_all_tests.sh nosetests 56 | ``` 57 | or 58 | ``` 59 | cd tests 60 | ./run_all_tests.sh py.test 61 | ``` 62 | 63 | This launches the test runner on each test script individually. The simpler approach of running `nosetests` or `py.test` directly in the directory `tests` leads to a few test failures because the testing framework's import handling conflicts with the implementation of internal modules in ActivePapers.
64 | -------------------------------------------------------------------------------- /doc/RELEASE_NOTES.md: -------------------------------------------------------------------------------- 1 | Release 0.2.2 2 | ------------- 3 | 4 | Improvements: 5 | 6 | - Provide a way to skip network-dependent 7 | tests in restricted environments 8 | (environment variable NO_NETWORK_ACCESS=1) 9 | 10 | Bug fixes: 11 | 12 | - Prevent a crash when no home directory is defined 13 | 14 | - Fix a bug in accesses to nested data groups. 15 | 16 | Release 0.2.1 17 | ------------- 18 | 19 | Improvements: 20 | 21 | - Internal text files are opened as utf8 rather than ascii. 22 | 23 | Bug fixes: 24 | 25 | - Prevent crashes when using Python modules in ActivePapers 26 | from scripts not managed by ActivePapers (using activepapers.exploration). 27 | 28 | Release 0.2 29 | ----------- 30 | 31 | New features: 32 | 33 | - Read-only access to code and data from an ActivePaper in plain 34 | Python scripts. This facilitates developing and testing code 35 | that will later be integrated into an ActivePaper. 36 | 37 | - Calclets have read-only access to code and to stack traces, 38 | allowing limited forms of introspection. 39 | 40 | - Internal files can be opened in binary mode. 41 | 42 | Bug fixes: 43 | 44 | - Improved compatibility with recent versions of Python and h5py. 45 | 46 | Release 0.1.4 47 | ------------- 48 | 49 | New features: 50 | 51 | - Python scripts are stored using UTF-8 encoding rather than ASCII. 52 | 53 | - Internal files can be opened using an optional "encoding" argument. 54 | If this is used, strings read from and written to such files 55 | are unicode strings. 56 | 57 | Bug fixes: 58 | 59 | - A change in importlib in Python 3.4 broke the import of modules 60 | stored in an ActivePaper.
61 | 62 | Release 0.1.3 63 | ------------- 64 | 65 | New feature: 66 | 67 | - There is now a generic module activepapers.contents that can be 68 | imported from any Python script in order to provide read-only 69 | access to the contents of the ActivePaper that is located in the 70 | current directory. This is meant as an aid to codelet development. 71 | 72 | Bug fixes: 73 | 74 | - Broken downloads from Zenodo, following a modification of the contents 75 | of the Zenodo landing pages. Actually, Zenodo went back to the 76 | landing page format it had before ActivePapers release 0.1.2, 77 | so ActivePapers also went back to how it downloaded files before. 78 | 79 | 80 | Release 0.1.2 81 | ------------- 82 | 83 | This is a bugfix release, fixing the following issues: 84 | 85 | - A compatibility problem with h5py 2.3 86 | 87 | - Broken downloads from Zenodo, following a modification of the contents 88 | of the Zenodo landing pages. 89 | 90 | - Syntax errors in codelets were not reported correctly. 91 | -------------------------------------------------------------------------------- /doc/code_overview.rst: -------------------------------------------------------------------------------- 1 | Overview of the ActivePapers implementation 2 | =========================================== 3 | 4 | There is currently little documentation in the code. Don't worry, 5 | this will change. 6 | 7 | The command-line tool is in ``scripts/aptool``. It contains just 8 | the user interface, based on ``argparse``. The code that actually 9 | implements the commands is in the module ``activepapers.cli``. 10 | 11 | The ActivePapers Python library is in ``lib``. The main modules 12 | are: 13 | 14 | ``activepapers.storage`` 15 | Takes care of storing and retrieving data (both the contents of an 16 | ActivePaper and bookkeeping information) in an HDF5 file. Most 17 | of this module consists of the large class ``ActivePaper``. 
18 | The class ``InternalFile`` handles the file interface to datasets 19 | (``activepapers.contents.open``). The class ``APNode`` handles 20 | references to contents in other ActivePapers. 21 | 22 | ``activepapers.execution`` 23 | 24 | Manages the execution of codelets (classes ``Codelet``, ``Calclet``, 25 | and ``Importlet``), which includes restricted rights for calclets 26 | and access to modules stored inside an ActivePaper for both 27 | calclets and importlets. Tracing of dependencies during the 28 | execution of a codelet is also handled here (classes 29 | ``AttrWrapper``, ``DatasetWrapper``, and ``DataGroup``). 30 | 31 | ``activepapers.library`` 32 | Manages the local library of ActivePapers. Downloads 33 | DOI references automatically if possible (which currently 34 | means DOIs from figshare). 35 | 36 | ``activepapers.cli`` 37 | Contains the implementation of the subcommands of ``aptool``. 38 | 39 | The remaining modules provide support code. Several of them are 40 | divided into three parts: ``activepapers.X``, ``activepapers.X2``, and 41 | ``activepapers.X3``. The modules ending in ``2`` or ``3`` contain code 42 | specific to Python 2 or Python 3. The generic one imports the right 43 | language-specific module and perhaps adds some code that works with 44 | both dialects. 45 | 46 | ``activepapers.url`` 47 | A thin wrapper around the URL-related libraries, which differ between 48 | Python 2 and Python 3. 49 | 50 | ``activepapers.standardlib`` 51 | Defines the subset of the standard library that is accessible from 52 | codelets. 53 | 54 | ``activepapers.builtins`` 55 | Defines the subset of the builtin definitions that is accessible from 56 | codelets. 57 | 58 | ``activepapers.utility`` 59 | Small functions that are used a lot in both ``activepapers.storage`` 60 | and ``activepapers.execution``. 61 | 62 | ``activepapers.version`` 63 | The version number of the library, stored in a single place.
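The ``X`` / ``X2`` / ``X3`` split described above amounts to a dispatch on the interpreter's major version. The following is a minimal sketch of that idea, not the library's actual code: the helper names ``version_specific_name`` and ``load_version_specific`` are hypothetical and exist only for illustration.

```python
import importlib
import sys


def version_specific_name(base, major=None):
    """Return the name of the dialect-specific companion of module `base`.

    Hypothetical helper: 'activepapers.url' maps to 'activepapers.url2'
    under Python 2 and to 'activepapers.url3' under Python 3.
    """
    if major is None:
        major = sys.version_info[0]
    if major not in (2, 3):
        raise ValueError("unsupported Python major version: %r" % (major,))
    return "%s%d" % (base, major)


def load_version_specific(base):
    """Import the companion module that matches the running interpreter."""
    return importlib.import_module(version_specific_name(base))
```

A generic module written this way could re-export the names of the module returned by ``load_version_specific`` and then add any code that works unchanged under both dialects.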
64 | 65 | 66 | Note the absence of ``activepapers.contents``, which is the module 67 | through which codelets access the contents of an ActivePaper. It is 68 | created dynamically each time a codelet is run, see the class 69 | ``activepapers.execution.Codelet``. 70 | -------------------------------------------------------------------------------- /doc/writing_codelets.rst: -------------------------------------------------------------------------------- 1 | Writing codelets 2 | ================ 3 | 4 | Scripts inside an ActivePaper are called "codelets", which come in two 5 | varieties: calclets and importlets. As their names indicate, they are 6 | ideally small, using code from modules to do most of the work. The 7 | only difference between calclets and importlets is that calclets run 8 | in a restricted environment, whereas importlets have full access to 9 | the computer's resources: files, installed Python modules, network, 10 | etc. Calclets represent the reproducible part of an ActivePaper's 11 | computations. Importlets most probably don't work on anyone else's 12 | computer, and thus should be used only when absolutely necessary. The 13 | main reason for using an importlet, as its name suggests, is importing 14 | data from the outside world into an ActivePaper. 15 | 16 | Restricted environment execution 17 | -------------------------------- 18 | 19 | Calclets are run in a modified Python environment, which includes a 20 | subset of the Python standard library, the NumPy library, the 21 | ActivePapers library, and all Python modules stored inside the 22 | ActivePaper, directly or through references. The subset of the 23 | standard library includes everything needed for computation, but no 24 | I/O, network access, or platform-specific 25 | modules. ActivePapers-compliant I/O is provided through the 26 | ActivePapers library, as explained below. 
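To make the idea of a modified environment more concrete, here is a minimal Python 3 sketch of an allow-list for imports. It is an illustration under stated assumptions, not the mechanism used by ``activepapers.execution``: the names ``ALLOWED_MODULES``, ``restricted_import``, and ``run_restricted`` are hypothetical, and the allow-list shown is an arbitrary example. Like any pure-Python restriction, it is not a security boundary.

```python
import builtins

# Hypothetical allow-list; the real subset is defined elsewhere in the library.
ALLOWED_MODULES = {"math", "itertools", "collections"}


def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement for __import__ that rejects modules outside the allow-list."""
    top_level = name.split(".")[0]
    if top_level not in ALLOWED_MODULES:
        raise ImportError("module %r is not available in the "
                          "restricted environment" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_restricted(source):
    """Execute `source` with a reduced set of builtins and return its namespace."""
    safe_builtins = dict(vars(builtins))
    safe_builtins["__import__"] = restricted_import
    del safe_builtins["open"]  # no direct file I/O from the executed code
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace
```

With this sketch, ``run_restricted("import math")`` succeeds, while ``run_restricted("import os")`` raises ``ImportError`` and a call to the builtin ``open`` raises ``NameError``.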
27 | 28 | Since Python does not provide secure restricted environments, the 29 | restrictions are really no more than encouragements to respect the 30 | rules. You can get around all of them with some ingenuity, but this 31 | documentation won't tell you how. Keep this in mind when running other 32 | people's code: if you have reasons to suspect malicious intent, look 33 | at the code before running it. 34 | 35 | Importlets are run in an augmented environment. They have access to 36 | everything a standard Python script can use, but they can (and must, 37 | in order to be useful) also use the I/O functionality from the 38 | ActivePaper library to write data to the ActivePaper. 39 | 40 | Accessing additional Python modules 41 | ----------------------------------- 42 | 43 | When a calclet tries to use a module that is not part of the restricted 44 | environment described above, ActivePapers aborts with an error message. 45 | The right solution for that problem is to include that module's source 46 | code in the ActivePaper, or to package it as a separate ActivePaper and 47 | access it through a reference. 48 | 49 | Unfortunately, this is not always possible. The most common technical 50 | obstacles are extension modules, which are not allowed in an 51 | ActivePaper. Licensing restrictions can also prevent re-publication in 52 | an ActivePaper. For such situations, ActivePapers provides a way to 53 | extend the restricted execution environment with additional modules and 54 | packages. This is done when the ActivePaper is created using ``aptool``, 55 | using the ``-d`` option to ``aptool create``. 56 | 57 | Note that adding a module to the restricted execution environment 58 | means that anyone working with your ActivePaper will have to 59 | install the additional modules and packages, in versions compatible with 60 | the ones you used.
61 | 62 | 63 | I/O in ActivePapers 64 | ------------------- 65 | 66 | The module ``activepapers.contents`` provides two ways to read and write 67 | data: a file-like approach, and direct dataset access using the 68 | `h5py <http://www.h5py.org/>`_ library. 69 | 70 | File-like I/O is the easiest to use, and since it is largely compatible 71 | with the Python library's file protocol, it can be used with many 72 | existing Python libraries. Here is a simple example: 73 | 74 | from activepapers.contents import open 75 | 76 | with open('numbers', 'w') as f: 77 | for i in range(10): 78 | f.write(str(i)+'\n') 79 | 80 | You can use the ``open`` function just like you would use the standard 81 | Python ``open`` function, the only difference being that you pass it a 82 | dataset name rather than a filename. The above example creates the 83 | dataset ``/data/numbers``, i.e. the dataset names are relative to the 84 | ActivePaper's ``data`` group. 85 | 86 | There is also ``open_documentation``, which works in the same way but 87 | accesses datasets relative to the top-level ``documentation`` group. 88 | Datasets in this group are meant for human consumption, not for 89 | input to other calclets. 90 | 91 | Direct use of HDF5 datasets through ``h5py`` provides much more 92 | powerful data management options, in particular for large binary 93 | datasets. The following example stores the same data as the preceding 94 | one, but as a binary dataset: 95 | 96 | from activepapers.contents import data 97 | import numpy as np 98 | data['numbers'] = np.arange(10) 99 | 100 | The ``data`` object in the module ``activepapers.contents`` behaves 101 | much like a group object from ``h5py``, the only difference being that 102 | all data accesses are tracked for creating the dependency graph in the 103 | ActivePaper. Most code based on ``h5py`` should work in an 104 | ActivePaper, with the exception of code that tests objects for being 105 | instances of specific h5py classes.
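The access tracking described above, and the reason why ``isinstance`` tests can fail, can be pictured with a small wrapper class. This is a hypothetical sketch, not the ``DataGroup`` implementation from ``activepapers.execution``: the class ``TrackedGroup`` is invented for illustration, and a plain ``dict`` stands in for an HDF5 group so that the example runs without ``h5py``.

```python
class TrackedGroup(object):
    """Hypothetical stand-in for a tracked data group.

    It forwards item access to an underlying mapping and records
    which paths were read and which were written.
    """

    def __init__(self, store, prefix="/data"):
        self._store = store
        self._prefix = prefix
        self.read = set()
        self.written = set()

    def _path(self, name):
        return self._prefix + "/" + name

    def __getitem__(self, name):
        self.read.add(self._path(name))      # record a dependency
        return self._store[name]

    def __setitem__(self, name, value):
        self.written.add(self._path(name))   # record a product
        self._store[name] = value


data = TrackedGroup({})
data["numbers"] = list(range(10))
total = sum(data["numbers"])

# The wrapper behaves like the mapping it wraps, but it is not an
# instance of the wrapped class, so isinstance checks do not succeed.
assert not isinstance(data, dict)
```

After running this, ``data.written`` and ``data.read`` each contain ``/data/numbers``, which is the kind of information a dependency graph can be built from.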
106 | -------------------------------------------------------------------------------- /examples/group_data_items.py: -------------------------------------------------------------------------------- 1 | # This example illustrates how to turn a group with everything it 2 | # contains into a single data item for the purpose of dependency 3 | # tracking. 4 | 5 | from activepapers.storage import ActivePaper 6 | import numpy as np 7 | 8 | paper = ActivePaper("group_data_items.ap", "w") 9 | 10 | script = paper.create_calclet("script1", 11 | """ 12 | from activepapers.contents import data 13 | import numpy as np 14 | 15 | numbers = data.create_group("numbers") 16 | numbers.mark_as_data_item() 17 | numbers.create_dataset("pi", data=np.pi) 18 | numbers.create_dataset("e", data=np.e) 19 | """) 20 | script.run() 21 | 22 | script = paper.create_calclet("script2", 23 | """ 24 | from activepapers.contents import data 25 | import numpy as np 26 | 27 | numbers = data["numbers"] 28 | data.create_dataset("result", data=numbers["pi"][...]*numbers["e"][...]) 29 | """) 30 | script.run() 31 | 32 | # Check that only /data/numbers is tracked, not 33 | # /data/numbers/pi or /data/numbers/e 34 | for level in paper.dependency_hierarchy(): 35 | print([item.name for item in level]) 36 | 37 | paper.close() 38 | -------------------------------------------------------------------------------- /examples/import_modules.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | import os, sys 4 | 5 | # The modules imported here are located in ../tests. 6 | script_path = os.path.dirname(sys.argv[0]) 7 | tests_path = os.path.join(script_path, '..', 'tests') 8 | module_path = [os.path.abspath(tests_path)] 9 | 10 | paper = ActivePaper("import_modules.ap", "w") 11 | 12 | # The source code of imported modules is embedded into the paper. Only 13 | # Python source code modules can be imported, i.e.
neither extension 14 | # modules nor bytecode modules (.pyc). 15 | # The module_path parameter is a list of directories that can contain 16 | # modules. If not specified, it defaults to sys.path 17 | paper.import_module('foo', module_path) 18 | paper.import_module('foo.bar', module_path) 19 | 20 | script = paper.create_calclet("test", 21 | """ 22 | import foo 23 | from foo.bar import frobnicate 24 | assert frobnicate(foo.__version__) == '42' 25 | """) 26 | script.run() 27 | 28 | paper.close() 29 | -------------------------------------------------------------------------------- /examples/internal_files.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | 4 | paper = ActivePaper("internal_files.ap", "w") 5 | 6 | script = paper.create_calclet("write", 7 | """ 8 | from activepapers.contents import open 9 | 10 | with open('numbers', 'w') as f: 11 | for i in range(10): 12 | f.write(str(i)+'\\n') 13 | """) 14 | script.run() 15 | 16 | script = paper.create_calclet("read1", 17 | """ 18 | from activepapers.contents import open 19 | 20 | f = open('numbers') 21 | for i in range(10): 22 | assert f.readline().strip() == str(i) 23 | f.close() 24 | """) 25 | script.run() 26 | 27 | script = paper.create_calclet("read2", 28 | """ 29 | from activepapers.contents import open 30 | 31 | f = open('numbers') 32 | data = [int(line.strip()) for line in f] 33 | f.close() 34 | assert data == list(range(10)) 35 | """) 36 | script.run() 37 | 38 | script = paper.create_calclet("convert_to_binary", 39 | """ 40 | from activepapers.contents import open 41 | import struct 42 | 43 | with open('numbers') as f: 44 | data = [int(line.strip()) for line in f] 45 | f = open('binary_numbers', 'wb') 46 | f.write(struct.pack(len(data)*'h', *data)) 47 | f.close() 48 | """) 49 | script.run() 50 | 51 | script = paper.create_calclet("read_binary", 52 | """ 53 | from activepapers.contents import open 54 | 
import struct 55 | 56 | f = open('binary_numbers', 'rb') 57 | assert struct.unpack(10*'h', f.read()) == tuple(range(10)) 58 | f.close() 59 | """) 60 | script.run() 61 | 62 | paper.close() 63 | -------------------------------------------------------------------------------- /examples/internal_modules.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | 4 | paper = ActivePaper("internal_modules.ap", "w") 5 | 6 | paper.add_module("my_math", 7 | """ 8 | import numpy as np 9 | 10 | def my_func(x): 11 | return np.sin(x) 12 | """) 13 | 14 | 15 | paper.data.create_dataset("frequency", data = 0.2) 16 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 17 | 18 | calc_sine = paper.create_calclet("calc_sine", 19 | """ 20 | from activepapers.contents import data 21 | import numpy as np 22 | from my_math import my_func 23 | 24 | frequency = data['frequency'][...] 25 | time = data['time'][...] 26 | data.create_dataset("sine", data=my_func(2.*np.pi*frequency*time)) 27 | """) 28 | calc_sine.run() 29 | 30 | paper.close() 31 | -------------------------------------------------------------------------------- /examples/plotting.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | 4 | paper = ActivePaper("plotting.ap", "w", 5 | dependencies = ["matplotlib"]) 6 | 7 | paper.data.create_dataset("frequency", data = 0.2) 8 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 9 | 10 | plot_sine = paper.create_calclet("plot_sine", 11 | """ 12 | from activepapers.contents import open, data 13 | import matplotlib 14 | # Make matplotlib ignore the user's .matplotlibrc 15 | matplotlib.rcdefaults() 16 | # Use the SVG backend. Must be done *before* importing pyplot. 
17 | matplotlib.use('SVG') 18 | import matplotlib.pyplot as plt 19 | 20 | import numpy as np 21 | 22 | frequency = data['frequency'][...] 23 | time = data['time'][...] 24 | sine = np.sin(2.*np.pi*frequency*time) 25 | 26 | plt.plot(time, sine) 27 | # Save plot to a file, which is simulated by a HDF5 byte array 28 | with open('sine_plot.svg', 'w') as output: 29 | plt.savefig(output) 30 | """) 31 | plot_sine.run() 32 | 33 | paper.close() 34 | -------------------------------------------------------------------------------- /examples/ref_to_library.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['ACTIVEPAPERS_LIBRARY'] = os.getcwd() 3 | 4 | from activepapers.storage import ActivePaper 5 | import numpy as np 6 | 7 | paper = ActivePaper("ref_to_library.ap", "w") 8 | 9 | paper.data.create_dataset("frequency", data = 0.2) 10 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 11 | 12 | paper.create_module_ref("my_math", "local:internal_modules") 13 | 14 | calc_sine = paper.create_calclet("calc_sine", 15 | """ 16 | from activepapers.contents import data 17 | import numpy as np 18 | from my_math import my_func 19 | 20 | frequency = data['frequency'][...] 21 | time = data['time'][...] 
22 | data.create_dataset("sine", data=my_func(2.*np.pi*frequency*time)) 23 | """) 24 | calc_sine.run() 25 | 26 | paper.close() 27 | -------------------------------------------------------------------------------- /examples/ref_to_simple.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['ACTIVEPAPERS_LIBRARY'] = os.getcwd() 3 | 4 | from activepapers.storage import ActivePaper 5 | import numpy as np 6 | 7 | paper = ActivePaper("ref_to_simple.ap", "w") 8 | 9 | paper.create_data_ref("frequency", "local:simple") 10 | paper.create_data_ref("time", "local:simple", "time") 11 | 12 | paper.create_code_ref("calc_sine", "local:simple", "calc_sine") 13 | paper.run_codelet('calc_sine') 14 | 15 | paper.close() 16 | -------------------------------------------------------------------------------- /examples/simple.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | 4 | paper = ActivePaper("simple.ap", "w") 5 | 6 | paper.data.create_dataset("frequency", data = 0.2) 7 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 8 | 9 | calc_sine = paper.create_calclet("calc_sine", 10 | """ 11 | from activepapers.contents import data 12 | import numpy as np 13 | 14 | frequency = data['frequency'][...] 15 | time = data['time'][...] 
16 | data.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 17 | """) 18 | calc_sine.run() 19 | 20 | paper.close() 21 | -------------------------------------------------------------------------------- /examples/snapshot.py: -------------------------------------------------------------------------------- 1 | from activepapers.storage import ActivePaper 2 | import numpy as np 3 | 4 | paper = ActivePaper("snapshot.ap", "w") 5 | 6 | paper.data.create_dataset("frequency", data = 0.2) 7 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 8 | 9 | calc_angular = paper.create_calclet("calc_angular", 10 | """ 11 | from activepapers.contents import data, snapshot 12 | import numpy as np 13 | 14 | frequency = data['frequency'][...] 15 | time = data['time'][...] 16 | data.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 17 | snapshot('snapshot_1.ap') 18 | data.create_dataset("cosine", data=np.cos(2.*np.pi*frequency*time)) 19 | snapshot('snapshot_2.ap') 20 | data.create_dataset("tangent", data=np.tan(2.*np.pi*frequency*time)) 21 | """) 22 | calc_angular.run() 23 | 24 | paper.close() 25 | -------------------------------------------------------------------------------- /examples/test_internal_files.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import unittest 4 | import itertools 5 | import time 6 | from array import array 7 | from weakref import proxy 8 | try: 9 | import threading 10 | except ImportError: 11 | threading = None 12 | 13 | from test import test_support 14 | from test.test_support import TESTFN, run_unittest 15 | from UserList import UserList 16 | 17 | class AutoFileTests(unittest.TestCase): 18 | # file tests for which a test file is automatically set up 19 | 20 | def setUp(self): 21 | self.f = open(TESTFN, 'wb') 22 | 23 | def tearDown(self): 24 | if self.f: 25 | self.f.close() 26 | os.remove(TESTFN) 27 | 28 | def testWeakRefs(self): 29 | # verify weak references 
30 | p = proxy(self.f) 31 | p.write('teststring') 32 | self.assertEqual(self.f.tell(), p.tell()) 33 | self.f.close() 34 | self.f = None 35 | self.assertRaises(ReferenceError, getattr, p, 'tell') 36 | 37 | def testAttributes(self): 38 | # verify expected attributes exist 39 | f = self.f 40 | with test_support.check_py3k_warnings(): 41 | softspace = f.softspace 42 | f.name # merely shouldn't blow up 43 | f.mode # ditto 44 | f.closed # ditto 45 | 46 | with test_support.check_py3k_warnings(): 47 | # verify softspace is writable 48 | f.softspace = softspace # merely shouldn't blow up 49 | 50 | # verify the others aren't 51 | for attr in 'name', 'mode', 'closed': 52 | self.assertRaises((AttributeError, TypeError), setattr, f, attr, 'oops') 53 | 54 | def testReadinto(self): 55 | # verify readinto 56 | self.f.write('12') 57 | self.f.close() 58 | a = array('c', 'x'*10) 59 | self.f = open(TESTFN, 'rb') 60 | n = self.f.readinto(a) 61 | self.assertEqual('12', a.tostring()[:n]) 62 | 63 | def testWritelinesUserList(self): 64 | # verify writelines with instance sequence 65 | l = UserList(['1', '2']) 66 | self.f.writelines(l) 67 | self.f.close() 68 | self.f = open(TESTFN, 'rb') 69 | buf = self.f.read() 70 | self.assertEqual(buf, '12') 71 | 72 | def testWritelinesIntegers(self): 73 | # verify writelines with integers 74 | self.assertRaises(TypeError, self.f.writelines, [1, 2, 3]) 75 | 76 | def testWritelinesIntegersUserList(self): 77 | # verify writelines with integers in UserList 78 | l = UserList([1,2,3]) 79 | self.assertRaises(TypeError, self.f.writelines, l) 80 | 81 | def testWritelinesNonString(self): 82 | # verify writelines with non-string object 83 | class NonString: 84 | pass 85 | 86 | self.assertRaises(TypeError, self.f.writelines, 87 | [NonString(), NonString()]) 88 | 89 | def testRepr(self): 90 | # verify repr works 91 | self.assertTrue(repr(self.f).startswith(" 220 | # "file.truncate fault on windows" 221 | f = open(TESTFN, 'wb') 222 | f.write('12345678901') # 11 bytes 
223 | f.close() 224 | 225 | f = open(TESTFN,'rb+') 226 | data = f.read(5) 227 | if data != '12345': 228 | self.fail("Read on file opened for update failed %r" % data) 229 | if f.tell() != 5: 230 | self.fail("File pos after read wrong %d" % f.tell()) 231 | 232 | f.truncate() 233 | if f.tell() != 5: 234 | self.fail("File pos after ftruncate wrong %d" % f.tell()) 235 | 236 | f.close() 237 | size = os.path.getsize(TESTFN) 238 | if size != 5: 239 | self.fail("File size after ftruncate wrong %d" % size) 240 | 241 | try: 242 | bug801631() 243 | finally: 244 | os.unlink(TESTFN) 245 | 246 | def testIteration(self): 247 | # Test the complex interaction when mixing file-iteration and the 248 | # various read* methods. Ostensibly, the mixture could just be tested 249 | # to work when it should work according to the Python language, 250 | # instead of fail when it should fail according to the current CPython 251 | # implementation. People don't always program Python the way they 252 | # should, though, and the implementation might change in subtle ways, 253 | # so we explicitly test for errors, too; the test will just have to 254 | # be updated when the implementation changes.
255 | dataoffset = 16384 256 | filler = "ham\n" 257 | assert not dataoffset % len(filler), \ 258 | "dataoffset must be multiple of len(filler)" 259 | nchunks = dataoffset // len(filler) 260 | testlines = [ 261 | "spam, spam and eggs\n", 262 | "eggs, spam, ham and spam\n", 263 | "saussages, spam, spam and eggs\n", 264 | "spam, ham, spam and eggs\n", 265 | "spam, spam, spam, spam, spam, ham, spam\n", 266 | "wonderful spaaaaaam.\n" 267 | ] 268 | methods = [("readline", ()), ("read", ()), ("readlines", ()), 269 | ("readinto", (array("c", " "*100),))] 270 | 271 | try: 272 | # Prepare the testfile 273 | bag = open(TESTFN, "w") 274 | bag.write(filler * nchunks) 275 | bag.writelines(testlines) 276 | bag.close() 277 | # Test for appropriate errors mixing read* and iteration 278 | for methodname, args in methods: 279 | f = open(TESTFN) 280 | if f.next() != filler: 281 | self.fail("Broken testfile") 282 | meth = getattr(f, methodname) 283 | try: 284 | meth(*args) 285 | except ValueError: 286 | pass 287 | else: 288 | self.fail("%s%r after next() didn't raise ValueError" % 289 | (methodname, args)) 290 | f.close() 291 | 292 | # Test to see if harmless (by accident) mixing of read* and 293 | # iteration still works. This depends on the size of the internal 294 | # iteration buffer (currently 8192), but we can test it in a 295 | # flexible manner. Each line in the bag o' ham is 4 bytes 296 | # ("h", "a", "m", "\n"), so 4096 lines of that should get us 297 | # exactly on the buffer boundary for any power-of-2 buffersize 298 | # between 4 and 16384 (inclusive). 299 | f = open(TESTFN) 300 | for i in range(nchunks): 301 | f.next() 302 | testline = testlines.pop(0) 303 | try: 304 | line = f.readline() 305 | except ValueError: 306 | self.fail("readline() after next() with supposedly empty " 307 | "iteration-buffer failed anyway") 308 | if line != testline: 309 | self.fail("readline() after next() with empty buffer " 310 | "failed. 
Got %r, expected %r" % (line, testline)) 311 | testline = testlines.pop(0) 312 | buf = array("c", "\x00" * len(testline)) 313 | try: 314 | f.readinto(buf) 315 | except ValueError: 316 | self.fail("readinto() after next() with supposedly empty " 317 | "iteration-buffer failed anyway") 318 | line = buf.tostring() 319 | if line != testline: 320 | self.fail("readinto() after next() with empty buffer " 321 | "failed. Got %r, expected %r" % (line, testline)) 322 | 323 | testline = testlines.pop(0) 324 | try: 325 | line = f.read(len(testline)) 326 | except ValueError: 327 | self.fail("read() after next() with supposedly empty " 328 | "iteration-buffer failed anyway") 329 | if line != testline: 330 | self.fail("read() after next() with empty buffer " 331 | "failed. Got %r, expected %r" % (line, testline)) 332 | try: 333 | lines = f.readlines() 334 | except ValueError: 335 | self.fail("readlines() after next() with supposedly empty " 336 | "iteration-buffer failed anyway") 337 | if lines != testlines: 338 | self.fail("readlines() after next() with empty buffer " 339 | "failed. 
Got %r, expected %r" % (lines, testlines)) 340 | # Reading after iteration hit EOF shouldn't hurt either 341 | f = open(TESTFN) 342 | try: 343 | for line in f: 344 | pass 345 | try: 346 | f.readline() 347 | f.readinto(buf) 348 | f.read() 349 | f.readlines() 350 | except ValueError: 351 | self.fail("read* failed after next() consumed file") 352 | finally: 353 | f.close() 354 | finally: 355 | os.unlink(TESTFN) 356 | 357 | class FileSubclassTests(unittest.TestCase): 358 | 359 | def testExit(self): 360 | # test that exiting with context calls subclass' close 361 | class C(file): 362 | def __init__(self, *args): 363 | self.subclass_closed = False 364 | file.__init__(self, *args) 365 | def close(self): 366 | self.subclass_closed = True 367 | file.close(self) 368 | 369 | with C(TESTFN, 'w') as f: 370 | pass 371 | self.assertTrue(f.subclass_closed) 372 | 373 | 374 | @unittest.skipUnless(threading, 'Threading required for this test.') 375 | class FileThreadingTests(unittest.TestCase): 376 | # These tests check the ability to call various methods of file objects 377 | # (including close()) concurrently without crashing the Python interpreter. 
378 | # See #815646, #595601 379 | 380 | def setUp(self): 381 | self._threads = test_support.threading_setup() 382 | self.f = None 383 | self.filename = TESTFN 384 | with open(self.filename, "w") as f: 385 | f.write("\n".join("0123456789")) 386 | self._count_lock = threading.Lock() 387 | self.close_count = 0 388 | self.close_success_count = 0 389 | self.use_buffering = False 390 | 391 | def tearDown(self): 392 | if self.f: 393 | try: 394 | self.f.close() 395 | except (EnvironmentError, ValueError): 396 | pass 397 | try: 398 | os.remove(self.filename) 399 | except EnvironmentError: 400 | pass 401 | test_support.threading_cleanup(*self._threads) 402 | 403 | def _create_file(self): 404 | if self.use_buffering: 405 | self.f = open(self.filename, "w+", buffering=1024*16) 406 | else: 407 | self.f = open(self.filename, "w+") 408 | 409 | def _close_file(self): 410 | with self._count_lock: 411 | self.close_count += 1 412 | self.f.close() 413 | with self._count_lock: 414 | self.close_success_count += 1 415 | 416 | def _close_and_reopen_file(self): 417 | self._close_file() 418 | # if close raises an exception that's fine, self.f remains valid so 419 | # we don't need to reopen. 
420 | self._create_file() 421 | 422 | def _run_workers(self, func, nb_workers, duration=0.2): 423 | with self._count_lock: 424 | self.close_count = 0 425 | self.close_success_count = 0 426 | self.do_continue = True 427 | threads = [] 428 | try: 429 | for i in range(nb_workers): 430 | t = threading.Thread(target=func) 431 | t.start() 432 | threads.append(t) 433 | for _ in xrange(100): 434 | time.sleep(duration/100) 435 | with self._count_lock: 436 | if self.close_count-self.close_success_count > nb_workers+1: 437 | if test_support.verbose: 438 | print 'Q', 439 | break 440 | time.sleep(duration) 441 | finally: 442 | self.do_continue = False 443 | for t in threads: 444 | t.join() 445 | 446 | def _test_close_open_io(self, io_func, nb_workers=5): 447 | def worker(): 448 | self._create_file() 449 | funcs = itertools.cycle(( 450 | lambda: io_func(), 451 | lambda: self._close_and_reopen_file(), 452 | )) 453 | for f in funcs: 454 | if not self.do_continue: 455 | break 456 | try: 457 | f() 458 | except (IOError, ValueError): 459 | pass 460 | self._run_workers(worker, nb_workers) 461 | if test_support.verbose: 462 | # Useful verbose statistics when tuning this test to take 463 | # less time to run while ensuring that it's still useful. 464 | # 465 | # the percent of close calls that raised an error 466 | percent = 100. 
- 100.*self.close_success_count/self.close_count 467 | print self.close_count, ('%.4f ' % percent), 468 | 469 | def test_close_open(self): 470 | def io_func(): 471 | pass 472 | self._test_close_open_io(io_func) 473 | 474 | def test_close_open_flush(self): 475 | def io_func(): 476 | self.f.flush() 477 | self._test_close_open_io(io_func) 478 | 479 | def test_close_open_iter(self): 480 | def io_func(): 481 | list(iter(self.f)) 482 | self._test_close_open_io(io_func) 483 | 484 | def test_close_open_isatty(self): 485 | def io_func(): 486 | self.f.isatty() 487 | self._test_close_open_io(io_func) 488 | 489 | def test_close_open_print(self): 490 | def io_func(): 491 | print >> self.f, '' 492 | self._test_close_open_io(io_func) 493 | 494 | def test_close_open_print_buffered(self): 495 | self.use_buffering = True 496 | def io_func(): 497 | print >> self.f, '' 498 | self._test_close_open_io(io_func) 499 | 500 | def test_close_open_read(self): 501 | def io_func(): 502 | self.f.read(0) 503 | self._test_close_open_io(io_func) 504 | 505 | def test_close_open_readinto(self): 506 | def io_func(): 507 | a = array('c', 'xxxxx') 508 | self.f.readinto(a) 509 | self._test_close_open_io(io_func) 510 | 511 | def test_close_open_readline(self): 512 | def io_func(): 513 | self.f.readline() 514 | self._test_close_open_io(io_func) 515 | 516 | def test_close_open_readlines(self): 517 | def io_func(): 518 | self.f.readlines() 519 | self._test_close_open_io(io_func) 520 | 521 | def test_close_open_seek(self): 522 | def io_func(): 523 | self.f.seek(0, 0) 524 | self._test_close_open_io(io_func) 525 | 526 | def test_close_open_tell(self): 527 | def io_func(): 528 | self.f.tell() 529 | self._test_close_open_io(io_func) 530 | 531 | def test_close_open_truncate(self): 532 | def io_func(): 533 | self.f.truncate() 534 | self._test_close_open_io(io_func) 535 | 536 | def test_close_open_write(self): 537 | def io_func(): 538 | self.f.write('') 539 | self._test_close_open_io(io_func) 540 | 541 | def 
test_close_open_writelines(self): 542 | def io_func(): 543 | self.f.writelines('') 544 | self._test_close_open_io(io_func) 545 | 546 | 547 | class StdoutTests(unittest.TestCase): 548 | 549 | def test_move_stdout_on_write(self): 550 | # Issue 3242: sys.stdout can be replaced (and freed) during a 551 | # print statement; prevent a segfault in this case 552 | save_stdout = sys.stdout 553 | 554 | class File: 555 | def write(self, data): 556 | if '\n' in data: 557 | sys.stdout = save_stdout 558 | 559 | try: 560 | sys.stdout = File() 561 | print "some text" 562 | finally: 563 | sys.stdout = save_stdout 564 | 565 | def test_del_stdout_before_print(self): 566 | # Issue 4597: 'print' with no argument wasn't reporting when 567 | # sys.stdout was deleted. 568 | save_stdout = sys.stdout 569 | del sys.stdout 570 | try: 571 | print 572 | except RuntimeError as e: 573 | self.assertEqual(str(e), "lost sys.stdout") 574 | else: 575 | self.fail("Expected RuntimeError") 576 | finally: 577 | sys.stdout = save_stdout 578 | 579 | def test_unicode(self): 580 | import subprocess 581 | 582 | def get_message(encoding, *code): 583 | code = '\n'.join(code) 584 | env = os.environ.copy() 585 | env['PYTHONIOENCODING'] = encoding 586 | process = subprocess.Popen([sys.executable, "-c", code], 587 | stdout=subprocess.PIPE, env=env) 588 | stdout, stderr = process.communicate() 589 | self.assertEqual(process.returncode, 0) 590 | return stdout 591 | 592 | def check_message(text, encoding, expected): 593 | stdout = get_message(encoding, 594 | "import sys", 595 | "sys.stdout.write(%r)" % text, 596 | "sys.stdout.flush()") 597 | self.assertEqual(stdout, expected) 598 | 599 | # test the encoding 600 | check_message(u'15\u20ac', "iso-8859-15", "15\xa4") 601 | check_message(u'15\u20ac', "utf-8", '15\xe2\x82\xac') 602 | check_message(u'15\u20ac', "utf-16-le", '1\x005\x00\xac\x20') 603 | 604 | # test the error handler 605 | check_message(u'15\u20ac', "iso-8859-1:ignore", "15") 606 | check_message(u'15\u20ac', 
"iso-8859-1:replace", "15?") 607 | check_message(u'15\u20ac', "iso-8859-1:backslashreplace", "15\\u20ac") 608 | 609 | # test the buffer API 610 | for objtype in ('buffer', 'bytearray'): 611 | stdout = get_message('ascii', 612 | 'import sys', 613 | r'sys.stdout.write(%s("\xe9"))' % objtype, 614 | 'sys.stdout.flush()') 615 | self.assertEqual(stdout, "\xe9") 616 | 617 | 618 | def test_main(): 619 | # Historically, these tests have been sloppy about removing TESTFN. 620 | # So get rid of it no matter what. 621 | try: 622 | run_unittest(AutoFileTests, OtherFileTests, FileSubclassTests, 623 | FileThreadingTests, StdoutTests) 624 | finally: 625 | if os.path.exists(TESTFN): 626 | os.unlink(TESTFN) 627 | 628 | if __name__ == '__main__': 629 | test_main() 630 | -------------------------------------------------------------------------------- /lib/activepapers/__init__.py: -------------------------------------------------------------------------------- 1 | from activepapers.version import version as __version__ 2 | 3 | -------------------------------------------------------------------------------- /lib/activepapers/builtins2.py: -------------------------------------------------------------------------------- 1 | from __builtin__ import * 2 | from __builtin__ import __import__ 3 | 4 | del execfile 5 | del eval 6 | del file 7 | del open 8 | del raw_input 9 | -------------------------------------------------------------------------------- /lib/activepapers/builtins3.py: -------------------------------------------------------------------------------- 1 | from builtins import * 2 | from builtins import __import__ 3 | from builtins import __build_class__ 4 | 5 | # The "del exec" was removed and replaced by an equivalent operation 6 | # in utility3 to avoid a syntax error when processing builtins by 7 | # Python 2. 
8 | # 9 | # del exec 10 | del eval 11 | del input 12 | del open 13 | try: 14 | del quit 15 | except NameError: 16 | pass 17 | -------------------------------------------------------------------------------- /lib/activepapers/cli.py: -------------------------------------------------------------------------------- 1 | # Command line interface implementation 2 | 3 | import fnmatch 4 | import itertools as it 5 | import os 6 | import re 7 | import subprocess 8 | import sys 9 | import time 10 | import tempdir 11 | 12 | import numpy 13 | import h5py 14 | 15 | import activepapers.storage 16 | from activepapers.utility import ascii, datatype, mod_time, stamp, \ 17 | timestamp, raw_input 18 | 19 | class CLIExit(Exception): 20 | pass 21 | 22 | def get_paper(input_filename): 23 | if input_filename is not None: 24 | return input_filename 25 | apfiles = [fn for fn in os.listdir('.') if fn.endswith('.ap')] 26 | if len(apfiles) == 1: 27 | return apfiles[0] 28 | sys.stderr.write("no filename given and ") 29 | if apfiles: 30 | sys.stderr.write("%d HDF5 files in current directory\n" % len(apfiles)) 31 | else: 32 | sys.stderr.write("no HDF5 file in current directory\n") 33 | raise CLIExit 34 | 35 | 36 | # 37 | # Support for checkin/checkout/extract 38 | # 39 | 40 | extractable_types = ['calclet', 'importlet', 'module', 'file', 'text'] 41 | 42 | file_extensions = {('calclet', 'python'): '.py', 43 | ('importlet', 'python'): '.py', 44 | ('module', 'python'): '.py', 45 | ('file', None): '', 46 | ('text', 'HTML'): '.html', 47 | ('text', 'LaTeX'): '.tex', 48 | ('text', 'markdown'): '.md', 49 | ('text', 'reStructuredText'): '.rst', 50 | ('text', None): '.txt'} 51 | 52 | file_languages = dict((_ext, _l) 53 | for (_t, _l), _ext in file_extensions.items()) 54 | 55 | def extract_to_file(paper, item, file=None, filename=None, directory=None): 56 | if file is None: 57 | if filename is not None: 58 | filename = os.path.abspath(filename) 59 | if directory is not None: 60 | directory = 
os.path.abspath(directory) 61 | if filename is not None and directory is not None: 62 | if not filename.startswith(directory): 63 | raise ValueError("%s not in directory %s" 64 | % (filename, directory)) 65 | if filename is None: 66 | item_name = item.name.split('/')[1:] 67 | filename = os.path.join(directory, *item_name) 68 | if '.' not in item_name[-1]: 69 | # Add a file extension using some heuristics 70 | language = item.attrs.get('ACTIVE_PAPER_LANGUAGE', None) 71 | filename += file_extensions.get((datatype(item), language), '') 72 | directory, _ = os.path.split(filename) 73 | if directory and not os.path.exists(directory): 74 | os.makedirs(directory) 75 | file = open(filename, 'wb') 76 | close = True 77 | else: 78 | # If a file object is given, no other file specification is allowed 79 | assert filename is None 80 | assert directory is None 81 | close = False 82 | dt = datatype(item) 83 | if dt in ['file', 'text']: 84 | internal = activepapers.storage.InternalFile(item, 'rb') 85 | file.write(internal.read()) 86 | elif dt in extractable_types: 87 | file.write(item[...].flat[0]) 88 | else: 89 | raise ValueError("cannot extract dataset %s of type %s" 90 | % (item.name, dt)) 91 | if close: 92 | file.close() 93 | mtime = mod_time(item) 94 | if mtime: 95 | os.utime(filename, (mtime, mtime)) 96 | return filename 97 | 98 | def update_from_file(paper, filename, type=None, 99 | force_update=False, dry_run=False, 100 | dataset_name=None, create_new=True): 101 | if not os.path.exists(filename): 102 | raise ValueError("File %s not found" % filename) 103 | mtime = os.path.getmtime(filename) 104 | basename = filename 105 | ext = '' 106 | if dataset_name is not None: 107 | item = paper.file.get(dataset_name, None) 108 | if item is not None: 109 | basename = item.name 110 | else: 111 | item = paper.file.get(basename, None) 112 | if item is None: 113 | basename, ext = os.path.splitext(filename) 114 | item = paper.file.get(basename, None) 115 | language = file_languages.get(ext, 
None) 116 | if item is None: 117 | if not create_new: 118 | return 119 | # Create new item 120 | if type is None: 121 | raise ValueError("Datatype required to create new item %s" 122 | % basename) 123 | if type in ['calclet', 'importlet', 'module']: 124 | if not basename.startswith('code/'): 125 | raise ValueError("Items of type %s must be" 126 | " in the code section" 127 | % type) 128 | if language != 'python': 129 | raise ValueError("Items of type %s must be Python code" 130 | % type) 131 | if type == 'module' and \ 132 | not basename.startswith('code/python-packages/'): 133 | raise ValueError("Items of type %s must be in " 134 | "code/python-packages" 135 | % type) 136 | elif type == 'file': 137 | if not basename.startswith('data/') \ 138 | and not basename.startswith('documentation/'): 139 | raise ValueError("Items of type %s must be" 140 | " in the data or documentation section" 141 | % type) 142 | basename += ext 143 | elif type == 'text': 144 | if not basename.startswith('documentation/'): 145 | raise ValueError("Items of type %s must be" 146 | " in the documentation section" 147 | % type) 148 | else: 149 | # Update existing item 150 | if mtime <= mod_time(item) and not force_update: 151 | if dry_run: 152 | sys.stdout.write("Skip %s: file %s is not newer\n" 153 | % (item.name, filename)) 154 | return 155 | if type is not None and type != datatype(item): 156 | raise ValueError("Cannot change datatype %s to %s" 157 | % (datatype(item), type)) 158 | if type is None: 159 | type = datatype(item) 160 | if language is None: 161 | language = item.attrs.get('ACTIVE_PAPER_LANGUAGE', None) 162 | if dry_run: 163 | sys.stdout.write("Delete %s\n" % item.name) 164 | else: 165 | del item.parent[item.name.split('/')[-1]] 166 | if dry_run: 167 | fulltype = type if language is None else '/'.join((type, language)) 168 | sys.stdout.write("Create item %s of type %s from file %s\n" 169 | % (basename, fulltype, filename)) 170 | else: 171 | if type in ['calclet', 'importlet', 
'module']: 172 | code = open(filename, 'rb').read().decode('utf-8') 173 | item = paper.store_python_code(basename[5:], code) 174 | stamp(item, type, {}) 175 | timestamp(item, mtime) 176 | elif type in ['file', 'text']: 177 | f = paper.open_internal_file(basename, 'w') 178 | f.write(open(filename, 'rb').read()) 179 | f.close() 180 | stamp(f._ds, type, {'ACTIVE_PAPER_LANGUAGE': language}) 181 | timestamp(f._ds, mtime) 182 | 183 | def directory_pattern(pattern): 184 | if pattern[-1] in "?*/": 185 | return None 186 | return pattern + "/*" 187 | 188 | def process_patterns(patterns): 189 | if patterns is None: 190 | return None 191 | patterns = sum([(p, directory_pattern(p)) for p in patterns], ()) 192 | patterns = [re.compile(fnmatch.translate(p)) 193 | for p in patterns 194 | if p is not None] 195 | return patterns 196 | 197 | # 198 | # Command handlers called from argparse 199 | # 200 | 201 | def create(paper, d=None): 202 | if paper is None: 203 | sys.stderr.write("no paper given\n") 204 | raise CLIExit 205 | paper = activepapers.storage.ActivePaper(paper, 'w', d) 206 | paper.close() 207 | 208 | def ls(paper, long, type, pattern): 209 | paper = get_paper(paper) 210 | paper = activepapers.storage.ActivePaper(paper, 'r') 211 | pattern = process_patterns(pattern) 212 | for item in paper.iter_items(): 213 | name = item.name[1:] # remove initial slash 214 | dtype = datatype(item) 215 | if item.attrs.get('ACTIVE_PAPER_DUMMY_DATASET', False): 216 | dtype = 'dummy' 217 | if pattern and \ 218 | not any(p.match(name) for p in pattern): 219 | continue 220 | if type is not None and dtype != type: 221 | continue 222 | if long: 223 | t = item.attrs.get('ACTIVE_PAPER_TIMESTAMP', None) 224 | if t is None: 225 | sys.stdout.write(21*" ") 226 | else: 227 | sys.stdout.write(time.strftime("%Y-%m-%d/%H:%M:%S ", 228 | time.localtime(t/1000.))) 229 | field_len = len("importlet ") # the longest data type name 230 | sys.stdout.write((dtype + field_len*" ")[:field_len]) 231 | 
sys.stdout.write('*' if paper.is_stale(item) else ' ') 232 | sys.stdout.write(name) 233 | sys.stdout.write('\n') 234 | paper.close() 235 | 236 | def rm(paper, force, pattern): 237 | paper_name = get_paper(paper) 238 | paper = activepapers.storage.ActivePaper(paper_name, 'r') 239 | deps = paper.dependency_graph() 240 | pattern = process_patterns(pattern) 241 | if not pattern: 242 | return 243 | names = set() 244 | for item in it.chain(paper.iter_items(), paper.iter_groups()): 245 | if any(p.match(item.name[1:]) for p in pattern): 246 | names.add(item.name) 247 | paper.close() 248 | if not names: 249 | return 250 | while True: 251 | new_names = set() 252 | for name in names: 253 | for dep in deps[name]: 254 | new_names.add(dep) 255 | if new_names - names: 256 | names |= new_names 257 | else: 258 | break 259 | names = sorted(names) 260 | if not force: 261 | for name in names: 262 | sys.stdout.write(name + '\n') 263 | while True: 264 | reply = raw_input("Delete ? (y/n) ") 265 | if reply in "yn": 266 | break 267 | if reply == 'n': 268 | return 269 | paper = activepapers.storage.ActivePaper(paper_name, 'r+') 270 | most_recent_group = None 271 | for name in names: 272 | if most_recent_group and name.startswith(most_recent_group): 273 | continue 274 | if isinstance(paper.file[name], h5py.Group): 275 | most_recent_group = name 276 | try: 277 | del paper.file[name] 278 | except: 279 | sys.stderr.write("Can't delete %s\n" % name) 280 | paper.close() 281 | 282 | def dummy(paper, force, pattern): 283 | paper_name = get_paper(paper) 284 | paper = activepapers.storage.ActivePaper(paper_name, 'r') 285 | deps = paper.dependency_graph() 286 | pattern = process_patterns(pattern) 287 | if not pattern: 288 | return 289 | names = set() 290 | for item in paper.iter_items(): 291 | if any(p.match(item.name[1:]) for p in pattern): 292 | names.add(item.name) 293 | paper.close() 294 | if not names: 295 | return 296 | names = sorted(names) 297 | if not force: 298 | for name in names: 299 | 
sys.stdout.write(name + '\n') 300 | while True: 301 | reply = raw_input("Replace by dummy datasets? (y/n) ") 302 | if reply in "yn": 303 | break 304 | if reply == 'n': 305 | return 306 | paper = activepapers.storage.ActivePaper(paper_name, 'r+') 307 | for name in names: 308 | try: 309 | paper.replace_by_dummy(name) 310 | except: 311 | sys.stderr.write("Can't replace %s by dummy\n" % name) 312 | raise 313 | paper.close() 314 | 315 | def set_(paper, dataset, expr): 316 | paper = get_paper(paper) 317 | paper = activepapers.storage.ActivePaper(paper, 'r+') 318 | value = eval(expr, numpy.__dict__, {}) 319 | try: 320 | del paper.data[dataset] 321 | except KeyError: 322 | pass 323 | paper.data[dataset] = value 324 | paper.close() 325 | 326 | def group(paper, group_name): 327 | if group_name.startswith('/'): 328 | group_name = group_name[1:] 329 | top_level = group_name.split('/')[0] 330 | if top_level not in ['code', 'data', 'documentation']: 331 | sys.stderr.write("invalid group name %s\n" % group_name) 332 | raise CLIExit 333 | paper = get_paper(paper) 334 | paper = activepapers.storage.ActivePaper(paper, 'r+') 335 | paper.file.create_group(group_name) 336 | paper.close() 337 | 338 | def extract(paper, dataset, filename): 339 | paper = get_paper(paper) 340 | paper = activepapers.storage.ActivePaper(paper, 'r') 341 | ds = paper.file[dataset] 342 | try: 343 | if filename == '-': 344 | extract_to_file(paper, ds, file=sys.stdout) 345 | else: 346 | extract_to_file(paper, ds, filename=filename) 347 | except ValueError as exc: 348 | sys.stderr.write(exc.args[0] + '\n') 349 | raise CLIExit 350 | 351 | def _script(paper, dataset, filename, run, create_method): 352 | paper = get_paper(paper) 353 | paper = activepapers.storage.ActivePaper(paper, 'r+') 354 | script = open(filename).read() 355 | codelet = getattr(paper, create_method)(dataset, script) 356 | if run: 357 | codelet.run() 358 | paper.close() 359 | 360 | def calclet(paper, dataset, filename, run): 361 | _script(paper, 
dataset, filename, run, "create_calclet") 362 | 363 | def importlet(paper, dataset, filename, run): 364 | _script(paper, dataset, filename, run, "create_importlet") 365 | 366 | def import_module(paper, module): 367 | paper = get_paper(paper) 368 | paper = activepapers.storage.ActivePaper(paper, 'r+') 369 | paper.import_module(module) 370 | paper.close() 371 | 372 | def run(paper, codelet, debug, profile, checkin): 373 | paper = get_paper(paper) 374 | with activepapers.storage.ActivePaper(paper, 'r+') as paper: 375 | if checkin: 376 | for root, dirs, files in os.walk('code'): 377 | for f in files: 378 | filename = os.path.join(root, f) 379 | try: 380 | update_from_file(paper, filename) 381 | except ValueError as exc: 382 | sys.stderr.write(exc.args[0] + '\n') 383 | try: 384 | if profile is None: 385 | exc = paper.run_codelet(codelet, debug) 386 | else: 387 | import cProfile, pstats 388 | pr = cProfile.Profile() 389 | pr.enable() 390 | exc = paper.run_codelet(codelet, debug) 391 | pr.disable() 392 | ps = pstats.Stats(pr) 393 | ps.dump_stats(profile) 394 | except KeyError: 395 | sys.stderr.write("Codelet %s does not exist\n" % codelet) 396 | raise CLIExit 397 | if exc is not None: 398 | sys.stderr.write(exc) 399 | 400 | def _find_calclet_for_dummy_or_stale_item(paper_name): 401 | paper = activepapers.storage.ActivePaper(paper_name, 'r') 402 | deps = paper.dependency_hierarchy() 403 | next(deps) # the first set has no dependencies 404 | calclet = None 405 | item_name = None 406 | for item_set in deps: 407 | for item in item_set: 408 | if paper.is_dummy(item) or paper.is_stale(item): 409 | item_name = item.name 410 | calclet = item.attrs['ACTIVE_PAPER_GENERATING_CODELET'] 411 | break 412 | # We must del item_set to prevent h5py from crashing when the 413 | # file is closed. Presumably there are HDF5 handles being freed 414 | # as a consequence of the del. 
415 | del item_set 416 | if calclet is not None: 417 | break 418 | paper.close() 419 | return calclet, item_name 420 | 421 | def update(paper, verbose): 422 | paper_name = get_paper(paper) 423 | while True: 424 | calclet, item_name = _find_calclet_for_dummy_or_stale_item(paper_name) 425 | if calclet is None: 426 | break 427 | if verbose: 428 | sys.stdout.write("Dataset %s is stale or dummy, running %s\n" 429 | % (item_name, calclet)) 430 | sys.stdout.flush() 431 | paper = activepapers.storage.ActivePaper(paper_name, 'r+') 432 | paper.run_codelet(calclet) 433 | paper.close() 434 | 435 | def checkin(paper, type, file, force, dry_run): 436 | paper = get_paper(paper) 437 | paper = activepapers.storage.ActivePaper(paper, 'r+') 438 | cwd = os.path.abspath(os.getcwd()) 439 | for filename in file: 440 | filename = os.path.abspath(filename) 441 | if not filename.startswith(cwd): 442 | sys.stderr.write("File %s is not in the working directory\n" 443 | % filename) 444 | raise CLIExit 445 | filename = filename[len(cwd)+1:] 446 | 447 | def update(filename): 448 | try: 449 | update_from_file(paper, filename, type, force, dry_run) 450 | except ValueError as exc: 451 | sys.stderr.write(exc.args[0] + '\n') 452 | 453 | if os.path.isdir(filename): 454 | for root, dirs, files in os.walk(filename): 455 | for f in files: 456 | update(os.path.join(root, f)) 457 | else: 458 | update(filename) 459 | 460 | paper.close() 461 | 462 | def checkout(paper, type, pattern, dry_run): 463 | paper = get_paper(paper) 464 | paper = activepapers.storage.ActivePaper(paper, 'r') 465 | pattern = process_patterns(pattern) 466 | for item in paper.iter_items(): 467 | name = item.name[1:] # remove initial slash 468 | dtype = datatype(item) 469 | if pattern and \ 470 | not any(p.match(name) for p in pattern): 471 | continue 472 | if type is not None and dtype != type: 473 | continue 474 | try: 475 | extract_to_file(paper, item, directory=os.getcwd()) 476 | except ValueError: 477 | sys.stderr.write("Skipping %s: 
data type %s not extractable\n" 478 | % (item.name, datatype(item))) 479 | paper.close() 480 | 481 | def ln(paper, reference, name): 482 | ref_parts = reference.split(':') 483 | if len(ref_parts) != 3: 484 | sys.stderr.write('Invalid reference %s\n' % reference) 485 | raise CLIExit 486 | ref_type, ref_name, ref_path = ref_parts 487 | with activepapers.storage.ActivePaper(get_paper(paper), 'r+') as paper: 488 | if ref_path == '': 489 | ref_path = None 490 | paper.create_ref(name, ref_type + ':' + ref_name, ref_path) 491 | 492 | def cp(paper, reference, name): 493 | ref_parts = reference.split(':') 494 | if len(ref_parts) != 3: 495 | sys.stderr.write('Invalid reference %s\n' % reference) 496 | raise CLIExit 497 | ref_type, ref_name, ref_path = ref_parts 498 | with activepapers.storage.ActivePaper(get_paper(paper), 'r+') as paper: 499 | if ref_path == '': 500 | ref_path = None 501 | paper.create_copy(name, ref_type + ':' + ref_name, ref_path) 502 | 503 | def refs(paper, verbose): 504 | paper = get_paper(paper) 505 | paper = activepapers.storage.ActivePaper(paper, 'r') 506 | refs = paper.external_references() 507 | paper.close() 508 | sorted_refs = sorted(refs.keys()) 509 | for ref in sorted_refs: 510 | sys.stdout.write(ref.decode('utf-8') + '\n') 511 | if verbose: 512 | links, copies = refs[ref] 513 | if links: 514 | sys.stdout.write(" links:\n") 515 | for l in links: 516 | sys.stdout.write(" %s\n" % l) 517 | if copies: 518 | sys.stdout.write(" copies:\n") 519 | for c in copies: 520 | sys.stdout.write(" %s\n" % c) 521 | 522 | def edit(paper, dataset): 523 | editor = os.getenv("EDITOR", "vi") 524 | paper_name = get_paper(paper) 525 | with tempdir.TempDir() as t: 526 | paper = activepapers.storage.ActivePaper(paper_name, 'r') 527 | ds = paper.file[dataset] 528 | try: 529 | filename = extract_to_file(paper, ds, directory=str(t)) 530 | except ValueError as exc: 531 | sys.stderr.write(exc.args[0] + '\n') 532 | raise CLIExit 533 | finally: 534 | paper.close() 535 | ret = 
subprocess.call([editor, filename]) 536 | if ret == 0: 537 | paper = activepapers.storage.ActivePaper(paper_name, 'r+') 538 | try: 539 | update_from_file(paper, filename, 540 | dataset_name=dataset, create_new=False) 541 | finally: 542 | paper.close() 543 | 544 | def console(paper, modify): 545 | import code 546 | paper = get_paper(paper) 547 | paper = activepapers.storage.ActivePaper(paper, 'r+' if modify else 'r') 548 | data = paper.data 549 | environment = {'data': paper.data} 550 | code.interact(banner = "ActivePapers interactive console", 551 | local = environment) 552 | paper.close() 553 | 554 | def ipython(paper, modify): 555 | import IPython 556 | paper = get_paper(paper) 557 | paper = activepapers.storage.ActivePaper(paper, 'r+' if modify else 'r') 558 | data = paper.data 559 | IPython.embed() 560 | paper.close() 561 | -------------------------------------------------------------------------------- /lib/activepapers/contents.py: -------------------------------------------------------------------------------- 1 | # This module is not used by code running inside an ActivePaper, 2 | # because the ActivePaper runtime system (execution.py) creates 3 | # a specific module on the fly. This generic module 4 | # is used when activepapers.contents is imported from a 5 | # standard Python script. It is meant to facilitate 6 | # development of codelets for ActivePaper in a standard 7 | # Python development environment. 
8 | 9 | 10 | # Locate the (hopefully only) ActivePaper in the current directory 11 | import os 12 | apfiles = [fn for fn in os.listdir('.') if fn.endswith('.ap')] 13 | if len(apfiles) != 1: 14 | raise IOError("directory contains %s ActivePapers" % len(apfiles)) 15 | del os 16 | 17 | # Open the paper read-only 18 | from activepapers.storage import ActivePaper 19 | _paper = ActivePaper(apfiles[0], 'r') 20 | del apfiles 21 | del ActivePaper 22 | 23 | # Emulate the internal activepapers.contents module 24 | data = _paper.data 25 | 26 | def _open(filename, mode, section): 27 | from activepapers.utility import path_in_section 28 | path = path_in_section(filename, section) 29 | if not path.startswith('/'): 30 | path = section + '/' + path 31 | assert mode == 'r' 32 | return _paper.open_internal_file(path, 'r', None) 33 | 34 | def open(filename, mode='r'): 35 | return _open(filename, mode, '/data') 36 | 37 | def open_documentation(filename, mode='r'): 38 | return _open(filename, mode, '/documentation') 39 | 40 | def exception_traceback(): 41 | raise NotImplementedError() 42 | 43 | # Make the code in the ActivePapers importable 44 | import activepapers.execution 45 | def _get_codelet_and_paper(): 46 | return None, _paper 47 | activepapers.execution.get_codelet_and_paper = _get_codelet_and_paper 48 | del _get_codelet_and_paper 49 | -------------------------------------------------------------------------------- /lib/activepapers/execution.py: -------------------------------------------------------------------------------- 1 | import imp 2 | import collections 3 | import os 4 | import sys 5 | import threading 6 | import traceback 7 | import weakref 8 | import logging 9 | 10 | import h5py 11 | import numpy as np 12 | 13 | import activepapers.utility 14 | from activepapers.utility import ascii, utf8, isstring, execcode, \ 15 | codepath, datapath, path_in_section, owner, \ 16 | datatype, language, \ 17 | timestamp, stamp, ms_since_epoch 18 | import activepapers.standardlib 19 | 
20 | # 21 | # A codelet is a Python script inside a paper. 22 | # 23 | # Codelets come in several varieties: 24 | # 25 | # - Calclets can only access datasets inside the paper. 26 | # Their computations are reproducible. 27 | # 28 | # - Importlets create datasets in the paper based on external resources. 29 | # Their results are not reproducible, and in general they are not 30 | # executable in a different environment. They are stored as documentation 31 | # and for manual re-execution. 32 | # 33 | 34 | class Codelet(object): 35 | 36 | def __init__(self, paper, node): 37 | self.paper = paper 38 | self.node = node 39 | self._dependencies = None 40 | assert node.name.startswith('/code/') 41 | self.path = node.name 42 | 43 | def dependency_attributes(self): 44 | if self._dependencies is None: 45 | return {'ACTIVE_PAPER_GENERATING_CODELET': self.path} 46 | else: 47 | deps = list(self._dependencies) 48 | deps.append(ascii(self.path)) 49 | deps.sort() 50 | return {'ACTIVE_PAPER_GENERATING_CODELET': self.path, 51 | 'ACTIVE_PAPER_DEPENDENCIES': deps} 52 | 53 | def add_dependency(self, dependency): 54 | pass 55 | 56 | def owns(self, node): 57 | return owner(node) == self.path 58 | 59 | def _open_file(self, path, mode, encoding, section): 60 | if path.startswith(os.path.expanduser('~')): 61 | # Catch obvious attempts to access real files 62 | # rather than internal ones. 
63 | raise IOError((13, "Permission denied: '%s'" % path)) 64 | path = path_in_section(path, section) 65 | if not path.startswith('/'): 66 | path = section + '/' + path 67 | f = self.paper.open_internal_file(path, mode, encoding, self) 68 | f._set_attribute_callback(self.dependency_attributes) 69 | if mode[0] == 'r': 70 | self.add_dependency(f._ds.name) 71 | return f 72 | 73 | def open_data_file(self, path, mode='r', encoding=None): 74 | return self._open_file(path, mode, encoding, '/data') 75 | 76 | def open_documentation_file(self, path, mode='r', encoding=None): 77 | return self._open_file(path, mode, encoding, '/documentation') 78 | 79 | def exception_traceback(self): 80 | from traceback import extract_tb, print_exc 81 | import sys 82 | tb = sys.exc_info()[2] 83 | node, line, fn_name, _ = extract_tb(tb, limit=2)[1] 84 | paper_id, path = node.split(':') 85 | return CodeFile(self.paper, self.paper.file[path]), line, fn_name 86 | 87 | def _run(self, environment): 88 | logging.info("Running %s %s" 89 | % (self.__class__.__name__.lower(), self.path)) 90 | self.paper.remove_owned_by(self.path) 91 | # A string uniquely identifying the paper from which the 92 | # calclet is called. Used in Importer. 93 | script = utf8(self.node[...].flat[0]) 94 | script = compile(script, ':'.join([self.paper._id(), self.path]), 'exec') 95 | self._contents_module = imp.new_module('activepapers.contents') 96 | self._contents_module.data = DataGroup(self.paper, None, 97 | self.paper.data_group, self) 98 | self._contents_module.code = CodeGroup(self.paper, 99 | self.paper.code_group) 100 | self._contents_module.open = self.open_data_file 101 | self._contents_module.open_documentation = self.open_documentation_file 102 | self._contents_module.snapshot = self.paper.snapshot 103 | self._contents_module.exception_traceback = self.exception_traceback 104 | 105 | # The remaining part of this method is not thread-safe because 106 | # of the way the global state in sys.modules is modified. 
107 | with codelet_lock: 108 | try: 109 | codelet_registry[(self.paper._id(), self.path)] = self 110 | for name, module in self.paper._local_modules.items(): 111 | assert name not in sys.modules 112 | sys.modules[name] = module 113 | sys.modules['activepapers.contents'] = self._contents_module 114 | execcode(script, environment) 115 | finally: 116 | del codelet_registry[(self.paper._id(), self.path)] 117 | self._contents_module = None 118 | if 'activepapers.contents' in sys.modules: 119 | del sys.modules['activepapers.contents'] 120 | for name, module in self.paper._local_modules.items(): 121 | del sys.modules[name] 122 | 123 | codelet_lock = threading.Lock() 124 | 125 | # 126 | # Importlets are run in the normal Python environment, with 127 | # additional access to the special module activepapers.contents. 128 | # 129 | # All data generation is traced during importlet execution in order to 130 | # build the dependency graph. 131 | # 132 | # Importlets should not be allowed to read datasets except those they 133 | # generated themselves. This is not enforced at the moment. 134 | # 135 | 136 | class Importlet(Codelet): 137 | 138 | def run(self): 139 | environment = {'__builtins__': activepapers.utility.builtins.__dict__} 140 | self._run(environment) 141 | 142 | def track_and_check_import(self, module_name): 143 | return 144 | 145 | # 146 | # Calclets are run in a restricted execution environment: 147 | # - many items removed from __builtins__ 148 | # - modified __import__ for tracking and verifying imports 149 | # - an import hook for accessing modules stored in the paper 150 | # 151 | # All data access and data generation is traced during calclet 152 | # execution in order to build the dependency graph. 
153 | # 154 | 155 | class Calclet(Codelet): 156 | 157 | def run(self): 158 | self._dependencies = set() 159 | environment = {'__builtins__': 160 | activepapers.utility.ap_builtins.__dict__} 161 | self._run(environment) 162 | 163 | def add_dependency(self, dependency): 164 | assert isinstance(self._dependencies, set) 165 | self._dependencies.add(ascii(dependency)) 166 | 167 | def track_and_check_import(self, module_name): 168 | if module_name == 'activepapers.contents': 169 | return 170 | node = self.paper.get_local_module(module_name) 171 | if node is None: 172 | top_level = module_name.split('.')[0] 173 | if top_level not in self.paper.dependencies \ 174 | and top_level not in activepapers.standardlib.allowed_modules \ 175 | and top_level not in ['numpy', 'h5py']: 176 | raise ImportError("import of %s not allowed" % module_name) 177 | else: 178 | if datatype(node) != "module": 179 | node = node.get("__init__", None) 180 | if node is not None and node.in_paper(self.paper): 181 | self.add_dependency(node.name) 182 | 183 | 184 | # 185 | # The attrs attribute of datasets and groups is wrapped 186 | # by a class that makes the attributes used by ACTIVE_PAPERS 187 | # invisible to calclet code. 
188 | # 189 | 190 | class AttrWrapper(collections.MutableMapping): 191 | 192 | def __init__(self, node): 193 | self._node = node 194 | 195 | @classmethod 196 | def forbidden(cls, key): 197 | return isstring(key) and key.startswith('ACTIVE_PAPER') 198 | 199 | def __len__(self): 200 | return len([k for k in self._node.attrs 201 | if not AttrWrapper.forbidden(k)]) 202 | 203 | def __iter__(self): 204 | for k in self._node.attrs: 205 | if not AttrWrapper.forbidden(k): 206 | yield k 207 | 208 | def __contains__(self, item): 209 | if AttrWrapper.forbidden(item): 210 | return False 211 | return item in self._node.attrs 212 | 213 | def __getitem__(self, item): 214 | if AttrWrapper.forbidden(item): 215 | raise KeyError(item) 216 | return self._node.attrs[item] 217 | 218 | def __setitem__(self, item, value): 219 | if AttrWrapper.forbidden(item): 220 | raise ValueError(item) 221 | self._node.attrs[item] = value 222 | 223 | def __delitem__(self, item): 224 | if AttrWrapper.forbidden(item): 225 | raise KeyError(item) 226 | del self._node.attrs[item] 227 | 228 | 229 | # 230 | # Datasets are wrapped by a class that traces all accesses for 231 | # building the dependency graph. 
232 | # 233 | 234 | class DatasetWrapper(object): 235 | 236 | def __init__(self, parent, ds, codelet): 237 | self._parent = parent 238 | self._node = ds 239 | self._codelet = codelet 240 | self.attrs = AttrWrapper(ds) 241 | self.ref = ds.ref 242 | 243 | @property 244 | def parent(self): 245 | return self._parent 246 | 247 | def __len__(self): 248 | return len(self._node) 249 | 250 | def __getitem__(self, item): 251 | return self._node[item] 252 | 253 | def __setitem__(self, item, value): 254 | self._node[item] = value 255 | stamp(self._node, "data", self._codelet.dependency_attributes()) 256 | 257 | def __getattr__(self, attr): 258 | return getattr(self._node, attr) 259 | 260 | def read_direct(self, dest, source_sel=None, dest_sel=None): 261 | return self._node.read_direct(dest, source_sel, dest_sel) 262 | 263 | def resize(self, size, axis=None): 264 | self._node.resize(size, axis) 265 | stamp(self._node, "data", self._codelet.dependency_attributes()) 266 | 267 | def write_direct(self, source, source_sel=None, dest_sel=None): 268 | self._node.write_direct(source, source_sel, dest_sel) 269 | stamp(self._node, "data", self._codelet.dependency_attributes()) 270 | 271 | def __repr__(self): 272 | codelet = owner(self._node) 273 | if codelet is None: 274 | owned = "" 275 | else: 276 | owned = " generated by %s" % codelet 277 | lines = ["Dataset %s%s" % (self._node.name, owned)] 278 | nelems = np.prod(self._node.shape) 279 | if nelems < 100: 280 | lines.append(str(self._node[...])) 281 | else: 282 | lines.append("shape %s, dtype %s" 283 | % (repr(self._node.shape), str(self._node.dtype))) 284 | return "\n".join(lines) 285 | 286 | # 287 | # DataGroup is a wrapper class for the "data" group in a paper. 288 | # The wrapper traces access and creation of subgroups and datasets 289 | # for building the dependency graph. It also maintains the illusion 290 | # that the data subgroup is all there is in the HDF5 file. 
291 | # 292 | 293 | class DataGroup(object): 294 | 295 | def __init__(self, paper, parent, h5group, codelet, data_item=None): 296 | self._paper = paper 297 | self._parent = parent if parent is not None else self 298 | self._node = h5group 299 | self._codelet = codelet 300 | self._data_item = data_item 301 | if self._data_item is None and datatype(h5group) == "data": 302 | self._data_item = self 303 | self.attrs = AttrWrapper(h5group) 304 | self.ref = h5group.ref 305 | self.name = h5group.name 306 | 307 | @property 308 | def parent(self): 309 | return self._parent 310 | 311 | def _wrap_and_track_dependencies(self, node): 312 | ap_type = datatype(node) 313 | if ap_type == 'reference': 314 | from activepapers.storage import dereference 315 | paper, node = dereference(node) 316 | if node.name.startswith('/data/'): 317 | node = paper.data[node.name[6:]] 318 | elif isinstance(node, h5py.Group): 319 | node = DataGroup(paper, None, node, None, None) 320 | else: 321 | node = DatasetWrapper(None, node, None) 322 | else: 323 | if self._codelet is not None: 324 | if ap_type is not None and ap_type != "group": 325 | self._codelet.add_dependency(node.name 326 | if self._data_item is None 327 | else self._data_item.name) 328 | codelet = owner(node) 329 | if codelet is not None \ 330 | and datatype(self._node[codelet]) == "calclet": 331 | self._codelet.add_dependency(codelet) 332 | if isinstance(node, h5py.Group): 333 | node = DataGroup(self._paper, self, node, 334 | self._codelet, self._data_item) 335 | else: 336 | node = DatasetWrapper(self, node, self._codelet) 337 | return node 338 | 339 | def _stamp_new_node(self, node, ap_type): 340 | if self._data_item: 341 | stamp(self._data_item._node, "data", 342 | self._codelet.dependency_attributes()) 343 | else: 344 | stamp(node, ap_type, self._codelet.dependency_attributes()) 345 | 346 | def __len__(self): 347 | return len(self._node) 348 | 349 | def __iter__(self): 350 | for x in self._node: 351 | yield x 352 | 353 | def 
__getitem__(self, path_or_ref): 354 | if isstring(path_or_ref): 355 | path = datapath(path_or_ref) 356 | else: 357 | path = self._node[path_or_ref].name 358 | assert path.startswith('/data') 359 | path = path.split('/') 360 | if path[0] == '': 361 | # datapath() ensures that path must start with 362 | # ['', 'data'] in this case. Move up the parent 363 | # chain to the root of the /data hierarchy. 364 | path = path[2:] 365 | node = self 366 | while node is not node.parent: 367 | node = node.parent 368 | else: 369 | node = self 370 | for element in path: 371 | node = node._wrap_and_track_dependencies(node._node[element]) 372 | return node 373 | 374 | def get(self, path, default=None): 375 | try: 376 | return self[path] 377 | except KeyError: 378 | return default 379 | 380 | def __setitem__(self, path, value): 381 | path = datapath(path) 382 | needs_stamp = False 383 | if isinstance(value, (DataGroup, DatasetWrapper)): 384 | value = value._node 385 | else: 386 | needs_stamp = True 387 | self._node[path] = value 388 | if needs_stamp: 389 | node = self._node[path] 390 | stamp(node, "data", self._codelet.dependency_attributes()) 391 | 392 | def __delitem__(self, path): 393 | test = self._node[datapath(path)] 394 | if owner(test) == self._codelet.path: 395 | del self._node[datapath(path)] 396 | else: 397 | raise ValueError("%s trying to remove data created by %s" 398 | % (str(self._codelet.path), str(owner(test)))) 399 | 400 | def create_group(self, path): 401 | group = self._node.create_group(datapath(path)) 402 | self._stamp_new_node(group, "group") 403 | return DataGroup(self._paper, self, group, 404 | self._codelet, self._data_item) 405 | 406 | def require_group(self, path): 407 | group = self._node.require_group(datapath(path)) 408 | self._stamp_new_node(group, "group") 409 | return DataGroup(self._paper, self, group, 410 | self._codelet, self._data_item) 411 | 412 | def mark_as_data_item(self): 413 | stamp(self._node, "data", self._codelet.dependency_attributes()) 
414 | self._data_item = self 415 | 416 | def create_dataset(self, path, *args, **kwargs): 417 | ds = self._node.create_dataset(datapath(path), *args, **kwargs) 418 | self._stamp_new_node(ds, "data") 419 | return DatasetWrapper(self, ds, self._codelet) 420 | 421 | def require_dataset(self, path, *args, **kwargs): 422 | ds = self._node.require_dataset(datapath(path), *args, **kwargs) 423 | self._stamp_new_node(ds, "data") 424 | return DatasetWrapper(self, ds, self._codelet) 425 | 426 | def visit(self, func): 427 | self._node.visit(func) 428 | 429 | def visititems(self, func): 430 | self._node.visititems(func) 431 | 432 | def copy(self, source, dest, name=None): 433 | raise NotImplementedError("not yet implemented") 434 | 435 | def flush(self): 436 | self._paper.flush() 437 | 438 | def __repr__(self): 439 | codelet = owner(self._node) 440 | if codelet is None: 441 | owned = "" 442 | else: 443 | owned = " generated by %s" % codelet 444 | items = list(self._node) 445 | if not items: 446 | lines = ["Empty group %s%s" % (self._node.name, owned)] 447 | else: 448 | lines = ["Group %s%s containing" % (self._node.name, owned)] 449 | lines.extend(" "+i for i in items) 450 | return "\n".join(lines) 451 | 452 | # 453 | # CodeGroup is a wrapper class for the "code" group in a paper. 454 | # The wrapper provides read-only access to codelets and modules. 
455 | # 456 | 457 | class CodeGroup(object): 458 | 459 | def __init__(self, paper, node): 460 | self._paper = paper 461 | self._node = node 462 | 463 | def __len__(self): 464 | return len(self._node) 465 | 466 | def __iter__(self): 467 | for x in self._node: 468 | yield x 469 | 470 | def __getitem__(self, path_or_ref): 471 | if isstring(path_or_ref): 472 | path = codepath(path_or_ref) 473 | else: 474 | path = self._node[path_or_ref].name 475 | assert path.startswith('/code') 476 | node = self._node[path] 477 | if isinstance(node, h5py.Group): 478 | return CodeGroup(self._paper, node) 479 | else: 480 | return CodeFile(self._paper, node) 481 | 482 | def __repr__(self): 483 | return "<CodeGroup %s>" % self._node.name 484 | 485 | class CodeFile(object): 486 | 487 | def __init__(self, paper, node): 488 | self._paper = paper 489 | self._node = node 490 | self.type = datatype(node) 491 | self.language = language(node) 492 | self.name = node.name 493 | self.code = utf8(node[...].flat[0]) 494 | 495 | def __repr__(self): 496 | return "<%s %s (%s)>" % (self.type, self.name, self.language) 497 | 498 | # 499 | # Initialize a paper registry that permits finding a paper 500 | # object through a unique id stored in the codelet names, 501 | # and a codelet registry for retrieving active codelets. 502 | # 503 | 504 | paper_registry = weakref.WeakValueDictionary() 505 | codelet_registry = weakref.WeakValueDictionary() 506 | 507 | # 508 | # Identify calls from inside a codelet in order to apply 509 | # the codelet-specific import rules. 510 | # 511 | 512 | def get_codelet_and_paper(): 513 | """ 514 | :returns: the codelet from which this function was called, 515 | and the paper containing it. Both values are None 516 | if there is no codelet in the call chain. 517 | """ 518 | # Get the name of the source code file of the current 519 | # module, which is also the module containing the Codelet class. 
520 | this_module = __file__ 521 | if os.path.splitext(this_module)[1] in ['.pyc', '.pyo']: 522 | this_module = this_module[:-1] 523 | # Get call stack minus the last entry, which is the 524 | # function get_codelet_and_paper itself. 525 | stack = traceback.extract_stack()[:-1] 526 | # Look for the entry corresponding to Codelet._run() 527 | in_codelet = False 528 | for filename, line_no, fn_name, command in stack: 529 | if filename == this_module \ 530 | and command == "execcode(script, environment)": 531 | in_codelet = True 532 | if not in_codelet: 533 | return None, None 534 | # Look for an entry corresponding to codelet code. 535 | # Extract its paper_id and use it to look up the paper 536 | # in the registry. 537 | for item in stack: 538 | module_ref = item[0].split(':') 539 | if len(module_ref) != 2: 540 | # module_ref is a real filename 541 | continue 542 | paper_id, codelet = module_ref 543 | if not codelet.startswith('/code'): 544 | # module_ref is something other than a paper:codelet combo 545 | return None, None 546 | return codelet_registry.get((paper_id, codelet), None), \ 547 | paper_registry.get(paper_id, None) 548 | return None, None 549 | 550 | # 551 | # Install an importer for accessing Python modules inside papers 552 | # 553 | 554 | class Importer(object): 555 | 556 | def find_module(self, fullname, path=None): 557 | codelet, paper = get_codelet_and_paper() 558 | if paper is None: 559 | return None 560 | node = paper.get_local_module(fullname) 561 | if node is None: 562 | # No corresponding node found 563 | return None 564 | is_package = False 565 | if node.is_group(): 566 | # Node is a group, so this should be a package 567 | if '__init__' not in node: 568 | # Not a package 569 | return None 570 | is_package = True 571 | node = node['__init__'] 572 | if datatype(node) != "module" \ 573 | or ascii(node.attrs.get("ACTIVE_PAPER_LANGUAGE", "")) != "python": 574 | # Node found but is not a Python module 575 | return None 576 | return ModuleLoader(paper, 
fullname, node, is_package) 577 | 578 | 579 | class ModuleLoader(object): 580 | 581 | def __init__(self, paper, fullname, node, is_package): 582 | self.paper = paper 583 | self.fullname = fullname 584 | self.node = node 585 | # Python 3.4 has special treatment for loaders that 586 | # have an attribute 'is_package'. 587 | self._is_package = is_package 588 | 589 | def load_module(self, fullname): 590 | assert fullname == self.fullname 591 | if fullname in sys.modules: 592 | module = sys.modules[fullname] 593 | loader = getattr(module, '__loader__', None) 594 | if isinstance(loader, ModuleLoader): 595 | assert loader.paper is self.paper 596 | return module 597 | code = compile(ascii(self.node[...].flat[0]), 598 | ':'.join([self.paper._id(), self.node.name]), 599 | 'exec') 600 | module = imp.new_module(fullname) 601 | module.__file__ = os.path.abspath(self.node.file.filename) + ':' + \ 602 | self.node.name 603 | module.__loader__ = self 604 | if self._is_package: 605 | module.__path__ = [] 606 | module.__package__ = fullname 607 | else: 608 | module.__package__ = fullname.rpartition('.')[0] 609 | sys.modules[fullname] = module 610 | self.paper._local_modules[fullname] = module 611 | try: 612 | execcode(code, module.__dict__) 613 | except: 614 | del sys.modules[fullname] 615 | del self.paper._local_modules[fullname] 616 | raise 617 | return module 618 | 619 | sys.meta_path.insert(0, Importer()) 620 | 621 | # 622 | # Install an import hook for intercepting imports from codelets 623 | # 624 | 625 | standard__import__ = __import__ 626 | def ap__import__(*args, **kwargs): 627 | codelet, paper = get_codelet_and_paper() 628 | if codelet is not None: 629 | codelet.track_and_check_import(args[0]) 630 | return standard__import__(*args, **kwargs) 631 | activepapers.utility.ap_builtins.__import__ = ap__import__ 632 | -------------------------------------------------------------------------------- /lib/activepapers/exploration.py: 
-------------------------------------------------------------------------------- 1 | # An API for opening ActivePapers read-only for exploration of their 2 | # contents, including re-use of the code. 3 | 4 | from activepapers.storage import ActivePaper as ActivePaperStorage 5 | from activepapers.storage import open_paper_ref 6 | from activepapers.utility import path_in_section 7 | 8 | class ActivePaper(object): 9 | 10 | def __init__(self, file_or_ref, use_code=True): 11 | global _paper_for_code 12 | try: 13 | self.paper = open_paper_ref(file_or_ref) 14 | except ValueError: 15 | self.paper = ActivePaperStorage(file_or_ref, 'r') 16 | if use_code and ("python-packages" not in self.paper.code_group \ 17 | or len(self.paper.code_group["python-packages"]) == 0): 18 | # The paper contains no importable modules or packages. 19 | use_code = False 20 | if use_code and _paper_for_code is not None: 21 | raise IOError("Only one ActivePaper per process can use code.") 22 | self.data = self.paper.data 23 | self.documentation = self.paper.documentation_group 24 | self.code = self.paper.code_group 25 | try: 26 | self.__doc__ = self.open_documentation('README').read() 27 | except KeyError: 28 | pass 29 | if use_code: 30 | _paper_for_code = self.paper 31 | 32 | def close(self): 33 | global _paper_for_code 34 | if _paper_for_code is self.paper: 35 | _paper_for_code = None 36 | 37 | def _open(self, path, section, mode='r'): 38 | if mode not in ['r', 'rb']: 39 | raise ValueError("invalid mode: " + repr(mode)) 40 | path = path_in_section(path, section) 41 | if not path.startswith('/'): 42 | path = section + '/' + path 43 | return self.paper.open_internal_file(path, mode, 'utf8', None) 44 | 45 | def open(self, path, mode='r'): 46 | return self._open(path, '/data', mode) 47 | 48 | def open_documentation(self, path, mode='r'): 49 | return self._open(path, '/documentation', mode) 50 | 51 | def read_code(self, file): 52 | return self.code[file][...].ravel()[0].decode('utf-8') 53 | 54 | 
_paper_for_code = None 55 | def _get_codelet_and_paper(): 56 | return None, _paper_for_code 57 | import activepapers.execution 58 | activepapers.execution.get_codelet_and_paper = _get_codelet_and_paper 59 | del _get_codelet_and_paper 60 | -------------------------------------------------------------------------------- /lib/activepapers/library.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | from activepapers import url 4 | 5 | # 6 | # The ACTIVEPAPERS_LIBRARY environment variable follows the 7 | # same conventions as PATH under Unix. 8 | # 9 | library = os.environ.get('ACTIVEPAPERS_LIBRARY', None) 10 | if library is None: 11 | # This is Unix-only, needs a Windows equivalent 12 | home = os.environ.get('HOME', None) 13 | if home is None: 14 | library = "" 15 | else: 16 | library = os.path.join(home, '.activepapers') 17 | if not os.path.exists(library): 18 | try: 19 | os.mkdir(library) 20 | except (IOError, OSError): 21 | library = "" 22 | if not os.path.exists(library): 23 | library = "" 24 | 25 | library = library.split(':') 26 | 27 | def split_paper_ref(paper_ref): 28 | index = paper_ref.find(':') 29 | if index == -1: 30 | raise ValueError("invalid paper reference %s" % paper_ref) 31 | return paper_ref[:index].lower(), paper_ref[index+1:] 32 | 33 | 34 | # 35 | # Return the local filename for a paper reference, 36 | # after downloading the file if required. 
37 | # 38 | 39 | def _get_local_file(label): 40 | filename = label + '.ap' 41 | for dir in library: 42 | full_name = os.path.join(dir, "local", filename) 43 | if os.path.exists(full_name): 44 | return full_name 45 | raise IOError(2, "No such ActivePaper: 'local:%s' (filename: %s)" 46 | % (label, full_name)) 47 | 48 | def _get_figshare_doi(label, local_filename): 49 | figshare_url = "http://api.figshare.com/v1/articles/%s" % label 50 | try: 51 | response = url.urlopen(figshare_url) 52 | json_data = response.read().decode("utf-8") 53 | except url.HTTPError: 54 | raise ValueError("Not a figshare DOI: %s" % label) 55 | article_details = json.loads(json_data) 56 | download_url = article_details['items'][0]['files'][0]['download_url'] 57 | url.urlretrieve(download_url, local_filename) 58 | return local_filename 59 | 60 | def _get_zenodo_doi(label, local_filename): 61 | try: 62 | # Python 2 63 | from HTMLParser import HTMLParser 64 | bytes2text = lambda x: x 65 | except ImportError: 66 | # Python 3 67 | from html.parser import HTMLParser 68 | def bytes2text(b): 69 | return b.decode(encoding="utf8") 70 | class ZenodoParser(HTMLParser): 71 | def handle_starttag(self, tag, attrs): 72 | if tag == "link": 73 | attrs = dict(attrs) 74 | if attrs.get("rel") == "alternate" \ 75 | and attrs.get("type") != "application/rss+xml": 76 | self.link_href = attrs.get("href") 77 | self.link_type = attrs.get("type") 78 | 79 | zenodo_url = "http://dx.doi.org/" + label 80 | parser = ZenodoParser() 81 | source = url.urlopen(zenodo_url) 82 | try: 83 | parser.feed(bytes2text(source.read())) 84 | finally: 85 | source.close() 86 | assert parser.link_type == "application/octet-stream" 87 | download_url = parser.link_href 88 | url.urlretrieve(download_url, local_filename) 89 | return local_filename 90 | 91 | def _get_doi(label): 92 | local_filename = os.path.join(library[0], label + ".ap") 93 | if os.path.exists(local_filename): 94 | return local_filename 95 | 96 | dir_name = os.path.join(library[0], 
label.split("/")[0]) 97 | if not os.path.exists(dir_name): 98 | os.mkdir(dir_name) 99 | 100 | # There doesn't seem to be a way to download an 101 | # arbitrary digital object through its DOI. We know 102 | # how to do it for figshare and Zenodo, which are 103 | # each handled by specialized code. 104 | 105 | # Figshare 106 | if 'figshare' in label: 107 | return _get_figshare_doi(label, local_filename) 108 | # Zenodo 109 | elif 'zenodo' in label: 110 | return _get_zenodo_doi(label, local_filename) 111 | # Nothing else works for now 112 | else: 113 | raise ValueError("Unrecognized DOI: %s" % label) 114 | 115 | def _get_file_in_cwd(label): 116 | filename = label + '.ap' 117 | full_name = os.path.abspath(os.path.join(os.getcwd(), filename)) 118 | if os.path.exists(full_name): 119 | return full_name 120 | raise IOError(2, "No such ActivePaper: 'cwd:%s' (filename: %s)" 121 | % (label, full_name)) 122 | 123 | download_handlers = {'local': _get_local_file, 124 | 'doi': _get_doi, 125 | 'cwd': _get_file_in_cwd} 126 | 127 | def find_in_library(paper_ref): 128 | ref_type, label = split_paper_ref(paper_ref) 129 | handler = download_handlers.get(ref_type) 130 | assert handler is not None 131 | return handler(label) 132 | -------------------------------------------------------------------------------- /lib/activepapers/standardlib.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | if sys.version_info[0] == 2: 4 | 5 | from activepapers.standardlib2 import * 6 | 7 | else: 8 | 9 | from activepapers.standardlib3 import * 10 | 11 | del sys 12 | -------------------------------------------------------------------------------- /lib/activepapers/standardlib2.py: -------------------------------------------------------------------------------- 1 | # Python 2 standard library modules that are allowed in calclets 2 | 3 | # The following is a complete list of modules in the standard library, 4 | # obtained from an installation of 
Python 2.7.3. Only modules starting 5 | # with an underscore were removed. Forbidden modules are commented out. 6 | # The selection needs a more careful revision. 7 | 8 | allowed_modules = [ 9 | #"BaseHTTPServer", 10 | "Bastion", 11 | #"CGIHTTPServer", 12 | "ConfigParser", 13 | "Cookie", 14 | #"DocXMLRPCServer", 15 | "HTMLParser", 16 | "MimeWriter", 17 | "Queue", 18 | #"SimpleHTTPServer", 19 | #"SimpleXMLRPCServer", 20 | #"SocketServer", 21 | "StringIO", 22 | "UserDict", 23 | "UserList", 24 | "UserString", 25 | "abc", 26 | "aifc", 27 | #"antigravity", 28 | #"anydbm", 29 | #"argparse", 30 | #"ast", 31 | "asynchat", 32 | "asyncore", 33 | "atexit", 34 | #"audiodev", 35 | "base64", 36 | #"bdb", 37 | "binhex", 38 | "bisect", 39 | #"bsddb", 40 | #"cProfile", 41 | "calendar", 42 | #"cgi", 43 | #"cgitb", 44 | "chunk", 45 | "cmd", 46 | "code", 47 | "codecs", 48 | "codeop", 49 | "collections", 50 | "colorsys", 51 | "commands", 52 | "compileall", 53 | "compiler", 54 | "config", 55 | "contextlib", 56 | "cookielib", 57 | "copy", 58 | "copy_reg", 59 | "csv", 60 | #"ctypes", 61 | #"curses", 62 | #"dbhash", 63 | "decimal", 64 | "difflib", 65 | "dircache", 66 | #"dis", 67 | #"distutils", 68 | #"doctest", 69 | #"dumbdbm", 70 | #"dummy_thread", 71 | #"dummy_threading", 72 | #"email", 73 | "encodings", 74 | #"filecmp", 75 | #"fileinput", 76 | "fnmatch", 77 | "formatter", 78 | "fpformat", 79 | "fractions", 80 | #"ftplib", 81 | "functools", 82 | "genericpath", 83 | #"getopt", 84 | #"getpass", 85 | "gettext", 86 | #"glob", 87 | #"gzip", 88 | "hashlib", 89 | "heapq", 90 | "hmac", 91 | "hotshot", 92 | "htmlentitydefs", 93 | "htmllib", 94 | #"httplib", 95 | #"idlelib", 96 | "ihooks", 97 | #"imaplib", 98 | "imghdr", 99 | #"importlib", 100 | #"imputil", 101 | "inspect", 102 | "io", 103 | "json", 104 | "keyword", 105 | "lib2to3", 106 | "linecache", 107 | "locale", 108 | "logging", 109 | #"mailbox", 110 | #"mailcap", 111 | "markupbase", 112 | "md5", 113 | "mhlib", 114 | "mimetools", 115 | 
"mimetypes", 116 | "mimify", 117 | #"modulefinder", 118 | "multifile", 119 | #"multiprocessing", 120 | #"mutex", 121 | #"netrc", 122 | "new", 123 | #"nntplib", 124 | #"ntpath", 125 | #"nturl2path", 126 | "numbers", 127 | "opcode", 128 | "optparse", 129 | "os", 130 | #"os2emxpath", 131 | #"pdb.doc", 132 | #"pdb", 133 | "pickle", 134 | "pickletools", 135 | "pipes", 136 | "pkgutil", 137 | "plistlib", 138 | "popen2", 139 | "poplib", 140 | "posixfile", 141 | "posixpath", 142 | "pprint", 143 | "profile", 144 | "pstats", 145 | #"pty", 146 | "py_compile", 147 | "pyclbr", 148 | #"pydoc", 149 | #"pydoc_data", 150 | "quopri", 151 | "random", 152 | "re", 153 | "repr", 154 | "rexec", 155 | "rfc822", 156 | "rlcompleter", 157 | "robotparser", 158 | "runpy", 159 | "sched", 160 | "sets", 161 | "sgmllib", 162 | "sha", 163 | "shelve", 164 | "shlex", 165 | "shutil", 166 | #"site", 167 | #"smtpd", 168 | #"smtplib", 169 | #"sndhdr", 170 | #"socket", 171 | #"sqlite3", 172 | "sre", 173 | "sre_compile", 174 | "sre_constants", 175 | "sre_parse", 176 | #"ssl", 177 | #"stat", 178 | #"statvfs", 179 | "string", 180 | "stringold", 181 | "stringprep", 182 | "struct", 183 | #"subprocess", 184 | #"sunau", 185 | #"sunaudio", 186 | "symbol", 187 | "symtable", 188 | "sysconfig", 189 | "tabnanny", 190 | #"tarfile", 191 | #"telnetlib", 192 | "tempfile", 193 | "test", 194 | "textwrap", 195 | #"this", 196 | #"threading", 197 | "timeit", 198 | "token", 199 | "tokenize", 200 | #"trace", 201 | #"traceback", 202 | "tty", 203 | "types", 204 | #"unittest", 205 | #"urllib", 206 | #"urllib2", 207 | "urlparse", 208 | "user", 209 | "uu", 210 | #"uuid", 211 | "warnings", 212 | #"wave", 213 | "weakref", 214 | #"webbrowser", 215 | #"whichdb", 216 | "wsgiref", 217 | "xdrlib", 218 | "xml", 219 | "xmllib", 220 | "xmlrpclib", 221 | "zipfile", 222 | 223 | ## extension modules 224 | 225 | #"OSATerminology", 226 | "array", 227 | #"audioop", 228 | #"autoGIL", 229 | "binascii", 230 | #"bsddb185", 231 | "bz2", 232 | "cPickle", 
233 | "cStringIO", 234 | "cmath", 235 | "crypt", 236 | "datetime", 237 | #"dbm", 238 | #"fcntl", 239 | #"future_builtins", 240 | #"gdbm", 241 | #"gestalt", 242 | #"grp", 243 | #"icglue", 244 | "itertools", 245 | "math", 246 | #"mmap", 247 | "nis", 248 | "operator", 249 | "parser", 250 | "pyexpat", 251 | #"readline", 252 | #"resource", 253 | #"select", 254 | "strop", 255 | #"syslog", 256 | #"termios", 257 | "time", 258 | "unicodedata", 259 | "zlib", 260 | ] 261 | 262 | 263 | -------------------------------------------------------------------------------- /lib/activepapers/standardlib3.py: -------------------------------------------------------------------------------- 1 | # Python 3 standard library modules that are allowed in calclets 2 | 3 | # The following is a complete list of modules in the standard library, 4 | # obtained from an installation of Python 3.3. Only modules starting 5 | # with an underscore were removed. Forbidden modules are commented out. 6 | # The selection needs a more careful revision. 
7 | 8 | allowed_modules = [ 9 | "abc", 10 | "aifc", 11 | #"antigravity", 12 | #"argparse", 13 | "ast", 14 | #"asynchat", 15 | #"asyncore", 16 | "base64", 17 | "bdb", 18 | "binhex", 19 | "bisect", 20 | "bz2", 21 | #"cProfile", 22 | "calendar", 23 | #"cgi", 24 | #"cgitb", 25 | "chunk", 26 | "cmd", 27 | "code", 28 | "codecs", 29 | "codeop", 30 | "collections", 31 | "colorsys", 32 | "compileall", 33 | #"concurrent", 34 | "configparser", 35 | "contextlib", 36 | "copy", 37 | "copyreg", 38 | "crypt", 39 | "csv", 40 | #"ctypes", 41 | #"curses", 42 | "datetime", 43 | #"dbm", 44 | "decimal", 45 | "difflib", 46 | #"dis", 47 | #"distutils", 48 | #"doctest", 49 | #"dummy_threading", 50 | #"email", 51 | "encodings", 52 | "filecmp", 53 | "fileinput", 54 | "fnmatch", 55 | "formatter", 56 | "fractions", 57 | #"ftplib", 58 | "functools", 59 | #"genericpath", 60 | #"getopt", 61 | #"getpass", 62 | "gettext", 63 | #"glob", 64 | "gzip", 65 | "hashlib", 66 | "heapq", 67 | "hmac", 68 | "html", 69 | #"http", 70 | #"idlelib", 71 | #"imaplib", 72 | "imghdr", 73 | #"imp", 74 | #"importlib", 75 | "inspect", 76 | #"io", 77 | #"ipaddress", 78 | "json", 79 | "keyword", 80 | "lib2to3", 81 | "linecache", 82 | "locale", 83 | "logging", 84 | "lzma", 85 | #"macpath", 86 | #"macurl2path", 87 | #"mailbox", 88 | #"mailcap", 89 | "mimetypes", 90 | #"modulefinder", 91 | #"multiprocessing", 92 | #"netrc", 93 | #"nntplib", 94 | #"ntpath", 95 | #"nturl2path", 96 | "numbers", 97 | "opcode", 98 | #"optparse", 99 | "os", 100 | "os2emxpath", 101 | #"pdb", 102 | "pickle", 103 | "pickletools", 104 | "pipes", 105 | #"pkgutil", 106 | "plistlib", 107 | "poplib", 108 | "posixpath", 109 | "pprint", 110 | "profile", 111 | "pstats", 112 | "pty", 113 | "py_compile", 114 | "pyclbr", 115 | "pydoc", 116 | "pydoc_data", 117 | "queue", 118 | "quopri", 119 | "random", 120 | "re", 121 | "reprlib", 122 | "rlcompleter", 123 | "runpy", 124 | "sched", 125 | "shelve", 126 | "shlex", 127 | "shutil", 128 | #"site", 129 | #"smtpd", 130 | 
#"smtplib", 131 | #"sndhdr", 132 | #"socket", 133 | #"socketserver", 134 | #"sqlite3", 135 | "sre_compile", 136 | "sre_constants", 137 | "sre_parse", 138 | #"ssl", 139 | #"stat", 140 | "string", 141 | "stringprep", 142 | "struct", 143 | #"subprocess", 144 | #"sunau", 145 | "symbol", 146 | "symtable", 147 | "sysconfig", 148 | "tabnanny", 149 | "tarfile", 150 | #"telnetlib", 151 | #"tempfile", 152 | "test", 153 | "textwrap", 154 | #"this", 155 | #"threading", 156 | "timeit", 157 | #"tkinter", 158 | "token", 159 | "tokenize", 160 | "trace", 161 | #"traceback", 162 | "tty", 163 | "turtle", 164 | "turtledemo", 165 | "types", 166 | #"unittest", 167 | #"urllib", 168 | "uu", 169 | #"uuid", 170 | "venv", 171 | "warnings", 172 | #"wave", 173 | "weakref", 174 | #"webbrowser", 175 | #"wsgiref", 176 | "xdrlib", 177 | "xml", 178 | "xmlrpc", 179 | "zipfile", 180 | 181 | ## extension modules 182 | 183 | "array", 184 | "atexit", 185 | #"audioop", 186 | "binascii", 187 | "bz2", 188 | "cmath", 189 | "crypt", 190 | #"fcntl", 191 | #"grp", 192 | "math", 193 | #"mmap", 194 | #"nis", 195 | "parser", 196 | "pyexpat", 197 | #"readline", 198 | #"resource", 199 | #"select", 200 | #"syslog", 201 | #"termios", 202 | "time", 203 | "unicodedata", 204 | "zlib", 205 | ] 206 | -------------------------------------------------------------------------------- /lib/activepapers/storage.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import getpass 3 | import imp 4 | import importlib 5 | import io 6 | import itertools as it 7 | import os 8 | import socket 9 | import sys 10 | import weakref 11 | 12 | import numpy as np 13 | import h5py 14 | 15 | from activepapers.utility import ascii, utf8, h5vstring, isstring, execcode, \ 16 | codepath, datapath, owner, mod_time, \ 17 | datatype, timestamp, stamp, ms_since_epoch 18 | from activepapers.execution import Calclet, Importlet, DataGroup, paper_registry 19 | from activepapers.library import 
find_in_library 20 | import activepapers.version 21 | 22 | readme_text = """ 23 | This file is an ActivePaper (Python edition). 24 | 25 | For more information about ActivePapers see: 26 | 27 | http://www.activepapers.org/ 28 | """ 29 | 30 | 31 | # 32 | # The ActivePaper class is the only one in this library 33 | # meant to be used directly by client code. 34 | # 35 | 36 | class ActivePaper(object): 37 | 38 | def __init__(self, filename, mode="r", dependencies=None): 39 | self.filename = filename 40 | self.file = h5py.File(filename, mode) 41 | self.open = True 42 | self.writable = False 43 | if mode[0] == 'r': 44 | assert dependencies is None 45 | if ascii(self.file.attrs['DATA_MODEL']) != 'active-papers-py': 46 | raise ValueError("File %s is not an ActivePaper" % filename) 47 | self.code_group = self.file["code"] 48 | self.data_group = self.file["data"] 49 | self.documentation_group = self.file["documentation"] 50 | self.writable = '+' in mode 51 | self.history = self.file['history'] 52 | deps = self.file.get('external-dependencies/' 53 | 'python-packages', None) 54 | if deps is None: 55 | self.dependencies = [] 56 | else: 57 | self.dependencies = [ascii(n) for n in deps] 58 | for module_name in self.dependencies: 59 | importlib.import_module(module_name) 60 | elif mode[0] == 'w': 61 | self.file.attrs['DATA_MODEL'] = ascii('active-papers-py') 62 | self.file.attrs['DATA_MODEL_MAJOR_VERSION'] = 0 63 | self.file.attrs['DATA_MODEL_MINOR_VERSION'] = 1 64 | self.code_group = self.file.create_group("code") 65 | self.data_group = self.file.create_group("data") 66 | self.documentation_group = self.file.create_group("documentation") 67 | deps = self.file.create_group('external-dependencies') 68 | if dependencies is None: 69 | self.dependencies = [] 70 | else: 71 | for module_name in dependencies: 72 | assert isstring(module_name) 73 | importlib.import_module(module_name) 74 | self.dependencies = dependencies 75 | ds = deps.create_dataset('python-packages', 76 | dtype = 
h5vstring, 77 | shape = (len(dependencies),)) 78 | ds[:] = dependencies 79 | htype = np.dtype([('opened', np.int64), 80 | ('closed', np.int64), 81 | ('platform', h5vstring), 82 | ('hostname', h5vstring), 83 | ('username', h5vstring)] 84 | + [(name+"_version", h5vstring) 85 | for name in ['activepapers','python', 86 | 'numpy', 'h5py', 'hdf5'] 87 | + self.dependencies]) 88 | self.history = self.file.create_dataset("history", shape=(0,), 89 | dtype=htype, 90 | chunks=(1,), 91 | maxshape=(None,)) 92 | readme = self.file.create_dataset("README", 93 | dtype=h5vstring, shape = ()) 94 | readme[...] = readme_text 95 | self.writable = True 96 | 97 | if self.writable: 98 | self.update_history(close=False) 99 | 100 | import activepapers.utility 101 | self.data = DataGroup(self, None, self.data_group, ExternalCode(self)) 102 | self.imported_modules = {} 103 | 104 | self._local_modules = {} 105 | 106 | paper_registry[self._id()] = self 107 | 108 | def _id(self): 109 | return hex(id(self))[2:] 110 | 111 | def update_history(self, close): 112 | if close: 113 | entry = tuple(self.history[-1]) 114 | self.history[-1] = (entry[0], ms_since_epoch()) + entry[2:] 115 | else: 116 | self.history.resize((1+len(self.history),)) 117 | def getversion(name): 118 | if hasattr(sys.modules[name], '__version__'): 119 | return getattr(sys.modules[name], '__version__') 120 | else: 121 | return 'unknown' 122 | self.history[-1] = (ms_since_epoch(), 0, 123 | sys.platform, 124 | socket.getfqdn(), 125 | getpass.getuser(), 126 | activepapers.__version__, 127 | sys.version.split()[0], 128 | np.__version__, 129 | h5py.version.version, 130 | h5py.version.hdf5_version) \ 131 | + tuple(getversion(m) for m in self.dependencies) 132 | 133 | def close(self): 134 | if self.open: 135 | if self.writable: 136 | self.update_history(close=True) 137 | del self._local_modules 138 | self.open = False 139 | try: 140 | self.file.close() 141 | except: 142 | pass 143 | paper_id = hex(id(self))[2:] 144 | try: 145 | del 
paper_registry[paper_id] 146 | except KeyError: 147 | pass 148 | 149 | def assert_is_open(self): 150 | if not self.open: 151 | raise ValueError("ActivePaper %s has been closed" % self.filename) 152 | 153 | def __enter__(self): 154 | return self 155 | 156 | def __exit__(self, exc_type, exc_val, exc_tb): 157 | self.close() 158 | return False 159 | 160 | def flush(self): 161 | self.file.flush() 162 | 163 | def _create_ref(self, path, paper_ref, ref_path, group, prefix): 164 | if ref_path is None: 165 | ref_path = path 166 | if group is None: 167 | group = 'file' 168 | if prefix is None: 169 | prefix = '' 170 | else: 171 | prefix += '/' 172 | paper = open_paper_ref(paper_ref) 173 | # Access the item to make sure it exists 174 | item = getattr(paper, group)[ref_path] 175 | ref_dtype = np.dtype([('paper_ref', h5vstring), ('path', h5vstring)]) 176 | ds = getattr(self, group).require_dataset(path, shape=(), 177 | dtype=ref_dtype) 178 | ds[...] = (paper_ref, prefix + ref_path) 179 | stamp(ds, 'reference', {}) 180 | return ds 181 | 182 | def create_ref(self, path, paper_ref, ref_path=None): 183 | return self._create_ref(path, paper_ref, ref_path, None, None) 184 | 185 | def create_data_ref(self, path, paper_ref, ref_path=None): 186 | return self._create_ref(path, paper_ref, ref_path, 187 | 'data_group', '/data') 188 | 189 | def create_code_ref(self, path, paper_ref, ref_path=None): 190 | return self._create_ref(path, paper_ref, ref_path, 191 | 'code_group', '/code') 192 | 193 | def create_module_ref(self, path, paper_ref, ref_path=None): 194 | path = "python-packages/" + path 195 | if ref_path is not None: 196 | ref_path = "python-packages/" + ref_path 197 | return self.create_code_ref(path, paper_ref, ref_path) 198 | 199 | def create_copy(self, path, paper_ref, ref_path=None): 200 | if ref_path is None: 201 | ref_path = path 202 | paper = open_paper_ref(paper_ref) 203 | item = paper.file[ref_path] 204 | self.file.copy(item, path, expand_refs=True) 205 | copy = 
self.file[path] 206 | self._delete_dependency_attributes(copy) 207 | timestamp(copy, mod_time(item)) 208 | ref_dtype = np.dtype([('paper_ref', h5vstring), ('path', h5vstring)]) 209 | copy.attrs.create('ACTIVE_PAPER_COPIED_FROM', 210 | shape=(), dtype=ref_dtype, 211 | data=np.array((paper_ref, ref_path), dtype=ref_dtype)) 212 | return copy 213 | 214 | def _delete_dependency_attributes(self, node): 215 | for attr_name in ['ACTIVE_PAPER_GENERATING_CODELET', 216 | 'ACTIVE_PAPER_DEPENDENCIES']: 217 | if attr_name in node.attrs: 218 | del node.attrs[attr_name] 219 | if isinstance(node, h5py.Group): 220 | for item in node: 221 | self._delete_dependency_attributes(node[item]) 222 | 223 | def store_python_code(self, path, code): 224 | self.assert_is_open() 225 | if not isstring(code): 226 | raise TypeError("Python code must be a string (is %s)" 227 | % str(type(code))) 228 | ds = self.code_group.require_dataset(path, 229 | dtype=h5vstring, shape = ()) 230 | ds[...] = code.encode('utf-8') 231 | ds.attrs['ACTIVE_PAPER_LANGUAGE'] = "python" 232 | return ds 233 | 234 | def add_module(self, name, module_code): 235 | path = codepath('/'.join(['', 'python-packages'] + name.split('.'))) 236 | ds = self.store_python_code(path, module_code) 237 | stamp(ds, "module", {}) 238 | 239 | def import_module(self, name, python_path=sys.path): 240 | if name in self.imported_modules: 241 | return self.imported_modules[name] 242 | if '.' 
in name: 243 | # Submodule, add the underlying package first 244 | package, _, module = name.rpartition('.') 245 | path = [self.import_module(package, python_path)] 246 | else: 247 | module = name 248 | path = python_path 249 | file, filename, (suffix, mode, kind) = imp.find_module(module, path) 250 | if kind == imp.PKG_DIRECTORY: 251 | package = filename 252 | file = open(os.path.join(filename, '__init__.py')) 253 | name = name + '/__init__' 254 | else: 255 | package = None 256 | if file is None: 257 | raise ValueError("%s is not a Python module" % name) 258 | if kind != imp.PY_SOURCE: 259 | file.close() 260 | raise ValueError("%s is not a Python source code file" 261 | % filename) 262 | self.add_module(name, ascii(file.read())) 263 | file.close() 264 | self.imported_modules[name] = package 265 | return package 266 | 267 | def get_local_module(self, name): 268 | path = codepath('/'.join(['', 'python-packages'] + name.split('.'))) 269 | return APNode(self.code_group).get(path, None) 270 | 271 | def create_calclet(self, path, script): 272 | path = codepath(path) 273 | if not path.startswith('/'): 274 | path = '/'.join([self.code_group.name, path]) 275 | ds = self.store_python_code(path, script) 276 | stamp(ds, "calclet", {}) 277 | return Calclet(self, ds) 278 | 279 | def create_importlet(self, path, script): 280 | path = codepath(path) 281 | if not path.startswith('/'): 282 | path = '/'.join([self.code_group.name, path]) 283 | ds = self.store_python_code(path, script) 284 | stamp(ds, "importlet", {}) 285 | return Importlet(self, ds) 286 | 287 | def run_codelet(self, path, debug=False): 288 | if path.startswith('/'): 289 | assert path.startswith('/code/') 290 | path = path[6:] 291 | node = APNode(self.code_group)[path] 292 | class_ = {'calclet': Calclet, 'importlet': Importlet}[datatype(node)] 293 | try: 294 | class_(self, node).run() 295 | return None 296 | except Exception: 297 | # TODO: preprocess traceback to show only the stack frames 298 | # in the codelet. 
299 | import traceback 300 | 301 | type, value, trace = sys.exc_info() 302 | stack = traceback.extract_tb(trace) 303 | del trace 304 | 305 | while stack: 306 | if stack[0][2] == 'execcode': 307 | del stack[0] 308 | break 309 | del stack[0] 310 | 311 | fstack = [] 312 | for filename, lineno, fn_name, code in stack: 313 | if ':' in filename: 314 | paper_id, codelet = filename.split(':') 315 | paper = paper_registry.get(paper_id) 316 | if paper is None: 317 | paper_name = '' 318 | else: 319 | paper_name = '<%s>' % paper.file.filename 320 | filename = ':'.join([paper_name, codelet]) 321 | if code is None and paper is not None: 322 | script = utf8(paper.file[codelet][...].flat[0]) 323 | code = script.split('\n')[lineno-1] 324 | fstack.append((filename, lineno, fn_name, code)) 325 | 326 | tb_text = ''.join(["Traceback (most recent call last):\n"] + \ 327 | traceback.format_list(fstack) + \ 328 | traceback.format_exception_only(type, value)) 329 | if debug: 330 | sys.stderr.write(tb_text) 331 | import pdb 332 | pdb.post_mortem() 333 | else: 334 | return tb_text 335 | 336 | def calclets(self): 337 | return dict((item.name, 338 | Calclet(self, item)) 339 | for item in self.iter_items() 340 | if datatype(item) == 'calclet') 341 | 342 | def remove_owned_by(self, codelet): 343 | def owned(group): 344 | nodes = [] 345 | for node in group.values(): 346 | if owner(node) == codelet: 347 | nodes.append(node.name) 348 | elif isinstance(node, h5py.Group) \ 349 | and datatype(node) != 'data': 350 | nodes.extend(owned(node)) 351 | return nodes 352 | for group in [self.code_group, 353 | self.data_group, 354 | self.documentation_group]: 355 | for node_name in owned(group): 356 | del self.file[node_name] 357 | 358 | def replace_by_dummy(self, item_name): 359 | item = self.file[item_name] 360 | codelet = owner(item) 361 | assert codelet is not None 362 | dtype = datatype(item) 363 | mtime = mod_time(item) 364 | deps = item.attrs.get('ACTIVE_PAPER_DEPENDENCIES') 365 | del 
self.file[item_name] 366 | ds = self.file.create_dataset(item_name, 367 | data=np.zeros((), dtype=int)) 368 | stamp(ds, dtype, 369 | dict(ACTIVE_PAPER_GENERATING_CODELET=codelet, 370 | ACTIVE_PAPER_DEPENDENCIES=list(deps))) 371 | timestamp(ds, mtime) 372 | ds.attrs['ACTIVE_PAPER_DUMMY_DATASET'] = True 373 | 374 | def is_dummy(self, item): 375 | return item.attrs.get('ACTIVE_PAPER_DUMMY_DATASET', False) 376 | 377 | def iter_items(self): 378 | """ 379 | Iterate over the items in a paper. 380 | """ 381 | def walk(group): 382 | for node in group.values(): 383 | if isinstance(node, h5py.Group) \ 384 | and datatype(node) != 'data': 385 | for gnode in walk(node): 386 | yield gnode 387 | else: 388 | yield node 389 | for group in [self.code_group, 390 | self.data_group, 391 | self.documentation_group]: 392 | for node in walk(group): 393 | yield node 394 | 395 | def iter_groups(self): 396 | """ 397 | Iterate over the groups in a paper that are not items. 398 | """ 399 | def walk(group): 400 | for node in group.values(): 401 | if isinstance(node, h5py.Group) \ 402 | and datatype(node) != 'data': 403 | yield node 404 | for subnode in walk(node): 405 | yield subnode 406 | for group in [self.code_group, 407 | self.data_group, 408 | self.documentation_group]: 409 | for node in walk(group): 410 | yield node 411 | 412 | def iter_dependencies(self, item): 413 | """ 414 | Iterate over the dependencies of a given item in a paper.
415 | """ 416 | if 'ACTIVE_PAPER_DEPENDENCIES' in item.attrs: 417 | for dep in item.attrs['ACTIVE_PAPER_DEPENDENCIES']: 418 | yield self.file[dep] 419 | 420 | def is_stale(self, item): 421 | t = mod_time(item) 422 | for dep in self.iter_dependencies(item): 423 | if mod_time(dep) > t: 424 | return True 425 | return False 426 | 427 | def external_references(self): 428 | def process(node, refs): 429 | if datatype(node) == 'reference': 430 | paper_ref, ref_path = node[()] 431 | refs[paper_ref][0].add(ref_path) 432 | elif 'ACTIVE_PAPER_COPIED_FROM' in node.attrs: 433 | source = node.attrs['ACTIVE_PAPER_COPIED_FROM'] 434 | paper_ref, ref_path = source 435 | if h5py.version.version_tuple[:2] <= (2, 2): 436 | # h5py 2.2 returns a wrong dtype 437 | paper_ref = paper_ref.flat[0] 438 | ref_path = ref_path.flat[0] 439 | refs[paper_ref][1].add(ref_path) 440 | if isinstance(node, h5py.Group): 441 | for item in node: 442 | process(node[item], refs) 443 | return refs 444 | 445 | refs = collections.defaultdict(lambda: (set(), set())) 446 | for node in [self.code_group, self.data_group, 447 | self.documentation_group]: 448 | process(node, refs) 449 | return refs 450 | 451 | def has_dependencies(self, item): 452 | """ 453 | :param item: an item in a paper 454 | :type item: h5py.Node 455 | :return: True if the item has any dependencies 456 | :rtype: bool 457 | """ 458 | return 'ACTIVE_PAPER_DEPENDENCIES' in item.attrs \ 459 | and len(item.attrs['ACTIVE_PAPER_DEPENDENCIES']) > 0 460 | 461 | def dependency_graph(self): 462 | """ 463 | :return: a dictionary mapping the name of each item to the 464 | set of the names of the items that depend on it 465 | :rtype: dict 466 | """ 467 | graph = collections.defaultdict(set) 468 | for item in it.chain(self.iter_items(), self.iter_groups()): 469 | for dep in self.iter_dependencies(item): 470 | graph[dep.name].add(item.name) 471 | return graph 472 | 473 | def dependency_hierarchy(self): 474 | """ 475 | Generator yielding a sequence of sets of HDF5 
paths 476 | such that the items in each set depend only on the items 477 | in the preceding sets. 478 | """ 479 | known = set() 480 | unknown = set() 481 | for item in self.iter_items(): 482 | d = (item.name, 483 | frozenset(dep.name for dep in self.iter_dependencies(item))) 484 | if len(d[1]) > 0: 485 | unknown.add(d) 486 | else: 487 | known.add(d[0]) 488 | yield set(self.file[p] for p in known) 489 | while len(unknown) > 0: 490 | next = set(p for p, d in unknown if d <= known) 491 | if len(next) == 0: 492 | raise ValueError("cyclic dependencies") 493 | known |= next 494 | unknown = set((p, d) for p, d in unknown if p not in next) 495 | yield set(self.file[p] for p in next) 496 | 497 | def rebuild(self, filename): 498 | """ 499 | Rebuild all the dependent items in the paper in a new file. 500 | First all items without dependencies are copied to the new 501 | file, then all the calclets are run in the new file in the 502 | order determined by the dependency graph in the original file. 503 | """ 504 | deps = self.dependency_hierarchy() 505 | with ActivePaper(filename, 'w') as clone: 506 | for item in next(deps): 507 | # Make sure all the groups in the path exist 508 | path = item.name.split('/') 509 | name = path[-1] 510 | groups = path[:-1] 511 | dest = clone.file 512 | while groups: 513 | group_name = groups[0] 514 | if len(group_name) > 0: 515 | if group_name not in dest: 516 | dest.create_group(group_name) 517 | dest = dest[group_name] 518 | del groups[0] 519 | clone.file.copy(item, item.name, expand_refs=True) 520 | timestamp(clone.file[item.name]) 521 | for items in deps: 522 | calclets = set(item.attrs['ACTIVE_PAPER_GENERATING_CODELET'] 523 | for item in items) 524 | for calclet in calclets: 525 | clone.run_codelet(calclet) 526 | 527 | def snapshot(self, filename): 528 | """ 529 | Make a copy of the ActivePaper in its current state. 
530 | This is meant to be used from inside long-running 531 | codelets in order to permit external monitoring of 532 | the progress, given that HDF5 files being written cannot 533 | be read simultaneously. 534 | """ 535 | self.file.flush() 536 | clone = h5py.File(filename, 'w') 537 | for item in self.file: 538 | clone.copy(self.file[item], item, expand_refs=True) 539 | for attr_name in self.file.attrs: 540 | clone.attrs[attr_name] = self.file.attrs[attr_name] 541 | clone.close() 542 | 543 | def open_internal_file(self, path, mode='r', encoding=None, creator=None): 544 | # path is always relative to the root group 545 | if path.startswith('/'): 546 | path = path[1:] 547 | if not path.startswith('data/') \ 548 | and not path.startswith('documentation/'): 549 | raise IOError((13, "Permission denied: '%s'" % path)) 550 | if creator is None: 551 | creator = ExternalCode(self) 552 | if mode[0] in ['r', 'a']: 553 | ds = self.file[path] 554 | elif mode[0] == 'w': 555 | test = self.file.get(path, None) 556 | if test is not None: 557 | if not creator.owns(test): 558 | raise ValueError("%s trying to overwrite data" 559 | " created by %s" 560 | % (creator.path, owner(test))) 561 | del self.file[path] 562 | ds = self.file.create_dataset( 563 | path, shape = (0,), dtype = np.uint8, 564 | chunks = (100,), maxshape = (None,)) 565 | else: 566 | raise ValueError("unknown file mode %s" % mode) 567 | return InternalFile(ds, mode, encoding) 568 | 569 | 570 | # 571 | # A dummy replacement that emulates the interface of Calclet.
572 | # 573 | 574 | class ExternalCode(object): 575 | 576 | def __init__(self, paper): 577 | self.paper = paper 578 | self.path = None 579 | 580 | def add_dependency(self, dependency): 581 | pass 582 | 583 | def dependency_attributes(self): 584 | return {} 585 | 586 | def owns(self, node): 587 | # Pretend to be the owner of everything 588 | return True 589 | 590 | 591 | # 592 | # A Python file interface for byte array datasets 593 | # 594 | 595 | class InternalFile(io.IOBase): 596 | 597 | def __init__(self, ds, mode, encoding=None): 598 | self._ds = ds 599 | self._mode = mode 600 | self._encoding = encoding 601 | self._position = 0 602 | self._closed = False 603 | self._binary = 'b' in mode 604 | self._get_attributes = lambda: {} 605 | self._stamp() 606 | 607 | def readable(self): 608 | return True 609 | 610 | def writable(self): 611 | return self._mode[0] == 'w' or '+' in self._mode 612 | 613 | @property 614 | def closed(self): 615 | return self._closed 616 | 617 | @property 618 | def mode(self): 619 | return self._mode 620 | 621 | @property 622 | def name(self): 623 | return self._ds.name 624 | 625 | def _check_if_open(self): 626 | if self._closed: 627 | raise ValueError("file has been closed") 628 | 629 | def _convert(self, data): 630 | if self._binary: 631 | return data 632 | elif self._encoding is not None: 633 | return data.decode(self._encoding) 634 | else: 635 | return ascii(data) 636 | 637 | def _set_attribute_callback(self, callback): 638 | self._get_attributes = callback 639 | 640 | def _stamp(self): 641 | if self.writable(): 642 | stamp(self._ds, "file", self._get_attributes()) 643 | 644 | def close(self): 645 | self._closed = True 646 | self._stamp() 647 | 648 | def flush(self): 649 | self._check_if_open() 650 | 651 | def isatty(self): 652 | return False 653 | 654 | def __next__(self): 655 | self._check_if_open() 656 | if self._position == len(self._ds): 657 | raise StopIteration 658 | return self.readline() 659 | next = __next__ # for Python 2 660 | 
661 | def __iter__(self): 662 | return self 663 | 664 | def __enter__(self): 665 | return self 666 | 667 | def __exit__(self, exc_type, exc_val, exc_tb): 668 | self.close() 669 | return False 670 | 671 | def read(self, size=None): 672 | self._check_if_open() 673 | if size is None: 674 | size = len(self._ds)-self._position 675 | if size == 0: 676 | return '' 677 | else: 678 | new_position = self._position + size 679 | data = self._ds[self._position:new_position] 680 | self._position = new_position 681 | return self._convert(data.tostring()) 682 | 683 | def readline(self, size=None): 684 | self._check_if_open() 685 | remaining = len(self._ds) - self._position 686 | if remaining == 0: 687 | return self._convert('') 688 | for l in range(min(100, remaining), remaining+100, 100): 689 | data = self._ds[self._position:self._position+l] 690 | eols = np.nonzero(data == 10)[0] 691 | if len(eols) > 0: 692 | n = eols[0]+1 693 | self._position += n 694 | return self._convert(data[:n].tostring()) 695 | self._position = len(self._ds) 696 | return self._convert(data.tostring()) 697 | 698 | def readlines(self, sizehint=None): 699 | self._check_if_open() 700 | return list(line for line in self) 701 | 702 | def seek(self, offset, whence=os.SEEK_SET): 703 | self._check_if_open() 704 | file_length = len(self._ds) 705 | if whence == os.SEEK_SET: 706 | self._position = offset 707 | elif whence == os.SEEK_CUR: 708 | self._position += offset 709 | elif whence == os.SEEK_END: 710 | self._position = file_length + offset 711 | self._position = max(0, min(file_length, self._position)) 712 | 713 | def tell(self): 714 | self._check_if_open() 715 | return self._position 716 | 717 | def truncate(self, size=None): 718 | self._check_if_open() 719 | if size is None: 720 | size = self._position 721 | self._ds.resize((size,)) 722 | self._stamp() 723 | 724 | def write(self, string): 725 | self._check_if_open() 726 | if self._mode[0] == 'r': 727 | raise IOError("File not open for writing") 728 | if not 
string: 729 | # HDF5 crashes when trying to write a zero-length 730 | # slice, so this must be handled as a special case. 731 | return 732 | if self._encoding is not None: 733 | string = string.encode(self._encoding) 734 | new_position = self._position + len(string) 735 | if new_position > len(self._ds): 736 | self._ds.resize((new_position,)) 737 | self._ds[self._position:new_position] = \ 738 | np.fromstring(string, dtype=np.uint8) 739 | self._position = new_position 740 | self._stamp() 741 | 742 | def writelines(self, strings): 743 | self._check_if_open() 744 | for line in strings: 745 | self.write(line) 746 | 747 | 748 | # 749 | # A wrapper for nodes that works across references 750 | # 751 | 752 | class APNode(object): 753 | 754 | def __init__(self, h5node, name = None): 755 | self._h5node = h5node 756 | self.name = h5node.name if name is None else name 757 | 758 | def is_group(self): 759 | return isinstance(self._h5node, h5py.Group) 760 | 761 | def __contains__(self, item): 762 | return item in self._h5node 763 | 764 | def __getitem__(self, item): 765 | if isinstance(self._h5node, h5py.Group): 766 | path = item.split('/') 767 | if path[0] == '': 768 | node = APNode(self._h5node.file) 769 | path = path[1:] 770 | else: 771 | node = self 772 | for item in path: 773 | node = node._getitem(item) 774 | return node 775 | else: 776 | return self._h5node[item] 777 | 778 | def get(self, item, default): 779 | try: 780 | return self[item] 781 | except: 782 | return default 783 | 784 | def _getitem(self, item): 785 | node = self._h5node 786 | if datatype(node) == 'reference': 787 | _, node = dereference(node) 788 | node = node[item] 789 | if datatype(node) == 'reference': 790 | _, node = dereference(node) 791 | name = self.name 792 | if not name.endswith('/'): name += '/' 793 | name += item 794 | return APNode(node, name) 795 | 796 | def __getattr__(self, attrname): 797 | return getattr(self._h5node, attrname) 798 | 799 | def in_paper(self, paper): 800 | return 
paper.file.id == self._h5node.file.id 801 | 802 | # 803 | # A global dictionary mapping paper_refs to papers. 804 | # Each entry disappears when no reference to the paper remains. 805 | # 806 | _papers = weakref.WeakValueDictionary() 807 | 808 | # # Close all open referenced papers at interpreter exit, 809 | # # in order to prevent "murdered identifiers" in h5py. 810 | # def _cleanup(): 811 | # for paper in activepapers.storage._papers.values(): 812 | # paper.close() 813 | 814 | # import atexit 815 | # atexit.register(_cleanup) 816 | # del atexit 817 | 818 | # 819 | # Dereference a reference node 820 | # 821 | def dereference(ref_node): 822 | assert datatype(ref_node) == 'reference' 823 | paper_ref, path = ref_node[()] 824 | paper = open_paper_ref(ascii(paper_ref)) 825 | return paper, paper.file[path] 826 | 827 | # 828 | # Open a paper given its reference 829 | # 830 | def open_paper_ref(paper_ref): 831 | if paper_ref in _papers: 832 | return _papers[paper_ref] 833 | paper = ActivePaper(find_in_library(paper_ref), "r") 834 | _papers[paper_ref] = paper 835 | return paper 836 | -------------------------------------------------------------------------------- /lib/activepapers/url.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | # Python 2/3 compatibility issues 4 | if sys.version_info[0] == 2: 5 | 6 | from activepapers.url2 import * 7 | 8 | else: 9 | 10 | from activepapers.url3 import * 11 | -------------------------------------------------------------------------------- /lib/activepapers/url2.py: -------------------------------------------------------------------------------- 1 | from urllib2 import urlopen, HTTPError 2 | from urllib import urlretrieve 3 | -------------------------------------------------------------------------------- /lib/activepapers/url3.py: -------------------------------------------------------------------------------- 1 | from urllib.request import urlopen, urlretrieve, HTTPError 2 | 
-------------------------------------------------------------------------------- /lib/activepapers/utility.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import time 3 | 4 | # Python 2/3 compatibility issues 5 | if sys.version_info[0] == 2: 6 | 7 | from activepapers.utility2 import * 8 | 9 | else: 10 | 11 | from activepapers.utility3 import * 12 | 13 | # Various small functions 14 | 15 | def datatype(node): 16 | s = node.attrs.get('ACTIVE_PAPER_DATATYPE', None) 17 | if s is None: 18 | return s 19 | else: 20 | return ascii(s) 21 | 22 | def owner(node): 23 | s = node.attrs.get('ACTIVE_PAPER_GENERATING_CODELET', None) 24 | if s is None: 25 | return s 26 | else: 27 | return ascii(s) 28 | 29 | def language(node): 30 | s = node.attrs.get('ACTIVE_PAPER_LANGUAGE', None) 31 | if s is None: 32 | return s 33 | else: 34 | return ascii(s) 35 | 36 | def mod_time(node): 37 | s = node.attrs.get('ACTIVE_PAPER_TIMESTAMP', None) 38 | if s is None: 39 | return s 40 | else: 41 | return s/1000. 42 | 43 | def ms_since_epoch(): 44 | return np.int64(1000.*time.time()) 45 | 46 | def timestamp(node, time=None): 47 | if time is None: 48 | time = ms_since_epoch() 49 | else: 50 | time *= 1000. 51 | node.attrs['ACTIVE_PAPER_TIMESTAMP'] = time 52 | 53 | def stamp(node, ap_type, attributes): 54 | allowed_transformations = {'group': 'data', 55 | 'data': 'group', 56 | 'file': 'text'} 57 | attrs = dict(attributes) 58 | attrs['ACTIVE_PAPER_DATATYPE'] = ap_type 59 | for key, value in attrs.items(): 60 | if value is None: 61 | continue 62 | if isstring(value): 63 | previous = node.attrs.get(key, None) 64 | if previous is None: 65 | node.attrs[key] = value 66 | else: 67 | if previous != value: 68 | # String attributes can't change when re-stamping... 
69 | if key == 'ACTIVE_PAPER_DATATYPE' \ 70 | and allowed_transformations.get(previous) == value: 71 | # ...with a few exceptions 72 | node.attrs[key] = value 73 | else: 74 | raise ValueError("%s: %s != %s" 75 | % (key, value, previous)) 76 | elif key == 'ACTIVE_PAPER_DEPENDENCIES': 77 | node.attrs.create(key, np.array(value, dtype=object), 78 | shape = (len(value),), dtype=h5vstring) 79 | else: 80 | raise ValueError("unexpected key %s" % key) 81 | timestamp(node) 82 | 83 | def path_in_section(path, section): 84 | if not isstring(path): 85 | raise ValueError("type %s where string is expected" 86 | % str(type(path))) 87 | if path.startswith("/"): 88 | return section + path 89 | else: 90 | return path 91 | 92 | def datapath(path): 93 | return path_in_section(path, "/data") 94 | 95 | def codepath(path): 96 | return path_in_section(path, "/code") 97 | -------------------------------------------------------------------------------- /lib/activepapers/utility2.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import h5py 3 | 4 | def ascii(string): 5 | return string 6 | 7 | def utf8(string): 8 | return string.decode('utf-8') 9 | 10 | def py_str(byte_string): 11 | if isinstance(byte_string, np.ndarray): 12 | return str(byte_string) 13 | else: 14 | assert isinstance(byte_string, str) 15 | return byte_string 16 | 17 | def isstring(s): 18 | return isinstance(s, basestring) 19 | 20 | def execcode(s, globals, locals=None): 21 | if locals is None: 22 | exec s in globals 23 | else: 24 | exec s in globals, locals 25 | 26 | h5vstring = h5py.special_dtype(vlen=str) 27 | 28 | import __builtin__ as builtins 29 | import activepapers.builtins2 as ap_builtins 30 | 31 | raw_input = builtins.raw_input 32 | -------------------------------------------------------------------------------- /lib/activepapers/utility3.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import 
h5py 3 | 4 | def ascii(string): 5 | if isinstance(string, bytes): 6 | return bytes.decode(string, 'ASCII') 7 | return string 8 | 9 | def utf8(string): 10 | if isinstance(string, bytes): 11 | return bytes.decode(string, 'utf-8') 12 | return string 13 | 14 | def py_str(byte_string): 15 | if isinstance(byte_string, np.ndarray): 16 | byte_string = bytes(byte_string) 17 | assert isinstance(byte_string, bytes) 18 | return byte_string.decode('ASCII') 19 | 20 | def isstring(s): 21 | return isinstance(s, str) 22 | 23 | def execcode(code, globals, locals=None): 24 | if locals is None: 25 | exec(code, globals) 26 | else: 27 | exec(code, globals, locals) 28 | 29 | h5vstring = h5py.special_dtype(vlen=bytes) 30 | 31 | import builtins 32 | import activepapers.builtins3 as ap_builtins 33 | # Replace the "del exec" in builtins3 by something that's not a 34 | # syntax error under Python 2. 35 | del ap_builtins.__dict__['exec'] 36 | 37 | raw_input = builtins.input 38 | -------------------------------------------------------------------------------- /lib/activepapers/version.py: -------------------------------------------------------------------------------- 1 | version = '0.2.2' 2 | -------------------------------------------------------------------------------- /scripts/aptool: -------------------------------------------------------------------------------- 1 | #!python 2 | # -*- python -*- 3 | 4 | import argparse 5 | import logging 6 | import os 7 | import sys 8 | 9 | import activepapers 10 | import activepapers.cli 11 | 12 | 13 | ################################################## 14 | 15 | parser = argparse.ArgumentParser(description="Management of ActivePapers") 16 | parser.add_argument('-p', '--paper', type=str, 17 | help="name of the HDF5 file containing the ActivePaper") 18 | parser.add_argument('--log', type=str, 19 | help="logging level (default: WARNING)") 20 | parser.add_argument('--logfile', type=str, 21 | help="name of the file to which logging " 22 | "information is 
written") 23 | parser.add_argument('--version', action='version', 24 | version=activepapers.__version__) 25 | subparsers = parser.add_subparsers(help="commands") 26 | 27 | ################################################## 28 | 29 | create_parser = subparsers.add_parser('create', help="Create a new ActivePaper") 30 | create_parser.add_argument('-d', metavar='DEPENDENCY', 31 | type=str, action='append', 32 | help="Python packages that the ActivePaper " 33 | "depends on") 34 | create_parser.set_defaults(func=activepapers.cli.create) 35 | 36 | ################################################## 37 | 38 | ls_parser = subparsers.add_parser('ls', help="Show datasets") 39 | ls_parser.add_argument('--long', '-l', action='store_true', 40 | help="long format") 41 | ls_parser.add_argument('--type', '-t', 42 | help="show only items of the given type") 43 | ls_parser.add_argument('pattern', nargs='*', 44 | help="name pattern") 45 | ls_parser.set_defaults(func=activepapers.cli.ls) 46 | 47 | ################################################## 48 | 49 | rm_parser = subparsers.add_parser('rm', help="Remove datasets and " 50 | "everything depending on them") 51 | rm_parser.add_argument('--force', '-f', action='store_true', 52 | help="no confirmation prompt") 53 | rm_parser.add_argument('pattern', nargs='*', 54 | help="name pattern") 55 | rm_parser.set_defaults(func=activepapers.cli.rm) 56 | 57 | ################################################## 58 | 59 | dummy_parser = subparsers.add_parser('dummy', help="Replace datasets by " 60 | "dummies") 61 | dummy_parser.add_argument('--force', '-f', action='store_true', 62 | help="no confirmation prompt") 63 | dummy_parser.add_argument('pattern', nargs='*', 64 | help="name pattern") 65 | dummy_parser.set_defaults(func=activepapers.cli.dummy) 66 | 67 | ################################################## 68 | 69 | set_parser = subparsers.add_parser('set', help="Set dataset to the value " 70 | "of a Python expression") 71 | 
set_parser.add_argument('dataset', type=str, help="dataset name") 72 | set_parser.add_argument('expr', type=str, help="expression") 73 | set_parser.set_defaults(func=activepapers.cli.set_) 74 | 75 | ################################################## 76 | 77 | group_parser = subparsers.add_parser('group', help="Create group") 78 | group_parser.add_argument('group_name', type=str, help="group name") 79 | group_parser.set_defaults(func=activepapers.cli.group) 80 | 81 | ################################################## 82 | 83 | extract_parser = subparsers.add_parser('extract', 84 | help="Copy internal file or " 85 | "source code item to a file") 86 | extract_parser.add_argument('dataset', type=str, help="dataset name") 87 | extract_parser.add_argument('filename', type=str, 88 | help="name of file to extract to") 89 | extract_parser.set_defaults(func=activepapers.cli.extract) 90 | 91 | ################################################## 92 | 93 | calclet_parser = subparsers.add_parser('calclet', 94 | help="Store a calclet" 95 | " inside the ActivePaper") 96 | calclet_parser.add_argument('dataset', type=str, help="dataset name") 97 | calclet_parser.add_argument('filename', type=str, 98 | help="name of the Python script") 99 | calclet_parser.add_argument('--run', '-r', action='store_true', 100 | help="run the calclet") 101 | calclet_parser.set_defaults(func=activepapers.cli.calclet) 102 | 103 | ################################################## 104 | 105 | importlet_parser = subparsers.add_parser('importlet', 106 | help="Store an importlet" 107 | " inside the ActivePaper") 108 | importlet_parser.add_argument('dataset', type=str, help="dataset name") 109 | importlet_parser.add_argument('filename', type=str, 110 | help="name of the Python script") 111 | importlet_parser.add_argument('--run', '-r', action='store_true', 112 | help="run the importlet") 113 | importlet_parser.set_defaults(func=activepapers.cli.importlet) 114 | 115 | 
################################################## 116 | 117 | import_parser = subparsers.add_parser('import', 118 | help="Import a Python module" 119 | " into the ActivePaper") 120 | import_parser.add_argument('module', type=str, 121 | help="name of the Python module") 122 | import_parser.set_defaults(func=activepapers.cli.import_module) 123 | 124 | ################################################## 125 | 126 | run_parser = subparsers.add_parser('run', 127 | help="Run a calclet or importlet") 128 | run_parser.add_argument('codelet', type=str, help="codelet name") 129 | run_parser.add_argument('--debug', '-d', action='store_true', 130 | help="drop into the debugger in case of an exception") 131 | run_parser.add_argument('--profile', 132 | help="run under profiler control") 133 | run_parser.add_argument('--checkin', '-c', action='store_true', 134 | help="do 'checkin code' before running the codelet") 135 | run_parser.set_defaults(func=activepapers.cli.run) 136 | 137 | ################################################## 138 | 139 | update_parser = subparsers.add_parser('update', 140 | help="Update dummy or stale datasets " 141 | "by running the required calclets") 142 | update_parser.add_argument('--verbose', '-v', action='store_true', 143 | help="show each step being executed") 144 | update_parser.set_defaults(func=activepapers.cli.update) 145 | 146 | ################################################## 147 | 148 | checkin_parser = subparsers.add_parser('checkin', 149 | help="Update files, code, and text " 150 | "from the working directory") 151 | checkin_parser.add_argument('--type', '-t', 152 | help="ActivePapers datatype") 153 | checkin_parser.add_argument('file', nargs='*', 154 | help="filename") 155 | checkin_parser.add_argument('--force', '-f', action='store_true', 156 | help="Update even if replacement is older") 157 | checkin_parser.add_argument('--dry-run', '-n', action='store_true', 158 | help="Display actions but don't execute them") 159 | 
checkin_parser.set_defaults(func=activepapers.cli.checkin) 160 | 161 | ################################################## 162 | 163 | checkout_parser = subparsers.add_parser('checkout', 164 | help="Extract all files, code, and " 165 | "text to the working directory") 166 | checkout_parser.add_argument('--type', '-t', 167 | help="check out only items of the given type") 168 | checkout_parser.add_argument('pattern', nargs='*', 169 | help="name pattern") 170 | checkout_parser.add_argument('--dry-run', '-n', action='store_true', 171 | help="Display actions but don't execute them") 172 | checkout_parser.set_defaults(func=activepapers.cli.checkout) 173 | 174 | ################################################## 175 | 176 | ln_parser = subparsers.add_parser('ln', 177 | help="Create a link to another ActivePaper") 178 | ln_parser.add_argument('reference', type=str, help="reference to a dataset " 179 | "in another ActivePaper") 180 | ln_parser.add_argument('name', type=str, help="name of the link") 181 | ln_parser.set_defaults(func=activepapers.cli.ln) 182 | 183 | ################################################## 184 | 185 | cp_parser = subparsers.add_parser('cp', 186 | help="Copy a dataset or group from " 187 | "another ActivePaper") 188 | cp_parser.add_argument('reference', type=str, help="reference to a dataset " 189 | "in another ActivePaper") 190 | cp_parser.add_argument('name', type=str, help="name of the copy") 191 | cp_parser.set_defaults(func=activepapers.cli.cp) 192 | 193 | ################################################## 194 | 195 | refs_parser = subparsers.add_parser('refs', 196 | help="Show references to other ActivePapers") 197 | refs_parser.add_argument('--verbose', '-v', action='store_true', 198 | help="Display referenced items") 199 | refs_parser.set_defaults(func=activepapers.cli.refs) 200 | 201 | ################################################## 202 | 203 | edit_parser = subparsers.add_parser('edit', 204 | help="Edit an extractable dataset") 205 | 
edit_parser.add_argument('dataset', type=str, help="dataset name") 206 | edit_parser.set_defaults(func=activepapers.cli.edit) 207 | 208 | ################################################## 209 | 210 | console_parser = subparsers.add_parser('console', 211 | help="Run a Python interactive console" 212 | " inside the ActivePaper") 213 | console_parser.add_argument('--modify', '-m', action='store_true', 214 | help="Permit modifications (use with care)") 215 | console_parser.set_defaults(func=activepapers.cli.console) 216 | 217 | ################################################## 218 | 219 | ipython_parser = subparsers.add_parser('ipython', 220 | help="Run an IPython shell" 221 | " inside the ActivePaper") 222 | ipython_parser.add_argument('--modify', '-m', action='store_true', 223 | help="Permit modifications (use with care)") 224 | ipython_parser.set_defaults(func=activepapers.cli.ipython) 225 | 226 | ################################################## 227 | 228 | def setup_logging(log, logfile): 229 | if log is None: 230 | log = "WARNING" 231 | if log not in ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]: 232 | sys.exit("invalid logging level %s" % log) 233 | opts = dict(level=getattr(logging, log), 234 | format="%(asctime)s %(levelname)s: %(message)s", 235 | datefmt="%Y-%m-%d/%H:%M:%S") 236 | if logfile is not None: 237 | opts["filename"] = logfile 238 | opts["filemode"] = "a" 239 | logging.basicConfig(**opts) 240 | 241 | ################################################## 242 | 243 | parsed_args = parser.parse_args() 244 | try: 245 | func = parsed_args.func 246 | except AttributeError: 247 | func = None 248 | args = dict(parsed_args.__dict__) 249 | setup_logging(args['log'], args['logfile']) 250 | try: 251 | del args['func'] 252 | except KeyError: 253 | pass 254 | del args['log'] 255 | del args['logfile'] 256 | try: 257 | if func is not None: 258 | func(**args) 259 | except activepapers.cli.CLIExit: 260 | pass 261 | finally: 262 | logging.shutdown() 263 
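Note on the aptool dispatch: each subcommand stores its handler with `set_defaults(func=...)`, and the main block strips the global options before calling `func(**args)`. The sketch below reproduces that pattern in isolation, under the assumption that the log level can be validated by argparse's `choices` instead of by hand. `cmd_ls`, `build_parser`, and `dispatch` are hypothetical names, not part of `activepapers.cli`.

```python
import argparse
import logging


def cmd_ls(pattern, long):
    """Hypothetical handler; returns a string instead of listing datasets."""
    return "ls %s long=%s" % (pattern, long)


def build_parser():
    parser = argparse.ArgumentParser(description="aptool-style dispatch sketch")
    # choices makes argparse reject unknown levels before any lookup happens.
    parser.add_argument('--log', default="WARNING",
                        choices=["DEBUG", "INFO", "WARNING",
                                 "ERROR", "CRITICAL"],
                        help="logging level (default: WARNING)")
    subparsers = parser.add_subparsers(help="commands")
    ls_parser = subparsers.add_parser('ls', help="Show datasets")
    ls_parser.add_argument('--long', '-l', action='store_true',
                           help="long format")
    ls_parser.add_argument('pattern', nargs='*', help="name pattern")
    ls_parser.set_defaults(func=cmd_ls)
    return parser


def dispatch(argv):
    """Parse argv, then return (numeric log level, handler result or None)."""
    args = dict(vars(build_parser().parse_args(argv)))
    level = getattr(logging, args.pop('log'))
    func = args.pop('func', None)   # absent when no subcommand was given
    if func is None:
        return level, None
    return level, func(**args)


level, result = dispatch(['--log', 'INFO', 'ls', '-l', 'data/*'])
```

Because every remaining namespace entry is passed as a keyword argument, a handler's parameter names have to match the `dest` names of its subparser's arguments exactly.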
| 264 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup, Command 4 | import copy 5 | import os 6 | import sys 7 | 8 | package_dir = "lib" 9 | script_dir = "scripts" 10 | 11 | 12 | with open('README.md') as file: 13 | long_description = file.read() 14 | long_description = long_description[:long_description.find("\n\n")] 15 | 16 | class Dummy: 17 | pass 18 | version = Dummy() 19 | exec(open('lib/activepapers/version.py').read(), version.__dict__) 20 | 21 | setup(name='ActivePapers.Py', 22 | version=version.version, 23 | description='Executable papers containing Python code', 24 | long_description=long_description, 25 | author='Konrad Hinsen', 26 | author_email='research@khinsen.fastmail.net', 27 | url='http://github.com/activepapers/activepapers-python', 28 | license='BSD', 29 | package_dir = {'': package_dir}, 30 | packages=['activepapers'], 31 | scripts=[os.path.join(script_dir, s) for s in os.listdir(script_dir)], 32 | platforms=['any'], 33 | install_requires=[ 34 | "numpy>=1.6", 35 | "h5py>=2.2", 36 | "tempdir>=0.6" 37 | ], 38 | provides=["ActivePapers"], 39 | classifiers=[ 40 | "Development Status :: 3 - Alpha", 41 | "Intended Audience :: Science/Research", 42 | "License :: OSI Approved :: BSD License", 43 | "Operating System :: OS Independent", 44 | "Programming Language :: Python :: 2.7", 45 | "Programming Language :: Python :: 3.4", 46 | "Programming Language :: Python :: 3.5", 47 | "Programming Language :: Python :: 3.6", 48 | "Topic :: Scientific/Engineering", 49 | ] 50 | ) 51 | -------------------------------------------------------------------------------- /tests/foo/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = 42 2 | -------------------------------------------------------------------------------- 
/tests/foo/bar.py: -------------------------------------------------------------------------------- 1 | def frobnicate(x): 2 | return str(x) 3 | -------------------------------------------------------------------------------- /tests/run_all_tests.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | for test in test*.py 4 | do 5 | $1 $test 6 | done 7 | -------------------------------------------------------------------------------- /tests/test_basics.py: -------------------------------------------------------------------------------- 1 | # Extensive tests on a very simple ActivePaper 2 | 3 | import collections 4 | import os 5 | import numpy as np 6 | import h5py 7 | import tempdir 8 | from activepapers.storage import ActivePaper 9 | from activepapers.utility import ascii 10 | 11 | 12 | def make_simple_paper(filename): 13 | 14 | paper = ActivePaper(filename, "w") 15 | 16 | #paper.data.create_dataset("frequency", data=0.2) 17 | #paper.data.create_dataset("time", data=0.1*np.arange(100)) 18 | 19 | init = paper.create_importlet("initialize", 20 | """ 21 | from activepapers.contents import data 22 | import numpy as np 23 | 24 | data['frequency'] = 0.2 25 | data['time'] = 0.1*np.arange(100) 26 | """) 27 | init.run() 28 | 29 | calc_sine = paper.create_calclet("calc_sine", 30 | """ 31 | from activepapers.contents import data 32 | import numpy as np 33 | 34 | frequency = data['frequency'][...] 35 | time = data['time'][...] 
36 | data.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 37 | """) 38 | calc_sine.run() 39 | 40 | paper.close() 41 | 42 | 43 | def make_paper_with_internal_module(filename): 44 | 45 | paper = ActivePaper(filename, "w") 46 | 47 | paper.add_module("my_math", 48 | """ 49 | import numpy as np 50 | 51 | def my_func(x): 52 | return np.sin(x) 53 | """) 54 | 55 | paper.data.create_dataset("frequency", data=0.2) 56 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 57 | 58 | calc_sine = paper.create_calclet("calc_sine", 59 | """ 60 | from activepapers.contents import data 61 | import numpy as np 62 | from my_math import my_func 63 | 64 | frequency = data['frequency'][...] 65 | time = data['time'][...] 66 | data.create_dataset("sine", data=my_func(2.*np.pi*frequency*time)) 67 | """) 68 | calc_sine.run() 69 | 70 | paper.close() 71 | 72 | 73 | def assert_almost_equal(x, y, tolerance): 74 | assert (np.fabs(np.array(x)-np.array(y)) < tolerance).all() 75 | 76 | 77 | def assert_valid_paper(h5file): 78 | assert h5file.attrs['DATA_MODEL'] == ascii('active-papers-py') 79 | assert h5file.attrs['DATA_MODEL_MAJOR_VERSION'] == 0 80 | assert h5file.attrs['DATA_MODEL_MINOR_VERSION'] == 1 81 | 82 | for group in ['code', 'data', 'documentation']: 83 | assert group in h5file 84 | assert isinstance(h5file[group], h5py.Group) 85 | 86 | history = h5file['history'] 87 | assert history.shape == (1,) 88 | opened = history[0]['opened'] 89 | closed = history[0]['closed'] 90 | def check_timestamp(name, node): 91 | t = node.attrs.get('ACTIVE_PAPER_TIMESTAMP', None) 92 | if t is not None: 93 | assert t >= opened 94 | assert t <= closed 95 | h5file.visititems(check_timestamp) 96 | 97 | 98 | def check_hdf5_file(filename, ref_all_paths, ref_deps): 99 | h5file = h5py.File(filename, "r") 100 | all_paths = [] 101 | h5file.visit(all_paths.append) 102 | all_paths.sort() 103 | assert all_paths == ref_all_paths 104 | assert_valid_paper(h5file) 105 | 
assert_almost_equal(h5file["data/frequency"][...], 0.2, 1.e-15) 106 | assert_almost_equal(h5file["data/time"][...], 107 | 0.1*np.arange(100), 108 | 1.e-15) 109 | assert_almost_equal(h5file["data/sine"][...], 110 | np.sin(0.04*np.pi*np.arange(100)), 111 | 1.e-10) 112 | for path in ['data/frequency', 'data/sine', 'data/time']: 113 | assert h5file[path].attrs['ACTIVE_PAPER_DATATYPE'] == "data" 114 | assert h5file[path].attrs['ACTIVE_PAPER_TIMESTAMP'] > 1.e9 115 | for path in ['code/calc_sine']: 116 | assert h5file[path].attrs['ACTIVE_PAPER_DATATYPE'] == "calclet" 117 | deps = h5file["data/sine"].attrs['ACTIVE_PAPER_DEPENDENCIES'] 118 | assert list(ascii(p) for p in deps) \ 119 | == [ascii(p) for p in ref_deps] 120 | assert h5file["data/sine"].attrs['ACTIVE_PAPER_GENERATING_CODELET'] \ 121 | == "/code/calc_sine" 122 | h5file.close() 123 | 124 | 125 | def check_paper(filename, ref_items, ref_deps, ref_hierarchy): 126 | paper = ActivePaper(filename, "r") 127 | items = sorted([item.name for item in paper.iter_items()]) 128 | assert items == ref_items 129 | items_with_deps = sorted([item.name for item in paper.iter_items() 130 | if paper.has_dependencies(item)]) 131 | assert items_with_deps == ['/data/sine'] 132 | deps = dict((ascii(item.name), 133 | sorted(list(ascii(dep.name) 134 | for dep in paper.iter_dependencies(item)))) 135 | for item in paper.iter_items()) 136 | assert deps == ref_deps 137 | graph = collections.defaultdict(set) 138 | for item, deps in ref_deps.items(): 139 | for d in deps: 140 | graph[d].add(item) 141 | assert graph == paper.dependency_graph() 142 | hierarchy = [sorted([ascii(item.name) for item in items]) 143 | for items in paper.dependency_hierarchy()] 144 | assert hierarchy == ref_hierarchy 145 | calclets = paper.calclets() 146 | assert len(calclets) == 1 147 | assert ascii(calclets['/code/calc_sine'].path) == '/code/calc_sine' 148 | paper.close() 149 | 150 | 151 | def test_simple_paper(): 152 | with tempdir.TempDir() as t: 153 | filename1 = 
os.path.join(t, "simple1.ap") 154 | filename2 = os.path.join(t, "simple2.ap") 155 | make_simple_paper(filename1) 156 | all_paths = ['README', 'code', 'code/calc_sine', 'code/initialize', 157 | 'data', 'data/frequency', 'data/sine', 'data/time', 158 | 'documentation', 'external-dependencies', 'history'] 159 | all_items = ['/code/calc_sine', '/code/initialize', '/data/frequency', 160 | '/data/sine', '/data/time'] 161 | all_deps = {'/data/sine': ["/code/calc_sine", 162 | "/data/frequency", 163 | "/data/time"], 164 | '/data/time': [], 165 | '/data/frequency': [], 166 | '/code/calc_sine': [], 167 | '/code/initialize': []} 168 | sine_deps = ["/code/calc_sine", 169 | "/data/frequency", 170 | "/data/time"] 171 | hierarchy = [['/code/calc_sine', '/code/initialize', 172 | '/data/frequency', '/data/time'], 173 | ['/data/sine']] 174 | check_hdf5_file(filename1, all_paths, sine_deps) 175 | check_paper(filename1, all_items, all_deps, hierarchy) 176 | with ActivePaper(filename1, "r") as paper: 177 | paper.rebuild(filename2) 178 | check_hdf5_file(filename2, all_paths, sine_deps) 179 | check_paper(filename2, all_items, all_deps, hierarchy) 180 | 181 | def test_paper_with_internal_module(): 182 | with tempdir.TempDir() as t: 183 | filename1 = os.path.join(t, "im1.ap") 184 | filename2 = os.path.join(t, "im2.ap") 185 | make_paper_with_internal_module(filename1) 186 | all_paths = ['README', 'code', 'code/calc_sine', 187 | 'code/python-packages', 'code/python-packages/my_math', 188 | 'data', 'data/frequency', 'data/sine', 'data/time', 189 | 'documentation', 'external-dependencies', 'history'] 190 | all_items = ['/code/calc_sine', '/code/python-packages/my_math', 191 | '/data/frequency', '/data/sine', '/data/time'] 192 | all_deps = {'/data/sine': ["/code/calc_sine", 193 | "/code/python-packages/my_math", 194 | "/data/frequency", 195 | "/data/time"], 196 | '/data/time': [], 197 | '/data/frequency': [], 198 | '/code/calc_sine': [], 199 | '/code/python-packages/my_math': []} 200 | sine_deps 
= ["/code/calc_sine", 201 | "/code/python-packages/my_math", 202 | "/data/frequency", 203 | "/data/time"] 204 | hierarchy = [['/code/calc_sine', '/code/python-packages/my_math', 205 | '/data/frequency', '/data/time'], 206 | ['/data/sine']] 207 | check_hdf5_file(filename1, all_paths, sine_deps) 208 | check_paper(filename1, all_items, all_deps, hierarchy) 209 | with ActivePaper(filename1, "r") as paper: 210 | paper.rebuild(filename2) 211 | check_hdf5_file(filename2, all_paths, sine_deps) 212 | check_paper(filename2, all_items, all_deps, hierarchy) 213 | -------------------------------------------------------------------------------- /tests/test_exploration.py: -------------------------------------------------------------------------------- 1 | # Test the exploration module 2 | 3 | import os 4 | import numpy as np 5 | import tempdir 6 | from activepapers.storage import ActivePaper 7 | from activepapers import library 8 | from activepapers.exploration import ActivePaper as ActivePaperExploration 9 | 10 | def make_local_paper(filename): 11 | 12 | paper = ActivePaper(filename, "w") 13 | 14 | paper.data.create_dataset("frequency", data=0.2) 15 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 16 | 17 | paper.add_module("my_math", 18 | """ 19 | import numpy as np 20 | 21 | def my_func(x): 22 | return np.sin(x) 23 | """) 24 | 25 | paper.close() 26 | 27 | def check_local_paper(filename): 28 | ap = ActivePaperExploration(filename) 29 | from my_math import my_func 30 | frequency = ap.data['frequency'][...] 31 | time = ap.data['time'][...] 
32 | sine = my_func(2.*np.pi*frequency*time) 33 | assert (sine == np.sin(2.*np.pi*frequency*time)).all() 34 | ap.close() 35 | 36 | def test_local_paper(): 37 | with tempdir.TempDir() as t: 38 | filename = os.path.join(t, "test.ap") 39 | 40 | make_local_paper(filename) 41 | check_local_paper(filename) 42 | 43 | if "NO_NETWORK_ACCESS" not in os.environ: 44 | def test_published_paper(): 45 | with tempdir.TempDir() as t: 46 | library.library = [t] 47 | ap = ActivePaperExploration("doi:10.6084/m9.figshare.808595") 48 | import time_series 49 | ts = np.arange(10) 50 | assert time_series.integral(ts, 1)[-1] == 40.5 51 | ap.close() 52 | -------------------------------------------------------------------------------- /tests/test_features.py: -------------------------------------------------------------------------------- 1 | # Test specific features of ActivePapers 2 | # coding: utf-8 3 | 4 | import os 5 | import numpy as np 6 | import h5py 7 | import tempdir 8 | from nose.tools import raises 9 | from activepapers.storage import ActivePaper 10 | from activepapers.utility import ascii 11 | 12 | def test_groups_as_items(): 13 | with tempdir.TempDir() as t: 14 | filename = os.path.join(t, "paper.ap") 15 | paper = ActivePaper(filename, 'w') 16 | group1 = paper.data.create_group('group1') 17 | group1.create_dataset('value', data=42) 18 | group2 = paper.data.create_group('group2') 19 | group2.mark_as_data_item() 20 | group2.create_dataset('array', data=np.arange(10)) 21 | items = sorted([item.name for item in paper.iter_items()]) 22 | assert items == ['/data/group1/value', '/data/group2'] 23 | groups = sorted([group.name for group in paper.iter_groups()]) 24 | assert groups == ['/data/group1'] 25 | script = paper.create_calclet("script1", 26 | """ 27 | from activepapers.contents import data 28 | x1 = data['group2']['array'][...] 29 | x2 = data['group1']['value'][...] 30 | data.create_dataset('sum1', data=x1+x2) 31 | """) 32 | script.run() 33 | assert (paper.data['sum1'][...] 
== np.arange(42,52)).all() 34 | script = paper.create_calclet("script2", 35 | """ 36 | from activepapers.contents import data 37 | x1 = data['/group2/array'][...] 38 | g = data['group1'] 39 | x2 = g['/group1/value'][...] 40 | data.create_dataset('sum2', data=x1+x2) 41 | """) 42 | script.run() 43 | assert (paper.data['sum2'][...] == np.arange(42,52)).all() 44 | deps = [sorted([ascii(item.name) for item in level]) 45 | for level in paper.dependency_hierarchy()] 46 | assert deps == [['/code/script1', '/code/script2', 47 | '/data/group1/value', '/data/group2'], 48 | ['/data/sum1', '/data/sum2']] 49 | deps = paper.data['sum1']._node.attrs['ACTIVE_PAPER_DEPENDENCIES'] 50 | deps = sorted(ascii(d) for d in deps) 51 | assert deps == ['/code/script1', 52 | '/data/group1/value', '/data/group2'] 53 | deps = paper.data['sum2']._node.attrs['ACTIVE_PAPER_DEPENDENCIES'] 54 | deps = sorted(ascii(d) for d in deps) 55 | assert deps == ['/code/script2', 56 | '/data/group1/value', '/data/group2'] 57 | paper.close() 58 | 59 | def test_groups(): 60 | with tempdir.TempDir() as t: 61 | filename = os.path.join(t, "paper.ap") 62 | paper = ActivePaper(filename, 'w') 63 | group = paper.data.create_group('group') 64 | subgroup = group.create_group('subgroup') 65 | group['data1'] = np.arange(10) 66 | group['data2'] = 42 67 | assert sorted([g.name for g in paper.iter_groups()]) \ 68 | == ['/data/group', '/data/group/subgroup'] 69 | assert sorted(list(node for node in group)) \ 70 | == ['data1', 'data2', 'subgroup'] 71 | assert group['data1'][...].shape == (10,) 72 | assert group['data2'][...] 
== 42 73 | assert paper.data.parent is paper.data 74 | assert group.parent is paper.data 75 | assert group['data1'].parent is group 76 | assert group['data2'].parent is group 77 | script = paper.create_calclet("script", 78 | """ 79 | from activepapers.contents import data 80 | assert data.parent is data 81 | assert data._codelet is not None 82 | assert data._codelet.path == '/code/script' 83 | group = data['group'] 84 | assert group.parent is data 85 | assert group._codelet is not None 86 | assert group._codelet.path == '/code/script' 87 | """) 88 | script.run() 89 | paper.close() 90 | 91 | def test_datasets(): 92 | with tempdir.TempDir() as t: 93 | filename = os.path.join(t, "paper.ap") 94 | paper = ActivePaper(filename, 'w') 95 | dset = paper.data.create_dataset("MyDataset", (10,10,10), 'f') 96 | assert len(dset) == 10 97 | assert dset[0, 0, 0].shape == () 98 | assert dset[0, 2:10, 1:9:3].shape == (8, 3) 99 | assert dset[:, ::2, 5].shape == (10, 5) 100 | assert dset[0].shape == (10, 10) 101 | assert dset[1, 5].shape == (10,) 102 | assert dset[0, ...].shape == (10, 10) 103 | assert dset[..., 6].shape == (10, 10) 104 | array = np.arange(100) 105 | dset = paper.data.create_dataset("MyArray", data=array) 106 | assert len(dset) == 100 107 | assert (dset[array > 50] == np.arange(51, 100)).all() 108 | dset[:20] = 42 109 | assert (dset[...] 
== np.array(20*[42]+list(range(20, 100)))).all() 110 | paper.data['a_number'] = 42 111 | assert paper.data['a_number'][()] == 42 112 | paper.close() 113 | 114 | def test_attrs(): 115 | with tempdir.TempDir() as t: 116 | filename = os.path.join(t, "paper.ap") 117 | paper = ActivePaper(filename, 'w') 118 | group = paper.data.create_group('group') 119 | ds = group.create_dataset('value', data=42) 120 | group.mark_as_data_item() 121 | assert len(group.attrs) == 0 122 | group.attrs['foo'] = 'bar' 123 | assert len(group.attrs) == 1 124 | assert list(group.attrs) == ['foo'] 125 | assert group.attrs['foo'] == 'bar' 126 | assert len(ds.attrs) == 0 127 | ds.attrs['foo'] = 'bar' 128 | assert len(ds.attrs) == 1 129 | assert list(ds.attrs) == ['foo'] 130 | assert ds.attrs['foo'] == 'bar' 131 | paper.close() 132 | 133 | def test_dependencies(): 134 | with tempdir.TempDir() as t: 135 | filename = os.path.join(t, "paper.ap") 136 | paper = ActivePaper(filename, 'w') 137 | paper.data.create_dataset('e', data = np.e) 138 | paper.data.create_dataset('pi', data = np.pi) 139 | script = paper.create_calclet("script", 140 | """ 141 | from activepapers.contents import data 142 | import numpy as np 143 | e = data['e'][...] 144 | sum = data.create_dataset('sum', shape=(1,), dtype=float) 145 | pi = data['pi'][...] 146 | sum[0] = e+pi 147 | """) 148 | script.run() 149 | deps = [ascii(item.name) 150 | for item in paper.iter_dependencies(paper.data['sum']._node)] 151 | assert sorted(deps) == ['/code/script', '/data/e', '/data/pi'] 152 | assert not paper.is_stale(paper.data['sum']._node) 153 | del paper.data['e'] 154 | paper.data['e'] = 0. 
155 | assert paper.is_stale(paper.data['sum']._node) 156 | paper.close() 157 | 158 | def test_internal_files(): 159 | with tempdir.TempDir() as t: 160 | filename = os.path.join(t, "paper.ap") 161 | paper = ActivePaper(filename, 'w') 162 | script = paper.create_calclet("write1", 163 | """ 164 | from activepapers.contents import open 165 | 166 | f = open('numbers1', 'w') 167 | for i in range(10): 168 | f.write(str(i)+'\\n') 169 | f.close() 170 | """) 171 | script.run() 172 | script = paper.create_calclet("write2", 173 | """ 174 | from activepapers.contents import open 175 | 176 | with open('numbers', 'w') as f: 177 | for i in range(10): 178 | f.write(str(i)+'\\n') 179 | """) 180 | script.run() 181 | script = paper.create_calclet("write3", 182 | """ 183 | from activepapers.contents import open 184 | 185 | with open('empty', 'w') as f: 186 | pass 187 | """) 188 | script.run() 189 | script = paper.create_calclet("write4", 190 | u""" 191 | from activepapers.contents import open 192 | 193 | with open('utf8', 'w', encoding='utf-8') as f: 194 | f.write(u'déjà') 195 | """) 196 | script.run() 197 | script = paper.create_calclet("read1", 198 | """ 199 | from activepapers.contents import open 200 | 201 | f = open('numbers') 202 | for i in range(10): 203 | assert f.readline().strip() == str(i) 204 | f.close() 205 | """) 206 | script.run() 207 | script = paper.create_calclet("read2", 208 | """ 209 | from activepapers.contents import open 210 | 211 | f = open('numbers') 212 | data = [int(line.strip()) for line in f] 213 | f.close() 214 | assert data == list(range(10)) 215 | """) 216 | script.run() 217 | script = paper.create_calclet("read3", 218 | """ 219 | from activepapers.contents import open 220 | 221 | f = open('empty') 222 | data = f.read() 223 | f.close() 224 | assert len(data) == 0 225 | """) 226 | script.run() 227 | script = paper.create_calclet("read4", 228 | u""" 229 | from activepapers.contents import open 230 | 231 | f = open('utf8', encoding='utf-8') 232 | data = 
f.read() 233 | f.close() 234 | assert data == u'déjà' 235 | """) 236 | script.run() 237 | script = paper.create_calclet("convert_to_binary", 238 | """ 239 | from activepapers.contents import open 240 | import struct 241 | 242 | with open('numbers') as f: 243 | data = [int(line.strip()) for line in f] 244 | f = open('binary_numbers', 'wb') 245 | f.write(struct.pack(len(data)*'h', *data)) 246 | f.close() 247 | """) 248 | script.run() 249 | script = paper.create_calclet("read_binary", 250 | """ 251 | from activepapers.contents import open 252 | import struct 253 | 254 | f = open('binary_numbers', 'rb') 255 | assert struct.unpack(10*'h', f.read()) == tuple(range(10)) 256 | f.close() 257 | """) 258 | script.run() 259 | script = paper.create_calclet("write_documentation", 260 | """ 261 | from activepapers.contents import open_documentation 262 | 263 | with open_documentation('hello.txt', 'w') as f: 264 | f.write('Hello world!\\n') 265 | """) 266 | script.run() 267 | h = [sorted(list(ascii(item.name) for item in step)) 268 | for step in paper.dependency_hierarchy()] 269 | print(h) 270 | assert h == [['/code/convert_to_binary', 271 | '/code/read1', '/code/read2', '/code/read3', 272 | '/code/read4', '/code/read_binary', 273 | '/code/write1', '/code/write2', '/code/write3', 274 | '/code/write4', '/code/write_documentation'], 275 | ['/data/empty', '/data/numbers', '/data/numbers1', 276 | '/data/utf8', '/documentation/hello.txt'], 277 | ['/data/binary_numbers']] 278 | paper.close() 279 | 280 | @raises(ValueError) 281 | def test_overwrite_internal_file(): 282 | with tempdir.TempDir() as t: 283 | filename = os.path.join(t, "paper.ap") 284 | paper = ActivePaper(filename, 'w') 285 | script = paper.create_calclet("write1", 286 | """ 287 | from activepapers.contents import open 288 | f = open('numbers', 'w') 289 | for i in range(10): 290 | f.write(str(i)+'\\n') 291 | f.close() 292 | """) 293 | script.run() 294 | script = paper.create_calclet("write2", 295 | """ 296 | from 
activepapers.contents import open 297 | 298 | with open('numbers', 'w') as f: 299 | for i in range(10): 300 | f.write(str(i)+'\\n') 301 | """) 302 | script.run() 303 | paper.close() 304 | 305 | @raises(ImportError) 306 | def test_import_forbidden(): 307 | # distutils is a forbidden module from the standard library 308 | with tempdir.TempDir() as t: 309 | filename = os.path.join(t, "paper.ap") 310 | paper = ActivePaper(filename, "w") 311 | script = paper.create_calclet("script", 312 | """ 313 | import distutils 314 | """) 315 | script.run() 316 | paper.close() 317 | 318 | def test_snapshots(): 319 | with tempdir.TempDir() as t: 320 | filename = os.path.join(t, "paper.ap") 321 | snapshot_1 = os.path.join(t, "snapshot_1.ap") 322 | snapshot_2 = os.path.join(t, "snapshot_2.ap") 323 | paper = ActivePaper(filename, 'w') 324 | paper.data.create_dataset("frequency", data = 0.2) 325 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 326 | calc_angular = paper.create_calclet("calc_angular", 327 | """ 328 | from activepapers.contents import data, snapshot 329 | import numpy as np 330 | 331 | frequency = data['frequency'][...] 332 | time = data['time'][...] 
333 | angular = data.create_group('angular') 334 | angular.attrs['time'] = data['time'].ref 335 | angular.create_dataset("time", data=data['time'].ref) 336 | angular.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 337 | snapshot('%s') 338 | angular.create_dataset("cosine", data=np.cos(2.*np.pi*frequency*time)) 339 | snapshot('%s') 340 | angular.create_dataset("tangent", data=np.tan(2.*np.pi*frequency*time)) 341 | """ % (snapshot_1, snapshot_2)) 342 | calc_angular.run() 343 | paper.close() 344 | # Open the snapshot files to verify they are valid ActivePapers 345 | ActivePaper(snapshot_1, 'r').close() 346 | ActivePaper(snapshot_2, 'r').close() 347 | # Check the contents 348 | paper = h5py.File(filename, 'r') 349 | snapshot_1 = h5py.File(snapshot_1, 'r') 350 | snapshot_2 = h5py.File(snapshot_2, 'r') 351 | for item in ['/data/time', '/data/frequency', '/data/angular/sine', 352 | '/code/calc_angular']: 353 | assert item in paper 354 | assert item in snapshot_1 355 | assert item in snapshot_2 356 | assert '/data/angular/cosine' in paper 357 | assert '/data/angular/cosine' not in snapshot_1 358 | assert '/data/angular/cosine' in snapshot_2 359 | assert '/data/angular/tangent' in paper 360 | assert '/data/angular/tangent' not in snapshot_1 361 | assert '/data/angular/tangent' not in snapshot_2 362 | for root in [snapshot_1, snapshot_2]: 363 | #time_ref = root['/data/angular/time'][()] 364 | #assert root[time_ref].name == '/data/time' 365 | time_ref = root['/data/angular'].attrs['time'] 366 | assert root[time_ref].name == '/data/time' 367 | 368 | def test_modified_scripts(): 369 | with tempdir.TempDir() as t: 370 | filename = os.path.join(t, "paper.ap") 371 | paper = ActivePaper(filename, 'w') 372 | script = paper.create_calclet("script", 373 | """ 374 | from activepapers.contents import data 375 | data.create_dataset('foo', data=42) 376 | group = data.create_group('group1') 377 | group.mark_as_data_item() 378 | group['value'] = 1 379 | group = data.create_group('group2') 380
| group['value'] = 2 381 | """) 382 | script.run() 383 | items = sorted([item.name for item in paper.iter_items()]) 384 | assert items == ['/code/script', '/data/foo', 385 | '/data/group1', '/data/group2/value'] 386 | assert (paper.data['foo'][...] == 42) 387 | assert (paper.data['group1/value'][...] == 1) 388 | assert (paper.data['group2/value'][...] == 2) 389 | script = paper.create_calclet("script", 390 | """ 391 | from activepapers.contents import data 392 | data.create_dataset('foo', data=1) 393 | """) 394 | script.run() 395 | items = sorted([item.name for item in paper.iter_items()]) 396 | assert items == ['/code/script', '/data/foo'] 397 | assert (paper.data['foo'][...] == 1) 398 | paper.close() 399 | 400 | def test_dummy_datasets(): 401 | with tempdir.TempDir() as t: 402 | filename = os.path.join(t, "paper.ap") 403 | paper = ActivePaper(filename, 'w') 404 | paper.data.create_dataset("frequency", data = 0.2) 405 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 406 | calc_angular = paper.create_calclet("calc_angular", 407 | """ 408 | from activepapers.contents import data, snapshot 409 | import numpy as np 410 | 411 | frequency = data['frequency'][...] 412 | time = data['time'][...] 
413 | angular = data.create_group('angular') 414 | angular.attrs['time'] = data['time'].ref 415 | angular.create_dataset("time", data=data['time'].ref) 416 | angular.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 417 | """) 418 | calc_angular.run() 419 | paper.replace_by_dummy('/data/angular/sine') 420 | dummy = paper.data_group['angular/sine'] 421 | assert dummy.attrs.get('ACTIVE_PAPER_GENERATING_CODELET') \ 422 | == '/code/calc_angular' 423 | assert dummy.attrs.get('ACTIVE_PAPER_DUMMY_DATASET', False) 424 | passed = True 425 | try: 426 | paper.replace_by_dummy('/data/time') 427 | except AssertionError: 428 | passed = False 429 | assert not passed 430 | paper.close() 431 | -------------------------------------------------------------------------------- /tests/test_library.py: -------------------------------------------------------------------------------- 1 | # Test file downloads 2 | 3 | import os 4 | import tempdir 5 | 6 | from activepapers.storage import ActivePaper 7 | from activepapers import library 8 | from activepapers.utility import ascii 9 | 10 | if "NO_NETWORK_ACCESS" not in os.environ: 11 | def test_figshare_download(): 12 | with tempdir.TempDir() as t: 13 | library.library = [t] 14 | local_name = library.find_in_library("doi:10.6084/m9.figshare.692144") 15 | assert local_name == os.path.join(t, "10.6084/m9.figshare.692144.ap") 16 | paper = ActivePaper(local_name) 17 | assert ascii(paper.code_group['python-packages/immutable/__init__'].attrs['ACTIVE_PAPER_DATATYPE']) == 'module' 18 | paper.close() 19 | 20 | if "NO_NETWORK_ACCESS" not in os.environ: 21 | def test_zenodo_download(): 22 | with tempdir.TempDir() as t: 23 | library.library = [t] 24 | local_name = library.find_in_library("doi:10.5281/zenodo.7648") 25 | assert local_name == os.path.join(t, "10.5281/zenodo.7648.ap") 26 | paper = ActivePaper(local_name) 27 | assert ascii(paper.code_group['python-packages/mosaic/__init__'].attrs['ACTIVE_PAPER_DATATYPE']) == 'module' 28 | 
paper.close() 29 | -------------------------------------------------------------------------------- /tests/test_python_modules.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import tempdir 4 | from nose.tools import raises 5 | from activepapers.storage import ActivePaper 6 | from activepapers.utility import isstring 7 | 8 | def make_paper(filename): 9 | paper = ActivePaper(filename, "w") 10 | paper.import_module('foo') 11 | paper.import_module('foo.bar') 12 | script = paper.create_calclet("test", 13 | """ 14 | from activepapers.contents import data 15 | import foo 16 | from foo.bar import frobnicate 17 | data['result'] = frobnicate(2) 18 | assert frobnicate(foo.__version__) == '42' 19 | """) 20 | script.run() 21 | paper.close() 22 | 23 | def assert_is_python_module(node): 24 | assert node.attrs.get('ACTIVE_PAPER_DATATYPE', None) == 'module' 25 | assert node.attrs.get('ACTIVE_PAPER_LANGUAGE', None) == 'python' 26 | 27 | def check_paper(filename): 28 | paper = ActivePaper(filename, "r") 29 | items = sorted([item.name for item in paper.iter_items()]) 30 | assert items == ["/code/python-packages/foo/__init__", 31 | "/code/python-packages/foo/bar", 32 | "/code/test", 33 | "/data/result"] 34 | deps = [sorted(item.name for item in level) 35 | for level in paper.dependency_hierarchy()] 36 | assert deps == [['/code/python-packages/foo/__init__', 37 | '/code/python-packages/foo/bar', 38 | '/code/test'], 39 | ['/data/result']] 40 | for path in ['foo/__init__', 'foo/bar']: 41 | node = paper.code_group['python-packages'][path] 42 | assert_is_python_module(node) 43 | paper.close() 44 | 45 | def test_simple_paper(): 46 | with tempdir.TempDir() as t: 47 | filename1 = os.path.join(t, "paper1.ap") 48 | filename2 = os.path.join(t, "paper2.ap") 49 | make_paper(filename1) 50 | check_paper(filename1) 51 | with ActivePaper(filename1, "r") as paper: 52 | paper.rebuild(filename2) 53 | check_paper(filename2) 54 | 55 | 
def make_paper_with_module(filename, value): 56 | paper = ActivePaper(filename, "w") 57 | paper.add_module("some_values", 58 | """ 59 | a_value = %d 60 | """ % value) 61 | script = paper.create_calclet("test", 62 | """ 63 | from activepapers.contents import data 64 | from some_values import a_value 65 | data['a_value'] = a_value 66 | """) 67 | script.run() 68 | paper.close() 69 | 70 | def check_paper_with_module(filename, value): 71 | paper = ActivePaper(filename, "r") 72 | assert paper.data['a_value'][...] == value 73 | paper.close() 74 | 75 | def test_module_paper(): 76 | with tempdir.TempDir() as t: 77 | filename1 = os.path.join(t, "paper1.ap") 78 | filename2 = os.path.join(t, "paper2.ap") 79 | make_paper_with_module(filename1, 42) 80 | check_paper_with_module(filename1, 42) 81 | make_paper_with_module(filename2, 0) 82 | check_paper_with_module(filename2, 0) 83 | 84 | @raises(ValueError) 85 | def test_import_math(): 86 | # math is an extension module, so this should fail 87 | with tempdir.TempDir() as t: 88 | filename = os.path.join(t, "paper.ap") 89 | paper = ActivePaper(filename, "w") 90 | paper.import_module('math') 91 | paper.close() 92 | 93 | @raises(ImportError) 94 | def test_import_ctypes(): 95 | # ctypes is not in the "allowed module" list, so this should fail 96 | with tempdir.TempDir() as t: 97 | filename = os.path.join(t, "paper.ap") 98 | paper = ActivePaper(filename, "w") 99 | script = paper.create_calclet("test", 100 | """ 101 | import ctypes 102 | """) 103 | script.run() 104 | paper.close() 105 | -------------------------------------------------------------------------------- /tests/test_references.py: -------------------------------------------------------------------------------- 1 | # Test the use of references 2 | 3 | import os 4 | 5 | import numpy as np 6 | import h5py 7 | import tempdir 8 | 9 | from activepapers.storage import ActivePaper 10 | from activepapers.utility import ascii 11 | from activepapers import library 12 | 13 | def 
make_simple_paper(filename): 14 | 15 | paper = ActivePaper(filename, "w") 16 | 17 | paper.data.create_dataset("frequency", data=0.2) 18 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 19 | 20 | calc_sine = paper.create_calclet("calc_sine", 21 | """ 22 | from activepapers.contents import data 23 | import numpy as np 24 | 25 | frequency = data['frequency'][...] 26 | time = data['time'][...] 27 | data.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 28 | """) 29 | calc_sine.run() 30 | 31 | paper.close() 32 | 33 | 34 | def make_library_paper(filename): 35 | 36 | paper = ActivePaper(filename, "w") 37 | 38 | paper.add_module("my_math", 39 | """ 40 | import numpy as np 41 | 42 | def my_func(x): 43 | return np.sin(x) 44 | """) 45 | 46 | paper.close() 47 | 48 | 49 | def make_simple_paper_with_data_refs(filename, paper_ref): 50 | 51 | paper = ActivePaper(filename, "w") 52 | 53 | paper.create_data_ref("frequency", paper_ref) 54 | paper.create_data_ref("time_from_ref", paper_ref, "time") 55 | 56 | calc_sine = paper.create_calclet("calc_sine", 57 | """ 58 | from activepapers.contents import data 59 | import numpy as np 60 | 61 | frequency = data['frequency'][...] 62 | time = data['time_from_ref'][...] 
63 | data.create_dataset("sine", data=np.sin(2.*np.pi*frequency*time)) 64 | """) 65 | calc_sine.run() 66 | 67 | paper.close() 68 | 69 | 70 | def make_simple_paper_with_data_and_code_refs(filename, paper_ref): 71 | 72 | paper = ActivePaper(filename, "w") 73 | 74 | paper.create_data_ref("frequency", paper_ref) 75 | paper.create_data_ref("time", paper_ref) 76 | 77 | paper.create_code_ref("calc_sine", paper_ref) 78 | paper.run_codelet('calc_sine') 79 | 80 | paper.close() 81 | 82 | 83 | def make_simple_paper_with_library_refs(filename, paper_ref): 84 | 85 | paper = ActivePaper(filename, "w") 86 | 87 | paper.data.create_dataset("frequency", data = 0.2) 88 | paper.data.create_dataset("time", data=0.1*np.arange(100)) 89 | 90 | paper.create_module_ref("my_math", paper_ref) 91 | 92 | calc_sine = paper.create_calclet("calc_sine", 93 | """ 94 | from activepapers.contents import data 95 | import numpy as np 96 | from my_math import my_func 97 | 98 | frequency = data['frequency'][...] 99 | time = data['time'][...] 
100 | data.create_dataset("sine", data=my_func(2.*np.pi*frequency*time)) 101 | """) 102 | calc_sine.run() 103 | 104 | paper.close() 105 | 106 | 107 | def make_simple_paper_with_copies(filename, paper_ref): 108 | 109 | paper = ActivePaper(filename, "w") 110 | 111 | paper.create_copy("/data/frequency", paper_ref) 112 | paper.create_copy("/data/time", paper_ref) 113 | 114 | paper.create_copy("/code/calc_sine", paper_ref) 115 | paper.run_codelet('calc_sine') 116 | 117 | paper.close() 118 | 119 | 120 | def assert_almost_equal(x, y, tolerance): 121 | assert (np.fabs(np.array(x)-np.array(y)) < tolerance).all() 122 | 123 | 124 | def check_paper_with_refs(filename, with_name_postfix, refs, additional_items): 125 | time_ds_name = '/data/time_from_ref' if with_name_postfix else '/data/time' 126 | paper = ActivePaper(filename, "r") 127 | items = sorted([item.name for item in paper.iter_items()]) 128 | assert items == sorted(['/code/calc_sine', '/data/frequency', 129 | '/data/sine', time_ds_name] + additional_items) 130 | for item_name in refs: 131 | assert paper.data_group[item_name].attrs['ACTIVE_PAPER_DATATYPE'] \ 132 | == 'reference' 133 | assert_almost_equal(paper.data["sine"][...], 134 | np.sin(0.04*np.pi*np.arange(100)), 135 | 1.e-10) 136 | paper.close() 137 | 138 | def test_simple_paper_with_data_refs(): 139 | with tempdir.TempDir() as t: 140 | library.library = [t] 141 | os.mkdir(os.path.join(t, "local")) 142 | filename1 = os.path.join(t, "local/simple1.ap") 143 | filename2 = os.path.join(t, "simple2.ap") 144 | make_simple_paper(filename1) 145 | make_simple_paper_with_data_refs(filename2, "local:simple1") 146 | check_paper_with_refs(filename2, True, 147 | ['/data/frequency', '/data/time_from_ref'], 148 | []) 149 | 150 | def test_simple_paper_with_data_and_code_refs(): 151 | with tempdir.TempDir() as t: 152 | library.library = [t] 153 | os.mkdir(os.path.join(t, "local")) 154 | filename1 = os.path.join(t, "local/simple1.ap") 155 | filename2 = os.path.join(t, 
"simple2.ap") 156 | make_simple_paper(filename1) 157 | make_simple_paper_with_data_and_code_refs(filename2, "local:simple1") 158 | check_paper_with_refs(filename2, False, 159 | ['/data/frequency', '/data/time', 160 | '/code/calc_sine'], 161 | []) 162 | 163 | def test_simple_paper_with_library_refs(): 164 | with tempdir.TempDir() as t: 165 | library.library = [t] 166 | os.mkdir(os.path.join(t, "local")) 167 | filename1 = os.path.join(t, "local/library.ap") 168 | filename2 = os.path.join(t, "simple.ap") 169 | make_library_paper(filename1) 170 | make_simple_paper_with_library_refs(filename2, "local:library") 171 | check_paper_with_refs(filename2, False, 172 | ['/code/python-packages/my_math'], 173 | ['/code/python-packages/my_math']) 174 | 175 | 176 | def test_copy(): 177 | with tempdir.TempDir() as t: 178 | library.library = [t] 179 | os.mkdir(os.path.join(t, "local")) 180 | filename1 = os.path.join(t, "local/simple1.ap") 181 | filename2 = os.path.join(t, "simple2.ap") 182 | make_simple_paper(filename1) 183 | make_simple_paper_with_copies(filename2, "local:simple1") 184 | check_paper_with_refs(filename2, False, [], []) 185 | paper = ActivePaper(filename2, 'r') 186 | for path in ['/code/calc_sine', '/data/frequency', '/data/time']: 187 | item = paper.file[path] 188 | source = item.attrs.get('ACTIVE_PAPER_COPIED_FROM') 189 | assert source is not None 190 | paper_ref, ref_path = source 191 | if h5py.version.version_tuple[:2] <= (2, 2): 192 | paper_ref = paper_ref.flat[0] 193 | ref_path = ref_path.flat[0] 194 | assert ascii(paper_ref) == "local:simple1" 195 | assert ascii(ref_path) == path 196 | 197 | --------------------------------------------------------------------------------