├── python ├── flatline │ ├── __init__.py │ ├── flatline-node.js │ ├── sampler.py │ └── interpreter.py ├── tests │ ├── __init__.py │ └── flatline_tests.py ├── setup.py ├── README.md └── notebooks │ └── Flatline.ipynb ├── js └── demo │ ├── flatline.js │ ├── styles.css │ └── index.html ├── docs ├── requirements.txt ├── index.rst ├── Makefile ├── conf.py ├── quick-reference.rst └── user-manual.rst ├── .gitignore ├── .readthedocs.yaml ├── license └── readme.md /python/flatline/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /python/tests/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /js/demo/flatline.js: -------------------------------------------------------------------------------- 1 | ../flatline.js -------------------------------------------------------------------------------- /python/flatline/flatline-node.js: -------------------------------------------------------------------------------- 1 | ../../js/flatline-node.js -------------------------------------------------------------------------------- /docs/requirements.txt: -------------------------------------------------------------------------------- 1 | sphinx 2 | sphinx_rtd_theme==2.0.0 3 | recommonmark 4 | -------------------------------------------------------------------------------- /python/tests/flatline_tests.py: -------------------------------------------------------------------------------- 1 | from nose.tools import * 2 | import flatline 3 | 4 | def setup(): 5 | print("SETUP!") 6 | 7 | def teardown(): 8 | print("TEAR DOWN!") 9 | 10 | def test_basic(): 11 | print("I RAN!") 
12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | *.pyc 3 | *.swl 4 | *.swm 5 | *.swn 6 | *.swo 7 | *.swp 8 | *.log 9 | *.log.* 10 | dist/ 11 | .cache 12 | build 13 | *pip-log.txt 14 | *.egg-info 15 | *.egg 16 | *.coverage 17 | .tox/ 18 | set_credentials.sh 19 | docs/_build/* 20 | *~ 21 | /python/notebooks/.ipynb_checkpoints/ 22 | _build 23 | -------------------------------------------------------------------------------- /.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | # .readthedocs.yaml 2 | # Read the Docs configuration file 3 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 4 | 5 | # Required 6 | version: 2 7 | 8 | # Set the version of Python and other tools you might need 9 | build: 10 | os: ubuntu-22.04 11 | tools: 12 | python: "3.12" 13 | 14 | # Build documentation in the docs/ directory with Sphinx 15 | sphinx: 16 | configuration: docs/conf.py 17 | 18 | # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html 19 | python: 20 | install: 21 | - requirements: docs/requirements.txt 22 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | Flatline 2 | ======== 3 | 4 | Flatline is a lispy language for the specification of values to be 5 | extracted or generated from an input dataset, using a finite sliding 6 | window of input rows. 7 | 8 | In `BigML `__, it is used either as a row filter 9 | specifier or as a field generator. 10 | 11 | In the former case, the input consists of dataset rows on which a 12 | single, boolean expression is computed, and only those for which the 13 | result is true are kept in the output dataset. 
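The filtering semantics just described can be sketched in a few lines of plain Python. This is only a toy stand-in, not the real Flatline interpreter: the helper `f` mimics Flatline's `(f i)` field accessor (as seen in this project's docstrings), and the filter expression `(> (f 0) 2)` is an assumed example.

```python
# Toy stand-in for Flatline's row-filter semantics (NOT the real
# interpreter): a boolean expression is evaluated on every input row,
# and only the rows where it comes out true are kept.

def f(row, i):
    # Mimics Flatline's (f i) accessor for the current row.
    return row[i]

rows = [[1, "a"], [5, "b"], [3, "c"]]

# In the spirit of a hypothetical Flatline filter (> (f 0) 2):
kept = [row for row in rows if f(row, 0) > 2]
print(kept)  # [[5, 'b'], [3, 'c']]
```

The real interpreter additionally evaluates expressions over a finite sliding window of rows, which this single-row sketch ignores.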
14 | 15 | When used to generate new datasets from given ones, a list of Flatline 16 | expressions is provided, each one generating either a value or a list of 17 | values, which are then concatenated together to form the output rows 18 | (each value therefore representing a field in the generated dataset). 19 | 20 | .. toctree:: 21 | :maxdepth: 1 22 | 23 | user-manual 24 | quick-reference 25 | -------------------------------------------------------------------------------- /js/demo/styles.css: -------------------------------------------------------------------------------- 1 | html { 2 | font: 90%/1.3 arial,sans-serif; 3 | padding:1em; 4 | background:#B9C2CC; 5 | } 6 | 7 | form { 8 | background:#fff; 9 | padding:1em; 10 | border:1px solid #eee; 11 | } 12 | 13 | fieldset div { 14 | margin:0.3em 0.3em; 15 | clear:both; 16 | } 17 | 18 | form { 19 | margin:1em; 20 | width:30em; 21 | } 22 | 23 | label { 24 | float:none; 25 | display:block; 26 | clear:both; 27 | } 28 | 29 | legend { 30 | color:#0b77b7; 31 | font-size:1.4em; 32 | } 33 | 34 | legend span { 35 | /* width:10em; */ 36 | text-align:right; 37 | } 38 | 39 | fieldset { 40 | border:1px solid #ddd; 41 | padding:0 0.5em 0.5em; 42 | } 43 | 44 | .help { 45 | color: red; 46 | } 47 | 48 | .button { 49 | margin:10px; 50 | padding:5px; 51 | } 52 | 53 | textarea { 54 | width:25em; 55 | } 56 | 57 | .error { 58 | color:red; 59 | } 60 | 61 | .result { 62 | color:darkgreen; 63 | } 64 | -------------------------------------------------------------------------------- /license: -------------------------------------------------------------------------------- 1 | Copyright 2013-15 BigML, Inc 2 | 3 | Documentation in this repository is released under Creative Commons 4 | Attribution-ShareAlike 4.0 International License. 
5 | 6 | Code in this repository is licensed under the Apache License, version 7 | 2.0, copied below: 8 | 9 | ------------------------------------------------------------------------- 10 | Licensed under the Apache License, Version 2.0 (the "License"); you may 11 | not use this file except in compliance with the License. You may obtain 12 | a copy of the License at 13 | 14 | http://www.apache.org/licenses/LICENSE-2.0 15 | 16 | Unless required by applicable law or agreed to in writing, software 17 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 18 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 19 | License for the specific language governing permissions and limitations 20 | under the License. 21 | ------------------------------------------------------------------------- 22 | -------------------------------------------------------------------------------- /python/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup(name='flatline', 4 | description='Python bridge for the flatline javascript interpreter', 5 | author='jao', 6 | url='http://github.com/bigmlcom/flatline', 7 | download_url='http://github.com/bigmlcom/flatline', 8 | author_email='jao@bigml.com', 9 | version='0.1', 10 | license='Apache', 11 | install_requires=['PyExecJS', 'nose', 'bigml'], 12 | packages=['flatline'], 13 | package_data={'flatline': ['flatline-node.js']}, 14 | classifiers=[ 15 | 'Development Status :: 4 - Beta', 16 | 'Intended Audience :: Developers', 17 | 'License :: OSI Approved :: Apache Software License', 18 | 'Natural Language :: English', 19 | 'Operating System :: OS Independent', 20 | 'Programming Language :: Python', 21 | 'Programming Language :: Python :: 2', 22 | 'Programming Language :: Python :: 2.7', 23 | 'Programming Language :: Python :: 3', 24 | 'Topic :: Software Development :: Libraries :: Python Modules', 25 | ], 26 | scripts=[], 27 | 
use_2to3=True 28 | ) 29 | -------------------------------------------------------------------------------- /python/README.md: -------------------------------------------------------------------------------- 1 | # Flatline Python bridge 2 | 3 | This package provides a Python interface to the local JS Flatline 4 | interpreter, allowing you to check Flatline Lisp and JSON s-expressions 5 | for correctness and to apply them to local dataset samples to generate 6 | new fields. 7 | 8 | Typically, you will use the functions in this package to experiment on 9 | your computer with the data transformations and filters you plan to 10 | eventually execute on BigML servers, after you're satisfied with the 11 | results of your explorations on small data samples. 12 | 13 | ## Installation 14 | 15 | The bridge uses [nodejs](http://nodejs.org) under the hood, and hence 16 | needs it to be 17 | [installed in your system](https://nodejs.org/download/) as a 18 | prerequisite. 19 | 20 | With that in place, you can use the `setup.py` script to install this 21 | package in the usual way. For instance, 22 | 23 | ``` 24 | $ python setup.py develop 25 | ``` 26 | 27 | will perform an in-place installation, possibly in a local virtualenv 28 | (recommended): 29 | 30 | ``` 31 | $ virtualenv --distribute ~/.virtualenvs/flatline 32 | $ workon flatline 33 | $ python setup.py develop 34 | ``` 35 | 36 | ## Running the sample code in IPython 37 | 38 | We provide a [sample notebook](./notebooks/Flatline.ipynb) to 39 | illustrate the workings of this library. 
To use it, install 40 | [IPython and Jupyter](http://ipython.org) with `pip`: 41 | 42 | ``` 43 | $ pip install jupyter 44 | ``` 45 | 46 | then 47 | [set up your BIGML environment variables for authentication](https://bigml.readthedocs.org/en/latest/#authentication): 48 | 49 | ``` 50 | $ export BIGML_USERNAME= 51 | $ export BIGML_API_KEY= 52 | ``` 53 | 54 | and start the notebook server in the [notebooks](./notebooks) 55 | directory: 56 | 57 | ``` 58 | $ cd notebooks 59 | $ jupyter notebook Flatline.ipynb 60 | ``` 61 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | [![Documentation Status](https://readthedocs.org/projects/flatline/badge/?version=latest)](http://flatline.readthedocs.io/en/latest/?badge=latest) 2 | 3 | # Flatline, a language for data generation and filtering 4 | 5 | Flatline is a lispy language for the specification of values to be 6 | extracted or generated from an input dataset, using a finite sliding 7 | window of input rows. 8 | 9 | In BigML, it is used either as a row filter specifier or as a field 10 | generator. 11 | 12 | In the former case, the input consists of dataset rows on which a 13 | single, boolean expression is computed, and only those for which the 14 | result is true are kept in the output dataset. 15 | 16 | When used to generate new datasets from given ones, a list of Flatline 17 | expressions is provided, each one generating either a value or a list 18 | of values, which are then concatenated together to form the output 19 | rows (each value therefore representing a field in the generated 20 | dataset). 21 | 22 | ## Documentation 23 | 24 | - [Flatline's user manual](docs/user-manual.rst). 25 | - [Quick reference](docs/quick-reference.rst) with all pre-defined 26 | functions. 27 | - Or see the HTML version in 28 | [Read the Docs](http://flatline.readthedocs.io/en/latest/?badge=latest). 
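The field-generation behavior described above can be sketched in plain Python as well. Again, this is a toy illustration rather than the real interpreter; the three lambdas are hypothetical stand-ins for the Flatline expressions `(f 0)`, `(f 1)` and `(+ (f 0) (f 1))`.

```python
# Toy sketch (NOT the real interpreter) of Flatline field generation:
# each expression yields one value per input row, and the values are
# concatenated to form the corresponding output row.

rows = [[1, 2], [3, 4]]

# Hypothetical stand-ins for (f 0), (f 1) and (+ (f 0) (f 1)):
generators = [
    lambda row: row[0],
    lambda row: row[1],
    lambda row: row[0] + row[1],
]

output_rows = [[g(row) for g in generators] for row in rows]
print(output_rows)  # [[1, 2, 3], [3, 4, 7]]
```

Each generated value becomes a field of the output dataset, so here two input fields yield a three-field output row.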
29 | 30 | ## Local interpreters 31 | 32 | ### JavaScript and Node.js 33 | 34 | We include in [js](./js) Flatline interpreters implemented in 35 | JavaScript (compiled with ClojureScript from our canonical server-side 36 | implementation) that you can use from your browser or from a Node.js 37 | session. 38 | 39 | ### Python 40 | 41 | The [python directory](./python) contains a small Python library that 42 | wraps the Node.js interpreter and lets you interact with it using 43 | Python. See its [README](./python/README.md) for more information, 44 | including access to an IPython sample notebook. 45 | 46 | ## License 47 | 48 | Creative Commons License
Flatline reference documentation by BigML Inc is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. 49 | 50 | All code in this repository is released under the Apache License 2.0. 51 | -------------------------------------------------------------------------------- /python/flatline/sampler.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright (c) 2015 BigML, Inc 4 | # All rights reserved. 5 | 6 | 7 | """flatline.sampler 8 | 9 | Working locally with Flatline over dataset samples. 10 | 11 | :author: jao 12 | :date: Mon Apr 06, 2015 04:14 13 | 14 | """ 15 | 16 | from flatline import interpreter 17 | import bigml.api as api 18 | import os 19 | 20 | ## local testing 21 | import requests.packages.urllib3 22 | requests.packages.urllib3.disable_warnings() 23 | 24 | class Sampler: 25 | """The Sampler class automates the process of sampling a dataset. 26 | 27 | It works by downloading a subset of the dataset rows (using 28 | BigML's sample resources) and subsequently applying to them any 29 | desired Flatline generator. 30 | 31 | Example: 32 | 33 | sampler = Sampler() 34 | sampler.take_sample('dataset/54e374ab67dc09706d000283', size=4) 35 | sampler.apply_lisp('(+ (f 0) (f 1))') 36 | 37 | """ 38 | 39 | _interpreter = interpreter.Interpreter() 40 | 41 | def __init__(self, username=None, api_key=None, bigml=None): 42 | """Creates a new instance of a Sampler. 43 | 44 | A Sampler is an object able to connect to your BigML account, 45 | retrieve samples of datasets, and apply Flatline 46 | transformations to those local rows. Optionally, you can specify your 47 | api_key and username, or a bigml.api.BigML connection. 48 | Otherwise, we use the environment variables BIGML_USERNAME and 49 | BIGML_API_KEY. 
50 | 51 | """ 52 | if bigml is None: 53 | username = username or os.environ['BIGML_USERNAME'] 54 | api_key = api_key or os.environ['BIGML_API_KEY'] 55 | self._bigml = api.BigML(username=username, api_key=api_key) 56 | else: 57 | self._bigml = bigml 58 | self._sample = None 59 | 60 | def take_sample(self, dataset_id, size=10): 61 | """Given the corresponding dataset identifier, retrieve a sample of 62 | its rows with the requested size (number of rows). 63 | 64 | """ 65 | sample = self._bigml.create_sample(dataset_id) 66 | qs = "limit=-1&rows=%d" % size 67 | self._sample = self._bigml.check_resource(sample['resource'], 68 | query_string=qs) 69 | 70 | def sample(self): 71 | """Returns the full dictionary of properties of the current sample. 72 | 73 | Use 'take_sample' to update the current sample. 74 | """ 75 | if self._sample is None: 76 | return {} 77 | return self._sample['object']['sample'] 78 | 79 | def rows(self): 80 | """Returns a list of lists representing the current sample's rows. 81 | 82 | See 'take_sample' for updating the current sample and 'sample' 83 | for the full set of its properties. 84 | 85 | """ 86 | return self.sample().get('rows') 87 | 88 | def fields(self): 89 | """The list of field descriptors for the current sample. 90 | 91 | See 'take_sample' for updating the current sample and 'sample' 92 | for the full set of its properties. 93 | 94 | """ 95 | return self.sample().get('fields') 96 | 97 | def apply_lisp(self, sexp): 98 | """Applies the given lisp s-expression to the current sample's rows. 99 | 100 | On success, returns new rows generated by 'sexp', as a list of 101 | lists of native Python values. 'sexp' is a string. 102 | 103 | You can use 'rows' to retrieve the input rows used by this 104 | function. 105 | 106 | """ 107 | return self._interpreter.apply_lisp(sexp, self.rows(), self.sample()) 108 | 109 | def apply_json(self, json_sexp): 110 | """Applies a JSON s-expression to the current sample's rows. 
111 | 112 | The JSON s-expression must be represented as a Python list 113 | convertible to JSON, e.g. ["+", 1, ["f", "000000"]]. 114 | 115 | """ 116 | 117 | return self._interpreter.apply_json(json_sexp, 118 | self.rows(), 119 | self.sample()) 120 | -------------------------------------------------------------------------------- /python/flatline/interpreter.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright (c) 2015 BigML, Inc 4 | # All rights reserved. 5 | 6 | 7 | """ 8 | flatline.interpreter 9 | 10 | User-level interface to the flatline JS interpreter. 11 | 12 | :author: jao 13 | :date: Sun Apr 05, 2015 01:40 14 | 15 | """ 16 | 17 | import execjs 18 | import pkg_resources 19 | 20 | class Interpreter: 21 | """A bridge to an underlying nodejs Flatline interpreter. 22 | 23 | This class uses execjs to launch a Nodejs interpreter that loads 24 | Flatline's javascript implementation and allows interaction via 25 | Python constructs. 26 | 27 | Example: 28 | 29 | inter = Interpreter() 30 | inter.check_lisp('(+ 1 2)') 31 | inter.check_json(["f", 0], dataset=dataset) 32 | 33 | """ 34 | 35 | __FLATJS = pkg_resources.resource_filename('flatline', 'flatline-node.js') 36 | __REQFLATJS = "f__ = require('%s')" % __FLATJS 37 | 38 | def __init__(self): 39 | self._interpreter = execjs.get("Node") 40 | self._context = self._interpreter.compile(Interpreter.__REQFLATJS) 41 | 42 | def __eval_in_flatline(self, fun, *args): 43 | return self._context.call('f__.flatline.%s' % fun, *args) 44 | 45 | @staticmethod 46 | def infer_fields(row): 47 | """Utility function generating a mock list of fields. 48 | 49 | Usually, checks and applications of Flatline expressions run 50 | in the context of a given dataset's field descriptors, but 51 | during testing it's sometimes useful to provide a mock set of 52 | them, based on the types of the values of the test input rows. 
53 | 54 | Example: 55 | 56 | In[1]: Interpreter.infer_fields([0, 'a label']) 57 | Out[1]: [{'column_number': 0, 58 | 'datatype': 'int64', 59 | 'id': '000000', 60 | 'optype': 'numeric'}, 61 | {'column_number': 1, 62 | 'datatype': 'string', 63 | 'id': '000001', 64 | 'optype': 'categorical'}] 65 | 66 | """ 67 | result = [] 68 | id = 0 69 | for v in row: 70 | t = type(v) 71 | optype = 'categorical' 72 | datatype = 'string' 73 | if t is int or t is float: 74 | optype = 'numeric' 75 | if t is float: 76 | datatype = 'float64' 77 | else: 78 | datatype = 'int64' 79 | result.append({'id': '%06x' % id, 80 | 'optype': optype, 81 | 'datatype': datatype, 82 | 'column_number': id}) 83 | id = id + 1 84 | return result 85 | 86 | @staticmethod 87 | def __dataset(dataset, rows): 88 | if dataset is None and len(rows) > 0: 89 | return {'fields': Interpreter.infer_fields(rows[0])} 90 | return dataset 91 | 92 | def defined_functions(self): 93 | """A list of the names of all defined Flatline functions""" 94 | return self.__eval_in_flatline('defined_primitives') 95 | 96 | def check_lisp(self, sexp, dataset=None): 97 | """Checks whether the given lisp s-expression is valid. 98 | 99 | Any operations referring to a dataset's fields will use the 100 | information found in the provided dataset, which should have 101 | the structure of the 'object' component of a BigML dataset 102 | resource. 103 | 104 | """ 105 | r = self.__eval_in_flatline('evaluate_sexp', sexp, dataset) 106 | r.pop(u'mapper', None) 107 | return r 108 | 109 | def check_json(self, json_sexp, dataset=None): 110 | """Checks whether the given JSON s-expression is valid. 111 | 112 | Works like `check_lisp` (which see), but taking a JSON 113 | expression represented as a native Python list instead of a 114 | Lisp sexp string. 
115 | 116 | """ 117 | r = self.__eval_in_flatline('evaluate_js', json_sexp, dataset) 118 | r.pop(u'mapper', None) 119 | return r 120 | 121 | def lisp_to_json(self, sexp): 122 | """ Auxiliary function transforming Lisp to Python representation.""" 123 | return self.__eval_in_flatline('sexp_to_js', sexp) 124 | 125 | def json_to_lisp(self, json_sexp): 126 | """ Auxiliary function transforming Python to lisp representation.""" 127 | return self.__eval_in_flatline('js_to_sexp', json_sexp) 128 | 129 | def apply_lisp(self, sexp, rows, dataset=None): 130 | """Applies the given Lisp sexp to a set of input rows. 131 | 132 | Input rows are represented as a list of lists of native Python 133 | values. If no dataset is provided, the field characteristics 134 | of the input rows are guessed using `infer_fields`. 135 | 136 | """ 137 | return self.__eval_in_flatline('eval_and_apply_sexp', 138 | sexp, 139 | Interpreter.__dataset(dataset, rows), 140 | rows) 141 | 142 | def apply_json(self, json_sexp, rows, dataset=None): 143 | """Applies the given JSON sexp to a set of input rows. 144 | 145 | As usual, JSON sexps are represented as Python lists, 146 | e.g. ["+", 1, 2]. 147 | 148 | Input rows are represented as a list of lists of native Python 149 | values. If no dataset is provided, the field characteristics 150 | of the input rows are guessed using `infer_fields`. 151 | 152 | """ 153 | return self.__eval_in_flatline('eval_and_apply_js', 154 | json_sexp, 155 | Interpreter.__dataset(dataset, rows), 156 | rows) 157 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 
11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 16 | 17 | .PHONY: help 18 | help: 19 | @echo "Please use \`make <target>' where <target> is one of" 20 | @echo " html to make standalone HTML files" 21 | @echo " dirhtml to make HTML files named index.html in directories" 22 | @echo " singlehtml to make a single large HTML file" 23 | @echo " pickle to make pickle files" 24 | @echo " json to make JSON files" 25 | @echo " htmlhelp to make HTML files and an HTML help project" 26 | @echo " qthelp to make HTML files and a qthelp project" 27 | @echo " applehelp to make an Apple Help Book" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " epub3 to make an epub3" 31 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 32 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 33 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 34 | @echo " text to make text files" 35 | @echo " man to make manual pages" 36 | @echo " texinfo to make Texinfo files" 37 | @echo " info to make Texinfo files and run them through makeinfo" 38 | @echo " gettext to make PO message catalogs" 39 | @echo " changes to make an overview of all changed/added/deprecated items" 40 | @echo " xml to make Docutils-native XML files" 41 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 42 | @echo " linkcheck to check all external links for integrity" 43 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 44 | @echo " coverage to run coverage check of the documentation (if enabled)" 45 | @echo " dummy to check syntax errors of document sources" 46 | 47 | .PHONY: clean 48 | clean: 49 | rm -rf 
$(BUILDDIR)/* 50 | 51 | .PHONY: html 52 | html: 53 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 54 | @echo 55 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 56 | 57 | .PHONY: dirhtml 58 | dirhtml: 59 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 60 | @echo 61 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 62 | 63 | .PHONY: singlehtml 64 | singlehtml: 65 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 66 | @echo 67 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 68 | 69 | .PHONY: pickle 70 | pickle: 71 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 72 | @echo 73 | @echo "Build finished; now you can process the pickle files." 74 | 75 | .PHONY: json 76 | json: 77 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 78 | @echo 79 | @echo "Build finished; now you can process the JSON files." 80 | 81 | .PHONY: htmlhelp 82 | htmlhelp: 83 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 84 | @echo 85 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 86 | ".hhp project file in $(BUILDDIR)/htmlhelp." 87 | 88 | .PHONY: qthelp 89 | qthelp: 90 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 91 | @echo 92 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 93 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 94 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Flatline.qhcp" 95 | @echo "To view the help file:" 96 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Flatline.qhc" 97 | 98 | .PHONY: applehelp 99 | applehelp: 100 | $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp 101 | @echo 102 | @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." 103 | @echo "N.B. You won't be able to view it unless you put it in" \ 104 | "~/Library/Documentation/Help or install it in your application" \ 105 | "bundle." 
106 | 107 | .PHONY: devhelp 108 | devhelp: 109 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 110 | @echo 111 | @echo "Build finished." 112 | @echo "To view the help file:" 113 | @echo "# mkdir -p $$HOME/.local/share/devhelp/Flatline" 114 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Flatline" 115 | @echo "# devhelp" 116 | 117 | .PHONY: epub 118 | epub: 119 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 120 | @echo 121 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 122 | 123 | .PHONY: epub3 124 | epub3: 125 | $(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3 126 | @echo 127 | @echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3." 128 | 129 | .PHONY: latex 130 | latex: 131 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 132 | @echo 133 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 134 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 135 | "(use \`make latexpdf' here to do that automatically)." 136 | 137 | .PHONY: latexpdf 138 | latexpdf: 139 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 140 | @echo "Running LaTeX files through pdflatex..." 141 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 142 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 143 | 144 | .PHONY: latexpdfja 145 | latexpdfja: 146 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 147 | @echo "Running LaTeX files through platex and dvipdfmx..." 148 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 149 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 150 | 151 | .PHONY: text 152 | text: 153 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 154 | @echo 155 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 156 | 157 | .PHONY: man 158 | man: 159 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 160 | @echo 161 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 
162 | 163 | .PHONY: texinfo 164 | texinfo: 165 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 166 | @echo 167 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 168 | @echo "Run \`make' in that directory to run these through makeinfo" \ 169 | "(use \`make info' here to do that automatically)." 170 | 171 | .PHONY: info 172 | info: 173 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 174 | @echo "Running Texinfo files through makeinfo..." 175 | make -C $(BUILDDIR)/texinfo info 176 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 177 | 178 | .PHONY: gettext 179 | gettext: 180 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 181 | @echo 182 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 183 | 184 | .PHONY: changes 185 | changes: 186 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 187 | @echo 188 | @echo "The overview file is in $(BUILDDIR)/changes." 189 | 190 | .PHONY: linkcheck 191 | linkcheck: 192 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 193 | @echo 194 | @echo "Link check complete; look for any errors in the above output " \ 195 | "or in $(BUILDDIR)/linkcheck/output.txt." 196 | 197 | .PHONY: doctest 198 | doctest: 199 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 200 | @echo "Testing of doctests in the sources finished, look at the " \ 201 | "results in $(BUILDDIR)/doctest/output.txt." 202 | 203 | .PHONY: coverage 204 | coverage: 205 | $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage 206 | @echo "Testing of coverage in the sources finished, look at the " \ 207 | "results in $(BUILDDIR)/coverage/python.txt." 208 | 209 | .PHONY: xml 210 | xml: 211 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 212 | @echo 213 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 
214 | 215 | .PHONY: pseudoxml 216 | pseudoxml: 217 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 218 | @echo 219 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 220 | 221 | .PHONY: dummy 222 | dummy: 223 | $(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy 224 | @echo 225 | @echo "Build finished. Dummy builder generates no files." 226 | -------------------------------------------------------------------------------- /js/demo/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Flatline calculator 5 | 6 | 7 | 8 |
[js/demo/index.html: the HTML markup was stripped during extraction, leaving only scattered fragments and line-number residue. The recoverable structure is a small "Flatline calculator" page with three fieldsets labeled Dataset, Lisp, and JSON; the rest of the file, including its inline script, is not recoverable.]
83 | 84 | 235 | 236 | 237 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Flatline documentation build configuration file, created by 5 | # sphinx-quickstart on Tue Jan 10 00:20:23 2017. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | # If extensions (or modules to document with autodoc) are in another directory, 17 | # add these directories to sys.path here. If the directory is relative to the 18 | # documentation root, use os.path.abspath to make it absolute, like shown here. 19 | # 20 | # import os 21 | # import sys 22 | # sys.path.insert(0, os.path.abspath('.')) 23 | 24 | from recommonmark.parser import CommonMarkParser 25 | 26 | # -- General configuration ------------------------------------------------ 27 | 28 | # If your documentation needs a minimal Sphinx version, state it here. 29 | # 30 | # needs_sphinx = '1.0' 31 | 32 | # Add any Sphinx extension module names here, as strings. They can be 33 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 34 | # ones. 35 | extensions = [ 36 | 'sphinx.ext.mathjax', 37 | ] 38 | 39 | # Add any paths that contain templates here, relative to this directory. 40 | templates_path = ['_templates'] 41 | 42 | # The suffix(es) of source filenames. 43 | # You can specify multiple suffix as a list of string: 44 | # 45 | source_suffix = ['.rst', '.md'] 46 | 47 | source_parsers = { 48 | '.md': CommonMarkParser, 49 | } 50 | 51 | # The encoding of source files. 
52 | # 53 | # source_encoding = 'utf-8-sig' 54 | 55 | # The master toctree document. 56 | master_doc = 'index' 57 | 58 | # General information about the project. 59 | project = 'Flatline' 60 | copyright = '2017-2018, 2025, The BigML Team' 61 | author = 'The BigML Team' 62 | 63 | # The version info for the project you're documenting, acts as replacement for 64 | # |version| and |release|, also used in various other places throughout the 65 | # built documents. 66 | # 67 | # The short X.Y version. 68 | version = '1.0' 69 | # The full version, including alpha/beta/rc tags. 70 | release = '1.0' 71 | 72 | # The language for content autogenerated by Sphinx. Refer to documentation 73 | # for a list of supported languages. 74 | # 75 | # This is also used if you do content translation via gettext catalogs. 76 | # Usually you set "language" from the command line for these cases. 77 | language = "en" 78 | 79 | # There are two options for replacing |today|: either, you set today to some 80 | # non-false value, then it is used: 81 | # 82 | # today = '' 83 | # 84 | # Else, today_fmt is used as the format for a strftime call. 85 | # 86 | # today_fmt = '%B %d, %Y' 87 | 88 | # List of patterns, relative to source directory, that match files and 89 | # directories to ignore when looking for source files. 90 | # This patterns also effect to html_static_path and html_extra_path 91 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] 92 | 93 | # The reST default role (used for this markup: `text`) to use for all 94 | # documents. 95 | # 96 | # default_role = None 97 | 98 | # If true, '()' will be appended to :func: etc. cross-reference text. 99 | # 100 | # add_function_parentheses = True 101 | 102 | # If true, the current module name will be prepended to all description 103 | # unit titles (such as .. function::). 104 | # 105 | # add_module_names = True 106 | 107 | # If true, sectionauthor and moduleauthor directives will be shown in the 108 | # output. 
They are ignored by default. 109 | # 110 | # show_authors = False 111 | 112 | # The name of the Pygments (syntax highlighting) style to use. 113 | pygments_style = 'sphinx' 114 | 115 | # A list of ignored prefixes for module index sorting. 116 | # modindex_common_prefix = [] 117 | 118 | # If true, keep warnings as "system message" paragraphs in the built documents. 119 | # keep_warnings = False 120 | 121 | # If true, `todo` and `todoList` produce output, else they produce nothing. 122 | todo_include_todos = False 123 | 124 | 125 | # -- Options for HTML output ---------------------------------------------- 126 | 127 | # The theme to use for HTML and HTML Help pages. See the documentation for 128 | # a list of builtin themes. 129 | # 130 | html_theme = 'default' 131 | 132 | # Theme options are theme-specific and customize the look and feel of a theme 133 | # further. For a list of options available for each theme, see the 134 | # documentation. 135 | # 136 | # html_theme_options = {} 137 | 138 | # Add any paths that contain custom themes here, relative to this directory. 139 | # html_theme_path = [] 140 | 141 | # The name for this set of Sphinx documents. 142 | # " v documentation" by default. 143 | # 144 | # html_title = 'Flatline v1.0' 145 | 146 | # A shorter title for the navigation bar. Default is the same as html_title. 147 | # 148 | # html_short_title = None 149 | 150 | # The name of an image file (relative to this directory) to place at the top 151 | # of the sidebar. 152 | # 153 | # html_logo = None 154 | 155 | # The name of an image file (relative to this directory) to use as a favicon of 156 | # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 157 | # pixels large. 158 | # 159 | # html_favicon = None 160 | 161 | # Add any paths that contain custom static files (such as style sheets) here, 162 | # relative to this directory. 
They are copied after the builtin static files, 163 | # so a file named "default.css" will overwrite the builtin "default.css". 164 | html_static_path = [] 165 | 166 | # Add any extra paths that contain custom files (such as robots.txt or 167 | # .htaccess) here, relative to this directory. These files are copied 168 | # directly to the root of the documentation. 169 | # 170 | # html_extra_path = [] 171 | 172 | # If not None, a 'Last updated on:' timestamp is inserted at every page 173 | # bottom, using the given strftime format. 174 | # The empty string is equivalent to '%b %d, %Y'. 175 | # 176 | # html_last_updated_fmt = None 177 | 178 | # If true, SmartyPants will be used to convert quotes and dashes to 179 | # typographically correct entities. 180 | # 181 | # html_use_smartypants = True 182 | 183 | # Custom sidebar templates, maps document names to template names. 184 | # 185 | # html_sidebars = {} 186 | 187 | # Additional templates that should be rendered to pages, maps page names to 188 | # template names. 189 | # 190 | # html_additional_pages = {} 191 | 192 | # If false, no module index is generated. 193 | # 194 | # html_domain_indices = True 195 | 196 | # If false, no index is generated. 197 | # 198 | # html_use_index = True 199 | 200 | # If true, the index is split into individual pages for each letter. 201 | # 202 | # html_split_index = False 203 | 204 | # If true, links to the reST sources are added to the pages. 205 | # 206 | # html_show_sourcelink = True 207 | 208 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 209 | # 210 | # html_show_sphinx = True 211 | 212 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 213 | # 214 | # html_show_copyright = True 215 | 216 | # If true, an OpenSearch description file will be output, and all pages will 217 | # contain a tag referring to it. The value of this option must be the 218 | # base URL from which the finished HTML is served. 
219 | # 220 | # html_use_opensearch = '' 221 | 222 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 223 | # html_file_suffix = None 224 | 225 | # Language to be used for generating the HTML full-text search index. 226 | # Sphinx supports the following languages: 227 | # 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja' 228 | # 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr', 'zh' 229 | # 230 | # html_search_language = 'en' 231 | 232 | # A dictionary with options for the search language support, empty by default. 233 | # 'ja' uses this config value. 234 | # 'zh' user can custom change `jieba` dictionary path. 235 | # 236 | # html_search_options = {'type': 'default'} 237 | 238 | # The name of a javascript file (relative to the configuration directory) that 239 | # implements a search results scorer. If empty, the default will be used. 240 | # 241 | # html_search_scorer = 'scorer.js' 242 | 243 | # Output file base name for HTML help builder. 244 | htmlhelp_basename = 'Flatlinedoc' 245 | 246 | # -- Options for LaTeX output --------------------------------------------- 247 | 248 | latex_elements = { 249 | # The paper size ('letterpaper' or 'a4paper'). 250 | # 251 | # 'papersize': 'letterpaper', 252 | 253 | # The font size ('10pt', '11pt' or '12pt'). 254 | # 255 | # 'pointsize': '10pt', 256 | 257 | # Additional stuff for the LaTeX preamble. 258 | # 259 | # 'preamble': '', 260 | 261 | # Latex figure (float) alignment 262 | # 263 | # 'figure_align': 'htbp', 264 | } 265 | 266 | # Grouping the document tree into LaTeX files. List of tuples 267 | # (source start file, target name, title, 268 | # author, documentclass [howto, manual, or own class]). 269 | latex_documents = [ 270 | (master_doc, 'Flatline.tex', 'Flatline Documentation', 271 | 'The BigML Team', 'manual'), 272 | ] 273 | 274 | # The name of an image file (relative to this directory) to place at the top of 275 | # the title page.
276 | # 277 | # latex_logo = None 278 | 279 | # For "manual" documents, if this is true, then toplevel headings are parts, 280 | # not chapters. 281 | # 282 | # latex_use_parts = False 283 | 284 | # If true, show page references after internal links. 285 | # 286 | # latex_show_pagerefs = False 287 | 288 | # If true, show URL addresses after external links. 289 | # 290 | # latex_show_urls = False 291 | 292 | # Documents to append as an appendix to all manuals. 293 | # 294 | # latex_appendices = [] 295 | 296 | # If false, will not define \strong, \code, \titleref, \crossref ... but only 297 | # \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added 298 | # packages. 299 | # 300 | # latex_keep_old_macro_names = True 301 | 302 | # If false, no module index is generated. 303 | # 304 | # latex_domain_indices = True 305 | 306 | 307 | # -- Options for manual page output --------------------------------------- 308 | 309 | # One entry per manual page. List of tuples 310 | # (source start file, name, description, authors, manual section). 311 | man_pages = [ 312 | (master_doc, 'flatline', 'Flatline Documentation', 313 | [author], 1) 314 | ] 315 | 316 | # If true, show URL addresses after external links. 317 | # 318 | # man_show_urls = False 319 | 320 | 321 | # -- Options for Texinfo output ------------------------------------------- 322 | 323 | # Grouping the document tree into Texinfo files. List of tuples 324 | # (source start file, target name, title, author, 325 | # dir menu entry, description, category) 326 | texinfo_documents = [ 327 | (master_doc, 'Flatline', 'Flatline Documentation', 328 | author, 'Flatline', 'A lispy language for filtering and generating dataset fields.', 329 | 'Miscellaneous'), 330 | ] 331 | 332 | # Documents to append as an appendix to all manuals. 333 | # 334 | # texinfo_appendices = [] 335 | 336 | # If false, no module index is generated.
337 | # 338 | # texinfo_domain_indices = True 339 | 340 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 341 | # 342 | # texinfo_show_urls = 'footnote' 343 | 344 | # If true, do not generate a @detailmenu in the "Top" node's menu. 345 | # 346 | # texinfo_no_detailmenu = False 347 | -------------------------------------------------------------------------------- /docs/quick-reference.rst: -------------------------------------------------------------------------------- 1 | Quick reference 2 | =============== 3 | 4 | Field accessors and properties 5 | ------------------------------ 6 | 7 | Access to input field values: 8 | 9 | .. code:: lisp 10 | 11 | (field [] []) 12 | (f [] []) 13 | (fields ... ) 14 | (random-field-value ) 15 | (weighted-random-field-value ) 16 | (ensure-value ) 17 | (ensure-weighted-value ) 18 | 19 | All fields in a row: 20 | 21 | .. code:: lisp 22 | 23 | (all) 24 | (all-but ... ) 25 | (all-with-defaults 26 | 27 | ... 28 | ) 29 | (all-with-numeric-default ["mean" "median" "minimum" "maximum" ] 30 | 31 | Row properties: 32 | 33 | .. code:: lisp 34 | 35 | (row-number) ;; current row number, 0-based 36 | 37 | Field properties: 38 | 39 | .. code:: lisp 40 | 41 | (bin-center ) ;; number 42 | (bin-count ) ;; number 43 | (category-count ) ;; number 44 | (maximum ) ;; number 45 | (mean ) ;; number 46 | (median ) ;; number 47 | (minimum ) ;; number 48 | (missing? []) ;; boolean 49 | (missing-count ) ;; number 50 | (preferred? ) ;; boolean 51 | (population ) ;; integer 52 | (sum ) ;; number 53 | (sum-squares ) ;; number 54 | (variance ) ;; number 55 | (standard-deviation ) ;; number 56 | 57 | Normalization: 58 | 59 | .. code:: lisp 60 | 61 | (normalize [ ]) ;; [from to] defaults to [0, 1] 62 | (z-score ) 63 | (log-normal ) 64 | 65 | Percentiles and population: 66 | 67 | .. code:: lisp 68 | 69 | (percentile ) ;; number 70 | (population-fraction ) ;; integer 71 | (within-percentiles? ) ;; boolean 72 | (percentile-label ... 
) 73 | 74 | Segments: 75 | 76 | .. code:: lisp 77 | 78 | (segment-label 79 | 80 | ... 81 | 82 | ) 83 | (segment-label ... ) 84 | 85 | Vectorize categorical and text fields: 86 | 87 | .. code:: lisp 88 | 89 | (vectorize []) 90 | 91 | Items: 92 | 93 | .. code:: lisp 94 | 95 | (contains-items? ... ) 96 | (equal-to-items? ... ) 97 | 98 | Regions: 99 | 100 | .. code:: lisp 101 | 102 | (region? ) 103 | (rename-region ) 104 | (add-region ) 105 | (add-region ) 106 | (remove-region ) 107 | (update-region ) 108 | (update-region ) 109 | ;; either a regions value or a regions field designator 110 | 111 | Clustering: 112 | 113 | .. code:: lisp 114 | 115 | (row-distance [ ]) 116 | (row-distance-squared [ ]) 117 | 118 | Strings and regular expressions 119 | ------------------------------- 120 | 121 | Conversion of any value to a string: 122 | 123 | .. code:: lisp 124 | 125 | (str ...) ;; string 126 | 127 | Substrings: 128 | 129 | .. code:: lisp 130 | 131 | (subs []) ;; string 132 | 133 | Regexps: 134 | 135 | .. code:: lisp 136 | 137 | (matches? ) ;; boolean 138 | (re-quote ) ;; regexp that matches literally 139 | (replace ) ;; string 140 | (replace-first ) ;; string 141 | 142 | Utilities: 143 | 144 | .. code:: lisp 145 | 146 | (length ) ;; integer 147 | (join ) ;; string 148 | (levenshtein ) ;; number 149 | (occurrences [ ]) ;; number 150 | (language ) ;; ["en", "es", "ca", "nl"] 151 | 152 | Hashing: 153 | 154 | .. code:: lisp 155 | 156 | (md5 ) ;; string of length 32 157 | (sha1 ) ;; string of length 40 158 | (sha256 ) ;; string of length 64 159 | 160 | Math and logic 161 | -------------- 162 | 163 | Arithmetic operators: 164 | 165 | .. code:: lisp 166 | 167 | + - * / div mod 168 | 169 | Relational operators: 170 | 171 | .. code:: lisp 172 | 173 | < <= > >= = != 174 | 175 | Logical operators: 176 | 177 | .. code:: lisp 178 | 179 | and or not 180 | 181 | Mathematical functions: 182 | 183 | .. code:: lisp 184 | 185 | (zero? ) 186 | (even? ) 187 | (odd? 
) 188 | (abs ) ;; Absolute value 189 | (acos ) 190 | (asin ) 191 | (atan ) 192 | (ceil ) 193 | (cos ) ;; := radians 194 | (cosh ) 195 | (exp ) ;; Exponential 196 | (floor ) 197 | (ln ) ;; Natural logarithm 198 | (log ) ;; Natural logarithm 199 | (log2 ) ;; Base-2 logarithm 200 | (log10 ) ;; Base-10 logarithm 201 | (max ... ) 202 | (min ... ) 203 | (mod ) ;; Modulus 204 | (div ) ;; Integer division (quotient) 205 | (pow ) 206 | (rand) ;; a random double in [0, 1) 207 | (rand-int ) ;; a random integer in [0, n) or (n, 0] 208 | (round ) 209 | (sin ) ;; := radians 210 | (sinh ) 211 | (sqrt ) 212 | (square ) ;; (* ) 213 | (tan ) ;; := radians 214 | (tanh ) 215 | (to-degrees ) ;; := radians 216 | (to-radians ) ;; := degrees 217 | (spherical-distance ) ;; args in radians 218 | (spherical-distance-deg ) ;; args in degrees 219 | (linear-regression ... ) ;; slope, intercept, pearson 220 | (chi-square-p-value ) 221 | 222 | 223 | Fuzzy logic 224 | ----------- 225 | 226 | Basic t-norms: 227 | 228 | .. code:: lisp 229 | 230 | (tnorm-min ) ;; Minimum t-norm. Also called the Gödel t-norm. 231 | (tnorm-product ) ;; Product t-norm. The ordinary product of real numbers. 232 | (tnorm-lukasiewicz ) ;; Łukasiewicz t-norm. 233 | (tnorm-drastic ) ;; Drastic t-norm 234 | (tnorm-nilpotent-min ) ;; Nilpotent minimum t-norm 235 | 236 | T-conorms: 237 | 238 | .. code:: lisp 239 | 240 | (tconorm-max ) ;; Maximum t-conorm. Dual to the minimum t-norm, it is the smallest t-conorm. 241 | (tconorm-probabilistic ) ;; Probabilistic t-conorm. It's dual to the product t-norm. 242 | (tconorm-bounded ) ;; Bounded t-conorm. It's dual to the Łukasiewicz t-norm. 243 | (tconorm-drastic ) ;; Drastic t-conorm. It's dual to the drastic t-norm. 244 | (tconorm-nilpotent-max ) ;; Nilpotent maximum t-conorm. It's dual to the nilpotent minimum. 245 | (tconorm-einstein-sum ) ;; Einstein t-conorm. It's dual to one of the Hamacher t-norms. 246 | 247 | Parametric t-conorms: 248 | 249 | ..
code:: lisp 250 | 251 | (tconorm-max ) ;; Maximum t-conorm. Dual to the minimum t-norm, it is the smallest t-conorm. 252 | (tconorm-probabilistic ) ;; Probabilistic t-conorm. It's dual to the product t-norm. 253 | (tconorm-bounded ) ;; Bounded t-conorm. It's dual to the Łukasiewicz t-norm. 254 | (tconorm-drastic ) ;; Drastic t-conorm. It's dual to the drastic t-norm. 255 | (tconorm-nilpotent-max ) ;; Nilpotent maximum t-conorm. It's dual to the nilpotent minimum. 256 | (tconorm-einstein-sum ) ;; Einstein t-conorm. It's dual to one of the Hamacher t-norms. 257 | 258 | Coercions 259 | --------- 260 | 261 | .. code:: lisp 262 | 263 | (integer ) ;; integer 264 | (real ) ;; real 265 | ;; (integer true) = 1, (integer false) = 0 266 | 267 | Dates and time 268 | -------------- 269 | 270 | Functions taking a number representing the *epoch*, i.e., the number of 271 | **milliseconds** since Jan 1st, 1970. 272 | 273 | .. code:: lisp 274 | 275 | (epoch-year ) ;; number 276 | (epoch-month ) ;; number 277 | (epoch-week ) ;; number 278 | (epoch-day ) ;; number 279 | (epoch-weekday ) ;; number 280 | (epoch-hour ) ;; number 281 | (epoch-minute ) ;; number 282 | (epoch-second ) ;; number 283 | (epoch-millisecond ) ;; number 284 | (epoch-fields ) ;; list of numbers 285 | 286 | Any string can be coerced to an epoch: 287 | 288 | .. code:: lisp 289 | 290 | (epoch []) 291 | 292 | Conditionals and local variables 293 | -------------------------------- 294 | 295 | Conditionals: 296 | 297 | .. code:: lisp 298 | 299 | (if []) 300 | 301 | (cond 302 | 303 | ... ... 304 | ) 305 | 306 | For example: 307 | 308 | .. code:: lisp 309 | 310 | (cond (> (f "000001") (mean "000001")) "above average" 311 | (< (f "000001") (mean "000001")) "below average" 312 | "mediocre") 313 | 314 | Local variables: 315 | 316 | .. code:: lisp 317 | 318 | (let ) 319 | := ( ... ) 320 | := 321 | 322 | For example: 323 | 324 | ..
code:: lisp 325 | 326 | (let (x (+ (window "a" -10 10)) 327 | a (/ (* x 3) 4.34) 328 | y (if (< a 10) "Good" "Bad")) 329 | (list x (str (f 10) "-" y) a y)) 330 | 331 | Lists 332 | ----- 333 | 334 | Creation and element access: 335 | 336 | .. code:: lisp 337 | 338 | (list ... ) ;; list of given values 339 | (cons ) ;; list 340 | (head ) ;; first element 341 | (tail ) ;; list sans first element 342 | (nth ) ;; 0-based nth element 343 | (take ) ;; take first elements 344 | (drop ) ;; drop first elements 345 | (drop ) ;; elements in range [from to) 346 | 347 | Inclusion: 348 | 349 | .. code:: lisp 350 | 351 | (in ) ;; boolean 352 | 353 | Properties of lists: 354 | 355 | .. code:: lisp 356 | 357 | (count ) ;; (count (list (f 1) (f 2))) => 2 358 | (max ) ;; (max (list -1 2 -2 0.38)) => 2 359 | (min ) ;; (min (list -1.3 2 1)) => -1.3 360 | (avg ) ;; (avg (list -1 -2 1 2 0.8 -0.8)) => 0 361 | (list-median ) ;; (list-median (list -1 -2 1 2 0.8 -0.8)) => 1 362 | (mode ) ;; (mode (list a b b c b a c c c)) => "c" 363 | 364 | List transformations: 365 | 366 | .. code:: lisp 367 | 368 | (map (list ... )) 369 | (filter (list ... )) 370 | (reverse ) 371 | (sort ) ;; sorts, in increasing order, a list of values 372 | 373 | Field lists and windows: 374 | 375 | .. code:: lisp 376 | 377 | (fields ... ) 378 | (window []) 379 | (diff-window ) ;; differences of consecutive values 380 | (cond-window ) ;; values that satisfy boolean sexp 381 | ;; sum of values 382 | (window-sum []) 383 | ;; mean of values 384 | (window-mean []) 385 | ;; mode of values 386 | (window-mode []) 387 | ;; median of values 388 | (window-median []) 389 | 390 | 391 | Accumulating values in cells 392 | ---------------------------- 393 | 394 | ..
code:: lisp 395 | 396 | (cell ) 397 | (set-cell ) 398 | -------------------------------------------------------------------------------- /python/notebooks/Flatline.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Local Flatline Interpreter" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from flatline.interpreter import Interpreter" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "We create a new local interpreter, that will use *nodejs* under the rug" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "interpreter = Interpreter()" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Available functions\n", 40 | "\n", 41 | "We can query the interpreter for all the built-in functions provided by flatline" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "interpreter.defined_functions()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Checking symbolic expressions" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "The interpreter can check for us whether a Lisp or JSON s-expression is correct." 
65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "### Valid constant expressions" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "Lisp s-expressions are represented as strings:" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "interpreter.check_lisp('(+ 1 2)')" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "JSON expressions are represented as Python lists of native values" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "interpreter.check_json([\"+\", [\"*\", 3, 5]])" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "### Some erroneous symbolic expressions" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "interpreter.check_lisp('(+ 2')" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "interpreter.check_json([\"non-existent\", 3])" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "interpreter.check_json([\"+\", 1, \"3\"])" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "interpreter.check_lisp('(f 0)')" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### Checking expressions that depend on input dataset fields" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "The latest sexp was 
invalid because no dataset is known, and hence there's no \"field 0\".\n", 161 | "\n", 162 | "Let's create a mock dataset to tell the interpreter what are our fields:" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "mock_dataset = {'dataset':{'fields': Interpreter.infer_fields([1, 'a'])}}\n", 172 | "mock_dataset['dataset']['fields']" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Now the checks referring to those fields will pass:" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "interpreter.check_lisp('(field 0)', dataset=mock_dataset)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "interpreter.check_json([\"f\", \"000001\"], dataset=mock_dataset)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "Note how the two last expressions have no associated value, because they depend on the concrete input rows to which they're applied (i.e., these expressions do not represent constant values)." 
205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "## Applying symbolic expressions" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "We can apply valid symbolic expressions to local rows represented as lists of native Python values:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "test_rows = [[1, 'a'], [2, 'b'], [23, 'd']]\n", 228 | "interpreter.apply_lisp('(fields 1 0)', test_rows)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "interpreter.apply_lisp('(list (+ 2 (f 0)) (- (f 0) (f 0 -1)))', test_rows)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "interpreter.apply_json([\"window\", \"000001\", -1, 1], test_rows)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "In these examples, the field characteristics are guessed from the given values. Guessing is useful for quick tests, but in real cases we should provide real dataset metadata to the apply functions." 
254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "# Extended example using remote resources" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "from bigml.api import BigML\n", 270 | "from flatline.sampler import Sampler" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "api = BigML()" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "We start by creating a dataset from Quandl's Apple NASDAQ data" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "source = api.create_source('https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv', {'name':'Flatline tests'})\n", 296 | "api.ok(source)" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "dataset = api.create_dataset(source)\n", 306 | "dataset_id = dataset['resource']\n", 307 | "api.ok(dataset)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "And download a sample of its rows locally, using a *Sampler* object" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "sampler = Sampler()" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "*Sampler*, like *Interpreter*, is an abstraction above the building blocks provided by the API bindings, and takes care internally of waiting for resource completion and other housekeeping (that's why we don't need `api.ok()` calls here)."
331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "sampler.take_sample(dataset_id, size=5)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "These are the rows that we have downloaded locally (plus all the associated metadata)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "sampler.rows()" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "The sampler also keeps information on the dataset and sample metadata; e.g. the field descriptors:" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "Now we can apply locally Flatline expressions and check whether they produce sensible results. \n", 379 | "\n", 380 | "For instance, we could normalize **Low**, **High** and **Volume**, dividing them by their mean value in the original dataset. 
\n", 381 | "\n", 382 | "Let's define an auxiliary function to generate the corresponding Flatline JSON s-expressions:" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": null, 388 | "metadata": {}, 389 | "outputs": [], 390 | "source": [ 391 | "def norm_field(name):\n", 392 | " return [\"/\", [\"field\", name], [\"abs\", [\"mean\", name]]]\n", 393 | "\n", 394 | "norm_field('High')" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": {}, 400 | "source": [ 401 | "We can use the interpreter to check the format and syntax of our generated code:" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [ 410 | "def print_as_lisp(json_sexp):\n", 411 | " print interpreter.json_to_lisp(json_sexp)\n", 412 | " \n", 413 | "print_as_lisp(norm_field('Low'))" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "To generate more than one value, we wrap the list of field expressions in a `list` form:" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [ 429 | "def make_list(*fields):\n", 430 | " res = ['list']\n", 431 | " res.extend(fields)\n", 432 | " return res\n", 433 | " \n", 434 | "norm_fields = make_list(norm_field('Low'), norm_field('High'), norm_field('Volume'))\n", 435 | "print_as_lisp(norm_fields)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "And now let's check that the syntax is in fact correct:" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "interpreter.check_json(norm_fields, dataset['object'])" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "Our lisp expression seems correct, and 
produces three numeric values. We can apply it to our sample rows and confirm that the outputs are in fact what we expect:" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": {}, 465 | "outputs": [], 466 | "source": [ 467 | "sampler.apply_json(norm_fields)" 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": {}, 473 | "source": [ 474 | "Looks good so far. Let's say we want to predict whether the stock will go up or down based on the Open and Close values of the **previous day** and today's Open value. We can access the value of a previous row with `(field name -1)`:" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": null, 480 | "metadata": {}, 481 | "outputs": [], 482 | "source": [ 483 | "def previous_day(name):\n", 484 | " return [\"field\", name, -1]\n", 485 | "\n", 486 | "open_close_fields = make_list(previous_day('Open'), \n", 487 | " previous_day('Close'))\n", 488 | "\n", 489 | "print_as_lisp(open_close_fields)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "Let's check it's a good Flatline expression and see how it works on our local sample:" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "interpreter.check_json(open_close_fields, dataset=dataset['object'])" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "metadata": {}, 512 | "outputs": [], 513 | "source": [ 514 | "sampler.apply_json(open_close_fields)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "Note how the entries for the previous day Open and Close values are `None` in the first row, since there's no previous day!\n", 522 | "\n", 523 | "Finally, let's define our objective field, **UpOrDown**:" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | 
"execution_count": null, 529 | "metadata": {}, 530 | "outputs": [], 531 | "source": [ 532 | "up_or_down = '(if (> (f \"Open\") (f \"Close\")) \"down\" \"up\")'\n", 533 | "interpreter.check_lisp(up_or_down, dataset=dataset['object'])" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": null, 539 | "metadata": {}, 540 | "outputs": [], 541 | "source": [ 542 | "sampler.apply_lisp(up_or_down)" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "Once we're happy with our transformations, we ask BigML to create the new fields over the entire dataset" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": null, 555 | "metadata": {}, 556 | "outputs": [], 557 | "source": [ 558 | "norm_fields_sexp = interpreter.json_to_lisp(norm_fields)\n", 559 | "open_close_sexp = interpreter.json_to_lisp(open_close_fields)\n", 560 | "\n", 561 | "extended_dataset = api.create_dataset(dataset, {'new_fields':[{'field':norm_fields_sexp, 'names':['NLow', 'NHigh', 'NVol']},\n", 562 | " {'field':open_close_sexp, 'names':['Open-1', 'Close-1']},\n", 563 | " {'field':up_or_down, 'name': 'Up or down'}]})\n", 564 | "api.ok(extended_dataset)" 565 | ] 566 | }, 567 | { 568 | "cell_type": "markdown", 569 | "metadata": {}, 570 | "source": [ 571 | "and we confirm that the new dataset has indeed the new columns:" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": {}, 578 | "outputs": [], 579 | "source": [ 580 | "sampler.take_sample(extended_dataset['resource'], size=3)\n", 581 | "[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "metadata": {}, 588 | "outputs": [], 589 | "source": [ 590 | "sampler.rows()" 591 | ] 592 | } 593 | ], 594 | "metadata": { 595 | "kernelspec": { 596 | "display_name": "Python 2", 597 | "language": "python", 598 | 
"name": "python2" 599 | }, 600 | "language_info": { 601 | "codemirror_mode": { 602 | "name": "ipython", 603 | "version": 2 604 | }, 605 | "file_extension": ".py", 606 | "mimetype": "text/x-python", 607 | "name": "python", 608 | "nbconvert_exporter": "python", 609 | "pygments_lexer": "ipython2", 610 | "version": "2.7.15+" 611 | } 612 | }, 613 | "nbformat": 4, 614 | "nbformat_minor": 1 615 | } 616 | -------------------------------------------------------------------------------- /docs/user-manual.rst: -------------------------------------------------------------------------------- 1 | Flatline user manual 2 | ==================== 3 | 4 | S-expressions vs. JSON 5 | ---------------------- 6 | 7 | Flatline expressions in this manual use its lisp-like syntax, based on 8 | `symbolic expressions `__ or 9 | *sexps*. When sending them to BigML via our API, you can also use their 10 | JSON representation, which is trivially obtained by using JSON lists for 11 | each paranthesised sexp. For instance: 12 | 13 | :: 14 | 15 | (if (< (f "a") 3) 0 4) => ["if", ["<", ["f", "a"], 3], 0, 4] 16 | 17 | Literal values 18 | -------------- 19 | 20 | Constant numbers, symbols, booleans and strings, using Java/Clojure 21 | syntax are valid expressions. 22 | 23 | Examples: 24 | 25 | .. code:: lisp 26 | 27 | 1258 28 | 2.349 29 | this-is-a-symbol 30 | "a string" 31 | true 32 | false 33 | 34 | Counters 35 | -------- 36 | 37 | While running over an input dataset, Flatline keeps track of the 38 | (zero-based) number of the input row that's being used, which can be 39 | accessed with the function ``row-number``, which takes no arguments: 40 | 41 | :: 42 | 43 | (row-number) => current input row (0-based) 44 | 45 | A typical use of this function is to generate a unique identifier for 46 | each row. The row number will start at 0 unless you skip some rows of 47 | the input dataset, and increase by one on each new row (unless you're 48 | specifying a input row step when generating a dataset). 
49 | 50 | Field accessors 51 | --------------- 52 | 53 | Field values 54 | ~~~~~~~~~~~~ 55 | 56 | Input field values are accessed using the ``field`` operator: 57 | 58 | :: 59 | 60 | (field <field-designator> [<shift>] [<out-of-range-default>]) 61 | <field-designator> := field id | field name | column_number 62 | <shift> := integer expression 63 | <out-of-range-default> := output value if the requested row is out-of-range 64 | 65 | where ``<field-designator>`` can be either the identifier, name or 66 | column number of the desired field, and the optional ``<shift>`` (an 67 | integer, defaulting to 0) denotes the offset with respect to the 68 | current input row. The optional ``<out-of-range-default>`` is the output 69 | value if the value of the field for the given row (taking into account 70 | the shift, if any) is outside the limits of our dataset. It can be a 71 | constant value or an expression. If ``<out-of-range-default>`` is not set, 72 | the accessor will return a missing value in those cases. 73 | 74 | So, for instance, these sexps denote field values extracted from the 75 | current row: 76 | 77 | .. code:: lisp 78 | 79 | (field 0) 80 | (field 0 0) 81 | (field 0 -1 "default-string") 82 | (field 0 -1 (mean (field 0))) 83 | (field 0 -1 3) 84 | (field "000004") 85 | (field "a field name" 0) 86 | 87 | while 88 | 89 | .. code:: lisp 90 | 91 | (field "000001" -2) 92 | 93 | denotes the value of the cell corresponding to a field with identifier 94 | "000001" two rows *before* the current one. Positive shift values denote 95 | rows after the current one. 96 | 97 | .. code:: lisp 98 | 99 | (field "a field" 3) 100 | (field "another field" 2) 101 | 102 | For convenience, and since ``field`` is probably going to be your most 103 | often used operator, it can be abbreviated to ``f``: 104 | 105 | .. code:: lisp 106 | 107 | (f "000001" -2) 108 | (f 3 1) 109 | (f 1 -1 3) 110 | (f "a field" 23) 111 | 112 | We also provide a predicate, ``missing?``, that will tell you whether 113 | the value of the field for the given row (taking into account the 114 | shift, if any) is a missing token: 115 | 116 | :: 117 | 118 | (missing? <field-designator> [<shift>]) 119 | 120 | E.g.: 121 | 122 | .. code:: lisp 123 | 124 | (missing? "species") 125 | (missing? "000001" -2) 126 | (missing? 3 1) 127 | (missing? "a field" 23) 128 | 129 | will all yield boolean values. For backwards compatibility, ``missing`` 130 | is an alias for ``missing?``. 131 | 132 | Randomized field values 133 | ~~~~~~~~~~~~~~~~~~~~~~~ 134 | 135 | There are two Flatline functions that will let you generate a random 136 | value in the domain of a given field, given its designator: 137 | 138 | :: 139 | 140 | (random-value <field-designator>) 141 | (weighted-random-value <field-designator>) 142 | 143 | e.g. 144 | 145 | .. code:: lisp 146 | 147 | (random-value "age") 148 | (weighted-random-value "000001") 149 | (weighted-random-value 3) 150 | 151 | Both functions generate a value with the constraint that it belongs to 152 | the domain of the given field, but while ``random-value`` uses a uniform 153 | distribution over the field's range of values, ``weighted-random-value`` 154 | uses the distribution of the field values (as computed in its histogram) 155 | as the probability measure for the random generator. 156 | 157 | These two functions work for numeric, categorical and text fields, with 158 | generated values satisfying: 159 | 160 | - For numeric fields, generated values are in the interval 161 | ``[(minimum <field-designator>), (maximum <field-designator>)]`` 162 | - For categorical fields, generated values belong to the set 163 | ``(categories <field-designator>)`` 164 | - For text fields, we generate terms in the field's tag cloud 165 | (generated values correspond to single terms in the cloud). 166 | - Datetime **parent** fields are not supported, since they don't have a 167 | defined distribution: you can use any of their numeric children for 168 | generating values following their distributions. 169 | 170 | A common use of these functions is replacing missing values with random 171 | data, which in Flatline you could write as, say: 172 | 173 | .. code:: lisp 174 | 175 | (if (missing? 
"00000") (random-value "000000") (f "000000")) 176 | 177 | We provide a shortcut for those common operations with the functions 178 | ``ensure-value`` and ``ensure-weighted-value``: 179 | 180 | :: 181 | 182 | (ensure-value ) := 183 | (if (missing? ) (random-value ) (field )) 184 | 185 | (ensure-weighted-value ) := 186 | (if (missing? ) (weighted-random-value ) (field )) 187 | 188 | We them, our example above can be simply written as: 189 | 190 | .. code:: lisp 191 | 192 | (ensure-value "000000") 193 | 194 | or, if you want that the generated random values follow the same 195 | distribution as the field "000000": 196 | 197 | .. code:: lisp 198 | 199 | (ensure-weighted-value "000000") 200 | 201 | Normalized field values 202 | ~~~~~~~~~~~~~~~~~~~~~~~ 203 | 204 | For numeric fields, it's often useful to normalize their values to a 205 | standard interval (usually [0, 1]). To that end, you can use the 206 | Flatline primitive ``normalize``, which takes as arguments the 207 | designator for the field you want to normalize and, optionally, the two 208 | bounds of the resulting interval: 209 | 210 | :: 211 | 212 | (normalize [ ]) 213 | => (+ from (* (- to from) 214 | (/ (- (f id) (minimum id)) 215 | (- (maximum id) (minimum id))))) 216 | 217 | For instance: 218 | 219 | .. code:: lisp 220 | 221 | (normalize "000001") ;; = (normalize "000001" 0 1) 222 | (normalize "width" -1 1) 223 | (normalize "length" 8 23) 224 | 225 | As shown in the formula above, ``normalize`` linearly maps the minimum 226 | value of the field to ``from`` (0 by default) and the maximum value to 227 | ``to`` (1 by default). 
228 | 229 | Besides this linear normalization, it's also common to standardize 230 | numeric data values by mapping them to a gaussian, according to the 231 | equation: 232 | 233 | :: 234 | 235 | x[i] -> (x[i] - mean(x)) / standard_deviation(x) 236 | 237 | or, in Flatline terms: 238 | 239 | :: 240 | 241 | (/ (- (f <id>) (mean <id>)) (standard-deviation <id>)) 242 | 243 | This normalization function is called the Z score, and we provide it as 244 | the function ``z-score``: 245 | 246 | :: 247 | 248 | (z-score <id>) 249 | 250 | E.g.: 251 | 252 | .. code:: lisp 253 | 254 | (z-score "000034") 255 | (z-score "a numeric field") 256 | (z-score 23) 257 | 258 | As with ``normalize``, the field used must have a numeric optype. 259 | 260 | You can use the function ``log-normal`` to apply ``z-score`` to the 261 | logarithm of your field. This is useful when your field follows a 262 | log-normal distribution and you want to map it to a gaussian. 263 | 264 | 265 | .. code:: lisp 266 | 267 | (log-normal "000003") 268 | (log-normal "a numeric field") 269 | (log-normal 1) 270 | 271 | 272 | This function requires numeric fields with at least 80% of the 273 | values greater than 0 and a non-zero mean value. 274 | 275 | Vectorized categorical or text fields 276 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 277 | 278 | It may be useful to convert categorical or text fields to numeric values 279 | for models which accept only numeric data as input. This can be 280 | accomplished with the Flatline primitive ``vectorize``: 281 | 282 | :: 283 | 284 | (vectorize <field-designator> [<n>]) 285 | 286 | For categorical fields, the output is a binary indicator vector. In 287 | other words, it is a list of numeric fields, one per possible 288 | categorical value, and for each instance, the numeric field 289 | corresponding to the category of that instance will have a value of 290 | ``1``, whereas the remaining numeric fields will have a value of ``0``.
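
The binary indicator encoding just described can be sketched in Python. This illustrates the encoding itself, not BigML's implementation; the ``species`` categories are made-up sample data:

```python
def one_hot(categories, value):
    """Binary indicator vector: 1 in the slot of the instance's
    category, 0 everywhere else."""
    return [1 if category == value else 0 for category in categories]

species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
print(one_hot(species, "Iris-versicolor"))  # => [0, 1, 0]
```

With ``vectorize``, each slot of this vector becomes its own generated numeric field in the output dataset.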
291 | 292 | For text fields, the output is a list of numeric fields, each 293 | corresponding to a term in the field's tag cloud. The value of each 294 | field is the number of times that term appears in that instance. 295 | 296 | A numeric expression or literal can be passed as an optional second 297 | argument to limit the number of generated fields to the *n* most 298 | frequent categories or text terms. 299 | 300 | Field properties 301 | ~~~~~~~~~~~~~~~~ 302 | 303 | Summary properties 304 | ^^^^^^^^^^^^^^^^^^ 305 | 306 | Field descriptors contain lots of properties with metadata about the 307 | field, including its summary. These properties (when they're atomic) can 308 | be accessed via ``field-prop``: 309 | 310 | :: 311 | 312 | (field-prop <type> <field-designator> <property-name> ...) 313 | <type> := string | numeric | boolean 314 | 315 | For instance, you can access the name for field "00023" via: 316 | 317 | .. code:: lisp 318 | 319 | (field-prop string "00023" name) 320 | 321 | or the value of the nested property missing\_count inside the summary 322 | with: 323 | 324 | .. code:: lisp 325 | 326 | (field-prop numeric "00023" summary missing_count) 327 | 328 | We provide several shortcuts for concrete summary properties, to save 329 | you typing: 330 | 331 | :: 332 | 333 | (maximum <field-designator>) 334 | (mean <field-designator>) 335 | (median <field-designator>) 336 | (minimum <field-designator>) 337 | (missing-count <field-designator>) 338 | (population <field-designator>) 339 | (sum <field-designator>) 340 | (sum-squares <field-designator>) 341 | (standard-deviation <field-designator>) 342 | (variance <field-designator>) 343 | 344 | (preferred? <field-designator>) 345 | 346 | (category-count <field-designator> <category>) 347 | (bin-center <field-designator> <bin-number>) 348 | (bin-count <field-designator> <bin-number>) 349 | 350 | As you can see, the category and bin accessors take an additional 351 | parameter designating either the category (a string or order number) or 352 | the bin (a 0-based integer index) you refer to: 353 | 354 | .. 
code:: lisp 355 | 356 | (category-count "species" "Iris-versicolor") 357 | (category-count "species" (f "000004")) 358 | (bin-count "age" (f "bin-selector")) 359 | (bin-center "000003" 3) 360 | (bin-center (field "field-selector") 4) 361 | 362 | Discretization of numeric fields 363 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 364 | 365 | A simple way to discretize a numeric field is to assign a label to each 366 | of a finite set of segments, defined by a sequence of upper bounds. For 367 | instance: 368 | 369 | .. code:: lisp 370 | 371 | (let (v (f "age")) 372 | (cond (< v 2) "baby" 373 | (< v 10) "child" 374 | (< v 20) "teenager" 375 | "adult")) 376 | 377 | Flatline provides a shortcut for the above expression via its 378 | ``segment-label`` primitive: 379 | 380 | .. code:: lisp 381 | 382 | (segment-label "000000" "baby" 2 "child" 10 "teenager" 20 "adult") 383 | 384 | As you can see, the first argument is the field designator (as usual, a 385 | name, column number or identifier), followed by alternating labels and 386 | upper bounds. More generally: 387 | 388 | :: 389 | 390 | (segment-label <fdes> <label-1> <bound-1> ... <label-n-1> <bound-n-1> <label-n>) 391 | <label-i> ... strings, <bound-i> ... numbers 392 | => (cond (< (f <fdes>) <bound-1>) <label-1> 393 | (< (f <fdes>) <bound-2>) <label-2> 394 | ... 395 | (< (f <fdes>) <bound-n-1>) <label-n-1> 396 | <label-n>) 397 | 398 | The alternating labels and bounds must be constant strings and numbers. 399 | If you want to use segments of equal length between the minimum and 400 | maximum value of the field, you can omit the upper bounds and give 401 | simply the list of labels, e.g. 402 | 403 | .. code:: lisp 404 | 405 | (segment-label 0 "1st fourth" "2nd fourth" "3rd fourth" "4th fourth") 406 | 407 | which would be equivalent to: 408 | 409 | .. code:: lisp 410 | 411 | (let (max (maximum 0) 412 | min (minimum 0) 413 | step (/ (- max min) 4)) 414 | (segment-label 0 "1st fourth" (+ min step) 415 | "2nd fourth" (+ min step step) 416 | "3rd fourth" (+ min step step step) 417 | "4th fourth")) 418 | 419 | or, in general: 420 | 421 | :: 422 | 423 | (segment-label <fdes> <label-1> ... <label-n>) with <label-i> ... 424 | strings 425 | => (let (min (minimum <fdes>) 426 | step (/ (- (maximum <fdes>) min) n) 427 | shift (- (f <fdes>) min)) 428 | (cond (< shift step) <label-1> 429 | (< shift (* 2 step)) <label-2> 430 | ... 431 | (< shift (* (- n 1) step)) <label-n-1> 432 | <label-n>)) 433 | 434 | Items and itemsets 435 | ^^^^^^^^^^^^^^^^^^ 436 | 437 | A common operation on fields of optype *items* is to check whether they 438 | contain a list of items. That can be used, for instance, to filter the 439 | rows of a dataset that satisfy a given association rule, by calling 440 | ``contains-items?`` with the list of items in the antecedent and 441 | consequent of the desired rule. 442 | 443 | :: 444 | 445 | (contains-items? <field-designator> <item-0> ... <item-n>) 446 | ;; with <item-i> of type string for i in [0, n] 447 | 448 | The ``contains-items?`` primitive takes as first argument the descriptor 449 | of the field we want to check (which must have optype items), followed 450 | by one or more items we want to check, which must all have type 451 | string. For instance, the predicate: 452 | 453 | .. code:: lisp 454 | 455 | (contains-items? "000000" "blue" "green" "darkblue") 456 | 457 | will filter the rows whose first column satisfies the association rule 458 | ``blue, green -> darkblue``. 459 | 460 | It is also possible to check whether an items field contains *only* the 461 | given list of items (in any order), using ``equal-to-items?``, which 462 | works exactly as ``contains-items?`` except for the fact that it's 463 | exclusive: 464 | 465 | :: 466 | 467 | (equal-to-items? <field-designator> <item-0> ... <item-n>) 468 | ;; with <item-i> of type string for i in [0, n] 469 | 470 | 471 | Regions 472 | ^^^^^^^ 473 | 474 | It is possible to manipulate and modify values of type *regions*. In 475 | Flatline, a regions value is a list of lists. Each of the inner lists 476 | has 5 elements. The first one is the label of the region (a string), 477 | and it's followed by four integers, which are the coordinates of the 478 | top-left corner and bottom-right corner of the region at hand.
The 479 | ``region?`` primitive checks whether a list represents a valid 480 | region (checking also that the vertex coordinates are consistent): 481 | 482 | .. code:: lisp 483 | 484 | (region? (list "label" 10 10 20 30)) ;; => true 485 | (region? (list 10 10 20 30)) ;; => false 486 | (region? (list -10 10 -20 30)) ;; => false 487 | 488 | When we access a field of type regions, the returned value will be a 489 | list with all its values satisfying the ``region?`` predicate. We 490 | can add a new region to it with ``add-region``: 491 | 492 | :: 493 | 494 | (add-region <regions> <region>) 495 | (add-region