├── python ├── flatline │ ├── __init__.py │ ├── flatline-node.js │ ├── sampler.py │ └── interpreter.py ├── tests │ ├── __init__.py │ └── flatline_tests.py ├── setup.py ├── README.md └── notebooks │ └── Flatline.ipynb ├── js └── demo │ ├── flatline.js │ ├── styles.css │ └── index.html ├── docs ├── requirements.txt ├── index.rst ├── Makefile ├── conf.py ├── quick-reference.rst └── user-manual.rst ├── .gitignore ├── .readthedocs.yaml ├── license └── readme.md /python/flatline/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /python/tests/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /js/demo/flatline.js: -------------------------------------------------------------------------------- 1 | ../flatline.js -------------------------------------------------------------------------------- /python/flatline/flatline-node.js: -------------------------------------------------------------------------------- 1 | ../../js/flatline-node.js -------------------------------------------------------------------------------- /docs/requirements.txt: -------------------------------------------------------------------------------- 1 | sphinx 2 | sphinx_rtd_theme==2.0.0 3 | recommonmark 4 | -------------------------------------------------------------------------------- /python/tests/flatline_tests.py: -------------------------------------------------------------------------------- 1 | from nose.tools import * 2 | import flatline 3 | 4 | def setup(): 5 | print("SETUP!") 6 | 7 | def teardown(): 8 | print("TEAR DOWN!") 9 | 10 | def test_basic(): 11 | print("I RAN!") 
12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | *.pyc 3 | *.swl 4 | *.swm 5 | *.swn 6 | *.swo 7 | *.swp 8 | *.log 9 | *.log.* 10 | dist/ 11 | .cache 12 | build 13 | *pip-log.txt 14 | *.egg-info 15 | *.egg 16 | *.coverage 17 | .tox/ 18 | set_credentials.sh 19 | docs/_build/* 20 | *~ 21 | /python/notebooks/.ipynb_checkpoints/ 22 | _build 23 | -------------------------------------------------------------------------------- /.readthedocs.yaml: -------------------------------------------------------------------------------- 1 | # .readthedocs.yaml 2 | # Read the Docs configuration file 3 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 4 | 5 | # Required 6 | version: 2 7 | 8 | # Set the version of Python and other tools you might need 9 | build: 10 | os: ubuntu-22.04 11 | tools: 12 | python: "3.12" 13 | 14 | # Build documentation in the docs/ directory with Sphinx 15 | sphinx: 16 | configuration: docs/conf.py 17 | 18 | # https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html 19 | python: 20 | install: 21 | - requirements: docs/requirements.txt 22 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | Flatline 2 | ======== 3 | 4 | Flatline is a lispy language for the specification of values to be 5 | extracted or generated from an input dataset, using a finite sliding 6 | window of input rows. 7 | 8 | In `BigML `__, it is used either as a row filter 9 | specifier or as a field generator. 10 | 11 | In the former case, the input consists of dataset rows on which a 12 | single, boolean expression is computed, and only those for which the 13 | result is true are kept in the output dataset. 
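The filtering semantics just described can be sketched in a few lines of plain Python. This is only a toy stand-in, not the real Flatline interpreter: the helper `f` mimics Flatline's `(f i)` field accessor (as seen in this project's docstrings), and the filter expression `(> (f 0) 2)` is an assumed example.

```python
# Toy stand-in for Flatline's row-filter semantics (NOT the real
# interpreter): a boolean expression is evaluated on every input row,
# and only the rows where it comes out true are kept.

def f(row, i):
    # Mimics Flatline's (f i) accessor for the current row.
    return row[i]

rows = [[1, "a"], [5, "b"], [3, "c"]]

# In the spirit of a hypothetical Flatline filter (> (f 0) 2):
kept = [row for row in rows if f(row, 0) > 2]
print(kept)  # [[5, 'b'], [3, 'c']]
```

The real interpreter additionally evaluates expressions over a finite sliding window of rows, which this single-row sketch ignores.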
14 | 15 | When used to generate new datasets from given ones, a list of Flatline 16 | expressions is provided, each one generating either a value or a list of 17 | values, which are then concatenated together to form the output rows 18 | (each value therefore representing a field in the generated dataset). 19 | 20 | .. toctree:: 21 | :maxdepth: 1 22 | 23 | user-manual 24 | quick-reference 25 | -------------------------------------------------------------------------------- /js/demo/styles.css: -------------------------------------------------------------------------------- 1 | html { 2 | font: 90%/1.3 arial,sans-serif; 3 | padding:1em; 4 | background:#B9C2CC; 5 | } 6 | 7 | form { 8 | background:#fff; 9 | padding:1em; 10 | border:1px solid #eee; 11 | } 12 | 13 | fieldset div { 14 | margin:0.3em 0.3em; 15 | clear:both; 16 | } 17 | 18 | form { 19 | margin:1em; 20 | width:30em; 21 | } 22 | 23 | label { 24 | float:none; 25 | display:block; 26 | clear:both; 27 | } 28 | 29 | legend { 30 | color:#0b77b7; 31 | font-size:1.4em; 32 | } 33 | 34 | legend span { 35 | /* width:10em; */ 36 | text-align:right; 37 | } 38 | 39 | fieldset { 40 | border:1px solid #ddd; 41 | padding:0 0.5em 0.5em; 42 | } 43 | 44 | .help { 45 | color: red; 46 | } 47 | 48 | .button { 49 | margin:10px; 50 | padding:5px; 51 | } 52 | 53 | textarea { 54 | width:25em; 55 | } 56 | 57 | .error { 58 | color:red; 59 | } 60 | 61 | .result { 62 | color:darkgreen; 63 | } 64 | -------------------------------------------------------------------------------- /license: -------------------------------------------------------------------------------- 1 | Copyright 2013-15 BigML, Inc 2 | 3 | Documentation in this repository is released under Creative Commons 4 | Attribution-ShareAlike 4.0 International License. 
5 | 6 | Code in this repository is licensed under the Apache License, version 7 | 2.0, copied below: 8 | 9 | ------------------------------------------------------------------------- 10 | Licensed under the Apache License, Version 2.0 (the "License"); you may 11 | not use this file except in compliance with the License. You may obtain 12 | a copy of the License at 13 | 14 | http://www.apache.org/licenses/LICENSE-2.0 15 | 16 | Unless required by applicable law or agreed to in writing, software 17 | distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 18 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 19 | License for the specific language governing permissions and limitations 20 | under the License. 21 | ------------------------------------------------------------------------- 22 | -------------------------------------------------------------------------------- /python/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup(name='flatline', 4 | description='Python bridge for the flatline javascript interpreter', 5 | author='jao', 6 | url='http://github.com/bigmlcom/flatline', 7 | download_url='http://github.com/bigmlcom/flatline', 8 | author_email='jao@bigml.com', 9 | version='0.1', 10 | license='Apache', 11 | install_requires=['PyExecJS', 'nose', 'bigml'], 12 | packages=['flatline'], 13 | package_data={'flatline': ['flatline-node.js']}, 14 | classifiers=[ 15 | 'Development Status :: 4 - Beta', 16 | 'Intended Audience :: Developers', 17 | 'License :: OSI Approved :: Apache Software License', 18 | 'Natural Language :: English', 19 | 'Operating System :: OS Independent', 20 | 'Programming Language :: Python', 21 | 'Programming Language :: Python :: 2', 22 | 'Programming Language :: Python :: 2.7', 23 | 'Programming Language :: Python :: 3', 24 | 'Topic :: Software Development :: Libraries :: Python Modules', 25 | ], 26 | scripts=[], 27 | 
use_2to3=True 28 | ) 29 | -------------------------------------------------------------------------------- /python/README.md: -------------------------------------------------------------------------------- 1 | # Flatline Python bridge 2 | 3 | This package provides a Python interface to the local JS Flatline 4 | interpreter, allowing you to check Flatline Lisp and JSON s-expressions 5 | for correctness and to apply them to local dataset samples to generate 6 | new fields. 7 | 8 | Typically, you will use the functions in this package to experiment on 9 | your computer with the data transformations and filters you plan to 10 | eventually execute on BigML servers, after you're satisfied with the 11 | results of your explorations on small data samples. 12 | 13 | ## Installation 14 | 15 | The bridge uses [nodejs](http://nodejs.org) under the hood, and hence 16 | needs it to be 17 | [installed in your system](https://nodejs.org/download/) as a 18 | prerequisite. 19 | 20 | With that in place, you can use the `setup.py` script to install this 21 | package in the usual way. For instance, 22 | 23 | ``` 24 | $ python setup.py develop 25 | ``` 26 | 27 | will perform an in-place installation, possibly in a local virtualenv 28 | (recommended): 29 | 30 | ``` 31 | $ virtualenv --distribute ~/.virtualenvs/flatline 32 | $ workon flatline 33 | $ python setup.py develop 34 | ``` 35 | 36 | ## Running the sample code in IPython 37 | 38 | We provide a [sample notebook](./notebooks/Flatline.ipynb) to 39 | illustrate the workings of this library. 
To use it, install 40 | [IPython and Jupyter](http://ipython.org) with `pip`: 41 | 42 | ``` 43 | $ pip install jupyter 44 | ``` 45 | 46 | then 47 | [set up your BIGML environment variables for authentication](https://bigml.readthedocs.org/en/latest/#authentication): 48 | 49 | ``` 50 | $ export BIGML_USERNAME= 51 | $ export BIGML_API_KEY= 52 | ``` 53 | 54 | and start the notebook server in the [notebooks](./notebooks) 55 | directory: 56 | 57 | ``` 58 | $ cd notebooks 59 | $ jupyter notebook Flatline.ipynb 60 | ``` 61 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | [![Documentation Status](https://readthedocs.org/projects/flatline/badge/?version=latest)](http://flatline.readthedocs.io/en/latest/?badge=latest) 2 | 3 | # Flatline, a language for data generation and filtering 4 | 5 | Flatline is a lispy language for the specification of values to be 6 | extracted or generated from an input dataset, using a finite sliding 7 | window of input rows. 8 | 9 | In BigML, it is used either as a row filter specifier or as a field 10 | generator. 11 | 12 | In the former case, the input consists of dataset rows on which a 13 | single, boolean expression is computed, and only those for which the 14 | result is true are kept in the output dataset. 15 | 16 | When used to generate new datasets from given ones, a list of Flatline 17 | expressions is provided, each one generating either a value or a list 18 | of values, which are then concatenated together to form the output 19 | rows (each value therefore representing a field in the generated 20 | dataset). 21 | 22 | ## Documentation 23 | 24 | - [Flatline's user manual](docs/user-manual.rst). 25 | - [Quick reference](docs/quick-reference.rst) with all pre-defined 26 | functions. 27 | - Or see the HTML version in 28 | [Read the Docs](http://flatline.readthedocs.io/en/latest/?badge=latest). 
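The field-generation behavior described above can be sketched in plain Python as well. Again, this is a toy illustration rather than the real interpreter; the three lambdas are hypothetical stand-ins for the Flatline expressions `(f 0)`, `(f 1)` and `(+ (f 0) (f 1))`.

```python
# Toy sketch (NOT the real interpreter) of Flatline field generation:
# each expression yields one value per input row, and the values are
# concatenated to form the corresponding output row.

rows = [[1, 2], [3, 4]]

# Hypothetical stand-ins for (f 0), (f 1) and (+ (f 0) (f 1)):
generators = [
    lambda row: row[0],
    lambda row: row[1],
    lambda row: row[0] + row[1],
]

output_rows = [[g(row) for g in generators] for row in rows]
print(output_rows)  # [[1, 2, 3], [3, 4, 7]]
```

Each generated value becomes a field of the output dataset, so here two input fields yield a three-field output row.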
29 | 30 | ## Local interpreters 31 | 32 | ### JavaScript and Node.js 33 | 34 | We include in [js](./js) Flatline interpreters implemented in 35 | JavaScript (compiled with ClojureScript from our canonical server-side 36 | implementation) that you can use from your browser or from a Node.js 37 | session. 38 | 39 | ### Python 40 | 41 | The [python directory](./python) contains a small Python library that 42 | wraps the Node.js interpreter and lets you interact with it using 43 | Python. See its [README](./python/README.md) for more information, 44 | including access to an IPython sample notebook. 45 | 46 | ## License 47 | 48 | Creative Commons License
Flatline reference documentation by BigML Inc is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. 49 | 50 | All code in this repository is released under the Apache License 2.0. 51 | -------------------------------------------------------------------------------- /python/flatline/sampler.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright (c) 2015 BigML, Inc 4 | # All rights reserved. 5 | 6 | 7 | """flatline.sampler 8 | 9 | Working locally with Flatline over dataset samples. 10 | 11 | :author: jao 12 | :date: Mon Apr 06, 2015 04:14 13 | 14 | """ 15 | 16 | from flatline import interpreter 17 | import bigml.api as api 18 | import os 19 | 20 | ## local testing 21 | import requests.packages.urllib3 22 | requests.packages.urllib3.disable_warnings() 23 | 24 | class Sampler: 25 | """The Sampler class automates the process of sampling a dataset. 26 | 27 | It works by downloading a subset of the dataset rows (using 28 | BigML's sample resources) and subsequently applying to them any 29 | desired Flatline generator. 30 | 31 | Example: 32 | 33 | sampler = Sampler() 34 | sampler.take_sample('dataset/54e374ab67dc09706d000283', size=4) 35 | sampler.apply_lisp('(+ (f 0) (f 1))') 36 | 37 | """ 38 | 39 | _interpreter = interpreter.Interpreter() 40 | 41 | def __init__(self, username=None, api_key=None, bigml=None): 42 | """Creates a new instance of a Sampler. 43 | 44 | A Sampler is an object able to connect to your BigML account, 45 | retrieve samples of datasets, and apply Flatline 46 | transformations to those local rows. Optionally, you can specify your 47 | api_key and username, or a bigml.api.BigML connection. 48 | Otherwise, we use the environment variables BIGML_USERNAME and 49 | BIGML_API_KEY. 
50 | 51 | """ 52 | if bigml is None: 53 | username = username or os.environ['BIGML_USERNAME'] 54 | api_key = api_key or os.environ['BIGML_API_KEY'] 55 | self._bigml = api.BigML(username=username, api_key=api_key) 56 | else: 57 | self._bigml = bigml 58 | self._sample = None 59 | 60 | def take_sample(self, dataset_id, size=10): 61 | """Given the corresponding dataset identifier, retrieve a sample of 62 | its rows with the requested size (number of rows). 63 | 64 | """ 65 | sample = self._bigml.create_sample(dataset_id) 66 | qs = "limit=-1&rows=%d" % size 67 | self._sample = self._bigml.check_resource(sample['resource'], 68 | query_string=qs) 69 | 70 | def sample(self): 71 | """Returns the full dictionary of properties of the current sample. 72 | 73 | Use 'take_sample' to update the current sample. 74 | """ 75 | if self._sample is None: 76 | return {} 77 | return self._sample['object']['sample'] 78 | 79 | def rows(self): 80 | """Returns a list of lists representing the current sample's rows. 81 | 82 | See 'take_sample' for updating the current sample and 'sample' 83 | for the full set of its properties. 84 | 85 | """ 86 | return self.sample().get('rows') 87 | 88 | def fields(self): 89 | """The list of field descriptors for the current sample. 90 | 91 | See 'take_sample' for updating the current sample and 'sample' 92 | for the full set of its properties. 93 | 94 | """ 95 | return self.sample().get('fields') 96 | 97 | def apply_lisp(self, sexp): 98 | """Applies the given lisp s-expression to the current sample's rows. 99 | 100 | On success, returns new rows generated by 'sexp', as a list of 101 | lists of native Python values. 'sexp' is a string. 102 | 103 | You can use 'rows' to retrieve the input rows used by this 104 | function. 105 | 106 | """ 107 | return self._interpreter.apply_lisp(sexp, self.rows(), self.sample()) 108 | 109 | def apply_json(self, json_sexp): 110 | """Applies a JSON s-expression to the current sample's rows. 
111 | 112 | The JSON s-expression must be represented as a Python list 113 | convertible to JSON, e.g. ["+", 1, ["f", "000000"]]. 114 | 115 | """ 116 | 117 | return self._interpreter.apply_json(json_sexp, 118 | self.rows(), 119 | self.sample()) 120 | -------------------------------------------------------------------------------- /python/flatline/interpreter.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | # Copyright (c) 2015 BigML, Inc 4 | # All rights reserved. 5 | 6 | 7 | """ 8 | flatline.interpreter 9 | 10 | User-level interface to the flatline JS interpreter. 11 | 12 | :author: jao 13 | :date: Sun Apr 05, 2015 01:40 14 | 15 | """ 16 | 17 | import execjs 18 | import pkg_resources 19 | 20 | class Interpreter: 21 | """A bridge to an underlying nodejs Flatline interpreter. 22 | 23 | This class uses execjs to launch a Nodejs interpreter that loads 24 | Flatline's javascript implementation and allows interaction via 25 | Python constructs. 26 | 27 | Example: 28 | 29 | inter = Interpreter() 30 | inter.check_lisp('(+ 1 2)') 31 | inter.check_json(["f", 0], dataset=dataset) 32 | 33 | """ 34 | 35 | __FLATJS = pkg_resources.resource_filename('flatline', 'flatline-node.js') 36 | __REQFLATJS = "f__ = require('%s')" % __FLATJS 37 | 38 | def __init__(self): 39 | self._interpreter = execjs.get("Node") 40 | self._context = self._interpreter.compile(Interpreter.__REQFLATJS) 41 | 42 | def __eval_in_flatline(self, fun, *args): 43 | return self._context.call('f__.flatline.%s' % fun, *args) 44 | 45 | @staticmethod 46 | def infer_fields(row): 47 | """Utility function generating a mock list of fields. 48 | 49 | Usually, checks and applications of Flatline expressions run 50 | in the context of a given dataset's field descriptors, but 51 | during testing it's sometimes useful to provide a mock set of 52 | them, based on the types of the values of the test input rows. 
53 | 54 | Example: 55 | 56 | In[1]: Interpreter.infer_fields([0, 'a label']) 57 | Out[1]: [{'column_number': 0, 58 | 'datatype': 'int64', 59 | 'id': '000000', 60 | 'optype': 'numeric'}, 61 | {'column_number': 1, 62 | 'datatype': 'string', 63 | 'id': '000001', 64 | 'optype': 'categorical'}] 65 | 66 | """ 67 | result = [] 68 | id = 0 69 | for v in row: 70 | t = type(v) 71 | optype = 'categorical' 72 | datatype = 'string' 73 | if t is int or t is float: 74 | optype = 'numeric' 75 | if t is float: 76 | datatype = 'float64' 77 | else: 78 | datatype = 'int64' 79 | result.append({'id': '%06x' % id, 80 | 'optype': optype, 81 | 'datatype': datatype, 82 | 'column_number': id}) 83 | id = id + 1 84 | return result 85 | 86 | @staticmethod 87 | def __dataset(dataset, rows): 88 | if dataset is None and len(rows) > 0: 89 | return {'fields': Interpreter.infer_fields(rows[0])} 90 | return dataset 91 | 92 | def defined_functions(self): 93 | """A list of the names of all defined Flatline functions""" 94 | return self.__eval_in_flatline('defined_primitives') 95 | 96 | def check_lisp(self, sexp, dataset=None): 97 | """Checks whether the given lisp s-expression is valid. 98 | 99 | Any operations referring to a dataset's fields will use the 100 | information found in the provided dataset, which should have 101 | the structure of the 'object' component of a BigML dataset 102 | resource. 103 | 104 | """ 105 | r = self.__eval_in_flatline('evaluate_sexp', sexp, dataset) 106 | r.pop(u'mapper', None) 107 | return r 108 | 109 | def check_json(self, json_sexp, dataset=None): 110 | """Checks whether the given JSON s-expression is valid. 111 | 112 | Works like `check_lisp` (which see), but taking a JSON 113 | expression represented as a native Python list instead of a 114 | Lisp sexp string. 
115 | 116 | """ 117 | r = self.__eval_in_flatline('evaluate_js', json_sexp, dataset) 118 | r.pop(u'mapper', None) 119 | return r 120 | 121 | def lisp_to_json(self, sexp): 122 | """ Auxiliary function transforming Lisp to Python representation.""" 123 | return self.__eval_in_flatline('sexp_to_js', sexp) 124 | 125 | def json_to_lisp(self, json_sexp): 126 | """ Auxiliary function transforming Python to lisp representation.""" 127 | return self.__eval_in_flatline('js_to_sexp', json_sexp) 128 | 129 | def apply_lisp(self, sexp, rows, dataset=None): 130 | """Applies the given Lisp sexp to a set of input rows. 131 | 132 | Input rows are represented as a list of lists of native Python 133 | values. If no dataset is provided, the field characteristics 134 | of the input rows are guessed using `infer_fields`. 135 | 136 | """ 137 | return self.__eval_in_flatline('eval_and_apply_sexp', 138 | sexp, 139 | Interpreter.__dataset(dataset, rows), 140 | rows) 141 | 142 | def apply_json(self, json_sexp, rows, dataset=None): 143 | """Applies the given JSON sexp to a set of input rows. 144 | 145 | As usual, JSON sexps are represented as Python lists, 146 | e.g. ["+", 1, 2]. 147 | 148 | Input rows are represented as a list of lists of native Python 149 | values. If no dataset is provided, the field characteristics 150 | of the input rows are guessed using `infer_fields`. 151 | 152 | """ 153 | return self.__eval_in_flatline('eval_and_apply_js', 154 | json_sexp, 155 | Interpreter.__dataset(dataset, rows), 156 | rows) 157 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 
11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 16 | 17 | .PHONY: help 18 | help: 19 | @echo "Please use \`make <target>' where <target> is one of" 20 | @echo " html to make standalone HTML files" 21 | @echo " dirhtml to make HTML files named index.html in directories" 22 | @echo " singlehtml to make a single large HTML file" 23 | @echo " pickle to make pickle files" 24 | @echo " json to make JSON files" 25 | @echo " htmlhelp to make HTML files and an HTML help project" 26 | @echo " qthelp to make HTML files and a qthelp project" 27 | @echo " applehelp to make an Apple Help Book" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " epub3 to make an epub3" 31 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 32 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 33 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 34 | @echo " text to make text files" 35 | @echo " man to make manual pages" 36 | @echo " texinfo to make Texinfo files" 37 | @echo " info to make Texinfo files and run them through makeinfo" 38 | @echo " gettext to make PO message catalogs" 39 | @echo " changes to make an overview of all changed/added/deprecated items" 40 | @echo " xml to make Docutils-native XML files" 41 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 42 | @echo " linkcheck to check all external links for integrity" 43 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 44 | @echo " coverage to run coverage check of the documentation (if enabled)" 45 | @echo " dummy to check syntax errors of document sources" 46 | 47 | .PHONY: clean 48 | clean: 49 | rm -rf 
$(BUILDDIR)/* 50 | 51 | .PHONY: html 52 | html: 53 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 54 | @echo 55 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 56 | 57 | .PHONY: dirhtml 58 | dirhtml: 59 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 60 | @echo 61 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 62 | 63 | .PHONY: singlehtml 64 | singlehtml: 65 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 66 | @echo 67 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 68 | 69 | .PHONY: pickle 70 | pickle: 71 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 72 | @echo 73 | @echo "Build finished; now you can process the pickle files." 74 | 75 | .PHONY: json 76 | json: 77 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 78 | @echo 79 | @echo "Build finished; now you can process the JSON files." 80 | 81 | .PHONY: htmlhelp 82 | htmlhelp: 83 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 84 | @echo 85 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 86 | ".hhp project file in $(BUILDDIR)/htmlhelp." 87 | 88 | .PHONY: qthelp 89 | qthelp: 90 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 91 | @echo 92 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 93 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 94 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Flatline.qhcp" 95 | @echo "To view the help file:" 96 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Flatline.qhc" 97 | 98 | .PHONY: applehelp 99 | applehelp: 100 | $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp 101 | @echo 102 | @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." 103 | @echo "N.B. You won't be able to view it unless you put it in" \ 104 | "~/Library/Documentation/Help or install it in your application" \ 105 | "bundle." 
106 | 107 | .PHONY: devhelp 108 | devhelp: 109 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 110 | @echo 111 | @echo "Build finished." 112 | @echo "To view the help file:" 113 | @echo "# mkdir -p $$HOME/.local/share/devhelp/Flatline" 114 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Flatline" 115 | @echo "# devhelp" 116 | 117 | .PHONY: epub 118 | epub: 119 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 120 | @echo 121 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 122 | 123 | .PHONY: epub3 124 | epub3: 125 | $(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3 126 | @echo 127 | @echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3." 128 | 129 | .PHONY: latex 130 | latex: 131 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 132 | @echo 133 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 134 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 135 | "(use \`make latexpdf' here to do that automatically)." 136 | 137 | .PHONY: latexpdf 138 | latexpdf: 139 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 140 | @echo "Running LaTeX files through pdflatex..." 141 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 142 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 143 | 144 | .PHONY: latexpdfja 145 | latexpdfja: 146 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 147 | @echo "Running LaTeX files through platex and dvipdfmx..." 148 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 149 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 150 | 151 | .PHONY: text 152 | text: 153 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 154 | @echo 155 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 156 | 157 | .PHONY: man 158 | man: 159 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 160 | @echo 161 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 
162 | 163 | .PHONY: texinfo 164 | texinfo: 165 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 166 | @echo 167 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 168 | @echo "Run \`make' in that directory to run these through makeinfo" \ 169 | "(use \`make info' here to do that automatically)." 170 | 171 | .PHONY: info 172 | info: 173 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 174 | @echo "Running Texinfo files through makeinfo..." 175 | make -C $(BUILDDIR)/texinfo info 176 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 177 | 178 | .PHONY: gettext 179 | gettext: 180 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 181 | @echo 182 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 183 | 184 | .PHONY: changes 185 | changes: 186 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 187 | @echo 188 | @echo "The overview file is in $(BUILDDIR)/changes." 189 | 190 | .PHONY: linkcheck 191 | linkcheck: 192 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 193 | @echo 194 | @echo "Link check complete; look for any errors in the above output " \ 195 | "or in $(BUILDDIR)/linkcheck/output.txt." 196 | 197 | .PHONY: doctest 198 | doctest: 199 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 200 | @echo "Testing of doctests in the sources finished, look at the " \ 201 | "results in $(BUILDDIR)/doctest/output.txt." 202 | 203 | .PHONY: coverage 204 | coverage: 205 | $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage 206 | @echo "Testing of coverage in the sources finished, look at the " \ 207 | "results in $(BUILDDIR)/coverage/python.txt." 208 | 209 | .PHONY: xml 210 | xml: 211 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 212 | @echo 213 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 
214 | 215 | .PHONY: pseudoxml 216 | pseudoxml: 217 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 218 | @echo 219 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 220 | 221 | .PHONY: dummy 222 | dummy: 223 | $(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy 224 | @echo 225 | @echo "Build finished. Dummy builder generates no files." 226 | -------------------------------------------------------------------------------- /js/demo/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Flatline calculator 5 | 6 | 7 | 8 |
[js/demo/index.html: the HTML markup was stripped during extraction, leaving only scattered fragments and line-number residue. The recoverable structure is a small "Flatline calculator" page with three fieldsets labeled Dataset, Lisp, and JSON; the rest of the file, including its inline script, is not recoverable.]
83 | 84 | 235 | 236 | 237 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # 4 | # Flatline documentation build configuration file, created by 5 | # sphinx-quickstart on Tue Jan 10 00:20:23 2017. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | # If extensions (or modules to document with autodoc) are in another directory, 17 | # add these directories to sys.path here. If the directory is relative to the 18 | # documentation root, use os.path.abspath to make it absolute, like shown here. 19 | # 20 | # import os 21 | # import sys 22 | # sys.path.insert(0, os.path.abspath('.')) 23 | 24 | from recommonmark.parser import CommonMarkParser 25 | 26 | # -- General configuration ------------------------------------------------ 27 | 28 | # If your documentation needs a minimal Sphinx version, state it here. 29 | # 30 | # needs_sphinx = '1.0' 31 | 32 | # Add any Sphinx extension module names here, as strings. They can be 33 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 34 | # ones. 35 | extensions = [ 36 | 'sphinx.ext.mathjax', 37 | ] 38 | 39 | # Add any paths that contain templates here, relative to this directory. 40 | templates_path = ['_templates'] 41 | 42 | # The suffix(es) of source filenames. 43 | # You can specify multiple suffix as a list of string: 44 | # 45 | source_suffix = ['.rst', '.md'] 46 | 47 | source_parsers = { 48 | '.md': CommonMarkParser, 49 | } 50 | 51 | # The encoding of source files. 
52 | # 53 | # source_encoding = 'utf-8-sig' 54 | 55 | # The master toctree document. 56 | master_doc = 'index' 57 | 58 | # General information about the project. 59 | project = 'Flatline' 60 | copyright = '2017-2018, 2025, The BigML Team' 61 | author = 'The BigML Team' 62 | 63 | # The version info for the project you're documenting, acts as replacement for 64 | # |version| and |release|, also used in various other places throughout the 65 | # built documents. 66 | # 67 | # The short X.Y version. 68 | version = '1.0' 69 | # The full version, including alpha/beta/rc tags. 70 | release = '1.0' 71 | 72 | # The language for content autogenerated by Sphinx. Refer to documentation 73 | # for a list of supported languages. 74 | # 75 | # This is also used if you do content translation via gettext catalogs. 76 | # Usually you set "language" from the command line for these cases. 77 | language = "en" 78 | 79 | # There are two options for replacing |today|: either, you set today to some 80 | # non-false value, then it is used: 81 | # 82 | # today = '' 83 | # 84 | # Else, today_fmt is used as the format for a strftime call. 85 | # 86 | # today_fmt = '%B %d, %Y' 87 | 88 | # List of patterns, relative to source directory, that match files and 89 | # directories to ignore when looking for source files. 90 | # This patterns also effect to html_static_path and html_extra_path 91 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] 92 | 93 | # The reST default role (used for this markup: `text`) to use for all 94 | # documents. 95 | # 96 | # default_role = None 97 | 98 | # If true, '()' will be appended to :func: etc. cross-reference text. 99 | # 100 | # add_function_parentheses = True 101 | 102 | # If true, the current module name will be prepended to all description 103 | # unit titles (such as .. function::). 104 | # 105 | # add_module_names = True 106 | 107 | # If true, sectionauthor and moduleauthor directives will be shown in the 108 | # output. 
They are ignored by default. 109 | # 110 | # show_authors = False 111 | 112 | # The name of the Pygments (syntax highlighting) style to use. 113 | pygments_style = 'sphinx' 114 | 115 | # A list of ignored prefixes for module index sorting. 116 | # modindex_common_prefix = [] 117 | 118 | # If true, keep warnings as "system message" paragraphs in the built documents. 119 | # keep_warnings = False 120 | 121 | # If true, `todo` and `todoList` produce output, else they produce nothing. 122 | todo_include_todos = False 123 | 124 | 125 | # -- Options for HTML output ---------------------------------------------- 126 | 127 | # The theme to use for HTML and HTML Help pages. See the documentation for 128 | # a list of builtin themes. 129 | # 130 | html_theme = 'default' 131 | 132 | # Theme options are theme-specific and customize the look and feel of a theme 133 | # further. For a list of options available for each theme, see the 134 | # documentation. 135 | # 136 | # html_theme_options = {} 137 | 138 | # Add any paths that contain custom themes here, relative to this directory. 139 | # html_theme_path = [] 140 | 141 | # The name for this set of Sphinx documents. 142 | # " v documentation" by default. 143 | # 144 | # html_title = 'Flatline v1.0' 145 | 146 | # A shorter title for the navigation bar. Default is the same as html_title. 147 | # 148 | # html_short_title = None 149 | 150 | # The name of an image file (relative to this directory) to place at the top 151 | # of the sidebar. 152 | # 153 | # html_logo = None 154 | 155 | # The name of an image file (relative to this directory) to use as a favicon of 156 | # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 157 | # pixels large. 158 | # 159 | # html_favicon = None 160 | 161 | # Add any paths that contain custom static files (such as style sheets) here, 162 | # relative to this directory. 
They are copied after the builtin static files, 163 | # so a file named "default.css" will overwrite the builtin "default.css". 164 | html_static_path = [] 165 | 166 | # Add any extra paths that contain custom files (such as robots.txt or 167 | # .htaccess) here, relative to this directory. These files are copied 168 | # directly to the root of the documentation. 169 | # 170 | # html_extra_path = [] 171 | 172 | # If not None, a 'Last updated on:' timestamp is inserted at every page 173 | # bottom, using the given strftime format. 174 | # The empty string is equivalent to '%b %d, %Y'. 175 | # 176 | # html_last_updated_fmt = None 177 | 178 | # If true, SmartyPants will be used to convert quotes and dashes to 179 | # typographically correct entities. 180 | # 181 | # html_use_smartypants = True 182 | 183 | # Custom sidebar templates, maps document names to template names. 184 | # 185 | # html_sidebars = {} 186 | 187 | # Additional templates that should be rendered to pages, maps page names to 188 | # template names. 189 | # 190 | # html_additional_pages = {} 191 | 192 | # If false, no module index is generated. 193 | # 194 | # html_domain_indices = True 195 | 196 | # If false, no index is generated. 197 | # 198 | # html_use_index = True 199 | 200 | # If true, the index is split into individual pages for each letter. 201 | # 202 | # html_split_index = False 203 | 204 | # If true, links to the reST sources are added to the pages. 205 | # 206 | # html_show_sourcelink = True 207 | 208 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 209 | # 210 | # html_show_sphinx = True 211 | 212 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 213 | # 214 | # html_show_copyright = True 215 | 216 | # If true, an OpenSearch description file will be output, and all pages will 217 | # contain a tag referring to it. The value of this option must be the 218 | # base URL from which the finished HTML is served. 
219 | # 220 | # html_use_opensearch = '' 221 | 222 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 223 | # html_file_suffix = None 224 | 225 | # Language to be used for generating the HTML full-text search index. 226 | # Sphinx supports the following languages: 227 | # 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja' 228 | # 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr', 'zh' 229 | # 230 | # html_search_language = 'en' 231 | 232 | # A dictionary with options for the search language support, empty by default. 233 | # 'ja' uses this config value. 234 | # 'zh' user can custom change `jieba` dictionary path. 235 | # 236 | # html_search_options = {'type': 'default'} 237 | 238 | # The name of a javascript file (relative to the configuration directory) that 239 | # implements a search results scorer. If empty, the default will be used. 240 | # 241 | # html_search_scorer = 'scorer.js' 242 | 243 | # Output file base name for HTML help builder. 244 | htmlhelp_basename = 'Flatlinedoc' 245 | 246 | # -- Options for LaTeX output --------------------------------------------- 247 | 248 | latex_elements = { 249 | # The paper size ('letterpaper' or 'a4paper'). 250 | # 251 | # 'papersize': 'letterpaper', 252 | 253 | # The font size ('10pt', '11pt' or '12pt'). 254 | # 255 | # 'pointsize': '10pt', 256 | 257 | # Additional stuff for the LaTeX preamble. 258 | # 259 | # 'preamble': '', 260 | 261 | # Latex figure (float) alignment 262 | # 263 | # 'figure_align': 'htbp', 264 | } 265 | 266 | # Grouping the document tree into LaTeX files. List of tuples 267 | # (source start file, target name, title, 268 | # author, documentclass [howto, manual, or own class]). 269 | latex_documents = [ 270 | (master_doc, 'Flatline.tex', 'Flatline Documentation', 271 | 'The BigML Team', 'manual'), 272 | ] 273 | 274 | # The name of an image file (relative to this directory) to place at the top of 275 | # the title page.
276 | # 277 | # latex_logo = None 278 | 279 | # For "manual" documents, if this is true, then toplevel headings are parts, 280 | # not chapters. 281 | # 282 | # latex_use_parts = False 283 | 284 | # If true, show page references after internal links. 285 | # 286 | # latex_show_pagerefs = False 287 | 288 | # If true, show URL addresses after external links. 289 | # 290 | # latex_show_urls = False 291 | 292 | # Documents to append as an appendix to all manuals. 293 | # 294 | # latex_appendices = [] 295 | 296 | # If false, will not define \strong, \code, \titleref, \crossref ... but only 297 | # \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added 298 | # packages. 299 | # 300 | # latex_keep_old_macro_names = True 301 | 302 | # If false, no module index is generated. 303 | # 304 | # latex_domain_indices = True 305 | 306 | 307 | # -- Options for manual page output --------------------------------------- 308 | 309 | # One entry per manual page. List of tuples 310 | # (source start file, name, description, authors, manual section). 311 | man_pages = [ 312 | (master_doc, 'flatline', 'Flatline Documentation', 313 | [author], 1) 314 | ] 315 | 316 | # If true, show URL addresses after external links. 317 | # 318 | # man_show_urls = False 319 | 320 | 321 | # -- Options for Texinfo output ------------------------------------------- 322 | 323 | # Grouping the document tree into Texinfo files. List of tuples 324 | # (source start file, target name, title, author, 325 | # dir menu entry, description, category) 326 | texinfo_documents = [ 327 | (master_doc, 'Flatline', 'Flatline Documentation', 328 | author, 'Flatline', 'A lispy language for filtering and generating dataset fields.', 329 | 'Miscellaneous'), 330 | ] 331 | 332 | # Documents to append as an appendix to all manuals. 333 | # 334 | # texinfo_appendices = [] 335 | 336 | # If false, no module index is generated.
337 | # 338 | # texinfo_domain_indices = True 339 | 340 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 341 | # 342 | # texinfo_show_urls = 'footnote' 343 | 344 | # If true, do not generate a @detailmenu in the "Top" node's menu. 345 | # 346 | # texinfo_no_detailmenu = False 347 | -------------------------------------------------------------------------------- /docs/quick-reference.rst: -------------------------------------------------------------------------------- 1 | Quick reference 2 | =============== 3 | 4 | Field accessors and properties 5 | ------------------------------ 6 | 7 | Access to input field values: 8 | 9 | .. code:: lisp 10 | 11 | (field [] []) 12 | (f [] []) 13 | (fields ... ) 14 | (random-field-value ) 15 | (weighted-random-field-value ) 16 | (ensure-value ) 17 | (ensure-weighted-value ) 18 | 19 | All fields in a row: 20 | 21 | .. code:: lisp 22 | 23 | (all) 24 | (all-but ... ) 25 | (all-with-defaults 26 | 27 | ... 28 | ) 29 | (all-with-numeric-default ["mean" "median" "minimum" "maximum" ] 30 | 31 | Row properties: 32 | 33 | .. code:: lisp 34 | 35 | (row-number) ;; current row number, 0-based 36 | 37 | Field properties: 38 | 39 | .. code:: lisp 40 | 41 | (bin-center ) ;; number 42 | (bin-count ) ;; number 43 | (category-count ) ;; number 44 | (maximum ) ;; number 45 | (mean ) ;; number 46 | (median ) ;; number 47 | (minimum ) ;; number 48 | (missing? []) ;; boolean 49 | (missing-count ) ;; number 50 | (preferred? ) ;; boolean 51 | (population ) ;; integer 52 | (sum ) ;; number 53 | (sum-squares ) ;; number 54 | (variance ) ;; number 55 | (standard-deviation ) ;; number 56 | 57 | Normalization: 58 | 59 | .. code:: lisp 60 | 61 | (normalize [ ]) ;; [from to] defaults to [0, 1] 62 | (z-score ) 63 | (log-normal ) 64 | 65 | Percentiles and population: 66 | 67 | .. code:: lisp 68 | 69 | (percentile ) ;; number 70 | (population-fraction ) ;; integer 71 | (within-percentiles? ) ;; boolean 72 | (percentile-label ... 
) 73 | 74 | Segments: 75 | 76 | .. code:: lisp 77 | 78 | (segment-label 79 | 80 | ... 81 | 82 | ) 83 | (segment-label ... ) 84 | 85 | Vectorize categorical and text fields: 86 | 87 | .. code:: lisp 88 | 89 | (vectorize []) 90 | 91 | Items: 92 | 93 | .. code:: lisp 94 | 95 | (contains-items? ... ) 96 | (equal-to-items? ... ) 97 | 98 | Regions: 99 | 100 | .. code:: lisp 101 | 102 | (region? ) 103 | (rename-region ) 104 | (add-region ) 105 | (add-region ) 106 | (remove-region ) 107 | (update-region ) 108 | (update-region ) 109 | ;; either a regions value or a regions field designator 110 | 111 | Clustering: 112 | 113 | .. code:: lisp 114 | 115 | (row-distance [ ]) 116 | (row-distance-squared [ ]) 117 | 118 | Strings and regular expressions 119 | ------------------------------- 120 | 121 | Conversion of any value to a string: 122 | 123 | .. code:: lisp 124 | 125 | (str ...) ;; string 126 | 127 | Substrings: 128 | 129 | .. code:: lisp 130 | 131 | (subs []) ;; string 132 | 133 | Regexps: 134 | 135 | .. code:: lisp 136 | 137 | (matches? ) ;; boolean 138 | (re-quote ) ;; regexp that matches literally 139 | (replace ) ;; string 140 | (replace-first ) ;; string 141 | 142 | Utilities: 143 | 144 | .. code:: lisp 145 | 146 | (length ) ;; integer 147 | (join ) ;; string 148 | (levenshtein ) ;; number 149 | (occurrences [ ]) ;; number 150 | (language ) ;; ["en", "es", "ca", "nl"] 151 | 152 | Hashing: 153 | 154 | .. code:: lisp 155 | 156 | (md5 ) ;; string of length 32 157 | (sha1 ) ;; string of length 40 158 | (sha256 ) ;; string of length 64 159 | 160 | Math and logic 161 | -------------- 162 | 163 | Arithmetic operators: 164 | 165 | .. code:: lisp 166 | 167 | + - * / div mod 168 | 169 | Relational operators: 170 | 171 | .. code:: lisp 172 | 173 | < <= > >= = != 174 | 175 | Logical operators: 176 | 177 | .. code:: lisp 178 | 179 | and or not 180 | 181 | Mathematical functions: 182 | 183 | .. code:: lisp 184 | 185 | (zero? ) 186 | (even? ) 187 | (odd? 
) 188 | (abs ) ;; Absolute value 189 | (acos ) 190 | (asin ) 191 | (atan ) 192 | (ceil ) 193 | (cos ) ;; := radians 194 | (cosh ) 195 | (exp ) ;; Exponential 196 | (floor ) 197 | (ln ) ;; Natural logarithm 198 | (log ) ;; Natural logarithm 199 | (log2 ) ;; Base-2 logarithm 200 | (log10 ) ;; Base-10 logarithm 201 | (max ... ) 202 | (min ... ) 203 | (mod ) ;; Modulus 204 | (div ) ;; Integer division (quotient) 205 | (pow ) 206 | (rand) ;; a random double in [0, 1) 207 | (rand-int ) ;; a random integer in [0, n) or (n, 0] 208 | (round ) 209 | (sin ) ;; := radians 210 | (sinh ) 211 | (sqrt ) 212 | (square ) ;; (* ) 213 | (tan ) ;; := radians 214 | (tanh ) 215 | (to-degrees ) ;; := radians 216 | (to-radians ) ;; := degrees 217 | (spherical-distance ) ;; args in radians 218 | (spherical-distance-deg ) ;; args in degrees 219 | (linear-regression ... ) ;; slope, intercept, pearson 220 | (chi-square-p-value ) 221 | 222 | 223 | Fuzzy logic 224 | ----------- 225 | 226 | Basic t-norms: 227 | 228 | .. code:: lisp 229 | 230 | (tnorm-min ) ;; Minimum t-norm. Also called the Gödel t-norm. 231 | (tnorm-product ) ;; Product t-norm. The ordinary product of real numbers. 232 | (tnorm-lukasiewicz ) ;; Łukasiewicz t-norm. 233 | (tnorm-drastic ) ;; Drastic t-norm 234 | (tnorm-nilpotent-min ) ;; Nilpotent minimum t-norm 235 | 236 | T-conorms: 237 | 238 | .. code:: lisp 239 | 240 | (tconorm-max ) ;; Maximum t-conorm. Dual to the minimum t-norm, it is the smallest t-conorm. 241 | (tconorm-probabilistic ) ;; Probabilistic t-conorm. It's dual to the product t-norm. 242 | (tconorm-bounded ) ;; Bounded t-conorm. It's dual to the Łukasiewicz t-norm. 243 | (tconorm-drastic ) ;; Drastic t-conorm. It's dual to the drastic t-norm. 244 | (tconorm-nilpotent-max ) ;; Nilpotent maximum t-conorm. It's dual to the nilpotent minimum. 245 | (tconorm-einstein-sum ) ;; Einstein t-conorm. It's dual to one of the Hamacher t-norms. 246 | 247 | Parametric t-conorms: 248 | 249 | ..
code:: lisp 250 | 251 | (tconorm-max ) ;; Maximum t-conorm. Dual to the minimum t-norm, it is the smallest t-conorm. 252 | (tconorm-probabilistic ) ;; Probabilistic t-conorm. It's dual to the product t-norm. 253 | (tconorm-bounded ) ;; Bounded t-conorm. It's dual to the Łukasiewicz t-norm. 254 | (tconorm-drastic ) ;; Drastic t-conorm. It's dual to the drastic t-norm. 255 | (tconorm-nilpotent-max ) ;; Nilpotent maximum t-conorm. It's dual to the nilpotent minimum. 256 | (tconorm-einstein-sum ) ;; Einstein t-conorm. It's dual to one of the Hamacher t-norms. 257 | 258 | Coercions 259 | --------- 260 | 261 | .. code:: lisp 262 | 263 | (integer ) ;; integer 264 | (real ) ;; real 265 | ;; (integer true) = 1, (integer false) = 0 266 | 267 | Dates and time 268 | -------------- 269 | 270 | Functions taking a number representing the *epoch*, i.e., the number of 271 | **milliseconds** since Jan 1st, 1970. 272 | 273 | .. code:: lisp 274 | 275 | (epoch-year ) ;; number 276 | (epoch-month ) ;; number 277 | (epoch-week ) ;; number 278 | (epoch-day ) ;; number 279 | (epoch-weekday ) ;; number 280 | (epoch-hour ) ;; number 281 | (epoch-minute ) ;; number 282 | (epoch-second ) ;; number 283 | (epoch-millisecond ) ;; number 284 | (epoch-fields ) ;; list of numbers 285 | 286 | Any string can be coerced to an epoch: 287 | 288 | .. code:: lisp 289 | 290 | (epoch []) 291 | 292 | Conditionals and local variables 293 | -------------------------------- 294 | 295 | Conditionals: 296 | 297 | .. code:: lisp 298 | 299 | (if []) 300 | 301 | (cond 302 | 303 | ... ... 304 | ) 305 | 306 | For example: 307 | 308 | .. code:: lisp 309 | 310 | (cond (> (f "000001") (mean "000001")) "above average" 311 | (< (f "000001") (mean "000001")) "below average" 312 | "mediocre") 313 | 314 | Local variables: 315 | 316 | .. code:: lisp 317 | 318 | (let ) 319 | := ( ... ) 320 | := 321 | 322 | For example: 323 | 324 | ..
code:: lisp 325 | 326 | (let (x (+ (window "a" -10 10)) 327 | a (/ (* x 3) 4.34) 328 | y (if (< a 10) "Good" "Bad")) 329 | (list x (str (f 10) "-" y) a y)) 330 | 331 | Lists 332 | ----- 333 | 334 | Creation and element access: 335 | 336 | .. code:: lisp 337 | 338 | (list ... ) ;; list of given values 339 | (cons ) ;; list 340 | (head ) ;; first element 341 | (tail ) ;; list sans first element 342 | (nth ) ;; 0-based nth element 343 | (take ) ;; take first elements 344 | (drop ) ;; drop first elements 345 | (drop ) ;; elements in range [from to) 346 | 347 | Inclusion: 348 | 349 | .. code:: lisp 350 | 351 | (in ) ;; boolean 352 | 353 | Properties of lists: 354 | 355 | .. code:: lisp 356 | 357 | (count ) ;; (count (list (f 1) (f 2))) => 2 358 | (max ) ;; (max (list -1 2 -2 0.38)) => 2 359 | (min ) ;; (min (list -1.3 2 1)) => -1.3 360 | (avg ) ;; (avg (list -1 -2 1 2 0.8 -0.8)) => 0 361 | (list-median ) ;; (list-median (list -1 -2 1 2 0.8 -0.8)) => 1 362 | (mode ) ;; (mode (list a b b c b a c c c)) => "c" 363 | 364 | List transformations: 365 | 366 | .. code:: lisp 367 | 368 | (map (list ... )) 369 | (filter (list ... )) 370 | (reverse ) 371 | (sort ) ;; sorts, in increasing order, a list of values 372 | 373 | Field lists and windows: 374 | 375 | .. code:: lisp 376 | 377 | (fields ... ) 378 | (window []) 379 | (diff-window ) ;; differences of consecutive values 380 | (cond-window ) ;; values that satisfy boolean sexp 381 | ;; sum of values 382 | (window-sum []) 383 | ;; mean of values 384 | (window-mean []) 385 | ;; mode of values 386 | (window-mode []) 387 | ;; median of values 388 | (window-median []) 389 | 390 | 391 | Accumulating values in cells 392 | ---------------------------- 393 | 394 | ..
code:: lisp 395 | 396 | (cell ) 397 | (set-cell ) 398 | -------------------------------------------------------------------------------- /python/notebooks/Flatline.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Local Flatline Interpreter" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from flatline.interpreter import Interpreter" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "We create a new local interpreter, that will use *nodejs* under the rug" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "interpreter = Interpreter()" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Available functions\n", 40 | "\n", 41 | "We can query the interpreter for all the built-in functions provided by flatline" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "interpreter.defined_functions()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Checking symbolic expressions" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": {}, 63 | "source": [ 64 | "The interpreter can check for us whether a Lisp or JSON s-expression is correct." 
65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "### Valid constant expressions" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "Lisp s-expressions are represented as strings:" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "interpreter.check_lisp('(+ 1 2)')" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "JSON expressions are represented as Python lists of native values" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "metadata": {}, 101 | "outputs": [], 102 | "source": [ 103 | "interpreter.check_json([\"+\", [\"*\", 3, 5]])" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "### Some erroneous symbolic expressions" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "interpreter.check_lisp('(+ 2')" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "interpreter.check_json([\"non-existent\", 3])" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "interpreter.check_json([\"+\", 1, \"3\"])" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "interpreter.check_lisp('(f 0)')" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "### Checking expressions that depend on input dataset fields" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "The latest sexp was 
invalid because no dataset is known, and hence there's no \"field 0\".\n", 161 | "\n", 162 | "Let's create a mock dataset to tell the interpreter what are our fields:" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "mock_dataset = {'dataset':{'fields': Interpreter.infer_fields([1, 'a'])}}\n", 172 | "mock_dataset['dataset']['fields']" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "Now the checks referring to those fields will pass:" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "interpreter.check_lisp('(field 0)', dataset=mock_dataset)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "interpreter.check_json([\"f\", \"000001\"], dataset=mock_dataset)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "Note how the two last expressions have no associated value, because they depend on the concrete input rows to which they're applied (i.e., these expressions do not represent constant values)." 
205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "## Applying symbolic expressions" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "We can apply valid symbolic expressions to local rows represented as lists of native Python values:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "test_rows = [[1, 'a'], [2, 'b'], [23, 'd']]\n", 228 | "interpreter.apply_lisp('(fields 1 0)', test_rows)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": null, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "interpreter.apply_lisp('(list (+ 2 (f 0)) (- (f 0) (f 0 -1)))', test_rows)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "interpreter.apply_json([\"window\", \"000001\", -1, 1], test_rows)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "In these examples, the field characteristics are guessed from the given values. Guessing is useful for quick tests, but in real cases we should provide real dataset metadata to the apply functions." 
254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "# Extended example using remote resources" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "from bigml.api import BigML\n", 270 | "from flatline.sampler import Sampler" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "api = BigML()" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "We start by creating a dataset from Quandl's Apple NASDAQ data" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "source = api.create_source('https://s3.amazonaws.com/bigml-public/csv/nasdaq_aapl.csv', {'name':'Flatline tests'})\n", 296 | "api.ok(source)" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "dataset = api.create_dataset(source)\n", 306 | "dataset_id = dataset['resource']\n", 307 | "api.ok(dataset)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "And download a sample of its rows locally, using a *Sampler* object" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "sampler = Sampler()" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "*Sampler*, like *Interpreter*, is an abstraction above the building blocks provided by the API bindings, and takes care internally of waiting for resource completion and other housekeeping (that's why we don't need `api.ok()` calls here)."
331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "sampler.take_sample(dataset_id, size=5)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "These are the rows that we have downloaded locally (plus all the associated metadata)" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "sampler.rows()" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "The sampler also keeps information on the dataset and sample metadata; e.g. the field descriptors:" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "Now we can apply locally Flatline expressions and check whether they produce sensible results. \n", 379 | "\n", 380 | "For instance, we could normalize **Low**, **High** and **Volume**, dividing them by their mean value in the original dataset. 
\n", 381 | "\n", 382 | "Let's define an auxiliary function to generate the corresponding Flatline JSON s-expressions:" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": null, 388 | "metadata": {}, 389 | "outputs": [], 390 | "source": [ 391 | "def norm_field(name):\n", 392 | " return [\"/\", [\"field\", name], [\"abs\", [\"mean\", name]]]\n", 393 | "\n", 394 | "norm_field('High')" 395 | ] 396 | }, 397 | { 398 | "cell_type": "markdown", 399 | "metadata": {}, 400 | "source": [ 401 | "We can use the interpreter to check the format and syntax of our generated code:" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [ 410 | "def print_as_lisp(json_sexp):\n", 411 | " print interpreter.json_to_lisp(json_sexp)\n", 412 | " \n", 413 | "print_as_lisp(norm_field('Low'))" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "To generate more than one value, we wrap the list of field expressions in a `list` form:" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": null, 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [ 429 | "def make_list(*fields):\n", 430 | " res = ['list']\n", 431 | " res.extend(fields)\n", 432 | " return res\n", 433 | " \n", 434 | "norm_fields = make_list(norm_field('Low'), norm_field('High'), norm_field('Volume'))\n", 435 | "print_as_lisp(norm_fields)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "And now let's check that the syntax is in fact correct:" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": null, 448 | "metadata": {}, 449 | "outputs": [], 450 | "source": [ 451 | "interpreter.check_json(norm_fields, dataset['object'])" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "Our lisp expression seems correct, and 
produces three numeric values. We can apply it to our sample rows and confirm that the outputs are in fact what we expect:" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": {}, 465 | "outputs": [], 466 | "source": [ 467 | "sampler.apply_json(norm_fields)" 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": {}, 473 | "source": [ 474 | "Looks good so far. Let's say we want to predict whether the stock will go up or down based on the Open and Close values of the **previous day** and today's Open value. We can access the value of a previous row with `(field name -1)`:" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": null, 480 | "metadata": {}, 481 | "outputs": [], 482 | "source": [ 483 | "def previous_day(name):\n", 484 | " return [\"field\", name, -1]\n", 485 | "\n", 486 | "open_close_fields = make_list(previous_day('Open'), \n", 487 | " previous_day('Close'))\n", 488 | "\n", 489 | "print_as_lisp(open_close_fields)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "Let's check it's a good Flatline expression and see how it works on our local sample:" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "interpreter.check_json(open_close_fields, dataset=dataset['object'])" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "metadata": {}, 512 | "outputs": [], 513 | "source": [ 514 | "sampler.apply_json(open_close_fields)" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "Note how the entries for the previous day Open and Close values are `None` in the first row, since there's no previous day!\n", 522 | "\n", 523 | "Finally, let's define our objective field, **UpOrDown**:" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | 
"execution_count": null, 529 | "metadata": {}, 530 | "outputs": [], 531 | "source": [ 532 | "up_or_down = '(if (> (f \"Open\") (f \"Close\")) \"down\" \"up\")'\n", 533 | "interpreter.check_lisp(up_or_down, dataset=dataset['object'])" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": null, 539 | "metadata": {}, 540 | "outputs": [], 541 | "source": [ 542 | "sampler.apply_lisp(up_or_down)" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "Once we're happy with our transformations, we ask BigML to create the new fields over the entire dataset" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": null, 555 | "metadata": {}, 556 | "outputs": [], 557 | "source": [ 558 | "norm_fields_sexp = interpreter.json_to_lisp(norm_fields)\n", 559 | "open_close_sexp = interpreter.json_to_lisp(open_close_fields)\n", 560 | "\n", 561 | "extended_dataset = api.create_dataset(dataset, {'new_fields':[{'field':norm_fields_sexp, 'names':['NLow', 'NHigh', 'NVol']},\n", 562 | " {'field':open_close_sexp, 'names':['Open-1', 'Close-1']},\n", 563 | " {'field':up_or_down, 'name': 'Up or down'}]})\n", 564 | "api.ok(extended_dataset)" 565 | ] 566 | }, 567 | { 568 | "cell_type": "markdown", 569 | "metadata": {}, 570 | "source": [ 571 | "and we confirm that the new dataset has indeed the new columns:" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": {}, 578 | "outputs": [], 579 | "source": [ 580 | "sampler.take_sample(extended_dataset['resource'], size=3)\n", 581 | "[{'id':f['id'], 'name':f['name'], 'optype':f['optype']} for f in sampler.fields()]" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "metadata": {}, 588 | "outputs": [], 589 | "source": [ 590 | "sampler.rows()" 591 | ] 592 | } 593 | ], 594 | "metadata": { 595 | "kernelspec": { 596 | "display_name": "Python 2", 597 | "language": "python", 598 | 
"name": "python2" 599 | }, 600 | "language_info": { 601 | "codemirror_mode": { 602 | "name": "ipython", 603 | "version": 2 604 | }, 605 | "file_extension": ".py", 606 | "mimetype": "text/x-python", 607 | "name": "python", 608 | "nbconvert_exporter": "python", 609 | "pygments_lexer": "ipython2", 610 | "version": "2.7.15+" 611 | } 612 | }, 613 | "nbformat": 4, 614 | "nbformat_minor": 1 615 | } 616 | -------------------------------------------------------------------------------- /docs/user-manual.rst: -------------------------------------------------------------------------------- 1 | Flatline user manual 2 | ==================== 3 | 4 | S-expressions vs. JSON 5 | ---------------------- 6 | 7 | Flatline expressions in this manual use its lisp-like syntax, based on 8 | `symbolic expressions `__ or 9 | *sexps*. When sending them to BigML via our API, you can also use their 10 | JSON representation, which is trivially obtained by using JSON lists for 11 | each paranthesised sexp. For instance: 12 | 13 | :: 14 | 15 | (if (< (f "a") 3) 0 4) => ["if", ["<", ["f", "a"], 3], 0, 4] 16 | 17 | Literal values 18 | -------------- 19 | 20 | Constant numbers, symbols, booleans and strings, using Java/Clojure 21 | syntax are valid expressions. 22 | 23 | Examples: 24 | 25 | .. code:: lisp 26 | 27 | 1258 28 | 2.349 29 | this-is-a-symbol 30 | "a string" 31 | true 32 | false 33 | 34 | Counters 35 | -------- 36 | 37 | While running over an input dataset, Flatline keeps track of the 38 | (zero-based) number of the input row that's being used, which can be 39 | accessed with the function ``row-number``, which takes no arguments: 40 | 41 | :: 42 | 43 | (row-number) => current input row (0-based) 44 | 45 | A typical use of this function is to generate a unique identifier for 46 | each row. The row number will start at 0 unless you skip some rows of 47 | the input dataset, and increase by one on each new row (unless you're 48 | specifying a input row step when generating a dataset). 
49 | 50 | Field accessors 51 | --------------- 52 | 53 | Field values 54 | ~~~~~~~~~~~~ 55 | 56 | Input field values are accessed using the ``field`` operator: 57 | 58 | :: 59 | 60 | (field <field-designator> [<shift>] [<out-of-range-default>]) 61 | <field-designator> := field id | field name | column_number 62 | <shift> := integer expression 63 | <out-of-range-default> := output value if the requested row is out-of-range 64 | 65 | where ``<field-designator>`` can be either the identifier, name or 66 | column number of the desired field, and the optional ``<shift>`` (an 67 | integer, defaulting to 0) denotes the offset with respect to the 68 | current input row. The optional ``<out-of-range-default>`` is the output 69 | value if the value of the field for the given row (taking into account 70 | the shift, if any) is outside the limits of our dataset. It can be a 71 | constant value or an expression. If ``<out-of-range-default>`` is not set, 72 | the accessor will return a missing value in those cases. 73 | 74 | So, for instance, these sexps denote field values extracted from the 75 | current row: 76 | 77 | .. code:: lisp 78 | 79 | (field 0) 80 | (field 0 0) 81 | (field 0 -1 "default-string") 82 | (field 0 -1 (mean (field 0))) 83 | (field 0 -1 3) 84 | (field "000004") 85 | (field "a field name" 0) 86 | 87 | while 88 | 89 | .. code:: lisp 90 | 91 | (field "000001" -2) 92 | 93 | denotes the value of the cell corresponding to a field with identifier 94 | "000001" two rows *before* the current one. Positive shift values denote 95 | rows after the current one. 96 | 97 | .. code:: lisp 98 | 99 | (field "a field" 3) 100 | (field "another field" 2) 101 | 102 | For convenience, and since ``field`` is probably going to be your most 103 | often used operator, it can be abbreviated to ``f``: 104 | 105 | .. code:: lisp 106 | 107 | (f "000001" -2) 108 | (f 3 1) 109 | (f 1 -1 3) 110 | (f "a field" 23) 111 | 112 | We also provide a predicate, ``missing?``, that will tell you whether 113 | the value of the field for the given row (taking into account the 114 | shift, if any) is a missing token: 115 | 116 | :: 117 | 118 | (missing? <field-designator> [<shift>]) 119 | 120 | E.g.: 121 | 122 | .. code:: lisp 123 | 124 | (missing? "species") 125 | (missing? "000001" -2) 126 | (missing? 3 1) 127 | (missing? "a field" 23) 128 | 129 | will all yield boolean values. For backwards compatibility, ``missing`` 130 | is an alias for ``missing?``. 131 | 132 | Randomized field values 133 | ~~~~~~~~~~~~~~~~~~~~~~~ 134 | 135 | There are two Flatline functions that will let you generate a random 136 | value in the domain of a given field, given its designator: 137 | 138 | :: 139 | 140 | (random-value <field-designator>) 141 | (weighted-random-value <field-designator>) 142 | 143 | e.g. 144 | 145 | .. code:: lisp 146 | 147 | (random-value "age") 148 | (weighted-random-value "000001") 149 | (weighted-random-value 3) 150 | 151 | Both functions generate a value with the constraint that it belongs to 152 | the domain of the given field, but while ``random-value`` uses a uniform 153 | distribution over the field's range of values, ``weighted-random-value`` 154 | uses the distribution of the field values (as computed in its histogram) 155 | as the probability measure for the random generator. 156 | 157 | These two functions work for numeric, categorical and text fields, with 158 | generated values satisfying: 159 | 160 | - For numeric fields, generated values are in the interval 161 | ``[(minimum <field-designator>), (maximum <field-designator>)]`` 162 | - For categorical fields, generated values belong to the set 163 | ``(categories <field-designator>)`` 164 | - For text fields, we generate terms in the field's tag cloud 165 | (generated values correspond to single terms in the cloud). 166 | - Datetime **parent** fields are not supported, since they don't have a 167 | defined distribution: you can use any of their numeric children for 168 | generating values following their distributions. 169 | 170 | A common use of these functions is replacing missing values with random 171 | data, which in Flatline you could write as, say: 172 | 173 | .. code:: lisp 174 | 175 | (if (missing? 
"00000") (random-value "000000") (f "000000")) 176 | 177 | We provide a shortcut for those common operations with the functions 178 | ``ensure-value`` and ``ensure-weighted-value``: 179 | 180 | :: 181 | 182 | (ensure-value ) := 183 | (if (missing? ) (random-value ) (field )) 184 | 185 | (ensure-weighted-value ) := 186 | (if (missing? ) (weighted-random-value ) (field )) 187 | 188 | We them, our example above can be simply written as: 189 | 190 | .. code:: lisp 191 | 192 | (ensure-value "000000") 193 | 194 | or, if you want that the generated random values follow the same 195 | distribution as the field "000000": 196 | 197 | .. code:: lisp 198 | 199 | (ensure-weighted-value "000000") 200 | 201 | Normalized field values 202 | ~~~~~~~~~~~~~~~~~~~~~~~ 203 | 204 | For numeric fields, it's often useful to normalize their values to a 205 | standard interval (usually [0, 1]). To that end, you can use the 206 | Flatline primitive ``normalize``, which takes as arguments the 207 | designator for the field you want to normalize and, optionally, the two 208 | bounds of the resulting interval: 209 | 210 | :: 211 | 212 | (normalize [ ]) 213 | => (+ from (* (- to from) 214 | (/ (- (f id) (minimum id)) 215 | (- (maximum id) (minimum id))))) 216 | 217 | For instance: 218 | 219 | .. code:: lisp 220 | 221 | (normalize "000001") ;; = (normalize "000001" 0 1) 222 | (normalize "width" -1 1) 223 | (normalize "length" 8 23) 224 | 225 | As shown in the formula above, ``normalize`` linearly maps the minimum 226 | value of the field to ``from`` (0 by default) and the maximum value to 227 | ``to`` (1 by default). 
228 | 229 | Besides this linear normalization, it's also common to standardize 230 | numeric data values by mapping them to a gaussian, according to the 231 | equation: 232 | 233 | :: 234 | 235 | x[i] -> (x[i] - mean(x)) / standard_deviation(x) 236 | 237 | or, in Flatline terms: 238 | 239 | :: 240 | 241 | (/ (- (f <id>) (mean <id>)) (standard-deviation <id>)) 242 | 243 | This normalization function is called the Z score, and we provide it as 244 | the function ``z-score``: 245 | 246 | :: 247 | 248 | (z-score <id>) 249 | 250 | E.g.: 251 | 252 | .. code:: lisp 253 | 254 | (z-score "000034") 255 | (z-score "a numeric field") 256 | (z-score 23) 257 | 258 | As with ``normalize``, the field used must have a numeric optype. 259 | 260 | You can use the function ``log-normal`` to apply ``z-score`` to the 261 | logarithm of your field. This is useful when your field follows a 262 | log-normal distribution and you want to map it to a gaussian. 263 | 264 | 265 | .. code:: lisp 266 | 267 | (log-normal "000003") 268 | (log-normal "a numeric field") 269 | (log-normal 1) 270 | 271 | 272 | This function requires numeric fields with at least 80% of the 273 | values greater than 0 and a non-zero mean value. 274 | 275 | Vectorized categorical or text fields 276 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 277 | 278 | It may be useful to convert categorical or text fields to numeric values 279 | for models which accept only numeric data as input. This can be 280 | accomplished with the Flatline primitive ``vectorize``: 281 | 282 | :: 283 | 284 | (vectorize <field-designator> [<n>]) 285 | 286 | For categorical fields, the output is a binary indicator vector. In 287 | other words, it is a list of numeric fields, one per possible 288 | categorical value, and for each instance, the numeric field 289 | corresponding to the category of that instance will have a value of 290 | ``1``, whereas the remaining numeric fields will have a value of ``0``.
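
The binary indicator encoding just described can be sketched in Python. This illustrates the encoding itself, not BigML's implementation; the ``species`` categories are made-up sample data:

```python
def one_hot(categories, value):
    """Binary indicator vector: 1 in the slot of the instance's
    category, 0 everywhere else."""
    return [1 if category == value else 0 for category in categories]

species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
print(one_hot(species, "Iris-versicolor"))  # => [0, 1, 0]
```

With ``vectorize``, each slot of this vector becomes its own generated numeric field in the output dataset.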
291 | 292 | For text fields, the output is a list of numeric fields, each 293 | corresponding to a term in the field's tag cloud. The value of each 294 | field is the number of times that term appears in that instance. 295 | 296 | A numeric expression or literal can be passed as an optional second 297 | argument to limit the number of generated fields to the *n* most 298 | frequent categories or text terms. 299 | 300 | Field properties 301 | ~~~~~~~~~~~~~~~~ 302 | 303 | Summary properties 304 | ^^^^^^^^^^^^^^^^^^ 305 | 306 | Field descriptors contain lots of properties with metadata about the 307 | field, including its summary. These properties (when they're atomic) can 308 | be accessed via ``field-prop``: 309 | 310 | :: 311 | 312 | (field-prop <type> <field-designator> <property-name> ...) 313 | <type> := string | numeric | boolean 314 | 315 | For instance, you can access the name for field "00023" via: 316 | 317 | .. code:: lisp 318 | 319 | (field-prop string "00023" name) 320 | 321 | or the value of the nested property missing\_count inside the summary 322 | with: 323 | 324 | .. code:: lisp 325 | 326 | (field-prop numeric "00023" summary missing_count) 327 | 328 | We provide several shortcuts for concrete summary properties, to save 329 | you typing: 330 | 331 | :: 332 | 333 | (maximum <field-designator>) 334 | (mean <field-designator>) 335 | (median <field-designator>) 336 | (minimum <field-designator>) 337 | (missing-count <field-designator>) 338 | (population <field-designator>) 339 | (sum <field-designator>) 340 | (sum-squares <field-designator>) 341 | (standard-deviation <field-designator>) 342 | (variance <field-designator>) 343 | 344 | (preferred? <field-designator>) 345 | 346 | (category-count <field-designator> <category>) 347 | (bin-center <field-designator> <bin-number>) 348 | (bin-count <field-designator> <bin-number>) 349 | 350 | As you can see, the category and bin accessors take an additional 351 | parameter designating either the category (a string or order number) or 352 | the bin (a 0-based integer index) you refer to: 353 | 354 | .. 
code:: lisp 355 | 356 | (category-count "species" "Iris-versicolor") 357 | (category-count "species" (f "000004")) 358 | (bin-count "age" (f "bin-selector")) 359 | (bin-center "000003" 3) 360 | (bin-center (field "field-selector") 4) 361 | 362 | Discretization of numeric fields 363 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 364 | 365 | A simple way to discretize a numeric field is to assign a label to each 366 | of a finite set of segments, defined by a sequence of upper bounds. For 367 | instance: 368 | 369 | .. code:: lisp 370 | 371 | (let (v (f "age")) 372 | (cond (< v 2) "baby" 373 | (< v 10) "child" 374 | (< v 20) "teenager" 375 | "adult")) 376 | 377 | Flatline provides a shortcut for the above expression via its 378 | ``segment-label`` primitive: 379 | 380 | .. code:: lisp 381 | 382 | (segment-label "000000" "baby" 2 "child" 10 "teenager" 20 "adult") 383 | 384 | As you can see, the first argument is the field designator (as usual, a 385 | name, column number or identifier), followed by alternating labels and 386 | upper bounds. More generally: 387 | 388 | :: 389 | 390 | (segment-label <fdes> <label-1> <bound-1> ... <label-n-1> <bound-n-1> <label-n>) 391 | <label-i> ... strings, <bound-i> ... numbers 392 | => (cond (< (f <fdes>) <bound-1>) <label-1> 393 | (< (f <fdes>) <bound-2>) <label-2> 394 | ... 395 | (< (f <fdes>) <bound-n-1>) <label-n-1> 396 | <label-n>) 397 | 398 | The alternating labels and bounds must be constant strings and numbers. 399 | If you want to use segments of equal length between the minimum and 400 | maximum value of the field, you can omit the upper bounds and give 401 | simply the list of labels, e.g. 402 | 403 | .. code:: lisp 404 | 405 | (segment-label 0 "1st fourth" "2nd fourth" "3rd fourth" "4th fourth") 406 | 407 | which would be equivalent to: 408 | 409 | .. code:: lisp 410 | 411 | (let (max (maximum 0) 412 | min (minimum 0) 413 | step (/ (- max min) 4)) 414 | (segment-label 0 "1st fourth" (+ min step) 415 | "2nd fourth" (+ min step step) 416 | "3rd fourth" (+ min step step step) 417 | "4th fourth")) 418 | 419 | or, in general: 420 | 421 | :: 422 | 423 | (segment-label <fdes> <label-1> ... <label-n>) with <label-i> ... 424 | strings 425 | => (let (min (minimum <fdes>) 426 | step (/ (- (maximum <fdes>) min) n) 427 | shift (- (f <fdes>) min)) 428 | (cond (< shift step) <label-1> 429 | (< shift (* 2 step)) <label-2> 430 | ... 431 | (< shift (* (- n 1) step)) <label-n-1> 432 | <label-n>)) 433 | 434 | Items and itemsets 435 | ^^^^^^^^^^^^^^^^^^ 436 | 437 | A common operation on fields of optype *items* is to check whether they 438 | contain a list of items. That can be used, for instance, to filter the 439 | rows of a dataset that satisfy a given association rule, by calling 440 | ``contains-items?`` with the list of items in the antecedent and 441 | consequent of the desired rule. 442 | 443 | :: 444 | 445 | (contains-items? <field-designator> <item-0> ... <item-n>) 446 | ;; with <item-i> of type string for i in [0, n] 447 | 448 | The ``contains-items?`` primitive takes as first argument the descriptor 449 | of the field we want to check (which must have optype items), followed 450 | by one or more items we want to check, which must all have type 451 | string. For instance, the predicate: 452 | 453 | .. code:: lisp 454 | 455 | (contains-items? "000000" "blue" "green" "darkblue") 456 | 457 | will filter the rows whose first column satisfies the association rule 458 | ``blue, green -> darkblue``. 459 | 460 | It is also possible to check whether an items field contains *only* the 461 | given list of items (in any order), using ``equal-to-items?``, which 462 | works exactly as ``contains-items?`` except for the fact that it's 463 | exclusive: 464 | 465 | :: 466 | 467 | (equal-to-items? <field-designator> <item-0> ... <item-n>) 468 | ;; with <item-i> of type string for i in [0, n] 469 | 470 | 471 | Regions 472 | ^^^^^^^ 473 | 474 | It is possible to manipulate and modify values of type *regions*. In 475 | Flatline, a regions value is a list of lists. Each of the inner lists 476 | has 5 elements. The first one is the label of the region (a string), 477 | and it's followed by four integers, which are the coordinates of the 478 | top-left corner and bottom-right corner of the region at hand.
The 479 | ``region?`` primitive checks whether a list represents a valid 480 | region (checking also that the vertex coordinates are consistent): 481 | 482 | .. code:: lisp 483 | 484 | (region? (list "label" 10 10 20 30)) ;; => true 485 | (region? (list 10 10 20 30)) ;; => false 486 | (region? (list -10 10 -20 30)) ;; => false 487 | 488 | When we access a field of type regions, the returned value will be a 489 | list with all its values satisfying the ``region?`` predicate. We 490 | can add a new region to it with ``add-region``: 491 | 492 | :: 493 | 494 | (add-region <regions> <region>) 495 | (add-region