We've emailed you instructions for setting your password.
15 | You should be receiving them shortly.
16 | If you don't receive an email, please make sure you've entered the address you registered with, and check your spam folder.
76 |
77 |
--------------------------------------------------------------------------------
/coffer/views.py:
--------------------------------------------------------------------------------
1 | # coffer.views
2 | # Views and interaction logic for the coffer app.
3 | #
4 | # Author: Benjamin Bengfort
5 | # Created: Thu Oct 08 21:46:26 2015 -0400
6 | #
7 | # Copyright (C) 2015 District Data Labs
8 | # For license information, see LICENSE.txt
9 | #
10 | # ID: views.py [] benjamin@bengfort.com $
11 |
12 | """
13 | Views and interaction logic for the coffer app.
14 | """
15 |
16 | ##########################################################################
17 | ## Imports
18 | ##########################################################################
19 |
20 | from django.db import IntegrityError
21 | from braces.views import LoginRequiredMixin
22 | from django.views.generic.list import ListView
23 | from django.views.generic.edit import FormView
24 | from django.views.generic.detail import DetailView
25 |
26 | from coffer.models import Dataset
27 | from coffer.forms import DatasetUploadForm
28 |
29 | ##########################################################################
30 | ## HTML/Web Views
31 | ##########################################################################
32 |
33 | class DatasetUploadView(LoginRequiredMixin, FormView):
34 |
35 |     template_name = "site/upload.html"
36 |     form_class = DatasetUploadForm
37 |     success_url = "/upload"
38 |
39 |
40 |     def get_form_kwargs(self):
41 |         """
42 |         Add the request to the kwargs
43 |         """
44 |         kwargs = super(DatasetUploadView, self).get_form_kwargs()
45 |         kwargs['request'] = self.request
46 |         print kwargs
47 |         return kwargs
48 |
49 |     def form_valid(self, form):
50 |         try:
51 |             form.save()
52 |             return super(DatasetUploadView, self).form_valid(form)
53 |         except IntegrityError:
54 |             form.add_error(None, "Duplicate file detected! Cannot upload the same file twice.")
55 |             return super(DatasetUploadView, self).form_invalid(form)
56 |
57 |     def get_context_data(self, **kwargs):
58 |         """
59 |         Add ten most recent uploads to context
60 |         """
61 |         context = super(DatasetUploadView, self).get_context_data(**kwargs)
62 |         context['upload_history'] = Dataset.objects.order_by('-created')[:10]
63 |         return context
64 |
65 |
66 | class DatasetListView(LoginRequiredMixin, ListView):
67 |
68 |     model = Dataset
69 |     template_name = "coffer/dataset_list.html"
70 |     paginate_by = 25
71 |     context_object_name = "dataset_list"
72 |
73 |     def get_context_data(self, **kwargs):
74 |         context = super(DatasetListView, self).get_context_data(**kwargs)
75 |         context['num_datasets'] = Dataset.objects.count()
76 |         context['latest_dataset'] = Dataset.objects.latest().created
77 |         return context
78 |
79 |
80 | class DatasetDetailView(LoginRequiredMixin, DetailView):
81 |
82 |     template_name = "coffer/dataset_detail.html"
83 |     model = Dataset
84 |
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # Welcome to DDL Cultivar!
2 |
3 | There is a massive amount of potential value, both commercial and societal, in the data being generated today, but a shortage of both knowledgeable data scientists and quality tools for extracting insights from large, complex data sets is slowing advancements that society could otherwise achieve sooner. A better understanding of the world around us and an enhanced ability to make good decisions are just the tip of the iceberg of what more efficient data science processes could help us accomplish. With Cultivar, we hope to expand the abilities of data scientists and other analytical professionals so that those advancements can be achieved on a larger scale.
4 |
5 | In the last few years, data science and analytical functions have played an increasingly important role in the operations of a growing number of companies, and we see this trend continuing to grow for the foreseeable future. Today’s top companies have significant data science processes in place that continually optimize their offerings and improve their products. The competitive edge that these processes afford companies will drive increased competition for both analytical professionals and for tools to help them generate value from data.
6 |
7 | Data science consists of a conglomeration of methods from previously existing fields like statistics and computer science, combined with new practices for processing the large amounts of data that are being generated today. It is a very broad technological field, which makes it applicable to almost every other discipline and means that some part of the field is always advancing. This presents a substantial economic opportunity for companies that cater to the needs of data scientists and the companies for whom they work.
8 |
9 | For data to be valuable, it needs to go through a refinement process. We call this process the data science pipeline, and Cultivar aims to improve and streamline the most time-consuming phases of the pipeline so that data scientists can be as productive as possible and more quickly build valuable solutions for their organizations.
10 |
11 | 
12 |
13 | Cultivar is an integrated data management, exploration, and visualization solution aimed at enhancing the experimental workflow of the data scientist.
14 |
15 | Cultivar offers:
16 |
17 | 1. [Visual analytics](visual_analysis.md) features to enable more effective separability analysis of multidimensional and unstructured data.
18 |
19 | 2. [Automated data “touch”](auto_analysis.md) (e.g. parsing, standardization, normalization, cleaning, wrangling) to accelerate analysis.
20 |
21 | 3. State-of-the-art [version control and provenance](version_control.md) for dataset management.
22 |
23 | Cultivar is a dataset management, analysis and visualization tool that is being built as part of the DDL Multidimensional Visualization Research Lab. See: [Parallel Coordinates](http://homes.cs.washington.edu/~jheer//files/zoo/ex/stats/parallel.html) for more on the types of visualizations we're experimenting with.
24 |
--------------------------------------------------------------------------------
/docs/version_control.md:
--------------------------------------------------------------------------------
1 | # Version Control and Provenance
2 |
3 | ## Overview
4 |
5 | Cultivar offers state-of-the-art version control and provenance that are optimized for dataset management.
6 |
7 | Duplication is the single most significant problem in versioning and version control (Mashtizadeh et al. 2013; Zhang et al. 2013; Ramasubramanian et al. 2009; Santry et al. 1999). For example, if minor changes are made to Version 1 of a file, and the updated Version 2 is then also saved to the data store, most of the information stored in Versions 1 and 2 is identical. That duplication corresponds directly to wasted storage space, which translates directly into monetary cost. On the other hand, partitioning files in such a way as to reduce duplication also makes the reconstitution of those files expensive in terms of memory and processing power. These are precisely the reasons why cloud-based storage services prefer duplication and pass the costs on to their customers.
8 |
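To make the tradeoff concrete, here is a minimal sketch that stores two versions of a file as fixed-size chunks keyed by their SHA-256 digest. It is an illustration under stated assumptions, not Cultivar's storage layer: the `ChunkStore` class, the chunk size, and the sample data are all hypothetical. Chunks shared between versions are stored once, which saves space, but reading a version back means reassembling it from its list of chunks, which costs processing.

```python
import hashlib


class ChunkStore(object):
    """Hypothetical content-addressed store: identical chunks are kept once."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size
        self.chunks = {}    # digest -> chunk bytes
        self.versions = {}  # version name -> ordered list of digests

    def put(self, name, data):
        """Split data into fixed-size chunks and record the version's recipe."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store each distinct chunk once
            recipe.append(digest)
        self.versions[name] = recipe

    def get(self, name):
        """Reconstitute a version by concatenating its chunks in order."""
        return b"".join(self.chunks[d] for d in self.versions[name])

    def stored_bytes(self):
        """Total size of the distinct chunks actually held in the store."""
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore(chunk_size=8)
v1 = b"id,name\n1,alpha\n2,beta\n3,gamma\n"
v2 = b"id,name\n1,alpha\n2,beta\n3,delta\n"  # only the last row changed
store.put("v1", v1)
store.put("v2", v2)

print(store.stored_bytes(), "bytes stored for", len(v1) + len(v2), "bytes of input")
```

Fixed-size chunking is the simplest possible partitioning; an insertion near the start of a file shifts every later chunk boundary, so real systems usually need something more robust.
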
9 | ## Architecture
10 |
11 | As such, Cultivar provides a way for users not only to store datasets in stable Cloud-based repositories, but also to modify those datasets, share them with others, branch off new versions for testing and experimentation, and explore the data using the auto-analysis and visual analytics features. To support and sustain this kind of exploration, Cultivar’s dataset versioning solution implements theories initially explored in Chervenak et al. (2000), Palankar et al. (2008), and Ramasubramanian et al. (2009). It aims to balance the tradeoff between the availability of the data (to which users want ready access), and the storage of that data (which becomes less accessible but much cheaper as it is increasingly compressed and archived).
12 |
13 | ## References
14 |
15 | Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., & Tuecke, S. (2000). The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 23(3), 187–200.
16 |
17 | Mashtizadeh, A., Bittau, A., Huang, Y., & Mazieres, D. (2013). Replication, history, and grafting in the Ori file system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 151–166.
18 |
19 | Palankar, M.R., Iamnitchi, A., Ripeanu, M., et al. (2008). Amazon S3 for science grids: a viable solution? In Proceedings of the 2008 international workshop on Data-aware distributed computing. ACM.
20 |
21 | Ramasubramanian, V., Rodeheffer, T., Terry, D., Walraed-Sullivan, M., Wobber, T., Marshall, C., & Vahdat, A. (2009). Cimbiosys: a platform for content-based partial replication. In Proceedings of the 6th USENIX symposium on Networked systems design and implementation, NSDI’09, 261–276.
22 |
23 | Santry, D., Feeley, M., Hutchinson, N., Veitch, A., Carton, R., & Ofir, J. (1999). Deciding when to forget in the Elephant file system. In ACM SIGOPS Operating Systems Review, 33, 110–123.
24 |
25 | Zhang, Y., Dragga, C., Arpaci-Dusseau, A., & Arpaci-Dusseau, R. (2013). \*-Box: Towards reliability and consistency in Dropbox-like file synchronization services. In Proceedings of the Fifth Workshop on Hot Topics in Storage and File Systems, HotStorage ’13, Berkeley, CA, USA. USENIX Association.
26 |
--------------------------------------------------------------------------------
/docs/visual_analysis.md:
--------------------------------------------------------------------------------
1 | # Visual Analysis
2 |
3 | ## Overview
4 |
5 | Separability analysis is critical to the machine learning phase in the data science pipeline, but it becomes increasingly difficult as dimensionality increases. High-dimensional data is particularly difficult to explore because most people cannot visualize beyond two or three dimensions. Even lower-dimensional data can be tedious to visualize because it still requires writing a substantial amount of code, regardless of the programming language one uses.
6 |
7 | Cultivar's visual analysis tools are designed to enable the user to interact with visualizations to help them understand the data. In particular, Cultivar aims to deliver visual analytics features that enable separability analysis for multidimensional and unstructured data.
8 |
9 | ## Architecture
10 |
11 | Scatter matrices, parallel coordinates, and radviz are three promising visual approaches to separability analysis on high-dimensional data.
12 |
13 | 
14 |
15 | 
16 |
17 | 
18 |
19 | The underlying architecture of the visualization tools is an implementation of hierarchical (agglomerative) clustering and brushing as described in Elmqvist and Fekete (2010) and Fua, Ward, and Rundensteiner (1999), using `scikit-learn`. This clustering approach yields the distance metrics necessary for instance identification, and provides “cuts” that can be viewed at multiple levels using brushing. Cultivar also provides a two-dimensional rank-by-feature framework as described by Seo and Shneiderman (2004, 2005, 2006), including color-coded scatterplot matrices to identify linear, quadratic, Pearson, and Spearman relationships between pairs of variables, and tools to facilitate separability analysis, such as radial visualization, dendrograms, and parallel coordinates as described in Wegman (1990), Fua, Ward, and Rundensteiner (1999), and Inselberg (2004).
20 |
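The scoring step of a two-dimensional rank-by-feature view can be sketched in a few lines. The example below is a rough illustration rather than Cultivar's implementation: it uses only the standard library, covers just the Pearson and Spearman scores named above, and the function names (`pearson`, `ranks`, `spearman`, `rank_pairs`) and sample columns are hypothetical. Each pair of columns gets a score, and the pairs are ordered so that the strongest relationships come first.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat as unrelated
    return cov / (sx * sy)


def ranks(xs):
    """Rank values from 1 to n, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def rank_pairs(columns, score=pearson):
    """Order every pair of named columns by the absolute value of its score."""
    scored = [
        (abs(score(columns[a], columns[b])), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)


columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly linear in x
    "z": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related to x
}
for strength, a, b in rank_pairs(columns):
    print("%s vs %s: %.2f" % (a, b, strength))
```

Passing `score=spearman` to `rank_pairs` ranks the same pairs by monotonic rather than linear association.
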
21 | ## References
22 |
23 | Elmqvist, N., & Fekete, J.-D. (2010). Hierarchical aggregation for information visualization: Overview, techniques, and design guidelines. Visualization and Computer Graphics, IEEE Transactions on, 16(3), 439–454.
24 |
25 | Fua, Y., Ward, M., & Rundensteiner, E. (1999). Hierarchical parallel coordinates for exploration of large datasets. Proceedings of the conference on Visualization '99: Celebrating ten years. IEEE Computer Society Press.
26 |
27 | Seo, J., & Shneiderman, B. (2004). A rank-by-feature framework for unsupervised multidimensional data exploration using low dimensional projections. In Information Visualization, INFOVIS 2004. IEEE Symposium on (pp. 65–72).
28 |
29 | Seo, J., & Shneiderman, B. (2005). A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2), 96–113.
30 |
31 | Seo, J., & Shneiderman, B. (2006). Knowledge discovery in high-dimensional data: Case studies and a user survey for the rank-by-feature framework. Visualization and Computer Graphics, IEEE Transactions on, 12(3), 311–322.
32 |
33 | Wegman, E. (1990). Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85(411), 664–675.
34 |
--------------------------------------------------------------------------------
/coffer/models.py:
--------------------------------------------------------------------------------
1 | # coffer.models
2 | # Models for dataset management and collection.
3 | #
4 | # Author: Benjamin Bengfort
5 | # Created: Thu Oct 08 21:45:27 2015 -0400
6 | #
7 | # Copyright (C) 2015 District Data Labs
8 | # For license information, see LICENSE.txt
9 | #
10 | # ID: models.py [] benjamin@bengfort.com $
11 |
12 | """
13 | Models for dataset management and collection.
14 | """
15 |
16 | ##########################################################################
17 | ## Imports
18 | ##########################################################################
19 |
20 | import os
21 | import unicodecsv as csv
22 |
23 | from django.db import models
24 | from model_utils import Choices
25 | from markupfield.fields import MarkupField
26 | from model_utils.models import TimeStampedModel
27 | from trinket.utils import nullable, notnullable
28 | from django.core.urlresolvers import reverse
29 |
30 | ##########################################################################
31 | ## Models
32 | ##########################################################################
33 |
34 | class Dataset(TimeStampedModel):
35 |     """
36 |     A record of a dataset uploaded to the data lake for visual analysis.
37 |     """
38 |
39 |     DATATYPE = Choices('csv', 'json', 'xml')
40 |
41 |     uploader = models.ForeignKey('auth.User', related_name='datasets')
42 |     dataset = models.FileField(upload_to='datasets')
43 |     title = models.CharField(max_length=128, **nullable)
44 |     description = MarkupField(markup_type='markdown', **nullable)
45 |     dimensions = models.PositiveIntegerField(default=0)
46 |     length = models.PositiveIntegerField(default=0)
47 |     filesize = models.PositiveIntegerField(default=0)
48 |     signature = models.CharField(max_length=44, unique=True, null=False, blank=True)
49 |     datatype = models.CharField(max_length=4, choices=DATATYPE, default=DATATYPE.csv)
50 |     delimiter = models.CharField(max_length=1, default=",")
51 |
52 |     class Meta:
53 |         db_table = "datasets"
54 |         ordering = ('-created',)
55 |         get_latest_by = 'created'
56 |
57 |     @property
58 |     def filename(self):
59 |         """
60 |         Returns the basename of the dataset
61 |         """
62 |         return os.path.basename(self.dataset.name)
63 |
64 |     def headers(self):
65 |         """
66 |         Returns the headers of the file
67 |         """
68 |         self.dataset.open('r')
69 |         reader = csv.reader(self.dataset, delimiter=self.delimiter.encode('utf-8'))
70 |         header = reader.next()
71 |         self.dataset.close()
72 |         return header
73 |
74 |     def preview(self, rows=20):
75 |         """
76 |         Returns the first n rows of the file.
77 |         """
78 |         self.dataset.open('r')
79 |         reader = csv.reader(self.dataset, delimiter=self.delimiter.encode('utf-8'))
80 |         header = reader.next()
81 |         for idx, row in enumerate(reader):
82 |             if idx >= rows:
83 |                 break
84 |             yield row
85 |         self.dataset.close()
86 |
87 |     def get_absolute_url(self):
88 |         """
89 |         Return the absolute URL of the model
90 |         """
91 |         return reverse('dataset-detail', args=(str(self.id),))
92 |
93 |     def __unicode__(self):
94 |         return "{} - {} dataset with {} rows and {} dimensions, uploaded by {}".format(
95 |             self.filename, self.datatype, self.length, self.dimensions, self.uploader
96 |         )
97 |
--------------------------------------------------------------------------------
/trinket/templates/coffer/dataset_detail.html:
--------------------------------------------------------------------------------
1 | {% extends 'page.html' %}
2 | {% load humanize %}
3 |
4 | {% block content %}
5 |
6 |
90 | {% endfor %}
91 |
92 |
93 | {% for row in object.preview %}
94 |
95 | {% for item in row %}
96 |
{{ item }}
97 | {% endfor %}
98 |
99 | {% endfor %}
100 |
101 |
102 |
103 |
104 |
105 |
106 | {% endblock %}
107 |
108 | {% block javascripts %}
109 | {{ block.super }}
110 | {% endblock %}
111 |
--------------------------------------------------------------------------------
/docs/about.md:
--------------------------------------------------------------------------------
1 | # About Cultivar
2 |
3 | Cultivar is a dataset management, analysis and visualization tool that is being built as part of the DDL Multidimensional Visualization Research Lab. See: [Parallel Coordinates](http://homes.cs.washington.edu/~jheer//files/zoo/ex/stats/parallel.html) for more on the types of visualizations we're experimenting with.
4 |
5 | ## Contributing
6 |
7 | Cultivar is open source, but because this is a District Data Labs project, we would appreciate it if you would let us know how you intend to use the software (other than simply copying and pasting code so that you can use it in your own projects). If you would like to contribute (especially if you are a student or research labs member at District Data Labs), you can do so in the following ways:
8 |
9 | 1. Add issues or bugs to the bug tracker: [https://github.com/DistrictDataLabs/Cultivar/issues](https://github.com/DistrictDataLabs/Cultivar/issues)
10 | 2. Work on a card on the dev board: [https://waffle.io/DistrictDataLabs/Cultivar](https://waffle.io/DistrictDataLabs/Cultivar)
11 | 3. Create a pull request in Github: [https://github.com/DistrictDataLabs/Cultivar/pulls](https://github.com/DistrictDataLabs/Cultivar/pulls)
12 |
13 | Note that labels in the Github issues are defined in the blog post: [How we use labels on GitHub Issues at Mediocre Laboratories](https://mediocre.com/forum/topics/how-we-use-labels-on-github-issues-at-mediocre-laboratories).
14 |
15 | If you are a member of the District Data Labs Faculty group, you have direct access to the repository, which is set up in a typical production/release/development cycle as described in _[A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/)_. A typical workflow is as follows:
16 |
17 | 1. Select a card from the [dev board](https://waffle.io/DistrictDataLabs/Cultivar), preferably one that is "ready", then move it to "in-progress".
18 |
19 | 2. Create a branch off of develop called "feature-[feature name]", work and commit into that branch.
20 |
21 | ~$ git checkout -b feature-myfeature develop
22 |
23 | 3. Once you are done working (and everything is tested) merge your feature into develop.
24 |
25 | ~$ git checkout develop
26 | ~$ git merge --no-ff feature-myfeature
27 | ~$ git branch -d feature-myfeature
28 | ~$ git push origin develop
29 |
30 | 4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.
31 |
32 | ## Contributors
33 |
34 | Thank you for all your help contributing to make Cultivar a great project!
35 |
36 | ### Maintainers
37 |
38 | - Benjamin Bengfort: [@bbengfort](https://github.com/bbengfort/)
39 | - Rebecca Bilbro: [@rebeccabilbro](https://github.com/rebeccabilbro)
40 |
41 | ### Contributors
42 |
43 | - Tony Ojeda: [@ojedatony1616](https://github.com/ojedatony1616)
44 |
45 | ## Changelog
46 |
47 | The release versions that are sent to the Python package index (PyPI) are also tagged in Github. You can see the tags through the Github web application and download the tarball of the version you'd like. Additionally, PyPI will host the various releases of Cultivar (eventually).
48 |
49 | The versioning uses a three-part version system, "a.b.c": "a" represents a major release that may not be backwards compatible; "b" is incremented on minor releases that may contain extra features but are backwards compatible; "c" releases are bug fixes or other micro changes that developers should feel free to update to immediately.
50 |
51 | ### Version 0.2
52 |
53 | * **tag**: [v0.2](https://github.com/DistrictDataLabs/Cultivar/releases/tag/v0.2)
54 | * **deployment**: Wednesday, January 27, 2016
55 | * **commit**: (see tag)
56 |
57 | This minor update gave a bit more functionality to the MVP prototype, even though the version was intended to have a much more impactful feature set. However, after some study, the workflow is changing, and so this development branch is being pruned and deployed in preparation for the next batch. The major achievements of this version are the documentation that discusses our approach and the dataset search and listing page that is now available.
58 |
59 | ### Version 0.1
60 |
61 | * **tag**: [v0.1](https://github.com/DistrictDataLabs/Cultivar/releases/tag/v0.1)
62 | * **deployment**: Tuesday, October 13, 2015
63 | * **commit**: [c863e42](https://github.com/DistrictDataLabs/Cultivar/commit/c863e421292be4eaeab36a9233f6ed7e0068679b)
64 |
65 | MVP prototype of a dataset uploader and management application. This application framework will become the basis for the research project in the DDL Multidimensional Visualization Research Labs. For now, users can upload datasets and manage their descriptions, as well as preview the first 20 rows.
66 |
--------------------------------------------------------------------------------
/trinket/templates/site/upload.html:
--------------------------------------------------------------------------------
1 | {% extends 'page.html' %}
2 | {% load staticfiles %}
3 | {% load humanize %}
4 |
5 | {% block stylesheets %}
6 | {{ block.super }}
7 |
8 | {% endblock %}
9 |
10 | {% block content %}
11 |
12 |
13 |
14 |
15 |
16 |
17 | {% if form.errors %}
18 | {% for errors in form.errors.values %}
19 | {% for error in errors %}
20 |
93 |
94 | {% endblock %}
95 |
96 | {% block javascripts %}
97 | {{ block.super }}
98 |
134 | {% endblock %}
135 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Trinket
2 | **Multidimensional data explorer and visualization tool.**
3 |
4 | [![Build Status][travis_img]][travis_href]
5 | [![Coverage Status][coveralls_img]][coverals_href]
6 | [![Documentation Status][rtfd_img]][rtfd_href]
7 | [![Stories in Ready][waffle_img]][waffle_href]
8 |
9 | [][wall.jpg]
10 |
11 | ## About
12 |
13 | This is a dataset management and visualization tool that is being built as part of the DDL Multidimensional Visualization Research Lab. See: [Parallel Coordinates](http://homes.cs.washington.edu/~jheer//files/zoo/ex/stats/parallel.html) for more on the types of visualizations we're experimenting with.
14 |
15 | For more information, please enjoy the documentation found at [trinket.readthedocs.org](http://trinket.readthedocs.org/).
16 |
17 | ### Contributing
18 |
19 | Trinket is open source, but because this is a District Data Labs project, we would appreciate it if you would let us know how you intend to use the software (other than simply copying and pasting code so that you can use it in your own projects). If you would like to contribute (especially if you are a student or research labs member at District Data Labs), you can do so in the following ways:
20 |
21 | 1. Add issues or bugs to the bug tracker: [https://github.com/DistrictDataLabs/trinket/issues](https://github.com/DistrictDataLabs/trinket/issues)
22 | 2. Work on a card on the dev board: [https://waffle.io/DistrictDataLabs/trinket](https://waffle.io/DistrictDataLabs/trinket)
23 | 3. Create a pull request in Github: [https://github.com/DistrictDataLabs/trinket/pulls](https://github.com/DistrictDataLabs/trinket/pulls)
24 |
25 | Note that labels in the Github issues are defined in the blog post: [How we use labels on GitHub Issues at Mediocre Laboratories](https://mediocre.com/forum/topics/how-we-use-labels-on-github-issues-at-mediocre-laboratories).
26 |
27 | If you are a member of the District Data Labs Faculty group, you have direct access to the repository, which is set up in a typical production/release/development cycle as described in _[A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/)_. A typical workflow is as follows:
28 |
29 | 1. Select a card from the [dev board](https://waffle.io/DistrictDataLabs/trinket), preferably one that is "ready", then move it to "in-progress".
30 |
31 | 2. Create a branch off of develop called "feature-[feature name]", work and commit into that branch.
32 |
33 | ~$ git checkout -b feature-myfeature develop
34 |
35 | 3. Once you are done working (and everything is tested) merge your feature into develop.
36 |
37 | ~$ git checkout develop
38 | ~$ git merge --no-ff feature-myfeature
39 | ~$ git branch -d feature-myfeature
40 | ~$ git push origin develop
41 |
42 | 4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.
43 |
44 | ### Throughput
45 |
46 | [](https://waffle.io/DistrictDataLabs/trinket/metrics)
47 |
48 | ### Attribution
49 |
50 | The image used in this README, ["window#1"][wall.jpg] by [Namelas Frade](https://www.flickr.com/photos/zingh/) is licensed under [CC BY-NC-ND 2.0](https://creativecommons.org/licenses/by-nc-nd/2.0/)
51 |
52 | ## Changelog
53 |
54 | The release versions that are sent to the Python package index (PyPI) are also tagged in Github. You can see the tags through the Github web application and download the tarball of the version you'd like. Additionally, PyPI will host the various releases of Trinket (eventually).
55 |
56 | The versioning uses a three-part version system, "a.b.c": "a" represents a major release that may not be backwards compatible; "b" is incremented on minor releases that may contain extra features but are backwards compatible; "c" releases are bug fixes or other micro changes that developers should feel free to update to immediately.
57 |
58 | ### Version 0.2
59 |
60 | * **tag**: [v0.2](https://github.com/DistrictDataLabs/trinket/releases/tag/v0.2)
61 | * **deployment**: Wednesday, January 27, 2016
62 | * **commit**: (see tag)
63 |
64 | This minor update gave a bit more functionality to the MVP prototype, even though the version was intended to have a much more impactful feature set. However, after some study, the workflow is changing, and so this development branch is being pruned and deployed in preparation for the next batch. The major achievements of this version are the documentation that discusses our approach and the dataset search and listing page that is now available.
65 |
66 | ### Version 0.1
67 |
68 | * **tag**: [v0.1](https://github.com/DistrictDataLabs/trinket/releases/tag/v0.1)
69 | * **deployment**: Tuesday, October 13, 2015
70 | * **commit**: [c863e42](https://github.com/DistrictDataLabs/trinket/commit/c863e421292be4eaeab36a9233f6ed7e0068679b)
71 |
72 | MVP prototype of a dataset uploader and management application. This application framework will become the basis for the research project in the DDL Multidimensional Visualization Research Labs. For now, users can upload datasets and manage their descriptions, as well as preview the first 20 rows.
73 |
74 |
75 | [travis_img]: https://travis-ci.org/DistrictDataLabs/trinket.svg?branch=master
76 | [travis_href]: https://travis-ci.org/DistrictDataLabs/trinket
77 | [coveralls_img]: https://coveralls.io/repos/DistrictDataLabs/trinket/badge.svg?branch=master&service=github
78 | [coverals_href]: https://coveralls.io/github/DistrictDataLabs/trinket?branch=master
79 | [waffle_img]: https://badge.waffle.io/DistrictDataLabs/trinket.png?label=ready&title=Ready
80 | [waffle_href]: https://waffle.io/DistrictDataLabs/trinket
81 | [rtfd_img]: https://readthedocs.org/projects/trinket/badge/?version=latest
82 | [rtfd_href]: http://trinket.readthedocs.org/en/latest/?badge=latest
83 | [wall.jpg]: https://flic.kr/p/75C2ac
84 |
--------------------------------------------------------------------------------
/docs/auto_analysis.md:
--------------------------------------------------------------------------------
1 | # Automated Analysis
2 |
3 | ## Overview
4 |
5 | Cultivar is designed to mirror what experienced data scientists do when they take their first few passes through a new dataset by intelligently automating large portions of the wrangling and analysis/exploration phases of the data science pipeline, integrating them into the initial ingestion or uploading phase.
6 |
7 | ## Architecture
8 |
9 | The auto-analysis and text parsing features of Cultivar are written in Python. They scan the columns of uploaded data using `numpy` and `unicodecsv`, and apply one-dimensional kernel density estimates (KDEs), standard analysis of variance (ANOVA) mechanisms, and hypothesis testing.
10 |
11 | 
12 |
13 | This enables Cultivar to do type identification, e.g. to identify and differentiate: discrete integers, floats, text data, normal distributions, classes, outliers, and errors. To perform this analysis quickly and accurately during the data ingestion process, Cultivar includes a rules-based system trained from previously annotated data sets and coupled with heuristic rules determined in discussions with a range of experienced data scientists.
14 |
15 | ## Mechanics
16 |
17 | Auto-analysis works by assigning each column/feature a data type (`dtype` in the parlance of NumPy and Pandas), e.g. categorical, numeric, real, integer, etc. These types must be automatically inferred from the dataset.
18 |
19 | The auto-analysis method takes as input a file-like object and generic keyword arguments and returns as output a tuple/list whose length is the (maximum) number of columns in the dataset, and whose values contain the datatype of each column, ordered by column index.
20 |
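As a rough sketch of that interface, the example below reads a file-like object and returns one type name per column, ordered by column index. It is not Cultivar's implementation, and several things in it are assumptions made for illustration: the function names (`infer_value_type`, `combine`, `infer_column_types`), the `YYYY-MM-DD` date format, the header row, and the simple "first `sample_size` rows" sample. Each cell is tested against the strictest parser first, and a column that mixes types falls back to the broadest one.

```python
import csv
import io
from datetime import datetime


def parse_bool(text):
    """Accept only the literals 'true' and 'false' (case-insensitive)."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    raise ValueError(text)


def parse_datetime(text):
    """Parse a date in the assumed YYYY-MM-DD format."""
    return datetime.strptime(text, "%Y-%m-%d")


# Strictest parser first; a value that fails every parser is a string.
PARSERS = [
    ("boolean", parse_bool),
    ("integer", int),
    ("float", float),
    ("datetime", parse_datetime),
]


def infer_value_type(text):
    """Return the name of the first parser that accepts the text."""
    for name, parser in PARSERS:
        try:
            parser(text)
            return name
        except ValueError:
            continue
    return "string"


def combine(types):
    """Collapse the set of value types seen in one column into a column type."""
    if len(types) == 1:
        return next(iter(types))
    if types == {"integer", "float"}:
        return "float"
    return "string"  # empty or mixed columns fall back to the broadest type


def infer_column_types(fileobj, delimiter=",", sample_size=50):
    """Return a list with one inferred type name per column, by column index."""
    reader = csv.reader(fileobj, delimiter=delimiter)
    next(reader, None)  # assumes the first row is a header
    seen = []  # seen[i] is the set of value types observed in column i
    for idx, row in enumerate(reader):
        if idx >= sample_size:
            break
        while len(seen) < len(row):  # grow to the widest row seen so far
            seen.append(set())
        for i, cell in enumerate(row):
            if cell.strip():  # ignore empty cells
                seen[i].add(infer_value_type(cell.strip()))
    return [combine(types) for types in seen]


data = io.StringIO(
    u"id,price,when,active,label\n"
    u"1,2.5,2015-10-08,true,a\n"
    u"2,3,2015-10-13,false,b\n"
)
print(infer_column_types(data))
```
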
21 |
22 | _Questions to answer:_
23 |
24 | - How do other libraries like `pandas` and `messytables` do this?
25 | Pandas computes [histograms](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L250), looks for the [min](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L537) and [max](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L556) values of a column, samples [quantiles](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L410), and counts [unique values](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L55).
26 |
27 | - Do you have to go through the whole dataset to make a decision?
28 | Yes and no; it depends on how big the dataset is. The strategy below builds a sample from 50 non-empty rows for each column, as well as the rows with the longest and shortest lengths. For larger datasets, maybe sample 10%. For extremely large datasets, 1% might be enough.
29 |
30 | - Can we use a sampling approach to reading the data?
31 | Naive method (assumes straightforward densities):
32 |
33 | ```python
34 | for each col in fileTypeObject:
35 |     find mx       # row with the longest value
36 |     find mn       # row with the shortest value
37 |     find nonNaN   # first 50 non-empty rows using ndarray.nonzero()
38 |     sampleArray = np.array([mn, mx, nonNaN])
39 | ```
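A runnable version of the naive sampling idea might look like the sketch below. It uses plain Python lists rather than `ndarray.nonzero()`, and the function name `sample_column` and the default of 50 are illustrative assumptions.

```python
def sample_column(values, n=50):
    """Return the shortest value, the longest value, and the first n non-empty values."""
    # Treat None and whitespace-only cells as empty (an assumption).
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]
    if not non_empty:
        return []
    shortest = min(non_empty, key=lambda v: len(str(v)))
    longest = max(non_empty, key=lambda v: len(str(v)))
    return [shortest, longest] + non_empty[:n]


column = ["12", "", "7", "1024", None, "  ", "3.5"]
sample = sample_column(column, n=3)
print(sample)
```

The shortest and longest values are included because they are the most likely to expose a type that the first few rows hide, such as a stray text label in a numeric column.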
40 |
41 | - Is there a certain density of data required to make a decision?
42 | This is a good question - some libraries build histograms for each column to examine densities. See the [`pandas` method for histograms](https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L250).
43 | TODO: look into thresholds
44 |
45 | - What types are we looking for?
46 | __string__, __datetime__, __float__, __integer__, __boolean__
47 | See also [`messytables` types](https://github.com/okfn/messytables/blob/master/messytables/types.py).
48 |
49 | Attempt parsing from broadest type to narrowest:
50 |
51 | ```python
52 | for val in colSample:
53 |     if val.dtype.type is np.string_:
54 |         colType = colType.astype('Sn')  # where n is the max length value in col
55 |     elif val.dtype.type is np.datetime64:
56 |         colType = colType.astype('datetime64')  # this is new & experimental in NumPy 1.7.0
57 |     elif val.dtype.type is np.float_:
58 |         colType = colType.astype('float64')
59 |     elif val.dtype.type is np.int_:
60 |         colType = colType.astype('int64')
61 |     elif val.dtype.type is np.bool_:
62 |         colType = colType.astype('bool')
63 |     else:
64 |         pass  # do something else
65 |         # what about unicode and complex types?
66 | ```
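When the sample arrives as raw strings (as it does when read from a CSV file), one alternative is to attempt casts directly. The sketch below is an assumption-laden illustration: it tries the narrowest type first, which reverses the ordering suggested above, and the accepted date formats and boolean spellings are placeholders rather than decided rules.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y")  # assumed formats
BOOLEANS = {"true", "false", "yes", "no"}  # assumed spellings


def parses(cast, text):
    """Return True if cast(text) succeeds without a ValueError."""
    try:
        cast(text)
        return True
    except ValueError:
        return False


def is_datetime(text):
    """Return True if text matches any of the assumed date formats."""
    return any(
        parses(lambda t, fmt=fmt: datetime.strptime(t, fmt), text)
        for fmt in DATE_FORMATS
    )


# Checks are ordered from narrowest to broadest; 'string' is the fallback.
CHECKS = (
    ("boolean", lambda t: t.lower() in BOOLEANS),
    ("integer", lambda t: parses(int, t)),
    ("float", lambda t: parses(float, t)),
    ("datetime", is_datetime),
)


def infer_column_type(sample):
    """Return the first type that every non-empty value satisfies."""
    values = [v.strip() for v in sample if v and v.strip()]
    if not values:
        return "string"
    for name, check in CHECKS:
        if all(check(v) for v in values):
            return name
    return "string"


print(infer_column_type(["1.5", "2", ""]))
```

Requiring every non-empty value to pass a check is strict: a single malformed cell demotes the whole column to `string`, so a tolerance threshold may be worth considering.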
67 |
68 | - What does column-major mean for Cultivar?
69 | Use [`transpose`](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ndarray.T.html) and/or [`reshape`](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.reshape.html) from `numpy`.
70 |
71 | - Can we automatically detect delimiters and quote characters? (e.g. ; vs ,)
72 | See `messytables` [method for delimiter detection](https://github.com/okfn/messytables/blob/master/messytables/commas.py).
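Python's standard library also offers a delimiter and quote-character heuristic in `csv.Sniffer`. The sketch below shows it on a made-up sample; the candidate delimiter list is an assumption, and the sniffer can raise `csv.Error` when it cannot decide.

```python
import csv

sample = "name;age;city\nalice;30;paris\nbob;25;rome\n"

# Restrict the search to a few plausible delimiters (an assumed list).
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")

print(repr(dialect.delimiter), repr(dialect.quotechar))

# The detected dialect can be passed straight to csv.reader.
rows = list(csv.reader(sample.splitlines(), dialect))
```

Because the sniffer only needs a text sample, it fits the sampling approach discussed earlier: the first few kilobytes of a file are usually enough.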
73 |
74 | - How do we detect if there is a header row or not?
75 | See `messytables` [method for header detection](https://github.com/okfn/messytables/blob/7e4f12abef257a4d70a8020e0d024df6fbb02976/messytables/headers.py).
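The standard library's `csv.Sniffer.has_header` is another heuristic to compare against. It votes on whether the first row differs in type or length from the rows that follow, so it is only a guess. The sample data below is made up for illustration.

```python
import csv

with_header_sample = "name,age\nalice,30\nbob,25\ncarol,41\n"
without_header_sample = "alice,30\nbob,25\ncarol,41\n"

sniffer = csv.Sniffer()

# The numeric second column makes the text header row stand out.
with_header = sniffer.has_header(with_header_sample)
without_header = sniffer.has_header(without_header_sample)

print(with_header, without_header)
```

The heuristic has little to work with when every column is free text of varying length, so a dataset made up entirely of strings may need a different signal.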
76 |
77 | - How lightweight/heavyweight must this be?
78 | Look into making more lightweight using regular expressions & hard-coded rules (see [Brill tagging](https://en.wikipedia.org/wiki/Brill_tagger)).
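A lightweight rule-based pass could be sketched with an ordered list of regular expressions, where the first matching rule labels a value and a majority vote labels the column. The patterns, labels, and the voting scheme are all illustrative assumptions, not rules taken from Cultivar.

```python
import re
from collections import Counter

# Ordered rules: the first pattern that matches a value wins (assumed patterns).
RULES = [
    ("boolean", re.compile(r"^(true|false)$", re.IGNORECASE)),
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("float", re.compile(r"^[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?$")),
    ("datetime", re.compile(r"^\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}(:\d{2})?)?$")),
]


def tag_value(text):
    """Label a single value using the first rule that matches."""
    stripped = text.strip()
    for label, pattern in RULES:
        if pattern.match(stripped):
            return label
    return "string"


def tag_column(values):
    """Label a column by majority vote over its non-empty values."""
    counts = Counter(tag_value(v) for v in values if v.strip())
    return counts.most_common(1)[0][0] if counts else "string"


print(tag_column(["1", "2", "oops", ""]))
```

Voting by majority lets a few malformed cells be treated as errors or outliers rather than forcing the whole column to `string`.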
79 |
80 | ## Sources
81 |
82 | [Datatypes in Python - 2.7](https://docs.python.org/2/library/datatypes.html)
83 |
84 | [Datatypes in Python - 3.5](https://docs.python.org/3.5/library/datatypes.html)
85 |
86 | [Numpy - dtypes](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)
87 |
88 | [UnicodeCSV](https://github.com/jdunck/python-unicodecsv/blob/master/README.rst)
89 |
90 | [Pandas](http://pandas.pydata.org/)
91 |
92 | [MessyTables](https://messytables.readthedocs.org/en/latest/)
93 |
94 | [Dataproxy](https://github.com/okfn/dataproxy)
95 |
96 | [Algorithms for Type Guessing - Stackoverflow](http://stackoverflow.com/questions/6824862/data-type-recognition-guessing-of-csv-data-in-python)
97 |
98 | [Python Libraries for Type Guessing - Stackoverflow](http://stackoverflow.com/questions/3098337/method-for-guessing-type-of-data-represented-currently-represented-as-strings-in)
99 |
--------------------------------------------------------------------------------
/trinket/templates/site/legal/privacy.html:
--------------------------------------------------------------------------------
1 | {% extends 'site/legal/legal-page.html' %}
2 |
3 | {% block legal-header %}
4 |
Your privacy is very important to us. Accordingly, we have developed this Policy in order for you to understand how we collect, use, communicate, and disclose personal information. The following outlines our privacy policy.
10 |
11 |
12 |
Before or at the time of collecting personal information, we will identify the purposes for which information is being collected.
13 |
We will collect and use personal information solely with the objective of fulfilling those purposes specified by us and for other compatible purposes, unless we obtain the consent of the individual concerned or as required by law.
14 |
We will only retain personal information as long as necessary for the fulfillment of those purposes.
15 |
We will collect personal information by lawful and fair means and, where appropriate, with the knowledge or consent of the individual concerned.
16 |
Personal data should be relevant to the purposes for which it is to be used, and, to the extent necessary for those purposes, should be accurate, complete, and up-to-date.
17 |
We will protect personal information by reasonable security safeguards against loss or theft, as well as unauthorized access, disclosure, copying, use or modification.
18 |
We will make readily available to customers information about our policies and practices relating to the management of personal information.
19 |
20 |
21 |
22 |
We are committed to conducting our business in accordance with these principles in order to ensure that the confidentiality of personal information is protected and maintained.
23 |
24 |
Information Collection and Use
25 |
26 |
Our primary goal in collecting information is to provide and improve our Site, App, and Services. We would like to deliver a user-customized experience on our Site, allowing users to administer their Membership and to enjoy and easily navigate the Site or App.
27 |
28 |
Personally Identifiable Information
29 |
30 |
When you register or create an account with us through the Site, or as a user of a Service provided by us, or through any Mobile App, we will ask you for personally identifiable information and you will become a member ("Member") of the site. This information refers to information that can be used to contact or identify you ("Personal Information"). Personal Information includes, but is not limited to, your name, phone number, email address, and home and business postal addresses. We use this information only to provide Services and administer your inquiries.
31 |
32 |
We may also collect other information as part of the registration for use in administration and personalization of your account. This information is "Non-Identifying Information" like your role in education. We use your Personal Information and, in some cases, your Non-Identifying Information to provide you a Service, complete your transactions, and administer your inquiries.
33 |
34 |
We will also use your Personal Information to contact you with newsletters, marketing, or promotional materials, and other information that may be of interest to you. If you decide at any time that you no longer wish to receive such communications from us, please follow the unsubscribe instructions provided in any communications update.
35 |
36 |
Changing or Deleting your Information
37 |
38 |
All Members may review, update, correct, or delete any Personal Information in their user profile under the "My Account" section of the Site or by contacting us. If you completely delete all such information, then your account may become deactivated. You can also request the deletion of your account, which will anonymize all Personal Information and restrict the username associated with the Member from being used again.
39 |
40 |
International Transfer
41 |
42 |
Your information may be transferred to, and maintained on, computers located outside of your state, province, country, or other governmental jurisdiction where the privacy laws may not be as protective as those in your jurisdiction. If you are located outside the United States and choose to provide information to us, our website transfers Personal Information to the United States and processes it there. Your consent to these Terms of Use, followed by your submission of such information, represents your agreement to that transfer.
43 |
44 |
Our Policy Toward Children
45 |
46 |
This Site is not directed to children under 18. We do not knowingly collect personally identifiable information from children under 13. If a parent or guardian becomes aware that his or her child has provided us Personal Information without their consent, he or she should contact us at admin@districtdatalabs.com. If we become aware that a child under 13 has provided us with Personal Information, we will delete such information from our databases.
47 |
48 |
Modification
49 |
50 |
It is our policy to post any changes we make to our Privacy Policy on this page. If we make material changes to how we treat our users' personal information, we will notify you by e-mail to the e-mail address specified in your account. The date this Privacy Policy was last revised is identified at the top of the page. You are responsible for ensuring we have an up-to-date active and deliverable e-mail address for you, and for periodically visiting our Website and this Privacy Policy to check for any changes.
Turns out that's fairly easy, so long as you're a member of the District Data Labs faculty. All you have to do is sign in with Google using your @districtdatalabs.com email address and you'll be given access. If you'd like to set a password so that you don't have to use Google, you can do so in the administrative interface. Note: if you can log in but still can't access the admin, please email Ben.
By accessing this website or any website owned by District Data Labs, you are agreeing to be bound to all of the terms, conditions, and notices contained or referenced in this Terms and Conditions of Use and all applicable laws and regulations. You also agree that you are responsible for compliance with any applicable local laws. If you do not agree to these terms, you are prohibited from using or accessing this site or any other site owned by District Data Labs. District Data Labs reserves the right to update or revise these Terms of Use. Your continued use of this Site following the posting of any changes to the Terms of Use constitutes acceptance of those changes.
14 |
15 |
Permission is granted to temporarily download one copy of the materials on District Data Labs's Websites for viewing only. This is a grant of a license, not a transfer of a title. Under this license you may not:
16 |
17 |
18 |
Modify or copy the materials
19 |
Use the materials for any commercial purpose, or any public display (commercial or non-commercial)
20 |
Attempt to decompile or reverse engineer any software contained or provided through District Data Labs's Website
21 |
Remove any copyright or proprietary notations from the material
22 |
Transfer the materials to another person or "mirror" any materials on any other server, including data accessed through our APIs
23 |
24 |
25 |
26 |
District Data Labs has the right to terminate this license if you violate any of these restrictions, and upon termination you are no longer allowed to view these materials and must destroy any downloaded content in either print or electronic format.
27 |
28 |
29 |
30 |
31 |
Modification
32 |
33 |
It is our policy to post any changes we make to our Terms of Use on this page. If we make material changes to how we treat our users' personal information, we will notify you by e-mail to the e-mail address specified in your account. The date these Terms of Use were last revised is identified at the top of the page. You are responsible for ensuring we have an up-to-date active and deliverable e-mail address for you, and for periodically visiting our Website and these Terms of Use to check for any changes.
34 |
35 |
36 |
37 |
38 |
Copyright
39 |
40 |
The entire content of this Site is protected by copyright. You may not copy, distribute, or create derivative works from any part of this website (including its graphics, pictorial matter, and text) without the prior written consent of District Data Labs unless otherwise expressly permitted by the Sites.
41 |
42 |
43 |
44 |
45 |
46 |
Trademarks
47 |
48 |
Names, logos, designs, titles, words, or phrases within this Site are trademarks, service marks, or trade names of District Data Labs or its affiliated companies and may not be used without prior written permission. District Data Labs claims no interest in marks owned by entities not affiliated with District Data Labs which may appear on this Site.
49 |
50 |
51 |
52 |
53 |
Contributed Content
54 |
55 |
Users posting content to the Site and District Data Labs's Social Media pages linked within are solely responsible for all content and any infringement, defamation, or other claims resulting from or related thereto. District Data Labs reserves the right to remove or refuse to post any content that is offensive, indecent, or otherwise objectionable, and makes no guarantee of the accuracy, integrity, or quality of posted content.
56 |
57 |
58 |
59 |
Account Registration
60 |
61 |
In order to access certain features of this Site and Services and to post any Content on the Site or through the Services, you must register to create an account ("Account") through the Site, or through a Service provided by us for use with our Site.
62 |
63 |
During the registration process, you will be required to provide certain information and you will establish a username and password. You agree to provide accurate, current, and complete information as required during the registration process. You also agree to ensure, by updating, the information remains accurate, current, and complete. District Data Labs reserves the right to suspend or terminate your Account if information provided during the registration process or thereafter proves to be inaccurate, not current, or incomplete.
64 |
65 |
You are responsible for safeguarding your password. You agree not to disclose your password to any third party and take sole responsibility for any activities or actions under your Account, whether or not you have authorized such activities or actions. If you think your account has been accessed in any unauthorized way, you will notify District Data Labs immediately.
66 |
67 |
Termination and Account Cancellation
68 |
69 |
District Data Labs will have the right to suspend or disable your Account if you breach any of these Terms of Service, at our sole discretion and without any prior notice to you. District Data Labs reserves the right to revoke your access to and use of this Site, Services, and Content at any time, with or without cause.
70 |
71 |
You may also cancel your Account at any time by sending an email to admin@districtdatalabs.com or by using the "delete account" option under the "My Account" section of the website. When your account is canceled, we set all personal information except your username to "Anonymous" and remove the ability to log in with that username and any password. The username will be considered unavailable, and no one will be able to create or use an account with the username of the canceled account.
72 |
73 |
74 |
75 |
76 |
Privacy
77 |
78 |
See District Data Labs's Privacy Policy for information and notices concerning collection and use of your personal information.
79 |
80 |
81 |
82 |
District Data Labs Mailing List
83 |
84 |
Should you submit your contact information through the "Sign Up" link, you agree to receive periodic emails and possible postal mail relating to news and updates regarding District Data Labs efforts and the efforts of like-minded organizations. You may discontinue receipt of such emails and postal mail through the “unsubscribe” provisions included in the promotional emails.
85 |
86 |
87 |
88 |
89 |
No Endorsements
90 |
91 |
Any links on this Site to third party websites are not an endorsement, sponsorship, or recommendation of the third parties or the third parties' ideas, products, or services. Similarly, any references in this Site to third parties and their products or services do not constitute an endorsement, sponsorship, or recommendation. If you follow links to third party websites, or any other companies or organizations affiliated or unaffiliated with District Data Labs, you are subject to the terms and conditions and privacy policies of those sites, and District Data Labs makes no warranty or representations regarding those sites. Further, District Data Labs is not responsible for the content of third party or affiliated company sites or any actions, inactions, results, or damages caused by visiting those sites.
92 |
93 |
94 |
95 |
96 |
Governing Law
97 |
98 |
This Site was designed for and is operated in the United States. Regardless of where the Site is viewed, you are responsible for compliance with applicable local laws.
99 |
100 |
You and District Data Labs agree that the laws of the District of Columbia will apply to all matters arising from or relating to use of this Website, whether for claims in contract, tort, or otherwise, without regard to conflicts of laws principles.
101 |
102 |
International Transfer
103 |
104 |
Your information may be transferred to, and maintained on, computers located outside of your state, province, country, or other governmental jurisdiction where the privacy laws may not be as protective as those in your jurisdiction. If you are located outside the United States and choose to provide information to us, District Data Labs transfers Personal Information to the United States and processes it there. Your consent to these Terms of Use, followed by your submission of such information, represents your agreement to that transfer.
105 |
106 |
107 |
108 |
109 |
Disclaimer
110 |
111 |
The materials on District Data Labs's Website are provided "as is" without any kind of warranty. The material on this Website is not a warranty as to any product or service provided by District Data Labs or any affiliated or unaffiliated organization.
112 |
113 |
District Data Labs is not liable for any errors, delays, inaccuracies, or omissions in this Website or any Website that is linked to or referenced by this Website. Under no circumstances shall District Data Labs be liable for any damages, including indirect, incidental, special, or consequential damages that result from the use of, or inability to use, this Website.
114 |
115 |
116 |
117 |
118 |
Agreement
119 |
120 |
These Terms of Use constitute the entire agreement between you and District Data Labs with respect to your use of this Site and supersede all prior or contemporaneous communications and proposals, whether oral, written, or electronic, between you and District Data Labs with respect to this Site. If any provision(s) of these Terms of Use are held invalid or unenforceable, those provisions shall be construed in a manner consistent with applicable law to reflect, as nearly as possible, the original intentions of the parties, and the remaining provisions shall remain in full force and effect.
121 |
122 | {% endblock %}
123 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "{}"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright 2015 District Data Labs
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------