├── Snippets.md
├── Tips.md
├── Tools.md
└── README.md


/Snippets.md:
--------------------------------------------------------------------------------
# Code snippets

_List here interesting, reusable pieces of code._

(Or should we just publish links to Gists?)


## JavaScript: retrieve species data from Wikidata (based on the scientific name)

TODO @niconoe: write snippet by extracting and simplifying code from the Catalogue of Lepidoptera of Belgium.


--------------------------------------------------------------------------------
/Tips.md:
--------------------------------------------------------------------------------
# Tips

_List here tips and lessons learned while developing Biodiversity Informatics-related projects._

## Modified Preorder Tree Traversal for storing taxonomic information in a RDBMS

I quite often have to store and query large taxonomic trees in a relational database. The naïve approach of having a `taxon` table with a recursive `parent_id` foreign key often doesn't work well, for performance reasons.

When reads are much more frequent than updates, I've had good experiences with a different table structure known as *Modified Preorder Tree Traversal*. The technique is described [here](https://gist.github.com/tmilos/f2f999b5839e2d42d751), and in many other places on the web.

If your project is web-based and uses the [Django framework](https://www.djangoproject.com/), you can implement and query this data structure very easily using [django-mptt](https://github.com/django-mptt/django-mptt).
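
As a rough illustration of the idea, here is a minimal sketch using only the Python standard library. It assumes a toy taxonomy and an in-memory SQLite database; the table layout (a `taxon` table with `lft` and `rgt` columns) and the helper names `number_tree`, `build_db`, `descendants` and `ancestors` are made up for this example and are not taken from django-mptt or the linked gist. Each node gets a left and a right number during one preorder walk, so a whole subtree can be read back with a single range comparison instead of a recursive query.

```python
import sqlite3

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENTS = {
    "Lepidoptera": None,
    "Nymphalidae": "Lepidoptera",
    "Pieridae": "Lepidoptera",
    "Vanessa": "Nymphalidae",
    "Vanessa atalanta": "Vanessa",
    "Vanessa cardui": "Vanessa",
    "Pieris": "Pieridae",
}


def number_tree(parents):
    """Return {name: (lft, rgt)} computed by one preorder walk of the tree."""
    children = {}
    for name, parent in parents.items():
        children.setdefault(parent, []).append(name)

    bounds = {}
    counter = 0

    def visit(name):
        nonlocal counter
        counter += 1
        lft = counter  # assigned when we enter the node
        for child in sorted(children.get(name, [])):
            visit(child)
        counter += 1
        bounds[name] = (lft, counter)  # rgt assigned when we leave the node

    for root in sorted(children[None]):
        visit(root)
    return bounds


def build_db(parents):
    """Create an in-memory table holding the lft/rgt bounds of every taxon."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        "name TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?)",
        [(name, lft, rgt) for name, (lft, rgt) in number_tree(parents).items()],
    )
    return conn


def descendants(conn, name):
    """All taxa strictly below `name`, in preorder, without recursion."""
    rows = conn.execute(
        "SELECT child.name FROM taxon AS child "
        "JOIN taxon AS parent "
        "ON child.lft > parent.lft AND child.rgt < parent.rgt "
        "WHERE parent.name = ? ORDER BY child.lft",
        (name,),
    )
    return [row[0] for row in rows]


def ancestors(conn, name):
    """All taxa strictly above `name`, from the root downwards."""
    rows = conn.execute(
        "SELECT anc.name FROM taxon AS anc "
        "JOIN taxon AS node "
        "ON anc.lft < node.lft AND anc.rgt > node.rgt "
        "WHERE node.name = ? ORDER BY anc.lft",
        (name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(PARENTS)
    print(descendants(conn, "Nymphalidae"))
    print(ancestors(conn, "Vanessa cardui"))
```

The trade-off is visible in `number_tree`: inserting or moving a taxon means renumbering part of the tree, which is why the approach suits read-heavy data. In this sketch the whole tree is simply renumbered from scratch; a library such as django-mptt handles updates for you.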


--------------------------------------------------------------------------------
/Tools.md:
--------------------------------------------------------------------------------
# List of tools

_List here reusable tools and libraries for Biodiversity Informatics._

## [Frictionless Darwin Core](https://github.com/frictionlessdata/FrictionlessDarwinCore)

[Frictionless Data Package](https://frictionlessdata.io/) is an emerging data standard similar to [Darwin Core Archive], but domain agnostic. This tool converts a [Darwin Core Archive] to a Data Package (from the CLI or Python code), giving access to the Frictionless ecosystem ([goodtables](https://github.com/frictionlessdata/goodtables-py), ...).

Currently in development, but the basics work and contributions are welcome.

## [pygbif](https://pygbif.readthedocs.io/en/latest/)

[GBIF] Python client. Allows exploring GBIF data (occurrences, taxonomic names, maps, datasets, ...) from Python.

## [pyinaturalist](https://github.com/niconoe/pyinaturalist)

Read/write Python client for the iNaturalist APIs. Used in production, but the API coverage is not complete yet (it allows creating, updating, deleting and searching observations, uploading pictures, setting *observation fields*, ...).

## [python-dwca-reader](https://github.com/BelgianBiodiversityPlatform/python-dwca-reader)

A simple Python package to read and parse [Darwin Core Archive] (DwC-A) files, as produced by the [GBIF] website, the [IPT](https://www.gbif.org/fr/ipt) and many other biodiversity informatics tools.

It aims to be Pythonic and simple to use, supports Darwin Core extensions, is quite stable, can deal with large archives and works on Python 2, 3 and Jython on Linux, Mac OS and Windows.

## [rgbif]

[rgbif] is an [R] package to search and retrieve data from the Global Biodiversity Information Facility ([GBIF]). It wraps [R] code around the GBIF API to allow you to talk to GBIF from [R].

[Darwin Core Archive]: https://en.wikipedia.org/wiki/Darwin_Core_Archive
[GBIF]: https://www.gbif.org
[R]: https://www.r-project.org/
[rgbif]: https://github.com/ropensci/rgbif


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
_The present text is initial food for thought for the purposes of discussion only; it does not represent the consensus view of the Developers interest group. Please consider everything here a TODO target for refinement. As an example of what we hope to see here, please see_ [other interest groups](https://www.tdwg.org/community/) _for exemplar charters._

# Developers - A TDWG Interest Group for Biodiversity Informatics Developers
## Conveners
*TODO* See [Responsibilities of Conveners](https://www.tdwg.org/community/management/).

## Quick start
* Star/follow this repository
* Join the [Slack channel](http://tdwg.slack.com/messages/developers) with your contribution to this community. Email to get an invitation to Slack.
* Read Contributing
* Browse or add [issues](https://github.com/tdwg/developers/issues)
* Nominate yourself (open an issue) to be a maintainer of this repository
* Read the rest of this document

## Motivation
There are a number of practical, personal, and theoretical advantages to building a community of developers addressing the particular needs of biodiversity science.

### Convening a supportive, diverse, and inclusive community
As in any community, there are challenges and hurdles that are particular to the work that happens in that community. A shared forum for expressing those challenges and how they were overcome, or abandoned, can be both informative and therapeutic. More formally, please read our [code of conduct TODO: choose/formalize code](CODE_OF_CONDUCT.md).

#### A note on closed-source and commercial software
While many in the group would argue that open-source software and practices are the ideal to which we should strive, the group acknowledges that biodiversity is vast, and we should not de facto exclude tools and approaches that are partially or fully closed-source, and certainly not exclude those that are partially or fully commercial. Identifying the role of closed-source and/or commercial products, and embracing the idea that they will always exist, better positions us to ensure that our efforts integrate more fully into the broader, more open and interoperable "ecosystem" of biodiversity-serving software standards. A deep understanding of closed products allows us to better communicate their impact to the people who use them and to those who mandate their use.

### Sharing code
We have collectively shared ideas, algorithms, and, by reference to standards, similar outcomes in our software, but to date we are, frankly, terrible at sharing code amongst our software. We are not just interested in sharing any code, but rather code that specifically _intersects the biological domain_. For example, many developers may use some specific database platform, but those platforms have nothing inherently biological about them. A very fast binary that parses scientific names, on the other hand, is a great example of what can be shared among software packages.

See specific pages about [tools and libraries](./Tools.md), [code snippets](./Snippets.md) and [tips and lessons learned](./Tips.md).

### Moving software development closer to data-standard development
Standards are arguably of little worth, or at best interesting theoretical experiments, if they are not implemented. It follows that the development _process_ that integrates standards is logically part of standard development itself. How can we more closely integrate the two?

### Identifying long-term paths for software that supports biodiversity informatics
Biodiversity science is a field that embraces history. Similarly, we should look deep into the future. As developers we know our software will be obsolete, if not minutes, then days or years after it is put in motion. How do we embrace the idea of planned obsolescence? How do we create software that is adaptable over decades or millennia? If we are honest about what our software can do, then the people who use it are better prepared when it needs resources to be updated, becomes obsolete, or flat out fails.

## Contributing
Follow the [TODO: Code of Conduct](CODE_OF_CONDUCT.md).

As developers we are technically inclined; therefore we should:
* *TODO*
* Use the issue tracker to initiate engagements and proposals
* Propose changes via pull requests
* Engage the Slack channel (or other technical resources as they evolve)

### Community guidelines
To steer our efforts toward the practical and generalizable, please consider the following as you engage:
* *TODO: clarify, evolve, etc. the guidelines*.
* Follow the guidelines for submitting issues. In general, when you submit an issue, imagine/state both the role and the problem and include these in the title, e.g. "*As a* Python developer *I need a* X *to do* Y".
* Try to keep focused on tools and questions that are inherently tied to the biological domain. There are many, many other support forums that are better, broader solutions to many questions. _This will be very difficult_, so please don't dissuade would-be members who initially are not following this guideline.
* Consider promoting "small" shareable code libraries and solutions rather than large, integrated specific software packages. The group does not want to become a forum promoting one software platform over another (but see _Code speaks loud_ below). If the group gains traction, then we may create specific forums for announcements and/or more casual and open conversations.
* *Code speaks loud*. Presenting a well-thought-out bit of code that demonstrates why something "needs improvement" is a wonderful contribution; it's also a good way to provide constructive criticism.
* As we communicate, strive to look deeper than the commonly espoused memes. For example, "we don't want to re-invent the wheel", "we need to be more efficient", and "we need a scalable industrial solution rather than boutique solutions" aren't particularly useful statements on their own. If we don't reinvent the wheel, how will we know we got it right? If we are incredibly efficient but our outcome has no meaning, that's a problem. No technology scales to biology, and if we had only one, it would be a monoculture at extreme risk of collapse.
* Think of this project as meta-development and promote best practices by using your best practices as you help to build the community. For example, you understand that a "+1" mechanism on issues is better than adding a comment, so you use "+1".
* Don't troll ("Use VIM, not Emacs, duh!").

## Gauging success
How do we know we have "won"? Developers are practical people; they need to get things (software) finished.
It follows that the Developers interest group can gauge success, in part, by practical outcomes:
* *Ultra-diamond-deluxe-MEGA-wins*: Pointers to fully deprecated repositories retired because shared solutions were found and are now being used, i.e. "they will know us by our trail of death". _But see Monoliths_.
* Number of issues opened/closed on this tracker
* Pointers to Git repositories for libraries that are dependencies of multiple software packages, e.g. using GitHub's `/network/dependents` tracking.
* Year-end reports to TDWG that identify specific software packages that are used by multiple development teams
* Shared unit test targets
* Shared feature tests (feature tests as standards)
* Development emerging around Gold standards
* The emergence of meta-software developed as an outcome of this group that uses CI/CD to facilitate interoperability
* Viewer count logs for live-coding streams on Twitch, YouTube, or Mixer
* Chat logs
* Deltas of the number of DOIs generated by the Developers community publishing in [JOSS](https://joss.theoj.org/), or similar journals or DOI-granting resources
* Published papers as an outcome
* Hackathons or other workshops or conferences as an outcome
* ...

## History
The Developers interest group was conceived in part during the 2019 BiodiversityNext meeting in Leiden and presented as part of the "unconference" there. The original, now read-only, document used at the conference is at [https://bit.ly/2WdQRGT](https://bit.ly/2WdQRGT).

## Summary
The Developers interest group sees software as the bridge between biological standards and the people (and machines) that use them. To that end it seeks to convene a community of developers, their supporters, and managers to explore how software can improve standards and, practically, the day-to-day lives of the people it serves.
Further, we recognize that the biodiversity developer community is disparate, and that unifying aspects of the community should lead to better personal support, more interoperable software, more rapid convergence to optimal solutions, and a richer environment to share and evolve biologically related development practices.

## Resources
* [This Developers repository](https://github.com/tdwg/developers).
* [Join the Slack channel](https://tdwg.slack.com/messages/developers)
* [Request a Slack account](mailto:)


--------------------------------------------------------------------------------