├── .github
    └── ISSUE_TEMPLATE
    │   ├── new-term-template.md
    │   └── term-change-template.md
├── .gitignore
├── README.md
├── build
    ├── README.md
    ├── build-termlist.py
    ├── build.py
    ├── build_other_doc_header.py
    ├── dwc_doc_hierarchy
    │   └── index.md
    ├── dwc_doc_inclusive
    │   └── index.md
    ├── dwc_doc_tcr
    │   ├── authors_configuration.yaml
    │   ├── document_configuration.yaml
    │   └── termlist-header.md
    ├── generate_term_versions.py
    ├── qrg-list.csv
    ├── requirements.txt
    ├── tcr-2024-02-28
    │   ├── config.yaml
    │   ├── tcr.csv
    │   └── vocab.yaml
    ├── tcr_build.py
    ├── termlist-footer.md
    ├── termlist-header.md
    ├── termlist-header_filled.md
    ├── terms.tmpl
    └── update_previous_doc.py
├── dist
    ├── simple_eco_horizontal.csv
    └── simple_eco_vertical.csv
├── docs
    ├── CNAME
    ├── _config.yml
    ├── _data
    │   ├── footer.yml
    │   └── navigation.yml
    ├── _sass
    │   └── _custom.scss
    ├── hierarchy
    │   ├── fig1.png
    │   ├── fig2.png
    │   ├── fig3.png
    │   └── index.md
    ├── humboldt_extension_implementation_experience_report.pdf
    ├── inclusive
    │   └── index.md
    ├── index.md
    ├── list
    │   ├── 2023-08-08.md
    │   ├── 2023-08-25.md
    │   ├── 2023-09-03.md
    │   ├── 2023-09-04.md
    │   ├── 2024-02-28.md
    │   └── index.md
    ├── tcr
    │   └── index.md
    └── terms
    │   └── index.md
├── material
    ├── Checklist Metadata - Data Entry Manual.docx
    ├── Guralnick et al Ecography 2017.pdf
    ├── HCSupplementalTable3_FullTermList_r2_v4_RW.xlsx
    ├── HC_SupplementalTable_ExamplesNEW.xlsx
    ├── TDWG_Task_Group_Charter_Template_03.docx
    └── desktop.ini
└── vocabulary
    ├── old
        ├── HC_terms_2021-02-28.csv
        ├── HC_terms_2021-11-17.csv
        ├── HC_terms_2022-02-25.csv
        ├── HC_terms_2022-03-02.csv
        ├── README.md
        └── term_versions_eco_2023-03-02.csv
    └── term_versions.csv


/.github/ISSUE_TEMPLATE/new-term-template.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: New term template
 3 | about: This template sets up a new issue with the information needed for a new term
 4 |   request.
 5 | title: 'New Term - '
 6 | labels: Term - add
 7 | assignees: ''
 8 | 
 9 | ---
10 | 
11 | ## New term
12 | 
13 | * Submitter: 
14 | * Efficacy Justification (why is this term necessary?): 
15 | * Demand Justification (name at least two organizations that independently need this term): 
16 | * Stability Justification (what concerns are there that this might affect existing implementations?): 
17 | * Implications for dwciri: namespace (does this change affect a dwciri term version)?: 
18 | 
19 | Proposed attributes of the new term:
20 | 
21 | * Term name (in lowerCamelCase for properties, UpperCamelCase for classes): 
22 | * Term label (English, not normative): 
23 | * Organized in Class (e.g., Occurrence, Event, Location, Taxon): 
24 | * Definition of the term (normative): 
25 | * Usage comments (recommendations regarding content, etc., not normative): 
26 | * Examples (not normative): 
27 | * Refines (identifier of the broader term this term refines; normative): 
28 | * Replaces (identifier of the existing term that would be deprecated and replaced by this term; normative): 
29 | * ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG; not normative):
30 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/term-change-template.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Term change template
 3 | about: This template sets up a new issue with the information needed for a term change
 4 |   request.
 5 | title: 'Change term - '
 6 | labels: ''
 7 | assignees: ''
 8 | 
 9 | ---
10 | 
11 | ## Term change
12 | 
13 | * Submitter: 
14 | * Efficacy Justification (why is this change necessary?): 
15 | * Demand Justification (if the change is semantic in nature, name at least two organizations that independently need this term): 
16 | * Stability Justification (what concerns are there that this might affect existing implementations?): 
17 | * Implications for dwciri: namespace (does this change affect a dwciri term version)?: 
18 | 
19 | Current Term definition: https://eco.tdwg.org/list/#eco_[term name here]
20 | 
21 | Proposed attributes of the new term version (Please put actual changes to be implemented in **bold** and ~strikethrough~):
22 | 
23 | * Term name (in lowerCamelCase for properties, UpperCamelCase for classes): 
24 | * Term label (English, not normative): 
25 | * Organized in Class (e.g., Occurrence, Event, Location, Taxon): 
26 | * Definition of the term (normative): 
27 | * Usage comments (recommendations regarding content, etc., not normative): 
28 | * Examples (not normative): 
29 | * Refines (identifier of the broader term this term refines; normative): 
30 | * Replaces (identifier of the existing term that would be deprecated and replaced by this term; normative): 
31 | * ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG; not normative):
32 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Mac OS X
2 | .DS_Store
3 | 
4 | # Jekyll
5 | docs/_site/*
6 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Humboldt Extension Task Group
 2 | A TDWG Task Group of the Observations & specimens Interest Group
 3 | 
 4 | ## :arrow_upper_right: Current activity
 5 | 
 6 | [Guralnick et al.](https://onlinelibrary.wiley.com/doi/full/10.1111/ecog.02942) introduced the Humboldt Core as a proof of concept in 2018. In **2021**, the [TDWG Humboldt Extension Task Group](https://www.tdwg.org/community/osr/humboldt-extension/) was established to review how to best integrate the terms proposed in the original publication with existing standards and implementation schemas. In the context of sharing data using the **Darwin Core standard**, different types of inventories can be represented as **Events** with different nesting levels. Therefore, it was deemed appropriate to build a **DwC Extension** to include all Humboldt Extension terms that capture the details of the inventory process.
 7 | 
 8 | The Task Group members reviewed all original terms from [Guralnick et al. 2018](https://onlinelibrary.wiley.com/doi/full/10.1111/ecog.02942), reformulated definitions, and discarded or added terms where needed. The latest version of the terms can be found in the [vocabulary folder](https://github.com/tdwg/hc/tree/main/vocabulary). We have also built a [Humboldt Extension Quick Reference Guide](<https://tdwg.github.io/hc/terms/>).
 9 | 
10 | We are currently developing a [Humboldt Extension user guide](<https://tinyurl.com/humboldt-documentation>). In the meantime, see the [summarised version published by GBIF](https://docs.gbif.org/survey-monitoring-quick-start/en/).
11 | 
12 | As of February 2024, the Humboldt Extension was ratified as a Darwin Core Event class extension (https://eco.tdwg.org/). Hence, the Task Group was closed, and the extension is now maintained by the Darwin Core Maintenance Group (https://www.tdwg.org/community/dwc/).
13 | 
14 | <br>
15 | 
16 | ## :books: Relevant documents and materials
17 | 
18 | [TG Charter](https://github.com/MapofLife/hc/blob/main/material/TDWG_Task_Group_Charter_Template_03.docx)
19 | 
20 | [Humboldt Core paper](https://github.com/MapofLife/hc/blob/main/material/Guralnick%20et%20al%20Ecography%202017.pdf) Guralnick et al. 2017
21 | 
22 | [HC Supplemental: Full Term List](https://github.com/MapofLife/hc/blob/main/material/HCSupplementalTable3_FullTermList_r2_v4_RW.xlsx)
23 | 
24 | [HC Supplemental: Examples table](https://github.com/MapofLife/hc/blob/main/material/HC_SupplementalTable_ExamplesNEW.xlsx)
25 | 
26 | 
27 | 


--------------------------------------------------------------------------------
/build/README.md:
--------------------------------------------------------------------------------
 1 | # Build scripts
 2 | 
 3 | ## Generating the "list of terms" document for the main eco vocabulary
 4 | 
 5 | Before building the production List of Terms document, run the Python script `update_previous_doc.py` to change the headers of the previous version of the document and to rename that previous version to a dated version. This must be done first; otherwise the previous `index.md` file will be overwritten by the new one generated by the `build-termlist.py` script. NOTE: The vocabulary and document metadata in the rs.tdwg.org repository must have been updated before running this script. See <https://github.com/tdwg/rs.tdwg.org/blob/master/process/process-vocabulary.md> for details. Command line arguments are:
 6 | 
 7 | `--slug` (required): the last part of the document URL before the trailing slash. For the Humboldt Extension List of Terms, this is `list`.
 8 | 
 9 | `--dir` (required): the subdirectory of the `process/document_metadata_processing/` directory in the rs.tdwg.org repository where the `authors_configuration.yaml` and `document_configuration.yaml` files are located. For the Humboldt Extension List of Terms, this is `dwc_doc_eco`.
10 | 
11 | `--branch` (optional): the branch of the rs.tdwg.org repository where the metadata are located. The default is `master`.
12 | 
13 | If you are creating a List of Terms document for proofreading prior to ratification, then you should first create a branch of the repo so that the previous version of the document in master is not overwritten and so that the preliminary draft does not appear in the GitHub pages site. You can then run the build-termlist.py script to generate a new index.md file and commit it to the branch. You can then look at the document in the GitHub repository (not the eco.tdwg.org GitHub pages site) to see how it is rendered. It will not have the styling that is provided by the TDWG Jekyll theme.
14 | 
15 | The Python script `build-termlist.py` reads the header template from `termlist-header.md`, then builds the list of terms and their metadata from data in the [rs.tdwg.org](http://github.com/tdwg/rs.tdwg.org) repository. The script also reads `termlist-footer.md` and appends it to the end of the generated document, although that file currently has no content. After the header, term list, and footer are concatenated, the script inserts author and document metadata from the `authors_configuration.yaml` and `document_configuration.yaml` files in the rs.tdwg.org GitHub repository. Therefore, those files must be in place and updated prior to running the script. The constructed Markdown document is saved as `/docs/list/index.md`. 
16 | 
17 | Command line arguments are:
18 | 
19 | `--branch` (optional): the branch of the rs.tdwg.org repository where the metadata are located. The default is `master`.
20 | 
21 | ## Generating the taxonCompletenessReported CV document
22 | 
23 | As with the List of Terms document, the `update_previous_doc.py` script must be run prior to generating a production version of the document in order to update headers and preserve the previous version of the document. In this case the command line arguments are:
24 | 
25 | `--slug` (required): `tcr`.
26 | 
27 | `--dir` (required): `dwc_doc_tcr`.
28 | 
29 | `--branch` (optional): default is `master`.
30 | 
31 | The vocabulary and document metadata in the rs.tdwg.org repository must also have been updated before running the `tcr_build.py` script.
32 | 
33 | Command line arguments for `tcr_build.py` are:
34 | 
35 | `--branch` (optional): the branch of the rs.tdwg.org repository where the metadata are located. The default is `master`.
36 | 
37 | The header template is in the file `dwc_doc_tcr/termlist-header.md`. The header metadata is inserted from metadata in the rs.tdwg.org repository. The term list itself is generated from CSV metadata uploaded to the rs.tdwg.org repository. The output file will be written to `docs/tcr/index.md`.
38 | 
39 | 
40 | ## Generating the additional standards documents from their templates
41 | 
42 | The script `build_other_doc_header.py` retrieves document and author metadata from the rs.tdwg.org repo (as with the List of Terms document) and inserts them into the header of the document template, which is stored in a subdirectory whose name parallels the permanent IRI of the document (e.g. for `http://rs.tdwg.org/dwc/doc/hierarchy/`, the directory is `dwc_doc_hierarchy`; see the [example document template](https://github.com/tdwg/hc/blob/main/build/dwc_doc_hierarchy/index.md)). Unlike the List of Terms document, the document template is largely hand-edited (except for the header). If the document content is to be updated, it must be edited in the template file, and the production document regenerated by this script.
43 | 
44 | Command line options are:
45 | 
46 | `--slug` (required): the last part of the document URL before the trailing slash. For example, the slug for `http://rs.tdwg.org/dwc/doc/hierarchy/` is `hierarchy`. 
47 | 
48 | `--branch` (optional): the branch of the rs.tdwg.org repository where the metadata are located. The default is `master`.
49 | 
50 | Once the production document is generated in the `docs` directory, check the diff of the production document to make sure it makes sense ([example production doc](https://github.com/tdwg/hc/blob/main/docs/hierarchy/index.md)), then push the change to GitHub. If the main branch is being used, it will take some time for GitHub Pages to rebuild the site. When that is done, the TDWG styling will be applied to the production page.
51 | 
52 | ## Generating the "normative document" (term versions CSV file)
53 | 
54 | The script `generate_term_versions.py` pulls source data from the [rs.tdwg.org](http://github.com/tdwg/rs.tdwg.org) repository. The local file `qrg-list.csv` contains a list of the term IRIs in the order that they are to appear in the Quick Reference Guide. This list needs to be changed whenever terms are added to or deprecated from Darwin Core.
55 | 
56 | It generates the file `term_versions.csv`, which is used as the input for the `build.py` script below.
57 | 
58 | NOTE: the branch of rs.tdwg.org is hard-coded as `master`. If updates are made using a different branch, this will need to be changed. It would probably be a good idea to make this a command line option by copying the code from `build_other_doc_header.py`.
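
The note above suggests exposing the branch as a command-line option. A minimal sketch of how that might look follows. The helper name `parse_branch` and the use of `argparse` are assumptions made for illustration (the scripts shown in this repository parse `sys.argv` by hand); only the `--branch` flag and its `master` default come from this README.

```python
import argparse


def parse_branch(argv=None):
    """Return the rs.tdwg.org branch named by --branch (default: 'master').

    Hypothetical helper, not part of the repository's scripts.
    """
    parser = argparse.ArgumentParser(description='Generate term_versions.csv')
    parser.add_argument('--branch', default='master',
                        help='branch of the rs.tdwg.org repository to read from')
    # parse_known_args tolerates other options instead of exiting with an error
    known, _unknown = parser.parse_known_args(argv)
    return known.branch


# Build the raw-file base URL the same way build-termlist.py does.
# An empty list is passed here for the demo; pass None to read the real command line.
github_branch = parse_branch([])
github_base_uri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/' + github_branch + '/'
```

Keeping the default at `master` means existing invocations without the flag would behave as they do now.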
59 | 
60 | ## Build script for the eco Quick Reference Guide
61 | 
62 | The build script `build.py` uses as input:
63 | 
64 | * [vocabulary/term_versions.csv](../vocabulary/term_versions.csv): the list of terms
65 | * [terms.tmpl](terms.tmpl): a Jinja2 template for the quick reference guide
66 | 
67 | And creates:
68 | 
69 | * The quick reference guide is a Markdown file at [docs/terms/index.md](../docs/terms/index.md). The guide is built as Markdown (with a lot of included html) rather than html, so it can be incorporated by Jekyll in the Darwin Core website (including a header, footer and table of contents).
70 | * Two simple Darwin Core CSV files in [dist/](../dist/)
71 | 
72 | **Run the build script**
73 | 
74 | 1. Install the required libraries (once):
75 | 
76 |     ```bash
77 |     pip install -r requirements.txt
78 |     ```
79 | 
80 | 2. Run the script from the command line:
81 | 
82 |     ```bash
83 |     python build.py
84 |     ```
85 | 
86 | 


--------------------------------------------------------------------------------
/build/build-termlist.py:
--------------------------------------------------------------------------------
  1 | # Script to build Markdown pages that provide term metadata for complex vocabularies
  2 | # Steve Baskauf 2020-06-28 CC0
  3 | # Modified for use with Humboldt Extension 2022-05-29
  4 | # This script merges static Markdown header and footer documents with term information tables (in Markdown) generated from data in the rs.tdwg.org repo from the TDWG GitHub site
  5 | 
  6 | import re
  7 | import requests   # best library to manage HTTP transactions
  8 | import csv        # library to read/write/parse CSV files
  9 | import json       # library to convert JSON to Python data structures
 10 | import pandas as pd
 11 | import yaml
 12 | import sys
 13 | 
 14 | # -----------------
 15 | # Command line arguments
 16 | # -----------------
 17 | 
 18 | arg_vals = sys.argv[1:]
 19 | opts = [opt for opt in arg_vals if opt.startswith('-')]
 20 | args = [arg for arg in arg_vals if not arg.startswith('-')]
 21 | 
 22 | # "master" for production, something else for development
 23 | # Example: First part of branch URL is "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/eco/", branch is "eco".
 24 | if '--branch' in opts:
 25 |     github_branch = args[opts.index('--branch')]
 26 | else:
 27 |     github_branch = 'master'
 28 | 
 29 | # -----------------
 30 | # Configuration section
 31 | # -----------------
 32 | 
 33 | # This is the base URL for raw files from the branch of the repo that has been pushed to GitHub
 34 | githubBaseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/' + github_branch + '/'
 35 | 
 36 | headerFileName = 'termlist-header.md'
 37 | footerFileName = 'termlist-footer.md'
 38 | outFileName = '../docs/list/index.md'
 39 | 
 40 | # This is a Python list of the database names of the term lists to be included in the document.
 41 | termLists = ['humboldt', 'humboldt_iri']
 42 | 
 43 | # If this list of terms is for terms in a single namespace, set the value of has_namespace to True. The value
 44 | # of has_namespace should be False for a list of terms that contains multiple namespaces.
 45 | has_namespace = False
 46 | 
 47 | # NOTE! There may be problems unless every term list is of the same vocabulary type since the number of columns will differ
 48 | # However, there probably aren't any circumstances where mixed types will be used to generate the same page.
 49 | vocab_type = 1 # 1 is simple vocabulary, 2 is simple controlled vocabulary, 3 is c.v. with broader hierarchy
 50 | 
 51 | # Terms in large vocabularies like Darwin and Audubon Cores may be organized into categories using tdwgutility_organizedInClass
 52 | # If so, those categories can be used to group terms in the generated term list document.
 53 | organized_in_categories = True
 54 | 
 55 | # If organized in categories, the display_order list must contain the IRIs that are values of tdwgutility_organizedInClass
 56 | # If not organized into categories, the value is irrelevant. There just needs to be one item in the list.
 57 | 
 58 | display_order = [ 'http://rs.tdwg.org/dwc/terms/Event', 'http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI']
 59 | display_label = ['Literal-value terms', 'IRI-value terms']
 60 | display_comments = ['','']
 61 | display_id = ['event', 'use_with_iri']
 62 | 
 63 | # ---------------
 64 | # Load header data
 65 | # ---------------
 66 | 
 67 | config_file_path = 'process/document_metadata_processing/dwc_doc_eco/'
 68 | contributors_yaml_file = 'authors_configuration.yaml'
 69 | document_configuration_yaml_file = 'document_configuration.yaml'
 70 | 
 71 | if has_namespace:
 72 |     # Load the data about the namespace from term lists metadata at rs.tdwg.org
 73 |     term_lists_df = pd.read_csv(githubBaseUri +  'term-lists/term-lists.csv')
 74 |     # Find the row in the term-lists.csv file that corresponds to the database.
 75 |     term_list_row = term_lists_df.loc[term_lists_df['database'] == termLists[0]]
 76 |     # Extract the namespace IRI and preferred namespace prefix from the row.
 77 |     namespace_uri = term_list_row['vann_preferredNamespaceUri'].values[0]
 78 |     pref_namespace_prefix = term_list_row['vann_preferredNamespacePrefix'].values[0]
 79 | 
 80 |     '''
 81 | 
 82 |     metadata_config_text = requests.get(githubBaseUri + config_file_path + 'config.yaml').text
 83 |     metadata_config = yaml.load(metadata_config_text, Loader=yaml.FullLoader)
 84 |     namespace_uri = metadata_config['namespaces'][0]['namespace_uri']
 85 |     pref_namespace_prefix = metadata_config['namespaces'][0]['pref_namespace_prefix']
 86 |     '''
 87 | 
 88 | # Load the contributors YAML file from its GitHub URL
 89 | contributors_yaml_url = githubBaseUri + config_file_path + contributors_yaml_file
 90 | contributors_yaml = requests.get(contributors_yaml_url).text
 91 | if contributors_yaml == '404: Not Found':
 92 |     print('Contributors YAML file not found. Check the URL.')
 93 |     print(contributors_yaml_url)
 94 |     exit()
 95 | contributors_yaml = yaml.load(contributors_yaml, Loader=yaml.FullLoader)
 96 | 
 97 | # Load the document configuration YAML file from its GitHub URL
 98 | document_configuration_yaml_url = githubBaseUri + config_file_path + document_configuration_yaml_file
 99 | document_configuration_yaml = requests.get(document_configuration_yaml_url).text
100 | document_configuration_yaml = yaml.load(document_configuration_yaml, Loader=yaml.FullLoader)
101 | 
102 | # ---------------
103 | # Function definitions
104 | # ---------------
105 | 
106 | # replace URL with link
107 | #
108 | def createLinks(text):
109 |     def repl(match):
110 |         if match.group(1)[-1] == '.':
111 |             return '<a href="' + match.group(1)[:-1] + '">' + match.group(1)[:-1] + '</a>.'
112 |         return '<a href="' + match.group(1) + '">' + match.group(1) + '</a>'
113 | 
114 |     pattern = r'(https?://[^\s,;\)"]*)'  # raw string avoids invalid-escape warnings
115 |     result = re.sub(pattern, repl, text)
116 |     return result
117 | 
118 | # 2021-08-06 Replace the createLinks() function with functions copied from the QRG build script written by S. Van Hoey
119 | def convert_code(text_with_backticks):
120 |     """Takes all back-quoted sections in a text field and converts it to
121 |     the html tagged version of code blocks <code>...</code>
122 |     """
123 |     return re.sub(r'`([^`]*)`', r'<code>\1</code>', text_with_backticks)
124 | 
125 | def convert_link(text_with_urls):
126 |     """Takes all links in a text field and converts it to the html tagged
127 |     version of the link
128 |     """
129 |     def _handle_matched(inputstring):
130 |         """quick hack version of url handling on the current prime versions data"""
131 |         url = inputstring.group()
132 |         return "<a href=\"{}\">{}</a>".format(url, url)
133 | 
134 |     regx = r"(http[s]?://[\w\d:#@%/;$()~_?\+-;=\\.&]*)(?<![\)\.,])"  # raw string, same pattern as before
135 |     return re.sub(regx, _handle_matched, text_with_urls)
136 | 
137 | # Hack the code taken from the terms.tmpl template to insert the HTML necessary to make the semicolon-separated
138 | # lists of examples into an HTML list.
139 | # {% set examples = term.examples.split("; ") %}
140 | # {% if examples | length == 1 %}{{ examples | first }}{% else %}<ul class="list-group list-group-flush">{% for example in examples %}<li class="list-group-item">{{ example }}</li>{% endfor %}</ul>{% endif %}
141 | def convert_examples(text_with_list_of_examples: str) -> str:
142 |     examples_list = text_with_list_of_examples.split('; ')
143 |     if len(examples_list) == 1:
144 |         return examples_list[0]
145 |     else:
146 |         output = '<ul class="list-group list-group-flush">\n'
147 |         for example in examples_list:
148 |             output += '  <li class="list-group-item">' + example + '</li>\n'
149 |         output += '</ul>'
150 |         return output
151 | 
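
The three conversion helpers above can be exercised on their own. The sketch below repeats their logic in self-contained form (with the URL pattern written as a raw string) and applies them to made-up input strings. The sample texts are illustrative only and are not taken from the vocabulary data.

```python
import re


def convert_code(text_with_backticks):
    """Convert back-quoted spans to <code>...</code>."""
    return re.sub(r'`([^`]*)`', r'<code>\1</code>', text_with_backticks)


def convert_link(text_with_urls):
    """Wrap each URL in an <a> tag, leaving trailing ')', '.' or ',' outside the link."""
    def _handle_matched(match):
        url = match.group()
        return '<a href="{}">{}</a>'.format(url, url)

    regx = r"(http[s]?://[\w\d:#@%/;$()~_?\+-;=\\.&]*)(?<![\)\.,])"
    return re.sub(regx, _handle_matched, text_with_urls)


def convert_examples(text_with_list_of_examples):
    """Turn a '; '-separated list of examples into an HTML list (single items pass through)."""
    examples_list = text_with_list_of_examples.split('; ')
    if len(examples_list) == 1:
        return examples_list[0]
    items = ''.join('  <li class="list-group-item">' + example + '</li>\n'
                    for example in examples_list)
    return '<ul class="list-group list-group-flush">\n' + items + '</ul>'


# Illustrative inputs (hypothetical, not real term metadata)
print(convert_code('Use `true` or `false`'))
print(convert_link('See https://eco.tdwg.org/list/.'))
print(convert_examples('`1`; `2`; `3`'))
```

Note that the separator is the two-character string `'; '`, so a semicolon without a following space does not split the examples.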
152 | print('Retrieving term list metadata from GitHub')
153 | term_lists_info = []
154 | 
155 | frame = pd.read_csv(githubBaseUri + 'term-lists/term-lists.csv', na_filter=False)
156 | for termList in termLists:
157 |     # Start from the database name; pref_ns_prefix, pref_ns_uri and list_iri are filled in from term-lists.csv below
158 |     term_list_dict = {'database': termList}
159 |     for index,row in frame.iterrows():
160 |         if row['database'] == termList:
161 |             term_list_dict['pref_ns_prefix'] = row['vann_preferredNamespacePrefix']
162 |             term_list_dict['pref_ns_uri'] = row['vann_preferredNamespaceUri']
163 |             term_list_dict['list_iri'] = row['list']
164 |     term_lists_info.append(term_list_dict)
165 | #print(term_lists_info)
166 | 
167 | # Create column list
168 | column_list = ['pref_ns_prefix', 'pref_ns_uri', 'term_localName', 'label', 'rdfs_comment', 'dcterms_description', 'examples', 'term_modified', 'term_deprecated', 'rdf_type', 'tdwgutility_abcdEquivalence', 'replaces_term', 'replaces1_term']
169 | if vocab_type == 2:
170 |     column_list += ['controlled_value_string']
171 | elif vocab_type == 3:
172 |     column_list += ['controlled_value_string', 'skos_broader']
173 | if organized_in_categories:
174 |     column_list.append('tdwgutility_organizedInClass')
175 | column_list.append('version_iri')
176 | 
177 | print('Retrieving metadata about terms from all namespaces from GitHub')
178 | # Create list of lists metadata table
179 | table_list = []
180 | for term_list in term_lists_info:
181 |     # retrieve versions metadata for term list
182 |     versions_url = githubBaseUri + term_list['database'] + '-versions/' + term_list['database'] + '-versions.csv'
183 |     versions_df = pd.read_csv(versions_url, na_filter=False)
184 |     
185 |     # retrieve current term metadata for term list
186 |     data_url = githubBaseUri + term_list['database'] + '/' + term_list['database'] + '.csv'
187 |     frame = pd.read_csv(data_url, na_filter=False)
188 |     for index,row in frame.iterrows():
189 |         row_list = [term_list['pref_ns_prefix'], term_list['pref_ns_uri'], row['term_localName'], row['label'], row['rdfs_comment'], row['dcterms_description'], row['examples'], row['term_modified'], row['term_deprecated'], row['rdf_type'], row['tdwgutility_abcdEquivalence'], row['replaces_term'], row['replaces1_term']]
190 |         #row_list = [term_list['pref_ns_prefix'], term_list['pref_ns_uri'], row['term_localName'], row['label'], row['definition'], row['usage'], row['notes'], row['term_modified'], row['term_deprecated'], row['type']]
191 |         if vocab_type == 2:
192 |             row_list += [row['controlled_value_string']]
193 |         elif vocab_type == 3:
194 |             if row['skos_broader'] =='':
195 |                 row_list += [row['controlled_value_string'], '']
196 |             else:
197 |                 row_list += [row['controlled_value_string'], term_list['pref_ns_prefix'] + ':' + row['skos_broader']]
198 |         if organized_in_categories:
199 |             # Hack on 2024-03-27 to make the ecoiri: terms be in separate sections
200 |             if term_list['list_iri'] == 'http://rs.tdwg.org/eco/iri/':
201 |                 row_list.append('http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI')
202 |             else:
203 |                 row_list.append('http://rs.tdwg.org/dwc/terms/Event')
204 |             #row_list.append(row['tdwgutility_organizedInClass'])
205 | 
206 |         # Borrowed terms really don't have implemented versions. They may be lacking values for version_status.
207 |         # In their case, their version IRI will be omitted.
208 |         found = False
209 |         for vindex, vrow in versions_df.iterrows():
210 |             if vrow['term_localName']==row['term_localName'] and vrow['version_status']=='recommended':
211 |                 found = True
212 |                 version_iri = vrow['version']
213 |                 # NOTE: the current hack for non-TDWG terms without a version is to append # to the end of the term IRI
214 |                 if version_iri.endswith('#'):
215 |                     version_iri = ''
216 |         if not found:
217 |             version_iri = ''
218 |         row_list.append(version_iri)
219 | 
220 |         table_list.append(row_list)
221 | 
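
The loop above rescans the whole versions table for every term. As a hedged alternative, the lookup could be built once as a dictionary. The function name `recommended_versions` and the sample IRIs below are hypothetical; the column names (`term_localName`, `version_status`, `version`) are the ones the script reads. The sketch uses only the standard library; with pandas, the same rows could be obtained from `versions_df.to_dict('records')`.

```python
import csv
import io


def recommended_versions(rows):
    """Map each term_localName to its recommended version IRI.

    Versions ending in '#' (the placeholder for non-TDWG terms without a
    version) map to an empty string. When several recommended rows exist for
    one term, the last one wins, as in the original loop.
    """
    lookup = {}
    for row in rows:
        if row['version_status'] == 'recommended':
            version = row['version']
            lookup[row['term_localName']] = '' if version.endswith('#') else version
    return lookup


# Made-up sample data, for illustration only
sample_csv = """term_localName,version_status,version
siteCount,superseded,http://example.org/siteCount-2023-01-01
siteCount,recommended,http://example.org/siteCount-2024-02-28
borrowedTerm,recommended,http://example.org/borrowedTerm#
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
lookup = recommended_versions(rows)

# Terms with no recommended version fall back to an empty string
version_iri = lookup.get('siteCount', '')
```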
222 | print('processing data')
223 | # Turn list of lists into dataframe
224 | terms_df = pd.DataFrame(table_list, columns = column_list)
225 | 
226 | terms_sorted_by_label = terms_df.sort_values(by='label')
227 | #terms_sorted_by_localname = terms_df.sort_values(by='term_localName')
228 | 
229 | # This makes sort case insensitive
230 | terms_sorted_by_localname = terms_df.iloc[terms_df.term_localName.str.lower().argsort()]
231 | #terms_sorted_by_localname
232 | print('done retrieving')
233 | print()
234 | 
235 | print('Generating term index by CURIE')
236 | text = '### 3.1 Index By Term Name\n\n'
237 | text += '(See also [3.2 Index By Label](#32-index-by-label))\n\n'
238 | 
239 | #text += '**Classes**\n'
240 | #text += '\n'
241 | #for row_index,row in terms_sorted_by_localname.iterrows():
242 | #    if row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
243 | #        curie = row['pref_ns_prefix'] + ":" + row['term_localName']
244 | #        curie_anchor = curie.replace(':','_')
245 | #        text += '[' + curie + '](#' + curie_anchor + ') |\n'
246 | #text = text[:len(text)-2] # remove final trailing vertical bar and newline
247 | #text += '\n\n' # put back removed newline
248 | 
249 | for category in range(0,len(display_order)):
250 |     text += '**' + display_label[category] + '**\n'
251 |     text += '\n'
252 |     if organized_in_categories:
253 |         filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
254 |         filtered_table.reset_index(drop=True, inplace=True)
255 |     else:
256 |         filtered_table = terms_sorted_by_localname
257 |         
258 |     for row_index,row in filtered_table.iterrows():
259 |         if row['rdf_type'] != 'http://www.w3.org/2000/01/rdf-schema#Class':
260 |             curie = row['pref_ns_prefix'] + ":" + row['term_localName']
261 |             curie_anchor = curie.replace(':','_')
262 |             text += '[' + curie + '](#' + curie_anchor + ') |\n'
263 |     text = text[:len(text)-2] # remove final trailing vertical bar and newline
264 |     text += '\n\n' # put back removed newline
265 | 
266 | index_by_name = text
267 | 
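
The index loop above appends ` |\n` after every entry and then trims the last two characters, which would also eat into the category heading if a category had no terms. A minimal sketch of a join-based variant follows. `build_index_section` is a hypothetical helper, and the CURIEs in the usage lines are illustrative. Apart from dropping the trailing space after the final entry, it is meant to produce the same Markdown.

```python
def build_index_section(label, curies):
    """Return the Markdown for one index category.

    Hypothetical helper; `curies` is a list of strings such as 'eco:siteCount'.
    """
    links = ['[' + curie + '](#' + curie.replace(':', '_') + ')' for curie in curies]
    # join() places the separator only between entries, so nothing needs trimming
    return '**' + label + '**\n\n' + ' |\n'.join(links) + '\n\n'


section = build_index_section('Literal-value terms',
                              ['eco:isAbsenceReported', 'eco:siteCount'])
empty_section = build_index_section('IRI-value terms', [])
```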
268 | #print(index_by_name)
269 | 
270 | print('Generating term index by label')
271 | text = '\n\n'
272 | 
273 | # Comment out the following two lines if there is no index by local names
274 | text = '### 3.2 Index By Label\n\n'
275 | text += '(See also [3.1 Index By Term Name](#31-index-by-term-name))\n\n'
276 | 
277 | #text += '**Classes**\n'
278 | #text += '\n'
279 | #for row_index,row in terms_sorted_by_label.iterrows():
280 | #    if row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
281 | #        curie_anchor = row['pref_ns_prefix'] + "_" + row['term_localName']
282 | #        text += '[' + row['label'] + '](#' + curie_anchor + ') |\n'
283 | #text = text[:len(text)-2] # remove final trailing vertical bar and newline
284 | #text += '\n\n' # put back removed newline
285 | 
286 | for category in range(0,len(display_order)):
287 |     if organized_in_categories:
288 |         text += '**' + display_label[category] + '**\n'
289 |         text += '\n'
290 |         filtered_table = terms_sorted_by_label[terms_sorted_by_label['tdwgutility_organizedInClass']==display_order[category]]
291 |         filtered_table.reset_index(drop=True, inplace=True)
292 |     else:
293 |         filtered_table = terms_sorted_by_label
294 |         
295 |     for row_index,row in filtered_table.iterrows():
296 |         if row_index == 0 or (row_index != 0 and row['label'] != filtered_table.iloc[row_index - 1].loc['label']): # this is a hack to prevent duplicate labels
297 |             if row['rdf_type'] != 'http://www.w3.org/2000/01/rdf-schema#Class':
298 |                 curie_anchor = row['pref_ns_prefix'] + "_" + row['term_localName']
299 |                 text += '[' + row['label'] + '](#' + curie_anchor + ') |\n'
300 |     text = text[:len(text)-2] # remove final trailing vertical bar and newline
301 |     text += '\n\n' # put back removed newline
302 | 
303 | index_by_label = text
304 | 
305 | #print(index_by_label)
306 | 
307 | decisions_df = pd.read_csv('https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/decisions/decisions-links.csv', na_filter=False)
308 | 
309 | # ---------------
310 | # generate a table for each term, with terms grouped by category
311 | # ---------------
312 | 
313 | print('Generating terms table')
314 | # generate the Markdown for the terms table
315 | text = '## 4 Vocabulary\n'
316 | if True:
317 |     filtered_table = terms_sorted_by_localname
318 | 
319 | #for category in range(0,len(display_order)):
320 | #    if organized_in_categories:
321 | #        text += '### 4.' + str(category + 1) + ' ' + display_label[category] + '\n'
322 | #        text += '\n'
323 | #        text += display_comments[category] # insert the comments for the category, if any.
324 | #        filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
325 | #        filtered_table.reset_index(drop=True, inplace=True)
326 | #    else:
327 | #        filtered_table = terms_sorted_by_localname
328 | 
329 |     for row_index,row in filtered_table.iterrows():
330 |         text += '<table>\n'
331 |         curie = row['pref_ns_prefix'] + ":" + row['term_localName']
332 |         curieAnchor = curie.replace(':','_')
333 |         text += '\t<thead>\n'
334 |         text += '\t\t<tr>\n'
335 |         text += '\t\t\t<th colspan="2"><a id="' + curieAnchor + '"></a>Term Name  ' + curie + '</th>\n'
336 |         text += '\t\t</tr>\n'
337 |         text += '\t</thead>\n'
338 |         text += '\t<tbody>\n'
339 |         text += '\t\t<tr>\n'
340 |         text += '\t\t\t<td>Term IRI</td>\n'
341 |         uri = row['pref_ns_uri'] + row['term_localName']
342 |         text += '\t\t\t<td><a href="' + uri + '">' + uri + '</a></td>\n'
343 |         text += '\t\t</tr>\n'
344 |         text += '\t\t<tr>\n'
345 |         text += '\t\t\t<td>Modified</td>\n'
346 |         text += '\t\t\t<td>' + row['term_modified'] + '</td>\n'
347 |         text += '\t\t</tr>\n'
348 | 
349 |         if row['version_iri'] != '':
350 |             text += '\t\t<tr>\n'
351 |             text += '\t\t\t<td>Term version IRI</td>\n'
352 |             text += '\t\t\t<td><a href="' + row['version_iri'] + '">' + row['version_iri'] + '</a></td>\n'
353 |             text += '\t\t</tr>\n'
354 | 
355 |         text += '\t\t<tr>\n'
356 |         text += '\t\t\t<td>Label</td>\n'
357 |         text += '\t\t\t<td>' + row['label'] + '</td>\n'
358 |         text += '\t\t</tr>\n'
359 | 
360 |         if row['term_deprecated'] != '':
361 |             text += '\t\t<tr>\n'
362 |             text += '\t\t\t<td></td>\n'
363 |             text += '\t\t\t<td><strong>This term is deprecated and should no longer be used.</strong></td>\n'
364 |             text += '\t\t</tr>\n'
365 | 
366 |             for dep_index,dep_row in filtered_table.iterrows():
367 |                 if dep_row['replaces_term'] == uri:
368 |                     text += '\t\t<tr>\n'
369 |                     text += '\t\t\t<td>Is replaced by</td>\n'
370 |                     text += '\t\t\t<td><a href="#' + dep_row['pref_ns_prefix'] + "_" + dep_row['term_localName'] + '">' + dep_row['pref_ns_uri'] + dep_row['term_localName'] + '</a></td>\n'
371 |                     text += '\t\t</tr>\n'
372 |                 if dep_row['replaces1_term'] == uri:
373 |                     text += '\t\t<tr>\n'
374 |                     text += '\t\t\t<td>Is replaced by</td>\n'
375 |                     text += '\t\t\t<td><a href="#' + dep_row['pref_ns_prefix'] + "_" + dep_row['term_localName'] + '">' + dep_row['pref_ns_uri'] + dep_row['term_localName'] + '</a></td>\n'
376 |                     text += '\t\t</tr>\n'
377 | 
378 |         text += '\t\t<tr>\n'
379 |         text += '\t\t\t<td>Definition</td>\n'
380 |         text += '\t\t\t<td>' + row['rdfs_comment'] + '</td>\n'
381 |         #text += '\t\t\t<td>' + row['definition'] + '</td>\n'
382 |         text += '\t\t</tr>\n'
383 | 
384 |         if row['dcterms_description'] != '':
385 |         #if row['notes'] != '':
386 |             text += '\t\t<tr>\n'
387 |             text += '\t\t\t<td>Notes</td>\n'
388 |             text += '\t\t\t<td>' + convert_link(convert_code(row['dcterms_description'])) + '</td>\n'
389 |             #text += '\t\t\t<td>' + createLinks(row['notes']) + '</td>\n'
390 |             text += '\t\t</tr>\n'
391 | 
392 |         if row['examples'] != '':
393 |         #if row['usage'] != '':
394 |             text += '\t\t<tr>\n'
395 |             text += '\t\t\t<td>Examples</td>\n'
396 |             text += '\t\t\t<td>' + convert_examples(convert_link(convert_code(row['examples']))) + '</td>\n'
397 |             #text += '\t\t\t<td>' + createLinks(row['usage']) + '</td>\n'
398 |             text += '\t\t</tr>\n'
399 | 
400 |         if row['tdwgutility_abcdEquivalence'] != '':
401 |         #if row['usage'] != '':
402 |             text += '\t\t<tr>\n'
403 |             text += '\t\t\t<td>ABCD equivalence</td>\n'
404 |             text += '\t\t\t<td>' + convert_link(convert_code(row['tdwgutility_abcdEquivalence'])) + '</td>\n'
405 |             text += '\t\t</tr>\n'
406 | 
407 |         if vocab_type == 2 or vocab_type == 3: # controlled vocabulary
408 |             text += '\t\t<tr>\n'
409 |             text += '\t\t\t<td>Controlled value</td>\n'
410 |             text += '\t\t\t<td>' + row['controlled_value_string'] + '</td>\n'
411 |             text += '\t\t</tr>\n'
412 | 
413 |         if vocab_type == 3 and row['skos_broader'] != '': # controlled vocabulary with skos:broader relationships
414 |             text += '\t\t<tr>\n'
415 |             text += '\t\t\t<td>Has broader concept</td>\n'
416 |             curieAnchor = row['skos_broader'].replace(':','_')
417 |             text += '\t\t\t<td><a href="#' + curieAnchor + '">' + row['skos_broader'] + '</a></td>\n'
418 |             text += '\t\t</tr>\n'
419 | 
420 |         text += '\t\t<tr>\n'
421 |         text += '\t\t\t<td>Type</td>\n'
422 |         if row['rdf_type'] == 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property':
423 |         #if row['type'] == 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property':
424 |             text += '\t\t\t<td>Property</td>\n'
425 |         elif row['rdf_type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
426 |         #elif row['type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
427 |             text += '\t\t\t<td>Class</td>\n'
428 |         elif row['rdf_type'] == 'http://www.w3.org/2004/02/skos/core#Concept':
429 |         #elif row['type'] == 'http://www.w3.org/2004/02/skos/core#Concept':
430 |             text += '\t\t\t<td>Concept</td>\n'
431 |         else:
432 |             text += '\t\t\t<td>' + row['rdf_type'] + '</td>\n' # this should rarely happen
433 |             #text += '\t\t\t<td>' + row['type'] + '</td>\n' # this should rarely happen
434 |         text += '\t\t</tr>\n'
435 | 
436 |         # Look up decisions related to this term
437 |         for drow_index,drow in decisions_df.iterrows():
438 |             if drow['linked_affected_resource'] == uri:
439 |                 text += '\t\t<tr>\n'
440 |                 text += '\t\t\t<td>Executive Committee decision</td>\n'
441 |                 text += '\t\t\t<td><a href="http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '">http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '</a></td>\n'
442 |                 text += '\t\t</tr>\n'                        
443 | 
444 |         text += '\t</tbody>\n'
445 |         text += '</table>\n'
446 |         text += '\n'
447 |     text += '\n'
448 | term_table = text
449 | print('done generating')
450 | print()
451 | 
452 | #print(term_table)
453 | 
454 | print('Merging term table with header and footer and saving file')
455 | #text = index_by_label + term_table
456 | text = index_by_name + index_by_label + term_table
457 | 
458 | # read in header and footer, merge with terms table, and output
459 | 
460 | headerObject = open(headerFileName, 'rt', encoding='utf-8')
461 | header = headerObject.read()
462 | headerObject.close()
463 | 
464 | # Build the Markdown for the contributors list
465 | contributors = ''
466 | for contributor in contributors_yaml:
467 |     contributors += '[' + contributor['contributor_literal'] + '](' + contributor['contributor_iri'] + ') '
468 |     contributors += '([' + contributor['affiliation'] + '](' + contributor['affiliation_uri'] + ')), '
469 | contributors = contributors[:-2] # Remove the last comma and space
470 | 
471 | # Substitute values of ratification_date and contributors into the header template
472 | header = header.replace('{document_title}', document_configuration_yaml['documentTitle'])
473 | header = header.replace('{ratification_date}', document_configuration_yaml['doc_modified'])
474 | header = header.replace('{created_date}', document_configuration_yaml['doc_created'])
475 | header = header.replace('{contributors}', contributors)
476 | header = header.replace('{standard_iri}', document_configuration_yaml['dcterms_isPartOf'])
477 | header = header.replace('{current_iri}', document_configuration_yaml['current_iri'])
478 | header = header.replace('{abstract}', document_configuration_yaml['abstract'])
479 | header = header.replace('{creator}', document_configuration_yaml['creator'])
480 | header = header.replace('{publisher}', document_configuration_yaml['publisher'])
481 | year = document_configuration_yaml['doc_modified'].split('-')[0]
482 | header = header.replace('{year}', year)
483 | if has_namespace:
484 |     header = header.replace('{namespace_uri}', namespace_uri)
485 |     header = header.replace('{pref_namespace_prefix}', pref_namespace_prefix)
486 | 
487 | # Determine whether there was a previous version of the document.
488 | if document_configuration_yaml['doc_created'] != document_configuration_yaml['doc_modified']:
489 |     # Load versions list from document versions data in the rs.tdwg.org repo and find most recent version.
490 |     versions_data_url = githubBaseUri + 'docs/docs-versions.csv'
491 |     versions_list_df = pd.read_csv(versions_data_url, na_filter=False)
492 |     # Slice all rows for versions of this document.
493 |     matching_versions = versions_list_df[versions_list_df['current_iri']==document_configuration_yaml['current_iri']]
494 |     # Sort the matching versions by version IRI in descending order so that the most recent version is first.
495 |     matching_versions = matching_versions.sort_values(by=['version_iri'], ascending=[False])
496 |     # The previous version is the second row in the dataframe (row 1).
497 |     # The version IRI is in the second column (column 1).
498 |     most_recent_version_iri = matching_versions.iat[1, 1]
499 |     #print(most_recent_version_iri)
500 | 
501 |     # Insert the previous version information into the header
502 |     previous_version_metadata_string = '''Previous version
503 | : <''' + most_recent_version_iri + '''>
504 | 
505 | '''
506 |     # Insert the previous version information into the designated slot.
507 |     header = header.replace('{previous_version_slot}\n\n', previous_version_metadata_string)
508 | else:
509 |     # If there was no previous version, remove the slot from the header.
510 |     header = header.replace('{previous_version_slot}\n\n', '')
511 | 
512 | footerObject = open(footerFileName, 'rt', encoding='utf-8')
513 | footer = footerObject.read()
514 | footerObject.close()
515 | 
516 | output = header + text + footer
517 | outputObject = open(outFileName, 'wt', encoding='utf-8')
518 | outputObject.write(output)
519 | outputObject.close()
520 |     
521 | print('done')
522 | 
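
The header-filling steps at the end of `build-termlist.py` (placeholder substitution, then resolving the `{previous_version_slot}`) can be sketched as two small functions. This is a minimal, standard-library-only illustration rather than the script's actual code: `previous_version` and `fill_header` are hypothetical names and the `http://example.org/` IRIs are placeholders. Taking the second entry of a descending sort mirrors the `matching_versions.iat[1, 1]` lookup, but the sketch returns `None` when fewer than two versions exist instead of raising an error.

```python
def previous_version(version_iris):
    """Return the second-newest version IRI, or None if there is no earlier version.

    Assumes version IRIs end in an ISO date, so string order matches date order.
    """
    ordered = sorted(version_iris, reverse=True)
    return ordered[1] if len(ordered) > 1 else None


def fill_header(template, values, previous_version_iri=None):
    """Replace each '{key}' placeholder, then fill or remove the previous-version slot."""
    for key, value in values.items():
        template = template.replace('{' + key + '}', value)
    slot = '{previous_version_slot}\n\n'
    if previous_version_iri:
        entry = 'Previous version\n: <' + previous_version_iri + '>\n\n'
    else:
        entry = ''  # no earlier version: drop the slot entirely
    return template.replace(slot, entry)


# Hypothetical template and values, for illustration only.
template = '# {document_title}\n\n{previous_version_slot}\n\nAbstract\n: {abstract}\n'
values = {'document_title': 'Example Guide', 'abstract': 'An example.'}
versions = ['http://example.org/doc/2023-08-08', 'http://example.org/doc/2024-02-28']
print(fill_header(template, values, previous_version(versions)))
```

Keeping the slot logic in one function makes the "no previous version" case explicit, which is the case the `doc_created != doc_modified` check guards in the script.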


--------------------------------------------------------------------------------
/build/build.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # S. Van Hoey, John Wieczorek
  3 | #
  4 | # Build script for document handling
  5 | # Based on https://github.com/tdwg/dwc/blob/master/build/build.py
  6 | #
  7 | 
  8 | import io
  9 | import os
 10 | import re
 11 | import csv
 12 | import sys
 13 | import codecs
 14 | 
 15 | from urllib import request
 16 | from jinja2 import FileSystemLoader, Environment
 17 | 
 18 | NAMESPACES = {
 19 |     'http://rs.tdwg.org/eco/iri/' : 'ecoiri',
 20 |     'http://rs.tdwg.org/eco/terms/' : 'eco',
 21 |     'http://rs.tdwg.org/dwc/terms/attributes/' : 'tdwgutility'}
 22 | 
 23 | class DwcNamespaceError(Exception):
 24 |     """Namespace link is not available in the currently provided links"""
 25 |     pass
 26 | 
 27 | class DwcBuildReader():
 28 | 
 29 |     def __init__(self, dwc_build_file):
 30 |         """Custom Reader switching between raw Github or local file"""
 31 |         self.dwc_build_file = dwc_build_file
 32 | 
 33 |     def __enter__(self):
 34 |         if "https://raw.github" in self.dwc_build_file:
 35 |             self.open_dwc_term = request.urlopen(self.dwc_build_file)
 36 |         else:
 37 |             self.open_dwc_term = open(self.dwc_build_file, 'rb')
 38 |         return self.open_dwc_term
 39 | 
 40 |     def __exit__(self, *args):
 41 |         self.open_dwc_term.close()
 42 | 
 43 | class DwcDigester(object):
 44 | 
 45 |     def __init__(self, term_versions, qrg_term_versions):
 46 |         """Digest the term and qrg documents of Darwin Core to support automatic
 47 |         generation of derivatives
 48 | 
 49 |         Parameters
 50 |         -----------
 51 |         term_versions : str
 52 |             Either a relative path and filename of the normative Dwc document
 53 |             or a URL link to the raw Github version of the file
 54 |         qrg_term_versions : str
 55 |             Either a relative path and filename of the Quick Reference Guide term order 
 56 |             document or a URL link to the raw Github version of the file
 57 | 
 58 |         Notes
 59 |         -----
 60 |         Note that the sequence of the term versions entries is
 61 |         essential for the automatic generation of the individual documents
 62 |         (mainly the index.html)
 63 |         """
 64 |         self.term_versions = term_versions
 65 |         self.qrg_term_versions = qrg_term_versions
 66 | 
 67 |         self.term_versions_data = {}
 68 |         self._store_versions()
 69 | 
 70 |         # create the defined data-object for the different outputs
 71 |         self.template_data = self.process_terms()
 72 |         self.properties = self.properties_list()
 73 | 
 74 |     def versions(self):
 75 |         """Iterator providing the terms as represented in the normative term
 76 |         versions file
 77 |         """
 78 |         with DwcBuildReader(self.term_versions) as versions:
 79 |             for vterm in csv.DictReader(io.TextIOWrapper(versions), delimiter=','):
 80 |                 if vterm["status"] == "recommended":
 81 |                     yield vterm
 82 | 
 83 |     def _store_versions(self):
 84 |         """Collect all the versions data in a dictionary as the
 85 |         term_versions_data attribute
 86 |         """
 87 |         for term in self.versions():
 88 |             self.term_versions_data[term["term_iri"]] = term
 89 | 
 90 |     def _select_versions_term(self, term_iri):
 91 |         """Select a specific term of the versions data, using term_iri match
 92 |         """
 93 |         return self.term_versions_data[term_iri]
 94 | 
 95 |     @staticmethod
 96 |     def split_iri(term_iri):
 97 |         """Split an iri field into the namespace url and the local name
 98 |         of the term
 99 |         """
100 |         prog = re.compile("(.*/)([^/]*$)")
101 |         namespace, local_name = prog.findall(term_iri)[0]
102 |         return namespace, local_name
103 | 
104 |     @staticmethod
105 |     def resolve_namespace_abbrev(namespace):
106 |         """Using the NAMESPACES constant, get the namespace abbreviation by
107 |         providing the namespace link
108 | 
109 |         Parameters
110 |         -----------
111 |         namespace : str
112 |             valid key of the NAMESPACES variable
113 |         """
114 |         if namespace not in NAMESPACES.keys():
115 |             print("namespace url: %s" % namespace)
116 |             raise DwcNamespaceError("The namespace url is currently not supported in NAMESPACES")
117 |         return NAMESPACES[namespace]
118 | 
119 |     def get_term_definition(self, term_iri):
120 |         """Extract the required information from the terms table to show on
121 |         the webpage of a single term by using the term_iri as the identifier
122 | 
123 |         Notes
124 |         ------
125 |         Due to the current implementation, make sure to provide the same keys
126 |         represented in the record-level specific version `process_terms`
127 |         method (room for improvement)
128 |         """
129 |         vs_term = self._select_versions_term(term_iri)
130 | 
131 |         term_data = {}
132 |         term_data["label"] = vs_term['term_localName'] # See https://github.com/tdwg/dwc/issues/253#issuecomment-670098202
133 |         term_data["iri"] = term_iri
134 |         term_data["class"] = vs_term['organized_in']
135 |         term_data["definition"] = self.convert_link(vs_term['definition'])
136 |         term_data["comments"] = self.convert_link(self.convert_code(vs_term['comments']))
137 |         term_data["examples"] = self.convert_link(self.convert_code(vs_term['examples']))
138 |         term_data["rdf_type"] = vs_term['rdf_type']
139 |         namespace_url, _ = self.split_iri(term_iri)
140 |         term_data["namespace"] = self.resolve_namespace_abbrev(namespace_url)
141 |         return term_data
142 | 
143 |     @staticmethod
144 |     def convert_code(text_with_backticks):
145 |         """Takes all back-quoted sections in a text field and converts it to
146 |         the html tagged version of code blocks <code>...</code>
147 |         """
148 |         return re.sub(r'`([^`]*)`', r'<code>\1</code>', text_with_backticks)
149 | 
150 |     @staticmethod
151 |     def convert_link(text_with_urls):
152 |         """Takes all links in a text field and converts it to the html tagged
153 |         version of the link
154 |         """
155 |         def _handle_matched(inputstring):
156 |             """quick hack version of url handling on the current prime versions data"""
157 |             url = inputstring.group()
158 |             return "<a href=\"{}\">{}</a>".format(url, url)
159 | 
160 |         regx = r"(http[s]?://[\w\d:#@%/;$()~_?\+-;=\\\.&]*)(?<![\)\.])"
161 |         return re.sub(regx, _handle_matched, text_with_urls)
162 | 
163 |     def process_terms(self):
164 |         """Parse the config terms (sequence matters!)
165 | 
166 |         Collect all required data from both the normative versions file and
167 |         the config file and return the template ready data.
168 | 
169 |         Returns
170 |         -------
171 |         Data object that can be digested by the html-template file. Contains
172 |         the term data formatted to create the individual outputs, each list
173 |         element is a dictionary representing a class group. Hence, the data
174 |         object is structured as follows:
175 | 
176 |             [
177 |                 {'name' : class_group_name_1, 'label': xxxx,...,
178 |                     'terms':
179 |                         [
180 |                             {'name' : term_1, 'label': xxxx,...},
181 |                             {'name' : term_2, 'label': xxxx,...},
182 |                             ...
183 |                         ]},
184 |                 {'name' : class_group_name_2,...
185 |                 ...},
186 |                 ...
187 |             ]
188 |         """
189 |         termdict = {}
190 |         for term in self.versions(): # sequence of the terms file used as order
191 |             term_data = self.get_term_definition(term['term_iri'])
192 |             termdict[term['term_iri']] = term_data
193 | 
194 |         template_data = []
195 |         class_group = {}
196 | 
197 |         with DwcBuildReader(self.qrg_term_versions) as versions:
198 |             for vterm in csv.DictReader(io.TextIOWrapper(versions), delimiter=','):
199 |                 iri = vterm['recommended_term_iri']
200 |                 if iri.find('group:') == 0:
201 |                     # Row is a group indicator starting with the string "group:"
202 |                     # Make a class group to put terms in
203 |                     class_group = {}
204 |                     group_name = iri.rpartition(':')[-1].strip()
205 |                     class_group["label"] = group_name
206 |                     class_group["iri"] = None
207 |                     class_group["class"] = None
208 |                     class_group["definition"] = None
209 |                     class_group["comments"] = None
210 |                     class_group["rdf_type"] = None
211 |                     class_group["namespace"] = None
212 |                     class_group["terms"] = []
213 |                     template_data.append(class_group)
214 |                 else:
215 |                     # Row is a term to be listed with the most recent class_group
216 |                     # Add the term, getting the details from the dictionary created from
217 |                     # the term_versions.csv rather than the qrg_list.csv
218 |                     term_data = termdict.get(iri)
219 |                     class_group['terms'].append(term_data)
220 |         return template_data
221 | 
222 |     def create_html(self, html_template="terms.tmpl",
223 |                     html_output="../docs/terms/index.md"):
224 |         """build html with the processed term info, by filling in the
225 |         tmpl-template
226 | 
227 |         Parameters
228 |         -----------
229 |         html_template : str
230 |             relative path and filename to the Jinja2 compatible
231 |             template
232 |         html_output : str
233 |             relative path and filename to write the resulting index.html
234 |         """
235 | 
236 |         data = {}
237 |         data["class_groups"] = self.template_data
238 | 
239 |         env = Environment(
240 |             loader = FileSystemLoader(os.path.dirname(html_template)),
241 |             trim_blocks = True
242 |         )
243 |         template = env.get_template(os.path.basename(html_template))
244 |         html = template.render(data)
245 | 
246 |         with open(html_output, "w", encoding="utf-8") as index_page:
247 |             index_page.write(str(html))
249 | 
250 |     def properties_list(self):
251 |         """Make a list of all properties in the qrg-list
252 |         """
253 |         properties = []
254 |         for group in self.template_data:
255 |             for term in group['terms']:
256 |                 properties.append(term.get('label'))
257 |         return properties
258 | 
259 |     def create_dwc_list(self, file_output="../dist/simple_eco_vertical.csv"):
260 |         """Build a list of simple dwc terms and write it to file
261 | 
262 |         Parameters
263 |         -----------
264 |         file_output : str
265 |             relative path and filename to write the resulting list
266 |         """
267 |         with codecs.open(file_output, 'w', 'utf-8') as dwc_list_file:
268 |             for term in self.properties:
269 |                 dwc_list_file.write(term + "\n")
270 | 
271 |     def create_dwc_header(self, file_output="../dist/simple_eco_horizontal.csv"):
272 |         """Build a header of simple dwc terms and write it to file
273 | 
274 |         Parameters
275 |         -----------
276 |         file_output : str
277 |             relative path and filename to write the resulting list
278 |         """
279 |         with codecs.open(file_output, 'w', 'utf-8') as dwc_header_file:
280 |             dwc_header_file.write(",".join(self.properties))
281 |             dwc_header_file.write("\n")
282 | 
283 | def main():
284 |     """Building up the quick reference html and derivatives"""
285 | 
286 |     term_versions_file = "../vocabulary/term_versions.csv"
287 |     qrg_term_versions_file = "./qrg-list.csv"
288 | 
289 |     print("Running build process:")
290 |     my_dwc = DwcDigester(term_versions_file, qrg_term_versions_file)
291 |     print("Building quick reference guide")
292 |     my_dwc.create_html()
293 |     print("Building simple CSV files")
294 |     my_dwc.create_dwc_list()
295 |     my_dwc.create_dwc_header()
296 |     print("Done!")
297 | 
298 | 
299 | if __name__ == "__main__":
300 |     sys.exit(main())
301 | 
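
The grouping performed by `process_terms` (a row whose IRI starts with `group:` opens a new class group, and every other row is attached to the most recent group) can be illustrated with a small stand-alone sketch. It is a simplification under stated assumptions, not the build script itself: `group_terms`, the in-memory CSV text, and the `http://example.org/` IRIs are hypothetical. Unlike the original, this version skips IRIs that are missing from the term dictionary instead of appending `None`.

```python
import csv
import io


def group_terms(qrg_csv_text, termdict):
    """Group term records under the most recent 'group:' row of a QRG-style list."""
    groups = []
    current = None
    for row in csv.DictReader(io.StringIO(qrg_csv_text)):
        iri = row['recommended_term_iri']
        if iri.startswith('group:'):
            # Start a new class group named after the text following the colon.
            current = {'label': iri.rpartition(':')[-1].strip(), 'terms': []}
            groups.append(current)
        elif current is not None and iri in termdict:
            # Attach the term to the group opened most recently.
            current['terms'].append(termdict[iri])
    return groups


# Hypothetical data, for illustration only.
qrg_text = (
    'recommended_term_iri\n'
    'group:Site\n'
    'http://example.org/terms/siteCount\n'
    'http://example.org/terms/notInTermList\n'
    'group:Effort\n'
    'http://example.org/terms/effortNotes\n'
)
termdict = {
    'http://example.org/terms/siteCount': {'label': 'siteCount'},
    'http://example.org/terms/effortNotes': {'label': 'effortNotes'},
}
groups = group_terms(qrg_text, termdict)
print([(g['label'], [t['label'] for t in g['terms']]) for g in groups])
```

Because the row order of the list drives the result, reordering rows in `qrg-list.csv` is enough to reorder the rendered groups and terms.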


--------------------------------------------------------------------------------
/build/build_other_doc_header.py:
--------------------------------------------------------------------------------
  1 | # Script to build Markdown pages that are not List of Terms documents from their templates and data in the rs.tdwg.org repo
  2 | # Modified for use with Humboldt Extension 2024-03-04
  3 | # Author: Steve Baskauf
  4 | # This script merges static Markdown header and footer documents with term information tables (in Markdown) generated 
  5 | # from data in the subdirectories of the process/document_metadata_processing/ directory of the rs.tdwg.org repo on the TDWG GitHub site.
  6 | 
  7 | import requests   # best library to manage HTTP transactions
  8 | import pandas as pd
  9 | import yaml
 10 | import sys
 11 | 
 12 | # -----------------
 13 | # Command line arguments
 14 | # -----------------
 15 | 
 16 | arg_vals = sys.argv[1:]
 17 | opts = [opt for opt in arg_vals if opt.startswith('-')]
 18 | args = [arg for arg in arg_vals if not arg.startswith('-')]
 19 | 
 20 | # Name of the last part of the URL of the doc
 21 | if '--slug' in opts:
 22 |     document_slug = args[opts.index('--slug')]
 23 | else:
 24 |     print('Must specify URL slug for document using --slug option')
 25 |     print('For example, if the permanent URL is "http://rs.tdwg.org/dwc/doc/eco/", the slug is "eco".')
 26 |     sys.exit(1)
 27 | 
 28 | # "master" for production, something else for development
 29 | # Example: First part of branch URL is "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/eco/", branch is "eco".
 30 | if '--branch' in opts:
 31 |     github_branch = args[opts.index('--branch')]
 32 | else:
 33 |     github_branch = 'master'
 34 | 
 35 | # -----------------
 36 | # Configuration section
 37 | # -----------------
 38 | 
 39 | # This is the base URL for raw files from the branch of the repo that has been pushed to GitHub
 40 | githubBaseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/' + github_branch + '/'
 41 | 
 42 | documentTemplateName = 'index.md' # This is the input file into which the header metadata will be inserted.
 43 | outFileName = '../docs/' + document_slug + '/index.md' # This is where the document rendered by GitHub Pages will be created.
 44 | 
 45 | config_file_path = 'process/document_metadata_processing/dwc_doc_' + document_slug + '/'
 46 | contributors_yaml_file = 'authors_configuration.yaml'
 47 | document_configuration_yaml_file = 'document_configuration.yaml'
 48 | 
 49 | # Load the contributors YAML file from its GitHub URL
 50 | contributors_yaml_url = githubBaseUri + config_file_path + contributors_yaml_file
 51 | contributors_yaml = requests.get(contributors_yaml_url).text
 52 | if contributors_yaml == '404: Not Found':
 53 |     print('Contributors YAML file not found. Check the URL.')
 54 |     print(contributors_yaml_url)
 55 |     sys.exit(1)
 56 | contributors_yaml = yaml.load(contributors_yaml, Loader=yaml.FullLoader)
 57 | 
 58 | # Load the document configuration YAML file from its GitHub URL
 59 | document_configuration_yaml_url = githubBaseUri + config_file_path + document_configuration_yaml_file
 60 | document_configuration_yaml = requests.get(document_configuration_yaml_url).text
 61 | document_configuration_yaml = yaml.load(document_configuration_yaml, Loader=yaml.FullLoader)
 62 | 
 63 | # -----------------
 64 | # Main script
 65 | # -----------------
 66 | 
 67 | # Because this is a hack of the build-termlist.py script, "header" is used in variable names, although in this case
 68 | # it is the entire document template, not just the header.
 69 | headerObject = open('dwc_doc_' + document_slug + '/' + documentTemplateName, 'rt', encoding='utf-8')
 70 | header = headerObject.read()
 71 | headerObject.close()
 72 | 
 73 | # Build the Markdown for the contributors list
 74 | contributors = ''
 75 | for contributor in contributors_yaml:
 76 |     contributors += '[' + contributor['contributor_literal'] + '](' + contributor['contributor_iri'] + ') '
 77 |     contributors += '([' + contributor['affiliation'] + '](' + contributor['affiliation_uri'] + ')), '
 78 | contributors = contributors[:-2] # Remove the last comma and space
 79 | 
 80 | # Substitute values of ratification_date and contributors into the header template
 81 | header = header.replace('{document_title}', document_configuration_yaml['documentTitle'])
 82 | header = header.replace('{ratification_date}', document_configuration_yaml['doc_modified'])
 83 | header = header.replace('{created_date}', document_configuration_yaml['doc_created'])
 84 | header = header.replace('{contributors}', contributors)
 85 | header = header.replace('{standard_iri}', document_configuration_yaml['dcterms_isPartOf'])
 86 | header = header.replace('{current_iri}', document_configuration_yaml['current_iri'])
 87 | header = header.replace('{abstract}', document_configuration_yaml['abstract'])
 88 | header = header.replace('{creator}', document_configuration_yaml['creator'])
 89 | header = header.replace('{publisher}', document_configuration_yaml['publisher'])
 90 | year = document_configuration_yaml['doc_modified'].split('-')[0]
 91 | header = header.replace('{year}', year)
 92 | 
 93 | # Determine whether there was a previous version of the document.
 94 | if document_configuration_yaml['doc_created'] != document_configuration_yaml['doc_modified']:
 95 |     # Load versions list from document versions data in the rs.tdwg.org repo and find most recent version.
 96 |     versions_data_url = githubBaseUri + 'docs/docs-versions.csv'
 97 |     versions_list_df = pd.read_csv(versions_data_url, na_filter=False)
 98 |     # Slice all rows for versions of this document.
 99 |     matching_versions = versions_list_df[versions_list_df['current_iri']==document_configuration_yaml['current_iri']]
100 |     # Sort the matching versions by version IRI in descending order so that the most recent version is first.
101 |     matching_versions = matching_versions.sort_values(by=['version_iri'], ascending=[False])
102 |     # The previous version is the second row in the dataframe (row 1).
103 |     # The version IRI is in the second column (column 1).
104 |     most_recent_version_iri = matching_versions.iat[1, 1]
105 |     #print(most_recent_version_iri)
106 | 
107 |     # Insert the previous version information into the header
108 |     previous_version_metadata_string = '''Previous version
109 | : <''' + most_recent_version_iri + '''>
110 | 
111 | '''
112 |     # Insert the previous version information into the designated slot.
113 |     header = header.replace('{previous_version_slot}\n\n', previous_version_metadata_string)
114 | else:
115 |     # If there was no previous version, remove the slot from the header.
116 |     header = header.replace('{previous_version_slot}\n\n', '')
117 | 
118 | outputObject = open(outFileName, 'wt', encoding='utf-8')
119 | outputObject.write(header)
120 | outputObject.close()
121 |     
122 | print('done')
123 | 
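
The option handling at the top of `build_other_doc_header.py` pairs options and values by position, which only works when every option is followed by exactly one value. A hedged alternative using the standard `argparse` module is sketched below. `parse_options` and `derive_paths` are hypothetical helper names; the `--slug` and `--branch` options, the `master` default, and the path patterns are taken from the script.

```python
import argparse


def parse_options(argv):
    """Parse --slug (required) and --branch (defaults to 'master')."""
    parser = argparse.ArgumentParser(description='Build a non-term-list document page.')
    parser.add_argument('--slug', required=True,
                        help='last part of the document URL, e.g. "eco"')
    parser.add_argument('--branch', default='master',
                        help='rs.tdwg.org branch to read configuration from')
    return parser.parse_args(argv)


def derive_paths(options):
    """Build the raw GitHub base URL and the output path the same way the script does."""
    github_base_uri = ('https://raw.githubusercontent.com/tdwg/rs.tdwg.org/'
                       + options.branch + '/')
    out_file_name = '../docs/' + options.slug + '/index.md'
    return github_base_uri, out_file_name


options = parse_options(['--slug', 'hierarchy'])
print(derive_paths(options))
```

With `argparse`, a missing `--slug` produces a usage message and a non-zero exit status without any hand-written checks.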


--------------------------------------------------------------------------------
/build/dwc_doc_hierarchy/index.md:
--------------------------------------------------------------------------------
  1 | # {document_title}
  2 | 
  3 | Title
  4 | : {document_title}
  5 | 
  6 | Date version issued
  7 | : {ratification_date}
  8 | 
  9 | Date created
 10 | : {created_date}
 11 | 
 12 | Part of TDWG Standard
 13 | : <{standard_iri}>
 14 | 
 15 | This version
 16 | : <{current_iri}{ratification_date}>
 17 | 
 18 | Latest version
 19 | : <{current_iri}>
 20 | 
 21 | {previous_version_slot}
 22 | 
 23 | Abstract
 24 | : {abstract}
 25 | 
 26 | Contributors
 27 | : {contributors}
 28 | 
 29 | Creator
 30 | : {creator}
 31 | 
 32 | Bibliographic citation
 33 | : {creator}. {year}. {document_title}. {publisher}. <{current_iri}{ratification_date}>
 34 | 
 35 | <a id="introduction"></a>
 36 | ## 1 Introduction (non-normative)
 37 | 
 38 | ### 1.1 Status of the content of this document
 39 | 
 40 | Section 3 of this document is normative, serving as official guidelines
 41 | in application of the Humboldt Extension. The other sections are
 42 | non-normative and designed to help improve overall understanding in
 43 | application and interpretation of the Extension.
 44 | 
 45 | ### 1.2 RFC 2119 keywords
 47 | 
 48 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
 49 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
 50 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
 51 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
 52 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
 53 | when, and only when, they are written in capitals (as shown here).
 54 | 
 55 | ### 1.3 Namespaces and terminology
 56 | 
 57 | The namespace `eco:` abbreviates terms minted for the Humboldt Extension
 58 | for ecological inventories
 59 | ([http://rs.tdwg.org/eco/terms/](http://rs.tdwg.org/eco/terms/)).
 60 | `dwc:` abbreviates terms from the main Darwin Core vocabulary namespace
 61 | ([http://rs.tdwg.org/dwc/terms/](http://rs.tdwg.org/dwc/terms/)).
 62 | 
 63 | Words in `code markup` are term IRIs or literal values. The word
 64 | "organism" is used colloquially and is not used in the technical sense
 65 | of the dwc:Organism class, unless specifically presented as
 66 | "dwc:Organism." The word "Event" is used in the technical sense of the
 67 | dwc:Event class. "Humboldt Extension" is an abbreviation for the
 68 | "Humboldt Extension for Ecological Inventories."
 69 | 
 70 | ### 1.4 Intended audience and use for this document
 71 | 
 72 | The information in this document is targeted at data providers, data
 73 | aggregators, and data consumers. *Data providers* are the individuals
 74 | responsible for mapping ecological inventory data into an Event-based
 75 | [Darwin Core
 76 | Archive](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide)
 77 | format that includes the Humboldt Extension. *Data aggregators* and
 78 | *data consumers* can use this document to better understand the data
 79 | shared by data providers, specifically with respect to the
 80 | **relationships between hierarchical dwc:Event levels** and **when it is
 81 | or is not appropriate to make inferences** about attributes such as
 82 | abundance or absence of detection.
 83 | 
 84 | <a id="rationale"></a>
 85 | ## 2 Rationale (non-normative)
 86 | 
 87 | Ecological inventories in the context of Darwin Core can be considered
 88 | as types of [dwc:Events](http://rs.tdwg.org/dwc/terms/Event)
 89 | --- they are actions that occur at specific locations over defined
 90 | periods of time. The terms in the Humboldt Extension are all properties
 91 | of a dwc:Event.
 92 | 
 93 | There are many types of ecological inventory, ranging from singular
 94 | observations of individual taxa (1 event:1 observation; Example 1 in
 95 | <a href="#fig1">Figure 1</a>) to highly structured and deeply nested observations within
 96 | other observations (e.g., 1 event:2 sub-events, each sub-event:2
 97 | sub-sub-events; Example 4 in <a href="#fig1">Figure 1</a>). The need for guidance on **how
 98 | to capture the details of nested observations** (dwc:Event hierarchies)
 99 | is the rationale for this document. Nested sampling designs can be
100 | translated into a relational database schema of parent-child dwc:Event
101 | relationships (a parent event with one or more child sub-events; <a href="#fig1">Figure
102 | 1</a>). This document describes the circumstances under which specific
103 | properties of parent and child dwc:Events SHOULD be populated based on
104 | the parent-child relationship.
105 | 
106 | Note that the proposed structure for sharing ecological inventories does
107 | not follow typical database practice. Whilst a (relational) database
108 | would store information in multiple tables to avoid repetition of key
109 | information, datasets shared using the Darwin Core archive format and
110 | the Humboldt Extension instead use a "flattened" structure. In order to
111 | share inventory data such that no information is lost and no information
112 | is incorrectly inferred, one SHOULD **report all information at all
113 | applicable levels**. The rules for applicability and how to populate
114 | terms at parent and child levels in the dwc:Event hierarchy are captured
115 | in section *<a href="#guiding">3.2 Guiding principles</a>* and in section *<a href="#implementation">3.3 Implementation principles</a>*.
116 | 
117 | <a id="fig1"></a>
118 | ![Illustration of four examples of nested dwc:Events](fig1.png)
119 | 
120 | **Figure 1.** Visual representation of an ecological inventory
121 | illustrating four examples of occurrence data associated with dwc:Events
122 | nested within parent dwc:Events, at varying levels of complexity ranging
123 | from low (Example 1) to high (Example 4).
124 | 
125 | <a id="usage"></a>
126 | ## 3 Usage guidelines (normative)
127 | 
128 | ### 3.1 Definitions
129 | 
130 | **Inventory dataset** - An inventory (dataset) consists of one or more
131 | dwc:Events that MAY be related to each other in a hierarchy of parent
132 | and child dwc:Events. This is not new to the capabilities or intentions
133 | of Darwin Core.
134 | 
135 | **Inventory hierarchy** - A set of related dwc:Events, in which a
136 | narrower dwc:Event (child) points to the related broader dwc:Event
137 | (parent) via the child's dwc:parentEventID. A higher-level dwc:Event
138 | generally contains information about the inventory design that applies
139 | to all of its children.
140 | 
141 | **Parent dwc:Event** - A parent dwc:Event is any dwc:Event whose
142 | dwc:eventID is a dwc:parentEventID for at least one other dwc:Event
143 | (e.g., EVENT_01 in Figure 2).
144 | 
145 | **Child dwc:Event** - A child dwc:Event is any dwc:Event whose
146 | dwc:parentEventID is populated with the dwc:eventID of another dwc:Event
147 | (e.g., EVENT_02 or EVENT_03 in Figure 2).
148 | 
149 | ![Visual representation of parent/child relationship](fig2.png)
150 | 
151 | **Figure 2.** Visual representation of an inventory hierarchy
152 | illustrating parent-child dwc:Event relations. The higher-level (parent)
153 | dwc:Event, EVENT_01, may include general information about the inventory
154 | design. Species occurrences are captured for two child dwc:Events
155 | (EVENT_02 and EVENT_03).
156 | 
157 | <a id="guiding"></a>
158 | ## 3.2 Guiding principles
159 | 
160 | <a id="coverage"></a>
161 | ### 3.2.1 Principle of spatiotemporal coverage
162 | 
163 | **A parent dwc:Event MUST encompass its child dwc:Events spatially
164 | <u>and</u> temporally.** Specifically, the spatial extent and temporal
165 | interval of a parent dwc:Event MUST contain the spatial extents and
166 | temporal intervals of all of its children. For example, if child
167 | dwc:Events took place in various locations throughout, and only within,
168 | Burundi, then the spatial extent of the parent dwc:Event would be
169 | Burundi. Similarly, if the child dwc:Events took place periodically
170 | throughout the year 2019, the temporal interval of the parent dwc:Event
171 | would begin when the earliest child dwc:Event began and end when the
172 | latest child dwc:Event ended.
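
As an illustrative, non-normative sketch, the temporal half of this principle can be checked with a few lines of Python. The function names and the representation of an interval as a `(start, end)` pair of dates are assumptions made for this example and are not part of the Humboldt Extension. The spatial check is omitted because it would require a geometry library.

```python
from datetime import date


def interval_contains(parent, child):
    """Return True if the closed interval `parent` contains `child`."""
    parent_start, parent_end = parent
    child_start, child_end = child
    return parent_start <= child_start and child_end <= parent_end


def minimal_parent_interval(children):
    """Return the smallest interval that spans every child interval."""
    starts = [start for start, _ in children]
    ends = [end for _, end in children]
    return min(starts), max(ends)


# Hypothetical child dwc:Events spread through 2019.
children = [
    (date(2019, 1, 10), date(2019, 1, 12)),
    (date(2019, 6, 1), date(2019, 6, 3)),
    (date(2019, 11, 20), date(2019, 11, 21)),
]

# The parent begins with the earliest child and ends with the latest one.
parent = minimal_parent_interval(children)
assert all(interval_contains(parent, child) for child in children)
```

A parent interval that starts after its earliest child, or ends before its latest child, would fail the `interval_contains` check and so violate the principle.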
173 | 
174 | <a id="applicability"></a>
175 | ### 3.2.2 Principle of applicability
176 | 
177 | **Humboldt Extension terms SHOULD contain data explicitly at every level
178 | in the dwc:Event hierarchy to which they *directly* apply.** The value
179 | of a term for a dwc:Event SHOULD be populated for the Event itself
180 | rather than merely summarized in a higher-level dwc:Event. For example,
181 | a child dwc:Event (**C**) with multiple dwc:Occurrences, some of which
182 | resulted in voucher specimens, SHOULD possess a value of `true` for
183 | the term eco:hasVouchers. The data user SHOULD NOT be expected to look
184 | at the eco:hasVouchers term for the parent dwc:Event (**P**) of **C** in
185 | order to find the value.
186 | 
187 | If a term genuinely applies at multiple levels of a dwc:Event
188 | hierarchy, values SHOULD be reported explicitly at *each* of those
189 | levels. The values for child dwc:Events might be the same as their
190 | parental values, or child dwc:Events might possess their own more
191 | specific values. This principle allows child dwc:Events to be
192 | "autonomous" to the greatest degree possible, and avoids uncertainty
193 | about where to look for the values of properties of any given dwc:Event.
194 | 
195 | <a id="non-derivation"></a>
196 | ### 3.2.3 Principle of non-derivation
197 | 
198 | As a complement to the *Principle of applicability*, **Humboldt
199 | Extension terms SHOULD NOT be populated by deriving or summarizing
200 | information from child dwc:Events to their common parent dwc:Event**. If
201 | a term does not directly apply to a given level of dwc:Event (i.e., it
202 | is not an actual property of that dwc:Event), it SHOULD NOT be populated
203 | with a value. For example, if the parent dwc:Event **P** from the
204 | example in section *<a href="#applicability">3.2.2</a>* above is not directly linked to
205 | dwc:Occurrences, then the term eco:hasVouchers does not apply at that
206 | dwc:Event level and SHOULD be left unpopulated. Data providers SHOULD
207 | NOT construct a value for a parent dwc:Event from values at the level of
208 | child dwc:Events.
209 | 
210 | In some cases, including the example above, it would not be valid to
211 | derive or summarize information from child dwc:Events to populate a
212 | parent dwc:Event. Suppose parent dwc:Event **P** has two child
213 | dwc:Events, one with eco:hasVouchers `true` and one with
214 | eco:hasVouchers `false`. The value of eco:hasVouchers for **P** cannot
215 | be derived or summarized from its children, as it is neither `true`
216 | nor `false` for all of them (the only two values consistent with the
217 | recommended controlled vocabulary for the term). It would be neither
218 | desirable nor reliable to use the values of the child dwc:Events to
219 | infer a value for the parent dwc:Event. The *Principle of inference*
220 | (below) provides a further example, where *scope* terms of parent
221 | dwc:Events MUST NOT be populated by summarizing from lower levels
222 | (either through the scope values of child dwc:Events or, for example,
223 | through taxa detected in child dwc:Events).
224 | 
225 | There are terms which could theoretically be populated for a parent
226 | dwc:Event from the primary data already provided for that dwc:Event's
227 | children (e.g., eco:materialSampleTypes). Populating the parent term
228 | could facilitate the discovery of higher-level dwc:Events among whose
229 | children there is a particular value of a property (e.g., a search
230 | through the highest-level dwc:Events in datasets to find datasets in
231 | which there are particular eco:materialSampleTypes). However, providing
232 | such summary values is specifically NOT RECOMMENDED. Doing so (a) adds no
233 | information to the dataset (the summary information is already available
234 | by inspecting the primary data in the dwc:Events in the dataset), (b)
235 | adds an extra burden of summary upon the data provider, and (c) is
236 | susceptible to errors (ambiguities, inconsistencies, incompleteness)
237 | when trying to construct secondary summary information for higher-level
238 | Events.
239 | 
240 | <a id="inference"></a>
241 | ### 3.2.4 Principle of inference
242 | 
243 | **Certain terms in the Humboldt Extension support inferences.** Examples
244 | of terms that help data users to determine whether or not inferences can
245 | be made include those describing the *scope* of the inventory, such as
246 | eco:targetTaxonomicScope and eco:excludedTaxonomicScope, and terms
247 | describing *completeness*, such as eco:taxonCompletenessReported,
248 | eco:taxonCompletenessProtocols and eco:isTaxonomicScopeFullyReported.
249 | The values of these terms in a dwc:Event have implications for the
250 | interpretation of all of that dwc:Event's child dwc:Events. These terms
251 | MUST be populated for the highest-level dwc:Event to which they apply,
252 | and for all of its child dwc:Events.
253 | 
254 | **The *scope* terms of a dwc:Event MUST be populated whenever the scope
255 | was in effect**. Having this information in a dwc:Event is the only way
256 | **to be able to infer absences of detection** within that dwc:Event,
257 | whenever the dwc:Occurrences linked to that dwc:Event do not explicitly
258 | state zero counts or when there are no dwc:Occurrence records for a
259 | given taxon that fell within the taxonomic scope (the combination of
260 | eco:targetTaxonomicScope and eco:excludedTaxonomicScope). The ability to
261 | "implicitly" support inferences about undetected dwc:Taxa (and other
262 | organismal targets) was a high priority objective in the design and
263 | structure of the Humboldt Extension. By "implicitly support
264 | inferences" we mean that a dwc:organismQuantity of zero individuals
265 | within a particular scope does not need to be provided explicitly as a
266 | separate dwc:Occurrence record, for a dwc:Event that does declare an
267 | encompassing scope and where all the taxa/targets that *were* detected
268 | were fully reported. Instead, those zero counts can be reconstituted by
269 | data users based on the data contained in other terms. When the target
270 | taxonomic scope (the combination of eco:targetTaxonomicScope and
271 | eco:excludedTaxonomicScope) is determined in advance of inventory data
272 | collection, and eco:isTaxonomicScopeFullyReported = `true`, then all
273 | dwc:Taxa that fall within the taxonomic scope but are not reported in
274 | the dwc:Occurrences of any child dwc:Events **can be inferred to be
275 | dwc:Occurrences with a dwc:organismQuantity of zero** (i.e., undetected
276 | dwc:Taxa).
277 | 
278 | These inferred zero counts, in combination with information about
279 | sampling effort (i.e., eco:samplingEffortProtocol,
280 | eco:samplingEffortValue and eco:samplingEffortUnit), can then be used to
281 | estimate the likelihood that a count of zero organisms represents a
282 | *true* absence of a dwc:Taxon. However, if eco:taxonCompletenessReported
283 | = `reported incomplete` and/or eco:isTaxonomicScopeFullyReported =
284 | `false` for a dwc:Event, then future users SHOULD NOT make assumptions
285 | about absences.
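
The inference rule described above can be sketched in Python as follows. This is an illustrative, non-normative example: the function name, the dictionary keys (Humboldt Extension term names without the `eco:` prefix), the placeholder taxon names, and above all the `taxa_in_scope` list are assumptions. In practice the scope is usually a higher taxon, and resolving it into a list of taxa needs an external checklist that this sketch takes as given.

```python
def infer_undetected_taxa(event, detected_taxa, taxa_in_scope):
    """Return taxa that may be treated as zero counts, or None.

    None means the dwc:Event does not support inferring absences of
    detection, so no assumption about undetected taxa should be made.
    """
    scope_declared = bool(event.get("targetTaxonomicScope"))
    fully_reported = event.get("isTaxonomicScopeFullyReported") is True
    reported_incomplete = (
        event.get("taxonCompletenessReported") == "reported incomplete"
    )
    if not scope_declared or not fully_reported or reported_incomplete:
        return None
    # In-scope taxa with no occurrence record are inferred zero counts.
    return sorted(set(taxa_in_scope) - set(detected_taxa))


# Hypothetical event with a declared, fully reported scope.
event = {
    "targetTaxonomicScope": "hypothetical_genus",
    "isTaxonomicScopeFullyReported": True,
}
checklist = ["taxon_a", "taxon_b", "taxon_c"]  # assumed in-scope taxa

undetected = infer_undetected_taxa(event, ["taxon_a"], checklist)
```

Returning `None` rather than an empty list keeps "no inference is supported" distinct from "every in-scope taxon was detected".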
286 | 
287 | Data providers **MUST NOT retrospectively infer and populate
288 | eco:targetTaxonomicScope, or other *scope* terms**, for inclusion in a
289 | dataset shared with the Humboldt Extension. This is a further example of
290 | the *<a href="#non-derivation">Principle of non-derivation</a>* (*3.2.3*). Likewise, data users SHOULD
291 | NOT assume or reconstruct a scope that was not explicitly given by the
292 | data provider. There are at least two reasons for this: (1) Artificial
293 | construction of scope: retrospective inference of target scope by a data
294 | provider by aggregating information across all child dwc:Events may
295 | result in a reported scope that is narrower than the actual intended
296 | scope of the inventory. (2) Artificial broadening of scope: it is
297 | possible that the inferred scope can be described in multiple ways. For
298 | example, the scope of a list of species within a single genus could be
299 | described as the genus, as the family containing that genus, or as an
300 | even broader taxonomic concept. Thus, unless the true taxonomic scope is
301 | a known variable in the inventory protocol, then a presumed scope may be
302 | too broad or too narrow, leading to errors when inferring counts of
303 | zero.
304 | 
305 | <a id="implementation"></a>
306 | ## 3.3 Implementation principles
307 | 
308 | 1.  A Darwin Core-based inventory dataset MUST consist of at least one
309 | dwc:Event record.
310 | 
311 | 2.  Each dwc:Event in an inventory dataset MUST have a non-empty value
312 | for dwc:eventID that is unique within the dataset. More benefits
313 | are realizable if the dwc:eventIDs are also globally unique.
314 | 
315 | 3.  Any association of a Humboldt Extension record with a dwc:Event
316 | record MUST be done via that dwc:Event's dwc:eventID; the
317 | associated records MUST use the same dwc:eventID. It is
318 | permissible to have dwc:Event records without associated Humboldt
319 | Extension records.
320 | 
321 | 4.  An inventory hierarchy MUST be realized by explicitly relating each
322 | child dwc:Event to a parent dwc:Event through the child
323 | dwc:Event's dwc:parentEventID.
324 | 
325 | 5.  Data providers SHOULD follow [Darwin Core principle
326 | 4](https://dwc.tdwg.org/simple/#5-are-there-any-rules-normative),
327 | which is to fill the values of as many terms as possible, subject
328 | to the *Principle of applicability* and the *Principle of
329 | non-derivation* (sections *<a href="#applicability">3.2.2</a>* and *<a href="#non-derivation">3.2.3</a>*, respectively).
330 | 
331 | 6.  A child dwc:Event MUST NOT be assumed to implicitly "inherit" the
332 | value of any property of any of its parent dwc:Events; rather, the
333 | value SHOULD be provided explicitly as discussed in section *<a href="#applicability">3.2.2
334 | Principle of applicability</a>*.
335 | 
336 | 7.  A parent dwc:Event term SHOULD NOT be populated by deriving or
337 | summarizing information from child dwc:Events; rather, the value
338 | SHOULD be provided explicitly if appropriate to the nature and
339 | level of the dwc:Event, as discussed in section *<a href="#non-derivation">3.2.3 Principle of non-derivation</a>*.
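
Principles 1, 2 and 4 in the list above lend themselves to a mechanical check. The following is an illustrative, non-normative sketch: the function name and the representation of dwc:Event records as dictionaries are assumptions, as is the requirement that every populated dwc:parentEventID resolves to a dwc:Event in the same dataset.

```python
def check_event_hierarchy(events):
    """Return a list of problems found for principles 1, 2 and 4."""
    problems = []
    # Principle 1: at least one dwc:Event record.
    if not events:
        problems.append("no dwc:Event records")
    # Principle 2: non-empty eventID, unique within the dataset.
    known_ids = set()
    for event in events:
        event_id = event.get("eventID")
        if not event_id:
            problems.append("missing eventID")
        elif event_id in known_ids:
            problems.append(f"duplicate eventID: {event_id}")
        else:
            known_ids.add(event_id)
    # Principle 4: a child points to its parent via parentEventID
    # (assumed here to resolve within the same dataset).
    for event in events:
        parent_id = event.get("parentEventID")
        if parent_id and parent_id not in known_ids:
            problems.append(f"unresolved parentEventID: {parent_id}")
    return problems


# The hierarchy shown in Figure 2: one parent and two children.
events = [
    {"eventID": "EVENT_01"},
    {"eventID": "EVENT_02", "parentEventID": "EVENT_01"},
    {"eventID": "EVENT_03", "parentEventID": "EVENT_01"},
]
problems = check_event_hierarchy(events)
```

Principles 5 to 7 concern how individual term values are populated and cannot be verified from identifiers alone, so they are outside the scope of this sketch.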
340 | 
341 | <a id="examples"></a>
342 | ## 4 Examples (non-normative)
343 | 
344 | ![Tables illustrating implementation principles](fig3.png)
345 | 
346 | **Figure 3.** Example illustrating the [Implementation
347 | principles](#implementation). Numbering of colored
348 | rectangles indicates the relevant principle; lines, arrows or rectangles
349 | in the same color indicate that the cells, columns or records are
350 | affected by the principle. *Notolepis coatsi* and *Cranchiidae* are not
351 | within the reported eco:targetTaxonomicScope. Principle 1 - an inventory
352 | dataset must have at least one dwc:Event record; here, 3 records can be
353 | identified. Principle 2 - each dwc:Event record must have a unique
354 | dwc:eventID. Principle 3 - Humboldt Extension records must be linked to
355 | the core dwc:Events via shared dwc:eventIDs. Principle 4 - every child
356 | dwc:Event must be related to its parent dwc:Event through a
357 | dwc:parentEventID. Principle 5 - term values for dwc:Events should be
358 | populated whenever possible; in the figure all records follow Darwin
359 | Core principle 4, subject to the *<a href="#applicability">Principle of applicability</a>* and the
360 | *<a href="#non-derivation">Principle of non-derivation</a>*. Principle 6 - terms for child dwc:Events
361 | must be explicitly populated rather than "inheriting" values from
362 | their parent dwc:Events. Principle 7 - terms for parent dwc:Events
363 | should be populated whenever relevant, but not be derived or summarized
364 | from their child dwc:Events.
365 | 


--------------------------------------------------------------------------------
/build/dwc_doc_inclusive/index.md:
--------------------------------------------------------------------------------
  1 | # {document_title}
  2 | 
  3 | Title
  4 | : {document_title}
  5 | 
  6 | Date version issued
  7 | : {ratification_date}
  8 | 
  9 | Date created
 10 | : {created_date}
 11 | 
 12 | Part of TDWG Standard
 13 | : <{standard_iri}>
 14 | 
 15 | This version
 16 | : <{current_iri}{ratification_date}>
 17 | 
 18 | Latest version
 19 | : <{current_iri}>
 20 | 
 21 | {previous_version_slot}
 22 | 
 23 | Abstract
 24 | : {abstract}
 25 | 
 26 | Contributors
 27 | : {contributors}
 28 | 
 29 | Creator
 30 | : {creator}
 31 | 
 32 | Bibliographic citation
 33 | : {creator}. {year}. {document_title}. {publisher}. <{current_iri}{ratification_date}>
 34 | 
 35 | ## 1 Introduction (non-normative)
 36 | 
 37 | This document elaborates upon the meaning and use of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive`. Use of this term is necessary in order to describe how to treat counts of organisms (or any other organism quantity) when records from a single `dwc:Event` (<http://rs.tdwg.org/dwc/terms/Event>) include multiple target categories (e.g., taxonomic ranks within a higher rank or different life stages for the same species). For example, a statement of whether the least specific target category quantity is inclusive should be reported when a `dwc:Event` includes records reporting quantities that are associated with subcategories (e.g., subspecies) and records reporting quantities for more general categories (e.g., the species). In this example, the higher taxon rank (i.e., species) is the least specific category, because it is more general than the subspecies category nested below it. Species and subspecies are just one example of a pair of category and subcategory. Other examples of subcategories are life stages (e.g., “adult”, “larva”, “egg”) and sexes.
 38 | 
 39 | ### 1.1 Status of the content of this document
 40 | 
 41 | Section 3 of this document is normative. The other sections are non-normative.
 42 | 
 43 | 
 44 | ### 1.2 RFC 2119 key words
 45 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
 46 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
 47 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
 48 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
 49 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
 50 | when, and only when, they are written in capitals (as shown here).
 51 | 
 52 | ### 1.3 Namespaces and terminology
 53 | 
 54 | The namespace `eco:` abbreviates `http://rs.tdwg.org/eco/terms/` and is used with terms minted for the Humboldt Extension for ecological inventories. `dwc:` abbreviates `http://rs.tdwg.org/dwc/terms/`, and is used with terms in the main Darwin Core vocabulary namespace. Words in `code markup` are term IRIs or literal values. The word "organisms" is used colloquially and is not used in the technical sense of the `dwc:Organism` class.
 55 | 
 56 | ## 2 Rationale (non-normative)
 57 | 
 58 | The term `eco:isLeastSpecificTargetCategoryQuantityInclusive` was introduced into the Humboldt Extension for ecological inventories late in development, after testing it with real-world cases ([Sica et al., 2022](#ref2)). Testing revealed that the quantities of organisms stored in two major biodiversity databases — OBIS (<a href="#ref1">OBIS, 2023</a>) and eBird (<a href="#ref3">Sullivan et al., 2014</a>) — need to be treated differently in order to calculate the total quantity of organisms in the least specific category.  In the specific case of data in the OBIS database, the information for a single `dwc:Event` can contain multiple records for a species, with one record for a species listing the quantity of individual organisms for the species without specifying any subcategory of life stage, and other records for the same species in the same `dwc:Event` listing quantities for different life stages (e.g., one record for adults and another record for juveniles). In this example the single `dwc:Event` will contain 3 records: one for the species without any life stage specified, one for adults of the species, and one for juveniles of the species.  For the OBIS data, the quantity in the record for which no life stage is specified is the sum of three quantities: the number of juveniles, the number of adults, and the number of individuals that were not recorded as belonging to any specific life stage.  In other words, when using OBIS data, the total quantity of individuals recorded for a species, across all life stages combined, has been pre-calculated and stored in the database; unless the quantities of individuals within specific life stages are of interest, the information in the life stage subcategories can be ignored. The value of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` in this case would be `true` - the least specific category (species without any life stage specified) already includes the counts of the more specific subcategories.
 59 | 
 60 | eBird stores information about quantities of organisms differently.  For the example of a `dwc:Event` that contains separate records for subspecies and their parent species, the total number of individuals of the species needs to be calculated by the end user as the sum of the quantity reported for the species plus the quantities reported for the subspecies.  In other words, the total quantity of organisms of each species has not been pre-calculated and must be derived by the end user. The value of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` in this case would be `false` - the least specific category (species) does not include the counts of the more specific subcategories (subspecies).
 61 | 
 62 | In summary, the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` is required to inform the end user of whether they will need to derive the total quantity of organisms for the least specific category (e.g., for a species), or whether this total quantity has already been calculated prior to the data being entered into the database. Note that, if a dataset contains only simple targets that have no subcategories, the result of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` being `true` or `false` is exactly the same - the count is the total in either case. Only in this circumstance does the term not strictly need to be populated. However, given that data records acquire a "life of their own" separate from their associated metadata when aggregated from multiple data sets, best practice is to include and populate the term `eco:isLeastSpecificTargetCategoryQuantityInclusive`.
 63 | 
 64 | ## 3 Usage guidelines (normative)
 65 | 
 66 | The term `eco:isLeastSpecificTargetCategoryQuantityInclusive` is defined as "The total detected quantity of organisms for a `dwc:Taxon` (including subsets thereof) in a `dwc:Event` is given explicitly in a single record (`dwc:organismQuantity` value) for that `dwc:Taxon`."
 67 | 
 68 | The value MUST be either `true` or `false`. If `true`, the `dwc:organismQuantity` value for a `dwc:Taxon` in a `dwc:Event` is inclusive of all organisms of the `dwc:Taxon` (including more specific scopes such as different life stages or lower taxonomic ranks) and the total detected quantity of organisms for that `dwc:Taxon` in the `dwc:Event` MUST NOT be determined by summing the `dwc:organismQuantity` values for all records of the `dwc:Taxon` in the `dwc:Event`. Instead, the total detected quantity of organisms for the `dwc:Taxon` in a `dwc:Event` MUST be reported in a single record for the `dwc:Taxon` in the `dwc:Event`, with this record having no further specific scopes. In this case the sum of `dwc:organismQuantity` values for the reported subsets of the `dwc:Taxon` MUST NOT exceed the value of `dwc:organismQuantity` for the single record for the `dwc:Taxon` without subsets (i.e., the total). If `false`, the `dwc:organismQuantity` values for a `dwc:Taxon` in a `dwc:Event` MUST be added to get the total detected quantity of organisms for that `dwc:Taxon` in the `dwc:Event`.
 69 | 
 70 | ## 4 Examples (non-normative)
 71 | 
 72 | ### 4.1 Single `dwc:Taxon` example
 73 | 
 74 | As an example of the difference between `true` and `false` values for `eco:isLeastSpecificTargetCategoryQuantityInclusive`, suppose there are three records (see Table 1) with `dwc:organismQuantity` for a `dwc:Taxon` (taxon_01) for a `dwc:Event` (event_01). One record is for adults of the `dwc:Taxon` with `dwc:organismQuantity` = `1` and `dwc:organismQuantityType` = `individuals`, one record is for juveniles of the `dwc:Taxon` with `dwc:organismQuantity` = `2` and `dwc:organismQuantityType` = `individuals`, and one record is for the `dwc:Taxon` without specifying the life stage and with `dwc:organismQuantity` = `4` and `dwc:organismQuantityType` = `individuals`.
 75 | 
 76 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `true` for event_01, then the total number of individuals of taxon_01 for the `dwc:Event` is 4 (the least specific `dwc:Taxon` record — the one with no more specific scopes — includes all individuals of the `dwc:Taxon`). This means that there was 1 adult, 2 juveniles and 1 individual of taxon_01 whose life stage was not recorded. 
 77 | 
 78 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `false` for event_01, then the total number of individuals of taxon_01 for the `dwc:Event` is 7 (the least specific `dwc:Taxon` record - the one with no more specific scopes - does not include all individuals of the `dwc:Taxon`, rather, it is a separate category that must also be added to get the total). This means there was 1 adult, 2 juveniles and 4 individuals of taxon_01 whose life stage was not recorded.
 79 | 
 80 | **Table 1. Organism quantities in `dwc:Occurrence` records**
 81 | 
 82 | | occurrenceID | eventID | taxonID | lifeStage | organismQuantity | organismQuantityType |
 83 | | ------------ | ------- | ------- | --------- | ---------------- | -------------------- |
 84 | | occ_01 | event_01 | taxon_01 | adult | 1 | individuals |
 85 | | occ_02 | event_01 | taxon_01 | juvenile | 2 | individuals |
 86 | | occ_03 | event_01 | taxon_01 |  | 4 | individuals |
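
The arithmetic in this example can be reproduced with a short Python sketch. It is illustrative only: the function name and the representation of each record as a `(subcategory, quantity)` pair, with `None` marking the least specific record, are assumptions made for the example.

```python
def total_quantity(records, inclusive):
    """Total detected quantity for one dwc:Taxon in one dwc:Event.

    `records` holds (subcategory, quantity) pairs; a subcategory of
    None marks the least specific record (no life stage, no subtaxon).
    """
    if not inclusive:
        # false: every record is a separate category, so add them all.
        return sum(quantity for _, quantity in records)
    # true: the single least specific record already holds the total.
    totals = [qty for sub, qty in records if sub is None]
    if len(totals) != 1:
        raise ValueError("expected exactly one least specific record")
    subset_sum = sum(qty for sub, qty in records if sub is not None)
    if subset_sum > totals[0]:
        raise ValueError("subset quantities exceed the reported total")
    return totals[0]


# The three records of Table 1.
table_1 = [("adult", 1), ("juvenile", 2), (None, 4)]

total_if_true = total_quantity(table_1, inclusive=True)    # 4
total_if_false = total_quantity(table_1, inclusive=False)  # 7
```

The `ValueError` raised when subset quantities exceed the total reflects the Section 3 requirement that the sum of the subsets MUST NOT exceed the value in the record without subsets.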
 87 | 
 88 | ### 4.2 Nested taxa example
 89 | 
 90 | Suppose there are three records (see Table 2) with `dwc:organismQuantity` for three taxa (*Hirundo rustica* and two subspecies) for a `dwc:Event` (event_01). The record for the species has `dwc:organismQuantity` = `3` and `dwc:organismQuantityType` = `individuals`. The record for *H. r. rustica* has `dwc:organismQuantity` = `2` and `dwc:organismQuantityType` = `individuals`. The record for *H. r. gutturalis* has `dwc:organismQuantity` = `1` and `dwc:organismQuantityType` = `individuals`.
 91 | 
 92 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `true` for event_01, then the total number of individuals of the species *H. rustica* for the `dwc:Event` is 3 (the least specific `dwc:Taxon` record includes all individuals of the `dwc:Taxon`). This means there were 2 *H. r. rustica*, 1 *H. r. gutturalis*, and no other *H. rustica* of any kind detected.
 93 | 
 94 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `false` for event_01, then the total number of individuals of the species *H. rustica* for the `dwc:Event` is 6 (the least specific `dwc:Taxon` record does not include all individuals of the `dwc:Taxon`). This means there were 2 *H. r. rustica*, 1 *H. r. gutturalis*, and 3 other *H. rustica* detected that were not identified to subspecies. 
 95 | 
 96 | **Table 2. Organism quantities in `dwc:Event` records**
 97 | 
 98 | | eventID | scientificName | organismQuantity | organismQuantityType |
 99 | | ------- | -------------- | ---------------- | -------------------- |
100 | | event_01 | Hirundo rustica | 3 | individual |
101 | | event_01 | Hirundo rustica rustica | 2 | individual |
102 | | event_01 | Hirundo rustica gutturalis | 1 | individual |
103 | 
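The two interpretations above can be sketched in Python. This is a minimal illustration, not part of the standard: the function name `total_for_least_specific` and the dictionary layout are hypothetical, and the sketch assumes that every record belongs to the same `dwc:Event` and that all other taxa are nested within the least specific taxon.

```python
def total_for_least_specific(quantities, least_specific, inclusive):
    """Return the total quantity for the least specific taxon.

    quantities: mapping of scientificName -> organismQuantity for one event.
    least_specific: name of the least specific taxon in the mapping.
    inclusive: value of eco:isLeastSpecificTargetCategoryQuantityInclusive.
    """
    if inclusive:
        # The least specific record already counts every individual.
        return quantities[least_specific]
    # Otherwise the least specific record counts only individuals that
    # were not reported under a more specific taxon, so add them all up.
    return sum(quantities.values())


# Quantities copied from Table 2 (event_01).
table_2 = {
    "Hirundo rustica": 3,
    "Hirundo rustica rustica": 2,
    "Hirundo rustica gutturalis": 1,
}

total_inclusive = total_for_least_specific(table_2, "Hirundo rustica", True)
total_exclusive = total_for_least_specific(table_2, "Hirundo rustica", False)
print(total_inclusive, total_exclusive)
```

With the values in Table 2, the inclusive reading gives 3 individuals and the non-inclusive reading gives 6.
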
104 | # 5 References
105 | 
106 | <a id="ref1"></a>OBIS (2023) Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. <https://www.obis.org/>.
107 | 
108 | <a id="ref2"></a>Sica, Y. V., K. Ingenloff, Y.-M. Gan, Z. Kachian, S. J. Baskauf, J. Wieczorek, P. F. Zermoglio, R. D. Stevenson (2022). Application of Humboldt Extension to Real-world Cases. *Biodiversity Information Science and Standards* 6: e91502. <https://doi.org/10.3897/biss.6.91502>
109 | 
110 | <a id="ref3"></a>Sullivan, B. L., J. L. Aycrigg, J. H. Barry, R. E. Bonney, N. Bruns, C. B. Cooper, T. Damoulas, A. A. Dhondt, T. Dietterich, A. Farnsworth, D. Fink, et al. (2014). The eBird enterprise: an integrated approach to development and application of citizen science. *Biological Conservation* 169:31-40. <https://doi.org/10.1016/j.biocon.2013.11.003>
111 | 


--------------------------------------------------------------------------------
/build/dwc_doc_tcr/authors_configuration.yaml:
--------------------------------------------------------------------------------
 1 | - contributor_iri: https://orcid.org/0000-0002-1720-0127
 2 |   contributor_literal: Yanina V. Sica
 3 |   contributor_role: contributor
 4 |   role_uri: http://www.wikidata.org/entity/Q20204892
 5 |   affiliation: Yale University
 6 |   affiliation_uri: http://www.wikidata.org/entity/Q49112
 7 | 
 8 | - contributor_iri: https://orcid.org/0000-0002-0595-7827
 9 |   contributor_literal: Wesley M. Hochachka
10 |   contributor_role: contributor
11 |   role_uri: http://www.wikidata.org/entity/Q20204892
12 |   affiliation: Cornell Lab of Ornithology
13 |   affiliation_uri: http://www.wikidata.org/entity/Q2997535
14 | 
15 | - contributor_iri: https://orcid.org/0000-0003-4365-3135
16 |   contributor_literal: Steven J. Baskauf
17 |   contributor_role: contributor
18 |   role_uri: http://www.wikidata.org/entity/Q20204892
19 |   affiliation: Vanderbilt University Libraries
20 |   affiliation_uri: http://www.wikidata.org/entity/Q16849893
21 | 


--------------------------------------------------------------------------------
/build/dwc_doc_tcr/document_configuration.yaml:
--------------------------------------------------------------------------------
 1 | # ----------------
 2 | # Values set by the task group or maintainers of the standard.
 3 | # ----------------
 4 | 
 5 | # Official title of the document assigned by authors.
 6 | documentTitle: "Taxon Completeness Reported Controlled Vocabulary List of Terms"
 7 | 
 8 | # Abstract of document written by authors.
 9 | abstract: The Humboldt Extension for Ecological Inventories mints the term 
10 |   `taxonCompletenessReported` to alert users that the inventory was conducted in 
11 |   such a way that all of the target taxa should have been detectable if they were 
12 |   present during the dwc:Event. This vocabulary provides terms that should be used 
13 |   as values for `eco:taxonCompletenessReported` and `ecoiri:taxonCompletenessReported`.
14 | 
15 | # This value is generally the name of the task group that created the document.
16 | creator: TDWG Humboldt Extension Task Group
17 | 
18 | # Current (2023-08-27) practice is to publish documents as Markdown files in a TDWG GitHub repository.
19 | # These Markdown documents are then converted to HTML by GitHub Pages. To match the TDWG theme, the 
20 | # document maintainers will need to work with the Infrastructure team to set up the repository so 
21 | # that it can host the ancillary website for the standard or vocabulary. 
 22 | # The exact setup of the repository will determine the values of accessUrl and browserRedirectUri.
23 | 
24 | # Media type of source document used to generate the HTML version of the document.
25 | mediaType: text/markdown
26 | 
27 | # Value determined by the location of the raw Markdown file in the GitHub repository.
28 | # The repository pattern used should be to create a subdirectory for the document whose name will be
29 | # the slug for the page, then place the Markdown file named index.md in that subdirectory.
30 | accessUrl: https://raw.githubusercontent.com/tdwg/hc/main/docs/tcr/index.md
31 | 
32 | # Actual URL of the document to which the permanent IRI is redirected.
33 | # When generated by GitHub pages, this will be related to the location of the raw Markdown file.
34 | # The initial default value is https://tdwg.github.io/repository_name/subdirectory_name/.
35 | # However, typically, the Infrastructure team sets up a subdomain of the tdwg.org domain for the
36 | # ancillary website. In that case, the value will eventually be 
37 | # https://subdomain.tdwg.org/subdirectory_name/.
38 | browserRedirectUri: https://tdwg.github.io/hc/tcr/
39 | 
40 | # ----------------
41 | # Values set by the TDWG Infrastructure team at time of ratification
42 | # ----------------
43 | 
44 | # Permanent IRI of the standard with which the document is associated. 
45 | # For documents added to an existing standard, see the landing page
46 | # for the standard on the TDWG website for the correct value.
47 | # For new standards, this value will be set by the TDWG Infrastructure team.
48 | dcterms_isPartOf: http://www.tdwg.org/standards/450
49 | 
50 | # IRI value assigned as a permanent identifier for the document based on standard TDWG IRI patterns.
51 | # This value will automatically get updated from the general_configuration.yaml file. It should not be
52 | # set manually.
53 | current_iri: http://rs.tdwg.org/dwc/doc/tcr/
54 | 
55 | # Date of first ratification of the document. Will match the doc_modified value for the first
56 | # version of the document. For lists of terms, this will also match the date that the
57 | # first version of the vocabulary was issued (date_issued).
58 | doc_created: '2023-09-13'
59 | 
60 | # ----------------
61 | # Do not edit below this line
62 | # ----------------
63 | 
64 | # Standard metadata determined by TDWG policy.
65 | publisher: Biodiversity Information Standards (TDWG)
66 | license_statement: Licensed under a Creative Commons Attribution 4.0 International (CC BY) License.
67 | license_uri: http://creativecommons.org/licenses/by/4.0/
68 | 
69 | # Typically left blank. May be used to provide additional information about the document
70 | # in the machine-readable metadata.
71 | comment: ''
72 | 
73 | # This value will automatically get updated from the date_issued value in config.yaml if the document
74 | # is a list of terms document. For other types of documents, it is set from the general_configuration.yaml
75 | # file.
76 | doc_modified: '2023-08-25'
77 | 


--------------------------------------------------------------------------------
/build/dwc_doc_tcr/termlist-header.md:
--------------------------------------------------------------------------------
 1 | # {document_title}
 2 | 
 3 | Title
 4 | : {document_title}
 5 | 
 6 | Namespace IRI
 7 | : {namespace_uri}
 8 | 
 9 | Preferred namespace abbreviation
10 | : {pref_namespace_prefix}:
11 | 
12 | Date version issued
13 | : {ratification_date}
14 | 
15 | Date created
16 | : {created_date}
17 | 
18 | Part of TDWG Standard
19 | : <{standard_iri}>
20 | 
21 | This version
22 | : <{current_iri}{ratification_date}>
23 | 
24 | Latest version
25 | : <{current_iri}>
26 | 
27 | {previous_version_slot}
28 | 
29 | Abstract
30 | : {abstract}
31 | 
32 | Contributors
33 | : {contributors}
34 | 
35 | Creator
36 | : {creator}
37 | 
38 | Bibliographic citation
39 | : {creator}. {year}. {document_title}. {publisher}. <{current_iri}{ratification_date}>
40 | 
41 | ## 1 Introduction (non-normative)
42 | 
 43 | This document includes terms intended to be used as controlled values for the Humboldt Extension terms with the local name `taxonCompletenessReported`.
44 | 
45 | ### 1.1 Status of the content of this document
46 | 
47 | Sections 1 and 3 are non-normative. Section 2 is normative. In Section 4, the values of the `Term IRI`, `Definition`, and `Controlled value` are normative. The value of `Usage` (if it exists for a given term) is normative. The values of `Term Name` are non-normative, although one can expect that the namespace abbreviation prefix is one commonly used for the term namespace. `Label` and the values of all other properties (such as `Notes`) are non-normative.
48 | 
49 | ### 1.2 RFC 2119 key words
50 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
51 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
52 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
53 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
54 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
55 | when, and only when, they are written in capitals (as shown here).
56 | 
57 | ### 1.3 Namespaces
58 | 
59 | The namespace `eco:` abbreviates `http://rs.tdwg.org/eco/terms/` and the namespace `ecoiri:` abbreviates `http://rs.tdwg.org/eco/iri/`. Both namespaces are used with terms minted for the Humboldt Extension for Ecological Inventories. `ecotcr:` abbreviates `http://rs.tdwg.org/ecotcr/values/`, and is used with terms in this vocabulary.
60 | 
 61 | ## 2 Use of Terms (normative)
62 | 
63 | Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](http://rs.tdwg.org/dwc/terms/guides/rdf/#143-use-of-darwin-core-terms-in-rdf-normative), term IRIs MUST be used as values of `ecoiri:taxonCompletenessReported`. Controlled value strings MUST be used as values of `eco:taxonCompletenessReported`.
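
As a non-normative illustration, the Python sketch below pairs each controlled value string with its term IRI. The local names and strings are copied from the `tcr.csv` file in this repository, and the IRIs are assumed to be the `ecotcr:` namespace followed by the local name. The helper `term_iri` is a hypothetical name used only for this example.

```python
# Namespace abbreviated by ecotcr: (see Section 1.3).
ECOTCR_NAMESPACE = "http://rs.tdwg.org/ecotcr/values/"

# Controlled value string -> term local name, as listed in tcr.csv.
LOCAL_NAMES = dict(
    notReported="tcr00",
    reportedComplete="tcr01",
    reportedIncomplete="tcr02",
)


def term_iri(controlled_value):
    """Return the term IRI for a controlled value string."""
    return ECOTCR_NAMESPACE + LOCAL_NAMES[controlled_value]


# Value suitable for eco:taxonCompletenessReported (a string).
string_value = "reportedComplete"
# Value suitable for ecoiri:taxonCompletenessReported (an IRI).
iri_value = term_iri(string_value)
print(string_value, iri_value)
```
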
64 | 
65 | 


--------------------------------------------------------------------------------
/build/generate_term_versions.py:
--------------------------------------------------------------------------------
  1 | # -----------------------------
  2 | # file import and configuration
  3 | # -----------------------------
  4 | 
  5 | import pandas as pd
  6 | 
  7 | # This is the base URL for raw files from the branch of the repo that has been pushed to GitHub
  8 | github_baseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/'
  9 | 
 10 | # This is a Python list of the database names of the term version lists to be included in the document.
 11 | term_lists = ['humboldt', 'humboldt_iri']
 12 | 
 13 | column_mappings = [
 14 |     {'norm': 'iri', 'accum': 'version'},
 15 |     {'norm': 'term_localName', 'accum': 'term_localName'},
 16 |     {'norm': 'label', 'accum': 'label'},
 17 |     {'norm': 'definition', 'accum': 'rdfs_comment'},
 18 |     {'norm': 'comments', 'accum': 'dcterms_description'},
 19 |     {'norm': 'examples', 'accum': 'examples'},
 20 |     {'norm': 'organized_in', 'accum': 'tdwgutility_organizedInClass'},
 21 |     {'norm': 'issued', 'accum': 'version_issued'},
 22 |     {'norm': 'status', 'accum': 'version_status'},
 23 |     {'norm': 'replaces', 'accum': 'replaces_version'},
 24 |     {'norm': 'rdf_type', 'accum': 'rdf_type'},
 25 |     {'norm': 'term_iri', 'accum': 'term_iri'},
 26 |     {'norm': 'abcd_equivalence', 'accum': 'tdwgutility_abcdEquivalence'},
 27 |     {'norm': 'flags', 'accum': 'tdwgutility_usageScope'}
 28 | ]
 29 | 
 30 | # -----------------------------
 31 | # Load the term version data for all of the term lists that are included in Darwin Core (including obsolete ones)
 32 | # -----------------------------
 33 | 
 34 | print('Loading namespace CSV files from GitHub:')
 35 | for term_list_index in range(len(term_lists)):
 36 |     # retrieve configuration metadata for term list
 37 |     config_url = github_baseUri + term_lists[term_list_index] + '/constants.csv'
 38 |     config_df = pd.read_csv(config_url, na_filter=False)
 39 |     term_namespace = config_df.iloc[0].loc['domainRoot']
 40 |     # print(term_namespace)
 41 |     
 42 |     # Retrieve versions metadata for term list
 43 |     versions_url = github_baseUri + term_lists[term_list_index] + '-versions/' + term_lists[term_list_index] + '-versions.csv'
 44 |     print(versions_url)
 45 |     versions_df = pd.read_csv(versions_url, na_filter=False)
 46 |     
 47 |     # Add a column for the term IRI by concatenating the term namespace with the local name value for each row
 48 |     versions_df['term_iri'] = term_namespace + versions_df['term_localName']
 49 |     
 50 |     if term_list_index == 0:
 51 |         # start the DataFrame with the first term list versions data
 52 |         accumulated_frame = versions_df.copy()
 53 |     else:
 54 |         # append subsequent term lists data to the DataFrame
 55 |         #accumulated_frame = accumulated_frame._append(versions_df.copy(), sort=True)
 56 |         accumulated_frame = pd.concat([accumulated_frame, versions_df], sort=True)
 57 | '''
 58 | # Special procedure for obsolete terms
 59 | # Retrieve versions metadata
 60 | versions_url = github_baseUri + 'dwc-obsolete-versions/dwc-obsolete-versions.csv'
 61 | print(versions_url)
 62 | versions_df = pd.read_csv(versions_url, na_filter=False)
 63 | 
 64 | # Retrieve term/version join data
 65 | join_url = github_baseUri + 'dwc-obsolete/dwc-obsolete-versions.csv'
 66 | join_df = pd.read_csv(join_url, na_filter=False)
 67 | 
 68 | # Find the term IRI for each version and add it to a list
 69 | term_iri_list = []
 70 | 
 71 | for row_index,row in versions_df.iterrows():
 72 |     for join_index,join_row in join_df.iterrows():
 73 |         # Locate the row in the join data where the version matches the row in the versions DataFrame
 74 |         if join_row['version'] == row['version']:
 75 |             term_iri_list.append(join_row['term'])
 76 |             break
 77 | 
 78 |     # Locate the row in the join data where the version matches the row in the versions DataFrame
 79 |     term_iri_row = join_df.loc[join_df['version'] == row['version']]
 80 |     # Add the current term IRI from the join data row to the list
 81 |     term_iri_list.append(term_iri_row['term'])
 82 | 
 83 | # Add the current term IRI list to the DataFrame as the term_iri column
 84 | versions_df['term_iri'] = term_iri_list
 85 | # Add the obsolete terms DataFrame to the accumulated DataFrame
 86 | accumulated_frame = accumulated_frame._append(versions_df.copy(), sort=True)
 87 | '''
 88 | accumulated_frame.reset_index(drop=True, inplace=True) # reset the row indices to consecutive starting with zero
 89 | accumulated_frame.fillna('', inplace=True) # replace all missing values with empty strings
 90 | accumulated_frame.head()
 91 | print()
 92 | 
 93 | # -----------------------------
 94 | # Create a list of lists building each row of the normative document
 95 | # -----------------------------
 96 | 
 97 | # Create column header list for the normative document
 98 | column_headers = []
 99 | for column_mapping in column_mappings:
100 |     # Add the value of the 'norm' key for the column
101 |     column_headers.append(column_mapping['norm'])
102 | #print(column_headers)
103 | 
104 | print('merging rows for output document')
105 | # Create the rows of the normative document
106 | normative_doc_list = []
107 | for row_index,row in accumulated_frame.iterrows():
108 |     normative_doc_row = []
109 |     for column_mapping in column_mappings:
110 |         # Add the value from the accumulation DataFrame column whose name is the value of the 'accum' key for the column
111 |         if column_mapping['norm'] == 'replaces':
112 |             # concatenate all versions that were replaced; pipe separated
113 |             replace_iri = row['replaces_version']
114 |             if row['replaces1_version'] != '':
115 |                 replace_iri += '|' + row['replaces1_version']
116 |                 if row['replaces2_version'] != '':
117 |                     replace_iri += '|' + row['replaces2_version']
118 |             normative_doc_row.append(replace_iri)
119 |         else:
120 |             normative_doc_row.append(row[column_mapping['accum']])
121 |     normative_doc_list.append(normative_doc_row)
122 | 
123 | ''' NO LONGER NEEDED FOR HANDLING OF IRI VALUED TERMS
124 | # special handling for http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI. Eventually we want to eliminate this.
125 | use_with_iri_row = ['http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI-2017-10-06',
126 |   'UseWithIRI',
127 |   'UseWithIRI',
128 |   'The category of terms that are recommended to have an IRI as a value.',
129 |   'A utility class to organize the dwciri: terms.',
130 |   '',
131 |   'http://www.w3.org/2000/01/rdf-schema#Class',
132 |   '2017-10-06',
133 |   'recommended',
134 |   '',
135 |   'http://www.w3.org/2000/01/rdf-schema#Class',
136 |   'http://rs.tdwg.org/dwc/terms/attributes/UseWithIRI',
137 |   'not in ABCD',
138 |   '']
139 | normative_doc_list.append(use_with_iri_row)
140 | '''
141 | 
142 | # Turn list of lists into dataframe
143 | normative_doc_df = pd.DataFrame(normative_doc_list, columns = column_headers)
144 | # Set the row label as the version IRI
145 | normative_doc_df.set_index('iri', drop=False, inplace=True)
146 | normative_doc_df.index.names = ['row_index']
147 | #normative_doc_df.to_csv('test.csv', index = False)
148 | #string1 = normative_doc_df.iloc[571]['term_iri']
149 | 
150 | # -----------------------------
151 | # Order the rows as required for generating the Quick Reference Guide
152 | # -----------------------------
153 | 
154 | # Empty DataFrame (same columns as normative_doc_df) to hold built Quick Reference Guide-ordered rows
155 | built_rows_df = normative_doc_df.iloc[0:0].copy()
156 | 
157 | # DataFrame to hold remaining rows
158 | remaining_rows_df = normative_doc_df.copy()
159 | 
160 | # Load the ordered list of terms in the Quick Reference Guide (single column named recommended_term_iri)
161 | print('ordering rows for output document')
162 | qrg_df = pd.read_csv('qrg-list.csv', na_filter=False)
163 | for qrg_index,qrg_row in qrg_df.iterrows():
164 |     found = False
165 |     for row_index,row in normative_doc_df.iterrows():
166 |         if (qrg_row['recommended_term_iri'] == row['term_iri']) and (row['status'] == 'recommended'):
167 |             found = True
168 |             #built_rows_df = built_rows_df.append(row)
169 |             built_rows_df.loc[len(built_rows_df.index)] = row
170 |             remaining_rows_df.drop(row['iri'], axis=0, inplace=True)
171 |             break
172 |     if not found:
173 |         print('row not found:', qrg_row['recommended_term_iri'])
174 | 
175 | # Alphabetize remaining term versions
176 | #remaining_rows_df.sort_values(by='iri', inplace=True)
177 | sorted_output = remaining_rows_df.iloc[remaining_rows_df.iri.str.lower().argsort()]
178 | 
179 | # Concatenate ordered terms and remaining versions
180 | #normative_doc_df = built_rows_df.append(remaining_rows_df)
181 | #normative_doc_df = built_rows_df.append(sorted_output)
182 | normative_doc_df = pd.concat([built_rows_df, sorted_output])
183 | 
184 | # Save the normative document DataFrame as a CSV
185 | normative_doc_df.to_csv('../vocabulary/term_versions.csv', index = False)
186 | 
187 | print('done')
188 | 


--------------------------------------------------------------------------------
/build/qrg-list.csv:
--------------------------------------------------------------------------------
 1 | recommended_term_iri
 2 | group:Site
 3 | http://rs.tdwg.org/eco/terms/siteCount
 4 | http://rs.tdwg.org/eco/terms/siteNestingDescription
 5 | http://rs.tdwg.org/eco/terms/verbatimSiteDescriptions
 6 | http://rs.tdwg.org/eco/terms/verbatimSiteNames
 7 | http://rs.tdwg.org/eco/terms/geospatialScopeAreaValue
 8 | http://rs.tdwg.org/eco/terms/geospatialScopeAreaUnit
 9 | http://rs.tdwg.org/eco/terms/totalAreaSampledValue
10 | http://rs.tdwg.org/eco/terms/totalAreaSampledUnit
11 | http://rs.tdwg.org/eco/terms/reportedWeather
12 | http://rs.tdwg.org/eco/terms/reportedExtremeConditions
13 | group:Habitat Scope
14 | http://rs.tdwg.org/eco/terms/targetHabitatScope
15 | http://rs.tdwg.org/eco/terms/excludedHabitatScope
16 | group:Temporal Scope
17 | http://rs.tdwg.org/eco/terms/eventDurationValue
18 | http://rs.tdwg.org/eco/terms/eventDurationUnit
19 | group:Taxonomic Scope
20 | http://rs.tdwg.org/eco/terms/targetTaxonomicScope
21 | http://rs.tdwg.org/eco/terms/excludedTaxonomicScope
22 | http://rs.tdwg.org/eco/terms/taxonCompletenessReported
23 | http://rs.tdwg.org/eco/terms/taxonCompletenessProtocols
24 | http://rs.tdwg.org/eco/terms/isTaxonomicScopeFullyReported
25 | http://rs.tdwg.org/eco/terms/isAbsenceReported
26 | http://rs.tdwg.org/eco/terms/absentTaxa
27 | http://rs.tdwg.org/eco/terms/hasNonTargetTaxa
28 | http://rs.tdwg.org/eco/terms/nonTargetTaxa
29 | http://rs.tdwg.org/eco/terms/areNonTargetTaxaFullyReported
30 | group:Organismal Scope
31 | http://rs.tdwg.org/eco/terms/targetLifeStageScope
32 | http://rs.tdwg.org/eco/terms/excludedLifeStageScope
33 | http://rs.tdwg.org/eco/terms/isLifeStageScopeFullyReported
34 | http://rs.tdwg.org/eco/terms/targetDegreeOfEstablishmentScope
35 | http://rs.tdwg.org/eco/terms/excludedDegreeOfEstablishmentScope
36 | http://rs.tdwg.org/eco/terms/isDegreeOfEstablishmentScopeFullyReported
37 | http://rs.tdwg.org/eco/terms/targetGrowthFormScope
38 | http://rs.tdwg.org/eco/terms/excludedGrowthFormScope
39 | http://rs.tdwg.org/eco/terms/isGrowthFormScopeFullyReported
40 | http://rs.tdwg.org/eco/terms/hasNonTargetOrganisms
41 | http://rs.tdwg.org/eco/terms/verbatimTargetScope
42 | group:Methodology Description
43 | http://rs.tdwg.org/eco/terms/compilationTypes
44 | http://rs.tdwg.org/eco/terms/compilationSourceTypes
45 | http://rs.tdwg.org/eco/terms/inventoryTypes
46 | http://rs.tdwg.org/eco/terms/protocolNames
47 | http://rs.tdwg.org/eco/terms/protocolDescriptions
48 | http://rs.tdwg.org/eco/terms/protocolReferences
49 | http://rs.tdwg.org/eco/terms/isAbundanceReported
50 | http://rs.tdwg.org/eco/terms/isAbundanceCapReported
51 | http://rs.tdwg.org/eco/terms/abundanceCap
52 | http://rs.tdwg.org/eco/terms/isVegetationCoverReported
53 | http://rs.tdwg.org/eco/terms/isLeastSpecificTargetCategoryQuantityInclusive
54 | group:Material Collected
55 | http://rs.tdwg.org/eco/terms/hasVouchers
56 | http://rs.tdwg.org/eco/terms/voucherInstitutions
57 | http://rs.tdwg.org/eco/terms/hasMaterialSamples
58 | http://rs.tdwg.org/eco/terms/materialSampleTypes
59 | group:Sampling Effort
60 | http://rs.tdwg.org/eco/terms/samplingPerformedBy
61 | http://rs.tdwg.org/eco/terms/isSamplingEffortReported
62 | http://rs.tdwg.org/eco/terms/samplingEffortProtocol
63 | http://rs.tdwg.org/eco/terms/samplingEffortValue
64 | http://rs.tdwg.org/eco/terms/samplingEffortUnit
65 | group:UseWithIRI
66 | http://rs.tdwg.org/eco/iri/absentTaxa
67 | http://rs.tdwg.org/eco/iri/compilationSourceTypes
68 | http://rs.tdwg.org/eco/iri/compilationTypes
69 | http://rs.tdwg.org/eco/iri/eventDurationUnit
70 | http://rs.tdwg.org/eco/iri/excludedDegreeOfEstablishmentScope
71 | http://rs.tdwg.org/eco/iri/excludedGrowthFormScope
72 | http://rs.tdwg.org/eco/iri/excludedHabitatScope
73 | http://rs.tdwg.org/eco/iri/excludedLifeStageScope
74 | http://rs.tdwg.org/eco/iri/excludedTaxonomicScope
75 | http://rs.tdwg.org/eco/iri/geospatialScopeAreaUnit
76 | http://rs.tdwg.org/eco/iri/inventoryTypes
77 | http://rs.tdwg.org/eco/iri/materialSampleTypes
78 | http://rs.tdwg.org/eco/iri/nonTargetTaxa
79 | http://rs.tdwg.org/eco/iri/protocolNames
80 | http://rs.tdwg.org/eco/iri/samplingEffortProtocol
81 | http://rs.tdwg.org/eco/iri/samplingEffortUnit
82 | http://rs.tdwg.org/eco/iri/samplingPerformedBy
83 | http://rs.tdwg.org/eco/iri/targetDegreeOfEstablishmentScope
84 | http://rs.tdwg.org/eco/iri/targetGrowthFormScope
85 | http://rs.tdwg.org/eco/iri/targetHabitatScope
86 | http://rs.tdwg.org/eco/iri/targetLifeStageScope
87 | http://rs.tdwg.org/eco/iri/targetTaxonomicScope
88 | http://rs.tdwg.org/eco/iri/taxonCompletenessProtocols
89 | 


--------------------------------------------------------------------------------
/build/requirements.txt:
--------------------------------------------------------------------------------
1 | jinja2
2 | PyYAML
3 | 


--------------------------------------------------------------------------------
/build/tcr-2024-02-28/config.yaml:
--------------------------------------------------------------------------------
  1 | # To use this configuration file, it must be in the process directory from which the
  2 | # process.py script is run. Typically, a copy is stored with the modifications CSV file.
  3 | 
  4 | # Date assigned to all versions, usually the date of approval by the Executive Committee.
  5 | # It is appended to all version IRIs. Format: YYYY-MM-DD
  6 | date_issued: '2024-02-28'
  7 | 
  8 | # UTC offset for the computer running the script (i.e. the appropriate offset for values produced by the
  9 | # Python method datetime.datetime.now()).
 10 | local_offset_from_utc: -05:00
 11 | 
 12 | # Only relevant when new term lists or vocabularies are created. It does nothing when
 13 | # existing terms are changed. 
 14 | # Technical note: this controls which template column mapping file from the "current terms" and "versions"
 15 | # directories of the process directory in the rs.tdwg.org repo is used. If properties are added beyond
 16 | # the standard ones, the template file will need to be edited. See Section 3 of process-vocabulary.md for details.
 17 | # Categories:
 18 | # 1: Simple vocabulary
 19 | # 2: Simple controlled vocabulary
 20 | # 3: Controlled vocabulary with broader hierarchy
 21 | vocab_type: 2
 22 | 
 23 | # Permanent IRI for the list of terms document that is associated with this vocabulary.
 24 | # This is needed to automatically update the date_modified value of the list of terms document 
 25 | # using the date_issued value above.
 26 | list_of_terms_iri: http://rs.tdwg.org/dwc/doc/tcr/
 27 | 
 28 | # IRI of containing standard. Existing standards IRIs:
 29 | # Darwin Core - http://www.tdwg.org/standards/450
 30 | # Audiovisual Core - http://www.tdwg.org/standards/638
 31 | # Latimer Core - http://www.tdwg.org/standards/x
 32 | standard: http://www.tdwg.org/standards/450
 33 | 
 34 | # Text to describe the Executive Committee Decision that approved the change.
 35 | decisions_text: Humboldt Extension for Ecological Inventories and controlled vocabulary for eco:taxonCompletenessReported ratified as a part of the Darwin Core Standard. See https://github.com/tdwg/hc/milestone/1
 36 | 
 37 | namespaces:
 38 | 
 39 | # Repeat the following data for each namespace
 40 | 
 41 | # For existing term lists, MUST be namespace assigned by issuing organization. For TDWG
 42 | # term lists, MUST follow conventional TDWG IRI patterns.
 43 | - namespace_uri: http://rs.tdwg.org/ecotcr/values/
 44 | 
 45 |   # Standard namespace abbreviation for the namespace IRI.
 46 |   pref_namespace_prefix: ecotcr
 47 | 
 48 |   # Database name for associated directories and files in the rs.tdwg.org repository. 
 49 |   # MUST NOT contain spaces. SHOULD be descriptive and lowerCamelCase is RECOMMENDED. 
 50 |   # Borrowed term lists SHOULD use naming convention of Darwin and Audiovisual Cores.
 51 |   # Do not append -versions to this name, the versions directory will be created automatically.
 52 |   database: taxonCompletenessReported
 53 | 
 54 |   # MUST be set to true if namespace not issued by TDWG in the rs.tdwg.org subdomain. 
 55 |   # MUST be set to false if namespace issued and controlled by TDWG.
 56 |   borrowed: false
 57 | 
 58 |   # Set to true if a new term list that has never been processed before. Otherwise, set to false.
 59 |   # MUST be set to true if it is a new term list that has never been processed before. 
 60 |   # Note that there are extra configuration files that must be set up for term lists that are 
 61 |   # part of new vocabularies. See Section 2.1.2 for details. 
 62 |   # MUST be set to false if this is an existing term list that has been processed at some time in the past.
 63 |   new_term_list: true
 64 | 
 65 |   # Normally set to false except for non-versioned namespaces like decisions.
 66 |   utility_namespace: false
 67 | 
 68 |   # Path to hand-edited changes CSV file. Relative to process directory from which the
 69 |   # process.py script is run.
 70 |   modifications_file_path: dwc-revisions/tcr-2024-02-28/tcr.csv
 71 | 
 72 |   # For TDWG-minted terms, SHOULD be set to empty string (Termlist IRI will be set to be
 73 |   # the same as the namespace IRI). For borrowed terms, mint an IRI that conforms to the 
 74 |   # TDWG termlist IRI pattern.
 75 | 
 76 |   # For TDWG-minted terms, this value SHOULD be the empty string and the termlist IRI will be set to be 
 77 |   # the same as the namespace IRI. If a value is given for TDWG-minted terms, it MUST be the same as the 
 78 |   # namespace IRI. When terms are borrowed from other non-TDWG vocabularies to be included within a TDWG 
 79 |   # vocabulary, an IRI for the borrowed term list conforming to the term list IRI pattern 
 80 |   # (https://github.com/tdwg/rs.tdwg.org#3rd-level-iris-denoting-term-lists) MUST be minted. 
 81 |   # The subdomain MUST be `rs.tdwg.org` and the first level IRI component following the subdomain MUST be 
 82 |   # the standard component for the vocabulary that is borrowing the terms. The second level IRI component 
 83 |   # SHOULD be a short, memorable string commonly associated with the borrowed vocabulary. Examples:
 84 |   # http://rs.tdwg.org/ac/xmp/ for the XMP terms borrowed by the Audiovisual Core
 85 |   # http://rs.tdwg.org/dwc/dcterms/ for the Dublin Core dcterms: terms borrowed by the Darwin Core
 86 |   termlist_uri: ''
 87 | 
 88 |   # Label used for the term list in machine-readable metadata.
 89 |   label: taxonCompletenessReported controlled values list
 90 | 
 91 |   # Description of the term list used in machine-readable metadata.
 92 |   description: Controlled values list for the Humboldt Extension for Ecological Inventories term taxonCompletenessReported.
 93 | 
 94 |   # The following values are used to set up redirects to the list of terms document.
 95 | 
 96 |   # IRI string from List of Terms document URL to be prepended to the term fragment identifier when 
 97 |   # dereferencing terms and an HTML representation is requested. 
 98 |   prepend_url: https://eco.tdwg.org/tcr/#
 99 | 
100 |   # Indicates whether the namespace abbreviation is included in the fragment identifier for the term.
101 |   use_namespace_in_fragment: true
102 | 
103 |   # String that is used to separate the namespace abbreviation from the term name in the fragment identifier.
104 |   # If use_namespace_in_fragment is false, this value is ignored.
105 |   separator: '_'  
106 | 


--------------------------------------------------------------------------------
/build/tcr-2024-02-28/tcr.csv:
--------------------------------------------------------------------------------
1 | term_localName,label,skos_inScheme,definition,definition_derived_from,usage,notes,examples,controlled_value_string,type
2 | tcr,taxon completeness reported concept scheme,,a SKOS concept scheme for categorizing taxon completeness reporting,,,,,,http://www.w3.org/2004/02/skos/core#ConceptScheme
3 | tcr00,not reported,tcr,Taxonomic completeness was not assessed or reported for the dwc:Event.,,,,,notReported,http://www.w3.org/2004/02/skos/core#Concept
4 | tcr01,reported complete,tcr,"Taxonomic completeness was assessed for the dwc:Event, and it was determined to be complete.",,,,,reportedComplete,http://www.w3.org/2004/02/skos/core#Concept
5 | tcr02,reported incomplete,tcr,"Taxonomic completeness was assessed for the dwc:Event, and it was determined to be incomplete.",,,,,reportedIncomplete,http://www.w3.org/2004/02/skos/core#Concept
6 | 


--------------------------------------------------------------------------------
/build/tcr-2024-02-28/vocab.yaml:
--------------------------------------------------------------------------------
 1 | # To use this configuration file, it must be in the process directory from which the
 2 | # process.py script is run. Typically, a copy is stored with the modifications CSV file.
 3 | 
 4 | # The following values are only required if the vocabulary is new. If they are provided for
 5 | # an existing vocabulary, they will replace the existing values.
 6 | vocabulary_label: Taxon Completeness Reported Controlled Vocabulary
 7 | vocabulary_description: A controlled vocabulary for the Humboldt Extension for Ecological Inventories term eco:taxonCompletenessReported.
 8 | 
 9 | # Current practice is to use the name of the Task Group that created it. For vocabularies
10 | # that have been heavily modified, the name of the Maintenance Group may be used. 
11 | dc_creator: TDWG Humboldt Extension Task Group
12 | 
13 | # TDWG's standard license should be used.
14 | dcterms_license: https://creativecommons.org/licenses/by/4.0/
15 | 
16 | # The following values are only required if the standard is new. If they are provided for
17 | # an existing standard, they will replace the existing values.
18 | standard_label: Darwin Core
19 | standard_description: Darwin Core is a standard maintained by the Darwin Core maintenance group. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, samples, and related information.
20 | 


--------------------------------------------------------------------------------
/build/tcr_build.py:
--------------------------------------------------------------------------------
  1 | # Script to build Markdown pages that provide term metadata for simple vocabularies
  2 | # Steve Baskauf 2020-06-28 CC0
  3 | # This script merges static Markdown header and footer documents with term information tables (in Markdown) generated from data in the rs.tdwg.org repo from the TDWG Github site
  4 | 
  5 | # Note: this script calls a function from http_library.py, which requires importing the requests, csv, and json modules
  6 | import re
  7 | import requests   # best library to manage HTTP transactions
  8 | import csv        # library to read/write/parse CSV files
  9 | import json       # library to convert JSON to Python data structures
 10 | import pandas as pd
 11 | import yaml
 12 | import sys
 13 | 
 14 | # -----------------
 15 | # Command line arguments
 16 | # -----------------
 17 | 
 18 | arg_vals = sys.argv[1:]
 19 | opts = [opt for opt in arg_vals if opt.startswith('-')]
 20 | args = [arg for arg in arg_vals if not arg.startswith('-')]
 21 | 
 22 | # "master" for production, something else for development
 23 | # Example: First part of branch URL is "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/eco/", branch is "eco".
 24 | if '--branch' in opts:
 25 |     github_branch = args[opts.index('--branch')]
 26 | else:
 27 |     github_branch = 'master'
 28 | 
 29 | # -----------------
 30 | # Configuration section
 31 | # -----------------
 32 | 
 33 | # This is the base URL for raw files from the branch of the repo that has been pushed to GitHub
 34 | githubBaseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/' + github_branch + '/'
 35 | 
 36 | headerFileName = 'dwc_doc_tcr/termlist-header.md'
 37 | footerFileName = 'termlist-footer.md'
 38 | outFileName = '../docs/tcr/index.md'
 39 | 
 40 | # This is a Python list of the database names of the term lists to be included in the document.
 41 | termLists = ['taxonCompletenessReported']
 42 | 
 43 | # If this list of terms is for terms in a single namespace, set the value of has_namespace to True. The value
 44 | # of has_namespace should be False for a list of terms that contains multiple namespaces.
 45 | has_namespace = True
 46 | 
 47 | # NOTE! There may be problems unless every term list is of the same vocabulary type since the number of columns will differ
 48 | # However, there probably aren't any circumstances where mixed types will be used to generate the same page.
 49 | vocab_type = 2 # 1 is simple vocabulary, 2 is simple controlled vocabulary, 3 is c.v. with broader hierarchy
 50 | 
 51 | # Terms in large vocabularies like Darwin and Audubon Cores may be organized into categories using tdwgutility_organizedInClass
 52 | # If so, those categories can be used to group terms in the generated term list document.
 53 | organized_in_categories = False
 54 | 
 55 | # If organized in categories, the display_order list must contain the IRIs that are values of tdwgutility_organizedInClass
 56 | # If not organized into categories, the value is irrelevant. There just needs to be one item in the list.
 57 | display_order = ['']
 58 | display_label = ['Vocabulary'] # these are the section labels for the categories in the page
 59 | display_comments = [''] # these are the comments about the category to be appended following the section labels
 60 | display_id = ['Vocabulary'] # these are the fragment identifiers for the associated sections for the categories
 61 | 
 62 | # ---------------
 63 | # Load header data
 64 | # ---------------
 65 | 
 66 | config_file_path = 'process/document_metadata_processing/dwc_doc_tcr/'
 67 | contributors_yaml_file = 'authors_configuration.yaml'
 68 | document_configuration_yaml_file = 'document_configuration.yaml'
 69 | 
 70 | if has_namespace:
 71 |     # Load the data about the namespace from term lists metadata at rs.tdwg.org
 72 |     term_lists_df = pd.read_csv(githubBaseUri +  'term-lists/term-lists.csv')
 73 |     # Find the row in the term-lists.csv file that corresponds to the database.
 74 |     term_list_row = term_lists_df.loc[term_lists_df['database'] == termLists[0]]
 75 |     # Extract the namespace IRI and preferred namespace prefix from the row.
 76 |     namespace_uri = term_list_row['vann_preferredNamespaceUri'].values[0]
 77 |     pref_namespace_prefix = term_list_row['vann_preferredNamespacePrefix'].values[0]
 78 | 
 79 |     '''
 80 |     # Load the configuration file used in the metadata creation process.
 81 |     metadata_config_text = requests.get(githubBaseUri + 'process/config.yaml').text
 82 |     metadata_config = yaml.load(metadata_config_text, Loader=yaml.FullLoader)
 83 |     namespace_uri = metadata_config['namespaces'][0]['namespace_uri']
 84 |     pref_namespace_prefix = metadata_config['namespaces'][0]['pref_namespace_prefix']
 85 |     '''
 86 | 
 87 | # Load the contributors YAML file from its GitHub URL
 88 | contributors_yaml_url = githubBaseUri + config_file_path + contributors_yaml_file
 89 | contributors_yaml = requests.get(contributors_yaml_url).text
 90 | if contributors_yaml == '404: Not Found':
 91 |     print('Contributors YAML file not found. Check the URL.')
 92 |     print(contributors_yaml_url)
 93 |     sys.exit(1)
 94 | contributors_yaml = yaml.load(contributors_yaml, Loader=yaml.FullLoader)
 95 | 
 96 | # Load the document configuration YAML file from its GitHub URL
 97 | document_configuration_yaml_url = githubBaseUri + config_file_path + document_configuration_yaml_file
 98 | document_configuration_yaml = requests.get(document_configuration_yaml_url).text
 99 | document_configuration_yaml = yaml.load(document_configuration_yaml, Loader=yaml.FullLoader)
100 | 
101 | # ---------------
102 | # Function definitions
103 | # ---------------
104 | 
105 | # replace URL with link
106 | #
107 | def createLinks(text):
108 |     def repl(match):
109 |         if match.group(1)[-1] == '.':
110 |             return '<a href="' + match.group(1)[:-1] + '">' + match.group(1)[:-1] + '</a>.'
111 |         return '<a href="' + match.group(1) + '">' + match.group(1) + '</a>'
112 | 
113 |     pattern = r'(https?://[^\s,;\)"]*)'
114 |     result = re.sub(pattern, repl, text)
115 |     return result
116 | 
117 | # 2021-08-06 Replace the createLinks() function with functions copied from the QRG build script written by S. Van Hoey
118 | def convert_code(text_with_backticks):
119 |     """Takes all back-quoted sections in a text field and converts it to
120 |     the html tagged version of code blocks <code>...</code>
121 |     """
122 |     return re.sub(r'`([^`]*)`', r'<code>\1</code>', text_with_backticks)
123 | 
124 | def convert_link(text_with_urls):
125 |     """Takes all links in a text field and converts it to the html tagged
126 |     version of the link
127 |     """
128 |     def _handle_matched(inputstring):
129 |         """quick hack version of url handling on the current prime versions data"""
130 |         url = inputstring.group()
131 |         return "<a href=\"{}\">{}</a>".format(url, url)
132 | 
133 |     regx = r"(http[s]?://[\w\d:#@%/;$()~_?\+-;=\\\.&]*)(?<![\)\.,])"
134 |     return re.sub(regx, _handle_matched, text_with_urls)
135 | 
136 | # Hack the code taken from the terms.tmpl template to insert the HTML necessary to make the semicolon-separated
137 | # lists of examples into an HTML list.
138 | # {% set examples = term.examples.split("; ") %}
139 | # {% if examples | length == 1 %}{{ examples | first }}{% else %}<ul class="list-group list-group-flush">{% for example in examples %}<li class="list-group-item">{{ example }}</li>{% endfor %}</ul>{% endif %}
140 | def convert_examples(text_with_list_of_examples: str) -> str:
141 |     examples_list = text_with_list_of_examples.split('; ')
142 |     if len(examples_list) == 1:
143 |         return examples_list[0]
144 |     else:
145 |         output = '<ul class="list-group list-group-flush">\n'
146 |         for example in examples_list:
147 |             output += '  <li class="list-group-item">' + example + '</li>\n'
148 |         output += '</ul>'
149 |         return output
150 | 
151 | print('Retrieving term list metadata from GitHub')
152 | term_lists_info = []
153 | 
154 | frame = pd.read_csv(githubBaseUri + 'term-lists/term-lists.csv', na_filter=False)
155 | for termList in termLists:
156 |     # Start with the database name; the list IRI and namespace values are filled in from the matching row below.
157 |     term_list_dict = {'database': termList}
158 |     for index,row in frame.iterrows():
159 |         if row['database'] == termList:
160 |             term_list_dict['pref_ns_prefix'] = row['vann_preferredNamespacePrefix']
161 |             term_list_dict['pref_ns_uri'] = row['vann_preferredNamespaceUri']
162 |             term_list_dict['list_iri'] = row['list']
163 |     term_lists_info.append(term_list_dict)
164 | 
165 | print('Retrieving metadata about terms from all namespaces from GitHub')
166 | # Create column list
167 | column_list = ['pref_ns_prefix', 'pref_ns_uri', 'term_localName', 'label', 'definition', 'usage', 'notes', 'term_modified', 'term_deprecated', 'type']
168 | if vocab_type == 2:
169 |     column_list += ['controlled_value_string']
170 | elif vocab_type == 3:
171 |     column_list += ['controlled_value_string', 'skos_broader']
172 | if organized_in_categories:
173 |     column_list.append('tdwgutility_organizedInClass')
174 | column_list.append('version_iri')
175 | 
176 | # Create list of lists metadata table
177 | table_list = []
178 | for term_list in term_lists_info:
179 |     # retrieve versions metadata for term list
180 |     versions_url = githubBaseUri + term_list['database'] + '-versions/' + term_list['database'] + '-versions.csv'
181 |     versions_df = pd.read_csv(versions_url, na_filter=False)
182 |     
183 |     # retrieve current term metadata for term list
184 |     data_url = githubBaseUri + term_list['database'] + '/' + term_list['database'] + '.csv'
185 |     frame = pd.read_csv(data_url, na_filter=False)
186 |     for index,row in frame.iterrows():
187 |         row_list = [term_list['pref_ns_prefix'], term_list['pref_ns_uri'], row['term_localName'], row['label'], row['definition'], row['usage'], row['notes'], row['term_modified'], row['term_deprecated'], row['type']]
188 |         if vocab_type == 2:
189 |             row_list += [row['controlled_value_string']]
190 |         elif vocab_type == 3:
191 |             if row['skos_broader'] =='':
192 |                 row_list += [row['controlled_value_string'], '']
193 |             else:
194 |                 row_list += [row['controlled_value_string'], term_list['pref_ns_prefix'] + ':' + row['skos_broader']]
195 |         if organized_in_categories:
196 |             row_list.append(row['tdwgutility_organizedInClass'])
197 | 
198 |         # Borrowed terms really don't have implemented versions. They may be lacking values for version_status.
199 |         # In their case, their version IRI will be omitted.
200 |         found = False
201 |         for vindex, vrow in versions_df.iterrows():
202 |             if vrow['term_localName']==row['term_localName'] and vrow['version_status']=='recommended':
203 |                 found = True
204 |                 version_iri = vrow['version']
205 |                 # NOTE: the current hack for non-TDWG terms without a version is to append # to the end of the term IRI
206 |                 if version_iri.endswith('#'):
207 |                     version_iri = ''
208 |         if not found:
209 |             version_iri = ''
210 |         row_list.append(version_iri)
211 | 
212 |         table_list.append(row_list)
213 | 
214 | print('processing data')
215 | # Turn list of lists into dataframe
216 | terms_df = pd.DataFrame(table_list, columns = column_list)
217 | 
218 | terms_sorted_by_label = terms_df.sort_values(by='label')
219 | 
220 | # This makes sort case insensitive
221 | terms_sorted_by_localname = terms_df.iloc[terms_df.term_localName.str.lower().argsort()]
222 | 
223 | print('done retrieving')
224 | print()
225 | 
226 | print('Generating term index by CURIE')
227 | # generate the index of terms grouped by category and sorted alphabetically by lowercase term local name
228 | 
229 | text = '### 3.1 Index By Term Name\n\n'
230 | text += '(See also [3.2 Index By Label](#32-index-by-label))\n\n'
231 | for category in range(0,len(display_order)):
232 |     text += '**' + display_label[category] + '**\n'
233 |     text += '\n'
234 |     if organized_in_categories:
235 |         filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
236 |         filtered_table.reset_index(drop=True, inplace=True)
237 |     else:
238 |         filtered_table = terms_sorted_by_localname
239 |         filtered_table.reset_index(drop=True, inplace=True)
240 |         
241 |     for row_index,row in filtered_table.iterrows():
242 |         curie = row['pref_ns_prefix'] + ":" + row['term_localName']
243 |         curie_anchor = curie.replace(':','_')
244 |         text += '[' + curie + '](#' + curie_anchor + ')'
245 |         if row_index < len(filtered_table) - 1:
246 |             text += ' |'
247 |         text += '\n'
248 |     text += '\n'
249 | index_by_name = text
250 | 
251 | text = '\n\n'
252 | 
253 | text = '## 3 Term Index \n\n'
254 | #text += '(See also [3.1 Index By Term Name](#31-index-by-term-name))\n\n'
255 | for category in range(0,len(display_order)):
256 |     if organized_in_categories:
257 |         text += '**' + display_label[category] + '**\n'
258 |         text += '\n'
259 |         filtered_table = terms_sorted_by_label[terms_sorted_by_label['tdwgutility_organizedInClass']==display_order[category]]
260 |         filtered_table.reset_index(drop=True, inplace=True)
261 |     else:
262 |         filtered_table = terms_sorted_by_label
263 |         filtered_table.reset_index(drop=True, inplace=True)
264 |         
265 |     for row_index,row in filtered_table.iterrows():
266 |         if row_index == 0 or (row_index != 0 and row['label'] != filtered_table.iloc[row_index - 1].loc['label']): # this is a hack to prevent duplicate labels
267 |             curie_anchor = row['pref_ns_prefix'] + "_" + row['term_localName']
268 |             text += '[' + row['label'] + '](#' + curie_anchor + ')'
269 |             if row_index < len(filtered_table) - 2 or (row_index == len(filtered_table) - 2 and row['label'] != filtered_table.iloc[row_index + 1].loc['label']):
270 |                 text += ' |'
271 |             text += '\n'
272 |     text += '\n'
273 | index_by_label = text
274 | 
275 | decisions_df = pd.read_csv('https://raw.githubusercontent.com/tdwg/rs.tdwg.org/master/decisions/decisions-links.csv', na_filter=False)
276 | 
277 | # generate a table for each term, with terms grouped by category
278 | 
279 | # generate the Markdown for the terms table
280 | text = '## 4 Vocabulary\n'
281 | for category in range(0,len(display_order)):
282 |     if organized_in_categories:
283 |         text += '### 4.' + str(category + 1) + ' ' + display_label[category] + '\n'
284 |         text += '\n'
285 |         text += display_comments[category] # insert the comments for the category, if any.
286 |         filtered_table = terms_sorted_by_localname[terms_sorted_by_localname['tdwgutility_organizedInClass']==display_order[category]]
287 |         filtered_table.reset_index(drop=True, inplace=True)
288 |     else:
289 |         filtered_table = terms_sorted_by_localname
290 |         filtered_table.reset_index(drop=True, inplace=True)
291 | 
292 |     for row_index,row in filtered_table.iterrows():
293 |         text += '<table>\n'
294 |         curie = row['pref_ns_prefix'] + ":" + row['term_localName']
295 |         curieAnchor = curie.replace(':','_')
296 |         text += '\t<thead>\n'
297 |         text += '\t\t<tr>\n'
298 |         text += '\t\t\t<th colspan="2"><a id="' + curieAnchor + '"></a>Term Name  ' + curie + '</th>\n'
299 |         text += '\t\t</tr>\n'
300 |         text += '\t</thead>\n'
301 |         text += '\t<tbody>\n'
302 |         text += '\t\t<tr>\n'
303 |         text += '\t\t\t<td>Term IRI</td>\n'
304 |         uri = row['pref_ns_uri'] + row['term_localName']
305 |         text += '\t\t\t<td><a href="' + uri + '">' + uri + '</a></td>\n'
306 |         text += '\t\t</tr>\n'
307 |         text += '\t\t<tr>\n'
308 |         text += '\t\t\t<td>Modified</td>\n'
309 |         text += '\t\t\t<td>' + row['term_modified'] + '</td>\n'
310 |         text += '\t\t</tr>\n'
311 | 
312 |         if row['version_iri'] != '':
313 |             text += '\t\t<tr>\n'
314 |             text += '\t\t\t<td>Term version IRI</td>\n'
315 |             text += '\t\t\t<td><a href="' + row['version_iri'] + '">' + row['version_iri'] + '</a></td>\n'
316 |             text += '\t\t</tr>\n'
317 | 
318 |         text += '\t\t<tr>\n'
319 |         text += '\t\t\t<td>Label</td>\n'
320 |         text += '\t\t\t<td>' + row['label'] + '</td>\n'
321 |         text += '\t\t</tr>\n'
322 | 
323 |         if row['term_deprecated'] != '':
324 |             text += '\t\t<tr>\n'
325 |             text += '\t\t\t<td></td>\n'
326 |             text += '\t\t\t<td><strong>This term is deprecated and should no longer be used.</strong></td>\n'
327 |             text += '\t\t</tr>\n'
328 | 
329 |         text += '\t\t<tr>\n'
330 |         text += '\t\t\t<td>Definition</td>\n'
331 |         text += '\t\t\t<td>' + row['definition'] + '</td>\n'
332 |         text += '\t\t</tr>\n'
333 | 
334 |         if row['usage'] != '':
335 |             text += '\t\t<tr>\n'
336 |             text += '\t\t\t<td>Usage</td>\n'
337 |             text += '\t\t\t<td>' + convert_link(convert_code(row['usage'])) + '</td>\n'
338 |             text += '\t\t</tr>\n'
339 | 
340 |         if row['notes'] != '':
341 |             text += '\t\t<tr>\n'
342 |             text += '\t\t\t<td>Notes</td>\n'
343 |             text += '\t\t\t<td>' + convert_link(convert_code(row['notes'])) + '</td>\n'
344 |             text += '\t\t</tr>\n'
345 | 
346 |         if (vocab_type == 2 or vocab_type == 3) and row['controlled_value_string'] != '': # controlled vocabulary
347 |             text += '\t\t<tr>\n'
348 |             text += '\t\t\t<td>Controlled value</td>\n'
349 |             text += '\t\t\t<td>' + row['controlled_value_string'] + '</td>\n'
350 |             text += '\t\t</tr>\n'
351 | 
352 |         if vocab_type == 3 and row['skos_broader'] != '': # controlled vocabulary with skos:broader relationships
353 |             text += '\t\t<tr>\n'
354 |             text += '\t\t\t<td>Has broader concept</td>\n'
355 |             curieAnchor = row['skos_broader'].replace(':','_')
356 |             text += '\t\t\t<td><a href="#' + curieAnchor + '">' + row['skos_broader'] + '</a></td>\n'
357 |             text += '\t\t</tr>\n'
358 | 
359 |         text += '\t\t<tr>\n'
360 |         text += '\t\t\t<td>Type</td>\n'
361 |         if row['type'] == 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property':
362 |             text += '\t\t\t<td>Property</td>\n'
363 |         elif row['type'] == 'http://www.w3.org/2000/01/rdf-schema#Class':
364 |             text += '\t\t\t<td>Class</td>\n'
365 |         elif row['type'] == 'http://www.w3.org/2004/02/skos/core#Concept':
366 |             text += '\t\t\t<td>Concept</td>\n'
367 |         else:
368 |             text += '\t\t\t<td>' + row['type'] + '</td>\n' # this should rarely happen
369 |         text += '\t\t</tr>\n'
370 | 
371 |         # Look up decisions related to this term
372 |         for drow_index,drow in decisions_df.iterrows():
373 |             if drow['linked_affected_resource'] == uri:
374 |                 text += '\t\t<tr>\n'
375 |                 text += '\t\t\t<td>Executive Committee decision</td>\n'
376 |                 text += '\t\t\t<td><a href="http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '">http://rs.tdwg.org/decisions/' + drow['decision_localName'] + '</a></td>\n'
377 |                 text += '\t\t</tr>\n'                        
378 | 
379 |         text += '\t</tbody>\n'
380 |         text += '</table>\n'
381 |         text += '\n'
382 |     text += '\n'
383 | term_table = text
384 | 
385 | print('done generating')
386 | print()
387 | 
388 | #print(term_table)
389 | 
390 | print('Merging term table with header and footer and saving file')
391 | text = index_by_label + term_table
392 | 
393 | # read in header and footer, merge with terms table, and output
394 | 
395 | headerObject = open(headerFileName, 'rt', encoding='utf-8')
396 | header = headerObject.read()
397 | headerObject.close()
398 | 
399 | # Build the Markdown for the contributors list
400 | contributors = ''
401 | for contributor in contributors_yaml:
402 |     contributors += '[' + contributor['contributor_literal'] + '](' + contributor['contributor_iri'] + ') '
403 |     contributors += '([' + contributor['affiliation'] + '](' + contributor['affiliation_uri'] + ')), '
404 | contributors = contributors[:-2] # Remove the last comma and space
405 | 
406 | # Substitute values of ratification_date and contributors into the header template
407 | header = header.replace('{document_title}', document_configuration_yaml['documentTitle'])
408 | header = header.replace('{ratification_date}', document_configuration_yaml['doc_modified'])
409 | header = header.replace('{created_date}', document_configuration_yaml['doc_created'])
410 | header = header.replace('{contributors}', contributors)
411 | header = header.replace('{standard_iri}', document_configuration_yaml['dcterms_isPartOf'])
412 | header = header.replace('{current_iri}', document_configuration_yaml['current_iri'])
413 | header = header.replace('{abstract}', document_configuration_yaml['abstract'])
414 | header = header.replace('{creator}', document_configuration_yaml['creator'])
415 | header = header.replace('{publisher}', document_configuration_yaml['publisher'])
416 | year = document_configuration_yaml['doc_modified'].split('-')[0]
417 | header = header.replace('{year}', year)
418 | if has_namespace:
419 |     header = header.replace('{namespace_uri}', namespace_uri)
420 |     header = header.replace('{pref_namespace_prefix}', pref_namespace_prefix)
421 | 
422 | # Determine whether there was a previous version of the document.
423 | if document_configuration_yaml['doc_created'] != document_configuration_yaml['doc_modified']:
424 |     # Load versions list from document versions data in the rs.tdwg.org repo and find most recent version.
425 |     versions_data_url = githubBaseUri + 'docs/docs-versions.csv'
426 |     versions_list_df = pd.read_csv(versions_data_url, na_filter=False)
427 |     # Slice all rows for versions of this document.
428 |     matching_versions = versions_list_df[versions_list_df['current_iri']==document_configuration_yaml['current_iri']]
429 |     # Sort the matching versions by version IRI in descending order so that the most recent version is first.
430 |     matching_versions = matching_versions.sort_values(by=['version_iri'], ascending=[False])
431 |     # The previous version is the second row in the dataframe (row 1).
432 |     # The version IRI is in the second column (column 1).
433 |     most_recent_version_iri = matching_versions.iat[1, 1]
434 |     #print(most_recent_version_iri)
435 | 
436 |     # Insert the previous version information into the header
437 |     previous_version_metadata_string = '''Previous version
438 | : <''' + most_recent_version_iri + '''>
439 | 
440 | '''
441 |     # Insert the previous version information into the designated slot.
442 |     header = header.replace('{previous_version_slot}\n\n', previous_version_metadata_string)
443 | else:
444 |     # If there was no previous version, remove the slot from the header.
445 |     header = header.replace('{previous_version_slot}\n\n', '')
446 | 
447 | 
448 | footerObject = open(footerFileName, 'rt', encoding='utf-8')
449 | footer = footerObject.read()
450 | footerObject.close()
451 | 
452 | output = header + text + footer
453 | outputObject = open(outFileName, 'wt', encoding='utf-8')
454 | outputObject.write(output)
455 | outputObject.close()
456 |     
457 | print('done')
458 | 
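
The script above reads `--branch` by pairing options and positional values by index, which works while there is a single option. As an alternative sketch (not what the script currently does), the same behavior can be expressed with the standard `argparse` module; the function names `parse_branch` and `github_base_uri` are hypothetical.

```python
import argparse


def parse_branch(argv):
    """Return the rs.tdwg.org branch to read from, defaulting to 'master'."""
    parser = argparse.ArgumentParser(
        description="Build the term list page from rs.tdwg.org metadata.")
    parser.add_argument("--branch", default="master",
                        help='"master" for production, another branch for development')
    return parser.parse_args(argv).branch


def github_base_uri(branch):
    """Base URL for raw files on the given branch, as built in the configuration section."""
    return "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/" + branch + "/"


# Passing an explicit list keeps the example independent of the real command line.
default_uri = github_base_uri(parse_branch([]))
eco_uri = github_base_uri(parse_branch(["--branch", "eco"]))
```

In the script itself, the call would be `parse_branch(sys.argv[1:])`. Unlike the index-based approach, `argparse` rejects unknown options and a missing branch value with a usage message.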


--------------------------------------------------------------------------------
/build/termlist-footer.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/build/termlist-footer.md


--------------------------------------------------------------------------------
/build/termlist-header.md:
--------------------------------------------------------------------------------
 1 | # {document_title}
 2 | 
 3 | Title
 4 | : {document_title}
 5 | 
 6 | Date version issued
 7 | : {ratification_date}
 8 | 
 9 | Date created
10 | : {created_date}
11 | 
12 | Part of TDWG Standard
13 | : <{standard_iri}>
14 | 
15 | This version
16 | : <{current_iri}{ratification_date}>
17 | 
18 | Latest version
19 | : <{current_iri}>
20 | 
21 | {previous_version_slot}
22 | 
23 | Abstract
24 | : {abstract}
25 | 
26 | Contributors
27 | : {contributors}
28 | 
29 | Creator
30 | : {creator}
31 | 
32 | Bibliographic citation
33 | : {creator}. {year}. {document_title}. {publisher}. <{current_iri}{ratification_date}>
34 | 
35 | ## 1 Introduction
36 | 
37 | This document contains all former and current terms in the {ratification_date} version of the Humboldt Extension for Ecological Inventories vocabulary (<http://rs.tdwg.org/version/eco/{ratification_date}>). The vocabulary uses the namespace abbreviation `eco:` for `http://rs.tdwg.org/eco/terms/` and `ecoiri:` for `http://rs.tdwg.org/eco/iri/`. 
38 | 
39 | For a simplified list that contains only the currently recommended terms, see the Humboldt Extension Quick Reference Guide (<https://eco.tdwg.org/terms/>).
40 | 
41 | ### 1.1 Status of the content of this document
42 | 
43 | In Section 4, the values of `Term IRI` and `Definition` are normative. The values of `Term Name` are non-normative, although one can expect that the namespace abbreviation prefix is one commonly used for the term namespace. `Label` and the values of all other properties (such as `Notes` and `Examples`) are non-normative.
44 | 
45 | ### 1.2 RFC 2119 key words
46 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://www.rfc-editor.org/info/bcp14) [\[RFC 2119\]](https://datatracker.ietf.org/doc/html/rfc2119) and [\[RFC 8174\]](https://datatracker.ietf.org/doc/html/rfc8174) when, and only when, they appear in all capitals, as shown here.
47 | 
48 | ## 2 Use of Terms
49 | 
50 | The terms in this extension are meant to provide stable definitions that can be used in a variety of biodiversity inventory contexts but were envisioned principally to function together as an extension to Darwin Core. This vocabulary allows the reporting of detailed information about the inventory process such as i\) a general description of the survey, ii\) where an inventory takes place and the habitat characteristics and environmental conditions of survey sites, iii\) when an inventory takes place, iv\) the target taxonomic group, life stages, growth forms, and degrees of establishment of the organisms sampled, v\) the methodology implemented (inventory type performed, protocol(s) used, absence reported, material samples or vouchers collected, non-target taxa reported), and vi\) the completeness of the inventory and the sampling effort applied. 
51 | 
52 | This extension allows the representation of complex, highly nested survey designs. An ancillary document explaining how dwc:Event hierarchies for ecological inventories should be structured and providing guidance on the use of the terms in the context of parent and child dwc:Event(s) can be found at [http://rs.tdwg.org/dwc/doc/hierarchy/](https://tdwg.github.io/hc/hierarchy/). 
53 | 
54 | To assist in the interpretation of the term eco:isLeastSpecificTargetCategoryQuantityInclusive, a detailed description of its use is provided at [http://rs.tdwg.org/dwc/doc/inclusive/](https://tdwg.github.io/hc/inclusive/).
55 | 
56 | Terms that are expected to have Booleans as values should use controlled value strings from the TDWG Boolean Controlled Vocabulary at [http://rs.tdwg.org/tag/doc/boolean/](https://tag.tdwg.org/boolean/) when those values are serialized in text form. See also the [Best practices for serializing booleans](https://tag.tdwg.org/guides/boolean/) and the [Boolean Values Best Practices Reference](https://tag.tdwg.org/reference/boolean/).
57 | 
58 | ## 3 Term index
59 | 
60 | 
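
The placeholders in this template, such as `{document_title}` and `{previous_version_slot}`, are filled by plain string replacement in the build scripts. The sketch below shows that substitution in a compact form, assuming placeholder names map one-to-one to dictionary keys; the function name `fill_header` and the example IRIs are hypothetical.

```python
def fill_header(template, values, previous_version_iri=None):
    """Replace {placeholder} slots in a header template with supplied values.

    If previous_version_iri is given, a "Previous version" entry is written into
    the {previous_version_slot}; otherwise the slot and its blank line are removed.
    """
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    if previous_version_iri:
        slot = "Previous version\n: <" + previous_version_iri + ">\n\n"
    else:
        slot = ""
    return template.replace("{previous_version_slot}\n\n", slot)


# A shortened template fragment and made-up values, for illustration only.
template = ("Latest version\n: <{current_iri}>\n\n"
            "{previous_version_slot}\n\n"
            "Abstract\n: {abstract}\n")
values = {"current_iri": "http://example.org/doc/", "abstract": "Example."}

first_release = fill_header(template, values)
later_release = fill_header(template, values, "http://example.org/doc/2020-01-01")
```

Replacing the slot together with the two newlines that follow it keeps the spacing between definition-list entries the same whether or not a previous version exists.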


--------------------------------------------------------------------------------
/build/termlist-header_filled.md:
--------------------------------------------------------------------------------
 1 | # Humboldt Extension Vocabulary List of Terms
 2 | 
 3 | Title
 4 | : Humboldt Extension Vocabulary List of Terms
 5 | 
 6 | Namespace IRI:
 7 | : http://rs.tdwg.org/eco/terms/
 8 | 
 9 | Preferred namespace abbreviation
10 | : eco:
11 | 
12 | Date version issued
13 | : 2023-xx-xx
14 | 
15 | Date created
16 | : 2023-xx-xx
17 | 
18 | Part of TDWG Standard
19 | : <http://www.tdwg.org/standards/450>
20 | 
21 | This version
22 | : <http://rs.tdwg.org/dwc/doc/eco/2023-xx-xx>
23 | 
24 | Latest version
25 | : <http://rs.tdwg.org/dwc/doc/eco/>
26 | 
27 | Abstract
28 | : The Humboldt Extension for Ecological Inventories is a vocabulary for transmitting information about biological inventories. It is used along with Darwin Core terms to extend descriptions of events. This document lists all terms currently used in the vocabulary.
29 | 
30 | Contributors
31 | : fill in
32 | 
33 | Creator
34 | : TDWG Humboldt Extension Task Group
35 | 
36 | Bibliographic citation
37 | : TDWG Humboldt Extension Task Group. 2023. Humboldt Extension Vocabulary List of Terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/eco/2023-xx-xx>
38 | 
39 | ## 1 Introduction
40 | 
41 | This document contains all versions of terms in the Humboldt Extension for Ecological Inventories vocabulary (<http://rs.tdwg.org/version/eco/2023-xx-xx>). The vocabulary uses the namespace abbreviation `eco:`. 
42 | 
43 | For a simplified list that contains only the currently recommended terms, see the Humboldt Extension Quick Reference Guide (<https://tdwg.github.io/eco/terms/>).
44 | 
45 | ### 1.1 Status of the content of this document
46 | 
47 | In Section 4, the values of `Term IRI` and `Definition` are normative. The values of `Term Name` are non-normative, although one can expect that the namespace abbreviation prefix is one commonly used for the term namespace. `Label` and the values of all other properties (such as `Notes` and `Examples`) are non-normative.
48 | 
49 | ### 1.2 RFC 2119 key words
50 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://www.rfc-editor.org/info/bcp14) [\[RFC 2119\]](https://datatracker.ietf.org/doc/html/rfc2119) and [\[RFC 8174\]](https://datatracker.ietf.org/doc/html/rfc8174) when, and only when, they appear in all capitals, as shown here.
51 | 
52 | ## 2 Use of Terms
53 | 
54 | The terms in this standard are meant to provide stable definitions that can be used in a variety of contexts, but were envisioned principally to function together as an extension to Darwin Core, where each core record may be annotated by ... .
55 | 
56 | ## 3 Term index
57 | 
58 | 


--------------------------------------------------------------------------------
/build/terms.tmpl:
--------------------------------------------------------------------------------
 1 | {#
 2 |     This template is NOT used by jekyll, but by the build script
 3 |     to create the terms/index.md file, which mostly contains html.
 4 | #}
 5 | ---
 6 | container: fluid
 7 | ---
 8 | 
 9 | # Humboldt Extension quick reference guide
10 | 
11 | This document is intended to be an easy-to-read reference to the currently recommended terms that extend the [Darwin Core standard](https://www.tdwg.org/standards/dwc/) with vocabulary to describe biological inventories. This document is not part of the standard. It draws on the [term names and definitions](../list/) from the normative part of the standard and combines them with comments and examples that are not normative, but that are meant to help people to use the terms consistently. The category to which all of the terms in this extension correspond is the Darwin Core Event (dwc:Event) class. Comprehensive metadata for current and obsolete terms in human readable form are found in a [list of terms document](../list/). CSV files with the [full history](https://github.com/tdwg/hc/blob/master/vocabulary/term_versions.csv) of the terms, with [horizontal and vertical lists](https://github.com/tdwg/hc/tree/master/dist) of these terms and the schema for the [Darwin Core Archive extension](https://github.com/tdwg/hc/tree/master/dist) can be found in the [Humboldt Extension repository](https://github.com/tdwg/hc).
12 | 
13 | {% for class_group in class_groups %}
14 | 
15 | ## {{ class_group.label }}
16 | 
17 | {% if class_group.label == 'UseWithIRI' %}
18 | For more information on `UseWithIRI`, see [Section 2.5 of the RDF Guide](https://dwc.tdwg.org/rdf/#25-terms-in-the-dwciri-namespace-normative).
19 | {% endif %}
20 | <div class="my-4">
21 |     {% for term in class_group.terms %}
22 |     <a class="btn btn-sm btn-outline-secondary m-1" href="#{{ term.namespace }}:{{ term.label }}">{{ term.label }}</a>
23 |     {% endfor %}
24 | </div>
25 | 
26 | {% if class_group.label != 'Site' and class_group.label != 'Habitat Scope' and class_group.label != 'Temporal Scope' and class_group.label != 'Taxonomic Scope' and class_group.label != 'Organismal Scope' and class_group.label != 'Identification' and class_group.label != 'Methodology Description' and class_group.label != 'Material Collected' and class_group.label != 'Sampling Effort' and class_group.label != 'UseWithIRI' %}
27 | {# Class #}
28 | <table class="table table-sm table-bordered">
29 |     <tbody>
30 |         <tr class="table-primary"><th colspan="2">{{ class_group.label }} <span class="badge badge-primary float-right">Class</span></th></tr>
31 |         <tr><td class="theme-label">Identifier</td><td><a href="{{ class_group.iri }}">{{ class_group.iri }}</a></td></tr>
32 |         <tr><td class="theme-label">Definition</td><td>{{ class_group.definition }}</td></tr>
33 |         <tr><td class="theme-label">Comments</td><td>{{ class_group.comments }}</td></tr>
34 |         <tr><td class="theme-label">Examples</td><td>{{ class_group.examples }}</td></tr>
35 |     </tbody>
36 | </table>
37 | {% endif %}
38 | 
39 | {% for term in class_group.terms %}
40 | {# Term #}
41 | <p class="invisible">
42 |     <a id="{{ term.namespace }}:{{ term.label }}"></a>{% if term.namespace != "dwciri" %}<a id="{{ term.label }}"></a>{% endif %}
43 | </p>
44 | {% set examples = term.examples.split("; ") %}
45 | <table class="table table-sm table-bordered">
46 |     <tbody>
47 |         <tr class="table-secondary"><th colspan="2">{{ term.label }} <span class="badge badge-secondary float-right">Property</span></th></tr>
48 |         <tr><td class="theme-label">Identifier</td><td><a href="{{ term.iri }}">{{ term.iri }}</a></td></tr>
49 |         <tr><td class="theme-label">Definition</td><td>{{ term.definition }}</td></tr>
50 |         <tr><td class="theme-label">Comments</td><td>{{ term.comments }}</td></tr>
51 |         <tr><td class="theme-label">Examples</td><td>{% if examples | length == 1 %}{{ examples | first }}{% else %}<ul class="list-group list-group-flush">{% for example in examples %}<li class="list-group-item">{{ example }}</li>{% endfor %}</ul>{% endif %}</td></tr>
52 |     </tbody>
53 | </table>
54 | {% endfor %}
55 | 
56 | {% endfor %}
57 | 


--------------------------------------------------------------------------------
/build/update_previous_doc.py:
--------------------------------------------------------------------------------
  1 | # Script to make the current document be the previous document
  2 | # This program is released under a GNU General Public License v3.0 http://www.gnu.org/licenses/gpl-3.0
  3 | # Author: Steve Baskauf
  4 | 
  5 | script_version = '0.1.1'
  6 | version_modified = '2024-03-03'
  7 | 
  8 | # NOTE: This script should be run only after the script updating the machine-readable metadata has been run.
  9 | # It must be run before the script that generates the new document version.
 10 | 
 11 | import requests
 12 | import pandas as pd
 13 | import yaml
 14 | import os
 15 | import sys
 16 | 
 17 | # -----------------
 18 | # Command line arguments
 19 | # -----------------
 20 | 
 21 | arg_vals = sys.argv[1:]
 22 | opts = [opt for opt in arg_vals if opt.startswith('-')]
 23 | args = [arg for arg in arg_vals if not arg.startswith('-')]
 24 | 
 25 | # Name of the last part of the URL of the doc
 26 | if '--slug' in opts:
 27 |     document_slug = args[opts.index('--slug')]
 28 | else:
 29 |     print('Must specify URL slug for document using --slug option')
 30 |     print('For example, if the permanent URL is "http://rs.tdwg.org/dwc/doc/eco/", the slug is "eco".')
 31 |     sys.exit(1)
 32 | 
 33 | # Used as the directory name
 34 | if '--dir' in opts:
 35 |     directory_name = args[opts.index('--dir')]
 36 | else:
 37 |     print('Must specify name of directory containing template and configs using --dir option')
 38 |     print('For example, if the path to the templates in the rs.tdwg.org repo')
 39 |     print('is "process/document_metadata_processing/dwc_doc_eco/", the directory name is "dwc_doc_eco".')
 40 |     sys.exit(1)
 41 | 
 42 | # "master" for production, something else for development
 43 | # Example: First part of branch URL is "https://raw.githubusercontent.com/tdwg/rs.tdwg.org/eco/", branch is "eco".
 44 | if '--branch' in opts:
 45 |     github_branch = args[opts.index('--branch')]
 46 | else:
 47 |     github_branch = 'master'
 48 | 
 49 | 
 50 | # -----------------
 51 | # Configuration section
 52 | # -----------------
 53 | 
 54 | githubBaseUri = 'https://raw.githubusercontent.com/tdwg/rs.tdwg.org/' + github_branch + '/'
 55 | 
 56 | config_file_path = 'process/document_metadata_processing/' + directory_name + '/'
 57 | document_configuration_yaml_file = 'document_configuration.yaml'
 58 | 
 59 | path_of_doc_relative_to_build_dir = '../docs/' + document_slug + '/'
 60 | 
 61 | # Load the document configuration YAML file from its GitHub URL
 62 | document_configuration_yaml_url = githubBaseUri + config_file_path + document_configuration_yaml_file
 63 | document_configuration_yaml = requests.get(document_configuration_yaml_url).text
 64 | document_configuration_yaml = yaml.safe_load(document_configuration_yaml)
 65 | 
 66 | # Determine date of the document that is to be turned into the previous document and the version IRI
 67 | # of the most recent version of that document.
 68 | 
 69 | # Load versions list from document versions data in the rs.tdwg.org repo and find most recent version.
 70 | versions_data_url = githubBaseUri + 'docs-versions/docs-versions.csv'
 71 | versions_list_df = pd.read_csv(versions_data_url, na_filter=False)
 72 | 
 73 | # Slice all rows for versions of this document.
 74 | matching_versions = versions_list_df[versions_list_df['current_iri']==document_configuration_yaml['current_iri']]
 75 | # Sort the matching versions by version IRI in descending order so that the most recent version is first.
 76 | matching_versions = matching_versions.sort_values(by=['version_iri'], ascending=[False])
 77 | 
 78 | # Check for the error condition of there being no matching versions.
 79 | if len(matching_versions.index) == 0:
 80 |     print('There are no versions of this document. Did you run the script to update the document metadata?')
 81 |     sys.exit(1)
 82 | 
 83 | # If there is only one row in the matching_versions dataframe (only one version), then the rest of the script should not be run.
 84 | if len(matching_versions.index) == 1:
 85 |     print('There is only one version of this document. No changes are being made to the documents.')
 86 |     sys.exit()
 87 | 
 88 | # The most recent version is the first row in the dataframe (row 0). 
 89 | 
 90 | # Find the column index of the column named "version_iri".
 91 | version_iri_column_index = matching_versions.columns.get_loc('version_iri')
 92 | most_recent_version_iri = matching_versions.iat[0, version_iri_column_index]
 93 | print(most_recent_version_iri)
 94 | 
 95 | # Find the date of the previous version, which is in the second row of the dataframe (row 1). 
 96 | # Find the column index of the column named "version_issued".
 97 | version_issued_column_index = matching_versions.columns.get_loc('version_issued')
 98 | previous_version_date = matching_versions.iat[1, version_issued_column_index]
 99 | print(previous_version_date)
100 | 
101 | # The document to be converted is named "index.md". Its name must be changed to the date of the previous version.
102 | os.rename(path_of_doc_relative_to_build_dir + 'index.md', path_of_doc_relative_to_build_dir + previous_version_date + '.md')
103 | 
104 | # Open the renamed file and read its text.
105 | with open(path_of_doc_relative_to_build_dir + previous_version_date + '.md', 'rt') as file_object:
106 |     file_text = file_object.read()
107 | 
108 | # Build the "Replaced by" metadata entry that points to the most recent version
109 | replacement_version_metadata_string = '''Replaced by
110 | : <''' + most_recent_version_iri + '''>
111 | 
112 | '''
113 | 
114 | # Insert the "Replaced by" metadata into the header, directly above the Abstract section (first occurrence only).
115 | updated_text = file_text.replace('Abstract\n:', replacement_version_metadata_string + 'Abstract\n:', 1)
116 | 
117 | # Write the updated file text to the file.
118 | with open(path_of_doc_relative_to_build_dir + previous_version_date + '.md', 'wt') as file_object:
119 |     file_object.write(updated_text)
120 | 


--------------------------------------------------------------------------------
/dist/simple_eco_horizontal.csv:
--------------------------------------------------------------------------------
1 | siteCount,siteNestingDescription,verbatimSiteDescriptions,verbatimSiteNames,geospatialScopeAreaValue,geospatialScopeAreaUnit,totalAreaSampledValue,totalAreaSampledUnit,reportedWeather,reportedExtremeConditions,targetHabitatScope,excludedHabitatScope,eventDurationValue,eventDurationUnit,targetTaxonomicScope,excludedTaxonomicScope,taxonCompletenessReported,taxonCompletenessProtocols,isTaxonomicScopeFullyReported,isAbsenceReported,absentTaxa,hasNonTargetTaxa,nonTargetTaxa,areNonTargetTaxaFullyReported,targetLifeStageScope,excludedLifeStageScope,isLifeStageScopeFullyReported,targetDegreeOfEstablishmentScope,excludedDegreeOfEstablishmentScope,isDegreeOfEstablishmentScopeFullyReported,targetGrowthFormScope,excludedGrowthFormScope,isGrowthFormScopeFullyReported,hasNonTargetOrganisms,verbatimTargetScope,compilationTypes,compilationSourceTypes,inventoryTypes,protocolNames,protocolDescriptions,protocolReferences,isAbundanceReported,isAbundanceCapReported,abundanceCap,isVegetationCoverReported,isLeastSpecificTargetCategoryQuantityInclusive,hasVouchers,voucherInstitutions,hasMaterialSamples,materialSampleTypes,samplingPerformedBy,isSamplingEffortReported,samplingEffortProtocol,samplingEffortValue,samplingEffortUnit,absentTaxa,compilationSourceTypes,compilationTypes,eventDurationUnit,excludedDegreeOfEstablishmentScope,excludedGrowthFormScope,excludedHabitatScope,excludedLifeStageScope,excludedTaxonomicScope,geospatialScopeAreaUnit,inventoryTypes,materialSampleTypes,nonTargetTaxa,protocolNames,samplingEffortProtocol,samplingEffortUnit,samplingPerformedBy,targetDegreeOfEstablishmentScope,targetGrowthFormScope,targetHabitatScope,targetLifeStageScope,targetTaxonomicScope,taxonCompletenessProtocols
2 | 


--------------------------------------------------------------------------------
/dist/simple_eco_vertical.csv:
--------------------------------------------------------------------------------
 1 | siteCount
 2 | siteNestingDescription
 3 | verbatimSiteDescriptions
 4 | verbatimSiteNames
 5 | geospatialScopeAreaValue
 6 | geospatialScopeAreaUnit
 7 | totalAreaSampledValue
 8 | totalAreaSampledUnit
 9 | reportedWeather
10 | reportedExtremeConditions
11 | targetHabitatScope
12 | excludedHabitatScope
13 | eventDurationValue
14 | eventDurationUnit
15 | targetTaxonomicScope
16 | excludedTaxonomicScope
17 | taxonCompletenessReported
18 | taxonCompletenessProtocols
19 | isTaxonomicScopeFullyReported
20 | isAbsenceReported
21 | absentTaxa
22 | hasNonTargetTaxa
23 | nonTargetTaxa
24 | areNonTargetTaxaFullyReported
25 | targetLifeStageScope
26 | excludedLifeStageScope
27 | isLifeStageScopeFullyReported
28 | targetDegreeOfEstablishmentScope
29 | excludedDegreeOfEstablishmentScope
30 | isDegreeOfEstablishmentScopeFullyReported
31 | targetGrowthFormScope
32 | excludedGrowthFormScope
33 | isGrowthFormScopeFullyReported
34 | hasNonTargetOrganisms
35 | verbatimTargetScope
36 | compilationTypes
37 | compilationSourceTypes
38 | inventoryTypes
39 | protocolNames
40 | protocolDescriptions
41 | protocolReferences
42 | isAbundanceReported
43 | isAbundanceCapReported
44 | abundanceCap
45 | isVegetationCoverReported
46 | isLeastSpecificTargetCategoryQuantityInclusive
47 | hasVouchers
48 | voucherInstitutions
49 | hasMaterialSamples
50 | materialSampleTypes
51 | samplingPerformedBy
52 | isSamplingEffortReported
53 | samplingEffortProtocol
54 | samplingEffortValue
55 | samplingEffortUnit
56 | absentTaxa
57 | compilationSourceTypes
58 | compilationTypes
59 | eventDurationUnit
60 | excludedDegreeOfEstablishmentScope
61 | excludedGrowthFormScope
62 | excludedHabitatScope
63 | excludedLifeStageScope
64 | excludedTaxonomicScope
65 | geospatialScopeAreaUnit
66 | inventoryTypes
67 | materialSampleTypes
68 | nonTargetTaxa
69 | protocolNames
70 | samplingEffortProtocol
71 | samplingEffortUnit
72 | samplingPerformedBy
73 | targetDegreeOfEstablishmentScope
74 | targetGrowthFormScope
75 | targetHabitatScope
76 | targetLifeStageScope
77 | targetTaxonomicScope
78 | taxonCompletenessProtocols
79 | 


--------------------------------------------------------------------------------
/docs/CNAME:
--------------------------------------------------------------------------------
1 | eco.tdwg.org


--------------------------------------------------------------------------------
/docs/_config.yml:
--------------------------------------------------------------------------------
 1 | # SITE SETTINGS
 2 | title: Humboldt Extension for Ecological Inventories
 3 | description: Vocabulary maintained by the Darwin Core Maintenance Interest Group to facilitate the sharing of information about biological inventories.
 4 | url: "https://tdwg.github.io/hc/"
 5 | 
 6 | # THEME SETTINGS
 7 | theme: minima
 8 | remote_theme: tdwg/petridish
 9 | github_edit: false
10 | logo: /assets/theme/images/tdwg-logo-short.svg
11 | 
12 | # BUILD SETTINGS
13 | markdown: kramdown
14 | plugins:
15 |   - jekyll-feed
16 |   - jekyll-sitemap
17 | exclude:
18 |   - README.md
19 |   - Gemfile
20 |   - Gemfile.lock
21 |   - LICENSE
22 | 
23 | # FRONTMATTER DEFAULTS
24 | defaults:
25 |   - scope:
26 |       path: ""
27 |     values:
28 |       layout: default
29 |       toc: true
30 | 


--------------------------------------------------------------------------------
/docs/_data/footer.yml:
--------------------------------------------------------------------------------
 1 | # Footer content is organized in columns, with the first one reserved for social icons (defined in _config.yml).
 2 | # You can also add a small print license statement at the bottom.
 3 | 
 4 | # Columns (the more you add, the narrower they will be)
 5 | columns:
 6 |   - links:
 7 |     - text: Biodiversity Information Standards (TDWG)
 8 |       href: https://www.tdwg.org/
 9 | 
10 | # Small print license statement to add at the bottom of the footer. Can be Markdown
11 | # Will be prefixed by "© {{ site.author }}" if defined in _config.yml
12 | license: >
13 |   Content on this site, made open by [Biodiversity Information Standards (TDWG)](https://www.tdwg.org/) 
14 |   is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
15 | 
16 | 


--------------------------------------------------------------------------------
/docs/_data/navigation.yml:
--------------------------------------------------------------------------------
 1 | # Links listed below will be included in your site's navbar (navigation at the top)
 2 | 
 3 | - text: Home
 4 |   href: /
 5 | - text: Terms
 6 |   menu:
 7 |   - text: Darwin Core
 8 |     href: https://dwc.tdwg.org/list/
 9 |   - text: "---"
10 |   - text: Humboldt Extension
11 |     href: list/
12 |   - text: "---"
13 |   - text: Taxon Completeness Reported Controlled Vocabulary
14 |     href: tcr/
15 | - text: Guides
16 |   menu:
17 |   - text: Darwin Core Quick Reference
18 |     href: https://dwc.tdwg.org/terms/
19 |   - text: "---"
20 |   - text: Humboldt Extension Quick Reference
21 |     href: terms/
22 |   - text: "---"
23 |   - text: isLeastSpecificTargetCategoryQuantityInclusive Guidelines
24 |     href: inclusive/
25 |   - text: Hierarchical Events Guidelines
26 |     href: hierarchy/
27 |   - text: "---"
28 |   - text: Questions & Answers
29 |     href: https://github.com/tdwg/dwc-qa/blob/master/README.md
30 | - text: GitHub
31 |   href: https://github.com/tdwg/hc
32 | 


--------------------------------------------------------------------------------
/docs/_sass/_custom.scss:
--------------------------------------------------------------------------------
 1 | // Custom styling
 2 | 
 3 | .content {
 4 |   table {
 5 |     td:first-of-type {
 6 |       width: 120px; // Label column, long words will still push this wider
 7 |     }
 8 | 
 9 |     .list-group-item {
10 |       padding: 0.5rem 0; // Examples
11 |     }
12 |   }
13 | }
14 | 


--------------------------------------------------------------------------------
/docs/hierarchy/fig1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/docs/hierarchy/fig1.png


--------------------------------------------------------------------------------
/docs/hierarchy/fig2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/docs/hierarchy/fig2.png


--------------------------------------------------------------------------------
/docs/hierarchy/fig3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/docs/hierarchy/fig3.png


--------------------------------------------------------------------------------
/docs/hierarchy/index.md:
--------------------------------------------------------------------------------
  1 | # Properties of hierarchical events in the Humboldt Extension for Ecological Inventories
  2 | 
  3 | Title
  4 | : Properties of hierarchical events in the Humboldt Extension for Ecological Inventories
  5 | 
  6 | Date version issued
  7 | : 2024-02-28
  8 | 
  9 | Date created
 10 | : 2024-02-28
 11 | 
 12 | Part of TDWG Standard
 13 | : <http://www.tdwg.org/standards/450>
 14 | 
 15 | This version
 16 | : <http://rs.tdwg.org/dwc/doc/hierarchy/2024-02-28>
 17 | 
 18 | Latest version
 19 | : <http://rs.tdwg.org/dwc/doc/hierarchy/>
 20 | 
 21 | Abstract
 22 | : Ecological inventories in the context of Darwin Core can be considered as types of dwc:Events with the potential for hierarchical structure relating broader parent dwc:Events with narrower child dwc:Events. Terms in the Humboldt Extension are all properties of a dwc:Event. This document explains how dwc:Event hierarchies for ecological inventories should be structured and provides guidance on the use of Humboldt Extension terms in the context of parent and child dwc:Events.
 23 | 
 24 | Contributors
 25 | : [Yi-Ming Gan](https://orcid.org/0000-0001-7087-2646) ([Royal Belgian Institute of Natural Sciences](http://www.wikidata.org/entity/Q16665660)), [Wesley M. Hochachka](https://orcid.org/0000-0002-0595-7827) ([Cornell Lab of Ornithology](http://www.wikidata.org/entity/Q2997535)), [John Wieczorek](https://orcid.org/0000-0003-1144-0290) ([VertNet](http://www.wikidata.org/entity/Q98382028)), [Yanina V. Sica](https://orcid.org/0000-0002-1720-0127) ([Yale University](http://www.wikidata.org/entity/Q49112)), [Peter Brenton](https://orcid.org/0000-0001-9730-8340) ([Atlas of Living Australia, CSIRO](http://www.wikidata.org/entity/Q16335177)), [Robert D. Stevenson](https://orcid.org/0000-0003-1617-5895) ([Department of Biology, University of Massachusetts Boston](http://www.wikidata.org/entity/Q15144)), [Anahita J. N. Kazem](https://orcid.org/0000-0003-2475-132X) ([German Centre for Integrative Biodiversity Research, Leipzig and Friedrich Schiller University, Jena](http://www.wikidata.org/entity/Q1206134)), [Steven J. Baskauf](https://orcid.org/0000-0003-4365-3135) ([Vanderbilt University Libraries](http://www.wikidata.org/entity/Q16849893)), [Zachary R. Kachian](https://orcid.org/0000-0002-0500-0339) ([Keller Science Action Center, Field Museum of Natural History](http://www.wikidata.org/entity/Q1122595)), [Kate Ingenloff](https://orcid.org/0000-0001-5942-9053) ([Global Biodiversity Information Facility (GBIF)](http://www.wikidata.org/entity/Q1531570))
 26 | 
 27 | Creator
 28 | : TDWG Humboldt Extension Task Group
 29 | 
 30 | Bibliographic citation
 31 | : TDWG Humboldt Extension Task Group. 2024. Properties of hierarchical events in the Humboldt Extension for Ecological Inventories. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/hierarchy/2024-02-28>
 32 | 
 33 | <a id="introduction"></a>
 34 | ## 1 Introduction (non-normative)
 35 | 
 36 | ### 1.1 Status of the content of this document
 37 | 
 38 | Section 3 of this document is normative, serving as official guidelines
 39 | in application of the Humboldt Extension. The other sections are
 40 | non-normative and designed to help improve overall understanding in
 41 | application and interpretation of the Extension.
 42 | 
 43 | ### 1.2 RFC 2119 keywords
 45 | 
 46 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
 47 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
 48 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
 49 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
 50 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
 51 | when, and only when, they are written in capitals (as shown here).
 52 | 
 53 | ### 1.3 Namespaces and terminology
 54 | 
 55 | The namespace `eco:` abbreviates terms minted for the Humboldt Extension
 56 | for ecological inventories
 57 | ([http://rs.tdwg.org/eco/terms/](http://rs.tdwg.org/eco/terms/)).
 58 | `dwc:` abbreviates terms from the main Darwin Core vocabulary namespace
 59 | ([http://rs.tdwg.org/dwc/terms/](http://rs.tdwg.org/dwc/terms/)).
 60 | 
 61 | Words in `code markup` are term IRIs or literal values. The word
 62 | "organism" is used colloquially and is not used in the technical sense
 63 | of the dwc:Organism class, unless specifically presented as
 64 | "dwc:Organism." The word "Event" is used in the technical sense of the
 65 | dwc:Event class. "Humboldt Extension" is an abbreviation for the
 66 | "Humboldt Extension for Ecological Inventories."
 67 | 
 68 | ### 1.4 Intended audience and use for this document
 69 | 
 70 | The information in this document is targeted at data providers, data
 71 | aggregators, and data consumers. *Data providers* are the individuals
 72 | responsible for mapping ecological inventory data into an Event-based
 73 | [Darwin Core
 74 | Archive](https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide)
 75 | format that includes the Humboldt Extension. *Data aggregators* and
 76 | *data consumers* can use this document to better understand the data
 77 | shared by data providers, specifically with respect to the
 78 | **relationships between hierarchical dwc:Event levels** and **when it is
 79 | or is not appropriate to make inferences** about attributes such as
 80 | abundance or absence of detection.
 81 | 
 82 | <a id="rationale"></a>
 83 | ## 2 Rationale (non-normative)
 84 | 
 85 | Ecological inventories in the context of Darwin Core can be considered
 86 | as types of [dwc:Events](http://rs.tdwg.org/dwc/terms/Event)
 87 | --- they are actions that occur at specific locations over defined
 88 | periods of time. The terms in the Humboldt Extension are all properties
 89 | of a dwc:Event.
 90 | 
 91 | There are many types of ecological inventory, ranging from singular
 92 | observations of individual taxa (1 event:1 observation; Example 1 in
 93 | <a href="#fig1">Figure 1</a>) to highly structured and deeply nested observations within
 94 | other observations (e.g., 1 event:2 sub-events, each sub-event:2
 95 | sub-sub-events; Example 4 in <a href="#fig1">Figure 1</a>). The need for guidance on **how
 96 | to capture the details of nested observations** (dwc:Event hierarchies)
 97 | is the rationale for this document. Nested sampling designs can be
 98 | translated into a relational database schema of parent-child dwc:Event
 99 | relationships (a parent event with one or more child sub-events; <a href="#fig1">Figure
100 | 1</a>). This document describes the circumstances under which specific
101 | properties of parent and child dwc:Events SHOULD be populated based on
102 | the parent-child relationship.
103 | 
104 | Note that the proposed structure for sharing ecological inventories does
105 | not follow typical database practice. Whilst a (relational) database
106 | would store information in multiple tables to avoid repetition of key
107 | information, datasets shared using the Darwin Core Archive format and
108 | the Humboldt Extension instead use a "flattened" structure. In order to
109 | share inventory data such that no information is lost and no information
110 | is incorrectly inferred, one SHOULD **report all information at all
111 | applicable levels**. The rules for applicability and how to populate
112 | terms at parent and child levels in the dwc:Event hierarchy are captured
113 | in section *<a href="#guiding">3.2 Guiding principles</a>* and in section *<a href="#implementation">3.3 Implementation principles</a>*.
114 | 
115 | <a id="fig1"></a>
116 | ![Illustration of four examples of nested dwc:Events](fig1.png)
117 | 
118 | **Figure 1.** Visual representation of an ecological inventory
119 | illustrating four examples of occurrence data associated with dwc:Events
120 | nested within parent dwc:Events, at varying levels of complexity ranging
121 | from low (Example 1) to high (Example 4).
122 | 
123 | <a id="usage"></a>
124 | ## 3 Usage guidelines (normative)
125 | 
126 | ### 3.1 Definitions
127 | 
128 | **Inventory dataset** - An inventory (dataset) consists of one or more
129 | dwc:Events that MAY be related to each other in a hierarchy of parent
130 | and child dwc:Events. This capability is not new: Darwin Core already
131 | supports such hierarchies.
132 | 
133 | **Inventory hierarchy** - A set of related dwc:Events, in which a
134 | narrower dwc:Event (child) points to the related broader dwc:Event
135 | (parent) via the child's dwc:parentEventID. A higher-level dwc:Event
136 | generally contains information about the inventory design that applies
137 | to all of its children.
138 | 
139 | **Parent dwc:Event** - A parent dwc:Event is any dwc:Event whose
140 | dwc:eventID is a dwc:parentEventID for at least one other dwc:Event
141 | (e.g. EVENT_01 in Figure 2).
142 | 
143 | **Child dwc:Event** - A child dwc:Event is any dwc:Event whose
144 | dwc:parentEventID is populated with the dwc:eventID of another dwc:Event
145 | (e.g. EVENT_02 or EVENT_03 in Figure 2).
146 | 
147 | ![Visual representation of parent/child relationship](fig2.png)
148 | 
149 | **Figure 2.** Visual representation of an inventory hierarchy
150 | illustrating parent-child dwc:Event relations. The higher-level (parent)
151 | dwc:Event, EVENT_01, may include general information about the inventory
152 | design. Species occurrences are captured for two child dwc:Events
153 | (EVENT_02 and EVENT_03).
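
The definitions above can be illustrated with a minimal Python sketch. It assumes events are held as plain dictionaries keyed by the Darwin Core term names `eventID` and `parentEventID`; the function name `classify_events` and the sample records (modelled on Figure 2) are hypothetical and not part of the standard.

```python
# Hypothetical event records modelled on Figure 2.
# Only eventID and parentEventID are Darwin Core terms; everything else is illustrative.
events = [
    {"eventID": "EVENT_01", "parentEventID": ""},
    {"eventID": "EVENT_02", "parentEventID": "EVENT_01"},
    {"eventID": "EVENT_03", "parentEventID": "EVENT_01"},
]


def classify_events(events):
    """Return (parent_ids, child_ids) following the definitions in section 3.1."""
    event_ids = {e["eventID"] for e in events}
    # A child is any Event whose parentEventID holds the eventID of another Event.
    child_ids = {
        e["eventID"]
        for e in events
        if e.get("parentEventID") in event_ids and e["parentEventID"] != e["eventID"]
    }
    # A parent is any Event whose eventID is the parentEventID of at least one other Event.
    parent_ids = {e["parentEventID"] for e in events if e["eventID"] in child_ids}
    return parent_ids, child_ids


parent_ids, child_ids = classify_events(events)
print(sorted(parent_ids))  # ['EVENT_01']
print(sorted(child_ids))   # ['EVENT_02', 'EVENT_03']
```

Note that an Event in the middle of a deeper hierarchy would appear in both sets, since it is a child of one Event and a parent of others.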
154 | 
155 | <a id="guiding"></a>
156 | ## 3.2 Guiding principles
157 | 
158 | <a id="coverage"></a>
159 | ### 3.2.1 Principle of spatiotemporal coverage
160 | 
161 | **A parent dwc:Event MUST encompass its child dwc:Events spatially
162 | <u>and</u> temporally.** Specifically, the spatial extent and temporal
163 | interval of a parent dwc:Event MUST contain the spatial extents and
164 | temporal intervals of all of its children. For example, if child
165 | dwc:Events took place in various locations throughout, and only within,
166 | Burundi, then the spatial extent of the parent dwc:Event would be
167 | Burundi. Similarly, if the child dwc:Events took place periodically
168 | throughout the year 2019, the temporal interval of the parent dwc:Event
169 | would begin when the earliest child dwc:Event began and end when the
170 | latest child dwc:Event ended.
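
The temporal half of this principle can be sketched in Python as follows. This is only an illustration under stated assumptions: intervals are represented as `(start, end)` pairs of `datetime.date` values, the dates are invented, and the helper names `covering_interval` and `encloses` are hypothetical. A spatial check would follow the same pattern with whatever geometry representation a dataset uses.

```python
from datetime import date


def covering_interval(child_intervals):
    """Smallest (start, end) interval that contains every child interval."""
    starts = [start for start, _ in child_intervals]
    ends = [end for _, end in child_intervals]
    return min(starts), max(ends)


def encloses(parent_interval, child_interval):
    """True if the parent interval fully contains the child interval."""
    p_start, p_end = parent_interval
    c_start, c_end = child_interval
    return p_start <= c_start and c_end <= p_end


# Invented child dwc:Events that took place periodically throughout 2019.
children = [
    (date(2019, 1, 10), date(2019, 1, 12)),
    (date(2019, 6, 3), date(2019, 6, 5)),
    (date(2019, 11, 20), date(2019, 11, 21)),
]

# The parent interval runs from the earliest child start to the latest child end.
parent = covering_interval(children)
print(parent[0].isoformat(), parent[1].isoformat())  # 2019-01-10 2019-11-21
print(all(encloses(parent, child) for child in children))  # True
```

A parent interval may also be wider than this minimum (for example, the whole of 2019); the principle only requires that no child falls outside it.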
171 | 
172 | <a id="applicability"></a>
173 | ### 3.2.2 Principle of applicability
174 | 
175 | **Humboldt Extension terms SHOULD contain data explicitly at every level
176 | in the dwc:Event hierarchy to which they *directly* apply.** The value
177 | of a term for a dwc:Event SHOULD be populated for the Event itself
178 | rather than merely summarized in a higher-level dwc:Event. For example,
179 | a child dwc:Event (**C**) with multiple dwc:Occurrences, some of which
180 | resulted in voucher specimens, SHOULD possess a value of `true` for
181 | the term eco:hasVouchers. The data user SHOULD NOT be expected to look
182 | at the eco:hasVouchers term for the parent dwc:Event (**P**) of **C** in
183 | order to find the value.
184 | 
185 | If a term genuinely applies at multiple levels of a dwc:Event
186 | hierarchy, values SHOULD be reported explicitly at *each* of those
187 | levels. The values for child dwc:Events might be the same as their
188 | parental values, or child dwc:Events might possess their own more
189 | specific values. This principle allows child dwc:Events to be
190 | "autonomous" to the greatest degree possible, and avoids uncertainty
191 | about where to look for the values of properties of any given dwc:Event.
192 | 
193 | <a id="non-derivation">
194 | ### 3.2.3 Principle of non-derivation
195 | 
196 | As a complement to the *Principle of applicability*, **Humboldt
197 | Extension terms SHOULD NOT be populated by deriving or summarizing
198 | information from child dwc:Events to their common parent dwc:Event**. If
199 | a term does not directly apply to a given level of dwc:Event (i.e., it
200 | is not an actual property of that dwc:Event), it SHOULD NOT be populated
201 | with a value. For example, if the parent dwc:Event **P** from the
202 | example in section *<a href="#applicability">3.2.2</a>* above is not directly linked to
203 | dwc:Occurrences, then the term eco:hasVouchers does not apply at that
204 | dwc:Event level and SHOULD be left unpopulated. Data providers SHOULD
205 | NOT construct a value for a parent dwc:Event from values at the level of
206 | child dwc:Events.
207 | 
208 | In some cases, including the example above, it would not be valid to
209 | derive or summarize information from child dwc:Events to populate a
210 | parent dwc:Event. Suppose parent dwc:Event **P** has two child
211 | dwc:Events, one with eco:hasVouchers `true` and one with
212 | eco:hasVouchers `false`. The value of eco:hasVouchers for **P** cannot
213 | be derived or summarized from its children, as it is neither `true`
214 | nor `false` for all of them (the only two values consistent with the
215 | recommended controlled vocabulary for the term). It would be neither
216 | desirable nor reliable to use the values of the child dwc:Events to
217 | infer a value for the parent dwc:Event. The *Principle of inference*
218 | (below) provides a further example, where *scope* terms of parent
219 | dwc:Events MUST NOT be populated by summarizing from lower levels
220 | (either through the scope values of child dwc:Events or, for example,
221 | through taxa detected in child dwc:Events).
222 | 
223 | There are terms which could theoretically be populated for a parent
224 | dwc:Event from the primary data already provided for that dwc:Event\'s
225 | children (e.g., eco:materialSampleTypes). Populating the parent term
226 | could facilitate the discovery of higher-level dwc:Events among whose
227 | children there is a particular value of a property (e.g., a search
228 | through the highest-level dwc:Events in datasets to find datasets in
229 | which there are particular eco:materialSampleTypes). However, providing
230 | such summary values is specifically NOT RECOMMENDED. Doing so a\) adds no
231 | information to the dataset (the summary information is already available
232 | by inspecting the primary data in the dwc:Events in the dataset), b\)
233 | adds an extra burden of summary upon the data provider, and c\) is
234 | susceptible to errors (ambiguities, inconsistencies, incompleteness)
235 | when trying to construct secondary summary information for higher-level
236 | Events.
237 | 
238 | <a id="inference">
239 | ### 3.2.4 Principle of inference
240 | 
241 | **Certain terms in the Humboldt Extension support inferences.** Examples
242 | of terms that help data users to determine whether or not inferences can
243 | be made include those describing the *scope* of the inventory, such as
244 | eco:targetTaxonomicScope and eco:excludedTaxonomicScope, and terms
245 | describing *completeness*, such as eco:taxonCompletenessReported,
246 | eco:taxonCompletenessProtocols and eco:isTaxonomicScopeFullyReported.
247 | The values of these terms in a dwc:Event have implications for the
248 | interpretation of all of that dwc:Event's child dwc:Events. These terms
249 | MUST be populated for the highest level dwc:Event to which they apply,
250 | and all of its child dwc:Events.
251 | 
252 | **The *scope* terms of a dwc:Event MUST be populated whenever the scope
253 | was in effect**. Having this information in a dwc:Event is the only way
254 | **to be able to infer absences of detection** within that dwc:Event,
255 | whenever the dwc:Occurrences linked to that dwc:Event do not explicitly
256 | state zero counts or when there are no dwc:Occurrence records for a
257 | given taxon that fell within the taxonomic scope (the combination of
258 | eco:targetTaxonomicScope and eco:excludedTaxonomicScope). The ability to
259 | "implicitly" support inferences about undetected dwc:Taxa (and other
260 | organismal targets) was a high priority objective in the design and
261 | structure of the Humboldt Extension. By "implicitly support
262 | inferences" we mean that a dwc:organismQuantity of zero individuals
263 | within a particular scope does not need to be provided explicitly as a
264 | separate dwc:Occurrence record, for a dwc:Event that does declare an
265 | encompassing scope and where all the taxa/targets that *were* detected
266 | were fully reported. Instead, those zero counts can be reconstituted by
267 | data users based on the data contained in other terms. When the target
268 | taxonomic scope (the combination of eco:targetTaxonomicScope and
269 | eco:excludedTaxonomicScope) is determined in advance of inventory data
270 | collection, and eco:isTaxonomicScopeFullyReported = `true`, then all
271 | dwc:Taxa that fall within the taxonomic scope but are not reported in
272 | the dwc:Occurrences of any child dwc:Events **can be inferred to be
273 | dwc:Occurrences with a dwc:organismQuantity of zero** (i.e., undetected
274 | dwc:Taxa).
275 | 
276 | These inferred zero counts, in combination with information about
277 | sampling effort (i.e., eco:samplingEffortProtocol,
278 | eco:samplingEffortValue and eco:samplingEffortUnit), can then be used to
279 | estimate the likelihood that a count of zero organisms represents a
280 | *true* absence of a dwc:Taxon. However, if eco:taxonCompletenessReported
281 | = `reported incomplete` and/or eco:isTaxonomicScopeFullyReported =
282 | `false` for a dwc:Event, then future users SHOULD NOT make assumptions
283 | about absences.
284 | 
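The inference described above can be sketched in Python as follows. This is a minimal sketch under strong assumptions: the taxonomic scope has already been expanded into an explicit set of taxon names (in practice this needs a taxonomic backbone), and the dictionary keys and taxon names are placeholders invented for the example.

```python
from typing import Optional


def infer_undetected_taxa(event: dict, scope_taxa: set, detected_taxa: set) -> Optional[set]:
    """Return the taxa that can be inferred to have an organismQuantity of zero.

    Returns None when no inference about absences should be made.
    """
    fully_reported = event.get("isTaxonomicScopeFullyReported") is True
    incomplete = event.get("taxonCompletenessReported") == "reported incomplete"
    if not fully_reported or incomplete:
        # Users SHOULD NOT make assumptions about absences in this case.
        return None
    return set(scope_taxa) - set(detected_taxa)


# Hypothetical dwc:Event with a declared scope of three taxa, one of them detected.
event = {"isTaxonomicScopeFullyReported": True}
scope_taxa = {"Taxon A", "Taxon B", "Taxon C"}
detected_taxa = {"Taxon A"}
undetected = infer_undetected_taxa(event, scope_taxa, detected_taxa)
```

Here `undetected` holds the two taxa that were in scope but not reported. Whether such a zero count reflects a true absence still depends on the sampling effort terms.
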
285 | Data providers **MUST NOT retrospectively infer and populate
286 | eco:targetTaxonomicScope, or other *scope* terms**, for inclusion in a
287 | dataset shared with the Humboldt Extension. This is a further example of
288 | the *<a href="#non-derivation">Principle of non-derivation</a>* (*3.2.3*). Likewise, data users SHOULD
289 | NOT assume or reconstruct a scope that was not explicitly given by the
290 | data provider. There are at least two reasons for this: (1) Artificial
291 | construction of scope: retrospective inference of target scope by a data
292 | provider by aggregating information across all child dwc:Events may
293 | result in a reported scope that is narrower than the actual intended
294 | scope of the inventory. (2) Artificial broadening of scope: it is
295 | possible that the inferred scope can be described in multiple ways. For
296 | example, the scope of a list of species within a single genus could be
297 | described as the genus, as the family containing that genus, or as an
298 | even broader taxonomic concept. Thus, unless the true taxonomic scope is
299 | a known variable in the inventory protocol, then a presumed scope may be
300 | too broad or too narrow, leading to errors when inferring counts of
301 | zero.
302 | 
303 | <a id="implementation">
304 | ## 3.3 Implementation principles
305 | 
306 | 1.  A Darwin Core-based inventory dataset MUST consist of at least one
307 | dwc:Event record.
308 | 
309 | 2.  Each dwc:Event in an inventory dataset MUST have a non-empty value
310 | for dwc:eventID that is unique within the dataset. More benefits
311 | are realizable if the dwc:eventIDs are also globally unique.
312 | 
313 | 3.  Any association of a Humboldt Extension record with a dwc:Event
314 | record MUST be done via that dwc:Event\'s dwc:eventID; the
315 | associated records MUST use the same dwc:eventID. It is
316 | permissible to have dwc:Event records without associated Humboldt
317 | Extension records.
318 | 
319 | 4.  An inventory hierarchy MUST be realized by explicitly relating each
320 | child dwc:Event to a parent dwc:Event through the child
321 | dwc:Event's dwc:parentEventID.
322 | 
323 | 5.  Data providers SHOULD follow [Darwin Core principle
324 | 4](https://dwc.tdwg.org/simple/#5-are-there-any-rules-normative),
325 | which is to fill the values of as many terms as possible, subject
326 | to the *Principle of applicability* and the *Principle of
327 | non-derivation* (sections *<a href="#applicability">3.2.2</a>* and *<a href="#non-derivation">3.2.3</a>*, respectively).
328 | 
329 | 6.  A child dwc:Event MUST NOT be assumed to implicitly "inherit" the
330 | value of any property of any of its parent dwc:Events; rather, the
331 | value SHOULD be provided explicitly as discussed in section *<a href="#applicability">3.2.2
332 | Principle of applicability</a>*.
333 | 
334 | 7.  A parent dwc:Event term SHOULD NOT be populated by deriving or
335 | summarizing information from child dwc:Events; rather, the value
336 | SHOULD be provided explicitly if appropriate to the nature and
337 | level of the dwc:Event, as discussed in section *<a href="#non-derivation">3.2.3 Principle of non-derivation</a>*.
338 | 
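Principles 1 to 4 lend themselves to automated checking. The following Python sketch shows one possible way to do this for records held as dictionaries; the function name and the record layout are assumptions made for illustration, and the sketch assumes that every parent dwc:Event is present in the same dataset.

```python
from collections import Counter


def validate_inventory(events: list, humboldt_records: list) -> list:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    if not events:
        problems.append("principle 1: no dwc:Event records")

    ids = [e.get("eventID") or "" for e in events]
    if any(not i for i in ids):
        problems.append("principle 2: empty eventID")
    for event_id, count in Counter(i for i in ids if i).items():
        if count > 1:
            problems.append(f"principle 2: duplicate eventID {event_id}")

    known = set(ids) - {""}
    for record in humboldt_records:
        if record.get("eventID") not in known:
            problems.append(f"principle 3: unknown eventID {record.get('eventID')}")

    for e in events:
        parent = e.get("parentEventID")
        if parent and parent not in known:
            problems.append(f"principle 4: unknown parentEventID {parent}")
    return problems


# The hierarchy of Figure 2: EVENT_01 is the parent of EVENT_02 and EVENT_03.
events = [
    {"eventID": "EVENT_01"},
    {"eventID": "EVENT_02", "parentEventID": "EVENT_01"},
    {"eventID": "EVENT_03", "parentEventID": "EVENT_01"},
]
humboldt_records = [{"eventID": "EVENT_02"}, {"eventID": "EVENT_03"}]
problems = validate_inventory(events, humboldt_records)
```

Principles 5 to 7 concern how values are chosen rather than how records are linked, so they are not covered by a structural check of this kind.
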
339 | <a id="examples">
340 | ## 4 Examples (non-normative)
341 | 
342 | ![Tables illustrating implementation principles](fig3.png)
343 | 
344 | **Figure 3.** Example illustrating the [Implementation
345 | principles](#implementation). Numbering of colored
346 | rectangles indicates the relevant principle; lines, arrows or rectangles
347 | in the same color indicate that the cells, columns or records are
348 | affected by the principle. *Notolepis coatsi* and *Cranchiidae* are not
349 | within the reported eco:targetTaxonomicScope. Principle 1 - an inventory
350 | dataset must have at least one dwc:Event record; here, 3 records can be
351 | identified. Principle 2 - each dwc:Event record must have a unique
352 | dwc:eventID. Principle 3 - Humboldt Extension records must be linked to
353 | the core dwc:Events via shared dwc:eventIDs. Principle 4 - every child
354 | dwc:Event must be related to its parent dwc:Event through a
355 | dwc:parentEventID. Principle 5 - term values for dwc:Events should be
356 | populated whenever possible; in the figure all records follow Darwin
357 | Core principle 4, subject to the *<a href="#applicability">Principle of applicability</a>* and the
358 | *<a href="#non-derivation">Principle of non-derivation</a>*. Principle 6 - terms for child dwc:Events
359 | must be explicitly populated rather than "inheriting" values from
360 | their parent dwc:Events. Principle 7 - terms for parent dwc:Events
361 | should be populated whenever relevant, but not be derived or summarized
362 | from their child dwc:Events.
363 | 


--------------------------------------------------------------------------------
/docs/humboldt_extension_implementation_experience_report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/docs/humboldt_extension_implementation_experience_report.pdf


--------------------------------------------------------------------------------
/docs/inclusive/index.md:
--------------------------------------------------------------------------------
  1 | # isLeastSpecificTargetCategoryQuantityInclusive Guidelines
  2 | 
  3 | Title
  4 | : isLeastSpecificTargetCategoryQuantityInclusive Guidelines
  5 | 
  6 | Date version issued
  7 | : 2024-02-28
  8 | 
  9 | Date created
 10 | : 2024-02-28
 11 | 
 12 | Part of TDWG Standard
 13 | : <http://www.tdwg.org/standards/450>
 14 | 
 15 | This version
 16 | : <http://rs.tdwg.org/dwc/doc/inclusive/2024-02-28>
 17 | 
 18 | Latest version
 19 | : <http://rs.tdwg.org/dwc/doc/inclusive/>
 20 | 
 21 | Abstract
 22 | : The Humboldt Extension for ecological inventories mints the term eco:isLeastSpecificTargetCategoryQuantityInclusive to describe how to treat counts of organisms when records from a single dwc:Event include multiple target categories. This document describes how to use that term.
 23 | 
 24 | Contributors
 25 | : [Yi-Ming Gan](https://orcid.org/0000-0001-7087-2646) ([Royal Belgian Institute of Natural Sciences](http://www.wikidata.org/entity/Q16665660)), [Wesley M. Hochachka](https://orcid.org/0000-0002-0595-7827) ([Cornell Lab of Ornithology](http://www.wikidata.org/entity/Q2997535)), [John Wieczorek](https://orcid.org/0000-0003-1144-0290) ([VertNet](http://www.wikidata.org/entity/Q98382028)), [Yanina V. Sica](https://orcid.org/0000-0002-1720-0127) ([Yale University](http://www.wikidata.org/entity/Q49112)), [Peter Brenton](https://orcid.org/0000-0001-9730-8340) ([Atlas of Living Australia, CSIRO](http://www.wikidata.org/entity/Q16335177)), [Steven J. Baskauf](https://orcid.org/0000-0003-4365-3135) ([Vanderbilt University Libraries](http://www.wikidata.org/entity/Q16849893))
 26 | 
 27 | Creator
 28 | : TDWG Humboldt Extension Task Group
 29 | 
 30 | Bibliographic citation
 31 | : TDWG Humboldt Extension Task Group. 2024. isLeastSpecificTargetCategoryQuantityInclusive Guidelines. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/inclusive/2024-02-28>
 32 | 
 33 | ## 1 Introduction (non-normative)
 34 | 
 35 | This document elaborates upon the meaning and use of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive`. This term is needed to describe how to treat counts of organisms (or any other organism quantity) when records from a single `dwc:Event` (<http://rs.tdwg.org/dwc/terms/Event>) include multiple target categories (e.g., taxonomic ranks within a higher rank or different life stages for the same species). For example, whether the least specific target category quantity is inclusive should be stated when a `dwc:Event` includes both records reporting quantities for subcategories (e.g., subspecies) and records reporting quantities for more general categories (e.g., the species). In this example, the higher taxon rank (i.e., species) is the least specific category, because it is more general than the subspecies category nested below it. Species and subspecies are just one example of a category and subcategory pair. Other examples of subcategories are life stages (e.g., “adult”, “larva”, “egg”) and sexes.
 36 | 
 37 | ### 1.1 Status of the content of this document
 38 | 
 39 | Section 3 of this document is normative. The other sections are non-normative.
 40 | 
 41 | 
 42 | ### 1.2 RFC 2119 key words
 43 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
 44 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
 45 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
 46 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
 47 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
 48 | when, and only when, they are written in capitals (as shown here).
 49 | 
 50 | ### 1.3 Namespaces and terminology
 51 | 
 52 | The namespace `eco:` abbreviates `http://rs.tdwg.org/eco/terms/` and is used with terms minted for the Humboldt Extension for ecological inventories. `dwc:` abbreviates `http://rs.tdwg.org/dwc/terms/`, and is used with terms in the main Darwin Core vocabulary namespace. Words in `code markup` are term IRIs or literal values. The word "organisms" is used colloquially and is not used in the technical sense of the `dwc:Organism` class.
 53 | 
 54 | ## 2 Rationale (non-normative)
 55 | 
 56 | The term `eco:isLeastSpecificTargetCategoryQuantityInclusive` was introduced into the Humboldt Extension for ecological inventories late in development, after testing it with real-world cases ([Sica et al., 2022](#ref2)). Testing revealed that the quantities of organisms stored in two major biodiversity databases — OBIS (<a href="#ref1">OBIS, 2023</a>) and eBird (<a href="#ref3">Sullivan et al., 2014</a>) — need to be treated differently in order to calculate the total quantity of organisms in the least specific category.  In the specific case of data in the OBIS database, the information for a single `dwc:Event` can contain multiple records for a species, with one record for a species listing the quantity of individual organisms for the species without specifying any subcategory of life stage, and other records for the same species in the same `dwc:Event` listing quantities for different life stages (e.g., one record for adults and another record for juveniles). In this example the single `dwc:Event` will contain 3 records: one for the species without any life stage specified, one for adults of the species, and one for juveniles of the species.  For the OBIS data, the quantity in the record for which no life stage is specified is the sum of three quantities: the number of juveniles, the number of adults, and the number of individuals that were not recorded as belonging to any specific life stage.  In other words, when using OBIS data, the total quantity of individuals recorded for a species, across all life stages combined, has been pre-calculated and stored in the database; unless the quantities of individuals within specific life stages are of interest, the information in the life stage subcategories can be ignored. The value of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` in this case would be `true` - the least specific category (species without any life stage specified) already includes the counts of the more specific subcategories.
 57 | 
 58 | eBird stores information about quantities of organisms differently.  For the example of a `dwc:Event` that contains separate records for subspecies and their parent species, the total number of individuals of the species needs to be calculated by the end user as the sum of the quantity reported for the species plus the quantities reported for the subspecies.  In other words, the total quantity of organisms of each species has not been pre-calculated and must be derived by the end user. The value of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` in this case would be `false` - the least specific category (species) does not include the counts of the more specific subcategories (subspecies).
 59 | 
 60 | In summary, the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` is required to inform the end user of whether they will need to derive the total quantity of organisms for the least specific category (e.g., for a species), or whether this total quantity has already been calculated prior to the data being entered into the database. Note that, if a dataset contains only simple targets that have no subcategories, the result of the term `eco:isLeastSpecificTargetCategoryQuantityInclusive` being `true` or `false` is exactly the same - the count is the total in either case. Only in this circumstance does the term not strictly need to be populated. However, given that data records acquire a "life of their own" separate from their associated metadata when aggregated from multiple data sets, best practice is to include and populate the term `eco:isLeastSpecificTargetCategoryQuantityInclusive`.
 61 | 
 62 | ## 3 Usage guidelines (normative)
 63 | 
 64 | The term `eco:isLeastSpecificTargetCategoryQuantityInclusive` is defined as "The total detected quantity of organisms for a `dwc:Taxon` (including subsets thereof) in a `dwc:Event` is given explicitly in a single record (`dwc:organismQuantity` value) for that `dwc:Taxon`."
 65 | 
 66 | The value MUST be either `true` or `false`. If `true`, the `dwc:organismQuantity` value for a `dwc:Taxon` in a `dwc:Event` is inclusive of all organisms of the `dwc:Taxon` (including more specific scopes such as different life stages or lower taxonomic ranks), and the total detected quantity of organisms for that `dwc:Taxon` in the `dwc:Event` MUST NOT be determined by summing the `dwc:organismQuantity` values for all records of the `dwc:Taxon` in the `dwc:Event`. Instead, the total detected quantity of organisms for the `dwc:Taxon` in a `dwc:Event` MUST be reported in a single record for the `dwc:Taxon` in the `dwc:Event`, with this record having no more specific scopes. In this case the sum of `dwc:organismQuantity` values for the reported subsets of the `dwc:Taxon` MUST NOT exceed the value of `dwc:organismQuantity` for the single record for the `dwc:Taxon` without subsets (i.e., the total). If `false`, the `dwc:organismQuantity` values for a `dwc:Taxon` in a `dwc:Event` MUST be added to get the total detected quantity of organisms for that `dwc:Taxon` in the `dwc:Event`.
 67 | 
 68 | ## 4 Examples (non-normative)
 69 | 
 70 | ### 4.1 Single `dwc:Taxon` example
 71 | 
 72 | As an example of the difference between `true` and `false` values for `eco:isLeastSpecificTargetCategoryQuantityInclusive`, suppose there are three records (see Table 1) with `dwc:organismQuantity` for a `dwc:Taxon` (taxon_01) for a `dwc:Event` (event_01). One record is for adults of the `dwc:Taxon` with `dwc:organismQuantity` = `1` and `dwc:organismQuantityType` = `individuals`, one record is for juveniles of the `dwc:Taxon` with `dwc:organismQuantity` = `2` and `dwc:organismQuantityType` = `individuals`, and one record is for the `dwc:Taxon` without specifying the life stage and with `dwc:organismQuantity` = `4` and `dwc:organismQuantityType` = `individuals`.
 73 | 
 74 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `true` for event_01, then the total number of individuals of taxon_01 for the `dwc:Event` is 4 (the least specific `dwc:Taxon` record — the one with no more specific scopes — includes all individuals of the `dwc:Taxon`). This means that there was 1 adult, 2 juveniles and 1 individual of taxon_01 whose life stage was not recorded. 
 75 | 
 76 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `false` for event_01, then the total number of individuals of taxon_01 for the `dwc:Event` is 7 (the least specific `dwc:Taxon` record - the one with no more specific scopes - does not include all individuals of the `dwc:Taxon`, rather, it is a separate category that must also be added to get the total). This means there was 1 adult, 2 juveniles and 4 individuals of taxon_01 whose life stage was not recorded.
 77 | 
 78 | **Table 1. Organism quantities in `dwc:Occurrence` records**
 79 | 
 80 | | occurrenceID | eventID | taxonID | lifeStage | organismQuantity | organismQuantityType |
 81 | | ------------ | ------- | ------- | --------- | ---------------- | -------------------- |
 82 | | occ_01 | event_01 | taxon_01 | adult | 1 | individual |
 83 | | occ_02 | event_01 | taxon_01 | juvenile | 2 | individual |
 84 | | occ_03 | event_01 | taxon_01 |  | 4 | individual |
 85 | 
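The two interpretations of Table 1 can be expressed as a short Python sketch. The function name and the dictionary layout are hypothetical; the quantities are those of Table 1.

```python
def total_quantity(records: list, inclusive: bool) -> int:
    """Total detected quantity for one dwc:Taxon in one dwc:Event.

    `records` are the occurrence records for that taxon and event;
    `inclusive` is the value of eco:isLeastSpecificTargetCategoryQuantityInclusive.
    """
    if inclusive:
        # The total is stated in the single record with no more specific scope.
        least_specific = [r for r in records if not r["lifeStage"]]
        if len(least_specific) != 1:
            raise ValueError("expected exactly one record without a life stage")
        return least_specific[0]["organismQuantity"]
    # Otherwise every record is a separate category and all must be added.
    return sum(r["organismQuantity"] for r in records)


table_1 = [
    {"occurrenceID": "occ_01", "lifeStage": "adult", "organismQuantity": 1},
    {"occurrenceID": "occ_02", "lifeStage": "juvenile", "organismQuantity": 2},
    {"occurrenceID": "occ_03", "lifeStage": "", "organismQuantity": 4},
]
total_if_true = total_quantity(table_1, inclusive=True)    # 4
total_if_false = total_quantity(table_1, inclusive=False)  # 7
```
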
 86 | ### 4.2 Nested taxa example
 87 | 
 88 | Suppose there are three records (see Table 2) with `dwc:organismQuantity` for three taxa (*Hirundo rustica* and two subspecies) for a `dwc:Event` (event_01). The record for the species has `dwc:organismQuantity` = `3` and `dwc:organismQuantityType` = `individuals`. The record for *H. r. rustica* has `dwc:organismQuantity` = `2` and `dwc:organismQuantityType` = `individuals`. The record for *H. r. gutturalis* has `dwc:organismQuantity` = `1` and `dwc:organismQuantityType` = `individuals`.
 89 | 
 90 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `true` for event_01, then the total number of individuals of the species *H. rustica* for the `dwc:Event` is 3 (the least specific `dwc:Taxon` record includes all individuals of the `dwc:Taxon`). This means there were 2 *H. r. rustica*, 1 *H. r. gutturalis*, and no other *H. rustica* of any kind detected.
 91 | 
 92 | If `eco:isLeastSpecificTargetCategoryQuantityInclusive` is `false` for event_01, then the total number of individuals of the species *H. rustica* for the `dwc:Event` is 6 (the least specific `dwc:Taxon` record does not include all individuals of the `dwc:Taxon`). This means there were 2 *H. r. rustica*, 1 *H. r. gutturalis*, and 3 other *H. rustica* detected that were not identified to subspecies. 
 93 | 
 94 | **Table 2. Organism quantities in `dwc:Event` records**
 95 | 
 96 | | eventID | scientificName | organismQuantity | organismQuantityType |
 97 | | ------- | -------------- | ---------------- | -------------------- |
 98 | | event_01 | Hirundo rustica | 3 | individual |
 99 | | event_01 | Hirundo rustica rustica | 2 | individual |
100 | | event_01 | Hirundo rustica gutturalis | 1 | individual |
101 | 
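When the term is `true`, the quantities of the subsets cannot exceed the quantity of the least specific record. The following Python sketch, with a hypothetical function name, checks this for the values in Table 2 and returns the number of individuals not assigned to any subset.

```python
def unassigned_quantity(total: int, subset_quantities: list) -> int:
    """Individuals counted in the least specific record but in no subset record.

    Applies only when eco:isLeastSpecificTargetCategoryQuantityInclusive is true.
    """
    subset_sum = sum(subset_quantities)
    if subset_sum > total:
        raise ValueError(
            f"subset quantities ({subset_sum}) exceed the inclusive total ({total})"
        )
    return total - subset_sum


# Table 2: 3 Hirundo rustica in total, of which 2 H. r. rustica and 1 H. r. gutturalis.
not_identified_to_subspecies = unassigned_quantity(3, [2, 1])  # 0
```

A subset sum larger than the total, for example subspecies quantities of 2 and 4 against a species total of 3, raises an error because such data would be inconsistent with a value of `true`.
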
102 | ## 5 References
103 | 
104 | <a id="ref1"></a>OBIS (2023) Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. <https://www.obis.org/>.
105 | 
106 | <a id="ref2"></a>Sica Y. V., K. Ingenloff, Y-M. Gan, Z. Kachian, S. J. Baskauf, J. Wieczorek, P. F. Zermoglio, R. D. Stevenson (2022). Application of Humboldt Extension to Real-world Cases. *Biodiversity Information Science and Standards* 6: e91502. <https://doi.org/10.3897/biss.6.91502>
107 | 
108 | <a id="ref3"></a>Sullivan, B. L., J. L. Aycrigg, J. H. Barry, R. E. Bonney, N. Bruns, C. B. Cooper, T. Damoulas, A. A. Dhondt, T. Dietterich, A. Farnsworth, D. Fink, et al. (2014). The eBird enterprise: an integrated approach to development and application of citizen science. *Biological Conservation* 169:31-40. <https://doi.org/10.1016/j.biocon.2013.11.003>
109 | 


--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | layout: home
 3 | title: Humboldt Extension for Ecological Inventories
 4 | description: The Humboldt Extension for Ecological Inventories is a vocabulary for transmitting information about biodiversity surveys with hierarchical structure. It is used along with Darwin Core terms to extend descriptions of Events.
 5 | ---
 6 | The Humboldt Extension for Ecological Inventories is a standard vocabulary maintained by the [Darwin Core Maintenance Group](https://www.tdwg.org/community/dwc/). It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to **facilitate the sharing of information about ecological inventories** by providing identifiers, labels, definitions, usage comments and examples.
 7 | 
 8 | The official documents for this extension to [Darwin Core](http://www.tdwg.org/standards/450) include the [list of terms](list/), a [list of controlled vocabulary terms](tcr/) for the property `eco:taxonCompletenessReported`, and two guides that explain how the extension must be used ([isLeastSpecificTargetCategoryQuantityInclusive](inclusive/) and [hierarchical events](hierarchy/) guidelines). The [Quick reference guide](terms/) and [Usage guide](https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit?usp=sharing) are not officially part of the addition to Darwin Core, but may provide a less technical point of entry to understanding the extension.
 9 | 
10 | ## Getting started
11 | 
12 | * [Quick reference guide](terms/)
13 | * [Usage guide](https://docs.google.com/document/d/1rX4m94rtZDR_8iIe3RvRnNYKDJcmSX3ii4S5hCznEA0/edit?usp=sharing): how to use the Humboldt Extension
14 | * [GitHub repository](https://github.com/tdwg/hc): where the Humboldt Extension is maintained
15 | * [Guidelines for using isLeastSpecificTargetCategoryQuantityInclusive](inclusive/)
 16 | * [Properties of hierarchical events in the Humboldt Extension for Ecological Inventories](hierarchy/)
17 | * [Term list](list/): the document containing complete metadata and normative term definitions for all Humboldt Extension terms.
18 | * [Utility files](https://github.com/tdwg/hc/tree/master/dist): CSV files of vertical and horizontal term lists plus the Humboldt Extension schema
19 | * [Implementation Experience Report](humboldt_extension_implementation_experience_report.pdf) 
20 | 


--------------------------------------------------------------------------------
/docs/tcr/index.md:
--------------------------------------------------------------------------------
  1 | # Taxon Completeness Reported Controlled Vocabulary List of Terms
  2 | 
  3 | Title
  4 | : Taxon Completeness Reported Controlled Vocabulary List of Terms
  5 | 
  6 | Namespace IRI
  7 | : http://rs.tdwg.org/ecotcr/values/
  8 | 
  9 | Preferred namespace abbreviation
 10 | : ecotcr:
 11 | 
 12 | Date version issued
 13 | : 2024-02-28
 14 | 
 15 | Date created
 16 | : 2024-02-28
 17 | 
 18 | Part of TDWG Standard
 19 | : <http://www.tdwg.org/standards/450>
 20 | 
 21 | This version
 22 | : <http://rs.tdwg.org/dwc/doc/tcr/2024-02-28>
 23 | 
 24 | Latest version
 25 | : <http://rs.tdwg.org/dwc/doc/tcr/>
 26 | 
 27 | Abstract
 28 | : The Humboldt Extension for Ecological Inventories mints the term `taxonCompletenessReported` to alert users that the inventory was conducted in such a way that all of the target taxa should have been detectable if they were present during the dwc:Event. This vocabulary provides terms that should be used as values for `eco:taxonCompletenessReported` and `ecoiri:taxonCompletenessReported`.
 29 | 
 30 | Contributors
 31 | : [Yanina V. Sica](https://orcid.org/0000-0002-1720-0127) ([Yale University](http://www.wikidata.org/entity/Q49112)), [Wesley M. Hochachka](https://orcid.org/0000-0002-0595-7827) ([Cornell Lab of Ornithology](http://www.wikidata.org/entity/Q2997535)), [Steven J. Baskauf](https://orcid.org/0000-0003-4365-3135) ([Vanderbilt University Libraries](http://www.wikidata.org/entity/Q16849893))
 32 | 
 33 | Creator
 34 | : TDWG Humboldt Extension Task Group
 35 | 
 36 | Bibliographic citation
 37 | : TDWG Humboldt Extension Task Group. 2024. Taxon Completeness Reported Controlled Vocabulary List of Terms. Biodiversity Information Standards (TDWG). <http://rs.tdwg.org/dwc/doc/tcr/2024-02-28>
 38 | 
 39 | ## 1 Introduction (non-normative)
 40 | 
 41 | This document includes terms intended to be used as controlled values for the Humboldt Extension terms with the local name `taxonCompletenessReported`.
 42 | 
 43 | ### 1.1 Status of the content of this document
 44 | 
 45 | Sections 1 and 3 are non-normative. Section 2 is normative. In Section 4, the values of the `Term IRI`, `Definition`, and `Controlled value` are normative. The value of `Usage` (if it exists for a given term) is normative. The values of `Term Name` are non-normative, although one can expect that the namespace abbreviation prefix is one commonly used for the term namespace. `Label` and the values of all other properties (such as `Notes`) are non-normative.
 46 | 
 47 | ### 1.2 RFC 2119 key words
 48 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 
 49 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to 
 50 | be interpreted as described in [BCP 14](https://datatracker.ietf.org/doc/html/bcp14)
 51 | [[RFC2119]](https://datatracker.ietf.org/doc/html/rfc2119)
 52 | [[RFC8174]](https://datatracker.ietf.org/doc/html/rfc8174)
 53 | when, and only when, they are written in capitals (as shown here).
 54 | 
 55 | ### 1.3 Namespaces
 56 | 
 57 | The namespace `eco:` abbreviates `http://rs.tdwg.org/eco/terms/` and the namespace `ecoiri:` abbreviates `http://rs.tdwg.org/eco/iri/`. Both namespaces are used with terms minted for the Humboldt Extension for Ecological Inventories. `ecotcr:` abbreviates `http://rs.tdwg.org/ecotcr/values/`, and is used with terms in this vocabulary.
 58 | 
 59 | ## 2 Use of Terms (normative)
 60 | 
 61 | Due to the requirements of [Section 1.4.3 of the Darwin Core RDF Guide](http://rs.tdwg.org/dwc/terms/guides/rdf/#143-use-of-darwin-core-terms-in-rdf-normative), term IRIs MUST be used as values of `ecoiri:taxonCompletenessReported`. Controlled value strings MUST be used as values of `eco:taxonCompletenessReported`.
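
The rule above can be illustrated with a minimal Python sketch. The controlled value strings and term IRIs are copied from Section 4 of this document; the function name `value_for` and the lookup table are assumptions made for this example and are not part of the standard or of any TDWG tooling.

```python
# Controlled value strings mapped to term IRIs, copied from Section 4.
TERM_IRI_BY_CONTROLLED_VALUE = {
    "notReported": "http://rs.tdwg.org/ecotcr/values/tcr00",
    "reportedComplete": "http://rs.tdwg.org/ecotcr/values/tcr01",
    "reportedIncomplete": "http://rs.tdwg.org/ecotcr/values/tcr02",
}


def value_for(property_name: str, controlled_value: str) -> str:
    """Return the value to record for the given property (hypothetical helper).

    ecoiri:taxonCompletenessReported takes the term IRI;
    eco:taxonCompletenessReported takes the controlled value string.
    """
    if controlled_value not in TERM_IRI_BY_CONTROLLED_VALUE:
        raise ValueError(f"not a controlled value: {controlled_value!r}")
    if property_name == "ecoiri:taxonCompletenessReported":
        return TERM_IRI_BY_CONTROLLED_VALUE[controlled_value]
    if property_name == "eco:taxonCompletenessReported":
        return controlled_value
    raise ValueError(f"unsupported property: {property_name!r}")


if __name__ == "__main__":
    print(value_for("eco:taxonCompletenessReported", "reportedComplete"))
    print(value_for("ecoiri:taxonCompletenessReported", "reportedComplete"))
```

Rejecting strings outside the vocabulary is a design choice of this sketch; it keeps free-text values from being recorded where a controlled value is required.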
 62 | 
 63 | ## 3 Term Index 
 64 | 
 65 | [not reported](#ecotcr_tcr00) |
 66 | [reported complete](#ecotcr_tcr01) |
 67 | [reported incomplete](#ecotcr_tcr02) |
 68 | [taxon completeness reported concept scheme](#ecotcr_tcr)
 69 | 
 70 | ## 4 Vocabulary
 71 | <table>
 72 | 	<thead>
 73 | 		<tr>
 74 | 			<th colspan="2"><a id="ecotcr_tcr"></a>Term Name  ecotcr:tcr</th>
 75 | 		</tr>
 76 | 	</thead>
 77 | 	<tbody>
 78 | 		<tr>
 79 | 			<td>Term IRI</td>
 80 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/tcr">http://rs.tdwg.org/ecotcr/values/tcr</a></td>
 81 | 		</tr>
 82 | 		<tr>
 83 | 			<td>Modified</td>
 84 | 			<td>2024-02-28</td>
 85 | 		</tr>
 86 | 		<tr>
 87 | 			<td>Term version IRI</td>
 88 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/version/tcr-2024-02-28">http://rs.tdwg.org/ecotcr/values/version/tcr-2024-02-28</a></td>
 89 | 		</tr>
 90 | 		<tr>
 91 | 			<td>Label</td>
 92 | 			<td>taxon completeness reported concept scheme</td>
 93 | 		</tr>
 94 | 		<tr>
 95 | 			<td>Definition</td>
 96 | 			<td>a SKOS concept scheme for categorizing taxon completeness reporting</td>
 97 | 		</tr>
 98 | 		<tr>
 99 | 			<td>Type</td>
100 | 			<td>http://www.w3.org/2004/02/skos/core#ConceptScheme</td>
101 | 		</tr>
102 | 		<tr>
103 | 			<td>Executive Committee decision</td>
104 | 			<td><a href="http://rs.tdwg.org/decisions/decision-2024-02-28_42">http://rs.tdwg.org/decisions/decision-2024-02-28_42</a></td>
105 | 		</tr>
106 | 	</tbody>
107 | </table>
108 | 
109 | <table>
110 | 	<thead>
111 | 		<tr>
112 | 			<th colspan="2"><a id="ecotcr_tcr00"></a>Term Name  ecotcr:tcr00</th>
113 | 		</tr>
114 | 	</thead>
115 | 	<tbody>
116 | 		<tr>
117 | 			<td>Term IRI</td>
118 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/tcr00">http://rs.tdwg.org/ecotcr/values/tcr00</a></td>
119 | 		</tr>
120 | 		<tr>
121 | 			<td>Modified</td>
122 | 			<td>2024-02-28</td>
123 | 		</tr>
124 | 		<tr>
125 | 			<td>Term version IRI</td>
126 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/version/tcr00-2024-02-28">http://rs.tdwg.org/ecotcr/values/version/tcr00-2024-02-28</a></td>
127 | 		</tr>
128 | 		<tr>
129 | 			<td>Label</td>
130 | 			<td>not reported</td>
131 | 		</tr>
132 | 		<tr>
133 | 			<td>Definition</td>
134 | 			<td>Taxonomic completeness was not assessed or reported for the dwc:Event.</td>
135 | 		</tr>
136 | 		<tr>
137 | 			<td>Controlled value</td>
138 | 			<td>notReported</td>
139 | 		</tr>
140 | 		<tr>
141 | 			<td>Type</td>
142 | 			<td>Concept</td>
143 | 		</tr>
144 | 		<tr>
145 | 			<td>Executive Committee decision</td>
146 | 			<td><a href="http://rs.tdwg.org/decisions/decision-2024-02-28_42">http://rs.tdwg.org/decisions/decision-2024-02-28_42</a></td>
147 | 		</tr>
148 | 	</tbody>
149 | </table>
150 | 
151 | <table>
152 | 	<thead>
153 | 		<tr>
154 | 			<th colspan="2"><a id="ecotcr_tcr01"></a>Term Name  ecotcr:tcr01</th>
155 | 		</tr>
156 | 	</thead>
157 | 	<tbody>
158 | 		<tr>
159 | 			<td>Term IRI</td>
160 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/tcr01">http://rs.tdwg.org/ecotcr/values/tcr01</a></td>
161 | 		</tr>
162 | 		<tr>
163 | 			<td>Modified</td>
164 | 			<td>2024-02-28</td>
165 | 		</tr>
166 | 		<tr>
167 | 			<td>Term version IRI</td>
168 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/version/tcr01-2024-02-28">http://rs.tdwg.org/ecotcr/values/version/tcr01-2024-02-28</a></td>
169 | 		</tr>
170 | 		<tr>
171 | 			<td>Label</td>
172 | 			<td>reported complete</td>
173 | 		</tr>
174 | 		<tr>
175 | 			<td>Definition</td>
176 | 			<td>Taxonomic completeness was assessed for the dwc:Event, and it was determined to be complete.</td>
177 | 		</tr>
178 | 		<tr>
179 | 			<td>Controlled value</td>
180 | 			<td>reportedComplete</td>
181 | 		</tr>
182 | 		<tr>
183 | 			<td>Type</td>
184 | 			<td>Concept</td>
185 | 		</tr>
186 | 		<tr>
187 | 			<td>Executive Committee decision</td>
188 | 			<td><a href="http://rs.tdwg.org/decisions/decision-2024-02-28_42">http://rs.tdwg.org/decisions/decision-2024-02-28_42</a></td>
189 | 		</tr>
190 | 	</tbody>
191 | </table>
192 | 
193 | <table>
194 | 	<thead>
195 | 		<tr>
196 | 			<th colspan="2"><a id="ecotcr_tcr02"></a>Term Name  ecotcr:tcr02</th>
197 | 		</tr>
198 | 	</thead>
199 | 	<tbody>
200 | 		<tr>
201 | 			<td>Term IRI</td>
202 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/tcr02">http://rs.tdwg.org/ecotcr/values/tcr02</a></td>
203 | 		</tr>
204 | 		<tr>
205 | 			<td>Modified</td>
206 | 			<td>2024-02-28</td>
207 | 		</tr>
208 | 		<tr>
209 | 			<td>Term version IRI</td>
210 | 			<td><a href="http://rs.tdwg.org/ecotcr/values/version/tcr02-2024-02-28">http://rs.tdwg.org/ecotcr/values/version/tcr02-2024-02-28</a></td>
211 | 		</tr>
212 | 		<tr>
213 | 			<td>Label</td>
214 | 			<td>reported incomplete</td>
215 | 		</tr>
216 | 		<tr>
217 | 			<td>Definition</td>
218 | 			<td>Taxonomic completeness was assessed for the dwc:Event, and it was determined to be incomplete.</td>
219 | 		</tr>
220 | 		<tr>
221 | 			<td>Controlled value</td>
222 | 			<td>reportedIncomplete</td>
223 | 		</tr>
224 | 		<tr>
225 | 			<td>Type</td>
226 | 			<td>Concept</td>
227 | 		</tr>
228 | 		<tr>
229 | 			<td>Executive Committee decision</td>
230 | 			<td><a href="http://rs.tdwg.org/decisions/decision-2024-02-28_42">http://rs.tdwg.org/decisions/decision-2024-02-28_42</a></td>
231 | 		</tr>
232 | 	</tbody>
233 | </table>
234 | 
235 | 
236 | 


--------------------------------------------------------------------------------
/material/Checklist Metadata - Data Entry Manual.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/material/Checklist Metadata - Data Entry Manual.docx


--------------------------------------------------------------------------------
/material/Guralnick et al Ecography 2017.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/material/Guralnick et al Ecography 2017.pdf


--------------------------------------------------------------------------------
/material/HCSupplementalTable3_FullTermList_r2_v4_RW.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/material/HCSupplementalTable3_FullTermList_r2_v4_RW.xlsx


--------------------------------------------------------------------------------
/material/HC_SupplementalTable_ExamplesNEW.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/material/HC_SupplementalTable_ExamplesNEW.xlsx


--------------------------------------------------------------------------------
/material/TDWG_Task_Group_Charter_Template_03.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/material/TDWG_Task_Group_Charter_Template_03.docx


--------------------------------------------------------------------------------
/material/desktop.ini:
--------------------------------------------------------------------------------
1 | [ . S h e l l C l a s s I n f o ]  
2 |  C o n f i r m F i l e O p = 0  
3 |  I c o n R e s o u r c e = C : \ U s e r s \ y a n i s \ A p p D a t a \ L o c a l \ T e m p \ d r i v e _ f s _ t d _ 2 _ 3 8 8 2 3 . i c o  
4 |  


--------------------------------------------------------------------------------
/vocabulary/old/HC_terms_2021-02-28.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/vocabulary/old/HC_terms_2021-02-28.csv


--------------------------------------------------------------------------------
/vocabulary/old/HC_terms_2021-11-17.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tdwg/hc/92e0ed94afceeea6a2d8ceb559da37f450ad007c/vocabulary/old/HC_terms_2021-11-17.csv


--------------------------------------------------------------------------------
/vocabulary/old/README.md:
--------------------------------------------------------------------------------
1 | # Folder "old"
2 | 
3 | This folder contains an archive of out-of-date vocabulary files.


--------------------------------------------------------------------------------