├── .here
├── results
    ├── results_index.csv
    ├── tables
    │   └── readme.md
    ├── figures
    │   └── readme.md
    ├── other
    │   └── readme.md
    └── readme.md
├── data
    ├── data_index.csv
    ├── derived
    │   ├── readme.md
    │   ├── public
    │   │   └── readme.md
    │   ├── sample
    │   │   └── readme.md
    │   └── private
    │       └── readme.md
    ├── raw
    │   ├── public
    │   │   └── readme.md
    │   └── private
    │       └── readme.md
    ├── scratch
    │   └── readme.md
    ├── metadata
    │   ├── readme.md
    │   └── metadata_template.md
    └── readme.md
├── docs
    ├── manuscript
    │   └── readme.md
    ├── report
    │   ├── readme.md
    │   └── analysis_plan.md
    ├── presentation
    │   └── readme.md
    └── readme.md
├── procedure
    ├── code
    │   ├── readme.md
    │   ├── 00-Python-environment-setup.ipynb
    │   ├── 01-R-markdown.Rmd
    │   └── 01-Jupyter_notebook.ipynb
    ├── protocols
    │   └── readme.md
    ├── procedure_index.csv
    ├── environment
    │   └── readme.md
    └── readme.md
├── .gitattributes
├── template_reference.bib
├── CITATION.cff
├── LICENSE
├── Template_LICENSE
├── .gitignore
├── readme.md
└── template_readme.md


/.here:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/results/results_index.csv:
--------------------------------------------------------------------------------
1 | path,name,description
2 | 


--------------------------------------------------------------------------------
/data/data_index.csv:
--------------------------------------------------------------------------------
1 | path,name,metadata,description
2 | 


--------------------------------------------------------------------------------
/results/tables/readme.md:
--------------------------------------------------------------------------------
1 | # Tables
2 | 
3 | Store data table results of the analysis here.
4 | 


--------------------------------------------------------------------------------
/results/figures/readme.md:
--------------------------------------------------------------------------------
1 | # Figures
2 | 
3 | Store graphic products (figures and maps) of the analysis here.
4 | 


--------------------------------------------------------------------------------
/docs/manuscript/readme.md:
--------------------------------------------------------------------------------
1 | # Manuscript
2 | 
3 | Store compiled manuscript for submission and publication here.
4 | 


--------------------------------------------------------------------------------
/docs/report/readme.md:
--------------------------------------------------------------------------------
1 | # Preanalysis Plan and Research Reports
2 | 
3 | Store the preanalysis plan and research report here.


--------------------------------------------------------------------------------
/docs/presentation/readme.md:
--------------------------------------------------------------------------------
1 | # Presentation
2 | 
3 | Store compiled presentations here. These may include presentations for conferences, public talks, lectures, etc.
4 | 


--------------------------------------------------------------------------------
/data/derived/readme.md:
--------------------------------------------------------------------------------
1 | # Derived Data
2 | 
3 | Save cleaned, preprocessed data here. Data in this folder should be ready for analysis or contain the final results of analysis.


--------------------------------------------------------------------------------
/data/derived/public/readme.md:
--------------------------------------------------------------------------------
1 | # Derived Public Data
2 | 
3 | Store preprocessed data here if the data is suitable for public redistribution and each data file is smaller than `100 MB`.
4 | 


--------------------------------------------------------------------------------
/procedure/code/readme.md:
--------------------------------------------------------------------------------
1 | # Code
2 | Store computational code-based research procedures here.
3 | Document an index of files stored here in [procedure_index.csv](../procedure_index.csv) in the root [procedure](../) folder.
4 | 


--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 | 
4 | # Count .Rmd files
5 | # https://github.com/github-linguist/linguist/blob/master/docs/overrides.md
6 | *.Rmd linguist-detectable
7 | 


--------------------------------------------------------------------------------
/results/other/readme.md:
--------------------------------------------------------------------------------
1 | # Other Research Outputs
2 | 
3 | Store other research outputs here. These may include data tables for publication, non-graphical and non-map images (e.g. photographs), audio, video, animation, or other media.
4 | 


--------------------------------------------------------------------------------
/procedure/protocols/readme.md:
--------------------------------------------------------------------------------
1 | # Protocols
2 | Store any non-computational protocols and research procedures here.
3 | Document an index of files stored here in [procedure_index.csv](../procedure_index.csv) in the root [procedure](../) folder.
4 | 


--------------------------------------------------------------------------------
/data/raw/public/readme.md:
--------------------------------------------------------------------------------
1 | # Raw Public Data
2 | 
3 | Store raw data as collected or downloaded here if the data is suitable for public redistribution and each data file is smaller than `100 MB`. Include any required licenses and citations in the `data/metadata` folder.


--------------------------------------------------------------------------------
/template_reference.bib:
--------------------------------------------------------------------------------
 1 | @misc{Kedron_Holler_2023,
 2 | title={Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences},
 3 | url={https://osf.io/w29mq},
 4 | DOI={10.17605/OSF.IO/W29MQ},
 5 | publisher={OSF},
 6 | author={Kedron, Peter and Holler, Joseph},
 7 | year={2023},
 8 | month={Jun}
 9 | }
10 | 


--------------------------------------------------------------------------------
/data/derived/sample/readme.md:
--------------------------------------------------------------------------------
1 | # Sample Data
2 | 
3 | Store deidentified, redistributable sample data here when the full data required for analysis cannot be publicly redistributed. Data in this folder may be randomized or simulated to provide some reproducibility in cases where preprocessed data must remain private.
4 | 


--------------------------------------------------------------------------------
/procedure/procedure_index.csv:
--------------------------------------------------------------------------------
1 | path,name,purpose
2 | environment,readme.md,set up computational environment
3 | code,script1.R,download and preprocess data
4 | protocols,survey_irb.pdf,Institutional review board protocol for survey sampling and instrument
5 | protocols,mapworkshop.pdf,participatory mapping workshop protocol
6 | code,script2.R,run analysis
7 | code,script3.R,generate visualizations for results
8 | 


--------------------------------------------------------------------------------
/results/readme.md:
--------------------------------------------------------------------------------
 1 | # Results
 2 | 
 3 | Store final research outputs, e.g. figures, tables, or other media for publications and presentations.
 4 | 
  5 | Complete the [results_index.csv](results_index.csv) file indexing each results file, including the fields:
 6 | 
 7 | - `path`: the path from results folder, e.g., `figures`, `other`, or `tables`
 8 | - `name`: the file name
 9 | - `description`: very brief description or figure title
10 | 
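
The index described above could be drafted automatically before descriptions are written by hand. The following is a minimal Python sketch, not part of the template itself; the function names `build_results_index` and `write_results_index` are hypothetical, and it assumes that readme files and the index itself should be left out of the listing.

```python
import csv
from pathlib import Path


def build_results_index(results_dir):
    """Return index rows (path, name, description) for files under results_dir."""
    root = Path(results_dir)
    rows = []
    for file in sorted(root.rglob("*")):
        # Skip folders, readme files, and the index itself (an assumption)
        if not file.is_file() or file.name in ("readme.md", "results_index.csv"):
            continue
        rows.append({
            "path": file.parent.relative_to(root).as_posix(),
            "name": file.name,
            "description": "",  # left blank, to be filled in by hand
        })
    return rows


def write_results_index(results_dir):
    """Write results_index.csv in results_dir and return the rows written."""
    rows = build_results_index(results_dir)
    with open(Path(results_dir) / "results_index.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "name", "description"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `write_results_index("results")` from the repository root would overwrite any existing index, so hand-written descriptions should be backed up or merged first.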


--------------------------------------------------------------------------------
/data/scratch/readme.md:
--------------------------------------------------------------------------------
 1 | # Scratch Data
 2 | 
  3 | Store scratch data used in intermediary processing here. This is also a good place to store specialized databases that cannot be version controlled or redistributed. *This folder is ignored by Git versioning*, with the exception of this `readme.md` file, because of the following lines in `.gitignore`:
 4 | 
 5 | ```gitignore
 6 | # Ignore contents of scratch folder, with the exception of its readme file
  7 | data/scratch/**
  8 | !readme.md
 9 | ```
10 | 


--------------------------------------------------------------------------------
/docs/readme.md:
--------------------------------------------------------------------------------
1 | ## Reporting Reproducible Research
2 | Use the resources in this folder to develop a pre-registration plan, document a reproduction or replication, and organize your manuscript.
3 | 
4 | Also, use this `docs` folder to store websites related to the research project.
5 | In some cases, this `readme.md` document may interfere with GitHub websites.
6 | If this is the case and you have already stored website files in this directory, this `readme.md` file may be renamed or deleted.
7 | 


--------------------------------------------------------------------------------
/data/metadata/readme.md:
--------------------------------------------------------------------------------
1 | # Metadata
2 | 
3 | Organize and store documentation and metadata in this folder.
4 | Metadata files should be listed for relevant data sources in [data/data_index.csv](../data_index.csv).
5 | 
6 | Best practice for geographic metadata is to create XML compliant with the ISO 191** series of standards for geospatial metadata.
7 | A more human- and GitHub-readable markdown form of the ISO 191** geospatial metadata is provided in [metadata_template.md](metadata_template.md).
8 | 


--------------------------------------------------------------------------------
/data/derived/private/readme.md:
--------------------------------------------------------------------------------
 1 | # Derived Private Data
 2 | 
  3 | Store preprocessed data here if the data is required for analysis but cannot be publicly redistributed. For example, data versioning and sharing may be restricted because of large file sizes, licensing, ethics, privacy, or confidentiality.
 4 | 
 5 | If data for analysis must be guarded in this way, **the authors are encouraged to provide a sample deidentified dataset** in the `data/derived/sample` directory.
 6 | 
  7 | *This folder is ignored by Git versioning*, with the exception of this `readme.md` file, because of the following lines in `.gitignore`:
 8 | 
 9 | ```gitignore
 10 | # Ignore contents of derived private folder, with the exception of its readme file
 11 | data/derived/private/**
 12 | !readme.md
13 | ```
14 | 


--------------------------------------------------------------------------------
/procedure/environment/readme.md:
--------------------------------------------------------------------------------
 1 | # Environment
 2 | 
 3 | Store detailed information about the hardware and software environment requirements for procedures and code here. You may also document a recipe or container of the computational environment here.
 4 | 
 5 | This directory is specifically for hardware and software environments.
 6 | Contextual factors or confounds of human subjects research or field research should be communicated in protocol documents and stored in the `protocols` directory.
 7 | 
 8 | For users of R, our template code, at a minimum, saves environment information using the `sessionInfo()` function.
 9 | 
10 | ## Set up instructions
11 | 
12 | Researchers are encouraged to write instructions on setting up or accessing the computational environment for their study here.
13 | 
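
For Python users, a rough counterpart to R's `sessionInfo()` could be recorded with the standard library alone. This is a minimal sketch under stated assumptions: the function name `environment_info` is hypothetical, and the package names passed to it are placeholders for whatever the study actually imports.

```python
import json
import platform
import sys
from importlib import metadata


def environment_info(packages=()):
    """Collect the interpreter version, platform, and selected package versions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record packages that are not installed instead of failing
            info["packages"][name] = None
    return info


if __name__ == "__main__":
    # Placeholder package names; replace with the study's real dependencies
    print(json.dumps(environment_info(["numpy", "pandas"]), indent=2))
```

The printed JSON could be saved in this folder alongside any container recipe or set up instructions.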


--------------------------------------------------------------------------------
/data/readme.md:
--------------------------------------------------------------------------------
 1 | # Data
 2 | 
 3 | Store all of your research data in subdirectories here.
 4 | 
  5 | Complete the [data_index.csv](data_index.csv) file indexing each data file, including the fields:
 6 | 
  7 | - `path`: the path to the data folder, likely one of: `raw/private`, `raw/public`, `derived/private` or `derived/public`
  8 | - `name`: the file name, including extension
  9 | - `metadata`: list of metadata files for this data source, stored in the `data/metadata` folder. These may include ISO-191** or FGDC standard `XML` files, data dictionaries, licenses or attributions, user guides, webpage printouts, etc.
10 | - `description`: *very* brief description of the dataset. If the data is **simulated**,  **randomized**, or represents only a limited **sample** of the full research dataset, you should note those limitations here.
11 | 
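
The fields listed above lend themselves to a simple consistency check. The sketch below is one possible approach rather than part of the template; the function name `check_data_index` is hypothetical, and it assumes the index uses the header `path,name,metadata,description` and that `path` is relative to the `data` folder.

```python
import csv
from pathlib import Path

EXPECTED_FIELDS = ["path", "name", "metadata", "description"]


def check_data_index(data_dir):
    """Return a list of problems found in data_dir/data_index.csv."""
    data_dir = Path(data_dir)
    problems = []
    with open(data_dir / "data_index.csv", newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_FIELDS:
            problems.append(f"unexpected header: {reader.fieldnames}")
            return problems
        for line_no, row in enumerate(reader, start=2):
            # Accept either slash style in the path column
            rel = (row["path"] or "").replace("\\", "/")
            name = row["name"] or ""
            if not (data_dir / rel / name).is_file():
                problems.append(f"line {line_no}: {rel}/{name} not found")
            if not (row["description"] or "").strip():
                problems.append(f"line {line_no}: empty description")
    return problems
```

Files in the private folders are not versioned, so a "not found" message for those rows is expected on a fresh clone and does not necessarily indicate an error in the index.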


--------------------------------------------------------------------------------
/data/raw/private/readme.md:
--------------------------------------------------------------------------------
 1 | # Raw Private Data
 2 | 
  3 | Store raw data in this folder as it is collected or downloaded if the data cannot be publicly redistributed. For example, data versioning and sharing may be restricted because of large file sizes, licensing, ethics, privacy, or confidentiality. Best practices are to include code to automate the process of downloading or simulating raw private data in the first step of the methods, or to include instructions here for accessing any private or restricted-access data.
 4 | 
 5 | ## Instructions for accessing data
 6 | Include instructions here, e.g., instructions to run the first code script in the `procedure/code` folder, or instructions on how to create or download the data.
 7 | 
  8 | *This folder is ignored by Git versioning*, with the exception of this `readme.md` file, because of the following lines in `.gitignore`:
 9 | 
10 | ```gitignore
 11 | # Ignore contents of raw private folder, with the exception of its readme file
 12 | data/raw/private/**
 13 | !readme.md
14 | ```
15 | 


--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
 1 | cff-version: 1.2.0
 2 | message: "If you use this template, please cite it as below."
 3 | authors:
 4 | - family-names: "Kedron"
 5 |   given-names: "Peter"
 6 |   orcid: "https://orcid.org/0000-0002-1093-3416"
 7 | - family-names: "Holler"
 8 |   given-names: "Joseph"
 9 |   orcid: "https://orcid.org/0000-0002-2381-2699"
10 | title: "Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences"
11 | version: 0.1
12 | doi: 10.17605/OSF.IO/5FGMC
13 | date-released: 2021-08-22
14 | url: "https://github.com/HEGSRR/HEGSRR-Template"
 15 | license: BSD-3-Clause
16 | preferred-citation:
17 |   type: generic
18 |   authors:
19 |   - family-names: "Kedron"
20 |     given-names: "Peter"
21 |     orcid: "https://orcid.org/0000-0002-1093-3416"
22 |   - family-names: "Holler"
23 |     given-names: "Joseph"
24 |     orcid: "https://orcid.org/0000-0002-2381-2699"
25 |   title: "Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences"
26 |   doi: 10.17605/OSF.IO/W29MQ
27 |   year: 2021
28 |   url: "https://github.com/HEGSRR/HEGSRR-Template"
29 | 


--------------------------------------------------------------------------------
/procedure/readme.md:
--------------------------------------------------------------------------------
 1 | # Procedure
 2 | Catalog all procedures used here in an *ordered* table documenting any code or other research procedure/protocol documents. Provide a brief description of the purpose of each procedure and piece of code.
 3 | 
 4 | Catalog the files in [procedure_index.csv](procedure_index.csv)
 5 | 
 6 | See the example table below, and modify the table to suit your research design.
 7 | 
  8 | - `path`: the path to the file or directory, usually one of `code` for software code and scripts, `environment` for the hardware/software computational environment, or `protocols` for non-code protocols
 9 | - `name`: the file name, including extension
10 | - `purpose`: *very* brief description of the purpose of the file
11 | 
12 | The *sequence* of procedures to be followed is implied by the *order* in the table and should be explicit in the pre-analysis plan and post-analysis report.
13 | 
14 | path | name | purpose |
15 | -- | -- | -- |
16 | environment | readme.md | set up computational environment |
17 | code | script1.R | download and preprocess data |
 18 | protocols | survey_irb.pdf | Institutional review board protocol for survey sampling and instrument |
 19 | protocols | mapworkshop.pdf | participatory mapping workshop protocol |
20 | code | script2.R | run analysis |
21 | code | script3.R | generate visualizations for results |
22 | 
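
Because the sequence of procedures is implied by row order, the index can be read back as a numbered list of steps. The following Python sketch illustrates that idea under simple assumptions; the function name `ordered_procedures` is hypothetical, and the example rows are copied from the table above.

```python
import csv
import io


def ordered_procedures(csv_text):
    """Return numbered steps in the order the rows appear in the index."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{step}. {row['path']}/{row['name']}: {row['purpose']}"
        for step, row in enumerate(reader, start=1)
    ]


# Two example rows taken from the table above
example = """path,name,purpose
environment,readme.md,set up computational environment
code,script1.R,download and preprocess data
"""

for line in ordered_procedures(example):
    print(line)
```

Reading the real file would only require passing the text of `procedure_index.csv` instead of the inline example.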


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | BSD 3-Clause License
 2 | 
 3 | Copyright (c) 2021, Peter Kedron and Joseph Holler
 4 | All rights reserved.
 5 | 
 6 | Redistribution and use in source and binary forms, with or without
 7 | modification, are permitted provided that the following conditions are met:
 8 | 
 9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | 3. Neither the name of the copyright holder nor the names of its
17 |    contributors may be used to endorse or promote products derived from
18 |    this software without specific prior written permission.
19 | 
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 


--------------------------------------------------------------------------------
/Template_LICENSE:
--------------------------------------------------------------------------------
 1 | BSD 3-Clause License
 2 | 
 3 | Copyright (c) 2021, Peter Kedron and Joseph Holler
 4 | All rights reserved.
 5 | 
 6 | Redistribution and use in source and binary forms, with or without
 7 | modification, are permitted provided that the following conditions are met:
 8 | 
 9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | 3. Neither the name of the copyright holder nor the names of its
17 |    contributors may be used to endorse or promote products derived from
18 |    this software without specific prior written permission.
19 | 
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | # Ignore contents of derived private folder, with the exception of its readme file
  2 | data/derived/private/**
  3 | !readme.md
  4 | 
  5 | # Ignore contents of raw private folder, with the exception of its readme file
  6 | data/raw/private/**
  7 | !readme.md
  8 | 
  9 | # Ignore contents of scratch folder, with the exception of its readme file
 10 | data/scratch/**
 11 | !readme.md
 12 | 
 13 | # Ignore Microsoft Office system files
 14 | *.tmp
 15 | ~$*.doc*
 16 | Backup of *.doc*
 17 | ~$*.ppt*
 18 | ~$*.xls*
 19 | *.xlt
 20 | 
 21 | # Ignore common R files
 22 | .Rproj.user
 23 | .Rhistory
 24 | .RData
 25 | .RDataTmp*
 26 | .Rapp.history
 27 | 
 28 | # Jupyter Notebook
 29 | .ipynb_checkpoints
 30 | 
 31 | # IPython
 32 | profile_default/
 33 | ipython_config.py
 34 | 
 35 | # Byte-compiled / optimized / DLL files
 36 | # https://github.com/github/gitignore/blob/main/Python.gitignore
 37 | __pycache__/
 38 | *.py[cod]
 39 | *$py.class
 40 | 
 41 | # C extensions
 42 | *.so
 43 | 
 44 | # Python Environments
 45 | .env
 46 | .venv
 47 | env/
 48 | venv/
 49 | ENV/
 50 | env.bak/
 51 | venv.bak/
 52 | pipfile*
 53 | 
 54 | # General MacOS files
 55 | .DS_Store
 56 | .AppleDouble
 57 | .LSOverride
 58 | 
 59 | # Thumbnails
 60 | ._*
 61 | 
 62 | # Files that might appear in the root of a volume
 63 | .DocumentRevisions-V100
 64 | .fseventsd
 65 | .Spotlight-V100
 66 | .TemporaryItems
 67 | .Trashes
 68 | .VolumeIcon.icns
 69 | .com.apple.timemachine.donotpresent
 70 | 
 71 | # Directories potentially created on remote AFP share
 72 | .AppleDB
 73 | .AppleDesktop
 74 | Network Trash Folder
 75 | Temporary Items
 76 | .apdisk
 77 | 
 78 | # Windows thumbnail cache files
 79 | Thumbs.db
 80 | Thumbs.db:encryptable
 81 | ehthumbs.db
 82 | ehthumbs_vista.db
 83 | 
 84 | # Dump file
 85 | *.stackdump
 86 | 
 87 | # Folder config file
 88 | [Dd]esktop.ini
 89 | 
 90 | # Recycle Bin used on file shares
 91 | $RECYCLE.BIN/
 92 | 
 93 | # Windows Installer files
 94 | *.cab
 95 | *.msi
 96 | *.msix
 97 | *.msm
 98 | *.msp
 99 | 
100 | # Windows shortcuts
101 | *.lnk
102 | 
103 | # Geopackage temporary files
104 | *.gpkg-shm
105 | *.gpkg-wal
106 | 
107 | # Icloud
108 | *.icloud
109 | 
110 | # Jupyter cache folder
111 | cache/**
112 | 


--------------------------------------------------------------------------------
/data/metadata/metadata_template.md:
--------------------------------------------------------------------------------
 1 | - `Title`: Title of data source
 2 | - `Abstract`: Brief description of the data source
  3 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.
  4 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid
  5 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/), e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc.
 6 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
 7 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
  8 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
 9 | - `Lineage`: Describe and/or cite data sources and/or methodological steps taken or planned to create this data source, e.g.:
10 |   - sampling scheme, including spatial sampling
11 |   - target sample size and method for determining sample size
12 |   - stopping criteria for data collection and sampling (e.g. sample size, time elapsed)
13 |   - de-identification / anonymization
14 |   - experimental manipulation
 15 | - `Distribution`: Describe who will make the data available and how.
16 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
17 | - `Data Quality`: State any planned quality assessment
18 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
19 |   - `Label`: variable name as used in the data or code
20 |   - `Alias`: intuitive natural language name
21 |   - `Definition`: Short description or definition of the variable. Include measurement units in description.
22 |   - `Type`: data type, e.g. character string, integer, real
23 |   - `Accuracy`: e.g. uncertainty of measurements
 24 |   - `Domain`: Expected range (minimum and maximum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook
 25 |   - `Missing Data Value(s)`: Values used to represent missing data
26 |   - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected
27 | 
28 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
29 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
30 | | variable1 | ... | ... | ... | ... | ... | ... | ... |
31 | | variable2 | ... | ... | ... | ... | ... | ... | ... |
32 | 
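
When a data source has many variables, the table above could be generated from structured records rather than typed by hand. This is a minimal Python sketch, not part of the template; the function name `variables_table` is hypothetical, and it assumes each variable is a dictionary keyed by the column names shown above, with `...` standing in for values not yet documented.

```python
COLUMNS = [
    "Label", "Alias", "Definition", "Type", "Accuracy", "Domain",
    "Missing Data Value(s)", "Missing Data Frequency",
]


def variables_table(variables):
    """Render a list of variable dictionaries as a markdown table."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "| " + " | ".join([":--:"] * len(COLUMNS)) + " |",
    ]
    for var in variables:
        # Fields that are not yet documented are shown as "..."
        cells = [str(var.get(col, "...")) for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical variable used only for illustration
print(variables_table([{"Label": "pop", "Type": "integer"}]))
```

Keeping the records in a data dictionary file and rendering the table from it is one way to avoid the two drifting apart.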


--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
 1 | If you use this template for research, please [cite it](template_reference.bib):
 2 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ
 3 | 
 4 | # Title of Study
 5 | 
 6 | ## Contributors
 7 | 
 8 | - First Name Last Name\*, email address, @githubname, ORCID link, affiliated institution(s)
 9 | - First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s)
10 | 
11 | \* Corresponding author and creator
12 | 
13 | ## Abstract
14 | 
15 | Write a brief abstract about your research project.
 16 | If the project is a reproduction or replication study, include the full citation of the prior study in a statement such as:
17 | This study is a *reproduction/replication* of:
18 | 
19 | > citation to prior study
20 | 
21 | A graphical abstract of the study could also be included as an image here.
22 | 
23 | ## Study Metadata
24 | 
25 | - `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
26 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)
27 | - `Date created`: date when project was started
28 | - `Date modified`: date of most recent revision
 29 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.
 30 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid
31 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
32 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
 33 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
34 | - `Funding Name`: name of funding for the project
35 | - `Funding Title`: title of project grant
36 | - `Award info URI`: web address for award information
37 | - `Award number`: award number
38 | 
39 | ## Related to
40 | 
41 | - `OSF Project`:
42 | - `Pre-analysis Registration`:
43 | - `Post-analysis Report Registration`:
44 | - `Preprint`:
45 | - `Conference Presentation`:
46 | - `Publication`:
47 | - `Prior Study`:
48 | - `...`:
49 | 
50 | ## Metadata for access
51 | 
52 | - `Rights`: [LICENSE](LICENSE): BSD 3-Clause "New" or "Revised"
53 | - `Resource type`: Collection
54 | - `Resource language`: English
55 | - `Conforms to`: Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences version 1.0, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)
56 | 
57 | ## Compendium structure and contents
58 | 
59 | This research compendium is structured with four main directories:
60 | 
61 | - `data`: contains subdirectories for `raw` data and `derived` data.
62 | - `docs`: contains subdirectories for `manuscript`, `presentation`, and `report`
63 | - `procedure`: contains subdirectories for `code` or software scripts, information about the computational `environment` in which the research was conducted, and non-code research `protocols`
64 | - `results`: contains subdirectories for `figures`, formatted data `tables`, or `other` formats of research results.
65 | 
66 | The data, procedures, and results of this repository are outlined in three tables:
67 | - Data: [data/data_index.csv](data/data_index.csv)
68 | - Procedures: [procedure/procedure_index.csv](procedure/procedure_index.csv)
69 | - Results: [results/results_index.csv](results/results_index.csv)
70 | 
71 | Important local **documents** include:
72 | - Pre-analysis plan: [docs/report/preanalysis.pdf](docs/report/preanalysis.pdf)
73 | - Study report: [docs/report/report.pdf](docs/report/report.pdf)
74 | - Manuscript: [docs/manuscript/manuscript.pdf](docs/manuscript/manuscript.pdf)
75 | - Presentation: [docs/presentation/presentation.pdf](docs/presentation/presentation.pdf)
76 | 
77 | #### Compendium reference
78 | 
79 | The [template_readme.md](template_readme.md) file contains more information on the design of this template and references used in the design.
80 | The [Template_LICENSE](Template_LICENSE) file provides the BSD 3-Clause license for using this template.
81 | To cite the template, please use [template_reference.bib](template_reference.bib) or:
82 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ
83 | 


--------------------------------------------------------------------------------
/template_readme.md:
--------------------------------------------------------------------------------
 1 | # Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences
 2 | 
 3 | This template Git repository contains a folder structure, template documents, and best practice suggestions for conducting geographic research with a reproducible research compendium.
 4 | The main [readme.md](readme.md) contains information about the research study.
 5 | 
 6 | The [Template_LICENSE](Template_LICENSE) file provides the BSD 3-Clause license for using this template.
 7 | To cite the template, please use [template_reference.bib](template_reference.bib) or:
 8 | > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ
 9 | 
10 | The folder structure presented here can be used to:
11 | 
12 | 1. pre-register, document, and share original research in a reproducible manner, or
13 | 2. document and share a reproduction and/or replication of original research.
14 | 
15 | An overview of the folder structure of this repository is provided below. The `readme.md` file contained in each folder provides details about the purpose of that folder and suggestions on its use.
16 | The authors should maintain the [data/data_index.csv](data/data_index.csv) file to list all raw and derived data, the [procedure/procedure_index.csv](procedure/procedure_index.csv) file with an ordered list of all procedures and/or code, and the [results/results_index.csv](results/results_index.csv) file with a list of all figures, tables, and other media produced by the research.
17 | The `docs/report/` folder contains templates to facilitate 1) pre-registering research plans, and 2) reporting the complete details of original, reproduction, or replication studies.
18 | 
19 | ## Repository Overview
20 | 
21 |     <Study Name>
22 |     |- docs/                # study documentation
23 |     |  +- report/           # reproduction plan, reproduction report
24 |     |  +- manuscript/       # manuscript components
25 |     |  +- presentation/     # presentation materials
26 |     |
27 |     |- data/                # study data
28 |     |  +- raw/              # raw data, should not be altered
29 |     |    +- public/         # public data with version control
30 |     |    +- private/        # private data with no version control
31 |     |  +- derived/          # derived data
32 |     |    +- public/         # public data with version control
33 |     |    +- private/        # private data with no version control
34 |     |  +- scratch/          # temporary files that can be safely deleted or lost
35 |     |  +- metadata/         # documentation of metadata
36 |     |
37 |     |- procedure/
38 |     |  +- environment/      # details of the computational environment
39 |     |  +- code/             # any programmatic code, clearly named and commented
40 |     |  +- protocols/        # any non-computational protocols
41 |     |
42 |     |- results/             # all output from workflows and analyses
43 |     |  +- figures/          # graphs, likely designated for manuscript
44 |     |  +- tables/           # tables, likely designated for manuscript  
45 |     |  +- other/            # diagrams, images, and other non-graph graphics
46 |     |
47 |     |- readme.md            # description of the study
48 |     |- template_readme.md   # description of repository design and references
49 |     |- LICENSE              # intellectual property license, ideally open source
50 |     |- Template_LICENSE     # BSD 3-Clause license for this template
51 |     |- CITATION.cff         # preferred citation for the research
52 |     |- .gitignore           # files to ignore from git tracking
53 | 
54 | ## Reproducible Research Practices
55 | 
56 | Every research project is different. This repository is designed to serve as a flexible guide capable of structuring work completed throughout the lifecycle of different types of research projects.
57 | No matter the project type, a few key suggested practices when using this repository include:
58 | 
59 | - Register your pre-analysis plan with a service like the Open Science Framework at [https://osf.io/](https://osf.io/) or an equivalent, and add crosslinks between your research repository and the pre-registered plan.
60 | - Keep original, raw data in the `data/raw` folder. Do not alter these files during data analysis.
61 | - Keep data derived from the raw data (e.g. subsets) separate from the raw data in the `data/derived` folder.
62 | - Keep exploratory/experimental outputs in the `data/scratch` folder. *Files in this folder should be able to be deleted without negatively impacting the project*.
63 | - Limit manual changes to data. *Conduct as much data processing and analysis as possible with code*.
64 | - Maintain well-commented and human-readable code, e.g. following the [tidyverse style guide for R](https://style.tidyverse.org/) or the [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/).
65 | - Create a top-level `Makefile` or R Markdown file that documents computational work in executable form, and/or provide clear comments and instructions in the header of each procedure and code file and good descriptions in the `procedure_index.csv`.
66 | - Document and/or package the computational environment in the `procedure/environment` folder.
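
One way to confirm that raw data stays unaltered is to record a checksum for every file in `data/raw` before analysis and compare it again afterwards. The sketch below is an illustration under that assumption, not a required part of the template; the function names `sha256_of`, `raw_data_manifest`, and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def raw_data_manifest(raw_dir="data/raw"):
    """Map each file under raw_dir (relative POSIX path) to its SHA-256 digest."""
    raw_dir = Path(raw_dir)
    return {p.relative_to(raw_dir).as_posix(): sha256_of(p)
            for p in sorted(raw_dir.rglob("*")) if p.is_file()}


def changed_files(before, after):
    """Return paths that were added, removed, or modified between two manifests."""
    return sorted(k for k in before.keys() | after.keys()
                  if before.get(k) != after.get(k))
```

A manifest taken at the start of the project could be stored alongside the data metadata and compared with a fresh one before publication; an empty result from `changed_files` suggests the raw files were not touched.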
67 | 
68 | ## References
69 | 
70 | The structure of this repository closely follows the excellent [rr-init](https://github.com/Reproducible-Science-Curriculum/rr-init) repository, which in turn follows Noble [(2009)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424).
71 | We have also incorporated structural ideas from [Gandrud (2015)](http://christophergandrud.github.io/RepResR-RStudio/) and Camerer et al. ([2016](https://osf.io/pfdyw/), [2018](https://osf.io/bzm54/)).
72 | 
73 | ### Pre-registration Template
74 | 
75 | A pre-registration template for studies involving geographic analyses.
76 | This template is modelled on similar templates developed by the [Open Science Framework (OSF)](http://osf.io/x5w7h), [AsPredicted](https://osf.io/fnsb6/), the [prereg package](https://github.com/crsh/prereg), and Van den Akker et al. [(2019)](https://doi.org/10.31234/osf.io/hvfmr).
77 | The OSF template is our most direct source.
78 | This template can be used to transparently plan and pre-register original geographic research. [Cite the OSF preregistration template and the licenses](https://osf.io/preprints/metaarxiv/epgjd/)
79 | 
80 | ### Reproduction and Replication Template
81 | 
82 | A template to facilitate the documentation and reporting of reproductions and replications of original geographic research.
83 | Stylistically, this template follows the [ReScience article template](https://github.com/ReScience/template), but also draws inspiration from Camerer et al. ([2016](https://osf.io/pfdyw/), [2018](https://osf.io/bzm54/)).
84 | Following Camerer et al., we suggest using the template to first document and share the procedures of the planned reproduction/replication before re-analysis begins.
85 | After the reproduction/replication is complete, we suggest then completing the template and sharing the report alongside the originally published planning document.
86 | 
87 | Other examples of registered replication reports are available from the [Reproducibility Project](https://osf.io/s3hfr/), [registered replication projects](https://www.psychologicalscience.org/publications/replication/ongoing-projects) published by the Association for Psychological Science, and ReScience [C](http://rescience.github.io/) and [X](http://rescience.org/x).
88 | Users may also be interested in the [Transparency and Openness Promotion (TOP) Guidelines](https://www.cos.io/initiatives/top-guidelines), the [replication policy](https://royalsocietypublishing.org/rsos/replication-studies) of the Royal Society, or this example web-based [reproducibility workflow](https://odmap.wsl.ch/) for species distribution models which the authors converted into a web-based report generator.
89 | 


--------------------------------------------------------------------------------
/docs/report/analysis_plan.md:
--------------------------------------------------------------------------------
  1 | # Title of Study
  2 | 
  3 | ### Authors
  4 | 
  5 | - First Name Last Name\*, email address, @githubname, ORCID link, affiliated institution(s)
  6 | - First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s)
  7 | 
  8 | \* Corresponding author and creator
  9 | 
 10 | ### Abstract
 11 | 
 12 | Write a brief abstract about your research project.
 13 | 
 14 | If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study.
 15 | For example:
 16 | 
 17 | This study is a *replication* of:
 18 | 
 19 | > citation to prior study
 20 | 
 21 | A graphical abstract of the study could also be included as an image here.
 22 | 
 23 | ### Study metadata
 24 | 
 25 | - `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
 26 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)
 27 | - `Date created`: date when project was started
 28 | - `Date modified`: date of most recent revision
 29 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
30 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid
 31 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
 32 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
33 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
 34 | - `Funding Name`: name of funding for the project
 35 | - `Funding Title`: title of project grant
 36 | - `Award info URI`: web address for award information
 37 | - `Award number`: award number
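
For the `Spatial Coverage` field, a bounding box can be written as well known text (WKT) by hand or generated from the four bounds. The sketch below assumes x/y (for example longitude/latitude) axis order and uses a hypothetical helper name, `bbox_to_wkt`; the coordinates in the usage note are illustrative only.

```python
def bbox_to_wkt(min_x, min_y, max_x, max_y):
    """Return a WKT POLYGON for a bounding box.

    The ring starts at the lower-left corner, runs counter-clockwise,
    and repeats the first vertex so that the ring is closed.
    """
    if min_x > max_x or min_y > max_y:
        raise ValueError("minimum coordinates must not exceed maximum coordinates")
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y),
               (min_x, max_y), (min_x, min_y)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON (({ring}))"
```

For example, `bbox_to_wkt(-73.5, 42.7, -71.5, 45.0)` yields `POLYGON ((-73.5 42.7, -71.5 42.7, -71.5 45.0, -73.5 45.0, -73.5 42.7))`. State the spatial reference system alongside the WKT, since the text itself carries no coordinate system information.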
 38 | 
 39 | #### Original study spatio-temporal metadata
 40 | 
 41 | - `Spatial Coverage`: extent of original study
 42 | - `Spatial Resolution`: resolution of original study
 43 | - `Spatial Reference System`: spatial reference system of original study
 44 | - `Temporal Coverage`: temporal extent of original study
 45 | - `Temporal Resolution`: temporal resolution of original study
 46 | 
 47 | ## Study design
 48 | 
 49 | Describe how the study relates to prior literature, e.g. is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**?
 50 | 
 51 | Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**?
 52 | 
 53 | Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.
 54 | 
 55 | ## Materials and procedure
 56 | 
 57 | ### Computational environment
 58 | 
 59 | Define the hardware, operating system, and software requirements for the research.
 60 | Include citations to important software projects, plugins or packages and their versions.
 61 | 
 62 | ### Data and variables
 63 | 
 64 | Describe the **data sources** and **variables** to be used.
 65 | Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**.
 66 | For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.
 67 | 
 68 | Primary data sources for the study are to include ... .
 69 | Secondary data sources for the study are to include ... .
 70 | 
 71 | Each of the next subsections describes one data source.   
 72 | Complete standardized metadata for each data source. Either programmatically include standard metadata files into the analysis report or copy and fill out one metadata form (shown below for primary data source 1) for each data source.
 73 | 
 74 | #### Primary data source1 name
 75 | 
 76 | - `Abstract`: Brief description of the data source
 77 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
 78 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or the cell size of a raster grid
 79 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/), e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc.
 80 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
 81 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
 82 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations
 83 | - `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source.
 84 |   - sampling scheme, including spatial sampling
 85 |   - target sample size and method for determining sample size
 86 |   - stopping criteria for data collection and sampling (e.g. sample size, time elapsed)
 87 |   - de-identification / anonymization
 88 |   - experimental manipulation
 89 | - `Distribution`: Describe who will make the data available and how?
 90 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
 91 | - `Data Quality`: State any planned quality assessment
 92 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
 93 |   - `Label`: variable name as used in the data or code
 94 |   - `Alias`: intuitive natural language name
 95 |   - `Definition`: Short description or definition of the variable. Include measurement units in description.
 96 |   - `Type`: data type, e.g. character string, integer, real
 97 |   - `Accuracy`: e.g. uncertainty of measurements
 98 |   - `Domain`: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
 99 |   - `Missing Data Value(s)`: Values used to represent missing data
100 |   - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected
101 | 
102 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
103 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
104 | | variable1 | ... | ... | ... | ... | ... | ... | ... |
105 | | variable2 | ... | ... | ... | ... | ... | ... | ... |
106 | 
107 | #### Primary data source2 name
108 | 
109 | ... same form as above...
110 | 
111 | #### Secondary data source1 name
112 | 
113 | ... same form as above...
114 | 
115 | #### Secondary data source2 name
116 | 
117 | ... same form as above...
118 | 
119 | ### Prior observations  
120 | 
121 | Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. through p-hacking.
122 | Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below:
123 | 
124 | At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regards to the ____ phenomena to be studied.
125 | This study is related to ____ prior studies by the authors.
126 | 
127 | For each primary data source, declare the extent to which authors had already engaged with the data:
128 | 
129 | - [ ] no data collection has started
130 | - [ ] pilot test data has been collected
131 | - [ ] data collection is in progress and data has not been observed
132 | - [ ] data collection is in progress and __% of data has been observed
133 | - [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data.
134 | 
135 | For each secondary source, declare the extent to which authors had already engaged with the data:
136 | 
137 | - [ ] data is not available yet
138 | - [ ] data is available, but only metadata has been observed
139 | - [ ] metadata and descriptive statistics have been observed
140 | - [ ] metadata and a pilot test subset or sample of the full dataset have been observed
141 | - [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.
142 | 
143 | If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.
144 | 
145 | ### Bias and threats to validity
146 | 
147 | Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.
148 | 
149 | These include:
150 |   - uneven primary data collection due to geographic inaccessibility or other constraints
151 |   - multiple hypothesis testing
152 |   - edge or boundary effects
153 |   - the modifiable areal unit problem
154 |   - nonstationarity
155 |   - spatial dependence or autocorrelation
156 |   - temporal dependence or autocorrelation
157 |   - spatial scale dependency
158 |   - spatial anisotropies
159 |   - confusion of spatial and a-spatial causation
160 |   - ecological fallacy
161 |   - uncertainty e.g. from spatial disaggregation, anonymization, differential privacy
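
As an illustration of how one of these threats might be screened for, the sketch below computes global Moran's I, a common measure of spatial autocorrelation, from a list of values and a square spatial weights matrix. It is a plain-Python sketch under simple assumptions (a dense weights matrix supplied by the analyst, no significance test); the function name `morans_i` is hypothetical, and an analysis plan may well specify an established spatial statistics package instead.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    if any(len(row) != n for row in weights) or len(weights) != n:
        raise ValueError("weights must be an n-by-n matrix matching values")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("weights must not sum to zero and values must vary")
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / s0) * (num / denom)


# Hypothetical example: four units in a row, neighbours share an edge.
line_weights = [[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]]
trend = morans_i([1, 2, 3, 4], line_weights)          # similar neighbours
alternating = morans_i([1, -1, 1, -1], line_weights)  # dissimilar neighbours
```

Positive values indicate that neighbouring units tend to be similar, and negative values that they tend to differ; the plan should also state how the weights are defined and how significance will be assessed.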
162 | 
163 | ### Data transformations
164 | 
165 | Describe all data transformations planned to prepare data sources for analysis.
166 | This section should explain with the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation, to the pre-processed **derived** state ready for the main analysis.
167 | Include steps to check and mitigate sources of **bias** and **threats to validity**.
168 | The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test.
169 | More specifically, describe all the **geographic** and **variable** transformations required to prepare the input data described in the data and variables section above to match the study's spatio-temporal characteristics as described in the study metadata and study design sections.
170 | Visual workflow diagrams may help communicate the methodology in this section.
171 | 
172 | Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.
173 | 
174 | Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.
175 | 
176 | Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.
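
A planned variable transformation can be stated precisely by pairing the description with a short code sketch. The example below standardizes a variable to z-scores while excluding missing-data codes; the sentinel `-9999`, the use of the sample standard deviation, and the function name `standardize` are all illustrative assumptions rather than template requirements.

```python
from statistics import mean, stdev


def standardize(values, missing_codes=(None, -9999)):
    """Return z-scores for `values`, leaving missing observations as None.

    The mean and sample standard deviation are computed from observed
    values only, i.e. values not listed in `missing_codes`.
    """
    observed = [v for v in values if v not in missing_codes]
    if len(observed) < 2:
        raise ValueError("at least two observed values are required")
    centre, spread = mean(observed), stdev(observed)
    if spread == 0:
        raise ValueError("observed values must vary")
    return [None if v in missing_codes else (v - centre) / spread
            for v in values]
```

For instance, `standardize([1, 2, 3, -9999])` returns `[-1.0, 0.0, 1.0, None]`. Writing the exclusion rule into the function makes the treatment of missing data explicit at the pre-registration stage.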
177 | 
178 | ### Analysis
179 | 
180 | Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.
181 | This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.
182 | Also explain any follow-up analyses or validations.
183 | 
184 | ## Results
185 | 
186 | Describe how results are to be presented.
187 | 
188 | ## Discussion
189 | 
190 | Describe how the results are to be interpreted *vis a vis* each hypothesis or research question.
191 | 
192 | ## Integrity Statement
193 | 
194 | Include an integrity statement, for example: The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.
195 | If a prior registration *does* exist, explain the rationale for revising the registration here.
196 | 
197 | ## Acknowledgements
198 | 
199 | - `Funding Name`: name of funding for the project
200 | - `Funding Title`: title of project grant
201 | - `Award info URI`: web address for award information
202 | - `Award number`: award number
203 | 
204 | This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)
205 | 
206 | ## References
207 | 


--------------------------------------------------------------------------------
/procedure/code/00-Python-environment-setup.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {
  6 |     "id": "lpUbJuwsgQJu"
  7 |    },
  8 |    "source": [
  9 |     "# Computational environment\n",
 10 |     "\n",
 11 |     "Note: Lines starting with `!` run in your **terminal**, not in Python.\n",
 12 |     "\n",
 13 |     "## Recording\n",
 14 |     "\n",
 15 |     "Create a [virtual environment](https://realpython.com/python-virtual-environments-a-primer/) to ensure reproducibility in your Python packages.\n",
 16 |     "\n",
 17 |     "The following creates a virtual environment with [`pipenv`](https://pipenv.pypa.io/en/latest/).\n",
 18 |     "Other tools exist too, such as [venv](https://docs.python.org/3/library/venv.html) or [conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).\n",
 19 |     "\n",
 20 |     "Be sure to let Jupyter know what environment you are using - search for \"venv with jupyter\", for example.\n",
 21 |     "\n",
 22 |     "Document the tools you choose to use, and instructions for recovering the computational environment, inside the `procedure/environment/readme.md` file.\n",
 23 |     "\n",
 24 |     "### `pipenv`\n",
 25 |     "\n",
 26 |     "First install `pipenv` by running the chunk below:"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "code",
 31 |    "execution_count": null,
 32 |    "metadata": {
 33 |     "colab": {
 34 |      "base_uri": "https://localhost:8080/"
 35 |     },
 36 |     "id": "YCGcY3lah0BU",
 37 |     "outputId": "cc60f5ec-03d5-4722-f16b-b3e096a186f4"
 38 |    },
 39 |    "outputs": [
 40 |     {
 41 |      "name": "stdout",
 42 |      "output_type": "stream",
 43 |      "text": [
 44 |       "Collecting pipenv\n",
 45 |       "  Downloading pipenv-2023.7.11-py3-none-any.whl (2.8 MB)\n",
 46 |       "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.8/2.8 MB\u001b[0m \u001b[31m16.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
 47 |       "\u001b[?25hRequirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from pipenv) (2023.5.7)\n",
 48 |       "Requirement already satisfied: setuptools>=67.0.0 in /usr/local/lib/python3.10/dist-packages (from pipenv) (67.7.2)\n",
 49 |       "Collecting virtualenv-clone>=0.2.5 (from pipenv)\n",
 50 |       "  Downloading virtualenv_clone-0.5.7-py3-none-any.whl (6.6 kB)\n",
 51 |       "Collecting virtualenv>=20.17.1 (from pipenv)\n",
 52 |       "  Downloading virtualenv-20.24.1-py3-none-any.whl (3.0 MB)\n",
 53 |       "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
 54 |       "\u001b[?25hCollecting distlib<1,>=0.3.6 (from virtualenv>=20.17.1->pipenv)\n",
 55 |       "  Downloading distlib-0.3.7-py2.py3-none-any.whl (468 kB)\n",
 56 |       "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m468.9/468.9 kB\u001b[0m \u001b[31m26.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
 57 |       "\u001b[?25hRequirement already satisfied: filelock<4,>=3.12 in /usr/local/lib/python3.10/dist-packages (from virtualenv>=20.17.1->pipenv) (3.12.2)\n",
 58 |       "Requirement already satisfied: platformdirs<4,>=3.5.1 in /usr/local/lib/python3.10/dist-packages (from virtualenv>=20.17.1->pipenv) (3.8.1)\n",
 59 |       "Installing collected packages: distlib, virtualenv-clone, virtualenv, pipenv\n",
 60 |       "\u001b[33m  WARNING: The script virtualenv-clone is installed in '/root/.local/bin' which is not on PATH.\n",
 61 |       "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n",
 62 |       "\u001b[0m\u001b[33m  WARNING: The script virtualenv is installed in '/root/.local/bin' which is not on PATH.\n",
 63 |       "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n",
 64 |       "\u001b[0m\u001b[33m  WARNING: The scripts pipenv and pipenv-resolver are installed in '/root/.local/bin' which is not on PATH.\n",
 65 |       "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\u001b[0m\u001b[33m\n",
 66 |       "\u001b[0mSuccessfully installed distlib-0.3.7 pipenv-2023.7.11 virtualenv-20.24.1 virtualenv-clone-0.5.7\n"
 67 |      ]
 68 |     }
 69 |    ],
 70 |    "source": [
 71 |     "!pip install --user pipenv"
 72 |    ]
 73 |   },
 74 |   {
 75 |    "cell_type": "markdown",
 76 |    "metadata": {
 77 |     "id": "F-1ndcK1iFYD"
 78 |    },
 79 |    "source": [
 80 |     "Then, install the packages you need using `pipenv install`.\n",
 81 |     "\n",
 82 |     "**Do not use** `pip`, since it will not record the install!\n",
 83 |     "\n",
 84 |     "We will install `pyhere`, a package to simplify directory management.\n",
 85 |     "\n",
 86 |     "Check out pyhere's documentation [here](https://pypi.org/project/pyhere/).\n",
 87 |     "\n",
 88 |     "**Note**: if you run into the error `pipenv: command not found`, then replace `pipenv` with `python -m pipenv`."
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "code",
 93 |    "execution_count": null,
 94 |    "metadata": {
 95 |     "colab": {
 96 |      "base_uri": "https://localhost:8080/"
 97 |     },
 98 |     "id": "pKr_XDW4iEa5",
 99 |     "outputId": "eba58871-fe7d-4f2f-8268-9fa503f92d07"
100 |    },
101 |    "outputs": [
102 |     {
103 |      "name": "stdout",
104 |      "output_type": "stream",
105 |      "text": [
106 |       "\u001b[1mCreating a virtualenv for this project...\u001b[0m\n",
107 |       "Pipfile: \u001b[33m\u001b[1m/content/Pipfile\u001b[0m\n",
108 |       "\u001b[1mUsing default python from\u001b[0m \u001b[33m\u001b[1m/usr/bin/python3\u001b[0m \u001b[32m(3.10.6)\u001b[0m \u001b[1mto create virtualenv...\u001b[0m\n",
109 |       "\u001b[2K\u001b[32m⠹\u001b[0m Creating virtual environment...\u001b[36mcreated virtual environment CPython3.10.6.final.0-64 in 1601ms\n",
110 |       "  creator CPython3Posix(dest=/root/.local/share/virtualenvs/content-cQIIIOO2, clear=False, no_vcs_ignore=False, global=False)\n",
111 |       "  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)\n",
112 |       "    added seed packages: pip==23.2, setuptools==68.0.0, wheel==0.40.0\n",
113 |       "  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator\n",
114 |       "\u001b[0m\n",
115 |       "✔ Successfully created virtual environment!\n",
116 |       "\u001b[2K\u001b[32m⠸\u001b[0m Creating virtual environment...\n",
117 |       "\u001b[1A\u001b[2K\u001b[32mVirtualenv location: /root/.local/share/virtualenvs/content-cQIIIOO2\u001b[0m\n",
118 |       "\u001b[1mCreating a Pipfile for this project...\u001b[0m\n",
119 |       "\u001b[32m\u001b[1mInstalling pyhere...\u001b[0m\n",
120 |       "\u001b[?25lResolving pyhere\u001b[33m...\u001b[0m\n",
121 |       "\u001b[2K\u001b[1mAdding \u001b[0m\u001b[1;32mpyhere\u001b[0m to Pipfile's \u001b[1;33m[\u001b[0m\u001b[33mpackages\u001b[0m\u001b[1;33m]\u001b[0m \u001b[33m...\u001b[0m\n",
122 |       "\u001b[2K✔ Installation Succeeded\n",
123 |       "\u001b[2K\u001b[32m⠋\u001b[0m Installing pyhere...\n",
124 |       "\u001b[1A\u001b[2K\u001b[1mPipfile.lock not found, creating...\u001b[0m\n",
125 |       "Locking\u001b[0m \u001b[33m[packages]\u001b[0m dependencies...\u001b[0m\n",
126 |       "\u001b[?25lBuilding requirements\u001b[33m...\u001b[0m\n",
127 |       "\u001b[2KResolving dependencies\u001b[33m...\u001b[0m\n",
128 |       "\u001b[2K✔ Success!\n",
129 |       "\u001b[2K\u001b[32m⠧\u001b[0m Locking...\n",
130 |       "\u001b[1A\u001b[2KLocking\u001b[0m \u001b[33m[dev-packages]\u001b[0m dependencies...\u001b[0m\n",
131 |       "\u001b[1mUpdated Pipfile.lock (55a3de81a4921858ffb7a5cdc8cc04cf085bda69717d143b6346d23cf393900c)!\u001b[0m\n",
132 |       "\u001b[1mInstalling dependencies from Pipfile.lock (93900c)...\u001b[0m\n",
133 |       "To activate this project's virtualenv, run \u001b[33mpipenv shell\u001b[0m.\n",
134 |       "Alternatively, run a command inside the virtualenv with \u001b[33mpipenv run\u001b[0m.\n"
135 |      ]
136 |     }
137 |    ],
138 |    "source": [
139 |     "!python -m pipenv install pyhere"
140 |    ]
141 |   },
142 |   {
143 |    "cell_type": "markdown",
144 |    "metadata": {
145 |     "id": "26gqe2mtjkwA"
146 |    },
147 |    "source": [
148 |     "When you installed `pyhere`, `pipenv` created a virtual environment for you.\n",
149 |     "\n",
150 |     "You can see the virtualenv's location given above:\n",
151 |     "\n",
152 |     "```\n",
153 |     "Virtualenv location: /root/.local/share/virtualenvs/content-cQIIIOO2\n",
154 |     "```\n",
155 |     "\n",
156 |     "Next, follow [these instructions](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html#pipenv) to launch Jupyter using the `pipenv` environment you just created.\n",
157 |     "\n",
158 |     "#### The Pipfile\n",
159 |     "You will see two files in the current notebook folder; refresh JupyterLab's file explorer if you do not.\n",
160 |     "\n",
161 |     "When you are finished with the analysis, move **both** `Pipfile` and `Pipfile.lock` into the `/procedure/environment` folder."
162 |    ]
163 |   },
164 |   {
165 |    "cell_type": "markdown",
166 |    "metadata": {
167 |     "id": "o5wVGooof91k"
168 |    },
169 |    "source": [
170 |     "### Record existing packages\n",
171 |     "If you already have some code that imports Python packages, the `pigar` package can help you figure out which packages you are using.\n",
172 |     "\n",
173 |     "Comment out the first line (`%%script echo skipping`) to run the code below.\n",
174 |     "Run the code once, when you are finished with the analysis and know what packages you are using.\n",
175 |     "\n",
176 |     "This will generate a `requirements.txt` in the `/procedure/environment` folder."
177 |    ]
178 |   },
179 |   {
180 |    "cell_type": "code",
181 |    "execution_count": null,
182 |    "metadata": {
183 |     "colab": {
184 |      "base_uri": "https://localhost:8080/"
185 |     },
186 |     "id": "0wzieMywf91l",
187 |     "outputId": "3fca492a-80fc-454d-bfe8-4243253f2c88"
188 |    },
189 |    "outputs": [
190 |     {
191 |      "name": "stdout",
192 |      "output_type": "stream",
193 |      "text": [
194 |       "Requirement already satisfied: pigar in /usr/local/lib/python3.10/dist-packages (2.1.1)\n",
195 |       "Requirement already satisfied: click>=8.1 in /usr/local/lib/python3.10/dist-packages (from pigar) (8.1.4)\n",
196 |       "Requirement already satisfied: nbformat>=5.7 in /usr/local/lib/python3.10/dist-packages (from pigar) (5.9.1)\n",
197 |       "Requirement already satisfied: aiohttp>=3.8 in /usr/local/lib/python3.10/dist-packages (from pigar) (3.8.4)\n",
198 |       "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (23.1.0)\n",
199 |       "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (2.0.12)\n",
200 |       "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (6.0.4)\n",
201 |       "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (4.0.2)\n",
202 |       "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.9.2)\n",
203 |       "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.4.0)\n",
204 |       "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp>=3.8->pigar) (1.3.1)\n",
205 |       "Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (2.17.1)\n",
206 |       "Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (4.3.3)\n",
207 |       "Requirement already satisfied: jupyter-core in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (5.3.1)\n",
208 |       "Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.10/dist-packages (from nbformat>=5.7->pigar) (5.7.1)\n",
209 |       "Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=2.6->nbformat>=5.7->pigar) (0.19.3)\n",
210 |       "Requirement already satisfied: idna>=2.0 in /usr/local/lib/python3.10/dist-packages (from yarl<2.0,>=1.0->aiohttp>=3.8->pigar) (3.4)\n",
211 |       "Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.10/dist-packages (from jupyter-core->nbformat>=5.7->pigar) (3.8.1)\n",
212 |       "\u001b[34m18:35:49\u001b[39m \u001b[31mdistribution \"blinker\" may be not editable: NotADirectoryError(20, 'Not a directory')\u001b[39m\n",
213 |       "\u001b[33mRequirements file has been overwritten, no difference.\u001b[39m\n",
214 |       "\u001b[32mRequirements has been written to /environment/requirements.txt.\u001b[39m\n"
215 |      ]
216 |     }
217 |    ],
218 |    "source": [
219 |     "%%script echo skipping\n",
220 |     "!pip install pigar\n",
221 |     "!python -m pigar generate -f ../environment/requirements.txt"
222 |    ]
223 |   },
224 |   {
225 |    "cell_type": "markdown",
226 |    "metadata": {
227 |     "id": "6whdTQbphAFN"
228 |    },
229 |    "source": [
230 |     "#### Cleanup\n",
231 |     "Depending on your setup, the list generated by `pigar` may require cleanup.\n",
232 |     "\n",
233 |     "The goal of cleanup is for the `requirements.txt` to contain only comments (lines starting with #) and lines of the format `[package]==[version]`.\n",
234 |     "\n",
235 |     "For example, version `1.15.post1` of the `CensusData` package is written as `CensusData==1.15.post1`.\n",
236 |     "\n",
237 |     "Packages installed with `conda` are a common case that needs cleanup.\n",
238 |     "Their entries may look something like this:\n",
239 |     "```python\n",
240 |     "# Editable install with no version control (pandas==1.3.5)\n",
241 |     "-e /cvmfs/cybergis.illinois.edu/software/conda/cybergisx/python3-0.9.0/lib/python3.8/site-packages/pandas-1.3.5-py3.8.egg-info\n",
242 |     "```\n",
243 |     "Since this essentially installs `pandas` version `1.3.5`, you can replace these two lines with `pandas==1.3.5`.\n"
244 |    ]
245 |   },
246 |   {
247 |    "cell_type": "markdown",
248 |    "metadata": {},
249 |    "source": [
250 |     "## Recovering\n",
251 |     "\n",
252 |     "Depending on what is inside the `/procedure/environment` folder, you will recover the computational environment with different tools.\n",
253 |     "\n",
254 |     "### From a virtual environment\n",
255 |     "\n",
256 |     "If you have a `Pipfile` and a `Pipfile.lock`, run:"
257 |    ]
258 |   },
259 |   {
260 |    "cell_type": "code",
261 |    "execution_count": null,
262 |    "metadata": {},
263 |    "outputs": [],
264 |    "source": [
265 |     "# each ! command runs in its own subshell, so chain cd with the command that needs it\n",
266 |     "!cd ../environment && pipenv sync"
267 |    ]
268 |   },
269 |   {
270 |    "cell_type": "markdown",
271 |    "metadata": {},
272 |    "source": [
273 |     "If you have a `Pipfile` but no `Pipfile.lock`, run:"
274 |    ]
275 |   },
276 |   {
277 |    "cell_type": "code",
278 |    "execution_count": null,
279 |    "metadata": {},
280 |    "outputs": [],
281 |    "source": [
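282 |     "# each ! command runs in its own subshell, so chain cd with the command that needs it\n",
283 |     "!cd ../environment && pipenv install\n",
284 |     "!cd ../environment && pipenv sync"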
285 |    ]
286 |   },
287 |   {
288 |    "cell_type": "markdown",
289 |    "metadata": {},
290 |    "source": [
291 |     "If you have an `environment.yml`, run:"
292 |    ]
293 |   },
294 |   {
295 |    "cell_type": "code",
296 |    "execution_count": null,
297 |    "metadata": {},
298 |    "outputs": [],
299 |    "source": [
300 |     "!conda env create -f ../environment/environment.yml"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "markdown",
305 |    "metadata": {},
306 |    "source": [
307 |     "After you recover the virtual environment, activate it for the notebook environment you are using."
308 |    ]
309 |   },
310 |   {
311 |    "cell_type": "markdown",
312 |    "metadata": {},
313 |    "source": [
314 |     "### From a list of packages\n",
315 |     "\n",
316 |     "If you have a `requirements.txt`, then you may want to create a virtual environment with `venv` or `pipenv`.\n",
317 |     "\n",
318 |     "But if you are in a disposable environment, e.g. Google Colab or Binder, there is no need for a virtual environment;\n",
319 |     "simply run the next code cell.\n",
320 |     "\n",
321 |     "If you choose `venv`, create a virtual environment, activate it, and then run:"
322 |    ]
323 |   },
324 |   {
325 |    "cell_type": "code",
326 |    "execution_count": null,
327 |    "metadata": {},
328 |    "outputs": [],
329 |    "source": [
330 |     "# run directly if disposable\n",
331 |     "!pip install -r ../environment/requirements.txt"
332 |    ]
333 |   },
334 |   {
335 |    "cell_type": "markdown",
336 |    "metadata": {},
337 |    "source": [
338 |     "If you choose `pipenv`, refer to the instructions [here](https://docs.pipenv.org/basics/#importing-from-requirements-txt)."
339 |    ]
340 |   }
341 |  ],
342 |  "metadata": {
343 |   "language_info": {
344 |    "name": "python"
345 |   },
346 |   "orig_nbformat": 4
347 |  },
348 |  "nbformat": 4,
349 |  "nbformat_minor": 2
350 | }
351 | 


--------------------------------------------------------------------------------
/procedure/code/01-R-markdown.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "Analysis"
  3 | author: "HEGSRR"
  4 | date: "`r Sys.Date()`"
  5 | output: html_document
  6 | editor_options:
  7 |   markdown:
  8 |     wrap: sentence
  9 | knit: (function(inputFile, encoding) {
 10 |   rmarkdown::render(inputFile, encoding = encoding, output_dir = "../../docs") })
 11 | nocite: '@*'
 12 | bibliography: "../../software.bib"
 13 | ---
 14 | 
 15 | # Instructions
 16 | 
 17 | This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.
 18 | For more details on using R Markdown see <https://rmarkdown.rstudio.com/lesson-1.html>.
 19 | In the header section above, you can configure [options for this document](https://bookdown.org/yihui/rmarkdown/html-document.html), including title, author(s), and additional style and output options.
 20 | The `nocite` and `bibliography` lines automatically add a bibliography for the software packages you have used.
 21 | Remove the `nocite` line to suppress references you haven't cited.
 22 | You may delete this instruction section.
 23 | 
 24 | # Abstract
 25 | 
 26 | Write a brief abstract about your research project.
 27 | 
 28 | If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study.
 29 | For example:
 30 | 
 31 | This study is a *replication* of:
 32 | 
 33 | > citation to prior study
 34 | 
 35 | A graphical abstract of the study could also be included as an image here.
 36 | 
 37 | # Study metadata
 38 | 
 39 | - `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
 40 | - `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)
 41 | - `Date created`: date when project was started
 42 | - `Date modified`: date of most recent revision
 43 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
 44 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size of a raster grid.
 45 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326
 46 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
 47 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations.
 48 | - `Funding Name`: name of funding for the project
 49 | - `Funding Title`: title of project grant
 50 | - `Award info URI`: web address for award information
 51 | - `Award number`: award number
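
For the `Spatial Coverage` field, a bounding box can be written as a WKT polygon by listing its four corners and repeating the first corner to close the ring. The block below is a minimal sketch in Python under stated assumptions: the function name `bbox_to_wkt` and the example coordinates are hypothetical, and the axis order (x then y, i.e. longitude then latitude) is assumed and should be checked against your spatial reference system.

```python
def bbox_to_wkt(min_x, min_y, max_x, max_y):
    """Return a WKT POLYGON string for an axis-aligned bounding box."""
    if min_x > max_x or min_y > max_y:
        raise ValueError("minimum coordinates must not exceed maximum coordinates")
    corners = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON (({ring}))"


# hypothetical extent, assumed to be longitude/latitude in EPSG:4326
print(bbox_to_wkt(-73.5, 42.7, -71.5, 45.0))
```

The resulting string can be pasted into the `Spatial Coverage` entry alongside a place name.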
 52 | 
 53 | ## Original study spatio-temporal metadata
 54 | 
 55 | - `Spatial Coverage`: extent of original study
 56 | - `Spatial Resolution`: resolution of original study
 57 | - `Spatial Reference System`: spatial reference system of original study
 58 | - `Temporal Coverage`: temporal extent of original study
 59 | - `Temporal Resolution`: temporal resolution of original study
 60 | 
 61 | # Study design
 62 | 
 63 | Describe how the study relates to prior literature, e.g. is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**?
 64 | 
 65 | Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**?
 66 | 
 67 | Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test or model to be used on the hypothesis or question.
 68 | 
 69 | # Materials and procedure
 70 | 
 71 | ## Computational environment
 72 | 
 73 | ```{r environment-setup, include = FALSE}
 74 | # record all the packages you are using here
 75 | # this includes any calls to library(), require(),
 76 | # and double colons such as here::i_am()
 77 | packages <- c("tidyverse", "here")
 78 | 
 79 | # force all conflicts to become errors
 80 | # if you load dplyr and use filter(), R has to guess whether you mean dplyr::filter() or stats::filter()
 81 | # the conflicted package forces you to be explicit about this
 82 | # disable at your own peril
 83 | # https://conflicted.r-lib.org/
 84 | require(conflicted)
 85 | 
 86 | # load and install required packages
 87 | # https://groundhogr.com/
 88 | if (!require(groundhog)) {
 89 |   install.packages("groundhog")
 90 |   require(groundhog)
 91 | }
 92 | 
 93 | # this date will be used to determine the versions of R and your packages
 94 | # it is best practice to keep R and its packages up to date
 95 | groundhog.day <- "2023-06-26"
 96 | 
 97 | # this replaces any library() or require() calls
 98 | groundhog.library(packages, groundhog.day)
 99 | # you may need to install a correct version of R
100 | # you may need to respond OK in the console to permit groundhog to install packages
101 | # you may need to restart R and rerun this code to load installed packages
102 | # In RStudio, restart R with Session -> Restart R
103 | 
104 | # record the R processing environment
105 | # alternatively, use devtools::session_info() for better results
106 | writeLines(
107 |   capture.output(sessionInfo()),
108 |   here("procedure", "environment", paste0("r-environment-", Sys.Date(), ".txt"))
109 | )
110 | 
111 | # save package citations
112 | knitr::write_bib(c(packages, "base"), file = here("software.bib"))
113 | 
114 | # set up default knitr parameters
115 | # https://yihui.org/knitr/options/
116 | knitr::opts_chunk$set(
117 |   echo = FALSE, # Show outputs, but not code. Change to TRUE to show code as well
118 |   fig.retina = 4,
119 |   fig.width = 8,
120 |   fig.path = paste0(here("results", "figures"), "/")
121 | )
122 | ```
123 | 
124 | ## Data and variables
125 | 
126 | Describe the **data sources** and **variables** to be used.
127 | Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**.
128 | For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.
129 | 
130 | Primary data sources for the study are to include ... .
131 | Secondary data sources for the study are to include ... .
132 | 
133 | Each of the next subsections describes one data source.
134 | 
135 | ### Primary data source1 name
136 | 
137 | - `Title`: Title of data source
138 | - `Abstract`: Brief description of the data source
139 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
140 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size of a raster grid.
141 | - `Spatial Representation Type`: Specify the model of spatial data representation, e.g. one of `vector`, `grid`, `textTable`, `tin` (triangulated irregular network), etc. If the type is `vector`, also specify the geometry type as in the OGC Simple Feature Access standard (https://www.ogc.org/publications/standard/sfa/), e.g. `POINT`, `LINESTRING`, `MULTIPOLYGON`, etc.
142 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
143 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
144 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations.
145 | - `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source.
146 |   - sampling scheme, including spatial sampling
147 |   - target sample size and method for determining sample size
148 |   - stopping criteria for data collection and sampling (e.g. sample size, time elapsed)
149 |   - de-identification / anonymization
150 |   - experimental manipulation
151 | - `Distribution`: Describe who will make the data available and how.
152 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
153 | - `Data Quality`: State any planned quality assessment
154 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
155 |   - `Label`: variable name as used in the data or code
156 |   - `Alias`: intuitive natural language name
157 |   - `Definition`: Short description or definition of the variable. Include measurement units in description.
158 |   - `Type`: data type, e.g. character string, integer, real
159 |   - `Accuracy`: e.g. uncertainty of measurements
160 |   - `Domain`: Expected range (maximum and minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook
161 |   - `Missing Data Value(s)`: Values used to represent missing data
162 |   - `Missing Data Frequency`: Frequency of missing data observations: not yet known for data to be collected
163 | 
164 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
165 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
166 | | variable1 | ... | ... | ... | ... | ... | ... | ... |
167 | | variable2 | ... | ... | ... | ... | ... | ... | ... |
168 | 
169 | ### Primary data source2 name
170 | 
171 | ... same form as above...  
172 | Metadata documents in the markdown `.md` format can be included directly with the `includeMarkdown()` function.
173 | This requires the `markdown` package.
174 | 
175 | ### Secondary data source1 name
176 | 
177 | - `Title`: Title of data source
178 | - `Abstract`: Brief description of the data source
179 | - `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
180 | - `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size of a raster grid.
181 | - `Spatial Reference System`: Specify the geographic or projected coordinate system for the study
182 | - `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.
183 | - `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations.
184 | - `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source
185 | - `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access
186 | - `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*
187 | - `Data Quality`: State result of quality assessment or state "Quality unknown"
188 | - `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
189 |   - `Label`: variable name as used in the data or code
190 |   - `Alias`: intuitive natural language name
191 |   - `Definition`: Short description or definition of the variable. Include measurement units in description.
192 |   - `Type`: data type, e.g. character string, integer, real
193 |   - `Accuracy`: e.g. uncertainty of measurements
194 |   - `Domain`: Range (Maximum and Minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook
195 |   - `Missing Data Value(s)`: Values used to represent missing data
196 |   - `Missing Data Frequency`: Frequency of missing data observations
197 | 
198 | | Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
199 | | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
200 | | variable1 | ... | ... | ... | ... | ... | ... | ... |
201 | | variable2 | ... | ... | ... | ... | ... | ... | ... |
202 | 
203 | ### Secondary data source2 name
204 | 
205 | ... same form as above...
206 | 
207 | ## Prior observations  
208 | 
209 | Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. through p-hacking.
210 | Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below:
211 | 
212 | At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regard to the ____ phenomena to be studied.
213 | This study is related to ____ prior studies by the authors.
214 | 
215 | For each primary data source, declare the extent to which authors had already engaged with the data:
216 | 
217 | - [ ] no data collection has started
218 | - [ ] pilot test data has been collected
219 | - [ ] data collection is in progress and data has not been observed
220 | - [ ] data collection is in progress and __% of data has been observed
221 | - [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data.
222 | 
223 | For each secondary source, declare the extent to which authors had already engaged with the data:
224 | 
225 | - [ ] data is not available yet
226 | - [ ] data is available, but only metadata has been observed
227 | - [ ] metadata and descriptive statistics have been observed
228 | - [ ] metadata and a pilot test subset or sample of the full dataset have been observed
229 | - [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.
230 | 
231 | If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.
232 | 
233 | ## Bias and threats to validity
234 | 
235 | Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.
236 | 
237 | These include:
238 |   - uneven primary data collection due to geographic inaccessibility or other constraints
239 |   - multiple hypothesis testing
240 |   - edge or boundary effects
241 |   - the modifiable areal unit problem
242 |   - nonstationarity
243 |   - spatial dependence or autocorrelation
244 |   - temporal dependence or autocorrelation
245 |   - spatial scale dependency
246 |   - spatial anisotropies
247 |   - confusion of spatial and a-spatial causation
248 |   - ecological fallacy
249 |   - uncertainty e.g. from spatial disaggregation, anonymization, differential privacy
250 | 
251 | ## Data transformations
252 | 
253 | Describe all data transformations planned to prepare data sources for analysis.
254 | This section should explain in as much detail as possible how to transform data from the **raw** state at the time of acquisition or observation to the pre-processed **derived** state ready for the main analysis.
255 | Include steps to check for and mitigate sources of **bias** and **threats to validity**.
256 | The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test.
257 | More specifically, describe all the **geographic** and **variable** transformations required to prepare the input data described in the data and variables section above so that they match the study's spatio-temporal characteristics described in the study metadata and study design sections.
258 | Visual workflow diagrams may help communicate the methodology in this section.
259 | 
260 | Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.
261 | 
262 | Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.
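
As an illustration of two of the variable transformations named above, the following is a minimal Python sketch of standardization (z-scores) and min-max normalization using only the standard library. The function names `standardize` and `normalize` are hypothetical, and the choice of the population standard deviation is an assumption; state in your plan whichever definition you actually intend to use.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-scores: subtract the mean, then divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max normalization: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant variable")
    return [(v - low) / (high - low) for v in values]


# hypothetical variable with three observations
print(normalize([2, 4, 6]))
print(standardize([1, 2, 3]))
```

Recording the exact formula in this way removes ambiguity when a reader later compares the derived data against the plan.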
263 | 
264 | Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.
265 | 
266 | ## Analysis
267 | 
268 | Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.
269 | This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.
270 | Also explain any follow-up analyses or validations.
271 | 
272 | # Results
273 | 
274 | Describe how results are to be presented.
275 | 
276 | # Discussion
277 | 
278 | Describe how the results are to be interpreted *vis-à-vis* each hypothesis or research question.
279 | 
280 | # Integrity Statement
281 | 
282 | Include an integrity statement, for example: The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.
283 | If a prior registration *does* exist, explain the rationale for revising the registration here.
284 | 
285 | # Acknowledgements
286 | 
287 | - `Funding Name`: name of funding for the project
288 | - `Funding Title`: title of project grant
289 | - `Award info URI`: web address for award information
290 | - `Award number`: award number
291 | 
292 | This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)
293 | 
294 | # References
295 | 


--------------------------------------------------------------------------------
/procedure/code/01-Jupyter_notebook.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |   "cells": [
  3 |     {
  4 |       "cell_type": "markdown",
  5 |       "metadata": {
  6 |         "id": "7gUZzqUXf91d"
  7 |       },
  8 |       "source": [
  9 |         "# Analysis\n",
 10 |         "\n",
 11 |         "Template for Jupyter notebooks running Python."
 12 |       ]
 13 |     },
 14 |     {
 15 |       "cell_type": "markdown",
 16 |       "metadata": {
 17 |         "id": "ynQbJHvcVm55"
 18 |       },
 19 |       "source": [
 20 |         "Version 0.1.0 \\| First Created July 12, 2023 \\| Updated August 01, 2023\n",
 21 |         "\n",
 22 |         "## Jupyter Notebook\n",
 23 |         "\n",
 24 |         "This is a Jupyter Notebook document. For more details on using a Jupyter Notebook see <https://docs.jupyter.org/en/latest/>.\n",
 25 |         "\n"
 26 |       ]
 27 |     },
 28 |     {
 29 |       "cell_type": "markdown",
 30 |       "metadata": {},
 31 |       "source": [
 32 |         "# Title of Study\n",
 33 |         "\n",
 34 |         "### Authors\n",
 35 |         "\n",
 36 |         "- First Name Last Name\\*, email address, @githubname, ORCID link, affiliated institution(s)\n",
 37 |         "- First Name Last Name, email address, @githubname, ORCID link, affiliated institution(s)\n",
 38 |         "\n",
 39 |         "\\* Corresponding author and creator\n",
 40 |         "\n"
 41 |       ]
 42 |     },
 43 |     {
 44 |       "cell_type": "markdown",
 45 |       "metadata": {},
 46 |       "source": [
 47 |         "### Abstract\n",
 48 |         "\n",
 49 |         "Write a brief abstract about your research project.\n",
 50 |         "\n",
 51 |         "If the project is a reproduction or replication study, include a declaration of the study type with a full reference to the original study.\n",
 52 |         "For example:\n",
 53 |         "\n",
 54 |         "This study is a *replication* of:\n",
 55 |         "\n",
 56 |         "> citation to prior study\n",
 57 |         "\n",
 58 |         "A graphical abstract of the study could also be included as an image here.\n",
 59 |         "\n"
 60 |       ]
 61 |     },
 62 |     {
 63 |       "cell_type": "markdown",
 64 |       "metadata": {},
 65 |       "source": [
 66 |         "### Study metadata\n",
 67 |         "\n",
 68 |         "- `Key words`: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.\n",
 69 |         "- `Subject`: select from the [BePress Taxonomy](http://digitalcommons.bepress.com/cgi/viewcontent.cgi?article=1008&context=reference)\n",
 70 |         "- `Date created`: date when project was started\n",
 71 |         "- `Date modified`: date of most recent revision\n",
 72 |         "- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.\n",
 73 |         "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size of a raster grid.\n",
 74 |         "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study, e.g. EPSG:4326\n",
 75 |         "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n",
 76 |         "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations.\n",
 77 |         "- `Funding Name`: name of funding for the project\n",
 78 |         "- `Funding Title`: title of project grant\n",
 79 |         "- `Award info URI`: web address for award information\n",
 80 |         "- `Award number`: award number\n",
 81 |         "\n",
 82 |         "#### Original study spatio-temporal metadata\n",
 83 |         "\n",
 84 |         "- `Spatial Coverage`: extent of original study\n",
 85 |         "- `Spatial Resolution`: resolution of original study\n",
 86 |         "- `Spatial Reference System`: spatial reference system of original study\n",
 87 |         "- `Temporal Coverage`: temporal extent of original study\n",
 88 |         "- `Temporal Resolution`: temporal resolution of original study\n",
 89 |         "\n"
 90 |       ]
 91 |     },
 92 |     {
 93 |       "cell_type": "markdown",
 94 |       "metadata": {},
 95 |       "source": [
 96 |         "## Study design\n",
 97 |         "\n",
 98 |         "Describe how the study relates to prior literature, e.g. is it an **original study**, **meta-analysis study**, **reproduction study**, **reanalysis study**, or **replication study**?\n",
 99 |         "\n",
100 |         "Also describe the original study archetype, e.g. is it **observational**, **experimental**, **quasi-experimental**, or **exploratory**?\n",
101 |         "\n",
102 |         "Enumerate specific **hypotheses** to be tested or **research questions** to be investigated here, and specify the type of method, statistical test, or model to be used for each hypothesis or question.\n",
103 |         "\n",
104 |         "## Materials and procedure"
105 |       ]
106 |     },
107 |     {
108 |       "cell_type": "markdown",
109 |       "metadata": {
110 |         "id": "lpUbJuwsgQJu"
111 |       },
112 |       "source": [
113 |         "### Computational environment\n",
114 |         "\n",
115 |         "Maintaining a reproducible computational environment requires some conscious choices in package management.\n",
116 |         "\n",
117 |         "Please refer to `00-Python-environment-setup.ipynb` for details.\n",
118 |         "\n"
119 |       ]
120 |     },
121 |     {
122 |       "cell_type": "code",
123 |       "execution_count": null,
124 |       "metadata": {},
125 |       "outputs": [],
126 |       "source": [
127 |         "# Import modules, define directories\n",
128 |         "from pyhere import here\n",
129 |         "\n",
130 |         "# You can define your own shortcuts for file paths:\n",
131 |         "path = {\n",
132 |         "    \"dscr\": here(\"data\", \"scratch\"),\n",
133 |         "    \"drpub\": here(\"data\", \"raw\", \"public\"),\n",
134 |         "    \"drpriv\": here(\"data\", \"raw\", \"private\"),\n",
135 |         "    \"ddpub\": here(\"data\", \"derived\", \"public\"),\n",
136 |         "    \"ddpriv\": here(\"data\", \"derived\", \"private\"),\n",
137 |         "    \"rfig\": here(\"results\", \"figures\"),\n",
138 |         "    \"roth\": here(\"results\", \"other\"),\n",
139 |         "    \"rtab\": here(\"results\", \"tables\"),\n",
140 |         "    \"dmet\": here(\"data\", \"metadata\")\n",
141 |         "}"
142 |       ]
143 |     },
144 |     {
145 |       "cell_type": "markdown",
146 |       "metadata": {
147 |         "id": "dEwHjXmlVXZI"
148 |       },
149 |       "source": [
150 |         "### Data and variables\n",
151 |         "\n",
152 |         "Describe the **data sources** and **variables** to be used.\n",
153 |         "Data sources may include plans for observing and recording **primary data** or descriptions of **secondary data**.\n",
154 |         "For secondary data sources with numerous variables, the analysis plan authors may focus on documenting only the variables intended for use in the study.\n",
155 |         "\n",
156 |         "Primary data sources for the study are to include ... .\n",
157 |         "Secondary data sources for the study are to include ... .\n",
158 |         "\n",
159 |         "Each of the next subsections describes one data source.\n",
160 |         "\n"
161 |       ]
162 |     },
163 |     {
164 |       "cell_type": "markdown",
165 |       "metadata": {},
166 |       "source": [
167 |         "#### Primary data source1 name\n",
168 |         "\n",
169 |         "**Standard Metadata**\n",
170 |         "\n",
171 |         "- `Abstract`: Brief description of the data source\n",
172 |         "- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.\n",
173 |         "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size (distance) of a raster grid\n",
174 |         "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study\n",
175 |         "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n",
176 |         "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations\n",
177 |         "- `Lineage`: Describe and/or cite data sources and/or methodological steps planned to create this data source.\n",
178 |         "  - sampling scheme, including spatial sampling\n",
179 |         "  - target sample size and method for determining sample size\n",
180 |         "  - stopping criteria for data collection and sampling (e.g. sample size, time elapsed)\n",
181 |         "  - de-identification / anonymization\n",
182 |         "  - experimental manipulation\n",
183 |         "- `Distribution`: Describe who will make the data available and how.\n",
184 |         "- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*\n",
185 |         "- `Data Quality`: State any planned quality assessment\n",
186 |         "- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)\n",
187 |         "  - `Label`: variable name as used in the data or code\n",
188 |         "  - `Alias`: intuitive natural language name\n",
189 |         "  - `Definition`: Short description or definition of the variable. Include measurement units in description.\n",
190 |         "  - `Type`: data type, e.g. character string, integer, real\n",
191 |         "  - `Accuracy`: e.g. uncertainty of measurements\n",
192 |         "  - `Domain`: Expected range (maximum and minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook\n",
193 |         "  - `Missing Data Value(s)`: Values used to represent missing data\n",
194 |         "  - `Missing Data Frequency`: Frequency of missing data observations (not yet known for data that have yet to be collected)\n",
195 |         "\n",
196 |         "| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |\n",
197 |         "| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n",
198 |         "| variable1 | ... | ... | ... | ... | ... | ... | ... |\n",
199 |         "| variable2 | ... | ... | ... | ... | ... | ... | ... |\n",
200 |         "\n"
201 |       ]
202 |     },
203 |     {
204 |       "cell_type": "markdown",
205 |       "metadata": {},
206 |       "source": [
207 |         "#### Primary data source2 name\n",
208 |         "\n",
209 |         "... same form as above...\n",
210 |         "\n"
211 |       ]
212 |     },
213 |     {
214 |       "cell_type": "markdown",
215 |       "metadata": {},
216 |       "source": [
217 |         "#### Secondary data source1 name\n",
218 |         "\n",
219 |         "**Standard Metadata**\n",
220 |         "\n",
221 |         "- `Abstract`: Brief description of the data source\n",
222 |         "- `Spatial Coverage`: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well-known text (WKT) representation of a bounding box.\n",
223 |         "- `Spatial Resolution`: Specify the spatial resolution as a scale factor, a description of the level of detail of each unit of observation (including the administrative level of administrative areas), and/or the cell size (distance) of a raster grid\n",
224 |         "- `Spatial Reference System`: Specify the geographic or projected coordinate system for the study\n",
225 |         "- `Temporal Coverage`: Specify the temporal extent of your study---i.e. the range of time represented by the data observations.\n",
226 |         "- `Temporal Resolution`: Specify the temporal resolution of your study---i.e. the duration of time that each observation represents or the revisit period for repeated observations\n",
227 |         "- `Lineage`: Describe and/or cite data sources and/or methodological steps used to create this data source\n",
228 |         "- `Distribution`: Describe how the data is distributed, including any persistent identifier (e.g. DOI) or URL for data access\n",
229 |         "- `Constraints`: Legal constraints for *access* or *use* to protect *privacy* or *intellectual property rights*\n",
230 |         "- `Data Quality`: State result of quality assessment or state \"Quality unknown\"\n",
231 |         "- `Variables`: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)\n",
232 |         "  - `Label`: variable name as used in the data or code\n",
233 |         "  - `Alias`: intuitive natural language name\n",
234 |         "  - `Definition`: Short description or definition of the variable. Include measurement units in description.\n",
235 |         "  - `Type`: data type, e.g. character string, integer, real\n",
236 |         "  - `Accuracy`: e.g. uncertainty of measurements\n",
237 |         "  - `Domain`: Range (Maximum and Minimum) of numerical data, or codes or categories of nominal data, or reference to a standard codebook\n",
238 |         "  - `Missing Data Value(s)`: Values used to represent missing data\n",
239 |         "  - `Missing Data Frequency`: Frequency of missing data observations\n",
240 |         "\n",
241 |         "| Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |\n",
242 |         "| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |\n",
243 |         "| variable1 | ... | ... | ... | ... | ... | ... | ... |\n",
244 |         "| variable2 | ... | ... | ... | ... | ... | ... | ... |\n",
245 |         "\n"
246 |       ]
247 |     },
248 |     {
249 |       "cell_type": "markdown",
250 |       "metadata": {},
251 |       "source": [
252 |         "#### Secondary data source2 name\n",
253 |         "\n",
254 |         "... same form as above...\n",
255 |         "\n"
256 |       ]
257 |     },
258 |     {
259 |       "cell_type": "markdown",
260 |       "metadata": {},
261 |       "source": [
262 |         "### Prior observations  \n",
263 |         "\n",
264 |         "Prior experience with the study area, prior data collection, or prior observation of the data can compromise the validity of a study, e.g. through p-hacking.\n",
265 |         "Therefore, disclose any prior experience or observations at the time of study pre-registration here, with example text below:\n",
266 |         "\n",
267 |         "At the time of this study pre-registration, the authors had _____ prior knowledge of the geography of the study region with regard to the ____ phenomena to be studied.\n",
268 |         "This study is related to ____ prior studies by the authors.\n",
269 |         "\n",
270 |         "For each primary data source, declare the extent to which authors had already engaged with the data:\n",
271 |         "\n",
272 |         "- [ ] no data collection has started\n",
273 |         "- [ ] pilot test data has been collected\n",
274 |         "- [ ] data collection is in progress and data has not been observed\n",
275 |         "- [ ] data collection is in progress and __% of data has been observed\n",
276 |         "- [ ] data collection is complete and data has been observed. Explain how authors have already manipulated / explored the data.\n",
277 |         "\n",
278 |         "For each secondary source, declare the extent to which authors had already engaged with the data:\n",
279 |         "\n",
280 |         "- [ ] data is not available yet\n",
281 |         "- [ ] data is available, but only metadata has been observed\n",
282 |         "- [ ] metadata and descriptive statistics have been observed\n",
283 |         "- [ ] metadata and a pilot test subset or sample of the full dataset have been observed\n",
284 |         "- [ ] the full dataset has been observed. Explain how authors have already manipulated / explored the data.\n",
285 |         "\n",
286 |         "If pilot test data has been collected or acquired, describe how the researchers observed and analyzed the pilot test, and the extent to which the pilot test influenced the research design.\n",
287 |         "\n"
288 |       ]
289 |     },
290 |     {
291 |       "cell_type": "markdown",
292 |       "metadata": {},
293 |       "source": [
294 |         "### Bias and threats to validity\n",
295 |         "\n",
296 |         "Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.\n",
297 |         "\n",
298 |         "These include:\n",
299 |         "  - uneven primary data collection due to geographic inaccessibility or other constraints\n",
300 |         "  - multiple hypothesis testing\n",
301 |         "  - edge or boundary effects\n",
302 |         "  - the modifiable areal unit problem\n",
303 |         "  - nonstationarity\n",
304 |         "  - spatial dependence or autocorrelation\n",
305 |         "  - temporal dependence or autocorrelation\n",
306 |         "  - spatial scale dependency\n",
307 |         "  - spatial anisotropies\n",
308 |         "  - confusion of spatial and aspatial causation\n",
309 |         "  - ecological fallacy\n",
310 |         "  - uncertainty, e.g. from spatial disaggregation, anonymization, or differential privacy\n",
311 |         "\n"
312 |       ]
313 |     },
314 |     {
315 |       "cell_type": "markdown",
316 |       "metadata": {},
317 |       "source": [
318 |         "### Data transformations\n",
319 |         "\n",
320 |         "Describe all data transformations planned to prepare data sources for analysis.\n",
321 |         "This section should explain with the fullest detail possible how to transform data from the **raw** state at the time of acquisition or observation, to the pre-processed **derived** state ready for the main analysis.\n",
322 |         "Include steps to check for and mitigate sources of **bias** and **threats to validity**.\n",
323 |         "The method may anticipate **contingencies**, e.g. tests for normality and alternative decisions to make based on the results of the test.\n",
324 |         "More specifically, describe all the **geographic** and **variable** transformations required to prepare the input data, as described in the data and variables section above, to match the study's spatio-temporal characteristics as described in the study metadata and study design sections.\n",
325 |         "Visual workflow diagrams may help communicate the methodology in this section.\n",
326 |         "\n",
327 |         "Examples of **geographic** transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.\n",
328 |         "\n",
329 |         "Examples of **variable** transformations include standardization, normalization, constructed variables, imputation, classification, etc.\n",
330 |         "\n",
331 |         "Be sure to include any steps planned to **exclude** observations with *missing* or *outlier* data, to **group** observations by *attribute* or *geographic* criteria, or to **impute** missing data or apply spatial or temporal **interpolation**.\n",
332 |         "\n"
333 |       ]
334 |     },
335 |     {
336 |       "cell_type": "markdown",
337 |       "metadata": {},
338 |       "source": [
339 |         "### Analysis\n",
340 |         "\n",
341 |         "Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions.\n",
342 |         "This section should explicitly define any spatial / statistical *models* and their *parameters*, including *grouping* criteria, *weighting* criteria, and *significance thresholds*.\n",
343 |         "Also explain any follow-up analyses or validations.\n",
344 |         "\n"
345 |       ]
346 |     },
347 |     {
348 |       "cell_type": "markdown",
349 |       "metadata": {},
350 |       "source": [
351 |         "## Results\n",
352 |         "\n",
353 |         "Describe how results are to be presented.\n",
354 |         "\n"
355 |       ]
356 |     },
357 |     {
358 |       "cell_type": "markdown",
359 |       "metadata": {},
360 |       "source": [
361 |         "## Discussion\n",
362 |         "\n",
363 |         "Describe how the results are to be interpreted *vis-à-vis* each hypothesis or research question.\n",
364 |         "\n"
365 |       ]
366 |     },
367 |     {
368 |       "cell_type": "markdown",
369 |       "metadata": {},
370 |       "source": [
371 |         "## Integrity Statement\n",
372 |         "\n",
373 |         "Include an integrity statement, for example: The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.\n",
374 |         "If a prior registration *does* exist, explain the rationale for revising the registration here.\n",
375 |         "\n"
376 |       ]
377 |     },
378 |     {
379 |       "cell_type": "markdown",
380 |       "metadata": {},
381 |       "source": [
382 |         "## Acknowledgements\n",
383 |         "\n",
384 |         "- `Funding Name`: name of funding for the project\n",
385 |         "- `Funding Title`: title of project grant\n",
386 |         "- `Award info URI`: web address for award information\n",
387 |         "- `Award number`: award number\n",
388 |         "\n",
389 |         "This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](https://doi.org/10.17605/OSF.IO/W29MQ)"
390 |       ]
391 |     },
392 |     {
393 |       "cell_type": "markdown",
394 |       "metadata": {},
395 |       "source": [
396 |         "## References"
397 |       ]
398 |     }
399 |   ],
400 |   "metadata": {
401 |     "colab": {
402 |       "provenance": []
403 |     },
404 |     "kernelspec": {
405 |       "display_name": "Python 3",
406 |       "name": "python3"
407 |     },
408 |     "language_info": {
409 |       "name": "python"
410 |     },
411 |     "orig_nbformat": 4
412 |   },
413 |   "nbformat": 4,
414 |   "nbformat_minor": 0
415 | }
416 | 


--------------------------------------------------------------------------------